Service Reliability Engineer

IO Global

Remote, Mid-level

🇬🇧 United Kingdom

Site Reliability Engineer

Technology

Ansible AWS Bash CI/CD Cloud Go Grafana Kubernetes Loki Prometheus Python Rust Terraform

Summary
The Service Reliability Engineer (SRE) at Project Catalyst plays a crucial role in ensuring the reliability, availability, and performance of our production systems supporting our open-source projects. Reporting to the Senior Service Reliability Engineer, this role engages closely with development teams and key stakeholders to integrate software engineering principles with systems engineering. The responsibilities include creating and maintaining tools, automations, and infrastructure code to enhance platform efficiency and resilience. Successful candidates will contribute significantly to our mission by improving service scalability and performance while fostering a culture of collaboration and continuous improvement.

Duties

Design, write, and deliver tools and software using Go, Python, and Bash to enhance the availability, scalability, and efficiency of our services.
Manage the entire lifecycle of services—from inception and design, through deployment, operation, and refinement.
Conduct sustainable incident response and lead blameless postmortems.
Participate in on-call rotations, addressing service interruptions and technical challenges promptly.
Collaborate with development teams to design solutions that prioritize customer experience, scalability, and performance.
Analyze system performance and reliability to provide enhancement recommendations.
Establish and maintain service-level objectives (SLOs), service-level indicators (SLIs), and error budgets.
Implement and advocate for security best practices.

The above list of responsibilities is not exhaustive; you will be expected to perform other tasks as your role within the organisation changes.

Requirements

Key Competencies

Technical skills: Advanced proficiency in Go, Python, and Bash. Expertise in AWS, Kubernetes, and monitoring tools like Prometheus, Grafana, and Loki.
Business skills: Effective problem-solving capabilities, ability to conduct performance tuning, and knowledge of CI/CD processes.
Human skills: Exceptional communication and teamwork skills, with a strong ethic of collaboration and leadership.

Education / Experience

BS degree in Computer Science or related technical field, or equivalent practical experience.
Extensive experience in DevOps, SysAdmin, or a similar role, with a strong background in Infrastructure as Code (using Terraform and Ansible).
Prior experience with Rust and with major cloud providers (AWS preferred, or GCP or Azure) is advantageous. Cloud certifications are a plus.

Specialist Skills

Deep knowledge of Infrastructure as Code (IaC) principles.
Practical experience in designing and implementing cloud-based solutions.
Familiarity with Rust as a software development tool is a plus.

Benefits

Flexible schedule
Remote work
Laptop reimbursement
New starter package to buy hardware essentials (headphones, monitor, etc.)
Learning & Development opportunities
Competitive PTO
Medical Benefits

At IOG, we value diversity and always treat all employees and job applicants based on merit, qualifications, competence, and talent. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

IO Global

Founded in 2015, IO Global is one of the world’s pre-eminent blockchain research and engineering companies.

Blockchain

Cryptocurrencies

Research and Development (R&D)

Software

Technology

iohk.io

🏭software development

🎂2014
