Service Reliability Engineer

Remote
Mid-level
🇬🇧 United Kingdom
Site Reliability Engineer
Technology

Summary
The Service Reliability Engineer (SRE) at Project Catalyst plays a crucial role in ensuring the reliability, availability, and performance of the production systems that support our open-source projects. Reporting to the Senior Service Reliability Engineer, this role works closely with development teams and key stakeholders to apply software engineering principles to systems engineering problems. Responsibilities include creating and maintaining tools, automations, and infrastructure code that make the platform more efficient and resilient. Successful candidates will contribute significantly to our mission by improving service scalability and performance while fostering a culture of collaboration and continuous improvement.

Duties

  • Design, write, and deliver tools and software using Go, Python, and Bash to enhance the availability, scalability, and efficiency of our services.
  • Manage the entire lifecycle of services—from inception and design, through deployment, operation, and refinement.
  • Conduct sustainable incident response and lead blameless postmortems.
  • Participate in on-call rotations, addressing service interruptions and technical challenges promptly.
  • Collaborate with development teams to design solutions that prioritize customer experience, scalability, and performance.
  • Analyze system performance and reliability to provide enhancement recommendations.
  • Establish and maintain service-level objectives (SLOs), service-level indicators (SLIs), and error budgets.
  • Implement and advocate for security best practices.

The above list of duties is not exhaustive, and you will be expected to perform other tasks as your role within the organisation changes.

Requirements

Key Competencies

  • Technical skills: Advanced proficiency in Go, Python, and Bash. Expertise in AWS, Kubernetes, and monitoring tools like Prometheus, Grafana, and Loki.
  • Business skills: Effective problem-solving capabilities, ability to conduct performance tuning, and knowledge of CI/CD processes.
  • Human skills: Exceptional communication and teamwork skills, with a strong ethic of collaboration and leadership.

Education / Experience

  • BS degree in Computer Science or a related technical field, or equivalent practical experience.
  • Extensive experience in DevOps, SysAdmin, or a similar role, with a strong background in Infrastructure as Code (using Terraform and Ansible).
  • Prior experience with Rust and with cloud providers (AWS preferred; GCP or Azure also relevant) is advantageous. Cloud certifications are a plus.

Specialist Skills

  • Deep knowledge of Infrastructure as Code (IaC) principles.
  • Practical experience in designing and implementing cloud-based solutions.
  • Familiarity with Rust as a software development tool is a plus.

Benefits

  • Flexible schedule
  • Remote work
  • Laptop reimbursement
  • New starter package to buy hardware essentials (headphones, monitor, etc.)
  • Learning & Development opportunities
  • Competitive PTO
  • Medical Benefits

At IOG, we value diversity and always treat all employees and job applicants based on merit, qualifications, competence, and talent. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

 

IO Global

Founded in 2015, IO Global is one of the world’s pre-eminent blockchain research and engineering companies.
