Senior Site Reliability Engineer

Remote
Senior
🇩🇪 Germany
Site Reliability Engineer
Technology

We're looking for a Senior Site Reliability Engineer to join our Infrastructure team. In this role, you will help our developers work efficiently as they build a vibrant ecosystem for the Avalanche Blockchain. You'll support several business units and engineering teams as they design, optimize, and implement greenfield technology for a variety of use cases. This role will be a key part of our release schedule and production monitoring.

WHAT YOU WILL DO

  • Develop and optimize highly reliable and scalable infrastructure focused on SRE principles.
  • Implement and maintain monitoring, logging, and tracing tools to gain insights into service behavior and health.
  • Uphold SLOs (Service Level Objectives), SLIs (Service Level Indicators), and error budgets for critical systems.
  • Enhance the reliability and resiliency of critical systems by identifying single points of failure and implementing best practices.
  • Collaborate with software developers to build reliability and performance into applications from inception.
  • Automate and streamline incident management processes to minimize service disruption and improve response times.
  • Participate in on-call rotations, ensuring quick restoration of services and fostering a blameless post-mortem culture.
  • Foster a continuous improvement mindset by analyzing and learning from incidents and implementing preventive measures.
  • Leverage cloud technologies and IaC tools to ensure scalability and repeatability.
  • Advocate for best practices in reliability, security, and maintainability within the team.

WHAT YOU WILL BRING

  • BS in Computer Science or related field.
  • 5+ years of experience as an SRE, DevOps, or Cloud Engineer.
  • Strong grasp of SRE principles, including error budgets, SLOs, and SLIs.
  • Experience with cloud networking and orchestration on AWS (EKS, ECS, VPC, S3, ELB).
  • Strong Kubernetes experience with Docker or rkt containerization.
  • Proficiency in Infrastructure as Code (IaC) using tools such as Terraform, Terragrunt, and Ansible.
  • Experience with monitoring and observability tools like Prometheus, Grafana, or ELK Stack.
  • Experience building and maintaining CI/CD pipelines with GitHub Actions (preferred), Jenkins, Travis CI, or CircleCI.
  • Experience with automation and configuration management using Ansible, Puppet, or Chef.
  • Experience with Linux-based infrastructure (Ubuntu preferred).
  • Experience writing scripts in a scripting or general-purpose language (Python and Go preferred).
  • Working knowledge of decentralized architecture design patterns and distributed systems.

**MUST BE LOCATED IN EUROPE**

Ava Labs

A world-class blockchain development team redefining the way people build permissionless networks and create value with Web3.

Blockchain
Technology

🏭software development
🎂2018