Senior Site Reliability Engineer

Remote
Senior
🇩🇪 Germany
Site Reliability Engineer
Technology

The HiveMQ Cloud Organization builds and operates our multi-cloud, fully managed service offering, including our free serverless and dedicated enterprise plans.

We are looking for an experienced Site Reliability Engineer for our Cloud Operations team.

You will…

  • Ensure that the HiveMQ Cloud platform and the related services our customers rely on are always highly available, reliable, and scalable.
  • Run our global infrastructure on AWS, GCP, and Azure using Helm, Terraform, Kubernetes, and other industry-standard tools.
  • Use modern software delivery methods, such as infrastructure as code, distributed containerized service deployments, and self-healing, fully managed SaaS services, to automate the deployment and maintenance of customer-facing products and internal systems such as observability and monitoring.
  • Plan, implement, and maintain our infrastructure to meet current and forecasted demand while using cloud resources efficiently and keeping related costs under control.
  • Work on application monitoring, infrastructure change management, platform incident management and response, and post-incident reviews.
  • Help debug production issues across services and all levels of the stack, and improve our products and services.
  • Operate tools that power our observability, monitoring, and on-call systems.
  • Help define Service Level Objectives (SLOs), along with the means to measure them, alert on them, and automate remediation.
  • Be on call.
  • Live a culture of teamwork, quality, growth, drive to action, and excellence.
  • Contribute to the overall platform and engineering vision of HiveMQ.

You have…

  • Experience operating cloud (SaaS, IaaS, or PaaS) products and services at scale in cloud environments with a high degree of automation.
  • Proven experience building and operating production-quality applications in the cloud with cloud-native technologies such as Kubernetes, Docker, Helm, CI/CD, and infrastructure-as-code (IaC) tools like Terraform.
  • The ability to methodically diagnose system, networking, and application issues while on call.
  • Experience operating on at least one of the three major cloud providers (Amazon Web Services, Microsoft Azure, Google Cloud Platform).
  • Strong experience with metrics and monitoring solutions such as Grafana, Prometheus, Loki, and Mimir, or similar tools (e.g., Elasticsearch and Kibana).
  • High standards for building platform and infrastructure setups: automated, built on modular and reusable infrastructure as code, managed with GitOps, and test- and CI/CD-driven.
  • The ability to solve problems independently, and a drive toward execution.
  • A systematic but pragmatic approach, paired with a strong sense of ownership and pride in the work you accomplish as a team.
  • A good understanding of how agile platform engineering works using Kanban in a self-organized team.
  • Excellent English communication skills and the ability to work in a collaborative team environment.

 

HiveMQ

HiveMQ empowers businesses to transform with the most trusted MQTT platform

Software
Technology

🏭 Software development
🎂 Founded 2012