Site Reliability Engineer

Remote, Mid-level

🗺️ Anywhere in world

👶 Paid parental leave

Technology

Cloud, Crossplane, Docker, Elastic, Elastic Cloud, Elastic Stack, Enterprise, Graphite, Influx, Kubernetes, Linux, Open Source, Prometheus, public cloud, REST, SaaS, Serverless, Terraform


As part of the Platform Engineering department, the SRE team designs, builds, scales, and maintains the multi-cloud platform that hosts internal and external services such as Elastic Cloud Hosted and Serverless. This includes developing new software and tools that support the rest of the infrastructure, so that we can rapidly deploy products from all corners of the organization. We need help on this journey to offer a truly exceptional customer experience. This is where you come in!

What you will be doing:

Lead technical initiatives aimed at improving the reliability of the global Elastic infrastructure, taking an engineering approach to the prevention, detection, and timely mitigation of issues.
Contribute to SRE engineering through auto-remediation and systems engineering work, continuing our effort to reduce human intervention by automating processes and operational tasks.
Develop and maintain software, tooling, and automation to support the ever-growing scaling demands of this global infrastructure.
Champion an environment focused on collaboration, operational excellence, and uplifting others.
Respond to major incidents, correcting and improving systems so that they prevent future incidents and scale with growth. Participate in a weekly on-call rotation that uses a follow-the-sun model.

What you bring along:

A well-rounded view of and true appreciation for reliability, born of real-world experience operating production services. You have examples of using software engineering practices and SRE principles to solve operational problems.
A background in software engineering and the confidence to collaborate with engineers to identify and resolve issues, ideally with experience in public cloud and managed Kubernetes services.
Outstanding interpersonal skills and the ability to build strong relationships through inclusive communication. Examples of working in distributed teams or working remotely are desirable.

Bonus Points:

You don't need to have all of these items, but these represent the types of work you will do as a Site Reliability Engineer at Elastic.

You have operated a SaaS product in a public cloud, ideally built using Infrastructure-as-Code tooling such as Crossplane or Terraform.
You have built or managed a Kubernetes-at-scale infrastructure, ideally across multiple cloud providers, and the vital automation to support it.
You have written non-trivial programs in Go.
You have worked with containerized services (such as Docker).
You have experience in system administration with professional skills in Linux on distributed systems at scale.
You have designed, implemented or diagnosed and resolved issues with the Elastic Stack.
You have demonstrable experience in leading and improving alerting and major incident management processes, and in using metrics systems (e.g. Elastic Stack, Graphite, Prometheus, Influx) to diagnose issues and quantify impact to share with others at varying levels of the organization.
You are experienced in contributing to a self-organizing and collaborative team environment.
You have mentored, coached, and grown team members to bring out the best in them.


Referral Board

Elastic is an open source search company that powers enterprise search, observability, and security solutions built on one technology stack that can be deployed anywhere.


