Site Reliability Engineer Technical Lead

Nethermind

Remote, Senior

🇪🇺 Europe

Site Reliability Engineer

Technology

ArgoCD, AWS, CI/CD, Cloud, Enterprise, GCP, Git, GitHub Actions, Go, Grafana, Kubernetes, Loki, Prometheus, Python, Web3

What are we all about?

We are a team of builders and researchers on a mission to empower enterprises and developers worldwide to access and build on decentralized systems.

Our expertise covers several domains: Ethereum and Starknet protocol engineering, layer-2, cryptography research, protocol research, decentralized finance (DeFi), security auditing, formal verification, real-time monitoring, smart contract development, and dapps and enterprise engineering.

Working to solve some of the most challenging problems in the blockchain space, we frequently collaborate with renowned organizations such as the Ethereum Foundation, Starknet Foundation, Gnosis Chain, Flashbots, Forta Protocol, Lido, EigenLayer, OpenZeppelin, RISC Zero, Aleph Zero, and many more.

Today, we are a 350+ strong team working remotely across 66+ countries.

View all our open positions here: https://www.nethermind.io/open-roles

Are you the one?

We're seeking an experienced Site Reliability Engineer to lead and mentor our SRE team. You're a seasoned professional with a proven track record in designing and implementing robust SRE processes at scale. You excel in cloud and hybrid environments, have a deep understanding of containerization, and are passionate about creating resilient, high-performance systems that can handle extreme traffic peaks. Beyond technical expertise, you're a skilled communicator and collaborator, able to bridge the gap between technical teams and stakeholders. You thrive in cross-functional environments and can effectively represent SRE concerns at the leadership level.

Responsibilities:

Lead the implementation and refinement of SRE practices across the organization, including SLOs, error budgets, and blameless postmortems
Design and implement automation to eliminate toil and improve system reliability and efficiency
Lead initiatives and architect scalable hybrid cloud solutions for Web3 infrastructure
Manage error budgets and make data-driven decisions about when to prioritize reliability vs. new features
Drive SRE practices to ensure high availability, performance, and reliability under varying load conditions
Collaborate closely with the Platform Engineering team to build reliability into services from the ground up
Collaborate closely with Nethermind’s Infrastructure Leadership department to align SRE strategies with overall technical vision
Drive the adoption of observability best practices and implement comprehensive monitoring systems
Develop and maintain service level indicators (SLIs) and objectives (SLOs), working with product owners to define appropriate reliability targets
Mentor team members in SRE practices and foster a culture of continuous learning
Lead capacity planning efforts, using quantitative analysis to predict and address future scaling challenges
Contribute to long-term technical roadmaps, balancing reliability concerns with product innovation

Skills:

5+ years of experience in Site Reliability Engineering or DevOps
Expert knowledge of cloud platforms (AWS, GCP)
Expert knowledge of Kubernetes
Proven experience in designing and implementing scalable, efficient, resilient systems
Deep understanding of Linux/Unix systems and networking protocols
Strong programming skills in Python or Go
Strong background in monitoring, observability, and logging systems (e.g., Grafana, Prometheus, Loki)
Expertise in CI/CD tools (e.g., GitHub Actions, ArgoCD)
Excellent communication skills, both written and verbal, with the ability to explain complex technical concepts to various audiences
Experience in producing technical documentation, runbooks, presentations, and post-mortem reports
Experience in, and passion for, mentoring and upskilling team members

Nice to have:

Experience leading technical teams
Contributions to open-source projects or thought leadership in SRE
Familiarity with MLOps and big data technologies
Knowledge of blockchain technology and infrastructure
Experience with chaos engineering principles and tools
Familiarity with traffic management and CDN technologies
Systems or backend engineering background

Disclaimer: I hereby consent to my personal information being stored and processed by Demerzel Solutions Limited (t/a Nethermind) (the “Company”) for recruitment purposes in relation to both the selected job role and any other role the Company considers me a qualified candidate for. All data storing and processing by the Company takes place in accordance with the UK GDPR. Kindly refer to our privacy policy for more details.
Your consent to share personal information is entirely voluntary, and you may withdraw your consent at any time. Should you have any questions about this process, or wish to withdraw your consent, please contact: legalnotices@nethermind.io

Nethermind

Blockchain

Research and Development (R&D)

Technology

Website: nethermind.io

Industry: IT services and IT consulting

Founded: 2017
