Senior Kubernetes Operations Engineer

Remote
Senior
🇪🇺 Europe
💰Equity

What You’ll Do

  • Remotely install, upgrade, operate and maintain bare-metal Kubernetes clusters (up to thousands of nodes each)
  • Handle cluster degradation, recovery and resizing using our fleet management tooling
  • Perform out-of-hours on-call response for critical incidents as part of a well-balanced on-call rotation
  • Improve our tooling, automation, and processes for daily operations, alerting, and incident response
  • Dive into systems at a low level to solve unique cluster problems and write up your findings
  • Assist customers with high-level Kubernetes questions and integration with applications, storage and authentication
  • Assist with initial cluster build-outs and validation to help identify failed hardware before customer delivery
  • Work closely with our HPC Ops and Datacenter Ops teams on issues that require lower-level expertise or cross-functional solutions
  • Mentor and assist less-experienced team members
  • Have a voice in our product direction and help us think about how to minimize operational costs and complexity

You

  • Are an experienced operations engineer, SRE, sysadmin or similar with deep knowledge of running Linux clusters and systems
  • Are very familiar with running on bare metal (including knowledge of BMCs, kernel drivers, PXE, RAID, VLANs, hypervisors)
  • Have a good understanding of containers, virtualisation, and the mechanisms underpinning them
  • Have a good understanding of daily operation, bug-fixing and maintenance of Kubernetes
  • Have experience in an on-call environment and with incident response
  • Can perform incident post-mortems and develop procedures and tooling to prevent root causes from recurring
  • Have an excellent ability to learn on the fly and adapt to solve problems
  • Are able to work either independently with limited direction, or as part of a team
  • Are able to work with customers during incidents via tickets, live messaging, or as part of a larger call

Nice to Have

  • Deep Kubernetes experience
  • Experience with user-level restrictions and hardening (e.g. AppArmor)
  • Experience with network engineering
  • Experience with HPC clusters, environments & tooling
  • Experience with large-scale AI/ML training clusters
  • Experience with machine learning/AI frameworks
  • A passion for running your own bare-metal lab

Salary Range Information

Based on market data and other factors, the salary range for this position is approximately €157,170 - €225,990. However, a salary higher or lower than this range may be appropriate for a candidate whose qualifications differ meaningfully from those listed in the job description.

 

Lambda

Lambda is a leading provider of GPU cloud and on-prem systems for deep learning and AI research and engineering.

Artificial Intelligence
Cloud Computing
Technology

🏭software development
🎂2012
