Senior Platform Observability Engineer

Hybrid
Senior
🇨🇭 Switzerland
Technology

If you are excited by the prospect of managing more than 20 TB of telemetry data per day, originating from a fleet of 10 000+ nodes (including Linux hosts, Kubernetes clusters, and VMs), this job is for you:

Senior Platform Observability Engineer

Your Mission:

We are seeking a highly skilled and experienced Senior Platform Observability Engineer to join our team. In this role, you will be responsible for ensuring the reliability, scalability, and efficiency of the core observability infrastructure that supports our engineering teams and customer-facing portal. Your work will include evolving these systems and fostering the adoption of observability best practices across the organization.

Observability Platform Operations

  • Configure, operate, and enhance our observability platforms and frameworks (Thanos, Loki, Tempo, OpenTelemetry Collector + custom processors).
  • Continuously improve observability best practices and drive their organization-wide adoption, ensuring comprehensive monitoring, logging, and tracing.
  • Develop and maintain automated solutions for monitoring, alerting, and incident response.

System Optimization

  • Collaborate with engineering teams to understand their needs and provide robust, scalable solutions utilizing the observability platform.
  • Optimize system performance and ensure high availability through proactive monitoring and maintenance.
  • Develop and implement strategies for cost optimization, capacity planning, and performance tuning.

Innovation and Improvement

  • Stay up to date with the latest industry trends, tools, and technologies to drive continuous improvement.

  • Experiment with and implement new tools, especially around observability and telemetry, to enhance platform capabilities.

  • Evaluate and integrate OpenTelemetry Collector where beneficial to enhance telemetry data collection and analysis.

Your Qualifications:

  • Bachelor’s degree in Computer Science, Information Technology, or related field (or equivalent experience).

  • 5+ years of experience in platform engineering, site reliability engineering, or a related role.

  • Coding knowledge in Go (Golang).

  • Demonstrated experience in managing large-scale infrastructures and observability stacks (such as Thanos, Mimir, Cortex, Tempo, Loki, ClickHouse), including their configuration, operation, and improvement.

  • Deep knowledge of Kubernetes architecture and hands-on cluster management.

  • Proficient in writing and maintaining Helm charts for Kubernetes resource management.

  • Experienced in GitOps practices, including version control and CI/CD pipelines.

  • Expertise in Docker containerization, orchestration, and optimization.

  • Skilled in Linux system administration, scripting, and automation.

You are a quick learner who easily adapts to new concepts and technologies, with strong communication skills that allow you to effectively convey complex ideas to both technical and non-technical stakeholders. You maintain a keen focus on customer needs, ensuring platform operations support both internal engineering teams and external users. Additionally, you thrive in collaborative environments, contributing to continuous improvement and innovation.

What We Offer

Want to join a team that enjoys making secure connectivity simple for our customers? You’ll be among people who believe in:

Caring PASSIONATELY about keeping our customers safe – We’re dedicated to solving problems. Whatever it takes.

Thinking UNCONVENTIONALLY to stay ahead – The world never fails to surprise us. So let’s surprise it first.

Doing the hard work to make things SIMPLE – Craft and hone something that delights in its simplicity.

Working COLLABORATIVELY to build success – The power of the team will always make us faster and better.

As a testament to this, Open Systems has been recognized as an outstanding place to work. You’ll be surrounded by smart teams who enrich your experience and provide opportunities you will need to develop your skills and advance your career.

We look forward to receiving your online application (please note that you need to compress your application into two attachments). Only direct applications will be considered.

Backed by the Service Experience Promise, Open Systems simply and cost-effectively connects and secures hybrid environments and thus ensures your organization can meet business objectives. Open Systems uniquely focuses on a superior user experience when helping organizations reduce risk, improve efficiency, and accelerate innovation. The Open Systems SASE Experience delivers on the promise of ZTNA with a comprehensive, unified, easy-to-implement, and easy-to-use SASE platform that combines SD-WAN and Security Service Edge delivered as a Service. We provide 24x7 operational management and engineering support from assigned engineering teams and ensure affordable and predictable costs.

Discover more at open-systems.com.

Open Systems AG

Open Systems delivers cybersecurity beyond expectations by partnering with organizations to boost the security performance of their digital transformations.
