- Design, build, and maintain our infrastructure and tools to enable highly reliable and scalable deployment of services and applications, spanning both cloud-based and on-premise solutions
- Implement comprehensive monitoring and observability frameworks to detect and resolve issues proactively, using tools like Prometheus, Grafana, and Zabbix for system health and performance metrics
- Develop and manage incident response protocols, including on-call rotations, incident analysis, and postmortems, to drive continuous improvement in system reliability and performance
- Automate infrastructure and workflows using Infrastructure as Code (IaC) tools such as Ansible
- Optimize system performance through regular performance tuning, capacity planning, and reliability experiments that identify and mitigate potential points of failure
- Collaborate with development teams to advocate for reliable and scalable practices throughout the software development life cycle, and assist in the design and review of new systems and major changes
- 5+ years of experience in IT with a focus on system administration and automation
- Expertise in Linux system administration and in Infrastructure as Code tools such as Ansible
- Strong knowledge of scripting and programming in Bash and Python
- Experience with containerization technologies (Docker) and orchestration tools (e.g., Docker Swarm or Kubernetes)
- Experience running demanding Java applications in production, including an understanding of the JVM and Java memory management
- Hands-on data center experience, from cabling and server racking up to and including data center design
- Strong analytical and problem-solving skills, with experience troubleshooting complex issues detected and diagnosed with the help of monitoring tools
- Effective communication and collaboration abilities, essential for working across teams and with stakeholders
- Fluent in English and German
- Scale-up company with a market-leading product
- Open culture with diverse international teams
- Flexible working hours
- State-of-the-art equipment
- Personal development support, e.g. access to the learning platform Udemy
- Regular feedback rounds
- Remote within Germany
![FactFinder](https://assets.cdn.personio.de/logos/72772/social/fee2984dc04ecb120ede9204411a004e.png)
FactFinder
A company revolutionizing product discovery tools for people worldwide.