Staff Site Reliability Engineer

Hybrid

Senior

🇮🇳 India

Site Reliability Engineer

Technology

What you get to do in this role:

For Early Engagement – Thoroughly assess new services and products, ensuring a seamless transition to cloud production while upholding the appropriate reliability standards to meet our Service Level Objectives (SLOs) and Service Level Agreements (SLAs).
For Early Engagement – Contribute to the development of systems that provide real-time insight into their own operation while running.
For RCA/PRB – Work with engineering stakeholders on Root Cause Analysis and critical problems.
For Cloud Improvements – Use knowledge and experience in software development, application support, systems engineering, and networking to drive Cloud improvements.

To be successful in this role you have:

Knowledge of Linux systems.
Comfortable designing, authoring, testing, and debugging code in a team setting in one of the following languages: Python, Go, JavaScript, or Ruby.
Experience working with relational databases: MySQL, MariaDB, or PostgreSQL.
Proficient in managing large-scale systems, with a strong focus on automating processes, enhancing observability, ensuring high availability, and optimising performance for critical services.
Expertise in Observability and Monitoring of applications, services, and networks.
Experience with DevOps automation, CI/CD pipelines such as GitLab CI/CD, and agile methodologies.
Experience working with Cloud technologies such as Azure and AWS.
Experience in configuration management of infrastructure using Ansible.
Experience with Kubernetes to orchestrate the deployment, scaling, and management of containers.
Knowledge of core AI/ML techniques and algorithms.
Familiarity with implementing chaos engineering principles.
Experience with incident response processes, post-mortem practices, or service best-practice standards.
Ability to review PCRs and suggest (or develop) additional preventive measures.
Self-motivation to work out how things are architected and how they work, with little or no prior knowledge, through code and configuration deep dives.
Ability to build/develop ad-hoc tools/scripts to work around issues while waiting for upstream fixes.
ServiceNow platform knowledge is an added advantage.

ServiceNow

At ServiceNow, our technology makes the world work for everyone, and our people make it possible.

Artificial Intelligence

Software

🏭software development
