Senior Chief Engineer SRE

Senior

🇮🇳 India

Site Reliability Engineer

Technology

Ansible, AWS, CI/CD, Cloud, Grafana, InfluxDB, Jenkins, Kubernetes, MySQL, NoSQL, Prometheus, Python, Shell, Spinnaker, Terraform, Time series DB, TSDB


Position Summary

Site Reliability Engineer.
Site reliability engineers are dedicated full-time to creating software that improves the reliability of systems in production, fixing issues, responding to incidents and, usually, taking on-call responsibilities. They operate systems efficiently and systematically through continuous monitoring and improvement, automation of system and service operations, and process application.

Role and Responsibilities

Building software to help operations and support teams
SRE teams are in charge of proactively building and implementing services to make IT and support better at their jobs. This can be anything from adjustments to monitoring and alerting to code changes in production. A site reliability engineer can be tasked with building a homegrown tool from scratch to help with weaknesses in software delivery or incident management.

Fixing support escalation issues
As with building software, a site reliability engineer can expect to spend time fixing support escalation cases. But as your SRE operations mature, your systems will become more reliable and you’ll see fewer critical incidents in production – leading to fewer support escalations. Because an SRE team touches so many different parts of the engineering and IT organization, its members can be a great source of knowledge and can help route issues to the right people and teams.

Optimizing on-call rotations and processes
More often than not, site reliability engineers will need to take on-call responsibilities. At most organizations, the SRE role will have a lot of say in how the team can improve system reliability by optimizing on-call processes. SRE teams will help add automation and context to alerts – leading to better real-time collaborative response from on-call responders. Additionally, site reliability engineers can update runbooks, tools and documentation to help prepare on-call teams for future incidents.

Documenting “tribal” knowledge
SRE teams gain exposure to systems in both staging and production, as well as to all technical teams. They take part in work with software development, support, IT operations and on-call duties – meaning they build up a great amount of historical knowledge over time. Instead of siloing this knowledge in the mind of one team or one person, site reliability engineers can be tasked with documenting much of what they know. Constant upkeep of documentation and runbooks can ensure that teams get the information they need right when they need it.

Conducting post-incident reviews
Without thorough post-incident reviews, you have no way to identify what’s working and what’s not. SRE teams need to keep teams honest and ensure that everyone – software developers and IT professionals alike – is conducting post-incident reviews, documenting their findings and acting on what they learn. Site reliability engineers are then often tasked with action items for building or optimizing some part of the SDLC or incident lifecycle to bolster the reliability of their service.

Skills and Qualifications

Primary skill sets (5-10 years)
• Public Cloud: AWS, Kubernetes

• Scripting: Shell, Terraform, Ansible, Python, Jenkins, Spinnaker, CI/CD

• Knowledge and understanding of installing, configuring and managing public cloud infrastructure on AWS and GCP using Terraform and Ansible

• Operate systems efficiently and systematically through continuous monitoring and improvement, automation of system and service operations, and process application.

• Experienced professional with a full understanding of specialized areas; resolves a wide range of issues in creative ways

• Works on problems of diverse scope where analyzing data requires evaluating identifiable factors. Demonstrates good judgement in selecting methods and techniques for obtaining solutions

• Normally receives little instruction on day-to-day work and receives general instructions on new assignments

• Monitor server applications and infrastructure around the clock (24 hours a day) and handle faults.

• Automate system and service operations to improve cost-effectiveness.

• Typically requires a minimum of 10 years of related experience and a Bachelor's degree, or 3 years and a Master's degree

• Good command of English

Secondary: Monitoring using Grafana, Prometheus, InfluxDB, TSDB (2-4 years)
Desired: MySQL, NoSQL, time series DB



Samsung Electronics

A tech leader in mobile technologies, consumer electronics, home appliances, and enterprise solutions.

Technology

Consumer Goods

Electronics

Home Goods

samsung.com



