Senior Site Reliability Engineer

NVIDIA

Remote

Senior

🇺🇸 United States

🇨🇦 Canada

💰 Equity

Site Reliability Engineer

Technology

Site Reliability Engineering (SRE) at NVIDIA is an engineering discipline that combines software and systems engineering practices to design, build, and maintain large-scale production systems with high efficiency and availability. It is a highly specialized discipline that demands knowledge across systems, networking, coding, databases, capacity management, continuous delivery and deployment, and open-source cloud-enabling technologies such as Kubernetes and OpenStack. SRE at NVIDIA ensures that our internal- and external-facing GPU cloud services run with the maximum reliability and uptime promised to users. At the same time, it enables developers to change existing systems through careful preparation and planning, while keeping an eye on capacity, latency, and performance. SRE is also a mindset and a set of engineering approaches for running and optimizing better production systems. Much of our software development focuses on eliminating manual work through automation, performance tuning, and improving the efficiency of production systems.

As SREs are responsible for the big picture of how our systems relate to each other, we use a breadth of tools and approa...

NVIDIA

NVIDIA is a technology company that specializes in AI computing, with a focus on deep learning GPUs.

Artificial Intelligence

Technology

nvidia.com

Other jobs at NVIDIA

🇮🇱

Senior Hardware Program Manager

🇮🇱

Senior Customer Project Manager

🇨🇳

Social & Influencer Marketing Lead

🇮🇱

Senior Mechanical Engineer

🇮🇱

Senior High-Speed Optical Transceiver Practical Engineer

Similar jobs

Remote 🇨🇦 · Added 8h ago

Site Reliability Engineer

Improving healthcare through innovative technology is at the core of Intelerad’s work

Linux, CentOS, Postgres Database, AWS cloud, Python, Go, Java, C/C++, VMware Enterprise, Windows Server + 5

Remote 🇺🇸 💰 · Added 4 days ago

Senior Platform Operations Engineer

NBCUniversal - We create world-class content, which we distribute across our portfolio of film, television, and streaming, and bring to life through our theme parks and consumer experiences. (entertainment providers)

AWS, EC2, ECS, RDS, IAM, Kubernetes, CI/CD, TravisCI, Jenkins, GitLab CI + 9

Remote 🇺🇸 · Added 6 days ago

Site Reliability Engineer

PayNearMe develops award-winning technology to facilitate the end-to-end customer payment experience, making it easy for businesses to manage and accept payments.

Terraform, Kubernetes, Docker, Datadog, Python, Bash, Go, GitLab CI, AWS, GCP + 14

Remote 🇺🇸 🇨🇦 💰 👶 · Added 6 days ago

Engineer II - Site Reliability

CrowdStrike - A fast-growing security company that protects our wide range of customers from cybersecurity attacks.

Linux, C++, Java, Python, Go, SAN, NAS, NFS, Object Storage, FreeNAS + 12

Remote 🇺🇸 💰 · Added 6 days ago

Site Reliability Engineer

Together AI - A research-driven artificial intelligence company on a mission to significantly lower the cost of modern AI systems by co-designing software, hardware, algorithms, and models.

Ansible, Terraform, Kubernetes, Cloud

Remote 🇺🇸 💰 👶 · Added 7 days ago

Senior Site Reliability Engineer

RingCentral - Global leader in cloud-based communications and collaboration software

Linux, Python, Go, PHP, Perl, AWS, Kubernetes, Kafka, ELK stack, Zabbix + 16
