Support Operations Engineer

Hybrid
Mid-level
🇺🇸 United States
👶Paid parental leave
Site Reliability Engineer
 

About the role:

***Please note that this role is for the 12pm - 9pm EST shift, and the successful candidate will need to work those hours.***

About the Team

CoreWeave’s Support Operations team ensures peak performance and reliability across thousands of nodes in multiple supercomputer clusters, each with tens of thousands of GPUs. Collaborate with pioneering generative AI labs, world-renowned VFX organizations, and visionary developers and artists. These innovators leverage our cutting-edge GPU cloud infrastructure to power their mission-critical workflows and achieve unprecedented capabilities.

About the role:

As a Support Operations Engineer, you will be responsible for deploying, configuring, and maintaining CoreWeave’s GPU fleet across our growing number of data centers in the U.S., Europe, and beyond.

What You'll Do:

  • You’ll monitor our fleet’s health, performance, and reliability using our observability stack: Grafana, Prometheus, and VictoriaMetrics.
  • You’ll use CoreWeave Kubernetes to troubleshoot customer support requests and act as a technical escalation point for infrastructure issues for the Customer Success organization.
  • You’ll lear...
 

 

CoreWeave

CoreWeave is a specialized cloud provider delivering GPU compute resources for VFX, rendering, machine learning, and batch processing.

Cloud Computing
Machine Learning

🏭IT services and IT consulting
🎂2017
