Senior Infrastructure Site Reliability Engineer

Crisis Text Line

Remote, Senior

🇺🇸 United States

👶Paid parental leave

💰Equity

Site Reliability Engineer

Technology

Amazon ECS, AWS, AWS Fargate, Bash, Cloud, CloudWatch, Kubernetes, PowerShell, Python, Terraform

Why you should join our team:

Our work is transforming the way people in pain access support at their fingertips

Our work is innovative in the crisis response space

Our dynamic, fun, and diverse culture

Our meaningful cause, led by empathy and innovation

Our strong values at the center of all we do

Our commitment to diversity, equity and inclusion

Our commitment to engagement and belonging

Our commitment to our employees and their holistic wellbeing

Our value of work/life balance

Our growth mindset and prioritization of professional development

Our leaders who truly care

What you'll be doing:

At Crisis Text Line, the engineering, product, and design teams are commonly referred to as Build. The vision of the team is to:

Deliver the most trusted, innovative, and easy-to-use Crisis Care Platform in the industry and drive unprecedented levels of growth for people in need worldwide.
Ensure that every user feels a sense of community on our platform, allowing us to build trust and grow our impact.
Allow our volunteers to spend all their time supporting people in need, in an environment with few constraints and minimal time spent searching for information or resources.
Provide a services/API-first architecture that is based on federated sources of data and infuses predictive insights (ML and otherwise) into every aspect of our Platform and Experience.

Role:
As a Site Reliability Engineer (SRE) at Crisis Text Line, you will be responsible for designing, implementing, and maintaining our cloud infrastructure to ensure optimal performance, availability, and security. You will work closely with our engineering and operations teams to streamline our deployment processes, enhance our monitoring and alerting systems, and drive continuous improvements to our platform reliability. This role offers an exciting opportunity to leverage your expertise in AWS Fargate, CloudWatch alerting, and monitoring to support our mission-critical applications and services.

Responsibilities:

Lead and maintain highly available, scalable, and secure infrastructure on AWS Fargate.
Design and maintain CloudWatch alerting and monitoring configurations to proactively identify and resolve potential issues.
Mentor and guide junior team members, sharing best practices and promoting a culture of excellence.
Collaborate with cross-functional teams to define and implement best practices for infrastructure as code (IaC), continuous integration/continuous deployment (CI/CD), and site reliability engineering (SRE) methodologies.
Lead incident response and resolution, including troubleshooting complex system issues and implementing preventive measures to minimize downtime.
Automate repetitive tasks and processes to improve operational efficiency and reduce manual intervention.
Conduct performance tuning and optimization of infrastructure components to ensure optimal resource utilization and cost efficiency.
Stay up to date with emerging technologies and industry trends to drive innovation and continuous improvement.

Qualifications:

Bachelor's degree in Computer Science, Engineering, or a related field (Master's degree preferred), or equivalent experience.
Experience in site reliability engineering (SRE) or related roles, with a focus on cloud infrastructure management.
Hands-on experience with AWS services, particularly AWS Fargate, CloudWatch, and related tools.
Proficiency with infrastructure as code (IaC) tools such as Terraform or CloudFormation.
Strong scripting and automation skills using languages such as Python, Bash, or PowerShell.
Experience with container orchestration platforms such as Kubernetes or Amazon ECS.
Solid understanding of networking concepts, security best practices, and DevOps principles.
Strong problem-solving skills and the ability to work effectively in a fast-paced, collaborative environment.
AWS certifications (e.g., AWS Certified Solutions Architect, AWS Certified DevOps Engineer) are a plus.

Reliable High-Speed Internet Required: Must have a stable high-speed internet connection to support seamless remote collaboration, virtual meetings, and other online job tasks.

The full salary range for this position, across all United States geographies, is $107,000-$162,000 per year. The upper portion of the salary range is typically reserved for existing employees who demonstrate strong performance over time. Starting salary will vary by location, qualifications, and prior experience; during the interview process, candidates will learn the starting salary range applicable for their location. We pay competitively in the tech-forward nonprofit space and offer a robust benefits package.

Only candidates in the following states will be eligible for employment: CA, CO, CT, FL, GA, HI, IL, IN, IA, MD, MA, MI, MN, MO, NJ, NM, NY, NC, OH, PA, TN, TX, UT, VA, WA.

Crisis Text Line

Crisis Text Line provides free, 24/7, high-quality text-based mental health support and crisis intervention by empowering a community of trained volunteers to support people in their moments of need.

🏥Good health and wellbeing

⚖️Peace and justice

Mental Health

Nonprofits

Counseling

Healthcare

CSR (Corporate Social Responsibility)

crisistextline.org
