
Senior Infrastructure Site Reliability Engineer

Remote
Senior
🇺🇸 United States
👶 Paid parental leave
💰 Equity
Site Reliability Engineer
Technology

Why you should join our team:

Our work is transforming the way people in pain access support at their fingertips

Our work is innovative in the crisis response space

Our dynamic, fun, and diverse culture

Our meaningful cause, led by empathy and innovation

Our strong values at the center of all we do

Our commitment to diversity, equity and inclusion

Our commitment to engagement and belonging

Our commitment to our employees and their holistic wellbeing

Our value of work/life balance

Our growth mindset and prioritization of professional development

Our leaders who truly care

What you'll be doing:

At Crisis Text Line, the engineering, product, and design teams are commonly referred to as Build. The vision of the team is to:

  • Deliver the most trusted, innovative, and easy-to-use Crisis Care Platform in the industry and drive unprecedented levels of growth for people in need worldwide.
  • Ensure that every user feels a sense of community on our platform, allowing us to build trust, and grow our impact.
  • Allow our volunteers to spend all their time supporting people in need, in an environment with few constraints and minimal time spent searching for supporting information or resources.
  • Provide a services/API-first architecture that is based on federated sources of data and infuses predictive insights (ML and otherwise) into every aspect of our Platform and Experience.

Role:
As a Site Reliability Engineer (SRE) at Crisis Text Line, you will be responsible for designing, implementing, and maintaining our cloud infrastructure to ensure optimal performance, availability, and security. You will work closely with our engineering and operations teams to streamline our deployment processes, enhance our monitoring and alerting systems, and drive continuous improvements to our platform reliability. This role offers an exciting opportunity to leverage your expertise in AWS Fargate, CloudWatch alerting, and monitoring to support our mission-critical applications and services.

Responsibilities:

  • Lead and maintain highly available, scalable, and secure infrastructure on AWS Fargate.
  • Design and maintain CloudWatch alerting and monitoring configurations to proactively identify and resolve potential issues.
  • Mentor and guide junior team members, sharing best practices and promoting a culture of excellence.
  • Collaborate with cross-functional teams to define and implement best practices for infrastructure as code (IaC), continuous integration/continuous deployment (CI/CD), and site reliability engineering (SRE) methodologies.
  • Lead incident response and resolution, including troubleshooting complex system issues and implementing preventive measures to minimize downtime.
  • Automate repetitive tasks and processes to improve operational efficiency and reduce manual intervention.
  • Conduct performance tuning and optimization of infrastructure components to ensure optimal resource utilization and cost efficiency.
  • Stay up-to-date with emerging technologies and industry trends to drive innovation and continuous improvement.

Qualifications:

  • Bachelor's degree in Computer Science, Engineering, or related field (Master's degree preferred) or equivalent experience.
  • Experience in site reliability engineering (SRE) or related roles, with a focus on cloud infrastructure management.
  • Hands-on experience with AWS services, particularly AWS Fargate, CloudWatch, and related tools.
  • Proficiency with infrastructure as code (IaC) tools such as Terraform or CloudFormation.
  • Strong scripting and automation skills using languages such as Python, Bash, or PowerShell.
  • Experience with container orchestration platforms such as Kubernetes or Amazon ECS.
  • Solid understanding of networking concepts, security best practices, and DevOps principles.
  • Strong problem-solving skills and the ability to work effectively in a fast-paced, collaborative environment.
  • AWS certifications (e.g., AWS Certified Solutions Architect, AWS Certified DevOps Engineer) are a plus.

Reliable High-Speed Internet Required: Must have a stable high-speed internet connection to support seamless remote collaboration, virtual meetings, online job tasks, etc.

The full salary range for this position, across all United States geographies, is $107,000-$162,000 per year. The upper portion of the salary range is typically reserved for existing employees who demonstrate strong performance over time. Starting salary will vary by location, qualifications, and prior experience; during the interview process, candidates will learn the starting salary range applicable for their location. We pay competitively in the tech-forward nonprofit space and offer a robust benefits package.

Only candidates in the following states will be eligible for employment: CA, CO, CT, FL, GA, HI, IL, IN, IA, MD, MA, MI, MN, MO, NJ, NM, NY, NC, OH, PA, TN, TX, UT, VA, WA.

Crisis Text Line

Crisis Text Line provides free, 24/7, high-quality text-based mental health support and crisis intervention by empowering a community of trained volunteers to support people in their moments of need.

🏥 Good health and wellbeing
⚖️ Peace and justice
Mental Health
Nonprofits
Counseling
Healthcare
CSR (Corporate Social Responsibility)
