Senior Site Reliability Engineer

Hybrid
Senior
🇺🇸 United States
Site Reliability Engineer
Technology

Material Bank is a fast-paced, high-growth technology company that created the world's largest material marketplace for the architecture and design industry, providing the fastest and most powerful way to start and manage a design project. Learn more about us at www.materialbank.com or see below.

--

SREs focus on system-level optimization (covering operating systems, storage, and networking) and implement best practices to ensure high availability, reliability, and scalability. You’ll work on diverse challenges, ranging from algorithms to distributed systems.

Key Responsibilities:

  • Incident Response: Participate in an on-call rotation using OpsGenie to handle incidents that could impact Material Bank's availability. Support service engineers in addressing customer issues and proactively work to prevent future incidents.
  • Infrastructure Management: Operate and manage infrastructure with tools like Terraform, GitHub/CodePipeline CI/CD, and Kubernetes. Design, build, and maintain core infrastructure to support the growth and scalability of Material Bank, accommodating hundreds of thousands of users.
  • Monitoring & Automation: Develop monitoring systems that focus on early detection of potential issues. Document processes thoroughly, turning findings into repeatable actions and automated solutions.
  • Process Improvement: Continuously refine operational processes, such as deployments and upgrades, to ensure they are efficient and predictable.
  • System Debugging: Troubleshoot and resolve production issues across various services and infrastructure layers.
  • Scalability Planning: Strategically plan the expansion and enhancement of Material Bank's infrastructure.

Potential Projects:

  • Automating infrastructure management using Terraform.
  • Enhancing existing monitoring metrics or creating new ones with New Relic.
  • Collaborating with release managers to deploy and troubleshoot new versions of Materialbank.com.
  • Planning and executing infrastructure deployments and improvements on AWS.
  • Partnering with development teams to establish SLAs, share performance data, and improve reliability through SLOs and error budgets.

Execution & Organization:

  • Plan and manage projects using Agile methodologies, driving progress through epics and issues.
  • Organize workloads effectively and lead initiatives related to OKRs.
  • Operate autonomously, self-organizing and reporting asynchronously as needed.

Collaboration & Communication:

  • Lead and contribute to the design and scope of issues, epics, and OKRs.
  • Contribute to and maintain documentation, including runbooks, general documentation, and blog posts.
  • Conduct Root Cause Analysis (RCA) investigations and perform readiness reviews.
  • Enhance team practices through code reviews, collaborative work handoffs, and incident management.

Influence & Leadership:

  • Participate actively in the hiring process, including contributing to candidate assessments, interviews, and evaluations.
  • Share knowledge, mentor team members, and foster a culture of continuous learning and collaboration.
  • Demonstrate self-awareness, manage team conflicts constructively, and give and receive feedback effectively.
  • Build and maintain positive relationships with other engineering teams at Material Bank, contributing to overall product improvement.
  • Take initiative and responsibility, stepping in to address challenges proactively and offering constructive feedback.

What you'll bring: Preferred experience

  • 5+ years of hands-on SRE experience with cloud platforms and Linux systems.
  • Experience implementing infrastructure as code with Terraform and GitHub/CodePipeline, running containerized environments on Kubernetes/ECS, and using cloud technologies to achieve operational goals.
  • Experience managing and troubleshooting operating systems, storage solutions, and networking (VPCs, proxies, and CDNs), and administering high-availability Aurora MySQL and Redis clusters.
  • Experience implementing monitoring and instrumentation with New Relic and log management systems, and integrating them with Slack and OpsGenie.
  • A focus on engineering best practices, including availability, reliability, scalability, and disaster recovery.
  • Ability to work across programming languages such as Shell, Go, and/or Python.

What you’ll get from us:

  • Our People: If you thrive in an inclusive, innovative, and fast-paced organization, look no further! You will work alongside some of the brightest minds in a genuinely fun and supportive workplace, where we keep our employees engaged through internal communication and corporate events.
  • Relaxation and Celebrations: Generous PTO, Sick Days, Paid National Holidays, and even more (ask us about this when we connect).
  • Health Benefits: We contribute to your medical, dental, vision, and short-term/long-term disability plans and have a strong employee assistance program.
  • Plan for Your Retirement: 401(k) eligible after your first 90 days of employment!
  • Giving Back: We sponsor multiple events throughout the year to help out our communities. You will receive time off to give back as well.
  • Growth: We’ll help you take your career to the next level. We want you to be creative and take initiative, which will allow you to grow and create within the company. Most importantly, be the best at what matters!
  • Flexible Work Schedules: With business units and employees across the globe, Material Technologies has embraced a hybrid working model allowing department leaders to decide on the best approach for their respective teams, whether that be remote, in person, or a little of both.

About Material Bank

Material Bank is the world’s largest material marketplace for the architecture and design industry, providing the fastest and most powerful way to search and sample materials. Material Bank connects design professionals to hundreds of manufacturers by facilitating brand discovery, rep engagement, and material sampling.

Material Bank has transformed the way an entire industry discovers and samples materials. By removing the friction that exists in the process, we drive business between architects and designers (members) and our Brand Partners (clients).

Our powerful material database and proprietary robotic distribution facility allow members to order samples until midnight (ET) to be delivered free of charge anywhere in the US, in one box, by 10:30 AM the next morning.

Connect with us and discover your career at Material Bank.

--

Material Bank is proud to be an equal opportunity employer. We value diversity, and all applicants will be considered for employment without attention to race, color, religion, sex, sexual orientation, gender identity, age, national origin, veteran or disability status or other status protected under any applicable federal, state or local law.

 

