Principal Site Reliability Engineer

Senior

🇺🇸 United States

Site Reliability Engineer

Technology

Ansible, AWS, CI/CD, Cloud, GitHub Actions, Machine learning, SaaS, Terraform

About Us

Terminal builds software that digitizes, indexes, and automates the yard, leveraging best-in-class machine learning. Our platform provides warehouse operators with the intelligence needed to optimize their usage of trucks, trailers, chassis, containers, and personnel. These are the fundamental operating assets of commerce, and they represent the last great frontier of untapped data. In the process, Terminal will address many industry-wide pain points, including compliance, manual processes, equipment location, phantom costs, and labor inefficiencies. Ultimately, Terminal will become the central nervous system for the yard, seamlessly connecting all data sources to support an extensive range of essential functions.

Overview

Our world-class vision engineering team has built an engine that can process the movement of trucks and containers in real time. It’s now time to unlock the potential of that engine by building SaaS applications that use it to transform the logistics industry. We’re hiring the team of engineers that will architect and build these applications from the ground up.

We are seeking an experienced Principal Site Reliability Engineer with a minimum of 12 years of relevant experience to join our team. As a member of our Engineering team, you will play a pivotal role in architecting and developing cutting-edge solutions. The ideal candidate has expertise in AWS, proficiency in operations, and experience running software at scale. They will have a deep understanding of event-driven technologies, hands-on experience with modern data stores, a commitment to implementing observability, and a passion for operational excellence. They will also take ownership of production quality, reliability, and security.

Responsibilities

Design, build, and operate infrastructure using Infrastructure as Code (IaC) tools like Terraform and Ansible. Develop and maintain infrastructure automation to ensure scalability and reliability.
Define and implement best practices for continuous deployment of software and services using CI/CD tools such as GitHub Actions. Automate deployment processes to streamline operations.
Collaborate with cross-functional teams to establish and enforce best practices for system reliability. Utilize service-level objectives (SLOs), error budgets, and other reliability metrics to measure, monitor, and enhance system performance.
Develop automation to eliminate operational toil and reduce overhead for managing and deploying production systems. Enhance observability and monitoring to proactively identify and address issues.
Lead incident response efforts, including diagnosis, resolution, and post-mortem analysis. Implement robust monitoring and alerting systems to ensure quick detection and resolution of issues.
Monitor system performance and capacity, identifying and implementing improvements to ensure high availability and reliability of services.
Ensure that systems adhere to security best practices and regulatory compliance requirements. Implement security measures and conduct regular audits to safeguard production environments.
Stay current with emerging technologies and industry trends. Contribute to the continuous evolution of our technology stack, adapting to new challenges and opportunities.

Requirements

Minimum of 12 years of experience in Site Reliability Engineering or a related role, with a proven track record of managing complex production environments.
Strong background in operating systems, networking, distributed systems, and database management. Expertise in AWS cloud services and infrastructure management.
Demonstrated experience in incident response, production monitoring, and capacity planning. Ability to handle high-pressure situations and ensure system reliability.
Proficiency in automating infrastructure and deployment processes using tools like Terraform, Ansible, and CI/CD pipelines.
Excellent problem-solving and analytical abilities, with the capability to diagnose and resolve complex issues in production environments.
Strong communication skills, with the ability to convey technical concepts clearly to both technical and non-technical stakeholders.
Proven ability to work collaboratively with cross-functional teams, including engineering, product, and operations teams.
Experience implementing security best practices and conducting security audits to ensure compliance and protect production systems.
Comfort with a fast-paced, dynamic startup environment. Ability to quickly learn and adapt to new technologies and methodologies.

What We Offer

Joining the Terminal team means being part of a dynamic, innovative environment where your work directly impacts the future of logistics and the global supply chain. You will work closely with a team of experts passionate about operational excellence and technological innovation. We offer competitive salaries, a comprehensive benefits package, and opportunities for professional growth.

Terminal Industries

Artificial Intelligence

Logistics

Software

Supply Chain

Technology

🌍 terminal-industries.com
