Site Reliability Engineer

Doxel

Remote · Senior

🇺🇸 United States

🏖️ Unlimited holidays

Agile Airflow AWS Cloud Docker GCS Kubernetes Python SaaS SQL

Doxel AI is hiring a Site Reliability Engineer to join the Engineering team, focusing on the robustness and performance of our systems and processes. This role will accelerate our mission to ensure every decision on a construction site is a great one.

Construction is the second-largest industry in the world (4x the size of SaaS!). But unlike software teams, which have observability platforms such as AppDynamics and Datadog, construction teams lack automated feedback loops that help projects stay on schedule and on budget. Without this observability, construction wastes a whopping $3T per year because glitches aren't detected fast enough to recover.

Doxel AI exists to bring computer vision to construction, so the industry can deliver what society needs to thrive. From hospitals to data centers, and from foremen to VPs of construction, teams use Doxel to make better decisions every day. In fact, Doxel has contributed to the construction of the facilities that provide many of the products and services you use every day.

Our stack includes classic computer vision, deep-learning object detection, a low-latency 3D three.js web app, and a complex data pipeline powering it all in the background. We're building out new workflows, analytics dashboards, and forecasting engines. Join us in bringing AI to construction!

The Role

Doxel Engineers build the foundation for Doxel's construction insights: the behind-the-scenes technology that captures hundreds of thousands of square feet of construction activity per day, the software that ingests hundreds of gigabytes of data per site per day, and the state-of-the-art web application that renders this data for our customers in a useful, performant way.

We are looking for our first Site Reliability Engineer (SRE) to bring a holistic approach to the reliability, availability, and performance of our systems, reducing the frequency of service disruptions and increasing the customer’s trust and delight in our product. You care deeply about reliable, low-upkeep software. You bring curiosity and a can-do attitude to work, eager to dive into ambiguous problems and bring order and reliability to our systems and processes so that they can scale far beyond our current use cases.

What You'll Do

System Monitoring: Setting up and maintaining monitoring systems to track performance metrics and uptime.
Incident Management: Optimizing our incident response processes, performing root cause analysis, and ensuring quick recovery from outages.
Infrastructure Management: Designing and maintaining scalable and reliable infrastructure, including servers, databases, networks, and cloud resources.
Automation: Developing and implementing automation tools to streamline operational tasks and reduce manual intervention.
Performance Optimization: Analyzing system performance and implementing changes to improve efficiency.
Capacity Management: Managing the allocation of resources and ensuring that systems are neither under- nor over-provisioned.
Security: Ensuring that systems are secure from vulnerabilities and threats. This includes implementing security best practices, performing regular audits, and responding to security incidents.
Collaboration: Working closely with development teams to design and implement reliable systems, providing guidance on best practices for code deployment and system design to improve reliability.
Documentation: Creating and maintaining detailed documentation on systems, processes, and procedures to ensure transparency and facilitate troubleshooting.
Disaster Recovery: Developing and testing disaster recovery plans to ensure that systems can be restored quickly in the event of a major failure.

What You'll Bring to the Team

5+ years of experience with Python (preferred) or another FP language
5+ years of experience in a full-stack or back-end-focused engineering role within an agile, cloud-based environment
Experience with Kubernetes, Docker, and Airflow
Experience with Google Cloud (preferred) or AWS
Experience with SQL and deep database knowledge
Experience automating manual processes
Experience with Datadog or similar monitoring tools
Experience developing processes across people and systems and driving their adoption
Bachelor’s degree in Computer Science or other technical discipline

The base pay range for this position is $160,000 - $200,000. Pay is based on factors such as location, skill level, qualifications, competencies, and overall experience. Options may be included as an additional part of the package.

Doxel also provides comprehensive health/dental/vision benefits for employees and their families, an unlimited PTO policy, a 401(k) program, and a flexible work environment, among other benefits. Doxel is an equal opportunity employer and actively seeks diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Doxel

Doxel is an AI platform that uses cameras and image recognition to analyze construction projects and predict delays, helping teams make decisions with objective data.

Artificial Intelligence

Construction

Data Analytics

🌍 doxel.ai

🌍 linkedin.com

Other jobs at Doxel

🇩🇪 🏖️

Construction Client Manager

Remote 🇺🇸 💰 🏖️

Senior Software Engineer - Data Science

Remote 🇺🇸 💰 🏖️

Product Manager - Data and AI

Remote 🇺🇸 💰 🏖️

Senior 3D Frontend Engineer

Similar jobs

Remote 🇺🇸 💰 · Added 12h ago

Site Reliability Engineer

GitLab is a leading DevSecOps platform empowering organizations to deliver software faster and more efficiently (IT services and IT consulting)

Kubernetes, Terraform, Go, gRPC, Git, Ruby, SaaS, Cloud, Open Source

Remote 🇺🇸 💰 👶 · Added 13h ago

Site Reliability Engineer - Delivery:Deployments

GitLab is a leading DevSecOps platform empowering organizations to deliver software faster and more efficiently (IT services and IT consulting)

Git, Kubernetes, Release Processes, Deployment Strategies

Remote 🇺🇸 👶 · Added a day ago

Senior Site Reliability Engineer

Hypori is a leading provider of SaaS cybersecurity solutions for Federal and Commercial customers, including the United States Army (computer and network security)

Terraform, CI/CD, K8s, Python, Java, Go, Datadog, Grafana, New Relic, Puppet + 6

Remote 🇺🇸 · Added a day ago

Site Reliability Engineer

Osmosis Labs - Osmosis is a community-created decentralized exchange that serves the Cosmos community, bringing multi-chain DeFi to life, starting with the Cosmos interoperability layer.

SRE, Datadog, New Relic, Prometheus, Grafana, Docker, Kubernetes, Terraform, Google Cloud, Cloudflare + 5

Remote 🇺🇸 💰 👶 · Added a day ago

Senior Site Reliability Engineer

Veza Technologies, Inc. - Veza is the identity security company that provides a platform for securing access across cloud infrastructure, data systems, SaaS apps, and on-prem apps (technology, information and internet)

AWS, Kubernetes, Linux, AWS networking, VPC, GitOps, Prometheus, Grafana, Bazel, Helm + 3

Remote 🇺🇸 💰 👶 · Added 2 days ago

Staff Site Reliability Engineer

Crisis Text Line provides free, 24/7, high-quality text-based mental health support and crisis intervention by empowering a community of trained volunteers to support people in their moments of need.

AWS, Terraform, Cloud, Python, Bash, Ansible, Docker, Kubernetes, Jenkins, GitLab CI + 12