Site Reliability Engineer

Remote
Senior
🇺🇸 United States
🏖️ Unlimited holidays

Doxel AI is hiring a Site Reliability Engineer to join the Engineering team to focus on the robustness and performance of our systems and processes. This role will accelerate our mission to ensure every decision on a construction site is a great one.

Construction is the second-largest industry in the world (4x the size of SaaS!). But unlike software teams, who have observability platforms such as AppDynamics and Datadog, construction teams lack automated feedback loops to help projects stay on schedule and on budget. Without this observability, construction wastes a whopping $3T per year because glitches aren’t detected fast enough to recover from.

Doxel AI exists to bring computer vision to construction, so the industry can deliver what society needs to thrive. From hospitals to data centers, from foremen to VPs of construction, teams use Doxel to make better decisions every day. In fact, Doxel has contributed to the construction of the facilities that provide many of the products and services you use every day.

We have classic computer vision, deep learning object detection, a low-latency 3D three.js web app, and a complex data pipeline powering it all in the background. We’re building out new workflows, analytics dashboards, and forecasting engines. Join us in bringing AI to construction!

The Role

Doxel engineers build the foundation for Doxel's construction insights, including the behind-the-scenes technology that snapshots hundreds of thousands of square feet of construction activity per day, the software that ingests hundreds of gigabytes of data per site per day, and the state-of-the-art web application that renders this data in a useful, performant manner for our customers.

We are looking for our first Site Reliability Engineer (SRE) to bring a holistic approach to the reliability, availability, and performance of our systems, reducing the frequency of service disruptions and increasing the customer’s trust and delight in our product. You care deeply about reliable, low-upkeep software. You bring curiosity and a can-do attitude to work, eager to dive into ambiguous problems and bring order and reliability to our systems and processes so that they can scale far beyond our current use cases.

What You'll Do

  • System Monitoring: Setting up and maintaining monitoring systems to track performance metrics and uptime.
  • Incident Management: Optimizing our incident response processes, performing root cause analysis, and ensuring quick recovery from outages.
  • Infrastructure Management: Designing and maintaining scalable and reliable infrastructure, including servers, databases, networks, and cloud resources.
  • Automation: Developing and implementing automation tools to streamline operational tasks and reduce manual intervention.
  • Performance Optimization: Analyzing system performance and implementing changes to improve efficiency.
  • Capacity Management: Managing the allocation of resources and ensuring that systems are neither under- nor over-provisioned.
  • Security: Ensuring that systems are secure from vulnerabilities and threats. This includes implementing security best practices, performing regular audits, and responding to security incidents.
  • Collaboration: Working closely with development teams to design and implement reliable systems, providing guidance on best practices for code deployment and system design to improve reliability.
  • Documentation: Creating and maintaining detailed documentation on systems, processes, and procedures to ensure transparency and facilitate troubleshooting.
  • Disaster Recovery: Developing and testing disaster recovery plans to ensure that systems can be restored quickly in the event of a major failure.

What You'll Bring to the Team

  • 5+ years of experience with Python (preferred) or another FP language
  • 5+ years of experience working in a full-stack or back-end-focused engineering role within an agile, cloud-based environment
  • Experience with Kubernetes, Docker, and Airflow
  • Experience with GCP (preferred) or AWS
  • Experience with SQL and deep database knowledge
  • Experience with automating manual processes
  • Experience with Datadog or similar monitoring tools
  • Experience developing processes with people and systems and fostering adoption of those processes
  • Bachelor’s degree in Computer Science or other technical discipline

The base pay range for this position is $160,000 - $200,000. Pay is based on factors such as location, skill level, qualifications, competencies, and overall experience. Options may be included as an additional part of the package.

Doxel also provides comprehensive health, dental, and vision benefits for employees and their families, an unlimited PTO policy, a 401(k) program, and a flexible work environment, among other benefits. Doxel is an equal opportunity employer and actively seeks diversity at our company. We do not discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status.

Doxel

Doxel is an AI platform that uses cameras and image recognition to analyze construction projects and predict delays, helping teams make decisions with objective data.

Artificial Intelligence
Construction
Data Analytics