About the Team/Role
Working closely with the Platform Operations Lead, the Site Reliability Engineer is responsible for building solutions to WEX Travel’s engineering and operational problems, with a focus on optimizing existing systems, building infrastructure and eliminating manual work through automation in an Agile environment.
How you’ll make an impact
- Engage in and improve the whole lifecycle of services—from inception and design through deployment, operation and refinement
- Support capacity planning, availability, scalability, security and latency considerations for new infrastructure and service provisioning as appropriate
- Scale and optimize existing infrastructure and services sustainably, using automation and other mechanisms, and evolve them by improving their reliability and efficiency
- Manage end-to-end availability and performance of mission-critical services and build automation to prevent problem recurrence
- Maintain infrastructure and services by measuring and monitoring system metrics to proactively identify opportunities for operational efficiency, potential outages and security threats in Development, UAT, Staging and Production environments
- Practice sustainable incident response and blameless postmortems
- Build infrastructure and drive projects that deliberately break things, with the aim of improving the robustness of production systems
- Use the core Site Reliability Engineering principles of change management, monitoring, emergency response, capacity planning, and production readiness reviews to run the platform
- Step back to observe patterns and develop innovative tools and automation to eliminate or minimize menial tasks, then use those learnings to drive operational best practices
- Develop and maintain solution and operational documentation and designs for all infrastructure and services within the scope of SRE
- Preserve operational visibility and response capabilities by fixing and improving our dashboards, alerts and automation
- Take part in the on-call rotation as part of the Platform Operations team supporting the WEX Travel Platform
Experience you’ll bring
- Proficiency in one or more scripting languages, such as JavaScript (Node.js), Python, PowerShell or Bash
- 2 years of experience working with public cloud platforms (Azure preferred)
- Experience managing large numbers of diverse systems with configuration management tools such as Puppet, Chef or Ansible
- Understanding of standard networking protocols and concepts, such as HTTP, DNS, TCP/IP, ICMP, the OSI model, subnetting and load-balancing strategies
- Understanding of serverless application frameworks
- Experience with containerized workloads and container management platforms such as Docker or Kubernetes
- Familiarity with distributed systems, including microservices, is a plus
- Experience with infrastructure automation tools such as CloudFormation or Terraform
- Understanding of CI/CD processes and experience with deployment automation tools such as CodePipeline, CodeDeploy, Jenkins, Bamboo
- Strong debugging, troubleshooting, and problem-solving skills
- Effective communication, collaboration & negotiation skills with the ability to interface with various business units and third parties
- Experience liaising with developers, operations staff and third-party resources
- Understanding of API integration
- JIRA & Confluence (Desirable)
- Degree in Software Engineering, Computer Science or equivalent (Desirable)
Simplifying the business of running a business.