Senior Site Reliability Engineer

Senior

💰$140–180K

Irvine, 🇺🇸 United States

Site Reliability Engineer

Technology

AWS Azure Bash Cloud GCP Go Java Kubernetes Node OCI Python

At the forefront of the future of connected living, TP-Link Systems Inc.'s R&D Center in Irvine, Southern California's innovation hub, spearheads research and development of next-generation networking, IoT smart home products, and software services. Our team of passionate engineers is constantly innovating, engineering solutions that transform the end-user experience with simpler, smarter, and more reliable connectivity.

We're looking for a passionate and experienced Senior Site Reliability Engineer to join our team and play a crucial role in ensuring our cloud platform's security, reliability, scalability, and operational excellence.

About Us:

Headquartered in the United States, TP-Link Systems Inc. is a global provider of reliable networking devices and smart home products, consistently ranked as the world’s top provider of Wi-Fi devices. The company is committed to delivering innovative products that enhance people’s lives through faster, more reliable connectivity. With a commitment to excellence, TP-Link serves customers in over 170 countries and continues to grow its global footprint.

We believe technology changes the world for the better! At TP-Link Systems Inc., we are committed to crafting dependable, high-performance products to connect users worldwide with the wonders of technology.

Embracing professionalism, innovation, excellence, and simplicity, we aim to assist our clients in achieving remarkable global performance and enable consumers to enjoy a seamless, effortless lifestyle.

Responsibilities:

Serve as technical SME for implementing and operating Microservices on Kubernetes cloud-based platforms.
Collaborate with the Cloud Technical Development and DevOps teams to deploy services to the Multi-Cloud Platform.
Perform Load Tests and Chaos Tests to ensure the scalability and reliability of microservices.
Build Observability for Microservices and cloud platforms like AWS, OCI, Azure, and GCP.
Write and execute disaster recovery plans in collaboration with the Development and DevOps teams.
Analyze and resolve production risks caused by insufficient resources, such as node groups, CPU, memory, HPA scheduling, JVM pre-warming, etc.
Write and maintain scripts for automation using languages like Python, Go, or Bash.
Define and maintain the KPIs (SLA/SLO/SLI) for all cloud microservices with development teams to better understand the business.
Create and maintain technical documentation, including architecture diagrams, design documents, and standard operating procedures.
Guarantee adherence to security and compliance standards, including ISO27001, SOC2, and GDPR.
Lead incident response efforts to troubleshoot and resolve production issues quickly.
Perform post-incident analysis to identify root causes and potential workarounds/solutions.
Assist with product/technology selection, including implementation of POCs.
Be flexible and open to change and evolving processes and tools.
Help mentor and train less senior members of the team.
Participate in the on-call rotation and provide support after work hours and on weekends.
Other duties as assigned

Requirements

Bachelor's degree in Computer Science, Information Technology, or a related field.
5+ years of experience as a Site Reliability Engineer.
Proficiency in programming and scripting languages like Java, Python, Bash, or PowerShell.
Hands-on experience in SRE, DevOps, cloud operations, and cloud security best practices.
Strong knowledge of security technologies, including identity and access management, network security, application security, and data protection.
Strong problem-solving and analytical skills, with the ability to work independently and as part of a team.
Experience in developing and maintaining technical documentation and implementing compliance requirements.

Additional Skills (Preferred):

Expert-level cloud certifications, such as AWS Solutions Architect Professional, Azure Solutions Architect Expert, and GCP Professional Cloud Architect.
Experience with container orchestration technologies (e.g., Kubernetes).

Benefits

Base Salary Range: $140,000 - $180,000

Competitive salary and comprehensive benefits package.
The chance to be part of a growing and innovative company.
Engaging and inclusive work culture.
The opportunity to be involved in challenging and impactful projects.

TP-Link USA Corporation

TP-Link USA is a leading provider of consumer Wi-Fi networking products

Consumer Goods

Technology

🌍 tp-link.com

Reliably Smart

🌍 linkedin.com

🏭Information Technology & Services

🎂1996

Employees: 80

Followers: 7.5K
