
Empowering IT professionals with powerful management tools
SolarWinds Inc. is a leading provider of IT management software, headquartered in Austin, Texas. The company offers a range of products including network performance monitoring, systems management, and IT security solutions, serving over 300,000 customers worldwide, including major organizations lik...
Employees enjoy competitive salaries, stock options, generous PTO policies, remote work flexibility, and comprehensive health benefits.
SolarWinds fosters a culture focused on customer success and product excellence, with a strong emphasis on engineering and innovation in IT management...

SolarWinds • Bangalore, India
SolarWinds is seeking a Senior Staff Site Reliability Engineer to lead reliability strategy and architecture for its Observability Platform. You'll work with ClickHouse, Kubernetes, and cloud services such as AWS and Azure. This role requires deep expertise in large-scale SaaS infrastructure.
You have extensive experience in site reliability engineering, with a strong focus on maintaining and optimizing large-scale SaaS infrastructures. Your expertise in ClickHouse and Kubernetes allows you to manage production clusters effectively, ensuring high availability and performance. You thrive in collaborative environments, where you can lead teams in implementing reliability strategies and architectural decisions that enhance system performance. Your background in cloud services, particularly AWS and Azure, equips you with the skills to design and manage scalable solutions that meet the demands of modern applications. You are detail-oriented and have a strong understanding of distributed systems, which enables you to troubleshoot complex issues efficiently. You believe in the power of automation and continuously seek ways to improve operational processes through innovative solutions.
Experience with GitOps practices is a plus, as it aligns with your commitment to modern DevOps methodologies. Familiarity with high-throughput data pipelines and observability tools will further enhance your ability to contribute to our team. You are open to learning new technologies and adapting to evolving industry standards, which will help you stay at the forefront of site reliability engineering.
In this role, you will take ownership of the reliability strategy for SolarWinds' Observability Platform, focusing on the SaaS Logs and data pipelines powered by ClickHouse. You will lead the design and implementation of performance-optimized schemas, ensuring that our systems can handle massive datasets efficiently. Your responsibilities will include managing ClickHouse production clusters, driving automation around data platform operations, and collaborating with cross-functional teams to enhance system reliability. You will also be tasked with shaping how we ingest, store, and query observability datasets, making critical decisions that impact the overall performance of our services. As a senior member of the team, you will mentor junior engineers and share your knowledge to foster a culture of continuous improvement and learning.
At SolarWinds, we prioritize a people-first culture that values collaboration and innovation. You will have the opportunity to work with a talented team dedicated to delivering world-class solutions. We offer competitive compensation and benefits, along with opportunities for professional growth and development. Join us in our mission to empower customers and drive business transformation through our powerful and secure solutions. We encourage you to apply even if your experience doesn't match every requirement, as we value diverse perspectives and backgrounds.