
Revolutionizing space travel for humanity's future
SpaceX, founded in 2002 by Elon Musk, is a leading aerospace manufacturer and space transportation company headquartered in Hawthorne, California. The company has developed the Falcon 9 and Falcon Heavy rockets, as well as the Dragon spacecraft, which delivers cargo to the International Space Statio...
Employees at SpaceX enjoy competitive salaries, stock options, generous PTO policies, and comprehensive health benefits. The company also supports pro...
SpaceX fosters a culture of innovation and engineering excellence, encouraging employees to tackle ambitious projects and push the boundaries of space...

SpaceX • Hawthorne, CA
SpaceX is seeking a Site Reliability Engineer for the Guidance, Navigation, and Control (GNC) team to operate and scale mission-critical products. You'll work with technologies like Docker, Kubernetes, and AWS to maintain high-performance computing systems. This role requires a strong background in both software development and operations.
You have a solid background in site reliability engineering, with experience in deploying and maintaining mission-critical systems. Your expertise in Linux and cloud services like AWS allows you to effectively manage and scale infrastructure. You are proficient in scripting languages such as Python and Bash, enabling you to automate processes and improve system reliability. Your familiarity with containerization technologies like Docker and orchestration tools like Kubernetes equips you to handle complex deployments. You thrive in collaborative environments, working closely with software engineers to ensure the operability of products. You are adaptable and can manage multiple priorities in a fast-paced setting.
Experience with monitoring tools such as Prometheus and Grafana is a plus, as it helps you maintain system health and performance. Familiarity with CI/CD practices will enhance your ability to streamline deployment processes. A background in high-performance computing will be beneficial for managing SpaceX's HPC cluster. You understand the importance of incident response and are prepared to tackle outages effectively.
As a Site Reliability Engineer at SpaceX, you will be responsible for deploying, upgrading, and maintaining a suite of GNC products and services. You will provision and manage both virtual and physical servers, ensuring that the infrastructure is robust and scalable. Collaborating with the GNC software engineering team, you will create highly operable and maintainable products that support mission-critical operations. You will monitor the performance of the HPC cluster and respond to any outages, ensuring minimal disruption to operations. Your role will involve adding monitoring capabilities for web applications and continuously improving the underlying computational infrastructure of GNC. You will also participate in the design and implementation of automated data analysis systems and continuous integration systems for rocket and simulation software.
At SpaceX, you will be part of a mission-driven team focused on enabling human life on Mars. We offer a competitive salary and benefits package, along with opportunities for professional growth and development. You will work in a dynamic environment where innovation is encouraged, and your contributions will have a direct impact on the future of space exploration. Join us in our quest to make humanity a multi-planetary species.