- San Francisco, CA, United States
Our infrastructure team manages our data center and high-performance computing clusters. This includes running and scaling Kubernetes, deploying on-prem hardware, capacity planning, and working with other teams on experiment and tooling design. See our recent blog post (https://blog.openai.com/scaling-kubernetes-to-2500-nodes/) to get a sense of the challenges we solve in our day-to-day work. This position closely resembles infrastructure/DevOps at a very large-scale startup.
We look for a track record of the following:
- Experience designing, implementing, and running production services
- Comfort managing and monitoring large-scale infrastructure deployments
- Willingness to debug problems across the stack, such as networking issues, performance problems, or memory leaks
In this role, you will work closely with and directly accelerate researchers, but you don't need to become a machine learning expert yourself. We value people who can quickly obtain deep technical understanding of new domains, and who enjoy being self-directed and identifying the most important problems to solve. Experience with high-performance computing or open-source contributions is a bonus.
We're a non-profit artificial intelligence research company. Our mission is to build safe AI, and ensure AI's benefits are as widely and evenly distributed as possible.
In the short term, we're building on recent advances in AI research and working towards the next set of breakthroughs.
Skip straight to final-round interviews by applying through Triplebyte.