Infrastructure Engineer

San Francisco, CA, United States

OpenAI


Role Location

  • San Francisco, CA, United States

Employees

26 - 50 people

Address

3180 18th St, Ste 100
San Francisco, CA, 94110, US

Tech Stack

  • Python
  • TensorFlow
  • Kubernetes

Role Description

Our infrastructure team manages our data center and high-performance computing clusters. This includes running and scaling Kubernetes, deploying on-prem hardware, capacity planning, and working with other teams on experiment and tooling design. See our recent blog post (https://blog.openai.com/scaling-kubernetes-to-2500-nodes/) for a sense of the kinds of challenges we solve in our day-to-day work. This position closely resembles infrastructure/DevOps work at a very large-scale startup.

We look for a track record of the following:

  • Experience designing, implementing, and running production services
  • Comfort managing and monitoring large-scale infrastructure deployments
  • Willingness to debug problems across the stack, such as networking issues, performance problems, or memory leaks

In this role, you will work closely with researchers and directly accelerate their work, but you don't need to become a machine learning expert yourself. We value people who can quickly gain a deep technical understanding of new domains, and who enjoy being self-directed and identifying the most important problems to solve. Experience with high-performance computing and open-source contributions are both a bonus.

About OpenAI

We're a non-profit artificial intelligence research company. Our mission is to build safe AI and ensure that AI's benefits are as widely and evenly distributed as possible.

In the short term, we're building on recent advances in AI research and working towards the next set of breakthroughs.

Interested in this role?
Skip straight to final-round interviews by applying through Triplebyte.