Software Engineer, Backend Systems

Remote, San Francisco, CA, United States

Determined AI


Role Locations

  • Remote
  • San Francisco, CA, United States

Employees

26 - 50 people

Address

324 5th St
San Francisco, CA, 94107-1002, US

Tech Stack

  • Go
  • Python
  • Docker
  • TensorFlow
  • PyTorch
  • Keras
  • Kubernetes
  • PostgreSQL
  • AWS
  • Azure Cloud
  • Google Cloud Platform

Role Description

As a Backend Systems Engineer, you will play a fundamental role in building our state-of-the-art machine learning platform. You will get the chance to tackle challenging problems at the cutting edge of machine learning research and development, and to collaborate with leading machine learning researchers and engineers.

You will have the opportunity to define major aspects of our product: you’ll be expected to take on difficult problems without clear solutions, and to design, build, and iterate until we’ve reached elegant solutions that delight our customers. You will work on problems such as efficient cluster scheduling over heterogeneous GPUs and ML accelerators, implementing cutting-edge algorithms for hyperparameter optimization, and designing systems for managing ETL pipelines and automated deployment of ML models.

Requirements

  • Strong problem solving and analytical skills
  • Excellent communication skills, both written and verbal
  • An exceptional track record of designing, implementing, testing, and debugging scalable, reliable production-quality software
  • Experience with distributed and/or concurrent software development
  • NOTE: knowledge of machine learning or statistics is not required

Preferred

  • Experience building systems for large-scale data management, analytics, cluster scheduling, or machine learning
  • In-depth knowledge of building and deploying software on AWS, GCP, and/or Azure
  • Familiarity with modern container-based cluster managers (e.g., Kubernetes, DC/OS)
  • Experience doing operations and being on-call for production systems
  • Interest or experience in machine learning
  • Familiarity with hardware performance, HPC, and/or scientific computing
  • Graduate degree (or bachelor's degree with relevant experience) in computer science or a related field

About Determined AI

We build a high-performance model development environment that enables ML engineers to train better models more quickly, to seamlessly utilize and manage large GPU clusters, and to collaborate more easily with their teammates. Determined allows ML engineers to focus on doing ML at scale, rather than managing infrastructure or writing boilerplate code.

We work at the intersection of large-scale distributed systems and cutting-edge machine learning. Our customers are highly skilled ML engineers and domain experts working on exciting problems in biotech, hardware design, autonomous vehicles, and more. We interact with them to learn more about their data sets, modeling problems, and infrastructure, to help them with our product, and to improve our product offering.

After 4 years as a startup company, we were recently acquired by Hewlett Packard Enterprise (HPE). At HPE, we will remain a distinct organization; we'll be building the same product targeting the same users. Plus, we'll have access to HPE's customers, hardware products, and resources to take our mission to the next level.

Company Culture

We believe the best ideas can come from anyone and anywhere, and we have to be humble enough to listen for them. We are customer-focused, but don't think the customer is always right. We are excited about the latest in ML and distributed systems research but try to implement the minimum valuable product. We believe in open communication and transparency in our process and priorities. We believe in the healing power of karaoke and hot sauce.
