Software Engineer, Data

Remote; San Francisco, CA, United States; Toronto, ON, Canada


Role Locations

  • Remote
  • San Francisco, CA, United States
  • Toronto, ON, Canada


26 - 50 people


525 Brannan St Ste 308
San Francisco, CA, 94107-1632, US

Tech Stack

  • Python
  • TensorFlow
  • Kubernetes
  • Apache Spark
  • Google Cloud Platform

Role Description

We’re on a mission to understand and structure the world’s medical data, starting by making sense of the terabytes of clinician notes contained within the electronic health records of the world’s largest health systems.

We’re seeking exceptional Data Engineers to work on the data products that drive the core of our business: backend experts who can unify data and build systems that scale from both an operational and an organizational perspective.

Please note that this position requires at least 3 years of experience. If you are earlier in your career, we encourage you to apply to our SF and/or Toronto locations.

As a Data Engineer you will:

  • Develop data infrastructure to ingest, sanitize, and normalize a broad range of medical data, such as electronic health records, journals, established medical ontologies, crowd-sourced labelling, and other human inputs

  • Build performant and expressive interfaces to the data

  • Build infrastructure to help us scale up not only data ingest but also large-scale, cloud-based machine learning

We’re looking for teammates who bring:

  • 3+ years of development experience in a company/production setting

  • Experience building data pipelines from disparate sources

  • Hands-on experience building and scaling up compute clusters

  • Excitement about learning how to build and support machine learning pipelines that scale not just computationally, but in ways that are flexible, iterative, and geared for collaboration

  • A solid understanding of databases and large-scale data processing frameworks like Hadoop or Spark. You have not only worked with a variety of technologies but also know how to pick the right tool for the job

  • A unique combination of creative and analytical skills, applied to designing a system that can pull together, train on, and test dozens of data sources under a unified ontology

Bonus points if you have experience with:

  • Developing systems that do or support machine learning, including working with NLP toolkits like Stanford CoreNLP, OpenNLP, and/or Python’s NLTK

  • Wrangling healthcare data and/or working with HIPAA

  • Managing large-scale data labelling and acquisition through tools such as Amazon Mechanical Turk or DeepDive

About Fathom

We're building a deep learning system to structure and organize all the free text in patient medical records. Our first application for this system is automatically turning that text into health insurance billing codes. Each year, $3.5 billion is spent on manual labor to solve this problem, and that labor achieves only 83-85% accuracy. Our financing was led by Google Ventures and 8VC, and our engineering team comes from Google, Facebook, and Twitch.

Company Culture

We are driven and passionate, motivated by learning, personal growth, and impact. We place a heavy emphasis on feedback and communication, running our OKRs on two-week cycles and holding monthly one-hour check-ins.

Interested in this role?
Skip straight to final-round interviews by applying through Triplebyte.