Senior Software Engineer (Data Infrastructure)

Remote, San Diego, CA, United States • $165k - $195k • 0.01% - 0.02%

Prometheus Biosciences


Role Locations

  • Remote
  • San Diego, CA, United States

Compensation

  • $165k - $195k
  • 0.01% - 0.02%

Employees

26 - 50 people

Address

9410 Carroll Park Dr.
San Diego, CA, 92121, US

Tech Stack

  • Google Cloud Platform
  • Unix shell
  • Python
  • Anaconda
  • R
  • Bash
  • PyTorch
  • TensorFlow
  • xgboost
  • sk
  • Pandas
  • Seaborn
  • SQL
  • PostgreSQL

Role Description

Data Science and Engineering (DSE) is at the heart of Prometheus, the first company using precision medicine and machine learning to create better therapeutics for patients with GI and immune-mediated disease. We invite you to learn more about this exciting position on our team.


Job description

Overview

Are you a skilled, ambitious, and enthusiastic software engineer who wants to make a difference and improve human health?

No medical, genomic, or scientific background is required. Rather, you are genuinely excited to address important biomedical challenges (and help patients), you write exceptional software, you enjoy collaborating with brilliant scientists, and you thrive as an early shaper of our organizational culture.

You will:

  1. Provide expertise as we make organization-wide decisions about data formats, cloud infrastructure for compute and storage, and the rest of our technical stack.
  2. Build systems that:
    1. Clean, ingest, migrate, and organize complex multimodal data from a variety of sources and formats into a single source of truth in the cloud.
    2. Enable collaborators at academic institutions, and scientists at Prometheus, to update our central database as we enroll more patients and generate more data.
    3. Enable scientists and other non-developers to perform queries, summarize statistics, and visualize data from our consolidated database.
    4. Select data from a defined cohort of patients, load a desired subset of data, and train machine learning models (using GPUs in the cloud) to improve how we identify important data features and discover new medicines.
    5. Version control data.
  3. Establish agile software development best practices.
  4. Review other developers' code, provide feedback, and integrate suggestions from others.
  5. Write clear, accessible, and effective documentation.
  6. Support, train, and mentor engineers and data scientists.
  7. Collaborate with brilliant scientists (immunologists, cell biologists, etc.) across Prometheus.
  8. Identify and mitigate performance, scaling, or resource issues.
  9. Spearhead, plan, and carry out the implementation of solutions while self-managing your time and focus.

About you:

  1. You have 5-8 years of professional software engineering experience with a focus on data infrastructure.
  2. You are fluent in Python and SQL, and feel at home in a remote Linux server session.
  3. You are familiar with cloud compute and storage (e.g., GCP, AWS, Azure, Databricks, Snowflake).
  4. You are passionate about high code quality, automated testing, collaborative developer workflow on GitHub, and other engineering best practices.
  5. You are a concise and effective communicator (verbal and written).
  6. You share our vision for using the power of data, compute, and science to make a substantially positive impact in the world.

Any of the following is a plus:

  1. Degree(s) in Computer Science, Computer Engineering, Biomedical Engineering, Electrical Engineering, Mathematics, Physics, Statistics, or similar technical discipline.
  2. Experience with R, Java, Google BigQuery, Elasticsearch, React, and/or PostgreSQL.
  3. Experience troubleshooting complex distributed systems.
  4. Familiarity with containers and data pipelining (Kubernetes, Airflow/Kafka, Amazon RDS).
  5. Familiarity with infrastructure-as-code (e.g., Terraform, Helm).
  6. Experience with genomics, immunology, or any other field within biomedicine.

FAQ

What is DSE?

DSE stands for Data Science and Engineering. Our team consists of software engineers, data scientists, machine learning experts, and bioinformaticians. We develop world-class software, leverage cloud computing resources, and apply computational techniques (including ML) to discover and develop precision therapies and companion diagnostics. This will make clinical development faster and more efficient.

To enable this work, we use one of the world’s largest gastrointestinal databases, which includes a biobank of specimens from patients suffering from inflammatory bowel disease (IBD) and other GI disorders. The biobank, licensed exclusively from Cedars-Sinai Medical Center, holds more than 20,000 samples collected over more than 20 years. We have a vast wealth of information including genetic, clinical, and imaging data. If you join us as one of the first members of our team, you will shape:

  1. Our future culture.
  2. Our engineering and data practices.
  3. Who we hire (and how).
  4. The scientific direction of precision medicine in GI and inflammation.

What are DSE’s priorities?

We are focused on three projects:

  • Atlas: Map out siloed data locations, modalities, size, and related projects.
  • Cirrus: Build data infrastructure to ingest terabytes of multimodal biological and clinical data from individual silos into a single source of truth in the cloud. Create an interactive query and visualization platform that lives on top of our data warehouse, enabling non-developer scientists to interact with our data assets.
  • Titan: Develop a comprehensive computational biology and ML toolbox using VMs in the cloud with GPU acceleration to perform supervised learning, clustering, and more.

How does DSE drive the vision of Prometheus?

Our vision is to utilize targeted medicine to improve the quality of life of people with GI and autoimmune disease. This is enabled by the principled application of world-class software and powerful compute on unique datasets, guided by rigorous scientific thinking. Hence, DSE is central to the success of Prometheus. DSE will play a core role in setting the direction of future scientific work and therapeutic development.

Who currently works on the DSE Team?

We’re just getting started, so you’ll be one of the founding members of the team!

Currently, we are:

  • Erik Reinertsen, MD, PhD (Director of DSE)
  • Mahyar Sabripour, PhD (Computational biologist)

We report to Laurens Kruidenier, PhD (Chief Science Officer).

We are immediately recruiting:

  • Senior Software Engineer (Data Infrastructure)
  • Senior Data Scientist (Bioinformatics & Genomics)

Later we will expand the team with more data scientists, ML engineers, product managers, and designers.

What opportunities will I have for learning, mentorship, and working with people outside of DSE?

You will be able to attend weekly seminars that alternate between biology and computer science / ML topics:

  • Primers: introductory lectures to teach the basics of a new field.
  • Journal club: a critical appraisal of cutting-edge literature in both biology and computational science, e.g. ML.
  • Leaders in data science and pharma: conversations and perspectives with academic and industry pioneers, investors, etc.

You will also work closely with and learn from professionals outside of DSE:

  • Biologists, immunologists, and other researchers in our preclinical R&D and clinical development teams.
  • Data scientists, statisticians, geneticists, and clinical researchers at Cedars-Sinai, UCSD, and other leading academic institutions.

Where is Prometheus located?

HQ is 15 minutes north of San Diego. In early 2022, we will move to a new office at Torrey Pines, next to the Pacific Ocean.

How much money have you raised?

We are well capitalized. We raised $219M when we completed our IPO in March 2021. A few months prior, we raised $130M from private investors.

What is the tech stack?

We don’t expect you to know everything we use from day one, and look forward to you evolving our tools and workflow.

  • GitHub for issues, task management, code, ground truth documentation, etc. We are developing best practices around asynchronous communication that enables deep work.
  • Google Cloud Platform for compute and data storage, though we may switch to AWS or Azure depending on partnership opportunities.
  • Data infra TBD.
  • Python for data wrangling and model training.
  • TensorFlow vs PyTorch for deep learning: we will choose one rather than concurrently develop pipelines in both, but have not yet committed.
  • R for some legacy bioinformatics analyses, but we aim to develop as much as possible in Python.

For basic IT, we use Microsoft 365, Slack for chat, Zoom for video calls, Notion for our internal wiki, etc.

What is your team’s workflow?

  1. Our team operates on two-week cycles that we call sprints.
  2. Sprints begin with a planning meeting that ends in defined tasks with specific owners, acceptance criteria, and estimated time to completion.
  3. Halfway through the sprint, we refine our backlog.
  4. Sprints end with a Friday afternoon retro, where we share what went well, what can be improved, and action items for the team.
  5. We work on GitHub, and do code reviews.
  6. Daily standups are late morning and conducted around the GitHub project board.

About Prometheus Biosciences

The first company using precision medicine and companion diagnostics to create new therapeutics for GI and immune-mediated diseases.

Company Culture

Many of us have families. We usually leave work around 5-6 PM. We enjoy our weekends and imagine you feel the same way. However, as a rapidly growing company with big goals, we are really motivated to work hard and make a difference for patients.

Occasionally some of us stay late, or work on a Saturday. But this is always self-driven, rather than an unwritten expectation of the company.

We care about results, not sheer hours of effort. This is especially important for engineers and data scientists. We want Prometheus to be a place where you will do the best work of your career.

Our company values help align our culture. They are:

  • Focus on the individual patient.
  • Be bold - no guts, no glory.
  • Follow the science and the data.
  • Engage in rigorous, honest debate.
  • Let passion fuel progress.
  • Be accountable to each other.
  • Evolve continuously.
  • Embrace moonshot thinking.

We have a virtual lunch together once a week (paid for by the company).

A few times a year, we get the entire team together to celebrate our recent victories and plan our next milestones.

You are also welcome to visit San Diego and work from HQ whenever you want.
