Datafold

remote
< 10 Employees
< 10 Engineers
$2M - $5M Funding
Pre-Series A

Similar to how Datadog, Prometheus, and New Relic enable software monitoring, we connect to all elements of the data stack and help data teams discover datasets, monitor the quality and reliability of their tables and ETL pipelines, and automate data QA.

Why join us?

  • Now is a unique time to join. We already have strong momentum in product, customers (including Thumbtack and Patreon), investors (Y Combinator and NEA), and funding, yet there is still so much ahead and so many opportunities for you to shape our product, technical stack, and culture.

  • Fully remote, highly technical team with deep expertise in the Data domain: our founders built real-time telemetry systems and scaled some of the world’s largest data platforms at Autodesk and Lyft.

  • We are creating a new product category, Data Observability: something that has never been done before, that already makes 100+ data engineers and data scientists very happy, and that will eventually become a must-have tool in every company's data stack.


Engineering at Datafold

Engineering team and processes

Our engineers report to an Engineering Manager and work closely with the CTO and CEO. We work in two-week sprints and track work in Linear, which is automatically linked with our Slack and GitHub to keep all information easily accessible.

Technical Challenges

Example: to help our customers monitor data quality in their ETL pipelines, we need to map the dependencies between all tables and consumers (e.g. ML models, BI dashboards) in their warehouse, so that we can assess the impact of an issue and trace cascading data corruption. Some of the larger customers have 5-10K tables and 200-500K columns. We wrote a SQL parser that decomposes every query in the warehouse into an abstract syntax tree (AST) and combines the ASTs into a global dependency graph stored in Neo4j. The next challenge is using the information stored in the graph to rank tables and columns by importance so that we can prioritize anomalies and issues intelligently.
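
To make the impact-assessment idea concrete, here is a minimal sketch in Python. It is not Datafold's implementation: it uses an in-memory dictionary as a stand-in for the graph database, the table names are hypothetical, and "importance" is assumed to mean the number of transitive downstream consumers, which is only one of many possible ranking signals.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Map each node to the set of nodes that read from it directly."""
    downstream = defaultdict(set)
    for source, consumer in edges:
        downstream[source].add(consumer)
    return downstream


def impacted(downstream, start):
    """Return every node a corruption in `start` could cascade to (BFS)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in downstream.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    seen.discard(start)  # a node does not count as its own consumer
    return seen


def rank_by_impact(downstream):
    """Sort nodes by transitive consumer count (descending), then by name."""
    nodes = set(downstream)
    for children in downstream.values():
        nodes |= children
    scores = {node: len(impacted(downstream, node)) for node in nodes}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical lineage edges: (upstream, downstream).
EDGES = [
    ("raw.events", "staging.sessions"),
    ("raw.users", "staging.sessions"),
    ("staging.sessions", "analytics.daily_active"),
    ("analytics.daily_active", "dashboard.growth"),
    ("staging.sessions", "ml.churn_model"),
]

graph = build_graph(EDGES)
ranking = rank_by_impact(graph)
for name, score in ranking:
    print(f"{score:>2}  {name}")
```

In this toy graph the raw tables rank highest because everything else depends on them, while the dashboard and the ML model rank lowest because nothing reads from them. At the scale mentioned above (hundreds of thousands of columns), recomputing a traversal per node would likely be too slow, so a real system would need something more incremental.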

Projects you might work on
  • Move our full-text data catalog search from the front end to the back end (the front end simply can't scale for large customers).

  • Given a collection of SQL queries (~100K) executed in a customer's warehouse, infer their entity-relationship model to help data users understand how tables can be joined for analysis.

  • Identify data distribution drift in a particular column in the warehouse and, using the dependency (lineage) graph, trace it to its root cause (e.g. an upstream data source has changed).
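
As an illustration of the last project, the sketch below detects drift with the Population Stability Index (PSI) and then walks the lineage graph upstream to find the drifted columns that have no drifted parent. Everything here is an assumption made for the example: the choice of PSI, the 0.2 threshold, the equal-width binning, the column names, and the sample data. It is a sketch of the idea, not a description of how Datafold solves it.

```python
import math


def psi(baseline, current, bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples.

    Bins are equal-width over the baseline range; values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def shares(values):
        counts = [0] * bins
        for value in values:
            idx = int((value - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps the logarithm defined for empty bins
        return [max(count / len(values), eps) for count in counts]

    base, cur = shares(baseline), shares(current)
    return sum((c - b) * math.log(c / b) for b, c in zip(base, cur))


def root_causes(upstream, drifted, column):
    """Follow drifted parents upstream from a drifted `column`.

    Returns the drifted columns that have no drifted parent of their own.
    """
    roots, seen, stack = set(), set(), [column]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        drifted_parents = [p for p in upstream.get(node, ()) if p in drifted]
        if drifted_parents:
            stack.extend(drifted_parents)
        else:
            roots.add(node)
    return roots


# Hypothetical column-level lineage: column -> columns it is derived from.
UPSTREAM = {
    "analytics.revenue.amount": ["staging.orders.amount"],
    "staging.orders.amount": ["raw.orders.amount", "raw.fx_rates.rate"],
}

# Synthetic samples: one shared baseline, and a shifted copy for drifted columns.
BASE = list(range(100))
SHIFTED = [value + 50 for value in BASE]
CURRENT = {
    "analytics.revenue.amount": SHIFTED,
    "staging.orders.amount": SHIFTED,
    "raw.orders.amount": BASE,
    "raw.fx_rates.rate": SHIFTED,
}

THRESHOLD = 0.2  # assumed cut-off for "significant" drift
drifted = {col for col, sample in CURRENT.items() if psi(BASE, sample) > THRESHOLD}
causes = root_causes(UPSTREAM, drifted, "analytics.revenue.amount")
print(sorted(causes))
```

Here the drift observed in the analytics column is traced back through the staging table to the exchange-rate source, while the raw orders column is ruled out because its distribution did not change.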

Tech stack
Python
Flask
PostgreSQL
React
TypeScript
Docker
Kubernetes

Working at Datafold

Since the very beginning, our team has been remote and has worked across multiple continents and time zones. We value a strong work ethic, honesty, and a growth mindset, and we are looking for mature, well-organized professionals who are excited about building a new and innovative product that will redefine how organizations use data.

Diversity and Inclusion
Perks & benefits
  • Company Retreats

    We will hold the next retreat as soon as travel restrictions are lifted!

  • Work from Home
  • LGBTQ+ friendly

    We welcome all orientations

  • Flexible Hours

    As a fully remote and globally distributed team, we encourage our employees to choose work hours that suit their biological rhythms and their personal and family circumstances.

  • Generous Vacation

    We offer unlimited PTO. That's it.

Our Team by the Numbers
Women (company-wide): 22%
