Similar to how Datadog, Prometheus, and New Relic enable software monitoring, we connect to every element of the data stack and help data teams discover datasets, monitor the quality and reliability of their tables and ETL pipelines, and automate data QA.
Why join us?
Now is a unique time to join. We already have great momentum behind us in product, customers (incl. Thumbtack & Patreon), investors (Y Combinator & NEA), and funding, yet there is still so much ahead and so many opportunities for you to shape our product, technical stack, and culture.
Fully remote, highly technical team with deep expertise in the Data domain: our founders built real-time telemetry systems and scaled some of the world’s largest data platforms at Autodesk and Lyft.
We are creating a new product category, Data Observability, that has not existed before. Our product already makes 100+ data engineers and data scientists very happy, and we expect it to become a must-have tool in every company's data stack.
Engineering at Datafold
Our engineers report to an Engineering Manager and work closely with the CTO and CEO. We work in two-week sprints and track work in Linear, which is automatically linked with our Slack and GitHub so that all information stays easily accessible.
Example: to help our customers monitor data quality in their ETL pipelines, we need to map the dependencies between all tables and consumers (e.g. ML models, BI dashboards) in their warehouse, so that we can assess the impact of an issue and trace cascading data corruption. Some of our larger customers have 5-10K tables and 200-500K columns. We wrote a SQL parser that decomposes every query in the warehouse into an AST and combines the results into a global dependency graph stored in Neo4j. The next challenge is using the information stored in the graph to rank tables and columns by importance so that we can prioritize anomalies and issues intelligently.
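To make the impact-tracing idea concrete, here is a minimal sketch in plain Python. It is not Datafold's implementation: the table names are invented, the graph lives in an in-memory dictionary rather than Neo4j, and ranking nodes by how many downstream consumers they can reach is only one assumed heuristic for "importance".

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Map each node to the set of nodes that read from it directly."""
    downstream = defaultdict(set)
    for source, target in edges:
        downstream[source].add(target)
    return downstream


def impacted_by(downstream, start):
    """Return every node reachable from `start` (breadth-first search)."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in downstream.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def rank_by_reach(downstream):
    """Order nodes by how many downstream nodes they can affect (assumed heuristic)."""
    nodes = set(downstream)
    for children in downstream.values():
        nodes |= children
    return sorted(nodes, key=lambda n: (-len(impacted_by(downstream, n)), n))


# Hypothetical lineage edges: (upstream, downstream).
edges = [
    ("raw.orders", "staging.orders"),
    ("staging.orders", "analytics.revenue"),
    ("staging.orders", "ml.churn_model"),
    ("analytics.revenue", "bi.revenue_dashboard"),
]
graph = build_graph(edges)

print(sorted(impacted_by(graph, "staging.orders")))
print(rank_by_reach(graph))
```

A corruption in staging.orders would reach the revenue table, the churn model, and the dashboard, which is why upstream tables with many consumers tend to rank first under this heuristic.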
Move our full-text data catalog search from front end to back end (front end simply can't scale for large customers).
Given a collection of SQL queries (~100K) executed in a customer's warehouse, infer their entity-relationship model to help data users understand how tables can be joined for analysis.
Identify data distribution drift in a particular column in the warehouse and, using the dependency (lineage) graph, try to find the root cause (e.g. an upstream data source has changed).
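The last challenge above can be sketched as follows. This is an illustration under stated assumptions, not the approach Datafold uses: drift is scored with the Population Stability Index (PSI), the 0.2 threshold is a common rule of thumb rather than a value from this posting, and the column names, sample data, and `root_causes` helper are all hypothetical.

```python
import math
from collections import deque


def psi(baseline, current, bins=10):
    """Population Stability Index between two numeric samples."""
    lo = min(min(baseline), min(current))
    hi = max(max(baseline), max(current))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal

    def shares(values):
        counts = [0] * bins
        for v in values:
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        # Smooth with a small constant so empty bins do not produce log(0).
        return [(c + 0.5) / (len(values) + 0.5 * bins) for c in counts]

    b, c = shares(baseline), shares(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


def root_causes(upstream, drifted, start):
    """Walk upstream from `start`; return drifted nodes with no drifted parent."""
    roots, seen, queue = set(), {start}, deque([start])
    while queue:
        node = queue.popleft()
        parents = [p for p in upstream.get(node, ()) if p in drifted]
        if not parents and node in drifted:
            roots.add(node)
        for p in parents:
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return roots


# Hypothetical column-level lineage: column -> columns it is derived from.
upstream = {
    "analytics.revenue.amount": ["staging.orders.amount"],
    "staging.orders.amount": ["raw.orders.amount", "raw.fx.rate"],
}

# Hypothetical (baseline, current) samples per column.
stable = [float(i % 10) for i in range(200)]
shifted = [v + 5.0 for v in stable]
samples = {
    "raw.orders.amount": (stable, shifted),
    "raw.fx.rate": (stable, stable),
    "staging.orders.amount": (stable, shifted),
    "analytics.revenue.amount": (stable, shifted),
}

drifted = {col for col, (old, new) in samples.items() if psi(old, new) > 0.2}
print(root_causes(upstream, drifted, "analytics.revenue.amount"))
```

The walk only follows parents that have drifted themselves, so it stops at the earliest drifted column on each path. In this toy data that is raw.orders.amount, while the unchanged raw.fx.rate is ruled out.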
Working at Datafold
Since the very beginning, our team has been remote and working from multiple continents and time zones. We value a strong work ethic, honesty, and a growth mindset, and we are looking for mature, well-organized professionals who are excited about building a new and innovative product that will redefine how organizations use data.
We will hold the next retreat as soon as travel restrictions are lifted!
Work from Home
We welcome all orientations
As a fully remote and globally distributed team, we encourage our employees to choose the work hours that align with their biological rhythms and their personal and family circumstances.
We offer unlimited PTO. That's it.