Etleap was born out of frustration with how much time data wrangling takes away from the actual analysis. As engineers, we were tired of spending countless hours building and maintaining data pipelines and warehouses on behalf of our analytics teams. That is why we're creating an intuitive ETL tool that enables data analysts themselves to integrate data from any source.
The product takes input from data sources such as databases, log files, Salesforce, Marketo, and other cloud tools; lets users set up transformations; and builds the data warehouse tables.
The timing is right for a product like Etleap — with data warehousing moving to the cloud (AWS Redshift, Google BigQuery, Snowflake, etc.) and the proliferation of data sources (microservice DBs, online ad tools, CRM systems, marketing platforms, etc.), the time has come for an ETL solution that abstracts away the painful complexities of data infrastructure and puts the data analyst in charge.
We're in a big and growing market — Cloud ETL is a new market, and it's a $5B opportunity that's growing quickly. Etleap is growing quickly too, with consistent 18% month-over-month growth for the past year. Customers include data-driven teams at companies like PagerDuty, Okta, Zenefits, Airtable, and AXS.
We're funded by top-tier investors — Etleap started out in Y Combinator, and has since raised funding from First Round Capital, SV Angel, and other top-tier investors.
Our back-end team consists of generalists, and everyone is involved in all aspects of development: building product features, making infrastructure decisions, and making sure our SaaS product is operating smoothly. We work in two-week sprints and use a standard GitHub pull request flow, with a strong focus on low-latency code reviews.
Many companies have ETL infrastructure that's custom-built for their needs. At Etleap, we're building generalized ETL infrastructure that has to work for hundreds of companies. We have tens of thousands of data pipelines continuously in production, and they should all run continuously with no human intervention. Building robust systems and automating monitoring is key in order for this to scale.
Micro-batching data processing system. Etleap processes terabytes of data every day using auto-scaling Hadoop MapReduce clusters. This is great for throughput, but it means end-to-end latency — the time from when data is produced to when it's available in the data warehouse — is at least 3-5 minutes, which is too high for time-sensitive data analysis. Let's introduce a streaming or micro-batching processing system to reduce end-to-end latency to 10-15 seconds. Ideally this would be efficient for big workloads as well, so it could replace our MapReduce system outright, but it could also serve just as the speed layer in a lambda architecture where MapReduce remains the batch layer.
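The core idea of micro-batching — buffering records briefly and flushing small batches, trading a little latency for write throughput — can be sketched as follows. This is a toy illustration, not Etleap's actual system; the `MicroBatcher` class and its parameters are hypothetical names.

```python
import time
from typing import Callable, List

class MicroBatcher:
    """Buffers incoming records and flushes them as small batches.

    Flushes whenever the buffer reaches max_batch_size, or when
    max_wait_seconds has elapsed since the last flush — so latency is
    bounded by the wait time, not by how fast records arrive.
    """

    def __init__(self, flush_fn: Callable[[List[dict]], None],
                 max_batch_size: int = 500, max_wait_seconds: float = 10.0):
        self.flush_fn = flush_fn
        self.max_batch_size = max_batch_size
        self.max_wait_seconds = max_wait_seconds
        self.buffer: List[dict] = []
        self.last_flush = time.monotonic()

    def add(self, record: dict) -> None:
        self.buffer.append(record)
        if (len(self.buffer) >= self.max_batch_size or
                time.monotonic() - self.last_flush >= self.max_wait_seconds):
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []
        self.last_flush = time.monotonic()

# Collect flushed batches in a list to demonstrate the batching behavior.
batches: List[List[dict]] = []
batcher = MicroBatcher(batches.append, max_batch_size=3, max_wait_seconds=10.0)
for i in range(7):
    batcher.add({"id": i})
batcher.flush()  # flush the final partial batch
```

A real implementation would also need a background timer to enforce the wait bound when no records arrive, plus checkpointing so a crash between flushes doesn't lose data.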
Automatic analysis of customers' query logs. Longer-term, we will enable data teams to set up and manage their data warehouse entirely through Etleap's platform. To that end, we're building Etleap features that provide the insights a DBA with deep data warehouse expertise would normally provide. In Redshift, distribution and sort keys determine how data is stored within the cluster, which has a large effect on query performance. Which columns ought to be used as distribution and sort keys depends on data access patterns. Let's automate the analysis of Redshift query logs over time and present suggestions to users on how they can change their distribution and sort keys (through Etleap) to improve query performance.
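One simple heuristic behind such suggestions: columns compared in equi-joins are distribution-key candidates (co-locating them avoids data shuffles), while columns in range filters are sort-key candidates (sorted storage lets Redshift skip blocks). A toy sketch of that analysis, assuming a hypothetical sample of logged queries (real ones would come from Redshift system tables such as STL_QUERY):

```python
import re
from collections import Counter

# Hypothetical sample of queries pulled from a customer's query log.
query_log = [
    "SELECT * FROM events JOIN users ON events.user_id = users.id",
    "SELECT count(*) FROM events WHERE events.created_at > '2018-01-01'",
    "SELECT * FROM events JOIN users ON events.user_id = users.id "
    "WHERE events.created_at > '2018-06-01'",
]

join_cols = Counter()    # candidates for the distribution key
filter_cols = Counter()  # candidates for the sort key

for sql in query_log:
    # Columns compared in equi-joins benefit from co-location (DISTKEY).
    for left, right in re.findall(r"JOIN \w+ ON (\w+\.\w+) = (\w+\.\w+)", sql):
        join_cols[left] += 1
        join_cols[right] += 1
    # Columns used in range filters benefit from sorted storage (SORTKEY).
    for col in re.findall(r"WHERE (\w+\.\w+)\s*[<>=]", sql):
        filter_cols[col] += 1

print("DISTKEY candidate:", join_cols.most_common(1)[0][0])
print("SORTKEY candidate:", filter_cols.most_common(1)[0][0])
```

A production version would use a real SQL parser instead of regexes and weight suggestions by query frequency and scan cost, but the counting idea is the same.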
SQL extraction using replication logs. Etleap's SQL extractor is currently query-based, which means we can get new rows — and updates, if the tables have an update timestamp column. However, we can't get deleted rows, and our workaround of using periodic refreshes is OK but not great. Using MySQL and Postgres replication logs to capture changed data can fix these problems: we'd take an initial snapshot of the DB and then use binlogs to get new, updated, and deleted data from there. This has to be made seamless and robust. Another advantage of the replication log approach is that we can build complete history tables in the data warehouse, which is an often-requested feature from our customers.
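The snapshot-plus-binlog approach can be illustrated with a toy change-data-capture applier. This is a sketch under assumed names, not Etleap's implementation; the decoded events here are hypothetical (real ones would come from MySQL's ROW-format binlog or Postgres logical decoding). Note how deletes become visible, and how every event also lands in a history table:

```python
# Initial snapshot of the source table, keyed by primary key.
snapshot = {1: {"id": 1, "name": "alice"}, 2: {"id": 2, "name": "bob"}}

# Hypothetical decoded replication-log events, in commit order.
events = [
    {"op": "update", "row": {"id": 2, "name": "robert"}},
    {"op": "insert", "row": {"id": 3, "name": "carol"}},
    {"op": "delete", "row": {"id": 1}},
]

current = dict(snapshot)  # mirrors the source table's current state
history = [dict(row, _op="snapshot") for row in snapshot.values()]

for event in events:
    row = event["row"]
    if event["op"] == "delete":
        # Deletes are captured — impossible with query-based extraction.
        current.pop(row["id"], None)
    else:
        # Inserts and updates upsert by primary key.
        current[row["id"]] = row
    # Every change is appended to the history table regardless of type.
    history.append(dict(row, _op=event["op"]))
```

`current` ends up mirroring the source table, while `history` retains every version of every row — the basis for the complete history tables mentioned above.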
We value engineers who can get things done, and who think creatively from first principles to build our product in new ways. Additionally, all our engineers take mentorship seriously, and are always open to discussing technical decisions, product ideas, or plans for future infrastructure. We have a very strict no-asshole hiring policy.
Unlimited vacation policy; we strongly encourage everyone to take at least 3 weeks.
We have company-sponsored lunch together every day.
We just came back from our company retreat in Florida. We do these twice a year — great for team-building!
We think the back-end team works best when everyone comes to the office every day, but there are no hard rules here.
We're headquartered in a very cool space on Townsend Street in SOMA, SF. Our interior designer friends did an excellent job making it a nice place to be!
Interested in this company?
Skip straight to final-round interviews by applying through Triplebyte.