Machine Learning Engineer

Remote, San Francisco, CA, United States • $140k - $220k • 0.02% - 0.10%


Company size: 26 - 50 people


1543 Mission St
San Francisco, CA, 94103-2512, US

Tech Stack

  • Apache Spark
  • Scala
  • AWS

Role Description

Veraset, a division of SafeGraph, is looking for a Data Engineer to join our fast-growing team and build, scale, and monitor pipelines that process hundreds of terabytes of geospatial data daily.

Veraset is a well-funded, early-stage, profitable startup in the data-as-a-service space. You'll report directly to the CTO. This position involves travel up to 20% of the time.

Who you are:

  • You are excited about the idea of leveraging open-source and proprietary technologies to build massive pipelines that process billions of rows of geo-location data
  • You want to join a well-funded, profitable, early-stage data startup
  • You have a strong GTD (getting things done) mindset and solve problems in an effective and scalable way
  • You can work and communicate effectively with technical and non-technical colleagues
  • You don’t need supervision to thrive: you think from first principles, know the right questions to ask, and can develop and execute on a vision

What you’ve done:

  • You’ve spent 2-4 years in a technical role at a cutting-edge technology company
  • You have a strong engineering background in a technical discipline - ideally Computer Science, Engineering, Physics, Math, Data Science, or Statistics
  • You’ve built pipelines that process massive amounts of data (ideally hundreds of TBs) and know your storage and analysis technologies well
  • You’ve worked tirelessly to optimize the ETL (extract, transform, load) pipelines you’ve built and can speak credibly to the challenges you faced and the solutions you developed
  • You’ve become comfortable working in at least one of Java, Scala, or Python
  • You’ve worked with distributed systems, cloud technologies, and open-source technologies like Spark, Cassandra, Kafka, Elasticsearch, Airflow, and Hadoop

What you’ll do:

  • Help architect, create, manage, and optimize Veraset’s massive geospatial data pipeline
  • Build analytics around the core pipeline to provide actionable insights into key metrics such as customer acquisition, data insight generation, or operational efficiency
  • Dive deep into the data to uncover key insights and propose scope expansions to grow our business
  • Collaborate cross-functionally with our customer, sales, and product teams
  • Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, redesigning infrastructure for greater scalability, etc.
  • Work with our leadership to help craft the direction of the data and architecture at Veraset
  • Contribute to building out the software engineering team at Veraset
  • Own your destiny: investigate, propose, and act on your ideas to help us achieve our potential as a business

Who we are:

  • Veraset is a B2B company that sells data to data scientists and machine learning engineers
  • Our mission is to gather, curate, and deliver high-quality data to innovators in the Fortune 10, startups, and the public sector. Customers have used our data to:
    • Aid in disaster relief
    • Inform equity trading decisions
    • Determine how far people will travel to restaurants based on type of cuisine
    • Select ideal real estate sites
    • Create advertising audiences
    • Quantify the effectiveness of advertising campaigns
    • Power in-app personalization
    • Inform city planning
  • Veraset is affiliated with SafeGraph. SafeGraph raised a $20 million Series A in 2017 and is profitable
  • We’re a remote-friendly company with team members in San Francisco, New York City, and Salt Lake City. We have full offices in NYC (Tribeca) and SF (SoMa).

About SafeGraph

We build truth datasets to power machine learning, data engineering, and data science. We are currently focused on understanding how humans interact with the physical world.

Company Culture

Do fewer things but be great at them. It is 1000x better to be the best than it is to be “just” really good. But it’s very difficult to be the best at many things. Every team member strives to do as few things as possible, and the company strives to do as few things as possible—so that we can be the best at what we do. This is also why we strive to only do one new thing at a time — series beats parallel.

Judgment is the x-factor. It is essential that every team member at SafeGraph makes key decisions autonomously, so that we move fast and limit bureaucracy. But as Voltaire (sometimes attributed to Spider-Man’s Uncle) said, “with great power comes great responsibility.” To make great and efficient decisions at all levels of the company, we need to (1) clearly communicate the company’s strategy to all team members; (2) hire super smart teammates who work hard; and (3) only hire people who have demonstrated sound judgment and are deserving of our trust.

We are the enablers, not the solvers. As a company, it is important that we have the humility to accept that our clients will ultimately be the ones to make the world a better place and solve humanity's greatest challenges; we are just an enabler. This humility should always color everything we do.

Respect our own time -- get leverage. Because we hire only the most talented people, SafeGraph team members must constantly seek leverage. We put an extremely high value on our own time. A team member rarely does a repetitive or mundane task more than a few times before she automates it (through engineering, outsourcing, selecting a vendor, etc.). SafeGraphers should spend over 75% of their time doing things that are really hard and that only they can do. We know that the more we can leverage ourselves, our teams, and our organization, the bigger SafeGraph can scale.

Respect others’ time -- don’t be a bottleneck. Humility means respecting the time of our coworkers, partners, clients, recruits, etc. We never want to be a bottleneck as a company or as an individual. Bottlenecks cause frustration and cost us customers and revenue. They lower morale and create uncomfortable conversations like “did you get that email I sent Tuesday?” We return all emails and voicemails within 12 hours, even if it’s just to say “I’ll get back to you tomorrow.” We strive to never be bottlenecks.

Focus on growth. Great team members continue to improve and grow. The only way to do that is to actively solicit feedback on how to get better and find ways to work on one’s strengths. Great team members also focus on making those around them better, and they give feedback often. Giving constructive feedback (suggestions of how to improve flaws) is helpful but giving specific, positive feedback can lead to even faster growth and higher leverage of people’s strengths. We have extremely high expectations of ourselves and of our team members.

Interested in this role?
Skip straight to final-round interviews by applying through Triplebyte.