Senior Data Engineer - Spark
- Washington, DC, United States
Sayari is looking for a mid-level to senior Data Engineer to join our Infrastructure team located in Washington, DC. The Infrastructure team is an integral part of our Engineering division and works closely with our Software Engineering & Data Science teams, as well as other key stakeholders across the business.
What You Will Do: As a member of Sayari's engineering team, you will work to build the next generation of our entity resolution engine. This work will involve quickly getting up to speed on the existing production engine (implemented in Apache Spark) and then focusing on improvements that will take it to the next level. This is a cutting-edge problem with no definitive solution, and you'll have the opportunity to implement schemes described in the academic literature and bring your own insights to the table.
What You Will Need:
- Strong experience with any two of: Python, Java, Scala
- 2+ years of experience using Apache Spark
- Solid experience with a NoSQL database like Cassandra or Neo4j
- Experience working on a cloud platform like GCP, AWS, or Azure
- Experience working collaboratively with Git
What We Would Like:
- Experience with, or interest in, graph databases
- Experience with entity resolution/record linkage techniques (e.g. similarity joins, blocking schemes, string similarity)
- Experience with data orchestration frameworks like Apache Airflow
Who You Are:
- Strong process-oriented self-starter, with impeccable organizational skills
- Experienced in supporting and working with cross-functional teams in a dynamic environment
- Interested in learning from and mentoring team members
- Passionate about open source development and innovative technology
Please note: No sponsorship is available for this position. Applicants must be currently authorized to work in the United States for any employer.
Sayari Graph is the first purpose-built tool for navigating the complexity of global corporate ownership and commercial relationships. It provides a complete picture of customers, vendors, and third parties, while maintaining provenance back to primary source documents.
Graph can be delivered as a cloud application with an intuitive user interface, a REST API, a data subscription, or an on-premises deployment.
All of our current openings are remote.
Limitless growth and learning opportunities in a startup environment. A strong commitment to diversity, equity & inclusion.
Skip straight to final-round interviews by applying through Triplebyte.