Data Ingestion Engineer
- East Bay, CA, United States
- Los Angeles, CA, United States
- New York, NY, United States
- Seattle, WA, United States
- San Francisco, CA, United States
- Silicon Valley, CA, United States
Golden is looking for a Data Ingestion Engineer to grow the number of entities and triples in the Golden Knowledge Base as part of our mission to organize and map the world’s knowledge. We are looking for a data engineer to execute large-scale data ingestion projects, handle urgent one-off ingestion projects, and develop tooling that makes data ingestion larger in scale, more accurate, and more automated. The Golden Knowledge Base has data on millions of entities, with both unstructured text and structured data fields (triples). This role will be responsible for growing the number of entities and the number of triples. Our team uses React, Python, Django, AWS, and Postgres. You will work closely with Golden’s AI/NLP team to extend and scale data ingestion techniques. As an early team member, you will work in a startup environment with high autonomy and responsibility. Your work will also have real-world impact: you will help make a product that everyone will use and love. We are looking for special early engineers to join us.
What You Will Be Doing
- Identify and proactively create new data ingestion and processing tooling to eliminate manual processes, inefficient or repetitive work, and address quality issues.
- Connect to public databases to ingest data, and execute one-off data imports.
- Make thoughtful judgements on data quality to clean data sources for import.
- Use third-party APIs and web scraping tools to source data at scale.
- Work with the AI/NLP team to scale and embed techniques and help with data ingestion projects.
- Demonstrate common sense in applying business logic to ontological/schema decisions.
- Work on the front lines of building a new knowledge graph.
Qualifications We Need
- Demonstrated ability in data ingestion, generation and problem solving.
- Strong experience at scale with data-oriented products.
- Experience ingesting large-scale structured data.
- Deep scraper experience.
- Experience with Python, Jupyter notebooks, and Pandas to inspect and analyze data sources.
- PhD, Master’s, or Bachelor’s in a STEM field preferred. We also consider exceptional applicants with other backgrounds (e.g., self-taught).
- You have worked with any of the following: extraction of triples, topic prediction, taxonomic detection, event detection, clustering, relevancy, deduction and inference of data, generation of text with NLP.
- Public data domain knowledge.
- Strong experience with probability and statistics.
- Experience with Machine Learning or Natural Language Processing techniques.
- DevOps / Infrastructure experience in data ingestion.
- You were emotionally moved by Free Solo, which chronicles a quest to triumph over the impossible. :-)
Golden is on a mission to map human knowledge and accelerate discovery and education. We are building the world’s first self-constructing knowledge database, making it easier to explore and contribute to public and private knowledge. Golden leverages human effort by using machine intelligence to make the process of gathering and communicating knowledge simpler. Golden is venture-backed by a16z, Founders Fund, Giga fund, and other top-tier investors, and is led by Jude Gomila, a founder of Heyzap (YC ‘09, acquired for $45M in 2016) and an investor in over 200 startups. To learn more about Golden, visit us at https://golden.com/, check out our blog at https://golden.com/blog, or join the conversation at @Golden.
Intellectually curious, meritocratic, pragmatic, growth-driven, multidisciplinary, hackerish, learning-focused, respectful, do no harm.
Skip straight to final-round interviews by applying through Triplebyte.