Data Engineer
Job Summary:
At AnthologyAI, we believe that behind every data point is a story of hundreds of decisions made by real people. These stories are the pulse of every sector in the global economy, shaping our present and future. Our mission is to democratize access to these stories in the data economy. We are the most efficient, accurate, and actionable source of consumer intelligence. We ethically capture and analyze consumer behaviors 24/7 via our app, Caden, without compromising privacy or security.
We have set out to pioneer a breakthrough platform that combines billions of unbiased first-party consumer data points with advanced predictive AI models. It will empower businesses across industries, from retail to banking, regardless of their internal data capabilities, to predict market dynamics and consumer behaviors with unparalleled precision.
We’re led by industry veterans, backed by powerhouse investors (almost $30M total investment), and powered by an extremely talented, experienced and diverse team.
This position will report to the Data Science Manager and will be an integral part of the Data & AI organization. Your expertise and daily contributions will drive the development and enhancement of products and services that generate value for our business, our clients, and our users.
What You'll Do
- Work within the Data Science team to implement various data pipelines for our end-to-end solutions.
- Provide active input into the design of our data product offerings portfolio and our data dissemination framework.
- Integrate the right tools and methods for data enhancement, data quality, data obfuscation, privacy measurement and enhancement.
- Implement data security and access controls throughout the data pipeline.
- Develop actionable tools for monitoring the health of implemented pipelines, and identify and fix issues in real time.
- Define, manage, and contribute to the architecture of the AnthologyAI data and machine learning deployment pipelines.
What You’ve Done
- 2-3 years of industry experience developing ETL (data processing) pipelines to integrate large volumes of data from various sources using a variety of database technologies.
- Advanced SQL knowledge and experience with NoSQL, GraphQL, etc.
- Experience delivering production-ready code (Python, Java, etc.) to retrieve, cleanse, and transform data for analytical and modeling purposes
- Experience working in Databricks
- Experience using modern big data pipelines (AWS, GCP, Databricks)
- Experience with big data frameworks (Hadoop, Hive, Spark, Kafka, Airflow, etc.)
- Ability to think outside the box and evaluate results based on customer value
- Experience setting up and managing large data pipelines
Required Experience:
- Degree in Computer Science, Data Engineering, or a related field.
- Proven experience in designing, developing, and deploying pipelines in a real-world setting.
- Strong programming skills in Python or a similar language.
- Knowledge of AI/ML models, natural language processing (NLP), and data mining.
- Proficiency in SQL and working with large and complex datasets.
Nice To Have:
- Experience with triple stores / ontology databases (RDF, OWL, SPARQL, Jena, etc.) and knowledge graphs
- Experience developing new metrics
- Data/business analysis experience
- Sales engineering experience
- Experience working with regulated data (healthcare, finance, etc.)
Why AnthologyAI?
- Join a high-growth startup that is at the forefront of innovation
- Opportunity to make a significant impact on the company's strategic and growth trajectory
- Collaborative and inclusive work environment that encourages innovation and growth
- Competitive compensation package that includes equity
- Health & Commuter Benefits
- Flexible PTO
- Hybrid work arrangements
This is a hybrid role, working onsite 3 days a week at our SoHo office.
The salary range for this position is $115,000-$160,000 per year, based on candidate qualifications.
** There is currently no relocation and/or visa (immigration) assistance provided for this position.