1 Day ago

Job Title: Data Engineer / AI Data Pipeline Engineer

Delhi, DL, India

Information Technology

Full-Time

Web Spiders

Overview

Location: Kolkata, Rajarhat
Type: Full Time
Department: IT Technology

We are looking for a hands-on Data Engineer / AI Data Pipeline Engineer to join our growing engineering team. You'll work on cutting-edge AI-powered data enrichment, taxonomy validation, and scalable reporting frameworks across large-scale retail and enterprise datasets. The role sits at the intersection of data engineering and applied LLMs, and requires strong skills in Python, SQL, AWS cloud services, modern ETL architecture, and LLM-powered automation workflows.

Experience: 3–4 years
Location: Rajarhat-Newtown (Kolkata)
Employment Type: Full-time, Onsite
Timing: Ability to work in the US Eastern time zone. Depending on project needs, this may be relaxed to half day IST and half day US EST.
Documents: Must have Aadhaar card, verifiable education certificates, letters from past companies (if applicable), and criminal background clearance.

Key Skills Required:

AI-Powered Taxonomy Audit & Enrichment:
Design and develop scalable, AI-driven taxonomy audit pipelines for retail store and brand data validation.
Build automated workflows leveraging LLMs (GPT-4o / OpenAI APIs) for classification, enrichment, and ontology standardization, using Instructor and Pydantic for reliable structured outputs.
Integrate web research and scraping systems (Serper API, ScrapingBee, html2text) to validate structured and unstructured data.
Develop human-in-the-loop review workflows using Label Studio for confirm/edit/reject audit processes.
Improve taxonomy coverage and entity-resolution accuracy through AI-assisted clustering and enrichment of unmapped transaction data.

Data Engineering & Pipeline Development:
Build and maintain modular, reusable ETL/data pipeline frameworks.
Refactor legacy reporting systems into modern, maintainable architectures with reusable SQL modules and query builders.
Develop validation frameworks, logging systems, automated migration workflows, and configurable comparison contexts.
Orchestrate workflows with Apache Airflow (DAGs, PythonOperator, XCom) and cloud-native AWS services.
Ensure backward compatibility and production stability during migration initiatives.

Reporting & Cloud Infrastructure:
Develop and optimize advanced SQL queries and reporting pipelines on Amazon Redshift / Redshift Serverless and PostgreSQL (RDS).
Manage data workflows using AWS services including S3, Lambda, Glue, CloudWatch, SSM Parameter Store, and Secrets Manager.
Monitor production pipelines, troubleshoot issues, and improve performance and reliability.
Collaborate with cross-functional teams across Data Engineering, AI/ML, QA, and Product.

Required Skills & Experience:
3–4 years of experience in Python-based data engineering or backend engineering.
Strong proficiency in Python, including pandas, requests, psycopg2, and boto3, with solid modular application development skills.
Hands-on experience with Apache Airflow (DAGs, PythonOperator, XCom).
Strong advanced SQL skills and a solid grasp of data warehousing concepts.
Experience with Amazon Redshift and PostgreSQL.
Sound understanding of ETL/data pipeline architecture and workflow orchestration.
Hands-on experience with AWS services: S3, Lambda, Glue, CloudWatch, SSM Parameter Store, and Secrets Manager.
Experience integrating LLM APIs (GPT-4o / OpenAI) into production workflows.
Familiarity with web scraping, search APIs, and data enrichment systems.
Experience with Git/GitHub, Jira, and Confluence.
Strong debugging, problem-solving, and analytical skills.

Good to Have:
Experience with Instructor, Pydantic, or AI workflow orchestration frameworks.
Exposure to Label Studio or other human-review annotation systems.
Experience with AI-assisted entity resolution and taxonomy/ontology systems.
Familiarity with scalable, modular ETL framework design.
Background in retail transaction data or taxonomy/master-data management.

Tech Stack:
Languages & Libraries: Python, advanced SQL, pandas, boto3, psycopg2, requests
Orchestration: Apache Airflow
AI / LLM: GPT-4o / OpenAI APIs, Instructor, Pydantic
Data & Warehousing: Amazon Redshift / Redshift Serverless, PostgreSQL (RDS)
AWS: S3, Lambda, Glue, CloudWatch, SSM Parameter Store, Secrets Manager
Scraping & Search: Serper API, ScrapingBee, html2text
Human Review: Label Studio
Collaboration: Git/GitHub, Jira, Confluence

Preferred Candidate Profile:
Self-driven, with end-to-end ownership of data workflows.
Comfortable in fast-paced AI/data engineering environments.
Strong communication and collaboration skills.
Passionate about building scalable, AI-assisted automation systems.

