Data Engineer for AI | Antal Tech Jobs
10 Weeks ago

Data Engineer for AI

Pune, Maharashtra, India
Information Technology
Full-Time
Cloudera

Overview

Business Area:
Professional Services
Seniority Level:
Mid-Senior level
Job Description:
At Cloudera, we empower people to transform complex data into clear and actionable insights. With as much data under management as the hyperscalers, we're the preferred data partner for the top companies in almost every industry. Powered by the relentless innovation of the open source community, Cloudera advances digital transformation for the world’s largest enterprises.
Role:
As a Customer Enablement Engineer specializing in Data Engineering for AI, you will design, develop, and deliver comprehensive curriculum content, including student guides, labs, quizzes, and certifications on data engineering and data preparation skills. This curriculum will enable Cloudera customers to effectively build AI systems on the Cloudera Hybrid platform.
Objective of this Role:
To ensure customers are enabled to prepare high-quality data that meets the requirements for efficiently building their ML/AI systems, including LLMs.
As the Data Engineer for AI, you will:
  • Develop a high-quality, impactful “Data Engineering for AI” course
  • Enable instructors to successfully deliver the course in classrooms to our customers
  • Deliver hands-on workshops to customers, in person or remotely, on select course topics
  • Record and publish course content as online modules in digital format
  • Work with internal and external SMEs and customers to regularly seek input for improvement
  • Assist Edu sales leaders in selling educational products by serving as a technical resource
  • Own your self-development and stay resourceful. Keep building your knowledge of data analytics and AI topics as a self-learner.
We’re excited about you if you have:
  • Five (5) or more years of data engineering experience with SQL, Python, Hive, Spark, Flink, Kafka, NiFi, and Airflow
  • Hands-on experience in developing data ingest pipelines (batch and real-time) from various data sources into large analytics platforms, data warehouses, data lakes, and lakehouses
  • Experience with one or more LMS (learning management systems)
  • Experience or education in preparing data (both structured and unstructured) for ML/AI model development, including training and fine-tuning of LLMs
  • Experience with data governance, data lineage, and metadata best practices
  • Experience using data quality and data profiling tools and data catalogs
  • Experience publishing technology education content on digital media platforms such as Udemy, LinkedIn, YouTube, or your own website, as a curriculum developer or independent contributor
  • Experience working in public cloud environments from one of the hyperscalers (AWS, Google Cloud, or Microsoft Azure). A cloud certification is preferred
  • Experience working with containers and Kubernetes. A certification in Kubernetes is preferred
  • Experience in (or training on) the Cloudera platform (CDP, HDP, or CDH) and any underlying Apache projects
  • Experience or training in preparing data for ML/AI model development including LLMs
  • Experience or training on Iceberg, Trino, and vector databases such as Pinecone or Milvus
  • Experience using configuration management tools such as Git, Ansible, Puppet or Chef
  • Familiarity with scripting tools such as bash shell scripts, Python and/or Perl
Soft Skills Essential
  • Ability to work closely with the curriculum content development team to define the operational requirements for technical training courses
  • Ability to build efficient, well-architected, easy-to-use hands-on lab environments
  • Ability to work as part of a remote, distributed team
It is a plus if you have:
  • Cloud certification on at least one hyperscaler: AWS, Azure, or GCP
  • Expertise in preprocessing unstructured data for generative AI, including tokenization and embedding generation
  • Proficiency with one or more vector databases (e.g., Pinecone, Milvus) for managing embeddings in semantic search and data retrieval.
  • Skills in handling large-scale datasets for LLMs, including sharding, distributed loading, and parallel data processing.
  • Knowledge of data lineage, versioning, and metadata tracking to ensure compliant, high-quality training data for generative AI.
What you can expect from us:
  • Generous PTO Policy
  • Support for work-life balance with Unplugged Days
  • Flexible WFH Policy
  • Mental & Physical Wellness programs
  • Phone and Internet Reimbursement program
  • Access to Continued Career Development
  • Comprehensive Benefits and Competitive Packages
  • Paid Volunteer Time
  • Employee Resource Groups
Cloudera is an Equal Opportunity / Affirmative Action Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, pregnancy, sexual orientation, gender identity, national origin, age, protected veteran status, or disability status.