Overview
About EveoAI
EveoAI is an AI-driven startup building next-generation solutions for personality and fashion analysis. We're developing proprietary Large Language Models (LLMs) and AI systems that understand human personality and style preferences and that create virtual personas. Our mission is to transform how individuals express themselves through AI-powered personalization.
About the Role
We're looking for a talented Data Scientist to join our AI and Data Engineering team. You'll be responsible for mining data from diverse internet sources, cleaning and processing large datasets, and managing end-to-end ML pipeline operations. This is a unique opportunity to work at the intersection of data engineering and machine learning at an early-stage, well-funded startup.
Key Responsibilities
- Mine and collect data from internet sources using web scraping, APIs, and data ingestion techniques
- Design and implement robust data cleaning and preprocessing pipelines
- Transform raw, unstructured data into high-quality datasets suitable for ML model training
- Develop and maintain end-to-end ML pipelines for training, validation, and deployment
- Build data quality monitoring and validation frameworks
- Optimize data processing workflows for performance and scalability
- Collaborate with ML engineers to ensure data readiness and pipeline reliability
- Implement feature engineering and data augmentation strategies
- Manage data storage, versioning, and accessibility in production environments
- Create data documentation and maintain data lineage tracking
- Develop automated ETL/ELT workflows using orchestration tools
- Analyze data distributions and quality metrics to identify areas for improvement
Required Skills & Experience
- 2+ years of professional experience in data science, data engineering, or ML operations
- Strong proficiency in Python with experience in data manipulation libraries (Pandas, NumPy)
- Experience with data mining and web scraping (BeautifulSoup, Scrapy, or similar)
- Proficiency in SQL and in managing relational and document databases (PostgreSQL, MongoDB)
- Experience with data processing frameworks (Spark, Dask, or similar)
- Understanding of data cleaning, validation, and quality assurance
- Knowledge of ML pipeline orchestration tools (Airflow, Prefect, or similar)
- Experience with version control (Git) and collaborative development
- Strong problem-solving skills and attention to detail
- Ability to work with unstructured and semi-structured data
Nice-to-Have Skills
- Experience with cloud data platforms (AWS S3, GCP BigQuery, Azure Data Lake)
- Knowledge of vector databases and embeddings (Pinecone, Weaviate, Milvus)
- Familiarity with containerization and deployment (Docker, Kubernetes)
- Experience with CI/CD pipelines and MLOps practices
- Understanding of data privacy and compliance requirements
- Experience with real-time data streaming (Kafka, Kinesis)
- Knowledge of graph databases for relationship data
- Experience with LLM fine-tuning and dataset preparation
What We Offer
- Competitive salary with equity options in a well-funded startup
- Opportunity to work on cutting-edge AI and data science technology
- Collaborative team environment with experienced AI researchers and engineers
- Flexible work arrangements, with an office in Gujarat, India
- Professional growth and learning opportunities
- Health and wellness benefits
Ideal Candidate
You're passionate about data and its transformative potential, enjoy problem-solving with large datasets, and want to make an impact in the AI revolution. You should be detail-oriented, proactive in identifying data quality issues, and excited about building robust data infrastructure that powers next-generation AI applications.