Overview
Job Title: Data Engineer
Location: Trivandrum
Type: Full-time
Experience Level: Mid-Senior (3+ years)
About the Role
We are seeking a skilled Data Engineer to design and implement robust, scalable data pipelines for processing and transforming log data stored in Elasticsearch. You will play a key role in building the data pipeline for our advanced ML-powered behavioural anomaly detection platform.
This role also involves designing and maintaining the feature engineering pipeline, including integration with a feature store such as Feast, and ensuring high-quality, low-latency data delivery for ML models. If you have strong experience with the ELK stack, Python, and modern data architectures, and are excited by the intersection of AI and cybersecurity, this role is for you.
Key Responsibilities
ETL Pipeline Development:
- Build scalable ETL workflows to extract raw logs from Elasticsearch.
- Clean, normalize, and transform logs into structured features for ML use cases (see the sketch after this list).
- Maintain data freshness with batch or near-real-time workflows.
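For a sense of the day-to-day work, here is a minimal sketch of such an extract-and-transform step, assuming elasticsearch-py 8.x; the index name, field names, and the derived feature are illustrative, not our actual schema.

```python
# Minimal ETL sketch: pull recent NGINX logs from Elasticsearch and
# normalize them into a flat feature frame. The index name, field names,
# and the derived feature below are hypothetical.
import pandas as pd
from elasticsearch import Elasticsearch
from elasticsearch.helpers import scan

es = Elasticsearch("http://localhost:9200")

# Stream all hits from the last 15 minutes without deep pagination.
hits = scan(
    es,
    index="nginx-logs-*",
    query={"query": {"range": {"@timestamp": {"gte": "now-15m"}}}},
)

rows = [h["_source"] for h in hits]
df = pd.json_normalize(rows)

# Basic cleaning: parse timestamps, drop malformed rows.
df["@timestamp"] = pd.to_datetime(df["@timestamp"], errors="coerce")
df = df.dropna(subset=["@timestamp", "client_ip"])

# One illustrative engineered feature: request count per IP in the window.
features = (
    df.groupby("client_ip")
    .agg(request_count=("@timestamp", "size"))
    .reset_index()
)
```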
Feature Store Integration:
- Design schemas for storing derived features in a feature store (e.g., Feast); a sketch follows this list.
- Collaborate with ML engineers to ensure features are aligned with model requirements.
- Manage historical feature backfills and real-time lookups.
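To illustrate the feature-store side, a Feast definition for the kind of per-IP features above might look like the sketch below, assuming a recent Feast release (0.26+); the entity, field names, TTL, and source path are assumptions for illustration only.

```python
# Sketch of a Feast feature view for per-IP request features.
# Names, TTL, and the Parquet source path are illustrative assumptions.
from datetime import timedelta

from feast import Entity, FeatureView, Field, FileSource
from feast.types import Int64

client_ip = Entity(name="client_ip", join_keys=["client_ip"])

request_stats_source = FileSource(
    path="data/request_stats.parquet",  # hypothetical offline store file
    timestamp_field="event_timestamp",
)

request_stats = FeatureView(
    name="request_stats",
    entities=[client_ip],
    ttl=timedelta(hours=1),  # bound staleness for online lookups
    schema=[Field(name="request_count", dtype=Int64)],
    source=request_stats_source,
)
```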
Data Infrastructure and Architecture:
- Optimize Elasticsearch queries and index management for performance and cost (see the ILM sketch after this list).
- Design data schema, partitioning, and retention policies for long-term storage.
- Ensure data integrity, versioning, and reproducibility of transformed data.
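As one example of the index-management and retention work, an ILM policy that rolls log indices over and eventually deletes them could be registered roughly as follows; this is a sketch assuming elasticsearch-py 8.x, and the rollover and retention thresholds are placeholders.

```python
# Sketch: register an ILM policy that rolls indices over at 50 GB / 1 day
# and deletes them after 30 days. All thresholds are placeholder values.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

es.ilm.put_lifecycle(
    name="logs-retention",
    policy={
        "phases": {
            "hot": {
                "actions": {
                    "rollover": {
                        "max_primary_shard_size": "50gb",
                        "max_age": "1d",
                    }
                }
            },
            "delete": {
                "min_age": "30d",
                "actions": {"delete": {}},
            },
        }
    },
)
```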
Monitoring and Scaling:
- Implement monitoring for pipeline performance and failures (see the Airflow sketch after this list).
- Scale pipelines to support growing log volumes (on the order of hundreds of GBs per day).
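On the orchestration and monitoring side, a scheduled pipeline with retries and a failure hook might be wired up as in this sketch, assuming Airflow 2.4+; the task bodies and the alerting callback are hypothetical stubs.

```python
# Sketch of an Airflow DAG wiring extract -> transform with retries and
# a failure callback. notify_on_failure and the task bodies are stubs.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def notify_on_failure(context):
    # Hypothetical hook: push the failed task id to an alerting channel.
    print(f"pipeline failure: {context['task_instance'].task_id}")


def extract_logs():
    ...  # pull raw logs from Elasticsearch


def build_features():
    ...  # transform logs and write features to the store


with DAG(
    dag_id="log_feature_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="*/15 * * * *",  # near-real-time: every 15 minutes
    catchup=False,
    default_args={
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
        "on_failure_callback": notify_on_failure,
    },
) as dag:
    extract = PythonOperator(task_id="extract_logs", python_callable=extract_logs)
    features = PythonOperator(task_id="build_features", python_callable=build_features)
    extract >> features
```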
Collaboration:
- Work closely with security analysts and AI engineers to translate behavioural insights into engineered features.
- Document data lineage, transformation logic, and data dictionaries.
Minimum Qualifications
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.
- 3+ years of experience in data engineering roles with Python and Elasticsearch.
- Strong experience building data pipelines using:
- Python (pandas, elasticsearch-py; PySpark is a bonus)
- Orchestration tools (e.g., Apache Airflow, Prefect)
- Familiarity with log processing, especially NGINX and Apache logs, HTTP protocols, and cybersecurity-relevant fields (IPs, headers, user agents).
- Experience with feature stores such as Feast, Tecton, or custom-built systems.
- Solid understanding of data modeling, versioning, and time-series data handling.
- Knowledge of DevOps practices (Docker, Git, CI/CD workflows).
Nice to Have
- Experience with Kafka, Fluentd, or Logstash pipelines.
- Experience deploying data workloads on cloud environments (AWS/GCP/Azure).
- Exposure to anomaly detection or cybersecurity ML systems.
- Familiarity with ML workflows, model deployment, and MLOps.
Job Type: Full-time
Pay: ₹35,000.00 - ₹60,000.00 per month
Benefits:
- Health insurance
- Provident Fund
Schedule:
- Day shift
- Monday to Friday
Experience:
- Data engineering: 3 years (Required)
Work Location: In person