Gurugram, Haryana, India
Information Technology
Full-Time
TekWissen India
Overview
TekWissen Group is a workforce management provider operating throughout India and several other countries worldwide. The client below is a leading technology company offering a range of IT solutions to businesses and organizations, enabling them to transform their digital futures.
Position: Senior Software Engineer
Location: Hyderabad
Duration: 24 Months
Job Type: Contract
Work Type: Hybrid
Shift Timings: 9:00 AM-6:00 PM
Job Description
- Install, configure, and administer data ingestion and transformation platforms including StreamSets, Apache Spark, Informatica, and Apache NiFi for maximum utilization and throughput.
- Perform StreamSets Data Collector and Control Hub administration including pipeline deployment, monitoring, and optimization across multiple environments.
- Experience in designing and implementing real-time and batch data pipelines using Apache Spark with Python (PySpark), with Scala knowledge as an added advantage.
- Hands-on experience with Apache NiFi for data flow automation, including processor configuration, flow management, and cluster coordination.
- Experience in upgrading and migrating data ingestion tools and pipelines to higher versions while maintaining data integrity and minimal downtime.
- Highly proficient in Python programming for data processing, pipeline development, and automation scripting.
- Expert-level SQL skills for complex data transformations, performance optimization, and database interactions across various RDBMS platforms.
- Experience with diverse database technologies and data sources:
  - MPP databases: Teradata and Greenplum for large-scale analytical workloads
  - OLTP databases: Oracle, SQL Server, and PostgreSQL for transactional processing
  - In-memory databases: SAP HANA and MemSQL for high-performance analytics
  - Big Data platforms: Hive, HBase, and Kudu for distributed data storage and processing
  - Additional data sources, including SFDC and modern cloud data platforms
- Ability to write efficient, scalable code following best practices and coding standards for data engineering projects.
- Extensive experience with automated testing integration using CI/CD tools, preferably GitLab (Jenkins experience is beneficial).
- Implement and maintain automated deployment pipelines for data ingestion and transformation workflows.
- Experience in version control, branching strategies, and collaborative development practices for data engineering projects.
- Knowledge of infrastructure as code and containerization technologies for deployment automation.
- Perform data pipeline deployments across versioned repositories from development to production environments.
- Experience with TLS hardening and security best practices for data platforms and APIs.
- Proficient in using command-line tools and APIs for platform administration and monitoring.
- Create and maintain backup strategies, and perform restores, for data pipelines, configurations, and metadata repositories.
- Expertise in troubleshooting performance bottlenecks in data processing workflows and implementing optimization strategies.
- Experience in cluster maintenance and distributed system administration for Spark and other big data technologies.
- Production environment health monitoring, alerting, and incident response for data pipelines.
- Root cause analysis of pipeline failures with comprehensive documentation of issues and resolutions.
- Develop and enforce data engineering coding standards and best practices across development teams.
- Experience with data quality validation, testing frameworks, and automated data validation processes.
- Knowledge of data governance principles and implementation of data lineage tracking.
- Scheduling and orchestration of data workflows with proper error handling and retry mechanisms.
Expertise in Data Ingestion and Transformation tools and technologies
- StreamSets
- Spark (Python preferable, Scala good to have)
- Informatica
- Apache NiFi
- Highly proficient in Python and SQL
- Working experience with automated test integration using CI/CD tools
- GitLab preferable (Jenkins good to have)
- OLTP databases such as Oracle, SQL Server, and PostgreSQL
- In-memory databases such as SAP HANA and MemSQL
- Hive, HBase and Kudu
- Total Exp: 5-8 years
- Relevant Exp: 5+ years