
Overview
Responsible for designing, developing, and optimizing data processing solutions using a combination of Big Data technologies. The focus is on building scalable, efficient data pipelines that handle large datasets and enable both batch and real-time data streaming and processing.
Responsibilities:
> Develop Spark applications using Scala or Python (PySpark) for data transformation, aggregation, and analysis.
> Develop and maintain Kafka-based data pipelines: Design Kafka Streams applications, set up Kafka clusters, and ensure efficient data flow.
> Create and optimize Spark applications using Scala and PySpark: Leverage these languages to process large datasets and implement data transformations and aggregations.
> Integrate Kafka with Spark for real-time processing: Build systems that ingest real-time data from Kafka and process it using Spark Streaming or Structured Streaming (see the sketch after this list).
> Collaborate with data teams: Work with data engineers, data scientists, and DevOps to design and implement data solutions.
> Tune and optimize Spark and Kafka clusters: Ensure high performance, scalability, and efficiency of data processing workflows.
> Write clean, functional, and optimized code: Adhere to coding standards and best practices.
> Troubleshoot and resolve issues: Identify and address problems related to Kafka and Spark applications.
> Maintain documentation: Create and maintain documentation for Kafka configurations, Spark jobs, and other processes.
> Stay updated on technology trends: Continuously learn and apply new advancements in functional programming, big data, and related technologies.
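Illustrative example (not part of the formal requirements): the sketch below shows, under assumed names, what the Kafka-Spark integration described above might look like using PySpark Structured Streaming. The broker address, topic name, event schema, and column names are hypothetical, and running it requires the spark-sql-kafka connector package on the Spark classpath.

```python
# Minimal sketch: ingest events from a Kafka topic, parse the JSON payload,
# and compute a windowed aggregation. All names below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col, window
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = (
    SparkSession.builder
    .appName("kafka-structured-streaming-sketch")
    .getOrCreate()
)

# Hypothetical schema of the JSON messages on the topic.
event_schema = StructType([
    StructField("account_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Ingest real-time data from Kafka (broker and topic are placeholders).
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "transactions")
    .load()
)

# Transform: parse the JSON value, then aggregate amounts per account
# over 5-minute event-time windows, allowing 10 minutes of late data.
events = (
    raw.select(from_json(col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
)

totals = (
    events
    .withWatermark("event_time", "10 minutes")
    .groupBy(window(col("event_time"), "5 minutes"), col("account_id"))
    .sum("amount")
)

# Write the running aggregates to the console for demonstration purposes;
# a real pipeline would write to a sink such as Kafka, Hive, or HDFS.
query = (
    totals.writeStream
    .outputMode("update")
    .format("console")
    .option("truncate", "false")
    .start()
)
query.awaitTermination()
```

A Scala version would use the same readStream/writeStream API; in practice the language choice typically follows the team's existing codebase.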
Proficiency in:
Hadoop ecosystem big data stack (HDFS, YARN, MapReduce, Hive, Impala).
Spark (Scala, Python) for data processing and analysis.
Kafka for real-time data ingestion and processing.
ETL processes and data ingestion tools.
Deep hands-on expertise in PySpark, Scala, and Kafka.
Programming Languages:
Scala, Python, or Java for developing Spark applications.
SQL for data querying and analysis.
Other Skills:
Data warehousing concepts.
Linux/Unix operating systems.
Problem-solving and analytical skills.
Version control systems.
Job Family Group:
Technology
Job Family:
Applications Development
Time Type:
Full time
Most Relevant Skills
Please see the requirements listed above.
Other Relevant Skills
For complementary skills, please see above and/or contact the recruiter.
Citi is an equal opportunity employer, and qualified candidates will receive consideration without regard to their race, color, religion, sex, sexual orientation, gender identity, national origin, disability, status as a protected veteran, or any other characteristic protected by law.
If you are a person with a disability and need a reasonable accommodation to use our search tools and/or apply for a career opportunity, review Accessibility at Citi.
View Citi’s EEO Policy Statement and the Know Your Rights poster.