Badarpur, UP, India
Information Technology
Full-Time
EXL Service
Overview
Job Description: We are looking for a Senior Spark solution developer to design and build solution accelerators and code generation frameworks for one of our customer programs, which aims to build a Business Rules Engine for data standardization and curation on a Hadoop cluster. This high-visibility, fast-paced key initiative will integrate data across internal and external sources, provide analytical insights, and integrate with the customer’s critical systems.
Responsibilities:
- Design and build a Python-based code generation framework and runtime engine that reads the Business Rules repository in order to:
  - Generate PySpark runtime executable code for all business rules stitched together
  - Orchestrate the pipeline of runtime executables as per standardization and curation needs on the Hadoop cluster
- Build Spark code generation optimizers that factor in rule processing patterns and generate code with minimal intermediate DataFrames and persistence.
- Build PySpark-based applications for both batch and streaming requirements, which requires in-depth knowledge of most Hadoop and NoSQL databases.
- Design a graph-based recursive model for capturing Business Rules metadata in a JSON format
- Build recursive parsers of JSON and XML documents and objects for metadata-driven code generation on PySpark
- Develop and execute data pipeline testing processes and validate business rules and policies
- Optimize the performance of Spark applications on Hadoop by tuning configurations for SparkContext, Spark SQL, DataFrames, and pair RDDs.
- Optimize performance for data access requirements by choosing appropriate native Hadoop file formats (Avro, Parquet, ORC, etc.) and compression codecs.
- Participate in the agile development process, and document and communicate issues and bugs related to data standards
- Create and maintain an integration and regression testing framework in Jenkins integrated with Bitbucket and Git repositories
- Develop & review technical documentation for artifacts delivered
- Pair up with other data engineers to develop analytic applications leveraging Big Data technologies: Hadoop, NoSQL, and In-memory Data Grids
Qualifications:
- Bachelor's degree in a quantitative field (such as Engineering, Computer Science, Statistics, Econometrics) and a minimum of 5 years of experience
- Minimum 5 years of extensive experience designing, building, and deploying Python-based applications
- Minimum 3 years of experience building and deploying Big Data applications using Spark SQL and Spark Streaming in Python
- Expertise in graph algorithms and advanced recursion techniques
- Expertise in handling complex, large-scale Big Data environments (preferably 20 TB+)
- Minimum 3 years of experience with the following: Hive, YARN, Kafka, HBase, MongoDB