Overview
About the Position
As a Mid-level Data Engineer on the MIDAS (Management Integration & Data Analytics System) Data Platform Team, you will build from scratch and maintain the central data hub connecting most of the systems inside one of Japan's most innovative digital banks.
You will work with modern cloud-based data technologies to ingest data from various banking systems, apply complex business logic to it, and then serve it to downstream systems for enterprise management, regulatory reporting, risk management, and many other applications.
Because of the high standards of the banking domain, you will have the opportunity to work on complex data engineering challenges, including data quality, reconciliation across multiple systems, time-critical data processing, and complete traceability.
This is a mid-level position where you will work with increasing independence on data pipeline development, while collaborating closely with senior engineers and the technical lead for guidance on complex problems.
This position involves employment with Money Forward, Inc., and a secondment to the new company (SMBC Money Forward Bank Preparatory Corporation). The evaluation system and employee benefits will follow the policies of Money Forward, Inc.
Who We Are
We are a startup team partnering with Sumitomo Mitsui Financial Group and Sumitomo Mitsui Banking Corporation to establish a new digital bank. Our mission is to build embedded financial products from the ground up, with a strong focus on supporting small and medium-sized businesses (SMBs).
Development Structure
We operate in a small, agile team while collaborating closely with partners from the banking industry. The MIDAS team is growing rapidly, with the aim of exceeding 10 data engineers within the year.
Technology Stack and Tools Used
- Cloud Infrastructure
  - AWS (primary cloud platform in the Tokyo region)
  - S3 for data lake storage with VPC networking for secure connectivity
  - AWS IAM for security and access management
- Data Lakehouse Architecture
  - Modern lakehouse architecture using Delta Lake or Apache Iceberg for ACID transactions, time travel, and schema evolution
  - Columnar storage formats (Parquet) optimized for analytics
  - Bronze/Silver/Gold medallion architecture for progressive data refinement
  - Partitioning strategies and Z-ordering for query performance
- Orchestration & Processing
  - Managed workflow orchestration platforms (Amazon MWAA/Apache Airflow, Databricks Workflows, or similar); an illustrative sketch follows this list
  - Distributed data processing with Apache Spark
  - Serverless compute options for cost optimization
  - Streaming and batch ingestion patterns (Auto Loader, scheduled jobs)
- Data Transformation
  - dbt (data build tool) for SQL-based analytics engineering
  - Delta Live Tables or AWS Glue for declarative ETL pipelines
  - SQL and Python for data transformations
  - Incremental materialization strategies for efficiency
- Query & Analytics
  - Serverless query engines (Amazon Athena, Databricks SQL, or Redshift Serverless)
  - Auto-scaling compute for variable workloads
  - Query result caching and optimization
  - REST APIs for data serving to downstream consumers
- Data Quality & Governance
  - Automated data quality frameworks (AWS Glue Data Quality, Delta Live Tables expectations, Great Expectations)
  - Cross-system reconciliation and validation logic
  - Fine-grained access control with column/row-level security (AWS Lake Formation or Unity Catalog)
  - Automated data lineage tracking for regulatory compliance
  - Audit logging and 10-year data retention policies
- Business Intelligence
  - Amazon QuickSight and/or Databricks SQL Dashboards
  - Integration with enterprise BI tools (Tableau, Power BI, Looker)
- Development & DevOps
  - Languages: SQL (primary), Python
  - Version Control: GitHub
  - CI/CD: GitHub Actions
  - Infrastructure as Code: Terraform
  - Monitoring: CloudWatch, Databricks monitoring, or similar
  - AI-Assisted Development: Claude Code, GitHub Copilot, ChatGPT
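To give a concrete feel for this stack, here is a minimal sketch of the kind of daily orchestration workflow described above, assuming Airflow 2.4+ (as on Amazon MWAA). The DAG id, task ids, and dbt project path are illustrative placeholders, not actual MIDAS components.

```python
# Minimal sketch of a daily ingestion-and-transformation DAG (assumes Airflow 2.4+).
# All identifiers (DAG id, task ids, dbt project path) are hypothetical examples.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def ingest_to_bronze(**_):
    """Placeholder: land the nightly source-system extract in the S3 bronze layer."""


with DAG(
    dag_id="daily_core_banking_refresh",
    start_date=datetime(2024, 1, 1),
    schedule="0 2 * * *",  # once daily, after the source systems finish their batch close
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=10)},
) as dag:
    ingest = PythonOperator(task_id="ingest_to_bronze", python_callable=ingest_to_bronze)

    # dbt rebuilds the silver/gold models incrementally on top of the fresh bronze data.
    transform = BashOperator(
        task_id="dbt_run",
        bash_command="dbt run --project-dir /opt/dbt/midas_example",
    )

    # dbt tests act as a first line of data quality validation before downstream serving.
    validate = BashOperator(
        task_id="dbt_test",
        bash_command="dbt test --project-dir /opt/dbt/midas_example",
    )

    ingest >> transform >> validate
```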
Responsibilities
- Develop and maintain data ingestion pipelines from multiple banking source systems
- Build data transformations to ensure data quality, consistency, and business logic correctness
- Set up and maintain orchestration workflows for scheduled data processing
- Implement data quality checks and validation rules based on business requirements (see the reconciliation sketch after this list)
- Develop and maintain API interfaces for data serving to downstream systems
- Set up BI tool integrations and develop reports and dashboards
- Write tests for data pipelines and transformations
- Monitor scheduled jobs, troubleshoot failures, and implement fixes
- Optimize data pipeline performance and query efficiency
- Document data flows, transformation logic, and system configurations
- Participate in code reviews and collaborate with team members
- Learn banking domain concepts and regulatory requirements
- Contribute to team knowledge sharing and best practices
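As one illustration of the reconciliation work listed above, the sketch below compares a landed source extract with its bronze table using PySpark. The bucket, table names, columns, and date are hypothetical stand-ins for real banking feeds.

```python
# Illustrative reconciliation check between a raw extract and the bronze table built from it.
# Paths, table names, and columns are hypothetical; assumes a Spark session with S3 access.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("bronze_reconciliation").getOrCreate()

source = spark.read.parquet("s3://example-landing/core_banking/transactions/dt=2024-01-01/")
bronze = spark.table("bronze.core_banking_transactions").filter(F.col("dt") == "2024-01-01")

src = source.agg(F.count("*").alias("rows"), F.sum("amount").alias("total")).first()
brz = bronze.agg(F.count("*").alias("rows"), F.sum("amount").alias("total")).first()

# Row counts must match exactly, and monetary totals must agree to the smallest currency unit.
if src["rows"] != brz["rows"]:
    raise ValueError(f"Row count mismatch: source={src['rows']}, bronze={brz['rows']}")
if src["total"] != brz["total"]:
    raise ValueError(f"Amount total mismatch: source={src['total']}, bronze={brz['total']}")

print("Reconciliation passed for dt=2024-01-01")
```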
Requirements
- 2-5 years of experience in data engineering or analytics engineering
- Strong proficiency in SQL and working knowledge of Python
- Hands-on experience building data pipelines using tools like Airflow, dbt, or similar
- Experience with cloud platforms (AWS, Azure, or GCP) and object storage (S3, ADLS, GCS)
- Understanding of data modeling concepts including dimensional modeling and fact/dimension tables
- Experience with data quality validation and testing
- Ability to debug and troubleshoot data pipeline issues
- Experience with version control (Git) and basic understanding of CI/CD concepts
- Understanding of data governance basics: access control and audit logging
- Good problem-solving skills and ability to work with moderate independence
- Good communication skills and willingness to ask questions when blocked
- Bachelor's degree in Computer Science, Engineering, Mathematics, or a related field, or equivalent practical experience
- Language ability: Japanese at Business level and/or English at Business level (TOEIC score of 700 or above)
While not specifically required, please tell us if you have any of the following:
- Experience in financial services, fintech, or regulated industries
- Basic knowledge of banking domain concepts: core banking, payments, or regulatory reporting
- Exposure to data platforms in regulated environments (FISC Guidelines, GDPR, APPI)
- Hands-on experience with Databricks platform or AWS native data services
- Experience with performance tuning: partitioning strategies, file formats, query optimization
- Experience building REST APIs with Python (FastAPI, Flask, or similar); a short sketch follows this list
- Knowledge of streaming data pipelines (Kafka, Kinesis, or similar)
- Basic experience with Terraform
- Experience with BI tools (QuickSight, Tableau, Looker, Power BI)
- Experience with data visualization and dashboard design
- Interest in obtaining certifications (AWS Certified Data Analytics, Databricks certifications)
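For candidates curious about the API side of the role, here is a short, purely hypothetical FastAPI sketch of a data-serving endpoint. In the real platform a handler like this would query the lakehouse (Athena or Databricks SQL) rather than an in-memory dictionary, and all names and fields below are placeholders.

```python
# Hypothetical data-serving endpoint sketch; model fields, routes, and data are placeholders.
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="example-data-serving-api")


class AccountBalance(BaseModel):
    account_id: str
    balance: int        # JPY has no minor unit, so an integer amount is used here
    as_of_date: str


# Stand-in for a gold-layer table; a real handler would query Athena or Databricks SQL.
_BALANCES = {
    "ACC-001": AccountBalance(account_id="ACC-001", balance=125000, as_of_date="2024-01-01"),
}


@app.get("/v1/accounts/{account_id}/balance", response_model=AccountBalance)
def get_balance(account_id: str) -> AccountBalance:
    record = _BALANCES.get(account_id)
    if record is None:
        raise HTTPException(status_code=404, detail="account not found")
    return record
```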
This role offers exceptional learning opportunities:
- Cloud Engineering: Hands-on experience with AWS services (S3, IAM, VPC networking) and cloud-native architectures
- Lakehouse Technologies: Deep dive into Delta Lake or Apache Iceberg including ACID transactions, time-travel queries, and schema evolution
- Data Orchestration: Build production workflows with Apache Airflow or Databricks Workflows including DAG design, dependency management, and error handling
- Analytics Engineering: Master dbt (data build tool) for SQL-based transformations, incremental models, and data testing
- Data Processing: Work with Apache Spark for distributed data processing and learn optimization techniques
- Data Modeling: Learn dimensional modeling, slowly changing dimensions (SCD), fact/dimension tables, and star schema design
- Banking Domain: Understand core banking systems, payment flows, regulatory reporting (FSA/BOJ), and financial reconciliation
- Data Quality: Implement validation frameworks, cross-system reconciliation, and automated testing for data pipelines
- Governance & Compliance: Experience with fine-grained access control, audit logging, data lineage tracking, and regulatory compliance (FISC Guidelines)
- Performance Optimization: Learn query optimization, partitioning strategies, Z-ordering, and cost management for cloud data platforms (a brief sketch follows this list)
- Professional Development: Mentorship from experienced data engineers and architects, code review practices, and engineering best practices
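As a taste of the performance-optimization topics above, the following sketch partitions a Delta table by business date and Z-orders it on a frequent filter column. Table and column names are hypothetical, and the OPTIMIZE ... ZORDER BY command assumes Databricks or open-source Delta Lake 2.0+.

```python
# Hypothetical example of partitioning and Z-ordering a Delta table.
# Assumes a Spark session with Delta Lake support (Databricks, or OSS Delta Lake >= 2.0);
# table and column names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("silver_optimization").getOrCreate()

# Partition the silver table by business date so daily jobs prune to a single partition.
(
    spark.table("bronze.core_banking_transactions")
    .write.format("delta")
    .mode("overwrite")
    .partitionBy("dt")
    .saveAsTable("silver.core_banking_transactions")
)

# Co-locate rows for the most common filter column within each partition.
spark.sql("OPTIMIZE silver.core_banking_transactions ZORDER BY (account_id)")
```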
We are committed to your professional growth:
- Clear progression path from Mid-level → Senior Data Engineer → Technical Lead
- Regular 1-on-1s with PM and Tech Lead for feedback and career planning
- Opportunities to lead features and projects as you gain experience
- Support for certifications and training (AWS, Databricks, dbt)
- Exposure to architecture decisions and system design discussions
- Increasing ownership of data platform components
- Potential to specialize in areas of interest (data governance, real-time streaming, ML infrastructure, cost optimization)
- Mentorship opportunities as the team grows