IT Engineer 4 | Antal Tech Jobs
3 Weeks ago

IT Engineer 4

Bangalore, Karnataka, India
Information Technology
Full-Time
Lam Research

Overview

The group you’ll be a part of

The Global Information Systems Group is dedicated to the success of Lam through providing best-in-class and innovative information system solutions and services. Together, we support users globally with data, information, and systems to achieve their business objectives.

The impact you’ll make

We are seeking an HPC Systems Engineer to lead the evaluation, deployment, and ongoing management of our large-scale CPU and GPU cluster environments. You will be the technical owner for the HPC system lifecycle, from initial hardware planning and installation to advanced performance tuning and troubleshooting. This role is highly collaborative, requiring you to work closely with Networking and Security teams to build a secure, high-speed foundational infrastructure that supports mission-critical research and engineering workloads.

What You’ll Do

  • Cluster Lifecycle Management: Lead the evaluation, planning, configuration, and physical/virtual deployment of multiple large-scale CPU + GPU clusters.
  • System Administration: Perform expert-level Linux system administration, including kernel tuning, security hardening, and OS lifecycle management (e.g., RHEL, Ubuntu, or Rocky Linux).
  • Workload Management: Act as the subject matter expert for SLURM, managing complex partitioning, resource quality of service (QoS), and scheduling optimization for mixed workloads.
  • Infrastructure Design: Architect and build the physical and logical infrastructure for HPC, including high-speed fabric integration (InfiniBand/Ethernet) and power/cooling planning.
  • Software Stack & Modules: Maintain and curate the HPC application stack using software management tools like Lmod or Tcl-based Environment Modules, ensuring researchers have access to optimized compilers, libraries (MPI, CUDA), and applications.
  • GPU Optimization: Spec and tune GPU environments (e.g., NVIDIA H100/B200), focusing on GPUDirect, NVLink topologies, and containerized runtimes like Apptainer/Singularity.
  • Troubleshooting & Performance: Conduct deep-dive root cause analysis for complex system failures and performance bottlenecks across compute, network, and software layers.
  • Cross-Functional Leadership: Own infrastructure projects, coordinating closely with Networking (low-latency fabric) and Security (compliance, identity management) to ensure all builds meet enterprise standards.

Who We’re Looking For

  • Experience with GPU-aware MPI implementations and performance profiling tools (e.g., NVIDIA Nsight, TAU).
  • Knowledge of container orchestration in HPC (e.g., Kubernetes for AI/ML workloads alongside SLURM).
  • Certifications such as RHCE (Red Hat Certified Engineer) or relevant NVIDIA/InfiniBand technical training.

Preferred Qualifications

  • Education: BS/MS in Computer Science, Electrical Engineering, or a related field.
  • HPC Experience: 6+ years of hands-on experience managing production-grade HPC clusters.
  • Scheduler Expertise: Deep proficiency in SLURM administration, including writing custom prolog/epilog scripts and managing GRES (Generic Resources) for GPUs.
  • Linux Mastery: Advanced knowledge of Linux internals, shell scripting (Bash), and at least one high-level language (Python or Go).
  • Automation: Extensive experience with configuration management and provisioning tools (e.g., Ansible, Terraform, xCAT, or Warewulf).
  • Networking: Familiarity with HPC-specific networking such as InfiniBand (NDR/HDR) and RoCE v2.

Our commitment

We believe it is important for every person to feel valued, included, and empowered to achieve their full potential. By bringing unique individuals and viewpoints together, we achieve extraordinary results.

Lam Research ("Lam" or the "Company") is an equal opportunity employer. Lam is committed to and reaffirms support of equal opportunity in employment and non-discrimination in employment policies, practices and procedures on the basis of race, religious creed, color, national origin, ancestry, physical disability, mental disability, medical condition, genetic information, marital status, sex (including pregnancy, childbirth and related medical conditions), gender, gender identity, gender expression, age, sexual orientation, or military and veteran status or any other category protected by applicable federal, state, or local laws. It is the Company's intention to comply with all applicable laws and regulations. Company policy prohibits unlawful discrimination against applicants or employees.

Lam offers a variety of work location models based on the needs of each role. Our hybrid roles combine the benefits of on-site collaboration with colleagues and the flexibility to work remotely, and fall into two categories: On-site Flex and Virtual Flex. In an ‘On-site Flex’ role, you’ll work 3+ days per week on-site at a Lam or customer/supplier location, with the opportunity to work remotely for the balance of the week. In a ‘Virtual Flex’ role, you’ll work 1-2 days per week on-site at a Lam or customer/supplier location, and remotely the rest of the time.
