Pune, Maharashtra, India
Information Technology
Full-Time
NVIDIA
Overview
Joining NVIDIA's DGX Cloud AI Efficiency Team means contributing to the infrastructure that powers our innovative AI research. This team focuses on optimizing the efficiency and resiliency of AI workloads, as well as developing scalable AI and data infrastructure tools and services. Our objective is to deliver a stable, scalable environment for AI researchers, providing them with the resources and scale they need to foster innovation. We are seeking excellent Software Engineers to design and develop tools for AI application performance analysis. Your work will enable AI researchers to work efficiently with a wide variety of DGX Cloud AI systems as they seek out opportunities for performance optimization and continuously deliver high-quality AI products. Join our technically diverse team of AI infrastructure experts to unlock unprecedented AI performance in every domain.
JR2006293
What You'll Be Doing
- Develop AI performance tools for large-scale AI systems, providing real-time insight into application performance and system bottlenecks
- Conduct in-depth hardware-software performance studies
- Define performance and efficiency evaluation methodologies
- Automate performance data analysis and visualization to convert profiling data into actionable optimizations
- Support deep learning software engineers and GPU architects in their performance analysis efforts
- Work with various teams at NVIDIA to incorporate and influence the latest technologies for GPU performance analysis
What We Need To See
- 8+ years of experience in software infrastructure and tools
- BS or higher degree in computer science or similar (or equivalent experience)
- Strong programming skills in multiple languages, including C++ and Python
- Solid foundation in operating systems and computer architecture
- Outstanding ability to understand users, prioritize among many contending requests, and build consensus
- Passion for “it just works” automation, eliminating repetitive tasks, and enabling team members
Ways To Stand Out From The Crowd
- Experience working with large-scale AI clusters
- Experience with CUDA and GPU computing systems
- Hands-on experience with deep learning frameworks (TensorFlow, PyTorch, JAX/XLA, etc.)
- Deep understanding of the software performance analysis and optimization process