Senior Research Engineer - Performance Optimization
Company: Luma AI
Location: San Jose
Posted on: June 1, 2025
Job Description:
We are looking for engineers with significant problem-solving
experience in PyTorch, CUDA and distributed systems. You will work
with Research Scientists to build & train cutting-edge foundation
models on thousands of GPUs.
Responsibilities
- Ensure efficient implementation of models & systems for data
processing, training, inference and deployment.
- Identify and implement optimization techniques for massively
parallel and distributed systems.
- Identify and remedy efficiency bottlenecks (memory, speed,
utilization) by profiling and implementing high-performance CUDA,
Triton, C++ and PyTorch code.
- Work closely with the research team to ensure systems are
designed to be as efficient as possible from start to finish.
- Build tools to visualize, evaluate and filter datasets.
- Implement cutting-edge product prototypes based on multimodal
generative AI.
Experience
- Experience training large models using Python & PyTorch,
including practical experience working with the entire development
pipeline from data processing, preparation & data loading to
training and inference.
- Experience optimizing and deploying inference workloads for
throughput and latency across the stack (inputs, model inference,
outputs, parallel processing etc.).
- Experience with profiling CPU & GPU code in PyTorch, including
Nvidia Nsight or similar.
- Experience writing & improving highly parallel & distributed
PyTorch code, with familiarity in DDP, FSDP, Tensor Parallel,
etc.
- Experience writing high-performance parallel C++. Bonus if done
in an ML context with PyTorch, such as for data loading, data
processing, or inference code.
- Experience with high-performance Triton / CUDA and writing
custom PyTorch kernels. Top candidates will be able to utilize
tensor cores, optimize CUDA memory usage, and apply similar
low-level optimization skills.
- Good to have: experience with deep learning concepts such as
Transformers and multimodal generative models such as diffusion
models and GANs.
- Good to have experience building inference / demo prototype
code (incl. Gradio, Docker, etc.).
Compensation
- The pay range for this position in California is $180,000 -
$250,000/yr; however, base pay offered may vary depending on
job-related knowledge, skills, candidate location, and experience.
We also offer competitive equity packages in the form of stock
options and a comprehensive benefits plan.
- $200,000 - $280,000 a year. In addition to cash base pay, you'll
also receive a sizable grant of Luma's equity. The pay range for
this position is for the Bay Area.
Your applications are reviewed by real people.