Apply for Senior Site Reliability Engineer - AI Research Clusters

  • Company name: NVIDIA
  • Working location: Office Location
  • Job type: Full Time

Experience: 5 years required

Pay: Salary information not included

Type: Full Time

Location: Haryana

Skills: Performance analysis, Cluster operations, Real-time monitoring, Logging, Bash scripting, Kubernetes, Docker, InfiniBand, RDMA, Lustre, GPFS, GPU computing, AI infrastructure, Site reliability engineering, Optimizations, Deep learning workflows, Large-scale automation solutions, Alerting, Python programming, Cluster configuration management, Terraform, Enroot, AI/HPC schedulers, NVIDIA GPUs, CUDA programming, MLPerf benchmarking, Cloud deployment, Multi-cloud experience
