Senior Site Reliability Engineer - AI Research Clusters
NVIDIA
Office Location
Experience: 5 years required
Pay: Salary information not included
Type: Full Time
Location: Haryana
Skills: Performance Analysis, Cluster Operations, Real-Time Monitoring, Logging, Bash Scripting, Kubernetes, Docker, InfiniBand, RDMA, Lustre, GPFS, GPU Computing, AI Infrastructure, Site Reliability Engineering, Optimizations, Deep Learning Workflows, Large-Scale Automation Solutions, Alerting, Python Programming, Cluster Configuration Management, Terraform, Enroot, AI/HPC Schedulers, NVIDIA GPUs, CUDA Programming, MLPerf Benchmarking, Cloud Deployment, Multi-Cloud Experience