Sr Manager - Platform Engineering
Tata Communications
Office Location
Full Time
Experience: 7 years required
Pay:
Salary Information not included
Type: Full Time
Location: Chennai
Skills: Linux, Kubernetes, Ansible, Scripting, BCM, CUDA, AWS, Azure, Jenkins, Git, SonarQube, Bugzilla, VLAN, InfiniBand, Routing, Firewall, NFS, HP, Zabbix, ServiceNow (SNOW), Slurm, NVIDIA BCM, ELK, NVIDIA GPU, Python Scripting, LLM, Generative AI, Google Cloud, Harbor Registry, VXLAN, IP Subnetting, DDN, Parallel FS, Object Storage, Dell, Prometheus, Grafana
Job Description
You will be part of the AI/HPC engineering team, which specializes in platform standardization, innovation, testing, and optimization of AI technologies. The role involves installing, administering, and troubleshooting technology stacks such as Linux, Kubernetes, Slurm, and NVIDIA BCM, along with open-source infrastructure tools like Ansible and scripting, and it calls for strong analytical skills.

Qualified candidates hold a B.E/B.Tech degree and have more than 7 years of experience in the IT infrastructure industry, including 7 to 8 years in HPC and/or AI technology. You should have strong knowledge of scripting and Linux, with a minimum of 2 years of Kubernetes experience.

Your responsibilities will include managing, installing, configuring, deploying, troubleshooting, and administering HPC and open-source software such as BCM, Slurm, Ansible, and ELK. You should have a good grasp of the Linux OS and scripting, knowledge of BCM, NVIDIA GPUs, and CUDA, and experience writing Ansible playbooks and managing HPC environments. You should also have experience deploying and managing DevOps tools such as Jenkins, Git, SonarQube, Bugzilla, and Harbor Registry.

The following will be beneficial:
- Exposure to Python scripting.
- Familiarity with at least one LLM/Generative AI and GPU offering on a public cloud such as AWS, Azure, or Google Cloud.
- Knowledge of networking concepts such as VLAN, VXLAN, InfiniBand, IP subnetting, routing, and firewalls.
- Knowledge of storage technologies such as DDN, parallel file systems, object storage, and NFS.
- Familiarity with infrastructure components such as HP/Dell rack servers and GPUs.
- Familiarity with management and monitoring tools such as Zabbix, Prometheus, Grafana, and ServiceNow.