DevOps Engineer Albatronix

  • Company: Albatronix
  • Work location: Office
  • Job type: Full Time

Experience: 5 years required

Pay: Salary information not included

Location: Karnataka

Skills: Cloud Services, containerization, Automation tools, version control, devops

About Albatronix

Job Description

About The Opportunity

We're a deep-tech innovator at the intersection of artificial intelligence, machine-learning infrastructure, and edge-to-cloud platforms. Our award-winning solutions let Fortune 500 enterprises build, train, and deploy large-scale AI models seamlessly, securely, and at lightning speed. As global demand for generative AI, RAG pipelines, and autonomous agents accelerates, we're scaling our MLOps team to keep our customers two steps ahead of the curve.

Role & Responsibilities

  • Own the full MLOps stack: design, build, and harden GPU-accelerated Kubernetes clusters across on-prem data centers and AWS/GCP/Azure for model training, fine-tuning, and low-latency inference.
  • Automate everything: craft IaC modules (Terraform/Pulumi) and CI/CD pipelines that deliver zero-downtime releases and reproducible experiment tracking.
  • Ship production-grade LLM workloads: optimize RAG/agent pipelines, manage model registries, and implement self-healing workflow orchestration with Kubeflow/Flyte/Prefect.
  • Eliminate bottlenecks: profile CUDA, resolve driver mismatches, and tune distributed frameworks (Ray, DeepSpeed) for multi-node scale-out.
  • Champion reliability: architect highly available data lakes, databases, ingress/egress, DNS, and end-to-end observability (Prometheus/Grafana) targeting 99.99% uptime.
  • Mentor and influence: instill a platform-first mindset, codify best practices, and report progress and roadblocks directly to senior leadership.

Skills & Qualifications

Must-Have

  • 5+ years of DevOps/Platform experience with Docker and Kubernetes; expert Bash/Python/Go scripting.
  • Hands-on experience building ML infrastructure for distributed GPU training and scalable model serving.
  • Deep fluency in cloud services (EKS/GKE/AKS), networking, load balancing, RBAC, and Git-based CI/CD.
  • Proven mastery of IaC and configuration management (Terraform, Pulumi, Ansible).

Preferred

  • Production experience with LLM fine-tuning, RAG architectures, or agentic workflows at scale.
  • Exposure to Kubeflow, Flyte, Prefect, or Ray; a track record of setting up observability and data-lake pipelines (Delta Lake, Iceberg).