AI AND MACHINE LEARNING ENGINEER - Dicetek LLC

Experience: At least 2 years required

Pay: Salary information not included

Type: Full Time

Location: All India

Skills: DevOps, containerization, Docker, Python, Bash, MLOps, backend engineering, DeepStream, GStreamer plugins, nvdsanalytics, nvstreammux, GPU scheduling, NVIDIA GPUs, TensorRT, mixed precision, CUDA toolkit, YOLO, CNNs, LLMs, CI/CD scripting, cloud GPUs, edge devices, Nsight Systems, DCGM, Triton Inference Server, distributed training, PyTorch DDP, DeepSpeed, frontend, REST/gRPC API design

Job Description

We are looking for one AI and Machine Learning Engineer to assist our Emerging Technologies team. The selected candidate will work offshore and should have the following qualifications and experience:

Must Have:

- At least 2 years of experience in MLOps, DevOps, or backend engineering for AI workloads.
- DeepStream 7.x power-user proficiency: pipelines, GStreamer plugins, nvdsanalytics, nvstreammux.
- Strong understanding of containerization (Docker) and GPU scheduling.
- Demonstrated track record of optimizing latency/throughput on NVIDIA GPUs (TensorRT, mixed precision, CUDA toolkit); see the TensorRT sketch after this section.
- Hands-on experience deploying YOLO or similar CNNs in a production environment.
- Familiarity with self-hosting and serving LLMs (vLLM, TensorRT-LLM, or similar), along with quantization, pruning, and distillation; see the vLLM sketch after this section.
- Proficiency in Python and Bash scripting, and confidence in CI/CD scripting.

Nice to Have:

- Exposure to cloud GPUs (AWS/GCP/Azure).
- Experience with edge devices such as NVIDIA Jetson (Xavier, Orin).
- Proficiency in performance profiling with Nsight Systems / DCGM.
- Knowledge of Triton Inference Server internals.
- Familiarity with distributed training (PyTorch DDP, DeepSpeed).
- Basic frontend and REST/gRPC API design skills.

Responsibilities:

- Build and automate inference pipelines:
  - Design, containerize, and deploy CV models (YOLO v8/v11, custom CNNs) with DeepStream 7.x, optimizing for the lowest latency and highest throughput on NVIDIA GPUs.
  - Migrate existing Triton workloads to DeepStream with minimal downtime.
- Serve and optimize large language models:
  - Self-host Llama 3.2, Llama 4, and future LLMs/VLMs on the cluster using best-practice quantization, pruning, and distillation techniques.
  - Expose fast, reliable APIs and monitoring for downstream teams.
- Continuous delivery and observability:
  - Automate build/test/release steps and set up health metrics, logs, and alerts to ensure model stability in production.
  - Efficiently allocate GPU resources across CV and LLM services.
- Model lifecycle support (10-20%): Assist data scientists with occasional fine-tuning or retraining runs and package models for production.
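
As a rough illustration of the TensorRT optimization work called out above, the sketch below builds an FP16 (mixed-precision) engine from an ONNX export of a detector. This is a minimal sketch under stated assumptions, not part of the role's actual pipeline: the file names yolo.onnx and yolo.engine are placeholders, and the calls follow the TensorRT 8.x Python API.

    import tensorrt as trt

    TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

    # Parse the ONNX export of the detector (file name is a placeholder).
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)
    with open("yolo.onnx", "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(parser.get_error(0))

    # Enable FP16 (mixed precision) and serialize the optimized engine.
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)
    engine = builder.build_serialized_network(network, config)
    with open("yolo.engine", "wb") as f:
        f.write(engine)

FP16 trades a small amount of numerical precision for substantially higher throughput on Tensor Core GPUs, and is the usual first optimization step before INT8 calibration.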
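
Similarly, as a sketch of the LLM self-hosting responsibility, the snippet below runs offline batch inference with vLLM. The checkpoint name is illustrative (any licensed Hugging Face checkpoint would do), and the calls follow vLLM's documented offline-inference API; a production deployment would more likely run vLLM's OpenAI-compatible server behind monitoring and an API gateway.

    from vllm import LLM, SamplingParams

    # Load the model onto the local GPU(s); the checkpoint name is a placeholder.
    llm = LLM(model="meta-llama/Llama-3.2-3B-Instruct")

    # Sampling settings for a quick smoke test of the deployment.
    params = SamplingParams(temperature=0.7, top_p=0.9, max_tokens=128)

    outputs = llm.generate(["Briefly explain what TensorRT does."], params)
    for out in outputs:
        print(out.outputs[0].text)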