Senior MLOps Engineer
Job Brief
Position: Senior MLOps Engineer
Role Objective
Ensure all production Machine Learning models are reliable, continuously monitored, retrainable, and fully auditable. This role exists to prevent silent model decay and reduce operational load on Data Science teams by architecting robust, automated deployment and monitoring frameworks.
Role Description
The Senior MLOps Engineer is responsible for the technical implementation of production-grade Machine Learning systems. This includes architecting scalable pipelines, managing real-time inference services, and ensuring seamless integration within cloud environments.
The role focuses on transitioning models from experimentation to high-throughput production services, ensuring each solution is secure, performant, and engineered to solve complex organizational challenges at scale.
Company Overview
NXT is a fast-growing technology organization focused on building scalable products and intelligent solutions that simplify and improve the lives of users worldwide. We aim to foster a high-performance technology culture and an entrepreneurial ecosystem recognized for excellence and innovation.
Key Responsibilities
End-to-End Productionization: Architect and deploy ML models across the full lifecycle (training → deployment → monitoring).
Model Observability: Implement comprehensive monitoring for data drift, model drift, and performance degradation.
Retraining & Rollbacks: Design and own automated retraining pipelines and safe rollback mechanisms to maintain model integrity.
ML CI/CD: Build and maintain automated workflows for model versioning, lineage tracking, and artifact storage.
Inference Optimization: Apply optimization techniques such as quantization, pruning, and caching to reduce compute costs and improve latency.
Standardization: Establish best practices for model registries, metadata management, and reproducibility of production assets.
Operational Integration: Collaborate with cross-functional teams to operationalize ML models for use cases such as search ranking, valuation engines, and quality scoring.
Technical Stack & Tools
Programming: Python (Expert), SQL (Expert), FastAPI or Flask (Model APIs)
Orchestration: Airflow, Dagster, or Prefect; GitHub Actions for CI/CD
MLOps Frameworks: MLflow, DVC, Weights & Biases, Kubeflow
Infrastructure: Docker, Kubernetes
Cloud Platforms: AWS SageMaker, Azure ML, or GCP Vertex AI
Monitoring: Prometheus, Grafana, or equivalent observability tools
Data Processing: Apache Spark (PySpark), Databricks, or Snowflake
Qualifications & Experience
Experience:
5+ years in Data Science / Machine Learning
3+ years specifically in MLOps or Production ML Engineering
Production Delivery:
Proven experience building, deploying, and maintaining high-throughput inference services in live production environments.
System Grit:
Comfortable working with legacy systems and imperfect infrastructure to design robust, scalable pipelines.
Business Systems Integration:
Experience integrating ML solutions with enterprise systems such as ERP, CRM, or supply chain platforms.
Cloud Collaboration:
Strong experience delivering AI solutions on AWS, Azure, or GCP in collaboration with Data Engineering and DevOps teams.
Additional Notes
Candidates who can join immediately will be preferred.