Rajat Gupta

ML Engineer

Building intelligent systems and solving real-world problems with machine learning.

About Me

My Journey

I’m a Data Scientist & ML Engineer with end-to-end experience building production-ready machine learning systems for risk modelling, anomaly detection, and high-frequency data pipelines. My background at CERN has trained me to work with large, complex datasets and deliver models that are fast, reliable, and explainable.

I enjoy working on the full lifecycle — from data engineering and feature design to model training, deployment, and monitoring. Recently, I’ve built an end-to-end anomaly detection pipeline on GitHub event streams (Airflow + MLflow + Docker + AWS), a real-time credit risk scoring system (FastAPI + XGBoost + SHAP), and a vector-search-powered analytics assistant using Qdrant and Sentence-Transformers.
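To give a flavour of the streaming anomaly detection work mentioned above: at its simplest, flagging outliers in an event stream means comparing each new point to a trailing baseline. The sketch below is a deliberately minimal rolling z-score illustration in plain Python — the actual pipelines use learned models (trees, autoencoders), and the function and parameter names here are hypothetical.

```python
from collections import deque
import math

def rolling_zscore_flags(stream, window=50, threshold=3.0):
    """Flag points whose z-score against a trailing window exceeds threshold.

    A toy stand-in for streaming anomaly detection: maintain a fixed-size
    buffer of recent values, and flag a point if it sits more than
    `threshold` standard deviations from the buffer mean.
    """
    buf = deque(maxlen=window)
    flags = []
    for x in stream:
        if len(buf) >= window:
            mean = sum(buf) / len(buf)
            var = sum((v - mean) ** 2 for v in buf) / len(buf)
            std = math.sqrt(var)
            flags.append(std > 0 and abs(x - mean) / std > threshold)
        else:
            # Warm-up: not enough history to judge, so never flag.
            flags.append(False)
        buf.append(x)
    return flags
```

A real deployment replaces the fixed rule with a trained model and wires the scoring step into an orchestrated pipeline (e.g. an Airflow task logging metrics to MLflow), but the shape — score each event against learned "normal" behaviour — is the same.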

I’m comfortable collaborating with cross-functional teams, turning ambiguous problems into measurable outcomes, and deploying models that work in both batch and low-latency environments. Open to Data Scientist, ML Engineer, and MLOps-oriented roles in Europe or remote.

Skills & Expertise

Languages & Data: Python, SQL, Bash, Pandas, NumPy, Polars, DuckDB, Parquet

Machine Learning: XGBoost, LightGBM, CatBoost, Scikit-learn, Feature Engineering, Time-Series Forecasting (ARIMA, Prophet), Anomaly Detection, Model Explainability (SHAP), A/B Testing

Deep Learning & Retrieval: PyTorch, TensorFlow, Keras, CNNs, VAEs, Generative Models, Sentence-Transformers, Vector Search (Qdrant), RAG Pipelines

MLOps & Infrastructure: Airflow, MLflow, Docker, FastAPI, GitHub Actions (CI/CD), Monitoring & Drift Detection (Evidently), AWS (S3, EC2), Terraform

Data Engineering: ETL Pipelines, Data Validation, Statistical Testing, Performance Optimization

Education & Experience

Data Scientist [2022 - Present]

ATLAS@LHC (CERN and University of Pittsburgh)

  • Designed and deployed ML models for anomaly detection and real-time classification on 1B+ high-frequency events, building end-to-end pipelines for feature engineering, training, optimisation, and validation.
  • Built model-distillation workflows and co-developed low-latency ML models with engineers for hardware-constrained environments, reducing inference latency to sub-3 μs.
  • Developed scalable data pipelines and automated analysis workflows, improving model throughput and reducing manual effort for the team.
  • Created data-driven compression solutions using tree models and autoencoders, improving storage and retrieval efficiency for high-throughput pipelines.
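The model-distillation workflows above centre on one idea: train a small, fast student model to match a larger teacher's temperature-softened output distribution rather than hard labels. The NumPy sketch below shows only that soft-target loss computation — it is an illustration of the standard distillation objective (temperature-scaled softmax plus KL divergence, scaled by T²), not the actual hardware-targeted workflow; function names are hypothetical.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution,
    # exposing the teacher's "dark knowledge" about non-top classes.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened outputs.

    Scaled by T**2 so gradient magnitudes stay comparable across
    temperatures, as in the standard formulation.
    """
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = (p * (np.log(p) - np.log(q))).sum(axis=-1)
    return float(kl.mean() * T ** 2)
```

The loss is zero when the student reproduces the teacher's logits exactly and positive otherwise, giving the student a dense training signal that a much smaller architecture (small enough for sub-microsecond hardware inference) can still fit.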

Data Analyst (Ph.D. in Particle Physics) [2016 - 2022]

CMS@LHC (CERN and Panjab University)

  • Processed and analysed large-scale structured event data to identify patterns, anomalies, and key performance signals using statistical modelling and Python-based analytics.
  • Built reproducible data pipelines for cleaning, validation, and feature extraction across multi-TB datasets, improving team throughput and analysis reliability.
  • Collaborated with physicists, engineers, and analysts to translate findings into clear insights adopted in workflow improvements.

Get In Touch

Have a question or want to work together? Feel free to reach out!

Contact Information

Location

Remote / Geneva, Switzerland
