Data Scientist & ML Engineer

Avalvir Sekhon

I build systems that learn, predict, and generate — from RAG pipelines to fine-tuned vision models to time-series forecasts.

LLMs & RAG | Computer Vision | NLP | Time Series | Python · R | GCP · Azure | MDS @ UBC

Turning messy data into decisions

I'm a Data Scientist with 6+ years of experience building ML systems across publishing, agriculture, and forecasting. Currently finishing my Master of Data Science at UBC.

My sweet spot is end-to-end work, from raw data wrangling to deployed, production-grade models. I've shipped RAG systems, crop disease detectors, and demand forecasters, and most recently completed a comparative fine-tuning study on vision models.

Open to roles across Canada.

6+ · Years Industry Experience
3 · Companies shipped at
85.6% · ViT accuracy, Food-101
MDS · UBC Graduate 2026
Work history

Experience

Dec 2023 – Aug 2025
Doaba Publications
Data Scientist
  • Built a RAG system integrating retrieval with GPT-family models — boosted response accuracy and customer satisfaction by 20%
  • Developed an NLP Content Recommendation System with BERT, NER, topic modeling & semantic similarity; deployed on GCP with real-time updates — 25% lift in reader retention
  • Created a Text Summarization tool using Seq2Seq & Transformer architectures, achieving a 30% reduction in reading time
  • Managed containerized deployment with Docker on GCP; streamlined ETL pipelines via SQL stored procedures
RAG · LLMs · BERT · GCP · Docker · LangChain · HuggingFace
Jun 2022 – Dec 2023
Arctic Glacier Canada Inc.
Forecast Analyst (Data Scientist)
  • Deployed time-series forecasting system (ARIMA, Prophet, LSTM) for weekly/monthly demand — 23% improvement over baseline
  • Ran end-to-end EDA & feature engineering pipeline — 20% improvement in model accuracy
  • Built interactive Tableau dashboards for forecasted trends, directly supporting budget planning decisions
  • Managed data migration from AWS → Azure Databricks; version-controlled ML workflows with Git/GitHub
ARIMA · LSTM · Prophet · Tableau · Power BI · Azure · SQL
Jan 2019 – May 2022
Black Eye Technologies
Data Scientist
  • Digitized agricultural records with OCR + NER (spaCy); built preprocessing pipelines in Python/Pandas
  • Developed a CNN-based crop disease detection system (TensorFlow) — 20% accuracy improvement over traditional methods, with real-time diagnosis interface for farmers
  • Monitored model performance with continuous feedback loops; collaborated with domain experts on feature selection
CNN · TensorFlow · OCR · spaCy · Power BI · Computer Vision
Selected work

Projects

🍔
Fine-Tuning Showdown: Food-101 Vision Models
A collaborative deep learning study with teammates Zaed and Zihuan comparing fine-tuning strategies across ResNet-18, EfficientNetV2-S, and ViT on the Food-101 dataset (101 classes, 100K images). The goal: find what actually works when you can't retrain the whole model.
  • ViT with LayerNorm tuning: 85.6% accuracy while training only 58K parameters
  • Tested blurred, noisy, occluded & downsampled inputs for real-world robustness
  • Grad-CAM heatmaps revealed where each strategy focuses its spatial attention
  • LoRA applied to the head alone struggled on all 3 architectures with 101 classes
ViT · ResNet-18 · EfficientNet · LoRA · Grad-CAM · PyTorch
[Chart: Model Accuracy Comparison. Series: ViT (LayerNorm) 85.6%, EfficientNetV2-S, ResNet-18, LoRA (all 3 models, struggled)]
Built with Zaed & Zihuan · Full notebook + Grad-CAM visualizations on GitHub
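
For readers wondering how a budget as small as the 58K trainable parameters quoted above can arise, here is a rough back-of-the-envelope sketch. It assumes a standard ViT encoder (two LayerNorms per block plus one final LayerNorm) and assumes the 101-class linear head is trained alongside the LayerNorm parameters. The write-up does not name the ViT variant or confirm either assumption, so the function name and both configurations below are hypothetical illustrations, not the study's actual setup.

```python
def layernorm_tuning_params(hidden_dim, num_blocks, num_classes):
    """Count trainable parameters when only the LayerNorm layers and the
    classification head are unfrozen (assumed standard ViT layout)."""
    # Each LayerNorm holds a scale vector and a shift vector of size hidden_dim.
    per_layernorm = 2 * hidden_dim
    # Assumed layout: two LayerNorms per encoder block, plus one final LayerNorm.
    num_layernorms = 2 * num_blocks + 1
    # Linear classification head: weight matrix plus bias vector.
    head = hidden_dim * num_classes + num_classes
    return per_layernorm * num_layernorms + head


# Hypothetical configurations; the project text does not state which was used.
vit_small = layernorm_tuning_params(hidden_dim=384, num_blocks=12, num_classes=101)
vit_base = layernorm_tuning_params(hidden_dim=768, num_blocks=12, num_classes=101)

print(vit_small)  # 58085
print(vit_base)   # 116069
```

Under these assumptions a ViT-Small-sized encoder lands close to the quoted figure, while a ViT-Base-sized one would need roughly twice as many trainable parameters. Either way, the trainable share is a tiny fraction of the full model.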
💧
Underground Water Level Prediction
Time series forecasting model (ARIMA, SARIMA) predicting underground water levels over a 5-year forecast horizon. Includes extensive preprocessing, univariate and bivariate analyses, and a user-friendly Tkinter interface.
  • ARIMA & SARIMA with seaborn/matplotlib visualizations
  • 5-year predictive model with Tkinter UI for accessibility
  • Univariate and bivariate trend analysis
Technical stack

Skills

ML & AI
  • Large Language Models (GPT, BERT)
  • RAG & LangChain / LlamaIndex
  • CNNs, RNNs, LSTMs, ViT
  • NLP & Transformers
  • Computer Vision
  • Time Series Forecasting
  • Recommendation Systems
  • XGBoost, Random Forest
Cloud & MLOps
  • GCP (BigQuery, Cloud Run)
  • Azure Databricks
  • AWS
  • Docker & OpenShift
  • Git / GitHub
  • MLOps pipelines
Data & Viz
  • SQL & ETL processes
  • Tableau
  • Power BI
  • Matplotlib, Seaborn
  • Pandas, NumPy
  • Scikit-learn, Keras
Languages
  • Python
  • R
  • C / C++
  • SQL
  • Linux / Bash
Academic background

Education

Master of Data Science
University of British Columbia
Sept 2025 – June 2026

Coursework in Bayesian statistics, machine learning, data visualization, and advanced statistical modeling.

Bachelor of Science — Computer Science
Punjab Technical University
Jan 2015 – Nov 2018
Get in touch

Let's work together

Open to Data Scientist roles across Canada. Always happy to talk about ML, NLP, or fine-tuning strategies that actually work.