Harsh Tomar

I build AI systems that actually work — from tracking tennis balls at 30 FPS to generating novel molecules

Self-taught AI/ML developer who learns by building. I've implemented Vision Transformers, VLMs, and LoRA from scratch in PyTorch. My Tennis Vision project (23+ GitHub stars) detects players with 95% accuracy and tracks the ball in real time. Currently exploring LLM reasoning, RAG pipelines, and agentic AI. I break things, fix them, and document it all.

Research Paper Implementations

From-scratch PyTorch implementations of cutting-edge AI research papers with detailed architectural breakdowns

Complete PyTorch implementation of PaLiGemma vision-language model combining Google's Gemma language model with SigLIP vision encoder. Features detailed architectural breakdowns, clean educational code, and comprehensive documentation for multimodal AI understanding.

VLM Architecture • PaLiGemma Architecture • SigLIP Vision Transformer • RoPE Embeddings

Architecture Components

  • SigLIP Vision Encoder: Processes images into embeddings using Vision Transformer with 16×16 patches, generating 196 tokens for 224×224 images with learned positional embeddings
  • Gemma Language Model: Decoder-only architecture with RMSNorm, GeLU activations, Rotary Position Encoding (RoPE), and grouped-query attention for efficiency
  • Rotary Position Encoding: Sophisticated position encoding applying rotation matrices to query/key vectors, enabling extrapolation beyond training context length (see the sketch after this list)
  • Grouped-Query Attention: Reduces computational requirements by sharing key-value heads across multiple query heads while maintaining quality
  • KV-Cache Mechanism: Efficient autoregressive inference with cached key-value pairs for faster generation
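
The rotary step is the easiest piece to get wrong, so here is a minimal sketch of how RoPE rotates query/key vectors before attention. It assumes a single head with an even head dimension; `apply_rope` and `rotate_half` are illustrative names, not necessarily the functions used in the repository.

```python
import torch

def rotate_half(x):
    # Split the last dimension in half and rotate: (x1, x2) -> (-x2, x1)
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rope(q, k, positions, head_dim, base=10000.0):
    # One frequency per pair of dimensions, as in the RoPE paper
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = positions.float()[:, None] * inv_freq[None, :]   # (seq, head_dim/2)
    angles = torch.cat((angles, angles), dim=-1)               # (seq, head_dim)
    cos, sin = angles.cos(), angles.sin()
    # Rotating q and k by position-dependent angles makes q·k depend only on relative offsets
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin

q, k = torch.randn(8, 64), torch.randn(8, 64)                  # (seq_len, head_dim)
q_rot, k_rot = apply_rope(q, k, torch.arange(8), head_dim=64)
```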

Implementation Features

  • Complete from-scratch implementation in PyTorch with detailed comments
  • Pre-trained weight loading from Hugging Face (6GB model)
  • Visual question answering, image captioning, and multimodal chat capabilities
  • Inference script with top-p sampling and temperature control (a minimal sampler is sketched after this list)
  • Comprehensive documentation with architecture diagrams and research papers
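
As a reference for the sampling step, below is a minimal top-p (nucleus) sampler with temperature; `sample_top_p` is an illustrative name and the vocabulary size is a dummy value, not the repository's exact inference code.

```python
import torch

def sample_top_p(logits, temperature=0.8, top_p=0.9):
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Drop tokens outside the smallest set whose cumulative probability exceeds top_p
    sorted_probs[cumulative - sorted_probs > top_p] = 0.0
    sorted_probs /= sorted_probs.sum()
    return sorted_idx[torch.multinomial(sorted_probs, num_samples=1)]

next_token = sample_top_p(torch.randn(32_000))   # dummy vocabulary logits
```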
PaLiGemma VLM • SigLIP Vision Encoder • Gemma Language Model • RoPE Position Encoding
PyTorch • Transformers • SigLIP • Gemma • RoPE • KV-Cache • Vision-Language Models

Complete PyTorch implementation of Vision Transformer from "An Image is Worth 16x16 Words" paper. Includes training pipelines for CIFAR-10 and ImageNet with patch embedding, multi-head self-attention, position encodings, and comprehensive architectural visualizations.

Patch Embeddings • Multi-Head Attention • ViT Classifier • Layer Normalization

Architecture Implementation

  • Patch Embedding: Divides images into 16×16 non-overlapping patches and linearly projects them to the embedding dimension using Conv2d for efficiency (see the sketch after this list)
  • Multi-Head Self Attention: Jointly attends to information from different representation subspaces with scaled dot-product attention
  • MLP Block: Feed-forward network with GELU activation applied after attention mechanism
  • Transformer Block: Combines attention and MLP with residual connections and layer normalization for stable training
  • Class Token: Learnable embedding prepended to sequence for classification, similar to BERT's [CLS] token
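
A minimal sketch of the patch-embedding front end described above, using the Conv2d trick and a prepended class token; class and attribute names are illustrative rather than the repository's exact ones, and the shapes match the 224×224 / 16×16 / 768-dim configuration.

```python
import torch
import torch.nn as nn

class PatchEmbed(nn.Module):
    def __init__(self, img_size=224, patch_size=16, in_chans=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A strided Conv2d is equivalent to cutting non-overlapping patches and applying a shared Linear
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_size)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, embed_dim))

    def forward(self, x):                        # x: (B, 3, 224, 224)
        x = self.proj(x)                         # (B, 768, 14, 14)
        x = x.flatten(2).transpose(1, 2)         # (B, 196, 768)
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat((cls, x), dim=1)           # prepend [CLS] -> (B, 197, 768)
        return x + self.pos_embed                # add learned positional embeddings

tokens = PatchEmbed()(torch.randn(2, 3, 224, 224))   # (2, 197, 768)
```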

Implementation Variants

  • CIFAR-10: Lightweight model (384 dim, 6 blocks, 6 heads) for educational purposes with attention visualization
  • ImageNet: Full-scale model (768 dim, 12 blocks, 12 heads) with flash attention and distributed training support
  • Data augmentation strategies (RandAugment, Mixup, CutMix)
  • Different pooling strategies (CLS token vs average pooling; compared in the snippet after this list)
  • Comprehensive training analysis with and without augmentation
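
For the pooling comparison, the two strategies reduce to a one-line difference over the encoder output (shape (batch, 1 + num_patches, dim), class token first); this is a generic sketch, not the repo's exact code.

```python
import torch

x = torch.randn(2, 197, 768)        # encoder output: [CLS] token followed by 196 patch tokens

cls_pooled = x[:, 0]                # classification head reads the [CLS] token...
mean_pooled = x[:, 1:].mean(dim=1)  # ...or the average of the patch tokens instead
```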
16×16 Patch Size • 12 Transformer Blocks • 768 Embedding Dim • 12 Attention Heads
PyTorch • Vision Transformer • Self-Attention • CIFAR-10 • ImageNet • GELU

Pure PyTorch implementations of LoRA and QLoRA for memory-efficient fine-tuning of large language models and vision transformers. Features custom training scripts, 4-bit quantization, and practical examples achieving 65-85% memory reduction while maintaining performance.

LoRA Architecture • QLoRA Architecture • Memory Comparison

LoRA Architecture

  • Low-Rank Adaptation: Injects trainable rank decomposition matrices (A, B) into frozen pre-trained weights W, computing y = x(W + AB) (see the sketch after this list)
  • Parameter Efficiency: Trains <1% of parameters with rank r typically 8, 16, or 32
  • Scaling Factor: Uses α to control update magnitude with rank-stabilized variant (α·AB/√r)
  • Layer Support: LoRALinear, LoRAEmbedding, LoRAConv2d for different layer types
  • Memory Reduction: 65% reduction for BERT, 50% for LLaMA-7B
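
A minimal sketch of the LoRALinear idea under the conventions above (frozen W, trainable A and B, α/r scaling); initialization and naming details are illustrative and may differ from the repository.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():          # freeze the pre-trained weight W
            p.requires_grad_(False)
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))   # zero init, so AB starts as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        # y = x W^T + (alpha / r) * x A^T B^T  -- only A and B receive gradients
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

out = LoRALinear(768, 768)(torch.randn(4, 768))
```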

QLoRA Innovations

  • 4-bit NF4 Quantization: Normal Float data type optimized for LLM weight distributions (the configuration is sketched after this list)
  • Double Quantization: Quantizes quantization constants for additional memory savings
  • Paged Optimizers: Offloads optimizer states to CPU reducing GPU memory usage
  • BF16 LoRA Training: Maintains adapters in BF16 precision for numerical stability
  • Memory Efficiency: 85% reduction enabling LLaMA-65B fine-tuning on consumer GPUs
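
For orientation, the same combination of settings expressed through the Hugging Face transformers + bitsandbytes integration looks roughly like the snippet below; the repository implements these pieces from scratch instead, and the model id here is only a placeholder.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 data type
    bnb_4bit_use_double_quant=True,         # double quantization of the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # BF16 compute so the LoRA adapters stay numerically stable
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
```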
85% Memory Reduction • <1% Trainable Params • 4-bit Quantization • NF4 Data Type
PyTorch • LoRA • QLoRA • 4-bit Quantization • PEFT • Fine-tuning • Memory Optimization

Core concepts of reasoning in Large Language Models implemented from scratch. Explores inference-time compute scaling, reinforcement learning approaches, chain-of-thought mechanisms, and advanced reasoning techniques for building more capable AI systems.

Reasoning • Test-Time Compute Scaling Laws • Chain of Thought Prompting • Beam Search Decoding Tree

Learn more about test-time compute mechanisms in Davis Treybig's blog.

Inference-Time Compute Scaling

  • Zero-Shot Prompting: Applied to Llama 3.2 built from scratch for baseline reasoning capabilities
  • Beam Search: Demonstration of search-based decoding strategies for improved output quality (a minimal loop is sketched after this list)
  • Method Comparison: Comparing different inference-time compute methods against baseline performance
  • Model Size Effects: Analysis of how model size impacts accuracy with chain-of-thought reasoning
  • Scaling Test Time Compute: Research on computational strategies during inference for better reasoning
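
A minimal beam-search loop for intuition, assuming `model(ids)` returns logits of shape (1, seq_len, vocab_size) for a batch of one; it omits EOS handling and length normalization and is not the notebook's exact implementation.

```python
import torch

def beam_search(model, prompt_ids, num_beams=4, max_new_tokens=20):
    beams = [(prompt_ids, 0.0)]                            # (token ids, cumulative log-prob)
    for _ in range(max_new_tokens):
        candidates = []
        for ids, score in beams:
            logits = model(ids)[:, -1, :]                  # next-token logits
            log_probs = torch.log_softmax(logits, dim=-1).squeeze(0)
            top_lp, top_ids = log_probs.topk(num_beams)
            for lp, tok in zip(top_lp, top_ids):
                candidates.append((torch.cat([ids, tok.view(1, 1)], dim=1), score + lp.item()))
        # Keep only the num_beams highest-scoring continuations
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
    return beams[0][0]
```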

Reinforcement Learning Approaches

  • Exploration of RL techniques for improving reasoning capabilities
  • Policy optimization for multi-step reasoning tasks
  • Reward modeling for reasoning quality assessment
  • Integration of RL with language model pre-training

Implementation Notebooks

  • Applying Zero-Shot Prompting to Llama 3.2 Built From Scratch
  • Beam Search Demonstration with detailed visualizations
  • Inference-Time Compute Scaling: Comparing Different Methods
  • Effect of Model Size on Accuracy with Chain-of-Thought Reasoning
Llama 3.2 Base Model • Chain-of-Thought (CoT) • Reinforcement Learning (RL) • Beam Search Decoding
PyTorch • Reasoning • LLMs • Reinforcement Learning • Chain-of-Thought • Llama 3.2

Additional Projects

Multiple PDF Chat App

Conversational AI application for querying multiple PDF documents simultaneously using RAG architecture. Features semantic search, context-aware responses, and document source attribution.
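
The retrieval step behind the app boils down to embedding the question and ranking PDF chunks by similarity; the sketch below is framework-agnostic, with `embed` and the chunk fields standing in for whatever embedding model and PDF metadata the app actually uses.

```python
import numpy as np

def retrieve(query, chunks, embed, k=3):
    # `chunks` is a list of dicts with "text", "embedding", "source_pdf", "page"
    q = embed(query)
    scored = []
    for chunk in chunks:
        sim = np.dot(q, chunk["embedding"]) / (np.linalg.norm(q) * np.linalg.norm(chunk["embedding"]))
        scored.append((sim, chunk))                            # cosine similarity
    top = sorted(scored, key=lambda s: s[0], reverse=True)[:k]
    context = "\n\n".join(c["text"] for _, c in top)           # context fed to the LLM
    sources = [(c["source_pdf"], c["page"]) for _, c in top]   # for source attribution
    return context, sources
```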

LangChain • RAG • PDF Processing • Vector DB

Google ADK Experiments

Exploration of Google's Agent Development Kit (ADK) for building intelligent agents with tool integration, function calling, and multi-step reasoning capabilities.

Google ADK • Agent Development • Function Calling

Web Search Agent

Intelligent web search agent with query understanding, result ranking, and answer synthesis. Implements semantic search and multi-source information aggregation.

LangChain • Web Search • NLP • Information Retrieval

LLMOps

Comprehensive guide to LLM operations covering deployment, monitoring, maintenance, and improvement of LLM applications at scale. Production best practices and tooling.

MLOps • LLM Deployment • Monitoring • Production

Cursor Agent

Python-based AI agent replicating Cursor's coding assistant capabilities with function calling, code generation, and intelligent coding assistance using Claude, OpenAI, and Ollama.

Python • AI Agents • Code Generation • Function Calling

Reasoning from Scratch

Step-by-step implementation of a reasoning LLM in PyTorch from scratch. Explores chain-of-thought, tree-of-thought, and other reasoning mechanisms in language models.

PyTorch • Reasoning • LLM • From Scratch

ML/DL/NLP Learning

Comprehensive collection of machine learning, deep learning, and NLP implementations covering fundamental algorithms to advanced architectures.

ML • DL • NLP • PyTorch • TensorFlow

Dog vs Cat Classifier

CNN-based image classifier for binary classification with data augmentation, transfer learning, and model optimization techniques.

CNN • Transfer Learning • Image Classification

InsureML Pipeline

End-to-end MLOps vehicle insurance prediction system achieving 87% accuracy. Features MongoDB Atlas, AWS (S3, ECR, EC2), Docker containerization, FastAPI, and CI/CD with GitHub Actions.

MLOps • FastAPI • MongoDB • AWS • Docker • CI/CD

MLOps Learning

Comprehensive MLOps learning repository with 130+ commits covering Docker, Kubernetes, DVC data versioning, MLflow experiment tracking, CI/CD pipelines, and Prometheus & Grafana monitoring.

Docker • Kubernetes • DVC • MLflow • CI/CD • Prometheus

AgentForge

Comprehensive guide for building AI agents using modern frameworks. Covers CrewAI, LangGraph, AG2 (AutoGen), LlamaIndex, smolagents, and more with hands-on examples.

CrewAI • LangGraph • AG2 • LlamaIndex • AI Agents

Technical Expertise

Programming Languages

Python • JavaScript • Bash

ML/AI Frameworks

PyTorch • TensorFlow • Scikit-learn • HuggingFace • Keras • XGBoost

Computer Vision

YOLOv5-v8 • OpenCV • Object Detection • Image Segmentation • Supervision • Optical Flow • PIL

Generative AI & LLM

LangChain • LangGraph • LlamaIndex • CrewAI • AG2 (AutoGen) • RAG • Prompt Engineering • OpenAI API

Data Science

NumPy • Pandas • Matplotlib • Seaborn • Statistical Analysis • Feature Engineering • A/B Testing

Development Tools

Git • Docker • Jupyter • VS Code • Streamlit • FastAPI • Flask

Cloud & Deployment

AWS (EC2, S3, Lambda) • GCP • Firebase • Netlify • CI/CD • MLflow

Databases

MongoDB • Pinecone • Chroma • Vector DBs

Professional Experience

AI Intern

i3 Digital Health May 2025 - Present

Architecting an intelligent research profiling system that aggregates 10,000+ research papers from multiple APIs, reducing manual research time by 85%. Built NLP pipelines using LangChain achieving 92% accuracy in topic classification, and developed RAG-powered search agents improving match relevance by 78%. Deployed scalable AI solutions serving 500+ researchers using FastAPI, Docker, and AWS infrastructure.

85% time reduction • 92% classification accuracy • 120+ users served

Community Contributor

CNCF & Google Developer Groups Jan 2023 - Present

Active member of the Cloud Native Computing Foundation, participating in 15+ cloud-native technology discussions. Engaged with Google Developer Groups on machine learning initiatives, delivering 2 tech talks on AI/ML best practices. Mentored 10+ junior developers through community workshops and open-source contributions.

15+ discussions • 2 tech talks • 10+ mentees

Achievements & Impact

47+ GitHub repositories
40+ GitHub stars
1+ years of experience in AI/ML
83% accuracy improvements

Education

Bachelor of Technology in AI & Data Science

Lakshmi Narain College of Technology, Bhopal

Nov 2022 - May 2026 • CGPA: 7.2/10

Relevant Coursework: Machine Learning, Computer Vision, Deep Learning, NLP, Data Structures & Algorithms, Reinforcement Learning, Statistical Analysis, Neural Networks

Technical Writings

Tennis Vision: Deep Dive

Comprehensive analysis of building an AI-powered tennis analysis system with computer vision techniques, model training, and performance optimization strategies.

In Progress

Reasoning in LLMs from Scratch

Exploring core concepts of reasoning capabilities in Large Language Models, implementation details, and architectural considerations for building reasoning systems.

In Progress

Recommendation

"I was impressed by Harsh's commitment and technical prowess — he attacks each challenge with enthusiasm, learning desire, and will to accomplish. His interest in Machine Learning, Computer Vision, and AI has surpassed what one might initially expect from someone at his level."
— Yashvardhan Singh, Software Engineer at BARCO, B.Tech., IIT Delhi

Open to AI/ML internships and full-time opportunities — feel free to reach out via the links above.