Harsh Mehta

I spend most of my time convincing AI systems to be boring. The interesting ones hallucinate, improvise, go off-script. The useful ones don't. I'm a data scientist and AI engineer, and the work that keeps me up is the gap between a system that demos well and one you'd trust with someone else's money. I also mix front-of-house audio, which turns out to be reasonable preparation for shipping things that have to work live in front of an audience.

About Me

Most of what I do is translation work. A mortgage underwriter says "this is taking too long" and six weeks later there's a RAG system reading loan documents in under a minute. A venue owner says "I can't keep up with bookings" and a few months later an agent is handling the calendar, the quoting, and the back-and-forth. Somewhere between the complaint and the shipped thing is the part I enjoy.

I'm trained as a data scientist and I now mostly build AI systems. That covers multi-agent LLM workflows at ATC, production RAG for Wayfair and Outamation, and Ensemble, the agentic booking platform I started after NSF I-Corps and about ninety customer conversations. I care about the stuff that makes these systems actually work in front of people: evaluation loops, guardrails, and the boring observability code that tells you when your agent is drifting.

Outside the technical work: I mixed front-of-house audio for years, for rooms from 300 people to stadiums. I think about cinema and music more than is strictly productive. I read a lot of history. I'm always up for a conversation with founders or teams building something strange.

Education

University of Wisconsin-Madison

M.S. in Information (Data, ML, Cloud Focus)

Sep 2023 - May 2025

GPA: 3.95/4.0

University of Mumbai

Bachelor of Management Studies (Business Analytics Major)

Jun 2013 - Jul 2017

GPA: 3.5/4.0

Experience

As a founder, data scientist, and analyst, I use data science, machine learning, and cloud tools (AWS, Python, SQL) to find strategic insights and solve complex business problems.

Nov 2025 - Present
Iowa (Remote)

ATC

Architecting multi-agent LLM workflows and responsible-AI guardrails for enterprise deployments.

Nov 2025 - Present
New York, NY (Remote)

Wayfair (Extern)

Deployed real-time agentic decisioning pipelines and LLM inference infrastructure on AWS for enterprise catalog workflows.

Aug 2025 - Oct 2025
New York, NY (Remote)

Outamation (Extern)

Shipped a production RAG system for financial document intelligence with 95% extraction accuracy and sub-60s SLA.

Feb 2025 - Present
Madison, WI / Remote

Ensemble AI

Agentic AI booking platform for live venues — built through NSF I-Corps with a full-stack MVP and LangChain multi-agent core.

Jul 2024 - Aug 2024
Remote

HP Tech Ventures

Evaluated startups for investment by analyzing data with Python and developing a Snowflake ETL framework.

May 2024 - Jun 2024
Remote

Beats by Dre

Led competitive analysis and gathered consumer insights using Python NLP to boost marketing effectiveness.

Jan 2021 - Apr 2023
Mumbai, India

Prayas Entertainment

Boosted operational efficiency and revenue through data analytics and predictive modeling.

Mar 2017 - Mar 2020
Mumbai, India

Indigo Events & Promotions

Directed data-driven media strategies that improved customer satisfaction and lead conversions.

2024
Madison, WI

UW Transportation Services Campus Parking Efficiency Project

Analyzed parking transaction records to improve resource allocation and reduce search time.

Oct 2023 - May 2025
Madison, WI (On-site/On-campus)

Student Assistant – UW-Madison Music Library

Curated, catalogued, and dispatched thousands of music-collection items while modernising data workflows for campus-wide access.

Projects

A selection of my recent projects in data science, machine learning, and cloud computing.

Triple-Ocr-Rag: Production RAG for Mortgage Intelligence

Three parallel RAG pipelines for mortgage document extraction — DocTR + DeepSeek, Chandra OCR, and a 5-tier local fallback — with FAISS vector search and a Gradio UI.

RAG
OCR
FAISS
DeepSeek
Gradio
Python

Auralis: AI Music Intelligence Platform

GPU-accelerated ML pipeline for audio classification and embedding using PyTorch with PANNs and MERT, powering real-time music recommendations.

PyTorch
PANNs
MERT
FastAPI
React
TypeScript

InsightTube AI: Video Intelligence Platform

AI-driven video intelligence with semantic search, distributed data pipelines, and containerized analytics services.

Semantic Search
Distributed Systems
Docker
Python
NLP

VoiceQL: Voice-to-Data Query Agent

Edge ML system on Arduino UNO R4 WiFi that translates natural language to SQL with sub-2-second end-to-end latency.

Edge ML
Arduino
Whisper.cpp
SQLCoder-7B
Edge Impulse

Kafka Login Event Stream Pipeline

Real-time Kafka pipeline routing user login events by device platform with missing-data handling, built on Confluent Kafka and Docker.

Kafka
Confluent
Docker
Python
Streaming

Fieldbox Sensor EDA

Automated EDA pipeline for IoT sensor data — quality checks, anomaly detection, and plain-English narrative reports generated from a file-watcher daemon.

Python
IoT
EDA
Anomaly Detection
Shell

Arduino Starter: ESP32 EchoKit in C and C++

A beginner-to-intermediate ESP32 project showing how Arduino's .ino files abstract C++, with a parallel pure-C++ implementation of the same hardware logic.

C++
Arduino
ESP32
Embedded
millis()

UW Transportation Services Campus Parking Efficiency Project

Analyzed 11M+ parking transaction records using Python, SQL, and Snowflake to uncover usage patterns and weather impacts.

Python
SQL
Snowflake
Tableau
Power BI

MSBA Financial Group Cloud-Native Data Architecture Project

Designed an end-to-end AWS data pipeline (S3, Glue, Redshift).

AWS
S3
Glue
Redshift
SageMaker

Hard Drive Data Extraction Tool

Developed a user-friendly tool (Python, JS, PostgreSQL) that extracts hard drive data to JSON, standardizing output for analytics.

Python
JavaScript
PostgreSQL
JSON

Research: Job Posting Analytics on Twitter

Co-authoring research that analyzes 100k+ Twitter job postings with data mining and NLP to uncover hiring trends, aimed at peer-reviewed publication.

NLP
Data Mining
Twitter API
Research

Seeing Sound

Where this all started. In 2024, I wanted to know what music looked like as data, so I parsed MIDI files from Parker, Zeppelin, Queen, and Gershwin into CSVs and built visualizations in Tableau and Excel.

Python
MIDI
Tableau
Excel
Data Viz
Music Analysis
Pandas
Origin Project
Pre-ML
2024

Graduate Studies: Learnings & Reflections

Overview of key concepts, skills, and insights gained during the M.S. in Information program at UW-Madison.

Data Analytics
Machine Learning
Cloud Computing

Live Audio / AV Tech Engineer – University of Wisconsin-Madison

Front-of-house audio engineer and AV lead for concerts and campus festivals.

Live Audio
AV Technology
Event Support
Collaboration
Audio Engineering
Problem Solving
Project Coordination

Skills

My technical expertise and professional capabilities.

GenAI & Agentic
GPT-4, Claude, Llama, LangChain, AutoGen, CrewAI, LlamaIndex, RAG, Prompt Engineering, LoRA / QLoRA / PEFT, Fine-Tuning, Responsible AI, Guardrails, Pinecone, FAISS, Chroma, n8n, A2A Protocol

MLOps & Production ML
PyTorch, TensorFlow, Transformers, Scikit-learn, Model Versioning, Drift Detection, Observability, A/B Testing, CI/CD for ML, Evaluation Pipelines, Predictive Modeling, Recommendation Systems, NLP

Data Science & Languages
Python, NumPy, Pandas, PySpark, SQL, SQL Server, R, TypeScript, JavaScript, C++, Statistical Analysis, Data Mining, Data Modeling, ETL / ELT

Cloud & Infrastructure
AWS SageMaker, AWS Lambda, AWS S3, AWS Glue, AWS Redshift, Azure OpenAI, GCP Vertex AI, BigQuery, Databricks, Snowflake, Kafka, Airflow, dbt, Docker, PostgreSQL, MongoDB, Git

Visualization & BI
Tableau, Power BI, Alteryx, Excel, Matplotlib, Seaborn, Plotly, Data Visualization

Web & API Development
FastAPI, Flask, React, Node.js, Streamlit, HTML, CSS, API Development, Linux

Business & Strategy
AI Strategy, Business Intelligence, Business Analysis, Product Development, Project Management, Agile / SDLC, Financial Forecasting, Automation

Media & AV
Live Audio, AV Technology, Audio Engineering, Acoustics, Sound Design

Certifications

Industry certifications spanning cloud, data engineering, machine learning, and analytics.

Databricks Data Engineer Associate

Databricks

Issued: Oct 2025

Expires: Oct 2027

ID: 163058068

Databricks
Data Engineering
Spark
Delta Lake

SnowPro Core Certification

Snowflake

Issued: Oct 2025

Expires: Oct 2027

Snowflake
Data Warehousing
SQL

Salesforce Certified Tableau Data Analyst

Tableau / Salesforce

Issued: Sep 2025

Expires: Sep 2027

Tableau
Data Visualization
Analytics

Microsoft Certified: Power BI Data Analyst Associate

Microsoft

Issued: Aug 2025

Expires: Aug 2026

ID: 35D921A8713D9F24

Power BI
DAX
Data Modeling

Microsoft Certified: Azure Data Scientist Associate

Microsoft

Issued: Aug 2025

Expires: Aug 2026

ID: 6FD3E3FF02565EF2

Azure ML
MLOps
Machine Learning

Microsoft Certified: Azure Fundamentals

Microsoft

Issued: Aug 2025

Expires: Aug 2026

ID: 7930DE5CF753647B

Azure
Cloud Fundamentals

Get in Touch

If you're hiring, investing, or looking for a collaborator, get in touch.
