Harsh Mehta
I spend most of my time convincing AI systems to be boring. The interesting ones hallucinate, improvise, go off-script. The useful ones don't. I'm a data scientist and AI engineer, and the work that keeps me up is the gap between a system that demos well and one you'd trust with someone else's money. I also mix front-of-house audio, which turns out to be reasonable preparation for shipping things that have to work live in front of an audience.
About Me

Most of what I do is translation work. A mortgage underwriter says "this is taking too long" and six weeks later there's a RAG system reading loan documents in under a minute. A venue owner says "I can't keep up with bookings" and a few months later an agent is handling the calendar, the quoting, and the back-and-forth. Somewhere between the complaint and the shipped thing is the part I enjoy.
I'm trained as a data scientist and I now mostly build AI systems. That covers multi-agent LLM workflows at ATC, production RAG for Wayfair and Outamation, and Ensemble, the agentic booking platform I started after NSF I-Corps and about ninety customer conversations. I care about the stuff that makes these systems actually work in front of people: evaluation loops, guardrails, and the boring observability code that tells you when your agent is drifting.
Outside the technical work: I mixed front-of-house audio for years, for rooms from 300 people to stadiums. I think about cinema and music more than is strictly productive. I read a lot of history. I'm always up for a conversation with founders or teams building something strange.
Education
University of Wisconsin-Madison
M.S. in Information (Data, ML, Cloud Focus)
Sept 2023 - May 2025
GPA: 3.95/4.0

University of Mumbai
Bachelor of Management Studies (Business Analytics Major)
Jun 2013 - Jul 2017
GPA: 3.5/4.0
Experience
As a founder, data scientist, and analyst, I apply data science, machine learning, and cloud technologies (AWS, Python, SQL) to uncover strategic insights and tackle complex business problems.
ATC
Architecting multi-agent LLM workflows and responsible-AI guardrails for enterprise deployments.
Wayfair (Extern)
Deployed real-time agentic decisioning pipelines and LLM inference infrastructure on AWS for enterprise catalog workflows.
Outamation (Extern)
Shipped a production RAG system for financial document intelligence with 95% extraction accuracy and sub-60s SLA.
Ensemble AI
Agentic AI booking platform for live venues — built through NSF I-Corps with a full-stack MVP and LangChain multi-agent core.
HP Tech Ventures
Evaluated startups for investment by analyzing data with Python and developing a Snowflake ETL framework.
Beats by Dre
Led competitive analysis and gathered consumer insights using Python NLP to boost marketing effectiveness.
Prayas Entertainment
Boosted operational efficiency and revenue through data analytics and predictive modeling.
Indigo Events & Promotions
Directed data-driven media strategies that improved customer satisfaction and lead conversions.
UW Transportation Services Campus Parking Efficiency Project
Analyzed parking transaction records to improve resource allocation and reduce search time.
Student Assistant – UW-Madison Music Library
Curated, catalogued, and dispatched thousands of music-collection items while modernizing data workflows for campus-wide access.
Projects
A selection of my recent projects in data science, machine learning, and cloud computing.

Triple-Ocr-Rag: Production RAG for Mortgage Intelligence
Three parallel RAG pipelines for mortgage document extraction — DocTR + DeepSeek, Chandra OCR, and a 5-tier local fallback — with FAISS vector search and a Gradio UI.

Auralis: AI Music Intelligence Platform
GPU-accelerated ML pipeline for audio classification and embedding using PyTorch with PANNs and MERT, powering real-time music recommendations.

InsightTube AI: Video Intelligence Platform
AI-driven video intelligence with semantic search, distributed data pipelines, and containerized analytics services.

VoiceQL: Voice-to-Data Query Agent
Edge ML system on Arduino R4 WiFi translating natural language to SQL with sub-2-second end-to-end latency.

Kafka Login Event Stream Pipeline
Real-time Kafka pipeline routing user login events by device platform with missing-data handling, built on Confluent Kafka and Docker.

Fieldbox Sensor EDA
Automated EDA pipeline for IoT sensor data — quality checks, anomaly detection, and plain-English narrative reports generated from a file-watcher daemon.

Arduino Starter: ESP32 EchoKit in C and C++
A beginner-to-intermediate ESP32 project showing how Arduino's .ino files abstract C++, with a parallel pure-C++ implementation of the same hardware logic.

UW Transportation Services Campus Parking Efficiency Project
Analyzed 11M+ parking transaction records using Python, SQL, and Snowflake to uncover usage patterns and weather impacts.

MSBA Financial Group Cloud-Native Data Architecture Project
Designed an end-to-end AWS data pipeline (S3, Glue, Redshift).

Hard Drive Data Extraction Tool
Developed a user-friendly tool (Python, JS, PostgreSQL) to extract hard drive data to JSON, standardizing output for analytics.

Research: Job Posting Analytics on Twitter
Co-authoring research that analyzes 100k+ Twitter job postings using data mining and NLP to uncover hiring trends, intended for peer-reviewed publication.

Seeing Sound
Where this all started. In 2024, I wanted to know what music looked like as data, so I parsed MIDI files from Parker, Zeppelin, Queen, and Gershwin into CSVs and built visualizations in Tableau and Excel.

Graduate Studies: Learnings & Reflections
Overview of key concepts, skills, and insights gained during the M.S. in Information program at UW-Madison.

Live Audio / AV Tech Engineer – University of Wisconsin-Madison
Front-of-house audio engineer and AV lead for concerts and campus festivals.
Skills
My technical expertise and professional capabilities.
Certifications
Industry certifications spanning cloud, data engineering, machine learning, and analytics.

Databricks Data Engineer Associate
Databricks
Issued: Oct 2025
Expires: Oct 2027
ID: 163058068

SnowPro Core Certification
Snowflake
Issued: Oct 2025
Expires: Oct 2027

Salesforce Certified Tableau Data Analyst
Tableau / Salesforce
Issued: Sep 2025
Expires: Sep 2027

Microsoft Certified: Power BI Data Analyst Associate
Microsoft
Issued: Aug 2025
Expires: Aug 2026
ID: 35D921A8713D9F24

Microsoft Certified: Azure Data Scientist Associate
Microsoft
Issued: Aug 2025
Expires: Aug 2026
ID: 6FD3E3FF02565EF2

Microsoft Certified: Azure Fundamentals
Microsoft
Issued: Aug 2025
Expires: Aug 2026
ID: 7930DE5CF753647B
Blog
Thoughts and insights on data science, AI, and technology.
Get in Touch
If you are hiring, investing, or collaborating, I invite you to explore and connect.


