Pavan Yellathakota

Data Analyst · Data Scientist · AI / ML Engineer

Seattle, WA, USA · Open for Opportunities

A Data Explorer passionate about every field where data creates impact. My work spans Market Research, Supply Chain Analytics, and ETL Pipelines, and extends into ML, Deep Learning, and BERT-based Text and Semantic Analysis.

NLP & Text Analysis · ETL Pipelines · ML / Deep Learning · Quant Research · Cloud Data Eng. · Market Research

Get in touch

GitHub LinkedIn

About me

Building at the intersection of data, intelligence, and engineering.

I'm a Data Analyst turned ML Engineer with a Master's in Applied Data Science and 3+ years of experience building end-to-end data pipelines and ML systems. My work spans the full spectrum, from Market Research and Supply Chain Analytics to designing Databases, ETL Pipelines, and production-grade AI backends.

I extend that foundation into Machine Learning and Deep Learning, with a specific focus on BERT-based Text & Semantic Analysis. I'm skilled at bridging the gap between data engineering and stakeholder decision-making through statistical validation, A/B testing, and interactive analytics.

I currently work as a Founding ML Engineer at Alphonso AI, where I architect retrieval systems and LLM-driven backends and specialise in moving R&D work into production infrastructure built on FastAPI, pgvector, and Vertex AI.

Retrieval & RAG · NLP / BERT · Time-Series Forecasting · A/B Testing · ETL / ELT · Cloud Data Eng. · Statistical Modelling · BI & Dashboarding

Location: Seattle, WA, USA

Graduate: M.S. Applied Data Science, Clarkson University

Undergraduate: B.Tech CS, Yogi Vemana University

Email: pavan.yellathakota.ds@gmail.com

Status: Open to opportunities

Skills

Technical stack

Tools and technologies across the full data & ML lifecycle, from raw ingestion to model deployment and visualisation.

Programming

Python · SQL · R · Pandas · NumPy · Scikit-learn · XGBoost · PyTorch · CausalML · SciPy

Gen AI & Vector DB

Gemini (Vertex AI) · HuggingFace · OpenAI · Ollama · DeepSeek · LangChain · pgvector · MCP · ANN Search

ML & Data Science

A/B Testing · Time-Series Forecasting · Statistical Modeling · Statsmodels · Prophet · SARIMAX · BERT / NLP · MLflow

Data Engineering

PySpark · AWS Glue · Airflow · dbt · Airbyte · PostgreSQL · Redshift · Snowflake · ETL / ELT

Backend & MLOps

FastAPI · Docker · DigitalOcean · AWS SageMaker · AWS Lambda · Amazon S3 · Amazon Athena · Databricks

Cloud Analytics & Visualisation

Tableau · Power BI · Amazon QuickSight · Looker Studio · Google Data Studio · Matplotlib · Excel

Experience

Professional journey

From market research and supply chain analytics to founding ML infrastructure and quantitative research on live portfolios.

Alphonso AI · backed by Shipley Center for Innovation · Current

Founding ML Engineer · Remote, USA

Jul 2025 – Present · 8 mos

Backend Architecture: Designed a 0→1 backend ecosystem using FastAPI and PostgreSQL, orchestrating a scalable microservices bridge between Java-based core services and Python-native ML workloads.
Cost-Efficient Infrastructure: Deployed and managed production services on a DigitalOcean VPS to reduce infrastructure overhead; implemented Docker-based containerization to ensure environment parity across R&D and production.
Advanced Retrieval (RAG): Engineered a multi-model "Text-to-Query" (TTQ) engine leveraging Gemini (Vertex AI) and DeepSeek APIs to enable dynamic, prompt-driven semantic search across high-dimensional talent data.
Search Optimization: Deployed a multi-stage retrieval pipeline using pgvector for Approximate Nearest Neighbor (ANN) search and CUDA-accelerated cross-encoders for high-precision re-ranking, targeting a 38% improvement in Precision@N.
Domain-Aware Recommendation: Developed a sector-specific ranking system using vectorized embeddings; shifted logic from generic role-matching to domain-expertise alignment, improving candidate-to-company fit.
Generative Team-Composition: Built a module that translates natural-language product descriptions into granular technical requirements and specific candidate matches, bridging the gap for non-technical founders.
System Design & MCP: Led relational schema normalization, API contract definition, and R&D into the Model Context Protocol (MCP) for agentic, self-correcting database interactions.

Python · FastAPI · PostgreSQL · Docker · pgvector · HuggingFace · Gemini · Vertex AI · DeepSeek · DigitalOcean

Student Managed Investment Fund · Clarkson University

Graduate Quantitative Researcher · Potsdam, NY

Built a BERT-based NLP sentiment pipeline for SEC EDGAR and Reddit data, and developed Monte Carlo simulations to stress-test a $650K portfolio that delivered a 51% return, outperforming the S&P 500.

Sep 2024 – Apr 2025 · 7 mos

HAVK Mladost · Elite Athletics Club

Graduate Data Science Consultant · Potsdam, NY

Architected a centralised AWS S3 data lake for an elite athletics club and built PySpark ETL jobs on AWS Glue, cutting data retrieval latency by 30% and enabling real-time performance analytics.

Oct 2023 – May 2025 · 1 yr 7 mos

eAppSys Limited

Business Data Analyst · Hyderabad, India

Built demand forecasting models (Prophet/SARIMAX) for 1,500+ SKUs, improving forecast accuracy by 15%, and deployed automated KPI dashboards that saved the procurement team 12+ hours/week.

Jul 2022 – Dec 2022 · 5 mos

Kantar GDC India

Data Analyst · Pune, India

Built automated data pipelines integrating 10M+ survey records from 30+ sources for Tracker and Syndicated Research projects, cutting processing latency by 30% across MECA markets.

Sep 2021 – May 2022 · 8 mos

Education

Academic background

Formal training across computer science and applied data science, spanning two continents and over seven years of academic development.

Master of Science · Applied Data Science

Clarkson University · Potsdam, NY, USA

2023 to 2025

Specialisation in Machine Learning, NLP, and Cloud Data Engineering
Graduate research through the Student Managed Investment Fund (SMIF), managing a $650K real-capital portfolio
Consulting engagement with HAVK Mladost (Croatia) on sports analytics and AWS data infrastructure
Coursework: Statistical Learning, Big Data Analytics, Financial Modelling, Database Systems

Machine Learning · NLP · Cloud Engineering · Financial Analytics · Statistics

Bachelor of Technology · Computer Science

Yogi Vemana University · Proddatur, Andhra Pradesh, India

Undergraduate engineering degree in Computer Science with a foundation in data structures, algorithms, databases, and software engineering.

2016 to 2020

Projects

Notable work

End-to-end builds spanning NLP pipelines, ML systems, financial analytics, full-stack web apps, and data-driven research.

Text Analysis using NLP and LDA · Featured

NLP & Text

Topic modelling and text analysis pipeline using Latent Dirichlet Allocation.

Detoxify Telugu · Featured

NLP & Text

Toxic comment classification for the Telugu language using deep learning.

Synthetic Data Generator

Machine Learning

Configurable tool to generate realistic synthetic datasets for ML training and QA.

BingeMax Recommendation Engine · Featured

Recommendation

Personalised movie recommendation system using collaborative and content-based filtering.

Pre-Owned Cars Price Prediction

Machine Learning

ML regression model predicting used car prices with 90%+ accuracy.

Fake News Classifier · Featured

NLP & Text

NLP-based classification system to detect fake news articles with high precision.

Stay Connected

Get in touch

Whether you have a role in mind, a project to discuss, or just want to connect, my inbox is open.

Let's connect

Open to data science, ML engineering, and product analytics roles. Always happy to exchange ideas or explore collaboration.

pavan.yellathakota.ds@gmail.com
+1 (929) 278-4589
Seattle, WA. Open to relocation and remote opportunities

GitHub LinkedIn

Available for new opportunities