Pavan Yellathakota

Data Analyst · Data Scientist · AI / ML Engineer
Seattle, WA, USA · Open for Opportunities

A Data Explorer passionate about every field where data creates impact. My work spans Market Research, Supply Chain Analytics, and ETL Pipelines, and extends into ML, Deep Learning, and BERT-based Text and Semantic Analysis.

NLP & Text Analysis · ETL Pipelines · ML / Deep Learning · Quant Research · Cloud Data Eng. · Market Research
About me

Building at the intersection of data, intelligence, and engineering.

I'm a Data Analyst turned ML Engineer with a Master's in Applied Data Science and 3+ years of experience building end-to-end data pipelines and ML systems. My work spans the full spectrum, from Market Research and Supply Chain Analytics to designing Databases, ETL Pipelines, and production-grade AI backends.

I extend that foundation into Machine Learning and Deep Learning, with a specific focus on BERT-based Text & Semantic Analysis. I'm skilled at bridging the gap between data engineering and stakeholder decision-making through statistical validation, A/B testing, and interactive analytics.

I currently work as a Founding ML Engineer at Alphonso AI, where I architect retrieval systems and LLM-driven backends, specialising in moving R&D work into production infrastructure using FastAPI, pgvector, and Vertex AI.

Retrieval & RAG · NLP / BERT · Time-Series Forecasting · A/B Testing · ETL / ELT · Cloud Data Eng. · Statistical Modelling · BI & Dashboarding
Skills

Technical stack

Tools and technologies across the full data & ML lifecycle, from raw ingestion to model deployment and visualisation.

Programming
Python · SQL · R · Pandas · NumPy · Scikit-learn · XGBoost · PyTorch · CausalML · SciPy
Gen AI & Vector DB
Gemini (Vertex AI) · HuggingFace · OpenAI · Ollama · DeepSeek · LangChain · pgvector · MCP · ANN Search
ML & Data Science
A/B Testing · Time-Series Forecasting · Statistical Modeling · Statsmodels · Prophet · SARIMAX · BERT / NLP · MLflow
Data Engineering
PySpark · AWS Glue · Airflow · dbt · Airbyte · PostgreSQL · Redshift · Snowflake · ETL / ELT
Backend & MLOps
FastAPI · Docker · DigitalOcean · AWS SageMaker · AWS Lambda · Amazon S3 · Amazon Athena · Databricks
Cloud Analytics & Visualisation
Tableau · Power BI · Amazon QuickSight · Looker Studio · Google Data Studio · Matplotlib · Excel
Experience

Professional journey

From market research and supply chain analytics to founding ML infrastructure and quantitative research on live portfolios.

  • Backend Architecture: Designed a 0→1 Backend Ecosystem using FastAPI and PostgreSQL, orchestrating a scalable microservices bridge between Java-based core services and Python-native ML workloads.
  • Cost-Efficient Infrastructure: Deployed and managed production services on DigitalOcean VPS to optimize infrastructure overhead; implemented Docker-based containerization to ensure environment parity across R&D and production.
  • Advanced Retrieval (RAG): Engineered a Multi-Model "Text-to-Query" (TTQ) engine leveraging Gemini (Vertex AI) and DeepSeek APIs to enable dynamic, prompt-driven semantic search across high-dimensional talent data.
  • Search Optimization: Deployed a multi-stage retrieval pipeline using pgvector for Approximate Nearest Neighbor (ANN) search and CUDA-accelerated Cross-Encoders for high-precision re-ranking, targeting a 38% improvement in Precision@N.
  • Domain-Aware Recommendation: Developed a sector-specific ranking system using Vectorized Embeddings; shifted logic from generic role-matching to domain-expertise alignment, improving candidate-to-company fit.
  • Generative Team-Composition: Built a module that translates natural language product descriptions into granular technical requirements and specific candidate matches, bridging the gap for non-technical founders.
  • System Design & MCP: Led relational schema normalization, API contract definition, and R&D into Model Context Protocol (MCP) for agentic, self-correcting database interactions.
Python · FastAPI · PostgreSQL · Docker · pgvector · HuggingFace · Gemini · Vertex AI · DeepSeek · DigitalOcean
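
For readers unfamiliar with the retrieve-then-re-rank pattern and the Precision@N metric mentioned in the Search Optimization item, here is a minimal, illustrative sketch. It is not the production code. It makes several assumptions: exact brute-force cosine similarity stands in for pgvector's ANN search, the hypothetical `skill_overlap` field stands in for a cross-encoder relevance score, and the four-item corpus and the `relevant` set are made-up toy data.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, corpus, k):
    """Stage 1: shortlist the k most similar items.
    Exact search here; an ANN index would approximate this step."""
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item["vec"]), reverse=True)
    return ranked[:k]

def rerank(candidates, score_fn, n):
    """Stage 2: re-order the shortlist with a costlier scorer and keep the top n."""
    return sorted(candidates, key=score_fn, reverse=True)[:n]

def precision_at_n(results, relevant_ids, n):
    """Fraction of the top-n results that are in the relevant set."""
    if n <= 0:
        return 0.0
    hits = sum(1 for item in results[:n] if item["id"] in relevant_ids)
    return hits / n

# Toy corpus (hypothetical): an embedding plus a stand-in re-ranking signal.
corpus = [
    {"id": "a", "vec": [1.0, 0.0], "skill_overlap": 0.20},
    {"id": "b", "vec": [0.9, 0.1], "skill_overlap": 0.90},
    {"id": "c", "vec": [0.7, 0.7], "skill_overlap": 0.80},
    {"id": "d", "vec": [0.0, 1.0], "skill_overlap": 0.95},
]
query = [1.0, 0.0]
relevant = {"b", "c"}  # assumed ground-truth labels for this query

shortlist = retrieve(query, corpus, k=3)                      # ids: a, b, c
final = rerank(shortlist, lambda c: c["skill_overlap"], n=2)  # ids: b, c

print(precision_at_n(shortlist, relevant, 2))  # 0.5 before re-ranking
print(precision_at_n(final, relevant, 2))      # 1.0 after re-ranking
```

The two-stage split keeps the expensive scorer off the full corpus: the cheap first stage narrows the candidates, and only the shortlist is re-scored.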
Education

Academic background

Formal training across computer science and applied data science, spanning two continents and over seven years of academic development.

  • Specialisation in Machine Learning, NLP, and Cloud Data Engineering
  • Graduate research through the Student Managed Investment Fund (SMIF), managing a $650K real-capital portfolio
  • Consulting engagement with HAVK Mladost (Croatia) on sports analytics and AWS data infrastructure
  • Coursework: Statistical Learning, Big Data Analytics, Financial Modelling, Database Systems
Machine Learning · NLP · Cloud Engineering · Financial Analytics · Statistics
Projects

Notable work

End-to-end builds spanning NLP pipelines, ML systems, financial analytics, full-stack web apps, and data-driven research.

Stay Connected

Get in touch

Whether you have a role in mind, a project to discuss, or just want to connect, my inbox is open.

Let's connect

Open to data science, ML engineering, and product analytics roles. Always happy to exchange ideas or explore collaboration.

Available for new opportunities
© 2026 Pavan Yellathakota. Built with Astro + Tailwind. · pye.pages.dev