Pavan Yellathakota
A Data Explorer passionate about every field where data creates impact. My work spans Market Research, Supply Chain Analytics, ETL Pipelines and extends into ML, Deep Learning, and BERT-based Text and Semantic Analysis.
Building at the intersection of data, intelligence, and engineering.
I'm a Data Analyst turned ML Engineer with a Master's in Applied Data Science and 3+ years of experience building end-to-end data pipelines and ML systems. My work spans the full spectrum, from Market Research and Supply Chain Analytics to designing Databases, ETL Pipelines, and production-grade AI backends.
I extend that foundation into Machine Learning and Deep Learning, with a specific focus on BERT-based Text & Semantic Analysis. I'm skilled at bridging the gap between data engineering and stakeholder decision-making through statistical validation, A/B testing, and interactive analytics.
I currently work as a Founding ML Engineer at Alphonso AI, where I architect retrieval systems and LLM-driven backends, specialising in moving R&D work into production infrastructure using FastAPI, pgvector, and Vertex AI.
Technical stack
Tools and technologies across the full data & ML lifecycle, from raw ingestion to model deployment and visualisation.
Professional journey
From market research and supply chain analytics to founding ML infrastructure and quantitative research on live portfolios.
- Backend Architecture: Designed a 0→1 Backend Ecosystem using FastAPI and PostgreSQL, orchestrating a scalable microservices bridge between Java-based core services and Python-native ML workloads.
- Cost-Efficient Infrastructure: Deployed and managed production services on DigitalOcean VPS to reduce infrastructure overhead; implemented Docker-based containerization to ensure environment parity across R&D and production.
- Advanced Retrieval (RAG): Engineered a Multi-Model "Text-to-Query" (TTQ) engine leveraging Gemini (Vertex AI) and DeepSeek APIs to enable dynamic, prompt-driven semantic search across high-dimensional talent data.
- Search Optimization: Deployed a multi-stage retrieval pipeline utilizing pgvector for Approximate Nearest Neighbor (ANN) search and CUDA-accelerated Cross-Encoders for high-precision re-ranking, targeting a 38% improvement in Precision@N.
- Domain-Aware Recommendation: Developed a sector-specific ranking system using Vectorized Embeddings; shifted logic from generic role-matching to domain-expertise alignment, improving candidate-to-company fit.
- Generative Team-Composition: Built a module that translates natural language product descriptions into granular technical requirements and specific candidate matches, bridging the gap for non-technical founders.
- System Design & MCP: Led relational schema normalization, API contract definition, and R&D into Model Context Protocol (MCP) for agentic, self-correcting database interactions.
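For readers unfamiliar with the multi-stage retrieval pattern mentioned under Search Optimization, here is a minimal, illustrative sketch in plain Python. It is not the production code: the first stage uses exact cosine similarity as a stand-in for pgvector's ANN search, the second stage takes an arbitrary scoring function as a stand-in for a cross-encoder, and every function name here (retrieve_candidates, rerank, precision_at_n) is hypothetical.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_candidates(query_vec: Vector, corpus: dict[str, Vector], k: int) -> list[str]:
    """Stage 1: cheap, broad recall.

    Assumption: brute-force exact search stands in for an ANN index.
    """
    ranked = sorted(
        corpus,
        key=lambda doc_id: cosine_similarity(query_vec, corpus[doc_id]),
        reverse=True,
    )
    return ranked[:k]


def rerank(candidates: list[str], score: Callable[[str], float], n: int) -> list[str]:
    """Stage 2: re-order the shortlist with a more expensive scorer.

    Assumption: `score` stands in for a cross-encoder relevance score.
    """
    return sorted(candidates, key=score, reverse=True)[:n]


def precision_at_n(ranked: list[str], relevant: set[str], n: int) -> float:
    """Fraction of the top-n results that are in the relevant set."""
    if n <= 0:
        return 0.0
    return sum(1 for doc_id in ranked[:n] if doc_id in relevant) / n
```

With a toy corpus, a call such as retrieve_candidates(query, corpus, k=3) followed by rerank(shortlist, scorer, n=2) yields the final ranking, and precision_at_n compares that ranking against a hand-labelled relevant set. The split lets the expensive scorer run on only k items instead of the whole corpus.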
Academic background
Formal training across computer science and applied data science, spanning two continents and over seven years of academic development.
- Specialisation in Machine Learning, NLP, and Cloud Data Engineering
- Graduate research through the Student Managed Investment Fund (SMIF), managing a $650K real-capital portfolio
- Consulting engagement with HAVK Mladost (Croatia) on sports analytics and AWS data infrastructure
- Coursework: Statistical Learning, Big Data Analytics, Financial Modelling, Database Systems
Get in touch
Whether you have a role in mind, a project to discuss, or just want to connect, my inbox is open.
Let's connect
Open to data science, ML engineering, and product analytics roles. Always happy to exchange ideas or explore collaboration.
- pavan.yellathakota.ds@gmail.com
- +1 (929) 278-4589
- Seattle, WA. Open to relocation and remote opportunities
