Pick Your Sector

The domain shapes the work

Most beginners learn tools like Python and SQL in a vacuum. In reality, the domain dictates the workflow, the metrics that matter, and the consequences of being wrong. Sixteen sectors, each with its own language, stack, and ethical terrain.

The key insight

A data scientist in Finance worries about fraud and regulation. One in Healthcare worries about patient privacy and life-or-death accuracy. The Python libraries might overlap, but the thinking is completely different. Pick a domain that genuinely interests you, then build depth there.

01

Tech & SaaS

The landscape: Fast-moving, massive data volume from logs and user events, relentless focus on product growth metrics.

Core objective: Increase user engagement, reduce churn, optimize product features, and grow recurring revenue.

Key Terminology

A/B Testing

Showing two versions of a feature to different user cohorts to measure which performs better. The bread and butter of product analytics.

Churn Rate

The percentage of customers who stop using the product in a given period. The single metric most SaaS companies obsess over.

LTV (Lifetime Value)

Total revenue expected from a single customer account over their relationship with the product. Compared against CAC.

CAC (Customer Acquisition Cost)

How much it costs to acquire one new customer. The LTV/CAC ratio is a common gauge of business viability.

Cohort Analysis

Tracking behavior of a group of users who joined in the same period. Reveals retention patterns over time.

Product Telemetry / Clickstream

Data generated by every click, scroll, and navigation event a user triggers inside the product.

DAU / MAU

Daily Active Users / Monthly Active Users. Core engagement metrics. The DAU/MAU ratio shows stickiness.

Funnel Analysis

Tracking user drop-off at each step toward a goal (signup, purchase, activation). Reveals where users abandon.
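
Several of the metrics above are plain ratios. A minimal sketch, assuming made-up monthly figures; the function names are illustrative and do not come from any analytics product:

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Fraction of the starting customer base that left during the period."""
    return customers_lost / customers_at_start


def ltv_to_cac(ltv: float, cac: float) -> float:
    """Lifetime value earned per unit of acquisition cost."""
    return ltv / cac


def stickiness(dau: float, mau: float) -> float:
    """DAU/MAU ratio: share of monthly users active on a typical day."""
    return dau / mau


# Hypothetical figures for one month.
print(f"Churn rate: {churn_rate(2000, 100):.1%}")
print(f"LTV/CAC:    {ltv_to_cac(900.0, 300.0):.1f}")
print(f"Stickiness: {stickiness(2500, 10000):.0%}")
```

Real teams argue about the denominators (which customers count as "at start", what counts as "active"), so treat the definitions here as one reasonable choice.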

Typical Roles

Product Analyst

Analyzes feature usage, designs experiments, and translates data into product decisions.

Growth Analyst

Owns acquisition and activation metrics. Partners with marketing and engineering on growth loops.

Data Scientist

Builds recommendation systems, personalization models, and churn prediction pipelines.

Analytics Engineer

Maintains the semantic data layer. Owns dbt models, data contracts, and BI tooling.

ML Engineer

Deploys ranking, recommendations, and fraud models into real-time serving infrastructure.

Tech Stack

Core

SQL, Python, dbt, Spark, Airflow

Specialized

Amplitude, Mixpanel, Looker, Tableau, Mode, Segment, BigQuery, Snowflake, Databricks

Ethical terrain: User privacy is the landmine. Clickstream data is intimate. GDPR, CCPA, and cookie consent laws constrain what you can track. Dark patterns in A/B tests (manipulating users into purchases) are a reputation and regulatory risk.

02

Finance, FinTech & Banking

The landscape: High precision requirements, time-series heavy, enormous regulatory overhead, and zero tolerance for model errors that cost money.

Core objective: Manage risk, detect fraud, automate decisions, forecast markets, and satisfy regulators.

Key Terminology

Credit Risk

The probability that a borrower defaults on a loan. Output of credit scoring models (e.g., FICO, internal scorecards).

Fraud Detection

Real-time anomaly detection on transactions. False negatives (missed fraud) cost money. False positives (flagging legit transactions) lose customers.

AML (Anti-Money Laundering)

Detecting patterns of suspicious financial activity and reporting them to regulators. Graph analytics are common here.

Basel III / IV

International banking regulations that determine how much capital banks must hold against risk. Models must satisfy these frameworks.

VaR (Value at Risk)

Statistical measure of the potential loss in value of a portfolio over a defined period at a given confidence level.

Alpha / Beta

Alpha is excess return over a benchmark. Beta measures sensitivity to market movements. Quants live by these.

P&L (Profit and Loss)

The financial statement summarizing revenues, costs, and expenses. Every analyst must be able to read one.

Stress Testing

Simulating extreme market scenarios (2008 crash, COVID shock) to see if a portfolio or bank would survive.
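
VaR, defined above, can be estimated in several ways. The sketch below uses historical simulation, one common approach, on a short list of invented daily returns. The function name and the quantile convention (take the ceil(confidence x n)-th smallest loss) are assumptions made for illustration:

```python
import math


def historical_var(returns, confidence=0.95):
    """Historical-simulation VaR, returned as a positive loss fraction.

    Sorts the observed losses and picks the one at the requested
    confidence quantile.
    """
    losses = sorted(-r for r in returns)  # losses in ascending order
    n = len(losses)
    # round() guards against floating-point noise such as 9.000000000000002
    rank = math.ceil(round(confidence * n, 9))
    return losses[rank - 1]


# Ten hypothetical daily portfolio returns.
daily_returns = [-0.05, -0.02, -0.01, 0.0, 0.01, 0.01, 0.02, 0.02, 0.03, 0.04]
var_90 = historical_var(daily_returns, confidence=0.90)
print(f"1-day 90% VaR: {var_90:.1%} of portfolio value")
```

Ten observations are far too few for a real estimate; production risk systems use long return histories or parametric and Monte Carlo methods.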

Typical Roles

Credit Risk Analyst

Builds probability-of-default and loss-given-default models for retail and corporate lending.

Fraud Analyst / Scientist

Designs real-time anomaly detection systems for card-not-present fraud and account takeover.

Quantitative Analyst (Quant)

Develops mathematical models for pricing derivatives, high-frequency trading, and portfolio optimization.

AML Data Scientist

Builds graph-based models to detect money laundering networks and layering patterns.

Risk Data Engineer

Builds the pipelines that feed risk dashboards and regulatory reporting systems.

Tech Stack

Core

Python, R, SQL, SAS, Pandas, NumPy, StatsModels

Specialized

Prophet, ARIMA, Bloomberg Terminal API, QuantLib, Flink, Kafka, Databricks, Snowflake

Ethical terrain: Model bias in credit scoring can illegally deny loans to protected groups (in the US, ECOA governs fair lending and FCRA governs credit reporting). Explainability is not optional: lenders must be able to tell a customer the specific reasons they were denied. Black-box models face legal exposure.

03

Healthcare & Bio-Informatics

The landscape: High stakes, strict regulation, notoriously messy data (handwritten notes, inconsistent coding), and decisions that directly affect human lives.

Core objective: Improve patient outcomes, reduce preventable hospitalizations, accelerate drug discovery, and reduce operational costs.

Key Terminology

EHR / EMR

Electronic Health / Medical Records. The digital version of a patient chart. Source of most clinical datasets.

HIPAA (US) / GDPR (EU)

Privacy laws governing patient data. De-identification and access controls are mandatory. Violations carry massive fines.

ICD-10 Codes

International classification of diseases. E11 is Type 2 diabetes. You must understand these to work with claims data.

Sensitivity vs. Specificity

Sensitivity (recall): the share of sick patients the test catches. Specificity: the share of healthy patients it correctly clears. A false negative in cancer screening is catastrophic.

FHIR

Fast Healthcare Interoperability Resources. The API standard for health data exchange between systems.

DICOM

Digital Imaging and Communications in Medicine. The file format for medical images (CT scans, MRIs, X-rays).

Radiomics

Extracting quantitative features from medical images for classification, staging, or prognosis tasks.

Genomics / Transcriptomics

Analysis of DNA sequences (genomics) or gene expression data (transcriptomics) to identify disease markers.
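
Sensitivity and specificity, from the terminology above, fall straight out of a confusion matrix. A minimal sketch with invented screening counts; the function names are illustrative:

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of truly sick patients the test flags (recall)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of truly healthy patients the test correctly clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical screening results: 100 sick patients, 1000 healthy ones.
tp, fn = 90, 10
tn, fp = 950, 50
print(f"Sensitivity: {sensitivity(tp, fn):.0%}")
print(f"Specificity: {specificity(tn, fp):.0%}")
```

The two numbers trade off against each other as the decision threshold moves, which is why healthcare teams rarely report accuracy alone.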

Typical Roles

Clinical Data Analyst

Cleans EHR data and builds dashboards on readmission rates, ICU capacity, and surgical outcomes.

Computational Biologist

Analyzes DNA/RNA sequences to find genetic markers for diseases or drug targets.

Medical Imaging ML Engineer

Trains CNNs to detect tumors, segment organs, or classify pathology slides.

Health Economist / Outcomes Researcher

Models the cost-effectiveness of treatments and interventions for payers and policymakers.

Clinical NLP Scientist

Extracts structured information from unstructured clinical notes using NLP pipelines.

Tech Stack

Core

Python, R, SQL

Specialized

BioPython, Bioconductor (R), pydicom, nibabel, FHIR SDK, Hugging Face (clinical NLP), MIMIC-III dataset, CMS Open Data

Ethical terrain: A false negative can kill someone. A false positive causes unnecessary treatment and psychological harm. Explainability to clinicians is non-negotiable. Model fairness across demographic groups (age, race, sex) is a regulatory and ethical requirement in most jurisdictions.

04

Marketing, Ad-Tech & Social Media

The landscape: Text-heavy, fast-moving, driven by sentiment and behavioral signals. Campaigns can turn in 24 hours, and data freshness is critical.

Core objective: Understand brand perception, target advertisements efficiently, measure campaign ROI, and maximize customer lifetime value.

Key Terminology

NLP (Natural Language Processing)

Teaching machines to understand human text. Underpins sentiment analysis, topic modeling, and review classification.

Sentiment Analysis

Classifying text (tweets, reviews) as positive, negative, or neutral toward a brand or topic.

MMM (Marketing Mix Modeling)

Statistical regression models measuring the incremental impact of each marketing channel (TV, digital, radio) on sales.

ROAS (Return on Ad Spend)

Revenue generated per dollar of ad spend. The primary KPI for paid media teams.

Lookalike Audience

Finding new users who statistically resemble your best existing customers. Common in Facebook/Google ad targeting.

Attribution Modeling

Assigning credit for a conversion to the marketing touchpoints that led to it. Last-click vs. data-driven attribution.

Real-Time Bidding (RTB)

Programmatic auction where ad impressions are bought and sold in milliseconds when a webpage loads.

Customer Segmentation

Dividing customers into groups based on behavior, demographics, or value. Powers personalization.
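
To make ROAS and attribution concrete, here is a minimal sketch. It compares last-click attribution with a simple linear (equal-split) rule, which stands in for the data-driven models mentioned above because those need a trained model. The channel names, figures, and function names are invented:

```python
from collections import defaultdict


def roas(revenue: float, ad_spend: float) -> float:
    """Revenue generated per unit of ad spend."""
    return revenue / ad_spend


def last_click(path, value):
    """Give all conversion credit to the final touchpoint."""
    return {path[-1]: value}


def linear(path, value):
    """Split conversion credit equally across every touchpoint."""
    credit = defaultdict(float)
    share = value / len(path)
    for channel in path:
        credit[channel] += share
    return dict(credit)


# One hypothetical customer journey ending in a 100-unit purchase.
journey = ["search", "social", "email", "search"]
print(last_click(journey, 100.0))
print(linear(journey, 100.0))
print(f"ROAS: {roas(5000.0, 1250.0):.1f}")
```

The same journey yields very different channel credit under the two rules, which is the whole attribution debate in miniature.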

Typical Roles

Marketing Analyst

Reports on campaign KPIs (CTR, CPL, CPA) and identifies under-performing channels.

Social Media Data Scientist

Uses NLP to monitor brand sentiment at scale across Twitter, Reddit, TikTok, and review sites.

Ad-Tech Engineer / Bidding Scientist

Builds the ML models that power real-time bidding and ad relevance scoring.

Marketing Data Engineer

Integrates data from ad platforms, CRM, and web analytics into a unified marketing data warehouse.

Tech Stack

Core

Python, SQL, R

Specialized

NLTK, spaCy, Hugging Face Transformers, TextBlob, Google Analytics 4, Amplitude, Brandwatch, Looker Studio, Meta Ads API, Google Ads API

Ethical terrain: Micro-targeting at scale creates filter bubbles and can reinforce societal division. Political ad targeting has come under significant legislative scrutiny. Using lookalike modeling on protected classes (race, religion) is illegal in many contexts.

05

People Analytics (HR)

The landscape: Sensitive data about real individuals employed by the company. Ethics and legal compliance are as important as model accuracy.

Core objective: Improve hiring quality, reduce voluntary attrition, boost employee engagement, and close pay equity gaps.

Key Terminology

Attrition / Turnover

The rate at which employees leave the company. Voluntary attrition (quitting) vs. involuntary (layoffs) are tracked separately.

DEI (Diversity, Equity, Inclusion)

Analyzing promotion rates, pay gaps, and representation across gender, ethnicity, and disability status.

Performance Rating Bias

Ratings are often skewed by manager subjectivity, recency bias, and demographic factors. Data must be treated with skepticism.

Talent Acquisition Funnel

Applicant to Interview to Offer to Hire. Each stage is analyzed for conversion rates and demographic disparities.

Engagement Score

Survey-based metric of how motivated and connected employees feel. Leading indicator of attrition.

Time-to-Fill / Time-to-Hire

How long it takes to fill an open role. Key operational metric for recruiting teams.

Span of Control

Average number of direct reports per manager. Informs org design analysis.
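
The talent acquisition funnel above is analyzed stage by stage. A minimal sketch of the conversion-rate calculation, using invented headcounts and an illustrative function name:

```python
def stage_conversion(funnel):
    """Return (from_stage, to_stage, rate) for each consecutive pair of stages."""
    return [
        (src, dst, n_dst / n_src)
        for (src, n_src), (dst, n_dst) in zip(funnel, funnel[1:])
    ]


# Hypothetical counts for one hiring quarter.
hiring_funnel = [("Applicant", 1000), ("Interview", 200), ("Offer", 40), ("Hire", 30)]
for src, dst, rate in stage_conversion(hiring_funnel):
    print(f"{src} -> {dst}: {rate:.0%}")
```

Running the same calculation separately for each demographic group is one simple way to surface the disparities the definition mentions.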

Typical Roles

HR Analyst

Builds dashboards on headcount, turnover, compensation bands, and diversity metrics for leadership.

People Data Scientist

Builds attrition prediction models, analyzes resume screening algorithms for bias, and models team performance.

Compensation Analyst

Uses market data and internal pay equity analysis to recommend salary bands and flag outliers.

Organizational Network Analyst

Maps communication patterns (email, Slack metadata) to understand informal influence and collaboration health.

Tech Stack

Core

Python, R, SQL, Excel

Specialized

Workday, SAP SuccessFactors, Visier, PeopleSoft, Qualtrics, Tableau, Power BI

Ethical terrain: Attrition models that flag employees as flight risks can lead to unfair treatment or preemptive termination. Bias in resume screening has been well-documented (Amazon abandoned its ML hiring tool for this reason). Any model touching hiring, promotion, or compensation is high legal risk.

06

Retail, E-commerce & Manufacturing

The landscape: Physical supply chains intersecting with digital storefronts. High seasonality, complex inventory dynamics, and thin margins where small forecast errors are expensive.

Core objective: Forecast demand accurately, optimize pricing, prevent stockouts, and reduce operational waste.

Key Terminology

SKU (Stock Keeping Unit)

Unique identifier for each distinct product variant. A blue shirt in size large is a different SKU from size medium.

Demand Forecasting

Predicting how many units of each SKU will sell next week or next month. Drives inventory purchasing decisions.

Market Basket Analysis

Finding items frequently purchased together. Powers cross-sell recommendations. Classic algorithm: Apriori.

Inventory Optimization

Setting reorder points and safety stock levels to balance holding costs against stockout risk.

Markdown Optimization

Deciding when and how much to discount aging inventory to maximize recovery value.

Predictive Maintenance

Using sensor data from manufacturing equipment to predict failure before it happens. Reduces unplanned downtime.

Price Elasticity

How sensitive demand is to a price change. Informs dynamic pricing models.
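
Two of the terms above reduce to short formulas. The sketch below assumes the simple percentage-change definition of elasticity and the textbook reorder-point rule (expected demand during lead time plus safety stock); all numbers and function names are invented:

```python
def price_elasticity(p_old: float, p_new: float, q_old: float, q_new: float) -> float:
    """Percent change in quantity sold divided by percent change in price."""
    pct_quantity = (q_new - q_old) / q_old
    pct_price = (p_new - p_old) / p_old
    return pct_quantity / pct_price


def reorder_point(avg_daily_demand: float, lead_time_days: float, safety_stock: float) -> float:
    """Inventory level at which a new order should be placed."""
    return avg_daily_demand * lead_time_days + safety_stock


# Hypothetical SKU: a 10% price increase cut weekly sales from 100 to 85 units.
print(f"Elasticity: {price_elasticity(10.0, 11.0, 100.0, 85.0):.2f}")
print(f"Reorder at: {reorder_point(20, 5, 30)} units")
```

An elasticity below -1 means demand is elastic: raising the price lowers revenue. Real pricing models estimate this from many price points, not a single before/after pair.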

Typical Roles

Supply Chain Analyst

Forecasts demand at SKU level, monitors supplier lead times, and flags inventory risks.

Pricing Analyst / Scientist

Builds price elasticity models and dynamic pricing algorithms across product categories.

Operations / Manufacturing Data Scientist

Analyzes sensor data for predictive maintenance and optimizes production scheduling.

E-commerce Analytics Manager

Owns conversion rate optimization, basket analysis, and personalization for the digital storefront.

Tech Stack

Core

Python, SQL, R, Pandas

Specialized

Prophet, ARIMA, LightGBM, SAP, Oracle ERP, Tableau, Power BI, AWS Forecast, Google Cloud Retail AI

Ethical terrain: Dynamic pricing at scale can create perceived price discrimination. Surge pricing during emergencies is regulated in many jurisdictions. Predictive maintenance models must be validated rigorously: a missed failure in a food manufacturing plant has safety consequences.

07

Cybersecurity & InfoSec

The landscape: Adversarial by nature. Your models are being actively probed by sophisticated adversaries who adapt to evade detection. Speed of inference matters as much as accuracy.

Core objective: Detect intrusions, identify malware, prevent data exfiltration, and model attacker behavior before damage occurs.

Key Terminology

SIEM (Security Information and Event Management)

Platform that aggregates and analyzes log data from across the IT environment to detect threats.

IOC (Indicator of Compromise)

Evidence that a system has been breached: unusual IP addresses, file hashes, registry keys.

APT (Advanced Persistent Threat)

Sophisticated, long-term attackers (often nation-state) who maintain access quietly over months or years.

Anomaly Detection

Identifying statistical outliers in network traffic or user behavior that may indicate an attack.

User and Entity Behavior Analytics (UEBA)

Building baseline behavioral profiles for users and flagging deviations (insider threats, compromised accounts).

False Positive Rate

In security, a high false positive rate causes alert fatigue. Analysts ignore alarms. Tuning the precision-recall tradeoff is critical.
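
The anomaly detection and UEBA ideas above share one pattern: build a baseline, then flag deviations. A minimal sketch using a z-score rule on invented daily login counts; the function name and the threshold of 3 are assumptions, and real systems use far richer features:

```python
from statistics import mean, stdev


def flag_anomalies(baseline, observed, threshold=3.0):
    """Return observed values more than `threshold` standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # assumes the baseline is not constant (sigma > 0)
    return [x for x in observed if abs(x - mu) / sigma > threshold]


# Hypothetical logins per day for one user over ten normal days.
normal_days = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11]
new_days = [11, 13, 40]
print(flag_anomalies(normal_days, new_days))
```

Lowering the threshold catches more attacks but raises the false positive rate, which is the alert-fatigue tradeoff described above.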

Typical Roles

Threat Intelligence Analyst

Processes external threat feeds and maps them to the organization's attack surface.

Security Data Scientist

Builds ML models for malware classification, network intrusion detection, and user behavior anomaly detection.

Red Team Data Analyst

Analyzes findings from penetration tests to identify systemic weaknesses across the organization.

Tech Stack

Core

Python, Elasticsearch, SQL, Spark

Specialized

Splunk, Elastic SIEM, Zeek (network analysis), YARA (malware rules), Kafka, scikit-learn, PyTorch (for deep packet inspection models)

Ethical terrain: False negatives in security are high-stakes. Missing an intrusion can mean ransomware encrypting entire hospital systems. UEBA models that monitor employee behavior raise significant privacy and labor rights questions.

08

Transportation & Logistics

The landscape: Real-time, geospatial, and optimization-heavy. Data is generated by vehicles, sensors, and GPS at high frequency. Route optimization is an NP-hard problem.

Core objective: Minimize delivery time and cost, maximize fleet utilization, predict disruptions, and optimize routing at scale.

Key Terminology

OTP (On-Time Performance)

The percentage of trips or deliveries completed within the scheduled window. The primary SLA metric.

Last Mile Delivery

The final step of delivery from distribution hub to the end customer. The most expensive and labor-intensive segment.

VRP (Vehicle Routing Problem)

The optimization problem of finding the most efficient set of routes for a fleet of vehicles. NP-hard, so exact solutions become impractical for large instances and heuristics take over.

ETA Prediction

Machine learning model to predict arrival time in real time, accounting for traffic, weather, and route conditions.

Load Factor / Capacity Utilization

How full trucks, planes, or ships are on a given route. Low load factor means wasted cost.

Dwell Time

Time a vehicle spends stationary at a stop. Used for bottleneck detection in rail and port operations.
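
Because routing problems like the VRP are NP-hard, practical systems lean on heuristics. The sketch below is a greedy nearest-neighbor heuristic for a single vehicle, written in plain Python rather than a solver such as OR-Tools. It is fast but not guaranteed to find the shortest route; the coordinates and function names are invented:

```python
import math


def nearest_neighbor_route(depot, stops):
    """Visit stops greedily, always driving to the closest unvisited one."""
    remaining = list(stops)
    route = []
    current = depot
    while remaining:
        nearest = min(remaining, key=lambda p: math.dist(current, p))
        remaining.remove(nearest)
        route.append(nearest)
        current = nearest
    return route


def route_length(depot, route):
    """Total straight-line distance of depot -> stops -> depot."""
    points = [depot, *route, depot]
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


# Hypothetical depot and delivery stops on a flat grid.
depot = (0, 0)
stops = [(2, 0), (1, 0), (5, 0)]
route = nearest_neighbor_route(depot, stops)
print(route, route_length(depot, route))
```

Real deployments add time windows, vehicle capacities, and road-network distances, which is where dedicated solvers earn their place.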

Typical Roles

Logistics Data Scientist

Builds ETA prediction, demand forecasting for capacity planning, and route optimization models.

Fleet Analytics Manager

Monitors vehicle health data from telematics and predicts maintenance needs.

Network Design Analyst

Uses operations research methods to determine optimal warehouse locations and transportation lane structures.

Tech Stack

Core

Python, SQL, OR-Tools (Google), GeoPandas

Specialized

Folium, Kepler.gl, OSRM (routing engine), HERE Maps API, Google Maps Platform, Apache Flink (real-time streams), Tableau

Ethical terrain: Algorithmic routing can concentrate delivery burdens on specific neighborhoods or discriminate in service quality by area. Gig-economy worker classification (employee vs. contractor) intersects directly with how performance data is used.

09

Energy & Utilities

The landscape: Critical infrastructure. Time-series data from physical sensors at massive scale. The consequences of errors are outages affecting millions of people.

Core objective: Forecast energy demand, optimize grid stability, accelerate renewable integration, and detect equipment failure before it causes outages.

Key Terminology

Smart Grid

An electrical grid that uses sensors, automation, and data analytics to manage electricity supply and demand in real time.

SCADA (Supervisory Control and Data Acquisition)

Industrial control systems that monitor and control physical infrastructure. The data source for grid analytics.

Demand Response

Adjusting electricity consumption in response to grid signals. ML models predict which customers will curtail usage.

Renewable Intermittency

The variability of solar and wind output. Forecasting models help grid operators manage the unpredictability.

PV (Photovoltaic) Forecasting

Predicting solar panel output using weather data, cloud cover models, and historical generation patterns.

Typical Roles

Energy Data Scientist

Builds demand forecasting, renewable output prediction, and grid anomaly detection models.

Asset Performance Engineer

Analyzes turbine, transformer, and substation sensor data for predictive maintenance.

Energy Market Analyst

Models electricity spot prices and LMP (Locational Marginal Prices), and develops trading strategies.

Tech Stack

Core

Python, SQL, R, MATLAB (legacy)

Specialized

Prophet, ARIMA, XGBoost, PyTorch (LSTM for time-series), InfluxDB, Grafana, OSIsoft PI System

Ethical terrain: Grid models that fail cause blackouts. Load shedding decisions (who loses power first) have equity implications. Low-income communities and vulnerable populations are disproportionately impacted by outage duration.

10

Environmental & Climate Science

The landscape: Long time-horizon, massive spatial datasets, satellite imagery, and models that inform policy at national and global scale.

Core objective: Model climate patterns, attribute extreme weather events to climate change, quantify environmental impact, and inform mitigation and adaptation policy.

Key Terminology

Remote Sensing

Extracting information about the Earth from satellite and aerial imagery. Used for deforestation, ice extent, and land use monitoring.

Reanalysis Data

Historical weather datasets created by combining past observations with modern atmospheric models. ERA5 from ECMWF is a widely used standard.

Carbon Accounting

Measuring and tracking greenhouse gas emissions at organizational, national, or global scale.

Downscaling

Taking coarse global climate model outputs and producing higher-resolution regional predictions using statistical or ML methods.

ESG (Environmental, Social, Governance)

Corporate sustainability framework. Data scientists increasingly build ESG scoring systems for investment analysis.

NetCDF / HDF5

File formats for storing large multidimensional geoscientific datasets (temperature fields, ocean salinity grids).

Typical Roles

Climate Data Scientist

Analyzes climate model outputs, identifies trends in temperature and precipitation, and builds attribution studies.

ESG / Sustainability Analyst

Quantifies corporate environmental impact, builds emissions forecasts, and supports sustainability reporting.

Remote Sensing Scientist

Processes satellite imagery to track deforestation, coastal erosion, urban heat islands, and crop stress.

Tech Stack

Core

Python, R, Julia

Specialized

xarray, Dask, Zarr, GeoPandas, rasterio, Google Earth Engine, NASA EarthData APIs, Pangeo platform

Ethical terrain: Climate models inform trillion-dollar policy decisions. Model uncertainty must be communicated clearly to avoid misuse. Environmental justice: climate impacts are not evenly distributed, and data analysis must surface disparate impacts on vulnerable communities.

11

Urban Intelligence & Smart Cities

The landscape: IoT sensors, real-time geospatial feeds, and public datasets intersecting with civic governance. Data scientists here work in service of public benefit rather than private profit.

Core objective: Reduce congestion, improve public safety, optimize resource allocation, and make city services more equitable and responsive.

Key Terminology

GIS (Geographic Information System)

Software and frameworks for storing, analyzing, and visualizing geospatial data (coordinates, polygons, routes).

Spatial Analysis

Analyzing patterns and relationships that exist specifically because of geographic location.

IoT (Internet of Things)

Network of physical sensors: traffic cameras, air quality monitors, parking meters, garbage fill sensors.

Digital Twin

A real-time virtual model of a physical system (a city block, a transit network) used for simulation and planning.

Open Data Portal

City or government-operated repositories of public datasets (311 calls, permit applications, crime reports).

Heat Island Effect

Urban areas are measurably warmer than surrounding rural land due to dense infrastructure. Spatial data reveals where it is most intense.

Typical Roles

Urban Data Scientist

Models traffic flow, optimizes transit scheduling, and analyzes patterns in 311 service call data.

Geospatial Analyst

Maps crime patterns, infrastructure stress, and accessibility gaps to inform planning decisions.

Sustainability / Urban Planning Analyst

Analyzes energy use, emissions, and green infrastructure to support city climate targets.

Tech Stack

Core

Python, R, SQL, PostGIS

Specialized

GeoPandas, Folium, Kepler.gl, ArcGIS (industry standard), QGIS (open source), OpenStreetMap APIs, H3 (Uber spatial indexing)

Ethical terrain: Predictive policing models have been shown to systematically discriminate against minority neighborhoods, amplifying existing biases in historical arrest data. Mass IoT sensor deployment raises surveillance consent questions. Smart city data should be treated as public infrastructure, not a product.

12

Education & EdTech

The landscape: Longitudinal data, uneven data quality across institutions, and significant equity dimensions. Outcomes (grades, graduation) lag the intervention by months or years.

Core objective: Personalize learning, predict and prevent student drop-out, measure teaching effectiveness, and optimize content delivery.

Key Terminology

Learning Analytics

Using data from learning management systems (clicks, time-on-task, quiz scores) to understand how students learn.

Early Alert / At-Risk Prediction

Models that identify students likely to drop out or fail early enough to intervene with support.

Mastery Learning

Adaptive learning approach where content pacing is based on demonstrated mastery, not time spent.

Psychometrics

The science of measuring mental attributes (knowledge, ability, attitude) through tests. Underpins standardized testing design.

IRT (Item Response Theory)

Statistical models that relate individual test-taker ability to the probability of answering specific items correctly.
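
IRT comes in several variants. The sketch below shows the two-parameter logistic (2PL) model, one common form, in which the probability of a correct answer depends on the gap between student ability and item difficulty. The parameter values are invented and the function name is illustrative:

```python
import math


def p_correct(ability: float, discrimination: float, difficulty: float) -> float:
    """2PL item response function: probability of answering an item correctly."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# Hypothetical item of average difficulty (0.0) with discrimination 1.2.
for ability in (-1.0, 0.0, 1.0):
    print(f"ability={ability:+.1f}  P(correct)={p_correct(ability, 1.2, 0.0):.2f}")
```

When ability equals difficulty the probability is exactly 0.5; a higher discrimination value makes the curve steeper around that point.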

Typical Roles

Learning Data Scientist

Builds adaptive learning algorithms, knowledge tracing models, and engagement prediction systems.

Education Researcher / Policy Analyst

Analyzes district-level outcome data to evaluate program effectiveness and inform policy.

Assessment Designer / Psychometrician

Designs and validates standardized assessments using IRT and classical test theory.

Tech Stack

Core

Python, R, SQL

Specialized

pyBKT (Bayesian Knowledge Tracing), scikit-learn, xAPI (learning data standard), Canvas / Moodle LMS APIs, Tableau, Power BI

Ethical terrain: Algorithmic tracking of student behavior at a young age raises significant consent and surveillance concerns. Predictive at-risk models can create self-fulfilling prophecies if teachers treat flagged students differently. Educational AI must be evaluated for disparate impact across racial and socioeconomic groups.

13

Sports & Human Performance

The landscape: Precision measurement of physical performance, small sample sizes (82 games in an NBA season), and increasingly available tracking data (player GPS, ball trajectory).

Core objective: Maximize player performance, inform game strategy, reduce injury risk, and evaluate player value for roster and contract decisions.

Key Terminology

Expected Value Metrics (xG, xA)

Probability-based metrics assigning value to actions (shots, passes) based on historical outcomes from similar situations.

Tracking Data / Positional Data

x/y coordinates of every player and the ball, captured at 25 frames per second by optical tracking systems.

WARP / WAR (Wins Above Replacement)

A single-number metric quantifying how many additional wins a player produces compared to a replacement-level player.

Injury Load Monitoring

Using GPS and accelerometer data to quantify physical exertion and predict soft-tissue injury risk.

Sabermetrics

Empirical, evidence-based analysis of baseball statistics pioneered by Bill James and popularized by Moneyball.

Typical Roles

Sports Data Scientist

Builds player valuation models, game strategy simulations, and injury prediction pipelines.

Performance Analyst

Analyzes GPS and video tracking data to provide tactical and physical feedback to coaching staff.

Scout / Recruitment Analyst

Builds models to identify undervalued players in transfer markets using contract and performance data.

Tech Stack

Core

Python, R, SQL

Specialized

mplsoccer (football viz), pybaseball, StatsBomb API, Opta API, R (ggplot2), Tableau, Wyscout

Ethical terrain: Biometric data collected from athletes (heart rate variability, sleep quality) is intimate health data. Ownership and consent are contested: who owns the data generated by a player on the field? Injury prediction models that affect playing time decisions raise labor rights questions.

14

Agriculture 4.0

The landscape: Satellite imagery, drone data, IoT soil sensors, and weather models intersect with one of the oldest human activities. Climate change is making historical patterns unreliable.

Core objective: Maximize crop yield, reduce water and fertilizer waste, predict pest outbreaks, and build resilient food systems.

Key Terminology

Precision Agriculture

Using data and technology to apply inputs (water, fertilizer, pesticide) only where and when they are needed, at field or sub-field resolution.

NDVI (Normalized Difference Vegetation Index)

Satellite-derived index measuring plant health and biomass. High NDVI = healthy, dense vegetation.

Growing Degree Days (GDD)

A measure of heat accumulation used to predict crop developmental stages and harvest timing.

Yield Mapping

Recording crop yield at precise GPS coordinates across a field to identify spatial variability and underperforming zones.

Digital Soil Mapping

Using spatial statistics and ML to create high-resolution maps of soil properties (pH, carbon, moisture) from point samples.
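
NDVI and GDD, from the terminology above, both have short standard formulas. A minimal sketch for single values; the reflectance numbers are invented, and the 10 °C base temperature is an assumption that varies by crop:

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from near-infrared and red reflectance."""
    return (nir - red) / (nir + red)


def growing_degree_days(t_max: float, t_min: float, t_base: float = 10.0) -> float:
    """Heat accumulated in one day above the crop's base temperature (never negative)."""
    return max(0.0, (t_max + t_min) / 2.0 - t_base)


# Hypothetical pixel over a healthy crop, and one warm day in degrees Celsius.
print(f"NDVI: {ndvi(0.5, 0.1):.2f}")
print(f"GDD:  {growing_degree_days(30.0, 20.0):.1f}")
```

In practice NDVI is computed per pixel over whole raster bands (for example with rasterio or xarray), and GDD is summed across the season.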

Typical Roles

Agricultural Data Scientist

Builds yield prediction models, crop stress detection systems from satellite imagery, and irrigation optimization algorithms.

Agronomy Informatics Specialist

Translates agronomic domain knowledge into feature engineering and model validation strategies.

Supply Chain Analyst (Food)

Forecasts harvest volumes, models logistics from farm to distribution center, and manages cold chain data.

Tech Stack

Core

Python, R, Google Earth Engine

Specialized

rasterio, xarray, GeoPandas, NDVI calculation libraries, Planet API (satellite imagery), Sentinel Hub, AgWeather APIs

Ethical terrain: Large-scale precision agriculture benefits well-resourced industrial farms. Smallholder farmers in lower-income countries often cannot access these tools, potentially widening agricultural inequality. Data sovereignty of farmer data collected by AgTech platforms is a growing legal and ethical debate.

16

Government & Public Policy

The landscape: Bureaucratic, data quality varies wildly, and the stakes are high: models inform decisions affecting millions of citizens. Public sector data is often messy, siloed, and collected with paper forms.

Core objective: Improve public service delivery, detect benefits fraud, allocate infrastructure budgets, and evaluate policy effectiveness at scale.

Key Terminology

Administrative Data

Records collected during the delivery of government services (tax records, benefit claims, school enrollment). Rich but access-restricted.

Program Evaluation

Rigorous statistical assessment of whether a government program achieved its intended outcomes. RCTs and quasi-experimental methods are used.

Benefits Fraud Detection

Identifying fraudulent claims in social welfare programs using anomaly detection and network analysis.

Algorithmic Accountability

Public requirement that government agencies explain and justify automated decision-making that affects citizens' rights.

Open Data / FOIA

Government data published for public use (data.gov) and data obtained through Freedom of Information Act requests.

Typical Roles

Policy Analyst / Data Scientist

Evaluates program effectiveness using quasi-experimental methods and builds forecasting models for budget planning.

Government Performance Analyst

Monitors KPIs for public services (school performance, hospital wait times, infrastructure condition).

Investigative Data Journalist

Uses public records, FOIA data, and data analysis to uncover waste, fraud, and abuse in government operations.

Tech Stack

Core

Python, R, SQL, Stata (common in policy research)

Specialized

Census APIs, data.gov datasets, Socrata (gov open data platform), ArcGIS, QGIS, Tableau Public

Ethical terrain: Government AI systems that deny benefits, flag individuals for surveillance, or allocate resources unequally are subject to due process requirements. The history of algorithmic risk scores in criminal justice (COMPAS) is a cautionary case study covered in many responsible AI courses.

Reference

At a Glance

Sector | Mindset | Key terms | Ethical watch-out
Tech & SaaS | "Move fast, measure everything." | A/B Testing, Churn, LTV, DAU/MAU | User privacy and manipulation
Finance | "Trust, but verify." | Credit Risk, Fraud, Basel, VaR | Model bias in lending decisions
Healthcare | "First, do no harm." | HIPAA, ICD-10, Sensitivity, FHIR | Patient privacy and false negatives
Marketing | "What is the sentiment?" | NLP, ROAS, Attribution, Segmentation | Micro-targeting and echo chambers
People Analytics | "Fairness above all." | Attrition, DEI, Engagement, Pay Equity | Reinforcing hiring bias
Retail | "Right product, right time." | SKU, Demand Forecast, Basket Analysis | Dynamic pricing fairness
Cybersecurity | "Assume breach." | SIEM, IOC, UEBA, Anomaly Detection | Employee surveillance overreach
Transportation | "Optimize every mile." | VRP, ETA, Last Mile, Load Factor | Service quality discrimination by area
Energy | "Stability first." | Smart Grid, SCADA, Demand Response | Outage equity impacts
Environmental | "Quantify the unquantifiable." | Remote Sensing, NetCDF, ESG, Downscaling | Policy misinformation from uncertain models
Urban | "Serve the public." | GIS, IoT, Spatial Analysis, Digital Twin | Predictive policing bias
Education | "Every student is different." | Learning Analytics, IRT, At-Risk Modeling | Self-fulfilling student labels
Sports | "Every tenth of a second counts." | xG, Tracking Data, WAR, GPS Load | Biometric data ownership disputes
Agriculture | "Feed the world more efficiently." | NDVI, Precision Ag, Yield Mapping, GDD | Small farmer data sovereignty
Legal / Ethics | "What is the burden of proof?" | Disparate Impact, XAI, GDPR, Fairness | This domain IS the risk domain
Government | "Does this serve the citizen?" | Administrative Data, Program Eval, FOIA | Due process in automated decisions

The advice for beginners

Do not just learn Python. Pick a domain that genuinely interests you, then build depth in that domain's vocabulary and data types.

Domain expertise is a moat. A data scientist who understands ICD-10 codes is more valuable in healthcare than one who only knows XGBoost.

Read the trade press of your target sector, for example Healthcare IT News (healthcare), Risk.net (finance), or Traffic Technology Today (transportation). The vocabulary will transfer directly to your work.

The ethical constraints of a domain are not optional extras. They shape which models are permissible, which metrics matter, and whether your work can actually be deployed.

© 2026 Pavan Yellathakota  ·  pye.pages.dev/resources/pick-your-sector .PY