Build the data backbone
ETL/ELT pipelines, orchestration, cloud data warehouses, streaming, data quality, and the modern data stack from ingestion to serving.
Transformation & ELT
The industry standard for SQL-based data transformation. Define models, tests, and documentation as code.
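As a taste of what "tests and documentation as code" means in dbt, here is a minimal `schema.yml` sketch (the model and column names are invented for illustration):

```yaml
# models/schema.yml — hypothetical staging model
version: 2
models:
  - name: stg_orders
    description: "One row per order, cleaned from the raw source."
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
```

Running `dbt test` turns each declared test into a SQL query that fails the build if any rows violate the expectation.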
Official free courses from dbt Labs covering fundamentals, advanced techniques, and analytics engineering.
Embedded OLAP database that runs in-process. Blazing fast for local analytics and ELT prototyping.
Next-generation data transformation framework with semantic understanding and virtual environments for safe deployments.
Modern data pipeline tool with a visual UI, notebook-style blocks, and built-in orchestration and monitoring.
Open-source ELT platform with 350+ connectors. Self-host or use Airbyte Cloud (free tier available).
Orchestration
The industry standard DAG-based workflow orchestrator. Used at Airbnb, LinkedIn, and thousands of data teams.
Modern Python-native orchestrator with built-in observability. Easier to get started with than Airflow.
Asset-oriented orchestration with software-defined assets. Strong typing and lineage graph built in.
Run dbt Core projects as Airflow DAGs with task-level visibility and dependency management.
Durable workflow execution engine with automatic retries, versioning, and fault tolerance for long-running jobs.
Open-source declarative orchestration platform with YAML-based workflows and a built-in code editor.
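All of these orchestrators share one core abstraction: tasks arranged in a directed acyclic graph and executed in dependency order. A stdlib-only sketch of that idea (the task names are made up; real tools layer scheduling, retries, and observability on top):

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dag = {
    "extract": set(),
    "load": {"extract"},
    "transform": {"load"},
    "test": {"transform"},
    "publish": {"test"},
}

def run_task(name: str) -> None:
    print(f"running {name}")

# static_order() yields each task only after all its dependencies.
order = list(TopologicalSorter(dag).static_order())
for task in order:
    run_task(task)

# Because this DAG is a single chain, the order is exactly:
# extract, load, transform, test, publish
```

`graphlib` ships with Python 3.9+; it also raises `CycleError` on cyclic dependencies, which is the same validation every orchestrator performs at DAG-parse time.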
Cloud Warehouses & Lakehouses
$400 free credits on sign-up. Separates compute and storage. Industry-leading cloud data warehouse.
Free Databricks account with Spark clusters, Delta Lake, and Unity Catalog for learning and prototyping.
1TB free queries/month, 10GB free storage. Serverless SQL over petabytes. Best for GCP-native stacks.
Open table format for huge analytic datasets. Enables time-travel, schema evolution, and hidden partitioning.
Open-source ACID transactions for data lakes built on Parquet. Powers the Databricks Lakehouse architecture.
Serverless ETL service from AWS with PySpark and Python shell jobs. 1M free DPU-seconds/month.
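Iceberg and Delta Lake both get time travel the same way: every commit writes a new immutable snapshot, and a historical read simply pins an older snapshot version. A toy stdlib-only illustration of that mechanism (not either project's actual API):

```python
# Toy append-only snapshot log, illustrating how open table formats
# implement time travel: commits never mutate history, reads pin a version.
class ToyTable:
    def __init__(self):
        self.snapshots = []  # each snapshot is a full view of the table

    def commit(self, rows):
        # A commit appends a new snapshot built on top of the previous one.
        previous = self.snapshots[-1] if self.snapshots else []
        self.snapshots.append(previous + list(rows))

    def read(self, version=None):
        # version=None reads the latest snapshot; an int time-travels.
        if version is None:
            version = len(self.snapshots) - 1
        return self.snapshots[version]

table = ToyTable()
table.commit([{"id": 1}])
table.commit([{"id": 2}])
assert table.read() == [{"id": 1}, {"id": 2}]   # latest state
assert table.read(version=0) == [{"id": 1}]     # time travel to commit 0
```

Real table formats store snapshot *metadata* (pointers to data files) rather than row copies, which is what makes schema evolution and rollback cheap.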
Streaming & Real-Time
Distributed event streaming platform handling trillions of events per day. De facto standard for real-time data.
Kafka-compatible streaming platform written in C++. 10x lower latency, no JVM. Self-hostable.
Stateful stream processing at scale. Low-latency event time processing with exactly-once guarantees.
$400 in free credits for managed Kafka, Schema Registry, and ksqlDB on Confluent Cloud.
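The model shared by Kafka, Redpanda, and Confluent Cloud is an append-only log that consumers read by offset, so each consumer tracks its own position independently and reads never destroy data. A stdlib-only toy version of that idea (not the real client API):

```python
class ToyLog:
    """Append-only log: producers append records, consumers track offsets."""

    def __init__(self):
        self.records = []

    def produce(self, value) -> int:
        self.records.append(value)
        return len(self.records) - 1  # offset assigned to the new record

    def consume(self, offset: int, max_records: int = 10):
        # Reads never remove data; the consumer just advances its offset.
        batch = self.records[offset:offset + max_records]
        return batch, offset + len(batch)

log = ToyLog()
for event in ("signup", "click", "purchase"):
    log.produce(event)

batch, next_offset = log.consume(offset=0, max_records=2)
assert batch == ["signup", "click"]
assert next_offset == 2
# A second consumer starting at offset 0 would see the same records,
# which is why one log can feed many independent downstream pipelines.
```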
Data Quality & Observability
Define data quality expectations as code. Validate pipelines and generate documentation automatically.
Built-in data testing framework in dbt. Test uniqueness, referential integrity, and custom SQL assertions.
Open-source data quality framework with YAML-defined checks. Integrates with Airflow and dbt.
Data observability platform using ML to detect anomalies in pipelines, tables, and dashboards.
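These tools share a common pattern: declare checks as data, run them against a batch, and report failures rather than silently loading bad rows. A minimal stdlib-only sketch of that pattern (the check names and sample rows are invented):

```python
# Minimal "expectations as code": each check is a name plus a predicate
# over the batch; validation returns every failing check by name.
rows = [
    {"order_id": 1, "amount": 30.0},
    {"order_id": 2, "amount": -5.0},   # violates amount_non_negative
    {"order_id": 2, "amount": 12.5},   # violates order_id_unique
]

checks = {
    "amount_non_negative": lambda rs: all(r["amount"] >= 0 for r in rs),
    "order_id_unique": lambda rs: len({r["order_id"] for r in rs}) == len(rs),
}

def validate(rs, checks):
    """Run every check; return the sorted names of the ones that failed."""
    return sorted(name for name, check in checks.items() if not check(rs))

failures = validate(rows, checks)
assert failures == ["amount_non_negative", "order_id_unique"]
```

In production frameworks the checks live in YAML or Python config, run as a pipeline step, and a non-empty failure list halts the load or pages the on-call.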
Certifications
Official dbt Labs certification covering dbt Core, cloud deployments, Jinja, and data modeling best practices.
Entry-level Databricks certification for Delta Lake, Spark SQL, and pipeline development on Databricks.
AWS certification covering Glue, Redshift, Kinesis, S3, and data pipeline architecture patterns.
GCP certification for designing and building data processing systems with BigQuery, Dataflow, and Pub/Sub.
Community & Learning
Free online community and courses for data engineers, MLOps, and data scientists. Strong peer-to-peer learning.
Joe Reis and Matt Housley's book on the full data engineering lifecycle. Highly recommended reading.
Practical tutorials on modern data stack tools — dbt, Snowflake, Airbyte, and pipeline architecture.