Build the data backbone
ETL/ELT pipelines, orchestration, cloud data warehouses, streaming, data quality, and the modern data stack from ingestion to serving.
Transformation & ELT
The industry standard for SQL-based data transformation. Define models, tests, and documentation as code.
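As a taste of what "tests and documentation as code" means in dbt, here is a minimal `schema.yml` sketch (the model and column names are invented for illustration):

```yaml
# models/schema.yml — hypothetical staging model
version: 2
models:
  - name: stg_orders
    description: "One row per order, cleaned from the raw source."
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
```

Running `dbt test` turns each declared test into a SQL query that fails the build if any rows violate the expectation.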
Official free courses from dbt Labs covering fundamentals, advanced techniques, and analytics engineering.
Embedded OLAP database that runs in-process. Blazing fast for local analytics and ELT prototyping.
Next-generation data transformation framework with semantic understanding and virtual environments for safe deployments.
Modern data pipeline tool with a visual UI, notebook-style blocks, and built-in orchestration and monitoring.
Open-source ELT platform with 350+ connectors. Self-host or use Airbyte Cloud (free tier available).
Orchestration
The industry standard DAG-based workflow orchestrator. Used at Airbnb, LinkedIn, and thousands of data teams.
Modern Python-native orchestrator with built-in observability. Easier to get started with than Airflow.
Asset-oriented orchestration with software-defined assets. Strong typing and lineage graph built in.
Run dbt Core projects as Airflow DAGs with task-level visibility and dependency management.
Durable workflow execution engine with automatic retries, versioning, and fault tolerance for long-running jobs.
Open-source declarative orchestration platform with YAML-based workflows and a built-in code editor.
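All of these orchestrators share one core abstraction: tasks arranged in a directed acyclic graph and executed in dependency order. A stdlib-only sketch of that idea (the task names are made up; real tools layer scheduling, retries, and observability on top):

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dag = {
    "extract": set(),
    "load": {"extract"},
    "transform": {"load"},
    "test": {"transform"},
    "publish": {"test"},
}

def run_task(name: str) -> None:
    print(f"running {name}")

# static_order() yields each task only after all its dependencies.
order = list(TopologicalSorter(dag).static_order())
for task in order:
    run_task(task)

# Because this DAG is a single chain, the order is exactly:
# extract, load, transform, test, publish
```

`graphlib` ships with Python 3.9+; it also raises `CycleError` on cyclic dependencies, which is the same validation every orchestrator performs at DAG-parse time.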
Cloud Warehouses & Lakehouses
$400 free credits on sign-up. Separates compute and storage. Industry-leading cloud data warehouse.
Free Databricks account with Spark clusters, Delta Lake, and Unity Catalog for learning and prototyping.
1TB free queries/month, 10GB free storage. Serverless SQL over petabytes. Best for GCP-native stacks.
Open table format for huge analytic datasets. Enables time-travel, schema evolution, and hidden partitioning.
Open-source ACID transactions for data lakes built on Parquet. Powers the Databricks Lakehouse architecture.
Serverless ETL service from AWS with PySpark and Python shell jobs. 1M free DPU-seconds/month.
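Iceberg and Delta Lake both get time travel the same way: every commit writes a new immutable snapshot, and a historical read simply pins an older snapshot version. A toy stdlib-only illustration of that mechanism (not either project's actual API):

```python
# Toy append-only snapshot log, illustrating how open table formats
# implement time travel: commits never mutate history, reads pin a version.
class ToyTable:
    def __init__(self):
        self.snapshots = []  # each snapshot is a full view of the table

    def commit(self, rows):
        # A commit appends a new snapshot built on top of the previous one.
        previous = self.snapshots[-1] if self.snapshots else []
        self.snapshots.append(previous + list(rows))

    def read(self, version=None):
        # version=None reads the latest snapshot; an int time-travels.
        if version is None:
            version = len(self.snapshots) - 1
        return self.snapshots[version]

table = ToyTable()
table.commit([{"id": 1}])
table.commit([{"id": 2}])
assert table.read() == [{"id": 1}, {"id": 2}]   # latest state
assert table.read(version=0) == [{"id": 1}]     # time travel to commit 0
```

Real table formats store snapshot *metadata* (pointers to data files) rather than row copies, which is what makes schema evolution and rollback cheap.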
Streaming & Real-Time
Distributed event streaming platform handling trillions of events per day. De facto standard for real-time data.
Kafka-compatible streaming platform written in C++. 10x lower latency, no JVM. Self-hostable.
Stateful stream processing at scale. Low-latency event time processing with exactly-once guarantees.
$400 in free credits for managed Kafka, Schema Registry, and ksqlDB on Confluent Cloud.
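The model shared by Kafka, Redpanda, and Confluent Cloud is an append-only log that consumers read by offset, so each consumer tracks its own position independently and reads never destroy data. A stdlib-only toy version of that idea (not the real client API):

```python
class ToyLog:
    """Append-only log: producers append records, consumers track offsets."""

    def __init__(self):
        self.records = []

    def produce(self, value) -> int:
        self.records.append(value)
        return len(self.records) - 1  # offset assigned to the new record

    def consume(self, offset: int, max_records: int = 10):
        # Reads never remove data; the consumer just advances its offset.
        batch = self.records[offset:offset + max_records]
        return batch, offset + len(batch)

log = ToyLog()
for event in ("signup", "click", "purchase"):
    log.produce(event)

batch, next_offset = log.consume(offset=0, max_records=2)
assert batch == ["signup", "click"]
assert next_offset == 2
# A second consumer starting at offset 0 would see the same records,
# which is why one log can feed many independent downstream pipelines.
```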
Data Quality & Observability
Define data quality expectations as code. Validate pipelines and generate documentation automatically.
Built-in data testing framework in dbt. Test uniqueness, referential integrity, and custom SQL assertions.
Open-source data quality framework with YAML-defined checks. Integrates with Airflow and dbt.
Data observability platform using ML to detect anomalies in pipelines, tables, and dashboards.
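These tools share a common pattern: declare checks as data, run them against a batch, and report failures rather than silently loading bad rows. A minimal stdlib-only sketch of that pattern (the check names and sample rows are invented):

```python
# Minimal "expectations as code": each check is a name plus a predicate
# over the batch; validation returns every failing check by name.
rows = [
    {"order_id": 1, "amount": 30.0},
    {"order_id": 2, "amount": -5.0},   # violates amount_non_negative
    {"order_id": 2, "amount": 12.5},   # violates order_id_unique
]

checks = {
    "amount_non_negative": lambda rs: all(r["amount"] >= 0 for r in rs),
    "order_id_unique": lambda rs: len({r["order_id"] for r in rs}) == len(rs),
}

def validate(rs, checks):
    """Run every check; return the sorted names of the ones that failed."""
    return sorted(name for name, check in checks.items() if not check(rs))

failures = validate(rows, checks)
assert failures == ["amount_non_negative", "order_id_unique"]
```

In production frameworks the checks live in YAML or Python config, run as a pipeline step, and a non-empty failure list halts the load or pages the on-call.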
Certifications
Official dbt Labs certification covering dbt Core, cloud deployments, Jinja, and data modeling best practices.
Entry-level Databricks certification for Delta Lake, Spark SQL, and pipeline development on Databricks.
AWS certification covering Glue, Redshift, Kinesis, S3, and data pipeline architecture patterns.
GCP certification for designing and building data processing systems with BigQuery, Dataflow, and Pub/Sub.
Community & Learning
Free online community and courses for data engineers, MLOps, and data scientists. Strong peer-to-peer learning.
Joe Reis and Matt Housley's book on the full data engineering lifecycle. Highly recommended reading.
Practical tutorials on modern data stack tools — dbt, Snowflake, Airbyte, and pipeline architecture.