Backblaze
Introduction
Thanks to Backblaze for providing this amazing dataset.
More details can be found on their designated page for this dataset: https://www.backblaze.com/cloud-storage/resources/hard-drive-test-data
Overview
This is a simple dashboard to showcase the data pipeline: from CSV files processed by an Airflow DAG, to a data-raw bucket inside SeaweedFS, then ingested into a CNPG database, transformed with the Airflow Cosmos dbt pipeline, queried and cached by Evidence, and finally visualized in your browser right here.
More on the setup: lab
Dashboard
Time range covered by the fetched part of the dataset
2025-06-30 - 2025-09-29
Backblaze datacenters on a map
Current capacity (PiB)
4,484
▲ 2.41 % from last month
Current drives count
330,237
▲ 5,353 from last month
Total rows in the dataset (millions)
29.8
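For readers curious how tiles like the capacity and drive count figures could be derived from the raw CSV files, here is a minimal sketch in plain Python. It is not the dashboard's actual code, which relies on dbt models and Evidence queries. It assumes the daily snapshot columns `date`, `serial_number`, `model`, `capacity_bytes`, and `failure` from the Backblaze files. The sample rows, the `summarize` and `pct_change` helpers, and the choice to skip non-positive capacities are all hypothetical and for illustration only.

```python
import csv
import io

PIB = 2 ** 50  # bytes in one pebibyte

# Hypothetical sample rows; the real files carry many more columns
# (SMART attributes) and one row per drive per day.
SAMPLE_CSV = """date,serial_number,model,capacity_bytes,failure
2025-08-31,A1,MODEL-X,16000900608000,0
2025-08-31,A2,MODEL-X,16000900608000,0
2025-09-29,A1,MODEL-X,16000900608000,0
2025-09-29,A2,MODEL-X,16000900608000,0
2025-09-29,A3,MODEL-Y,22000969973760,0
"""


def summarize(csv_text, day):
    """Return (drive_count, capacity_pib) for a single snapshot date."""
    serials = set()
    total_bytes = 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["date"] != day:
            continue
        capacity = int(row["capacity_bytes"])
        # Assumption: non-positive capacities are bad readings, and each
        # serial number is counted at most once per day.
        if capacity <= 0 or row["serial_number"] in serials:
            continue
        serials.add(row["serial_number"])
        total_bytes += capacity
    return len(serials), total_bytes / PIB


def pct_change(current, previous):
    """Relative change from previous to current, in percent."""
    return (current - previous) / previous * 100.0


prev_count, prev_pib = summarize(SAMPLE_CSV, "2025-08-31")
curr_count, curr_pib = summarize(SAMPLE_CSV, "2025-09-29")

print("drives:", curr_count, "delta:", curr_count - prev_count)
print("capacity (PiB):", curr_pib)
print("capacity change (%):", pct_change(curr_pib, prev_pib))
```

In the real pipeline the same aggregation would more naturally live in a SQL model over the ingested table; the Python version only makes the arithmetic behind the tiles explicit.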
