Backblaze

Introduction

Thanks to Backblaze for providing this amazing dataset.

More details can be found on their dedicated page for this dataset: https://www.backblaze.com/cloud-storage/resources/hard-drive-test-data

Overview

This simple dashboard showcases the data pipeline end to end: an Airflow DAG processes the CSV files and stores them in a data-raw bucket inside SeaweedFS; the data is then ingested into a CloudNativePG (CNPG) database, transformed by a dbt pipeline run through Airflow Cosmos, queried and cached by Evidence, and finally visualized right here in your browser.
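The ingest and transform stages can be illustrated with a minimal, stdlib-only sketch. It is not the lab's actual code. The in-memory `sqlite3` database is a stand-in for the CNPG database, and the plain SQL aggregation is a stand-in for a dbt model. The table names `raw_drive_stats` and `daily_summary`, the function names, and the sample rows are hypothetical. The column names `date`, `serial_number`, `model`, `capacity_bytes` and `failure` follow the Backblaze CSV layout, which also carries many SMART columns omitted here.

```python
import csv
import io
import sqlite3

# Hypothetical sample in the shape of a Backblaze daily CSV.
# Real files contain many more columns (SMART attributes and so on).
SAMPLE_CSV = """date,serial_number,model,capacity_bytes,failure
2025-09-28,SN001,MODEL-A,16000900608000,0
2025-09-28,SN002,MODEL-B,12000138625024,0
2025-09-29,SN001,MODEL-A,16000900608000,0
2025-09-29,SN002,MODEL-B,12000138625024,1
"""


def load_raw(conn, csv_text):
    """Ingest step: copy the CSV rows into a raw table, unchanged."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_drive_stats ("
        "date TEXT, serial_number TEXT, model TEXT, "
        "capacity_bytes INTEGER, failure INTEGER)"
    )
    rows = [
        (
            r["date"],
            r["serial_number"],
            r["model"],
            int(r["capacity_bytes"]),
            int(r["failure"]),
        )
        for r in csv.DictReader(io.StringIO(csv_text))
    ]
    conn.executemany("INSERT INTO raw_drive_stats VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


def build_daily_summary(conn):
    """Transform step (stand-in for a dbt model): one row per day."""
    conn.execute("DROP TABLE IF EXISTS daily_summary")
    conn.execute(
        "CREATE TABLE daily_summary AS "
        "SELECT date, COUNT(*) AS drives, "
        "SUM(capacity_bytes) AS capacity_bytes, SUM(failure) AS failures "
        "FROM raw_drive_stats GROUP BY date"
    )


def query_summary(conn):
    """Query step: what a dashboard layer would read."""
    return conn.execute(
        "SELECT date, drives, capacity_bytes, failures "
        "FROM daily_summary ORDER BY date"
    ).fetchall()


conn = sqlite3.connect(":memory:")
loaded = load_raw(conn, SAMPLE_CSV)
build_daily_summary(conn)
summary = query_summary(conn)
for row in summary:
    print(row)
```

Keeping the raw table untouched and building the summary as a separate table mirrors the split between ingestion and transformation described above: the summary can be dropped and rebuilt at any time without fetching the CSV files again.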

More on the setup: lab

Dashboard

Time range covered by the fetched part of the dataset

2025-06-30 - 2025-09-29

Backblaze datacenters on a map

Current capacity (PiB)

4,484

2.41% from last month

Current drives count

330,237

5,353 from last month

Total rows in the dataset (millions)

29.8
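The arithmetic behind cards like these can be sketched as follows. This is an illustration under stated assumptions, not the dashboard's actual queries. The helper names `to_pib` and `change_from_previous` are hypothetical, and the sketch assumes that "from last month" means the current value minus the previous month's value, and that the drive-count change shown above is an increase.

```python
PIB = 2 ** 50  # bytes in one pebibyte (PiB)


def to_pib(capacity_bytes):
    """Convert a byte count to PiB."""
    return capacity_bytes / PIB


def change_from_previous(current, previous):
    """Return (absolute change, percent change) against the previous value.

    The percent change is None when the previous value is zero.
    """
    delta = current - previous
    percent = delta / previous * 100 if previous else None
    return delta, percent


# Drive-count card: 330,237 drives now, 5,353 more than last month
# (assumption: the change is positive), so last month had 324,884.
current_drives = 330_237
previous_drives = current_drives - 5_353

delta, percent = change_from_previous(current_drives, previous_drives)
print(f"{delta:,} drives ({percent:.2f}%) from last month")
```

Reporting the absolute and the relative change together lets a card show whichever reads better: a percentage for capacity, a plain count for drives.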