Backblaze

Introduction

Thanks to Backblaze for providing this amazing dataset.

More details are available on their dedicated page for this dataset: https://www.backblaze.com/cloud-storage/resources/hard-drive-test-data

Overview

This is a simple dashboard that showcases the data pipeline end to end: an Airflow DAG processes the CSV files and writes them to a data-raw bucket in SeaweedFS; the data is then ingested into a CNPG (CloudNativePG) database, transformed by the Airflow Cosmos dbt pipeline, queried and cached by Evidence, and finally visualized right here in your browser.
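To give a feel for what the transformation stage boils down to, here is a minimal sketch of how a single daily snapshot could be reduced to the drive count and capacity figures shown on this page. It is not the pipeline's actual code: the real transformations run as dbt models, while this uses only the Python standard library. The sample rows are made up, the function name `summarize_snapshot` is hypothetical, and the column names `serial_number` and `capacity_bytes` are assumed to match the published CSV layout.

```python
import csv
import io

PIB = 2 ** 50  # number of bytes in one pebibyte

# Made-up sample rows shaped like one daily snapshot (illustration only).
SAMPLE_CSV = """date,serial_number,model,capacity_bytes,failure
2025-01-01,SN0001,MODEL-A,16000900608000,0
2025-01-01,SN0002,MODEL-A,16000900608000,0
2025-01-01,SN0003,MODEL-B,12000138625024,1
"""


def summarize_snapshot(csv_text):
    """Return (drive_count, capacity_pib) for one daily snapshot.

    Each serial number is counted once, so a duplicated row does not
    inflate either figure.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    seen_serials = set()
    total_bytes = 0
    for row in reader:
        serial = row["serial_number"]
        if serial in seen_serials:
            continue  # skip duplicate rows for the same drive
        seen_serials.add(serial)
        total_bytes += int(row["capacity_bytes"])
    return len(seen_serials), total_bytes / PIB


drive_count, capacity_pib = summarize_snapshot(SAMPLE_CSV)
print(f"drives: {drive_count}, capacity: {capacity_pib:.6f} PiB")
```

Dividing by 2**50 rather than 10**15 is what makes the unit PiB instead of PB; the two differ by roughly 12.6 %.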

More on the setup: lab

Dashboard

Time range covered by the fetched part of the dataset

2025-01-01 to 2025-12-31

Backblaze datacenters on a map

Current capacity (PiB)

144,388

4.12% from last month

Current drive count

340,501

2,016 from last month

Total rows in the dataset (millions)

117
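The "from last month" figures above can be read as month-over-month deltas. A minimal sketch of that arithmetic follows, assuming the percentage is the change relative to the previous month's value; the dashboard's actual query may define it differently. The function name `month_over_month` and the input values are hypothetical.

```python
def month_over_month(current, previous):
    """Return (absolute_delta, percent_delta) between two monthly values.

    percent_delta is relative to the previous month, so it is undefined
    when the previous value is zero.
    """
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    delta = current - previous
    return delta, delta / previous * 100.0


# Hypothetical values, not taken from the dataset.
delta, pct = month_over_month(1040.0, 1000.0)
print(f"{delta:+.0f} ({pct:+.2f}%) from last month")
```

The absolute delta suits counts such as the number of drives, while the relative delta is easier to read for large quantities such as capacity.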