Hugging Face Storage Buckets: Your New Mutable Object Store
Hugging Face launched Storage Buckets, giving you S3-like object storage on the Hub. Manage data with Python or the hf CLI, streamlining your ML workflows.
Editorial Note
Review and analysis by ScoRpii Tech Editorial Team.
Core Functionality
You can now use Storage Buckets on the Hugging Face Hub, which offers mutable object storage comparable to AWS S3, GCP Cloud Storage, or IBM Cloud Object Storage. You can work with buckets in three ways: through the Hugging Face Hub interface, programmatically using Python, or through the hf CLI. Because the storage is mutable, you can update and modify objects in place.
Programmatic access is provided by the Python huggingface_hub library (version 1.5.0) and the JavaScript @huggingface/hub library (version 2.10.5). Both let you script and manage your bucket content. In addition, integration with fsspec enables interoperability with data processing frameworks such as pandas, Polars, and Dask.
Architectural Shifts
The introduction of mutable storage on the Hugging Face Hub marks a significant architectural shift for data management in MLOps. You can now directly update and modify objects, streamlining workflows where data continually evolves. Previously, you might have relied on versioned datasets, which could complicate iterative modifications or the handling of intermediate artifacts.
This shift positions the Hugging Face platform more broadly in the cloud storage ecosystem, potentially reducing your reliance on external providers like AWS, GCP, or IBM for certain types of ephemeral or frequently updated ML data. For organizations utilizing models from various sources, managing associated training data or inference outputs directly within the Hugging Face environment can simplify access and deployment pipelines.
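To make the difference between mutable and versioned storage concrete, here is a toy in-memory model. It is only an illustration of the two behaviors, not how the Hub implements either one; the class names MutableBucket and VersionedRepo and the object key are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MutableBucket:
    """Toy mutable store: writing a key overwrites the object in place."""

    objects: dict[str, bytes] = field(default_factory=dict)

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data  # the previous content is gone

    def get(self, key: str) -> bytes:
        return self.objects[key]


@dataclass
class VersionedRepo:
    """Toy versioned store: every write appends a new snapshot."""

    commits: list[dict[str, bytes]] = field(default_factory=list)

    def put(self, key: str, data: bytes) -> None:
        # Copy the latest snapshot, apply the change, keep the old one.
        snapshot = dict(self.commits[-1]) if self.commits else {}
        snapshot[key] = data
        self.commits.append(snapshot)

    def get(self, key: str, revision: int = -1) -> bytes:
        return self.commits[revision][key]


# Overwrite the same (hypothetical) checkpoint key three times in each store.
bucket, repo = MutableBucket(), VersionedRepo()
for step in range(3):
    payload = f"checkpoint-{step}".encode()
    bucket.put("ckpt/latest.bin", payload)
    repo.put("ckpt/latest.bin", payload)

print(len(bucket.objects), "object in the bucket")
print(len(repo.commits), "snapshots in the versioned store")
```

In this model the bucket only ever holds the latest checkpoint, while the versioned store accumulates a snapshot per write. That is the trade-off the section describes: in-place updates suit intermediate artifacts, and history suits data you need to reproduce later.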
Key Features and Benefits
Some key features and benefits of Storage Buckets include:
- Mutable object storage for direct updates and modifications
- Programmatic access via Python and the hf CLI
- Integration with fsspec for seamless interoperability with popular data processing frameworks
- Simplified access and deployment pipelines for ML data
What This Means For Your Operations
With Storage Buckets, you gain a unified environment for both your models and their associated dynamic data. This can significantly reduce the overhead of synchronizing data across disparate storage systems. Your development teams can now browse bucket contents directly on the Hub, script data ingestion or output storage with Python using huggingface_hub v1.5.0, or orchestrate operations via the hf CLI.
If you are accustomed to using fsspec with pandas, Polars, or Dask for data manipulation, the integration means your existing data processing scripts will likely adapt with minimal changes. This capability offers a more integrated and potentially efficient approach to managing the mutable components of your ML lifecycle, from feature stores to inference logs.
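As a rough sketch of what "minimal changes" could look like, the helper below appends a row to a CSV inference log with a read-modify-write cycle, since object stores generally replace whole objects rather than append to them. Several things here are assumptions rather than confirmed API: the helper name append_row is hypothetical, the hf://buckets/... URL is a placeholder whose exact scheme should be checked against the huggingface_hub documentation, and the code assumes a missing object surfaces as FileNotFoundError. The only third-party call is fsspec.open, which is imported lazily so the same function also runs against a local path with Python's built-in open.

```python
import csv
from typing import Callable, Optional

# Placeholder URL: the bucket path scheme is an assumption, not confirmed API.
EXAMPLE_URL = "hf://buckets/my-org/my-bucket/logs/inference.csv"


def append_row(url: str, row: dict, opener: Optional[Callable] = None) -> int:
    """Read a CSV object, add one row, write it back; return the row count."""
    if opener is None:
        import fsspec  # third-party; picks a filesystem from the URL protocol

        opener = fsspec.open

    rows = []
    try:
        with opener(url, "r") as f:
            rows = list(csv.DictReader(f))
    except FileNotFoundError:
        pass  # first write: the object does not exist yet

    rows.append({key: str(value) for key, value in row.items()})

    # Ordered union of column names, so new columns do not break the writer.
    fieldnames = list(dict.fromkeys(key for r in rows for key in r))

    # Overwrite the whole object with the updated contents.
    with opener(url, "w") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Calling append_row("inference.csv", {"id": 1, "label": "cat"}, opener=open) exercises the same logic on a local file. Switching to a bucket would then mostly be a matter of passing a bucket URL and letting fsspec resolve it, assuming the hf protocol handles bucket paths the way the announcement describes. Note that a read-modify-write cycle like this is not safe under concurrent writers.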
The Bottom Line for Developers
Storage Buckets give the Hugging Face Hub a mutable storage option in addition to versioned datasets. If your workflows produce frequently changing data, such as intermediate artifacts or inference logs, you can keep that data next to your models and manage it from the Hub interface, Python, or the hf CLI. That can reduce the overhead of synchronizing with a separate storage system and simplify your access and deployment pipelines.
Originally reported by
Hugging Face Blog