High-performance backups: How Ceph snapshots enable incremental full backups
The backup challenge at scale
When you’re managing container storage at scale, backup performance becomes critical. Large volumes—those reaching hundreds of gigabytes—can take hours to back up using traditional full-volume streaming methods. This creates operational challenges: extended backup windows, increased resource consumption, and potential data loss exposure during long-running operations.
At Upsun, we face this challenge daily with our Ceph-based storage infrastructure. While Ceph’s copy-on-write (CoW) snapshots provide instant cloning capabilities for our containers, off-site backups require a different approach. We need a solution that combines the speed of incremental backups with the reliability of full restores.
Why Ceph RBD over traditional file-based approaches
Our storage architecture leverages Ceph’s RADOS Block Device (RBD) feature rather than CephFS for container storage. This choice provides several advantages:
- Simplified data management: Working with block devices means handling “bags of bytes” rather than complex file systems
- Better performance: Block-level operations eliminate file system overhead
- Seamless failover: Containers can migrate across our VM grid without complex file system considerations
- Snapshot efficiency: RBD snapshots are instant and space-efficient
While a file-based approach like `rsync` might seem intuitive—comparing file lists and transferring only changed files—it doesn’t align with our block-level storage philosophy.
The Ceph RBD export-diff solution
Ceph provides an elegant solution through `rbd export-diff`, which extracts only the changes between two snapshots at the block level. This feature becomes the foundation for our incremental backup strategy.
Here’s how the basic process works:
- Create a new RBD snapshot
- Use `rbd export-diff` to identify changed blocks between snapshots
- Export only the differential data
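As a minimal sketch, the snapshot and diff steps can be driven from a small script; the pool, image, and snapshot names below are hypothetical:

```python
import subprocess

POOL = "containers"    # hypothetical pool name
IMAGE = "volume-1234"  # hypothetical RBD image name

def create_snapshot(snap: str) -> None:
    """Create a new RBD snapshot of the image."""
    subprocess.run(["rbd", "snap", "create", f"{POOL}/{IMAGE}@{snap}"], check=True)

def export_diff(prev_snap: str, new_snap: str, out_path: str) -> None:
    """Export only the blocks that changed between two snapshots."""
    subprocess.run(
        ["rbd", "export-diff", "--from-snap", prev_snap,
         f"{POOL}/{IMAGE}@{new_snap}", out_path],
        check=True,
    )

# One backup cycle: snapshot, then export the delta since the previous snapshot.
create_snapshot("backup-2024-01-02")
export_diff("backup-2024-01-01", "backup-2024-01-02", "/tmp/volume-1234.diff")
```

Omitting `--from-snap` makes the same command export the image’s full contents, which is how the very first backup of a volume can be seeded.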
However, implementing production-ready backups requires additional considerations beyond the basic diff export.
Building full restore capability from incremental data
To maintain the ability to perform complete volume restores from blob storage, we developed a chunked metadata system:
Chunk-based storage architecture
- 4MB chunks: Each volume is divided into 4MB blocks for optimal transfer and deduplication
- Hash-based deduplication: Chunk keys are generated from content hashes, eliminating duplicate data across the entire system
- Project-level isolation: Each project maintains its own chunk catalog to prevent cross-customer data leakage
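To make the idea concrete, here is a rough sketch of content-addressed chunking; the key layout and project prefix are illustrative assumptions, not our exact scheme:

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4MB chunks

def chunk_key(project_id: str, data: bytes) -> str:
    """Derive a blob-storage key from the chunk's content hash.

    Identical chunks map to the same key and are stored only once; the
    project prefix (an assumption here) keeps catalogs isolated per project.
    """
    return f"{project_id}/chunks/{hashlib.sha256(data).hexdigest()}"

def split_into_chunks(stream, chunk_size: int = CHUNK_SIZE):
    """Yield (offset, data) pairs of fixed-size chunks from a readable stream."""
    offset = 0
    while True:
        data = stream.read(chunk_size)
        if not data:
            break
        yield offset, data
        offset += len(data)
```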
Metadata file structure
Each backup generates a metadata file containing:
- Complete list of chunks required for full volume restoration
- Chunk offsets and positions within the volume
This approach ensures that every backup point can restore a complete volume, even though we’re only transferring changed data.
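As an illustration, the metadata for one backup point could be shaped like this; the field names are assumptions made for the sketch:

```python
# Illustrative manifest for a single backup point (field names are assumptions).
metadata = {
    "volume": "volume-1234",
    "snapshot": "backup-2024-01-02",
    "chunk_size": 4 * 1024 * 1024,
    "chunks": [
        # One entry per 4MB region of the volume, in offset order. Most entries
        # reference chunks uploaded by earlier backups; only regions touched by
        # the latest diff point at newly uploaded chunks.
        {"offset": 0,               "key": "project-abc/chunks/3f5a..."},
        {"offset": 4 * 1024 * 1024, "key": "project-abc/chunks/91c0..."},
        # ...
    ],
}
```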
The backup workflow in practice
Here’s the complete backup process:
- Snapshot creation: Generate a new RBD snapshot of the volume
- Differential analysis: Run `rbd export-diff` between the current and previous snapshots
- Chunk processing: Break the differential data into 4MB chunks and generate hashes
- Selective upload: Upload only chunks that don’t already exist in blob storage
- Metadata generation: Create a new metadata file referencing all chunks (new and existing) required for full restoration
This workflow ensures that backup time scales with the amount of changed data rather than total volume size.
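Sketched in code, the orchestration might look like the function below. It builds on `CHUNK_SIZE` and `chunk_key` from the earlier sketch, and `changed_extents`, `read_range`, and `blob_store` are hypothetical stand-ins for the diff parser, the snapshot reader, and the object-storage client:

```python
def run_backup(prev_metadata, changed_extents, read_range, blob_store,
               project_id, snap_name):
    """One incremental backup that still produces a full-restore manifest.

    changed_extents: (offset, length) pairs taken from rbd export-diff
    read_range(offset, length): returns bytes from the new snapshot
    blob_store: hypothetical client with exists(key) and upload(key, data)
    """
    # Start from the previous manifest: unchanged regions keep their old keys.
    chunks = {c["offset"]: c["key"] for c in prev_metadata["chunks"]}

    # Work out which 4MB regions the diff touches.
    dirty = set()
    for offset, length in changed_extents:
        start = (offset // CHUNK_SIZE) * CHUNK_SIZE
        for region in range(start, offset + length, CHUNK_SIZE):
            dirty.add(region)

    # Re-chunk, hash, and upload only regions that changed and are not
    # already present in blob storage.
    for region in sorted(dirty):
        data = read_range(region, CHUNK_SIZE)
        key = chunk_key(project_id, data)
        if not blob_store.exists(key):
            blob_store.upload(key, data)
        chunks[region] = key

    # The new manifest references every chunk needed for a full restore.
    return {
        "volume": prev_metadata["volume"],
        "snapshot": snap_name,
        "chunk_size": CHUNK_SIZE,
        "chunks": [{"offset": o, "key": k} for o, k in sorted(chunks.items())],
    }
```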
Optimized restore performance
The hash-based chunk system also accelerates restoration:
- Local verification: Before downloading a chunk from blob storage, check whether it already exists locally by comparing hashes
- Selective download: Only retrieve chunks that have changed or are missing locally
- Parallel processing: Multiple chunks can be processed simultaneously
While restore operations still require reading the full volume to verify local chunks, the selective download significantly reduces network transfer time.
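The restore-side check can be sketched in the same style; `blob_store.download` is again a hypothetical object-storage call:

```python
def restore_chunk(volume_file, entry, project_id, blob_store):
    """Write one chunk into the volume, downloading only if the local data differs."""
    offset, key = entry["offset"], entry["key"]

    # Hash what is already on disk at this offset; if it matches the
    # catalog key, the chunk is current and no download is needed.
    volume_file.seek(offset)
    if chunk_key(project_id, volume_file.read(CHUNK_SIZE)) == key:
        return

    data = blob_store.download(key)
    volume_file.seek(offset)
    volume_file.write(data)
```

Because each chunk is independent, these calls can run concurrently (for example from a thread pool), which lets hashing and downloads overlap.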
Performance benefits and trade-offs
This architecture delivers substantial improvements:
- Backup speed: Scales with data change rate rather than volume size
- Storage efficiency: Deduplication reduces storage across all backups
- Network optimization: Minimal data transfer for routine backups
- Restore flexibility: Any backup point can restore a complete volume
The primary trade-off is complexity—managing chunk metadata and ensuring referential integrity requires more sophisticated backup orchestration than simple volume dumps.
Ceph can enable enterprise-grade backup performance without sacrificing restore capabilities. By leveraging block-level snapshots and implementing intelligent chunking strategies, you can achieve backup speeds that scale with your actual data change patterns.
Ready to experience high-performance, scalable container storage? Start your free trial and see how Upsun’s infrastructure handles your most demanding applications.