Creating Archives

How to build blob archives from directories for storage in OCI registries.

For file-based archives, CreateBlob is the simplest approach. It creates the archive files and returns an open BlobFile ready for use:

import (
	"context"

	"github.com/meigma/blob"
)

func createArchive(srcDir, destDir string) (*blob.BlobFile, error) {
	return blob.CreateBlob(context.Background(), srcDir, destDir,
		blob.CreateBlobWithCompression(blob.CompressionZstd),
	)
}

This creates index.blob and data.blob in destDir and returns an open archive. Remember to close it when done:

blobFile, err := blob.CreateBlob(ctx, srcDir, destDir)
if err != nil {
	return err
}
defer blobFile.Close()

// Use the archive immediately
content, err := blobFile.ReadFile("config.json")
if err != nil {
	return err
}

Custom Filenames

Override the default filenames with options:

blobFile, err := blob.CreateBlob(ctx, srcDir, destDir,
	blob.CreateBlobWithIndexName("my-archive.idx"),
	blob.CreateBlobWithDataName("my-archive.dat"),
)

Saving an Existing Blob

To save an in-memory or remote Blob to local files:

// archive is a *blob.Blob from any source
err := archive.Save("/path/to/index.blob", "/path/to/data.blob")

Using Create (Advanced)

The lower-level Create function provides more control when you need to:

  • Write to non-file destinations (network streams, cloud storage)
  • Handle index and data separately
  • Integrate with custom I/O pipelines

Basic Usage

To create an archive, provide a source directory and writers for the index and data:

import (
	"context"
	"os"

	"github.com/meigma/blob"
)

func createArchive(srcDir string) error {
	indexFile, err := os.Create("archive.index")
	if err != nil {
		return err
	}
	defer indexFile.Close()

	dataFile, err := os.Create("archive.data")
	if err != nil {
		return err
	}
	defer dataFile.Close()

	return blob.Create(context.Background(), srcDir, indexFile, dataFile)
}

Create walks the source directory recursively. It writes file contents to the data writer and metadata to the index writer. Files are written in path-sorted order, so all files under a given directory are stored contiguously in the data file, which makes fetching a whole directory efficient.

Compression

To enable zstd compression, use CreateWithCompression:

err := blob.Create(ctx, srcDir, indexW, dataW,
	blob.CreateWithCompression(blob.CompressionZstd),
)

Compression reduces data size but requires decompression when reading. For typical source code and configuration files, expect 2-4x compression ratios.

Available compression options:

  • blob.CompressionNone - Store files uncompressed (default)
  • blob.CompressionZstd - Use zstd compression

Skipping Compression

Some files compress poorly because they are already compressed (images, videos, archives) or too small to benefit. Use CreateWithSkipCompression to skip these:

err := blob.Create(ctx, srcDir, indexW, dataW,
	blob.CreateWithCompression(blob.CompressionZstd),
	blob.CreateWithSkipCompression(blob.DefaultSkipCompression(1024)),
)

DefaultSkipCompression(minSize) creates a predicate that skips:

  • Files smaller than minSize bytes
  • Files with known compressed extensions (.jpg, .png, .zip, .gz, etc.)
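The two rules above can be sketched as a single predicate. This is an illustrative reimplementation and is not the library's code. The signature mirrors the custom-predicate example in this guide. The extension set is a small assumed sample rather than the library's actual list, and the case-insensitive match is also an assumption. skipSmallOrCompressed and fakeInfo are hypothetical names.

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
	"strings"
	"time"
)

// compressedExts is an assumed sample of already-compressed formats;
// the library's real list may differ.
var compressedExts = map[string]bool{
	".jpg": true, ".jpeg": true, ".png": true, ".gif": true,
	".zip": true, ".gz": true, ".zst": true, ".mp4": true,
}

// skipSmallOrCompressed mimics the behavior described for
// DefaultSkipCompression(minSize): skip files smaller than minSize bytes and
// files with a known compressed extension (matched case-insensitively here).
func skipSmallOrCompressed(minSize int64) func(path string, info fs.FileInfo) bool {
	return func(path string, info fs.FileInfo) bool {
		if info.Size() < minSize {
			return true
		}
		return compressedExts[strings.ToLower(filepath.Ext(path))]
	}
}

// fakeInfo is a minimal fs.FileInfo used only for this demo.
type fakeInfo struct{ size int64 }

func (f fakeInfo) Name() string       { return "" }
func (f fakeInfo) Size() int64        { return f.size }
func (f fakeInfo) Mode() fs.FileMode  { return 0o644 }
func (f fakeInfo) ModTime() time.Time { return time.Time{} }
func (f fakeInfo) IsDir() bool        { return false }
func (f fakeInfo) Sys() any           { return nil }

func main() {
	skip := skipSmallOrCompressed(1024)
	fmt.Println(skip("notes.txt", fakeInfo{size: 100}))   // true: below 1024 bytes
	fmt.Println(skip("photo.JPG", fakeInfo{size: 50000})) // true: compressed format
	fmt.Println(skip("main.go", fakeInfo{size: 50000}))   // false: worth compressing
}
```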

Custom Skip Predicates

To define custom skip logic, pass additional predicates:

// Skip lock files and generated code.
// (Requires the "io/fs" and "strings" imports.)
skipGenerated := func(path string, info fs.FileInfo) bool {
	return strings.HasSuffix(path, ".lock") ||
		strings.Contains(path, "/generated/")
}

err := blob.Create(ctx, srcDir, indexW, dataW,
	blob.CreateWithCompression(blob.CompressionZstd),
	blob.CreateWithSkipCompression(
		blob.DefaultSkipCompression(1024),
		skipGenerated,
	),
)

If any predicate returns true, the file is stored uncompressed.

Change Detection

For build pipelines, enable strict change detection to catch files that change during archive creation:

err := blob.Create(ctx, srcDir, indexW, dataW,
	blob.CreateWithChangeDetection(blob.ChangeDetectionStrict),
)

With strict change detection, Create verifies that each file's size and modification time are unchanged after reading. If a file changes while it is being archived, Create returns an error rather than producing an archive with inconsistent content.

Change detection modes:

  • blob.ChangeDetectionNone - No verification (default, fewer syscalls)
  • blob.ChangeDetectionStrict - Verify files did not change during creation

File Limits

To protect against runaway archive creation, limit the number of files:

// Allow up to 50,000 files
err := blob.Create(ctx, srcDir, indexW, dataW,
blob.CreateWithMaxFiles(50000),
)

If the source directory contains more files than the limit, Create returns blob.ErrTooManyFiles.

Special values:

  • 0 - Use default limit (200,000 files)
  • Negative values - No limit

Memory Considerations

Create builds the entire index in memory before writing. Memory usage scales with the number of files and average path length.

Rough guide:

  • 10,000 files: ~3-5 MB
  • 100,000 files: ~30-50 MB
  • 200,000 files: ~60-100 MB

For archives approaching the default 200,000 file limit, ensure the build environment has sufficient memory (256 MB+ recommended).
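Every row of the guide above works out to roughly 300-500 bytes of index memory per file, for example 3-5 MB / 10,000 files. The back-of-the-envelope helper below is based on that ratio. The ratio is derived from the table and is not an independent measurement, and longer paths will push real usage higher. estimateIndexMB is a hypothetical helper.

```go
package main

import "fmt"

// Per-file index cost implied by the rough guide above (3-5 MB per 10,000 files).
// These are derived estimates, not measurements; long paths push usage higher.
const (
	lowBytesPerFile  = 300
	highBytesPerFile = 500
)

// estimateIndexMB returns a low/high estimate of index memory in MB
// (1 MB = 1,000,000 bytes) for the given number of files.
func estimateIndexMB(files int) (low, high float64) {
	return float64(files*lowBytesPerFile) / 1e6, float64(files*highBytesPerFile) / 1e6
}

func main() {
	// Reproduces the ~3-5, ~30-50, and ~60-100 MB rows of the guide.
	for _, n := range []int{10000, 100000, 200000} {
		lo, hi := estimateIndexMB(n)
		fmt.Printf("%7d files: ~%.0f-%.0f MB\n", n, lo, hi)
	}
}
```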

Cancellation

Pass a context to support cancellation of long-running archive creation:

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
defer cancel()

err := blob.Create(ctx, srcDir, indexW, dataW,
	blob.CreateWithCompression(blob.CompressionZstd),
)
if errors.Is(err, context.DeadlineExceeded) {
	// Archive creation timed out
}

Complete Examples

Using CreateBlob

A production archive creation function with CreateBlob:

import (
	"context"
	"time"

	"github.com/meigma/blob"
)

func createProductionArchive(srcDir, destDir string) (*blob.BlobFile, error) {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
	defer cancel()

	return blob.CreateBlob(ctx, srcDir, destDir,
		blob.CreateBlobWithCompression(blob.CompressionZstd),
		blob.CreateBlobWithSkipCompression(blob.DefaultSkipCompression(1024)),
		blob.CreateBlobWithChangeDetection(blob.ChangeDetectionStrict),
		blob.CreateBlobWithMaxFiles(100000),
	)
}

Using Create

A production archive creation function with the lower-level Create API:

import (
	"context"
	"fmt"
	"os"
	"time"

	"github.com/meigma/blob"
)

func createProductionArchive(srcDir, indexPath, dataPath string) (err error) {
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
	defer cancel()

	indexFile, err := os.Create(indexPath)
	if err != nil {
		return fmt.Errorf("create index file: %w", err)
	}
	// Report close errors: for written files they can mean data was not flushed.
	defer func() {
		if cerr := indexFile.Close(); cerr != nil && err == nil {
			err = fmt.Errorf("close index file: %w", cerr)
		}
	}()

	dataFile, err := os.Create(dataPath)
	if err != nil {
		return fmt.Errorf("create data file: %w", err)
	}
	defer func() {
		if cerr := dataFile.Close(); cerr != nil && err == nil {
			err = fmt.Errorf("close data file: %w", cerr)
		}
	}()

	err = blob.Create(ctx, srcDir, indexFile, dataFile,
		blob.CreateWithCompression(blob.CompressionZstd),
		blob.CreateWithSkipCompression(blob.DefaultSkipCompression(1024)),
		blob.CreateWithChangeDetection(blob.ChangeDetectionStrict),
		blob.CreateWithMaxFiles(100000),
	)
	if err != nil {
		return fmt.Errorf("create archive: %w", err)
	}

	return nil
}

See Also