Skip to content

Delta Resolution in nanogit

Overview

This document explains how nanogit handles Git delta objects, the limitations of the Git protocol regarding delta transmission, and the design decisions made to support deltas in a stateless architecture.

What are Git Delta Objects?

Git delta objects are compressed representations of Git objects (blobs, trees, commits) that store only the differences from a base object rather than the complete content. This significantly reduces network bandwidth and storage requirements.

Delta Types

Git supports two types of delta objects:

  1. OFS_DELTA (offset delta): References the base object by its offset in the packfile
  2. REF_DELTA (reference delta): References the base object by its SHA-1 hash

nanogit currently handles REF_DELTA objects (type 7 in packfile format).

Delta Format

A delta object contains:

  • Reference to the base object (SHA-1 hash for REF_DELTA)
  • Expected source length (size of base object)
  • Target length (size of resulting object after applying delta)
  • Series of delta instructions:
    • Copy instructions: Copy bytes from base object at specific offset/length
    • Insert instructions: Insert new literal bytes

Why Git Servers Send Deltas

Git servers automatically deltify objects to optimize network transfer:

  1. Bandwidth efficiency: Deltas can reduce transfer size by 50-90% for modified files
  2. Performance: Less data to transfer = faster fetch/clone operations
  3. Standard behavior: All major Git servers (GitHub, GitLab, Bitbucket, Gitea) use deltas by default
  4. Repository structure: Git's internal storage uses deltas for efficient pack files

When Servers Send Deltas

Servers typically send deltas in these scenarios:

  • Modified files across commits (same file, small changes)
  • Similar files in the same repository (templates, duplicated code)
  • After running git repack with aggressive deltification
  • For any object where Git detects similarity with another object

Protocol Limitations: Cannot Disable Deltas

Git Protocol v2 Capabilities

nanogit uses Git protocol v2 (Smart HTTP protocol), which provides various fetch capabilities including thin-pack, no-progress, include-tag, and object filtering.

Critical limitation: There is NO capability to disable delta objects in Git protocol v2.

Why "thin-pack" Doesn't Help

The thin-pack capability controls whether the server can send "thin" packfiles that reference objects the client already has, but it does NOT control deltification. This is a common misunderstanding when working with Git protocols.

Attempted Workarounds (All Failed)

We investigated several approaches to avoid deltas, all unsuccessful:

  1. Request full objects via protocol: No protocol capability exists
  2. Use no-thin pack: Only affects base object inclusion, not deltification
  3. Fetch individual objects: Server still sends deltas for efficiency
  4. Use different protocol version: v0, v1, v2 all support deltas without disable option

Server-Side Behavior

The server's pack generation decides deltification based on:

  • Repository pack structure
  • Recent git repack operations
  • Server-side optimization settings

Even when requesting individual objects, the server may still send them as deltas if they were deltified in the repository's pack files.

Conclusion: Delta handling is mandatory for any Git client implementation.

nanogit's Delta Resolution Approach

Given the impossibility of disabling deltas, nanogit implements stateless in-memory delta resolution.

Design Constraints

  1. Stateless architecture: No persistent local .git directory or object cache
  2. No thin-pack support: All base objects must be in the same fetch response
  3. In-memory processing: Deltas resolved during fetch operation
  4. Single-fetch scope: Base objects only available within the same fetch

Implementation Strategy

The delta resolution process follows these stages:

  1. Fetch Request: Client requests objects from server using Git protocol v2
  2. Packfile Response: Server returns packfile containing regular objects and deltas
  3. Object Collection: Regular objects stored immediately, deltas collected separately
  4. Delta Resolution: Iteratively resolve deltas by finding bases and applying instructions
  5. Return Complete Objects: All objects fully resolved and available to caller

Handling Delta Chains

Delta chains occur when a delta's base object is itself a delta (e.g., Object C depends on Object B which depends on Object A). nanogit resolves these chains iteratively:

  • First iteration: Resolve deltas whose bases are available
  • Subsequent iterations: Resolve deltas whose bases were just resolved
  • Maximum iterations: Limited to prevent infinite loops in case of errors

Delta Application Process

For each delta object, nanogit:

  1. Validates the base object size matches the delta's expectation
  2. Processes each delta instruction in sequence:
    • Copy instructions: Extract bytes from base object
    • Insert instructions: Add new literal bytes
  3. Reconstructs the complete target object
  4. Calculates the hash of the resolved object
  5. Parses structured data for trees and commits

Storage Integration

Delta resolution integrates with nanogit's pluggable storage system:

  • Base objects can be retrieved from storage if previously fetched
  • Resolved objects are added to storage for use in resolving subsequent deltas
  • Storage contents remain available for the session duration

Storage Lifecycle

  1. Fetch starts with storage context injected
  2. Regular objects added to storage immediately
  3. Delta resolution checks storage for base objects
  4. Resolved objects added to storage
  5. All objects available post-fetch

Limitations and Edge Cases

Current Limitations

  1. OFS_DELTA not supported: Only REF_DELTA (type 7) is handled

    • Impact: May fail with packfiles using offset deltas
    • Mitigation: Most servers prefer REF_DELTA for network protocol
  2. Thin-pack not supported:

    • Ensures all base objects are in response
    • Trade-off: Slightly larger packfile transfers
  3. No persistent cache:

    • Stateless operation by design
    • Cannot use previously fetched objects as bases across sessions
  4. Delta chain depth limit:

    • Max iterations prevents infinite loops
    • Git servers typically limit chain depth to 50

Error Scenarios

Missing base object: Occurs when server sends thin-pack or incomplete packfile. Verify thin-pack is not requested.

Base size mismatch: Base object corrupted or wrong base selected. Check object integrity.

Delta chain too deep: Circular reference or extremely deep chain. May require investigating server packfile generation.

Performance Considerations

Memory Usage

Delta resolution operates entirely in-memory, requiring space for:

  • Regular objects received from server
  • Delta objects awaiting resolution
  • Resolved objects after delta application

CPU Cost

Delta application is CPU-intensive as it involves:

  • Memory copy operations for copy instructions
  • Minimal overhead for insert operations
  • Object parsing for trees and commits after resolution

Network Efficiency

Deltas reduce network transfer size by transmitting only differences rather than complete objects. This makes delta support essential for acceptable fetch/clone performance, particularly for repositories with large files or extensive history.

Comparison with Other Implementations

go-git

Uses similar iterative delta resolution approach in-memory. Supports both OFS_DELTA and REF_DELTA types.

libgit2

Resolves deltas during index writing phase. Can leverage on-disk object database for finding base objects. Supports streaming delta application.

Git CLI

Resolves deltas while writing packfile to disk. Uses .git/objects as base object source. Supports thin-pack with local object database.

nanogit Unique Characteristics

  1. Stateless: No .git directory, everything in-memory
  2. No thin-pack: Self-contained packfiles only
  3. Storage-agnostic: Pluggable storage backend
  4. Cloud-focused: Optimized for serverless/container environments

References

Official Git Documentation

Git Internals and Delta Compression

Protocol and Wire Format

Delta Algorithms and Implementation

Research Papers

nanogit Implementation

For implementation details, see:

  • protocol/client/fetch.go - Main delta resolution logic
  • protocol/delta.go - Delta structure and parsing
  • protocol/delta_apply.go - Delta application algorithm
  • protocol/packfile.go - Packfile object processing
  • tests/delta_integration_test.go - Integration test suite

Future Improvements

Potential Enhancements

  1. OFS_DELTA support: Handle offset deltas in addition to reference deltas
  2. Parallel delta resolution: Resolve independent delta chains concurrently
  3. Delta metrics: Track delta statistics (count, chain depth, resolution time)
  4. Streaming resolution: Apply deltas while reading packfile (memory optimization)
  5. Base object prediction: Pre-fetch likely base objects based on patterns

Performance Optimizations

  1. Base object cache: LRU cache for frequently used bases
  2. Delta reordering: Optimize resolution order to minimize iterations
  3. Lazy parsing: Defer tree/commit parsing until actually needed
  4. Zero-copy operations: Reduce memory allocations during copy instructions

Conclusion

Delta resolution in nanogit represents a balance between:

  • Protocol requirements: Cannot disable deltas, must handle them
  • Architecture constraints: Stateless, no persistent cache
  • Performance goals: Fast fetches with minimal memory footprint
  • Reliability: Robust handling of edge cases and error conditions

The implementation successfully handles deltified objects from all major Git servers while maintaining nanogit's stateless design principles. While there are limitations (no OFS_DELTA, no thin-pack), the current approach covers the vast majority of real-world use cases.

Released under the Apache 2.0 License.