Delta Resolution in nanogit
Overview
This document explains how nanogit handles Git delta objects, the limitations of the Git protocol regarding delta transmission, and the design decisions made to support deltas in a stateless architecture.
What are Git Delta Objects?
Git delta objects are compressed representations of Git objects (blobs, trees, commits) that store only the differences from a base object rather than the complete content. This significantly reduces network bandwidth and storage requirements.
Delta Types
Git supports two types of delta objects:
- OFS_DELTA (offset delta): References the base object by its offset in the packfile
- REF_DELTA (reference delta): References the base object by its SHA-1 hash
nanogit currently handles REF_DELTA objects (type 7 in packfile format).
Delta Format
A delta object contains:
- Reference to the base object (SHA-1 hash for REF_DELTA)
- Expected source length (size of base object)
- Target length (size of resulting object after applying delta)
- Series of delta instructions:
- Copy instructions: Copy bytes from base object at specific offset/length
- Insert instructions: Insert new literal bytes
Why Git Servers Send Deltas
Git servers automatically deltify objects to optimize network transfer:
- Bandwidth efficiency: Deltas can reduce transfer size by 50-90% for modified files
- Performance: Less data to transfer = faster fetch/clone operations
- Standard behavior: All major Git servers (GitHub, GitLab, Bitbucket, Gitea) use deltas by default
- Repository structure: Git's internal storage uses deltas for efficient pack files
When Servers Send Deltas
Servers typically send deltas in these scenarios:
- Modified files across commits (same file, small changes)
- Similar files in the same repository (templates, duplicated code)
- After running
git repackwith aggressive deltification - For any object where Git detects similarity with another object
Protocol Limitations: Cannot Disable Deltas
Git Protocol v2 Capabilities
nanogit uses Git protocol v2 (Smart HTTP protocol), which provides various fetch capabilities including thin-pack, no-progress, include-tag, and object filtering.
Critical limitation: There is NO capability to disable delta objects in Git protocol v2.
Why "thin-pack" Doesn't Help
The thin-pack capability controls whether the server can send "thin" packfiles that reference objects the client already has, but it does NOT control deltification. This is a common misunderstanding when working with Git protocols.
Attempted Workarounds (All Failed)
We investigated several approaches to avoid deltas, all unsuccessful:
- Request full objects via protocol: No protocol capability exists
- Use
no-thinpack: Only affects base object inclusion, not deltification - Fetch individual objects: Server still sends deltas for efficiency
- Use different protocol version: v0, v1, v2 all support deltas without disable option
Server-Side Behavior
The server's pack generation decides deltification based on:
- Repository pack structure
- Recent
git repackoperations - Server-side optimization settings
Even when requesting individual objects, the server may still send them as deltas if they were deltified in the repository's pack files.
Conclusion: Delta handling is mandatory for any Git client implementation.
nanogit's Delta Resolution Approach
Given the impossibility of disabling deltas, nanogit implements stateless in-memory delta resolution.
Design Constraints
- Stateless architecture: No persistent local .git directory or object cache
- No thin-pack support: All base objects must be in the same fetch response
- In-memory processing: Deltas resolved during fetch operation
- Single-fetch scope: Base objects only available within the same fetch
Implementation Strategy
The delta resolution process follows these stages:
- Fetch Request: Client requests objects from server using Git protocol v2
- Packfile Response: Server returns packfile containing regular objects and deltas
- Object Collection: Regular objects stored immediately, deltas collected separately
- Delta Resolution: Iteratively resolve deltas by finding bases and applying instructions
- Return Complete Objects: All objects fully resolved and available to caller
Handling Delta Chains
Delta chains occur when a delta's base object is itself a delta (e.g., Object C depends on Object B which depends on Object A). nanogit resolves these chains iteratively:
- First iteration: Resolve deltas whose bases are available
- Subsequent iterations: Resolve deltas whose bases were just resolved
- Maximum iterations: Limited to prevent infinite loops in case of errors
Delta Application Process
For each delta object, nanogit:
- Validates the base object size matches the delta's expectation
- Processes each delta instruction in sequence:
- Copy instructions: Extract bytes from base object
- Insert instructions: Add new literal bytes
- Reconstructs the complete target object
- Calculates the hash of the resolved object
- Parses structured data for trees and commits
Storage Integration
Delta resolution integrates with nanogit's pluggable storage system:
- Base objects can be retrieved from storage if previously fetched
- Resolved objects are added to storage for use in resolving subsequent deltas
- Storage contents remain available for the session duration
Storage Lifecycle
- Fetch starts with storage context injected
- Regular objects added to storage immediately
- Delta resolution checks storage for base objects
- Resolved objects added to storage
- All objects available post-fetch
Limitations and Edge Cases
Current Limitations
OFS_DELTA not supported: Only REF_DELTA (type 7) is handled
- Impact: May fail with packfiles using offset deltas
- Mitigation: Most servers prefer REF_DELTA for network protocol
Thin-pack not supported:
- Ensures all base objects are in response
- Trade-off: Slightly larger packfile transfers
No persistent cache:
- Stateless operation by design
- Cannot use previously fetched objects as bases across sessions
Delta chain depth limit:
- Max iterations prevents infinite loops
- Git servers typically limit chain depth to 50
Error Scenarios
Missing base object: Occurs when server sends thin-pack or incomplete packfile. Verify thin-pack is not requested.
Base size mismatch: Base object corrupted or wrong base selected. Check object integrity.
Delta chain too deep: Circular reference or extremely deep chain. May require investigating server packfile generation.
Performance Considerations
Memory Usage
Delta resolution operates entirely in-memory, requiring space for:
- Regular objects received from server
- Delta objects awaiting resolution
- Resolved objects after delta application
CPU Cost
Delta application is CPU-intensive as it involves:
- Memory copy operations for copy instructions
- Minimal overhead for insert operations
- Object parsing for trees and commits after resolution
Network Efficiency
Deltas reduce network transfer size by transmitting only differences rather than complete objects. This makes delta support essential for acceptable fetch/clone performance, particularly for repositories with large files or extensive history.
Comparison with Other Implementations
go-git
Uses similar iterative delta resolution approach in-memory. Supports both OFS_DELTA and REF_DELTA types.
libgit2
Resolves deltas during index writing phase. Can leverage on-disk object database for finding base objects. Supports streaming delta application.
Git CLI
Resolves deltas while writing packfile to disk. Uses .git/objects as base object source. Supports thin-pack with local object database.
nanogit Unique Characteristics
- Stateless: No .git directory, everything in-memory
- No thin-pack: Self-contained packfiles only
- Storage-agnostic: Pluggable storage backend
- Cloud-focused: Optimized for serverless/container environments
References
Official Git Documentation
- Git Packfile Format - Complete packfile specification including delta encoding
- Git Protocol v2 - Modern Git wire protocol specification
- Git Protocol Capabilities - Available protocol capabilities and limitations
- Deltified Representation - Technical specification of delta format
- git-pack-objects - Man page for pack generation with delta strategies
- git-repack - Documentation on repository repacking and deltification
Git Internals and Delta Compression
- Pro Git Book - Git Internals - Chapter on packfiles and delta compression
- Git's Object Storage - GitHub Engineering blog on Git's packed object store
- How Git Uses Delta Compression - Deep dive into delta encoding strategies
- The Git Packfile Format - Detailed explanation with examples
- Understanding Git Delta Compression - StackOverflow comprehensive answer
Protocol and Wire Format
- Git HTTP Protocol - Smart HTTP protocol details
- Git Transfer Protocols - Overview of Git protocols (HTTP, SSH, Git)
- Git Wire Protocol Version 2 - Google Open Source blog on protocol v2
Delta Algorithms and Implementation
- Delta Compression Algorithms - Wikipedia overview of delta encoding
- Xdelta Algorithm - Algorithm similar to Git's delta compression
- Implementing Efficient Deltas - Blog post on delta implementation strategies
- Rust Git Implementation - Delta Handling - Reference implementation in gitoxide
Research Papers
- "The Design of a Git Repository Storage" - Technical specification
- "Git Object Model and Storage Optimization" - Git's transition to SHA-256 discusses object storage
nanogit Implementation
For implementation details, see:
protocol/client/fetch.go- Main delta resolution logicprotocol/delta.go- Delta structure and parsingprotocol/delta_apply.go- Delta application algorithmprotocol/packfile.go- Packfile object processingtests/delta_integration_test.go- Integration test suite
Related Issues and Pull Requests
- Grafana GitSync Issue #111056 - Original issue: Missing files due to skipped deltas
- nanogit PR #95 - Implement stateless delta resolution for Git objects (merged)
- nanogit PR #96 - Add fallback fetch for missing tree objects (merged)
Future Improvements
Potential Enhancements
- OFS_DELTA support: Handle offset deltas in addition to reference deltas
- Parallel delta resolution: Resolve independent delta chains concurrently
- Delta metrics: Track delta statistics (count, chain depth, resolution time)
- Streaming resolution: Apply deltas while reading packfile (memory optimization)
- Base object prediction: Pre-fetch likely base objects based on patterns
Performance Optimizations
- Base object cache: LRU cache for frequently used bases
- Delta reordering: Optimize resolution order to minimize iterations
- Lazy parsing: Defer tree/commit parsing until actually needed
- Zero-copy operations: Reduce memory allocations during copy instructions
Conclusion
Delta resolution in nanogit represents a balance between:
- Protocol requirements: Cannot disable deltas, must handle them
- Architecture constraints: Stateless, no persistent cache
- Performance goals: Fast fetches with minimal memory footprint
- Reliability: Robust handling of edge cases and error conditions
The implementation successfully handles deltified objects from all major Git servers while maintaining nanogit's stateless design principles. While there are limitations (no OFS_DELTA, no thin-pack), the current approach covers the vast majority of real-world use cases.
