
System Architecture for Operators

Understand the Guts architecture so you can operate and troubleshoot nodes effectively.

High-Level Architecture

┌─────────────────────────────────────────────────────────────────────────┐
│                                GUTS NODE                                │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐ │
│  │   HTTP API   │  │   Git HTTP   │  │  WebSocket   │  │   Metrics    │ │
│  │   (Axum)     │  │   Protocol   │  │  (Realtime)  │  │ (Prometheus) │ │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘ │
│         │                 │                 │                 │         │
│         └─────────────────┴────────┬────────┴─────────────────┘         │
│                                    │                                    │
│  ┌─────────────────────────────────┴──────────────────────────────────┐ │
│  │                         APPLICATION LAYER                          │ │
│  │ ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐ │ │
│  │ │Collaboration│  │    Auth     │  │    CI/CD    │  │   Compat    │ │ │
│  │ │(PRs/Issues) │  │(Orgs/Teams) │  │ (Workflows) │  │(GitHub API) │ │ │
│  │ └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘ │ │
│  └─────────────────────────────────┬──────────────────────────────────┘ │
│                                    │                                    │
│  ┌─────────────────────────────────┴──────────────────────────────────┐ │
│  │                             CORE LAYER                             │ │
│  │ ┌────────────────────┐  ┌────────────────────┐                     │ │
│  │ │    Git Storage     │  │  Consensus Engine  │                     │ │
│  │ │  (RocksDB/Memory)  │  │   (Simplex BFT)    │                     │ │
│  │ └────────────────────┘  └────────────────────┘                     │ │
│  └─────────────────────────────────┬──────────────────────────────────┘ │
│                                    │                                    │
│  ┌─────────────────────────────────┴──────────────────────────────────┐ │
│  │                           NETWORK LAYER                            │ │
│  │ ┌────────────────────────────────────────────────────────────────┐ │ │
│  │ │                    P2P Network (commonware)                    │ │ │
│  │ │  • Authenticated connections (Ed25519)                         │ │ │
│  │ │  • Multi-channel messaging (consensus, data, sync)             │ │ │
│  │ │  • QUIC + TCP transport                                        │ │ │
│  │ └────────────────────────────────────────────────────────────────┘ │ │
│  └────────────────────────────────────────────────────────────────────┘ │
│                                                                         │
└─────────────────────────────────────────────────────────────────────────┘

Component Overview

HTTP API Layer

The HTTP API layer handles all external requests:

Component   Port   Protocol           Purpose
REST API    8080   HTTP/1.1, HTTP/2   Repository, collaboration, auth APIs
Git HTTP    8080   HTTP/1.1           Git smart protocol (clone, push, fetch)
WebSocket   8080   WS/WSS             Real-time updates, notifications
Metrics     9090   HTTP/1.1           Prometheus metrics scraping

Key endpoints:

  • /api/* - REST API
  • /git/{owner}/{repo}/* - Git protocol
  • /ws - WebSocket connections
  • /health/* - Health checks
  • /metrics - Prometheus metrics
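When reading access logs against this layout, it helps to treat routing as a function of listening port and path. The sketch below is hypothetical: `Component` and `route` are not Guts APIs. It assumes the default ports from the table above, with 8080 serving API, Git, and WebSocket traffic and 9090 serving metrics.

```rust
/// Hypothetical classification of an incoming request, mirroring the
/// default ports and path prefixes listed in this section.
#[derive(Debug, PartialEq)]
enum Component {
    RestApi,
    GitHttp,
    WebSocket,
    Health,
    Metrics,
    NotFound,
}

/// Map a (port, path) pair to the component expected to serve it.
/// Assumes the defaults: 8080 for API/Git/WebSocket, 9090 for metrics.
fn route(port: u16, path: &str) -> Component {
    match port {
        9090 if path == "/metrics" => Component::Metrics,
        8080 => {
            if path.starts_with("/api/") {
                Component::RestApi
            } else if let Some(rest) = path.strip_prefix("/git/") {
                // Expect at least "{owner}/{repo}" after the prefix.
                let mut parts = rest.split('/');
                match (parts.next(), parts.next()) {
                    (Some(owner), Some(repo)) if !owner.is_empty() && !repo.is_empty() => {
                        Component::GitHttp
                    }
                    _ => Component::NotFound,
                }
            } else if path == "/ws" {
                Component::WebSocket
            } else if path.starts_with("/health/") {
                Component::Health
            } else {
                Component::NotFound
            }
        }
        _ => Component::NotFound,
    }
}
```

For example, `/git/alice/project/info/refs` on port 8080 classifies as Git HTTP traffic. Under these assumptions, `/metrics` on port 8080 matches nothing.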

Application Layer

Business logic is organized into feature modules:

Module          Purpose                             Key Data
Collaboration   Pull requests, issues, reviews      PRs, issues, comments, reviews
Auth            Organizations, teams, permissions   Orgs, teams, members, ACLs
CI/CD           Workflows, runs, artifacts          Pipelines, jobs, artifacts
Compat          GitHub API compatibility            Users, tokens, releases

Core Layer

Git Storage

Content-addressed storage for Git objects:

/var/lib/guts/
├── objects/           # Git objects (blobs, trees, commits)
│   ├── pack/          # Pack files
│   └── loose/         # Loose objects
├── refs/              # Branch and tag references
├── consensus/         # Consensus state
└── metadata/          # Repository metadata
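Content addressing means an object's name is derived from its bytes. This sketch assumes Guts follows Git's default object format: SHA-1 over a `"<type> <size>\0"` header followed by the content. Git can also use SHA-256, which the sketch ignores. The SHA-1 routine is a plain textbook implementation, included only so the example needs no crates. It is not taken from the Guts codebase.

```rust
/// Textbook SHA-1 (FIPS 180-4), included only so this sketch has no dependencies.
fn sha1(data: &[u8]) -> [u8; 20] {
    let mut h: [u32; 5] = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0];
    // Pad: 0x80, zeros up to 56 mod 64, then the bit length as big-endian u64.
    let bit_len = (data.len() as u64).wrapping_mul(8);
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&bit_len.to_be_bytes());

    for block in msg.chunks(64) {
        // Message schedule: 16 words from the block, 64 derived words.
        let mut w = [0u32; 80];
        for (i, word) in block.chunks(4).enumerate() {
            w[i] = u32::from_be_bytes([word[0], word[1], word[2], word[3]]);
        }
        for i in 16..80 {
            w[i] = (w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16]).rotate_left(1);
        }
        let (mut a, mut b, mut c, mut d, mut e) = (h[0], h[1], h[2], h[3], h[4]);
        for (i, &wi) in w.iter().enumerate() {
            let (f, k): (u32, u32) = match i {
                0..=19 => ((b & c) | (!b & d), 0x5A827999),
                20..=39 => (b ^ c ^ d, 0x6ED9EBA1),
                40..=59 => ((b & c) | (b & d) | (c & d), 0x8F1BBCDC),
                _ => (b ^ c ^ d, 0xCA62C1D6),
            };
            let t = a
                .rotate_left(5)
                .wrapping_add(f)
                .wrapping_add(e)
                .wrapping_add(k)
                .wrapping_add(wi);
            e = d;
            d = c;
            c = b.rotate_left(30);
            b = a;
            a = t;
        }
        for (hv, v) in h.iter_mut().zip([a, b, c, d, e]) {
            *hv = hv.wrapping_add(v);
        }
    }
    let mut out = [0u8; 20];
    for (chunk, v) in out.chunks_mut(4).zip(h) {
        chunk.copy_from_slice(&v.to_be_bytes());
    }
    out
}

fn hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Git object ID: SHA-1 over "<type> <size>\0" followed by the raw content.
fn git_object_id(kind: &str, content: &[u8]) -> String {
    let mut buf = format!("{} {}\0", kind, content.len()).into_bytes();
    buf.extend_from_slice(content);
    hex(&sha1(&buf))
}
```

The ID depends only on the content. Identical files pushed to different repositories or branches therefore map to the same key and can be stored once.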

Storage backends:

  • Memory (development): Fast, ephemeral
  • RocksDB (production): Persistent, optimized for SSDs
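Both backends can sit behind one interface, so the rest of the node does not depend on which one is configured. `ObjectStore` and `MemoryStore` below are hypothetical names chosen for illustration. A RocksDB-backed type would implement the same trait, but it needs the `rocksdb` crate and is omitted here.

```rust
use std::collections::HashMap;

/// Hypothetical storage interface shared by all backends.
trait ObjectStore {
    /// Store `data` under `id`. Returns false if the object already existed
    /// (with content addressing, re-inserting the same id is a no-op).
    fn put(&mut self, id: &str, data: Vec<u8>) -> bool;
    fn get(&self, id: &str) -> Option<&[u8]>;
    fn contains(&self, id: &str) -> bool {
        self.get(id).is_some()
    }
}

/// Development backend: everything lives in a HashMap and is lost on restart.
#[derive(Default)]
struct MemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl ObjectStore for MemoryStore {
    fn put(&mut self, id: &str, data: Vec<u8>) -> bool {
        if self.objects.contains_key(id) {
            return false;
        }
        self.objects.insert(id.to_string(), data);
        true
    }

    fn get(&self, id: &str) -> Option<&[u8]> {
        self.objects.get(id).map(|v| v.as_slice())
    }
}
```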

Consensus Engine

Simplex BFT consensus provides:

  • Total ordering: All state changes ordered globally
  • Finality: Blocks are final after 3 network hops
  • Byzantine tolerance: Tolerates f < n/3 malicious validators

Consensus flow:

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│ Propose  │───▶│   Vote   │───▶│ Finalize │───▶│  Apply   │
│  (Hop 1) │    │  (Hop 2) │    │  (Hop 3) │    │ (Local)  │
└──────────┘    └──────────┘    └──────────┘    └──────────┘
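The Vote and Finalize steps depend on counting distinct validators until a quorum is reached. The following minimal sketch makes two simplifying assumptions: a vote is just a validator index plus a 32-byte block digest, and `VoteTracker` is a hypothetical type. Real Simplex votes are signed messages, and this sketch verifies no signatures.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical tracker for one consensus view: collects votes per block
/// digest and reports a digest once its votes reach `quorum`.
struct VoteTracker {
    quorum: usize,
    votes: HashMap<[u8; 32], HashSet<u32>>, // digest -> voting validator ids
    voted: HashSet<u32>,                    // validators that already voted this view
}

impl VoteTracker {
    fn new(quorum: usize) -> Self {
        Self { quorum, votes: HashMap::new(), voted: HashSet::new() }
    }

    /// Record a vote. Returns Some(digest) whenever that digest has at least
    /// `quorum` votes. A second vote from the same validator in this view is
    /// ignored, whether it repeats or contradicts the first.
    fn add_vote(&mut self, validator: u32, digest: [u8; 32]) -> Option<[u8; 32]> {
        if !self.voted.insert(validator) {
            return None;
        }
        let voters = self.votes.entry(digest).or_default();
        voters.insert(validator);
        if voters.len() >= self.quorum {
            Some(digest)
        } else {
            None
        }
    }
}
```

With four validators and a quorum of three, the third distinct vote for the same block triggers finalization. A validator that repeats itself does not count twice.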

Network Layer

P2P networking using commonware primitives:

Channel     ID   Purpose
Pending     0    Consensus votes in progress
Recovered   1    Recovered/replayed messages
Resolver    2    Certificate resolution
Broadcast   3    Block broadcast to peers
Sync        4    Block sync and state transfer
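When labelling per-channel traffic in logs or dashboards, the channel table can be encoded as an enum. This mapping is a hypothetical sketch that mirrors the IDs above. The `Channel` type and its integer width are assumptions, not the commonware API.

```rust
/// Hypothetical mapping of the P2P channel IDs listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Channel {
    Pending = 0,
    Recovered = 1,
    Resolver = 2,
    Broadcast = 3,
    Sync = 4,
}

impl Channel {
    /// Translate a raw channel ID (e.g. from a log line) into a named channel.
    fn from_id(id: u32) -> Option<Channel> {
        match id {
            0 => Some(Channel::Pending),
            1 => Some(Channel::Recovered),
            2 => Some(Channel::Resolver),
            3 => Some(Channel::Broadcast),
            4 => Some(Channel::Sync),
            _ => None,
        }
    }

    fn purpose(self) -> &'static str {
        match self {
            Channel::Pending => "Consensus votes in progress",
            Channel::Recovered => "Recovered/replayed messages",
            Channel::Resolver => "Certificate resolution",
            Channel::Broadcast => "Block broadcast to peers",
            Channel::Sync => "Block sync and state transfer",
        }
    }
}
```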

Connection properties:

  • Authenticated via Ed25519 keys
  • Encrypted via Noise protocol
  • Multiplexed over QUIC/TCP

Data Flow

Write Path (Push)

Client                    Node                      Network
  │                        │                          │
  │  git push              │                          │
  │───────────────────────▶│                          │
  │                        │  1. Parse pack file      │
  │                        │  2. Store objects        │
  │                        │  3. Submit to consensus  │
  │                        │─────────────────────────▶│
  │                        │                          │  Broadcast
  │                        │                          │  to validators
  │                        │◀─────────────────────────│
  │                        │  4. Wait for finality    │
  │                        │  5. Update refs          │
  │◀───────────────────────│                          │
  │  OK (refs updated)     │                          │
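The property that matters on the write path is ordering. Objects may be stored before consensus, but refs move only after the update is final. The minimal sketch below shows that ordering. It makes three assumptions: a hypothetical `Consensus` trait stands in for the real engine, objects arrive already keyed by ID, and pack parsing is assumed done. `Node`, `push`, and `FakeConsensus` are illustrative names.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the consensus engine: submit a ref update and
/// block until it is finalized (Ok) or rejected/timed out (Err).
trait Consensus {
    fn submit_and_wait(&mut self, refname: &str, new_oid: &str) -> Result<(), String>;
}

struct Node<C: Consensus> {
    objects: HashMap<String, Vec<u8>>, // object id -> bytes
    refs: HashMap<String, String>,     // ref name -> object id
    consensus: C,
}

impl<C: Consensus> Node<C> {
    fn new(consensus: C) -> Self {
        Node { objects: HashMap::new(), refs: HashMap::new(), consensus }
    }

    fn ref_target(&self, name: &str) -> Option<&str> {
        self.refs.get(name).map(|s| s.as_str())
    }

    /// Steps 2-5 of the write path (step 1, parsing the pack, is assumed done).
    fn push(&mut self, objects: Vec<(String, Vec<u8>)>, refname: &str, new_oid: &str) -> Result<(), String> {
        // 2. Store objects. If consensus later fails they are merely unreferenced.
        for (id, data) in objects {
            self.objects.entry(id).or_insert(data);
        }
        if !self.objects.contains_key(new_oid) {
            return Err(format!("missing object {new_oid}"));
        }
        // 3-4. Submit the ref update and wait for finality.
        self.consensus.submit_and_wait(refname, new_oid)?;
        // 5. Only now does the ref become visible to readers.
        self.refs.insert(refname.to_string(), new_oid.to_string());
        Ok(())
    }
}

/// Test double that either finalizes or rejects every proposal.
struct FakeConsensus {
    accept: bool,
}

impl Consensus for FakeConsensus {
    fn submit_and_wait(&mut self, _refname: &str, _new_oid: &str) -> Result<(), String> {
        if self.accept { Ok(()) } else { Err("consensus did not finalize".into()) }
    }
}
```

If consensus never finalizes, the client gets an error and no ref changes. A failed push therefore leaves readers on the previous state.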

Read Path (Clone/Fetch)

Client                    Node                      Storage
  │                        │                          │
  │  git clone             │                          │
  │───────────────────────▶│                          │
  │                        │  1. Check refs           │
  │                        │─────────────────────────▶│
  │                        │◀─────────────────────────│
  │                        │  2. Negotiate objects    │
  │                        │  3. Generate pack file   │
  │                        │─────────────────────────▶│
  │◀───────────────────────│                          │
  │  Pack file stream      │                          │
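During negotiation, the client states which commits it wants and which it already has. The server then sends everything reachable from the wants that is not reachable from the haves. The following sketch simplifies this heavily and walks the commit graph only. Real Git negotiation also covers trees and blobs and may take several rounds of "have" lines. The types and functions are hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Commit graph: commit id -> parent ids (hypothetical, in-memory).
type Graph<'a> = HashMap<&'a str, Vec<&'a str>>;

fn graph_from<'a>(edges: Vec<(&'a str, Vec<&'a str>)>) -> Graph<'a> {
    edges.into_iter().collect()
}

/// All commits reachable from `starts` (inclusive), via depth-first search.
fn reachable<'a>(graph: &Graph<'a>, starts: &[&'a str]) -> HashSet<&'a str> {
    let mut seen = HashSet::new();
    let mut stack: Vec<&'a str> = starts.to_vec();
    while let Some(id) = stack.pop() {
        if seen.insert(id) {
            if let Some(parents) = graph.get(id) {
                stack.extend(parents.iter().copied());
            }
        }
    }
    seen
}

/// Commits the server must include in the pack: wanted minus already-had.
fn commits_to_send<'a>(graph: &Graph<'a>, wants: &[&'a str], haves: &[&'a str]) -> Vec<&'a str> {
    let have_set = reachable(graph, haves);
    let mut out: Vec<&'a str> = reachable(graph, wants)
        .into_iter()
        .filter(|c| !have_set.contains(c))
        .collect();
    out.sort(); // deterministic order for display and testing
    out
}
```

A fresh clone sends no haves and receives the full history. A fetch from an up-to-date parent receives only the new commits.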

High Availability

Single Node

For non-critical deployments:

┌────────────────┐
│  Load Balancer │
└────────┬───────┘
         │
    ┌────┴────┐
    │  Node   │
    └─────────┘

Pros: Simple, low cost
Cons: Single point of failure

For production deployments:

┌────────────────────────────────────────────────┐
│                Load Balancer                    │
└────────┬───────────────┬───────────────┬───────┘
         │               │               │
    ┌────┴────┐     ┌────┴────┐     ┌────┴────┐
    │ Node 1  │◀───▶│ Node 2  │◀───▶│ Node 3  │
    │Validator│     │Validator│     │Validator│
    └─────────┘     └─────────┘     └─────────┘
         │               │               │
         └───────────────┴───────────────┘
                    P2P Mesh

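Pros: High availability, fault tolerance
Cons: Higher complexity and cost

With f = floor((n-1)/3), the three validators shown give f = 0, so losing any one of them halts consensus (writes stop). Tolerating one failed validator requires at least four.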

Geographic Distribution

For global deployments:

    US-East              EU-West              AP-Tokyo
  ┌─────────┐          ┌─────────┐          ┌─────────┐
  │ Node 1  │◀────────▶│ Node 3  │◀────────▶│ Node 5  │
  │ Node 2  │          │ Node 4  │          │ Node 6  │
  └─────────┘          └─────────┘          └─────────┘
       │                    │                    │
       └────────────────────┴────────────────────┘
              Global P2P Network

Failure Modes

Node Failure

Scenario             Impact                         Recovery
Single node crash    API unavailable on that node   Restart node; automatic rejoin
Storage corruption   Data loss possible             Restore from backup or resync
Network partition    Writes stall without quorum    Heal partition (2f+1 votes prevent split-brain)

Consensus Failure

Scenario         Impact                 Recovery
< f nodes down   None                   Continue normally
f nodes down     Degraded performance   Restore nodes
> f nodes down   Consensus halts        Restore to quorum

Note: f = floor((n-1)/3) for n validators
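Applied to common cluster sizes, the formula gives these results. A 4-validator cluster tolerates f = 1 failure and needs 3 votes. Seven validators tolerate 2 and need 5. The sketch below encodes the note and the table above. It uses n - f as the quorum size, which equals 2f + 1 whenever n = 3f + 1 and remains safe for other n. The function and enum names are hypothetical. The table does not address f = 0, and the sketch treats that case as Normal while all validators are up.

```rust
/// Maximum number of Byzantine validators tolerated: f = floor((n - 1) / 3).
fn max_faults(n: u32) -> u32 {
    n.saturating_sub(1) / 3
}

/// Votes needed to finalize: n - f (equal to 2f + 1 when n = 3f + 1).
fn quorum(n: u32) -> u32 {
    n - max_faults(n)
}

#[derive(Debug, PartialEq)]
enum ClusterState {
    Normal,
    Degraded,
    Halted,
}

/// Classify a cluster per the consensus-failure table, given validators down.
fn consensus_state(n: u32, down: u32) -> ClusterState {
    let f = max_faults(n);
    if down > f {
        ClusterState::Halted // fewer than quorum(n) validators remain
    } else if down == f && down > 0 {
        ClusterState::Degraded // exactly a quorum remains; no slack left
    } else {
        ClusterState::Normal
    }
}
```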

Network Failure

Scenario             Impact                Recovery
Peer disconnection   Reduced replication   Automatic reconnect
Bootstrap failure    Can't join network    Check bootstrap nodes
Firewall blocking    P2P not working       Check firewall rules

Operational Metrics

Key Health Indicators

Metric                               Healthy Range     Alert Threshold
guts_p2p_peers_connected             > 3               < 3
guts_consensus_block_height          Increasing        Stalled > 1min
guts_http_request_duration_seconds   p99 < 100ms       p99 > 1s
guts_storage_available_bytes         > 10% capacity    < 10%
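In production these thresholds typically live in Prometheus alerting rules. The sketch below only makes them explicit in code. The metric names come from the table. `HealthSnapshot` and `alerts` are hypothetical, and two inputs are assumptions: time since the block height last changed is derived by the caller, and total storage capacity is known.

```rust
/// Snapshot of the key health indicators (hypothetical struct).
struct HealthSnapshot {
    peers_connected: u64,          // guts_p2p_peers_connected
    secs_since_height_change: u64, // derived from guts_consensus_block_height
    http_p99_secs: f64,            // p99 of guts_http_request_duration_seconds
    storage_available_bytes: u64,  // guts_storage_available_bytes
    storage_capacity_bytes: u64,   // total volume size (assumed known)
}

/// Return the alerts that fire under the thresholds in the table.
fn alerts(s: &HealthSnapshot) -> Vec<&'static str> {
    let mut out = Vec::new();
    if s.peers_connected < 3 {
        out.push("too few P2P peers");
    }
    if s.secs_since_height_change > 60 {
        out.push("block height stalled for over 1 min");
    }
    if s.http_p99_secs > 1.0 {
        out.push("HTTP p99 latency above 1s");
    }
    // available < 10% of capacity, compared in integers to avoid float rounding
    if s.storage_available_bytes.saturating_mul(10) < s.storage_capacity_bytes {
        out.push("storage below 10% free");
    }
    out
}
```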

Performance Baselines

Operation         Expected Latency   Notes
API read          < 10ms             Local storage read
API write         < 100ms            Includes consensus
Git clone (1MB)   < 1s               Depends on network
Git push (1MB)    < 2s               Includes consensus finality
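The table does not state a percentile. Checking a baseline at p99 over measured samples is one reasonable reading, and that choice is an assumption here. This sketch uses the nearest-rank definition of a percentile, which is one of several. Prometheus's `histogram_quantile` instead interpolates within histogram buckets, so its numbers can differ slightly. The function names are hypothetical.

```rust
/// Nearest-rank percentile: the smallest sample such that at least `p`
/// percent of samples are <= it. Returns None for empty input or bad `p`.
fn percentile(samples: &[f64], p: f64) -> Option<f64> {
    if samples.is_empty() || !(0.0..=100.0).contains(&p) {
        return None;
    }
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("latencies must not be NaN"));
    // 1-based rank; p = 0 maps to the minimum.
    let rank = (p * sorted.len() as f64 / 100.0).ceil().max(1.0) as usize;
    Some(sorted[rank - 1])
}

/// True if the p99 of the measured latencies (ms) is under `budget_ms`.
fn within_baseline(latencies_ms: &[f64], budget_ms: f64) -> bool {
    matches!(percentile(latencies_ms, 99.0), Some(p99) if p99 < budget_ms)
}
```

For example, API write latencies would be checked with `within_baseline(&samples, 100.0)`.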

Security Architecture

Network Security

┌─────────────────────────────────────────────────────┐
│                    Internet                         │
└────────────────────────┬────────────────────────────┘
                         │
                  ┌──────┴──────┐
                  │     TLS     │  (API, Git HTTPS)
                  │ Termination │
                  └──────┬──────┘
                         │
┌────────────────────────┼────────────────────────────┐
│   Private Network      │                            │
│                   ┌────┴────┐                       │
│                   │  Node   │                       │
│                   └────┬────┘                       │
│                        │                            │
│              ┌─────────┴─────────┐                  │
│              │   Noise Protocol  │  (P2P)           │
│              └───────────────────┘                  │
└─────────────────────────────────────────────────────┘

Key Management

Key Type     Purpose               Storage
Node key     P2P authentication    File, HSM, or KMS
TLS cert     HTTPS termination     File or cert manager
API tokens   User authentication   Database (hashed)

Scaling Considerations

Vertical Scaling

Increase resources on existing nodes:

Bottleneck    Solution
CPU           More cores, faster clock
Memory        More RAM
Storage I/O   NVMe, RAID
Network       Higher bandwidth

Horizontal Scaling

Add more nodes for read scaling:

  • Read replicas: Full nodes for read-heavy workloads
  • Load balancing: Distribute API traffic
  • CDN: Cache static assets, archives
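A load-balancer policy that separates reads from writes could look like the sketch below. It assumes writes must land on a validator because they go through consensus, while reads can go to any node, read replicas included. `Router` is a hypothetical type. Classifying a Git request as a write cannot rely on the HTTP method alone: fetches POST to `git-upload-pack` and pushes POST to `git-receive-pack`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical routing policy: writes go to validators, reads to any node.
struct Router {
    validators: Vec<String>,
    read_pool: Vec<String>, // validators + read replicas
    next: AtomicUsize,      // round-robin cursor shared across requests
}

impl Router {
    fn new(validators: Vec<String>, replicas: Vec<String>) -> Self {
        let mut read_pool = validators.clone();
        read_pool.extend(replicas);
        Router { validators, read_pool, next: AtomicUsize::new(0) }
    }

    /// Pick a backend round-robin. Returns None if the relevant pool is empty.
    fn pick(&self, is_write: bool) -> Option<&str> {
        let pool = if is_write { &self.validators } else { &self.read_pool };
        if pool.is_empty() {
            return None;
        }
        let i = self.next.fetch_add(1, Ordering::Relaxed) % pool.len();
        Some(pool[i].as_str())
    }
}
```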

Limitations

  • Write scaling: Limited by consensus throughput
  • Storage: All nodes store all data (no sharding yet)
  • Consensus: Max recommended validators: 100

Integration Points

Monitoring Stack

┌────────────┐     ┌────────────┐     ┌────────────┐
│ Guts Node  │────▶│ Prometheus │────▶│  Grafana   │
│ (/metrics) │     │            │     │            │
└──────┬─────┘     └────────────┘     └────────────┘
       │
       │           ┌────────────┐     ┌────────────┐
       └──────────▶│    Loki    │────▶│  Grafana   │
         (logs)    │            │     │            │
                   └────────────┘     └────────────┘
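When checking a scrape by hand, for example with `curl` against port 9090, it helps to know what the Prometheus text exposition format looks like. This sketch assumes the node exposes that standard format for gauges. `render_gauges` is a hypothetical helper, the metric name comes from the health table, and the value is illustrative.

```rust
/// Render gauges in the Prometheus text exposition format:
/// a HELP line, a TYPE line, then "name value" for each metric.
fn render_gauges(metrics: &[(&str, &str, f64)]) -> String {
    let mut out = String::new();
    for (name, help, value) in metrics {
        out.push_str(&format!("# HELP {name} {help}\n"));
        out.push_str(&format!("# TYPE {name} gauge\n"));
        out.push_str(&format!("{name} {value}\n"));
    }
    out
}
```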

CI/CD Integration

┌────────────┐     ┌────────────┐     ┌────────────┐
│    Git     │────▶│ Guts Node  │────▶│  Webhook   │
│   Push     │     │            │     │            │
└────────────┘     └────────────┘     └──────┬─────┘
                                             │
                                             ▼
                                      ┌────────────┐
                                      │  CI Runner │
                                      │ (Jenkins,  │
                                      │  GitHub,   │
                                      │  etc.)     │
                                      └────────────┘

Released under the MIT License.