diff --git a/doc/raft_recovery.md b/doc/raft_recovery.md
new file mode 100644
index 0000000000000000000000000000000000000000..1b18ff0bfa1027b492d49b66a1c58f016d86c62f
--- /dev/null
+++ b/doc/raft_recovery.md
@@ -0,0 +1,269 @@
+# Gitaly WAL Based Backups for RAFT
+
+See [RAFT.md](raft.md) for an overview of Gitaly's Multi-Raft architecture.
+
+## Executive Summary
+
+This document describes how Gitaly's partition backup system will operate with RAFT clusters. Partition backups for non-RAFT deployments should remain unchanged, as the backup solution relies solely on transaction data.
+
+## Current Architecture (Summary)
+
+Gitaly has a backup solution called **Partition Backup**, which takes the current read snapshot of a partition via the transaction manager and archives its filesystem:
+
+- The `gitaly-backup` tool receives a list of servers via an environment variable
+- For each server, it discovers that server's partitions and calls the `BackupPartition` RPC for each one
+- The `BackupPartition` RPC creates a TAR archive containing the partition's repositories and key-value state, and uploads it to the backup storage.
+- Separately, WAL entry archiving happens continuously via hooks in the Partition Manager whenever a new log entry is applied.
+
+### How It Works In The Existing Tooling
+
+```mermaid
+flowchart TD
+ A[gitaly-backup tool] --> B[Receive backup request<br>with gitaly servers]
+
+ B --> C[Connect to Server A]
+ B --> D[Connect to Server B]
+
+ C --> E[ListPartitions<br>↓<br>p1, p2]
+ E --> F[BackupPartition p1]
+ E --> G[BackupPartition p2]
+
+ D --> H[ListPartitions<br>↓<br>p3]
+ H --> I[BackupPartition p3]
+
+ style A fill:#e3f2fd
+ style B fill:#fff9c4
+ style E fill:#f3e5f5
+ style H fill:#f3e5f5
+ style F fill:#e8f5e9
+ style G fill:#e8f5e9
+ style I fill:#e8f5e9
+```
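+
+The flow above could be driven by a loop like the following sketch. The `PartitionClient` interface and its method shapes are illustrative stand-ins for the real `ListPartitions` and `BackupPartition` gRPC clients, not the actual proto definitions.
+
+```go
+// Illustrative sketch of the discovery/backup loop described above. The
+// PartitionClient interface is a stand-in for the real Gitaly gRPC clients;
+// the method shapes here are assumptions, not the actual proto definitions.
+package backup
+
+import (
+	"context"
+	"fmt"
+	"log"
+)
+
+type PartitionClient interface {
+	// ListPartitions returns the partition IDs hosted on one storage.
+	ListPartitions(ctx context.Context, storage string) ([]string, error)
+	// BackupPartition archives a single partition to the backup storage.
+	BackupPartition(ctx context.Context, storage, partitionID string) error
+}
+
+// backupAll walks every configured server/storage, discovers its partitions,
+// and backs each one up, mirroring the flowchart above.
+func backupAll(ctx context.Context, clients map[string]PartitionClient) error {
+	for storage, client := range clients {
+		partitions, err := client.ListPartitions(ctx, storage)
+		if err != nil {
+			return fmt.Errorf("list partitions on %q: %w", storage, err)
+		}
+		for _, id := range partitions {
+			if err := client.BackupPartition(ctx, storage, id); err != nil {
+				return fmt.Errorf("backup partition %s/%s: %w", storage, id, err)
+			}
+			log.Printf("backed up partition %s on storage %s", id, storage)
+		}
+	}
+	return nil
+}
+```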
+
+### Backup Object Storage Layout
+
+```yaml
+├── gitaly-backups/
+│   ├── partition-backups/
+│   │   ├── storage-a/
+│   │   │   ├── partition-1/
+│   │   │   │   ├── 0000000a.tar
+│   │   │   │   └── 0000000f.tar
+│   │   │   └── partition-2/
+│   │   │       └── 0000000c.tar
+│   │   └── storage-b/
+│   │       └── partition-3/
+│   │           └── 0000000a.tar
+│   └── partition-manifests/
+│       ├── storage-a/
+│       │   ├── partition-1.json
+│       │   └── partition-2.json
+│       └── storage-b/
+│           └── partition-3.json
+└── gitaly-wal-backups/
+    ├── storage-a/
+    │   ├── partition-1/
+    │   │   ├── 00000001.tar
+    │   │   ├── 00000002.tar
+    │   │   ├── 00000003.tar
+    │   │   ├── 00000004.tar
+    │   │   └── ...
+    │   └── ...
+    └── ...
+```
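+
+For illustration, the layout above maps to object keys roughly as follows. Only the path scheme is taken from the tree above; the helper names themselves are hypothetical.
+
+```go
+// Illustrative helpers mirroring the backup layout above; function names are
+// hypothetical and only the key scheme comes from the tree shown.
+package backup
+
+import "fmt"
+
+// partitionBackupKey returns the object key of a base partition backup, e.g.
+// "gitaly-backups/partition-backups/storage-a/partition-1/0000000a.tar".
+func partitionBackupKey(storage, partitionID, lsn string) string {
+	return fmt.Sprintf("gitaly-backups/partition-backups/%s/%s/%s.tar", storage, partitionID, lsn)
+}
+
+// manifestKey returns the object key of a partition manifest, e.g.
+// "gitaly-backups/partition-manifests/storage-a/partition-1.json".
+func manifestKey(storage, partitionID string) string {
+	return fmt.Sprintf("gitaly-backups/partition-manifests/%s/%s.json", storage, partitionID)
+}
+
+// walBackupKey returns the object key of an archived WAL entry, e.g.
+// "gitaly-wal-backups/storage-a/partition-1/00000001.tar".
+func walBackupKey(storage, partitionID, lsn string) string {
+	return fmt.Sprintf("gitaly-wal-backups/%s/%s/%s.tar", storage, partitionID, lsn)
+}
+```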
+
+## Partition Backup for RAFT
+
+### Core Principle: Leader-Only Backups
+
+When RAFT is enabled, all backup operations are automatically routed to the partition's RAFT leader:
+
+```mermaid
+flowchart TD
+A[BackupPartition request for Partition 2] --> B[Receiving Server]
+B --> H{Is this partition stored<br>in my storage?}
+H -->|No| I[Redirect to correct<br>RAFT group]
+H -->|Yes| C{Is RAFT enabled?}
+C -->|No| D[Execute backup]
+C -->|Yes| E{Am I the leader?}
+E -->|Yes| F[Execute backup]
+E -->|No| G[Forward to leader]
+
+style A fill:#e1f5fe
+style D fill:#c8e6c9
+style F fill:#c8e6c9
+style G fill:#fff9c4
+style I fill:#fff9c4
+```
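+
+The routing decision above could look roughly like the following Go sketch. `raftGroup`, `node`, and their methods are illustrative placeholders, not Gitaly's actual internals.
+
+```go
+// Sketch of the BackupPartition routing decision shown in the flowchart.
+// Types and methods are illustrative placeholders.
+package backup
+
+import (
+	"context"
+	"fmt"
+)
+
+type raftGroup interface {
+	Enabled() bool                  // is RAFT enabled for this partition?
+	IsLeader() bool                 // is this node the group's current leader?
+	LeaderAddress() (string, error) // address of the current leader
+}
+
+type node struct {
+	hasPartition func(partitionID string) bool
+	groupFor     func(partitionID string) raftGroup
+	runBackup    func(ctx context.Context, partitionID string) error
+	forward      func(ctx context.Context, leaderAddr, partitionID string) error
+}
+
+func (n *node) handleBackupPartition(ctx context.Context, partitionID string) error {
+	if !n.hasPartition(partitionID) {
+		// Not stored here: the caller must be redirected to the owning group.
+		return fmt.Errorf("partition %q is not on this storage", partitionID)
+	}
+
+	group := n.groupFor(partitionID)
+	if !group.Enabled() || group.IsLeader() {
+		// Non-RAFT deployments and RAFT leaders execute the backup locally.
+		return n.runBackup(ctx, partitionID)
+	}
+
+	// Followers transparently forward the request to the current leader.
+	leader, err := group.LeaderAddress()
+	if err != nil {
+		return fmt.Errorf("resolve leader: %w", err)
+	}
+	return n.forward(ctx, leader, partitionID)
+}
+```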
+
+### How Would It Work In The Existing Tooling
+
+#### 1. Discovery Phase (Unchanged)
+
+```mermaid
+flowchart TD
+ A[gitaly-backup tool] --> B[Connect to Server A]
+ A --> C[Connect to Server B]
+ A --> D[Connect to Server C]
+
+ B --> E[ListPartitions<br>↓<br>p1, p2, p4]
+ C --> F[ListPartitions<br>↓<br>p1, p2, p3]
+ D --> G[ListPartitions<br>↓<br>p1, p3, p4]
+
+ style A fill:#e3f2fd
+ style E fill:#f3e5f5
+ style F fill:#f3e5f5
+ style G fill:#f3e5f5
+```
+
+#### 2. Execution Phase (With RAFT Routing)
+
+```mermaid
+flowchart TD
+ A[For each discovered partition] --> B[BackupPartition p1<br>to ANY server]
+ A --> C[BackupPartition p2<br>to ANY server]
+ A --> D[BackupPartition p3<br>to ANY server]
+
+ B --> E[Automatically routes<br>to leader<br>e.g., Server A]
+ C --> F[Automatically routes<br>to leader<br>e.g., Server A]
+ D --> G[Automatically routes<br>to leader<br>e.g., Server C]
+
+ style A fill:#fff3e0
+ style B fill:#e8f5e9
+ style C fill:#e8f5e9
+ style D fill:#e8f5e9
+ style E fill:#e3f2fd
+ style F fill:#e3f2fd
+ style G fill:#e3f2fd
+
+ E --> H[Result: Each partition backed up<br>exactly once from its leader]
+ F --> H
+ G --> H
+
+ style H fill:#c8e6c9,stroke:#4caf50,stroke-width:2px
+```
+
+### Key Benefits
+
+1. **Consistency**: Always backs up the latest committed data
+1. **Simplicity**: No need to track which server is the leader
+1. **Deduplication**: The `BackupPartition` call handles deduplication itself: multiple discoveries of the same partition result in a single backup, provided no new WAL entries were produced between calls for that partition
+1. **Transparency**: Works with the existing `gitaly-backup` tool without changes
+
+Deduplication can be improved further if the recovery tooling tracks the list of partitions across the different storages.
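+
+As a sketch of that deduplication rule: a repeated `BackupPartition` call can be skipped when the partition's applied LSN has not advanced past the LSN recorded for its last base backup. The LSN bookkeeping below is an assumption, not existing code.
+
+```go
+// Sketch of the deduplication check: a repeated BackupPartition call becomes
+// a no-op when no WAL entries were produced since the last base backup.
+package backup
+
+// shouldBackup reports whether a new base backup is needed. appliedLSN is the
+// partition's currently applied log sequence number; lastBackupLSN is the LSN
+// captured by the most recent base backup.
+func shouldBackup(appliedLSN, lastBackupLSN uint64) bool {
+	// If nothing was applied since the last backup, the existing archive
+	// already represents the partition's latest committed state.
+	return appliedLSN > lastBackupLSN
+}
+```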
+
+### Alternative Solution
+
+Instead of directly executing backup jobs, the node can create a job file that is persisted in replicated storage (it can be part of the KV storage).
+Such a job is easier to track across nodes, and if leadership changes before a job is fully committed, the new leader can pick it up.
+This also handles deduplication, since we will not create a new job for a partition while a previous job for the same LSN is still incomplete. This approach is
+inspired by [CockroachDB's backup architecture](https://www.cockroachlabs.com/docs/stable/backup-architecture).
+If we decide on this solution, we need to discuss the overall architecture of the job processing, such as:
+
+```yaml
+- Job state machine (pending → running → completed)
+- How to handle stale jobs
+- Cleanup policies for completed jobs
+```
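+
+A hedged sketch of what such a replicated job record could look like, assuming jobs live in the partition's key-value state; none of these names are an agreed-upon schema.
+
+```go
+// Sketch of a replicated backup-job record and its state machine. All names
+// are illustrative assumptions.
+package backup
+
+import "time"
+
+type jobState string
+
+const (
+	jobPending   jobState = "pending"
+	jobRunning   jobState = "running"
+	jobCompleted jobState = "completed"
+)
+
+// backupJob is persisted in replicated storage so that a new leader can pick
+// up work a previous leader started but never completed.
+type backupJob struct {
+	PartitionID string
+	LSN         uint64 // snapshot LSN the backup must capture
+	State       jobState
+	StartedAt   time.Time // used to detect stale jobs after leadership changes
+}
+
+// needsNewJob implements the deduplication rule: no new job is created while
+// a job for the same LSN is still pending or running.
+func needsNewJob(existing *backupJob, currentLSN uint64) bool {
+	if existing == nil {
+		return true
+	}
+	if existing.LSN == currentLSN && existing.State != jobCompleted {
+		return false
+	}
+	return currentLSN > existing.LSN
+}
+```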
+
+## WAL Archiving with RAFT (including auto snapshotting)
+
+- Only RAFT leaders archive WAL entries
+- WAL archiver tracks metrics for backup triggers:
+ - Total WAL size since last base backup
+ - Time elapsed since last base backup
+- When the configured thresholds (time- or size-based) are exceeded, the archiver triggers a base partition backup
+- Leadership changes trigger archiver handoff
+- Ensures single, authoritative WAL sequence
+
+This approach eliminates the need for a separate auto-backup monitor since the WAL archiver already has all the necessary information and runs only on leaders.
+
+```markdown
+┌─────────────────────────────────────────────────────┐
+│ Simplified Flow │
+├─────────────────────────────────────────────────────┤
+│ │
+│ Partition Manager │
+│ │ │
+│ ▼ │
+│ WAL Archiver (Leader Only) │
+│ │ │
+│ ├─── Archive WAL entry. │
+│ │ │
+│ ├─── Track Metrics: │
+│ │ • WAL archived since last backup │
+│ │ • Time since last backup │
+│ │ │
+│ └─── Check Trigger Conditions: │
+│ • WAL size > threshold? │
+│ • Time elapsed > max interval? │
+│ │ │
+│ ▼ (if triggered) │
+│ Trigger Partition Backup │
+│ │
+│ Benefits: │
+│ • Single component handles both WAL and backups │
+│ • Natural checkpoint after X amount of changes │
+│ • No separate monitoring infrastructure │
+│ • Direct correlation between changes and backups │
+│ │
+└─────────────────────────────────────────────────────┘
+```
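+
+The trigger check itself could be as small as the following sketch; the threshold fields are assumed configuration values, not existing Gitaly settings.
+
+```go
+// Sketch of the trigger check the leader's WAL archiver runs after archiving
+// each entry. Thresholds are assumed configuration, not existing settings.
+package backup
+
+import "time"
+
+type archiverState struct {
+	bytesSinceBaseBackup uint64    // WAL bytes archived since the last base backup
+	lastBaseBackup       time.Time // when the last base backup was taken
+}
+
+type triggerConfig struct {
+	maxWALBytes uint64        // size-based threshold
+	maxInterval time.Duration // time-based threshold
+}
+
+// shouldTriggerBaseBackup returns true when either threshold is exceeded,
+// which schedules a new partition base backup.
+func shouldTriggerBaseBackup(s archiverState, cfg triggerConfig, now time.Time) bool {
+	if s.bytesSinceBaseBackup >= cfg.maxWALBytes {
+		return true
+	}
+	return now.Sub(s.lastBaseBackup) >= cfg.maxInterval
+}
+```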
+
+## Recovery Process
+
+### Full Disaster Recovery Strategy
+
+Restore first, then form the cluster.
+
+```markdown
+┌─────────────────────────────────────────────────────┐
+│ Full Disaster Recovery Flow │
+├─────────────────────────────────────────────────────┤
+│ │
+│ 1. Prepare Empty Nodes │
+│ └─► Install Gitaly on new servers │
+│ │
+│ 2. Restore Partitions to Individual Nodes │
+│ ├─► Node 1: Restore partitions 1, 3, 5 │
+│ ├─► Node 2: Restore partitions 2, 4, 6 │
+│ └─► Node 3: Restore partitions 1-6 (subset) │
+│ │
+│ 3. Start Nodes in Recovery Mode │
+│ └─► Nodes start without RAFT enabled │
+│ │
+│ 4. Form RAFT Cluster │
+│ ├─► Initialize cluster topology │
+│ ├─► Assign partition replicas │
+│ └─► Enable RAFT consensus │
+│ │
+│ 5. Replicate Missing Data │
+│ └─► RAFT automatically syncs replicas │
+│ │
+└─────────────────────────────────────────────────────┘
+```
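+
+The ordering above could be orchestrated roughly as in this sketch; every function and type here is a hypothetical placeholder, not an existing Gitaly recovery API.
+
+```go
+// Sketch of the "restore first, then form the cluster" ordering. All names
+// are hypothetical placeholders.
+package recovery
+
+import "context"
+
+type nodePlan struct {
+	Address    string
+	Partitions []string // partitions to restore onto this node before RAFT starts
+}
+
+func recoverCluster(ctx context.Context, plan []nodePlan,
+	restore func(ctx context.Context, node, partition string) error,
+	formCluster func(ctx context.Context, nodes []string) error,
+) error {
+	// Steps 1-3: restore partitions onto individual nodes while RAFT is disabled.
+	nodes := make([]string, 0, len(plan))
+	for _, n := range plan {
+		nodes = append(nodes, n.Address)
+		for _, p := range n.Partitions {
+			if err := restore(ctx, n.Address, p); err != nil {
+				return err
+			}
+		}
+	}
+	// Steps 4-5: only after the data is in place, form the RAFT cluster and
+	// let replication fill in any missing replicas.
+	return formCluster(ctx, nodes)
+}
+```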
+
+This approach requires the operator to know the list of partitions to be restored, which can be derived from the `/gitaly-wal-backups` directory in the backup storage.
+Currently, there is no manifest that holds the entire list of partitions. If we want the partition restore process to be automated with a single restore command, without manual
+partition discovery, we either need a separate manifest file that contains every partition, or we have to programmatically iterate through the `/gitaly-wal-backups` directory
+to figure out all of the partitions that were backed up. Some options:
+
+```yaml
+Option 1: Discovery service
+ - Tool that scans backup storage on-demand
+ - Builds a special manifest dynamically for recovery, which can be fed into the recovery tool
+ - The special recovery manifest makes it possible to restore directly onto the desired nodes
+ - Guarantees an accurate list of partitions that can be restored
+ - Possible with existing go-cloud blob integration
+
+Option 2: Periodic manifest generation
+ - WAL archiver updates a global manifest file
+ - Lists all known partitions and their latest backups
+
+Option 3: Metadata partition
+ - Special partition that tracks all other partitions
+ - Always restored first during recovery
+```
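+
+Option 1 could be built on the existing go-cloud blob integration: walk `gitaly-wal-backups/<storage>/<partition>/` with delimiter listings and collect the partition IDs per storage. The sketch below is illustrative; only the key layout comes from the backup storage section above.
+
+```go
+// Sketch of the Option 1 discovery scan over the backup bucket using
+// gocloud.dev/blob. The manifest shape and function names are assumptions.
+package recovery
+
+import (
+	"context"
+	"io"
+	"strings"
+
+	"gocloud.dev/blob"
+)
+
+// discoverPartitions walks gitaly-wal-backups/<storage>/<partition>/ and
+// returns the partition IDs found for each storage.
+func discoverPartitions(ctx context.Context, bucket *blob.Bucket) (map[string][]string, error) {
+	const root = "gitaly-wal-backups/"
+	found := map[string][]string{}
+
+	storages := bucket.List(&blob.ListOptions{Prefix: root, Delimiter: "/"})
+	for {
+		storage, err := storages.Next(ctx)
+		if err == io.EOF {
+			break
+		}
+		if err != nil {
+			return nil, err
+		}
+		if !storage.IsDir {
+			continue
+		}
+
+		partitions := bucket.List(&blob.ListOptions{Prefix: storage.Key, Delimiter: "/"})
+		for {
+			p, err := partitions.Next(ctx)
+			if err == io.EOF {
+				break
+			}
+			if err != nil {
+				return nil, err
+			}
+			if !p.IsDir {
+				continue
+			}
+			storageName := strings.TrimSuffix(strings.TrimPrefix(storage.Key, root), "/")
+			partitionID := strings.TrimSuffix(strings.TrimPrefix(p.Key, storage.Key), "/")
+			found[storageName] = append(found[storageName], partitionID)
+		}
+	}
+	return found, nil
+}
+```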
+
+Another problem with the current architecture is the backup storage path. We nest backups under the current storage name, which makes them harder to discover during restore.
+We could remove the storage name from the path, but that can affect non-RAFT architectures, as two different storages can contain the same partition ID for different repositories
+(for example, `storage-a/partition-1` and a hypothetical `storage-b/partition-1` would collide under a storage-agnostic path).