
raft: Add replication support (without autocompaction) to Raft manager

#6303 (closed) (and !7553 (merged)) complete the implementation of the Raft manager for single-node clusters. Because the bootstrapping member is also the only member of the Group, there is no support for replication yet; in fact, the initial iteration doesn't even support membership management.

In theory, once the Raft manager follows etcd/raft's life cycle and implements the necessary hooks, the library takes care of the remaining flow. In practice, it would be too optimistic to trust the library blindly. Our WAL implementation in Gitaly is complicated: stateful data is persisted through different means, with a KV DB for hard state, a file-based WAL for log entries, and a special format for snapshotting. It's very likely that something will pop up in between that impacts the whole flow.
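The life cycle in question boils down to an ordering contract: within each processing cycle, hard state and new log entries must be durable before committed entries are applied and the node advances. A minimal self-contained sketch of that ordering, with hypothetical stand-ins for the stores mentioned above (the real code drives etcd/raft's `Node.Ready`/`Advance` instead of a hand-rolled `ready` struct):

```go
package main

import "fmt"

// Hypothetical stand-ins for the separate persistence layers.
type kvStore struct{ term, vote, commit uint64 } // KV DB: hard state
type walStore struct{ entries []string }         // file-based WAL: log entries

// ready mimics one batch handed to us by the library: state and entries
// that must be durable before the node may advance.
type ready struct {
	term, vote, commit uint64
	entries            []string
	committed          []string
}

// handleReady shows the required ordering: persist hard state and new log
// entries BEFORE applying committed entries to the state machine.
func handleReady(kv *kvStore, wal *walStore, sm *[]string, rd ready) {
	kv.term, kv.vote, kv.commit = rd.term, rd.vote, rd.commit // 1. hard state
	wal.entries = append(wal.entries, rd.entries...)          // 2. log entries
	*sm = append(*sm, rd.committed...)                        // 3. apply committed
	// 4. in the real loop: node.Advance()
}

func main() {
	kv, wal, sm := &kvStore{}, &walStore{}, &[]string{}
	handleReady(kv, wal, sm, ready{
		term: 2, vote: 1, commit: 1,
		entries:   []string{"put k=v"},
		committed: []string{"put k=v"},
	})
	fmt.Println(kv.commit, len(wal.entries), (*sm)[0])
}
```

Splitting the durable writes across two stores (KV DB, file WAL) is exactly where "something in between" can break the ordering, which motivates the crash testing below.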

As a result, we need to revisit the implementation of the Raft manager and reinforce its test suite. The suite needs to cover scenarios where a crash occurs at each stage of replication on each node.
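One way to structure such a suite is a table over crash points: run one replication cycle, simulate a crash after each persistence stage, restart, and check recovery invariants. A self-contained toy sketch of that shape (the stage names and `node` type are illustrative, not Gitaly's API); the `afterHardState` case deliberately leaves the commit index ahead of the durable log, which is exactly the kind of inconsistency the suite must detect:

```go
package main

import "fmt"

// crashStage marks where a simulated crash occurs during one cycle.
type crashStage int

const (
	noCrash        crashStage = iota
	afterHardState            // crashed after KV DB write, before WAL append
	afterWAL                  // crashed after WAL append, before apply
)

// node is a toy model: durable state survives a crash, applied state does not.
type node struct {
	hardStateCommit uint64   // durable (KV DB)
	wal             []string // durable (file WAL)
	applied         []string // volatile until re-applied on restart
}

// replicate runs one cycle, optionally "crashing" midway.
func (n *node) replicate(entry string, commit uint64, crash crashStage) {
	n.hardStateCommit = commit
	if crash == afterHardState {
		return
	}
	n.wal = append(n.wal, entry)
	if crash == afterWAL {
		return
	}
	n.applied = append(n.applied, entry)
}

// restart replays WAL entries up to the committed index, as recovery would.
func (n *node) restart() {
	n.applied = nil
	for i, e := range n.wal {
		if uint64(i+1) <= n.hardStateCommit {
			n.applied = append(n.applied, e)
		}
	}
}

func main() {
	for _, stage := range []crashStage{noCrash, afterHardState, afterWAL} {
		n := &node{}
		n.replicate("e1", 1, stage)
		n.restart()
		ok := n.hardStateCommit <= uint64(len(n.wal)) // recovery invariant
		fmt.Printf("stage=%d applied=%d invariant=%v\n", stage, len(n.applied), ok)
	}
}
```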

Some constraints for this deliverable:

  • This issue focuses on the core logic of the Raft manager.
  • Auto-compaction (and snapshotting) is ignored at this stage. It will be covered in #6463.
  • We assert the correctness of replication only; failover is not asserted (covered in #6039).
  • This issue should cover both normal members and learner members.
  • The test suite should cover different membership change events (join, leave, re-join, update, etc.).
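The membership scenarios in the last two constraints map onto conf-change proposals (in etcd/raft terms: add node, add learner, remove node). A toy model of the state transitions the tests should exercise; the types and helpers here are illustrative, not Gitaly's API, and the real flow proposes conf-change entries through the Raft log rather than mutating a map directly:

```go
package main

import "fmt"

// memberRole distinguishes voting members from learners (non-voting).
type memberRole int

const (
	voter memberRole = iota
	learner
)

// group tracks the resulting membership state after conf changes apply.
type group map[uint64]memberRole

func (g group) join(id uint64, role memberRole) { g[id] = role }
func (g group) leave(id uint64)                 { delete(g, id) }
func (g group) promote(id uint64)               { g[id] = voter } // learner -> voter

func main() {
	g := group{1: voter} // bootstrapping member
	g.join(2, learner)   // join as a learner first
	g.promote(2)         // update: promote once caught up
	g.join(3, voter)     // join
	g.leave(3)           // leave
	g.join(3, learner)   // re-join
	fmt.Println(len(g), g[2] == voter, g[3] == learner)
}
```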

Prerequisites:

Edited by Emily Chui