-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raft: fix out-of-order terms #47
base: main
Are you sure you want to change the base?
Conversation
This commit adds a couple invariant checks which were silently ignored and resulted in incorrectly setup tests passing, and incorrect messages exchanged. Property 1: for a MsgApp message m, m.Entries[i].Term >= m.LogTerm. Or, more generally, m.Entries is a slice of contiguous entries with a non-decreasing Term, and m.Index+1 == m.Entries[0].Index && m.LogTerm <= m.Entries[0].Term. This property was broken in a few tests. The root cause was that leader appends out-of-order entries to its own log when it becomeLeader(). This was a result of incorrectly set up tests: they restored to a snapshot at (index=11,term=11), but became leader at an earlier term 1. Then it was sending the following, obviously incorrect, MsgApp: {Index=11,LogTerm=11,Entries={{Index=12,Term=1}}. The fix in tests is either going through a follower state at the correct term, by calling becomeFollower(term, ...), or initializing from a correct HardState in storage. Property 2: For a MsgSnap message m, m.Term >= m.Snapshot.Metadata.Term, because the leader doesn't know of any higher term than its own, and hence can't send a message with such an inversion. This was broken in TestRestoreFromSnapMsg, and is now fixed. Signed-off-by: Pavel Kalinnikov <[email protected]>
9ce72e6
to
0b198cf
Compare
@@ -757,7 +757,10 @@ func (r *raft) reset(term uint64) { | |||
} | |||
|
|||
func (r *raft) appendEntry(es ...pb.Entry) (accepted bool) { | |||
li := r.raftLog.lastIndex() | |||
li, lt := r.raftLog.tip() | |||
if r.Term < lt { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after the first successful appendEntry
lt == r.Term
, because es[i].Term = r.Term
on line 765. So, I don't think this check should be here. It probably should happen at the end of raft.newRaft()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
li := r.raftLog.lastIndex() | ||
li, lt := r.raftLog.tip() | ||
if r.Term < lt { | ||
r.logger.Panicf("%x appending out-of-order term: %d < %d", r.id, r.Term, lt) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you aren't checking Terms in es
, so msg is a bit confusing. Maybe appending out-of-order term
-> last raft log entry has term greater than raft struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about using warning
instead of panic
?
@@ -1366,11 +1366,11 @@ func TestHandleHeartbeat(t *testing.T) { | |||
// TestHandleHeartbeatResp ensures that we re-send log entries when we get a heartbeat response. | |||
func TestHandleHeartbeatResp(t *testing.T) { | |||
storage := newTestMemoryStorage(withPeers(1, 2)) | |||
storage.SetHardState(pb.HardState{Term: 3, Vote: 1, Commit: 3}) | |||
storage.Append([]pb.Entry{{Index: 1, Term: 1}, {Index: 2, Term: 2}, {Index: 3, Term: 3}}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these entries necessary for the test? Maybe both storage.SetHardState
and storage.Append
can be removed to keep things simpler
This commit adds a couple invariant checks which were silently ignored and resulted in incorrectly setup tests passing, and incorrect messages exchanged.
Property 1: for a
MsgApp
messagem
,m.Entries[i].Term >= m.LogTerm
. Or, more generally,m.Entries
is a slice of contiguous entries with a non-decreasingTerm
, andm.Index+1 == m.Entries[0].Index && m.LogTerm <= m.Entries[0].Term
.This property was broken in a few tests. The root cause was that leader appends out-of-order entries to its own log when it
becomeLeader()
. This was a result of incorrectly set up tests: they restored to a snapshot at(index=11,term=11)
, but became leader at an earlier term 1. Then it was sending the following, obviously incorrect,MsgApp
:{Index=11,LogTerm=11,Entries={{Index=12,Term=1}}
.The fix in tests is either going through a follower state at the correct term, by calling
becomeFollower(term, ...)
, or initializing from a correctHardState
in storage.Property 2: For a
MsgSnap
messagem
,m.Term >= m.Snapshot.Metadata.Term
, because the leader doesn't know of any higher term than its own, and hence can't send a message with such an inversion.This was broken in
TestRestoreFromSnapMsg
, and is now fixed.