Whenever an overlay node initiates a ping to a routing table neighbor, it piggybacks a hash of the list of FUSE IDs that it believes it is jointly monitoring with that neighbor. When the neighbor receives this message, if the hash matches its own, the neighbor resets the timers for all the (FUSE ID, neighbor) pairs represented by the hash. There can be more than one timer per FUSE ID because a node may have more than one neighbor in the liveness checking tree. If one of these timers ever fires, the node sends a SoftNotification message to every neighbor in the liveness checking tree for this FUSE group, and then it cleans up the FUSE delegate state for the group. Additionally, if the timer fires on a group member, a repair is initiated.
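The piggybacked hash and the per-(FUSE ID, neighbor) timers can be illustrated with a minimal sketch. It is not our implementation: the class `CheckingState`, its method names, the use of SHA-1, and the explicit `now` and `timeout` arguments are all assumptions made for illustration, since the text does not specify the hash function or the timer mechanism.

```python
import hashlib


def fuse_id_digest(fuse_ids):
    """Hash a collection of FUSE IDs independently of iteration order."""
    h = hashlib.sha1()  # assumption: the text does not name the hash function
    for fuse_id in sorted(fuse_ids):
        h.update(fuse_id.encode("utf-8"))
        h.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") differ
    return h.hexdigest()


class CheckingState:
    """Hypothetical per-node liveness checking state."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.shared = {}     # neighbor -> set of FUSE IDs monitored jointly
        self.deadlines = {}  # (fuse_id, neighbor) -> time at which the timer fires

    def install(self, fuse_id, neighbor, now):
        """Start monitoring fuse_id jointly with neighbor."""
        self.shared.setdefault(neighbor, set()).add(fuse_id)
        self.deadlines[(fuse_id, neighbor)] = now + self.timeout

    def ping_payload(self, neighbor):
        """Digest to piggyback on a ping sent to neighbor."""
        return fuse_id_digest(self.shared.get(neighbor, set()))

    def on_ping(self, neighbor, digest, now):
        """Reset timers if the digest matches; return False on a mismatch."""
        ids = self.shared.get(neighbor, set())
        if digest != fuse_id_digest(ids):
            return False  # mismatch: the caller should reconcile
        for fuse_id in ids:
            self.deadlines[(fuse_id, neighbor)] = now + self.timeout
        return True

    def fired(self, now):
        """(fuse_id, neighbor) pairs whose timers have expired by time now."""
        return sorted(k for k, t in self.deadlines.items() if t <= now)
```

In this sketch a single ping carries one fixed-size digest per neighbor regardless of how many FUSE groups the two nodes share, which is the point of hashing the list rather than sending it.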
If a node receives a non-matching hash of FUSE IDs from a neighbor, both nodes attempt to reconcile the difference by exchanging their lists of live FUSE IDs. If they can communicate, they remove only the liveness checking trees on which they disagree, and the timers are reset on the others. If they cannot communicate, the relevant checking state is removed, and SoftNotification messages are sent.
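The reconciliation step for the case where the two nodes can communicate amounts to a set comparison. The following is a sketch under stated assumptions: the function names, the `deadlines` dictionary keyed by (FUSE ID, neighbor), and the explicit `now` and `timeout` arguments are hypothetical, and the sketch does not model message exchange, SoftNotification delivery, or groups that are still being created.

```python
def reconcile(local_ids, neighbor_ids):
    """Split this node's FUSE IDs into those to keep and those to remove."""
    local_ids = set(local_ids)
    neighbor_ids = set(neighbor_ids)
    keep = local_ids & neighbor_ids    # both nodes agree these trees exist
    remove = local_ids - neighbor_ids  # the neighbor does not know these trees
    return keep, remove


def apply_reconciliation(deadlines, neighbor, local_ids, neighbor_ids, now, timeout):
    """Drop timers for disputed trees and reset timers for agreed ones."""
    keep, remove = reconcile(local_ids, neighbor_ids)
    for fuse_id in remove:
        deadlines.pop((fuse_id, neighbor), None)
    for fuse_id in keep:
        deadlines[(fuse_id, neighbor)] = now + timeout
    return keep, remove
```

Each node runs the same comparison from its own point of view, so a tree known to only one side is removed by that side alone, while trees known to both sides survive with fresh timers.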
During group creation, a race condition exists that can cause hash mismatches: a node that has just received an InstallChecking message may receive a ping from the next hop of the InstallChecking message. We resolve this race condition using a brief grace period. A node only removes a liveness checking tree that its neighbor does not believe exists if that tree has existed for longer than the grace period; in our implementation, this period is 5 seconds.
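The grace period rule can be expressed as a filter over the trees that the neighbor does not recognize. This is a minimal sketch: the function `removable_trees`, the `created_at` mapping from FUSE ID to installation time, and the explicit `now` argument are assumptions for illustration; only the 5-second value comes from our implementation.

```python
GRACE_PERIOD_SECONDS = 5.0  # value used in the implementation described above


def removable_trees(created_at, neighbor_ids, now, grace=GRACE_PERIOD_SECONDS):
    """Return FUSE IDs that may be removed after a hash mismatch.

    created_at maps each locally known FUSE ID to the time its checking
    state was installed (hypothetical bookkeeping, in seconds).
    """
    neighbor_ids = set(neighbor_ids)
    return {
        fuse_id
        for fuse_id, installed in created_at.items()
        if fuse_id not in neighbor_ids   # the neighbor does not believe it exists
        and now - installed > grace      # and it has outlived the grace period
    }
```

A tree installed moments ago is therefore retained even if the neighbor has not yet processed the corresponding InstallChecking message, which gives that message time to arrive before the mismatch is treated as a real disagreement.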