gossipsub v1.1: Functional Extension for Validation Queue Protection

| Lifecycle Stage | Maturity | Status | Latest Revision |
|---|---|---|---|
| 1A | Working Draft | Active | r1, 2020-09-05 |

Authors: @vyzo

Interest Group: @yusefnapora, @raulk, @whyrusleeping, @Stebalien, @daviddias, @protolambda, @djrtwo, @dryajov, @mpetrunic, @AgeManning, @Nashatyrev, @mhchia

See the lifecycle document for context about maturity level and spec status.


Overview

This document specifies an extension to gossipsub v1.1 intended to provide a circuit breaker so that routers can withstand concerted attacks targeting the validation queue with a flood of spam. This extension does not modify the protocol in any way and works in conjunction with the defensive mechanisms of gossipsub v1.1.

Validation Queue Protection

An important aspect of gossipsub is the reliance on validators to signal acceptance of incoming messages from the application to the router. The validation is asynchronous, with a typical implementation strategy that uses a front-end queue and a limit on the number of ongoing validations. This creates a potential target for attacks, as an attacker can overload the queue by brute force, sending spam messages at a very high rate. The effect would be that legitimate messages get dropped by the validation front end, resulting in denial of service.

In order to protect the system from this class of attacks, gossipsub v1.1 incorporates a circuit breaker that sits before the validation queue and can make informed decisions on whether to push a message into the validation queue. This defensive mechanism kicks in when the system detects an elevated rate of dropped messages, and decides whether to accept incoming messages for validation based on the statistical performance of peers sharing the origin IP address. The decision is probabilistic and implements a Random Early Drop (RED) strategy that drops messages with a probability that depends on the acceptance rates for messages from the origin IP. This strategy can neuter attacks on the validation queue, because messages are no longer dropped indiscriminately in a drop-tail fashion.

Random Early Drop Algorithm

The algorithm has two aspects:

  • The decision on whether to trigger RED.
  • The decision on whether to drop a message from an origin IP address.

In order to trigger RED, the circuit breaker maintains the following queue statistics:

  • a decaying counter for the number of message validations.
  • a decaying counter for the number of dropped messages.

The decision on triggering RED is based on the ratio of dropped messages to validations. If the ratio exceeds an application-configured threshold, then the RED algorithm triggers and a decision on whether to accept the message for validation is made based on origin IP statistics. There is also a quiet period, such that if no messages have been dropped for a while, the circuit breaker turns back off.
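The trigger decision can be sketched as follows. This is a minimal illustration, not a reference implementation: the names `QueueStats` and `red_active` are hypothetical, times are plain seconds, and the handling of an empty validation counter (any drop with zero validations counts as exceeding the threshold) is an assumption the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class QueueStats:
    """Global validation-queue statistics (hypothetical structure)."""
    validated: float = 0.0             # decaying counter of message validations
    dropped: float = 0.0               # decaying counter of dropped messages
    last_drop: float = float("-inf")   # time of the most recent drop, in seconds


def red_active(stats, now, threshold=0.33, quiet_interval=60.0):
    """Return True if RED should be applied to incoming messages at time `now`."""
    # Quiet period: with no drops for `quiet_interval` seconds, the breaker is off.
    if now - stats.last_drop > quiet_interval:
        return False
    # Compare dropped/validated against the threshold without dividing,
    # so a zero validation counter does not raise an error (assumption).
    return stats.dropped > threshold * stats.validated
```

With 40 drops against 100 validations and a drop 10 seconds ago, the ratio of 0.40 exceeds the default threshold of 0.33 and RED is active; the same counters 90 seconds after the last drop leave the breaker off.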

In order to make the actual RED decision, the circuit breaker maintains the following statistics per IP:

  • a decaying counter for the number of accepted messages.
  • a decaying counter for the number of duplicate messages, mixed with a weight W_duplicate.
  • a decaying counter for the number of ignored messages, mixed with a weight W_ignored.
  • a decaying counter for the number of rejected messages, mixed with a weight W_rejected.

The router generates a random float r and accepts the message if and only if

r < (1 + accepted) / (1 + accepted + W_duplicate * duplicate + W_ignored * ignored + W_rejected * rejected)

The number of accepted messages is biased by 1 so that a single negative event cannot sinkhole an IP. It also always gives a chance for a message to be accepted, albeit with sharply decreasing probability as negative events accumulate.
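As an illustration of the inequality above, the following sketch computes the acceptance probability and makes the random decision. The function names and the injectable `rng` argument are hypothetical; the default weights are the ones listed under RED Parameters.

```python
import random


def accept_probability(accepted, duplicate, ignored, rejected,
                       w_duplicate=0.125, w_ignored=1.0, w_rejected=16.0):
    """Probability of accepting a message from an IP with these counters."""
    penalty = (w_duplicate * duplicate
               + w_ignored * ignored
               + w_rejected * rejected)
    # The +1 bias keeps the probability strictly positive.
    return (1 + accepted) / (1 + accepted + penalty)


def should_accept(accepted, duplicate, ignored, rejected, rng=random.random):
    """Draw r in [0, 1) and accept if and only if r is below the probability."""
    return rng() < accept_probability(accepted, duplicate, ignored, rejected)
```

For example, an IP with no history is always accepted (probability 1), while an IP with a single rejection and nothing else is accepted with probability 1/17 under the default weights. An IP with 9 accepted messages and 8 duplicates is accepted with probability 10/11.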

All the counters decay linearly with an application-configured decay factor, so that the system adapts to varying network conditions.
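The text does not spell out the update rule for the counters. One reading that is consistent with the defaults given under RED Parameters ("decays to 1% after 2 minutes") is to multiply each counter by a constant coefficient on every tick. The sketch below assumes that reading, and also assumes a one-second tick; both are assumptions, and `decay_coefficient` is a hypothetical helper.

```python
def decay_coefficient(decay_to_seconds, tick_seconds=1.0, target=0.01):
    """Per-tick multiplier such that a counter shrinks to `target`
    (1% by default) of its value after `decay_to_seconds`.

    Assumes the counter is multiplied by the coefficient once per tick.
    """
    ticks = decay_to_seconds / tick_seconds
    return target ** (1.0 / ticks)


def apply_decay(counter, coefficient):
    """One decay tick for a single counter."""
    return counter * coefficient
```

Under these assumptions, `decay_coefficient(120)` would play the role of GlobalDecayCoefficient and `decay_coefficient(3600)` that of SourceDecayCoefficient.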

Also note that per IP statistics are retained for a configured period of time after disconnection, so that an attacker cannot easily clear traces of misbehaviour by disconnecting.
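A minimal sketch of the retention rule, assuming per IP statistics live in a dictionary and disconnection times are tracked separately in seconds. The names `prune_expired`, `disconnected_at`, and `stats_by_ip` are hypothetical.

```python
def prune_expired(disconnected_at, stats_by_ip, now, retention=6 * 3600.0):
    """Forget statistics for IPs disconnected longer than `retention` seconds.

    `disconnected_at` maps an IP to the time its last peer disconnected;
    IPs that are still connected do not appear in it and are never pruned.
    """
    for ip, when in list(disconnected_at.items()):
        if now - when > retention:
            del disconnected_at[ip]
            stats_by_ip.pop(ip, None)
```

An attacker who disconnects and immediately reconnects from the same IP therefore finds the earlier counters still in place, subject only to the normal decay.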

Finally, the circuit breaker should allow the application to configure per topic accepted delivery weights, so that deliveries in priority topics can be given more weight. If a topic is not configured, then its delivery weight is 1.
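One way to apply per topic weights is to add the topic's weight, rather than 1, to the accepted counter on each accepted delivery. The sketch below assumes that interpretation; `record_accepted` and the topic names in the usage note are hypothetical.

```python
def record_accepted(counters, topic, topic_weights):
    """Credit an accepted delivery to an IP's counters.

    Topics without a configured weight default to 1.0.
    Returns the updated accepted counter.
    """
    weight = topic_weights.get(topic, 1.0)
    counters["accepted"] = counters.get("accepted", 0.0) + weight
    return counters["accepted"]
```

With `topic_weights = {"blocks": 4.0}`, a delivery in `blocks` adds 4 to the accepted counter, while a delivery in an unconfigured topic such as `chat` adds 1.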

RED Parameters

The circuit breaker utilizes the following application configured parameters:

| Parameter | Purpose | Default |
|---|---|---|
| ActivationThreshold | dropped to validated message ratio threshold for triggering the circuit breaker | 0.33 |
| GlobalDecayCoefficient | linear decay coefficient for global stats | computed such that the counter decays to 1% after 2 minutes |
| SourceDecayCoefficient | linear decay coefficient for per IP stats | computed such that the counter decays to 1% after 1 hour |
| QuietInterval | interval of no dropped message events before turning off the circuit breaker | 1 minute |
| W_duplicate | counter mixin weight for duplicate messages | 0.125 |
| W_ignore | counter mixin weight for ignored messages | 1.0 |
| W_reject | counter mixin weight for rejected messages | 16.0 |
| RetentionPeriod | duration of stats retention after disconnection | 6 hours |

With the default parameters, we are rapidly penalising rejections, mildly penalising ignored messages, and softly weighting duplicate messages because they occur normally for mesh peers. The result is that clearly misbehaving peers, whose messages lead to outright rejections, will account for a substantial part of the decision to break the circuit, while underperforming peers will also factor in, but with less force.
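To make the relative weighting concrete, the defaults can be collected in a small configuration object. This is a sketch only: the `RedParams` class, its field names, and the choice to express durations in seconds are assumptions made for illustration, and the two decay fields store the time to reach 1% rather than the coefficient itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedParams:
    """Default RED parameters from the table (field names are hypothetical)."""
    activation_threshold: float = 0.33
    global_decay_seconds: float = 120.0      # global counters reach 1% after 2 minutes
    source_decay_seconds: float = 3600.0     # per IP counters reach 1% after 1 hour
    quiet_interval_seconds: float = 60.0
    w_duplicate: float = 0.125
    w_ignore: float = 1.0
    w_reject: float = 16.0
    retention_seconds: float = 6 * 3600.0


def penalty(params, duplicate, ignored, rejected):
    """Weighted sum of negative events, as used in the RED denominator."""
    return (params.w_duplicate * duplicate
            + params.w_ignore * ignored
            + params.w_reject * rejected)
```

With these defaults, a single rejected message contributes the same penalty as 16 ignored messages or 128 duplicates.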