Files
serai/coordinator/src/tributary/scanner.rs

800 lines
28 KiB
Rust
Raw Normal View History

Reattempts (#483) * Schedule re-attempts and add a (not filled out) match statement to actually execute them A comment explains the methodology. To copy it here: """ This is because we *always* re-attempt any protocol which had participation. That doesn't mean we *should* re-attempt this protocol. The alternatives were: 1) Note on-chain we completed a protocol, halting re-attempts upon 34%. 2) Vote on-chain to re-attempt a protocol. This schema doesn't have any additional messages upon the success case (whereas alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and then preprocesses. This only sends preprocesses). """ Any signing protocol which reaches sufficient participation will be re-attempted until it no longer does. * Have the Substrate scanner track DKG removals/completions for the Tributary code * Don't keep trying to publish a participant removal if we've already set keys * Pad out the re-attempt match a bit more * Have CosignEvaluator reload from the DB * Correctly schedule cosign re-attempts * Actuall spawn new DKG removal attempts * Use u32 for Batch ID in SubstrateSignableId, finish Batch re-attempt routing The batch ID was an opaque [u8; 5] which also included the network, yet that's redundant and unhelpful. * Clarify a pair of TODOs in the coordinator * Remove old TODO * Final comment cleanup * Correct usage of TARGET_BLOCK_TIME in reattempt scheduler It's in ms and I assumed it was in s. * Have coordinator tests drop BatchReattempts which aren't relevant yet may exist * Bug fix and pointless oddity removal We scheduled a re-attempt upon receiving 2/3rds of preprocesses and upon receiving 2/3rds of shares, so any signing protocol could cause two re-attempts (not one more). The coordinator tests randomly generated the Batch ID since it was prior an opaque byte array. While that didn't break the test, it was pointless and did make the already-succeeded check before re-attempting impossible to hit. * Add log statements, correct dead-lock in coordinator tests * Increase pessimistic timeout on recv_message to compensate for tighter best-case timeouts * Further bump timeout by a minute AFAICT, GH failed by just a few seconds. This also is worst-case in a single instance, making it fine to be decently long. * Further further bump timeout due to lack of distinct error
2023-12-12 12:28:53 -05:00
use core::{marker::PhantomData, ops::Deref, future::Future, time::Duration};
use std::{sync::Arc, collections::HashSet};
2023-04-20 05:05:17 -04:00
use zeroize::Zeroizing;
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
use ciphersuite::{group::GroupEncoding, Ciphersuite, Ristretto};
use tokio::sync::broadcast;
use scale::{Encode, Decode};
use serai_client::{
primitives::{SeraiAddress, Signature},
validator_sets::primitives::{KeyPair, ValidatorSet},
Serai,
};
2023-04-20 05:05:17 -04:00
use serai_db::DbTxn;
use processor_messages::coordinator::{SubstrateSignId, SubstrateSignableId};
Reattempts (#483) * Schedule re-attempts and add a (not filled out) match statement to actually execute them A comment explains the methodology. To copy it here: """ This is because we *always* re-attempt any protocol which had participation. That doesn't mean we *should* re-attempt this protocol. The alternatives were: 1) Note on-chain we completed a protocol, halting re-attempts upon 34%. 2) Vote on-chain to re-attempt a protocol. This schema doesn't have any additional messages upon the success case (whereas alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and then preprocesses. This only sends preprocesses). """ Any signing protocol which reaches sufficient participation will be re-attempted until it no longer does. * Have the Substrate scanner track DKG removals/completions for the Tributary code * Don't keep trying to publish a participant removal if we've already set keys * Pad out the re-attempt match a bit more * Have CosignEvaluator reload from the DB * Correctly schedule cosign re-attempts * Actuall spawn new DKG removal attempts * Use u32 for Batch ID in SubstrateSignableId, finish Batch re-attempt routing The batch ID was an opaque [u8; 5] which also included the network, yet that's redundant and unhelpful. * Clarify a pair of TODOs in the coordinator * Remove old TODO * Final comment cleanup * Correct usage of TARGET_BLOCK_TIME in reattempt scheduler It's in ms and I assumed it was in s. * Have coordinator tests drop BatchReattempts which aren't relevant yet may exist * Bug fix and pointless oddity removal We scheduled a re-attempt upon receiving 2/3rds of preprocesses and upon receiving 2/3rds of shares, so any signing protocol could cause two re-attempts (not one more). The coordinator tests randomly generated the Batch ID since it was prior an opaque byte array. While that didn't break the test, it was pointless and did make the already-succeeded check before re-attempting impossible to hit. * Add log statements, correct dead-lock in coordinator tests * Increase pessimistic timeout on recv_message to compensate for tighter best-case timeouts * Further bump timeout by a minute AFAICT, GH failed by just a few seconds. This also is worst-case in a single instance, making it fine to be decently long. * Further further bump timeout due to lack of distinct error
2023-12-12 12:28:53 -05:00
Slash malevolent validators (#294) * add slash tx * ignore unsigned tx replays * verify that provided evidence is valid * fix clippy + fmt * move application tx handling to another module * partially handle the tendermint txs * fix pr comments * support unsigned app txs * add slash target to the votes * enforce provided, unsigned, signed tx ordering within a block * bug fixes * add unit test for tendermint txs * bug fixes * update tests for tendermint txs * add tx ordering test * tidy up tx ordering test * cargo +nightly fmt * Misc fixes from rebasing * Finish resolving clippy * Remove sha3 from tendermint-machine * Resolve a DoS in SlashEvidence's read Also moves Evidence from Vec<Message> to (Message, Option<Message>). That should meet all requirements while being a bit safer. * Make lazy_static a dev-depend for tributary * Various small tweaks One use of sort was inefficient, sorting unsigned || signed when unsigned was already properly sorted. Given how the unsigned TXs were given a nonce of 0, an unstable sort may swap places with an unsigned TX and a signed TX with a nonce of 0 (leading to a faulty block). The extra protection added here sorts signed, then concats. * Fix Tributary tests I broke, start review on tendermint/tx.rs * Finish reviewing everything outside tests and empty_signature * Remove empty_signature empty_signature led to corrupted local state histories. Unfortunately, the API is only sane with a signature. We now use the actual signature, which risks creating a signature over a malicious message if we have ever have an invariant producing malicious messages. Prior, we only signed the message after the local machine confirmed it was okay per the local view of consensus. This is tolerated/preferred over a corrupt state history since production of such messages is already an invariant. TODOs are added to make handling of this theoretical invariant further robust. * Remove async_sequential for tokio::test There was no competition for resources forcing them to be run sequentially. * Modify block order test to be statistically significant without multiple runs * Clean tests --------- Co-authored-by: Luke Parker <lukeparker5132@gmail.com>
2023-08-21 07:28:23 +03:00
use tributary::{
TransactionKind, Transaction as TributaryTransaction, TransactionError, Block, TributaryReader,
Slash malevolent validators (#294) * add slash tx * ignore unsigned tx replays * verify that provided evidence is valid * fix clippy + fmt * move application tx handling to another module * partially handle the tendermint txs * fix pr comments * support unsigned app txs * add slash target to the votes * enforce provided, unsigned, signed tx ordering within a block * bug fixes * add unit test for tendermint txs * bug fixes * update tests for tendermint txs * add tx ordering test * tidy up tx ordering test * cargo +nightly fmt * Misc fixes from rebasing * Finish resolving clippy * Remove sha3 from tendermint-machine * Resolve a DoS in SlashEvidence's read Also moves Evidence from Vec<Message> to (Message, Option<Message>). That should meet all requirements while being a bit safer. * Make lazy_static a dev-depend for tributary * Various small tweaks One use of sort was inefficient, sorting unsigned || signed when unsigned was already properly sorted. Given how the unsigned TXs were given a nonce of 0, an unstable sort may swap places with an unsigned TX and a signed TX with a nonce of 0 (leading to a faulty block). The extra protection added here sorts signed, then concats. * Fix Tributary tests I broke, start review on tendermint/tx.rs * Finish reviewing everything outside tests and empty_signature * Remove empty_signature empty_signature led to corrupted local state histories. Unfortunately, the API is only sane with a signature. We now use the actual signature, which risks creating a signature over a malicious message if we have ever have an invariant producing malicious messages. Prior, we only signed the message after the local machine confirmed it was okay per the local view of consensus. This is tolerated/preferred over a corrupt state history since production of such messages is already an invariant. TODOs are added to make handling of this theoretical invariant further robust. * Remove async_sequential for tokio::test There was no competition for resources forcing them to be run sequentially. * Modify block order test to be statistically significant without multiple runs * Clean tests --------- Co-authored-by: Luke Parker <lukeparker5132@gmail.com>
2023-08-21 07:28:23 +03:00
tendermint::{
tx::{TendermintTx, Evidence, decode_signed_message},
Slash malevolent validators (#294) * add slash tx * ignore unsigned tx replays * verify that provided evidence is valid * fix clippy + fmt * move application tx handling to another module * partially handle the tendermint txs * fix pr comments * support unsigned app txs * add slash target to the votes * enforce provided, unsigned, signed tx ordering within a block * bug fixes * add unit test for tendermint txs * bug fixes * update tests for tendermint txs * add tx ordering test * tidy up tx ordering test * cargo +nightly fmt * Misc fixes from rebasing * Finish resolving clippy * Remove sha3 from tendermint-machine * Resolve a DoS in SlashEvidence's read Also moves Evidence from Vec<Message> to (Message, Option<Message>). That should meet all requirements while being a bit safer. * Make lazy_static a dev-depend for tributary * Various small tweaks One use of sort was inefficient, sorting unsigned || signed when unsigned was already properly sorted. Given how the unsigned TXs were given a nonce of 0, an unstable sort may swap places with an unsigned TX and a signed TX with a nonce of 0 (leading to a faulty block). The extra protection added here sorts signed, then concats. * Fix Tributary tests I broke, start review on tendermint/tx.rs * Finish reviewing everything outside tests and empty_signature * Remove empty_signature empty_signature led to corrupted local state histories. Unfortunately, the API is only sane with a signature. We now use the actual signature, which risks creating a signature over a malicious message if we have ever have an invariant producing malicious messages. Prior, we only signed the message after the local machine confirmed it was okay per the local view of consensus. This is tolerated/preferred over a corrupt state history since production of such messages is already an invariant. TODOs are added to make handling of this theoretical invariant further robust. * Remove async_sequential for tokio::test There was no competition for resources forcing them to be run sequentially. * Modify block order test to be statistically significant without multiple runs * Clean tests --------- Co-authored-by: Luke Parker <lukeparker5132@gmail.com>
2023-08-21 07:28:23 +03:00
TendermintNetwork,
},
2023-04-20 05:05:17 -04:00
};
use crate::{Db, processors::Processors, substrate::BatchInstructionsHashDb, tributary::*, P2p};
2023-04-20 05:05:17 -04:00
#[derive(Clone, Copy, PartialEq, Eq, Debug, Encode, Decode)]
pub enum RecognizedIdType {
Batch,
Plan,
}
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
#[async_trait::async_trait]
pub trait RIDTrait {
async fn recognized_id(
&self,
set: ValidatorSet,
genesis: [u8; 32],
kind: RecognizedIdType,
id: Vec<u8>,
);
}
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
#[async_trait::async_trait]
impl<
FRid: Send + Future<Output = ()>,
F: Sync + Fn(ValidatorSet, [u8; 32], RecognizedIdType, Vec<u8>) -> FRid,
> RIDTrait for F
{
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
async fn recognized_id(
&self,
set: ValidatorSet,
genesis: [u8; 32],
kind: RecognizedIdType,
id: Vec<u8>,
) {
(self)(set, genesis, kind, id).await
}
}
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
#[async_trait::async_trait]
pub trait PublishSeraiTransaction {
async fn publish_set_keys(
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
&self,
db: &(impl Sync + Get),
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
set: ValidatorSet,
removed: Vec<SeraiAddress>,
key_pair: KeyPair,
signature: Signature,
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
);
}
mod impl_pst_for_serai {
use super::*;
use serai_client::SeraiValidatorSets;
// Uses a macro because Rust can't resolve the lifetimes/generics around the check function
// check is expected to return true if the effect has already occurred
// The generated publish function will return true if *we* published the transaction
macro_rules! common_pst {
($Meta: ty, $check: ident) => {
async fn publish(
serai: &Serai,
db: &impl Get,
set: ValidatorSet,
tx: serai_client::Transaction,
meta: $Meta,
) -> bool {
loop {
match serai.publish(&tx).await {
Ok(_) => return true,
// This is assumed to be some ephemeral error due to the assumed fault-free
// creation
// TODO2: Differentiate connection errors from invariants
Err(e) => {
// The following block is irrelevant, and can/likely will fail, if we're publishing
// a TX for an old session
// If we're on a newer session, move on
if crate::RetiredTributaryDb::get(db, set).is_some() {
log::warn!("trying to publish a TX relevant to set {set:?} which isn't the latest");
return false;
}
if let Ok(serai) = serai.as_of_latest_finalized_block().await {
let serai = serai.validator_sets();
// Check if someone else published the TX in question
if $check(serai, set, meta).await {
return false;
}
}
log::error!("couldn't connect to Serai node to publish TX: {e:?}");
tokio::time::sleep(core::time::Duration::from_secs(5)).await;
}
}
}
}
};
}
#[async_trait::async_trait]
impl PublishSeraiTransaction for Serai {
async fn publish_set_keys(
&self,
db: &(impl Sync + Get),
set: ValidatorSet,
removed: Vec<SeraiAddress>,
key_pair: KeyPair,
signature: Signature,
) {
let tx = SeraiValidatorSets::set_keys(set.network, removed, key_pair, signature);
2023-12-16 20:54:24 -05:00
async fn check(serai: SeraiValidatorSets<'_>, set: ValidatorSet, (): ()) -> bool {
if matches!(serai.keys(set).await, Ok(Some(_))) {
log::info!("another coordinator set key pair for {:?}", set);
return true;
}
false
}
common_pst!((), check);
if publish(self, db, set, tx, ()).await {
log::info!("published set keys for {set:?}");
}
}
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
}
}
#[async_trait::async_trait]
pub trait PTTTrait {
async fn publish_tributary_tx(&self, tx: Transaction);
}
#[async_trait::async_trait]
impl<FPtt: Send + Future<Output = ()>, F: Sync + Fn(Transaction) -> FPtt> PTTTrait for F {
async fn publish_tributary_tx(&self, tx: Transaction) {
(self)(tx).await
}
}
pub struct TributaryBlockHandler<
'a,
D: Db,
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
T: DbTxn,
Pro: Processors,
PST: PublishSeraiTransaction,
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
PTT: PTTTrait,
RID: RIDTrait,
Slash malevolent validators (#294) * add slash tx * ignore unsigned tx replays * verify that provided evidence is valid * fix clippy + fmt * move application tx handling to another module * partially handle the tendermint txs * fix pr comments * support unsigned app txs * add slash target to the votes * enforce provided, unsigned, signed tx ordering within a block * bug fixes * add unit test for tendermint txs * bug fixes * update tests for tendermint txs * add tx ordering test * tidy up tx ordering test * cargo +nightly fmt * Misc fixes from rebasing * Finish resolving clippy * Remove sha3 from tendermint-machine * Resolve a DoS in SlashEvidence's read Also moves Evidence from Vec<Message> to (Message, Option<Message>). That should meet all requirements while being a bit safer. * Make lazy_static a dev-depend for tributary * Various small tweaks One use of sort was inefficient, sorting unsigned || signed when unsigned was already properly sorted. Given how the unsigned TXs were given a nonce of 0, an unstable sort may swap places with an unsigned TX and a signed TX with a nonce of 0 (leading to a faulty block). The extra protection added here sorts signed, then concats. * Fix Tributary tests I broke, start review on tendermint/tx.rs * Finish reviewing everything outside tests and empty_signature * Remove empty_signature empty_signature led to corrupted local state histories. Unfortunately, the API is only sane with a signature. We now use the actual signature, which risks creating a signature over a malicious message if we have ever have an invariant producing malicious messages. Prior, we only signed the message after the local machine confirmed it was okay per the local view of consensus. This is tolerated/preferred over a corrupt state history since production of such messages is already an invariant. TODOs are added to make handling of this theoretical invariant further robust. * Remove async_sequential for tokio::test There was no competition for resources forcing them to be run sequentially. * Modify block order test to be statistically significant without multiple runs * Clean tests --------- Co-authored-by: Luke Parker <lukeparker5132@gmail.com>
2023-08-21 07:28:23 +03:00
P: P2p,
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
> {
pub db: &'a D,
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
pub txn: &'a mut T,
pub our_key: &'a Zeroizing<<Ristretto as Ciphersuite>::F>,
pub recognized_id: &'a RID,
pub processors: &'a Pro,
pub publish_serai_tx: &'a PST,
pub publish_tributary_tx: &'a PTT,
pub spec: &'a TributarySpec,
2023-04-20 05:05:17 -04:00
block: Block<Transaction>,
Reattempts (#483) * Schedule re-attempts and add a (not filled out) match statement to actually execute them A comment explains the methodology. To copy it here: """ This is because we *always* re-attempt any protocol which had participation. That doesn't mean we *should* re-attempt this protocol. The alternatives were: 1) Note on-chain we completed a protocol, halting re-attempts upon 34%. 2) Vote on-chain to re-attempt a protocol. This schema doesn't have any additional messages upon the success case (whereas alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and then preprocesses. This only sends preprocesses). """ Any signing protocol which reaches sufficient participation will be re-attempted until it no longer does. * Have the Substrate scanner track DKG removals/completions for the Tributary code * Don't keep trying to publish a participant removal if we've already set keys * Pad out the re-attempt match a bit more * Have CosignEvaluator reload from the DB * Correctly schedule cosign re-attempts * Actuall spawn new DKG removal attempts * Use u32 for Batch ID in SubstrateSignableId, finish Batch re-attempt routing The batch ID was an opaque [u8; 5] which also included the network, yet that's redundant and unhelpful. * Clarify a pair of TODOs in the coordinator * Remove old TODO * Final comment cleanup * Correct usage of TARGET_BLOCK_TIME in reattempt scheduler It's in ms and I assumed it was in s. * Have coordinator tests drop BatchReattempts which aren't relevant yet may exist * Bug fix and pointless oddity removal We scheduled a re-attempt upon receiving 2/3rds of preprocesses and upon receiving 2/3rds of shares, so any signing protocol could cause two re-attempts (not one more). The coordinator tests randomly generated the Batch ID since it was prior an opaque byte array. While that didn't break the test, it was pointless and did make the already-succeeded check before re-attempting impossible to hit. * Add log statements, correct dead-lock in coordinator tests * Increase pessimistic timeout on recv_message to compensate for tighter best-case timeouts * Further bump timeout by a minute AFAICT, GH failed by just a few seconds. This also is worst-case in a single instance, making it fine to be decently long. * Further further bump timeout due to lack of distinct error
2023-12-12 12:28:53 -05:00
pub block_number: u32,
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
_p2p: PhantomData<P>,
}
impl<
D: Db,
T: DbTxn,
Pro: Processors,
PST: PublishSeraiTransaction,
PTT: PTTTrait,
RID: RIDTrait,
P: P2p,
> TributaryBlockHandler<'_, D, T, Pro, PST, PTT, RID, P>
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
{
pub fn fatal_slash(&mut self, slashing: [u8; 32], reason: &str) {
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
let genesis = self.spec.genesis();
log::warn!("fatally slashing {}. reason: {}", hex::encode(slashing), reason);
FatallySlashed::set_fatally_slashed(self.txn, genesis, slashing);
// TODO: disconnect the node from network/ban from further participation in all Tributaries
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
}
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
// TODO: Once Substrate confirms a key, we need to rotate our validator set OR form a second
// Tributary post-DKG
// https://github.com/serai-dex/serai/issues/426
async fn handle(mut self) {
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
log::info!("found block for Tributary {:?}", self.spec.set());
let transactions = self.block.transactions.clone();
for tx in transactions {
match tx {
TributaryTransaction::Tendermint(TendermintTx::SlashEvidence(ev)) => {
// Since the evidence is on the chain, it should already have been validated
// We can just punish the signer
let data = match ev {
Evidence::ConflictingMessages(first, second) |
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
Evidence::ConflictingPrecommit(first, second) => (first, Some(second)),
Evidence::InvalidPrecommit(first) | Evidence::InvalidValidRound(first) => (first, None),
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
};
let msgs = (
decode_signed_message::<TendermintNetwork<D, Transaction, P>>(&data.0).unwrap(),
if data.1.is_some() {
Some(
decode_signed_message::<TendermintNetwork<D, Transaction, P>>(&data.1.unwrap())
.unwrap(),
)
} else {
None
},
);
// Since anything with evidence is fundamentally faulty behavior, not just temporal
// errors, mark the node as fatally slashed
self.fatal_slash(msgs.0.msg.sender, &format!("invalid tendermint messages: {msgs:?}"));
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
}
TributaryTransaction::Application(tx) => {
self.handle_application_tx(tx).await;
}
}
}
Reattempts (#483) * Schedule re-attempts and add a (not filled out) match statement to actually execute them A comment explains the methodology. To copy it here: """ This is because we *always* re-attempt any protocol which had participation. That doesn't mean we *should* re-attempt this protocol. The alternatives were: 1) Note on-chain we completed a protocol, halting re-attempts upon 34%. 2) Vote on-chain to re-attempt a protocol. This schema doesn't have any additional messages upon the success case (whereas alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and then preprocesses. This only sends preprocesses). """ Any signing protocol which reaches sufficient participation will be re-attempted until it no longer does. * Have the Substrate scanner track DKG removals/completions for the Tributary code * Don't keep trying to publish a participant removal if we've already set keys * Pad out the re-attempt match a bit more * Have CosignEvaluator reload from the DB * Correctly schedule cosign re-attempts * Actuall spawn new DKG removal attempts * Use u32 for Batch ID in SubstrateSignableId, finish Batch re-attempt routing The batch ID was an opaque [u8; 5] which also included the network, yet that's redundant and unhelpful. * Clarify a pair of TODOs in the coordinator * Remove old TODO * Final comment cleanup * Correct usage of TARGET_BLOCK_TIME in reattempt scheduler It's in ms and I assumed it was in s. * Have coordinator tests drop BatchReattempts which aren't relevant yet may exist * Bug fix and pointless oddity removal We scheduled a re-attempt upon receiving 2/3rds of preprocesses and upon receiving 2/3rds of shares, so any signing protocol could cause two re-attempts (not one more). The coordinator tests randomly generated the Batch ID since it was prior an opaque byte array. While that didn't break the test, it was pointless and did make the already-succeeded check before re-attempting impossible to hit. * Add log statements, correct dead-lock in coordinator tests * Increase pessimistic timeout on recv_message to compensate for tighter best-case timeouts * Further bump timeout by a minute AFAICT, GH failed by just a few seconds. This also is worst-case in a single instance, making it fine to be decently long. * Further further bump timeout due to lack of distinct error
2023-12-12 12:28:53 -05:00
let genesis = self.spec.genesis();
let current_fatal_slashes = FatalSlashes::get_as_keys(self.txn, genesis);
// Calculate the shares still present, spinning if not enough are
// still_present_shares is used by a below branch, yet it's a natural byproduct of checking if
// we should spin, hence storing it in a variable here
let still_present_shares = {
// Start with the original n value
let mut present_shares = self.spec.n(&[]);
// Remove everyone fatally slashed
for removed in &current_fatal_slashes {
let original_i_for_removed =
self.spec.i(&[], *removed).expect("removed party was never present");
let removed_shares =
u16::from(original_i_for_removed.end) - u16::from(original_i_for_removed.start);
present_shares -= removed_shares;
}
// Spin if the present shares don't satisfy the required threshold
if present_shares < self.spec.t() {
loop {
log::error!(
"fatally slashed so many participants for {:?} we no longer meet the threshold",
self.spec.set()
);
tokio::time::sleep(core::time::Duration::from_secs(60)).await;
}
}
present_shares
};
Reattempts (#483) * Schedule re-attempts and add a (not filled out) match statement to actually execute them A comment explains the methodology. To copy it here: """ This is because we *always* re-attempt any protocol which had participation. That doesn't mean we *should* re-attempt this protocol. The alternatives were: 1) Note on-chain we completed a protocol, halting re-attempts upon 34%. 2) Vote on-chain to re-attempt a protocol. This schema doesn't have any additional messages upon the success case (whereas alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and then preprocesses. This only sends preprocesses). """ Any signing protocol which reaches sufficient participation will be re-attempted until it no longer does. * Have the Substrate scanner track DKG removals/completions for the Tributary code * Don't keep trying to publish a participant removal if we've already set keys * Pad out the re-attempt match a bit more * Have CosignEvaluator reload from the DB * Correctly schedule cosign re-attempts * Actuall spawn new DKG removal attempts * Use u32 for Batch ID in SubstrateSignableId, finish Batch re-attempt routing The batch ID was an opaque [u8; 5] which also included the network, yet that's redundant and unhelpful. * Clarify a pair of TODOs in the coordinator * Remove old TODO * Final comment cleanup * Correct usage of TARGET_BLOCK_TIME in reattempt scheduler It's in ms and I assumed it was in s. * Have coordinator tests drop BatchReattempts which aren't relevant yet may exist * Bug fix and pointless oddity removal We scheduled a re-attempt upon receiving 2/3rds of preprocesses and upon receiving 2/3rds of shares, so any signing protocol could cause two re-attempts (not one more). The coordinator tests randomly generated the Batch ID since it was prior an opaque byte array. While that didn't break the test, it was pointless and did make the already-succeeded check before re-attempting impossible to hit. * Add log statements, correct dead-lock in coordinator tests * Increase pessimistic timeout on recv_message to compensate for tighter best-case timeouts * Further bump timeout by a minute AFAICT, GH failed by just a few seconds. This also is worst-case in a single instance, making it fine to be decently long. * Further further bump timeout due to lack of distinct error
2023-12-12 12:28:53 -05:00
for topic in ReattemptDb::take(self.txn, genesis, self.block_number) {
let attempt = AttemptDb::start_next_attempt(self.txn, genesis, topic);
log::info!("re-attempting {topic:?} with attempt {attempt}");
// Slash people who failed to participate as expected in the prior attempt
{
let prior_attempt = attempt - 1;
let (removed, expected_participants) = match topic {
Topic::Dkg => {
// Every validator who wasn't removed is expected to have participated
let removed =
crate::tributary::removed_as_of_dkg_attempt(self.txn, genesis, prior_attempt)
.expect("prior attempt didn't have its removed saved to disk");
let removed_set = removed.iter().copied().collect::<HashSet<_>>();
(
removed,
self
.spec
.validators()
.into_iter()
.filter_map(|(validator, _)| {
Some(validator).filter(|validator| !removed_set.contains(validator))
})
.collect(),
)
}
Topic::DkgConfirmation => {
panic!("TODO: re-attempting DkgConfirmation when we should be re-attempting the Dkg")
}
Topic::SubstrateSign(_) | Topic::Sign(_) => {
let removed =
crate::tributary::removed_as_of_set_keys(self.txn, self.spec.set(), genesis)
.expect("SubstrateSign/Sign yet have yet to set keys");
// TODO: If 67% sent preprocesses, this should be them. Else, this should be vec![]
let expected_participants = vec![];
(removed, expected_participants)
}
};
let (expected_topic, expected_label) = match topic {
Topic::Dkg => {
let n = self.spec.n(&removed);
// If we got all the DKG shares, we should be on DKG confirmation
let share_spec =
DataSpecification { topic: Topic::Dkg, label: Label::Share, attempt: prior_attempt };
if DataReceived::get(self.txn, genesis, &share_spec).unwrap_or(0) == n {
// Label::Share since there is no Label::Preprocess for DkgConfirmation since the
// preprocess is part of Topic::Dkg Label::Share
(Topic::DkgConfirmation, Label::Share)
} else {
let preprocess_spec = DataSpecification {
topic: Topic::Dkg,
label: Label::Preprocess,
attempt: prior_attempt,
};
// If we got all the DKG preprocesses, DKG shares
if DataReceived::get(self.txn, genesis, &preprocess_spec).unwrap_or(0) == n {
// Label::Share since there is no Label::Preprocess for DkgConfirmation since the
// preprocess is part of Topic::Dkg Label::Share
(Topic::Dkg, Label::Share)
} else {
(Topic::Dkg, Label::Preprocess)
}
}
}
Topic::DkgConfirmation => unreachable!(),
// If we got enough participants to move forward, then we expect shares from them all
Topic::SubstrateSign(_) | Topic::Sign(_) => (topic, Label::Share),
};
let mut did_not_participate = vec![];
for expected_participant in expected_participants {
if DataDb::get(
self.txn,
genesis,
&DataSpecification {
topic: expected_topic,
label: expected_label,
attempt: prior_attempt,
},
&expected_participant.to_bytes(),
)
.is_none()
{
did_not_participate.push(expected_participant);
}
}
// If a supermajority didn't participate as expected, the protocol was likely aborted due
// to detection of a completion or some larger networking error
// Accordingly, clear did_not_participate
// TODO
// If during the DKG, explicitly mark these people as having been offline
// TODO: If they were offline sufficiently long ago, don't strike them off
if topic == Topic::Dkg {
let mut existing = OfflineDuringDkg::get(self.txn, genesis).unwrap_or(vec![]);
for did_not_participate in did_not_participate {
existing.push(did_not_participate.to_bytes());
}
OfflineDuringDkg::set(self.txn, genesis, &existing);
}
// Slash everyone who didn't participate as expected
// This may be overzealous as if a minority detects a completion, they'll abort yet the
// supermajority will cause the above allowance to not trigger, causing an honest minority
// to be slashed
// At the end of the protocol, the accumulated slashes are reduced by the amount obtained
// by the worst-performing member of the supermajority, and this is expected to
// sufficiently compensate for slashes which occur under normal operation
// TODO
}
Reattempts (#483) * Schedule re-attempts and add a (not filled out) match statement to actually execute them A comment explains the methodology. To copy it here: """ This is because we *always* re-attempt any protocol which had participation. That doesn't mean we *should* re-attempt this protocol. The alternatives were: 1) Note on-chain we completed a protocol, halting re-attempts upon 34%. 2) Vote on-chain to re-attempt a protocol. This schema doesn't have any additional messages upon the success case (whereas alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and then preprocesses. This only sends preprocesses). """ Any signing protocol which reaches sufficient participation will be re-attempted until it no longer does. * Have the Substrate scanner track DKG removals/completions for the Tributary code * Don't keep trying to publish a participant removal if we've already set keys * Pad out the re-attempt match a bit more * Have CosignEvaluator reload from the DB * Correctly schedule cosign re-attempts * Actuall spawn new DKG removal attempts * Use u32 for Batch ID in SubstrateSignableId, finish Batch re-attempt routing The batch ID was an opaque [u8; 5] which also included the network, yet that's redundant and unhelpful. * Clarify a pair of TODOs in the coordinator * Remove old TODO * Final comment cleanup * Correct usage of TARGET_BLOCK_TIME in reattempt scheduler It's in ms and I assumed it was in s. * Have coordinator tests drop BatchReattempts which aren't relevant yet may exist * Bug fix and pointless oddity removal We scheduled a re-attempt upon receiving 2/3rds of preprocesses and upon receiving 2/3rds of shares, so any signing protocol could cause two re-attempts (not one more). The coordinator tests randomly generated the Batch ID since it was prior an opaque byte array. While that didn't break the test, it was pointless and did make the already-succeeded check before re-attempting impossible to hit. * Add log statements, correct dead-lock in coordinator tests * Increase pessimistic timeout on recv_message to compensate for tighter best-case timeouts * Further bump timeout by a minute AFAICT, GH failed by just a few seconds. This also is worst-case in a single instance, making it fine to be decently long. * Further further bump timeout due to lack of distinct error
2023-12-12 12:28:53 -05:00
/*
All of these have the same common flow:
1) Check if this re-attempt is actually needed
2) If so, dispatch whatever events as needed
This is because we *always* re-attempt any protocol which had participation. That doesn't
mean we *should* re-attempt this protocol.
The alternatives were:
1) Note on-chain we completed a protocol, halting re-attempts upon 34%.
2) Vote on-chain to re-attempt a protocol.
This schema doesn't have any additional messages upon the success case (whereas
alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and
then preprocesses. This only sends preprocesses).
*/
match topic {
Topic::Dkg => {
let mut removed = current_fatal_slashes.clone();
let t = self.spec.t();
{
let mut present_shares = still_present_shares;
// Load the parties marked as offline across the various attempts
let mut offline = OfflineDuringDkg::get(self.txn, genesis)
.unwrap_or(vec![])
.iter()
.map(|key| <Ristretto as Ciphersuite>::G::from_bytes(key).unwrap())
.collect::<Vec<_>>();
// Pop from the list to prioritize the removal of those recently offline
while let Some(offline) = offline.pop() {
// Make sure they weren't removed already (such as due to being fatally slashed)
// This also may trigger if they were offline across multiple attempts
if removed.contains(&offline) {
continue;
}
// If we can remove them and still meet the threshold, do so
let original_i_for_offline =
self.spec.i(&[], offline).expect("offline was never present?");
let offline_shares =
u16::from(original_i_for_offline.end) - u16::from(original_i_for_offline.start);
if (present_shares - offline_shares) >= t {
present_shares -= offline_shares;
removed.push(offline);
}
// If we've removed as many people as we can, break
if present_shares == t {
break;
}
}
}
RemovedAsOfDkgAttempt::set(
self.txn,
genesis,
attempt,
&removed.iter().map(<Ristretto as Ciphersuite>::G::to_bytes).collect(),
);
if DkgLocallyCompleted::get(self.txn, genesis).is_none() {
let Some(our_i) = self.spec.i(&removed, Ristretto::generator() * self.our_key.deref())
else {
continue;
};
Reattempts (#483) * Schedule re-attempts and add a (not filled out) match statement to actually execute them A comment explains the methodology. To copy it here: """ This is because we *always* re-attempt any protocol which had participation. That doesn't mean we *should* re-attempt this protocol. The alternatives were: 1) Note on-chain we completed a protocol, halting re-attempts upon 34%. 2) Vote on-chain to re-attempt a protocol. This schema doesn't have any additional messages upon the success case (whereas alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and then preprocesses. This only sends preprocesses). """ Any signing protocol which reaches sufficient participation will be re-attempted until it no longer does. * Have the Substrate scanner track DKG removals/completions for the Tributary code * Don't keep trying to publish a participant removal if we've already set keys * Pad out the re-attempt match a bit more * Have CosignEvaluator reload from the DB * Correctly schedule cosign re-attempts * Actuall spawn new DKG removal attempts * Use u32 for Batch ID in SubstrateSignableId, finish Batch re-attempt routing The batch ID was an opaque [u8; 5] which also included the network, yet that's redundant and unhelpful. * Clarify a pair of TODOs in the coordinator * Remove old TODO * Final comment cleanup * Correct usage of TARGET_BLOCK_TIME in reattempt scheduler It's in ms and I assumed it was in s. * Have coordinator tests drop BatchReattempts which aren't relevant yet may exist * Bug fix and pointless oddity removal We scheduled a re-attempt upon receiving 2/3rds of preprocesses and upon receiving 2/3rds of shares, so any signing protocol could cause two re-attempts (not one more). The coordinator tests randomly generated the Batch ID since it was prior an opaque byte array. While that didn't break the test, it was pointless and did make the already-succeeded check before re-attempting impossible to hit. * Add log statements, correct dead-lock in coordinator tests * Increase pessimistic timeout on recv_message to compensate for tighter best-case timeouts * Further bump timeout by a minute AFAICT, GH failed by just a few seconds. This also is worst-case in a single instance, making it fine to be decently long. * Further further bump timeout due to lack of distinct error
2023-12-12 12:28:53 -05:00
// Since it wasn't completed, instruct the processor to start the next attempt
let id =
processor_messages::key_gen::KeyGenId { session: self.spec.set().session, attempt };
let params =
frost::ThresholdParams::new(t, self.spec.n(&removed), our_i.start).unwrap();
Reattempts (#483) * Schedule re-attempts and add a (not filled out) match statement to actually execute them A comment explains the methodology. To copy it here: """ This is because we *always* re-attempt any protocol which had participation. That doesn't mean we *should* re-attempt this protocol. The alternatives were: 1) Note on-chain we completed a protocol, halting re-attempts upon 34%. 2) Vote on-chain to re-attempt a protocol. This schema doesn't have any additional messages upon the success case (whereas alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and then preprocesses. This only sends preprocesses). """ Any signing protocol which reaches sufficient participation will be re-attempted until it no longer does. * Have the Substrate scanner track DKG removals/completions for the Tributary code * Don't keep trying to publish a participant removal if we've already set keys * Pad out the re-attempt match a bit more * Have CosignEvaluator reload from the DB * Correctly schedule cosign re-attempts * Actuall spawn new DKG removal attempts * Use u32 for Batch ID in SubstrateSignableId, finish Batch re-attempt routing The batch ID was an opaque [u8; 5] which also included the network, yet that's redundant and unhelpful. * Clarify a pair of TODOs in the coordinator * Remove old TODO * Final comment cleanup * Correct usage of TARGET_BLOCK_TIME in reattempt scheduler It's in ms and I assumed it was in s. * Have coordinator tests drop BatchReattempts which aren't relevant yet may exist * Bug fix and pointless oddity removal We scheduled a re-attempt upon receiving 2/3rds of preprocesses and upon receiving 2/3rds of shares, so any signing protocol could cause two re-attempts (not one more). The coordinator tests randomly generated the Batch ID since it was prior an opaque byte array. While that didn't break the test, it was pointless and did make the already-succeeded check before re-attempting impossible to hit. * Add log statements, correct dead-lock in coordinator tests * Increase pessimistic timeout on recv_message to compensate for tighter best-case timeouts * Further bump timeout by a minute AFAICT, GH failed by just a few seconds. This also is worst-case in a single instance, making it fine to be decently long. * Further further bump timeout due to lack of distinct error
2023-12-12 12:28:53 -05:00
let shares = u16::from(our_i.end) - u16::from(our_i.start);
self
.processors
.send(
self.spec.set().network,
processor_messages::key_gen::CoordinatorMessage::GenerateKey { id, params, shares },
)
.await;
}
}
Topic::DkgConfirmation => unreachable!(),
Reattempts (#483) * Schedule re-attempts and add a (not filled out) match statement to actually execute them A comment explains the methodology. To copy it here: """ This is because we *always* re-attempt any protocol which had participation. That doesn't mean we *should* re-attempt this protocol. The alternatives were: 1) Note on-chain we completed a protocol, halting re-attempts upon 34%. 2) Vote on-chain to re-attempt a protocol. This schema doesn't have any additional messages upon the success case (whereas alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and then preprocesses. This only sends preprocesses). """ Any signing protocol which reaches sufficient participation will be re-attempted until it no longer does. * Have the Substrate scanner track DKG removals/completions for the Tributary code * Don't keep trying to publish a participant removal if we've already set keys * Pad out the re-attempt match a bit more * Have CosignEvaluator reload from the DB * Correctly schedule cosign re-attempts * Actuall spawn new DKG removal attempts * Use u32 for Batch ID in SubstrateSignableId, finish Batch re-attempt routing The batch ID was an opaque [u8; 5] which also included the network, yet that's redundant and unhelpful. * Clarify a pair of TODOs in the coordinator * Remove old TODO * Final comment cleanup * Correct usage of TARGET_BLOCK_TIME in reattempt scheduler It's in ms and I assumed it was in s. * Have coordinator tests drop BatchReattempts which aren't relevant yet may exist * Bug fix and pointless oddity removal We scheduled a re-attempt upon receiving 2/3rds of preprocesses and upon receiving 2/3rds of shares, so any signing protocol could cause two re-attempts (not one more). The coordinator tests randomly generated the Batch ID since it was prior an opaque byte array. While that didn't break the test, it was pointless and did make the already-succeeded check before re-attempting impossible to hit. * Add log statements, correct dead-lock in coordinator tests * Increase pessimistic timeout on recv_message to compensate for tighter best-case timeouts * Further bump timeout by a minute AFAICT, GH failed by just a few seconds. This also is worst-case in a single instance, making it fine to be decently long. * Further further bump timeout due to lack of distinct error
2023-12-12 12:28:53 -05:00
Topic::SubstrateSign(inner_id) => {
let id = processor_messages::coordinator::SubstrateSignId {
session: self.spec.set().session,
id: inner_id,
attempt,
};
match inner_id {
SubstrateSignableId::CosigningSubstrateBlock(block) => {
let block_number = SeraiBlockNumber::get(self.txn, block)
.expect("couldn't get the block number for prior attempted cosign");
// Check if the cosigner has a signature from our set for this block/a newer one
let latest_cosign =
crate::cosign_evaluator::LatestCosign::get(self.txn, self.spec.set().network)
.map_or(0, |cosign| cosign.block_number);
Reattempts (#483) * Schedule re-attempts and add a (not filled out) match statement to actually execute them A comment explains the methodology. To copy it here: """ This is because we *always* re-attempt any protocol which had participation. That doesn't mean we *should* re-attempt this protocol. The alternatives were: 1) Note on-chain we completed a protocol, halting re-attempts upon 34%. 2) Vote on-chain to re-attempt a protocol. This schema doesn't have any additional messages upon the success case (whereas alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and then preprocesses. This only sends preprocesses). """ Any signing protocol which reaches sufficient participation will be re-attempted until it no longer does. * Have the Substrate scanner track DKG removals/completions for the Tributary code * Don't keep trying to publish a participant removal if we've already set keys * Pad out the re-attempt match a bit more * Have CosignEvaluator reload from the DB * Correctly schedule cosign re-attempts * Actuall spawn new DKG removal attempts * Use u32 for Batch ID in SubstrateSignableId, finish Batch re-attempt routing The batch ID was an opaque [u8; 5] which also included the network, yet that's redundant and unhelpful. * Clarify a pair of TODOs in the coordinator * Remove old TODO * Final comment cleanup * Correct usage of TARGET_BLOCK_TIME in reattempt scheduler It's in ms and I assumed it was in s. * Have coordinator tests drop BatchReattempts which aren't relevant yet may exist * Bug fix and pointless oddity removal We scheduled a re-attempt upon receiving 2/3rds of preprocesses and upon receiving 2/3rds of shares, so any signing protocol could cause two re-attempts (not one more). The coordinator tests randomly generated the Batch ID since it was prior an opaque byte array. While that didn't break the test, it was pointless and did make the already-succeeded check before re-attempting impossible to hit. * Add log statements, correct dead-lock in coordinator tests * Increase pessimistic timeout on recv_message to compensate for tighter best-case timeouts * Further bump timeout by a minute AFAICT, GH failed by just a few seconds. This also is worst-case in a single instance, making it fine to be decently long. * Further further bump timeout due to lack of distinct error
2023-12-12 12:28:53 -05:00
if latest_cosign < block_number {
// Instruct the processor to start the next attempt
self
.processors
.send(
self.spec.set().network,
processor_messages::coordinator::CoordinatorMessage::CosignSubstrateBlock {
id,
block_number,
},
)
.await;
}
}
SubstrateSignableId::Batch(batch) => {
// If the Batch hasn't appeared on-chain...
if BatchInstructionsHashDb::get(self.txn, self.spec.set().network, batch).is_none() {
// Instruct the processor to start the next attempt
// The processor won't continue if it's already signed a Batch
// Prior checking if the Batch is on-chain just may reduce the non-participating
// 33% from publishing their re-attempt messages
self
.processors
.send(
self.spec.set().network,
processor_messages::coordinator::CoordinatorMessage::BatchReattempt { id },
)
.await;
}
}
SubstrateSignableId::SlashReport => {
// If this Tributary hasn't been retired...
// (published SlashReport/took too long to do so)
if crate::RetiredTributaryDb::get(self.txn, self.spec.set()).is_none() {
let report = SlashReport::get(self.txn, self.spec.set())
.expect("re-attempting signing a SlashReport we don't have?");
self
.processors
.send(
self.spec.set().network,
processor_messages::coordinator::CoordinatorMessage::SignSlashReport {
id,
report,
},
)
.await;
}
}
Reattempts (#483) * Schedule re-attempts and add a (not filled out) match statement to actually execute them A comment explains the methodology. To copy it here: """ This is because we *always* re-attempt any protocol which had participation. That doesn't mean we *should* re-attempt this protocol. The alternatives were: 1) Note on-chain we completed a protocol, halting re-attempts upon 34%. 2) Vote on-chain to re-attempt a protocol. This schema doesn't have any additional messages upon the success case (whereas alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and then preprocesses. This only sends preprocesses). """ Any signing protocol which reaches sufficient participation will be re-attempted until it no longer does. * Have the Substrate scanner track DKG removals/completions for the Tributary code * Don't keep trying to publish a participant removal if we've already set keys * Pad out the re-attempt match a bit more * Have CosignEvaluator reload from the DB * Correctly schedule cosign re-attempts * Actuall spawn new DKG removal attempts * Use u32 for Batch ID in SubstrateSignableId, finish Batch re-attempt routing The batch ID was an opaque [u8; 5] which also included the network, yet that's redundant and unhelpful. * Clarify a pair of TODOs in the coordinator * Remove old TODO * Final comment cleanup * Correct usage of TARGET_BLOCK_TIME in reattempt scheduler It's in ms and I assumed it was in s. * Have coordinator tests drop BatchReattempts which aren't relevant yet may exist * Bug fix and pointless oddity removal We scheduled a re-attempt upon receiving 2/3rds of preprocesses and upon receiving 2/3rds of shares, so any signing protocol could cause two re-attempts (not one more). The coordinator tests randomly generated the Batch ID since it was prior an opaque byte array. While that didn't break the test, it was pointless and did make the already-succeeded check before re-attempting impossible to hit. * Add log statements, correct dead-lock in coordinator tests * Increase pessimistic timeout on recv_message to compensate for tighter best-case timeouts * Further bump timeout by a minute AFAICT, GH failed by just a few seconds. This also is worst-case in a single instance, making it fine to be decently long. * Further further bump timeout due to lack of distinct error
2023-12-12 12:28:53 -05:00
}
}
Topic::Sign(id) => {
// Instruct the processor to start the next attempt
// If it has already noted a completion, it won't send a preprocess and will simply drop
// the re-attempt message
self
.processors
.send(
self.spec.set().network,
processor_messages::sign::CoordinatorMessage::Reattempt {
id: processor_messages::sign::SignId {
session: self.spec.set().session,
id,
attempt,
},
},
)
.await;
}
}
}
if Some(u64::from(self.block_number)) == SlashReportCutOff::get(self.txn, genesis) {
// Grab every slash report
let mut all_reports = vec![];
for (i, (validator, _)) in self.spec.validators().into_iter().enumerate() {
let Some(mut report) = SlashReports::get(self.txn, genesis, validator.to_bytes()) else {
continue;
};
// Assign them 0 points for themselves
report.insert(i, 0);
// Uses &[] as we only need the length which is independent to who else was removed
let signer_i = self.spec.i(&[], validator).unwrap();
let signer_len = u16::from(signer_i.end) - u16::from(signer_i.start);
// Push `n` copies, one for each of their shares
for _ in 0 .. signer_len {
all_reports.push(report.clone());
}
}
// For each participant, grab their median
let mut medians = vec![];
for p in 0 .. self.spec.validators().len() {
let mut median_calc = vec![];
for report in &all_reports {
median_calc.push(report[p]);
}
median_calc.sort_unstable();
medians.push(median_calc[median_calc.len() / 2]);
}
// Grab the points of the last party within the best-performing threshold
// This is done by first expanding the point values by the amount of shares
let mut sorted_medians = vec![];
for (i, (_, shares)) in self.spec.validators().into_iter().enumerate() {
for _ in 0 .. shares {
sorted_medians.push(medians[i]);
}
}
// Then performing the sort
sorted_medians.sort_unstable();
let worst_points_by_party_within_threshold = sorted_medians[usize::from(self.spec.t()) - 1];
// Reduce everyone's points by this value
for median in &mut medians {
*median = median.saturating_sub(worst_points_by_party_within_threshold);
}
// The threshold now has the proper incentive to report this as they no longer suffer
// negative effects
//
// Additionally, if all validators had degraded performance, they don't all get penalized for
// what's likely outside their control (as it occurred universally)
// Mark everyone fatally slashed with u32::MAX
for (i, (validator, _)) in self.spec.validators().into_iter().enumerate() {
if FatallySlashed::get(self.txn, genesis, validator.to_bytes()).is_some() {
medians[i] = u32::MAX;
}
}
let mut report = vec![];
for (i, (validator, _)) in self.spec.validators().into_iter().enumerate() {
if medians[i] != 0 {
report.push((validator.to_bytes(), medians[i]));
}
}
// This does lock in the report, meaning further slash point accumulations won't be reported
// They still have value to be locally tracked due to local decisions made based off
// accumulated slash reports
SlashReport::set(self.txn, self.spec.set(), &report);
// Start a signing protocol for this
self
.processors
.send(
self.spec.set().network,
processor_messages::coordinator::CoordinatorMessage::SignSlashReport {
id: SubstrateSignId {
session: self.spec.set().session,
id: SubstrateSignableId::SlashReport,
attempt: 0,
},
report,
},
)
.await;
}
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
}
2023-04-20 05:05:17 -04:00
}
#[allow(clippy::too_many_arguments)]
pub(crate) async fn handle_new_blocks<
D: Db,
Pro: Processors,
PST: PublishSeraiTransaction,
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
PTT: PTTTrait,
RID: RIDTrait,
Slash malevolent validators (#294) * add slash tx * ignore unsigned tx replays * verify that provided evidence is valid * fix clippy + fmt * move application tx handling to another module * partially handle the tendermint txs * fix pr comments * support unsigned app txs * add slash target to the votes * enforce provided, unsigned, signed tx ordering within a block * bug fixes * add unit test for tendermint txs * bug fixes * update tests for tendermint txs * add tx ordering test * tidy up tx ordering test * cargo +nightly fmt * Misc fixes from rebasing * Finish resolving clippy * Remove sha3 from tendermint-machine * Resolve a DoS in SlashEvidence's read Also moves Evidence from Vec<Message> to (Message, Option<Message>). That should meet all requirements while being a bit safer. * Make lazy_static a dev-depend for tributary * Various small tweaks One use of sort was inefficient, sorting unsigned || signed when unsigned was already properly sorted. Given how the unsigned TXs were given a nonce of 0, an unstable sort may swap places with an unsigned TX and a signed TX with a nonce of 0 (leading to a faulty block). The extra protection added here sorts signed, then concats. * Fix Tributary tests I broke, start review on tendermint/tx.rs * Finish reviewing everything outside tests and empty_signature * Remove empty_signature empty_signature led to corrupted local state histories. Unfortunately, the API is only sane with a signature. We now use the actual signature, which risks creating a signature over a malicious message if we have ever have an invariant producing malicious messages. Prior, we only signed the message after the local machine confirmed it was okay per the local view of consensus. This is tolerated/preferred over a corrupt state history since production of such messages is already an invariant. TODOs are added to make handling of this theoretical invariant further robust. * Remove async_sequential for tokio::test There was no competition for resources forcing them to be run sequentially. * Modify block order test to be statistically significant without multiple runs * Clean tests --------- Co-authored-by: Luke Parker <lukeparker5132@gmail.com>
2023-08-21 07:28:23 +03:00
P: P2p,
>(
db: &mut D,
2023-04-20 05:05:17 -04:00
key: &Zeroizing<<Ristretto as Ciphersuite>::F>,
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
recognized_id: &RID,
processors: &Pro,
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
publish_serai_tx: &PST,
publish_tributary_tx: &PTT,
2023-04-20 05:05:17 -04:00
spec: &TributarySpec,
tributary: &TributaryReader<D, Transaction>,
2023-04-20 05:05:17 -04:00
) {
let genesis = tributary.genesis();
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
let mut last_block = LastHandledBlock::get(db, genesis).unwrap_or(genesis);
Reattempts (#483) * Schedule re-attempts and add a (not filled out) match statement to actually execute them A comment explains the methodology. To copy it here: """ This is because we *always* re-attempt any protocol which had participation. That doesn't mean we *should* re-attempt this protocol. The alternatives were: 1) Note on-chain we completed a protocol, halting re-attempts upon 34%. 2) Vote on-chain to re-attempt a protocol. This schema doesn't have any additional messages upon the success case (whereas alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and then preprocesses. This only sends preprocesses). """ Any signing protocol which reaches sufficient participation will be re-attempted until it no longer does. * Have the Substrate scanner track DKG removals/completions for the Tributary code * Don't keep trying to publish a participant removal if we've already set keys * Pad out the re-attempt match a bit more * Have CosignEvaluator reload from the DB * Correctly schedule cosign re-attempts * Actuall spawn new DKG removal attempts * Use u32 for Batch ID in SubstrateSignableId, finish Batch re-attempt routing The batch ID was an opaque [u8; 5] which also included the network, yet that's redundant and unhelpful. * Clarify a pair of TODOs in the coordinator * Remove old TODO * Final comment cleanup * Correct usage of TARGET_BLOCK_TIME in reattempt scheduler It's in ms and I assumed it was in s. * Have coordinator tests drop BatchReattempts which aren't relevant yet may exist * Bug fix and pointless oddity removal We scheduled a re-attempt upon receiving 2/3rds of preprocesses and upon receiving 2/3rds of shares, so any signing protocol could cause two re-attempts (not one more). The coordinator tests randomly generated the Batch ID since it was prior an opaque byte array. While that didn't break the test, it was pointless and did make the already-succeeded check before re-attempting impossible to hit. * Add log statements, correct dead-lock in coordinator tests * Increase pessimistic timeout on recv_message to compensate for tighter best-case timeouts * Further bump timeout by a minute AFAICT, GH failed by just a few seconds. This also is worst-case in a single instance, making it fine to be decently long. * Further further bump timeout due to lack of distinct error
2023-12-12 12:28:53 -05:00
let mut block_number = TributaryBlockNumber::get(db, last_block).unwrap_or(0);
while let Some(next) = tributary.block_after(&last_block) {
let block = tributary.block(&next).unwrap();
Reattempts (#483) * Schedule re-attempts and add a (not filled out) match statement to actually execute them A comment explains the methodology. To copy it here: """ This is because we *always* re-attempt any protocol which had participation. That doesn't mean we *should* re-attempt this protocol. The alternatives were: 1) Note on-chain we completed a protocol, halting re-attempts upon 34%. 2) Vote on-chain to re-attempt a protocol. This schema doesn't have any additional messages upon the success case (whereas alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and then preprocesses. This only sends preprocesses). """ Any signing protocol which reaches sufficient participation will be re-attempted until it no longer does. * Have the Substrate scanner track DKG removals/completions for the Tributary code * Don't keep trying to publish a participant removal if we've already set keys * Pad out the re-attempt match a bit more * Have CosignEvaluator reload from the DB * Correctly schedule cosign re-attempts * Actuall spawn new DKG removal attempts * Use u32 for Batch ID in SubstrateSignableId, finish Batch re-attempt routing The batch ID was an opaque [u8; 5] which also included the network, yet that's redundant and unhelpful. * Clarify a pair of TODOs in the coordinator * Remove old TODO * Final comment cleanup * Correct usage of TARGET_BLOCK_TIME in reattempt scheduler It's in ms and I assumed it was in s. * Have coordinator tests drop BatchReattempts which aren't relevant yet may exist * Bug fix and pointless oddity removal We scheduled a re-attempt upon receiving 2/3rds of preprocesses and upon receiving 2/3rds of shares, so any signing protocol could cause two re-attempts (not one more). The coordinator tests randomly generated the Batch ID since it was prior an opaque byte array. While that didn't break the test, it was pointless and did make the already-succeeded check before re-attempting impossible to hit. * Add log statements, correct dead-lock in coordinator tests * Increase pessimistic timeout on recv_message to compensate for tighter best-case timeouts * Further bump timeout by a minute AFAICT, GH failed by just a few seconds. This also is worst-case in a single instance, making it fine to be decently long. * Further further bump timeout due to lack of distinct error
2023-12-12 12:28:53 -05:00
block_number += 1;
// Make sure we have all of the provided transactions for this block
for tx in &block.transactions {
// Provided TXs will appear first in the Block, so we can break after we hit a non-Provided
let TransactionKind::Provided(order) = tx.kind() else {
break;
};
// make sure we have all the provided txs in this block locally
if !tributary.locally_provided_txs_in_block(&block.hash(), order) {
return;
}
}
let mut db_clone = db.clone();
let mut txn = db_clone.txn();
Reattempts (#483) * Schedule re-attempts and add a (not filled out) match statement to actually execute them A comment explains the methodology. To copy it here: """ This is because we *always* re-attempt any protocol which had participation. That doesn't mean we *should* re-attempt this protocol. The alternatives were: 1) Note on-chain we completed a protocol, halting re-attempts upon 34%. 2) Vote on-chain to re-attempt a protocol. This schema doesn't have any additional messages upon the success case (whereas alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and then preprocesses. This only sends preprocesses). """ Any signing protocol which reaches sufficient participation will be re-attempted until it no longer does. * Have the Substrate scanner track DKG removals/completions for the Tributary code * Don't keep trying to publish a participant removal if we've already set keys * Pad out the re-attempt match a bit more * Have CosignEvaluator reload from the DB * Correctly schedule cosign re-attempts * Actuall spawn new DKG removal attempts * Use u32 for Batch ID in SubstrateSignableId, finish Batch re-attempt routing The batch ID was an opaque [u8; 5] which also included the network, yet that's redundant and unhelpful. * Clarify a pair of TODOs in the coordinator * Remove old TODO * Final comment cleanup * Correct usage of TARGET_BLOCK_TIME in reattempt scheduler It's in ms and I assumed it was in s. * Have coordinator tests drop BatchReattempts which aren't relevant yet may exist * Bug fix and pointless oddity removal We scheduled a re-attempt upon receiving 2/3rds of preprocesses and upon receiving 2/3rds of shares, so any signing protocol could cause two re-attempts (not one more). The coordinator tests randomly generated the Batch ID since it was prior an opaque byte array. While that didn't break the test, it was pointless and did make the already-succeeded check before re-attempting impossible to hit. * Add log statements, correct dead-lock in coordinator tests * Increase pessimistic timeout on recv_message to compensate for tighter best-case timeouts * Further bump timeout by a minute AFAICT, GH failed by just a few seconds. This also is worst-case in a single instance, making it fine to be decently long. * Further further bump timeout due to lack of distinct error
2023-12-12 12:28:53 -05:00
TributaryBlockNumber::set(&mut txn, next, &block_number);
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
(TributaryBlockHandler {
db,
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
txn: &mut txn,
spec,
our_key: key,
recognized_id,
Slash malevolent validators (#294) * add slash tx * ignore unsigned tx replays * verify that provided evidence is valid * fix clippy + fmt * move application tx handling to another module * partially handle the tendermint txs * fix pr comments * support unsigned app txs * add slash target to the votes * enforce provided, unsigned, signed tx ordering within a block * bug fixes * add unit test for tendermint txs * bug fixes * update tests for tendermint txs * add tx ordering test * tidy up tx ordering test * cargo +nightly fmt * Misc fixes from rebasing * Finish resolving clippy * Remove sha3 from tendermint-machine * Resolve a DoS in SlashEvidence's read Also moves Evidence from Vec<Message> to (Message, Option<Message>). That should meet all requirements while being a bit safer. * Make lazy_static a dev-depend for tributary * Various small tweaks One use of sort was inefficient, sorting unsigned || signed when unsigned was already properly sorted. Given how the unsigned TXs were given a nonce of 0, an unstable sort may swap places with an unsigned TX and a signed TX with a nonce of 0 (leading to a faulty block). The extra protection added here sorts signed, then concats. * Fix Tributary tests I broke, start review on tendermint/tx.rs * Finish reviewing everything outside tests and empty_signature * Remove empty_signature empty_signature led to corrupted local state histories. Unfortunately, the API is only sane with a signature. We now use the actual signature, which risks creating a signature over a malicious message if we have ever have an invariant producing malicious messages. Prior, we only signed the message after the local machine confirmed it was okay per the local view of consensus. This is tolerated/preferred over a corrupt state history since production of such messages is already an invariant. TODOs are added to make handling of this theoretical invariant further robust. * Remove async_sequential for tokio::test There was no competition for resources forcing them to be run sequentially. * Modify block order test to be statistically significant without multiple runs * Clean tests --------- Co-authored-by: Luke Parker <lukeparker5132@gmail.com>
2023-08-21 07:28:23 +03:00
processors,
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
publish_serai_tx,
publish_tributary_tx,
Slash malevolent validators (#294) * add slash tx * ignore unsigned tx replays * verify that provided evidence is valid * fix clippy + fmt * move application tx handling to another module * partially handle the tendermint txs * fix pr comments * support unsigned app txs * add slash target to the votes * enforce provided, unsigned, signed tx ordering within a block * bug fixes * add unit test for tendermint txs * bug fixes * update tests for tendermint txs * add tx ordering test * tidy up tx ordering test * cargo +nightly fmt * Misc fixes from rebasing * Finish resolving clippy * Remove sha3 from tendermint-machine * Resolve a DoS in SlashEvidence's read Also moves Evidence from Vec<Message> to (Message, Option<Message>). That should meet all requirements while being a bit safer. * Make lazy_static a dev-depend for tributary * Various small tweaks One use of sort was inefficient, sorting unsigned || signed when unsigned was already properly sorted. Given how the unsigned TXs were given a nonce of 0, an unstable sort may swap places with an unsigned TX and a signed TX with a nonce of 0 (leading to a faulty block). The extra protection added here sorts signed, then concats. * Fix Tributary tests I broke, start review on tendermint/tx.rs * Finish reviewing everything outside tests and empty_signature * Remove empty_signature empty_signature led to corrupted local state histories. Unfortunately, the API is only sane with a signature. We now use the actual signature, which risks creating a signature over a malicious message if we have ever have an invariant producing malicious messages. Prior, we only signed the message after the local machine confirmed it was okay per the local view of consensus. This is tolerated/preferred over a corrupt state history since production of such messages is already an invariant. TODOs are added to make handling of this theoretical invariant further robust. * Remove async_sequential for tokio::test There was no competition for resources forcing them to be run sequentially. * Modify block order test to be statistically significant without multiple runs * Clean tests --------- Co-authored-by: Luke Parker <lukeparker5132@gmail.com>
2023-08-21 07:28:23 +03:00
block,
Reattempts (#483) * Schedule re-attempts and add a (not filled out) match statement to actually execute them A comment explains the methodology. To copy it here: """ This is because we *always* re-attempt any protocol which had participation. That doesn't mean we *should* re-attempt this protocol. The alternatives were: 1) Note on-chain we completed a protocol, halting re-attempts upon 34%. 2) Vote on-chain to re-attempt a protocol. This schema doesn't have any additional messages upon the success case (whereas alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and then preprocesses. This only sends preprocesses). """ Any signing protocol which reaches sufficient participation will be re-attempted until it no longer does. * Have the Substrate scanner track DKG removals/completions for the Tributary code * Don't keep trying to publish a participant removal if we've already set keys * Pad out the re-attempt match a bit more * Have CosignEvaluator reload from the DB * Correctly schedule cosign re-attempts * Actuall spawn new DKG removal attempts * Use u32 for Batch ID in SubstrateSignableId, finish Batch re-attempt routing The batch ID was an opaque [u8; 5] which also included the network, yet that's redundant and unhelpful. * Clarify a pair of TODOs in the coordinator * Remove old TODO * Final comment cleanup * Correct usage of TARGET_BLOCK_TIME in reattempt scheduler It's in ms and I assumed it was in s. * Have coordinator tests drop BatchReattempts which aren't relevant yet may exist * Bug fix and pointless oddity removal We scheduled a re-attempt upon receiving 2/3rds of preprocesses and upon receiving 2/3rds of shares, so any signing protocol could cause two re-attempts (not one more). The coordinator tests randomly generated the Batch ID since it was prior an opaque byte array. While that didn't break the test, it was pointless and did make the already-succeeded check before re-attempting impossible to hit. * Add log statements, correct dead-lock in coordinator tests * Increase pessimistic timeout on recv_message to compensate for tighter best-case timeouts * Further bump timeout by a minute AFAICT, GH failed by just a few seconds. This also is worst-case in a single instance, making it fine to be decently long. * Further further bump timeout due to lack of distinct error
2023-12-12 12:28:53 -05:00
block_number,
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
_p2p: PhantomData::<P>,
})
.handle()
Slash malevolent validators (#294) * add slash tx * ignore unsigned tx replays * verify that provided evidence is valid * fix clippy + fmt * move application tx handling to another module * partially handle the tendermint txs * fix pr comments * support unsigned app txs * add slash target to the votes * enforce provided, unsigned, signed tx ordering within a block * bug fixes * add unit test for tendermint txs * bug fixes * update tests for tendermint txs * add tx ordering test * tidy up tx ordering test * cargo +nightly fmt * Misc fixes from rebasing * Finish resolving clippy * Remove sha3 from tendermint-machine * Resolve a DoS in SlashEvidence's read Also moves Evidence from Vec<Message> to (Message, Option<Message>). That should meet all requirements while being a bit safer. * Make lazy_static a dev-depend for tributary * Various small tweaks One use of sort was inefficient, sorting unsigned || signed when unsigned was already properly sorted. Given how the unsigned TXs were given a nonce of 0, an unstable sort may swap places with an unsigned TX and a signed TX with a nonce of 0 (leading to a faulty block). The extra protection added here sorts signed, then concats. * Fix Tributary tests I broke, start review on tendermint/tx.rs * Finish reviewing everything outside tests and empty_signature * Remove empty_signature empty_signature led to corrupted local state histories. Unfortunately, the API is only sane with a signature. We now use the actual signature, which risks creating a signature over a malicious message if we have ever have an invariant producing malicious messages. Prior, we only signed the message after the local machine confirmed it was okay per the local view of consensus. This is tolerated/preferred over a corrupt state history since production of such messages is already an invariant. TODOs are added to make handling of this theoretical invariant further robust. * Remove async_sequential for tokio::test There was no competition for resources forcing them to be run sequentially. * Modify block order test to be statistically significant without multiple runs * Clean tests --------- Co-authored-by: Luke Parker <lukeparker5132@gmail.com>
2023-08-21 07:28:23 +03:00
.await;
last_block = next;
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
LastHandledBlock::set(&mut txn, genesis, &next);
txn.commit();
2023-04-20 05:05:17 -04:00
}
}
pub(crate) async fn scan_tributaries_task<
D: Db,
Pro: Processors,
P: P2p,
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
RID: 'static + Send + Sync + Clone + RIDTrait,
>(
raw_db: D,
key: Zeroizing<<Ristretto as Ciphersuite>::F>,
recognized_id: RID,
processors: Pro,
serai: Arc<Serai>,
mut tributary_event: broadcast::Receiver<crate::TributaryEvent<D, P>>,
) {
log::info!("scanning tributaries");
loop {
match tributary_event.recv().await {
Ok(crate::TributaryEvent::NewTributary(crate::ActiveTributary { spec, tributary })) => {
// For each Tributary, spawn a dedicated scanner task
tokio::spawn({
let raw_db = raw_db.clone();
let key = key.clone();
let recognized_id = recognized_id.clone();
let processors = processors.clone();
let serai = serai.clone();
async move {
let spec = &spec;
let reader = tributary.reader();
let mut tributary_db = raw_db.clone();
loop {
2023-10-14 19:55:14 -04:00
// Check if the set was retired, and if so, don't further operate
if crate::db::RetiredTributaryDb::get(&raw_db, spec.set()).is_some() {
2023-10-14 19:55:14 -04:00
break;
}
// Obtain the next block notification now to prevent obtaining it immediately after
// the next block occurs
let next_block_notification = tributary.next_block_notification().await;
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
handle_new_blocks::<_, _, _, _, _, P>(
&mut tributary_db,
&key,
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
&recognized_id,
&processors,
&*serai,
Coordinator Cleanup (#481) * Move logic for evaluating if a cosign should occur to its own file Cleans it up and makes it more robust. * Have expected_next_batch return an error instead of retrying While convenient to offer an error-free implementation, it potentially caused very long lived lock acquisitions in handle_processor_message. * Unify and clean DkgConfirmer and DkgRemoval Does so via adding a new file for the common code, SigningProtocol. Modifies from_cache to return the preprocess with the machine, as there's no reason not to. Also removes an unused Result around the type. Clarifies the security around deterministic nonces, removing them for saved-to-disk cached preprocesses. The cached preprocesses are encrypted as the DB is not a proper secret store. Moves arguments always present in the protocol from function arguments into the struct itself. Removes the horribly ugly code in DkgRemoval, fixing multiple issues present with it which would cause it to fail on use. * Set SeraiBlockNumber in cosign.rs as it's used by the cosigning protocol * Remove unnecessary Clone from lambdas in coordinator * Remove the EventDb from Tributary scanner We used per-Transaction DB TXNs so on error, we don't have to rescan the entire block yet only the rest of it. We prevented scanning multiple transactions by tracking which we already had. This is over-engineered and not worth it. * Implement borsh for HasEvents, removing the manual encoding * Merge DkgConfirmer and DkgRemoval into signing_protocol.rs Fixes a bug in DkgConfirmer which would cause it to improperly handle indexes if any validator had multiple key shares. * Strictly type DataSpecification's Label * Correct threshold_i_map_to_keys_and_musig_i_map It didn't include the participant's own index and accordingly was offset. * Create TributaryBlockHandler This struct contains all variables prior passed to handle_block and stops them from being passed around again and again. This also ensures fatal_slash is only called while handling a block, as needed as it expects to operate under perfect consensus. * Inline accumulate, store confirmation nonces with shares Inlining accumulate makes sense due to the amount of data accumulate needed to be passed. Storing confirmation nonces with shares ensures that both are available or neither. Prior, one could be yet the other may not have been (requiring an assert in runtime to ensure we didn't bungle it somehow). * Create helper functions for handling DkgRemoval/SubstrateSign/Sign Tributary TXs * Move Label into SignData All of our transactions which use SignData end up with the same common usage pattern for Label, justifying this. Removes 3 transactions, explicitly de-duplicating their handlers. * Remove CurrentlyCompletingKeyPair for the non-contextual DkgKeyPair * Remove the manual read/write for TributarySpec for borsh This struct doesn't have any optimizations booned by the manual impl. Using borsh reduces our scope. * Use temporary variables to further minimize LoC in tributary handler * Remove usage of tuples for non-trivial Tributary transactions * Remove serde from dkg serde could be used to deserialize intenrally inconsistent objects which could lead to panics or faults. The BorshDeserialize derives have been replaced with a manual implementation which won't produce inconsistent objects. * Abstract Future generics using new trait definitions in coordinator * Move published_signed_transaction to tributary/mod.rs to reduce the size of main.rs * Split coordinator/src/tributary/mod.rs into spec.rs and transaction.rs
2023-12-10 20:21:44 -05:00
&|tx: Transaction| {
let tributary = tributary.clone();
async move {
match tributary.add_transaction(tx.clone()).await {
Ok(_) => {}
// Can happen as this occurs on a distinct DB TXN
Err(TransactionError::InvalidNonce) => {
log::warn!(
"publishing TX {tx:?} returned InvalidNonce. was it already added?"
)
}
Err(e) => panic!("created an invalid transaction: {e:?}"),
}
}
},
spec,
&reader,
)
.await;
2023-10-14 19:55:14 -04:00
// Run either when the notification fires, or every interval of block_time
let _ = tokio::time::timeout(
Duration::from_secs(tributary::Tributary::<D, Transaction, P>::block_time().into()),
next_block_notification,
)
.await;
}
}
});
}
2023-10-14 19:55:14 -04:00
// The above loop simply checks the DB every few seconds, voiding the need for this event
Ok(crate::TributaryEvent::TributaryRetired(_)) => {}
Err(broadcast::error::RecvError::Lagged(_)) => {
panic!("scan_tributaries lagged to handle tributary_event")
}
Err(broadcast::error::RecvError::Closed) => panic!("tributary_event sender closed"),
}
}
}