Commit Graph

21 Commits

Author SHA1 Message Date
Luke Parker
13b147cbf6 Reduce coordinator tests contention re: cosign messages 2024-03-20 08:23:23 -04:00
Luke Parker
f6409d08f3 Increase timeout in coordinator tests 2024-02-18 08:19:07 -05:00
Luke Parker
337e54c672 Redo Dockerfile generation (#530)
Moves from concatted Dockerfiles to pseudo-templated Dockerfiles via a dedicated Rust program.

Removes the unmaintained kubernetes, not because we shouldn't have/use it, but because it's unmaintained and needs to be reworked before it's present again.

Replaces the compose with the work in the new orchestrator binary which spawns everything as expected. While this arguably re-invents the wheel, it correctly manages secrets and handles the variadic Dockerfiles.

Also adds an unrelated patch for zstd and simplifies running services a bit by greater utilizing the existing infrastructure.

---

* Delete all Dockerfile fragments, add new orchestator to generate Dockerfiles

Enables greater templating.

Also delete the unmaintained kubernetes folder *for now*. This should be
restored in the future.

* Use Dockerfiles from the orchestator

* Ignore Dockerfiles in the git repo

* Remove CI job to check Dockerfiles are as expected now that they're no longer committed

* Remove old Dockerfiles from repo

* Use Debian for monero-wallet-rpc

* Remove replace_cmds for proper usage of entry-dev

Consolidates ports a bit.

Updates serai-docker-tests from "compose" to "build".

* Only write a new dockerfile if it's distinct

Preserves the updated time metadata.

* Update serai-docker-tests

* Correct the path Dockerfiles are built from

* Correct inclusion of orchestration folder in Docker builds

* Correct debug/release flagging in the cargo command

Apparently, --debug isn't an effective NOP yet an error.

* Correct path used to run the Serai node within a Dockerfile

* Correct path in Monero Dockerfile

* Attempt storing monerod in /usr/bin

* Use sudo to move into /usr/bin in CI

* Correct 18.3.0 to 18.3.1

* Escape * with quotes

* Update deny.toml, ADD orchestration in runtime Dockerfile

* Add --detach to the Monero GH CI

* Diversify dockerfiles by network

* Fixes to network-diversified orchestration

* Bitcoin and Monero testnet scripts

* Permissions and tweaks

* Flatten scripts folders

* Add missing folder specification to Monero Dockerfile

* Have monero-wallet-rpc specify the monerod login

* Have the Docker CMD specify env variables inserted at time of Dockerfile generation

They're overrideable with the global enviornment as for tests. This enables
variable generation in orchestrator and output to productionized Docker files
without creating a life-long file within the Docker container.

* Don't add Dockerfiles into Docker containers now that they have secrets

Solely add the source code for them as needed to satisfy the workspace bounds.

* Download arm64 Monero on arm64

* Ensure constant host architecture when reproducibly building the wasm

Host architecture, for some reason, can effect the generated code despite the
target architecture always being foreign to the host architecture.

* Randomly generate infrastructure keys

* Have orchestrator generate a key, be able to create/start containers

* Ensure bash is used over sh

* Clean dated docs

* Change how quoting occurs

* Standardize to sh

* Have Docker test build the dev Dockerfiles

* Only key_gen once

* cargo update

Adds a patch for zstd and reconciles the breaking nightly change which just
occurred.

* Use a dedicated network for Serai

Also fixes SERAI_HOSTNAME passed to coordinator.

* Support providing a key over the env for the Serai node

* Enable and document running daemons for tests via serai-orchestrator

Has running containers under the dev network port forward the RPC ports.

* Use volumes for bitcoin/monero

* Use bitcoin's run.sh in GH CI

* Only use the volume for testnet (not dev)
2024-02-09 02:48:44 -05:00
Luke Parker
b493e3e31f Validator DHT (#494)
* Route validators for any active set through sc-authority-discovery

Additionally adds an RPC route to retrieve their P2P addresses.

* Have the coordinator get peers from substrate

* Have the RPC return one address, not up to 3

Prevents the coordinator from believing it has 3 peers when it has one.

* Add missing feature to serai-client

* Correct network argument in serai-client for p2p_validators call

* Add a test in serai-client to check DHT population with a much quicker failure than the coordinator tests

* Update to latest Substrate

Removes distinguishing BABE/AuthorityDiscovery keys which causes
sc_authority_discovery to populate as desired.

* Update to a properly tagged substrate commit

* Add all dialed to peers to GossipSub

* cargo fmt

* Reduce common code in serai-coordinator-tests with amore involved new_test

* Use a recursive async function to spawn `n` DockerTests with the necessary networking configuration

* Merge UNIQUE_ID and ONE_AT_A_TIME

* Tidy up the new recursive code in tests/coordinator

* Use a Mutex in CONTEXT to let it be set multiple times

* Make complimentary edits to full-stack tests

* Augment coordinator P2p connection logs

* Drop lock acquisitions before recursing

* Better scope lock acquisitions in full-stack, preventing a deadlock

* Ensure OUTER_OPS is reset across the test boundary

* Add cargo deny allowance for dockertest fork
2023-12-22 21:09:18 -05:00
Luke Parker
065d314e2a Further expand clippy workspace lints
Achieves a notable amount of reduced async and clones.
2023-12-17 00:04:49 -05:00
Luke Parker
6a172825aa Reattempts (#483)
* Schedule re-attempts and add a (not filled out) match statement to actually execute them

A comment explains the methodology. To copy it here:

"""
This is because we *always* re-attempt any protocol which had participation. That doesn't
mean we *should* re-attempt this protocol.

The alternatives were:
1) Note on-chain we completed a protocol, halting re-attempts upon 34%.
2) Vote on-chain to re-attempt a protocol.

This schema doesn't have any additional messages upon the success case (whereas
alternative #1 does) and doesn't have overhead (as alternative #2 does, sending votes and
then preprocesses. This only sends preprocesses).
"""

Any signing protocol which reaches sufficient participation will be
re-attempted until it no longer does.

* Have the Substrate scanner track DKG removals/completions for the Tributary code

* Don't keep trying to publish a participant removal if we've already set keys

* Pad out the re-attempt match a bit more

* Have CosignEvaluator reload from the DB

* Correctly schedule cosign re-attempts

* Actuall spawn new DKG removal attempts

* Use u32 for Batch ID in SubstrateSignableId, finish Batch re-attempt routing

The batch ID was an opaque [u8; 5] which also included the network, yet that's
redundant and unhelpful.

* Clarify a pair of TODOs in the coordinator

* Remove old TODO

* Final comment cleanup

* Correct usage of TARGET_BLOCK_TIME in reattempt scheduler

It's in ms and I assumed it was in s.

* Have coordinator tests drop BatchReattempts which aren't relevant yet may exist

* Bug fix and pointless oddity removal

We scheduled a re-attempt upon receiving 2/3rds of preprocesses and upon
receiving 2/3rds of shares, so any signing protocol could cause two re-attempts
(not one more).

The coordinator tests randomly generated the Batch ID since it was prior an
opaque byte array. While that didn't break the test, it was pointless and did
make the already-succeeded check before re-attempting impossible to hit.

* Add log statements, correct dead-lock in coordinator tests

* Increase pessimistic timeout on recv_message to compensate for tighter best-case timeouts

* Further bump timeout by a minute

AFAICT, GH failed by just a few seconds.

This also is worst-case in a single instance, making it fine to be decently long.

* Further further bump timeout due to lack of distinct error
2023-12-12 12:28:53 -05:00
Luke Parker
695d1f0ecf Remove subxt (#460)
* Remove subxt

Removes ~20 crates from our Cargo.lock.

Removes downloading the metadata and enables removing the getMetadata RPC route
(relevant to #379).

Moves forward #337.

Done now due to distinctions in the subxt 0.32 API surface which make it
justifiable to not update.

* fmt, update due to deny triggering on a yanked crate

* Correct the handling of substrate_block_notifier now that it's ephemeral, not long-lived

* Correct URL in tests/coordinator from ws to http
2023-11-28 02:29:50 -05:00
Luke Parker
b296be8515 Replace bincode with borsh (#452)
* Add SignalsConfig to chain_spec

* Correct multiexp feature flagging for rand_core std

* Remove bincode for borsh

Replaces a non-canonical encoding with a canonical encoding which additionally
should be faster.

Also fixes an issue where we used bincode in transcripts where it cannot be
trusted.

This ended up fixing a myriad of other bugs observed, unfortunately.
Accordingly, it either has to be merged or the bug fixes from it must be ported
to a new PR.

* Make serde optional, minimize usage

* Make borsh an optional dependency of substrate/ crates

* Remove unused dependencies

* Use [u8; 64] where possible in the processor messages

* Correct borsh feature flagging
2023-11-25 04:01:11 -05:00
Luke Parker
25066437da Attempt to resolve #434
Worsens the coordinator tests' ensuring of the validity of the cosigning
protocol, yet should actually properly model it.
2023-11-18 00:12:36 -05:00
Luke Parker
96f1d26f7a Add a cosigning protocol to ensure finalizations are unique (#433)
* Add a function to deterministically decide which Serai blocks should be co-signed

Has a 5 minute latency between co-signs, also used as the maximal latency
before a co-sign is started.

* Get all active tributaries we're in at a specific block

* Add and route CosignSubstrateBlock, a new provided TX

* Split queued cosigns per network

* Rename BatchSignId to SubstrateSignId

* Add SubstrateSignableId, a meta-type for either Batch or Block, and modularize around it

* Handle the CosignSubstrateBlock provided TX

* Revert substrate_signer.rs to develop (and patch to still work)

Due to SubstrateSigner moving when the prior multisig closes, yet cosigning
occurring with the most recent key, a single SubstrateSigner can be reused.
We could manage multiple SubstrateSigners, yet considering the much lower
specifications for cosigning, I'd rather treat it distinctly.

* Route cosigning through the processor

* Add note to rename SubstrateSigner post-PR

I don't want to do so now in order to preserve the diff's clarity.

* Implement cosign evaluation into the coordinator

* Get tests to compile

* Bug fixes, mark blocks without cosigners available as cosigned

* Correct the ID Batch preprocesses are saved under, add log statements

* Create a dedicated function to handle cosigns

* Correct the flow around Batch verification/queueing

Verifying `Batch`s could stall when a `Batch` was signed before its
predecessors/before the block it's contained in was cosigned (the latter being
inevitable as we can't sign a block containing a signed batch before signing
the batch).

Now, Batch verification happens on a distinct async task in order to not block
the handling of processor messages. This task is the sole caller of verify in
order to ensure last_verified_batch isn't unexpectedly mutated.

When the processor message handler needs to access it, or needs to queue a
Batch, it associates the DB TXN with a lock preventing the other task from
doing so.

This lock, as currently implemented, is a poor and inefficient design. It
should be modified to the pattern used for cosign management. Additionally, a
new primitive of a DB-backed channel may be immensely valuable.

Fixes a standing potential deadlock and a deadlock introduced with the
cosigning protocol.

* Working full-stack tests

After the last commit, this only required extending a timeout.

* Replace "co-sign" with "cosign" to make finding text easier

* Update the coordinator tests to support cosigning

* Inline prior_batch calculation to prevent panic on rotation

Noticed when doing a final review of the branch.
2023-11-15 16:57:21 -05:00
Luke Parker
c4bdbdde11 dockertest 0.4 (#406)
* Updates to modern dockertest

* More updates to latest dockertest

* Update Cargo.lock to dockertest with handle restored

* clippy coordinator tests

* clippy full-stack tests

* Remove kayabaNerve branch for official repo's latest commit hash

* Update serai-client, remove reliance on the existence of a handle fn

* Don't use the hex encoding of unique_id in dockertests

Gets our hostnames just below 64 bytes, resolving test failures on at least
Debian-based systems.

* Use Network::Isolated for all dockertest instances

* Correct error from prior commit's edits
2023-10-23 06:59:38 -04:00
Luke Parker
7d4e8b59db Update dockertests to new serai-client 2023-10-14 15:26:36 -04:00
Luke Parker
40b7bc59d0 Use dedicated Queues for each from-to pair
Prevents one Processor's message from halting the entire pipeline.
2023-09-27 12:20:57 -04:00
Luke Parker
1bd14163a0 Log all output to console when running in CI
Attempts to debug https://github.com/serai-dex/serai/actions/runs/5992819682/job/16252622433,
which seems to be a legitimate test failure.
2023-08-27 17:51:36 -04:00
Luke Parker
45ea805620 Use an optimistic (and twice as long in the worst case) sleep for KeyGen
Possible due to Substrate having an RPC.
2023-08-21 02:31:22 -04:00
Luke Parker
666bb3e96b E2E test coordinator KeyGen 2023-08-14 06:54:17 -04:00
Luke Parker
f6f945e747 Add a LibP2P instantiation to coordinator
It's largely unoptimized, and not yet exclusive to validators, yet has basic
sanity (using message content for ID instead of sender + index).

Fixes bugs as found. Notably, we used a time in milliseconds where the
Tributary expected  seconds.

Also has Tributary::new jump to the presumed round number. This reduces slashes
when starting new chains (whose times will be before the current time) and was
the only way I was able to observe successful confirmations given current
surrounding infrastructure.
2023-08-08 15:12:47 -04:00
Luke Parker
0dd8aed134 Expand cluster-sm/local testnet to 4 validators for BFT where f=1 2023-08-06 13:42:16 -04:00
Luke Parker
cee788eac3 Test the Coordinator emits KeyGen
Mainly just a test that the full stack is properly set up and we've hit basic
functioning for further testing.
2023-08-06 12:38:44 -04:00
Luke Parker
aab8a417db Have the Coordinator scan the Substrate genesis block
Also adds a workflow for running tests/coordinator.
2023-08-02 12:18:50 -04:00
Luke Parker
d5c787fea2 Add initial coordinator e2e tests 2023-08-01 19:00:48 -04:00