Move docs to spec

This commit is contained in:
Luke Parker
2024-03-11 17:55:05 -04:00
parent 0889627e60
commit a3a009a7e9
20 changed files with 0 additions and 0 deletions

# Multisig Rotation
Substrate is expected to determine when a new validator set instance will be
created, and with it, a new multisig. Upon the successful creation of a new
multisig, as determined by the new multisig setting their key pair on Substrate,
rotation begins.
### Timeline
The following timeline is established:
1) The new multisig is created, and has its keys set on Serai. Once the next
`Batch` with a new external network block is published, its block becomes the
"queue block". The new multisig is set to activate at the "queue block", plus
`CONFIRMATIONS` blocks (the "activation block").
We don't use the last `Batch`'s external network block, as that `Batch` may
be older than `CONFIRMATIONS` blocks. Any yet-to-be-included-and-finalized
`Batch` will be within `CONFIRMATIONS` blocks of what any processor has
scanned however, as it'll wait for inclusion and finalization before
continuing scanning.
2) Once the "activation block" itself has been finalized on Serai, UIs should
start exclusively using the new multisig. If the "activation block" isn't
finalized within `2 * CONFIRMATIONS` blocks, UIs should stop making
transactions to any multisig on that network.
Waiting for Serai's finalization prevents a UI from using an unfinalized
"activation block" before a re-organization to a shorter chain. If a
transaction to Serai was carried from the unfinalized "activation block"
to the shorter chain, it'd no longer be after the "activation block" and
accordingly would be ignored.
We could forgo waiting for Serai to finalize the block, instead waiting for
the block itself to have `CONFIRMATIONS` confirmations. This would prevent
needing to wait an indeterminate amount of time for Serai to finalize the
"activation block", with the knowledge it should eventually be finalized.
Doing so would
open UIs to eclipse attacks, where they live on an alternate chain where a
possible "activation block" is finalized, yet Serai finalizes a distinct
"activation block". If the alternate chain was longer than the finalized
chain, the above issue would be reopened.
The reason for UIs stopping under abnormal behavior is as follows. Given a
sufficiently delayed `Batch` for the "activation block", UIs will use the old
multisig past the point it will be deprecated. Accordingly, UIs must realize
when `Batch`s are so delayed that continued transactions are a risk. While
`2 * CONFIRMATIONS` is presumably well within the 6 hour period (defined
below), that period exists for low-fee transactions at times of congestion. It
does not exist for UIs with old state, though it can be used to compensate
for them (reducing the tolerance for inclusion delays). `2 * CONFIRMATIONS`
is before the 6 hour period is enacted, preserving the tolerance for
inclusion delays, yet still should only happen under highly abnormal
circumstances.
In order to minimize the time it takes for the "activation block" to be
finalized, a `Batch` will always be created for it, regardless of whether it
would otherwise have a `Batch` created.
3) The prior multisig continues handling `Batch`s and `Burn`s for
`CONFIRMATIONS` blocks, plus 10 minutes, after the "activation block".
The first `CONFIRMATIONS` blocks are due to the fact that the new multisig
shouldn't actually be sent coins during this period, making those blocks
irrelevant to it.
If coins are prematurely sent to the new multisig, they're artificially
delayed until the end of the `CONFIRMATIONS` blocks plus 10 minutes period.
This prevents an adversary from minting Serai tokens using coins in the new
multisig, yet then burning them to drain the prior multisig, creating a lack
of liquidity for several blocks.
The reason for the 10 minutes is to provide grace to honest UIs. Since UIs
will wait until Serai confirms the "activation block" for keys before sending
to them, which will take `CONFIRMATIONS` blocks plus some latency, UIs would
make transactions to the prior multisig past the end of this period if it was
`CONFIRMATIONS` alone. Since the next period is `CONFIRMATIONS` blocks, which
is how long transactions take to confirm, transactions made past the end of
this period would only be received after the next period. After the next period,
the prior multisig adds fees and a delay to all received funds (as it
forwards the funds from itself to the new multisig). The 10 minutes provides
grace for latency.
The 10 minutes is a delay on anyone who immediately transitions to the new
multisig in a zero-latency environment, yet the delay is preferable to the
fees from forwarding. The effective delay also should be less than 10 minutes
thanks to various latencies.
4) The prior multisig continues handling `Batch`s and `Burn`s for another
`CONFIRMATIONS` blocks.
This is for two reasons:
1) Coins sent to the new multisig still need time to gain sufficient
confirmations.
2) All outputs belonging to the prior multisig should become available within
`CONFIRMATIONS` blocks.
All `Burn`s handled during this period should use the new multisig for the
change address. This should effect a transfer of most outputs.
With the expected transfer of most outputs, and the new multisig receiving
new external transactions, the new multisig takes the responsibility of
signing all unhandled and newly emitted `Burn`s.
5) For the next 6 hours, all non-`Branch` outputs received are immediately
forwarded to the new multisig. Only external transactions to the new multisig
are included in `Batch`s.
The new multisig infers the `InInstruction`, and refund address, for
forwarded `External` outputs via reading what they were for the original
`External` output.
Alternatively, the `InInstruction`, with refund address explicitly included,
could be included in the forwarding transaction. This may fail if the
`InInstruction` omitted the refund address and is too large to fit in a
transaction with one explicitly included. On such failure, the refund would
be immediately issued instead.
6) Once the 6 hour period has expired, the prior multisig stops handling outputs
it didn't itself create. Any remaining `Eventuality`s are completed, and any
available/freshly available outputs are forwarded (creating new
`Eventuality`s which also need to successfully resolve).
Once the 6 hour period has expired, no `Eventuality`s remain, and all
outputs have been forwarded, the multisig publishes a final `Batch` for the
first block, plus `CONFIRMATIONS`, which met these conditions, regardless of
whether it would've otherwise had a `Batch`. No further actions by it, nor its
validators, are expected (unless, of course, those validators remain present
in the new multisig).
7) The new multisig confirms all transactions from all prior multisigs were made
as expected, including the reported `Batch`s.
Unfortunately, we cannot solely check the immediately prior multisig due to
the ability for two sequential malicious multisigs to steal. If multisig
`n - 2` only transfers a fraction of its coins to multisig `n - 1`, multisig
`n - 1` can 'honestly' operate on the dishonest state it was given,
laundering it. This would let multisig `n - 1` forward the results of its
as-expected operations from a dishonest starting point to the new multisig,
and multisig `n` would attest to multisig `n - 1`'s expected (and therefore
presumed honest) operations, assuming liability. This would cause an honest
multisig to face full liability for the invalid state, causing it to be fully
slashed (as needed to reacquire any lost coins).
This would appear short-circuitable if multisig `n - 1` transfers coins
exceeding the relevant Serai tokens' supply. Serai never expects to operate
in an over-solvent state, yet balance should trend upwards due to a flat fee
applied to each received output (preventing a griefing attack). Any balance
greater than the tokens' supply may have had funds skimmed off the top, yet
they'd still guarantee the solvency of Serai without any additional fees
passed to users. Unfortunately, due to the requirement to verify the `Batch`s
published (as else the Serai tokens' supply may be manipulated), this cannot
actually be achieved (at least, not without a ZK proof the published `Batch`s
were correct).
8) The new multisig publishes the next `Batch`, signifying the acceptance of
   full responsibility and a successful close of the prior multisig.
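The timeline's block arithmetic can be sketched as a toy model. This is purely illustrative: `CONFIRMATIONS` and `BLOCKS_PER_10M` (the 10-minute grace expressed in blocks) are placeholder values here, not protocol constants.

```python
# Illustrative sketch of the timeline's block arithmetic. CONFIRMATIONS and
# BLOCKS_PER_10M are placeholders, not the actual protocol's values.
CONFIRMATIONS = 6
BLOCKS_PER_10M = 1

def activation_block(queue_block: int) -> int:
    # Step 1: the new multisig activates CONFIRMATIONS blocks after the
    # "queue block" (the next Batch's external network block)
    return queue_block + CONFIRMATIONS

def handoff_end(activation: int) -> int:
    # Step 3: the prior multisig handles Batches/Burns for CONFIRMATIONS
    # blocks plus 10 minutes after the "activation block"
    return activation + CONFIRMATIONS + BLOCKS_PER_10M

def ui_target(activation_finalized: bool, blocks_waited: int) -> str:
    # Step 2: UIs use the new multisig once Serai finalizes the activation
    # block, halting if finalization takes over 2 * CONFIRMATIONS blocks
    if activation_finalized:
        return "new"
    if blocks_waited > 2 * CONFIRMATIONS:
        return "halt"
    return "old"

activation = activation_block(100)
assert activation == 106
assert handoff_end(activation) == 113
assert ui_target(False, 13) == "halt"
```

Outputs prematurely sent to the new multisig before `handoff_end` would, per step 3, be artificially delayed until that point.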
### Latency and Fees
Slightly before the end of step 3, the new multisig should start receiving new
external outputs. These won't be confirmed for another `CONFIRMATIONS` blocks,
and the new multisig won't start handling `Burn`s for another `CONFIRMATIONS`
blocks plus 10 minutes. Accordingly, the new multisig should only become
responsible for `Burn`s shortly after it has taken ownership of the stream of
newly received coins.
Before it takes responsibility, it also should've been transferred all internal
outputs under the standard scheduling flow. Any delayed outputs will be
immediately forwarded, and external stragglers are only reported to Serai once
sufficiently confirmed in the new multisig. Accordingly, liquidity should avoid
fragmentation during rotation. The only latency should be on the 10 minutes
present, and on delayed outputs, which should've been immediately usable, having
to wait another `CONFIRMATIONS` blocks to be confirmed once forwarded.
Immediate forwarding does unfortunately prevent batching inputs to reduce fees.
Given immediate forwarding only applies to latent outputs, considered
exceptional, and the protocol's fee handling ensures solvency, this is accepted.

spec/processor/Processor.md
# Processor
The processor is a service which has an instance spawned per network. It is
responsible for several tasks, from scanning an external network to signing
transactions with payments.
This document primarily discusses its flow with regards to the coordinator.
### Generate Key
On `key_gen::CoordinatorMessage::GenerateKey`, the processor begins a pair of
instances of the distributed key generation protocol specified in the FROST
paper.
The first instance is for a key to use on the external network. The second
instance is for a Ristretto public key used to publish data to the Serai
blockchain. This pair of FROST DKG instances is considered a single instance of
Serai's overall key generation protocol.
The commitments for both protocols are sent to the coordinator in a single
`key_gen::ProcessorMessage::Commitments`.
### Key Gen Commitments
On `key_gen::CoordinatorMessage::Commitments`, the processor continues the
specified key generation instance. The secret shares for each fellow
participant are sent to the coordinator in a
`key_gen::ProcessorMessage::Shares`.
#### Key Gen Shares
On `key_gen::CoordinatorMessage::Shares`, the processor completes the specified
key generation instance. The generated key pair is sent to the coordinator in a
`key_gen::ProcessorMessage::GeneratedKeyPair`.
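The key generation flow above can be summarized as a small state machine. The state and message names here are simplified illustrations, not the actual Rust types:

```python
# Illustrative state machine for the key-gen message flow: each coordinator
# message advances the processor's state and elicits one processor message.
TRANSITIONS = {
    ("idle", "GenerateKey"): ("commitments sent", "Commitments"),
    ("commitments sent", "Commitments"): ("shares sent", "Shares"),
    ("shares sent", "Shares"): ("complete", "GeneratedKeyPair"),
}

def step(state: str, coordinator_msg: str):
    # Returns the new state and the ProcessorMessage sent in response
    return TRANSITIONS[(state, coordinator_msg)]

assert step("idle", "GenerateKey") == ("commitments sent", "Commitments")
assert step("shares sent", "Shares") == ("complete", "GeneratedKeyPair")
```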
### Confirm Key Pair
On `substrate::CoordinatorMessage::ConfirmKeyPair`, the processor starts using
the newly confirmed key, scanning blocks on the external network for
transfers to it.
### External Network Block
When the external network has a new block, which is considered finalized
(either due to being literally finalized or due to having a sufficient amount
of confirmations), it's scanned.
Outputs to the key of Serai's multisig are saved to the database. Outputs which
newly transfer into Serai are used to build `Batch`s for the block. The
processor then begins a threshold signature protocol with its key pair's
Ristretto key to sign the `Batch`s.
The `Batch`s are each sent to the coordinator in a
`substrate::ProcessorMessage::Batch`, enabling the coordinator to know what
`Batch`s *should* be published to Serai. After each
`substrate::ProcessorMessage::Batch`, the preprocess for the first instance of
its signing protocol is sent to the coordinator in a
`coordinator::ProcessorMessage::BatchPreprocess`.
As a design comment, we *may* be able to sign now-possible, already-scheduled
branch/leaf transactions at this point. Doing so would give a mutable borrow
over the scheduler to both the external network and the Serai network, and
would accordingly be unsafe. We may want to look at splitting the scheduler
in two, in order to reduce latency (TODO).
### Batch Preprocesses
On `coordinator::CoordinatorMessage::BatchPreprocesses`, the processor
continues the specified batch signing protocol, sending
`coordinator::ProcessorMessage::BatchShare` to the coordinator.
### Batch Shares
On `coordinator::CoordinatorMessage::BatchShares`, the processor
completes the specified batch signing protocol. If successful, the processor
stops signing for this batch and sends
`substrate::ProcessorMessage::SignedBatch` to the coordinator.
### Batch Re-attempt
On `coordinator::CoordinatorMessage::BatchReattempt`, the processor will create
a new instance of the batch signing protocol. The new protocol's preprocess is
sent to the coordinator in a `coordinator::ProcessorMessage::BatchPreprocess`.
### Substrate Block
On `substrate::CoordinatorMessage::SubstrateBlock`, the processor:
1) Marks all blocks, up to the external block now considered finalized by
Serai, as having had their batches signed.
2) Adds the new outputs from newly finalized blocks to the scheduler, along
with the necessary payments from `Burn` events on Serai.
3) Sends a `substrate::ProcessorMessage::SubstrateBlockAck`, containing the IDs
of all plans now being signed for, to the coordinator.
4) Sends `sign::ProcessorMessage::Preprocess` for each plan now being signed
for.
### Sign Preprocesses
On `sign::CoordinatorMessage::Preprocesses`, the processor continues the
specified transaction signing protocol, sending `sign::ProcessorMessage::Share`
to the coordinator.
### Sign Shares
On `sign::CoordinatorMessage::Shares`, the processor completes the specified
transaction signing protocol. If successful, the processor stops signing for
this transaction and publishes the signed transaction. Then,
`sign::ProcessorMessage::Completed` is sent to the coordinator, to be
broadcasted to all validators so everyone can observe the attempt completed,
producing a signed and published transaction.
### Sign Re-attempt
On `sign::CoordinatorMessage::Reattempt`, the processor will create a new
instance of the transaction signing protocol if it hasn't already
completed/observed completion of an instance of the signing protocol. The new
protocol's preprocess is sent to the coordinator in a
`sign::ProcessorMessage::Preprocess`.
### Sign Completed
On `sign::CoordinatorMessage::Completed`, the processor verifies the included
transaction hash actually refers to an accepted transaction which completes the
plan it was supposed to. If so, the processor stops locally signing for the
transaction, and emits `sign::ProcessorMessage::Completed` if it hasn't prior.
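The interplay between re-attempts and observed completions can be sketched as follows; the `TransactionSigner` type and its methods are illustrative, not the processor's actual structures:

```python
# Sketch of the re-attempt/completion guard: re-attempts only start a new
# signing instance if no valid completion was observed.
class TransactionSigner:
    def __init__(self):
        self.completed = False
        self.attempt = 0

    def reattempt(self):
        # Sign Re-attempt: skip if a completion was already observed
        if self.completed:
            return None
        self.attempt += 1
        return self.attempt

    def observe_completion(self, valid: bool):
        # Sign Completed: stop signing once a transaction completing the
        # plan is verified
        if valid:
            self.completed = True

signer = TransactionSigner()
assert signer.reattempt() == 1
signer.observe_completion(True)
assert signer.reattempt() is None
```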

# Scanning
Only blocks with finality, either actual or sufficiently probabilistic, are
operated upon. This is referred to as a block with `CONFIRMATIONS`
confirmations, the block itself being the first confirmation.
For chains which promise finality on a known schedule, `CONFIRMATIONS` is set to
`1` and each group of finalized blocks is treated as a single block, with the
tail block's hash representing the entire group.
For chains which offer finality on an unknown schedule, `CONFIRMATIONS` is
still set to `1`, yet blocks aren't aggregated into a group. They're handled
individually, yet only once finalized. This allows networks which form
finalization erratically to not have to agree on when finalizations were formed,
solely that the blocks contained have a finalized descendant.
### Notability, causing a `Batch`
`Batch`s are only created for blocks for which achieving ordering is beneficial.
These are:
- Blocks which contain transactions relevant to Serai
- Blocks in which a new multisig activates
- Blocks in which a prior multisig retires
### Waiting for `Batch` inclusion
Once a `Batch` is created, it is expected to eventually be included on Serai.
If the `Batch` isn't included within `CONFIRMATIONS` blocks of its creation, the
scanner will wait until its inclusion before scanning
`batch_block + CONFIRMATIONS`.
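The two gating rules above can be sketched together; `CONFIRMATIONS` here is an arbitrary example value, and `may_scan` is an illustrative helper:

```python
# Sketch of the scanner's gating: a block needs CONFIRMATIONS confirmations
# (itself being the first), and scanning halts at batch_block + CONFIRMATIONS
# while a created Batch awaits inclusion on Serai.
CONFIRMATIONS = 6

def may_scan(block: int, tip: int, pending_batch_block=None) -> bool:
    # Confirmation count of `block`, the block itself being confirmation one
    confirmations = tip - block + 1
    if confirmations < CONFIRMATIONS:
        return False
    # Wait for the pending Batch's inclusion before scanning past
    # batch_block + CONFIRMATIONS
    if (pending_batch_block is not None) and \
       (block >= pending_batch_block + CONFIRMATIONS):
        return False
    return True

assert may_scan(100, 105)           # exactly CONFIRMATIONS confirmations
assert not may_scan(100, 104)       # insufficient confirmations
assert not may_scan(106, 120, 100)  # waiting on the Batch for block 100
assert may_scan(105, 120, 100)
```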

# UTXO Management
UTXO-based chains impose practical requirements for efficient operation, which
must effectively be guaranteed to terminate with a safe end state. This
document attempts to detail those requirements, and the implementations in
Serai resolving them.
## Fees From Effecting Transactions Out
When `sriXYZ` is burnt, Serai is expected to create an output for `XYZ` as
instructed. The transaction containing this output will presumably have some fee
necessitating payment. Serai linearly amortizes this fee over all outputs this
transaction intends to create in response to burns.
While Serai could charge a fee in advance, either static or dynamic to views of
the fee market, it'd risk the fee being inaccurate. If it's too high, users have
paid fees they shouldn't have. If it's too low, Serai is insolvent. This is why
the actual fee is amortized, rather than an estimation being prepaid.
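A minimal sketch of the linear amortization, assuming integer amounts and charging any rounding remainder to the first output (the actual rounding behavior is left unspecified by this document):

```python
# Sketch of linearly amortizing a transaction's fee over the outputs it
# creates in response to burns. Rounding choice is an assumption.
def amortize(fee: int, payouts: list) -> list:
    per_output = fee // len(payouts)
    remainder = fee - per_output * len(payouts)
    # Spread the fee evenly, charging the remainder to the first output
    amortized = [p - per_output for p in payouts]
    amortized[0] -= remainder
    return amortized

assert amortize(10, [100, 100, 100]) == [96, 97, 97]
```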
Serai could report a view, and when burning occurred, that view could be locked
in as the basis for transaction fees as used to fulfill the output in question.
This would require burns to specify the most recent fee market view they're
aware of, signifying their agreement, with Serai erroring if a new view is
published before the burn is included on-chain. Not only would this require more data be
published to Serai (widening data pipeline requirements), it'd prevent any
RBF-based solutions to dynamic fee markets causing transactions to get stuck.
## Output Frequency
Outputs can be created on an external network at rate
`max_outputs_per_tx / external_tick_rate`, where `external_tick_rate` is the
external network's delay before a transaction's outputs may be spent. While
`external_tick_rate` is generally writable as zero, due to mempool chaining,
some external networks may not allow spending outputs from transactions which
have yet to be ordered. Monero only allows spending outputs from transactions
which have 10 confirmations, for its own security.
Serai defines its own tick rate per external network, such that
`serai_tick_rate >= external_tick_rate`. This ensures that Serai never assumes
availability before actual availability. `serai_tick_rate` is also `> 0`. This
is because a zero `external_tick_rate` generally does not truly allow an
infinite output creation rate, due to limitations on the amount of transactions
allowed in the mempool.
Define `output_creation_rate` as `max_outputs_per_tx / serai_tick_rate`. Under
a naive system which greedily accumulates inputs and linearly processes
outputs, this is the highest rate at which outputs may be processed.
If the Serai blockchain enables burning sriXYZ at a rate exceeding
`output_creation_rate`, a backlog would form. This backlog could linearly grow
at a rate larger than the outputs could linearly shrink, creating an
ever-growing backlog, performing a DoS against Serai.
One solution would be to increase the fee associated with burning sriXYZ when
approaching `output_creation_rate`, making such a DoS unsustainable. This would
require the Serai blockchain be aware of each external network's
`output_creation_rate` and implement such a sliding fee. This 'solution' isn't
preferred as it still temporarily has a growing queue, and normal users would
also be affected by the increased fees.
The solution implemented into Serai is to consume all burns from the start of a
global queue which can be satisfied under currently available inputs. While the
consumed queue may have 256 items, which can't be processed within a single tick
by an external network whose `output_creation_rate` is 16, Serai can immediately
set a finite bound on execution duration.
For the above example parameters, Serai would create 16 outputs within its tick,
ignoring the necessity of a change output. These 16 outputs would _not_ create
any outputs Serai is expected to create in response to burns, yet instead create
16 "branch" outputs. One tick later, when the branch outputs are available to
spend, each would fund the creation of 16 expected outputs.
For `e` expected outputs, the execution duration is just `log e` ticks _with the
base of the logarithm being `output_creation_rate`_. Since these `e` expected
outputs are consumed from the linearly-implemented global queue into their own
tree structure, execution duration cannot be extended. We can also re-consume
the entire global queue (barring input availability, see next section) after
just one tick, when the change output becomes available again.
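The logarithmic execution duration can be checked with a small calculation; `ticks_to_fulfill` is an illustrative helper, not part of the implementation:

```python
# Each tick, every branch output fans out into output_creation_rate further
# outputs, so e expected outputs resolve in ceil(log e) ticks with the base
# of the logarithm being output_creation_rate.
def ticks_to_fulfill(expected_outputs: int, output_creation_rate: int) -> int:
    ticks = 0
    leaves = 1
    while leaves < expected_outputs:
        leaves *= output_creation_rate
        ticks += 1
    return ticks

# 256 burns with a rate of 16: one tick of branches, one tick of payouts
assert ticks_to_fulfill(256, 16) == 2
assert ticks_to_fulfill(4096, 16) == 3
```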
Due to the logarithmic complexity of fulfilling burns, attacks require
exponential growth (which is infeasible to scale). This solution does not
require a sliding fee on Serai's side due to not needing to limit the on-chain
rate of burns, which means it doesn't so adversely affect normal users. While
an increased tree depth will increase the amount of transactions needed to
fulfill an output, increasing the fee amortized over the output and its
siblings, this fee scales linearly with the logarithmically scaling tree depth.
This is considered acceptable.
## Input Availability
The following section refers to spending an output, and then spending it again.
Spending it again, which is impossible under the UTXO model, refers to spending
the change output of the transaction it was spent in. The following section
also assumes any published transaction is immediately ordered on-chain, ignoring
the potential for latency from mempool to blockchain (as it is assumed to have a
negligible effect in practice).
When a burn for amount `a` is issued, the sum amount of immediately available
inputs may be `< a`. This is because, despite each output being considered
usable on a tick basis, there is no global tick. Each output may or may not be
spendable at a given moment, and spending it prevents its availability for one
tick of a newly started clock.
This means all outputs will become available by simply waiting a single tick,
without spending any outputs during the waited tick. Any outputs unlocked at
the start of the tick will carry over, and within the tick the rest of the
outputs will become unlocked.
This means that within a tick of operations, the full balance of Serai can be
considered unlocked and used to consume the entire global queue. While Serai
could wait for all its outputs to be available before popping from the front of
the global queue, eager execution as enough inputs become available provides
lower latency. Considering the tick may be an hour (as in the case of Bitcoin),
this is very appreciated.
If a full tick is waited for, due to the front of the global queue having a
notably large burn, then the entire global queue will be consumed as full input
availability means the ability to satisfy all potential burns in a solvent
system.
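A toy model of the per-output tick clocks, with an illustrative tick length:

```python
# Each output, when spent, locks for one tick of its own clock. Waiting one
# full tick without spending anything guarantees every output is unlocked.
TICK = 10  # blocks per tick, illustrative

def all_available(unlock_heights: list, now: int) -> bool:
    return all(h <= now for h in unlock_heights)

# Outputs spent at heights 3 and 7 unlock at 13 and 17; by height 7 + TICK,
# every output has unlocked without any new spends restarting a clock
assert not all_available([13, 17], 15)
assert all_available([13, 17], 17)
```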
## Fees Incurred During Operations
While fees incurred when satisfying burns were covered above, with
documentation on how solvency is maintained, two other operating costs exist:
1) Input accumulation
2) Multisig rotations
Input accumulation refers to transactions which exist to merge inputs. Just as
there is a `max_outputs_per_tx`, there is a `max_inputs_per_tx`. When the amount
of inputs belonging to Serai exceeds `max_inputs_per_tx`, a TX merging them is
created. This TX incurs fees yet has no outputs mapping to burns to amortize
them over, accumulating operating costs.
Please note that this merging occurs in parallel to create a logarithmic
execution, similar to how outputs are also processed in parallel.
As for multisig rotation, multisig rotation occurs when a new multisig for an
external network is created and the old multisig must transfer its inputs in
order for Serai to continue its operations. This operation also incurs fees
without having outputs immediately available to amortize over.
Serai could charge fees on received outputs, deducting from the amount of
`sriXYZ` minted in order to cover these operating fees. An overestimated
amount would be deducted to practically ensure solvency, forming a buffer.
filled, fees would be reduced. As the buffer drains, fees would go back up.
This would keep charged fees in line with actual fees, once the buffer is
initially filled, yet requires:
1) Creating and tracking a buffer
2) Overcharging some users on fees
while still risking insolvency, if the actual fees keep increasing in a way
preventing successful estimation.
The solution Serai implements is to accrue operating costs, tracking with each
created transaction the running operating costs. When a created transaction has
payments out, all of the operating costs incurred so far, which have yet to be
amortized, are immediately and fully amortized.
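The accrual scheme can be sketched as follows; the names and rounding choice are assumptions, not the implementation:

```python
# Fee-only transactions (input merges, rotation) accrue operating costs; the
# next transaction with payments out amortizes everything accrued so far.
class OperatingCosts:
    def __init__(self):
        self.accrued = 0

    def fee_only_tx(self, fee: int):
        # No outputs mapping to burns to amortize over: accrue the fee
        self.accrued += fee

    def payout_tx(self, fee: int, payouts: list) -> list:
        # Amortize this tx's fee plus all accrued costs over its payouts
        total = fee + self.accrued
        self.accrued = 0
        per = total // len(payouts)
        rem = total - per * len(payouts)
        out = [p - per for p in payouts]
        out[0] -= rem
        return out

costs = OperatingCosts()
costs.fee_only_tx(5)  # an input-merge transaction's fee accrues
assert costs.payout_tx(4, [100, 100, 100]) == [97, 97, 97]
assert costs.accrued == 0
```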
## Attacks by a Malicious Miner
There is the concern that a significant amount of outputs could be created
which, when merged as inputs, create a significant amount of operating costs.
This would then be forced onto random users who burn `sriXYZ` soon after, while
the party who caused the operating costs would then be able to burn their own
`sriXYZ` without notable fees.
To describe this attack in its optimal form, assume a sole malicious block
producer for an external network. The malicious miner adds an output to Serai,
not paying any fees as the block producer. This single output alone may trigger
an aggregation transaction. Serai would pay for the transaction fee, the fee
going to the malicious miner.
When Serai users burn `sriXYZ`, they are hit with the aggregation transaction's
fee plus the normally amortized fee. Then, the malicious miner burns their
`sriXYZ`, having the fee they capture be amortized over their output. In this
process, they remain net neutral except for the increased transaction fees
extracted from other users, which they profit from.
To limit this attack vector, a flat fee of
`2 * (the estimation of a 2-input-merging transaction fee)` is applied to each
input. This means, assuming an inability to manipulate Serai's fee estimations,
creating an output to force a merge transaction (and the associated fee) costs
the attacker twice as much as the associated fee.
A 2-input TX's fee is used as aggregating multiple inputs at once actually
works in Serai's favor, so long as the per-input fee charged exceeds the cost
of the per-input addition to the TX. Since the per-input fee charged is the
cost of an entire TX, this property holds.
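The flat-fee arithmetic, with an illustrative fee value:

```python
# Each received output is charged twice the estimated fee of a 2-input merge
# transaction, so forcing a merge costs the attacker twice what Serai pays
# (assuming the fee estimation can't be manipulated).
def flat_fee(two_input_merge_fee_estimate: int) -> int:
    return 2 * two_input_merge_fee_estimate

merge_fee = 1000                 # illustrative estimate
attacker_cost = flat_fee(merge_fee)  # charged on the forcing output
serai_cost = merge_fee               # fee Serai pays for the merge
assert attacker_cost == 2 * serai_cost
```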
### Profitability Without the Flat Fee With a Minority of Hash Power
Ignoring the above flat fee, a malicious miner could use aggregating multiple
inputs to achieve profit with a minority of hash power. The following is how a
miner with 7% of the external network's hash power could execute this attack
profitably over a network with a `max_inputs_per_tx` value of 16:
1) Mint `sriXYZ` with 256 outputs during their own blocks. This incurs no fees
and would force 16 aggregation transactions to be created.
2) _A miner_, which has a 7% chance of being the malicious miner, collects the
16 transaction fees.
3) The malicious miner burns their sriXYZ, with a 7% chance of collecting their
own fee or a 93% chance of losing a single transaction fee.
16 attempts would cost 16 transaction fees if they always lose their single
transaction fee. Gaining the 16 transaction fees once, offsetting costs, is
expected to happen with just 6.25% of the hash power. Since the malicious miner
has 7%, they're statistically likely to recoup their costs and eventually turn
a profit.
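The break-even arithmetic uses the text's simplified accounting (one transaction fee lost per attempt, all 16 aggregation fees recouped when the attacker mines the relevant block):

```python
# Expected profit per attack attempt, in units of one transaction fee,
# ignoring the flat fee (as the section above does).
def expected_profit_per_attempt(hash_share: float,
                                aggregation_fees: int) -> float:
    # Each attempt costs one fee; with probability hash_share the attacker's
    # own blocks collect all the aggregation fees
    return hash_share * aggregation_fees - 1

# Break-even hash power with 16 aggregation fees: 1 / 16 = 6.25%
assert expected_profit_per_attempt(0.0625, 16) == 0.0
# At 7%, the attack is profitable in expectation
assert expected_profit_per_attempt(0.07, 16) > 0
```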
With a flat fee of at least the cost to aggregate a single input in a full
aggregation transaction, this attack falls apart. Serai's flat fee, twice the
fee of a 2-input aggregation transaction, is higher than that minimum.
### Solvency Without the Flat Fee
Even without the above flat fee, Serai remains solvent. With the above flat fee,
malicious miners on external networks can only steal from other users if they
can manipulate Serai's fee estimations so that the merge transaction fee used is
twice as high as the fees charged for causing a merge transaction. This is
assumed infeasible to perform at scale, yet even if demonstrated feasible, it
would not be a critical vulnerability against Serai itself, solely a
low/medium/high-severity vulnerability against its users (though one it would
still be our responsibility to rectify).