Optimizing EIP-4844 transaction validation for mempool (using KZG proofs) (#5088)

* Fix missing variables/funcs in validate_blob_transaction_wrapper()

  There is no `tx.message.blob_commitments` anymore, or `kzg_to_commitment()`

* Introduce KZGProof as its own type instead of using KZGCommitment

* Introduce high-level logic of new efficient transaction validation

  To validate a 4844 transaction in the mempool, the verifier checks that each provided KZG commitment matches the polynomial represented by the corresponding blob data.

      | d_1 | d_2 | d_3 | ... | d_4096 | -> commitment

  Before this patch, to do this validation, we reconstructed the commitment from the blob data (d_i above) and checked it against the provided commitment. This was expensive because computing a commitment from blob data (even using the Lagrange basis) involves N scalar multiplications, where N is the number of field elements per blob. Initial benchmarking showed that this took about 40ms for N=4096, which was deemed too expensive. For more details see:

      https://hackmd.io/@protolambda/eip-4844-implementer-notes#Optimizations
      protolambda/go-ethereum#4

  In this patch, we speed this up by providing a KZG proof for each commitment. The verifier can check that proof to ensure that the KZG commitment matches the polynomial represented by the corresponding blob data.

      | d_1 | d_2 | d_3 | ... | d_4096 | -> commitment, proof

  To do so, we evaluate the blob data polynomial at a random point `x` to get a value `y`. We then use the KZG proof to ensure that the committed polynomial (i.e. the commitment) also evaluates to `y` at `x`. If the check passes, it means that the KZG commitment matches the polynomial represented by the blob data.

  This is significantly faster since evaluating the blob data polynomial at a random point using the Barycentric formula can be done efficiently with only field operations (see https://hackmd.io/@vbuterin/barycentric_evaluation). Then, verifying a KZG proof takes two pairing operations (which take about 0.6ms each). This brings the total verification cost to about 2 ms per blob.

  With some additional optimizations (using linear combination tricks like the ones linked above) we can batch all the blobs together into a single efficient verification, and hence verify the entire transaction in 2.5 ms. The same techniques can be used to efficiently verify blocks on the consensus side.

* Introduce polynomial helper functions for transaction validation

* Implement high-level logic of aggregated proof verification

* Add helper functions for aggregated proof verification

  Also abstract `lincomb()` out of the `blob_to_kzg()` function to be used in the verification.

* Fixes after review on the consensus PR
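
The two field-only steps described in the commit message (barycentric evaluation and the random linear combination of blobs) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: it uses a toy prime field of size 17 with four roots of unity in place of `BLS_MODULUS` and `ROOTS_OF_UNITY`, fixed numbers in place of the Fiat-Shamir challenges, and hypothetical helper names (`inv`, `eval_coefficients`, `eval_barycentric`, `aggregate`). It does not perform any KZG commitment or pairing check.

```python
from typing import List

# Toy parameters (assumptions for illustration, not the EIP's constants).
P = 17                  # small prime standing in for BLS_MODULUS
ROOTS = [1, 4, 16, 13]  # powers of 4, a primitive 4th root of unity mod 17
WIDTH = len(ROOTS)


def inv(a: int) -> int:
    """Modular inverse in the toy field (a must be non-zero mod P)."""
    return pow(a % P, -1, P)


def eval_coefficients(coeffs: List[int], x: int) -> int:
    """Horner evaluation of a polynomial given by coefficients, low degree first."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % P
    return acc


def eval_barycentric(evals: List[int], x: int) -> int:
    """Evaluate a polynomial given by its values on ROOTS at a point x outside ROOTS."""
    assert len(evals) == WIDTH and x % P not in ROOTS
    total = 0
    for f_i, w_i in zip(evals, ROOTS):
        total = (total + f_i * w_i * inv(x - w_i)) % P
    return total * (pow(x, WIDTH, P) - 1) * inv(WIDTH) % P


def aggregate(blobs: List[List[int]], r: int) -> List[int]:
    """Combine blobs column by column with weights r**0, r**1, ..."""
    out = [0] * WIDTH
    weight = 1
    for blob in blobs:
        out = [(o + weight * v) % P for o, v in zip(out, blob)]
        weight = weight * r % P
    return out


# Two toy "blobs": polynomials of degree < WIDTH, stored as evaluations on ROOTS.
coeffs_a, coeffs_b = [3, 1, 4, 1], [5, 9, 2, 6]
blob_a = [eval_coefficients(coeffs_a, w) for w in ROOTS]
blob_b = [eval_coefficients(coeffs_b, w) for w in ROOTS]

x, r = 7, 5  # fixed stand-ins for the Fiat-Shamir challenges
y_a = eval_barycentric(blob_a, x)
y_b = eval_barycentric(blob_b, x)
y_agg = eval_barycentric(aggregate([blob_a, blob_b], r), x)
```

In this sketch the barycentric result agrees with direct evaluation of the coefficients, and the evaluation of the aggregated blob equals `y_a + r * y_b` modulo `P`. That linearity is why a single evaluation and a single proof check can stand in for one check per blob.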
ethereum · Jun 29, 2022 · 0cf9afe
1 parent 62f2847
commit 0cf9afe
Showing 1 changed file with 100 additions and 15 deletions.
diff --git a/EIPS/eip-4844.md b/EIPS/eip-4844.md
@@ -46,6 +46,7 @@ Compared to full data sharding, this EIP has a reduced cap on the number of thes
 | `BLS_MODULUS` | `52435875175126190479447740508185965837690552500527637822603658699938581184513` |
 | `KZG_SETUP_G2` | `Vector[G2Point, FIELD_ELEMENTS_PER_BLOB]`, contents TBD |
 | `KZG_SETUP_LAGRANGE` | `Vector[KZGCommitment, FIELD_ELEMENTS_PER_BLOB]`, contents TBD |
+| `ROOTS_OF_UNITY` | `Vector[BLSFieldElement, FIELD_ELEMENTS_PER_BLOB]` |
 | `BLOB_COMMITMENT_VERSION_KZG` | `Bytes1(0x01)` |
 | `POINT_EVALUATION_PRECOMPILE_ADDRESS` | `Bytes20(0x14)` |
 | `POINT_EVALUATION_PRECOMPILE_GAS` | `50000` |
@@ -71,21 +72,24 @@ Compared to full data sharding, this EIP has a reduced cap on the number of thes
 | `Blob` | `Vector[BLSFieldElement, FIELD_ELEMENTS_PER_BLOB]` | |
 | `VersionedHash` | `Bytes32` | |
 | `KZGCommitment` | `Bytes48` | Same as BLS standard "is valid pubkey" check but also allows `0x00..00` for point-at-infinity |
+| `KZGProof` | `Bytes48` | Same as for `KZGCommitment` |
 
 ### Helpers
 
 Converts a blob to its corresponding KZG point:
 
 ```python
+def lincomb(points: List[KZGCommitment], scalars: List[BLSFieldElement]) -> KZGCommitment:
+    """
+    BLS multiscalar multiplication. This function can be optimized using Pippenger's algorithm and variants.
+    """
+    r = bls.Z1
+    for x, a in zip(points, scalars):
+        r = bls.add(r, bls.multiply(x, a))
+    return r
+
 def blob_to_kzg(blob: Blob) -> KZGCommitment:
-    computed_kzg = bls.Z1
-    for value, point_kzg in zip(blob, KZG_SETUP_LAGRANGE):
-        assert value < BLS_MODULUS
-        computed_kzg = bls.add(
-            computed_kzg,
-            bls.multiply(point_kzg, value)
-        )
-    return computed_kzg
+    return lincomb(KZG_SETUP_LAGRANGE, blob)
 ```
 
 Converts a KZG point into a versioned hash:
@@ -101,7 +105,7 @@ Verifies a KZG evaluation proof:
 def verify_kzg_proof(polynomial_kzg: KZGCommitment,
                      x: BLSFieldElement,
                      y: BLSFieldElement,
-                     quotient_kzg: KZGCommitment):
+                     quotient_kzg: KZGProof) -> bool:
     # Verify: P - y = Q * (X - x)
     X_minus_x = bls.add(KZG_SETUP_G2[1], bls.multiply(bls.G2, BLS_MODULUS - x))
     P_minus_y = bls.add(polynomial_kzg, bls.multiply(bls.G1, BLS_MODULUS - y))
@@ -111,6 +115,39 @@ def verify_kzg_proof(polynomial_kzg: KZGCommitment,
     ])
 ```
 
+Efficiently evaluates a polynomial in evaluation form using the barycentric formula:
+
+```python
+def bls_modular_inverse(x: BLSFieldElement) -> BLSFieldElement:
+    """
+    Compute the modular inverse of x
+    i.e. return y such that x * y % BLS_MODULUS == 1 and return 0 for x == 0
+    """
+    return pow(x, -1, BLS_MODULUS) if x != 0 else 0
+
+
+def div(x, y):
+    """Divide two field elements: `x` by `y`"""
+    return x * bls_modular_inverse(y) % BLS_MODULUS
+
+
+def evaluate_polynomial_in_evaluation_form(poly: List[BLSFieldElement], x: BLSFieldElement) -> BLSFieldElement:
+    """
+    Evaluate a polynomial (in evaluation form) at an arbitrary point `x`
+    Uses the barycentric formula:
+       f(x) = (x**WIDTH - 1) / WIDTH  *  sum_(i=0)^(WIDTH-1)  (f(DOMAIN[i]) * DOMAIN[i]) / (x - DOMAIN[i])
+    """
+    width = len(poly)
+    assert width == FIELD_ELEMENTS_PER_BLOB
+    inverse_width = bls_modular_inverse(width)
+    r = 0
+    for i in range(width):
+        r += div(poly[i] * ROOTS_OF_UNITY[i], x - ROOTS_OF_UNITY[i])
+    r = r * (pow(x, width, BLS_MODULUS) - 1) * inverse_width % BLS_MODULUS
+
+    return r
+```
+
 Approximates `2 ** (numerator / denominator)`, with the simplest possible approximation that is continuous and has a continuous derivative:
 
 ```python
@@ -321,20 +358,68 @@ class BlobTransactionNetworkWrapper(Container):
     blob_kzgs: List[KZGCommitment, MAX_TX_WRAP_KZG_COMMITMENTS]
     # BLSFieldElement = uint256
     blobs: List[Vector[BLSFieldElement, FIELD_ELEMENTS_PER_BLOB], LIMIT_BLOBS_PER_TX]
+    # KZGProof = Bytes48
+    kzg_aggregated_proof: KZGProof
 ```
 
 We do network-level validation of `BlobTransactionNetworkWrapper` objects as follows:
 
 ```python
+def hash_to_bls_field(x: Container) -> BLSFieldElement:
+    """
+    This function is used to generate Fiat-Shamir challenges. The output is not uniform over the BLS field.
+    """
+    return int.from_bytes(hash_tree_root(x), "little") % BLS_MODULUS
+
+
+def compute_powers(x: BLSFieldElement, n: uint64) -> List[BLSFieldElement]:
+    current_power = 1
+    powers = []
+    for _ in range(n):
+        powers.append(BLSFieldElement(current_power))
+        current_power = current_power * int(x) % BLS_MODULUS
+    return powers
+
+def vector_lincomb(vectors: List[List[BLSFieldElement]], scalars: List[BLSFieldElement]) -> List[BLSFieldElement]:
+    """
+    Given a list of vectors, compute the linear combination of each column with `scalars`, and return the resulting
+    vector.
+    """
+    r = [0]*len(vectors[0])
+    for v, a in zip(vectors, scalars):
+        for i, x in enumerate(v):
+            r[i] = (r[i] + a * x) % BLS_MODULUS
+    return [BLSFieldElement(x) for x in r]
+
 def validate_blob_transaction_wrapper(wrapper: BlobTransactionNetworkWrapper):
     versioned_hashes = wrapper.tx.message.blob_versioned_hashes
-    kzgs = wrapper.blob_kzgs
+    commitments = wrapper.blob_kzgs
     blobs = wrapper.blobs
-    assert len(versioned_hashes) == len(kzgs) == len(blobs)
-    for versioned_hash, kzg, blob in zip(versioned_hashes, kzgs, blobs):
-        # note: assert blob is not malformatted
-        assert kzg == blob_to_kzg(blob)
-        assert versioned_hash == kzg_to_versioned_hash(kzg)
+    # note: assert blobs are not malformatted
+
+    assert len(versioned_hashes) == len(commitments) == len(blobs)
+    number_of_blobs = len(blobs)
+
+    # Generate random linear combination challenges
+    r = hash_to_bls_field([blobs, commitments])
+    r_powers = compute_powers(r, number_of_blobs)
+
+    # Compute commitment to aggregated polynomial
+    aggregated_poly_commitment = lincomb(commitments, r_powers)
+
+    # Create aggregated polynomial in evaluation form
+    aggregated_poly = vector_lincomb(blobs, r_powers)
+
+    # Generate challenge `x` and evaluate the aggregated polynomial at `x`
+    x = hash_to_bls_field([aggregated_poly, aggregated_poly_commitment])
+    y = evaluate_polynomial_in_evaluation_form(aggregated_poly, x)
+
+    # Verify aggregated proof
+    assert verify_kzg_proof(aggregated_poly_commitment, x, y, wrapper.kzg_aggregated_proof)
+
+    # Now that all commitments have been verified, check that versioned_hashes matches the commitments
+    for versioned_hash, commitment in zip(versioned_hashes, commitments):
+        assert versioned_hash == kzg_to_versioned_hash(commitment)
 ```
 
 ## Rationale