bug: init-testnet command - issues with new validators ConsensusPubkey #276

Open
hard-nett opened this issue Mar 5, 2025 · 9 comments

Comments

@hard-nett
Contributor

hard-nett commented Mar 5, 2025

Description

When creating a testnet from mainnet state using bitsongd in-place-testnet sub-1 bitsongvaloper1..., we hit the following error:

expecting cryptotypes.PubKey, got <nil>: invalid type

This happens because the app cannot convert the cached pubkey bytes into the expected type when the in-place-testnet logic calls StakingKeeper.SetValidatorByConsAddr(ctx, newVal), which sets the store key used to retrieve a validator's state by its consensus pubkey address.

The logic that errors in the x/staking module: pk, ok := v.ConsensusPubkey.GetCachedValue().(cryptotypes.PubKey)

We expect this value to be retrievable because in-place-testnet sets the testing validator in the store via StakingKeeper.SetValidator():
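To illustrate why that assertion yields "got <nil>", here is a minimal, self-contained sketch. The types anyValue, PubKey, and edPubKey are hypothetical stand-ins (not the cosmos-sdk types); the only assumption is that the cached value is an interface that stays nil until a value of a known type has been stored in it.

```go
package main

import "fmt"

// PubKey is a hypothetical stand-in for cryptotypes.PubKey.
type PubKey interface {
	Bytes() []byte
}

// edPubKey is a hypothetical concrete key type that implements PubKey.
type edPubKey struct{ key []byte }

func (k edPubKey) Bytes() []byte { return k.key }

// anyValue is a hypothetical stand-in for a packed value: raw bytes plus a
// cache that is only populated once a typed value has been stored.
type anyValue struct {
	value  []byte
	cached interface{}
}

func (a *anyValue) GetCachedValue() interface{} { return a.cached }

// consPubKey mirrors the shape of the failing check: a type assertion on the
// cached value, which fails when the cache is nil.
func consPubKey(a *anyValue) (PubKey, error) {
	pk, ok := a.GetCachedValue().(PubKey)
	if !ok {
		return nil, fmt.Errorf("expecting PubKey, got %T", a.GetCachedValue())
	}
	return pk, nil
}

func main() {
	raw := []byte{0x01, 0x02}

	// Raw bytes are present, but nothing was cached: the assertion fails.
	_, err := consPubKey(&anyValue{value: raw})
	fmt.Println(err)

	// With a typed value cached, the assertion succeeds.
	pk, err := consPubKey(&anyValue{value: raw, cached: edPubKey{key: raw}})
	fmt.Println(pk.Bytes(), err)
}
```

The first call fails even though the raw bytes are present, because the assertion only looks at the cached, typed value.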

	// store value for validator from its operator address
	str, err := StakingKeeper.ValidatorAddressCodec().StringToBytes(validator.GetOperator())
	if err != nil {
		return err
	}
	valKey := stakingtypes.GetValidatorKey(str)

	// stakingstore.Set(valKey, bz)

However, this does not occur.

@hard-nett
Contributor Author

hard-nett commented Mar 5, 2025

Solution (partial):

cosmossdk.io/api/cosmos/crypto/ed25519 was being used to define the type of the new validator's pubkey, but the expected type lives in the github.com/cosmos/cosmos-sdk/crypto/keys/ed25519 package. Replacing this import resolves the incompatibility with the new validator's ConsensusPubkey.

Debugging

A fork with logs to validate the difference in the key set in the store can be found here: https://github.com/permissionlessweb/go-bitsong/blob/v021-iterate-store/go.mod#L236

By installing cometbft with logs and then running the in-place-testnet command, we can also see that the ed25519 pubkey printed by cometbft is the same as the one printed by the bitsong app. What differs is the hex ValidatorAddr.

@simi-dev
Contributor

simi-dev commented Mar 5, 2025

The partial solution implementation triggers a panic when the chain initialization process begins.

panic: failed to reconstruct vote set from extended commit: vote.ValidatorAddress (A835AF519314FBEB50620AE8588DA9078EE10EF3) does not match address (11ADC9BDC95414E7EA944F1680A40177913D123B) for vote.ValidatorIndex (0)
        Ensure the genesis file is correct across all validators: invalid validator address

@hard-nett
Contributor Author

hard-nett commented Mar 5, 2025

The partial solution implementation triggers a panic when the chain initialization process begins.

The address being used in cometbft is being set ala testnetify.

To run the logs:

# download log version of go-bitsong
git clone -b v021-iterate-store https://github.com/permissionlessweb/go-bitsong && cd go-bitsong
# download snapshot 
curl -o bitsong_21243915.tar.lz4 https://snapshot...
# uncompress data
lz4 -c -d bitsong_21243915.tar.lz4 | tar -x -C $HOME/.bitsongd 
# download cometbft
cd ../go-bitsong &&  git clone -b v0.38.17.logs https://github.com/permissionlessweb/cometbft
# build go binary with custom cometbft
go mod tidy && make install && bitsongd in-place-testnet sub-1 <operator-key>  

Running this returns the following error:

vote.ValidatorIndex: 1
ecs.ValidatorAddress.String(): 465BC72482D50DDEDD1B28B90D6C2615E8DB7256
vote.ValidatorAddress.String(): 465BC72482D50DDEDD1B28B90D6C2615E8DB7256
 voteSet.valSet.findProposer().Address.String(): 3CD0634DE5D166D0844075E9D2E2481A8C744565
voteSet.valSet.findProposer().PubKey.Address().String(): 3CD0634DE5D166D0844075E9D2E2481A8C744565
i: 0
val.Address: 3CD0634DE5D166D0844075E9D2E2481A8C744565
val.PubKey.Address().String(): 3CD0634DE5D166D0844075E9D2E2481A8C744565
2:45PM INF Closing application.db module=server
2:45PM INF Closing snapshots/metadata.db module=server
panic: failed to reconstruct vote set from extended commit: cannot find validator 1 in valSet of size 1: invalid validator index

With the use of

bitsongd comet show-address 

I can see that the current validator key is bitsongvalcons18ngxxn0969ndppzqwh5a9cjgr2x8g3t96w5hj3.

This is the same as the voteSet.valSet.findProposer().Address.String() 3CD0634DE5D166D0844075E9D2E2481A8C744565, verified with:

bitsongd keys parse 3CD0634DE5D166D0844075E9D2E2481A8C744565
#  formats:
#  - bitsong18ngxxn0969ndppzqwh5a9cjgr2x8g3t90emzwd
# - bitsongvalcons18ngxxn0969ndppzqwh5a9cjgr2x8g3t96w5hj3
  ...

The key the app receives from the modified last signing commit from the cometbft service is:

bitsongd keys parse 465BC72482D50DDEDD1B28B90D6C2615E8DB7256
#  formats:
#  - bitsong1geduwfyz65xaahgm9zus6mpxzh5dkujkrn3pcl
#  - bitsongvalcons1geduwfyz65xaahgm9zus6mpxzh5dkujkky75yr
  ...

@hard-nett
Contributor Author

hard-nett commented Mar 6, 2025

It looks like this may be because cosmos-sdk and go-bitsong are using different versions of cometbft.

v0.50.11 of cosmos-sdk uses v0.38.12 of cometbft, while go-bitsong uses v0.38.16. The changelog between the two can be found here: cometbft/cometbft@v0.38.12...v0.38.16.

Afaik, this is why we are getting a bug when creating an in-place testnet.

Since we know the ed25519 keys are correct, that leaves the other key type, secp256k1, and the file crypto/secp256k1/secp256k1.go in cometbft was updated between these versions:

cometbft/cometbft@v0.38.12...v0.38.16#diff-5d12562ee810eff7a73c18d3ae4be4d02dfaee662d01cd97584bfc2e727cf337R45

This change seems to involve how the library forms the key bytes, which possibly results in the public key discrepancy.

Solution (Complete)

Fork cosmos-sdk, bump its cometbft dependency to v0.38.16, and replace cosmos-sdk with the fork in go-bitsong

After replacing the cosmos-sdk version bitsong uses, running the in-place testnet application displays that cometbft and the cosmos-sdk application are returning identical pubkeys for the secp256k1 key pair:

# from cosmos-sdk
3:50PM INF This node is a validator addr=3CD0634DE5D166D0844075E9D2E2481A8C744565 module=consensus pubKey=PubKeyEd25519{1B1C130C48770F069A4A876C1D391192CA34111C1A41C080BB50A666F371CE25}
# from cometbft
voteSet.valSet: ValidatorSet{
  Proposer: Validator{3CD0634DE5D166D0844075E9D2E2481A8C744565 PubKeyEd25519{1B1C130C48770F069A4A876C1D391192CA34111C1A41C080BB50A666F371CE25} VP:900000000000000 A:0}
  Validators:
    Validator{3CD0634DE5D166D0844075E9D2E2481A8C744565 PubKeyEd25519{1B1C130C48770F069A4A876C1D391192CA34111C1A41C080BB50A666F371CE25} VP:900000000000000 A:0}
}
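For cross-checking the addr and pubKey values in logs like the ones above: CometBFT derives an ed25519 validator address as the first 20 bytes of the SHA-256 hash of the 32-byte public key. The sketch below uses only the Go standard library; the helper name ed25519Address is made up for this example, and the input is the pubKey hex taken from the log above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// ed25519Address (hypothetical helper) returns the uppercase hex address for a
// hex-encoded 32-byte ed25519 public key: the first 20 bytes of its SHA-256 hash.
func ed25519Address(pubKeyHex string) (string, error) {
	pub, err := hex.DecodeString(pubKeyHex)
	if err != nil {
		return "", err
	}
	if len(pub) != 32 {
		return "", fmt.Errorf("expected a 32-byte key, got %d bytes", len(pub))
	}
	sum := sha256.Sum256(pub)
	return strings.ToUpper(hex.EncodeToString(sum[:20])), nil
}

func main() {
	// pubKey value copied from the log output in this thread.
	addr, err := ed25519Address("1B1C130C48770F069A4A876C1D391192CA34111C1A41C080BB50A666F371CE25")
	if err != nil {
		panic(err)
	}
	// Compare the printed value with the addr field logged next to the pubKey.
	fmt.Println(addr)
}
```

If the printed address differs from the logged addr, the address was not derived from that key in this way, which would point at a key-type or encoding mismatch.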

@antstalepresh

antstalepresh commented Mar 10, 2025

@hard-nett, did you confirm that the testnet runs successfully, without any issues, after bumping the cometbft version for cosmos-sdk?

After reviewing cometbft, I found the source of the issue: when the node starts, there is a mismatch between the last validator set and the validators' commit signatures for the last block of the snapshot.

About 65 signatures were committed to the last block, but the last validator set is automatically set to the local validator only, so its size is 1.

This mismatch likely causes consensus validation errors, as the last commit's signatures do not match the expected validator set.
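The mismatch can be sketched as follows. This is a simplified illustration and not the CometBFT implementation: commitSig, validator, and checkCommit are hypothetical names, and the placeholder addresses are made up. It shows how a commit signed by many validators fails validation against a one-validator set, either on the address or on the index.

```go
package main

import "fmt"

// commitSig and validator are hypothetical, simplified stand-ins; they are not
// the CometBFT types.
type commitSig struct {
	index   int    // validator index recorded with the signature
	address string // validator address recorded with the signature
}

type validator struct {
	address string
}

// checkCommit verifies that every signature refers to a validator that exists
// in valSet at the recorded index and has the recorded address.
func checkCommit(sigs []commitSig, valSet []validator) error {
	for _, sig := range sigs {
		if sig.index < 0 || sig.index >= len(valSet) {
			return fmt.Errorf("cannot find validator %d in valSet of size %d", sig.index, len(valSet))
		}
		if got := valSet[sig.index].address; got != sig.address {
			return fmt.Errorf("signature address %s does not match address %s for index %d", sig.address, got, sig.index)
		}
	}
	return nil
}

func main() {
	// The snapshot's last commit was signed by many mainnet validators (two shown).
	sigs := []commitSig{
		{index: 0, address: "MAINNET_VAL_0"},
		{index: 1, address: "MAINNET_VAL_1"},
	}
	// After in-place-testnet, the validator set holds only the local validator.
	valSet := []validator{{address: "LOCAL_VAL"}}

	fmt.Println(checkCommit(sigs, valSet))     // fails on the address at index 0
	fmt.Println(checkCommit(sigs[1:], valSet)) // fails on the missing index 1
}
```

The two failure modes correspond in shape to the two panics reported earlier in this thread: an address mismatch at index 0, and a validator index that does not exist in a set of size 1.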

@hard-nett
Contributor Author

hard-nett commented Mar 10, 2025

@antstalepresh I am not able to yet without modifying the cometbft logic. However, we expect the testnetify logic to remove the signing information for the last block and replace it here, if I am not mistaken: https://github.com/cosmos/cosmos-sdk/blob/v0.50.11/server/start.go#L880

I am able to update cometbft so that, if the valSet size is 1 and we run into the error cannot find validator 1, it attempts to look up the validator at index valIndex-1. This allows me to create a testnet from a single validator here: https://github.com/permissionlessweb/cometbft/blob/v0.38.17.logs/types/vote_set.go#L193
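A rough sketch of that fallback, for illustration only: lookupWithFallback is a hypothetical helper, not the code in the linked fork, and it represents the validator set as a plain slice of addresses. It is a debugging workaround rather than a fix, since it masks the mismatch between the commit and the validator set.

```go
package main

import "fmt"

// lookupWithFallback (hypothetical) returns the validator address at idx. As a
// debugging-only workaround, when the set holds exactly one validator and idx
// is out of range, it retries with idx-1.
func lookupWithFallback(valSet []string, idx int) (string, error) {
	if idx >= 0 && idx < len(valSet) {
		return valSet[idx], nil
	}
	if prev := idx - 1; len(valSet) == 1 && prev >= 0 && prev < len(valSet) {
		return valSet[prev], nil
	}
	return "", fmt.Errorf("cannot find validator %d in valSet of size %d", idx, len(valSet))
}

func main() {
	// Address of the local validator as printed in the logs in this thread.
	valSet := []string{"3CD0634DE5D166D0844075E9D2E2481A8C744565"}

	fmt.Println(lookupWithFallback(valSet, 0)) // direct hit
	fmt.Println(lookupWithFallback(valSet, 1)) // falls back to index 0
	fmt.Println(lookupWithFallback(valSet, 2)) // still an error
}
```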

@angelorc
Collaborator

The issue was already solved by another dev. He will push the changes asap

@hard-nett
Contributor Author

The issue was already solved by another dev. He will push the changes asap

great!

@antstalepresh

The issue was already solved by another dev. He will push the changes asap

great!

Fyi, here is the PR for the solution.
#279
