Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flaky - TestP2PWithMultipleDocumentsWithUpdateAndDeleteBeforeConnectSingleDeleteWithShowDeleted #1338

Closed
shahzadlone opened this issue Apr 12, 2023 · 3 comments · Fixed by #1388
Assignees
Labels
bug Something isn't working

Comments

@shahzadlone
Copy link
Member

Describe the problem
Flakey test.

To Reproduce
make test

Expected behavior
Not be flaky

Platform

latest commit on develop:
bbe8408a (HEAD -> develop, origin/develop) 2 hours ago fix: API address parameter validation (#1311)

Additional context

=== Failed
=== FAIL: tests/integration/net/state/simple/peer TestP2PWithMultipleDocumentsWithUpdateAndDeleteBeforeConnectSingleDeleteWithShowDeleted (11.35s)
2023-04-12T11:22:09.996-0400    ERROR   dht     go-libp2p-kad-dht@v0.21.1/dht.go:655    failed to validate if peer is a DHT peer        {"peer": "12D3KooWQ8jR6HdUsPYZ3wPKs9SpaYT2giX6Hx2dsDfjm4DsVHAq", "error": "datastore closed", "errorVerbose": "datastore closed. Stack: /home/shahzad/~~/defradb/datastore/memory/errors.go:20 (0x1c62407)\n\tinit: ErrClosed      = errors.New(\"datastore closed\")\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:6329 (0x482566)\n\tdoInit: f()\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:6306 (0x4824b1)\n\tdoInit: doInit(t2)\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:6306 (0x4824b1)\n\tdoInit: doInit(t2)\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:6306 (0x4824b1)\n\tdoInit: doInit(t2)\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:233 (0x474db3)\n\tmain: doInit(&main_inittask)\n/home/shahzad/.local/bin/go1.19.5/src/runtime/asm_amd64.s:1594 (0x4a90a1)\n\tgoexit: BYTE\t$0x90\t// NOP\n"}
2023-04-12T11:22:09.996-0400, INFO, defra.tests.integration, , {"Database": "badger-in-memory"}
2023-04-12T11:22:10.003-0400    ERROR   dht     go-libp2p-kad-dht@v0.21.1/dht.go:655    failed to validate if peer is a DHT peer        {"peer": "12D3KooWEStTWTjAoonnsKR2mPRNRhFyv6JJL5UPhM4Z1WTuBdrg", "error": "datastore closed", "errorVerbose": "datastore closed. Stack: /home/shahzad/~~/defradb/datastore/memory/errors.go:20 (0x1c62407)\n\tinit: ErrClosed      = errors.New(\"datastore closed\")\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:6329 (0x482566)\n\tdoInit: f()\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:6306 (0x4824b1)\n\tdoInit: doInit(t2)\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:6306 (0x4824b1)\n\tdoInit: doInit(t2)\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:6306 (0x4824b1)\n\tdoInit: doInit(t2)\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:233 (0x474db3)\n\tmain: doInit(&main_inittask)\n/home/shahzad/.local/bin/go1.19.5/src/runtime/asm_amd64.s:1594 (0x4a90a1)\n\tgoexit: BYTE\t$0x90\t// NOP\n"}
2023-04-12T11:22:10.029-0400, INFO, defra.tests.integration, Starting P2P node, {"P2P address": "/ip4/0.0.0.0/tcp/0"}
2023-04-12T11:22:10.125-0400, INFO, defra.node, Created LibP2P host, {"PeerId": "12D3KooWHFDJ4fiDs5yoDvQgmLhN5NZDRJXLupiPPsHPTKQTLp2Z", "Address": ["/ip4/0.0.0.0/tcp/0"]}
2023-04-12T11:22:10.137-0400, INFO, defra.net, Starting internal broadcaster for pubsub network
2023-04-12T11:22:10.188-0400, INFO, defra.tests.integration, Starting P2P node, {"P2P address": "/ip4/0.0.0.0/tcp/0"}
2023-04-12T11:22:10.252-0400, INFO, defra.node, Created LibP2P host, {"PeerId": "12D3KooWJWvf29EEKbm49XuvQ4x2LbbWG4BpKweUabgJMALsszcw", "Address": ["/ip4/0.0.0.0/tcp/0"]}
2023-04-12T11:22:10.264-0400, INFO, defra.net, Starting internal broadcaster for pubsub network
2023-04-12T11:22:10.330-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeiaggvx7n2alz7aaronjwebcbrnh4ajnm7vkd2c7plotg5czill54m", "CID": "bafybeif5apzvwrk5jnvtbjmruq6uqw2ujvsdo424wtauxanekzgushp4y4", "Height": 2}
2023-04-12T11:22:10.331-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeieibjbtlp633yis27ycaquu3bx7ywxmij5apcylm4bsa5xbignwaa", "CID": "bafybeigjsmbmbs5oo6623sedrczuxxz4k57cliupa4noxxt47zi2mx46my", "Height": 2}
2023-04-12T11:22:10.334-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeif5apzvwrk5jnvtbjmruq6uqw2ujvsdo424wtauxanekzgushp4y4", "CID": "bafybeiaf4dgpsrwwwuew4avlqxgadhttcgbirg6m44ebkwofrbb366yvnu", "Height": 3}
2023-04-12T11:22:10.335-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeigjsmbmbs5oo6623sedrczuxxz4k57cliupa4noxxt47zi2mx46my", "CID": "bafybeidxn3ctwtti4e542icqonlua6llh74qqblgbllyvmyxvnkm77skcu", "Height": 3}
2023-04-12T11:22:10.339-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeidxn3ctwtti4e542icqonlua6llh74qqblgbllyvmyxvnkm77skcu", "CID": "bafybeiemznzwy4rehry372qdckm3ko5sjib4q22rhg5ij6k43qhlckoycm", "Height": 4}
2023-04-12T11:22:10.340-0400, INFO, defra.tests.integration, Parsing bootstrap peers, {"Peers": "/ip4/172.27.231.49/tcp/44627/p2p/12D3KooWJWvf29EEKbm49XuvQ4x2LbbWG4BpKweUabgJMALsszcw"}
2023-04-12T11:22:10.340-0400, INFO, defra.tests.integration, Bootstrapping with peers, {"Addresses": [{"ID":"12D3KooWJWvf29EEKbm49XuvQ4x2LbbWG4BpKweUabgJMALsszcw","Addrs":["/ip4/172.27.231.49/tcp/44627"]}]}
2023-04-12T11:22:10.376-0400, INFO, defra.node, Connected, {"Peer ID": "12D3KooWJWvf29EEKbm49XuvQ4x2LbbWG4BpKweUabgJMALsszcw"}
2023-04-12T11:22:10.381-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWHFDJ4fiDs5yoDvQgmLhN5NZDRJXLupiPPsHPTKQTLp2Z", "Topic": "bae-b4a08ad3-8970-5511-8048-ba1b873008dc", "Message": "JOINED"}
2023-04-12T11:22:10.381-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWHFDJ4fiDs5yoDvQgmLhN5NZDRJXLupiPPsHPTKQTLp2Z", "Topic": "bae-e45fa288-797b-5b12-b08a-f673f70b7ee5", "Message": "JOINED"}
2023-04-12T11:22:10.390-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWJWvf29EEKbm49XuvQ4x2LbbWG4BpKweUabgJMALsszcw", "Topic": "bae-b4a08ad3-8970-5511-8048-ba1b873008dc", "Message": "JOINED"}
2023-04-12T11:22:10.391-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWJWvf29EEKbm49XuvQ4x2LbbWG4BpKweUabgJMALsszcw", "Topic": "bae-e45fa288-797b-5b12-b08a-f673f70b7ee5", "Message": "JOINED"}
2023-04-12T11:22:10.478-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeiaggvx7n2alz7aaronjwebcbrnh4ajnm7vkd2c7plotg5czill54m", "CID": "bafybeiau3szayl6r2ad62ylte7jkcsxndvx3z7gth4p55wgi5eihlkehzu", "Height": 2}
2023-04-12T11:22:10.479-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeieibjbtlp633yis27ycaquu3bx7ywxmij5apcylm4bsa5xbignwaa", "CID": "bafybeigr5ehntqvbopvumjaaxbl7j2zcocijsjydzkn32i7b55vsrdblne", "Height": 2}
2023-04-12T11:22:10.503-0400, INFO, defra.tests.integration, , {"RequestResults": [{"Age":74,"Name":"Andy","_deleted":false},{"Age":62,"Name":"John","_deleted":true}]}
2023-04-12T11:22:10.506-0400, INFO, defra.tests.integration, , {"RequestResults": [{"Age":74,"Name":"Andy","_deleted":false},{"Age":66,"Name":"John","_deleted":false}]}
2023-04-12T11:22:10.508-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWHFDJ4fiDs5yoDvQgmLhN5NZDRJXLupiPPsHPTKQTLp2Z", "Topic": "bae-b4a08ad3-8970-5511-8048-ba1b873008dc", "Message": "LEFT"}
2023-04-12T11:22:10.509-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWHFDJ4fiDs5yoDvQgmLhN5NZDRJXLupiPPsHPTKQTLp2Z", "Topic": "bae-e45fa288-797b-5b12-b08a-f673f70b7ee5", "Message": "LEFT"}
2023-04-12T11:22:10.511-0400, INFO, defra.db, Closing DefraDB process...
2023-04-12T11:22:10.642-0400, INFO, defra.db, Successfully closed running process
2023-04-12T11:22:10.642-0400    ERROR   peerstore/ds    pstoreds/keybook.go:63  error when fetching pubkey from datastore for peer 12D3KooWHFDJ4fiDs5yoDvQgmLhN5NZDRJXLupiPPsHPTKQTLp2Z: datastore closed

2023-04-12T11:22:10.653-0400, INFO, defra.db, Closing DefraDB process...
2023-04-12T11:22:10.663-0400    ERROR   dht     go-libp2p-kad-dht@v0.21.1/dht.go:655    failed to validate if peer is a DHT peer        {"peer": "12D3KooWJWvf29EEKbm49XuvQ4x2LbbWG4BpKweUabgJMALsszcw", "error": "datastore closed", "errorVerbose": "datastore closed. Stack: /home/shahzad/~~/defradb/datastore/badger/v3/datastore.go:28 (0x1074487)\n\tinit: ErrClosed      = errors.New(\"datastore closed\")\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:6329 (0x482566)\n\tdoInit: f()\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:6306 (0x4824b1)\n\tdoInit: doInit(t2)\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:6306 (0x4824b1)\n\tdoInit: doInit(t2)\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:6306 (0x4824b1)\n\tdoInit: doInit(t2)\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:6306 (0x4824b1)\n\tdoInit: doInit(t2)\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:233 (0x474db3)\n\tmain: doInit(&main_inittask)\n/home/shahzad/.local/bin/go1.19.5/src/runtime/asm_amd64.s:1594 (0x4a90a1)\n\tgoexit: BYTE\t$0x90\t// NOP\n"}
2023-04-12T11:22:10.664-0400    ERROR   dht     go-libp2p-kad-dht@v0.21.1/subscriber_notifee.go:108     could not check peerstore for protocol support: err: datastore closed
2023-04-12T11:22:10.665-0400    ERROR   peerstore/ds    pstoreds/keybook.go:63  error when fetching pubkey from datastore for peer 12D3KooWJWvf29EEKbm49XuvQ4x2LbbWG4BpKweUabgJMALsszcw: datastore closed

2023-04-12T11:22:10.666-0400    ERROR   dht     go-libp2p-kad-dht@v0.21.1/subscriber_notifee.go:108     could not check peerstore for protocol support: err: datastore closed
2023-04-12T11:22:10.669-0400    ERROR   peerstore/ds    pstoreds/keybook.go:81  error while updating pubkey in datastore for peer 12D3KooWJWvf29EEKbm49XuvQ4x2LbbWG4BpKweUabgJMALsszcw: datastore closed

2023-04-12T11:22:10.832-0400    ERROR   dht     go-libp2p-kad-dht@v0.21.1/dht.go:655    failed to validate if peer is a DHT peer        {"peer": "12D3KooWHFDJ4fiDs5yoDvQgmLhN5NZDRJXLupiPPsHPTKQTLp2Z", "error": "datastore closed", "errorVerbose": "datastore closed. Stack: /home/shahzad/~~/defradb/datastore/badger/v3/datastore.go:28 (0x1074487)\n\tinit: ErrClosed      = errors.New(\"datastore closed\")\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:6329 (0x482566)\n\tdoInit: f()\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:6306 (0x4824b1)\n\tdoInit: doInit(t2)\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:6306 (0x4824b1)\n\tdoInit: doInit(t2)\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:6306 (0x4824b1)\n\tdoInit: doInit(t2)\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:6306 (0x4824b1)\n\tdoInit: doInit(t2)\n/home/shahzad/.local/bin/go1.19.5/src/runtime/proc.go:233 (0x474db3)\n\tmain: doInit(&main_inittask)\n/home/shahzad/.local/bin/go1.19.5/src/runtime/asm_amd64.s:1594 (0x4a90a1)\n\tgoexit: BYTE\t$0x90\t// NOP\n"}
2023-04-12T11:22:10.832-0400, INFO, defra.db, Successfully closed running process
2023-04-12T11:22:10.834-0400, INFO, defra.tests.integration, , {"Database": "defra-memory-datastore"}
2023-04-12T11:22:10.836-0400, INFO, defra.tests.integration, Starting P2P node, {"P2P address": "/ip4/0.0.0.0/tcp/0"}
2023-04-12T11:22:10.866-0400, INFO, defra.node, Created LibP2P host, {"PeerId": "12D3KooWAYLeSA1z51NqQd1ci639iQZHz3uiBhVFHsr8mfK6i4bM", "Address": ["/ip4/0.0.0.0/tcp/0"]}
2023-04-12T11:22:10.869-0400, INFO, defra.net, Starting internal broadcaster for pubsub network
2023-04-12T11:22:10.872-0400, INFO, defra.tests.integration, Starting P2P node, {"P2P address": "/ip4/0.0.0.0/tcp/0"}
2023-04-12T11:22:10.902-0400, INFO, defra.node, Created LibP2P host, {"PeerId": "12D3KooWGyAE6mf3dyDXnqMBVk32WEUvhSNaj3ku58Tdvn38HdE4", "Address": ["/ip4/0.0.0.0/tcp/0"]}
2023-04-12T11:22:10.911-0400, INFO, defra.net, Starting internal broadcaster for pubsub network
2023-04-12T11:22:10.940-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeiaggvx7n2alz7aaronjwebcbrnh4ajnm7vkd2c7plotg5czill54m", "CID": "bafybeif5apzvwrk5jnvtbjmruq6uqw2ujvsdo424wtauxanekzgushp4y4", "Height": 2}
2023-04-12T11:22:10.941-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeieibjbtlp633yis27ycaquu3bx7ywxmij5apcylm4bsa5xbignwaa", "CID": "bafybeigjsmbmbs5oo6623sedrczuxxz4k57cliupa4noxxt47zi2mx46my", "Height": 2}
2023-04-12T11:22:10.942-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeif5apzvwrk5jnvtbjmruq6uqw2ujvsdo424wtauxanekzgushp4y4", "CID": "bafybeiaf4dgpsrwwwuew4avlqxgadhttcgbirg6m44ebkwofrbb366yvnu", "Height": 3}
2023-04-12T11:22:10.943-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeigjsmbmbs5oo6623sedrczuxxz4k57cliupa4noxxt47zi2mx46my", "CID": "bafybeidxn3ctwtti4e542icqonlua6llh74qqblgbllyvmyxvnkm77skcu", "Height": 3}
2023-04-12T11:22:10.945-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeidxn3ctwtti4e542icqonlua6llh74qqblgbllyvmyxvnkm77skcu", "CID": "bafybeiemznzwy4rehry372qdckm3ko5sjib4q22rhg5ij6k43qhlckoycm", "Height": 4}
2023-04-12T11:22:10.945-0400, INFO, defra.tests.integration, Parsing bootstrap peers, {"Peers": "/ip4/172.27.231.49/tcp/37733/p2p/12D3KooWGyAE6mf3dyDXnqMBVk32WEUvhSNaj3ku58Tdvn38HdE4"}
2023-04-12T11:22:10.945-0400, INFO, defra.tests.integration, Bootstrapping with peers, {"Addresses": [{"ID":"12D3KooWGyAE6mf3dyDXnqMBVk32WEUvhSNaj3ku58Tdvn38HdE4","Addrs":["/ip4/172.27.231.49/tcp/37733"]}]}
2023-04-12T11:22:10.997-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWGyAE6mf3dyDXnqMBVk32WEUvhSNaj3ku58Tdvn38HdE4", "Topic": "bae-e45fa288-797b-5b12-b08a-f673f70b7ee5", "Message": "JOINED"}
2023-04-12T11:22:10.997-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWGyAE6mf3dyDXnqMBVk32WEUvhSNaj3ku58Tdvn38HdE4", "Topic": "bae-b4a08ad3-8970-5511-8048-ba1b873008dc", "Message": "JOINED"}
2023-04-12T11:22:10.997-0400, INFO, defra.node, Connected, {"Peer ID": "12D3KooWGyAE6mf3dyDXnqMBVk32WEUvhSNaj3ku58Tdvn38HdE4"}
2023-04-12T11:22:10.998-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWAYLeSA1z51NqQd1ci639iQZHz3uiBhVFHsr8mfK6i4bM", "Topic": "bae-e45fa288-797b-5b12-b08a-f673f70b7ee5", "Message": "JOINED"}
2023-04-12T11:22:10.998-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWAYLeSA1z51NqQd1ci639iQZHz3uiBhVFHsr8mfK6i4bM", "Topic": "bae-b4a08ad3-8970-5511-8048-ba1b873008dc", "Message": "JOINED"}
2023-04-12T11:22:11.034-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeiaggvx7n2alz7aaronjwebcbrnh4ajnm7vkd2c7plotg5czill54m", "CID": "bafybeif5apzvwrk5jnvtbjmruq6uqw2ujvsdo424wtauxanekzgushp4y4", "Height": 2}
2023-04-12T11:22:11.035-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeieibjbtlp633yis27ycaquu3bx7ywxmij5apcylm4bsa5xbignwaa", "CID": "bafybeigjsmbmbs5oo6623sedrczuxxz4k57cliupa4noxxt47zi2mx46my", "Height": 2}
    p2p.go:476:
                Error Trace:    /home/shahzad/~~/defradb/tests/integration/p2p.go:476
                                                        /home/shahzad/~~/defradb/tests/integration/utils2.go:396
                                                        /home/shahzad/~~/defradb/tests/integration/utils2.go:289
                                                        /home/shahzad/~~/defradb/tests/integration/net/state/simple/peer/with_delete_test.go:409
                Error:          timeout occured while waiting for data stream
                Test:           TestP2PWithMultipleDocumentsWithUpdateAndDeleteBeforeConnectSingleDeleteWithShowDeleted
    p2p.go:246:
                Error Trace:    /home/shahzad/~~/defradb/tests/integration/p2p.go:246
                                                        /home/shahzad/.local/bin/go1.19.5/src/runtime/asm_amd64.s:1594
                Error:          Received unexpected error:
                                waiting for pushlog timed out. Stack: /home/shahzad/~~/defradb/node/node.go:330 (0x1c28bf9)
                                        (*Node).WaitForPushLogByPeerEvent: return errors.New("waiting for pushlog timed out")
                                /home/shahzad/~~/defradb/tests/integration/p2p.go:245 (0x1e13f8e)
                                        connectPeers.func1: err := sourceNode.WaitForPushLogByPeerEvent(targetNode.PeerID())
                                /home/shahzad/.local/bin/go1.19.5/src/runtime/asm_amd64.s:1594 (0x4a90a1)
                                        goexit: BYTE    $0x90   // NOP
                Test:           TestP2PWithMultipleDocumentsWithUpdateAndDeleteBeforeConnectSingleDeleteWithShowDeleted
2023-04-12T11:22:21.184-0400, INFO, defra.tests.integration, , {"RequestResults": [{"Age":74,"Name":"Andy","_deleted":false},{"Age":62,"Name":"John","_deleted":true}]}
2023-04-12T11:22:21.210-0400, INFO, defra.tests.integration, , {"RequestResults": [{"Age":74,"Name":"Andy","_deleted":false},{"Age":62,"Name":"John","_deleted":true}]}
    utils2.go:1121:
                Error Trace:    /home/shahzad/~~/defradb/tests/integration/utils2.go:1121
                                                        /home/shahzad/~~/defradb/tests/integration/utils2.go:933
                                                        /home/shahzad/~~/defradb/tests/integration/utils2.go:390
                                                        /home/shahzad/~~/defradb/tests/integration/utils2.go:289
                                                        /home/shahzad/~~/defradb/tests/integration/net/state/simple/peer/with_delete_test.go:409
                Error:          Not equal:
                                expected: false
                                actual  : true
                Test:           TestP2PWithMultipleDocumentsWithUpdateAndDeleteBeforeConnectSingleDeleteWithShowDeleted
                Messages:       node: 0, doc: 1
    utils2.go:1121:
                Error Trace:    /home/shahzad/~~/defradb/tests/integration/utils2.go:1121
                                                        /home/shahzad/~~/defradb/tests/integration/utils2.go:933
                                                        /home/shahzad/~~/defradb/tests/integration/utils2.go:390
                                                        /home/shahzad/~~/defradb/tests/integration/utils2.go:289
                                                        /home/shahzad/~~/defradb/tests/integration/net/state/simple/peer/with_delete_test.go:409
                Error:          Not equal:
                                expected: 0x42
                                actual  : 0x3e
                Test:           TestP2PWithMultipleDocumentsWithUpdateAndDeleteBeforeConnectSingleDeleteWithShowDeleted
                Messages:       node: 0, doc: 1
2023-04-12T11:22:21.260-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWAYLeSA1z51NqQd1ci639iQZHz3uiBhVFHsr8mfK6i4bM", "Topic": "bae-e45fa288-797b-5b12-b08a-f673f70b7ee5", "Message": "LEFT"}
2023-04-12T11:22:21.283-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWAYLeSA1z51NqQd1ci639iQZHz3uiBhVFHsr8mfK6i4bM", "Topic": "bae-b4a08ad3-8970-5511-8048-ba1b873008dc", "Message": "LEFT"}
2023-04-12T11:22:21.323-0400, INFO, defra.db, Closing DefraDB process...
2023-04-12T11:22:21.323-0400, INFO, defra.db, Successfully closed running process
2023-04-12T11:22:21.338-0400, INFO, defra.db, Closing DefraDB process...
2023-04-12T11:22:21.339-0400, INFO, defra.db, Successfully closed running process

@shahzadlone shahzadlone added the bug Something isn't working label Apr 12, 2023
@shahzadlone shahzadlone added this to the DefraDB v0.5.1 milestone Apr 12, 2023
@AndrewSisley
Copy link
Contributor

AndrewSisley commented Apr 13, 2023

Spotted something today in the log below. Comparing the two datastore type runs, you can see the first (badger-IM) fails, and the second (defra-IM) succeeds. Perhaps significant is the order in which the DAG heads are replaced - the failed run replaces
"bafybeidxn3ctwtti4e542icqonlua6llh74qqblgbllyvmyxvnkm77skcu" with "bafybeiemznzwy4rehry372qdckm3ko5sjib4q22rhg5ij6k43qhlckoycm" last, but in the successful run it is replaced first.

2023-04-13T17:52:45.051-0400, INFO, defra.tests.integration, , {"Database": "badger-in-memory"}
2023-04-13T17:52:45.203-0400, INFO, defra.tests.integration, Starting P2P node, {"P2P address": "/ip4/0.0.0.0/tcp/0"}
2023-04-13T17:52:45.209-0400, INFO, defra.node, Created LibP2P host, {"PeerId": "12D3KooWFPAFkR8nspDn3BKBUTtTNgTjcSWHFyNtp15PiMaPPCGa", "Address": ["/ip4/0.0.0.0/tcp/0"]}
2023-04-13T17:52:45.212-0400, INFO, defra.net, Starting internal broadcaster for pubsub network
2023-04-13T17:52:45.222-0400, INFO, defra.tests.integration, Starting P2P node, {"P2P address": "/ip4/0.0.0.0/tcp/0"}
2023-04-13T17:52:45.241-0400, INFO, defra.node, Created LibP2P host, {"PeerId": "12D3KooWFiqvYK39A6WW65hZJMjmDUW2TJFf9KsydG3CUZx3jPiQ", "Address": ["/ip4/0.0.0.0/tcp/0"]}
2023-04-13T17:52:45.243-0400, INFO, defra.net, Starting internal broadcaster for pubsub network
2023-04-13T17:52:45.249-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeiaggvx7n2alz7aaronjwebcbrnh4ajnm7vkd2c7plotg5czill54m", "CID": "bafybeif5apzvwrk5jnvtbjmruq6uqw2ujvsdo424wtauxanekzgushp4y4", "Height": 2}
2023-04-13T17:52:45.249-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeieibjbtlp633yis27ycaquu3bx7ywxmij5apcylm4bsa5xbignwaa", "CID": "bafybeigjsmbmbs5oo6623sedrczuxxz4k57cliupa4noxxt47zi2mx46my", "Height": 2}
2023-04-13T17:52:45.249-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeif5apzvwrk5jnvtbjmruq6uqw2ujvsdo424wtauxanekzgushp4y4", "CID": "bafybeiaf4dgpsrwwwuew4avlqxgadhttcgbirg6m44ebkwofrbb366yvnu", "Height": 3}
2023-04-13T17:52:45.250-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeigjsmbmbs5oo6623sedrczuxxz4k57cliupa4noxxt47zi2mx46my", "CID": "bafybeidxn3ctwtti4e542icqonlua6llh74qqblgbllyvmyxvnkm77skcu", "Height": 3}
2023-04-13T17:52:45.250-0400, INFO, defra.tests.integration, Parsing bootstrap peers, {"Peers": "/ip4/10.88.111.10/tcp/45119/p2p/12D3KooWFiqvYK39A6WW65hZJMjmDUW2TJFf9KsydG3CUZx3jPiQ"}
2023-04-13T17:52:45.250-0400, INFO, defra.tests.integration, Bootstrapping with peers, {"Addresses": [{"ID":"12D3KooWFiqvYK39A6WW65hZJMjmDUW2TJFf9KsydG3CUZx3jPiQ","Addrs":["/ip4/10.88.111.10/tcp/45119"]}]}
2023-04-13T17:52:45.255-0400, INFO, defra.node, Connected, {"Peer ID": "12D3KooWFiqvYK39A6WW65hZJMjmDUW2TJFf9KsydG3CUZx3jPiQ"}
2023-04-13T17:52:45.256-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWFPAFkR8nspDn3BKBUTtTNgTjcSWHFyNtp15PiMaPPCGa", "Topic": "bae-e45fa288-797b-5b12-b08a-f673f70b7ee5", "Message": "JOINED"}
2023-04-13T17:52:45.256-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWFPAFkR8nspDn3BKBUTtTNgTjcSWHFyNtp15PiMaPPCGa", "Topic": "bae-b4a08ad3-8970-5511-8048-ba1b873008dc", "Message": "JOINED"}
2023-04-13T17:52:45.257-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWFiqvYK39A6WW65hZJMjmDUW2TJFf9KsydG3CUZx3jPiQ", "Topic": "bae-b4a08ad3-8970-5511-8048-ba1b873008dc", "Message": "JOINED"}
2023-04-13T17:52:45.257-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWFiqvYK39A6WW65hZJMjmDUW2TJFf9KsydG3CUZx3jPiQ", "Topic": "bae-e45fa288-797b-5b12-b08a-f673f70b7ee5", "Message": "JOINED"}
2023-04-13T17:52:45.288-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeieibjbtlp633yis27ycaquu3bx7ywxmij5apcylm4bsa5xbignwaa", "CID": "bafybeigjsmbmbs5oo6623sedrczuxxz4k57cliupa4noxxt47zi2mx46my", "Height": 2}
2023-04-13T17:52:45.288-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeiaggvx7n2alz7aaronjwebcbrnh4ajnm7vkd2c7plotg5czill54m", "CID": "bafybeif5apzvwrk5jnvtbjmruq6uqw2ujvsdo424wtauxanekzgushp4y4", "Height": 2}
2023-04-13T17:52:45.358-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeidxn3ctwtti4e542icqonlua6llh74qqblgbllyvmyxvnkm77skcu", "CID": "bafybeiemznzwy4rehry372qdckm3ko5sjib4q22rhg5ij6k43qhlckoycm", "Height": 4}
2023-04-13T17:52:45.360-0400, INFO, defra.tests.integration, , {"RequestResults": [{"Age":74,"Name":"Andy","_deleted":false},{"Age":62,"Name":"John","_deleted":true}]}
2023-04-13T17:52:45.360-0400, INFO, defra.tests.integration, , {"RequestResults": [{"Age":74,"Name":"Andy","_deleted":false},{"Age":62,"Name":"John","_deleted":false}]}
2023-04-13T17:52:45.361-0400, INFO, defra.db, Closing DefraDB process...
2023-04-13T17:52:45.363-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWFPAFkR8nspDn3BKBUTtTNgTjcSWHFyNtp15PiMaPPCGa", "Topic": "bae-b4a08ad3-8970-5511-8048-ba1b873008dc", "Message": "LEFT"}
2023-04-13T17:52:45.364-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWFPAFkR8nspDn3BKBUTtTNgTjcSWHFyNtp15PiMaPPCGa", "Topic": "bae-e45fa288-797b-5b12-b08a-f673f70b7ee5", "Message": "LEFT"}
2023-04-13T17:52:45.368-0400, INFO, defra.db, Successfully closed running process
2023-04-13T17:52:45.369-0400	ERROR	dht	go-libp2p-kad-dht@v0.21.1/dht.go:655	failed to validate if peer is a DHT peer	{"peer": "12D3KooWFiqvYK39A6WW65hZJMjmDUW2TJFf9KsydG3CUZx3jPiQ", "error": "datastore closed", "errorVerbose": "datastore closed. Stack: /home/andy/Projects/Source/defradb/datastore/badger/v3/datastore.go:28 (0xb9959f)\n\tinit: ErrClosed      = errors.New(\"datastore closed\")\n/usr/local/go/src/runtime/proc.go:6329 (0x44bf86)\n\tdoInit: f()\n/usr/local/go/src/runtime/proc.go:6306 (0x44bed1)\n\tdoInit: doInit(t2)\n/usr/local/go/src/runtime/proc.go:6306 (0x44bed1)\n\tdoInit: doInit(t2)\n/usr/local/go/src/runtime/proc.go:6306 (0x44bed1)\n\tdoInit: doInit(t2)\n/usr/local/go/src/runtime/proc.go:6306 (0x44bed1)\n\tdoInit: doInit(t2)\n/usr/local/go/src/runtime/proc.go:233 (0x43eb93)\n\tmain: doInit(&main_inittask)\n/usr/local/go/src/runtime/asm_amd64.s:1594 (0x470e81)\n\tgoexit: BYTE\t$0x90\t// NOP\n"}
2023-04-13T17:52:45.369-0400, INFO, defra.db, Closing DefraDB process...
2023-04-13T17:52:45.370-0400	ERROR	peerstore/ds	pstoreds/keybook.go:63	error when fetching pubkey from datastore for peer 12D3KooWFiqvYK39A6WW65hZJMjmDUW2TJFf9KsydG3CUZx3jPiQ: datastore closed

2023-04-13T17:52:45.370-0400	ERROR	peerstore/ds	pstoreds/keybook.go:81	error while updating pubkey in datastore for peer 12D3KooWFiqvYK39A6WW65hZJMjmDUW2TJFf9KsydG3CUZx3jPiQ: datastore closed

2023-04-13T17:52:45.370-0400	ERROR	dht	go-libp2p-kad-dht@v0.21.1/subscriber_notifee.go:108	could not check peerstore for protocol support: err: datastore closed
2023-04-13T17:52:45.370-0400	ERROR	dht	go-libp2p-kad-dht@v0.21.1/subscriber_notifee.go:108	could not check peerstore for protocol support: err: datastore closed
2023-04-13T17:52:45.374-0400, INFO, defra.db, Successfully closed running process
2023-04-13T17:52:45.374-0400, INFO, defra.tests.integration, , {"Database": "defra-memory-datastore"}
2023-04-13T17:52:45.374-0400, INFO, defra.tests.integration, Starting P2P node, {"P2P address": "/ip4/0.0.0.0/tcp/0"}
2023-04-13T17:52:45.396-0400, INFO, defra.node, Created LibP2P host, {"PeerId": "12D3KooWD6VbHkt8WDCeQf6fhu4zik9aY4pwDFuLW5ZcLjxk4eNg", "Address": ["/ip4/0.0.0.0/tcp/0"]}
2023-04-13T17:52:45.397-0400, INFO, defra.net, Starting internal broadcaster for pubsub network
2023-04-13T17:52:45.398-0400, INFO, defra.tests.integration, Starting P2P node, {"P2P address": "/ip4/0.0.0.0/tcp/0"}
2023-04-13T17:52:45.403-0400, INFO, defra.node, Created LibP2P host, {"PeerId": "12D3KooWRs7XhEWZtjkgbuVuRi8ZVUVZV56fxnhpjjJnEZKgX24z", "Address": ["/ip4/0.0.0.0/tcp/0"]}
2023-04-13T17:52:45.404-0400, INFO, defra.net, Starting internal broadcaster for pubsub network
2023-04-13T17:52:45.408-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeiaggvx7n2alz7aaronjwebcbrnh4ajnm7vkd2c7plotg5czill54m", "CID": "bafybeif5apzvwrk5jnvtbjmruq6uqw2ujvsdo424wtauxanekzgushp4y4", "Height": 2}
2023-04-13T17:52:45.408-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeieibjbtlp633yis27ycaquu3bx7ywxmij5apcylm4bsa5xbignwaa", "CID": "bafybeigjsmbmbs5oo6623sedrczuxxz4k57cliupa4noxxt47zi2mx46my", "Height": 2}
2023-04-13T17:52:45.409-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeif5apzvwrk5jnvtbjmruq6uqw2ujvsdo424wtauxanekzgushp4y4", "CID": "bafybeiaf4dgpsrwwwuew4avlqxgadhttcgbirg6m44ebkwofrbb366yvnu", "Height": 3}
2023-04-13T17:52:45.409-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeigjsmbmbs5oo6623sedrczuxxz4k57cliupa4noxxt47zi2mx46my", "CID": "bafybeidxn3ctwtti4e542icqonlua6llh74qqblgbllyvmyxvnkm77skcu", "Height": 3}
2023-04-13T17:52:45.409-0400, INFO, defra.tests.integration, Parsing bootstrap peers, {"Peers": "/ip4/10.88.111.10/tcp/40365/p2p/12D3KooWRs7XhEWZtjkgbuVuRi8ZVUVZV56fxnhpjjJnEZKgX24z"}
2023-04-13T17:52:45.409-0400, INFO, defra.tests.integration, Bootstrapping with peers, {"Addresses": [{"ID":"12D3KooWRs7XhEWZtjkgbuVuRi8ZVUVZV56fxnhpjjJnEZKgX24z","Addrs":["/ip4/10.88.111.10/tcp/40365"]}]}
2023-04-13T17:52:45.412-0400, INFO, defra.node, Connected, {"Peer ID": "12D3KooWRs7XhEWZtjkgbuVuRi8ZVUVZV56fxnhpjjJnEZKgX24z"}
2023-04-13T17:52:45.413-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWRs7XhEWZtjkgbuVuRi8ZVUVZV56fxnhpjjJnEZKgX24z", "Topic": "bae-b4a08ad3-8970-5511-8048-ba1b873008dc", "Message": "JOINED"}
2023-04-13T17:52:45.413-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWRs7XhEWZtjkgbuVuRi8ZVUVZV56fxnhpjjJnEZKgX24z", "Topic": "bae-e45fa288-797b-5b12-b08a-f673f70b7ee5", "Message": "JOINED"}
2023-04-13T17:52:45.414-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWD6VbHkt8WDCeQf6fhu4zik9aY4pwDFuLW5ZcLjxk4eNg", "Topic": "bae-b4a08ad3-8970-5511-8048-ba1b873008dc", "Message": "JOINED"}
2023-04-13T17:52:45.414-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWD6VbHkt8WDCeQf6fhu4zik9aY4pwDFuLW5ZcLjxk4eNg", "Topic": "bae-e45fa288-797b-5b12-b08a-f673f70b7ee5", "Message": "JOINED"}
2023-04-13T17:52:45.513-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeidxn3ctwtti4e542icqonlua6llh74qqblgbllyvmyxvnkm77skcu", "CID": "bafybeiemznzwy4rehry372qdckm3ko5sjib4q22rhg5ij6k43qhlckoycm", "Height": 4}
2023-04-13T17:52:45.525-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeiaggvx7n2alz7aaronjwebcbrnh4ajnm7vkd2c7plotg5czill54m", "CID": "bafybeif5apzvwrk5jnvtbjmruq6uqw2ujvsdo424wtauxanekzgushp4y4", "Height": 2}
2023-04-13T17:52:45.525-0400, INFO, defra.merkleclock, Replacing DAG head, {"Old": "bafybeieibjbtlp633yis27ycaquu3bx7ywxmij5apcylm4bsa5xbignwaa", "CID": "bafybeigjsmbmbs5oo6623sedrczuxxz4k57cliupa4noxxt47zi2mx46my", "Height": 2}
2023-04-13T17:52:45.528-0400, INFO, defra.tests.integration, , {"RequestResults": [{"Age":74,"Name":"Andy","_deleted":false},{"Age":62,"Name":"John","_deleted":true}]}
2023-04-13T17:52:45.529-0400, INFO, defra.tests.integration, , {"RequestResults": [{"Age":74,"Name":"Andy","_deleted":false},{"Age":62,"Name":"John","_deleted":true}]}
2023-04-13T17:52:45.529-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWD6VbHkt8WDCeQf6fhu4zik9aY4pwDFuLW5ZcLjxk4eNg", "Topic": "bae-e45fa288-797b-5b12-b08a-f673f70b7ee5", "Message": "LEFT"}
2023-04-13T17:52:45.529-0400, INFO, defra.net, Received new pubsub event, {"SenderId": "12D3KooWD6VbHkt8WDCeQf6fhu4zik9aY4pwDFuLW5ZcLjxk4eNg", "Topic": "bae-b4a08ad3-8970-5511-8048-ba1b873008dc", "Message": "LEFT"}
2023-04-13T17:52:45.530-0400, INFO, defra.db, Closing DefraDB process...
2023-04-13T17:52:45.530-0400, INFO, defra.db, Successfully closed running process
2023-04-13T17:52:45.531-0400, INFO, defra.db, Closing DefraDB process...
2023-04-13T17:52:45.531-0400, INFO, defra.db, Successfully closed running process
--- FAIL: TestP2PWithMultipleDocumentsWithMultipleUpdatesBeforeConnectSingleDeleteWithShowDeleted (0.48s)
    utils2.go:1121: 
        	Error Trace:	/home/andy/Projects/Source/defradb/tests/integration/utils2.go:1121
        	            				/home/andy/Projects/Source/defradb/tests/integration/utils2.go:933
        	            				/home/andy/Projects/Source/defradb/tests/integration/utils2.go:390
        	            				/home/andy/Projects/Source/defradb/tests/integration/utils2.go:289
        	            				/home/andy/Projects/Source/defradb/tests/integration/net/state/simple/peer/with_delete_test.go:294
        	Error:      	Not equal: 
        	            	expected: true
        	            	actual  : false
        	Test:       	TestP2PWithMultipleDocumentsWithMultipleUpdatesBeforeConnectSingleDeleteWithShowDeleted
        	Messages:   	node: 1, doc: 1

EDIT: I saw this exact same pattern a second time today, with the same reversal of heads - hopefully it should be easy enough to track down.

EDIT2: This mode of failure appears much more common on my machine than in the CI, it could be that a faster test run is more likely to hit this particular issue, whereas a slower run is more likely to generate a timeout.

@AndrewSisley
Copy link
Contributor

AndrewSisley commented Apr 14, 2023

I see 3 types of failures:

  1. Timeout with queries returning correct data (common in CI)
  2. Timeout with queries returning incorrect data (semi-common in CI on develop, not able to reproduce on branch sisley/fix/I1338-sub-before-emit so far, suggesting at least partial fix)
  3. Queries return incorrect data (rare in CI, fairly common on my local machine)

My current timeout theory for some of the timeout with incorrect data is that there is a race condition for newly synced documents. A document-topic will only be subscribed to after the create-event has almost finished processing - this means that if the update/delete event is received before the topic-subscription the update/delete will never sync, and the pushLogEmitter will never emit an EvtReceivedPushLog - causing a timeout in most cases. It is worth noting however, that this mode of failure is not relevant for many tests as they do not rely on Create event to setup the document-topics (many failing tests create the docs before configuration and rely on listAllDocKeys in newServer).

I do not understand why sisley/fix/I1338-sub-before-emit might, and appears to affect things - it just feels like there is a race condition there, and more time should be spent to investigate this line of thought.

EDIT: https://github.com/sourcenetwork/defradb/actions/runs/4704788576/jobs/8345352190 failed with mode (2), on a branch based off of sisley/fix/I1338-sub-before-emit (https://github.com/sourcenetwork/defradb/compare/sisley/fix/I1338-buf-size?expand=1) so if it does affect the rate of that mode of failure it is not completely curing it.

EDIT2:

Also noteworthy is that only P2P update tests failed during these runs, and most of them were TestP2PWithMultipleDocumentUpdatesPerNode failing. No delete test failures were recorded (verify they are in the package :)).

This to me suggests the bulk of the CI timeouts may be caused by a deadlock once the event buffer is filled. More testing and investigation required. If true this will likely affect production.

@AndrewSisley
Copy link
Contributor

EventBus related issue should have been fixed in #1359

P2P test timeouts are still occuring however, so there are other root causes.

@AndrewSisley AndrewSisley removed their assignment Apr 17, 2023
@fredcarle fredcarle self-assigned this Apr 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants