One in ten peer TMGetLedger requests do not return a response #311
I think these lines answer my question: the missing responses seem to trace back to one or more of the items in the job queue satisfying this overload check. I'm not sure what the magic constant 4 represents, other than four times the desired limit. It would be interesting to trace these overload moments to see whether the server was genuinely overloaded. It just seems a bit odd that the failure rate holds constant at roughly 10%.
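As a rough sketch of the overload behaviour being discussed: a request is silently dropped once queued work exceeds some multiple of the desired limit. The threshold shape, the function name, and everything below are my assumptions for illustration, not rippled's actual identifiers.

```python
# Hypothetical model of the overload check discussed above.
# The "magic constant" 4 is the multiplier from the thread; the
# names and the exact comparison are assumptions, not rippled code.

OVERLOAD_FACTOR = 4  # four times over the desired limit

def should_drop_request(queued_jobs: int, desired_limit: int) -> bool:
    """Return True when the job queue is loaded enough that the
    server would silently drop the request rather than answer it."""
    return queued_jobs > OVERLOAD_FACTOR * desired_limit
```

Under this model, with a desired limit of 10 jobs, requests start being dropped once more than 40 jobs are queued, which would produce exactly the "no response, no rejection" behaviour observed.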
What are you trying to accomplish, exactly? Why are you making so many peers fetch so many ledgers? Whatever information you're trying to get, there's probably a sensible way to get it, and this isn't it. You're lucky any servers are replying to you at all.
I'm writing a thin client that operates on the peer network, listens to the activity of nodes acting on it, and records the ledger, transactions, and account states in an efficient manner. The peer protocol is suggested as a means of achieving this type of goal: https://ripple.com/wiki/API_Overview. I'm doing this to open up the ledger for detailed analysis and browsability by the Ripple community. Currently the rippled implementation takes too long (many months) to acquire the full ledger locally. The data is key to understanding trends, trading activity, and payment paths, and to deciding effectively on market-making and arbitrage strategies. The ripplecharts data isn't granular enough for some, if not most, use cases.
The advantage of a peer-to-peer network is surely to distribute the load of an operation over as many computers as possible (cf. Bitcoin, BitTorrent, Gnutella, et al.) and to ensure resilience of the payloads. My code distributes the cost of downloading the ledger over as many machines as possible. It requests any single ledger from a single node. If, after a period of time, the ledger is not received, it tries another node. It does not make all peers return all ledgers, if that was your assumption. It only requests ledgers that a node advertises as holding in its TMHello message and subsequent TMStatusChange messages.
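The per-ledger peer rotation described above could be sketched like this. This is a minimal illustration, not the author's actual code; the shape of the advertised-range table and all names are assumptions.

```python
# Sketch of "ask one node per ledger; on timeout, move to the next
# node that advertises a covering range". The advertised ranges would
# come from TMHello / TMStatusChange messages; here they are a plain
# dict of peer -> (lowest_seq, highest_seq), which is an assumption.
from itertools import cycle

def eligible_peers(ledger_seq, advertised):
    """Peers whose advertised (lo, hi) ledger range covers ledger_seq."""
    return [p for p, (lo, hi) in advertised.items() if lo <= ledger_seq <= hi]

def peer_rotation(ledger_seq, advertised):
    """Endless rotation over eligible peers: advance to the next peer
    after each timeout, so no single node carries the whole load."""
    peers = eligible_peers(ledger_seq, advertised)
    return cycle(peers) if peers else iter(())
```

A caller would draw the next peer from the rotation each time a request times out, which matches the retry behaviour described in the comment.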
Well, the ledger_data command didn't exist when I became frustrated at the time required to acquire the full ledger and the associated storage requirements of rippled, so I chose this path. Raw sockets are much more efficient than parsing HTTP headers for every request, and websockets are less amenable to connecting to many clients. There is also no API for discovering the full peer network to connect to, other than through the peer protocol.
Why? I've closely followed the rippled code and am not sending any spurious data to the network. The fetch-pack method of synchronising appears inefficient to me, since it sends the account state deltas for every ledger. As far as I can see, all the information required to reconstruct the ledger is contained in the metadata section of the transaction leaf nodes, and I'm not requesting any of the account state nodes, which form a much, much larger tree. The amount of work a rippled node has to do to respond to my code's requests is less than that for another rippled node making a fetch-pack request. It could be more efficient still if there were TMGetLedgerRange and TMGetTransactions messages...

I strongly believe that Ripple will benefit from alternative implementations making use of the peer protocol, and from those alternatives being open source as well. I fully intend to open source the majority of my code once it is functional and I am confident that it is not a toolkit for others to DDoS the network. If this means that you have to formalise the peer/sockets API, is that a bad thing? I've submitted bug reports and errant node behaviour as a result of this work; surely that is useful too. I hope that explains the context and motivation of what I'm doing. All the best!

As an addendum, to be crystal clear about the code's process:
- A bitset is created covering the ledger range (currently 32,570-5,6XX,XXX).
- All ledgers not present in the database are requested in batches of 1,000.
- Each request is routed to a single node whose advertised ledger range qualifies.
- The requests are rate-limited to n/second.
- If the node responds with the ledger, the transaction state tree is recursed using TMGetObjectsByHash for each level of the tree (not very deep compared to the account state tree, which is not requested). These requests are also rate-limited, and are batched so that one request contains all hashes for the previous inner node.
- For each ledger and associated transaction set successfully retrieved, the corresponding bit in the bitset is set and that ledger is not requested again.
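The bitset bookkeeping and batching described above can be sketched as follows. This is an illustrative outline, not the author's actual implementation; the starting ledger and the batch size come from the thread, while the class and function names are invented for the example.

```python
# Sketch of the acquisition bookkeeping: one bit per ledger sequence,
# set only when the ledger and its transaction set have been stored,
# and requests issued for the unset bits in fixed-size batches.

FIRST_LEDGER = 32_570   # earliest ledger mentioned in the thread
BATCH_SIZE = 1000       # request batch size described above

class LedgerBitset:
    """One bit per ledger sequence; a set bit means 'stored locally'."""
    def __init__(self, first: int, last: int):
        self.first = first
        self.bits = bytearray((last - first) // 8 + 1)

    def mark(self, seq: int) -> None:
        i = seq - self.first
        self.bits[i // 8] |= 1 << (i % 8)

    def has(self, seq: int) -> bool:
        i = seq - self.first
        return bool(self.bits[i // 8] & (1 << (i % 8)))

def missing_batches(bitset: LedgerBitset, first: int, last: int):
    """Yield lists (up to BATCH_SIZE long) of ledgers still missing,
    i.e. the ledgers that still need a TMGetLedger-style request."""
    batch = []
    for seq in range(first, last + 1):
        if not bitset.has(seq):
            batch.append(seq)
            if len(batch) == BATCH_SIZE:
                yield batch
                batch = []
    if batch:
        yield batch
```

Because a bit is set only on success, any ledger whose request went unanswered simply reappears in a later batch, which is the "patch the gaps by reissuing later" behaviour from the original question.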
If the server is not going to reply to you, it simply doesn't reply. The query logic rippled uses handles this case correctly. The peer ledger exchange logic is built on the assumption that servers are peers sharing work equally; it's not designed to tolerate a peer that makes other peers do all the work. You should issue such queries to your own rippled that's participating in the network, not to other people's rippleds, which have no obligation to do extra work for you.

There are really only two alternatives. A server could never be allowed to refuse a request; that's an obvious non-starter. Alternatively, a server could be required to always send an active rejection of any request it wasn't going to answer. That would add overhead, particularly in the very case where the server is trying to minimize overhead. And the intention of delaying the response is to force the querier to back off (to time out waiting for it). It doesn't make sense to shift this work to the server that receives the query when the whole intention of the design is to put as much work as possible on the querier.
The peer network protocol is undocumented, so I gave up on this some years ago.
Hi,
I'm pulling the ledger and associated transactions via the peer network. I'm issuing TMGetLedger requests with a specified LedgerSeq field and the TMLedgerInfoType set to liBase. These requests are sent, on average, to 18 peers with qualifying ledger ranges, at a combined rate of 20 requests/second; that is, roughly one ledger request per peer per second.
I'm finding that approximately 1 in 10 of these requests never receives a response, and it's not particular to any one node. Is this by design, dropping requests the node does not want to handle, or is it a bug I need to trace? I'm currently patching the gaps by reissuing the requests later, which works fine.
Cheers!
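For what it's worth, the reissue-later workaround converges quickly even with a constant ~10% drop rate. Here is a toy simulation of that retry loop; it is a model under stated assumptions (a fixed independent drop probability), not real network code.

```python
# Toy model of "reissue unanswered requests later": each pass, every
# pending request is independently dropped with probability drop_rate,
# and only the unanswered ones are retried on the next pass. The 10%
# figure matches the observation in the thread; the function names
# and the independence assumption are mine.
import random

def fetch_all(seqs, drop_rate=0.10, seed=0):
    """Reissue unanswered requests until every ledger has arrived.
    Returns the set of fetched sequences and the number of passes."""
    rng = random.Random(seed)
    got = set()
    pending = sorted(seqs)
    passes = 0
    while pending:
        passes += 1
        answered = [s for s in pending if rng.random() >= drop_rate]
        got.update(answered)
        pending = [s for s in pending if s not in got]
    return got, passes
```

With a 10% drop rate the expected number of still-missing ledgers shrinks by a factor of ten per pass, so a handful of retry passes suffices even over millions of ledgers.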