Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

One in ten peer TMGetLedger requests do not return a response #311

Closed
donovanhide opened this issue Mar 20, 2014 · 5 comments
Closed

One in ten peer TMGetLedger requests do not return a response #311

donovanhide opened this issue Mar 20, 2014 · 5 comments
Assignees

Comments

@donovanhide
Copy link
Contributor

Hi,

I'm pulling the ledger and associated transactions via the peer network. I'm issuing TMGetLedger requests with a specified LedgerSeq field and the TMLedgerInfoType set to liBase. These commands are sent to, on average 18 peers with qualifying ledger ranges, at a combined rate of 20 requests/second. That is, roughly one ledger request per client per second.

I'm finding that approximately 1 in 10 of these requests never receives a response. It's not particular to any one other node. Is this by design? Dropping requests that the node does not want to handle, or is it a bug that I need to trace? I'm currently patching the gaps by reissuing the requests later, which works fine.

Cheers!

@donovanhide donovanhide changed the title One in ten peer TMGetLedger requests does not return a response One in ten peer TMGetLedger requests do not return a response Mar 20, 2014
@donovanhide
Copy link
Contributor Author

Guess these lines answer my question:

https://github.com/ripple/rippled/blob/8daecb543066040f27b2868e117a4f64677aaff0/src/ripple_overlay/impl/PeerImp.h#L2263-L2267

which seems to trace back to one or more of the items in the job queue satisfying this algorithm:

https://github.com/ripple/rippled/blob/a865149c6551b198ecded8d794df5fe908942bbe/src/ripple_core/functional/LoadMonitor.cpp#L193

Not sure what the magic constant 4 represents, other than 4 times over the desired limit. Would be interesting to trace these overload moments to see if the server was definitely overloaded... Just seems a bit weird that the failure rate is constant at roughly 10%.

@JoelKatz
Copy link
Collaborator

What are you trying to accomplish exactly? Why are you making so many peers fetch so many ledgers? Whatever information you're trying to get, there's probably a sensible way to get it, and that isn't it. You're lucky any servers are replying to you at all.

@donovanhide
Copy link
Contributor Author

What are you trying to accomplish exactly?

I'm writing a thin client that operates on the peer network and listens to the activity of nodes acting on it and records the ledger, transactions and account states in an efficient manner. The peer protocol is suggested as a means of achieving this type of goal:

https://ripple.com/wiki/API_Overview

I'm doing this to open up the ledger for detailed analysis and browse-ability by the Ripple community. Currently the rippled implementation takes too long to acquire the full ledger locally (many months). The data is key to understanding trends, trading activity, payment paths and effectively deciding on market-making and arbitrage strategies. The ripplecharts data isn't granular enough for some, if not most, use cases.

Why are you making so many peers fetch so many ledgers?

The advantage of a peer to peer network is surely to distribute the load of an operation over as many computers as possible (cf. bitcoin, bittorrent, gnutella, et al) and ensure resilience of the payloads. My code distributes the cost of downloading the ledger over as many machines as possible. It requests any single ledger from a single node. If, after a period of time, the ledger is not received it tries another node. It does not make all peers return all ledgers, if that was your assumption. It only requests ledgers that the node advertises as holding in its TMHello message and subsequent TMStatusChange messages.

Whatever information you're trying to get, there's probably a sensible way to get it, and that isn't it.

Well, the ledger_data command didn't exist when I became frustrated at the time required to acquire the full ledger and the associated storage requirements of rippled, so I chose this path. Sockets are much more efficient than HTTP header parsing for every request and websockets are less amenable to connecting to many clients. Also there is no API for acquiring the full peer network to connect to, other than through the peer protocol.

You're lucky any servers are replying to you at all.

Why? I've closely followed the rippled code and am not sending any spurious data to the network. The fetch pack method of synchronising appears inefficient to me, sending the account state deltas for every ledger. As far as I can see, all the information required to reconstruct the ledger is contained in the metadata section of the transaction leaf nodes and I'm not requesting any of the account state nodes, which is a much,much larger tree. The amount of work that a rippled node has to do to respond to my code's requests is less than another rippled node making a fetch pack request.

It could be more efficient if there were TMGetLedgerRange and a TMGetTransactions messages...

I strongly believe that Ripple will benefit from alternative implementations making use of the peer protocol and those alternatives being open source as well. I fully intend to open source the majority of my code once that is functional and I am confident that it is not a toolkit for others to DDOS the network. If this means that you have to formalise the peer/sockets API, is that a bad thing? I've submitted bug reports and errant node behaviour as a result of my work, surely this is useful too.

I hope that explains the context and motivation of what I'm doing.

All the best!

As an addendum to be crystal clear of the code's process:

A bitset is created of the ledger range (currently 32,570-5,6XX,XXX). All ledgers not present in the database are requested in batches of a 1000. Each request is routed to a single node with a qualifying advertised ledger range availability. The requests are rate-limited to n/second. If the node responds with the ledger, the transactions state is recursed using TMGetObjectsByHash for each level of the tree (not very deep compared to the account state tree - which is not requested). These requests are rate-limited also. The requests are batched to contain all hashes for the previous inner node in one request. For each successful ledger and associated transaction set retrieved the bit in the bitset is set, and not requested again.

rec pushed a commit to rec/rippled that referenced this issue Sep 9, 2015
MarkusTeufelberger pushed a commit to MarkusTeufelberger/rippled that referenced this issue Sep 21, 2015
1fdd726 Hotfix RocksDB 3.5
d67500a Add `make install` to Makefile in 3.5.fb.
4cb631a update HISTORY.md
cfd0946 comments about the BlockBasedTableOptions migration in Options
REVERT: 25888ae Merge pull request XRPLF#329 from fyrz/master
REVERT: 89833e5 Fixed signed-unsigned comparison warning in db_test.cc
REVERT: fcac705 Fixed compile warning on Mac caused by unused variables.
REVERT: b3343fd resolution for java build problem introduced by 5ec53f3
REVERT: 187b299 ForwardIterator: update prev_key_ only if prefix hasn't changed
REVERT: 5ec53f3 make compaction related options changeable
REVERT: d122e7b Update INSTALL.md
REVERT: 986dad0 Merge pull request XRPLF#324 from dalgaaf/wip-da-SCA-20140930
REVERT: 8ee75dc db/memtable.cc: remove unused variable merge_result
REVERT: 0fd8bbc db/db_impl.cc: reduce scope of prefix_initialized
REVERT: 676ff7b compaction_picker.cc: remove check for >=0 for unsigned
REVERT: e55aea5 document_db.cc: fix assert
REVERT: d517c83 in_table_factory.cc: use correct format specifier
REVERT: b140375 ttl/ttl_test.cc: prefer prefix ++operator for non-primitive types
REVERT: 43c789c spatialdb/spatial_db.cc: use !empty() instead of 'size() > 0'
REVERT: 0de452e document_db.cc: pass const parameter by reference
REVERT: 4cc8643 util/ldb_cmd.cc: prefer prefix ++operator for non-primitive types
REVERT: af8c2b2 util/signal_test.cc: suppress intentional null pointer deref
REVERT: 33580fa db/db_impl.cc: fix object handling, remove double lines
REVERT: 873f135 db_ttl_impl.h: pass func parameter by reference
REVERT: 8558457 ldb_cmd_execute_result.h: perform init in initialization list
REVERT: 063471b table/table_test.cc: pass func parameter by reference
REVERT: 93548ce table/cuckoo_table_reader.cc: pass func parameter by ref
REVERT: b8b7117 db/version_set.cc: use !empty() instead of 'size() > 0'
REVERT: 8ce050b table/bloom_block.*: pass func parameter by reference
REVERT: 53910dd db_test.cc: pass parameter by reference
REVERT: 68ca534 corruption_test.cc: pass parameter by reference
REVERT: 7506198 cuckoo_table_db_test.cc: add flush after delete
REVERT: 1f96330 Print MB per second compaction throughput separately for reads and writes
REVERT: ffe3d49 Add an instruction about SSE in INSTALL.md
REVERT: ee1f3cc Package generation for Ubuntu and CentOS
REVERT: f0f7955 Fixing comile errors on OS X
REVERT: 99fb613 remove 2 space linter
REVERT: b2d64a4 Fix linters, second try
REVERT: 747523d Print per column family metrics in db_bench
REVERT: 56ebd40 Fix arc lint (should fix XRPLF#238)
REVERT: 637f891 Merge pull request XRPLF#321 from eonnen/master
REVERT: 827e31c Make test use a compatible type in the size checks.
REVERT: fd5d80d CompactedDB: log using the correct info_log
REVERT: 2faf49d use GetContext to replace callback function pointer
REVERT: 983d2de Add AUTHORS file. Fix XRPLF#203
REVERT: abd70c5 Merge pull request XRPLF#316 from fyrz/ReverseBytewiseComparator
REVERT: 2dc6f62 handle kDelete type in cuckoo builder
REVERT: 8b8011a Changed name of ReverseBytewiseComparator based on review comment
REVERT: 389edb6 universal compaction picker: use double for potential overflow
REVERT: 5340484 Built-in comparator(s) in RocksJava
REVERT: d439451 delay initialization of cuckoo table iterator
REVERT: 94997ea reduce memory usage of cuckoo table builder
REVERT: c627595 improve memory efficiency of cuckoo reader
REVERT: 581442d option to choose module when calculating CuckooTable hash
REVERT: fbd2daf CompactedDBImpl::MultiGet() for better CuckooTable performance
REVERT: 3c68006 CompactedDBImpl
REVERT: f7375f3 Fix double deletes
REVERT: 21ddcf6 Remove allow_thread_local
REVERT: fb4a492 Merge pull request XRPLF#311 from ankgup87/master
REVERT: 611e286 Merge branch 'master' of https://github.com/facebook/rocksdb
REVERT: 0103b44 Merge branch 'master' of ssh://github.com/ankgup87/rocksdb
REVERT: 1dfb7bb Add block based table config options
REVERT: cdaf44f Enlarge log size cap when printing file summary
REVERT: 7cc1ed7 Merge pull request XRPLF#309 from naveenatceg/staticbuild
REVERT: ba6d660 Resolving merge conflict
REVERT: 51eeaf6 Addressing review comments
REVERT: fd7d3fe Addressing review comments (adding a env variable to override temp directory)
REVERT: cf7ace8 Addressing review comments
REVERT: 0a29ce5 re-enable BlockBasedTable::SetupForCompaction()
REVERT: 55af370 Remove TODO for checking index checksums
REVERT: 3d74f09 Fix compile
REVERT: 53b0039 Fix release compile
REVERT: d0de413 WriteBatchWithIndex to allow different Comparators for different column families
REVERT: 57a32f1 change target_file_size_base to uint64_t
REVERT: 5e6aee4 dont create backup_input if compaction filter v2 is not used
REVERT: 49b5f94 Merge pull request XRPLF#306 from Liuchang0812/fix_cast
REVERT: 787cb4d remove cast, replace %llu with % PRIu64
REVERT: a7574d4 Update logging.cc
REVERT: 7e0dcb9 Update logging.cc
REVERT: 57fa3cc Merge pull request XRPLF#304 from Liuchang0812/fix-check
REVERT: cd44522 Merge pull request XRPLF#305 from Liuchang0812/fix-logging
REVERT: 6a031b6 remove unused variable
REVERT: 4436f17 fixed XRPLF#303: replace %ld with % PRId64
REVERT: 7a1bd05 Merge pull request XRPLF#302 from ankgup87/master
REVERT: 423e52c Merge branch 'master' of https://github.com/facebook/rocksdb
REVERT: bfeef94 Add rate limiter
REVERT: 32f2532 Print compression_size_percent as a signed int
REVERT: 976caca Skip AllocateTest if fallocate() is not supported in the file system
REVERT: 3b897cd Enable no-fbcode RocksDB build
REVERT: f445947 RocksDB: Format uint64 using PRIu64 in db_impl.cc
REVERT: e17bc65 Merge pull request XRPLF#299 from ankgup87/master
REVERT: b93797a Fix build
REVERT: adae3ca [Java] Fix JNI link error caused by the removal of options.db_stats_log_interval
REVERT: 90b8c07 Fix unit tests errors
REVERT: 51af7c3 CuckooTable: add one option to allow identity function for the first hash function
REVERT: 0350435 Fixed a signed-unsigned comparison in spatial_db.cc -- issue XRPLF#293
REVERT: 2fb1fea Fix syncronization issues
REVERT: ff76895 Remove some unnecessary constructors
REVERT: feadb9d fix cuckoo table builder test
REVERT: 3c232e1 Fix mac compile
REVERT: 54cada9 Run make format on PR XRPLF#249
REVERT: 27b22f1 Merge pull request XRPLF#249 from tdfischer/decompression-refactoring
REVERT: fb6456b Replace naked calls to operator new and delete (Fixes XRPLF#222)
REVERT: 5600c8f cuckoo table: return estimated size - 1
REVERT: a062e1f SetOptions() for memtable related options
REVERT: e4eca6a Options conversion function for convenience
REVERT: a7c2094 Merge pull request XRPLF#292 from saghmrossi/master
REVERT: 4d05234 Merge branch 'master' of github.com:saghmrossi/rocksdb
REVERT: 60a4aa1 Test use_mmap_reads
REVERT: 94e43a1 [Java] Fixed 32-bit overflowing issue when converting jlong to size_t
REVERT: f9eaaa6 added include for inttypes.h to fix nonworking printf statements
REVERT: f090575 Replaced "built on on earlier work" by "built on earlier work" in README.md
REVERT: faad439 Fix XRPLF#284
REVERT: 49aacd8 Fix make install
REVERT: acb9348 [Java] Include WriteBatch into RocksDBSample.java, fix how DbBenchmark.java handles WriteBatch.
REVERT: 4a27a2f Don't sync manifest when disableDataSync = true
REVERT: 9b8480d Merge pull request XRPLF#287 from yinqiwen/rate-limiter-crash-fix
REVERT: 28be16b fix rate limiter crash XRPLF#286
REVERT: 04ce1b2 Fix XRPLF#284
REVERT: add22e3 standardize scripts to run RocksDB benchmarks
REVERT: dee91c2 WriteThread
REVERT: 540a257 Fix WAL synced
REVERT: 24f034b Merge pull request XRPLF#282 from Chilledheart/develop
REVERT: 49fe329 Fix build issue under macosx
REVERT: ebb5c65 Add make install
REVERT: 0352a9f add_wrapped_bloom_test
REVERT: 9c0e66c Don't run background jobs (flush, compactions) when bg_error_ is set
REVERT: a9639bd Fix valgrind test
REVERT: d1f24dc Relax FlushSchedule test
REVERT: 3d9e6f7 Push model for flushing memtables
REVERT: 059e584 [unit test] CompactRange should fail if we don't have space
REVERT: dd641b2 fix RocksDB java build
REVERT: 53404d9 add_qps_info_in cache bench
REVERT: a52cecb Fix Mac compile
REVERT: 092f97e Fix comments and typos
REVERT: 6cc1286 Added a few statistics for BackupableDB
REVERT: 0a42295 Fix SimpleWriteTimeoutTest
REVERT: 06d9862 Always pass MergeContext as pointer, not reference
REVERT: d343c3f Improve db recovery
REVERT: 6bb7e3e Merger test
REVERT: 88841bd Explicitly cast char to signed char in Hash()
REVERT: 5231146 MemTableOptions
REVERT: 1d284db Addressing review comments
REVERT: 55114e7 Some updates for SpatialDB
REVERT: 171d4ff remove TailingIterator reference in db_impl.h
REVERT: 9b0f7ff rename version_set options_ to db_options_ to avoid confusion
REVERT: 2d57828 Check stop level trigger-0 before slowdown level-0 trigger
REVERT: 659d2d5 move compaction_filter to immutable_options
REVERT: 048560a reduce references to cfd->options() in DBImpl
REVERT: 011241b DB::Flush() Do not wait for background threads when there is nothing in mem table
REVERT: a2bb7c3 Push- instead of pull-model for managing Write stalls
REVERT: 0af157f Implement full filter for block based table.
REVERT: 9360cc6 Fix valgrind issue
REVERT: 02d5bff Merge pull request XRPLF#277 from wankai/master
REVERT: 88a2f44 fix comments
REVERT: 7c16e39 Merge pull request XRPLF#276 from wankai/master
REVERT: 8237738 replace hard-coded number with named variable
REVERT: db8ca52 Merge pull request XRPLF#273 from nbougalis/static-analysis
REVERT: b7b031f Merge pull request XRPLF#274 from wankai/master
REVERT: 4c2b1f0 Merge remote-tracking branch 'upstream/master'
REVERT: a5d2863 typo improvement
REVERT: 9f8aa09 Don't leak data returned by opendir
REVERT: d1cfb71 Remove unused member(s)
REVERT: bfee319 sizeof(int*) where sizeof(int) was intended
REVERT: d40c1f7 Add missing break statement
REVERT: 2e97c38 Avoid off-by-one error when using readlink
REVERT: 40ddc3d add cache bench
REVERT: 9f1c80b Drop column family from write thread
REVERT: 8de151b Add db_bench with lots of column families to regression tests
REVERT: c9e419c rename options_ to db_options_ in DBImpl to avoid confusion
REVERT: 5cd0576 Fix compaction bug in Cuckoo Table Builder. Use kvs_.size() instead of num_entries in FileSize() method.
REVERT: 0fbb3fa fixed memory leak in unit test DBIteratorBoundTest
REVERT: adcd253 fix asan check
REVERT: 4092b7a Merge pull request XRPLF#272 from project-zerus/patch-1
REVERT: bb6ae0f fix more compile warnings
REVERT: 6d31441 Merge pull request XRPLF#271 from nbougalis/cleanups
REVERT: 0cd0ec4 Plug memory leak during index creation
REVERT: 4329d74 Fix swapped variable names to accurately reflect usage
REVERT: 45a5e3e Remove path with arena==nullptr from NewInternalIterator
REVERT: 5665e5e introduce ImmutableOptions
REVERT: e0b99d4 created a new ReadOptions parameter 'iterate_upper_bound'
REVERT: 51ea889 Fix travis builds
REVERT: a481626 Relax backupable rate limiting test
REVERT: f7f973d Merge pull request XRPLF#269 from huahang/patch-2
REVERT: ef5b384 fix a few compile warnings
REVERT: 2fd3806 Merge pull request XRPLF#263 from wankai/master
REVERT: 1785114 delete unused Comparator
REVERT: 1b1d961 update HISTORY.md
REVERT: 703c3ea comments about the BlockBasedTableOptions migration in Options
REVERT: 4b5ad88 Merge pull request XRPLF#260 from wankai/master
REVERT: 19cc588 change to filter_block std::unique_ptr support RAII
REVERT: 9b976e3 Merge pull request XRPLF#259 from wankai/master
REVERT: 5d25a46 Merge remote-tracking branch 'upstream/master'
REVERT: dff2b1a typo improvement
REVERT: 343e98a Reverting import change
REVERT: ddb8039 RocksDB static build Make file changes to download and build the dependencies .Load the shared library when RocksDB is initialized

git-subtree-dir: src/rocksdb2
git-subtree-split: 1fdd726a8254c13d0c66d8db8130ad17c13d7bcc
@JoelKatz
Copy link
Collaborator

JoelKatz commented Feb 4, 2017

If the server is not going to reply to you, it simply doesn't reply. The query logic rippled uses handles this case correctly. The peer ledger exchange logic is built on the assumption that servers are peers and are sharing work equally. It's not designed to tolerate a peer that makes other peers do all the work. You should issue such queries to your own rippled that's participating in the network, not to other people's rippleds that have no obligation to do extra work for you.

There are really only two alternatives. A server could never ever be allowed to refuse a request. That's an obvious non-starter. Alternatively, a server could be required to always send an active rejection of a request if it wasn't going to reply to it. That would add overhead in particularly the case where the server is trying to minimize overhead. And the intention of delaying the response is to force the queries to back off (to time out waiting for it). It doesn't seem to make sense to shift this work to the server that receives the query when the whole intention of the design is to put as much work on the querier as possible.

@JoelKatz JoelKatz closed this as completed Feb 4, 2017
@donovanhide
Copy link
Contributor Author

The peer network protocol is undocumented, so I gave up on this some years ago.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants