Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement issue #4266: alternative stretch-test based on detour #4639

Closed
wants to merge 2 commits into from
Closed

Implement issue #4266: alternative stretch-test based on detour #4639

wants to merge 2 commits into from

Conversation

deniskoronchik
Copy link
Contributor

Issue

#4266

Tasklist

  • add regression / cucumber cases (see docs/testing.md)
  • review
  • adjust for comments

Copy link
Member

@daniel-j-h daniel-j-h left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you compare the following:

  • number of found alternatives compared to master
  • performance impact

// const auto scaled_at_most_longer_by =
// scaledAtMostLongerByFactorBasedOnDuration(shortest_route_duration);
// const auto stretch_duration_limit = (1. + scaled_at_most_longer_by) *
// shortest_route_duration;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be removed?

// const auto stretch_duration_limit = (1. + scaled_at_most_longer_by) *
// shortest_route_duration;

const auto over_duration_limit = [&](const InternalRouteResult &route) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a high-level description comment of what the following check does and why we need it

}

auto s_it_e = short_seg.rbegin();
auto a_it_e = alter_seg.rbegin();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and above: could you either make the variable names more clear or add a comment with the route sketch from the issue

@daniel-j-h
Copy link
Member

There's also https://github.com/Project-OSRM/osrm-backend/blob/master/features/testbot/alternative.feature failing now because we no longer find an alternative. Feel free to adapt.

@daniel-j-h daniel-j-h changed the title Implement issue #4266 Implement issue #4266: alternative stretch-test based on detour Oct 26, 2017
Copy link

@MoKob MoKob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are close, but there are some structural questions / considerations. For a testcase, I would suggest you try to implement a cucumber test as follows:

test

The test should be in the scenarios twice. Once with the weights chosen so that that the stretch between X and a e f h is accepted, once so that it isn't. Weights need to be selected (max-speed) in a way that the shortest path is a b c and all other roads are invalid (you could also use restricted access). The partition size needs to be specified to separate all the four cliques.

const auto scaled_at_most_longer_by =
scaledAtMostLongerByFactorBasedOnDuration(shortest_route_duration);
const auto stretch_duration_limit = (1. + scaled_at_most_longer_by) * shortest_route_duration;
static constexpr double kStretchScale = 1.2f;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we using a different factor here, compared to https://github.com/Project-OSRM/osrm-backend/pull/4639/files#diff-11d58f5aa5de9239b03afbaf1ac02bd9R45? I would like to keep it consistent with other factors.
In case there is a good reason for a new one, it should probably be defined up next to the others.

// also that there are just on different sequence
auto s_it_b = short_seg.begin();
auto a_it_b = alter_seg.begin();
while ((*s_it_b).turn_via_node == (*a_it_b).turn_via_node &&
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I see it correctly, you are skipping one node too much. Especially on motorways, this can have a longer segment not being considered.
You want to find the last node, which is common. Instead you are finding the first uncommon one.

// shortest_route_duration;

const auto over_duration_limit = [&](const InternalRouteResult &route) {
bool passed = true;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a negative name here. Passed has an acceptance criteria meaning to me. What about something like over_limit?

auto s_it_e = short_seg.rbegin();
auto a_it_e = alter_seg.rbegin();
while ((*s_it_b).turn_via_node == (*a_it_b).turn_via_node &&
s_it_e != short_seg.rend() && a_it_e != alter_seg.rend())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above, you skip over one node to much.


// now calculate duration of found parts
const auto calcDuration = [](auto it_begin, auto it_end) {
double duration = 0.0;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/Project-OSRM/osrm-backend/blob/master/include/engine/internal_route_result.hpp#L32 This value should be an integer. Any reason you switch to double here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't need to cast it later, twicely. There are no any problems, that it's already double there

}

// now calculate duration of found parts
const auto calcDuration = [](auto it_begin, auto it_end) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire loop can be skipped, if you sum the distances you see during the discovery of the bounds.

calcDuration = length(P) - length(s,v) - length(u,t) with u and v the last common nodes seen from source/target.
The same holds for the alternative path.

if (short_duration * kStretchScale < alter_duration)
{
passed = false;
break;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer return short_duration * kStretScale < alter_duration

bool passed = true;
BOOST_ASSERT(route.unpacked_path_segments.size() ==
shortest_route.unpacked_path_segments.size());
for (size_t i = 0; i < shortest_route.unpacked_path_segments.size(); ++i)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused by this loop. We should only ever have a single path-segment, or the deviation should consider all path segments.
The via-node approach we support only does support a single segment and a single deviation. The moment multiple nodes are selected for the route, we will not provide alternatives. What we do here is somewhat a mix.
The code seems to allow alternatives on a by-segment basis, but removes the entire route if a single segment doesn't offer a reasonable alternative.

I would like to either clarify that we only do single destination alternatives (-> segments.size() == 1) or replace invalid segments with the shortest path in between to allow alternatives on a per-segment basis.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This algorithm works fine with [0; N) segments, why we need it to remove it, if it more common, and prevent some future errors

@daniel-j-h
Copy link
Member

daniel-j-h commented Nov 2, 2017

Next actions here:

  • fix test as outlined above
  • fix outstanding comments from above, e.g. calcDistance shouldn't be needed
  • use the osrm-runner.js script and compare against master: performance as well as how many alternatives you will get on a few different datasets (e.g. bavaria, germany, california)

@deniskoronchik
Copy link
Contributor Author

@daniel-j-h it's very strange, because i've pushed test fix yesterday to my branch, but it doesn't appeared into PR

Will add comments, but algorithm that i've implemented was provided to me by @MoKob.
Also will prepare stats

@deniskoronchik
Copy link
Contributor Author

@daniel-j-h Have no idea how to use this script to get compare data. Is there any instruction or ready test that can be suitable? Because it seems, that it require a lot of hand made work to prepare input csv file. Maybe there are already done work?

@daniel-j-h
Copy link
Member

You run the script once and it will create random queries - save these in a txt file. Then you can run repeated tests re-using the same queries you saved. For example something like this should get you started:

time node scripts/osrm-runner.js -s localhost:5000 -b 10.162,47.680,12.096,50.233 -p '/route/v1/driving/{};{}?alternatives=true' -n 10000

It would be great to compare timings and number of found alternatives for alternatives={1,2,3,4,5} for this branch and master.

@deniskoronchik
Copy link
Contributor Author

deniskoronchik commented Nov 4, 2017

@daniel-j-h There are some results.

As you wrote, i've used script and run it first 50000 queries, on ohio BB fully inside it.
After that i've run it with -q param with the same queries but with my commits.
There are time results:

MASTER:
real	17m1.428s
user	0m33.936s
sys	0m8.376s

PR:
real	17m6.824s
user	0m35.696s
sys	0m9.244s

After that i've made python script to analyze this two csv files. It read each in memory an compare queries, codes and values fields (0, 1, 4 columns in a row) to ensure that there are equal queries.
After ensure, it calculates min, avg, max values for columns(2, 3, 5), where 5 my added column with number of routes in response. There are results (ttfb, elapse - names of column got from original script):

TTFB (min, avg, max):
FIXED: (0.546459, 20.064624352339855, 105.304418)
MASTER: (0.438234, 19.994908907320106, 88.859048)
Elapse (min, avg, max):
FIXED: (0.588163, 20.138843621239733, 105.372058)
MASTER: (0.482674, 20.067620924719886, 88.944774)
Routes num (min, avg, max):
FIXED: (1.0, 1.86986, 2.0)
MASTER: (1.0, 1.83378, 2.0)

There is a strange issue, because i have more average routes in my branch, then in master one

This metric works very strange. My opinion that this metric not work good. Proof:
out.txt

In that file, there are cases, where this PR provide 2 routes, but master provides 1.

For example check:
http://192.168.100.6:9966/?z=9&loc=40.76237765397561%2C-82.33968788244843&loc=40.09863185660194%2C-83.13247493222241&hl=en&alt=0

http://192.168.100.6:9966/?z=9&loc=39.548353938450866%2C-82.84036053325214&loc=40.80230002138789%2C-82.29140229687788&hl=en&alt=0

@deniskoronchik
Copy link
Contributor Author

I think, that approach of alternative routes find should be changed. The main problem right now, that we use 2 directional dijkstra's algorithm, main problem, that candidates for alternative routes got from cache of main route finding.
If we visualize it, that would be two intersected circles, but they have small intersection in that case we couldn't find very good candidate, because it would be a high speed road far from this one. There are no many cases, where two high speed roads goes side by side.

So I think, that implementing A* algorithm can solve this problem, because it can be implemented with heuristic function, that prioritize better class roads, (motorway has more priority, then secondary), another part of this function, heuristic distance to a goal.
I that case algorithm will prioritize check of better roads in specified direction. So we will have N branches from A to B using best roads. Another criteria should be, that 2 founded ways should be different for example, by 80% of nodes. In that case algorithm will try to go from A to B by best roads (check them firstly), then will check secondary ... living_street.

The main problem to find a good heuristic function, that will include distance and road class

@daniel-j-h
Copy link
Member

Yes you need to figure out why your branch seems to find more routes then master - this should never happen if all you do here is filter out more routes. Make sure to fix both the base map and the queries you send.

I think, that approach of alternative routes find should be changed.

Please read #3905 and http://algo2.iti.kit.edu/documents/Dissertation_Kobitzsch_Moritz.pdf section 5.2 "The Via-Node Paradigm" pages 59--65.

The approach we're doing here for MLD is fine - if we need more candidates we can make the search space larger and tune quality coefficients in the impl. to trade-off quality vs quantity.

@deniskoronchik
Copy link
Contributor Author

deniskoronchik commented Nov 8, 2017

@daniel-j-h There are new resuts:

Data length: 50000
Compare request...
Compare code...
Compare values...
TTFB (min, avg, max):
FIXED: (0.464531, 20.616985472960177, 118.720778)
MASTER: (0.438234, 19.994908907320106, 88.859048)
Elapse (min, avg, max):
FIXED: (0.505933, 20.694088594979952, 118.81091)
MASTER: (0.482674, 20.067620924719886, 88.944774)
Routes num (min, avg, max):
FIXED: (1.0, 1.0832, 2.0)
MASTER: (1.0, 1.83378, 2.0)

But there are still no good routes, but some of them good. For example:
http://192.168.100.6:9966/?z=9&loc=39.872171865356904%2C-82.15558109339224&loc=40.339825739004816%2C-82.45377897051792&hl=en&alt=0

screenshot from 2017-11-08 18-32-16

Was filtered out in master.

http://192.168.100.6:9966/?z=9&loc=39.907928284114654%2C-82.37755522203265&loc=40.0740015584715%2C-81.39990872688985&hl=en&alt=0

screenshot from 2017-11-08 18-32-52

Wasn't filtered, because diff duration < 25%

There are a list of queries, where master filter out result, and this PR didn't:
out2.txt

Was ~6000 cases, for now 412 cases

update: Whit that criteria we lost 37941 of 50000 alternative routes, that was in master

Another words, there are no alternatives for 45840 of 50000 with that criteria. The same value for a master: 8311

@deniskoronchik
Copy link
Contributor Author

deniskoronchik commented Nov 8, 2017

@daniel-j-h Also i've checked for example in case http://192.168.100.6:9966/?z=9&loc=40.76237765397561%2C-82.33968788244843&loc=40.09863185660194%2C-83.13247493222241&hl=en&alt=0 i got for filter just on alternative route. And it's already very bad. With uturn on way

@deniskoronchik
Copy link
Contributor Author

@daniel-j-h Why thats going on?
peek 2017-11-14 14-51

@deniskoronchik
Copy link
Contributor Author

with

const constexpr auto kAtMostLongerBy = 0.5;
Data length: 50000
Compare request...
Compare code...
Compare values...
TTFB (min, avg, max):
FIXED: (0.554502, 20.55214023020013, 90.156938)
MASTER: (0.438234, 19.994908907320106, 88.859048)
Elapse (min, avg, max):
FIXED: (0.596367, 20.6260494441399, 90.225105)
MASTER: (0.482674, 20.067620924719886, 88.944774)
Routes num (min, avg, max):
FIXED: (1.0, 1.09012, 2.0)
MASTER: (1.0, 1.83378, 2.0)
master-drop: 570
fixed-drop: 37753
no alternatives: 8311

@DennisOSRM
Copy link
Collaborator

closing stale PR. Reopen if still relevant.

@DennisOSRM DennisOSRM closed this May 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants