use osrm-datastore for testing, keep osrm-routed runnning #889

emiltin · 2014-01-24T20:25:55Z

This branch uses osrm-datastore to load data during cucumber testing, resulting in a speedup of more than 3x on cached tests. (The first run takes about as long as before, since the data must still be converted with osmosis.)

Before each scenario is run, osrm-datastore is used to load the new data into shared memory. osrm-routed is then launched if it's not already running. When cucumber exits, osrm-routed is shut down.
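The launch-once pattern described above can be sketched in shell. This is a minimal sketch, not the actual cucumber support code (which is Ruby): the pid file path and the helper names `is_running`, `ensure_running` and `shutdown_server` are hypothetical, and the real osrm-datastore and osrm-routed arguments are deliberately left out.

```shell
#!/bin/sh
# Sketch of "launch once, reuse across scenarios, stop on exit".
# PIDFILE and the helper names are hypothetical, not from the OSRM test code.
PIDFILE="${PIDFILE:-/tmp/osrm-routed-test.pid}"

# Succeeds if the pid file exists and names a process that still exists.
# Caveat: on Linux, kill -0 also succeeds for a zombie (exited, not yet reaped).
is_running() {
  [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

# Start the given command in the background unless it is already running.
ensure_running() {
  if is_running; then
    return 0
  fi
  "$@" &
  echo "$!" > "$PIDFILE"
}

# Stop the recorded process (if it is still there) and forget its pid.
shutdown_server() {
  if is_running; then
    kill "$(cat "$PIDFILE")"
  fi
  rm -f "$PIDFILE"
}

# Usage outline (real osrm-datastore / osrm-routed arguments omitted):
#   trap shutdown_server EXIT
#   before each scenario:
#     osrm-datastore ...              # load the new data into shared memory
#     ensure_running osrm-routed ...  # start routed only if it is not running
```

Keeping one long-lived server is what lets cached scenarios skip the startup cost; the trade-off is that a server that died must be detected, and a pid file plus `kill -0` only partly does that.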

We might want to add some tests of the good ol' way of loading data directly with osrm-routed, since with the current version of this branch, osrm-routed never loads data directly.

I did experience some weird behaviour when I launched osrm-routed manually from the command line and then ran osrm-datastore from the cuke scripts: errors from datastore indicating a failure to free data, osrm-routed not returning correct routes, and osrm-routed throwing exceptions. This should perhaps be investigated. But when both routed and datastore are launched from the cuke scripts, it seems to work fine.

emiltin · 2014-01-24T20:26:40Z

I only tried the branch on Mac so far.

emiltin · 2014-01-24T22:12:09Z

There are a lot of failed cucumber tests on Travis, but they seem related to osmosis. However, the build is still reported as passing. Is that because the build itself succeeds, and the cucumber result is not considered?

DennisOSRM · 2014-01-24T22:31:08Z

Travis appears to have changed the environment. No idea why osmosis is broken there ATM. Nevertheless, the Jenkins server does run the tests too.

alex85k · 2014-01-25T07:20:44Z

I have tried this branch on my home Ubuntu 12.04 machine and get multiple errors (manual loading and serving seem to work).

On the first run (when files are generated and the delays are long), the first ~20 tests seem to pass, but later tests, and the tests after restarting cucumber, fail in large numbers, followed by "osrm-routed is not running" errors.

(I have a machine with 3 GB of RAM and no swap partition. This may not be enough, but the old-way tests pass without any problems.)

There should be some way to make osrm-datastore and osrm-routed more reliable under frequent reloading...
Maybe the datastore test runner could become an optional way to run tests, rather than the default?

alex85k · 2014-01-25T09:46:43Z

On Windows (using the waitpid-modified script from this branch and the sources from #880) I see the same pattern: first correct results, then some incorrect results, then osrm-routed crashes.

emiltin · 2014-01-28T11:08:45Z

It doesn't really seem to work on Ubuntu (I'm running version 13).

- Both osrm-routed and osrm-datastore report [warn] "could not request RAM lock" on every launch, even when I run them manually.
- Routing results are sometimes incorrect, perhaps because the new data is not fully loaded?
- osrm-routed stops responding: *** osrm-routed is not running. (RuntimeError)

alex85k · 2014-01-28T16:01:00Z

This does not actually depend on the test-running configuration. A huge timeout is not the way to solve this problem :)
There is something in the code or the system configuration (shared memory, named mutexes) that makes the behavior unpredictable. However, it is hard to imagine that data is still loading after osrm-datastore has already finished.

When the next request comes in, osrm-routed starts its reloading process and the name of the fileIndex file is correct... The errors may result from reading 1) the previous data, 2) data that is still changing, or 3) incorrectly loaded data.
I do not know how to determine the exact cause...

DennisOSRM · 2014-01-28T18:30:56Z

@alex85k I haven't looked at the code, but this sounds like a race condition to me. The swapping of the data in memory should be safeguarded by mutexes.

emiltin · 2014-01-28T19:42:16Z

@DennisOSRM, can you get the experimental/cuke_datastore branch to run the tests successfully on Ubuntu?

DennisOSRM · 2014-01-28T21:08:13Z

@emiltin will do tomorrow morning.

DennisOSRM · 2014-01-30T00:02:16Z

Sorry, got delayed. Will get to that ASAP.

DennisOSRM · 2014-01-30T08:47:07Z

Tests run fine on my Ubuntu dev machine. First run takes 3m37s while the second (cached) run takes only 0m15s. The only downside is that the following warning is produced for every test:

[warn] Process ../build/osrm-datastore could not request RAM lock

I am not yet sure what the reason is.

DennisOSRM · 2014-01-30T09:10:22Z

So, after digging a bit deeper, I found out why it is warning. The OS does not allow the data to be locked into RAM because it hits a limit. To view the limit, try

$ ulimit -l

On my system it says 64, which means that by default you can lock at most 64 KB of memory into RAM. The setting can be tweaked though:

$ sudo vi /etc/security/limits.conf

and then add the following two lines at the bottom, where <user> is your user name:

<user>       hard    memlock     unlimited
<user>       soft    memlock     68719476736

Log out and back in (or even reboot) and the warning should be gone. While the message is certainly nagging, it can safely be ignored during tests.
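Since the `<user>` placeholder is easy to copy literally, one option is to generate the two lines with the current login name already filled in. A small sketch, assuming `id -un` reports the account the tests run under; the function name `memlock_lines` is hypothetical:

```shell
#!/bin/sh
# Print the two memlock entries with <user> replaced by a real login name.
# Optional argument: the user name (defaults to the current user).
# pam_limits reads /etc/security/limits.conf at login, so review this output
# before appending it there.
memlock_lines() {
  user="${1:-$(id -un)}"
  printf '%s       hard    memlock     unlimited\n' "$user"
  printf '%s       soft    memlock     68719476736\n' "$user"
}
```

After reviewing the output, it could be appended with `memlock_lines | sudo tee -a /etc/security/limits.conf`; after logging in again, `ulimit -l` should no longer report 64.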

emiltin · 2014-01-30T09:11:00Z

Interesting, because they don't run at all on my Ubuntu machine.
Are you using the experimental/cuke_datastore branch?
What version of Ubuntu are you running?
What settings are you using for shmall and shmmax, and what's your total RAM?

DennisOSRM · 2014-01-30T09:16:57Z

I am running the code from this pull request on Ubuntu 13.10.

What kind of error do you get?

emiltin · 2014-01-30T09:22:19Z

Here you can see the errors:
https://gist.github.com/emiltin/8705155

Editing /etc/security/limits.conf did not seem to make a difference: I'm still getting the warning, and ulimit -l still reports 64.

emil@emil-OptiPlex-7010:~/code/Project-OSRM$ git branch
  develop
* experimental/cuke_datastore
  master
emil@emil-OptiPlex-7010:~/code/Project-OSRM$ git log -n 1
commit 02f631e3c6d5b580263aa74cfe0711d6746d98fc
Author: Emil Tin <emil@tin.dk>
Date:   Fri Jan 24 21:14:38 2014 +0100

    use osrm-datastore for testing, keep osrm-routed runnning
emil@emil-OptiPlex-7010:~/code/Project-OSRM$ ulimit -l
64
emil@emil-OptiPlex-7010:~/code/Project-OSRM$ tail /etc/security/limits.conf
#@faculty        soft    nproc           20
#@faculty        hard    nproc           50
#ftp             hard    nproc           0
#ftp             -       chroot          /ftp
#@student        -       maxlogins       4

# End of file

<user>       hard    memlock     unlimited
<user>       soft    memlock     68719476736
emil@emil-OptiPlex-7010:~/code/Project-OSRM$ sysctl -a | grep shmmax
sysctl: permission denied on key 'fs.protected_hardlinks'
sysctl: permission denied on key 'fs.protected_symlinks'
sysctl: permission denied on key 'kernel.cad_pid'
sysctl: permission denied on key 'kernel.usermodehelper.bset'
sysctl: permission denied on key 'kernel.usermodehelper.inheritable'
kernel.shmmax = 134217728
sysctl: permission denied on key 'net.ipv4.tcp_fastopen_key'
emil@emil-OptiPlex-7010:~/code/Project-OSRM$ sysctl -a | grep shmall
sysctl: permission denied on key 'fs.protected_hardlinks'
sysctl: permission denied on key 'fs.protected_symlinks'
sysctl: permission denied on key 'kernel.cad_pid'
sysctl: permission denied on key 'kernel.usermodehelper.bset'
sysctl: permission denied on key 'kernel.usermodehelper.inheritable'
kernel.shmall = 262144
sysctl: permission denied on key 'net.ipv4.tcp_fastopen_key'
emil@emil-OptiPlex-7010:~/code/Project-OSRM$ free -m -h
             total       used       free     shared    buffers     cached
Mem:          3.8G       1.2G       2.5G         0B        69M       699M
-/+ buffers/cache:       503M       3.3G
Swap:         3.9G         0B       3.9G
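On Linux, the two shared memory keys can be queried by name, which avoids the unrelated "permission denied" lines that `sysctl -a` prints. A sketch, assuming procps `sysctl`; `show_shm_limits` and `pages_to_bytes` are hypothetical helpers. Note the units: `kernel.shmmax` is in bytes, while `kernel.shmall` counts pages.

```shell
#!/bin/sh
# Linux-only sketch: query the System V shared memory limits by name.

# Convert a page count to bytes (arguments: pages, page size in bytes).
pages_to_bytes() {
  echo $(( $1 * $2 ))
}

# Print shmmax (bytes, max size of one segment) and shmall (pages, system-wide).
show_shm_limits() {
  page_size=$(getconf PAGESIZE)
  echo "kernel.shmmax = $(sysctl -n kernel.shmmax) bytes"
  echo "kernel.shmall = $(sysctl -n kernel.shmall) pages of $page_size bytes"
}
```

With the values above and assuming 4096-byte pages, `pages_to_bytes 262144 4096` gives 1073741824 bytes (1 GiB) for shmall, while shmmax = 134217728 bytes is 128 MiB.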

DennisOSRM · 2014-01-30T09:27:09Z

You need to replace <user> with your actual user name, i.e. emil.

emiltin · 2014-01-30T09:35:44Z

Oh... I see!

emiltin · 2014-01-30T10:22:35Z

I got rid of the warning by modifying /etc/security/limits.conf.

But cucumber still reports tons of errors. If I run "cucumber -t @basic" (11 scenarios), I get anywhere from 2 to 9 failed scenarios, either because the routing is incorrect or because osrm-routed doesn't respond.

Sometimes osrm-datastore seems to hang for 10-30 seconds, making the whole machine unresponsive, as if huge amounts of memory were being allocated.

emiltin · 2014-01-30T10:25:17Z

I added some debug info, so you can see the order in which datastore is called and routed is launched/shut down.

emiltin · 2014-01-30T10:40:11Z

The develop branch runs all tests without errors on my machine.

emiltin · 2014-01-30T10:41:29Z

Rebased on the latest develop (still the same errors).

emiltin · 2014-01-30T11:21:19Z

When I run the cucumber tests and use 'rake pid' to monitor the osrm-routed process, I can see that at some point it changes from state S to state Z (a defunct "zombie" process: terminated but not reaped by its parent). From then on, cucumber reports '*** osrm-routed is not running'.

So it seems that reloading data with osrm-datastore somehow causes osrm-routed to die?

It would be nice if osrm-routed wrote something to the log when new data has been loaded.
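A zombie can be spotted from a script by looking at the process state column. This is a hypothetical helper (not the `rake pid` task mentioned above); it assumes a `ps` that accepts `-o stat= -p PID`, as the procps and BSD versions do.

```shell
#!/bin/sh
# Print the ps state of a process, e.g. "S", "Ss" or "Z" (empty if no such pid).
proc_state() {
  ps -o stat= -p "$1" 2>/dev/null | tr -d ' '
}

# Succeed only if the process has exited but has not been reaped by its
# parent yet (state starting with "Z"). On Linux, `kill -0` still succeeds
# for such a process, so a kill-0 liveness check alone would miss it.
is_zombie() {
  case "$(proc_state "$1")" in
    Z*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_zombie 12345` (a placeholder pid) succeeds only while process 12345 is in state Z.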

DennisOSRM · 2014-01-30T11:42:40Z

The whole point of the data store is that osrm-routed is not terminated. Not sure what is happening there.

emiltin · 2014-01-30T11:52:48Z

Yes, it's odd.

alex85k · 2014-01-30T15:19:22Z

I have seen the defunct osrm-routed too when running those tests (I guess it was after segmentation faults)...

alex85k · 2014-01-31T11:52:01Z

I tried to compile and run the tests (cuke_datastore branch) on a FreeBSD 10 virtual machine with Clang 3.3. All 251 tests passed without any shared memory configuration. The second run took 1m17.311s (on a VM, Core2Duo, 2 GB RAM) and did not show any errors (the first time in my experience).

This is extremely strange.
Maybe there is a problem with newer or older libraries like Boost, or even system libs?
(My unsuccessful tests were on Boost 1.55.)

DennisOSRM · 2014-01-31T12:59:58Z

I don't think this is related to Boost. The testing code is in Ruby and should not interfere with Boost (as linked into the OSRM binaries).

@emiltin Is the routed process dying from a segfault or is it because of some other exception?

alex85k · 2014-01-31T13:08:12Z

The errors are caused by osrm-routed faults, not by the testing environment...
I had explicit segmentation faults on a nonstandard RedHat system with an old glibc (2007); on Windows the routing daemon dies without any message. I did not notice segfault messages on Ubuntu 12.04, but maybe there were some (routed also died periodically).

Now rebuilding with custom boost 1.55 on FreeBSD to check my hypothesis :)

DennisOSRM · 2014-10-13T09:27:27Z

Cancelled the builds for 14eac50 to get results for the latest commit earlier.

emiltin · 2014-10-13T09:58:11Z

All green on Travis, but we should still add a test for direct data loading.

DennisOSRM · 2014-10-13T10:27:59Z

AppVeyor is looking good too, though it is only halfway through. Once we have a test for direct loading, this is looking really good to merge. We are close.

alex85k · 2014-10-13T11:15:11Z

I have tested this branch after the rebase (on a 6-core Xeon E5-1650 with SSD):

- no more Ruby errors
- only the "Routing on a oneway roundabout" test still fails (not related to this PR)
- first run: 3 min, second run: 51 s (Release build). Wow. :)

Thank you!

emiltin · 2014-10-14T13:44:43Z

Added a test of direct data loading. This required some changes to the test infrastructure. You can now use

    Given the data is loaded directly

or

    Given the data is loaded with datastore

to specify for each scenario how the data is loaded. To minimize the risk of hard-to-debug problems, only one instance of osrm-routed will be running at a time.

The default is to load data with datastore and keep osrm-routed running across all tests. However, osrm-routed is relaunched when needed, i.e. every time a scenario uses direct data loading, or when you switch from direct loading to datastore.
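The relaunch rule above boils down to: keep the running osrm-routed only when both the current and the next scenario use datastore. A hypothetical shell encoding of that rule; the function name and the mode strings are illustrative, not taken from the test code.

```shell
#!/bin/sh
# Decide whether osrm-routed must be (re)launched before the next scenario.
#   $1: mode of the running osrm-routed ("direct" or "datastore"; empty if none)
#   $2: mode requested by the next scenario ("direct" or "datastore")
# Returns 0 if a (re)launch is needed, 1 if the running process can be reused.
needs_relaunch() {
  current="$1"
  wanted="$2"
  # Nothing running yet: always launch.
  [ -z "$current" ] && return 0
  # Direct loading means routed reads the data itself at startup,
  # so every such scenario gets a fresh process.
  [ "$wanted" = "direct" ] && return 0
  # Switching from direct back to datastore also needs a new process.
  [ "$current" = "direct" ] && return 0
  # datastore -> datastore: reuse routed; new data arrives via shared memory.
  return 1
}
```

Relaunching only on these transitions keeps the fast path (datastore to datastore) free of process startups while still never running more than one osrm-routed at a time.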

Direct data loading is tested with these scenarios:
https://github.com/Project-OSRM/osrm-backend/blob/71b967d24308be1939e8290d4597847e526566c3/features/testbot/load.feature

DennisOSRM · 2014-10-14T13:46:12Z

Cool!

emiltin · 2014-10-14T13:56:56Z

@alex85k does the latest commit work on windows?

emiltin · 2014-10-14T14:12:51Z

21.5s on my Linux box for 353 scenarios / 1467 steps :-)

alex85k · 2014-10-14T14:32:46Z

> @alex85k does the latest commit work on windows?

It should work; I'll check tomorrow. But why is a 0.1 timeout for shutdown too big? There is only one shutdown in the testing process, if I understand correctly.

emiltin · 2014-10-14T14:36:17Z

Yes, it's only used very rarely. But it's a retry delay, not a timeout.

alex85k · 2014-10-14T14:52:35Z

Seems to work fine on Windows with the latest commit (partial run, but a full run should be the same).

emiltin · 2014-10-14T16:18:47Z

Actually, you need to be sure to include features/testbot/load.feature as well as the other tests, to make sure you cover loading data both with datastore and directly. But AppVeyor seems happy.

emiltin · 2014-10-14T16:19:22Z

uhm guess appveyor doesn't run the cucumber tests?

DennisOSRM · 2014-10-14T16:24:00Z

> uhm guess appveyor doesn't run the cucumber tests?

not yet

alex85k · 2014-10-14T18:04:59Z

I had a prototype of a testing environment for AppVeyor; maybe now it can fit within the time limit (at least for some tests).
@DennisOSRM: they now have the 100 MB cache you asked for: http://www.appveyor.com/docs/build-cache. It should be enough to store the dependencies and a stripped Ruby+Gems folder.

DennisOSRM · 2014-10-14T18:15:58Z

Yay! for caching

DennisOSRM · 2014-10-15T09:27:16Z

@alex85k could you provide the output of cucumber features\testbot\oneway.feature:7 on Windows?

alex85k · 2014-10-15T11:08:07Z

This time there were no test failures, of course :) (first run: 27 min on a Core2Duo, Debug build). The previous error was a non-existent path.

@DennisOSRM: are you sure that the error will not show up on some isolated circular road or similar?

emiltin · 2014-10-15T11:16:29Z

The AppVeyor Debug build failed due to the 30-minute timeout.

emiltin · 2014-10-15T13:16:16Z

What's left to do?

DennisOSRM · 2014-10-15T13:35:54Z

I think we are good to merge. Great job, everyone.


alex85k · 2014-10-15T15:21:51Z

Thank you!

alex85k mentioned this pull request Jan 26, 2014: Develop win rebased #880 (closed)

Commit 71b967d: test both datastore and direct data load

Commit 8438024: avoid unnessecary process check

DennisOSRM mentioned this pull request Oct 15, 2014: cache build dependencies and test environment on AppVeyor #1223 (closed)

Commit f7469f2: add a tail to the oneway circle to avoid edge cases

DennisOSRM added a commit that referenced this pull request Oct 15, 2014: Merge pull request #889 from Project-OSRM/experimental/cuke_datastore (dfc81f6)

DennisOSRM merged commit dfc81f6 into develop Oct 15, 2014

DennisOSRM deleted the experimental/cuke_datastore branch October 15, 2014 13:42

alex85k mentioned this pull request Oct 16, 2014: add testing on AppVeyor #1226 (closed)
