Failure to catch known apt-get error #170

Closed
tfoote opened this issue Jan 27, 2016 · 16 comments
@tfoote
Member

tfoote commented Jan 27, 2016

We've seen a failure of the apt-get.py wrapper where it is supposed to retry the command when known error strings appear in the output, but did not do so.

http://build.ros.org/view/All/job/Ibin_arm_uThf__pr2_camera_synchronizer__ubuntu_trusty_armhf__binary/3/console

The apt-get.py update call outputs a warning which the script is supposed to catch: "is not what the server reported". But somehow it doesn't, and I fail to see why or how that could be:

00:00:33.064 Step 16 : RUN python3 -u /tmp/wrapper_scripts/apt-get.py update-and-install -q -y devscripts dpkg-dev python3-apt python3-catkin-pkg python3-empy python3-rosdistro python3-yaml
00:00:35.014  ---> Using cache
00:00:35.014  ---> ed79b2cf328a
00:00:35.014 Step 17 : RUN echo "2016-01-22 18:45:43 -0800"
00:00:37.162  ---> Running in e0b55cc14fa8
00:00:37.360 2016-01-22 18:45:43 -0800
00:00:37.717  ---> 6b206454b769
00:00:37.728 Removing intermediate container e0b55cc14fa8
00:00:37.728 Step 18 : RUN python3 -u /tmp/wrapper_scripts/apt-get.py update
00:00:37.894  ---> Running in 50d3e5501882
00:00:38.610 Invoking 'apt-get update'
00:00:42.581 Get:1 http://repositories.ros.org trusty InRelease [4,029 B]
00:00:44.510 Ign http://ports.ubuntu.com trusty InRelease
00:00:44.511 Get:2 http://ports.ubuntu.com trusty-updates InRelease [64.4 kB]
00:00:44.513 Hit http://ports.ubuntu.com trusty-security InRelease
00:00:44.674 Hit http://ports.ubuntu.com trusty Release.gpg
00:00:44.836 Hit http://ports.ubuntu.com trusty Release
00:00:45.556 Get:3 http://repositories.ros.org trusty/main Sources [1,101 kB]
00:00:48.232 Get:4 http://repositories.ros.org trusty/main armhf Packages [518 kB]
00:00:48.391 Get:5 http://ports.ubuntu.com trusty-updates/main Sources [311 kB]
00:00:49.018 Get:6 http://ports.ubuntu.com trusty-updates/restricted Sources [5,219 B]
00:00:49.340 Get:7 http://ports.ubuntu.com trusty-updates/universe Sources [185 kB]
00:00:49.733 Get:8 http://ports.ubuntu.com trusty-updates/main armhf Packages [762 kB]
00:00:50.321 Get:9 http://ports.ubuntu.com trusty-updates/restricted armhf Packages [13.3 kB]
00:00:50.639 Get:10 http://ports.ubuntu.com trusty-updates/universe armhf Packages [428 kB]
00:00:51.107 Hit http://ports.ubuntu.com trusty-security/main Sources
00:00:51.266 Hit http://ports.ubuntu.com trusty-security/restricted Sources
00:00:51.425 Hit http://ports.ubuntu.com trusty-security/universe Sources
00:00:51.584 Hit http://ports.ubuntu.com trusty-security/main armhf Packages
00:00:51.743 Hit http://ports.ubuntu.com trusty-security/restricted armhf Packages
00:00:51.902 Hit http://ports.ubuntu.com trusty-security/universe armhf Packages
00:00:52.062 Hit http://ports.ubuntu.com trusty/main Sources
00:00:52.220 Hit http://ports.ubuntu.com trusty/restricted Sources
00:00:52.388 Hit http://ports.ubuntu.com trusty/universe Sources
00:00:52.539 Hit http://ports.ubuntu.com trusty/main armhf Packages
00:00:52.696 Hit http://ports.ubuntu.com trusty/restricted armhf Packages
00:00:52.864 Hit http://ports.ubuntu.com trusty/universe armhf Packages
00:01:13.965 Fetched 3,392 kB in 30s (111 kB/s)
00:01:42.177 Reading package lists...
00:01:43.111 W: Size of file /var/lib/apt/lists/ports.ubuntu.com_ubuntu-ports_dists_trusty-updates_main_binary-armhf_Packages.gz is not what the server reported 762095 762158
00:01:43.769  ---> 549cc867917d
00:01:43.784 Removing intermediate container 50d3e5501882
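For readers unfamiliar with the wrapper, the intended behavior can be sketched roughly as follows. This is a minimal sketch and not the actual apt-get.py: the names `KNOWN_ERROR_STRINGS`, `find_known_errors`, `apt_get_update` and `run_with_retry` are made up for illustration, and the list holds only the one string quoted above. The point it shows is that the output has to be scanned even when apt-get exits with 0, since a `W:` line like the one in the log is only a warning.

```python
import subprocess

# Assumption: the real wrapper keeps a longer list; only this string is quoted in the thread.
KNOWN_ERROR_STRINGS = ['is not what the server reported']


def find_known_errors(output, known_strings=KNOWN_ERROR_STRINGS):
    """Return the known error strings that occur anywhere in the output."""
    return [s for s in known_strings if s in output]


def apt_get_update():
    """Run 'apt-get update' and return (return_code, combined stdout and stderr)."""
    proc = subprocess.run(
        ['apt-get', 'update'],
        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
        universal_newlines=True)
    return proc.returncode, proc.stdout


def run_with_retry(run, max_tries=3):
    """Call run() until it exits with 0 and prints no known error string.

    run is any callable returning (return_code, output_text).
    Returns (succeeded, number_of_tries_used).
    """
    for attempt in range(1, max_tries + 1):
        rc, output = run()
        # A zero return code alone is not trusted: warnings are scanned too.
        if rc == 0 and not find_known_errors(output):
            return True, attempt
    return False, max_tries
```

Under these assumptions, `run_with_retry(apt_get_update)` would rerun the update for the log above, because the warning text matches even though the step finished without an error.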
@dirk-thomas dirk-thomas added the bug label Feb 9, 2016
@dirk-thomas
Member

The same happens for the new known error "Unable to locate package": http://build.ros.org/job/Idoc__zbar_ros__ubuntu_trusty_amd64/7/console

@dirk-thomas
Member

@tfoote
Member Author

tfoote commented Feb 17, 2016

@dirk-thomas dirk-thomas self-assigned this Feb 22, 2016
dirk-thomas added a commit that referenced this issue Feb 26, 2016
dirk-thomas added a commit that referenced this issue Feb 26, 2016
dirk-thomas added a commit that referenced this issue Feb 26, 2016
@dirk-thomas
Member

PR #213 adds some debug output to the wrapper script. As soon as the problem happens again, this should provide more insight into why it failed.

@tfoote
Member Author

tfoote commented Mar 4, 2016

I just noticed this job doesn't seem to retry: http://build.ros.org/job/Idev__geometry_experimental__ubuntu_trusty_amd64/7/console

@dirk-thomas
Member

@dirk-thomas
Member

The following jobs contain "Unable to locate package" during an apt-get install call but don't retry the invocation:

The problem here is that apt-get has a non-zero rc, but the known error condition that was found is not in known_error_strings_redo_update, only in known_error_strings. The latter is not checked: the condition only tests set(known_error_conditions) & set(known_error_strings_redo_update).
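To make the gap concrete, here is a small sketch of that check. Only the two list names and the set expression come from the comment above; the list contents and the helpers `scan` and `should_redo_update` are assumptions chosen for illustration and may not match the real script.

```python
# Assumption: illustrative contents, not copied from apt-get.py.
known_error_strings = [
    'is not what the server reported',
    'Unable to locate package',
]
known_error_strings_redo_update = [
    'is not what the server reported',
]


def scan(output):
    """Collect every known error string that appears in the command output."""
    return [s for s in known_error_strings if s in output]


def should_redo_update(known_error_conditions):
    """Mirror the check quoted above: retry only on a non-empty intersection."""
    return bool(
        set(known_error_conditions) &
        set(known_error_strings_redo_update))


found = scan('E: Unable to locate package ros-indigo-nodelet')
retry = should_redo_update(found)
```

With these lists, `found` contains the "Unable to locate package" string but `retry` is false, so the error is recognized as known and still falls through without a retry.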

@dirk-thomas
Member

All cases should be addressed by #232.

@dirk-thomas
Member

I will keep this ticket for a bit. If the problem doesn't appear again the debug output added with #213 should be removed.

@scpeters

I just saw a job that was "Unable to locate package" and retried multiple times but still failed.

I restarted it, and it made it past that point, so it seems like a spurious failure.

Is that relevant to this issue or should I report it elsewhere?

@dirk-thomas
Member

dirk-thomas commented Apr 26, 2016

@scpeters The problem is in Step 38 of the Docker file (http://build.ros.org/job/Ipr__gazebo_ros_pkgs__ubuntu_trusty_amd64/3/consoleFull#console-section-10):

  • the first apt-get update invocation printed the warning "is not what the server reported"
  • our wrapper script caught that and reran apt-get update - this time without a warning
  • the subsequent apt-get install calls are "Unable to locate package" ros-indigo-nodelet

From the wrapper script's point of view, update was successful and it has no reason to reinvoke it. Only the problems listed here (https://github.com/ros-infrastructure/ros_buildfarm/blob/master/scripts/wrapper/apt-get.py#L47-L51) result in reinvoking update. So I think the script is working "correctly".

The question is whether it should retry update in that case. It could, but if we can't rely on apt-get to tell us whether it was successful, we basically always need to rerun it and can ditch all the specific checks in the wrapper script.
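As a rough illustration of what "retry update in that case" could look like, the sketch below reruns update whenever install reports "Unable to locate package". It is an assumption-laden sketch, not the wrapper's code and not a description of any referenced PR: `update_and_install` and its callable parameters are hypothetical names, and the commands are passed in as callables returning (return_code, output_text) so the logic can be tried without touching apt-get.

```python
def update_and_install(update, install, max_tries=3):
    """Retry install, rerunning update before each attempt.

    update and install are callables returning (return_code, output_text).
    Returns True once install succeeds, False otherwise.
    """
    for _attempt in range(max_tries):
        rc, _output = update()
        if rc != 0:
            # update itself failed; try the whole cycle again
            continue
        rc, output = install()
        if rc == 0:
            return True
        if 'Unable to locate package' not in output:
            # Some other failure: rerunning update is unlikely to help.
            return False
        # Otherwise assume update silently left stale lists and loop again.
    return False
```

The cost is visible in the loop: a package that really does not exist triggers the full number of update runs before the failure is reported, which is the overhead concern raised above.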

@scpeters

So you're saying this is a bug in apt-get? If that's the case I'm not sure what we can do.

@dirk-thomas
Member

Yes, I think apt-get is failing to report the problem. I created #289 to work around it, but I don't like the idea of not trusting apt-get, since it results in quite some overhead if the problem is for real...

@dirk-thomas
Member

The problem in #287 has the same origin from what I see.

@dirk-thomas
Member

This should have been addressed by the referenced PRs. I will close it for now. If you see another build failing due to this, please comment here and it can be reopened.
