-
Notifications
You must be signed in to change notification settings - Fork 645
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor stream parsing to ensure safety. #436
Conversation
Use ByteBuffers and ByteStreams.readFully() to ensure buffers are correctly filled. Previously, it was possible to read a partial header. readStreamFrame() is inspired by and based on: https://gist.github.com/dflemstr/9541193d141729b6c84d/e02c64c67826a2a4291eb1e950bef1e71999b8d3#file-dockerlogstream-java-L90-L115 \#### Motivation Using this configuration in pom.xml: ```maven-pom <plugin> <groupId>org.jolokia</groupId> <artifactId>docker-maven-plugin</artifactId> <version>0.13.2</version> <configuration> <verbose>true</verbose> <images> <image> <name>${docker.imageName}:${project.version}-${buildNumber}</name> <alias>db-tag</alias> <run> <log> <prefix>DB-TAG</prefix> </log> <links> <link>postgres:postgres</link> </links> <env> <RDS_USERNAME>${docker-maven-plugin.POSTGRES_USER} </RDS_USERNAME> <RDS_PASSWORD>${docker-maven-plugin.POSTGRES_PASSWORD} </RDS_PASSWORD> <RDS_DB_NAME>${docker-maven-plugin.POSTGRES_DB} </RDS_DB_NAME> </env> <!-- Should match contents of .ebextentions/10_liquibase.config --> <cmd> db tag --migrations /liquibase/db.changelog.yml /docker-it.yml ${project.artifactId}-${project.version}-${buildNumber} </cmd> <wait> <log>liquibase: Successfully released change log lock</log> <time>300000</time> </wait> </run> </image> </images> </configuration> <executions> <execution> <id>start</id> <phase>pre-integration-test</phase> <goals> <goal>build</goal> <goal>start</goal> </goals> </execution> <execution> <id>stop</id> <phase>post-integration-test</phase> <goals> <goal>stop</goal> </goals> </execution> </executions> </plugin> ``` About 50% of the time, the expected output from docker container would show up, but the plugin would not recognize it and eventually timeout. An example log snippet is: ``` /liquibase/db.changelog.initial.yml::15::thomasvandoren: ChangeSet /liquibase/db.changelog.initial.yml::15::thomasvandoren ran successfully in 3ms DB-TAG> INFO [2015-08-04 23:14:17,573] liquibase: Successfully released change log lock Exception in thread "Thread-17" java.lang.IllegalArgumentException: Invalid format: "04T23:14:17.483Z" is malformed at "T23:14:17.483Z" at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:873) at org.jolokia.docker.maven.util.Timestamp.<init>(Timestamp.java:53) at org.jolokia.docker.maven.access.log.LogRequestor.callLogCallback(LogRequestor.java:142) at org.jolokia.docker.maven.access.log.LogRequestor.parseResponse(LogRequestor.java:121) at org.jolokia.docker.maven.access.log.LogRequestor.run(LogRequestor.java:97) [ERROR] DOCKER> [thomasvandoren/test-repo:0.2-SNAPSHOT-1402d5319ddf30d2] "db-tag": Timeout after 300000 ms while waiting on on log out 'liquibase: Successfully released change log lock' ``` Other invalid format error messages that I observed include: ``` Exception in thread "Thread-13" java.lang.IllegalArgumentException: Invalid format: "?2015-08-05T20:33:12.280Z" Exception in thread "Thread-16" java.lang.IllegalArgumentException: Invalid format: "02T02:04:28.607Z" is malformed at "T02:04:28.607Z" Exception in thread "Thread-16" java.lang.IllegalArgumentException: Invalid format: ".000��" ``` There were also cases where no stack trace was printed, the expected line did show up, but the plugin still timed out "waiting" for it. I'm not sure what failed in those cases. This all seems like potential corrupt byte buffer behavior to me, which led to this change. Signed-off-by: Thomas Van Doren <thomas.vandoren@gmail.com>
Thanks a lot ;-) LGTM. With which version of d-m-p did you have the issue ? I ask because I ran also in concurrency issues where are connection was potentially reused in two different threads. That was fixed in 0.15.1 But you are right, one should at least check whether the number of bytes read are indeed the same as the declared length. |
Adapt changelog, cleaned up imports.
We have observed the invalid format issue with d-m-p version 0.13.2 through 0.15.1. The most recent, with 0.15.1, was:
Let me know if you need any more info. Thanks! |
I started using this in our regular CI build today, and 2/5 builds failed with a different error (presumably where we would have gotten the invalid format error previously):
I'll see if I can get some better information about what is going on. |
Do you run the CI with your PR applied or vanilla 0.15.1 ? |
With my PR applied. I built a 0.15.2-SNAPSHOT from this branch. |
Shit. 'hope we can sort this out soon. When you have more information, let me know. Also, I will try to reproduce it here with your pom posted. |
Here is the latest pom: <plugin>
<groupId>io.fabric8</groupId>
<artifactId>docker-maven-plugin</artifactId>
<version>0.15.2-SNAPSHOT</version>
<configuration>
<skip>${skipDocker}</skip>
<verbose>true</verbose>
<images>
<image>
<name>
postgres:9.5
</name>
<alias>postgres</alias>
<run>
<log>
<prefix>KB-POSTGRES</prefix>
</log>
<env>
<POSTGRES_USER>${docker-maven-plugin.POSTGRES_USER}</POSTGRES_USER>
<POSTGRES_PASSWORD>${docker-maven-plugin.POSTGRES_PASSWORD}</POSTGRES_PASSWORD>
<POSTGRES_DB>${docker-maven-plugin.POSTGRES_DB}</POSTGRES_DB>
</env>
<ports>
<port>postgres.port:5432</port>
</ports>
<wait>
<log>PostgreSQL init process complete; ready for start up</log>
<time>300000</time>
</wait>
</run>
</image>
<image>
<name>${docker.imageName}:${project.version}-${buildNumber}</name>
<alias>kb-image</alias>
<build>
<from>quay.io/aptible/java:oracle-java8</from>
<maintainer>rafa</maintainer>
<ports>
<port>8080</port>
<port>8081</port>
</ports>
<entryPoint>
<exec>
<args>java</args>
<args>-jar</args>
<args>/${project.build.finalName}.jar</args>
</exec>
</entryPoint>
<cmd>
<exec>
<args>server</args>
<args>/aptible.yml</args>
</exec>
</cmd>
<assembly>
<basedir>/</basedir>
<inline>
<fileSets>
<fileSet>
<outputDirectory>/</outputDirectory>
<filtered>false</filtered>
<directory>${project.build.outputDirectory}
</directory>
<includes>
<include>aptible.yml</include>
<include>docker-it.yml</include>
<include>liquibase/**</include>
</includes>
</fileSet>
<fileSet>
<outputDirectory>/</outputDirectory>
<filtered>false</filtered>
<directory>${project.build.directory}</directory>
<includes>
<include>${project.build.finalName}.jar
</include>
</includes>
</fileSet>
</fileSets>
</inline>
</assembly>
</build>
</image>
<image>
<name>${docker.imageName}:${project.version}-${buildNumber}</name>
<alias>kb-db-migrate</alias>
<run>
<log>
<prefix>KB-DB-MIGRATE</prefix>
</log>
<links>
<link>postgres:postgres</link>
</links>
<env>
<DB_USERNAME>${docker-maven-plugin.POSTGRES_USER}
</DB_USERNAME>
<DB_PASSWORD>${docker-maven-plugin.POSTGRES_PASSWORD}
</DB_PASSWORD>
<DB_NAME>${docker-maven-plugin.POSTGRES_DB}
</DB_NAME>
</env>
<cmd>
db migrate
--migrations /liquibase/db.changelog.yml
/docker-it.yml
</cmd>
<wait>
<log>liquibase: Successfully released change log lock</log>
<time>30000</time>
</wait>
</run>
</image>
<image>
<name>${docker.imageName}:${project.version}-${buildNumber}</name>
<alias>kb-db-tag</alias>
<run>
<log>
<prefix>KB-DB-TAG</prefix>
</log>
<links>
<link>postgres:postgres</link>
</links>
<env>
<DB_USERNAME>${docker-maven-plugin.POSTGRES_USER}
</DB_USERNAME>
<DB_PASSWORD>${docker-maven-plugin.POSTGRES_PASSWORD}
</DB_PASSWORD>
<DB_NAME>${docker-maven-plugin.POSTGRES_DB}
</DB_NAME>
</env>
<cmd>
db tag
--migrations /liquibase/db.changelog.yml
/docker-it.yml
${project.artifactId}-${project.version}-${buildNumber}
</cmd>
<wait>
<log>liquibase: Successfully released change log lock</log>
<time>30000</time>
</wait>
</run>
</image>
<image>
<name>${docker.imageName}:${project.version}-${buildNumber}</name>
<alias>kb-server</alias>
<run>
<ports>
<port>kb.port:8080</port>
</ports>
<log>
<prefix>KB-DOCKER</prefix>
</log>
<links>
<link>postgres:postgres</link>
</links>
<env>
<DB_USERNAME>${docker-maven-plugin.POSTGRES_USER}
</DB_USERNAME>
<DB_PASSWORD>${docker-maven-plugin.POSTGRES_PASSWORD}
</DB_PASSWORD>
<DB_NAME>${docker-maven-plugin.POSTGRES_DB}
</DB_NAME>
<AWS_ACCESS_KEY_ID>${env.AWS_ACCESS_KEY_ID}</AWS_ACCESS_KEY_ID>
<AWS_SECRET_ACCESS_KEY>${env.AWS_SECRET_ACCESS_KEY}</AWS_SECRET_ACCESS_KEY>
</env>
<cmd>
server /docker-it.yml
</cmd>
<wait>
<url>http://${docker.host.address}:${kb.port}/swagger.json</url>
<time>300000</time>
</wait>
</run>
</image>
</images>
</configuration>
<executions>
<execution>
<id>start</id>
<phase>pre-integration-test</phase>
<goals>
<goal>build</goal>
<goal>start</goal>
</goals>
</execution>
<execution>
<id>stop</id>
<phase>post-integration-test</phase>
<goals>
<goal>stop</goal>
</goals>
</execution>
</executions>
</plugin> |
When zero bytes are read when trying to read the 8 byte header, assume the stream is done and stop iterating. If 1-7 bytes are read when trying to read the 8 byte header, throw an IOException indicating the error. If anything other than the full N bytes are read for the message, i.e. if 0 to N-1 bytes are read, also throw an IOException. This ensures that unexpected log stream behavior is propaged as an error to the user. Signed-off-by: Thomas Van Doren <thomas.vandoren@gmail.com>
One issue could be that the chunks read to compare the logfile against are split within the log message (so e.g.that one chunk contains "liquibase: Successfully" and the next "released change log lock", but the log check only matches on a single's chunk's content). I will going to maintain a buffer for the whole line and match always on whole lines. |
btw, I merged your PR already into branch |
There must be something wrong with the length decoding since I don't think that there are 750 MB logs there to read (as indicated in the error message) |
Hmm, dont think its the parsing. Could you please do me a favor and rerun your example with the debugging switched on (-X) and from the latest commit 5268542 (head of branch |
Any chance to get a runnable example so that I can reproduce it ? I think the pom with some liquibase migration scripts could be enough to provoke this behaviour ... My feeling is that we would need at least two images .. |
@rhuss I shared a private repo with you that repros the issue. Shoot me an email ( thomas.vandoren@gmail.com ) if you have issues or questions. |
Thanks for sharing the repo, I could indeed reproduce the issue. From the logs (900 MB each ;-) I can see that there is an issue with concurrent reads on the socket and even when I'm not using a pooled connection librarary, I see strange behaviour of data appearing on wrong connections (we have several connections open in parallel, one is for reading the log for the wait condition, another is printing it out to the cosole). I suspect that there is an issue with UNIX socket handling in https://github.com/jnr/jnr-unixsocket When using TCP sockets, everything seems fine to me. Could you please try the following on your own:
I will also check whether an update of Though still very curious whether it make a difference for you when using TCP ... ;-) Thanks a lot for setting up the test case, much appreciated :) |
I just run 23 tests in a row with using TCP without any issue /wrt to the build. However, the Docker daemon broke completely, hanging somewhere in kernel mode. Now rebooting and the same with an updated unix socket lib ... |
Using TCP, I was able to run the build 78 times without any issues before I ran out of disk space due to /var/lib/docker/volumes getting too big. I think it's safe to say the TCP workaround works! We'll use that for now. I was able to reproduce the timeout 1 out of 10 times using unix sockets and the upgraded jnr-unixsocket library. I built docker-maven-plugin from the integration branch at 9c6ac58. Let me know if I can provide any additional information about the unix socket issue. Thanks for your help! |
Use ByteBuffers and ByteStreams.readFully() to ensure buffers are correctly
filled. Previously, it was possible to read a partial header or partial message, because
IOUtils.read only "guarantees that it will read as many bytes as possible before giving
up". This means it could read all 8 bytes requested for the header or all N bytes
requested for message, but it could also read fewer bytes in which case it would end up
with a partial header or message.
readStreamFrame() is inspired by and based on:
https://gist.github.com/dflemstr/9541193d141729b6c84d/e02c64c67826a2a4291eb1e950bef1e71999b8d3#file-dockerlogstream-java-L90-L115
Motivation
Using this configuration in pom.xml:
About 50% of the time, the expected output from docker container would show up,
but the plugin would not recognize it and eventually timeout. An example log
snippet is:
Other invalid format error messages that I observed include:
There were also cases where no stack trace was printed, the expected line did
show up, but the plugin still timed out "waiting" for it. I'm not sure what failed in
those cases.
This all seems like potential corrupt byte buffer behavior to me, which led to
this change.
Signed-off-by: Thomas Van Doren thomas.vandoren@gmail.com
Related to: #259