-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support use of Path in ParsingUtils, SeekableStreamFactory, AbstractVCFCodec #724
Support use of Path in ParsingUtils, SeekableStreamFactory, AbstractVCFCodec #724
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tomwhite Seems like very reasonable changes. Hard to test that they're actually working though. Especially the case with a weird classpath. I have a few concrete comments and then a vague suggestion that we need tests with non file:
paths just to shake out any places where it might accidentally be falling back to files without us knowing it. I think maybe if we include a test dependency on an in memory filesytem we could make much stronger test for the paths without having to bundle anything with htsjdk on release.
public static Path getPath(String uriString) throws IOException { | ||
URI uri = URI.create(uriString); | ||
try { | ||
return uri.getScheme() == null ? Paths.get(uriString) : Paths.get(uri); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment here explaining this line. It's going to be very confusing for anyone who doesn't know that Paths.get(String)
and Paths.get(URI)
have different mechanics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
if (cl == null) { | ||
throw e; | ||
} | ||
return FileSystems.newFileSystem(uri, new HashMap<>(), cl).provider().getPath(uri); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's a bug here. I tested this code explicitly by including the gcs provider in the hellbender path and running the following tests:
@Test
public void testCompletePath() throws IOException {
newFilesystem(URI.create("gs://hellbender/somefile.txt"));
}
@Test
public void testBucketOnly() throws IOException {
newFilesystem(URI.create("gs://hellbender"));
}
private void newFilesystem(URI uri) throws IOException {
ClassLoader cl = Thread.currentThread().getContextClassLoader();
final FileSystem fileSystem = FileSystems.newFileSystem(uri, new HashMap<>(), cl);
}
the first test fails with
java.lang.IllegalArgumentException: GCS FileSystem URIs mustn't have: port, userinfo, path, query, or fragment: gs://hellbender/somefile.txt
at shaded.cloud-nio.com.google.common.base.Preconditions.checkArgument(Preconditions.java:146)
at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.newFileSystem(CloudStorageFileSystemProvider.java:192)
at com.google.cloud.storage.contrib.nio.CloudStorageFileSystemProvider.newFileSystem(CloudStorageFileSystemProvider.java:83)
at java.nio.file.FileSystems.newFileSystem(FileSystems.java:326)
at htsjdk.tribble.util.ParsingUtilsTest.newFilesystem(ParsingUtilsTest.java:179)
at htsjdk.tribble.util.ParsingUtilsTest.testCompletePath(ParsingUtilsTest.java:169)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:85)
at org.testng.internal.Invoker.invokeMethod(Invoker.java:639)
at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:816)
at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1124)
at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125)
at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:108)
at org.testng.TestRunner.privateRun(TestRunner.java:774)
at org.testng.TestRunner.run(TestRunner.java:624)
at org.testng.SuiteRunner.runTest(SuiteRunner.java:359)
at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:354)
at org.testng.SuiteRunner.privateRun(SuiteRunner.java:312)
at org.testng.SuiteRunner.run(SuiteRunner.java:261)
at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
at org.testng.TestNG.runSuitesSequentially(TestNG.java:1215)
at org.testng.TestNG.runSuitesLocally(TestNG.java:1140)
at org.testng.TestNG.run(TestNG.java:1048)
at org.testng.IDEARemoteTestNG.run(IDEARemoteTestNG.java:72)
at org.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:127)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
The second passes. So we can't just use the URI
that's passed in to open the new file system. I'm not sure if this is a bug in the implementation of the GCS fileSystemProvider or if it's going to be a consistent issue with multiple providers. The other providers that I know of don't offer consistent behavior here that I can tell. UnixFileSystem provider always throws because it doesn't want to be reinstantiated. The HDFS provider from hdfs.jsr203 seems to be more robust and just picks out the part of the URI that it needs.
@jean-philippe-martin Can you comment on this? It looks like a simple patch to the gcs provider would make it so that this approach would work.
@tomwhite Have you tested this running in spark? I don't see how it can work, but maybe I'm missing something important. I do see that our implementation of this method in gatk has a special case for gcs still.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did try running this on HDFS with Spark a while ago and it worked. There's a fix for the GCS issue from @jean-philippe-martin here: googleapis/google-cloud-java#1470
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lbergerson: yes this was a bug, it's fixed now.
@tomwhite: Does this mean that on Spark after someone calls getPath on a gs:// URI, the fs provider will be loaded and Path.get will work correctly? Because that would be sweet.
@@ -143,6 +164,18 @@ private void tstExists(String path, boolean expectExists) throws IOException{ | |||
} | |||
|
|||
@Test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need a way of testing non-file URI's. I'm not sure the best way to go about it. One possibility would be to include one of the alternative filesystem providers as a test dependency. Either that or maybe we could mock one. There is a [google in memory filesystem] that might make a good test for the non-file based file systems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@droazen What do you think about including a new test dependency for providing alternative paths. It might help us avoid problems like the index one we saw before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I've added a test dependency on https://github.com/google/jimfs, and a few tests that use non-file URIs that exercise the NIO file path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
* @return the resulting {@code Path} | ||
* @throws IOException an I/O error occurs creating the file system | ||
*/ | ||
public static Path getPath(String uriString) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you replace the use of Paths.get
in SamInputResource.UrlInputResource
with a call to this instead? Just in case...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
d3f3a4b
to
f15fbdd
Compare
Current coverage is 63.863% (diff: 50.000%)@@ master #724 diff @@
==========================================
Files 523 523
Lines 31643 31660 +17
Methods 0 0
Messages 0 0
Branches 6766 6772 +6
==========================================
+ Hits 20214 20219 +5
- Misses 9293 9301 +8
- Partials 2136 2140 +4
|
@lbergelson thanks for the review. I've addressed all your comments, so please take another look. (See broadinstitute/gatk#2344 for the companion GATK issue.) |
@tomwhite Changes look good. Tests with jimfs are a good addition. It would be good to have a test that reads a vcf with an index from jimfs so we make sure we avoid the issue we had with bam indexes. I want to get this in though, so I'm going to merge and we can add one after the fact. |
Description
Added support for java.util.nio.Path for some more code pathways.
This is to support loading of Features with Spark in GATK.
Checklist