NetCDF_jll v400.702.402+0 and later broken on Windows #164

Closed
visr opened this issue Feb 25, 2022 · 19 comments

@visr
Member

visr commented Feb 25, 2022

Describe the bug

A new NetCDF_jll release was made a few days ago. It appears that this doesn't communicate well with the HDF5 dependency.

This was introduced in JuliaPackaging/Yggdrasil#4481, cc @felixcremer. I don't see anything wrong in the build file, but the main difference is that it is linked to HDF5_jll v1.12.1+0 instead of the older v1.12.0+1.

Shall we yank the build? We would lose a functioning Apple M1 build however.

Side note: this is not directly related to NCDatasets. @Alexander-Barth do you prefer that I create these issues in Yggdrasil? I thought here might be better for visibility for users.

To Reproduce

Here is an example using only NetCDF_jll, to keep it as simple as possible.

julia> using NetCDF_jll
julia> unsafe_string(ccall((:nc_inq_libvers, libnetcdf), Cstring, ()))
"4.7.4 of Feb 22 2022 14:00:01 \$"
julia> NC_CLASSIC_MODEL = 0x0100
julia> NC_NETCDF4 = 0x1000
julia> # it can create a classic netcdf (no HDF5 needed)
julia> ccall((:nc_create, libnetcdf), Cint, (Cstring, Cint, Ptr{Cint}), "test.nc3", NC_CLASSIC_MODEL, Ref(Cint(0)))
julia> # but gives a segfault on netcdf4
julia> ccall((:nc_create, libnetcdf), Cint, (Cstring, Cint, Ptr{Cint}), "test.nc4", NC_NETCDF4, Ref(Cint(0)))
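
Because the second call takes down the whole Julia session, it can be convenient to run the reproduction in a child process and only check whether that process exited normally. A minimal sketch using only Base; `runs_cleanly` and `repro` are hypothetical names, not part of NetCDF_jll or NCDatasets:

```julia
# Run `code` in a fresh Julia process and report whether it exited normally.
# A segfault in the child shows up as `false` here instead of killing this session.
# Note: the child uses the default environment unless JULIA_PROJECT is set.
function runs_cleanly(code::AbstractString)
    cmd = `$(Base.julia_cmd()) --startup-file=no -e $code`
    return success(pipeline(cmd; stdout=devnull, stderr=devnull))
end

# The netCDF-4 reproduction from above, as a string (0x1000 is NC_NETCDF4).
repro = """
using NetCDF_jll
ccall((:nc_create, libnetcdf), Cint, (Cstring, Cint, Ptr{Cint}),
      "test.nc4", 0x1000, Ref(Cint(0)))
"""

# Uncomment in an environment where NetCDF_jll is installed:
# println(runs_cleanly(repro) ? "no crash" : "crashed or errored")
```

A `false` result only says that the child did not exit with status 0; the full output below is still needed to see where it failed.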

Expected behavior

Create an empty "test.nc4" file.

Environment

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 3

Full output

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x2374cb3 -- .text at C:\Users\visser_mn\.julia\artifacts\2b6e2ce84250e36811c3019c1ad253c1739c888f\bin\libnetcdf-18.dll (unknown line)
in expression starting at REPL[13]:1
.text at C:\Users\visser_mn\.julia\artifacts\2b6e2ce84250e36811c3019c1ad253c1739c888f\bin\libnetcdf-18.dll (unknown line)
NC4_create at C:\Users\visser_mn\.julia\artifacts\2b6e2ce84250e36811c3019c1ad253c1739c888f\bin\libnetcdf-18.dll (unknown line)
NC_create at C:\Users\visser_mn\.julia\artifacts\2b6e2ce84250e36811c3019c1ad253c1739c888f\bin\libnetcdf-18.dll (unknown line)
nc__create at C:\Users\visser_mn\.julia\artifacts\2b6e2ce84250e36811c3019c1ad253c1739c888f\bin\libnetcdf-18.dll (unknown line)
nc_create at C:\Users\visser_mn\.julia\artifacts\2b6e2ce84250e36811c3019c1ad253c1739c888f\bin\libnetcdf-18.dll (unknown line)
top-level scope at .\REPL[13]:1
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:876
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:830
jl_toplevel_eval_flex at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:830
jl_toplevel_eval at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:894 [inlined]
jl_toplevel_eval_in at /cygdrive/c/buildbot/worker/package_win64/build/src\toplevel.c:944
eval at .\boot.jl:373 [inlined]
eval_user_input at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:150
repl_backend_loop at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:246
start_repl_backend at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:231
#run_repl#47 at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:364
run_repl at C:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.7\REPL\src\REPL.jl:351
#930 at .\client.jl:394
jfptr_YY.930_36349.clone_1 at C:\Users\visser_mn\.julia\juliaup\julia-1.7.2+0~x64\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1788 [inlined]
jl_f__call_latest at /cygdrive/c/buildbot/worker/package_win64/build/src\builtins.c:757
#invokelatest#2 at .\essentials.jl:716 [inlined]
invokelatest at .\essentials.jl:714 [inlined]
run_main_repl at .\client.jl:379
exec_options at .\client.jl:309
_start at .\client.jl:495
jfptr__start_21275.clone_1 at C:\Users\visser_mn\.julia\juliaup\julia-1.7.2+0~x64\lib\julia\sys.dll (unknown line)
jl_apply at /cygdrive/c/buildbot/worker/package_win64/build/src\julia.h:1788 [inlined]
true_main at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:559
jl_repl_entrypoint at /cygdrive/c/buildbot/worker/package_win64/build/src\jlapi.c:701
mainCRTStartup at /cygdrive/c/buildbot/worker/package_win64/build/cli\loader_exe.c:42
BaseThreadInitThunk at C:\WINDOWS\System32\KERNEL32.DLL (unknown line)
RtlUserThreadStart at C:\WINDOWS\SYSTEM32\ntdll.dll (unknown line)
Allocations: 9681000 (Pool: 9675485; Big: 5515); GC: 13

Workaround

add NetCDF_jll@400.702.400

or

[compat]
NetCDF_jll = "=400.702.400"
@Alexander-Barth
Member

Thanks a lot for letting me know!
Can somebody on Windows test whether HDF5.jl with HDF5_jll v1.12.1+0 passes its test suite?

@visr
Member Author

visr commented Feb 26, 2022

Yes, I just checked: HDF5.jl doesn't have an issue with that HDF5_jll. So it seems both libraries work individually, but not together, on Windows.
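
One way to narrow this down is to look at which HDF5 library actually ends up loaded in the process once the packages are in use. A small sketch using the `Libdl` standard library; `loaded_libraries` is a hypothetical helper name, and this only lists what is loaded, it does not explain the crash:

```julia
using Libdl

# Paths of currently loaded shared libraries whose file name contains
# `pattern` (case-insensitive).
function loaded_libraries(pattern::AbstractString)
    needle = lowercase(pattern)
    return filter(path -> occursin(needle, lowercase(basename(path))), Libdl.dllist())
end

# After `using NetCDF_jll` (or `using HDF5`), print the HDF5-related libraries
# in this process; in a fresh session this typically prints nothing.
foreach(println, loaded_libraries("hdf5"))
```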

@Alexander-Barth
Member

Thanks @visr for the update! I also filed a bug report here:
JuliaPackaging/Yggdrasil#4511

Alexander-Barth pinned this issue Mar 4, 2022
Alexander-Barth added a commit that referenced this issue Mar 4, 2022
@Alexander-Barth
Member

Shall we yank the build? We would lose a functioning Apple M1 build however.

As there are many more Windows users than Apple M1 users, I would indeed be in favour of yanking the build of NetCDF_jll.

@Alexander-Barth
Member

This issue is not resolved, but it is mitigated by declaring NCDatasets incompatible with NetCDF_jll v400.702.402+0 in NCDatasets 0.12.

@visr
Member Author

visr commented Mar 11, 2022

Thanks! Just curious, why did you go this route versus yanking the build from the registry? Though I suppose this way Apple M1 users still have a way to use the build if they want.

@Alexander-Barth
Member

Alexander-Barth commented Mar 11, 2022

Though I suppose this way Apple M1 users still have a way to use the build if they want.

Yes this is one reason, and I had a classroom full of students last Monday (mostly Windows, some Apple x86_64, some Linux) and needed a quick solution :-)
But I consider this just a temporary fix. Having an installable but dysfunctional NetCDF_jll (for Windows) is still bad in my opinion.

@Alexander-Barth
Member

For Julia 1.8, all jll packages that share a dependency with julia need to be rebuilt. This is also the case for NetCDF_jll, as both NetCDF_jll and julia depend on libcurl. Unfortunately, the workaround of setting the compat entry does not work with julia 1.8 on Windows; there is currently no installable NetCDF_jll on Windows with julia 1.8 (Linux and Mac OS seem to be fine on julia 1.8).

If a Windows user wants to contribute, here is an overview of the steps involved:

  1. Install BinaryBuilder
  2. Clone https://github.com/JuliaPackaging/Yggdrasil
  3. Adapt the files N/NetCDF/common.jl and N/NetCDF/NetCDF@julia-1.8/build_tarballs.jl, and possibly also the corresponding files for HDF5 (a dependency of NetCDF)
  4. Get a GITHUB_TOKEN with write permission
  5. Build the tarball with julia --color=yes ./build_tarballs.jl x86_64-w64-mingw32 --deploy="<your-github-username>/NetCDF_jll.jl" --verbose
  6. You can then install your jll with Pkg.add(url="https://github.com/<your-github-username>/NetCDF_jll.jl")
  7. Share your solution with a PR to Yggdrasil

Here is more information on BinaryBuilder: https://docs.binarybuilder.org/stable/. Windows support in BinaryBuilder is currently under active development. Alternatively, one can also use a Linux VM for BinaryBuilder.

The hard nut to crack is this:

A workaround might be to install NetCDF via Conda.jl or to build NetCDF_jll locally with HDF5 1.12.0 (but I did not test these options, and one would likely need to locally adapt the NCDatasets compat entry for NetCDF_jll in the Project.toml file).

@visr
Member Author

visr commented Apr 26, 2022

For Julia 1.8 all jll packages (with a shared dependency with julia) need to be rebuilt.

I've been on 1.8 beta for a while and never encountered any issues with the Windows build that is currently pinned (NetCDF_jll v400.702.400+0), and the tests pass locally.

Do you know of any code examples that would fail on 1.8 Windows? I assume these would have to use the shared dependencies (curl, MbedTLS, zlib).

@Alexander-Barth
Member

The world of dynamic libraries does not cease to surprise me!
I saw some issues on Linux with julia 1.8 where libnetcdf.so could not be loaded (similar to the transition from julia 1.5 to julia 1.6, where libcurl was also updated), and rebuilding NetCDF_jll was the solution then. On Linux, I can only use NetCDF_jll v400.802.102+0 with julia 1.8 (not available for Windows).

But maybe Windows can handle a library version mismatch better than Linux.
Does an opendap URL work for you?

using NCDatasets
ds = NCDataset("https://erddap.ifremer.fr/erddap/griddap/SDC_GLO_CLIM_TS_V2_1")

@visr
Member Author

visr commented Apr 26, 2022

Ha, fascinating indeed. The opendap URL also just works, including when I load some actual data into memory (ds["time"][:]).

@Alexander-Barth
Member

Thank you for confirming. I am hitting another bug on Linux: #173.

adigitoleo added a commit to adigitoleo/PlateMotionRequests.jl that referenced this issue May 28, 2022
The NetCDF write test fails on Windows, perhaps something similar to:
JuliaGeo/NCDatasets.jl#164
Try to mitigate by forcing an older NetCDF_jll version for now.
@Alexander-Barth
Member

Alexander-Barth commented Jul 8, 2022

For future reference, here is how to pin the NetCDF_jll version (only needed for Windows users who want to run the NCDatasets master version on julia 1.8):

using Pkg
Pkg.add("NetCDF_jll")
Pkg.pin(name="NetCDF_jll", version="400.702.400")
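
To confirm that the pin took effect, the resolved version can be read back from the active environment. A minimal sketch based on `Pkg.dependencies()`; the helper name `installed_version` is made up for this example:

```julia
using Pkg

# Version of package `name` in the active environment, or `nothing` if the
# package is not part of the environment (or has no version, e.g. some stdlibs).
function installed_version(name::AbstractString)
    for (_, info) in Pkg.dependencies()
        info.name == name && return info.version
    end
    return nothing
end

# With the pin above in place this is expected to report the 400.702.400 build.
println("NetCDF_jll: ", installed_version("NetCDF_jll"))
```

`Pkg.status("NetCDF_jll")` gives the same information interactively and also marks pinned packages.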

@Alexander-Barth
Member

For the record, here is a test I made in June with HDF5 1.12.2 from MSYS2 (Windows):

$ pacman -Q | grep -i HDF5
mingw-w64-x86_64-hdf5 1.12.2-1

I compiled NetCDF C 4.8.1 from source in MSYS2

./configure --disable-testsets  --enable-shared  --disable-static  --disable-dap-remote-tests
make LDFLAGS=" -no-undefined -Wl,--export-all-symbols" 

In Julia, I used these libraries using set_preferences!:

using Preferences, HDF5_jll, NetCDF_jll

set_preferences!(HDF5_jll, "libhdf5_path" => raw"C:\msys64\mingw64\bin\libhdf5-0.dll")
set_preferences!(NetCDF_jll, "libnetcdf_path" => raw"C:\msys64\home\Alexander Barth\netcdf-c\liblib\.libs\libnetcdf-19.dll")

While running the test suite,

using NCDatasets
include(joinpath(dirname(pathof(NCDatasets)),"..","test","runtests.jl"))
NetCDF library: C:\msys64\home\Alexander Barth\netcdf-c\liblib\.libs\libnetcdf-19.dll

I had no failures:

NetCDF version: 4.8.1 of Jun  8 2022 21:44:34 $
Test Summary: | Pass  Total
NCDatasets    |  829    829
Test Summary:  | Pass  Total
NetCDF4 groups |    9      9
Test Summary:          | Pass  Total
Variable-length arrays |   22     22
Test Summary:  | Pass  Total
Compound types |   16     16
Test Summary:      | Pass  Total
Time and calendars |   25     25
Test Summary:       | Pass  Total
Multi-file datasets |   70     70
Test Summary:     | Pass  Total
Deferred datasets |   13     13
Test Summary: | Pass  Total
@select macro |   33     33
Test.DefaultTestSet("@select macro", Any[], 33, false, false)

Surprisingly, when NetCDF 4.9.0 is compiled with BinaryBuilder using the recently released HDF5_jll 1.12.2, I now get these errors again in julia 1.8.0 rc3:

NetCDF_jll.libnetcdf = "C:\\Users\\runneradmin\\.julia\\artifacts\\e3b96f6ac2bb213ecbcbce2ca0ac0bb43bf9561d\\bin\\libnetcdf-19.dll"
NetCDF library: C:\Users\runneradmin\.julia\artifacts\e3b96f6ac2bb213ecbcbce2ca0ac0bb43bf9561d\bin\libnetcdf-19.dll
NetCDF version: 4.9.0 of Aug  1 2022 12:57:29 $

Please submit a bug report with steps to reproduce this fault, and any error messages that follow (in their entirety). Thanks.
Exception: EXCEPTION_ACCESS_VIOLATION at 0x66fb1a1c -- nc4_create_file at /workspace/srcdir/netcdf-c-4.9.0/libhdf5\hdf5create.c:120
in expression starting at D:\a\NCDatasets.jl\NCDatasets.jl\test\test_simple.jl:11
nc4_create_file at /workspace/srcdir/netcdf-c-4.9.0/libhdf5\hdf5create.c:120
NC4_create at /workspace/srcdir/netcdf-c-4.9.0/libhdf5\hdf5create.c:321

Full logs
https://github.com/Alexander-Barth/NCDatasets.jl/runs/7612204603?check_suite_focus=true

If somebody wants to try, here is the DLL of NetCDF 4.8.1 that worked for me:
https://dox.ulg.ac.be/index.php/s/DSZy9SNCUJmCRZA

$ sha1sum libnetcdf-19.dll
47efeb7dcc8756d62d4bd832f2f3bbb8e7fd2c09 *libnetcdf-19.dll
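
For those without `sha1sum` on Windows, the same check can be done from Julia with the `SHA` standard library. A small sketch; `file_sha1` and `dll_path` are hypothetical names, and `dll_path` should point to wherever the file was saved:

```julia
using SHA

# Hex-encoded SHA-1 digest of the file at `path`.
file_sha1(path::AbstractString) = bytes2hex(open(sha1, path))

# Checksum quoted above for libnetcdf-19.dll.
const EXPECTED_SHA1 = "47efeb7dcc8756d62d4bd832f2f3bbb8e7fd2c09"

# Placeholder location of the downloaded file.
dll_path = "libnetcdf-19.dll"
if isfile(dll_path)
    println(file_sha1(dll_path) == EXPECTED_SHA1 ? "checksum matches" : "checksum MISMATCH")
else
    println("file not found: ", dll_path)
end
```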

@visr
Member Author

visr commented Aug 2, 2022

Oh that's a bummer! So do I understand correctly that it's not the HDF5 patch version 1 or 2 that is important, but whether or not netcdf is cross compiled? And the last cross compiled netcdf that was successfully built against an HDF5 mingw build was netcdf 4.7 against HDF5 1.12.0? https://github.com/JuliaPackaging/Yggdrasil/blob/5b9aa3d48766ab2681f6b92e0b7e6116ddfc5e27/N/NetCDF/common.jl

@Alexander-Barth
Member

Alexander-Barth commented Aug 2, 2022

To be honest, I am not sure what exactly triggers this error, but it does indeed appear to be a cross-compilation issue introduced in hdf5 1.12.1. As far as I know, netcdf only tests native compilation, and only relatively recently with the mingw compiler in CI.

@Alexander-Barth
Member

Alexander-Barth commented Aug 18, 2022

This long-standing issue should be fixed thanks to NetCDF_jll 400.902.5.

@visr
Member Author

visr commented Aug 18, 2022

Thanks a lot for a major effort! Hope it gets easier. Looks like this issue can be unpinned as well :)

@Alexander-Barth
Member

Thanks Martijn for your help too!
