Support reexporting modules #357

Closed
mboes opened this issue Jul 22, 2018 · 16 comments · Fixed by #397
Comments

mboes (Member) commented Jul 22, 2018

GHC has supported reexporting modules from upstream dependencies since 7.10.3. See https://www.haskell.org/cabal/users-guide/developing-packages.html#pkg-field-library-reexported-modules. This will be useful for Backpack support. We could implement this with an API like the following:

haskell_library(
  name = "foo",
  deps = [":bar", ":baz"],
  exports = {":bar": "Data.Bar1 as Data.Bar1_2, Data.Bar2 as Data.Bar2_2"},
)

I propose the exports attribute name for consistency with the Java rules. The syntax inside the strings is the same as in Cabal and GHC.

lunaris (Collaborator) commented Jul 25, 2018

I've started hacking at this (PR shortly, hopefully). The basics are there, but the current set of compromises/challenges is:

  • The best we have for dictionaries is attr.label_keyed_string_dict; I think we actually want the nonexistent attr.label_keyed_string_list_dict so that we can map from each dependency to a list of reexports. However, I can work around this for now with string joining/splitting and helpers, and am of course open to better suggestions on how to tackle this. (For the record, we want a label key so we can resolve the target and its package_id, which we need for the registration file.)

  • On the above point, prebuilt (GHC-included) dependencies don't have package_ids, and I need a version for the reexport. I can get one from ghc-pkg list <name>, so I just have to hack that into haskell_import as an output (%{name}-package-id, say) and use it in the other rule. I'm still getting my head around this, since I can't just run ghc-pkg at analysis time and stick the version in the provider.
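As a rough illustration of that joining/splitting workaround, here is a plain-Python sketch (the `parse_exports` helper and the comma separator are purely hypothetical, not the actual rules_haskell code):

```python
# Sketch of the label_keyed_string_dict workaround: Bazel gives us
# {label: string}, so we encode the list of reexports as one
# comma-separated string per dependency and split it back apart.

def parse_exports(exports):
    """Turn {label: "A as B, C as D"} into {label: [("A", "B"), ("C", "D")]}."""
    parsed = {}
    for label, spec in exports.items():
        reexports = []
        for entry in spec.split(","):
            entry = entry.strip()
            if " as " in entry:
                original, renamed = entry.split(" as ", 1)
                reexports.append((original.strip(), renamed.strip()))
            else:
                # A bare "Module" means reexport under the same name.
                reexports.append((entry, entry))
        parsed[label] = reexports
    return parsed

print(parse_exports({":bar": "Data.Bar1 as Data.Bar1_2, Data.Bar2 as Data.Bar2_2"}))
```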

lunaris (Collaborator) commented Jul 26, 2018

OK, I think I've got this working. However, in doing so I've made some design decisions that are probably worth discussing here before I open some PRs (not sure who other than @mboes I should tag). The thing that generally underpins all of these is the set of problems created by packages that come pre-installed with GHC:

  • Working backwards, the first job is to amend package (actions/package.bzl) so that it extends exposed-modules with reexports of the form Module.Name from package-id:Original.Module.Name.

  • This is fine for HaskellLibraryInfo-type targets since they have package_ids and we can pull Module.Name and Original.Module.Name from the (arbitrarily-specified) strings we have in the attr.label_keyed_string_dict that is the exports attribute.

  • For HaskellPrebuiltLibraryInfo, this is not currently possible because these only have a package property (e.g. bytestring). ghc-pkg is capable of telling us that it's e.g. bytestring-0.10.8.2 but getting the result of that command is a bit fiddly.

  • Alternative solution: why not have haskell_import expose a version attribute? Since it's intended to be used by tooling (e.g. Hazel), this seems acceptable. Moreover, since we're going to need it for other things (other issues like thinning but also general improvements listed later), can we not make it mandatory = True? Design choice 1

  • Having made design choice 1, I've also got a patched version of https://github.com/FormationAI/hazel that passes the new version attribute (since it already has it in its core_packages dictionary).

  • Now that we have versioned included packages, design choice 2 is to include their package_ids (generated as name-version) into the depends field when calling package. There's an XXX in there saying we'd like to do just this, so I assume it's OK.

  • While we're in package, we can instrument it to add the aforementioned Module.Name from package-id:Original.Module.Name bits too.

  • This works just fine, unless we are reexporting from a builtin library because, despite now having the package ID, we are not including GHC's default package library path when we construct the GHC_PACKAGE_PATH environment variable.

  • The fix I've chosen for this (design choice 3) is to add another argument to haskell_toolchain, packages, which should be passed a Label for a target that matches ghc-*/package.conf.d/* files. I've achieved this by using a custom BUILD file for the GHC in nixpkgs with something like:

filegroup(
  name = "packages",
  srcs = glob(["lib/**/package.conf.d/*"]),
)

If we were to accept this design choice, we'd (I think) have to add a similar sort of BUILD file to rules_haskell that could be used (since rules_nixpkgs only exports bin and lib by default). This is, I think, the most tenuous decision, since it affects the UX quite a bit -- feedback/alternatives quite welcome.

  • Once I've got the packages attribute, I can just add those file paths to GHC_PACKAGE_PATH and it all seems to work without a hitch. Moreover, I think this lays most of the groundwork for implementing the other Cabal-like module features (thinning, etc.).

Hope this makes sense; grateful for any feedback. A proof of concept of the changes (rules_haskell only, not Hazel) is at lunaris@2a45548
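To make the first bullet above concrete, the extended exposed-modules entries could be rendered roughly like this (a Python sketch; `reexport_entries` is a hypothetical helper, but the `Module.Name from package-id:Original.Module.Name` syntax is GHC's registration-file format):

```python
def reexport_entries(package_id, reexports):
    """Render ``Module.Name from package-id:Original.Module.Name`` entries
    for the exposed-modules field of a GHC package registration file."""
    return [
        "%s from %s:%s" % (renamed, package_id, original)
        for original, renamed in reexports
    ]

# Hypothetical example: reexport one module from a versioned dependency.
entries = reexport_entries(
    "bytestring-0.10.8.2",
    [("Data.ByteString", "Data.ByteString.Reexported")],
)
print(", ".join(entries))
```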

judah (Collaborator) commented Jul 26, 2018

The proposed change to Hazel SGTM; as you said, we can get that information automatically and it simplifies the internals of rules_haskell.

I'm less convinced that choice 3 is necessary. It looks like GHC_PACKAGE_PATH is only used in rules_haskell when registering a package; maybe it's not needed even then? When calling ghc-pkg, you can just pass --global (indeed, that's the default) to reference the core packages, rather than having to point directly to the files on disk. (Similarly, you can use --package-db for any additional package databases, rather than going via the env variable.)

lunaris (Collaborator) commented Jul 26, 2018

Thanks for the feedback! If I understand correctly, you're saying we could implement 1 and 2 and then, for 3, rather than constructing something like:

GHC_PACKAGE_PATH=pG:p1:p2:...:pn ghc-pkg ...

where pG is the newly-hacked-in path to GHC's global DB, we could instead do:

ghc-pkg --package-db p1 --package-db p2 ... --package-db pn ...

with the implicit --global taking care of whatever content pG added?

mboes (Member, Author) commented Jul 26, 2018

I suspect @regnat will object to mandatory = True for a version field in haskell_import. In his use case, packages come from Nixpkgs, so, as with a Stackage snapshot, the versions of everything are already implicit, and it would be a cost to have to write those down manually (the equivalent of writing a cabal freeze file by hand), unless we do some code generation of BUILD files (à la Hazel, I guess).

lunaris (Collaborator) commented Jul 26, 2018

Happy to discuss alternatives to mandatory versions. On @judah's point about the global DB: from hacking and reading the documentation, it seems that appending a separator (:) to GHC_PACKAGE_PATH causes ghc-pkg to append the default databases at the end of the package DB stack, which solves the problem (and allows me to completely back out design choice 3, which is good).

lunaris (Collaborator) commented Jul 26, 2018

Thinking about this, we could make version optional, but then I think we'd have to use ghc-pkg as my original comment describes and have rules depend on the files generated -- essentially ghc-pkg list {package} > $@ (or similar) and pass that down the chain. However, it's a lot simpler if we do have the version as stated. Tangentially, having hacked on both, I could see an argument for merging rules_haskell and Hazel, at least into a single repository with a set of includes that users could choose to use or not. Whether anyone else thinks this is a good or bad idea might affect this choice.
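For the optional-version route, extracting a version from the `ghc-pkg list <name>` output could look roughly like this (a Python sketch; the parsing helper is hypothetical, and assumes the usual output shape of a DB path followed by indented `name-version` lines):

```python
import re

def version_from_ghc_pkg_list(output, package):
    """Extract the version from ``ghc-pkg list <package>`` output.
    Hidden packages are printed in parentheses, hence the optional parens."""
    pattern = re.compile(r"\(?%s-([0-9.]+)\)?$" % re.escape(package))
    for line in output.splitlines():
        match = pattern.search(line.strip())
        if match:
            return match.group(1)
    return None

# Illustrative output of `ghc-pkg list bytestring`:
sample = "/ghc/lib/package.conf.d\n    bytestring-0.10.8.2\n"
print(version_from_ghc_pkg_list(sample, "bytestring"))
```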

mboes (Member, Author) commented Jul 26, 2018

I'd say try the simplest thing first and then we can iterate, if we really don't want mandatory versions.

mboes (Member, Author) commented Jul 26, 2018

I'm in favour of merging Hazel. @judah what do you think?

judah (Collaborator) commented Jul 26, 2018

Merging Hazel into rules_haskell sounds good to us. A couple of up-front requests I would have around the merge are: (1) if possible, to preserve the history with git merge --allow-unrelated-histories, and (2) to keep the code that originated from Google within a third_party/cabal2bazel directory like it is now.

I probably won't have time to actually do that merge in the near future, but could help advise if someone else wanted to take it on.

thufschmitt (Contributor) commented Jul 27, 2018 via email

lunaris (Collaborator) commented Jul 31, 2018

So it turns out that using the version field to construct a package-id is actually too naive and doesn't work with Nix's GHC-with-packages. The current implementation works by using -package for the so-called prebuilt dependencies, which means GHC will resolve them to IDs using its package database. In my branch, I've changed that to use only package IDs (arguably the better solution), but the problem is that, for e.g. Nix-based dependencies, the ID will be something like packageName-version-nixStoreHash. At first glance, resolving this is non-trivial; I'll have a think.
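To illustrate the mismatch: an ID constructed as name-version will not equal the unit IDs that a Nix-provisioned package DB actually contains, so some form of resolution against the DB itself is needed. A hedged Python sketch (the helper is hypothetical and the store hash is made up):

```python
def resolve_package_id(name, version, db_ids):
    """Naively-built IDs (name-version) don't match Nix-style unit IDs
    (name-version-hash); resolve against the actual IDs in the DB."""
    naive = "%s-%s" % (name, version)
    if naive in db_ids:
        return naive
    # Fall back to prefix matching against the real unit IDs.
    for unit_id in db_ids:
        if unit_id.startswith(naive + "-"):
            return unit_id
    return None

# Hypothetical Nix-style unit ID (hash is invented for illustration):
db = ["containers-0.5.10.2-8VCi9xHmhMFAHUtjN0kcKv"]
print(resolve_package_id("containers", "0.5.10.2", db))
```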

lunaris (Collaborator) commented Aug 1, 2018

OK, after a bit of research I think the following plan makes sense:

  • Pause this effort and first look to merge Hazel and rules_haskell, making no API changes in the process.

  • Once the two repositories are merged, look to clean up the API to users. Currently there are a number of fiddly bits required to use both libraries -- in Hazel's case you typically need to patch your C toolchain so that your Nix store is correctly used, etc.; in rules_haskell's case we have the haskell_import functions and the like. My thinking is that we could instead create one (or two) repository rules with a richer configuration that would take care of this and make things a bit more "just load and call this rule".

  • Now that we have a set of repository rules that take care of a few more common use cases, we can exploit the fact that we are permitted some execution in the rules. I propose we execute ghc-pkg here to resolve the actual package IDs of the prebuilt dependencies/core packages. These can then be used to write the haskell_import rules (which I imagine we could turn off if users didn't want them generated for some reason). This would negate the need for Hazel to generate them or Nix users to write them.

Appreciate there's a lot of change in here; keen for feedback @mboes @judah and any others.
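The last step, generating haskell_import rules from resolved package IDs, could be sketched as follows (plain Python standing in for the Starlark a repository rule would execute; the generation helper is hypothetical, though the rule name and version attribute come from this thread):

```python
def generate_haskell_imports(resolved):
    """Emit haskell_import stanzas for each resolved core package, as a
    repository rule could write them into a generated BUILD file after
    running ghc-pkg to discover name/version pairs."""
    stanzas = []
    for name, version in sorted(resolved.items()):
        stanzas.append(
            'haskell_import(\n'
            '  name = "%s",\n'
            '  version = "%s",\n'
            ')' % (name, version)
        )
    return "\n\n".join(stanzas)

print(generate_haskell_imports({"base": "4.10.1.0", "containers": "0.5.10.2"}))
```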

mboes (Member, Author) commented Aug 5, 2018

rules_haskell supports two use cases at the moment:

  1. outsourcing dealing with Hackage packages to Nixpkgs,
  2. outsourcing dealing with Hackage packages to Hazel.

@lunaris the reason we support (1) is because it's so low-tech. When rules_haskell started, we could use it in production without first working our way up to full Hazel. Now that Hazel works great, the Nixpkgs route is still a nice backup to have, and for some setups might be a better fit. That said, the future for most people is probably Hazel. So my view is that (1) is nice to have, but if it gets in the way of progress significantly then we'd be okay with dropping it.

At first, it sounded to me like this ticket was blocked in a big way by the Nixpkgs use case, thus potentially warranting dropping it now. But it turns out that, at least for this ticket, there's another workaround.

It turns out that GHC is perfectly happy having only package names in the package configuration files in the package db. It's ghc-pkg register that's not happy about those (it wants full "unit id's"). But I would argue that the bug here is that ghc-pkg's validation is too strict: it shouldn't preclude use cases that GHC allows. I filed https://ghc.haskell.org/trac/ghc/ticket/15478#ticket to that effect.

Meanwhile, we can work around this upstream bug by not using ghc-pkg register. We can instead copy the file directly into the package db, then use ghc-pkg recache. We lose the benefit of ghc-pkg's other validation checks, but most of those would only catch potential rules_haskell bugs, not user mistakes. And we can move back to using ghc-pkg register in future versions of GHC once the upstream bug is fixed.

I'll push a PR later today with small modifications to your #381 that make it all work.
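The workaround amounts to replacing one `ghc-pkg register` invocation with a copy plus `ghc-pkg recache`. A Python sketch of the command sequence (the helper and paths are illustrative):

```python
def registration_commands(conf_file, package_db):
    """Commands for the ghc-pkg register workaround: drop the .conf into
    the package DB by hand, then rebuild the DB cache. This sidesteps
    register's strict unit-id validation."""
    return [
        ["cp", conf_file, package_db + "/"],
        ["ghc-pkg", "recache", "--package-db=" + package_db],
    ]

for cmd in registration_commands("foo.conf", "package.conf.d"):
    print(" ".join(cmd))
```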

mboes (Member, Author) commented Aug 6, 2018

Fixed in #385.

mboes closed this as completed Aug 6, 2018

lunaris (Collaborator) commented Aug 8, 2018

So unfortunately I think the change to using -package / ghc-pkg recache might not quite resolve this. While everything works and is accepted from a ghc-pkg point of view, actually using the exports feature risks an Ambiguous module GHC error. I've repurposed https://github.com/lunaris/bazel-example (master) to demonstrate this:

  • test-lib exports Data.Map.Strict from containers.
  • test-bin depends on test-lib and imports Data.Map.Strict.
  • bazel build //... throws the aforementioned error.

I'm not quite sure what is up, because tweaking the command being run (obtained by passing --subcommands to bazel build) to drop -v0 --make and instead pass -v --interactive only shows containers being loaded once. Hacking -package containers into -package-id containers-0.5.10.2 doesn't help (at least when compiling test-bin), because it seems the library has already been compiled in a way that makes it ambiguous.

lunaris reopened this Aug 8, 2018
mboes added a commit that referenced this issue Aug 9, 2018
In #385, we claimed to exploit the fact that GHC only needs package
names in reexport declarations to obviate the need for knowing the
versions (and dependency hashes) of all prebuilt packages. We provided
a test to substantiate that this indeed works. But it turns out that
the test was not a good one.

The test was flawed because it featured libraries reexporting `base`.
It turns out that `base` (along with `rts`) is magical in GHC.
Because they are so-called "wired-in packages", they are assumed
unique and are therefore not located using a version number. So while
the scheme in #385 worked for `base`, it didn't work in the general
case, as observed by @lunaris.

The fix consists of qualifying all package reexports with full package
id's, not package names. How do we derive a package id given only
a package name? The solution is to have the toolchain generate a dump
of the global package database that ships with the compiler. This is
only needed once for the entire workspace. From this dump, we can
look up the full package id given any particular name.

Fixes #357.
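The name-to-id lookup over a dump of the global database could be sketched like this (a hypothetical Python helper; `ghc-pkg dump` separates package stanzas with `---` lines and includes `name:` and `id:` fields):

```python
def package_ids_from_dump(dump):
    """Build a {name: package-id} map from ``ghc-pkg dump`` output,
    whose stanzas are separated by ``---`` lines."""
    ids = {}
    for stanza in dump.split("---"):
        name = pkg_id = None
        for line in stanza.splitlines():
            if line.startswith("name:"):
                name = line.split(":", 1)[1].strip()
            elif line.startswith("id:"):
                pkg_id = line.split(":", 1)[1].strip()
        if name and pkg_id:
            ids[name] = pkg_id
    return ids

# Abbreviated, illustrative dump of two stanzas:
sample = "name: base\nid: base-4.10.1.0\n---\nname: rts\nid: rts\n"
print(package_ids_from_dump(sample))
```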