Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Take Bazel package names into account when generating GHC package names #222

Merged
merged 3 commits into from
May 1, 2018

Conversation

mrkkrp
Copy link
Member

@mrkkrp mrkkrp commented Apr 30, 2018

Close #219.

In short, there are many possible approaches to do encoding. The key, it seems, is to take a rare character and use it as escape character remembering to escape the escape character itself to achieve injectivity. With @mboes We discussed several possibilities and I implemented an encoding scheme where escaping character is zero 0.

So we use 0R to separate workspace root from (Bazel) package name and 0D to separate package name from target name. In addition to that we escape zero 0 as 00, underscore _ as 0U and slash / as 0S. This should work.

We also discussed using dash, perhaps something symmetric like -U-, but then two underscores in a row would result in an invalid package name, because GHC does not like underscores or more than one dash in a row.

@mboes also proposed to prefix all digits with 0 because this is less-surprising (i.e. we escape 0, and so we should escape all digits for consistency).

@judah What do you think?

@mboes mboes requested a review from judah April 30, 2018 11:28
@mboes
Copy link
Member

mboes commented Apr 30, 2018

Note for @judah: ghc package names do leak to the user, so if we prefix them and if that requires some encoding, then we need to choose an encoding that's as readable as reasonably possible. These names at least leak in the Haddock HTML indexes, possibly elsewhere. If this weren't the case, any encoding no matter how ugly would do.

After "sleeping on it", I think it's worth considering these alternative nuances to the bikeshed's colour:

  • The rarer the escape character, the more readable because any occurrence of the escape character needs to be double escaped. Dates and version components might appear in folder names, so arguably any capital letter is rarer than any digit. We're restricted to [a-zA-Z0-9-] (no underscore) as characters in package names, so using Z as the escape character like GHC does could work better.
  • I believe the 0R and 0D escapes are unnecessary. All fully qualified packages names are of the following form: external/repo/a/b/c, or a/b/c if not in an external repo. So just escaping the /'s as per usual in this fully qualified path should be sufficient.

If we can hide these names to the user in all instances, this all becomes moot.

@judah
Copy link
Collaborator

judah commented Apr 30, 2018

For the Haddock index.html, the GHC package names all appear in tags like <span class="package">foo-0.0.0.0</span>. So it might be possible to post-process that file in the haskell_doc rule to demangle them. (Or, more complicated: substitute in a javascript file which rewrites them after the page has loaded.)

Package names can also appear in some error messages, but I don't know a good way around that.

Returns:
string: GHC package name to use.
"""
return "{0}0R{1}0D{2}".format(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to @mboes' suggestion for dropping 0R/0D.

I also don't have a strong preference between 0, Z or something else as the escape character.

mboes added 2 commits May 1, 2018 21:59
Slight difference: where GHC has both `z` and `Z` escape
characters (since they have two namespaces), we use `Z` only. Also get
rid of `0R` and `0D`, which are now escaped like any other slash.
This makes the encoding slightly shorter in the common case.
@mboes mboes merged commit a740782 into master May 1, 2018
@mboes mboes deleted the fix-219 branch May 1, 2018 20:37
judah added a commit to FormationAI/rules_haskell that referenced this pull request May 15, 2018
As of tweag#222, both the package name and ID are z-encoded.  However,
GHC only needs the package *ID* to be unique, as long as you
call `-package-id` instead of `-package`.  As a result we can
simplify how package-names are encoded: we can keep them the same
as the rule name, other than fixing underscores.

In addition to making error messages nicer, this paves the way for some
improvements in Hazel (or similar tools), since we can influence
- It helps support the PackageImports extension.
- GHC >= 8.0 generates the Cabal macros (e.g., `MIN_VERSION_*`) automatically.

Prebuilt dependencies are still passed with `-package` since their IDs
judah added a commit that referenced this pull request May 15, 2018
As of #222, both the package name and ID are z-encoded.  However,
GHC only needs the package *ID* to be unique, as long as you
call `-package-id` instead of `-package`.  As a result we can
simplify how package-names are encoded: we can keep them the same
as the rule name, other than fixing underscores.

In addition to making error messages nicer, this paves the way for some
improvements in Hazel (or similar tools), since we can influence
- It helps support the PackageImports extension.
- GHC >= 8.0 generates the Cabal macros (e.g., `MIN_VERSION_*`) automatically.

Prebuilt dependencies are still passed with `-package` since their IDs
judah added a commit that referenced this pull request May 15, 2018
As of #222, both the package name and ID are z-encoded.  However,
GHC only needs the package *ID* to be unique, as long as you
call `-package-id` instead of `-package`.  As a result we can
simplify how package-names are encoded: we can keep them the same
as the rule name, other than fixing underscores.

In addition to (I think) making error messages nicer, this paves the way for
some improvements in Hazel (or similar tools), now that the rule and package
names can be identical:
- It helps support the PackageImports extension.
- GHC >= 8.0 generates the Cabal macros (e.g., `MIN_VERSION_*`) automatically.

Prebuilt dependencies are still passed with `-package` since their IDs
judah added a commit that referenced this pull request May 15, 2018
As of #222, both the package name and ID are z-encoded.  However,
GHC only needs the package *ID* to be unique, as long as you
call `-package-id` instead of `-package`.  As a result we can
simplify how package-names are encoded: we can keep them the same
as the rule name, other than fixing underscores.

In addition to (I think) making error messages nicer, this paves the way for
some improvements in Hazel (or similar tools), now that the rule and package
names can be identical:
- It helps support the PackageImports extension.
- GHC >= 8.0 generates the Cabal macros (e.g., `MIN_VERSION_*`) automatically.

Prebuilt dependencies are still passed with `-package` since their IDs
judah added a commit that referenced this pull request May 15, 2018
As of #222, both the package name and ID are z-encoded.  However,
GHC only needs the package *ID* to be unique, as long as you
call `-package-id` instead of `-package`.  As a result we can
simplify how package-names are encoded: we can keep them the same
as the rule name, other than fixing underscores.

In addition to (I think) making error messages nicer, this paves the way for
some improvements in Hazel (or similar tools), now that the rule and package
names can be identical:
- It helps support the PackageImports extension.
- GHC >= 8.0 generates the Cabal macros (e.g., `MIN_VERSION_*`) automatically.

Prebuilt dependencies are still passed with `-package` since their IDs
are internal details of GHC.
judah added a commit that referenced this pull request May 15, 2018
As of #222, both the package name and ID are z-encoded.  However,
GHC only needs the package *ID* to be unique, as long as you
call `-package-id` instead of `-package`.  As a result we can
simplify how package-names are encoded: we can keep them the same
as the rule name, other than fixing underscores.

In addition to (I think) making error messages nicer, this paves the way for
some improvements in Hazel (or similar tools), now that the rule and package
names can be identical:
- It helps support the PackageImports extension.
- GHC >= 8.0 generates the Cabal macros (e.g., `MIN_VERSION_*`) automatically.

Prebuilt dependencies are still passed with `-package` since their IDs
are internal details of GHC.
mboes pushed a commit that referenced this pull request May 16, 2018
Make package names more similar to the rule name.

As of #222, both the package name and ID are z-encoded.  However,
GHC only needs the package *ID* to be unique, as long as you
call `-package-id` instead of `-package`.  As a result we can
simplify how package-names are encoded: we can keep them the same
as the rule name, other than fixing underscores.

In addition to (I think) making error messages nicer, this paves the way for
some improvements in Hazel (or similar tools), now that the rule and package
names can be identical:
- It helps support the PackageImports extension.
- GHC >= 8.0 generates the Cabal macros (e.g., `MIN_VERSION_*`) automatically.

Prebuilt dependencies are still passed with `-package` since their IDs
are internal details of GHC.
judah added a commit that referenced this pull request May 18, 2018
It's needed by several Cabal packages; for example `lens`:
https://github.com/ekmett/lens/blob/1aa37c4/src/Control/Lens/Internal/TH.hs#L102

If you don't set it, then packages like `lens` default to logic like:
```
lensPackageKey          = "lens-" ++ showVersion version
```
But that doesn't work as of #222.

For example the `fuzzyset` package breaks with Hazel:
```
bazel-out/k8-fastbuild/bin/external/haskell_fuzzyset/gen-srcs-fuzzyset/Data/FuzzySet/Lens.hs:17:1: error:
    * Failed to load interface for `Control.Lens.Type'
      no unit id matching `lens-4.15.4' was found
      (This unit ID looks like the source package ID;
       the real unit ID is `externalZShaskellZUlensZSlens-4.15.4')
    * In the type signature:
        _useLevenshtein :: lens-4.15.4:Control.Lens.Type.Lens' FuzzySet Bool
   |
17 | makeLensesFor
   | ^^^^^^^^^^^^^...

```

Given that `rules_haskell` has the logic that creates the id string, I think
it makes sense for it to set this parameter itself (rather than manually-written
rules, or macros that use it like Hazel/cabal2bazel).
judah added a commit that referenced this pull request May 18, 2018
It's needed by several Cabal packages; for example `lens`:
https://github.com/ekmett/lens/blob/1aa37c4/src/Control/Lens/Internal/TH.hs#L102

If you don't set it, `lens` defaults to logic like:
```
lensPackageKey          = "lens-" ++ showVersion version
```
But that doesn't work as of #222 which makes the package key more complicated.

For example the `fuzzyset` package breaks with Hazel:
```
bazel-out/k8-fastbuild/bin/external/haskell_fuzzyset/gen-srcs-fuzzyset/Data/FuzzySet/Lens.hs:17:1: error:
    * Failed to load interface for `Control.Lens.Type'
      no unit id matching `lens-4.15.4' was found
      (This unit ID looks like the source package ID;
       the real unit ID is `externalZShaskellZUlensZSlens-4.15.4')
    * In the type signature:
        _useLevenshtein :: lens-4.15.4:Control.Lens.Type.Lens' FuzzySet Bool
   |
17 | makeLensesFor
   | ^^^^^^^^^^^^^...

```

Given that `rules_haskell` has the logic that creates the id string, I think
it makes sense for it to set this parameter itself (rather than manually-written
rules, or macros that use it like Hazel/cabal2bazel).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Conflict between haskell_libraries with the same name in different directories
3 participants