Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formalize public vs private API rule #613

Closed
mih opened this issue Feb 2, 2024 · 5 comments · Fixed by #625
Closed

Formalize public vs private API rule #613

mih opened this issue Feb 2, 2024 · 5 comments · Fixed by #625
Milestone

Comments

@mih
Copy link
Member

mih commented Feb 2, 2024

This has been a frequent discussion topic. We need a simple rule for people to tell if something is meant to be public or part of an internal API.

Right now the contributing guide says:

Code users should be able to import the most relevant functionality from the sub-package's init.py.

I would propose to establish a rule that anything that cannot be imported from a subpackages __init__.py is not considered part of the public API.

This rule is simple.

The one counter argument that I can think of it: There might be pieces that are more special (within in particular subpackage), for example, heavy-ish runtime deps that should not force unconditional imports. If that is the case, it might be worth to:

a) consider moving that code into its own subpackage (if it already has a different dependency set)
b) consider importing within "runtime" code (ie. within a function body)

while otherwise holding on to the general pattern.

@adswa
Copy link
Member

adswa commented Feb 2, 2024

This rule is indeed straightforward - it makes complete sense to me.

@christian-monch
Copy link
Contributor

I like the rule. Same reasoning as @adswa: "straightforward".

@mih
Copy link
Member Author

mih commented Feb 2, 2024

Thanks for the feedback. I looked into what this would mean (using something like git grep 'datalad_next\..*\.' datalad_next/ | grep -v __init__.py | grep -v '/patches/' for gathering initial evidence for subsequent curation).

This reveal a bunch of questions. Should we apply this rule uniformly to all sub-packages (or even sub-directories)?

I'd be in favor of that, for simplicity. I am inclined to make an exception for datalad_next/patches/ (it is not really a place to import anything from. I would try to not make an exception for datalad_next/tests/ and also not for datalad_next/utils/.

With that in mind, here are to chunks of rule-violations we would have in the current code base:

Test tooling imports not from datalad_next.tests directly

# utils
datalad_next/annexbackends/tests/test_base.py:from datalad_next.tests.utils import
datalad_next/annexremotes/tests/test_archivist.py:from datalad_next.tests.utils import
datalad_next/annexremotes/tests/test_uncurl.py:from datalad_next.tests.utils import
datalad_next/commands/tests/test_create_sibling_webdav.py:from datalad_next.tests.utils import
datalad_next/commands/tests/test_credentials.py:from datalad_next.tests.utils import
datalad_next/commands/tests/test_download.py:from datalad_next.tests.utils import
datalad_next/commands/tests/test_status.py:from datalad_next.tests.utils import
datalad_next/commands/tests/test_tree.py:from datalad_next.tests.utils import
datalad_next/credman/tests/test_credman.py:from datalad_next.tests.utils import
datalad_next/gitremotes/tests/test_datalad_annex.py:from datalad_next.tests.utils import
datalad_next/iter_collections/tests/test_iterdir.py:from datalad_next.tests.utils import
datalad_next/iter_collections/tests/test_itergitdiff.py:from datalad_next.tests.utils import
datalad_next/iter_collections/tests/test_itergitstatus.py:from datalad_next.tests.utils import
datalad_next/iter_collections/tests/test_itergittree.py:from datalad_next.tests.utils import
datalad_next/iter_collections/tests/test_itergitworktree.py:from datalad_next.tests.utils import
datalad_next/iter_collections/tests/test_utils.py:from datalad_next.tests.utils import
datalad_next/tests/fixtures.py:from datalad_next.tests.utils import
datalad_next/tests/fixtures.py:from datalad_next.tests.utils import
datalad_next/tests/fixtures.py:from datalad_next.tests.utils import
datalad_next/url_operations/tests/test_ssh.py:from datalad_next.tests.utils import

#marker  
datalad_next/archive_operations/tests/test_tarfile.py:from datalad_next.tests.marker import
datalad_next/commands/tests/test_ls_file_collection.py:from datalad_next.tests.marker import
datalad_next/iter_collections/tests/test_itertar.py:from datalad_next.tests.marker import
datalad_next/url_operations/tests/test_http.py:from datalad_next.tests.marker import

reuse of non-global fixture

datalad_next/commands/tests/test_ls_file_collection.py:from datalad_next.iter_collections.tests.test_iterzip import sample_zip

Credential manager

datalad_next/commands/credentials.py:from datalad_next.credman.manager import
datalad_next/utils/credman.py:from datalad_next.credman.manager import

iterable_subprocess

datalad_next/runners/iter_subproc.py:from datalad_next.iterable_subprocess.iterable_subprocess import

Constraints

datalad_next/annexremotes/tests/test_uncurl.py:from datalad_next.constraints.dataset import
datalad_next/commands/create_sibling_webdav.py:from datalad_next.constraints.dataset import
datalad_next/commands/create_sibling_webdav.py:from datalad_next.constraints.exceptions import
datalad_next/commands/credentials.py:from datalad_next.constraints.dataset import
datalad_next/commands/download.py:from datalad_next.constraints.dataset import
datalad_next/commands/status.py:from datalad_next.constraints.dataset import
datalad_next/commands/tests/test_ls_file_collection.py:from datalad_next.constraints.exceptions import
datalad_next/commands/tests/test_status.py:from datalad_next.constraints.exceptions import
datalad_next/commands/tree.py:from datalad_next.constraints.dataset import

Types

datalad_next/annexremotes/archivist.py:from datalad_next.types.annexkey import AnnexKey
datalad_next/annexremotes/archivist.py:from datalad_next.types.archivist import ArchivistLocator
datalad_next/annexremotes/archivist.py:from datalad_next.types.enums import ArchiveType

iterators

datalad_next/archive_operations/tarfile.py:from datalad_next.iter_collections.tarfile import
datalad_next/archive_operations/zipfile.py:from datalad_next.iter_collections.zipfile import
datalad_next/archive_operations/tests/test_tarfile.py:from datalad_next.iter_collections.utils import
datalad_next/archive_operations/tests/test_zipfile.py:from datalad_next.iter_collections.utils import
datalad_next/commands/ls_file_collection.py:from datalad_next.iter_collections.directory import
datalad_next/commands/ls_file_collection.py:from datalad_next.iter_collections.tarfile import
datalad_next/commands/ls_file_collection.py:from datalad_next.iter_collections.zipfile import
datalad_next/commands/ls_file_collection.py:from datalad_next.iter_collections.utils import
datalad_next/commands/ls_file_collection.py:from datalad_next.iter_collections.gittree import
datalad_next/commands/ls_file_collection.py:from datalad_next.iter_collections.gitworktree import
datalad_next/commands/ls_file_collection.py:from datalad_next.iter_collections.annexworktree import
datalad_next/commands/status.py:from datalad_next.iter_collections.gitdiff import
datalad_next/commands/status.py:from datalad_next.iter_collections.gitstatus import
datalad_next/conftest.py:from datalad_next.iter_collections.tests.test_itertar import
datalad_next/conftest.py:from datalad_next.iter_collections.tests.test_iterzip import

URL operations

datalad_next/annexremotes/tests/test_uncurl.py:from datalad_next.url_operations.any import
datalad_next/annexremotes/uncurl.py:from datalad_next.url_operations.any import
datalad_next/commands/download.py:from datalad_next.url_operations.any import
datalad_next/url_operations/tests/test_ssh.py:import datalad_next.url_operations.ssh

Archive operations

datalad_next/annexremotes/archivist.py:from datalad_next.archive_operations.tarfile import

Common utilities

datalad_next/iter_collections/utils.py:from datalad_next.utils.consts import
datalad_next/iter_collections/utils.py:from datalad_next.utils.multihash import
datalad_next/url_operations/file.py:from datalad_next.utils.consts import
datalad_next/url_operations/http.py:from datalad_next.utils.requests_auth import
datalad_next/utils/requests_auth.py:from datalad_next.utils.http_helpers import
datalad_next/utils/tests/test_deprecated.py:from datalad_next.utils.deprecate import

Absolute imports that should be relative

datalad_next/annexremotes/tests/test_archivist.py:from datalad_next.annexremotes.archivist import

@mih
Copy link
Member Author

mih commented Feb 2, 2024

I propose the following addition to the toplevel README:

diff --git a/README.md b/README.md
index dc020d7..00f6c36 100644
--- a/README.md
+++ b/README.md
@@ -142,6 +142,33 @@ this extension causes a range of patches to be applied to the `datalad` package
 to enable them. A comprehensive description of the current set of patch is
 available at http://docs.datalad.org/projects/next/en/latest/#datalad-patches
 
+## Developing with DataLad NEXT
+
+This extension package moves fast in comparison to the core package. Nevertheless,
+attention is paid to API stability, adequate semantic versioning, and informative
+changelogs.
+
+### Public vs internal API
+
+Anything that can be imported directly from any of the sub-packages in
+`datalad_next` is considered to be part of the public API. Changes to this API
+determine the versioning, and development is done with the aim to keep this API
+as stable as possible. This includes signatures and return value behavior.
+
+As an example: `from datalad_next.runners import iter_git_subproc` imports a
+part of the public API, but `from datalad_next.runners.git import
+iter_git_subproc` does not.
+
+### Use of the internal API
+
+Developers can obviously use parts of the non-public API. However, this should
+only be done with the understanding that these components may change from one
+release to another, with no guarantee of transition periods, deprecation
+warnings, etc.
+
+Developers are adviced to never reuse any components with names starting with
+`_` (underscore). Their use should be limited to their individual subpackage.
+
 ## Acknowledgements
 
 This DataLad extension was developed with funding from the Deutsche

@adswa
Copy link
Member

adswa commented Feb 2, 2024

Reads well to me 👍

mih added a commit to mih/datalad-next that referenced this issue Feb 5, 2024
This implements the plan declared at
datalad#613 (comment)

A dedicated list of components is exposed at the package's top-level,
and only that. All other functionality and imports have been move to
package-internal files.

All other, now undesired, top-level imports have been annotated and
scheduled for removal with the v2.0 release.
@mih mih added this to the 2.0 milestone Feb 5, 2024
mih added a commit to mih/datalad-next that referenced this issue Feb 5, 2024
This implements the plan declared at
datalad#613 (comment)

A dedicated list of components is exposed at the package's top-level,
and only that. All other functionality and imports have been move to
package-internal files.

All other, now undesired, top-level imports have been annotated and
scheduled for removal with the v2.0 release.
mih added a commit to mih/datalad-next that referenced this issue Feb 5, 2024
This implements the plan declared at
datalad#613 (comment)

A dedicated list of components is exposed at the package's top-level,
and only that. All other functionality and imports have been move to
package-internal files.

All other, now undesired, top-level imports have been annotated and
scheduled for removal with the v2.0 release.
mih added a commit to mih/datalad-next that referenced this issue Feb 5, 2024
This implements the plan declared at
datalad#613 (comment)

A dedicated list of components is exposed at the package's top-level,
and only that. All other functionality and imports have been move to
package-internal files.

All other, now undesired, top-level imports have been annotated and
scheduled for removal with the v2.0 release.
mih added a commit to mih/datalad-next that referenced this issue Feb 5, 2024
mih added a commit to mih/datalad-next that referenced this issue Feb 5, 2024
mih added a commit to mih/datalad-next that referenced this issue Feb 5, 2024
This pretty much takes the role of `datalad_next.tests.utils`. The new
setup treats `tests` like any other sub-package and exposes reusable
tools as a public API at the top-level.

In doing so, `SkipTest` was made obsolete and excluded from that API.

The usage of `rmtree` is altered to import from `datalad_next.utils`.

datalad#613
mih added a commit to mih/datalad-next that referenced this issue Feb 5, 2024
This pretty much takes the role of `datalad_next.tests.utils`. The new
setup treats `tests` like any other sub-package and exposes reusable
tools as a public API at the top-level.

In doing so, `SkipTest` was made obsolete and excluded from that API.

The usage of `rmtree` is altered to import from `datalad_next.utils`.

For the first time, test tooling helpers have their documentation
rendered too.

datalad#613
mih added a commit to mih/datalad-next that referenced this issue Feb 5, 2024
mih added a commit to mih/datalad-next that referenced this issue Feb 5, 2024
This pretty much takes the role of `datalad_next.tests.utils`. The new
setup treats `tests` like any other sub-package and exposes reusable
tools as a public API at the top-level.

In doing so, `SkipTest` was made obsolete and excluded from that API.

The usage of `rmtree` is altered to import from `datalad_next.utils`.

For the first time, test tooling helpers have their documentation
rendered too.

datalad#613
mih added a commit to mih/datalad-next that referenced this issue Feb 5, 2024
mih added a commit to mih/datalad-next that referenced this issue Feb 5, 2024
mih added a commit to mih/datalad-next that referenced this issue Feb 5, 2024
mih added a commit to mih/datalad-next that referenced this issue Feb 5, 2024
mih added a commit to mih/datalad-next that referenced this issue Feb 5, 2024
mih added a commit to mih/datalad-next that referenced this issue Feb 5, 2024
mih added a commit to mih/datalad-next that referenced this issue Feb 5, 2024
@mih mih closed this as completed in #625 Feb 5, 2024
mih added a commit to mih/datalad-next that referenced this issue Feb 6, 2024
This continues the work done for datalad#613. Now all relevant symbols
are directly documented, for every top-level package.

In doing so, I added a first set of markers to shape the API
streamlining that will come with v2.0.
Ping datalad#624
Ping datalad#616

Some implementations that continued to be located in `__init__.py`
files have been move into new, dedicated source files.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants