❗ [Includes historical git commits] Merge latest version of ministryofjustice-data-platform-catalogue #301
Merged
* Initial commit of the catalogue library This uses python 3.10 since 3.11 doesn't work yet. * Add a minimal client class This creates the basic hierarchy of service/database/schema/table. To be extended later with optional metadata. * Add a gitignore * Add usage instructions * Lint * Set serviceType = Glue * Update documentation
* Fix typo in package name to match pypi * Install poetry before setup_python setup_python has a dependency on poetry if using the cache path.
…rnal exceptions (#2014) * Modify client to accept metadata objects This forces us to pass in: - name, description for database/schema/table - retention period at schema/table level - version, owner, email, dpia_required, domain at schema level We can also accept tags at any level. The tags must already exist within an OpenMetadata classification. Email, dpia_required, domain, versions are not yet passed through to the catalogue. These may require use of custom attributes. * Wrap exceptions from OpenMetadata This library is intended to present a catalogue-agnostic API to the rest of the ingestion service, so we should expose our own exception hierarchy instead of passing through OpenMetadata's. * lint * Bump version * Update codeowners for new library
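The "wrap exceptions" commit above describes a common pattern: catch the backend's exception at the library boundary and re-raise one from the library's own hierarchy. A minimal sketch follows. All class and function names here (`CatalogueError`, `ReferencedEntityMissing`, `BackendError`, `create_table`) are hypothetical stand-ins, not the names used in the library or in OpenMetadata.

```python
class CatalogueError(Exception):
    """Base class for errors exposed by the catalogue library (hypothetical name)."""


class ReferencedEntityMissing(CatalogueError):
    """A tag, owner or parent entity referenced in the request does not exist."""


class BackendError(Exception):
    """Stand-in for an exception raised by the underlying catalogue SDK."""


def backend_create_table(name: str) -> str:
    # Stand-in for a call into the backend SDK; it always fails in this demo.
    raise BackendError(f"tag not found while creating {name}")


def create_table(name: str) -> str:
    """Public entry point: callers only ever see CatalogueError subclasses."""
    try:
        return backend_create_table(name)
    except BackendError as exc:
        # "from exc" chains the original error so the root cause stays in tracebacks.
        raise ReferencedEntityMissing(str(exc)) from exc
```

Callers can then write `except CatalogueError` without importing anything from the backend SDK, which is what keeps the API catalogue-agnostic.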
…atalogue (#1997) Bump urllib3 in /python-libraries/data-platform-catalogue Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.0.6 to 2.0.7. - [Release notes](https://github.com/urllib3/urllib3/releases) - [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst) - [Commits](urllib3/urllib3@2.0.6...2.0.7) --- updated-dependencies: - dependency-name: urllib3 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…metadata/schema pushes to openmetadata (#2176) * added `get_user_id` method and pass in column descriptions * add from dict methods to entities * add tests for get_user_id and new entity methods * upversion poetry * remove unused import from client * typo * correct type please * right upversion * lint * remove commented code * making entity args non optional
* add owner attribute to `CatalogueMetadata` entity * pass owner through `create_or_update_database()` * updates tests * upversion poetry * fix test * update readme and diagram * update docstrings * correction readme example
* Update to latest library version * Add an integration test we can run manually
…tform-catalogue (#2552) Bump cryptography in /python-libraries/data-platform-catalogue Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.5 to 41.0.6. - [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst) - [Commits](pyca/cryptography@41.0.5...41.0.6) --- updated-dependencies: - dependency-name: cryptography dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Jacob Woffenden <jacob.woffenden@digital.justice.gov.uk>
…talogue (#2867) Bump jinja2 in /python-libraries/data-platform-catalogue Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.2 to 3.1.3. - [Release notes](https://github.com/pallets/jinja/releases) - [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst) - [Commits](pallets/jinja@3.1.2...3.1.3) --- updated-dependencies: - dependency-name: jinja2 dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Mat <MatMoore@users.noreply.github.com>
Generalise CatalogueClient and add datahub implementation - Convert CatalogueClient class into ABC base, so OMD and DataHub client classes can be inherited - implement DataHubCatalogueClient using DataHub gms - rename all `create_or_update_x` methods to `upsert_x` - add `create_domain` and `create_or_update_data_product` methods to DataHubCatalogueClient class - update `DataHubCatalogueClient.create_or_update_table` method to create domain and data product if they don't exist but are passed as `data_product_metadata` - associate tables with data products when created in DataHub
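The shape of that refactor, an abstract base class with one concrete client per catalogue backend, can be sketched as below. This is an illustration only: `BaseCatalogueClient` and `InMemoryCatalogueClient` are hypothetical names, the `upsert_table` signature is simplified, and the in-memory client stands in for a real OpenMetadata or DataHub implementation.

```python
from abc import ABC, abstractmethod


class BaseCatalogueClient(ABC):
    """Backend-agnostic interface (hypothetical, simplified)."""

    @abstractmethod
    def upsert_table(self, name: str) -> str:
        """Create or update a table and return its identifier."""


class InMemoryCatalogueClient(BaseCatalogueClient):
    """Toy backend used here in place of a real catalogue service."""

    def __init__(self) -> None:
        self.tables = {}

    def upsert_table(self, name: str) -> str:
        # Upsert semantics: the same call both creates and overwrites.
        identifier = f"urn:example:table:{name}"
        self.tables[name] = identifier
        return identifier
```

Because `upsert_table` is abstract, instantiating `BaseCatalogueClient` directly raises `TypeError`, so each backend is forced to provide its own implementation.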
* Don't import openmetadata by default It's incompatible with python 3.11 which is really annoying. * Fix bug appending assets to a data product The list append() method returns None, so this was clearing the asset list every other time an asset was added. * Bump version
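The `append()` pitfall mentioned in that commit is easy to reproduce. The functions below are a reduced illustration with hypothetical names, not the library's actual code: `list.append` mutates the list in place and returns `None`, so using its return value as the new list loses the data.

```python
def add_asset_buggy(assets, urn):
    # Bug: append() returns None, so the caller receives None instead of the list.
    return assets.append(urn)


def add_asset_fixed(assets, urn):
    # Fix: mutate first, then return the list itself.
    assets.append(urn)
    return assets
```

An alternative that avoids mutation altogether is `return assets + [urn]`, which builds a new list.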
* Add a method for search This method accepts a query and pagination variables, and returns the total number of results plus the results to display on the current page. For each search result, return: - the result type, DataSet or DataProduct - ID, name and description - tags - a dictionary of additional metadata fields (exactly what this will contain will depend on what tests well with users) - information about the properties which matched the search query - the time the metadata was last updated The raw responses from Datahub are logged at DEBUG level. For now I've excluded filters, facets, and sorting but these are supported by the underlying API. See - https://datahubproject.io/docs/graphql/inputObjects#facetfilterinput - https://datahubproject.io/docs/graphql/objects#facetmetadata - https://datahubproject.io/docs/graphql/inputObjects#searchsortinput * Parameterise result types in search queries
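To make the described contract concrete (total number of matches plus only the current page of results), here is a small sketch over an in-memory list. The class names, field names, and the `search` signature are assumptions chosen for illustration; the real method queries DataHub rather than filtering a list.

```python
from dataclasses import dataclass, field


@dataclass
class SearchResult:
    id: str
    name: str
    result_type: str  # "DataSet" or "DataProduct"
    description: str = ""
    tags: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


@dataclass
class SearchResponse:
    total_results: int  # matches across all pages
    page_results: list  # matches on the requested page only


def search(items, query, page=0, count=20):
    """Case-insensitive substring match on name and description, then paginate."""
    needle = query.lower()
    matches = [
        item
        for item in items
        if needle in item.name.lower() or needle in item.description.lower()
    ]
    start = page * count  # pages are zero-indexed in this sketch
    return SearchResponse(
        total_results=len(matches),
        page_results=matches[start:start + count],
    )
```

Returning the total separately from the page lets a frontend render pagination controls without fetching every result.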
* Added list data product method
* Add parameter for search filters * Allow the search response to contain facet information Facets are the dynamic search filters that show how the search results break down across different dimensions. Returning these allows us to display only options that are relevant to the current result set and indicate the number of results matching each option. For most filters, we are expected to pass a URN as the value, so this is also the easiest way for the frontend to figure out what the possible values are. * Use relative imports throughout
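As a rough illustration of what facet information looks like, the sketch below counts how a result set breaks down by each requested field. It assumes results are plain dictionaries and that `build_facets` is a hypothetical helper; in the library the counts come back from the catalogue's search API rather than being computed client-side.

```python
from collections import Counter


def build_facets(results, fields):
    """Return, per field, the values present in the results and their counts."""
    facets = {}
    for field_name in fields:
        counts = Counter(
            result[field_name]
            for result in results
            if result.get(field_name) is not None
        )
        # most_common() orders options by descending count.
        facets[field_name] = [
            {"value": value, "count": count}
            for value, count in counts.most_common()
        ]
    return facets
```

A frontend can render each `value` (typically a URN) as a filter option and show `count` next to it, offering only options that match at least one result.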
* Remove restrictive filter type, and test urn example * Switch to non-deprecated syntax * Document possible filters * Add number_of_assets to data products metadata * Add data product to metadata of datasets
… (#3129) Add functionality for rendering search facets E.g. the domain multi-select
* Added the ability to sort graphQL queries * Added sort parameter to base client class * Added tests for sort parameter * Allowed NoneType for sort parameter * Corrections for linter * Satisfied black * Add method SortOption.format() and its test * Changed indentation * Added whitespace for linter
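The commit does not show what `SortOption.format()` returns, so the following is only a guess at the idea: a small value object that renders itself into the structure a GraphQL sort input expects. The field names and the `sortCriterion` payload shape are assumptions modelled on DataHub's `SearchSortInput` and should be checked against the DataHub GraphQL documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SortOption:
    field: str
    ascending: bool = True

    def format(self) -> dict:
        # Assumed payload shape; not confirmed by the source.
        return {
            "sortCriterion": {
                "field": self.field,
                "sortOrder": "ASCENDING" if self.ascending else "DESCENDING",
            }
        }
```

Keeping the formatting on the option object means a client method can accept `sort=None` or a `SortOption` and only add the sort input to the query variables when one is given.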
…ailable custom properties (#3134) * Add domain information for datasets. Previously datasets were not returning domain information. This adds that back in and parses out the ID and name. * lint * Add custom properties to search result metadata
* Added additional metadata fields
…atalogue (#3126) Bump aiohttp in /python-libraries/data-platform-catalogue Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.9.1 to 3.9.2. - [Release notes](https://github.com/aio-libs/aiohttp/releases) - [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst) - [Commits](aio-libs/aiohttp@v3.9.1...v3.9.2) --- updated-dependencies: - dependency-name: aiohttp dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Gary <26419401+Gary-H9@users.noreply.github.com>
…tform-catalogue (#3211) Bump cryptography in /python-libraries/data-platform-catalogue Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.6 to 42.0.0. - [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst) - [Commits](pyca/cryptography@41.0.6...42.0.0) --- updated-dependencies: - dependency-name: cryptography dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* add fix to upsert_table to persist data product metadata * test data product metadata input * make tests pass and add check for product data persisting * updated snapshot golden files * Empty Commit * Empty Commit * remove to dot. * lint * update snapshot file for test * updates for new release
* ✨ Add subdomain field to dataproduct metadata * remove openmetadata files, adds subdomain and updates domain and subdomain references. * resolve broken tests * resolve comments * resolve comments pt 2
* fix for getting correct page returned in search result * fix for dataset name add and not duplicate asset attach to data product * tests dataset properties return as expected * extra integration test for unique page results * updated golden files * add decorator... * poetry version and changelog * lint
…tform-catalogue (#3382) Bump cryptography in /python-libraries/data-platform-catalogue Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.0 to 42.0.2. - [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst) - [Commits](pyca/cryptography@42.0.0...42.0.2) --- updated-dependencies: - dependency-name: cryptography dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* add list_data_product_assets method to client * tests for list_data_product_assets * up changelog and poetry version * add listDataProductAssets graphql file * lint * suggestions for graphql query * search client suggestion * add data_product_list_assets to base client * lint
…tform-catalogue (#3408) Bump cryptography in /python-libraries/data-platform-catalogue Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.2 to 42.0.4. - [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst) - [Commits](pyca/cryptography@42.0.2...42.0.4) --- updated-dependencies: - dependency-name: cryptography dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
The library was not being installed properly because it wasn't copied into the builder image. This results in an import error for data_platform_catalogue. Also, change the working dir for the runtime image - previously it was sticking everything in the root directory which might cause issues.
This was referenced May 2, 2024
murdo-moj reviewed May 3, 2024
murdo-moj reviewed May 3, 2024
murdo-moj reviewed May 3, 2024
murdo-moj reviewed May 3, 2024
tom-webber approved these changes May 3, 2024
hjribeiro-moj approved these changes May 3, 2024
lgtm
This PR supersedes #300.
There is a corresponding PR to remove this from the old repo: ministryofjustice/analytical-platform#4261
Merge library into this repo
This library was previously in https://github.com/ministryofjustice/data-platform/
I've used `git filter-repo` to preserve the git history, so this PR contains the entire history of this library. For reference, the command I ran on the source repo was
Which allowed me to pull in the commits with
Post-refactor fixes
Since we made breaking changes in the 1.0.0 refactor, this PR also contains @murdo-moj's work to update the django project.
What's still missing
CI runs both the django tests and the tests for lib/datahub-client, but we will still need to add another workflow to publish the library so that it can be used by our metadata ingestion code.