❗ [Includes historical git commits] Merge latest version of ministryofjustice-data-platform-catalogue #301

Merged: 65 commits merged on May 3, 2024

@MatMoore (contributor) commented on May 2, 2024

This PR supersedes #300.

There is a corresponding PR to remove this from the old repo: ministryofjustice/analytical-platform#4261

Merge library into this repo

This library was previously in https://github.com/ministryofjustice/data-platform/
I've used git filter-repo to preserve the git history, so this PR contains the entire history of this library.

For reference, this is the command I ran on the source repo:

git filter-repo --path python-libraries/data-platform-catalogue --path-rename python-libraries/data-platform-catalogue:lib/datahub-client

This allowed me to pull in the commits with:

git pull --allow-unrelated-histories ../dp-clone main --no-ff

Post-refactor fixes

Since we made breaking changes in the 1.0.0 refactor, this PR also contains @murdo-moj's work to update the Django project.

What's still missing

CI runs both the Django tests and the tests for lib/datahub-client, but we still need to add another workflow that publishes the library so that our metadata ingestion code can use it.

MatMoore and others added 30 commits October 20, 2023 09:48
* Initial commit of the catalogue library

This uses Python 3.10 since 3.11 doesn't work yet.

* Add a minimal client class

This creates the basic hierarchy of service/database/schema/table.

To be extended later with optional metadata.

* Add a gitignore

* Add usage instructions

* Lint

* Set serviceType = Glue

* Update documentation
* Fix typo in package name to match pypi

* Install poetry before setup_python

setup_python has a dependency on poetry if using the cache path.
…rnal exceptions (#2014)

* Modify client to accept metadata objects

This forces us to pass in:

- name, description for database/schema/table
- retention period at schema/table level
- version, owner, email, dpia_required, domain at schema level

We can also accept tags at any level. The tags must already exist within
an OpenMetadata classification.

Email, dpia_required, domain and version are not yet passed through to the catalogue;
these may require the use of custom attributes.

* Wrap exceptions from OpenMetadata

This library is intended to present a catalogue-agnostic API to the rest
of the ingestion service, so we should expose our own exception
hierarchy instead of passing through OpenMetadata's.

* lint

* Bump version

* Update codeowners for new library
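The exception-wrapping idea from the commit above can be sketched as follows. `CatalogueError` is the name that appears in the Sentry report on this PR, but its hierarchy here is an assumption; `UpstreamAPIError`, `ReferencedEntityMissing`, `_call_openmetadata` and the payload shape are hypothetical stand-ins, not the OpenMetadata SDK or this library's API.

```python
class UpstreamAPIError(Exception):
    """Hypothetical stand-in for an exception raised by the OpenMetadata SDK."""


class CatalogueError(Exception):
    """Base class for every error the catalogue library exposes."""


class ReferencedEntityMissing(CatalogueError):
    """Hypothetical subclass: a referenced tag or parent entity does not exist."""


def _call_openmetadata(payload: dict) -> dict:
    # Hypothetical upstream call that fails when a tag is not in a classification.
    if "missing_tag" in payload.get("tags", []):
        raise UpstreamAPIError("Tag not found in classification")
    return {"id": "1234", **payload}


def upsert_table(payload: dict) -> str:
    """Create or update a table, translating upstream errors into our own."""
    try:
        response = _call_openmetadata(payload)
    except UpstreamAPIError as exc:
        # Chain the original exception so the traceback is preserved,
        # while callers only ever need to catch CatalogueError.
        raise ReferencedEntityMissing(str(exc)) from exc
    return response["id"]
```

Callers in the ingestion service would then write `except CatalogueError:` without importing anything from the catalogue backend, which is what keeps the API catalogue-agnostic.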
…atalogue (#1997)

Bump urllib3 in /python-libraries/data-platform-catalogue

Bumps [urllib3](https://github.com/urllib3/urllib3) from 2.0.6 to 2.0.7.
- [Release notes](https://github.com/urllib3/urllib3/releases)
- [Changelog](https://github.com/urllib3/urllib3/blob/main/CHANGES.rst)
- [Commits](urllib3/urllib3@2.0.6...2.0.7)

---
updated-dependencies:
- dependency-name: urllib3
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…metadata/schema pushes to openmetadata (#2176)

* added `get_user_id` method and pass in column descriptions

* add from dict methods to entities

* add tests for get_user_id and new entity methods

* upversion poetry

* remove unused import from client

* typo

* correct type please

* right upversion

* lint

* remove commented code

* making entity args non optional
* add owner attribute to `CatalogueMetadata` entity

* pass owner through `create_or_update_database()`

* updates tests

* upversion poetry

* fix test

* update readme and diagram

* update docstrings

* correct readme example
* Update to latest library version
* Add an integration test we can run manually
…tform-catalogue (#2552)

Bump cryptography in /python-libraries/data-platform-catalogue

Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.5 to 41.0.6.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](pyca/cryptography@41.0.5...41.0.6)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Jacob Woffenden <jacob.woffenden@digital.justice.gov.uk>
…talogue (#2867)

Bump jinja2 in /python-libraries/data-platform-catalogue

Bumps [jinja2](https://github.com/pallets/jinja) from 3.1.2 to 3.1.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/main/CHANGES.rst)
- [Commits](pallets/jinja@3.1.2...3.1.3)

---
updated-dependencies:
- dependency-name: jinja2
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Mat <MatMoore@users.noreply.github.com>
Generalise CatalogueClient and add datahub implementation

- Convert CatalogueClient class into ABC base, so OMD and DataHub client classes can be inherited
- implement DataHubCatalogueClient using DataHub gms
- rename all `create_or_update_x` methods to `upsert_x`
- add `create_domain` and `create_or_update_data_product` methods to DataHubCatalogueClient class
- update `DataHubCatalogueClient.create_or_update_table` method to create domain and data product if they don't exist but are passed as `data_product_metadata`
- associate tables with data products when created in DataHub
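A minimal sketch of the abstract-base-class pattern described above. `CatalogueClient` and the `upsert_`/`create_domain` naming come from the commit; the exact signatures, the `InMemoryCatalogueClient` backend (a toy stand-in for `DataHubCatalogueClient`) and the simplified URNs are assumptions for illustration.

```python
from abc import ABC, abstractmethod


class CatalogueClient(ABC):
    """Catalogue-agnostic interface; method signatures are hypothetical."""

    @abstractmethod
    def upsert_table(self, metadata: dict) -> str:
        """Create or update a table and return its identifier."""

    @abstractmethod
    def create_domain(self, name: str) -> str:
        """Create a domain and return its identifier."""


class InMemoryCatalogueClient(CatalogueClient):
    """Toy backend standing in for a real implementation such as DataHub."""

    def __init__(self) -> None:
        self.domains = {}  # domain name -> URN
        self.tables = {}  # table URN -> metadata

    def create_domain(self, name: str) -> str:
        urn = f"urn:li:domain:{name}"
        self.domains[name] = urn
        return urn

    def upsert_table(self, metadata: dict) -> str:
        # Create the domain on demand when it is referenced but missing,
        # mirroring the behaviour described for create_or_update_table.
        domain = metadata.get("domain")
        if domain is not None and domain not in self.domains:
            self.create_domain(domain)
        # Simplified: real DataHub dataset URNs also encode platform and env.
        urn = f"urn:li:dataset:{metadata['name']}"
        self.tables[urn] = metadata
        return urn
```

Because `CatalogueClient` cannot be instantiated directly, every backend is forced to implement the full interface before the rest of the service can use it.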
* Don't import openmetadata by default

It's incompatible with Python 3.11, which is really annoying.

* Fix bug appending assets to a data product

The list append() method returns None, so this was clearing the
asset list every other time an asset was added.

* Bump version
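The `append()` pitfall behind the bug fix above can be reproduced in a few lines. The function names are hypothetical, not the library's code; the duplicate check reflects a later commit in this PR about not attaching the same asset to a data product twice.

```python
from __future__ import annotations


def add_asset_buggy(assets: list[str], urn: str) -> list[str] | None:
    # BUG: list.append() mutates the list in place and returns None, so
    # whatever the caller stores from this return value is None.
    return assets.append(urn)


def add_asset(assets: list[str] | None, urn: str) -> list[str]:
    # Fixed: build a new list, append to it, and return the list itself.
    # Skipping URNs that are already present avoids duplicate assets.
    updated = list(assets or [])
    if urn not in updated:
        updated.append(urn)
    return updated
```

Returning a fresh list instead of mutating the caller's list is a design choice of this sketch; it keeps the helper free of side effects.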
* Add a method for search

This method accepts a query and pagination variables, and returns
the total number of results plus the results to display on the
current page.

For each search result, return:

- the result type, DataSet or DataProduct
- ID, name and description
- tags
- a dictionary of additional metadata fields (exactly what this will
  contain will depend on what tests well with users)
- information about the properties which matched the search query
- the time the metadata was last updated

The raw responses from Datahub are logged at DEBUG level.

For now I've excluded filters, facets, and sorting, but these are
supported by the underlying API.

See
- https://datahubproject.io/docs/graphql/inputObjects#facetfilterinput
- https://datahubproject.io/docs/graphql/objects#facetmetadata
- https://datahubproject.io/docs/graphql/inputObjects#searchsortinput
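A rough sketch of the search response shape described in the commit above: a total count plus only the current page of results. The class and field names, the in-memory `search` function and the 1-based page numbering are assumptions; the start/count pair mirrors how DataHub's GraphQL search input is paginated.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class ResultType(Enum):
    DATASET = "DATASET"
    DATA_PRODUCT = "DATA_PRODUCT"


@dataclass
class SearchResult:
    id: str
    name: str
    result_type: ResultType
    description: str = ""
    tags: list[str] = field(default_factory=list)
    metadata: dict[str, str] = field(default_factory=dict)


@dataclass
class SearchResponse:
    total_results: int  # total matches across all pages
    page_results: list[SearchResult]  # only the results for the requested page


def page_to_offset(page: int, count: int) -> int:
    """Convert a 1-based page number into a 0-based start offset."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    return (page - 1) * count


def search(
    index: list[SearchResult], query: str, page: int = 1, count: int = 20
) -> SearchResponse:
    # In-memory stand-in for the backend: filter on name, then slice out one
    # page using the same start/count pair a GraphQL search would receive.
    matches = [r for r in index if query.lower() in r.name.lower()]
    start = page_to_offset(page, count)
    return SearchResponse(
        total_results=len(matches), page_results=matches[start : start + count]
    )
```

Keeping the page-to-offset conversion in one helper makes the off-by-one risk (the "correct page returned" fix later in this PR) easy to test in isolation.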

* Parameterise result types in search queries
* Added list data product method
* Add parameter for search filters

* Allow the search response to contain facet information

Facets are the dynamic search filters that show how the search results
break down across different dimensions. Returning these allows us
to display only options that are relevant to the current result set
and indicate the number of results matching each option.

For most filters, we are expected to pass a URN as the value, so
this is also the easiest way for the frontend to figure out what
the possible values are.
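Facet parsing as described above might look roughly like this, assuming a response shaped like DataHub's facet metadata (a `field` plus a list of `aggregations`, each with a `value` and a `count`). `FacetOption` and `parse_facets` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FacetOption:
    value: str  # typically a URN that can be passed back as a filter value
    count: int  # number of results in the current result set with this value


def parse_facets(raw_facets: list[dict]) -> dict[str, list[FacetOption]]:
    """Group facet aggregations by field, dropping options with no results."""
    facets: dict[str, list[FacetOption]] = {}
    for facet in raw_facets:
        options = [
            FacetOption(value=agg["value"], count=agg["count"])
            for agg in facet.get("aggregations", [])
            if agg["count"] > 0
        ]
        # Show the most common options first, e.g. in a multi-select filter.
        facets[facet["field"]] = sorted(options, key=lambda o: o.count, reverse=True)
    return facets
```

Dropping zero-count options is what lets the frontend show only the filter values that are relevant to the current result set.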

* Use relative imports throughout
* Remove restrictive filter type, and test urn example

* Switch to non-deprecated syntax

* Document possible filters

* Add number_of_assets to data products metadata

* Add data product to metadata of datasets
… (#3129)

Add functionality for rendering search facets

E.g. the domain multi-select
* Added the ability to sort graphQL queries

* Added sort parameter to base client class

* Added tests for sort parameter

* Allowed NoneType for sort parameter

* Corrections for linter

* Satisfied black

* Add method SortOption.format() and its test

* Changed indentation

* Added whitespace for linter
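`SortOption.format()` is named in the commits above, but its fields and output are not shown, so this is a hedged sketch: it assumes a field name plus a direction, formatted into a DataHub-style sort input with a single `sortCriterion`. `build_search_variables` is a hypothetical helper showing the optional (NoneType-allowed) sort parameter.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SortOption:
    """Hypothetical shape: the field to sort on plus a direction."""

    field: str
    ascending: bool = True

    def format(self) -> dict:
        # Produce GraphQL variables for a DataHub-style sort input.
        return {
            "sortCriterion": {
                "field": self.field,
                "sortOrder": "ASCENDING" if self.ascending else "DESCENDING",
            }
        }


def build_search_variables(query: str, sort: SortOption | None = None) -> dict:
    variables: dict = {"query": query}
    # sort may be None, so only add the key when a sort option is given.
    if sort is not None:
        variables["sortInput"] = sort.format()
    return variables
```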
…ailable custom properties (#3134)

* Add domain information for datasets.

Previously datasets were not returning domain information.

This adds that back in and parses out the ID and name.

* lint

* Add custom properties to search result metadata
* Added additional metadata fields
…atalogue (#3126)

Bump aiohttp in /python-libraries/data-platform-catalogue

Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.9.1 to 3.9.2.
- [Release notes](https://github.com/aio-libs/aiohttp/releases)
- [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst)
- [Commits](aio-libs/aiohttp@v3.9.1...v3.9.2)

---
updated-dependencies:
- dependency-name: aiohttp
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Gary <26419401+Gary-H9@users.noreply.github.com>
…tform-catalogue (#3211)

Bump cryptography in /python-libraries/data-platform-catalogue

Bumps [cryptography](https://github.com/pyca/cryptography) from 41.0.6 to 42.0.0.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](pyca/cryptography@41.0.6...42.0.0)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* add fix to upsert_table to persist data product metadata

* test data product metadata input

* make tests pass and add check for product data persisting

* updated snapshot golden files

* Empty Commit

* Empty Commit

* remove to dot.

* lint

* update snapshot file for test

* updates for new release
* ✨ Add subdomain field to dataproduct metadata

* remove OpenMetadata files, add subdomain, and update domain and subdomain references.

* resolve broken tests

* resolve comments

* resolve comments pt 2
* fix for getting correct page returned in search result

* fix adding the dataset name, and avoid attaching duplicate assets to a data product

* tests dataset properties return as expected

* extra integration test for unique page results

* updated golden files

* add decorator...

* poetry version and changelog

* lint
…tform-catalogue (#3382)

Bump cryptography in /python-libraries/data-platform-catalogue

Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.0 to 42.0.2.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](pyca/cryptography@42.0.0...42.0.2)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* add list_data_product_assets method to client

* tests for list_data_product_assets

* up changelog and poetry version

* add listDataProductAssets graphql file

* lint

* suggestions for graphql query

* search client suggestion

* add data_product_list_assets to base client

* lint
…tform-catalogue (#3408)

Bump cryptography in /python-libraries/data-platform-catalogue

Bumps [cryptography](https://github.com/pyca/cryptography) from 42.0.2 to 42.0.4.
- [Changelog](https://github.com/pyca/cryptography/blob/main/CHANGELOG.rst)
- [Commits](pyca/cryptography@42.0.2...42.0.4)

---
updated-dependencies:
- dependency-name: cryptography
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

The library was not being installed properly because it wasn't
copied into the builder image.

This resulted in an import error for data_platform_catalogue.

Also, change the working directory for the runtime image: previously
everything was placed in the root directory, which might cause issues.
@MatMoore MatMoore changed the title ❗ Merge latest version of ministryofjustice-data-platform-catalogue ❗ [Includes historical git commits] Merge latest version of ministryofjustice-data-platform-catalogue May 2, 2024

@hjribeiro-moj left a comment

lgtm

@murdo-moj murdo-moj merged commit 4a0f7aa into main May 3, 2024
4 checks passed
@murdo-moj murdo-moj deleted the fmd-265-catalog-library-refactor-2 branch May 3, 2024 10:22

sentry-io bot commented May 8, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ ValidationError: 1 validation error for Table (/details/{result_type}/{urn})
  • ‼️ SystemExit: 1 (/search)
  • ‼️ CatalogueError: Unable to execute search query (/search)
