❗ [Includes historical git commits] Merge latest version of ministryofjustice-data-platform-catalogue #301

Merged: 65 commits, May 3, 2024
Commits
494fdad
DP-1833 First draft of the catalogue library (python 3.10) (#1960)
MatMoore Oct 20, 2023
3fdac18
Fix typo in package name to match pypi (#2011)
MatMoore Oct 20, 2023
014d20b
Update paths since name of package has been renamed (#2013)
MatMoore Oct 20, 2023
019540a
Modify catalogue client to accept full metadata objects and hide inte…
MatMoore Oct 23, 2023
61f3bc5
Bump urllib3 from 2.0.6 to 2.0.7 in /python-libraries/data-platform-c…
dependabot[bot] Oct 26, 2023
5df2456
add functionality to `ministryofjustice-data-platform-catalogue` for …
LavMatt Nov 3, 2023
e3a0bfe
Dpl add owner attribute to create database (#2277)
LavMatt Nov 10, 2023
d10bf95
Update to latest OpenMetadata version (#2327)
MatMoore Nov 13, 2023
1741697
Bump cryptography from 41.0.5 to 41.0.6 in /python-libraries/data-pla…
dependabot[bot] Nov 30, 2023
86a2120
Bump jinja2 from 3.1.2 to 3.1.3 in /python-libraries/data-platform-ca…
dependabot[bot] Jan 16, 2024
bfacd6b
Implement DataHub Catalogue client for dp-catalogue library (#2902)
tom-webber Jan 19, 2024
2cc14f0
[DP-2782] Fix data product assets issue and bump the version (#3040)
MatMoore Jan 23, 2024
e2f98d6
[DP-2983] Add a method for search (#3051)
MatMoore Jan 24, 2024
47ef4b1
Bump version (#3059)
MatMoore Jan 24, 2024
9d848ab
Added list data product method (#3068)
murdo-moj Jan 25, 2024
99c752b
[DP-3074] Add filtering to search function (#3075)
MatMoore Jan 26, 2024
80ca888
[DP-3106] Enhance metadata returned with search results (#3124)
MatMoore Jan 30, 2024
86a818d
[DP-3106] Add a way to fetch lists of domains and other search facets…
MatMoore Jan 30, 2024
84b7280
Fix bug with facets query (#3133)
MatMoore Jan 30, 2024
e451a00
Dp 3127 add sorting in search (#3142)
murdo-moj Jan 31, 2024
1c5af56
Add missing domain metadata to dataset search results, and add all av…
MatMoore Feb 1, 2024
d20f15c
Dp-3048-add-metadata-items (#3206)
murdo-moj Feb 6, 2024
7aa60d8
Bump aiohttp from 3.9.1 to 3.9.2 in /python-libraries/data-platform-c…
dependabot[bot] Feb 7, 2024
ee4d458
Bump cryptography from 41.0.6 to 42.0.0 in /python-libraries/data-pla…
dependabot[bot] Feb 7, 2024
df7c6dc
Dpl 3207 data product metadata bug (#3235)
LavMatt Feb 8, 2024
5ff5989
:sparkles: Add subdomain field to dataproduct metadata (#3304)
mitchdawson1982 Feb 15, 2024
63c0eeb
Bug fix catalogue name and page search (#3381)
LavMatt Feb 19, 2024
98d1bad
Bump cryptography from 42.0.0 to 42.0.2 in /python-libraries/data-pla…
dependabot[bot] Feb 19, 2024
d08f251
Add list data product to catalogue client (#3396)
LavMatt Feb 20, 2024
eeecbb2
Bump cryptography from 42.0.2 to 42.0.4 in /python-libraries/data-pla…
dependabot[bot] Feb 22, 2024
2e7e534
Find-moj-data-83/glossary (#3449)
murdo-moj Feb 26, 2024
1f729a8
Added get_glossary_terms to ABC to fix test mocking (#3463)
murdo-moj Feb 27, 2024
8cc363d
use fully qualified name in upsert_table (#3468)
LavMatt Feb 29, 2024
7d3f503
Dc 3512 fqn in catalogue search (#3556)
LavMatt Mar 5, 2024
e3a2ace
Fix none fqn data catalogue (#3603)
LavMatt Mar 5, 2024
29cd9e7
Add method for fetching schema metadata (#3604)
MatMoore Mar 6, 2024
07f9d55
Fix dodgy getDataset query (#3637)
MatMoore Mar 7, 2024
e625bc2
Support handling of customProperties in matched fields (#3641)
mitchdawson1982 Mar 11, 2024
a98deda
Dc 175 add upsert athena methods (#3767)
LavMatt Mar 18, 2024
7faebfc
Fmd-149-charts (#3772)
murdo-moj Mar 19, 2024
a312363
v0.21.0 (#3781)
murdo-moj Mar 19, 2024
5f655f2
Dc 156 add container entities (#3790)
LavMatt Mar 21, 2024
df11b2c
Add missing metadata when fetching datasets (#3722)
MatMoore Mar 26, 2024
f702593
update lastModified property in data-platform-catalogue (#4024)
LavMatt Apr 12, 2024
6f31961
Accounted for lastModified changing to nested structure (#4063)
murdo-moj Apr 12, 2024
4caacac
Bump dnspython from 2.4.2 to 2.6.1 in /python-libraries/data-platform…
dependabot[bot] Apr 15, 2024
7385416
update instances of `lastModified` property in data-platform-catalogu…
tom-webber Apr 16, 2024
c654278
Catalogue library refactor (#4115)
murdo-moj Apr 30, 2024
11e6192
Consistently parse names from datahub (#4244)
MatMoore May 1, 2024
a3b6de7
Coerce descriptions and custom properties to string/empty string (#4256)
MatMoore May 1, 2024
d638f9d
Merge branch 'main' of git@github.com:ministryofjustice/data-platform…
MatMoore May 2, 2024
5485ffa
Configure pytest to test both projects
MatMoore May 2, 2024
9bad477
fix relative paths
MatMoore May 1, 2024
011bf54
Fix post-merge incompatabilities between lib/datahub-client and the d…
MatMoore May 2, 2024
335ad0d
Starting to fix from refactor
murdo-moj Apr 30, 2024
851c5b4
Fixing/deleting tests
murdo-moj May 1, 2024
e363a25
Foreign keys should be an empty list
murdo-moj May 1, 2024
484fa29
Ensure the library is available to the builder
MatMoore May 2, 2024
0cf3120
Clean up README
MatMoore May 2, 2024
a893d75
Handle the correct exception type
MatMoore May 2, 2024
8e5a46b
Update .vscode/launch.json
murdo-moj May 3, 2024
5ba49dc
Update templates/partial/search_result.html
murdo-moj May 3, 2024
a33943b
Update templates/details_table.html
murdo-moj May 3, 2024
377f1e7
Added vscode default settings
murdo-moj May 3, 2024
1bf4646
Removed setting from git
murdo-moj May 3, 2024
3 changes: 2 additions & 1 deletion .gitignore
@@ -14,7 +14,6 @@ dist/
 downloads/
 eggs/
 .eggs/
-lib/
 lib64/
 parts/
 sdist/
@@ -25,6 +24,8 @@ share/python-wheels/
 .installed.cfg
 *.egg
 MANIFEST
+.vscode/**
+!.vscode/launch.json.default

 # PyInstaller
 # Usually these files are written by a python script from a template
3 changes: 0 additions & 3 deletions .vscode/launch.json → .vscode/launch.json.default
@@ -1,7 +1,4 @@
 {
-    // Use IntelliSense to learn about possible attributes.
-    // Hover to view descriptions of existing attributes.
-    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
     "version": "0.2.0",
     "configurations": [
         {
3 changes: 3 additions & 0 deletions Dockerfile
@@ -24,13 +24,16 @@ ENV POETRY_NO_INTERACTION=1 \
POETRY_CACHE_DIR=/tmp/poetry_cache

COPY pyproject.toml poetry.lock ./
COPY lib ./lib

RUN poetry install --without dev --no-root && rm -rf $POETRY_CACHE_DIR
RUN poetry run python -m nltk.downloader punkt

# The runtime image, used to just run the code provided its virtual environment
FROM python:3.11-slim-buster as runtime

WORKDIR /app

ENV VIRTUAL_ENV=/app/.venv \
PATH="/app/.venv/bin:$PATH"

5 changes: 2 additions & 3 deletions home/service/base.py
@@ -1,11 +1,10 @@
-from data_platform_catalogue.client import BaseCatalogueClient
-from data_platform_catalogue.client.datahub import DataHubCatalogueClient
+from data_platform_catalogue.client.datahub_client import DataHubCatalogueClient
 from django.conf import settings


 class GenericService:
     @staticmethod
-    def _get_catalogue_client() -> BaseCatalogueClient:
+    def _get_catalogue_client() -> DataHubCatalogueClient:
         return DataHubCatalogueClient(
             jwt_token=settings.CATALOGUE_TOKEN, api_url=settings.CATALOGUE_URL
         )
58 changes: 3 additions & 55 deletions home/service/details.py
@@ -5,54 +5,6 @@
 from .base import GenericService


-class DataProductDetailsService(GenericService):
-    def __init__(self, urn: str):
-        self.urn = urn
-        self.client = self._get_catalogue_client()
-
-        filter_value = [MultiSelectFilter("urn", [urn])]
-        search_results = self.client.search(query="", page=None, filters=filter_value)
-
-        if not search_results.page_results:
-            raise ObjectDoesNotExist(urn)
-
-        self.result = search_results.page_results[0]
-        self.assets_in_data_product = self._get_data_product_entities()
-        self.context = self._get_context()
-
-    def _get_data_product_entities(self):
-        # we might want to implement pagination for data product children
-        # details at some point
-        data_product_search = self.client.list_data_product_assets(
-            urn=self.urn, count=500
-        ).page_results
-
-        assets_in_data_product = []
-        for result in data_product_search:
-            assets_in_data_product.append(
-                {
-                    "name": result.name,
-                    "urn": result.id,
-                    "description": result.description,
-                    "type": "TABLE",
-                }
-            )
-
-        assets_in_data_product = sorted(assets_in_data_product, key=lambda d: d["name"])
-
-        return assets_in_data_product
-
-    def _get_context(self):
-        context = {
-            "result": self.result,
-            "result_type": "Data product",
-            "tables": self.assets_in_data_product,
-            "h1_value": "Details",
-        }
-
-        return context
-
-
 class DatabaseDetailsService(GenericService):
     def __init__(self, urn: str):
         self.urn = urn
@@ -69,7 +21,7 @@ def __init__(self, urn: str):
         self.context = self._get_context()

     def _get_database_entities(self):
-        # we might want to implement pagination for data product children
+        # we might want to implement pagination for database children
         # details at some point
         database_search = self.client.list_database_tables(
             urn=self.urn, count=500
@@ -80,7 +32,7 @@ def _get_database_entities(self):
             entities_in_database.append(
                 {
                     "name": result.name,
-                    "urn": result.id,
+                    "urn": result.urn,
                     "description": result.description,
                     "type": "TABLE",
                 }
@@ -122,11 +74,7 @@ def __init__(self, urn: str):
             # v0.12, assigning to multiple data products is not possible and we don't
             # have datasets with multiple parent containers.
             self.parent_entity = parents[0]
-            self.dataset_parent_type = (
-                ResultType.DATABASE.name.lower()
-                if "container" in self.parent_entity.id.split(":")
-                else ResultType.DATA_PRODUCT.name.lower()
-            )
+            self.dataset_parent_type = ResultType.DATABASE.name.lower()
         else:
             self.parent_entity = None
             self.dataset_parent_type = None
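Reviewer note on home/service/details.py: the entity-listing step in `_get_database_entities` now reads `result.urn` rather than `result.id`. The pattern can be sketched in isolation as below. This is a minimal sketch and not the code in this PR: `TableResult` and `entities_for_template` are hypothetical names, and sorting by name is assumed to carry over from the removed data product service, since the diff does not show that part of the database service.

```python
from dataclasses import dataclass


@dataclass
class TableResult:
    """Hypothetical stand-in for a catalogue search result."""

    urn: str
    name: str
    description: str = ""


def entities_for_template(results: list[TableResult]) -> list[dict]:
    """Flatten results into template-friendly dicts, sorted by name."""
    entities = [
        {
            "name": r.name,
            "urn": r.urn,  # the diff switches this from result.id to result.urn
            "description": r.description,
            "type": "TABLE",
        }
        for r in results
    ]
    return sorted(entities, key=lambda d: d["name"])
```

Keeping the key named `urn` on both sides lines up with the `<str:urn>` route parameter changed in home/urls.py.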
3 changes: 2 additions & 1 deletion home/service/search.py
@@ -5,6 +5,7 @@
 from data_platform_catalogue.search_types import (
     MultiSelectFilter,
     ResultType,
+    SearchResponse,
     SortOption,
 )
 from django.core.paginator import Paginator
@@ -62,7 +63,7 @@ def _build_entity_types(_, entity_types: list[str]) -> tuple[ResultType]:
         )
         return chosen_entities if chosen_entities else default_entities

-    def _get_search_results(self, page: str, items_per_page: int):
+    def _get_search_results(self, page: str, items_per_page: int) -> SearchResponse:
         if self.form.is_bound:
             form_data = self.form.cleaned_data
         else:
2 changes: 1 addition & 1 deletion home/urls.py
@@ -9,7 +9,7 @@
     path("search", views.search_view, name="search"),
     path("glossary", views.glossary_view, name="glossary"),
     path(
-        "details/<str:result_type>/<str:id>",
+        "details/<str:result_type>/<str:urn>",
         views.details_view,
         name="details",
     ),
43 changes: 14 additions & 29 deletions home/views.py
@@ -1,12 +1,11 @@
-from django.core.exceptions import ObjectDoesNotExist
+from data_platform_catalogue.client.exceptions import EntityDoesNotExist
 from django.http import Http404, HttpResponseBadRequest
 from django.shortcuts import render

 from home.forms.search import SearchForm
 from home.service.details import (
     ChartDetailsService,
     DatabaseDetailsService,
-    DataProductDetailsService,
     DatasetDetailsService,
 )
 from home.service.glossary import GlossaryService
@@ -19,58 +18,44 @@ def home_view(request):
     return render(request, "home.html", context)


-def details_view(request, result_type, id):
-    if result_type == "data_product":
-        context = data_product_details(id)
-        return render(request, "details_data_product.html", context)
+def details_view(request, result_type, urn):
     if result_type == "table":
-        context = dataset_details(id)
+        context = dataset_details(urn)
         return render(request, "details_table.html", context)
     if result_type == "database":
-        context = database_details(id)
+        context = database_details(urn)
         return render(request, "details_database.html", context)
     if result_type == "chart":
-        context = chart_details(id)
+        context = chart_details(urn)
         return render(request, "details_chart.html", context)


-def data_product_details(id):
+def database_details(urn):
     try:
-        service = DataProductDetailsService(id)
-    except ObjectDoesNotExist:
+        service = DatabaseDetailsService(urn)
+    except EntityDoesNotExist:
         raise Http404("Asset does not exist")

     context = service.context

     return context


-def database_details(id):
+def dataset_details(urn):
     try:
-        service = DatabaseDetailsService(id)
-    except ObjectDoesNotExist:
+        service = DatasetDetailsService(urn)
+    except EntityDoesNotExist:
         raise Http404("Asset does not exist")

     context = service.context

     return context


-def dataset_details(id):
+def chart_details(urn):
     try:
-        service = DatasetDetailsService(id)
-    except ObjectDoesNotExist:
-        raise Http404("Asset does not exist")
-
-    context = service.context
-
-    return context
-
-
-def chart_details(id):
-    try:
-        service = ChartDetailsService(id)
-    except ObjectDoesNotExist:
+        service = ChartDetailsService(urn)
+    except EntityDoesNotExist:
         raise Http404("Asset does not exist")

     context = service.context
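Reviewer note on home/views.py: the three detail helpers repeat the same try/except block that turns the library's `EntityDoesNotExist` into a 404. A minimal sketch of how that translation could live in one helper follows. It is not part of this PR: `context_or_404` and `FakeDetailsService` are hypothetical names, and the two exception classes are local stand-ins for `data_platform_catalogue.client.exceptions.EntityDoesNotExist` and `django.http.Http404` so that the block runs without Django or the catalogue library installed.

```python
class EntityDoesNotExist(Exception):
    """Local stand-in for the catalogue library's exception."""


class Http404(Exception):
    """Local stand-in for django.http.Http404."""


class FakeDetailsService:
    """Hypothetical service that builds its context in __init__,
    mirroring the shape of the services used in the diff."""

    KNOWN_URNS = {"urn:li:container:example"}

    def __init__(self, urn: str):
        if urn not in self.KNOWN_URNS:
            raise EntityDoesNotExist(urn)
        self.context = {"urn": urn, "h1_value": "Details"}


def context_or_404(service_cls, urn: str) -> dict:
    """Build the service and return its context, or raise Http404."""
    try:
        service = service_cls(urn)
    except EntityDoesNotExist as exc:
        # Translate the library error into the web framework's 404.
        raise Http404("Asset does not exist") from exc
    return service.context
```

With a helper of this shape, `database_details(urn)` would reduce to `return context_or_404(DatabaseDetailsService, urn)`, and likewise for the dataset and chart helpers.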
10 changes: 10 additions & 0 deletions lib/datahub-client/.gitignore
@@ -0,0 +1,10 @@
+.env
+coverage/
+venv/
+env/
+.DS_STORE
+.vscode
+*.code-workspace
+dist/
+__pycache__
+.idea/