UI for creating & downloading data exports #737

Open
mihow opened this issue Feb 21, 2025 · 4 comments · May be fixed by #786
Labels: enhancement, frontend

Comments

@mihow
Collaborator

mihow commented Feb 21, 2025

Summary

#720 introduces the ability to trigger exports, see their status and download a file via the backend API. We will need an interface to allow users to interact with it.

We have a request for the priority features (only) to be fully ready by mid-March.

Detailed Description

Exports will likely be tied to a new type of job, but will also likely need a dedicated list view with the history, status and download links. Something like the Collections list in the Project detail page.

Priority features

  • Start an export from the Occurrences view with the current filters
  • View a list of exports in progress or completed
  • Provide download link
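A rough sketch of the first priority item, starting an export with the current filters. Everything named here is an assumption made for illustration (`ExportRequest`, `buildExportRequest`, the format names and the list-only parameter names); the real request fields are whatever the API in #720 defines.

```typescript
// Hypothetical shapes for illustration; the real request fields are whatever
// the export API from #720 defines.
type ExportFormat = "json" | "csv" | "dwca";

interface ExportRequest {
  projectId: string;
  format: ExportFormat;
  filters: Record<string, string>;
}

// Query params that only affect the on-screen list, not which rows match.
// These names are assumptions.
const LIST_ONLY_PARAMS = ["page", "limit", "offset", "ordering"];

// Turn the params currently applied in the Occurrences view into an export
// request, dropping empty values and list-only params.
function buildExportRequest(
  projectId: string,
  currentParams: Record<string, string | undefined>,
  format: ExportFormat
): ExportRequest {
  const filters: Record<string, string> = {};
  for (const key of Object.keys(currentParams)) {
    const value = currentParams[key];
    if (value !== undefined && value !== "" && LIST_ONLY_PARAMS.indexOf(key) === -1) {
      filters[key] = value;
    }
  }
  return { projectId, format, filters };
}

const exportRequest = buildExportRequest(
  "1",
  { taxon: "42", page: "3", deployment: "", score_min: "0.8" },
  "csv"
);
// exportRequest.filters is { taxon: "42", score_min: "0.8" }
```

The point of stripping pagination and ordering is that the export should cover every row matching the filters, not just the page the user happens to be on.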

Secondary features

  • Allow exports of Sessions
  • Allow exports of Captures (collection detail view?)
  • Allow export format to be configured (Darwin Core zip for occurrences and sessions, COCO for detections, etc.)
  • Allow exports of Deployment metadata
  • Email notifications working

Future features

  • Notification center!

Implementation Details

See API endpoints added in #720
Hopefully no design is needed for this version. I suggest using the Project detail view tabs: something like the Collections list that shows the history and configuration for each export triggered, where each export is tied to a job & the job status.
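To make the "each export is tied to a job & the job status" idea concrete, here is a minimal sketch of how a list row could be derived from an export record. The `DataExport` fields and the status values are assumptions for the example, not the actual serializer output from #720.

```typescript
// Assumed job states and export fields; the real ones come from the backend.
type JobStatus = "pending" | "started" | "success" | "failure";

interface DataExport {
  id: number;
  format: string;
  jobStatus: JobStatus;
  fileUrl: string | null;
}

interface ExportRow {
  id: number;
  format: string;
  statusLabel: string;
  downloadUrl: string | null; // null means: render no download link
}

const STATUS_LABELS: Record<JobStatus, string> = {
  pending: "Queued",
  started: "In progress",
  success: "Completed",
  failure: "Failed",
};

// Map one export record to what the list view needs. The download link is
// only exposed once the job has succeeded and a file actually exists.
function toExportRow(item: DataExport): ExportRow {
  return {
    id: item.id,
    format: item.format,
    statusLabel: STATUS_LABELS[item.jobStatus],
    downloadUrl: item.jobStatus === "success" ? item.fileUrl : null,
  };
}
```

Keeping this mapping in one place means the list only has to render rows, and the rule for when a download link appears is easy to change later.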

The main formats for now will be:

  1. a JSON export with nested data that mostly matches the current API list views
  2. a simplified & flattened CSV of each data type
  3. a Darwin Core Archive zip of flattened TSVs with data about Occurrences, Sessions and Taxa
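As an illustration of the difference between the nested JSON and the flattened CSV, here is a small sketch that flattens nested records into dotted column names. The record fields are made up for the example, and storing arrays as a JSON string in a single cell is just one possible choice, not a decision from this issue.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Flatten nested objects into "parent.child" keys. Arrays are kept as a JSON
// string in one cell (an assumption; they could also be dropped or exploded).
function flatten(
  value: Json,
  prefix: string = "",
  out: Record<string, string> = {}
): Record<string, string> {
  if (Array.isArray(value)) {
    out[prefix] = JSON.stringify(value);
  } else if (value !== null && typeof value === "object") {
    for (const key of Object.keys(value)) {
      const child = value[key];
      if (child !== undefined) {
        flatten(child, prefix ? `${prefix}.${key}` : key, out);
      }
    }
  } else {
    out[prefix] = value === null ? "" : String(value);
  }
  return out;
}

// Quote a cell if it contains a comma, quote or line break.
function csvCell(text: string): string {
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function toCsv(records: Json[]): string {
  const rows = records.map((record) => flatten(record));
  const columns: string[] = [];
  for (const row of rows) {
    for (const key of Object.keys(row)) {
      if (columns.indexOf(key) === -1) columns.push(key);
    }
  }
  const lines = [columns.map(csvCell).join(",")];
  for (const row of rows) {
    lines.push(columns.map((column) => csvCell(row[column] ?? "")).join(","));
  }
  return lines.join("\n");
}

// Hypothetical occurrence records, loosely shaped like a nested API response.
const csv = toCsv([
  { id: 1, determination: { name: "Noctua pronuba", score: 0.9 }, detections: [10, 11] },
  { id: 2, determination: { name: "Unknown, moth", score: 0.4 }, detections: [] },
]);
```

The same flattening step would also apply to the TSVs inside a Darwin Core Archive, with tabs as the separator and Darwin Core term names as the column headers.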

Related Issues

#298
#720

Additional Context

Image

@mihow mihow added the enhancement New feature or request label Feb 21, 2025
@mihow mihow added the frontend label Feb 21, 2025
@annavik
Member

annavik commented Mar 4, 2025

Thanks for this great summary Michael!

Even if we will reuse our current building blocks, I think we should spend a bit of time on the design here. Working out how to combine these building blocks in a nice way for users, and defining the logic for interactions, is I think also part of design.

I think this is an important feature and I want to take some time to make it nice, not just squeeze it in! :) Maybe we can do a bit of sketching while we are working on the backend side of things?

@mihow
Collaborator Author

mihow commented Mar 12, 2025

Here are all the formats available from Label Studio. You can see these are aimed at different use cases for research or compatibility with other common workflows.

| Format | Description | Tags |
| --- | --- | --- |
| JSON | List of items in raw JSON format stored in one JSON file. Use to export both the data and the annotations for a dataset. It's Label Studio Common Format | |
| JSON_MIN | List of items where only "from_name", "to_name" values from the raw JSON format are exported. Use to export only the annotations for a dataset. | |
| CSV | Results are stored as comma-separated values with the column names specified by the values of the "from_name" and "to_name" fields. | |
| TSV | Results are stored in tab-separated tabular file with column names specified by "from_name" "to_name" values | |
| COCO | Popular machine learning format used by the COCO dataset for object detection and image segmentation tasks with polygons and rectangles. | image segmentation, object detection |
| COCO_WITH_IMAGES | COCO format with images downloaded. | image segmentation, object detection |
| VOC | Popular XML format used for object detection and polygon image segmentation tasks. | image segmentation, object detection |
| YOLO | Popular TXT format is created for each image file. Each txt file contains annotations for the corresponding image file, that is object class, object coordinates, height & width. | image segmentation, object detection |
| YOLO_WITH_IMAGES | YOLO format with images downloaded. | image segmentation, object detection |
| YOLO_OBB | Popular TXT format is created for each image file. Each txt file contains annotations for the corresponding image file. The YOLO OBB format designates bounding boxes by their four corner points with coordinates normalized between 0 and 1, so it is possible to export rotated objects. | image segmentation, object detection |
| YOLO_OBB_WITH_IMAGES | YOLOv8 OBB format with images downloaded. | image segmentation, object detection |
| CONLL2003 | Popular format used for the CoNLL-2003 named entity recognition challenge. | sequence labeling, text tagging, named entity recognition |
| BRUSH_TO_NUMPY | Export your brush labels as NumPy 2d arrays. Each label outputs as one image. | image segmentation |
| BRUSH_TO_PNG | Export your brush labels as PNG images. Each label outputs as one image. | image segmentation |
| ASR_MANIFEST | Export audio transcription labels for automatic speech recognition as the JSON manifest format expected by NVIDIA NeMo models. | speech recognition |
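For the detection-oriented formats, the practical difference is mostly in how a bounding box is written. As a sketch: COCO stores a box as `[x, y, width, height]` in pixels from the top-left corner, while a YOLO txt line stores `class x_center y_center width height` normalized to the image size. The function names below are made up for the example, and how our detections would map onto either format is an open question.

```typescript
type CocoBbox = [number, number, number, number]; // [x, y, width, height] in pixels
type YoloBbox = [number, number, number, number]; // [xCenter, yCenter, width, height], 0..1

// Convert a COCO-style pixel box to YOLO-style normalized center coordinates.
function cocoToYolo(bbox: CocoBbox, imageWidth: number, imageHeight: number): YoloBbox {
  const [x, y, width, height] = bbox;
  return [
    (x + width / 2) / imageWidth,
    (y + height / 2) / imageHeight,
    width / imageWidth,
    height / imageHeight,
  ];
}

// Format one line of a YOLO annotation file: class index followed by the box.
function yoloLine(classIndex: number, bbox: YoloBbox): string {
  return [classIndex, ...bbox].join(" ");
}

// A 200x100 px box at (100, 50) in a 400x200 px image.
const yoloBox = cocoToYolo([100, 50, 200, 100], 400, 200);
const line = yoloLine(3, yoloBox);
```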

@mihow
Collaborator Author

mihow commented Mar 12, 2025

Screenshots of the export process in Label Studio

  1. Create snapshot of data (choose filters, what to include etc)
  2. Create export from snapshot
  3. Choose format
  4. Export data is prepared using selected snapshot & format, file is generated and made ready for download.
[Three screenshots of the Label Studio export process]

@mihow
Collaborator Author

mihow commented Mar 12, 2025

Screenshots from the GBIF export process

  1. Query data
  2. Choose Download data
  3. Choose format
  4. Data & export file is prepared & public DOI link created
  5. User is emailed with download link & DOI to citable URL
[Four screenshots of the GBIF export process]

@annavik annavik linked a pull request Mar 24, 2025 that will close this issue