Give users the ability to download the text of a document as a txt file #2312
Comments
It would be possible to retrieve the text content for all pages via the entities API endpoint and assemble a text file that’s downloaded on the client side.
@sunu I am not sure though how well that endpoint would perform for documents with lots of pages and if this might need to be done on the server side? Also, there might be some value in exposing this feature via the API as well?
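To make the "assemble a text file from the page entities" idea concrete, here is a minimal Python sketch. It assumes each page entity arrives as a dict with a `properties` mapping that holds `index` and `bodyText` as lists of strings; that shape and the helper name `assemble_text` are assumptions for illustration, not confirmed Aleph API behaviour. Fetching the pages from the entities endpoint is left out on purpose.

```python
from typing import Iterable, Mapping


def assemble_text(pages: Iterable[Mapping]) -> str:
    """Join page entities into one plain-text string, ordered by page index.

    Assumption: every page looks like
    {"properties": {"index": ["1"], "bodyText": ["..."]}}.
    """

    def page_index(page: Mapping) -> int:
        # Pages without an index value sort first (treated as 0).
        values = page.get("properties", {}).get("index", [])
        return int(values[0]) if values else 0

    parts = []
    for page in sorted(pages, key=page_index):
        body = page.get("properties", {}).get("bodyText", [])
        parts.append("\n".join(body).strip())

    # Skip empty pages and separate the rest with a blank line.
    return "\n\n".join(part for part in parts if part)
```

The same function could run in a backend export job or in a standalone script, since it only depends on the list of pages it is handed.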
@tillprochaska I have a Python script that I use for this very purpose. However, we get asked this a lot. Maybe it makes sense to also generate a single text file when ingesting the document that can then be called. Though that would add a lot of overhead up front and we don't get asked this enough to justify blowing up our storage. Doing this on demand would be more sane. Maybe a job like exports?
Yes, I would prefer doing it on the backend as an export as well. Ideally, we should cache the combined text for reuse. But I would be ok with skipping that in the first iteration. What should the url endpoint look like for this? Something like
@sunu Not sure if that question was directed at me, but from an API consumer perspective, I’d have expected it to be part of the Although I can see that that doesn’t make a lot of sense from a technical/implementation perspective, as loading and concatenating text for a document is different from simply returning a file from storage.
@brrttwrks Sorry, when I said "client side" I was referring to implementing it in the front end/browser (without changing the backend) and not to the Aleph CLI -- didn’t mean to say that you should keep doing the current workarounds! :)
When reviewing a document in Aleph we provide the ability to download a pdf of the document. It would be good to also be able to download the textual content of that file as a text document.
This document would only include text, no images or formatting. Just the text.