remote.nextstrain_dot_org: Embed images into narratives on upload #235
Conversation
Immediate 3.6 failure looks spurious. I'll re-run that job once the rest of the run completes.
Big fan of this idea. Local narrative editing (…)

On the /edit/narratives side, one could imagine using this code to drag&drop a markdown + the image files which are referenced in the markdown (but it'd have to be a JS version and thus quite different). When editing functionality is actually available you could drag on image files and have them base64-embedded by the browser (again, not possible to use the CLI code here).

For the local editing approach it'd be nice to work out where we want to end up, and how it interfaces with /edit/narratives (if at all). Tom - do you envision the CLI addressing the URL gotchas (ii)? (I took the approach of no longer recommending local narratives editing at all.)
…ides

Fixes handling of intra-document references like links and images that reference a definition found elsewhere in the file, e.g. before the first slide or on a different slide than the link/image reference. Such features are soon to be used for embedding images (see related PR below).

Anyone previewing a document with such features outside of Auspice's narrative rendering would see the link/image references work, but they'd be broken when viewed through Auspice. Since narrative documents are intended to gracefully degrade with conventional Markdown rendering and still be a viewable document (e.g. on GitHub), it's nice to address this issue for that reason alone.

It's also nice to use the DOM as a structured way to split apart the document into slides, rather than relying on string manipulation and parsing. An alternative approach would be to use a Markdown parser that emits an AST or stream of nodes as an intermediate we could split into slides while still preserving references/definitions, but that seemed like a bigger change (and I'd have to go survey the parsers available instead of using the browser's built-in DOM).

While the entire internals of parseNarrativeBody() change, the shape of the output "slide objects" is the same. The one minor exception is that slides will always have a "mainDisplayMarkdown" property (though it will usually be the empty string) even if they don't have a corresponding code block in the source text. All downstream usages I found condition not on property existence but on truthiness of the value, so this should result in no functional change.

Resolves: <#1599>
Related-to: <nextstrain/cli#235>
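The actual fix lives in Auspice and works on the browser's DOM, but the underlying idea is simple enough to sketch in Python: when splitting a Markdown document into slides, carry the link/image reference *definitions* (which may live anywhere in the file) into every slide so reference-style links and images keep resolving. The regexes and the `# ` slide delimiter below are simplifications for illustration, not Auspice's implementation.

```python
import re

# Reference definitions like "[logo]: https://example.com/logo.png"
REF_DEF = re.compile(r"^\[[^\]]+\]:\s+\S+.*$", re.MULTILINE)

def split_into_slides(markdown: str) -> list[str]:
    # Collect definitions, which may appear before the first slide or on a
    # different slide than the reference that uses them.
    definitions = REF_DEF.findall(markdown)

    # Treat each "# " heading as the start of a slide (simplified; real
    # narratives also have YAML frontmatter and per-slide dataset links).
    slides = [s for s in re.split(r"(?m)^(?=# )", markdown) if s.strip()]

    # Append every definition to every slide so references like "![alt][logo]"
    # still resolve after the document has been split apart.
    return [s.rstrip() + "\n\n" + "\n".join(definitions) + "\n" for s in slides]
```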
This work is pretty focused on the "upload a narrative to share it" part of the process and doesn't really consider the earlier iterative part of local authoring (e.g. write some slides, preview, rinse & repeat). There's obviously a lot more to do to improve local authoring!
Do you have any guess at how useful this would be in practice? It seems like something that's super error-prone (e.g. easy to miss dragging on an image file; debugging is disconnected from uploading, so you might get it right when debugging but not when uploading; etc.) and maybe only serves as a stopgap measure until full authoring in the browser?

(Handling image embedding in a JS client would necessarily be different, yes. That said, I think much of the approach (modifying a stream of nodes, the node structures, etc.) could be more or less directly ported from Python to JS, with the biggest differences being how the stream of parsed nodes is produced and how the nodes are turned back into Markdown (or directly converted to HTML).)
To me, this sort of authoring in the browser seems ideal from the user's POV and also from our implementation POV. Compared to tracking down image references after the fact, it's much simpler to handle a dropped image and embed it appropriately when serializing the editor's document model to Markdown. So IMO better to spend effort on making authoring in the browser a reality rather than the stopgap above.
I'm not sure how the CLI would address (ii) without it getting overly complex to use, as it requires more information than the narrative itself provides (e.g. the CLI would have to know both the remote nextstrain.org locations of and the local filesystem locations of the narrative and datasets used). If we could, then it seems worthwhile, but (ii) feels more like an Auspice issue and question of how narratives are designed (e.g. nextstrain/auspice#890).
Yup - and my comments are not a critique of the code in this PR (which seems great) but rather questioning how this fits into the bigger picture of narrative authoring. I'm of the opinion that we should aim to move completely away from developing narratives locally (i.e. avoid using …
Agree with this too - and the workflow of dropping images onto individual slides (which are then converted to base64) is the way to go for the editor. But this leads to asking what the use case is for relative markdown images? They are (and probably will be) incompatible with /edit/narratives and, as far as I can figure, don't work within auspice either [1].

[1] E.g. an image …
Yeah, totally fair questions! :-)

I'm not so sure about moving entirely away from local narrative authoring. It seems important to keep the ability to do it offline and without sharing via nextstrain.org, e.g. both for folks with spotty connectivity and folks who don't want to/can't host their narratives on nextstrain.org. Even if Auspice gains an in-browser editor for narratives (that we then leverage on nextstrain.org for saving to Groups, etc.), I imagine it would be useful to run that editor locally against local files, i.e. similar to what …

Stepping back a bit, and also getting to the question of local datasets for local narratives, my general thought here is that Nextstrain CLI could provide the web API needed by Auspice to view (and edit) narratives and datasets. (Right now that means the Charon API, but I imagine it might also mean parts of the RESTful API too in the future.) So instead of Auspice making requests to its own built-in server backed by the filesystem, Nextstrain CLI would run Auspice so that it makes requests to a server that Nextstrain CLI also provides. This server could be backed by the local filesystem when appropriate but could also back itself by nextstrain.org (incl. as the user logged into the CLI).
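To make that idea concrete, here's a rough sketch (not existing CLI code) of what such a server could look like when backed by the local filesystem. The endpoint names follow Auspice's existing /charon/* requests; Flask, the directory layout, and the filename mapping are assumptions for illustration, and a real version could also proxy to nextstrain.org as the logged-in CLI user.

```python
from pathlib import Path
from flask import Flask, jsonify, request, send_file

app = Flask(__name__)
DATASET_DIR = Path("auspice")       # e.g. auspice/flu_seasonal_h3n2.json
NARRATIVE_DIR = Path("narratives")  # e.g. narratives/ncov_sit-rep.md

@app.get("/charon/getAvailable")
def get_available():
    # Enumerate what's viewable locally; a real server might merge in
    # nextstrain.org resources here as well.
    datasets = [{"request": p.stem.replace("_", "/")} for p in sorted(DATASET_DIR.glob("*.json"))]
    narratives = [{"request": f"narratives/{p.stem}"} for p in sorted(NARRATIVE_DIR.glob("*.md"))]
    return jsonify({"datasets": datasets, "narratives": narratives})

@app.get("/charon/getDataset")
def get_dataset():
    # "flu/seasonal/h3n2" -> auspice/flu_seasonal_h3n2.json
    name = request.args["prefix"].strip("/").replace("/", "_")
    return send_file(DATASET_DIR / f"{name}.json", mimetype="application/json")

@app.get("/charon/getNarrative")
def get_narrative():
    # "narratives/ncov/sit-rep" -> narratives/ncov_sit-rep.md, served as raw Markdown
    name = request.args["prefix"].strip("/")
    if name.startswith("narratives/"):
        name = name[len("narratives/"):]
    return send_file(NARRATIVE_DIR / (name.replace("/", "_") + ".md"), mimetype="text/markdown")

if __name__ == "__main__":
    app.run(port=4000)  # something like `nextstrain view` would launch Auspice pointed here
```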
The use case I saw was to edit a narrative in your …

But maybe I've just imagined this use case and it isn't actually useful??

¹ This is also the process I'd like to implement for https://github.com/nextstrain/narratives instead of having nextstrain.org source directly from GitHub.
Jumping into the broader discussion:
This point is consistent with the recent conversations we've had with users at office hours where folks have wanted or needed to develop their narratives in private (ahead of a manuscript submission or to learn how narratives work in private). But, I also see why @jameshadfield would prefer to move away from this functionality since the local vs. remote naming is a complicated problem. If we think about the use cases in terms of preference for public/private narratives and remote/local authoring, I wonder if most users would be ok with remote authoring if they knew it could be private? Poor internet connectivity would still be an unavoidable problem with the remote authoring, though.
This seems like a huge step for the CLI! It sounds interesting, although I don't remember discussing this before in our long-term plans for the CLI. Would you be open to talking more about this direction at our next Nextstrain meeting or just for a smaller meeting of people who are interested, @tsibley?
I don't know if you've imagined this use case, although my personal experience is that I either work exclusively on a public narrative through GitHub or on a public/private narrative through Nextstrain Groups, but I never manage a narrative in both a GitHub repository and a group. This PR would make the second type of authoring (which I do most often) much easier, avoiding the need to post images somewhere publicly. I can see why versioning plus groups storage would be valuable, but for better or worse, I treat …
We request "ubuntu-latest" for our CI jobs to stay current. GitHub Actions started switching that from 20.04 → 22.04 at the beginning of October¹, but actions/setup-python doesn't support Python 3.6 on Ubuntu 22.04. This issue was first identified in CI runs for an unrelated PR.² GitHub Actions documents matrix includes to happen after matrix excludes, so combining the two works well to replace a specific job. ¹ https://github.blog/changelog/2022-11-09-github-actions-ubuntu-latest-workflows-will-use-ubuntu-22-04/ ² #235
I feel like I'm only qualified to weigh in a little bit here, as I don't fully understand the technical side, but just to give a few thoughts:
I'll summarise my current thoughts here, but I think this has arrived at the point where we need a synchronous meeting to talk through these things. There are no objections from me about the technical possibilities here, but I think we should consider what we want to support / invest in.

Nextstrain CLI using nextstrain.org API handlers

I implemented a prototype of this to develop the /edit/narratives functionality -- https://github.com/nextstrain/auspice/blob/a416642765e15881b025c23967bf2625f444b3e1/src/components/narrativeEditor/updateApiCalls.js#L10-L14

Public vs Private

I think there's some confusion here which we could clarify more (and it's not really a binary thing either). Using the nextstrain.org/edit/narratives page is private in the sense that the narrative doesn't leave your browser window, and it allows private Groups datasets to be used (if you're logged in). This is equivalent to viewing your narrative using a version of nextstrain-cli which accesses nextstrain.org datasets [1]. If this is not considered private enough, then storing your datasets or narratives on nextstrain.org (private groups) is not going to be private enough, and probably auspice.us isn't either. You are then in a world where you're running your own auspice server, and in that case I think there is a certain level of technical proficiency expected.

(To be clear - local narratives development won't go away from Auspice - I just don't think it's the go-to technique we should recommend. This includes local usage of /edit/narratives backed by local files, which I think should already be working! In the future we could expand this to allow users to save narrative changes back to disk.)

GitHub vs Groups

The use case described above (develop markdown by viewing on the GitHub web UI, then upload the narrative using the Nextstrain CLI) is pretty niche to me, but I'd love to hear dissenting experiences! Rather, I think coupling this editing approach with Nextstrain Community sharing is a nicer UX [2,3] as it keeps a single source of truth and avoids out-of-sync issues. This relates to how we're storing our core narratives: yes, it's an oddity that we're using a GitHub repo (a minus), but it's clear which file is the source of truth (a big plus).

[1] Maybe a nextstrain.org browser has a larger attack surface than a localhost one, but I think they're equivalently secure for this conversation.

[2] But it doesn't support relative images. We've gotten around this by explaining how to use the actual GitHub image path in our tutorials. Relative images here could be solved at a technical level in the nextstrain.org codebase for Community narratives, which would be cool!

[3] And it (Community) doesn't support private repos. In this case I'd still lean towards the /edit/narratives + nextstrain CLI approach.
For sure. You hadn't heard it discussed before because it was something I thought of in the shower last night. ;-)
When you're done with an editing session, do you keep the local file around and pick up from there for your next editing session? Or do you discard the local file and …

Do you do much collaborative authoring? I'd suspect keeping narratives in version control would be more compelling in that use case (e.g. just like with collaborative manuscript authoring). When version control is involved, …
👍 Yes, would be good to have a smaller work group meeting to hash out the goals (and non-goals) for local narratives and make a cohesive plan between Auspice, nextstrain.org, Nextstrain CLI, etc.
GitHub might be niche, but it's just one example of a more general use case that seems likely to exist (but again, I don't know if it does). As another example, you can easily imagine someone keeping their narratives in a Dropbox or other shared folder and collaborating on them there, relying on their editor to show a Markdown preview (as many do these days, e.g. VSCode), and then uploading when ready to share.
Community's use of GitHub repos and how files map has seemed confusing to many over time (unless they're already in that niche of using Git). I think the concept of "upload to nextstrain.org" is clarifying compared to the indirection of "put this on GitHub, at this specific place, and then type in a nextstrain.org URL over here". And I think the "author in private, upload to make public" use case is also important.
I don't think it's an oddity that we're using a repo to track and author core narratives. It's just like how we use repos to track and author core datasets (via the workflows that produce them). And I think it's clearer when what you see on nextstrain.org is what's been uploaded to nextstrain.org, rather than something stored elsewhere but accessed via nextstrain.org. (I'm counting …
Indeed! We could use a similar approach as here with rewriting the Markdown (or otherwise arranging the correct base URL for resources).
I'll just add one (final) comment, which is to observe that a big part of crafting a narrative is how the dataset will be visualised alongside the slide content. You can get some way there by loading [1] the (dataset) URL in a browser and then looking at the markdown rendering of the slide (in GitHub, VSCode, etc.), but there are some subtle differences (text styling of the markdown, viz panel dimensions), and I think it's a nicer editing experience to see what the slide will actually look like. This was (one of) the motivations for /edit/narratives.

[1] I mean explicitly loading that URL -- getting the viz into a particular state, copy & pasting the URL into the markdown, and then moving on isn't enough as we don't store every bit of UI state in the URL query.
I always treat the version of the narrative stored in the group as the source of truth. When I switch between home and work machines, for example, I always download the remote version before starting new work.
No, but I plan to do more for seasonal flu, at least. I bet you're right that my current workflow will break down the first time @j23414 and I upload slightly different versions of the same narrative without "pulling" first. The frustrating part is that version control would add another layer of complexity by requiring a new private GitHub repo for each private group I'm writing narratives for, and the actual authoring process (currently) requires uploading to the group to see the changes reflected, so the potential for conflicts in co-authorship remains. Maybe the better solution would be to author the narrative locally in version control, drag onto the narratives debugger to confirm changes, commit and push changes to a parallel private repo (e.g., …
Builds with this version are seemingly-successful until runtime, when this error is thrown:

    Traceback (most recent call last):
      …
      File "cryptography.hazmat.primitives.asymmetric.ec", line 11, in <module>
      File "cryptography.hazmat._oid", line 7, in <module>
    ImportError: …/build/x86_64-unknown-linux-gnu/debug/installation/lib/cryptography/hazmat/bindings/_rust.abi3.so: undefined symbol: PyExc_BaseException

Why 0.23.0 results in a bad build remains unclear, though I'll continue to try to track down the cause. I tested and confirmed it's not the Python 3.10.4 → 3.10.8 upgrade noted in the 0.23.0 release notes¹. Maybe it's the SETUPTOOLS_USE_DISTUTILS=stdlib change?

For now, avoiding 0.23.0 means we'll use the previous release, 0.22.0, which still works well in my local testing and is what we built our last release, 5.0.1, with. I'm hopeful the next release of PyOxidizer will fix the issue; the single version exclusion will help us notice if it doesn't.

This issue was first identified in CI runs for an unrelated PR.²

¹ https://github.com/indygreg/PyOxidizer/releases/tag/pyoxidizer%2F0.23.0
² #235
Maybe we can add a "checkout" system for files (based on my minimal understanding of the Perforce version control system). If one editor has checked out a file, then the other editors can only download the file but cannot update it. This would also work well with the web editor: the web editor would check out the file while a user is editing so no one else can update it. Once the user has made the intended edits, the file can be checked in and the source of truth gets updated.
Absolutely! I don't think anyone's saying otherwise. :-)
Good to know!
Perhaps Nextstrain CLI should implement conditional requests (If-Match and/or If-Unmodified-Since) to avoid stomping on unexpected edits. These are supported by the nextstrain.org API side, but not made use of by Nextstrain CLI. As their current support is for caching, there might be more nextstrain.org work needed to make them actually useful for conflict avoidance. One tricky part would be maintaining state on the Nextstrain CLI side. An in-browser editor would also greatly benefit from conditional requests, so work on the nextstrain.org side to support them well would be worth it. And such an editor wouldn't suffer from questions of how to maintain state.
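A minimal sketch of what that could look like on the CLI side, assuming the requests library; the URL, media type, use of ETags, and error handling are illustrative guesses rather than the actual nextstrain.org API contract or CLI plumbing.

```python
import requests

def upload_narrative(url: str, markdown: str, etag_seen: str | None) -> str:
    """Upload `markdown` to `url`, refusing to overwrite edits we haven't seen.

    `etag_seen` is the ETag from when the narrative was last downloaded -- the
    state the CLI would need to keep between a download and a later upload.
    """
    headers = {"Content-Type": "text/vnd.nextstrain.narrative+markdown"}
    if etag_seen is not None:
        # Only apply our change if the remote copy is still the one we based it on.
        headers["If-Match"] = etag_seen

    response = requests.put(url, data=markdown.encode(), headers=headers)

    if response.status_code == 412:  # Precondition Failed: someone else edited it
        raise RuntimeError(
            "The narrative changed on the remote since you last downloaded it; "
            "download the latest version and re-apply your edits."
        )
    response.raise_for_status()

    # Remember the new ETag for the next round of edits.
    return response.headers.get("ETag", "")
```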
This would be one way to do it, a particularly infra-y way. It's certainly not a workflow I'd recommend to someone not steeped in software—the same way I wouldn't recommend authoring a manuscript in a Git repo to someone not steeped in software—but I suspect it'd work well for you and others on our team. For collaboration specifically, the non-software-y alternative to this way of doing things is using an in-browser editor with conflict avoidance, or ideally, live multi-user editing. (But I suspect the latter is still more effort than we want to expend, even if the experience is light years better.)
I missed @joverlee521's reply while writing my own!
Locking has some inherent issues around folks/systems forgetting to release the lock (e.g. think cafe bathroom keys that walk off in someone's pocket). There's ways of mitigating the issue (records of who's holding the lock so you can pester them in side channels, automatic lock timeouts, manual lock overrides), but these all require quite a bit more interface and are all nicely avoided with conditional requests. (Also, they're not mutually exclusive, so we could do both.) Locks also mean having to build UI (e.g. on the web and in the CLI) to manually unlock resources, since there will always be times when that's needed and having to email support to do it isn't really tenable.
Force-pushed from 0179c7b to 5ee192d.
Rebased this PR to fix unrelated CI issues.
Force-pushed from 5ee192d to 2f37016.
Rebased onto latest master to fix conflicts, and added a changelog entry.
This permits local authoring of narratives that reference images on the local filesystem without manual conversion (to remote images or embedded images) before upload.

Unfortunately, both pyright and mypy (to a lesser degree) have enough issues type checking the pyparsing API (and thus our usage of it) that it is more tenable to skip their type checking of the new nextstrain.cli.markdown module than to work around all their issues. Hopefully this will be better in the future! It's possible some of the issues lie with the type annotations in pyparsing itself, and thus are more easily improved.

A small number of new tests help ensure the parsing and embedding of images works as expected, though more should be added with time as we run across additional real-world test cases. The framework established here makes that as quick as adding another Markdown file to the test data.
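The real implementation parses the Markdown with pyparsing, but the behavior described above can be sketched with a deliberately simplified regex-based version (illustrative only; it ignores reference-style images, code blocks, and other corner cases a real parser is better placed to handle): local image references are rewritten to base64 "data:" URIs so the uploaded narrative is self-contained.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Inline images: ![alt](src) or ![alt](src "title")
INLINE_IMAGE = re.compile(r"!\[(?P<alt>[^\]]*)\]\((?P<src>[^)\s]+)(?P<title>\s+\"[^\"]*\")?\)")

def embed_images(markdown: str, base_dir: Path) -> str:
    def replace(match: re.Match) -> str:
        src = match["src"]
        path = base_dir / src
        # Leave remote and already-embedded images alone; only touch local files.
        if re.match(r"[a-z][a-z0-9+.-]*:", src, re.I) or not path.is_file():
            return match[0]
        media_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
        data = base64.b64encode(path.read_bytes()).decode("ascii")
        return f"![{match['alt']}](data:{media_type};base64,{data}{match['title'] or ''})"
    return INLINE_IMAGE.sub(replace, markdown)
```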
…edding

It's often useful to have an escape hatch, especially for things in the critical path (here, the upload path). If for any reason some narrative file finds a bug in the image embedding code and causes a crash or corrupts the Markdown, we'll appreciate having a way to disable it at runtime.
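A tiny sketch of what such an escape hatch can look like; the environment variable name below is made up for illustration, so check the CLI's changelog/docs for the actual knob.

```python
import os

def maybe_embed_images(markdown: str, embed) -> str:
    """Apply `embed` (a markdown -> markdown image-embedding function) unless disabled."""
    # Any non-empty value for the (hypothetical) variable skips embedding entirely,
    # so a narrative that trips a bug in the embedding code can still be uploaded unmodified.
    if os.environ.get("NEXTSTRAIN_CLI_SKIP_IMAGE_EMBEDDING"):
        return markdown
    return embed(markdown)
```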
Force-pushed from 86a57df to 4fee60a.
Added 56f0673 and rebased onto latest master to fix conflicts. I plan to merge tomorrow (and probably cut a new CLI release) unless folks have review comments/objections before then.
CHANGES.md (Outdated)
@@ -51,6 +51,13 @@ change is described below.

[#240]: https://github.com/nextstrain/cli/pull/240

* Local images used in a narrative are now automatically embedded into it when
  uploading the narrative to nextstrain.org via `nextstrain remote upload`.
  This permits local authoring of narratives that reference images on the local
I'd clarify here that "local authoring" means in a markdown renderer and that relative images will not work in auspice.
Suggested by @jameshadfield in review.
CI issues

Both CI issues appear unrelated to this PR, so I'll likely fix them first in a separate PR.

- `ImportError` from inside of `cryptography.hazmat`. My guess here is a recent release of cryptography broke something.
- `ubuntu-latest` started switching from 20.04 → 22.04 over the past two months. Fix should be easy: revert back to 20.04 for the 3.6 job.

Testing

- … auspice `MainDisplayMarkdown` blocks work (most common use case?)