-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Block discovery via DOM parser #306
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I think more about this parsing strategy, I find it returns different results than the grammar parser with the same input (from the grammar).
Say we have this post content
<!-- wp:text -->
<p>
start paragraph
<! /wp:text -->
continue the same paragraph
</p>
I guess the dom parser won't find the closing comment here, while the grammar parser will and split this to two blocks with invalid HTML. What is the correct result here? I think from a block editor perspective, it should be the second (grammar), but I may be wrong?
And as I already said before (DM), I don't really care the implementation details of the parser (DOM or grammar) but I do think the grammar parser has a better API for now (that could be easily used here as well)
type: match[ 1 ], | ||
startNode: child, | ||
blockNodes: [] | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would appreciate if we can have the same API (input, output) regardless of the parsing implementation
string => Array<{ blockType: string, attrs: Object, rawContent: string }>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Part of the problem I was hoping to address here is in the block implementation, how do we fetch data from within the markup itself? rawContent
is tough to work with unless you first assign it into a (new) element, then query that element. The advantage this approach has is that you skip directly to that last step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An argument for rawContent
is that we could use DOM, Regex or custom grammar even inside the custom blocks. You just parse the way you want to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You just parse the way you want to.
Do we really want developers creating regular expressions and grammars to extract values from a markup string? I don't foresee any good outcome that could come from this 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really want developers creating regular expressions and grammars to extract values from a markup string?
WordPress already has the shortcode API, and the parsing of the shortcode could be passed in as state or requested once it is loaded. What other values really need parsing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@joyously I think your suggestion will work fine for new dynamic blocks if we choose to leverage shortcodes for those, but in the case of a static block (text, images, etc) the hope is to represent those in post_content
as HTML for reasons identified in the Editor Technical Overview post.
So in the case of an quote block, we'd want to identify both the text of the quote and an optional citation from the HTML in post_content
. There's been some discussion in #104 about options we could explore to achieve this. The point I'd made in #306 (comment) is that you could use DOM APIs like querySelector
to get at the respective markup nodes (<p>
, <cite>
).
@@ -1,5 +1,6 @@ | |||
export default class Editor { | |||
constructor( id, settings ) { | |||
document.getElementById( id ).innerHTML = settings.content; | |||
constructor( id ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes we already mounted the post content in the DOM. I think it's better to get a string as input.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes we already mounted the post content in the DOM
This is exactly what the current editor does anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this breaks the data flow we're trying to build postcontent => state => render
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this breaks the data flow we're trying to build
We should only really need to parse
once, regardless of approach. Serializing and re-parsing is wasted effort if we're maintaining the state the whole time anyways.
I don't really know the right answer to this. After reflecting, I still think if we were to do a DOM parser for benefit of a block implementation's |
Closing since we merged #301 . Can serve as reference for future work if we want to reconsider a pure DOM approach instead. |
Alternative to #301 (cc @youknowriad)
This pull request seeks to achieve goals of #301, namely converting post content into array of block objects, using DOM parser approach instead of incorporating formal grammar.