Block discovery via DOM parser #306

aduth · 2017-03-22T20:19:18Z

This pull request seeks to achieve goals of #301, namely converting post content into array of block objects, using DOM parser approach instead of incorporating formal grammar.

youknowriad

Now that I think more about this parsing strategy, I find it returns different results than the grammar parser with the same input (from the grammar).

Say we have this post content

<!-- wp:text -->
<p>
start paragraph
<! /wp:text -->
continue the same paragraph
</p>

I guess the dom parser won't find the closing comment here, while the grammar parser will and split this to two blocks with invalid HTML. What is the correct result here? I think from a block editor perspective, it should be the second (grammar), but I may be wrong?

And as I already said before (DM), I don't really care the implementation details of the parser (DOM or grammar) but I do think the grammar parser has a better API for now (that could be easily used here as well)

youknowriad · 2017-03-22T20:23:36Z

modules/blocks/index.js

+			type: match[ 1 ],
+			startNode: child,
+			blockNodes: []
+		};


I would appreciate if we can have the same API (input, output) regardless of the parsing implementation

string => Array<{ blockType: string, attrs: Object, rawContent: string }>

Part of the problem I was hoping to address here is in the block implementation, how do we fetch data from within the markup itself? rawContent is tough to work with unless you first assign it into a (new) element, then query that element. The advantage this approach has is that you skip directly to that last step.

An argument for rawContent is that we could use DOM, Regex or custom grammar even inside the custom blocks. You just parse the way you want to.

You just parse the way you want to.

Do we really want developers creating regular expressions and grammars to extract values from a markup string? I don't foresee any good outcome that could come from this 😬

Do we really want developers creating regular expressions and grammars to extract values from a markup string?

WordPress already has the shortcode API, and the parsing of the shortcode could be passed in as state or requested once it is loaded. What other values really need parsing?

@joyously I think your suggestion will work fine for new dynamic blocks if we choose to leverage shortcodes for those, but in the case of a static block (text, images, etc) the hope is to represent those in post_content as HTML for reasons identified in the Editor Technical Overview post.

So in the case of an quote block, we'd want to identify both the text of the quote and an optional citation from the HTML in post_content. There's been some discussion in #104 about options we could explore to achieve this. The point I'd made in #306 (comment) is that you could use DOM APIs like querySelector to get at the respective markup nodes (<p>, <cite>).

youknowriad · 2017-03-22T20:24:33Z

modules/editor/editor.js

@@ -1,5 +1,6 @@
 export default class Editor {
-	constructor( id, settings ) {
-		document.getElementById( id ).innerHTML = settings.content;
+	constructor( id ) {


This assumes we already mounted the post content in the DOM. I think it's better to get a string as input.

This assumes we already mounted the post content in the DOM

This is exactly what the current editor does anyways.

I find this breaks the data flow we're trying to build postcontent => state => render

I find this breaks the data flow we're trying to build

We should only really need to parse once, regardless of approach. Serializing and re-parsing is wasted effort if we're maintaining the state the whole time anyways.

aduth · 2017-03-22T21:11:59Z

Now that I think more about this parsing strategy, I find it returns different results than the grammar parser with the same input (from the grammar).

I don't really know the right answer to this. After reflecting, I still think if we were to do a DOM parser for benefit of a block implementation's parse (to retrieve state values), it would be better to have only a single node per block, in which case your example behaves closer to the concept of a nested block (albeit an invalid one with only a closing tag).

aduth · 2017-03-23T12:40:24Z

Closing since we merged #301 . Can serve as reference for future work if we want to reconsider a pure DOM approach instead.

aduth added the Framework Issues related to broader framework topics, especially as it relates to javascript label Mar 22, 2017

Try a DOM parser for discovering block nodes

6b76651

aduth force-pushed the add/dom-parsing branch from 198ce7a to 6b76651 Compare March 22, 2017 20:28

youknowriad reviewed Mar 22, 2017

View reviewed changes

aduth closed this Mar 23, 2017

aduth deleted the add/dom-parsing branch March 23, 2017 12:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Block discovery via DOM parser #306

Block discovery via DOM parser #306

aduth commented Mar 22, 2017

youknowriad left a comment

youknowriad Mar 22, 2017

aduth Mar 22, 2017

youknowriad Mar 22, 2017

aduth Mar 22, 2017

joyously Mar 22, 2017

aduth Mar 22, 2017

youknowriad Mar 22, 2017

aduth Mar 22, 2017

youknowriad Mar 22, 2017

aduth Mar 22, 2017

aduth commented Mar 22, 2017 •

edited

Loading

aduth commented Mar 23, 2017

Block discovery via DOM parser #306

Block discovery via DOM parser #306

Conversation

aduth commented Mar 22, 2017

youknowriad left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aduth commented Mar 22, 2017 • edited Loading

aduth commented Mar 23, 2017

aduth commented Mar 22, 2017 •

edited

Loading