Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Block discovery via DOM parser #306

Closed
wants to merge 1 commit into from
Closed

Block discovery via DOM parser #306

wants to merge 1 commit into from

Conversation

aduth
Copy link
Member

@aduth aduth commented Mar 22, 2017

Alternative to #301 (cc @youknowriad)

This pull request seeks to achieve goals of #301, namely converting post content into array of block objects, using DOM parser approach instead of incorporating formal grammar.

@aduth aduth added the Framework Issues related to broader framework topics, especially as it relates to javascript label Mar 22, 2017
@aduth aduth force-pushed the add/dom-parsing branch from 198ce7a to 6b76651 Compare March 22, 2017 20:28
Copy link
Contributor

@youknowriad youknowriad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that I think more about this parsing strategy, I find it returns different results than the grammar parser with the same input (from the grammar).

Say we have this post content

<!-- wp:text -->
<p>
start paragraph
<! /wp:text -->
continue the same paragraph
</p>

I guess the dom parser won't find the closing comment here, while the grammar parser will and split this to two blocks with invalid HTML. What is the correct result here? I think from a block editor perspective, it should be the second (grammar), but I may be wrong?

And as I already said before (DM), I don't really care the implementation details of the parser (DOM or grammar) but I do think the grammar parser has a better API for now (that could be easily used here as well)

type: match[ 1 ],
startNode: child,
blockNodes: []
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would appreciate if we can have the same API (input, output) regardless of the parsing implementation

string => Array<{ blockType: string, attrs: Object, rawContent: string }>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of the problem I was hoping to address here is in the block implementation, how do we fetch data from within the markup itself? rawContent is tough to work with unless you first assign it into a (new) element, then query that element. The advantage this approach has is that you skip directly to that last step.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An argument for rawContent is that we could use DOM, Regex or custom grammar even inside the custom blocks. You just parse the way you want to.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You just parse the way you want to.

Do we really want developers creating regular expressions and grammars to extract values from a markup string? I don't foresee any good outcome that could come from this 😬

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really want developers creating regular expressions and grammars to extract values from a markup string?

WordPress already has the shortcode API, and the parsing of the shortcode could be passed in as state or requested once it is loaded. What other values really need parsing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joyously I think your suggestion will work fine for new dynamic blocks if we choose to leverage shortcodes for those, but in the case of a static block (text, images, etc) the hope is to represent those in post_content as HTML for reasons identified in the Editor Technical Overview post.

So in the case of an quote block, we'd want to identify both the text of the quote and an optional citation from the HTML in post_content. There's been some discussion in #104 about options we could explore to achieve this. The point I'd made in #306 (comment) is that you could use DOM APIs like querySelector to get at the respective markup nodes (<p>, <cite>).

@@ -1,5 +1,6 @@
export default class Editor {
constructor( id, settings ) {
document.getElementById( id ).innerHTML = settings.content;
constructor( id ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes we already mounted the post content in the DOM. I think it's better to get a string as input.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes we already mounted the post content in the DOM

This is exactly what the current editor does anyways.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this breaks the data flow we're trying to build postcontent => state => render

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this breaks the data flow we're trying to build

We should only really need to parse once, regardless of approach. Serializing and re-parsing is wasted effort if we're maintaining the state the whole time anyways.

@aduth
Copy link
Member Author

aduth commented Mar 22, 2017

Now that I think more about this parsing strategy, I find it returns different results than the grammar parser with the same input (from the grammar).

I don't really know the right answer to this. After reflecting, I still think if we were to do a DOM parser for benefit of a block implementation's parse (to retrieve state values), it would be better to have only a single node per block, in which case your example behaves closer to the concept of a nested block (albeit an invalid one with only a closing tag).

@aduth
Copy link
Member Author

aduth commented Mar 23, 2017

Closing since we merged #301 . Can serve as reference for future work if we want to reconsider a pure DOM approach instead.

@aduth aduth closed this Mar 23, 2017
@aduth aduth deleted the add/dom-parsing branch March 23, 2017 12:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Framework Issues related to broader framework topics, especially as it relates to javascript
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants