vtf txb display syntaxtree

srccircumflex edited this page Apr 24, 2023 · 4 revisions

Basic objects for the design of an abstract syntax tree (AST). These are derived in highlightertree to create the highlighters.

Module contents


The tree object

class syntaxtree.SyntaxTree

Parser base object for designing an abstract syntax tree.

The branches of the syntax tree are defined by SyntaxBranch objects and are attached to the main root branch. The start (node), the termination, and the leaves of a branch (and of the root branch) are each defined as a pair of a regular expression and a SyntaxLeaf factory.

Leaves defined in globals (SyntaxGlobals) apply independently of currently active branches.

from re import compile

#                     _ _ _ _ _ _ _ _
#                    |               |
#                    B - l - l - g - l ... E
#                   /
# R - l - g - l - l ... i

ast = SyntaxTree()

B = SyntaxBranch(node_pattern=compile("\\("), stop_pattern=compile("\\)"))

B.add_leaf(compile("regex"), lambda parent, pattern, match, relstart: SyntaxLeaf(parent, match, relstart))
ast.globals.add(compile("regex"), lambda parent, pattern, match, relstart: SyntaxLeaf(parent, match, relstart))
from re import compile

#                      _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
#                    |                                  |
#                    B1 - l - l - l - E                 |
#                   /                                  /
# R - l - l - l - l - l - l ... i         B3 - l - l - l - l - l - E
#                          \             /                 |
#                           B2 - l - l - l ... E           |
#                            \ _ _ _ _ _ _ _ _ _ _ _ _ _ _ |

ast = SyntaxTree()

B1 = SyntaxBranch(node_pattern=compile('"'), stop_pattern=compile('"'))
B2 = SyntaxBranch(...
B3 = SyntaxBranch(...

Parsing Process

When using the special characters of regular expressions that refer to the beginning or end of a string, such as "^" or "\\Z", it must be noted that the row is sliced during parsing. The following illustration sketches the parsing process.

from re import compile

square_bracket_branch = SyntaxBranch(node_pattern=compile("\\[node]"), stop_pattern=compile("\\[end]"))
curly_bracket_branch = SyntaxBranch(node_pattern=compile("\\{node}"), stop_pattern=compile("\\{end}"))

#   node_leaf      node_leaf          end_leaf              end_leaf
#    |    |  leafs  |    |    leafs    |   |     leafs       |   |
#    |    |[ - - - -|    |{ - - - - - }|   |- - - - - - - - ]|   |
#    |    |         |    |             |   |                 |   |
"... [node] foo bar {node} foo bar ... {end} ... foo bar ... [end] ..."

"[node]"                                                    # node found
" foo bar {node} foo bar ... {end} ... foo bar ... [end]"   # search for a sub-node
"{node}"                                                    # sub-node found
" foo bar "                                                 # apply leaf configurations to the remaining string
" foo bar ... {end} ... foo bar ... [end]"                  # search for the end of a branch
"{end}"                                                     # end of a branch found
" foo bar ... "                                             # apply leaf configurations to the remaining string
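
The effect of slicing on anchors can be reproduced with the standard re module alone. The following is a minimal sketch that does not use SyntaxTree; the row and the anchored pattern are made up for illustration.

```python
from re import compile

row = "... [node] foo bar [end] ..."
anchored = compile("^ foo")

# On the complete row the anchor fails: " foo" is not at the start of the row.
on_row = anchored.search(row)

# Once the node is found, the remainder is handled as a new, sliced string,
# so "^" now refers to the start of the slice.
node = compile("\\[node]").search(row)
remainder = row[node.end():]
on_slice = anchored.search(remainder)

# The position in the full row is the slice offset plus the relative position.
relstart = node.end()
total_start = relstart + on_slice.start()
```

A pattern anchored with "^" can therefore match in the middle of a row, directly after a node or the end of a branch.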

Applicable leaves, branch-node leaves, and ending leaves of a branch are appended to a passed list (as SyntaxLeaf objects) during parsing. The active branches are passed to the parsing process as a list, which is expanded when another branch occurs and truncated when a branch ends.

A SyntaxLeaf contains the matched re.Match object, the originating SyntaxBranch and the starting point relative to the row, since the row is sliced during the parsing process. The methods with the total_* prefix return the actual position in the row.

If the leaf represents the beginning of a branch, its node attribute is not None but holds the beginning SyntaxBranch object.

The parsing process is performed row by row using the methods with the map_* prefix; branch_growing processes only the part needed to capture the branch bifurcations. The methods map_leafs and branch_growing are interfaces to the actual methods, which are implemented recursively; these underlying methods cannot be overridden in subclasses.

globals() -> SyntaxGlobals

The SyntaxGlobals.

root() -> SyntaxBranch

The root SyntaxBranch.

branch_growing(string, has_end, _branches_) -> list[SyntaxBranch]

Apply only the SyntaxBranch configurations to a string (skipping the parsing of the leaves) and expand or shorten the list of _branches_.

has_end specifies whether the string has a terminating end; it is evaluated in connection with the multiline parameterization of the SyntaxBranch.

The list of _branches_ represents the current sequence of active SyntaxBranch objects; if it is empty, the root is the current SyntaxBranch.
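
How the list of active branches expands and shortens can be sketched without the library. The names RULES and grow are hypothetical, and the sketch simplifies heavily: every rule is searched regardless of which branch is current, and has_end, multirow and multiline are ignored.

```python
from re import compile

# Hypothetical stand-in rules: name -> (node pattern, stop pattern).
RULES = {
    "square": (compile("\\[node]"), compile("\\[end]")),
    "curly": (compile("\\{node}"), compile("\\{end}")),
}

def grow(row, active):
    """Expand or shorten the list of active branch names for one row."""
    pos = 0
    while True:
        rest = row[pos:]  # the row is sliced; matches are relative to pos
        candidates = []
        for name, (node, _stop) in RULES.items():
            match = node.search(rest)
            if match:
                candidates.append((match.start(), match.end(), "open", name))
        if active:  # only the innermost active branch can end
            match = RULES[active[-1]][1].search(rest)
            if match:
                candidates.append((match.start(), match.end(), "close", active[-1]))
        if not candidates:
            return active
        _start, end, kind, name = min(candidates)  # earliest match first
        if kind == "open":
            active.append(name)
        else:
            active.pop()
        pos += end

active = grow("... [node] foo bar {node} foo bar ...", [])
after_first_row = list(active)
active = grow("{end} ... foo bar ... [end] ...", active)
```

After the first row both branches are still open, so the list is handed to the next row unchanged; the second row closes them from the innermost outwards, leaving an empty list (the root).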

map_globals(string, _out_) -> list[SyntaxLeaf]

Apply the leaves defined in the globals to a string and append the parsed leaves to the _out_ list as SyntaxLeaf objects.
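
The idea of rules that apply independently of the active branches can be sketched with plain re. GLOBAL_RULES and map_globals_sketch are hypothetical names, and collecting every match with finditer is an assumption; the actual method may select matches differently.

```python
from re import compile

# Hypothetical global rules: (pattern, label) pairs.
GLOBAL_RULES = [
    (compile("\\bTODO\\b"), "todo"),
    (compile("[ \\t]+$"), "trailing whitespace"),
]

def map_globals_sketch(string, out):
    """Append (label, span) for every match of every global rule to out."""
    for pattern, label in GLOBAL_RULES:
        for match in pattern.finditer(string):
            out.append((label, match.span()))
    return out

found = map_globals_sketch("x = 1  # TODO fix  ", [])
```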

map_leafs(string, has_end, _branches_, _leaf_out_) -> tuple[list[SyntaxBranch], list[SyntaxLeaf]]

Apply the entire configurations of the SyntaxBranch objects and their SyntaxLeaf definitions to a string. Append the parsed leaves to the list _leaf_out_ and expand or shorten the list of active _branches_.

has_end specifies whether the string has a terminating end; it is evaluated in connection with the multiline parameterization of the SyntaxBranch.

The list of _branches_ represents the current sequence of active SyntaxBranch objects; if it is empty, the root is the current SyntaxBranch.

map_tree(string, has_end, _branches_, _leaf_out_) -> tuple[list[SyntaxBranch], list[SyntaxLeaf]]

Apply the entire configurations of the SyntaxBranch objects and their SyntaxLeaf definitions to a string. Append the parsed leaves to the list _leaf_out_ and expand or shorten the list of active _branches_.

Then apply the leaves defined in globals to the string and append the parsed leaves as SyntaxLeaf objects to the _leaf_out_ list.

has_end specifies whether the string has a terminating end; it is evaluated in connection with the multiline parameterization of the SyntaxBranch.

The list of _branches_ represents the current sequence of active SyntaxBranch objects; if it is empty, the root is the current SyntaxBranch.

purge_globals() -> SyntaxGlobals

Reinitialize the current SyntaxGlobals.

purge_root() -> SyntaxBranch

Reinitialize the current root - SyntaxBranch.

set_globals(__new_globals) -> None

Set the SyntaxGlobals.

set_root(__new_root) -> None

Set the root SyntaxBranch.

The tree components

class syntaxtree.SyntaxBranch

Syntax branch object used by the SyntaxTree.

The beginning of a branch is defined by the node_pattern and the leaf is created by the node_leaf factory, which must return a SyntaxLeaf object with the node attribute set to the beginning branch.

If the beginning of a branch is recognized by the parser methods in the AST, a definable activate function is executed, which must return a branch object.

By default, if the stop_pattern is a pattern, the same object is returned and appended to the sequence of active branches.

If the stop_pattern is defined as a callable, it receives the SyntaxBranch object and the node SyntaxLeaf on activation and must return a pattern that defines the end of the branch. In this case a "deep copy" (snap) of the branch object is created upon activation and appended to the sequence of active branches, provided activate was None at creation.
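
A callable stop_pattern is useful when the end of a branch depends on how it began, for example a string literal that must be closed by the same quote that opened it. The sketch below shows only the callable and exercises it with plain re; the SimpleNamespace object is a stand-in for the node SyntaxLeaf (only its match attribute is used), and None is passed in place of the branch.

```python
from re import compile, escape
from types import SimpleNamespace

node_pattern = compile("'''|\"\"\"|'|\"")

def stop_pattern(branch, node_leaf):
    # Receives the branch and the node leaf and returns the pattern that
    # ends the branch: the same quote sequence that opened it.
    return compile(escape(node_leaf.match.group()))

row = "x = '''it's \"quoted\"''' + y"
node_match = node_pattern.search(row)
node_leaf = SimpleNamespace(match=node_match)  # stand-in for a SyntaxLeaf
stop = stop_pattern(None, node_leaf)

# The search for the end continues on the sliced remainder of the row.
remainder = row[node_match.end():]
stop_match = stop.search(remainder)
content = remainder[:stop_match.start()]
```

Because the returned pattern belongs to one particular activation, it is plausible that the branch is copied (snap) in this case, so that concurrent activations do not overwrite each other's end pattern.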

The terminating leaf object is then created by the factory stop_leaf when the pattern occurs.

The leaf factories receive the parent SyntaxBranch, the applicable pattern, the re.Match and the relative start of the sub-string when the node_pattern or stop_pattern occurs; additionally, the beginning SyntaxBranch object is passed to the node_leaf factory.

The parameters multirow and multiline are evaluated in the parser methods of the AST. If multirow is True, the branch is NOT removed from the sequence of active branches after a single string has been processed, provided the string does not end the line. If multiline is True, the branch is kept in the sequence even across line endings.
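
The rule above can be written as a small predicate. This is a sketch of one reading of the paragraph, not the parser's code: BranchFlags and stays_active are hypothetical names, and has_end is taken to mean that the processed string ends the line.

```python
from dataclasses import dataclass

@dataclass
class BranchFlags:
    """Hypothetical stand-in carrying only the two flags of a SyntaxBranch."""
    label: str
    multirow: bool
    multiline: bool

def stays_active(branch, has_end):
    """Decide whether an unterminated branch survives the processed string."""
    if branch.multiline:
        return True  # kept even across line endings
    return branch.multirow and not has_end  # kept only while the line goes on

comment = BranchFlags("comment", multirow=False, multiline=False)
call = BranchFlags("call", multirow=True, multiline=False)
docstring = BranchFlags("docstring", multirow=True, multiline=True)
branches = (comment, call, docstring)

kept_mid_line = [b.label for b in branches if stays_active(b, has_end=False)]
kept_at_line_end = [b.label for b in branches if stays_active(b, has_end=True)]
```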

Any object can be passed as label to identify the branch.

The attributes __node_leaf__ and __parent_branch__ are set by the AST only when the activate METHOD is executed; the stop_pattern is then determined within the method, and finally the PARAMETERIZED activate FUNCTION is executed.

The leaves of the branch are created as pattern -- SyntaxLeaf-factory pairs and further forks to branches within the branch are also defined as SyntaxBranch objects.

from re import compile

ast = SyntaxTree()

B1 = SyntaxBranch(node_pattern=compile("\\("), stop_pattern=compile("\\)"), multiline=True, label="numbers in tuple")
B1.add_leaf(compile(","), label="comma")

B2 = SyntaxBranch(node_pattern=compile("#"), stop_pattern="$", multirow=False, label="comment")
B2.add_leaf(compile(".+"), lambda parent, pattern, match, relstart: SyntaxLeaf(parent, match, relstart))


Via the methods with the adopt_* prefix, definitions of branches and/or leaves can be adopted from other branches.

from re import compile

#                     _ _ _ _ _ _ _ _
#                    |               |
#                    B - l - l - l - l ... E
#                   /
# R - l - l - l - l ... i

ast = SyntaxTree()

B = SyntaxBranch(...
from re import compile
#                    _ _ _ _ _ _ _[adopt_branches(root) **] _ _ _ _ _ _ _ _ _ _ _
#                   |     _ _ [adopt_branches(B1) + adopt_leafs(B1)] _ _ _ _ _   |
#                   |    |                                                    |  |
#                   |   B1 - l - l - l - E       B3 - l - l - l - l - l - E   |  |
#                   \ /      \                /                               |  |
#                    /        B2 - l - l - l ... E                            |  |
#                   /                                                         |  |
# R - l - l - l - l - l - l ... i         B5 - l - l - l - l - l - E          |  |
#                         /\             /                                    |  |
#                        /  B4 - l - l - l ... E                              |  |
#                       |_[**] _ _ /   |  \ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ |  |
#                                      | _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ |
ast = SyntaxTree()

B1 = SyntaxBranch(...
B2 = SyntaxBranch(...
B3 = SyntaxBranch(...
B4 = SyntaxBranch(...
B5 = SyntaxBranch(...

__node_leaf__: SyntaxLeaf

__parent_branch__: SyntaxBranch

branches: tuple[SyntaxBranch]

label: Any

leafs: tuple[tuple[Pattern | str, Callable[ [SyntaxBranch, Pattern | str, Match, int], SyntaxLeaf], Any], ...]

multiline: bool

multirow: bool

node_leaf: Callable[ [SyntaxBranch, Pattern | str, Match, int, SyntaxBranch], SyntaxLeaf]

node_pattern: Pattern | str

stop_leaf: Callable[ [SyntaxBranch, Pattern | str, Match, int], SyntaxLeaf]

stop_pattern: Pattern | str | None

activate(node_leaf, parent) -> SyntaxBranch

Set the __node_leaf__ and __parent_branch__ attributes, poll the stop_pattern and return the activated version of the SyntaxBranch object.

Executed inside the parsing methods of the SyntaxTree; receives the node SyntaxLeaf and the parent SyntaxBranch.

add_branch(branch) -> None

Add a fork to the branch.

add_leaf(pattern, leaf=lambda parent, pattern, match, relstart: SyntaxLeaf(parent, match, relstart), label=None) -> None

Add a leaf to the branch as a pattern -- SyntaxLeaf-factory pair.

from re import compile
branch.add_leaf(compile(","), label="comma")
branch.add_leaf(compile(".+"), lambda parent, pattern, match, relstart: SyntaxLeaf(parent, match, relstart))

For later identification, any object can be used as a label.

adopt_branches(branch) -> None

Add forks to the branch from another branch.

adopt_leafs(branch_or_globals) -> None

Add leaves from another branch to the branch.

adopt_self() -> None

Add the branch to itself as a fork, which allows recursion.
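
Recursion matters for constructs that can nest inside themselves, such as parentheses: only if the node pattern is also searched inside the branch does an inner "(" open a further branch. The following sketch imitates this with plain re and a counter instead of SyntaxBranch objects; nesting_depths is a hypothetical helper.

```python
from re import compile

NODE = compile("\\(")
STOP = compile("\\)")

def nesting_depths(row):
    """Return the number of active branches after each node or stop match."""
    depths, depth, pos = [], 0, 0
    while True:
        rest = row[pos:]  # continue on the sliced remainder
        node = NODE.search(rest)  # searched inside the branch as well
        stop = STOP.search(rest) if depth else None
        found = [m for m in (node, stop) if m]
        if not found:
            return depths
        first = min(found, key=lambda m: m.start())
        depth += 1 if first is node else -1
        depths.append(depth)
        pos += first.end()

depths = nesting_depths("a (b (c) d) e")
```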

branch_mapping(string, relstart, _out_) -> list[SyntaxLeaf]

Apply the node definitions of the forked branches to the string. Append matches as SyntaxLeaf objects to the list _out_.

relstart specifies the start position of a substring.

This method is executed inside the parsing methods in the SyntaxTree.

leaf_mapping(string, relstart, _out_) -> list[SyntaxLeaf]

Apply each branch leaf definition to the string. Append matches as SyntaxLeaf objects to the list _out_.

relstart specifies the start position of a substring.

This method is executed inside the parsing methods in the SyntaxTree.

poll_stop_pattern(node_leaf=None) -> None

Poll the stop_pattern.

Executed within the activate method; it only has an effect if the stop_pattern is defined as a callable.

Raises:

  • AttributeError: if node_leaf is not passed and __node_leaf__ is not yet set in the object.

remove_branches_by_attributes(deep=False, _or_=False, **attributes) -> None

Remove forked branches with the given attributes [, recursively through all branches and forks if deep is set]. By default a branch is removed when all attribute conditions are satisfied; if _or_ is set, it is removed when at least one attribute applies.

remove_branches_by_label(label, deep=False) -> None

Remove forked branches with the given label [, recursively through all branches and forks if deep is set].

remove_leafs_by_label(label, deep=False) -> None

Remove SyntaxLeaf definitions with the given label [, recursively through all branches and forks if deep is set].

remove_leafs_by_pattern(pattern, deep=False) -> None

Remove SyntaxLeaf definitions with the given pattern [, recursively through all branches and forks if deep is set].

snap() -> SyntaxBranch

Create a "deep copy" (snap) of the current attributes of the SyntaxBranch. (This preserves the state if, for example, there are dependencies on the stop_pattern.)

starts(string, relstart, parent) -> SyntaxLeaf | None

Return a SyntaxLeaf when the branch starts in the string.

Executed inside the parsing methods of the SyntaxTree; receives the relative starting point of a substring and the parent SyntaxBranch.

stops(string, relstart) -> SyntaxLeaf | None

Return a SyntaxLeaf when the branch stops in the string.

Executed inside the parsing methods of the SyntaxTree; receives the relative starting point of a substring.

class syntaxtree.SyntaxGlobals

A container for globally defined SyntaxLeaf rules.

The global leaves are created as RegularExpression -- SyntaxLeaf-factory pairs; additionally, any object can be used as a label to remove definitions afterwards.

from re import compile

ast = SyntaxTree()
ast.globals.add(compile("bar"), label="bar")  # any object can serve as the label
ast.globals.add(compile("foo"), label=object())

leafs: tuple[tuple[Pattern | str, Callable[ [SyntaxGlobals, Pattern | str, Match, int], SyntaxLeaf], Any], ...]

add(pattern, leaf=lambda parent, pattern, match, relstart: SyntaxLeaf(parent, match, relstart), label=None) -> None

Add a SyntaxLeaf rule. When the pattern matches, the leaf factory is passed the SyntaxGlobals object, the pattern, the re.Match and the relative start, and should return a SyntaxLeaf. Additionally, any object can be used as a label to remove definitions afterwards.

mapping(string, relstart, _out_) -> list[SyntaxLeaf]

Apply each definition of global leaves to the string. Append matches as SyntaxLeaf objects to the list _out_.

relstart specifies the start position of a substring.

This method is executed inside the parsing methods in the SyntaxTree.

remove_by_label(label) -> None

Remove all definitions with label.

remove_by_pattern(pattern) -> None

Remove all definitions with pattern.

class syntaxtree.SyntaxLeaf

The syntax leaf object is generated as the result of a parse by SyntaxTree and is used for further processing.

The object contains the parent SyntaxBranch or SyntaxGlobals, the re.Match, the relative start ( relstart ) of a substring and, if the leaf represents the beginning of a SyntaxBranch, the beginning branch under node.

The priority of a leaf over others found in a string is realized with __lt__.

The priority can be customized via the priority parameter. If a callable is passed, it receives this SyntaxLeaf and the other SyntaxLeaf and must return a boolean indicating whether this SyntaxLeaf has a higher priority than the other. If True is passed to priority, the earliest leaf has the highest priority, and if there is a tie the match with the largest span has priority. If the priority parameter is False, the earliest leaf with the smallest span has priority.

Since the string is sliced during the parsing process, the start/end/span methods of the re.Match object (also realized as properties in the SyntaxLeaf) may not return the actual values with reference to the passed string; therefore, the values can be obtained considering the relative starting point via the properties with total_* prefix.
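
The priority rules and the total_* positions can be illustrated with a small stand-in class. LeafSketch is hypothetical and not the library's SyntaxLeaf; in particular, it assumes that "a < b" means a has the higher priority and that total positions are the relative start plus the match positions.

```python
from re import compile

class LeafSketch:
    """Hypothetical stand-in for SyntaxLeaf: match, relstart and priority only."""

    def __init__(self, match, relstart, priority=True):
        self.match = match
        self.relstart = relstart
        self.priority = priority

    def total_span(self):
        # Shift the slice-relative span back into row coordinates.
        start, end = self.match.span()
        return self.relstart + start, self.relstart + end

    def __lt__(self, other):
        # Assumption: "self < other" means self has the higher priority.
        if callable(self.priority):
            return self.priority(self, other)
        (s1, e1), (s2, e2) = self.total_span(), other.total_span()
        if s1 != s2:
            return s1 < s2  # the earliest leaf wins
        if self.priority:
            return (e1 - s1) > (e2 - s2)  # tie: the largest span wins
        return (e1 - s1) < (e2 - s2)  # tie: the smallest span wins

row = "if x == 1: pass"
relstart = 5
piece = row[relstart:]  # the sliced remainder "== 1: pass"
narrow = LeafSketch(compile("=").search(piece), relstart)
wide = LeafSketch(compile("==").search(piece), relstart)
later = LeafSketch(compile("pass").search(piece), relstart)
best = min([later, narrow, wide])
```

Here narrow and wide start at the same position, so with priority=True the wider match is preferred; its re.Match reports the span (0, 2) within the slice, while total_span gives the position in the full row.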

match: re.Match

node: SyntaxBranch | None

parent: SyntaxGlobals | SyntaxBranch

relstart: int

end() -> int
span() -> tuple[int, int]
start() -> int
total_end() -> int
total_span() -> tuple[int, int]
total_start() -> int

Date: 13 Dec 2022
Version: 0.1
Author: Adrian Hoefflin [srccircumflex]
Doc-Generator: "pyiStructure-RSTGenerator" <prototype>