Commit 27886d7

updating to 2012
1 parent 715b0a2 commit 27886d7

File tree

2 files changed

+197
-0
lines changed

Binary file not shown.

data-format/EPB-data-format.txt

+197
@@ -0,0 +1,197 @@
1+
Data Format for the English PropBank
2+
3+
A PropBank file contains PropBank instances (one instance per line), where each instance is represented by the following format.
4+
5+
<instance> ::= <tree_path> <tree_id> <predicate_id> <annotator_id> <lemma>-<type> <roleset_id> <aspects>( <argument>)+
6+
<argument> ::= <terminal_id>:<height>-<label>
7+
8+
<tree_path> ::= path to the Treebank file
9+
<tree_id> ::= index of the tree containing the predicate (starts with 0, indicating the 1st tree in <tree_path>)
10+
<predicate_id> ::= terminal ID of the predicate (starts with 0, indicating the 1st terminal node of the tree)
11+
<annotator_id> ::= ID of the annotator (default for adjudicated instances: gold)
12+
<lemma> ::= lemma of the predicate
13+
<type> ::= type of the predicate (a: adjective, n: noun, v: verb)
14+
<roleset_id> ::= sense ID of the predicate
15+
<aspects> ::= no longer used (default: -----)
16+
<terminal_id> ::= ID of the 1st terminal node in this argument
17+
<height> ::= height of this argument phrase from its 1st terminal node
18+
<label> ::= PropBank label
19+
20+
Example 1:
21+
Here is an example based on the PropBank file wsj_0001.prop. The first instance describes a verb predicate "join" (join-v) whose roleset ID is "join.01". The predicate is the 9th terminal node (predicate_id = 8) of the 1st tree (tree_id = 0) in the Treebank file wsj_0001.parse.
22+
23+
wsj_0001.parse 0 8 gold join-v join.01 ----- 0:2-ARG0 7:0-ARGM-MOD 8:0-rel 9:1-ARG1 11:1-ARGM-PRD 15:1-ARGM-TMP
24+
wsj_0001.parse 1 2 gold be-v be.01 ----- 0:1-ARG1 2:0-rel 3:2-ARG2
25+
wsj_0001.parse 1 10 gold publish-v publish.01 ----- 10:0-rel 11:0-ARG0
26+
27+
The argument "9:1-ARG1" indicates the phrase "(NP (DT the) (NN board))" in the tree (see below), because "(DT the)" is the 10th terminal node (terminal_id = 9) and "(NP (DT the) (NN board))" has a height of 1 from this terminal node.
28+
29+
(TOP (S (NP-SBJ (NP (NNP Pierre)
30+
(NNP Vinken))
31+
(, ,)
32+
(ADJP (NML (CD 61)
33+
(NNS years))
34+
(JJ old))
35+
(, ,))
36+
(VP (MD will)
37+
(VP (VB join)
38+
(NP (DT the)
39+
(NN board))
40+
(PP-CLR (IN as)
41+
(NP (DT a)
42+
(JJ nonexecutive)
43+
(NN director)))
44+
(NP-TMP (NNP Nov.)
45+
(CD 29))))
46+
(. .)))
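The lookup of "9:1-ARG1" described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the official tooling: `Node`, `parse_tree`, `resolve`, and `to_string` are hypothetical names, a terminal is taken to be a part-of-speech node together with its word, and the height is taken to be the number of parent steps from that terminal.

```python
from typing import List, Optional, Tuple


class Node:
    def __init__(self, tag: str, parent: Optional["Node"] = None):
        self.tag = tag
        self.word: Optional[str] = None   # set only on terminal nodes
        self.children: List["Node"] = []
        self.parent = parent


def parse_tree(text: str) -> Tuple[Node, List[Node]]:
    """Parse one bracketed tree; return the root and its terminals in order."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    root = None
    current = None
    terminals = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "(":
            # The token after "(" is the node's tag.
            node = Node(tokens[i + 1], current)
            if current is None:
                root = node
            else:
                current.children.append(node)
            current = node
            i += 2
        elif tokens[i] == ")":
            current = current.parent
            i += 1
        else:
            # Any other token is the word of the current terminal node.
            current.word = tokens[i]
            terminals.append(current)
            i += 1
    return root, terminals


def resolve(terminals: List[Node], terminal_id: int, height: int) -> Node:
    """Climb `height` parents from the terminal with the given ID."""
    node = terminals[terminal_id]
    for _ in range(height):
        node = node.parent
    return node


def to_string(node: Node) -> str:
    if node.word is not None:
        return f"({node.tag} {node.word})"
    return "(" + node.tag + " " + " ".join(to_string(c) for c in node.children) + ")"


TREE = """
(TOP (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken)) (, ,)
                (ADJP (NML (CD 61) (NNS years)) (JJ old)) (, ,))
        (VP (MD will)
            (VP (VB join) (NP (DT the) (NN board))
                (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director)))
                (NP-TMP (NNP Nov.) (CD 29))))
        (. .)))
"""
root, terminals = parse_tree(TREE)
print(to_string(resolve(terminals, 9, 1)))
```

Under these assumptions, "8:0" resolves to the terminal "(VB join)" itself and "0:2" resolves to the NP-SBJ node.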
47+
48+
49+
50+
51+
52+
MULTIPLE NODES FOR ARGUMENTS: On LINK arguments
53+
Additionally, an argument can specify more than one node in a tree, with the nodes linked by a (*) link.
54+
55+
Example 2:
56+
Here is an example based on the 3-D tree in PropBank file (bolt-eng-DF-200-192448-6189512.parse). For readability, the terminal IDs are shown to the left of the tree and the annotated relation is marked to the right of its node.
57+
58+
bolt-eng-DF-200-192448-6189512.parse 3 11 gold go-v go.06 ----- 5:1*8:1*20:1-ARGM-MNR 9:1-ARG0 11:0-rel 12:1-ARG2 13:2-ARG1 5:1*8:1-LINK-SLC
59+
60+
61+
0: (TOP (S (NP-SBJ (NN Aggression)
62+
1: (CC and)
63+
2: (NN hostility))
64+
3: (VP (VBZ is)
65+
4: (ADVP (RB obviously))
66+
5: (NP-PRD (NP (DT the)
67+
6: (JJS worst)
68+
7: (NN way))
69+
8: (SBAR (WHADVP-4 (-NONE- 0))
70+
9: (S (NP-SBJ-1 (-NONE- *PRO*))
71+
10: (VP (TO to)
72+
11: (VP (VB go) <----[REL]
73+
12: (PRT (RP about))
74+
13: (S-PRP (NP-SBJ (-NONE- *PRO*-1))
75+
14: (VP (TO to)
76+
15: (VP (VB get)
77+
16: (S (NP-SBJ-3 (DT these)
78+
17: (NNS changes))
79+
18: (VP (VBN made)
80+
19: (NP (-NONE- *-3)))))))
81+
20: (ADVP (-NONE- *T*-4))))))))
82+
21: (. .)))
83+
84+
Sentence:
85+
"Aggression and hostility is obviously the worst way *PRO* to go about to get these changes made *T*."
86+
87+
This example shows two types of multiple node linking. Multiple nodes are found in the LINK argument (LINK-SLC) and the ARGM-MNR argument. The motivation behind the multiple node linking here is to capture the semantics of the trace (*T*) and link it to the proper semantic antecedent.
88+
89+
LINK-SLC is one of the 3 types of LINK arguments that specify a semantic relationship between the two linked nodes. All properly annotated LINK arguments will always have the following properties:
90+
- multiple nodes linked with "*"
91+
- at least one of the nodes will sit inside the domain of locality of the predicate -- generally, within the S or SBAR node headed by the predicate of concern
92+
- an anchoring node that carries an ARGM or a numbered argument label.
93+
94+
In this example, in the argument "5:1*8:1-LINK-SLC", the local node "8:1", referring to "(WHADVP-4 (-NONE- 0))", is the node relevant to the predicate "go" (go.06), and the node "5:1" is specified as its semantic antecedent. The anchoring argument here is "5:1*8:1*20:1", labeled "ARGM-MNR": "20:1" is the position of the trace, where the semantic antecedent should be interpreted. The trace is *-linked to the node "8:1" through syntactic indexing (see index 4 in the tree), which is the same node to which the SLC refers. Finally, the "ARGM-MNR" is linked to "5:1" of the SLC, which specifies the semantic antecedent of the trace.
95+
96+
This allows the following paraphrased interpretation for the predicate "go":
97+
98+
"*PRO* go about to get these changes made 'in the worst way'"
99+
100+
In addition to LINK-SLC, there are 2 other LINK types.
101+
- LINK-PRO: semantic link of the *PRO* argument if semantically recoverable in the sentence
102+
- LINK-PSV: semantic link of the passive trace to the SBJ constituent
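To make the relation between a LINK argument and its anchoring argument concrete, here is a small Python sketch. It rests on an assumption drawn only from Example 2: the anchoring argument repeats at least one node of the LINK argument. The function name `find_link_anchors` is hypothetical, and other annotated files may need a different matching rule.

```python
import re
from typing import List, Tuple


def find_link_anchors(arguments: List[str]) -> List[Tuple[str, List[str]]]:
    """Pair each LINK-* argument with the labels of arguments sharing a node.

    Assumption (from Example 2 only): the anchoring argument repeats
    at least one node listed in the LINK argument.
    """
    parsed = []
    for arg in arguments:
        # The first hyphen separates the location from the label.
        location, label = arg.split("-", 1)
        # Collect every terminal_id:height pair, whatever operator joins them.
        nodes = set(re.split(r"[*;,]", location))
        parsed.append((nodes, label))

    result = []
    for nodes, label in parsed:
        if not label.startswith("LINK-"):
            continue
        anchors = [other_label
                   for other_nodes, other_label in parsed
                   if not other_label.startswith("LINK-")
                   and nodes & other_nodes]
        result.append((label, anchors))
    return result


args = ("5:1*8:1*20:1-ARGM-MNR 9:1-ARG0 11:0-rel 12:1-ARG2 "
        "13:2-ARG1 5:1*8:1-LINK-SLC").split()
print(find_link_anchors(args))
```

For the instance in Example 2, the only argument sharing nodes with LINK-SLC is ARGM-MNR, which matches the description of the anchoring node above.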
103+
104+
105+
106+
107+
MULTIPLE NODES FOR ARGUMENTS: On Concatenation:
108+
In addition to (*), an argument can specify more than one node in a tree, with the nodes joined by (;) or (,). These are cases of semantic concatenation: two syntactic constituents that are interpreted as a single semantic argument may be concatenated using (;) or (,).
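A location that uses any of the three operators can be broken into its node pointers and operators with a short Python sketch like the one below. The helper `split_location` is a hypothetical name. It deliberately returns a flat list and makes no claim about precedence when operators are mixed, since this document does not define one.

```python
import re
from typing import List, Tuple

# One node pointer is <terminal_id>:<height>; pointers are joined by *, ; or ,
LOCATION = re.compile(r"\d+:\d+(?:[*;,]\d+:\d+)*")


def split_location(location: str) -> Tuple[List[Tuple[int, int]], List[str]]:
    """Return ([(terminal_id, height), ...], [operator, ...]) for a location."""
    if not LOCATION.fullmatch(location):
        raise ValueError(f"not a valid argument location: {location!r}")
    nodes = [(int(t), int(h)) for t, h in re.findall(r"(\d+):(\d+)", location)]
    operators = re.findall(r"[*;,]", location)
    return nodes, operators


for text in ["0:2", "5:1*8:1*20:1", "10:2;16:1", "4:0,5:1"]:
    print(text, split_location(text))
```

Each returned pair can then be looked up in the tree by its terminal ID and height, as in the single-node case.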
109+
110+
Example 1:
111+
(;) concatenation is specific to ICH (Interpret Constituent Here) nodes. For more discussion of ICH trace nodes, please refer to the English TreeBank guidelines. The predicate of the instance and the ICH node in the following tree are marked for readability.
112+
113+
bolt-eng-DF-199-192772-6810984.parse 24 14 gold pursue-v pursue.01 ----- 6:1*15:1-ARG1 10:2;16:1-ARGM-MNR 14:0-rel
114+
115+
0: (TOP (S (NP-SBJ (PRP I))
116+
1: (ADVP (RB really))
117+
2: (, ,)
118+
3: (ADVP (RB really))
119+
4: (VP (VBP hope)
120+
5: (SBAR (IN that)
121+
6: (S (NP-SBJ-2 (JJ alternative)
122+
7: (NNS sources))
123+
8: (VP (VBP are)
124+
9: (VP (VBG being)
125+
10: (ADVP (ADVP (RB as)
126+
11: (RB enthusiastically))
127+
12: (PP (-NONE- *ICH*-1))) <---[ICH]
128+
13: (HYPH -)
129+
14: (VP (VBN pursued) <---[REL]
130+
15: (NP (-NONE- *-2))
131+
16: (SBAR-1 (IN as)
132+
17: (S (NP-SBJ (NP (DT the)
133+
18: (NN urgency))
134+
19: (PP (IN of)
135+
20: (NP (DT the)
136+
21: (NN situation))))
137+
22: (VP (VBZ dictates)
138+
23: (SBAR (-NONE- 0)
139+
24: (S (NP-SBJ (PRP they))
140+
25: (VP (MD should)
141+
26: (VP (-NONE- *?*))))))))))))))
142+
27: (. .)))
143+
144+
Sentence:
145+
"...alternative sources are being as enthusiastically pursued as the urgency of the situation dictates..."
146+
147+
The ICH trace specifies that the SBAR at node 16:1 should be interpreted as being a part of the PP node at 12:1. Thus, in line with the TreeBank annotation, PropBank specifies that the ARGM-MNR argument of the verb "pursue" (pursue.01) includes both the phrases found at "10:2" and "16:1". Therefore, the two nodes are concatenated with the ';'.
148+
149+
Example 2:
150+
(,) concatenation has two major uses.
151+
152+
The first use is found in the "rel" of the instance. It specifies a multiword predicate, such as a verb-particle construction.
153+
154+
DF/01/bolt-eng-DF-200-192448-6191297.parse 3 4 gold cry-v cry.03 ----- 0:1-ARG0 4:0,5:1-rel 6:1-ARG1 8:1-ARGM-ADV
155+
156+
0: (TOP (S (S (NP-SBJ (DT The)
157+
1: (JJ Libyan)
158+
2: (NNS rebels))
159+
3: (VP (VBP are)
160+
4: (VP (VBG crying) <---[REL]
161+
5: (PRT (RP out))
162+
6: (PP-CLR (IN for)
163+
7: (NP (NN assistance)))
164+
8: (PP-CLR (IN in)
165+
9: (S-NOM (NP-SBJ (-NONE- *PRO*))
166+
10: (VP (VBG overthrowing)
167+
11: (NP (NNP Gadaffi))))))))
168+
12: (, ,)
169+
13: [TREE TRUNCATED]
170+
171+
The nodes "crying" (4:0) and "out" (5:1) are concatenated to specify that the two words act in concert as a single predicate in this instance. The roleset cry.03 is defined accordingly.
172+
173+
174+
Example 3:
175+
(,) can also unite multiple nodes for arguments other than "rel".
176+
177+
bolt-eng-DF-228-194841-7116229.parse 83 1 gold seem-v seem.01 ----- 0:1,2:1-ARG1 1:0-rel 12:1-ARGM-GOL 15:1-ARGM-TMP
178+
179+
0: (TOP (S (NP-SBJ (NNP Madison)) <---[ARG1]
180+
1: (VP (VBZ seems) <---[REL]
181+
2: (PP-CLR (IN like) <---[ARG1]
182+
3: (NP (NP (DT a)
183+
4: (JJ nice)
184+
5: (NN place))
185+
6: (SBAR (WHADVP-2 (-NONE- 0))
186+
7: (S (NP-SBJ (-NONE- *PRO*))
187+
8: (VP (TO to)
188+
9: (VP (VB settle)
189+
10: (PRT (RP down))
190+
11: (ADVP-LOC (-NONE- *T*-2))))))))
191+
12: (PP (IN for)
192+
13: [TREE TRUNCATED]
193+
194+
The nodes at "0:1" and "2:1" are interpreted as a single semantic argument of the verb "seem" (seem.01). Thus the two nodes are concatenated into 0:1,2:1-ARG1.
195+
196+
197+
For further information on PropBank annotation, please refer to the English PropBank guidelines.
