Commit 0ca793a

0.12.0 - Major refactoring. Updated jtext-parser to the latest version and switched from TextIteratorParser (previously TextParserImpl) to TextCharsParser; files are now read and passed through the parsing workflow as char[] with offset and length instead of as a String. Renamed classes to identify the two major parsing phases more clearly: tokenization and parsing/extracting. Moved/renamed many classes/packages. Added more generic tokenization via CodeTokenizerBuilder.
1 parent 3029d0d commit 0ca793a
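
The char[]-with-offset-and-length idea from the commit message can be illustrated with a minimal sketch. `CharRange` and its methods are hypothetical names invented for this example; they are not the project's `ParseInput` or `CodeFileSrc`, and the real classes may differ.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

/** Hypothetical holder for a slice of source text (not the project's ParseInput). */
final class CharRange {
    final char[] src;
    final int off;
    final int len;

    CharRange(char[] src, int off, int len) {
        if (off < 0 || len < 0 || off + len > src.length) {
            throw new IndexOutOfBoundsException("off=" + off + ", len=" + len);
        }
        this.src = src;
        this.off = off;
        this.len = len;
    }

    /** Read everything from the reader into one char[], growing the buffer as needed. */
    static CharRange readAll(Reader reader) {
        char[] buf = new char[1024];
        int size = 0;
        try {
            int n;
            while ((n = reader.read(buf, size, buf.length - size)) != -1) {
                size += n;
                if (size == buf.length) {
                    buf = Arrays.copyOf(buf, buf.length * 2);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The buffer may be longer than the data, which is why the length travels with it
        return new CharRange(buf, 0, size);
    }

    /** A sub-range shares the same backing array, so no characters are copied. */
    CharRange slice(int start, int length) {
        if (start < 0 || length < 0 || start + length > len) {
            throw new IndexOutOfBoundsException("start=" + start + ", length=" + length);
        }
        return new CharRange(src, off + start, length);
    }

    /** Materialize a String only when a caller actually needs one. */
    @Override
    public String toString() {
        return new String(src, off, len);
    }

    public static void main(String[] args) {
        CharRange whole = readAll(new StringReader("class A {}"));
        CharRange name = whole.slice(6, 1);
        System.out.println(name + " shares array: " + (name.src == whole.src));
    }
}
```

Passing the array together with its offset and length lets every stage look at the same characters; a copy is made only in `toString()`.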

74 files changed, +1147 -519 lines changed

.classpath

-1
@@ -9,7 +9,6 @@
 <classpathentry kind="con" path="org.eclipse.jdt.USER_LIBRARY/Lombok"/>
 <classpathentry kind="con" path="org.eclipse.jdt.USER_LIBRARY/TemplateUtil"/>
 <classpathentry kind="con" path="org.eclipse.jdt.USER_LIBRARY/TestChecks"/>
-<classpathentry kind="lib" path="C:/Users/TeamworkGuy2/Documents/Java/Libraries/data-transfer/bin/data_transfer.jar"/>
 <classpathentry kind="lib" path="C:/Users/TeamworkGuy2/Documents/Java/Libraries/jackson/jar/jackson-annotations-2.5.0.jar"/>
 <classpathentry kind="lib" path="C:/Users/TeamworkGuy2/Documents/Java/Libraries/jackson/jar/jackson-core-2.5.0.jar"/>
 <classpathentry kind="lib" path="C:/Users/TeamworkGuy2/Documents/Java/Libraries/jackson/jar/jackson-databind-2.5.4.jar"/>

CHANGELOG.md

+19-1
@@ -4,7 +4,25 @@ This project does its best to adhere to [Semantic Versioning](http://semver.org/
 
 
 --------
-###[0.11.0](https://github.com/TeamworkGuy2/JParseCode/commit/76f3cf2eba9327659f6e454d8bfe3208695cdab5) - 2016-09-05
+###[0.12.0](N/A) - 2016-09-13
+#### Added
+* PerformanceTrackers, ParseTimes, TokenizeStepDetails in new twg2.parser.codeParser.tools.performance package - used for tracking performance
+
+#### Changed
+* biggest change is switching from jtext-parser's TextIteratorParser (previously: TextParserImpl) to TextCharsParser and files are read as char[] and stored in ParseInput and CodeFileSrc as char[] with offset and length, this will hopefully provide a small performance boost since each file's contents is copied one less time (no more new String(...) copy) and TextCharsParser is designed to take a char[] without any data copying
+* second large change is moving the parsing process toward a clearly defined two step process, the first step is called 'tokenization' and the second is called 'parsing/extracting'
+* file tokenization logic has been split up. Cs and Java FileTokenizer classes now return CodeTokenizerBuilder instances and CodeTokenizerBuilder contains all the generic logic for running the tokenization process
+* updated to new latest dependencies, especially jtext-parser
+* moved CodeFragment, DocumentFragment, and DocumentFragmentText from package twg2.parser.documentParser -> twg2.parser.fragment
+* moved CodeFileParsed, CodeFileSource, ParseInput, and ParserWorkflow from package twg2.parser.codeParser -> twg2.parser.workflow
+* moved/renamed twg2.parser.documentParser.DocumentParser -> twg2.parser.tokenizers.CodeTokenizerBuilder
+* moved IsParentChild from package twg2.parser.documentParser -> twg2.parser.tokenizers
+* CommentAndWhitespaceExtractor now drops the last trailing newlines from the comment strings
+* updated a number of unit tests
+
+
+--------
+###[0.11.0](https://github.com/TeamworkGuy2/JParseCode/commit/3029d0d08bda6cc308d3732eb09eb971fd0e6030) - 2016-09-06
 #### Added
 * __basic C# and Java enum parsing__
 * Added twg2.ast.interm.field FieldDef and FieldDefResolved to represent enum members (TODO could use some clarification/refactoring)
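
The two-step process named in the 0.12.0 changelog entry ('tokenization', then 'parsing/extracting') can be sketched roughly as follows. This is a toy illustration under stated assumptions: `TwoPhaseSketch`, `Token`, `tokenize`, and `extractClassNames` are hypothetical names and are not part of JParseCode or jtext-parser.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy two-phase pipeline; every name here is hypothetical. */
final class TwoPhaseSketch {
    enum Kind { IDENTIFIER, SYMBOL }

    /** A token records its kind and position in the source; it holds no copy of the text. */
    static final class Token {
        final Kind kind;
        final int off;
        final int len;

        Token(Kind kind, int off, int len) {
            this.kind = kind;
            this.off = off;
            this.len = len;
        }

        String text(char[] src) {
            return new String(src, off, len);
        }
    }

    /** Phase 1, tokenization: split the char range into identifier and symbol tokens. */
    static List<Token> tokenize(char[] src, int off, int len) {
        List<Token> tokens = new ArrayList<>();
        int end = off + len;
        int i = off;
        while (i < end) {
            char ch = src[i];
            if (Character.isWhitespace(ch)) {
                i++;
            } else if (Character.isJavaIdentifierStart(ch)) {
                int start = i;
                while (i < end && Character.isJavaIdentifierPart(src[i])) {
                    i++;
                }
                tokens.add(new Token(Kind.IDENTIFIER, start, i - start));
            } else {
                tokens.add(new Token(Kind.SYMBOL, i, 1));
                i++;
            }
        }
        return tokens;
    }

    /** Phase 2, extracting: collect every identifier that directly follows 'class'. */
    static List<String> extractClassNames(char[] src, List<Token> tokens) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            Token current = tokens.get(i);
            Token next = tokens.get(i + 1);
            if (current.kind == Kind.IDENTIFIER && "class".equals(current.text(src))
                    && next.kind == Kind.IDENTIFIER) {
                names.add(next.text(src));
            }
        }
        return names;
    }

    public static void main(String[] args) {
        char[] src = "class A { class B { } }".toCharArray();
        List<Token> tokens = tokenize(src, 0, src.length);
        System.out.println(extractClassNames(src, tokens));
    }
}
```

Keeping the phases separate means the extractor never looks at raw characters except through token positions, so a second extractor can reuse the same token list.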

README.md

+6-4
@@ -1,14 +1,14 @@
 JParseCode
 ==============
-version: 0.11.0
+version: 0.12.0
 
 In progress C#/Java/TypeScript parser tools built atop [JTextParser] (https://github.com/TeamworkGuy2/JTextParser), [Jackson] (https://github.com/FasterXML/jackson-core/) (core, databind, annotations) and half a dozen other utility libraries.
 
-####Goals:
+### Goals:
 * A competent source code parser that can turn C#, Java, or JavaScript/TypeScript code into a simple AST like structure ('competent' meaning this project aims to support common use cases, not every syntatic feature of the supported languages).
 * A code first parser aimed at manipulating the resulting AST and writing it back as source code or JSON. With the goal of allowing simple language constructs like interfaces and data models to be transpiled to different languages.
 
-####Not Goals:
+### Not Goals:
 * NOT to create another compiler for C#, Java, or JS/TS. This project's parser expects valid code as input, the few error messages that are present are NOT design to highlight syntax errors in the input.
 * NOT to create a valid AST for each supported language. Rather an AST like structure that supports the lowest common denominator between the targeted languages. This means many AST compromises are inevitable due to differences in language specs.
 
@@ -179,7 +179,9 @@ JSON Result (printed to System.out):
 
 
 --------
-Command Line Interface (CLI):
+
+### Command Line Interface (CLI)
+
 A command line call looks like:
 ```
 path/to/java -jar path/to/jparse-code.jar

bin/jparse_code-with-tests.jar

16.9 KB
Binary file not shown.

bin/jparse_code.jar

16.5 KB
Binary file not shown.

package-lib.json

+1-2
@@ -1,12 +1,11 @@
 {
-  "version" : "0.11.0",
+  "version" : "0.12.0",
   "name" : "jparse-code",
   "description" : "An in-progress suite of parsing tools for C#, Java, and TypeScript source code",
   "homepage" : "https://github.com/TeamworkGuy2/JParseCode",
   "license" : "MIT",
   "main" : "./bin/jparse_code.jar",
   "dependencies" : {
-    "data-transfer": "*",
     "jackson-annotations": "~2.5.0",
     "jackson-core": "~2.5.0",
     "jackson-databind": "~2.5.0",

src/twg2/ast/interm/block/BlockAst.java

+1-1
@@ -4,7 +4,7 @@
 import twg2.annotations.Immutable;
 import twg2.ast.interm.classes.ClassSig;
 import twg2.parser.codeParser.BlockType;
-import twg2.parser.documentParser.CodeFragment;
+import twg2.parser.fragment.CodeFragment;
 import twg2.treeLike.simpleTree.SimpleTree;
 
 /**

src/twg2/ast/interm/classes/ClassAst.java

-4
@@ -60,10 +60,6 @@ public Impl(T_SIG signature, List<List<String>> usingStatements, List<? extends
 @SuppressWarnings("unchecked")
 val enumsCast = (List<T_ENUM>)enums;
 
-if(enumsCast != null && enumsCast.size() > 0) {
-    System.out.println();
-}
-
 this.signature = signature;
 this.usingStatements = usingStatements;
 this.enumMembers = enumsCast;

src/twg2/ast/interm/field/FieldDef.java

+1-1
@@ -11,7 +11,7 @@
 import twg2.parser.codeParser.AccessModifier;
 import twg2.parser.codeParser.extractors.DataTypeExtractor;
 import twg2.parser.codeParser.tools.NameUtil;
-import twg2.parser.documentParser.CodeFragment;
+import twg2.parser.fragment.CodeFragment;
 import twg2.parser.fragment.CodeFragmentType;
 import twg2.parser.output.WriteSettings;
 import twg2.text.stringEscape.StringEscapeJson;

src/twg2/ast/interm/field/FieldDefResolved.java

+1-1
@@ -10,7 +10,7 @@
 import twg2.io.write.JsonWrite;
 import twg2.parser.codeParser.AccessModifier;
 import twg2.parser.codeParser.tools.NameUtil;
-import twg2.parser.documentParser.CodeFragment;
+import twg2.parser.fragment.CodeFragment;
 import twg2.parser.output.WriteSettings;
 import twg2.text.stringEscape.StringEscapeJson;
 import twg2.treeLike.simpleTree.SimpleTree;

src/twg2/parser/codeParser/AstExtractor.java

+1-1
@@ -10,7 +10,7 @@
 import twg2.ast.interm.field.FieldSig;
 import twg2.ast.interm.method.MethodSig;
 import twg2.ast.interm.type.TypeSig;
-import twg2.parser.documentParser.CodeFragment;
+import twg2.parser.fragment.CodeFragment;
 import twg2.parser.stateMachine.AstParser;
 import twg2.treeLike.simpleTree.SimpleTree;

src/twg2/parser/codeParser/AstNodeConsumer.java

+1-1
@@ -2,7 +2,7 @@
 
 import java.util.List;
 
-import twg2.parser.documentParser.DocumentFragmentText;
+import twg2.parser.fragment.DocumentFragmentText;
 import twg2.treeLike.IndexedSubtreeConsumer;
 import twg2.treeLike.simpleTree.SimpleTree;

src/twg2/parser/codeParser/AstNodePredicate.java

+1-1
@@ -2,7 +2,7 @@
 
 import java.util.List;
 
-import twg2.parser.documentParser.DocumentFragmentText;
+import twg2.parser.fragment.DocumentFragmentText;
 import twg2.treeLike.IndexedSubtreeConsumer;
 import twg2.treeLike.simpleTree.SimpleTree;

src/twg2/parser/codeParser/ParseInput.java

-23
This file was deleted.

src/twg2/parser/codeParser/ParserBuilder.java

-79
This file was deleted.

src/twg2/parser/codeParser/codeStats/ParseDirectoryCodeFiles.java

+7-6
@@ -23,7 +23,6 @@
 import twg2.parser.codeParser.extractors.CommentAndWhitespaceExtractor;
 import twg2.parser.language.CodeLanguage;
 import twg2.parser.language.CodeLanguageOptions;
-import twg2.text.stringUtils.StringReplace;
 import twg2.text.stringUtils.StringSplit;
 import twg2.tuple.Tuples;
 
@@ -139,16 +138,18 @@ public static ParseDirectoryCodeFiles parseFileStats(Path relativePath, List<Pat
 for(Path path : files) {
     File file = path.toFile();
     String fullFileName = file.getName();
-    String srcStr = StringReplace.replace(fileReader.readString(new FileReader(file)), "\r\n", "\n");
+    char[] src = fileReader.readChars(new FileReader(file));
+    int srcOff = 0;
+    int srcLen = src.length;
     Entry<String, String> fileNameExt = StringSplit.lastMatchParts(fullFileName, ".");
     if("json".equals(fileNameExt.getValue())) {
-        int lineCount = StringSplit.countMatches(srcStr, "\n");
-        val parsedStats = new ParsedFileStats(file.toString(), srcStr.length(), 0, 0, lineCount);
+        int lineCount = StringSplit.countMatches(src, srcOff, srcLen, new char[] { '\n' }, 0, 1);
+        val parsedStats = new ParsedFileStats(file.toString(), srcLen, 0, 0, lineCount);
         filesStats.add(parsedStats);
     }
     else {
-        val parsedFileInfo = CommentAndWhitespaceExtractor.buildCommentsAndWhitespaceTreeFromFileExtension(fileNameExt.getKey(), fileNameExt.getValue(), srcStr);
-        val parsedStats = CommentAndWhitespaceExtractor.calcCommentsAndWhitespaceLinesTreeStats(file.toString(), srcStr.length(), parsedFileInfo.getLines(), parsedFileInfo.getDoc());
+        val parsedFileInfo = CommentAndWhitespaceExtractor.buildCommentsAndWhitespaceTreeFromFileExtension(fileNameExt.getKey(), fileNameExt.getValue(), src, srcOff, srcLen);
+        val parsedStats = CommentAndWhitespaceExtractor.calcCommentsAndWhitespaceLinesTreeStats(file.toString(), src, srcOff, srcLen, parsedFileInfo.getLineStartOffsets(), parsedFileInfo.getDoc());
         filesStats.add(parsedStats);
     }
 }
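
Counting newlines over a char[] range, as the changed `lineCount` line does, needs no String at all. The sketch below is a stand-alone illustration: `CharRangeStats`, `countChar`, and `lineStartOffsets` are hypothetical helpers written for this example, not the `StringSplit.countMatches` implementation or the project's line-offset logic.

```java
import java.util.Arrays;

/** Hypothetical helpers for scanning a char[] range without creating a String. */
final class CharRangeStats {

    /** Count occurrences of target within src[off, off + len). */
    static int countChar(char[] src, int off, int len, char target) {
        int count = 0;
        for (int i = off, end = off + len; i < end; i++) {
            if (src[i] == target) {
                count++;
            }
        }
        return count;
    }

    /** Offsets at which each line starts: the range start, plus one past every '\n'. */
    static int[] lineStartOffsets(char[] src, int off, int len) {
        int[] starts = new int[countChar(src, off, len, '\n') + 1];
        int n = 0;
        starts[n++] = off;
        for (int i = off, end = off + len; i < end; i++) {
            if (src[i] == '\n') {
                starts[n++] = i + 1;
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        char[] src = "a\r\nb\nc".toCharArray();
        System.out.println(countChar(src, 0, src.length, '\n'));
        System.out.println(Arrays.toString(lineStartOffsets(src, 0, src.length)));
    }
}
```

Counting only '\n' gives the same result for "\r\n" and "\n" line endings, although the reported length still includes any '\r' characters once the "\r\n" replacement step is gone.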

src/twg2/parser/codeParser/csharp/CsAnnotationExtractor.java

+1-1
@@ -6,8 +6,8 @@
 import lombok.val;
 import twg2.ast.interm.annotation.AnnotationSig;
 import twg2.parser.codeParser.extractors.AnnotationExtractor;
-import twg2.parser.documentParser.CodeFragment;
 import twg2.parser.fragment.AstFragType;
+import twg2.parser.fragment.CodeFragment;
 import twg2.parser.language.CodeLanguageOptions;
 import twg2.parser.stateMachine.AstParserReusableBase;
 import twg2.treeLike.simpleTree.SimpleTree;

src/twg2/parser/codeParser/csharp/CsAstUtil.java

+1-1
@@ -5,8 +5,8 @@
 import twg2.parser.codeParser.AccessModifierEnum;
 import twg2.parser.codeParser.AccessModifierParser;
 import twg2.parser.codeParser.AstUtil;
-import twg2.parser.documentParser.CodeFragment;
 import twg2.parser.fragment.AstTypeChecker;
+import twg2.parser.fragment.CodeFragment;
 import twg2.parser.fragment.CodeFragmentType;
 import twg2.parser.language.CodeLanguage;
 import twg2.parser.language.CodeLanguageOptions;

src/twg2/parser/codeParser/csharp/CsBlockParser.java

+1-1
@@ -23,8 +23,8 @@
 import twg2.parser.codeParser.extractors.MethodExtractor;
 import twg2.parser.codeParser.tools.NameUtil;
 import twg2.parser.codeParser.tools.TokenListIterable;
-import twg2.parser.documentParser.CodeFragment;
 import twg2.parser.fragment.AstFragType;
+import twg2.parser.fragment.CodeFragment;
 import twg2.parser.language.CodeLanguageOptions;
 import twg2.parser.stateMachine.AstParser;
 import twg2.streams.EnhancedListBuilderIterator;

src/twg2/parser/codeParser/csharp/CsEnumMemberExtractor.java

+1-1
@@ -15,8 +15,8 @@
 import twg2.parser.codeParser.BlockType;
 import twg2.parser.codeParser.KeywordUtil;
 import twg2.parser.codeParser.tools.NameUtil;
-import twg2.parser.documentParser.CodeFragment;
 import twg2.parser.fragment.AstFragType;
+import twg2.parser.fragment.CodeFragment;
 import twg2.parser.fragment.CodeFragmentType;
 import twg2.parser.language.CodeLanguageOptions;
 import twg2.parser.stateMachine.AstMemberInClassParserReusable;

src/twg2/parser/codeParser/csharp/CsFileTokenizer.java

+18-26
@@ -2,10 +2,7 @@
 
 import lombok.val;
 import twg2.parser.Inclusion;
-import twg2.parser.codeParser.CodeFileSrc;
 import twg2.parser.codeParser.CommentStyle;
-import twg2.parser.codeParser.ParseInput;
-import twg2.parser.codeParser.ParserBuilder;
 import twg2.parser.fragment.CodeFragmentType;
 import twg2.parser.language.CodeLanguageOptions;
 import twg2.parser.text.CharParserFactory;
@@ -14,6 +11,7 @@
 import twg2.parser.tokenizers.CodeBlockTokenizer;
 import twg2.parser.tokenizers.CodeStringTokenizer;
 import twg2.parser.tokenizers.CommentTokenizer;
+import twg2.parser.tokenizers.CodeTokenizerBuilder;
 import twg2.parser.tokenizers.IdentifierTokenizer;
 import twg2.parser.tokenizers.NumberTokenizer;
 
@@ -23,30 +21,24 @@
 */
 public class CsFileTokenizer {
 
-    public static CodeFileSrc<CodeLanguageOptions.CSharp> parse(ParseInput params) {
-        try {
-            val identifierParser = IdentifierTokenizer.createIdentifierWithGenericTypeTokenizer();
-            val numericLiteralParser = NumberTokenizer.createNumericLiteralTokenizer();
+    public static CodeTokenizerBuilder<CodeLanguageOptions.CSharp> createFileParser() {
+        val identifierParser = IdentifierTokenizer.createIdentifierWithGenericTypeTokenizer();
+        val numericLiteralParser = NumberTokenizer.createNumericLiteralTokenizer();
 
-            val parser = new ParserBuilder()
-                .addConstParser(CommentTokenizer.createCommentTokenizer(CommentStyle.multiAndSingleLine()), CodeFragmentType.COMMENT)
-                .addConstParser(CodeStringTokenizer.createStringTokenizerForCSharp(), CodeFragmentType.STRING)
-                .addConstParser(CodeBlockTokenizer.createBlockTokenizer('{', '}'), CodeFragmentType.BLOCK)
-                .addConstParser(CodeBlockTokenizer.createBlockTokenizer('(', ')'), CodeFragmentType.BLOCK)
-                .addConstParser(createAnnotationTokenizer(), CodeFragmentType.BLOCK)
-                .addParser(identifierParser, (text, off, len) -> {
-                    return CsKeyword.check.isKeyword(text.toString()) ? CodeFragmentType.KEYWORD : CodeFragmentType.IDENTIFIER; // possible bad performance
-                })
-                .addConstParser(createOperatorTokenizer(), CodeFragmentType.OPERATOR)
-                .addConstParser(createSeparatorTokenizer(), CodeFragmentType.SEPARATOR)
-                .addConstParser(numericLiteralParser, CodeFragmentType.NUMBER);
-            return parser.buildAndParse(params.getSrc(), CodeLanguageOptions.C_SHARP, params.getFileName(), true);
-        } catch(Exception e) {
-            if(params.getErrorHandler() != null) {
-                params.getErrorHandler().accept(e);
-            }
-            throw e;
-        }
+        val parser = new CodeTokenizerBuilder<>(CodeLanguageOptions.C_SHARP)
+            .addConstParser(CommentTokenizer.createCommentTokenizer(CommentStyle.multiAndSingleLine()), CodeFragmentType.COMMENT)
+            .addConstParser(CodeStringTokenizer.createStringTokenizerForCSharp(), CodeFragmentType.STRING)
+            .addConstParser(CodeBlockTokenizer.createBlockTokenizer('{', '}'), CodeFragmentType.BLOCK)
+            .addConstParser(CodeBlockTokenizer.createBlockTokenizer('(', ')'), CodeFragmentType.BLOCK)
+            .addConstParser(createAnnotationTokenizer(), CodeFragmentType.BLOCK)
+            .addParser(identifierParser, (text, off, len) -> {
+                return CsKeyword.check.isKeyword(text.toString()) ? CodeFragmentType.KEYWORD : CodeFragmentType.IDENTIFIER; // possible bad performance
+            })
+            .addConstParser(createOperatorTokenizer(), CodeFragmentType.OPERATOR)
+            .addConstParser(createSeparatorTokenizer(), CodeFragmentType.SEPARATOR)
+            .addConstParser(numericLiteralParser, CodeFragmentType.NUMBER);
+
+        return parser;
     }
 
 
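
The shape of this change, where a language-specific factory returns a configured builder and the generic run logic lives in the builder, can be sketched in a much smaller form. `MiniTokenizerBuilder`, `addConstRule`, `addRule`, and `createDemoTokenizer` are hypothetical names for this illustration only; the real `CodeTokenizerBuilder` API is only partly visible in the diff above and is likely richer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.IntPredicate;

/** Toy builder: rules are registered once, then reused for any char[] range. */
final class MiniTokenizerBuilder {

    /** Decides a token type from the matched range (hypothetical interface). */
    interface Classifier {
        String classify(char[] src, int off, int len);
    }

    private static final class Rule {
        final IntPredicate matches;
        final Classifier classifier;

        Rule(IntPredicate matches, Classifier classifier) {
            this.matches = matches;
            this.classifier = classifier;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    /** Register a rule whose tokens always get the same type. */
    MiniTokenizerBuilder addConstRule(IntPredicate charMatcher, String type) {
        return addRule(charMatcher, (src, off, len) -> type);
    }

    /** Register a rule whose type is computed from the matched text. */
    MiniTokenizerBuilder addRule(IntPredicate charMatcher, Classifier classifier) {
        rules.add(new Rule(charMatcher, classifier));
        return this;
    }

    /** Generic run logic: the first rule accepting the current char consumes the whole run. */
    List<String> tokenize(char[] src, int off, int len) {
        List<String> out = new ArrayList<>();
        int end = off + len;
        int i = off;
        while (i < end) {
            Rule hit = null;
            for (Rule rule : rules) {
                if (rule.matches.test(src[i])) {
                    hit = rule;
                    break;
                }
            }
            if (hit == null) {
                i++; // skip characters that no rule claims
                continue;
            }
            int start = i;
            while (i < end && hit.matches.test(src[i])) {
                i++;
            }
            String type = hit.classifier.classify(src, start, i - start);
            out.add(type + ":" + new String(src, start, i - start));
        }
        return out;
    }

    /** Language-specific factory: only configures rules, leaves running to the builder. */
    static MiniTokenizerBuilder createDemoTokenizer() {
        Set<String> keywords = new HashSet<>(Arrays.asList("class", "public"));
        return new MiniTokenizerBuilder()
            .addConstRule(Character::isDigit, "NUMBER")
            .addRule(Character::isLetter, (src, off, len) ->
                keywords.contains(new String(src, off, len)) ? "KEYWORD" : "IDENTIFIER");
    }

    public static void main(String[] args) {
        char[] src = "public class A1".toCharArray();
        System.out.println(createDemoTokenizer().tokenize(src, 0, src.length));
    }
}
```

Because the factory no longer runs the parse itself, the caller decides when and on which input to run it, and concerns such as error handling can live in one place rather than in each language's tokenizer class.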

src/twg2/parser/codeParser/csharp/CsUsingStatementExtractor.java

+1-1
@@ -5,8 +5,8 @@
 
 import lombok.val;
 import twg2.parser.codeParser.tools.NameUtil;
-import twg2.parser.documentParser.CodeFragment;
 import twg2.parser.fragment.AstFragType;
+import twg2.parser.fragment.CodeFragment;
 import twg2.parser.language.CodeLanguageOptions;
 import twg2.parser.stateMachine.AstParserReusableBase;
 import twg2.treeLike.simpleTree.SimpleTree;
