2.0
Requirement: Java 8
New Features:
- All-in-one command for hierarchical topic detection
- Webpage visualization with direct link to corresponding documents
- Evaluation metrics: topic coherence, topic compactness (Scala version)
- Allows input documents to be listed one per line
- Supports non-ASCII characters
- Supports LDA data format
- Added option to skip tree levels
- Simplified HLTA parameters
- Supports seedwords of any word length
- Parallel computation of word-pair mutual information (MI)
Other changes:
- Narrowly Defined Topics are now used by default
- Scala calls use Stepwise EM for parameter estimation
- User-defined encoding scheme in data conversion
- Pre-processor now removes punctuation instead of replacing it with underscores
- Subroutines now accept all data formats; sparse data is the default format
- Data Conversion now outputs only the sparse data format by default
- Data Conversion now reads PDF directly
- Sparse data format now counts docId from 0
- HLCM data format now uses extension .hlcm
- Legacy fixes for collisions with .bif format reserved words
- Fixed invalid JSON format
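
Note on the parallel word-pair MI item above: the following is a minimal Java 8 sketch of how pairwise mutual information over binary word occurrences could be computed in parallel. It is not the HLTA implementation. The class name WordPairMI, the method names, and the boolean[][] document layout (one row per document, one column per word) are assumptions made only for illustration, and the input is assumed to contain at least one document.

```java
import java.util.stream.IntStream;

public class WordPairMI {

    /** Mutual information (in nats) between the binary occurrence of words i and j. */
    static double mutualInformation(boolean[][] docs, int i, int j) {
        int n = docs.length;
        // counts[a][b] = number of documents where word i has state a and word j has state b
        double[][] counts = new double[2][2];
        for (boolean[] doc : docs) {
            counts[doc[i] ? 1 : 0][doc[j] ? 1 : 0]++;
        }
        double mi = 0.0;
        for (int a = 0; a < 2; a++) {
            for (int b = 0; b < 2; b++) {
                double pab = counts[a][b] / n;
                if (pab == 0.0) {
                    continue; // the term 0 * log 0 is treated as 0
                }
                double pa = (counts[a][0] + counts[a][1]) / n;
                double pb = (counts[0][b] + counts[1][b]) / n;
                mi += pab * Math.log(pab / (pa * pb));
            }
        }
        return mi;
    }

    /** Computes the symmetric MI matrix, processing each first index i in parallel. */
    static double[][] allPairs(boolean[][] docs, int vocabSize) {
        double[][] mi = new double[vocabSize][vocabSize];
        IntStream.range(0, vocabSize).parallel().forEach(i -> {
            for (int j = i + 1; j < vocabSize; j++) {
                double value = mutualInformation(docs, i, j);
                // Cells (i, j) and (j, i) with i < j are written only by the task
                // for index i, so parallel tasks never write to the same cell.
                mi[i][j] = value;
                mi[j][i] = value;
            }
        });
        return mi;
    }
}
```

Each pair is independent of every other pair, so the work can be split by the first word index without locking. The diagonal is left at 0 in this sketch.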