2.0

jef33 released this on 19 Mar 14:25

Requirement: Java 8

New Features:

  • All-in-one command for hierarchical topic detection
  • Webpage visualization with direct links to the corresponding documents
  • Evaluation metrics: topic coherence, topic compactness (Scala version)
  • Allows input documents to be listed line by line
  • Supports non-ASCII characters
  • Supports LDA data format
  • Added an option to skip tree levels
  • Simplified HLTA parameters
  • Supports seed words consisting of any number of words
  • Parallel computation of word-pair mutual information (MI)

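Word-pair mutual information is the quantity the parallelized step computes. The following is a minimal Python sketch, assuming each word is a binary variable marking its presence in a document and MI is estimated from document counts with the natural log. The helpers `binary_mi` and `pairwise_mi` are hypothetical illustrations, not HLTA's Java implementation.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def binary_mi(n11, n1_, n_1, n):
    """Hypothetical helper (not part of HLTA): MI in nats between two
    binary word-occurrence variables, estimated from document counts.

    n11: documents containing both words
    n1_: documents containing the first word
    n_1: documents containing the second word
    n:   total number of documents
    """
    # (joint count, count of X's value, count of Y's value) per cell of the 2x2 table
    cells = [
        (n11, n1_, n_1),                          # X=1, Y=1
        (n1_ - n11, n1_, n - n_1),                # X=1, Y=0
        (n_1 - n11, n - n1_, n_1),                # X=0, Y=1
        (n - n1_ - n_1 + n11, n - n1_, n - n_1),  # X=0, Y=0
    ]
    mi = 0.0
    for joint, cx, cy in cells:
        if joint > 0:  # empty cells contribute 0 (0 * log 0 := 0)
            mi += (joint / n) * math.log(joint * n / (cx * cy))
    return mi


def pairwise_mi(docs, workers=4):
    """Hypothetical helper: {(w1, w2): MI} for every unordered word pair."""
    doc_sets = [set(d) for d in docs]
    vocab = sorted(set().union(*doc_sets))
    n = len(doc_sets)
    df = {w: sum(w in d for d in doc_sets) for w in vocab}  # document frequency

    def score(pair):
        a, b = pair
        n11 = sum(a in d and b in d for d in doc_sets)
        return pair, binary_mi(n11, df[a], df[b], n)

    # Each pair is scored independently, so the work splits cleanly across
    # workers. Threads keep the sketch simple; CPython's GIL means a process
    # pool (or a JVM thread pool) is needed for a real CPU speed-up.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(score, combinations(vocab, 2)))


docs = [["topic", "model"], ["topic", "tree"], ["latent", "tree"], ["topic", "model", "tree"]]
scores = pairwise_mi(docs)
print(scores[("model", "topic")])
```

Because no pair depends on another, the vocabulary pairs can be partitioned across any number of workers; only the document frequencies are shared, and they are read-only.
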
Other changes:

  • Uses Narrowly Defined Topics by default
  • Scala calls now use stepwise EM for parameter estimation
  • User-defined encoding scheme for data conversion
  • The preprocessor now removes punctuation instead of replacing it with underscores
  • Subroutines now accept all data formats, with the sparse format as the default
  • Data conversion now outputs only the sparse data format by default
  • Data Conversion now reads PDF directly
  • Sparse data format now numbers docId starting from 0
  • HLCM data format now uses extension .hlcm
  • Fixed legacy collisions with reserved words of the .bif format
  • Fixed invalid JSON format
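
Stepwise EM updates parameters after every mini-batch instead of after a full pass over the corpus. The following is a minimal Python sketch of the technique, following the formulation of Liang and Klein (2009), for the simplest case: one binary latent variable with binary word children. It is not HLTA's code; the function name `stepwise_em`, the step-size schedule `(k + 2) ** -alpha`, the mini-batch size and the epoch count are assumptions chosen for illustration.

```python
import random


def stepwise_em(data, n_features, alpha=0.7, batch_size=10, epochs=10, seed=0):
    """Hypothetical helper (not part of HLTA): fit P(Z) and P(X_j=1 | Z) for
    one binary latent Z with binary children X_j using stepwise EM.

    After each mini-batch, the running expected sufficient statistics are
    interpolated toward the mini-batch statistics with step size
    eta_k = (k + 2) ** -alpha, then parameters are re-derived (M-step).
    """
    rng = random.Random(seed)
    K = 2
    pi = [0.5, 0.5]  # P(Z = k)
    theta = [[rng.uniform(0.3, 0.7) for _ in range(n_features)] for _ in range(K)]  # P(X_j=1 | Z=k)
    # Running statistics, normalized per data point: E[1{Z=k}] and E[1{Z=k} X_j]
    s_z = pi[:]
    s_zx = [[pi[k] * theta[k][j] for j in range(n_features)] for k in range(K)]
    step = 0
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            idx = order[start:start + batch_size]
            b_z = [0.0] * K
            b_zx = [[0.0] * n_features for _ in range(K)]
            # E-step on the mini-batch: posterior responsibilities P(Z=k | x)
            for i in idx:
                x = data[i]
                joint = []
                for k in range(K):
                    p = pi[k]
                    for j, xj in enumerate(x):
                        p *= theta[k][j] if xj else 1.0 - theta[k][j]
                    joint.append(p)
                total = sum(joint)
                for k in range(K):
                    r = joint[k] / total
                    b_z[k] += r
                    for j, xj in enumerate(x):
                        b_zx[k][j] += r * xj
            # Stepwise update: blend old statistics with the mini-batch average
            eta = (step + 2) ** -alpha
            step += 1
            m = len(idx)
            for k in range(K):
                s_z[k] = (1 - eta) * s_z[k] + eta * b_z[k] / m
                for j in range(n_features):
                    s_zx[k][j] = (1 - eta) * s_zx[k][j] + eta * b_zx[k][j] / m
            # M-step from the running statistics (clamped away from 0 and 1)
            pi = [s / sum(s_z) for s in s_z]
            theta = [[min(max(s_zx[k][j] / s_z[k], 1e-6), 1 - 1e-6)
                      for j in range(n_features)] for k in range(K)]
    return pi, theta
```

With `alpha` in (0.5, 1], the step size shrinks over time, so early mini-batches move the parameters quickly and later ones refine them.
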