LM Toolkit Refactor #381

Open: wants to merge 47 commits into base: 2.0.0


@dcgaines (Collaborator) commented Mar 7, 2025

Merging the toolkit refactor into the Banff LM branch for simulation testing.

Overview

Replaced all custom models in the language module with language model adapters. The adapters rely on aactextpredict, our new LM toolkit, for the heavy lifting and only need to handle BciPy-specific details, such as the special space and backspace characters and the response type properties.
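A minimal sketch of the adapter idea, assuming a toolkit model that maps plain-text context to next-character probabilities. `ToolkitModel`, its `predict` signature, the `"_"` and `"<"` symbol values, and the `response_type` string are hypothetical stand-ins, not the actual aactextpredict or BciPy API; the sketch only shows an adapter that translates symbols and leaves prediction to the toolkit.

```python
from typing import Dict, List, Protocol, Tuple

# Hypothetical display symbols; BciPy's real constants may differ.
SPACE_CHAR = "_"
BACKSPACE_CHAR = "<"


class ToolkitModel(Protocol):
    """Hypothetical toolkit interface (not aactextpredict's actual API)."""

    def predict(self, context: str) -> Dict[str, float]:
        ...


class LanguageModelAdapter:
    """Wraps a toolkit model and handles only BciPy-specific details."""

    # Hypothetical response-type property; BciPy's real value may differ.
    response_type = "SYMBOL"

    def __init__(self, model: ToolkitModel, symbol_set: List[str]) -> None:
        self.model = model
        self.symbol_set = symbol_set

    def predict(self, evidence: List[str]) -> List[Tuple[str, float]]:
        # BciPy symbols -> plain text: the display space becomes a real space.
        context = "".join(" " if s == SPACE_CHAR else s for s in evidence)
        raw = self.model.predict(context)
        # Plain text -> BciPy symbols: map the real space back.
        probs = {(SPACE_CHAR if ch == " " else ch): p for ch, p in raw.items()}
        # Assumption for this sketch: the LM never proposes backspace.
        probs[BACKSPACE_CHAR] = 0.0
        # Keep only symbols in the BciPy symbol set, then renormalize.
        scores = {s: probs.get(s, 0.0) for s in self.symbol_set}
        total = sum(scores.values())
        if total == 0:
            # Fall back to uniform if the toolkit gave no usable mass.
            scores = {s: 1.0 for s in self.symbol_set}
            total = float(len(self.symbol_set))
        return sorted(
            ((s, p / total) for s, p in scores.items()),
            key=lambda sp: sp[1],
            reverse=True,
        )
```

With a fake model returning `{"a": 0.5, " ": 0.3, "b": 0.2}`, this adapter reports `"_"` in place of the space and gives `"<"` zero probability, while the toolkit model never sees BciPy's display symbols.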

Ticket

Link a Pivotal ticket here

Contributions

  • Deprecated LanguageModel classes in favor of LanguageModelAdapter classes.
  • Consolidated the predict methods into the superclass; subclasses override them only when needed (Oracle).
  • Renamed the KenLM model to NGram to match the aactextpredict package.
  • Updated all references to the KenLM and LanguageModel classes to match the new names and classes.
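One way the consolidated `predict` could be structured, as a sketch: the shared normalization lives once in the superclass, and only an Oracle-style model overrides `predict` to boost the next target character. The `_scores` hook and the `task_text` and `target_bump` parameters are hypothetical names chosen for illustration, not the PR's actual code.

```python
from typing import Dict, List, Tuple


class LanguageModelAdapter:
    """Superclass holding the single shared predict() implementation."""

    def __init__(self, symbol_set: List[str]) -> None:
        self.symbol_set = symbol_set

    def _scores(self, evidence: List[str]) -> Dict[str, float]:
        # Hypothetical hook: subclasses return unnormalized scores.
        raise NotImplementedError

    def predict(self, evidence: List[str]) -> List[Tuple[str, float]]:
        scores = self._scores(evidence)
        return self._normalize({s: scores.get(s, 0.0) for s in self.symbol_set})

    @staticmethod
    def _normalize(scores: Dict[str, float]) -> List[Tuple[str, float]]:
        # Divide by the total mass and sort from most to least likely.
        total = sum(scores.values()) or 1.0
        return sorted(((s, p / total) for s, p in scores.items()),
                      key=lambda sp: sp[1], reverse=True)


class OracleAdapter(LanguageModelAdapter):
    """Overrides predict() because it needs the known target text."""

    def __init__(self, symbol_set: List[str], task_text: str,
                 target_bump: float = 0.1) -> None:
        super().__init__(symbol_set)
        self.task_text = task_text
        self.target_bump = target_bump

    def _scores(self, evidence: List[str]) -> Dict[str, float]:
        # Start from a uniform distribution over the symbol set.
        return {s: 1.0 for s in self.symbol_set}

    def predict(self, evidence: List[str]) -> List[Tuple[str, float]]:
        probs = dict(super().predict(evidence))
        spelled = "".join(evidence)
        # Boost the next target symbol only while the user is still on track.
        if self.task_text.startswith(spelled) and len(spelled) < len(self.task_text):
            target = self.task_text[len(spelled)]
            if target in probs:
                probs[target] += self.target_bump
        return self._normalize(probs)
```

Every other adapter would implement only `_scores` and inherit `predict` unchanged, which keeps the override surface limited to the one model that needs extra information.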

Test

  • Ran all pytest cases

Documentation

  • Updated the language module README and added links to the textpredict repo and the AAC adapting arXiv paper.

Changelog

  • Is the CHANGELOG.md updated with your detailed changes?
  • Not yet.

tab-cmd and others added 30 commits January 8, 2025 12:43
TODO: finish processing script, integrate LLM
@tab-cmd (Contributor) left a comment

I'll wait for @lawhead to review since he has more experience here. I would keep the base class as LanguageModel or BciPyLanguageModel, but the others could be annotated as Adapters extending from that. We may want our own Uniform here without an adapter. I understand why you need it in the toolkit, but it's simple enough to keep here, and it could be a good example of how to build an LM in BciPy.
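The reviewer's suggestion can be sketched as a BciPy-native base class plus a Uniform model that needs no toolkit. The `LanguageModel` base below is a minimal hypothetical stand-in, not BciPy's actual class; it only illustrates how small a self-contained example model could be.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class LanguageModel(ABC):
    """Hypothetical minimal base class for BciPy language models."""

    def __init__(self, symbol_set: List[str]) -> None:
        self.symbol_set = symbol_set

    @abstractmethod
    def predict(self, evidence: List[str]) -> List[Tuple[str, float]]:
        """Return (symbol, probability) pairs for the next symbol."""


class UniformLanguageModel(LanguageModel):
    """Assigns equal probability to every symbol, ignoring the evidence."""

    def predict(self, evidence: List[str]) -> List[Tuple[str, float]]:
        p = 1.0 / len(self.symbol_set)
        return [(s, p) for s in self.symbol_set]
```

For example, `UniformLanguageModel(["a", "b", "_", "<"]).predict([])` returns each of the four symbols with probability 0.25. Toolkit-backed models could then subclass the same base through an adapter.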

The toolkit doesn't seem to work with Python 3.10.6. >=3.7,<3.11?

Also, some linting errors!

@tab-cmd deleted the branch 2.0.0 March 20, 2025 08:44
@tab-cmd closed this Mar 20, 2025
@tab-cmd reopened this Mar 20, 2025
@tab-cmd changed the base branch from BANFF_lm to 2.0.0 March 20, 2025 08:53