Data.Number.LogFloat.(/): argument out of range #1

Open
JPMoresmau opened this issue Feb 7, 2014 · 3 comments

Comments

@JPMoresmau
Contributor

Given the states:
["AT","NP","NN","JJ","VBD","NR","IN"]
And the events:
["The","Fulton","County","Grand","Jury","said","Friday","an","investigation","of"]
Training a simpleHMM with baumWelch on the same list of events results in:
Data.Number.LogFloat.(/): argument out of range

I think that problem was first reported at http://izbicki.me/blog/using-hmms-in-haskell-for-bioinformatics#comment-798 . Is there a more recent library to use for hidden Markov models? I don't mind having a go at fixing this, but some guidance would be appreciated!

Thanks!

@mikeizbicki
Owner

Can you give me the full code you used that caused the error?

Also, I'll be going on vacation in a couple hours. I'll check back here first to see if you've posted anything, otherwise it will be Monday or Tuesday before I see anything.

@JPMoresmau
Contributor Author

with listArray from Data.Array:

      let
        states=["AT","NP","NN","JJ","VBD","NR","IN"]
        events=["The","Fulton","County","Grand","Jury","said","Friday","an","investigation","of"]
        model=baumWelch (simpleHMM states events) (listArray (1,length events) events) 2
      print model

@mikeizbicki
Owner

LogFloat generates that error whenever a value <= 0 is stored within it. (The log of these numbers is undefined, and we have to work in the log domain because otherwise the arithmetic would underflow in double precision.) So I'm guessing that's what's happening somewhere.
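
To illustrate the underflow point, here is a minimal sketch that uses plain Double and the Prelude only. It does not use the logfloat package, and the names directProduct, logProduct and sample are made up for this example.

```haskell
-- Product of many small probabilities, computed directly in Double.
directProduct :: [Double] -> Double
directProduct = product

-- The same product carried as a sum of logarithms instead.
logProduct :: [Double] -> Double
logProduct = sum . map log

-- Hypothetical data: 400 events, each with probability 0.001.
-- The true product is 1e-1200, far below the smallest positive Double,
-- so directProduct sample collapses to 0 while logProduct sample
-- stays an ordinary finite (negative) number.
sample :: [Double]
sample = replicate 400 0.001
```

Note that in plain Double, log 0 evaluates to negative infinity, which is why exact zeros need care once the arithmetic moves to the log domain.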

It might be that you have too many states for the number of events that you have, so a lot of the transition probabilities are initialized to 0 when they should be some small positive number. I only ever tested the code on event lists with very few discrete events, so I never ran into this problem. For English text, however, you have so many transitions that you will never cover them all. You can verify that this is in fact the problem by making sure your event list covers all possible transitions (e.g. The Fulton, The The, Fulton Fulton, and Fulton The would cover all the transitions of just those two words).
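
One way to run that check is sketched below. It is independent of the hmm library and needs only Data.List; observedTransitions and missingTransitions are hypothetical helper names, and "transition" here means a pair of adjacent events, as in the example above.

```haskell
import Data.List (nub)

-- Distinct pairs of adjacent events that actually occur in the list.
observedTransitions :: Eq a => [a] -> [(a, a)]
observedTransitions xs = nub (zip xs (drop 1 xs))

-- Every pair over the list's own vocabulary that never occurs adjacently.
missingTransitions :: Eq a => [a] -> [(a, a)]
missingTransitions xs =
  [ (a, b) | a <- vocab, b <- vocab, (a, b) `notElem` seen ]
  where
    vocab = nub xs
    seen  = observedTransitions xs
```

For the ten-word event list from the report, missingTransitions returns 91 of the 100 possible pairs. A list such as ["The","Fulton","The","The","Fulton","Fulton","The"] covers all four pairs of its two words, so the result is empty.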

If that is the case, the way to fix it would be to add a small number like 0.000000001 to each of the probabilities. This has a Bayesian interpretation of assigning prior probabilities to all of the possible transitions; the prior would be proportional to whatever number you decide to add. Assigning the correct prior will depend on the domain of interest, so it's probably something that we should add as an actual parameter to baumWelch.
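
As a rough sketch of that idea (not the library's code): assume a probability row is a plain list of Doubles, add the small constant to every entry, then renormalise so the row still sums to 1. The names smoothRow and smoothMatrix and the list-of-lists representation are assumptions for illustration; the actual matrices in baumWelchItr may be stored differently.

```haskell
-- Add a pseudocount eps to every entry, then renormalise the row.
-- With eps > 0 and non-negative inputs, every output entry is positive.
smoothRow :: Double -> [Double] -> [Double]
smoothRow eps row = map (/ total) bumped
  where
    bumped = map (+ eps) row
    total  = sum bumped

-- Apply the same smoothing to each row of a matrix (one row per source state).
smoothMatrix :: Double -> [[Double]] -> [[Double]]
smoothMatrix eps = map (smoothRow eps)
```

For example, smoothRow 1e-9 [0.5, 0.5, 0] leaves the two large entries almost unchanged and turns the zero into a tiny positive value. Passing eps in as an argument mirrors the suggestion of making the prior a parameter.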

In the code, this change should be made by adjusting the initProbs, transMatrix, and outMatrix of the baumWelchItr function. Sorry that I can't make these changes for you right now because I'm away from my dev machine, but it hopefully shouldn't be too hard to figure out what to change.
