
Cholesky decomposition was not successful. The input might not be valid. #9

svaibhavsingh opened this issue Mar 6, 2018 · 4 comments


@svaibhavsingh

Hi, I know this might not be a valid issue for this code, since the error comes from GPflow, but it occurs whenever I use any dataset other than MNIST. I tried increasing the jitter value, which sometimes helps but gives no guarantee. I have also tried sums of kernels and normalising the input, but that doesn't always solve the problem either. Is there a permanent fix for this?

@hughsalimbeni (Collaborator)

There are a number of ways the Cholesky can fail, and annoyingly the TensorFlow error message is worse than useless. Here are some of the things I've experienced:

  • a NaN in the data
  • a NaN coming from the kernel, e.g. using the Matérn with near-zero correlations (this can be harder to spot; it is solved by increasing the nugget in the squared-distance computation)
  • misspecified hyperparameters (this is probably the most common cause, and is generally fixed with correct data normalization, both inputs and outputs in the regression case). NB: a rogue bad dimension in the data (e.g. all zeros) can mess things up if you're dividing by the empirical std dev
  • overly aggressive optimization (fixed by decreasing learning rates)
  • insufficient jitter (though I usually find that 1e-6 is sufficient)
  • anything in float32 (I've found tf.cholesky very unstable in float32, e.g. tf.cholesky(tf.matmul(A, A, transpose_a=True)) with square A can fail; see the sketch after this list)
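To illustrate the last two points, here's a rough self-contained sketch in plain NumPy (nothing GPflow-specific; the matrix size and column scaling are made up for the demonstration). It builds a deliberately ill-conditioned Gram matrix, which will typically fail a float32 Cholesky but pass in float64, and shows the diagonal-jitter fix. Checking np.isnan(X).any() on your arrays first covers the NaN cases.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))
# Scale the columns over several orders of magnitude so the Gram matrix
# A^T A has a condition number around 1e8, beyond float32's ~7 digits.
A *= np.logspace(-4, 0, A.shape[1])

def try_cholesky(K, label):
    try:
        np.linalg.cholesky(K)
        print(label, "ok")
    except np.linalg.LinAlgError:
        print(label, "failed")

for dtype in (np.float32, np.float64):
    M = A.astype(dtype)
    K = M.T @ M  # symmetric PSD in exact arithmetic, not necessarily in floats
    name = np.dtype(dtype).name
    try_cholesky(K, name)
    # The usual fix: add a small multiple of the mean diagonal as jitter
    # before factorizing (increase the multiplier if it still fails).
    jitter = 1e-6 * np.mean(np.diag(K))
    try_cholesky(K + jitter * np.eye(K.shape[0], dtype=dtype), name + " + jitter")
```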

As a general first line of attack, I'd always try a single-layer model first (either a DGP with one layer or else SVGP) to see if the error persists there.
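For that sanity check, something along these lines should do. This is only a sketch, written here against the newer GPflow 2 API (the exact calls differ across GPflow versions), with made-up X and Y arrays standing in for your data:

```python
import numpy as np
import gpflow

gpflow.config.set_default_float(np.float64)  # float64 throughout, per the above
gpflow.config.set_default_jitter(1e-6)

# Made-up data; substitute your own (normalized) inputs and outputs.
X = np.random.randn(100, 2)
Y = np.random.randn(100, 1)
Z = X[:20].copy()  # inducing points

model = gpflow.models.SVGP(
    kernel=gpflow.kernels.SquaredExponential(),
    likelihood=gpflow.likelihoods.Gaussian(),
    inducing_variable=Z,
    num_data=X.shape[0],
)

# If the Cholesky error already shows up here, the problem is in the
# data / kernel / hyperparameters rather than in the deep model.
gpflow.optimizers.Scipy().minimize(
    model.training_loss_closure((X, Y)), model.trainable_variables
)
```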

@svaibhavsingh (Author)

Using float64 instead of float32 helped. Thanks a lot.

svaibhavsingh reopened this Apr 19, 2018
@svaibhavsingh (Author)

Hi, I am trying to use natural gradients + Adam (natgrad) with the DGP framework, and I am facing the same issue again when taking the natural-gradient step. I have tried all of the above and it's still not working. Am I missing anything?

@hughsalimbeni (Collaborator)

I intend to put up a notebook of demo usage for this soon, but for now a few pointers:

  • The nat grad step needs to be done first (i.e. nat grads, then Adam afterwards). Doing it the other way around can lead to divergence (especially if gamma=1). See the sketch after this list.
  • You can use nat grads for all the layers, or just the final layer. The final layer is easier, as then for a Gaussian likelihood you can probably use gamma ~ 0.1 and it should work fine.
  • For a non-Gaussian likelihood you need to be more careful and start the nat grad step quite small.
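In the meantime, here's a minimal sketch of that ordering, written against the newer GPflow 2 API (gpflow.optimizers.NaturalGradient; the 2018-era equivalent was NatGradOptimizer), using a single-layer SVGP with made-up data as a stand-in for the DGP:

```python
import numpy as np
import gpflow
import tensorflow as tf

# Stand-in model and data; substitute your DGP and dataset.
X = np.random.randn(100, 2)
Y = np.random.randn(100, 1)
model = gpflow.models.SVGP(
    kernel=gpflow.kernels.SquaredExponential(),
    likelihood=gpflow.likelihoods.Gaussian(),
    inducing_variable=X[:20].copy(),
)

# The nat grad optimizer owns the variational parameters,
# so keep Adam away from them.
gpflow.set_trainable(model.q_mu, False)
gpflow.set_trainable(model.q_sqrt, False)

natgrad = gpflow.optimizers.NaturalGradient(gamma=0.1)  # ~0.1 for a Gaussian likelihood
adam = tf.optimizers.Adam(0.01)
loss = model.training_loss_closure((X, Y))

for _ in range(100):
    natgrad.minimize(loss, [(model.q_mu, model.q_sqrt)])  # nat grad step first...
    adam.minimize(loss, model.trainable_variables)        # ...then Adam
```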
