
Cholesky decomposition was not successful. The input might not be valid. #9

svaibhavsingh opened this issue Mar 6, 2018 · 4 comments


@svaibhavsingh

Hi, I know this might not be a valid issue for this code, since the error comes from GPflow, but it occurs whenever I use any dataset other than MNIST. I tried increasing the jitter value, which sometimes helps but gives no guarantee. I have also tried sums of kernels and normalising the input, but that doesn't always solve the problem either. Is there a permanent fix for this?

@hughsalimbeni (Collaborator)

There are a number of ways the Cholesky can fail, and annoyingly the TensorFlow error message is worse than useless. Here are some of the things I've experienced:

  • a NaN in the data
  • a NaN coming from the kernel, e.g. using the Matérn with near-zero correlations (this can be harder to spot; it is solved by increasing the nugget in the squared-distance computation)
  • misspecified hyperparameters (this is probably the most common cause, and is generally fixed with correct data normalization, both inputs and outputs in the regression case). NB: a rogue bad dimension in the data (e.g. all zeros) can mess things up if you're dividing by the empirical std dev
  • overly aggressive optimization (fixed by decreasing learning rates)
  • insufficient jitter (though I usually find that 1e-6 is sufficient)
  • anything in float32 (I've found tf.cholesky very unstable in float32, e.g. tf.cholesky(tf.matmul(A, A, transpose_a=True)) with square A can fail; see the sketch after this list)
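To illustrate the last two points, here's a rough self-contained sketch in plain NumPy (nothing GPflow-specific; the matrix size and column scaling are made up for the demonstration). It builds a deliberately ill-conditioned Gram matrix, which will typically fail a float32 Cholesky but pass in float64, and shows the diagonal-jitter fix. Checking np.isnan(X).any() on your arrays first covers the NaN cases.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500))
# Scale the columns over several orders of magnitude so the Gram matrix
# A^T A has a condition number around 1e8, beyond float32's ~7 digits.
A *= np.logspace(-4, 0, A.shape[1])

def try_cholesky(K, label):
    try:
        np.linalg.cholesky(K)
        print(label, "ok")
    except np.linalg.LinAlgError:
        print(label, "failed")

for dtype in (np.float32, np.float64):
    M = A.astype(dtype)
    K = M.T @ M  # symmetric PSD in exact arithmetic, not necessarily in floats
    name = np.dtype(dtype).name
    try_cholesky(K, name)
    # The usual fix: add a small multiple of the mean diagonal as jitter
    # before factorizing (increase the multiplier if it still fails).
    jitter = 1e-6 * np.mean(np.diag(K))
    try_cholesky(K + jitter * np.eye(K.shape[0], dtype=dtype), name + " + jitter")
```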

As a general first line of attack, I'd always try a single-layer model first (either a DGP with one layer or else SVGP) to see if the error persists there.
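For that sanity check, something along these lines should do. This is only a sketch, written here against the newer GPflow 2 API (the exact calls differ across GPflow versions), with made-up X and Y arrays standing in for your data:

```python
import numpy as np
import gpflow

gpflow.config.set_default_float(np.float64)  # float64 throughout, per the above
gpflow.config.set_default_jitter(1e-6)

# Made-up data; substitute your own (normalized) inputs and outputs.
X = np.random.randn(100, 2)
Y = np.random.randn(100, 1)
Z = X[:20].copy()  # inducing points

model = gpflow.models.SVGP(
    kernel=gpflow.kernels.SquaredExponential(),
    likelihood=gpflow.likelihoods.Gaussian(),
    inducing_variable=Z,
    num_data=X.shape[0],
)

# If the Cholesky error already shows up here, the problem is in the
# data / kernel / hyperparameters rather than in the deep model.
gpflow.optimizers.Scipy().minimize(
    model.training_loss_closure((X, Y)), model.trainable_variables
)
```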

@svaibhavsingh (Author)

Using float64 instead of float32 helped. Thanks a lot.

svaibhavsingh reopened this Apr 19, 2018
@svaibhavsingh (Author)

Hi, I am trying to use natural gradients + Adam (natgrad) with the DGP framework, and I am facing the same issue again when taking the natural-gradient step. I have tried all of the above and it's still not working. Am I missing anything?

@hughsalimbeni (Collaborator)

I intend to put up a notebook of demo usage for this soon, but for now a few pointers:

  • The nat grad step needs to be done first (i.e. nat grads, then Adam afterwards). Doing it the other way around can lead to divergence (especially if gamma=1). See the sketch after this list.
  • You can use nat grads for all the layers, or just the final layer. The final layer is easier, as then for a Gaussian likelihood you can probably use gamma ~ 0.1 and it should work fine.
  • For a non-Gaussian likelihood you need to be more careful and start the nat grad step quite small.
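In the meantime, here's a minimal sketch of that ordering, written against the newer GPflow 2 API (gpflow.optimizers.NaturalGradient; the 2018-era equivalent was NatGradOptimizer), using a single-layer SVGP with made-up data as a stand-in for the DGP:

```python
import numpy as np
import gpflow
import tensorflow as tf

# Stand-in model and data; substitute your DGP and dataset.
X = np.random.randn(100, 2)
Y = np.random.randn(100, 1)
model = gpflow.models.SVGP(
    kernel=gpflow.kernels.SquaredExponential(),
    likelihood=gpflow.likelihoods.Gaussian(),
    inducing_variable=X[:20].copy(),
)

# The nat grad optimizer owns the variational parameters,
# so keep Adam away from them.
gpflow.set_trainable(model.q_mu, False)
gpflow.set_trainable(model.q_sqrt, False)

natgrad = gpflow.optimizers.NaturalGradient(gamma=0.1)  # ~0.1 for a Gaussian likelihood
adam = tf.optimizers.Adam(0.01)
loss = model.training_loss_closure((X, Y))

for _ in range(100):
    natgrad.minimize(loss, [(model.q_mu, model.q_sqrt)])  # nat grad step first...
    adam.minimize(loss, model.trainable_variables)        # ...then Adam
```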
