
InvalidArgumentError: Input matrix is not invertible. #39

Open
JSP21 opened this issue Apr 12, 2019 · 11 comments
@JSP21

JSP21 commented Apr 12, 2019

Hi all,
I'm facing the following issue while executing Deep Gaussian Process SVI for a two-layer model.

I have tried adding jitter, centering the input data, trying various hyperparameter specifications, and upgrading the GPflow version, but couldn't resolve the error.

Any pointers, please! Thank you!

InvalidArgumentError (see above for traceback): Input matrix is not invertible.
[[Node: gradients/DGP-2c82c62a-25/conditional/base_conditional/Cholesky_grad/MatrixTriangularSolve = MatrixTriangularSolve[T=DT_FLOAT, adjoint=false, lower=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](DGP-2c82c62a-25/conditional/base_conditional/Cholesky, gradients/DGP-2c82c62a-25/conditional/base_conditional/Cholesky_grad/eye/MatrixDiag)]]

The full error trace is as follows:

File "/home/jaya/jayashree/cdgp_experiments/wconv_rbf.py", line 112, in <module>
m_dgp2 = make_dgp(2)
File "/home/jaya/jayashree/cdgp_experiments/wconv_rbf.py", line 103, in make_dgp
num_outputs=num_classes)
File "/home/jaya/.local/lib/python3.5/site-packages/gpflow/core/compilable.py", line 90, in __init__
self.build()
File "/home/jaya/.local/lib/python3.5/site-packages/gpflow/core/node.py", line 156, in build
self._build()
File "/home/jaya/.local/lib/python3.5/site-packages/gpflow/models/model.py", line 81, in _build
likelihood = self._build_likelihood()
File "/home/jaya/.local/lib/python3.5/site-packages/gpflow/decors.py", line 67, in tensor_mode_wrapper
result = method(obj, *args, **kwargs)
File "/home/jaya/jayashree/cdgp_experiments/dgp.py", line 106, in _build_likelihood
L = tf.reduce_sum(self.E_log_p_Y(self.X, self.Y))
File "/home/jaya/jayashree/cdgp_experiments/dgp.py", line 95, in E_log_p_Y
Fmean, Fvar = self._build_predict(X, full_cov=False, S=self.num_samples)
File "/home/jaya/.local/lib/python3.5/site-packages/gpflow/decors.py", line 67, in tensor_mode_wrapper
result = method(obj, *args, **kwargs)
File "/home/jaya/jayashree/cdgp_experiments/dgp.py", line 87, in _build_predict
Fs, Fmeans, Fvars = self.propagate(X, full_cov=full_cov, S=S)
File "/home/jaya/.local/lib/python3.5/site-packages/gpflow/decors.py", line 67, in tensor_mode_wrapper
result = method(obj, *args, **kwargs)
File "/home/jaya/jayashree/cdgp_experiments/dgp.py", line 76, in propagate
F, Fmean, Fvar = layer.sample_from_conditional(F, z=z, full_cov=full_cov)
File "/home/jaya/jayashree/cdgp_experiments/layers.py", line 111, in sample_from_conditional
mean, var = self.conditional(X, full_cov=full_cov)
File "/home/jaya/jayashree/cdgp_experiments/layers.py", line 96, in conditional
mean, var = single_sample_conditional(X_flat)
File "/home/jaya/jayashree/cdgp_experiments/layers.py", line 84, in single_sample_conditional
full_cov=full_cov, white=True)

InvalidArgumentError (see above for traceback): Input matrix is not invertible.
[[Node: gradients/DGP-2c82c62a-25/conditional/base_conditional/Cholesky_grad/MatrixTriangularSolve = MatrixTriangularSolve[T=DT_FLOAT, adjoint=false, lower=true, _device="/job:localhost/replica:0/task:0/device:CPU:0"](DGP-2c82c62a-25/conditional/base_conditional/Cholesky, gradients/DGP-2c82c62a-25/conditional/base_conditional/Cholesky_grad/eye/MatrixDiag)]]

@hughsalimbeni
Collaborator

The most likely cause is grossly misspecified hyperparameters. What's the data you're using? I generally rescale the data to unit standard deviation to avoid having to set them by hand. Another possible issue is a NaN in the data. Also, are you using float64 or float32?
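
For reference, a minimal NumPy sketch of the preprocessing suggested above (checking for NaNs and rescaling to zero mean, unit standard deviation); X and Y here are placeholders for your own training arrays.

```python
import numpy as np

# Placeholder arrays -- replace with your own inputs/targets.
X = np.random.randn(100, 5)
Y = np.random.randn(100, 1)

# Check for NaNs before training.
assert not np.isnan(X).any(), "NaNs found in X"
assert not np.isnan(Y).any(), "NaNs found in Y"

# Centre the inputs and rescale to unit standard deviation so that default
# kernel hyperparameters (lengthscales/variances of order one) are sensible.
X = (X - X.mean(axis=0)) / X.std(axis=0)
```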

@JSP21
Author

JSP21 commented Apr 18, 2019

I have rescaled the data to unit standard deviation and the data type is float32. I have also verified that there are no NaNs in the data.

@hughsalimbeni
Collaborator

Could you try with tf.float64 (set in gpflowrc)? float32 is sometimes a cause of numerical instability.
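
There are two ways to switch to double precision: edit gpflowrc, or override the loaded settings in code. Below is a minimal sketch of the latter; the attribute names assume the GPflow 1.x settings module (gpflow.settings.get_settings / temp_settings) and may differ between versions.

```python
import gpflow
import tensorflow as tf

# Assumed GPflow 1.x settings layout; the equivalent gpflowrc entries would be
#   [dtypes]    float_type = float64
#   [numerics]  jitter_level = 1e-6
config = gpflow.settings.get_settings()
config.dtypes.float_type = tf.float64   # double precision throughout
config.numerics.jitter_level = 1e-6     # jitter added before Cholesky factorisations

with gpflow.settings.temp_settings(config):
    # Build and compile the model inside this context so it picks up float64.
    # Make sure the numpy training data is float64 as well.
    pass
```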

@hughsalimbeni
Collaborator

(and I'm assuming jitter is 1e-6)
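
To see why the precision matters, here is a small self-contained NumPy sketch (no GPflow involved): with 1e-6 jitter, the Cholesky factorisation of an ill-conditioned RBF Gram matrix typically fails in float32 but succeeds in float64.

```python
import numpy as np

def try_cholesky(K, jitter, dtype):
    """Attempt a Cholesky factorisation at the given precision and jitter level."""
    Kj = K.astype(dtype) + jitter * np.eye(K.shape[0], dtype=dtype)
    try:
        np.linalg.cholesky(Kj)
        return "ok"
    except np.linalg.LinAlgError:
        return "failed: matrix is not numerically positive definite"

# RBF Gram matrix over tightly packed inputs -- severely ill-conditioned.
X = np.linspace(0.0, 1.0, 200)[:, None]
K = np.exp(-0.5 * (X - X.T) ** 2)  # unit lengthscale and variance

print("float32, jitter 1e-6:", try_cholesky(K, 1e-6, np.float32))
print("float64, jitter 1e-6:", try_cholesky(K, 1e-6, np.float64))
```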

@JSP21
Author

JSP21 commented Apr 26, 2019

Thank you so much. It works!

@JSP21
Author

JSP21 commented May 16, 2019

Also, could you please explain why the variational parameters q_mu and q_sqrt might turn to NaN when the number of layers is increased?

@hughsalimbeni
Collaborator

When using the natural gradient optimizer, the gradient step is taken directly in the natural parameters, as if they were unconstrained. However, not all values of the natural parameters are valid (the corresponding covariance has to remain positive definite). Sometimes a gradient step is too large and moves to invalid values, resulting in a NaN update to q_sqrt. It is actually possible to take natural gradient steps in other parameterizations, but in practice it doesn't seem to work so well. See this paper for details.
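
As a toy NumPy illustration of that failure mode (not GPflow code): for a Gaussian with covariance S, one natural parameter is -0.5 * S^{-1}, which has to stay negative definite; a step that is too large leaves that region, so the implied covariance is no longer positive definite and its Cholesky, and hence q_sqrt, breaks down. The gradient direction and step sizes below are made up purely for illustration.

```python
import numpy as np

# Start from a valid Gaussian q(u) = N(m, S).
S = np.array([[1.0, 0.3],
              [0.3, 1.0]])
theta2 = -0.5 * np.linalg.inv(S)     # natural parameter; must stay negative definite

# Hypothetical natural-gradient direction, with a safe and an overly large step.
direction = np.eye(2)
for gamma in [0.1, 5.0]:
    theta2_new = theta2 + gamma * direction   # the update itself is unconstrained
    S_new = -0.5 * np.linalg.inv(theta2_new)  # implied covariance after the step
    try:
        np.linalg.cholesky(S_new)
        status = "valid covariance"
    except np.linalg.LinAlgError:
        status = "invalid covariance (this is where q_sqrt would turn to NaN)"
    print(f"gamma = {gamma}: {status}")
```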

@JSP21
Author

JSP21 commented May 16, 2019

Thank you, that makes sense.

I am also facing a scenario where the variational parameters are updated during the learning process, but the kernel parameters are not. I wrote a new kernel, initialising its parameters with gpflow.params.Parameter(). Any pointers on how to get the kernel parameters to update during optimisation?

@hughsalimbeni
Collaborator

If you're optimizing hyperparameters then you need an additional optimizer. I tend to alternate between natural gradient steps and Adam steps. See, for example: https://github.com/hughsalimbeni/DGPs_with_IWVI/blob/3f6fab39586f9e45dbc26c6dec91394f9b052e9e/experiments/build_models.py#L293
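
For reference, a minimal sketch of that alternating scheme, assuming the GPflow 1.x optimizers gpflow.train.NatGradOptimizer and gpflow.train.AdamOptimizer; `model` is a placeholder for the compiled DGP, and the attribute path to each layer's q_mu/q_sqrt depends on your layer definitions. The linked build_models.py builds optimize actions once instead of calling minimize inside the loop, which is more efficient, but the idea is the same.

```python
import gpflow

# 'model' is your compiled DGP model (e.g. the make_dgp(2) from the traceback above).
# Variational parameters are handled by natural gradients; everything else by Adam.
variational_params = [(layer.q_mu, layer.q_sqrt) for layer in model.layers]

# Stop Adam from also updating the variational parameters.
for q_mu, q_sqrt in variational_params:
    q_mu.trainable = False
    q_sqrt.trainable = False

natgrad = gpflow.train.NatGradOptimizer(0.1)   # natural-gradient step size (gamma)
adam = gpflow.train.AdamOptimizer(0.01)        # Adam learning rate

for _ in range(100):
    # One natural-gradient step on (q_mu, q_sqrt) per layer, then one Adam step
    # on the kernel/likelihood hyperparameters and inducing inputs.
    natgrad.minimize(model, var_list=variational_params, maxiter=1)
    adam.minimize(model, maxiter=1)
```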

@JSP21
Author

JSP21 commented May 17, 2019

Thank you. What is the intuition behind using two optimizers? Would Adam alone not suffice for learning both?

@hughsalimbeni
Collaborator

Yes, it can suffice, and that is indeed what I used to do. This paper https://arxiv.org/abs/1905.03350 looks at this issue in more detail.
