CUDA error: device-side assert triggered #4

Open
Risho92 opened this issue Sep 10, 2020 · 8 comments
Risho92 commented Sep 10, 2020

python3 run.py --model AttH --max_epochs 1 --batch_size 2

I tried to run the AttH model with the command above and got the error "CUDA error: device-side assert triggered". The full traceback is below. I am on Ubuntu 20 with CUDA 11. Could you please provide some guidance?

Traceback (most recent call last):
File "run.py", line 191, in <module>
train(parser.parse_args())
File "run.py", line 142, in train
train_loss = optimizer.epoch(train_examples)
File "/home/<user_name>/Desktop/AttH/KGEmb/optimizers/kg_optimizer.py", line 175, in epoch
l = self.calculate_loss(input_batch)
File "/home/<user_name>/Desktop/AttH/KGEmb/optimizers/kg_optimizer.py", line 120, in calculate_loss
loss, factors = self.neg_sampling_loss(input_batch)
File "/home/<user_name>/Desktop/AttH/KGEmb/optimizers/kg_optimizer.py", line 80, in neg_sampling_loss
positive_score, factors = self.model(input_batch)
File "/home/<user_name>/Desktop/AttH/KGEmb/hyp_kg_env/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/<user_name>/Desktop/AttH/KGEmb/models/base.py", line 140, in forward
lhs_e, lhs_biases = self.get_queries(queries)
File "/home/<user_name>/Desktop/AttH/KGEmb/models/hyperbolic.py", line 94, in get_queries
rot_q = givens_rotations(rot_mat, head).view((-1, 1, self.rank))
File "/home/<user_name>/Desktop/AttH/KGEmb/utils/euclidean.py", line 41, in givens_rotations
givens = givens / torch.norm(givens, p=2, dim=-1, keepdim=True)
File "/home/<user_name>/Desktop/AttH/KGEmb/hyp_kg_env/lib/python3.7/site-packages/torch/functional.py", line 1123, in norm
return _VF.norm(input, p, _dim, keepdim=keepdim)
RuntimeError: CUDA error: device-side assert triggered
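
A note for anyone debugging this: CUDA kernels run asynchronously, so the Python line shown in a device-side assert traceback is not always the line that launched the failing kernel. A minimal sketch, assuming the training script is started through a small wrapper (the `enable_sync_cuda_errors` helper is hypothetical and not part of this repo): set `CUDA_LAUNCH_BLOCKING=1` before CUDA is initialized so the error surfaces at the offending call.

```python
import os


def enable_sync_cuda_errors(env=None):
    """Ask CUDA to launch kernels synchronously so errors surface at the call that caused them.

    Must run before the first CUDA call (simplest: before `import torch`).
    `env` defaults to os.environ; it is a parameter only to make the helper easy to test.
    """
    env = os.environ if env is None else env
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


if __name__ == "__main__":
    enable_sync_cuda_errors()
    # Import torch and start training only after this point.
```

The same effect can be had from the shell by prefixing the original command with `CUDA_LAUNCH_BLOCKING=1`. Running once on CPU is another common way to get a readable Python exception instead of a device-side assert.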

@kingsaint

I am facing the same issue. Has it been resolved? If so, please let me know how.

@ines-chami
Contributor

@kingsaint could you share the command you are running?

kingsaint commented Dec 2, 2020

@ines-chami It looks like the --multi_c option needs to be on? The following command worked:
python run.py --dataset YAGO3-10 --model AttH --max_epochs 500 --patience 10 --rank 200 --neg_sample_size -1 --learning_rate 0.0005 --multi_c

But if I don't want multiple curvatures per relation (that is, without --multi_c), it does not work: an index out of range error occurs at line 91 of models/hyperbolic.py.

@ines-chami
Contributor

The command:
python run.py --dataset YAGO3-10 --model AttH --max_epochs 500 --patience 10 --rank 200 --neg_sample_size -1 --learning_rate 0.0005 --batch_size 100
worked fine for me (I had to reduce the batch size to avoid memory issues).

Could you share your training log or a screenshot so I can see where the error happens?

kingsaint commented Dec 2, 2020

I used your command and got this error

Traceback (most recent call last):
File "run.py", line 191, in <module>
train(parser.parse_args())
File "run.py", line 142, in train
train_loss = optimizer.epoch(train_examples)
File "/common/home/rb897/KGEmb/optimizers/kg_optimizer.py", line 175, in epoch
l = self.calculate_loss(input_batch)
File "/common/home/rb897/KGEmb/optimizers/kg_optimizer.py", line 122, in calculate_loss
predictions, factors = self.model(input_batch, eval_mode=True)
File "/common/users/rb897/local/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/common/home/rb897/KGEmb/models/base.py", line 140, in forward
lhs_e, lhs_biases = self.get_queries(queries)
File "/common/home/rb897/KGEmb/models/hyperbolic.py", line 95, in get_queries
rot_q = givens_rotations(rot_mat, head).view((-1, 1, self.rank))
File "/common/home/rb897/KGEmb/utils/euclidean.py", line 43, in givens_rotations
x_rot = givens[:, :, 0:1] * x + givens[:, :, 1:] * torch.cat((-x[:, :, 1:], x[:, :, 0:1]), dim=-1)
RuntimeError: CUDA error: device-side assert triggered
/opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda ->auto::operator()(int)->auto: block: [0,0,0], thread: [96,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda ->auto::operator()(int)->auto: block: [0,0,0], thread: [97,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda ->auto::operator()(int)->auto: block: [0,0,0], thread: [98,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1565272271120/work/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda ->auto::operator()(int)->auto: block: [0,0,0], thread: [99,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
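
The `index out of bounds` assertion from IndexKernel.cu means some tensor was indexed with an id outside its valid range. A minimal sketch of a pre-flight check, assuming each training example is an integer (head, relation, tail) triple and that the sizes of the entity and relation tables are known; `find_out_of_range`, `n_entities`, and `n_relations` are hypothetical names for illustration, not this repo's API.

```python
def find_out_of_range(triples, n_entities, n_relations):
    """Return (row, column_name, value) for every id that falls outside its table.

    Assumes each triple is (head, relation, tail) with integer ids.
    Negative ids are reported too, which is stricter than the CUDA assertion.
    """
    limits = (n_entities, n_relations, n_entities)
    names = ("head", "relation", "tail")
    bad = []
    for row, triple in enumerate(triples):
        for name, value, limit in zip(names, triple, limits):
            if not 0 <= value < limit:
                bad.append((row, name, value))
    return bad


if __name__ == "__main__":
    examples = [(0, 1, 2), (3, 5, 1), (4, 0, 9)]
    for row, name, value in find_out_of_range(examples, n_entities=5, n_relations=4):
        print(f"row {row}: {name} id {value} is out of range")
```

Running a check like this on the CPU before training can narrow down whether the ids in the dataset or the size of a lookup table is at fault.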

@ines-chami
Contributor

The first issue, posted by @Risho92, seems to be caused by a divide-by-zero error, which should be fixed in this commit:
7390004
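
For illustration only (this is not necessarily what the commit does): dividing a vector by its L2 norm breaks when the norm is zero, and a common guard is to clamp the norm to a small positive floor before dividing. A plain-Python sketch; `safe_normalize` and `eps` are hypothetical names, and in PyTorch the analogous step would be applying `torch.clamp(norm, min=eps)` to the norm tensor.

```python
import math


def safe_normalize(vec, eps=1e-15):
    """L2-normalize `vec`, clamping the norm to at least `eps` to avoid dividing by zero."""
    norm = math.sqrt(sum(x * x for x in vec))
    norm = max(norm, eps)  # a zero vector stays zero instead of raising or producing NaN
    return [x / norm for x in vec]
```

With this guard an all-zero input is returned unchanged, while any vector whose norm is above `eps` is scaled to unit length as before.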

@kingsaint your issue seems to be triggered somewhere else but I cannot reproduce the bug. I am using Python 3.7.3 and the packages below:

  • numpy==1.18.3
  • torch==1.5.0

@Sahajtomar

Hey, I am also getting this error. Can anyone help me?

PhaelIshall commented Jun 9, 2021

@Sahajtomar It works perfectly if you pin these versions in the requirements.txt file:

   numpy==1.18.3
   torch==1.5.0
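
To confirm that the active environment really has these versions, something like the following can help. It is a minimal sketch, assuming Python 3.8 or newer (where `importlib.metadata` is in the standard library); `check_pins` is a hypothetical helper, and the `get_version` parameter exists only so the lookup can be swapped out.

```python
from importlib import metadata  # standard library from Python 3.8


def check_pins(pins, get_version=metadata.version):
    """Return {package: (wanted, installed)} for every pin that does not match.

    `installed` is None when the package is missing. A local build suffix such
    as "+cu101" is ignored when comparing.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = get_version(name).split("+")[0]
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    print(check_pins({"numpy": "1.18.3", "torch": "1.5.0"}))
```

An empty dictionary means both pins match the installed packages.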
