I want to train the model on two servers with one GPU each. After I set up the configuration and launched the run, the program hung at one point and stopped responding. I'm sure the program works when I train on a single server.
export MASTER_ADDR=192.168.1.12
export MASTER_PORT=17788
export NODE_RANK=0
(py36tr108cu117) (base) cx@v100:~/ViLT-master$ python run.py with data_root=../../data/TrinityMultimodalTrojAI-main/data/clean/ num_gpus=1 num_nodes=2 task_finetune_vqa_randaug per_gpu_batchsize=64 load_path=../../data/model_weight/vilt_200k_mlm_itm.ckpt
WARNING - root - Changed type of config entry "max_steps" from int to NoneType
WARNING - ViLT - No observers have been added to this run
INFO - ViLT - Running command 'main'
INFO - ViLT - Started
Global seed set to 0
INFO - lightning - Global seed set to 0
GPU available: True, used: True
INFO - lightning - GPU available: True, used: True
TPU available: None, using: 0 TPU cores
INFO - lightning - TPU available: None, using: 0 TPU cores
Using environment variable NODE_RANK for node rank (0).
INFO - lightning - Using environment variable NODE_RANK for node rank (0).
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
INFO - lightning - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Using native 16bit precision.
INFO - lightning - Using native 16bit precision.
Missing logger folder: result/finetune_vqa_randaug_seed0_from_vilt_200k_mlm_itm
WARNING - lightning - Missing logger folder: result/finetune_vqa_randaug_seed0_from_vilt_200k_mlm_itm
Global seed set to 0
INFO - lightning - Global seed set to 0
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/2
INFO - lightning - initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/2
INFO - root - Added key: store_based_barrier_key:1 to store for rank: 0
The program hangs at this point.
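The last log lines (`MEMBER: 1/2` and the `store_based_barrier_key:1` entry) suggest that rank 0 is waiting for the second node to join the rendezvous at `MASTER_ADDR:MASTER_PORT`. Below is a minimal sketch for checking, from the second server, whether that address is reachable at all while rank 0 is still waiting. It uses only the Python standard library; the helper names `can_reach_master` and `describe_rendezvous` are hypothetical and not part of ViLT or Lightning. A successful connection only shows basic TCP reachability, not that the distributed backend itself will work.

```python
import os
import socket


def can_reach_master(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def describe_rendezvous(env=None):
    """Collect the rendezvous settings from the environment (os.environ by default)."""
    env = os.environ if env is None else env
    return {
        "MASTER_ADDR": env.get("MASTER_ADDR"),
        "MASTER_PORT": env.get("MASTER_PORT"),
        "NODE_RANK": env.get("NODE_RANK"),
    }


if __name__ == "__main__":
    settings = describe_rendezvous()
    print(settings)
    if settings["MASTER_ADDR"] and settings["MASTER_PORT"]:
        ok = can_reach_master(settings["MASTER_ADDR"], int(settings["MASTER_PORT"]))
        print("master reachable:", ok)
    else:
        print("MASTER_ADDR / MASTER_PORT are not set in this shell")
```

Run this on the second server in the same shell that exports the variables, while the first server is sitting at the point shown above. If it reports that the master is not reachable, a firewall, a wrong address, or a wrong port is a more likely cause than the training code.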