nohup: ignoring input
W0908 09:31:40.935000 4132297 site-packages/torch/distributed/run.py:774]
W0908 09:31:40.935000 4132297 site-packages/torch/distributed/run.py:774] *****************************************
W0908 09:31:40.935000 4132297 site-packages/torch/distributed/run.py:774] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0908 09:31:40.935000 4132297 site-packages/torch/distributed/run.py:774] *****************************************
wandb: Appending key for api.wandb.ai to your netrc file: /home/yitongli/.netrc
wandb: Currently logged in as: liyitong (liyitong-Tsinghua University) to https://api.wandb.ai. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.21.3
wandb: Run data is saved locally in ./output/wandb/run-20250908_093154-g6k5a9jv
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run fluent-field-871
wandb: ⭐️ View project at https://wandb.ai/liyitong-Tsinghua%20University/self-forcing
wandb: 🚀 View run at https://wandb.ai/liyitong-Tsinghua%20University/self-forcing/runs/g6k5a9jv
run dir: ./output/wandb/run-20250908_093154-g6k5a9jv/files
KV inference with 3 frames per block    [printed once per rank, 8 ranks]
ODERegression initialized.              [printed once per rank, 8 ranks]
cache a block wise causal mask with block size of 3 frames
BlockMask(shape=(1, 1, 32768, 32768), sparsity=42.52%, (0, 0)
    [block-sparsity diagram elided: lower block-triangular fill, i.e. a block-wise causal pattern]
)
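[note] The BlockMask printout above is FlexAttention's repr (torch.nn.attention.flex_attention). A minimal sketch of how a block-wise causal mask of this shape can be constructed; the tokens-per-frame value is a hypothetical stand-in, only the 3-frames-per-block and 32768-token figures come from this log:

    from torch.nn.attention.flex_attention import create_block_mask

    FRAMES_PER_BLOCK = 3       # "3 frames per block", from the log above
    TOKENS_PER_FRAME = 1024    # assumed for illustration only
    BLOCK_TOKENS = FRAMES_PER_BLOCK * TOKENS_PER_FRAME
    SEQ_LEN = 32768            # matches the printed BlockMask shape

    def block_causal(b, h, q_idx, kv_idx):
        # Full attention inside a block of frames, causal across blocks:
        # a query attends to every key whose block index is <= its own.
        return (kv_idx // BLOCK_TOKENS) <= (q_idx // BLOCK_TOKENS)

    # B=None, H=None broadcast over batch and heads, giving shape (1, 1, Q, KV).
    mask = create_block_mask(block_causal, B=None, H=None,
                             Q_LEN=SEQ_LEN, KV_LEN=SEQ_LEN, device="cuda")
    print(mask)  # BlockMask(shape=(1, 1, 32768, 32768), sparsity=..., ...)

The printed sparsity (42.52% here) is roughly the fraction of attention tiles that can be skipped entirely, so it depends on how the block boundaries line up with FlexAttention's tile size (128 by default).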
Start gathering distributed model states...   [printed once per rank, 8 ranks]
/home/yitongli/miniconda3/envs/causvid/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py:678: FutureWarning: FSDP.state_dict_type() and FSDP.set_state_dict_type() are being deprecated. Please use APIs, get_state_dict() and set_state_dict(), which can support different parallelisms, FSDP1, FSDP2, DDP. API doc: https://pytorch.org/docs/stable/distributed.checkpoint.html#torch.distributed.checkpoint.state_dict.get_state_dict . Tutorial: https://pytorch.org/tutorials/recipes/distributed_checkpoint_recipe.html .
  warnings.warn(
    [FutureWarning above repeated once per rank, 8 ranks]
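[note] The FutureWarning above points to the torch.distributed.checkpoint.state_dict APIs. A minimal sketch of the suggested replacement for the deprecated FSDP.state_dict_type() context, assuming the goal is a full, CPU-offloaded state dict saved from rank 0; the function name and save path are placeholders, not taken from this training script:

    import torch
    import torch.distributed as dist
    from torch.distributed.checkpoint.state_dict import (
        StateDictOptions,
        get_model_state_dict,
    )

    def gather_full_model_state(model: torch.nn.Module) -> dict:
        # Collective call: every rank must enter it, even though only
        # rank 0 ends up writing the checkpoint file.
        opts = StateDictOptions(full_state_dict=True, cpu_offload=True)
        return get_model_state_dict(model, options=opts)

    # state = gather_full_model_state(fsdp_model)
    # if dist.get_rank() == 0:
    #     torch.save(state, "model.pt")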
Model saved to ./output/2025-09-08-09-31-54.546544_seed6944469/checkpoint_model_000000/model.pt
training step 0...
Saving video:   0%|          | 0/81 [00:00<?, ?it/s]
Traceback (most recent call last):
  File ".../bin/torchrun", line ..., in <module>
    sys.exit(main())
  File "/home/yitongli/miniconda3/envs/causvid/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 357, in wrapper
    return f(*args, **kwargs)
  File "/home/yitongli/miniconda3/envs/causvid/lib/python3.10/site-packages/torch/distributed/run.py", line 901, in main
    run(args)
  File "/home/yitongli/miniconda3/envs/causvid/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
    elastic_launch(
  File "/home/yitongli/miniconda3/envs/causvid/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 143, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/yitongli/miniconda3/envs/causvid/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    result = agent.run()
  File "/home/yitongli/miniconda3/envs/causvid/lib/python3.10/site-packages/torch/distributed/elastic/metrics/api.py", line 138, in wrapper
    result = f(*args, **kwargs)
  File "/home/yitongli/miniconda3/envs/causvid/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py", line 715, in run
    result = self._invoke_run(role)
  File "/home/yitongli/miniconda3/envs/causvid/lib/python3.10/site-packages/torch/distributed/elastic/agent/server/api.py", line 879, in _invoke_run
    time.sleep(monitor_interval)
  File "/home/yitongli/miniconda3/envs/causvid/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py", line 84, in _terminate_process_handler
    raise SignalException(f"Process {os.getpid()} got signal: {sigval}", sigval=sigval)
torch.distributed.elastic.multiprocessing.api.SignalException: Process 4132297 got signal: 1
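[note] Signal 1 is SIGHUP: the session that launched the job went away. nohup starts the command with SIGHUP set to SIG_IGN, but the traceback shows torchrun's elastic agent registered its own handler (_terminate_process_handler), and installing a handler replaces the ignored disposition, so the hangup still terminated the run. A minimal sketch of that pattern, with RuntimeError standing in for torch's SignalException:

    import os
    import signal

    def _terminate_process_handler(signum, frame):
        # torchrun raises SignalException here; it propagates out of the
        # time.sleep(monitor_interval) call in the agent's monitor loop.
        raise RuntimeError(f"Process {os.getpid()} got signal: {signum}")

    # Re-registering a handler undoes the SIG_IGN that nohup installed.
    signal.signal(signal.SIGHUP, _terminate_process_handler)

A common workaround is to detach from the session entirely, e.g. `setsid nohup torchrun ... > train.log 2>&1 &`, or to launch under tmux/screen or a job scheduler, so no SIGHUP reaches the process when the terminal closes.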