Error when fine-tuning DPA3 on a 2D dataset with the Alex2D branch #5155
Unanswered
Zhou-tao-USTC asked this question in Q&A
Replies: 1 comment
Your KeyError: Suggested fixes:
Checklist:
To investigate the checkpoint structure further, you can use
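As a minimal sketch of such a checkpoint inspection (not taken from the reply above), the helpers below list the parameter names stored in the `.pt` file and look for names close to the one in the KeyError. Several things here are assumptions: that the weights sit under a top-level `"model"` entry, that multi-task parameter names follow the `model.<branch>.<rest>` pattern seen in the traceback, and the helper names `load_state_keys`, `branch_counts`, and `nearest_keys`, which are hypothetical and not part of DeePMD-kit.

```python
import difflib
from collections import Counter


def load_state_keys(path):
    """Return the parameter names stored in a PyTorch checkpoint file."""
    import torch  # imported lazily so the helpers below work without PyTorch

    # weights_only=False unpickles arbitrary objects: use it only on trusted files.
    ckpt = torch.load(path, map_location="cpu", weights_only=False)
    # Assumption: the state dict sits under a top-level "model" entry;
    # otherwise treat the loaded object itself as the state dict.
    state = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt
    return [key for key in state.keys() if isinstance(key, str)]


def branch_counts(keys):
    """Count parameters per branch, assuming names like 'model.<branch>.<rest>'."""
    counts = Counter()
    for key in keys:
        parts = key.split(".")
        if len(parts) >= 3 and parts[0] == "model":
            counts[parts[1]] += 1
    return counts


def nearest_keys(missing, keys, n=3):
    """Return up to n stored names that most resemble the missing one."""
    return difflib.get_close_matches(missing, keys, n=n, cutoff=0.6)
```

Calling `load_state_keys("../DPA-3.1-3M.pt")`, then `branch_counts(...)` on the result, would show whether a branch named `Alex2D` is present at all. Passing `'model.Alex2D.atomic_model.models.0.out_bias'` to `nearest_keys` shows which stored names come closest; if the closest ones lack the `models.0` segment, the model built from the input file has a different module layout than the checkpoint.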
The log is:
cuda-12.9 loaded successful
/var/spool/slurmd/job784291/slurm_script: line 5: export: `968': not a valid identifier
To get the best performance, it is recommended to adjust the number of threads by setting the environment variables OMP_NUM_THREADS, DP_INTRA_OP_PARALLELISM_THREADS, and DP_INTER_OP_PARALLELISM_THREADS. See https://deepmd.rtfd.io/parallelism/ for more information.
[2026-01-15 17:52:42,080] DEEPMD INFO DeePMD version: 3.1.0
[2026-01-15 17:52:42,081] DEEPMD INFO Configuration path: finetune_input.json
[DeePMD-kit ASCII-art banner]
[2026-01-15 17:52:43,891] DEEPMD INFO Please read and cite:
[2026-01-15 17:52:43,891] DEEPMD INFO Wang, Zhang, Han and E, Comput.Phys.Comm. 228, 178-184 (2018)
[2026-01-15 17:52:43,892] DEEPMD INFO Zeng et al, J. Chem. Phys., 159, 054801 (2023)
[2026-01-15 17:52:43,892] DEEPMD INFO Zeng et al, J. Chem. Theory Comput., 21, 4375-4385 (2025)
[2026-01-15 17:52:43,893] DEEPMD INFO See https://deepmd.rtfd.io/credits/ for details.
[2026-01-15 17:52:43,893] DEEPMD INFO --------------------------------------------------------------------------------------------------------------------------
[2026-01-15 17:52:43,894] DEEPMD INFO installed to: /data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/deepmd
[2026-01-15 17:52:43,894] DEEPMD INFO source:
[2026-01-15 17:52:43,894] DEEPMD INFO source branch: HEAD
[2026-01-15 17:52:43,895] DEEPMD INFO source commit: 8b3dc08
[2026-01-15 17:52:43,895] DEEPMD INFO source commit at: 2025-06-11 13:00:46 +0200
[2026-01-15 17:52:43,896] DEEPMD INFO use float prec: double
[2026-01-15 17:52:43,896] DEEPMD INFO build variant: cuda
[2026-01-15 17:52:43,897] DEEPMD INFO Backend: PyTorch
[2026-01-15 17:52:43,897] DEEPMD INFO PT ver: v2.6.0-gUnknown
[2026-01-15 17:52:43,898] DEEPMD INFO Enable custom OP: True
[2026-01-15 17:52:43,898] DEEPMD INFO build with PT ver: 2.6.0
[2026-01-15 17:52:43,898] DEEPMD INFO build with PT inc: /data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/torch/include
[2026-01-15 17:52:43,899] DEEPMD INFO /data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/torch/include/torch/csrc/api/include
[2026-01-15 17:52:43,899] DEEPMD INFO build with PT lib: /data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/torch/lib
[2026-01-15 17:52:43,900] DEEPMD INFO running on: g0018
[2026-01-15 17:52:43,900] DEEPMD INFO computing device: cuda:0
[2026-01-15 17:52:43,901] DEEPMD INFO CUDA_VISIBLE_DEVICES: 0
[2026-01-15 17:52:43,901] DEEPMD INFO Count of visible GPUs: 1
[2026-01-15 17:52:43,901] DEEPMD INFO num_intra_threads: 0
[2026-01-15 17:52:43,902] DEEPMD INFO num_inter_threads: 0
[2026-01-15 17:52:43,902] DEEPMD INFO --------------------------------------------------------------------------------------------------------------------------
[2026-01-15 17:52:44,993] DEEPMD INFO Constructing DataLoaders from 1 systems
[2026-01-15 17:52:45,538] DEEPMD INFO ---Summary of DataSystem: training -----------------------------------------------
[2026-01-15 17:52:45,538] DEEPMD INFO found 1 system(s):
[2026-01-15 17:52:45,539] DEEPMD INFO system natoms bch_sz n_bch prob pbc
[2026-01-15 17:52:45,539] DEEPMD INFO ../../init_data/C222N20 242 1 100 1.000e+00 T
[2026-01-15 17:52:45,540] DEEPMD INFO --------------------------------------------------------------------------------------
[2026-01-15 17:52:45,540] DEEPMD INFO Resuming from ../DPA-3.1-3M.pt.
Traceback (most recent call last):
File "/data/home/sczc382/run/deepmd-kit/bin/dp", line 10, in <module>
sys.exit(main())
^^^^^^
File "/data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/deepmd/main.py", line 930, in main
deepmd_main(args)
File "/data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/entrypoints/main.py", line 532, in main
train(
File "/data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/entrypoints/main.py", line 342, in train
trainer = get_trainer(
^^^^^^^^^^^^
File "/data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/entrypoints/main.py", line 188, in get_trainer
trainer = training.Trainer(
^^^^^^^^^^^^^^^^^
File "/data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/train/training.py", line 529, in __init__
collect_single_finetune_params(
File "/data/home/sczc382/run/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/train/training.py", line 523, in collect_single_finetune_params
_origin_state_dict[new_key].clone().detach()
~~~~~~~~~~~~~~~~~~^^^^^^^^^
KeyError: 'model.Alex2D.atomic_model.models.0.out_bias'
The input settings are:
{
"model": {
"use_srtab": "../C_N_zbl.txt",
"smin_alpha": 0.1,
"sw_rmin": 0.5,
"sw_rmax": 0.9,
"type_map": [
"C",
"N"
],
"descriptor": {
"type": "dpa3",
"repflow": {
"n_dim": 128,
"e_dim": 64,
"a_dim": 32,
"nlayers": 6,
"e_rcut": 6.0,
"e_rcut_smth": 5.3,
"e_sel": 1200,
"a_rcut": 4.0,
"a_rcut_smth": 3.5,
"a_sel": 300,
"axis_neuron": 4,
"skip_stat": true,
"a_compress_rate": 1,
"a_compress_e_rate": 2,
"a_compress_use_split": true,
"update_angle": true,
"smooth_edge_update": true,
"use_dynamic_sel": true,
"sel_reduce_factor": 10.0,
"use_exp_switch": true,
"update_style": "res_residual",
"update_residual": 0.1,
"update_residual_init": "const"
},
"activation_function": "custom_silu:3.0",
"precision": "float32",
"use_tebd_bias": false,
"concat_output_tebd": false
},
"fitting_net": {
"neuron": [
240,
240,
240
],
"activation_function": "custom_silu:3.0",
"resnet_dt": true,
"precision": "float32",
"dim_case_embd": 31,
"seed": 1,
"_comment": " that's all"
},
"_comment": " that's all"
},
"learning_rate": {
"type": "exp",
"decay_steps": 50,
"start_lr": 1e-3,
"stop_lr": 3e-5,
"_comment": "that's all"
},
"loss": {
"type": "ener",
"start_pref_e": 0.2,
"limit_pref_e": 20,
"start_pref_f": 100,
"limit_pref_f": 60,
"start_pref_v": 0.02,
"limit_pref_v": 1,
"_comment": " that's all"
},
"training": {
"stat_file": "./dpa3.hdf5",
"training_data": {
"systems": [
"../../init_data/C222N20"
],
"batch_size": 1,
"_comment": "that's all"
},
"numb_steps": 10000,
"warmup_steps": 0,
"gradient_max_norm": 5.0,
"seed": 10,
"disp_file": "lcurve.out",
"disp_freq": 100,
"save_freq": 2000,
"_comment": "that's all"
},
"_comment": "that's all"
}
The submission script is:
#!/bin/bash
module load cuda/12.9
source ~/run/deepmd-kit/bin/activate
export DP_INFER_BATCH_SIZE= 968
dp --pt train finetune_input.json --skip-neighbor-stat --finetune ../DPA-3.1-3M.pt --model-branch Alex2D
#lmp -i input.lammps -v restart 0