How to continue fine-tuning a DPA-3 model from an intermediate checkpoint? #5163
Has anyone tried hot-restarting the fine-tuning process of a DPA-3 model? What would be the correct command? My original training command was:

```bash
dp --pt train input.json --finetune DPA-3.1-3M.pt --model-branch Omat24
```

Now, since my fine-tuning task aborted due to a hardware malfunction, I need to hot-restart from the model-402000.ckpt.pt checkpoint. To resume, should I modify my command as follows?

```bash
dp --pt train input.json --finetune --restart model-xxxx.ckpt.pt --model-branch Omat24
```

Thanks!
Hi @qchempku2017! I'm Dosu and I'm helping the deepmd-kit team. To resume fine-tuning your DPA-3 model from an intermediate checkpoint, use only the `--restart` flag with your checkpoint file. Do not combine `--finetune` and `--restart`: they are mutually exclusive and will cause an error if used together. If your fine-tuning run generated an out.json file, use it as your input instead of the original input.json to ensure all model parameters match the checkpoint (see the example commands below).

This is the standard and recommended way to hot-restart fine-tuning after an interruption. The `--restart` flag continues training from the checkpoint, preserving all learned weights and fine-tuning progress so far. You do not need to specify `--finetune` again; that flag is only for starting a new fine-tuning run from a pretrained model, not for resuming an interrupted one. For more details, see this discussion and the documentation.

If you run into errors about model parameter mismatches, make sure your input JSON matches the model structure in the checkpoint. For multi-task fine-tuning, the model section must be consistent with the pretrained model; if needed, you can extract the model parameters from the pretrained .pt file and update your input JSON accordingly (details here). Let me know if you need troubleshooting tips for specific errors!

To reply, just mention @dosu.
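For example, using the checkpoint file named in the question (adjust the file names to match your own run):

```bash
# Resume from the intermediate checkpoint -- --restart alone, no --finetune
dp --pt train input.json --restart model-402000.ckpt.pt

# If the interrupted run wrote an out.json, use it as the input instead,
# so that all model parameters match the checkpoint
dp --pt train out.json --restart model-402000.ckpt.pt
```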
@qchempku2017 You do not need to add `--model-branch Omat24` while restarting, since the fine-tuned checkpoint is already a single-task model with just one model branch. The correct command is shown below; `--skip-neighbor-stat` is recommended here when recomputing neighbor statistics is unnecessary.
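For example, again using the checkpoint name from the question:

```bash
# Restart from the fine-tuned checkpoint; --model-branch is no longer needed
# because the checkpoint is already single-task. --skip-neighbor-stat skips
# the (unnecessary) neighbor-statistics pass.
dp --pt train input.json --restart model-402000.ckpt.pt --skip-neighbor-stat
```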