Conversation

@wwwjn (Contributor) commented Jan 13, 2026

meta-cla bot added the CLA Signed label on Jan 13, 2026
wwwjn added a commit that referenced this pull request Jan 13, 2026
ghstack-source-id: bcd9f5e
Pull Request resolved: #2221
@wwwjn changed the title from "refactor save and load model weights using DCP" to "[WIP] refactor save and load model weights using DCP" on Jan 13, 2026
wwwjn added a commit that referenced this pull request Jan 13, 2026
ghstack-source-id: b7642b4
Pull Request resolved: #2221
wwwjn added a commit that referenced this pull request Jan 14, 2026
ghstack-source-id: 87a29dc
Pull Request resolved: #2221
@wwwjn changed the title from "[WIP] refactor save and load model weights using DCP" to "[rl] refactor save and load model weights using DCP" on Jan 14, 2026
self.temp_model_dir = os.path.abspath(
    os.path.join(job_config.job.dump_folder, "vllm_temp_model")
)
# Load TorchTitan plugin at runtime
from torchtitan.experiments.rl.unified.plugin import register
@acisseJZhong (Contributor) commented Jan 14, 2026


use from torchtitan.experiments.rl.unified import register?
can we move the import statement to the header?
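For illustration, a minimal sketch of the suggested change, assuming torchtitan.experiments.rl.unified re-exports register from its package __init__ and that register is called once during setup; the surrounding class and call site are hypothetical:

```python
import os

# Suggested: import the plugin entry point at the top of the module
# (assumes the package __init__ re-exports register; otherwise keep
# the .plugin submodule path).
from torchtitan.experiments.rl.unified import register


class GeneratorWorker:  # hypothetical class name for illustration
    def __init__(self, job_config):
        # Register the TorchTitan plugin once during construction instead
        # of importing it lazily inside the method that needs it.
        register()
        self.temp_model_dir = os.path.abspath(
            os.path.join(job_config.job.dump_folder, "vllm_temp_model")
        )
```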


return self.load_weights_from_state_dict(torchtitan_state_dict)

def load_weights(self, weights_iter):

shall we load weights from weights_iter?
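A possible shape for that, as a sketch: assume weights_iter yields (name, tensor) pairs in vLLM's usual load_weights convention and funnel them into the state-dict path added in this PR; any name mapping between vLLM and TorchTitan parameter names is elided.

```python
from typing import Iterable, Tuple

import torch


def load_weights(self, weights_iter: Iterable[Tuple[str, torch.Tensor]]):
    # Materialize the (name, tensor) pairs into a plain state dict and
    # delegate to the in-place loader introduced in this PR. Sketch only;
    # real code may need to translate vLLM names to TorchTitan names.
    state_dict = dict(weights_iter)
    return self.load_weights_from_state_dict(state_dict)
```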

@fegin (Contributor) commented Jan 14, 2026

I'm wondering whether we should refactor the TorchTitan checkpointer so that it can be used directly in this case. While the current PR works, if TorchTitan migrates to a new checkpoint library, other use cases will need the same updates as well. This is more future work, not blocking this PR.
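For context, the DCP round trip this PR builds on can be sketched roughly as below; the paths, the "model" key, and the trainer_model/generator_model names are illustrative, and the real Checkpointer wraps additional state (optimizer, dataloader, etc.):

```python
import torch.distributed.checkpoint as dcp
from torch.distributed.checkpoint.state_dict import (
    get_model_state_dict,
    set_model_state_dict,
)

# Trainer side: each rank saves its shards of the model weights.
trainer_sd = {"model": get_model_state_dict(trainer_model)}
dcp.save(trainer_sd, checkpoint_id="/tmp/vllm_temp_model")  # path illustrative

# Generator side: load into the generator's own sharding; DCP reshards
# the saved tensors to match the layout of the destination state dict.
generator_sd = {"model": get_model_state_dict(generator_model)}
dcp.load(generator_sd, checkpoint_id="/tmp/vllm_temp_model")
set_model_state_dict(generator_model, generator_sd["model"])
```

Folding this behind the existing Checkpointer, as suggested, would keep the on-disk format and save/load logic in one place if the backing library ever changes.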

return logits

def load_weights(self, weights_iter):
def load_weights_from_state_dict(self, titan_state_dict):

titan_state_dict is ambiguous -- both sides should be titan models.
What other name could we use, e.g. trainer_state_dict?

# We need to split our weights to match the original 2-shard structure
import glob
# directly update model weights in place
load_weights = self._get_model().load_weights_from_state_dict(state_dict)

  • IIUC this only works when the trainer and the generator are on exactly the same global mesh. Is that right?
  • Has this been the assumption before this PR as well? I.e., does our current Monarch script only allow a colocated trainer and generator?
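To make the assumption concrete, a rough sketch of what a direct in-place update amounts to (names are illustrative): each local parameter on the generator is overwritten with the corresponding trainer tensor, which only succeeds if both sides produce identically shaped local shards, i.e. the same global mesh and sharding.

```python
import torch


def load_weights_from_state_dict(self, trainer_state_dict):
    # Sketch of the in-place update being discussed: copy each trainer
    # tensor into the matching generator parameter. copy_ requires the
    # local shapes to match, so trainer and generator must shard the
    # model identically (same global mesh) for this to work.
    model = self._get_model()
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in trainer_state_dict:
                src = trainer_state_dict[name]
                param.copy_(src.to(device=param.device, dtype=param.dtype))
```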


Labels

ciflow/8gpu, CLA Signed

5 participants