Conversation

@weifengpy (Contributor) commented Jan 28, 2026

command: CUDA_VISIBLE_DEVICES=4,5,6,7 NGPU=4 CONFIG_FILE="./torchtitan/models/deepseek_v3/train_configs/deepseek_v3_16b.toml" ./run_train.sh

FSDP2 supports a per-param mesh: pytorch/pytorch#173509

This PR applies fully_shard on each transformer_block, sharding expert parameters on edp_mesh and all other parameters on dp_mesh.

  • FSDPModule schedules 2 all-gathers sequentially: the 1st for the transformer block's non-expert parameters, the 2nd for the experts

This makes it possible to apply torch.compile on each transformer_block.

def _shard_placement_fn(param: nn.Parameter) -> ShardPlacementResult:
    if param in expert_params:
        # Expert parameters: use Shard(1) on edp_mesh
        return ShardPlacementResult(
            placement=Shard(1), mesh_info=edp_mesh_info
        )
    else:
        # Non-expert parameters: use Shard(0) on dp_mesh
        return ShardPlacementResult(
            placement=Shard(0), mesh_info=dp_mesh_info
        )
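Below is a minimal sketch of how the placement function above might be wired in, assuming the shard_placement_fn argument on fully_shard and the per-param-mesh ShardPlacementResult return type from pytorch/pytorch#173509; the model.layers / dp_mesh names and the prior setup of expert_params, edp_mesh_info, and dp_mesh_info are illustrative assumptions based on this PR's description, not its exact code.

# Sketch only (assumptions noted above): shard each transformer block with the
# per-param placement function, then shard the root module last.
from torch.distributed.fsdp import fully_shard

# expert_params is assumed to hold all expert parameters, collected beforehand,
# so _shard_placement_fn can route them to Shard(1) on edp_mesh and every other
# parameter to Shard(0) on dp_mesh.
for transformer_block in model.layers.values():
    fully_shard(
        transformer_block,
        mesh=dp_mesh,
        shard_placement_fn=_shard_placement_fn,
    )
    # Each block becomes its own FSDPModule, so torch.compile can also be
    # applied per transformer_block.

# Shard the root module last, per the usual FSDP2 wrapping order.
fully_shard(model, mesh=dp_mesh)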
