fix: enable torch.autocast for TP parallelism without FSDP #2213
base: main
Conversation
Remove the overly conservative restriction that disabled mixed precision for TP-only configurations. torch.autocast operates at the operator level and is orthogonal to tensor parallelism.

Before: TP-only training showed a warning and disabled mixed precision.
After: TP-only training uses torch.autocast for mixed precision.

Note: PP-only training uses schedule-based execution and does not use maybe_enable_amp (unchanged by this PR).

Affected configurations:
- TP-only (now enabled)
- DDP-only (already enabled)
- Single-device (already enabled)
- FSDP/HSDP (unchanged; mixed precision handled internally by fully_shard)
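For context, a minimal sketch of the behavior the description targets; the helper name follows maybe_enable_amp from the description, but the argument names and signature here are illustrative, not the actual torchtitan API:

```python
import contextlib
import torch


def maybe_enable_amp(mixed_precision: bool, uses_fsdp: bool,
                     dtype: torch.dtype = torch.bfloat16,
                     device_type: str = "cuda"):
    """Return an autocast context for non-FSDP configurations.

    FSDP/HSDP cast parameters and gradients internally via fully_shard,
    so autocast is skipped there. For TP-only, DDP-only, or single-device
    runs, operator-level autocast supplies the mixed precision.
    """
    if not mixed_precision or uses_fsdp:
        return contextlib.nullcontext()
    return torch.autocast(device_type=device_type, dtype=dtype)
```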
Pull request overview
This PR removes an overly conservative restriction that disabled mixed precision training for Tensor Parallelism (TP) configurations without FSDP. The change enables torch.autocast for TP-only training, recognizing that autocast operates at the operator level and is orthogonal to the parallelism strategy.
Key changes:
- Simplified the maybe_enable_amp function logic to enable autocast for all non-FSDP configurations
- Improved code comments to clarify when mixed precision is handled by FSDP vs. AMP
- Added an explanation that PP uses schedule-based execution and does not utilize this context
tianyu-l left a comment
Have you verified it works properly? Could you show evidence, in terms of param / activation / grad dtype, and throughput comparison with mixed precision off?
I vaguely remember that I've tried it before and it didn't work as expected.
https://huggingface.co/eousphoros/persona_eta_20b_131k — this model was trained with TP=4 and no FSDP. The output with autocast was in line with what I expected, though I lack the depth of knowledge to formally confirm this.
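A hedged sketch of how the requested dtype evidence could be gathered; the toy nn.Linear stands in for the TP-sharded model, and none of this code is from the PR itself:

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024).cuda()  # stand-in for the TP-sharded model
seen = {}
# Record the dtype of the layer's output produced under autocast.
model.register_forward_hook(lambda mod, inp, out: seen.update(act=out.dtype))

with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    out = model(torch.randn(8, 1024, device="cuda"))
out.float().sum().backward()

print("param dtype:     ", next(model.parameters()).dtype)       # expect torch.float32
print("activation dtype:", seen["act"])                          # expect torch.bfloat16
print("grad dtype:      ", next(model.parameters()).grad.dtype)  # expect torch.float32
```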
Thanks. It's hard to tell from the plots whether it is working properly. Also curious: why would you use AMP with TP but without FSDP?
autocast is the core feature, but autocast with parallelisms is not actively maintained. We have seen a performance gap between the autocast + DDP and FSDP implementations with the same world_size. IMO, if we can use FSDP, we should use FSDP.
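For comparison, a rough sketch of the two approaches being contrasted (assumes an initialized process group and a PyTorch version exposing FSDP2's fully_shard; the dtype choices are illustrative):

```python
import torch
from torch.distributed.fsdp import fully_shard, MixedPrecisionPolicy
from torch.nn.parallel import DistributedDataParallel as DDP


def wrap_with_fsdp(model: torch.nn.Module) -> torch.nn.Module:
    # FSDP2 handles mixed precision internally: params are cast to bf16
    # for compute, gradient reduction happens in fp32. No autocast needed.
    fully_shard(model, mp_policy=MixedPrecisionPolicy(
        param_dtype=torch.bfloat16, reduce_dtype=torch.float32))
    return model


def ddp_forward_with_amp(model: torch.nn.Module, batch: torch.Tensor) -> torch.Tensor:
    # DDP (or TP-only) path: operator-level autocast provides the bf16 compute.
    ddp_model = DDP(model)
    with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
        return ddp_model(batch)
```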
