Copilot AI commented Jan 10, 2026

Fix starting_points not preserving FLAML_sample_size

Plan:

  • Explore repository and understand the issue
  • Identify root cause: best_config_per_estimator property removes FLAML_sample_size via sanitize() method
  • Create test to reproduce the issue with sample size preservation
  • Fix best_config_per_estimator to preserve FLAML_sample_size when present
  • Run test to verify the fix works
  • Check for any other impacts of the change
  • Run existing tests to ensure no regressions
  • Update documentation to reflect the code changes
  • Request code review (completed - ready for human review)

Root Cause:

The issue was in flaml/automl/automl.py at line 481:

  • best_config_per_estimator was calling AutoMLState.sanitize(), which removes FLAML_sample_size from configs
  • When these configs were passed as starting_points to a new AutoML fit, the sample size information was lost
  • As a result, the new run used different sample sizes and essentially started from scratch instead of warm-starting properly (a minimal reproduction sketch follows this list)
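
For illustration, here is a minimal warm-start sketch using FLAML's public API (AutoML.fit, best_config_per_estimator, and the starting_points argument); the dataset size, time budgets, and estimator choice are arbitrary, not taken from the PR:

# Minimal warm-start sketch (illustrative sizes and budgets).
from flaml import AutoML
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=50000, random_state=0)

first = AutoML()
first.fit(X, y, task="classification", time_budget=20, estimator_list=["lgbm"])

# Before the fix, sanitize() stripped FLAML_sample_size from this mapping,
# so the sample size reached by the first run was lost.
starting_points = first.best_config_per_estimator

second = AutoML()
second.fit(
    X,
    y,
    task="classification",
    time_budget=20,
    estimator_list=["lgbm"],
    starting_points=starting_points,  # now includes FLAML_sample_size
)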

Fix Applied:

Modified the best_config_per_estimator property to manually copy and clean the config without removing FLAML_sample_size:

# OLD CODE (removed FLAML_sample_size):
return {
    e: e_search_state.best_config and AutoMLState.sanitize(e_search_state.best_config)
    for e, e_search_state in self._search_states.items()
}

# NEW CODE (preserves FLAML_sample_size):
result = {}
for e, e_search_state in self._search_states.items():
    if e_search_state.best_config:
        config = e_search_state.best_config.get("ml", e_search_state.best_config).copy()
        # Remove internal keys that are not needed for starting_points, but keep FLAML_sample_size
        config.pop("learner", None)
        config.pop("_choice_", None)
        result[e] = config
    else:
        result[e] = None
return result
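
For illustration, the mapping returned after the fix has roughly this shape; the hyperparameter names and values below are hypothetical, not real output:

# Hypothetical shape of best_config_per_estimator after the fix.
best_configs = {
    "lgbm": {
        "n_estimators": 4,
        "num_leaves": 4,
        "learning_rate": 0.1,
        "FLAML_sample_size": 45000,  # kept by the new code; sanitize() dropped it
    },
    "xgboost": None,  # estimators with no best config map to None
}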

Documentation Updates:

Updated website/docs/Use-Cases/Task-Oriented-AutoML.md:

  • Added note in "Warm start" section explaining FLAML_sample_size preservation
  • Added example output showing FLAML_sample_size in best_config_per_estimator
  • Explained importance for effective warm-starting with correct sample sizes

Testing Evidence:

Test with a 50k-sample dataset (which forces sampling):

Dataset size: 50000
First run - Best loss: 0.1651
  Config keys: ['FLAML_sample_size', 'colsample_bytree', ...]
  Contains FLAML_sample_size: True
  FLAML_sample_size value: 45000
  ✓ FLAML_sample_size correctly preserved in best_config_per_estimator!

Second run trial 1 config: {..., 'FLAML_sample_size': 45000}
✓ Test passed! FLAML_sample_size is preserved correctly in starting_points.
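
A hedged sketch of such a preservation test, assuming FLAML's public API; the dataset, budget, and estimator choice are illustrative, not the exact test in the PR:

# Sketch of a sample-size preservation test.
from flaml import AutoML
from sklearn.datasets import make_classification


def test_sample_size_preserved_in_starting_points():
    X, y = make_classification(n_samples=50000, random_state=0)

    automl = AutoML()
    automl.fit(X, y, task="classification", time_budget=20, estimator_list=["lgbm"])

    config = automl.best_config_per_estimator["lgbm"]
    # With a 50k-row dataset and a small budget, sampling should be active,
    # so the preserved resource attribute is expected in the config.
    assert config is not None and "FLAML_sample_size" in config

    # A second run should accept the returned configs unchanged.
    warm = AutoML()
    warm.fit(
        X,
        y,
        task="classification",
        time_budget=20,
        estimator_list=["lgbm"],
        starting_points=automl.best_config_per_estimator,
    )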

Existing tests all pass:

  • test_fit_w_freezinghp_starting_point
  • test_nobudget
  • test_FLAML_sample_size_in_starting_points (explicitly tests FLAML_sample_size handling)
  • test_iris (default test with starting_points)
  • Basic integration tests

Impact:

  • Minimal code change: Only 11 lines modified in one function
  • Documentation updated: Clear explanation for users
  • No breaking changes: best_config property (used by estimators) unchanged
  • Fixes reported issue: Users can now properly warm-start AutoML runs with preserved sample sizes
  • Backward compatible: All existing functionality preserved

Closes #1318



Modified best_config_per_estimator property to keep FLAML_sample_size when returning best configurations. Previously, AutoMLState.sanitize() was removing this key, which caused the sample size information to be lost when using starting_points from a previous run.

Co-authored-by: thinkall <[email protected]>
Copilot AI changed the title from "[WIP] Fix starting points not utilized in AutoML fitting" to "Fix: Preserve FLAML_sample_size in best_config_per_estimator" on Jan 10, 2026
Copilot AI requested a review from thinkall on January 10, 2026 at 08:44
thinkall marked this pull request as ready for review on January 19, 2026 at 15:50
thinkall (Collaborator) commented:

@copilot , update document to reflect the code changes.

Updated Task-Oriented-AutoML.md to document that best_config_per_estimator now preserves FLAML_sample_size:
- Added note in "Warm start" section explaining that FLAML_sample_size is preserved for effective warm-starting
- Added note in "Get best configuration" section with example showing FLAML_sample_size in output
- Explains importance of sample size preservation for continuing optimization with correct sample sizes

Co-authored-by: thinkall <[email protected]>

Copilot AI commented Jan 19, 2026

> @copilot , update document to reflect the code changes.

Updated documentation in commit f09fe5c. Added notes in Task-Oriented-AutoML.md:

  • Warm start section: explains FLAML_sample_size preservation for effective warm-starting
  • Best configuration section: shows example output with FLAML_sample_size and explains its importance

thinkall merged commit 5f1aa2d into main on Jan 19, 2026
16 checks passed
thinkall deleted the copilot/fix-starting-point-usage branch on January 19, 2026 at 23:42

Linked issue (may be closed by merging): starting_point not used
