Copilot AI commented Jan 10, 2026

Fix starting_points not preserving FLAML_sample_size

Plan:

  • Explore repository and understand the issue
  • Identify root cause: best_config_per_estimator property removes FLAML_sample_size via sanitize() method
  • Create test to reproduce the issue with sample size preservation
  • Fix best_config_per_estimator to preserve FLAML_sample_size when present
  • Run test to verify the fix works
  • Check for any other impacts of the change
  • Run existing tests to ensure no regressions
  • Update documentation to reflect the code changes
  • Request code review (completed - ready for human review)

Root Cause:

The issue was in flaml/automl/automl.py at line 481:

  • best_config_per_estimator was calling AutoMLState.sanitize(), which removes FLAML_sample_size from configs
  • When these configs were passed as starting_points to a new AutoML fit, the sample size information was lost
  • As a result, the new run used different sample sizes and essentially started from scratch instead of warm-starting properly (a minimal reproduction sketch follows this list)
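
For illustration, here is a minimal warm-start sketch using FLAML's public API (AutoML.fit, best_config_per_estimator, and the starting_points argument); the dataset size, time budgets, and estimator choice are arbitrary, not taken from the PR:

# Minimal warm-start sketch (illustrative sizes and budgets).
from flaml import AutoML
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=50000, random_state=0)

first = AutoML()
first.fit(X, y, task="classification", time_budget=20, estimator_list=["lgbm"])

# Before the fix, sanitize() stripped FLAML_sample_size from this mapping,
# so the sample size reached by the first run was lost.
starting_points = first.best_config_per_estimator

second = AutoML()
second.fit(
    X,
    y,
    task="classification",
    time_budget=20,
    estimator_list=["lgbm"],
    starting_points=starting_points,  # now includes FLAML_sample_size
)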

Fix Applied:

Modified the best_config_per_estimator property to manually copy and clean the config without removing FLAML_sample_size:

# OLD CODE (removed FLAML_sample_size):
return {
    e: e_search_state.best_config and AutoMLState.sanitize(e_search_state.best_config)
    for e, e_search_state in self._search_states.items()
}

# NEW CODE (preserves FLAML_sample_size):
result = {}
for e, e_search_state in self._search_states.items():
    if e_search_state.best_config:
        config = e_search_state.best_config.get("ml", e_search_state.best_config).copy()
        # Remove internal keys that are not needed for starting_points, but keep FLAML_sample_size
        config.pop("learner", None)
        config.pop("_choice_", None)
        result[e] = config
    else:
        result[e] = None
return result
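
For illustration, the mapping returned after the fix has roughly this shape; the hyperparameter names and values below are hypothetical, not real output:

# Hypothetical shape of best_config_per_estimator after the fix.
best_configs = {
    "lgbm": {
        "n_estimators": 4,
        "num_leaves": 4,
        "learning_rate": 0.1,
        "FLAML_sample_size": 45000,  # kept by the new code; sanitize() dropped it
    },
    "xgboost": None,  # estimators with no best config map to None
}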

Documentation Updates:

Updated website/docs/Use-Cases/Task-Oriented-AutoML.md:

  • Added note in "Warm start" section explaining FLAML_sample_size preservation
  • Added example output showing FLAML_sample_size in best_config_per_estimator
  • Explained importance for effective warm-starting with correct sample sizes

Testing Evidence:

Test with a 50k-sample dataset (which forces sampling):

Dataset size: 50000
First run - Best loss: 0.1651
  Config keys: ['FLAML_sample_size', 'colsample_bytree', ...]
  Contains FLAML_sample_size: True
  FLAML_sample_size value: 45000
  ✓ FLAML_sample_size correctly preserved in best_config_per_estimator!

Second run trial 1 config: {..., 'FLAML_sample_size': 45000}
✓ Test passed! FLAML_sample_size is preserved correctly in starting_points.
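
A hedged sketch of such a preservation test, assuming FLAML's public API; the dataset, budget, and estimator choice are illustrative, not the exact test in the PR:

# Sketch of a sample-size preservation test.
from flaml import AutoML
from sklearn.datasets import make_classification


def test_sample_size_preserved_in_starting_points():
    X, y = make_classification(n_samples=50000, random_state=0)

    automl = AutoML()
    automl.fit(X, y, task="classification", time_budget=20, estimator_list=["lgbm"])

    config = automl.best_config_per_estimator["lgbm"]
    # With a 50k-row dataset and a small budget, sampling should be active,
    # so the preserved resource attribute is expected in the config.
    assert config is not None and "FLAML_sample_size" in config

    # A second run should accept the returned configs unchanged.
    warm = AutoML()
    warm.fit(
        X,
        y,
        task="classification",
        time_budget=20,
        estimator_list=["lgbm"],
        starting_points=automl.best_config_per_estimator,
    )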

Existing tests all pass:

  • test_fit_w_freezinghp_starting_point
  • test_nobudget
  • test_FLAML_sample_size_in_starting_points (explicitly tests FLAML_sample_size handling)
  • test_iris (default test with starting_points)
  • Basic integration tests

Impact:

  • Minimal code change: Only 11 lines modified in one function
  • Documentation updated: Clear explanation for users
  • No breaking changes: best_config property (used by estimators) unchanged
  • Fixes reported issue: Users can now properly warm-start AutoML runs with preserved sample sizes
  • Backward compatible: All existing functionality preserved

Closes #1318



Modified best_config_per_estimator property to keep FLAML_sample_size when returning best configurations. Previously, AutoMLState.sanitize() was removing this key, which caused the sample size information to be lost when using starting_points from a previous run.

Co-authored-by: thinkall <[email protected]>
Copilot AI changed the title from "[WIP] Fix starting points not utilized in AutoML fitting" to "Fix: Preserve FLAML_sample_size in best_config_per_estimator" on Jan 10, 2026
Copilot AI requested a review from thinkall on January 10, 2026 at 08:44
thinkall marked this pull request as ready for review on January 19, 2026 at 15:50
thinkall (Collaborator) commented:

@copilot , update document to reflect the code changes.

Updated Task-Oriented-AutoML.md to document that best_config_per_estimator now preserves FLAML_sample_size:
- Added note in "Warm start" section explaining that FLAML_sample_size is preserved for effective warm-starting
- Added note in "Get best configuration" section with example showing FLAML_sample_size in output
- Explains importance of sample size preservation for continuing optimization with correct sample sizes

Co-authored-by: thinkall <[email protected]>

Copilot AI commented Jan 19, 2026

> @copilot , update document to reflect the code changes.

Updated documentation in commit f09fe5c. Added notes in Task-Oriented-AutoML.md:

  • Warm start section: explains FLAML_sample_size preservation for effective warm-starting
  • Best configuration section: shows example output with FLAML_sample_size and explains its importance

thinkall merged commit 5f1aa2d into main on Jan 19, 2026
16 checks passed
thinkall deleted the copilot/fix-starting-point-usage branch on January 19, 2026 at 23:42

Linked issue (may be closed by merging): starting_point not used
