This package facilitates bias adjustment of streamed daily or hourly climate variables using a quantile mapping approach. Quantile mapping aligns the statistical distribution of model outputs with observed data by adjusting modelled quantiles to match observed counterparts, thereby correcting biases across the entire distribution.
Building upon the trend-preserving bias adjustment method detailed by Stefan Lange in Trend-preserving bias adjustment and statistical downscaling with ISIMIP3BASD, this package employs the TDigest algorithm from the crick library to enable efficient, one-pass streaming adjustments. Unlike traditional methods that require complete datasets, this approach processes data sequentially, making it well-suited for large-scale climate data applications as well as for workflows that operate on streamed data.
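The following minimal sketch illustrates this one-pass idea: a TDigest is updated chunk by chunk and can then be queried for quantiles or cumulative probabilities. The data and variable names are illustrative only, and the sketch assumes the crick TDigest API with update(), quantile(), and cdf().

```python
import numpy as np
from crick import TDigest

digest = TDigest()

# Stream the data chunk by chunk; the digest keeps only a compact
# summary of the distribution, never the full series.
for _ in range(10):
    chunk = np.random.gamma(shape=2.0, scale=3.0, size=1_000)
    digest.update(chunk)

print("estimated median:", digest.quantile(0.5))
print("estimated P(value <= 5.0):", digest.cdf(5.0))
```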
A notable feature of this package is its specialized approach for bias-adjusting future climate data. Instead of applying simple quantile mapping, the method constructs a dynamically evolving future distribution, allowing climate projections to adapt over time while maintaining consistency with historical trends. This approach follows the methodology outlined by Stefan Lange, ensuring that bias adjustment for future climate data remains robust. Detailed information on this approach is provided in the How the Bias Adjustment Works section below.
- Streaming Workflow: Processes data in a streaming mode, ideal for large datasets.
- Historical Data Adjustment: Applies quantile mapping with annual trend correction for historical data, using a linear trend fitted from annual mean values. The trend correction is optional.
- Future Projections: Modified quantile mapping for future periods, blending historical and projected CDFs with optional trend correction.
- Hourly Data Handling: Aggregates hourly values to daily means or sums before adjustment and rescales them back to hourly values afterwards while preserving the original daily cycle (see the sketch after this list).
- Efficient Memory Usage: Uses TDigest objects to incrementally build and store cumulative distribution functions (CDFs) while processing data in a single pass.
- TDigest Analysis: Provides statistical summaries (maximum, minimum, mean, variance) of TDigest distributions, stored as NetCDF files for spatial analysis.
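As a rough illustration of the hourly data handling mentioned above, the sketch below aggregates hourly values to a daily value, applies an arbitrary daily correction, and rescales the hourly values so that the adjusted daily value is reproduced while the diurnal cycle is preserved. This is not the package's internal code; the rescaling scheme and function names are assumptions.

```python
import numpy as np

def adjust_hourly(hourly, adjust_daily, agg="sum"):
    """Aggregate 24 hourly values, bias-adjust the daily value with the
    supplied `adjust_daily` callable, and rescale the hourly values so
    that they reproduce the adjusted daily value."""
    hourly = np.asarray(hourly, dtype=float)
    daily = hourly.sum() if agg == "sum" else hourly.mean()
    daily_adj = adjust_daily(daily)
    if agg == "sum":
        # Multiplicative rescaling keeps the shape of the daily cycle.
        factor = daily_adj / daily if daily != 0 else 0.0
        return hourly * factor
    # Additive shift for mean-aggregated variables.
    return hourly + (daily_adj - daily)

# Example: an arbitrary +10 % daily correction applied to an hourly cycle.
hourly_cycle = np.sin(np.linspace(0.0, np.pi, 24)) ** 2
adjusted = adjust_hourly(hourly_cycle, lambda d: 1.1 * d, agg="sum")
```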
Install the package using pip:
pip install git+https://earth.bsc.es/gitlab/digital-twins/de_340-2/bias_adjustment.git

This package uses a quantile mapping approach for bias adjustment. This method requires a reference distribution of the variable, typically derived from observational or reforecast datasets. The adjustment is applied on a spatial grid, meaning the reference distribution must be available for each grid cell. Make sure the reference data has the same spatial resolution and grid locations as the data being bias-adjusted so that the correction is applied properly.
This package provides two key tools for bias adjustment:
- bias_corr_est: Creates reference distributions from a dataset that meets the required resolution and spatial extent. The tool fits one distribution per month and stores them in pickle files containing TDigest objects. If masked values exist in the dataset, a mask file is also created.
- bias_corr_map: Applies bias adjustment to input data using reference distributions. The tool requires a reference TDigest pickle file per month, optionally a mask file, the input dataset (to be bias-adjusted), and a model distribution file containing monthly distributions per grid cell.
- Important: If the input data is from a date beyond the future_start_date, an additional TDigest file containing the future distribution is required.
For historical data (before the future_start_date), bias adjustment is applied using standard quantile mapping:
- Daily Aggregation: If the variable is available at a sub-daily time scale, it is aggregated to daily resolution before adjustment and disaggregated back afterward.
- Detrending (if enabled): Before deriving the quantile of the input value, the input data is detrended.
- Quantile Lookup: The quantile of the detrended input value is determined using the model TDigest.
- Bias Adjustment: The corresponding value for that quantile is looked up in the reference TDigest.
- Retrending (if enabled): The trend is added back to the adjusted value.
- Sub-daily Disaggregation: If applicable, the data is disaggregated back to its original time resolution.
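A minimal sketch of the core quantile-mapping step for a single daily value and grid cell is shown below. It assumes crick TDigest objects for the model and reference distributions; detrending and sub-daily handling are omitted, and the function name is illustrative rather than part of the package API.

```python
def adjust_historical(value, model_digest, reference_digest):
    """Map one (detrended) model value onto the reference distribution.

    `model_digest` and `reference_digest` are per-cell, per-month
    crick TDigest objects loaded from the pickle files."""
    # Quantile lookup: position of the value in the model distribution.
    q = model_digest.cdf(value)
    # Bias adjustment: value at the same quantile of the reference
    # distribution (any removed trend would be added back afterwards).
    return reference_digest.quantile(q)
```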
For future data (i.e., data after the future_start_date defined in the CLI or ba_future_start_date in the ClimateDT workflow, with a default of 2022-01-01), a different approach is applied instead of simple quantile mapping:
- Daily Aggregation:
  - If the variable is available at a sub-daily time scale, it is aggregated to daily resolution before adjustment.
- Creation of a Future TDigest:
  - A new TDigest object is created for each grid cell.
  - This new TDigest is initialized using the historical TDigest objects created before the future start date.
  - A new corresponding pickle file is generated to store the future TDigest objects.
- Updating Only the Future TDigest:
  - Once the future period starts, the future TDigest is updated with incoming data.
  - The historical TDigest remains unchanged from this point onward.
- Controlling the Rate of Distribution Change:
  - The user can specify the future_weight parameter (CLI) or ba_future_weight (ClimateDT workflow) to accelerate distribution changes for future data.
  - A higher weight allows the future TDigest to adapt more quickly to incoming climate projections.
- Bias Adjustment in the Future Period: The adjustment follows these steps:
  - Detrending (if enabled): The input data is detrended before computing its quantile.
  - Quantile Lookup: Retrieve the quantile of the detrended input value from the future TDigest.
  - Reference and Model Lookup: Look up the corresponding value for that quantile in the reference TDigest and the historical model TDigest.
  - Future Adjustment: Compute the difference (for additive correction) or ratio (for multiplicative correction) between the future and historical model values and apply it to the reference value.
  - Retrending (if enabled): The trend is added back to the adjusted value.
  - Sub-daily Disaggregation: If applicable, the data is disaggregated back to its original time resolution.
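The sketch below illustrates the future-period adjustment for a single daily value and grid cell. It is illustrative only: the function and parameter names are not part of the package API, and the exact order in which the future TDigest is updated and queried may differ in the implementation.

```python
def adjust_future(value, future_digest, model_digest, reference_digest,
                  method="additive", future_weight=1.0):
    """Adjust one (detrended) value in the future period.

    All digests are per-cell, per-month crick TDigest objects; only the
    future digest keeps evolving with incoming data."""
    # Update only the future distribution, optionally with extra weight
    # so that it adapts faster to the incoming projections.
    future_digest.add(value, future_weight)

    # Quantile of the value in the evolving future distribution.
    q = future_digest.cdf(value)

    # Values at the same quantile in the reference, the historical model,
    # and the future distributions.
    ref_val = reference_digest.quantile(q)
    hist_val = model_digest.quantile(q)
    fut_val = future_digest.quantile(q)

    # Apply the future-vs-historical change to the reference value.
    if method == "additive":
        return ref_val + (fut_val - hist_val)
    # A real implementation would guard against hist_val == 0 here.
    return ref_val * (fut_val / hist_val)
```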
Before running the bias adjustment, consider the properties of the variable to be adjusted:
- Trend Correction: Decide whether a linear trend should be preserved (see the sketch after this list). This choice must be consistent between bias_corr_est and bias_corr_map.
- Adjustment Method: Choose between additive or multiplicative bias correction based on the variable's properties (e.g., additive for temperature, multiplicative for precipitation).
- Data Aggregation: For hourly data, decide whether to aggregate as a sum or a mean.
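If trend correction is enabled, a linear trend is fitted to annual mean values (see the feature list above). The sketch below only illustrates that general idea with numpy; it is not the package's detrending code, and the synthetic data and names are placeholders.

```python
import numpy as np

# Synthetic annual mean values for 30 years (placeholder data).
years = np.arange(1990, 2020)
annual_means = 10.0 + 0.03 * (years - years[0]) + np.random.normal(0.0, 0.5, years.size)

# Fit a linear trend to the annual means ...
slope, intercept = np.polyfit(years, annual_means, deg=1)
trend = slope * years + intercept

# ... remove it before the quantile lookup and add it back afterwards.
detrended = annual_means - trend
retrended = detrended + trend
```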
Estimate CDFs for a reference dataset using the CLI tool bias_corr_est. The reference data is typically based on observations and represents the best possible estimate of the real variable. The reference dataset should:
- Spatially match the model dataset: the resolution and grid of the reference dataset must match those of the data to be adjusted. If the resolution or grid structure differs, spatial interpolation may be necessary before bias correction so that reference and model data are comparable.
- Cover a sufficiently long period (typically 30 years) to estimate a meaningful distribution.
- Avoid prolonged abnormal weather patterns that could distort statistical properties.
- Be contained in a NetCDF file where the variable has three dimensions, with time as the first dimension.
Use bias_corr_map to apply bias correction using the generated TDigest files.
- TDigest files store probability distributions for each grid cell.
- Masks define the spatial extent of bias adjustment, ensuring corrections are only applied where valid reference data exists.
The bias adjustment in the ClimateDT workflow can be configured within the opa configuration with the following keywords:
bias_adjust: bool = False # Apply bias adjustment
ba_reference_dir: str = None # Directory for reference TDigest pickle files created in step 1
ba_lower_threshold: float = -np.inf # Exclude values below this threshold from bias adjustment
ba_non_negative: bool = False # Enforce non-negative corrected values
ba_agg_method: str = "sum" # Aggregation method for hourly data ("sum" or "mean")
ba_future_method: str = "additive" # Future adjustment method ("additive" or "multiplicative")
ba_future_weight: float = 1.0 # Weight for future values in TDigest blending
ba_future_start_date: str = "9999-12-31" # Start date for future projections
ba_detrend: bool = False # Detrend variable before adjustment
ba_detrend_skip_years: int = 2 # Years to skip at the start for detrending

The package provides three CLI tools:
The bias_corr_est command processes climate data to estimate cumulative distribution functions (CDFs) for each grid cell using TDigest objects. The tool generates one TDigest file per month, capturing the variable’s distribution across time.
- Reads climate data from NetCDF files.
- Computes TDigest distributions for each grid cell and month.
- Optionally applies a mask to align with a reference dataset.
- Supports detrending to preserve long-term trends.
- Outputs one TDigest pickle file per month, storing the estimated distributions.
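Conceptually, the result is one TDigest per grid cell and month, serialised to pickle files. The sketch below shows this idea using xarray and crick; the file names, the "pre" variable, the (time, y, x) dimension order, and the pickle layout are assumptions for illustration and do not describe the tool's exact output format.

```python
import pickle
import numpy as np
import xarray as xr
from crick import TDigest

ds = xr.open_dataset("reference.nc")   # placeholder reference file
var = ds["pre"]                        # assumed dimensions: (time, y, x)

# One TDigest per grid cell, fitted separately for each calendar month.
for month, monthly in var.groupby("time.month"):
    _, ny, nx = monthly.shape
    digests = np.empty((ny, nx), dtype=object)
    for j in range(ny):
        for i in range(nx):
            digest = TDigest()
            digest.update(monthly[:, j, i].values)
            digests[j, i] = digest
    # Placeholder file name; the tool defines its own naming scheme.
    with open(f"tdigest_reference_{month:02d}.pkl", "wb") as fh:
        pickle.dump(digests, fh)
```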
bias_corr_est -h

- -f, --fname (Default: precipitation_*.nc)
  Input file pattern with wildcards.
- -i, --in_path (Default: ``)
  Path to the input NetCDF files.
- -o, --output_tdigest (Default: tdigest_restart)
  Directory where TDigest files will be saved.
- -v, --vname (Default: pre)
  Name of the variable to process from the dataset.
- -l, --lower_threshold (Default: -inf)
  Values below this threshold are ignored in TDigest calculations.
- -m, --mask_file (Default: None)
  Path to a mask file to restrict calculations to a specific spatial domain.
- -p, --proceed (Flag, Default: False)
  If enabled, updates existing TDigest files instead of overwriting them.
- -d, --detrend (Flag, Default: False)
  If enabled, removes long-term trends before quantile mapping.
- -y, --detrend_skip_years (Default: 2)
  Number of initial years to skip when computing the detrending regression.
The bias_corr_map command applies bias correction to input climate data by mapping quantiles based on precomputed TDigest distributions. It supports both historical and future bias adjustment, with a switching mechanism based on a defined future_start_date.
- Reads climate data from NetCDF files and applies quantile mapping.
- Uses TDigest distributions for both reference and model data.
- Supports streaming workflows by incrementally updating distributions.
- Allows detrending before adjustment and reapplying trends afterward.
- Handles future climate projections by applying an adaptive bias adjustment approach.
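The incremental updating mentioned above can be pictured as loading an existing TDigest pickle, folding in the newly arrived data, and writing it back so the next chunk continues from the updated state. The sketch below is illustrative only; the file layout and names are assumptions.

```python
import pickle

def update_model_digests(pickle_path, new_chunk):
    """Fold one new data chunk of shape (time, y, x) into an existing
    per-cell TDigest pickle and write it back (restart-style update)."""
    with open(pickle_path, "rb") as fh:
        digests = pickle.load(fh)          # (y, x) array of TDigest objects
    ny, nx = digests.shape
    for j in range(ny):
        for i in range(nx):
            digests[j, i].update(new_chunk[:, j, i])
    with open(pickle_path, "wb") as fh:
        pickle.dump(digests, fh)
```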
bias_corr_map -h

- -f, --fname (Default: precipitation_*.nc)
  Input file pattern with wildcards.
- -i, --in_path (Default: /data/cats/data/processed/a07q/)
  Path to the input NetCDF files.
- -r, --tdigest_ref (Default: /data/cats/bias_cor/bias_corr_hack/tdigest_restart/)
  Path to reference TDigest pickle files.
- -t, --tdigest_model (Default: tdigest_model/)
  Path to model TDigest pickle files.
- -v, --vname (Default: vname)
  Name of the variable to process from the dataset.
- -o, --output_dir (Default: output_dir)
  Directory where the adjusted NetCDF output files will be saved.
- -l, --lower_threshold (Default: -inf)
  Values below this threshold are ignored in the adjustment.
- -n, --non_negative (Flag, Default: False)
  Ensures that output values are non-negative (cuts off values at 0).
- -m, --mask_file (Default: None)
  Path to a mask file to define the valid spatial domain.
- -a, --agg_meth (Default: sum)
  Defines how sub-daily data is aggregated before bias adjustment. Options: "sum", "mean".
- -e, --evaluate (Flag, Default: False)
  Runs bias correction in evaluation mode without updating TDigest files.
- -p, --proceed (Flag, Default: False)
  Enables updating of existing TDigest files instead of overwriting them.
- -s, --stream (Flag, Default: False)
  Streaming mode for workflows where bias correction is applied incrementally.
- -d, --detrend (Flag, Default: False)
  If enabled, removes long-term trends before quantile mapping.
- -y, --detrend_skip_years (Default: 2)
  Number of initial years to skip in the detrending regression.
- --future_start_date (Default: 2022-01-01)
  Date (YYYY-MM-DD) when the method switches from historical to future bias adjustment.
- --future_method (Default: additive)
  Method for future bias adjustment. Options: "additive", "multiplicative".
- --future_weight (Default: 100)
  Weight multiplier for future values to accelerate distribution adaptation.
The bias_corr_ana command provides an optional feature for analyzing the TDigest distributions stored in pickle files. It computes key statistics for each TDigest object in a given directory and outputs them as a NetCDF file, preserving the spatial grid and applying the associated mask.
Reads TDigest pickle files (typically 12, one per month) from a specified directory. Computes key statistics for each grid cell:
- Maximum
- Minimum
- Mean
- Variance
Stores results in a NetCDF file, structured with:
- A spatial grid aligned with the data
- A time axis representing each month
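The sketch below shows the kind of per-cell summary this step produces and how it could be written to NetCDF with xarray. It is illustrative only: it assumes the crick TDigest exposes min(), max(), and quantile(), approximates mean and variance from quantile bins (cf. the hist_size option), and uses placeholder file names.

```python
import pickle
import numpy as np
import xarray as xr

def digest_stats(digest, hist_size=10):
    """Approximate (min, max, mean, variance) from quantile bins."""
    qs = (np.arange(hist_size) + 0.5) / hist_size
    samples = np.array([digest.quantile(q) for q in qs])
    return digest.min(), digest.max(), samples.mean(), samples.var()

# Placeholder file name; one such file would exist per month.
with open("tdigest_reference_01.pkl", "rb") as fh:
    digests = pickle.load(fh)              # (y, x) array of TDigest objects

stats = np.array([[digest_stats(d) for d in row] for row in digests])
ds = xr.Dataset(
    {name: (("y", "x"), stats[:, :, k])
     for k, name in enumerate(["min", "max", "mean", "variance"])}
)
ds.to_netcdf("tdigest_analysis.nc")
```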
bias_corr_ana -h

- -n, --name
  Variable name used to determine TDigest file names.
- -t, --tdigest
  Path to the folder containing TDigest pickle files.
- -m, --mask_file
  Path to the mask file defining the valid spatial domain (must be in the TDigest folder).
- -o, --output_file (Default: tdigest_analysis.nc)
  Path to the output NetCDF file where statistics will be saved.
- -s, --hist_size (Default: 10)
  Number of bins used to estimate mean and variance from the TDigest histogram.
LGPLv3 (c) 2023-2025 Stephan Thober, Sebastian Müller, Matthias Kelbling