Transform 1-6 of top ten RUS12 tables #4970

Merged
aesharpe merged 20 commits into main from
rus12-transform-2
Feb 5, 2026

Conversation

@aesharpe
Member

@aesharpe aesharpe commented Jan 22, 2026

Overview

Part of #4901 and #4884.

What problem does this address?

Adds transform functions for top RUS tables

  • "statement_of_operations"
  • "balance_sheet"
  • "sources_and_distribution"
  • "renewable_plants"
  • "plant_labor"
  • "loans"
  • "long_term_debt"
  • "meeting_and_board"
  • "lines_and_stations_labor_materials"
  • "borrowers"

What did you change?

  • Add dbt schema.yml file for transformed tables
  • Add transform function for transformed tables
  • Update alembic
  • Migrate transformed table schemas from DRAFT_RESOURCE_METADATA to RESOURCE_METADATA

TO-DO

  • Finish migrating table schema to RESOURCE_METADATA from DRAFT_RESOURCE_METADATA
  • Keep updating alembic
  • Keep adding new fields to fields.py
  • Keep adding new dbt schemas and simple tests

Documentation

Make sure to update relevant aspects of the documentation:

  • Update the release notes: reference the PR and related issues.
  • Update relevant Data Source jinja templates (see docs/data_sources/templates).
  • Update relevant table or source description metadata (see src/metadata).
  • Review and update any other aspects of the documentation that might be affected by this PR.

Testing

How did you make sure this worked? How can a reviewer verify this?

  • Materialize the asset in dagster
  • Run unit tests

To-do list

  • If updating analyses or data processing functions: make sure to update row count expectations in dbt tests.
  • Run pixi run pre-commit-run to run linters and static code analysis checks.
  • Run pixi run pytest-ci locally to ensure that the merge queue will accept your PR.
  • Review the PR yourself and call out any questions or issues you have.
  • For PRs that change the PUDL outputs significantly, run the full ETL locally and then run the data validations using dbt. If you can't run the ETL locally then run the build-deploy-pudl GitHub Action manually and ensure that it succeeds.

@aesharpe aesharpe self-assigned this Jan 22, 2026
@aesharpe aesharpe added the new-data, integrate, and rus12 labels Jan 22, 2026
@aesharpe aesharpe moved this from New to In progress in Catalyst Megaproject Jan 22, 2026
@aesharpe aesharpe requested a review from cmgosnell January 24, 2026 00:07
@@ -1,4 +1,4 @@
-year_index,borrower_id_rus,borrower_name_rus,report_year,report_month,debt_description,balance_end_of_report_year,interest,principal,total
+year_index,borrower_id_rus,borrower_name_rus,report_year,report_month,debt_description,debt_balance_end_of_report_year,debt_interest_billed,debt_principal_billed,total
Member Author

@aesharpe aesharpe Jan 24, 2026

I updated these columns so it would be more clear what they were. Take note @cmgosnell for rus7 if you think this is a good change.

Member

i like the add of debt_ at the beginning (i added loan_ but i like this better i will mirror) but i don't love _billed. I'd rather go with ending_balance instead of balance_end_of_report_year bc that's what we use for ferc1 and this is super analogous. the form says "billed this year". I think in general it is expected that the values reported are reported during the report date period and idk billed just seems superfluous but i won't die on that hill.

total should have the same treatment!

I'd shoot towards this schema:

"fields": [
                "report_date",
                "borrower_id_rus",
                "borrower_name_rus",
                "debt_description",
                "debt_ending_balance",
                "debt_interest",
                "debt_principal",
                "debt_total",
            ],
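A minimal sketch of how the renaming proposed above could be expressed, using plain Python dictionaries rather than the actual PUDL transform code. `RENAME_MAP` and `rename_debt_record` are hypothetical names. Building `report_date` from the first day of `report_month` and dropping `year_index` are assumptions based on the field list above, not something this PR confirms.

```python
from datetime import date

# Hypothetical mapping from the raw column names in the diff to the
# names proposed in this comment.
RENAME_MAP = {
    "balance_end_of_report_year": "debt_ending_balance",
    "interest": "debt_interest",
    "principal": "debt_principal",
    "total": "debt_total",
}

# Raw columns that do not appear in the proposed schema (assumption).
DROPPED = {"year_index", "report_year", "report_month"}


def rename_debt_record(raw: dict) -> dict:
    """Return a copy of ``raw`` that uses the proposed debt column names."""
    out = {RENAME_MAP.get(key, key): value for key, value in raw.items() if key not in DROPPED}
    # Assumption: report_date is the first day of the reported month.
    out["report_date"] = date(int(raw["report_year"]), int(raw["report_month"]), 1)
    return out
```

Keeping the mapping in a single dictionary would make it easy to mirror the same names in the rus7 tables.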

Member Author

I excluded the debt_total field here because it was redundant, do you think I should add it back in?

Member

i think we should keep the totals in! if we have time to validate that interest + principal = total then i could imagine removing it but unless/until then i think keeping it in is more safe (also what if there are things in total like fees or other cram that means you can't always add em up)
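One way the validation mentioned above might look, sketched with plain Python records instead of the real table. The column names come from the schema proposed earlier in this thread. `find_total_mismatches` and the one-cent absolute tolerance are assumptions for illustration only.

```python
import math


def find_total_mismatches(records, abs_tol=0.01):
    """Return records where debt_total differs from interest + principal."""
    mismatches = []
    for rec in records:
        parts = (rec["debt_interest"], rec["debt_principal"], rec["debt_total"])
        if any(value is None for value in parts):
            continue  # rows with missing values cannot be checked
        interest, principal, total = parts
        # Allow for rounding in the reported values (tolerance is an assumption).
        if not math.isclose(interest + principal, total, abs_tol=abs_tol):
            mismatches.append(rec)
    return mismatches
```

If this returned an empty list for every report year, `debt_total` would be redundant. Any rows it does return would be the ones to inspect for fees or other components that are not broken out.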

@aesharpe aesharpe changed the title Transform top ten RUS12 tables Transform 1-5 of top ten RUS12 tables Jan 29, 2026
@aesharpe aesharpe marked this pull request as ready for review January 29, 2026 20:21
@aesharpe aesharpe changed the title Transform 1-5 of top ten RUS12 tables Transform 1-6 of top ten RUS12 tables Jan 29, 2026
Member

@cmgosnell cmgosnell left a comment

this looks real good!! are you waiting on the unit conversion from rus7 for the last two of these four tables? (is that why some of the fields are commented out? seems like it but 🤷🏻.) the second phase of rus7 tables are merged now so that should be accessible now. (be wary of alembic migrations 🙃 - i tried to lump that all together so it'd be easier)

for the fully transformed tables i left some tiny nits + we should get lined up about debt column names which feels easy and minor!

also add release notes!

Member

@cmgosnell cmgosnell left a comment

three non-blocking suggestions: remove dollars from two column names, add enums, and a teensy baby one: remove additional_details_text. honestly all so small so as to be non-blocking.

but i generated all of these tables locally and they look good!

Comment on lines 6069 to 6076
"primary_renewable_fuel_type": {
    "type": "string",
    "description": ("Primary renewable fuel type used by the plant."),
},
"primary_renewable_fuel_type_id": {
    "type": "integer",
    "description": ("Unique numeric identifier for each renewable fuel type."),
},
Member

can or should these be enumed?

Member Author

yea I can do that quickly!

Comment on lines 6091 to 6098
"prime_mover_id": {
    "type": "integer",
    "description": "Unique numeric identifier for each prime mover type used by RUS borrowers.",
},
"prime_mover_type": {
    "type": "string",
    "description": "Type of prime mover (e.g. Hydro, Internal Combustion).",
},
Member

can these also be enum-ed easily?
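For illustration, a rough sketch of what an enum constraint on `prime_mover_type` could look like and how values might be checked against it. The `constraints` / `enum` layout follows the Frictionless-style field descriptor convention and is an assumption about how it would be written here. The two allowed values are only the examples named in the field description, not the full list, and `invalid_enum_values` is a hypothetical helper.

```python
# Illustrative field definition; the enum lists only the two example
# values from the description, not the real set of prime mover types.
PRIME_MOVER_TYPE_FIELD = {
    "type": "string",
    "description": "Type of prime mover (e.g. Hydro, Internal Combustion).",
    "constraints": {"enum": ["Hydro", "Internal Combustion"]},
}


def invalid_enum_values(values, field):
    """Return the sorted non-null values that the field's enum does not allow."""
    allowed = set(field["constraints"]["enum"])
    return sorted({v for v in values if v is not None and v not in allowed})
```

Running a check like this over the transformed column before adding the constraint would show whether the raw data contains spellings or categories the enum does not yet cover.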

Member

nbd but you left a handful of these empty text guys

Member Author

I'm gonna leave the draft ones for now because they get handled in the other PR

@aesharpe aesharpe enabled auto-merge February 4, 2026 23:41
@aesharpe aesharpe added this pull request to the merge queue Feb 4, 2026
Merged via the queue into main with commit 28d4f13 Feb 5, 2026
14 checks passed
@aesharpe aesharpe deleted the rus12-transform-2 branch February 5, 2026 00:44
@github-project-automation github-project-automation bot moved this from In progress to Done in Catalyst Megaproject Feb 5, 2026
Labels

  • integrate: integrate outside work into the PUDL repository
  • new-data: Requests for integration of new data.
  • rus12: USDA Rural Utilities Services Form 12 -- Financial and Operating Report: Electric Power Supply

Projects

Status: Done

2 participants