-
-
Notifications
You must be signed in to change notification settings - Fork 211
[ENH] V1 → V2 API Migration - datasets #1608
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #1608 +/- ##
==========================================
+ Coverage 53.02% 53.32% +0.29%
==========================================
Files 36 46 +10
Lines 4326 4645 +319
==========================================
+ Hits 2294 2477 +183
- Misses 2032 2168 +136 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
FYI @geetu040 Currently the
Issues:
Example:current def _get_dataset_features_file(did_cache_dir: str | Path | None, dataset_id: int) -> dict[int, OpenMLDataFeature]:
return _featuresOr by updating the Dataset class to use the underlining interface method from api_context directly. def _load_features(self) -> None:
...
self._features = api_context.backend.datasets.get_features(self.dataset_id)Another option is to add |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left an intermediate review. This is solid work and well done overall. Nice job. I'll look into the download part now.
| def list( | ||
| self, | ||
| limit: int, | ||
| offset: int, | ||
| *, | ||
| data_id: list[int] | None = None, # type: ignore | ||
| **kwargs: Any, | ||
| ) -> pd.DataFrame: ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we not have same signature for all 3 methods: DatasetsAPI.list, DatasetsV1.list, DatasetsV2.list? does it raise pre-commit failures since a few might not be used?
| def list( | ||
| self, | ||
| limit: int, | ||
| offset: int, | ||
| *, | ||
| data_id: list[int] | None = None, # type: ignore | ||
| **kwargs: Any, | ||
| ) -> pd.DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can make this simple using private helper methods
| bool | ||
| True if the deletion was successful. False otherwise. | ||
| """ | ||
| return openml.utils._delete_entity("data", dataset_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you implement the delete logic yourself instead of openml.utils._delete_entity, how would that look? I think it would be better.
| def list( | ||
| self, | ||
| limit: int, | ||
| offset: int, | ||
| **kwargs: Any, | ||
| ) -> pd.DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above, it can use private helper methods
| # Minimalistic check if the XML is useful | ||
| if "oml:data_qualities_list" not in qualities: | ||
| raise ValueError('Error in return XML, does not contain "oml:data_qualities_list"') | ||
| from openml._api import api_context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't we have this import at the very top? does it create circular import error? if not, should be moved to top from all functions.
Metadata
Reference Issue: [ENH] V1 → V2 API Migration - datasets #1592
Depends on: [ENH] V1 → V2 API Migration - core structure #1576
Change Log Entry:This PR implements Datasets resource, and refactor its existing functions