-
Notifications
You must be signed in to change notification settings - Fork 35
feat(koplugin): bulk sync annotations #84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This reverts commit f7ec420.
We do not need to query the statistics database for book md5 values. We can just use the ones from the sidecar files, which we already have anyways.
…ionReader.cleanAnnotations
|
Hm, need to look into why the test was failing. Not sure why any of the changes I made here affected ths. |
|
The test is flaky :/ I hit this earlier this morning. |
| local logger = require("logger") | ||
| local DocSettings = require("docsettings") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is me being unfamiliar with lua, but can't we define these once outside the functions and use them? Seems counter-intuitive to require every time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah true, good catch! I had an issue with ReaderUI, which is only available when reading a book. So I guess I put all requires inside functions for no reason. Just improved it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that requiring ReaderUI in functions is sub-optimal. I found the reason for my issues with it and implemented lazy-loading it, works perfectly fine on my side.
While at it, I found a better KoReader API for doc_settings, we use that now.
|
@tku137 can we chat directly on another platform? Discord or something else? |
Sure, I have a Discord account. I could send you a mail with my handle if you like? Or just send me an invite. |
Yup, perfect! Shoot an email at [email protected] |
| highlights = (stats and stats.highlights) or 0, | ||
| notes = (stats and stats.notes) or 0, | ||
| last_open = (summary and summary.modified) or os.time(), | ||
| total_read_time = 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like total_read_time will always be 0? Wouldn't we lose that data if we sync like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This took some time. I now remember where this comes from: I have a user-patch that stores total_read_time and total_read_pages to the sidecar files because I wanted to use these for something. Thus I added those while looking at my sidecar files. HOWEVER, this is actually dangerous to sync during annotation sync and I will remove it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please check my solution. In my understanding, I now need to distinguish between statistics.db sync and annotations sync, thus total_read_time and total_read_pages in the upload service need to be updated only on the statistics.db sync path, right?
|
The current sync method also bulk syncs all books, it just doesn't bulk sync with highlights, correct? I'm wondering if we should be keeping both of the syncs - seems kinda confusing. If this implementation bulk syncs all books with annotations, we can probably keep only that and remove the old one? Also, maybe it's a nice idea to sync only the current book on suspend? So maybe it's worth implementing a "Sync current book" alongside "Sync all books with all highlights"? |
…for bulk operations There were too many requires inside functions all over the place. Also, I saw a nicer Koreader API to get doc_settings. But I encountered an error once, turns out that after rebooting KoReader, there is a chance that ReaderUI is not yet there, failing requires at module level. I used lazy loading, which seems to run just fine. Its still nicer to require once, not for every function call, ESPECIALLY during bulk sync! This prevents crashes when requiring the module outside the reader, preserves live in-memory settings during normal usage, and makes bulk annotation reads safe and non-destructive.
This is an excellent idea. I think we should have a single "Synchronize data" button in the menu that syncs statistics (only available in bulk) and annotations in bulk, while keeping a "sync statistics as usual and annotations for opened book" version in the sync-on-suspend path. I am still somehow worried about the performance of bulk-syncing annotations, but given how "bulky" a statistics sync already is and how basically instantaneous an annotations sync is, I guess even for a couple hundred books, this still is pretty fast, since we utilize the Koreader history of opened books without crawling through a huge library. |
…tistics in full sync - rename sync functions for clarity: sync() → syncCurrentBook(), bulkSync() → syncAllBooks() - merge "Bulk Sync All Books" into single "Synchronize data" menu option - fix syncAllBooks() to include full statistics sync from database (was only syncing annotations) - sync-on-suspend now explicitly use syncCurrentBook() for fast, targeted sync Manual sync now properly syncs all statistics + all annotations. Suspend sync remains fast, syncing all statistics + current book only. I am unsure about the performance impact on syncing all anotations actually is though...
GeorgeSG
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks again, this looks very good! Added some questions, this is more for my understanding of what's going on :D
| @@ -1,5 +1,5 @@ | |||
| export type KoReaderBook = { | |||
| id: number; | |||
| id: number; // Optional for annotation-only sync | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we mark as ?: number then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nah, I thought this is optional, removed the ? but not the comment, will do this.
| -- when inside reader | ||
| local ui = get_live_ui() | ||
| if ui and ui.doc_settings then | ||
| return ui.doc_settings:readSetting("partial_md5_checksum") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should have read these comments before spending a significant amount of time :D Yes, I suspect this could be in fact the culprit.
Specifically: there might be a mismatch between the statistics and annotation sync path, which would explain the duplication of books, one with correct statistics and one with correct annotations, as some users reported. I don't know why not all users end up with such duplications, though. Unfortunately: "it works in my machine" makes debugging this really hard here :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I was wrong: the mismatch between statistics and annotations path happens on master! Using md5 should absolutely solve the title matching. I was looking on the bulk-sync branch, but that is of course not the code users have reported matching issues with 🤦♂️
However, there are other issues I found that could still lead to "ghost books" and will investigate...
|
|
||
| if not book_md5 then | ||
| logger.warn("[KoInsight] Could not determine MD5 for current book, skipping annotations") | ||
| return annotations_by_book |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should return nil or {} to be sure this is hardcoded empty (just in case something somewhere pushes data to annotations_by_book.
| -- Get MD5 | ||
| local md5 = doc_settings:readSetting("partial_md5_checksum") | ||
| if not md5 then | ||
| logger.warn("[KoInsight] No MD5 found in sidecar for:", file_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 Idea for the future:
Should we push these logs (warnings & errors) to KoInsight server? I'm thinking it will be useful to debug issues from users - easier to get them to share docker logs than KoReader logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats a good idea. Would make bug-hunting a lot easier.
| sorting_hint = "tools", | ||
| sub_item_table = { | ||
| -- 1) Synchronize data | ||
| -- 1) Synchronize data (all books) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a menu entry for sync current book?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think its ok to have only one sync button that syncs all books, statistics is all books anyways, and annotations are synced for all books. So the "(all books)" makes sense for the user. But we use a sync current book function internally for sync on suspend, where we sync all statistics as usual, and annotations for the currently opened book. Thats how I implemented it in the latest version of this PR.
However, I can happily add a sync current book back (it was there till one of the latest commits).
|
We should also probably bump the plugin version |
This adds the possibility to bulk-sync annotations. It is probably a rarely used feature, I used it to have an initial sync of all book annotations without the need to open each book individually.
Unfortunately, getting annotations is only available for the currently opened book in koreaders api, see #79. Crawling the whole library directory for sidecar files would be pretty heavy. So in this PR, I use koreaders history api, which gives a list of books opened in the past. This includes read, reading, paused and so on books. So my thinking is: for a book to have highlights, it must have been opened at one point and thus is listed in the history.
Having a list of books from koreaders history, we then can loop over all these books and sync their annotations one by one.
Several improvements have been made along the way, for example: