Skip to content

Conversation

@tku137
Copy link
Collaborator

@tku137 tku137 commented Jan 11, 2026

This adds the possibility to bulk-sync annotations. It is probably a rarely used feature, I used it to have an initial sync of all book annotations without the need to open each book individually.

Unfortunately, getting annotations is only available for the currently opened book in koreaders api, see #79. Crawling the whole library directory for sidecar files would be pretty heavy. So in this PR, I use koreaders history api, which gives a list of books opened in the past. This includes read, reading, paused and so on books. So my thinking is: for a book to have highlights, it must have been opened at one point and thus is listed in the history.

Having a list of books from koreaders history, we then can loop over all these books and sync their annotations one by one.

Several improvements have been made along the way, for example:

  • only open docsettings once (instead of three times)
  • reduce function calling (extract common functions)
  • use md5 from sidecar files, no need to query the statistics database for the md5 value

@tku137
Copy link
Collaborator Author

tku137 commented Jan 11, 2026

Hm, need to look into why the test was failing. Not sure why any of the changes I made here affected ths.

@GeorgeSG
Copy link
Owner

The test is flaky :/ I hit this earlier this morning.

Comment on lines 33 to 34
local logger = require("logger")
local DocSettings = require("docsettings")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is me being unfamiliar with lua, but can't we define these once outside the functions and use them? Seems counter-intuitive to require every time.

Copy link
Collaborator Author

@tku137 tku137 Jan 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah true, good catch! I had an issue with ReaderUI, which is only available when reading a book. So I guess I put all requires inside functions for no reason. Just improved it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that requiring ReaderUI in functions is sub-optimal. I found the reason for my issues with it and implemented lazy-loading it, works perfectly fine on my side.

While at it, I found a better KoReader API for doc_settings, we use that now.

@GeorgeSG
Copy link
Owner

@tku137 can we chat directly on another platform? Discord or something else?

@tku137
Copy link
Collaborator Author

tku137 commented Jan 11, 2026

@tku137 can we chat directly on another platform? Discord or something else?

Sure, I have a Discord account. I could send you a mail with my handle if you like? Or just send me an invite.

@GeorgeSG
Copy link
Owner

@tku137 can we chat directly on another platform? Discord or something else?

Sure, I have a Discord account. I could send you a mail with my handle if you like? Or just send me an invite.

Yup, perfect! Shoot an email at [email protected]

highlights = (stats and stats.highlights) or 0,
notes = (stats and stats.notes) or 0,
last_open = (summary and summary.modified) or os.time(),
total_read_time = 0,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like total_read_time will always be 0? Wouldn't we lose that data if we sync like that?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This took some time. I now remember where this comes from: I have a user-patch that stores total_read_time and total_read_pages to the sidecar files because I wanted to use these for something. Thus I added those while looking at my sidecar files. HOWEVER, this is actually dangerous to sync during annotation sync and I will remove it.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please check my solution. In my understanding, I now need to distinguish between statistics.db sync and annotations sync, thus total_read_time and total_read_pages in the upload service need to be updated only on the statistics.db sync path, right?

@GeorgeSG
Copy link
Owner

The current sync method also bulk syncs all books, it just doesn't bulk sync with highlights, correct? I'm wondering if we should be keeping both of the syncs - seems kinda confusing. If this implementation bulk syncs all books with annotations, we can probably keep only that and remove the old one?

Also, maybe it's a nice idea to sync only the current book on suspend? So maybe it's worth implementing a "Sync current book" alongside "Sync all books with all highlights"?

…for bulk operations

There were too many requires inside functions all over the place. Also,
I saw a nicer Koreader API to get doc_settings. But I encountered an
error once, turns out that after rebooting KoReader, there is a chance
that ReaderUI is not yet there, failing requires at module level. I used
lazy loading, which seems to run just fine.

Its still nicer to require once, not for every function call, ESPECIALLY
during bulk sync!

This prevents crashes when requiring the module outside the reader,
preserves live in-memory settings during normal usage, and makes bulk
annotation reads safe and non-destructive.
@tku137
Copy link
Collaborator Author

tku137 commented Jan 21, 2026

The current sync method also bulk syncs all books, it just doesn't bulk sync with highlights, correct? I'm wondering if we should be keeping both of the syncs - seems kinda confusing. If this implementation bulk syncs all books with annotations, we can probably keep only that and remove the old one?

Also, maybe it's a nice idea to sync only the current book on suspend? So maybe it's worth implementing a "Sync current book" alongside "Sync all books with all highlights"?

This is an excellent idea. I think we should have a single "Synchronize data" button in the menu that syncs statistics (only available in bulk) and annotations in bulk, while keeping a "sync statistics as usual and annotations for opened book" version in the sync-on-suspend path.

I am still somehow worried about the performance of bulk-syncing annotations, but given how "bulky" a statistics sync already is and how basically instantaneous an annotations sync is, I guess even for a couple hundred books, this still is pretty fast, since we utilize the Koreader history of opened books without crawling through a huge library.

@coveralls
Copy link

coveralls commented Jan 21, 2026

Coverage Status

coverage: 53.861% (+0.2%) from 53.632%
when pulling 21ae613 on tku137:bulk-sync
into 1eb7dcd on GeorgeSG:master.

…tistics in full sync

- rename sync functions for clarity: sync() → syncCurrentBook(), bulkSync() → syncAllBooks()
- merge "Bulk Sync All Books" into single "Synchronize data" menu option
- fix syncAllBooks() to include full statistics sync from database (was only syncing annotations)
- sync-on-suspend now explicitly use syncCurrentBook() for fast, targeted sync

Manual sync now properly syncs all statistics + all annotations.
Suspend sync remains fast, syncing all statistics + current book only.
I am unsure about the performance impact on syncing all anotations
actually is though...
@tku137 tku137 requested a review from GeorgeSG January 21, 2026 19:52
Copy link
Owner

@GeorgeSG GeorgeSG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again, this looks very good! Added some questions, this is more for my understanding of what's going on :D

@@ -1,5 +1,5 @@
export type KoReaderBook = {
id: number;
id: number; // Optional for annotation-only sync
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we mark as ?: number then?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, I thought this is optional, removed the ? but not the comment, will do this.

-- when inside reader
local ui = get_live_ui()
if ui and ui.doc_settings then
return ui.doc_settings:readSetting("partial_md5_checksum")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about #90, #91, #92. Do you think this will address the problem? Are the book md5s identical when syncing?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have read these comments before spending a significant amount of time :D Yes, I suspect this could be in fact the culprit.

Specifically: there might be a mismatch between the statistics and annotation sync path, which would explain the duplication of books, one with correct statistics and one with correct annotations, as some users reported. I don't know why not all users end up with such duplications, though. Unfortunately: "it works in my machine" makes debugging this really hard here :D

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I was wrong: the mismatch between statistics and annotations path happens on master! Using md5 should absolutely solve the title matching. I was looking on the bulk-sync branch, but that is of course not the code users have reported matching issues with 🤦‍♂️

However, there are other issues I found that could still lead to "ghost books" and will investigate...


if not book_md5 then
logger.warn("[KoInsight] Could not determine MD5 for current book, skipping annotations")
return annotations_by_book
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should return nil or {} to be sure this is hardcoded empty (just in case something somewhere pushes data to annotations_by_book.

-- Get MD5
local md5 = doc_settings:readSetting("partial_md5_checksum")
if not md5 then
logger.warn("[KoInsight] No MD5 found in sidecar for:", file_path)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Idea for the future:

Should we push these logs (warnings & errors) to KoInsight server? I'm thinking it will be useful to debug issues from users - easier to get them to share docker logs than KoReader logs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats a good idea. Would make bug-hunting a lot easier.

sorting_hint = "tools",
sub_item_table = {
-- 1) Synchronize data
-- 1) Synchronize data (all books)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a menu entry for sync current book?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think its ok to have only one sync button that syncs all books, statistics is all books anyways, and annotations are synced for all books. So the "(all books)" makes sense for the user. But we use a sync current book function internally for sync on suspend, where we sync all statistics as usual, and annotations for the currently opened book. Thats how I implemented it in the latest version of this PR.

However, I can happily add a sync current book back (it was there till one of the latest commits).

@GeorgeSG
Copy link
Owner

We should also probably bump the plugin version

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants