
Conversation

@svvimming svvimming commented May 6, 2023

Description

Two main features have been added to the CID importer cron:

  1. Backups of the entire CID files retrieved from Web3.Storage to the Open Panda Backblaze bucket

  2. Multithreaded processing:

  • CID retrieval from Web3.Storage, zst unpacking, metadata extraction, and import to the database are now handled in a worker thread (cid-batch-import.js in the crons directory). The first part of the main cid-importer.js script is unchanged: a manifest list of CIDs to download is still generated and stored at tmp/cid-files/cid-manifest.txt. However, where batches from the manifest were previously retrieved and processed in series, the script now delegates batches to worker threads that process them in parallel.

  Two new arguments can be passed to the cid-importer.js script:

  • --threads — the integer number of workers to spawn

  • --all — a boolean which, if true, skips the search for the last imported document in the database and retrieves all CIDs starting from the oldest existing upload

  The previous two arguments both still apply:

  • --pagesize — an integer specifying the import/backup batch size

  • --maxpages — an integer specifying how many batches to process; if left unspecified, no limit is placed on the number of batches
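For reviewers, here is a minimal sketch of the batching and worker-assignment logic described above. It is illustrative only: the flag names match the ones introduced in this PR, but the function names and defaults are hypothetical and do not mirror the actual implementation in cid-importer.js.

```javascript
// Hypothetical sketch of manifest batching for the multithreaded importer.
// Flag names (--threads, --all, --pagesize, --maxpages) are from this PR;
// everything else (function names, defaults) is assumed for illustration.

// Parse the CLI flags; default values here are assumptions.
function parseArgs(argv) {
  const args = { threads: 1, all: false, pagesize: 100, maxpages: Infinity };
  for (let i = 0; i < argv.length; i++) {
    switch (argv[i]) {
      case '--threads': args.threads = parseInt(argv[++i], 10); break;
      case '--all': args.all = argv[++i] === 'true'; break;
      case '--pagesize': args.pagesize = parseInt(argv[++i], 10); break;
      case '--maxpages': args.maxpages = parseInt(argv[++i], 10); break;
    }
  }
  return args;
}

// Split the manifest's CID list into batches of `pagesize`,
// stopping after `maxpages` batches.
function makeBatches(cids, pagesize, maxpages) {
  const batches = [];
  for (let i = 0; i < cids.length && batches.length < maxpages; i += pagesize) {
    batches.push(cids.slice(i, i + pagesize));
  }
  return batches;
}

// Round-robin the batches across `threads` groups; each group would then be
// handed to a worker thread running cid-batch-import.js.
function assignToWorkers(batches, threads) {
  const groups = Array.from({ length: threads }, () => []);
  batches.forEach((batch, i) => groups[i % threads].push(batch));
  return groups;
}
```

Under this sketch, a run like `node cid-importer.js --threads 4 --pagesize 100` would produce batches of 100 CIDs distributed round-robin over 4 worker groups.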

Ticket link

https://www.notion.so/agencyundone/Backup-all-dataset-manifests-to-Backblaze-3196a93f141546a3a91602d78b3dbd7f?pvs=4

@svvimming svvimming requested a review from justanothersynth May 6, 2023 01:46
@svvimming svvimming self-assigned this May 6, 2023
@svvimming
Contributor Author

closing in favor of #68

@svvimming svvimming closed this Jun 2, 2023
