-
Notifications
You must be signed in to change notification settings - Fork 975
bcli: Refactor bcli plugin to synchronous execution #8820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
plugins/bcli.c
Outdated
| res->output_len = strlen(res->output); | ||
|
|
||
| /* Wait for child to exit */ | ||
| while (waitpid(child, &status, 0) < 0 && errno == EINTR); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be good to have something like this here:
while (waitpid(child, &status, 0) < 0) {
if (errno == EINTR)
continue;
plugin_err(plugin, "waitpid(%s) failed: %s",
res->args, strerror(errno));
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, added this.
Also I think I forgot to close a from descriptor in wait_and_check_bitcoind and run_bitcoin_cliv. Added this as well.
| * we can attempt to prevent a crash if the 'getchaininfo' function returns | ||
| * a lower height than the one we already know, by waiting for a short period. | ||
| * However, I currently don't have a better idea on how to handle this situation. */ | ||
| u32 *height UNUSED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to leave this as it is and not use the optional last_height parameter. We're planning on getting rid of retries altogether inside bcli and have lightningd solely deal with retrying. So it feels a bit clunky to me to add retrying based on the last_height and ALSO have it be blocking to the rest of the plugin. We were never using it before and I don't think anyone has complained about this (yet) so it's probably not fatal?
| /* As of at least v0.15.1.0, bitcoind returns "success" but an empty | ||
| string on a spent txout. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't see this documented in the bitcoin-cli documentation https://developer.bitcoin.org/reference/rpc/gettxout.html , it might be good to test it out on cli to see if it every comes up with something else that's not an empty string but something like the string null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That should be alright. I've found an issue raising this.
plugins/bcli.c
Outdated
| static struct command_result *command_err_badjson(struct command *cmd, | ||
| struct bcli_result *res, | ||
| const char *errmsg) | ||
| { | ||
| char *err = tal_fmt(cmd, "%s: bad JSON: %s (%.*s)", | ||
| res->args, errmsg, (int)res->output_len, res->output); | ||
| return command_done_err(cmd, BCLI_ERROR, err, NULL); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function is actually pretty good to have! Despite the plugin now being synchronous, it doesn't have to die, which plugin_err would do.
| if (res->exitstatus != 0) | ||
| return estimatefees_null_response(cmd); | ||
|
|
||
| tokens = json_parse_simple(res->output, res->output, res->output_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found this pattern quite absurd because res->output is not a ctx! But it's used all over bcli in other places so it's not your fault at all! Have asked @rustyrussell about this and if it is intentional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Asked him about this, this works okay in these scenarios because the lifetime of the pointers are the same so once the parent pointer is cleaned up any children are also cleaned up at the same time.
2faf7ec to
8512a1d
Compare
509f2f7 to
78c4f86
Compare
78c4f86 to
a8079be
Compare
abf4d82 to
ae11557
Compare
| # We block l3 from seeing close, so it will try to reestablish. | ||
| def no_new_blocks(req): | ||
| return {"error": "go away"} | ||
| return {"error": {"code": -8, "message": "Block height out of range"}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we revert with another code, it kills lightningd instantly, not making retry anymore.
I think it's expected to return a null response to the lightningd here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After dddf995 it doesn't break the tests and we can remove these changes, but I decided to leave it as it is.
ae11557 to
dddf995
Compare
| if (res->exitstatus == 8) | ||
| return getrawblockbyheight_notfound(cmd); | ||
| return command_err(cmd, res, "command failed"); | ||
| return getrawblockbyheight_notfound(cmd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think returning a null response should be alright. I don't see the case we need to return error here. We can return an empty block and try again later. Tests are waiting the same behavior.
| } | ||
| } | ||
|
|
||
| sleep(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know that the sleep was there in the async implementation of the plugin but I would add a test that runs this under load. So try to getrawblockbyheight and also try to run bcli commands at the same time.
lightningd/bitcoind.c
Outdated
| /* Check if response is an error and fail with clear message if so */ | ||
| static void check_bitcoin_error(struct bitcoind *bitcoind, const char *buf, | ||
| const jsmntok_t *toks, const char *method) | ||
| { | ||
| const jsmntok_t *err_tok, *msg_tok; | ||
|
|
||
| err_tok = json_get_member(buf, toks, "error"); | ||
| if (!err_tok) | ||
| return; | ||
|
|
||
| msg_tok = json_get_member(buf, err_tok, "message"); | ||
| if (msg_tok) | ||
| fatal("%s: %.*s", method, | ||
| json_tok_full_len(msg_tok), json_tok_full(buf, msg_tok)); | ||
| else | ||
| fatal("%s: %.*s", method, | ||
| json_tok_full_len(err_tok), json_tok_full(buf, err_tok)); | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What was the motivation behind this function. This seems like it kills bitcoind if an error response or a timeout happens from bitcoind. It's not uncommon for bitcoind (the actual bitcoin core daemon) to go out for lunch and have a timeout.
@rustyrussell checked his node and these were top 5 bitcoind timeouts in msec:
21812
22053
22406
23600
30535
60324
Separate from this, commands like estimatefees and getblockhash can fail when they are syncing. In our meeting, Rusty also suggested that it would be good to change that function from a check_bitcoin_error to something like check/get_bitcoin_result? That way you can do a lot of common result handling inside that function and as a result of that tidy up some of the code.
If we're doing this we should probably be a bit more discerning with our retry behaviour:
sendrawtransaction - This one is probably okay without a retry because we have rebroadcasting mechanisms already in lightningd.
getchaininfo - Lightningd retry loops rely on getchaininfo succeeding I think, so I wonder if this needs more internal retries or if it's okay for this to just die on one missed rpc call?
getrawblockbyheight - we have a retry on this one
estimatefees - This does fail during sync but if it returns the estimatefees_null_response() that's probably okay since it's not an error?
getutxoout - I think this one is fine as well because we only crash on a broken json.
plugins/bcli.c
Outdated
| } | ||
|
|
||
| strip_trailing_whitespace(res->output, res->output_len); | ||
| if (strlen(res->output) != 64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: magic number
plugins/bcli.c
Outdated
|
|
||
| return command_still_pending(cmd); | ||
| /* Try fetching from peers if bitcoind >= 23.0.0 */ | ||
| if (bitcoind->version >= 230000) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: magic number
| const char *method, | ||
| ...) | ||
| static LAST_ARG_NULL struct bcli_result * | ||
| run_bitcoin_cli(const tal_t *ctx, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use this function to refactor wait_and_check_bitcoind
d7d9a93 to
fb68ed9
Compare
fb68ed9 to
1349ce2
Compare
Add `command_err_badjson` helper for sync error handling, mirroring the async `command_err_bcli_badjson`. Store args string in `bcli_result` for consistent error messages.
Also rename command_err_badjson to generic command_err helper, since error messages aren't always about bad JSON (e.g., "command failed" for non-zero exit).
1349ce2 to
9609a0f
Compare
a55e046 to
4d62f07
Compare
Add `get_bitcoin_result` function that checks bcli plugin responses for errors and returns the result token. Previously, callbacks only detected errors when result parsing failed, ignoring the explicit error field from the plugin. Now we extract the actual error message from bcli, providing clearer reasoning when the plugin returns an error response.
Rewrite `test_bitcoin_failure` to reflect synchronous bcli behavior: the node now crashes on invalid bitcoind responses rather than retrying. Add `may_fail` and `broken_log` to handle expected crash. Update `test_bitcoind_fail_first` stderr check to match the new error message format from `get_bitcoin_result`. Update test mocks to use proper error format for "block not found". Co-authored-by: ShahanaFarooqui <[email protected]>
Remove the asynchronous execution infrastructure no longer needed after converting all bcli commands to synchronous execution. This includes removing the async callbacks, the pending request queue, etc. Fix missing `close(from)` file descriptor leak in `run_bitcoin_cliv`. Changelog-Changed: bcli plugin now uses synchronous execution, simplifying bitcoin backend communication and improving error handling reliability.
Return "not found" on any `getblockhash` exit status. Previously, only exit code 8 (block height doesn't exist) returned "not found", while other exit codes returned an error. Now any non-zero exit status returns "not found" since any failure means the block is unavailable.
… shared execution Extract `execute_bitcoin_cli` as shared function used by both `run_bitcoin_cli` and `wait_and_check_bitcoind`.
4d62f07 to
2adf968
Compare
Add `test_bcli_concurrent` to verify bcli handles concurrent requests while the `getblockfrompeer` retry path is active, simulating a pruned node scenario where `getblock` initially fails. Add `test_bcli_retry_timeout` to verify lightningd crashes with a clear error message when we run out of `getblock` retries.
2adf968 to
0278501
Compare
Changes
This PR refactors the bcli plugin from asynchronous to synchronous execution, addressing #8738.
Synchronous execution
run_bitcoin_clifunction that blocks until bitcoin-cli completes.bcli
estimatefeesreturns null feerates on failuresendrawtransactionreturnssuccess: falsewith error messagegetutxoutreturns null amount/script on failuregetrawblockbyheightreturns "not found" on anygetblockhashexit status (not just exit code 8).getchaininfofailure, corrupted JSON or exhaustedgetrawblockbyheightretries return an error to lightningd, crashing with a clear error message.getrawblockbyheightusinggetblockfrompeersimilar to async version.lightningd
get_bitcoin_resultinlightningd/bitcoind.cto check bcli responses for errors and extract the result token.getrawblockbyheightor invalid JSON).Other improvements
close(from))getblockfrompeerretry path and retry timeout behavior