
Conversation

@timwu20 (Contributor) commented Jan 14, 2026

Summary

Implements a proper FIN/FIN_ACK handshake for WebRTC substream shutdown according to the libp2p WebRTC-direct specification, with timeout handling to prevent indefinite waiting.

Closes #476

Changes

1. Proper FIN/FIN_ACK Handshake

  • poll_shutdown now waits for FIN_ACK acknowledgment before completing
  • Added waker mechanism to wake shutdown task when FIN_ACK is received
  • Ensures graceful shutdown with delivery guarantees

2. Timeout Protection

  • Added 5-second timeout for FIN_ACK (matching go-libp2p's behavior)
  • Prevents indefinite waiting if remote never sends FIN_ACK
  • Automatically completes shutdown after timeout to avoid resource leaks

3. Shutdown Requirements Fulfilled

  • ✅ Stops accepting new writes during shutdown (transitions through Closing state)
  • ✅ Flushes pending data before sending FIN
  • ✅ Sends FIN flag with empty payload
  • ✅ Waits for FIN_ACK acknowledgment from remote
  • ✅ Triggers data channel closure and resource cleanup

4. Code Refactoring

  • Unified Message event variant to include optional flag parameter (instead of separate MessageWithFlags variant)
  • Unified encode functions into single function with optional flag parameter
  • Fixed a flag-handling bug: changed from bitwise operations to equality checks (see the sketch after this list)
  • Renamed plural "flags" to singular "flag" throughout codebase for consistency with protobuf schema
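As an aside, here is a minimal self-contained sketch of why the bitwise check was wrong. The numeric values assume the flag assignments from the libp2p WebRTC message schema (FIN = 0, STOP_SENDING = 1, RESET_STREAM = 2, FIN_ACK = 3); the enum below is illustrative, not the crate's generated type.

    #[allow(dead_code)]
    enum Flag {
        Fin = 0,
        StopSending = 1,
        ResetStream = 2,
        FinAck = 3,
    }

    fn main() {
        let flag = Flag::FinAck as i32; // 3

        // Buggy bitwise check: 3 & 2 == 2, which equals RESET_STREAM,
        // so a FIN_ACK would be misread as a stream reset.
        assert_eq!(flag & 2, Flag::ResetStream as i32);

        // Equality checks classify the flag correctly.
        assert_ne!(flag, Flag::ResetStream as i32);
        assert_eq!(flag, Flag::FinAck as i32);
    }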

Implementation Details

The shutdown process now follows this state machine:

  1. Open → Closing (blocks new writes)
  2. Flush pending data
  3. Send FIN flag
  4. Closing → FinSent
  5. Wait for FIN_ACK or timeout (5 seconds)
  6. FinSent → FinAcked
  7. Drop substream → channel closes → data channel cleanup

The timeout task is spawned immediately when entering the FinSent state and wakes the shutdown future after 5 seconds if no FIN_ACK is received.
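For orientation, a minimal runnable sketch of the shutdown state machine described above, using illustrative names rather than the exact types in src/transport/webrtc/substream.rs:

    use std::time::Duration;

    /// Hypothetical write-side states mirroring the shutdown flow above.
    #[derive(Debug)]
    enum State {
        /// Substream accepts writes.
        Open,
        /// Shutdown requested: new writes are rejected and pending data is flushed.
        Closing,
        /// FIN sent with an empty payload; waiting for FIN_ACK or the timeout.
        FinSent,
        /// FIN_ACK received (or the timeout elapsed); the substream can be dropped.
        FinAcked,
    }

    /// Deadline for the remote's FIN_ACK, matching go-libp2p's behavior.
    const FIN_ACK_TIMEOUT: Duration = Duration::from_secs(5);

    fn main() {
        let mut state = State::Open;
        println!("{state:?}: poll_shutdown called, new writes are rejected");
        state = State::Closing;
        println!("{state:?}: pending data flushed, sending FIN with an empty payload");
        state = State::FinSent;
        println!("{state:?}: waiting up to {FIN_ACK_TIMEOUT:?} for FIN_ACK");
        state = State::FinAcked;
        println!("{state:?}: dropping substream, data channel closes");
    }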

Testing

  • Added comprehensive tests for FIN/FIN_ACK handshake
  • Added test for timeout behavior when FIN_ACK is not received

Compatibility

// Flags.
pub flags: Option<i32>,
/// Flag.
pub flag: Option<i32>,

Collaborator:

nit: Maybe we could enforce here Option<Flag>?

Contributor Author (@timwu20):

Updated in c7f4ea8

Copilot AI left a comment:

Pull request overview

This PR implements proper FIN/FIN_ACK handshake for WebRTC substream shutdown according to the libp2p specification. The implementation adds graceful shutdown with acknowledgment guarantees and timeout protection to prevent indefinite waiting.

Changes:

  • Added FIN_ACK flag to protobuf schema and implemented proper handshake with 5-second timeout
  • Unified message encoding API and renamed "flags" to "flag" throughout codebase for consistency
  • Introduced state machine with Closing, FinSent, and FinAcked states to manage shutdown lifecycle

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Summary per file:

  • src/schema/webrtc.proto: Added FIN_ACK flag definition to the protobuf message schema
  • src/transport/webrtc/util.rs: Unified encode functions into a single method with an optional flag parameter; renamed plural "flags" to singular "flag"
  • src/transport/webrtc/substream.rs: Implemented the FIN/FIN_ACK handshake state machine with a timeout mechanism, updated event handling, added comprehensive tests
  • src/transport/webrtc/connection.rs: Updated to handle the new Message event structure with an optional flag parameter
  • src/transport/webrtc/opening.rs: Updated encode function calls to pass None for the flag parameter


rx: Receiver<Event>,

/// Waker to notify when shutdown completes (FIN_ACK received).
shutdown_waker: Arc<Mutex<Option<Waker>>>,

Collaborator:

Could an AtomicWaker be used instead?

Contributor Author (@timwu20):

Updated in 42de0ab.
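For reference, a minimal sketch (illustrative names, not the crate's exact code) of the AtomicWaker pattern from the futures crate that replaced the Arc<Mutex<Option<Waker>>>: poll_shutdown registers the current waker before checking the completion flag, and the message handler sets the flag and wakes.

    use std::sync::atomic::{AtomicBool, Ordering};
    use std::task::{Context, Poll};

    use futures::task::AtomicWaker;

    /// Illustrative shared state between the poll side and the message handler.
    struct Shared {
        fin_ack_received: AtomicBool,
        shutdown_waker: AtomicWaker,
    }

    impl Shared {
        /// Called from poll_shutdown: register the latest waker *before* checking
        /// the flag so a concurrent FIN_ACK cannot be missed.
        fn poll_fin_ack(&self, cx: &mut Context<'_>) -> Poll<()> {
            self.shutdown_waker.register(cx.waker());
            if self.fin_ack_received.load(Ordering::SeqCst) {
                Poll::Ready(())
            } else {
                Poll::Pending
            }
        }

        /// Called by the message handler when a FIN_ACK arrives.
        fn on_fin_ack(&self) {
            self.fin_ack_received.store(true, Ordering::SeqCst);
            self.shutdown_waker.wake();
        }
    }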

Closing,

/// We sent FIN, waiting for FIN_ACK.
FinSent,

Collaborator:

How do we handle receiving a FIN packet while already in the FinSent state? In other words, due to network delays the remote has decided to send a FIN of its own and we receive it immediately after sending our own FIN.

Contributor Author (@timwu20) commented Jan 22, 2026:

This is expected behaviour, since we are waiting on the remote to send FIN_ACK before closing our write side of the channel. When we receive a FIN from the remote, we send FIN_ACK back and close the read side of the channel. In practice we don't support a "half-closed" transport with independent read and write sides: when we call shutdown on the substream, we are closing both sides. So if we have sent the FIN flag to the remote, shutdown was called on the substream.


// Spawn timeout task to wake us after FIN_ACK_TIMEOUT
let waker = cx.waker().clone();
tokio::spawn(async move {

Collaborator:

Could the spawn be avoided here?

Would it be possible for poll_shutdown to be called with different waker instances?

  • in that case we'll spawn a timeout which, after 5s, wakes the wrong waker
  • the poll contract states we should preserve the latest waker received

Maybe change this to an optional pinned tokio::time::sleep? Then we'll wake the same function and wake the appropriate waker.

nit: maybe use pin_project to avoid the boxing? Or a similar strategy to how PollSender manages memory with memcpy under the hood

Contributor Author (@timwu20) commented Jan 22, 2026:

Given that I've changed it to use AtomicWaker in 42de0ab, which always registers the latest waker, it should now work if poll_shutdown is called multiple times. However, it's quite unlikely that poll_shutdown would be called multiple times, since the Substream is not polled in different contexts.

I've avoided the spawn by using an Option<Pin<Box<tokio::time::Sleep>>> in 2ec58e4, but I was unable to use pin_project because it would make Substream !Unpin, and Unpin is required in the rest of the codebase. The tradeoff is a small heap allocation for the timeout when shutting down.
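For illustration, a minimal sketch of the lazily created, boxed Sleep approach; the struct and method names here are hypothetical, not the actual Substream fields:

    use std::future::Future;
    use std::pin::Pin;
    use std::task::{Context, Poll};
    use std::time::Duration;

    const FIN_ACK_TIMEOUT: Duration = Duration::from_secs(5);

    /// Holds the FIN_ACK deadline; boxing the Sleep keeps the owning type Unpin.
    struct FinAckDeadline {
        timeout: Option<Pin<Box<tokio::time::Sleep>>>,
    }

    impl FinAckDeadline {
        /// Polled from poll_shutdown while in the FinSent state. The Sleep is
        /// created on first use and, because it is polled with the current
        /// Context, it always re-registers the latest waker.
        fn poll_elapsed(&mut self, cx: &mut Context<'_>) -> Poll<()> {
            let sleep = self
                .timeout
                .get_or_insert_with(|| Box::pin(tokio::time::sleep(FIN_ACK_TIMEOUT)));
            sleep.as_mut().poll(cx)
        }
    }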

let protobuf_payload = schema::webrtc::Message {
message: (!payload.is_empty()).then_some(payload),
flag: Some(flags),
flag,

Collaborator:

The code below is a bit unoptimized:

  • one allocation for Vec::with_capacity (protobuf encoding)
  • one allocation for a BytesMut for the varint prefix

Could we write something like:

    let proto_len = protobuf_payload.encoded_len();
    let varint_len = unsigned_varint::encode::usize_buffer().len();

    // single alloc
    let mut out_buf = Vec::with_capacity(varint_len + proto_len);

    let mut varint_buf = unsigned_varint::encode::usize_buffer();
    let varint_bytes = unsigned_varint::encode::usize(proto_len, &mut varint_buf);
    out_buf.extend_from_slice(varint_bytes);

    // Encode protobuf directly into the final buffer
    protobuf_payload.encode(&mut out_buf).expect(..sufficient)

?

Contributor Author (@timwu20):

I made the changes in a4effe2, and also optimized the decode function.

Ok(message) => Ok(Self {
payload: message.message,
flags: message.flag,
flag: message.flag,

Collaborator:

nit: BytesMut::from(payload) will copy the entire payload; instead could we:

  • let (len, remaining) = unsigned_varint::decode::usize(payload)
  • then decode the message via let message = schema::webrtc::Message::decode(&remaining[..len])?

Contributor Author (@timwu20):

I implemented the suggestion in a4effe2.
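Spelled out as a small fragment (assuming the crate's prost-generated schema::webrtc::Message and the unsigned_varint crate; the function name is illustrative, and real code would return a proper error rather than Option):

    use prost::Message as _;

    /// Decode a length-prefixed frame without copying it into a BytesMut first:
    /// read the unsigned-varint length, then decode straight from the slice.
    fn decode_frame(payload: &[u8]) -> Option<schema::webrtc::Message> {
        let (len, remaining) = unsigned_varint::decode::usize(payload).ok()?;
        // Guard against a truncated frame before slicing.
        let body = remaining.get(..len)?;
        schema::webrtc::Message::decode(body).ok()
    }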

}

if flag == Flag::StopSending as i32 {
*self.state.lock() = State::SendClosed;

Collaborator:

Could we receive the StopSending flag in the following race case?

  • T0: poll_write is called while the state is Open
  • T1: poll_reserve(cx) returns Pending -- waker registered for channel capacity
  • T2: remote sends us the STOP_SENDING flag

Because we are not waking any waker here, poll_write will block indefinitely until we get channel capacity

Contributor Author (@timwu20) commented Jan 22, 2026:

Good catch! This should be fixed now in 73f76ec. I introduced a write_waker on Substream and SubstreamHandle so that a pending poll_write is woken and re-polled when a STOP_SENDING flag is received.
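A minimal sketch of that fix (illustrative types; the real state enum and locking differ):

    use std::sync::Mutex;

    use futures::task::AtomicWaker;

    /// Illustrative subset of the write-side state.
    #[allow(dead_code)]
    #[derive(Debug)]
    enum State {
        Open,
        SendClosed,
    }

    struct WriteSide {
        state: Mutex<State>,
        /// Registered by poll_write before it returns Pending on channel capacity.
        write_waker: AtomicWaker,
    }

    impl WriteSide {
        /// Message-handler path for STOP_SENDING: transition the state *and* wake
        /// any parked poll_write so it re-checks the state instead of waiting
        /// indefinitely for channel capacity.
        fn on_stop_sending(&self) {
            *self.state.lock().expect("lock poisoned") = State::SendClosed;
            self.write_waker.wake();
        }
    }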

SendClosed,

/// Shutdown initiated, flushing pending data before sending FIN.
Closing,

Collaborator:

Should we couple any State transition with an associated waker?

In other words, who is going to wake up the counterpart of poll_shutdown or poll_write if we are transitioning the state from on_message or from the individual poll functions?

Contributor Author (@timwu20):

This should be addressed now in 73f76ec.

}

if flags & 2 == Flag::ResetStream as i32 {
if flag == Flag::ResetStream as i32 {

Collaborator:

Similarly to the state message, we are not waking other poll_ functions to announce the state transition. Shouldn't we do that to avoid blocking / delays in processing?

Contributor Author (@timwu20):

As of 73f76ec, we wake the write_waker when we receive this flag.

Collaborator @lexnv left a comment:

Generally looks good to me! Left a few comments to close the gap on missing wakers in the poll implementations, plus some suggestions on lock-free state, encoding/decoding inefficiencies, and waking on state transitions 🙏


/// Remote is no longer interested in receiving anything.
SendClosed,

Collaborator:

We could also replace the state with an AtomicU8 and have the implementation look like:

struct SharedState {
   // lock-free / more efficient 
   // BITS 0 to 3 
   // 0: open 1 sendClosed 2: closing 3: FinSent (our states) 4: FinAck
    
   // BIT 4: remote recv closed via FIN recv
   // BIT 5: STOP_SENDING remote send closed
   state: AtomicU8, 
   
   // Waker for tasks waiting on state changes (ie poll write blocked on state, but poll shutdown has FIN_ACK) -- triggered via on_message for STOP_SENDING, FIN_ACK, RESET_STREAM etc
   state_waker: AtomicWaker,
}   

Contributor Author (@timwu20):

Given that state transitions are relatively rare in this protocol, the performance gain would not be that pronounced. I also think it's easier to understand semantically using the enum. It's more of a nice-to-have optimization rather than a must-have.

In terms of the single state_waker AtomicWaker: given that the Substream must be owned and polled by exactly one task, I think it could be merged, but keeping separate wakers makes it easier to understand semantically which waker we are registering. I'd rather not introduce the burden of figuring out whether we should register over an already registered AtomicWaker based on a state transition.

Does that make sense @lexnv?

timwu20 and others added 3 commits January 22, 2026 11:46
Copilot AI left a comment: reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

Copilot AI left a comment: reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Copilot AI left a comment: reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Comment on lines +193 to +219
if self.read_closed.swap(true, std::sync::atomic::Ordering::SeqCst) {
// Already processed FIN, ignore duplicate
tracing::debug!(
target: "litep2p::webrtc::substream",
"received duplicate FIN, ignoring"
);
return Ok(());
}

// Received FIN from remote, close our read half
self.inbound_tx.send(Event::RecvClosed).await?;

// Send FIN_ACK back to remote using try_send to avoid blocking.
// If the channel is full, the remote will timeout waiting for FIN_ACK
// and handle it gracefully. This prevents deadlock if the outbound
// channel is blocked due to backpressure.
if let Err(e) = self.outbound_tx.try_send(Event::Message {
payload: vec![],
flag: Some(Flag::FinAck),
}) {
tracing::warn!(
target: "litep2p::webrtc::substream",
?e,
"failed to send FIN_ACK, remote will timeout"
);
}
return Ok(());

Copilot AI commented Jan 23, 2026:

The duplicate FIN handling only prevents sending multiple RecvClosed events via the read_closed atomic flag check, but it still attempts to send FIN_ACK even for duplicate FINs. While the try_send might fail if the channel is closed, it would be cleaner to return early after detecting a duplicate FIN (at line 199) to avoid attempting to send FIN_ACK unnecessarily.

Additionally, if a FIN is received when we're already in the FinAcked state (we've completed our own shutdown), sending another FIN_ACK might be redundant. However, this is a minor issue as the remote should not be sending FIN after receiving our FIN_ACK, and if they do, sending another FIN_ACK is harmless.

Contributor Author (@timwu20):

The code already returns early after detecting a duplicate FIN. The swap returns the previous value. If read_closed was already true, we return immediately at line 199 without sending FIN_ACK.

Copilot AI left a comment: reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Copilot AI left a comment: reviewed 5 out of 5 changed files in this pull request and generated no new comments.

mut self: Pin<&mut Self>,
cx: &mut Context<'_>,
) -> Poll<Result<(), std::io::Error>> {
// State machine for proper shutdown:

Collaborator:

In poll_flush we return immediately; IIUC we should send the FIN packet only after all pending data is delivered. Do we handle this gracefully, or can we deliver FIN before buffered data reaches the remote?

Contributor Author (@timwu20):

Yes, with the poll_reserve call on line 446, we won't send the FIN via self.tx unless the channel has space. After the FIN flag is sent, we won't drop the connection until a FIN_ACK is received or the timeout expires.
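For illustration, a sketch of that reserve-then-send ordering using tokio_util::sync::PollSender; the function name and frame type are hypothetical (the real code sends an Event through the substream's channel):

    use std::task::{Context, Poll};

    use tokio_util::sync::PollSender;

    /// Only hand the FIN frame to the outbound channel once capacity is reserved,
    /// so it queues behind already-buffered data and is never dropped for lack of
    /// space.
    fn poll_send_fin(
        tx: &mut PollSender<Vec<u8>>,
        cx: &mut Context<'_>,
        fin_frame: Vec<u8>,
    ) -> Poll<Result<(), std::io::Error>> {
        // Wait until the channel can accept one more item.
        match tx.poll_reserve(cx) {
            Poll::Ready(Ok(())) => {}
            Poll::Ready(Err(_)) => {
                return Poll::Ready(Err(std::io::Error::new(
                    std::io::ErrorKind::BrokenPipe,
                    "substream channel closed",
                )))
            }
            Poll::Pending => return Poll::Pending,
        }
        // Capacity is reserved, so this send cannot fail due to backpressure.
        tx.send_item(fin_frame).map_err(|_| {
            std::io::Error::new(std::io::ErrorKind::BrokenPipe, "substream channel closed")
        })?;
        Poll::Ready(Ok(()))
    }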


/// Shorter timeout for tests.
#[cfg(test)]
const FIN_ACK_TIMEOUT: Duration = Duration::from_secs(2);

Collaborator:

We can remove this and rely on the 5s value used in production.

Contributor Author (@timwu20):

Updated in 184971c.

Collaborator @lexnv left a comment:

LGTM!

Let's create an issue for implementing the lock-free atomic state, thanks

Contributor Author (@timwu20) commented Jan 23, 2026:

> LGTM!
>
> Let's create an issue for implementing the lock-free atomic state, thanks

Created an issue to track this #523

Linked issue: #476 (webrtc: Gracefully close substreams)