@zhijianli88
Contributor

In the RXE driver environment, tests fail intermittently because
resize_cq() can succeed before the posted WQEs have completed. When those
WQEs then complete, the CQ can become full and the kernel may discard some
CQEs, causing poll_cq() to fail.

The failure is evidenced by the following error message:

rxe_enp2s0: cq#342 rxe_cq_post: queue full

As a simple fix, introduce a short sleep after post_cq() so that at least
one WQE completes before resize_cq() is performed. This mitigates the race
condition that leads to CQ overflow.
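
The ordering problem can be illustrated with a toy model. This is only a sketch, not the pyverbs API or the RXE implementation: `ToyCQ`, `run`, and the depths used are hypothetical, and the model assumes that shrinking a CQ below its number of unpolled entries is rejected while a completion arriving at a full CQ is dropped.

```python
class ToyCQ:
    """Toy completion queue (hypothetical; not pyverbs or RXE code)."""

    def __init__(self, depth):
        self.depth = depth
        self.entries = []   # unpolled completions
        self.dropped = 0    # completions lost to "queue full"

    def resize(self, new_depth):
        # Assumed rule: shrinking below the unpolled entry count fails.
        if new_depth < len(self.entries):
            return False
        self.depth = new_depth
        return True

    def complete(self, wr_id):
        # A completion that arrives at a full queue is dropped.
        if len(self.entries) >= self.depth:
            self.dropped += 1
            return
        self.entries.append(wr_id)

    def poll(self):
        polled, self.entries = self.entries, []
        return polled


def run(complete_before_resize, posted=3, depth=4, new_depth=1):
    """Return (resize succeeded, completions polled, completions dropped)."""
    cq = ToyCQ(depth)
    if complete_before_resize:
        # Models the sleep: completions land before the resize is attempted.
        for wr_id in range(posted):
            cq.complete(wr_id)
        resized = cq.resize(new_depth)
    else:
        # Models the race: the resize wins, completions arrive afterwards.
        resized = cq.resize(new_depth)
        for wr_id in range(posted):
            cq.complete(wr_id)
    return resized, len(cq.poll()), cq.dropped


racy = run(complete_before_resize=False)
settled = run(complete_before_resize=True)
print("resize first:     ", racy)
print("completions first:", settled)
```

In this model, resizing first shrinks the queue while it is still empty, so later completions overflow it. Letting the completions land first makes the shrink fail and leaves every completion pollable, which is the ordering the sleep is meant to encourage.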

Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>