Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fabtests/common: Set the min of tx/rx_mr_size #10819

Merged
merged 1 commit into from
Feb 25, 2025
Merged

Conversation

shijin-aws
Copy link
Contributor

In FT_OPT_ALLOC_MULT_MR mode, currently tx/rx_mr_size is set as the opts.transfer_size, and the subsequent ft_alloc_ctx_array will use this size to allocate
tx/rx buffers in the ctx array.

However, when test call ft_post_rx_buf, it will
finally post a size as MAX(opts.transfer_size, FT_MAX_CTRL_MSG). When opts.transfer_size is smaller than FT_MAX_CTRL_MSG, it will cause test call fi_recv of size larger than the actual buffer (and MR) size and cause error when the provider offload the recv directly to hardware.

This patch fixes this issue by making the tx/rx_mr_size has a min value as the FT_MAX_CTRL_MSG.

In FT_OPT_ALLOC_MULT_MR mode, currently tx/rx_mr_size
is set as the opts.transfer_size, and the subsequent
ft_alloc_ctx_array will use this size to allocate
tx/rx buffers in the ctx array.

However, when test call ft_post_rx_buf, it will
finally post a size as MAX(opts.transfer_size, FT_MAX_CTRL_MSG).
When opts.transfer_size is smaller than FT_MAX_CTRL_MSG,
it will cause test call fi_recv of size larger than
the actual buffer (and MR) size and cause error when
the provider offload the recv directly to hardware.

This patch fixes this issue by making the tx/rx_mr_size
has a min value as the FT_MAX_CTRL_MSG.

Signed-off-by: Shi Jin <sjina@amazon.com>
Copy link
Contributor

@aingerson aingerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, Shi!

@aingerson
Copy link
Contributor

@shijin-aws Please hold on merging. CI failed fabtests stages for multiple providers. Our cluster is having issues so I need to figure out if it had anything to do with the changes or if it was the cluster was in a bad state.

@shijin-aws
Copy link
Contributor Author

@aingerson @zachdworkin any update on the Intel CI failure?

@zachdworkin
Copy link
Contributor

@shijin-aws our CI cluster is having node issues. We are resolving them and will run ASAP

@jiaxiyan jiaxiyan merged commit 4d7d7c1 into ofiwg:main Feb 25, 2025
13 checks passed
@shijin-aws shijin-aws deleted the mr_size branch February 25, 2025 00:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants