Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCT: Removed XGVMI BF2 support (umem) #10476

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

iyastreb
Copy link
Contributor

@iyastreb iyastreb commented Feb 6, 2025

What?

This PR discontinues XGVMI support on BF2 devices, by removing umem fallback XGVMI registration

However the tricky part here is that FW capability does not allow to detect whether indirect XGVMI mkeys are actually supported. So we do this check by trying to allow XGVMI access on flush_rkey during MD open.

This has consequences for GGA as well: see the changes

@iyastreb iyastreb requested a review from yosefe February 6, 2025 13:11
}

return uct_ib_mlx5_devx_xgvmi_umem_mr(md, memh);
return UCS_ERR_UNSUPPORTED;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. can return UNSUPPORTED after the if in line 2728
  2. shall we check UCT_IB_MLX5_MD_FLAG_INDIRECT_XGVMI support in order to report if the MD supports exported key?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes, that's the desired behavior
    We noticed that in some cases UCT_IB_MLX5_MD_FLAG_INDIRECT_XGVMI capability is advertised, but not actually supported. In this case we detect this and revoke this capability (line 2743). In this case we return UNSUPPORTED

  2. Ok, we already kind of do that:

        if (uct_ib_mlx5_devx_check_xgvmi(cap_2, uct_ib_device_name(dev))) {
            md->flags           |= UCT_IB_MLX5_MD_FLAG_INDIRECT_XGVMI;
            md->super.cap_flags |= UCT_MD_FLAG_EXPORTED_MKEY;
        }

There is a subtle difference between INDIRECT_XGVMI and EXPORTED_MKEY. The first one could be revoked if later on we detect that XGVMI is not supported at runtime

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it means we remove the fallback approach for supporting exported mkey, right?
and we expose exported mkey support in MD caps, while it may not be really possible to support.
Would it be possible to check somehow whether we can create ksms for exported key during MD creation? Then we would not need UCT_IB_MLX5_MD_FLAG_INDIRECT_XGVMI at all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right

can create ksms for exported key during MD creation?

I have another idea.
If I understand correctly it is only BF2 device who advertizes INDIRECT_XGVMI support, but then fails to register it. Maybe we just disable EXPORTED_MKEY on BF2 (it is no-op anyway without UMEM fallback)?
And then we can remove UCT_IB_MLX5_MD_FLAG_INDIRECT_XGVMI

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good to me

@iyastreb iyastreb force-pushed the uct/remove-xgvmi-bf2-support branch from 2ea3965 to 04c837f Compare February 7, 2025 07:53
@iyastreb
Copy link
Contributor Author

iyastreb commented Feb 7, 2025

/azp run UCX PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Comment on lines +206 to +210
/*
* This check does not guarantee that XGVMI works with indirect mkeys.
* We can filter out devices that does not support XGVMI at all.
*/
result = uct_ib_mlx5_devx_check_xgvmi(cap);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will it guarantee?

Suggested change
/*
* This check does not guarantee that XGVMI works with indirect mkeys.
* We can filter out devices that does not support XGVMI at all.
*/
result = uct_ib_mlx5_devx_check_xgvmi(cap);
result = uct_ib_mlx5_devx_check_xgvmi(cap) &&
uct_ib_mlx5_devx_allow_xgvmi_access(device, uct_ib_mlx5_flush_rkey_make(), 1);

+minor refactoring of uct_ib_mlx5_devx_allow_xgvmi_access

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love it to be that simple, but of course it doesn't fly.
We need to allow XGVMI on some real valid mkey, not a dummy one like this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's the difference with

    if ((cap_2 != NULL) && (md->flush_mr != NULL) &&
        (md->super.flush_rkey != UCT_IB_MD_INVALID_FLUSH_RKEY) &&
        uct_ib_mlx5_devx_check_xgvmi(cap_2)) {
            status = uct_ib_mlx5_devx_allow_xgvmi_access(
                                                md, md->super.flush_rkey, 1);

in md_open?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The difference is that flush_rkey is a real mkey created in md_open in uct_ib_mlx5_devx_init_flush_mr.
I just reuse that key for XGVMI check

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I missed devx path.. but looking into uct_ib_mlx5_devx_init_flush_mr and uct_ib_mlx5_md_buf_alloc it seems like MD is not really required, we have ibv_context and can reg/dereg local buffer to build temporary key. Do I miss something else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you pls check the case with ucs_info -d described in #10372 which I fixed recently? I guess this PR in current state is a kind of regression..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to implement it the way you proposed, it should be consistent then

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, it's implemented, with a little hack to avoid massive refactoring.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it LGTM if uct_ib_mlx5_devx_dummy_md_init/clenup will be removed in next PR with corresponding refactoring, I think it could be useful for IB MD as well for other caps which can be detected only in runtime.

Copy link
Contributor Author

@iyastreb iyastreb Feb 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is another tricky issue with this approach..
I see that many fork tests are failing in CI, and the root cause is the initialization order: we must call uct_ib_fork_init before any ibv_create_* call:
#686

/*
169  * ibv_fork_init testing - if fork support is requested then ibv_fork_init                                                       
170  * should be called right at the beginning of the verbs initialization flow, before ibv_create_* call.
171  *
172  * Known limitations:
173  * If ibv_fork_init is called after ibv_create_* functions - it will have no effect.
174  * OMPI initializes verbs many times during initialization in the following verbs components:
175  *      oob/ud, btl/openib, mtl/mxm, pml/yalla, oshmem/ikrit, oshmem/yoda, ompi/mca/coll/{fca,hcoll}
176  *
177  * So, ibv_fork_init should be called once, in the beginning of the init flow of every verb component
178  * to proper request fork support.
179  *
180  */

Violation of this rule leads to warnings in tests and potential fork errors at runtime.
So I tend to revert the change for detecting XGVMI support before MR is created: it was already a bit hack-ish, and with this fork limitation it is just not worth it

@iyastreb iyastreb force-pushed the uct/remove-xgvmi-bf2-support branch from e59abc5 to e2baa4f Compare February 18, 2025 08:10
* indirect mkeys is actually supported. Therefore we do this check by
* allowing XGVMI on indirect KSM flush_rkey
*/
if ((cap_2 != NULL) && (md->flush_mr != NULL) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in which case md->flush_mr is NULL? maybe we can assert it is not NULL?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible error outputs of uct_ib_mlx5_devx_init_flush_mr:

  • flush_rkey=INVALID, flush_mr=NULL - can be if UCT_IB_MLX5_MD_FLAG_KSM is not supported (which is also a correct precondition for XGVMI)
  • flush_rkey=DUMMY, flush_mr=NULL - if uct_ib_reg_mr or KSM registration fails

So assertion on flush_mr!=NULL is not good, because it can be a valid device without KSM reg. At least currently we handle this situation gracefully in uct_ib_mlx5_devx_init_flush_mr.

In principle it's enough to check that flush_mr!=NULL, but I also check flush_rkey is valid just for extra safety

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so maybe assert flush_rkey is not invalid? instead of checking it as part of the if condition

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But if we do assert(flaush_rkey != INVALID) then it will crash on HW where KSM is not supported, right?
I guess this is BF2. It is actually the first point that I described in my comment - KSM is not supported is pretty valid use case IMO

@iyastreb
Copy link
Contributor Author

/azp run UCX PR

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants