Skip to content

Port the full VF2Layout pass to Rust #14056

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 16 commits into from
Apr 7, 2025
Merged

Conversation

mtreinish
Copy link
Member

@mtreinish mtreinish commented Mar 19, 2025

Summary

This commit provides a full path Rust implementation of the VF2Layout pass that performs the full pass logic in Rust. We had most of the pass in Rust already by leveraging rustworkx for its vf2 implementation and the scoring logic in Rust. So all that we did in Python was create the graphs as inputs to vf2, building the error map and other rust view objects (since the scoring in Rust predates our rust data model in Qiskit) and then process each found layout for final output. This means the performance improvement expected by this commit is not large since most of the heavy lifting was already in Rust. Instead what this really enables is running the pass in a Python-free context, mainly in a C API for executing this pass which will come in a follow up.

This rust implementation does not cover the full feature set of what the Python implementation offers. Mainly in the case of no target, coupling graph randomization (which is a shame we forgot to remove in 2.0, since it almost always produces worse results), or a user provided error map we rely on the existing Python implementation. This is a tradeoff because these use cases are non-standard and having a Rust interface for running in that mode provides little benefit for the interfaces we're building.

Details and comments

Fixes #12277

TODO:

  • Fix panics in unittests
  • Deduplicate implementation around generics for directed vs undirected
  • Benchmark and tune (and write release note if there is a performance improvement)

This commit provides a full path Rust implementation of the VF2Layout
pass that performs the full pass logic in Rust. We had most of the
pass in Rust already by leveraging rustworkx for its vf2
implementation and the scoring logic in Rust. So all that we did in
Python was create the graphs as inputs to vf2, building the error map
and other rust view objects (since the scoring in Rust predates our
rust data model in Qiskit) and then process each found layout for final
output. This means the performance improvement expected by this commit
is not large since most of the heavy lifting was already in Rust.
Instead what this really enables is running the pass in a Python-free
context, mainly in a C API for executing this pass which will come in a
follow up.

This rust implementation does not cover the full feature set of what the
Python implementation offers. Mainly in the case of no target, coupling
graph randomization (which is a shame we forgot to remove in 2.0, since it
almost always produces worse results), or a user provided error map we
rely on the existing Python implementation. This is a tradeoff because
these use cases are non-standard and having a Rust interface for running
in that mode provides little benefit for the interfaces we're building.

Fixes Qiskit#12277
@mtreinish mtreinish added on hold Can not fix yet Changelog: None Do not include in changelog Rust This PR or issue is related to Rust code in the repository mod: transpiler Issues and PRs related to Transpiler labels Mar 19, 2025
@mtreinish mtreinish added this to the 2.1.0 milestone Mar 19, 2025
@mtreinish mtreinish requested a review from a team as a code owner March 19, 2025 17:22
@qiskit-bot
Copy link
Collaborator

One or more of the following people are relevant to this code:

  • @Qiskit/terra-core

This commit address some of the panics and failures seen in scheduling
tests. The source of the errors was two fold, the first was the layout
definition coming from the mapping is sparse but we were using nlayout
which doesn't handle that well so we ended up with several sentinel
values persiting to scoring when scoring the layout which caused errors.
The second issue was we were missing the special handling of free qubits
that was added as a performance optimization in Qiskit#9148 which was causing
other failures since some of the code was expecting that.
This commit fixes several issues all stemming from the same root cause.
There are multiple mappings of indices to qubits to keep track of and
the previous commits in this PR branch were not leveraging the type
system to keep these sane. This resulted in getting the mapping between
representations incorrect at various points that was causing failures
and/or panics when the mappings were not resolved correctly. This commit
fixes the issue by moving to a typed representation of the layout as
soon as posible. This forces all the places that reference a potential
layout to reason about the various mappings correctly.
Now that the pass runs in rust we don't emit python logs for the running
of the pass. This commit removes the test log assertions since they're
no longer present.
This commit fixes the logic in the free qubit mapping and also skips
running vf2 if there are no qubits in the interaction graph and there
are free qubits.
@coveralls
Copy link

coveralls commented Mar 20, 2025

Pull Request Test Coverage Report for Build 14313701929

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 404 of 483 (83.64%) changed or added relevant lines in 4 files are covered.
  • 23 unchanged lines in 5 files lost coverage.
  • Overall coverage increased (+0.08%) to 88.147%

Changes Missing Coverage Covered Lines Changed/Added Lines %
qiskit/transpiler/passes/layout/vf2_layout.py 14 17 82.35%
crates/accelerate/src/vf2_layout.rs 385 461 83.51%
Files with Coverage Reduction New Missed Lines %
crates/accelerate/src/unitary_synthesis.rs 1 94.79%
qiskit/transpiler/passes/layout/vf2_utils.py 3 95.14%
crates/qasm2/src/lex.rs 6 92.23%
crates/qasm2/src/parse.rs 6 96.68%
qiskit/transpiler/passes/layout/vf2_layout.py 7 91.38%
Totals Coverage Status
Change from base Build 14312020200: 0.08%
Covered Lines: 73451
Relevant Lines: 83328

💛 - Coveralls

The earlier approaches were querying the target for each gate while
iterating over all the trials. This ends up doing far more hashmap
lookups (2x for each gate on a node or edge for each layout) this ends
up dominating the runtime. This also fixes an inconsistency in how the
scoring worked because we weren't correctly accounting for the number of
interactions in the circuit.
This commit fixes a bug that slipped in during an earlier fix where the
sort order for the free qubits mapping was accidentally flipped. The
intent is to put the lowest error free qubit at the end of the vec so
it's the first popped from the stack when mapping it to a qubit with
solely single qubit operations. However the comparison got reversed
accidentally and that was causing the highest error rate qubits to be
used instead. This commit fixes this oversight and fixes the one fialing
visualization test that was using the wrong qubit compared to the
expected one.
@mtreinish
Copy link
Member Author

mtreinish commented Mar 21, 2025

The one test that is failing: test.python.compiler.test_transpiler.TestTranspile.test_scheduling_dt_constraints looks to be hitting a bug elsewhere in Qiskit. The underlying issue is that the pass manager is constructed with: transpile(qc, backend=backend_v2, scheduling_method="asap", dt=backend.target.dt / 2) and the error rates from the target seem to disappear when vf2layout is passed the target object. This is causing the test to fail because it doesn't have a metric to pick a qubit from in the updated pass in rust.

I think what is happening is that there is a Target.from_configuration call to override the instruction durations with a custom dt and that doesn't know about the error rates from the original target and it's simply omitting them.

@mtreinish
Copy link
Member Author

The performance here is looking pretty good without any real tuning, especially considering most of the heavy lifting was in rust already,

Benchmarks that have improved:

| Change   | Before [e0da163e] <vf2-rs~3^2>   | After [daec870e] <vf2-rs>   |   Ratio | Benchmark (Parameter)                                   |
|----------|----------------------------------|-----------------------------|---------|---------------------------------------------------------|
| -        | 7.55±0.05ms                      | 6.32±0.06ms                 |    0.84 | utility_scale.UtilityScaleBenchmarks.time_bvlike('ecr') |
| -        | 7.61±0.02ms                      | 6.31±0.07ms                 |    0.83 | utility_scale.UtilityScaleBenchmarks.time_bvlike('cx')  |
| -        | 7.57±0.08ms                      | 6.32±0.07ms                 |    0.83 | utility_scale.UtilityScaleBenchmarks.time_bvlike('cz')  |

Benchmarks that have stayed the same:

| Change   | Before [e0da163e] <vf2-rs~3^2>   | After [daec870e] <vf2-rs>   | Ratio   | Benchmark (Parameter)                                                         |
|----------|----------------------------------|-----------------------------|---------|-------------------------------------------------------------------------------|
|          | 0                                | 0                           | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cx')                 |
|          | 0                                | 0                           | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('cz')                 |
|          | 0                                | 0                           | n/a     | utility_scale.UtilityScaleBenchmarks.track_bvlike_depth('ecr')                |
|          | 94.2±0.9ms                       | 95.0±1ms                    | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cz')                |
|          | 30.4±0.2ms                       | 30.7±0.1ms                  | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cz')  |
|          | 30.1±0.1ms                       | 30.5±0.2ms                  | 1.01    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('ecr') |
|          | 8.78±0.06ms                      | 8.76±0.1ms                  | 1.00    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cx')               |
|          | 8.80±0.04ms                      | 8.80±0.06ms                 | 1.00    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('cz')               |
|          | 95.1±0.5ms                       | 95.3±0.8ms                  | 1.00    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('ecr')               |
|          | 387                              | 387                         | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cx')                 |
|          | 402                              | 402                         | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('cz')                 |
|          | 406                              | 406                         | 1.00    | utility_scale.UtilityScaleBenchmarks.track_bv_100_depth('ecr')                |
|          | 300                              | 300                         | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cx')                |
|          | 300                              | 300                         | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('cz')                |
|          | 300                              | 300                         | 1.00    | utility_scale.UtilityScaleBenchmarks.track_circSU2_depth('ecr')               |
|          | 1727                             | 1727                        | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cx')                   |
|          | 1734                             | 1734                        | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('cz')                   |
|          | 1734                             | 1734                        | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qaoa_depth('ecr')                  |
|          | 1982                             | 1982                        | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cx')                    |
|          | 1982                             | 1982                        | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('cz')                    |
|          | 1980                             | 1980                        | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qft_depth('ecr')                   |
|          | 2694                             | 2694                        | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cx')                     |
|          | 2694                             | 2694                        | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('cz')                     |
|          | 2694                             | 2694                        | 1.00    | utility_scale.UtilityScaleBenchmarks.track_qv_depth('ecr')                    |
|          | 558                              | 558                         | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cx')      |
|          | 558                              | 558                         | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('cz')      |
|          | 558                              | 558                         | 1.00    | utility_scale.UtilityScaleBenchmarks.track_square_heisenberg_depth('ecr')     |
|          | 818±20ms                         | 812±20ms                    | 0.99    | utility_scale.UtilityScaleBenchmarks.time_circSU2('cz')                       |
|          | 826±20ms                         | 814±20ms                    | 0.99    | utility_scale.UtilityScaleBenchmarks.time_circSU2('ecr')                      |
|          | 95.1±0.4ms                       | 94.3±1ms                    | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_qft_n100('cx')                |
|          | 30.9±0.2ms                       | 30.7±0.2ms                  | 0.99    | utility_scale.UtilityScaleBenchmarks.time_parse_square_heisenberg_n100('cx')  |
|          | 528±2ms                          | 523±2ms                     | 0.99    | utility_scale.UtilityScaleBenchmarks.time_qft('ecr')                          |
|          | 105±2ms                          | 104±0.7ms                   | 0.99    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('ecr')            |
|          | 73.7±0.9ms                       | 71.8±1ms                    | 0.98    | utility_scale.UtilityScaleBenchmarks.time_bv_100('cz')                        |
|          | 73.5±0.6ms                       | 71.8±0.4ms                  | 0.98    | utility_scale.UtilityScaleBenchmarks.time_bv_100('ecr')                       |
|          | 9.00±0.06ms                      | 8.81±0.08ms                 | 0.98    | utility_scale.UtilityScaleBenchmarks.time_parse_qaoa_n100('ecr')              |
|          | 271±4ms                          | 265±1ms                     | 0.98    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cz')                          |
|          | 408±2ms                          | 400±2ms                     | 0.98    | utility_scale.UtilityScaleBenchmarks.time_qft('cx')                           |
|          | 512±5ms                          | 500±3ms                     | 0.98    | utility_scale.UtilityScaleBenchmarks.time_qv('cx')                            |
|          | 619±4ms                          | 604±8ms                     | 0.98    | utility_scale.UtilityScaleBenchmarks.time_qv('cz')                            |
|          | 621±9ms                          | 606±2ms                     | 0.98    | utility_scale.UtilityScaleBenchmarks.time_qv('ecr')                           |
|          | 105±1ms                          | 103±1ms                     | 0.98    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cz')             |
|          | 821±20ms                         | 795±30ms                    | 0.97    | utility_scale.UtilityScaleBenchmarks.time_circSU2('cx')                       |
|          | 203±1ms                          | 197±3ms                     | 0.97    | utility_scale.UtilityScaleBenchmarks.time_qaoa('cx')                          |
|          | 257±1ms                          | 250±3ms                     | 0.97    | utility_scale.UtilityScaleBenchmarks.time_qaoa('ecr')                         |
|          | 70.9±0.7ms                       | 67.7±0.4ms                  | 0.96    | utility_scale.UtilityScaleBenchmarks.time_bv_100('cx')                        |
|          | 515±4ms                          | 495±5ms                     | 0.96    | utility_scale.UtilityScaleBenchmarks.time_qft('cz')                           |
|          | 92.0±1ms                         | 88.5±0.8ms                  | 0.96    | utility_scale.UtilityScaleBenchmarks.time_square_heisenberg('cx')             |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

@mtreinish mtreinish removed the on hold Can not fix yet label Mar 25, 2025
@mtreinish mtreinish changed the title [WIP] Port the full VF2Layout pass to Rust Port the full VF2Layout pass to Rust Mar 25, 2025
@mtreinish
Copy link
Member Author

I took this out of WIP because everything is working now and I think it's ready for review. There is still some internal cleanup to the code I think we can do and I'll work on this but I don't want to block on internal code organization because I expect there will be a long tail of changes built on top of this (vf2 post layout, other optimizations, etc).

mtreinish added a commit to mtreinish/qiskit-core that referenced this pull request Apr 4, 2025
This commit ports the disjoint layout utils into rust. The eventual goal
of this is to enable porting the full DenseLayout and SabreLayout pass
to rust so it's a standalone function. This will enable us exposing
these passes as a C function. Since the eventual goal of this
functionality is to expose it as a C function this attempts to do as
much as possible without a py token. A standalone rust entrypoint is
added which will serve as what is called from the standalone rust passes
in the future, this is then wrapped in a python entrypoint.

One minor note is the visibility of the run pass over components
entrypoint is `pub(crate)`, but the intent is for it to be `pub`. This
visibility was limited here to avoid changing the target's visbility
level (which is `pub(crate)` for some reason). This is already being
fixed in Qiskit#14056 and when that merges we can update this PR.
Copy link
Member

@alexanderivrii alexanderivrii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have spent quite a bit of time looking at the code, and it makes good sense to me. The only thing that I still don't fully understand is that the free qubits are treated a bit differently in the directed and the undirected cases. Could you please explain the reason for that?

fn build_average_error_map(target: &Target) -> ErrorMap {
let qargs_count = target.qargs().unwrap().count();
let mut error_map = ErrorMap::new(Some(qargs_count));
for qargs in target.qargs().unwrap().flatten() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it always safe to call target.qargs().unwrap() here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be, if .qargs() returns None we handle that before this function is ever called. It's done in build_coupling_map() at L262 it returns the None with the ? and that short circuit returns at L371 or L422 (depending on the graph type).

Comment on lines +432 to +438
return Ok(map_free_qubits(
im_graph_data.free_nodes,
HashMap::new(),
&im_graph_data.reverse_im_graph_node_map,
&avg_error_map,
target,
));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is map_free_qubits only called in the undirected case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In VF2 and VF2Post there are 3 modes of operations with code shared between the two passes.

  1. Undirected Graph w/ average error rates
  2. Directed Graph w/ average error rates (VF2Layout only)
  3. Directed Graph using target error rates (VF2PostLayout only)

in mode 3 we can't use the free qubits handling because the vf2_mapping function is passed a node and edge matcher that validates the target supports the circuit's operations on a virtual qubit. This is required because in that mode we validate that the new layout is valid per the target and we don't watch vf2 to consider a match if it leaves qubits free that don't support the operations that the free circuit qubits use.

When it comes to mode 2 we could in theory support this optimization but the code doesn't because the directed interaction graph generation is shared between modes 2 and 3. Also mode 2 in general is a bit weird and we don't ever use it by default. So I didn't implement it here because I didn't want to change the behavior from Python for this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fwiw, in my follow-on patch series, I have done this free-qubit mapping for variant 2, because it let me deduplicate some logic (though it's not all that hard to reinstate it).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mtreinish for the explanation, seeing the bigger picture definitely helps.

Comment on lines +159 to +160
# We can't use the rust fast path because we have a seed set, or no target so continue with
# the python path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this is actually possible certainly for the seed, and (with greater annoyance) also for the coupling map, but I wouldn't necessarily bother in this PR now. It'd be good to remove all this extra Python code at some point, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBH, I was thinking instead it would be better to deprecate these options on the pass and support target only moving forward and no randomization moving forward. For VF2Layout I could see the argument for maybe still supporting coupling map but for VF2PostLayout it doesn't really make any sense.

I only wanted to concentrate on the common path that we use in the preset pass managers for this PR though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I generally agree. My main concern is getting rid of the extra Python code, mostly - it's unpleasant that we're maintaining two separate paths through VF2. I don't feel like we need to do anything about it right now.

@jakelishman
Copy link
Member

Btw (for those who aren't Matt, who already knows this): I've been working on a follow-on patch series that goes on top of this PR, and reworks a lot of how we do VF2 from quite a low level. From what I saw of working with it, the Rust-space code here is a faithful port of the Python-space code, and while there's potentially ways to reduce some duplication in the constructions of the interactions and couplings, what's here works fine.

Copy link
Member

@alexanderivrii alexanderivrii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mtreinish for fully porting the VF2Layout pass to Rust, I completely agree with @jakelishman that this is a faithful port of the Python code, approving!

@alexanderivrii alexanderivrii added this pull request to the merge queue Apr 7, 2025
Merged via the queue into Qiskit:main with commit 2e18f9b Apr 7, 2025
24 checks passed
@mtreinish mtreinish deleted the vf2-rs branch April 7, 2025 20:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Changelog: None Do not include in changelog mod: transpiler Issues and PRs related to Transpiler Rust This PR or issue is related to Rust code in the repository
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Port VF2Layout to Rust
5 participants