[d3d9] Optimize SWVP devices #4274

Merged
merged 3 commits into from
Sep 22, 2024

K0bin
Collaborator

@K0bin K0bin commented Sep 18, 2024

Needs lots of testing.

This makes D3D9 devices that are configured to always use software vertex processing (so not MIXED) always use the late per-draw buffer upload path. We copy the vertex data that each specific draw accesses to a temporary buffer and render from that, similar to UP draws.
This makes sense because games that use pure SWVP expect vertex processing to be synchronous, which has led to both bugs and performance problems. For example, we used to run into issues when respecting NOOVERWRITE, or had dozens or even hundreds of queue syncs per frame. Considering that SWVP is supposed to run on the CPU, the number of vertices is hopefully small.
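
As a rough illustration of the per-draw copy described above, here is a minimal sketch. It is not DXVK's actual code: `DrawRange` and `copyDrawVertices` are hypothetical names, the vertex buffer is modelled as a plain byte vector, and only a non-indexed draw is considered.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical description of a non-indexed draw; not DXVK's actual types.
struct DrawRange {
  uint32_t firstVertex;  // index of the first vertex the draw reads
  uint32_t vertexCount;  // number of vertices the draw reads
  uint32_t stride;       // size of one vertex in bytes
};

// Copies only the bytes the draw reads out of the application's vertex
// buffer into a fresh temporary buffer, so the draw can render from the copy.
inline std::vector<uint8_t> copyDrawVertices(
    const std::vector<uint8_t>& vertexBuffer,
    const DrawRange&            draw) {
  const size_t offset = size_t(draw.firstVertex) * draw.stride;
  const size_t length = size_t(draw.vertexCount) * draw.stride;

  // The draw must not read past the end of the source buffer.
  assert(offset + length <= vertexBuffer.size());

  std::vector<uint8_t> temp(length);
  if (length != 0)
    std::memcpy(temp.data(), vertexBuffer.data() + offset, length);
  return temp;
}
```

Because the copy is taken at draw time, later writes by the application to the original buffer cannot affect a draw that has already been recorded, which is the synchronous behaviour the description says pure SWVP games expect.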

I hope this won't impact more modern or demanding games.

The game that inspires this was Phantasmat from this comment:
#4263 (comment)

It uses a single 96,000-byte vertex buffer (POOL = DEFAULT, USAGE = WRITEONLY, FVF != 0) and writes data to it before every single draw. Of course it also doesn't specify a lock range, so we end up uploading the entire 96 KB buffer over and over again, run out of staging memory and then stall. It is a 2D game, so with this PR we upload only 4 vertices for every draw.
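
To put rough numbers on that, the sketch below compares the bytes uploaded per frame by the two strategies. `UploadEstimate` and `estimateUploads` are hypothetical helpers for illustration only. The buffer size and the 4 vertices per draw come from the description above; the vertex stride and the number of draws per frame are not stated in this PR and have to be assumed by the caller.

```cpp
#include <cassert>
#include <cstdint>

// Bytes uploaded per frame under each strategy (hypothetical helper type).
struct UploadEstimate {
  uint64_t wholeBufferBytes;  // entire buffer re-uploaded before every draw
  uint64_t perDrawBytes;      // only the vertices each draw reads
};

// stride and drawCount are assumptions supplied by the caller; the PR
// description only gives the buffer size and the vertices per draw.
inline UploadEstimate estimateUploads(
    uint64_t bufferSize,
    uint64_t verticesPerDraw,
    uint64_t stride,
    uint64_t drawCount) {
  // A single draw cannot read more data than the buffer holds.
  assert(verticesPerDraw * stride <= bufferSize);

  return UploadEstimate {
    bufferSize * drawCount,
    verticesPerDraw * stride * drawCount
  };
}
```

For example, with an assumed 24-byte vertex and an assumed 500 draws per frame, `estimateUploads(96000, 4, 24, 500)` gives 48,000,000 bytes for whole-buffer uploads against 48,000 bytes for per-draw uploads, a factor of 1000.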

@K0bin
Collaborator Author

K0bin commented Sep 18, 2024

cc @WinterSnowfall

@WinterSnowfall
Contributor

I can throw this at a bunch of games of course, but I think it would be very helpful (since enhancing the HUD is a trend now) to also add the type of VP (based on the device type, and on m_isSWVP in the case of Mixed) as an element to the D3D9 HUD.

The use of D3DCREATE_SOFTWARE_VERTEXPROCESSING devices is AFAIK very limited even in d3d8 and generally only used as a fallback in case HW or Mixed modes fail. A very limited set of games let you pick which to use.
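
One way the HUD suggestion above could be sketched is shown below. This is only an illustration, not an existing DXVK HUD item: `vertexProcessingLabel` and the label strings are hypothetical, and the flag constants are local copies of the `D3DCREATE_*` vertex processing values from `d3d9.h` so that the snippet compiles on its own. The `isSWVP` parameter stands in for the `m_isSWVP` state mentioned in the comment.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Local copies of the D3DCREATE_* vertex processing behavior flags from
// d3d9.h, redefined here so the sketch is self-contained.
constexpr uint32_t kCreateSoftwareVP = 0x00000020;
constexpr uint32_t kCreateHardwareVP = 0x00000040;
constexpr uint32_t kCreateMixedVP    = 0x00000080;

// Returns a hypothetical HUD label for the device's vertex processing mode.
// For MIXED devices the label depends on the current runtime mode (isSWVP).
inline std::string vertexProcessingLabel(uint32_t behaviorFlags, bool isSWVP) {
  const uint32_t vpFlags = behaviorFlags
    & (kCreateSoftwareVP | kCreateHardwareVP | kCreateMixedVP);

  // Exactly one of the three vertex processing flags is expected.
  assert(vpFlags == kCreateSoftwareVP
      || vpFlags == kCreateHardwareVP
      || vpFlags == kCreateMixedVP);

  if (vpFlags == kCreateMixedVP)
    return isSWVP ? "Mixed (SWVP)" : "Mixed (HWVP)";

  return vpFlags == kCreateSoftwareVP ? "SWVP" : "HWVP";
}
```

For pure software or hardware devices the runtime flag is ignored, since only MIXED devices can switch modes after creation.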

@K0bin K0bin force-pushed the swvp-opt branch 2 times, most recently from b06b8fe to ccaf5be Compare September 19, 2024 14:19
@WinterSnowfall
Contributor

WinterSnowfall commented Sep 19, 2024

This PR also properly fixes AlpyneDreams#179, on which we had more or less given up in d8vk. The Supreme Ruler d3d8 games can now be played with correct text rendering even without the "Nvidia driver workaround" configuration option (which hurt performance badly).

@K0bin
Collaborator Author

K0bin commented Sep 19, 2024

> Now you know what is a problem, as NINE have SVPs optimized.

It has nothing to do with that.

@K0bin K0bin force-pushed the swvp-opt branch 2 times, most recently from bbba13c to 88c6e82 Compare September 20, 2024 11:23
@K0bin K0bin changed the title [d3d9] Optimize pure SWVP devices [d3d9] Optimize SWVP devices Sep 20, 2024
@K0bin K0bin force-pushed the swvp-opt branch 4 times, most recently from 19792a4 to 115e9d6 Compare September 21, 2024 16:23
@doitsujin doitsujin merged commit 04ad986 into doitsujin:master Sep 22, 2024
4 checks passed
@K0bin K0bin deleted the swvp-opt branch September 22, 2024 19:07
4 participants