Jump to content

Basic Scene Benchmarks



I did some work optimizing the new renderer to discrete cards and came up with a new benchmark. This scene is interesting because there is not much in the scene, so the culling does not benefit much xrom multithreading, since there is so little work to do. The screen is filled with a spotlight in the foreground, and in the background a point light is visible in the distance. Both engines are running at 1280x720 resolution. This is really a pure test of deferred vs. forward lighting.

AMD R9 200

Leadwerks: 448 FPS
Turbo: 648 FPS (44% faster)

On this older AMD card we see that forward rendering automatically gives a significant boost in performance, likely due to not having to pass around a lot of big textures that are being written to.



GEForce 1070M (Laptop)

Leadwerks 4: 120 FPS
Turbo: 400 FPS (333% faster)

Here we see an even bigger performance boost. However, these are two very different cards and I don't think it's safe to say yet if one GPU vendor will get more from the new engine.

Intel Graphics 4000

Leadwerks 4: 67 FPS  (18% faster)
Turbo: 57 FPS

This is an interesting result because integrated graphics tend to struggle with pixel fillrate, but seem to have no problems with texture bandwidth since they are using regular old CPU memory for everything. Although I think the new engine will run faster on Intel graphics (especially newer CPUs) when the scene is loaded down, it seems to run a bit slower in a nearly empty scene. I think this demonstrates that if you build a renderer specifically to take advantage of one type of hardware (discrete GPUs) it will not automatically be optimal on other types of hardware with different architectures. I commented out the lighting code and the framerate shot way up, so it definitely does seem to be a function of the pixel shader, which is already about as optimized as can be.

Overall, I think the new engine will show higher relative performance the more you throw at it, but an empty scene may run a bit slower on older Intel chips since the new renderer is optimized very specifically for discrete GPUs. The parallel capabilities of discrete GPUs are relied on to handle complex lighting in a single pass, while texture bandwidth is reduced to a minimum. This is in line with the hardware trend of computing power increasing faster than memory bandwidth. This also shows that low to mid-range discrete GPUs stand to make the most gains.

  • Like 5


Recommended Comments

50 minutes ago, shadmar said:

It looks like there is less geometry in the second picture. Missing a stack on the right side.

Yeah, there was a culling problem I fixed. I don't think this has any effect on framerate.

  • Like 1

Share this comment

Link to comment

Hold on...brushes were not being removed after being collapsed into large models! The engine is way faster than this! :D

Share this comment

Link to comment

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Add a comment...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

  • Create New...