Second Performance Test: nearly 400% faster!

Josh

1,600 views

After observing the behavior of the previous test, I rearranged the threading architecture for even bigger performance gains. This build runs at over 400 FPS with 100,000 entities... on Intel integrated graphics!

Image1.thumb.jpg.0ca7079accccac555bec07f378bd64e8.jpg

I've had more luck with concurrency in the design than with parallelism. (Images below are taken from here.)

Splitting the octree recursion up into separate threads produced only modest gains. It's difficult to optimize because the sparse octree is unpredictable.

concurrency-vs-parallelism-2.png.ef122c5d6445bfea18048bc5a4e947e0.png
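To illustrate why threading the recursion is hard to balance, here is a minimal sketch and not the engine's actual code. `Node`, `CountSerial`, `CountParallel`, and `MakeLopsidedTree` are hypothetical names made up for this example. It spawns one task per existing child near the root; in a lopsided tree, most tasks finish immediately while one carries nearly all of the work.

```cpp
#include <array>
#include <cassert>
#include <future>
#include <memory>
#include <vector>

// Hypothetical sparse octree node: any of the 8 children may be missing.
struct Node {
    int entityCount = 0;
    std::array<std::unique_ptr<Node>, 8> children;
};

// Serial traversal: sums the entities stored in a subtree.
int CountSerial(const Node& node) {
    int total = node.entityCount;
    for (const auto& child : node.children)
        if (child) total += CountSerial(*child);
    return total;
}

// Parallel traversal: one async task per existing child, but only for the
// first `parallelDepth` levels. Below that it falls back to the serial path.
int CountParallel(const Node& node, int parallelDepth) {
    if (parallelDepth <= 0) return CountSerial(node);
    std::vector<std::future<int>> tasks;
    for (const auto& child : node.children) {
        if (!child) continue;
        const Node* c = child.get();
        tasks.push_back(std::async(std::launch::async, [c, parallelDepth] {
            return CountParallel(*c, parallelDepth - 1);
        }));
    }
    int total = node.entityCount;
    for (auto& task : tasks) total += task.get();  // wait for every child task
    return total;
}

// Builds a deliberately lopsided tree: a chain `depth` nodes long under
// child 0, plus a single leaf under child 7. Every node holds one entity.
std::unique_ptr<Node> MakeLopsidedTree(int depth) {
    assert(depth >= 0);
    auto root = std::make_unique<Node>();
    root->entityCount = 1;
    Node* current = root.get();
    for (int i = 0; i < depth; ++i) {
        current->children[0] = std::make_unique<Node>();
        current = current->children[0].get();
        current->entityCount = 1;
    }
    root->children[7] = std::make_unique<Node>();
    root->children[7]->entityCount = 1;
    return root;
}
```

With a tree from `MakeLopsidedTree`, the task for child 7 has one node to visit while the task for child 0 walks the whole chain, so the extra thread adds little.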

Splitting different parts of the engine up into multiple threads did result in a massive performance boost.

concurrency-vs-parallelism-1.png.b04c1a22d05e544e0e6ea88de2f901f9.png
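A rough sketch of the concurrent layout, assuming just two stages (culling and rendering) that each run on their own thread and pass finished work through a queue. The post does not say which parts of the engine were actually separated, so `Channel`, `VisibleSet`, and `RunPipeline` are hypothetical names, and the visibility test and draw step are stand-ins.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal blocking queue that hands finished work from one stage to the next.
template <typename T>
class Channel {
public:
    void Push(T value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(value));
        }
        ready_.notify_one();
    }
    void Close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }
    // Blocks until a value arrives; returns nullopt once closed and drained.
    std::optional<T> Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return std::nullopt;
        T value = std::move(queue_.front());
        queue_.pop();
        return value;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
    bool closed_ = false;
};

// Hypothetical per-frame output of the culling stage.
struct VisibleSet {
    int frame;
    std::vector<int> entityIds;
};

// Runs culling and rendering on separate threads and returns how many
// entities the rendering stage "drew" in total.
int RunPipeline(int frameCount, int entityCount) {
    assert(frameCount >= 0 && entityCount >= 0);
    Channel<VisibleSet> visible;
    int drawn = 0;  // written only by the rendering thread, read after join

    std::thread culling([&] {
        for (int frame = 0; frame < frameCount; ++frame) {
            VisibleSet set{frame, {}};
            for (int id = 0; id < entityCount; ++id)
                if (id % 2 == 0) set.entityIds.push_back(id);  // stand-in visibility test
            visible.Push(std::move(set));
        }
        visible.Close();
    });

    std::thread rendering([&] {
        while (auto set = visible.Pop())
            drawn += static_cast<int>(set->entityIds.size());  // stand-in for draw calls
    });

    culling.join();
    rendering.join();
    return drawn;
}
```

The point of this layout is that culling can start on the next frame while rendering is still consuming the previous one, so neither stage waits on the other unless the queue is empty.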

The same test in Leadwerks 4 runs at about 9 FPS, making Leadwerks 5 more than 45 times faster under heavy loads like this.

Alpha subscribers can try the test out here.


5 Comments


When you're doing this threading, it's really more about the processor than the graphics card, isn't it? Are these threads on the CPU or GPU?

CPU. The rendering code is already very optimized and this is about eliminating all overhead on the CPU side.

As each thread is doing a smaller subset of the work, you are probably getting more cache hits. Are you also using thread affinity on your busiest threads to stop them from context switching?

 

I got the culling time down to an insanely low amount, and it would actually be much slower if I split it up into multiple threads:

 


Sorry, I assumed that each task was on its own thread, running independently. I was suggesting that if you had more active threads than CPUs, you would experience contention for those CPUs. You would see this in the performance stats as context switches, each of which causes the current context to be saved and another one loaded. If this happens a lot, you are losing useful CPU processing power.
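A small sketch of the check described here, assuming the goal is just to keep the number of busy threads at or below the number of hardware threads. `HardwareThreads`, `ClampWorkerCount`, and `IsOversubscribed` are hypothetical helper names. Pinning threads to cores (affinity) is platform-specific and is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Number of hardware threads, or `fallback` when the runtime cannot tell
// (std::thread::hardware_concurrency is allowed to return 0).
unsigned HardwareThreads(unsigned fallback = 1) {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? fallback : n;
}

// Caps a requested worker count so busy threads never outnumber hardware
// threads, while always keeping at least one worker.
unsigned ClampWorkerCount(unsigned requested, unsigned hardwareThreads) {
    assert(hardwareThreads > 0);
    return std::max(1u, std::min(requested, hardwareThreads));
}

// True when more threads want to run than the hardware can run at once,
// which is when the scheduler has to context switch between them.
bool IsOversubscribed(unsigned busyThreads, unsigned hardwareThreads) {
    return busyThreads > hardwareThreads;
}
```

For example, `ClampWorkerCount(16, HardwareThreads())` gives a worker count that fits the machine it runs on.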
