
First Animation Metrics


Josh


I got skinned animation working in the new renderer, after a few failed attempts that looked like something from John Carpenter's The Thing. I set up a timer and updated a single animation on a model 10,000 times. Animation consists of two phases. First, the animations are evaluated to calculate each bone's local position and quaternion rotation. Second, 4x4 matrices are calculated for the entire hierarchy in global space and copied into an array of floats. To test this, I placed this code inside the main loop:

// Time 10,000 animation updates of a single model
float frame = Millisecs() / 10.0f;
auto tm = Millisecs();
for (int n = 0; n < 10000; ++n) {
	monster->skeleton->SetAnimationFrame(frame, 1, 1, true);
	monster->UpdateSkinning();
}
int elapsed = Millisecs() - tm;
Print(elapsed); // elapsed time in milliseconds

The result in release mode was around 60 milliseconds. When I tested each command alone, I found that UpdateSkinning() took around 30-35 milliseconds while SetAnimationFrame() hovered around 20 milliseconds.
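As a rough sketch of the second phase described above (not the engine's actual code), the following assumes the first phase has already written a local position and quaternion rotation into each bone. The types `Bone`, `Quat`, `Vec3`, `Mat4` and the function `BuildSkinningArray` are hypothetical names for illustration. It also assumes bones are stored with parents before children and leaves out the inverse bind pose multiplication a real skinning system would apply.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not the engine's actual classes.
struct Quat { float x, y, z, w; };
struct Vec3 { float x, y, z; };
using Mat4 = std::array<float, 16>; // column-major 4x4 matrix

struct Bone {
    int parent;     // index of the parent bone, -1 for the root
    Vec3 position;  // local position (output of the first phase)
    Quat rotation;  // local unit quaternion (output of the first phase)
};

// Build a column-major matrix from a unit quaternion and a translation.
Mat4 LocalMatrix(const Quat& q, const Vec3& t) {
    const float xx = q.x * q.x, yy = q.y * q.y, zz = q.z * q.z;
    const float xy = q.x * q.y, xz = q.x * q.z, yz = q.y * q.z;
    const float wx = q.w * q.x, wy = q.w * q.y, wz = q.w * q.z;
    return {
        1.0f - 2.0f * (yy + zz), 2.0f * (xy + wz),        2.0f * (xz - wy),        0.0f,
        2.0f * (xy - wz),        1.0f - 2.0f * (xx + zz), 2.0f * (yz + wx),        0.0f,
        2.0f * (xz + wy),        2.0f * (yz - wx),        1.0f - 2.0f * (xx + yy), 0.0f,
        t.x,                     t.y,                     t.z,                     1.0f
    };
}

// Column-major product a * b.
Mat4 Multiply(const Mat4& a, const Mat4& b) {
    Mat4 c{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                c[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
    return c;
}

// Walk the hierarchy (parents must precede children), compute each bone's
// global matrix, and append it to one flat array of floats.
std::vector<float> BuildSkinningArray(const std::vector<Bone>& bones) {
    std::vector<Mat4> global(bones.size());
    std::vector<float> out;
    out.reserve(bones.size() * 16);
    for (std::size_t i = 0; i < bones.size(); ++i) {
        const Mat4 local = LocalMatrix(bones[i].rotation, bones[i].position);
        global[i] = bones[i].parent < 0
            ? local
            : Multiply(global[bones[i].parent], local);
        out.insert(out.end(), global[i].begin(), global[i].end());
    }
    return out;
}
```

Each bone contributes 16 floats, so a 24-bone model like the one tested here would produce a 384-float array under this layout.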

When I cut the number of iterations in half, the result was right around 30 milliseconds, which is within our window of time (33 ms) for games that run at 30 Hz. If your game uses a 60 Hz loop then you can cut that number in half. The model I am using also only has 24 bones, but models with up to 256 bones are supported (with a pretty linear performance cost).
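The budget arithmetic above can be written out as a small helper. This is only a back-of-the-envelope sketch: `UpdatesPerFrame` is a hypothetical name, and it assumes the whole frame time on one thread is spent on animation and that cost scales linearly with the number of updates.

```cpp
#include <cassert>

// Estimate how many animation updates fit into one frame on a single thread,
// given a measured batch time (for example 60 ms for 10,000 updates).
int UpdatesPerFrame(double batchMs, int batchCount, double frameHz) {
    const double msPerUpdate = batchMs / batchCount; // cost of one update
    const double frameBudgetMs = 1000.0 / frameHz;   // time available per frame
    return static_cast<int>(frameBudgetMs / msPerUpdate);
}
```

With the measurement above (60 ms for 10,000 updates), this gives roughly 5,500 updates per frame at 30 Hz and about half that at 60 Hz.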

Now this is with a single call to SetAnimationFrame. If the animation manager is in use there could potentially be many more calculations performed as animations are smoothly blended.

Splitting the animations up into multiple threads could be done easily, but most computers only have four CPU cores, so I don't see this being useful on more than 2-3 threads. Let's say we dedicate two threads to animation. At roughly 5,000 updates per thread per frame, that means right now our theoretical limit is about 10,000 zombies onscreen at once. I would like to go higher, but I think this is probably our realistic limit for CPU-performed animations. The alternative would be to upload the animation sequences themselves to the GPU and perform all animation entirely on the GPU, but then we would lose all feedback on the CPU side like the current bone orientations. Perhaps a solution would be to have both a CPU and GPU animation system that produces identical results, and the CPU side stuff would only be called when needed, but that makes things pretty complicated and I am not sure I want to go down that road.
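One way the split across threads could look is sketched below. It is an illustration under assumptions, not the engine's threading code: `Character` and `UpdateAllAnimations` are hypothetical names, each character is assumed to own its skeleton, and the per-character update is assumed to touch no shared state.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for an animated character.
struct Character {
    int updates = 0;
    // Placeholder for the real per-character animation and skinning work.
    void UpdateAnimation() { ++updates; }
};

// Split the characters into contiguous chunks, one per worker thread.
// Each thread only touches its own chunk, so no locking is needed.
void UpdateAllAnimations(std::vector<Character>& characters, unsigned threadCount) {
    threadCount = std::max(1u, threadCount);
    const std::size_t chunk = (characters.size() + threadCount - 1) / threadCount;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threadCount; ++t) {
        const std::size_t begin = std::min(characters.size(), t * chunk);
        const std::size_t end = std::min(characters.size(), begin + chunk);
        if (begin == end) break; // nothing left to hand out
        workers.emplace_back([&characters, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
                characters[i].UpdateAnimation();
        });
    }
    for (auto& w : workers) w.join(); // wait for every chunk to finish
}
```

Creating threads every frame is wasteful, so a real implementation would more likely keep a small pool of long-lived workers; the chunking logic would stay the same.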

In reality, the final results will probably be quite a lot less than this when all functionality is added, but from this data we can reasonably extrapolate that Leadwerks 5 will support thousands of animated characters onscreen. According to the developers of the Dead Rising series, a few thousand is the practical limit of how many characters you would ever want onscreen, so this is encouraging. Of course, there is no limit on the number of offscreen characters, since animation is only performed for characters that appear onscreen.

[Image: billionsofzombies]


7 Comments


Recommended Comments

Sounds awesome but seriously:

Quote

our theoretical limit is about 10,000 zombies onscreen at once. I would like to go higher

There is really no need to go higher... At Full HD, that would leave about 15x15 pixels for each zombie... This should be enough... (unless you are trying to create another Ultimate Epic Battle Simulator :lol:)

Link to comment

Don't tempt me. Actually, I just tried making two instances of a model share the same skeleton and it works. When one instance is animated, the other copies display the exact same movement. This could be used to display large groups of identical characters (or a few variations spread across the group) and the animation calculations would only have to be performed once per unique skeleton.

Yeah, this works really well. For unique animations we should be able to handle at least a thousand or two onscreen, and if you want massive armies then shared skeletons is the way to go.
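The shared-skeleton idea could be sketched roughly like this. `Skeleton`, `ModelInstance`, and `UpdateUniqueSkeletons` are hypothetical names chosen for illustration; the point is only that the animation work runs once per unique skeleton rather than once per instance.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical skeleton; Update() stands in for the animation and skinning work.
struct Skeleton {
    int updateCount = 0;
    void Update() { ++updateCount; }
};

// Hypothetical model instance that may share its skeleton with other instances.
struct ModelInstance {
    std::shared_ptr<Skeleton> skeleton;
};

// Update every unique skeleton exactly once, no matter how many instances
// reference it. Returns the number of skeletons that were updated.
std::size_t UpdateUniqueSkeletons(const std::vector<ModelInstance>& instances) {
    std::unordered_set<Skeleton*> seen;
    for (const auto& instance : instances) {
        Skeleton* s = instance.skeleton.get();
        if (s && seen.insert(s).second) // true only the first time we see s
            s->Update();
    }
    return seen.size();
}
```

A crowd of 100 instances that all point at one skeleton costs a single update per frame under this scheme, and a few variations spread across the group cost one update each.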

Link to comment

Do you feel the future of the flexible pipeline on the GPU will be 2 way communication? It seems that would help in this specific case and I would imagine others. I say this being ignorant of the effort the gfx card creators would have to do to get this, but at a high level it seems like it's an advancement in that field that could be helpful.

Link to comment

Here's a test with 100 crawlers all sharing a single skeleton. The framerate is low because my Intel graphics chip is hitting the vertex pipeline limit, but the processing time for the animation is basically zero. I have no doubt a discrete GPU would chew through this like nothing. Intel graphics will not be able to display 1000 characters onscreen, but I think a full GPU will.

[Image: Image1.jpg]

Link to comment
56 minutes ago, Rick said:

Do you feel the future of the flexible pipeline on the GPU will be 2 way communication? It seems that would help in this specific case and I would imagine others. I say this being ignorant of the effort the gfx card creators would have to do to get this, but at a high level it seems like it's an advancement in that field that could be helpful.

I think you can use a pixel buffer to retrieve a texture's pixels without stalling the pipeline. So yes, that is one option. But I am still figuring out how to design this. There are several bottlenecks to consider:

  • Animation transformations in main thread or separate thread.
  • Animation interpolation in rendering thread or in vertex shader (probably the latter).
  • Upload speed of animation data to GPU.
  • Vertex shader speed.
  • AI navmesh speed.
  • Physics system.

Any one of these could become a bottleneck that would make it pointless to develop the others to try to support more than what is realistically possible. In the video above, the developer might not be updating each animated model each frame. They are only using one or two unique models. They might not be doing any advanced animation transitions or blending at all. Maybe each vertex only uses a single bone. There's a lot of little shortcuts you can take that will damage the quality. For example, you could just have one skeleton playing each different animation, and instead of changing sequences, just assign the skeleton that has the sequence you want. This will use a tiny amount of processing power for animations, but your models will have no transitions at all in between sequences. You could save lots of time by using linear interpolation between 4x4 matrices instead of quaternion slerp, but you will experience warping artifacts.

So right now I am going to lean towards quality and try to support 2000 real-time characters with good hardware.
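To illustrate the warping mentioned above when matrices are blended linearly instead of using quaternion slerp, here is a small sketch. The names `Quat`, `Vec3`, `Slerp`, `XAxis`, `SlerpedXAxisLength`, and `LerpedXAxisLength` are hypothetical. It measures the length of the rotated X axis, which should stay at 1 for a pure rotation; anything shorter means the mesh would shrink partway through the blend.

```cpp
#include <cassert>
#include <cmath>

struct Quat { float x, y, z, w; };
struct Vec3 { float x, y, z; };

// Spherical linear interpolation between two unit quaternions.
Quat Slerp(Quat a, Quat b, float t) {
    float dot = a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
    if (dot < 0.0f) { // take the shorter arc
        b = {-b.x, -b.y, -b.z, -b.w};
        dot = -dot;
    }
    if (dot > 0.9995f) { // nearly parallel: fall back to normalized lerp
        Quat r{a.x + t * (b.x - a.x), a.y + t * (b.y - a.y),
               a.z + t * (b.z - a.z), a.w + t * (b.w - a.w)};
        const float len = std::sqrt(r.x * r.x + r.y * r.y + r.z * r.z + r.w * r.w);
        return {r.x / len, r.y / len, r.z / len, r.w / len};
    }
    const float theta = std::acos(dot);
    const float sa = std::sin((1.0f - t) * theta) / std::sin(theta);
    const float sb = std::sin(t * theta) / std::sin(theta);
    return {sa * a.x + sb * b.x, sa * a.y + sb * b.y,
            sa * a.z + sb * b.z, sa * a.w + sb * b.w};
}

// First column of the rotation matrix for q, i.e. the rotated X axis.
Vec3 XAxis(const Quat& q) {
    return {1.0f - 2.0f * (q.y * q.y + q.z * q.z),
            2.0f * (q.x * q.y + q.w * q.z),
            2.0f * (q.x * q.z - q.w * q.y)};
}

float Length(const Vec3& v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// X axis length when the rotation is interpolated with slerp.
float SlerpedXAxisLength(const Quat& a, const Quat& b, float t) {
    return Length(XAxis(Slerp(a, b, t)));
}

// X axis length when the two rotation matrices are blended element-wise.
float LerpedXAxisLength(const Quat& a, const Quat& b, float t) {
    const Vec3 xa = XAxis(a), xb = XAxis(b);
    return Length({xa.x + t * (xb.x - xa.x),
                   xa.y + t * (xb.y - xa.y),
                   xa.z + t * (xb.z - xa.z)});
}
```

Blending halfway between no rotation and a 90 degree turn about the Z axis, the slerped axis keeps unit length, while the element-wise matrix blend shortens it to about 0.71. That shrinkage is the kind of warping the cheaper approach trades for speed.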

Link to comment

I've got the speed for 100 characters on Intel graphics up to 200 FPS now by making some changes in the animation shader. We're in good shape.

Link to comment