
Multithreaded Architecture in Leadwerks Game Engine 5

Josh


Leadwerks Game Engine 5 is a restructuring of Leadwerks Game Engine 4 to adapt to the demands of virtual reality and leverage the full capabilities of modern and future hardware.

The main idea with VR is that if you don't maintain a steady 90 FPS, you will throw up.  Nausea may be the worst physiological feeling you can experience; in fact, nausea has been rated by cancer patients as worse than pain.  Being sensitive to motion sickness myself, this is a problem I am very motivated to solve.

In a conventional renderer, running both your game logic and rendering at 60 Hz seems reasonable.  However, when we raise the framerate to the 90 Hz required for a fluid virtual reality experience, that becomes an excessive demand on the game code.  Game logic normally handles AI, player input, and various other tasks, and those things don't have to be updated that often.

Distributing Tasks
The solution I have come up with is to decouple the game loop from the renderer.  In the diagram below, the game loop is running at only 30 hz, while the physics, culling, and rendering loops are running at independent frequencies.

[Diagram: the game loop running at 30 Hz while the physics, culling, and rendering loops run at independent frequencies.]
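
To make the idea concrete, here is a minimal sketch (assuming nothing about the actual Leadwerks API) of two loops ticking at independent frequencies on separate threads:

#include <atomic>
#include <chrono>
#include <thread>

std::atomic<bool> running{true};

// Run a task repeatedly at a fixed frequency until the engine shuts down.
void RunLoop(void (*task)(), const double hz)
{
	auto next = std::chrono::steady_clock::now();
	const auto interval = std::chrono::duration_cast<std::chrono::steady_clock::duration>(std::chrono::duration<double>(1.0 / hz));
	while (running)
	{
		task();
		next += interval;
		std::this_thread::sleep_until(next); // wake exactly one tick later
	}
}

void UpdateGame() { /* AI, input, game logic */ }
void RenderFrame() { /* culling + drawing */ }

int main()
{
	std::thread gamethread(RunLoop, UpdateGame, 30.0);    // game logic at 30 Hz
	std::thread renderthread(RunLoop, RenderFrame, 90.0); // rendering at 90 Hz
	std::this_thread::sleep_for(std::chrono::seconds(5)); // run briefly for the demo
	running = false;
	gamethread.join();
	renderthread.join();
	return 0;
}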

Think of this as like gears on a bicycle.  Your pedals move slowly, but your wheels spin very fast.  The game logic loop is like your pedals, while the rendering loop is like the rear wheel it is connected to.

[Image: bicycle gears.]

Previously, your game logic needed to execute in about 8 milliseconds or it would start slowing down the framerate.  With my design here, your game code gets more than 32 milliseconds to execute (a 30 Hz tick lasts 1000 / 30 ≈ 33 milliseconds), a lifetime in code execution terms, while a steady framerate of 90 or 60 FPS is constantly maintained.

I actually came up with this idea on my own, but upon further research I found out this is exactly what Intel recommends.  The technique is called Free Step Mode.  The diagram below does not correspond to our engine design, but it does illustrate the idea that separate systems are operating at different speeds:

[Diagram: Intel's illustration of free step mode, with separate systems running at different rates.]

If all threads are set to execute at the same frequency, it is called Lock Step Mode.

[Diagram: Intel's illustration of lock step mode, with all systems running at the same rate.]

Data Exchange
Data in the game loop is exchanged with the physics and navmesh threads, then passed one-way to the culling loop, and from there one-way to the rendering loop.  This means there will be a slight delay between when an event occurs and when it reaches the rendering thread and the final screen output, but we are talking about perhaps 10 milliseconds, so it won't be noticeable.  The user will just see smooth motion.

[Diagram: two-way exchange between the game loop and the physics and navmesh threads, with one-way flow to the culling loop and on to the rendering loop.]

Rather than risk a lot of mutex locks, data is going to be passed one-way and each thread will have a copy of the object.  The game loop will have the full entity class, but the rendering threads will only have a stripped-down class, something like this:

#include <vector>

class RenderObject
{
public:
	Mat4 matrix;                    // world transform, copied from the entity
	AABB aabb;                      // bounding box for culling
	std::vector<Surface*> surfaces; // geometry to draw
};
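
To illustrate the one-way handoff, here is a sketch of my own (not the engine's actual API) in which the game thread publishes a complete snapshot of RenderObject copies, and the culling thread takes ownership whenever it is ready.  The lock is held only long enough to swap a pointer, rather than fine-grained locking on every object:

#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Everything the renderer needs from one game tick, copied by value.
struct Snapshot
{
	std::vector<RenderObject> objects;
};

// One-way mailbox between two threads.
class SnapshotMailbox
{
public:
	void Publish(std::unique_ptr<Snapshot> snapshot) // game thread, once per tick
	{
		std::lock_guard<std::mutex> lock(mtx);
		latest = std::move(snapshot); // replaces any unconsumed snapshot
	}

	std::unique_ptr<Snapshot> Take() // culling thread, whenever it wants new data
	{
		std::lock_guard<std::mutex> lock(mtx);
		return std::move(latest); // null if nothing new has arrived
	}

private:
	std::mutex mtx;
	std::unique_ptr<Snapshot> latest;
};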

The original entity this object corresponds to can be modified or deleted without fear of affecting downstream threads.  Again, Intel confirmed what I already thought would be the best approach:


In order for a game engine to truly run parallel, with as little synchronization overhead as possible, it will need to have each system operate within its own execution state with as little interaction as possible to anything else that is going on in the engine. Data still needs to be shared however, but now instead of each system accessing a common data location to say, get position or orientation data, each system has its own copy. This removes the data dependency that exists between different parts of the engine. Notices of any changes made by a system to shared data are sent to a state manager which then queues up all the changes, called messaging. Once the different systems are done executing, they are notified of the state changes and update their internal data structures, which is also part of messaging. Using this mechanism greatly reduces synchronization overhead, allowing systems to act more independently.

-Designing the Framework of a Parallel Game Engine, Jeff Andrews, Intel
https://software.intel.com/en-us/articles/designing-the-framework-of-a-parallel-game-engine

But wait, isn't latency a huge problem in VR, and didn't I just describe a system that adds latency to the renderer?  Yes and no.  The rendering thread will update the headset and controller orientations every single frame, at 90 Hz.  The rest of the world will be one or two frames behind, but that won't matter because it isn't connected to your body.  You'll get smooth head motion with zero delay while at the same time relieving the renderer of all CPU-side bottlenecks.

Even for non-VR games, I believe this design will produce a massive performance boost unlike anything you've ever seen.



Comments

Genebris:

I don't understand: why render another frame if the gameplay logic hasn't updated yet and nothing in the world has changed? In non-VR games, will you see two identical frames?

Will we be able to increase the gameplay refresh rate? I think gameplay should be updated faster than rendering. My rhythm game feels much more responsive at 200 FPS than at 60, even though I have a 60 Hz monitor.

In reply to Genebris:

The renderer can interpolate between two frames of data, creating a new frame in-between.

The frequencies of each system can probably be made adjustable.
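
For illustration, the in-between frame could be produced something like this (a hypothetical sketch assuming Mat4 supports per-element arithmetic; a real implementation would decompose the matrices and slerp the rotations):

// Blend an object's transform between the two most recent game snapshots.
// 'alpha' in [0,1] is how far render time has advanced from the previous
// tick toward the current one.
Mat4 InterpolatedMatrix(const RenderObject& prev, const RenderObject& curr, const float alpha)
{
	return prev.matrix * (1.0f - alpha) + curr.matrix * alpha; // naive per-element blend
}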

wayneg:

Genebris,

While nothing in the game world may have changed, the camera may turn suddenly, and a new rendered frame is necessary, one without lag. I don't want game logic impacting the movement of the camera.

In reply to wayneg:

That's the one point in the renderer where a mutex will lock, and a callback will be used to update the camera rotation and position.  The only things that have to be instantaneous are the camera and the VR controllers (if present).
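
A sketch of what that single synchronization point might look like (my own illustration, with hypothetical names, not the engine's API):

#include <mutex>

std::mutex cameramutex;
Mat4 camerapose; // written by game/VR input, read by the renderer

// Called whenever a new headset or camera pose arrives.
void SetCameraPose(const Mat4& pose)
{
	std::lock_guard<std::mutex> lock(cameramutex);
	camerapose = pose;
}

// Called by the rendering thread at the start of every 90 Hz frame.
Mat4 GetCameraPose()
{
	std::lock_guard<std::mutex> lock(cameramutex);
	return camerapose;
}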


I got goosebumps reading this. I am so excited to be able to leverage all of these great features, and I cannot tell you how excited I am to have the software built on this architecture. It is really going to answer a lot of difficult questions for us in the engineering field as well. Well done, Leadwerks.



  • Blog Entries

    • By Josh in Josh's Dev Blog
      I did a little experiment with FPS Creator Pack #75 by upsampling the images with Gigapixel, which uses deep learning to upsample images and infer details that don't appear in the original pixels. The AI neural network does a pretty impressive job, generating results that look better than a simple sharpen filter. I doubled the size of the textures to 1024x1024, then generated normal maps from the high-res images using AMD's TGA to DOT3 tool and saved the normal maps with BC5 DDS compression. The diffuse textures were saved with BC7 DDS compression. The images below use 4x magnification to demonstrate the difference.


      As you can see, the image that is upsampled with deep learning looks normal and the resized image looks like there is butter on the lens! It's hard to believe the above images came from a 256x128 section of an image.
      The workflow was pretty tedious, as I had to convert images to TGA, then to uncompressed or BC5 DDS, and then to BC7 in Visual Studio. Each BC7 texture took maybe 5-10 minutes to compress! So while this set represents the optimum quality for 2019 game tech, and the format for assets we want to use in LE5, the workflow has a lot of room for improvement.
      You can download the full package here:
      FPSCPack75TexturesHD.zip
    • By Josh in Josh's Dev Blog
      DPI scaling and the 2D drawing and GUI system were an issue I was a bit concerned about, but I think I have it worked out. This all goes back to the multi-monitor support that I designed back in September. Part of that system allows you to retrieve the DPI scale for each display, which gives you another piece of information in addition to the raw screen resolution. The display scale is a percentage value indicating the size the user expects vector graphics to appear at, with 100% being what you would expect on a regular HD monitor. If we scale our GUI elements and font sizes by the display scale, we can adjust for screens of any pixel density.
      This shot shows 1920x1080 fullscreen with DPI scaling set to 100%:

      Here we see the same resolution, with scaling set to 125%:

      And this is with scaling set to 150%:

      The effect of this is that if the player is using a 4K, 8K, or any other type of monitor, your game can display finely detailed text at the correct size the user expects to see. It also means that user interfaces can be rendered at any resolution for VR.
      Rather than trying to automatically scale GUI elements, I am giving you full control over the raw pixels. That means you have to decide how your widgets will be scaled yourself and program it into the game interface, but nothing is hidden from the developer. Here is the code I am working with now to create a simple game menu. Also notice there is no CreatePanel(), CreateButton(), etc. anymore; there is just one widget you create and set the script for. I might add an option for C++ actors as well, but since these are operating on the main logic thread, there's not really a downside to running the code in Lua.
      local window = ActiveWindow()
      if window == nil then return end

      local framebuffer = window:GetFramebuffer()
      if framebuffer == nil then return end

      self.gui = CreateGUI(self.guispritelayer)

      --Main background panel
      self.mainpanel = CreateWidget(self.gui, "", 0, 0, framebuffer.size.x, framebuffer.size.y)
      self.mainpanel:SetScript("Scripts/GUI/Panel.lua", true)

      local scale = window.display.scale.y
      local w = 120
      local h = 24
      local sep = 36
      local x = framebuffer.size.x / 6
      local y = framebuffer.size.y / 2 - sep * 3

      self.resumebutton = CreateWidget(self.mainpanel, "RESUME GAME", x, y, w, h)
      self.resumebutton:SetScript("Scripts/GUI/Hyperlink.lua", true)
      self.resumebutton:SetFontSize(14 * window.display.scale.y)

      y = y + sep * 2
      self.label2 = CreateWidget(self.mainpanel, "OPTIONS", x, y, w, h)
      self.label2:SetScript("Scripts/GUI/Hyperlink.lua", true)
      self.label2:SetFontSize(14 * window.display.scale.y)

      y = y + sep * 2
      self.quitbutton = CreateWidget(self.mainpanel, "QUIT", x, y, w, h)
      self.quitbutton:SetScript("Scripts/GUI/Hyperlink.lua", true)
      self.quitbutton:SetFontSize(14 * window.display.scale.y)

      w = 400 * scale
      h = 550 * scale
      self.optionspanel = CreateWidget(self.mainpanel, "QUIT", (framebuffer.size.x - w) * 0.5, (framebuffer.size.y - h) * 0.5, w, h)
      self.optionspanel:SetScript("Scripts/GUI/Panel.lua", true)
      self.optionspanel.color = Vec4(0.2, 0.2, 0.2, 1)
      self.optionspanel.border = true
      self.optionspanel.radius = 8 * scale
      self.optionspanel.hidden = true
    • By Josh in Josh's Dev Blog
      Previously I talked about the technical details of hardware tessellation and what it took to make it truly useful. In this article I will talk about some of the implications of this feature and the more advanced ramifications of baking tessellation into Turbo Game Engine as a first-class feature.
      Although hardware tessellation has been around for a few years, we don't see it used in games that often. There are two big problems that need to be overcome:
      • We need a way to prevent cracks from appearing along edges.
      • We need to display a consistent density of triangles on the screen. Too many polygons is a big problem.
      I think these issues are the reason you don't really see much use of tessellation in games, even today. However, I think my research this week has created new technology that will allow us to make use of tessellation as an every-day feature in our new Vulkan renderer.
      Per-Vertex Displacement Scale
      Because tessellation displaces vertices, any discrepancy in the distance or direction of the displacement, or any difference in the way neighboring polygons are subdivided, will result in cracks appearing in the mesh.

      To prevent unwanted cracks in mesh geometry I added a per-vertex displacement scale value. I packed this value into the w component of the vertex position, which was not being used. When the displacement strength is set to zero along the edges the cracks disappear:

      Segmented Primitives
      With the ability to control displacement on a per-vertex level, I set about implementing more advanced model primitives. The basic idea is to split up faces so that the edge vertices can have their displacement scale set to zero to eliminate cracks. I started with a segmented plane. This is a patch of triangles with a user-defined size and resolution. The outer-most vertices have a displacement value of 0 and the inner vertices have a displacement of 1. When tessellation is applied to the plane the effect fades out as it reaches the edges of the primitive:
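
      As a sketch of the idea (my own illustration, not the engine's actual mesh API), building such a patch might look like this, with the displacement scale packed into the position's w component:

      #include <vector>

      struct Vertex
      {
          float x, y, z, w; // w holds the per-vertex displacement scale
      };

      // Build a (segs+1) x (segs+1) grid of vertices for a segmented plane,
      // fading displacement to zero on the outermost ring so neighboring
      // primitives never crack along shared edges.
      std::vector<Vertex> BuildSegmentedPlane(const float size, const int segs)
      {
          std::vector<Vertex> vertices;
          for (int iy = 0; iy <= segs; ++iy)
          {
              for (int ix = 0; ix <= segs; ++ix)
              {
                  Vertex v;
                  v.x = (float(ix) / segs - 0.5f) * size;
                  v.y = 0.0f;
                  v.z = (float(iy) / segs - 0.5f) * size;
                  const bool edge = (ix == 0 || iy == 0 || ix == segs || iy == segs);
                  v.w = edge ? 0.0f : 1.0f; // zero displacement along the border
                  vertices.push_back(v);
              }
          }
          return vertices;
      }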

      I then used this formula to create a more advanced box primitive. Along the seam where the edges of each face meet, the displacement smoothly fades out to prevent cracks from appearing.

      The same idea was applied to make segmented cylinders and cones, with displacement disabled along the seams.


      Finally, a new QuadSphere primitive was created using the box formula, and then normalizing each vertex position. This warps the vertices into a round shape, creating a sphere without the texture warping that spherical mapping creates.
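
      The warping step itself is tiny; here is a sketch using the same hypothetical Vertex type as above:

      #include <cmath>
      #include <vector>

      // Push every box vertex onto a sphere of the given radius by normalizing
      // its position. The displacement scale stored in w is left untouched.
      void WarpBoxToSphere(std::vector<Vertex>& vertices, const float radius)
      {
          for (Vertex& v : vertices)
          {
              const float length = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
              v.x = v.x / length * radius;
              v.y = v.y / length * radius;
              v.z = v.z / length * radius;
          }
      }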

      It's amazing what tessellation and displacement can do for these simple shapes. Here is the full list of available commands:
      shared_ptr<Model> CreateBox(shared_ptr<World> world, const float width = 1.0);
      shared_ptr<Model> CreateBox(shared_ptr<World> world, const float width, const float height, const float depth, const int xsegs = 1, const int ysegs = 1);
      shared_ptr<Model> CreateSphere(shared_ptr<World> world, const float radius = 0.5, const int segments = 16);
      shared_ptr<Model> CreateCone(shared_ptr<World> world, const float radius = 0.5, const float height = 1.0, const int segments = 16, const int heightsegs = 1, const int capsegs = 1);
      shared_ptr<Model> CreateCylinder(shared_ptr<World> world, const float radius = 0.5, const float height = 1.0, const int sides = 16, const int heightsegs = 1, const int capsegs = 1);
      shared_ptr<Model> CreatePlane(shared_ptr<World> world, const float width = 1, const float height = 1, const int xsegs = 1, const int ysegs = 1);
      shared_ptr<Model> CreateQuadSphere(shared_ptr<World> world, const float radius = 0.5, const int segments = 8);

      Edge Normals
      I experimented a bit with edges and got some interesting results. If you round the corner by setting the vertex normal to point diagonally, a rounded edge appears.

      If you extend the displacement scale beyond 1.0 you can get a harder extended edge.

      This is something I will experiment with more. I think CSG brush smooth groups could be used to make some really nice level geometry.
      Screen-space Tessellation LOD
      I created an LOD calculation formula that attempts to segment polygons into a target size in screen space. This provides a more uniform distribution of tessellated polygons, regardless of the original geometry. Below are two cylinders created with different segmentation settings, with tessellation disabled:

      And now here are the same meshes with tessellation applied. Although the less-segmented cylinder has more stretched triangles, both are made up of triangles of roughly the same size.

      Because the calculation works with screen-space coordinates, objects will automatically adjust resolution with distance. Here are two identical cylinders at different distances.

      You can see they have roughly the same distribution of polygons, which is what we want. The same amount of detail will be used to show off displaced edges at any distance.

      We can even set a threshold for the minimum vertex displacement in screen space and use that to eliminate tessellation inside an object and only display extra triangles along the edges.

      This allows you to simply set a target polygon size in screen space without adjusting any per-mesh properties. This method could have prevented the problems Crysis 2 had with polygon density, and it also solves the problem that prevented me from using tessellation for terrain. The per-mesh tessellation settings I worked on a couple of days ago will be removed, since they are not needed.
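
      The heart of such a metric can be sketched as follows (my own approximation of the idea, assuming a simple Vec2 screen-coordinate type; the engine's actual formula may differ):

      #include <algorithm>
      #include <cmath>

      struct Vec2 { float x, y; };

      // Request subdivisions for an edge based on its projected length in pixels,
      // so the tessellator aims for a uniform on-screen triangle size. Distant
      // objects project shorter edges and automatically receive fewer triangles.
      float EdgeTessFactor(const Vec2& a, const Vec2& b, const float targetpixels)
      {
          const float dx = b.x - a.x;
          const float dy = b.y - a.y;
          const float pixels = std::sqrt(dx * dx + dy * dy);
          return std::max(1.0f, pixels / targetpixels); // never drop below one segment
      }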
      Parallax Mapping Fallback
      Finally, I added a simple parallax mapping fallback that gets used when tessellation is disabled. This makes an inexpensive option for low-end machines that still conveys displacement.

      Next I am going to try processing some models that were not designed for tessellation and see if I can use tessellation to add geometric detail to low-poly models without any cracks or artifacts.