
Blogs

Clustered Forward Rendering

I decided I want the voxel GI system to render direct lighting on the graphics card, so in order to make that happen I need working lights and shadows in the new renderer. Tomorrow I am going to start my implementation of clustered forward rendering to replace the deferred renderer in the next game engine. This works by dividing the camera frustum up into sectors, as shown below. A list of visible lights for each cell is sent to the GPU. If you think about it, this is really another voxel algorithm. The whole idea of voxels is that it costs too much processing power to calculate something expensive for each pixel, so let's calculate it for a 3D grid of volumes and then grab those settings for each pixel inside the volume. In the case of real-time global illumination, we also do a linear blend between the values based on the pixel position. Here's a diagram of a spherical point light lying on the frustum. But if we skew the frustum so that the lines are all perpendicular, we can see this is actually a voxel problem, and it's the light that is warped in a funny way, not the frustum. I couldn't figure out how to warp the sphere exactly right, but it's something like this. For each pixel that is rendered, you transform it to the perpendicular grid above and perform lighting using only the lights that are present in that cell. This technique seems like a no-brainer, but it would not have been possible when our deferred renderer first came to be. GPUs were not nearly as flexible back then as they are now, and things like a variable-length for loop would be a big no-no. Well, something else interesting occurred to me while I was going over this. The new engine is an ambitious project, with a brand new editor to be built from scratch. That's going to take a lot of time. There's a lot of interest in the features I am working on now, and I would like to get them out sooner rather than later.
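To make the cell lookup concrete, here is a minimal sketch of how a pixel could be mapped to a cluster cell. This is my own illustration, not engine code: the names (ClusterGrid, ZSlice, CellIndex) are hypothetical, and the exponential depth slicing is an assumption, though it is a common choice in clustered shading because it keeps the near cells small where lighting detail matters most.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical sketch: divide the frustum into a grid of cells ("froxels").
// X and Y slices are uniform in screen space; Z slices grow exponentially
// with view depth so every cell covers a similar range of perceived depth.
struct ClusterGrid
{
    int xcells, ycells, zcells;
    float nearz, farz;

    int ZSlice(float viewz) const
    {
        // Exponential depth slicing: t = log(z / near) / log(far / near).
        float t = std::log(viewz / nearz) / std::log(farz / nearz);
        return std::clamp(int(t * zcells), 0, zcells - 1);
    }

    int CellIndex(float sx, float sy, float viewz) const
    {
        // sx, sy are normalized screen coordinates in [0, 1).
        int cx = std::min(int(sx * xcells), xcells - 1);
        int cy = std::min(int(sy * ycells), ycells - 1);
        return (ZSlice(viewz) * ycells + cy) * xcells + cx;
    }
};
```

At lighting time each pixel computes its cell index the same way and loops only over the lights recorded for that cell, which is exactly the kind of variable-length loop older GPUs could not handle.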
It might be possible to incorporate the clustered forward renderer and voxel GI into Leadwerks Game Engine 4 (at which point I would probably call it 5) but keep the old engine architecture. This would give Leadwerks a big performance boost (not as big as the new architecture, but still probably 2-3x in some situations). The visuals would also make a giant leap forward into the future. And it might even be possible to release in time for Christmas. All the shaders would have to be modified, but if you just updated your project everything would run in the new Leadwerks Game Engine 5 without any problem. This would need to be a paid update, probably with a new app ID on Steam. The current Workshop contents would not be accessible from the new app ID, but we have the Marketplace for that. This would also have the benefit of bringing the editor up to date with the new rendering methods, which would mean the existing editor could be used seamlessly with the new engine. We presently can't do this because the new engine and Leadwerks 4 use completely different shaders. This could solve a lot of problems and give us a much smoother transition from here to where we want to go in the future:

Leadwerks Game Engine 4 (deferred rendering, existing editor) [Now]
Leadwerks Game Engine 5 (clustered forward rendering, real-time GI, PBR materials, existing architecture, existing editor) [Christmas 2018]
Turbo Game Engine (clustered forward rendering, new architecture, new editor) [April 2020]

I just thought of this a couple hours ago, so I can't say right now for sure if we will go this route, but we will see. No matter what, I want to get a version 4.6 out first with a few features and fixes. You can read more about clustered forward rendering in this article.

Josh


Introducing Leadwerks Marketplace

Steam Workshop was a compelling idea to allow game asset authors to sell their items for use with Leadwerks Game Engine. However, the system has turned out to have some fundamental problems, despite my best efforts to work around it. Free items are not curated, causing the store to fill with low-quality content. Some people have reported trouble downloading items. The publishing process is pretty tedious. The check-out process requires adding funds to Steam Wallet, and is just not very streamlined. At the same time, three new technologies have emerged that make it possible to deliver a better customer and seller experience through our website. Amazon S3 offers cheap and reliable storage of massive amounts of data. PayPal credit card tokens allow us to safely store a token on our server that can't be used anywhere else, instead of a credit card number. This eliminates the danger of a potential website hack revealing your information. Invision Power Board has integrated both these technologies into our commerce system. It would not have been possible to build a web store a few years ago because the cost of server space would have been prohibitive, and storing hundreds of gigs of data would have made my website backup process unsustainable. So at the time, the unlimited storage of Steam and their payment processing system was very appealing. That is no longer the case. To solve the problems of Steam Workshop, and give you easy access to a large library of ready-to-use game assets, I am happy to introduce Leadwerks Marketplace. The main page shows featured assets, new content, and the most popular items, with big thumbnails everywhere. When you view an item, screenshots, videos, and detailed technical specifications are shown. How does Leadwerks Marketplace improve things?

Easy download of zip files that are ready to use with Leadwerks. You can use the File > Import menu item to extract them to the current project, or just unzip them yourself.
All content is curated. Items are checked for compatibility and quality.
Clear technical specifications for every file, so you know exactly what you are getting.
Cheap and reliable storage forever with Amazon S3.
Any DLCs or Workshop Store items you purchased can be downloaded from Leadwerks Marketplace by linking your Steam account to your profile.
Easy publishing of your items with our web-based uploader.

We're launching with over 50 gigabytes of game assets, and more will be added continuously. To kick off the launch we're offering some items at major discounts during the Summer Game Tournament. Here are a few assets to check out: Get "The Zone" for just $4.99. Or download our Mercenary character for free! Just create your free Leadwerks account to gain access. Other items will be featured on sale during the Summer Game Tournament. After purchasing an item, you can download it immediately. All your purchased items will be shown in the Purchases area, which you can access from the user menu in the top-right of the website header. Here all your purchased items will be available to download, forever. If you are interested in selling your game assets to over 20,000 developers, you can upload your items now. Sellers receive a 70% royalty for each sale, with a minimum payout of just $20. See the content guidelines for details and contact me if you need any help. If you have a lot of good content, we can even convert your assets for you and make them game-ready for Leadwerks, so there's really no risk to you. Browse game assets now.

Admin


Voxel Cone Tracing Part 5 - Hardware Acceleration

I was having trouble with cone tracing and decided to first try a basic GI algorithm based on a pattern of raycasts. Here is the result: You can see this is pretty noisy, even with 25 raycasts per voxel. Cone tracing uses an averaged sample, which eliminates the noise problem, but it does introduce more inaccuracy into the lighting. Next I wanted to try a more complex scene and get an estimate of performance. You may recognize the voxelized scene below as the "Sponza" scene frequently used in radiosity testing: Direct lighting takes 368 milliseconds to calculate with a voxel size of 0.25 meters. If I cut the voxel grid down to 64x64x64 then lighting takes just 75 milliseconds. These speeds are good enough for soft GI that gradually adjusts as lighting changes, but I am not sure if this will be sufficient for our purposes. I'd like to do real-time screen-independent reflections. I thought about it, and I thought about it some more, and then when I was done with that I kept thinking about it. Here's the design I came up with: The final output is a 3D texture containing light data for all six possible directions. (So a 256x256x256 grid of voxels would actually be 1536x256x256 RGB, equal to 288 megabytes.) The lit voxel array would also be six times as big. When a pixel is rendered, three texture lookups are performed on the 3D texture and multiplied by the normal of the pixel. If the voxel is empty, there is no GI information for that volume, so maybe a higher mipmap level could be used (if mipmaps are generated in the last step). The important thing is we only store the full-resolution voxel grid once. The downsampled voxel grids use an alpha channel for coverage. For example, a voxel with 0.75 alpha would have six out of eight solid child voxels. I do think voxelization is best performed on the CPU due to flexibility and the ability to cache static objects. Direct lighting, in this case, would be calculated from shadowmaps.
So I have to implement the clustered forward renderer before going forward with this.
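The storage figure in the design above (288 megabytes for a 256x256x256 grid with six directions of RGB data) is easy to verify. This helper is just my own arithmetic sketch, not engine code:

```cpp
#include <cstdint>

// Storage for a cubic voxel grid with one RGB entry per direction per voxel.
// Six direction layers stacked along one axis gives the 1536x256x256 layout.
std::uint64_t VoxelLightBytes(int gridsize, int directions, int bytespertexel)
{
    return std::uint64_t(gridsize) * gridsize * gridsize * directions * bytespertexel;
}
```

VoxelLightBytes(256, 6, 3) comes to 301,989,888 bytes, which is exactly 288 megabytes.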

Josh


Summer Game Tournament

Summer is here, and you know what that means! Yes, it is time for another LEGENDARY game tournament. This year the theme is "Retro Gaming". Create a modern take on an arcade game hearkening back to the days of NES, Neo Geo, Sega, or just make anything you want. Either way, you get this totally radical poster as a prize! How does it work?  For one month, the Leadwerks community builds small playable games.  Some people work alone and some team up with others.  At the end of the month we release our projects to the public and play each other's games.  The point is to release something short and sweet with a constrained timeline, which has resulted in many odd and wonderful mini games for the community to play. WHEN: The tournament begins Thursday, June 21, and ends on July 31st at the stroke of midnight. HOW TO PARTICIPATE: Publish your retro-or-other-themed game to the Games Showcase before the deadline. You can work as a team or individually. Use blogs to share your work and get feedback as you build your game. Games must have a preview image, title, and contain some minimal amount of gameplay (there has to be some way to win the game) to be considered entries. It is expected that most entries will be simple, given the time constraints. This is the perfect time to try making a VR game or finish that idea you've been waiting to make! PRIZES: All participants will receive a limited-edition 11x17" poster commemorating the event. To receive your prize you need to fill in your name, mailing address, and phone number (for customs) in your account info. At the end of the tournament we will post a roundup blog featuring your entries. Let's go!

Admin


Voxel Cone Tracing Part 4 - Direct Lighting

Now that we can voxelize models, enter them into a scene voxel tree structure, and perform raycasts, we can finally start calculating direct lighting. I implemented support for directional and point lights, and I will come back and add spotlights later. Here we see a shadow cast from a single directional light: And here are two point lights, one red and one green. Notice the distance falloff creates a color gradient across the floor: The idea here is to first calculate direct lighting using raycasts between the light position and each voxel: Then once you have the direct lighting, you can calculate approximate global illumination by gathering a cone of samples for each voxel, which illuminates voxels not directly visible to the light source: And if we repeat this process we can simulate a second bounce, which really fills in all the hidden surfaces: When we convert model geometry to voxels, one of the important pieces of information we lose is the surface normal. Without normals it is difficult to calculate damping for the direct illumination calculation. It is easy to check surrounding voxels and determine that a voxel is embedded in a floor or something, but what do we do in the situation below? The thin wall of three voxels is illuminated, which will leak light into the enclosed room. This is not good: My solution is to calculate and store lighting for each face of each voxel:

Vec3 normal[6] = {
	Vec3(-1, 0, 0), Vec3(1, 0, 0),
	Vec3(0, -1, 0), Vec3(0, 1, 0),
	Vec3(0, 0, -1), Vec3(0, 0, 1)
};

for (int i = 0; i < 6; ++i)
{
	float damping = max(0.0f, normal[i].Dot(lightdir)); //normal damping
	if (!isdirlight) damping *= 1.0f - min(p0.DistanceToPoint(lightpos) / light->range[1], 1.0f); //distance damping
	voxel->directlighting[i] += light->color[0] * damping;
}

This gives us lighting that looks more like the diagram below: When light samples are read, the appropriate face will be chosen and read from.
In the final scene lighting on the GPU, I expect to be able to use the triangle normal to determine how much influence each sample should have. I think it will look something like this in the shader:

vec4 lighting = vec4(0.0f);
lighting += max(0.0f, dot(trinormal, vec3(-1.0f, 0.0f, 0.0f))) * texture(gimap, texcoord + vec2(0.0 / texwidth, 0.0));
lighting += max(0.0f, dot(trinormal, vec3(1.0f, 0.0f, 0.0f))) * texture(gimap, texcoord + vec2(1.0 / texwidth, 0.0));
lighting += max(0.0f, dot(trinormal, vec3(0.0f, -1.0f, 0.0f))) * texture(gimap, texcoord + vec2(2.0 / texwidth, 0.0));
lighting += max(0.0f, dot(trinormal, vec3(0.0f, 1.0f, 0.0f))) * texture(gimap, texcoord + vec2(3.0 / texwidth, 0.0));
lighting += max(0.0f, dot(trinormal, vec3(0.0f, 0.0f, -1.0f))) * texture(gimap, texcoord + vec2(4.0 / texwidth, 0.0));
lighting += max(0.0f, dot(trinormal, vec3(0.0f, 0.0f, 1.0f))) * texture(gimap, texcoord + vec2(5.0 / texwidth, 0.0));

This means that to store a 256 x 256 x 256 grid of voxels we actually need a 3D RGB texture with dimensions of 256 x 256 x 1536. This is 288 megabytes. However, with DXT1 compression I estimate that number will drop to about 64 megabytes, meaning we could have eight voxel maps cascading out around the player and still only use about 512 megabytes of video memory. This is where those new 16-core CPUs will really come in handy! I added the lighting calculation for the normal Vec3(0,1,0) into the visual representation of our voxels and lowered the resolution. Although this is still just direct lighting it is starting to look interesting: The last step is to downsample the direct lighting to create what is basically a mipmap.
We do this by taking the average values of each voxel node's children:

void VoxelTree::BuildMipmaps()
{
	if (level == 0) return;
	int contribs[6] = { 0 };
	for (int i = 0; i < 6; ++i)
	{
		directlighting[i] = Vec4(0);
	}
	for (int ix = 0; ix < 2; ++ix)
	{
		for (int iy = 0; iy < 2; ++iy)
		{
			for (int iz = 0; iz < 2; ++iz)
			{
				if (kids[ix][iy][iz] != nullptr)
				{
					kids[ix][iy][iz]->BuildMipmaps();
					for (int n = 0; n < 6; ++n)
					{
						directlighting[n] += kids[ix][iy][iz]->directlighting[n];
						contribs[n]++;
					}
				}
			}
		}
	}
	for (int i = 0; i < 6; ++i)
	{
		if (contribs[i] > 0) directlighting[i] /= float(contribs[i]);
	}
}

If we start with direct lighting that looks like the image below: When we downsample it one level, the result will look something like this (not exactly, but you get the idea): Next we will begin experimenting with light bounces and global illumination using a technique called cone tracing.

Josh


Voxel Cone Tracing Part 3 - Raycasting

I added a raycast function to the voxel tree class and now I can perform raycasts between any two positions. This is perfect for calculating direct lighting. Shadows are calculated by performing a raycast between the voxel position and the light position, as shown in the screenshot below. Fortunately the algorithm seems to work great and there are no gaps or cracks in the shadow: Here is the same scene using a voxel size of 10 centimeters: If we move the light a little lower, you can see a shadow appearing near two edges of the floor: Why is this happening? Well, the problem is that at those angles, the raycast is hitting the neighboring voxel on the floor next to the voxel we are testing: You might think that if we just move one end of the ray up to the top of the voxel it will work fine, and you'd be right, in this situation. But with slightly different geometry, we have a new problem. So how do we solve this? At any given time, a voxel can have up to three faces that face the light (but it might have as few as one). In the image below I have highlighted the two voxel faces on the right-most voxel that face the light: If we check the neighboring voxels we can see that the voxel to the left is occupied, and therefore the left face does not make a good position to test from: But the top voxel is clear, so we will test from there: If we apply the same logic to the other geometry configuration I showed, we also get a correct result. Of course, if both neighboring voxels were solid then we would not need to perform a raycast at all, because we know the light would be completely blocked at this position. The code to do this just checks which side of a voxel the light position is on.
As it is written now, up to three raycasts may be performed per voxel:

if (lightpos.x < voxel->bounds.min.x)
{
	if (GetSolid(ix - 1, iy, iz) == false)
	{
		result = IntersectsRay(p0 - Vec3(voxelsize * 0.5f, 0.0f, 0.0f), lightpos);
	}
}
if (lightpos.x > voxel->bounds.max.x and result == false)
{
	if (GetSolid(ix + 1, iy, iz) == false)
	{
		result = IntersectsRay(p0 + Vec3(voxelsize * 0.5f, 0.0f, 0.0f), lightpos);
	}
}
if (lightpos.y < voxel->bounds.min.y and result == false)
{
	if (GetSolid(ix, iy - 1, iz) == false)
	{
		result = IntersectsRay(p0 - Vec3(0.0f, voxelsize * 0.5f, 0.0f), lightpos);
	}
}
if (lightpos.y > voxel->bounds.max.y and result == false)
{
	if (GetSolid(ix, iy + 1, iz) == false)
	{
		result = IntersectsRay(p0 + Vec3(0.0f, voxelsize * 0.5f, 0.0f), lightpos);
	}
}
if (lightpos.z < voxel->bounds.min.z and result == false)
{
	if (GetSolid(ix, iy, iz - 1) == false)
	{
		result = IntersectsRay(p0 - Vec3(0.0f, 0.0f, voxelsize * 0.5f), lightpos);
	}
}
if (lightpos.z > voxel->bounds.max.z and result == false)
{
	if (GetSolid(ix, iy, iz + 1) == false)
	{
		result = IntersectsRay(p0 + Vec3(0.0f, 0.0f, voxelsize * 0.5f), lightpos);
	}
}

With this correction the artifact disappears: It even works correctly at a lower resolution: Now our voxel raycast algorithm is complete. The next step will be to calculate direct lighting on the voxelized scene using the lights that are present.

Josh


Voxel Cone Tracing Part 2 - Sparse Octree

At this point I have successfully created a sparse octree class and can insert voxelized meshes into it. An octree is a way of subdividing space into eight blocks at each level of the tree: A sparse octree doesn't create the subnodes until they are used. For voxel data, this can save a lot of memory. It was difficult to get the rounding and all the math completely perfect (and it has to be completely perfect!) but now I have a nice voxel tree that can follow the camera around and is aligned correctly to the world axis and units. The code that inserts a voxel is pretty interesting: A voxel tree is created with a number of levels, and the size of the structure is equal to pow(2,levels). For example, an octree with 8 levels creates a 3D grid of 256x256x256 voxels. Individual voxels are then inserted to the top-level tree node, which recursively calls the SetSolid() function until the last level is reached. A voxel is marked as "solid" simply by having a voxel node at the last level (0). (GetChild() has the effect of finding the specified child and creating it if it doesn't exist yet.) A bitwise flag is used to test which subnode should be called at this level. 
I didn't really work out the math; I just intuitively went with this solution and it worked as I expected:

void VoxelTree::SetSolid(const int x, const int y, const int z, const bool solid)
{
	int flag = pow(2, level);
	if (x < 0 or y < 0 or z < 0) return;
	if (x >= flag * 2 or y >= flag * 2 or z >= flag * 2) return;
	flag = pow(2, level - 1);
	int cx = 0;
	int cy = 0;
	int cz = 0;
	if ((flag & x) != 0) cx = 1;
	if ((flag & y) != 0) cy = 1;
	if ((flag & z) != 0) cz = 1;
	if (solid)
	{
		if (level > 0)
		{
			GetChild(cx, cy, cz)->SetSolid(x & ~flag, y & ~flag, z & ~flag, true);
		}
	}
	else
	{
		if (level > 0)
		{
			if (kids[cx][cy][cz] != nullptr)
			{
				kids[cx][cy][cz]->SetSolid(x & ~flag, y & ~flag, z & ~flag, false);
			}
		}
		else
		{
			//Remove self
			auto parent = this->parent.lock();
			Assert(parent->kids[position.x][position.y][position.z] == Self());
			parent->kids[position.x][position.y][position.z] = nullptr;
		}
	}
}

The voxel tree is built by adding all scene entities into the tree. From there it was easy to implement a simple raycast to see if anything was above each voxel, and color it black if another voxel is hit: And here is the same program using a higher resolution voxel tree. You can see it's not much of a stretch to implement ambient occlusion from here: At a voxel size of 0.01 meters (the first picture) the voxelization step took 19 milliseconds, so it looks like we're doing good on speed. I suspect the rest of this project will be more art than science. Stay tuned!
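To make the bit logic easier to follow, here is a hypothetical standalone helper (my own naming and level convention, not part of the engine) showing how one bit of each coordinate selects the child node, and how clearing that bit produces the local coordinate passed down to that child. Here "level" means the number of remaining subdivisions, so a grid spanning 2^level cells tests bit (level - 1):

```cpp
// Hypothetical helper: pick the child octant (cx, cy, cz) for a voxel at
// (x, y, z), and compute the local coordinates (lx, ly, lz) within it.
void ChildIndex(int x, int y, int z, int level,
                int& cx, int& cy, int& cz,
                int& lx, int& ly, int& lz)
{
    int flag = 1 << (level - 1); // same as pow(2, level - 1)
    cx = (x & flag) ? 1 : 0;
    cy = (y & flag) ? 1 : 0;
    cz = (z & flag) ? 1 : 0;
    lx = x & ~flag; // clear the bit: coordinate relative to the child
    ly = y & ~flag;
    lz = z & ~flag;
}
```

For example, in a 256x256x256 grid (level 8) the voxel at (200, 5, 130) descends into child (1, 0, 1) with local coordinate (72, 5, 2).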

Josh


Bladequest - Is it the best FREE Leadwerks Game?

Hi guys, it has been very silent around Phodex Framework and my projects, but that's because I really focused on game development and nothing else. Posting stuff and creating content costs time, and I was also very busy with studying. However, as promised in my last entry, I was working on an actual game. I tried to create a small but quality experience including decent design, nice graphics and, at least for a one-man indie game, complex game mechanics such as first-person melee and ranged combat. Whether it is the best free Leadwerks game yet is up to you. You can try it out for FREE! About Bladequest: Bladequest is my welcome gift to you, and I would really like to use your feedback and input to sharpen my knowledge and skill at creating great games. To do that, I need your help! So that I can make a game you will enjoy playing, let me know what your preferences are by participating in this survey! Bladequest is meant to show you what I am able to do and what could be possible. It is currently lacking some features and content, but we can build upon it, whatever we like! Join my email list to stay in touch, share ideas and craft an epic game together. Let's do it! Before working on Bladequest I worked on a project called Dreamdale, which I decided to cancel for the moment, as I want to move toward smaller games and Dreamdale turned out to be a pretty huge project. I will most likely use some of the ideas in one way or another. If you are interested in the project, check it out here! What next? Once I have received enough feedback about Bladequest and your preferences, I can sit down and sketch out some ideas for cool games. I will then present them to you and we will step by step create an awesome game together. Learning from my former mistakes, I would also like to start small and slowly grow bigger, so expect a smaller and simpler, but well-thought-out game.
I have even thought about mobile games, although I prefer developing for PC, but I will do anything I can to deliver the best experience. Get Bladequest - TFC NOW! As always, stay cool! Markus from Phodex

PROJECT LAUNCH

The project is officially going to start soon... 15.06.2018. JOIN US NOW! © 2018 Gamehawk. Visit us: www.gamehawk.de

Gamehawk


What Makes a Good Brand Name?

In evaluating possible company names I have come up with the following criteria which I used to choose a name for our new game engine. Spelling and Pronunciation
The name should be unambiguous in spelling. This helps promote word-of-mouth promotion because when someone hears the name for the first time, they can easily find it online. Similarly, the name when read should be unambiguous in pronunciation. This helps the name travel from written to spoken word and back. Can you imagine telling someone else the name of this...establishment...and having them successfully type the name into a web browser?: Shorter is Better
Everything else aside, fewer letters is generally better. Here is a very long company name: And here is perhaps the shortest software company name in history. Which do you think is better? The Name Should "Pop"
A good company or product name will use hard consonants like B, T, K, X, and avoid soft sounding letters like S and F. The way a name sounds can actually influence perception of the brand, aside from the name meaning. The name "Elysium", besides being hard to pronounce and spell, is full of soft consonants that sound weak. "Blade Runner", on the other hand, starts with a hard B sound and it just sounds good. Communicate Meaning
The name should communicate the nature of the product or company. The name "Uber" doesn't mean anything except "better", which is why the company Uber originally launched as UberCab. Once they got to a certain size it was okay to drop the "cab" suffix, but do you remember the first time you heard of them? You probably thought "what the heck is an Uber?" The Leadwerks Brand
So according to our criteria above, the name Leadwerks satisfies the following conditions:

The name "pops" and sounds cool.
It's not too long.

But here's where it falls short:

Ambiguity in spelling (Leadworks?)
Ambiguity in pronunciation. Leadwerks is pronounced like Led Zeppelin, but many people read it as "Leed-works".
The name doesn't mean anything, even if it sounds cool. It's just a made-up word.

These are the reasons I started thinking about naming the new engine something different. New Engine, New Name
So with this in mind, I set out to find a new name for the coming engine. I was stumped until I realized that there are only so many words in the English language, and any good name you come up with will invariably have been used previously in some other context, hopefully in another industry or product type. Realizing this gave me more leeway, as I did not have to come up with something completely unique the world has never heard before. Our early benchmarks indicate the new engine is a performance monster, with incredible results I did not even dream were possible. Together with the rapid development pipeline of Leadwerks, I knew I wanted to focus on speed. Finally, there was one name I kept coming back to for weeks on end. I was able to obtain a suitable domain name. I am now filing a trademark for use of this name, which requires that I begin using it commercially, which is why I am now revealing the name for the first time... Keep scrolling. How does this name stack up?

Unambiguous spelling and pronunciation.
It's short.
The name "pops".
It communicates the defining feature of the product.

Now think about our goals for the new engine's name. Will people have any trouble remembering this name? Is there any ambiguity about what the product stands for, and the promise it makes? If two developers are at a Meetup group and one of them says "I made this with Turbo", is there any doubt what the promise of this product is, i.e. massive performance? The name even works on a subconscious level. Anyone having trouble with their game performance (in other slow engines that aren't Turbo) will naturally wonder how fast it could be running in ours. The fact that the name has a positive emotional response for many people and a strong connection to the game industry is a plus.
Turbo Game Engine is an unambiguous brand name that takes a stand and makes a clear promise of one thing: speed, which is incredibly important in the days of VR and 240 hz screens.

Josh


Voxel Cone Tracing

I've begun working on an implementation of voxel cone tracing for global illumination. This technique could potentially offer a way to perform real-time indirect lighting on the entire scene, as well as real-time reflections that don't depend on having the reflected surface onscreen, as screen-space reflection does. I plan to perform the GI calculations all on a background CPU thread, compress the resulting textures using DXTC, and upload them to the GPU as they are completed. This means the cost of GI should be quite low, although there is going to be some latency in the time it takes for the indirect lighting to match changes to the scene. We might continue to use SSR for detailed reflections and only use GI for semi-static light bounces, or it might be fast enough for moving real-time reflections. The GPU-based implementations I have seen of this technique are technically impressive but suffer from terrible performance, and we want something fast enough to run in VR. The first step is to be able to voxelize models. The result of the voxelization operation is a bunch of points.
These can be fed into a geometry shader that generates a box around each one:

void main()
{
	vec4 points[8];
	points[0] = projectioncameramatrix[0] * (geometry_position[0] + vec4(-0.5f * voxelsize.x, -0.5f * voxelsize.y, -0.5f * voxelsize.z, 0.0f));
	points[1] = projectioncameramatrix[0] * (geometry_position[0] + vec4(0.5f * voxelsize.x, -0.5f * voxelsize.y, -0.5f * voxelsize.z, 0.0f));
	points[2] = projectioncameramatrix[0] * (geometry_position[0] + vec4(0.5f * voxelsize.x, 0.5f * voxelsize.y, -0.5f * voxelsize.z, 0.0f));
	points[3] = projectioncameramatrix[0] * (geometry_position[0] + vec4(-0.5f * voxelsize.x, 0.5f * voxelsize.y, -0.5f * voxelsize.z, 0.0f));
	points[4] = projectioncameramatrix[0] * (geometry_position[0] + vec4(-0.5f * voxelsize.x, -0.5f * voxelsize.y, 0.5f * voxelsize.z, 0.0f));
	points[5] = projectioncameramatrix[0] * (geometry_position[0] + vec4(0.5f * voxelsize.x, -0.5f * voxelsize.y, 0.5f * voxelsize.z, 0.0f));
	points[6] = projectioncameramatrix[0] * (geometry_position[0] + vec4(0.5f * voxelsize.x, 0.5f * voxelsize.y, 0.5f * voxelsize.z, 0.0f));
	points[7] = projectioncameramatrix[0] * (geometry_position[0] + vec4(-0.5f * voxelsize.x, 0.5f * voxelsize.y, 0.5f * voxelsize.z, 0.0f));

	vec3 normals[6];
	normals[0] = (vec3(-1, 0, 0));
	normals[1] = (vec3(1, 0, 0));
	normals[2] = (vec3(0, -1, 0));
	normals[3] = (vec3(0, 1, 0));
	normals[4] = (vec3(0, 0, -1));
	normals[5] = (vec3(0, 0, 1));

	//Left
	geometry_normal = normals[0];
	gl_Position = points[0]; EmitVertex();
	gl_Position = points[4]; EmitVertex();
	gl_Position = points[3]; EmitVertex();
	gl_Position = points[7]; EmitVertex();
	EndPrimitive();

	//Right
	geometry_normal = normals[1];
	gl_Position = points[1]; EmitVertex();
	gl_Position = points[2]; EmitVertex();
	...
}

Here's a goblin whose polygons have been turned into Lego blocks. Now the thing most folks nowadays don't realize is that if you can voxelize a goblin, well then you can voxelize darn near anything.
Global illumination will then be calculated on the voxels and fed to the GPU as a 3D texture. It's pretty complicated stuff but I am very excited to be working on this right now. If this works, then I think environment probes are going to completely go away forever. SSR might continue to be used as a low-latency high-resolution first choice when those pixels are available onscreen. We will see. It is also interesting that the whole second-pass reflective water technique will probably go away as well, since this technique should be able to handle water reflections just like any other material.

Josh


Three improvements I made to Leadwerks Game Engine 5 today

First, I was experiencing some crashes due to race conditions. These are very, very bad, and very hard to track down. The problems were being caused by reuse of objects returned by a thread. Basically, a thread performs some tasks, returns an object with all the processed data, and then once the parent thread is done with that data it is returned to a pool of objects available for the thread to use. This is pretty complicated, and I found that when I switched to just creating a new return object each time the thread runs, the speed was the same as before. So the system is nice and stable now. I tend to be very careful about sharing data between threads and only doing it in a prescribed manner (through a command buffer and using separate objects), and I will continue to use this approach. Second, I added a built-in mouselook mode for cameras. You can call Camera::SetFreeLook(true) and get automatic mouse controls that make the camera look around. I am not doing this to make things easier; I am doing it because it allows fast, snappy mouse looking even if your game is running at a lower frequency. So you can run your game at 30 hz, giving you 33 milliseconds for all your game code to complete, but it will feel like 60+ hz because the mouse will update in the rendering thread, which is running at a faster speed. The same idea will be used to eliminate head movement latency in VR. Finally, I switched the instance indexes that are uploaded to the GPU from integers to 16-bit unsigned shorts. You can still have up to 131072 instances of a single object, because the engine will store instances above and below 65536 in two separate batches, and then send an integer to the shader to add to the instance index. Again, this is an example of a hard limit I am putting in place in order to make a more structured and faster-performing engine, but it seems like the constraints I am setting so far are unlikely to even be noticed.
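A sketch of how that batching scheme might work (my own naming, not the engine's API): the high bits of the instance number select a batch, only the low 16 bits are uploaded per instance, and the shader adds the batch's base offset back in.

```cpp
#include <cstdint>

// Hypothetical illustration of 16-bit instance indexing with batch offsets.
struct InstanceRef
{
    std::uint32_t batch;  // which batch of 65536 instances
    std::uint16_t local;  // 16-bit index uploaded to the GPU
};

InstanceRef SplitInstanceIndex(std::uint32_t index)
{
    return { index >> 16, std::uint16_t(index & 0xFFFF) };
}

std::uint32_t RebuildInstanceIndex(InstanceRef r)
{
    // What the shader would do: base offset (batch * 65536) + local index.
    return (r.batch << 16) + r.local;
}
```

For example, instance 70000 lands in batch 1 with local index 4464, and the shader recovers 70000 by adding the batch offset of 65536.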
Animation is working great, and performance is just as fast as before I started adding it, so things are looking good. Here's a funny picture of me trying to add things to the renderer to slow it down and failing:

I'm not sure what I will tackle next. I could work on threading the physics and AI, spend some time exploring new graphics options, or implement lighting so that we have a basic usable version of Leadwerks 5 for indoor games. What would you like to see next in the Leadwerks Game Engine 5 Alpha?

Josh


Threaded Animation

The animation update routine has been moved into its own thread, where it runs in the background as you perform your game logic. We can see in the screenshot below that animation updates for 1025 characters take about 20 milliseconds on average. (Intel graphics below, otherwise it would be 1000 FPS lol.) In Leadwerks 4 this would automatically cap your max framerate at 50 FPS, assuming nothing else in the game loop took any time at all. Because of the asynchronous threaded design of Leadwerks 5, this otherwise expensive operation has no impact whatsoever on framerate! The GPU is being utilized by a good amount (96%) while CPU usage is actually quite low at 10%, even though there are four threads running.

Although the performance here is within an acceptable limit for a game running with a 30 hz loop (it's under 33 milliseconds), it would be too slow for a 60 hz game. (Note that game frequency and framerate are two different things.) In order to get the animation time under the 16.667 milliseconds that a 60 hz game allows, we can split the animation job up onto several threads. This job is very easily parallelized, so the animation time is roughly the single-threaded time divided by the number of threads. Adding more threads than necessary won't make the game run faster; we just have to make sure the job completes within the allocated amount of time so the engine keeps running at the correct speed. When I split the task into two threads, the average update time is about 10 milliseconds, and CPU usage only goes up 2%. Splitting the task into 16 threads brings the average time down to 1-2 milliseconds, and CPU usage is still only at 15%. What does this mean? It seems each thread spends a lot of time paused (intentionally), and we haven't begun to scratch the surface of CPU utilization.
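The splitting described above works because each skeleton's update is independent of every other. Here is a minimal sketch of the idea, not engine code; the real pose math is replaced by a stand-in function:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <thread>
#include <vector>

// Minimal sketch (not engine code): divide N independent skeleton
// updates across K worker threads in contiguous chunks. Because no
// skeleton depends on another, the job parallelizes almost perfectly.
void UpdateAllSkeletons(std::vector<float>& poses, int threadcount) {
    std::vector<std::thread> workers;
    size_t count = poses.size();
    size_t chunk = (count + threadcount - 1) / threadcount;
    for (int t = 0; t < threadcount; ++t) {
        size_t begin = size_t(t) * chunk;
        size_t end = std::min(begin + chunk, count);
        workers.emplace_back([&poses, begin, end]() {
            for (size_t i = begin; i < end; ++i) {
                poses[i] = std::sin(poses[i]); // stand-in for real pose math
            }
        });
    }
    for (auto& w : workers) w.join(); // all chunks finish within the frame budget
}
```

With K threads the wall-clock time is roughly the single-threaded time divided by K, which matches the 20 ms / 10 ms / 1-2 ms numbers above.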
So I will do my best to keep the CPU clear for all your game code, and at the same time Leadwerks Game Engine 5 will be using A LOT of threads in the background for animation, physics, navigation, and rendering. The performance we're seeing with this system is absolutely incredible, beyond anything I imagined it would be when I started building an engine specifically designed for modern PC hardware. We're seeing results that are 10 times faster than anything else out there. In fact, here are 10,000 animated characters running at 200+ FPS on a GeForce 1070 with no LOD or any special optimization tricks. (Thanks to @AggrorJorn for the screen capture.) It remains to be seen how performance holds up when lights, physics, and AI are added, but so far it looks extremely good. In fact, for anyone making an RTS game with lots of characters, Leadwerks 5 may be the only reasonable choice due to the insane performance it gets!

Josh


Three Types of Optimization

In designing the new engine, I have found that there are three distinct types of optimization.

Streamlining

This is refinement. You make small changes and try to gain a small amount of performance. Typically this is done as a last step before releasing code. The process can be ongoing, but it suffers from diminishing returns after a while. When you eliminate unnecessary math based on guaranteed assumptions, you are streamlining code. For example, a 4x4 matrix multiplication can skip the calculations for one row or column (depending on convention) if the matrices are guaranteed to be affine transformations, whose last row is always (0, 0, 0, 1).

Quality Degradation

This is when you downgrade the quality of your results within a certain tolerable level where it won't be noticed much. An example of this is using a low-resolution copy of a model when it is far away from the camera. Quality degradation can be fairly arbitrary and can mask your true performance, so it's best to keep an option to disable it.

Architectural

By designing algorithms in a way that makes maximum use of hardware and produces the most optimal results, we can greatly increase performance. Architectural optimization produces groundbreaking changes that can be ten or a hundred times faster than the old architecture. An example of this is GPU hardware, which produces a massive performance increase over software rendering. We're seeing a lot of these types of improvements in Leadwerks Game Engine 5 because the entire system is being designed to make maximum use of modern graphics hardware.
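The streamlining example above can be made concrete. This is a generic sketch, not Leadwerks code: when both 4x4 matrices are affine (their last row, in row-major layout, is 0, 0, 0, 1), the last row of the product is known in advance, so only 12 of 16 entries need computing and each dot product drops a multiply-add.

```cpp
#include <cassert>

// Sketch of the streamlining example: multiplying two affine 4x4
// matrices. The bottom row of both inputs is assumed to be (0,0,0,1)
// (row-major convention; the same trick applies to the right-most
// column in column-major layouts), so it is written out directly.
struct Mat4 { float m[4][4]; };

Mat4 MultiplyAffine(const Mat4& a, const Mat4& b) {
    Mat4 r = {};
    for (int i = 0; i < 3; ++i) {          // skip the constant bottom row
        for (int j = 0; j < 4; ++j) {
            r.m[i][j] = a.m[i][0] * b.m[0][j]
                      + a.m[i][1] * b.m[1][j]
                      + a.m[i][2] * b.m[2][j];
        }
        r.m[i][3] += a.m[i][3];            // b's bottom row is (0,0,0,1)
    }
    r.m[3][3] = 1.0f;                      // write the known row directly
    return r;
}
```

The result is identical to a full multiply whenever the inputs really are affine; the saving per multiply is small, which is exactly why this counts as streamlining rather than an architectural change.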

Josh


Revisiting Border Recon

OK, I'm finally done with my obligations as a university student. Now it's time to come back to game dev and finish up (hopefully) the development of Border Recon. While I was busy at university, I did do a bit of work on the game here and there. I added a spectator feature that allows players to spectate others in free mode and third person while waiting to respawn after dying. Pretty cool. Also, I think some of us have seen the earlier changes to the style of the graphics... I changed the graphics from HQ to flat pastel colours, resulting in a cartoony look in-game. I like it: it requires less work on the level design details and it actually looks quite nice. Another cool thing coming in the next version: soundtracks! I found a great soundtrack on the GameDev Market website and I think it fits the game's theme perfectly. I'm going to work on that so I can add a bit of music to the game. Link to the soundtrack here: https://www.gamedevmarket.net/asset/brass-attacks/ Also, I'm working on the muzzle flash feature for the weapons; it's half done now and it looks kick ***. Screenshots coming soon! Thanks for reading, folks!

jen


Animation Tweening

Leadwerks 5 uses a different engine architecture with a game loop that runs at either 30 (the default) or 60 updates per second. Frames are passed to the rendering thread, which runs at an independent framerate that can be set to 60, 90, or unlimited. This is great for performance, but it creates some challenges in timing. In order to smooth out the motion of the frames, the results of the last two frames received are interpolated between. Animation is a big challenge for this: there can potentially be many, many bones, and interpolating entire skeletons could slow down the renderer. In the screen capture below, I have slowed the game update loop down to 5 updates per second to exaggerate the problem that occurs when no interpolation is used.

My solution was to upload the 4x4 matrices of the previous two frames and perform the tweening inside the vertex shader:

```glsl
//Vertex skinning
mat4 animmatrix[8];
for (int n = 0; n < 4; ++n) {
    if (vertex_boneweights[n] > 0.0f) {
        animmatrix[n] = GetAnimationMatrix(vertex_boneindices[n], 0);
        animmatrix[n + 4] = GetAnimationMatrix(vertex_boneindices[n], 1);
    }
}
vec4 vertexpos = vec4(vertex_position, 1.0f);
vec4 modelvertexposition = vec4(0.0f); //must be zeroed before accumulating
for (int n = 0; n < 4; ++n) {
    if (vertex_boneweights[n] > 0.0f) {
        modelvertexposition += animmatrix[n] * vertexpos * vertex_boneweights[n] * rendertweening
            + animmatrix[n + 4] * vertexpos * vertex_boneweights[n] * (1.0f - rendertweening);
    }
}
modelvertexposition = entitymatrix * modelvertexposition;
```

Bone matrices are retrieved from an RGBA floating-point texture with this function:

```glsl
mat4 GetAnimationMatrix(const in int index, const in int frame) {
    ivec2 coord = ivec2(index * 4, gl_InstanceID * 2 + frame);
    mat4 bonematrix;
    bonematrix[0] = texelFetch(texture14, coord, 0);
    bonematrix[1] = texelFetch(texture14, coord + ivec2(1, 0), 0);
    //bonematrix[2] = texelFetch(texture14, coord + ivec2(2, 0), 0);
    bonematrix[2].xyz = cross(bonematrix[0].xyz, bonematrix[1].xyz); //removes one texture lookup!
    bonematrix[2].w = 0.0f;
    bonematrix[3] = texelFetch(texture14, coord + ivec2(3, 0), 0);
    return bonematrix;
}
```

This slows down the shader because up to 24 texel fetches might be performed per vertex, but it saves the CPU from having to precompute interpolated matrices for each bone. In VR, I think this cost savings is critical. Doing a linear interpolation between vertex positions is not exactly correct, but it's a lot faster than slerping a lot of quaternions and converting them to matrices, and the results are so close you can't tell any difference. There's actually a similar concept in 2D animation I remember reading about when I was a kid. The book is called The Illusion of Life: Disney Animation, and it's a really interesting read with lots of nice illustrations. Here is the same scene with interpolation enabled. It's recorded at 15 FPS, so the screen capture still looks a little jittery, but you get the idea.

Adding interpolation brought this scene down to 130 FPS from 200 on an Intel chip, simply because of the increased number of texel fetches in the vertex shader. Each character consists of about 4000 vertices. I expect on a discrete card this would be running at pretty much the max framerate (1000 or so). With this in place, I can now confirm that my idea for the fast rendering architecture in Leadwerks Game Engine 5 definitely works. The next step will be to calculate animations on a separate thread (or maybe two). My test scene here is using a single skeleton shared by all characters, but putting the animation on its own thread will allow many more characters to all be animated uniquely.
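The claim that linear matrix blending is only approximately correct can be quantified with a toy 2D calculation (not engine or shader code): averaging a 0-degree and a θ-radian rotation matrix scales a unit vector by cos(θ/2) instead of preserving its length, so the error vanishes as the per-frame rotation shrinks.

```cpp
#include <cassert>
#include <cmath>

// Toy illustration (not engine code): a 50/50 linear blend of the 2D
// identity rotation and a rotation by 'angleradians' is not itself a
// rotation. This returns the length of the blended image of the unit
// vector (1,0); a true interpolated rotation (what quaternion slerp
// produces in 3D) would always return exactly 1. Algebraically the
// result equals cos(angleradians / 2), so for the small angle steps
// between consecutive animation frames the error is negligible.
float BlendedLengthAtHalfway(float angleradians) {
    float c = 0.5f * (1.0f + std::cos(angleradians));
    float s = 0.5f * std::sin(angleradians);
    return std::sqrt(c * c + s * s);
}
```

At a 90-degree step the blended vector shrinks to about 0.707 of its length, but at a few degrees per frame the shrinkage is a fraction of a percent, which is why the shader trick is visually indistinguishable from slerp.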

Josh


First Animation Metrics

I got skinned animation working in the new renderer, after a few failed attempts that looked like something from John Carpenter's The Thing. I set up a timer and updated a single animation on a model 10,000 times. Animation consists of two phases: first, all animations are performed to calculate the local position and quaternion rotation; second, 4x4 matrices are calculated for the entire hierarchy in global space and copied into an array of floats. To test this, I placed this code inside the main loop:

```cpp
float frame = Millisecs() / 10.0f;
auto tm = Millisecs();
for (int n = 0; n < 10000; ++n) {
    monster->skeleton->SetAnimationFrame(frame, 1, 1, true);
    monster->UpdateSkinning();
}
int elapsed = Millisecs() - tm;
Print(elapsed);
```

The result in release mode was around 60 milliseconds. When I tested each command alone, I found that UpdateSkinning() took around 30-35 milliseconds while SetAnimationFrame() hovered around 20 milliseconds. When I cut the number of iterations in half, the result was right around 30 milliseconds, which is within our window of time (33 ms) for games that run at 30 hz. If your game uses a 60 hz loop, you can cut that number in half. The model I am using has only 24 bones, but models with up to 256 bones are supported (with a fairly linear performance cost). Now, this is with a single call to SetAnimationFrame. If the animation manager is in use, there could potentially be many more calculations performed as animations are smoothly blended. Splitting the animations up into multiple threads could be done easily, but most computers only have four CPU cores, so I don't see this being useful on more than 2-3 threads. Let's say we dedicate two threads to animation. That means right now our theoretical limit is about 10,000 zombies onscreen at once. I would like to go higher, but I think this is probably our realistic limit for CPU-performed animations.
The alternative would be to upload the animation sequences themselves to the GPU and perform all animation entirely on the GPU, but then we would lose all feedback on the CPU side like the current bone orientations. Perhaps a solution would be to have both a CPU and GPU animation system that produces identical results, and the CPU side stuff would only be called when needed, but that makes things pretty complicated and I am not sure I want to go down that road. In reality, the final results will probably be quite a lot less than this when all functionality is added, but from this data we can reasonably extrapolate that Leadwerks 5 will support thousands of animated characters onscreen. According to the developers of the Dead Rising series, a few thousand is the practical limit of how many characters you would ever want onscreen, so this is encouraging. Of course, there is no limit on the number of offscreen characters, since animation is only performed for characters that appear onscreen.

Josh


Animation in Leadwerks 5

The design of Leadwerks 4 was meant to be flexible and easy to use. In Leadwerks 5, our foremost design goals are speed and scalability. In practical terms that means some options are going to go away in order to give you bigger games that run faster. I'm working out the new animation system. There are a few different ways to approach this, and in situations like this I find it is best to start by deciding the desired outcome and then figuring out how to achieve it. So what do we want?

Fast performance with as many animated characters as possible, hopefully tens of thousands.
Feedback to the CPU on the orientations of bones, for things like parenting a weapon to the character's hand, firing a shot, collision detection with limbs, etc.
Low memory usage, so we can have lots and lots of characters.

In Leadwerks 4, a bone is a type of entity. This is convenient because the same entity positioning commands work just fine with bones, it's easy to parent a weapon to a limb, and there is a lot of consistency. However, this comes at a cost of potential performance as well as memory consumption. A stripped-down Bone class without all the overhead of the entity system would be more efficient when we hit really large numbers of animated models. So here's what I am thinking:

Bones are a simplified class that does not have all the features of the entity system.
The Model class has a "skeleton" member, which is the top-most bone in a hierarchy of bones for that model.
You can call animation commands on bones only, and you cannot parent an entity to a bone, since the bone is not an entity. Instead you can attach an entity by making a copy that is weighted 100% to the bone you specify, so it becomes part of the animated model:

```cpp
weapon->Attach(model->FindBone("r_hand"));
```

If you have any hierarchy in that weapon model, like a pivot to indicate where the firing position is, it will be lost, so you will need to retrieve those values in your script and save them before attaching the weapon.
This also means bones won't appear in the map as an editable entity, which I would argue is a good thing, since they clog up the hierarchy with thousands of extra entities. When you call an animation command, it will be sent to the animation thread the next time the game syncs in the World::Update() command. Animations are then performed on a copy of all the visible skeletons in the scene, and their 4x4 matrices are retrieved during the next call to World::Update(). Animation data is then passed to the rendering thread where it is fed into a float texture the animation shader reads to retrieve the bone orientations for each instance in the batch it is rendering. This means there is latency in the system and everything is always one frame behind, but your animations will all be performed on a separate thread and thus have basically no cost. In fact with the simplified bone class, it might not even be necessary to use a separate thread, since carrying out the animations is just a lot of quaternion Slerps and matrix multiplications. I'll have to try it and just see what the results are. The bottlenecks here are going to be the number of animations we can calculate, the passing of data from the game thread to the animation thread and back, and the passing of data from the rendering thread to the GPU. It's hard to predict what we will hit first, but those are the things I have in mind right now. It would be possible to carry out the animation transforms entirely on the GPU, but that would result in us getting no feedback whatsoever on the orientation of limbs. So that's not really useful for anything but a fake tech demo. I don't know, maybe it's possible to get the results asynchronously with a pixel buffer object. In addition to animation, having tons of characters also requires changes to the physics and navmesh system, which I am already planning. The end result will be a much more scalable system that always provides fast performance for VR. 
As we are seeing, the optimizations made for VR are having a beneficial effect on general performance across the board. As explained above, this may sometimes require a little more work on your part to accomplish specific things, but the benefits are well worth it, as we will easily be able to run games with more characters than the video below, in VR, perhaps even on Intel graphics.  

Josh


Second Performance Test: nearly 400% faster!

After observing the behavior of the previous test, I rearranged the threading architecture for even more massive performance gains. This build runs at speeds in excess of 400 FPS with 100,000 entities... on Intel integrated graphics! I've had more luck with concurrency in the design than with parallelism. (Images below are taken from here.) Splitting the octree recursion up into separate threads produced only modest gains; it's difficult to optimize because the sparse octree is unpredictable. Splitting different parts of the engine up into multiple threads, however, resulted in a massive performance boost. The same test in Leadwerks 4 runs at about 9 FPS, making Leadwerks 5 roughly 45 times faster under heavy loads like this. Alpha subscribers can try the test out here.

Josh


First performance demonstration

I am proud to show off our first performance demonstration, which proves that my idea for the Leadwerks 5 renderer works. To test the renderer I created 100,000 instanced boxes. The demo includes both a regular mode and a mock VR mode that simulates single-pass stereoscopic rendering with a geometry shader. The hardware I tested on is an Intel i7-4770R (used for graphics too), which is a few years old. Now, this is not a perfect benchmark for several reasons: there is no frustum culling being performed (the renderer just adds everything into the scene and renders it), I am not showing threaded and non-threaded performance side by side, you may occasionally see all objects disappear for a single frame, and some triangles may be discarded prematurely in stereoscopic mode. However, the results are incredible and worth bragging about. In normal mode, with a polygon load of 1.2 million per frame, this little machine with integrated graphics is getting 115 FPS, and in mock VR mode (2.4 million polygons) it is hovering right around 90. With 100,000 objects, on integrated graphics! Alpha subscribers can download the test now. The secret of this massive performance is an efficient architecture built to unlock the full power of your graphics hardware. Below you can see that GPU utilization is around 95%:

Josh


Multithreaded Rendering

After working out a thread manager class that stores a stack of C++ command buffers, I've got a pretty nice proof of concept working. I can call functions in the game thread, and the appropriate actions are pushed onto a command buffer that is then passed to the rendering thread when World::Render is called. The rendering thread is where all the (currently) OpenGL code is executed. When you create a context or load a shader, all it does is create the appropriate structure and send a request over to the rendering thread to finish the job. Consequently, there is currently no way of detecting if OpenGL initialization fails(!), and in fact the game will still run along just fine without any graphics rendering! We obviously need a mechanism to detect this, but it is interesting that you can now load a map and run your game without ever creating a window or graphics context. The following code is perfectly legitimate in Leadwerks 5:

```cpp
#include "Leadwerks.h"
using namespace Leadwerks;

int main(int argc, const char *argv[]) {
    auto world = CreateWorld();
    auto map = LoadMap(world, "Maps/start.map");
    while (true) {
        world->Update();
    }
    return 0;
}
```

The rendering thread is able to run at its own frame rate independently from the game thread, and I have tested under some pretty extreme circumstances to make sure the threads never lock up. By default, I think the game loop will probably self-limit its speed to a maximum of 30 updates per second, giving you a whopping 33 milliseconds for your game logic, but this frequency can be changed to any value, or removed completely by setting it to zero (not recommended, as this can easily lock up the rendering thread with an infinite command buffer stack!). No matter the game frequency, the rendering thread runs at its own speed, which is either limited by the window refresh rate, an internal clock, or just let free to run as fast as possible for those triple-digit frame rates.
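The handoff between the two threads can be sketched in a few dozen lines. This is a minimal standalone illustration with made-up names, not the engine's actual thread manager:

```cpp
#include <functional>
#include <mutex>
#include <vector>
#include <cassert>

// Minimal sketch (hypothetical names) of the command-buffer handoff:
// the game thread records lambdas, then hands the whole buffer to the
// rendering thread in one locked push when the frame is submitted.
class CommandBuffer {
public:
    // Game thread: record a command; no lock needed, 'recording' is
    // only ever touched by the game thread.
    void Push(std::function<void()> command) {
        recording.push_back(std::move(command));
    }
    // Game thread: submit the recorded commands (once per frame).
    void Submit() {
        std::lock_guard<std::mutex> lock(mutex);
        submitted.push_back(std::move(recording));
        recording.clear();
    }
    // Rendering thread: execute everything submitted so far.
    void Execute() {
        std::vector<std::vector<std::function<void()>>> work;
        {
            std::lock_guard<std::mutex> lock(mutex);
            work.swap(submitted); // hold the lock only for the swap
        }
        for (auto& buffer : work)
            for (auto& command : buffer) command();
    }
private:
    std::vector<std::function<void()>> recording;
    std::vector<std::vector<std::function<void()>>> submitted;
    std::mutex mutex;
};
```

Because the rendering thread only holds the lock long enough to swap the submitted stack into a local variable, the game thread is never blocked while commands execute.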
Shaders are now loaded from multiple files instead of being packed into a single .shader file. When you load a shader, the file extension will be stripped off (if present) and the engine will look for .vert, .frag, .geom, .eval, and .ctrl files for the different shader stages:

```cpp
auto shader = LoadShader("Shaders/Model/diffuse");
```

The asynchronous shader compiling in the engine could make our shader editor a little more difficult to handle, except that I don't plan on making any built-in shader editor in the new editor! Instead I plan to rely on Visual Studio Code as the official IDE, and maybe add a plugin that tests whether shaders compile and link on your current hardware. I found that a pragma statement can be used to indicate include files (not implemented yet) without triggering any errors in the VS Code IntelliSense. Although restructuring the engine to work in this manner is a big task, I am making good progress. Smart pointers make this system really easy to work with. When the owning object in the game thread goes out of scope, its associated rendering object is also collected... unless it is still stored in a command buffer or otherwise in use! The relationships I have worked out work perfectly, and I have not run into any problems deciding what the ownership hierarchy should be. For example, a context has a shared pointer to the window it belongs to, but the window only has a weak pointer to the context. If the context handle is lost, the context is deleted, but if the window handle is lost, the context prevents the window from being deleted. The capabilities of modern C++ and modern hardware are making this development process a dream come true. Of course, with forward rendering I am getting about 2000 FPS with a blank screen on Intel graphics, but the real test will be to see what happens when we start adding lots of lights into the scene.
The only reason it might be possible to write a good forward renderer now is that graphics hardware has become a lot more flexible. Using a variable-length for loop, or using the results of a texture lookup as the coordinates of another lookup, were big no-nos when we first switched to deferred rendering, but that situation has improved. The increased restrictions on the renderer and the total separation of internal and user-exposed classes are actually making it a lot easier to write efficient code. Here is my code for the index array buffer object that lives in the rendering thread:

```cpp
#include "../../Leadwerks.h"

namespace Leadwerks {

    OpenGLIndiceArray::OpenGLIndiceArray() : buffer(0), buffersize(0), lockmode(GL_STATIC_DRAW) {}

    OpenGLIndiceArray::~OpenGLIndiceArray() {
        if (buffer != 0) {
#ifdef DEBUG
            Assert(glIsBuffer(buffer), "Invalid indice buffer.");
#endif
            glDeleteBuffers(1, &buffer);
            buffer = 0;
        }
    }

    bool OpenGLIndiceArray::Modify(shared_ptr<Bank> data) {
        //Error checks
        if (data == nullptr) return false;
        if (data->GetSize() == 0) return false;

        //Generate buffer
        if (buffer == 0) glGenBuffers(1, &buffer);
        if (buffer == 0) return false; //shouldn't ever happen

        //Bind buffer
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, buffer);

        //Set data
        if (buffersize == data->GetSize() and lockmode == GL_DYNAMIC_DRAW) {
            glBufferSubData(GL_ELEMENT_ARRAY_BUFFER, 0, data->GetSize(), data->buf);
        } else {
            if (buffersize == data->GetSize()) lockmode = GL_DYNAMIC_DRAW;
            glBufferData(GL_ELEMENT_ARRAY_BUFFER, data->GetSize(), data->buf, lockmode);
        }
        buffersize = data->GetSize();
        return true;
    }

    bool OpenGLIndiceArray::Enable() {
        if (buffer == 0) return false;
        if (buffersize == 0) return false;
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, buffer);
        return true;
    }

    void OpenGLIndiceArray::Disable() {
        glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
    }

}
```

From everything I have seen, my gut feeling tells me that the new engine is going to be ridiculously fast.
If you would like to be notified when Leadwerks 5 becomes available, be sure to sign up for the mailing list here.

Josh


Building a Zero-Overhead Renderer

The Leadwerks 4 renderer was built for maximum flexibility. The Leadwerks 5 renderer is being built first and foremost for great graphics with maximum speed. This is the fundamental difference between the two designs. VR is the main driving force for this direction, but all games will benefit.

Multithreaded Design

Leadwerks 4 does make use of multithreading in some places, but it is fairly simplistic. In Leadwerks 5 the entire architecture is based around separate threads, which is challenging but a lot of fun for me to develop. I worked out a way to create a command buffer on the main thread that stores a list of commands for the rendering thread to perform during the next rendering frame. (Thanks for the tip on lambda functions @Crazycarpet.) Each object in the main thread has a simplified object it is associated with that lives in the rendering thread. For example, each Camera has a RenderCamera object that corresponds to it. Here's how changes in the main thread get added to a command buffer to be executed when the rendering thread is ready (note the init-captures, which copy the shared pointer and the color into the lambda by value):

```cpp
void Camera::SetClearColor(const float r, const float g, const float b, const float a) {
    clearcolor.x = r;
    clearcolor.y = g;
    clearcolor.z = b;
    clearcolor.w = a;
#ifdef LEADWERKS_5
    GameEngine::cullingthreadcommandbuffer.push_back(
        [rendercamera = this->rendercamera, clearcolor = this->clearcolor]() {
            rendercamera->clearcolor = clearcolor;
        }
    );
#endif
}
```

The World::Render() command is still there for conceptual consistency, but what it really does is add all the accumulated commands onto a stack of command buffers for the rendering thread to evaluate whenever it's ready:

```cpp
void World::Render(shared_ptr<Buffer> buffer) {
    //Add render call onto command buffer
    GameEngine::cullingthreadcommandbuffer.push_back(std::bind(&RenderWorld::AddToRenderQueue, this->renderworld));

    //Copy command buffer onto culling command buffer stack
    GameEngine::CullingThreadCommandBufferMutex->Lock();
    GameEngine::cullingthreadcommandbufferstack.push_back(GameEngine::cullingthreadcommandbuffer);
    GameEngine::CullingThreadCommandBufferMutex->Unlock();

    //Clear the command buffer and start over
    GameEngine::cullingthreadcommandbuffer.clear();
}
```

The rendering thread is running in a loop inside a function that looks something like this:

```cpp
shared_ptr<SharedObject> GameEngine::CullingThreadEntryPoint(shared_ptr<SharedObject> o) {
    while (true) {
        //Get the number of command stacks that are queued
        CullingThreadCommandBufferMutex->Lock();
        int count = cullingthreadcommandbufferstack.size();
        CullingThreadCommandBufferMutex->Unlock();

        //For each command stack
        for (int i = 0; i < count; ++i) {
            //For each command
            for (int n = 0; n < cullingthreadcommandbufferstack[i].size(); ++n) {
                //Execute command
                cullingthreadcommandbufferstack[i][n]();
            }
        }

        //Remove the executed command stacks. Note that vectors of
        //std::function are not trivially copyable, so erase is used
        //rather than a raw memcpy:
        CullingThreadCommandBufferMutex->Lock();
        cullingthreadcommandbufferstack.erase(cullingthreadcommandbufferstack.begin(), cullingthreadcommandbufferstack.begin() + count);
        CullingThreadCommandBufferMutex->Unlock();

        //Render queued worlds
        for (auto it = RenderWorld::renderqueue.begin(); it != RenderWorld::renderqueue.end(); ++it) {
            (it->first)->Render(nullptr);
        }
    }
    return nullptr;
}
```

I am trying to design the system for maximum flexibility in the thread speeds so that we can experiment with different frequencies for each stage. This is why the rendering thread goes through and executes all commands in all accumulated command buffers before going on to actually render any queued world. This prevents the rendering thread from rendering an extra frame when another one has already been received (which shouldn't really happen, but we will see).
As you can see, the previously expensive World::Render() command now does almost nothing before returning to your game loop. I am also going to experiment with running the game loop and the rendering loop at different speeds. Say your game previously ran at 60 FPS and 1/3 of that time was spent rendering the world; that left you with about 11 milliseconds to execute your game code before things would start to slow down. With the new design your game code can take up to 33 milliseconds without compromising the framerate. That means your code could be three times more complex, and you would not have to worry so much about efficiency, since the rendering thread will keep blazing away at a much faster rate. The game loop is a lot simpler now, with just two commands you need to update and render the world. This gives you a chance to adjust some objects after physics and before rendering. A basic Leadwerks 5 program is really simple:

```cpp
#include "Leadwerks.h"
using namespace Leadwerks;

int main(int argc, const char *argv[]) {
    auto window = CreateWindow("MyGame");
    auto context = CreateContext(window);
    auto world = CreateWorld();
    auto camera = CreateCamera(world);
    while (true) {
        if (window->KeyHit(KEY_ESCAPE) or window->Closed()) return 0;
        world->Update();
        world->Render(context);
    }
}
```

This may cause problems if you try to do something fancy like render a world to a buffer and then use that buffer as a texture in another world. We might lose some flexibility there, and if we do, I will prioritize speed over having lots of options.

Clustered Forward Rendering

Leadwerks has used a deferred renderer since version 2.1. Version 2.0 was a forward renderer with shadowmaps, and it didn't work very well. At the time, GPUs were not very good at branching logic. If you had an if/else statement, the GPU would perform BOTH branches (including expensive texture lookups) and take the result of the "true" one.
To get around this problem, the engine would generate a new version of each shader every time a new combination of lights appeared onscreen, causing periodic micro-freezes when a new shader was loaded. In 2.1 we switched to a deferred renderer, which eliminated these problems. Thanks to increasingly smart graphics hardware and more flexible modern APIs, a new technique called clustered forward rendering is now possible, offering flexibility similar to a deferred renderer with the increased speed of a forward renderer. Here is a nice article that describes the technique:
http://www.adriancourreges.com/blog/2016/09/09/doom-2016-graphics-study/

This approach is also more scalable. Extra renders to the normal buffer and other details can be skipped for better scaling on integrated graphics and slower hardware. I'm not really targeting slow hardware as a priority, but I wouldn't be surprised if it ran extremely fast on integrated graphics with the settings turned down. Of course, the system requirements will stay the same, because we need modern API features to do this. I'm still a little foggy on how custom post-processing effects will be implemented. There will definitely be more standard features built into the renderer. For example, SSR will be mixed with probe reflections, and a quality setting (off, static, dynamic) will determine how much processing power is used for reflections. If improved performance and integration come at the cost of reduced flexibility in the post-process shaders, I will choose that option, but so far I don't foresee any problems.

Vulkan Graphics

The new renderer is being developed with OpenGL 4.1 so that I can make a more gradual progression, but I am very interested in moving to Vulkan once I have the OpenGL build worked out. Valve made an agreement with the developers of MoltenVK to release the SDK for free. This code translates Vulkan API calls into Apple's Metal API, so you basically have Vulkan running on Mac (sort of). I previously contacted the MoltenVK team about a special license for Leadwerks that would allow you to release your games on Mac without buying a MoltenVK license, but we did not reach an agreement, and at the time the whole prospect seemed pretty shaky. With Valve supporting this, I feel more confident going in that direction. In fact, due to the design of our engine, it would be possible to outsource the task of a Vulkan renderer without granting any source code access or complicating the internals of the engine one bit.
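To make the clustered technique concrete, here is a simplified sketch of the light-assignment step using a uniform view-space grid. Real implementations slice the skewed frustum and usually use exponential depth slices; all names here are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Simplified sketch of clustered light assignment (uniform grid rather
// than a real frustum, hypothetical names): each cell stores the
// indices of the point lights whose bounding spheres touch it, and the
// fragment shader would later loop over only that short list.
struct Sphere { float x, y, z, radius; };

struct LightGrid {
    int nx, ny, nz;
    float cellsize;
    std::vector<std::vector<int>> cells; // light indices per cell

    LightGrid(int nx, int ny, int nz, float cellsize)
        : nx(nx), ny(ny), nz(nz), cellsize(cellsize), cells(nx * ny * nz) {}

    void Insert(int lightindex, const Sphere& s) {
        // Clamp the sphere's AABB to the grid and mark overlapped cells
        auto clampcell = [&](float v, int n) {
            int c = int(std::floor(v / cellsize));
            return std::max(0, std::min(n - 1, c));
        };
        int x0 = clampcell(s.x - s.radius, nx), x1 = clampcell(s.x + s.radius, nx);
        int y0 = clampcell(s.y - s.radius, ny), y1 = clampcell(s.y + s.radius, ny);
        int z0 = clampcell(s.z - s.radius, nz), z1 = clampcell(s.z + s.radius, nz);
        for (int z = z0; z <= z1; ++z)
            for (int y = y0; y <= y1; ++y)
                for (int x = x0; x <= x1; ++x)
                    cells[(z * ny + y) * nx + x].push_back(lightindex);
    }
};
```

The fragment shader would then map each pixel to its cell and loop over only that cell's light list, which is what keeps the forward lighting pass cheap no matter how many lights the scene contains.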

Josh


Lua in Leadwerks 5 is Solved

When considering the script system in Leadwerks 5, I looked at alternatives including Squirrel, which is used by Valve in many games, but these only gave me a deeper appreciation for the simplicity of Lua. There are only a handful of rules you need to learn to use the language, it’s fun to use, yet somehow it does everything you could ever need.

There were three big issues I had to solve. First, the Leadwerks 5 API makes extensive use of smart pointers, which our binding library tolua++ does not support. Second, I wanted better auto completion and a better user experience in the IDE in general. Third, if an external IDE is going to be used, it needs to be able to interface with the Leadwerks debugging system.

To support smart pointers, I found a new library called sol2 that does everything we need. @Rick and I discussed the idea at great length, and I am happy to say we’ve come up with a design that is simple to use and quite a bit easier than even Leadwerks 4.x. The binding code is nowhere near done, but at this point I can see that everything will work.

@AggrorJorn suggested using Visual Studio Code as our official script IDE in Leadwerks 5, and after investigating I think it’s a great idea. The auto completion is quite good, and the IDE feels more natural than anything I could come up with using a custom text editor made with Scintilla. In fact, eliminating the built-in script editor in Leadwerks 5 relieves me of a lot of uncertainty and potential issues. Finally, VS Code has support for custom debuggers. I wrote an example command-line debugger for Leadwerks, and I will use this to show another programmer how to interface with Leadwerks. (I don’t plan on writing the debugger myself.)

Your feedback and ideas are shaping up to make Leadwerks 5 a huge leap forward over our previous designs. The improved simplicity of the new script system is a big cognitive relief. Having fewer things to worry about makes life better in a subtle but definite way.
There’s something else that consumes a lot of mental attention. Social media and the internet have grown and changed over the years and become more efficient at consuming our attention. (Many features of this site are designed the same way.) The scary thing is that normal non-technical people seem to be more vulnerable than nerds. We’ll fire up The Witcher and play for an hour, but regular people are checking their phones 24/7 for feedback and validation. It’s much, much worse than any accusation we got as kids of being “Nintendo zombies” because we spent an afternoon playing games instead of staring passively at broadcast TV. People who play games generally don’t care about posting photographs of their food or collecting followers.

Somewhere along the line the internet went from being a weird thing on your computer to the collective consciousness of humanity. Reality is online, and the physical world around us is just a mirage, one possible instance of a million possible individual experiences. Maybe it was around the time they started using AI to optimize clickbait that things got out of hand. Although my career and the way I live my life are only possible through the internet, I am old enough to remember life before the web, and in many ways it was better. Things were more focused. Even the early web, before clickbait ads and online echo chambers, was pretty nice. You could go to a record store and hang out talking to people about new music. Printed paper magazines were a thing.

I already removed the link to our Google+ page in the website footer and no one noticed. I think about deleting our Facebook and Twitter accounts, or at least not linking to them on our site. Why must every website pay homage to these monopolies? What are they doing for me, besides a limited flow of hits that pales in comparison to what my own email list brings in? I have written about this before, but now that it is fashionable to criticize social media I might act on it.
I don’t know, we’ll see. Please like, share, and retweet.

Josh

