# Blogs

## Voxel Cone Tracing Part 3 - Raycasting

I added a raycast function to the voxel tree class, and now I can perform raycasts between any two positions. This is perfect for calculating direct lighting. Shadows are calculated by performing a raycast between the voxel position and the light position, as shown in the screenshot below. Fortunately, the algorithm seems to work great and there are no gaps or cracks in the shadow. Here is the same scene using a voxel size of 10 centimeters.

If we move the light a little lower, you can see a shadow appearing near two edges of the floor. Why is this happening? Well, the problem is that at those angles, the raycast is hitting the neighboring voxel on the floor next to the voxel we are testing.

You might think that if we just move one end of the ray up to the top of the voxel it will work fine, and you'd be right, in this situation. But with slightly different geometry, we have a new problem. So how do we solve this? At any given time, a voxel can have up to three faces that face the light (but it might have as few as one). In the image below I have highlighted the two voxel faces on the right-most voxel that face the light. If we check the neighboring voxels, we can see that the voxel to the left is occupied, and therefore the left face does not make a good position to test from. But the top voxel is clear, so we will test from there. If we apply the same logic to the other geometry configuration I showed, we also get a correct result. Of course, if both neighboring voxels were solid, we would not need to perform a raycast at all, because we know the light would be completely blocked at this position. The code to do this just checks which side of a voxel the light position is on.
As it is written now, up to three raycasts may be performed per voxel:

```cpp
// For each face that points toward the light, skip it if a solid neighbor covers it;
// otherwise raycast from the center of that face. Stop testing once a ray reports a hit.
if (lightpos.x < voxel->bounds.min.x)
{
    if (GetSolid(ix - 1, iy, iz) == false)
    {
        result = IntersectsRay(p0 - Vec3(voxelsize * 0.5f, 0.0f, 0.0f), lightpos);
    }
}
if (lightpos.x > voxel->bounds.max.x and result == false)
{
    if (GetSolid(ix + 1, iy, iz) == false)
    {
        result = IntersectsRay(p0 + Vec3(voxelsize * 0.5f, 0.0f, 0.0f), lightpos);
    }
}
if (lightpos.y < voxel->bounds.min.y and result == false)
{
    if (GetSolid(ix, iy - 1, iz) == false)
    {
        result = IntersectsRay(p0 - Vec3(0.0f, voxelsize * 0.5f, 0.0f), lightpos);
    }
}
if (lightpos.y > voxel->bounds.max.y and result == false)
{
    if (GetSolid(ix, iy + 1, iz) == false)
    {
        result = IntersectsRay(p0 + Vec3(0.0f, voxelsize * 0.5f, 0.0f), lightpos);
    }
}
if (lightpos.z < voxel->bounds.min.z and result == false)
{
    if (GetSolid(ix, iy, iz - 1) == false)
    {
        result = IntersectsRay(p0 - Vec3(0.0f, 0.0f, voxelsize * 0.5f), lightpos);
    }
}
if (lightpos.z > voxel->bounds.max.z and result == false)
{
    if (GetSolid(ix, iy, iz + 1) == false)
    {
        result = IntersectsRay(p0 + Vec3(0.0f, 0.0f, voxelsize * 0.5f), lightpos);
    }
}
```

With this correction the artifact disappears. It even works correctly at a lower resolution. Now our voxel raycast algorithm is complete. The next step will be to calculate direct lighting on the voxelized scene using the lights that are present.
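The six checks above can also be written as one loop over the three axes. The sketch below is a standalone illustration, not the engine's code: `VoxelInShadow`, `isSolid`, and `rayHits` are hypothetical names, with the two callbacks standing in for `GetSolid()` and `IntersectsRay()`. Like the original, it reports a shadow as soon as one face ray hits something. It also encodes the rule from the text that a voxel whose light-facing faces are all covered by solid neighbors is fully blocked, without casting any ray.

```cpp
#include <cassert>
#include <functional>

// Hypothetical minimal vector type for this sketch.
struct Vec3 { float x, y, z; };

// Returns true if the voxel at grid cell (ix, iy, iz) is shadowed from the light.
// isSolid(ix, iy, iz) and rayHits(from, to) are assumed callbacks, not a real API.
bool VoxelInShadow(int ix, int iy, int iz, Vec3 center, float voxelSize, Vec3 light,
                   const std::function<bool(int, int, int)>& isSolid,
                   const std::function<bool(Vec3, Vec3)>& rayHits)
{
    assert(voxelSize > 0.0f);
    const float half = voxelSize * 0.5f;
    const float c[3] = {center.x, center.y, center.z};
    const float l[3] = {light.x, light.y, light.z};
    bool anyFacing = false; // at least one face points toward the light
    bool anyOpen = false;   // at least one of those faces has an empty neighbor
    for (int axis = 0; axis < 3; ++axis)
    {
        int dir = 0;
        if (l[axis] < c[axis] - half) dir = -1;     // light is beyond the min face
        else if (l[axis] > c[axis] + half) dir = 1; // light is beyond the max face
        if (dir == 0) continue;                     // no face on this axis faces the light
        anyFacing = true;
        int n[3] = {ix, iy, iz};
        n[axis] += dir;
        if (isSolid(n[0], n[1], n[2])) continue;    // face is covered by a solid neighbor
        anyOpen = true;
        float p[3] = {c[0], c[1], c[2]};
        p[axis] += dir * half;                      // start the ray on the open face
        if (rayHits(Vec3{p[0], p[1], p[2]}, light)) return true; // first hit means shadowed
    }
    // Every face toward the light is covered, so the light is completely blocked.
    return anyFacing && !anyOpen;
}
```

Because at most three faces can face the light, this performs at most three raycasts, matching the original code.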

## Voxel Cone Tracing Part 2 - Sparse Octree

At this point I have successfully created a sparse octree class and can insert voxelized meshes into it. An octree is a way of subdividing space into eight blocks at each level of the tree. A sparse octree doesn't create the subnodes until they are used. For voxel data, this can save a lot of memory. It was difficult to get the rounding and all the math completely perfect (and it has to be completely perfect!) but now I have a nice voxel tree that can follow the camera around and is aligned correctly to the world axes and units.

The code that inserts a voxel is pretty interesting. A voxel tree is created with a number of levels, and the size of the structure is equal to pow(2,levels). For example, an octree with 8 levels creates a 3D grid of 256x256x256 voxels. Individual voxels are then inserted into the top-level tree node, which recursively calls the SetSolid() function until the last level is reached. A voxel is marked as "solid" simply by having a voxel node at the last level (0). (GetChild() has the effect of finding the specified child and creating it if it doesn't exist yet.) A bitwise flag is used to test which subnode should be called at this level.
I didn't really work out the math; I just intuitively went with this solution and it worked as I expected:

```cpp
void VoxelTree::SetSolid(const int x, const int y, const int z, const bool solid)
{
    // Reject negative or out-of-range coordinates
    int flag = pow(2, level);
    if (x < 0 or y < 0 or z < 0) return;
    if (x >= flag * 2 or y >= flag * 2 or z >= flag * 2) return;

    // The bit for this level selects which child (0 or 1 on each axis) holds the voxel
    flag = pow(2, level - 1);
    int cx = 0;
    int cy = 0;
    int cz = 0;
    if ((flag & x) != 0) cx = 1;
    if ((flag & y) != 0) cy = 1;
    if ((flag & z) != 0) cz = 1;
    if (solid)
    {
        if (level > 0)
        {
            // GetChild() creates the child node if it does not exist yet
            GetChild(cx, cy, cz)->SetSolid(x & ~flag, y & ~flag, z & ~flag, true);
        }
    }
    else
    {
        if (level > 0)
        {
            if (kids[cx][cy][cz] != nullptr)
            {
                kids[cx][cy][cz]->SetSolid(x & ~flag, y & ~flag, z & ~flag, false);
            }
        }
        else
        {
            //Remove self
            auto parent = this->parent.lock();
            Assert(parent->kids[position.x][position.y][position.z] == Self());
            parent->kids[position.x][position.y][position.z] = nullptr;
        }
    }
}
```

The voxel tree is built by adding all scene entities into the tree. From there it was easy to implement a simple raycast to see if anything was above each voxel, and color it black if another voxel is hit. And here is the same program using a higher-resolution voxel tree. You can see it's not much of a stretch to implement ambient occlusion from here. At a voxel size of 0.01 meters (the first picture), the voxelization step took 19 milliseconds, so it looks like we're doing well on speed. I suspect the rest of this project will be more art than science. Stay tuned!
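For experimenting with the bit-selection idea outside the engine, here is a minimal self-contained sketch, assuming a node at level L covers 2^L voxels per side (so 8 levels gives 256 voxels per side, as above). `SparseOctree` and its methods are hypothetical names, not the engine's `VoxelTree` class, and it owns its children through `std::unique_ptr` instead of the parent/`Self()` bookkeeping in the post.

```cpp
#include <array>
#include <cassert>
#include <memory>

// Hypothetical minimal sparse octree. A node at level L covers 2^L voxels per side;
// a node existing at level 0 means that voxel is solid.
class SparseOctree
{
public:
    explicit SparseOctree(int level) : level_(level) { assert(level >= 0); }

    // Mark voxel (x, y, z) solid, creating nodes along its path only when needed.
    void SetSolid(int x, int y, int z)
    {
        if (!InRange(x, y, z) || level_ == 0) return;
        const int bit = 1 << (level_ - 1); // this level's bit picks the child on each axis
        auto& child = kids_[Index(x, y, z, bit)];
        if (!child) child = std::make_unique<SparseOctree>(level_ - 1);
        child->SetSolid(x & ~bit, y & ~bit, z & ~bit); // clear the bit for the child's local coords
    }

    bool GetSolid(int x, int y, int z) const
    {
        if (!InRange(x, y, z)) return false;
        if (level_ == 0) return true;
        const int bit = 1 << (level_ - 1);
        const auto& child = kids_[Index(x, y, z, bit)];
        return child && child->GetSolid(x & ~bit, y & ~bit, z & ~bit);
    }

    // Number of allocated nodes, including this one.
    int NodeCount() const
    {
        int n = 1;
        for (const auto& k : kids_)
            if (k) n += k->NodeCount();
        return n;
    }

private:
    bool InRange(int x, int y, int z) const
    {
        const int size = 1 << level_;
        return x >= 0 && y >= 0 && z >= 0 && x < size && y < size && z < size;
    }

    static int Index(int x, int y, int z, int bit)
    {
        return ((x & bit) ? 1 : 0) | ((y & bit) ? 2 : 0) | ((z & bit) ? 4 : 0);
    }

    int level_;
    std::array<std::unique_ptr<SparseOctree>, 8> kids_;
};
```

Inserting one voxel into an 8-level tree allocates one node per level below the root (9 nodes in total), which is where the memory savings of a sparse tree come from.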

## What Makes a Good Brand Name?

In evaluating possible company names, I came up with the following criteria, which I used to choose a name for our new game engine.

### Spelling and Pronunciation

The name should be unambiguous in spelling. This encourages word-of-mouth promotion, because when someone hears the name for the first time, they can easily find it online. Similarly, the name should be unambiguous in pronunciation when read. This helps the name travel from written to spoken word and back. Can you imagine telling someone else the name of this...establishment...and having them successfully type the name into a web browser?

### Shorter is Better

Everything else aside, fewer letters is generally better. Here is a very long company name. And here is perhaps the shortest software company name in history. Which do you think is better?

### The Name Should "Pop"

A good company or product name will use hard consonants like B, T, K, and X, and avoid soft-sounding letters like S and F. The way a name sounds can actually influence perception of the brand, aside from the name's meaning. The name "Elysium", besides being hard to pronounce and spell, is full of soft consonants that sound weak. "Blade Runner", on the other hand, starts with a hard B sound and it just sounds good.

### Communicate Meaning

The name should communicate the nature of the product or company. The name "Uber" doesn't mean anything except "better", which is why the company Uber originally launched as UberCab. Once they got to a certain size it was okay to drop the "cab" suffix, but do you remember the first time you heard of them? You probably thought "what the heck is an Uber?"

### The Leadwerks Brand
So according to our criteria above, the name Leadwerks satisfies the following conditions:

- The name "pops" and sounds cool.
- It's not too long.

But here's where it falls short:

- Ambiguity in spelling (Leadworks?)
- Ambiguity in pronunciation. Leadwerks is pronounced like Led Zeppelin, but many people read it as "Leed-works".
- The name doesn't mean anything, even if it sounds cool. It's just a made-up word.

These are the reasons I started thinking about naming the new engine something different.

### New Engine, New Name
So with this in mind, I set out to find a new name for the upcoming engine. I was stumped until I realized that there are only so many words in the English language, and any good name you come up with will invariably have been used previously in some other context, hopefully in another industry or product type. Realizing this gave me more leeway, as I did not have to come up with something completely unique that the world has never heard before.

Our early benchmarks indicate the new engine is a performance monster, with incredible results I did not even dream were possible. Together with the rapid development pipeline of Leadwerks, I knew I wanted to focus on speed.

Finally, there was one name I kept coming back to for weeks on end. I was able to obtain a suitable domain name. I am now filing a trademark for use of this name, which requires that I begin using it commercially, which is why I am now revealing the name for the first time...

Keep scrolling.

How does this name stack up?

- Unambiguous spelling and pronunciation.
- It's short.
- The name "pops".
- It communicates the defining feature of the product.

Now think about our goals for the new engine's name. Will people have any trouble remembering this name? Is there any ambiguity about what the product stands for, and the promise it makes? If two developers are at a Meetup group and one of them says "I made this with Turbo", is there any doubt what the promise of this product is, i.e. massive performance? The name even works on a subconscious level. Anyone having trouble with their game performance (in other slow engines that aren't Turbo) will naturally wonder how fast it could be running in ours. The fact that the name has a positive emotional response for many people and a strong connection to the game industry is a plus.
Turbo Game Engine is an unambiguous brand name that takes a stand and makes a clear promise of one thing: speed, which is incredibly important in the days of VR and 240 Hz screens.

## Voxel Cone Tracing

I've begun working on an implementation of voxel cone tracing for global illumination. This technique could potentially offer a way to perform real-time indirect lighting on the entire scene, as well as real-time reflections that don't depend on having the reflected surface onscreen, as screen-space reflection does. I plan to perform the GI calculations all on a background CPU thread, compress the resulting textures using DXTC, and upload them to the GPU as they are completed. This means the cost of GI should be quite low, although there will be some latency before the indirect lighting catches up with changes to the scene. We might continue to use SSR for detailed reflections and only use GI for semi-static light bounces, or it might be fast enough for moving real-time reflections. The GPU-based implementations I have seen of this technique are technically impressive but suffer from terrible performance, and we want something fast enough to run in VR. The first step is to be able to voxelize models. The result of the voxelization operation is a bunch of points.
These can be fed into a geometry shader that generates a box around each one:

```glsl
void main()
{
    // The eight corners of a box centered on the input point, transformed to clip space
    vec4 points[8];
    points[0] = projectioncameramatrix[0] * (geometry_position[0] + vec4(-0.5f * voxelsize.x, -0.5f * voxelsize.y, -0.5f * voxelsize.z, 0.0f));
    points[1] = projectioncameramatrix[0] * (geometry_position[0] + vec4(0.5f * voxelsize.x, -0.5f * voxelsize.y, -0.5f * voxelsize.z, 0.0f));
    points[2] = projectioncameramatrix[0] * (geometry_position[0] + vec4(0.5f * voxelsize.x, 0.5f * voxelsize.y, -0.5f * voxelsize.z, 0.0f));
    points[3] = projectioncameramatrix[0] * (geometry_position[0] + vec4(-0.5f * voxelsize.x, 0.5f * voxelsize.y, -0.5f * voxelsize.z, 0.0f));
    points[4] = projectioncameramatrix[0] * (geometry_position[0] + vec4(-0.5f * voxelsize.x, -0.5f * voxelsize.y, 0.5f * voxelsize.z, 0.0f));
    points[5] = projectioncameramatrix[0] * (geometry_position[0] + vec4(0.5f * voxelsize.x, -0.5f * voxelsize.y, 0.5f * voxelsize.z, 0.0f));
    points[6] = projectioncameramatrix[0] * (geometry_position[0] + vec4(0.5f * voxelsize.x, 0.5f * voxelsize.y, 0.5f * voxelsize.z, 0.0f));
    points[7] = projectioncameramatrix[0] * (geometry_position[0] + vec4(-0.5f * voxelsize.x, 0.5f * voxelsize.y, 0.5f * voxelsize.z, 0.0f));

    // One normal per face: -X, +X, -Y, +Y, -Z, +Z
    vec3 normals[6];
    normals[0] = (vec3(-1,0,0));
    normals[1] = (vec3(1,0,0));
    normals[2] = (vec3(0,-1,0));
    normals[3] = (vec3(0,1,0));
    normals[4] = (vec3(0,0,-1));
    normals[5] = (vec3(0,0,1));

    // Each face is emitted as a four-vertex triangle strip
    //Left
    geometry_normal = normals[0];
    gl_Position = points[0]; EmitVertex();
    gl_Position = points[4]; EmitVertex();
    gl_Position = points[3]; EmitVertex();
    gl_Position = points[7]; EmitVertex();
    EndPrimitive();

    //Right
    geometry_normal = normals[1];
    gl_Position = points[1]; EmitVertex();
    gl_Position = points[2]; EmitVertex();
    ...
}
```

Here's a goblin whose polygons have been turned into Lego blocks. Now, the thing most folks nowadays don't realize is that if you can voxelize a goblin, well then you can voxelize darn near anything.
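The post does not show how the points themselves are produced. As one simple possibility (a naive sampling approach chosen for illustration, not necessarily the method used here), a triangle can be sampled on a barycentric grid fine enough that neighboring samples are less than half a voxel apart, with each sample snapped to its voxel cell. `VoxelizeTriangle`, `CellCenter`, `Vec3`, and `Cell` are hypothetical names. The result is approximate: a voxel that the triangle only clips at a corner can be missed.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <set>
#include <tuple>

struct Vec3 { float x, y, z; };
using Cell = std::tuple<int, int, int>; // integer voxel coordinates

// Naive surface voxelizer: returns the set of cells touched by samples of triangle abc.
std::set<Cell> VoxelizeTriangle(Vec3 a, Vec3 b, Vec3 c, float voxelSize)
{
    assert(voxelSize > 0.0f);
    auto dist = [](Vec3 p, Vec3 q) {
        const float dx = p.x - q.x, dy = p.y - q.y, dz = p.z - q.z;
        return std::sqrt(dx * dx + dy * dy + dz * dz);
    };
    const float longest = std::max({dist(a, b), dist(b, c), dist(c, a)});
    // Enough subdivisions that neighboring samples are less than half a voxel apart.
    const int n = static_cast<int>(std::ceil(longest / (voxelSize * 0.5f))) + 1;
    std::set<Cell> cells; // std::set removes duplicate cells automatically
    for (int i = 0; i <= n; ++i)
    {
        for (int j = 0; j <= n - i; ++j)
        {
            const float u = static_cast<float>(i) / n;
            const float v = static_cast<float>(j) / n;
            const Vec3 p{a.x + (b.x - a.x) * u + (c.x - a.x) * v,
                         a.y + (b.y - a.y) * u + (c.y - a.y) * v,
                         a.z + (b.z - a.z) * u + (c.z - a.z) * v};
            cells.emplace(static_cast<int>(std::floor(p.x / voxelSize)),
                          static_cast<int>(std::floor(p.y / voxelSize)),
                          static_cast<int>(std::floor(p.z / voxelSize)));
        }
    }
    return cells;
}

// World-space center of a cell: one of the "points" a geometry shader could expand into a box.
Vec3 CellCenter(const Cell& cell, float voxelSize)
{
    return Vec3{(std::get<0>(cell) + 0.5f) * voxelSize,
                (std::get<1>(cell) + 0.5f) * voxelSize,
                (std::get<2>(cell) + 0.5f) * voxelSize};
}
```

Running this over every triangle of a mesh and merging the sets yields the point list; a production voxelizer would more likely use an exact triangle/box overlap test.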
Global illumination will then be calculated on the voxels and fed to the GPU as a 3D texture. It's pretty complicated stuff but I am very excited to be working on this right now. If this works, then I think environment probes are going to completely go away forever. SSR might continue to be used as a low-latency high-resolution first choice when those pixels are available onscreen. We will see. It is also interesting that the whole second-pass reflective water technique will probably go away as well, since this technique should be able to handle water reflections just like any other material.

## Three Types of Optimization

In designing the new engine, I have found that there are three distinct types of optimization.

### Streamlining

This is refinement. You make small changes and try to gain a small amount of performance. Typically, this is done as a last step before releasing code. The process can be ongoing, but suffers from diminishing returns after a while. When you eliminate unnecessary math based on guaranteed assumptions, you are streamlining code. For example, a 4x4 matrix multiplication can skip the calculations that fill the right-most column if both matrices are guaranteed to be affine (no projection), because that column is then always (0, 0, 0, 1).

### Quality Degradation

This is when you downgrade the quality of your results within a tolerable level where it won't be noticed much. An example of this is using a low-resolution copy of a model when it is far away from the camera. Quality degradation can be pretty arbitrary and can mask your true performance, so it's best to keep an option to disable it.

### Architectural

By designing algorithms in a way that makes maximum use of the hardware and produces the best results, we can greatly increase performance. Architectural optimization produces groundbreaking changes that can be 10 or 100 times faster than the old architecture. An example of this is GPU hardware, which produces a massive performance increase over software rendering. We're seeing a lot of these types of improvements in Leadwerks Game Engine 5 because the entire system is being designed to make maximum use of modern graphics hardware.
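To make the streamlining example concrete, here is a minimal sketch, assuming row-major 4x4 matrices in the row-vector convention (translation stored in the bottom row) that are known to be affine, so the right-most column is always (0, 0, 0, 1). `Mat4`, `Multiply`, `MultiplyAffine`, and `IsAffine` are hypothetical names for illustration, not the engine's math classes. The general product needs 64 multiplications; using the guaranteed assumption brings that down to 36.

```cpp
#include <array>
#include <cassert>

// Row-major 4x4 matrix, row-vector convention: translation lives in row 3.
using Mat4 = std::array<std::array<float, 4>, 4>;

bool IsAffine(const Mat4& m)
{
    return m[0][3] == 0.0f && m[1][3] == 0.0f && m[2][3] == 0.0f && m[3][3] == 1.0f;
}

// General product: 64 multiplications.
Mat4 Multiply(const Mat4& a, const Mat4& b)
{
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Streamlined product for affine inputs: 36 multiplications.
// a[i][3] is 0 for rows 0-2 and 1 for row 3, so the k = 3 term is either dropped
// or reduces to b[3][j], and the right-most column is known in advance.
Mat4 MultiplyAffine(const Mat4& a, const Mat4& b)
{
    assert(IsAffine(a) && IsAffine(b));
    Mat4 r{}; // value-initialized, so r[0..2][3] are already 0
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 3; ++j)
            r[i][j] = a[i][0] * b[0][j] + a[i][1] * b[1][j] + a[i][2] * b[2][j];
    for (int j = 0; j < 3; ++j)
        r[3][j] += b[3][j]; // translation row picks up b's translation
    r[3][3] = 1.0f;
    return r;
}
```

The assertion documents the assumption the shortcut depends on; calling `MultiplyAffine` with a projection matrix would silently give wrong results in a release build.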

## Revisiting Border Recon

OK, I'm finally done with my obligations to my university as a student. Now it's time to come back to game dev and (hopefully) finish the Border Recon development. While I was busy at university, I did do a bit of work on the game here and there. I added a spectator feature that allows players to spectate others in free mode and third person while waiting to respawn after dying. Pretty cool.

Also, I think some of us have seen the earlier changes to the style of the graphics... I changed the style of the graphics from HQ to flat pastel colours, resulting in a cartoony look in-game. I like it... it requires less work on level design details and it actually looks quite nice.

Another cool thing coming in the next version: soundtracks! I found this cool soundtrack on the gamedevmarket website and I think it fits the game's theme perfectly. I'm going to work on that so I can add a bit of music to the game. Link to the soundtrack here: https://www.gamedevmarket.net/asset/brass-attacks/

Also, I'm working on the muzzle flash feature for the weapons; it's half done now and it looks kick ***. Screenshots coming soon! Thanks for reading, folks!

## Second Performance Test: nearly 400% faster!

After observing the behavior of the previous test, I rearranged the threading architecture for even more massive performance gains. This build runs at speeds in excess of 400 FPS with 100,000 entities... on Intel integrated graphics! I've had more luck with concurrency in design than with parallelism. (Images below are taken from here.) Splitting the octree recursion up into separate threads produced only modest gains. It's difficult to optimize because the sparse octree is unpredictable. Splitting different parts of the engine up into multiple threads did result in a massive performance boost. The same test in Leadwerks 4 runs at about 9 FPS, making Leadwerks 5 more than 45 times faster under heavy loads like this. Alpha subscribers can try the test out here.
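The distinction between splitting one recursive job across threads and running separate engine stages concurrently can be sketched as a two-stage pipeline. This is a generic illustration under stated assumptions, not the Leadwerks 5 threading design: `FrameData`, `FrameQueue`, and `RunPipeline` are hypothetical names, and the "render" stage does no real work. The update stage prepares frame N+1 while the render stage consumes frame N, and a bounded queue keeps update from running too far ahead.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical snapshot handed from the update stage to the render stage.
struct FrameData
{
    int frame;
    std::vector<float> positions;
};

// Bounded, blocking hand-off between two stages.
class FrameQueue
{
public:
    explicit FrameQueue(std::size_t capacity) : capacity_(capacity) { assert(capacity > 0); }

    void Push(FrameData f)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [this] { return queue_.size() < capacity_; });
        queue_.push(std::move(f));
        notEmpty_.notify_one();
    }

    FrameData Pop()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !queue_.empty(); });
        FrameData f = std::move(queue_.front());
        queue_.pop();
        notFull_.notify_one();
        return f;
    }

private:
    std::size_t capacity_;
    std::queue<FrameData> queue_;
    std::mutex mutex_;
    std::condition_variable notFull_, notEmpty_;
};

// Runs an update stage and a render stage on separate threads.
// Returns how many frames the render stage received in the expected order.
int RunPipeline(int frames, int entityCount)
{
    FrameQueue queue(2); // update may run at most two frames ahead
    int inOrder = 0;     // written only by the render thread, read after join()
    std::thread update([&] {
        for (int i = 0; i < frames; ++i)
            queue.Push(FrameData{i, std::vector<float>(entityCount, static_cast<float>(i))});
    });
    std::thread render([&] {
        for (int i = 0; i < frames; ++i)
            if (queue.Pop().frame == i) ++inOrder; // placeholder for culling and drawing
    });
    update.join();
    render.join();
    return inOrder;
}
```

Each stage runs sequential, predictable code, which is one reason this kind of split can pay off more than parallelizing an irregular recursion such as a sparse octree walk.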

## Lua binding in Leadwerks 5

The Leadwerks 5 API uses C++11 smart pointers for all complex objects the user interacts with. This design replaces the manual reference counting in Leadwerks 4, so there is no Release() or AddRef() method anymore. To delete an object, you just set all variables that reference that object to nullptr:

```cpp
auto model = CreateBox();
model = nullptr; //poof!
```

In Lua this works the same way, with some caveats:

```lua
local window = CreateWindow()
local context = CreateContext(window)
local world = CreateWorld()
local camera = CreateCamera(world)
camera:SetPosition(0,0,-5)
local model = CreateBox()
while true do
    if window:KeyHit(KEY_SPACE) then
        model = nil
    end
    world:Render()
end
```

In the above example you would expect the box to disappear immediately, right? But it doesn't actually work that way. Lua uses garbage collection, and unless you are calling the garbage collector every frame, the model will not be collected immediately. One way to fix this is to manually call the garbage collector immediately after setting a variable to nil:

```lua
if window:KeyHit(KEY_SPACE) then
    model = nil
    collectgarbage()
end
```

However, this is not something I recommend doing. Instead, a change in the way we think about these things is needed. If we hide an entity and then set our variable to nil, we can just defer garbage collection until enough memory has accrued to trigger it:

```lua
if window:KeyHit(KEY_SPACE) then
    model:Hide()-- out of sight, out of mind
    model = nil
end
```

I am presently investigating the sol2 library for exposing the C++ API to Lua. Exposing a new class to Lua is pretty straightforward:

```cpp
lua.new_usertype<World>("World", "Render", &World::Render, "Update", &World::Update);
lua.set_function("CreateWorld",CreateWorld);
```

However, there are some issues, like casting shared pointers between derived and base classes.
Currently, this code will not work with sol2:

```lua
local a = CreateBox()
local b = CreateBox()
a:SetParent(b)-- Entity:SetParent() expects an Entity, not a Model, even though the Model class is derived from Entity
```

There is also no support for default argument values, like the last argument in this function:

```cpp
Entity::SetPosition(const float x,const float y,const float z,const bool global=false)
```

This can be accomplished with overloads, but it would require A LOT of extra function definitions to mimic all the default arguments we use in Leadwerks. I am talking to the developer now about these issues and we'll see what happens.
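Both binding problems come from information that exists only at compile time in C++. The sketch below uses hypothetical stand-ins for `Entity` and `Model` (not the real Leadwerks 5 classes) and no binding library. In C++ itself, passing a `std::shared_ptr<Model>` where a `std::shared_ptr<Entity>` is expected compiles through an implicit conversion, so a binding layer has to be told about the inheritance separately. The default value of `global` is not part of the member function's type, which is why a bound `&Entity::SetPosition` always takes four arguments; `SetPositionXYZ` and `SetPositionXYZG` are hypothetical per-arity wrappers of the kind the overload workaround needs.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for the engine classes; not the real Leadwerks 5 API.
class Entity
{
public:
    virtual ~Entity() = default;
    void SetPosition(const float x, const float y, const float z, const bool global = false)
    {
        px = x; py = y; pz = z; lastGlobal = global;
    }
    void SetParent(std::shared_ptr<Entity> p) { parent = std::move(p); }

    float px = 0.0f, py = 0.0f, pz = 0.0f;
    bool lastGlobal = false;
    std::shared_ptr<Entity> parent;
};

class Model : public Entity {};

// The "= false" default is invisible to the type system: the member pointer
// that a binding library receives always has four parameters.
static_assert(std::is_same<decltype(&Entity::SetPosition),
                           void (Entity::*)(float, float, float, bool)>::value,
              "default arguments are not part of the function type");

// Workaround: one wrapper per call arity, each registered separately.
void SetPositionXYZ(Entity& e, float x, float y, float z) { e.SetPosition(x, y, z); }
void SetPositionXYZG(Entity& e, float x, float y, float z, bool global) { e.SetPosition(x, y, z, global); }
```

Every function with default arguments multiplies the number of wrappers this way, which is the "A LOT of extra function definitions" problem.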

## Lua table gotcha

I recently ran into a bug in my game. I had 20 AI units and only 19 of them were actively doing something. Number 20 was just standing there. The problem turned out to be using '#enemies' to get the number of enemies.

Here is what happened: by default, a Lua table index starts at 1. This is contrary to many other languages, where it starts at 0. However, you can assign a value to index '0' if you want. Since I use C# on a daily basis, I am more comfortable using the 0 index as a start. As a result, this is my (simplified) code:

```lua
for i = 0, enemyCount-1 do
    enemies[i] = Enemy:new() -- hypothetical constructor
end
```

In the AI script I loop over the enemies like this:

```lua
for i = 0, #enemies-1 do
    enemies[i]:DoStuff()
end
```

This is really basic Lua scripting with one tiny gotcha: the '#' operator returns the number of consecutively keyed items in the list. This I knew. What I did not know is that there is also the requirement that this sequence starts at index 1 (or at least not at index 0). It simply ignores the 0 index! Here is a full script to try:

```lua
local enemies = {}
local enemyCount = 4
for i = 0, enemyCount-1, 1 do
    enemies[i] = "I am enemy " .. i
    System:Print(enemies[i])
end
System:Print("#enemiesCount: " .. #enemies)
for i = 0, #enemies-1 do
    System:Print(enemies[i])
end
```

Output:

```
I am enemy 0
I am enemy 1
I am enemy 2
I am enemy 3
#enemiesCount: 3
I am enemy 0
I am enemy 1
I am enemy 2
```

### Problem
So what was happening? I did get the number of enemies back, except for the one enemy that was located at index 0. I quickly noticed the lower count of enemies, but since enemy number 20 wasn't doing anything, I was also looking in the wrong place. It was actually enemy number 1 that was the culprit, even though its AI was being executed.

### Solution
It can be solved in numerous simple ways, but I guess best practice is to just stick to Lua's standard and not assign anything to index 0. This can really prevent time being wasted on absolutely silly issues like this.
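The rule behind the gotcha can be modeled in a few lines. This is a simplified C++ model for illustration, not Lua's actual implementation: `SequenceLength` is a hypothetical name, and it assumes the table's integer keys have no holes (for tables with holes, Lua's `#` may return any border, which this sketch does not model). It counts consecutive keys starting at 1, so a value stored at key 0 is never counted.

```cpp
#include <map>
#include <string>

// Simplified model of Lua's '#' for a table without holes:
// count consecutive integer keys 1, 2, 3, ... until one is missing.
int SequenceLength(const std::map<int, std::string>& table)
{
    int n = 0;
    while (table.count(n + 1) != 0)
        ++n;
    return n;
}
```

Filling keys 0 to 3 gives a length of 3, as in the script output above, while filling keys 1 to 4 gives 4. In Lua itself, the matching loop is `for i = 1, #enemies do enemies[i]:DoStuff() end`.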
