I was able to partially implement clustered forward rendering. At this time, I have not divided the camera frustum up into cells and I am just handing a single point light to the fragment shader, but instead of a naive implementation that would just upload the values in a shader uniform, I am going through the route of sending light IDs in a buffer. I first tried texture buffers because they have a large maximum size and I already have a GPUMemBlock class that makes them easy to work with. Becau
This is about financial stuff, and it's not really your job to care about that, but I still think this is cool and wanted to share it with you.
People are buying stuff on our website, and although the level of sales is much lower than Steam it has been growing. Unlike Steam, sales through our website are not dependent on a third party and cannot be endangered by flooded marketplaces, strange decisions, and other random events.
Every customer I checked who used a credit card has kept it
I decided I want the voxel GI system to render direct lighting on the graphics card, so in order to make that happen I need working lights and shadows in the new renderer. Tomorrow I am going to start my implementation of clustered forward rendering to replace the deferred renderer in the next game engine. This works by dividing the camera frustum up into sectors, as shown below.
A list of visible lights for each cell is sent to the GPU. If you think about it, this is really another v
Steam Workshop was a compelling idea to allow game asset authors to sell their items for use with Leadwerks Game Engine. However, the system has turned out to have some fundamental problems, despite my best efforts to work around it.
Free items are not curated, causing the store to fill with low quality content.
Some people have reported trouble downloading items.
The publishing process is pretty tedious.
The check-out process requires adding funds to Steam Wallet, a
I was having trouble with cone tracing and decided to first try a basic GI algorithm based on a pattern of raycasts. Here is the result:
You can see this is pretty noisy, even with 25 raycasts per voxel. Cone tracing uses an average sample, which eliminates the noise problem, but it does introduce more inaccuracy into the lighting.
Next I wanted to try a more complex scene and get an estimate of performance. You may recognize the voxelized scene below as the "Sponza" scene freque
Summer is here, and you know what that means! Yes, it is time for another LEGENDARY game tournament. This year the theme is "Retro Gaming". Create a modern take on an arcade game hearkening back to the days of NES, Neo Geo, Sega, or just make anything you want. Either way, you get this totally radical poster as a prize!
How does it work? For one month, the Leadwerks community builds small playable games. Some people work alone and some team up with others. At the end of the mont
Now that we can voxelize models, enter them into a scene voxel tree structure, and perform raycasts we can finally start calculating direct lighting. I implemented support for directional and point lights, and I will come back and add spotlights later. Here we see a shadow cast from a single directional light:
And here are two point lights, one red and one green. Notice the distance falloff creates a color gradient across the floor:
The idea here is to first calculate direc
I added a raycast function to the voxel tree class and now I can perform raycasts between any two positions. This is perfect for calculating direct lighting. Shadows are calculated by performing a raycast between the voxel position and the light position, as shown in the screenshot below. Fortunately the algorithm seems to work great an there are no gaps or cracks in the shadow:
Here is the same scene using a voxel size of 10 centimeters:
If we move the light a little lower
At this point I have successfully created a sparse octree class and can insert voxelized meshes into it. An octree is a way of subdividing space into eight blocks at each level of the tree:
A sparse octree doesn't create the subnodes until they are used. For voxel data, this can save a lot of memory.
It was difficult to get the rounding and all the math completely perfect (and it has to be completely perfect!) but now I have a nice voxel tree that can follow the camera around and
Hi Guys,
It has been very silent around Phodex Framework and my projects, but thats because I really focused on game development and nothing else. Posting stuff and creating content costs time and I was also very busy with studying. However as promised in my last entry I was working on an actual game. I tried to create a small, but quality experience including decent design, nice graphics and, at least for a one man indie game, complex game mechanics such as first person melee and range com
In evaluating possible company names I have come up with the following criteria which I used to choose a name for our new game engine.
Spelling and Pronunciation
The name should be unambiguous in spelling. This helps promote word-of-mouth promotion because when someone hears the name for the first time, they can easily find it online. Similarly, the name when read should be unambiguous in pronunciation. This helps the name travel from written to spoken word and back. Can you imagine telli
I've begun working on an implementation of voxel cone tracing for global illumination. This technique could potentially offer a way to perfrorm real-time indirect lighting on the entire scene, as well as real-time reflections that don't depend on having the reflected surface onscreen, as screen-space reflection does.
I plan to perform the GI calculations all on a background CPU thread, compress the resulting textures using DXTC, and upload them to the GPU as they are completed. This means t
First, I was experiencing some crashes due to race conditions. These are very very bad, and very hard to track down. The problems were being caused by reuse of thread returned objects. Basically, a thread performs some tasks, returns an object with all the processed data, and then once the parent thread is done with that data it is returned to a pool of objects available for the thread to use. This is pretty complicated, and I found that when I switched to just creating a new return object each
The animation update routine has been moved into its own thread now where it runs in the background as you perform your game logic. We can see in the screenshot below that animation updates for 1025 characters take about 20 milliseconds on average. (Intel graphics below, otherwise it would be 1000 FPS lol.)
In Leadwerks 4 this would automatically mean that your max framerate would be 50 FPS, assuming nothing else in the game loop took any time at all. Because of the asynchronous threa
In designing the new engine, I have found that there are three distinct types of optimization.
Streamlining
This is refinement. You make small changes and try to gain a small amount of performance. Typically, this is done as a last step before releasing code. The process can be ongoing, but suffers from diminishing returns after a while. When you eliminate unnecessary math based on guaranteed assumptions you are streamlining code. For example, a 4x4 matrix multiplication can skip the calc
Leadwerks 5 uses a different engine architecture with a game loop that runs at either 30 (default) or 60 updates per second. Frames are passed to the rendering thread, which runs at an independent framerate that can be set to 60, 90, or unlimited. This is great for performance but there are some challenges in timing. In order to smooth out the motion of the frames, the results of the last two frames received are interpolated between. Animation is a big challenge for this. There could potentially
I got skinned animation working in the new renderer, after a few failed attempts that looked like something from John Carpenter's The Thing. I set up a timer and updated a single animation on a model 10,000 times. Animation consists of two phases. First, all animations are performed to calculate the local position and quaternion rotation. Second, 4x4 matrices are calculated for the entire hierarchy in global space and copied into an array of floats. To test this, I placed this code inside the ma
The design of Leadwerks 4 was meant to be flexible and easy to use. In Leadwerks 5, our foremost design goals are speed and scalability. In practical terms that means that some options are going to go away in order to give you bigger games that run faster.
I'm working out the new animation system. There are a few different ways to approach this. In situations like this I find it is best to start by deciding the desired outcome and then figuring out how to achieve that. So what do we want?
After observing the behavior of the previous test, I rearranged the threading architecture for even more massive performance gains. This build runs at speeds in excess of 400 FPS with 100,000 entities....on Intel integrated graphics!
I've had more luck with concurrency in design than parallelism. (Images below are taken from here.)
Splitting the octree recursion up into separate threads produced only modest gains. It's difficult to optimize because the sparse octree is unpredicta
I am proud to show off our first performance demonstration which proves that my idea for the Leadwerks 5 renderer works. To test the renderer I created 100,000 instanced boxes. The demo includes both regular and a mock VR mode that simulates single-pass stereoscopic rendering with a geometry shader.
The hardware I tested on is an Intel i7-4770R (for graphics too) which is a few years old.
Now this is not a perfect benchmark for several reasons. There is no frustum culling being perform
After working out a thread manager class that stores a stack of C++ command buffers, I've got a pretty nice proof of concept working. I can call functions in the game thread and the appropriate actions are pushed onto a command buffer that is then passed to the rendering thread when World::Render is called. The rendering thread is where all the (currently) OpenGL code is executed. When you create a context or load a shader, all it does is create the appropriate structure and send a request over
The Leadwerks 4 renderer was built for maximum flexibility. The Leadwerks 5 renderer is being built first and foremost for great graphics with maximum speed. This is the fundamental difference between the two designs. VR is the main driving force for this direction, but all games will benefit.
Multithreaded Design
Leadwerks 4 does make use of multithreading in some places but it is fairly simplistic. In Leadwerks 5 the entire architecture is based around separate threads, which is chal
When considering the script system in Leadwerks 5, I looked at alternatives including Squirrel, which is used by Valve in many games, but these gave me a deeper appreciation for the simplicity of Lua. There are only a handful of rules you need to learn to use the language, it’s fun to use, yet somehow it does everything you could ever need.
These were three big issues I had to solve. First, the Leadwerks 5 API makes extensive use of smart pointers, which our binding library tolua++ does not