Blog Entries posted by Josh

  1. Josh

    Articles
    I have not used the engine outside the editor in a while, but I needed to for performance testing, so now I am back to real-time rendering. During the development of the GI system I broke most of the light types, so I had to spend a couple of days getting those to work again. While doing this, I decided to resolve some longstanding issues I have put off.
    First, the PBR lighting will use a default gradient in place of the skybox if no reflection map is set for the world. This is based on the ambient light level, so it will blend in with any scene. You can disable it by setting a black skybox, but it gives some acceptable default reflectance when an empty scene is created and prevents unlit areas from looking completely flat:

    Shadow acne will be a thing of the past. I found that the hardware depth bias feature, applied with vkCmdSetDepthBias, does a great job of eliminating shadow acne. The shot below shows a light that is extremely close to the floor, yet no artifacts are visible.
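    For reference, this is roughly how the dynamic depth bias state is set in Vulkan (a minimal sketch, assuming the shadow pipeline was created with depthBiasEnable and VK_DYNAMIC_STATE_DEPTH_BIAS; the factor values are illustrative starting points, not the engine's actual settings):
    #include <vulkan/vulkan.h>

    void BeginShadowDraws(VkCommandBuffer cmd)
    {
        float constantFactor = 1.25f; // fixed offset added to every fragment's depth
        float clampValue = 0.0f;      // no clamp on the total bias
        float slopeFactor = 1.75f;    // extra offset for polygons at glancing angles
        vkCmdSetDepthBias(cmd, constantFactor, clampValue, slopeFactor);
        // ...record shadow-casting draw calls here...
    }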

    Of course, the new strip lights can be used for dramatic effects:

    I have not yet implemented shadows for directional lights. I was hoping this whole time I would come up with a brilliant idea that would make cascaded shadow maps obsolete, but that's the one area where I haven't found any great new innovative technique to use.
    Box lights are another new type of light, and can be used to simulate directional light coming in through a window, for levels that are mostly indoors.
    The realtime GI system has been very popular. I am rolling around an idea to increase the quality and performance of the system, with the flip side being increased latency. But looking around at the real world, general global illumination never really changes suddenly all at once, so I think it is worth trying. The whole point of Ultra Engine is that it should run very fast, so if I can make the GI system favor performance, hopefully to the point where it has no impact at all on framerate, I think that is more consistent with our goals.
  2. Josh

    Articles
    Before proceeding with multiple GI volumes, I decided to focus on just getting the lighting to look as close to perfect as possible, with a single stage.
    Injecting the ambient light into the voxel data made flat-lit areas appear much more "3D", with color bleeding and subtle contours everywhere.
    Lighting only:

    Lighting + albedo:

    Some adjustments to the way the sky color is sampled gave a more lifelike appearance to outdoor lighting.
    Before:

    After. Notice the shaded area still has a lot of variation:

    Initial performance testing gives results consistent with my expectations. I'm running at half of 1920x1080 resolution on a GeForce GTX 1660 Ti, and performance is about a third of what it would be without GI. At full 1920x1080, that drops to 90 FPS. Because the effect is so resource-intensive, I plan to render it at half resolution, then upscale it and use an edge detection filter to fill in information for any pixels that need it. This card has only 1536 stream processors, about half as many as a 2080.

    Further experiments with motion did not resolve the artifacts I was experiencing earlier, and in fact caused new ones because of the flickering introduced by the GPU voxelization. You can read a detailed discussion of these issues on the Gamedev.net forum. My conclusion for now is that moving objects should not be baked into the voxelized data, because they cause a lot of flashing and flickering artifacts. These could be added in the future by storing a separate voxel grid for each dynamic object, along with some kind of data structure the shader can use to quickly find the objects a ray can pass through.
    This is great though, because it means voxelization only has to be updated when the camera moves a certain distance, or if a static object is created or deleted. You still have completely dynamic direct lighting, and the GI system just follows you around and generates indirect lighting on the fly. I could run the update in the background and then show a smooth transition between updates, and all the flickering problems go away. Performance should be very good once I have further optimized the system. And every surface in your game can show reflections everywhere. Moving lights work really well, as you have seen.
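    A minimal sketch of that update policy, assuming a hypothetical RevoxelizeScene() helper that rebuilds the static voxel data around a given center (not actual engine API):
    #include <cmath>

    struct Vec3 { float x, y, z; };

    static float Distance(const Vec3& a, const Vec3& b)
    {
        float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
        return std::sqrt(dx * dx + dy * dy + dz * dz);
    }

    void UpdateGI(const Vec3& camerapos, Vec3& lastcenter, bool staticscenechanged, float voxelsize)
    {
        // Re-center and re-voxelize only when the camera drifts more than one voxel,
        // or when a static object was created or deleted
        if (staticscenechanged or Distance(camerapos, lastcenter) > voxelsize)
        {
            lastcenter = camerapos;
            //RevoxelizeScene(lastcenter); // runs in the background, blended in when ready
        }
    }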
    The end is in sight and I am very pleased with how this system is turning out. My goal was to create a completely dynamic system that provides better 3D reflections than cubemaps, does not require manual placement or baking of probes, and is fast enough to use on older mid-range discrete GPUs, and that is what we got.
  3. Josh

    Articles
    Not a lot of people know about this, but back in 2001 Discreet (before the company was purchased by Autodesk) released a free version of 3ds max for modding games. Back then game file formats and tools were much more highly specialized than today, so each game required a "game pack" to customize the gmax interface to support that game. I think the idea was to charge the game developer money to add support for their game. Gmax supported several titles including Quake 3 Arena and Microsoft Flight Simulator, but was later discontinued.
    I personally love the program because it's basically a stripped-down version of 3ds max, with only the features you need for hard surface modeling.

    Gmax still survives today, apparently in the custody of Turbosquid. You can download it here. You need the gmax 1.2 installer and the Tempest game pack for Quake 3. You will also need to request a free registration key from Turbosquid. After installing gmax, you can simply find the MD3 export plugin ("md3exp.dle") in the Tempest game pack download and copy that to your "C:\gmax\plugins" directory to enable export. There is also an optional MD3 import script by Chris Cookson, which is uploaded here for safekeeping:
    q3-md3.ms
    With the plugin system in our new engine I was able to add support for loading Quake 3 MD3 models, so you can export your gmax models and load them up in the new engine. However, there are some restrictions. The MD3 file format uses compressed vertex positions, so your vertex positions have a limited range and resolution. Additionally, there are restrictions on what you can do with the gmax program, so take a look at the licensing terms before you do anything. Still, it's a fun program to have and this is a nice feature to play around with.
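    For anyone curious why the range and resolution are limited, this is a sketch of the MD3 vertex layout as it is commonly documented (background information, not code from the engine):
    #include <cstdint>

    struct MD3Vertex
    {
        int16_t position[3]; // actual coordinate = position / 64.0f, so roughly +/-512 units at 1/64-unit precision
        uint8_t normal[2];   // normal packed as two spherical angles
    };

    inline float DecodeMD3Coord(int16_t v) { return v / 64.0f; }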
     
  4. Josh
    Until now, all my experiments with voxel cone step tracing placed the center of the GI data at the world origin (0,0,0). In reality, we want the GI volume to follow the camera around so we can see the effect everywhere, with more detail up close. I feel my productivity has not been very good lately, but I am not being too hard on myself because this is very difficult stuff. The double-blind nature of it (rendering the voxel data and then using that data to render an effect) makes development very difficult. The intermediate voxel data consists of several LODs and is difficult to visualize. My work schedule lately has been to do nothing for several days, or just putter around with some peripheral tasks, and then suddenly solve major problems in a short two-hour work session.
    Here you can see a single GI stage following the camera around properly. More will be added to increase the area the effect covers, and the edges of the final stage will fade out the effect for a smooth transition:

    This all makes me wonder what "work" is when you are handling extremely difficult technical problems. I have no problem working 8+ hours a day on intermediate programming tasks, but when it comes to these really advanced problems I can't really be "on" all day. This morning I went for a seven-mile walk. Was I subconsciously working during that time, so that I could later sit down and quickly solve a problem I was completely stuck on previously?
    I definitely underestimated the difficulty of making this feature work as a robust engine feature that can be used reliably. There are a lot of nuances and small issues that come up when you start testing in a variety of scenes, and this information could easily fill an hour-long talk about the details of voxel cone step tracing. However, there is just one more step: making the moving volumes work with multiple GI stages. Once that is working I can proceed with more testing, look for artifacts to eliminate, and optimize speed.
    This is the last big feature I have to finish. It seems fitting that I should get one final big challenge before completing Ultra Engine, and I am enjoying it.
  5. Josh
    I had to spend several weeks just eliminating light leaks and other artifacts, and getting the results I wanted in a variety of scenes. The results are looking good. Everyone who tries implementing this technique has problems with light leaks, but fortunately I have been able to avoid them with careful planning:

    Now that I have nice results with a single volume texture centered at the origin, it's time to add additional stages. The idea is to have a cascading series of volume textures around the camera, where each volume is twice the dimensions (and eight times the volume) of the previous one. This allows us to cover a large area of the scene, while using lower-resolution data for parts of the scene that are further away from the camera:

    Putting this into practice, in this shot you can see some ambient occlusion and reflections in the trench. (The metalness is turned up really high to make it more easily visible, so don't worry about the bright white specks.)

    A little bit further from the center of the scene, we see another trench. This appears in the second GI volume, with half the voxel resolution of the previous image, so it has some small artifacts. However, it will also appear further from the camera, so they won't be noticeable:

    The transition between stages is good enough. If you look carefully at the floor between stages 0 and 1, the reflection of the window is lost in the lower-resolution stage 1. On the wall between stages 1 and 2, the boundary is visible. However, the camera is going to be further away, so those artifacts won't be as apparent as they are now.

    From the outside, we can see that four 64x64x64 volume textures can be used to cover the entire train station, with a base voxel size of 12.5 centimeters.

    To cover the same area with a single volume texture we would need a 512x512x512 texture. Using several cascaded volume textures brings our memory usage down to less than 1% of what it would be otherwise:
    64*64*64*4 = 1048576
    512*512*512 = 134217728
    1048576 / 134217728 * 100 = 0.78125%
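    As a worked example, using the four 64x64x64 stages and 12.5 cm base voxel size mentioned above, a small program can print what each stage covers and confirm the ratio (a sketch for illustration only):
    #include <cstdio>

    int main()
    {
        const int resolution = 64;  // voxels per axis, per stage
        const int stages = 4;
        double voxelsize = 0.125;   // meters (12.5 cm) at stage 0
        size_t totalvoxels = 0;
        for (int i = 0; i < stages; ++i)
        {
            printf("Stage %d: voxel size %.3f m, covers %.1f m\n", i, voxelsize, resolution * voxelsize);
            totalvoxels += size_t(resolution) * resolution * resolution;
            voxelsize *= 2.0; // each stage doubles the voxel size and the covered area
        }
        // 4 * 64^3 = 1,048,576 voxels vs. 512^3 = 134,217,728 for one big texture
        printf("Total voxels: %zu (%.5f%% of a 512^3 volume)\n",
            totalvoxels, 100.0 * double(totalvoxels) / (512.0 * 512.0 * 512.0));
        return 0;
    }
    The printed extents work out to 8, 16, 32, and 64 meters per stage, which matches the coverage shown in the screenshot.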
    There is still a lot of room for optimization. We can perform the voxelization step for all four stages in one single pass using multi-target rendering, like the pointlight shadow shader does. We could also distribute the GI stage updates so that only one gets drawn each frame, since even objects in motion probably won't cause a change in the voxelized result every single frame. Right now I am just focusing on optimizing the shader and rendering everything each frame, so I can deal with a worst case scenario before I start adding techniques that will make it harder to measure performance.
    Without doing any performance tests, the rendering seems quite fast. I am not even using this in a real-time application right now, but I definitely get a feel for how responsive the viewport rendering is to mouse movement, and it seems to be very snappy so far. I can definitely bog it down if I turn up the settings too high.
    There are a lot of settings that are very scalable for performance / quality such as voxel resolution, number of cascaded stages, maximum ray steps, and number of light bounces.
    I've done some initial tests trying to make the volumes move around with the camera, and that produced a lot of new and strange artifacts I didn't anticipate. So I think we can expect a few more weeks of slow but steady progress as I dive into this even deeper.
    There are a lot of academic papers and demos out there that show this technique, but delivering a complete solution that produces good results for any scene is quite a challenge, and I've been working on this for four years, with the last three months spent pretty much full-time on just this feature! 
    But I am glad to do this, because my love for you is so great and I want you to be happy.

     
  6. Josh
    Adding emission into the cascaded voxel cone step tracing global illumination and dynamic reflections system (SEO ftw) was simple enough:
    There's some slight trailing but it looks okay to me. There is a bit of a "glitch" in that when the emissive surface gets near the wall, the ambient occlusion kicks in, even though the sphere is self-illuminating. This happens because the emission color is mixed with the light voxel during the rasterization step. I could fix this by storing emission in a separate texture, but I don't think the increased memory and processing time are justifiable, and this is acceptable as-is.
    Changing the center of the voxel grid will probably cause the whole scene to display the same trailing, and at that point I think that would look really bad. I'm thinking the contents of the volume texture probably need to be shifted over when the camera moves one voxel unit. Can a texture be copied to itself in Vulkan? Let's see:
    Hmmm, that still does not tell me if I can copy a section of overlapping memory in a texture.  Hmmm, this might be what I am looking for...
    So I probably need to keep an extra "transfer" texture around to act as an intermediate image when copying the contents of the volume texture to itself.
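    Here is a minimal sketch of that idea at the Vulkan level: copy the volume into a scratch 3D image, then copy it back shifted by one voxel. Image handles and sizes are placeholders, and the layout transitions and pipeline barrier between the two copies are omitted for brevity:
    #include <vulkan/vulkan.h>

    void ShiftVolumeTexture(VkCommandBuffer cmd, VkImage volume, VkImage transfer, uint32_t size, int32_t shiftx)
    {
        VkImageCopy region = {};
        region.srcSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT;
        region.srcSubresource.layerCount = 1;
        region.dstSubresource = region.srcSubresource;
        region.extent = { size, size, size };

        // 1. Copy the whole volume into the transfer image
        vkCmdCopyImage(cmd, volume, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
            transfer, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region);

        // (barrier / layout transitions would go here)

        // 2. Copy it back, offset by one voxel along X, dropping the row that scrolls off
        region.extent.width = size - uint32_t(shiftx < 0 ? -shiftx : shiftx);
        region.srcOffset.x = shiftx > 0 ? 0 : -shiftx;
        region.dstOffset.x = shiftx > 0 ? shiftx : 0;
        vkCmdCopyImage(cmd, transfer, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
            volume, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &region);
    }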
  7. Josh
    Finally, finally, finally, finally, for the first time since I started working on this feature several years ago, finally we have real-time global illumination with a second light bounce. Below you can see the direct light hitting the floor, bouncing up to the ceiling, and then being reflected back down on the floor again.

    Performance is still good and I have not started fine-tuning optimization yet. I was just trying to get the effect working at all, which was quite difficult to do, but now it works great.
    In the shot above, ambient light is set to black. Although the GI does light up a lot of the room, there is probably still a place for a small amount of flat ambient light, which can then be darkened by the ambient occlusion effect.
    Light leakage is really not an issue. The algorithm does a good job keeping dark indoor areas dark, and only lets light in where it should be.

    Areas with more sunlight exposure, meanwhile, are quite a bit brighter:

    Everything else about Ultra Engine is great, but I think this is going to be the thing that really attracts a lot of people. I am very glad I get to include a feature as groundbreaking and amazing as this.
     
  8. Josh
    For downsampling of GI voxel data, I found that a compute shader offers the best performance. The first step was to add support for compute shaders into Ultra Engine. 
    I've never used these before but I was able to get them working pretty quickly. I think the user API will look something like this:
    //Load compute shader
    auto module = LoadShaderModule("Shaders/Compute/test.comp.spv");
    auto shader = CreateShader();
    shader->SetModule(module, SHADER_COMPUTE);

    //Create work group
    int workercount = 8;
    auto workgroup = CreateWorkgroup(shader, workercount, workercount, workercount);
    This is not using my final formula. It's not even using bilinear sampling, just a single sample from the previous mipmap, but it's basically working, as shown below.
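    Under the hood, a compute dispatch in Vulkan boils down to just a few calls; the CreateWorkgroup call above presumably wraps something along these lines (a rough sketch, with the pipeline, layout, and descriptor set assumed to be created elsewhere):
    #include <vulkan/vulkan.h>

    void DispatchDownsample(VkCommandBuffer cmd, VkPipeline pipeline, VkPipelineLayout layout,
        VkDescriptorSet descriptorset, uint32_t workercount)
    {
        vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, pipeline);
        vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_COMPUTE, layout, 0, 1, &descriptorset, 0, nullptr);
        vkCmdDispatch(cmd, workercount, workercount, workercount); // one workgroup per block of the target mip level
    }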

    Next I will fix up the downfilter sampling and then we can start doing GI and reflections again. Basically, we are in exactly the same place we were a month ago, except now the voxelization process is instantaneous, and will work with animated models and objects in motion. Below is a shot of the results I was getting previously:

    It will be very interesting to see this working with animated models. Maybe we can remake the first level of Quake 2 and compare it to the RTX version.
    I'd like to thank Vilém Otte for his helpful advice with this.
  9. Josh
    Leadwerks 4.3 brings a big performance boost to your games. In this blog I am going to talk about some of the common problems you can eliminate to make your games run faster.
    When slow performance is encountered, it is typically one really bad mistake that is slowing everything down. Here are a few common performance bottlenecks you can create in your games, how to identify them, and how to fix them.
    Shadow Updates
    Shadow rendering is cheaper than regular renders because no textures have to be used, but extra rendering passes to update shadows can add up.
    How to identify: The game stats will display the number of shadows updated. If this is more than a few, you might have a problem. Remember that point lights require six extra passes and directional lights three, but both only count as one shadow. You also want the number of rendered shadow polygons to be as low as possible.
    How to fix: Figure out what objects are triggering the redraw and whether it is necessary. Non-moving high-polygon models should use the static shadow mode so they don't have to be redrawn during a render. In version 4.3, low and medium light quality settings will also stagger shadow updates so that fewer are rendered each frame. (This can also make it harder to detect a problem, so maybe test on high quality settings when you are examining this.)
    GPU Pixel Pipeline
    The GPU has a limited number of stream processors to split the task of rendering an image across. When you overload the pixel pipeline, your program slows down.
    How to identify: If you have a much higher framerate at a lower screen resolution, this is probably the cause.
    How to fix: Lower light quality settings, remove post-processing effects, or run at a lower screen resolution.
    GPU Vertex Pipeline
    This is pretty rare because the number of vertices the GPU has to process is tiny compared to the number of pixels, but it is possible.
    How to identify: Slow speed regardless of screen resolution, slow even when rendering the scene with no gameplay, extremely high-polygon counts in the game stats (like 2,000,000+). There are some applications where extremely high polygon counts are acceptable, but unless you are specifically making such an application and are aware of this, it probably means you should use models designed for real-time games.
    How to fix: Use lower-resolution models or lighten up on the vegetation.
    Too Many Objects
    The renderer itself has a cost of computation on the CPU. The more separate objects there are, the more the CPU has to work to determine what objects are visible. On the other hand, discarding large numbers of objects can give a big speed boost, so it's always a balance.
    How to identify: The render time shown in your game stats will be more than a few milliseconds, and the number of batches will be very high. It's all relative, but if you have a simple scene and 500 batches are being drawn, there is probably a problem. Large, spread-out maps with a dense distribution of objects can often have this problem. This will happen on all machines, regardless of how good the GPU is. The most complex maps I've ever made had about 700 batches rendered. There is no need to go above that no matter how big the map is, because objects in the distance will be culled. The vegetation system does not cost much on a per-object basis, so it is an extremely efficient way to lay down a lot of dense objects.
    How to fix: Use the model editor Collapse feature to collapse models into a single object and resave them. Also set the view range of smaller objects to a closer distance so there are fewer of them visible in the distance.
    Slow Collision
    If your game slows down when you get close to an object, you might have a high-poly collision mesh.
    How to identify: The physics update time in the game stats will be more than a few milliseconds. Enable "View Physics" in the editor and check to make sure all physics shapes are low-poly.
    How to fix: Use the model editor to generate a low-poly physics shape with the best available option.
    Code Errors
    Is your game constantly reloading files from the hard drive? Are you performing lots of pick operations each frame? Did you create a runaway loop of new objects to process?
    How to identify: Comment out sections of your code and test the framerate.
    How to fix: Figure out the main section that is causing the slowdown, then keep commenting out smaller and smaller parts until you narrow down the problem. Post on the forum if you don't know why something is causing a performance drop.
  10. Josh

    Articles
    After testing and some discussion with other programmers, I decided to try performing voxelization on the GPU instead of the CPU. The downside is that memory usage is much higher than with a sparse voxel octree, but I found that sparse voxel octrees were very slow when it came to soft reflections, although the results of the sharp raycast were impressive:
    You can read the details of GPU voxelization here if you wish.

    Initially I thought the process would require rendering the scene in one pass for each slice of the volume texture, but it is actually possible to voxelize a 3D scene in just one single pass. Implementation was fairly difficult, but it's finally working:

    The voxelization process is very fast, fast enough to do in real-time. That means the problems I had earlier with reflection lag (shown below) should be eliminated with this approach. Animated characters should work with absolutely no problem as well:
    I am experiencing some flickering when multiple triangles contribute to a voxel. Because the volume texture is written to in random order, one imageStore() operation might overwrite the result of another one, and the "winning" pixel (last one drawn) can change from frame to frame. So that's something that needs to be solved.
  11. Josh

    Articles
    I've moved on to one of the final steps for voxel cone step tracing, which is downsampling the lit voxels in a way that approximates a large area of rays being cast. You can read more about the details of this technique here.
    This artifact looks like a mirror that is sunken below the surface of some kind of frame. It was appearing because the mesh surface was inside the voxel, so neighboring voxels were being intersected. The solution was to move the ray starting point out of the voxel containing the point, using the normal's largest axis to determine which direction to move.
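    A minimal sketch of that offset, written here as plain C++ rather than the actual shader code:
    #include <cmath>

    struct Vec3 { float x, y, z; };

    Vec3 OffsetRayStart(Vec3 p, const Vec3& normal, float voxelsize)
    {
        // Step one voxel along whichever axis of the normal has the largest magnitude
        float ax = std::fabs(normal.x), ay = std::fabs(normal.y), az = std::fabs(normal.z);
        if (ax >= ay and ax >= az)  p.x += (normal.x > 0.0f ? voxelsize : -voxelsize);
        else if (ay >= az)          p.y += (normal.y > 0.0f ? voxelsize : -voxelsize);
        else                        p.z += (normal.z > 0.0f ? voxelsize : -voxelsize);
        return p;
    }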

    Once that was fixed, the artifact disappeared. This series of images shows reflections read from each LOD level. The first image is full resolution, and each image after that gets lower-res and blockier. Notice the lighting in the reflections is much more accurate than in previous images.





    Because the downsampling routine does not yet consider the alpha value, the geometry has a tendency to grow as it is downsampled. The next step is to determine an equation that will consider the alpha component of each voxel, and use that to start fading the shapes out as the bigger voxels start spanning areas that are both solid and empty. This is the magic optimization that makes cone step tracing an imperfect but fast approximation of ray tracing for real-time rendering.
    A naive approach to downsampling would just take the average of the 8 child node colors. This would also result in a lot of light leaking. Instead, I took an average of the closest four children, then performed an alpha blend with the furthest four children, for each axis. When we add transparency into the downsampling and raycasting routine, the reflection gets more confusing, but it's generally correct. Most importantly, the skybox is not leaking through the reflection.
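    This is my reading of that downsample for a single axis, sketched in C++ for clarity (Vec4 is a stand-in for the engine's color type, and this is an interpretation of the description above, not the actual shader):
    struct Vec4 { float r, g, b, a; };

    static Vec4 Average4(const Vec4 c[4])
    {
        Vec4 o = { 0, 0, 0, 0 };
        for (int i = 0; i < 4; ++i) { o.r += c[i].r; o.g += c[i].g; o.b += c[i].b; o.a += c[i].a; }
        o.r *= 0.25f; o.g *= 0.25f; o.b *= 0.25f; o.a *= 0.25f;
        return o;
    }

    Vec4 DownsampleAxis(const Vec4 nearchildren[4], const Vec4 farchildren[4])
    {
        // Average the four children nearest the sampling direction, then alpha-blend
        // the average of the four farthest children behind them
        Vec4 n = Average4(nearchildren), f = Average4(farchildren);
        float t = 1.0f - n.a; // how much of the far side shows through
        return { n.r + f.r * t, n.g + f.g * t, n.b + f.b * t, n.a + f.a * t };
    }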


    I think there's a lot I can experiment with here. I'm using six images, with a lighting calculation for the positive and negative direction on each axis, but since it's only an approximation it might be possible to merge that into one image. The transparent areas are hitting interior faces of the voxels, which looks strange, but it is what I told it to do, and I am not sure what the alternative is. I've never actually seen any voxel cone step tracing demo that was this precise. Normally the reflection is not shown very clearly. So it's hard to know how I can improve it, but it's getting there.
  12. Josh

    Articles
    I've got cone step tracing working now with the sparse voxel octree implementation. I found that two different routines work best depending on whether the surface is rough or smooth. For sharp reflections, precise voxel raytracing works best:

    For rough surfaces, cone step tracing can be used. There are some issues to work out and I need to revisit the downsampling routine, but it's basically working:

    Here's a video showing the sharp raycast in motion. Performance is quite good with this:
     
  13. Josh
    In Leadwerks 4, render-to-texture was accomplished with the SetRenderTarget command, which allowed a camera to draw directly to a specified texture while hiding the underlying framebuffer object (FBO). In the new engine we have more explicit handling of this behavior. This is largely due to the use of Vulkan's bindless design, which greatly improves on the context-binding design of OpenGL. The Leadwerks "Buffer" class was never documented or officially supported because the underlying OpenGL functionality made the system pretty messy, but the design of Vulkan simplifies this aspect of graphics.
    We have seen that the Framebuffer class replaces the LE4 context. I've added a TextureBuffer class which can be created similarly:
    shared_ptr<TextureBuffer> CreateTextureBuffer(const int width, const int height, const int colorcomponents = 1, const bool depthcomponent = true, const int samples = 0);
    Once a TextureBuffer is created, you can set a camera to target it for rendering:
    camera->SetRenderTarget(texbuffer);
    You can also apply its color component(s) to a material:
    material->SetTexture(texbuffer->GetColorBuffer(0), TEXTURE_BASE);
    You could also retrieve the depth buffer and apply that to a material, rendering the scene from the top down and using the depth in a rain or snow shader, for example.
    This functionality will later be used to render the GUI system to a texture for use in VR or with in-game menus painted onto 3D surfaces.
    Like everything with Vulkan, this involved a very long process of figuring out everything we need to use, discarding the things we don't, and packaging it up in a structure that is actually usable by the end user. However, once all that is done we have a very powerful system that is optimized for exactly the way modern GPUs work. Here is a small sample of some of my code, just to give you an idea of how complicated this stuff is:
    for (auto pair : visset->cameravislists) { auto cam = pair.first; clear[1].color = { cam->clearcolor.r, cam->clearcolor.g, cam->clearcolor.b, cam->clearcolor.a }; auto light = dynamic_pointer_cast<RenderLight>(cam); if (light == nullptr and cam->rendertarget == nullptr) continue; renderpass[0] = device->shadowpass; renderpass[1] = device->renderpass[CLEAR_COLOR | CLEAR_DEPTH]; int faces = 1; if (light) { if (light->description.type == LIGHT_POINT) faces = 6; } if (MULTIPASS_CUBEMAP) faces = 1; for (int face = 0; face < faces; ++face) { renderPassBeginInfo.clearValueCount = 2; if (light) { renderPassBeginInfo.renderPass = device->shadowpass->pass; if (light->description.type == LIGHT_POINT and MULTIPASS_CUBEMAP == true) { renderPassBeginInfo.renderPass = device->cubeshadowpass->pass; } renderPassBeginInfo.framebuffer = light->shadowbuffer[face]->framebuffer; renderPassBeginInfo.renderArea.extent.width = light->shadowbuffer[face]->size.width; renderPassBeginInfo.renderArea.extent.height = light->shadowbuffer[face]->size.height; } else { renderpass[0] = device->renderpass[CLEAR_COLOR | CLEAR_DEPTH]; int cc = cam->rendertarget->CountColorTextures(); renderPassBeginInfo.renderPass = device->rendertotexturepass[cc][int(cam->rendertarget->depthtexture != nullptr)]->pass; renderPassBeginInfo.framebuffer = cam->rendertarget->framebuffer; renderPassBeginInfo.renderArea.extent.width = cam->rendertarget->size.width; renderPassBeginInfo.renderArea.extent.height = cam->rendertarget->size.height; } vkCmdBeginRenderPass(commandbuffers[currentFrame]->commandbuffer, &renderPassBeginInfo, VK_SUBPASS_CONTENTS_INLINE); RecordDraw(currentFrame, cam, pair.second, renderpass[0], face); commandbuffers[currentFrame]->EndRenderPass(); if (light) commandbuffers[currentFrame]->BindResource(light->shadowbuffer[face]); //Copy output to render texture if (cam->rendertarget) { for (int n = 0; n < cam->rendertarget->colortarget.size(); ++n) { if (cam->rendertarget->colortarget[n] != nullptr) { commandbuffers[currentFrame]->TransitionImageLayout(pair.first->rendertarget->colortexture[n], VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, -1); commandbuffers[currentFrame]->TransitionImageLayout(pair.first->rendertarget->colortarget[n], VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, -1); VkImageCopy regions = {}; regions.dstOffset = {0u,0u,0u}; regions.extent = { uint32_t(cam->rendertarget->colortarget[n]->size.x), uint32_t(cam->rendertarget->colortarget[n]->size.y), 1u}; regions.srcOffset = regions.dstOffset; regions.dstSubresource.aspectMask = VK_IMAGE_ASPECT_COLOR_BIT; regions.dstSubresource.baseArrayLayer = 0; regions.dstSubresource.layerCount = 1; regions.dstSubresource.mipLevel = 0; regions.srcSubresource = regions.dstSubresource; vkCmdCopyImage(commandbuffers[currentFrame]->commandbuffer, cam->rendertarget->colortexture[n]->vkimage, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, cam->rendertarget->colortarget[n]->vkimage, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, 1, &regions); commandbuffers[currentFrame]->TransitionImageLayout(pair.first->rendertarget->colortarget[n], VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL, VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL, VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT, VK_PIPELINE_STAGE_TRANSFER_BIT, -1); } } } } } Below is a simple Lua program that sets up a scene with two cameras, and 
renders one camera to a texture buffer which is displayed on the middle box itself.
    --Get the primary display
    local displaylist = ListDisplays()
    local display = displaylist[1];
    if display == nil then DebugError("Primary display not found.") end
    local displayscale = display:GetScale()

    --Create a window
    local window = CreateWindow(display, "Render to Texture", 0, 0, math.min(1280 * displayscale.x, display.size.x), math.min(720 * displayscale.y, display.size.y), WINDOW_TITLEBAR)

    --Create a rendering framebuffer
    local framebuffer = CreateFramebuffer(window);

    --Create a world
    local world = CreateWorld()

    --Create second camera
    local texcam = CreateCamera(world)
    texcam:SetClearColor(1,0,1,1)

    --Create a camera
    local camera = CreateCamera(world)
    camera:Move(0,0,-2)
    camera:SetClearColor(0,0,1,1)

    --Create a texture buffer
    local texbuffer = CreateTextureBuffer(512,512,1,true)
    texcam:SetRenderTarget(texbuffer)

    --Create scene
    local box = CreateBox(world)
    local cone = CreateCone(world)
    cone:SetPosition(2,0,0)
    cone:SetColor(1,0,0,1)
    local sphere = CreateSphere(world)
    sphere:SetPosition(-2,0,0)
    sphere:SetColor(0,1,0,1)

    --Create render-to-texture material
    local material = CreateMaterial()
    local tex = texbuffer:GetColorBuffer()
    material:SetTexture(tex, TEXTURE_BASE)
    box:SetMaterial(material)

    --Create a light
    local light = CreateLight(world,LIGHT_DIRECTIONAL)
    light:SetRotation(35,-55,0)

    --Main loop
    while window:Closed() == false do
        texcam:SetPosition(0,0,0)
        texcam:Turn(0,1,0)
        texcam:Move(0,0,-2)
        world:Update()
        world:Render(framebuffer)
    end
    Here is the result. Look how simple it is to control this powerful system!

  14. Josh

    Articles
    While seeking a way to increase performance of octree ray traversal, I came across a lot of references to this paper:
    http://wscg.zcu.cz/wscg2000/Papers_2000/X31.pdf
    Funnily enough, the first page of the paper perfectly describes my first two attempted algorithms. I started with a nearest neighbor approach and then implemented a top-down recursive design:
    GLSL doesn't support recursive function calls, so I had to create a function that walks up and down the octree hierarchy without calling itself. This was an interesting challenge. You basically have to use a while loop and store your variables at each level in an array. Use a level integer to indicate the current level you are working at, and everything works out fine.
    while (true)
    {
        childnum = n[level];
        n[level]++;
        childindex = svotnodes[nodeindex].child[childnum];
        if (childindex != 0)
        {
            pos[level + 1] = pos[level] - qsize;
            pos[level + 1] += coffset[childnum] * hsize;
            bounds.min = pos[level + 1] - qsize;
            bounds.max = bounds.min + hsize;
            if (AABBIntersectsRay2(bounds, p0, dir))
            {
                if (level == maxlevels - 2)
                {
                    if (SVOTNodeGetDiffuse(childindex).a > 0.5f) return true;
                }
                else
                {
                    parent[level] = nodeindex;
                    nodeindex = childindex;
                    level++;
                    n[level] = 0;
                    childnum = 0;
                    size *= 0.5f;
                    hsize = size * 0.5f;
                    qsize = size * 0.25f;
                }
            }
        }
        while (n[level] == 8)
        {
            level--;
            if (level == -1) return false;
            nodeindex = parent[level];
            childnum = n[level];
            size *= 2.0f;
            hsize = size * 0.5f;
            qsize = size * 0.25f;
        }
    }
    I made an attempt to implement the technique described in the paper above, but something was bothering me. The octree traversal was so slow that even if I was able to speed it up four times, it would still be slower than Leadwerks with a shadow map.
    I can show you very simply why. If a shadow map is rendered with the triangle below, the GPU has to process just three vertices, but if we used voxel ray tracing, it would require about 90 octree traversals. I think we can assume the post-vertex pipeline triangle rasterization process is effectively free, because it's a fixed function feature GPUs have been doing since the dawn of time:

    The train station model uses 4 million voxels in the shot below, but it has about 40,000 vertices. In order for voxel direct lighting to be on par with shadow maps, each voxel traversal would have to be about 100 times faster than processing a single vertex. The numbers just don't make sense.

    Basically, voxel shadows are limited by the surface area, and shadow maps are limited by the number of vertices. Big flat surfaces that cover a large area use very few vertices but would require many voxels to be processed. So for the direct lighting component, I think shadow maps are still the best approach. I know Crytek is claiming to get better performance with voxels, but my experience indicates otherwise.
    Another aspect of shadow maps I did not fully appreciate before is the fact they give high resolution when an object is near the light source, and low resolution further away. This is pretty close to how real light works, and would be pretty difficult to match with voxels, since their density does not increase closer to the light source.

    There are also issues with moving objects, skinned animation, tessellation, alpha discard, and vertex shader effects (waving leaves, etc.). All of these could be tolerated, but I'm sure shadow maps are much faster, so it doesn't make sense to continue on that route.
    I feel I have investigated this pretty thoroughly and now I have a strong answer why voxels cannot replace shadow maps for the direct shadows. I also developed a few pieces of technology that will continue to be used going forward, like our own improved mesh voxelization and the sparse octree traversal routine (which will be used for reflections). And part of this forced me to implement Vulkan dynamic rendering, to get rid of render passes and simplify the code.
    Voxel GI and reflections are still in the works, and I am farther along than ever now. Direct lighting is being performed on the voxel data, but now I am using the shadow maps to light the voxels. The next step is to downsample the lit voxel texture, then perform a GI pass, downsample again, and perform the second GI pass / light bounce. Because the octree is now sparse, we will be able to use a higher resolution with faster performance than the earlier videos I showed. And I hope to finally be able to show GI with a second bounce.
  15. Josh

    Articles
    The VK_KHR_dynamic_rendering extension has made its way into Vulkan 1.2.203 and I have implemented this in Ultra Engine. What does it do?
    Instead of creating renderpass objects ahead of time, dynamic rendering allows you to just specify the settings you need as you are filling in command buffers with rendering instructions. From the Khronos working group:
    In my experience, post-processing effects are where this hurt the most. The engine has a user-defined stack of post-processing effects, so there are many configurations possible. You had to store and cache a lot of renderpass objects for all possible combinations of settings. It's not impossible, but it made things very, very complicated. Basically, you have to know every little detail of how the renderpass object is going to be used in advance. I had several different functions like the code below, for initializing renderpasses that were meant to be used at various points in the rendering routine.
    bool RenderPass::InitializePostProcess(shared_ptr<GPUDevice> device, const VkFormat depthformat, const int colorComponents, const bool lastpass) { this->clearmode = clearmode; VkFormat colorformat = __FramebufferColorFormat; this->colorcomponents = colorComponents; if (depthformat != 0) this->depthcomponent = true; this->device = device; std::array< VkSubpassDependency, 2> dependencies; dependencies[0] = {}; dependencies[0].srcSubpass = VK_SUBPASS_EXTERNAL; dependencies[0].dstSubpass = 0; dependencies[0].srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT; dependencies[0].srcAccessMask = 0; dependencies[0].dstStageMask = VK_PIPELINE_STAGE_EARLY_FRAGMENT_TESTS_BIT; dependencies[0].dstAccessMask = VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_READ_BIT | VK_ACCESS_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT; dependencies[1] = {}; dependencies[1].srcSubpass = VK_SUBPASS_EXTERNAL; dependencies[1].dstSubpass = 0; dependencies[1].srcStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT; dependencies[1].srcAccessMask = 0; dependencies[1].dstStageMask = VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT; dependencies[1].dstAccessMask = VK_ACCESS_COLOR_ATTACHMENT_READ_BIT | VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT; renderPassInfo = {}; renderPassInfo.sType = VK_STRUCTURE_TYPE_RENDER_PASS_CREATE_INFO; renderPassInfo.attachmentCount = colorComponents; renderPassInfo.dependencyCount = colorComponents; if (depthformat == VK_FORMAT_UNDEFINED) { dependencies[0] = dependencies[1]; } else { renderPassInfo.attachmentCount++; renderPassInfo.dependencyCount++; } renderPassInfo.pDependencies = dependencies.data(); colorAttachment[0] = {}; colorAttachment[0].format = colorformat; colorAttachment[0].samples = VK_SAMPLE_COUNT_1_BIT; colorAttachment[0].initialLayout = VK_IMAGE_LAYOUT_UNDEFINED; colorAttachment[0].loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE; colorAttachment[0].storeOp = VK_ATTACHMENT_STORE_OP_STORE; colorAttachment[0].stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE; colorAttachment[0].stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE; colorAttachment[0].finalLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL; if (lastpass) colorAttachment[0].finalLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR; VkAttachmentReference colorAttachmentRef = {}; colorAttachmentRef.attachment = 0; colorAttachmentRef.layout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL; depthAttachment = {}; VkAttachmentReference depthAttachmentRef = {}; if (depthformat != VK_FORMAT_UNDEFINED) { colorAttachmentRef.attachment = 1; depthAttachment.format = depthformat; depthAttachment.samples = VK_SAMPLE_COUNT_1_BIT; depthAttachment.loadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE; depthAttachment.initialLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;// VK_IMAGE_LAYOUT_UNDEFINED; depthAttachment.storeOp = VK_ATTACHMENT_STORE_OP_STORE; depthAttachment.stencilLoadOp = VK_ATTACHMENT_LOAD_OP_DONT_CARE; depthAttachment.stencilStoreOp = VK_ATTACHMENT_STORE_OP_DONT_CARE; depthAttachment.finalLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL; depthAttachmentRef.attachment = 0; depthAttachmentRef.layout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL; } colorAttachment[0].initialLayout = VK_IMAGE_LAYOUT_UNDEFINED; depthAttachment.initialLayout = VK_IMAGE_LAYOUT_DEPTH_STENCIL_ATTACHMENT_OPTIMAL;// VK_IMAGE_LAYOUT_UNDEFINED; subpasses.push_back( {} ); subpasses[0].pipelineBindPoint = VK_PIPELINE_BIND_POINT_GRAPHICS; subpasses[0].colorAttachmentCount = colorComponents; subpasses[0].pColorAttachments = &colorAttachmentRef; subpasses[0].pDepthStencilAttachment = 
NULL; if (depthformat != VK_FORMAT_UNDEFINED) subpasses[0].pDepthStencilAttachment = &depthAttachmentRef; VkAttachmentDescription attachments[2] = { colorAttachment[0], depthAttachment }; renderPassInfo.subpassCount = subpasses.size(); renderPassInfo.pAttachments = attachments; renderPassInfo.pSubpasses = subpasses.data(); VkAssert(vkCreateRenderPass(device->device, &renderPassInfo, nullptr, &pass)); return true; } This gives you an idea of just how many render passes I had to create in advance:
    // Initialize Render Passes shadowpass[0] = make_shared<RenderPass>(); shadowpass[0]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 0, true);//, CLEAR_DEPTH, -1); shadowpass[1] = make_shared<RenderPass>(); shadowpass[1]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 0, true, true, true, 0); if (MULTIPASS_CUBEMAP) { cubeshadowpass[0] = make_shared<RenderPass>(); cubeshadowpass[0]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 0, true, true, true, CLEAR_DEPTH, 6); cubeshadowpass[1] = make_shared<RenderPass>(); cubeshadowpass[1]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 0, true, true, true, 0, 6); } //shaderStages[0] = TEMPSHADER->shaderStages[0]; //shaderStages[4] = TEMPSHADER->shaderStages[4]; posteffectspass = make_shared<RenderPass>(); posteffectspass->InitializePostProcess(dynamic_pointer_cast<GPUDevice>(Self()), VK_FORMAT_UNDEFINED, 1, false); raytracingpass = make_shared<RenderPass>(); raytracingpass->InitializeRaytrace(dynamic_pointer_cast<GPUDevice>(Self())); lastposteffectspass = make_shared<RenderPass>(); lastposteffectspass->InitializeLastPostProcess(dynamic_pointer_cast<GPUDevice>(Self()), depthformat, 1, false); lastcameralastposteffectspass = make_shared<RenderPass>(); lastcameralastposteffectspass->InitializeLastPostProcess(dynamic_pointer_cast<GPUDevice>(Self()), depthformat, 1, true); { std::vector<VkFormat> colorformats = { __FramebufferColorFormat ,__FramebufferColorFormat, VK_FORMAT_R8G8B8A8_SNORM, VK_FORMAT_R32_SFLOAT }; for (int earlyZPass = 0; earlyZPass < 2; ++earlyZPass) { for (int clearflags = 0; clearflags < 4; ++clearflags) { renderpass[clearflags][earlyZPass] = make_shared<RenderPass>(); renderpass[clearflags][earlyZPass]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 1, false, false, false, clearflags, 1, earlyZPass); renderpassRGBA16[clearflags][earlyZPass] = make_shared<RenderPass>(); renderpassRGBA16[clearflags][earlyZPass]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), colorformats, depthformat, 4, false, false, false, clearflags, 1, earlyZPass); firstrenderpass[clearflags][earlyZPass] = make_shared<RenderPass>(); firstrenderpass[clearflags][earlyZPass]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 1, false, true, false, clearflags, 1, earlyZPass); lastrenderpass[clearflags][earlyZPass] = make_shared<RenderPass>(); lastrenderpass[clearflags][earlyZPass]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), { VK_FORMAT_UNDEFINED }, depthformat, 1, false, false, true, clearflags, 1, earlyZPass); //for (int d = 0; d < 2; ++d) { for (int n = 0; n < 5; ++n) { if (n == 2 or n == 3) continue; rendertotexturepass[clearflags][n][earlyZPass] = make_shared<RenderPass>(); rendertotexturepass[clearflags][n][earlyZPass]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), colorformats, depthformat, n, true, false, false, clearflags, 1, earlyZPass); firstrendertotexturepass[clearflags][n][earlyZPass] = make_shared<RenderPass>(); firstrendertotexturepass[clearflags][n][earlyZPass]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), colorformats, depthformat, n, true, true, false, clearflags, 1, earlyZPass); // lastrendertotexturepass[clearflags][n] = make_shared<RenderPass>(); // lastrendertotexturepass[clearflags][n]->Initialize(dynamic_pointer_cast<GPUDevice>(Self()), depthformat, n, true, false, 
true, clearflags); } } } } } With dynamic rendering, you still have to fill in most of the same information, but you can just do it based on whatever the current state of things is, instead of looking for an object that hopefully matches the exact settings you want:
    VkRenderingInfoKHR renderinfo = {};
    renderinfo.sType = VK_STRUCTURE_TYPE_RENDERING_INFO_KHR;
    renderinfo.renderArea = scissor;
    renderinfo.layerCount = 1;
    renderinfo.viewMask = 0;
    renderinfo.colorAttachmentCount = 1;
    targetbuffer->colorAttachmentInfo[0].imageLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL;
    targetbuffer->colorAttachmentInfo[0].clearValue.color.float32[0] = 0.0f;
    targetbuffer->colorAttachmentInfo[0].clearValue.color.float32[1] = 0.0f;
    targetbuffer->colorAttachmentInfo[0].clearValue.color.float32[2] = 0.0f;
    targetbuffer->colorAttachmentInfo[0].clearValue.color.float32[3] = 0.0f;
    targetbuffer->colorAttachmentInfo[0].imageView = targetbuffer->imageviews[0];
    renderinfo.pColorAttachments = targetbuffer->colorAttachmentInfo.data();
    targetbuffer->depthAttachmentInfo.clearValue.depthStencil.depth = 1.0f;
    targetbuffer->depthAttachmentInfo.clearValue.depthStencil.stencil = 0;
    targetbuffer->depthAttachmentInfo.imageLayout = VK_IMAGE_LAYOUT_DEPTH_ATTACHMENT_OPTIMAL;
    renderinfo.pDepthAttachment = &targetbuffer->depthAttachmentInfo;
    device->vkCmdBeginRenderingKHR(cb->commandbuffer, &renderinfo);
    Then there is the way render passes affect the image layout state. With the TransitionImageLayout command, it is fairly easy to track the current state of the image layout, but render passes automatically switch the image layout after completion to a predefined state. Again, that is not impossible to handle in and of itself, but when you add these things into the complexity of designing a full engine, things start to get ugly.
    void GPUCommandBuffer::EndRenderPass()
    {
        vkCmdEndRenderPass(commandbuffer);
        for (int k = 0; k < currentrenderpass->layers; ++k)
        {
            for (int n = 0; n < currentrenderpass->colorcomponents; ++n)
            {
                if (currentdrawbuffer->colortexture[n]) currentdrawbuffer->colortexture[n]->imagelayout[0][currentdrawbuffer->baseface + k] = currentrenderpass->colorAttachment[n].finalLayout;
            }
            if (currentdrawbuffer->depthtexture != NULL and currentrenderpass->depthcomponent == true) currentdrawbuffer->depthtexture->imagelayout[0][currentdrawbuffer->baseface + k] = currentrenderpass->depthAttachment.finalLayout;
        }
        currentdrawbuffer = NULL;
        currentrenderpass = NULL;
    }
    Another example where this was causing problems was with user-defined texture buffers. One beta tester wanted to implement some interesting effects that required rendering to some HDR color textures, but the system was so static it couldn't handle a user-defined color format in a texture buffer. Again, this is not impossible to overcome, but the practical outcome is I just didn't have enough time because resources are finite.
    It's interesting that this extension also removes the need to create a Vulkan framebuffer object. I guess that means you can just start rendering to any combination of textures you want, so long as they use a format that is renderable by the hardware. Vulkan certainly changes a lot of conceptions we had in OpenGL.
    So this extension does eliminate a significant source of problems for me, and I am happy it was implemented.
  16. Josh
    Previously I described how I was able to save the voxel data into a sparse octree and correctly lookup the right voxel in a shader. This shot shows that each triangle is being rasterized separately, i.e. the triangle bounding box is being correctly trimmed to avoid a lot of overlapping voxels:

    Calculating direct lighting using the sparse octree was very difficult, and took me several days of debugging. I'm not 100% sure what the problem was, other than it seems GLSL code is not quite as flexible as C++. I actually had the exact same function in GLSL and C++, and it worked perfectly in C++ but gave wrong results in GLSL! Of course I did not have a debugger for my GLSL code, so I ended up having to write a lot of if statements and output a pixel color based on the result. In the end I finally tracked the problem down to some data stored in an array and changed the way the routine worked, but I'll never know what the exact issue was.
    With the sparse voxel octree, we only have about 400,000 pixels to draw when we process direct lighting. Rendering all voxels in a 256x256x256 volume texture would require 16 million pixels to be drawn, so the sparse approach requires us to draw only about 2% of the pixels we would otherwise. Using shadow maps, on a 1920x1080 screen we would have to calculate about 2,000,000 shadow intersections. Although we are not comparing exactly the same things, this does make me optimistic about the final performance results. Basically, instead of calculating shadow visibility for each pixel, we can just calculate it per voxel, and your voxels are always going to be quite a bit bigger than screen pixels. So the whole issue of balancing shadow map resolution with screen resolution goes away.
    Ray traversal is very fast because it skips large chunks of empty space, instead of checking every single grid space for a voxel.
    The voxel resolution below is not very high, I am only using one octree, and there's currently no blending / filtering, but that will all come in time.

    Leadwerks 1 and 3D World Studio used lightmaps for lighting. Later versions of Leadwerks used deferred lighting and shadowmaps. Being able to roll out another cutting-edge lighting technology in Ultra Engine is icing on the cake for the new engine. I expect this will allow particle shadows and transparent glass with colored shadows, as well as real-time global illumination and reflections, all with great performance on most hardware.
  17. Josh

    Articles
    My initial implementation of mesh voxelization for ray tracing used this code. It was good for testing, but has some problems:
    • It's slow, using an unnecessary and expensive x * y * z loop
    • No support for per-voxel color based on a texture lookup
    • There are mathematical mistakes that cause inaccuracy, and the math has to be perfect
    My solution addresses these problems and only uses an x * y loop to generate the voxels. It does this by identifying the major (largest magnitude) axis of the triangle normal, using the other two axes as the X and Y axes, and then finding the Z position of the triangle at each grid point along the surface, as sketched below.
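    Here is a rough sketch of the core of that idea in C++ (illustrative only, not the engine's actual voxelization code): pick the dominant axis of the normal, then solve the plane equation for the third coordinate at each 2D grid point.
    #include <cmath>

    struct Vec3 { float x, y, z; };

    static float Component(const Vec3& v, int axis)
    {
        switch (axis) { case 0: return v.x; case 1: return v.y; default: return v.z; }
    }

    // Axis where the normal has the largest magnitude
    int DominantAxis(const Vec3& n)
    {
        float ax = std::fabs(n.x), ay = std::fabs(n.y), az = std::fabs(n.z);
        if (ax >= ay and ax >= az) return 0;
        return (ay >= az) ? 1 : 2;
    }

    // Given the triangle plane (normal n, any vertex p) and a grid point (u, v) on the
    // two minor axes, recover the coordinate along the major axis from n . x = n . p
    float SolveMajorAxis(const Vec3& n, const Vec3& p, int major, float u, float v)
    {
        int ua = (major + 1) % 3, va = (major + 2) % 3;
        float d = n.x * p.x + n.y * p.y + n.z * p.z;
        return (d - Component(n, ua) * u - Component(n, va) * v) / Component(n, major);
    }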
    In previous screenshots, you could see some black faces caused by geometry that lies outside the bounds of the voxel geometry. Some of this was caused because I was voxelizing the mesh in local space and then transforming the resulting voxels to world space. That doesn't work, because the voxel position can end up rounding off to a different coordinate than the triangle it's supposed to enclose. The best solution is to have a low-res LOD model that is used to generate the voxel data. (It's important to make sure the voxel geometry still contains the full-resolution model.)
    In the shot below you can see every single surface has a voxel to retrieve the color from. There are no texture lookups being performed in this shot, just colored voxels that were generated by reading the image pixel at each voxel position and then stored in a GPU buffer.

    This means we can safely calculate which lights are visible at each voxel and store those light IDs in a texture to be retrieved instead of performing a shadowmap lookup. It also means we can calculate global illumination at each voxel, store it in a texture, and just do a single texture lookup to grab the GI lighting when the actual model is rendered, instead of calculating GI each frame.
    Onwards and upwards!
  18. Josh

    Articles
    Previously I noted that since voxel global illumination involves calculating direct lighting, it would actually be possible to do away with shadow maps altogether and use voxels for both direct and global illumination. This would eliminate the problems of image-based shadows, like shadow acne and adjusting the shadow map size. I also believe this method will turn out a lot faster than shadow map rendering, and you know how I like fast performance. 
    The sparse voxel octree node structure consumes 48 bytes and looks like this:
    struct SparseVoxelOctreeTreeNode
    {
        uint32_t index, parent, color, emission;
        uint32_t child[2][2][2];
    };
    It might be possible to eliminate the index and parent values, but the structure size has to be aligned to 16 bytes in GPU memory anyways, so I don't see it getting any smaller.
    In my test scenario, the sparse voxel octree creates 353,345 voxels, which consumes 14% of the memory of the uncompressed data, but it is only a little bit smaller than compressed volume textures, and I could see the SVO data being bigger than a compressed 3D texture.
    Uncompressed, diffuse + emission
    256*256*256*4 + 256*256*256*3 = 67108864 + 50331648 = 117440512 bytes = 112 Mb
    DXT5 compressed diffuse + DXT1 compressed emission
    16777216 + 8388608 = 25165824 bytes = 24 Mb
    Sparse Voxel Octree
    353345 * 48 = 16.2 Mb
    That's just for the voxelized triangles' diffuse color. We still need six textures to store direct lighting, one for each direction on each axis. Since these are rendered to in a shader, I don't see any texture compression happening here:
    256 * 256 * 256 * 4 * 6 bytes = 384 Mb
    If we store the lit sparse voxels in a 1024x512 RGBA texture (x6), that consumes a lot less memory:
    1024 * 512 * 4 * 6 = 12 Mb
    So overall, we do see a very significant reduction in memory usage when we make the octree sparse. It's also going to be A LOT more efficient to render to one 512x1024 texture buffer (with six color attachments), instead of rendering 256 separate slices of a volume texture.
    Looking up a single value in a sparse voxel octree is more complex than a simple texture sampler, because it has to iterate through all the levels of the tree, down to the terminal node. However, ray traversal should be much faster with the sparse voxel octree, because the algorithm can efficiently skip over large empty spaces. An interesting challenge is the fact that GLSL does not support recursive function calls, so recursive functions have to be written in creative ways. This one isn't too bad, but when you have branching pathways in a ray traversal, it can get pretty complicated:
    bool SVOTGetNodeColor(in uint x, in uint y, in uint z, out vec4 diffuse, out vec3 emission)
    {
        diffuse = vec4(0,0,0,0);
        uint index = 1;
        int maxlevels = 10;
        uint size = 256;
        if (x >= size || y >= size || z >= size) return true;
        uint hsize;
        uint px,py,pz,childindex;
        for (int n = 0; n < maxlevels - 1; n++)
        {
            hsize = size / 2;
            px = uint(x >= hsize);
            py = uint(y >= hsize);
            pz = uint(z >= hsize);
            index = svotnodes[index - 1].child[px * 4 + py * 2 + pz];
            if (index == 0) return false;
            x -= px * hsize;
            y -= py * hsize;
            z -= pz * hsize;
            size = hsize;
        }
        diffuse = SVOTNodeGetDiffuse(index);
        return true;
    }
    In this shot, I am rendering the original mesh geometry and doing a texture lookup in the sparse voxel octree to find the color stored for the voxel at each point. There are a few places where the surface appears black, meaning the point being rendered lies outside the bounds of any saved voxel. Maybe there is a problem with the precision of the triangle voxelization routine; I will have to look into this further.

The important point is that this image is being rendered without the use of a volume texture. You are seeing the sparse voxel octree being successfully sent to, and navigated within, the fragment shader.
    The next step will be to take the diffuse colors and render direct lighting into a separate texture. That means my clustered forward rendering implementation will not get used, but developing that did prepare me for what I must do next. Instead of placing all the lights into a grid oriented around the camera frustum, I need to place them in a grid in world space, with the camera at the center of the grid. This is actually quite a lot simpler. 
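Here is a minimal sketch of that idea, just to make it concrete. This is illustrative, not the engine's actual implementation; the Vec3 type, grid size, and cell size are placeholder assumptions.

#include <cmath>

// Minimal sketch of a world-space light grid centered on the camera.
// Vec3, GRIDSIZE, and CELLSIZE are illustrative assumptions.
struct Vec3 { float x, y, z; };

const int GRIDSIZE = 64;      // cells per axis
const float CELLSIZE = 4.0f;  // world units per cell

// Computes the grid cell a light falls into; returns false if it lies outside the grid.
bool GetLightCell(const Vec3& lightpos, const Vec3& camerapos, int& cx, int& cy, int& cz)
{
    // Work relative to the camera so the camera always sits at the center of the grid
    cx = int(std::floor((lightpos.x - camerapos.x) / CELLSIZE)) + GRIDSIZE / 2;
    cy = int(std::floor((lightpos.y - camerapos.y) / CELLSIZE)) + GRIDSIZE / 2;
    cz = int(std::floor((lightpos.z - camerapos.z) / CELLSIZE)) + GRIDSIZE / 2;
    return cx >= 0 && cx < GRIDSIZE && cy >= 0 && cy < GRIDSIZE && cz >= 0 && cz < GRIDSIZE;
}

Because the grid is anchored to world space rather than to the view frustum, a light's cell only changes when the light or the camera actually moves through space, not when the camera merely rotates.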
  19. Josh
    One month ago I began work to investigate what it would take to bring Ultra App Kit, the foundation for our new generation of game development tools, to Linux. Today I am happy to share my progress with you and discuss some of the things I have learned.
Developed at MIT in 1984, the X Window System is an interesting beast that is easy to start with, but can become quite difficult once you get into the details. (Wayland support is of course an obvious step in the not-too-distant future, but I have to work with what exists today, and Ubuntu 20.04 still uses X by default.) The single hardest part had to do with calls to XSetInputFocus on windows that had not yet been mapped. X has an asynchronous design, yet XSetInputFocus does not seem to get added to the command queue and instead depends on the mapping (visibility) state of the window right now. That means that if you create or show a window and then immediately activate it, an error will occur that looks something like this:
Error of failed request:  BadMatch (invalid parameter attributes)
  Major opcode of failed request:  42 (X_SetInputFocus)
  Serial number of failed request:  12
  Current serial number in output stream:  12

The way around this is to call XMapWindow and then wait on the event queue until a MapNotify event for that window occurs, adding all other events into a list that can be evaluated on the next call to WaitEvent(). The time elapsed is checked inside the loop in case something weird happens and the desired event is never received:
void Window::Show()
{
    if (!Hidden()) return;
    XMapWindow(display->GetHandle(), xwindow);
    XMoveResizeWindow(display->GetHandle(), GetHandle(), m_position.x, m_position.y, size.x, size.y);
    XFlush(display->GetHandle());
    XSync(display->GetHandle(), false);
    auto engine = GameEngine::Get();
    XEvent ev = {};
    auto start = Millisecs();
    while (xhidestate)
    {
        XNextEvent(display->GetHandle(), &ev);
        if (ev.type == MapNotify and ev.xany.window == GetHandle())
        {
            // The window is now mapped, so it is safe to activate it
            xhidestate = false;
            return;
        }
        // Store any unrelated events so WaitEvent() can process them later
        if (engine) engine->storedxevents.push_back(ev);
        Sleep(1);
        if (Millisecs() - start > 5000)
        {
            Warn("MapNotify is taking a long time to be received. This may cause window errors.");
            return;
        }
    }
}

POSIX timers are strange creatures that seem to follow rules all their own. A timer callback gets triggered during any call to sleep() when a timer tick has occurred, but a mutex lock inside the callback will freeze the program. I ended up using the much simpler timerfd interface.
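For anyone curious, here is a minimal standalone sketch of the timerfd approach. It is Linux-only and not the exact code used in Ultra App Kit, but it shows the core idea: the timer becomes an ordinary file descriptor, so ticks are consumed with a blocking read() instead of a signal callback, which sidesteps the locking problems described above.

#include <sys/timerfd.h>
#include <unistd.h>
#include <cstdint>
#include <cstdio>

int main()
{
    // Create a timer that delivers ticks through a file descriptor
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd == -1) return 1;

    // Fire after 100 ms, then repeat every 100 ms
    itimerspec spec = {};
    spec.it_value.tv_nsec = 100 * 1000000;
    spec.it_interval.tv_nsec = 100 * 1000000;
    timerfd_settime(fd, 0, &spec, nullptr);

    for (int n = 0; n < 10; ++n)
    {
        uint64_t expirations = 0;
        // Blocks until at least one tick has occurred; no signal handler involved
        read(fd, &expirations, sizeof(expirations));
        printf("Tick x%llu\n", (unsigned long long)expirations);
    }
    close(fd);
    return 0;
}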
    Double-buffering, good text rendering, and alpha blending are all different extensions built on top of the base X11 system. Getting all of this to work together took a lot of trial and error. However, I think you will agree based on the screenshots below that this work has been worthwhile.


    Multi-select draggable treeview with insertion between nodes
     

    Multi-line text display with optional word wrapping
     

    Hierarchical menu system with real popup windows
     
For the final leg of development I have set up a small Kickstarter campaign. If you haven't gotten Ultra App Kit yet, this is a good opportunity to grab it before the Linux build is released. Ultra App Kit can also be purchased in our store or on Steam.
     
  20. Josh

    Articles
This is an update on my progress with our voxel raytracing system. VXRT is designed to provide all the reflection information that PBR materials use. If a picture is worth a thousand words, then this counts as a 5,000-word article.
    Direct lighting:

    Global illumination:

    Specular reflection:

    Skybox component:

    Final combined image:

  21. Josh

    Articles
The Ultra Engine editor is designed to be expandable and modifiable. Lua scripting is integrated into the editor and can be used to write editor extensions and even modify the scene or the editor itself in real time.
    We can create a scene object entirely in code and make it appear in the scene browser tree:
box = CreateBox(editor.world)
box.name = "box01"
o = CreateSceneObject(box) --make the editor recognize the entity and add it to the scene browser
o:SetSelected(true)
    We can even modify the editor itself and start adding new features to the interface:
editor.sidepanel.tabber:AddItem("Roads")
p = CreatePanel(0, 0, 300, 500, editor.sidepanel.tabber)
button = CreateButton("Create", 20, 20, 100, 30, p)
Of course, you would usually want to put all this code in a script file and either run it by selecting the Script > Run Script... menu item or place it in the "Scripts/Start" folder so it automatically runs at startup. But it sure is cool to be able to experiment live with Lua right in the console and see the result of your code instantly.
  22. Josh

    Articles
A while back I wrote enthusiastically about Basis Universal super compression. KTX2 is a texture file format from Khronos, makers of the Vulkan and glTF specifications. Like DDS files, KTX2 can store multiple mipmaps, as well as memory-compressed texture formats like DXT5 and BC7. However, KTX2 now supports Basis compressed data as well, which makes it the all-in-one universal texture format. There is an official glTF extension for embedding KTX2 textures, so it can be combined with Draco mesh compression to reduce your overall game model sizes:

Additionally, KTX2 includes information about clamp and filter settings. The reason I implemented the .tex texture format in Leadwerks was that DDS lacks these features and I wanted that information stored in the texture file itself.
    I've added built-in KTX2 texture loading and saving, so you can easily save and load these files. I plan to make KTX2 the recommended texture file format for Ultra Engine.
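Since supporting multiple texture formats means detecting them reliably, here is a small standalone sketch (not engine code) that identifies a KTX2 file by the 12-byte identifier the Khronos KTX 2.0 specification places at the start of every file:

#include <cstdio>
#include <cstring>
#include <cstdint>

// Every KTX2 file begins with this 12-byte identifier, per the Khronos KTX 2.0 spec.
static const uint8_t KTX2_IDENTIFIER[12] =
    { 0xAB, 0x4B, 0x54, 0x58, 0x20, 0x32, 0x30, 0xBB, 0x0D, 0x0A, 0x1A, 0x0A };

// Returns true if the file at 'path' starts with the KTX2 identifier.
bool IsKTX2File(const char* path)
{
    FILE* f = fopen(path, "rb");
    if (!f) return false;
    uint8_t header[12] = {};
    size_t count = fread(header, 1, sizeof(header), f);
    fclose(f);
    return count == sizeof(header) && memcmp(header, KTX2_IDENTIFIER, sizeof(header)) == 0;
}

int main(int argc, char* argv[])
{
    if (argc > 1) printf("%s: %s\n", argv[1], IsKTX2File(argv[1]) ? "KTX2" : "not KTX2");
    return 0;
}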

  23. Josh

    Articles
    Google Draco is a library that aims to do for mesh data what MP3 and OGG did for music. It does not reduce memory usage once a mesh is loaded, but it could reduce file sizes and improve download times. Although mesh data does not tend to use much disk space, I am always interested in optimization. Furthermore, some of the NASA models I work with are very high-poly, and do take up significant disk space. Google offers a very compelling chart showing a compression ratio of about 95%:

However, there is not much information given about the original mesh. Is it an ASCII .obj file? Of course that would be much bigger than binary data. I wanted to get a clear look at what kind of compression ratios I could expect within the context of glTF files. I found a fairly high-poly model on Sketchfab here to work with.

    This model has 2.1 million triangles and 1 million vertices. That should be plenty to test with.
    Now, glTF is actually three different file formats. Normal glTF files store JSON data and come with an extra .bin file for binary data. This stores things like vertex positions and animation data, stuff you probably won't want to edit by hand. The .glb version of the format combines JSON and binary data into a single file, which can be viewed but not edited in a text editing program. Finally, there is also base64 glTF, which stores JSON together with binary data with base64 encoding in a single file. The base64 data looks like gibberish, but the file can be opened in a text editor, modified, and resaved without destroying the binary data.
    I was very curious to see what advantage Google Draco mesh compression would offer. Would it make glTF files significantly smaller, so that your games take up less space and have faster download times?
    To answer this question, I imported the model into Blender and exported several versions. I only exported position, normal, and texture coordinates. I also loaded the uncompressed .glb file in Ultra Engine and resaved it with simple mesh quantization.

As you can see, mesh quantization (using one byte for each normal component, plus one byte for padding, and two bytes for each texture coordinate component) combined with regular old ZIP compression comes in significantly smaller than Draco compression at the maximum compression level. It's not in the chart, but I also tried ZIP-compressing the smallest Draco file, and that was still bigger at 28.8 MB.
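To make the quantization step concrete, here is a minimal sketch of packing a normal into four signed bytes and a texture coordinate into two 16-bit values. It is only illustrative; the exact encoding the engine uses (for example, normalized integers versus half floats for texture coordinates) may differ.

#include <cstdint>
#include <algorithm>

// Minimal quantization sketch; the engine's actual packing may differ.
struct PackedVertexAttribs
{
    int8_t normal[4];     // x, y, z in [-1,1] mapped to [-127,127], plus one byte of padding
    uint16_t texcoord[2]; // u, v stored as normalized 16-bit values in this sketch
};

// Maps a float in [-1,1] to a signed byte
inline int8_t PackSNorm8(float f)
{
    f = std::min(1.0f, std::max(-1.0f, f));
    return (int8_t)(f * 127.0f);
}

// Maps a float in [0,1] to an unsigned 16-bit value
inline uint16_t PackUNorm16(float f)
{
    f = std::min(1.0f, std::max(0.0f, f));
    return (uint16_t)(f * 65535.0f);
}

PackedVertexAttribs PackVertex(float nx, float ny, float nz, float u, float v)
{
    PackedVertexAttribs p = {};
    p.normal[0] = PackSNorm8(nx);
    p.normal[1] = PackSNorm8(ny);
    p.normal[2] = PackSNorm8(nz);
    p.normal[3] = 0; // padding keeps the attribute 4-byte aligned
    p.texcoord[0] = PackUNorm16(u);
    p.texcoord[1] = PackUNorm16(v);
    return p;
}

Either way, the per-vertex data shrinks from 20 bytes (three floats for the normal and two for the texture coordinate) to 8 bytes before any file compression is applied.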
    You can look at the models yourself here:
    dracotest.zip
Based on this test, it appears that Google Draco is only marginally smaller than an uncompressed quantized mesh, and still slightly bigger when ZIP compression is applied to both. Unless someone can show me otherwise, it does not appear that Google Draco mesh compression offers the 95% reduction in file sizes they seem to promise.
    Correction:
This model was made up of several sub-objects. I collapsed the model and resaved it, and Draco now produces compression more like what I was expecting to see:
    collapsed-draco.zip
    Presumably this means whatever data structure they use takes up a certain amount of space (probably an n-dimensional tree), and having fewer of these structures is more optimal.
Here is the corrected comparison chart. This is good. Draco shrank this model to 7% of the size of the uncompressed .glb export:

This will be very useful for 3D scans and CAD models, as long as they don't contain a lot of articulated sub-objects. The original model is on the left, and the Draco-compressed model is on the right:

  24. Josh

    Articles
The glTF importer took a very long time to develop, but it was much easier to write a glTF save routine. In one day I got an exporter working with support for everything except skinning and animation. To save a model in glTF format, just call Model::Save("mymodel.gltf") and it will work! Entire scenes can also be saved in glTF format. Here is a model that was loaded from Leadwerks MDL, MAT, and TEX files and saved as glTF. The textures are converted to PNG files. (Microsoft has an official extension for adding DDS textures into the file, and I plan to implement that next.)
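For example, converting an old Leadwerks model could look roughly like this. This is only a sketch: the LoadModel call and the file paths are illustrative assumptions, while the Save call is the one described above.

// Illustrative sketch: load a model through the Leadwerks MDL import plugin,
// then write it back out as glTF. LoadModel and the paths are assumptions.
auto model = LoadModel(world, "Models/station.mdl");
if (model) model->Save("Models/station.gltf"); // textures are written alongside as PNG files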

    Take a look at the exported file in your favorite modeling application:
    station.zip
So you can load a model from any format supported by import plugins and then save it as glTF flawlessly. Or, you can set up an automatic conversion in your project settings, so that the editor will automatically convert files from one format to another any time they are added or resaved in your project.

    Ultra Engine uses the most widely compatible file formats available, and loads assets directly from the standard computer file system, so your game assets are always easy to access, modify, and replace.