Jump to content


  • Content Count

  • Joined

  • Last visited

Community Reputation

8,211 Excellent

About Josh

  • Rank
    Advanced Member

Profile Information

  • Gender
  • Location
    San Francisco, CA

Recent Profile Visitors

The recent visitors block is disabled and is not being shown to other users.

  1. Now intermittently hitting 100% CPU usage on an 8-core processor.

  2. The reason four threads was less than 25% the speed of one is because some calculations were being skipped. I fixed that and the numbers are a little higher now, but still form the same curve.
  3. Multithreading is very useful for processes that can be split into a lot of parallel parts, like image and video processing. I wanted to speed up the normal updating for the new terrain system so I added a new thread creation function that accepts any function as the input, so I can use std::bind with it, the same way I have been easily using this to send instructions in between threads: shared_ptr<Thread> CreateThread(std::function<void()> instruction); The terrain update normal function has two overloads. Once can accept parameters for the exact area to update, but if no parameters are supplied the entire terrain is updated: virtual void UpdateNormals(const int x, const int y, const int width, const int height); virtual void UpdateNormals(); This is what the second overloaded function looked like before: void Terrain::UpdateNormals() { UpdateNormals(0, 0, resolution.x, resolution.y); } And this is what it looks like now: void Terrain::UpdateNormals() { const int MAX_THREADS_X = 4; const int MAX_THREADS_Y = 4; std::array<shared_ptr<Thread>, MAX_THREADS_X * MAX_THREADS_Y> threads; Assert((resolution.x / MAX_THREADS_X) * MAX_THREADS_X == resolution.x); Assert((resolution.y / MAX_THREADS_Y) * MAX_THREADS_Y == resolution.y); for (int y = 0; y < MAX_THREADS_Y; ++y) { for (int x = 0; x < MAX_THREADS_X; ++x) { threads[y * MAX_THREADS_X + x] = CreateThread(std::bind((void(Terrain::*)(int, int, int, int)) & Terrain::UpdateNormals, this, x * resolution.x / MAX_THREADS_X, y * resolution.y / MAX_THREADS_Y, resolution.x / MAX_THREADS_X, resolution.y / MAX_THREADS_Y)); } } for (auto thread : threads) { thread->Resume(); } for (auto thread : threads) { thread->Wait(); } } Here are the results, using a 2048x2048 terrain. You can see that multithreading dramatically reduced the update time. Interestingly, four threads runs more than four times faster than a single thread. It looks like 16 threads is the sweet spot, at least on this machine, with a 10x improvement in performance.
  4. A similar technique for terrain reduced system memory by 130 mb on a 2048x2048 terrain.
  5. What I mean is GTK and some other Linux libs that change with each version of Linux. There is another reason to make components optional, and that is because we can more easily update software for enterprise customers if they can skip optional components. Steam and GA would both be good candidates to make as a plugin, though I am not sure yet exactly how it would work.
  6. I tested a 1.5 million poly mesh and the BSP generation time was about three seconds. Not terrible given the size, but I would like to completely eliminate that annoying delay.
  7. We support Ubuntu 16.04. In the future I am going to use as few of third-party libraries as possible on Linux, but it's not possible to fix existing libs that keep breaking. The engine scales well, but you have to pay attention to optimization. I have seen people do things like have ten point lights constantly re-rendering a million extra polygons each frame, or a 10,000 poly collision mesh, and of course this can kill performance. Sometimes people want to just ignore these kinds of things, but optimization is an important part of development, and it should be paid attention to from the very start. The new engine I am working on does some very advanced things in order to handle the massive data the aerospace customers are working with, and I think this will have a good effect for game developers as well. It will still be possible to hit bottlenecks in the new engine, but I think most people will never come close.
  8. The GMF2 data format gives us fine control to enable fast load times. Vertex and indice data is stored in the exact same format the GPU uses so everything is read straight into video memory. There are some other optimizations I have wanted to make for a long time, and our use of big CAD models makes this a perfect time to implement these improvements. Precomputed BSP Trees For raycast (pick) operations, a BSP structure needs to be constructed from the mesh data. In Leadwerks Engine this structure is computed after a model is loaded. The "Bober Station: model from The Zone is a 46,000 poly model. Here are the build times for its BSP structure: Release: 500 milliseconds Debug: 6143 milliseconds This is not our biggest mesh. Let's assume a two million poly mesh, which is actually something we come across with CAD data. With this size model our load times increase 40x: Release: 20 seconds Debug: 4 minutes Those numbers are not good. To reduce load times I have embedded precomputed BSP structures into the GMF2 format so that they can be loaded from the file. I don't have any numbers yet because there is currently no GMF1 to GMF2 pipeline, but I believe this will significantly reduce load times, especially for large models. I am interested in the idea of replacing the BSP float vertex values with shorts to reduce memory, but I'm not going to worry about it right now. Static Meshes I also am adding making the engine dump vertex and indice data from memory by default when a mesh is sent to the GPU. This will reduce memory usage, since you normally don't need a copy of the vertex data in memory for any reason. void Mesh::Finalize(const bool makestatic) { GameEngine::Get()->renderingthreadmanager->AddInstruction(std::bind(&RenderMesh::Modify, rendermesh, vertices, indices)); if (makestatic) { if (collider == nullptr) UpdateCollider(); vertices.clear(); vertices.shrink_to_fit(); indices.clear(); indices.shrink_to_fit(); } } This change reduced our system memory usage in "The Zone" by 40 mb.
  9. Josh

    Compressed Vertex Arrays

    My best guess is this yielded a 7% performance boost, across the board. I'm not going to test in detail because it takes a lot of changes to modify the vertex format.
  10. Josh

    Compressed Vertex Arrays

    In fact, Leadwerks model files are usually measured in kilobytes rather than megabytes, so compressing those further would make very little difference in your game data size.
  11. My work with the MD3 format and its funny short vertex positions made me think about vertex array sizes. (Interestingly, the Quake 2 MD2 format uses a single char for vertex positions!) Smaller vertex formats run faster, so it made sense to look into this. Here was our vertex format before, weighing in at 72 bytes: struct Vertex { Vec3 position; float displacement; Vec3 normal; Vec2 texcoords[2]; Vec4 tangent; unsigned char color[4]; unsigned char boneweights[4]; unsigned char boneindices[4]; } According to the OpenGL wiki, it is okay to replace the normals with a signed char. And if this works for normals, it will also work for tangents, since they are just another vector. I also implemented half floats for the texture coordinates. Here is the vertex structure now at a mere 40 bytes, about half the size: struct Vertex { Vec3 position; signed char normal[3]; signed char displacement; unsigned short texcoords[4]; signed char tangent[4]; unsigned char color[4]; unsigned char boneweights[4]; unsigned char boneindices[4]; } Everything works with no visible loss of quality. Half-floats for the position would reduce the size by an additional 6 bytes, but would likely incur two bytes of padding since it would no longer by aligned to four bytes, like most GPUs prefer. So it would really only save four bytes, which is not worth it for half the precision. Another interesting thing I might work into the GMF format is Draco mesh compression by Google. This is only for shrinking file sizes, and does not lead to any increased performance. The bulk of your game data is going to be in texture files, so this isn't too important but might be a nice extra.
  12. Just posting my code here for the beginning of a file export for 3ds max: MacroScript SaveMapMacro category:"File" buttontext:"A Save GMF..." tooltip:"A Save GMF..." ( on execute do ( try ( createFile "C:\\file.txt" ) catch ( format "*** % ***\n" (getCurrentException()) ) ) ) You have to add the macro to the menu via "customize user interface". I started the name with "A " to make the item easier to find in that huge list.
  13. Josh

    GMF2 SDK Update

    The GMF2 SDK has been updated with tangents and bounds calculation, object colors, and save-to-file functionality. The GMF2 SDK is a lightweight engine-agnostic open-source toolkit for loading and saving of game models. The GMF2 format can work as a standalone file format or simply as a data format for import and export plugins. This gives us a protocol we can pull model data into and export model data from, and a format that loads large data much faster than alternative file formats. Here is an example showing how to construct a GMF2 model and save it to a file: //Create a GMF file GMFFile* file = new GMFFile; //Create a model GMFNode* node = new GMFNode(file, GMF_TYPE_MODEL); //Set the orientation node->SetMatrix(1,0,0,0, 0,1,0,0, 0,0,1,0, 0,0,0,1); //Set the object color node->SetColor(0,0,1,1); //Add one LOD level GMFLOD* lod = node->AddLOD(); //Create a triangle mesh and add it to the LOD. (Meshes can be shared.) GMFMesh* mesh = new GMFMesh(file, 3); lod->AddMesh(mesh); //Add vertices mesh->AddVertex(-0.5,0.5,0, 0,0,1, 0,0); mesh->AddVertex(0.5,0.5,0, 0,0,1, 1,0); mesh->AddVertex(0.5,-0.5,0, 0,0,1, 1,1); mesh->AddVertex(-0.5,-0.5,0, 0,0,1, 0,1); //Add triangles mesh->AddPolygon(0,1,2); mesh->AddPolygon(0,2,3); //Build vertex tangents (optional) mesh->UpdateTangents(); //Save data to file file->Save("out.gmf"); //Cleanup - This will get rid of all created objects delete file; You can get the GMF2 SDK on GitHub.
  14. Josh

    Supernatural City

    There is a very strange place I know. I might be able to sneak in, in two weeks.
  15. Upon final release, yes.
  • Create New...