
Unicode in Leadwerks 5

Josh


I've begun implementing unicode in Leadwerks Game Engine 5.  It's not quite as simple as "switch all string variables to another data type".

First, I will give you a simple explanation of what unicode is.  I am not an expert so feel free to make any corrections in the comments below.

When computers first started drawing text we used a single byte for each character.  One byte can describe 256 different values, and the English language only has 26 letters, 10 digits, and a few punctuation marks, so all was well.  No one cared or thought about supporting other languages like German with funny umlauts or the thousands of characters in the Chinese language.

Then some people who were too smart for their own good invented a really complicated system called unicode.  Unicode allows characters beyond the 256-character limit of a byte because it can use more than one byte per character.  But unicode doesn't really store a letter, because that would be too easy.  Instead it stores a "code point", which is an abstract concept.  Unfortunately the people who originally invented unicode were locked away in a mental asylum where they remain to this day, so no one in the real world actually understands what a code point is.

There are several kinds of unicode, but the one favored by nerds who don't write software is UTF-8.  UTF-8 uses just one byte per character, unless it uses two, or three, or sometimes four.  Because each character can be a different length, there is no way to quickly get a single letter of a string.  It would be like trying to get a single byte of a compressed zip file: you have to decompress everything in front of it to read a byte at a certain position.  This means that commands like Mid(), Upper(), and basically any other command that works one character at a time will not work correctly with UTF-8 strings unless they decode the text first.
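The sketch below makes the zip-file comparison concrete. It shows that finding the n-th character of a UTF-8 string means walking every byte in front of it. The helper names Utf8Length and Utf8Offset are hypothetical and are not Leadwerks functions. The sketch also assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// True if b is a UTF-8 continuation byte (bit pattern 10xxxxxx).
inline bool IsContinuationByte(unsigned char b) { return (b & 0xC0) == 0x80; }

// Counts code points by counting every byte that starts a character.
// Assumes valid UTF-8.
std::size_t Utf8Length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char b : s) {
        if (!IsContinuationByte(b)) ++count;
    }
    return count;
}

// Returns the byte offset where the n-th code point (0-based) starts, or
// s.size() if n is past the end. There is no shortcut: every byte before the
// target must be examined, so character indexing is O(n) in UTF-8.
std::size_t Utf8Offset(const std::string& s, std::size_t n) {
    std::size_t seen = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (!IsContinuationByte(static_cast<unsigned char>(s[i]))) {
            if (seen == n) return i;
            ++seen;
        }
    }
    return s.size();
}
```

For example, the string "h\xC3\xA9llo" ("héllo") is six bytes long but holds five characters. Its third character starts at byte 3, not byte 2.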

Nonetheless, some people still promote UTF-8 religiously because they think it sounds cool and they don't actually write software.  There is even a UTF-8 Everywhere Manifesto.  You know who else had a manifesto?  This guy, that's who:

[Image: Karl Marx]

Typical UTF-8 proponent.

Here's another person with a "manifesto":

[Image: Theodore Kaczynski]

The Unabomber (unibomber? Coincidence???)

The fact is that anyone who writes a manifesto is evil, therefore UTF-8 proponents are evil and should probably be imprisoned for crimes against humanity.  Microsoft sensibly solved this problem by using something called a "wide string" for all the Windows internals.  A C++ wide string (std::wstring) is a string made up of wchar_t values instead of char values.  (The std::string data type is sometimes called a "narrow string".)  In C++ you can set the value of a wide string by placing the letter "L" (for long?) in front of the string literal:

std::wstring s = L"Hello, how are you today?";

On Windows a wchar_t value is two bytes, but the C++ standard does not actually fix its size.  On Linux and macOS it is usually four bytes, so wide strings do not behave exactly the same across operating systems.  (C++11 added the char16_t and char32_t types for code that needs a fixed width.)  On Windows a single wchar_t cannot hold a character with an index greater than 65535.  Those characters have to be stored as a two-unit "surrogate pair", but they are uncommon in most text.  Wide strings on Windows are basically a different kind of unicode called UTF-16, and these will actually work with string manipulation commands (yes, there are exceptions if you are trying to display ancient Vietnamese characters from the 6th century, but no one cares about that).
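This small sketch shows how wide strings differ between platforms. The helper WideUnitsFor and the constant kWideCharBytes are hypothetical. WideUnitsFor reports how many wchar_t units one code point occupies, given the size of wchar_t on the compiler being used.

```cpp
#include <cstddef>
#include <string>

// The standard leaves the width of wchar_t to the platform: 2 bytes on Windows
// (UTF-16 code units) and usually 4 bytes on Linux and macOS (UTF-32).
constexpr std::size_t kWideCharBytes = sizeof(wchar_t);

// Number of wchar_t units needed for one code point on this platform.
// With a 16-bit wchar_t, code points above U+FFFF need a surrogate pair.
std::size_t WideUnitsFor(char32_t codePoint) {
    if (kWideCharBytes == 2 && codePoint > 0xFFFF) return 2;
    return 1;
}
```

Take an emoji such as U+1F600: L"\U0001F600" has a size() of 2 on Windows but 1 on Linux. Code that treats each wchar_t as one character therefore behaves differently across platforms once such characters appear.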

For more, you can read this article about the technical details and history of unicode (thanks @Einlander).

First Pass

At first I thought "no problem, I will just turn all string variables into wstrings and be done with it".  However, after a couple of days it became clear that this would be problematic.  Leadwerks interfaces with a lot of third-party libraries like Steamworks and Lua that make heavy use of strings.  Typically these libraries accept a char* value for input, which might be UTF-8 or might not be (another reason UTF-8 is evil).  The engine ended up with a TON of string conversions that I might be doing for no reason.  I got the compiler down to 2991 errors before I started questioning whether this was really needed.
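A wide string has to be converted before it can cross into one of those char* APIs, and that is where the conversions pile up. Below is a minimal sketch of one such conversion, which encodes a std::wstring as UTF-8. WideToUtf8 is a hypothetical name, not the engine's String() function. The sketch also assumes the input holds valid code points.

```cpp
#include <cstddef>
#include <string>

// Hypothetical WideToUtf8(): encodes a std::wstring as UTF-8 so it can be passed
// to a library that expects const char*. Works whether wchar_t holds UTF-16
// (2 bytes, Windows) or UTF-32 (4 bytes, Linux/macOS). Assumes valid input; an
// unpaired surrogate is encoded as-is rather than rejected.
std::string WideToUtf8(const std::wstring& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = static_cast<char32_t>(in[i]);

        // With 16-bit wchar_t, join a high/low surrogate pair into one code point.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            const char32_t low = static_cast<char32_t>(in[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }

        // Emit 1 to 4 bytes depending on the size of the code point.
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The result's c_str() could then be handed to a char* function such as Lua's lua_pushstring().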

Exactly what do we need unicode strings for?  There are three big uses:

  • Read and save files.
  • Display text in different languages.
  • Print text to the console and log.

Reading files is mostly an automatic process because the user typically uses relative file paths.  As long as the engine internally uses a wide string to load files, the user can happily use regular old narrow strings without a care in the world (and most people probably will).
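Here is one way this could work, sketched with C++17's std::filesystem. ReadTextFile is a hypothetical helper, not an engine command. The caller passes a plain narrow relative path, and the helper joins it onto the current directory. std::filesystem stores that directory as a native wide string on Windows. The sketch assumes the relative part is plain ASCII.

```cpp
#include <filesystem>
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: reads a whole file given a narrow relative path.
// current_path() keeps the working directory in the platform's native
// encoding (wide on Windows), so a directory containing non-ASCII characters
// still opens even though the caller never touches a wide string.
// Non-ASCII characters in relativePath itself are NOT handled portably here.
std::string ReadTextFile(const std::string& relativePath) {
    const std::filesystem::path full = std::filesystem::current_path() / relativePath;
    std::ifstream file(full, std::ios::binary);
    if (!file) return {};  // missing or unreadable file: return an empty string
    std::ostringstream contents;
    contents << file.rdbuf();
    return contents.str();
}
```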

Drawing text to the screen or on a GUI widget is very important for supporting different languages, but that is only one use.  Is it really necessary to convert every variable in the engine to a wide string just to support this one feature?

Printing strings is even simpler.  Can't we just add an overload to print a wide string when one is needed?

I originally wanted to avoid mixing wide and narrow strings, but even with unicode support most users are probably not even going to need to worry about using wide strings at all.  Even if they have language files for different translations of their game, they are still likely to just load some strings automatically without writing much code.  I may even add a feature that does this automatically for displayed text.  So with that in mind, I decided to roll everything back and convert only the parts of the engine that would actually benefit from unicode and wide strings.

Second Try + Global Functions

To make the API simpler Leadwerks 5 will make use of some global functions instead of trying to turn everything into a class.  Below are the string global functions I have written:

std::string String(const std::wstring& s);
std::string Right(const std::string& s, const int length);
std::string Left(const std::string& s, const int length);
std::string Replace(const std::string& s, const std::string& from, const std::string& to);
int Find(const std::string& s, const std::string& token);
std::vector<std::string> Split(const std::string& s, const std::string& sep);
std::string Lower(const std::string& s);
std::string Upper(const std::string& s);
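The declarations above only give signatures. As an illustration, here is one possible implementation of Replace() and Split() for narrow strings that uses only the standard library. It is a sketch, not the engine's code.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Possible Replace(): substitutes every occurrence of `from` with `to`.
// An empty `from` leaves the input unchanged.
std::string Replace(const std::string& s, const std::string& from, const std::string& to) {
    if (from.empty()) return s;
    std::string result;
    std::size_t start = 0;
    std::size_t pos;
    while ((pos = s.find(from, start)) != std::string::npos) {
        result.append(s, start, pos - start);  // text before the match
        result += to;
        start = pos + from.size();
    }
    result.append(s, start, std::string::npos);  // remaining tail
    return result;
}

// Possible Split(): breaks `s` at each `sep`, keeping empty pieces that appear
// between adjacent separators.
std::vector<std::string> Split(const std::string& s, const std::string& sep) {
    std::vector<std::string> parts;
    if (sep.empty()) {
        parts.push_back(s);
        return parts;
    }
    std::size_t start = 0;
    std::size_t pos;
    while ((pos = s.find(sep, start)) != std::string::npos) {
        parts.push_back(s.substr(start, pos - start));
        start = pos + sep.size();
    }
    parts.push_back(s.substr(start));
    return parts;
}
```

With this version, Split("a,b,,c", ",") returns four pieces, the third of which is empty.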

There are equivalent functions that work with wide strings.

std::wstring StringW(const std::string& s);
std::wstring Right(const std::wstring& s, const int length);
std::wstring Left(const std::wstring& s, const int length);
std::wstring Replace(const std::wstring& s, const std::wstring& from, const std::wstring& to);
int Find(const std::wstring& s, const std::wstring& token);
std::vector<std::wstring> Split(const std::wstring& s, const std::wstring& sep);
std::wstring Lower(const std::wstring& s);
std::wstring Upper(const std::wstring& s);
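The wide versions differ from the narrow ones only in character type. One way to avoid writing everything twice is a template over std::basic_string. This sketch covers Left(), Right() and Find(). Returning -1 from Find() when nothing matches is an assumption, not documented engine behavior.

```cpp
#include <cstddef>
#include <string>

// One implementation serves both std::string (char) and std::wstring (wchar_t).
template <typename CharT>
std::basic_string<CharT> Left(const std::basic_string<CharT>& s, const int length) {
    if (length <= 0) return {};
    return s.substr(0, static_cast<std::size_t>(length));  // substr clamps to size()
}

template <typename CharT>
std::basic_string<CharT> Right(const std::basic_string<CharT>& s, const int length) {
    if (length <= 0) return {};
    const std::size_t n = static_cast<std::size_t>(length);
    return n >= s.size() ? s : s.substr(s.size() - n);
}

// Index of the first match, or -1 if `token` is not found (assumed convention).
template <typename CharT>
int Find(const std::basic_string<CharT>& s, const std::basic_string<CharT>& token) {
    const auto pos = s.find(token);
    return pos == std::basic_string<CharT>::npos ? -1 : static_cast<int>(pos);
}
```

This design has one catch: template argument deduction does not look through string literals. Left(L"abc", 2) therefore does not compile against these templates. The caller has to pass an actual std::wstring, or thin non-template overloads have to forward to the template.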

The System::Print() command has become a global Print() command with a couple of overloads for both narrow and wide strings:

void Print(const std::string& s);
void Print(const std::wstring& s);

The file system commands are now global functions as well.  File system commands can accept a wide or narrow string, but any functions that return a path will always return a wide string:

std::wstring SpecialDir(const std::string);
std::wstring CurrentDir();
bool ChangeDir(const std::string& path);
bool ChangeDir(const std::wstring& path);
std::wstring RealPath(const std::string& path);
std::wstring RealPath(const std::wstring& path);

This means that if you call ReadFile("info.txt") with a narrow string, the file will still load even if it is located somewhere like "C:/Users/约书亚/Documents".  This is ideal because Lua 5.3 doesn't support wide strings, so your game will still run on computers around the world as long as you just use local paths like this:

LoadModel("Models/car.mdl");

Or you can specify the full path with a wide string:

LoadModel(CurrentDir() + L"Models/car.mdl");
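Path functions like CurrentDir(), RealPath() and ChangeDir() could be built on C++17's std::filesystem, whose path type converts from both narrow and wide strings. The following sketch is an assumption about how this might be done, not the engine's implementation. Taking a const std::filesystem::path& lets one function accept either string type, in place of the separate narrow and wide overloads declared earlier. This RealPath only normalizes the path lexically and does not resolve symbolic links.

```cpp
#include <filesystem>
#include <string>
#include <system_error>

// Hypothetical sketch: path helpers that accept narrow or wide input (both
// convert implicitly to std::filesystem::path) and always return wide strings.

// Returns the current working directory as a wide string.
std::wstring CurrentDir() {
    return std::filesystem::current_path().wstring();
}

// Makes a path absolute and removes "." and ".." components lexically.
// Unlike a true realpath(), it does not resolve symbolic links or check that
// the path exists.
std::wstring RealPath(const std::filesystem::path& path) {
    return std::filesystem::absolute(path).lexically_normal().wstring();
}

// Changes the working directory; returns false instead of throwing on failure.
bool ChangeDir(const std::filesystem::path& path) {
    std::error_code ec;
    std::filesystem::current_path(path, ec);
    return !ec;
}
```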

The window creation and text drawing functions will also get an overload that accepts wide strings.  Here's a window created with a Chinese title:

[Image: a window created with a Chinese title]

So in conclusion, unicode will be used in Leadwerks and will work for the most part without you needing to know or do anything different, allowing games you develop (and Leadwerks itself) to work correctly on computers all across the world.



5 Comments



Great stuff. It'll be useful for multiplayer, especially games that use the Steamworks API, since a lot of Steam users have wingdings in their names, and the symbols disappear when their names show in the chat box in the current version of LE.


Worked all day on it, and now both of these lines of code will successfully load a model.  The first uses the wide string function overload, and the second uses a UTF-8 narrow string, which is then converted into a wide string by the engine:

auto model = LoadModel(L"Models/汽车/formation1.mdl");
auto model = LoadModel(u8"Models/汽车/formation1.mdl");

This code, however, will not work:

auto model = LoadModel("Models/汽车/formation1.mdl");

But this will:

ChangeDir(L"Models/汽车");
auto model = LoadModel("formation1.mdl");
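For the u8 version to work, the engine has to decode the UTF-8 bytes into a wide string before opening the file. The plain narrow literal most likely fails because, without the u8 prefix, the compiler (MSVC by default) encodes it in the system code page rather than UTF-8. A minimal decoder might look like the following. Utf8ToWide is a hypothetical name. A production decoder should also reject overlong encodings and encoded surrogates, which this sketch does not.

```cpp
#include <cstddef>
#include <string>

// Hypothetical Utf8ToWide(): decodes UTF-8 bytes into a std::wstring. When
// wchar_t is 16 bits (Windows), code points above U+FFFF become surrogate
// pairs; when it is 32 bits (Linux, macOS), each code point is one wchar_t.
// Invalid or truncated sequences are replaced with U+FFFD.
std::wstring Utf8ToWide(const std::string& in) {
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out += static_cast<wchar_t>(0xFFFD); ++i; continue; }

        // Every expected continuation byte must exist and look like 10xxxxxx.
        bool ok = i + extra < in.size();
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) { out += static_cast<wchar_t>(0xFFFD); ++i; continue; }
        i += extra + 1;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            cp -= 0x10000;  // split into a high and a low surrogate
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}
```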

 


Are you wrapping your definitions in extern "C" { }?

Would be cool to use something like Golang with Leadwerks. Performance might suck, but it would be worth a shot IMO.


It would be interesting if the Asian market latches on to Leadwerks, since it is a lower long-term cost than Unity and Unreal. That's a huge market. Them and Russia.

Einlander said:

It would be interesting if the Asian market latches on to Leadwerks, since it is a lower long-term cost than Unity and Unreal. That's a huge market. Them and Russia.

Russia is actually our #2 or #3 market after the U.S., if you don't count western Europe as one region.
