SourcePawn, Part 2 – New Strings

Today I’m going to touch upon a new major change from AMX Mod X that was introduced into SourcePawn: natively packed strings.

But first, an erratum: we decided to scrap the idea of using parenthetical tokens for dynamic arrays. Both array types now use brackets (‘[’ and ‘]’). The reason for this change was simple: with two different syntaxes, users would simply forget which was which. There’s no logical reason to separate them when it’s obvious whether or not your array is dynamic.

Natively Packed Strings
As readers may recall from a previous article on Pawn, strings are flat arrays of cells. This means that a string of 10 characters is rolled into a 40-byte array (assuming 4-byte cells), with one cell per character. This was a huge burden on AMX Mod X. Constantly copying strings back and forth is an intensive process. It requires a huge number of memory reads and writes to mismatched memory widths, which is almost always a big performance hit on any processor. While we were able to optimize this somewhat with our improved atcprintf() template, the fact remains that 99% of string operations in AMX Mod X require copying back and forth between temporary buffers.
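
To make that cost concrete, here is a rough C sketch of the kind of widening and narrowing copy that a cell-per-character layout forces on the host. It is not the actual AMX Mod X code: the `cell` typedef (assumed to be 32 bits) and both function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t cell;  /* assumption: a 32-bit cell */

/* Widen a C string into one-character-per-cell form (terminator included).
 * Returns the number of cells written. */
size_t unpacked_from_c(cell *dst, size_t dst_cells, const char *src)
{
    assert(dst != NULL && src != NULL && dst_cells > 0);
    size_t i = 0;
    while (i + 1 < dst_cells && src[i] != '\0') {
        dst[i] = (unsigned char)src[i];  /* 1-byte read, 4-byte write */
        i++;
    }
    dst[i] = 0;
    return i + 1;
}

/* Narrow a cell-per-character string back into a C string.
 * Returns the string length in characters. */
size_t unpacked_to_c(char *dst, size_t dst_bytes, const cell *src)
{
    assert(dst != NULL && src != NULL && dst_bytes > 0);
    size_t i = 0;
    while (i + 1 < dst_bytes && src[i] != 0) {
        dst[i] = (char)src[i];  /* 4-byte read, 1-byte write */
        i++;
    }
    dst[i] = '\0';
    return i;
}
```

Every native that wants to hand a string to a C library function has to run something like `unpacked_to_c` first, and something like `unpacked_from_c` to hand a result back.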

In SourcePawn we decided to change all that. While most of our modifications have been fairly localized, this introduction touches many subtle areas of the compiler, and not all of the changes may sit well with coders familiar with AMX Mod X. The first major change is that all strings are now “natively packed.” Normally, Pawn stores strings either packed in big-endian order or unpacked as an array of cells (each byte being cast to a 4- or 8-byte cell). In SourcePawn, literal strings are now packed in the native byte order. Literal strings also have the tag “String,” which means code like this:

native strlen(const string[]);

public example()
{
   new string[] = "gaben";
   return strlen(string);
}

Will now look like this:

native strlen(const String:string[]);

public example()
{
   new String:string[] = "gaben";
   return strlen(string);
}

The new “String” tag is a “magic tag,” much like Float, Fixed, and Function (which is SourcePawn specific). When you use the String tag, two things happen:

  • The last array dimension is internally changed. For example, if you wrote 40 on a normal array, it would be 40 cells. However, since it’s a String, the compiler can shrink the storage to 40 bytes (padded to a multiple of the cell size, of course).
  • When you access the cell of an array tagged with “String,” you no longer access a cell, but a byte.
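
As a rough illustration of those two points, the following C sketch models a String array as bytes laid over cell storage. It is only a model of the idea, not the compiler’s actual code; the 32-bit `cell` typedef and the helper names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t cell;  /* assumption: a 32-bit cell */

/* Cells needed to hold `bytes` bytes, rounded up to a whole cell. */
size_t cells_for_bytes(size_t bytes)
{
    return (bytes + sizeof(cell) - 1) / sizeof(cell);
}

/* Copy a C string (with terminator) into cell storage, byte for byte. */
void string_store(cell *storage, size_t storage_cells, const char *src)
{
    size_t bytes = strlen(src) + 1;
    assert(storage != NULL && cells_for_bytes(bytes) <= storage_cells);
    memcpy(storage, src, bytes);
}

/* Indexing a String array addresses a byte, not a cell. */
unsigned char string_byte(const cell *storage, size_t index)
{
    assert(storage != NULL);
    return ((const unsigned char *)storage)[index];
}
```

Under these assumptions a declared size of 40 needs `cells_for_bytes(40)`, i.e. 10 cells, and “gaben” plus its terminator (6 bytes) is padded out to 2 cells.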

The first point comes at a price: the sizeof() operator will return the size in cells, not bytes, and thus the convenience of passing non-hardcoded string lengths is lost. As such, this might be changed for the first public release. But it’s the second point that has more subtle consequences. Take this code, for example:

new String:str[] = "gaben";
new a = str[1];

Although str[1] and a both “look” like cells, they are in fact different data types, despite Pawn’s typeless mantra. When you use str[1], you are using a single byte. But when you assign it to a, you are casting it to a cell. This has a few implications for passing strings by reference. In Pawn, you can “slice” arrays and pass the array slice as a separate array. This is very convenient, and is similar to the address operator in C:

native print(const String:str[]);
native printc(c);

public test()
{
   new String:str[] = "gaben";
   printc(str[0]); //print first char
   print(str[1]); //print everything after 1st character
}

We made sure that this feature was extended to String arrays. The following is a different story:

native getc(&a);

public test()
{
   new String:str[] = "gaben";

   getc(str[0]);
}

This code is now illegal. Can you see why? It is because the reference types no longer match. It’s easy to cast a cell to a character and back again, but most languages forbid implicitly casting/coercing references because they are internally pointers. Here, str[0] would pass a pointer to a byte, not a cell, and the native would be getting garbage. There is a fix for this: in theory, the compiler could create a temporary cell, cast the byte into it, pass the address of the temporary cell, call the native, and then copy back the new value. But rather than deal with that, it’s simply illegal syntax for now. The correct way to do this, as it would be in other languages, is:

public test()
{
   new String:str[] = "gaben";

   new a = str[0];
   getc(a);
   str[0] = a;
}

Don’t let that discourage you, though. That usage tends to be very rare. The benefits of natively packed strings heavily outweigh the annoyances:

  • Memory is more compact.
  • String processing is much faster – Core simply has to cast a cell * to a char *!
  • Tagging results in slightly less confusion for new coders, since arrays and strings are differentiated.
  • Tagging leads to clearer function definitions and code.
  • More native-C library calls can be used to operate on strings.
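
The second and last points can be illustrated with a small C sketch comparing the two packing orders. It is a model, not Core’s actual code: the unsigned 32-bit `ucell` typedef and the function names are assumptions. With big-endian packing the first character sits in the most significant byte of each cell, so on a little-endian host the bytes in memory are out of order; with native packing the buffer is already a valid C string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t ucell;  /* assumption: 32-bit cells */

/* Pack a C string with the first character in the most significant
 * byte of each cell. Returns the number of cells written. */
size_t pack_big_endian(ucell *dst, size_t dst_cells, const char *src)
{
    size_t len = strlen(src) + 1;  /* include the terminator */
    size_t cells = (len + sizeof(ucell) - 1) / sizeof(ucell);
    assert(cells <= dst_cells);
    for (size_t c = 0; c < cells; c++) {
        ucell v = 0;
        for (size_t b = 0; b < sizeof(ucell); b++) {
            size_t i = c * sizeof(ucell) + b;
            unsigned char ch = (i < len) ? (unsigned char)src[i] : 0;
            v = (v << 8) | ch;
        }
        dst[c] = v;
    }
    return cells;
}

/* Pack natively: the bytes land in memory in string order. */
size_t pack_native(ucell *dst, size_t dst_cells, const char *src)
{
    size_t len = strlen(src) + 1;
    size_t cells = (len + sizeof(ucell) - 1) / sizeof(ucell);
    assert(cells <= dst_cells);
    memset(dst, 0, cells * sizeof(ucell));  /* zero the padding bytes */
    memcpy(dst, src, len);
    return cells;
}

/* With native packing, host code can treat the buffer as a C string. */
size_t native_strlen(const ucell *packed)
{
    return strlen((const char *)packed);
}
```

Here `native_strlen` is nothing more than a pointer cast followed by the standard `strlen`, with no intermediate buffer.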

Bonus Question: Why is it bad practice to implicitly cast references?

Answer to last bonus question: If you could copy an array reference (i.e. squirrel it away somewhere), then the original variable could go out of scope while references to it were still floating around. Therefore, we would need two things: an actual heap which had a non-stack tracking structure, and a garbage collector. One of Pawn’s best properties is not needing a garbage collector (or even manual memory management), so copying array references is not a feature that will be coming anytime soon, although we’re giving thought to dynamic global allocation.
