Archive for November, 2006

SourcePawn, Part 2 – New Strings

Tuesday, November 7th, 2006

Today I’m going to touch upon a new major change from AMX Mod X that was introduced into SourcePawn: natively packed strings.

But first, an erratum: we decided to scrap the idea of using parenthetical tokens for dynamic arrays. Both array types now use brackets ('[' and ']'). The reason for this change was simple: with two different syntaxes, users would simply forget which was which. There’s no logical reason to separate them when it’s obvious whether or not your array is dynamic.

Natively Packed Strings
As readers may recall from a previous article on Pawn, strings are flat arrays of cells. This means that a string of 10 characters is rolled into a 40 byte array, with one cell per character. This was a huge burden on AMX Mod X. Constantly copying strings back and forth is an intensive process. It requires a huge number of memory reads and writes to mismatched memory widths, which is almost always a big performance hit on any processor. While we were able to optimize this somewhat with our improved atcprintf() template, the fact remains that 99% of string operations in AMX Mod X require copying back and forth between temporary buffers.
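To make the cost concrete, here is a rough C sketch of the kind of copy the host has to do for a cell-per-character string. It assumes 32-bit cells; the names cells_to_chars and unpacked_bytes are made up for illustration and are not AMX Mod X functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t cell;   /* assumption: 32-bit cells */

/* Copy a cell-per-character string into a C char buffer.
 * Each character costs one 4-byte read and one 1-byte write.
 * Returns the number of characters copied, excluding the terminator. */
size_t cells_to_chars(const cell *src, char *dst, size_t dst_size)
{
    size_t i = 0;
    assert(src != NULL && dst != NULL);
    if (dst_size == 0)
        return 0;
    while (i + 1 < dst_size && src[i] != 0) {
        dst[i] = (char)src[i];   /* narrow one cell to one byte */
        i++;
    }
    dst[i] = '\0';
    return i;
}

/* Bytes occupied by n_chars characters stored one per cell
 * (terminator not counted, matching the 10 -> 40 figure above). */
size_t unpacked_bytes(size_t n_chars)
{
    return n_chars * sizeof(cell);
}
```

Every native that wants a real C string has to run a loop like this on the way in, and the reverse loop on the way out.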

In SourcePawn we decided to change all that. While most of our modifications have been fairly localized, this change touches many subtle areas of the compiler, and not all of the changes may sit well with coders familiar with AMX Mod X. The first major change is that all strings are now “natively packed.” Normally, Pawn either packs strings in Big-Endian order or stores them as an array of cells (each byte being cast to a 4 or 8 byte cell). In SourcePawn, literal strings are now packed in the native byte order. Literal strings also have the tag “String,” which means code like this:

native strlen(const string[]);

public example()
{
   new string[] = "gaben";
   return strlen(string);
}

Will now look like this:

native strlen(const String:string[]);

public example()
{
   new String:string[] = "gaben";
   return strlen(string);
}

The new “String” tag is a “magic tag,” much like Float, Fixed, and Function (which is SourcePawn specific). When you use the String tag, two things happen:

  • The last array dimension is internally changed. For example, if you wrote 40 on a normal array, it would be 40 cells. However, since it’s a String, the compiler can reduce the number of cells you need to 40 bytes (padded to be aligned to the cellsize, of course).
  • When you access the cell of an array tagged with “String,” you no longer access a cell, but a byte.
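The two effects in the list above can be sketched in C. This assumes 32-bit cells, and string_cells and string_get are hypothetical helper names, not part of the SourcePawn compiler or runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t cell;   /* assumption: 32-bit cells */

/* Cells needed to store n_bytes of String data, rounded up so the
 * storage stays aligned to the cell size. */
size_t string_cells(size_t n_bytes)
{
    return (n_bytes + sizeof(cell) - 1) / sizeof(cell);
}

/* Indexing a String-tagged array reads one byte, which is then
 * widened to a cell when it is assigned to a normal variable. */
cell string_get(const cell *storage, size_t n_bytes, size_t i)
{
    const unsigned char *bytes = (const unsigned char *)storage;
    assert(storage != NULL && i < n_bytes);
    return (cell)bytes[i];
}
```

So a String dimension of 40 needs 10 cells under this assumption, and a 6-byte literal such as "gaben" plus its terminator needs 2.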

The first point comes at a price – the sizeof() operator will return the size in cells, not in bytes, and thus the convenience of passing non-hardcoded string lengths is lost. As such, this might be changed for the first public release. But it’s the second point that has more subtle consequences. Take this code, for example:

new String:str[] = "gaben"
new a = str[1]

Although they both “look” like cells, they are in fact different data types, despite Pawn’s typeless mantra. When you use str[1], you are using a single byte. But when you assign it to a, you are casting it to a cell. This has a few implications with passing strings by reference. In Pawn, you can “slice” arrays and pass the array slice as a separate array. This is very convenient, and is similar to the address operator in C:

native print(const String:str[]);
native printc(char);

public test()
{
   new String:str[] = "gaben";
   printc(str[0]); //print first char
   print(str[1]); //print everything after 1st character
}

We made sure that this feature was extended to String arrays. The following is a different story:

native getc(&a);

public test()
{
   new String:str[] = "gaben";
   getc(str[0]); //illegal: this is a reference to a byte, not a cell
}

This code is now illegal. Can you see why? It is because the reference types no longer match. It’s easy to cast a cell to a character and back again, but most languages forbid implicitly casting/coercing references because they are internally pointers. Thus str[0] is passing a pointer to a byte, not a cell, and the native would be getting garbage. There is a fix for this: in theory, the compiler could create a temporary cell, cast the byte into it, pass the address of the temporary cell, call the native, and then copy back the new value. But rather than deal with that, it’s simply illegal syntax for now. The correct way to do this, as it would be in other languages:

public test()
{
   new String:str[] = "gaben";

   new a = str[0];
   getc(a);
   str[0] = a;
}
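For the curious, the temporary-cell approach described above would look something like the following in C. This is only a sketch of the idea, assuming 32-bit cells; native_getc and call_with_byte_ref are invented names, and nothing like this exists in the compiler right now.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t cell;   /* assumption: 32-bit cells */

/* Hypothetical native taking a cell by reference, like getc(&a). */
void native_getc(cell *out)
{
    assert(out != NULL);
    *out = 'G';
}

/* Widen the byte into a temporary cell, pass the temporary's address,
 * then narrow the result and copy it back into the string. */
void call_with_byte_ref(char *byte_slot, void (*native)(cell *))
{
    cell tmp;
    assert(byte_slot != NULL && native != NULL);
    tmp = (cell)(unsigned char)*byte_slot;   /* byte -> cell */
    native(&tmp);                            /* native writes a whole cell */
    *byte_slot = (char)tmp;                  /* cell -> byte */
}
```

Passing the byte's own address instead would let the native write four bytes where only one belongs, clobbering the neighboring characters.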

Don’t let that discourage you, though. That usage tends to be very rare. The benefits of natively packed strings heavily outweigh the annoyances:

  • Memory is more compact.
  • String processing is much faster – Core simply has to cast a cell * to a char *!
  • Tagging results in slightly less confusion for new coders, since arrays and strings are differentiated.
  • Tagging leads to clearer function definitions and code.
  • More native-C library calls can be used to operate on strings.
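As a rough illustration of the cast mentioned in the list above, here is what a native can do with a natively packed string in C. It assumes 32-bit cells; packed_strlen and packed_slice are hypothetical names, not functions from Core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t cell;   /* assumption: 32-bit cells */

/* No copy and no temporary buffer: the packed cells already hold an
 * ordinary C string, so the C library can work on them directly. */
size_t packed_strlen(const cell *str)
{
    assert(str != NULL);
    return strlen((const char *)str);
}

/* A slice such as str[1] is just a byte offset into the same storage. */
const char *packed_slice(const cell *str, size_t byte_offset)
{
    assert(str != NULL);
    return (const char *)str + byte_offset;
}
```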

Bonus Question: Why is it bad practice to implicitly cast references?

Answer to last bonus question: If you could copy an array reference (i.e. squirrel it away somewhere), then the original variable could go out of scope while references to it were still floating around. Therefore, we would need two things: an actual heap which had a non-stack tracking structure, and a garbage collector. One of Pawn’s best properties is not needing a garbage collector (or even manual memory management), so copying array references is not a feature that will be coming anytime soon, although we’re giving thought to dynamic global allocation.

SourcePawn, Part 1 – Features Added

Friday, November 3rd, 2006

Whoa, an article! Yes, it’s been a while. After our huge forums overhaul and a number of other server moves, this section of the site was mostly forgotten during the chaos and aftermath. However, the JIT series is not dead and will return with some delicious goodies.

During the past two months, faluco and I have worked very diligently on an extremely fine-tuned, optimized JIT for Pawn (which we’re tentatively renaming as SourcePawn for SourceMod). As part of the next JIT series installments, I will be open-sourcing our library code for implementing a quick and dirty JIT.

During our writing of the new SourcePawn JIT, we also changed a few things in the language. Features were added and oddities were removed. I’d like to take a moment to explain those changes in this article. This first part will cover the three major additions we’ve made. They are: Dynamic Arrays, Function Pointers, and Fast Declarations.

Dynamic Local Arrays
This one is a biggie that has been requested over and over again for AMX Mod X. So, we took the liberty of adding “Dynamic Local Arrays.” Normally in Pawn, arrays have statically defined dimensions. You must use a constant value. Thus, creating an array based on a string length, or on the number of connected players, is impossible. Dynamic Local Arrays let you declare local arrays with a variable dimension size for any or all dimensions. An example of the syntax:

new string[] = "Gaben"   //statically dimensioned array, 6 cells
new len = StringLength(string);
new better_string[len+1]  //dynamically dimensioned array
new weird_var[len][len] //dynamically dimensioned 2D array

string[0] = weird_var[0][0] = better_string[0] = 0

There is a caveat: Creating a dynamic array is remarkably more expensive. It is always on the heap, rather than the stack, and thus it requires a special tracking mechanism in the JIT. Worse, a multi-dimensional array has to have “indirection tables” generated. For a large multi-dimensional array, I won’t lie: this process should be considered extremely expensive (the compiler’s generation of these tables is recursive, but luckily tail-recursive, and thus optimizable). So if you can make the dimensions constant, do so, as the performance will be better.
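To give a feel for what an indirection table is, here is a simplified C sketch for the 2D case. It is not the JIT's actual layout: it assumes 32-bit cells and stores plain offsets measured from the start of the block, and make_2d and at_2d are made-up names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef int32_t cell;   /* assumption: 32-bit cells */

/* Allocate a rows x cols array as one block: first a table of `rows`
 * offsets, then the zeroed data. table[i] is the offset, in cells from
 * the start of the block, of row i. Filling this table at run time is
 * the extra work a dynamic multi-dimensional array pays for. */
cell *make_2d(size_t rows, size_t cols)
{
    cell *block;
    size_t i;
    assert(rows > 0 && cols > 0);
    block = calloc(rows + rows * cols, sizeof(cell));
    if (block == NULL)
        return NULL;
    for (i = 0; i < rows; i++)
        block[i] = (cell)(rows + i * cols);
    return block;
}

/* Element access goes through the table: one extra read per lookup. */
cell *at_2d(cell *block, size_t row, size_t col)
{
    assert(block != NULL);
    return &block[(size_t)block[row] + col];
}
```

With more dimensions, each level needs its own table pointing at the next, which is where the recursion comes from.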

Astute readers will also note the emphasis I put on “local.” These arrays cannot be global, obviously, because global instantiation would require a constructor of some sort, otherwise dynamic values would not make sense.

Another fine point: these arrays are not references. Meaning, you cannot do:

new array[];
array = new magicref[strlen("hello")];

Though, this functionality will definitely be added in some form, someday.

Function Pointers and Function Type Enumeration
This was another biggie seen as a flaw in AMX Mod X. Pawn coders will remember the pain of tracking down an error like this:

native register_event(const event_name[], const hook_name[]);

public plugin_init()
{
   register_event("DeathMsg", "hookdeath")
}

public hook_death()
{
}

Oops! The hook name is misspelled. Now, our code will never run, and since it was registered in init, it is likely we won’t notice the error message. A few minutes of hair pulling later, the programmer curses: “Oh, if only the compiler could have caught that!” Now, it can.

(For those wondering why it couldn’t before: Consider that the native is only asking for an array. This means you could pass in any sort of variable, constant, or string. The compiler has absolutely no way of magically deducing that you meant a function. There might be some way to hack this with a special tag, but it would be very difficult, and would only work for literally known strings. It also would not allow for any of the features that will be described below.)

The first major change is that you can now pass functions as values, instead of by a string containing their name. For example, one might have this:

native register_event(const event_name[], EventHook:event);


public plugin_init()
{
   register_event("Damage", OnDamage);
}

This should strike you with two implications. The first is that the compiler can now typo check in case of a silly error, and even better, it can also type check, meaning it can tell you if your function is potentially declared the wrong way. The second implication is that functions no longer have to be public. One of the annoying things about AMX Mod X is that the keyword “public” came to mean “you must return something,” even if the return value was ignored! That was partially a side effect of improper tagging, but that’s a different rant.

Worse, everyone started making every function public just in case it needed to be public. This is bad. Public functions get exported to a table; a table which takes memory and adds to the relocation work of the JIT and VM. Though it’s not harmful, it’s bad practice for that reason. And worst of all is when people would combine all of these misunderstandings, and return random values in a stock-appropriate function which was declared as public.

But, I digress! We’ve only covered the “typo” benefits of this feature. The type-checking part is where the magic happens. In AMX Mod X, the “set_task()” call supports two types of function prototypes:

native set_task(Float:time, const function_name[], timer_type);
forward TimerThatHasData(const data[], timer_id);
forward TimerThatHasNoData(timer_id);

However, the compiler lets you define a public function in any way you want. This means you can declare an array where one shouldn’t exist, or a float that should be a boolean, et cetera. As soon as the host app calls into your bad function, it will crash, or at best, not work.

This was addressed in Pawn 3.1, which forced all public functions to have a pre-defined (“forwarded”) declaration. But this is a design flaw: it assumes that the user will only have one hook, since the named function has to be pre-defined. This is very bad for a lot of reasons – for example, timers and per-event hooks work most efficiently when assigned to unique functions. How can you get around that? Of course, you declare your own forward for your uniquely named function. But this defeats the purpose, since you can simply declare your forward wrong as well!

To get around this, we have introduced function types. Function types create a list of function prototypes that can be associated with a given tag. For example, in our previous example:

funcenum Timer
{
   public(const data[], timer_id),
   public(timer_id),
};

native set_task(Float:time, Timer:timer_function, timer_method);

In this revised method, the compiler will check the function you are trying to pass in. If it doesn’t match one of the given rules, you will get a tag mismatch warning (which may end up being moved to an error). The rules are strict: Your parameter count, dimension sizes, tags, return tags, and references must all match. Although you can change names, any inconsistency that could cause a potential crash or run-time error is removed.
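For readers who think in C, a function type behaves a bit like a small set of typed function pointers. The sketch below is only an analogy, assuming 32-bit cells; the Timer struct, its constructors, and the sample callbacks are all invented for illustration and say nothing about how the compiler stores function types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t cell;   /* assumption: 32-bit cells */

/* The two accepted prototypes, as C function pointer types. */
typedef cell (*TimerWithData)(const cell *data, cell timer_id);
typedef cell (*TimerNoData)(cell timer_id);

/* A Timer is exactly one of the accepted prototypes. */
typedef struct {
    enum { TIMER_WITH_DATA, TIMER_NO_DATA } kind;
    union { TimerWithData with_data; TimerNoData no_data; } fn;
} Timer;

/* Passing a function with any other signature is rejected at
 * compile time, just as a mismatched hook is flagged by funcenum. */
Timer timer_with_data(TimerWithData f)
{
    Timer t;
    assert(f != NULL);
    t.kind = TIMER_WITH_DATA;
    t.fn.with_data = f;
    return t;
}

Timer timer_no_data(TimerNoData f)
{
    Timer t;
    assert(f != NULL);
    t.kind = TIMER_NO_DATA;
    t.fn.no_data = f;
    return t;
}

/* The host calls the hook with the argument list its prototype expects. */
cell fire_timer(Timer t, const cell *data, cell timer_id)
{
    switch (t.kind) {
    case TIMER_WITH_DATA: return t.fn.with_data(data, timer_id);
    case TIMER_NO_DATA:   return t.fn.no_data(timer_id);
    }
    return 0;
}

/* Sample hooks, one per prototype. */
cell on_tick(cell timer_id) { return timer_id + 1; }
cell on_tick_data(const cell *data, cell timer_id) { return data[0] + timer_id; }
```

The important part is that the check happens before the host ever calls into the function, not when it crashes.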

There is one extra case, though. A common practice is to remove parameters which you won’t be using. Thus, we introduced “optional” parameters which can be omitted from your declaration. This looks roughly like C++’s “pure virtual” syntax:

funcenum Timer
{
   public(const data[], timer_id=0)
};

Now, the timer_id parameter can be removed.

“Fast” Garbage Declarations
Lastly, as far as major features go, we’ve introduced a new variable declaration keyword: “decl.” decl can be used in lieu of new, and it has one caveat: your variable won’t be automatically zeroed.

Automatically zeroing variables is considered very expensive. For example, code like this in AMX Mod X could easily increase CPU usage:

public server_frame()
{
   for (new i=1; i<=maxplayers; i++)
      new some_array[256]
}

Imagine: on every frame, for every player, you are zeroing 256*4 bytes (1KB) of memory! Writing to memory is expensive, and wasteful if you're a) simply going to overwrite it again and b) not going to use all of it.

In AMX Mod X, the dirty hack was to use "static" instead, which makes the array pseudo-global. But this removes re-entrancy. So for SourcePawn, the new decl keyword will skip the zeroing step, allowing you to opt into fine-tuned, optimized, but unsafe storage.
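The trade-off is the same one C programmers make with local arrays. As a loose analogy (assuming 32-bit cells, with function names made up for this example), compare a buffer that is zeroed first with one that is not:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int32_t cell;   /* assumption: 32-bit cells */

enum { BUF_CELLS = 256 };

/* Like "new": the whole 1KB buffer is zeroed before use,
 * even though only the first n cells are ever touched. */
cell sum_with_new(size_t n)
{
    cell buf[BUF_CELLS];
    cell total = 0;
    size_t i;
    assert(n <= BUF_CELLS);
    memset(buf, 0, sizeof buf);   /* the cost that decl skips */
    for (i = 0; i < n; i++)
        buf[i] = (cell)i;
    for (i = 0; i < n; i++)
        total += buf[i];
    return total;
}

/* Like "decl": no zeroing. This is only safe because every cell
 * is written before it is read; the rest holds garbage. */
cell sum_with_decl(size_t n)
{
    cell buf[BUF_CELLS];
    cell total = 0;
    size_t i;
    assert(n <= BUF_CELLS);
    for (i = 0; i < n; i++)
        buf[i] = (cell)i;
    for (i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```

Both return the same result; the second just never pays for the memset. Read a cell you have not written, though, and you get whatever was left there.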

Bonus Question:
Why didn't we implement copying array references, or global array creation?
Hint: Dynamic arrays go on the heap, but in SourcePawn, the heap is just another stack. Since you can't copy their references, they are simply "popped" off the heap once their scope dies.
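To picture the hint, here is a toy C sketch of a heap that behaves like a stack. It is not the SourcePawn runtime: it assumes 32-bit cells and a fixed-size heap, and the Heap struct and its functions are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int32_t cell;   /* assumption: 32-bit cells */

enum { HEAP_CELLS = 1024 };

typedef struct {
    cell   mem[HEAP_CELLS];
    size_t top;             /* index of the next free cell */
} Heap;

/* Remember where the heap was when a scope was entered. */
size_t heap_mark(const Heap *h)
{
    assert(h != NULL);
    return h->top;
}

/* Allocate by bumping the top, exactly like pushing onto a stack.
 * Returns NULL if the heap is full. */
cell *heap_alloc(Heap *h, size_t cells)
{
    cell *p;
    assert(h != NULL);
    if (cells > HEAP_CELLS - h->top)
        return NULL;
    p = &h->mem[h->top];
    h->top += cells;
    return p;
}

/* Leaving the scope pops everything allocated since the mark.
 * No per-array bookkeeping and no garbage collector are needed. */
void heap_release(Heap *h, size_t mark)
{
    assert(h != NULL && mark <= h->top);
    h->top = mark;
}
```

This only works while nothing outside the scope can hold on to one of those arrays.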