Saturday, March 23, 2013

Elemental Resistances

A map from Karnov.
Final Fantasy uses a simple system for calculating elemental resistance and weakness. It packs eight 1-bit resistance flags into one byte and eight 1-bit weakness flags into a second byte. Storing two bits per element, it can store four possible states but uses only three: normal, weak, resistant, and normal (when weak + resist cancel out). It would not be hard to create a fourth state, say absorption, in the same space. You can just pack four elements as 2-bit codes into one byte, and the other four in the other. However, there is one catch: how do we combine the bonuses granted from multiple items? We can no longer just and or or them together any more.

First off, we need to decide how the various parameters combine. Here are the obvious combinations (in columns to save space):

Key: A=absorb, R=resist,N=normal,W=weak
A A A ? A R R ? A R N W ? ? W W

Anything combined with normal is unchanged. Anything combined with its self is unchanged. Absorb combined with Resist is Absorb (the better of the two). Weak combined with Absorb or Resist is non-obvious because there’s gameplay considerations involved. For reasons explained later, I am using W+A->A and W+R->R for now.

The simplest approach to implement this table is with an actual table. We mask off each element one at a time and combine them into an index into our lookup table.  (A=0,R=1,N=2,W=3) Here is the code to do it:

unsigned short resistCombine(unsigned short a, unsigned short b) {
    static unsigned short table[16] = {0,0,0,0,0,1,1,1,0,1,2,3,0,1,3,3};
    unsigned short result = 0;
    for (int i = 14; i >= 0; i-=2) {
        unsigned short aBits = (a >> i) & 3;
        unsigned short bBits = (b >> i) & 3;
        result = (result << 2) | (table[(aBits << 2) | bBits]);
    return result;

At a glance, I see 11 ops per element. This multiplied by the eight elements totals 88 ops. That seems like a lot of work for such a simple task. I know an optimizing compiler may reduce it some, and early optimization is a bad idea, but the point of this article is to teach you a new trick, so we'll pretend that this is a problem worth solving.

Let us think about the problem again. We have two unsigned shorts that resemble a SIMD register: Eight two-bit integers. (I wonder what the Intel would make the opcode prefix?)  In addition, the work is a mapping of four bits to two bits.  If you have ever taken a digital logic course, you know it is time for a Karnaugh map. I you have not, read up here. In the original table, we have some unknowns. These unknowns are called “don't care” values in the K-map. Don’t cares are nice because we can set them to whatever value makes the equation simpler.  After a cursory search, I found this site for solving K-maps.

There are two K-maps to solve: one for the high bit of the result one for the low bit.
Normally, you would leave the don't-cares in the solver, but our don’t cares interact. I want W+A = A+W and W+R = R+W which means we really have two pairs of don't cares instead of four don't cares. Also we don't want some weird results like W+R -> A. So instead, I solved each map four ways and we can pick the best pair from there.
‘A’ is the high bit of the first input; ‘B’ is the low bit of the first input; ‘C’ is the high bit of the second input; and ‘D’ is the low bit of the second input. The A column in the table represents the value of the don’t care bits for the W+A combination and the R column represents the don’t care bits of the W+R combination. Because I don't have fancy typography options on the blog I've used lower case letters to represent inverted values: 'a' = 'not A'.

AR   Low Bit
00 = BCd+AbD+ACD+aBcD : 16 ops
01 = BD+BC+AD : 5 ops
10 = BCd+bCD+AbD+ABC+ABd+aBcD : 24 ops
11 = AB+CD+BD+BC+AD : 9 ops

AR   High Bit
00 = AC : 1 ops
01 = AC+BCD+ABD : 7 ops
10 = AC+bCD+ABd : 9 ops
11 = AB+CD+AC :  5 ops

So lets try out our best case: the 5 ops low bit, and 1 op high bit. This results in W+A->A (take the 0 from the 'A' column from the high bit section and the 0 from the 'A' column of the low bit section) and W+R->R (take the 0 from the 'R' column from the high bit section and the 1 from the 'R' column of the low bit section). These seem like plausible values. Lets see how the code fairs:

unsigned short resistCombine2(unsigned short a, unsigned short b){
    //move the high and low bits of each resist to separate variables
    unsigned short A = (a>>1)&0x5555; //high bits
    unsigned short B = a&0x5555; //low bits
    unsigned short C = (b>>1)&0x5555; //high bits
    unsigned short D = b&0x5555; //low bits
    return ((A&C)<<1) | ((B&D)|(B&C)|(A&D));

Real world (totally un-scientific) benchmarks
Original: 4.87779s
Improved: .40409s
1207% Speedup!

Now that I have it super optimized, it is important to point out that this code is next to worthless. Sure, it runs fast, but what happens when you hand off the code to someone else. You can not attach a picture of you notebook into the comments. Even worse, what if W+A->A leads to imbalanced itemization and you need to change it to R or N to compensate? Then you would have to re-solve all the equations and recode the function again instead of just editing the table. In this case, where the function is only called when a player changes their armor, it is much better to leave it as the inefficient version. Though, I sure had fun solving those equations... It takes me back.