January 19, 2024 - Fixing Undefined Behavior

Finally. IT's done

So what's this all about? Well, Rustic has had a dirty little secret since its very first Alpha 1 version: despite using safe Rust being a top priority for the engine, Rustic has had a little bit of unsafe code tucked away in a dark corner somewhere. That in itself isn't that big of a deal, but in this case, this unsafe code expected a certain outcome, where Rust defines this outcome as undefined behavior. That's bad.

Why? Because it isn't called "Undefined Behavior" for nothing. It means that you're depending on a behavior of which the compiler deems that it should not exist. This can cause trouble with newer versions of the compiler, or when enabling optimizations.

People coming from C or C++ may already recognize this. You may have been in this situation yourself. For example, have you ever encountered the weird problem that a program suddenly doesn't work right, when turning on the -O2 or -O3 optimizations in GCC, while it does work as expected when leaving out or only basic optimization? That happens when the program depends on a characteristic of the non-optimized compile, which changes when optimizations are turned on at a certain level. That's why it's called "undefined behavior." I also know of a at least one chess engine that is being developed and compiled with a very old version of GCC, because newer versions fixed an undefined behavior the engine depends upon; in other words, when compiling that engine with newer versions of GCC, it won't work right.

Where did Rustic suffer from this? The problem was in the move list. When first developing Rustic, it was very slow compared to what I was expecting. After profiling and researching, the problem turned out to be in the creation of the move list. It is best to create a move list in an array, backed by a counter, as compared to just using a vector. The reason is that an array is created on the stack, while a vector lives on the heap. The array is faster to set up. However, Rust requires that all variables need to be initialized before they are used. Therefore, when creating an array to hold 255 moves, Rust declares the memory and writes a 0 into each location. After that, the move generator runs, it will be overwriting 30 or 40 zero's, so it is doing lots of double work.

The solution is to get Rust to create the array without initializing it. My first attempt in Rustic Alpha 1 looked like this:

type TMoveList = [Move; MAX_LEGAL_MOVES as usize];
type RawMoveList = MaybeUninit<TMoveList>;

#[derive(Copy, Clone)]
pub struct MoveList {
    list: TMoveList,
    count: u8,
}

impl MoveList {
    pub fn new() -> Self {
        Self {
            list: unsafe { RawMoveList::uninit().assume_init() },
            count: 0,
        }
    }
}

The RawMoveList is an array of TMoveList, which contains Move elements. The RawMoveList may not be initialized. When creating the MoveList struct, the 'list' member gets set to an 'assumed initialized version' of the RawMoveList. This is obviously not true. The 'list' member is of this type:

type TMoveList = [Move; MAX_LEGAL_MOVES as usize]

And the RawMoveList is of this type:

type RawMoveList = MaybeUninit<TMoveList>;

What I'm stating in the constructor of MoveList basically says: take an uninitialized version of RawMoveList, assume it is initialized, and put it into the 'list' member of type TMoveList.

The idea behind this is that the move list does not have to be initialized, because the move generator will receive this list and then write its moves into it. From the viewpoint of Rust however, this is undefined behavior: at the end of the constructor, the 'list' member is not initialized even though we say it is. Rust warns about this, because the compiler may run optimizations on the 'list' member as if it was initialized, and those optimizations may not work because actually, the list wasn't initialized. In the past this was only a warning, but in current Rust versions, using MaybeUnit like this is an error.

I got around it by doing this:

impl MoveList {
    pub fn new() -> Self {
        Self {
            list: unsafe {
                let block = RawMoveList::uninit();
                
                // Initialization should be here!
                
                block.assume_init()
            },
            count: 0,
        }
    }
}

This obviously isn't the solution, because this works in exactly the same way as the previous version. The only difference is that the compiler and linter don't complain about it, because the uninit() and assume_init() calls are split up. I imagine that later versions of Rust will be able to detect that no initialization happens between these two calls and the error message about undefined behavior will return. Also, I don't want to pass the move generator into the move list struct and then pass the 'list' member into the move generator. That would be exceedingly ugly, and the wrong way around.

In Rustic 4-beta I finally took the time to fixing it:

type MoveListArray = [Move; MAX_LEGAL_MOVES as usize];
pub type MoveListRaw = [MaybeUninit<Move>; MAX_LEGAL_MOVES as usize];

pub fn allocate_move_list_memory() -> MoveListRaw {
    unsafe { MaybeUninit::uninit().assume_init() }
}

#[derive(Copy, Clone)]
pub struct MoveList {
    list: MoveListArray,
    count: u8,
}

impl MoveList {
    pub fn new(raw: &MoveListRaw, count: u8) -> Self {
        Self {
            list: unsafe { mem::transmute::<_, MoveListArray>(*raw) },
            count,
        }
    }
}

TMoveList has been renamed to MoveListArray. This is a normal array holding Move elements. MoveListRaw is now a public type, and contains move list elements that have possibly not been initialized. This is different from the previous version. Compare them:

// old
type RawMoveList = MaybeUninit<[Move; MAX_LEGAL_MOVES as usize]>;

// new
pub type MoveListRaw = [MaybeUninit<Move>; MAX_LEGAL_MOVES as usize];

In the old version, we had an uninitialized array that was filled with Move elements. Now we have an initialized array that is filled with possibly uninitialized Move elements. So now its the elements that aren't initialized, not the entire array.

Therefore we can now do this:

pub fn allocate_move_list_memory() -> MoveListRaw {
    unsafe { MaybeUninit::uninit().assume_init() }
}

So why can these calls now be chained without issue? Because we are working with a MoveListRaw, of type [MaybeUninit; 255]. What we are returning here is an array initialized with elements that are possibly not initialized. (Yeah, read that again.) The compiler now knows to not touch that array with optimizations, because it's elements are possibly not initialized yet, and thus can't be optimized.

This is the MaybeUninit documentation, which describes the technique I'm using here.

We use it in this way:

let mut memory = allocate_move_list_memory();
let mut move_list = move_gen.generate_moves(..., &mut memory, ...);

'memory' holds this raw move list that is initialized with possibly uninitialized elements. We pass this 'memory' variable into the move generator. There, we obviously had to replace all "MoveList" types with "MoveListRaw", because that is what the move generator needs to accept now. In the move generator itself, we only have to change two lines, from this:

list.push(Move::new(move_data));
list.push(Move::new(move_data | promotion_piece))

To this:

memory[*count as usize].write(Move::new(move_data));
memory[*count as usize].write(Move::new(move_data | promotion_piece));

And we need to count the moves that are added. Why? Because we are no longer working with the MoveList struct; we are working with an array containing uninitialized Move elements. So instead of pushing moves into the array, we now need to write the data into it. What write() does, is put the Move data into the MaybeUninit struct, so the MaybeUninit is now initialized.

At the end of the function generate_moves() we can now say:

self.init_unused_part_of(memory, count);
MoveList::new(memory, count)

pub fn init_unused_part_of(&self, memory: &mut MoveListRaw, count: u8) {
    for i in count..MAX_LEGAL_MOVES {
        memory[i as usize].write(Move::new(0));
    }
}

This writes Moves with 0 data into the part of the MoveListRaw we are not using, and then we return a new, proper MoveList struct from the move_generator() function. The constructor does this:

impl MoveList {
    pub fn new(raw: &MoveListRaw, count: u8) -> Self {
        Self {
            list: unsafe { mem::transmute::<_, MoveListArray>(*raw) },
            count,
        }
    }
}

It "transmutes" the incoming MoveListRaw into a MoveListArray. Remember, MoveListRaw and MoveListArray are basically the same type, except that MoveListRaw says that the elements of the array MaybeUninit, instead of Move. MaybeUninit guarantees that the memory layout is exactly the same as that of T, so what transmute basically does, is stripping the MaybeUninit from the array's elements, which makes the elements now correctly initialized.

Why all of these shenanigans? Well, if you just have Rust create a new MoveList, it will fill the array with zero's, after which the move generator is going to write its own moves into it. This is double work, and it happens in the hottest part oof the code, which slows down the engine significantly. Doing this by using MaybeUninit, we lose a bit of speed because of the transmute function, but this loss is far outweighed by the speed we gain by not doing double work millions of times very second.

Now, get out of here and write some more chess engine code.