Wednesday, October 28, 2009

C Bitfields and HW Registers

When I need a quick low-level programming "fix", I browse the archives at PageTable.com. Last week I read the post on Readable and Maintainable Bitfields in C, which argued the merits of using bitfields over bitmasks and macros. Although I agree with the post's points, I think it omitted one important detail: the danger of using bitfields with hardware registers.

Hardware registers can be mapped into a processor's memory space and accessed with standard memory read/write instructions. Therefore the temptation is to define a bitfield type representing a register's structure and point it at the register's base memory address. For example, assuming register bar is four bytes wide, has five bitfields, and is located at memory address 0xBAADF00D, one might be tempted to do the following:

typedef struct {
    unsigned int f1:8;
    unsigned int f2:4;
    unsigned int f3:8;
    unsigned int f4:4;
    unsigned int f5:8;
} registerBar_t;


// Set pointer to register's memory address
volatile registerBar_t *pBar = (volatile registerBar_t *)0xBAADF00D;

// Use pointer to access register
pBar->f1 = 0xFE;

One problem with this approach is that many registers are designed to be accessed only at their full size - all accesses must be aligned to the register's base address and read/write the whole thing. Unfortunately, the setting of f1 in the above example may produce a partial register write that can lead to unexpected and unintended results.
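
To make the full-width requirement concrete, here is a minimal sketch of a field update that touches the register only through a volatile 32-bit pointer. The names BAR_F1_SHIFT, BAR_F1_MASK, and bar_set_f1 are hypothetical, and placing f1 at bits 7:0 is an assumption about the register's layout. Declaring the pointer volatile asks the compiler to perform each access as written, and for an aligned 32-bit object compilers typically emit one 32-bit load and one 32-bit store, but the C standard does not promise a particular instruction width, so checking the disassembly is still worthwhile.

```c
#include <stdint.h>

/* Assumption: field f1 occupies bits 7:0 of the 32-bit register. */
#define BAR_F1_SHIFT 0u
#define BAR_F1_MASK  (UINT32_C(0xFF) << BAR_F1_SHIFT)

/* Update f1 with one full-width read and one full-width write. */
static void bar_set_f1(volatile uint32_t *reg, uint32_t value)
{
    uint32_t v = *reg;                           /* single 32-bit read      */
    v &= ~BAR_F1_MASK;                           /* clear the old f1 bits   */
    v |= (value << BAR_F1_SHIFT) & BAR_F1_MASK;  /* insert the new f1 bits  */
    *reg = v;                                    /* single 32-bit write     */
}
```

The update still reads the register before writing it, so this form is only appropriate when reads of the register have no side effects.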

Another challenge when dealing with registers is that, unlike main memory, register accesses can have side effects. Even register reads can cause the hardware to initiate action or clear information. Consider the following example:

...
tmpf1 = pBar->f1;
tmpf2 = pBar->f2;
...

If reads of register bar are destructive, causing its contents to be cleared, then the read of f1 may clear the contents of f2 before the subsequent pointer dereference reads it. The resulting loss of information could cause the driver, hardware, or both to behave unexpectedly.

Alternatively, if reads of register bar cause the hardware to initiate action then spurious activity may occur if the f1 and f2 accesses are done separately.
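
One way to sidestep both problems is to read the register exactly once and pull every field out of the local copy. The sketch below is an illustration only: read_bar simulates a read-to-clear register in ordinary memory (real code would dereference a volatile pointer instead), and fake_bar, field_get, read_f1_f2, and the field positions (f1 at bits 7:0, f2 at bits 11:8) are all assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical stand-in for a read-to-clear hardware register. */
static uint32_t fake_bar = UINT32_C(0x00000EEF);
static unsigned fake_bar_reads = 0;

/* Simulated destructive read: returns the contents, then clears them. */
static uint32_t read_bar(void)
{
    uint32_t v = fake_bar;
    fake_bar = 0;
    fake_bar_reads++;
    return v;
}

/* Extract a field from an already-read snapshot (assumes width < 32). */
static uint32_t field_get(uint32_t snapshot, unsigned shift, unsigned width)
{
    return (snapshot >> shift) & ((UINT32_C(1) << width) - 1u);
}

/* Fetch f1 and f2 with a single register read. */
static void read_f1_f2(uint32_t *f1, uint32_t *f2)
{
    uint32_t snapshot = read_bar();      /* the only register access */
    *f1 = field_get(snapshot, 0, 8);     /* assumed bits 7:0         */
    *f2 = field_get(snapshot, 8, 4);     /* assumed bits 11:8        */
}
```

Because both fields come from the same snapshot, the destructive read cannot discard f2 before it is captured, and the hardware sees one access rather than two.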

Because of these granularity and side-effect issues, I was advised early in my career to avoid using bitfields with hardware registers. I think this is an important point. It is captured in the comments on the PageTable.com post but, unfortunately, not in the post itself.

After reading the PageTable.com post, I realized that I had always taken this advice on faith and had never looked at the instructions generated by bitfield accesses. So I decided to do a quick experiment. Below is a short program that accesses a bitfield with fields of varying size and alignment.

#include <stdio.h>

typedef union {
  struct {
    unsigned int f1:8; // Bits 07:00
    unsigned int f2:4; // Bits 11:08
    unsigned int f3:8; // Bits 19:12
    unsigned int f4:4; // Bits 23:20
    unsigned int f5:8; // Bits 31:24
  };
  unsigned int raw;
} bitfield_t;


int main()
{
  bitfield_t bitfield;
  unsigned int tmp;

  bitfield.raw = 0x0;

  // Set bit field f1
  bitfield.f1 = 0xEF;
  tmp = bitfield.f1;
  printf(" After f1: F1(0x%02x) RAW(0x%08x)\n", 
         tmp,
         bitfield.raw);

  // Set bit field f2
  bitfield.f2 = 0xE;
  tmp = bitfield.f2;
  printf(" After f2: F2(0x%02x) RAW(0x%08x)\n", 
         tmp,
         bitfield.raw);

  // Set bit field f3
  bitfield.f3 = 0xDB;
  tmp = bitfield.f3;
  printf(" After f3: F3(0x%02x) RAW(0x%08x)\n", 
         tmp,
         bitfield.raw);

  // Set bit field f4
  bitfield.f4 = 0xA;
  tmp = bitfield.f4;
  printf(" After f4: F4(0x%02x) RAW(0x%08x)\n", 
         tmp,
         bitfield.raw);

  // Set bit field f5
  bitfield.f5 = 0xDE;
  tmp = bitfield.f5;
  printf(" After f5: F5(0x%02x) RAW(0x%08x)\n", 
         tmp,
         bitfield.raw);

  // Set with raw
  bitfield.raw = 0xDECAFBAD;
  tmp = bitfield.raw;
  printf("After raw: RAW(0x%08x)\n", 
         tmp);

  return 0;
}

Compiling and running this program on an Ubuntu system results in the expected output.

jcardent@ubuntu:~/tmp$ gcc -g -o foo foo.c 
jcardent@ubuntu:~/tmp$ ./foo 
 After f1: F1(0xef) RAW(0x000000ef)
 After f2: F2(0x0e) RAW(0x00000eef)
 After f3: F3(0xdb) RAW(0x000dbeef)
 After f4: F4(0x0a) RAW(0x00adbeef)
 After f5: F5(0xde) RAW(0xdeadbeef)
After raw: RAW(0xdecafbad)

Running the command

jcardent@ubuntu:~/tmp$ objdump -d -S foo 

reveals the instructions generated to access the bit-fields. Looking at the f1 write and read sequence shows:

  // Set bit field f1
  bitfield.f1 = 0xEF;
 80483dc:       c6 45 f8 ef        movb   $0xef,-0x8(%ebp)
  tmp = bitfield.f1;
 80483e0:       0f b6 45 f8        movzbl -0x8(%ebp),%eax
 80483e4:       0f b6 c0           movzbl %al,%eax
 80483e7:       89 45 f4           mov    %eax,-0xc(%ebp)

The first thing to note from this disassembly fragment is that bitfield is located on the stack eight bytes below %ebp. Likewise, tmp is located twelve bytes (0xC) below %ebp.

From this example it's clear that the write to f1 uses a single byte move instruction. If bitfield had been mapped to a hardware register, this would have resulted in an aligned but too short write access that could have produced unintended behavior.

The read of f1 is less clear until the movzbl instruction is understood to be a zero-extending move from a single byte to a 32-bit long word. So here again, if bitfield had been mapped to a register, the single-byte access may have resulted in unintended behavior such as data loss (the top three bytes cleared without being read) or spurious action (if subsequent reads are done to other fields for the same operation).

Looking at the f2 write and read sequence shows:

 // Set bit field f2
  bitfield.f2 = 0xE;
 8048404:       0f b6 45 f9        movzbl -0x7(%ebp),%eax
 8048408:       83 e0 f0           and    $0xfffffff0,%eax
 804840b:       83 c8 0e           or     $0xe,%eax
 804840e:       88 45 f9           mov    %al,-0x7(%ebp)
  tmp = bitfield.f2;
 8048411:       0f b6 45 f9        movzbl -0x7(%ebp),%eax
 8048415:       83 e0 0f           and    $0xf,%eax
 8048418:       0f b6 c0           movzbl %al,%eax
 804841b:       89 45 f4           mov    %eax,-0xc(%ebp)

In this case, setting the four-bit-wide field f2 results in a byte-wide read-modify-write sequence on the second byte of bitfield, as shown by the offset of -0x7 instead of -0x8. Similarly, reading f2 results in a byte-wide read of the second byte of bitfield. Both accesses are too short and are not aligned to the base of bitfield.

Since f3 spans the second and third bytes of bitfield, its access sequence results in aligned, four-byte-wide mov instructions.

  bitfield.f3 = 0xDB;
 8048438:       8b 45 f8           mov    -0x8(%ebp),%eax
 804843b:       25 ff 0f f0 ff     and    $0xfff00fff,%eax
 8048440:       0d 00 b0 0d 00     or     $0xdb000,%eax
 8048445:       89 45 f8           mov    %eax,-0x8(%ebp)
  tmp = bitfield.f3;
 8048448:       8b 45 f8           mov    -0x8(%ebp),%eax
 804844b:       c1 e8 0c           shr    $0xc,%eax
 804844e:       80 e4 ff           and    $0xff,%ah
 8048451:       0f b6 c0           movzbl %al,%eax
 8048454:       89 45 f4           mov    %eax,-0xc(%ebp)

Although the accesses themselves are well-formed, unintended behaviors can still result if f3 is only one of multiple fields that must be set for a single operation.
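
The two constants in the f3 write sequence follow directly from the field's position (bit 12) and width (8 bits). The short sketch below reproduces that arithmetic. The helper names clear_mask and or_value are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* AND constant: every bit set except the field's own bits (assumes width < 32). */
static uint32_t clear_mask(unsigned shift, unsigned width)
{
    uint32_t field_bits = ((UINT32_C(1) << width) - 1u) << shift;
    return ~field_bits;
}

/* OR constant: the new value moved into the field's position. */
static uint32_t or_value(unsigned shift, uint32_t value)
{
    return value << shift;
}
```

With a shift of 12 and a width of 8, clear_mask gives 0xfff00fff and or_value(12, 0xDB) gives 0xdb000, matching the operands of the and and or instructions above.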

Since the structure of bitfield is symmetrical, the accesses to fields f4 and f5 produce instructions similar to those for f2 and f1 respectively albeit with different offsets.

Finally, the accesses to raw produce aligned, full-width instructions as expected.

  // Set with raw
  bitfield.raw = 0xDECAFBAD;
 80484cd:       c7 45 f8 ad fb ca de movl   $0xdecafbad,-0x8(%ebp)
  tmp = bitfield.raw;
 80484d4:       8b 45 f8           mov    -0x8(%ebp),%eax
 80484d7:       89 45 f4           mov    %eax,-0xc(%ebp)

This last example illustrates a tempting workaround for "safely" using bitfields to manage register accesses. Consider:

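// bitfield_t (the union defined above) is used because it has a raw member
volatile bitfield_t *pBar = (volatile bitfield_t *)0xBAADF00D;
bitfield_t tmpBar;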

// Set field f1 to 0xff
tmpBar.raw = pBar->raw;
tmpBar.f1  = 0xff;
pBar->raw  = tmpBar.raw;

While this approach works, it runs the risk of an uninformed future maintainer "optimizing out" the temporary variable and accessing the bitfield members through the pointer directly. In this regard, it may be more maintainable to use bitmasks and macros for register accesses.
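
As a rough illustration of the bitmask-and-macro alternative, the sketch below describes each field of register bar by a shift and a width and derives the masks from those. All of the macro names are hypothetical, and the field positions assume the layout given in the comments of the test program above. The macros operate on a plain 32-bit value, so the caller still reads the register once and writes it once, at full width.

```c
#include <stdint.h>

/* Assumed layout of register bar: shift and width of each field. */
#define BAR_F1_SHIFT  0u
#define BAR_F1_WIDTH  8u
#define BAR_F2_SHIFT  8u
#define BAR_F2_WIDTH  4u
#define BAR_F3_SHIFT 12u
#define BAR_F3_WIDTH  8u
#define BAR_F4_SHIFT 20u
#define BAR_F4_WIDTH  4u
#define BAR_F5_SHIFT 24u
#define BAR_F5_WIDTH  8u

/* Mask covering field f, e.g. BAR_MASK(F3) is 0x000ff000. */
#define BAR_MASK(f) \
    (((UINT32_C(1) << BAR_##f##_WIDTH) - 1u) << BAR_##f##_SHIFT)

/* Extract field f from a raw register value. */
#define BAR_GET(raw, f) \
    (((raw) & BAR_MASK(f)) >> BAR_##f##_SHIFT)

/* Return raw with field f replaced by v; other fields are unchanged. */
#define BAR_PUT(raw, f, v) \
    (((raw) & ~BAR_MASK(f)) | \
     (((uint32_t)(v) << BAR_##f##_SHIFT) & BAR_MASK(f)))
```

A field update then becomes a read of the whole register into a local, one or more BAR_PUT calls on that local, and a single write back, with no bitfield access that could be mistaken for a shortcut.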

Of course, problems can arise regardless of the method used if "uninformed" developers are allowed to change the code. The only prevention here is to make sure there is suitable training and disciplined code reviews.