Tuesday, August 26, 2008

The Incompatibility of 64-bit GCC With 32-bit Packed Data Structures

We C programmers sometimes take it for granted that declaring:

unsigned long x;

gives us a 32-bit unsigned variable, and that we need to declare:

unsigned long long x;

to get a 64-bit unsigned variable.

Most of the time this assumption does no harm. But with a packed data structure, say a structure that describes the header of a binary file, you're in big trouble if the seemingly innocent code:

unsigned long x;

turns out to declare a 64-bit unsigned long instead of the expected 32-bit unsigned long.

Let's look at a real life example. This code:

struct img_header {
unsigned char signature[SIGNATURE_LEN];
unsigned long startAddr;
unsigned long burnAddr;
unsigned long len;
}__attribute__((packed));

typedef struct img_header IMG_HEADER_T, *IMG_HEADER_Tp;

would create a 16-byte IMG_HEADER_T structure with the 32-bit GCC compiler. On the other hand, it would create a 28-byte IMG_HEADER_T structure with the 64-bit GCC compiler, because each of the three unsigned long fields grows from 4 to 8 bytes. This is very dangerous when dealing with firmware binaries. The workaround on a 64-bit GCC compiler is to force the compiler into 32-bit "compatibility" mode with the "-m32" compiler flag. Most 64-bit GCC compilers in 64-bit Linux distributions have this "compatibility" mode. Usually, the compiler flags are placed in the Makefile. You can force the intended behavior there, like this:

...
CC=gcc
CFLAGS = -m32
...


It took me more than a month to spot this bug :-(, which is a pity. I found it only after I wrote a very simple 8-bit checksum utility, which revealed 12 excess bytes in the header of an intermediate file in the SDK that I worked with (three unsigned long fields, each 4 bytes wider than expected). A couple of lessons learned from this incident:

1. Never trust your seemingly innocent Makefile when you're working on a 64-bit system with a 64-bit or multilib compiler. Use explicit options such as "-m32" to enforce the output you intend, because in most cases you don't know what the default is unless you test it.

2. Always build test stubs to verify intermediate results when something goes awry in the output file of an SDK. This will save you development time.

3. Use common sense to track down the bug. Add debug statements to watch the output of a binary utility in an SDK when using it on 64-bit systems, because you can't know in advance whether it will behave as intended. Most of today's SDKs have been tested only on 32-bit systems and assume they will run on that same architecture.

This was a very hard lesson for me because it cost a lot of time and resources to spot the bug. I should've built the "test stub" application at the very beginning, since it takes only a very short time to create. The "test stub":

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h> // read(), close()

#define DEBUG
#undef DEBUG


/*
* TODO:
*
* 1. Add default end offset handling (i.e., EOF equ end offset)
* 2. Add more robust error handling
* 3. Add GNU gettext handler for input parameters
*
*/


static unsigned char sum(char buf[], unsigned long len, unsigned long start)
/*
* @param buf pointer to buffer to be 8-bit summed
* @param start starting offset in buf to calculate the sum
* @param len length of the buffer to be summed (in bytes)
*/
{
unsigned char sum;
unsigned long i;

sum = 0;

for(i=0; i<len; i++ ) {
sum = sum + buf[start+i];
}

return sum;
}

static void show_help(char* argv[])
{
printf("Usage: %s filename start_offset_in_file(hex) "
" end_offset_in_file(hex) \n", argv[0]); // TODO: use GNU's gettext
exit(1);
}

int main(int argc, char* argv[])
{
int stream;
char* buf;
struct stat st;
unsigned long start_offset, end_offset;


if(argc != 4) {
show_help(argv);
}

stream = open( argv[1], O_RDONLY);

if (stream == -1 ) {
printf("Error opening input file!\n");
exit(1);
}


// get file size
if( fstat(stream, &st) != 0 ) {
printf("Error, unable to get file size!\n");
exit(1);
}

printf("Input file size = 0x%llX bytes\n", (unsigned long long) st.st_size);

// allocate buffer for the file size obtained above
buf = (char*) malloc(st.st_size);
if( buf == NULL ) {
printf("Unable to allocate memory for file buffer!\n");
exit(1);
}

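// read the opened file to buffer, and make sure we got all of it
if( read( stream, (void*) buf, st.st_size ) != st.st_size ) {
printf("Error reading input file!\n");
exit(1);
}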

// calculate checksum, passing in the buffer and start offset
start_offset = strtoul(argv[2], NULL, 16);
end_offset = strtoul(argv[3], NULL, 16);

if ( start_offset >= end_offset || end_offset > (unsigned long) st.st_size ) {
printf("Error! Wrong parameter. "
"end_offset should be bigger than start_offset "
"and must not exceed the file size\n");

free(buf);
close(stream);
exit(1);
}

#ifdef DEBUG // check string conversion routine
printf("start_offset = 0x%lX\n", start_offset );
printf("end_offset = 0x%lX\n", end_offset );
#endif

printf("File checksum (from offset 0x%lX to 0x%lX)= 0x%X\n",
start_offset, end_offset,
sum(buf, end_offset - start_offset, start_offset) );

free(buf);
close(stream);

return 0;
}

This bug won't bite you when you develop on 64-bit Linux/Unix systems if you pay attention to it.


Never take anything for granted when developing on 64-bit or multilib systems


This document is a good source of information on 64-bit/multilib systems portability, particularly for programmers and advanced sysadmins.
