Thursday, February 18, 2016

C Macro 101: Stringizing Operator and Token Pasting Operator

The C language has evolved to the point where we can be productive writing code in it. Many still view writing C as tedious compared to writing code in higher-level languages, but I think that's a subjective view. Perhaps only functional languages offer higher productivity than C for most non-system programming.

Two of the "productivity features" in C that I find indispensable at the moment are the stringizing operator (#) and the token-pasting operator (##, a.k.a. the concatenation operator). Both operators can be used only inside C macro definitions; you cannot use them anywhere else. However, both are very powerful tools for creating function templates in C. Yes, you read that right. Function templates are not just for C++ programmers. C programmers also have a sort of function template via C macros, although it's a little "rudimentary" compared to C++.

Most stringizing and token-pasting tutorials out there don't provide useful code snippets with regard to the "real" power of these operators. This post aims to fill that gap. Without further ado, let's get to the code. You can clone the complete sample code used in this post from https://github.com/pinczakko/sample_token_pasting
#include <stdio.h>
#include <assert.h>

typedef enum {
 PEEK_REQUEST_ITEM,
 PEEK_REPLY_ITEM,
 MOD_REQUEST_ITEM,
 MOD_REPLY_ITEM
}ITEM_TYPE;

struct queue_item {
 ITEM_TYPE type;
 char payload[32];
};

struct handler {
 int identifier;
 int (*process_data) (void* data);
};

#define PRINT_FUNC_NAME \
do { \
 printf("In function: %s() \n", __func__); \
} while (0);

static inline int process_peek_request(const struct queue_item *const
          peek_req,
          struct handler *p)
{
 /** Algorithm A ...  */
 PRINT_FUNC_NAME
 return 0;
}

static inline int process_peek_reply(const struct queue_item *const
        peek_rep,
        struct handler *p)
{
 /** Algorithm B ...  */
 PRINT_FUNC_NAME
 return 0;
}

static inline int process_modification_request(const struct queue_item *const
          modification_req,
          struct handler *p)
{
 /** TODO: Invalidate cached items taking part in the MOD transaction **/

 /** TODO: Enqueue the MOD request to egress_port_output_queue */

 /** TODO: Notify egress_port thread to consume the MOD request */

 PRINT_FUNC_NAME

 return 0;/** Success */

 error:
 return -1;/** Failed */
}

static inline int process_modification_reply(const struct queue_item *const
        modification_rep,
        struct handler *p)
{
 /** TODO: Enqueue the MOD reply to ingress_port_output_queue */

 /** TODO: Notify ingress_port thread to consume the MOD reply */

 PRINT_FUNC_NAME

 return 0;/** Success */

 error:
 return -1;/** Failed */
}

#define PROCESS_DEQUEUED_ITEM(MESSAGE, TYPE) \
static inline int process_dequeued_##MESSAGE(const struct queue_item *const MESSAGE,\
         struct handler *p) \
{ \
 assert((MESSAGE != NULL) && (p != NULL)); \
 \
 assert((MESSAGE->type == PEEK_##TYPE) || \
        (MESSAGE->type == MOD_##TYPE)); \
 \
 PRINT_FUNC_NAME \
 \
 if (MESSAGE->type == PEEK_##TYPE) { \
  printf("Processing PEEK " #MESSAGE "\n"); \
  return process_peek_##MESSAGE(MESSAGE, p); \
 \
 } else if (MESSAGE->type == MOD_##TYPE) { \
  printf("Processing MOD " #MESSAGE "\n"); \
  return process_modification_##MESSAGE(MESSAGE, p); \
 \
 } else { \
  printf("Warning: Unknown " #MESSAGE " type!\n"); \
  return -1; /** Failed */ \
 } \
}

/** Token-pasted function instance to handle request message */
PROCESS_DEQUEUED_ITEM(request, REQUEST_ITEM)

/** Token-pasted function instance to handle reply message */
PROCESS_DEQUEUED_ITEM(reply, REPLY_ITEM)

int main (int argc, char * argv[])
{
 int i; 

 struct queue_item req_item[2], rep_item[2];
 struct handler h;

 req_item[0].type = PEEK_REQUEST_ITEM;
 req_item[1].type = MOD_REQUEST_ITEM;

 rep_item[0].type = PEEK_REPLY_ITEM;
 rep_item[1].type = MOD_REPLY_ITEM;

 for (i = 0; i < 2; i++) {
  process_dequeued_request(&req_item[i], &h);
 }

 for (i = 0; i < 2; i++) {
  process_dequeued_reply(&rep_item[i], &h);
 }

 return 0;
}
The code above produces two different functions, process_dequeued_request() and process_dequeued_reply(), to handle requests and replies respectively. The algorithm used by both functions is nearly identical; they differ only in the names of the function, its parameter, and the constants. Therefore, it is natural to use the token-pasting and stringizing operators here. In C++, you would use a template. In C, you can achieve the same thing with the token-pasting (##) and stringizing (#) operators.

The stringizing operator (#) creates a C string literal from a macro argument. For example, if you pass reply as an argument to a C macro and the macro applies the stringizing operator to that parameter, the C preprocessor produces "reply" (a C string literal, including the double quotes). That may be a bit hard to picture, so let's look at the sample code above. In this line:
PROCESS_DEQUEUED_ITEM(reply, REPLY_ITEM)
we ask the preprocessor to instantiate the process_dequeued_reply() function. In the body of process_dequeued_reply(), the macro uses the stringizing operator like so:
printf("Processing PEEK " #MESSAGE "\n");
After the GCC preprocessing stage, this function call becomes:
printf("Processing PEEK " "reply" "\n");
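As you can see, the reply macro argument is transformed into "reply", i.e. stringized. The compiler then merges the three adjacent string literals into a single string, "Processing PEEK reply\n".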
You may ask: how can I obtain the preprocessor output? Most compilers can emit it via a command-line switch. With GCC, you can use the -save-temps switch. The GCC preprocessor output is a *.i file with the same base name as the source file. In my sample code, the Makefile uses this switch to instruct GCC to place the preprocessor output in the source code directory. I used the indent utility (indent -linux sample_token_pasting.i) to beautify the preprocessor output.
This is a snippet of the "beautified" preprocessor output from the sample_token_pasting.i file:
static inline int process_dequeued_reply(const struct queue_item *const reply,
      struct handler *p)
{
#107 "sample_token_pasting.c" 3 4
 ((
#107 "sample_token_pasting.c"
   (reply !=
#107 "sample_token_pasting.c" 3 4
    ((void *)0)
#107 "sample_token_pasting.c"
   ) && (p !=
#107 "sample_token_pasting.c" 3 4
         ((void *)0)
#107 "sample_token_pasting.c"
   )
#107 "sample_token_pasting.c" 3 4
  )? (void)(0) : __assert_fail(
#107 "sample_token_pasting.c"
          "(reply != ((void *)0)) && (p != ((void *)0))"
#107 "sample_token_pasting.c" 3 4
          , "sample_token_pasting.c", 107,
          __PRETTY_FUNCTION__))
#107 "sample_token_pasting.c"
     ;
#107 "sample_token_pasting.c" 3 4
 ((
#107 "sample_token_pasting.c"
   (reply->type == PEEK_REPLY_ITEM)
   || (reply->type == MOD_REPLY_ITEM)
#107 "sample_token_pasting.c" 3 4
  )? (void)(0) : __assert_fail(
#107 "sample_token_pasting.c"
          "(reply->type == PEEK_REPLY_ITEM) || (reply->type == MOD_REPLY_ITEM)"
#107 "sample_token_pasting.c" 3 4
          , "sample_token_pasting.c", 107,
          __PRETTY_FUNCTION__))
#107 "sample_token_pasting.c"
     ;
 do {
  printf("In function: %s() \n", __func__);
 } while (0);
 if (reply->type == PEEK_REPLY_ITEM) {
  printf("Processing PEEK " "reply" "\n");
  return process_peek_reply(reply, p);
 } else if (reply->type == MOD_REPLY_ITEM) {
  printf("Processing MOD " "reply" "\n");
  return process_modification_reply(reply, p);
 } else {
  printf("Warning: Unknown " "reply" " type!\n");
  return -1;
 }
}
The output is a bit unwieldy. However, looking at the preprocessor output is sometimes the way to be sure that you haven't made a silly mistake in your C macro.

Let's move on to the other operator, the token-pasting operator. This operator "pastes" (concatenates) a macro argument and a C token "fragment" in your macro body to form a single "target" C token. If you don't yet fully understand what a C language token is, please read http://www.help2engg.com/c_tokens and https://msdn.microsoft.com/en-us/library/c6sb2c6b.aspx. The sample code uses the token-pasting operator to create "configurable" C function names and constants. This code:
PROCESS_DEQUEUED_ITEM(reply, REPLY_ITEM)
produces, among others, three pasted C tokens: the process_dequeued_reply function name, the PEEK_REPLY_ITEM constant and the MOD_REPLY_ITEM constant. The names of the callees, process_peek_reply and process_modification_reply, are pasted in the same way. You can see the process clearly in the GCC preprocessor output snippet above. The process_dequeued_ C token "fragment" is concatenated with the value of the MESSAGE macro parameter, which in this macro invocation:
PROCESS_DEQUEUED_ITEM(reply, REPLY_ITEM)
has the value reply. Therefore, the concatenated ("target") C token is process_dequeued_reply. The constants undergo a similar transformation via the TYPE macro parameter.

Anyway, this is the output of the program compiled from the sample code:
In function: process_dequeued_request() 
Processing PEEK request
In function: process_peek_request() 
In function: process_dequeued_request() 
Processing MOD request
In function: process_modification_request() 
In function: process_dequeued_reply() 
Processing PEEK reply
In function: process_peek_reply() 
In function: process_dequeued_reply() 
Processing MOD reply
In function: process_modification_reply() 
The output simply shows which functions are invoked, and in what order, to clarify the inner workings of both the stringizing and token-pasting operators.

Hopefully, the explanation in this post clarifies the power of the C stringizing and token-pasting operators.