view paper/rectype.ind @ 43:c2207a9208a5

modify
author Nobuyasu Oshiro <dimolto@cr.ie.u-ryukyu.ac.jp>
date Sat, 16 Jun 2012 03:22:51 +0900
parents ba21600eabba
children 39f6db695840
line wrap: on
line source

-title: Recursive type syntax in Continuation based C

\newcommand{\rectype}{{\tt \_\_rectype}}

--author: Nobuyasu Oshiro, Shinji KONO

--abstract: 

We have implemented Continuation based C (CbC).
CbC is an extension of C, which has parameterized goto statement.
It is useful for finite state automaton or many core tasks. 
Goto statement is a way to force tail call elimination.
The destination of goto statement is called Code Segment, which is actually a normal function of C.
To represent recursive function call, the type system of C is not enough, because
it has no recursive types.
We introduce \rectype  keyword for recursive type, and it is implemented in GCC-4.6.0.
We will compare the conventional methods, \rectype keyword and a method using C structure.
Also we show usage of CbC and it's benchmark.

--Motivation

The C programming language is used in many practical programs, operating system  
kernels, byte code machines, network servers or embedded systems. Low level feature
of C is useful, but sometimes it requires programming hacks. For example, in case of
byte code machine often entire program is a huge switch statement which has many
labels which correspond the byte codes. Operating system or network server has
many layers such as system call dispatch, transport or presentation layer. It 
requires deep call levels,  2 or 3 for each layer, resulting 10-30 call levels.

Continuation based C (CbC) provides a structured way to represent these situations. It
is a small medication of C. It consists of a syntax to force tail-call-elimination
and parameterized goto statement. 

C has capable of recursive functions, but we find it's type system is not enough
to represent CbC programming. In this paper, we introduce \rectype
keyword as a recursive type. To handle recursive type, conventionally tagged struct 
is used. It is also fit for CbC program.

We will describe the CbC and \rectype implementation of CbC on GCC 4.6.0.

--Continuation based C

CbC's basic programming unit is a Code Segment. It is not a subroutine, but it
looks like a function, because it has input and output. We can use C struct
as input and output interfaces.

   struct interface1  { int i; };
   struct interface2  { int o; };

   __code f(struct interface1 a) { 
       struct interface2 b; b.o=a.i;  
       goto g(b); 
   }

In this example, a Code Segment
\verb+f+ has \verb+input a+ and sends \verb+output b+ to a Code Segment \verb+g+.
There is no return from Code Segment \verb+b+, \verb+b+ should call another
continuation using \verb+goto+. Any control structure in C is allowed in CwC
language, but in case of CbC, we restrict ourselves to use \verb+if+ statement
only, because it is sufficient to implement C to CbC translation. In this case,
Code Segment has one input interface and several output interfaces (fig.\ref{code}).

<center><img src="figure/code.pdf" alt="code"></center>


\verb+__code+ and parameterized global goto statement is an extension of
Continuation based C. Unlike \verb+C--+ \cite{cminusminus}'s parameterized goto,
we cannot goto into normal C function.

In CwC, we can go to a Code Segment from a C function and we can call C functions
in a Code Segment. So we don't have to shift completely from C to CbC. The later
one is straight forward, but the former one needs further extensions.

   void *env;
   __code (*exit)(int);

   __code h(char *s) { 
        printf(s);
        goto (*exit)(0),env;
   }

   int main() {
        env  = __environment;
        exit = __return;
        goto h("hello World\n");
   }

In this hello world example, the environment of \verb+main()+
and its continuation is kept in global variables. The environment
and the continuation can be get using \verb+__environment+,
and \verb+__return+. Arbitrary mixture of Code Segments and functions
are allowed (in CwC). The continuation of \verb+goto+ statement 
never returns to original function, but it goes to caller of original
function. In this case, it returns result 0 to the operating system.



--Recursive type syntax

A continuation is a pointer to a Code Segment to be executed after.
For example, it is passed as follows;

   __code csA( __code (*p)() ) {
       goto p(csB);
   }

where {\tt p} is the continuation and {\tt csB} is a Code Segment which
has the same type of {\tt csA}.  This declaration is not enough because
it lacks the type of the argument. If add the type declaration, we
get following program;

    __code csA( __code (*p)( __code (*)( __code (*)( __code (*)())))) {
       goto p(csB);
    }

or 
    __code csA( __code (*p)( __code(*)())) {
       goto p(csB);
    }

It is enough to type {\verb+csB+}, but of course it is not completely will typed.
Omitting types of the arguments of a function is allowed, but it requires
complex declaration and it is incomplete.

We introduce \rectype syntax for this situation. In an argument deflation,
\rectype represents it's function type. Using \rectype, previous example can
be written like this;

    __code csA( __rectype *p) {
       goto p(csB);
    }

{\tt *p} has a type of {\tt csA} itself.

The same situation is happen in conventional C, since Code Segment is a
mere C function with tail-call-elimination. \rectype provides exact and
concise way to describe recursive type. If we have extra arguments,

    __code csBs(int a,  __rectype *p) ;

    __code csAs(int a,  __rectype *p) {
       goto p(a, csBs);
    }

In this case, we have two \rectype keywords. It may points the same type or
different types. There is no ambiguity here, because it point the exact
position. If it points the same position in the same type declaration,
it is the same type.


--Recursive type syntax in a function body

\rectype can be appeared in a function arguments, a function body or struct definition as a
pointer type, otherwise it errors.

In a function body, it has a type of the function itself.

    __code csAs(int a,  __rectype *p) {
       __rectype *p1 = p;
       goto p1(a, csBs);
    }

It is possible to write the following way; 

    __code csAs(int a,  __rectype *p) {
       __code (*p)(int, __retype *);
       p1 = p;
       goto p1(a, csBs);
    }

but previous one is shorter.

We cannot allow non pointer type \rectype, because it generates infinite size of object.

In case of struct, 

     struct {
         int a;
         __rectype *next;
     }

is the same of this;

     struct b {
         int a;
         struct b *next;
     }

this is a conventional way to represent recursive type in C. Using this technique we
can write a continuation like this;

    struct interface {
       __code (*next)(struct interface);
    };

    __code csA(struct interface p) {
        struct interface ds = { csB };
        goto p.next(ds); }

    int main() {
        struct interface ds = { print };
        goto csA(ds);
        /* NOTREACHED*/ 
        return 0;
    }

\rectype is clearer but struct technique provides abstract representation. It requires
extra struct declaration, but it give us the same assembler output.

    __code fibonacci(__rectype *p, int num,  int count, int result, int prev) 


--How to implement \rectype

\rectype syntax is implemented overriding AST.
First, \rectype syntax make Tree same \code(\ref{fig:tree1}).
Second, tree was created to be rectype flag.
Third, to override AST(\ref{fig:tree2}).

\begin{figure}[htpb]
\begin{minipage}{0.5\hsize}
\begin{center}
\scalebox{0.35}{\includegraphics{figure/tree1.pdf}}
\end{center}
\caption{AST of function pointer}
\label{fig:tree1}
\end{minipage}
\begin{minipage}{0.2\hsize}
\begin{center}
\scalebox{0.35}{\includegraphics{figure/tree2.pdf}}
\end{center}
\caption{AST of \rectype}
\label{fig:tree2}
\end{minipage}
\end{figure}

This AST(\ref{fig:tree2}) is made by syntax of \verb+__code csA(__rectype *p)+ .
\treelist have information of argument.
First \treelist represent that argument is function pointer(\verb+__code (*p)()+) .
Second \treelist represent that csA is Fixed-length argument.
Third \treelist is connected with \pointertype.
\pointertype have pointer of function(\functiontype).
We have to override it in the pointer of csA.




--Problems with implementation of \rectype.

Segmentation fault has occurred in the following program on compile.

    __code csA(__rectype *p) {
       goto p(3);
    }

The above code is the wrong argument of p.
The p's argument is converted by GCC.
3 of type int is converted to a pointer type of Code Segment.
At this time, GCC looks at the type of the argument.
p's argument is pointer of csA.
csA's argument is p.
GCC is also an infinite recursion happens to see the type of argument of the argument.

We Solve this problem that does not check the arguments if the \rectype flag is true.
The following program become part was fixed gcc/c-family/c-pretty-print.c.

    static void
    pp_c_abstract_declarator (c_pretty_printer *pp, tree t)
    {
      if (TREE_CODE (t) == POINTER_TYPE)
        {
          if (TREE_CODE (TREE_TYPE (t)) == ARRAY_TYPE
          || TREE_CODE (TREE_TYPE (t)) == FUNCTION_TYPE)
        pp_c_right_paren (pp);
    #ifndef noCbC
          if (IS_RECTYPE(t)) return;
    #endif
          t = TREE_TYPE (t);
        }
      pp_direct_abstract_declarator (pp, t);
    }

 
Variable t have information of p's argument.
If t is \pointertype, t is assigned type of \pointertype(\verb+t = TREE_TYPE (t);+).
We have added code \verb+if (IS_RECTYPE(t)) return;+
Thereby we have solved type checking and infinite recursion problem.




--Method other than \rectype

The recursively program of C's syntax can be solved using struct syntax.
For example, if we write

    struct interface {
       __code (*next)(struct interface);
    };

    __code csA(struct interface p) {
        struct interface ds;
	ds.next = csB;
        goto p.next(ds);
    }

    int main() {
        struct interface ds;
	ds = print;
        goto csA(ds);
        return 0;
    }

there is no need to write recursively.
Because the struct syntax wrapped in a function pointer.
Code Segment does not receive function pointer in arguments.
Recursively program does not occur.


--Comparison

Here is CbC program that finds the Fibonacci sequence.

    __code print(__rectype *p, long int num,
                  long int count, long int result, long int prev) {
        printf("fibonacci(%d) = %ld\n",num,result);    
        goto cs_while(print, num, count, result, prev);
    }
    
    __code fibonacci(__rectype *p, long int num,
                      long int count, long int result, long int prev) {
        if (count == 0) {
            result += 0;
            count++;
        } else if (count == 1) {
            result += 1;
            count++;
        } else if (count > 1){
            long int tmp = prev;
            prev = result;
            result = result + tmp;
            count++;
        } else {
            printf("please enter nutural number\n");
            exit(0);
        }
        if (num < count) {
            goto p(fibonacci, num,  count, result, prev);
        }
        goto fibonacci(p, num, count, result, prev);
    }
    __code cs_while(__rectype *p, long int num,
                     long int count, long int result, long int prev) {
        if (num > 0) {
            num--;
            goto fibonacci(print, num, 0, 0, 0);
        }
        exit(0);
    }

It is written using \rectype syntax.
Do not use the \rectype syntax program would be the following declaration.

    __code print(__code (*p)(__code(*)(),long int,long int,long int,long int),
                  long int num, long int count, long int result, long int prev);

    __code fibonacci(__code (*p)(__code(*)(),long int,long int, long int,long int),
                     long int num,  long int count, long int result, long int prev);

    __code cs_while(__code (*p)(__code(*)(),long int, long int, long int, long int),
                     long int num, long int count, long int result, long int prev);

Comparing the program that made the declaration of each.
AST is almost the same should be created both.
Therefore, there should be no difference was the result.

Here is the result.

--Conclusion


We have designed and implemented Continuation based language for practical use.
We have implemented \rectype syntax in GCC for CbC.
Thereby Easyly be able to write CbC program than previous.

This \rectype implementation may be other problems.
If so, it is necessary to find and fix them on future.