% cfopm.tex, revision 0:c0d36568602d (first commit)
% Author: Kaito Tokumori <e105711@ie.u-ryukyu.ac.jp>
% Date: Sun, 10 May 2015 22:54:12 +0900

\documentclass[conference]{IEEEtran}

\usepackage[cmex10]{amsmath}
\usepackage{url}
\usepackage{listings}
\usepackage[dvipdfmx]{graphicx}

\lstset{
  frame=single,
  keepspaces=true,
  stringstyle={\ttfamily},
  commentstyle={\ttfamily},
  identifierstyle={\ttfamily},
  keywordstyle={\ttfamily},
  basicstyle={\ttfamily},
  breaklines=true,
  xleftmargin=0zw,
  xrightmargin=0zw,
  framerule=.2pt,
  columns=[l]{fullflexible},
  numbers=left,
  stepnumber=1,
  numberstyle={\scriptsize},
  numbersep=1em,
  language=c,
  tabsize=4,
  lineskip=-0.5zw,
  escapechar={@},
}

\ifCLASSINFOpdf
  % \usepackage[pdftex]{graphicx}
  % declare the path(s) where your graphic files are
  % \graphicspath{{../pdf/}{../jpeg/}}
  % and their extensions so you won't have to specify these with
  % every instance of \includegraphics
  % \DeclareGraphicsExtensions{.pdf,.jpeg,.png}
\else
  % or other class option (dvipsone, dvipdf, if not using dvips). graphicx
  % will default to the driver specified in the system graphics.cfg if no
  % driver is specified.
  % \usepackage[dvips]{graphicx}
  % declare the path(s) where your graphic files are
  % \graphicspath{{../eps/}}
  % and their extensions so you won't have to specify these with
  % every instance of \includegraphics
  % \DeclareGraphicsExtensions{.eps}
\fi


% correct bad hyphenation here
\hyphenation{op-tical net-works semi-conduc-tor}


\begin{document}
%
% paper title
% Titles are generally capitalized except for words such as a, an, and, as,
% at, but, by, for, in, nor, of, on, or, the, to and up, which are usually
% not capitalized unless they are the first or last word of the title.
% Linebreaks \\ can be used within to get better formatting as desired.
% Do not put math or special symbols in the title.
\title{Implementing a Continuation Based Language in LLVM and Clang}


% author names and affiliations
% use a multiple column layout for up to three different
% affiliations
\author{
\IEEEauthorblockN{Kaito TOKUMORI}
\IEEEauthorblockA{University of the Ryukyus \\ Email: kaito@cr.ie.u-ryukyu.ac.jp}
\and
\IEEEauthorblockN{Shinji KONO}
\IEEEauthorblockA{University of the Ryukyus \\ Email: kono@ie.u-ryukyu.ac.jp}
}

% make the title area
\maketitle

% As a general rule, do not put math, special symbols or citations
% in the abstract
\begin{abstract}
A programming paradigm that uses code segments and data segments is proposed.
Continuation based C (CbC) is a lower-level language based on C for this paradigm.
CbC has a standalone compiler and a GCC-based implementation.
In this study, we implement a CbC compiler on LLVM/Clang 3.7.
We describe the details of the implementation and its evaluation.
\end{abstract}

% no keywords




% For peer review papers, you can put extra information on the cover
% page as needed:
% \ifCLASSOPTIONpeerreview
% \begin{center} \bfseries EDICS Category: 3-BBND \end{center}
% \fi
%
% For peerreview papers, this IEEEtran command inserts a page break and
% creates the second title. It will be ignored for other modes.
\IEEEpeerreviewmaketitle

\section{A Practical Continuation based Language}
We have proposed program units named code segments and data segments.
A code segment is a unit of computation that has no state.
A data segment is a set of typed data.
Code segments are connected to data segments through a context, which is a meta data segment.
After a code segment is executed with its context, the next code segment (the continuation) is executed.
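This model can be approximated in plain C. The following is only a sketch under our own assumptions, not CbC syntax and not how the CbC compiler works: a data segment is a {\bf struct}, the context holds a pointer to the data segment and to the next code segment, and each code segment is a function that keeps no state of its own and selects its continuation by updating the context. A driver loop (often called a trampoline) executes code segments until no continuation remains. The names {\bf Counter}, {\bf Context}, {\bf CodeSegment}, {\bf increment}, {\bf finish}, and {\bf run} are hypothetical.

\begin{lstlisting}
#include <assert.h>
#include <stddef.h>

/* Hypothetical data segment: a set of typed data. */
struct Counter {
  long value;
  long limit;
};

struct Context;

/* A code segment keeps no state of its own; it works
   only on data reachable from the context. */
typedef void (*CodeSegment)(struct Context *);

/* Hypothetical context (meta data segment): connects
   the data segment and the next code segment. */
struct Context {
  struct Counter *data;
  CodeSegment next;  /* NULL: no continuation */
};

static void finish(struct Context *ctx) {
  ctx->next = NULL;
}

static void increment(struct Context *ctx) {
  ctx->data->value++;
  /* select the continuation */
  ctx->next = (ctx->data->value < ctx->data->limit)
                ? increment : finish;
}

/* Runs code segments one after another until no
   continuation remains. */
static void run(struct Context *ctx, CodeSegment start) {
  assert(ctx != NULL && ctx->data != NULL);
  ctx->next = start;
  while (ctx->next != NULL) {
    ctx->next(ctx);
  }
}
\end{lstlisting}

For example, running {\bf increment} on a {\bf Counter} whose {\bf value} is 0 and {\bf limit} is 5 leaves {\bf value} equal to 5. In CbC itself no driver loop is needed, because each code segment transfers control directly with {\bf goto}.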

We have developed a programming language, ``Continuation based C'' (hereafter CbC)~\cite{DBLP:journals/corr/abs-1109-4048}, which supports code segments.
CbC is compatible with the C language and expresses continuations as goto statements.

Code segments and data segments are low level enough to represent computational details,
yet they are architecture independent,
so CbC can be used as an architecture-independent assembler.

CbC has a standalone compiler and a GCC-based implementation. Here we report a new partial implementation of a CbC compiler based on LLVM and Clang 3.7.

We first give an overview of the CbC language.
\section{Continuation based C}
The basic programming unit of CbC is the code segment. It is not a subroutine, but it looks like a function because it has input and output. These interfaces should be data segments; we are currently designing the data segment part of the language.

\begin{table}[htb]
\begin{lstlisting}
  __code f(Allocate allocate){
    allocate.size = 0;
    goto g(allocate);
  }

  // data segment definition
  // (generated automatically)
  union Data {
    struct Allocate {
      long size;
    } allocate;
  };
\end{lstlisting}
\caption{CbC Example}
\label{src:example}
\end{table}

In this example, the code segment {\bf f} receives the input data segment {\bf allocate} ({\bf Allocate} is a data segment identifier) and passes it as output to the code segment {\bf g}. The CbC compiler generates the data segment definition automatically, so programmers do not have to write it. There is no return from a code segment, so {\bf g} must in turn go to another continuation using {\bf goto}. A code segment has one input data segment and several output data segments, and the dependencies between code segments are expressed by these data segments (Fig.~\ref{fig:csds}).
\begin{figure}[htp]
  \begin{center}
    \scalebox{0.5}{\includegraphics{fig/csds.pdf}}
  \end{center}
  \caption{Code Segments and Data Segments in CbC}
  \label{fig:csds}
\end{figure}

In CbC, we can go to a code segment from a C function, and we can call C functions from a code segment, so we do not have to shift completely from C to CbC. The latter is straightforward, but the former needs further extensions.

\begin{table}[htb]
\begin{lstlisting}
  #include <stdio.h>

  __code hello(char *s, __code(*ret)(int, void*), void *env) {
    printf("%s", s);
    goto (*ret)(0, env);
  }

  int main() {
    goto hello("Hello World\n", __return, __environment);
  }
\end{lstlisting}
\caption{Call C Functions in a Code Segment}
\label{src:hello}
\end{table}

In this hello world example, the environment of {\bf main}() and its continuation are obtained through {\bf \_\_environment} and {\bf \_\_return}, respectively. Arbitrary mixtures of code segments and functions are allowed. Control never comes back to {\bf main}() after its {\bf goto}; instead, the {\bf goto} to {\bf ret} in {\bf hello} returns to the caller of {\bf main}(), as if {\bf main}() itself had returned. In this case, {\bf main}() returns 0 to the operating system. This kind of continuation is called {\bf goto with environment}.

\section{LLVM and Clang}
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies, and the LLVM Core libraries provide a modern source- and target-independent optimizer, along with code generation support for many popular CPUs. Clang is an LLVM-native C/C++/Objective-C compiler. Figure~\ref{fig:structure} shows the compilation flow of Clang and LLVM.

\begin{figure}[htp]
  \begin{center}
    \scalebox{0.25}{\includegraphics{fig/clang_llvm_structure.pdf}}
  \end{center}
  \caption{LLVM and Clang structure}
  \label{fig:structure}
\end{figure}

LLVM has an intermediate representation called LLVM IR~\cite{LLVMIR}. Importantly, we do not modify LLVM IR, so we do not have to modify the optimizer, which operates on it.

\section{Implementation in LLVM and Clang}
How, then, can a CbC compiler be implemented in LLVM and Clang? Our approach is as follows.

\begin{itemize}
\item Code segments are implemented as C functions.
\item Transitions are implemented as forced tail calls.
\item Goto with environment is implemented using setjmp and longjmp.
\end{itemize}
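A minimal plain C sketch of the first two points, written for the CbC example with the code segments {\bf f} and {\bf g}, is shown below. It assumes only standard C: {\bf f} and {\bf g} are ordinary functions returning void, and {\bf goto g(allocate);} becomes a call in tail position with nothing after it. A C compiler may turn such a call into a jump when it performs tail call elimination, but plain C does not guarantee this; forcing it is what our modified compiler does. The variable {\bf last\_size} is hypothetical and stands in for the next continuation so that the result can be observed.

\begin{lstlisting}
#include <assert.h>

/* Data segment of the CbC example, written by hand
   here (the CbC compiler would generate it). */
typedef struct Allocate {
  long size;
} Allocate;

/* Hypothetical observation point; a real code
   segment would continue with another goto. */
static long last_size = -1;

/* Code segment g as a C function returning void. */
static void g(Allocate allocate) {
  last_size = allocate.size;
}

/* Code segment f: "goto g(allocate);" becomes a
   call in tail position with nothing after it. */
static void f(Allocate allocate) {
  allocate.size = 0;
  g(allocate);
}
\end{lstlisting}

Calling {\bf f} with a data segment whose {\bf size} is 42 leaves {\bf last\_size} equal to 0, because {\bf f} resets {\bf size} before going to {\bf g}.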

{\bf \_\_code} is implemented as a new type keyword in LLVM and Clang. It can be regarded as an attribute of a function, meaning that the function is used only with tail call elimination.
Because of this implementation, a code segment can actually be called as an ordinary C function.

Forcing a tail call requires several conditions. For example, there must be no statement after the tail call; the caller and the callee must use the same calling convention, which must be cc10, cc11, or fastcc; the return type of the callee must be the same as that of the caller; and the tail call elimination pass must be enabled.

All code segments have the void return type, and no statement is allowed after a continuation ({\bf goto}); this satisfies both the return type condition and the no-statement-after-the-call condition.

The tail call elimination pass is enabled in {\bf BackendUtil.cpp}. In Clang, this pass is enabled only when the optimization level is two or higher. We modify it so that the pass is always enabled; when the optimization level is one or lower, the pass is applied only to code segments.

Next, we solve the calling convention problem. We select fastcc as the calling convention of code segments. In Clang, the calling convention is managed by the CGFunctionInfo class, and its information is set in {\bf CGCall.cpp}. We modify this file so that fastcc is assigned to code segments.

Goto with environment is implemented by rearranging code. If {\bf \_\_environment} or {\bf \_\_return} appears in a function, the CbC compiler rearranges the code for goto with environment using setjmp and longjmp: setjmp saves the environment before the continuation is taken, and longjmp restores it.
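The following plain C sketch shows the idea on the hello world example. It is our own approximation, not the code generated by the CbC compiler: the body of {\bf main} is moved into the hypothetical function {\bf main\_body}, {\bf \_\_environment} is modeled by a hypothetical {\bf struct cbc\_env} holding a {\bf jmp\_buf}, and {\bf \_\_return} is modeled by the hypothetical {\bf cbc\_return}, which stores the value and calls {\bf longjmp}. The value is kept in a {\bf volatile} member because non-volatile automatic objects modified between {\bf setjmp} and {\bf longjmp} have indeterminate values after the jump.

\begin{lstlisting}
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>

/* Hypothetical model of __environment: the saved
   execution context and the value to return. */
struct cbc_env {
  jmp_buf buf;
  volatile int retval;
};

/* Hypothetical model of __return: never returns;
   it jumps back to the frame saved by setjmp. */
static void cbc_return(int value, void *env) {
  struct cbc_env *e = env;
  e->retval = value;
  longjmp(e->buf, 1);
}

/* Code segment hello as a C function. */
static void hello(const char *s,
                  void (*ret)(int, void *), void *env) {
  printf("%s", s);
  ret(0, env);  /* goto (*ret)(0, env); */
}

/* Rearranged main(): setjmp saves the environment
   before the continuation is taken. */
static int main_body(void) {
  struct cbc_env env;
  if (setjmp(env.buf) != 0) {
    /* reached via longjmp: act as main's return */
    return env.retval;
  }
  hello("Hello World\n", cbc_return, &env);
  assert(!"hello returned without a goto");
  return -1;
}
\end{lstlisting}

Calling {\bf main\_body()} prints ``Hello World'' and returns 0, mirroring how {\bf main}() in the CbC version returns 0 to its caller.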

\section{Result}
We use the following benchmark program, conv1.

\begin{table}[htb]
\begin{lstlisting}
int g0(int i);
int h0(int i);

int f0(int i) {
  int k,j;
  k = 3+i;
  j = g0(i+3);
  return k+4+j;
}

int g0(int i) {
  return h0(i+4)+i;
}

int h0(int i) {
  return i+4;
}
\end{lstlisting}
\caption{Benchmark program conv1}
\label{src:conv1}
\end{table}
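To illustrate what using continuations means for conv1, the following plain C sketch rewrites {\bf f0}, {\bf g0}, and {\bf h0} in continuation-passing style: instead of returning, each function passes its result to an explicit continuation, and the work remaining after a call (the rest of {\bf f0} or {\bf g0}) becomes a separate function. This is our own illustration, not the CbC version of conv1 used for the measurements; the names {\bf Cont}, {\bf h0\_k}, {\bf g0\_k}, {\bf f0\_k}, {\bf g0\_rest}, {\bf f0\_rest}, {\bf halt}, and {\bf f0\_via\_continuations} are hypothetical. In CbC, each call to a continuation would be a {\bf goto}; in this plain C version, they are ordinary calls and the continuation records live on the C stack.

\begin{lstlisting}
#include <assert.h>
#include <stddef.h>

/* Hypothetical continuation record: code to run
   next, the enclosing continuation, and one value
   that the rest of the computation still needs. */
struct Cont;
typedef void (*ContFn)(int value, struct Cont *self);
struct Cont {
  ContFn fn;
  struct Cont *up;
  int saved;
};

/* h0: "return i+4" becomes a jump to ret. */
static void h0_k(int i, struct Cont *ret) {
  ret->fn(i + 4, ret);
}

/* Rest of g0 after h0: v + i (saved holds i). */
static void g0_rest(int v, struct Cont *self) {
  self->up->fn(v + self->saved, self->up);
}

static void g0_k(int i, struct Cont *ret) {
  struct Cont next = { g0_rest, ret, i };
  h0_k(i + 4, &next);
}

/* Rest of f0 after g0: k+4+j (saved holds f0's
   local k = 3+i). */
static void f0_rest(int j, struct Cont *self) {
  self->up->fn(self->saved + 4 + j, self->up);
}

static void f0_k(int i, struct Cont *ret) {
  struct Cont next = { f0_rest, ret, 3 + i };
  g0_k(i + 3, &next);
}

/* Final continuation: keep the result. */
static int conv1_result;
static void halt(int value, struct Cont *self) {
  (void)self;
  conv1_result = value;
}

/* Runs f0 in continuation-passing style. */
static int f0_via_continuations(int i) {
  struct Cont top = { halt, NULL, 0 };
  f0_k(i, &top);
  return conv1_result;
}
\end{lstlisting}

For example, {\bf f0\_via\_continuations(1)} yields 24, the same value as {\bf f0(1)}.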

The benchmark is written in C and CbC, and several optimizations are possible.
When the argument is 1, CbC continuations are used. When the argument is 2 or 3, optimized versions of the program are executed.

Table~\ref{result} shows the benchmark results.

\begin{table}[htpb]
  \centering
  \begin{tabular}{|l|r|r|r|} \hline
    & ./conv1 1 & ./conv1 2 & ./conv1 3 \\ \hline
    Micro-C  & 6.875 & 2.4562 & 3.105 \\ \hline
    GCC -O2 & 2.9438 & 0.955 & 1.265  \\ \hline
    LLVM/clang -O0 & 5.835 & 4.1887 & 5.0625 \\ \hline
    LLVM/clang -O2 & 3.3875 & 2.29 & 2.5087 \\ \hline
  \end{tabular}
  \caption{Execution time of conv1 (s)}
  \label{result}
\end{table} 

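When optimization is enabled, the LLVM/Clang implementation is faster than Micro-C for all three arguments (speedups of about 2.03, 1.07, and 1.24, respectively), which shows that LLVM's optimizations are effective for CbC programs. The LLVM/Clang implementation is still slower than GCC -O2, but the GCC-based CbC compiler cannot compile CbC programs safely without optimization, whereas the LLVM/Clang implementation also works at -O0 because tail call elimination is always applied to code segments. In this respect, the LLVM/Clang implementation is more reliable than the GCC-based one.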

\section{Conclusion}
We have designed and implemented a continuation based language for practical use. We have a partial implementation of CbC using LLVM and Clang 3.7. This implementation benefits from LLVM's optimizations, and we did not have to modify LLVM IR to implement the CbC compiler.

In the future, we will design and implement data segments, as well as meta code segments and meta data segments for meta computation.
\nocite{opac-b1092711, LLVMIR, LLVM, clang}
\bibliographystyle{IEEEtran}
\bibliography{IEEEabrv,reference}



% that's all folks
\end{document}