3
|
1 -title: Programming in code and data
|
0
|
2
|
|
3 --abstract:
|
|
4
|
4
|
5 We introduce system of code and data. Instead of function call, code accepts input data and simply generates output data.
|
|
6 Chains of code and data performs computation. After calling next code, it does not return to the caller, so it has no
|
|
7 environment nor calling stack. This system is designed for reliable system description and parallel execution on various
|
|
8 architecture. We use segments or gears sometime. This is directly connected to the category theory like lambda calculus.
|
|
9 We also find a categorical representation from the view point of data. This shows duality of code and data. This duality will be
|
|
10 a guideline of our system design.
|
0
|
11
|
4
|
12 --Reliable computation and predictability
|
|
13
|
|
14 Various software are used in real world. Each of them have to work in a reliable way. A piece of device contains
|
5
|
15 millions of lines of code. These programs are written in C, Haskell\cite{haskell} and so on.
|
2
|
16 To assure its reliability, the computation of a function should be predictable.
|
4
|
17 The correctness of the prediction should be assured by measurements, model checking or proofs.
|
2
|
18
|
|
19 % The unit of computation of conventional programming language is of course a function call.
|
|
20 % a function may call another functions and it comes back to the original function.
|
|
21 % Proof theory such as Hoare Logic or model checking method is used to make the prediction, such as
|
4
|
22 % zero divide is never occurred in a function.
|
0
|
23
|
|
24 We propose new unit of computation, data segments and code segments. Computations in these segments are finite and
|
2
|
25 predictable. We sometimes call these gears.
|
|
26
|
|
27 A set of gears makes a programming system.
|
|
28
|
|
29 % This unit is larger then an machine instruction and smaller than a function call.
|
|
30
|
|
31 Data segments and code segments are connected by meta data segments and meta code segments.
|
|
32 The idea is construct system as a set of predictable unit of computation.
|
0
|
33
|
|
34 --Definition of data segment and code segment
|
|
35
|
4
|
36 Actually we implement our Gears language in LLVM\cite{LLVM}, but we can think both code and data are System F\cite{Girard:1989:PT:64805} term. As usual, Types are
|
2
|
37 defined starting from type variables X,Y,Z and is generated by two operations:
|
|
38 1. if U and V are types, then U → V is a type.
|
5
|
39 2. if V i a type, and X a type variable,
|
|
40 then ΠX.V is a type.
|
2
|
41 Terms are constructed by six schemes
|
5
|
42 1. variable: x, y, z,... of type T,
|
|
43 2. application: t u of type V, where t is of type
|
|
44 U→V and u is of type U,
|
|
45 3. λ-abstraction: λx.v of type U→V,
|
|
46 where x is a variable of type U and v is of type V,
|
|
47 4. universal abstraction: if v is a term type V, then
|
|
48 we can form ΛX.v of type ΠX.V, so long as the
|
|
49 variable X is not free in the type of a free variable of v.
|
|
50 5. universal application: if t is a term of type ΠX.V
|
|
51 and U is a type, then tU is a term of type V[U/X].
|
4
|
52 and usual conversions,
|
2
|
53 1. (λx.v)u 〜> v[u/x]
|
|
54 2. (ΛX.v)U 〜> v[U/X]
|
5
|
55 What we need here is that a term has a type, a function has type \verb+U → V+.
|
|
56 Code segments f of type \verb+I → O+, accepts data segments of type \verb+I+ and generates data segments of type \verb+O+.
|
4
|
57 f : I → O
|
|
58 Domain of C is I and codomain of C is O.
|
2
|
59
|
4
|
60 Gears system only allows calling another code at the bottom of the code, that is all codes have tail call form. Normal function call
|
|
61 is not prohibited, but it should be closed in a code.
|
|
62
|
5
|
63 <center><img src="fig/codeGear.pdf"></center>
|
3
|
64
|
2
|
65 --Meta computation of Gears
|
|
66
|
4
|
67 Computation of a code is limited in the inputs and the outputs and it makes the computation of the code predictable, but its data are
|
|
68 usually connected to other data. The code has its continuation also. These connections are out of the scope of the code.
|
|
69 We think these connections are made by a meta computation, such as monads\cite{Moggi:1989:CLM:77350.77353}. A monad is a data structure with monad laws and
|
5
|
70 after an execution of a code, monads join code is called to handle meta data structure. With monad \verb+T\verb+,
|
2
|
71 C' : I → T O
|
5
|
72 is a meta computation. \verb+C+ has one to one correspondence with \verb+C'+.
|
3
|
73 μ : T (T O) → T O
|
|
74
|
4
|
75 Parallel execution or IO handling are represented as a monad in our scheme. Monads are only allows to use at the bottom of a code in our system.
|
2
|
76
|
5
|
77 <center><img src="fig/meta.pdf"></center>
|
3
|
78
|
4
|
79 --A category of codes
|
2
|
80
|
4
|
81 Types of code segments and data segments naturally compose a category of function and types. Codes and data are interconnected one by one.
|
3
|
82 Object : a, b, c ...
|
|
83 Arrows : f, g, h ...
|
5
|
84 An arrow has its domain object and codomain object. In this case, Object is a type of data and arrows are function \verb+h+ with type \verb+a → b+, which domain is an input type a and
|
3
|
85 codomain is an output type b.
|
|
86 h : a → b
|
|
87 g : b → c
|
|
88 f : c → d
|
4
|
89 There is composition of arrows,
|
3
|
90 f o g : b → d
|
4
|
91 and it satisfies the composition law.
|
3
|
92 (f o g) o h = f o (g o h )
|
5
|
93 There is also an identity arrow \verb+id a+ for each object \verb+a+, which satisfies,
|
2
|
94 a o id (domain a) = a = id (domain a) o a
|
|
95
|
4
|
96 --A category of data
|
2
|
97
|
3
|
98 A code generate an output type and it becomes an input type of next code. It is similar to a code is in between an input type and an output type.
|
|
99 Is is possible to think data is an arrow between codes?
|
|
100
|
5
|
101 <center><img src="fig/dataGear.pdf"></center>
|
3
|
102
|
4
|
103 In other words, is there any duality in codes and data? Usual answer is no, since we cannot simply combine data segments,
|
3
|
104 but introducing continuation, it is possible to create a category which objects are functions and
|
|
105 which arrows are data. Actually these two categories are dual in the sense of adjunction, that is
|
4
|
106 there is a one to one correspondence between their arrows with isomorphism.
|
3
|
107 Here we show the duality as an exercise of a category theory.
|
2
|
108
|
4
|
109 The problem of data segments composition is that it forgets about later computation. We can simply store it as a continuation in the
|
5
|
110 data segment. Data segments now have a continuation, which is a code segment. It is a part of meta computation. We have a data \verb+f+,
|
|
111 which is a codomain of a code \verb+a+ and is a domain of code \verb+b+. A continuation of \verb+f+ will called before the execution of code \verb+b+.
|
3
|
112
|
5
|
113 Now an data arrow \verb+f+ is a triplet \verb+{a,b,n}+, \verb+a+ is a code which codomain is data \verb+f+ and \verb+b+ is a code which domain is data \verb+f+.
|
|
114 \verb+n : codomain a → domain b + is a continuation of \verb+f+.
|
2
|
115
|
4
|
116 We introduce access function as follows:
|
3
|
117 data-domain f = a
|
5
|
118 data-codomain f = b
|
3
|
119 continuation f = n
|
2
|
120
|
4
|
121 We use data-codomain and data-codomain for new Data Category C, which is constructed from a category C.
|
5
|
122 Objects of Data category C are arrows of category C. An arrow of Data category C from \verb+a+ to \verb+c+ is
|
|
123 a data segment \verb+f+ with continuation \verb+n+, \verb+b'+ is a intermediate data segment.
|
3
|
124 n : codomain a → domain b
|
|
125 f = {a,b,n}
|
2
|
126
|
5
|
127 \verb+f+ is an arrow from \verb+a+ to \verb+b+.
|
3
|
128
|
5
|
129 <center><img src="fig/dataGearComposition.pdf"></center>
|
2
|
130
|
5
|
131 If \verb+f+ and \verb+g+ has same data-domain and data-codomain, an equality of \verb+f+ and \verb+g+ is defined as follows,
|
3
|
132 f = g if b o continuation f = b o continuation g
|
5
|
133 where \verb+b+ is data-codomain of \verb+f+ and \verb+g+.
|
2
|
134
|
4
|
135 Composition of arrows of Data category is defined as follows.
|
5
|
136 {b,c,m} + {a,b,n} = {a,c,m o (b o n))}
|
2
|
137 It is easy to see its composition lows.
|
5
|
138 {c,d,l} + ( {b,c,m} + {a,b,n} ) =
|
|
139 ({c,d,l} + {b,c,m} ) + {a,b,n}
|
3
|
140 because
|
|
141 d o ( l o ( m o (b o n))) = d o (( l o m ) o (b o n) ) .
|
|
142
|
5
|
143 To make an identity arrow in the Data category, if \verb+a+ is a codomain of \verb+f+, we need a reverse arrow of \verb+a+, \verb+a'+.
|
|
144 So every arrows \verb+a+ in C have to have a reverse arrow \verb+a'+, where
|
4
|
145 a o a' = id (codomain a)
|
5
|
146 Then an identity data segment of \verb+a+ in Data Category C is
|
3
|
147 {a,a,a'}
|
5
|
148 <center><img src="fig/IdDataGear.pdf"></center>
|
2
|
149
|
5
|
150 To check \verb+{a,a,a'}+ is an identity,
|
|
151 {b,b,b'} + {a,b,n} = {a,b,b' o (b o n)} = {a,b,n}
|
|
152 Right identity law holds the same way.
|
2
|
153
|
5
|
154 If we use
|
|
155 continuation f = continuation g
|
|
156 as a definition of arrow equality, we need
|
|
157 a' o a = id (domain a)
|
2
|
158
|
|
159
|
|
160 --A trivial duality of code and data
|
|
161
|
5
|
162 <center><img src="fig/Duality.pdf"></center>
|
3
|
163
|
|
164
|
4
|
165 Data category is a data segment with continuation, which is one step behind, so there is a trivial one to one correspondence.
|
|
166 It is also easy to show C and Data category C is an adjunction pair. We show it using a universal mapping problem.
|
2
|
167
|
3
|
168 Data segments in a Data category C is an object of original category C, so it has an identity arrow,
|
4
|
169 which is an object of Data Category C . Actually, we have a map F,
|
3
|
170 F : Obj C → Obj (Data Category C)
|
|
171 F a = id a
|
|
172 As a reverse, Functor U : (DataCategory C) C is defined as follows :
|
|
173 U d = codomain d
|
|
174 U {a,b,n} = b o n
|
5
|
175 where \verb+d+ is an object of Data Category C, and \verb+{a,b,n}+ is an arrow from \verb+a+ to \verb+b+, with continuation \verb+n+.
|
|
176 Identity law and distribution law of \verb+U+ is easyly checked as
|
4
|
177 U {a,a,a'} = a o a' = id (codomain a)
|
5
|
178 We need a mapping \verb+η+, which will be a natural transformation.
|
3
|
179 η : Obj C → U ( F a )
|
|
180 η a = id a
|
5
|
181 We can define a solution \verb+*+ of universal mapping problem for \verb+F, U, η+ for
|
3
|
182 f : a → U {a,b,n}.
|
5
|
183 it is an allow from \verb+F a+ to \verb+b+,
|
3
|
184 * f = {a,b,b' o f}
|
|
185 To see this is a solution,
|
|
186 U ( *f ) o (η a) = (b o (b' o f) ) o id a = f
|
5
|
187 is directly checked and \verb+*f+ is unique, that is
|
|
188 if there is an arrow g U g o (η a) = f then g = *f.
|
3
|
189 This is also trivial as
|
5
|
190 b o continuation g =
|
|
191 ( b o b' ) o ( b o continuation g) =
|
|
192 b o (b' o ( b o continuation g) ) =
|
|
193 b o (b ' o f) =
|
|
194 b o continuation ( *f )
|
|
195 which implies \verb+g = *f+. Q.E.D.
|
3
|
196
|
2
|
197
|
|
198 --Comparison
|
|
199
|
4
|
200 Meta computation in Haskell is defined as set of explicit monads. In Gears, meta computation is unique among the system,
|
2
|
201 which is something like operating system kernel or libraries.
|
|
202
|
4
|
203 In OpenCL\cite{opencl} or Cuda\cite{cuda}, there is a code segment which is called kernel. The kernel runs parallelly in a GPU. The kernel
|
2
|
204 is very similar to our code segment.
|
|
205
|
4
|
206 Object oriented languages, such as Smalltalk-80 or Java , has meta computation as a virtual machines. It is very different from
|
|
207 monad or meta computation of our Gears system. Data segment has no identity such as Smalltalk's self, it can be copied easyly,
|
|
208 which is a very good property in a parallel computation environment.
|
2
|
209
|
4
|
210 Code segment can be seen as a kind of Typed assembly language, which has typed input and predictable behavior.
|
2
|
211
|
|
212 In old age, there is a software design method called data flow diagram. Category of code and data is very much like
|
4
|
213 data flow diagram approach. It can be seen as a revival of main frame technologies.
|
2
|
214
|
|
215
|
|
216 --Conclusion
|
|
217
|
4
|
218 As usual categorical result, trivial duality of category and Data category means nothing itself.
|
2
|
219 During the design of gears system, we see similarities of code segment and data segment. It has
|
4
|
220 meta segments and continuations. We think the duality gives us some guidance to design software
|
2
|
221 system such as an operating system or libraries.
|
|
222
|
4
|
223 We have implemented Gears system on top of LLVM\cite{LLVM} and hope it can be used as real system description language.
|
2
|
224
|
4
|
225
|