git @ Cat's Eye Technologies Shelta / f0a6091
Import of Shelta version 1.1 revision 2009.0307 sources. Cat's Eye Technologies 10 years ago
6 changed file(s) with 499 addition(s) and 27 deletion(s).
0 bin/sheltan.com: src/shelta86.s
1 nasm src/shelta86.s -o bin/sheltan.com
2
3 all: bin/sheltan.com
4
00 @echo off
1 REM BOOTSTRP.BAT v2002.1208 (c)1999 Chris Pressey, Cat's-Eye Technologies.
1 REM BOOTSTRP.BAT
2 REM v1.1.2009.0307 (c)1999-2009 Chris Pressey, Cat's-Eye Technologies.
23 REM Builds the bootstrapped versions (S & S2) of the Shelta compiler.
34 @echo on
4 call bin\shelta 86 prj\sheltas
5 copy prj\sheltas.com bin\sheltas.com
6 call bin\shelta s prj\sheltas
7 copy prj\sheltas.com bin\sheltas2.com
8 call bin\shelta s2 prj\sheltas
9 diff prj\sheltas.com bin\sheltas2.com
10 del prj\sheltas.com
5 call bin\shelta n eg\sheltas
6 copy eg\sheltas.com bin\sheltas.com
7 call bin\shelta s eg\sheltas
8 copy eg\sheltas.com bin\sheltas2.com
9 call bin\shelta s2 eg\sheltas
10 del eg\sheltas.com
0 @echo off
1 REM SHELTA.BAT v2002.1208 (c)2002 Cat's-Eye Technologies.
0 REM @echo off
1 REM SHELTA.BAT
2 REM v1.1.2009.0307 (c)1999-2009 Chris Pressey, Cat's-Eye Technologies.
23 REM A 'make'-like utility for Shelta compilers, as an MS-DOS batch.
34
45 REM -- Change the following lines to tailor what libraries are
56 REM -- included by default. See readme.txt
6 type lib\8086\8086.she >s
7 type lib\8086\gupi.she >>s
8 type lib\8086\dos.she >>s
9 type lib\8086\string.she >>s
10 type lib\gupi\linklist.she >>s
7 type lib\8086\8086.she >s.she
8 type lib\8086\gupi.she >>s.she
9 type lib\8086\dos.she >>s.she
10 type lib\8086\string.she >>s.she
11 type lib\gupi\linklist.she >>s.she
1112
12 REM -- This section builds the source file, always called 'S'.
13 REM -- This section builds the source file, always called 's.she'.
1314 if not exist %2.she echo Can't find project file %2.she!
14 if exist %3.she type %3.she >>s
15 if exist %4.she type %4.she >>s
16 if exist %5.she type %5.she >>s
17 if exist %6.she type %6.she >>s
18 if exist %7.she type %7.she >>s
19 if exist %8.she type %8.she >>s
20 if exist %9.she type %9.she >>s
21 if exist %2.she type %2.she >>s
22 type null.txt >>s
15 if exist %3.she type %3.she >>s.she
16 if exist %4.she type %4.she >>s.she
17 if exist %5.she type %5.she >>s.she
18 if exist %6.she type %6.she >>s.she
19 if exist %7.she type %7.she >>s.she
20 if exist %8.she type %8.she >>s.she
21 if exist %9.she type %9.she >>s.she
22 if exist %2.she type %2.she >>s.she
23 type bin\null.txt >>s.she
2324
24 bin\shelta%1.com <s > %2.com
25 rem bin\shelta%1.com <s.she
26 bin\shelta%1.com <s.she >%2.com
2527
2628 if errorlevel 32 echo Source file could not be opened.
2729 if errorlevel 16 echo Error - Unknown identifier in source file.
28 del s
30 del s.she
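
For readers without an MS-DOS prompt handy, the order in which the updated SHELTA.BAT assembles s.she can be modelled roughly as follows. This is only a sketch in Python and not part of the distribution: the names source_order, DEFAULT_LIBS and the exists callback are made up for illustration, and it assumes the new version of the script (default libraries first, then arguments %3 through %9, then the project file %2, then bin\null.txt).

```python
# Hypothetical model of the concatenation order used by the updated SHELTA.BAT.
DEFAULT_LIBS = [
    r"lib\8086\8086.she",
    r"lib\8086\gupi.she",
    r"lib\8086\dos.she",
    r"lib\8086\string.she",
    r"lib\gupi\linklist.she",
]

def source_order(project, extras=(), exists=lambda path: True):
    """Return the files concatenated into s.she, in order."""
    order = list(DEFAULT_LIBS)            # always typed in, no 'if exist' guard
    for name in list(extras)[:7]:         # %3 .. %9: at most seven extra files
        path = name + ".she"
        if exists(path):                  # missing extras are silently skipped
            order.append(path)
    if exists(project + ".she"):          # %2 comes after the extras
        order.append(project + ".she")
    order.append(r"bin\null.txt")         # always appended last
    return order
```

Calling source_order(r"eg\sheltas") lists the five default libraries, then eg\sheltas.she, then bin\null.txt, which is the input the bootstrap script feeds to each compiler in turn.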
Binary diff not shown
0 Well, here we are, ten years later.
1
2 What brought me back here was the fact that no one uses Turbo Assembler
3 anymore. I'm not even sure if Borland is around anymore. And I started
4 thinking, well, I use NASM these days; maybe I should translate the 8086
5 assembly version of Shelta into NASM. It's free, and tinkers like free.
6 Plus Ben recommended it, way back when it was something like version 0.98.
7 I said I'd wait until it was past version 1.0. Well, it's 2.mumble now,
8 so it's high time, right?
9
10 So I started translating, and I discovered just how much more explicit
11 NASM is. I was aiming to reproduce the same SHELTA86.COM file, or at
12 least one of the same length with the labels in the same places. I had
13 to go to some lengths to stop NASM from inserting redundant ds:
14 segment references, and from padding the start of the data segment to a
15 word boundary (I just left out the data segment directive entirely.)
16
17 But then, when I got it all nice and translated -- Shock! Horror!
18 I discovered the awful truth: shelta86.com cannot compile sheltas.she.
19
20 Where did I get the nerve to say that I had bootstrapped a half-kilobyte
21 compiler? Misleading at best! I had bootstrapped a probably about 555-
22 byte compiler. I then butchered the language it compiled -- removing
23 three instruction forms -- so that I could shove that compiler into half
24 a kilobyte. At no time did I actually bootstrap the <512-byte version.
25 No, that would have required rewriting shelta.she to have a lot of blocks
26 with temporary names that were only pushed once, and other garbage like
27 that, so that the stripped-down compiler wouldn't choke on it. One of
28 the nice things about (the full) Shelta, I think, is that while it is
29 small, it doesn't force you to wallow in garbage. At least not a lot.
30
31 So I screwed up my courage, cracked my knuckles, and tried to live up to
32 my own hype. I re-instated the string (`) and push-pointer-anonymously
33 (]) functions, which bumped the size back up to around 555 bytes. I
34 didn't bother with the push-named-pointer (]Name) form, because it's not
35 used in sheltas.she. OK, so neither are strings, but a lot of the other
36 example Shelta programs use strings, so I thought they would be good to
37 have.
38
39 I then proceeded to squish the living daylights out of the new, NASM-
40 language shelta86.s. Mostly this involved long, hard looks at the logic
41 and detailed liveness analysis (done by hand, of course.) There were a
42 few small tweaks that were easily done; for example, removing one or two
43 instructions that were completely unnecessary, and replacing the jmp
44 in the handler dispatch with a call (several ret statements take up less
45 space than several jmps back to the top of the loop. Who cares about
46 wasting space on the stack?) The most significant savings, though, came
47 from factoring out some code to write a push instruction and calling an
48 existing routine for it instead, and from shuffling registers to keep dx
49 free long enough so that it, instead of a memory location, could be used
50 to store one of the crucial computed pointers. The result: a 509-byte
51 executable which did all that the old shelta86.com could do *and* enough
52 more to actually compile sheltas.she!
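
To put rough numbers on the jmp-versus-call trade described above: on the 8086 a near ret is one byte (C3), a short jmp is two bytes (EB rel8), and call ax and jmp ax are two bytes each (FF D0 and FF E0). The Python sketch below only does that arithmetic. The function names are invented for illustration, and the number of handler exits is a parameter, not a count taken from shelta86.s.

```python
# Instruction sizes in bytes for the 8086 encodings involved.
RET_NEAR = 1     # C3
JMP_SHORT = 2    # EB rel8
CALL_REG = 2     # FF D0 (call ax)
JMP_REG = 2      # FF E0 (jmp ax)

def dispatch_bytes(exits, use_call):
    """Bytes spent on dispatching and on getting back to the main loop."""
    if use_call:
        # call ax, one jmp short back to the loop, and a ret per handler exit
        return CALL_REG + JMP_SHORT + exits * RET_NEAR
    # jmp ax, and a jmp short back to the loop at every handler exit
    return JMP_REG + exits * JMP_SHORT

def bytes_saved(exits):
    """How many bytes the call/ret scheme saves over the jmp scheme."""
    return dispatch_bytes(exits, False) - dispatch_bytes(exits, True)
```

With five handler exits, bytes_saved(5) comes to 3. The saving is exits minus 2, so the change only pays off once there are more than two exits, and it would pay off more wherever a handler was too far away for a short jmp.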
53
54 The old shelta86.com is still in the distribution, for comparison, or
55 nostalgia, or completeness, or whatever. The new executable is called
56 sheltan.com (for Shelta in NASM, I suppose.) The bootstrapping and
57 driver scripts have been changed slightly to accommodate this newfangle-
58 ness. I haven't touched the other documentation, which is now slightly
59 inaccurate but still quite useful.
60
61 Happy bootstrapping!
62 Chris Pressey
63 March 7, 2009
64 Bellevue, WA
0 ; shelta86.s
1 ; v1.1.2009.0307
2 ; (c)2009 Chris Pressey, Cat's Eye Technologies.
3
4 ; Implements an assembler/compiler for the Shelta language, in 8086 machine
5 ; language, in the format of the NASM assembler.
6
7 ; * Special thanks to Ben Olmstead (BEM) for his suggestions for how to
8 ; reduce SHELTA86.COM's size even further.
9
10 org 0100h
11 bits 16
12 cpu 8086
13
14 ;-------------- Code
15
16 ; Main program.
17
18 WhileFile:
19
20 ; ----- begin scanning token
21
22 call word ScanChar ; get char -> al
23 or al, al
24 jz EndFile
25 cmp al, 32
26 jbe WhileFile ; repeat if char is whitespace
27
28 mov di, token
29 cld
30
31 .TokenLoop: stosb ; put char in token
32 call word ScanChar ; get char
33 cmp al, 32
34 ja .TokenLoop ; repeat if char is not whitespace
35
36 mov byte [di], 0 ; null-terminate the token
37
38 ; ----- end scanning token
39
40 mov si, token + 1
41
42 mov al, [token]
43 sub al, '['
44 cmp al, 5
45 ja .Unroll
46 xor ah, ah
47 shl ax, 1
48 xchg bx, ax
49 mov ax, [ttable + bx]
50 call ax ; call handler as listed in ttable
51 jmp short WhileFile
52
53 .Unroll: dec si ; start at first character of token
54 call word LookupSymbol ; destroys DI & SI, but that's OK
55
56 ; copy cx bytes from ax to codeh
57
58 xchg ax, si
59 mov di, [codeh] ; use di to track codeh
60 rep movsb
61
62 UpCodeH: mov [codeh], di
63 jmp short WhileFile
64
65 EndFile: ; put in a jump over the safe area
66
67 mov di, token ; re-use token
68 mov al, 0e9h
69 stosb
70 mov ax, [safeh]
71 sub ax, safe - 1
72 stosw
73 mov al, 090h
74 stosb
75
76 mov cx, 4
77 mov dx, token
78 call word WriteIt
79
80 ; make the first word of the safe area an offset
81 ; to just past the last word of the code
82
83 mov cx, [safeh]
84 mov dx, safe
85 sub cx, dx
86 mov ax, cx
87 add ax, [codeh]
88 sub ax, codeadj
89 mov [safe], ax
90
91 call word WriteIt
92
93 mov cx, [codeh]
94 mov dx, code
95 sub cx, dx
96 call word WriteIt
97
98 xor al, al
99
100 GlobalExit: mov ah, 4ch ; exit to DOS
101 int 21h
102
103 WriteIt:
104 mov ah, 40h
105 mov bx, 1
106 int 21h
107 jnc .OK
108 mov al, 32
109 jmp short GlobalExit
110 .OK: ret
111
112 ; -------------------------------- HANDLERS --------------------------- ;
113 ; When coming into any handler, di will equal the address of the null
114 ; (that is, the number of characters in the token + offset token)
115
116 ; ==== [ ==== BEGIN BLOCK ==== ;
117
118 BeginBlock: mov di, [stach] ; push [ onto stack
119 mov ax, [codeh]
120 stosw ; mov [di], ax / add di, 2
121 mov [stach], di
122 ret
123
124 ; ==== ] ==== END BLOCK ==== ;
125
126 EndBlock: dec di ; di left over from scanning token
127
128 mov bx, di ; di now free to hold something until .WName
129 sub bx, si ; get length of token
130 mov [toklength], bx ; store it for later
131
132 mov ax, [safeh]
133 mov [safestart], ax
134 mov dx, ax ; dx = namestart initially = safestart = safeh
135 xchg ax, di ; di now holds safe area head location
136
137 sub word [stach], byte 2
138 mov bx, [stach] ; pop [ from stack
139 mov ax, [bx] ; ax = codeh when [ happened
140
141 mov bp, [codeh] ; find length
142 sub bp, ax ; bp = length of code between [ ... ] (codeh - old codeh)
143
144 cmp word [stach], stac
145 je .StackEmpty
146
147 mov cx, [bx - 2] ; cx = contents popped from stack
148
149 ; namestart:dx = namestart:dx - (enclosing block start:cx - this block start:ax)
150
151 sub cx, ax
152 sub dx, cx
153
154 .StackEmpty: cmp byte [si], ':' ; si still = offset token + 1
155 jne .PreCopy
156
157 mov di, [macrh] ; copy into macro area instead of safe area if :
158 mov dx, di
159
160 ; copy everything from ax to codeh into the di area
161
162 .PreCopy: push ax
163 mov cx, bp
164 push si
165 xchg si, ax
166 rep movsb
167 pop si
168 pop ax
169
170 ; restore codeh back to old codeh before [
171
172 mov [codeh], ax
173 cmp byte [si], ':' ; si still = offset token + 1
174 jne .UpdateSafe
175
176 mov [macrh], di
177 jmp short .NameIt
178
179 .UpdateSafe: mov [safeh], di
180
181 ; write push instruction if '=' or ':' not used
182
183 cmp byte [si], '=' ; si still = offset token + 1
184 je .NameIt
185
186 mov ax, [safestart]
187 sub ax, safeadj
188
189 mov di, [codeh] ; di no longer contains macrh/safeh
190 jmp short WritePush
191
192 ; insert namestart into dictionary
193
194 .NameIt: mov cx, dx
195 mov ax, [toklength]
196
197 inc si
198
199 .WName: ; Insert token into the symbol table.
200 ; DESTROYS: DI
201 ; INPUT: si = pointer to token text
202 ; ax = length of token text
203 ; cx = pointer to data associated with token
204 ; bp = length of data associated with token
205
206 mov di, [symth] ; di no longer contains macrh/safeh
207 add ax, 6 ; 1 word for length, 1 for ptr, 1 for data length
208
209 stosw ; place ax length in symt
210
211 sub ax, 6
212 xchg cx, ax ; cx <- ax; ax <- cx
213 stosw ; place cx (ptr to data)
214 xchg ax, bp
215 stosw ; place bp (ptr length)
216
217 rep movsb
218
219 mov [symth], di
220
221 ret
222
223 ; ==== ^ ==== PUSH POINTER ==== ;
224
225 PushPointer: call LookupSymbol ; destroys di & si, should be OK
226
227 sub ax, safeadj
228 mov di, [codeh]
229 jmp short WritePush
230
231 ; ==== ` ==== STRING ==== ;
232
233 String: mov di, [codeh]
234 .Loop: mov al, [si]
235 stosb
236 inc si
237 cmp byte [si], 0
238 jne .Loop
239 mov [codeh], di
240 ret
241
242 ; ==== _ ==== LITERAL BYTE ==== ;
243
244 LiteralByte: cmp byte [si], '_'
245 je LiteralWord
246 cmp byte [si], '^'
247 je LiteralSymbol
248 call DecipherDecimal ; sets DI to [codeh]
249 jmp short GnarlyTrick
250
251 ; ==== __ ==== LITERAL WORD ==== ;
252
253 LiteralWord: inc si
254 call DecipherDecimal ; sets DI to [codeh]
255 FunkyTrick: stosw
256 jmp short CheapTrick
257
258 ; ==== _^ ==== LITERAL SYMBOL ==== ;
259
260 LiteralSymbol: inc si
261 call LookupSymbol ; destroys DI & SI, that's OK
262
263 sub ax, safeadj
264
265 mov di, [codeh]
266 jmp short FunkyTrick
267
268 ; ==== \ ==== PUSH WORD ==== ;
269
270 PushWord: call DecipherDecimal ; sets DI to [codeh]
271
272 WritePush: mov byte [di], 0b8h ; B8h, low byte, high byte, 50h
273 inc di
274 stosw
275 mov al, 50h
276 GnarlyTrick: stosb
277 CheapTrick: mov [codeh], di
278 ret
279
280 ; -------------------------------- SUBROUTINES --------------------------- ;
281
282 DecipherDecimal:
283 ; INPUT: si = address of token
284 ; OUTPUT: ax = value, di = codeh
285 ; uses and destroys DI
286
287 xor di, di
288
289 .Loop: lodsb
290
291 mov bx, di
292 mov cl, 3
293 shl bx, cl
294 mov cx, di
295 shl cx, 1
296 add bx, cx
297
298 sub al, '0'
299 cbw
300 add bx, ax
301 mov di, bx
302
303 cmp byte [si], '0'
304 jae .Loop
305
306 xchg ax, di
307 mov di, [codeh]
308 ret
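
The multiply-by-ten in DecipherDecimal above is done without a mul instruction, as di*8 + di*2 followed by adding the digit. A rough Python model of the loop is given below as a reading aid. It is a sketch, not a translation anyone should rely on: the function name is made up, the token is assumed to be ASCII with at least one character before its terminating NUL, and the 16-bit register is imitated with a mask.

```python
def decipher_decimal(token):
    """Model of DecipherDecimal; token is a bytes object ending in a NUL."""
    value = 0
    i = 0
    while True:
        digit = token[i] - ord("0")        # lodsb / sub al, '0' / cbw
        i += 1
        # di*8 + di*2 + digit, kept to 16 bits like the real registers
        value = ((value << 3) + (value << 1) + digit) & 0xFFFF
        if token[i] < ord("0"):            # cmp byte [si], '0' / jae .Loop
            break
    return value
```

For example, decipher_decimal(b"123\0") gives 123, and b"65536\0" wraps around to 0. Note that the loop keeps going on any character at or above '0', so in this model a stray letter is folded into the value instead of being rejected.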
309
310 ; Scans a single character from the input file, placing
311 ; it in register al, which will be 0 upon error
312 ; or eof (so don't embed nulls in the Shelta source...)
313
314 ScanChar:
315 mov ah, 7 ; read from stdin one byte
316 int 21h
317 cmp al, ';' ; check for comment
318 je .Comment
319 ret
320 .Comment: mov ah, 7 ; read from stdin one byte
321 int 21h
322 cmp al, ';' ; check for end of comment
323 jne .Comment
324 jmp short ScanChar
325
326 LookupSymbol:
327 ; INPUT: si = address of symbol to find, di = address of null termination
328 ; OUTPUT: ds:ax = pointer to contents or zero if not found
329 ; cx = length of contents
330
331 mov bx, symt ; bx starts at symbol table
332 mov bp, si
333 sub di, si
334
335 .Loop: mov ax, [bx] ; first word = token size
336
337 mov dx, bx ; keep track of start of this symt entry
338
339 sub ax, 6
340 cmp ax, di
341 jne .Exit ; if it doesn't fit, you must acquit
342
343 ; exit if right token
344
345 xor si, si ; reset si to token
346 .Inner: mov al, [bx + 6] ; get byte from bx+6=pointer to token text
347 cmp [bp + si], al ; compare to si=token
348 jne .Exit
349 inc bx
350 inc si
351 cmp si, di ; hit the length yet?
352 jb .Inner ; no, repeat
353
354 ; a match!
355
356 mov bx, dx
357 mov cx, [bx + 4] ; third word = data length
358 mov ax, [bx + 2] ; second word = data ptr
359 ret
360
361 .Exit: mov bx, dx
362 mov ax, [bx]
363 add bx, ax
364 cmp bx, [symth]
365 jb .Loop
366
367 mov al, 16 ; return 16 if unknown identifier
368 jmp GlobalExit
369
370 ;-------------- Initialized Data
371
372 symth: dw symt
373 codeh: dw code
374 stach: dw stac
375 safeh: dw safe + 2
376 macrh: dw macr
377
378 ttable: dw BeginBlock, PushWord, EndBlock, PushPointer, LiteralByte, String
379 ; [ \ ] ^ _ `
380
381 ;-------------- Uninitialized Data
382
383 section .bss
384
385 token: resb 128
386
387 safestart: resw 1
388 toklength: resw 1
389
390 safe: resb 16384
391 symt: resb 16384 ; 16K + 16K = 32K
392 code: resb 4096
393 macr: resb 4096 ; + 8K = 40K
394 stac: resb 256
395
396 ;-------------- Equates
397
398 safeadj equ (safe - 0104h)
399 codeadj equ (code - 0104h)
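
The symbol table that .WName writes and LookupSymbol walks has a simple record layout: one word for the total entry size (name length plus 6), one word for the data pointer, one word for the data length, then the name itself. The Python sketch below models that layout so it can be poked at outside DOS. The function names and the use of a bytearray are illustrative assumptions, and where the real code exits to DOS with errorlevel 16 the model returns None.

```python
import struct

def add_symbol(table, name, data_ptr, data_len):
    """Append one entry: total size, data pointer, data length, then the name."""
    table.extend(struct.pack("<HHH", len(name) + 6, data_ptr, data_len))
    table.extend(name)

def lookup_symbol(table, name):
    """Return (data_ptr, data_len) for name, or None if it is not present."""
    pos = 0
    while pos < len(table):
        size, data_ptr, data_len = struct.unpack_from("<HHH", table, pos)
        # The name occupies the bytes after the three header words.
        if table[pos + 6:pos + size] == name:
            return data_ptr, data_len
        pos += size          # the size word doubles as the link to the next entry
    return None
```

A short session might look like this: create table = bytearray(), call add_symbol(table, b"foo", 0x1234, 3), and then lookup_symbol(table, b"foo") returns (0x1234, 3), while a name that was never added returns None.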