	.TITLE	ZPARSE
	.IDENT	/V1.3/
;
; ++
; This is the Z-machine parser.
; (c) 2000 by Johnny Billquist
;
; History:
;
;	00-08-01	BQT	Initial coding started.
; Y1.0	00-08-26 21:00	BQT	First release.
; Y1.1	00-08-30 02:00	BQT	Bugfix. If flag is set, the parser
;				should skip entry if word is unknown.
; Y1.2	00-09-01 04:30	BQT	Added parsing of ASCII characters.
; V1.3	00-11-07 16:00	BQT	Code is working.
; --
	.INCLUDE /ZMAC/
	.PSECT	DATA,D,RW
;
SEPTBL:	.BLKB	100		; Separator table.
;
WRDLEN:	.WORD	0		; Length of each word entry.
WRDCNT:	.WORD	0		; Number of words in dictionary.
WRDTAB:	.WORD	0		; Pointer to start of dictionary.
WRDLIM:	.WORD	0		; Limit on # of words.
;
WORD:	.BLKB	10.		; Current word.
WRDEND:
ZWORD:	.BLKB	18.		; Current word in ZSCII.
CURLEN:	.WORD	0		; Length of current word.
ZWLEN::	.WORD	0		; Length of words in WORDS.
CURWRD:	.WORD	0		; Current number of words.
FLAGS:	.WORD	0		; Parse flags.
;
STRPTR:	.WORD	0		; Pointer to start of string.
;
	.PSECT	CODE,I,RO
;
; PARSE
;
; Parse a string.
; Do lexical analysis, and build the parse tree.
;
; In:	R0 - Address of input buffer.
;	R1 - Address of input string.
;	R2 - Address of dictionary table.
;	R3 - Parse buffer.
;	R4 - Flags.
;
PARSE::	MOV	R4,FLAGS	; Save flags.
	MOV	R5,-(SP)	; Save registers.
	MOV	R0,STRPTR	; Save string pointer.
;
; First set up the basic parser info.
;
	MOV	#SEPTBL,R4
	.GETBB	R2,R5		; Get size of separator table.
	MOVB	R5,(R4)+
	INC	R2
	MOV	R3,-(SP)
10$:	.GETBB	R2,R3		; Copy all separators.
	MOVB	R3,(R4)+
	INC	R2
	SOB	R5,10$
	MOV	(SP)+,R3
	.GETBB	R2,WRDLEN	; Get word length.
	INC	R2
	.GETWB	R2,WRDCNT	; Get word count.
	ADD	#2,R2
	MOV	R2,WRDTAB	; Save pointer to word table.
;
; Now we have all the dictionary info we need. R2 can now be
; destroyed.
;
	.GETBB	R3,WRDLIM	; Limit on words.
	INC	R3
	MOV	R3,-(SP)	; Save pointer to word count.
	INC	R3
	CLR	CURWRD		; Number of words done.
;
100$:	MOV	#1,CURLEN
	CLR	R2		; Get next character.
	BISB	(R1)+,R2
	BNE	110$		; We got something.
101$:	MOV	(SP)+,R3	; Nothing more.
	.PUTBB	R3,CURWRD	; Save actual number of words done.
	MOV	(SP)+,R5	; Restore registers.
	RETURN
;
; We have a character.
;
110$:	CMP	R2,#' 		; Space?
	BEQ	100$		; Yes. Next char please.
;
; We have the beginning of a word.
;
	MOV	R1,-(SP)	; Save pointer to start of word.
	DEC	(SP)		; (um, start was at previous char.)
	MOV	#WORD,R5	; Point at place for word.
	MOVB	R2,(R5)+	; Copy char.
	CALL	ISSEP		; Is it a separator?
	BCS	200$		; Yes. That's our word.
;
; The word is not a separator. Let's start copying.
;
120$:	CLR	R2		; Get next char.
	BISB	(R1),R2
	BEQ	200$		; End of string.
	CALL	ISSEP		; Separator?
	BCS	200$		; Yes.
	CMP	R2,#' 		; Space?
	BEQ	200$		; Yes.
	CMP	R5,#WRDEND-1	; Buffer full?
	BGE	130$		; Yes.
	MOVB	R2,(R5)+	; No. Store it.
130$:	INC	R1		; Point at next char.
	INC	CURLEN
	BR	120$		; And repeat.
;
; We now have a word.
;
200$:	CLRB	(R5)		; Set end of word mark.
;
; Now we'll ZSCIIfy it.
;
	MOV	R0,-(SP)	; Save register.
	MOV	ZWLEN,R0	; Get length of words.
	ASR	R0
	ADD	ZWLEN,R0
	MOV	#WORD,R4	; ASCII for word.
	MOV	#ZWORD,R5	; Buffer where to place ZSCII.
300$:	CLR	R2		; Get char.
	BISB	(R4)+,R2
	BEQ	310$		; End of buffer?
	CALL	ZSCII		; No. Convert to ZSCII.
	SOB	R0,300$		; Loop until space runs out.
	BR	315$		; Space ran out.
310$:	MOVB	#5,(R5)+	; Space left. Fill with filler.
	SOB	R0,310$		; Loop.
;
; And then pack it.
;
315$:	MOV	ZWLEN,R5	; Word length.
	ASR	R5
	ADD	ZWLEN,R5
	CLRB	ZWORD(R5)	; Set end of word mark.
	MOV	(SP)+,R0	; Restore R0.
	MOV	#ZWORD,R4	; Source.
	MOV	#ZWORD,R5	; Destination.
320$:	CLR	R2		; Setup word.
	BISB	(R4)+,R2	; Get char.
	ASH	#5,R2		; Shift upwards.
	BISB	(R4)+,R2	; Get next char.
	ASH	#5,R2		; Shift upwards.
	BISB	(R4)+,R2	; Get char.
	SWAB	R2
	MOV	R2,(R5)+	; Save word.
	TSTB	(R4)		; More to come?
	BNE	320$		; Loop.
;
	BIS	#200,-(R5)	; Set flag on last word.
;
; We now have a correct ZSCII word in ZWORD, along with the plain
; word in WORD.
;
; We now need to find the word in the dictionary.
;
	CALL	FNDWRD
;
; At this point we have the following information:
;	R0 - Points at start of input buffer.
;	R1 - Points at next word in input buffer.
;	R2 - Address of word in dictionary (or 0 if no match).
;	R3 - Address where in parse buffer to place result.
;	(SP) - Address of start of word.
;
; Start by setting up the parse info.
;
	TST	R2		; Did we succeed?
	BNE	400$		; Yes.
	TST	FLAGS		; No. Should we include unknown words?
	BEQ	400$		; Yes.
	ADD	#2,SP		; No. Clean up stack.
	ADD	#4,R3		; Skip to next entry.
	BR	499$		; Do next word.
;
400$:	.PUTWB	R3,R2		; Byte address in dictionary.
	ADD	#2,R3
	.PUTBB	R3,CURLEN	; Word length.
	INC	R3
	MOV	(SP)+,R4	; Get pointer to start of string.
	SUB	R0,R4		; Make pointer relative to start of buf.
	.PUTBB	R3,R4		; Save pointer.
	INC	R3
;
; That's all, folks!
;
499$:	INC	CURWRD		; Number of words done.
	DEC	WRDLIM		; Max # of words left to do.
	BEQ	500$		; Done?
	JMP	100$		; No. Next word.
500$:	JMP	101$		; Yes.
;
; ISSEP - Check if a character is a separator.
;
; In:	R2 - Character.
;
; Out:	Carry set means character was a separator.
;
ISSEP::	MOV	R0,-(SP)	; Save registers.
	MOV	R1,-(SP)
	MOV	#SEPTBL,R0	; Point at separator table.
	MOVB	(R0)+,R1	; Get length of table.
10$:	CMPB	R2,(R0)+	; Separator match?
	BEQ	20$		; Yes.
	SOB	R1,10$		; No. Check next.
	CLC			; No match found.
	BR	30$
20$:	SEC			; Match found.
30$:	MOV	(SP)+,R1	; Restore registers.
	MOV	(SP)+,R0
	RETURN
;
; ZSCII - Convert a character to ZSCII character(s).
;
; In:	R2 - Character.
;	R5 - Buffer where to place converted character(s).
;
; Out:	R5 updated.
;
ZSCII:	MOV	R0,-(SP)	; Save registers.
	MOV	R1,-(SP)
	MOV	ASCPTR,R0	; Get ASCII table.
	MOV	#26.,R1		; We have 26 possible chars to look at.
10$:	CMPB	R2,(R0)+	; Match?
	BEQ	30$		; Yes. Done.
	SOB	R1,10$		; No. Try next.
	MOVB	#5,(R5)+	; If we didn't find it, try table 2.
	MOV	#26.,R1
	ADD	R1,R0
20$:	CMPB	R2,(R0)+	; Match?
	BEQ	30$		; Yes.
	SOB	R1,20$		; No. Try next.
	MOVB	#6,(R5)+	; No match found. Indicate ASCII coming.
	MOV	R2,-(SP)	; Put it in as ASCII.
	ASH	#-5,R2		; First three high bits.
	BIC	#^C7,R2
	MOVB	R2,(R5)+
	MOV	(SP)+,R1	; And then five low bits.
	BIC	#^C37,R1
	BR	35$
30$:	NEG	R1
	ADD	#32.,R1
35$:	MOVB	R1,(R5)+
	MOV	(SP)+,R1
	MOV	(SP)+,R0
	RETURN
;
; FNDWRD - Find a word in the dictionary.
;
; In:	ZWORD - The ZSCII word.
;	WRDTAB - Pointer to the dictionary.
;	WRDCNT - Number of words in dictionary.
;	WRDLEN - Length of one word in dictionary.
;
; Out:	R2 - Address of dictionary entry, or 0 if not found.
;
; If WRDCNT is a negative value, we need to search through the
; dictionary in a linear fashion.
;
FNDWRD:	MOV	R0,-(SP)	; Save registers...
	MOV	R1,-(SP)
	MOV	R3,-(SP)
	MOV	R4,-(SP)
	MOV	R5,-(SP)
	CLR	R0		; First entry.
	MOV	WRDCNT,R1	; Last entry.
	BPL	10$		; If positive, we have a sorted dict.
;
; Dictionary is unsorted. We'll have to run through it linearly.
;
	NEG	R1		; Actual dictionary length.
1$:	MOV	R0,R3		; Point at entry to check.
	MUL	WRDLEN,R3
	ADD	WRDTAB,R3
	MOV	ZWLEN,R2	; Number of bytes to compare.
	ASR	R2		; Make that words.
	MOV	#ZWORD,R4	; Word to compare against.
2$:	.GETWB	R3,R5		; Read word.
	ADD	#2,R3		; Next word.
	CMP	(R4)+,R5	; Compare.
	BNE	3$		; Not equal. Search on.
	SOB	R2,2$		; Repeat.
	SUB	ZWLEN,R3	; Found. R3 now points past word in
	MOV	R3,R2		; dictionary, so back up.
	BR	200$		; Done.
3$:	INC	R0		; Not correct entry.
	SOB	R1,1$		; Try again.
	BR	100$		; We fail.
;
; Sorted dictionary. We can use a binary search through it...
;
10$:	MOV	R1,R3		; Let R3 be in the middle of R0,R1
	SUB	R0,R3
	ASR	R3
	ADD	R0,R3
	MUL	WRDLEN,R3	; Now make R3 into a proper address.
	ADD	WRDTAB,R3
	MOV	ZWLEN,R2	; R2 is word length.
	MOV	#ZWORD,R4	; And R4 points at "our" word.
20$:	.GETBB	R3,R5		; Try to match words...
	INC	R3
	CMPB	(R4)+,R5
	BHI	30$		; If higher, then we should search upper
	BLO	40$		; If lower, we should search low
	SOB	R2,20$		; Match means we continue to check.
	SUB	ZWLEN,R3	; We found a match. R3 holds address
	MOV	R3,R2		; past word in dictionary entry.
	BR	200$		; R2 is now correct.
;
; Search upper half.
; Move lower limit to just checked entry.
;
30$:	MOV	R1,R2
	SUB	R0,R2
	ASR	R2
	BEQ	100$
	ADD	R2,R0
	BR	10$
;
; Search lower half. Move upper limit to just checked entry.
;
40$:	MOV	R1,R2
	SUB	R0,R2
	ASR	R2
	BEQ	100$
	SUB	R2,R1
	BR	10$
;
; Finished...
;
100$:	CLR	R2
200$:	.ENABL	LSB
	.DBG	#D.PARS,<"Parse word results in %D.">,R2
	MOV	(SP)+,R5
	MOV	(SP)+,R4
	MOV	(SP)+,R3
	MOV	(SP)+,R1
	MOV	(SP)+,R0
	RETURN
	.DSABL	LSB
;
	.END