B/text: Text Template.

@Purpose: Code to support the text kind of value.

@-------------------------------------------------------------------------------

@p Block Format.
The short block for a text is two words long: the first word selects which
form of storage will be used to represent the content, and the second word
is a reference to that content. This reference is an I6 String or Routine
in all cases except one, when it's a pointer to a long block containing
a null-terminated array of characters, like a C string.

Clearly we need |PACKED_TEXT_STORAGE| and |UNPACKED_TEXT_STORAGE| to
distinguish between the two basic methods of text storage, roughly
equivalent to the pre-2013 kinds "text" and "indexed text". But why
do we need four?

|CONSTANT_PACKED_TEXT_STORAGE| is easy to explain: the BlkValue routines
normally detect constants using metadata in their long blocks, but of course
that won't work for values which haven't got any long blocks. We use this
instead. We don't need a |CONSTANT_UNPACKED_TEXT_STORAGE| because I7 never
compiles constant text in unpacked form.

The surprising one is |CONSTANT_PERISHABLE_TEXT_STORAGE|. This is a constant
created by the I7 compiler which is marked as being tricky because its value
is a text substitution containing references to local variables. Unlike
other text substitutions, this can't meaningfully be stored away to be
expanded later: it must be expanded into unpacked text before it perishes.

@c
Constant CONSTANT_PACKED_TEXT_STORAGE = BLK_BVBITMAP_TEXT + BLK_BVBITMAP_CONSTANT + 1;
Constant CONSTANT_PERISHABLE_TEXT_STORAGE = BLK_BVBITMAP_TEXT + BLK_BVBITMAP_CONSTANT + 2;
Constant PACKED_TEXT_STORAGE = BLK_BVBITMAP_TEXT + 3;
Constant UNPACKED_TEXT_STORAGE = BLK_BVBITMAP_TEXT + BLK_BVBITMAP_LONGBLOCK + 4;

@p Extent Of Long Block.
When there's a long block, we need enough of the entries to store the
number of characters, plus one for the null terminator.
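
As a rough illustration of the extent rule, here is a minimal sketch in
Python (not Inform 6, and not part of the template). It models a long block
as a plain list of character codes. The name |text_extent| is hypothetical,
and the list search merely stands in for |BlkValueSeekZeroEntry|.

```python
def text_extent(entries):
    """Return how many entries the text needs: its characters plus the
    null terminator.

    `entries` models a long block as a list of character codes.
    Returns -1 if there is no zero entry, which should not happen.
    """
    try:
        zero_index = entries.index(0)  # stands in for BlkValueSeekZeroEntry
    except ValueError:
        return -1
    return zero_index + 1
```

For example, a block holding the codes 65, 66, 67 followed by zeros has
extent 4: three characters plus the terminator.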
@c
[ TEXT_TY_Extent arg1 x;
    x = BlkValueSeekZeroEntry(arg1);
    if (x < 0) return -1; ! should not happen, of course
    return x+1;
];

@p Character Set.
On the Z-machine, we use the 8-bit ZSCII character set, stored in bytes;
on Glulx, we use the opening 16-bit subset of Unicode (which though only a
subset covers almost all letter forms used on Earth), stored in half-words.

The Z-machine does have very partial Unicode support, but not in a way that
can help us here. It is capable of printing a wide range of Unicode
characters, and on a good interpreter with a good font (such as Zoom for
Mac OS X, using the Lucida Grande font) can produce many thousands of glyphs.
But it is not capable of printing those characters into memory rather than
the screen, an essential technique for texts: it can only write each
character to a single byte, and it does so in ZSCII. That forces our hand
when it comes to choosing the indexed-text character set.

@c
#IFDEF TARGET_ZCODE;
Constant TEXT_TY_Storage_Flags = BLK_FLAG_MULTIPLE;
Constant ZSCII_Tables;
#IFNOT;
Constant TEXT_TY_Storage_Flags = BLK_FLAG_MULTIPLE + BLK_FLAG_16_BIT;
Constant Large_Unicode_Tables;
#ENDIF;

{-segment:UnicodeData.i6t}
{-segment:Char.i6t}

@p KOV Support.
See the "BlockValues.i6t" segment for the specification of the following
routines. Because no block values are ever stored in a text, they can
freely be bitwise copied or forgotten, which is why we need do nothing
special to copy or destroy a text.
@c
[ TEXT_TY_Support task arg1 arg2 arg3;
    switch(task) {
        CREATE_KOVS:      return TEXT_TY_Create(arg2);
        CAST_KOVS:        TEXT_TY_Cast(arg1, arg2, arg3);
        MAKEMUTABLE_KOVS: return TEXT_TY_Mutable(arg1);
        COPYQUICK_KOVS:   rtrue;
        COPYSB_KOVS:      TEXT_TY_CopySB(arg1, arg2);
        KINDDATA_KOVS:    return 0;
        EXTENT_KOVS:      return TEXT_TY_Extent(arg1);
        COMPARE_KOVS:     return TEXT_TY_Compare(arg1, arg2);
        READ_FILE_KOVS:   if (arg3 == -1) rtrue;
                          return TEXT_TY_ReadFile(arg1, arg2, arg3);
        WRITE_FILE_KOVS:  return TEXT_TY_WriteFile(arg1);
        HASH_KOVS:        return TEXT_TY_Hash(arg1);
        DEBUG_KOVS:       TEXT_TY_Debug(arg1);
    }
    ! We choose not to respond to: DESTROY_KOVS, COPYKIND_KOVS, COPY_KOVS
    rfalse;
];

@p Debugging.
This shows the various forms a text's short block can take:

@c
[ TEXT_TY_Debug txt;
    switch (txt-->0) {
        CONSTANT_PACKED_TEXT_STORAGE:
            print " = cp~", (PrintI6Text) txt-->1, "~";
        CONSTANT_PERISHABLE_TEXT_STORAGE:
            print " = cp~", (PrintI6Text) txt-->1, "~";
        PACKED_TEXT_STORAGE:
            print " = p~", (PrintI6Text) txt-->1, "~";
        UNPACKED_TEXT_STORAGE:
            print " = ~", (TEXT_TY_Say) txt, "~";
        default:
            print " broken?";
    }
];

@p Creation.
A newly created text is a two-word short block with no long block, like this:

    |Array ThisIsAText --> PACKED_TEXT_STORAGE EMPTY_TEXT_PACKED;|

@c
[ TEXT_TY_Create short_block x;
    return BlkValueCreateSB2(short_block, PACKED_TEXT_STORAGE, EMPTY_TEXT_PACKED);
];

@p Copy Short Block.
When a short block for a constant is copied, the new copy isn't a constant
any more.

@c
[ TEXT_TY_CopySB to_bv from_bv;
    BlkValueCopySB2(to_bv, from_bv);
    if (to_bv-->0 & BLK_BVBITMAP_CONSTANTMASK) to_bv-->0 = PACKED_TEXT_STORAGE;
];

@p Transmutation.
What happens if a text is stored in packed form, but we need to access or
change its individual characters? The answer is that we have to "transmute"
it into long block form. Sometimes this is a permanent change, but often
it's only temporary, and will soon be followed by an un-transmutation.
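
The save-and-restore pattern behind temporary transmutation can be modelled
in a few lines of Python. This is only a sketch under stated assumptions,
not the template's implementation: a short block is modelled as a
two-element list, a packed text as a Python string, and an unpacked text as
a null-terminated list of character codes. The function names are
hypothetical stand-ins for |TEXT_TY_Temporarily_Transmute| and
|TEXT_TY_Untransmute|.

```python
PACKED = "PACKED_TEXT_STORAGE"
UNPACKED = "UNPACKED_TEXT_STORAGE"

def temporarily_transmute(txt):
    """Unpack txt in place unless it is already unpacked.

    txt models a short block as [storage_form, content].
    Returns the old packed content so that the caller can restore it later,
    or None if nothing was changed.
    """
    if txt[0] != UNPACKED:
        old_content = txt[1]
        txt[0] = UNPACKED
        # Model of "printing to memory": character codes plus a terminator.
        txt[1] = [ord(c) for c in old_content] + [0]
        return old_content
    return None

def untransmute(txt, old_content, old_form):
    """Undo temporarily_transmute, given the values saved before it ran."""
    if old_content is not None and txt[0] == UNPACKED:
        txt[0] = old_form
        txt[1] = old_content
    return txt
```

A caller saves the storage form first, transmutes, works on the characters,
and then passes both saved values back to restore the packed form. A text
which was already unpacked is left untouched by both calls.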
@c
[ TEXT_TY_Transmute txt;
    TEXT_TY_Temporarily_Transmute(txt);
];

[ TEXT_TY_Temporarily_Transmute txt x;
    if ((txt) && (txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0)) {
        x = txt-->1; ! The old value was a packed string
        txt-->0 = UNPACKED_TEXT_STORAGE;
        txt-->1 = FlexAllocate(32, TEXT_TY, TEXT_TY_Storage_Flags);
        if (x ~= EMPTY_TEXT_PACKED) TEXT_TY_CastPrimitive(txt, false, x);
        return x;
    }
    return 0;
];

[ TEXT_TY_Untransmute txt pk cp x;
    if ((pk) && (txt-->0 == UNPACKED_TEXT_STORAGE)) {
        x = txt-->1; ! The old value was an unpacked string
        FlexFree(x);
        txt-->0 = cp;
        txt-->1 = pk; ! The value earlier returned by TEXT_TY_Temporarily_Transmute
    }
    return txt;
];

@p Mutability.
That neatly handles the question of how to make a text mutable. (Note that
constants are never created in unpacked form.)

@c
[ TEXT_TY_Mutable txt;
    if (txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) {
        TEXT_TY_Transmute(txt);
        return 0;
    }
    return 2; ! Tell BlockValue there's a long block pointer
];

@p Casting.
In general computing, "casting" is the process of translating data in one
type into semantically equivalent data in another: the only interesting
cast here is that a snippet can be turned into a text.

@c
[ TEXT_TY_Cast to_txt from_kind from_value;
    if (from_kind == TEXT_TY) {
        BlkValueCopy(to_txt, from_value);
    } else if (from_kind == SNIPPET_TY) {
        TEXT_TY_Transmute(to_txt);
        TEXT_TY_CastPrimitive(to_txt, true, from_value);
    } else BlkValueError("impossible cast to text");
];

[ SNIPPET_TY_to_TEXT_TY to_txt snippet;
    return BlkValueCast(to_txt, SNIPPET_TY, snippet);
];

@p Data Conversion.
We use a single routine to handle two kinds of format translation: a
packed I6 string into an unpacked text, or a snippet into an unpacked text.

In each case, what we do is simply to print out the value we have, but with
the output stream set to memory rather than the screen.
That gives us the character by character version, neatly laid out in an
array, and all we have to do is to copy it into the text and add a null
termination byte.

What complicates things is that the two virtual machines handle printing
to memory quite differently, and that the original text has unpredictable
length. We are going to try printing it into the array |TEXT_TY_Buffers|,
but what if the text is too big? Disastrously, the Z-machine simply writes
on in memory, corrupting all subsequent arrays and almost certainly causing
the story file to crash soon after. There is nothing we can do to predict
or avoid this, or to repair the damage: this is why the Inform documentation
warns users to be wary of using text with large strings in the Z-machine,
and advises the use of Glulx instead. Glulx does handle overruns safely,
and indeed allows us to dynamically allocate memory as necessary so that
we can always avoid overruns entirely.

In either case, though, it's useful to have |TEXT_TY_BufferSize|, the size
of the temporary buffer, large enough that it will never be overrun in
ordinary use. This is controllable with the use option "maximum indexed
text length".

@c
#ifndef TEXT_TY_BufferSize;
Constant TEXT_TY_BufferSize = 512;
#endif;
Constant TEXT_TY_NoBuffers = 2;

#ifdef TARGET_ZCODE;
Array TEXT_TY_Buffers -> TEXT_TY_BufferSize*TEXT_TY_NoBuffers; ! Where characters are bytes
#ifnot;
Array TEXT_TY_Buffers --> (TEXT_TY_BufferSize+2)*TEXT_TY_NoBuffers; ! Where characters are words
#endif;

Global RawBufferAddress = TEXT_TY_Buffers;
Global RawBufferSize = TEXT_TY_BufferSize;

Global TEXT_TY_CastPrimitiveNesting = 0;

@p Z Version.
The two versions of this routine, one for each virtual machine, are in all
important respects the same, but there are enough fiddly differences that
it's clearer to give two definitions, so:

@c
#ifdef TARGET_ZCODE;
[ TEXT_TY_CastPrimitive to_txt from_snippet from_value
    len news buffer;

    if (to_txt == 0) BlkValueError("no destination for cast");

    SuspendRTP();

    buffer = RawBufferAddress + TEXT_TY_CastPrimitiveNesting*TEXT_TY_BufferSize;
    TEXT_TY_CastPrimitiveNesting++;
    if (TEXT_TY_CastPrimitiveNesting > TEXT_TY_NoBuffers)
        FlexError("ran out with too many simultaneous text conversions");

    @push say__p; @push say__pc;
    ClearParagraphing(6);
    @output_stream 3 buffer;
    if (from_value) {
        if (from_snippet) print (PrintSnippet) from_value;
        else print (PrintI6Text) from_value;
    }
    @output_stream -3;
    @pull say__pc; @pull say__p;

    ResumeRTP();

    len = buffer-->0;
    if (len > RawBufferSize-1) len = RawBufferSize-1;
    buffer->(len+2) = 0;

    TEXT_TY_CastPrimitiveNesting--;
    BlkValueMassCopyFromArray(to_txt, buffer+2, 1, len+1);
];

@p Glulx Version.

@c
#ifnot; ! TARGET_ZCODE
[ TEXT_TY_CastPrimitive to_txt from_snippet from_value
    len i stream saved_stream news buffer buffer_size memory_to_free results;

    if (to_txt == 0) BlkValueError("no destination for cast");

    buffer_size = (TEXT_TY_BufferSize + 2)*WORDSIZE;

    RawBufferSize = TEXT_TY_BufferSize;
    buffer = RawBufferAddress + TEXT_TY_CastPrimitiveNesting*buffer_size;
    TEXT_TY_CastPrimitiveNesting++;
    if (TEXT_TY_CastPrimitiveNesting > TEXT_TY_NoBuffers) {
        buffer = VM_AllocateMemory(buffer_size); memory_to_free = buffer;
        if (buffer == 0)
            FlexError("ran out with too many simultaneous text conversions");
    }

    if (unicode_gestalt_ok) {
        SuspendRTP();
        .RetryWithLargerBuffer;
        saved_stream = glk_stream_get_current();
        stream = glk_stream_open_memory_uni(buffer, RawBufferSize, filemode_Write, 0);
        glk_stream_set_current(stream);

        @push say__p; @push say__pc;
        ClearParagraphing(7);
        if (from_snippet) print (PrintSnippet) from_value;
        else print (PrintI6Text) from_value;
        @pull say__pc; @pull say__p;

        results = buffer + buffer_size - 2*WORDSIZE;
        glk_stream_close(stream, results);
        if (saved_stream) glk_stream_set_current(saved_stream);
        ResumeRTP();

        len = results-->1;
        if (len > RawBufferSize-1) {
            ! Glulx had to truncate text output because the buffer ran out:
            ! len is the number of characters which it tried to print
            news = RawBufferSize;
            while (news < len) news=news*2;
            i = VM_AllocateMemory(news*WORDSIZE);
            if (i ~= 0) {
                if (memory_to_free) VM_FreeMemory(memory_to_free);
                memory_to_free = i;
                buffer = i;
                RawBufferSize = news;
                buffer_size = (RawBufferSize + 2)*WORDSIZE;
                jump RetryWithLargerBuffer;
            }
            ! Memory allocation refused: all we can do is to truncate the text
            len = RawBufferSize-1;
        }
        buffer-->(len) = 0;

        TEXT_TY_CastPrimitiveNesting--;
        BlkValueMassCopyFromArray(to_txt, buffer, 4, len+1);
    } else {
        RunTimeProblem(RTP_NOGLULXUNICODE);
    }
    if (memory_to_free) VM_FreeMemory(memory_to_free);
];
#endif;

@p Comparison.
This is more or less |strcmp|, the traditional C library routine for comparing strings, but it does pose a few interesting questions. The answers are: (a) Two different unexpanded texts with substitutions are never equal, so "[X]" and "[Y]" aren't equal as texts even if X and Y are equal. (b) Otherwise we test the current value of the text as expanded, so "[X]" and "17" can be equal as texts if X is 17. @c [ TEXT_TY_Compare left_txt right_txt rv; @push say__comp; say__comp = true; rv = TEXT_TY_Compare_Inner(left_txt, right_txt); @pull say__comp; return rv; ]; [ TEXT_TY_Compare_Inner left_txt right_txt pos ch1 ch2 capacity_left capacity_right fl fr cl cr cpl cpr; if (left_txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) fl = true; if (right_txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) fr = true; if (fl && fr) { if ((left_txt-->1 ofclass String) && (right_txt-->1 ofclass String)) return left_txt-->1 - right_txt-->1; if ((left_txt-->1 ofclass Routine) && (right_txt-->1 ofclass Routine)) return left_txt-->1 - right_txt-->1; cpl = left_txt-->0; cl = TEXT_TY_Temporarily_Transmute(left_txt); cpr = right_txt-->0; cr = TEXT_TY_Temporarily_Transmute(right_txt); } else if (fl) { cpl = left_txt-->0; cl = TEXT_TY_Temporarily_Transmute(left_txt); } else if (fr) { cpr = right_txt-->0; cr = TEXT_TY_Temporarily_Transmute(right_txt); } if ((cl) || (cr)) { pos = TEXT_TY_Compare(left_txt, right_txt); TEXT_TY_Untransmute(left_txt, cl, cpl); TEXT_TY_Untransmute(right_txt, cr, cpr); return pos; } capacity_left = BlkValueLBCapacity(left_txt); capacity_right = BlkValueLBCapacity(right_txt); for (pos=0:(pos0; p = TEXT_TY_Temporarily_Transmute(txt); rv = 0; len = BlkValueLBCapacity(txt); for (i=0: i0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) return PrintI6Text(txt-->1); dsize = BlkValueLBCapacity(txt); for (i=0: i 0) { BlkValueWrite(mod, 0, CharToCase(BlkValueRead(mod, 0), 1)); TEXT_TY_Say(mod); rc = true; say__p = 1; } BlkValueFree(mod); return rc; ]; @p Serialisation. 
Here we print a serialised form of a text which can later be used to
reconstruct the original text. The printing is apparently to the screen,
but in fact always takes place when the output stream is a file.

The format chosen is a letter "S" for string, then a comma-separated list
of decimal character codes, ending with the null terminator, and followed by
a semicolon: thus |S65,66,67,0;| is the serialised form of the text "ABC".

@c
[ TEXT_TY_WriteFile txt len pos ch p cp;
    cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt);
    len = BlkValueLBCapacity(txt);
    print "S";
    for (pos=0: pos<=len: pos++) {
        if (pos == len) ch = 0; else ch = BlkValueRead(txt, pos);
        if (ch == 0) {
            print "0;"; break;
        } else {
            print ch, ",";
        }
    }
    TEXT_TY_Untransmute(txt, p, cp);
];

@p Unserialisation.
If that's the word: the reverse process, in which we read a stream of
characters from a file and reconstruct the text which gave rise to them.

@c
[ TEXT_TY_ReadFile txt auxf ch i v dg pos tsize p;
    TEXT_TY_Transmute(txt);
    tsize = BlkValueLBCapacity(txt);
    while (ch ~= 32 or 9 or 10 or 13 or 0 or -1) {
        ch = FileIO_GetC(auxf);
        if (ch == ',' or ';') {
            if (pos+1 >= tsize) {
                if (BlkValueSetLBCapacity(txt, 2*pos) == false) break;
                tsize = BlkValueLBCapacity(txt);
            }
            BlkValueWrite(txt, pos++, v);
            v = 0;
            if (ch == ';') break;
        } else {
            dg = ch - '0';
            v = v*10 + dg;
        }
    }
    BlkValueWrite(txt, pos, 0);
    return txt;
];

@p Substitution.

@c
[ TEXT_TY_SubstitutedForm to txt;
    if (txt) {
        BlkValueCopy(to, txt);
        TEXT_TY_Transmute(to);
    }
    return to;
];

[ TEXT_TY_IsSubstituted txt;
    if ((txt) &&
        (txt-->0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) &&
        (txt-->1 ofclass Routine)) rfalse;
    rtrue;
];

@p Perishability.
As noted above, a perishable constant is one which must be expanded before
the values it refers to vanish from existence.

@c
[ TEXT_TY_ExpandIfPerishable to from;
    if ((from) && (from-->0 == CONSTANT_PERISHABLE_TEXT_STORAGE))
        return TEXT_TY_SubstitutedForm(to, from);
    return from;
];

@p Recognition-only-GPR.
An I6 general parsing routine to look at words from the position marker
|wn| in the player's command to see if they match the contents of the
text |txt|, returning either |GPR_PREPOSITION| or |GPR_FAIL|
according to whether a match could be made. This is used when an object's
name is set to include one of its properties, and the property in question
is a text: "A flowerpot is a kind of thing. A flowerpot has a text called
pattern. Understand the pattern property as describing a flowerpot." When
the player types EXAMINE STRIPED FLOWERPOT, and there is a flowerpot in
scope, the following routine is called to test whether its pattern property
-- a text -- matches any words at the position STRIPED FLOWERPOT. Assuming
a pot does indeed have the pattern "striped", the routine advances |wn|
by 1 and returns |GPR_PREPOSITION| to indicate a match.

This kind of GPR is called a "recognition-only-GPR", because it only
recognises an existing value: it doesn't parse a new one.

@c
[ TEXT_TY_ROGPR txt p cp r;
    if (txt == 0) return GPR_FAIL;
    cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt);
    r = TEXT_TY_ROGPRI(txt);
    TEXT_TY_Untransmute(txt, p, cp);
    return r;
];

[ TEXT_TY_ROGPRI txt pos len wa wl wpos bdm ch own;
    bdm = true; own = wn;
    len = BlkValueLBCapacity(txt);
    for (pos=0: pos<=len: pos++) {
        if (pos == len) ch = 0; else ch = BlkValueRead(txt, pos);
        if (ch == 32 or 9 or 10 or 0) {
            if (bdm) continue;
            bdm = true;
            if (wpos ~= wl) return GPR_FAIL;
            if (ch == 0) break;
        } else {
            if (bdm) {
                bdm = false;
                if (NextWordStopped() == -1) return GPR_FAIL;
                wa = WordAddress(wn-1);
                wl = WordLength(wn-1);
                wpos = 0;
            }
            if (wa->wpos ~= ch or TEXT_TY_RevCase(ch)) return GPR_FAIL;
            wpos++;
        }
    }
    if (wn == own) return GPR_FAIL; ! Progress must be made to avoid looping
    return GPR_PREPOSITION;
];

@p Blobs.
That completes the compulsory services required for this KOV to function:
from here on, the remaining routines provide definitions of text-related
phrases in the Standard Rules.

What are the basic operations of text-handling? Clearly we want to be able
to search, and replace, but that is left for the segment "RegExp.i6t"
to handle. More basically we would like to be able to read and write
characters from the text. But texts in I7 tend to be of natural language,
rather than containing arbitrary material -- that's indeed why we call them
texts rather than strings. This means they are likely to be punctuated
sequences of words, divided up perhaps into sentences and even paragraphs.

So we provide facilities which regard a text as being an array of "blobs",
where a "blob" is a unit of text. The user can choose whether to see it
as an array of characters, or words (of three different sorts: see the
Inform documentation for details), or paragraphs, or lines.

@c
Constant CHR_BLOB = 1; ! Construe as an array of characters
Constant WORD_BLOB = 2; ! Of words
Constant PWORD_BLOB = 3; ! Of punctuated words
Constant UWORD_BLOB = 4; ! Of unpunctuated words
Constant PARA_BLOB = 5; ! Of paragraphs
Constant LINE_BLOB = 6; ! Of lines

Constant REGEXP_BLOB = 7; ! Not a blob type as such, but needed as a distinct value

@p Blob Access.
The following routine runs a small finite-state-machine to count the number
of blobs in a text, using any of the above blob types (except
|REGEXP_BLOB|, which is used for other purposes). If the optional arguments
|ctxt| and |wanted| are supplied, it also copies the text of blob number
|wanted| (counting upwards from 1 at the start of the text) into the
text |ctxt|. If the further optional argument |rtxt| is supplied,
then |ctxt| is instead written with the original text |txt| as it
would read if the blob in question were replaced with the text in |rtxt|.
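
To give a feel for the counting, here is a much-simplified Python model of
the |WORD_BLOB| case alone. It is a sketch, not the template's state
machine: it has two states instead of six, it does not handle replacement,
and the function name |count_words| is hypothetical. The punctuation set is
the one tested by the routine, while the whitespace set is an assumption.

```python
# Punctuation marks taken from the list tested in the template's routine.
PUNCTUATION = set('.,!?-/":;()[]{}')
# Assumption: whitespace means space, tab, newline and carriage return.
WHITESPACE = set(" \t\n\r")

def count_words(text, wanted=0):
    """Count the words in text, treating punctuation as a separator.

    Returns (count, word), where word is the text of word number `wanted`
    (counting upwards from 1), or "" if there is no such word.
    """
    count = 0
    in_word = False
    chosen = []
    for ch in text:
        is_word_char = ch not in WHITESPACE and ch not in PUNCTUATION
        if is_word_char and not in_word:
            count += 1  # a new word begins at this character
        in_word = is_word_char
        if in_word and count == wanted:
            chosen.append(ch)
    return count, "".join(chosen)
```

In this model, asking for word 2 of "Hello, world!" gives a count of 2 and
the word "world". A hyphen counts as a separator, so "a-b" is two words.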
@c Constant WS_BRM = 1; Constant SKIPPED_BRM = 2; Constant ACCEPTED_BRM = 3; Constant ACCEPTEDP_BRM = 4; Constant ACCEPTEDN_BRM = 5; Constant ACCEPTEDPN_BRM = 6; [ TEXT_TY_BlobAccess txt blobtype ctxt wanted rtxt p1 p2 cp1 cp2 r; if (txt==0) return 0; if (blobtype == CHR_BLOB) return TEXT_TY_CharacterLength(txt); cp1 = txt-->0; p1 = TEXT_TY_Temporarily_Transmute(txt); cp2 = rtxt-->0; p2 = TEXT_TY_Temporarily_Transmute(rtxt); TEXT_TY_Transmute(ctxt); r = TEXT_TY_BlobAccessI(txt, blobtype, ctxt, wanted, rtxt); TEXT_TY_Untransmute(txt, p1, cp1); TEXT_TY_Untransmute(rtxt, p2, cp2); return r; ]; [ TEXT_TY_BlobAccessI txt blobtype ctxt wanted rtxt brm oldbrm ch i dsize csize blobcount gp cl j; dsize = BlkValueLBCapacity(txt); if (ctxt) csize = BlkValueLBCapacity(ctxt); else if (rtxt) "*** rtxt without ctxt ***"; brm = WS_BRM; for (i=0:i= 2) brm = WS_BRM; LINE_BLOB: if (gp >= 1) brm = WS_BRM; default: brm = WS_BRM; } } } else { gp = false; if ((blobtype == WORD_BLOB or PWORD_BLOB or UWORD_BLOB) && (ch == '.' or ',' or '!' or '?' 
or '-' or '/' or '"' or ':' or ';' or '(' or ')' or '[' or ']' or '{' or '}')) gp = true; switch (oldbrm) { WS_BRM: brm = ACCEPTED_BRM; if (blobtype == WORD_BLOB) { if (gp) brm = SKIPPED_BRM; } if (blobtype == PWORD_BLOB) { if (gp) brm = ACCEPTEDP_BRM; } SKIPPED_BRM: if (blobtype == WORD_BLOB) { if (gp == false) brm = ACCEPTED_BRM; } ACCEPTED_BRM: if (blobtype == WORD_BLOB) { if (gp) brm = SKIPPED_BRM; } if (blobtype == PWORD_BLOB) { if (gp) brm = ACCEPTEDP_BRM; } ACCEPTEDP_BRM: if (blobtype == PWORD_BLOB) { if (gp == false) brm = ACCEPTED_BRM; else { if ((ch == BlkValueRead(txt, i-1)) && (ch == '-' or '.')) blobcount--; blobcount++; } } ACCEPTEDN_BRM: if (blobtype == WORD_BLOB) { if (gp) brm = SKIPPED_BRM; } if (blobtype == PWORD_BLOB) { if (gp) brm = ACCEPTEDP_BRM; } ACCEPTEDPN_BRM: if (blobtype == PWORD_BLOB) { if (gp == false) brm = ACCEPTED_BRM; else { if ((ch == BlkValueRead(txt, i-1)) && (ch == '-' or '.')) blobcount--; blobcount++; } } } } if (brm == ACCEPTED_BRM or ACCEPTEDP_BRM) { if (oldbrm ~= brm) blobcount++; if ((ctxt) && (blobcount == wanted)) { if (rtxt) { BlkValueWrite(ctxt, cl, 0); TEXT_TY_Concatenate(ctxt, rtxt, CHR_BLOB); csize = BlkValueLBCapacity(ctxt); cl = TEXT_TY_CharacterLength(ctxt); if (brm == ACCEPTED_BRM) brm = ACCEPTEDN_BRM; if (brm == ACCEPTEDP_BRM) brm = ACCEPTEDPN_BRM; } else { if (cl+1 >= csize) { if (BlkValueSetLBCapacity(ctxt, 2*cl) == false) break; csize = BlkValueLBCapacity(ctxt); } BlkValueWrite(ctxt, cl++, ch); } } else { if (rtxt) { if (cl+1 >= csize) { if (BlkValueSetLBCapacity(ctxt, 2*cl) == false) break; csize = BlkValueLBCapacity(ctxt); } BlkValueWrite(ctxt, cl++, ch); } } } else { if ((rtxt) && (brm ~= ACCEPTEDN_BRM or ACCEPTEDPN_BRM)) { if (cl+1 >= csize) { if (BlkValueSetLBCapacity(ctxt, 2*cl) == false) break; csize = BlkValueLBCapacity(ctxt); } BlkValueWrite(ctxt, cl++, ch); } } } if (ctxt) BlkValueWrite(ctxt, cl++, 0); return blobcount; ]; @p Get Blob. The front end which uses the above routine to read a blob. 
(Note that, for efficiency's sake, we read characters more directly.) @c [ TEXT_TY_GetBlob ctxt txt wanted blobtype; if (txt==0) return; if (blobtype == CHR_BLOB) return TEXT_TY_GetCharacter(ctxt, txt, wanted); TEXT_TY_BlobAccess(txt, blobtype, ctxt, wanted); return ctxt; ]; @p Replace Blob. The front end which uses the above routine to replace a blob. (Once again, characters are handled directly to avoid incurring all that overhead.) @c [ TEXT_TY_ReplaceBlob blobtype txt wanted rtxt ctxt ilen rlen i p cp; TEXT_TY_Transmute(txt); cp = rtxt-->0; p = TEXT_TY_Temporarily_Transmute(rtxt); if (blobtype == CHR_BLOB) { ilen = TEXT_TY_CharacterLength(txt); rlen = TEXT_TY_CharacterLength(rtxt); wanted--; if ((wanted >= 0) && (wanted0; p1 = TEXT_TY_Temporarily_Transmute(ftxt); cp2 = rtxt-->0; p2 = TEXT_TY_Temporarily_Transmute(rtxt); r = TEXT_TY_ReplaceTextI(blobtype, txt, ftxt, rtxt); TEXT_TY_Untransmute(ftxt, p1, cp1); TEXT_TY_Untransmute(rtxt, p2, cp2); return r; ]; [ TEXT_TY_ReplaceTextI blobtype txt ftxt rtxt ctxt csize ilen flen i cl mpos ch chm whitespace punctuation; if (blobtype == REGEXP_BLOB or CHR_BLOB) return TEXT_TY_Replace_RE(blobtype, txt, ftxt, rtxt); ilen = TEXT_TY_CharacterLength(txt); flen = TEXT_TY_CharacterLength(ftxt); ctxt = BlkValueCreate(TEXT_TY); TEXT_TY_Transmute(ctxt); csize = BlkValueLBCapacity(ctxt); mpos = 0; whitespace = true; punctuation = false; for (i=0:i<=ilen:i++) { ch = BlkValueRead(txt, i); .MoreMatching; chm = BlkValueRead(ftxt, mpos++); if (mpos == 1) { switch (blobtype) { WORD_BLOB: if ((whitespace == false) && (punctuation == false)) chm = -1; } } whitespace = false; if (ch == 10 or 13 or 32 or 9) whitespace = true; punctuation = false; if (ch == '.' or ',' or '!' or '?' 
or '-' or '/' or '"' or ':' or ';' or '(' or ')' or '[' or ']' or '{' or '}') { if (blobtype == WORD_BLOB) chm = -1; punctuation = true; } if (ch == chm) { if (mpos == flen) { if (i == ilen) chm = 0; else chm = BlkValueRead(txt, i+1); if ((blobtype == CHR_BLOB) || (chm == 0 or 10 or 13 or 32 or 9) || (chm == '.' or ',' or '!' or '?' or '-' or '/' or '"' or ':' or ';' or '(' or ')' or '[' or ']' or '{' or '}')) { mpos = 0; cl = cl - (flen-1); BlkValueWrite(ctxt, cl, 0); TEXT_TY_Concatenate(ctxt, rtxt, CHR_BLOB); csize = BlkValueLBCapacity(ctxt); cl = TEXT_TY_CharacterLength(ctxt); continue; } } } else { mpos = 0; } if (cl+1 >= csize) { if (BlkValueSetLBCapacity(ctxt, 2*cl) == false) break; csize = BlkValueLBCapacity(ctxt); } BlkValueWrite(ctxt, cl++, ch); } BlkValueCopy(txt, ctxt); BlkValueFree(ctxt); ]; @p Character Length. When accessing at the character-by-character level, things are much easier and we needn't go through any finite state machine palaver. @c [ TEXT_TY_CharacterLength txt ch i dsize p cp r; if (txt==0) return 0; cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt); dsize = BlkValueLBCapacity(txt); r = dsize; for (i=0:i0 & BLK_BVBITMAP_LONGBLOCKMASK == 0) { if (txt-->1 == EMPTY_TEXT_PACKED) rtrue; rfalse; } if (TEXT_TY_CharacterLength(txt) == 0) rtrue; rfalse; ]; @p Get Character. Characters in a text are numbered upwards from 1 by the users of this routine: which is why we subtract 1 when reading the array in the block-value, which counts from 0. @c [ TEXT_TY_GetCharacter ctxt txt i ch p cp; if (txt==0) return 0; cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt); TEXT_TY_Transmute(ctxt); if ((i<=0) || (i>TEXT_TY_CharacterLength(txt))) ch = 0; else ch = BlkValueRead(txt, i-1); BlkValueWrite(ctxt, 0, ch); BlkValueWrite(ctxt, 1, 0); TEXT_TY_Untransmute(txt, p, cp); return ctxt; ]; @p Casing. In many programming languages, characters are a distinct data type from strings, but not in I7. 
To I7, a character is simply a text which happens to have length 1 -- this has its inefficiencies, but is conceptually easy for the user. |TEXT_TY_CharactersOfCase(txt, case)| determines whether all the characters in |txt| are letters of the given casing: 0 for lower case, 1 for upper case. In the case of ZSCII, this is done correctly handling all of the European accented letters; in the case of Unicode, it follows the Unicode standard. Note that there is no requirement for |txt| to be only a single character long. @c [ TEXT_TY_CharactersOfCase txt case i ch len p cp r; if (txt==0) return 0; cp = txt-->0; p = TEXT_TY_Temporarily_Transmute(txt); len = TEXT_TY_CharacterLength(txt); r = true; for (i=0:i0; pk = TEXT_TY_Temporarily_Transmute(txt); TEXT_TY_Transmute(ctxt); len = TEXT_TY_CharacterLength(txt); if (BlkValueSetLBCapacity(ctxt, len+1)) { bnd = 1; for (i=0:i0; p = TEXT_TY_Temporarily_Transmute(from_txt); r = TEXT_TY_ConcatenateI(to_txt, from_txt, blobtype, ref_txt); TEXT_TY_Untransmute(from_txt, p, cp); return r; ]; [ TEXT_TY_ConcatenateI to_txt from_txt blobtype ref_txt pos len ch i tosize x y case; switch(blobtype) { CHR_BLOB, 0: pos = TEXT_TY_CharacterLength(to_txt); len = TEXT_TY_CharacterLength(from_txt); if (BlkValueSetLBCapacity(to_txt, pos+len+1) == false) return to_txt; for (i=0:i0; p = TEXT_TY_Temporarily_Transmute(from_txt); len = TEXT_TY_CharacterLength(from_txt); if (len > 118) len = 118; #ifdef TARGET_ZCODE; buffer->1 = len; at = 2; #ifnot; buffer-->0 = len; at = 4; #endif; for (i=0:i(i+at) = CharToCase(BlkValueRead(from_txt, i), 0); for (:at+i<120:i++) buffer->(at+i) = ' '; VM_Tokenise(buffer, parse); players_command = 100 + WordCount(); ! The snippet variable "player's command" TEXT_TY_Untransmute(from_txt, p, cp); ];