--------------------------------------------------------------------------- Inform: An Apology pertaining to the second release --------------------------------------------------------------------------- "Look on my works, ye mighty, and despair..." Hello, Informer! Inform is an assembler for Infocom version-3 format story files. It has some of the trappings of a compiler, though its code is still haphazard in some places. It reports errors strangely at times, and in particular its expression evaluator has a few eccentric mannerisms. Some features one might expect from a compiler are flagrantly missing. Worse yet, much of the source code is still written in a naive and unsystematic fashion. On the bright side, it works most of the time, and runs in only two passes. (This may sound easy but is not, because the story file format requires all manner of tricky operations to be done: for example, the dictionary must be alphabetically sorted, and the code must know absolute addresses of its entries... and the address of the start of the dictionary depends on many other things not known during pass 1... and so on.) It produces "Curses", the author's game, correctly. This is a fairly strenuous test since the game is about 123K long and pushes most of the version-3 format to the limits. In Appendix A is a complete specification of the version-3 "Z-machine", and some details of how to use Inform as an assembler instead of a compiler. Some of this information is already circulating in other files, but uncollated. The rest seems only to be available in as much as it is implicit in the interpreter sources. Implementation bears about the same relation to designing a game as typing does to writing poetry. Appendix B contains some of the author's opinions on game design, and may safely be ignored by those of a nervous disposition. (In any case he has not absolutely always followed his own advice.) Appendix C discusses example programs. 
One of these ("Deja Vu") is a toy game which, although small and not very interesting in itself, contains the source of a fairly good parser and implements most of the standard kernel of adventure games; this may freely be stolen and adapted. The source of the other ("Hello Cruel World") is included within this file. The source of "Curses", on the other hand, is not on public show. Inform is not public domain, as mistakenly stated at earlier times, in the proper legal sense of the term. The copyright is retained by the author, Graham Nelson. He is perfectly happy for Inform to be used by anybody for any recreational purpose. It may be freely distributed provided no profit is involved, and provided the copyright message is retained. Please do not circulate heavily modified versions, and please comment any private changes of your own at the top of the source code. Story files produced by Inform belong to whoever wrote the source for them; I think, however, it is fair to ask that game-writers put some message into their credits saying that Inform was used, and giving the version number used to compile it. And now the author stands back, and looks forward to seeing new games with bated breath... Graham Nelson Magdalen College, Oxford April 1993 Since the first release, much improvement has been made in memory management which is now quite efficient: it allocates between 50 and 75K of memory, as opposed to 800K in the first edition. The code is in ANSI C, is contained in a single file (without needing non-standard headers) and some effort has been made to improve its portability. Hopefully it doesn't assume an ASCII character set, or 32-bit integers, or any particular byte-orientation within integers. PC versions now ought to be feasible. The code has been annotated to some extent, and contains notes which should be useful to anyone trying to port the code to a new machine. 
This documentation has changed only within the above introduction, in the new "objectloop" construction, and in Appendix C (sample output for the given programs). The language which Inform compiles has not changed (except that two defunct features, which had not in any case been documented, have been withdrawn). Details of changes to the ANSI source code of Inform may be found in detailed comments at its head.

The author's email address may be found at the bottom of this file. Comments and bug reports (by email) are welcomed with whatever degree of enthusiasm he can muster.

   GAN
   June 1993

---------------------------------------------------------------------------
  Contents
---------------------------------------------------------------------------

   1.  Command line format
   2.  Source file format
   3.  Compiler directives
   4.  Variables
   5.  Constants
   6.  Routines
   7.  Expressions
   8.  Commands
   9.  Conditions
   10. Built-in functions
   11. Objects
   12. Verbs and grammar
   13. The Dictionary
   14. Indirect function calls
   15. Text spacing

   A1. The Z-machine
   A2. How text is encoded
   A3. How Z-code is encoded
   A4. Using Inform as an assembler

   B1. A Bill of Player's Rights
   B2. What makes a good game?

   C1. A Hello Cruel World program
   C2. "Deja Vu": a toy game

---------------------------------------------------------------------------
  1. Command line format
---------------------------------------------------------------------------

   inform [-options]

where the following switches may be given in options:

   h   help information
   l   list assembly lines
   s   give statistics
   p   give statistics after both passes
   m   print memory allocation made
   d   contract double spaces in text

Samples of -s output can be found in Appendix C. For -d, see section (15). -m reveals how many bytes were malloc'ed. The program can be compiled in several different versions: the default (and most economical) settings use about 75K. With judicious adjustment of various #defines at the beginning, this could be reduced a little further.
Inform will write its output to a file with the same name, but prefixed with "z3". (This is easy to alter by changing #defines at the beginning of the source.) --------------------------------------------------------------------------- 2. Source file format --------------------------------------------------------------------------- Lines in an Inform file are terminated by semicolons. Exclamation marks ! thus... denote that the rest of that physical line is a comment. Backslashes "fold" lines, thus: initpos "A hinged trapdoor in the floor stands open, and light streams in \ from below."; is treated as if the "f" in "from below." follows directly from where the backslash \ is; i.e., the carriage return and leading spaces are removed. These lines may either be compiler directives (all of which fit on one line) or routines (which take more than one line). Inform command names are not case sensitive. --------------------------------------------------------------------------- 3. Compiler directives --------------------------------------------------------------------------- ATTRIBUTE Make new attribute flag CONSTANT Declare a constant DICTIONARY Enter in dictionary, and make a new constant for its address END End compilation here (this is optional) GLOBAL [ = ] Make a new global variable; [give it the initial value a] [ string ] [make it point to an (a+1)-byte array, which has as first byte, and is otherwise zeros] [ data ] [make it point to an a-byte array, which is all zeros] [ initial ... ] [make it point to an array, the bytes of which are as given] [ initstr "text" ] [make it point to an array, the bytes of which are the ASCII values of the characters in the string] OBJECT ... Make an object (see below) PROPERTY ... Make a new property (see below) RELEASE Set the release number to VERB ... 
Enter a line of grammar (see below) The following are mainly for debugging the compiler (should anyone ever get around to doing this) but might sometimes be amusing or helpful: LIST List the symbol table SHOWDICT Show dictionary TREE List object tree VERBS List verb table TRACE Trace assembler LTRACE List the lines of input ETRACE Trace expression evaluator BTRACE Trace assembler on both passes NOTRACE, NOLTRACE, etc Turn off appropriate tracing --------------------------------------------------------------------------- 4. Variables --------------------------------------------------------------------------- Variables are all two-byte integers, which are treated as signed when it makes sense to do so (eg in asking whether one is positive or not) but not when it isn't (eg when it is used as an address). There can be up to 240 global variables; as indicated in (3), these can be initialised to point to dynamic workspace, so as to achieve the effect of strings and arrays. In any routine, there can be up to 15 local variables. There is also a stack, but it should be tampered with only with care. Never call a variable "sp", as this is the stack pointer variable which you might occasionally need to use. The observant reader will have noticed that 240+15+1 = 256. This is of course no coincidence. --------------------------------------------------------------------------- 5. Constants --------------------------------------------------------------------------- Constants may be prefixed with a # character if desired. This can be useful if they are alphabetical and might otherwise be confused with something else. A constant in "double quotes" assembles the given text at a suitable (even) address, and gives half this address as the integer value. Inside this text the character ^ is replaced by a newline character, and the character ~ by a double-quote mark. A character in single quotes, such as 'e', means the ASCII value of that character. 
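The rules for text and character constants given above can be sketched in a few lines of Python. This is only an illustration and not part of Inform: the function names and the example address $1A2C are invented for the purpose.

```python
def translate_text(source: str) -> str:
    """Apply the two substitutions made inside "double quoted" text."""
    # ^ becomes a newline character, ~ becomes a double-quote mark.
    return source.replace("^", "\n").replace("~", '"')


def text_constant_value(byte_address: int) -> int:
    """Value of a text constant: half the (even) address where it is assembled."""
    if byte_address % 2 != 0:
        raise ValueError("text must be assembled at an even address")
    return byte_address // 2


def char_constant_value(ch: str) -> int:
    """Value of a 'single quoted' constant: the ASCII value of the character."""
    return ord(ch)


if __name__ == "__main__":
    print(translate_text("~Hello,~ said Peter.^"))
    print(hex(text_constant_value(0x1A2C)))  # hypothetical address
    print(char_constant_value("e"))
```

Halving the address is what lets a two-byte value reach text stored above the bottom 64K of the story file.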
A dollar $ indicates that a hexadecimal constant follows; $$ indicates that binary follows. Declared constants can be given, and so can the special constants adjectives_table preactions_table actions_table which give the code address of these tables. A constant beginning a$, followed by the name of a routine which is an action routine, will have as value the number of the action. A constant beginning w$, followed by a word of text, has as value the address of the given word in the dictionary (Inform will give an error at compile time if no such word is there). Thus, for instance, the following are legal constants: 31415 $ff $$1001001 #adjectives_table #a$LookSub #w$invent 'X' "an emerald the size of a plover's egg" "~Hello,~ said Peter.^~Hello, Peter,~ said Jane.^" --------------------------------------------------------------------------- 6. Routines --------------------------------------------------------------------------- The syntax to begin a routine is [ RoutineName ... ; and to end it, is ]; l1 to ln are the names of local variables, which are also the call parameters. For example, if you have a routine [ Look i j k; ...some code... ]; and it is called by Look(attic); then i will initially have the value "attic" when this is executed. Any local variables not specified (in this case, j and k) are initially zero. Every routine returns a value to the caller; if no such value is explicitly given, this value is the integer 1. Inside a routine, labels may be declared with a line of their own: .labelname; but note that whereas local variables have names which only mean anything locally, labels have names which are global. In other words, you can't have a label called "loop" more than once in the file. There is one special routine, which you must define, called Main. This is where execution of the game will begin, and it _must_ be the first one defined. Returning from Main will cause the interpreter to crash: you should explicitly QUIT instead. 
Also, uniquely and for peculiar reasons, Main is _not_ permitted to have any local variables of its own. This means it is usually only used as an outer shell. --------------------------------------------------------------------------- 7. Expressions --------------------------------------------------------------------------- The usual arithmetic expressions are allowed, including the operators: = set variable (only) on left equal to value on right + - plus, minus * / % & | times, divide, remainder, bitwise and, bitwise or -> --> byte, word array entry (eg: buffer->4 gives contents of the byte with address buffer+4, while table-->3 gives the word at table+6) In addition one may call a function, either a built-in function or a routine. For example: 4*(x+3/y) i=j-->1 Fish(x)+Fowl(y) Warning: for a few commands, strange results may occur if two or more complicated expressions are used in the same command, for instance: put buffer+6 byte i+j+1 56*prime(4); One can only describe this as a hideous bug, but in practice the need seldom arises and the solution would be quite difficult to implement. --------------------------------------------------------------------------- 8. 
Commands --------------------------------------------------------------------------- The "high level" commands in Inform are as follows: NEW_LINE Print a carriage return QUIT Quit the game (at once, with no confirmatory question to the user) RESTART Restart the game from its initial state (ditto) SHOW_SCORE Redisplay the score bar immediately, without waiting for the next keyboard input PRINT "text" Print text PRINT_RET "text" Print text, print a newline and return 1 PRINT_NUM Print a as a (signed) decimal number PRINT_CHAR Print the character whose ASCII value is a PRINT_ADDR Print the string whose address is a PRINT_PADDR Print the string whose address is 2*a PRINT_OBJ Print the short name of object a READ Reads keyboard into buffer a and decomposes it to the buffer b: on entry, a[0] = buffer size, b[0] similarly on exit a[1] = no chars typed, 2 to a[1]+1 are the chars (unterminated) From byte 2, b contains 4-byte chunks, one for each word of input: address of dictionary entry if recognised, 0000 otherwise number of letters in word first char of word in a This command automatically redisplays the status (score) line. Precisely, it prints the short name of the object whose number is the first declared global variable, then prints the next two globals in the form "45/34". It is assumed that these are the location, score and number of turns so far. REMOVE Remove object a from the tree of objects (it may certainly be later put back) MOVE TO Add object a to the things possessed by b PUT BYTE Write byte value v into index'th byte after addr PUT WORD ...and similarly for words PUT_PROP
Address of main routine, in bytes, +1 (This +1 is why Main cannot have local variables - it is a peculiarity of the standard. Note also that this is uniquely a routine address in bytes and not words: Main must occur in the lower 64K of the file. Inform always sets word 3 to be word 2, plus 1.) 4 The dictionary table address, in bytes 5 Object table address, in bytes 6 Global variables address, in bytes 7 The total number of bytes in a saved game (Saving the game is done by saving this many bytes from the beginning of the machine. (Saved games also contain the current state of the Z-machine stack; the stack is _not_ stored anywhere in the Z-machine's memory.)) 8 This word of flags has bits: 0 Scripting on: send output to printer 1 Disable proportional fonts while this is set 4 Something mysterious to do with sound effects in The Lurking Horror This is followed by the six bytes from byte 18 to 23, which are the version number string. (Inform sets these to the current date, in the form YYMMDD.) Then more words: 12 Synonym table address in bytes 13 Length of file, in words 14 Sum of bytes from 64 upwards, mod $10000 (The length and checksum are not actually used at all by many interpreters.) The remaining bytes in the header are used by the interpreter and should be left alone by the game code. By convention, the next item in the memory map, beginning at $40, is the synonyms table. There are 3*32=96 strings stored here (entries 0 to 31 in three dictionaries), one after another. This means they all have even addresses, conveniently. Once these 96 strings are entered, the actual table begins, and this is what the synonyms address points to. The table contains 96 two-byte entries, which are the word addresses of the strings before it. (Since Inform never makes use of synonyms, this could just be left out altogether, but for the sake of convention it creates a null table containing 96 copies of " " (three spaces).) Next is the object table. 
In fact it begins with what is sometimes called the "global properties table", though it is actually a table of default values of properties. This is a list of 31 2-byte words. There is no property 0, so the first word is always 0000. (Inform also sets the default for property 1 - the special "name" property - to 0000; the remainder are set in property definitions.)

After these 62 bytes, the objects begin, beginning from object 1. An object entry consists of 9 bytes, looking like:

   attributes                  parent  sibling  child         properties
   ---32 bits in 4 bytes---    ---3 bytes------------------   ---2 bytes--

Each of the three single-byte fields is 00 when the object pointed to is "nothing". The final 2-byte field is an address (in bytes) of the properties attached to the given object.

When all these 9-byte entries are out of the way, the properties tables begin. (Inform keeps these in the same order as the objects they are attached to.) An individual property table has the brief header

   03   --some even number of bytes---

and then lists the properties held, in descending numerical order. (This order is essential.) A property is stored as

   size byte   data
               ---between 1 and 8 bytes--

The size byte is arranged as 32 times (the number of data bytes minus 1), plus the property number, so that sizes 1 to 8 fit into its top three bits. Each list of properties is ended by a 00 size byte. This is why there is no property 0.

When all the property tables are done, we come to the global variable table. Global variables are numbered from 0 to 239, and this table begins with 240 initial 2-byte values for them. After this, space is conventionally left for all the arrays, dynamic strings and so on which they point to.

We have now reached the top of the save area. Everything above here is never altered.

Next is the table of grammar, which is described above. It is immediately followed by the actions table, the preactions table and then the adjectives table, also described above. And next the dictionary table, described above.

Next is the code area. Not all Infocom games begin with Main, but all Informed ones do. The code area simply contains a list of routines.
All routines (and static strings) must occur at even addresses, so as to enable them to have word addresses instead. (Inform occasionally inserts 00 bytes between routines to ensure this.) A routine begins with one byte indicating the number of local variables the routine has (from 0 to 15), and then with that many 2-byte words giving their initial values, if not supplied by the call to the routine. (Inform never makes use of this initialisation, and simply stores 0000's here.) Unlike global variables, these bytes are _not_ used for the current values of the variables: they are kept on the stack. Executable code follows this header. There is no special marker for the end of a routine; it is simply expected that in every case a legal return instruction will be hit. Finally, from the end of the code to the top of memory are the static strings. These are put up here to be out of the way, where they won't clog up the bottom 64K of memory. There's no table of their addresses, or pointer to where they begin; each is referred to by an address in the code or data given earlier. --------------------------------------------------------------------------- A2. How text is encoded --------------------------------------------------------------------------- Text is stored as a sequence of 2-byte words. Each of these is divided into three 5-bit pieces, plus 1 bit left over, arranged as --first byte------- --second byte--- 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 bit --first-- --second--- --third-- The bit is set only on the last 2-byte word of the text, and so marks the end. The pieces are then characters, with values in the range 0 to 31. There are three alphabets, in which the numbers 6 to 31 mean: A0 abcdefghijklmnopqrstuvwxyz A1 ABCDEFGHIJKLMNOPQRSTUVWXYZ A2 ^0123456789.,!?_#'~/\-:() ('^' being actually the new-line character.) Character 0 is a space in all alphabets. Characters 1, 2 and 3 are used for abbreviations. 
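The division of each 2-byte word into an end bit and three 5-bit pieces can be sketched in Python as follows. This is an illustrative sketch only: the function names are invented, and the decoder handles just alphabet A0 and the space, leaving aside the special characters 1 to 5.

```python
A0 = "abcdefghijklmnopqrstuvwxyz"  # values 6 to 31 in the default alphabet


def unpack_words(data: bytes) -> list[int]:
    """Split 2-byte words into 5-bit pieces, stopping after the end-bit word."""
    pieces = []
    for i in range(0, len(data) - 1, 2):
        word = (data[i] << 8) | data[i + 1]  # first byte holds the high bits
        pieces.append((word >> 10) & 0x1F)   # first piece
        pieces.append((word >> 5) & 0x1F)    # second piece
        pieces.append(word & 0x1F)           # third piece
        if word & 0x8000:                    # top bit set: last word of the text
            break
    return pieces


def decode_a0(pieces: list[int]) -> str:
    """Decode pieces using alphabet A0 only; 0 is a space."""
    out = []
    for p in pieces:
        if p == 0:
            out.append(" ")
        elif p >= 6:
            out.append(A0[p - 6])
        else:
            raise ValueError(f"piece {p} needs handling not shown in this sketch")
    return "".join(out)


if __name__ == "__main__":
    # $E5AA = end bit, then pieces 25, 13, 10: the letters t, h, e.
    print(decode_a0(unpack_words(bytes([0xE5, 0xAA]))))
```

Stopping at the end bit matters because nothing else marks where a string finishes: whatever follows in memory is unrelated data.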
Inform makes no use of these, but the Z-machine provides for commonly occurring strings to be printed out as if they were characters. Being plainly abbreviations, these are for some reason called "synonyms". By default, a character is presumed to be in A0, i.e. to be a lower-case English letter. However, the character 4 means that the next one (only) is in A1; and 5 means the next is in A2. Notice that character 6 in A2 is blank. It isn't a space: it simply isn't there. The sequence 5 followed by 6 indicates that the next two characters define an ASCII value. This is the way to get at the characters not in any of the three alphabets. For example, the familiar message *** You are dead *** takes four "characters" to produce each of the *'s. Finally, note that the end-bit only comes up once every three characters, so that a way is needed to safely use up any spare characters in the last 2-byte block. This is done by padding out with 5's. (5 followed by 5 does nothing.) This is especially the case with dictionary entries. Some dictionary entries, like "i", ought only to take one 2-byte block, but in order to make all entries 2-byte blocks and alphabetically sortable by number, they are padded out by up to five 5's in a row. In practice the text compression factor is not really very good: "Curses" contains about 127000 characters of text, stored in 91000 bytes. (Text usually accounts for about three quarters of a story file.) But the encoding does at least encrypt the text so that casual browsers can't read it. --------------------------------------------------------------------------- A3. How Z-code is encoded --------------------------------------------------------------------------- The encoding of version 3 Z-code is to say the least complicated. The reader is warned that it is also different to that in all other versions. 
There are all kinds of exceptions intended either to make small economies of code size (these are very seldom worth the effort, in fact) or to provide new features tacked on at the last minute. Experimenting with Inform as an assembler, while tracing is turned on, may be helpful.

Z-code understands four kinds of operand, and describes these in 2-bit fields:

   $$00   Large constant (>=256)     2 bytes
   $$01   Small constant (0 to 255)  1 byte
   $$10   Variable                   1 byte
   $$11   Omitted altogether         0 bytes

Variables are described in one byte. 00 means the top of the stack, 01 to $0f are the local variables of the current routine and $10 to $ff are the global variables, 0 to 239. Writing to 00 pushes something onto the stack and reading from it pulls it off. The stack can also be manipulated (with care) using the PUSH, PULL and POP instructions.

The stack is guaranteed to be at least 512 bytes long, and some interpreters are more generous. There isn't any way to check for stack overflow, so be careful with recursion. (One of the trickiest problems in compiling Z-code is throwing away unwanted return values of routines which are left on the stack... it can take hundreds of turns before a game crashes if this is got wrong.)

Z-code opcodes are 1 byte only. To begin with, look at the top two bits. If these are $$11, we shall call it "variable"; if $$10, "short"; and otherwise "long". In this description, we shall adopt the opcode names used by the existing Infocom disassembler "TXD".

For short opcodes, look at the next two bits (4 and 5). These give the kind of operand which the code has. If this is $$11, there isn't an operand and the opcode has no argument at all.
In this event, the remaining part of the opcode gives what it is:

   $00  RET#TRUE                   (1) The opcode is followed by text
   $01  RET#FALSE                      in 2-byte chunks as usual
   $02  PRINT         (1)
   $03  PRINT_RET     (1)          (2) Opcode followed by a branch
   $05  SAVE          (2)
   $06  RESTORE       (2)          (3) This is an abbreviation for
   $07  RESTART                        RET SP, to save one byte
   $08  RET(SP)+      (3)
   $09  POP
   $0A  QUIT
   $0B  NEW_LINE
   $0C  SHOW_SCORE
   $0D  VERIFY        (2)

If the type wasn't $$11, then an operand follows, and moreover the "code" part of the opcode means something different:

   $00  JZ            (2)          (4) Followed by a store opcode
   $01  GET_SIBLING   (2) (4)          (before the branch, if there
   $02  GET_CHILD     (2) (4)          is also a branch)
   $03  GET_PARENT    (4)
   $04  GET_PROP_LEN  (4)          (5) Refers indirectly to variables
   $05  INC           (5)              by their number (Inform
   $06  DEC           (5)              suppresses this feature, so
   $07  PRINT_ADDR                     "@inc sp" produces the constant
                                       0 instead of variable no. 0 as
   $09  REMOVE_OBJ                     operand)
   $0A  PRINT_OBJ
   $0B  RET
   $0C  JUMP
   $0D  PRINT_PADDR
   $0E  LOAD          (4) (5)
   $0F  NOT           (4)

"Long" opcodes have two operands. The bottom 5 bits of the opcode say what it is:

   $01  JE            (2) (6)      (6) If this is encoded as
   $02  JLE           (2)              "variable", then operands 3 and
   $03  JGE           (2)              4 (if present) are used as a
   $04  DEC_CHK       (2) (5)          kind of OR command: eg,
   $05  INC_CHK       (2) (5)          branch if o1 = o2, o3 or o4
   $06  COMPARE_POBJ  (2)
   $07  TEST          (2)
   $08  OR            (4)
   $09  AND           (4)
   $0A  TEST_ATTR     (2)
   $0B  SET_ATTR
   $0C  CLEAR_ATTR
   $0D  STORE         (5)
   $0E  INSERT_OBJ
   $0F  LOADW         (4)
   $10  LOADB         (4)
   $11  GET_PROP      (4)
   $12  GET_PROP_ADDR (4)
   $13  GET_NEXT_PROP (4)
   $14  ADD           (4)
   $15  SUB           (4)
   $16  MUL           (4)
   $17  DIV           (4)
   $18  MOD           (4)

The alert reader will notice that bits 5 and 6 are left spare. Now there are two operands to specify, which ought to take up 4 bits, which obviously won't fit. So a more economical form is used instead. Bit 6 refers to the first operand, and bit 5 to the second. A value of 0 means a small constant and 1 means a variable.
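The classification of opcode bytes described so far can be sketched in Python. This is a reading aid under the rules just given, not code taken from Inform or any interpreter, and all the function names are invented.

```python
OPERAND_KINDS = {
    0b00: "large constant",
    0b01: "small constant",
    0b10: "variable",
    0b11: "omitted",
}


def opcode_form(opcode: int) -> str:
    """Classify an opcode byte by its top two bits."""
    top = (opcode >> 6) & 0b11
    if top == 0b11:
        return "variable"
    if top == 0b10:
        return "short"
    return "long"


def short_operand_kind(opcode: int) -> str:
    """Operand kind of a short opcode, taken from bits 4 and 5."""
    return OPERAND_KINDS[(opcode >> 4) & 0b11]


def long_operand_kinds(opcode: int) -> tuple[str, str]:
    """Operand kinds of a long opcode: bit 6 is operand 1, bit 5 is operand 2."""
    def kind(bit: int) -> str:
        return "variable" if (opcode >> bit) & 1 else "small constant"
    return kind(6), kind(5)


def long_opcode_number(opcode: int) -> int:
    """The bottom 5 bits of a long opcode say which instruction it is."""
    return opcode & 0x1F


if __name__ == "__main__":
    # $54 is long, number $14 (ADD), first operand a variable, second a small constant.
    print(opcode_form(0x54), hex(long_opcode_number(0x54)), long_operand_kinds(0x54))
    # $B0 is short with no operand: code $00, RET#TRUE.
    print(opcode_form(0xB0), short_operand_kind(0xB0))
```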
Now, type $$11 (not really there) operands can't happen, so that's no problem, but there might well be type $$00 (large constant) operands, for example in "@mul x #666 sp". In this event, the opcode is instead programmed as a "variable" opcode.

So we must now describe the "variable" opcode form. In addition to the possible opcodes which can arise from overflowing "long" opcodes, there are others which can only be "variable". Here all of the bottom 6 bits are available to describe the opcode, and this either holds the above numbers $00 to $18 or else:

   $20  CALL          (4)          (7) These codes are somewhat
   $21  STOREW                         conjectural and only apply
   $22  STOREB                         to a few Infocom games; Inform
   $23  PUT_PROP                       never uses them unless told to
   $24  READ                           explicitly
   $25  PRINT_CHAR
   $26  PRINT_NUM
   $27  RANDOM        (4)
   $28  PUSH
   $29  PULL          (5)
   $2A  STATUS_SIZE   (7)
   $2B  SET_WINDOW    (7)
   $33  SET_PRINT     (7)
   $34  #RECORD_MODE  (7)
   $35  SOUND         (7)

Some of these are only of "variable" type because the available codes for the other types had run out - PRINT_CHAR, for instance. Others, especially CALL, need the flexibility to have between 1 and 4 operands.

In the "variable" type opcode, all eight bits of the opcode have been used up, so we have to add another byte describing the operands. This is divided into four 2-bit fields. For example, $$00101111 means large constant followed by variable (and no third or fourth operand).

Once the opcode is out of the way, the operands are simply stored in one or two-byte form as appropriate.

PRINT and PRINT_RET are followed by text: this is assembled in the usual way immediately after the opcode (which may well be at an odd address, but this doesn't matter) and execution resumes after the last 2-byte chunk of text (the one with top bit set).

Opcodes marked as "store" (note 4) in the above tables return a value: for example, MUL multiplies its two arguments together, and CALL calls a routine which must return a value.
Such instructions are followed by a single byte giving the variable (stack pointer, local or global as usual) to put it in. This may look like an extra operand but is not: there is no need to tell the Z-machine what type it has, since it must be a variable. Finally, there are instructions which test a condition. Apart from the obvious branch instructions (JE and so on), SAVE does this, for example, the test in question being whether or not the save was successful. Branches are stored in two different ways for economy reasons: nearby ones in a single byte at the end of the instruction, farther ones in two bytes. The top bit of the first byte of a branch is the "flag". If this is clear, then a branch occurs when the condition came out false. If it is set, then the branch occurs when it was true. If the next bit (bit 6) is set, then the branch is in abbreviated 1-byte format and the offset is in the bottom 6 bits (0 to 5). If not, the offset is in the bottom 15 bits (0 to 6 of the first byte, and all of the second). This offset can be positive or negative. (Eg., all 1's means -1 in the usual way.) In the abbreviated form, an offset of 1 in fact means "return true from the current routine" and an offset of $20 (i.e., -31) means "return false". An offset of 1 is never useful but -31 might arise, and so it is essential to use the long form for such branches. Working out what the offset ought to be is more complicated than it appears because the PC has already moved on from the start of the instruction when it reaches the branch. The bizarre formula in question is Offset = Destination address - Address of this instruction - Length + B where Length = number of bytes in instruction (not counting the branch) and B is 1 for short branches, 0 for long ones. In practice Inform compiles branches in the long form, considering the economy to be not worth the nightmarish computation needed to make the long/short decision. 
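As a worked instance of the offset formula, here is a small Python sketch. The function names and the addresses used are invented. The second function is an assumption not stated in the text: it supposes that an interpreter finds the target as (address just after the branch data) + offset - 2, and is there only to check that the formula gives the same answer in both forms.

```python
def branch_offset(destination: int, instruction_address: int,
                  length: int, short_form: bool) -> int:
    """Offset = Destination - Address of this instruction - Length + B."""
    b = 1 if short_form else 0  # B is 1 for short branches, 0 for long ones
    return destination - instruction_address - length + b


def branch_target(instruction_address: int, length: int,
                  offset: int, short_form: bool) -> int:
    """Assumed interpreter rule: address after the branch data + offset - 2."""
    branch_size = 1 if short_form else 2
    return instruction_address + length + branch_size + offset - 2


if __name__ == "__main__":
    # Hypothetical instruction at $1000, 3 bytes long before its branch data,
    # branching forward to $1010.
    for short in (False, True):
        off = branch_offset(0x1010, 0x1000, 3, short)
        print("short" if short else "long", off,
              hex(branch_target(0x1000, 3, off, short)))
```

Under that assumption the term B simply makes up for the one-byte difference in size between the two branch forms.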
(One problem is that the number of bytes in each instruction _must_ be the same in both passes, so that the decision needs to be made before the value of the offset is known... in a 2-pass compiler this is insoluble. Another is that the offsets are affected by the size of the branch, confusing things considerably on forward branches.) However, its assembler mode allows you to make an explicit choice. JUMP instructions similarly encode their address operand as an offset, but always as a two-byte (signed) constant. In this respect they differ from CALL instructions. In a CALL, the address is half the absolute routine address. --------------------------------------------------------------------------- A4. Using Inform as an assembler --------------------------------------------------------------------------- Inform can also act as an assembler. A line beginning with an @ character is sent straight to the assembly routines. Constants and variable names can be given as operands but not compound expressions. The following are supported: jump

Set property p of object o to value v INC Increment variable DEC Decrement RETURN Return the value a RET#TRUE Return true, i.e. the value 1 RET#FALSE Return false, i.e. the value 0 INVERSION Print the version number of Inform used to compile the story file IF If the condition is true, execute the code { ... code ... } (braces are _compulsory_) [else execute the [ ELSE { ... other ...} ] other code instead] WHILE While loop { ... code ... } FOR TO For loop: the final value must be a constant { ... code ... } or another variable. If the range is empty, it does not execute even once. DO Until loop { ... code ... } UNTIL OBJECTLOOP FROM/IN A form of while loop. The var first holds either the obj value (if it is FROM) or its child (if IN), and runs through the sibling objects. So, for instance, objectloop x in lamp { print_obj x; new_line; } is equivalent to x=child(lamp); while x~=0 { print_obj x; new_line; x=sibling(x); } BREAK Break out of current loop (not block) JUMP where the relation is one of == a equals b ~= a doesn't equal b < > >= <= comparisons has object a has attribute b at the moment hasnt ...hasnt... near objects a and b have the same parent far ...haven't... These may _not_ be used in expressions (as if the language were C) and there is no AND/OR construction. There is a reason for this, but not a very good one (unless you count laziness). However, one tiny concession towards such a feature is provided, viz. the construction == [or [or ]] which is true if the first something is any of the values given. --------------------------------------------------------------------------- 10. Built-in functions --------------------------------------------------------------------------- The built in functions are PARENT(obj) SIBLING(obj) CHILD(obj) for reading the object tree (see (11) below), and RANDOM(x) which returns a uniformly random number between 1 and x, and PROP_LEN(addr) PROP_ADDR(o,p) PROP(o,p) for which see (11) below. 
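The object-tree functions and the objectloop construction above can be modelled in Python as below. This is a sketch for illustration only: the class and method names are invented, and placing a moved object as the first child of its new parent is an assumption, not something the text specifies.

```python
class ObjectTree:
    """Parent/sibling/child links for objects 1..count; 0 means "nothing"."""

    def __init__(self, count: int) -> None:
        self.parent = [0] * (count + 1)
        self.sibling = [0] * (count + 1)
        self.child = [0] * (count + 1)

    def remove(self, obj: int) -> None:
        """REMOVE: detach obj from the tree, keeping its own possessions."""
        holder = self.parent[obj]
        if holder == 0:
            return
        if self.child[holder] == obj:
            self.child[holder] = self.sibling[obj]
        else:
            prev = self.child[holder]
            while self.sibling[prev] != obj:
                prev = self.sibling[prev]
            self.sibling[prev] = self.sibling[obj]
        self.parent[obj] = 0
        self.sibling[obj] = 0

    def move(self, obj: int, to: int) -> None:
        """MOVE obj TO to (assumed here to become the first child)."""
        self.remove(obj)
        self.sibling[obj] = self.child[to]
        self.child[to] = obj
        self.parent[obj] = to

    def objectloop_in(self, holder: int):
        """Yield the objects visited by 'objectloop x in holder'."""
        x = self.child[holder]          # x=child(holder);
        while x != 0:                   # while x~=0
            yield x
            x = self.sibling[x]         # x=sibling(x);


if __name__ == "__main__":
    LAMP, BATTERY, BULB = 1, 2, 3       # hypothetical object numbers
    tree = ObjectTree(3)
    tree.move(BATTERY, LAMP)
    tree.move(BULB, LAMP)
    print(list(tree.objectloop_in(LAMP)))
```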
Warning: some interpreters set up the random number generator with poor
choices of seed value, which means that the first few random numbers may be
rather peculiarly distributed.  After a time, it settles down.  To get
around this, "Curses" (for example) takes and throws away 100 random
numbers when it begins.

---------------------------------------------------------------------------
  11. Objects
---------------------------------------------------------------------------

The object hierarchy is a tree of up to 255 "objects", which you might use
for many different game elements: rooms, compass points, scenery, things
which can be picked up, and so on.  They are numbered from 1 to 255, and
the number 0 by convention means "nothing".  Attempting to print_obj
object 0 will produce a string full of peculiar letters and (if you are
very unlucky indeed) even random ASCII values.

In the tree, each object has a parent, a sibling, and a child.  Thus, for
instance, a portion may resemble

    Meadow
      |
    Mailbox  ->  Player
      |            |
     Note        Sceptre  ->  Cucumber  ->  Torch  ->  Magic Rod
                                              |
                                           Battery

in which -> shows siblings, and | parents and children.  In this case, the
Meadow has nothing as its parent.  Anything with no possessions, such as
the note, has nothing as its child, and so on.  When an object is moved,
its possessions move with it, of course.

In practice an object needs rather more data than just a position in a
tree.  It also has a collection of variables attached to it.

Firstly, there are 32 flags, called "attributes", which can be either set
or clear.  These might be such conditions as "giving light", "currently
worn" or "is one of the featureless white cubes".  All 32 are free for the
user to use.  They must be declared before use, by commands like

    ATTRIBUTE locked;

which will allocate a new attribute and make a constant "locked" to have
the value of its number.
You never then need to know about these numbers, because you can use commands like IF obj HAS locked { print_ret "But it's locked!"; } SET_ATTR obj locked; CLEAR_ATTR obj locked; Warning: 32 sounds like plenty, but the limit can quite easily be hit. The author has found it useful to declare one as "general", to be used for different things for different objects. Secondly, there are 30 "properties". These are far more elaborate. For one thing, not every object has every property. The following all declare new properties: PROPERTY door_to; PROPERTY article "a"; PROPERTY blorpleroutine $ffff; The value given, in the case of article and blorpleroutine, is the default value: that is, the value of the property which an object will have if it doesn't explicitly have some other value. If you don't define a default value, it will by default be 0. The data for a given property can be a number, or up to four numbers in a row, or up to eight bytes of data. The simplest way to get at the current value is something like i=PROP(location,door_to); which will get the first number in the property door_to of object location. Similarly, it can be written to with PUT_PROP location door_to hall_of_mists; A subtle point is that numbers smaller than 256 are stored differently from larger ones. In order to decide whether the property is one byte's worth or two, the Z-machine looks at the number of bytes which the property has in all, and sees whether it is odd or even; if even, it presumes the number is a 2-byte word; if odd, it presumes it is just one byte. This is seldom something you need to know about, but occasionally you will want a property which will, later in the game, need to hold a value of, say, 1000, but which initially will be zero. This is particularly the case with timing mechanisms, for instance. 
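The odd/even rule just described can be sketched in a few lines of Python
(not Inform, and not how any interpreter is actually written).  The name
read_first_value is hypothetical, and the big-endian byte order of a 2-byte
word is an assumption which the text above does not state.

```python
def read_first_value(field: bytes) -> int:
    """Read a property's first value using the odd/even rule from the text.

    An even-length field is presumed to start with a 2-byte word, an
    odd-length field with a single byte.  Big-endian order is assumed.
    """
    if not 1 <= len(field) <= 8:
        raise ValueError("a property field holds between 1 and 8 bytes")
    if len(field) % 2 == 0:
        return (field[0] << 8) | field[1]
    return field[0]


# A timer stored in a single byte: it reads back fine while it is small.
narrow = bytes([0])
# The same timer stored in two bytes has room for a value such as 1000.
wide = bytes([0x03, 0xE8])  # 0x03E8 is 1000

print(read_first_value(narrow), read_first_value(wide))
```

A one-byte field tops out at 255, which is why a property that must later
hold 1000 needs the wider field from the outset.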
The command

    PROPERTY LONG timeleft;

declares the property "timeleft" and requires Inform to make sure that all
"timeleft" fields are 2 bytes wide, even if they have small initial values.

More elaborate manipulation has to be done by hand.

    k=PROP_ADDR(o,weird);

sets k to the address of the "weird" data of object o.  To find out how
many bytes there are, apply PROP_LEN to this address.

    l=PROP_LEN(k);

Once you have the address you can read and write to it directly.  Be
careful not to overrun the length, which may not be changed.

Warning: the Z-machine crashes if you attempt to write to a property field
which an object hasn't got.

An object is declared (before the body of the code) by something like:

    OBJECT trapdoor "hinged trapdoor" attic
      WITH name "hinged" "trap" "door" "trapdoor",
           initpos
             "A hinged trapdoor in the floor stands open, and light \
              streams in from below.",
           closedpos
             "There is a closed trapdoor in the middle of the floor.",
           portalto house,
           postroutine TrapdoorPost,
           dirprop d_to
      HAS  portal static open light openable;

trapdoor is a constant which is set to its object number; "hinged trapdoor"
is its attached short name; attic is the object which initially possesses
it.  If it was to be initially unowned, this would be "nothing" instead of
"attic".

After WITH is a list of property definitions, in the form

    ... [[, ...]]

Warning: an excellent source of mysterious errors is missing off the commas
between these, since property names are themselves legal constants.

There is one special property, called "name" and numbered 1.  Its data must
be (up to four at most) words, as above, and these are entered into the
dictionary as nouns (if they aren't already present): the data actually
stored is the dictionary addresses.  Note that the dictionary itself does
_not_ know that "door" refers to this object: there might be any number of
objects which could be called "door".

After HAS is a list of attributes which the object initially has.
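The parent/sibling/child tree described at the start of this section can be
sketched in Python as three parallel arrays.  This is only an illustration:
the class Tree and its methods are hypothetical names, the object numbers
are invented, placing the Battery under the Torch is a guess at the
example, and inserting a moved object as the first child of its new parent
is an assumption rather than anything the text specifies.

```python
NOTHING = 0  # object 0 means "nothing", as in the text


class Tree:
    def __init__(self, count):
        # Index 0 is unused so that object numbers index the lists directly.
        self.parent = [NOTHING] * (count + 1)
        self.sibling = [NOTHING] * (count + 1)
        self.child = [NOTHING] * (count + 1)

    def remove(self, obj):
        """Detach obj from its parent; its own children stay attached."""
        par = self.parent[obj]
        if par == NOTHING:
            return
        if self.child[par] == obj:
            self.child[par] = self.sibling[obj]
        else:
            prev = self.child[par]
            while self.sibling[prev] != obj:
                prev = self.sibling[prev]
            self.sibling[prev] = self.sibling[obj]
        self.parent[obj] = NOTHING
        self.sibling[obj] = NOTHING

    def move(self, obj, dest):
        """Make obj the first child of dest (assumed insertion order)."""
        self.remove(obj)
        self.parent[obj] = dest
        self.sibling[obj] = self.child[dest]
        self.child[dest] = obj

    def children(self, obj):
        """Visit the child, then each sibling in turn, until nothing."""
        found = []
        x = self.child[obj]
        while x != NOTHING:
            found.append(x)
            x = self.sibling[x]
        return found


(MEADOW, MAILBOX, PLAYER, NOTE, SCEPTRE,
 CUCUMBER, TORCH, ROD, BATTERY) = range(1, 10)

t = Tree(9)
# Later moves land in front of earlier ones, so build each list backwards.
for obj, dest in [(PLAYER, MEADOW), (MAILBOX, MEADOW), (NOTE, MAILBOX),
                  (ROD, PLAYER), (TORCH, PLAYER), (CUCUMBER, PLAYER),
                  (SCEPTRE, PLAYER), (BATTERY, TORCH)]:
    t.move(obj, dest)

# Put the torch in the mailbox: the battery travels with it.
t.move(TORCH, MAILBOX)
print(t.children(MAILBOX), t.children(TORCH), t.children(PLAYER))
```

Because only the torch's own parent and sibling links are rewritten, its
possessions never need touching when it moves.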
---------------------------------------------------------------------------
  12. Verbs and grammar
---------------------------------------------------------------------------

Whereas objects should be declared at the start of the file, the grammar to
be allowed by the game should be declared at the end.  This is done with
the VERB command.  VERB does something very complicated, but probably not
what you think.

A typical VERB command would be:

    VERB "take" "get" "pick" "lift"
                    * "out"                      -> ExitSub
                    * multi                      -> TakeSub
                    * multiinside "from" noun    -> RemoveSub
                    * "in" noun                  -> EnterSub
                    * "off" held                 -> DisrobeSub;

This declares a verb, for which "take", "get" etc are synonyms, and which
can take five different courses.  In the first, it must be followed by the
word "out".  In the last, it must be followed by "off" and then an item
which is currently held by the player.  In the second, it can be followed
by one object, or a list, perhaps specified as "everything", for instance.

There can be no grammar at all, for example

    VERB "invent" "i" * -> InvSub;

After the "->" is the name of a routine which is to be called when this is
matched.

For traditional reasons unclear to the author, previous Infocom hackers
have called words such as "out" and "off", adjectives.  This is monstrously
illiterate since they are of course prepositions.  We shall wearily follow
convention anyway.

Remember that the Z-machine does _not_ contain the bulk of a game parser,
only the computationally expensive and low-level part which works out what
the words are.  So this command only sets up a table with some numbers in.
If you want a parser, you have to write code to deal with the table again.

By convention, adjectives are numbered downwards from $ff.  Thus, if the
above were the opening lines of grammar, "from" would be $fe, and so on.
As they are created, they are entered into the dictionary, and also into
the adjective table, which has four-byte entries

    00 ----2 bytes----------------- ----2 bytes-----------

In order to make life more interesting, these entries are stored in reverse
order (i.e., lowest adjective number first).  The address of this table is
rather difficult to deduce from the file header information, so the
constant #adjectives_table is set up by Inform to refer to it.  In any
event, the table isn't very useful and is created only for the sake of
conforming to Infocom internal conventions.

The important tables are the grammar and action tables.  The grammar table
address is stored in word 7 (ie bytes 14 and 15) of the header.  The table
consists of a list of two-byte addresses to the entries for each word.
This list is immediately followed by these entries, one after another.  An
entry consists of one byte giving the number of lines (eg, 5 for the "take"
definition above) and then that many 8-byte lines.  These lines have the
form

    --1 byte- ----6 bytes-------- --1 byte-------

The first byte is the number of objects which need to be supplied: eg, 0
for "inventory", 1 for "take frog", 2 for "tie rope to dog".  The sequence
of words in the next six bytes gives up to 6 blocks of syntax to follow the
verb, which must be matched in order.  Large numbers such as $ff mean that
the appropriate adjective must appear; small numbers are inserted by
special words such as "held" or "noun" in the VERB command:

    Word         Byte  What the "Deja Vu" parser uses it for
    ====         ====  =====================================
    noun          0    any visible object
    held          1    object held
    multi         2    one or more visible objects
    multiheld     3    one or more held objects
    multiexcept   4    one or more objects, except the other object
    multiinside   5    one or more objects, inside the other object
    creature      6    an animate creature
    special       7    any word or number

The sequence is padded out to 6 bytes with zeros.

The action numbers begin at 0.
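The 8-byte line layout just described can be sketched as a small Python
decoder.  The names TOKENS and decode_line are hypothetical.  Two things
here are assumptions rather than anything the text lays down: any byte
outside 0 to 7 is treated as an adjective number, and a zero is taken to be
padding (rather than "noun") only once the expected number of objects has
been seen.  The sample bytes are the third line of the "take" entry printed
later in this section.

```python
# Token bytes from the table above; anything else is treated as an adjective.
TOKENS = {0: "noun", 1: "held", 2: "multi", 3: "multiheld",
          4: "multiexcept", 5: "multiinside", 6: "creature", 7: "special"}


def decode_line(line: bytes):
    """Split one 8-byte grammar line into (objects, words, action)."""
    if len(line) != 8:
        raise ValueError("a grammar line is exactly 8 bytes")
    objects, words, action = line[0], line[1:7], line[7]
    decoded = []
    seen = 0
    for b in words:
        # "noun" is also 0, so a zero only counts as padding once all the
        # expected objects have been seen (an assumed way to disambiguate).
        if b == 0 and seen >= objects:
            break
        if b in TOKENS:
            decoded.append(TOKENS[b])
            seen += 1
        else:
            decoded.append("adjective $%02x" % b)
    return objects, decoded, action


# multiinside "from" noun -> action 2, with "from" numbered $fe
print(decode_line(bytes([2, 5, 254, 0, 0, 0, 0, 2])))
```

Leaning on the object count is one way round the fact that "noun" and the
padding byte are both zero; a real parser is free to do this differently.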
The first routine mentioned as an action (in the "take" example above,
ExitSub) is assigned action number 0; the next (TakeSub) is given 1, and so
on.  The appropriate number is stored in the last byte of the line.  Thus,
a little later on in the grammar, the line

    VERB "exit" "leave" * -> ExitSub;

might well appear, and ExitSub will mean "action 0" as before.

So this table does not store the address of the action routine, as one
might expect.  Instead the addresses corresponding to the action numbers
are stored in the actions table.  Once again, Inform puts this table in its
conventional place, but this address being difficult to work out, the
constant #actions_table is set up to hold it.  The actions table is simply
a list of 2-byte entries giving the routine addresses (divided by 2).

There is also a preactions table, with another constant #preactions_table,
created only to conform to Infocom conventions; it is set up containing
0000 for each action.  ("Curses", for instance, makes no use of this.)

In the mean time, what has happened to the actual words, "take", "get",
"pick" and "lift"?  Note that these do not appear in the grammar table at
all.  Instead they are entered into the dictionary, along with the verb
number.  As a final baroque twist, these numbers also count down from $ff.
Any number of words can be given, all referring to the same verb number;
"Curses" has 11 synonyms for "attack", for instance.

Of course, Inform does not know or care what is done with any of these
tables.  For instance, the "take" verb has the entry

    005
    000 255 000 000 000 000 000 000
    001 002 000 000 000 000 000 001
    002 005 254 000 000 000 000 002
    001 253 000 000 000 000 000 003
    001 252 001 000 000 000 000 004

but it is up to the code you write to deal with this.  (The VERBS command
will print out the full verb table in a similar format.)

---------------------------------------------------------------------------
  13. The Dictionary
---------------------------------------------------------------------------

This section describes what Inform does with the dictionary.  The fourth
word of the file header (bytes 8 and 9) contains the dictionary table's
address.  The table begins with a 7-byte header:

    03 '.' ',' '"'

meaning there are three characters used to separate words in typed input,
full stops, commas and quotation marks.  (The Z-machine will allow any list
to be given here but Inform decides on this for you.)

    07 ----2 bytes--------

meaning there are that many entries in the dictionary, all 7 bytes long.
(This could again be in principle varied, but allows for six significant
letters in words, while still enabling the text of the word to occupy a
4-byte integer - which is convenient and fast when the compiler is
alphabetically sorting.)

The seven-byte entries are in alphabetical order, and look like:

    ----4 bytes----------- --1 b-- ----1 byte--- ----1 byte--------

The text is stored in the usual text format, thus allowing up to 6
characters.  The flags (chosen once again to conform loosely to Infocom
conventions, not for any sensible reason) have the eight bits

    7 6 5 4 3 2 1 0
    .. .. .. ..

, and mean the word can be a verb, noun or adjective; the bit means the
word was inserted by a DICTIONARY command in the program, except that words
also have the bit set (ours not to wonder why).  Note that a word can be
any combination of these at once.  It can even be simultaneously a verb,
adjective and noun.

Typically a full game contains about 600 dictionary entries - about ten
times the number of portable objects.  Even so it only consumes about 4K,
or 1/64th of the available memory.  It's never worth economising on
dictionary entries; nothing else a designer can do with 4K will be as good
to the user.

---------------------------------------------------------------------------
  14. Indirect function calls
---------------------------------------------------------------------------

Occasionally one needs to call a function whose address is in a variable:
for example, if the routine address has been looked up from a table, or an
object's property list.  For this, the function "indirect" is provided:

    a=indirect(b);

sets a to the return value of calling the function whose address is in b.
If you want to pass arguments as well, you should use the assembler-level
@icall.  But do so with care: it is dangerously easy to leave values lying
about on the stack, which will overflow causing a mysterious crash hundreds
of turns later.

---------------------------------------------------------------------------
  15. Text spacing
---------------------------------------------------------------------------

Typewritten English, like this file, normally puts a double space after a
full stop.  This is much easier to read.  Unfortunately Infocom-standard
interpreters do not usually understand that.  When they fold text across
lines, they can easily turn

    ...and a pomegranate.  After all, you always hated fruit.

into something which looks like

    |You decline the offer of a banana, an apple and a pomegranate.  |
    | After all, you always hated fruit.                             |
    |                                                                |
    |>                                                               |

which looks awful.  It would be easy to fix the interpreter not to do this;
but nobody does.

In case (like the author's) your typing is habitually double-spaced, Inform
provides a command line option -d to change it back again.  It does this
only by replacing the string ".  " (full stop, two spaces) by ". " (full
stop, one space) in text conversion.

---------------------------------------------------------------------------
  A1. The Z-machine
---------------------------------------------------------------------------

The so-called Z-machine (the imaginary machine for which story files are
programs) is quite well-adapted to its task.  It maintains a hierarchy of
objects and possessions, and does the computationally-intensive part of
parsing input itself.
That said, it does not contain the bulk of the parser. The parsing tables which some investigators think are part of the Z-machine format, are in fact the same across different Infocom games only because they all contain essentially the same parser code. Thus, Inform is in principle free not to compile such tables, but it does so in order to INFODUMP properly. Some tables are put to subtly different uses, however. The following description is fairly complete, but only covers version 3. It would be helpful if someone public-spirited would write an account of the differences in later versions. The version 3 Z-machine is 128K long at most. Addresses within it are nonetheless held in 2-byte words, which is why some addresses are stored as half their actual values, and why some items (routines and static strings) are always stored at even addresses. The first 64 bytes contain a header. The first 4 bytes are: 03 ----2 bytes----- 3 indicates version 3; the release number is as set in the program; the flags byte contains bits: 1 Status line type (clear for Score/Turns, set for Hours:Mins) 3 Censorship bit (used by some games, but not by the Z-machine) 4 Alternative prompts - sometimes used by primitive interpreters 5 Status window support - used only by "Seastalker" Next come seven word addresses, at words 2 to 8: 2 Where routines begin, in bytes 3