Format of Inform 6 Debugging Information Files
Version 1.0

0: Introduction

This is a specification of the Version 1 format for the debugging information files emitted by the Inform 6 compiler. It replaces Version 0, which is documented in Section 12.5 of the Inform Technical Manual.

1: Overview

Debugging information files are written in XML and encoded in UTF-8. They therefore begin with an XML declaration that names UTF-8 as the encoding.

Beyond the usual requirements for well-formed XML, the file adheres to the conventions that all numbers are written in decimal, all strings are case-sensitive, and all excerpts from binary files are Base64-encoded.

2: The Top Level

The root element carries three attributes: the version of the debug file format being used, the name of the program that produced the file, and that program's version. For instance,

  ...

The elements from Sections 3--8 may appear in the ellipses.

3: Story File Prefix

The story file prefix contains a Base64 encoding of the story file's first bytes so that a debugging tool can easily check whether the story and the debug information file are mismatched. For example, the prefix for a Glulx story might appear as

  R2x1bAADAQEACqEAAAwsAAAMLAAAAQAAAAAAPAAIo2Jc
  6B2XSW5mbwABAAA2LjMyMC4zOAABMTIxMDE1wQAAMA==

The story file prefix is mandatory, but its length is unspecified. Version 6.33 of the Inform compiler records 64 bytes, which seems sufficient.

4: Story File Sections

Story file sections partition the story file according to how the data will be used. For the Inform 6 compiler, this partitioning is the same as the one that the `z' flag prints.

A record for a story file section gives a name for that section, its beginning address (inclusive), and its end address (exclusive):

  abbreviations table
  64
  128
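
Because each section is a half-open address range, a tool can check that a set of section records really does partition the story file. The following is a minimal sketch in Python, not part of the format itself. It assumes the records have already been read into (name, begin, end) tuples; the function names and the sample 64-byte header record are hypothetical.

```python
from typing import List, Optional, Tuple

# (name, beginning address inclusive, end address exclusive)
Section = Tuple[str, int, int]


def find_partition_problems(sections: List[Section], story_length: int) -> List[str]:
    """Report gaps and overlaps among half-open section ranges."""
    problems = []
    cursor = 0  # first address not yet covered by any section
    for name, begin, end in sorted(sections, key=lambda s: (s[1], s[2])):
        if end < begin:
            problems.append(f"{name}: end {end} precedes begin {begin}")
            continue
        if begin > cursor:
            problems.append(f"gap: bytes {cursor}..{begin} are not covered")
        elif begin < cursor:
            problems.append(f"overlap: {name} begins at {begin}, before {cursor}")
        cursor = max(cursor, end)
    if cursor < story_length:
        problems.append(f"gap: bytes {cursor}..{story_length} are not covered")
    return problems


def section_containing(sections: List[Section], address: int) -> Optional[str]:
    """Return the name of the first section whose range holds the address."""
    for name, begin, end in sections:
        if begin <= address < end:  # end address is exclusive
            return name
    return None


# Illustrative data only: the second record is the example above.
sections = [("header", 0, 64), ("abbreviations table", 64, 128)]
problems = find_partition_problems(sections, 128)
```

Since the end address is exclusive, address 128 belongs to whatever section follows, not to the abbreviations table.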

The names currently in use include those from Section 12.5 of the Inform Technical Manual:

  abbreviations table
  header extension (Z-code only)
  alphabets table (Z-code only)
  Unicode table (Z-code only)
  property defaults
  object tree
  common properties
  class numbers
  individual properties (Z-code only)
  global variables
  array space
  grammar table
  actions table
  parsing routines (Z-code only)
  adjectives table (Z-code only)
  dictionary
  code area
  strings area

plus one addition for Z-code:

  abbreviations

two additions for Glulx:

  memory layout id
  string decoding table

and three additions for both targets:

  header
  identifier names
  zero padding

Names may repeat; Glulx story files, for example, sometimes have two zero padding sections.

A compiler that does not wish to subdivide the story file should emit one section for the entirety and give it the name

  story

5: Source Files

Source files are encoded as in the example below. Each file has a unique index, which is used by other elements when referring to source code locations; these indices count from zero. The file's path is recorded in two forms: first as it was given to the compiler via a command-line argument or include directive, but without any path abbreviations like `>' (the form suitable for presentation to a human), and second after resolution to an absolute path (the form suitable for loading the file contents). All paths are written according to the conventions of the host OS. The language is, at present, either "Inform 6" or "Inform 7". More languages may be added in the future.

  example.inf
  /home/user/directory/example.inf
  Inform 6

If the source file is known to appear in the story's Blorb, its chunk number will also be recorded:

  example.inf
  /home/user/directory/example.inf
  Inform 6
  18

6: Table Entries; Grammar Lines

Table entries are data defined by particular parts of the source code, but without any corresponding identifiers.
A table entry element notes the entry's type, the address where it begins (inclusive), the address where it ends (exclusive), and the defining source code location(s), if any:

  grammar line
  1004
  1030
  ...

Version 6.33 of the Inform compiler only emits table entries for grammar lines; these data are all located in the grammar table section.

7: Named Values; Constants, Attributes, Properties, Actions, Fake Actions, Objects, Classes, Arrays, and Routines

Records for named values store their identifier, their value, and the source code location(s) of their definition, if any. For instance,

  MAX_SCORE
  40
  ...

would represent a named constant.

Attributes, properties, actions, fake actions, objects, arrays, and routines are also names for numbers, and differ only in their use; they are represented in the same format, each under its own tag. (Moreover, unlike Version 0 of the debug information format, fake actions are not recorded as both fake actions and actions.)

The records for constants include some extra entries for the system constants tabulated in Section 12.2 of the Inform Technical Manual, even though these are not created by Constant directives. Entries for #undefed constants are also included, but necessarily without values.

Some records for objects will represent class objects. In that case, they will be tagged as classes rather than as objects and include an additional child to indicate their class number:

  lamp
  5
  1560
  ...

Records for arrays also have extra children, which record their size, their element size, and the intended semantics for their zeroth element:

  route
  1500
  20
  4
  true
  ...

And finally, records for routines contain an address and one further element, along with any number of the local variable and sequence point elements, which are described in Sections 9 and 10. The address is provided because the identifier's value may be packed.

Sometimes what would otherwise be a named value is in fact anonymous; unnamed objects, embedded routines, some replaced routines, veneer properties, and the Infix attribute are all examples. In such a case, the identifier subelement will carry the XML attribute artificial to indicate that the compiler is providing a sensible name of its own, which could be presented to a human, but is not actually an identifier. For instance:

  lantern.time_left
  1820
  80
  ...
  ...

Artificial identifiers may contain characters, like the full stop in ``lantern.time_left'', that would not be legal in the source language.

8: Global Variables

Globals are similar to named values, except that they are not interpreted as a fixed value, but rather have an address where their value can be found. Their records therefore contain an
address tag in place of the value tag, as in:

  darkness_witnessed
  1520
  ...

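
To illustrate the difference from a named value, a debugging tool would look up a global's current value in the story's memory at the recorded address. The following Python sketch is illustrative only. The helper name is hypothetical, and the word widths (two bytes for Z-code, four for Glulx, both big-endian) are assumptions taken from the virtual machines' own conventions, not from this specification.

```python
def read_global(memory: bytes, address: int, width: int) -> int:
    """Read one big-endian word of `width` bytes at `address`.

    Assumption: width is 2 for Z-code and 4 for Glulx.
    """
    if address < 0 or address + width > len(memory):
        raise ValueError(f"address {address} lies outside memory")
    return int.from_bytes(memory[address:address + width], "big")


# Illustrative memory image: zeros, with the word at address 1520 set to 1.
memory = bytes(1520) + b"\x00\x01" + bytes(16)
darkness_witnessed = read_global(memory, 1520, 2)
```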
9: Local Variables

The format for local variables mimics the format for global variables, except that a source code location is never included, and their memory locations are not given by address. For Z-code, locals are specified by index:

  parameter
  1

whereas for Glulx they are specified by frame offset:

  parameter
  4

If a local variable identifier is only in scope for part of a routine, its scope will be encoded as a beginning instruction address (inclusive) and an ending instruction address (exclusive):

  rulebook
  0
  1628
  1678

Identifiers with noncontiguous scopes are recorded as one element per contiguous region. It is possible for the same identifier to map to different variables, so long as the corresponding scopes are disjoint.

10: Sequence Points

Sequence points are stored as an instruction address and the corresponding single location in the source code:

  1628
  ...

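
One way a debugging tool might use these records is to map an arbitrary instruction address back to the most recent sequence point at or before it. The following Python sketch shows that lookup under stated assumptions: the tuple layout and function name are hypothetical, the points are assumed to be sorted by address, and the sample data are illustrative.

```python
import bisect
from typing import List, Optional, Tuple

# (instruction address, source line, source character)
SequencePoint = Tuple[int, int, int]


def nearest_sequence_point(points: List[SequencePoint],
                           address: int) -> Optional[SequencePoint]:
    """Return the last sequence point at or before `address`, if any.

    Assumes `points` is sorted by instruction address.
    """
    addresses = [point[0] for point in points]
    position = bisect.bisect_right(addresses, address) - 1
    return points[position] if position >= 0 else None


# Illustrative data only.
points = [(1628, 1024, 4), (1640, 1025, 1)]
found = nearest_sequence_point(points, 1630)
```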
The source code location will always be exactly one position with overlapping endpoints.

Sequence points are defined as in Section 12.4 of the Inform Technical Manual, but with the further stipulation that labels do not influence their source code locations, as they did in Version 0 of the debug information format. For instance, in code like

  say__p = 1; ParaContent(); .L_Say59; .LSayX59; t_0 = 0;

the sequence points are to be placed like this:

  <*> say__p = 1; <*> ParaContent(); .L_Say59; .LSayX59; <*> t_0 = 0;

rather than like this:

  <*> say__p = 1; <*> ParaContent(); <*> .L_Say59; .LSayX59; t_0 = 0;

11: Source Code Locations

Most source code locations take the following format, which describes their file, the line and character number where they begin (inclusive), the line and character number where they end (exclusive), and the file positions (in bytes) corresponding to those endpoints:

  0 1024 4 44153 1025 1 44186

Line numbers and character numbers begin at one, but file positions count from zero.

In the special case where the endpoints coincide, as happens with sequence points, the end elements may be elided:

  0 1024 4 44153

At the other extreme, sometimes definitions span several source files or appear in two different languages. The former case is dealt with by including multiple code location elements and indexing them to indicate order:

  0 1024 4 44153 1025 1 44186
  1 1 0 0 3 1 59

The latter case is also handled with multiple elements. Note that indexing is only used to indicate order among locations in the same language.

  2 12 0 308 12 112 420
  0 1024 4 44153 1025 1 44186
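
The conventions above (an exclusive end, elided end elements for single positions, and zero-based file positions) can be sketched in Python as follows. This is an illustration and not a reference implementation. The class and field names are hypothetical, and the sketch assumes that the end elements are either all present or all elided.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SourceLocation:
    file_index: int
    line: int            # counts from one
    character: int       # counts from one
    file_position: int   # byte offset, counts from zero
    end_line: Optional[int] = None
    end_character: Optional[int] = None
    end_file_position: Optional[int] = None

    def end(self) -> Tuple[int, int, int]:
        """Return the end point, falling back to the start when elided."""
        if self.end_file_position is None:
            return (self.line, self.character, self.file_position)
        return (self.end_line, self.end_character, self.end_file_position)

    def excerpt(self, contents: bytes) -> bytes:
        """Slice the covered bytes out of the source file's contents."""
        # The end position is exclusive, which matches Python slicing.
        return contents[self.file_position:self.end()[2]]


# The first two examples from this section.
span = SourceLocation(0, 1024, 4, 44153, 1025, 1, 44186)
point = SourceLocation(0, 1024, 4, 44153)
```

A location with elided end elements covers zero bytes, so its excerpt is empty; the span above covers 44186 - 44153 = 33 bytes.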