Code-Generation On-the-Fly: A Key to Portable Software
SWISS FEDERAL INSTITUTE OF TECHNOLOGY ZURICH
Acknowledgement
I am deeply indebted to Professor N. Wirth for the privilege of being allowed to work with him. Over the years, he has taught me a wealth of wisdom that will follow me throughout my professional life, and beyond. I admire his unique ability of immediately recognizing the essential, and his courage to eliminate the superfluous even when considered indispensable by others. Often enough, I have attempted to apply his standards of judgement in my own work, and the results have been most gratifying. I thank him most sincerely for taking up the supervision of this thesis and commenting so competently on the ongoing project.
I thank Professor J. Gutknecht for his willingness to be my co-examiner and for advising me on this thesis. He is truly a great pedagogue, and if the contents of this thesis are comprehensible at all, this is also a result of his valuable criticism of an earlier draft, delivered in his unrivalled didactic style.
I am grateful to my colleagues at the Institut für Computersysteme for their openness to new ideas and their enthusiasm in discussing them, even late at night. Together, they have provided a friendly and intellectually most stimulating working environment.
Finally, I thank C. Pfister and S. Ludwig for proof-reading this thesis.
Contents
Abstract
1. Introduction
2. Program Representation
3. The SDE Encoder and Decoder
4. Modular Design of the Implementation
5. Benchmark Results
6. Portability and Software Components
7. Further Applications
8. Related Work
9. Summary and Conclusion
References
Abstract
A technique for representing programs abstractly and independently of the eventual target architecture is presented that yields a file representation twice as compact as machine code for a CISC processor. It forms the basis of an implementation, in which the process of code generation is deferred until the time of loading. At that point, native code is created on-the-fly by a code-generating loader.
The process of loading with dynamic code-generation is so fast that it requires little more time than the input of equivalent native code from a disk storage medium. This is predominantly due to the compactness of the abstract program representation, which makes it possible to counterbalance the additional effort of code-generation by shorter input times. Since processor power is currently rising more rapidly than disk-access times and transfer rates are improving, the proposed technique is likely to become even more competitive as hardware technology evolves.
To users of the implemented system, working with modules in the abstract representation is as convenient as working with native object files. Both kinds of file representation coexist in the implemented system; they are completely interchangeable and modules in either representation can import from the other kind. Separate compilation of program modules with type-safe interfaces, and dynamic loading on a per-module basis are both fully supported.
Deferring code-generation until loading time can provide several new capabilities, especially when the intermediate program representation is machine-independent and thereby portable. It is argued that the combination of portability with practicality denotes an important step toward a software-component industry. Further benefits include a potential for reducing the number of recompilations after changes in source text, and a mechanism to decide at load time whether or not run-time integrity checks should be generated for a library module, eliminating the need to distinguish between development and production library configurations. All of these new possibilities have the potential of lowering the cost of software development and maintenance.
In the long run, fast on-the-fly code-generation may even replace binary compatibility for achieving software portability among processors implementing the same architecture. Already today, different models of a processor family are diverging more and more and it is becoming increasingly difficult to serve all of them equally well with just one version of native code. If code is generated only at the time of loading, however, it can always be custom-tailored toward the specific processor that it will eventually run on.
Introduction
The rapid evolution of hardware technology is constantly influencing software development, for better as well as for worse. On the downside, faster hardware can conceal the complexity and cost of badly-designed programs; Reiser [Rei89] is not far off the mark in observing that sometimes “software gets slower more quickly than hardware gets faster”. On the other hand, improvements in hardware, such as larger memories and faster processors, have also provided the means for better software tools.
Often enough, methodical breakthroughs have been indirect consequences of better hardware. For instance, software-engineering techniques such as information hiding and abstract data types could be developed only after computers became powerful enough to support compilers for modular programming languages. Mechanisms such as separate compilation place certain demands on the underlying hardware and so had a chance of proliferation only after sufficiently capable computers were commonplace.
This thesis presents another example of a systematic technological advance that owes its viability to improved hardware. It describes a technique for representing programs abstractly in a format that is twice as compact as object code for a CISC processor. Combined with the speed of current processors and abundance of main storage, the use of an intermediate program representation based on the new technique makes it possible to accelerate the process of code generation to such a degree that it can be performed on-the-fly during program loading on ordinary desktop computers, even with a high resulting code quality.
A system has been implemented that permits the convenient use of program modules in this machine-independent intermediate representation, as if they had been compiled to native code. In this thesis, it is argued that such a system presents some completely new possibilities, among which lies the prospect of founding an industry offering user-serviceable software components that are immediately functional on more than one kind of target architecture.
Among the themes that recur throughout this thesis are methods of program representation, code generation, modular programming, and software portability. This is a comparatively wide scope for a doctoral dissertation, which is reflected in a large and diverse list of references.
Program Representation
For every calculable function, there exists an algorithm that computes it [Chu35]. A program is an encoded description of such an algorithm that instructs a universal computing machine how to compute the corresponding function. A machine is called universal exactly if it can compute every function that is representable by an algorithm; Turing [Tur36] has shown that such universal machines exist. Programming languages correspond to abstract and relatively complex universal machines, while digital computers represent physical realizations of universal machines that are programmable in their characteristic machine languages.
Algorithms can be translated from one program representation into another; the mere existence of an algorithm guarantees that a corresponding program can be constructed for every machine that is universal. This chapter discusses the use of intermediate languages as transitional steps in such transformations between program representations, particularly solutions based on abstract machines. It then introduces a new program-representation technique called semantic-dictionary encoding, which achieves a high information density while simultaneously facilitating efficient further translation into various machine languages.
Subdivided Compilers
Often, and for a variety of reasons, compilers are subdivided into smaller units. This may be a static separation into modules, or a dynamic partition into two or more separate compilation phases (or passes) that execute one after the other. The initial argument for such a division is often a physical limitation of the host machine that makes the construction of a monolithic compiler impossible. There are, however, other advantages to be gained from splitting a compiler into smaller parts, which make this approach attractive even when it is not mandated by external constraints, and keep it attractive when the constraints are eventually removed by evolving hardware technology.
Most importantly, subdividing a compiler usually reduces its complexity. The individual pieces are smaller and thereby simpler than the whole, and often the combined complexity of all parts taken together is less than that of a corresponding monolithic compiler. The reason for this is that the different parts can usually be decoupled from each other quite well, resulting in narrow interfaces between them. Complexity, on the other hand, is mostly rooted in non-local dependencies.
Another reason for structuring a compiler concerns the ease with which it can be retargeted for another architecture. By disentangling the code-generating functions from the rest of the compiler, the construction of a family of compilers for different target architectures can be simplified considerably. Adapting the compiler for a new target machine then amounts to the construction of a new code generator, while the rest of the compiler can remain unchanged. As a matter of fact, this technique has become so common that we tend to speak of “code generators” today as if they were self-contained programs in their own right, disregarding completely that syntactic analysis and code generation were not separated from each other in early compilers.
Intermediate Languages
A compiler transforms a program from one representation into another. When such a compiler consists of several phases, this implies the existence of further intermediate representations carrying the state of the compilation from one phase to the next. These intermediate representations are usually only transient data structures in memory, but in some compilers they have a linear form that can be stored on a data file. In the following, such stand-alone linear representations are referred to as intermediate languages.
Intermediate languages are interesting from the aspect of software portability. If an intermediate language is sufficiently simple, it will provide for the easy construction of a series of interpreters that can execute the intermediate language directly on a number of different target machines. Interpreters are usually easier to build than code generators, so that this approach is attractive, although it comes at the expense of reduced performance. Implementations that have been based on interpreted intermediate languages in this manner include BCPL [Ric71, RW79], SNOBOL4 [Gri72], and Pascal [NAJ76].
Intermediate languages are often also used in the process of bootstrapping compilers onto new machines [Hal65]. Suppose that we have a portable compiler that translates a source language SL into an intermediate language IL, and that this compiler is itself written in SL and available encoded both in SL and in IL. Suppose further that we have an interpreter for IL that runs on a computer with a machine language TL. We may then modify the portable compiler (written in SL) to derive a native compiler that translates from SL to TL directly. By compiling it with the interpretable version of the original compiler we gain a native compiler that is written in IL, and hence executable by the interpreter. This compiler can now be used to create a native compiler capable of running directly on the target machine (Figure 2.1).
Figure 2.1: Bootstrapping a Compiler
UNCOL
In the 1950’s, it became apparent that the ongoing proliferation of programming languages and hardware architectures would continue for some time. This would set off a vast demand for different compilers, as potentially one would require separate compilers for each combination of source language and target architecture.
To counteract the anticipated combinatorial explosion, the idea of a linguistic switchbox materialized in 1958 [SWT58, Con58, Ste60, Ste61a]. It was planned to design a universal intermediate language, into which programs originating in any problem-oriented language could be translated by an appropriate generator (front-end in today’s terminology), and from which code for any processor architecture could be generated by a suitable translator (back-end). This would enable the construction of compilers for m languages to be run on n machines by having to write only m+n programs (m front-ends and n back-ends), instead of m*n (Figure 2.2). The intermediate language was to be called UNCOL, for universal computer-oriented language.
Figure 2.2: The Use of UNCOL as a Linguistic Switchbox
One might of course wonder why such an intermediate language needs to be designed specifically for this purpose. If algorithms can be translated from one program representation into another, then in principle the machine language of any sufficiently powerful computer could serve as UNCOL. Unfortunately, however, most translations between languages are so difficult to do that they are not practical. To wit, potential UNCOL candidates need to be amenable to efficient implementation. They should also provide for further developments in source and target languages. Steel [Ste61b] cites the example of a hypothetical UNCOL specified before the invention of index registers, which would be very cumbersome to use with programs operating on arrays.
To this date, no proposed UNCOL has been met with universal agreement. However, many compiler families have offered varying combinations of front-ends and back-ends by way of common intermediate representations, and thereby kept the spirit of UNCOL alive. Moreover, certain programming languages have been implemented on so many different architectures that they have attained an almost UNCOL-like status. For instance, the programming language C [KR78] is nearly pervasive. Atkinson et al. [ADH89] describe a Cedar compiler that achieves portability by performing high-level source-to-source translation, producing C as its output. The resulting C programs are then passed directly to a C compiler, without further human editing. They serve only as an intermediate representation in this context.
Abstract Machines
A straightforward method for obtaining an intermediate-language representation of a program is to represent it as an instruction sequence for some fictitious computer, also called an abstract machine [PW69, NPW72, KKM80]. Most intermediate languages, including the very first UNCOL ever proposed [Con58], follow this pattern. Besides abstract machines, intermediate languages have also been based on linearized parse-trees [GF84, DRA93a]. Furthermore, there are compilers that compile via ordinary high-level programming languages by way of source-to-source program translation as mentioned above [ADH89].
Abstract machines reached their heyday in the early 1970’s. They were widely popular for a decade, providing software portability to a variety of target architectures while consuming only moderate resources. There were even architectures that started out as abstract machines and, because of their popularity, were subsequently realized physically [WD79] or implemented in microcode [Hal82]. Some of the better known abstract machines from that era are the following:
• BCPL. BCPL [Ric69] is a small, block-structured programming language aimed primarily at systems-programming applications. It has only one data type, the Rvalue, to which the language attaches different interpretations depending on the operation applied. Each Rvalue occupies one unit of store on an idealized object computer that is visible at the level of the source language. The implementation of BCPL [Ric71, RW79] is based on an abstract stack machine. BCPL’s type-less semantic model of an idealized computer allows for portable address arithmetic in source programs and target-machine-independent stack management in the compiler.
• SNOBOL4. SNOBOL4 is a member of the SNOBOL family of character-string-processing languages [Gri78]. Its original implementation [Gri72] uses a register-less abstract machine that offers only memory-to-memory instructions. The basic data unit of this abstract machine is called a descriptor. All of SNOBOL4’s data types are encoded within such descriptors. However, the internal layout of descriptors varies between different target architectures, allowing for an efficient implementation of the interpreter on different kinds of hardware, but requiring changes in the SNOBOL4 compiler whenever the system is ported.
• Pascal-P. The general-purpose programming language Pascal [Wir71] is still widely used today. One of its early implementations, Pascal-P [NAJ76], is founded on a hypothetical stack computer similar to the abstract machine used in the implementation of BCPL. Unlike BCPL, however, Pascal differentiates between various data types. In the Pascal-P implementation, the abstract-machine representations of these types are adapted to each individual target architecture, with respect to their storage requirements, and to their alignment. As in the implementation of SNOBOL4, this target-architecture dependence prevents direct portability of the intermediate language and requires parametrization of the compiler, but results in efficient storage allocation and execution of the interpreter.
• Janus. Unlike the previous three examples, each of which uses an abstract machine to implement a single source language on a variety of target architectures, Janus [CPW74, HW78] comes closer to the original idea of UNCOL by defining the essentials of an intermediate language independently of any source language or target architecture. Rather than providing a single abstract machine, Janus describes the common basic structure (stack machine with index register) of a whole family of abstract machines, which differ in their instruction sets to accommodate the specific constructs of source languages. Janus has been used successfully in the development of compilers for the programming languages Pascal [Wir71] and Algol-68 [WMP69].
The advent of the personal computer eventually caused the demise of the abstract machine, since it led to a de-facto standardization of the machine language on the low end of the computer performance spectrum while simultaneously making these computers affordable. In the light of a quasi-standard machine language, however, there was no longer a need to use intermediate languages to achieve software portability, even more so as the interpreted execution of abstract machines was associated with inefficiency.
This picture is changing again only now. Although the standardization of hardware has continued over the past two decades, leaving only about a dozen major architectures to which software needs to be ported, the appeal of traditional binary compatibility is waning. This is because different implementations of the same architecture begin to diverge by so much that it is becoming more and more difficult to generate native code that performs well on all processors within a family. As will be shown in a later chapter, the work presented in this thesis offers an alternative, by allowing code to be generated quickly at the time of loading. It is thereby possible to deliver optimized instruction sequences to individual processors while still maintaining a user-convenience comparable to that of binary compatibility.
Disadvantages of Abstract Machines
Abstract machines are often defined by forming a conceptual intersection of potential target-machine architectures. Hence, intermediate languages based on abstract machines are by their very nature closer to machine languages than to high-level programming languages. Unfortunately, this has the undesirable consequence that the abstract-machine representation of an algorithm contains less structure than the corresponding source program. Most abstract machines can describe only the algorithmic aspects of a computation, and cannot correctly preserve all of the higher-order information expressed in high-level-language programs, such as block structure and data typing.
For example, consider the following source fragment in the programming language Oberon [Wir88]:
VAR ch: CHAR; lint: LONGINT;
BEGIN
  lint := LONG(ORD(ch));
A possible representation as an instruction sequence for an abstract stack computer might be the following:
LOAD1  ch      move single-byte CHAR value onto stack
ORD            zero-extend, result is an INTEGER
LONG           sign-extend, result is a LONGINT
STORE4 lint    move four-byte LONGINT value back to memory
However, this sequence of operations can be performed by a single instruction on certain processors, such as those of the National Semiconductor 32000 family [NS84]:
MOVZBD ch, lint move, zero-extending byte to double-word
Hence, it may be necessary to combine several abstract-machine instructions into one target-machine instruction when generating anything but the most naive code out of an abstract-machine intermediate representation. In these cases, the code generator effectively needs to reconstruct, at considerable expense, information that was more easily accessible in the front-end, but lost in the transition to the intermediate representation.
The only way to solve this particular problem is by bridging the semantic gap through the introduction, on the level of the abstract machine, of a separate move instruction for every combination of source and destination data type. Still, this is able to rectify only those cases that involve a direct data-format conversion. As target architectures may potentially offer arbitrarily complex operations, e.g., combined move-and-shift operations, the merger of several abstract-machine instructions into one single target-machine instruction can never be ruled out completely, no matter how elaborate the abstract machine’s design.
Because of their systematic defects, intermediate languages based on abstract machines are not well suited as a basis for the fast generation of high-quality native code. The remainder of this chapter introduces an intermediate representation that is. This representation, which I call semantic-dictionary encoding (SDE), preserves the full semantic context of source programs while being highly compact.
An Overview of Semantic-Dictionary Encoding
SDE is a dense representation. It encodes a syntactically correct source program by a succession of indices into a semantic dictionary, which in turn contains the information necessary for generating native code. The dictionary itself is not part of the SDE representation, but is constructed dynamically during the translation of a source program to SDE form, and reconstructed before (or during) the decoding process. This method bears some resemblance to commonly used data compression schemes [Wel84].
With the exception of wholly constant expressions, which are evaluated during the transformation of a source program to SDE form, SDE preserves all of the information that is available at the level of the source language. Hence, unlike abstract-machine representations, transformation to SDE preserves the block structure of programs, as well as the type of every expression. Moreover, when used as the input for code-generation, SDE in certain cases provides for short-cuts that can increase translation efficiency.
A program in SDE form consists of a symbol table (in a compact format, as explained in [Fra93b]) and a series of dictionary indices. The symbol table describes the names and the internal structure of various entities that are referenced within the program, such as variables, procedures, and data types. It is used in an initialization phase, in the course of which several initial entries are placed into the semantic dictionary. The encoding of a program’s actions consists of references to these initial dictionary entries, as well as to other entries added later in the encoding process.
Dictionary entries represent nodes in a directed acyclic graph that describes the semantic actions of a program abstractly. In its most elementary form, a semantic dictionary is simply such an abstract syntax-tree in tabular shape, in which the references between nodes have been replaced by table indices. Each dictionary entry consists of a class attribute denoting the semantic action that the entry stands for (e.g., assignment, addition, etc.), and possibly some references to objects in the symbol table (connected via an info attribute in the following diagram) and to other dictionary entries (described by the links attribute). Figure 2.3 gives an example of a simple arithmetic expression represented as an abstract syntax tree and as a semantic dictionary.
Figure 2.3: Different Representations of an Expression
What differentiates a tabular abstract syntax-tree from a semantic dictionary is that the latter can describe also generic characteristics of potential nodes that might appear in such a tree. In addition to complete entries that directly correspond to nodes of an abstract syntax tree, semantic dictionaries contain also generic, or template entries. These templates have the same structure as complete entries, but at least one of their attributes is missing, as recorded in a status flag. In SDE, complex expressions can be represented not only by complete entries, but also by templates in combination with other entries. For example, the expression “a + b” can be represented by the index of an “addition” template followed by the indices of two variable-reference entries.
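To make these notions concrete, the following sketch models a semantic dictionary in Oberon. It merely illustrates the class, info and links attributes and the distinction between complete entries and templates; the type and procedure names, the fixed table size, and the way completeness is passed in are assumptions of the sketch, not the data structures of the actual implementation.

MODULE SemDict;  (* sketch of a semantic dictionary; names and sizes are illustrative only *)

  CONST
    MaxLinks = 3;                        (* maximal number of descendants per entry *)
    Undef = -1;                          (* marks a missing link of a template *)
    Assign = 1; Plus = 2; VarRef = 3;    (* a few illustrative class attributes *)

  TYPE
    Object = POINTER TO ObjectDesc;      (* stands for a symbol-table object *)
    ObjectDesc = RECORD id: INTEGER END;

    EntryDesc = RECORD
      class: INTEGER;                    (* semantic action, e.g. Assign, Plus *)
      info: Object;                      (* reference into the symbol table, if any *)
      links: ARRAY MaxLinks OF INTEGER;  (* indices of descendant entries, Undef if missing *)
      complete: BOOLEAN                  (* FALSE for templates *)
    END;

  VAR
    dict: ARRAY 4096 OF EntryDesc;       (* the dictionary itself, addressed by index *)
    top: INTEGER;                        (* next free index *)
    a, b: Object;
    va, vb, plusT, sum: INTEGER;

  PROCEDURE Add(class: INTEGER; info: Object; l0, l1: INTEGER; complete: BOOLEAN): INTEGER;
  BEGIN
    dict[top].class := class; dict[top].info := info;
    dict[top].links[0] := l0; dict[top].links[1] := l1; dict[top].links[2] := Undef;
    dict[top].complete := complete;
    INC(top);
    RETURN top - 1
  END Add;

BEGIN
  top := 0;
  NEW(a); a.id := 1; NEW(b); b.id := 2;
  va := Add(VarRef, a, Undef, Undef, TRUE);      (* complete entry: reference to variable a *)
  vb := Add(VarRef, b, Undef, Undef, TRUE);      (* complete entry: reference to variable b *)
  plusT := Add(Plus, NIL, Undef, Undef, FALSE);  (* template ". + ." *)
  (* "a + b" is encoded as the three indices plusT, va, vb; after encoding,
     a complete entry for the whole expression may be added for later reuse: *)
  sum := Add(Plus, NIL, va, vb, TRUE)
END SemDict.

After the final Add, a second occurrence of “a + b” could be represented by the single index sum, which is exactly the effect exploited in the following paragraphs.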
Now suppose that a template existed in the semantic dictionary for every construct of the source language (assignment, addition, etc.), and that furthermore the semantic dictionary were initialized in such a way that it contained at least one entry (possibly a template) for every potential use of every object in the symbol table. For example, an object describing a procedure in the programming language Oberon [Wir88] requires a minimum of four entries in the dictionary, relating in turn to a call of the procedure, entry of the procedure, return from the procedure, and addressing the procedure, which is used in the assignment of the procedure to procedure variables. There are no other operations involving procedures in Oberon.
These preconditions can be fulfilled by initializing the semantic dictionary in a suitable way. They enable us to represent any program by only a symbol table and a succession of dictionary indices. As an example, consider the following module M in the programming language Oberon:
MODULE M;
  VAR i, j, k: INTEGER;
  PROCEDURE P(x: INTEGER): INTEGER;
  BEGIN … END P;
BEGIN
  i := P(i); i := i + j; j := i + k; k := i + j; i := i + j
END M.
In order to encode this program by the method of SDE, we first need to initialize the semantic dictionary, which depends on the operations offered by the programming language (Oberon in this case) and on the objects in the symbol table. The symbol table for the example program contains three integer variables (i, j, and k) and a function procedure (P). After initialization, the corresponding semantic dictionary might look like the following (the individual entries’ indices are represented by symbolic names so that they can be referenced further along in this chapter, and missing attributes of templates are denoted by a dot).
The instruction sequence that constitutes the body of module M may then be represented by the following sequence of 24 dictionary indices:
asgn vi callp vi        i := P(i)
asgn vi plus vi vj      i := i + j
asgn vj plus vi vk      j := i + k
asgn vk plus vi vj      k := i + j
asgn vi plus vi vj      i := i + j
This is where the second major idea of SDE comes in. What if we were to keep on adding entries to the dictionary during the encoding process, based on the expressions being encoded, in the hope that a similar expression would occur again later in the encoding process? For example, once we have encoded the assignment “i := P(i)” we might add the following three entries to the dictionary:
Thereafter, if the same assignment “i := P(i)” occurs again in the source text, we can represent it by a single dictionary index. If another assignment of a different expression to the variable i is encountered, it may be represented using the template “assign to i”, resulting in a shorter encoding.
Encoding module M in this manner results in the following additional entries being placed in the dictionary (assuming that, after initialization, the dictionary comprised n-1 entries):
The body of module M can then be encoded as follows:
asgn vi callp vi        i := P(i)
n+1 plus vi vj          i := i + j
asgn vj n+3 vk          j := i + k
asgn vk n+5             k := i + j
n+6                     i := i + j
Instead of the previous 24 dictionary indices, this encoding requires only 16 of them, although the individual indices themselves may be larger since the dictionary has more entries. Still, the second encoding is usually much more space-economical and has further advantages, as will be shown in the following paragraph.
Increasing the Speed of Code Generation
The decoding of a program in SDE form is similar to the encoding operation. At first, the dictionary is initialized to a state identical to that at the onset of encoding. Since the symbol table is part of the SDE file representation, the decoder has all required information available to perform this task.
Thereafter, the decoder repeatedly reads dictionary indices from the file, looking up each corresponding entry in the semantic dictionary. Whenever a complete entry is found in this manner, its meaning is encoded directly in the dictionary and the decoder can proceed to process the next index on the input stream. If a template is retrieved instead, it is copied, further entries corresponding to its undefined attributes are input in turn, and the modified copy is added to the dictionary as a new complete entry. Moreover, additional templates are sometimes added to the semantic dictionary according to some fixed heuristics, in the hope that a corresponding branch of the program’s syntax tree will show up further along. A more detailed discussion of these heuristics follows in the next chapter.
In addition to providing a dense program representation, SDE is able to supply certain information that is not explicitly available in source text, namely about multiple textual occurrences of identical subexpressions (including, incidentally, common designators). This can sometimes be exploited during code generation, resulting in an increase in the speed of code output.
Consider again the (second, more compact) SDE-representation of module M above. Now suppose that we want to generate object code for a simple stack machine directly from this SDE form. Let us assume that we have processed the first two statements of M’s body already, yielding the following instruction sequence starting at address a:
a    LOAD i     load i onto stack
a+1  BRANCH P   call procedure P with argument i
a+2  STORE i    assign result to i
a+3  LOAD i     load i
a+4  LOAD j     load j
a+5  ADD        add i to j
a+6  STORE i    assign their sum to i
Let us further assume that we have kept a note in the decoder’s semantic dictionary, describing which of the generated instructions correspond to what dictionary entry. For example, we might simply have recorded the program counter value twice for every entry in the semantic dictionary, both before and after generating code for the entry:
A setup such as this allows us to bypass the usual code generation process under certain circumstances, replacing it with a simple copy operation of instructions already generated. For example, when we encounter the second reference to entry n+6 in the SDE-representation of module M, we know that we have already compiled the corresponding statement “i := i + j” earlier on. In this case, we may simply re-issue the sequence of instructions generated at that earlier time, which can be found in the object code between the addresses a+3 and a+6, as recorded in the dictionary.
Of course, this method cannot be applied in every case and on all kinds of machine. First of all, code-copying is possible only for position invariant instruction sequences. For example, a subexpression that includes a call of a local function by way of a relative branch is not position invariant because the branch distance is different each time that the function is called. It happens that information about position invariance can also be recorded conveniently in the dictionary, as shown in the example above.
Secondly, the instruction sequences obtained by code-copying may not be optimal for more complex processors with many registers and multistage instruction-pipelines. For these machines, it may be necessary to employ specific optimization techniques. However, since SDE preserves all information that is available in the source text, arbitrary optimization levels are possible at the time of code generation, albeit without the shortcuts provided by code-copying. One then simply treats the semantic dictionary as an abstract syntax-tree in tabular form.
The interesting part is that code-copying is beneficial especially on machines that are otherwise slow, i.e., CISC processors with few registers. Consider module M again, in which the subexpression “i := i + j” appears three times. On a simple machine that has only a single accumulator or an expression stack, the identical instruction sequence will have to be used in each occurrence of the subexpression. Code-copying can accelerate the code-generation process in this case. The optimal solution for a modern RISC processor, on the other hand, might well consist of three distinct instruction sequences for the three instances of the subexpression, due to the use of register variables and the effects of instruction pipelining. However, such processors will generally be much faster, counterbalancing the increased code-generation effort, so that an acceptable speed of code generation can still be achieved even if optimization is necessary.
The SDE Encoder and Decoder
The previous chapter has introduced the technique of semantic-dictionary encoding (SDE). On the basis of this encoding technique, a system has been implemented, in which the code generator is no longer part of the compiler, but has been incorporated into the module loader. The implemented system features an Oberon-to-SDE compiler that transforms programs written in the programming language Oberon [Wir88] into the SDE file-representation, and a code-generating loader that reads SDE-files and uses the information contained in them for generating code on-the-fly for Motorola 68020 processors [Mot87].
This chapter discusses the encoding and decoding mechanisms and relates the motivation behind key design decisions. Most importantly, it explains some of the heuristics used in the management of semantic dictionaries.
Making use of Scoping
Many programming languages limit the scope of certain objects appearing within a program, for example by restricting the visibility of a procedure’s local variables to the extent of the procedure body. These scoping rules can be exploited for reducing the size of the semantic dictionary. Smallness of the semantic dictionary is beneficial because a program in SDE form is composed of dictionary indices. The larger the dictionary, the more bits are required on average for the representation of these indices.
It is therefore desirable to continuously remove from the semantic dictionary all entries which (directly or indirectly) relate to objects that have become inaccessible by the scoping rules. However, the encoder and the decoder of the SDE format need to synchronize their dictionary operations. Any removal of an entry from the encoder’s dictionary must, therefore, be communicated to the decoder, and this can happen only via the encoded program itself.
We also note that there are objects that are visible throughout a module, such as constants and global variables. Furthermore, the dictionary contains also the operation templates that are used for encoding the semantic actions of a program. These must never be removed from the dictionary.
An easy and efficient solution to dictionary management, taking into account these differences between temporary and permanent entries, is to let the dictionary grow in two directions simultaneously, as illustrated in Figure 3.1. Entries with unlimited scope are added at one end of the dictionary, which grows unboundedly. All other nodes are added at the opposite end of the dictionary, which is managed as a stack. Every time a new scope is opened, the dictionary’s current extent in this second direction is marked. It is reset to the same point when the scope is closed again. The decoder of the SDE format can keep track of these mark and restore operations by monitoring its input stream for the occurrence of dictionary indices denoting procedure entry and return.
Figure 3.1: Two Directions of Dictionary Growth
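The index management behind this scheme can be sketched as follows. The sketch only tracks the two growth directions and the scope marks; the array bounds, the counter names, and the separate mark stack are assumptions, and the entries themselves are omitted.

MODULE TwoEnded;  (* sketch: a dictionary growing in two directions *)

  CONST Size = 4096; MaxDepth = 32;

  VAR
    permTop: INTEGER;                  (* next free slot at the permanent end *)
    scopeTop: INTEGER;                 (* next free slot at the scoped end, growing downward *)
    marks: ARRAY MaxDepth OF INTEGER;  (* saved extents of the scoped end *)
    depth: INTEGER;

  PROCEDURE AddPermanent(): INTEGER;   (* global objects and operation templates *)
  BEGIN
    INC(permTop);
    RETURN permTop - 1
  END AddPermanent;

  PROCEDURE AddScoped(): INTEGER;      (* entries tied to the currently open procedure scope *)
  BEGIN
    DEC(scopeTop);
    RETURN scopeTop + 1
  END AddScoped;

  PROCEDURE EnterScope;                (* triggered by an index denoting procedure entry *)
  BEGIN
    marks[depth] := scopeTop; INC(depth)
  END EnterScope;

  PROCEDURE ExitScope;                 (* triggered by an index denoting procedure return *)
  BEGIN
    DEC(depth); scopeTop := marks[depth]
  END ExitScope;

BEGIN
  permTop := 0; scopeTop := Size - 1; depth := 0
END TwoEnded.

Since the decoder performs the same mark and restore operations when it sees the indices denoting procedure entry and return, both sides stay synchronized without any extra information in the SDE-file.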
Representing Indices on the File
In the implemented system, dictionary indices are represented on SDE-files in a variable-length data format based on a stop-bit, as has been described in [Fra93b]. This variable-length data format requires less space for encoding small absolute values than it needs for encoding large absolute values.
Such a compact format is ideally suited for representing dictionary indices. However, the information density of the resulting encoding depends significantly on the magnitudes of the individual indices and, therefore, on the distance of entries from the origin of the dictionary. For a compact encoding, it is essential to map the index vector onto the dictionary in such a way that often-occurring dictionary entries come to lie within the range that can be addressed by a minimum number of bits.
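One possible realization of such a format is sketched below: seven data bits per byte, with the most significant bit acting as the stop bit that marks the final byte. The exact convention used in [Fra93b] may differ, so the bit layout, the procedure names, and the use of a character buffer are assumptions of the sketch.

MODULE StopBit;  (* sketch of a variable-length integer format based on a stop bit *)

  (* append the non-negative index x to buf at position pos (which is advanced);
     continuation bytes carry the stop bit cleared, the final byte carries it set *)
  PROCEDURE WriteIndex(VAR buf: ARRAY OF CHAR; VAR pos: INTEGER; x: LONGINT);
  BEGIN
    WHILE x >= 80H DO
      buf[pos] := CHR(SHORT(x MOD 80H)); INC(pos);
      x := x DIV 80H
    END;
    buf[pos] := CHR(SHORT(x) + 80H); INC(pos)
  END WriteIndex;

  (* read one index back, advancing pos past all of its bytes *)
  PROCEDURE ReadIndex(VAR buf: ARRAY OF CHAR; VAR pos: INTEGER): LONGINT;
    VAR x, factor: LONGINT; b: INTEGER;
  BEGIN
    x := 0; factor := 1;
    b := ORD(buf[pos]); INC(pos);
    WHILE b < 80H DO
      x := x + factor * b; factor := factor * 80H;
      b := ORD(buf[pos]); INC(pos)
    END;
    RETURN x + factor * (b - 80H)
  END ReadIndex;

END StopBit.

With a layout such as this, indices below 128 occupy a single byte, which is precisely the property exploited by the addressing scheme described next.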
Experiments have revealed that the most common dictionary references in SDE-encoded programs relate to templates describing the primitive operations of the source language, and to expressions involving local variables. A dictionary addressing scheme as indicated in Figure 3.2 has, therefore, been adopted. Since most procedures start at lexical level zero and are relatively short (i.e., they generate few new dictionary entries), a high proportion of the dictionary indices appearing in typical SDE representations will then actually fall within the range of indices that can be represented space-efficiently.
Figure 3.2: Relative Position of the Short Dictionary Addressing Range
Encoding Oberon Source-Programs into SDE
The Oberon-to-SDE compiler incorporates a fairly conventional compiler front-end [Cre91]. It parses each Oberon source text by recursive descent and builds a symbol table as well as an abstract syntax tree, i.e., a directed acyclic graph that describes the semantic actions of the program. If no error is detected during parsing, control passes to the encoder, which creates an SDE-file representing the contents of the source program in a compact and machine-independent manner.
In doing so, the encoder traverses the abstract syntax tree. For each node reached during this traversal, it searches the current semantic dictionary for an entry with a high conformance to the node. Conformance describes how well a dictionary entry matches a node of the tree. An entry is fully conforming, or isomorphic, to a node of the abstract syntax tree, if both of them describe the same semantic action, and if they both either have no descendants at all, or if all of their corresponding descendants are isomorphic to each other. For example, in Figure 3.3, the dictionary entry with index n+2 is isomorphic to the node labelled “t”, and the entry with index n is isomorphic to node “u”.
Figure 3.3: Dictionary Entry “n+2” is Isomorphic to Node “t”.
Only complete entries (i.e., entries with no undefined descendants) can conform fully to syntax-tree nodes. Conformance for templates is defined in a more modest manner. A template conforms to a node of the abstract syntax-tree, if it describes the same semantic action and all of its defined descendants are isomorphic to the corresponding descendants of the node. For example, in Figure 3.4 below, the templates with indices x, x+1, and x+2 all conform to the node labelled “t”.
The search for a conforming dictionary entry always succeeds, because through the initialization of the dictionary we guarantee that there is at least one minimally conforming template in the dictionary for every possible node in the abstract syntax tree. After finding a conforming dictionary entry, the encoder writes its index to the SDE-file and then traverses recursively all of the sub-trees of the abstract syntax-tree that correspond to undefined descendants of the conforming entry. In cases in which there are several alternative entries that all conform to a node in the abstract syntax tree, as in the example of Figure 3.4, the encoder can maximize the information density of the SDE-file by always selecting the entry with the largest number of defined descendants.
Figure 3.4: Dictionary Entries “x”, “x+1”, and “x+2” all Conform to Node “t”.
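The two notions of conformance can be spelled out as a recursive comparison over the tabular tree. The following sketch does so under several assumptions: the entry and node layouts are simplified, symbol-table references are reduced to an integer field, and the procedure names are invented; the real encoder additionally groups entries by their class attribute so that only a few candidates need to be inspected per node.

MODULE Conform;  (* sketch of the conformance test between dictionary entries and tree nodes *)

  CONST MaxLinks = 3; Undef = -1;

  TYPE
    Node = POINTER TO NodeDesc;
    NodeDesc = RECORD
      class: INTEGER;                   (* semantic action *)
      obj: INTEGER;                     (* stands for a symbol-table reference *)
      sub: ARRAY MaxLinks OF Node       (* descendants, NIL if absent *)
    END;

    EntryDesc = RECORD
      class, obj: INTEGER;
      links: ARRAY MaxLinks OF INTEGER  (* dictionary indices of descendants, Undef if missing *)
    END;

  VAR dict: ARRAY 4096 OF EntryDesc;

  (* full conformance: same action and object, and all descendants pairwise isomorphic *)
  PROCEDURE Isomorphic(e: INTEGER; n: Node): BOOLEAN;
    VAR i: INTEGER; ok: BOOLEAN;
  BEGIN
    ok := (dict[e].class = n.class) & (dict[e].obj = n.obj);
    i := 0;
    WHILE ok & (i < MaxLinks) DO
      IF (dict[e].links[i] = Undef) # (n.sub[i] = NIL) THEN ok := FALSE
      ELSIF n.sub[i] # NIL THEN ok := Isomorphic(dict[e].links[i], n.sub[i])
      END;
      INC(i)
    END;
    RETURN ok
  END Isomorphic;

  (* template conformance: only the defined descendants must be isomorphic;
     the missing ones will be supplied by further indices in the encoding *)
  PROCEDURE Conforms(e: INTEGER; n: Node): BOOLEAN;
    VAR i: INTEGER; ok: BOOLEAN;
  BEGIN
    ok := (dict[e].class = n.class);
    i := 0;
    WHILE ok & (i < MaxLinks) DO
      IF dict[e].links[i] # Undef THEN
        ok := (n.sub[i] # NIL) & Isomorphic(dict[e].links[i], n.sub[i])
      END;
      INC(i)
    END;
    RETURN ok
  END Conforms;

END Conform.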
The representation used in SDE-files is heavily biased towards the ease of decoding. The process of encoding requires many searches in the dictionary and is, therefore, inherently less efficient than decoding. In principle, it would be quite possible to perform all of these required searches by traversing the whole dictionary linearly each time until an adequate entry had been found. However, this would make the semantic-dictionary encoding of large programs impractical.
Instead of searching the whole dictionary over and over, the current implementation uses an overlay sorting structure to reduce the number of dictionary entries that need to be inspected. Dictionary entries are grouped by their semantic action in this sorting structure, and all entries related to a certain procedure-scope are removed as soon as that scope has been closed.
Note that for correctness of the algorithm, it would suffice to reset the dictionary counter at the end of each scope, while the entries themselves would not have to be removed from the dictionary. Due to the nature of the scoping rules, it would be impossible for any dictionary entry ever to be returned as the result of a search after the corresponding scope had been closed.
Heuristics of Dictionary Management
There are several time and space trade-offs to be considered in the algorithm that manages the semantic dictionary. This becomes most important when considering the strategy by which new templates are added. In general, adding a greater number of templates leads to a denser program representation on SDE-files, but it also inflates the size of the dictionary and thereby the memory requirements of the encoding and decoding steps. It may also increase the time of decoding, due to the overhead of having to allocate a larger dictionary and having to perform more initializations. The following discussion should exemplify the sort of questions that need to be answered in the design of dictionary-management heuristics. As before, an expression syntax is used, in which a dot stands for a missing link of a template entry.
Consider the expression “a + b”. Since addition is a fundamental operation of the Oberon language, a template for the generic expression “. + .” is guaranteed to exist in every Oberon-specific semantic dictionary after its initialization. Similarly, if some variables “a” and “b” appear in the symbol table of a certain program, then the corresponding semantic dictionary will be initialized automatically to contain appropriate references. The expression “a + b” occurring at the very beginning of a program will, therefore, be translated into a sequence of three dictionary indices, standing for “. + .”, “a”, and “b”.
At this point, the designer of the dictionary-management heuristics has a choice of which of the related entries “. + b”, “a + .”, and “a + b” should be added to the dictionary. Clearly, the strategy that yields the most compact SDE-files, at the expense of table space, is to add all three variants. For the moment, let us assume that all three variants were added. Now consider what happens if, further down in the same program, the expression “a + x” is encountered. This can be encoded based on the entry “a + .” already in the semantic dictionary. But then what? Again there is a choice of further entries that could be added to the dictionary. Clearly, adding “a + x” is beneficial, but what about “. + x”?
The problem lies in the fact that “. + x” may very well be in the semantic dictionary already, for example, resulting from a previous encoding of “y + x”. However, determining whether an entry is in the dictionary requires a search, which is expensive and, therefore, not acceptable at the time of decoding. Hence, there is a choice of either leaving the entry in question out of the dictionary altogether, or blindly entering it anyway, possibly for a second time and at a waste of table space.
The question of which strategy is better is still very much open. Different heuristics of template-generation have been experimented with, leading to the observation that the effect varies with the style in which the original source program has been written, depending, among other things, on the number of common subexpressions that have been factored out by the programmer explicitly. A possible solution would of course be to try several variants during the encoding process, finally using the strategy best suited for each particular source program. The corresponding heuristic to be applied during decoding could then be identified by a tag in the SDE-file. To keep it simple, the current implementation uses a straightforward strategy generating relatively few templates.
Factorization of Procedure Calls
When we examine existing Oberon programs, we observe that the parameter lists of procedures are usually arranged in such a way that parameters become “more variable” to the right side of a parameter list. For example, the procedure
WriteBytes(VAR r: Files.Rider; VAR x: ARRAY OF CHAR; n: LONGINT)
is likely to be called several times for any fixed rider r with varying further parameters. Moreover, the actual parameter substituted for n is still more likely to vary than the one assigned to the formal parameter x.
This arrangement is partly due to the extensive use of abstract data types, which naturally appear as the first parameters of the procedures operating on them. However, there also seems to exist a deeper, unwritten, but generally observed rule among programmers to structure parameter lists in the way mentioned. The dictionary-management heuristics used in the encoding of procedure calls has been based on the hypothesis that it is natural to place “less variable” parameters before “more variable” ones.
Consequently, the implemented system supports the factorization of common parameter lists that partially match from the left side. For example, after encoding the procedure call “P(x, y, z)”, a corresponding complete entry is added to the semantic dictionary, as are the two templates “P(x, ., .)” and “P(x, y, .)”. A subsequent procedure call “P(x, y, a)” can then be represented space-efficiently by only two dictionary indices, standing for “P(x, y, .)” and “a”.
This strategy represents a trade-off between information density and size of the dictionary, while it is amenable to efficient implementation. An adequate mapping onto a semantic-dictionary representation can be found quite simply by viewing each procedure call as a binary tree of partial evaluations, as shown in Figure 3.5.
Figure 3.5: Tree View of a Procedure Call.
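A sketch of this left-aligned factorization is given below. After each encoded call it registers one partial entry per parameter prefix, and it can then determine how long a prefix a later call shares with a registered entry. Record layout, fixed limits, and all names are assumptions made for the sketch; the real implementation represents the partial evaluations as ordinary dictionary entries.

MODULE CallFact;  (* sketch: factorizing procedure calls by left-aligned parameter lists *)

  CONST MaxPar = 8; Undef = -1;

  TYPE
    CallDesc = RECORD
      proc: INTEGER;                  (* dictionary index standing for "call of P" *)
      par: ARRAY MaxPar OF INTEGER;   (* indices of parameter entries, Undef if still open *)
      npar: INTEGER                   (* number of parameters already fixed *)
    END;

  VAR
    calls: ARRAY 1024 OF CallDesc;    (* the call-related part of the dictionary *)
    top: INTEGER;

  PROCEDURE AddPartial(proc, npar: INTEGER; VAR act: ARRAY OF INTEGER): INTEGER;
    VAR i: INTEGER;
  BEGIN
    calls[top].proc := proc; calls[top].npar := npar;
    i := 0;
    WHILE i < MaxPar DO
      IF i < npar THEN calls[top].par[i] := act[i] ELSE calls[top].par[i] := Undef END;
      INC(i)
    END;
    INC(top);
    RETURN top - 1
  END AddPartial;

  (* after encoding P(x, y, z): register P(x, ., .), P(x, y, .) and the complete call P(x, y, z) *)
  PROCEDURE RegisterCall(proc, nact: INTEGER; VAR act: ARRAY OF INTEGER);
    VAR k, e: INTEGER;
  BEGIN
    k := 1;
    WHILE k <= nact DO e := AddPartial(proc, k, act); INC(k) END
  END RegisterCall;

  (* number of leading actual parameters of a new call that entry e already fixes;
     0 if the entry does not apply (different procedure or mismatching prefix) *)
  PROCEDURE PrefixLength(e, proc, nact: INTEGER; VAR act: ARRAY OF INTEGER): INTEGER;
    VAR i: INTEGER;
  BEGIN
    i := 0;
    IF calls[e].proc = proc THEN
      WHILE (i < calls[e].npar) & (i < nact) & (calls[e].par[i] = act[i]) DO INC(i) END;
      IF i < calls[e].npar THEN i := 0 END   (* usable only if all fixed parameters match *)
    END;
    RETURN i
  END PrefixLength;

BEGIN
  top := 0
END CallFact.

Under this scheme, the call “P(x, y, a)” from the example above matches the registered prefix of length two, so only the index of that entry and the index of “a” need to be written.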
An alternative strategy would have been to generate even further dictionary entries describing every possible permutation of wild-cards in parameter lists. For example, after encoding the procedure call “P(x, y, z)” above, one might have also included the following entries, in addition to the ones mentioned previously: “P(., y, .)”, “P(., ., z)”, “P(., y, z)”, and “P(x, ., z)”.
Preparing the Dictionary for Decoding
In the implemented system, SDE-files are decoded at the time of loading. This is done by an SDE-decoding code-generating loader that is based on the code generator of the MacOberon system [Fra90a, Fra90b, Fra91, BCF92, Fra93a, Fra93b]. It makes use of the idea of code-copying, as outlined in the previous chapter, for which a preprocessed syntax tree, as it results from semantic-dictionary encoding and in which source-level common subexpressions and common designators have been factorized, offers an ideal basis. The strategy of initializing the semantic dictionary with entries based on the objects occurring in the symbol table is able to further enhance the efficiency of code generation.
Consider an object of the variable category in the symbol table. No further discrimination is usually made in the compiler front-end between different kinds of variables; for example, with respect to whether they are global to a module or local to a procedure. This distinction is left to the code generator, which for each variable reference has to determine what particular kind of variable it is handling, i.e., which addressing mode is to be employed.
Within an SDE-file, however, all references to variables are based on entries that are placed into the dictionary in an initialization step. The important insight is that the meanings of the individual entries in the decoder’s dictionary need not directly match those of the corresponding entries used in the encoding step, as illustrated in Figure 3.6. Hence, it is possible to simplify the task of code generation by performing a certain amount of preprocessing at the time of dictionary initialization, discriminating more finely between different variants of symbol-table objects on the decoder’s side than is necessary in the encoder.
PROCEDURE P(VAR vp: INTEGER);
  VAR loc: INTEGER;
BEGIN …
Figure 3.6: Different Semantic Dictionaries for Encoding and Decoding.
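A minimal sketch of this kind of preprocessing is shown below: while the encoder knows only a single “variable” class, the decoder maps each variable object onto one of several kind-specific entry classes when it initializes its dictionary, so that the addressing-mode decision never has to be repeated during code generation. The field and constant names are invented for the sketch; the paragraph that follows describes the distinctions actually drawn by the implemented MC68020 loader.

MODULE VarKinds;  (* sketch: discriminating variable kinds during dictionary initialization *)

  CONST
    GlobalVar = 1;     (* absolute addressing *)
    LocalVar = 2;      (* frame-pointer relative addressing *)
    IndirectVar = 3;   (* memory-indirect addressing, e.g. for VAR parameters *)

  TYPE
    Object = POINTER TO ObjectDesc;    (* stands for a symbol-table object *)
    ObjectDesc = RECORD
      level: INTEGER;                  (* 0 for module scope, >0 inside a procedure *)
      isVarPar: BOOLEAN;               (* variable (VAR) parameter? *)
      adr: LONGINT                     (* address or frame offset from the symbol table *)
    END;

    EntryDesc = RECORD class: INTEGER; adr: LONGINT END;

  VAR
    dict: ARRAY 4096 OF EntryDesc;
    top: INTEGER;

  (* executed once per variable object while the decoder's dictionary is being initialized *)
  PROCEDURE InitVarEntry(obj: Object);
  BEGIN
    IF obj.level = 0 THEN dict[top].class := GlobalVar
    ELSIF obj.isVarPar THEN dict[top].class := IndirectVar
    ELSE dict[top].class := LocalVar
    END;
    dict[top].adr := obj.adr;
    INC(top)
  END InitVarEntry;

BEGIN
  top := 0
END VarKinds.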
The implemented code-generating loader for the MC68020 [Mot87] performs some of this preprocessing. For example, it differentiates between global variables (absolute addressing), direct local variables (frame-pointer relative addressing) and indirect local variables (memory indirect addressing, used for variable parameters). Consequently, three different kinds of dictionary entry are used to represent these different kinds of variables, and all information required for address generation is gathered in the semantic dictionary already when it is initialized. Since in typical programs most variables are referenced more than once, this strategy usually accelerates the code-generation process.
Decoding SDE-Files
After all the previous explanations, the design of the SDE-decoding code-generating loader then becomes quite straightforward. First, the global symbol table is read from the SDE-file. Next, the semantic dictionary is initialized as discussed before. Then, dictionary indices are read in succession and processed as is outlined below:
< read an index idx and look up the corresponding entry E := dict[idx] >
IF E.copy_safe THEN
  < duplicate the code between E.beg and E.end at the current PC location >
ELSE
  IF < E is a template > THEN
    < input undefined descendants of E recursively >
    < possibly create further templates heuristically >
    < create a new complete entry E from template and descendants >
  END;
  E.beg := PC;
  < generate code for E, updating the E.copy_safe flag appropriately >
  E.end := PC
END;
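The pseudocode above can be fleshed out roughly as follows. The sketch separates the reading of an entry from the generation of its code, leaves the actual instruction emission abstract, and omits the template heuristics; entry layout, table sizes, and procedure names are all assumptions, and the real loader organizes this loop somewhat differently.

MODULE Decode;  (* sketch of the decoding loop of a code-generating loader *)

  CONST MaxLinks = 3; Undef = -1;

  TYPE
    EntryDesc = RECORD
      class: INTEGER;
      links: ARRAY MaxLinks OF INTEGER;
      complete: BOOLEAN;
      beg, end: INTEGER;     (* code range generated for this entry *)
      copySafe: BOOLEAN      (* range is position invariant and free of pending fix-ups *)
    END;

  VAR
    dict: ARRAY 4096 OF EntryDesc;
    top, pc: INTEGER;        (* next free dictionary slot; program counter *)

  PROCEDURE ReadIndex(): INTEGER;
  BEGIN
    (* read the next stop-bit encoded index from the SDE-file *)
    RETURN 0
  END ReadIndex;

  PROCEDURE CopyCode(beg, end: INTEGER);
  BEGIN
    (* duplicate the instructions between beg and end at the current pc *)
    pc := pc + (end - beg)
  END CopyCode;

  PROCEDURE GenCode(e: INTEGER);
  BEGIN
    (* ordinary code generation: walk entry e as a tabular syntax tree,
       emit instructions, and set dict[e].copySafe accordingly *)
    INC(pc)
  END GenCode;

  (* read one entry from the index stream; a template is completed recursively
     and entered into the dictionary as a new complete entry *)
  PROCEDURE InputEntry(): INTEGER;
    VAR idx, i, res: INTEGER; e: EntryDesc;
  BEGIN
    idx := ReadIndex();
    IF dict[idx].complete THEN
      res := idx
    ELSE
      e := dict[idx];                  (* copy the template *)
      i := 0;
      WHILE i < MaxLinks DO
        IF e.links[i] = Undef THEN e.links[i] := InputEntry() END;
        INC(i)
      END;
      e.complete := TRUE; e.copySafe := FALSE;
      dict[top] := e; res := top; INC(top)
    END;
    RETURN res
  END InputEntry;

  (* process one index of the input stream, as in the pseudocode above *)
  PROCEDURE ProcessOne;
    VAR e: INTEGER;
  BEGIN
    e := InputEntry();
    IF dict[e].copySafe THEN
      CopyCode(dict[e].beg, dict[e].end)
    ELSE
      dict[e].beg := pc; GenCode(e); dict[e].end := pc
    END
  END ProcessOne;

BEGIN
  top := 0; pc := 0
END Decode.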
The copy-safe property needs to be kept track of for every complete dictionary entry. It indicates whether code for the corresponding expression can be produced simply by duplicating an instruction sequence generated earlier. Templates are never copy-safe.
A dictionary entry is copy-safe if there are no pending fix-ups within the resulting code fragment, and if the code is position-independent. In the implemented code-generating loader for the Motorola 68020, these criteria are fulfilled by all full entries with the exception of those describing expressions that contain calls of local functions. These calls are translated into relative branches, which have a different branch distance at every occurrence.
In fact, code-copying was the main reason why it was decided not to translate calls to external procedures into relative branches, which would have been easy since all addresses in external modules are known when generating code on-the-fly. This was the single instance in which speed of execution was traded for speed of code generation. However, the performance of the two call instructions does not differ by much, while external calls occur often in typical modules so that one can benefit from code-copying frequently.
Miscellaneous Implementation Details
There is a multitude of smaller design decisions in any project such as the one described here. Many of these decisions may be rather ad-hoc, while the motivations behind others may long be lost before they can be documented. The following list is by no means complete, but illustrates some of the finer points that are not mentioned elsewhere in this thesis.
• Symbol Table. From what has been said before, the reader might have obtained the impression that the complete symbol table is stored contiguously at the beginning of the SDE-file. This is not entirely correct. In fact, only the global scope is stored at the front of the SDE-file, while the remaining scopes follow the dictionary indices that represent the corresponding procedure entries. This provides for a more efficient storage management in the decoder, allowing a stack mechanism to be used not only for the dictionary, but also for the symbol table.
• Constants. In the currently implemented system, unnamed literal constants are actually not part of the symbol table, but inserted literally into the stream of dictionary indices, following the index of a “constant” template that implicitly indicates the type of the constant. For each constant represented in this way, a complete entry is constructed and added to the global end of the dictionary, so that further occurrences of the same constant can be referenced directly by an index. The constant NIL corresponds to a reserved word of the Oberon language and is placed into the dictionary during its initialization, as are the pre-defined constants TRUE and FALSE.
• Storage Allocation Primitives. In Oberon, calls to the standard procedures NEW and SYSTEM.NEW are usually translated into dedicated supervisor calls, so that only the appropriate trap vector number need be statically known to the compiler, but not the routine that implements it. In the absence of memory protection on the target machine, it makes no sense to utilize the expensive supervisor call mechanism when the actual address of the Kernel routine that implements the supervisor instruction is obvious at the time of code generation. Hence, the current implementation maps these two storage-allocation primitives onto ordinary procedure calls, which not only shortens the instruction sequence, but also accelerates the call.
Modular Design of the Implementation
The previous chapters have introduced the technique of semantic-dictionary encoding and have presented an Oberon-to-SDE compiler and a code-generating loader. Up to this point, however, it has been disregarded completely that the implemented system has a modular structure. This chapter gives an overview of this modular architecture and points out some alternatives to the chosen approach.
Modular Architecture
The implemented system is based on the Oberon System [WG89, WG92], specifically its MacOberon implementation [Fra90a, Fra90b, Fra93a] for Apple Macintosh computers [App85]. The Oberon System supports separate compilation with static interface checking of programs written in the programming language Oberon [Wir88]. It also offers dynamic loading of individual modules, meaning that separately compiled modules can be linked into an executing computing session at any time, provided that their interfaces are consistent with the interfaces of the modules that have been loaded already. In MacOberon, dynamic loading is accomplished by a linking loader, which modifies the code image retrieved from an object file in such a way that all references to other modules are replaced by absolute addresses.
The new implementation adds a second loader to MacOberon, namely a code-generating loader that processes files containing semantic-dictionary-encoded programs (SDE-files). It has been possible to integrate this code-generating loader transparently into the existing MacOberon environment, in the sense that SDE-files and native object-files are completely interchangeable and can, therefore, import from each other arbitrarily. Depending on a tag in the file (first two bytes), either the native linking loader or the code-generating loader is used to set up the module in memory and prepare it for execution.
The code-generating loader has not been incorporated into the core of MacOberon, but constitutes a self-contained module package at the application level. The existing module loader was modified slightly, introducing a procedure variable into which an alternate load procedure can be installed. Once initialized, this procedure variable is up-called whenever an abnormal tag is detected in an object file, and so initiates the passage of control, from the linking loader that is part of the system core, to the code-generating loader external to it. The installation of such an alternate load procedure is considered a privileged operation, analogous to the modification of an interrupt vector.
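The following sketch illustrates the idea of such a procedure-variable hook in a simplified form. Module and procedure names (LoaderHook, Install, Load) and the tag value are hypothetical placeholders; this is not the actual code of the modified module loader, which dispatches on the two-byte tag read from the object file.

  MODULE LoaderHook;  (* hypothetical sketch of the alternate-loader hook *)

    CONST NativeTag = 0;  (* placeholder for the tag of native object-files *)

    TYPE
      LoadProc* = PROCEDURE (VAR name: ARRAY OF CHAR; VAR done: BOOLEAN);

    VAR
      alternate: LoadProc;  (* NIL until an external loader installs itself *)

    PROCEDURE Install* (p: LoadProc);  (* privileged, like setting an interrupt vector *)
    BEGIN alternate := p
    END Install;

    PROCEDURE LoadNative (VAR name: ARRAY OF CHAR; VAR done: BOOLEAN);
    BEGIN done := TRUE  (* stand-in for the native linking loader in the system core *)
    END LoadNative;

    PROCEDURE Load* (VAR name: ARRAY OF CHAR; tag: INTEGER; VAR done: BOOLEAN);
    BEGIN
      IF tag = NativeTag THEN LoadNative(name, done)
      ELSIF alternate # NIL THEN alternate(name, done)  (* e.g. the code-generating loader *)
      ELSE done := FALSE  (* abnormal tag and no alternate loader installed *)
      END
    END Load;

  BEGIN alternate := NIL
  END LoaderHook.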
Besides the obvious advantages during the design and testing phases, this architecture permits us to view machine-independence quite naturally as an enhancement of an existing system. It also hints at the possibility of adding successive external code-generating loaders as time progresses and standards for machine-independent object-file formats emerge. Several such formats could in fact be supported simultaneously, distinguished from one another by different file-tags.
Discussion: On System Architecture
In the implemented system, the code-generating loader is installed by an explicit call to Modules.This. However, it would be entirely possible to issue such a call during the initialization of module Modules itself, leading to a three-stage bootstrap of successively more powerful loading capabilities:
1. Boot Loader
2. Native Linking Loader
3. Code-Generating Loader
The only task of the boot loader is to install the native linking loader and the core operating system routines required by it, such as file-system-access capabilities. The native linking loader in turn sets up the code-generating loader from a pre-compiled object file. All higher modules, even such essential ones as the display system, may then be represented by SDE-files, as long as the algorithms contained in them can be formulated machine-independently.
One immediately senses the possible implications of this. Entirely feasible, and implemented in an experimental system variant, is a mode of operation in which external inputs during the bootstrapping process modify the default behaviour of the system. For example, depressing the key labelled “x” on the keyboard during startup could turn on array subscript checking for all library modules being loaded during this phase, while it would normally be turned off. Likewise, symbolic-debugging information could be generated only when needed, and need not take up memory space in configurations that are used solely for running finished applications.
It is also notable that, after an initial bootstrap, one could have done away with machine-specific object-files altogether by including a code-generating loader in the boot-file and providing a mechanism for generating new
boot-files. In fact, with some simple modifications for supporting the generation of relocation information, the code-generating loader itself could be employed for the generation of new boot-files. The contents of a boot-file are very similar to those of the module area in the system heap at run-time, except that the former also contains a list of references that need to be relocated.
A similar approach has been taken in the Smalltalk-80 system [Gol84], in which there are only target-machine-independent source-files and an image file containing the snapshot of a compiled system state. The image file needs to contain an interpreter or compiler, so that the portable source-files can be used at all. After modifying the run-time environment, a new image file can be generated that incorporates all of the changes applied to the previous image.
Unfortunately, however, systems in which all machine-dependent parts are encapsulated within a boot-file have a drawback, which is that a change in any of the modules contained in the boot-file requires the generation of a whole new boot-file. It was exactly this inflexibility that modular architectures were trying to avoid in the first place. For this reason, the idea has not been pursued any further for the time being.
Execution Frequency Hierarchy Considerations
Programs are usually compiled far less frequently than they are executed. Therefore, it is worthwhile to invest some effort into compilation, because the benefit will be repeated. The same holds for program loading versus execution. While a program is normally loaded more often than it is compiled, the individual machine instructions are executed even more often, many times on average per load operation. Unfortunately, we need to shift workload unfavourably when we employ a loader that performs code generation. This can be justified only if important advantages are gained in return, or if the resulting code quality is increased. After all, it is execution speed that matters ultimately.
From the outset, three desired properties were therefore put forward that a system incorporating a code-generating loader should possess in comparison to a system employing a traditional compiler and linking loader:
1. run-time performance should be at least as good
2. loading time should not be much worse
3. source-encoding time should be tolerable
These requirements could be met. Benchmarks in the next chapter will show that it was not only possible to construct a fast code-generating loader for Motorola 68020 processors [Mot87], but that the native code produced by it is of high quality. Indeed, although its method of code-generation is quite straightforward, the code-generating loader is sometimes able to yield object code that would require a much more sophisticated code-generation strategy in a regular compiler. This is mainly due to the fact that the absolute addresses of all imported objects are known at loading time, which enables the code-generating loader to optimize access to them; for example, by using short displacements instead of long ones.
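The following fragment sketches the kind of decision involved. PutShortDisp and PutLongDisp are hypothetical emitter procedures standing in for actual instruction emission, and the 16-bit range test is only a simplified reflection of the MC68020 displacement formats.

  MODULE DispSketch;  (* hypothetical sketch of displacement selection *)

    VAR shortCount, longCount: LONGINT;  (* counters standing in for actual code emission *)

    PROCEDURE PutShortDisp (d: LONGINT);
    BEGIN INC(shortCount)  (* would emit a 16-bit displacement *)
    END PutShortDisp;

    PROCEDURE PutLongDisp (d: LONGINT);
    BEGIN INC(longCount)   (* would emit a 32-bit displacement *)
    END PutLongDisp;

    PROCEDURE PutAccess* (base, adr: LONGINT);
      VAR d: LONGINT;
    BEGIN
      d := adr - base;  (* both addresses are known at loading time *)
      IF (d >= -32768) & (d <= 32767) THEN PutShortDisp(d) ELSE PutLongDisp(d) END
    END PutAccess;

  BEGIN shortCount := 0; longCount := 0
  END DispSketch.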
The time required by the code-generating loader for creating native code on-the-fly turned out to be quite acceptable, too. Loading with code-generation takes only slightly longer than loading of a native object-file with linking; the difference is barely noticeable in practice. A partial reason for this is that the Oberon system has a modular structure, in which many functions are shared between different application packages. In traditional systems, these common functions would be replicated in many applications and statically linked to each of these applications.
As a consequence of Oberon’s modular design, each new application adds only a small amount of code to an already running system. Hence, at most times during normal operation, the code-generating loader need only process the moderate number of modules that are unique to an application, while other modules that are also required will already be in memory from past activations of other applications.
Interchangeability of Object Files
SDE-files and their native-object-file counterparts are completely interchangeable in the implemented system. This flexibility does not come as readily as it may seem. Not only does it require that the code-generating loader is able to interpret native symbol-files and the corresponding entry-tables in memory, but also that information can be communicated from the new load architecture to the old one, in formats that are backward-compatible with the native compiler and loader.
In the present case, the more difficult part of this problem had an almost trivial solution, since the two varieties of “object” files in the implemented system share a common representation of symbol-table information. As documented in a paper by the author [Fra93b], a portable symbol-file format had been introduced into
MacOberon some time ago, leading to improvements in performance completely unrelated to portability. But now, there were further benefits from the previous investment into machine-independence:
SDE-files use exactly the same symbol-table encoding as do the symbol files of the native compiler. Furthermore, the symbol table is embedded within each SDE-file in such a way that its front-most part would also constitute a valid symbol file (Figure 4.1). Consequently, the native MacOberon compiler cannot distinguish SDE-files from its regular symbol-files. It is able to compile modules that import libraries stored in SDE form, as it can extract the required interface information directly from SDE-files.
Figure 4.1: Embedding of Symbol File within SDE-File
Apart from the native compiler, the native loader must also be able to handle modules that import machine-independent libraries. This is achieved by enabling the code-generating loader to construct on-the-fly not only object code, but also all of the remaining data structures that are usually part of a native object-file, such as entry tables. Once a module has been loaded from an SDE-file, it loses all aspects of machine-independence, and can henceforth be maintained (enumerated, unloaded, etc.) by the regular MacOberon module manager.
Discussion: On Interfacing Without a Common Symbol File Format
Without a common symbol-table encoding, the task of interconnecting machine-independent modules in the SDE format with others in a regular object-file format would have been more difficult. Indeed, since SDE-files are independent of the target architecture, we might want to use them on more than one kind of machine, on which many different native symbol-file formats could be in concurrent use. In the following discussion, the adjective native refers to the formats used on the eventual target machine, while portable denotes the machine-independent SDE format.
Enabling native clients to import portable libraries is quite straightforward, although it may not be so simple to implement. It requires that each local compiler on every target machine can understand the portable symbol-table encoding in addition to its own, and that each code-generating loader can provide entry tables matching this symbol information, in the format required by the corresponding native loader. For some target machines, fulfilling the first requirement may entail a significant amount of programming, because the portable symbol-table encoding contains no machine-dependent information such as type-sizes and offsets. Consequently, local compilers need to be able to calculate these values on-the-fly when reading the symbol table from an SDE-file.
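A minimal sketch of such an on-the-fly calculation is given below. The alignment rule used here (align each field to its own size, round the record size up to a multiple of four) is an assumption chosen purely for illustration; an actual implementation would substitute the conventions of its target machine.

  MODULE LayoutSketch;  (* hypothetical offset and size calculation while reading a portable symbol table *)

    TYPE
      Field = POINTER TO FieldDesc;
      FieldDesc = RECORD
        size: LONGINT;    (* size of the field's type, assumed already computed and positive *)
        offset: LONGINT;  (* to be filled in here *)
        next: Field
      END;

    PROCEDURE Align (x, unit: LONGINT): LONGINT;
    BEGIN RETURN (x + unit - 1) DIV unit * unit
    END Align;

    PROCEDURE LayoutRecord* (first: Field; VAR recSize: LONGINT);
      VAR f: Field; off: LONGINT;
    BEGIN
      off := 0; f := first;
      WHILE f # NIL DO
        off := Align(off, f.size);         (* assumed rule: align each field to its own size *)
        f.offset := off; off := off + f.size;
        f := f.next
      END;
      recSize := Align(off, 4)             (* assumed rule: pad record size to 4 bytes *)
    END LayoutRecord;

  END LayoutSketch.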
The reverse import relationship, however, is even more problematic. Consider a portable module that imports module Files. Implementations of Files are inherently non-portable, so that they must always be compiled by a native compiler. Nevertheless, a portable client module should be linkable to the local Files module of an arbitrary target machine, assuming that the different Files modules all have the same interface. But how do we ensure that all of these interfaces on different machines are indeed compatible with the portable client, and how do we encode references to them in the portable object-file?
A possible solution to this problem, which has actually been implemented in a precursor version of the present system, starts by defining a second, portable interface for each of the native library modules in question. These portable interfaces serve as place-holders for a portable subset of the actual interfaces found on different machines. They define capabilities that are expected to be present on every machine, so that native interfaces are in fact allowed to be a superset of their portable counterparts. Portable symbol-files are then generated from these “place-holder” interfaces, and used whenever portable clients are compiled.
For each target machine, one then constructs a tool called a localizer. The localizer connects each portable “place-holder” interface with the corresponding native one, reading both the native and the portable symbol file. If the native version of the module includes every feature specified in the portable interface, the localizer creates a mapping file that will tell the code-generating loader which local entry number is associated with which feature in the portable interface. This mapping mechanism is illustrated in Figure 4.2. Each mapping file also contains
the keys of both related symbol files, so that the validity of a mapping can be verified by comparing module keys.
Figure 4.2: Localization of an Interface
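The contents of such a mapping file might be summarized by a structure like the one sketched below. The names, the fixed table size, and the representation of keys as LONGINT values are hypothetical.

  MODULE LocalizerSketch;  (* hypothetical summary of a mapping file *)

    CONST MaxEntries = 256;  (* assumed upper bound on entry numbers *)

    TYPE
      Map* = RECORD
        portableKey*, nativeKey*: LONGINT;    (* keys of the two related symbol files *)
        local*: ARRAY MaxEntries OF INTEGER   (* portable entry number -> native entry number *)
      END;

    PROCEDURE Valid* (VAR m: Map; portableKey, nativeKey: LONGINT): BOOLEAN;
    BEGIN  (* a mapping is used only if both module keys still match *)
      RETURN (m.portableKey = portableKey) & (m.nativeKey = nativeKey)
    END Valid;

  END LocalizerSketch.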
Levels of Symbolic Information
As mentioned in [Fra93b], machine-independence of symbol files mandates that the size of each data type (as well as offsets and other machine-dependent data) is calculated only at the time that the symbol file is read, because it may differ from machine to machine. This has consequences when parts of a data structure can be hidden, as is possible in Oberon. Since it is impossible to calculate the total size of the invisible components beforehand, portable symbol-files need to contain a full structural description of every exported feature, including all non-exported parts (the names of the hidden parts are obviously not needed).
This has the undesirable effect that a portable symbol-file (and thereby its key) may change when certain modifications are made in the hidden part of the corresponding module, at least as long as a conventional combination of compiler, linker, and loader is used. When employing a code-generating loader instead, this situation changes. Upon closer analysis of a symbol file’s information content, we find that ordinary compilers require more information than code-generating loaders:
• Code-generating loaders need only know about exported names and structure. They never require any addresses, offsets, or sizes, nor do they need information about invisible parts of an exported object. This is because all inter-module references are handled strictly symbolically and are not resolved until the loading phase, which includes code generation.
• A compiler generating native code requires the same symbolic information, plus knowledge about all invisible fields and structures to which references exist from exported types. This is because the native compiler has to calculate offsets, on which the addressing modes of some instructions may depend; certain offsets may also appear literally in the object code.
It would therefore in principle be possible to use a two-tier structure for storing the symbol table, in which the second layer reveals some finer-grained details of the implementation. This mechanism has not been implemented because it adds complexity while applying only to a side-issue of the dissertation project, but the following elaboration on the idea of two-tier symbol-files suggests a possible evolution of the current work. Consider the following structure of a symbol file (a small example follows the outline):
Interface Part
  re-exported modules
  exported constants
  exported types (not mentioning invisible fields)
  exported global variables
  exported procedures
Offset Calculation Part
  more imported modules (containing the types of invisible fields)
  invisible types of invisible fields
  invisible fields (of types mentioned in the Interface Part)
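To make the two parts concrete, consider the hypothetical declaration below. Under the proposed scheme, the interface part of the symbol file would describe T and its exported field x only structurally, while the hidden field h and its type would be recorded solely in the offset calculation part.

  MODULE TwoTierSketch;  (* hypothetical example of an interface with a hidden part *)

    TYPE
      T* = RECORD
        x*: LONGINT;          (* exported: described in the interface part *)
        h: ARRAY 64 OF CHAR   (* hidden: needed only for offset and size calculation *)
      END;

  END TwoTierSketch.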
With this organization, all client modules that are represented by SDE-files are impervious to changes in the hidden part of a library module, because they depend only on the interface part of the library’s symbol file. Whenever a module is compiled under such a scheme, both parts of the symbol file are updated, regardless of the type of “object” file generated. However, changes in the offset calculation part can affect only those clients that have been compiled into directly executable object code.
In systems supporting two-level symbol-files, it is therefore advantageous to use as few native object-files as possible, in order to reduce intermodule dependencies. Moreover, the penalty for using native object-files is greater at a higher level in the hierarchy than at a low level, because interface invalidations propagate upwards. It has always been good software engineering practice to isolate machine dependencies in low-level modules and strive for portability higher up in the module hierarchy. Having a technical argument in their favour might further increase the acceptance of these rules.
The interface part of the symbol file may even be changed to include per-feature keys instead of a single key for the whole interface. This would further reduce the interdependency of the portable modules in the system, as it would make possible a link-by-name facility. A client module can be connected to a library if all of the requirements of the client can be fulfilled by features of the library. Instead of a crude comparison of whole interfaces, we could therefore compare exported objects on one side with imported objects on the other. Changes in a module’s interface would then invalidate only those portable clients that made direct use of the features that were modified, while leaving all other portable clients unperturbed. By appending a separate version key to every name, we would be able to use a simple string comparison to determine whether the requirements of one module conform to the features of another, without having to repeat the compiler’s structural analysis.
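A minimal sketch of such a link-by-name check follows. The record layout, the fixed name length, and the procedure names are hypothetical; for simplicity, the version key is kept as a separate field rather than literally appended to the name.

  MODULE LinkByNameSketch;  (* hypothetical per-feature conformity check *)

    CONST NameLen = 32;

    TYPE
      Feature* = RECORD
        name*: ARRAY NameLen OF CHAR;  (* exported or imported feature name *)
        key*: LONGINT                  (* per-feature version key *)
      END;

    PROCEDURE Matches (VAR a, b: Feature): BOOLEAN;
    BEGIN  (* a plain comparison replaces a repeated structural analysis *)
      RETURN (a.name = b.name) & (a.key = b.key)
    END Matches;

    PROCEDURE Satisfies* (VAR lib: ARRAY OF Feature; nofLib: INTEGER; VAR req: Feature): BOOLEAN;
      VAR i: INTEGER;
    BEGIN  (* the client requirement req is met if some exported feature matches it *)
      i := 0;
      WHILE (i < nofLib) & ~Matches(lib[i], req) DO INC(i) END;
      RETURN i < nofLib
    END Satisfies;

  END LinkByNameSketch.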
Note that some key changes could also be avoided if the native compiler were allowed to change the ordering of fields in record types (i.e., moving the visible fields to the lowermost offsets). However, the native compiler also requires the size of these types. Adding or removing data fields therefore necessarily invalidates clients represented by native object-files.
Benchmark Results
This chapter provides some data on the performance of the implemented system and discusses these results. Four different Oberon application-packages have been used in the following benchmarks. Filler is a program that draws the Hilbert and Sierpinski varieties of space-filling curves onto the screen. Hex is a byte-level file editor of medium sophistication. Draw is an extensible object-oriented graphics editor [WG92], of which the three main modules and three extension modules are included in the survey. Edit (formerly called Write) is a relatively sophisticated extensible document processing system [Szy92], five modules of which are studied here.
Memory and File-Store Requirements
Figure 5.1 gives an impression of the compactness of SDE-files, the influence of the presence of run-time integrity checks on code size, and the memory requirements of the code-generating loader. Its first three columns compare the sizes (in bytes) of native MacOberon object-files with those of SDE-files. Two different values are given for the former, reflecting different levels of run-time integrity checking. SDE-files contain enough information to generate code with full integrity checking, but the user need only decide at loading time whether or not he requires code that includes these run-time checks, and which ones should be included.
The column labelled “Native-” in the table below displays the size of native object-files that include reference information for the MacOberon post-mortem debugger, but no extra code for nil checking, index checking, or type checking, nor for the initialization of local pointers to NIL. Conversely, the column labelled “Native+” gives the sizes of native object-files that include not only reference information, but also code for the aforementioned run-time checks. Neither kind of object file includes symbolic information for interactive debugging, which can be added by enabling a further option.
The fourth column of Figure 5.1 shows the maximum size to which the semantic dictionary grows during compilation and code-generation. The code-generating loader requires about 80 times this number of bytes as temporary storage while loading a module.
Figure 5.1: Object-File Size (Bytes) and Dictionary Size (Entries)
This data shows that on average, SDE-files are about 2.5 times more compact than native MacOberon object-files not containing run-time integrity checks, and about 3 times more compact than native files incorporating these checks. It is also noteworthy that the maximum size to which the semantic dictionary grows during encoding and decoding is not proportional to the overall size of a module, but only roughly proportional to the length of the longest procedure in a module. This seems to level out at a relatively small value in typical modules, much smaller than anticipated originally.
Figure 5.2 indicates how much information is output into memory by the code-generating loader. The first column repeats the sizes of SDE-files from the previous table. The second and third columns give two different values for the size of the object code generated on-the-fly, depending on whether or not run-time integrity checks (in the same combinations as before) are emitted. The remaining columns list the sizes of dynamically-generated constant data (including type descriptors), reference data for the Oberon post-mortem debugger, and link data. The latter comprises entry tables generated on-the-fly, so that native client modules can later be connected to dynamically generated library modules.
Figure 5.2: Sizes of SDE-Files and of Dynamically-Generated Data (Bytes)
It is notable that SDE-files encode programs more than twice as densely as object code for the MC68020 architecture [Mot87]. If code that includes run-time checking is considered instead, the factor rises to 2.5. This is despite the fact that SDE-files can additionally serve as symbol files and contain reference information as well.
Performance
Unless stated otherwise, the following benchmarks were all carried out under MacOberon Version 4.03 on a Macintosh Quadra 840AV computer (MC68040 Processor running at 40MHz, equipped with a graphics accelerator card), using version 7.1.1 of the Macintosh System Software. All results are given in real time, i.e. the actual delay that a user experiences sitting in front of a computer, with a resolution of 1/60th of a second. Each benchmark was executed after a cold start, so that no files were left in the operating system’s file cache between a compilation and subsequent loading, and the best of three measurements is given in each case. Every attempt has been made to present the system as a regular user would experience it in everyday use.
Figure 5.3 presents the times (in milliseconds) required for compilation and for module loading. The first two columns compare the native compilation times of the MacOberon compiler with the times that the Oberon-to-SDE compiler requires for generating an SDE-file out of an Oberon source program. The remaining two columns report the loading times for both varieties of object file, including disk access, and, in the case of SDE-files, also including on-the-fly generation of native code. On average, 10% of the object code emitted by the code-generating loader can be generated by code-copying. All timings apply to the situation in which no run-time integrity checking is used.
Figure 5.3: Compilation versus SDE-Encoding and Module Loading Times (ms)
As can be seen from these timings, Oberon-to-SDE encoding on average takes about 1.5 times as long as normal compilation. This factor could certainly be further reduced by using more complicated sorting structures for the semantic dictionary (see Chapter 3). However, more important is the speed of loading with code generation. This currently takes about 1.5 times as long as normal loading, but the times spent directly on module loading tell only half the story.
Figure 5.4 presents a more realistic measure of loading time, as it takes into account not only the time required for loading an application, but also the duration of loading and displaying a typical document. The following timing values indicate how long (in milliseconds) a user must wait after activating a document-opening command before he can execute the next operation. Commands take longer the first time that they are issued, because the modules of the corresponding application have to be loaded as well. Subsequent activations then take up much less time.
Figure 5.4: Command Execution Times with Checks Disabled (ms)
Figure 5.5 demonstrates the effect of run-time integrity-checking on loading time. The measurements of Figure 5.4 have been repeated, but with native object-files that incorporate run-time integrity checks and are therefore larger, and with on-the-fly emission of the same checks in the code-generating loader.
Figure 5.5: Command Execution Times with Checks Enabled (ms)
The timings in Figures 5.4 and 5.5 show that, in practice, on-the-fly code-generation is already almost competitive with dynamic loading of pre-compiled native code from regular object-files. This applies to current state-of-the-art CISC hardware, and is in part due to the fact that disk reading is relatively slow. The compactness of the SDE representation speeds up the disk-access component of program loading considerably, and the time gained thereby counterbalances most of the additional processing necessary for on-the-fly code generation. This argument is supported by a comparison of Figures 5.4 and 5.5: it seems to be more efficient to generate run-time checks on the fly than to inflate the size of object-files by including them there.
A noteworthy trend in hardware technology today is that processor power is rising more rapidly than disk-access times and transfer rates are falling. This trend is likely to continue in the future, which means that hardware technology is evolving in favor of the ideas proposed in this thesis. Consider Figures 5.6 and 5.7, which repeat the timings of Figure 5.5 for different processors of the MC680x0 family, using the identical external disk drive unit for all experiments.
Figure 5.6: Time for Draw.Open Counters.Graph on Different Machines (ms)
Figure 5.7: Time for Edit.Open OberonReport.Text on Different Machines (ms)
Extrapolating from these results, there is reason to believe that on-the-fly code-generation from small SDE-files may eventually become faster than dynamic loading of larger native object-files, unless of course secondary storage universally migrates to a much faster technology. One way or the other, on-the-fly code-generation will definitely become faster in absolute terms as clock speeds increase further, so that its speed relative to native loading should soon no longer be of much importance anyway. Ultimately, the speed of loading needs to be tolerable for an interactive user; that is all that matters.
Code Quality
The last point that needs to be addressed concerns the quality of code that is generated dynamically. Figure 5.8, by way of a popular benchmark, compares the quality of code generated on-the-fly with that generated by the Apple MPW C compiler for the Macintosh (Version 3.2.4) with all possible optimizations for speed enabled (“-m -mc68020 -mc68881 -opt full -opt speed”). The generation of integrity checks was disabled in the code-generating loader, because no equivalent concept is present in C.
Figure 5.8: Benchmark Execution Times (ms)
The table lists execution times in milliseconds (less is better). Due to processor cache effects, these timings can vary by as much as 15% when executed repeatedly; the figures give the best of three executions. The C version of the Treesort benchmark exceeded all meaningful time bounds, due to apparent limitations of the standard Macintosh operating system storage-allocator. The code-generating loader is not bound by these limitations, as it generates calls to the storage-allocator of MacOberon, which includes an independent memory-management subsystem.
From this data, it can be inferred that the code-generating loader emits native code of high quality that can compete with that of optimizing C compilers. In some cases, it even surpasses the output of the official optimizing compiler recommended by the manufacturer of the target machine, which is orders of magnitude slower in compilation. It should be possible to improve the remaining cases in which the code-generating loader is currently inferior, without sacrificing much of the speed of on-the-fly code generation. Since the code-generation efficiency of the code-generating loader is rooted in the compactness of the abstract program representation, rather than in any machine-specific details, it should also be possible to duplicate it for other architectures.
The Apple MPW C compiler requires about 6.9 seconds for compiling a C source program that performs the series of benchmarks listed above, and a similar time additionally for linking, compared to 1.1 seconds that are needed for encoding a corresponding Oberon program into the SDE file-representation and 233 milliseconds for loading it with on-the-fly code generation (including file access after a cold start).
Portability and Software Components
It has already been mentioned that semantic-dictionary-encoded programs are independent of the eventual target machine, which means that they are, in principle, portable between different machine architectures. This chapter examines the notion of software portability more closely; it presents some existing definitions and approaches to portability and introduces a new, concise and practical definition of upward-compatibility. It is then argued that fast on-the-fly code-generation from a machine-independent intermediate representation constitutes an enabling technology for a software-component industry, because it would simplify the distribution and maintenance of such components considerably, without resulting in a loss of code efficiency or user-convenience.
Portability
Portability is one of the more elusive concepts in computer science. There is in fact only one aspect of portability about which there seems to exist a general consensus, namely that it is beneficial. Probing further, we soon find that there are many diverse opinions of what portability actually amounts to in practice. This is somewhat surprising, considering the economic importance of the concept and the widespread use of the term “portable”, which seems to be quite clear intuitively.
Most computer-literate people would probably agree that portability somehow relates to “usability of the same software on different machines”. A casual remark by Dahl [Dah84] describes portability as “having the same meaning for software as compatibility for hardware”. Again, most of us would probably consent to that statement, but this does not get us much further. Unfortunately, the term “compatible” is just as ambiguous as “portable”.
Part of the problem with the portability debate is that so many diverse issues have been mixed up in it. Moreover, the emphasis has shifted significantly over time. Not long ago, a major thread of the portability argument concerned the physical portability of programs and data, specifically the problems of different character sets (e.g., ASCII versus EBCDIC) and the multitude of storage media (e.g., 7-track versus 9-track magnetic tape). In 1975, Waite [Wai75] reported that “getting it into the computer” was often more difficult than “getting it to run”. Those were the days when the first design decision at the beginning of a programming project concerned the choice of a character set.
Today, physical portability of programs is no longer of any concern and physical portability of data is becoming less of an issue because rising processor speed makes it possible to re-code data transparently on-the-fly. The problems of media compatibility have been moderated by evolving standards and by dropping hardware costs, which have made it possible to support a multitude of formats concurrently. On top of all this, pervasive networking is successively eliminating the need to use physical media at all for transporting machine-readable information.
Other no longer relevant issues relate to former limitations in compilers and operating systems, for example, regarding the length of identifiers in source programs; it is becoming hard to appreciate that this was ever a problem. Tanenbaum et al. [TKB78] enumerate exhaustively the areas in which programs were most unlikely to be portable in 1978. Half of their concerns simply no longer apply in 1993. Unfortunately, however, new challenges have emerged in the meantime, such as the mastery of parallelism in multiprocessor systems. These will keep us busy for a while.
Portability of Programs
The cost of developing software is rising steadily as programs get ever more complex in order to use the additional capabilities of emerging hardware to the fullest. At the same time, hardware costs are falling, making the disproportionately high price that we pay for software more apparent. An obvious solution for keeping in check the exploding costs of software is to spread the cost of development by selling it in large volumes. However, in view of the many different computer architectures that coexist, such mass-market software must necessarily be portable so that it is not confined to the market segment of just one specific type of target machine.
A necessary precondition for portable software is the use of a high-level language in its construction, because all alternatives would render it machine-dependent. In this context, a high-level language is one that abstracts from the underlying hardware. However, the obstacles to portability that remain even when high-level languages are used are often underestimated, as relatively few computational processes do not depend upon their environment in some way.
Nevertheless, most algorithms can be parametrized in a manner that facilitates easy adaptation to a particular target environment, although this usually requires some planning ahead (“design for portability”). A common solution for handling machine dependencies in a program is to isolate into a single module those parts of the program that are machine-dependent. Hopefully, only this machine-dependent module needs to be modified when the program is subsequently ported to another machine.
Some authors make a finer distinction between programs that are usable on several machines without any changes whatsoever, and those that require adjustments. Hague and Ford [HF76] use the term portable in a restrictive way to describe only those programs that can be compiled and will then execute correctly on other machines completely unmodified. If the necessary changes for such a move are capable of mechanical implementation via a preprocessor, they call the software transportable.
Others use a less rigid definition of the term portable. A widespread practice is to call a program portable if the effort of moving it to a new environment is much less than the effort of rewriting it for the new environment [BH76, Fox76]. For some authors, a program may even be called portable if it is so well documented that it can be rewritten completely for another machine with the help of the original design documents [Wal82]. Dahlstrand [Dah84] interprets portability as a measurable quantity and suggests expressing it as “the percentage of lines that could be left unchanged when porting an application”.
Portability of Numerical Software
Numerical software is written almost exclusively in FORTRAN, so that the “portability” effort in this context concerns itself only with the incompatibilities that arise out of the use of various language dialects and the particularities of different target environments. Tools such as the PFort verifier [Ryd74], which checks a FORTRAN program for adherence to a portable subset of the language, are useful in detecting these potential impediments to portability.
Several existing numerical software libraries have been designed to be portable. The central idea of a portable library is to maintain only a single master version of each routine, and derive all variants from it mechanically [Boy76]. Maintaining just one version of a program offers substantial economic advantages.
The simplest approach to such a master-file solution is a markup scheme, under which a programmer inserts annotations into a FORTRAN source file, which specify the changes that need to be made to produce other versions. These annotations (which are specially marked comments) are interpreted by a preprocessor, which outputs a version of the FORTRAN routine that is tailored to a particular target environment. Examples of markup schemes are the Specializer [Kro76] and the IMSL Converter [Air76]. The latter includes also a special “scan” option that can perform a precision conversion without requiring specially marked comments to trigger its operation. It is able to change type declarations, constants, and function names automatically as required.
In the Master Library File System (MLFS) [HF76, RH77] of the Oxford University Numerical Algorithms Group, the control statements that differentiate between the different variants of a program are introduced automatically rather than by a human programmer. This is accomplished by the use of an anti-editor. The anti-editor is a tool that compares a particular executable version of a certain routine with the master version of the same routine. It then modifies the master version so that the executable variant can be generated from it automatically in the future.
An even more ambitious tool is used in the maintenance of the LINPACK library [DMB79]. LINPACK is designed to be completely machine independent and makes no use of machine dependent constants, input/output statements, character manipulation, the COMMON or EQUIVALENCE statements of the FORTRAN language, nor of mixed-mode arithmetic. It is therefore possible to automate completely the process of converting it for use in different target environments. The LINPACK project uses a system originally called NATS II and later renamed to Transformation-Assisted Multiple Program Realisation (TAMPR) [BD74, Dri76]. In this system, programs are stored in a canonical intermediate representation that can be translated mechanically into variants of FORTRAN for several machines. The backward translation is also possible, so that existing programs originating on some particular machine can be added to the library. All knowledge about specific target environments in TAMPR is contained in the converter programs, and not in the individual master files. As a consequence, however, this system can handle only programs that have been designed for machine independence.