A new architecture for mini-computers—
The DEC PDP-11

by G. BELL,* R. CADY, H. McFARLAND, B. DELAGI, J. O'LAUGHLIN and R. NOONAN

Digital Equipment Corporation
Maynard, Massachusetts

and

W. WULF
Carnegie-Mellon University
Pittsburgh, Pennsylvania

INTRODUCTION

The mini-computer** has a wide variety of uses: communications controller; instrument controller; large-system pre-processor; real-time data acquisition systems...; desk calculator. Historically, Digital Equipment Corporation's PDP-8 Family, with 6,000 installations, has been the archetype of these mini-computers.

In some applications current mini-computers have limitations. These limitations show up when the scope of their initial task is increased (e.g., using a higher level language, or processing more variables). Increasing the scope of the task generally requires the use of more comprehensive executives and system control programs, hence larger memories and more processing. This larger system tends to be at the limit of current mini-computer capability, thus the user receives diminishing returns with respect to memory, speed efficiency and program development time. This limitation is not surprising since the basic architectural concepts for current mini-computers were formed in the early 1960's. First, the design was constrained by cost, resulting in rather simple processor logic and register configurations. Second, application experience was not available. For example, the early constraints often created computing designs with what we now consider weaknesses:

1. limited addressing capability, particularly of larger core sizes
2. few registers (general registers, accumulators, index registers, base registers)
3. no hardware stack facilities
4. limited priority interrupt structures, and thus slow context switching among multiple programs (tasks)
5. no byte string handling
6. no read only memory facilities
7. very elementary I/O processing
8. no larger model computer, once a user outgrows a particular model
9. high programming costs because users program in machine language.

* Also at Carnegie-Mellon University, Pittsburgh, Pennsylvania.
** The PDP-11 design is predicated on being a member of one (or more) of the micro, midi, mini, ..., maxi (computer name) markets. We will define these names as belonging to computers of the third generation (integrated circuit to medium scale integrated circuit technology), having a core memory with cycle time of .5 ~ 2 microseconds, a clock rate of 5 ~ 10 MHz ..., a single processor with interrupts and usually applied to doing a particular task (e.g., controlling a memory or communications lines, pre-processing for a larger system, process control). The specialized names are defined as follows:

<table>
<thead>
<tr>
<th>name</th>
<th>maximum addressable primary memory (words)</th>
<th>processor and memory cost (1970 kilodollars)</th>
<th>word length (bits)</th>
<th>processor state (words)</th>
<th>data types</th>
</tr>
</thead>
<tbody>
<tr>
<td>micro</td>
<td>8 K</td>
<td>5</td>
<td>8 ~ 12</td>
<td>2</td>
<td></td>
</tr>
<tr>
<td></td>
<td>32 K</td>
<td>5 ~ 10</td>
<td>12 ~ 16</td>
<td>2 ~ 4</td>
<td></td>
</tr>
<tr>
<td>mini</td>
<td>65 ~ 128 K</td>
<td>10 ~ 20</td>
<td>16 ~ 24</td>
<td>4 ~ 16</td>
<td>double length floating point (occasionally)</td>
</tr>
</tbody>
</table>

In developing a new computer the architecture should at least solve the above problems. Fortunately, in the late 1960's integrated circuit semiconductor technology became available so that newer computers could be designed which solve these problems at low cost. Also, by 1970 application experience was available to influence the design. The new architecture should thus lower programming cost while maintaining the low hardware cost of mini-computers.

The DEC PDP-11, Model 20 is the first computer of a computer family designed to span a range of functions and performance. The Model 20 is specifically discussed, although design guidelines are presented for other members of the family. The Model 20 would nominally be classified as a third generation (integrated circuits), 16-bit word, 1 central processor with eight 16-bit general registers, using two's complement arithmetic and addressing up to \(2^{16}\) eight bit bytes of primary memory (core). Though classified as a general register processor, the operand accessing mechanism allows it to perform equally well as a 0-(stack), 1-(general register) and 2-(memory-to-memory) address computer. The computer's components (processor, memories, controls, terminals) are connected via a single switch, called the Unibus.

The machine is described using the PMS and ISP notation of Bell and Newell (1970) at different levels. The following descriptive sections correspond to the levels: external design constraints level; the PMS level—the way components are interconnected and allow information to flow; the program level or ISP (Instruction Set Processor)—the abstract machine which interprets programs; and finally, the logical design level. (We omit a discussion of the circuit level—the PDP-11 being constructed from TTL integrated circuits.)

DESIGN CONSTRAINTS

The principal design objective is yet to be tested; namely, do users like the machine? This will be tested both in the market place and by the features that are emulated in newer machines; it will indirectly be tested by the life span of the PDP-11 and any offspring.

Word length

The most critical constraint, word length (defined by IBM), was chosen to be a multiple of 8 bits. The memory word length for the Model 20 is 16 bits, although there are 32- and 48-bit instructions and 8- and 16-bit data. Other members of the family might have up to 80-bit instructions with 8-, 16-, 32- and 48-bit data. The internal, and preferred external, character set was chosen to be 8-bit ASCII.

Range and performance

Performance and function range (extendability) were the main design constraints; in fact, they were the main reasons to build a new computer. DEC already has four computer families that span a range* but are incompatible. In addition to the range, the initial machine was constrained to fall within the small-computer product line, which means having about the same performance as a PDP-8. The initial machine outperforms the PDP-5, LINC, and PDP-4 based families. Performance, of course, is a function of both the instruction set and the technology. Here, we are fundamentally concerned only with the instruction set performance because faster hardware will always increase performance for any family. Unlike the earlier DEC families, the PDP-11 had to be designed so that new models with significantly more performance can be added to the family.

A rather obvious goal is maximum performance for a given model. Designs were programmed using benchmarks, and the results compared with both DEC and potentially competitive machines. Although the selling price was constrained to lie in the $5,000 to $10,000 range, it was realized that the decreasing cost of logic would allow a more complex organization than earlier DEC computers. A design which could take advantage of medium- and eventually large-scale integration was an important consideration. First, it could make the computer perform well; and second, it would extend the computer family's life. For these reasons, a general registers organization was chosen.

Interrupt response

Since the PDP-11 will be used for real time control applications, it is important that devices can communicate with one another quickly (i.e., the response time of a request should be short). A multiple priority level, nested interrupt mechanism was selected; additional priority levels are provided by the physical position of a device on the Unibus. Software polling is unnecessary because each device interrupt corresponds to a unique address.

---

* PDP-4, 7, 9, 15 family; PDP-5, 8, 8/S, 8/I, 8/L family; LINC, PDP-8/LINC, PDP-12 family; and PDP-6, 10 family. The initial PDP-1 did not achieve family status.

Software

The total system including software is of course the main objective of the design. Two techniques were used to aid programmability: first, benchmarks gave a continuous indication as to how well the machine interpreted programs; second, systems programmers continually evaluated the design. Their evaluation considered: what code the compiler would produce; how the loader would work; ease of program relocatability; the use of a debugging program; how the compiler, assembler and editor would be coded (in effect, other benchmarks); how real time monitors would be written to use the various facilities and present a clean interface to the users; and finally the ease of coding a program.

Modularity

Structural flexibility (sometimes called modularity) for a particular model was desired. A flexible and straightforward method for interconnecting components had to be used because of varying user needs (among user classes and over time). Users should have the ability to configure an optimum system based on cost, performance and reliability, both by interconnection and, when necessary, constructing new components. Since users build special hardware, a computer should be easily interfaced. As a by-product of modularity, computer components can be produced and stocked, rather than tailor-made on order. The physical structure is almost identical to the PMS structure discussed in the following section; thus, reasonably large building blocks are available to the user.

Microprogramming

A note on microprogramming is in order because of current interest in the "firmware" concept. We believe microprogramming, as we understand it (Wilkes, 1951), can be a worthwhile technique as it applies to processor design. For example, microprogramming can probably be used in larger computers when floating point data operators are needed. The IBM System/360 has made use of the technique for defining processors that interpret both the System/360 instruction set and earlier family instruction sets (e.g., 1401, 1620, 7090). In the PDP-11 the basic instruction set is quite straightforward and does not necessitate microprogrammed interpretation. The processor-memory connection is asynchronous and therefore memory of any speed can be connected. The instruction set encourages the user to write reentrant programs; thus, read-only memory can be used as part of primary memory to gain the permanency and performance normally attributed to microprogramming. In fact, the Model 10 computer which will not be further discussed has a 1024-word read only memory, and a 128-word read-write memory.

Understandability

Understandability was perhaps the most fundamental constraint (or goal) although it is now somewhat less important to have a machine that can be quickly understood by a novice computer user than it was a few years ago. DEC's early success has been predicated on selling to an intelligent but inexperienced user. Understandability, though hard to measure, is an important goal because all (potential) users must understand the computer. A straightforward design should simplify the systems programming task; in the case of a compiler, it should make translation (particularly code generation) easier.

PDP-11 STRUCTURE AT THE PMS LEVEL*

Introduction

PDP-11 has the same organizational structure as nearly all present day computers (Figure 1). The primitive PMS components are: the primary memory (Mp) which holds the programs while the central processor (Pc) interprets them; io controls (Kio) which manage data transfers between terminals (T) or secondary memories (Ms) and primary memory (Mp); the components outside the computer at the periphery (X), either humans (H) or some external process (e.g., another computer); the processor console (T. console) by which humans communicate with the computer, observe its behavior and effect changes in its state; and a switch (S) with its control (K) which allows all the other components to communicate with one another. In the case of PDP-11, the central logical switch structure is implemented using a bus or chained switch (S) called the Unibus, as shown in Figure 2. Each physical component has a switch for placing messages on the bus or taking messages off the bus. The central control decides the next component to use the bus for a message (call). The S (Unibus) differs from most switches because any component can communicate with any other component.

* A descriptive (block-diagram) level (Bell and Newell, 1970) to describe the relationship of the computer components: processors, memories, switches, controls, links, terminals and data operators.

The types of messages in the PDP-11 are along the lines of the hierarchical structure common to present day computers. The single bus makes conventional and other structures possible. The message processes in the structure which utilize S(Unibus) are:

1. The central processor (Pc) requests that data be read or written from or to primary memory (Mp) for instructions and data. The processor calls a particular memory module by concurrently specifying the module's address and the address within the module. Depending on whether the processor requests reading or writing, data is transmitted either from the memory to the processor or vice versa.

2. The central processor (Pc) controls the initialization of secondary memory (Ms) and terminal (T) activity. The processor sets status bits in the control associated with a particular Ms or T, and the device proceeds with the specified action (e.g., reading a card, or punching a character into paper tape). Since some devices transfer data vectors directly to primary memory, the vector control information (i.e., the memory location and length) is given as initialization information.

3. Controls request the processor's attention in the form of interrupts. An interrupt request to the processor has the effect of changing the state of the processor; thus the processor begins executing a program associated with the interrupting process. Note, the interrupt process is only a signaling method, and when the processor interruption occurs, the interruptee specifies a unique address value to the processor. The address is a starting address for a program.

4. The central processor can control the transmission of data between a control (for T or Ms) and either the processor or a primary memory for program controlled data transfers. The device signals for attention using the interrupt dialogue and the central processor responds by managing the data transmission in a fashion similar to transmitting initialization information.

5. Some device controls (for T or Ms) transfer data directly to/from primary memory without central processor intervention. In this mode the device behaves similarly to a processor; a memory address is specified, and the data is transmitted between the device and primary memory.

6. The transfer of data between two controls, e.g., a secondary memory (disk) and say a terminal/T. display is not precluded, provided the two use compatible message formats.
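
The interrupt dialogue in item 3 can be illustrated with a minimal sketch. It assumes that each pending request carries a priority level and a vector address, and that a request is honoured only when its level exceeds that of the running program. The names `select_request` and `dispatch`, the levels, and the vector values are hypothetical and chosen only for illustration.

```python
def select_request(pending, running_priority):
    """Return the (level, vector) request to service next, or None.

    `pending` is assumed to be ordered by position on the bus, so among
    requests of equal level the earliest entry wins.
    """
    best = None
    for level, vector in pending:
        if level > running_priority and (best is None or level > best[0]):
            best = (level, vector)
    return best


def dispatch(pending, running_priority, handlers):
    """Start the program whose address the interrupting control supplied."""
    request = select_request(pending, running_priority)
    if request is None:
        return None                  # nothing outranks the running program
    _, vector = request
    return handlers[vector]()        # the vector alone identifies the routine


# Hypothetical vectors and service routines.
handlers = {0o60: lambda: "keyboard", 0o100: lambda: "clock"}
pending = [(4, 0o60), (6, 0o100), (6, 0o60)]
print(dispatch(pending, 3, handlers))
```

Because the vector selects the routine directly, no loop over device status registers is needed to find out who interrupted.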

As we show more detail in the structure there are, of course, more messages (and more simultaneous activity). The above does not describe the shared control and its associated switching which is typical of magnetic tape and magnetic disk secondary memory systems. A control for a DECtape memory (Figure 3) has an S(DECtape bus) for transmitting data between a single tape unit and the DECtape transport. The existence of this kind of structure is based on the relatively high cost of the control relative to the cost of the tape and the value of being able to run concurrently with other tapes. There is also a dialogue at the periphery between X-T and X-Ms which does not use the Unibus. (For example, the removal of a magnetic tape reel from a tape unit or a human user (H) striking a typewriter key are typical dialogues.)

```
Ms(#0:7; 'DECtape) ...
S 'DECtape bus; concurrency:1
Kio('DECtape) S
```

Figure 3—DECtape control switching PMS diagram

All of these dialogues lead to the hierarchy of present computers (Fig. 4). In this hierarchy we can see the paths by which the above messages are passed (Pc-Mp; Pc-K; K-Pc; Kio-T and Kio-Ms; and Kio-Mp; and, at the periphery, T-X and T-Ms; and T.console-H).

Model 20 implementation

Figure 5 shows the detailed structure of a uniprocessor, Model 20 PDP-11 with its various components (options). In Figure 5 the Unibus characteristics are suppressed. (The detailed properties of the switch are described in the logical design section.)

Extensions to increase performance

The reader should note (Figure 5) that the important limitations of the bus are: a concurrency of one, namely, only one dialogue can occur at a given time, and a maximum transfer rate of one 16-bit word per .75 μsec., giving a transfer rate of 21.3 megabits/second. While the bus is not a limit for a uniprocessor structure, it is a limit for multiprocessor structures. The bus also imposes an artificial limit on the system performance when high speed devices (e.g., TV cameras, disks) are transferring data to multiple primary memories. On a larger system with multiple independent memories the supply of memory cycles is 17 megabits/second times the number of modules. Since there is such a large supply of memory cycles/second and since the central processor can only absorb approximately 16 megabits/second, the simple one Unibus structure must be modified to make the memory cycles available. Two changes are necessary: first, each of the memory modules has to be changed so that multiple units can access each module on an independent basis; and second, there must be independent control accessing mechanisms. Figure 6 shows how a single memory is modified to have more access ports (i.e., connect to 4 Unibusses).
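
The figures quoted in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the function names are illustrative, and the `min` is a simplification that treats the single bus as the only bottleneck.

```python
WORD_BITS = 16          # one Unibus transfer moves one 16-bit word
BUS_CYCLE_US = 0.75     # quoted minimum time per transfer, microseconds
MODULE_MBITS = 17.0     # quoted supply of one memory module, megabits/second


def bus_rate_mbits(word_bits=WORD_BITS, cycle_us=BUS_CYCLE_US):
    """Bits per microsecond is the same number as megabits per second."""
    return word_bits / cycle_us


def memory_supply_mbits(modules, per_module=MODULE_MBITS):
    """Aggregate supply if every module could be cycled independently."""
    return modules * per_module


bus = bus_rate_mbits()
for n in (1, 2, 3):
    supply = memory_supply_mbits(n)
    usable = min(supply, bus)       # a single bus caps what can be delivered
    print(f"{n} module(s): supply {supply:.1f}, deliverable {usable:.1f} Mbit/s")
```

With two or more modules the supply exceeds what one bus can carry, which is the motivation for the multi-port memories and additional busses described above.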

Figure 7 shows a system with 3 independent memory modules which are accessed by 2 independent Unibusses. Note that two of the secondary memories and one of the transducers are connected to both Unibusses. It should be noted that devices which can potentially interfere with Pc-Mp accesses are constructed with two ports; for simple systems, the two ports are both connected to the same bus, but for systems with more busses, the second connection is to an independent bus.

Figure 8 shows a multiprocessor system with two central processors and three Unibusses. Two of the Unibus controls are included within the two processors, and the third bus is controlled by an independent control unit. The structure also has a second switch to allow either of two processors (Unibusses) to access common shared devices. The interrupt mechanism allows either processor to respond to an interrupt and similarly either processor may issue initialization information on an anonymous basis. A control unit is needed so that two processors can communicate with one another; shared primary memory is normally used to carry the body of the message. A control connected to two Pcs (see Figure 8) can be used for reliability; either processor or Unibus could fail, and the shared Ms would still be accessible.

Figure 6—1 and 4 port memory modules PMS diagram

Figure 7—Three Mp, 2 S(Unibus) structure PMS diagram

Figure 8—Dual Pc multiprocessor system PMS diagram

Higher performance processors

Increasing the bus width has the greatest effect on performance. A single bus limits data transmission to 21.3 megabits/second, and though Model 20 memories are 16 megabits/second, faster (or wider) data path modules will be limited by the bus. The Model 20 is not restricted, but for higher performance processors operating on double word (fixed point) or triple word (floating point) data, two or three accesses are required for a single data type. The direct method to improve the performance is to double or triple the primary memory and central processor data path widths. Thus, the bus data rate is automatically doubled or tripled.

For 32- or 48-bit memories a coupling control unit is needed so that devices of either width appear isomorphic to one another. The coupler maps a data request of a given width into a higher- or lower-width request for the bus being coupled to, as shown in Figure 9. (The bus is limited to a fixed number of devices for electrical reasons; thus, to extend the bus a bus repeating unit is needed. The bus repeating control unit is almost identical to the bus coupler.) A computer with a 48-bit primary memory and processor and 16-bit secondary memory and terminals (transducers) is shown in Figure 9.
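
The width mapping performed by such a coupler can be sketched as follows. This is an illustration only: the paper does not specify the order in which the narrow transfers are issued, so the sketch assumes byte addressing with the least significant 16-bit word at the lowest address, and the function names are hypothetical.

```python
def split_request(address, value, wide_bits=48, bus_bits=16):
    """Split one wide write into a list of (address, datum) bus transfers.

    Assumption (not from the paper): the least significant bus-width
    word goes to the lowest byte address.
    """
    if wide_bits % bus_bits:
        raise ValueError("wide width must be a multiple of the bus width")
    mask = (1 << bus_bits) - 1
    step = bus_bits // 8                      # bytes per bus transfer
    return [(address + i * step, (value >> (i * bus_bits)) & mask)
            for i in range(wide_bits // bus_bits)]


def join_transfers(transfers, bus_bits=16):
    """Reassemble the wide datum from transfers produced by split_request."""
    value = 0
    for i, (_, datum) in enumerate(transfers):
        value |= datum << (i * bus_bits)
    return value


parts = split_request(0o1000, 0x123456789ABC)
print([(oct(a), hex(d)) for a, d in parts])
```

A 48-bit request therefore costs three 16-bit bus cycles, which is why widening the data paths raises the effective data rate.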

In summary, the design goal was to have a modular structure providing the final user with freedom and flexibility to match his needs. A secondary goal of the Unibus is open-endedness by providing multiple busses and defining wider path busses. Finally, and most important, the Unibus is straightforward.

THE INSTRUCTION SET PROCESSOR (ISP) LEVEL-ARCHITECTURE*

Introduction, background and design constraints

The Instruction Set Processor (ISP) is the machine defined by hardware and/or software which interprets programs. As such, an ISP is independent of technology and specific implementations.

The instruction set is one of the least understood aspects of computer design; currently it is an art. There is currently no theory of instruction sets, although there have been attempts to construct them (Maurer, 1966), and there has also been an attempt to have a computer program design an instruction set (Haney, 1968). We have used the conventional approach in this design: first a basic ISP was adopted and then incremental design modifications were made (based on the results of the benchmarks).**

---

* The word architecture has been operationally defined (Amdahl, Blaauw and Brooks, 1964) as “the attributes of a system as seen by a programmer, i.e., the conceptual structure and functional behavior, as distinct from the organization of the data flow and controls, the logical design and the physical implementation.”

** A predecessor multiregister computer was proposed which used a similar design process. Benchmark programs were coded on each of 10 “competitive” machines, and the object of the design was to get a machine which gave the best score on the benchmarks. This approach had several fallacies: the machine had no basic character of its own; the machine was difficult to program since the multiple registers were assigned to specific functions and had inherent idiosyncrasies to score well on the benchmarks; the machine did not perform well for programs other than those used in the benchmark test; and finally, compilers which took advantage of the machine appeared to be difficult to write. Since all “competitive machines” had been hand-coded from a common flowchart rather than separate flowcharts for each machine, the apparent high performance may have been due to the flowchart organization.

Although the approach to the design was conventional, the resulting machine is not. A common classification of processors is as zero-, one-, two-, three-, or three-plus-one-address machines. This scheme has the form:

\[ \text{op } l1, l2, l3, l4 \]

where \( l1 \) specifies the location (address) in which to store the result of the binary operation (op) of the contents of operand locations \( l2 \) and \( l3 \), and \( l4 \) specifies the location of the next instruction.

The action of the instruction is of the form:

\[ l1 \leftarrow l2 \text{ op } l3; \text{ goto } l4 \]

The other addressing schemes assume specific values for one or more of these locations. Thus, the one-address von Neumann (Burks, Goldstine and von Neumann, 1946) machines assume \( l1 = l2 = \) the “accumulator” and \( l4 \) is the location following that of the current instruction. The two-address machine assumes \( l1 = l2 \); \( l4 \) is the next address.
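
The relation between the general form and its special cases can be made concrete with a small sketch. Memory is modelled as a Python dictionary and the accumulator as the key `"ACC"`; both are illustrative choices, not part of the paper's notation.

```python
import operator


def step(mem, op, l1, l2, l3, l4):
    """General form: l1 <- l2 op l3; goto l4. Returns the next location."""
    mem[l1] = op(mem[l2], mem[l3])
    return l4


def one_address(mem, pc, op, l3, acc="ACC"):
    """One-address form: l1 = l2 = accumulator, l4 = next instruction."""
    return step(mem, op, acc, acc, l3, pc + 1)


def two_address(mem, pc, op, l1, l3):
    """Two-address form: l1 = l2, l4 = next instruction."""
    return step(mem, op, l1, l1, l3, pc + 1)


mem = {"ACC": 5, "a": 2, "b": 3}
pc = one_address(mem, 0, operator.add, "a")       # ACC <- ACC + a
pc = two_address(mem, pc, operator.sub, "b", "a")  # b <- b - a
print(mem, pc)
```

Each restricted form is the same operation with some of the four location fields fixed, which is why the shorter forms need fewer address bits per instruction.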

Historically, the trend in machine design has been to move from a 1 or 2 word accumulator structure as in the von Neumann machine towards a machine with accumulator and index register(s).* As the number of registers is increased the assignment of the registers to specific functions becomes more undesirable and inflexible; thus, the general-register concept has developed. An array of general registers in the processor was apparently first used in the first-generation, vacuum-tube machine, PEGASUS (Elliott et al., 1956) and appears to be an outgrowth of both 1- and 2-address structures. (Two alternative structures, the early 2- and 3-address per instruction computers, may be disregarded, since they tend to always access primary memory for results as well as temporary storage and thus are wasteful of time and memory cycles, and require a long instruction.) The stack concept (zero-address) provides the most efficient access method for specifying algorithms, since very little space, only the access addresses and the operators, needs to be given. In this scheme the operands of an operator are always assumed to be on the “top of the stack”. The stack has the additional advantage that arithmetic expression evaluation and compiler statement parsing have been developed to use a stack effectively. The disadvantage of the stack is due in part to the nature of current memory technology. That is, stack memories have to be simulated with random access memories, multiple stacks are usually required, and even though small stack memories exist, as the stack overflows, the primary memory (core) has to be used.

---

* Due in part to needs, but mainly technology which dictates how large the structure can be.

---

Figure 9—Computer with 48 bit Pc, Mp with 16 bit Ms, T PMS diagram

Even though the trend has been toward the general register concept (which, of course, is similar to a two address scheme in which one of the addresses is limited to small values), it is important to recognize that any design is a compromise. There are situations for which any of these schemes can be shown to be “best”. The IBM System/360 series uses a general register structure, and their designers (Amdahl, Blaauw and Brooks, 1964) claim the following advantages for the scheme:

1. Registers can be assigned to various functions: base addressing, address calculation, fixed point arithmetic and indexing.
2. Availability of technology makes the general registers structure attractive.

The System/360 designers also claim that a stack organized machine such as the English Electric KDF 9 (Allmark and Lucking, 1962) or the Burroughs B5000 (Lonergan and King, 1961) has the following disadvantages:

1. Performance is derived from fast registers, not the way they are used.
2. Stack organization is too limiting and requires many copy and swap operations.
3. The overall storage of general register and stack machines is the same, considering point #2.
4. The stack has a bottom, and when placed in slower memory there is a performance loss.
5. Subroutine transparency is not easily realized with one stack.
6. Variable length data is awkward with a stack.

We generally concur with points 1, 2, and 4. Point 5 is an erroneous conclusion, and point 6 is irrelevant (that is, general register machines have the same problem). The general-register scheme also allows processor implementations with a high degree of parallelism since instructions of a local block all can operate on several registers concurrently. A set of truly general purpose registers should also have additional uses. For example, in the DEC PDP-10, general registers are used for address integers, indexing, floating point, boolean vectors (bits), or program flags and stack pointers. The general registers are also addressable as primary memory, and thus, short program loops can reside within them and be interpreted faster. It was observed in operation that PDP-10 stack operations were very powerful and often used (accounting for as many as 20% of the executed instructions in some programs, e.g., the compilers).

The basic design decision which sets the PDP-11 apart was based on the observation that by using truly general registers and by suitable addressing mechanisms it was possible to consider the machine as a zero-address (stack), one-address (general register), or two-address (memory-to-memory) computer. Thus, it is possible to use whichever addressing scheme, or mixture of schemes, is most appropriate.

Another important design decision for the instruction set was to have only a few data types in the basic machine, and to have a rather complete set of operations for each data type. (Alternative designs might have more data types with few operations, or few data types with few operations.) In part, this was dictated by the machine size. The conversion between data types must be easily accomplished either automatically or with 1 or 2 instructions. The data types should also be sufficiently primitive to allow other data types to be defined by software (and by hardware in more powerful versions of the machine). The basic data type of the machine is the 16 bit integer which uses the two's complement convention for sign. This data type is also identical to an address.

PDP-11 model 20 instruction set (basic instruction set)

A formal description of the basic instruction set is given in Appendix 1 using the ISPL notation (Bell and Newell, 1970). The remainder of this section will discuss the machine in a conventional manner.

Primary memory

The primary memory (core) is addressed as either \(2^{16}\) bytes or \(2^{15}\) words using a 16 bit number. The linear address space is also used to access the input-output devices. The device state, data and control registers are read or written like normal memory locations.
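
A minimal sketch of such an address space follows. Two details are assumptions that this section does not state: words are taken to sit at even byte addresses with the low-order byte first, and the device register address `0o177560` and the `Latch` class are purely hypothetical.

```python
class Latch:
    """Hypothetical one-word device register that remembers the last write."""

    def __init__(self):
        self.value = 0

    def read(self):
        return self.value

    def write(self, value):
        self.value = value


class AddressSpace:
    """64K-byte linear address space with device registers mapped into it."""

    SIZE = 1 << 16                      # 2**16 bytes, i.e. 2**15 words

    def __init__(self):
        self.core = bytearray(self.SIZE)
        self.devices = {}               # even address -> register object

    def map_device(self, address, register):
        self.devices[address] = register

    def _check(self, address):
        if address & 1:                 # assumption: words are even-aligned
            raise ValueError("word access needs an even address")

    def read_word(self, address):
        self._check(address)
        if address in self.devices:
            return self.devices[address].read() & 0xFFFF
        return self.core[address] | (self.core[address + 1] << 8)

    def write_word(self, address, value):
        self._check(address)
        if address in self.devices:
            self.devices[address].write(value & 0xFFFF)
            return
        self.core[address] = value & 0xFF            # low byte first (assumed)
        self.core[address + 1] = (value >> 8) & 0xFF

    def read_byte(self, address):
        return self.core[address & 0xFFFF]


space = AddressSpace()
space.write_word(0o1000, 0x1234)
console = Latch()
space.map_device(0o177560, console)
space.write_word(0o177560, 0o101)       # same call as an ordinary store
print(hex(space.read_byte(0o1000)), oct(console.value))
```

The point of the sketch is that a program uses the same read and write operations for a device register as for core, so no separate I/O instructions are required.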

General register

The general registers are named: \( R[0:7](15:0) \); that is, there are 8 registers each with 16 bits. The bits are named starting at the left with bit 15 (the sign bit) and running to the least significant bit 0. There are synonyms for \( R[6] \) and \( R[7] \):

- Stack Pointer/SP(15:0) := \( R[6](15:0) \) used to access a special stack which is used to store the state of interrupts, traps and subroutine calls.
- Program Counter/PC(15:0) := \( R[7](15:0) \) points to the current instruction being interpreted. It will be seen that the fact that PC is one of the general registers is crucial to the design.

Any general register, \( R[0:7] \), can be used as a stack pointer. The special Stack Pointer (SP) has additional properties that force it to be used for changing processor state: interrupts, traps, and subroutine calls. (It can also be used to control dynamic temporary storage subroutines.)

In addition to the above registers there are 8 bits used (from a possible 16) for processor status, called the PS(15:0) register. Four bits are the Condition Codes (CC) associated with arithmetic results; the T-bit controls tracing; and three bits, Priority(2:0), control the priority of running programs. Individual bits are mapped in PS as shown in Appendix 1.
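
As an illustration, the eight status bits can be unpacked as below. The bit positions are not given in this section; the sketch assumes the layout conventionally documented for the PDP-11 (C, V, Z, N in bits 0 to 3, T in bit 4, priority in bits 7:5), and the names `ProcessorStatus` and `decode_ps` are hypothetical.

```python
from typing import NamedTuple


class ProcessorStatus(NamedTuple):
    priority: int   # Priority(2:0), assumed to occupy PS bits 7:5
    trace: bool     # T-bit, assumed PS bit 4
    n: bool         # condition codes, assumed PS bits 3:0
    z: bool
    v: bool
    c: bool


def decode_ps(ps):
    """Unpack the low 8 bits of the PS word under the assumed layout."""
    return ProcessorStatus(
        priority=(ps >> 5) & 0o7,
        trace=bool(ps & 0o20),
        n=bool(ps & 0o10),
        z=bool(ps & 0o4),
        v=bool(ps & 0o2),
        c=bool(ps & 0o1),
    )


print(decode_ps(0o344))
```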

Data types and primitive operations

There are two data lengths in the basic machine: bytes and words, which are 8 and 16 bits, respectively. The non-trivial data types are word length integers (w.i.); byte length integers (b.i.); word length boolean vectors (w.bv), i.e., 16 independent bits (booleans) in a 1 dimensional array; and byte length boolean vectors (b.bv). The operations on byte and word boolean vectors are identical. Since a common use of a byte is to hold several flag bits (booleans), the operations can be combined to form the complete set of 16 operations. The logical operations are: "clear," "complement," "inclusive or," and "implication" (\( x \supset y \), i.e., \( \neg x \lor y \)).
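
The claim that these four operations combine into all 16 two-input boolean functions can be checked by brute force. The sketch below represents each function by its 4-bit truth-table column and repeatedly applies the operations until nothing new appears; the encoding and function names are illustrative.

```python
MASK = 0b1111   # one bit per row of a two-input truth table

X = 0b1100      # column for x over the rows (x, y) = 11, 10, 01, 00
Y = 0b1010      # column for y over the same rows


def complement(a):
    return ~a & MASK


def inclusive_or(a, b):
    return a | b


def implication(a, b):
    """x implies y, computed as (not x) or y."""
    return complement(a) | b


def closure(seeds):
    """All truth tables reachable from the seeds using the four operations."""
    known = set(seeds) | {0}        # "clear" yields the constant-false column
    while True:
        new = {complement(a) for a in known}
        new |= {f(a, b) for a in known for b in known
                for f in (inclusive_or, implication)}
        if new <= known:
            return known
        known |= new


print(len(closure({X, Y})))
```

Complement and inclusive or alone are already sufficient; implication and clear shorten common cases such as masking flag bits.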

There is a complete set of arithmetic operations for the word integers in the basic instruction set. The arithmetic operations are: add, subtract, multiply (optional), divide (optional), compare, add one, subtract one, clear, negate, and multiply and divide by powers of two (shift). Since the address integer size is 16 bits, these data types are most important. Byte length integers are operated on as words by moving them to the general registers where they take on the value of word integers. Word length integer operations are carried out and the results are returned to memory (truncated).
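
The treatment of byte integers can be sketched as follows. The paper says only that a byte takes on the value of a word integer in a register and that the result is truncated on return; sign extension is the assumption made here, since it is what preserves a two's complement value. The function names are illustrative.

```python
def byte_to_word(b):
    """Sign-extend an 8-bit two's complement value to 16 bits (assumed)."""
    b &= 0xFF
    return (b | 0xFF00) if (b & 0x80) else b


def add_words(a, b):
    """16-bit two's complement addition; carries out of bit 15 are dropped."""
    return (a + b) & 0xFFFF


def word_to_byte(w):
    """Truncate a word result to the low-order 8 bits for return to memory."""
    return w & 0xFF


minus_two = byte_to_word(0xFE)                 # -2 as a byte
total = add_words(minus_two, byte_to_word(0x05))
print(hex(minus_two), hex(total), hex(word_to_byte(total)))
```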

The floating point instructions defined by software (not part of the basic instruction set) require the definition of two additional data lengths (of two and three words), i.e., double words (d.w.) and triple words (t.w.). Two additional data types, double integer (d.i.) and triple floating point (t.f. or f), are provided for arithmetic. These data types imply certain additional operations and the conversion to the more primitive data types.

Address (operand) calculation

The general methods provided for accessing operands are the most interesting (perhaps unique) part of the machine’s structure. By defining several access methods to a set of general registers, to memory, or to a stack (controlled by a general register), the computer is able to be a 0, 1 and 2 address machine. The encoding of the instruction Source (S) and Destination (D) fields is given in Figure 10 together with a list of the various access modes that are possible. (Appendix 1 gives a formal description of the effective address calculation process.)

It should be noted from Figure 10 that all the common access modes are included (direct, indirect, immediate, relative, indexed, and indexed indirect) plus several relatively uncommon ones. Relative (to PC) access is used to simplify program loading, while immediate mode speeds up execution. The relatively uncommon access modes, auto-increment and auto-decrement, are used for two purposes: access to a stack under control of the registers* and access to bytes or words organized as strings or vectors. The indirect access mode allows a stack to hold addresses of data (instead of data). This mode is desirable when manipulating longer and variable-length data types (e.g., strings, double fixed and triple floating point).

The register auto increment mode may be used to access a byte string; thus, for example, after each access, the register can be made to point to the next data item. This is used for moving data blocks, searching for particular elements of a vector, and byte-string operations (e.g., movement, comparisons, editing).
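The block-move and search uses mentioned above can be sketched as follows. This is an illustration only: two Python integers stand in for general registers used in auto-increment mode, a `bytearray` stands in for byte memory, and the function names are hypothetical.

```python
from typing import Optional, Tuple


def move_block(memory: bytearray, src: int, dst: int, count: int) -> Tuple[int, int]:
    """Copy `count` bytes, stepping both pointers after each access."""
    r0, r1 = src, dst
    for _ in range(count):
        memory[r1] = memory[r0]  # one byte move per iteration
        r0 += 1                  # auto-increment of the source register
        r1 += 1                  # auto-increment of the destination register
    return r0, r1                # both registers now point past the data


def find_byte(memory: bytearray, start: int, count: int, target: int) -> Optional[int]:
    """Scan a byte string for `target`; return its address or None."""
    r0 = start
    for _ in range(count):
        if memory[r0] == target:
            return r0
        r0 += 1
    return None
```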

*Note, by convention a stack builds toward register 0, and when the stack crosses 400, a stack overflow occurs.

This addressing structure provides flexibility while retaining the same, or better, coding efficiency than classical machines. As an example of the flexibility possible, consider the variations possible with the most trivial word instruction MOVE (see Figure 11).

### Instruction formats

There are several instruction decoding formats depending on whether 0, 1, or 2 operands have to be explicitly referenced. When 2 operands are required, they are identified as Source/S and Destination/D and the result is placed at Destination/D. For single operand instructions (unary operators) the instruction action is D ← u D; and for two operand instructions (binary operators) the action is D ← D b S (where u and b are unary and binary operators, respectively, e.g., +, −, /). Instructions are specified by a 16-bit word. The most common binary operator format (that for operations requiring two addresses) is shown below.

<table>
<thead>
<tr>
<th>op</th>
<th>S</th>
<th>D</th>
</tr>
</thead>
<tbody>
<tr>
<td>15:12</td>
<td>11:6</td>
<td>5:0</td>
</tr>
</tbody>
</table>

The other instruction formats are given in Figure 12.
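To make the two-operand format concrete, the sketch below splits a 16-bit instruction word into the fields named in Appendix 1 (bop in bits 15:12, the source byte in bits 11:6, the destination byte in bits 5:0, each subdivided into mode, defer bit and register). The function name and the dictionary layout are hypothetical.

```python
from typing import Dict


def decode_binary(word: int) -> Dict[str, int]:
    """Split a two-operand instruction word into its Appendix 1 fields."""
    s = (word >> 6) & 0o77   # s/source      := i(11:6)
    d = word & 0o77          # d/destination := i(5:0)
    return {
        "bop": (word >> 12) & 0xF,  # bop := i(15:12)
        "sm": (s >> 4) & 0b11,      # source mode          s(5:4)
        "sd": (s >> 3) & 1,         # source defer bit     s(3)
        "sr": s & 0b111,            # source register      s(2:0)
        "dm": (d >> 4) & 0b11,      # destination mode     d(5:4)
        "dd": (d >> 3) & 1,         # destination defer    d(3)
        "dr": d & 0b111,            # destination register d(2:0)
    }


if __name__ == "__main__":
    # bop = 0001 with register-mode source R1 and destination R2
    print(decode_binary(0o010102))
```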

### Instruction interpretation process

The instruction interpretation process is given in Figure 13, and follows the common fetch-execute cycle. There are three major states: (1) interrupting—the PC and PS are placed on the stack accessed by the Stack Pointer/SP, and the new state is taken from an address specified by the source requesting the trap or interrupt; (2) trace (controlled by T-bit)—essentially one instruction at a time is executed as a trace

Binary arithmetic and logical operations:

\[
\begin{array}{|c|c|c|}
\hline
\text{bop} & S & D \\
\hline
\end{array}
\]

- form: D ← D b S
- example: ADD (:= bop = 0010) → (CC,D ← D + S);

Unary arithmetic and logical operations:

\[
\begin{array}{|c|c|}
\hline
\text{uop} & D \\
\hline
\end{array}
\]

- form: D ← u D
- examples: NEG (:= uop = 0000101100) → (CC,D ← −D); negate
- ASL (:= uop = 0000110011) → (CC,D ← D × 2); shift left

Branch (relative) operators:

\[
\begin{array}{|c|c|}
\hline
\text{brop} & \text{offset} \\
\hline
\end{array}
\]

- form: brop condition → (PC ← PC + offset);
- example: BEQ (:= brop = 03₁₆) → (Z → (PC ← PC + offset));

Jump:

\[
\begin{array}{|c|c|}
\hline
0\ 000\ 000\ 001 & D \\
\hline
\end{array}
\]

- form: PC ← D
- Jump to subroutine: 0 000 100 | sr | D
- form: save R[sr] on stack, enter subroutine at D

Misc. operations:

- example: HALT (:= i = 0) → (Run ← 0);

Note: these instructions are all 1 word. D and/or S may each require 1 additional immediate data or address word. Thus instructions can be 1, 2, or 3 words long.

Figure 12—PDP-11 instruction formats (simplified)

Examples of addressing schemes

Use as a stack (zero address) machine

Figure 14 lists typical zero-address machine instructions together with the PDP-11 instructions which perform the same function. It should be noted that translation (compilation) from normal infix expressions to reverse Polish is a comparatively trivial task. Thus, one of the primary reasons for using stacks is for the evaluation of expressions in reverse Polish form.

Consider an assignment statement of the form

\[ D \leftarrow A + B/C \]

which has the reverse Polish form

\[ DABC/+\leftarrow \]

and would normally be encoded on a stack machine as follows:

- load stack address of D
- load stack A
- load stack B
- load stack C
- /
- +
- store

Common stack instructions:
- Load stack from memory address specified by stack
- Load stack from memory location A
- Store stack at memory address specified by stack
- Store stack at memory location A
- Duplicate top of stack
- Add 2 top data of stack to stack
- Subtract, multiply, divide
- Negate top data of stack
- Clear top data of stack
- Duplicate top of stack
- Add addressed location A to top of stack
- Jump unconditional
- Reset stack location to W
- ∧, "and" 2 top stack data

Equivalent PDP-11 instructions:
- MOVE (R0)+, -(R0)
- MOVE (R0)+, (R0)+
- MOVE (R0)+, A
- ADD (R0)+, -(R0)
- MOVE (R0)+, -(R0)
- MOVE (R0)+, A
- ADD (R0)+, (R0)
- (see add)
- NEG A
- CLR A
- BTST (R0)+, (R0)
- COM 0
- TST 0
- BR (=, ≠, >, ≥, <, ≤)
- JUMP
- MOVE (R0)+, R1
- MOVE (R0)+, R2
- MOVE (R0)+, (R0)
- MOVE (R0)+, (R0)
- MOVE (R0)+, (R0)
- MOVE (R0)+, (R0)
- MOVE (R0)+, (R0)
- MOVE (R0)+, (R0)
- MOVE (R0)+, (R0)

Figure 14—Stack computer instructions and equivalent PDP-11 instructions

However, with the PDP-11 there is an address method for improving the program encoding and run time, while not losing the stack concept. An encoding improvement is made by doing an operation to the top of the stack from a direct memory location (while loading). Thus the previous example could be coded as:

- Load stack B
- Divide stack by C
- Add A to stack
- Store stack D
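The two encodings of D ← A + B/C can be compared with a small interpreter. This is a sketch under stated assumptions: the operation names (`push_addr`, `load`, `div_mem`, and so on) are hypothetical labels for the steps listed in the text, memory is a Python dictionary keyed by variable name, and division is integer division.

```python
from typing import Dict, List, Tuple

Program = List[Tuple[str, ...]]


def run(program: Program, memory: Dict[str, int]) -> Dict[str, int]:
    """Execute a tiny stack program against a name-addressed memory."""
    stack: list = []
    for op, *arg in program:
        if op == "push_addr":      # load stack with the address of a variable
            stack.append(arg[0])
        elif op == "load":         # load stack with the value of a variable
            stack.append(memory[arg[0]])
        elif op in ("add", "div"):  # pure zero-address operators
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "add" else left // right)
        elif op == "store":        # store value at the address beneath it
            value = stack.pop()
            memory[stack.pop()] = value
        elif op == "div_mem":      # operate on top of stack from memory
            stack[-1] //= memory[arg[0]]
        elif op == "add_mem":
            stack[-1] += memory[arg[0]]
        elif op == "store_mem":
            memory[arg[0]] = stack.pop()
        else:
            raise ValueError(f"unknown operation: {op}")
    return memory


# Reverse Polish form D A B C / + ←  (seven steps)
PURE_STACK: Program = [("push_addr", "D"), ("load", "A"), ("load", "B"),
                       ("load", "C"), ("div",), ("add",), ("store",)]

# Stack plus direct memory operands (four steps)
MIXED: Program = [("load", "B"), ("div_mem", "C"), ("add_mem", "A"),
                  ("store_mem", "D")]
```

Both programs leave the same value in D; the second simply needs fewer steps because each operator fetches one operand directly from memory.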

Use as a one-address (general register) machine

The PDP-11 is a general register computer and should be judged on that basis. Benchmarks have been coded to compare the PDP-11 with the larger DEC PDP-10. A 16 bit processor performs better than the DEC PDP-10 in terms of bit efficiency, but not with time or memory cycles. A PDP-11 with a 32 bit wide memory would, however, decrease time by nearly a factor of two, making the times essentially comparable.

Use as a two-address machine

Figure 15 lists typical two-address machine instructions together with the equivalent PDP-11 instructions for performing the same operations. The most useful instruction is probably the MOVE instruction because it does not use the stack or general registers. Unary instructions which operate on and test primary memory are also useful and efficient instructions.

Extensions of the instruction set for real (floating point) arithmetic

The most significant factor that affects performance is whether a machine has operators for manipulating data in a particular format. The inherent generality of a stored program computer allows any computer by subroutine to simulate any other—given enough time and memory. The biggest and perhaps only factor that separates a small computer from a large computer is whether floating point data is understood by the computer. For example, a small computer with a cycle time of 1.0 microseconds and 16 bit memory width might have the following characteristics for a floating point add, excluding data access:

programmed: 250 microseconds
programmed (but special normalize and differencing of exponent instructions): 75 microseconds
microprogrammed hardware: 25 microseconds
hardwired: 2 microseconds
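The spread between these figures can be checked with a few lines of arithmetic. The sketch below only restates the four timings listed above; the dictionary keys and the function name are hypothetical.

```python
from typing import Dict

# Floating point add times from the list above, in microseconds.
TIMINGS_US: Dict[str, float] = {
    "programmed": 250.0,
    "programmed with special instructions": 75.0,
    "microprogrammed hardware": 25.0,
    "hardwired": 2.0,
}


def slowdown_vs_hardwired(timings: Dict[str, float]) -> Dict[str, float]:
    """Express each implementation's time as a multiple of the hardwired time."""
    base = timings["hardwired"]
    return {name: value / base for name, value in timings.items()}


if __name__ == "__main__":
    for name, ratio in slowdown_vs_hardwired(TIMINGS_US).items():
        print(f"{name}: {ratio:g}x")
```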

It should be noted that the ratio between programmed and hardwired interpretation is roughly two orders of magnitude. The basic hardwiring scheme and the programmed scheme should allow binary program compatibility, assuming there is an interpretive program for the various operators in the Model 20. For example, consider one scheme which would add eight 48 bit registers which are addressable in the extended instruction set. The eight floating registers, F, would be mapped into eight double length (32 bit) registers, D. In order to access the various parts of F or D registers, registers F0 and F1 are mapped onto registers R0 to R2 and R3 to R5.

Two-address computer:
- A ← B; transfer B to A
- A ← A + B; add
- A ← −A; negate
- A ← A ∨ B; inclusive or
- A ← ¬A; not
- Jump unconditioned
- Test A, and transfer to B

PDP-11:
- MOVE B, A
- ADD B, A
- NEG A
- BTST A
- COM A
- JUMP
- BR (=, ≠, >, ≥, <, ≤)

Since the instruction set operation code is almost completely encoded already for byte and word length data, a new encoding scheme is necessary to specify the proposed additional instructions. This scheme adds two instructions: enter floating point mode and execute one floating point instruction. The instructions for floating point and double word data would be:

<table>
<thead>
<tr>
<th>binary ops</th>
<th>op</th>
<th>floating point/f and double word/d</th>
</tr>
</thead>
<tbody>
<tr>
<td>bop’ S</td>
<td>←</td>
<td>FMOVE</td>
</tr>
<tr>
<td></td>
<td>+</td>
<td>FADD</td>
</tr>
<tr>
<td></td>
<td>−</td>
<td>FSUB</td>
</tr>
<tr>
<td></td>
<td>×</td>
<td>FMUL</td>
</tr>
<tr>
<td></td>
<td>/</td>
<td>FDIV</td>
</tr>
<tr>
<td></td>
<td></td>
<td>FCMP</td>
</tr>
<tr>
<td>unary ops</td>
<td></td>
<td>FNEG</td>
</tr>
<tr>
<td>uop’ D</td>
<td>−</td>
<td>DNEG</td>
</tr>
</tbody>
</table>

LOGICAL DESIGN OF S(UNIBUS) AND Pc

The logical design level is concerned with the physical implementation and the constituent combinatorial and sequential logic elements which form the various computer components (e.g., processors, memories, controls). Physically, these components are separate and connected to the Unibus following the lines of the PMS structure.

Unibus organization

Figure 16 gives a PMS diagram of the Pc and the entering signals from the Unibus. The control unit for the Unibus, housed in Pc for the Model 20, is not shown in the figure.

The PDP-11 Unibus has 56 bi-directional signals conventionally used for program-controlled data transfers (processor to control), direct-memory data transfers (processor or control to memory) and control-to-processor interrupt. The Unibus is interlocked; thus transactions operate independent of the bus length and response time of the master and slave. Since the bus is bi-directional and is used by all devices, any device can communicate with any other device. The controlling device is the master, and the device to which the master is communicating is the slave. For example, a data transfer from processor (master) to memory (always a slave) uses the Data Out dialogue facility for writing and a transfer from memory to processor uses the Data In dialogue facility for reading.
The assignment of bus mastership is done concurrently with normal communication (dialogues).

**Unibus dialogues**

Three types of dialogues use the Unibus. All the dialogues have a common protocol which first consists of obtaining the bus mastership (which is done concurrently with a previous transaction) followed by a data exchange with the requested device. The dialogues are: Interrupt; Data In and Data In Pause; and Data Out and Data Out Byte.

**Interrupt**

Interrupt can be initiated by a master immediately after receiving bus mastership. An address is transmitted from the master to the slave on Interrupt. Normally, subordinate control devices use this method to transmit an interrupt signal to the processor.

**Data in and data in pause**

These two bus operations transmit slave’s data (whose address is specified by the master) to the master. For the Data In Pause operation data is read into the master and the master responds with data which is to be rewritten in the slave.

**Data out and data out byte**

These two operations transfer data from the master to the slave at the address specified by the master. For Data Out a word at the address specified by the address lines is transferred from master to slave. Data Out Byte allows a single data byte to be transmitted.
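The four data dialogues can be pictured with a toy memory slave. This is a sketch only: it models the data exchanged, not the bus signals or the interlocked handshake. The class and method names are hypothetical, and the placement of the odd-addressed byte in the high half of the word is an assumption that the text does not state.

```python
from typing import Callable, List


class MemorySlave:
    """Toy word memory that answers the Unibus data dialogues."""

    def __init__(self, words: int) -> None:
        self.words: List[int] = [0] * words

    def data_in(self, address: int) -> int:
        """Data In: slave's word at `address` is sent to the master."""
        return self.words[address // 2]

    def data_out(self, address: int, word: int) -> None:
        """Data Out: master's word is written at `address`."""
        self.words[address // 2] = word & 0xFFFF

    def data_out_byte(self, address: int, byte: int) -> None:
        """Data Out Byte: only one byte of the addressed word changes."""
        index = address // 2
        if address % 2:  # odd address -> high byte (assumed byte order)
            self.words[index] = (self.words[index] & 0x00FF) | ((byte & 0xFF) << 8)
        else:            # even address -> low byte (assumed byte order)
            self.words[index] = (self.words[index] & 0xFF00) | (byte & 0xFF)

    def data_in_pause(self, address: int, modify: Callable[[int], int]) -> int:
        """Data In Pause: read a word, then rewrite it with the master's reply."""
        word = self.data_in(address)
        self.data_out(address, modify(word))
        return word
```

A read-modify-write such as an increment in memory maps naturally onto `data_in_pause`, since the master reads the word and replies with the value to be rewritten.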

**Processor logical design**

The Pc is designed using TTL logical design components and occupies approximately eight 8" × 12" printed circuit boards. The organization of the logic is shown in Figure 17. The Pc is physically connected to two other components, the console and the Unibus. The control for the Unibus is housed in the Pc and occupies one of the printed circuit boards. The most regular part of the Pc, the arithmetic and state section, is shown at the top of the figure. It consists of the 16-word scratch-pad memory and the combinatorial logic data operators, D(shift) and D(add, logical ops). The 16-word memory holds most of the 8-word processor state found in the ISP, and the 8 bits that form the Status word are stored in an 8-bit register. The input to the adder-shift network has two latches which are either memories or gates. The output of the adder-shift network can be read to either the data or address parts of the Unibus, or back to the scratch-pad array.

The instruction decoding and arithmetic control are less regular than the above data and state sections, and these are shown in the lower part of the figure. There are two major sections: the instruction fetching and decoding control and the instruction set interpreter (which in effect defines the ISP). The latter control section operates on, hence controls, the arithmetic and state parts of the Pc. A final control is concerned with the interface to the Unibus (distinct from the Unibus control that is housed in the Pc).

**CONCLUSIONS**

In this paper we have endeavored to give a complete description of the PDP-11 Model 20 computer at four descriptive levels. These present an unambiguous specification at two levels (the PMS structure and the ISP), and, in addition, specify the constraints for the design at the top level, and give the reader some idea of the implementation at the bottom level logical design. We have also presented guidelines for forming additional models that would belong to the same family.

**ACKNOWLEDGMENTS**

The authors are grateful to Mr. Nigberg of the technical publication department at DEC and to the reviewers for their helpful criticism. We are especially grateful to Mrs. Dorothy Josephson at Carnegie-Mellon University for typing the notation-laden manuscript.


APPENDIX 1

DEC PDP-11 instruction set processor description (in ISP*)

The following description is not a detailed description of the instructions. The description omits the trap behavior of unimplemented instructions, references to non-existent primary memory and io devices, SP (stack) overflow, and power failure.

Primary Memory State

M/Mb/Memory[0:2^16-1](7:0) (byte memory)
Mw[0:2^16-1](15:0) := M[0:2^16-1](7:0) (word memory mapping)

Processor State (9 words)

R/Registers[0:7](15:0) (word general registers)
SP(15:0) := R[6](15:0) (stack pointer)
PC(15:0) := R[7](15:0) (program counter)

*ISP NOTATION

Although the ISP language has not been described in publications, its syntax is similar to other languages. The language is inherently interpreted in parallel, thus to get sequential evaluation the word "next" must be used. Italics are used for comments. The following notes are in order:

- a := f( . . . ): equivalence or substitution process used for name and process substitution. For every occurrence of a, f( . . . ) replaces it.
- a ← f( . . . ): replacement operator; the contents of register a are replaced by the value of the function.
- a:b: range used in array declarations, e.g., Q[0:11][0:4095](15:0); denotes a range of characters a, a+1, . . . , b to base n. If n is not given, the base is 2.
- [c:d]: array designation c, c+1, . . . , d
- a → b: equivalent to ALGOL if a then b
- "next": sequential interpretation
- instruction declaration, e.g., ADD (:= bop = 0010) → (CC, D ← D + S)

operators := (+/add | −/subtract | negate | ×/multiply | /divide | ∧/and | ∨/or | ¬/not | ⊕/exclusive or | =/equal | >/greater than | ≥ | ≤ | ≠ | modulo | etc.)

PS(15:0) (processor state register)
Priority/P(2:0) := PS(7:5) (under program control; priority level of the process currently being interpreted; a higher level process may interrupt or trap this process)
CC/Condition_Codes(3:0) := PS(3:0)
Carry/C := CC(0) (a result condition code indicating an arithmetic carry from bit 15 of the last operation)
Negative/N := CC(3) (a result condition code indicating last result was negative)
Zero/Z := CC(2) (a result condition code indicating last result was zero)
Overflow/V := CC(1) (a result condition code indicating an arithmetic overflow of the last operation)
Trace/T := PS(4) (under program control; when set, each instruction executed will trap; used for interpretive and break-point debugging; denotes whether instruction trace trap is to occur after each instruction is executed)
Undefined(7:0) := PS(15:8) (unused)
Run (denotes normal execution)
Wait (denotes waiting for an interrupt)

Instruction Format
(Bit assignments used in the various instruction formats)

i/instruction(15:0)
bop(3:0) := i(15:12) (binary operation code)
uop(15:6) := i(15:6) (unary operation code)
brop(15:8) := i(15:8) (branch operation code)
sop(15:6) := i(15:6) (shift operation code)
s/source(5:0) := i(11:6) (source control byte)
  sm(0:1) := s(5:4) (source mode control)
  sd := s(3) (source defer bit)
  sr := s(2:0) (source register)
d/destination(5:0) := i(5:0)
  dm(0:1) := d(5:4)
  dd := d(3)
  dr(2:0) := d(2:0)
offset(7:0) := i(7:0) (signed 7 bit integer)
address_increment/ai (implicit bit derived from i to denote byte or word length operations)

Data Types
by/byte(7:0)
w/word(15:0)
by.i/byte.integer(7:0) (signed integers)
w.i/word.integer(15:0) (signed integers)
by.bv/byte.boolean_vector(7:0) (boolean vectors (bits))
w.bv/word.boolean_vector(15:0) (boolean vectors (bits))

d/double word(31:0)
t/triple word(47:0)
f/floating point(47:0)

**Source/S and Destination/D Calculation**

\[
\begin{aligned}
\text{S/Source}(15:0) := (\, & \neg \text{sd} \rightarrow ( \\
& \quad (sm = 00) \rightarrow R[\text{sr}]; \\
& \quad (sm = 01) \land (sr \neq 7) \rightarrow (M[R[\text{sr}]];\ \text{next } R[\text{sr}] \leftarrow R[\text{sr}] + ai); \\
& \quad (sm = 01) \land (sr = 7) \rightarrow (M[\text{PC}];\ PC \leftarrow PC + 2); \\
& \quad (sm = 10) \rightarrow (R[\text{sr}] \leftarrow R[\text{sr}] - ai;\ \text{next } M[R[\text{sr}]]); \\
& \quad (sm = 11) \land (sr \neq 7) \rightarrow (M[M[\text{PC}] + R[\text{sr}]];\ PC \leftarrow PC + 2); \\
& \quad (sm = 11) \land (sr = 7) \rightarrow (M[M[\text{PC}] + PC];\ PC \leftarrow PC + 2)); \\
& \text{sd} \rightarrow ( \\
& \quad (sm = 00) \rightarrow M[R[\text{sr}]]; \\
& \quad (sm = 01) \land (sr \neq 7) \rightarrow (M[M[R[\text{sr}]]];\ \text{next } R[\text{sr}] \leftarrow R[\text{sr}] + ai); \\
& \quad (sm = 01) \land (sr = 7) \rightarrow (M[M[\text{PC}]];\ PC \leftarrow PC + 2); \\
& \quad (sm = 10) \rightarrow (R[\text{sr}] \leftarrow R[\text{sr}] - ai;\ \text{next } M[M[R[\text{sr}]]]); \\
& \quad (sm = 11) \land (sr \neq 7) \rightarrow (M[M[M[\text{PC}] + R[\text{sr}]]];\ PC \leftarrow PC + 2); \\
& \quad (sm = 11) \land (sr = 7) \rightarrow (M[M[M[\text{PC}] + PC]];\ PC \leftarrow PC + 2)))
\end{aligned}
\]

The above process defines how operands are determined (accessed) from either memory or the registers. The various length operands, Db (byte), Dw (word), Dd (double), and Df (floating), are not completely defined. The Source/S and Destination/D processes are identical. In the case of a jump instruction, an address, D', is used instead of the word in location M[D'].
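The source calculation can be sketched as a single routine. This is an illustration, not the authors' definition: it assumes that every deferred (sd = 1) form adds exactly one further level of memory indirection to the corresponding non-deferred form, that memory is word-addressed through a Python dictionary, and that the indexed form reads PC before it is stepped, as the parallel reading of the listing suggests. The name `fetch_source` and the default `ai=2` are hypothetical.

```python
from typing import Dict, List

MASK = 0xFFFF
PC = 7  # the program counter is general register 7


def fetch_source(reg: List[int], mem: Dict[int, int],
                 sm: int, sd: int, sr: int, ai: int = 2) -> int:
    """Return the source operand for mode `sm`, defer bit `sd`, register `sr`."""

    def m(addr: int) -> int:
        return mem.get(addr & MASK, 0)

    if sm == 0b00:                      # register
        value = reg[sr]
    elif sm == 0b01:                    # auto-increment (immediate when sr is PC)
        value = m(reg[sr])
        step = 2 if sr == PC else ai
        reg[sr] = (reg[sr] + step) & MASK
    elif sm == 0b10:                    # auto-decrement
        reg[sr] = (reg[sr] - ai) & MASK
        value = m(reg[sr])
    else:                               # indexed (relative when sr is PC)
        value = m(m(reg[PC]) + reg[sr])
        reg[PC] = (reg[PC] + 2) & MASK
    return m(value) if sd else value    # deferred: one more indirection
```

With R1 = 1000 (octal) and M[1000] = 2000, the auto-increment form returns 2000 and leaves R1 at 1002, while its deferred form returns M[2000].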

**Instruction Interpretation Process**

¬Interrupt_rq ∧ Run ∧ ¬Wait → (i ← M[PC]; PC ← PC + 2; next (fetch)
  Instruction_execution; next (execute)
  T → (SP ← SP − 2; next
    M[SP] ← PS;
    SP ← SP − 2; next
    M[SP] ← PC;
    PC ← M[14₈];
    PS ← M[16₈]));

Interrupt_rq[i] ∧ (CC[i] > CC) ∧ Run → (T ← 0;
  SP ← SP − 2; next
  M[SP] ← PS;

Instruction Set and the Execution Process

The following instruction set will be defined briefly and is incomplete. It is intended to give the reader a simple understanding of the machine operation.

Instruction execution := (MOV(:= bop = 0001) → (CC,D ← S); (move word)

MOVB(:= bop = 1001) → (CC,Db ← Sb); (move byte)

\* not hardwired or optional

Binary Arithmetic: \( D \leftarrow D\ b\ S; \)

\[
\begin{array}{lll}
\text{ADD}(:= \text{bop} = 0110) & \rightarrow (CC,D \leftarrow D + S); & \text{(add)} \\
\text{SUB}(:= \text{bop} = 1110) & \rightarrow (CC,D \leftarrow D - S); & \text{(subtract)} \\
\text{CMP}(:= \text{bop} = 0010) & \rightarrow (CC \leftarrow D - S); & \text{(word compare)} \\
\text{CMPB}(:= \text{bop} = 1010) & \rightarrow (CC \leftarrow Db - Sb); & \text{(byte compare)} \\
\text{MUL}(:= \text{bop} = 0111) & \rightarrow (CC,D \leftarrow D \times S); & \text{(*multiply; if D is a register then a double length operator)} \\
\text{DIV}(:= \text{bop} = 1111) & \rightarrow (CC,D \leftarrow D / S); & \text{(*divide; if D is a register, then a remainder is saved)}
\end{array}
\]

Unary Arithmetic: \( D \leftarrow u\ D; \)

\[
\begin{array}{lll}
\text{CLR}(:= \text{uop} = 050_8) & \rightarrow (CC,D \leftarrow 0); & \text{(clear word)} \\
\text{CLRB}(:= \text{uop} = 1050_8) & \rightarrow (CC,Db \leftarrow 0); & \text{(clear byte)} \\
\text{COM}(:= \text{uop} = 051_8) & \rightarrow (CC,D \leftarrow \neg D); & \text{(complement word)} \\
\text{COMB}(:= \text{uop} = 1051_8) & \rightarrow (CC,Db \leftarrow \neg Db); & \text{(complement byte)} \\
\text{INC}(:= \text{uop} = 052_8) & \rightarrow (CC,D \leftarrow D + 1); & \text{(increment word)} \\
\text{INCB}(:= \text{uop} = 1052_8) & \rightarrow (CC,Db \leftarrow Db + 1); & \text{(increment byte)} \\
\text{DEC}(:= \text{uop} = 053_8) & \rightarrow (CC,D \leftarrow D - 1); & \text{(decrement word)} \\
\text{DECB}(:= \text{uop} = 1053_8) & \rightarrow (CC,Db \leftarrow Db - 1); & \text{(decrement byte)} \\
\text{NEG}(:= \text{uop} = 054_8) & \rightarrow (CC,D \leftarrow -D); & \text{(negate)} \\
\text{NEGB}(:= \text{uop} = 1054_8) & \rightarrow (CC,Db \leftarrow -Db); & \text{(negate byte)} \\
\text{ADC}(:= \text{uop} = 055_8) & \rightarrow (CC,D \leftarrow D + C); & \text{(add the carry)} \\
\text{ADCB}(:= \text{uop} = 1055_8) & \rightarrow (CC,Db \leftarrow Db + C); & \text{(add to byte the carry)} \\
\text{SBC}(:= \text{uop} = 056_8) & \rightarrow (CC,D \leftarrow D - C); & \text{(subtract the carry)} \\
\text{SBCB}(:= \text{uop} = 1056_8) & \rightarrow (CC,Db \leftarrow Db - C); & \text{(subtract from byte the carry)} \\
\text{TST}(:= \text{uop} = 057_8) & \rightarrow (CC \leftarrow D); & \text{(test)} \\
\text{TSTB}(:= \text{uop} = 1057_8) & \rightarrow (CC \leftarrow Db); & \text{(test byte)}
\end{array}
\]

Shift operations: \( D \leftarrow D \times 2^n; \)

\[
\begin{array}{lll}
\text{ROR}(:= \text{sop} = 060_8) & \rightarrow (CC,D \leftarrow C \mathbin{\square} D / 2\ \{\text{rotate}\}); & \text{(rotate right)} \\
\text{RORB}(:= \text{sop} = 1060_8) & \rightarrow (CC,Db \leftarrow C \mathbin{\square} Db / 2\ \{\text{rotate}\}); & \text{(byte rotate right)} \\
\text{ROL}(:= \text{sop} = 061_8) & \rightarrow (CC,D \leftarrow C \mathbin{\square} D \times 2\ \{\text{rotate}\}); & \text{(rotate left)} \\
\text{ROLB}(:= \text{sop} = 1061_8) & \rightarrow (CC,Db \leftarrow C \mathbin{\square} Db \times 2\ \{\text{rotate}\}); & \text{(byte rotate left)} \\
\text{ASR}(:= \text{sop} = 062_8) & \rightarrow (CC,D \leftarrow D / 2); & \text{(arithmetic shift right)} \\
\text{ASRB}(:= \text{sop} = 1062_8) & \rightarrow (CC,Db \leftarrow Db / 2); & \text{(byte arithmetic shift right)} \\
\text{ASL}(:= \text{sop} = 063_8) & \rightarrow (CC,D \leftarrow D \times 2); & \text{(arithmetic shift left)} \\
\text{ASLB}(:= \text{sop} = 1063_8) & \rightarrow (CC,Db \leftarrow Db \times 2); & \text{(byte arithmetic shift left)} \\
\text{ROT}(:= \text{sop} = 064_8) & \rightarrow (CC,D \leftarrow D \times 2^2); & \text{(rotate)} \\
\text{ROTB}(:= \text{sop} = 1064_8) & \rightarrow (CC,Db \leftarrow Db \times 2^2); & \text{(byte rotate)} \\
\text{LSH}(:= \text{sop} = 065_8) & \rightarrow (CC,D \leftarrow D \times 2\ \{\text{logical}\}); & \text{(*logical shift)} \\
\text{LSHB}(:= \text{sop} = 1065_8) & \rightarrow (CC,Db \leftarrow Db \times 2\ \{\text{logical}\}); & \text{(*byte logical shift)} \\
\text{ASH}(:= \text{sop} = 066_8) & \rightarrow (CC,D \leftarrow D \times 2^2); & \text{(*arithmetic shift)} \\
\text{ASHB}(:= \text{sop} = 1066_8) & \rightarrow (CC,Db \leftarrow Db \times 2^2); & \text{(*byte arithmetic shift)} \\
\text{NOR}(:= \text{sop} = 067_8) & \rightarrow (CC,D \leftarrow \text{normalize}(D)); & \text{(*normalize)} \\
\text{NORD}(:= \text{sop} = 1067_8) & \rightarrow (Db \leftarrow \text{normalize}(Dd)); & \text{(*normalize double)} \\
\text{SWAB}(:= \text{sop} = 3) & \rightarrow (CC,D \leftarrow D(7:0, 15:8)); &
\end{array}
\]

Logical Operations

\[
\begin{array}{lll}
\text{BIC}(:= \text{bop} = 0100) & \rightarrow (CC,D \leftarrow D \land \neg S); & \text{(bit clear)} \\
\text{BICB}(:= \text{bop} = 1100) & \rightarrow (CC,Db \leftarrow Db \land \neg Sb); & \text{(byte bit clear)} \\
\text{BIS}(:= \text{bop} = 0101) & \rightarrow (CC,D \leftarrow D \lor S); & \text{(bit set)} \\
\text{BISB}(:= \text{bop} = 1101) & \rightarrow (CC,Db \leftarrow Db \lor Sb); & \text{(byte bit set)} \\
\text{BIT}(:= \text{bop} = 0011) & \rightarrow (CC \leftarrow D \land S); & \text{(bit test under mask)} \\
\text{BITB}(:= \text{bop} = 1011) & \rightarrow (CC \leftarrow Db \land Sb); & \text{(byte bit test under mask)}
\end{array}
\]

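The word-length rotate and arithmetic shift definitions above can be made concrete as follows. The sketch assumes that the rotates treat the carry bit C and the 16-bit word D as one 17-bit quantity with C as the most significant bit; it returns the new word (and, for rotates, the new carry) and does not model how the other condition codes are set. The function names are hypothetical.

```python
from typing import Tuple

WORD_MASK = 0xFFFF


def ror(d: int, c: int) -> Tuple[int, int]:
    """Rotate the 17-bit quantity C,D right by one; return (D, C)."""
    combined = ((c & 1) << 16) | (d & WORD_MASK)
    new_c = combined & 1            # bit 0 of D falls into the carry
    return (combined >> 1) & WORD_MASK, new_c


def rol(d: int, c: int) -> Tuple[int, int]:
    """Rotate the 17-bit quantity C,D left by one; return (D, C)."""
    combined = ((c & 1) << 16) | (d & WORD_MASK)
    rotated = ((combined << 1) | (combined >> 16)) & 0x1FFFF
    return rotated & WORD_MASK, rotated >> 16


def asr(d: int) -> int:
    """Arithmetic shift right: divide by two, keeping the sign bit."""
    d &= WORD_MASK
    return (d >> 1) | (d & 0x8000)


def asl(d: int) -> int:
    """Arithmetic shift left: multiply by two, truncated to 16 bits."""
    return (d << 1) & WORD_MASK
```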
Branches and Subroutine Calling: PC ← f;
JMP(:= sop = 0001₈) → (PC ← D);
BR(:= brop = 01₁₆) → (PC ← PC + offset);
BEQ(:= brop = 03₁₆) → (Z → (PC ← PC + offset));
BNE(:= brop = 02₁₆) → (¬Z → (PC ← PC + offset));
BLT(:= brop = 05₁₆) → (N ⊕ V → (PC ← PC + offset));
BGE(:= brop = 04₁₆) → (N = V → (PC ← PC + offset));
BLE(:= brop = 07₁₆) → (Z ∨ (N ⊕ V) → (PC ← PC + offset));
BGT(:= brop = 06₁₆) → (¬(Z ∨ (N ⊕ V)) → (PC ← PC + offset));
BCS/BLO(:= brop = 87₁₆) → (C → (PC ← PC + offset));

BCC/BHIS(:= brop = 86₁₆) → (¬C → (PC ← PC + offset));
BLS(:= brop = 83₁₆) → (C ∨ Z → (PC ← PC + offset));
BHI(:= brop = 82₁₆) → (¬(C ∨ Z) → (PC ← PC + offset));
BVS(:= brop = 85₁₆) → (V → (PC ← PC + offset));
BVC(:= brop = 84₁₆) → (¬V → (PC ← PC + offset));
BMI(:= brop = 81₁₆) → (N → (PC ← PC + offset));
BPL(:= brop = 80₁₆) → (¬N → (PC ← PC + offset));

JSR(:= sop = 0040₈) → (i = 2; next
  SP ← SP − 2; next
  M[SP] ← R[sr];
  R[sr] ← PC;
  PC ← D);
RTS(:= i = 00020₈) → (i = 4; next
  PC ← R[dr];
  R[dr] ← M[SP];
  SP ← SP + 2);

RTI(:= i = 2₈) → (PC ← M[SP];
  SP ← SP + 2; next
  PS ← M[SP];
  SP ← SP + 2);
HALT(:= i = 0) → (Run ← 0);
WAIT(:= i = 1) → (Wait ← 1);
TRAP(:= i = 3) → (SP ← SP − 2; next
  M[SP] ← PS;
  SP ← SP − 2; next
  M[SP] ← PC;
  PC ← M[34];
  PS ← M[36]);
EMT(:= brop = 88₁₆) → (i = 4; next
  SP ← SP − 2; next
  M[SP] ← PS;
  SP ← SP − 2; next
  M[SP] ← PC;
  PC ← M[30];
  PS ← M[32]);

IOT(:= i = 4) → (see TRAP)
RESET(:= i = 5) → (not described)
OPERATE(:= i(15:5) = 5) → (i(4) → (CC ← CC ∨ i(3:0));
  ¬i(4) → (CC ← CC ∧ ¬i(3:0)));

end Instruction execution
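The conditional branches can be sketched as a table of predicates over the condition codes N, Z, V and C. This is an illustration under stated assumptions: the printed conditions for some branches are hard to read, so the predicates below use the conventional signed and unsigned comparison tests (for example, BGT as the complement of BLE); the offset is treated as a signed 8-bit value and added directly to PC, as the listing states. All function names are hypothetical.

```python
from typing import Dict


def sign_extend_offset(offset: int) -> int:
    """Interpret offset(7:0) as a signed integer."""
    offset &= 0xFF
    return offset - 0x100 if offset & 0x80 else offset


def condition_holds(mnemonic: str, n: bool, z: bool, v: bool, c: bool) -> bool:
    """Evaluate the branch condition for `mnemonic` (assumed predicates)."""
    table: Dict[str, bool] = {
        "BR": True,
        "BEQ": z,
        "BNE": not z,
        "BLT": n != v,                  # N exclusive-or V
        "BGE": n == v,
        "BLE": z or (n != v),
        "BGT": not (z or (n != v)),
        "BCS": c,
        "BCC": not c,
        "BLS": c or z,
        "BHI": not (c or z),
        "BVS": v,
        "BVC": not v,
        "BMI": n,
        "BPL": not n,
    }
    return table[mnemonic]


def next_pc(pc: int, mnemonic: str, offset: int,
            n: bool, z: bool, v: bool, c: bool) -> int:
    """Return PC after the branch: PC + offset if taken, else unchanged."""
    if condition_holds(mnemonic, n, z, v, c):
        return (pc + sign_extend_offset(offset)) & 0xFFFF
    return pc
```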