

THE Am29116

## WESCON PAPER (1979)

William J. Harmon, Jr. Warren K. Miller

Advanced Micro Devices 901 Thompson Place Sunnyvale, CA 94086

#### INTRODUCTION

The Am29116 is a high-performance 16-bit bipolar microprocessor intended for use in microprogrammed systems, particularly peripheral controllers. It is also suitable for communication controllers, industrial controllers and digital modems, and can be used in microprogrammed processor applications. In addition to its complete arithmetic and logic instruction set, the Am29116 instruction set contains functions particularly useful in controller applications: bit set, bit reset, bit test, rotate and merge, rotate and compare, and cyclic-redundancy-check (CRC) generation.

#### OUTSTANDING FEATURES

16-Bit Data Path - The Am29116 contains a 16-bit data path with full carry lookahead over all 16 bits in the ALU during arithmetic operation. In order to facilitate interfacing the device to other circuits, the Am29116 has the ability to execute all instructions in either the 16-bit word or 8-bit byte mode.

32 Working Registers - In order to provide adequate on-chip storage, the Am29116 contains 32 working registers arranged in a single port RAM architecture. With the use of an external multiplexer, it is possible to select separate read and write addresses for the same instruction. The device also contains a 16-bit Accumulator and a 16-bit Data Latch.

16-Bit Barrel Shifter - A 16-bit Barrel Shifter which can rotate an input up to 15 positions is also included in the device. Like the ALU, the barrel shifter can work in either the word or byte mode.

Status Register and Condition-Code Generator/Multiplexer - The Am29116 contains an 8-bit Status Register and a Condition-Code Generator/Multiplexer. The Status Register stores the four ALU status outputs, Z, C, N and OVR, as well as a Link bit for shifting and three user-definable Flag bits. The Condition-Code Generator/Multiplexer allows testing of 12 different test conditions. The output of the Condition-Code Generator/Multiplexer can be connected directly to the conditional test input of a microprogram sequencer.

Immediate Instruction Capability - Immediate instructions can be executed by the Am29116. These are two-microcycle instructions. The first instruction contains information necessary to perform the instruction. The second instruction contains immediate data, which is entered via the 16 Instruction Inputs.

CRC Generation - The Am29116 has instructions which perform cyclic-redundancy-check (CRC) calculations for any CRC polynomial of 16 bits or less.

Powerful Instruction Set - The instruction set of the Am29116 is very powerful. In addition to the normal single- and two-operand logical and arithmetic instructions, the Am29116 can also execute the following instructions in a single microcycle: rotate and merge, rotate and compare, and prioritize.

#### ARCHITECTURE OF THE Am29116

The Am29116 is a high-performance, microprogrammable 16-bit bipolar microprocessor. This 48-pin device is designed internally with ECL (emitter-coupled logic) circuitry and has TTL to ECL and ECL to TTL converters on all inputs and outputs. The design goal is to execute all microinstructions in 100 nanoseconds over the commercial operating range.

All data paths within the device are 16-bits wide. As shown in the Block Diagram, Figure 1, the device consists of the following:

- 32-Word by 16-Bit RAM
- Accumulator
- Data Latch
- Barrel Shifter
- ALU
- Priority Encoder
- Status Register
- Condition-Code Generator/Multiplexer
- Three-State Output Buffers
- Instruction Latch and Decoder

32-Word by 16-Bit RAM - The 32-Word by 16-Bit RAM is a single-port RAM with a 16-bit latch at its output. The latches are transparent when the clock input (CP) is HIGH and latched when the clock input is LOW. Data is written into the RAM while the clock is LOW if the  $\overline{\text{IEN}}$  input is also LOW and if the instruction being executed defines the RAM as the destination of the operation. For byte instructions, only the lower eight RAM bits are written into; for word instructions, all 16 bits are written into.

Accumulator - The 16-bit Accumulator is an edge-triggered register. The Accumulator accepts data on the LOW to HIGH transition of the clock input if the  $\overline{\text{IEN}}$  input is LOW and if the instruction being executed defines the Accumulator as the destination of the operation. For byte instructions, only the lower eight bits of the Accumulator are written into; for word instructions, all 16 bits are written into.

<u>Data Latch</u> - The 16-bit Data Latch holds the data input to the Am29116 on the bi-directional Y bus. The latch is transparent when the DLE input is HIGH and latched when the DLE input is LOW.

<u>Barrel Shifter</u> - A 16-bit Barrel Shifter is used as one of the ALU inputs. This permits rotating data from either the RAM, the Accumulator or the Data Latch up to 15 positions. In the word mode, the Barrel Shifter rotates a 16-bit word; in the byte mode, it rotates only the lower eight bits.

Arithmetic Logic Unit - The Am29116 contains a 16-bit ALU with full carry lookahead across all 16 bits in the arithmetic mode. The ALU is capable of operating on either one, two or three operands, depending upon the instruction being executed. It has the ability to execute all conventional one and two operand operations, such as pass, complement, two's complement, add, subtract, AND, NAND, OR, NOR, EXOR, and EX-NOR. In addition, the ALU can also execute three-operand instructions such as rotate and merge and rotate and compare with mask. All ALU operations can be performed on either a word or byte basis, byte operations being performed on the lower eight bits only.

The ALU produces three status outputs, C (carry), N (negative) and OVR (overflow). The appropriate flags are generated at the byte or word level, depending upon whether the device is executing in the byte or word mode. The Z (zero) flag, although not generated by the ALU, detects zero at both the byte and word level.

The carry input to the ALU is generated by the Carry Multiplexer which can select an input of zero, one, or the stored carry bit from the Status Register, QC. Using QC as the carry input allows execution of multiprecision addition and subtraction.
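The carry-input selection and flag generation described above can be sketched in Python. This is an illustrative model rather than the device's logic: the names `alu_add` and `add32` are hypothetical, and the overflow rule used is the conventional two's-complement one, which this paper does not spell out.

```python
def alu_add(r, s, carry_in=0, byte_mode=False):
    """Add r + s + carry_in at byte or word width; return (result, flags)."""
    width = 8 if byte_mode else 16
    mask = (1 << width) - 1
    sign = 1 << (width - 1)
    r &= mask
    s &= mask
    total = r + s + carry_in
    result = total & mask
    flags = {
        "C": int(total > mask),                # carry out of the top bit
        "N": int(bool(result & sign)),         # sign bit of the result
        # Assumed rule: overflow when both operands share a sign
        # and the result's sign differs from it.
        "OVR": int(bool(~(r ^ s) & (r ^ result) & sign)),
        "Z": int(result == 0),                 # zero detect at the chosen width
    }
    return result, flags


def add32(a, b):
    """Multiprecision add: low words use carry-in 0, high words use stored carry QC."""
    lo, flags = alu_add(a & 0xFFFF, b & 0xFFFF, carry_in=0)
    qc = flags["C"]                            # carry held in the Status Register
    hi, flags = alu_add(a >> 16, b >> 16, carry_in=qc)
    return (hi << 16) | lo, flags
```

Here the Carry Multiplexer is represented simply by the `carry_in` argument: the caller passes zero, one, or the stored carry.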

<u>Priority Encoder</u> - The Priority Encoder produces a binary-weighted code to indicate the location of the highest order ONE at its input. The input to the Priority Encoder is generated by the ALU which performs an AND operation on the operand to be prioritized and a mask.

The mask determines which bit locations to eliminate from prioritization. In the word mode, if no bit is HIGH, the output is a binary zero. If bit 15 is HIGH, the output is a binary one. Bit 14 produces a binary two, etc. Finally, if only bit 0 is HIGH, a binary 16 is produced.

In the byte mode, bits 8 thru 15 do not participate. If none of bits 7 thru 0 are HIGH, the output is a binary zero. If Bit 7 is HIGH a binary one is produced. Bit 6 produces a binary two, etc. Finally, if only bit 0 is HIGH, a binary 8 is produced.

Status Register - The Status Register holds the 8-bit status word. With the Status-Register Enable, SRE, input LOW and the IEN input LOW, the Status Register is updated at the end of all instructions except NO-OP, Save-Status and Test-Status instructions. SRE going HIGH or IEN going HIGH inhibits the Status Register from changing.

The lower four bits of the Status Register contain the ALU status bits of Zero (Z), Carry (C), Negative (N) and Overflow (OVR). The upper four bits contain a Link bit and three user-definable status bits (Flag 1, Flag 2, Flag 3).

With SRE LOW and IEN LOW, the lower four status bits are updated after each instruction except those mentioned above (NO-OP, Save Status and Status Test) and the Status Set/Reset instruction for the upper four bits. Under the same conditions, the upper four status bits are changed only during their respective Status Set/Reset instructions and during Status Load instructions in the word mode. The Link-Status bit is also updated after each shift instruction.

The Status Register can be loaded from the internal Y-bus, and can also be selected as a source for the internal Y-bus. When the Status Register is loaded in the word mode, all 8-bits are updated; in the byte mode, only the lower 4 bits (Z, C, N, OVR) are updated.

When the Status Register is selected as a source in the word mode, all eight bits are loaded into the lower byte of the destination; the upper byte of the destination is loaded with all zeros. In the byte mode, the Status Register again loads into the lower byte of the destination, but the upper byte remains unchanged. This Store and Load combination allows saving and restoring the Status Register for interrupt and subroutine processing. The four lower status bits (Z, C, N, OVR) can be read directly via the bidirectional T bus. These four bits are available as outputs on the  $T_{1-4}$  outputs whenever  $OE_{T}$  is HIGH.
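The byte and word behavior of storing and loading the Status Register can be summarized in a short Python sketch. The function names `store_status` and `load_status` are hypothetical, and the model treats the status word as an opaque 8-bit value, so it makes no assumption about the order of the bits within each nibble.

```python
def store_status(status, dest, byte_mode=False):
    """Return the destination contents after the Status Register is stored into it."""
    status &= 0xFF
    if byte_mode:
        # Byte mode: lower byte receives the status, upper byte is unchanged.
        return (dest & 0xFF00) | status
    # Word mode: lower byte receives the status, upper byte is all zeros.
    return status


def load_status(status, source, byte_mode=False):
    """Return the Status Register contents after a load from source."""
    if byte_mode:
        # Byte mode: only the lower four bits (Z, C, N, OVR) are updated.
        return (status & 0xF0) | (source & 0x0F)
    # Word mode: all eight bits are updated.
    return source & 0xFF
```

A word-mode store followed by a word-mode load returns the original status word, which is the save-and-restore sequence used around an interrupt or subroutine.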

Condition-Code Generator/Multiplexer - The Condition-Code Generator/Multiplexer contains the logic necessary to develop the 12 condition-code test signals. The multiplexer portion can select one of these test signals and place it on the CT output for use by the microprogram sequencer. The multiplexer may be addressed in two different ways. One way is through the Test Instruction. This instruction specifies the test condition to be placed on the CT output, but does not allow an ALU operation at the same time. The second method uses the bidirectional T bus as an input. This requires extra microcode, but provides the ability to simultaneously test and execute.

Three-State Output Buffers - There are two sets of Three-State Output Buffers in the Am29116. One set controls the bidirectional 16-bit Y bus. These outputs are enabled by placing a LOW on the  $\overline{\text{OE}}$  input. A HIGH puts the Y outputs in the high-impedance state, allowing data to be input to the Data Latch from an external source.

The second set of Three-State Output Buffers controls the bidirectional 4-bit T bus and is enabled by placing a HIGH on the  $OE_{T}$  input. This allows storing the four internal ALU status bits (Z, C, N, OVR) externally. A LOW  $OE_{T}$  input forces the T outputs into the high-impedance state. External devices can then drive the T bus to select a test condition for the CT output.

Instruction Latch and Decoder - The 16-bit Instruction Latch is normally transparent to allow decoding of the Instruction Inputs by the Instruction Decoder into the internal control signals for the Am29116. All instructions except Immediate Instructions are executed in a single clock cycle.

Immediate instructions require two clock cycles for execution. During the first clock cycle, the Instruction Decoder recognizes that an Immediate Instruction is being specified and captures the data on the Instruction Inputs in the Instruction Latch. During the second clock cycle, the data on the Instruction Inputs is used as one of the operands for the function specified during the first clock cycle. At the end of the second clock cycle, the Instruction Latch is returned to its transparent state.

#### INSTRUCTION SET

The Am29116 instruction set can be divided into 11 types of instructions:

- Single Operand
- Two Operand
- Shift
- Bit
- Prioritize
- Rotate by N
- Rotate and Merge
- Rotate and Compare
- CRC
- No-op
- Status

<u>Single-Operand Instructions</u> - Single-Operand Instructions contain four indicators: byte or word mode, operation, source and destination.

The operations which can be performed are Pass, Complement, Increment and Two's Complement.

<u>Two-Operand Instructions</u> - Two-Operand Instructions contain five indicators: byte or word mode, operation, R operand, S operand, and destination.

The possible operations are R plus S, R plus S plus Carry, R minus S, S minus R, R minus S with Carry, S minus R with Carry, R AND S, R NAND S, R OR S, R NOR S, R EX-OR S, R EX-NOR S.

<u>Shift Instructions</u> - Shift Instructions contain four indicators: byte or word mode, direction and shift linkage, source and destination.

The direction and shift linkage indicator defines the direction of the shift (up or down) as well as what will be shifted into the vacant bit. On a shift-up instruction, the LSB may be loaded with ZERO, ONE, or the Link-Status bit (QLINK). The MSB is loaded into the Link-Status bit. On a shift-down instruction, the MSB may be loaded with ZERO, ONE, the contents of the Status Carry flip-flop (QC), the Exclusive-OR of the Negative-Status bit and the Overflow-Status bit (QN $\oplus$ QOVR) or the Link-Status bit. The LSB is loaded into the Link-Status bit.
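A minimal Python sketch of the single-place shifts follows. The function names are hypothetical, and the choice of fill bit (ZERO, ONE, QLINK, QC, or the Exclusive-OR of QN and QOVR) is left to the caller, standing in for the direction and shift linkage indicator.

```python
def shift_up(value, fill, byte_mode=False):
    """Shift one place toward the MSB; return (result, new_link)."""
    width = 8 if byte_mode else 16
    mask = (1 << width) - 1
    v = value & mask
    new_link = (v >> (width - 1)) & 1          # MSB goes to the Link-Status bit
    return ((v << 1) | (fill & 1)) & mask, new_link


def shift_down(value, fill, byte_mode=False):
    """Shift one place toward the LSB; return (result, new_link)."""
    width = 8 if byte_mode else 16
    mask = (1 << width) - 1
    v = value & mask
    new_link = v & 1                           # LSB goes to the Link-Status bit
    return (v >> 1) | ((fill & 1) << (width - 1)), new_link
```

Chaining two calls through the returned link bit gives a double-length shift: shift the low word up with a ZERO fill, then shift the high word up with the link from the first call as its fill.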

<u>Bit Instructions</u> - The Bit Instructions contain four indicators: byte or word mode, operation, source/destination, and the address of the bit to be operated on.

The operations which can be performed are: set bit N, which forces the  $N^{th}$  bit to a ONE; reset bit N, which forces the  $N^{th}$  bit to ZERO; test bit N, which sets the ZERO Status Bit depending on the state of bit N; load  $2^N$ , which loads ONE in bit position N and ZERO in all other bit positions; load  $\overline{2^N}$ , which loads ZERO in bit position N and ONE in all other bit positions; increment by  $2^N$ , which adds  $2^N$  to the operand; and decrement by  $2^N$ , which subtracts  $2^N$  from the operand.
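The bit operations listed above map onto simple mask arithmetic, sketched here in Python. The names `bit_op` and `bit_test_z` and the operation keys are hypothetical. The polarity of the Zero flag for test bit N is an assumption (Z is taken as ONE when the tested bit is ZERO), since the text only says the flag depends on the bit.

```python
def bit_op(op, operand, n, byte_mode=False):
    """Apply one bit operation at bit position n and return the result."""
    width = 8 if byte_mode else 16
    mask = (1 << width) - 1
    if not 0 <= n < width:
        raise ValueError("bit position out of range for the selected mode")
    bit = 1 << n
    x = operand & mask
    results = {
        "set": x | bit,            # force bit n to ONE
        "reset": x & ~bit,         # force bit n to ZERO
        "load_2n": bit,            # ONE in bit n, ZERO elsewhere
        "load_not_2n": ~bit,       # ZERO in bit n, ONE elsewhere
        "inc_2n": x + bit,         # add 2**n
        "dec_2n": x - bit,         # subtract 2**n
    }
    return results[op] & mask


def bit_test_z(operand, n):
    """Zero flag for test bit n (assumed polarity: 1 when the bit is ZERO)."""
    return int((operand >> n) & 1 == 0)
```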

<u>Prioritize Instruction</u> - The Prioritize Instructions contain four indicators: byte or word mode, R operand, Mask operand (S), and destination.

In the Prioritize Instructions, the R operand is ANDed with the complement of the Mask operand. A ZERO in the Mask operand allows the corresponding bit in the R operand to participate in the priority encoding function. A ONE in the Mask operand forces the corresponding bit in the R operand to a ZERO, eliminating it from participation in the priority encoding function.

The Priority Encoder accepts a 16-bit input and produces a 5-bit binary-weighted code indicating the bit position of the highest-priority active bit. If none of the inputs are active, the output is ZERO. In the word mode, if input bit 15 is active, the output is 1, etc.
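The masking and encoding steps can be sketched in Python as follows. The function name `prioritize` is hypothetical; the output codes follow the word-mode and byte-mode numbering given in the Priority Encoder description (highest-order bit gives 1, bit 0 gives 16 or 8, no active bit gives 0).

```python
def prioritize(r, mask, byte_mode=False):
    """Return the priority code of the highest-order unmasked ONE in r."""
    width = 8 if byte_mode else 16
    field = (1 << width) - 1
    # A ONE in the mask removes that bit of r from the encoding.
    active = r & ~mask & field
    if active == 0:
        return 0
    highest = active.bit_length() - 1          # position of the highest ONE
    return width - highest
```

For example, masking off bit 15 of an operand lets the encoder report the next highest active bit instead.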

Rotate by N Instructions - The Rotate by N Instructions contain four indicators: byte or word mode, source, destination and the number of places the source is to be rotated.

The N indicator specifies the number of bit positions the source is to be rotated up (0 to 15). In the word mode, all 16-bits are rotated up while in the byte mode, only the lower 8-bits (0 to 7) are rotated up.
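A rotate up by N places can be modeled in a few lines of Python. The name `rotate_up` is hypothetical. Two details are assumptions rather than statements from this paper: in the byte mode the upper byte is passed through unchanged, and N is reduced modulo the field width.

```python
def rotate_up(value, n, byte_mode=False):
    """Rotate value toward the MSB by n places (16-bit word or low 8-bit byte)."""
    width = 8 if byte_mode else 16
    mask = (1 << width) - 1
    n %= width                                 # assumption: N wraps at the field width
    low = value & mask
    rotated = ((low << n) | (low >> (width - n))) & mask
    # Assumption: bits outside the rotated field are left as they were.
    return (value & 0xFFFF & ~mask) | rotated
```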

Rotate and Merge Instructions - The Rotate and Merge Instructions contain five indicators: byte or word mode, rotated operand, non-rotated operand/destination, mask and number of bit positions the rotated operand is to be rotated.

In the Rotate and Merge Instruction, the rotated operand, U, is rotated by the Barrel Shifter N places. The Mask input then selects, on a bit-by-bit basis, the rotated U input or the R input. A ZERO in bit i of the mask will select the i<sup>th</sup> bit of the rotated U input as the i<sup>th</sup> output bit, while a ONE in bit i will select the i<sup>th</sup> R input as the output bit. The output word is stored in the non-rotated operand location.
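The bit-by-bit selection can be written as a single expression, sketched here in Python. The name `rotate_and_merge` is hypothetical, and keeping the upper byte of R in the byte mode follows from the rule that byte instructions write only the lower eight bits of the destination.

```python
def rotate_and_merge(u, r, mask, n, byte_mode=False):
    """Rotate u up by n, then take rotated-u bits where mask is 0 and r bits where mask is 1."""
    width = 8 if byte_mode else 16
    field = (1 << width) - 1
    low = u & field
    n %= width
    rotated = ((low << n) | (low >> (width - n))) & field
    merged = ((rotated & ~mask) | (r & mask)) & field
    # Bits of r outside the operated field are carried through unchanged.
    return (r & 0xFFFF & ~field) | merged
```

A typical use is field insertion: rotating a 4-bit value into position and merging it into bits 8 to 11 of a word while leaving the other twelve bits of R untouched.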

Rotate and Compare Instructions - The Rotate and Compare Instructions contain five indicators: byte or word mode, rotated operand, non-rotated operand, mask, and number of bit positions the rotated operand is to be rotated.

In the Rotate and Compare Instruction, the rotated operand is rotated by the Barrel Shifter N places. The Mask is inverted and ANDed on a bit-by-bit basis with the output of the Barrel Shifter and the R input. Thus, a ONE in the mask input eliminates that bit from the comparison. A ZERO allows the comparison. If the comparison passes, the Zero flag is set. If it fails, the Zero flag is reset.
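A masked comparison of this kind can be sketched in Python as shown. The name `rotate_and_compare` is hypothetical, and the function returns only the resulting Zero flag.

```python
def rotate_and_compare(u, r, mask, n, byte_mode=False):
    """Return the Zero flag: 1 if rotated u equals r in every unmasked bit, else 0."""
    width = 8 if byte_mode else 16
    field = (1 << width) - 1
    low = u & field
    n %= width
    rotated = ((low << n) | (low >> (width - n))) & field
    care = ~mask & field                       # a ONE in the mask means "don't compare"
    return int((rotated & care) == (r & care))
```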


CRC Instructions - The CRC (cyclic redundancy check) Instructions provide a method for generation of the check bits in a CRC calculation.

The ACC serves as a polynomial mask to define the generating polynomial while the RAM register holds the partial result and eventually the calculated Check Sum. The Link-Bit is used as the serial input. The serial input combines with the MSB of the check-sum register, according to the polynomial defined by the polynomial mask register. When the last input bit has been processed, the check-sum register contains the CRC check bits.

Two CRC instructions are provided: CRC Forward and CRC Reverse. The difference in these two instructions arises because CRC standards do not specify which data bit is to be transmitted first, the LSB or the MSB, but they do specify which check bit must be transmitted first.
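The CRC description can be illustrated with a bit-serial model in Python. This is a sketch under stated assumptions, not the device's exact equation: it uses the conventional feedback arrangement (serial input Exclusive-ORed with the MSB of the check-sum register, then the polynomial mask applied after a shift up), it covers only the shift-up direction, and it assumes a full 16-bit polynomial whose highest-order term is implied rather than stored in the mask. The names `crc_forward_step` and `crc_forward` are hypothetical.

```python
def crc_forward_step(check_sum, poly_mask, link):
    """One bit time: shift the 16-bit check-sum register up and apply the feedback."""
    feedback = (link ^ (check_sum >> 15)) & 1  # serial input combined with the MSB
    check_sum = (check_sum << 1) & 0xFFFF
    if feedback:
        check_sum ^= poly_mask                 # apply the generating polynomial
    return check_sum


def crc_forward(data, poly_mask, init=0):
    """Feed each byte of data, MSB first, through crc_forward_step."""
    reg = init
    for byte in data:
        for i in range(7, -1, -1):
            reg = crc_forward_step(reg, poly_mask, (byte >> i) & 1)
    return reg
```

In this model the polynomial mask plays the role of the ACC, `reg` plays the role of the RAM register holding the partial result, and the `link` argument is the serial input.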

<u>NO-OP Instruction</u> - The No-op Instruction does not change any internal registers in the Am29116. It preserves the status register, RAM registers and the ACC register.

<u>Status Instructions</u> - The <u>Set Status Instruction</u> contains a single indicator. This indicator specifies which bit or group of bits in the Status Register is to be set (forced to a ONE).

The <u>Reset Status Instruction</u> contains a single indicator. This indicator specifies which bit or group of bits in the Status Register is to be reset (forced to ZERO).

The Store Status Instruction contains two indicators: a byte or word mode indicator and a second indicator that specifies the destination of the Status Register. The Store Status Instruction allows the state of the processor to be saved and restored later, which is an especially useful function for interrupt handling.

The <u>Load Status Instruction</u> contains two indicators. The indicators specify the byte or word mode and the source for the Status Register. In the byte mode, only the lower 4 bits (QC, QN, QZ, QOVR) are loaded from the source. In the word mode, all 8 bits of the Status Register are loaded from the source.

The <u>Test Status Instructions</u> contain a single indicator which specifies which one of the 12 possible test conditions is to be placed on the Conditional-Test output. Besides the eight bits in the status register (QZ, QC, QN, QOVR, QLINK, QFLAG1, QFLAG2 and QFLAG3), four logical functions (QN $\oplus$ QOVR), (QN $\oplus$ QOVR) + QZ, QZ + QC and LOW may also be selected. These functions are useful in testing of 2's complement and unsigned numbers.

The Status Register may also be tested via the bidirectional T bus. See the discussion on the Status Register for a full description.

# TYPICAL SYSTEM CONFIGURATION



#### Am29116 APPLICATIONS

The intended primary applications for the Am29116 are high-performance peripheral controllers. Figure 14 shows a typical system configuration for a Host Computer, Memory and Peripheral Controller. The interface between the three units is via three buses; the data bus (D1), the address bus (A1) and the control bus (C1). The interface between the Peripheral Controller and the Peripheral Devices is via a data bus (D2) which may be either serial or parallel, and a control bus (C2). Information on the control buses consists of status, command and timing signals.


A typical implementation of the Peripheral Controller is shown in Figure 15. The bidirectional interface to the D1 data bus is via two Am2950 8-Bit Parallel I/O Ports; two Am2940 8-bit DMA Address Generators drive the A1 bus and another Am2950 interfaces to the bidirectional C1 bus. The interface to the serial D2 bus is via a parallel-to-serial and a serial-to-parallel converter, and the bidirectional interface to the C2 bus is via two Am2950s. The interface between these bus-interface units and the Am29116 is a 16-bit bidirectional bus which connects to the  $Y_{0-15}$  outputs of the Am29116.

Also connected to this bus is a 256-word RAM for temporary data storage and a 12-bit interface (1-1/2 Am2920s) to the  $D_{0-11}$  inputs of the Am2910 Microprogram Sequencer. The bus-control and clock-enable signals for these devices are generated by the Pipeline Register at the output of the Microprogram Memory.

The Am29116, Am2910 and the Microprogram Memory perform the data manipulation and routing; command and status testing and generation; and timing-signal generation functions. The implementation illustrated in Figure 15 minimizes the amount of hardware necessary to implement a controller. This is accomplished by A) sharing the Instruction Inputs to the Am29116 with the  $D_{0-11}$  inputs to the Am2910, B) generating all necessary test conditions within the Am29116, which allows connecting the CT output of the Am29116 directly to the  $\overline{CC}$  input of the Am2910, C) generating the CT output via the Instruction Inputs, D) performing all the necessary status manipulations within the Am29116, E) using the same RAM address for reading and writing and F) running the controller at a fixed clock rate.

Figure 15. Minimum Parts Configuration Peripheral Controller

### Maximizing Throughput

Although the implementation shown in Figure 15 minimizes the amount of required hardware, it does limit the throughput of the controller. The architecture shown in Figure 16 uses the same bus interface circuits but maximizes the throughput of the controller at the expense of additional hardware. In this implementation, the Instruction Inputs of the Am29116 and the  $D_{0-11}$  inputs of the Am2910 are driven from separate microcode bits; this allows simultaneous instruction execution in the Am29116 and direct jumping in the Am2910. The multiplexer at the  $\overline{CC}$  input of the Am2910 allows testing of conditions without loading the signals into the Am29116. Four additional bits of microcode drive the  $T_{1-4}$  inputs of the Am29116; this allows simultaneous conditional testing and execution of an instruction in the Am29116. The Am2904 can be loaded with the four ALU arithmetic status bits (Z, C, N, OVR). The flexibility of the Am2904, such as selective loading of status bits, reduces the number of cycles necessary to perform status manipulation. By adding five additional microcode bits and a multiplexer at the  $I_{0-4}$  inputs of the Am29116, separate RAM source and destination addresses can be used in the same microcycle; for example, the contents of RAM address 3 can be added to the contents of the Accumulator and the results can be stored in RAM address 27. The Am2925 System-Clock Generator and Driver, in addition to providing the basic oscillator and clock driver functions, provides the ability to dynamically alter the length of the microcycle; this facilitates interfacing the Am29116 to slower bus interface and peripheral circuits.

Figures 15 and 16 are intended to show the two extremes of minimizing hardware versus maximizing throughput.

