Bit-Slice Design: Controllers and ALUs

Evolution of the ALU

Last Edit July 11, 2001

Improving ALU Speed

Current Instruction Execution

Referring to SIMCOM as encoded so far, the number of microcycles required to perform the software operation B = A + B (where A and B are memory locations) is fixed at nine (9).

The operations involved are given in Table 5-2.

<x> means contents of the location
<<x>> means contents of the location addressed by <x>
<>|<> means place left side of the equation into both locations of the right side. (The PC could have been incremented as a last step in the routines.) This is suitable if all arithmetic requires memory accesses.

Table 5-2 Software Operation B = A + B (Memory Addressing)

Program Level	Machine Level	Microcode Level
B = A + B	LDA, MEMA	<PC> -> <MAR> -> ADDR BUS
		<PC> <- <PC> + 1
		<<PC>> -> <IR> | <MAR> (dual destination)
		DECODE
		<MAR> -> ADDR BUS
		<<MAR>> -> <ACC>
	ADD, MEMB	<PC> -> <MAR> -> ADDR BUS
		<PC> <- <PC> + 1
		<<PC>> -> <IR> | <MAR>
		DECODE
		<MAR> -> ADDR BUS
		<<MAR>> + <ACC> -> <ACC>
	STO, MEMB	<PC> -> <MAR> -> ADDR BUS
		<PC> <- <PC> + 1
		<<PC>> -> <IR> | <MAR>
		DECODE
		<MAR> -> ADDR BUS
		<ACC> -> <<MAR>>
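
The sequence in Table 5-2 can be traced with a small simulation. The following is a minimal Python sketch, not SIMCOM itself: the memory layout, the (op code, address) tuple used as an instruction word, and the grouping of the listed transfers into three microcycles per instruction (fetch, decode, execute) are all assumptions, chosen to agree with the stated total of nine.

```python
# Hypothetical model of the memory-addressed machine in Table 5-2.
# Assumption: each instruction word holds an op code and an operand address,
# and each instruction takes three microcycles (fetch, decode, execute).

def run_memory_addressed(memory, start, count):
    """Execute `count` instructions starting at `start`; return microcycles used."""
    pc, acc, microcycles = start, 0, 0
    for _ in range(count):
        # Microcycle 1: <PC> -> <MAR> -> ADDR BUS, <PC> <- <PC> + 1
        mar = pc
        pc += 1
        microcycles += 1
        # Microcycle 2: <<PC>> -> <IR> | <MAR> (dual destination), DECODE
        ir, mar = memory[mar]
        microcycles += 1
        # Microcycle 3: <MAR> -> ADDR BUS, then the memory operation
        if ir == "LDA":
            acc = memory[mar]
        elif ir == "ADD":
            acc = memory[mar] + acc
        elif ir == "STO":
            memory[mar] = acc
        else:
            raise ValueError(f"unknown op code: {ir}")
        microcycles += 1
    return microcycles


MEMA, MEMB = 10, 11          # hypothetical data addresses
memory = {
    0: ("LDA", MEMA),
    1: ("ADD", MEMB),
    2: ("STO", MEMB),
    MEMA: 5,
    MEMB: 7,
}
cycles = run_memory_addressed(memory, start=0, count=3)
```

Every instruction in this model touches main memory twice, once for the fetch and once for the operand, which is the cost that register addressing is meant to reduce.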

Scratchpad Registers

In many instances, particularly where a high volume of computation is performed, an arithmetic operation is performed on the result of a former arithmetic operation, or an operation uses the same operands as a former operation. Where the data is to be accessed several times, or where the result of one operation is to be used several times before it is stored into main memory, the availability of scratchpad registers can be shown to improve system throughput.

If the operand data is already in appropriate registers, and if the result is to be kept in one of those registers, the operation

R_B = R_A + R_B

is performed in three microcycles.

To implement this, one format for register operation is:

Op Code (8)	R_source (4)	R_operation (4)

This format allows 256 different register op codes and 16 scratchpad registers.

This format also requires that the instruction register be as wide as the format (i.e., 16 bits in this example). (The decode of the op code would prevent the MAR being used to perform a memory access if memory addressing and register addressing are used in the same system.) The CCU controls the connection of the register addresses to the scratchpad block via a MUX.
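
The 8/4/4 field widths above can be checked with a short packing routine. This Python sketch assumes the op code occupies the high byte and the two register fields share the low byte in the order listed; the source gives only the widths, so the bit positions and the function names are illustrative.

```python
OPCODE_BITS, REG_BITS = 8, 4   # field widths from the format above

def encode(opcode, r_source, r_operation):
    """Pack the three fields into one 16-bit instruction word."""
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError("op code must fit in 8 bits")
    for reg in (r_source, r_operation):
        if not 0 <= reg < (1 << REG_BITS):
            raise ValueError("register number must fit in 4 bits")
    return (opcode << 8) | (r_source << 4) | r_operation

def decode(word):
    """Split a 16-bit instruction word back into its three fields."""
    return (word >> 8) & 0xFF, (word >> 4) & 0xF, word & 0xF

word = encode(0x21, 3, 12)   # hypothetical op code 0x21 on R3 and R12
```

Eight op code bits give 2^8 = 256 op codes and four register bits give 2^4 = 16 registers, matching the counts stated for this format.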

Generalized ACC

By using a multiport scratchpad block, both source registers may be accessed at once. By using the scratchpad block to replace the single register ACC, 16 different accumulators are possible. The structure is shown in Figure 5-4.

Figure 5-4 Register Arithmetic: multiport architecture

The complete microcode would be given as in Table 5-3.

Table 5-3 Software Operation RB = RA + RB (Register Addressing)

Program Level	Machine Level	Microcode Level
RB = RA + RB	ADD, RA, RB	<PC> -> <MAR> -> ADDR BUS
		<PC> <- <PC> + 1
		<<PC>> -> <IR> -> <MAR>
		DECODE
		<RA> + <RB> -> <RB> under CCU control
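
For comparison with the memory-addressed case, the register form in Table 5-3 can be traced the same way. This sketch assumes a three-phase grouping (fetch, decode, execute) and models the multiport scratchpad as a 16-entry Python list; the tuple instruction word and the function name are hypothetical.

```python
def run_register_add(memory, regs, pc):
    """Trace one ADD, RA, RB; return (new_pc, microcycles)."""
    microcycles = 0
    # Microcycle 1: <PC> -> <MAR> -> ADDR BUS, <PC> <- <PC> + 1
    mar = pc
    pc += 1
    microcycles += 1
    # Microcycle 2: <<PC>> -> <IR>, DECODE (register fields select scratchpad ports)
    opcode, ra, rb = memory[mar]
    if opcode != "ADD":
        raise ValueError("this sketch only handles ADD")
    microcycles += 1
    # Microcycle 3: <RA> + <RB> -> <RB>; both sources are read in the same cycle
    regs[rb] = regs[ra] + regs[rb]
    microcycles += 1
    return pc, microcycles

regs = [0] * 16              # 16 scratchpad registers
regs[1], regs[2] = 5, 7      # hypothetical operands in R1 (RA) and R2 (RB)
memory = {0: ("ADD", 1, 2)}
pc, cycles = run_register_add(memory, regs, pc=0)
```

Under these assumptions the add completes in three microcycles, against nine for the memory-addressed sequence in Table 5-2, because the only main memory access is the instruction fetch.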

If all instructions are required to be register oriented, the instruction set could look like that shown in Table 5-4. As a variation, even when any register could be used as the accumulator, some default or implied addressing instructions are desirable for code compaction. These are usually selected to be the most frequently occurring instructions, such as load from memory or increment. For SIMCOM, if the R0 register were the default ACC, the load and store instructions would become LDR, addr and STO, addr. As an option, both implied addressing and defined addressing versions of instructions are often included to permit the greatest power and flexibility in an instruction set.

Table 5-4 SIMCOM Register Instruction Set

LDR, REG, Addr	Load contents of main memory address into register (2-word instruction)
ADD, REG₁, REG₂	Add contents of REG₁ to REG₂, put results into REG₂
SUB, REG₁, REG₂	Subtract contents of REG₁ from REG₂, put results into REG₂
OR, REG₁, REG₂	OR contents of REG₁ with REG₂, put results into REG₂
AND, REG₁, REG₂	AND contents of REG₁ with REG₂, put results into REG₂
XOR, REG₁, REG₂	Exclusive-OR contents of REG₁ with REG₂, put results into REG₂
INR, REG	Input to register
OUT, REG	Output from register
JMP, REG	Jump direct to contents of register
JMZ, REG₁, REG₂	If REG₁ = 0, then jump direct to contents of REG₂
MOV, REG₁, REG₂	Move contents of REG₁ into REG₂
STO, REG, Addr	Store contents of register at main memory address
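
A few of the instructions in Table 5-4 are enough to express B = A + B at the register level. The following is a minimal interpreter sketch, assuming tuple-encoded instructions and a dictionary standing in for main memory; it covers only LDR, ADD, SUB, MOV and STO and makes no claim about SIMCOM's actual encoding.

```python
def execute(program, memory, regs):
    """Run a list of (mnemonic, operand1, operand2) tuples in order."""
    for op, x, y in program:
        if op == "LDR":        # LDR, REG, Addr: memory -> register
            regs[x] = memory[y]
        elif op == "STO":      # STO, REG, Addr: register -> memory
            memory[y] = regs[x]
        elif op == "ADD":      # ADD, REG1, REG2: REG2 = REG2 + REG1
            regs[y] = regs[y] + regs[x]
        elif op == "SUB":      # SUB, REG1, REG2: REG2 = REG2 - REG1
            regs[y] = regs[y] - regs[x]
        elif op == "MOV":      # MOV, REG1, REG2: REG2 = REG1
            regs[y] = regs[x]
        else:
            raise ValueError(f"not modelled in this sketch: {op}")

A, B = 100, 101                # hypothetical main memory addresses
memory = {A: 5, B: 7}
regs = [0] * 16
program = [
    ("LDR", 1, A),    # R1 <- A
    ("LDR", 2, B),    # R2 <- B
    ("ADD", 1, 2),    # R2 <- R1 + R2
    ("ADD", 1, 2),    # reuse R1 with no further memory access
    ("STO", 2, B),    # B <- R2
]
execute(program, memory, regs)
```

The second ADD reuses the operand already held in R1, which is the situation the scratchpad registers are intended to speed up.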

To perform the register operation itself in one microcycle, the system timing must be such that the instruction cycle is long enough to allow for the register read access, the ALU operation, the register write, and the address setup time.

Generalized PC

Another change can be made to advantage: the PC can be moved into the scratchpad block (i.e., any register can be the PC register).

This allows arithmetic operations to be performed on the program address as is required in relative addressing, for example, where the PC is added to a base register to find the actual address. Indexed addressing and various other addressing structures are now feasible using high speed register arithmetic. The resulting structure is shown in Figure 5-5.

Figure 5-5 Redrawing the Structure
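
With the PC held in the scratchpad, the relative address calculation described above becomes an ordinary register addition. This sketch assumes R15 holds the PC and R14 holds the base register; both assignments, the sample values, and the function name are hypothetical.

```python
PC, BASE = 15, 14            # hypothetical register assignments

def relative_address(regs):
    """Effective address = <PC> + <base register>, computed in the scratchpad."""
    return regs[PC] + regs[BASE]

regs = [0] * 16
regs[PC], regs[BASE] = 0x0040, 0x0200
target = relative_address(regs)
regs[PC] = target            # the result is written back to the PC register
```

Because the PC is just another scratchpad entry, the same ALU path that serves data arithmetic computes the address, with no separate adder for the PC.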

Bit-Slice Design: Controllers and ALUs

by Donnamaie E. White