Evolution of the ALU
Last Edit July 11, 2001
Improving ALU Speed
Current Instruction Execution
Referring to SIMCOM as encoded so far, the number of microcycles
required to perform the software operation B = A + B (where A and
B are memory locations) is fixed at nine (9).
The operations involved are given in Table 5-2.
<x> means the contents of location x
<<x>> means the contents of the location addressed by <x>
<> | <> means place the left side of the transfer into both
locations on the right side.
(The PC could have been incremented as a last step in the routines.)
This is suitable if all arithmetic requires memory accesses.
Table 5-2 Software Operation B = A + B (Memory Addressing)
Program Level   Machine Level   Microcode Level
B = A + B       LDA, MEMA       <PC> -> <MAR> -> ADDR BUS
                                <PC> <- <PC> + 1
                                <<PC>> -> <IR> | <MAR> (dual destination)
                                DECODE
                                <MAR> -> ADDR BUS
                                <<MAR>> -> <ACC>
                ADD, MEMB       <PC> -> <MAR> -> ADDR BUS
                                <PC> <- <PC> + 1
                                <<PC>> -> <IR> | <MAR>
                                DECODE
                                <MAR> -> ADDR BUS
                                <<MAR>> + <ACC> -> <ACC>
                STO, MEMB       <PC> -> <MAR> -> ADDR BUS
                                <PC> <- <PC> + 1
                                <<PC>> -> <IR> | <MAR>
                                DECODE
                                <MAR> -> ADDR BUS
                                <ACC> -> <<MAR>>
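The microcode sequence of Table 5-2 can be traced in software. The following Python sketch is illustrative only: the 16-bit word with a 4-bit op code and a 12-bit address, the op code values, and the names `instr` and `run` are assumptions, not part of SIMCOM as defined in the text. It also assumes that the fetch reads the location placed on the address bus in the first step (the PC value before the increment).

```python
# Hypothetical encoding (not from the text): 4-bit op code, 12-bit address.
LDA, ADD, STO = 0x1, 0x2, 0x3


def instr(op, addr):
    """Pack an op code and a memory address into one instruction word."""
    return (op << 12) | (addr & 0xFFF)


def run(memory, pc, count):
    """Execute `count` memory-addressed instructions, following Table 5-2."""
    acc = 0
    for _ in range(count):
        mar = pc                       # <PC> -> <MAR> -> ADDR BUS
        pc = pc + 1                    # <PC> <- <PC> + 1
        fetched = memory[mar]          # <<PC>> -> <IR> | <MAR> (dual destination)
        ir, mar = fetched, fetched & 0xFFF   # MAR keeps only the address field
        op = ir >> 12                  # DECODE
        # <MAR> -> ADDR BUS, then the instruction-specific transfer:
        if op == LDA:
            acc = memory[mar]                    # <<MAR>> -> <ACC>
        elif op == ADD:
            acc = (memory[mar] + acc) & 0xFFFF   # <<MAR>> + <ACC> -> <ACC>
        elif op == STO:
            memory[mar] = acc                    # <ACC> -> <<MAR>>
        else:
            raise ValueError(f"unknown op code {op:#x}")
    return acc, pc


MEMA, MEMB = 0x100, 0x101              # hypothetical addresses of A and B
memory = {
    0: instr(LDA, MEMA),
    1: instr(ADD, MEMB),
    2: instr(STO, MEMB),
    MEMA: 5,                           # A
    MEMB: 7,                           # B
}
acc, pc = run(memory, pc=0, count=3)
print(memory[MEMB])                    # prints 12: B now holds A + B
```

Every one of the three instructions repeats the same fetch and decode steps and then makes a further main memory access for its operand, which is the cost the scratchpad registers discussed next are meant to avoid.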
Scratchpad Registers
In many instances, especially where a high volume of computation
is performed, an arithmetic operation is performed on the result
of a former arithmetic operation, or an operation uses the same
operands as a former operation. In cases where the data is to be
accessed several times, or where the result of one operation is to
be used several times before it is stored into main memory, the
availability of scratchpad registers can be shown to improve
system throughput.
If the operand data is already in appropriate registers, and if
the result is to be kept in one of those registers, the operation
RB = RA + RB
is performed in three microcycles.
To implement this, one format for register operation is:
Op Code (8) | Rsource (4) | Roperation (4)
This format allows 256 different register op codes and 16 scratchpad
registers.
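As a minimal sketch of this format, the three fields can be packed into and unpacked from a 16-bit word with shifts and masks. The field widths come from the format above; placing the op code in the high byte, and the function names `encode` and `decode`, are assumptions made for illustration.

```python
OP_BITS, REG_BITS = 8, 4   # field widths from the format above


def encode(op, rsource, roperation):
    """Pack the three fields into a 16-bit word (op code in the high byte)."""
    if not 0 <= op < (1 << OP_BITS):
        raise ValueError("op code must fit in 8 bits")
    for reg in (rsource, roperation):
        if not 0 <= reg < (1 << REG_BITS):
            raise ValueError("register number must fit in 4 bits")
    return (op << 8) | (rsource << 4) | roperation


def decode(word):
    """Split a 16-bit word back into (op, rsource, roperation)."""
    return (word >> 8) & 0xFF, (word >> 4) & 0xF, word & 0xF


word = encode(0xA5, 3, 12)
print(hex(word))        # 0xa53c
print(decode(word))     # (165, 3, 12)
```

The 8-bit field gives 2^8 = 256 op codes and each 4-bit field selects one of 2^4 = 16 registers, matching the counts stated above.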
This format also requires that the instruction register be as wide
as the format (i.e., 16 bits in this example). (The decode of the
op code would prevent the MAR being used to perform a memory access
if memory addressing and register addressing are used in the same
system.) The CCU controls the connection of the register addresses
to the scratchpad block via a MUX.
Generalized ACC
By using a multiport scratchpad block both source registers may
be accessed at once. By using the scratchpad block to replace the
single register ACC, 16 different accumulators are possible. The
structure is shown in Figure 5-4.
Figure 5-4 Register Arithmetic: multiport architecture
The complete microcode would be given as in Table 5-3.
Table 5-3 Software Operation RB = RA + RB (Register Addressing)
Program Level   Machine Level   Microcode Level
RB = RA + RB    ADD, RA, RB     <PC> -> <MAR> -> ADDR BUS
                                <PC> <- <PC> + 1
                                <<PC>> -> <IR> -> <MAR>
                                DECODE
                                <RA> + <RB> -> <RB> (under CCU control)
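The register-addressed sequence of Table 5-3 can be sketched the same way. In this illustrative Python fragment the op code value `ADD_RR`, the placement of the op code in the high byte, the 16-bit register width, and the function name `step` are assumptions; only the field widths and the 16-register scratchpad come from the text.

```python
ADD_RR = 0x01   # hypothetical op code for ADD, RA, RB


def step(memory, regs, pc):
    """Run one register-addressed instruction as in Table 5-3; return the new PC."""
    mar = pc                     # <PC> -> <MAR> -> ADDR BUS
    pc = pc + 1                  # <PC> <- <PC> + 1
    ir = memory[mar]             # <<PC>> -> <IR> (no operand fetch follows)
    op = (ir >> 8) & 0xFF        # DECODE: op code,
    ra = (ir >> 4) & 0xF         #   first register number,
    rb = ir & 0xF                #   second register number (also the destination)
    if op != ADD_RR:
        raise ValueError(f"unknown op code {op:#x}")
    # Multiport read: both operands leave the scratchpad together and
    # the sum is written back to RB.        <RA> + <RB> -> <RB>
    a, b = regs[ra], regs[rb]
    regs[rb] = (a + b) & 0xFFFF
    return pc


regs = [0] * 16                  # 16 scratchpad registers
regs[2], regs[5] = 40, 2         # RA = R2, RB = R5
memory = {0: (ADD_RR << 8) | (2 << 4) | 5}
pc = step(memory, regs, pc=0)
print(regs[5])                   # 42
```

Main memory is touched only once, for the instruction fetch; the operands and the result never leave the scratchpad.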
If all instructions are required to be register oriented, the instruction
set could look like that shown in Table 5-4. As a variation,
even when any register could be used as the accumulator, some default
or implied addressing instructions are desirable for code compaction.
These are usually selected to be the most frequently occurring instructions
such as load from memory or increment. For SIMCOM, if the R0 register
were the default ACC, the load and store instructions would become
LDR, addr and STO, addr. As an option, both implied addressing and
defined addressing versions of instructions are often included to
permit the greatest power and flexibility in an instruction set.
Table 5-4 SIMCOM Register Instruction Set
LDR, REG, Addr    Load contents of main memory address into register
                  (2-word instruction)
ADD, REG1, REG2   Add contents of REG1 to REG2,
                  put results into REG2
SUB, REG1, REG2   Subtract contents of REG1 from REG2,
                  put results into REG2
OR, REG1, REG2    OR contents of REG1 with REG2,
                  put results into REG2
AND, REG1, REG2   AND contents of REG1 with REG2,
                  put results into REG2
XOR, REG1, REG2   Exclusive-OR contents of REG1 with REG2,
                  put results into REG2
INR, REG          Input to register
OUT, REG          Output from register
JMP, REG          Jump direct to contents of register
JMZ, REG1, REG2   If REG1 = 0, then jump direct to contents of REG2
MOV, REG1, REG2   Move contents of REG1 into REG2
STO, REG, Addr    Store contents of register at main memory address
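The two-register instructions of Table 5-4 share one pattern: combine REG1 with REG2 and write the result into REG2. A minimal Python sketch of that pattern follows. The 16-bit register width, the names `ALU_OPS` and `execute`, and the treatment of XOR as a two-register instruction like OR and AND are assumptions made for illustration.

```python
MASK = 0xFFFF   # assumed 16-bit registers

# Two-register instructions: result = f(REG1, REG2), written to REG2.
ALU_OPS = {
    "ADD": lambda r1, r2: r2 + r1,
    "SUB": lambda r1, r2: r2 - r1,   # subtract REG1 from REG2
    "OR":  lambda r1, r2: r2 | r1,
    "AND": lambda r1, r2: r2 & r1,
    "XOR": lambda r1, r2: r2 ^ r1,
    "MOV": lambda r1, r2: r1,        # move REG1 into REG2
}


def execute(regs, mnemonic, reg1, reg2):
    """Apply one two-register instruction to the scratchpad list `regs`."""
    result = ALU_OPS[mnemonic](regs[reg1], regs[reg2])
    regs[reg2] = result & MASK       # wrap to the register width


regs = [0] * 16
regs[1], regs[2] = 3, 10
execute(regs, "SUB", 1, 2)    # R2 = 10 - 3
execute(regs, "MOV", 2, 4)    # R4 = R2
execute(regs, "XOR", 1, 4)    # R4 = R4 ^ R1
print(regs[2], regs[4])       # 7 4
```

Because every entry has the same shape, any of the 16 registers can serve as the accumulator without adding op codes.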
To perform the register operation itself in one microcycle,
the system timing must make the instruction cycle long enough
to allow for the register read access, the ALU operation,
the register write, and the address setup time.
Generalized PC
Another change can be made to advantage: the PC can be moved
into the scratchpad block (i.e., any register can be the PC register).
This allows arithmetic operations to be performed on the program
address as is required in relative addressing, for example, where
the PC is added to a base register to find the actual address. Indexed
addressing and various other addressing structures are now feasible
using high speed register arithmetic. The resulting structure is
shown in Figure 5-5.
Figure 5-5 Redrawing the Structure
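With the PC held in the scratchpad, relative addressing reduces to an ordinary register add. The Python sketch below illustrates the idea under stated assumptions: the choice of R15 as the PC, the 16-bit register width, and the names `effective_address` and `relative_jump` are hypothetical and do not come from the text.

```python
PC_REG = 15     # hypothetical choice: R15 acts as the PC
MASK = 0xFFFF   # assumed 16-bit registers


def effective_address(regs, base_reg):
    """Relative addressing: PC plus base register, computed by the ALU."""
    return (regs[PC_REG] + regs[base_reg]) & MASK


def relative_jump(regs, base_reg):
    """Load the computed address back into the PC register."""
    regs[PC_REG] = effective_address(regs, base_reg)


regs = [0] * 16
regs[PC_REG] = 0x0200
regs[3] = 0x0010                           # base value held in R3
print(hex(effective_address(regs, 3)))     # 0x210
relative_jump(regs, 3)
print(hex(regs[PC_REG]))                   # 0x210
```

The same add path used for RB = RA + RB produces the address, so no separate address adder is needed in this sketch.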