CS311 Lecture: CPU Control: Hardwired control and Microprogrammed control
Last revised 11/4/03
Objectives:
1. To explain the concept of a control word
2. To show how control words can be generated using hardwired control
3. To explain the concept of microprogramming
4. To discuss some advantages and disadvantages of microprogramming.
Materials:
1. Transparency of MARIE RTL
2. Transparency of state machine for MARIE
I. Introduction
- ------------
A. We have seen that a CPU - whether simple or complex - basically consists
of a control unit, plus a data part, encompassing:
1. A set of registers (including registers that interface to the system
bus)
2. A set of D-units (adders, shifters etc.)
3. A set of data paths (busses) connecting the above.
4. An interface to the "outside world" (memory, IO) - usually some sort
of bus system.
(We continue to assume a single instance of all components shared by all
steps of instruction execution, and sequential execution of instruction
steps. When we discuss pipelining and other forms of parallelism, we
will see that some components will have to be replicated.)
B. The data part is capable of performing a set of micro-operations or
primitive computations that can be performed in one cycle (clock
pulse). Each micro-operation changes the contents of a single register.
An instruction in the user-visible instruction set must be programmed as a
series of micro-operations (some of which may be done in parallel on the
same clock pulse.)
C. Control of the system is accomplished by a control unit that - at the start
of each clock cycle - activates the necessary control functions to cause
the data part to perform the desired micro-operation(s) on the next clock
pulse. In the case of a multi-cycle CPU implementation (where a given
component may perform different tasks on different cycles), this can
be pictured as follows:
-------- --------
-----------------> Registers,
Control -----------------> ALU,
-----------------> data paths
-----------------> Bus System (memory/IO)
-------- --------
D. The set of control signals that pass from Control to the data part and bus
system is called a micro-word or control word. Conceptually, each bit of
this micro-word corresponds to the enabling of one particular
micro-operation that some system component can perform.
E. The job of the control unit designer for such a CPU is to develop a
means whereby an orderly sequence of control words may be presented to the
data part (and other hardware such as the memory) - one per clock pulse.
F. There are two basic ways such a sequence of control words can be
generated:
1. Hardwired control: The control unit is implemented as a state machine,
with combinatorial circuits generating each of the control functions
on the basis of the current state and certain variables such as the
op-code of the user instruction undergoing execution.
a. The state machine often has two levels of states: major states,
each of which is broken up into minor states. A given major state
will consist of a series of minor states.
b. The major states may correspond to the various phases of
instruction execution, or each major state may correspond to a
single access to memory as part of instruction execution.
c. Either way, the minor states correspond to the individual steps
for a major state - e.g. if a certain major state requires three
successive micro-operations, then it will have three minor states.
2. Microprogrammed control. The various control words needed to
implement the user instructions are stored in a ROM, with a sequencer
causing the appropriate control word to be fetched at each clock
cycle and fed to the rest of the CPU.
II. An Example of Hardwired Control
-- ---------- -- --------- -------
A. To get some feel for what is involved in hardwired control, we will
discuss a hardwired control unit for our multicycle MIPS simulation.
B. Observe that, in the RTL specification for this machine we
discussed earlier, almost all instructions require exactly 4
cycles to fetch and execute. (One - j - requires only two - one
for fetch and one for execute). For simplicity, we will allocate 4
cycles to every instruction - thus wasting two on j.
1. Our state machine then looks like this:
Cycle 0 ----> Cycle 1 ----> Cycle 2 ----> Cycle 3
^ |
|----------------------------------------------|
(Where a state transition occurs on each clock)
a. The simplicity of the state machine for MIPS is a consequence
of the regularity of the instructions, which in turn is a
characteristic of the ISA designed to facilitate a pipelined
implementation. (The ISA makes this part of the implementation
easy).
b. Actually, a full implementation would need additional states
to deal with issues like interrupts and exceptions.
c. This simplified state machine can be realized by a 2 bit counter,
with its output decoded to yield 4 signals used internally in the
control unit.
----------- -----------
| | | 4 way |--- CYCLE0
| 2 bit |-----| decoder |--- CYCLE1
| counter |-----| |--- CYCLE2
| | | |--- CYCLE3
----------- -----------
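The counter-plus-decoder arrangement above can be sketched in software.
This is only an illustrative model, not the simulator's actual code; the
class name CycleCounter and its method names are assumptions made up for
this sketch.

```python
class CycleCounter:
    """Model of a 2-bit counter feeding a 4-way decoder (hypothetical names)."""

    def __init__(self):
        self.count = 0  # 2-bit state, always in the range 0..3

    def clock(self):
        # Advance on each clock pulse, wrapping from 3 back to 0.
        self.count = (self.count + 1) & 0b11

    def decode(self):
        # 4-way decoder: exactly one of CYCLE0..CYCLE3 is asserted.
        return {f"CYCLE{i}": int(self.count == i) for i in range(4)}


counter = CycleCounter()
trace = []
for _ in range(5):
    signals = counter.decode()
    # Record the name of the single asserted line, then clock the counter.
    trace.append(next(name for name, on in signals.items() if on))
    counter.clock()
print(trace)
```

Five clocks are traced so that the wrap-around from Cycle 3 back to
Cycle 0 is visible.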
2. Most machines would utilize much more complex state machines - e.g.
MARIE.
a. Show RTL
b. State machine might have 6+ major states - with only some used
for any given instruction. (No instruction would need them all)
i. IF - fetch the instruction (Common to all instructions)
ii. OAC - calculate the address of the operand (used for
instructions that reference memory)
iii. INDIRECT - go to memory to get the address of an operand
(used for AddI, JumpI)
iv. OF - fetch an operand from memory (used for instructions that
read an operand from memory)
v. EXEC - execute an instruction (all instructions)
vi. OS - store an operand into memory (Store)
vii. Additional states for more complicated instructions like Jns
Note that these are major states - some might have 2 minor
states.
c. Flow between states - TRANSPARENCY
d. This sort of state machine could be implemented using design
techniques we discussed earlier in the course.
C. The control word for the MIPS simulation contains 17 bits.
1. Review the meaning of the bits
a. Some enable the loading of various registers (IR, PC, General)
(Note that the ALU Input and Output registers are loaded on every
cycle - there is nothing to be gained by having enables for them.)
b. Some control the various MUXes. These may be single bits (for a
2-way MUX) or groups of bits - PC Source (2), Memory Address,
Register Source, ALU Source A, ALU Source B (2).
c. One group of 2 controls _how_ the general register to be loaded (if
there is one) is specified - i.e. a MUX that controls the input
to the decoder that load-enables the correct register.
d. One group of 3 controls the ALU Function (i.e. the internal MUX
in the ALU).
e. There is one bit each to control memory read and memory write.
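As a rough illustration of how the fields just reviewed add up to 17
bits, the following sketch packs them into a single integer. The field
names and the bit ordering are assumptions chosen for this example; the
notes give only the widths, not the layout used by the simulator.

```python
# (name, width) pairs. Names and ordering are hypothetical; only the
# widths come from the field descriptions above.
FIELDS = [
    ("load_ir", 1), ("load_pc", 1), ("load_reg", 1),   # register load enables
    ("pc_source", 2), ("mem_address", 1), ("reg_source", 1),
    ("alu_source_a", 1), ("alu_source_b", 2),          # MUX selects
    ("reg_select", 2),      # how the register to be loaded is specified
    ("alu_function", 3),
    ("mem_read", 1), ("mem_write", 1),
]
CONTROL_WORD_WIDTH = sum(width for _, width in FIELDS)


def pack(**values):
    """Build a control word; fields that are not given default to 0."""
    known = {name for name, _ in FIELDS}
    unknown = set(values) - known
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        word = (word << width) | value  # first field ends up most significant
    return word


def unpack(word):
    """Split a control word back into a dict of field values."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


# Illustrative control word for an instruction-fetch cycle.
fetch = pack(load_ir=1, load_pc=1, mem_read=1)
print(CONTROL_WORD_WIDTH, format(fetch, "017b"))
```

Treating each field as a (name, width) pair keeps the total width a
computed quantity, so it can be checked against the 17 bits stated above.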
2. Each of these bits can be derived by a combinatorial network whose
inputs are the current state of the machine plus certain fields
in the IR. It will simplify the design work if we assume that the
opcode bits in the IR are connected to a 64-way decoder, with
exactly one line being asserted for any given instruction (or none
if the instruction is undefined)
---------- -----------
| |-----| 64-way |---- RTYPE (0)
| |-----| decoder |---- J (2)
| Opcode |-----| |---- JAL (3)
| bits |-----| |---- BEQ (4)
| of IR |-----| |---- BNE (5)
| |-----| |---- ADDI (8)
---------- | |---- SLTI (0xa)
| |---- ANDI (0xc)
| |---- ORI (0xd)
| |---- XORI (0xe)
| |---- LUI (0xf)
| |---- LW (0x23)
| |---- SW (0x2b)
-----------
(A full implementation of the ISA would have many more!)
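The behavior of this decoder can be modeled in a few lines. This is a
sketch under the assumption that the opcode occupies the top six bits of
the 32-bit instruction word, as in the standard MIPS encoding; the
function name decode_opcode is made up for illustration.

```python
# Decoder output lines and the opcode that asserts each one
# (0x0d is ori in the MIPS ISA).
OPCODE_LINES = {
    "RTYPE": 0x00, "J": 0x02, "JAL": 0x03, "BEQ": 0x04, "BNE": 0x05,
    "ADDI": 0x08, "SLTI": 0x0A, "ANDI": 0x0C, "ORI": 0x0D, "XORI": 0x0E,
    "LUI": 0x0F, "LW": 0x23, "SW": 0x2B,
}


def decode_opcode(ir):
    """Return the decoder outputs for the instruction word held in IR.

    At most one line is 1; all are 0 for an opcode not listed above.
    """
    opcode = (ir >> 26) & 0x3F  # bits 31..26 of the instruction
    return {name: int(code == opcode) for name, code in OPCODE_LINES.items()}


lines = decode_opcode(0x8FA80000)  # encoding of lw $t0, 0($sp)
print([name for name, on in lines.items() if on])
```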
3. The function to be realized by each network is determined
by examining the RTL to see what value of the bit is implied by
each step of each instruction.
a. Example: the Load IR bit. This is 1 on Cycle 0 of all instructions,
and 0 everywhere else. Thus, we can derive this bit as
CYCLE0 -------- IR_LOAD
b. Example: the Load PC bit. This is 1 in four places, and 0
everywhere else
i. Cycle 0 of all instructions
ii. Cycle 3 of jr
iii. Cycle 3 of beq/bne if the branch condition is met
iv. Cycle 1 of j
This yields the following circuit:
CYCLE0 -------------------------------|
|
CYCLE3 ---------| |
RTYPE ---------| AND ----------------|
(Func == 8) ----| |
|
CYCLE3 ---------| |
BEQ ------------| AND ----------------| OR --- LOAD_PC
(RS == RT) -----| |
|
CYCLE3 ---------| |
BNE ------------| AND ----------------|
(RS == RT) ---- NOT ------------| |
|
CYCLE1 ---------| AND ----------------|
J --------------|
c. This same process can be continued for each bit of the control
word. To simplify design, we can take advantage of don't-cares.
Example: if LOAD_PC is 0, then we don't care about the value of
PC_SOURCE.
It turns out we can make this 0 (PC + 4) on Cycle 0,
1 (IR J-Format constant) on Cycle 1,
3 (ALU Out) on Cycle 3,
since this yields the correct value whenever LOAD_PC is 1 and
is ignored otherwise.
d. etc
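The LOAD_PC circuit and the PC_SOURCE simplification from this section
can be written as boolean functions of the cycle and the decoded
instruction. This is a sketch only: the function names and the argument
opcode_line (the name of the asserted decoder output) are assumptions,
and the value chosen for PC_SOURCE on Cycle 2 is arbitrary, since
LOAD_PC is never 1 on that cycle.

```python
def load_pc(cycle, opcode_line, func=0, rs_equals_rt=False):
    """OR of the five AND terms in the LOAD_PC circuit."""
    return int(
        cycle == 0                                                # every fetch
        or (cycle == 3 and opcode_line == "RTYPE" and func == 8)  # jr
        or (cycle == 3 and opcode_line == "BEQ" and rs_equals_rt)
        or (cycle == 3 and opcode_line == "BNE" and not rs_equals_rt)
        or (cycle == 1 and opcode_line == "J")
    )


def pc_source(cycle):
    """PC_SOURCE as a function of the cycle alone, using don't-cares."""
    # Cycle 2 is a pure don't-care; 0 is an arbitrary choice.
    return {0: 0, 1: 1, 2: 0, 3: 3}[cycle]


for cycle in range(4):
    print(cycle, load_pc(cycle, "BEQ", rs_equals_rt=True), pc_source(cycle))
```

Because PC_SOURCE depends only on the cycle, it needs no opcode inputs
at all, which is the saving the don't-care argument buys.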
III. Microprogramming
--- ----------------
A. As you can see, for even a very simple machine like the one we just
looked at, hardwired control leads to very complex control logic. For a
more complex machine, the control-unit complexity would make hardwired
control virtually impossible. Thus, the majority of CISCs use
microprogramming as a means of keeping the complexity of control within
limits (at the cost of a somewhat slower execution cycle.)
B. The basic idea is this: we build the control unit around a small, very
fast memory (not visible to the programmer.)
1. The width of this memory is equal to the width of the control word,
plus some additional bits we will discuss shortly.
2. We store the various control words in the memory (which is therefore
called the CONTROL STORE). We connect the output of the memory to
the control inputs of the ALU, data paths, etc.
3. On each clock, we fetch a control word and use it to determine what
the ALU etc. do on that clock.
4. We use a simple device called the SEQUENCER to arrange for the correct
sequence of control words to be fetched. (The additional bits in each
word in control store are used to control the sequencer.)
5. The control store is generally a ROM; but it is also possible to use
a writeable memory (PROM or RAM) for the control memory. This allows
for:
a. Dynamic microprogramming - e.g. for adding custom user instructions
to the standard set or emulating another machine.
b. Diagnostics - a microprogram that exercises a suspected portion of
the circuitry one micro-operation at a time may be loaded to assist
in the isolation of hardware flaws.
C. A micro-programmed implementation of our example MIPS machine.
1. Structure of the control unit:
-------------------------------------
| Control store - small, fast ROM |
| 512 words x 32 bits |
| |
-------------------------------------
|||| ||||||||| || |||||||||||||||||
-------------------------------------
| Current word from control store |
-------------------------------------
|||| ||||||||| || |||||||||||||||||
|||| ||||||||| || |||||||||||||||||
not sequencing Control word to
used control - registers, data
see below paths, ALU, memory
2. Micro-word format
----------------------------------------------------------
| Sequencing control | Control word to send to data part |
----------------------------------------------------------
a. The control word part would be 17 bits wide, as discussed under
hardwired control
b. The sequencing control part contains two fields
i. A 9 bit next micro-word field that contains the
address of the next microword. (Thus, each microword
explicitly contains the address of its successor).
This field is called "next"
ii. A 2 bit field used to allow branching in the microprogram -
we'll discuss this shortly. This field is called "decode".
iii. The structure for sequencing is as follows (where CSAR is
"control store address register", which holds the address
in control store of the current microword)
To datapaths etc.
____________________ ^
| | | |
| ---------------------
| | Current microword |
| ---------------------
| ^
| | |
| ---------------------
| | Control store |
| | |
| | |
| ---------------------
| ^
| | |
| --------
| | CSAR | <-- Decode fields
| --------
| ^
|__________________|
On each cycle, the next field of the current control
word is placed in the CSAR. Then that word is
accessed in control store and becomes the current microword.
iv. Optionally, some field from the instruction register or
other bit in the ALU can be or-red with the next field
before it is loaded into the CSAR. This is specified by
the decode field, whose values are interpreted as follows:
00 - Don't or anything with the next field
01 - Or the opcode field of the instruction register times 4
10 - Or the func field of the instruction register
11 - Or the output of the comparison between the rs and rt
registers - 1 if they are equal, 0 if not
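The sequencing rule just described can be sketched as follows. The bit
positions used in split_microword are an assumption read off the
structure diagram (unused bits at the top, then next, then decode, then
the 17-bit control word); the function names are hypothetical.

```python
NEXT_BITS, DECODE_BITS, CONTROL_BITS = 9, 2, 17


def split_microword(word):
    """Split a 32-bit microword into (next, decode, control).

    Assumed layout, most significant first:
    4 unused | 9 next | 2 decode | 17 control.
    """
    control = word & ((1 << CONTROL_BITS) - 1)
    decode = (word >> CONTROL_BITS) & ((1 << DECODE_BITS) - 1)
    next_field = (word >> (CONTROL_BITS + DECODE_BITS)) & ((1 << NEXT_BITS) - 1)
    return next_field, decode, control


def next_csar(next_field, decode, opcode=0, func=0, rs_equals_rt=False):
    """Compute the value loaded into CSAR for the following cycle."""
    if decode == 0b00:
        extra = 0                      # use the next field unchanged
    elif decode == 0b01:
        extra = (opcode & 0x3F) << 2   # opcode times 4
    elif decode == 0b10:
        extra = func & 0x3F            # func field of the IR
    elif decode == 0b11:
        extra = int(rs_equals_rt)      # comparator output
    else:
        raise ValueError("decode is a 2-bit field")
    return (next_field | extra) & 0x1FF  # CSAR holds a 9-bit address


# A microword whose next field is 0x100 and whose decode field is 01.
word = (0x100 << 19) | (0b01 << 17)
nxt, dec, _ = split_microword(word)
print(hex(next_csar(nxt, dec, opcode=0x23)))
```

Only an OR is needed to form the address, so no adder sits in the path
that loads the CSAR.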
3. The 512 words of control store are organized as follows. Note
that quite a few are unused - the structure is set up to facilitate
quick computation of addresses by or-ring bits, rather than by
doing addition (which takes more time).
Words 0-1: microprogram for fetching and decoding an instruction
4-5: Final control word of beq instruction - first for
registers not equal (don't branch); second for
registers equal (branch)
6-7: Final control word of bne instruction - first for
registers not equal (branch); second for registers
equal (don't branch)
0x80..0xbf: Final control word of RType instructions
(handled separately because the last control word
for JR is different from other RType instructions)
Note that the final control word for a particular
RType instruction is at address 0x80 + func.
0x100 .. 0x1ff: Control words for the various instructions -
up to 4 per instruction. (Actually, each instruction
needs at most 3, but 4 is a power of 2 and allows us
to multiply the op-code by shifting)
Note that the control words for a particular instruction
are at the four successive locations beginning at
0x100 + 4 * opcode.
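A short check makes the point about or-ring concrete: because each base
address has zeros in every bit position the or-red field can occupy, OR
gives the same result as addition. The helper names below are made up
for this sketch, and the count of unallocated words is derived only from
the layout listed above.

```python
def rtype_final_address(func):
    # Base 0x80 has bits 0..5 clear, so OR-ing a 6-bit func equals adding it.
    return 0x80 | (func & 0x3F)


def instruction_base_address(opcode):
    # Base 0x100 has bits 2..7 clear, so OR-ing (opcode << 2) equals adding it.
    return 0x100 | ((opcode & 0x3F) << 2)


def branch_final_address(base, rs_equals_rt):
    # base is 4 for beq and 6 for bne; bit 0 selects "equal" or "not equal".
    return base | int(rs_equals_rt)


or_equals_add = (
    all(rtype_final_address(f) == 0x80 + f for f in range(64))
    and all(instruction_base_address(op) == 0x100 + 4 * op for op in range(64))
)

# Words the layout assigns a purpose: 0-1, 4-7, 0x80-0xbf, 0x100-0x1ff.
allocated = set(range(0, 2)) | set(range(4, 8)) \
    | set(range(0x80, 0xC0)) | set(range(0x100, 0x200))
unallocated = 512 - len(allocated)

print(or_equals_add, hex(instruction_base_address(0x23)), unallocated)
```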
4. Decoding of instructions is handled as follows: the decode field
of a microword can specify that some field of the instruction
register (or the output of the register comparator) - shifted two
places in the case of the opcode - is or-red with next field to
form the value placed in the CSAR.
a. Note that the first two control words executed by every
instruction are those at control store locations 0 and 1.
That at 0 does IR <- M[PC], PC <- PC + 4
That at 1 does Decode opcode
(We can't do this as part of 0 because the opcode is not
loaded into the IR until the clock at the end of the cycle,
which is the same time we need to load a new address into
CSAR, and thus cannot be used to help determine that address.)
b. This appears wasteful because it adds an extra cycle to each
instruction. (I.e. most instructions now use 5). In practice,
a richer ISA has operations that can be done speculatively at
this point - e.g. MAR <- Address portion of instruction -
not needed for every instruction but needed for enough to make
it worthwhile.
5. DEMO: Lab 6 Part 1 program - note values in next / decode / CSAR
at each step.
IV. Advantages/disadvantages of micro-programming
-- ------------------------ -- -----------------
A. Advantages
1. Great sophistication in the user instruction set can be achieved for
relatively low cost. Adding new instructions is cheap.
2. Multiple user instruction sets can be available on the same machine.
This allows a new machine to emulate a previous model to aid in
the conversion process - e.g.
a. Early IBM 360's contained microcode to emulate 1401's and/or 1620's
b. Early DEC VAX's emulated PDP-11's.
c. DEC Alpha's use a form of microcode (though different from what
we have discussed here) to emulate VAX's.
3. New architectures can be tried out by simulating them using writeable
control store on an existing machine. Special micro-engines have
been built for just this kind of work.
4. Micro-code can be written to allow direct execution of high-level
languages - e.g. LISP, Pascal.
5. For specialized applications (e.g. real-time systems), critical loops
can be microprogrammed for faster execution time.
6. Micro-programmed diagnostics.
7. Bit-sliced processors, allowing implementation of custom machines.
B. Disadvantages
1. For a simple machine, the extra hardware needed for the control store
and sequencer may be more complex than hardwiring.
2. For a given level of technology, hardwired control will be faster,
since there is no delay for micro-instruction fetch from ROM before
the control unit can produce a control word.
3. Does not lend itself well to parallelism, as we shall see.
C. CISCs typically use micro-programming, because hardwired control is
generally not feasible due to the complexity of their instruction set.
RISCs do not use micro-programming; their simplicity facilitates
hard-wired control, which is faster and allows pipelining - our next
topic. Indeed, a pipelined machine avoids even having a distinct
control unit of the sort we have discussed - control is distributed
throughout the pipeline, as we shall see.
Copyright ©2003 - Russell C. Bjork