CPU Control: Hardwired Control and Microprogramming

CS311 Lecture: CPU Control: Hardwired control and Microprogrammed control
                                                        Last revised 11/4/03

Objectives:

1. To explain the concept of a control word
2. To show how control words can be generated using hardwired control
3. To explain the concept of microprogramming
4. To discuss some advantages and disadvantages of microprogramming.

Materials:

1. Transparency of MARIE RTL
2. Transparency of state machine for MARIE

I. Introduction
-  ------------

   A. We have seen that a CPU - whether simple or complex - basically consists
      of a control unit, plus a data part, encompassing:

      1. A set of registers (including registers that interface to the system
         bus)

      2. A set of D-units (adders, shifters etc.)

      3. A set of data paths (busses) connecting the above.

      4. An interface to the "outside world" (memory, IO) - usually some sort
         of bus system.

      (We continue to assume a single instance of all components shared by all 
       steps of instruction execution, and sequential execution of instruction 
       steps.  When we discuss pipelining and other forms of parallelism, we 
       will see that some components will have to be replicated.)

   B. The data part is capable of performing a set of micro-operations, or 
      primitive computations, each of which can be performed in one cycle (clock 
      pulse).  Each micro-operation changes the contents of a single register.  
      An instruction in the user-visible instruction set must be programmed as a
      series of micro-operations (some of which may be done in parallel on the
      same clock pulse.) 

   C. Control of the system is accomplished by a control unit that - at the start
      of each clock cycle - activates the necessary control functions to cause
      the data part to perform the desired micro-operation(s) on the next clock
      pulse.  In the case of a multi-cycle CPU implementation (where a given
      component may perform different tasks on different cycles), this can 
      be pictured as follows:

        --------                        --------
                    ----------------->  Registers,
        Control     ----------------->  ALU,
                    ----------------->  data paths
                    ----------------->  Bus System (memory/IO)
        --------                        --------

   D. The set of control signals that pass from Control to the data part and bus
      system is called a micro-word or control word.  Conceptually, each bit of 
      this micro-word corresponds to the enabling of one particular 
      micro-operation that some system component can perform.

   E. The job of the control unit designer for such a CPU is to develop a 
      means whereby an orderly sequence of control words may be presented to the
      data part (and other hardware such as the memory) - one per clock pulse.

   F. There are two basic ways such a sequence of control words can be 
      generated:

      1. Hardwired control: The control unit is implemented as a state machine,
         with combinatorial circuits generating each of the control functions 
         on the basis of the current state and certain variables such as the 
         op-code of the user instruction undergoing execution.

         a. The state machine often has two levels of states: major states,
            each of which consists of a series of minor states.

         b. The major states may correspond to the various phases of
            instruction execution, or each major state may correspond to a
            single access to memory as part of instruction execution.

         c. Either way, the minor states correspond to the individual steps
            for a major state - e.g. if a certain major state requires three
            successive micro-operations, then it will have three minor states.

      2. Microprogrammed control.  The various control words needed to
         implement the user instructions are stored in a ROM, with a sequencer
         causing the appropriate control word to be fetched at each clock
         cycle and fed to the rest of the CPU.

II. An Example of Hardwired Control
--  ---------- -- --------- -------

   A. To get some feel for what is involved in hardwired control, we will
      discuss a hardwired control unit for our multicycle MIPS simulation.

   B. Observe that, in the RTL specification for this machine we
      discussed earlier, almost all instructions require exactly 4
      cycles to fetch and execute.  (One - j - requires only two - one
      for fetch and one for execute).  For simplicity, we will allocate 4 
      cycles to every instruction - thus wasting two on j.  

      1. Our state machine then looks like this:

                 Cycle 0 ---->  Cycle 1 ---->   Cycle 2 ---->   Cycle 3
                    ^                                              |
                    |----------------------------------------------|

         (Where a state transition occurs on each clock)

         a. The simplicity of the state machine for MIPS is a consequence
            of the regularity of the instructions, which in turn is a
            characteristic of the ISA designed to facilitate a pipelined
            implementation.  (The ISA makes this part of the implementation
            easy).

         b. Actually, a full implementation would need additional states
            to deal with issues like interrupts and exceptions.

         c. This simplified state machine can be realized by a 2 bit counter,
            with its output decoded to yield 4 signals used internally in the
            control unit.

                -----------     -----------
                |         |     | 4 way   |--- CYCLE0
                | 2 bit   |-----| decoder |--- CYCLE1
                | counter |-----|         |--- CYCLE2
                |         |     |         |--- CYCLE3
                -----------     -----------
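
            A minimal Python sketch of this counter and decoder may help make
            the idea concrete.  The function names are illustrative only and
            are not part of the simulation:

```python
def decode_cycle(count):
    """4-way decoder: turn a 2-bit count into one-hot CYCLE0..CYCLE3 signals."""
    return [int(count == i) for i in range(4)]


def next_count(count):
    """2-bit counter: advance on each clock, wrapping from 3 back to 0."""
    return (count + 1) & 0b11


# Step through five clocks, recording the decoder outputs on each one.
count = 0
trace = []
for _ in range(5):
    trace.append(decode_cycle(count))
    count = next_count(count)
```

            Exactly one of the four outputs is 1 on any clock, and the fifth
            entry of the trace is CYCLE0 again, since the counter wraps.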

      2. Most machines would utilize much more complex state machines - e.g.
         MARIE.

         1. Show RTL

         2. State machine might have 6+ major states - with only some used
            for any given instruction.  (No instruction would need them all)

            a. IF - fetch the instruction (Common to all instructions)

            b. OAC - calculate the address of the operand (used for
               instructions that reference memory)

            c. INDIRECT - go to memory to get the address of an operand
               (used for AddI, JumpI)

            d. OF - fetch an operand from memory (used for instructions that
               read an operand from memory)

            e. EXEC - execute an instruction (all instructions)

            f. OS - store an operand into memory (Store)

            g. Additional states for more complicated instructions like Jns
            
            Note that these are major states - some might have 2 minor
            states.

         3. Flow between states - TRANSPARENCY

         4. This sort of state machine could be implemented using design
            techniques we discussed earlier in the course.  
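
            As a rough illustration of "only some states used for any given
            instruction", the sketch below selects major states from a few
            properties of an instruction.  It assumes the states are visited
            in the order listed above (the actual flow is the one on the
            transparency), and it does not model the additional states needed
            for Jns.  The function and parameter names are hypothetical.

```python
def major_states(references_memory=False, indirect=False,
                 reads_operand=False, stores_operand=False):
    """Return the major states one instruction passes through, in order."""
    states = ["IF"]                  # every instruction is fetched
    if references_memory:
        states.append("OAC")         # operand address calculation
    if indirect:
        states.append("INDIRECT")    # go to memory for the operand's address
    if reads_operand:
        states.append("OF")          # fetch the operand itself
    states.append("EXEC")            # every instruction executes
    if stores_operand:
        states.append("OS")          # store the operand into memory
    return states


add_states = major_states(references_memory=True, reads_operand=True)
store_states = major_states(references_memory=True, stores_operand=True)
addi_states = major_states(references_memory=True, indirect=True,
                           reads_operand=True)
```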

   C. The control word for the MIPS simulation contains 17 bits.

      1. Review the meaning of the bits

         a. Some enable the loading of various registers (IR, PC, General)
            (Note that the ALU Input and Output registers are loaded on every
            cycle - there is nothing to be gained by having enables for them.)

         b. Some control the various MUXes.  These may be single bits (for a
            2-way MUX) or groups of bits - PC Source (2), Memory Address, 
            Register Source, ALU Source A, ALU Source B (2).

         c. One group of 2 controls _how_ the general register to be loaded (if
            there is one) is specified - i.e. a MUX that controls the input
            to the decoder that load-enables the correct register.

         d. One group of 3 controls the ALU Function (i.e. the internal MUX
            in the ALU).

         e. There is one bit each to control memory read and memory write.
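
         The field widths above add up to 17 bits.  The sketch below packs
         them into a single integer.  The field names and the order of the
         fields within the word are assumptions made for illustration; only
         the widths come from the list above.

```python
# (name, width in bits), most significant field first.  Order is assumed.
FIELDS = [
    ("ir_load", 1), ("pc_load", 1), ("reg_load", 1),   # register load enables
    ("pc_source", 2), ("mem_address", 1), ("reg_source", 1),
    ("alu_source_a", 1), ("alu_source_b", 2),          # MUX selects
    ("reg_dest", 2),                                   # how the register is specified
    ("alu_function", 3),
    ("mem_read", 1), ("mem_write", 1),
]
WIDTH = sum(width for _, width in FIELDS)


def pack(**values):
    """Build a control word; fields not mentioned default to 0."""
    word = 0
    for name, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bit(s)")
        word = (word << width) | value
    return word


def unpack(word):
    """Split a control word back into its named fields."""
    result = {}
    for name, width in reversed(FIELDS):
        result[name] = word & ((1 << width) - 1)
        word >>= width
    return result


# Cycle 0 of every instruction: IR <- M[PC], PC <- PC + 4
fetch_word = pack(ir_load=1, pc_load=1, mem_read=1, pc_source=0)
```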

      2. Each of these bits can be derived by a combinatorial network whose
         inputs are the current state of the machine plus certain fields
         in the IR.  It will simplify the design work if we assume that the
         opcode bits in the IR are connected to a 64-way decoder, with
         exactly one line being asserted for any given instruction (or none
         if the instruction is undefined)

         ----------     -----------
         |        |-----| 64-way  |---- RTYPE (0)
         |        |-----| decoder |---- J (2)
         | Opcode |-----|         |---- JAL (3)
         | bits   |-----|         |---- BEQ (4)
         | of IR  |-----|         |---- BNE (5)
         |        |-----|         |---- ADDI (8)
         ----------     |         |---- SLTI (0xa)
                        |         |---- ANDI (0xc)
                        |         |---- ORI (0xd)
                        |         |---- XORI (0xe)
                        |         |---- LUI (0xf)
                        |         |---- LW (0x23)
                        |         |---- SW (0x2b)
                        -----------

         (A full implementation of the ISA would have many more!)
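
         A sketch of this decoder in Python, assuming the standard MIPS
         layout in which the opcode occupies the top 6 bits of the 32-bit
         instruction.  The table holds only the opcodes shown in the diagram,
         and the function name is illustrative.

```python
OPCODES = {
    0x00: "RTYPE", 0x02: "J", 0x03: "JAL", 0x04: "BEQ", 0x05: "BNE",
    0x08: "ADDI", 0x0a: "SLTI", 0x0c: "ANDI", 0x0d: "ORI", 0x0e: "XORI",
    0x0f: "LUI", 0x23: "LW", 0x2b: "SW",
}


def decode_opcode(instruction):
    """Return one signal per known opcode; at most one of them is 1."""
    opcode = (instruction >> 26) & 0x3f
    return {name: int(code == opcode) for code, name in OPCODES.items()}


lw_lines = decode_opcode(0x8d280000)         # an instruction with opcode 0x23
undefined_lines = decode_opcode(0xfc000000)  # opcode 0x3f is not in the table
```

         For an undefined opcode, no line is asserted, which matches the
         "or none" case above.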

      3. The function to be realized by each network is determined
         by examining the RTL to see what value of the bit is implied by
         each.

         a. Example: the Load IR bit.  This is 1 on Cycle 0 of all instructions,
            and 0 everywhere else.  Thus, we can derive this bit as

                CYCLE0 -------- IR_LOAD

         b. Example: the Load PC bit.  This is 1 in four places, and 0
            everywhere else

            i. Cycle 0 of all instructions

           ii. Cycle 3 of jr

          iii. Cycle 3 of beq/bne if the branch condition is met

           iv. Cycle 1 of j

            This yields the following circuit:

                        CYCLE0 -------------------------------|
                                                              |
                        CYCLE3 ---------|                     |
                        RTYPE  ---------| AND ----------------|
                        (Func == 8) ----|                     |
                                                              |         
                        CYCLE3 ---------|                     |
                        BEQ ------------| AND ----------------| OR --- LOAD_PC
                        (RS == RT) -----|                     |
                                                              |          
                        CYCLE3 ---------|                     |
                        BNE ------------| AND ----------------|
        (RS == RT) ---- NOT ------------|                     |
                                                              |         
                        CYCLE1 ---------| AND ----------------|
                        J --------------|
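
            The same circuit can be written as a Boolean expression.  This is
            only a sketch of the logic in the diagram: the function and
            parameter names are illustrative, and the cycle is passed as a
            number rather than as four decoded signals.

```python
def load_pc(cycle, rtype=False, func=0, beq=False, bne=False, j=False,
            rs_equals_rt=False):
    """LOAD_PC as the OR of the five AND terms in the circuit above."""
    return int(
        cycle == 0                                    # all instructions
        or (cycle == 3 and rtype and func == 8)       # jr
        or (cycle == 3 and beq and rs_equals_rt)      # beq, branch taken
        or (cycle == 3 and bne and not rs_equals_rt)  # bne, branch taken
        or (cycle == 1 and j)                         # j
    )
```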

         c. This same process can be continued for each bit of the control
            word.  To simplify design, we can take advantage of don't-cares.

            Example: if LOAD_PC is 0, then we don't care about the value of
            PC_SOURCE

            It turns out we can make this 0 (PC + 4) on Cycle 0,
            1 (IR J-Format constant) on Cycle 1,
            3 (ALU Out) on Cycle 3

            Since this yields the correct value whenever LOAD_PC is 1 and
            is ignored otherwise
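
            A small sketch of this use of don't-cares: PC_SOURCE is chosen
            from the cycle number alone.  The value used on Cycle 2 is an
            arbitrary choice, since LOAD_PC is never 1 on that cycle.  The
            names are illustrative.

```python
def pc_source(cycle):
    """PC_SOURCE chosen from the cycle alone; Cycle 2 is a don't-care."""
    return {0: 0, 1: 1, 3: 3}.get(cycle, 0)


# (cycle, required source) for every case in which LOAD_PC can be 1.
REQUIRED = [
    (0, 0),  # all instructions: PC + 4
    (1, 1),  # j: IR J-Format constant
    (3, 3),  # jr, beq, bne: ALU Out
]
all_match = all(pc_source(cycle) == source for cycle, source in REQUIRED)
```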

         d. etc

III. Microprogramming
---  ----------------

   A. As you can see, for even a very simple machine like the one we just
      looked at, hardwired control leads to very complex control logic.  For a 
      more complex machine, the control-unit complexity would make hardwired 
      control virtually impossible.  Thus, the majority of CISCs use 
      microprogramming as a means of keeping the complexity of control within 
      limits (at the cost of a somewhat slower execution cycle.)

   B. The basic idea is this: we build the control unit around a small, very 
      fast memory (not visible to the programmer.)  

      1. The width of this memory is equal to the width of the control word, 
         plus some additional bits we will discuss shortly.  

      2. We store the various control words in the memory (which is therefore 
         called the CONTROL STORE).  We connect the output of the memory to 
         the control inputs of the ALU, data paths, etc. 

      3. On each clock, we fetch a control word and use it to determine what 
         the ALU etc. do on that clock.  

      4. We use a simple device called the SEQUENCER to arrange for the correct 
         sequence of control words to be fetched.  (The additional bits in each
         word in control store are used to control the sequencer.)

      5. The control store is generally a ROM; but it is also possible to use 
         a writeable memory (PROM or RAM) for the control memory.  This allows 
         for:

         a. Dynamic microprogramming - e.g. for adding custom user instructions
            to the standard set or emulating another machine.

         b. Diagnostics - a microprogram that exercises a suspected portion of
            the circuitry one micro-operation at a time may be loaded to assist
            in the isolation of hardware flaws.

   C. A micro-programmed implementation of our example MIPS machine.

      1. Structure of the control unit:

                -------------------------------------
                | Control store - small, fast ROM   |
                | 512 words x 32 bits               |
                |                                   |
                -------------------------------------
                 |||| ||||||||| || |||||||||||||||||
                -------------------------------------
                | Current word from control store   |
                -------------------------------------
                 |||| ||||||||| || |||||||||||||||||
                 |||| ||||||||| || |||||||||||||||||
                 not   sequencing   Control word to
                 used  control -    registers, data     
                       see below    paths, ALU, memory

      2. Micro-word format

        ----------------------------------------------------------
        | Sequencing control | Control word to send to data part | 
        ----------------------------------------------------------

         a. The control word part would be 17 bits wide, as discussed under
            hardwired control

         b. The sequencing control part contains two fields
        
            i. A 9 bit next micro-word field that contains the
               address of the next microword.  (Thus, each microword
               explicitly contains the address of its successor).
               This field is called "next"

           ii. A 2 bit field used to allow branching in the microprogram -
               we'll discuss this shortly.  This field is called "decode".

          iii. The structure for sequencing is as follows (where CSAR is
               "control store address register", which holds the address
               in control store of the current microword)

                                        To datapaths etc.
                ____________________         ^
                |                  |        | |
                |               ---------------------
                |               | Current microword |
                |               ---------------------
                |                        ^
                |                       | |
                |               ---------------------
                |               |   Control store   |
                |               |                   |
                |               |                   |
                |               ---------------------
                |                  ^
                |                 | |
                |               --------
                |               | CSAR | <-- Decode fields
                |               --------
                |                  ^
                |__________________|

                On each cycle, the next field of the current control
                word is placed in the CSAR.  Then that word is
                accessed in control store and becomes the current microword.

           iv. Optionally, some field from the instruction register or
               some other bit from the ALU can be OR-ed with the next field
               before it is loaded into the CSAR.  This is specified by
               the decode field, whose values are interpreted as follows:

                00 - Don't or anything with the next field
                01 - Or the opcode field of the instruction register times 4
                10 - Or the func field of the instruction register
                11 - Or the output of the comparison between the rs and rt
                     registers - 1 if they are equal, 0 if not
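
               The four cases can be sketched as a single function that
               computes the value loaded into the CSAR.  The function and
               parameter names are illustrative; the mask reflects the 9 bit
               width of the next field.

```python
def next_csar(next_field, decode, opcode=0, func=0, rs_equals_rt=False):
    """Value loaded into the CSAR for a given next field and decode field."""
    if decode == 0b00:
        extra = 0                  # nothing is OR-ed in
    elif decode == 0b01:
        extra = opcode << 2        # opcode times 4
    elif decode == 0b10:
        extra = func               # func field of the instruction
    else:
        extra = int(rs_equals_rt)  # comparator output: 1 if rs == rt
    return (next_field | extra) & 0x1ff
```

               For example, a next field of 0x100 with decode 01 and opcode
               0x23 gives 0x18c, and a next field of 4 with decode 11 gives 5
               when the two registers are equal.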

      3. The 512 words of control store are organized as follows.  Note
         that quite a few are unused - the structure is set up to facilitate
         quick computation of addresses by OR-ing bits, rather than by
         doing addition (which takes more time).

         Words 0-1: microprogram for fetching and decoding an instruction

               4-5: Final control word of beq instruction - first for
                    registers not equal (don't branch); second for
                    registers equal (branch)

               6-7: Final control word of bne instruction - first for
                    registers not equal (branch); second for registers
                    equal (don't branch)

               0x80..0xbf: Final control word of RType instructions
                    (handled separately because the last control word
                    for JR is different from other RType instructions)

                    Note that the final control word for a particular
                    RType instruction is at address 0x80 + func.

               0x100 .. 0x1ff: Control words for the various instructions -
                    up to 4 per instruction.  (Actually, each instruction
                    needs at most 3, but 4 is a power of 2 and allows us
                    to multiply the op-code by shifting)

                    Note that the control words for a particular instruction
                    are at the four successive locations beginning at
                    0x100 + 4 * opcode.
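
         The reason this layout allows OR-ing in place of addition is that
         each base address has zeros in every bit position the shifted field
         can occupy.  A short check of that claim, with illustrative names:

```python
OPCODE_BASE = 0x100   # start of the per-opcode region
RTYPE_BASE = 0x80     # start of the per-func region


def opcode_entry(opcode):
    """First control word for an instruction: base OR-ed with opcode * 4."""
    return OPCODE_BASE | (opcode << 2)


def rtype_entry(func):
    """Final control word for an RType instruction: base OR-ed with func."""
    return RTYPE_BASE | func


# For every 6-bit field value, OR-ing gives the same address as adding.
or_equals_add = (
    all(opcode_entry(op) == OPCODE_BASE + 4 * op for op in range(64))
    and all(rtype_entry(f) == RTYPE_BASE + f for f in range(64))
)
```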

      4. Decoding of instructions is handled as follows: the decode field
         of a microword can specify that some field of the instruction
         register (or the output of the register comparator) - shifted two
         places in the case of the opcode - is OR-ed with the next field to
         form the value placed in the CSAR.

         a. Note that the first two control words executed by every
            instruction are those at control store locations 0 and 1.

            That at 0 does      IR <- M[PC], PC <- PC + 4
            That at 1 does      Decode opcode

            (We can't do this as part of 0 because the opcode is not
             loaded into the IR until the clock at the end of the cycle,
             which is the same time we need to load a new address into
             CSAR, and thus cannot be used to help determine that address.)

         b. This appears wasteful because it adds an extra cycle to each
            instruction.  (I.e. most instructions now use 5).   In practice,
            a richer ISA has operations that can be done speculatively at
            this point - e.g. MAR <- Address portion of instruction -
            not needed for every instruction but needed for enough to make
            it worthwhile.
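
            The 5-cycle count can be seen by tracing CSAR values through a
            toy control store.  The sketch below is hypothetical: it keeps
            only the next and decode fields, it handles only decode values 00
            and 01, and the three words for opcode 0x23 (lw) are invented
            placeholders rather than the real microprogram.

```python
# address: (next, decode) - contents are illustrative only
CONTROL_STORE = {
    0x000: (0x001, 0b00),  # IR <- M[PC], PC <- PC + 4
    0x001: (0x100, 0b01),  # decode opcode: next OR-ed with opcode * 4
    0x18c: (0x18d, 0b00),  # first word for opcode 0x23 (placeholder)
    0x18d: (0x18e, 0b00),  # second word (placeholder)
    0x18e: (0x000, 0b00),  # third word, then back to fetch
}


def trace(opcode, start=0x000):
    """List the CSAR values visited while executing one instruction."""
    csar = start
    visited = []
    while True:
        visited.append(csar)
        next_field, decode = CONTROL_STORE[csar]
        extra = (opcode << 2) if decode == 0b01 else 0
        csar = next_field | extra
        if csar == start:          # back at the fetch microword
            return visited


lw_trace = trace(0x23)
```

            Here lw_trace holds five addresses: the two shared words at 0 and
            1, followed by the three words for the instruction itself.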

      5. DEMO: Lab 6 Part 1 program - note values in next / decode / CSAR
         at each step.

IV. Advantages/disadvantages of micro-programming
--  ------------------------ -- -----------------

   A. Advantages

      1. Great sophistication in the user instruction set can be achieved for
         relatively low cost.  Adding new instructions is cheap.

      2. Multiple user instruction sets can be available on the same machine.
         This allows a new machine to emulate a previous model to aid in
         the conversion process - e.g.

         a. Early IBM 360's contained microcode to emulate 1401's and/or 1620's
         b. Early DEC VAX's emulated PDP-11's.
         c. DEC Alpha's use a form of microcode (though different from what
            we have discussed here) to emulate VAX's.

      3. New architectures can be tried out by simulating them using writeable
         control store on an existing machine.  Special micro-engines have
         been built for just this kind of work.

      4. Micro-code can be written to allow direct execution of high-level
         languages - e.g. LISP, Pascal.

      5. For specialized applications (e.g. real-time systems), critical loops
         can be microprogrammed for faster execution time.

      6. Micro-programmed diagnostics.

      7. Bit-sliced processors, allowing implementation of custom machines.

   B. Disadvantages

      1. For a simple machine, the extra hardware needed for the control store
         and sequencer may be more complex than hardwiring.

      2. For a given level of technology, hardwired control will be faster, 
         since there is no delay for micro-instruction fetch from ROM before
         the control unit can produce a control word.
         
      3. Does not lend itself well to parallelism, as we shall see.

   C. CISCs typically use micro-programming, because hardwired control is
      generally not feasible due to the complexity of their instruction set.
      RISCs do not use micro-programming; their simplicity facilitates
      hard-wired control, which is faster and allows pipelining - our next
      topic.  Indeed, a pipelined machine avoids even having a distinct 
      control unit of the sort we have discussed - control is distributed
      throughout the pipeline, as we shall see.
Copyright ©2003 - Russell C. Bjork