We will design a simplified MIPS processor

0
0
42
1 week ago
PDF Preview
Full text
(1)Datapath & Control Design • •. We will design a simplified MIPS processor The instructions supported are – memory-reference instructions: lw, sw – arithmetic-logical instructions: add, sub, and, or, slt – control flow instructions: beq, j. •. Generic Implementation: – – – –. •. use the program counter (PC) to supply instruction address get the instruction from memory read registers use the instruction to decide exactly what to do. All instructions use the ALU after reading the registers Why? memory-reference? arithmetic? control flow?. 1.

(2) What blocks we need • •. • • • • •. We need an ALU – We have already designed that We need memory to store inst and data – Instruction memory takes address and supplies inst – Data memory takes address and supply data for lw – Data memory takes address and data and write into memory We need to manage a PC and its update mechanism We need a register file to include 32 registers – We read two operands and write a result back in register file Some times part of the operand comes from instruction We may add support of immediate class of instructions We may add support for J, JR, JAL. 2.

(3) Simple Implementation •. Include the functional units we need for each instruction. Instruction address MemWrite. PC Instruction. Add Sum. Instruction memory. Address. Write data. a. Instruction memory. 5 Register numbers. 5 5. Data. b. Program counter. Read register 2 Registers Write register Write data. Data memory. ALU control. Read data 1. Sign extend. 32. MemRead a. Data memory unit. Data. 16. c. Adder. 3. Read register 1. Read data. Zero ALU ALU result. Read data 2. b. Sign-extension unit. Why do we need this stuff?. RegWrite a. Registers. b. ALU. 3.

(4) More Implementation Details •. Abstract / Simplified View:. Data Register # PC. Address. Instruction. Instruction memory. Registers. ALU. Address. Register # Data memory. Register # Data. •. Two types of functional units: – elements that operate on data values (combinational) • Example: ALU. – elements that contain state (sequential) • Examples: Program and Data memory, Register File. 4.

(5) Managing State Elements • •. Unclocked vs. Clocked Clocks used in synchronous logic – when should an element that contains state be updated? falling edge. cycle time rising edge. 5.

(6) MIPS Instruction Format 31. 26 25. 21 20 REG 1. LW 31. 26 25. 21 20. 31. 26 25. 31. 21 20. 26 25 R-TYPE. 31. 26 25. 31. 26 25 JUMP. DST 16 15. JUMP. OFFSET. 6. 5. 0 OFFSET. 6. SHIFT AMOUNT 11 10. 0. 5. 0. ADD/AND/OR/SLT. 6. 5. 0. 5. 0. IMMEDIATE DATA. REG 2. 21 20. 11 10. 0. 5. BRANCH ADDRESS. REG 2. REG 1. 6. 11 10. 16 15. 21 20. 5 OFFSET. 11 10. 16 15. 21 20. 6. STORE ADDRESS. REG 2. REG 1. I-TYPE. 16 15 REG 2. REG 1. BEQ/BNE/J. 11 10 LOAD ADDRESS. REG 2. REG 1. SW. 16 15. 16 15. 11 10. 6 ADDRESS. 6.

(7) Building the Datapath •. Use multiplexors to stitch them together PCSrc M u x. Add Add ALU result. 4 Shift left 2. PC. Read address Instruction Instruction memory. Registers Read register 1 Read Read data 1 register 2 Write register Write data RegWrite 16. ALUSrc. Read data 2. Sign extend. M u x. 3. ALU operation. Zero ALU ALU result. MemWrite MemtoReg. Address. Read data. Data memory Write data. M u x. 32 MemRead. 7.

(8) A Complete Datapath for R-Type Instructions • •. Lw, Sw, Add, Sub, And, Or, Slt can be performed For j (jump) we need an additional multiplexor PCSrc. Add ALU Add result. 4 RegWrite Instruction [25–21] PC. Read address Instruction [31–0] Instruction memory. Instruction [20–16] 1 M u Instruction [15–11] x 0 RegDst Instruction [15–0]. Read register 1 Read register 2. Read data 1. Read Write data 2 register Write data Registers 16. Sign 32 extend. 1 M u x 0. Shift left 2. MemWrite ALUSrc 1 M u x 0. ALU control. Zero ALU ALU result. MemtoReg Address. Read data. Write Data data memory. 1 M u x 0. MemRead. Instruction [5–0] ALUOp. 8.

(9) What Else is Needed in Data Path •. Support for j and jr – For both of them PC value need to come from somewhere else – For J, PC is created by 4 bits (31:28) from old PC, 26 bits from IR (27:2) and 2 bits are zero (1:0) – For JR, PC value comes from a register. •. Support for JAL – Address is same as for J inst – OLD PC needs to be saved in register 31. •. And what about immediate operand instructions – Second operand from instruction, but without shifting. •. Support for other instructions like lw and immediate inst write. 9.

(10) Operation for Each Instruction LW: 1. READ INST. SW: 1. READ INST. R/I/S-Type: 1. READ INST. BR-Type: 1. READ INST. JMP-Type: 1. READ INST. 2. READ REG 1 2. READ REG 1 2. READ REG 1 2. READ REG 1 2. READ REG 2. READ REG 2. READ REG 2. READ REG 2. 3. ADD REG 1 + 3. ADD REG 1 + 3. OPERATE on 3. SUB REG 2 OFFSET OFFSET REG 1 / REG 2 from REG 1 4. READ MEM. 4. WRITE MEM 4.. 5. WRITE REG2 5.. 5. WRITE DST. 4.. 5.. 3.. 4.. 5.. 10.

(11) Data Path Operation 4. A D D. M U X. ADD Shift Left 2. M U X. AND jmp 25-00 zero br 25-21 PC. IA. INST MEMORY. 20-16. INST 31-00 M 15-11 U X. RA1. RD1. RA2 REG FILE. WA WD RD2 WE RDES Sign 15-00 Ext 05-00. 31-26. ALU. DATA MEMORY. M U X ALU SRC. MA. WD MD MR MW ALU CON ALUOP. M U X Memreg. CONTROL. 11.

(12) Our Simple Control Structure •. All of the logic is combinational. •. We wait for everything to settle down, and the right thing to be done – ALU might not produce “right answer” right away – we use write signals along with clock to determine when to write. •. Cycle time determined by length of the longest path. State element 1. Combinational logic. State element 2. Clock cycle. We are ignoring some details like setup and hold times. 12.

(13) Control Points 4. A D D. M U X. ADD Shift Left 2. M U X. AND jmp 25-00 zero br 25-21 PC. IA. INST MEMORY. 20-16. INST 31-00 M 15-11 U X. RA1. RD1. RA2 REG FILE. WA WD RD2 WE RDES Sign 15-00 Ext 05-00. 31-26. ALU. DATA MEMORY. M U X ALU SRC. MA. WD MD MR MW ALU CON ALUOP. M U X Memreg. CONTROL. 13.

(14) LW Instruction Operation 4. A D D. M U X. ADD Shift Left 2. M U X. AND jmp 25-00 zero br 25-21 PC. IA. INST MEMORY. 20-16. INST 31-00 M 15-11 U X. RA1. RD1. RA2 REG FILE. WA WD RD2 WE RDES Sign 15-00 Ext 05-00. 31-26. ALU. DATA MEMORY. M U X ALU SRC. MA. WD MD MR MW ALU CON ALUOP. M U X Memreg. CONTROL. 14.

(15) SW Instruction Operation 4. A D D. M U X. ADD Shift Left 2. M U X. AND jmp 25-00 zero br 25-21 PC. IA. INST MEMORY. 20-16. INST 31-00 M 15-11 U X. RA1. RD1. RA2 REG FILE. WA WD RD2 WE RDES Sign 15-00 Ext 05-00. 31-26. ALU. DATA MEMORY. M U X ALU SRC. MA. WD MD MR MW ALU CON ALUOP. M U X Memreg. CONTROL. 15.

(16) R-Type Instruction Operation 4. A D D. M U X. ADD Shift Left 2. M U X. AND jmp 25-00 zero br 25-21 PC. IA. INST MEMORY. 20-16. INST 31-00 M 15-11 U X. RA1. RD1. RA2 REG FILE. WA WD RD2 WE RDES Sign 15-00 Ext 05-00. 31-26. ALU. DATA MEMORY. M U X ALU SRC. MA. WD MD MR MW ALU CON ALUOP. M U X Memreg. CONTROL. 16.

(17) BR-Instruction Operation 4. A D D. M U X. ADD Shift Left 2. M U X. AND jmp 25-00 zero br 25-21 PC. IA. INST MEMORY. 20-16. INST 31-00 M 15-11 U X. RA1. RD1. RA2 REG FILE. WA WD RD2 WE RDES Sign 15-00 Ext 05-00. 31-26. ALU. DATA MEMORY. M U X ALU SRC. MA. WD MD MR MW ALU CON ALUOP. M U X Memreg. CONTROL. 17.

(18) Jump Instruction Operation 4. A D D. M U X. ADD Shift Left 2. M U X. AND jmp 25-00 zero br 25-21 PC. IA. INST MEMORY. 20-16. INST 31-00 M 15-11 U X. RA1. RD1. RA2 REG FILE. WA WD RD2 WE RDES Sign 15-00 Ext 05-00. 31-26. ALU. DATA MEMORY. M U X ALU SRC. MA. WD MD MR MW ALU CON ALUOP. M U X Memreg. CONTROL. 18.

(19) Control •. For each instruction – Select the registers to be read (always read two) – Select the 2nd ALU input – Select the operation to be performed by ALU – Select if data memory is to be read or written – Select what is written and where in the register file – Select what goes in PC. •. Information comes from the 32 bits of the instruction. •. Example: add $8, $17, $18 Instruction Format: 000000 10001 10010 01000 00000 100000 op. rs. rt. rd. shamt. funct. 19.

(20) Adding Control to DataPath 0 M u x Add Add 4 Instruction [31– 26]. Control. Instruction [25– 21] PC. Instruction [20– 16]. Instruction memory. Instruction [15– 11]. 1. Shift left 2. RegDst Branch MemRead MemtoReg ALUO p MemWrite ALUSrc RegWrite Read register 1. Read address Instruction [31– 0]. ALU result. 0 M u x 1. Read data 1 Read register 2 Registers Read Write data 2 register. Zero 0 M u x 1. Write data. ALU. ALU result. Address. Write data Instruction [15– 0]. 16. Sign extend. Read data Data memory. 1 M u x 0. 32 ALU control. Instruction [5– 0]. Memto- Reg Mem Mem Instruction RegDst ALUSrc Reg Write Read Write Branch ALUOp1 ALUp0 R-format 1 0 0 1 0 0 0 1 0 lw 0 1 1 1 1 0 0 0 0 sw X 1 X 0 0 1 0 0 0 beq X 0 X 0 0 0 1 0 1. 20.

(21) ALU Control • •. ALU's operation based on instruction type and function code – e.g., what should the ALU do with any instruction Example: lw $1, 100($2). •. •. 35. 2. 1. op. rs. rt. 16 bit offset. ALU control input 000 001 010 110 111. •. 100. AND OR add subtract set-on-less-than. Why is the code for subtract 110 and not 011?. 21.

(22) Other Control Information •. •. Must describe hardware to compute 3-bit ALU conrol input – given instruction type 00 = lw, sw ALUOp 01 = beq, computed from instruction type 10 = arithmetic 11 = Jump – function code for arithmetic Control can be described using a truth table: ALUOp ALUOp1 ALUOp0 0 0 X 1 1 X 1 X 1 X 1 X 1 X. F5 X X X X X X X. Funct field F4 F3 F2 F1 X X X X X X X X X 0 0 0 X 0 0 1 X 0 1 0 X 0 1 0 X 1 0 1. Operation F0 X X 0 0 0 1 0. 010 110 010 110 000 001 111. 22.

(23) Implementation of Control •. Simple combinational logic to realize the truth tables Inputs Op5 Op4. ALUOp. Op3 Op2 Op1. ALUcontrol block ALUOp0. Op0. ALUOp1 Outputs. F3 F2 F(5–0). Operation1. Operation. Iw. sw. beq. RegDst ALUSrc MemtoReg RegWrite. F1 Operation0 F0. R-format. Operation2. MemRead MemWrite Branch ALUOp1 ALUOpO. 23.

(24) A Complete Datapath with Control. 24.

(25) Datapath with Control and Jump Instruction. 25.

(26) Timing: Single Cycle Implementation •. Calculate cycle time assuming negligible delays except: – memory (2ns), ALU and adders (2ns), register file access (1ns) PCSrc. Add ALU Add result. 4 RegWrite Instruction [25– 21] PC. Read address Instruction [31– 0] Instruction memory. Instruction [20– 16] 1 M u Instruction [15– 11] x 0 RegDst Instruction [15– 0]. Read register 1 Read register 2. Read data 1. Read Write data 2 register Write Registers data 16. Sign 32 extend. 1 M u x 0. Shift left 2. MemWrite ALUSrc 1 M u x 0. ALU control. Zero ALU ALU result. MemtoReg Address. Read data. Data Write memory data. 1 M u x 0. MemRead. Instruction [5– 0] ALUOp. 26.

(27) Where we are headed • •. •. Design a data path for our machine specified in the next 3 slides Single Cycle Problems: – what if we had a more complicated instruction like floating point? – wasteful of area One Solution: – use a “smaller” cycle time and use different numbers of cycles for each instruction using a “multicycle” datapath:. Instruction register PC. Address. Data A. Memory. Data. Register #. Instruction or data Memory data register. ALU. Registers Register #. ALUOut. B Register #. 27.

(28) Machine Specification • • • • • • • •. 16-bit data path (can be 4, 8, 12, 16, 24, 32) 16-bit instruction (can be any number of them) 16-bit PC (can be 16, 24, 32 bits) 16 registers (can be 1, 4, 8, 16, 32) With m register, log m bits for each register Offset depends on expected offset from registers Branch offset depends on expected jump address Many compromise are made based on number of bits in instruction. 28.

(29) Instruction • • • • • • •. LW R2, #v(R1) ; Load memory from address (R1) + v SW R2, #v(R1) ; Store memory to address (R1) + v R-Type – OPER R3, R2, R1 ; Perform R3 ß R2 OP R1 – Five operations ADD, AND, OR, SLT, SUB I-Type – OPER R2, R1, V ; Perform R2 ß R1 OP V – Four operation ADDI, ANDI, ORI, SLTI B-Type – BC R2, R1, V; Branch if condition met to address PC+V – Two operation BNE, BEQ Shift class – SHIFT TYPE R2, R1 ; Shift R1 of type and result to R2 – One operation Jump Class -- JAL and JR (JAL can be used for Jump) – What are th implications of J vs JAL – Two instructions. 29.

(30) Instruction bits needed • • • • • • • • • • •. LW/SW/BC – Requires opcode, R2, R1, and V values R-Type – Requires opcode, R3, R2, and R1 values I-Type – Requires opcode, R2, R1, and V values Shift class – Requires opcode, R2, R1, and shift type value JAL requires opcode and jump address JR requires opcode and register address Opcode – can be fixed number or variable number of bits Register address – 4 bits if 16 registers How many bits in V? How many bits in shift type? – 4 for 16 types, assume one bit shift at a time How many bits in jump address?. 30.

(31) Performance • • • •. Measure, Report, and Summarize Make intelligent choices See through the marketing hype Key to understanding underlying organizational motivation Why is some hardware better than others for different programs? What factors of system performance are hardware related? (e.g., Do we need a new machine, or a new operating system?) How does the machine's instruction set affect performance?. 31.

(32) Which of these airplanes has the best performance?. Airplane. Passengers. Boeing 737-100 Boeing 747 BAC/Sud Concorde Douglas DC-8-50. 101 470 132 146. Range (mi) Speed (mph) 630 4150 4000 8720. 598 610 1350 544. •How much faster is the Concorde compared to the 747? •How much bigger is the 747 than the Douglas DC-8?. 32.

(33) Computer Performance: TIME, TIME, TIME •. Response Time (latency) — How long does it take for my job to run? — How long does it take to execute a job? — How long must I wait for the database query?. •. Throughput — How many jobs can the machine run at once? — What is the average execution rate? — How much work is getting done?. • If we upgrade a machine with a new processor what do we increase? If we add a new machine to the lab what do we increase?. 33.

(34) Execution Time •. •. •. Elapsed Time – counts everything (disk and memory accesses, I/O , etc.) – a useful number, but often not good for comparison purposes CPU time – doesn't count I/O or time spent running other programs – can be broken up into system time, and user time Our focus: user CPU time – time spent executing the lines of code that are "in" our program. 34.

(35) Clock Cycles •. Instead of reporting execution time in seconds, we often use cycles seconds cycles seconds = × program program cycle. •. Clock “ticks” indicate when to start activities (one abstraction):. time. • •. cycle time = time between ticks = seconds per cycle clock rate (frequency) = cycles per second (1 Hz. = 1 cycle/sec) 1 × 10 9 = 5 nanoseconds cycle time A 200 Mhz. clock has a 200 × 10 6. 35.

(36) How to Improve Performance seconds cycles seconds = × program program cycle. So, to improve performance (everything else being equal) you can either. ________ the # of required cycles for a program, or ________ the clock cycle time or, said another way, ________ the clock rate.. 36.

(37) How many cycles are required for a program?. .... 6th. 5th. 4th. 3rd instruction. 2nd instruction. Could assume that # of cycles = # of instructions 1st instruction. •. time. This assumption is incorrect, different instructions take different amounts of time on different machines. Why? hint: remember that these are machine instructions, not lines of C code. 37.

(38) Different numbers of cycles for different instructions. time. •. Multiplication takes more time than addition. •. Floating point operations take longer than integer ones. •. Accessing memory takes more time than accessing registers. •. Important point: changing the cycle time often changes the number of cycles required for various instructions (more later). 38.

(39) Now that we understand cycles •. A given program will require – some number of instructions (machine instructions) – some number of cycles – some number of seconds. •. We have a vocabulary that relates these quantities: – cycle time (seconds per cycle) – clock rate (cycles per second) – CPI (cycles per instruction) a floating point intensive application might have a higher CPI. – MIPS (millions of instructions per second) this would be higher for a program using simple instructions. 39.

(40) Performance • •. Performance is determined by execution time Do any of the other variables equal performance? – # of cycles to execute program? – # of instructions in program? – # of cycles per second? – average # of cycles per instruction? – average # of instructions per second?. •. Common pitfall: thinking one of the variables is indicative of performance when it really isn’t.. 40.

(41) # of Instructions Example •. A compiler designer is trying to decide between two code sequences for a particular machine. Based on the hardware implementation, there are three different classes of instructions: Class A, Class B, and Class C, and they require one, two, and three cycles (respectively). The first code sequence has 5 instructions: 2 of A, 1 of B, and 2 of C The second sequence has 6 instructions: 4 of A, 1 of B, and 1 of C. Which sequence will be faster? How much? What is the CPI for each sequence?. 41.

(42) MIPS example •. Two different compilers are being tested for a 100 MHz. machine with three different classes of instructions: Class A, Class B, and Class C, which require one, two, and three cycles (respectively). Both compilers are used to produce code for a large piece of software. The first compiler's code uses 5 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions. The second compiler's code uses 10 million Class A instructions, 1 million Class B instructions, and 1 million Class C instructions.. • •. Which sequence will be faster according to MIPS? Which sequence will be faster according to execution time?. 42.

(43)

Nuevo documento

Ejercicios propuestos I
son adem´as tangentes a los lados de un ´angulo agudo por la parte interior del mismo, por lo que sus centros estar´an situados sobre la bisectriz de dicho ´angulo.. Halla el valor del
Números racionales e irracionales
Æ Números IRRACIONALES: Los números irracionales, al contrario que los racionales, no pueden expresarse como cociente de números enteros, luego no pueden ser ni decimales exactos ni
NÚMEROS REALES: Unidad Didáctica
  Los  números  racionales  también  contienen  a  los  números  que  tienen  expresión  decimal  exacta  0’12345  y  a  los  que  tienen  expresión  decimal 
Apuntes de Trigonometría III
Veamos que dado un ángulo cualquiera comprendido entre 90º y 360º, existe otro ángulo en el primer cuadrante con razones trigonométricas iguales, en valor absoluto, a las del dado...
Apuntes DE TRIGONOMETRÍA
Teorema del seno y del coseno Estos teoremas se utilizan para resolver triángulos no rectángulos, en los que no podemos aplicar ni el teorema de Pitágoras, ni las razones
Las espirales en el arte
La influencia de saber griego en el cual todo había de poder representarse con regla y compás en Durero marcaron la forma que obtuvo el método para resolver la espiral áurea, las cuales
Dossier cuarto op b
− Adquirir métodos y herramientas para resolver problemas del cálculo de probabilidades − Estudiar los diferentes casos que se pueden presentar a la hora de contar el número de
Cálculo Algebraico
Las operaciones de suma y de multiplicaci´on se extienden a este nuevo conjunto, y la resta queda bien definida entre cualquier par de n´umeros enteros.. Si bien la resta es una
RESUMEN de inecuaciones VA
Desigualdades no lineales Ejemplo Resolver la desigualdad x 2 > 2x Soluci´on: Cuarto paso: para la u ´ltima fila, multiplicamos todos los signos de cada columna:... Valor Absoluto
Inecuacionesysistemas resueltas
Estudiamos el signo de la función en cada uno de estos intervalos ¿Verifica la inecuación?... SISTEMAS DE

Etiquetas

Documento similar

Factors that affect the English language teaching-learning process in Ecuadorian public high schools
If we have a small classroom and a large class size, it will affect the process of teaching and learning because we need to do pair tasks, group tasks and debates in the classroom to
0
0
76
Profiling and Analysis of Irregular Memory Accesses of Memory-Intensive Embedded Programs-Edición Única
We need a framework to perform the study of how the memory access behavior impacts in the memory hierarchy, also we need to be able to estimate the power consumption and performance of
0
0
258
Fuzzy logic controllers on chip
a The programmable blocks CAB and PL block are configured and accessed using special function registers and data memory both internal and external of the mP8051 memory organization..
0
0
6
LaST: Language Study Tool
Memory Panel Memory Segments: LaST allows the user to rapidly identify if a memory cell is being used as part of an activation record, heap data, static data or a register, using
0
0
12
Semantic based visualization: a first approach
We have concluded that the first stage of the “Unified Visualization Model”, the raw data, will include an XML representation of the input data and with this the associated semantic;
0
0
5
Do we need an open face recognition search engine?
2 Second, pictorial data present on the Internet can be used for unauthorized identification, connecting faces loaded with identity data as pictures in SNs to pictures of people in
0
0
8
FIPBLOX: a graphical interactive design tool for FIPSOC
The Dual Port RAM module includes two independent pairs of Address and Data Output pins that share access to the same region of memory, allowing simultaneous read and write access to
0
0
11
Using JOP to build a chip multiprocessor JVM for embedded realtime systems
Provide a way for the processors to share a common memory system In order to implement a shared memory architecture, we need to provide shared access to a single memory interface.. In
0
0
9
Probabilistic Multicast Trees
We have devised Probabilistic Multicast Trees PMT [8], which is an optimizing mechanism that is designed to improve the data delivery latency and data delivery efficiency of any
0
0
6
Memory and Oblivion. Testimonies and Descriptions
When we contemplate a people that remembers, rather than forgetting, we are aware that this memory has been passed down from generation to generation via the channels and depositaries
0
0
7
Essays on Macroeconometrics
Using a measure of social capital based on trust and voluntary cooperation and exploiting enrollment data to proxy for human capital, we obtained the observations needed to address the
0
0
149
We Attach to What We Nurture
In the early 1980s, I met a thirteen-year-old, Deborah, who responded to the experience of computer programming by speaking about the pleasures of putting “a piece of your mind into the
0
0
10
Introduction to Statistics for Biomedical Engineers – Kristina M. Ropella
Assuming a Probability Model From the Sample Data Now that we have collected the data, graphed the histogram, estimated measures of central tendency and variability, such as mean,
0
0
102
Cas – Evaluating business models for e-books through usage data analysis – Grigson – 2009
Continuing analysis of usage data at both local and national level will be needed if we are to understand how our e-book collections are being used, and ensure both that we are
0
0
11
Dissertation Proposal Growth, employment and labor productivity: A long run analysis for the case of Argentina
In order to do that I will study the long run evolution and determinants of labor supply by building an ADL model, and then we will combine the demand and supply sides through
0
0
20
Show more