contents: 0-1. CA Intro
Pipelined datapath
We need registers between stages to hold information produced in previous stage and make isolation
Data hazards
When an instruction depends on the completion of data access by a previous instruction, it causes data hazards.
How to detect & when to forward
- Step 1: Pass register numbers along pipeline
- if I-format or branch instruction, Rd is invalid value
- Step 2: Check the following data hazard conditions
- EX/MEM.RegisterRd == ID/EX.RegisterRs
- EX/MEM.RegisterRd == ID/EX.RegisterRt
- MEM/WB.RegisterRd == ID/EX.RegisterRs
- MEM/WB.RegisterRd == ID/EX.RegisterRt
- Step 3: If there is a data hazard, then do forwarding
- But only if the forwarding instruction writes to a register
- Check the RegWrite signal in EX/MEM.RegWrite
and MEM/WB.RegWrite
is 1
- And only if Rd for the forwarding instruction is not $zero
- $zero
cannot be overwritten
- EX/MEM.RegisterRd != 0
- MEM/WB.RegisterRd != 0
How to process
- EX hazards
- If (EX/MEM.RegWrite == 1 && EX/MEM.RegisterRd != 0 && EX/MEM.RegisterRd == ID/EX.RegisterRs
)
ForwardA = 10
- If (EX/MEM.RegWrite == 1 && EX/MEM.RegisterRd != 0 && EX/MEM.RegisterRd == ID/EX.RegisterRt
)
ForwardB = 10
- MEM hazard
- If (EX/MEM.RegWrite == 1 && EX/MEM.RegisterRd != 0 && MEM/WB.RegisterRd == ID/EX.RegisterRs
)
ForwardA = 01
- If (EX/MEM.RegWrite == 1 && EX/MEM.RegisterRd != 0 && MEM/WB.RegisterRd == ID/EX.RegisterRt
)
ForwardB = 01
- Both EX & MEM hazards
- Use the most recent result (the result in EX/MEM)
Practice
Load-use data hazards
But, sometimes, we cannot avoid stalls by forwarding
We need to stall for one cycle when we have to forward the data from MEM/WB registers to ALU stage
How to detect & when to forward
- Step 1: Check the following condition
- ID/EX.MemRead == 1 &&
(ID/EX.RegisterRt == IF/ID.RegisterRs || ID/EX.RegisterRt == IF/ID.RegisterRt)
- Step 2: If detect, stall
- Prevent PC and IF/ID from changing
the same instruction is executed in IF & ID stages
- Insert nop
in the EX stage by setting the control signals in ID/EX register to 0
- PCWrite = 0
, IF/IDWrite = 0
(if no load-use data hazards, they are 1)
Practice
Code scheduling
Reorder code to avoid the load-use data hazards (done by compiler)
Algorithm
- Check the existence of load hazards
- A load hazard occurs when the destination register of a load instruction is used as a source register of its next instruction
- If exists, check whether there is a code that can ve executed after the load instruction
- This reordering must not change the result of the program, while solving hazard
- We must consider the instruction dependencies
Control hazards
If branch outcome is determined in a MEM stage, there will be three pipeline bubbles
How to process
- Simple solution: static branch prediction
- Predict that the branch will not be taken (just the next instruction will be executed)
- If the prediction is correct, there will be no pipeline stall
- If the prediction is incorrect, flush 3 instructions and insert 3 pipeline bubbles
- Improving by early comparison in ID stage
- Use same prediction
- If the prediction is correct, there will be no pipeline stall
- If the prediction is incorrect, flush 1 instruction and insert 1 pipeline bubble
- Dynamic branch prediction
- History-based prediction with BPT(branch prediction table), store a recent branch decision (taken / not taken)
- Indexed by branch instruction addresses
Dynamic branch prediction
Make a better branch prediction in a way to reduce the number of misprediction
-
Step 1: access a branch prediction table by using the instruction address
-
Step 2: Check the prediction value in the table and fetch the next (not taken) or target branch instruction (taken)
-
Step 3: If the prediction is wrong, flush pipeline and flip prediction
2-bit predictor
Only change prediction on two successive mispredictions
Branch target buffer
Still, we should experience 1 pipeline stall to compute the target address for a taken branch
-
Cache of target address
-
Indexed by the address of branch instructions
-
When a branch instruction is fetched and its prediction is "taken",
then check the branch target buffer
- If there is a target address, fetch the target immediately