I am trying to code a cycle-accurate model of a simple CPU that has multiple pipeline stages: Fetch, Decode, Execute, MemAccess, and Register WriteBack. I am thinking of coding this in SystemVerilog by having each pipeline stage execute on a separate thread. Is this a recommended approach, or are there better ways? What are the potential downsides of this approach?
In reply to tpan:
Describe what you mean by “separate thread”. “Thread” can be an overused term, and its canonical definitions lean more heavily on a software-based meaning.
You could model a CPU’s operating stages with separate modules, or separate procedural blocks, or all stuffed together in one always block. There are probably pluses and minuses for each of these strategies. I suggest starting out with whatever is clearest in your head, then adjusting as you learn and design the thing.
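To make the “separate procedural blocks” option concrete, here is a minimal sketch rather than a recommendation over the other styles. Every name in it (pipe_sketch, instr_in, fd_instr, de_instr, instr_out) is hypothetical, and the stages only pass the instruction along instead of doing real work. Each stage is its own always_ff block, and the nonblocking assignments (<=) mean each block reads the value its predecessor held before the clock edge, so an instruction advances one stage per cycle no matter which block the simulator happens to run first.

```systemverilog
// Hypothetical 3-stage skeleton: one procedural block per pipeline stage.
module pipe_sketch (
  input  logic        clk,
  input  logic        rst_n,
  input  logic [31:0] instr_in,   // instruction presented to Fetch
  output logic [31:0] instr_out   // instruction leaving Execute
);
  logic [31:0] fd_instr;  // Fetch  -> Decode pipeline register
  logic [31:0] de_instr;  // Decode -> Execute pipeline register

  // Fetch stage: capture the incoming instruction.
  always_ff @(posedge clk or negedge rst_n)
    if (!rst_n) fd_instr <= '0;
    else        fd_instr <= instr_in;

  // Decode stage: reads the value fd_instr held before this clock edge.
  always_ff @(posedge clk or negedge rst_n)
    if (!rst_n) de_instr <= '0;
    else        de_instr <= fd_instr;

  // Execute stage: reads the value de_instr held before this clock edge.
  always_ff @(posedge clk or negedge rst_n)
    if (!rst_n) instr_out <= '0;
    else        instr_out <= de_instr;
endmodule
```

The same three registers could live in one always_ff block or in three modules; the cycle-by-cycle behavior would be the same, so the choice is mostly about readability.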
Regards,
Mark
In reply to Mark Curry:
I am thinking of having something like this in my scoreboard:
task run_phase(uvm_phase phase);
  super.run_phase(phase);
  fork
    // One thread per pipeline stage, each woken on every clock edge
    forever begin
      @(posedge `TB_CLK);
      fetch_process();
    end
    forever begin
      @(posedge `TB_CLK);
      decode_process();
    end
    ... // Similar threads for other pipeline stages
    ...
  join_none
endtask
Each of the pipeline stages has a corresponding queue defined (fetch_process_q[$],…). The fetch_process_q is fed instructions from the scoreboard TLM port that connects to the fetch interface monitor.
Within the fetch_process task I intend to do something like the following:
task fetch_process();
  // Only act when there is an instruction waiting
  if (fetch_process_q.size() > 0) begin
    fetch_instruction = fetch_process_q[0];
    ...
    // Process the instruction
    ...
    // Send the instruction into decode queue
    decode_process_q.push_back(fetch_instruction);
    // Pop out the processed instruction
    void'(fetch_process_q.pop_front());
  end
endtask
Similar to fetch_process, I plan to have a separate task for each stage. I am not sure if I can achieve concurrency of the stages with such a design.
I implemented this and see that it doesn’t produce the desired output. Each of the process tasks processes the instruction in the same cycle (e.g., if I see Instr1 being fetched in cycle T, I see it being decoded in cycle T too).
I think I figured out why this happens. The individual pipeline stage tasks (fetch_process, decode_process, etc.) do not consume any simulation time. So the queue at the end of fetch_process is updated in the same cycle in which the decode process starts. The decode process is waiting for decode_instr_q to be non-empty, so it picks up the instruction in that same cycle.
Can someone tell me how to get around this in SV?
In reply to tpan:
Based on your previous post, I see you are having trouble with the way Verilog models concurrency. This is not a quick thing to describe in this forum. I suggest you search for basic tutorials on how Verilog schedules concurrent processes.
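As a rough illustration of one way around the same-cycle hand-off described above, here is a minimal sketch, not the only approach and not taken from the original scoreboard. It assumes a plain module instead of a UVM component, uses int values as stand-in instructions, and all names (pipe_queue_sketch, fetch_q, decode_q, execute_q, cycle, instr) are hypothetical. The idea is to run every stage from a single process and evaluate the stages from last to first, so a stage only ever sees what its predecessor pushed in an earlier cycle.

```systemverilog
// Hypothetical sketch: all stages in one process, evaluated last-to-first.
module pipe_queue_sketch;
  bit clk;
  always #5 clk = ~clk;

  int fetch_q[$] = {1, 2, 3};  // preloaded instruction IDs (stand-in for monitor traffic)
  int decode_q[$];             // Fetch  -> Decode
  int execute_q[$];            // Decode -> Execute
  int cycle = 0;
  int instr;

  initial begin
    repeat (5) begin
      @(posedge clk);
      cycle++;

      // Execute first: consumes what Decode produced in an earlier cycle.
      if (execute_q.size() > 0) begin
        instr = execute_q.pop_front();
        $display("cycle %0d: execute instr %0d", cycle, instr);
      end

      // Decode next: its output is not seen by Execute until the next cycle.
      if (decode_q.size() > 0) begin
        instr = decode_q.pop_front();
        $display("cycle %0d: decode  instr %0d", cycle, instr);
        execute_q.push_back(instr);
      end

      // Fetch last: its output is not seen by Decode until the next cycle.
      if (fetch_q.size() > 0) begin
        instr = fetch_q.pop_front();
        $display("cycle %0d: fetch   instr %0d", cycle, instr);
        decode_q.push_back(instr);
      end
    end
    $finish;
  end
endmodule
```

With this ordering, instruction 1 is fetched in the first cycle, decoded in the second, and executed in the third. Because there is only one process, the result does not depend on the order in which the simulator wakes up separate threads on the same clock edge.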
Thanks, all. I was able to figure it out by going over some of the tutorials.