Modeling a pipeline

I am trying to code a cycle-accurate model of a simple CPU that has multiple pipeline stages: Fetch, Decode, Execute, MemAccess, and Register WriteBack. I am thinking of coding this in SystemVerilog with each pipeline stage executing on a separate thread. Is this a recommended approach, or are there better ways? What are the potential downsides of this approach?

In reply to tpan:

Please describe what you mean by “separate thread”. “Thread” is an overloaded term, and its canonical definitions lean heavily on a software-based meaning.

You could model a CPU’s operating stages with separate modules, or separate procedural blocks, or all stuffed together in one always block. There are probably pluses and minuses to each of these strategies. I suggest starting out with whatever is clearest in your head, then adjusting as you learn and design the thing.
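As a rough illustration of the "one always block" option, here is a minimal sketch. The module name, port names, register names, and the 32-bit instruction width are all hypothetical placeholders, not anything from your design. It only shows how nonblocking assignments make an instruction advance exactly one stage per clock.

```systemverilog
// Hypothetical five-stage pipeline skeleton (all names are assumptions).
// Each stage register is updated with a nonblocking assignment, so an
// instruction moves forward by exactly one stage per rising clock edge.
module pipe_sketch (
  input  logic        clk,
  input  logic        rst_n,
  input  logic [31:0] fetch_instr,  // instruction entering Fetch (assumed 32-bit)
  output logic [31:0] wb_instr      // instruction leaving Register WriteBack
);
  // Registers holding the instruction after Fetch, Decode, Execute, MemAccess
  logic [31:0] if_r, id_r, ex_r, mem_r;

  always_ff @(posedge clk or negedge rst_n) begin
    if (!rst_n) begin
      if_r     <= '0;
      id_r     <= '0;
      ex_r     <= '0;
      mem_r    <= '0;
      wb_instr <= '0;
    end else begin
      // All right-hand sides are sampled before any left-hand side is
      // updated, so the order of these statements does not matter.
      if_r     <= fetch_instr;
      id_r     <= if_r;
      ex_r     <= id_r;
      mem_r    <= ex_r;
      wb_instr <= mem_r;
    end
  end
endmodule
```

Splitting the same registers across separate always_ff blocks or separate modules would behave the same way; the choice is mostly about readability.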

Regards,
Mark

In reply to Mark Curry:

I am thinking of having something like this in my scoreboard:


task run_phase(uvm_phase phase);
  super.run_phase(phase);

  fork
    begin
      @ (posedge `TB_CLK);
      fetch_process();
    end
    begin
      @ (posedge `TB_CLK);
      decode_process();
    end
    ... // Similar threads for other pipeline stages
    ...
  join_none
endtask

Each of the pipeline stages has a corresponding queue defined (fetch_process_q[$], …). The fetch_process_q is fed instructions from the scoreboard TLM port that connects to the fetch interface monitor.

Within the fetch_process task, I intend to do something like the following:


task fetch_process();
  // Only act when an instruction is waiting; otherwise do nothing this cycle
  if (fetch_process_q.size() > 0) begin
    // Pop out the instruction at the head of the fetch queue
    fetch_instruction = fetch_process_q.pop_front();
    ...
    // Process the instruction
    ...
    // Send the instruction into decode queue
    decode_process_q.push_back(fetch_instruction);
  end
endtask

Similar to fetch_process, I plan to have a separate task for each stage. I am not sure whether I can achieve concurrency between the stages with such a design.

I implemented this and see that it doesn’t produce the desired output. Each of the process tasks handles the instruction in the same cycle. (E.g., if I see Instr1 being fetched in cycle T, I see it being decoded in cycle T too.)

I think I figured out why this happens. The individual pipeline stage tasks (fetch_process, decode_process, etc.) do not consume any simulation time. So the decode queue is updated at the end of fetch_process in the same cycle in which the decode process is running. The decode process is waiting for decode_process_q to be non-empty, so it picks up the instruction in that same cycle.

Can someone tell me how to get around this in SV?

In reply to tpan:

Based on your previous post, I see you are having trouble with the way Verilog models concurrency. This is not a quick thing to describe in this forum. I suggest you search for basic tutorials on how Verilog works.
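That said, as a starting point, here is a minimal sketch of one common way to keep a zero-delay, queue-based model from moving an instruction through several stages in one cycle: run all stages from a single clocked process and evaluate them from the last stage back to the first. It reuses the fetch_process_q and decode_process_q names from your post; execute_process_q, the module name, the locally generated clock, and the use of a plain int as a stand-in for an instruction are assumptions made only for illustration. Only three stages are shown.

```systemverilog
// Hypothetical stand-alone sketch (not the scoreboard itself).
module pipe_queue_sketch;
  bit clk;
  always #5 clk = ~clk;       // assumed free-running clock, period 10

  int fetch_process_q[$];     // in the real scoreboard, filled by the monitor
  int decode_process_q[$];
  int execute_process_q[$];   // hypothetical queue for the Execute stage
  int instr;                  // int stands in for an instruction object

  always @(posedge clk) begin
    // Last stage first: an item pushed by an earlier stage below is not
    // seen by the later stage until the next clock edge.
    if (execute_process_q.size() > 0) begin
      instr = execute_process_q.pop_front();
      $display("t=%0t execute instr %0d", $time, instr);
    end
    if (decode_process_q.size() > 0) begin
      instr = decode_process_q.pop_front();
      $display("t=%0t decode  instr %0d", $time, instr);
      execute_process_q.push_back(instr);
    end
    if (fetch_process_q.size() > 0) begin
      instr = fetch_process_q.pop_front();
      $display("t=%0t fetch   instr %0d", $time, instr);
      decode_process_q.push_back(instr);
    end
  end

  initial begin
    fetch_process_q = '{1, 2, 3};   // three dummy instructions
    repeat (6) @(posedge clk);
    $finish;
  end
endmodule
```

With this ordering, instruction 1 is fetched on the first clock edge, decoded on the second, and executed on the third, while instruction 2 follows one cycle behind. If the stages must stay in separate forked threads, the same effect needs some explicit hand-off between cycles, since the order in which threads wake up on the same clock edge is not guaranteed.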

Thanks, all. I was able to figure it out by going over some of the tutorials.