Effect of clocking block on uvm driver/monitor

uvmsd · April 22, 2019, 9:19pm

I updated the interface, driver, and monitor for a counter based on the input from another thread:

https://verificationacademy.com/forums/uvm/clocking-blocks-uvm-and-sampling-uvm-monitor

Earlier I had two clocking blocks, one each for the driver and the monitor. Since both use the same clock, I changed my code to use just one clocking block. That's correct, right?

Here is my interface:

interface updn_counter_if(input logic clk, reset, clk_en);
  logic [ABSOLUTE_DATA_WIDTH-1:0] count_value;
  logic                           load_counter;
  logic                           up_counter;
  logic [1:0]                     inc;
  logic [ABSOLUTE_DATA_WIDTH-1:0] current_value;
  logic                           count_reached;

  clocking counter_cb @(posedge clk);
    default input #1 output #1;
    output load_counter;
    output count_value;
    output up_counter;
    output inc;
    input  current_value;
    input  count_reached;
  endclocking
endinterface

DRIVER

virtual task drive();
  // Wait for the clocking event, then drive through the clocking block
  @(counter_vif.counter_cb);
  if (counter_vif.clk_en == 1 && counter_vif.reset) begin
    counter_vif.counter_cb.count_value  <= req.count_value;
    counter_vif.counter_cb.load_counter <= req.load_counter;
    counter_vif.counter_cb.up_counter   <= req.up_counter;
    counter_vif.counter_cb.inc          <= req.inc;
  end // if
endtask
endclass : updn_counter_driver

MONITOR

virtual task run_phase(uvm_phase phase);
  super.run_phase(phase);
  forever begin
    @(posedge counter_vif.clk);
    if (counter_vif.clk_en && counter_vif.reset == 1) begin
      seq_item_collected.count_value   = counter_vif.count_value;
      seq_item_collected.load_counter  = counter_vif.load_counter;
      seq_item_collected.up_counter    = counter_vif.up_counter;
      seq_item_collected.inc           = counter_vif.inc;

      @(counter_vif.counter_cb);
      seq_item_collected.current_value = counter_vif.counter_cb.current_value;
      seq_item_collected.count_reached = counter_vif.counter_cb.count_reached;

      trans_collected_port.write(seq_item_collected);
    end // if
  end // forever
endtask : run_phase

I have some questions:

1. If I don’t add @(posedge counter_vif.clk); in the monitor after forever, my test just hangs. Is this because vif.clk_en and reset have to be validated with vif.clk?
2. The counter starts with a default value of 7f (in down-counter mode). Earlier, I could see the scoreboard comparing values starting from 7f, but now the comparison starts from 7e.
3. The above code works fine if the input delay in the clocking block is 0. If I change it to non-zero, I see an offset of 1 (clock?) between the RTL and TB counter values, and the test fails. Is this because of the input delay specified? Is the current_value from RTL sampled just before the clock edge, so that it takes the previous value? How do we take care of this?

Please let me know.

dave_59 · April 23, 2019, 5:29am

In reply to uvmsd:

If at time 0 (or any time afterward) clk_en && reset is false or unknown (x), your monitor gets into a 0-delay forever loop, which is a hang.

You should not mix different clock expressions; just use @(counter_vif.counter_cb);

You could change your clocking block signal directions to ‘inout’. That would allow you to both drive and sample them.
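
As a rough illustration of the two points above, here is a stand-alone sketch rather than the original monitor. The module name `monitor_loop_demo`, the clocking block name `cb`, and all timing values are made up for the example. The loop waits on the clocking block event unconditionally, so every iteration consumes simulation time even while `clk_en && reset` is false or x:

```systemverilog
// Hypothetical stand-alone demo; names and delays are illustrative only.
module monitor_loop_demo;
  logic       clk = 0;
  logic       clk_en;                  // deliberately left at x at time 0
  logic       reset;                   // deliberately left at x at time 0
  logic [7:0] current_value = 8'h7f;   // stand-in for a DUT output

  always #5 clk = ~clk;

  clocking cb @(posedge clk);
    input current_value;               // default #1step input skew
  endclocking

  // Enable the condition some time after the first clock edge.
  initial begin
    #12 clk_en = 1;
    reset = 1;
  end

  initial begin
    forever begin
      // Single clock expression, placed outside the 'if': the loop always
      // blocks here, so a false or x condition cannot spin in zero time.
      @(cb);
      if (clk_en && reset)
        $display("t=%0t sampled current_value=%0h", $time, cb.current_value);
    end
  end

  initial #50 $finish;
endmodule
```

In the monitor from the original post, this would correspond to keeping one unconditional `@(counter_vif.counter_cb);` as the first statement of the forever loop.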

uvmsd · April 23, 2019, 5:10pm

In reply to dave_59:
@Dave_59

Wow! Thanks, Dave. I didn’t know about (or think of) the 0-delay forever loop.

And you wrote:
“You could change your clocking block signal directions to ‘inout’. That would allow you to both drive and sample them.”

I did this and things look good. Or should we have two clocking blocks, one for the driver and one for the monitor? Which method is recommended?

BTW, I see the warning below after changing the clocking block signals to inout (the warnings are on DUT outputs):

"Variable ‘/tb_updn_counter_top/intf/current_value’, driven via a port connection, is multiply driven. "

Do I ignore them?

One more thing:
If I set the input delay to ‘0’ in the clocking block, I have a data match in my TB. If I set it to non-zero, I see a data mismatch: RTL is offset by 1 value. Is this because the output is sampled just before the clock edge and gets the previous value? How do we take care of this? Do we just check for the previous value in the TB, or keep the input delay at ‘0’?

Please let me know.

dave_59 · April 23, 2019, 5:25pm

In reply to uvmsd:

You should use inout for signals you expect to drive and sample. Use input to clocking block for outputs from your DUT.
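
A sketch of what the interface might look like after this change. It assumes `ABSOLUTE_DATA_WIDTH` can be supplied as an interface parameter (the value 8 is only a placeholder; the original post does not show where it is defined) and keeps the skews from the original post:

```systemverilog
// Sketch only: the parameter default is an assumed placeholder.
interface updn_counter_if #(parameter int ABSOLUTE_DATA_WIDTH = 8)
  (input logic clk, reset, clk_en);

  logic [ABSOLUTE_DATA_WIDTH-1:0] count_value;
  logic                           load_counter;
  logic                           up_counter;
  logic [1:0]                     inc;
  logic [ABSOLUTE_DATA_WIDTH-1:0] current_value;
  logic                           count_reached;

  clocking counter_cb @(posedge clk);
    default input #1 output #1;
    // Driven by the driver and sampled by the monitor.
    inout load_counter;
    inout count_value;
    inout up_counter;
    inout inc;
    // DUT outputs: sampled only, so the clocking block adds no driver.
    input current_value;
    input count_reached;
  endclocking
endinterface
```

Because the DUT outputs are plain inputs to the clocking block again, the "multiply driven" warning on `current_value` should no longer have a second driver to complain about.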

uvmsd · April 23, 2019, 5:27pm

In reply to dave_59:
@dave_59
OK, makes sense. That's done! Thanks.

One more thing:
If I set the input delay to ‘0’ in the clocking block, I have a data match in my TB. If I set it to non-zero, I see a data mismatch: RTL is offset by 1 value. Is this because the output is sampled just before the clock edge and gets the previous value? How do we take care of this? Do we just check for the previous value in the TB, or keep the input delay at ‘0’?

Please let me know.

uvmsd · April 23, 2019, 5:48pm

In reply to uvmsd:

“A skew value of #0 changes the way input values are sampled and output values are synchronized, even though both will still be done at the simulation time when a clock event occurs. A skew value of #0 for any input means that the input will be sampled at the Observed region. An output skew value of #0 indicates that the output will be synchronized out in the Non-blocking assignment (NBA) region.”

At the same time, I came across:
“**Never use input#0 in your clocking blocks.**
· input#0 sampling delay specification causes the clocking block to sample an input signal in the Observed region of the SystemVerilog scheduler.
· The Observed region occurs very late in the events queue, even later than the NBA region, and therefore input#0 sampling can see the new values of register outputs after they have been updated by NBA assignments executed at the clock edge.
· input#0 sampling allows our testbench a “sneak preview” of the value of a signal after the effects of the current clock. Normally your testbench would not be able to see that new value until the next clock.”

Mechanic · April 27, 2020, 8:27am

Please let me know whether input #0 with output #1 is valid.
Since we are delaying the stimulus driving, can we still not use input #0 in this case?

Please clarify

Thanks in advance

chr_sue · April 27, 2020, 12:42pm

In reply to Mechanic:

It is legal to use #0 for all inputs, but it is not recommended; #0 can end up causing problems.
For inputs, #1step is recommended, and for outputs an absolute time.

Mechanic · April 27, 2020, 1:46pm

In reply to chr_sue:

Thanks for the reply. I would like to know what issues arise if we use a #0 input skew in a clocking block,
even though signals are sampled in the Observed region (#0) and stimulus is driven in the Reactive region (output #0).
I am a bit confused by this sentence:
" input#0 sampling allows our testbench a “sneak preview” of the value of a signal after the effects of the current clock. Normally your testbench would not be able to see that new value until the next clock."
Please provide an example; it would be helpful.
Thanks in advance.

chr_sue · April 27, 2020, 2:04pm

In reply to Mechanic:

With #0 you are stepping forward in the SV event scheduler. Because you do not know where you are in this scheduler, you might disturb the correct processing of the assignments etc. in the current time slot. I know this sounds a little bit theoretical, but it is the truth.
Unfortunately I did not find a figure of the SV event scheduler I can put in here.
The clocking block does at the RTL level the same thing we do at the gate level when introducing setup and hold times.
BTW, the clocking block is an option and not a must. I have never used a clocking block in my professional experience and projects.
Where does the sentence you are quoting come from?

Mechanic · April 27, 2020, 2:32pm

In reply to chr_sue:

I read it in a technical paper.
I am not able to follow “With #0 you are stepping forward in the SV event scheduler. Because you do not know where you are in this scheduler, you might disturb the correct processing of the assignments etc. in the current time slot. I know this sounds a little bit theoretical, but it is the truth.” Could you please elaborate on this?
Since we have feedback at every event region, we can find out in which region it is executing, right? Please correct me if I am wrong.

chr_sue · April 27, 2020, 3:33pm

In reply to Mechanic:

Using #0 for the inputs means ‘sampled after the nonblocking assignment updates’.
Using #0 for outputs does not matter.
If you do not specify anything, the defaults of the clocking block are #1step for inputs and #0 for outputs.
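
To make the difference concrete, here is a small stand-alone sketch comparing the two input skews on the same register. The module name `skew_demo`, the signal `q`, and the clocking block names `cb_step` and `cb_zero` are invented for the example and do not come from the thread:

```systemverilog
// Hypothetical stand-alone demo; all names are illustrative.
module skew_demo;
  logic       clk = 0;
  logic [3:0] q   = 0;

  always #5 clk = ~clk;

  // Stand-in for a DUT register, updated with a nonblocking assignment.
  always @(posedge clk) q <= q + 1;

  // Default input skew (#1step): samples the value from just before the edge.
  clocking cb_step @(posedge clk);
    input q;
  endclocking

  // Input skew #0: samples in the Observed region, after the NBA update.
  clocking cb_zero @(posedge clk);
    input #0 q;
  endclocking

  initial begin
    repeat (3) begin
      @(cb_zero);
      $display("t=%0t  #1step sample=%0d  #0 sample=%0d",
               $time, cb_step.q, cb_zero.q);
    end
    $finish;
  end
endmodule
```

On a simulator that follows the LRM scheduling rules, the #0 column should read one count ahead of the #1step column at every edge. That is the "sneak preview" effect: the #0 sample already contains the result of the current clock edge, which a flop clocked on the same edge would not see until the next one.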

Mechanic · April 28, 2020, 4:15am

In reply to chr_sue:

If we use #0 as the input skew and there is a delay (clk-to-q) in the DUT model, then the response read by the testbench will be the wrong response.
This is the reason we use an input skew in time units to read the correct response.
In the case of a zero-delay DUT, will input #0 then be fine?
Please correct me if my understanding is wrong.

chr_sue · April 28, 2020, 9:09am

In reply to Mechanic:

A clocking block is only useful if testbench and design have the same synchronisation. If you are delaying a clock, the cb is not needed. Again: it is only an option.

Mechanic · April 28, 2020, 2:29pm

In reply to chr_sue:

Thank you for the complete information.

rishikpillai90 · May 18, 2021, 11:05am

In reply to chr_sue:

Hello chr_sue,

A follow-up question on the use of #0 in a clocking block.
Let’s say that a TB component using the clocking block inside an interface is a reactive slave that drives a ready in response to valid from DUT, like an AXI bus. Ready could be driven 10 clock cycles after valid goes high, or it could be driven instantaneously so that the valid ready handshake completes in 1 clock cycle.

In case of instantaneous ready generation, the behavior would be like a combinatorial equation where valid high results in ready high. Then the slave driver has to have valid sampled in the Observed region, right? Otherwise the response to valid from the DUT would have to wait for one more clock cycle.

What’s a good strategy in this case? Should valid be taken out of the interface clocking block altogether, or should it be given a #0 input skew in the clocking block?


interface test_if(input clk);
  logic valid;
  logic ready;

  clocking drv @(posedge clk);
    default input #0;
    input  valid;
    output ready;
  endclocking
endinterface

class driver;
  virtual test_if tb_if; // Virtual interface

  // Run phase (wrapped in a task so the statements are legal inside a class)
  task run_phase();
    @(tb_if.drv);
    if (tb_if.drv.valid) tb_if.drv.ready <= 1;
  endtask
endclass

chr_sue · May 18, 2021, 12:20pm

In reply to rishikpillai90:

You do not specify the timing for the output. My recommendation for you is:

  clocking drv @(posedge clk);
    default input #1step;
    default output #1;
    input valid;
    output ready;
  endclocking

rishikpillai90 · May 18, 2021, 5:39pm

In reply to chr_sue:

Thanks for the reply. I suppose there would still be a one cycle delay between sampling valid and issuing ready. To react to an incoming valid with a ready in the same cycle, I suppose the clocking block shouldn’t be used, because that logic has to be combinatorial.
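
One possible shape for that idea, sketched under assumptions: the same-cycle path bypasses the clocking block with a continuous assignment, while the delayed path still goes through it. The names `zero_wait` and `ready_q` are hypothetical and not from the thread, and this is only one way to arrange it:

```systemverilog
// Sketch only: zero_wait and ready_q are invented for illustration.
interface test_if(input logic clk);
  logic valid;
  logic ready;
  logic ready_q;          // registered ready, driven through the clocking block
  logic zero_wait = 0;    // hypothetical knob, set by the driver per transaction

  clocking drv @(posedge clk);
    default input #1step output #1;
    input  valid;
    output ready_q;
  endclocking

  // Same-cycle response: ready follows valid combinationally when
  // zero_wait is set; otherwise the clocked ready_q is used.
  assign ready = zero_wait ? valid : ready_q;
endinterface
```

A driver could then write `tb_if.zero_wait = 1;` for a zero-cycle handshake, or clear it and drive `tb_if.drv.ready_q <= 1;` after waiting N clocking events.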

chr_sue · May 19, 2021, 8:14am

In reply to rishikpillai90:

It should work with the definition I showed you.
But you should ask whether a direct combinatorial relationship between inputs and outputs is a good thing.
I believe you’ll run into serious trouble when doing other things like DFT pattern generation.

rishikpillai90 · May 19, 2021, 12:01pm

In reply to chr_sue:

I tried two simple examples where the valid sampled by the clocking block ready_drv is fed back to the ready output of the same clocking block.

First, using the clocking block where the input skew is #1step:


interface test_if(input logic clk);
logic valid;
logic ready;
clocking ready_drv @(posedge clk);
  default input #1step;    
  default output #1;
  input valid;
  output ready;
endclocking
endinterface

module tb;
  logic clk=0;

  test_if tb_if(clk);
  logic valid=1;

  always clk = #5ns ~clk;

  always @(posedge clk) begin
    valid <= ~valid;
  end

  assign tb_if.valid = valid;

  initial forever begin
   @(tb_if.ready_drv);
    tb_if.ready_drv.ready <= tb_if.ready_drv.valid;
  end
endmodule

Here, as I said before, I saw that the ready output changes one clock after the valid input changes. I suppose that’s because valid was sampled in the Preponed region of the time step in which clk toggled. Hence if valid went high at the posedge of clk, the clocking block would still sample it as 0 (the Preponed value). The corresponding output drive would happen #1 time unit after the next clock edge.

Using the clocking block where input skew is #0.


interface test_if(input logic clk);
logic valid;
logic ready;
clocking ready_drv @(posedge clk);
  default input #0;    
  default output #1;
  input valid;
  output ready;
endclocking
endinterface

In this case, sampling of valid moved to the Observed region, so it could already catch the change in valid that happened in the Active or NBA regions at the same clock edge. This was visible in the waveform.

Regarding the combinatorial relationship between input and output, I was just thinking about a UVM driver that is capable of giving ready to a valid after 0 cycles, 1 cycle… N cycles, controlled from a UVM sequence/test. I’m not aware of the DFT side of it in a real design, but I suppose there could be designs where the slave has to back-pressure a datapath in the same cycle as an incoming valid due to a possible internal FIFO-full condition. That is probably why a high-speed valid-ready path needs sync FIFOs in between to ease timing. I’m no expert in any of these.