Performing a matrix-like transpose operation in an optimal way

Mark_Curry · June 11, 2019, 9:48pm

I realize this may drift a bit into a tool-specific optimization, but I’m now struggling with some performance issues when modelling some linear algebra operations using SystemVerilog.

Specifically, just doing a transpose operation is taking a significant amount of time:

  class matrix_c
  #(
    type T = bit [ 31 : 0 ]  
  );
    
    typedef T matrix_t[][];

    static function matrix_t transpose( input matrix_t a ); 
      matrix_t rets;
      if( a.size() )
      begin
        rets = new [ a[ 0 ].size() ];  // Take size from first row
        for( int y = 0; y < rets.size(); y++ )
        begin
          rets[ y ] = new[ a.size() ];
          for( int x = 0; x < rets[ y ].size(); x++ )
            if( y < a[ x ].size() ) 
              rets[ y ][ x ] = a[ x ][ y ];
        end
      end
      return( rets );
    endfunction
  endclass

Does anyone have a suggestion on a way to do this more efficiently in SystemVerilog - other than one data point at a time in nested for() loops?

Thanks,

Mark

sbellock · June 12, 2019, 6:20pm

In reply to Mark Curry:

You could:

1. Save the size of the input array so you don’t have to query it all the time.
2. Use separate loops for dynamic array sizing and performing the transpose operation. This might be easier for the simulator to optimize.


static function matrix_t transpose( input matrix_t a, input longint x_size, input longint y_size ); 
   // Loop to set size of rets using x_size and y_size.
   // Loop to perform transpose operation.
endfunction
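
To make the outline above concrete, here is a minimal sketch of the two-loop version. It rests on a few assumptions that are not stated in the thread: the input is rectangular (every row has the same length), x_size is taken to mean a.size() (the number of rows of a), and y_size is taken to mean a[ 0 ].size() (the number of columns of a). The transpose_demo module and the int_matrix typedef are hypothetical names used only to exercise the function. Whether this actually runs faster than the original will depend on the simulator; it has not been measured here.

```systemverilog
class matrix_c
#(
  type T = bit [ 31 : 0 ]
);

  typedef T matrix_t[][];

  // Assumption: a is rectangular, x_size == a.size(), y_size == a[ 0 ].size().
  static function matrix_t transpose( input matrix_t a, input longint x_size, input longint y_size );
    matrix_t rets;

    // Loop 1: size the result once, y_size rows of x_size elements each.
    rets = new [ y_size ];
    for( int y = 0; y < y_size; y++ )
      rets[ y ] = new [ x_size ];

    // Loop 2: copy the elements, using the cached sizes as loop bounds
    // instead of calling size() on every iteration.
    for( int y = 0; y < y_size; y++ )
      for( int x = 0; x < x_size; x++ )
        rets[ y ][ x ] = a[ x ][ y ];

    return rets;
  endfunction
endclass

module transpose_demo;
  // Hypothetical specialization, used only for this demonstration.
  typedef matrix_c #( int ) int_matrix;

  initial
  begin
    int_matrix::matrix_t a, t;
    a = '{ '{ 1, 2, 3 }, '{ 4, 5, 6 } };                   // 2 rows x 3 columns
    t = int_matrix::transpose( a, a.size(), a[ 0 ].size() ); // 3 rows x 2 columns
    $display( "%p", t );
  end
endmodule
```

Note that this sketch drops the per-element bounds check ( y < a[ x ].size() ) from the original function, so it is only safe when the rectangular assumption holds. For ragged input, the check would need to go back into the second loop.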