Performing a matrix-like transpose operation in an optimal way

I realize this may drift a bit into a tool-specific optimization, but I’m now struggling with some performance issues when modelling some linear algebra operations using SystemVerilog.

Specifically, just doing a transpose operation is taking a significant amount of time:

  class matrix_c
  #(
    type T = bit [ 31 : 0 ]  
  );
    
    typedef T matrix_t[][];

    static function matrix_t transpose( input matrix_t a ); 
      matrix_t rets;
      if( a.size() )
      begin
        rets = new [ a[ 0 ].size() ];  // Take size from first row
        for( int y = 0; y < rets.size(); y++ )
        begin
          rets[ y ] = new[ a.size() ];
          for( int x = 0; x < rets[ y ].size(); x++ )
            if( y < a[ x ].size() ) 
              rets[ y ][ x ] = a[ x ][ y ];
        end
      end
      return( rets );
    endfunction
  endclass

Does anyone have a suggestion on a way to do this more efficiently in SystemVerilog - other than one data point at a time in nested for() loops?

Thanks,

Mark

In reply to Mark Curry:

You could:

  • Save the size of the input array so you don’t have to query it all the time.
  • Use separate loops for dynamic array sizing and performing the transpose operation. This might be easier for the simulator to optimize.

  static function matrix_t transpose( input matrix_t a, input longint x_size, input longint y_size );
    // Loop to set size of rets using x_size and y_size.
    // Loop to perform transpose operation.
  endfunction
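
A minimal sketch of how that skeleton might be filled in, assuming the input is rectangular (every row of a has exactly y_size elements) and that the caller passes x_size = a.size() and y_size = a[ 0 ].size(). Unlike the original function, it does not guard against ragged rows. The loop bodies and the small tb module are illustrative only and are not from the thread; whether this version is actually faster will depend on the simulator, so it is worth profiling before relying on it.

  class matrix_c
  #(
    type T = bit [ 31 : 0 ]
  );

    typedef T matrix_t[][];

    // ASSUMPTION: a is rectangular, with x_size rows of y_size elements each.
    static function matrix_t transpose( input matrix_t a, input longint x_size, input longint y_size );
      matrix_t rets;
      rets = new [ y_size ];

      // Loop 1: size every row of the result up front.
      for( int y = 0; y < y_size; y++ )
        rets[ y ] = new [ x_size ];

      // Loop 2: copy elements, using the cached sizes instead of calling size().
      for( int x = 0; x < x_size; x++ )
        for( int y = 0; y < y_size; y++ )
          rets[ y ][ x ] = a[ x ][ y ];

      return rets;
    endfunction
  endclass

  // Hypothetical testbench, only to show how the function might be called.
  module tb;
    typedef matrix_c #( int ) m_t;

    initial
    begin
      m_t::matrix_t a, t;
      a = '{ '{ 1, 2, 3 }, '{ 4, 5, 6 } };  // 2 rows x 3 columns
      t = m_t::transpose( a, a.size(), a[ 0 ].size() );
      $display( "%p", t );
    end
  endmodule

The sizes are queried once at the call site, so the inner loop does no size() calls and no per-element bounds check.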