I realize this may drift into tool-specific optimization, but I'm now running into performance problems when modelling linear algebra operations in SystemVerilog.
Specifically, even a simple transpose is taking a significant amount of time:
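class matrix_c
#(
  type T = bit [ 31 : 0 ]
);
  // A matrix is a dynamic array of rows; rows may have different lengths.
  typedef T matrix_t[][];

  // Returns the transpose of a, i.e. rets[ y ][ x ] = a[ x ][ y ].
  // The column count comes from a[ 0 ]; entries missing from shorter rows
  // of a are left at T's default value by new[].
  static function matrix_t transpose( input matrix_t a );
    matrix_t rets;
    if( a.size() )
    begin
      rets = new [ a[ 0 ].size() ]; // One output row per column of the first input row
      for( int y = 0; y < rets.size(); y++ )
      begin
        rets[ y ] = new[ a.size() ]; // One output entry per input row
        for( int x = 0; x < rets[ y ].size(); x++ )
          if( y < a[ x ].size() )    // Skip input rows too short to have column y
            rets[ y ][ x ] = a[ x ][ y ];
      end
    end
    return( rets );
  endfunction
endclass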
Does anyone have a suggestion for doing this more efficiently in SystemVerilog, other than copying one element at a time in nested for() loops?
Thanks,
Mark
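One possible direction, offered only as an untested sketch under stated assumptions: if the matrices are always rectangular, the elements can be kept in a single flat dynamic array together with a row stride and a column stride. A transpose then only swaps the shape and the strides, so no elements are copied. The names flat_store_c, flat_matrix_c, create, get, set and transpose_view are hypothetical, made up for this illustration. The storage sits in its own small class because SystemVerilog dynamic arrays have value semantics: assigning one dynamic array to another copies every element, whereas copying a class handle does not.

```systemverilog
// Hypothetical helper: owns the flat element storage so that several
// matrix views can share it by handle instead of copying the array.
class flat_store_c #( type T = bit [ 31 : 0 ] );
  T data[];
  function new( int n );
    data = new[ n ];
  endfunction
endclass

// Hypothetical rectangular matrix view over a flat_store_c.
// Element (r, c) lives at data[ r * row_stride + c * col_stride ].
class flat_matrix_c #( type T = bit [ 31 : 0 ] );
  flat_store_c #( T ) store; // shared by every view of the same matrix
  int rows, cols;            // logical shape as seen through this view
  int row_stride;            // index step between (r, c) and (r + 1, c)
  int col_stride;            // index step between (r, c) and (r, c + 1)

  // Allocate a new rows x cols matrix laid out row-major.
  static function flat_matrix_c #( T ) create( int rows, int cols );
    flat_matrix_c #( T ) m = new();
    m.store      = new( rows * cols );
    m.rows       = rows;
    m.cols       = cols;
    m.row_stride = cols;
    m.col_stride = 1;
    return m;
  endfunction

  function T get( int r, int c );
    return store.data[ r * row_stride + c * col_stride ];
  endfunction

  function void set( int r, int c, T v );
    store.data[ r * row_stride + c * col_stride ] = v;
  endfunction

  // O(1) transpose: a new view over the same storage with the shape and
  // strides swapped. No element is copied.
  function flat_matrix_c #( T ) transpose_view();
    flat_matrix_c #( T ) t = new();
    t.store      = store;
    t.rows       = cols;
    t.cols       = rows;
    t.row_stride = col_stride;
    t.col_stride = row_stride;
    return t;
  endfunction
endclass
```

Usage would be along the lines of m = flat_matrix_c #( bit [ 31 : 0 ] )::create( 2, 3 ), filling it with m.set(...), then t = m.transpose_view(). The trade-offs: reads through a transposed view walk the storage with a non-unit stride, ragged rows (which matrix_c::transpose tolerates) are not supported, and a write through either view is visible through both. Whether this ends up faster than the nested loops depends on the access pattern and the simulator, so it would need to be measured.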