Fast multiplier implementations often use a CSA (Carry-Save Adder) tree to speed up the addition of the partial products.
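The basic building block of such a tree is the 3:2 carry-save adder. The sketch below is a minimal word-level version; the module and parameter names `csa_3to2` and `W` are hypothetical and not from any library. It applies one full adder per bit position, turning three operands into a sum vector and a carry vector without propagating any carries, so its delay does not grow with the operand width.

```systemverilog
// Hypothetical word-level 3:2 carry-save adder (CSA).
// Each bit position is an independent full adder, so no carry ripples.
// Invariant (mod 2**W): a + b + c == sum + carry
module csa_3to2 #(
  parameter W = 16  // operand width (illustrative default)
) (
  input  logic [W-1:0] a, b, c,
  output logic [W-1:0] sum,
  output logic [W-1:0] carry
);
  always_comb begin
    sum   = a ^ b ^ c;                           // per-bit sum
    carry = ((a & b) | (a & c) | (b & c)) << 1;  // per-bit majority, weighted x2
  end
endmodule
```

Only a final carry-propagate addition of `sum + carry` resolves the result; a CSA tree applies this step repeatedly across the partial products before that single addition.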
I believe the following code will just generate a chain of adders:
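```systemverilog
// Assumed context (not shown in the original snippet):
//   DATA_LEN            : operand width parameter
//   partial_product_ex1 : DATA_LEN partial products, each DATA_LEN*2 bits, already shifted
logic [DATA_LEN*2-1:0] mul_rslt_ex2;

always_comb begin
  mul_rslt_ex2 = '0;
  // Add the partial products one after another
  for (int i = 0; i < DATA_LEN; i++) begin
    mul_rslt_ex2 += partial_product_ex1[i];
  end
end
```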
However, is there a specific way to write my RTL so that synthesis infers a CSA tree, without directly instantiating CSA adders?
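For reference, here is a hedged sketch of the carry-save form written with plain RTL operators instead of instantiated CSA cells. The module name `pp_sum_csa` and the packed `pp` port are hypothetical, and the partial products are assumed to be already shifted to their bit weights. Each loop iteration is one word-level 3:2 step, and the single `+` at the end is the only carry-propagate adder. Note that this is a linear carry-save array, not a balanced Wallace or Dadda tree, and a synthesis tool remains free to restructure it.

```systemverilog
// Hypothetical module: sums DATA_LEN partial products in carry-save form.
// Linear carry-save array (not a balanced tree); one final carry-propagate add.
module pp_sum_csa #(
  parameter DATA_LEN = 8
) (
  // Partial products, assumed already shifted to their bit weights
  input  logic [DATA_LEN-1:0][2*DATA_LEN-1:0] pp,
  output logic [2*DATA_LEN-1:0]               rslt
);
  always_comb begin : csa_accumulate
    logic [2*DATA_LEN-1:0] s, c, s_next;  // redundant (sum, carry) pair
    s = '0;
    c = '0;
    for (int i = 0; i < DATA_LEN; i++) begin
      // 3:2 step: s + c + pp[i] == s_next + c_new (mod 2**(2*DATA_LEN))
      s_next = s ^ c ^ pp[i];
      c      = ((s & c) | (s & pp[i]) | (c & pp[i])) << 1;  // uses old s and c
      s      = s_next;
    end
    rslt = s + c;  // the only carry-propagate adder
  end
endmodule
```

A balanced tree would group the same 3:2 steps so that the number of reduction levels grows roughly logarithmically with `DATA_LEN` rather than linearly.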
And what about 4:2 compressors? Can the RTL be written so that synthesis uses them to speed up the adder tree further?
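A 4:2 compressor reduces four operands to two. One common construction, sketched below at the word level (the module name `compress_4to2` is hypothetical), cascades two 3:2 steps: the first step's shifted carry acts as the lateral carry into the neighboring bit of the second step. That lateral carry depends only on the first three inputs, so carries never ripple across bits. Optimized 4:2 cells restructure the same logic to shorten the critical path (typically quoted as three XOR delays instead of four for two cascaded full adders).

```systemverilog
// Hypothetical word-level 4:2 compressor built from two cascaded 3:2 steps.
// Invariant (mod 2**W): w + x + y + z == sum + carry
module compress_4to2 #(
  parameter W = 16  // operand width (illustrative default)
) (
  input  logic [W-1:0] w, x, y, z,
  output logic [W-1:0] sum, carry
);
  always_comb begin : two_stage
    logic [W-1:0] s1, c1;  // intermediate result of the first 3:2 step
    // First 3:2 step: w + x + y == s1 + c1 (c1 depends only on w, x, y)
    s1 = w ^ x ^ y;
    c1 = ((w & x) | (w & y) | (x & y)) << 1;
    // Second 3:2 step: s1 + c1 + z == sum + carry
    sum   = s1 ^ c1 ^ z;
    carry = ((s1 & c1) | (s1 & z) | (c1 & z)) << 1;
  end
endmodule
```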
Would appreciate any help on this issue :)
Also, here is a picture of the elaborated design schematic, which leads me to believe my code is generating serially chained adders (very inefficient): Generated Elaborated Design from inferred adders - Siemens forum question - Google Docs