How to influence Synthesis to create a CSA tree and/or 4:2 compressors

Fast multipliers often use a CSA (carry-save adder) tree to speed up the addition of partial products.
I believe the code I wrote below will just generate a chain of adders:

    logic [DATA_LEN*2-1:0] mul_rslt_ex2;
    always_comb begin
        mul_rslt_ex2 = '0;
        // Accumulate the partial products one after another
        for (int i = 0; i < DATA_LEN; i++) begin
            mul_rslt_ex2 += partial_product_ex1[i];
        end
    end

However, is there a specific way to write my RTL so that synthesis generates a CSA tree implementation, without directly instantiating CSA adders?
What about 4:2 compressors to further speed up the adder tree?

Would appreciate any help on this issue :)

Also, here is a picture of the elaborated design schematic, which leads me to believe my code is generating serial adders (very inefficient): Generated Elaborated Design from inferred adders - Siemens forum question - Google Docs

In reply to blaze09:
I think the answer to your question is technology- and tool-specific. I suggest you ask your question here or here, and mention which synthesis tools you are using and which technologies you are targeting.
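
As a tool-independent illustration, the carry-save reduction can be described explicitly in behavioral RTL, without instantiating any adder cells. The following is only a minimal sketch under stated assumptions, not a verified or recommended implementation: the module name `csa_accumulate`, the port names `partial_product` and `sum`, and the default `DATA_LEN` are hypothetical, and each partial product is assumed to be already shifted and extended to `2*DATA_LEN` bits, as in the snippet from the question. Note that this is a linear carry-save chain, not a balanced tree. Every loop iteration costs roughly one full-adder delay instead of a full carry-propagate addition, and a single carry-propagate adder is left at the end. Whether a given synthesis tool keeps this structure or restructures it remains tool-specific.

```systemverilog
// Hypothetical module: carry-save accumulation of partial products.
// Names and the default DATA_LEN are illustrative assumptions.
module csa_accumulate #(
    parameter int DATA_LEN = 8
) (
    // Assumed: each entry is already shifted into position and
    // extended to 2*DATA_LEN bits.
    input  logic [DATA_LEN*2-1:0] partial_product [DATA_LEN],
    output logic [DATA_LEN*2-1:0] sum
);
    localparam int W = DATA_LEN * 2;

    logic [W-1:0] s, c;            // redundant (sum, carry) pair
    logic [W-1:0] s_next, c_next;  // temporaries for one 3:2 step

    always_comb begin
        s = '0;
        c = '0;
        for (int i = 0; i < DATA_LEN; i++) begin
            // 3:2 compression: one full adder per bit position,
            // with no carry propagation along the word.
            s_next = s ^ c ^ partial_product[i];
            // Majority function gives the carries, weighted by 2.
            // The carry out of the top bit is dropped, so the
            // result is exact modulo 2**W.
            c_next = ((s & c)
                    | (s & partial_product[i])
                    | (c & partial_product[i])) << 1;
            s = s_next;
            c = c_next;
        end
        // Single carry-propagate addition resolves the pair.
        sum = s + c;
    end
endmodule
```

After every iteration `s + c` equals the sum of the partial products consumed so far (modulo `2**W`), so the output matches the original `+=` loop. Reducing the operands in groups of three per level, instead of one at a time, would give logarithmic depth, and a 4:2 compressor can be modeled as two such 3:2 steps applied back to back.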