AI Engine-ML Intrinsics User Guide  (v2023.2)
Multiply-accumulate of bfloat16 datatypes

Overview

Matrix multiplications in which matrix A and B have bfloat16 data elements.

For an explanation of how these operations work, see Multiply Accumulate.

Multiplication of (4x8) with (8x4)


v16accfloat mul_4x8_8x4 (v32bfloat16 a, v32bfloat16 b)
 
v16accfloat negmul_4x8_8x4 (v32bfloat16 a, v32bfloat16 b)
 
v16accfloat mac_4x8_8x4 (v32bfloat16 a, v32bfloat16 b, v16accfloat acc1)
 
v16accfloat msc_4x8_8x4 (v32bfloat16 a, v32bfloat16 b, v16accfloat acc1)
 
v16accfloat addmac_4x8_8x4 (v32bfloat16 a, v32bfloat16 b, v16accfloat acc1, v16accfloat acc2)
 
v16accfloat addmsc_4x8_8x4 (v32bfloat16 a, v32bfloat16 b, v16accfloat acc1, v16accfloat acc2)
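
The arithmetic behind this family can be sketched on the host with plain floats. The block below is a scalar reference model, not the intrinsics: the names ref_mul, ref_combine, ref_negmul, ref_mac, ref_msc, ref_addmac and ref_addmsc are hypothetical, float stands in for both bfloat16 and accfloat, and the sign conventions (mac adds the product, msc subtracts it, the add* forms also add acc2) are assumptions read off the function names. The Multiply Accumulate page remains the authority. Two-dimensional arrays are used on purpose so that the model says nothing about how elements are packed into v32bfloat16 or v16accfloat.

```cpp
#include <array>

// Host-side reference model only; these are NOT the AI Engine-ML intrinsics.
using MatA = std::array<std::array<float, 8>, 4>;  // 4x8 operand A
using MatB = std::array<std::array<float, 4>, 8>;  // 8x4 operand B
using Acc  = std::array<std::array<float, 4>, 4>;  // 4x4 result (16 lanes)

// Plain matrix product C = A x B.
inline Acc ref_mul(const MatA& a, const MatB& b) {
    Acc c{};  // value-initialised to zero
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 8; ++k)
                c[i][j] += a[i][k] * b[k][j];
    return c;
}

// Assumed common shape of the family: acc1 + acc2 + sign * (A x B).
inline Acc ref_combine(const MatA& a, const MatB& b,
                       const Acc& acc1, const Acc& acc2, float sign) {
    const Acc p = ref_mul(a, b);
    Acc out{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            out[i][j] = acc1[i][j] + acc2[i][j] + sign * p[i][j];
    return out;
}

// Assumed semantics, inferred from the intrinsic names.
inline Acc ref_negmul(const MatA& a, const MatB& b) {
    return ref_combine(a, b, Acc{}, Acc{}, -1.0f);   // -(A x B)
}
inline Acc ref_mac(const MatA& a, const MatB& b, const Acc& acc1) {
    return ref_combine(a, b, acc1, Acc{}, 1.0f);     // acc1 + A x B
}
inline Acc ref_msc(const MatA& a, const MatB& b, const Acc& acc1) {
    return ref_combine(a, b, acc1, Acc{}, -1.0f);    // acc1 - A x B
}
inline Acc ref_addmac(const MatA& a, const MatB& b,
                      const Acc& acc1, const Acc& acc2) {
    return ref_combine(a, b, acc1, acc2, 1.0f);      // acc1 + acc2 + A x B
}
inline Acc ref_addmsc(const MatA& a, const MatB& b,
                      const Acc& acc1, const Acc& acc2) {
    return ref_combine(a, b, acc1, acc2, -1.0f);     // acc1 + acc2 - A x B
}
```

A model like this is mainly useful for checking kernel output in a host testbench. Because it accumulates in float without bfloat16 rounding of the inputs, results from real hardware may differ in the low bits.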
 

Multiplication of (4x8) with (8x4) with dynamic negation of multiplication result

v16accfloat mul_4x8_8x4_conf (v32bfloat16 a, v32bfloat16 b, int sub_mul)
 
v16accfloat negmul_4x8_8x4_conf (v32bfloat16 a, v32bfloat16 b, int sub_mul)
 

Multiplication of (4x8) with (8x4) with dynamic negation of multiplication result, zeroing of acc1, and negation of acc1

v16accfloat mac_4x8_8x4_conf (v32bfloat16 a, v32bfloat16 b, v16accfloat acc1, int zero_acc1, int sub_mul, int sub_acc1)
 
v16accfloat msc_4x8_8x4_conf (v32bfloat16 a, v32bfloat16 b, v16accfloat acc1, int zero_acc1, int sub_mul, int sub_acc1)
 
v16accfloat addmac_4x8_8x4_conf (v32bfloat16 a, v32bfloat16 b, v16accfloat acc1, v16accfloat acc2, int zero_acc1, int sub_mul, int sub_acc1, int sub_acc2)
 
v16accfloat addmsc_4x8_8x4_conf (v32bfloat16 a, v32bfloat16 b, v16accfloat acc1, v16accfloat acc2, int zero_acc1, int sub_mul, int sub_acc1, int sub_acc2)
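
The dynamic controls of the _conf variants can be illustrated with a small host-side model. This is a sketch under assumptions, not the intrinsic: it assumes each mask carries one bit per output lane (bit i controls lane i), as the sub_mul description for the negmul _conf functions suggests, and that zeroing acc1 takes effect before its negation. The names bit and ref_mac_conf are hypothetical, and the model takes the 16 already-computed product lanes as input so that it stays independent of the matrix layout.

```cpp
#include <array>
#include <cassert>

// Host-side reference model only; NOT the AI Engine-ML intrinsic.
using Lanes = std::array<float, 16>;  // one float per accumulator lane

// True when bit `lane` of `mask` is set (assumed: one mask bit per lane).
inline bool bit(int mask, int lane) {
    assert(lane >= 0 && lane < 16);
    return ((static_cast<unsigned>(mask) >> lane) & 1u) != 0;
}

// Assumed behaviour of a mac-style _conf operation, lane by lane:
//   acc1 term = 0 if zero_acc1 bit set, otherwise acc1 (negated if sub_acc1 bit set)
//   mul term  = product (negated if sub_mul bit set)
inline Lanes ref_mac_conf(const Lanes& prod, const Lanes& acc1,
                          int zero_acc1, int sub_mul, int sub_acc1) {
    Lanes out{};
    for (int i = 0; i < 16; ++i) {
        float a = bit(zero_acc1, i) ? 0.0f : acc1[i];
        if (bit(sub_acc1, i)) a = -a;
        const float p = bit(sub_mul, i) ? -prod[i] : prod[i];
        out[i] = a + p;
    }
    return out;
}
```

With all three masks set to 0 the model reduces to acc1 plus the product, which is the behaviour assumed here for the non-_conf mac form.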
 

Channel by channel multiplication of (1x2) with (2x1)


v16accfloat mul_elem_16_2 (v32bfloat16 a, v32bfloat16 b)
 
v16accfloat negmul_elem_16_2 (v32bfloat16 a, v32bfloat16 b)
 
v16accfloat mac_elem_16_2 (v32bfloat16 a, v32bfloat16 b, v16accfloat acc1)
 
v16accfloat msc_elem_16_2 (v32bfloat16 a, v32bfloat16 b, v16accfloat acc1)
 
v16accfloat addmac_elem_16_2 (v32bfloat16 a, v32bfloat16 b, v16accfloat acc1, v16accfloat acc2)
 
v16accfloat addmsc_elem_16_2 (v32bfloat16 a, v32bfloat16 b, v16accfloat acc1, v16accfloat acc2)
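
A channel-by-channel (1x2) with (2x1) product is a two-element dot product per output lane. The following host-side sketch shows that arithmetic for 16 channels. It is a reference model under assumptions, not the intrinsic: the names ref_mul_elem, ref_mac_elem and ref_msc_elem are hypothetical, float stands in for bfloat16 and accfloat, and each channel's two elements are stored as an explicit pair because the packing of pairs inside v32bfloat16 is not stated in this section.

```cpp
#include <array>

// Host-side reference model only; NOT the AI Engine-ML intrinsics.
using Operand = std::array<std::array<float, 2>, 16>;  // 16 channels x 2 elements
using Lanes   = std::array<float, 16>;                 // one result per channel

// Per channel c: out[c] = a[c][0]*b[c][0] + a[c][1]*b[c][1]
inline Lanes ref_mul_elem(const Operand& a, const Operand& b) {
    Lanes out{};
    for (int c = 0; c < 16; ++c)
        out[c] = a[c][0] * b[c][0] + a[c][1] * b[c][1];
    return out;
}

// Assumed semantics: acc1 + product.
inline Lanes ref_mac_elem(const Operand& a, const Operand& b, const Lanes& acc1) {
    Lanes out = ref_mul_elem(a, b);
    for (int c = 0; c < 16; ++c) out[c] = acc1[c] + out[c];
    return out;
}

// Assumed semantics: acc1 - product.
inline Lanes ref_msc_elem(const Operand& a, const Operand& b, const Lanes& acc1) {
    Lanes out = ref_mul_elem(a, b);
    for (int c = 0; c < 16; ++c) out[c] = acc1[c] - out[c];
    return out;
}
```

Setting the second element of every pair in one operand to zero turns the model into a plain element-wise multiply of 16 values.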
 

Channel by channel multiplication of (1x2) with (2x1) with dynamic negation of multiplication result

v16accfloat mul_elem_16_2_conf (v32bfloat16 a, v32bfloat16 b, int sub_mul)
 
v16accfloat negmul_elem_16_2_conf (v32bfloat16 a, v32bfloat16 b, int sub_mul)
 

Channel by channel multiplication of (1x2) with (2x1) with dynamic negation of multiplication result, zeroing of acc1, and negation of acc1

v16accfloat mac_elem_16_2_conf (v32bfloat16 a, v32bfloat16 b, v16accfloat acc1, int zero_acc1, int sub_mul, int sub_acc1)
 
v16accfloat msc_elem_16_2_conf (v32bfloat16 a, v32bfloat16 b, v16accfloat acc1, int zero_acc1, int sub_mul, int sub_acc1)
 
v16accfloat addmac_elem_16_2_conf (v32bfloat16 a, v32bfloat16 b, v16accfloat acc1, v16accfloat acc2, int zero_acc1, int sub_mul, int sub_acc1, int sub_acc2)
 
v16accfloat addmsc_elem_16_2_conf (v32bfloat16 a, v32bfloat16 b, v16accfloat acc1, v16accfloat acc2, int zero_acc1, int sub_mul, int sub_acc1, int sub_acc2)
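
For the two-accumulator _conf forms, a fourth mask, sub_acc2, controls the sign of acc2. The sketch below models that case for the channel-by-channel product. It rests on the same assumptions as any host model of these masks: one mask bit per output lane, zeroing applied to acc1 only, and each term negated independently before the three terms are summed. The names bit, negate_if and ref_addmac_elem_conf are hypothetical, and the Multiply Accumulate page should be consulted for the actual behaviour.

```cpp
#include <array>
#include <cassert>

// Host-side reference model only; NOT the AI Engine-ML intrinsic.
using Operand = std::array<std::array<float, 2>, 16>;  // 16 channels x 2 elements
using Lanes   = std::array<float, 16>;

// True when bit `lane` of `mask` is set (assumed: one mask bit per lane).
inline bool bit(int mask, int lane) {
    assert(lane >= 0 && lane < 16);
    return ((static_cast<unsigned>(mask) >> lane) & 1u) != 0;
}

inline float negate_if(bool neg, float v) { return neg ? -v : v; }

// Assumed per-lane behaviour:
//   out = (zero_acc1 ? 0 : +/-acc1) + (+/-acc2) + (+/-product)
inline Lanes ref_addmac_elem_conf(const Operand& a, const Operand& b,
                                  const Lanes& acc1, const Lanes& acc2,
                                  int zero_acc1, int sub_mul,
                                  int sub_acc1, int sub_acc2) {
    Lanes out{};
    for (int c = 0; c < 16; ++c) {
        const float prod = a[c][0] * b[c][0] + a[c][1] * b[c][1];
        const float t1 = bit(zero_acc1, c) ? 0.0f
                                           : negate_if(bit(sub_acc1, c), acc1[c]);
        const float t2 = negate_if(bit(sub_acc2, c), acc2[c]);
        out[c] = t1 + t2 + negate_if(bit(sub_mul, c), prod);
    }
    return out;
}
```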
 

Function Documentation

◆ addmac_4x8_8x4()

v16accfloat addmac_4x8_8x4 ( v32bfloat16  a,
v32bfloat16  b,
v16accfloat  acc1,
v16accfloat  acc2 
)
Parameters
a          Matrix A
b          Matrix B
acc1       Accumulator 1 input
acc2       Accumulator 2 input
Returns
Result of operation

◆ addmac_4x8_8x4_conf()

v16accfloat addmac_4x8_8x4_conf ( v32bfloat16  a,
v32bfloat16  b,
v16accfloat  acc1,
v16accfloat  acc2,
int  zero_acc1,
int  sub_mul,
int  sub_acc1,
int  sub_acc2 
)
Parameters
a          Matrix A
b          Matrix B
acc1       Accumulator 1 input
acc2       Accumulator 2 input
zero_acc1  Zeroing mask for acc1
sub_mul    Negation mask of multiplication result
sub_acc1   Negation mask of acc1
sub_acc2   Negation mask of acc2
Returns
Result of operation

◆ addmac_elem_16_2()

v16accfloat addmac_elem_16_2 ( v32bfloat16  a,
v32bfloat16  b,
v16accfloat  acc1,
v16accfloat  acc2 
)
Parameters
a          Matrix A
b          Matrix B
acc1       Accumulator 1 input
acc2       Accumulator 2 input
Returns
Result of operation

◆ addmac_elem_16_2_conf()

v16accfloat addmac_elem_16_2_conf ( v32bfloat16  a,
v32bfloat16  b,
v16accfloat  acc1,
v16accfloat  acc2,
int  zero_acc1,
int  sub_mul,
int  sub_acc1,
int  sub_acc2 
)
Parameters
a          Matrix A
b          Matrix B
acc1       Accumulator 1 input
acc2       Accumulator 2 input
zero_acc1  Zeroing mask for acc1
sub_mul    Negation mask of multiplication result
sub_acc1   Negation mask of acc1
sub_acc2   Negation mask of acc2
Returns
Result of operation

◆ addmsc_4x8_8x4()

v16accfloat addmsc_4x8_8x4 ( v32bfloat16  a,
v32bfloat16  b,
v16accfloat  acc1,
v16accfloat  acc2 
)
Parameters
a          Matrix A
b          Matrix B
acc1       Accumulator 1 input
acc2       Accumulator 2 input
Returns
Result of operation

◆ addmsc_4x8_8x4_conf()

v16accfloat addmsc_4x8_8x4_conf ( v32bfloat16  a,
v32bfloat16  b,
v16accfloat  acc1,
v16accfloat  acc2,
int  zero_acc1,
int  sub_mul,
int  sub_acc1,
int  sub_acc2 
)
Parameters
a          Matrix A
b          Matrix B
acc1       Accumulator 1 input
acc2       Accumulator 2 input
zero_acc1  Zeroing mask for acc1
sub_mul    Negation mask of multiplication result
sub_acc1   Negation mask of acc1
sub_acc2   Negation mask of acc2
Returns
Result of operation

◆ addmsc_elem_16_2()

v16accfloat addmsc_elem_16_2 ( v32bfloat16  a,
v32bfloat16  b,
v16accfloat  acc1,
v16accfloat  acc2 
)
Parameters
a          Matrix A
b          Matrix B
acc1       Accumulator 1 input
acc2       Accumulator 2 input
Returns
Result of operation

◆ addmsc_elem_16_2_conf()

v16accfloat addmsc_elem_16_2_conf ( v32bfloat16  a,
v32bfloat16  b,
v16accfloat  acc1,
v16accfloat  acc2,
int  zero_acc1,
int  sub_mul,
int  sub_acc1,
int  sub_acc2 
)
Parameters
a          Matrix A
b          Matrix B
acc1       Accumulator 1 input
acc2       Accumulator 2 input
zero_acc1  Zeroing mask for acc1
sub_mul    Negation mask of multiplication result
sub_acc1   Negation mask of acc1
sub_acc2   Negation mask of acc2
Returns
Result of operation

◆ mac_4x8_8x4()

v16accfloat mac_4x8_8x4 ( v32bfloat16  a,
v32bfloat16  b,
v16accfloat  acc1 
)
Parameters
a          Matrix A
b          Matrix B
acc1       Accumulator 1 input
Returns
Result of operation

◆ mac_4x8_8x4_conf()

v16accfloat mac_4x8_8x4_conf ( v32bfloat16  a,
v32bfloat16  b,
v16accfloat  acc1,
int  zero_acc1,
int  sub_mul,
int  sub_acc1 
)
Parameters
a          Matrix A
b          Matrix B
acc1       Accumulator 1 input
zero_acc1  Zeroing mask for acc1
sub_mul    Negation mask of multiplication result
sub_acc1   Negation mask of acc1
Returns
Result of operation

◆ mac_elem_16_2()

v16accfloat mac_elem_16_2 ( v32bfloat16  a,
v32bfloat16  b,
v16accfloat  acc1 
)
Parameters
a          Matrix A
b          Matrix B
acc1       Accumulator 1 input
Returns
Result of operation

◆ mac_elem_16_2_conf()

v16accfloat mac_elem_16_2_conf ( v32bfloat16  a,
v32bfloat16  b,
v16accfloat  acc1,
int  zero_acc1,
int  sub_mul,
int  sub_acc1 
)
Parameters
a          Matrix A
b          Matrix B
acc1       Accumulator 1 input
zero_acc1  Zeroing mask for acc1
sub_mul    Negation mask of multiplication result
sub_acc1   Negation mask of acc1
Returns
Result of operation

◆ msc_4x8_8x4()

v16accfloat msc_4x8_8x4 ( v32bfloat16  a,
v32bfloat16  b,
v16accfloat  acc1 
)
Parameters
a          Matrix A
b          Matrix B
acc1       Accumulator 1 input
Returns
Result of operation

◆ msc_4x8_8x4_conf()

v16accfloat msc_4x8_8x4_conf ( v32bfloat16  a,
v32bfloat16  b,
v16accfloat  acc1,
int  zero_acc1,
int  sub_mul,
int  sub_acc1 
)
Parameters
a          Matrix A
b          Matrix B
acc1       Accumulator 1 input
zero_acc1  Zeroing mask for acc1
sub_mul    Negation mask of multiplication result
sub_acc1   Negation mask of acc1
Returns
Result of operation

◆ msc_elem_16_2()

v16accfloat msc_elem_16_2 ( v32bfloat16  a,
v32bfloat16  b,
v16accfloat  acc1 
)
Parameters
a          Matrix A
b          Matrix B
acc1       Accumulator 1 input
Returns
Result of operation

◆ msc_elem_16_2_conf()

v16accfloat msc_elem_16_2_conf ( v32bfloat16  a,
v32bfloat16  b,
v16accfloat  acc1,
int  zero_acc1,
int  sub_mul,
int  sub_acc1 
)
Parameters
a          Matrix A
b          Matrix B
acc1       Accumulator 1 input
zero_acc1  Zeroing mask for acc1
sub_mul    Negation mask of multiplication result
sub_acc1   Negation mask of acc1
Returns
Result of operation

◆ mul_4x8_8x4()

v16accfloat mul_4x8_8x4 ( v32bfloat16  a,
v32bfloat16  b 
)
Parameters
a          Matrix A
b          Matrix B
Returns
Result of operation

◆ mul_4x8_8x4_conf()

v16accfloat mul_4x8_8x4_conf ( v32bfloat16  a,
v32bfloat16  b,
int  sub_mul 
)
Parameters
a          Matrix A
b          Matrix B
sub_mul    Negation mask for multiplication result
Returns
Result of operation

◆ mul_elem_16_2()

v16accfloat mul_elem_16_2 ( v32bfloat16  a,
v32bfloat16  b 
)
Parameters
a          Matrix A
b          Matrix B
Returns
Result of operation

◆ mul_elem_16_2_conf()

v16accfloat mul_elem_16_2_conf ( v32bfloat16  a,
v32bfloat16  b,
int  sub_mul 
)
Parameters
a          Matrix A
b          Matrix B
sub_mul    Negation mask for multiplication result
Returns
Result of operation

◆ negmul_4x8_8x4()

v16accfloat negmul_4x8_8x4 ( v32bfloat16  a,
v32bfloat16  b 
)
Parameters
a          Matrix A
b          Matrix B
Returns
Result of operation

◆ negmul_4x8_8x4_conf()

v16accfloat negmul_4x8_8x4_conf ( v32bfloat16  a,
v32bfloat16  b,
int  sub_mul 
)
Parameters
a          Matrix A
b          Matrix B
sub_mul    Negation mask for multiplication result. If a bit of sub_mul is set, the corresponding vector lane of the output accumulator is negated.
Returns
Result of operation

◆ negmul_elem_16_2()

v16accfloat negmul_elem_16_2 ( v32bfloat16  a,
v32bfloat16  b 
)
Parameters
a          Matrix A
b          Matrix B
Returns
Result of operation

◆ negmul_elem_16_2_conf()

v16accfloat negmul_elem_16_2_conf ( v32bfloat16  a,
v32bfloat16  b,
int  sub_mul 
)
Parameters
a          Matrix A
b          Matrix B
sub_mul    Negation mask for multiplication result. If a bit of sub_mul is set, the corresponding vector lane of the output accumulator is negated.
Returns
Result of operation