AI Engine-ML Intrinsics User Guide  (v2023.2)
Multiply Accumulate

Overview

Intrinsics allowing you to perform MUL/MAC operations and a few of their variants.

For integer datatypes, a matrix A of size MxN is multiplied with a matrix B of size NxP. The naming convention for these operations is: [operation][_MxN_NxP]{_Cch}{_conf} or [operation]_conv_MxN{_Cch}{_conf}. Properties in [] are mandatory; properties in {} are optional. In this naming, conv indicates a convolutional operation, conf indicates the use of sub, zero or shift masks, and C gives the number of channels.
For an MxN vector multiply convolution operation, the calculation performed is:

\[ \text{mul_conv_MxN}(F,G)(x) = \sum_{u=0}^{\text{N}-1}{G(u) F(x+u)}, \qquad x = 0,\ldots,\text{M}-1 \]

where the vector \(F\) has length \(\text{M}+\text{N}-1\), and the vector \(G\) has length \(\text{N}\).
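
As a reading aid, the following is a minimal scalar reference model of this formula in plain C++; the helper name and element types are illustrative, not an actual intrinsic:

```cpp
#include <cstdint>
#include <vector>

// Scalar reference for mul_conv_MxN: F holds the M+N-1 input samples, G holds
// the N kernel taps; output lane x is sum_{u=0}^{N-1} G[u] * F[x+u].
std::vector<int32_t> mul_conv_ref(const std::vector<int8_t> &F,
                                  const std::vector<int8_t> &G, int M) {
    const int N = static_cast<int>(G.size());
    std::vector<int32_t> out(M, 0);
    for (int x = 0; x < M; ++x)
        for (int u = 0; u < N; ++u)
            out[x] += static_cast<int32_t>(G[u]) * F[x + u];
    return out;
}
```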

For element-wise operations, the naming is [operation_elem_C]{_N}. Here, C is the number of channels and N is the number of columns of matrix A/rows of matrix B; N is either two or it is omitted. The element-wise operations are executed channel by channel. The output will also be a matrix with C channels.

For complex datatypes, a multiplication of two matrices with complex elements is performed. The naming convention for these operations is [operation_elem_8]{_conf} for Multiply-accumulate of 32b x 16b complex integer datatypes and [operation_elem_8_2]{_conf} for Multiply-accumulate of 16b x 16b complex integer datatypes. Here, eight is the number of channels and two is the number of columns of matrix A/rows of matrix B. The matrix multiplication is performed individually for each channel of the input matrices. The output will also be a matrix with eight channels.

The following table shows the matrix multiplications that can be completed within a single cycle.

| Precision Mode | Channels | Matrix A | Matrix B | Matrix C |
|---|---|---|---|---|
| 8-bit x 4-bit = 32-bit | 1 | 4x16 | 16x8 | 4x8 |
| 8-bit x 4-bit = 32-bit | 1 | 4x32 | 32x8 (sparse) | 4x8 |
| 8-bit x 8-bit = 32-bit | 1 | 4x8 | 8x8 | 4x8 |
| 8-bit x 8-bit = 32-bit | 32 | 1x2 | 2x1 | 1x1 |
| 8-bit x 8-bit = 32-bit | 8 | 4x4 (convolution) | 4x1 | 4x1 |
| 8-bit x 8-bit = 32-bit | 4 | 8x8 (convolution) | 8x1 | 8x1 |
| 8-bit x 8-bit = 32-bit | 1 | 32x8 (convolution) | 8x1 | 32x1 |
| 8-bit x 8-bit = 32-bit | 1 | 4x16 | 16x8 (sparse) | 4x8 |
| 16-bit x 8-bit = 32-bit | 1 | 4x4 | 4x8 | 4x8 |
| 16-bit x 8-bit = 32-bit | 2 | 4x4 | 4x4 | 4x4 |
| 16-bit x 16-bit = 32-bit | 1 | 4x2 | 2x8 | 4x8 |
| 16-bit x 16-bit = 32-bit | 32 | 1x1 | 1x1 | 1x1 |
| 16-bit x 8-bit = 64-bit | 1 | 2x8 | 8x8 | 2x8 |
| 16-bit x 8-bit = 64-bit | 1 | 4x8 | 8x4 | 4x4 |
| 16-bit x 8-bit = 64-bit | 1 | 2x16 | 16x8 (sparse) | 2x8 |
| 16-bit x 16-bit = 64-bit | 1 | 2x4 | 4x8 | 2x8 |
| 16-bit x 16-bit = 64-bit | 1 | 4x4 | 4x4 | 4x4 |
| 16-bit x 16-bit = 64-bit | 16 | 1x2 | 2x1 | 1x1 |
| 16-bit x 16-bit = 64-bit | 1 | 16x4 (convolution) | 4x1 | 16x1 |
| Complex 16-bit x Complex 16-bit = 64-bit | 8 | 1x2 | 2x1 | 1x1 |
| 16-bit x 16-bit = 64-bit | 1 | 2x8 | 8x8 (sparse) | 2x8 |
| 32-bit x 16-bit = 64-bit | 1 | 4x2 | 2x4 | 4x4 |
| Complex 32-bit x Complex 16-bit = 64-bit | 8 | 1x1 | 1x1 | 1x1 |
| bfloat16 x bfloat16 = fp32 | 1 | 4x8 | 8x4 | 4x4 |
| bfloat16 x bfloat16 = fp32 | 16 | 1x2 | 2x1 | 1x1 |
| bfloat16 x bfloat16 = fp32 | 1 | 4x16 | 16x4 (sparse) | 4x4 |

Matrix mult intrinsics

We can summarize the MUL and MAC operations like this:

MAC: res = acc_in1 + (X_vec x Y_vec)
MUL: res = (X_vec x Y_vec)

Here, 'x' denotes the matrix multiplication operator. In the same way we can summarize the MSC, NEGMUL and MACMUL operations, as well as the MAC/MSC variants that take an additional acc_in2 input:

MSC: res = acc_in1 - (X_vec x Y_vec)
NEGMUL: res = - (X_vec x Y_vec)
MACMUL: res = (zero_acc1 ? 0 : acc_in1) + (X_vec x Y_vec)
ADDMAC: res = acc_in1 + acc_in2 + (X_vec x Y_vec)
ADDMSC: res = acc_in1 + acc_in2 - (X_vec x Y_vec)
SUBMAC: res = acc_in1 - acc_in2 + (X_vec x Y_vec)
SUBMSC: res = acc_in1 - acc_in2 - (X_vec x Y_vec)
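
For illustration, here is a sketch of a tiled 8-bit MUL/MAC sequence matching the "4x8 times 8x8" row of the table above. The mul_4x8_8x8/mac_4x8_8x8 names and the v32int8/v64int8/v32acc32 types are assumptions derived from the naming convention and may differ between tool versions:

```cpp
// Hypothetical sketch: accumulate k_tiles products of 4x8 (8-bit) A tiles
// with 8x8 (8-bit) B tiles into one 4x8 accumulator of 32-bit lanes.
v32acc32 tile_matmul(const v32int8 *A, const v64int8 *B, int k_tiles) {
    v32acc32 acc = mul_4x8_8x8(A[0], B[0]);     // MUL: acc  = A0 x B0
    for (int k = 1; k < k_tiles; ++k)
        acc = mac_4x8_8x8(acc, A[k], B[k]);     // MAC: acc += Ak x Bk
    return acc;
}
```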

The convolve variants

The convolve variants of these intrinsics differ in that they apply a convolution on the vectors instead of a matrix multiplication; here '*' denotes the vector convolution operator. In this case X_vec holds the data matrix and Y_vec the kernel.

MAC: res = acc_in1 + (X_vec * Y_vec)
MUL: res = (X_vec * Y_vec)
MSC: res = acc_in1 - (X_vec * Y_vec)
NEGMUL: res = - (X_vec * Y_vec)
MACMUL: res = (zero_acc1 ? 0 : acc_in1) + (X_vec * Y_vec)
ADDMAC: res = acc_in1 + acc_in2 + (X_vec * Y_vec)
ADDMSC: res = acc_in1 + acc_in2 - (X_vec * Y_vec)
SUBMAC: res = acc_in1 - acc_in2 + (X_vec * Y_vec)
SUBMSC: res = acc_in1 - acc_in2 - (X_vec * Y_vec)
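
A corresponding hypothetical sketch for the convolve variants, matching the "32x8 (convolution)" table row; the mul_conv_32x8/mac_conv_32x8 names follow the [operation]_conv_MxN convention, while the operand types and widths are assumptions:

```cpp
// Hypothetical sketch: convolve two windows of input samples with the same
// 8-tap kernel, accumulating the second result on top of the first.
// Each x vector holds the samples F (M+N-1 = 39 of its lanes are used).
v32acc32 conv_pair(v64int8 x0, v64int8 x1, v32int8 kernel) {
    v32acc32 acc = mul_conv_32x8(x0, kernel);   // acc[m]  = sum_u G[u]*F0[m+u]
    return mac_conv_32x8(acc, x1, kernel);      // acc[m] += sum_u G[u]*F1[m+u]
}
```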

Zeroing, sign and negation masks

Some variants allow passing masks that determine the sign, zeroing and negation of vector or accumulator lanes. These masks are the following:

int sgn_x: Sign mask of matrix X. If it is one, matrix X is interpreted as signed; otherwise it is treated as unsigned.
int sgn_y: Sign mask of matrix Y. If it is one, matrix Y is interpreted as signed; otherwise it is treated as unsigned.
int zero_acc1: Zeroing of acc1. If it is one, acc1 is zeroed.
int zero_acc2: Zeroing of acc2. If it is one, acc2 is zeroed.
int sub_mul: Negation mask of the matrix multiplication result. If it is one, the result of the multiplication is negated.
int sub_acc1: Negation mask of acc1. If it is one, acc1 is negated.
int sub_acc2: Negation mask of acc2. If it is one, acc2 is negated.
int shift16: Shift mask of acc1. If a bit is set, the corresponding <<16 shift is applied to acc1.
int sub_mask: Negation mask of complex multiplications. Each bit negates one term of a complex multiplication.
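
Concretely, a scalar model of how these flags combine on a single accumulator lane, matching the ADDMAC/SUBMSC definitions above; this is an illustrative helper, not an intrinsic, and the ordering of the shift relative to the negation is an assumption:

```cpp
#include <cstdint>

// One accumulator lane under the _conf masks described above.
int64_t conf_lane(int64_t acc1, int64_t acc2, int64_t prod,
                  int zero_acc1, int zero_acc2, int sub_acc1, int sub_acc2,
                  int sub_mul, int shift16_bit) {
    int64_t a1 = zero_acc1 ? 0 : acc1;
    int64_t a2 = zero_acc2 ? 0 : acc2;
    if (shift16_bit) a1 <<= 16;      // this lane's bit taken from the shift16 mask
    if (sub_acc1)    a1 = -a1;
    if (sub_acc2)    a2 = -a2;
    return a1 + a2 + (sub_mul ? -prod : prod);
}
```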

Complex multiplications require some terms to be negated in order to implement conjugation and multiplication by minus j. This is done through the sub_mask. The following examples show how this mask is used when two complex numbers, X and Y, are multiplied to get an output O. For Multiply-accumulate of 16b x 16b complex integer datatypes, two complex products are post-added; they are indicated by the postfix 0/1:

O[re] = (-1)^sub_mask[0] * X[re0] * Y[re0] + (-1)^sub_mask[1] * X[im0] * Y[im0]
      + (-1)^sub_mask[2] * X[re1] * Y[re1] + (-1)^sub_mask[3] * X[im1] * Y[im1]
O[im] = (-1)^sub_mask[4] * X[re0] * Y[im0] + (-1)^sub_mask[5] * X[im0] * Y[re0]
      + (-1)^sub_mask[6] * X[re1] * Y[im1] + (-1)^sub_mask[7] * X[im1] * Y[re1]

For Multiply-accumulate of 32b x 16b complex integer datatypes there is no post-adding and only four unique terms are needed. However, all 8 bits must still be specified appropriately: in the following equations, the two index bits used for one term must be set to the same value.

O[re] = (-1)^sub_mask[0|2] * X[re] * Y[re] + (-1)^sub_mask[1|3] * X[im] * Y[im]
O[im] = (-1)^sub_mask[4|6] * X[re] * Y[im] + (-1)^sub_mask[5|7] * X[im] * Y[re]
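
A scalar model of the 16b x 16b case, transcribing the equations above (illustrative helpers, not intrinsics):

```cpp
#include <cstdint>

struct CInt { int64_t re, im; };

// Returns -1 or +1 depending on bit `bit` of sub_mask.
static int64_t s(unsigned sub_mask, int bit) {
    return ((sub_mask >> bit) & 1) ? -1 : 1;
}

// 16b x 16b complex multiply with post-add of the two products (postfix 0/1).
CInt cmul2(int16_t xre0, int16_t xim0, int16_t xre1, int16_t xim1,
           int16_t yre0, int16_t yim0, int16_t yre1, int16_t yim1,
           unsigned sub_mask) {
    CInt o;
    o.re = s(sub_mask, 0) * xre0 * yre0 + s(sub_mask, 1) * xim0 * yim0
         + s(sub_mask, 2) * xre1 * yre1 + s(sub_mask, 3) * xim1 * yim1;
    o.im = s(sub_mask, 4) * xre0 * yim0 + s(sub_mask, 5) * xim0 * yre0
         + s(sub_mask, 6) * xre1 * yim1 + s(sub_mask, 7) * xim1 * yre1;
    return o;
}
// Example: sub_mask = 0b00001010 negates the two X[im]*Y[im] terms, giving the
// standard complex product (re = re*re - im*im, im = re*im + im*re) per pair.
```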

Multiplication of matrices with multiple channels

Some intrinsics are used for multiplications of matrices with a given number of channels. Each MxN matrix is stored in row-major and channel-minor fashion. The following example shows the resulting layout of elements in the vector for a 4x4 matrix with two channels. The indexes of each element are given as (m,n,c).

[a(0,0,0) a(0,0,1) a(0,1,0) a(0,1,1) a(0,2,0) a(0,2,1) a(0,3,0) a(0,3,1)
a(1,0,0) a(1,0,1) a(1,1,0) a(1,1,1) a(1,2,0) a(1,2,1) a(1,3,0) a(1,3,1)
a(2,0,0) a(2,0,1) a(2,1,0) a(2,1,1) a(2,2,0) a(2,2,1) a(2,3,0) a(2,3,1)
a(3,0,0) a(3,0,1) a(3,1,0) a(3,1,1) a(3,2,0) a(3,2,1) a(3,3,0) a(3,3,1)]
Note
Matrices with multiple channels are used for convolutional and element-wise operations. Element-wise operations are performed along the channels; e.g. an element-wise multiplication of two matrices with 32 channels performs a matrix multiplication for each individual channel, and the output again has 32 channels.
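
Under this layout, the flat index of an element follows directly; a minimal sketch (hypothetical helper, plain C++):

```cpp
// Row-major, channel-minor addressing for an MxN matrix with C channels:
// element (m, n, c) lives at flat index (m*N + n)*C + c.
inline int flat_index(int m, int n, int c, int N, int C) {
    return (m * N + n) * C + c;
}
// In the 4x4, two-channel example above, a(2,3,1) maps to (2*4 + 3)*2 + 1 = 23.
```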

Element-wise multiplication

The elem variants allow you to perform element-wise operations. The operations are performed along the channels. For example, if you perform a (1x1x32) x (1x1x32) operation, a multiplication is done between the elements of the same channel: the elements of channel zero are multiplied, the elements of channel one are multiplied, and so on. The end result again has 32 channels.

Some of the elem variants perform matrix multiplications along the channels. For those cases the multiplication (1x2xC) x (2x1xC) is performed, and the end result is a (1x1xC) matrix. Despite the name, this is not a true element-wise multiplication.
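
A scalar model of this (1x2xC) x (2x1xC) case, assuming the channel-minor layout described above (illustrative helper, not an intrinsic):

```cpp
#include <cstdint>

// For each channel c, a 1x2 row of A is multiplied with a 2x1 column of B:
// out[c] = A(0,0,c)*B(0,0,c) + A(0,1,c)*B(1,0,c).
void elem_matmul_2(const int16_t *a, const int16_t *b, int32_t *out, int C) {
    for (int c = 0; c < C; ++c) {
        // channel-minor layout: column n of A (and row n of B) starts at offset n*C
        out[c] = int32_t(a[0 * C + c]) * b[0 * C + c]
               + int32_t(a[1 * C + c]) * b[1 * C + c];
    }
}
```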

Convolution operation

Convolutional operations work similarly to element-wise multiplication: in every step, the kernel is multiplied with the matrix before being shifted to the next position, and the same is done for each channel. The difference from a regular element-wise multiplication is that, after the multiplications for each channel have completed, the resulting matrices are added together, so the final result has only one channel.
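
A scalar model of this channel-summing convolution, again assuming the channel-minor layout (illustrative helper, not an intrinsic):

```cpp
#include <cstdint>
#include <vector>

// Multi-channel convolution: per-channel products are summed across the C
// channels, so the M output lanes form a single-channel result.
std::vector<int32_t> conv_channels(const std::vector<int8_t> &F, // (M+N-1)xC samples, channel-minor
                                   const std::vector<int8_t> &G, // NxC kernel taps, channel-minor
                                   int M, int N, int C) {
    std::vector<int32_t> out(M, 0);
    for (int x = 0; x < M; ++x)
        for (int u = 0; u < N; ++u)
            for (int c = 0; c < C; ++c)
                out[x] += int32_t(G[u * C + c]) * F[(x + u) * C + c];
    return out;
}
```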

Considerations when using bfloat16 data type

When multiplying with a scalar bfloat16, the scalar is internally cast to float, which influences the rounding behaviour under negation. The following example shows how this affects the multiplication. Because the cast involves a rounding operation, it matters whether the negation is performed before or after the cast: in the first case, the rounding is applied to the positive result before the negation; in the second and third cases, the negation happens before the rounding, which can lead to a different result.

bfloat16 a, b;
auto v1 = -(a * b); // does not match v2/v3: the product is rounded to the positive result before the negation
auto v2 = (-a * b); // an operand is negated before the multiply, so rounding sees the negated value
auto v3 = (a * -b); // same behaviour as v2

Considerations when using emulated FP32 Intrinsics

The element-wise multiplication and matrix multiplication intrinsics for the fp32 input type are emulated using the bfloat16 data path. There are three options to choose from.

### _accuracy_safe intrinsics

The default and most accurate, but slowest, option: each input fp32 number is split into 3 bfloat16 numbers to extract all the bits of the mantissa. With 3 bfloat16 terms per operand, a multiplication a*b requires 9 MAC operations.

### _accuracy_fast intrinsics

Fast and accurate option, selected with the application compile-time flag AIE2_FP32_EMULATION_ACCURACY_FAST. Each input fp32 number is again split into 3 bfloat16 numbers to extract all the bits of the mantissa, so 9 MAC operations would be needed to emulate the fp32 multiplication; of these, the MAC operations involving only LSBs (the 3 last terms) are ignored, so a*b requires 6 MAC operations. This improves the cycle count of the multiplication with the least impact on the accuracy of the result.

### _accuracy_low intrinsics

Fastest and least accurate option, selected with the application compile-time flag AIE2_FP32_EMULATION_ACCURACY_LOW. Each input fp32 number is split into only 2 bfloat16 numbers, so not all bits of the mantissa can be used. Of the 4 MAC operations that would be needed to emulate the fp32 multiplication, the MAC operation involving only LSBs (the last term) is ignored, so a*b requires 3 MAC operations. This improves the cycle count of the multiplication further.
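
The following host-side C++ sketch illustrates the splitting scheme described above; bf16_truncate, split3 and emulated_mul_safe are hypothetical helpers that model the idea (truncating splits, then summing the cross products), not the actual emulation code:

```cpp
#include <cstdint>
#include <cstring>

// Truncate a float to bfloat16 precision by zeroing the low 16 bits
// (keeps the sign, exponent and top 7 explicit mantissa bits).
static float bf16_truncate(float x) {
    uint32_t u;
    std::memcpy(&u, &x, sizeof u);
    u &= 0xFFFF0000u;
    std::memcpy(&x, &u, sizeof u);
    return x;
}

// Peel an fp32 value into 3 bfloat16-sized chunks, most significant first,
// so that out[0] + out[1] + out[2] reproduces (approximately) all 24
// mantissa bits of x.
static void split3(float x, float out[3]) {
    out[0] = bf16_truncate(x);
    out[1] = bf16_truncate(x - out[0]);
    out[2] = x - out[0] - out[1];   // remaining low-order bits
}

// _accuracy_safe-style product: all 9 cross terms are accumulated.
float emulated_mul_safe(float a, float b) {
    float as[3], bs[3], acc = 0.0f;
    split3(a, as);
    split3(b, bs);
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            acc += as[i] * bs[j];
    return acc;
}
```

Under this model, the _accuracy_fast variant corresponds to skipping the three smallest cross terms (i + j >= 3), and _accuracy_low to splitting each operand into two chunks and skipping the single smallest term.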

Modules

 Emulated Multiply-accumulate of 16b x 32b datatypes
 Matrix multiplications in which matrix A has data elements of 16 bit and matrix B has data elements of 32 bit. These operations are emulated on top of Multiply-accumulate of 16b x 16b integer datatypes and might not have optimal performance.
 
 Emulated Multiply-accumulate of 32b x 16b datatypes
 Matrix multiplications in which matrix A has data elements of 32 bit and matrix B has data elements of 16 bit. These operations are emulated on top of Multiply-accumulate of 16b x 16b integer datatypes and might not have optimal performance.
 
 Emulated Multiply-accumulate of 32b x 32b datatypes
 Matrix multiplications in which matrix A has data elements of 32 bit and matrix B has data elements of 32 bit. These operations are emulated on top of Multiply-accumulate of 32b x 16b integer datatypes and Multiply-accumulate of 16b x 16b integer datatypes and might not have optimal performance.
 
 Emulated Multiply-accumulate of Complex 32b x Complex 32b datatypes
 Matrix multiplications in which matrix A has data elements of complex 32 bit and matrix B has data elements of complex 32 bit. These operations are emulated on top of Multiply-accumulate of 32b x 16b complex integer datatypes and might not have optimal performance.
 
 Multiply-accumulate of 16b x 16b complex integer datatypes
 Matrix multiplications in which matrix A and matrix B have complex data elements of 16 bit. For an explanation of how these operations work, see Multiply Accumulate.
 
 Multiply-accumulate of 16b x 16b integer datatypes
 Matrix multiplications in which matrix A and matrix B have data elements of 16 bit.
 
 Multiply-accumulate of 16b x 8b integer datatypes
 Matrix multiplications in which matrix A has data elements of 16 bit and matrix B has data elements of 8 bit.
 
 Multiply-accumulate of 32b x 16b complex integer datatypes
 Matrix multiplications in which matrix A has complex data elements of 32 bit and matrix B has complex data elements of 16 bit.
 
 Multiply-accumulate of 32b x 16b integer datatypes
 Matrix multiplications in which matrix A has data elements of 32 bit and matrix B has data elements of 16 bit.
 
 Multiply-accumulate of 8b x 4b datatypes
 Matrix multiplications in which matrix A has data elements of 8 bit and matrix B has data elements of 4 bit.
 
 Multiply-accumulate of 8b x 8b integer datatypes
 Matrix multiplications in which matrix A and matrix B have data elements of 8 bit.
 
 Multiply-accumulate of bfloat16 datatypes
 Matrix multiplications in which matrix A and B have bfloat16 data elements.
 
 Multiply-accumulate of fp32 x fp32 datatypes
 Element-wise multiplication and matrix multiplication using the bfloat16 data path. Two options are available: with or without calling set_rnd(0) for truncation before using these intrinsics. Use the AIE2_FP32_EMULATION_SET_RND_MODE flag to set the rounding mode to truncation. For an explanation of how these operations work, see Multiply Accumulate.
 
 Multiply-accumulate with a sparse matrix
 Matrix multiplications in which matrix B is a sparse matrix.
 