MXU is the name for the XBurst SIMD instructions. SIMD means Single Instruction Multiple Data and is often used to speed up audio/video processing. Examples of SIMD instruction sets for other CPUs are MMX, SSE and AltiVec.
Instruction Naming
The initial letter indicates the number of elements in the vector(s) operated upon: S(ingle) for 1, D(ual) for 2, Q(uad) for 4. The letter is followed by a number, which denotes the length of the input elements in bits. The number is followed by the name of the operation that will be performed.
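The naming scheme above can be illustrated with a small decoder. This is only a sketch: the function name `parse_mnemonic` and the regular expression are hypothetical helpers, not part of any Ingenic tool, and mnemonics that do not follow the S/D/Q pattern (such as the LX* loads listed later) are rejected.

```python
import re

# Prefix letter -> number of elements in the vector(s), as described above.
LANES = {"S": 1, "D": 2, "Q": 4}

def parse_mnemonic(name):
    """Split an MXU mnemonic into (element count, element bits, operation)."""
    m = re.fullmatch(r"([SDQ])(\d+)([A-Z][A-Z0-9]*)", name)
    if m is None:
        raise ValueError(f"not an S/D/Q-style mnemonic: {name}")
    return LANES[m.group(1)], int(m.group(2)), m.group(3)

print(parse_mnemonic("Q16ADD"))  # (4, 16, 'ADD')
print(parse_mnemonic("S32I2M"))  # (1, 32, 'I2M')
```

Note that a Q16 operation covers 4 x 16 = 64 bits, which is why such instructions name two source or two destination registers.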
Register Naming
There is a dedicated register set for the MXU operations. It contains 17 32-bit registers, referred to as xr0..xr16. Registers xr0..xr15 are used in computations; xr16 is a control register. MXU register xr0 always has the value 0; writes to it have no effect.
The main MIPS registers are referred to as r0..r31.
Enabling MXU
Before the MXU can be used, it must be enabled. This is done by setting bit 0 (the lowest bit) of xr16 to 1.
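As a rough software model of the register rules described so far (xr0 hardwired to zero, xr16 as control register, bit 0 as the enable bit), consider the following sketch. The class `MxuRegs` and its method names are hypothetical and exist only for illustration; on real hardware the enable bit is set by writing xr16, for example with S32I2M.

```python
class MxuRegs:
    """Toy model of the MXU register file: xr0..xr15 data, xr16 control."""

    def __init__(self):
        self._xr = [0] * 17

    def read(self, n):
        # xr0 always reads as 0.
        return 0 if n == 0 else self._xr[n]

    def write(self, n, value):
        # Writes to xr0 have no effect; other registers hold 32 bits.
        if n != 0:
            self._xr[n] = value & 0xFFFFFFFF

    def enable(self):
        # Set bit 0 of the control register xr16.
        self.write(16, self.read(16) | 1)

    def enabled(self):
        return bool(self.read(16) & 1)

regs = MxuRegs()
regs.write(0, 0x1234)
print(regs.read(0))    # 0
regs.enable()
print(regs.enabled())  # True
```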
Load and Store Instructions
S32I2M
S32I2M xr, r
Assigns the value of main register r to MXU register xr.
S32M2I
S32M2I xr, r
Assigns the value of MXU register xr to main register r.
S32LDD
S32LDD xr, p, o
Loads the contents of the memory at p + o (pointer + offset) into MXU register xr.
S32LDDV
S32LDDV xr, p, o, s
Loads the contents of the memory at p + o * 2^s (pointer + shifted offset) into MXU register xr.
S32LDI
S32LDI xr, p, o
Loads the contents of the memory at p + o (pointer + offset) into MXU register xr. After that, p is incremented by o.
S32LDIV
S32LDIV xr, p, o, s
Loads the contents of the memory at p + o * 2^s (pointer + shifted offset) into MXU register xr. After that, p is incremented by o * 2^s.
S32STD
S32STD xr, p, o
Stores the contents of MXU register xr into the memory at p + o (pointer + offset).
S32STDV
S32STDV xr, p, o, s
Stores the contents of MXU register xr into the memory at p + o * 2^s (pointer + shifted offset).
S32SDI
S32SDI xr, p, o
Stores the contents of MXU register xr into the memory at p + o (pointer + offset). After that, p is incremented by o.
S32SDIV
S32SDIV xr, p, o, s
Stores the contents of MXU register xr into the memory at p + o * 2^s (pointer + shifted offset). After that, p is incremented by o * 2^s.
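The address arithmetic of the load instructions can be sketched in software as follows. The function names mirror the mnemonics but are hypothetical helpers, memory is modelled as a Python bytes object, and little-endian byte order is an assumption made for the example rather than something stated above.

```python
import struct

def s32ldd(mem, p, o):
    """Return the 32-bit word at byte address p + o."""
    return struct.unpack_from("<I", mem, p + o)[0]

def s32ldiv(mem, p, o, s):
    """Return (word at p + o * 2**s, updated pointer)."""
    addr = p + (o << s)          # shifted offset: o * 2^s
    word = struct.unpack_from("<I", mem, addr)[0]
    return word, addr            # p is incremented by o * 2^s

mem = struct.pack("<4I", 10, 20, 30, 40)
print(s32ldd(mem, 4, 4))         # 30
word, p = s32ldiv(mem, 0, 1, 2)
print(word, p)                   # 20 4
word, p = s32ldiv(mem, p, 1, 2)
print(word, p)                   # 30 8
```

Because the updated pointer equals the address that was just accessed, repeating an incrementing load with the same offset walks through memory one element at a time.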
Addition and Subtraction Instructions
D32ADD, Q16ADD
D32ADD xra, xrb, xrc, xrd, addsub
Q16ADD xra, xrb, xrc, xrd, addsub, swizzle
Performs addition and/or subtraction on vectors xrb and xrc and writes the results to vectors xra and xrd.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
addsub = AA: | xra := xrb + xrc; | xrd := xrb + xrc |
addsub = AS: | xra := xrb + xrc; | xrd := xrb - xrc |
addsub = SA: | xra := xrb - xrc; | xrd := xrb + xrc |
addsub = SS: | xra := xrb - xrc; | xrd := xrb - xrc |
When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows:
swizzle = WW: | xrb.hl | (as-is) |
swizzle = XW: | xrb.lh | (exchanged) |
swizzle = HW: | xrb.hh | (clone high) |
swizzle = LW: | xrb.ll | (clone low) |
The values read from vector xrc are always used as-is.
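The interaction of addsub and swizzle in Q16ADD can be sketched with a small software model. The helper names are hypothetical, and the model assumes that each 16-bit lane wraps around on overflow; the text above does not say whether the hardware wraps or saturates.

```python
def halves(x):
    """Split a 32-bit register into (high, low) 16-bit halves."""
    return (x >> 16) & 0xFFFF, x & 0xFFFF

def swizzle(xrb, mode):
    """Apply the WW/XW/HW/LW selection to the halves of xrb."""
    h, l = halves(xrb)
    return {"WW": (h, l), "XW": (l, h), "HW": (h, h), "LW": (l, l)}[mode]

def q16add(xrb, xrc, addsub, mode="WW"):
    """Return (xra, xrd); addsub[0] selects the xra op, addsub[1] the xrd op."""
    bh, bl = swizzle(xrb, mode)
    ch, cl = halves(xrc)          # xrc is always used as-is

    def lane(op, b, c):
        # Assumption: 16-bit wraparound, no saturation.
        return (b + c if op == "A" else b - c) & 0xFFFF

    xra = (lane(addsub[0], bh, ch) << 16) | lane(addsub[0], bl, cl)
    xrd = (lane(addsub[1], bh, ch) << 16) | lane(addsub[1], bl, cl)
    return xra, xrd

xra, xrd = q16add(0x00030004, 0x00010002, "AS")
print(f"{xra:#010x} {xrd:#010x}")  # 0x00040006 0x00020002
```

With mode "XW" the same inputs give 0x00050005 and 0x00030001, because the halves of xrb are exchanged before the lane-wise operation.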
D32ACC, Q16ACC
D32ACC xra, xrb, xrc, xrd, addsub
Q16ACC xra, xrb, xrc, xrd, addsub, swizzle
Performs addition and/or subtraction on vectors xrb and xrc and adds the results to vectors xra and xrd.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
addsub = AA: | xra += xrb + xrc; | xrd += xrb + xrc |
addsub = AS: | xra += xrb + xrc; | xrd += xrb - xrc |
addsub = SA: | xra += xrb - xrc; | xrd += xrb + xrc |
addsub = SS: | xra += xrb - xrc; | xrd += xrb - xrc |
When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows:
swizzle = WW: | xrb.hl | (as-is) |
swizzle = XW: | xrb.lh | (exchanged) |
swizzle = HW: | xrb.hh | (clone high) |
swizzle = LW: | xrb.ll | (clone low) |
The values read from vector xrc are always used as-is.
Q8ADD
Q8ADD xra, xrb, xrc, addsub
Adds or subtracts the four 8-bit values in the vectors xrb and xrc. The four 8-bit results are stored in the vector xra.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
addsub = AA: | xra.h := xrb.h + xrc.h; | xra.l := xrb.l + xrc.l |
addsub = AS: | xra.h := xrb.h + xrc.h; | xra.l := xrb.l - xrc.l |
addsub = SA: | xra.h := xrb.h - xrc.h; | xra.l := xrb.l + xrc.l |
addsub = SS: | xra.h := xrb.h - xrc.h; | xra.l := xrb.l - xrc.l |
Q8ADDE
Q8ADDE xra, xrb, xrc, xrd, addsub
Adds or subtracts the four 8-bit unsigned values in the vectors xrb and xrc. The four 16-bit results are stored in the vectors xra and xrd.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
addsub = AA: | xra := xrb.h + xrc.h; | xrd := xrb.l + xrc.l |
addsub = AS: | xra := xrb.h + xrc.h; | xrd := xrb.l - xrc.l |
addsub = SA: | xra := xrb.h - xrc.h; | xrd := xrb.l + xrc.l |
addsub = SS: | xra := xrb.h - xrc.h; | xrd := xrb.l - xrc.l |
Q8ACCE
Q8ACCE xra, xrb, xrc, xrd, addsub
Adds or subtracts the four 8-bit unsigned values in the vectors xrb and xrc. The four 16-bit results are added to the vectors xra and xrd.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
addsub = AA: | xra += xrb.h + xrc.h; | xrd += xrb.l + xrc.l |
addsub = AS: | xra += xrb.h + xrc.h; | xrd += xrb.l - xrc.l |
addsub = SA: | xra += xrb.h - xrc.h; | xrd += xrb.l + xrc.l |
addsub = SS: | xra += xrb.h - xrc.h; | xrd += xrb.l - xrc.l |
D16AVG, Q8AVG
D16AVG xra, xrb, xrc
Q8AVG xra, xrb, xrc
Computes the average, rounded down, of the unsigned values in vectors xrb and xrc and assigns the result to vector xra.
D16AVGR, Q8AVGR
D16AVGR xra, xrb, xrc
Q8AVGR xra, xrb, xrc
Computes the average, rounded up, of the unsigned values in vectors xrb and xrc and assigns the result to vector xra.
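The difference between the two rounding modes can be sketched as below. The helpers `lanes`, `pack` and `avg` are hypothetical; `bits` is 16 for the D16 variants and 8 for the Q8 variants.

```python
def lanes(x, bits):
    """Split a 32-bit register into unsigned lanes, highest lane first."""
    mask = (1 << bits) - 1
    return [(x >> sh) & mask for sh in range(32 - bits, -1, -bits)]

def pack(vals, bits):
    """Inverse of lanes(): pack lane values, highest lane first."""
    out = 0
    for v in vals:
        out = (out << bits) | (v & ((1 << bits) - 1))
    return out

def avg(xrb, xrc, bits, round_up=False):
    """Lane-wise unsigned average; round_up models the ...AVGR variants."""
    r = 1 if round_up else 0
    return pack([(b + c + r) >> 1
                 for b, c in zip(lanes(xrb, bits), lanes(xrc, bits))], bits)

print(f"{avg(0x01020304, 0x02020305, 8):#010x}")        # 0x01020304
print(f"{avg(0x01020304, 0x02020305, 8, True):#010x}")  # 0x02020305
```

The sum is formed at full precision before halving, so averaging two maximal lane values (for example 0xFF and 0xFF) still yields 0xFF.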
Q8SAD
Q8SAD xra, xrb, xrc, xrd
Computes the absolute differences of the four unsigned 8-bit values in vectors xrb and xrc. The sum of these 4 differences is assigned to the full register xra and added to the full register xrd.
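A sum of absolute differences like this is typically used in video motion estimation. The following sketch models the operation on plain integers; the function name is a hypothetical helper and the 32-bit wraparound of the accumulator is an assumption.

```python
def q8sad(xrb, xrc, xrd):
    """Return (new xra, new xrd) for the byte-wise sum of absolute differences."""
    sad = sum(abs(((xrb >> sh) & 0xFF) - ((xrc >> sh) & 0xFF))
              for sh in (24, 16, 8, 0))
    # xra receives the sum; xrd accumulates it (assumed to wrap at 32 bits).
    return sad, (xrd + sad) & 0xFFFFFFFF

print(q8sad(0x10203040, 0x40302010, 1000))  # (128, 1128)
```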
Multiply Instructions
D16MUL, Q8MUL
D16MUL xra, xrb, xrc, xrd, swizzle
Q8MUL xra, xrb, xrc, xrd
Multiplies the signed values in vector xrb by the signed values in vector xrc and assigns the results to vectors xra and xrd.
When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows:
swizzle = WW: | xrb.hl | (as-is) |
swizzle = XW: | xrb.lh | (exchanged) |
swizzle = HW: | xrb.hh | (clone high) |
swizzle = LW: | xrb.ll | (clone low) |
The values read from vector xrc are always used as-is.
D16MAC, Q8MAC
D16MAC xra, xrb, xrc, xrd, addsub, swizzle
Q8MAC xra, xrb, xrc, xrd, addsub
Multiplies the signed values in vector xrb by the signed values in vector xrc and adds the results to, or subtracts them from, vectors xra and xrd.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
addsub = AA: | xra += xrb.h * xrc.h; | xrd += xrb.l * xrc.l |
addsub = AS: | xra += xrb.h * xrc.h; | xrd -= xrb.l * xrc.l |
addsub = SA: | xra -= xrb.h * xrc.h; | xrd += xrb.l * xrc.l |
addsub = SS: | xra -= xrb.h * xrc.h; | xrd -= xrb.l * xrc.l |
When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows:
swizzle = WW: | xrb.hl | (as-is) |
swizzle = XW: | xrb.lh | (exchanged) |
swizzle = HW: | xrb.hh | (clone high) |
swizzle = LW: | xrb.ll | (clone low) |
The values read from vector xrc are always used as-is.
D16MADL, Q8MADL
D16MADL xra, xrb, xrc, xrd, addsub, swizzle
Q8MADL xra, xrb, xrc, xrd, addsub
Multiplies the signed values in vector xrb by the signed values in vector xrc. The results of the multiplication are added to or subtracted from the values in vector xra and that final result is written to vector xrd.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
addsub = AA: | xrd.h := xra.h + xrb.h * xrc.h; | xrd.l := xra.l + xrb.l * xrc.l |
addsub = AS: | xrd.h := xra.h + xrb.h * xrc.h; | xrd.l := xra.l - xrb.l * xrc.l |
addsub = SA: | xrd.h := xra.h - xrb.h * xrc.h; | xrd.l := xra.l + xrb.l * xrc.l |
addsub = SS: | xrd.h := xra.h - xrb.h * xrc.h; | xrd.l := xra.l - xrb.l * xrc.l |
When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows:
swizzle = WW: | xrb.hl | (as-is) |
swizzle = XW: | xrb.lh | (exchanged) |
swizzle = HW: | xrb.hh | (clone high) |
swizzle = LW: | xrb.ll | (clone low) |
The values read from vector xrc are always used as-is.
D16MULF
D16MULF xra, xrb, xrc, swizzle
Multiplies the signed values in vector xrb by the signed values in vector xrc. The highest 16 bits of the results of the multiplication are written to vector xra. Note that the result of multiplying two 16-bit signed numbers is a 31-bit signed number (bit 30 being the sign bit), so vector xra will contain bits 30..15 of each of the two multiplication results, not bits 31..16.
It is possible to swizzle the values read from vector xrb as follows:
swizzle = WW: | xrb.hl | (as-is) |
swizzle = XW: | xrb.lh | (exchanged) |
swizzle = HW: | xrb.hh | (clone high) |
swizzle = LW: | xrb.ll | (clone low) |
The values read from vector xrc are always used as-is.
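Taking bits 30..15 of the product is what makes this instruction useful for fixed-point arithmetic: if both inputs are read as Q15 fractions (value / 2^15), the result is again a Q15 fraction. The sketch below models the WW (as-is) case only; `s16` and `d16mulf` are hypothetical helper names, and the Q15 interpretation is an illustration rather than something the text above states.

```python
def s16(v):
    """Interpret the low 16 bits of v as a signed integer."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v

def d16mulf(xrb, xrc):
    """Lane-wise signed multiply, keeping bits 30..15 of each product."""
    hi = (s16(xrb >> 16) * s16(xrc >> 16)) >> 15
    lo = (s16(xrb) * s16(xrc)) >> 15
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

# 0x4000 is 0.5 and 0xC000 is -0.5 in Q15.
print(f"{d16mulf(0x4000C000, 0x40004000):#010x}")  # 0x2000e000
```

The output lanes 0x2000 and 0xE000 correspond to 0.25 and -0.25, as expected for 0.5 * 0.5 and -0.5 * 0.5.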
D16MACF
D16MACF xra, xrb, xrc, xrd, addsub, swizzle
Multiplies the signed values in vector xrb by the signed values in vector xrc. These results are doubled to make two 32-bit signed numbers. Those numbers are then added to or subtracted from registers xra and xrd. The upper 16 bits of the two sums, rounded up, are written to vector xra.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
addsub = AA: | xra.h := ceil((xra + xrb.h * xrc.h * 2) / 2^16); | xra.l := ceil((xrd + xrb.l * xrc.l * 2) / 2^16) |
addsub = AS: | xra.h := ceil((xra + xrb.h * xrc.h * 2) / 2^16); | xra.l := ceil((xrd - xrb.l * xrc.l * 2) / 2^16) |
addsub = SA: | xra.h := ceil((xra - xrb.h * xrc.h * 2) / 2^16); | xra.l := ceil((xrd + xrb.l * xrc.l * 2) / 2^16) |
addsub = SS: | xra.h := ceil((xra - xrb.h * xrc.h * 2) / 2^16); | xra.l := ceil((xrd - xrb.l * xrc.l * 2) / 2^16) |
It is possible to swizzle the values read from vector xrb as follows:
swizzle = WW: | xrb.hl | (as-is) |
swizzle = XW: | xrb.lh | (exchanged) |
swizzle = HW: | xrb.hh | (clone high) |
swizzle = LW: | xrb.ll | (clone low) |
The values read from vector xrc are always used as-is.
S16MAD
S16MAD xra, xrb, xrc, xrd, addsub, select
Multiplies a 16-bit signed value from vector xrb with a 16-bit signed value from vector xrc. The result is added to or subtracted from xra and the final result is written to xrd.
Whether the multiplication result is added or subtracted is controlled by addsub, as shown in the following table:
addsub = A: | xrd := xra + x * y |
addsub = S: | xrd := xra - x * y |
Which parts of xrb and xrc are used is controlled by select, as shown in the following table:
select = HH: | x := xrb.h; | y := xrc.h |
select = HL: | x := xrb.h; | y := xrc.l |
select = LH: | x := xrb.l; | y := xrc.h |
select = LL: | x := xrb.l; | y := xrc.l |
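The two tables above combine as in the following sketch. The function names are hypothetical, and the result is assumed to wrap at 32 bits.

```python
def s16(v):
    """Interpret the low 16 bits of v as a signed integer."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v

def s16mad(xra, xrb, xrc, addsub, select):
    """Return the new value of xrd for S16MAD."""
    x = s16(xrb >> 16) if select[0] == "H" else s16(xrb)
    y = s16(xrc >> 16) if select[1] == "H" else s16(xrc)
    prod = x * y
    # Assumption: the 32-bit result wraps around on overflow.
    return (xra + prod if addsub == "A" else xra - prod) & 0xFFFFFFFF

# xrb holds (3, -2) and xrc holds (5, 7) as (high, low) halves.
print(s16mad(100, 0x0003FFFE, 0x00050007, "A", "LH"))  # 90
print(s16mad(100, 0x0003FFFE, 0x00050007, "S", "HL"))  # 79
```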
Other Math
S32MAX, D16MAX, Q8MAX
S32MAX xra, xrb, xrc
D16MAX xra, xrb, xrc
Q8MAX xra, xrb, xrc
Takes the maximum of the signed values of vector xrb and vector xrc and assigns those to vector xra.
S32MIN, D16MIN, Q8MIN
S32MIN xra, xrb, xrc
D16MIN xra, xrb, xrc
Q8MIN xra, xrb, xrc
Takes the minimum of the signed values of vector xrb and vector xrc and assigns those to vector xra.
Q16SAT
Q16SAT xra, xrb, xrc
Saturate: The values in xrb and xrc are taken as four 16-bit signed integers and clamped to the range [0..255]. The four 8-bit results are written to xra, from high byte to low byte: upper half of xrb, lower half of xrb, upper half of xrc, lower half of xrc.
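This kind of saturation is commonly used to convert intermediate 16-bit results back to 8-bit pixel values. A minimal sketch of the clamping and packing order, using hypothetical helper names:

```python
def s16(v):
    """Interpret the low 16 bits of v as a signed integer."""
    v &= 0xFFFF
    return v - 0x10000 if v & 0x8000 else v

def q16sat(xrb, xrc):
    """Clamp four signed 16-bit values to [0..255] and pack them into 32 bits."""
    out = 0
    for reg in (xrb, xrc):       # xrb fills the two high bytes, xrc the two low
        for sh in (16, 0):       # upper half first, then lower half
            out = (out << 8) | min(255, max(0, s16(reg >> sh)))
    return out

# xrb holds (256, -1), xrc holds (127, 0).
print(f"{q16sat(0x0100FFFF, 0x007F0000):#010x}")  # 0xff007f00
```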
S32CPS, D16CPS
S32CPS xra, xrb, xrc
D16CPS xra, xrb, xrc
Copy Sign: For each signed value in vector xrc: if it is non-negative, the corresponding value from vector xrb is assigned, unmodified, to vector xra. Otherwise, the corresponding value from vector xrb is negated and assigned to vector xra.
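The per-lane rule can be sketched as follows, with `bits` set to 32 for S32CPS and 16 for D16CPS. The function name `cps` is a hypothetical helper, and negation is modelled as two's-complement negation within the lane width.

```python
def cps(xrb, xrc, bits):
    """Negate each lane of xrb whose corresponding lane in xrc is negative."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    out = 0
    for sh in range(32 - bits, -1, -bits):
        b = (xrb >> sh) & mask
        c = (xrc >> sh) & mask
        if c & sign:             # xrc lane is negative
            b = (-b) & mask      # two's-complement negate within the lane
        out |= b << sh
    return out

# D16CPS: xrb holds (5, -3), xrc holds (-1, 1), so the result is (-5, -3).
print(f"{cps(0x0005FFFD, 0xFFFF0001, 16):#010x}")  # 0xfffbfffd
```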
Q8ABD
Q8ABD xra, xrb, xrc
Absolute difference: Computes the absolute value of the difference of the unsigned values in vector xrb and vector xrc and assigns the result to vector xra.
Q8SLT
Q8SLT xra, xrb, xrc
Set on Less Than: Compares the signed values in vector xrb and vector xrc. If the value from xrb is less than the value from xrc, 1 is assigned to the corresponding position in vector xra, otherwise 0 is assigned.
This is a vectorized version of the MIPS instruction SLT.
Shift and Shuffle Instructions
D32SLL
D32SLL xra, xrb, xrc, xrd, S
Shift Logical Left: The value of xrb is shifted S bits to the left and the result is assigned to xra. Also, the value of xrc is shifted S bits to the left and the result is assigned to xrd. S is a constant in the range [0..31].
D32SLLV
D32SLLV xra, xrb, rs
Shift Logical Left: The value of xra is shifted S bits to the left and the result is assigned to xra. Also, the value of xrb is shifted S bits to the left and the result is assigned to xrb. S is in the range [0..31]: the value of the lowest 5 bits of main MIPS register rs.
D32SLR
D32SLR xra, xrb, xrc, xrd, S
Shift Logical Right: The unsigned value of xrb is shifted S bits to the right and the result is assigned to xra. Also, the unsigned value of xrc is shifted S bits to the right and the result is assigned to xrd. S is a constant in the range [0..31].
D32SLRV
D32SLRV xra, xrb, rs
Shift Logical Right: The unsigned value of xra is shifted S bits to the right and the result is assigned to xra. Also, the unsigned value of xrb is shifted S bits to the right and the result is assigned to xrb. S is in the range [0..31]: the value of the lowest 5 bits of main MIPS register rs.
D32SAR
D32SAR xra, xrb, xrc, xrd, S
Shift Arithmetic Right: The signed value of xrb is shifted S bits to the right and the result is assigned to xra. Also, the signed value of xrc is shifted S bits to the right and the result is assigned to xrd. S is a constant in the range [0..31].
D32SARV
D32SARV xra, xrb, rs
Shift Arithmetic Right: The signed value of xra is shifted S bits to the right and the result is assigned to xra. Also, the signed value of xrb is shifted S bits to the right and the result is assigned to xrb. S is in the range [0..31]: the value of the lowest 5 bits of main MIPS register rs.
D32SARL
D32SARL xra, xrb, xrc, S
Shift Arithmetic Right: The signed value of xrb is shifted S bits to the right and the lower 16 bits of the result are assigned to the higher 16 bits of xra. Also, the signed value of xrc is shifted S bits to the right and the lower 16 bits of the result are assigned to the lower 16 bits of xra. S is a constant in the range [0..31].
D32SARW
D32SARW xra, xrb, xrc, rs
Shift Arithmetic Right: The signed value of xrb is shifted S bits to the right and the lower 16 bits of the result are assigned to the higher 16 bits of xra. Also, the signed value of xrc is shifted S bits to the right and the lower 16 bits of the result are assigned to the lower 16 bits of xra. S is in the range [0..31]: the value of the lowest 5 bits of main MIPS register rs.
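D32SARL and D32SARW scale two 32-bit values down and pack them into one register of 16-bit values. A software sketch of both, using hypothetical helper names:

```python
def s32(v):
    """Interpret the low 32 bits of v as a signed integer."""
    v &= 0xFFFFFFFF
    return v - 0x100000000 if v & 0x80000000 else v

def d32sarl(xrb, xrc, s):
    """Shift both inputs right arithmetically and pack the low 16 bits of each."""
    hi = (s32(xrb) >> s) & 0xFFFF   # goes to the upper half of xra
    lo = (s32(xrc) >> s) & 0xFFFF   # goes to the lower half of xra
    return (hi << 16) | lo

def d32sarw(xrb, xrc, rs):
    """Same as d32sarl, with the shift taken from the lowest 5 bits of rs."""
    return d32sarl(xrb, xrc, rs & 31)

# -65536 >> 8 is -256 (0xFF00); 0x12300 >> 8 is 0x123.
print(f"{d32sarl(0xFFFF0000, 0x00012300, 8):#010x}")  # 0xff000123
```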
Q16SLL
Q16SLL xra, xrb, xrc, xrd, S
Shift Logical Left: The values of the upper and lower halves of xrb are shifted S bits to the left and the result is assigned to xra. Also, the values of the upper and lower halves of xrc are shifted S bits to the left and the result is assigned to xrd. S is a constant in the range [0..15].
Q16SLLV
Q16SLLV xra, xrb, rs
Shift Logical Left: The values of the upper and lower halves of xra are shifted S bits to the left and the result is assigned to xra. Also, the values of the upper and lower halves of xrb are shifted S bits to the left and the result is assigned to xrb. S is in the range [0..15]: the value of the lowest 4 bits of main MIPS register rs.
Q16SLR
Q16SLR xra, xrb, xrc, xrd, S
Shift Logical Right: The unsigned values of the upper and lower halves of xrb are shifted S bits to the right and the result is assigned to xra. Also, the unsigned values of the upper and lower halves of xrc are shifted S bits to the right and the result is assigned to xrd. S is a constant in the range [0..15].
Q16SLRV
Q16SLRV xra, xrb, rs
Shift Logical Right: The unsigned values of the upper and lower halves of xra are shifted S bits to the right and the result is assigned to xra. Also, the unsigned values of the upper and lower halves of xrb are shifted S bits to the right and the result is assigned to xrb. S is in the range [0..15]: the value of the lowest 4 bits of main MIPS register rs.
Q16SAR
Q16SAR xra, xrb, xrc, xrd, S
Shift Arithmetic Right: The signed values of the upper and lower halves of xrb are shifted S bits to the right and the result is assigned to xra. Also, the signed values of the upper and lower halves of xrc are shifted S bits to the right and the result is assigned to xrd. S is a constant in the range [0..15].
Q16SARV
Q16SARV xra, xrb, rs
Shift Arithmetic Right: The signed values of the upper and lower halves of xra are shifted S bits to the right and the result is assigned to xra. Also, the signed values of the upper and lower halves of xrb are shifted S bits to the right and the result is assigned to xrb. S is in the range [0..15]: the value of the lowest 4 bits of main MIPS register rs.
S32ALN
S32ALN xra, xrb, xrc, s
Takes the 64-bit value xrb:xrc, shifts it s bytes (0..4) to the left and assigns the highest 32 bits of the result to xra. Can be used to realign values that are not aligned in memory.
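The realignment can be sketched as a 64-bit shift. The function name is a hypothetical helper; xrb forms the upper and xrc the lower 32 bits of the combined value.

```python
def s32aln(xrb, xrc, s):
    """Return the top 32 bits of (xrb:xrc) shifted left by s bytes."""
    if not 0 <= s <= 4:
        raise ValueError("s must be in the range 0..4")
    wide = ((xrb & 0xFFFFFFFF) << 32) | (xrc & 0xFFFFFFFF)
    return ((wide << (8 * s)) >> 32) & 0xFFFFFFFF

for s in range(5):
    print(s, f"{s32aln(0x11223344, 0x55667788, s):#010x}")
# 0 0x11223344
# 1 0x22334455
# 2 0x33445566
# 3 0x44556677
# 4 0x55667788
```

With two consecutive aligned words loaded into xrb and xrc, the shift amount selects a 32-bit window at any byte offset between them, which is how an unaligned word can be assembled.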
S32SFL
S32SFL xra, xrb, xrc, xrd, ptn
Shuffles (swizzles) the bytes of xrb and xrc as indicated in the table below and writes the result into xra and xrd. Bytes are listed from most significant (b3, c3) to least significant (b0, c0).
Input | xrb | | | | xrc | | | |
---|---|---|---|---|---|---|---|---|
| b3 | b2 | b1 | b0 | c3 | c2 | c1 | c0 |
Output | xra | | | | xrd | | | |
ptn=0 | b3 | c3 | b2 | c2 | b1 | c1 | b0 | c0 |
ptn=1 | b3 | b1 | c3 | c1 | b2 | b0 | c2 | c0 |
ptn=2 | b3 | c3 | b1 | c1 | b2 | c2 | b0 | c0 |
ptn=3 | b3 | b2 | c3 | c2 | b1 | b0 | c1 | c0 |
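The shuffle table can be transcribed directly into a lookup, as in the sketch below. The names `PATTERNS` and `s32sfl` are hypothetical; each pattern string lists the four bytes of xra followed by the four bytes of xrd, most significant byte first.

```python
# Output byte order per pattern: first four entries form xra, last four xrd.
PATTERNS = {
    0: "b3 c3 b2 c2 b1 c1 b0 c0",
    1: "b3 b1 c3 c1 b2 b0 c2 c0",
    2: "b3 c3 b1 c1 b2 c2 b0 c0",
    3: "b3 b2 c3 c2 b1 b0 c1 c0",
}

def s32sfl(xrb, xrc, ptn):
    """Return (xra, xrd) for the given shuffle pattern."""
    src = {"b": xrb, "c": xrc}
    # "b3" means byte 3 (most significant) of xrb, and so on.
    picked = [(src[t[0]] >> (8 * int(t[1]))) & 0xFF
              for t in PATTERNS[ptn].split()]
    xra = int.from_bytes(bytes(picked[:4]), "big")
    xrd = int.from_bytes(bytes(picked[4:]), "big")
    return xra, xrd

xra, xrd = s32sfl(0xB3B2B1B0, 0xC3C2C1C0, 0)
print(f"{xra:#010x} {xrd:#010x}")  # 0xb3c3b2c2 0xb1c1b0c0
```

Pattern 0 interleaves the bytes of the two inputs, while pattern 3 interleaves their 16-bit halves.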
New instructions in JZ4770
The JZ4770 has quite a few additional MXU instructions. Ingenic writes 3 or 7 to register xr16 to activate these. This may imply that there are two levels of extension between JZ4740 and JZ4770.
Load and store instructions
- LXB rb, rc, strd2
- LXBU rb, rc, strd2
- LXH rb, rc, strd2
- LXHU rb, rc, strd2
- LXW rb, rc, strd2
- S16LDD xra, rb, s10, optn2
- S16LDI xra, rb, s10, optn2
- S16SDI xra, rb, s10, optn2
- S16STD xra, rb, s10, optn2
- S32LDDR xra, rb, s12
- S32LDDVR xra, rb, rc, strd2
- S32LDIR xra, rb, s12
- S32LDIVR xra, rb, rc, strd2
- S32SDIR xra, rb, s12
- S32SDIVR xra, rb, rc, strd2
- S32STDR xra, rb, s12
- S32STDVR xra, rb, rc, strd2
- S8LDD xra, rb, s8, optn3
- S8LDI xra, rb, s8, optn3
- S8SDI xra, rb, s8, optn3
- S8STD xra, rb, s8, optn3
Other math
- D16MOVN xra, xrb, xrc
- D16MOVZ xra, xrb, xrc
- D16SLT xra, xrb, xrc
- Q16SCOP xra, xrb, xrc, xrd
- Q8MOVN xra, xrb, xrc
- Q8MOVZ xra, xrb, xrc
- Q8SLTU xra, xrb, xrc
- S32ABS xra, xrb
- S32ALNI xra, xrb, xrc, optn3
- S32EXTRV xra, xrd, rs, rt
- S32EXTR xra, xrd, rs, bits5
- S32LUI xra, s8, optn2
- S32MOVN xra, xrb, xrc
- S32MOVZ xra, xrb, xrc
- S32SLT xra, xrb, xrc
Addition and subtraction instructions
- D16ASUM xra, xrb, xrc, xrd
- D32ACCM xra, xrb, xrc, xrd
- D32ADDC xra, xrb, xrc
- D32ASUM xra, xrb, xrc, xrd
- Q16ACCM xra, xrb, xrc, xrd
- Q16ASUM xra, xrb, xrc, xrd
- S32MSUBU xra, xrd, rs, rt
- S32MSUB xra, xrd, rs, rt
Multiply instructions
- D16MACE xra, xrb, xrc, xrd
- D16MULE xra, xrb, xrc, xrd
- D8SUMC xra, xrb, xrc
- D8SUM xra, xrb, xrc
- Q8MULSU xra, xrb, xrc, xrd
- Q8MACSU xra, xrb, xrc, xrd
- S32MADDU xra, xrd, rs, rt
- S32MADD xra, xrd, rs, rt
- S32MULU xra, xrd, rs, rt
- S32MUL xra, xrd, rs, rt
Bitwise instructions
- S32AND xra, xrb, xrc
- S32NOR xra, xrb, xrc
- S32OR xra, xrb, xrc
- S32XOR xra, xrb, xrc