MXU is the name of the SIMD instruction set extension of Ingenic's XBurst CPUs. SIMD stands for Single Instruction, Multiple Data; such instructions are often used to speed up audio and video processing. Examples of SIMD instruction sets for other CPUs are MMX, SSE and AltiVec.
Instruction Naming
The initial letter indicates the number of elements in the vector(s) operated upon: S(ingle) for 1, D(ual) for 2, Q(uad) for 4. The letter is followed by a number, which denotes the length of the input elements in bits. The number is followed by the name of the operation that will be performed.
Register Naming
There is a dedicated register set for the MXU operations. It contains 17 32-bit registers, which will be referred to as xr0..xr16. Registers xr0..xr15 are used in computations; xr16 is a control register. MXU register xr0 always has the value 0; writes to it have no effect.
The main MIPS registers will be referred to as r0..r31.
Enabling MXU
Before the MXU can be used, it must be enabled. This is done by setting bit 0 (the lowest bit) of xr16 to 1.
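The following minimal Python model illustrates these register rules. It is a hypothetical sketch, not Ingenic code: the class `MXURegisters` and its methods are invented for this example. Values are truncated to 32 bits, writes to xr0 are dropped, and only bit 0 of xr16 is interpreted.

```python
# Hypothetical toy model of the MXU register file (not an Ingenic API).
class MXURegisters:
    def __init__(self):
        self.xr = [0] * 17  # xr0..xr15 plus the control register xr16

    def write(self, n, value):
        if n == 0:
            return  # xr0 always reads as 0; writes have no effect
        self.xr[n] = value & 0xFFFFFFFF  # registers are 32 bits wide

    def read(self, n):
        return self.xr[n]

    def enabled(self):
        # The MXU is enabled when bit 0 of xr16 is set.
        return bool(self.xr[16] & 1)


regs = MXURegisters()
regs.write(0, 1234)  # ignored
regs.write(16, 1)    # enable the MXU
print(regs.read(0), regs.enabled())  # 0 True
```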
Load and Store Instructions
S32I2M
S32I2M xr, r
Assigns the value of main register r to MXU register xr.
S32M2I
S32M2I xr, r
Assigns the value of MXU register xr to main register r.
S32LDD
S32LDD xr, p, o
Loads the 32-bit word stored at address p + o (pointer + offset) into MXU register xr.
S32LDDV
S32LDDV xr, p, o, s
Loads the 32-bit word stored at address p + o * 2^s (pointer + shifted offset) into MXU register xr.
S32LDI
S32LDI xr, p, o
Loads the 32-bit word stored at address p + o (pointer + offset) into MXU register xr. After that, p is incremented by o.
S32LDIV
S32LDIV xr, p, o, s
Loads the 32-bit word stored at address p + o * 2^s (pointer + shifted offset) into MXU register xr. After that, p is incremented by o * 2^s.
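The four load addressing modes are easiest to compare side by side in a small Python reference model. This sketch rests on assumptions that this page does not state: memory is a byte buffer indexed by byte address, words are read little-endian, and the function names are hypothetical. The incrementing variants return the updated pointer together with the loaded value.

```python
import struct

# Hypothetical reference model of the S32LDD/S32LDDV/S32LDI/S32LDIV addressing.


def load32(mem, addr):
    """Read a little-endian 32-bit word from a bytes-like memory image."""
    return struct.unpack_from("<I", mem, addr)[0]


def s32ldd(mem, p, o):
    # Load from p + o; p is left unchanged.
    return load32(mem, p + o)


def s32lddv(mem, p, o, s):
    # Load from p + o * 2^s; p is left unchanged.
    return load32(mem, p + (o << s))


def s32ldi(mem, p, o):
    # Load from p + o, then advance p by o. Returns (value, new_p).
    p += o
    return load32(mem, p), p


def s32ldiv(mem, p, o, s):
    # Load from p + o * 2^s, then advance p by o * 2^s. Returns (value, new_p).
    p += o << s
    return load32(mem, p), p


# An array of words starts at address 4; p starts one word before it.
mem = bytes(range(32))
p = 0
for _ in range(3):
    xr, p = s32ldi(mem, p, 4)
    print(hex(xr), p)
```

S32LDI uses address p + o and then advances p by o. A loop over consecutive words therefore starts with p pointing one word before the first element, as in the usage lines.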
S32STD
S32STD xr, p, o
Stores the contents of MXU register xr into the 32-bit word at address p + o (pointer + offset).
S32STDV
S32STDV xr, p, o, s
Stores the contents of MXU register xr into the 32-bit word at address p + o * 2^s (pointer + shifted offset).
S32SDI
S32SDI xr, p, o
Stores the contents of MXU register xr into the 32-bit word at address p + o (pointer + offset). After that, p is incremented by o.
S32SDIV
S32SDIV xr, p, o, s
Stores the contents of MXU register xr into the 32-bit word at address p + o * 2^s (pointer + shifted offset). After that, p is incremented by o * 2^s.
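The store variants mirror the loads. The following Python sketch assumes a byte-addressed buffer and little-endian words, and it uses hypothetical function names. The incrementing forms return the updated pointer.

```python
import struct

# Hypothetical reference model of the S32STD/S32STDV/S32SDI/S32SDIV addressing.


def store32(mem, addr, value):
    """Write a 32-bit value as a little-endian word into a bytearray."""
    struct.pack_into("<I", mem, addr, value & 0xFFFFFFFF)


def s32std(mem, xr, p, o):
    # Store to p + o; p is left unchanged.
    store32(mem, p + o, xr)


def s32stdv(mem, xr, p, o, s):
    # Store to p + o * 2^s; p is left unchanged.
    store32(mem, p + (o << s), xr)


def s32sdi(mem, xr, p, o):
    # Store to p + o, then advance p by o. Returns the new p.
    p += o
    store32(mem, p, xr)
    return p


def s32sdiv(mem, xr, p, o, s):
    # Store to p + o * 2^s, then advance p by o * 2^s. Returns the new p.
    p += o << s
    store32(mem, p, xr)
    return p


# Fill four consecutive words starting at address 4 (p starts one word before).
buf = bytearray(20)
p = 0
for value in (0x11111111, 0x22222222, 0x33333333, 0x44444444):
    p = s32sdi(buf, value, p, 4)
print(buf.hex())
```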
Addition and Subtraction Instructions
D32ADD, Q16ADD
D32ADD xra, xrb, xrc, xrd, addsub
Q16ADD xra, xrb, xrc, xrd, addsub, swizzle
Performs addition and/or subtraction on vectors xrb and xrc and writes the results to vectors xra and xrd.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
addsub = AA:  xra := xrb + xrc;  xrd := xrb + xrc
addsub = AS:  xra := xrb + xrc;  xrd := xrb - xrc
addsub = SA:  xra := xrb - xrc;  xrd := xrb + xrc
addsub = SS:  xra := xrb - xrc;  xrd := xrb - xrc
When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows (h and l denote the upper and lower 16-bit halves, listed from high to low):
swizzle = WW:  xrb.hl  (as-is)
swizzle = XW:  xrb.lh  (exchanged)
swizzle = HW:  xrb.hh  (clone high)
swizzle = LW:  xrb.ll  (clone low)
The values read from vector xrc are always used as-is.
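Below is a minimal Python reference model of D32ADD and Q16ADD. The function names are hypothetical, and addsub and swizzle are passed as the strings used in the tables above. Wrap-around (non-saturating) arithmetic is assumed, since this page does not mention saturation.

```python
# Hypothetical reference model of D32ADD / Q16ADD (wrap-around assumed).


def lanewise(x, y, op, bits):
    """Add ('A') or subtract ('S') each `bits`-wide lane of two 32-bit words."""
    mask = (1 << bits) - 1
    out = 0
    for shift in range(0, 32, bits):
        a = (x >> shift) & mask
        b = (y >> shift) & mask
        r = a + b if op == "A" else a - b
        out |= (r & mask) << shift  # each lane wraps on its own
    return out


def swizzle16(x, mode):
    """Rearrange the 16-bit halves of x according to WW, XW, HW or LW."""
    h, l = (x >> 16) & 0xFFFF, x & 0xFFFF
    h, l = {"WW": (h, l), "XW": (l, h), "HW": (h, h), "LW": (l, l)}[mode]
    return (h << 16) | l


def d32add(xrb, xrc, addsub):
    # addsub[0] selects the operation for xra, addsub[1] the one for xrd.
    return lanewise(xrb, xrc, addsub[0], 32), lanewise(xrb, xrc, addsub[1], 32)


def q16add(xrb, xrc, addsub, swizzle="WW"):
    b = swizzle16(xrb, swizzle)  # the swizzle applies to xrb only
    return lanewise(b, xrc, addsub[0], 16), lanewise(b, xrc, addsub[1], 16)


# Sum and difference of the same pair of 16-bit vectors in one step.
xra, xrd = q16add(0x00050003, 0x00010002, "AS")
print(hex(xra), hex(xrd))
```

A single instruction that yields both the sum and the difference of the same inputs performs the butterfly step found in transforms such as the DCT and the Hadamard transform.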
D32ACC, Q16ACC
D32ACC xra, xrb, xrc, xrd, addsub
Q16ACC xra, xrb, xrc, xrd, addsub, swizzle
Performs addition and/or subtraction on vectors xrb and xrc and adds the results to vectors xra and xrd.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
addsub = AA:  xra += xrb + xrc;  xrd += xrb + xrc
addsub = AS:  xra += xrb + xrc;  xrd += xrb - xrc
addsub = SA:  xra += xrb - xrc;  xrd += xrb + xrc
addsub = SS:  xra += xrb - xrc;  xrd += xrb - xrc
When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows:
swizzle = WW:  xrb.hl  (as-is)
swizzle = XW:  xrb.lh  (exchanged)
swizzle = HW:  xrb.hh  (clone high)
swizzle = LW:  xrb.ll  (clone low)
The values read from vector xrc are always used as-is.
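The accumulating forms can be modeled in the same way. This Python sketch uses hypothetical names and assumes wrap-around arithmetic within each lane. To keep it short, it leaves out the swizzle, which is equivalent to WW.

```python
# Hypothetical reference model of D32ACC / Q16ACC (wrap-around assumed, swizzle = WW).


def lanewise(x, y, op, bits):
    """Add ('A') or subtract ('S') each `bits`-wide lane of two 32-bit words."""
    mask = (1 << bits) - 1
    out = 0
    for shift in range(0, 32, bits):
        a, b = (x >> shift) & mask, (y >> shift) & mask
        out |= ((a + b if op == "A" else a - b) & mask) << shift
    return out


def d32acc(xra, xrb, xrc, xrd, addsub):
    xra = lanewise(xra, lanewise(xrb, xrc, addsub[0], 32), "A", 32)
    xrd = lanewise(xrd, lanewise(xrb, xrc, addsub[1], 32), "A", 32)
    return xra, xrd


def q16acc(xra, xrb, xrc, xrd, addsub):
    xra = lanewise(xra, lanewise(xrb, xrc, addsub[0], 16), "A", 16)
    xrd = lanewise(xrd, lanewise(xrb, xrc, addsub[1], 16), "A", 16)
    return xra, xrd


# The carry out of the lower 16-bit lane does not propagate into the upper lane.
print([hex(v) for v in q16acc(0x0000FFFF, 0x00000001, 0, 0, "AA")])
```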
Q8ADD
Q8ADD xra, xrb, xrc, addsub
Adds or subtracts the four 8-bit values in the vectors xrb and xrc. The four 8-bit results are stored in the vector xra.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
addsub = AA:  xra.h := xrb.h + xrc.h;  xra.l := xrb.l + xrc.l
addsub = AS:  xra.h := xrb.h + xrc.h;  xra.l := xrb.l - xrc.l
addsub = SA:  xra.h := xrb.h - xrc.h;  xra.l := xrb.l + xrc.l
addsub = SS:  xra.h := xrb.h - xrc.h;  xra.l := xrb.l - xrc.l
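The following Python sketch models Q8ADD. It assumes two things that this page does not state: for this 8-bit instruction, .h and .l in the table stand for the upper two and the lower two bytes of a register, and each byte wraps around on overflow. The function name is hypothetical.

```python
# Hypothetical reference model of Q8ADD.
# Assumption: addsub[0] applies to the two upper bytes, addsub[1] to the two lower bytes.


def q8add(xrb, xrc, addsub):
    out = 0
    for lane in range(4):  # lane 0 is the least significant byte
        op = addsub[0] if lane >= 2 else addsub[1]
        a = (xrb >> (8 * lane)) & 0xFF
        b = (xrc >> (8 * lane)) & 0xFF
        r = a + b if op == "A" else a - b
        out |= (r & 0xFF) << (8 * lane)  # keep 8 bits: wraps, no saturation
    return out


print(hex(q8add(0x01020304, 0x01010101, "AS")))
```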
Q8ADDE
Q8ADDE xra, xrb, xrc, xrd, addsub
Adds or subtracts the four unsigned 8-bit values in the vectors xrb and xrc. The four 16-bit results are stored in the vectors xra and xrd.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
addsub = AA:  xra := xrb.h + xrc.h;  xrd := xrb.l + xrc.l
addsub = AS:  xra := xrb.h + xrc.h;  xrd := xrb.l - xrc.l
addsub = SA:  xra := xrb.h - xrc.h;  xrd := xrb.l + xrc.l
addsub = SS:  xra := xrb.h - xrc.h;  xrd := xrb.l - xrc.l
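The following Python sketch models Q8ADDE. It reads xrb.h as the two upper bytes of xrb, which go to the two halves of xra, and xrb.l as the two lower bytes, which go to xrd. That reading and the function name are assumptions. The results are 16 bits wide, so a sum of two bytes never overflows, and a negative difference appears as a 16-bit two's-complement value.

```python
# Hypothetical reference model of Q8ADDE (unsigned bytes widened to 16 bits).


def q8adde(xrb, xrc, addsub):
    def byte(x, lane):
        return (x >> (8 * lane)) & 0xFF  # unsigned, zero-extended

    def pair(op, hi_lane, lo_lane):
        out = 0
        for lane in (hi_lane, lo_lane):
            a, b = byte(xrb, lane), byte(xrc, lane)
            r = a + b if op == "A" else a - b
            out = (out << 16) | (r & 0xFFFF)  # negative differences wrap to 16 bits
        return out

    return pair(addsub[0], 3, 2), pair(addsub[1], 1, 0)  # (xra, xrd)


print([hex(v) for v in q8adde(0xFF800201, 0x01800102, "AS")])
```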
Q8ACCE
Q8ACCE xra, xrb, xrc, xrd, addsub
Adds or subtracts the four unsigned 8-bit values in the vectors xrb and xrc. The four 16-bit results are added to the vectors xra and xrd.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
addsub = AA:  xra += xrb.h + xrc.h;  xrd += xrb.l + xrc.l
addsub = AS:  xra += xrb.h + xrc.h;  xrd += xrb.l - xrc.l
addsub = SA:  xra += xrb.h - xrc.h;  xrd += xrb.l + xrc.l
addsub = SS:  xra += xrb.h - xrc.h;  xrd += xrb.l - xrc.l
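Q8ACCE is the accumulating form. A typical use is summing columns of 8-bit pixels into 16-bit totals. This Python sketch assumes the same byte-to-half mapping (upper bytes to xra, lower bytes to xrd), wrap-around within each 16-bit half, and a hypothetical function name.

```python
# Hypothetical reference model of Q8ACCE.


def q8acce(xra, xrb, xrc, xrd, addsub):
    def byte(x, lane):
        return (x >> (8 * lane)) & 0xFF

    def accumulate(reg, op, hi_lane, lo_lane):
        out = 0
        for half, lane in ((1, hi_lane), (0, lo_lane)):
            a, b = byte(xrb, lane), byte(xrc, lane)
            term = a + b if op == "A" else a - b
            current = (reg >> (16 * half)) & 0xFFFF
            out |= ((current + term) & 0xFFFF) << (16 * half)
        return out

    return accumulate(xra, addsub[0], 3, 2), accumulate(xrd, addsub[1], 1, 0)


# Column sums over four rows of four 8-bit pixels, two rows per call.
rows = [0xFF80FF80, 0xFF80FF80, 0x01010101, 0x01010101]
xra = xrd = 0
for i in range(0, len(rows), 2):
    xra, xrd = q8acce(xra, rows[i], rows[i + 1], xrd, "AA")
print(hex(xra), hex(xrd))
```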
D16AVG, Q8AVG
D16AVG xra, xrb, xrc
Q8AVG xra, xrb, xrc
Computes the average, rounded down, of the unsigned values in vectors xrb and xrc and assigns the results to vector xra.
D16AVGR, Q8AVGR
D16AVGR xra, xrb, xrc
Q8AVGR xra, xrb, xrc
Computes the average, rounded up, of the unsigned values in vectors xrb and xrc and assigns the results to vector xra.
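One Python helper can express both rounding modes. In this sketch (hypothetical names), each lane is averaged at full precision, so the intermediate sum cannot overflow. Rounding up is done by adding 1 before halving.

```python
# Hypothetical reference model of D16AVG/D16AVGR and Q8AVG/Q8AVGR.


def lane_average(x, y, bits, round_up):
    """Average each unsigned `bits`-wide lane of two 32-bit words."""
    mask = (1 << bits) - 1
    out = 0
    for shift in range(0, 32, bits):
        a, b = (x >> shift) & mask, (y >> shift) & mask
        avg = (a + b + (1 if round_up else 0)) >> 1  # full-precision sum, then halve
        out |= avg << shift
    return out


def q8avg(xrb, xrc):
    return lane_average(xrb, xrc, 8, round_up=False)


def q8avgr(xrb, xrc):
    return lane_average(xrb, xrc, 8, round_up=True)


def d16avg(xrb, xrc):
    return lane_average(xrb, xrc, 16, round_up=False)


def d16avgr(xrb, xrc):
    return lane_average(xrb, xrc, 16, round_up=True)


print(hex(q8avg(0x01FF0300, 0x02FF0401)), hex(q8avgr(0x01FF0300, 0x02FF0401)))
```

Rounded-up averages of the form (a + b + 1) >> 1 are the ones used for half-pixel interpolation in MPEG-style motion compensation.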
Q8SAD
Q8SAD xra, xrb, xrc, xrd
Computes the absolute differences between the four unsigned 8-bit values in vector xrb and the corresponding values in vector xrc. The sum of these 4 differences is assigned to the full register xra and added to the full register xrd.
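Q8SAD is the core operation of block matching in motion estimation. The Python sketch below (hypothetical function name) returns the new xra and xrd. The usage lines accumulate the sum of absolute differences of a 4x4 block one row at a time.

```python
# Hypothetical reference model of Q8SAD.


def q8sad(xrb, xrc, xrd):
    total = sum(
        abs(((xrb >> shift) & 0xFF) - ((xrc >> shift) & 0xFF))
        for shift in (0, 8, 16, 24)
    )
    return total, (xrd + total) & 0xFFFFFFFF  # (xra, xrd)


# SAD between a current 4x4 block and a candidate reference block.
current = [0x10203040, 0x10203040, 0x00000000, 0xFFFFFFFF]
reference = [0x20104030, 0x10203040, 0x01010101, 0xFFFFFFFF]
acc = 0
for a, b in zip(current, reference):
    _, acc = q8sad(a, b, acc)
print(acc)
```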
Multiply Instructions
D16MUL, Q8MUL
D16MUL xra, xrb, xrc, xrd, swizzle
Q8MUL xra, xrb, xrc, xrd
Multiplies the signed values in vector xrb by the signed values in vector xrc and assigns the results to vectors xra and xrd.
When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows:
swizzle = WW:  xrb.hl  (as-is)
swizzle = XW:  xrb.lh  (exchanged)
swizzle = HW:  xrb.hh  (clone high)
swizzle = LW:  xrb.ll  (clone low)
The values read from vector xrc are always used as-is.
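Below is a Python reference model of D16MUL. In line with the D16MAC table on this page, it assumes that xra receives the full 32-bit product of the high halves and xrd the product of the low halves. The helper names are hypothetical.

```python
# Hypothetical reference model of D16MUL.


def s16(x):
    """Interpret the low 16 bits of x as a signed integer."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def d16mul(xrb, xrc, swizzle="WW"):
    h, l = s16(xrb >> 16), s16(xrb)
    bh, bl = {"WW": (h, l), "XW": (l, h), "HW": (h, h), "LW": (l, l)}[swizzle]
    ch, cl = s16(xrc >> 16), s16(xrc)
    # Each 16x16 signed product is kept in full as a 32-bit two's-complement value.
    return (bh * ch) & 0xFFFFFFFF, (bl * cl) & 0xFFFFFFFF  # (xra, xrd)


print([hex(v) for v in d16mul(0xFFFF0003, 0x00020004)])
```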
D16MAC, Q8MAC
D16MAC xra, xrb, xrc, xrd, addsub, swizzle
Q8MAC xra, xrb, xrc, xrd, addsub
Multiplies the signed values in vector xrb by the signed values in vector xrc and adds the results to, or subtracts them from, vectors xra and xrd.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
addsub = AA:  xra += xrb.h * xrc.h;  xrd += xrb.l * xrc.l
addsub = AS:  xra += xrb.h * xrc.h;  xrd -= xrb.l * xrc.l
addsub = SA:  xra -= xrb.h * xrc.h;  xrd += xrb.l * xrc.l
addsub = SS:  xra -= xrb.h * xrc.h;  xrd -= xrb.l * xrc.l
When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows:
swizzle = WW:  xrb.hl  (as-is)
swizzle = XW:  xrb.lh  (exchanged)
swizzle = HW:  xrb.hh  (clone high)
swizzle = LW:  xrb.ll  (clone low)
The values read from vector xrc are always used as-is.
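The Python sketch below models D16MAC with hypothetical names and assumes 32-bit wrap-around. It then uses the model for a dot product, in which the upper and lower halves accumulate into separate registers.

```python
# Hypothetical reference model of D16MAC (32-bit wrap-around assumed).


def s16(x):
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def d16mac(xra, xrb, xrc, xrd, addsub, swizzle="WW"):
    h, l = s16(xrb >> 16), s16(xrb)
    bh, bl = {"WW": (h, l), "XW": (l, h), "HW": (h, h), "LW": (l, l)}[swizzle]
    ph = bh * s16(xrc >> 16)
    pl = bl * s16(xrc)
    xra = xra + ph if addsub[0] == "A" else xra - ph
    xrd = xrd + pl if addsub[1] == "A" else xrd - pl
    return xra & 0xFFFFFFFF, xrd & 0xFFFFFFFF


# Dot product of (1, 2, 3, 4) and (5, 6, 7, 8), packed two 16-bit values per word.
a = [0x00010002, 0x00030004]
b = [0x00050006, 0x00070008]
xra = xrd = 0
for x, y in zip(a, b):
    xra, xrd = d16mac(xra, x, y, xrd, "AA")
print(xra + xrd)
```

The two partial sums still have to be combined at the end. The sketch does this with a plain Python addition.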
D16MADL, Q8MADL
D16MADL xra, xrb, xrc, xrd, addsub, swizzle
Q8MADL xra, xrb, xrc, xrd, addsub
Multiplies the signed values in vector xrb by the signed values in vector xrc. The products are added to or subtracted from the values in vector xra, and that final result is written to vector xrd.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
addsub = AA:  xrd.h := xra.h + xrb.h * xrc.h;  xrd.l := xra.l + xrb.l * xrc.l
addsub = AS:  xrd.h := xra.h + xrb.h * xrc.h;  xrd.l := xra.l - xrb.l * xrc.l
addsub = SA:  xrd.h := xra.h - xrb.h * xrc.h;  xrd.l := xra.l + xrb.l * xrc.l
addsub = SS:  xrd.h := xra.h - xrb.h * xrc.h;  xrd.l := xra.l - xrb.l * xrc.l
When the vector elements are 16-bit, it is possible to swizzle the values read from vector xrb as follows:
swizzle = WW:  xrb.hl  (as-is)
swizzle = XW:  xrb.lh  (exchanged)
swizzle = HW:  xrb.hh  (clone high)
swizzle = LW:  xrb.ll  (clone low)
The values read from vector xrc are always used as-is.
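Below is a Python model of D16MADL. It reduces each result to 16 bits by wrapping modulo 2^16. This page does not say whether the hardware saturates instead, so treat the wrapping as an assumption, along with the hypothetical function name.

```python
# Hypothetical reference model of D16MADL (16-bit wrap-around assumed).


def s16(x):
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def d16madl(xra, xrb, xrc, addsub, swizzle="WW"):
    h, l = s16(xrb >> 16), s16(xrb)
    bh, bl = {"WW": (h, l), "XW": (l, h), "HW": (h, h), "LW": (l, l)}[swizzle]
    ph = bh * s16(xrc >> 16)
    pl = bl * s16(xrc)
    rh = s16(xra >> 16) + (ph if addsub[0] == "A" else -ph)
    rl = s16(xra) + (pl if addsub[1] == "A" else -pl)
    # Only 16 bits fit in each half of xrd, so the results wrap modulo 2^16.
    return ((rh & 0xFFFF) << 16) | (rl & 0xFFFF)  # new xrd


print(hex(d16madl(0x00010002, 0x00030004, 0x00050006, "AS")))
```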
D16MULF
D16MULF xra, xrb, xrc, swizzle
Multiplies the signed values in vector xrb by the signed values in vector xrc. The 16 most significant bits of each product are written to vector xra. Note that the product of two 16-bit signed numbers fits in a 31-bit signed number (bit 30 being the sign bit) in every case except -32768 * -32768 = 2^30, so vector xra will contain bits 30..15 of the two multiplication results, not bits 31..16. In fixed-point terms, this is a Q15 multiplication: two Q15 inputs give a Q15 result.
It is possible to swizzle the values read from vector xrb as follows:
swizzle = WW:  xrb.hl  (as-is)
swizzle = XW:  xrb.lh  (exchanged)
swizzle = HW:  xrb.hh  (clone high)
swizzle = LW:  xrb.ll  (clone low)
The values read from vector xrc are always used as-is.
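The following Python sketch takes the description above literally. It keeps bits 30..15 of each product by truncation, without rounding or saturation, which may differ from the hardware. The function names are hypothetical. In Q15 notation, 0x4000 is 0.5 and 0xC000 is -0.5.

```python
# Hypothetical reference model of D16MULF (truncating, no rounding or saturation).


def s16(x):
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def d16mulf(xrb, xrc, swizzle="WW"):
    h, l = s16(xrb >> 16), s16(xrb)
    bh, bl = {"WW": (h, l), "XW": (l, h), "HW": (h, h), "LW": (l, l)}[swizzle]
    ph = bh * s16(xrc >> 16)
    pl = bl * s16(xrc)
    # >> 15 discards bits 14..0 (arithmetic shift); the mask keeps bits 30..15.
    return (((ph >> 15) & 0xFFFF) << 16) | ((pl >> 15) & 0xFFFF)


result = d16mulf(0x40004000, 0x4000C000)
print(hex(result))  # 0x2000e000: (0.5 * 0.5, 0.5 * -0.5) = (0.25, -0.25)
```

Under this model, the exceptional case -32768 * -32768 (-1.0 * -1.0) produces 0x8000, which reads back as -1.0.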
D16MACF
D16MACF xra, xrb, xrc, xrd, addsub, swizzle
Multiplies the signed values in vector xrb by the signed values in vector xrc. These products are doubled to make two 32-bit signed numbers, which are then added to or subtracted from xra and xrd, respectively. The upper 16 bits of the two results, rounded up, are written to vector xra.
Whether the values are added or subtracted is controlled by addsub, as shown in the following table:
addsub = AA:  xra.h := ceil((xra + xrb.h * xrc.h * 2) / 2^16);  xra.l := ceil((xrd + xrb.l * xrc.l * 2) / 2^16)
addsub = AS:  xra.h := ceil((xra + xrb.h * xrc.h * 2) / 2^16);  xra.l := ceil((xrd - xrb.l * xrc.l * 2) / 2^16)
addsub = SA:  xra.h := ceil((xra - xrb.h * xrc.h * 2) / 2^16);  xra.l := ceil((xrd + xrb.l * xrc.l * 2) / 2^16)
addsub = SS:  xra.h := ceil((xra - xrb.h * xrc.h * 2) / 2^16);  xra.l := ceil((xrd - xrb.l * xrc.l * 2) / 2^16)
It is possible to swizzle the values read from vector xrb as follows:
swizzle = WW:  xrb.hl  (as-is)
swizzle = XW:  xrb.lh  (exchanged)
swizzle = HW:  xrb.hh  (clone high)
swizzle = LW:  xrb.ll  (clone low)
The values read from vector xrc are always used as-is.
S16MAD
S16MAD xra, xrb, xrc, xrd, addsub, select
Multiplies a signed 16-bit value from vector xrb by a signed 16-bit value from vector xrc. The product is added to or subtracted from xra, and the final result is written to xrd.
Whether the product is added or subtracted is controlled by addsub, as shown in the following table:
addsub = A:  xrd := xra + x * y
addsub = S:  xrd := xra - x * y
Which halves of xrb and xrc are used is controlled by select, as shown in the following table:
select = HH:  x := xrb.h;  y := xrc.h
select = HL:  x := xrb.h;  y := xrc.l
select = LH:  x := xrb.l;  y := xrc.h
select = LL:  x := xrb.l;  y := xrc.l
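Below is a Python model of S16MAD with hypothetical names and assumed 32-bit wrap-around. The select operand is convenient for complex arithmetic. With the real part in the upper half and the imaginary part in the lower half, two chained calls produce each component of a complex product.

```python
# Hypothetical reference model of S16MAD (32-bit wrap-around assumed).


def s16(x):
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def s16mad(xra, xrb, xrc, addsub, select):
    x = s16(xrb >> 16) if select[0] == "H" else s16(xrb)
    y = s16(xrc >> 16) if select[1] == "H" else s16(xrc)
    result = xra + x * y if addsub == "A" else xra - x * y
    return result & 0xFFFFFFFF  # new xrd


# (3 + 4i) * (5 + 6i): real part in the upper half, imaginary part in the lower half.
a, b = 0x00030004, 0x00050006
re = s16mad(s16mad(0, a, b, "A", "HH"), a, b, "S", "LL")  # 3*5 - 4*6
im = s16mad(s16mad(0, a, b, "A", "HL"), a, b, "A", "LH")  # 3*6 + 4*5
print(hex(re), im)
```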
Other Math
S32MAX, D16MAX, Q8MAX
S32MAX xra, xrb, xrc
D16MAX xra, xrb, xrc
Q8MAX xra, xrb, xrc
Takes the element-wise maximum of the signed values in vectors xrb and xrc and assigns the results to vector xra.
S32MIN, D16MIN, Q8MIN
S32MIN xra, xrb, xrc
D16MIN xra, xrb, xrc
Q8MIN xra, xrb, xrc
Takes the element-wise minimum of the signed values in vectors xrb and xrc and assigns the results to vector xra.
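MAX and MIN treat the lanes as signed, so in a Q8 operation 0xFF counts as -1. The Python sketch below uses hypothetical helper names and covers all three element sizes. The usage lines clamp two signed 16-bit values to the range [-100, 100].

```python
# Hypothetical reference model of S32/D16/Q8 MAX and MIN (bits = 32, 16 or 8).


def signed_lanes(x, bits):
    """Split a 32-bit word into signed `bits`-wide lanes, least significant first."""
    mask = (1 << bits) - 1
    lanes = []
    for shift in range(0, 32, bits):
        v = (x >> shift) & mask
        lanes.append(v - (1 << bits) if v >> (bits - 1) else v)
    return lanes


def pack_lanes(values, bits):
    mask = (1 << bits) - 1
    out = 0
    for i, v in enumerate(values):
        out |= (v & mask) << (i * bits)
    return out


def vmax(xrb, xrc, bits):
    pairs = zip(signed_lanes(xrb, bits), signed_lanes(xrc, bits))
    return pack_lanes([max(b, c) for b, c in pairs], bits)


def vmin(xrb, xrc, bits):
    pairs = zip(signed_lanes(xrb, bits), signed_lanes(xrc, bits))
    return pack_lanes([min(b, c) for b, c in pairs], bits)


lo = pack_lanes([-100, -100], 16)
hi = pack_lanes([100, 100], 16)
x = pack_lanes([300, -300], 16)
clamped = vmin(vmax(x, lo, 16), hi, 16)
print(signed_lanes(clamped, 16))
```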
Q16SAT
Q16SAT xra, xrb, xrc
Saturate: The values in xrb and xrc are taken as four 16-bit signed integers and clamped to the range [0..255]. The four 8-bit results are written to xra, from high to low: upper half of xrb, lower half of xrb, upper half of xrc, lower half of xrc.
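The following Python sketch models Q16SAT (hypothetical function name). An operation like this turns signed 16-bit intermediate pixel values back into packed 8-bit pixels.

```python
# Hypothetical reference model of Q16SAT.


def q16sat(xrb, xrc):
    def clamp(v):
        v &= 0xFFFF
        if v & 0x8000:
            v -= 0x10000  # reinterpret the half as signed
        return min(max(v, 0), 255)

    out = 0
    # From high to low: upper half of xrb, lower half of xrb, upper/lower half of xrc.
    for half in (xrb >> 16, xrb, xrc >> 16, xrc):
        out = (out << 8) | clamp(half)
    return out


print(hex(q16sat(0x0100FFFF, 0x0080007F)))
```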
S32CPS, D16CPS
S32CPS xra, xrb, xrc
D16CPS xra, xrb, xrc
Copy Sign: For each signed value in vector xrc: if it is non-negative, the corresponding value from vector xrb is assigned, unmodified, to vector xra. Otherwise, the corresponding value from vector xrb is negated and assigned to vector xra.
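Below is a Python model of S32CPS and D16CPS (hypothetical name). A lane of xrc equal to zero counts as non-negative. Negation is assumed to wrap in two's complement, so the most negative value maps to itself.

```python
# Hypothetical reference model of S32CPS (bits=32) and D16CPS (bits=16).


def copy_sign(xrb, xrc, bits):
    mask = (1 << bits) - 1
    out = 0
    for shift in range(0, 32, bits):
        b = (xrb >> shift) & mask
        negative = (xrc >> (shift + bits - 1)) & 1  # sign bit of this xrc lane
        r = -b if negative else b
        out |= (r & mask) << shift  # two's-complement wrap
    return out


print(hex(copy_sign(0x00050005, 0x80000001, 16)))
```

Passing the same register as both operands yields the absolute value of each lane. For example, `copy_sign(x, x, 16)` turns 0xFFFB0003 into 0x00050003.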
Q8ABD
Q8ABD xra, xrb, xrc
Absolute difference: Computes the absolute value of the difference of the unsigned values in vectors xrb and xrc and assigns the results to vector xra.
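A wrap-around subtraction of two unsigned bytes can overflow, but their absolute difference always fits in 8 bits. Here is a minimal Python sketch (hypothetical function name):

```python
# Hypothetical reference model of Q8ABD.


def q8abd(xrb, xrc):
    out = 0
    for shift in (0, 8, 16, 24):
        a = (xrb >> shift) & 0xFF
        b = (xrc >> shift) & 0xFF
        out |= abs(a - b) << shift  # 0..255, so no masking is needed
    return out


print(hex(q8abd(0x00FF1020, 0xFF002010)))
```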
Q8SLT
Q8SLT xra, xrb, xrc
Set on Less Than: Compares the signed values in vectors xrb and xrc. If the value from xrb is less than the value from xrc, 1 is assigned to the corresponding position in vector xra; otherwise 0 is assigned.
This is a vectorized version of the MIPS instruction SLT.
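The following Python sketch models Q8SLT (hypothetical function name). The comparison is signed, so 0xFF (-1) is less than 0x01.

```python
# Hypothetical reference model of Q8SLT.


def q8slt(xrb, xrc):
    def signed_byte(x, shift):
        v = (x >> shift) & 0xFF
        return v - 0x100 if v & 0x80 else v

    out = 0
    for shift in (0, 8, 16, 24):
        if signed_byte(xrb, shift) < signed_byte(xrc, shift):
            out |= 1 << shift  # 1 in the lane where xrb < xrc, 0 elsewhere
    return out


print(hex(q8slt(0xFF017F00, 0x01010000)))
```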
Shift and Shuffle Instructions
D32SLL
D32SLL xra, xrb, xrc, xrd, S
Shift Logical Left: The value of xrb is shifted S bits to the left and the result is assigned to xra. Also, the value of xrc is shifted S bits to the left and the result is assigned to xrd. S is a constant in the range [0..31].
D32SLLV
D32SLLV xra, xrb, rs
Shift Logical Left: The value of xra is shifted S bits to the left and the result is assigned to xra. Also, the value of xrb is shifted S bits to the left and the result is assigned to xrb. S is in the range [0..31]; it is the value of the lowest 5 bits of main MIPS register rs.
D32SLR
D32SLR xra, xrb, xrc, xrd, S
Shift Logical Right: The unsigned value of xrb is shifted S bits to the right and the result is assigned to xra. Also, the unsigned value of xrc is shifted S bits to the right and the result is assigned to xrd. S is a constant in the range [0..31].
D32SLRV
D32SLRV xra, xrb, rs
Shift Logical Right: The unsigned value of xra is shifted S bits to the right and the result is assigned to xra. Also, the unsigned value of xrb is shifted S bits to the right and the result is assigned to xrb. S is in the range [0..31]; it is the value of the lowest 5 bits of main MIPS register rs.
D32SAR
D32SAR xra, xrb, xrc, xrd, S
Shift Arithmetic Right: The signed value of xrb is shifted S bits to the right and the result is assigned to xra. Also, the signed value of xrc is shifted S bits to the right and the result is assigned to xrd. S is a constant in the range [0..31].
D32SARV
D32SARV xra, xrb, rs
Shift Arithmetic Right: The signed value of xra is shifted S bits to the right and the result is assigned to xra. Also, the signed value of xrb is shifted S bits to the right and the result is assigned to xrb. S is in the range [0..31]; it is the value of the lowest 5 bits of main MIPS register rs.
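The three kinds of 32-bit shift differ only in what is shifted in from the left. The Python sketch below uses hypothetical names and shows the variable form only for D32SARV. D32SLLV and D32SLRV would follow the same pattern with the other two shift helpers.

```python
# Hypothetical reference model of the D32 shift instructions.
MASK32 = 0xFFFFFFFF


def sll32(x, s):
    return (x << s) & MASK32  # bits shifted past bit 31 are lost


def slr32(x, s):
    return (x & MASK32) >> s  # zeros are shifted in from the left


def sar32(x, s):
    x &= MASK32
    if x & 0x80000000:
        x -= 1 << 32  # reinterpret as signed
    return (x >> s) & MASK32  # Python's >> on ints is arithmetic


def d32sll(xrb, xrc, s):  # -> (xra, xrd)
    return sll32(xrb, s), sll32(xrc, s)


def d32slr(xrb, xrc, s):
    return slr32(xrb, s), slr32(xrc, s)


def d32sar(xrb, xrc, s):
    return sar32(xrb, s), sar32(xrc, s)


def d32sarv(xra, xrb, rs):
    # Variable form: the amount is the low 5 bits of main register rs,
    # and xra/xrb are both source and destination.
    s = rs & 31
    return sar32(xra, s), sar32(xrb, s)


print([hex(v) for v in d32sar(0x80000000, 0x7FFFFFFF, 31)])
```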
D32SARL
D32SARL xra, xrb, xrc, S
Shift Arithmetic Right: The signed value of xrb is shifted S bits to the right and the lower 16 bits of the result are assigned to the higher 16 bits of xra. Also, the signed value of xrc is shifted S bits to the right and the lower 16 bits of the result are assigned to the lower 16 bits of xra. S is a constant in the range [0..31].
D32SARW
D32SARW xra, xrb, xrc, rs
Shift Arithmetic Right: The signed value of xrb is shifted S bits to the right and the lower 16 bits of the result are assigned to the higher 16 bits of xra. Also, the signed value of xrc is shifted S bits to the right and the lower 16 bits of the result are assigned to the lower 16 bits of xra. S is in the range [0..31]; it is the value of the lowest 5 bits of main MIPS register rs.
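D32SARL and D32SARW narrow two 32-bit values into one packed pair of 16-bit values. A typical use is after accumulating 16-bit products at 32-bit precision. Here is a Python sketch with hypothetical names:

```python
# Hypothetical reference model of D32SARL / D32SARW.


def sar32(x, s):
    x &= 0xFFFFFFFF
    if x & 0x80000000:
        x -= 1 << 32  # reinterpret as signed
    return x >> s  # arithmetic shift; the result stays a signed Python int


def d32sarl(xrb, xrc, s):
    # Keep the low 16 bits of each shifted value: xrb's goes high, xrc's goes low.
    return ((sar32(xrb, s) & 0xFFFF) << 16) | (sar32(xrc, s) & 0xFFFF)


def d32sarw(xrb, xrc, rs):
    # Same operation with the amount taken from the low 5 bits of rs.
    return d32sarl(xrb, xrc, rs & 31)


print(hex(d32sarl(0x10000000, 0xF0000000, 15)))
```

A right shift by 15 converts a product of two Q15 numbers back to Q15. For example, 0x10000000 (0.25 as a Q30 product) becomes 0x2000, and 0xF0000000 (-0.25) becomes 0xE000.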
Q16SLL
Q16SLL xra, xrb, xrc, xrd, S
Shift Logical Left: The values of the upper and lower halves of xrb are shifted S bits to the left and the result is assigned to xra. Also, the values of the upper and lower halves of xrc are shifted S bits to the left and the result is assigned to xrd. S is a constant in the range [0..15].
Q16SLLV
Q16SLLV xra, xrb, rs
Shift Logical Left: The values of the upper and lower halves of xra are shifted S bits to the left and the result is assigned to xra. Also, the values of the upper and lower halves of xrb are shifted S bits to the left and the result is assigned to xrb. S is in the range [0..15]; it is the value of the lowest 4 bits of main MIPS register rs.
Q16SLR
Q16SLR xra, xrb, xrc, xrd, S
Shift Logical Right: The unsigned values of the upper and lower halves of xrb are shifted S bits to the right and the result is assigned to xra. Also, the unsigned values of the upper and lower halves of xrc are shifted S bits to the right and the result is assigned to xrd. S is a constant in the range [0..15].
Q16SLRV
Q16SLRV xra, xrb, rs
Shift Logical Right: The unsigned values of the upper and lower halves of xra are shifted S bits to the right and the result is assigned to xra. Also, the unsigned values of the upper and lower halves of xrb are shifted S bits to the right and the result is assigned to xrb. S is in the range [0..15]; it is the value of the lowest 4 bits of main MIPS register rs.
Q16SAR
Q16SAR xra, xrb, xrc, xrd, S
Shift Arithmetic Right: The signed values of the upper and lower halves of xrb are shifted S bits to the right and the result is assigned to xra. Also, the signed values of the upper and lower halves of xrc are shifted S bits to the right and the result is assigned to xrd. S is a constant in the range [0..15].
Q16SARV
Q16SARV xra, xrb, rs
Shift Arithmetic Right: The signed values of the upper and lower halves of xra are shifted S bits to the right and the result is assigned to xra. Also, the signed values of the upper and lower halves of xrb are shifted S bits to the right and the result is assigned to xrb. S is in the range [0..15]; it is the value of the lowest 4 bits of main MIPS register rs.
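The Q16 shifts act on the two halves independently, so no bit crosses from one half into the other. The Python sketch below (hypothetical names) defines one helper for all three shift kinds and builds Q16SLL and Q16SARV on top of it.

```python
# Hypothetical reference model of the Q16 shift instructions.


def q16_shift(x, s, kind):
    """Shift both 16-bit halves of x independently; kind is 'sll', 'slr' or 'sar'."""
    out = 0
    for shift in (16, 0):
        v = (x >> shift) & 0xFFFF
        if kind == "sll":
            r = v << s
        elif kind == "slr":
            r = v >> s
        else:  # "sar": reinterpret the half as signed before shifting
            r = (v - 0x10000 if v & 0x8000 else v) >> s
        out |= (r & 0xFFFF) << shift  # nothing carries into the other half
    return out


def q16sll(xrb, xrc, s):  # -> (xra, xrd)
    return q16_shift(xrb, s, "sll"), q16_shift(xrc, s, "sll")


def q16sarv(xra, xrb, rs):
    # Variable form: amount from the low 4 bits of rs, operating in place.
    s = rs & 15
    return q16_shift(xra, s, "sar"), q16_shift(xrb, s, "sar")


print(hex(q16_shift(0x00018000, 1, "sll")))
```

For example, shifting 0x00018000 left by 1 as a single 32-bit value gives 0x00030000, but shifting each half separately gives 0x00020000, because the top bit of the lower half is discarded.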
S32ALN
S32ALN xra, xrb, xrc, s
Takes the 64-bit value xrb:xrc (xrb in the upper 32 bits), shifts it s bytes (0..4) to the left and assigns the highest 32 bits of the result to xra. This can be used to realign values that are not aligned in memory.
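Below is a Python sketch of S32ALN (hypothetical function name). With two neighboring words in xrb and xrc, s selects which 4-byte window of the 8 bytes ends up in xra. Which value of s matches a given misalignment in memory depends on the byte order, and this sketch does not model that.

```python
# Hypothetical reference model of S32ALN.


def s32aln(xrb, xrc, s):
    if not 0 <= s <= 4:
        raise ValueError("s must be in the range 0..4")
    combined = ((xrb & 0xFFFFFFFF) << 32) | (xrc & 0xFFFFFFFF)  # 64-bit xrb:xrc
    return ((combined << (8 * s)) >> 32) & 0xFFFFFFFF  # highest 32 bits


for s in range(5):
    print(s, hex(s32aln(0x11223344, 0x55667788, s)))
```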
S32SFL
S32SFL xra, xrb, xrc, xrd, ptn
Shuffles (swizzles) the bytes of xrb and xrc as indicated in the table below and writes the results into xra and xrd.
Input:  xrb = b3 b2 b1 b0;  xrc = c3 c2 c1 c0  (bytes listed from most to least significant)
ptn = 0:  xra := b3 c3 b2 c2;  xrd := b1 c1 b0 c0
ptn = 1:  xra := b3 b1 c3 c1;  xrd := b2 b0 c2 c0
ptn = 2:  xra := b3 c3 b1 c1;  xrd := b2 c2 b0 c0
ptn = 3:  xra := b3 b2 c3 c2;  xrd := b1 b0 c1 c0
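A Python sketch can check the shuffle table by using byte values that spell their own names (0xB3 for b3, and so on). The table encoding and the function name are hypothetical.

```python
# Hypothetical reference model of S32SFL.
# Output byte names for each ptn, listed from most to least significant byte.
SFL_PATTERNS = {
    0: (("b3", "c3", "b2", "c2"), ("b1", "c1", "b0", "c0")),
    1: (("b3", "b1", "c3", "c1"), ("b2", "b0", "c2", "c0")),
    2: (("b3", "c3", "b1", "c1"), ("b2", "c2", "b0", "c0")),
    3: (("b3", "b2", "c3", "c2"), ("b1", "b0", "c1", "c0")),
}


def s32sfl(xrb, xrc, ptn):
    src = {}
    for i in range(4):
        src[f"b{i}"] = (xrb >> (8 * i)) & 0xFF  # b0 is the least significant byte
        src[f"c{i}"] = (xrc >> (8 * i)) & 0xFF

    def build(names):
        out = 0
        for name in names:
            out = (out << 8) | src[name]
        return out

    xra_names, xrd_names = SFL_PATTERNS[ptn]
    return build(xra_names), build(xrd_names)  # (xra, xrd)


for ptn in range(4):
    xra, xrd = s32sfl(0xB3B2B1B0, 0xC3C2C1C0, ptn)
    print(ptn, hex(xra), hex(xrd))
```

Patterns ptn=0 and ptn=3 interleave the two inputs, at byte granularity and at 16-bit granularity respectively.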
New instructions in JZ4770
The JZ4770 has quite a few additional MXU instructions. Ingenic writes 3 or 7 to register xr16 to activate these. This may imply that there are two levels of extension between the JZ4740 and the JZ4770.
Load and store instructions

LXB rb, rc, strd2

LXBU rb, rc, strd2

LXH rb, rc, strd2

LXHU rb, rc, strd2

LXW rb, rc, strd2

S16LDD xra, rb, s10, optn2

S16LDI xra, rb, s10, optn2

S16SDI xra, rb, s10, optn2

S16STD xra, rb, s10, optn2

S32LDDR xra, rb, s12

S32LDDVR xra, rb, rc, strd2

S32LDIR xra, rb, s12

S32LDIVR xra, rb, rc, strd2

S32SDIR xra, rb, s12

S32SDIVR xra, rb, rc, strd2

S32STDR xra, rb, s12

S32STDVR xra, rb, rc, strd2

S8LDD xra, rb, s8, optn3

S8LDI xra, rb, s8, optn3

S8SDI xra, rb, s8, optn3

S8STD xra, rb, s8, optn3
Other math

D16MOVN xra, xrb, xrc

D16MOVZ xra, xrb, xrc

D16SLT xra, xrb, xrc

Q16SCOP xra, xrb, xrc, xrd

Q8MOVN xra, xrb, xrc

Q8MOVZ xra, xrb, xrc

Q8SLTU xra, xrb, xrc

S32ABS xra, xrb

S32ALNI xra, xrb, xrc, optn3

S32EXTRV xra, xrd, rs, rt

S32EXTR xra, xrd, rs, bits5

S32LUI xra, s8, optn2

S32MOVN xra, xrb, xrc

S32MOVZ xra, xrb, xrc

S32SLT xra, xrb, xrc
Addition and subtraction instructions

D16ASUM xra, xrb, xrc, xrd

D32ACCM xra, xrb, xrc, xrd

D32ADDC xra, xrb, xrc

D32ASUM xra, xrb, xrc, xrd

Q16ACCM xra, xrb, xrc, xrd

Q16ASUM xra, xrb, xrc, xrd

S32MSUBU xra, xrd, rs, rt

S32MSUB xra, xrd, rs, rt
Multiply instructions

D16MACE xra, xrb, xrc, xrd

D16MULE xra, xrb, xrc, xrd

D8SUMC xra, xrb, xrc

D8SUM xra, xrb, xrc

Q8MULSU xra, xrb, xrc, xrd

Q8MACSU xra, xrb, xrc, xrd

S32MADDU xra, xrd, rs, rt

S32MADD xra, xrd, rs, rt

S32MULU xra, xrd, rs, rt

S32MUL xra, xrd, rs, rt
Bitwise instructions

S32AND xra, xrb, xrc

S32NOR xra, xrb, xrc

S32OR xra, xrb, xrc

S32XOR xra, xrb, xrc