This commit lets the native emitter backend extends the range of the
BCC family of opcodes (BALL, BANY, BBC, BBS, BEQ, BGE, BGEU, BLT,
BLTU, BNALL, BNE, BNONE) from 8 bits to 18 bits.
The test suite contains some test files that, when compiled into native
code, would require BCC jumps outside the (signed) 8 bits range. In
this case either the MicroPython interpreter or mpy-cross would raise an
exception, not running the test when using the "--via-mpy --emit native"
command line options with the test runner.
This comes with a 3 bytes penalty on each forward jump, bringing the
footprint of those jumps to 6 bytes each, as a longer opcode sequence
has to be emitted to let jumps access a larger range. However, this is
slightly offset by the fact that backward jumps can be emitted with a
single opcode if the range is small enough (8-bits offset).
Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit lets the native emitter backend extends the range of the
BCCZ family of opcodes (BEQZ, BNEZ, BLTZ, BGEZ) from 12 bits to 18
bits.
The test suite contains some test files that, when compiled into native
code, would require BCCZ jumps outside the (signed) 12 bits range. In
this case either the MicroPython interpreter or mpy-cross would raise an
exception, not running the test when using the "--via-mpy --emit native"
command line options with the test runner.
This comes with a 3 bytes penalty on each forward jump, bringing the
footprint of those jumps to 6 bytes each, as a longer opcode sequence
has to be emitted to let jumps access a larger range. However, this is
slightly offset by the fact that backward jumps can be emitted with a
single opcode if the range is small enough (3 bytes for a 12-bits
offset).
Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit updates the existing specialised implementations for
int-indexed 32-bit load and store operations, and adds a specialised
implementation for int-indexed 16-bit load.
The 32-bit operations relied on the fact that their applicability was
limited to a specific range, falling back on a generic implementation
otherwise. Introducing a single entry point for each int-indexed
load/store operation size would break that assumption. Now those two
operations contain fallback code to generate working code by themselves
instead of raising an exception.
The 16-bit operation instead simply did not have any range check, but it
was not exposed directly to the Viper emitter. When a 16-bit
int-indexed load entry point was introduced, the existing implementation
would fail when accessing memory outside its 0..255 halfwords range. A
specialised implementation is now present, performing fewer operations
than the existing Viper emitter equivalent.
Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit simplifies native functions' prologue code by not emitting a
jump opcode that goes over the function's constants pool if the pool is
empty.
The original code assumed the constants pool is never empty as large
32-bits constants are commonly used, but for inline assembler functions
that may not be the case. This meant that inline assembler functions
may start with an unneeded jump (along with its alignment byte), using
four bytes more than necessary.
This commit is limited to the "xtensa" target, as "xtensawin" doesn't
support inline assembler functions yet, so native functions' constant
pools are almost always guaranteed to hold one or more values.
Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit expands the Xtensa inline assembler to support most if not
all opcodes available on the ESP8266 and LX3 Xtensa cores.
This is meant as a stepping stone to add inline assembler support for
the ESP32 and its LX6 core, along to windowed-specific opcodes and
additional opcodes that are present only on the LX7 core (ESP32-S3 and
later).
New opcodes being added are covered by tests, and the provided tests
were expanded to also include opcodes available in the existing
implementation. Given that the ESP8266 space requirements are tighter
than ESP32's, certain opcodes that won't be commonly used have been put
behind a define to save some space in the general use case.
Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit fixes compilation errors occurring when enabling the Xtensa
code dumper inside mpy-cross.
The original code was meant to dump the code from an Xtensa device
itself, but for debugging the inline assembler this functionality was
also needed off-line. The changes involve solving a signed/unsigned
mismatch that was not much of a problem for the 8266's gcc version but
made modern compilers complain, and using the printf formatter for
pointers when it comes to printing code addresses.
Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
This commit removes old raw printf calls happening inside certain branch
opcode emitters, indicating the target label is out of range for the
opcode. They have been replaced with a RuntimeError being raised in
these cases, using a parameterised qstr instead.
Whilst this technically breaks runtime behaviour expectations, the
generated code would not have worked anyway so it's better to catch
those cases early. This should be updated to always emit long jumps
unless jumps are backwards and short enough, following the other ports,
but that's something coming later.
This is actually needed because there are test files that do not work
when processed through mpy-cross and entirely converted to native code.
The original implementation would still generate mostly-valid code that
was bound to crash on the device, whilst this change would prevent
invalid code to even be emitted in the first place.
Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
The STATIC macro was introduced a very long time ago in commit
d5df6cd44a. The original reason for this was
to have the option to define it to nothing so that all static functions
become global functions and therefore visible to certain debug tools, so
one could do function size comparison and other things.
This STATIC feature is rarely (if ever) used. And with the use of LTO and
heavy inline optimisation, analysing the size of individual functions when
they are not static is not a good representation of the size of code when
fully optimised.
So the macro does not have much use and it's simpler to just remove it.
Then you know exactly what it's doing. For example, newcomers don't have
to learn what the STATIC macro is and why it exists. Reading the code is
also less "loud" with a lowercase static.
One other minor point in favour of removing it, is that it stops bugs with
`STATIC inline`, which should always be `static inline`.
Methodology for this commit was:
1) git ls-files | egrep '\.[ch]$' | \
xargs sed -Ei "s/(^| )STATIC($| )/\1static\2/"
2) Do some manual cleanup in the diff by searching for the word STATIC in
comments and changing those back.
3) "git-grep STATIC docs/", manually fixed those cases.
4) "rg -t python STATIC", manually fixed codegen lines that used STATIC.
This work was funded through GitHub Sponsors.
Signed-off-by: Angus Gratton <angus@redyak.com.au>
This commit adds optimised l32i/s32i functions that select the best load/
store encoding based on the size of the offset, and uses the function when
necessary in code generation.
Without this, ASM_LOAD_REG_REG_OFFSET() could overflow the word offset
(using a narrow encoding), for example when loading the prelude from the
constant table when there are many (>16) constants.
Fixes issue #8458.
Signed-off-by: Damien George <damien@micropython.org>
This commit adds support for saving and loading .mpy files that contain
native code (native, viper and inline-asm). A lot of the ground work was
already done for this in the form of removing pointers from generated
native code. The changes here are mainly to link in qstr values to the
native code, and change the format of .mpy files to contain native code
blocks (possibly mixed with bytecode).
A top-level summary:
- @micropython.native, @micropython.viper and @micropython.asm_thumb/
asm_xtensa are now allowed in .py files when compiling to .mpy, and they
work transparently to the user.
- Entire .py files can be compiled to native via mpy-cross -X emit=native
and for the most part the generated .mpy files should work the same as
their bytecode version.
- The .mpy file format is changed to 1) specify in the header if the file
contains native code and if so the architecture (eg x86, ARMV7M, Xtensa);
2) for each function block the kind of code is specified (bytecode,
native, viper, asm).
- When native code is loaded from a .mpy file the native code must be
modified (in place) to link qstr values in, just like bytecode (see
py/persistentcode.c:arch_link_qstr() function).
In addition, this now defines a public, native ABI for dynamically loadable
native code generated by other languages, like C.
All architectures now have a dedicated register to hold the pointer to the
native function table mp_fun_table, and so they all need to load this
register at the start of the native function. This commit makes the
loading of this register uniform across architectures by passing the
pointer in the constant table for the native function, and then loading the
register from the constant table. Doing it this way means that the pointer
is not stored in the assembly code, helping to make the code more portable.
Loading a pointer by indexing into the native function table mp_fun_table,
rather than loading an immediate value (via a PC-relative load), uses less
code space.
For all but the last pass the assembler only needs to count how much space
is needed for the machine code, it doesn't actually need to emit anything.
The dummy_data just uses unnecessary RAM and without it the code is not
any more complex (and code size does not increase for Thumb and Xtensa
archs).
This patch adds the MICROPY_EMIT_INLINE_XTENSA option, which, when
enabled, allows the @micropython.asm_xtensa decorator to be used.
The following opcodes are currently supported (ax is a register, a0-a15):
ret_n()
callx0(ax)
j(label)
jx(ax)
beqz(ax, label)
bnez(ax, label)
mov(ax, ay)
movi(ax, imm) # imm can be full 32-bit, uses l32r if needed
and_(ax, ay, az)
or_(ax, ay, az)
xor(ax, ay, az)
add(ax, ay, az)
sub(ax, ay, az)
mull(ax, ay, az)
l8ui(ax, ay, imm)
l16ui(ax, ay, imm)
l32i(ax, ay, imm)
s8i(ax, ay, imm)
s16i(ax, ay, imm)
s32i(ax, ay, imm)
l16si(ax, ay, imm)
addi(ax, ay, imm)
ball(ax, ay, label)
bany(ax, ay, label)
bbc(ax, ay, label)
bbs(ax, ay, label)
beq(ax, ay, label)
bge(ax, ay, label)
bgeu(ax, ay, label)
blt(ax, ay, label)
bnall(ax, ay, label)
bne(ax, ay, label)
bnone(ax, ay, label)
Upon entry to the assembly function the registers a0, a12, a13, a14 are
pushed to the stack and the stack pointer (a1) decreased by 16. Upon
exit, these registers and the stack pointer are restored, and ret.n is
executed to return to the caller (caller address is in a0).
Note that the ABI for the Xtensa emitters is non-windowing.