This commit lets mpy_ld.py resolve symbols not only from the object
files involved in the linking process, or from compiler-supplied static
libraries, but also from a list of symbols referenced by an absolute
address (usually provided by the system's ROM).
This is needed for ESP8266 targets as some C stdlib functions are
provided by the MCU's own ROM code to reduce the final code footprint,
and therefore those functions' implementation was removed from the
compiler's support libraries. This means that unless `LINK_RUNTIME` is
set (which lets tooling look at more libraries to resolve symbols) the
build process will fail as tooling is unaware of the ROM symbols'
existence. With this change, fixed-address symbols can be exposed to
the symbol resolution step when performing natmod linking.
If there are symbols coming in from a fixed-address symbols list and
internal code or external libraries, the fixed-address symbol address
will take precedence in all cases.
Although this is - in theory - also working for the whole range of ESP32
MCUs, testing is currently limited to Xtensa processors and the example
natmods' makefiles only make use of this commit's changes for the
ESP8266 target.
Natmod builds can set the MPY_EXTERN_SYM_FILE variable pointing to a
linkerscript file containing a series of symbols (weak or strong) at a
fixed address; these symbols will then be used by the MicroPython
linker when packaging the natmod. If a different natmod build method is
used (eg. custom CMake scripts), `tools/mpy_ld.py` can now accept a
command line parameter called `--externs` (or its short variant `-e`)
that contains the path of a linkerscript file with the fixed-address
symbols to use when performing the linking process.
The linkerscript file parser can handle a very limited subset of
binutils's linkerscript syntax, namely just block comments, strong
symbols, and weak symbols. Each symbol must be in its own line for the
parser to succeed, empty lines or comment blocks are skipped. For an
example of what this parser was meant to handle, you can look at
`ports/esp8266/boards/eagle.rom.addr.v6.ld` and follow its format.
The natmod developer documentation is also updated to reflect the new
command line argument accepted by `mpy_ld.py` and the use cases for the
changes introduced by this commit.
Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
Brings it into sync with a matching change to micropython-lib (which was
much older). Includes one small automatic fix.
Signed-off-by: Angus Gratton <angus@redyak.com.au>
This is a known limitation, so better to give a clear warning than a
catch-all AssertionError. Happens for example when trying to use
soft-float on ARCH=armv6m
Also give more details on the assertion for unknown relocations, such that
one can see which symbol it affects etc, to aid in debugging.
References issue #14430.
Signed-off-by: Jon Nordby <jononor@gmail.com>
This commit introduces an additional symbol resolution mechanism to the
natmod linking process. This allows the build scripts to look for required
symbols into selected libraries that are provided by the compiler
installation (libgcc and libm at the moment).
For example, using soft-float code in natmods, whilst technically possible,
was not an easy process and required some additional work to pull it off.
With this addition all the manual (and error-prone) operations have been
automated and folded into `tools/mpy_ld.py`.
Both newlib and picolibc toolchains are supported, albeit the latter may
require a bit of extra configuration depending on the environment the build
process runs on. Picolibc's soft-float functions aren't in libm - in fact
the shipped libm is nothing but a stub - but they are inside libc. This is
usually not a problem as these changes cater for that configuration quirk,
but on certain compilers the include paths used to find libraries in may
not be updated to take Picolibc's library directory into account. The bare
metal RISC-V compiler shipped with the CI OS image (GCC 10.2.0 on Ubuntu
22.04LTS) happens to exhibit this very problem.
To work around that for CI builds, the Picolibc libraries' path is
hardcoded in the Makefile directives used by the linker, but this can be
changed by setting the PICOLIBC_ROOT environment library when building
natmods.
Signed-off-by: Volodymyr Shymanskyy <vshymanskyi@gmail.com>
Co-authored-by: Alessandro Gatti <a.gatti@frob.it>
This commit adds support for RV32IMC native modules, as in embedding native
code into a self-contained MPY module and and make its exported functions
available to the MicroPython environment.
Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
Native .mpy files targetting armv6m (eg RP2040) cannot currently have more
than about 2kiB of native code (between the start of the file and the init
function).
This commit fixes that by using bigger jumps to jump to the init function.
Signed-off-by: Damien George <damien@micropython.org>
As reported in #14430 the Xtensa compiler can add R_XTENSA_ASM_EXPAND
relocation relaxation entries in object files, and they were not
supported by mpy_ld.
This commit adds handling for that entry, doing nothing with it, as it
is only of real use for an optimising linker.
Signed-off-by: Alessandro Gatti <a.gatti@frob.it>
Also define `mp_type_bytearray`. These all help to write native modules.
Signed-off-by: Brian Pugh <bnp117@gmail.com>
Signed-off-by: Damien George <damien@micropython.org>
This is required because the .mpy native ABI was changed by the
introduction of `mp_proto_fun_t`, see commits:
- 416465d81e
- 5e3006f117
- e2ff00e811
And three `mp_binary` functions were added to `mp_fun_table` in
commit d2276f0d41.
Signed-off-by: Damien George <damien@micropython.org>
These are needed to read/write array.array objects, which is useful in
native code to provide fast extensions that work with big arrays of data.
Signed-off-by: Damien George <damien@micropython.org>
Because mpy_ld.py doesn't know the target object representation, it emits
instances of `MP_OBJ_NEW_QSTR(MP_QSTR_Foo)` as const string objects, rather
than qstrs. However this doesn't work for map keys (e.g. for a locals dict)
because the map has all_keys_are_qstrs flag is set (and also auto-complete
requires the map keys to be qstrs).
Instead, emit them as regular qstrs, and make a functioning MP_OBJ_NEW_QSTR
function available (via `native_to_obj`, also used for e.g. making
integers).
Remove the code from mpy_ld.py to emit qstrs as constant strings, but leave
behind the scaffold to emit constant objects in case we want to do use this
in the future.
Strictly this should be a .mpy sub-version bump, even though the function
table isn't changing, it does lead to a change in behavior for a new .mpy
running against old MicroPython. `mp_native_to_obj` will incorrectly return
the qstr value directly as an `mp_obj_t`, leading to unexpected results.
But given that it's broken at the moment, it seems unlikely that anyone is
relying on this, so it's not work the other downsides of a sub-version bump
(i.e. breaking pure-Python modules that use @native). The opposite case of
running an old .mpy on new MicroPython is unchanged, and remains broken in
exactly the same way.
This work was funded through GitHub Sponsors.
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Sections sometimes named .rodata.str1.1 etc, instead of just .rodata.
Avoid crashing in that case. Instead treat it like any other RO section.
Fix thanks to @phlash.
Fixes issue #8783.
Signed-off-by: Jon Nordby <jononor@gmail.com>
Spurious fix as the logic is structured such that these variables will be
set before dereferenced, but this keeps Ruff happy (no more F821
undefined-name).
Signed-off-by: Angus Gratton <angus@redyak.com.au>
The intent is to allow us to make breaking changes to the native ABI (e.g.
changes to dynruntime.h) without needing the bytecode version to increment.
With this commit the two bits previously used for the feature flags (but
now unused as of .mpy version 6) encode a sub-version. A bytecode-only
.mpy file can be loaded as long as MPY_VERSION matches, but a native .mpy
(i.e. one with an arch set) must also match MPY_SUB_VERSION. This allows 3
additional updates to the native ABI per bytecode revision.
The sub-version is set to 1 because the previous commits that changed the
layout of mp_obj_type_t have changed the native ABI.
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Signed-off-by: Damien George <damien@micropython.org>
Support for architecture-specific qstr linking was removed in
d4d53e9e11, where native code was changed to
access qstr values via qstr_table. The only remaining use for the special
qstr link table in persistentcode.c is to support native module written in
C, linked via mpy_ld.py. But native modules can also use the standard
module-level qstr_table (and obj_table) which was introduced in the .mpy
file reworking in f2040bfc7e.
This commit removes the remaining native qstr liking support in
persistentcode.c's load_raw_code function, and adds two new relocation
options for constants.qstr_table and constants.obj_table. mpy_ld.py is
updated to use these relocations options instead of the native qstr link
table.
Signed-off-by: Damien George <damien@micropython.org>
The examples/natmod features0 and features1 examples now build and run on
ARMv6-M platforms. More complicated examples are not yet supported because
the compiler emits references to built-in functions like __aeabi_uidiv.
Signed-off-by: Damien George <damien@micropython.org>
Prior to this commit, even with unicode disabled .py and .mpy files could
contain unicode characters, eg by entering them directly in a string as
utf-8 encoded.
The only thing the compiler disallowed (with unicode disabled) was using
\uxxxx and \Uxxxxxxxx notation to specify a character within a string with
value >= 0x100; that would give a SyntaxError.
With this change mpy-cross will now accept \u and \U notation to insert a
character with value >= 0x100 into a string (because the -mno-unicode
option is now gone, there's no way to forbid this). The runtime will
happily work with strings with such characters, just like it already works
with strings with characters that were utf-8 encoded directly.
This change simplifies things because there are no longer any feature
flags in .mpy files, and any bytecode .mpy will now run on any target.
Signed-off-by: Damien George <damien@micropython.org>
Background: .mpy files are precompiled .py files, built using mpy-cross,
that contain compiled bytecode functions (and can also contain machine
code). The benefit of using an .mpy file over a .py file is that they are
faster to import and take less memory when importing. They are also
smaller on disk.
But the real benefit of .mpy files comes when they are frozen into the
firmware. This is done by loading the .mpy file during compilation of the
firmware and turning it into a set of big C data structures (the job of
mpy-tool.py), which are then compiled and downloaded into the ROM of a
device. These C data structures can be executed in-place, ie directly from
ROM. This makes importing even faster because there is very little to do,
and also means such frozen modules take up much less RAM (because their
bytecode stays in ROM).
The downside of frozen code is that it requires recompiling and reflashing
the entire firmware. This can be a big barrier to entry, slows down
development time, and makes it harder to do OTA updates of frozen code
(because the whole firmware must be updated).
This commit attempts to solve this problem by providing a solution that
sits between loading .mpy files into RAM and freezing them into the
firmware. The .mpy file format has been reworked so that it consists of
data and bytecode which is mostly static and ready to run in-place. If
these new .mpy files are located in flash/ROM which is memory addressable,
the .mpy file can be executed (mostly) in-place.
With this approach there is still a small amount of unpacking and linking
of the .mpy file that needs to be done when it's imported, but it's still
much better than loading an .mpy from disk into RAM (although not as good
as freezing .mpy files into the firmware).
The main trick to make static .mpy files is to adjust the bytecode so any
qstrs that it references now go through a lookup table to convert from
local qstr number in the module to global qstr number in the firmware.
That means the bytecode does not need linking/rewriting of qstrs when it's
loaded. Instead only a small qstr table needs to be built (and put in RAM)
at import time. This means the bytecode itself is static/constant and can
be used directly if it's in addressable memory. Also the qstr string data
in the .mpy file, and some constant object data, can be used directly.
Note that the qstr table is global to the module (ie not per function).
In more detail, in the VM what used to be (schematically):
qst = DECODE_QSTR_VALUE;
is now (schematically):
idx = DECODE_QSTR_INDEX;
qst = qstr_table[idx];
That allows the bytecode to be fixed at compile time and not need
relinking/rewriting of the qstr values. Only qstr_table needs to be linked
when the .mpy is loaded.
Incidentally, this helps to reduce the size of bytecode because what used
to be 2-byte qstr values in the bytecode are now (mostly) 1-byte indices.
If the module uses the same qstr more than two times then the bytecode is
smaller than before.
The following changes are measured for this commit compared to the
previous (the baseline):
- average 7%-9% reduction in size of .mpy files
- frozen code size is reduced by about 5%-7%
- importing .py files uses about 5% less RAM in total
- importing .mpy files uses about 4% less RAM in total
- importing .py and .mpy files takes about the same time as before
The qstr indirection in the bytecode has only a small impact on VM
performance. For stm32 on PYBv1.0 the performance change of this commit
is:
diff of scores (higher is better)
N=100 M=100 baseline -> this-commit diff diff% (error%)
bm_chaos.py 371.07 -> 357.39 : -13.68 = -3.687% (+/-0.02%)
bm_fannkuch.py 78.72 -> 77.49 : -1.23 = -1.563% (+/-0.01%)
bm_fft.py 2591.73 -> 2539.28 : -52.45 = -2.024% (+/-0.00%)
bm_float.py 6034.93 -> 5908.30 : -126.63 = -2.098% (+/-0.01%)
bm_hexiom.py 48.96 -> 47.93 : -1.03 = -2.104% (+/-0.00%)
bm_nqueens.py 4510.63 -> 4459.94 : -50.69 = -1.124% (+/-0.00%)
bm_pidigits.py 650.28 -> 644.96 : -5.32 = -0.818% (+/-0.23%)
core_import_mpy_multi.py 564.77 -> 581.49 : +16.72 = +2.960% (+/-0.01%)
core_import_mpy_single.py 68.67 -> 67.16 : -1.51 = -2.199% (+/-0.01%)
core_qstr.py 64.16 -> 64.12 : -0.04 = -0.062% (+/-0.00%)
core_yield_from.py 362.58 -> 354.50 : -8.08 = -2.228% (+/-0.00%)
misc_aes.py 429.69 -> 405.59 : -24.10 = -5.609% (+/-0.01%)
misc_mandel.py 3485.13 -> 3416.51 : -68.62 = -1.969% (+/-0.00%)
misc_pystone.py 2496.53 -> 2405.56 : -90.97 = -3.644% (+/-0.01%)
misc_raytrace.py 381.47 -> 374.01 : -7.46 = -1.956% (+/-0.01%)
viper_call0.py 576.73 -> 572.49 : -4.24 = -0.735% (+/-0.04%)
viper_call1a.py 550.37 -> 546.21 : -4.16 = -0.756% (+/-0.09%)
viper_call1b.py 438.23 -> 435.68 : -2.55 = -0.582% (+/-0.06%)
viper_call1c.py 442.84 -> 440.04 : -2.80 = -0.632% (+/-0.08%)
viper_call2a.py 536.31 -> 532.35 : -3.96 = -0.738% (+/-0.06%)
viper_call2b.py 382.34 -> 377.07 : -5.27 = -1.378% (+/-0.03%)
And for unix on x64:
diff of scores (higher is better)
N=2000 M=2000 baseline -> this-commit diff diff% (error%)
bm_chaos.py 13594.20 -> 13073.84 : -520.36 = -3.828% (+/-5.44%)
bm_fannkuch.py 60.63 -> 59.58 : -1.05 = -1.732% (+/-3.01%)
bm_fft.py 112009.15 -> 111603.32 : -405.83 = -0.362% (+/-4.03%)
bm_float.py 246202.55 -> 247923.81 : +1721.26 = +0.699% (+/-2.79%)
bm_hexiom.py 615.65 -> 617.21 : +1.56 = +0.253% (+/-1.64%)
bm_nqueens.py 215807.95 -> 215600.96 : -206.99 = -0.096% (+/-3.52%)
bm_pidigits.py 8246.74 -> 8422.82 : +176.08 = +2.135% (+/-3.64%)
misc_aes.py 16133.00 -> 16452.74 : +319.74 = +1.982% (+/-1.50%)
misc_mandel.py 128146.69 -> 130796.43 : +2649.74 = +2.068% (+/-3.18%)
misc_pystone.py 83811.49 -> 83124.85 : -686.64 = -0.819% (+/-1.03%)
misc_raytrace.py 21688.02 -> 21385.10 : -302.92 = -1.397% (+/-3.20%)
The code size change is (firmware with a lot of frozen code benefits the
most):
bare-arm: +396 +0.697%
minimal x86: +1595 +0.979% [incl +32(data)]
unix x64: +2408 +0.470% [incl +800(data)]
unix nanbox: +1396 +0.309% [incl -96(data)]
stm32: -1256 -0.318% PYBV10
cc3200: +288 +0.157%
esp8266: -260 -0.037% GENERIC
esp32: -216 -0.014% GENERIC[incl -1072(data)]
nrf: +116 +0.067% pca10040
rp2: -664 -0.135% PICO
samd: +844 +0.607% ADAFRUIT_ITSYBITSY_M4_EXPRESS
As part of this change the .mpy file format version is bumped to version 6.
And mpy-tool.py has been improved to provide a good visualisation of the
contents of .mpy files.
In summary: this commit changes the bytecode to use qstr indirection, and
reworks the .mpy file format to be simpler and allow .mpy files to be
executed in-place. Performance is not impacted too much. Eventually it
will be possible to store such .mpy files in a linear, read-only, memory-
mappable filesystem so they can be executed from flash/ROM. This will
essentially be able to replace frozen code for most applications.
Signed-off-by: Damien George <damien@micropython.org>
This commit removes all parts of code associated with the existing
MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE optimisation option, including the
-mcache-lookup-bc option to mpy-cross.
This feature originally provided a significant performance boost for Unix,
but wasn't able to be enabled for MCU targets (due to frozen bytecode), and
added significant extra complexity to generating and distributing .mpy
files.
The equivalent performance gain is now provided by the combination of
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE (which has
been enabled on the unix port in the previous commit).
It's hard to provide precise performance numbers, but tests have been run
on a wide variety of architectures (x86-64, ARM Cortex, Aarch64, RISC-V,
xtensa) and they all generally agree on the qualitative improvements seen
by the combination of MICROPY_OPT_LOAD_ATTR_FAST_PATH and
MICROPY_OPT_MAP_LOOKUP_CACHE.
For example, on a "quiet" Linux x64 environment (i3-5010U @ 2.10GHz) the
change from CACHE_MAP_LOOKUP_IN_BYTECODE, to LOAD_ATTR_FAST_PATH combined
with MAP_LOOKUP_CACHE is:
diff of scores (higher is better)
N=2000 M=2000 bccache -> attrmapcache diff diff% (error%)
bm_chaos.py 13742.56 -> 13905.67 : +163.11 = +1.187% (+/-3.75%)
bm_fannkuch.py 60.13 -> 61.34 : +1.21 = +2.012% (+/-2.11%)
bm_fft.py 113083.20 -> 114793.68 : +1710.48 = +1.513% (+/-1.57%)
bm_float.py 256552.80 -> 243908.29 : -12644.51 = -4.929% (+/-1.90%)
bm_hexiom.py 521.93 -> 625.41 : +103.48 = +19.826% (+/-0.40%)
bm_nqueens.py 197544.25 -> 217713.12 : +20168.87 = +10.210% (+/-3.01%)
bm_pidigits.py 8072.98 -> 8198.75 : +125.77 = +1.558% (+/-3.22%)
misc_aes.py 17283.45 -> 16480.52 : -802.93 = -4.646% (+/-0.82%)
misc_mandel.py 99083.99 -> 128939.84 : +29855.85 = +30.132% (+/-5.88%)
misc_pystone.py 83860.10 -> 82592.56 : -1267.54 = -1.511% (+/-2.27%)
misc_raytrace.py 21490.40 -> 22227.23 : +736.83 = +3.429% (+/-1.88%)
This shows that the new optimisations are at least as good as the existing
inline-bytecode-caching, and are sometimes much better (because the new
ones apply caching to a wider variety of map lookups).
The new optimisations can also benefit code generated by the native
emitter, because they apply to the runtime rather than the generated code.
The improvement for the native emitter when LOAD_ATTR_FAST_PATH and
MAP_LOOKUP_CACHE are enabled is (same Linux environment as above):
diff of scores (higher is better)
N=2000 M=2000 native -> nat-attrmapcache diff diff% (error%)
bm_chaos.py 14130.62 -> 15464.68 : +1334.06 = +9.441% (+/-7.11%)
bm_fannkuch.py 74.96 -> 76.16 : +1.20 = +1.601% (+/-1.80%)
bm_fft.py 166682.99 -> 168221.86 : +1538.87 = +0.923% (+/-4.20%)
bm_float.py 233415.23 -> 265524.90 : +32109.67 = +13.756% (+/-2.57%)
bm_hexiom.py 628.59 -> 734.17 : +105.58 = +16.796% (+/-1.39%)
bm_nqueens.py 225418.44 -> 232926.45 : +7508.01 = +3.331% (+/-3.10%)
bm_pidigits.py 6322.00 -> 6379.52 : +57.52 = +0.910% (+/-5.62%)
misc_aes.py 20670.10 -> 27223.18 : +6553.08 = +31.703% (+/-1.56%)
misc_mandel.py 138221.11 -> 152014.01 : +13792.90 = +9.979% (+/-2.46%)
misc_pystone.py 85032.14 -> 105681.44 : +20649.30 = +24.284% (+/-2.25%)
misc_raytrace.py 19800.01 -> 23350.73 : +3550.72 = +17.933% (+/-2.79%)
In summary, compared to MICROPY_OPT_CACHE_MAP_LOOKUP_IN_BYTECODE, the new
MICROPY_OPT_LOAD_ATTR_FAST_PATH and MICROPY_OPT_MAP_LOOKUP_CACHE options:
- are simpler;
- take less code size;
- are faster (generally);
- work with code generated by the native emitter;
- can be used on embedded targets with a small and constant RAM overhead;
- allow the same .mpy bytecode to run on all targets.
See #7680 for further discussion. And see also #7653 for a discussion
about simplifying mpy-cross options.
Signed-off-by: Jim Mussared <jim.mussared@gmail.com>
Updating to Black v20.8b1 there are two changes that affect the code in
this repository:
- If there is a trailing comma in a list (eg [], () or function call) then
that list is now written out with one line per element. So remove such
trailing commas where the list should stay on one line.
- Spaces at the start of """ doc strings are removed.
Signed-off-by: Damien George <damien@micropython.org>
This makes the loading of viper-code-with-relocations a bit neater and
easier to understand, by treating the rodata/bss like a special object to
be loaded into the constant table (which is how it behaves).
We don't want to add a feature flag to .mpy files that indicate float
support because it will get complex and difficult to use. Instead the .mpy
is built using whatever precision it chooses (float or double) and the
native glue API will convert between this choice and what the host runtime
actually uses.
This commit adds a new tool called mpy_ld.py which is essentially a linker
that builds .mpy files directly from .o files. A new header file
(dynruntime.h) and makefile fragment (dynruntime.mk) are also included
which allow building .mpy files from C source code. Such .mpy files can
then be dynamically imported as though they were a normal Python module,
even though they are implemented in C.
Converting .o files directly (rather than pre-linked .elf files) allows the
resulting .mpy to be more efficient because it has more control over the
relocations; for example it can skip PLT indirection. Doing it this way
also allows supporting more architectures, such as Xtensa which has
specific needs for position-independent code and the GOT.
The tool supports targets of x86, x86-64, ARM Thumb and Xtensa (windowed
and non-windowed). BSS, text and rodata sections are supported, with
relocations to all internal sections and symbols, as well as relocations to
some external symbols (defined by dynruntime.h), and linking of qstrs.