mirror of
https://github.com/RRZE-HPC/asmbench.git
synced 2025-07-21 20:51:05 +02:00
d18b4d20042c0a186d78f7ed9380980b612ea515
Specifies register shape and suffix for floating-point and vector registers. Floating-point registers cannot be benchmarked without them, and vector registers would otherwise require adding the suffixes manually in the instruction operands. Doing both allows using these registers in the same manner as on x86.

Additionally, there are two small changes affecting all architectures:

* Allow the 'w' constraint code, which is used for vector registers on aarch64.
* Always specify a clobber for the flags register, as many instructions one might want to benchmark modify it.
asmbench
========

A benchmark toolkit for assembly instructions using the LLVM JIT.

Usage
=====

To benchmark latency and throughput of a 64bit integer add, use the following command:

``asmbench 'add {src:i64:r}, {srcdst:i64:r}'``

To benchmark two instructions interleaved, use this:

``asmbench 'add {src:i64:r}, {srcdst:i64:r}' 'sub {src:i64:r}, {srcdst:i64:r}'``

To find out more, add ``-h`` for help and ``-v`` for verbose mode.

Operand Templates
=================

Operands always follow this form: ``{direction:data_type:pass_type}``.

Direction may be ``src``, ``dst`` or ``srcdst``. asmbench uses this information to serialize the code (wherever possible). ``src`` operands are read, but not modified by the instruction. ``dst`` operands are written, but not read. ``srcdst`` operands are both read and modified by the instruction.

Data and Pass Types:

* ``i64:r`` -> 64bit general purpose register (gpr) (e.g., ``%rax``)
* ``i32:r`` -> 32bit gpr (e.g., ``%ecx``)
* ``<2 x double>:x`` -> 128bit SSE register with two double precision floating-point numbers (e.g., ``%xmm1``)
* ``<4 x float>:x`` -> 128bit SSE register with four single precision floating-point numbers (e.g., ``%xmm1``)
* ``<4 x double>:x`` -> 256bit AVX register with four double precision floating-point numbers (e.g., ``%ymm1``)
* ``<8 x float>:x`` -> 256bit AVX register with eight single precision floating-point numbers (e.g., ``%ymm1``)
* ``<8 x double>:x`` -> 512bit AVX512 register with eight double precision floating-point numbers (e.g., ``%zmm1``)
* ``<16 x float>:x`` -> 512bit AVX512 register with sixteen single precision floating-point numbers (e.g., ``%zmm1``)
* ``i8:23`` -> 8bit immediate with the value 23 (i.e., ``$23``)
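The operand template form described above can be illustrated with a small Python sketch. It is not asmbench's own parser: the names ``OPERAND_RE``, ``Operand`` and ``parse_operands`` are hypothetical and exist only for this example, which assumes that the three fields never contain ``:``, ``{`` or ``}``.

```python
import re
from typing import List, NamedTuple

# Hypothetical pattern for {direction:data_type:pass_type}; not taken from asmbench.
# Assumes data_type and pass_type contain no ':', '{' or '}' characters.
OPERAND_RE = re.compile(r"\{(src|dst|srcdst):([^:{}]+):([^:{}]+)\}")


class Operand(NamedTuple):
    """One parsed operand template (illustrative type, not part of asmbench)."""
    direction: str
    data_type: str
    pass_type: str

    @property
    def is_read(self) -> bool:
        # src and srcdst operands are read by the instruction
        return self.direction in ("src", "srcdst")

    @property
    def is_written(self) -> bool:
        # dst and srcdst operands are written by the instruction
        return self.direction in ("dst", "srcdst")


def parse_operands(template: str) -> List[Operand]:
    """Return all operand templates found in an instruction string, in order."""
    return [Operand(*m.groups()) for m in OPERAND_RE.finditer(template)]


if __name__ == "__main__":
    for op in parse_operands("add {src:i64:r}, {srcdst:i64:r}"):
        print(op.direction, op.data_type, op.pass_type, op.is_read, op.is_written)
```

Knowing which operands are read and which are written is the kind of information a tool needs in order to chain one instruction's output into the next instruction's input; how asmbench actually does this is not shown here.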