mirror of https://github.com/RRZE-HPC/OSACA.git synced 2025-09-07 18:31:28 +02:00

Files

Metehan Dundar ebf76caa18 Apply selected improvements from 1ceac6e: enhanced RISC-V parser, ImmediateOperand enhancements, and rv6→rv64 file renames

- Enhanced ImmediateOperand with reloc_type and symbol attributes for better RISC-V support
- Updated RISC-V parser with relocation type support (%hi, %lo, %pcrel_hi, etc.)
- Renamed example files from rv6 to rv64 for consistency
- Updated related configuration and test files
- All 115 tests pass successfully

2025-07-11 18:15:51 +02:00

add


copy


daxpy



j2d


striad


sum_reduction


triad


update


README.md

Added explanation for kernels

2020-02-03 13:39:12 +01:00


Examples

We collected sample kernels that users can run as examples with OSACA. The assembly files contain only the extracted and already marked kernels of code compiled for Intel Cascade Lake (CSX), AMD Zen, and Marvell ThunderX2 (TX2), but they can be run on any system that supports the ISA and is supported by OSACA. The compilers used were Intel Parallel Studio 19.0up05 and GNU 9.1.0 for the x86 systems, and ARM HPC Compiler for Linux version 19.2 and GNU 8.2.0 for the ARM-based TX2.

To analyze the kernels with OSACA, run

osaca --arch ARCH FILE

While all Zen and TX2 kernels use the comment-style OSACA markers, the kernels for Intel Cascade Lake (`*.csx.*.s`) use byte markers so that they can also be analyzed by IACA. To do so, run

gcc -c FILE.s
iaca -arch SKX FILE.o

The kernels currently included in the examples are briefly shown below.

Copy (`copy/`)

double * restrict a, * restrict b;

for(long i=0; i < size; ++i){
    a[i] = b[i];
}

Vector add (`add/`)

double * restrict a, * restrict b, * restrict c;

for(long i=0; i < size; ++i){
    a[i] = b[i] + c[i];
}

Vector update (`update/`)

double * restrict a;

for(long i=0; i < size; ++i){
    a[i] = scale * a[i];
}

Sum reduction (`sum_reduction/`)

double * restrict a;

for(long i=0; i < size; ++i){
    scale = scale + a[i];
}

For this kernel we noticed an overlap of the loop bodies when using gcc with the -Ofast flag (see this blog post for more information). We therefore additionally compiled all gcc versions with the -O3 flag. These versions are named accordingly.
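As background, gcc's -Ofast enables -ffast-math, which permits reassociating floating-point additions. The sketch below is not taken from the OSACA sources or the blog post; it only illustrates, by hand, how a reassociated reduction with several independent partial sums differs from the strictly ordered loop. The function names `sum_ordered` and `sum_reassociated` and the choice of four accumulators are assumptions made for illustration.

```c
/* Strictly ordered reduction: every addition depends on the previous one,
 * so the loop forms a single dependency chain through `scale`. */
double sum_ordered(const double *restrict a, long size, double scale)
{
    for (long i = 0; i < size; ++i) {
        scale = scale + a[i];
    }
    return scale;
}

/* Hypothetical hand-written form of a reassociated reduction:
 * four independent partial sums that are combined once after the loop. */
double sum_reassociated(const double *restrict a, long size, double scale)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    long i = 0;

    for (; i + 3 < size; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Remainder loop for sizes that are not a multiple of four. */
    for (; i < size; ++i) {
        s0 += a[i];
    }
    return scale + ((s0 + s1) + (s2 + s3));
}
```

Both functions return the same value whenever the additions are exact, but in general the reassociated form can round differently, which is why compilers only apply such transformations when flags like -ffast-math allow it.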

DAXPY (`daxpy/`)

double * restrict a, * restrict b;

for(long i=0; i < size; ++i){
    a[i] = a[i] + scale * b[i];
}

STREAM triad (`triad/`)

double * restrict a, * restrict b, * restrict c;

for(long i=0; i < size; ++i){
    a[i] = b[i] + scale * c[i];
}
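The snippets in this README omit the declarations of `size` and `scale`. A minimal compilable sketch of the triad kernel could look as follows; the function name `triad` and the parameter order are illustrative assumptions and not taken from the OSACA sources.

```c
/* Hypothetical wrapper around the STREAM triad loop shown above.
 * `restrict` tells the compiler that a, b, and c do not alias. */
void triad(double *restrict a, const double *restrict b,
           const double *restrict c, double scale, long size)
{
    for (long i = 0; i < size; ++i) {
        a[i] = b[i] + scale * c[i];
    }
}
```

Compiling such a file with, for example, `gcc -O3 -S` emits assembly instead of an object file; the loop of interest would still need to be marked before it can be analyzed.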

Schönauer triad (`striad/`)

double * restrict a, * restrict b, * restrict c, * restrict d;

for(long i=0; i < size; ++i){
    a[i] = b[i] + c[i] * d[i];
}

Gauss-Seidel method (`gs/`)

double ** restrict a;

for(long k=1; k < size_k-1; ++k){
  for(long i=1; i < size_i-1; ++i){
    a[k][i] = scale * (
      a[k][i-1] + a[k+1][i]
      + a[k][i+1] + a[k-1][i]
    );
  }
}

Jacobi 2D (`j2d/`)

double ** restrict a, ** restrict b;

for(long k=1; k < size_k-1; ++k){
  for(long i=1; i < size_i-1; ++i){
    a[k][i] = 0.25 * (
      b[k][i-1] + b[k+1][i]
      + b[k][i+1] + b[k-1][i]
    );
  }
}

For this kernel we noticed a discrepancy between measurements and predictions, especially when using AVX-512 instructions. We therefore additionally compiled the x86 kernels with AVX/SSE instructions and marked those kernels accordingly.
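The Gauss-Seidel and Jacobi 2D kernels apply the same four-point stencil but differ in their data dependencies: Gauss-Seidel updates the grid in place, while Jacobi reads from one grid and writes to another. The following sketch makes this difference concrete; the function names `gauss_seidel_sweep` and `jacobi_sweep` and their signatures are assumptions chosen for illustration, not part of OSACA.

```c
/* Gauss-Seidel sweep: a is updated in place, so a[k][i-1] and a[k-1][i]
 * already hold values written earlier in the same sweep. This creates a
 * loop-carried dependency along both i and k. */
void gauss_seidel_sweep(double **restrict a, long size_k, long size_i,
                        double scale)
{
    for (long k = 1; k < size_k - 1; ++k) {
        for (long i = 1; i < size_i - 1; ++i) {
            a[k][i] = scale * (
                a[k][i-1] + a[k+1][i]
                + a[k][i+1] + a[k-1][i]
            );
        }
    }
}

/* Jacobi sweep: reads only from b and writes only to a, so all
 * iterations are independent of each other. */
void jacobi_sweep(double **restrict a, double **restrict b,
                  long size_k, long size_i)
{
    for (long k = 1; k < size_k - 1; ++k) {
        for (long i = 1; i < size_i - 1; ++i) {
            a[k][i] = 0.25 * (
                b[k][i-1] + b[k+1][i]
                + b[k][i+1] + b[k-1][i]
            );
        }
    }
}
```

On the same input and with `scale` set to 0.25, the two sweeps generally produce different results after the first interior point, because Gauss-Seidel immediately reuses the values it has just computed.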
