Martin Kroeker
ccb9731c7b
Fix propagation of cpu properties to compiler options
5 years ago
Martin Kroeker
a29338aaa6
Remove extraneous quotes that caused a cmake policy warning
5 years ago
Martin Kroeker
438a8e5624
Fix placement of getarch call and spurious cpu property accumulation in DYNAMIC_ARCH builds
5 years ago
Martin Kroeker
e5967810b7
Merge pull request #110 from xianyi/develop
rebase
5 years ago
Martin Kroeker
ff74319ea5
Merge pull request #2977 from martin-frbg/issue2976
Fix macro name used in ifdef for POWERPC/PGI
5 years ago
Martin Kroeker
28d2dfe2b3
Fix macro name used in ifdef
5 years ago
Gengxin Xie
725ffbf041
fix typo
5 years ago
Gengxin Xie
d9ba49165a
Improve the performance of rot by using AVX512 and AVX2 intrinsic
5 years ago
Martin Kroeker
60ab9c783f
Merge pull request #2966 from martin-frbg/issue2964
Ensure that EXPRECISION is disabled for DYNAMIC_ARCH with TARGET=GENERIC and fix CMAKE DYNAMIC_ARCH builds
5 years ago
Martin Kroeker
8cc73fee98
Export NO_EXPRECISION after overriding for DYNAMIC_ARCH with GENERIC target
5 years ago
Martin Kroeker
0155cd53a3
Add -msse3 where needed for DYNAMIC_ARCH builds
5 years ago
Martin Kroeker
a9f9354296
Fix target test
5 years ago
Martin Kroeker
b9bc76aec4
Add files via upload
5 years ago
Martin Kroeker
f071245939
Merge pull request #2967 from RajalakshmiSR/dgemm88
POWER10: Change dgemm unroll factors
5 years ago
Aisha Tammy
60997ddd73
allow setting soname without suffix or prefix
Allows creating a library with a different
SONAME without the need to add suffixes to symbols.
Backwards compatible and should have no effect
on the workflow of previous users.
Useful for installing an INTERFACE64 library alongside
the standard library without file conflicts.
5 years ago
Martin Kroeker
e5f8c2bf8a
typo fix
5 years ago
Martin Kroeker
6baf8af658
Disable EXPRECISION for the combination of DYNAMIC_CORE and GENERIC target
5 years ago
Martin Kroeker
40a93c232b
Disable EXPRECISION for DYNAMIC_ARCH in combination with TARGET=GENERIC
EXPRECISION is already disabled for the GENERIC target, so prevent mixing with code parts that use a different float size by default
5 years ago
Martin Kroeker
fab952bee4
Merge pull request #2962 from brada4/develop
add openbsd 68+ gfortran name
5 years ago
Martin Kroeker
1cf04a6f0e
Merge pull request #2963 from martin-frbg/issue2959
Reunify default BUFFER_SIZE on ARM64 to avoid crashes in DYNAMIC_ARCH mode
5 years ago
Rajalakshmi Srinivasaraghavan
dd7a9cc5bf
POWER10: Change dgemm unroll factors
Changing the unroll factors for dgemm to 8 shows improved performance with
POWER10 MMA feature. Also made some minor changes in sgemm for edge cases.
5 years ago
Martin Kroeker
7f26be4802
Reunify BUFFERSIZE across arm64 platforms to avoid segfaults in DYNAMIC_ARCH
5 years ago
User User-User
9fab65e90a
add openbsd gfortran
5 years ago
Martin Kroeker
9efc3f0815
Merge pull request #109 from xianyi/develop
rebase
5 years ago
Martin Kroeker
aa21cb5217
Merge pull request #2960 from thrasibule/avx2_detection
fix avx2 detection
5 years ago
Guillaume Horel
1f564d729b
fix avx2 detection
reword commits to make it clearer
5 years ago
Martin Kroeker
9349dcd206
Merge pull request #2956 from RajalakshmiSR/caxpy_p10
Optimize caxpy for POWER10
5 years ago
Rajalakshmi Srinivasaraghavan
b435491885
Optimize caxpy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
5 years ago
Martin Kroeker
9a058f2451
Merge pull request #2940 from Qiyu8/optimize-benchmark
Refactor the performance measurement system
5 years ago
Martin Kroeker
074927a7d0
Merge pull request #2954 from Guobing-Chen/BF16_gemv_support
Implementation of BF16 based gemv
5 years ago
Martin Kroeker
60b22e3462
Merge pull request #2955 from Guobing-Chen/Fix_cooperlake_build_issue
Fix cooperlake compile issue
5 years ago
Chen, Guobing
c5e62dad69
Fix cooperlake compile issue
Add a missing macro that is required in Makefile.x86_64 after the recent
cleanup; its absence caused the cooperlake platform build to fail.
5 years ago
Chen, Guobing
a7b1f9b1bb
Implementation of BF16 based gemv
1. Add a new API -- sbgemv to support bfloat16 based gemv
2. Implement a generic kernel for sbgemv
3. Implement an avx512-bf16 based kernel for sbgemv
Signed-off-by: Chen, Guobing <guobing.chen@intel.com>
5 years ago
Martin Kroeker
67f39ad813
Merge pull request #2939 from thrasibule/Makefile_cleanup
reuse variables defined in Makefile.system
5 years ago
Martin Kroeker
6e13a7e99e
Merge pull request #2951 from martin-frbg/cleanup_make
Minor Makefile cleanup
5 years ago
Martin Kroeker
2207a16235
Merge pull request #2952 from martin-frbg/issue2931
Try to read cpu ID from /sys/devices/.../cpu0 if HWCAP_CPUID fails
5 years ago
Martin Kroeker
5d643929dd
Merge pull request #2948 from martin-frbg/issue2947
Expressly enable neon for use with intrinsics if available
5 years ago
Martin Kroeker
e8cbf0fc50
Output predefined HAVE_ entries to Makefile.conf for ARM with specified TARGET
5 years ago
Martin Kroeker
b937d78a6d
Try to read cpu information from /sys/devices/system/cpu/cpu0 if HWCAP_CPUID fails
5 years ago
Martin Kroeker
e2f9005db8
Merge pull request #2950 from RajalakshmiSR/saxpy
Optimize saxpy for POWER10
5 years ago
Martin Kroeker
6a1f3e40af
Remove debug printout of object list
5 years ago
Martin Kroeker
878b6d1f41
Remove spurious expr in flang version check
5 years ago
Rajalakshmi Srinivasaraghavan
c24ba8b1dd
Optimize saxpy for POWER10
This patch makes use of new POWER10 vector pair instructions for
loads and stores.
5 years ago
Qiyu8
f917c26e83
Refactoring remaining benchmark cases.
5 years ago
Martin Kroeker
76203e2120
Merge pull request #2946 from martin-frbg/issue2945
Move definitions that are neither needed nor supported on Solaris
5 years ago
Martin Kroeker
eec517af0e
Expressly enable neon for use with intrinsics if available
5 years ago
Martin Kroeker
fd7da56965
Move definitions that are neither needed nor supported on SUNOS
5 years ago
Martin Kroeker
2f9fc9be30
Update version to 0.3.12.dev
5 years ago
Martin Kroeker
81fcfd5ed3
Update version to 0.3.12.dev
5 years ago
Martin Kroeker
addf7593ae
Merge pull request #2944 from xianyi/release-0.3.0
Merge back 0.3.12 tag (and Changelog typo fixes) from release
5 years ago