| Author | Commit | Message | Age |
|---|---|---|---|
| abhishek-fujitsu | 0c239c9d48 | update contribution list | 9 months ago |
| Annop Wongwathanarat | ec146157d3 | Use SVE kernel for S/DGEMVT for SVE machines | 10 months ago |
| Annop Wongwathanarat | 9807f56580 | Optimize aarch64 sgemm_ncopy | 11 months ago |
| Annop Wongwathanarat | a085b6c9ec | Fix aarch64 sbgemv_t compilation error for GCC < 13 | 11 months ago |
| Martin Kroeker | 2b941c44b5 | Merge branch 'develop' into sbgemv_n_neon | 11 months ago |
| Ye Tao | 35bdbca153 | Add sbgemv_n_neon kernel for arm64. | 11 months ago |
| Annop Wongwathanarat | edaf51dd99 | Add sbgemv_t_bfdot kernel for ARM64<br>This improves performance for sbgemv_t by up to 100x on NEOVERSEV1.<br>The geometric mean speedup is ~61x for M=N=[2,512]. | 11 months ago |
| Marek Michalowski | 650a062e19 | Add thread throttling profile for SGEMV on `NEOVERSEV2` | 11 months ago |
| Marek Michalowski | b723c1b7b7 | Add thread throttling profile for SGEMM on `NEOVERSEV2` | 11 months ago |
| Ye Tao | c748e6a338 | optimized sbgemm kernel for neoverse-v1 (sve-256)<br>Signed-off-by: Ye Tao <ye.tao@arm.com> | 1 year ago |
| Martin Kroeker | 6e393a5599 | Merge branch 'develop' into gemv_t | 1 year ago |
| Marek Michalowski | 838bb57e27 | Merge branch 'develop' into develop | 1 year ago |
| Marek Michalowski | 4d5b13f765 | Add thread throttling profile for SGEMV on `NEOVERSEV1` | 1 year ago |
| Annop Wongwathanarat | c0318cea6e | Simplify gemv_t_sve_v1x3 kernel | 1 year ago |
| Annop Wongwathanarat | c8cd8da496 | Add thread throttling profile for SGEMM on NEOVERSEV1 | 1 year ago |
| CDAC-SSDG | 41912f9c22 | Update CONTRIBUTORS.md | 1 year ago |
| CDAC-SSDG | 2718b37fed | Update CONTRIBUTORS.md | 1 year ago |
| Chris Daley | cb48505251 | optimize gemv forwarding on ARM64 systems | 1 year ago |
| Jake Arkinstall | 44004178aa | Updated CONTRIBUTORS.md<br>As requested on X (https://x.com/KroekerMartin/status/1755218919290278185) | 1 year ago |
| Mark Seminatore | b29fd48998 | Merge branch 'develop' into win_tidy | 2 years ago |
| Mark Seminatore | 10548a0460 | update contributors | 2 years ago |
| Dirreke | ec89466e14 | Add CSKY support | 2 years ago |
| Mark Seminatore | 5f51811728 | try at new threading model | 2 years ago |
| Martin Kroeker | 616fdea82a | Revert "Improve Windows threading performance scaling" | 2 years ago |
| Mark Seminatore | 427f9f2428 | update contributors | 2 years ago |
| Chris Sidebottom | bfc20c2e97 | Add Chris Sidebottom to CONTRIBUTORS.md | 2 years ago |
| Pablo Romero | 1b1f781cf9 | Added name and details to contributors' list. | 3 years ago |
| Xianyi Zhang | f9715605ac | Add PLCT to contributors. | 3 years ago |
| Martin Kroeker | 5d24f3d210 | Update CONTRIBUTORS.md | 4 years ago |
| Martin Kroeker | 66a15e15a8 | Update CONTRIBUTORS.md | 4 years ago |
| Bine Brank | 19d435b1b3 | update armv8sve + contributors | 4 years ago |
| Bine Brank | cbcea149f0 | update contributors | 4 years ago |
| Bine Brank | ca65a4e91d | update CONTRIBUTORS.md | 4 years ago |
| River Dillon | ddb6cee0d5 | Contribution note | 4 years ago |
| Xianyi Zhang | 7834c10e2f | Add PingTouGe contribution credit. | 5 years ago |
| Marius Hillenbrand | f7731a358a | Update CONTRIBUTERS.md - clang build fixes for IBM z<br>Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com> | 5 years ago |
| 张丹枫 | 2a3aa91354 | update CONTRIBUTORS.md, adding myself | 5 years ago |
| Marius Hillenbrand | cb9dc36dd5 | Update CONTRIBUTORS.md<br>Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com> | 5 years ago |
| Marius Hillenbrand | d7c1677c20 | Update CONTRIBUTORS.md, adding myself<br>Signed-off-by: Marius Hillenbrand <mhillen@linux.ibm.com> | 5 years ago |
| Martin Kroeker | 3e28db7f38 | Update CONTRIBUTORS.md | 5 years ago |
| wjc404 | 9f5cdc49d4 | Update CONTRIBUTORS.md | 6 years ago |
| wjc404 | bb2729c855 | Update CONTRIBUTORS.md | 6 years ago |
| wjc404 | aae44d040d | Update CONTRIBUTORS.md | 6 years ago |
| wjc404 | 312060d0d6 | Update CONTRIBUTORS.md | 6 years ago |
| wjc404 | 3ce6bcdb5f | Update CONTRIBUTORS.md | 6 years ago |
| wjc404 | 6fbe51072b | Update CONTRIBUTORS.md | 6 years ago |
| AbdelRauf | 0f105dd8a5 | sgemm/strmm | 6 years ago |
| Abdurrauf | 1cfdb2295d | Optimized standard Blas Level-1,2 (excluding nrm2 functions) for z13 (double precision) | 8 years ago |
| Abdurrauf | 08786c4b95 | strmm and ctrmm | 9 years ago |
| Abdurrauf | 0d96b0e2a7 | Merge branch 'z13' into develop | 9 years ago |