Rohit Goswami
|
628a096e91
|
BLD: Re-work the L2 gemv
|
1 year ago |
Rohit Goswami
|
bca90dcb1f
|
BLD,BUG: Fix an extent issue
|
1 year ago |
Rohit Goswami
|
8401ac168e
|
BLD: Add more kernels
|
1 year ago |
Rohit Goswami
|
9aed88fd0f
|
BLD: Generate L1 symbol flags correctly
|
1 year ago |
Rohit Goswami
|
f2b5996df1
|
BLD: Add the ? variant for kernel
|
1 year ago |
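The "?" in the commit above presumably follows the usual BLAS/LAPACK documentation convention, where "?" stands for the precision prefix (as in ?gemm for sgemm/dgemm/cgemm/zgemm). The C sketch below illustrates only that name expansion. The helper `expand_variant` and the prefix list `PRECISION_PREFIXES` are hypothetical. Including `q` (extended precision) is an assumption based on the "q variants" commit further down; the list is not the set of prefixes the meson build actually generates.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical prefix list (assumption, not taken from the OpenBLAS build):
 * s/d = single/double real, q = extended-precision real, c/z = complex. */
static const char PRECISION_PREFIXES[] = {'s', 'd', 'q', 'c', 'z'};
#define N_PRECISIONS (sizeof PRECISION_PREFIXES / sizeof PRECISION_PREFIXES[0])

/* Hypothetical helper: copy tmpl into out with its first '?' replaced by
 * prefix. Returns 0 on success, -1 if tmpl has no '?' or out is too small. */
int expand_variant(const char *tmpl, char prefix, char *out, size_t outlen) {
    assert(tmpl != NULL && out != NULL);
    const char *slot = strchr(tmpl, '?');
    size_t len = strlen(tmpl);
    if (slot == NULL || len + 1 > outlen) return -1;
    memcpy(out, tmpl, len + 1);          /* copies the terminating NUL too */
    out[slot - tmpl] = prefix;           /* fill the precision slot */
    return 0;
}
```

For example, `expand_variant("?copy", 'd', buf, sizeof buf)` fills `buf` with `dcopy`. A build script could loop over the prefix list in the same way to emit one kernel target per precision.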
Rohit Goswami
|
ff2d1d849d
|
ENH: Use the kernel style
Necessary to extend this to L2/L3
|
1 year ago |
Rohit Goswami
|
168f32fe78
|
MAINT: Cleanup kernel meson
|
1 year ago |
Rohit Goswami
|
cca7523d72
|
MAINT: Fix filepaths for q variants [L1]
|
1 year ago |
Rohit Goswami
|
e5cb411be5
|
MAINT: Cleanup a bit
|
1 year ago |
Rohit Goswami
|
90246ce9d7
|
MAINT: Minor refactors to have common precisions
|
1 year ago |
Rohit Goswami
|
6f5211ed69
|
MAINT: Simplify and generalize
|
1 year ago |
Rohit Goswami
|
ceac74c1ac
|
MAINT: Move the precisions out to main meson.build
|
1 year ago |
Rohit Goswami
|
4f00b6f8b2
|
MAINT: Cleanup makefile to meson for parallel opt
Needs some work
|
1 year ago |
Rohit Goswami
|
84aa94575b
|
MAINT: Cleanup undefined symbols
|
1 year ago |
Rohit Goswami
|
604933d231
|
MAINT,BLD: Cleanup SIMD with meson arrays
|
1 year ago |
Rohit Goswami
|
18732402d3
|
MAINT: Move -m64 out to cpu_family()
|
1 year ago |
Rohit Goswami
|
2da1c2444f
|
MAINT: Add simd flags
|
1 year ago |
Rohit Goswami
|
553ca0fb67
|
MAINT: Generalize and setup F_INTERFACE
|
1 year ago |
Rohit Goswami
|
32567edbcc
|
MAINT: Rework make defines to meson arguments
For SMALL_MATRIX_OPT and MAX_STACK_ALLOC
|
1 year ago |
Rohit Goswami
|
e06834c5cc
|
TMP: Focus on getting a single test example up
Use:
nm -gC bbdir/libopenblas.a | grep drot
❯ gcc trial.c -o trail -I$(pwd)/tmpmake/include -L$(pwd)/bbdir -lopenblas -Wl,--verbose | grep openblas
❯ ./trail
Resulting vectors:
x: 3.000000 4.000000 5.000000 6.000000
y: 2.000000 2.000000 2.000000 2.000000
|
1 year ago |
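The commit above checks `drot` by linking a small `trial.c` against the freshly built `libopenblas.a`. Neither `trial.c` nor its input vectors are part of this log. A minimal C reference of what `drot` computes (the standard BLAS plane rotation) can serve as a cross-check for such a test. `ref_drot` is a hypothetical name, and the sketch assumes unit strides (`incx = incy = 1`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference sketch of the BLAS plane rotation, unit strides only:
 *   x[i] <- c*x[i] + s*y[i]
 *   y[i] <- c*y[i] - s*x[i]   (using the original x[i]) */
void ref_drot(size_t n, double *x, double *y, double c, double s) {
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; i++) {
        double xi = x[i], yi = y[i];   /* keep originals before overwriting */
        x[i] = c * xi + s * yi;
        y[i] = c * yi - s * xi;
    }
}
```

With `c = 0` and `s = 1`, the rotation moves the old `y` into `x` and the old `-x` into `y`. That gives a quick sanity check against the library result. With `c = 1` and `s = 0`, both vectors should come back unchanged.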
Rohit Goswami
|
8947604447
|
BLD: Add generic BLAS2 modes
|
1 year ago |
Rohit Goswami
|
844cb7a68f
|
ENH: Add more L2 flags
|
1 year ago |
Rohit Goswami
|
bd43398df8
|
BLD: Add swap and refactor a bit
|
1 year ago |
Rohit Goswami
|
d5da5164e4
|
TMP: Be more DRY
|
1 year ago |
Rohit Goswami
|
ade3f82c73
|
ENH: Start abstracting rules for kernels
|
1 year ago |
Martin Kroeker
|
5d08ec7ff3
|
Merge pull request #4782 from martin-frbg/azurewincl
Fix NAN handling in ARM/generic SCAL; have AzureCI Windows show errors on failure
|
1 year ago |
Chip Kerchner
|
cb154832f8
|
Vectorize SBGEMM incopy - 4x faster.
|
1 year ago |
Martin Kroeker
|
a5c04e326a
|
Update scal.c
|
1 year ago |
Martin Kroeker
|
536200bc9e
|
fix handling of INF or NAN
|
1 year ago |
Martin Kroeker
|
3677b3886c
|
Merge pull request #4702 from bashimao/detect-nv-grace
Correctly detect ARM Neoverse V2 CPUs.
|
1 year ago |
Martin Kroeker
|
f3c364c2cc
|
temporarily(?) disable the alpha=0 branch as it fails to handle INF,NAN
|
1 year ago |
Martin Kroeker
|
2a5fe97e3b
|
temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN
|
1 year ago |
Martin Kroeker
|
c1019d5832
|
Handle INF and NAN in inputs
|
1 year ago |
Martin Kroeker
|
9e24121e7e
|
temporarily(?) disable da=0 shortcut to handle x=Inf or NAN
|
1 year ago |
Martin Kroeker
|
a11f086c17
|
Update sscal_msa.c
|
1 year ago |
Martin Kroeker
|
541e1b6959
|
disable the fast path for inc=1, alpha=0 as it does not handle x=NaN or Inf
|
1 year ago |
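Many of the SCAL-related commits in this range disable an `alpha = 0` (or `da = 0`) shortcut. A few lines of C show why. A shortcut that stores `0.0` directly turns a `NaN` or `Inf` already in `x` into zero. A plain multiply follows IEEE 754 instead, where `0 * NaN` and `0 * Inf` are both `NaN`. The two function names below are hypothetical. The sketch assumes the intended semantics are those of the plain multiply loop in the reference BLAS, not those of any specific OpenBLAS kernel.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Sketch of the problematic fast path: with alpha == 0 it stores 0.0
 * directly, so a NaN or Inf already in x silently becomes 0. */
void scal_with_shortcut(size_t n, double alpha, double *x) {
    assert(n == 0 || x != NULL);
    for (size_t i = 0; i < n; i++)
        x[i] = (alpha == 0.0) ? 0.0 : alpha * x[i];
}

/* Plain multiply: under IEEE 754, 0*NaN = NaN and 0*Inf = NaN,
 * so non-finite inputs propagate to the output. */
void scal_plain(size_t n, double alpha, double *x) {
    assert(n == 0 || x != NULL);
    for (size_t i = 0; i < n; i++)
        x[i] = alpha * x[i];
}
```

When compiled without `-ffast-math` (which lets the compiler assume no NaN/Inf values occur), `scal_plain` with `alpha = 0` leaves `NaN` in both non-finite slots, while `scal_with_shortcut` returns all zeros.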
Martin Kroeker
|
c08113c279
|
fix special cases of x = NAN or INF
|
1 year ago |
Martin Kroeker
|
bd47630bcf
|
exclude the alpha=0 branch as it does not handle NaN or Inf in x
|
1 year ago |
Martin Kroeker
|
68f2501958
|
temporarily(?) disable the alpha=0 branch to handle Inf/NaN in x
|
1 year ago |
Martin Kroeker
|
0a744a939a
|
temporarily(?) disable the alpha=0 branch to handle NaN/Inf in x
|
1 year ago |
Martin Kroeker
|
7f8f037a36
|
handle INF and NAN in input
|
1 year ago |
Martin Kroeker
|
f1248b849d
|
handle INF and NAN in input
|
1 year ago |
Martin Kroeker
|
a2ee4b1966
|
Merge branch 'OpenMathLib:develop' into issue4728
|
1 year ago |
Martin Kroeker
|
3ec59922b6
|
Add a clobber list to fix utest errors seen with gcc13 on Apple M
|
1 year ago |
Martin Kroeker
|
3d8054fb16
|
add clobber list
|
1 year ago |
Martin Kroeker
|
dd7efcf9ef
|
Avoid exceeding the configured thread count in x86_64 TOBF16 (#4748)
* avoid setting nthreads higher than available
|
1 year ago |
Martin Kroeker
|
6ffaf99817
|
disable da=0 shortcut to handle NAN and INF correctly
|
1 year ago |
Martin Kroeker
|
c7cacd9b38
|
disable the shortcut for da=0 to ensure proper handling of INF and NAN
|
1 year ago |
Martin Kroeker
|
5ed4f24d6e
|
Handle corner cases with INF and NAN arguments
|
1 year ago |
Martin Kroeker
|
2bd43ad0eb
|
Merge branch 'OpenMathLib:develop' into issue4728
|
1 year ago |