Mateusz Sokół
|
4673fd0dcb
|
BLD: Generate config.h file
|
1 year ago |
Rohit Goswami
|
ae14071a29
|
MAINT: Rework from make log
|
1 year ago |
Rohit Goswami
|
ef6e060813
|
MAINT: Try a working set
|
1 year ago |
Rohit Goswami
|
f0d6bf15dc
|
MAINT: Fixup and use the new .S dep
|
1 year ago |
Rohit Goswami
|
48b68c2c62
|
BUG,ENH: Fix .S handling
TMP: Fixup
|
1 year ago |
Rohit Goswami
|
f30f33bed3
|
MAINT: Add more symbols for the test
|
1 year ago |
Rohit Goswami
|
a91f69940b
|
MAINT: Quick and dirty working set of symbols
"Working" only in the sense that it has enough symbols for the import; it
currently fails NumPy tests, including dot matrix multiplications...
|
1 year ago |
Rohit Goswami
|
dda2122726
|
MAINT: Add the _beta variant of gemm
|
1 year ago |
Rohit Goswami
|
a198203992
|
MAINT: Add the gemm_small_kernel variants
with and without b0
|
1 year ago |
Rohit Goswami
|
5e9b81eed6
|
ENH: Add more L3 symbols
|
1 year ago |
Rohit Goswami
|
91feee93e2
|
ENH: Add TRMM_KERNEL bindings
|
1 year ago |
Rohit Goswami
|
325c8721b5
|
MAINT: Start adding L3
|
1 year ago |
Rohit Goswami
|
b7b42ac7b8
|
MAINT: Start working on kernels and driver L2
|
1 year ago |
Rohit Goswami
|
52668a982a
|
BLD: Start working on L3
|
1 year ago |
Rohit Goswami
|
facd09848a
|
BLD: Finalize the generic and L1, L2
|
1 year ago |
Rohit Goswami
|
98099aa559
|
BLD: Fixup more L1 from Kernel generic
|
1 year ago |
Rohit Goswami
|
628a096e91
|
BLD: Re-work the L2 gemv
|
1 year ago |
Rohit Goswami
|
bca90dcb1f
|
BLD,BUG: Fix an extent issue
|
1 year ago |
Rohit Goswami
|
8401ac168e
|
BLD: Add more kernels
|
1 year ago |
Rohit Goswami
|
9aed88fd0f
|
BLD: Generate L1 symbol flags correctly
|
1 year ago |
Rohit Goswami
|
f2b5996df1
|
BLD: Add the ? variant for kernel
|
1 year ago |
Rohit Goswami
|
ff2d1d849d
|
ENH: Use the kernel style
Necessary to extend this to L2/L3
|
1 year ago |
Rohit Goswami
|
168f32fe78
|
MAINT: Cleanup kernel meson
|
1 year ago |
Rohit Goswami
|
cca7523d72
|
MAINT: Fix filepaths for q variants [L1]
|
1 year ago |
Rohit Goswami
|
e5cb411be5
|
MAINT: Cleanup a bit
|
1 year ago |
Rohit Goswami
|
90246ce9d7
|
MAINT: Minor refactors to have common precisions
|
1 year ago |
Rohit Goswami
|
6f5211ed69
|
MAINT: Simplify and generalize
|
1 year ago |
Rohit Goswami
|
ceac74c1ac
|
MAINT: Move the precisions out to main meson.build
|
1 year ago |
Rohit Goswami
|
4f00b6f8b2
|
MAINT: Cleanup makefile to meson for parallel opt
Needs some work
|
1 year ago |
Rohit Goswami
|
84aa94575b
|
MAINT: Cleanup undefined symbols
|
1 year ago |
Rohit Goswami
|
604933d231
|
MAINT,BLD: Cleanup SIMD with meson arrays
|
1 year ago |
Rohit Goswami
|
18732402d3
|
MAINT: Move -m64 out to cpu_family()
|
1 year ago |
Rohit Goswami
|
2da1c2444f
|
MAINT: Add simd flags
|
1 year ago |
Rohit Goswami
|
553ca0fb67
|
MAINT: Generalize and setup F_INTERFACE
|
1 year ago |
Rohit Goswami
|
32567edbcc
|
MAINT: Rework make defines to meson arguments
For SMALL_MATRIX_OPT and MAX_STACK_ALLOC
|
1 year ago |
Rohit Goswami
|
e06834c5cc
|
TMP: Focus on getting a single test example up
Use:
nm -gC bbdir/libopenblas.a | grep drot
❯ gcc trial.c -o trail -I$(pwd)/tmpmake/include -L$(pwd)/bbdir -lopenblas -Wl,--verbose | grep openblas
❯ ./trail
Resulting vectors:
x: 3.000000 4.000000 5.000000 6.000000
y: 2.000000 2.000000 2.000000 2.000000
|
1 year ago |
Rohit Goswami
|
8947604447
|
BLD: Add generic BLAS2 modes
|
1 year ago |
Rohit Goswami
|
844cb7a68f
|
ENH: Add more L2 flags
|
1 year ago |
Rohit Goswami
|
bd43398df8
|
BLD: Add swap and refactor a bit
|
1 year ago |
Rohit Goswami
|
d5da5164e4
|
TMP: Be more DRY
|
1 year ago |
Rohit Goswami
|
ade3f82c73
|
ENH: Start abstracting rules for kernels
|
1 year ago |
Martin Kroeker
|
5d08ec7ff3
|
Merge pull request #4782 from martin-frbg/azurewincl
Fix NAN handling in ARM/generic SCAL; have AzureCI Windows show errors on failure
|
1 year ago |
Chip Kerchner
|
cb154832f8
|
Vectorize SBGEMM incopy - 4x faster.
|
1 year ago |
Martin Kroeker
|
a5c04e326a
|
Update scal.c
|
1 year ago |
Martin Kroeker
|
536200bc9e
|
fix handling of INF or NAN
|
1 year ago |
Martin Kroeker
|
3677b3886c
|
Merge pull request #4702 from bashimao/detect-nv-grace
Correctly detect ARM Neoverse V2 CPUs.
|
1 year ago |
Martin Kroeker
|
f3c364c2cc
|
temporarily(?) disable the alpha=0 branch as it fails to handle INF,NAN
|
1 year ago |
Martin Kroeker
|
2a5fe97e3b
|
temporarily(?) disable the alpha=0 branch as it does not handle INF,NAN
|
1 year ago |
Martin Kroeker
|
c1019d5832
|
Handle INF and NAN in inputs
|
1 year ago |
Martin Kroeker
|
9e24121e7e
|
temporarily(?) disable da=0 shortcut to handle x=Inf or NAN
|
1 year ago |