Yoh
bd15e32517
move x86 abs_mathfun to x86_mathfun ( #4659 )
* move x86 abs_mathfun to x86_mathfun
* Improve the implementation of AVX and AVX512F abs.
* Unified naming rules
---------
Co-authored-by: MouriNaruto <Mouri_Naruto@Outlook.com>
3 years ago
nihui
e006aa8007
fix extension not present error ( #4655 )
3 years ago
Kenji Mouri
f2a5a81a5d
Add AVX512F implementation of atan2 in x86 targets. ( #4641 )
3 years ago
nihui
a2106f840f
setup more extension entrypoint ( #4636 )
3 years ago
Kenji Mouri
d802acd205
Add SSE and AVX implementation of atan2 in x86 targets. ( #4633 )
3 years ago
張小凡
d87e895a1f
Add get_gpu_instance() function and organize the instance class code. ( #4630 )
3 years ago
Darren Cheng
176f8e12cb
remove platform.h.in aarch64 judgment ( #4628 )
3 years ago
張小凡
772b13a1d1
Add three extension capability support checks ( #4626 )
* Add some extension capability for vma
3 years ago
Kenji Mouri
1936142aae
Add AVX and AVX512F implementation of asin, acos and atan in x86 targets. Fix typo for SSE2 implementation of asin in x86 targets. ( #4621 )
3 years ago
caofx0418
643107533a
fix errors while building in Android Open Source Code ( #4622 )
3 years ago
Kenji Mouri
5ca5209cd5
Add FMA optimization for SSE2 implementation of asin, acos and atan in x86 targets. ( #4620 )
3 years ago
AlOa
22e86402f7
Read image from memory buffer for simpleocv ( #4557 )
3 years ago
nihui
db628b1b99
allow overwriting built-in layer with custom layer ( #4616 )
3 years ago
nihui
1133a18ca8
x86 and arm optimization for convolution1d packed unified elempack ( #4615 )
3 years ago
nihui
85991e2e0e
test custom option, update ci ( #4609 )
* early return for cpu test
* make nvidia driver happy
* fix gemm x86 threading
3 years ago
nihui
f34becf6fc
fix divide by zero in get optimal tile size mnk ( #4610 )
3 years ago
nihui
2ce77ba918
fix mha gemm allocator race ( #4611 )
3 years ago
nihui
804ac3421d
infrastructure and optimization for a53 and a55 ( #4596 )
* new api for detecting arm midr and a53 a55 arch info wrapper
* let a35 be a53 :P
* a53 bf16s
* detect running core
3 years ago
nihui
a961ab992e
arm deconv matmul use gemm ( #4594 )
* arm deconv matmul use gemm
* reduce gemm armv7 register uses
3 years ago
nihui
254eb8d0d4
blacklist fp16a on old adreno driver ( #4587 )
3 years ago
nihui
06b97d7e69
fix exynos 9810 isa detection ( #4585 )
3 years ago
nihui
5ac17df797
arm optimization for packed convolution unified elempack ( #4590 )
3 years ago
nihui
010d6772d6
softmax arm unified elempack and bf16/fp16 optimization ( #4582 )
* mha arm use softmax fp16
3 years ago
nihui
c777bf09dc
arm convolution sgemm unified elempack ( #4572 )
* fuse im2col and packb tile
3 years ago
nihui
6987efd950
fix scale avx512 ( #4580 )
3 years ago
Kenji Mouri
47879ea7ea
Add SSE2 implementation of atan in x86 targets. ( #4575 )
3 years ago
Kenji Mouri
b314b3543d
Add SSE2 implementation of acos in x86 targets. ( #4573 )
3 years ago
Kenji Mouri
328d2ca2c4
Add SSE2 implementation of asin in x86 targets. ( #4570 )
3 years ago
Yoh
7573faae52
move floor and ceil sse_function from unaryOp to sse_mathfun ( #4566 )
3 years ago
nihui
dabc4c065f
arm convolution winograd unified elempack ( #4556 )
* update f43 coeffs
* arm convolution winograd unified elempack
* disable bf16s test atm
* test gnu inline asm off
3 years ago
nihui
6f08ec7397
use full date for macos pypi package ( #4552 )
* use full date for pypi package
* split version date string only for dylib
3 years ago
WuJinxuan
ff80ac2955
[ARM] Multiheadattention ( #4463 )
3 years ago
nihui
bbc770079e
silence fopen error on sysfs cache files
3 years ago
nihui
47ea2877ed
stb and emsdk update ( #4536 )
* stb_image_write 1.16
* stb_image v2.28
* update emsdk 3.1.28
* enable stb arm neon
* update doc
Co-authored-by: ncnnnnn <67086033+ncnnnnn@users.noreply.github.com>
3 years ago
nihui
d0c2738043
update riscv winograd f43 coeffs and fix some warnings ( #4537 )
* update winograd f43 coeffs
* rvv tanh rework
* fix warnings
* rebuild qemu
3 years ago
WuJinxuan
6572da3533
[x86] GroupNorm ( #4471 )
Co-authored-by: EdVince <EdVince@users.noreply.github.com>
3 years ago
nihui
833f6ed8e4
c api for getting output indexes and names ( #4534 )
3 years ago
nihui
1832da8292
concat 4d ( #4528 )
3 years ago
nihui
fb9cf7982d
eltwise 4d ( #4529 )
3 years ago
nihui
32e2de015e
slice 4d ( #4525 )
3 years ago
nihui
fc6ce4a641
copyto operator ( #4522 )
3 years ago
nihui
242e775d21
pnnx convert torch log10, pow 2 as square ( #4518 )
3 years ago
nihui
246e71c526
implement atan2 ( #4516 )
3 years ago
Fangjun Kuang
92e75105c9
Support torch.cumsum ( #4505 )
3 years ago
nihui
ab4cfbf5b0
enrich ncnn binary broadcast rules ( #4513 )
3 years ago
nihui
6869c81ed3
find cpu cache size from sysfs ( #4502 )
* find cpu cache size from sysfs
* android l3
* make g_thread_affinity_mask singleton
* global mask
3 years ago
nihui
17197b3c45
ci build with musl libc ( #4499 )
3 years ago
nihui
ce6b80a16b
pnnx flatten input tuple list ( #4498 )
3 years ago
nihui
3b36656bc8
reduce vulkan winograd f43 transform shader register pressure ( #4496 )
3 years ago
nihui
dfbcd3e69b
improve vulkan winograd f43 fp16 numerical stability ( #4492 )
3 years ago