Kenji Mouri
|
d802acd205
|
Add SSE and AVX implementation of atan2 in x86 targets. (#4633)
|
3 years ago |
張小凡
|
d87e895a1f
|
Add get_gpu_instance() function and organize the instance class code. (#4630)
|
3 years ago |
Darren Cheng
|
176f8e12cb
|
remove platform.h.in aarch64 judgment (#4628)
|
3 years ago |
張小凡
|
772b13a1d1
|
Add support checks for three extension capabilities (#4626)
* Add some extension capabilities for vma
|
3 years ago |
Kenji Mouri
|
1936142aae
|
Add AVX and AVX512F implementation of asin, acos and atan in x86 targets. Fix typo for SSE2 implementation of asin in x86 targets. (#4621)
|
3 years ago |
caofx0418
|
643107533a
|
fix errors while building in Android Open Source Code (#4622)
|
3 years ago |
Kenji Mouri
|
5ca5209cd5
|
Add FMA optimization for SSE2 implementation of asin, acos and atan in x86 targets. (#4620)
|
3 years ago |
AlOa
|
22e86402f7
|
Read image from memory buffer for simpleocv (#4557)
|
3 years ago |
nihui
|
db628b1b99
|
allow overwriting built-in layer with custom layer (#4616)
|
3 years ago |
nihui
|
1133a18ca8
|
x86 and arm optimization for convolution1d packed unified elempack (#4615)
|
3 years ago |
nihui
|
85991e2e0e
|
test custom option, update ci (#4609)
* early return for cpu test
* make nvidia driver happy
* fix gemm x86 threading
|
3 years ago |
nihui
|
f34becf6fc
|
fix divide by zero in get optimal tile size mnk (#4610)
|
3 years ago |
nihui
|
2ce77ba918
|
fix mha gemm allocator race (#4611)
|
3 years ago |
nihui
|
804ac3421d
|
infrastructure and optimization for a53 and a55 (#4596)
* new api for detecting arm midr and a53 a55 arch info wrapper
* let a35 be a53 :P
* a53 bf16s
* detect running core
|
3 years ago |
nihui
|
a961ab992e
|
arm deconv matmul use gemm (#4594)
* arm deconv matmul use gemm
* reduce gemm armv7 register uses
|
3 years ago |
nihui
|
254eb8d0d4
|
blacklist fp16a on old adreno driver (#4587)
|
3 years ago |
nihui
|
06b97d7e69
|
fix exynos 9810 isa detection (#4585)
|
3 years ago |
nihui
|
5ac17df797
|
arm optimization for packed convolution unified elempack (#4590)
|
3 years ago |
nihui
|
010d6772d6
|
softmax arm unified elempack and bf16/fp16 optimization (#4582)
* mha arm use softmax fp16
|
3 years ago |
nihui
|
c777bf09dc
|
arm convolution sgemm unified elempack (#4572)
* fuse im2col and packb tile
|
3 years ago |
nihui
|
6987efd950
|
fix scale avx512 (#4580)
|
3 years ago |
Kenji Mouri
|
47879ea7ea
|
Add SSE2 implementation of atan in x86 targets. (#4575)
|
3 years ago |
Kenji Mouri
|
b314b3543d
|
Add SSE2 implementation of acos in x86 targets. (#4573)
|
3 years ago |
Kenji Mouri
|
328d2ca2c4
|
Add SSE2 implementation of asin in x86 targets. (#4570)
|
3 years ago |
Yoh
|
7573faae52
|
move floor and ceil sse_function from unaryOp to sse_mathfun (#4566)
|
3 years ago |
nihui
|
dabc4c065f
|
arm convolution winograd unified elempack (#4556)
* update f43 coeffs
* arm convolution winograd unified elempack
* disable bf16s test atm
* test gnu inline asm off
|
3 years ago |
nihui
|
6f08ec7397
|
use full date for macos pypi package (#4552)
* use full date for pypi package
* split version date string only for dylib
|
3 years ago |
WuJinxuan
|
ff80ac2955
|
[ARM] Multiheadattention (#4463)
|
3 years ago |
nihui
|
bbc770079e
|
silence fopen error on sysfs cache files
|
3 years ago |
nihui
|
47ea2877ed
|
stb and emsdk update (#4536)
* stb_image_write 1.16
* stb_image v2.28
* update emsdk 3.1.28
* enable stb arm neon
* update doc
Co-authored-by: ncnnnnn <67086033+ncnnnnn@users.noreply.github.com>
|
3 years ago |
nihui
|
d0c2738043
|
update riscv winograd f43 coeffs and fix some warnings (#4537)
* update winograd f43 coeffs
* rvv tanh rework
* fix warnings
* rebuild qemu
|
3 years ago |
WuJinxuan
|
6572da3533
|
[x86] GroupNorm (#4471)
Co-authored-by: EdVince <EdVince@users.noreply.github.com>
|
3 years ago |
nihui
|
833f6ed8e4
|
c api for getting output indexes and names (#4534)
|
3 years ago |
nihui
|
1832da8292
|
concat 4d (#4528)
|
3 years ago |
nihui
|
fb9cf7982d
|
eltwise 4d (#4529)
|
3 years ago |
nihui
|
32e2de015e
|
slice 4d (#4525)
|
3 years ago |
nihui
|
fc6ce4a641
|
copyto operator (#4522)
|
3 years ago |
nihui
|
242e775d21
|
pnnx convert torch log10, pow 2 as square (#4518)
|
3 years ago |
nihui
|
246e71c526
|
implement atan2 (#4516)
|
3 years ago |
Fangjun Kuang
|
92e75105c9
|
Support torch.cumsum (#4505)
|
3 years ago |
nihui
|
ab4cfbf5b0
|
enrich ncnn binary broadcast rules (#4513)
|
3 years ago |
nihui
|
6869c81ed3
|
find cpu cache size from sysfs (#4502)
* find cpu cache size from sysfs
* android l3
* make g_thread_affinity_mask singleton
* global mask
|
3 years ago |
nihui
|
17197b3c45
|
ci build with musl libc (#4499)
|
3 years ago |
nihui
|
ce6b80a16b
|
pnnx flatten input tuple list (#4498)
|
3 years ago |
nihui
|
3b36656bc8
|
reduce vulkan winograd f43 transform shader register pressure (#4496)
|
3 years ago |
nihui
|
dfbcd3e69b
|
improve vulkan winograd f43 fp16 numerical stability (#4492)
|
3 years ago |
weirdseed
|
503a8b921f
|
fix uninitialized gpu bug_buffer_image_load_zero value (#4493)
|
3 years ago |
nihui
|
d2d012dce5
|
x86 bfloat16 cast functions (#4491)
* simplify cast fp16 avx512 dispatch
* define sse4.1 macro on msvc avx+
|
3 years ago |
nihui
|
fed99fd35b
|
gemm output transpose, prepack c (#4479)
* mha is now permute and reshape free
* gemm user defined tile mnk param
|
3 years ago |
nihui
|
2e3e680d77
|
x86 optimization for packed convolution unified elempack (#4469)
|
3 years ago |