nihuini | 9b5cb959b9 | auto convert int8 to fp32 on extract | 5 years ago
nihuini | 00b0094b2f | copy-pasta makes msvc happy | 5 years ago
nihuini | 9a12597dff | build ncnn quantization tool even opencv not found | 5 years ago
nihui | 49f3e1ea09 | drawing api and stb_image (#2913) | 5 years ago
  * drawing api
  * add drawing test
  * yuv420sp drawing
  * enable simpleocv in webassembly build
nihui | ad37c34d25 | disable NCNN_ARM82DOT whenever NCNN_ARM82 disabled | 5 years ago
FusionBolt | 322812de2d | Fix typo (#2922) | 5 years ago
Cai Shanli | 8cc8cd716a | Add get input and output names (#2890) | 5 years ago
nihuini | afc02d57f9 | runtime detect armv8.2 dotprod | 5 years ago
nihui | 5f62fdec87 | allow more concurrent gpu submits on device with low queue count | 5 years ago
nihui | 5d6f03dbbd | asm word for risc-v without v | 5 years ago
nihui | a1b06baec8 | fix build | 5 years ago
nihui | 17936e9f54 | fix packing risc-v test, add cpu_riscv_vlenb() | 5 years ago
nihui | da40cc6ec8 | risc-v v optimization for math functions of all LMUL | 5 years ago
nihuini | d0d8120335 | force lrn vulkan accumulator always use fp32, fix #2882 | 5 years ago
nihuini | 7a1e015d72 | fix bias error in conv1x1s1 sgemm pack8to4 fp16sa, fix #2880 | 5 years ago
nihui | 11958424c2 | runtime riscv v and zfh dispatch, riscv v optimization for cast | 5 years ago
nihui | e87af6d020 | riscv v optimization for log_ps sincos_ps tanh_ps, mish, swish, tanh, unaryop, enable fp16sa | 5 years ago
nihui | 37aeba2369 | riscv v optimization for exp_ps, sigmoid | 5 years ago
nihui | 1c26291757 | more verbose hint for find_blob_index_by_name failure | 5 years ago
nihui | e8f92e4a04 | riscv v optimization for packing, add rvv071 compatibility header | 5 years ago
nihuini | 34bd5ef161 | update eq quant info | 5 years ago
nihuini | 72ef77a469 | fix build with NCNN_STRING off and NCNN_VULKAN on | 5 years ago
zhiliu6 | fb9d529487 | fix compile error when NCNN_STRING is disabled (#2874) | 5 years ago
songqun | ad1012bcda | add comment for alpha beta in hardswish and hardsigmoid compared to tf/pytorch implementation (#2859) | 5 years ago
Zhang Xianyi | a1ece94f51 | Use RVV spec 0.7.1 for C906. (#2868) | 5 years ago
nihui | 45bf3cd779 | add runtime riscv v detection function, the initial c906 riscv linux toolchain | 5 years ago
nihui | bdb3aa5657 | fix build | 5 years ago
nihui | a61f03ec76 | arm neon optimization for pixelshuffle scale 2 | 5 years ago
nihui | 81be8e235c | workaround macos intel dummy image readonly issue, fix #2548 (#2864) | 5 years ago
nihuini | d6b2ea5aac | arm neon optimization for convolution 3x3 on small channels | 5 years ago
nihuini | 6a397716ca | arm neon optimization for instancenorm | 5 years ago
nihuini | 687cc857b1 | some x86 sse2 optimization for convolution int8 | 5 years ago
zhiliu6 | ec0f904c16 | improve x86 1x1 pack8 convolution performance (#2852) | 5 years ago
nihuini | 68468dccbd | arm neon assembly optimization for padding int8 pack8, convolution int8 out elempack 4 | 5 years ago
nihuini | 31d436c627 | more verbose load failure, ncnn2int8 write int8 data properly | 5 years ago
nihuini | 05d457c78f | innerproduct int8 support all fused activation types | 5 years ago
nihuini | 1bc0126302 | fix crash when input cpu blob and extract the same from gpu, update vgg16 int8 model | 5 years ago
nihui | 7e1aaa5828 | cmake option NCNN_INT8 (#2839) | 5 years ago
nihui | 66455c1b95 | implement 2823 binary broadcasting type (#2827) | 5 years ago
nihuini | 85efe132ff | unroll inch 4 for convolution sgemm int8 | 5 years ago
nihui | c6cd5e8628 | fix armv7 no-neon build | 5 years ago
nihuini | e38a5fcbe6 | fix build | 5 years ago
nihuini | 01f5dcb700 | arm neon optimization for convolution sgemm pack1 pack8to1 int8 | 5 years ago
zhiliu6 | c4700c52ca | optimize x86 1x1 pack8 convolution (#2820) | 5 years ago
nihui | 0d1d5b66c5 | fix arm64 asm build | 5 years ago
nihuini | e9ab1acf27 | arm neon optimization for convolution sgemm pack1to8 int8 | 5 years ago
nihuini | e975de1f36 | better condition for arm82 conv3x3s1 winograd | 5 years ago
nihuini | 41a4bea954 | unroll size 8 for conv3x3s1 pack8to1 int8 arm64 | 5 years ago
nihuini | 3631c1933d | non-inlined addref and release slows down overall speed, move them to header | 5 years ago
nihui | e9cc637573 | arm neon optimization for int8 packing kernels (#2809) | 5 years ago