nihuini
d6b2ea5aac
arm neon optimization for convolution 3x3 on small channels
5 years ago
nihuini
6a397716ca
arm neon optimization for instancenorm
5 years ago
nihuini
687cc857b1
some x86 sse2 optimization for convolution int8
5 years ago
zhiliu6
ec0f904c16
improve x86 1x1 pack8 convolution performance ( #2852 )
5 years ago
nihuini
68468dccbd
arm neon assembly optimization for padding int8 pack8, convolution int8 out elempack 4
5 years ago
nihuini
31d436c627
more verbose load failure, ncnn2int8 write int8 data properly
5 years ago
nihuini
05d457c78f
innerproduct int8 support all fused activation types
5 years ago
nihuini
1bc0126302
fix crash when input cpu blob and extract the same from gpu, update vgg16 int8 model
5 years ago
nihui
7e1aaa5828
cmake option NCNN_INT8 ( #2839 )
5 years ago
nihui
66455c1b95
implement 2823 binary broadcasting type ( #2827 )
5 years ago
nihuini
85efe132ff
unroll inch 4 for convolution sgemm int8
5 years ago
nihui
c6cd5e8628
fix armv7 no-neon build
5 years ago
nihuini
e38a5fcbe6
fix build
5 years ago
nihuini
01f5dcb700
arm neon optimization for convolution sgemm pack1 pack8to1 int8
5 years ago
zhiliu6
c4700c52ca
optimize x86 1x1 pack8 convolution ( #2820 )
5 years ago
nihui
0d1d5b66c5
fix arm64 asm build
5 years ago
nihuini
e9ab1acf27
arm neon optimization for convolution sgemm pack1to8 int8
5 years ago
nihuini
e975de1f36
better condition for arm82 conv3x3s1 winograd
5 years ago
nihuini
41a4bea954
unroll size 8 for conv3x3s1 pack8to1 int8 arm64
5 years ago
nihuini
3631c1933d
non-inlined addref and release slows down overall speed, move them to header
5 years ago
nihui
e9cc637573
arm neon optimization for int8 packing kernels ( #2809 )
5 years ago
nihui
d7cbc055f3
fix illegal instruction on pi4 when NCNN_ARM82 enabled
the compiler may emit inline member functions as non-inlined blocks for different architectures, and the linker may pick the copy built for the newer arch, which results in illegal instructions on old hardware
5 years ago
nihuini
256754bff9
fix build with old gcc, fix #2805
5 years ago
zhiliu6
61cd9da55b
optimize x86 3x3 pack8 leftover ( #2797 )
5 years ago
nihuini
912e81d086
fix tanh neon, fix #2751
5 years ago
nihui
32b48f0157
fix int8 auto pack layout
5 years ago
nihui
1ea8bfbd2e
x86 avx2 conv3x3s1 pack8 direct optimization, fix #2789
5 years ago
nihui
5fe75f19ef
architecture changes for int8 packing ( #2771 )
* quantize and dequantize tests
* unify activation and usability function
* drop NCNN_REQUANT cmake option, test dequantize requantize pack8, fix webassembly build
* benchmark use requantize int8 model
5 years ago
nihui
d4a7abc218
fix onnx2ncnn clip without max blob, fix #2788
5 years ago
nihui
eaee64c782
export vkcommand and vkcompute
5 years ago
nihuini
e86799e95f
fix get_big_cpu_count return zero on smp cpu
5 years ago
nihui
7c079b853e
default to big cpu count
5 years ago
restyled-io[bot]
5f00ba89d2
feat(ncnnoptimize): replace denormals with zero on layers with weights ( #2690 )
* feat(ncnnoptimize): replace denormals with zero on layers with weights
Co-authored-by: youngsoo.lee <youngsoo15.lee@gmail.com>
Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
nihui
67e24e0703
use local pool allocator ( #2736 )
* use local pool allocator
* detach extract feat from local allocator
* fix test
5 years ago
nihuini
15d63ec0f5
fuse onnx multiheadattention with same qkv blob
5 years ago
Cai Shanli
f5b307689b
fix net and extractor destroy order when use vulkan ( #2732 )
5 years ago
RBelogorodtsevFBase
1212ed6e94
implements gelu activation ( #2749 )
5 years ago
nihui
b58cd14678
fix non arm-neon build
5 years ago
nihui
0870bf45b1
optimize warpaffine family
5 years ago
nihuini
c17eb4e208
multiheadattention layer
5 years ago
nihuini
b51959802c
fix buffer2host copy, fix #2725
5 years ago
nihuini
7ac23ab34d
fuse onnx layernorm, fix 2-dim layernorm implementation, add test
5 years ago
zhiliu6
57397c418d
Optimize general AVX2 convolution. ( #2714 )
5 years ago
Xu Yang
fd634e9a58
remove unnecessary mat clone when NCNN_BENCHMARK enabled ( #2708 )
5 years ago
Dahan Gong
cbd410c237
fix broken inplace forward ( #2709 )
5 years ago
restyled-io[bot]
8c9bea2322
Restyle faster bbox calculation by background score ( #2693 )
* faster bbox calculation by background score
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
Co-authored-by: Qoo <r97922153@gmail.com>
Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
zylo117
41fba71fa0
fix adaptive avg pooling accumulation overflow in vulkan using fp16 arithmetic ( #2698 )
5 years ago
nihui
3c92a1184b
arm neon optimization for general convolution im2col sgemm ( #2668 )
* arm neon optimization for conv3x3s1 winograd42
* better condition
* Update test_convolution.cpp
* Update test_convolution.cpp
* more proper conditions
* arm neon optimization for general im2col sgemm pack4
* add sgemm
* wip
* wip
* fix armv7 build
* more conditions blah blah
* code format
* fix convolution
* move packed convolution to separated header source
* unify weight data bf16
* proper conditions
* conv3x3s2 sgemm pack4 test
5 years ago
zylo117
65d71d8f23
support adaptive_pooling in vulkan implementation ( #2681 )
5 years ago
Youngsoo Lee
b9bed8d993
feat: add denormal options ( #2656 )
* feat: add denormal options
Flush-To-Zero (FTZ) and Denormals-Are-Zero (DAZ) are modes that bypass the IEEE 754 handling of denormal floating-point numbers on x86_64 and some x86 CPUs.
* feat: Integrate `flush_denormals` into `Extractor::extract`
* chore: replace global variable with `ThreadLocalStorage`
5 years ago