nihui
|
c6cda8d07c
|
arm neon optimization for requantize leakyrelu (#3144)
* arm neon optimization for requantize leakyrelu
* add missing changes
* Update test_requantize.cpp
* more test coverage
|
4 years ago |
nihuini
|
169614f732
|
fix build with NCNN_STDIO off
|
4 years ago |
nihui
|
da7b64b833
|
fix build on c906
|
4 years ago |
nihuini
|
11794675f3
|
apple a11 and a12 do not support armv8.2 dotprod, restore the fp16-only optimized path
|
4 years ago |
nihuini
|
b9460c6e8f
|
fix armv7 requantize relu, fix #3122
|
4 years ago |
Xavier Hsinyuan
|
2a5c672787
|
Add unittest and RVV optimized for SELU (#3114)
|
4 years ago |
Xavier Hsinyuan
|
d78add2acd
|
RVV optimized PReLU, with fp16 support (#3113)
|
4 years ago |
nihui
|
f9a16ea1ec
|
fix build on c906
|
4 years ago |
Xavier Hsinyuan
|
9933cc776a
|
RVV optimized HardSwish and HardSigmoid (#3108)
* RVV optimized HardSwish, with fp16 support
* RVV optimized HardSigmoid, with fp16 support
* apply code-format changes
Co-authored-by: thelastlin <thelastlin@users.noreply.github.com>
|
4 years ago |
nihui
|
d91cccfb55
|
apply code-format changes
|
4 years ago |
Xavier Hsinyuan
|
99440e67f7
|
RVV optimized binaryop, with fp16 support (#3097)
|
4 years ago |
nihui
|
2c4ae09604
|
fix #2961 (#3095)
|
4 years ago |
nihuini
|
affbefe311
|
some space cleanup, blob clone from allocator
|
4 years ago |
lsdustc
|
61af40cfbd
|
Fix InnerProduct gemm missing the offset of w in int8 forward (#3084)
|
4 years ago |
zhiliu6
|
190207a173
|
fix AVX2 compiled as AVX problem (#3090)
|
4 years ago |
nihui
|
052b2a1653
|
apply code-format changes
|
4 years ago |
chenxiemin
|
9bd8a50b96
|
fix vulkan memory leak issue (#3088)
|
4 years ago |
nihui
|
cdf45a6512
|
cmake option NCNN_BF16 (#3068)
|
4 years ago |
Tijmen Verhulsdonck
|
eaa7e24db6
|
Added ability to switch AVX/AVX2 during runtime (#3076)
|
4 years ago |
nihui
|
9391fae741
|
optimize arm neon tanh (#3079)
|
4 years ago |
deepage
|
5841f667a6
|
optimize arm neon tanh fma (#3078)
Co-authored-by: chao.tang <chao.tang@lynxi.com>
|
4 years ago |
nihui
|
06a7086daa
|
optimize arm neon exp log sincos fma (#3077)
|
4 years ago |
nihui
|
b413fd3a3d
|
auto code-format bot and disable restyled (#3075)
|
4 years ago |
Tijmen Verhulsdonck
|
a7f301a99d
|
Add clang compatibility (#3071)
* Add clang compatibility
Add ability to build NCNN lib on windows with clang GNU
* Restyled/pull 3071 (#16)
* [skip ci] Restyled by clang-format
* [skip ci] Restyled by astyle
* [skip ci] Restyled by clang-format
* [skip ci] Restyled by astyle
Co-authored-by: Restyled.io <commits@restyled.io>
|
4 years ago |
Zhuo Zhang
|
5049aaeabd
|
fix invalid read for yuv420sp2rgb on arm32 (#3066)
|
4 years ago |
nihui
|
927e34278c
|
mips msa optimization part2 (#3063)
* mips msa optimization for convolution 3x3 pack1to4, 7x7s2 pack1to4, dropout, eltwise, hardsigmoid, hardswish, packing, flatten, prelu
* prefetch convdw3x3 and convdw5x5
* fix mips convolution sgemm pack4to1
* more prefetch helps
|
4 years ago |
nihui
|
49cda73420
|
mips msa optimization for convolution sgemm and winograd (#3055)
* mips msa optimization for convolution sgemm and 3x3s1 winograd, use msa fmadd
* mips msa optimization for convolution sgemm pack4to1
* mips msa optimization for swish, improve sgemm kernel
* unroll 12x4, use prefetch
|
4 years ago |
nihui
|
8a40b98348
|
mips optimization (#3051)
* mips msa optimization for crop deconvolution deconvolutiondepthwise general cases
* mips msa activation support pack4
* mips msa optimization for concat slice interp
* mips msa optimization for binaryop unaryop
|
4 years ago |
nihui
|
90b38ba145
|
fix mips convolutiondepthwise
|
4 years ago |
nihui
|
a133f4be92
|
mips msa optimization for pooling general cases, fix test on ls2k
|
4 years ago |
nihui
|
1f49fc4b67
|
mips msa optimization for padding packing flatten innerproduct convolution convolutiondepthwise general cases
|
4 years ago |
DaydreamCoding
|
f42d0e5dc9
|
fix warpaffine_bilinear_yuv420sp uv matrix (#3048)
|
4 years ago |
nihui
|
b8e03ced3c
|
allow examples building with simpleocv
|
4 years ago |
nihui
|
f2d2a6d51b
|
drop fprintf
|
4 years ago |
nihuini
|
e63bb1f79e
|
riscv v optimization for convolution 3x3s1 3x3s2 7x7s2 pack1ton
|
4 years ago |
nihui
|
589609b37a
|
fix test on c906
|
4 years ago |
nihuini
|
41a64e5efe
|
riscv v optimization for convolution sgemm pack1ton packnto1
|
4 years ago |
nihui
|
b2e8a8d87d
|
split noop support riscv zfh
|
4 years ago |
nihui
|
4f135e07bf
|
implement convolution1d and pooling1d (#3035)
* implement convolution1d and pooling1d
* add conv1d pool1d test
* fuse convolution1d activation
* update operator doc
* fix vulkan adaptive pooling
|
4 years ago |
nihuini
|
e526934a48
|
riscv v optimization for convolutiondepthwise 3x3 5x5 packn
|
4 years ago |
nihui
|
abd0b26994
|
fix test on c906
|
4 years ago |
nihuini
|
44e60b8dd9
|
fix over allocated workspace blobs for fp16sa in convolution sgemm rvv
|
4 years ago |
nihuini
|
400aa23e57
|
riscv v optimization for convolution sgemm pack1
|
4 years ago |
nihuini
|
e9a7b5ac96
|
riscv v optimization for convolution 3x3s1 packn winograd42 and winograd64
|
4 years ago |
nihui
|
2cbece80ad
|
fix build on c906
|
4 years ago |
nihuini
|
c7ceee8768
|
riscv v optimization for convolution sgemm and conv1x1 packn
|
4 years ago |
nihui
|
64092d0c7c
|
mips msa optimization for log_ps and mish
|
4 years ago |
nihui
|
832fc3eb72
|
fix mips tanh
|
4 years ago |
nihui
|
1c31ac2549
|
runtime cpu dispatch for mips msa and loongson mmi
|
5 years ago |
nihui
|
2f70343aec
|
cmake clean (#3032)
|
5 years ago |