nihui
aa9753b2f0
detach mat from local blob allocator so net instance could be destroyed much earlier ( #3287 )
4 years ago
zhiliu6
814f89ef1a
Fuse HardSwish activation into Convolution and InnerProduct ( #3233 )
* add general fused activation
* add NCNN_FORCE_INLINE option
4 years ago
Tijmen Verhulsdonck
4270b5c502
Fix broken codepaths with AVX only ( #3254 )
* Fix codepaths for fp16 weights when only AVX is enabled
* Disable opt overrides
* Update SDK url
* Update vulkan SDK download version
* Debugging riscv pad
* apply code-format changes
* fix padding test
* fix mips slice test
* fix lrn test
* implement mish swish image shader, fix pooling adaptive image storage support, drop debug output
* update ci ubuntu 18.04
Co-authored-by: nihui <shuizhuyuanluo@126.com>
4 years ago
zhiliu6
80699dd3f9
fix hardswish test beta param ( #3214 )
4 years ago
nihui
c6cda8d07c
arm neon optimization for requantize leakyrelu ( #3144 )
* arm neon optimization for requantize leakyrelu
* add missing changes
* Update test_requantize.cpp
* more test coverage
4 years ago
Xavier Hsinyuan
2a5c672787
Add unittest and RVV optimized for SELU ( #3114 )
4 years ago
nihuini
f1533667ff
fix test_c_api net instance destroyed earlier than blob destruction
4 years ago
Tijmen Verhulsdonck
eaa7e24db6
Added ability to switch AVX/AVX2 during runtime ( #3076 )
4 years ago
nihui
b413fd3a3d
auto code-format bot and disable restyled ( #3075 )
4 years ago
DaydreamCoding
f42d0e5dc9
fix warpaffine_bilinear_yuv420sp uv matrix ( #3048 )
4 years ago
nihui
4f135e07bf
implement convolution1d and pooling1d ( #3035 )
* implement convolution1d and pooling1d
* add conv1d pool1d test
* fuse convolution1d activation
* update operator doc
* fix vulkan adaptive pooling
4 years ago
nihuini
12eaa6f9ba
update concat test
5 years ago
nihuini
a180bf7bdc
update concat test for larger channels
5 years ago
nihui
c1ce8ea84d
add more test
5 years ago
nihuini
07fa2e1fe3
prefer large channels for int8 operator tests
5 years ago
nihui
3a77b09c31
fix test failure
5 years ago
nihuini
fef61c5296
fix arm build
5 years ago
nihuini
934a1a8e32
test flatten packing padding int8
5 years ago
nihui
49f3e1ea09
drawing api and stb_image ( #2913 )
* drawing api
* add drawing test
* yuv420sp drawing
* enable simpleocv in webassembly build
5 years ago
nihui
17936e9f54
fix packing risc-v test, add cpu_riscv_vlenb()
5 years ago
nihui
a61f03ec76
arm neon optimization for pixelshuffle scale 2
5 years ago
nihuini
d6b2ea5aac
arm neon optimization for convolution 3x3 on small channels
5 years ago
nihui
7e1aaa5828
cmake option NCNN_INT8 ( #2839 )
5 years ago
nihui
66455c1b95
implement 2823 binary broadcasting type ( #2827 )
5 years ago
nihuini
41a4bea954
unroll size 8 for conv3x3s1 pack8to1 int8 arm64
5 years ago
nihui
e9cc637573
arm neon optimization for int8 packing kernels ( #2809 )
5 years ago
nihui
1ea8bfbd2e
x86 avx2 conv3x3s1 pack8 direct optimization, fix #2789
5 years ago
ncnnnnn
6e6cb9f4f3
simple sort ncnn_add_layer_test ( #2790 )
for obsessive
5 years ago
nihui
a48bf43ef7
test conv/fc int8 with activation
5 years ago
nihui
5fe75f19ef
architecture changes for int8 packing ( #2771 )
* quantize and dequantize tests
* unify activation and usability function
* drop NCNN_REQUANT cmake option, test dequantize requantize pack8, fix webassembly build
* benchmark use requantize int8 model
5 years ago
nihuini
15d63ec0f5
fuse onnx multiheadattention with same qkv blob
5 years ago
RBelogorodtsevFBase
1212ed6e94
implements gelu activation ( #2749 )
5 years ago
nihuini
c17eb4e208
multiheadattention layer
5 years ago
nihuini
7ac23ab34d
fuse onnx layernorm, fix 2-dim layernorm implementation, add test
5 years ago
nihui
3c92a1184b
arm neon optimization for general convolution im2col sgemm ( #2668 )
* arm neon optimization for conv3x3s1 winograd42
* better condition
* Update test_convolution.cpp
* Update test_convolution.cpp
* more proper conditions
* arm neon optimization for general im2col sgemm pack4
* add sgemm
* wip
* wip
* fix armv7 build
* more conditions blah blah
* code format
* fix convolution
* move packed convolution to separated header source
* unify weight data bf16
* proper conditions
* conv3x3s2 sgemm pack4 test
5 years ago
nihui
ab56083ca5
arm neon optimization for conv3x3s1 winograd42 ( #2664 )
5 years ago
nihuini
f437bcdd4c
enable fp16s and int8s on newer adreno/mali, actually enable int8 tests
5 years ago
nihui
74451897cb
handle gemm in innerproduct ( #2607 )
5 years ago
nihui
0a59ac9b16
integer warpaffine ( #2604 )
* integer warpaffine
* fix some corner case
* fix yuv420sp border value
5 years ago
nihui
6672b09a37
arm neon optimization for gru ( #2597 )
5 years ago
nihui
0b35540c72
arm neon optimization for lstm ( #2595 )
5 years ago
nihuini
3915b5d496
arm neon optimization for packing fp16/bf16 pack8 family
5 years ago
nihui
fca04980f3
enhance padding test ( #2580 )
* workaround nvidia driver crash
* workaround radv buffer_ld1 zero bug
* fix offset elempack
5 years ago
nihui
80fdddb502
more slice test
5 years ago
nihui
ef3550b52f
gru and rnn layer ( #2572 )
5 years ago
Guoxia Wang
609f63c57e
support PyTorch AdaptiveAvgPool2d and AdaptiveMaxPool2d ( #2546 )
* support pytorch adaptive pool
* support onnx2ncnn adaptive pool convert
* support ncnnoptimize adaptive pool param write
* fix adaptive pool out_shape order
* fix adaptive pool out_shape order, H and W can be either an int
add test case, set support_vulkan = false Pooling_vulkan::create_pipeline
* review adaptive pool
* fix typo
* add adaptive pool forward in pooling_x86.cpp pooling_arm.cpp
fix out_w, out_h id naming convention
* fix typo
* don't support packing, bf16, int8, image for adaptive pool
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
nihui
21dc650eb3
check layer support ( #2564 )
5 years ago
tpoisonooo
baf49574c4
innerproduct aarch64 use gemm ( #2521 )
* perf(innerproduct-arm): add aarch64 gemm
* fix(innerproduct): fix compilation error
* fix(armv7-innerproduct): fix armv7 compilation error
* fix(innerproduct): fix gemm param
* fix(int8): update mock scales and fix runtime error
* fix(compilation): fix compilation error
5 years ago
nihui
54c0a13b9f
build shared library ( #2525 )
* build shared lib and enable lto
* reserved for layer and option
* allocator pimpl
* datareader pimpl
* paramdict pimpl, disable copy assign for allocator and datareader
* modelbin pimpl
* net extractor pimpl
* gpu pimpl
* disable copy assign vulkandevice, code format
* command pimpl, dummy image readonly
* pipeline pipelinecache pimpl, export platform class
* code format, export simple family
* update ci
* disable lto on android armv7, merge webassembly ci
* link libgcc, fix macos dylib version
* pipeline pimpl, gpu info pimpl
* destroy gpu info after vulkan device
* ignore msvc stl class warning
* fix ncnn_paramdict_get_float return type
* fix vktransfer upload fp16 without flatten, add command test
5 years ago
nihuini
fbf0ffda53
pixelshuffle nhwc mode, convert onnx DepthToSpace mode DCR, convert mlir tf.DepthToSpace
5 years ago