nihui
f10cc6dd93
initial data structure changes for 3dcnn, conv3d, pooling3d ( #3378 )
Co-authored-by: ElvisYu <elvisyuovo@gmail.com>
Co-authored-by: 余浩文 <m18107220188@163.com>
Co-authored-by: Zr2223 <67497651+Zr2223@users.noreply.github.com>
4 years ago
nihui
f86a307ab5
silence code scanning
4 years ago
nihui
6a52e8e5f2
fix potential integer type overflow
4 years ago
nihuini
6c2cee8186
fix mat clone with atypical source cstep
4 years ago
nihuini
3631c1933d
non-inlined addref and release slow down overall speed, move them to header
5 years ago
nihui
d7cbc055f3
fix illegal instruction on pi4 when NCNN_ARM82 enabled
the compiler may compile inline member functions as noinline blocks for different architectures, and the linker may pick the one built for the newer arch, which results in illegal instructions on older hardware
5 years ago
nihui
5fe75f19ef
architecture changes for int8 packing ( #2771 )
* quantize and dequantize tests
* unify activation and usability function
* drop NCNN_REQUANT cmake option, test dequantize requantize pack8, fix webassembly build
* benchmark use requantize int8 model
5 years ago
restyled-io[bot]
5f00ba89d2
feat(ncnnoptimize): replace denormals to zero on layers with weights ( #2690 )
* feat(ncnnoptimize): replace denormals to zero on layers with weights
Co-authored-by: youngsoo.lee <youngsoo15.lee@gmail.com>
Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
nihui
79efe33fdc
cmake option for platform api uses ( #2502 )
* cmake option for platform api uses
* android gpu ci does not rely on glslangvalidator, add android termux ci
5 years ago
nihui
e644164873
reshape arm bf16s fp16s, flatten api
5 years ago
nihui
7f5047d1dc
Ci test end2end squeezenet ( #1919 )
5 years ago
nihui
3ef995ed1e
format code style and setup restyled.io ( #1840 )
6 years ago
tpoisonooo
8e1c3ac4d1
Add crop para check ( #1825 )
* add copy_cut_border check; fix compile warnings
6 years ago
Naiyang Lin
ceef2470a5
Add logger.h ( #1753 )
6 years ago
nihui
62da1228e1
adreno image shader + fp16 + fp16a ( #1714 )
* wip
* wip
* fix
* image and imageview can not be destroyed until command execution ends
* fast copy path for tightly packed data
* wip
* texture load works
* 1d 3d image
* record clone image, multiple commands share one image reference
* upload download image
* layer forward accept vkimagemat
* vkimagemat graph works
* staging vkimagemat for passing dynamic parameters, macro for fp32+image shader, padding image shader
* vkimagemat elemsize
* convolution test pass
* conv1x1s1 image shader
* fast staging image allocator from host memory, pooling image shader
* convolutiondepthwise image shader
* innerproduct image shader
* packing image shader
* crop deconvolution image shader
* resolve spirv binding types
* image fp16 and fp16a, cast image shader
* eltwise image shader
* wip
* absval image shader
* deconvolutiondepthwise image shader
* concat image shader, squeezenet works
* noop split image shader
* uniform precision hint
* layer support_image_storage
* wip
* vulkan device utility operator
* command is storage and packing option aware
* fallback to cpu on image allocation failed, mobilenetssd works
* flatten image shader, enable more test
* ci test
* check imgfp32 imgfp16 imgfp16a features
* fix ci test
* fix ci test
* upgrade swiftshader
* wip
* opt aggressive
* imgfp16p
* opt none
* convolution winograd image shader
* fix flush range, fast copy path for continuous buffer
* minor fix
* fix innerproduct
* wip ...
* wip
* cast fix
* packing test
* wip
* image fp16p is fp16p
* wip
* silence
* more line info
* code clean
* softmax image shader
6 years ago
nihuini
ee118e7d70
reconstruct import android hardwarebuffer api, wip
6 years ago
nihui
44eb28fadc
fix cast arm packing test
6 years ago
nihui
f214883203
cast between float32 and bfloat16
6 years ago
nihui
7ae585f217
shape hint is elemsize aware
6 years ago
nihui
0f7e7bca02
shader shape specialization constant and basic local group size partition ( #1523 )
* use Mat class for Shape description
* shape specialization constant in compute shader
* wip
* wip
* test forward_inplace, add binaryop unaryop sigmoid test
* fix arm unaryop test
* fix arm binaryop test
* make shape hint optional, cast int8 to fp32, add cast test
* wip
* follow the good and old local size setting for conv1x1
* the optimal local size rewrite
* fix build on msvc
* add permute shader for all packing layout, add permute test
* concat and slice partial shape constant, slice test
* fix slice test
* interp test
* add lrn test, test packing layout implicitly
* add eltwise test
* add normalize test
* add instancenorm test
* reorg shape constant
* simple local group size partition
* add shape constant param
6 years ago
nihui
6f2ef1932d
int8 code refactoring wip, add int8 test
6 years ago
nihuini
a86c2f44c3
vkimagemat, vkimageallocator, convenient construct from android hardware buffer
6 years ago
nihuini
a170ef1acf
remove the default option usage in layer interface, fix write out of range in cast arm pack4, handle fp16p conversion on cpu/gpu transfer
6 years ago
nihui
b4c388a72a
Mat misc function accept option parameter, deconvolution pack4 arm neon
6 years ago
nihuini
c4f23ae8ad
rename Mat packing to elempack
6 years ago
nihuini
838c5df839
option api changes
7 years ago
nihuini
dfffb29bb5
resize bicubic
7 years ago
nihuini
a4b74d27b0
move copy cut border function to operator
7 years ago
nihuini
5a905c7cb9
implement substract_mean_normalize with bias and scale op
7 years ago
nihuini
c25c190703
move resize bilinear function to operator
7 years ago
nihuini
43737b378f
wrapper function for converting between fp32 and fp16
7 years ago
nihui
8e5674363b
element packing ( #770 )
* mat packing
* packing layer
* packing works
* convert_packing function
7 years ago
nihuini
bf1c58be46
padding is elemsize aware, copy_make_border is now a padding wrapper
7 years ago
nihui
9706cd1447
implement ncnn blob/workspace allocator, fine-grained per-layer openmp threads control, fix #469
7 years ago
nihuini
ee98817446
proper first row/col handling in resize family, fix #429
8 years ago
dong
6ea09ebf2c
Use aarch64 assembly to replace arm intrinsics
8 years ago
nihuini
d2ee4e7d27
ld1 and st1 handle data endian mode per element
8 years ago
nihuini
a84ba8fc0f
element type storage support in Mat, move the data member first so that a pointer to Mat is a pointer to data, convenient index access for float vector
8 years ago
peng
39445b5233
no memcpy for small size copy_cut_border/copy_make_border
8 years ago
彭
a86cc8f620
memcpy optimize copy_cut_border/copy_make_border ( #179 )
* memcpy optimize copy_cut_border/copy_make_border
* memcpy may be slow when copying a small border
* remove unused line
* code style
8 years ago
nihui
908a8f48d2
assign same size
8 years ago
nihuini
0edd2b78c5
arm neon optimize for bilinear_resize, about 40% faster
8 years ago
nihuini
613028aa17
implement Mat resize_bilinear
8 years ago
nihuini
b7db8be4f6
add ncnn source qwq
9 years ago