nihuini
9e9ae2322c
use platform aligned malloc
7 years ago
nihui
1634675c96
fix dst write out of range, fix #886
7 years ago
liurs1990
1554438515
fix NaN issue in InstanceNorm (var may sometimes be negative due to floating-point accuracy) (#874)
7 years ago
nihuini
dfffb29bb5
resize bicubic
7 years ago
nihuini
c778265658
reuse hresize result properly when enlarging, fix #863
7 years ago
nihuini
a4b74d27b0
move copy cut border function to operator
7 years ago
nihuini
5a905c7cb9
implement substract_mean_normalize with bias and scale op
7 years ago
nihuini
c25c190703
move resize bilinear function to operator
7 years ago
BUG1989
780c7d9a72
merge de/requantize op, optimize some int8 conv layer on arm64-v8a (#867)
* optimize the conv sgemm int8 on arm64-v8a platform
* optimize int8 arm64-v8a with sadalp ins
* merge requantize op into latest conv layer
* merge requantize op into conv-int8 op
* update the mobilenet.param in the benchmark
* Update README.md
update Kirin970 and RK3399
* try to fix the travis build error
7 years ago
nihui
7ab968e6e1
fix gpu crop, convert crop offset with axis
7 years ago
nihuini
f90a9898e2
fix priorbox pipeline creation error on adreno
7 years ago
nihui
58ed8e437f
require GL_EXT_shader_16bit_storage only for fp16_storage, explicit type cast
7 years ago
nihui
162c46647d
do not create fp16 shader module on unsupported platform
7 years ago
nihui
d753fe2589
upload fp16 weight, enable fp16 storage and arithmetic
7 years ago
ShuangLiu1992
c6a2d0417a
add missing header for pipeline.cpp and fix compile error for emscripten (#861)
7 years ago
nihui
058bd65c88
fix fp16 shader creation
7 years ago
nihuini
4e3df863d5
fix enable feature pointer
7 years ago
nihuini
46dc21c8b1
fp16 shader
7 years ago
Gemfield
add8c73922
Fix the return value of load_param and load_model (#855)
7 years ago
nihuini
37573aeeb5
remove unused record download
7 years ago
nihuini
05bf09ba70
rename fp16_storage to support_fp16_storage
7 years ago
nihuini
43737b378f
wrapper function for converting between fp32 and fp16
7 years ago
nihuini
2b8ff843e9
cast layer and shader for fp32 fp16 conversion
7 years ago
Gemfield
573c2bcd93
Fix crash issue during load_model (#848)
* Fix crash issue during load_model
* Fix crash issue during load_model 2nd part
7 years ago
nihuini
a3a2548aa2
initial fp16s fp16a shader build system
7 years ago
nihuini
332722af63
fix fp16a int8a exchange oops
7 years ago
nihuini
e59dc6fafe
proper usage of instance extension VK_KHR_get_physical_device_properties2, check fp16 and int8 feature
7 years ago
nihui
caeb85d6cd
multithreaded pipeline creation and destruction may cause driver crash :(
7 years ago
nihuini
20fb006282
coverage never works without proper unittest
7 years ago
Abdel Younes
e9ac5f207f
add: cmake option to install NCNN SDK (#841)
A project that uses src/CMakeLists.txt directly may not want
to install the NCNN library and headers. This new option makes
installation optional (defaults to true).
7 years ago
nihuini
b2e41bf83d
fallback convolution to cpu path for pad -233
7 years ago
nihuini
d933f384b6
bump engine version
7 years ago
nihuini
038389fa63
blacklist known buggy driver
7 years ago
nihuini
d999f43b87
fix vulkan initialization using memory loading
7 years ago
nihuini
d263cd507c
gpu packing and unpacking
7 years ago
nihuini
806911a549
packing vec and image shader
7 years ago
BUG1989
2f4c4a8202
fix the compile error when using armv7a without neon (#835)
7 years ago
nihuini
6f9ffca7e3
fix crop on channel dim only, fix #797, fix #831
7 years ago
nihuini
76638aebf9
fix build on msvc
7 years ago
nihuini
d9301c4f59
convert mxnet crop slice step1, convert onnx slice step1, fix reduction dims 2, fix #441, fix #498, fix #519
7 years ago
BUG1989
ff38053321
[WIP] arm64-v8a int8 optimization (#823)
* requantize layer arm64-v8a neon implement
* convdw3x3s1 arm64-v8a neon implement
* convdw3x3s2 arm64-v8a neon implement
* conv1x1s1 arm64-v8a is optimized by neon assembly
* conv sgemm int8 optimized with neon assembly, kernel transform is offline
* conv winograd int8 optimized with neon assembly, fix ci build failure
* conv3x3s2 int8 arm64-v8a optimized with neon assembly, remove old code.
7 years ago
nihuini
d3a11eb6c9
one codepath for unified and discrete device
7 years ago
nihuini
433a92401a
auto barrier in pipeline and copy command
7 years ago
nihuini
2672cd437f
add layer type index member
7 years ago
nihuini
ce65edcc84
fix flatten pack1to4
7 years ago
nihuini
5646b7d2c2
flatten image
7 years ago
nihuini
1f4bdd91b5
uint32_t typed workgroup size
7 years ago
nihuini
2e939fab0f
fix memleak
7 years ago
nihuini
532054b453
expose more device info
7 years ago
nihui
dd83284cee
prelu shader
7 years ago