nihui
c4a007406d
windows clang ci ( #5469 )
* windows clang ci
* clang msvc use x86intrin.h for xop
* test arm64 compiler features
2 years ago
nihui
08b7d99a75
rnn/lstm/gru dynamic quantization ( #5435 )
2 years ago
Tabbleman
b8fefb977d
clear warning: unused variable while building on x86-wsl platform ( #5444 )
2 years ago
quink
e31be492d5
c_api: Fix function prototypes with no argument ( #5436 )
There is a big difference between C and C++: foo() in C means that
the function takes an unspecified number of arguments, while
foo(void) means the function takes no arguments.
Fix -Wstrict-prototypes warning.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2 years ago
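The distinction described in the commit message above can be illustrated with a minimal sketch. The function names `legacy_style` and `takes_nothing` are hypothetical and are not taken from the ncnn C API; the sketch assumes a compiler in a pre-C23 mode, since C23 changes empty parentheses to mean "no parameters".

```c
/* Hypothetical names for illustration only; not part of the ncnn C API. */

/* Old-style declaration: before C23, empty parentheses say nothing about
 * the parameters, so the compiler cannot check the arguments at call sites.
 * This is the form that -Wstrict-prototypes reports. */
int legacy_style();

/* Prototype: (void) states explicitly that the function takes no arguments,
 * so a call such as takes_nothing(42) is rejected at compile time. */
int takes_nothing(void);

int legacy_style()
{
    return 1;
}

int takes_nothing(void)
{
    return 2;
}
```

In C++ both declarations mean the same thing, which is why a header shared between the two languages is usually written with `(void)`.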
nihui
6cdd7110be
fix instruction extension dispatch ( #5427 )
2 years ago
nihui
9ce7930413
x86 optimization for convolution tiled gemm ( #5426 )
2 years ago
nihui
0fd25d6c70
fix arm riscv build with NCNN_BF16=OFF ( #5422 )
2 years ago
nihui
db035d602d
update ncnnoptimize layers, lightmode=false keeps original weight ( #5414 )
2 years ago
nihui
056509a034
fix create_pipeline crash in vulkan-enabled layer without calling load_param/load_model first ( #5410 )
2 years ago
張小凡
3b048d1923
destroy_gpu_instance() function wait for all devices to be idle before destroy ( #4763 )
* destroy_gpu_instance() will internally ensure that all vulkan devices are idle before proceeding with destruction.
2 years ago
nihui
69640594f7
unified macos ios ci, drop 32bit support, drop ios arm64e, default to ios 13 ( #5403 )
2 years ago
nihui
5a8f79f7c7
add apple A17 and M3 family macro ( #5405 )
2 years ago
nihui
2a07aa2d79
unified mac-catalyst ci ( #5402 )
* fix moltenvk static linking
2 years ago
nihui
fafb897ff7
update ios toolchain, add visionos ci, update watchos, ncnn target ilp32 ( #5399 )
2 years ago
nihui
824b79a314
fix rvv extract blob with fp16 enabled, fix #5360 ( #5398 )
2 years ago
nihui
7cc89108b3
try more known vulkan library with simplevk ( #5396 )
2 years ago
nihui
2f65729873
fix riscv v build with old cpp standard, fix #5366 ( #5391 )
2 years ago
nihui
167501f0c6
fix softmax arm fp16s sum error, fix #5340 ( #5393 )
2 years ago
nihui
6595743bb2
shift before adding for dropping additional double bit from vqdmulhq_s16, fix #5263 ( #5390 )
2 years ago
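For context on the "double bit" mentioned in the title above: `vqdmulhq_s16` is a saturating doubling multiply that returns the high half, so each lane carries an extra factor of two compared with a plain high-half multiply. The following is a scalar sketch of one lane under that definition. The helper name `sqdmulh_lane` is hypothetical, and the sketch does not reproduce the actual change made in the commit.

```c
#include <stdint.h>

/* Hypothetical scalar model of one lane of vqdmulhq_s16:
 * result = saturate((2 * a * b) >> 16). */
int16_t sqdmulh_lane(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;

    /* Doubling overflows only when a == b == INT16_MIN; the result saturates. */
    if (product == 0x40000000)
        return INT16_MAX;

    /* (2 * product) >> 16 equals floor(product / 32768). C division
     * truncates toward zero, so adjust negative remainders downward. */
    int32_t q = product / 32768;
    if (product % 32768 < 0)
        q -= 1;

    return (int16_t)q;
}
```

With Q15 inputs of 16384 (0.5) the lane yields 8192 (0.25), whereas a plain `(a * b) >> 16` would yield 4096; that difference is the extra doubling that has to be accounted for when combining results.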
nihui
84256b1494
pnnx enhance functionize ( #5387 )
* pnnx fix some undefined dtype
* fix ncnn convdw1d dynamic weight loading
2 years ago
Shatyuka
5a11c383a2
Support LLVM OpenMP runtime for MSVC ( #5370 )
with `/openmp:llvm` compile option
2 years ago
hokamilkv
74fda386f3
Update convolution_im2col_gemm_int8.h ( #5365 )
remove _sum0=_sum0
2 years ago
Shatyuka
e7748e5311
Fix `destroy_gpu_instance` crash ( #5353 )
* Fix `destroy_gpu_instance` crash
* Additional check and clear
2 years ago
Shatyuka
ddd17dd907
Fix build error with NCNN_PIXEL_DRAWING off ( #5346 )
2 years ago
nihui
4797d19873
ruapu cpu isa detection ( #5341 )
2 years ago
nihui
984d6dd844
promote vfpv4 for auto fp16 storage conversion ( #5325 )
* promote vfpv4 for auto fp16 storage conversion
* always report neon and vfpv4 for arm64
2 years ago
nihui
5b536af234
fix uwp build ( #5328 )
2 years ago
nihui
d38bdbdb84
fix debug build on some compilers, fix #5295 ( #5326 )
2 years ago
nihui
87d7165848
disable signal based detectisa if being debugged ( #5280 )
2 years ago
Justin Fung
f6763262d1
Add draw rectangle, draw text, draw circle, and draw line to C API ( #5324 )
2 years ago
Xinyu Yang
7ac42680cf
RVV: Refine riscv gemm fp32 ( #5303 )
* replace storexxx with vsseg2e32_v_f32m1
* refine transpose
---------
Co-authored-by: Xinyu302 <Xinyu302@users.noreply.github.com>
2 years ago
Sophon
294e786d36
convolution_x86: Fix typo in logging ( #5310 )
Signed-off-by: Xilin Wu <wuxilin123@gmail.com>
2 years ago
nihui
0942efab2e
x86 avx512 optimization for mish ( #5309 )
2 years ago
nihui
7928d44d51
port stb image optimization ( #5307 )
2 years ago
nihui
05b4dcb06c
report vulkan cm 8x8x16 config, enable fp16a cm ( #5298 )
2 years ago
nihui
5329d32e74
check vulkan fp16 uniform support and implement lfp conversion without fp16u ( #5287 )
2 years ago
nihui
656b082284
fix cast armv7 sigbus when loading fp16 model ( #5292 )
* fix sigbus error when loading fp16 model on armv7
* apply for bf16
2 years ago
nihui
ba42369c68
workaround l2 norm produce -inf value with subnormals ( #5272 )
2 years ago
nihui
c222208cc9
feat mask for disable threading, make some extractor setter no-op, update doc ( #5270 )
2 years ago
nihui
a31f66203b
do not cache temporary blob for uploading weight ( #5266 )
2 years ago
nihui
556b79ce4d
create layer decoupled ( #5258 )
* create layer decoupled
* no more virtual public
* allow build test with shared library
* decouple cpu vulkan
* drop old scripts
2 years ago
Molly Sophia
92d49e1f59
requantize: Use activation_ss in fused_activation.h ( #5245 )
Which fixes int8 requantization on risc-v
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>
2 years ago
nihui
d1d9aa2edb
fix some cpu.cpp warning ( #5244 )
2 years ago
nihui
d30af29ee2
fix simplecv Mat templated ptr ( #5241 )
2 years ago
nihui
6c261a8c04
fix the missing elemsize in vkimagemat from_android_hardware_buffer ( #5237 )
2 years ago
nihui
ded0b78bb2
fix nvidia vulkan crash on exit ( #5234 )
2 years ago
nihui
8c4fc5e2a0
enable uniform 16bit and 8bit when available, fix validation error in fp16sa shader ( #5233 )
2 years ago
nihui
b7f70cfe4e
initialize cpu thread affinity mask all to all cores ( #5231 )
calling omp_set_num_threads with zero num_threads is implementation-defined
2 years ago
nihui
5a8ce63af4
optimize resize bilinear and compress font data ( #5200 )
2 years ago
nihui
eea3fc9b41
optimize vulkan global pooling ( #5191 )
Co-authored-by: nihui <nihui@users.noreply.github.com>
Co-authored-by: michaelcai <michaelcai@tencent.com>
2 years ago