nihui | db035d602d | 2 years ago
update ncnnoptimize layers, lightmode=false keeps original weight (#5414)

nihui | 056509a034 | 2 years ago
fix create_pipeline crash in vulkan-enabled layer without calling load_param/load_model first (#5410)

張小凡 | 3b048d1923 | 2 years ago
destroy_gpu_instance() function wait for all devices to be idle before destroy (#4763)
* destroy_gpu_instance() will internally ensure that all vulkan devices are idle before proceeding with destruction.

nihui | 69640594f7 | 2 years ago
unified macos ios ci, drop 32bit support, drop ios arm64e, default to ios 13 (#5403)

nihui | 5a8f79f7c7 | 2 years ago
add apple A17 and M3 family macro (#5405)

nihui | 2a07aa2d79 | 2 years ago
unified mac-catalyst ci (#5402)
* fix moltenvk static linking

nihui | fafb897ff7 | 2 years ago
update ios toolchain, add visionos ci, update watchos, ncnn target ilp32 (#5399)

nihui | 824b79a314 | 2 years ago
fix rvv extract blob with fp16 enabled, fix #5360 (#5398)

nihui | 7cc89108b3 | 2 years ago
try more known vulkan library with simplevk (#5396)

nihui | 2f65729873 | 2 years ago
fix riscv v build with old cpp standard, fix #5366 (#5391)

nihui | 167501f0c6 | 2 years ago
fix softmax arm fp16s sum error, fix #5340 (#5393)

nihui | 6595743bb2 | 2 years ago
shift before adding for dropping additional double bit from vqdmulhq_s16, fix #5263 (#5390)

nihui | 84256b1494 | 2 years ago
pnnx enhance functionize (#5387)
* pnnx fix some undefined dtype
* fix ncnn convdw1d dynamic weight loading

Shatyuka | 5a11c383a2 | 2 years ago
Support LLVM OpenMP runtime for MSVC (#5370)
with `/openmp:llvm` compile option

hokamilkv | 74fda386f3 | 2 years ago
Update convolution_im2col_gemm_int8.h (#5365)
remove _sum0=_sum0

Shatyuka | e7748e5311 | 2 years ago
Fix `destroy_gpu_instance` crash (#5353)
* Fix `destroy_gpu_instance` crash
* Additional check and clear

Shatyuka | ddd17dd907 | 2 years ago
Fix build error with NCNN_PIXEL_DRAWING off (#5346)

nihui | 4797d19873 | 2 years ago
ruapu cpu isa detection (#5341)

nihui | 984d6dd844 | 2 years ago
promote vfpv4 for auto fp16 storage conversion (#5325)
* promote vfpv4 for auto fp16 storage conversion
* always report neon and vfpv4 for arm64

nihui | 5b536af234 | 2 years ago
fix uwp build (#5328)

nihui | d38bdbdb84 | 2 years ago
fix debug build on some compiler, fix #5295 (#5326)

nihui | 87d7165848 | 2 years ago
disable signal based detectisa if being debugged (#5280)

Justin Fung | f6763262d1 | 2 years ago
Add draw rectangle, draw text, draw circle, and draw line to C API (#5324)

Xinyu Yang | 7ac42680cf | 2 years ago
RVV: Refine riscv gemm fp32 (#5303)
* replace storexxx to vsseg2e32_v_f32m1
* refine transpose
---------
Co-authored-by: Xinyu302 <Xinyu302@users.noreply.github.com>

Sophon | 294e786d36 | 2 years ago
convolution_x86: Fix typo in logging (#5310)
Signed-off-by: Xilin Wu <wuxilin123@gmail.com>

nihui | 0942efab2e | 2 years ago
x86 avx512 optimization for mish (#5309)

nihui | 7928d44d51 | 2 years ago
port stb image optimization (#5307)

nihui | 05b4dcb06c | 2 years ago
report vulkan cm 8x8x16 config, enable fp16a cm (#5298)

nihui | 5329d32e74 | 2 years ago
check vulkan fp16 uniform support and implement lfp conversion without fp16u (#5287)

nihui | 656b082284 | 2 years ago
fix cast armv7 sigbus when loading fp16 model (#5292)
* fix sigbus error when loading fp16 model on armv7
* apply for bf16

nihui | ba42369c68 | 2 years ago
workaround l2 norm produce -inf value with subnormals (#5272)

nihui | c222208cc9 | 2 years ago
feat mask for disable threading, make some extractor setter no-op, update doc (#5270)

nihui | a31f66203b | 2 years ago
do not cache temporary blob for uploading weight (#5266)

nihui | 556b79ce4d | 2 years ago
create layer decoupled (#5258)
* create layer decoupled
* no more virtual public
* allow build test with shared library
* decouple cpu vulkan
* drop old scripts

Molly Sophia | 92d49e1f59 | 2 years ago
requantize: Use activation_ss in fused_activation.h (#5245)
Which fixes int8 requantization on risc-v
Signed-off-by: Molly Sophia <mollysophia379@gmail.com>

nihui | d1d9aa2edb | 2 years ago
fix some cpu.cpp warning (#5244)

nihui | d30af29ee2 | 2 years ago
fix simplecv Mat templated ptr (#5241)

nihui | 6c261a8c04 | 2 years ago
fix the missing elemsize in vkimagemat from_android_hardware_buffer (#5237)

nihui | ded0b78bb2 | 2 years ago
fix nvidia vulkan crash on exit (#5234)

nihui | 8c4fc5e2a0 | 2 years ago
enable uniform 16bit and 8bit when available, fix validation error in fp16sa shader (#5233)

nihui | b7f70cfe4e | 2 years ago
initialize cpu thread affinity mask all to all cores (#5231)
call omp_set_num_threads with zero num_threads is implementation defined

nihui | 5a8ce63af4 | 2 years ago
optimize resize bilinear and compress font data (#5200)

nihui | eea3fc9b41 | 2 years ago
optimize vulkan global pooling (#5191)
Co-authored-by: nihui <nihui@users.noreply.github.com>
Co-authored-by: michaelcai <michaelcai@tencent.com>

nihui | 1138312f1e | 2 years ago
detect avx512 isa with signal action on macos (#5185)

nihui | dba87f8cad | 2 years ago
fix build with msvc arm64 asimdhp (#5176)

nihui | deae9e61da | 2 years ago
disable rtti and exceptions for msvc (#5167)
* disable rtti and exceptions for msvc
* warnings--
* erff
* arch sse2 for 32bit build
* enable rtti for cross compiling

nihui | 058aa0ad37 | 2 years ago
enable arm neon intrinsics for msvc build (#5151)

AlOa | 9f26eeb5a7 | 2 years ago
Prelu layer uses sse instruction _mm_load_ps but data can be misaligned so it must use _mm_loadu_ps (#5149)

Justin Fung | 465debe9bb | 2 years ago
Add print statements for 4 dimensions benchmark (#5148)

nihui | 4136de3b8d | 2 years ago
arm optimization for convolution int8 packed unified elempack (#5147)