Ikko Ashimine
cdba4ae936
Fix typo in stb_image.h ( #4358 )
exitting -> exiting
3 years ago
nihui
eceac35a7f
implement MultiheadAttention kdim vdim ( #4347 )
3 years ago
nihui
498ca7341b
squeeze and expanddims 4d ( #4346 )
3 years ago
Lry89757
6a47f8d15c
gridsample op support ( #4288 )
Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
3 years ago
junchao-loongson
279222c2c9
add vector optimization for loongarch64 ( #4242 )
3 years ago
nihui
5b28c1730e
implement ncnn fold and unfold ( #4326 )
3 years ago
Xavier Hsinyuan
d1ac1de7ab
RVV: InstanceNorm with fp16s(a) support ( #4078 )
3 years ago
Xavier Hsinyuan
31602bd2dc
RVV: BatchNorm with fp16s(a) support ( #4075 )
3 years ago
nihui
6e49fa30dc
groupnorm 1d/2d/4d ( #4312 )
3 years ago
nihui
b853b3d132
get_physical_cpu_count api family ( #4302 )
* get_physical_cpu_count api family
* set default to physical big cpu
* always treat smt core as big core
* is_smt_cpu
* get max freq mhz on windows
* windows thread affinity
3 years ago
nihui
9c6f1107d2
fix #4315 ( #4316 )
3 years ago
nihui
5ee276cdf7
x86 unified fc fp32/fp16s ( #4303 )
* more fma
* more transpose utility function
3 years ago
nihui
512e584a6a
general cpu feature detection on macos/ios, enable bf16 and i8mm on a15 a16 and m2 ( #4300 )
3 years ago
bestpower
a116e005b8
Fix linux build error( #4265 ) ( #4294 )
Co-authored-by: wangyu <786794414@qq.com>
3 years ago
nihui
8eab5ea0ea
x86 sse2/avx2 optimization for convolution sgemm/winograd int8 family ( #4286 )
3 years ago
Fangjun Kuang
5281d51535
implement GLU and pnnx conversion ( #4283 )
3 years ago
Yoh
bb660d09b8
add elu vulkan operator ( #4280 )
3 years ago
nihui
0b591b0d1f
implement layer feature disabled bit ( #4278 )
3 years ago
Eahow Chen
f80c2743e7
fix compile warning with gcc 9.1.0 including simplestl.h file ( #4274 )
* fix compile warning with gcc 9.1.0 including simplestl.h file
* apply code-format changes
Co-authored-by: veahow <veahow@users.noreply.github.com>
3 years ago
miemie2013
b13c2a16ce
Optimize x86 DeformableConv2D ( #4128 )
3 years ago
nihui
77eda4c19f
implement lstm proj_size ( #4263 )
3 years ago
nihui
3e2b3fa04d
stricter armv7 fp16 and armv84 bf16 compiler check, fix #4147 fix #4222 ( #4247 )
3 years ago
LinHe
9426e21166
Memory Pool Improvement For Variadic Sized Inputs ( #4190 )
* Simple miss count for better space efficiency
* Simple double ended greedy;
* Add size drop threshold setter;
* set workspace allocator cr to zero as we had some sort of recycling capability :P
Co-authored-by: LinHeLurking <LinHeLurking@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
3 years ago
Xavier Hsinyuan
e7eadca6c1
RVV: use new interface for segment load/store & change word_type to size_t & add clang ci (part #4100 ) ( #4118 )
* RVV: use size_t for vl
* RVV: replace vsseg.v tuple type by using regex
-----
search:
vsseg([1-9])e(8|16|32)_v_(f|i|u)\2m(1|2|4|8)x\1\(([ -~]+), vcreate_\3\2m\4x\1\(([ -~]+)\), vl\);
substitute by:
vsseg$1e$2_v_$3$2m$4($5, $6, vl);
* RVV: replace vssseg.v tuple types by using regex
---
search:
vssseg([1-9])e(8|16|32)_v_f\2m1x\1\(([ -~]+), vcreate_f\2m1x\1\(([ -~]+)\), vl\);
substitute by:
vssseg$1e$2_v_f$2m1($3, $4, vl);
* RVV: replace vlseg.v tuple types in load/store
* RVV: replace vloxseg2ei32.v tuple types
* RVV: add a wrapper for old compilers
* RVV: add segment load/store wrapper in packing
* RVV: fix cmake test
* RVV: make clang happy by dropping VLAs in sgemm
* RVV: add clang cmake toolchain configure
* RVV: add clang ci, riscv64-unknown-linux-gnu
Co-authored-by: thelastlin <thelastlin@users.noreply.github.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
3 years ago
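The vsseg search/substitute pair quoted in the commit above can be tried out with a short script. This is only a sketch: the pattern is copied from the commit message, the `$N` placeholders are rewritten as Python's `\g<N>`, and the helper name `convert_vsseg` and the sample source line are hypothetical, made up to fit the pattern rather than taken from the ncnn tree.

```python
import re

# Search pattern copied from the commit message, as a Python raw string.
# \1 and \2 are backreferences, so the tuple suffix (x2, x3, ...) and the
# element width must agree on both sides of the call.
VSSEG_TUPLE = re.compile(
    r"vsseg([1-9])e(8|16|32)_v_(f|i|u)\2m(1|2|4|8)x\1"
    r"\(([ -~]+), vcreate_\3\2m\4x\1\(([ -~]+)\), vl\);"
)

# Substitution from the commit message, with $N written as \g<N> for Python.
VSSEG_FLAT = r"vsseg\g<1>e\g<2>_v_\g<3>\g<2>m\g<4>(\g<5>, \g<6>, vl);"


def convert_vsseg(line: str) -> str:
    """Rewrite a tuple-style vsseg store into the flat-argument form.

    Hypothetical helper for illustration; lines that do not match the
    pattern are returned unchanged.
    """
    return VSSEG_TUPLE.sub(VSSEG_FLAT, line)


if __name__ == "__main__":
    # Hypothetical input line, constructed to match the pattern.
    sample = "vsseg2e32_v_f32m1x2(outptr, vcreate_f32m1x2(_v0, _v1), vl);"
    print(convert_vsseg(sample))
    # prints: vsseg2e32_v_f32m1(outptr, _v0, _v1, vl);
```

The backreferences are what keep the rewrite safe: a line whose tuple suffix does not match the segment count is left untouched instead of being converted incorrectly.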
汤圆奶昔
d30fc825d4
style: space alignment ( #4217 )
3 years ago
Lry89757
5eb56b2ea5
[Gelu x86] Finish intrinsic with elempack merged(fast version) ( #4144 )
* Finish the gelu x86 intrinsics
* Finish the fast tanh x86 simd impl
3 years ago
Lry89757
9f59711338
[Prelu x86] Finish intrinsic with elempack merged ( #4177 )
3 years ago
luqiang guo
5148224516
optimize softmax arm neon ( #4171 )
3 years ago
Menci
479a73a62a
remove duplicated newline ( #4188 )
3 years ago
Molly Sophia
1d7b2172cc
remove duplicated newline ( #4187 )
3 years ago
Lry89757
9278f90114
[Elu x86] Finish intrinsic with elempack merged ( #4153 )
3 years ago
nanjoin
3c0096c548
fix ConvolutionDepthwise allocator not updated ( #4173 )
3 years ago
tpoisonooo
acbaaa665b
fix compile warnings for unused parameter ( #4131 )
3 years ago
Lry89757
00c08d7bda
[Batchnorm x86] Merge the multiple elempack ( #4085 )
Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
3 years ago
LinHe
03f2ad38ce
Layer Norm x86 SIMD Optimizations ( #4065 )
Co-authored-by: LinHeLurking <LinHeLurking@users.noreply.github.com>
Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
3 years ago
nihui
b4ba207c18
more strict compiler rvv checks, drop rvv-071 support ( #4094 )
3 years ago
nihui
0666143513
fix vulkan winograd weight layout with cooperative matrix enabled ( #4093 )
3 years ago
miemie2013
720f3c9aab
Add DeformableConv2D ( #4070 )
* Add DeformableConv2D
* add unittest and docs
* pnnx torchvision deformconv2d conversion
Co-authored-by: miemie2013 <miemie2013@users.noreply.github.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
3 years ago
nihui
4f414c1806
implement 4d memorydata ( #4074 )
* implement 4d memorydata
* fix ncnnoptimize memorydata 4d
3 years ago
Lry89757
13a9533984
[BatchNorm Optimize x86] AVX512 intrinsic ( #4061 )
* Add the test samples for elempack==16
* Add the AVX512 Support for batchnorm
4 years ago
nihui
30ab31cc41
add address sanitizer ci, fix potential memory leak shouted by asan ( #4058 )
4 years ago
nihui
0ea7a672fa
fix undefined reference to vkGetAndroidHardwareBufferPropertiesANDROID, add android-29 shared ci ( #4056 )
4 years ago
nihui
4bc4a5ed0b
check mat create oom ( #4054 )
4 years ago
nihui
1d0917c83b
fix build with very old gcc ( #4048 )
* clear bom marker, avoid vector data function
4 years ago
nihui
b0c40fa644
unified arm eltwise elempack ( #4040 )
4 years ago
nihui
76849cede4
armv8.4 i8mm optimization for convolution gemm int8 ( #4034 )
4 years ago
nihui
dd86cebab8
armv8.6 ci and coverage ( #4025 )
* asimdfhm in fc
* move neon bf16 conversion function to arm_usability header
* fix cmake option
* fix build with newer gcc
* arm84 coverage
* arm asimdfhm optimization for innerproduct gemm fp16s
4 years ago
nihui
f1ea792b26
fix too many microtask error in old libomp runtime ( #4002 )
4 years ago
nihui
9b8272e86d
arm edsp and arm neon optimization for convolution int8 winograd ( #4017 )
4 years ago
nihui
a12cd7c212
mips msa and loongson mmi optimization for convolution int8 winograd f43 ( #4014 )
4 years ago