nihui
6c261a8c04
fix the missing elemsize in vkimagemat from_android_hardware_buffer ( #5237 )
2 years ago
nihui
ded0b78bb2
fix nvidia vulkan crash on exit ( #5234 )
2 years ago
nihui
8c4fc5e2a0
enable uniform 16bit and 8bit when available, fix validation error in fp16sa shader ( #5233 )
2 years ago
nihui
b7f70cfe4e
initialize cpu thread affinity mask to all cores ( #5231 )
calling omp_set_num_threads with zero num_threads is implementation-defined
2 years ago
nihui
5a8ce63af4
optimize resize bilinear and compress font data ( #5200 )
2 years ago
nihui
eea3fc9b41
optimize vulkan global pooling ( #5191 )
Co-authored-by: nihui <nihui@users.noreply.github.com>
Co-authored-by: michaelcai <michaelcai@tencent.com>
2 years ago
nihui
1138312f1e
detect avx512 isa with signal action on macos ( #5185 )
2 years ago
nihui
dba87f8cad
fix build with msvc arm64 asimdhp ( #5176 )
2 years ago
nihui
deae9e61da
disable rtti and exceptions for msvc ( #5167 )
* disable rtti and exceptions for msvc
* warnings--
* erff
* arch sse2 for 32bit build
* enable rtti for cross compiling
2 years ago
nihui
058aa0ad37
enable arm neon intrinsics for msvc build ( #5151 )
2 years ago
AlOa
9f26eeb5a7
Prelu layer uses sse instruction _mm_load_ps but data can be misaligned so it must use _mm_loadu_ps ( #5149 )
2 years ago
Justin Fung
465debe9bb
Add print statements for 4 dimensions benchmark ( #5148 )
2 years ago
nihui
4136de3b8d
arm optimization for convolution int8 packed unified elempack ( #5147 )
2 years ago
張小凡
2ecaf37a3e
Fix finding GPU driver dll path on windows ( #5141 )
2 years ago
nihui
b4f26237cb
in-house vulkan loader ( #5130 )
* vulkan-driver-loader.md
* static vulkan on apple
2 years ago
ningjiang233
b2f12fdd67
delete useless sentences ( #5139 )
2 years ago
nihui
39bc71c941
support big endian platform, add powerpc ci ( #5121 )
2 years ago
nihui
4494aadd74
deconvolution dynamic weight ( #5119 )
2 years ago
nihui
6c6c40edb3
fix deconvolution x86 unaligned bias load ( #5112 )
2 years ago
nihui
14e14a9ae8
slice with indices ( #5103 )
2 years ago
nihui
3eb2969db9
fix build with ohos toolchain ( #5105 )
2 years ago
nihui
9dda7e385a
fix gridsample x86 warnings ( #5096 )
2 years ago
nihui
7afdbfa680
simplify vulkan conv1d ( #5095 )
2 years ago
nihui
54ab8051e3
fix warnings ( #5094 )
2 years ago
邓实诚
a1e3ebf8e5
implement simplemath ( #4905 )
* complete abs, fmod and sin function in simplemath.h
* remove some unused variables in simplemath.cpp
* modify test-coverage.yml and add some functions to simplemath.cpp
* modify erf.cpp which included math.h
* include platform.h for NCNN_SIMPLEMATH definition
* move utility constants and functions in simplemath.h to simplemath.cpp
* guard simplemath functions with extern "C"
* add NCNN_EXPORT macro in simplemath.h
* include platform.h and guard all declarations with NCNN_SIMPLEMATH
* clean unused code in test_unaryop.cpp
* guard #include <vector> with NCNN_SIMPLEMATH in benchncnn.cpp
* add 'static' to guard functions that are not declared in the header file
* modify sin and cos with better implementation
---------
Co-authored-by: HonestDeng <HonestDeng@users.noreply.github.com>
2 years ago
nihui
80b3b9c6f0
arm optimization for convolution int8 winograd unified elempack ( #5087 )
* enable out elempack 8 for winograd and sgemm
2 years ago
Yoh
3f437d3f3d
Grid sample op ( #4373 )
* pnnx support grid_sample op
* complete the permute and gridsample operator fusion
* split calculation into two stages and support permute fusion
2 years ago
FhqTreap
dc25128195
Vulkan conv1d ( #5060 )
2 years ago
Xinyu302
b82d395753
Add riscv float32 gemm ( #4903 )
Co-authored-by: Xinyu302 <Xinyu302@users.noreply.github.com>
2 years ago
nihui
7b02425246
x86 optimization for convolution int8 winograd unified elempack ( #5054 )
2 years ago
張小凡
b4f8fa6d38
Fixed _mm256_set_m128 being only available on gcc8+. issue#5072 ( #5075 )
2 years ago
daquexian
75ad1cc749
support tag in memorydata layer ( #5061 )
Signed-off-by: daquexian <daquexian566@gmail.com>
2 years ago
nihui
26a70c9b05
fix build with vanilla c906 toolchain ( #5048 )
2 years ago
nihui
78aca88d67
elu 4d and selu 4d ( #5047 )
2 years ago
Beq Jal
019176c6b2
selu and shufflechannel on x86 ( #5017 )
2 years ago
nihui
fdf2c482dc
fuse adaptive pool dynamic output size, implement ncnn adaptive pooling dynamic outsize ( #5043 )
2 years ago
Amir Ramezani
7e5fa3ade3
shrink operator ( #5022 )
2 years ago
FhqTreap
a12a14f3a6
Gelu afp fix ( #5039 )
2 years ago
nihui
c8662cce5e
arm optimization for convolution int8 gemm unified elempack ( #5016 )
2 years ago
nihui
4da33b195e
prevent some old gcc using high registers as kernel values ( #5036 )
2 years ago
Amir Ramezani
0ea587b8c7
celu activation vulkan and onnx conversion ( #5018 )
2 years ago
Beq Jal
bcfec1da33
Celu layer and export to ncnn ( #5019 )
2 years ago
Beq Jal
c851231832
add diag layer and its converter ( #4935 )
2 years ago
Amir Ramezani
695f770eab
erf implementation ( #5012 )
* added erf implementation
* added testcase for erf
* added onnx2ncnn support of erf
2 years ago
FhqTreap
e14cc272ac
gelu vk op tanh fix ( #5008 )
2 years ago
FhqTreap
cc54b889d5
add gelu vulkan operator ( #5001 )
2 years ago
nihui
9ecf6a61be
x86 optimization for convolution int8 gemm unified elempack ( #4881 )
2 years ago
lrw04
1a61bcd286
Fix potential overflow in determining the datatype ( #4988 )
2 years ago
nihui
c003281c53
fix unaryop round build with old glibc ( #4963 )
2 years ago
nihui
7c4969ed58
fix convolution winograd43 vulkan on fixed shape hint ( #4973 )
2 years ago