Fangjun Kuang
da44ec5b14
minor fixes
3 years ago
Fangjun Kuang
26b45c7301
small fixes
3 years ago
Fangjun Kuang
cc9881f1bc
add speech recognition demo videos using ncnn
3 years ago
Fangjun Kuang
607c8f8332
Update README to include sherpa-ncnn for real-time speech recognition ( #4424 )
3 years ago
mizu-bai
c4574586ca
Add Example ncnn-fortran ( #4423 )
3 years ago
nihui
5da70724b1
matmul x86 use sgemm ( #4421 )
3 years ago
tpoisonooo
edb70f5b35
Update README.md ( #4419 )
3 years ago
tpoisonooo
8fea27fbb5
Update model-convert.md ( #4352 )
3 years ago
wzyforgit
e06081308b
Flush benchmark of some CPU model by tag 20221128 ( #4418 )
Flush RTX3090, FT-2000, 3A3000, 3A4000, 3A5000 benchmark data, add SW831 benchmark data.
3 years ago
nihui
1f1981052c
convolution deconvolution and deformableconv2d x86 use sgemm ( #4414 )
* drop old sgemm code
* fix convdw test
* fix avx512 gemm
* optimize prefer sgemm condition
3 years ago
nihui
9cc6eb1942
meet gemm x86 transpose alignment
3 years ago
nihui
18fbaebe68
get cpu l2 cache size and resolve gemm tile size ( #4411 )
* get cpu l2 cache size and resolve gemm tile size
* optimize constant tile K
* fix per-core l2 cache detection, better macos cpu cluster topology discovery
3 years ago
nihui
c5640a16c3
gemm x86 multiply alpha beta in post gemm stage, enable one_blob_only ( #4407 )
* gemm x86 multiply alpha beta in post gemm stage, enable one_blob_only
* relax mnk multiple restrictions
* make square tiles in each thread
* sanitize num_threads changes
3 years ago
nihui
d48f712599
force NxK size the multiple of native simd length to fix mis-alignment
3 years ago
nihui
2f8d1d4f9e
fix gemm x86 transpose b pack4 mis-alignment
3 years ago
nihui
fd1ac3c7a0
x86 optimization for gemm unified elempack ( #4387 )
3 years ago
nihui
18bb249564
fix some ci on ubuntu ( #4405 )
3 years ago
dependabot[bot]
58fca8c6e7
Bump pypa/cibuildwheel from 2.11.2 to 2.11.3 ( #4392 )
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.11.2 to 2.11.3.
- [Release notes](https://github.com/pypa/cibuildwheel/releases)
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.11.2...v2.11.3)
---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
  dependency-type: direct:production
  update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
3 years ago
nihui
03550ba532
Update release-python.yml
3 years ago
nihui
c934c6e94a
fix openmp affinity abort when cpu goes offline ( #4370 )
3 years ago
tpoisonooo
bdcbc37a2d
fix(pybind11): build error ( #4368 )
3 years ago
shaoshengsong
47c4ab7394
add example project link ( #4365 )
3 years ago
magicse
8f9a524027
I added one more project to the list of examples. ( #4205 )
* Dedicated to coloring black and white photographs.
3 years ago
nihui
cf07bd9083
disable out-of-line atomics since ndk23+ for resolving linking issue with old ndk ( #4362 )
3 years ago
nihui
f527fe88ee
update glslang ( #4361 )
3 years ago
nihui
0736c5b658
Fix c api allocator ( #4360 )
* add some c_api interfaces related to allocator setup.
* fix errors in allocator parameters in c_api.
* test c api allocator
Co-authored-by: zhangtongshe <yuyuyezi@vip.qq.com>
3 years ago
nihui
6647396667
update release ci ( #4359 )
* update release ci
* find modern glslang
* parallel jobs on windows
3 years ago
Zhuo Zhang
a5e60ae11c
Fix windows-arm64 build for non-neon case ( #4227 )
3 years ago
Ikko Ashimine
cdba4ae936
Fix typo in stb_image.h ( #4358 )
exitting -> exiting
3 years ago
Fangjun Kuang
1b83fe4f16
Support mat.numpy() in Python ( #4356 )
3 years ago
nihui
057b5bb515
split tests ( #4354 )
3 years ago
nihui
aed05aa851
pnnx fuse more function to module ( #4351 )
* pnnx fuse more function to module
* rename some pass name
* fuse adjacent reshape, fuse pad conv2d
* fuse pad conv1d
3 years ago
nihui
ec1b07c9fe
pnnx fp16 option for ncnn and onnx weight type ( #4350 )
3 years ago
nihui
6967baaccc
pnnx convert torch bitwise left_shift right_shift ( #4349 )
3 years ago
nihui
eceac35a7f
implement MultiheadAttention kdim vdim ( #4347 )
3 years ago
nihui
498ca7341b
squeeze and expanddims 4d ( #4346 )
3 years ago
Lry89757
6a47f8d15c
gridsample op support ( #4288 )
Co-authored-by: LRY89757 <LRY89757@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
Co-authored-by: nihui <shuizhuyuanluo@126.com>
3 years ago
nihui
6019f47f08
ci loongarch64 lsx ( #4344 )
3 years ago
junchao-loongson
279222c2c9
add vector optimization for loongarch64 ( #4242 )
3 years ago
nihui
a2af6369d9
match inplace slice copy pattern, rewrite copy uses ( #4338 )
3 years ago
nihui
a7e3c62a1b
save foldable constants in file for reducing memory usage ( #4337 )
3 years ago
nihui
cb88e16fdf
pnnx save onnx zero ( #4077 )
3 years ago
WuJinxuan
abb28435d6
fix:pnnx-softmax ( #4333 )
3 years ago
nihui
92da26be79
pnnx load gpu torchscript and reset device ( #4330 )
3 years ago
nihui
5b28c1730e
implement ncnn fold and unfold ( #4326 )
3 years ago
shaoshengsong
d522e78af1
support yolov5 6.2 ( #4328 )
3 years ago
nihui
a12c24d328
pnnx convert fold unfold ( #4325 )
3 years ago
nihui
b8d40a960f
pnnx convert nn.Softmax2d ( #4324 )
3 years ago
nihui
bcf06bd1c0
fold new_full and full_like ( #4323 )
3 years ago
nihui
2ef57a6204
fix ci pnnx build
3 years ago