li mengyang
ff4d05a713
fix typo ( #4482 )
3 years ago
nihui
2e3e680d77
x86 optimization for packed convolution unified elempack ( #4469 )
3 years ago
nihui
bd5bbe3f2c
x86 optimization for winograd unified elempack part2 ( #4470 )
* improve gemm packb threading
* optimize tile size
* profile winograd condition
* handle threads changes
3 years ago
ws
643285a08c
fix macos vulkan instance create failed when vulkan sdk version >= 1.… ( #4472 )
* enable VK_KHR_portability_subset extension if the device supports it
Co-authored-by: w1ndseeker <w1ndseeker@users.noreply.github.com>
3 years ago
nihui
88274827da
x86 optimization for winograd unified elempack ( #4456 )
3 years ago
nihui
f0a91f46f5
update macos ci xcode version and vulkan sdk ( #4465 )
3 years ago
WuJinxuan
ad956c8c9c
[ARM] GELU ( #4464 )
3 years ago
wyushun
68de8a2128
fix output_indexes name ( #4453 )
3 years ago
Jiao Dian's Power Plant
b07c5fc811
Remove unused imports in python ( #4378 )
3 years ago
WuJinxuan
10e9d91576
Add x86 MultiHeadAttention ( #4443 )
* fix doc, sync x86 gemm fix
Co-authored-by: EdVince <EdVince@users.noreply.github.com>
Co-authored-by: nihuini <nihuini@tencent.com>
3 years ago
nihui
15761fc1a6
arm vfpv4 asimdhp asimdfhm optimization for gemm ( #4432 )
3 years ago
nihui
c471826da1
fix arm bfloat2float float2bfloat oops ( #4439 )
3 years ago
dependabot[bot]
b5884827d6
Bump pypa/cibuildwheel from 2.11.3 to 2.11.4 ( #4438 )
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel ) from 2.11.3 to 2.11.4.
- [Release notes](https://github.com/pypa/cibuildwheel/releases )
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md )
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.11.3...v2.11.4 )
---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
3 years ago
nihui
88dba58992
fix gemm transpose B wrong result when tile N is not a multiple of 4, optimize load C ( #4430 )
3 years ago
Yoh
0fffefd3c9
fix crop bug and eliminate Tensor.clone ( #4416 )
* fix crop bug and eliminate Tensor.clone
* fix crop slice bug on msvc
* fix slice bug on msvc
3 years ago
nihui
7b3261dace
gemm arm optimization ( #4426 )
* cmake determine target 32bit and 64bit
* include opt source with non-runtime cpu
* check compiler support gnu style inline assembly
3 years ago
inisis
62fc16d157
pnnx readme remove duplicate space ( #4428 )
3 years ago
Fangjun Kuang
607c8f8332
Update README to include sherpa-ncnn for real-time speech recognition ( #4424 )
3 years ago
mizu-bai
c4574586ca
Add Example ncnn-fortran ( #4423 )
3 years ago
nihui
5da70724b1
matmul x86 use sgemm ( #4421 )
3 years ago
tpoisonooo
edb70f5b35
Update README.md ( #4419 )
3 years ago
tpoisonooo
8fea27fbb5
Update model-convert.md ( #4352 )
3 years ago
wzyforgit
e06081308b
Flush benchmark of some CPU model by tag 20221128 ( #4418 )
Flush RTX3090, FT-2000, 3A3000, 3A4000, 3A5000 benchmark data; add SW831 benchmark data.
3 years ago
nihui
1f1981052c
convolution deconvolution and deformableconv2d x86 use sgemm ( #4414 )
* drop old sgemm code
* fix convdw test
* fix avx512 gemm
* optimize prefer sgemm condition
3 years ago
nihui
9cc6eb1942
meet gemm x86 transpose alignment
3 years ago
nihui
18fbaebe68
get cpu l2 cache size and resolve gemm tile size ( #4411 )
* get cpu l2 cache size and resolve gemm tile size
* optimize constant tile K
* fix per-core l2 cache detection, better macos cpu cluster topology discovery
3 years ago
nihui
c5640a16c3
gemm x86 multiply alpha beta in post gemm stage, enable one_blob_only ( #4407 )
* gemm x86 multiply alpha beta in post gemm stage, enable one_blob_only
* relax mnk multiple restrictions
* make square tiles in each thread
* sanitize num_threads changes
3 years ago
nihui
d48f712599
force NxK size the multiple of native simd length to fix mis-alignment
3 years ago
nihui
2f8d1d4f9e
fix gemm x86 transpose b pack4 mis-alignment
3 years ago
nihui
fd1ac3c7a0
x86 optimization for gemm unified elempack ( #4387 )
3 years ago
nihui
18bb249564
fix some ci on ubuntu ( #4405 )
3 years ago
dependabot[bot]
58fca8c6e7
Bump pypa/cibuildwheel from 2.11.2 to 2.11.3 ( #4392 )
Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel ) from 2.11.2 to 2.11.3.
- [Release notes](https://github.com/pypa/cibuildwheel/releases )
- [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md )
- [Commits](https://github.com/pypa/cibuildwheel/compare/v2.11.2...v2.11.3 )
---
updated-dependencies:
- dependency-name: pypa/cibuildwheel
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
3 years ago
nihui
03550ba532
Update release-python.yml
3 years ago
nihui
c934c6e94a
fix openmp affinity abort when cpu goes offline ( #4370 )
3 years ago
tpoisonooo
bdcbc37a2d
fix(pybind11): build error ( #4368 )
3 years ago
shaoshengsong
47c4ab7394
add example project link ( #4365 )
3 years ago
magicse
8f9a524027
I added one more project to the list of examples. ( #4205 )
* Dedicated to coloring black and white photographs.
3 years ago
nihui
cf07bd9083
disable out-of-line atomics since ndk23+ for resolving linking issue with old ndk ( #4362 )
3 years ago
nihui
f527fe88ee
update glslang ( #4361 )
3 years ago
nihui
0736c5b658
Fix c api allocator ( #4360 )
* add some c_api interfaces related to allocator setup.
* fix errors in allocator parameters in c_api.
* test c api allocator
Co-authored-by: zhangtongshe <yuyuyezi@vip.qq.com>
3 years ago
nihui
6647396667
update release ci ( #4359 )
* update release ci
* find modern glslang
* parallel jobs on windows
3 years ago
Zhuo Zhang
a5e60ae11c
Fix windows-arm64 build for non-neon case ( #4227 )
3 years ago
Ikko Ashimine
cdba4ae936
Fix typo in stb_image.h ( #4358 )
exitting -> exiting
3 years ago
Fangjun Kuang
1b83fe4f16
Support mat.numpy() in Python ( #4356 )
3 years ago
nihui
057b5bb515
split tests ( #4354 )
3 years ago
nihui
aed05aa851
pnnx fuse more function to module ( #4351 )
* pnnx fuse more function to module
* rename some pass name
* fuse adjacent reshape, fuse pad conv2d
* fuse pad conv1d
3 years ago
nihui
ec1b07c9fe
pnnx fp16 option for ncnn and onnx weight type ( #4350 )
3 years ago
nihui
6967baaccc
pnnx convert torch bitwise left_shift right_shift ( #4349 )
3 years ago
nihui
eceac35a7f
implement MultiheadAttention kdim vdim ( #4347 )
3 years ago
nihui
498ca7341b
squeeze and expanddims 4d ( #4346 )
3 years ago