nihuini
c8ccccf045
adapt mlir changes
5 years ago
nihuini
c17eb4e208
multiheadattention layer
5 years ago
Zhiqiang Wang
2c370b2c58
Fix typo in docs ( #2727 )
5 years ago
nihuini
b51959802c
fix buffer2host copy, fix #2725
5 years ago
nihuini
b0d16325b1
fuse onnx binaryop with scalar
5 years ago
Cai Shanli
2edb3ed7a4
nanodet python demo ( #2723 )
* nanodet python demo
* add clip
* fix clip wh
* remove nms package requirement
5 years ago
nihuini
f7cbcaa72b
fix onnx normalize expand ghost shape
5 years ago
nihuini
c910574b5b
fuse onnx multiheadattention
5 years ago
teng
c3466a7798
fix array index out of bounds in examples/yolact.cpp ( #2722 )
5 years ago
RangiLyu
ecf1f413b4
fix duplicate variable name in examples/nanodet.cpp ( #2719 )
5 years ago
nihuini
f2a5ea7678
fix layernorm ghost input without affine
5 years ago
nihuini
7ac23ab34d
fuse onnx layernorm, fix 2-dim layernorm implementation, add test
5 years ago
zhiliu6
57397c418d
Optimize general AVX2 convolution. ( #2714 )
5 years ago
Xu Yang
fd634e9a58
remove unnecessary mat clone when NCNN_BENCHMARK enabled ( #2708 )
5 years ago
Dahan Gong
cbd410c237
fix broken inplace forward ( #2709 )
5 years ago
restyled-io[bot]
8c9bea2322
Restyle faster bbox calculation by background score ( #2693 )
* faster bbox calculation by background score
* Restyled by clang-format
* Restyled by astyle
* Restyled by clang-format
* Restyled by astyle
Co-authored-by: Qoo <r97922153@gmail.com>
Co-authored-by: Restyled.io <commits@restyled.io>
5 years ago
Zhuo Zhang
06e40a75d7
doc: add build-mlir2ncnn with commit sha and steps ( #2706 )
* doc: add build-mlir2ncnn with commit sha and steps
* add mlir git log detail
5 years ago
ncnnnnn
4083eafbc0
update how-to-build.md benchncnn out format ( #2704 )
5 years ago
小林
6afc736225
Update README.md ( #2699 )
5 years ago
zylo117
41fba71fa0
fix adaptive avg pooling accumulation overflow in vulkan using fp16 arithmetic ( #2698 )
5 years ago
nihui
3c92a1184b
arm neon optimization for general convolution im2col sgemm ( #2668 )
* arm neon optimization for conv3x3s1 winograd42
* better condition
* Update test_convolution.cpp
* Update test_convolution.cpp
* more proper conditions
* arm neon optimization for general im2col sgemm pack4
* add sgemm
* wip
* wip
* fix armv7 build
* more conditions blah blah
* code format
* fix convolution
* move packed convolution to separated header source
* unify weight data bf16
* proper conditions
* conv3x3s2 sgemm pack4 test
5 years ago
zylo117
31bc57b1e2
fix a missing import in python setup.py ( #2691 )
5 years ago
zylo117
65d71d8f23
support adaptive_pooling in vulkan implementation ( #2681 )
5 years ago
Youngsoo Lee
b9bed8d993
feat: add denormal options ( #2656 )
* feat: add denormal options
Flush-To-Zero (FTZ) and Denormals-Are-Zero (DAZ) are modes that bypass the IEEE 754 handling of denormal floating-point numbers on x86_64 and some x86 CPUs.
* feat: Integrate `flush_denormals` into `Extractor::extract`
* chore: replace global variable with `ThreadLocalStorage`
5 years ago
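A note on the denormal options commit above (b9bed8d993): FTZ replaces denormal results with zero, and DAZ treats denormal inputs as zero. The sketch below only illustrates what a denormal value is and what flushing it means. It is a software emulation on Python doubles, not the code from that commit, and the helper names `is_denormal` and `flush_to_zero` are hypothetical.

```python
import math
import sys


def is_denormal(x: float) -> bool:
    """True if x is nonzero and smaller in magnitude than the smallest normal double."""
    return x != 0.0 and abs(x) < sys.float_info.min


def flush_to_zero(x: float) -> float:
    """Illustrative software emulation of FTZ: a denormal becomes a zero of the same sign."""
    return math.copysign(0.0, x) if is_denormal(x) else x


# sys.float_info.min is the smallest positive *normal* double.
smallest_normal = sys.float_info.min

# Dividing it further still yields a representable value, but a denormal one.
denormal = smallest_normal / 4

flushed = flush_to_zero(denormal)      # denormal is replaced by 0.0
untouched = flush_to_zero(1.0)         # normal values pass through unchanged
```

With the hardware modes, the CPU does this replacement itself, which avoids the slow path that many x86 processors take when operating on denormal values.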
nihui
4409052cd2
link pthread explicitly for simpleomp and simplestl build
5 years ago
ncnnnnn
65d2651343
add android cmake ninja ( #2680 )
5 years ago
nihui
3ed6c21565
find threads in cmake config
5 years ago
nihui
248929811c
find and link with thread library
5 years ago
nihui
d3d16d2413
fix ncnnoptimize crash on models with multiple custom layers
5 years ago
nihui
9fd4d371ae
bridge image for adreno image upload and download ( #2658 )
* add bridge image for adreno image storage upload and download
* enable sbn1, print bugbilz flag
* blacklist old adreno
* let user choose use_image_storage option even when bug_storage_buffer_no_l1
5 years ago
Evgeny Proydakov
0c5b33d330
Using new version of glslang. Fixed several compile warnings: [-Wsign-compare], [-Wunused-parameter] ( #2615 )
5 years ago
RangiLyu
62761720bd
add NanoDet support in readme ( #2674 )
5 years ago
nihuini
a1839f6bce
fix ncnnoptimize shufflechannel reverse mode
5 years ago
nihuini
5b72a37544
fix megvii style shufflechannel blob count mismatch
5 years ago
nihui
ab56083ca5
arm neon optimization for conv3x3s1 winograd42 ( #2664 )
5 years ago
Cai Shanli
90a1fa158d
Fix model zoo download ( #2660 )
* fix model zoo download files
* fix yolov4.bin download split
* fix python/examples/shufflenetv2
5 years ago
Cai Shanli
e9fc84baae
add pypi wheels aarch64 ( #2662 )
5 years ago
Cai Shanli
eb6797ac7d
fix model zoo download files ( #2659 )
5 years ago
Youngsoo Lee
c16342ca15
fix(cmake): fix can't build on macos ( #2655 )
Co-authored-by: YoungSoo Lee <youngsoo.lee@navercorp.com>
5 years ago
nihuini
f47fbcbf83
preserve the very first onnx constant dimension size when we guess it is not the batchsize one, fix #2487
5 years ago
nihuini
2a57ca4942
reduce memory usage in lightmode, handle upload image allocation failure properly
5 years ago
Cai Shanli
71e886bed5
fix typo ( #2652 )
5 years ago
Zhuo Zhang
c2ed1874fc
workaround bugihfa cannot access fp16 element by vector index ( #2651 )
5 years ago
nihuini
3bf03379d7
fix pipeline compilation error on image store fp16sa
5 years ago
nihuini
bd68ee487b
fallback to cpu when image allocation failed, fix #2648
5 years ago
nihuini
77db369cae
fix upload data may be invalidated by previous pipelines with the same memory on integrated device, fix #2640
5 years ago
Zhuo Zhang
7c8c6d9ce9
add doc: use ncnn with own project ( #2645 )
5 years ago
DC Technology
11d1de7858
fix iOS minimum version when using Vulkan 1.2 ( #2621 )
* fix iOS minimum version when using Vulkan 1.2
* Update iOS minimum version to 9.0
5 years ago
nihuini
92be1cb404
fix conv1x1s1 int8 armv7 requant non-neon part
5 years ago
Cai Shanli
18980d717e
release to pypi on windows, linux and macos ( #2644 )
* release python
* fix twine upload only push tags, fix linux bdist_wheel so path error
* fix long_description_content_type
* remove log print
* set cibuildwheel version 1.6.3
* add pyproject.toml
* set release python only on push tag
* add ci build python 2.7
* remove python2.7; update readme.md
5 years ago