3bf73ff1
feat(dnn): add cuda preprocess fusion by
2020-12-01 17:26:17 +0800
86cf7490
feat(dnn/aarch64): add quantizeds4 matmul int4x4x16_k8x8x8 by
2020-09-16 17:18:58 +0800
bff0fc61
fix(mge/interpreter): fix outputs check on async level0 by
2020-12-07 15:04:09 +0800
142f31a8
perf(dnn/cuda): change conv_bias heu, prefer dnn chanwise impl, dislike dnn batch gemm conv1x1 by
2020-12-07 17:15:55 +0800
f214e146
refactor(mgb/cuda): use single implementation of get_device_prop from utils by
2020-12-07 17:30:05 +0800
54e79dd1
perf(mgb/cuda): do not call cudaGetDeviceProperties to avoid io traffic by
2020-12-06 17:01:40 +0800
5f171298
feat(mgb/gopt): add AxisAddRemove opr support for cd4 opt pass by
2020-12-04 14:34:36 +0800
93f4977c
feat(mge/imperative): add thread name by
2020-12-04 15:45:41 +0800
98a74e4a
refactor(dnn): refactor opr proxy in test by
2020-12-02 22:04:29 +0800
57546b4c
test(mge/distributed): fix test skip condition error by
2020-12-04 12:25:08 +0800
90e7cb00
feat(externcopr/lar): imp lar run extern c opr with dynamic param by
2020-11-26 20:20:52 +0800
dbb64b46
feat(debug/android): opt android backtrace by
2020-12-03 15:09:59 +0800
3e00e3f6
feat(debug/linux): opt linux backtrace by
2020-12-03 09:04:14 +0800
783a6126
feat(debug/macos/windows): imp macos/windows backtrace, fix mem issue by
2020-11-30 19:21:18 +0800
e92670e8
fix(mgb/atlas): fix case when batchsize is more than model max batchsize by
2020-12-02 17:32:18 +0800
147dbf8a
fix(test): fix a race condition in TestCudaMemAlloc by
2020-12-02 15:25:37 +0800
7066ad5b
feat(dnn): add uint16 support by
2020-11-23 14:45:41 +0800
a1877ee0
refactor(dnn): refactor algo interface, use algoinfo instead of global algorithm by
2020-11-16 23:08:37 +0800
cb59c278
feat(mlir/ir): add more op definitions by
2020-11-27 18:06:08 +0800
9ec8d375
feat(externcopr): add config extern c opr dynamic param by
2020-11-21 18:50:05 +0800
ee4ea7fd
test(distributed/test): make distributed test stronger by
2020-11-25 19:29:15 +0800
3ecded74
refactor(distributed/server): use port 0 to get available port by
2020-11-25 19:28:11 +0800
88e918e2
feat(mgb/jit): add scf.ForOp in MgbToGpuLoweringPass by
2020-11-29 17:18:36 +0800
7aa54b0e
feat(mge): enable memory swap and drop/recomputation by
2020-11-09 00:40:06 +0800
6f5d0feb
perf(dnn/cuda): enhance performance for pooling forward by
2020-11-25 16:51:32 +0800
0560a218
chore(dnn/test): refactor megdnn arm_common test by
2020-11-21 18:05:52 +0800
f7731bd4
fix(mgb/jit): fix a pointer bug in mlir executable_cuda by
2020-11-26 17:08:52 +0800
810d8cba
fix(mgb/jit): add cmake target MLIRTosa for latest llvm-project by
2020-11-25 17:17:57 +0800
2ad8c5e1
fix(mge/io_remote): fix remote send/recv gradient at trace by
2020-11-25 14:49:52 +0800
f470df4f
fix(mgb/opr): fix convbias with no bias when weight preprocess by
2020-11-25 18:46:06 +0800
6c4841e8
fix(mge/quantization): `disable_fake_quant` does not work correctly by
2020-11-23 13:50:26 +0800
aa953c3b
fix(mge/module): fix missing import by
2020-11-25 11:23:21 +0800
5a01de78
fix(mge): fix scalar transpose by
2020-11-24 17:55:46 +0800
6b9ac894
fix(mgb/topk): fix topk grad by
2020-11-20 20:38:50 +0800
6856ce9c
feat(dnn): support conv bias activation for nchw4 input tensor format and nchw output tensor format by
2020-11-12 17:01:50 +0800
85368643
chore(version): fix dev version to a large number by
2020-11-23 11:02:33 +0800
61c5c9cf
chore(cmake): normalize some cmake message level by
2020-10-20 11:48:30 +0800
c2255914
(tag: v1.1.0, release-1.1)
chore(release): bump version by
2020-11-23 14:40:51 +0800
2e874208
fix(module): fix docs in normalization by
2020-11-21 17:32:08 +0800
638ab52f
feat(mge/imperative): simulates scalar by
2020-09-30 17:11:26 +0800
7167fdbd
feat(mge/module): add normalization module includes group_norm, instance_norm and layer_norm by
2020-08-11 14:58:57 +0800
94dba16f
perf(mge/imperative): misc optimizations by
2020-11-20 17:48:50 +0800
9f139562
test(mge): fix requires-test.txt for doctest by
2020-11-19 21:40:00 +0800
442058ae
feat(mge/quantize): add batch matmul activation module for inference by
2020-11-16 17:56:27 +0800
81e0dae4
docs(mge): update GradManager docs by
2020-11-19 18:15:42 +0800
66671006
feat(mge): use weakref for GradManager.attach by
2020-11-19 18:14:45 +0800
75ca5bfe
feat(mge): remove GradManager.detach until it is ready by
2020-11-17 16:25:49 +0800
544b7983
revert: feat(mge/grad_manager): add `clear_grad` method for GradManager by
2020-11-17 15:28:34 +0800
176268d2
test(mge): disable gc in leak check by
2020-11-20 21:35:13 +0800
5d0f8da4
feat(mgb/jit): add Dimshuffle and lowering passes in jit mlir backend by
2020-11-19 23:34:06 +0800
0007b9e0
build(third_party): update llvm-project by
2020-11-16 14:35:49 +0800
404ef808
feat(mgb/jit): adapt jit mlir backend to new mgb dialect and add typecvt by
2020-11-13 06:24:02 +0800
cc85047b
fix(mge/trace): fix sublinear in trace by
2020-11-19 12:34:21 +0800
b9918c32
feat(mge/distributed): support distributed key-value store by
2020-11-18 14:13:40 +0800
ab82c8da
feat(mge/device): add python binding for get_mem_status_bytes by
2020-11-18 19:31:49 +0800
ab6328c5
feat(imperative): port persistent cache by
2020-10-29 13:39:08 +0800
60c6d59f
feat(mgb/core): support bias preprocess in conv_bias by
2020-11-13 15:12:38 +0800
ff8ef9ed
docs(dnn): add comments of weight preprocess interface by
2020-11-11 15:27:57 +0800
13272eaa
fix(mge/trace): fix random op in symbolic trace by
2020-11-18 21:48:26 +0800
1fed5929
perf(mge/optimizer): disable convert_inputs for optimizer step by
2020-10-28 21:42:28 +0800
1f75c7ad
ci(midout): fix midout and reopen midout test; only test function, do not check size by
2020-10-28 15:19:05 +0800
49547295
fix(trace): link io-op to avoid deadlock by
2020-11-18 11:39:59 +0800
064c774f
feat(imperative): impl hashable for SendRecv and add virtual input for Recv by
2020-11-18 11:38:06 +0800
798f7b3e
feat(imperative): add virtual_deps op by
2020-11-18 11:31:51 +0800
1e71e0af
refactor(dnn): refactor deconv algo by
2020-11-17 00:52:11 +0800
89ad33ae
feat(dnn/cuda): support weight preprocessing for cutlass algorithms by
2020-11-04 10:41:54 +0800
33e8879a
feat(mge/quantization): support distributed qat by
2020-11-16 15:21:26 +0800
8e11204a
perf(mge/module): optimize conv_bn qat module to improve performance by
2020-11-17 11:22:34 +0800
51fa530d
fix(mge/interpreter): add check for invalid tensor ptr by
2020-11-10 15:16:16 +0800
634de590
feat(mge/imperative): add valid flag of `infer_output_attrs_fallible` by
2020-11-06 18:27:43 +0800
50c4daac
feat(mge/interpreter): add async_level mechanism for Interpreter by
2020-11-02 17:41:20 +0800
82b0f677
fix(mge/core): fix dtype promotion issue for quantized dtype by
2020-11-16 11:41:11 +0800
8118a594
fix(mge/utils): fix get_oprs_seq of cgtools by
2020-11-13 18:09:59 +0800
ae8c3c81
feat(mge/functional): add python wrapper for fake quant opr by
2020-11-10 18:25:23 +0800
b60cc8ca
feat(mgb/opr): add megbrain fake quant opr by
2020-11-10 18:23:14 +0800
c03249c0
feat(dnn/opr): add megdnn fake quant opr by
2020-11-10 18:22:16 +0800
2e530779
fix(mge/trace): use xpux device when dump by
2020-11-11 11:52:32 +0800
739f927c
feat(dnn/cuda): opt dp4a conv for small channel based on cutlass by
2020-11-09 18:01:02 +0800
4f9948d0
chore(mgb/core): add MGB_WORKER_SHORT_SPIN env variable to set short spin by
2020-11-11 17:47:57 +0800
4f527af4
chore(mgb/core): fix the repeated code by
2020-11-11 17:45:18 +0800
1f8e4075
fix(mkl): fix windows mkl LOG compute exception by
2020-11-13 12:43:58 +0800
fea5cf62
fix(sdk/loader): debug sstar loader by
2020-10-14 22:26:30 +0800
9cb3c07c
feat(mge/functional): add elemwise mode support string input by
2020-11-12 11:26:17 +0800
8fad00a1
feat(mge/functional/nn): add conv1d padding by
2020-10-21 22:45:41 +0800
4aa277a2
refactor(dnn/cuda): misc by
2020-11-06 17:47:01 +0800
094601e8
feat(mge/distributed): allow remote grad by using grad manager by
2020-11-12 14:45:05 +0800
f7b2bdae
refactor(dnn): refactor algorithm type interface by
2020-11-07 21:34:09 +0800
d793c87c
refactor(mlir/dialect): redefine mgb dialect by
2020-11-11 16:27:57 +0800
005ead5a
docs(cpu/comp_node): note cpu_default() by
2020-11-10 14:45:04 +0800
9415ba58
feat(src/core): free weight preprocessed weight by
2020-10-12 21:17:56 +0800
7cd71c31
fix(mgb/gopt): fix cd4 elemwise transform by
2020-11-03 20:40:50 +0800
cae8c8a4
test(mge/parampack): add test for parampack by
2020-11-10 12:54:00 +0800
02438ee6
refactor(mge/distributed): use threading.Thread to create Server by
2020-11-09 15:49:37 +0800
a1efd4d5
fix(mge/parampack): fix param pack when no param left by
2020-11-09 15:33:52 +0800
e0e18964
docs(mge): formatting math in docstring by
2020-11-09 12:10:32 +0800
18ec5341
refactor(dnn): remove unused costmodel in cuda by
2020-11-10 23:12:13 +0800
e39f9386
refactor(dnn): remove ProfileCache and matmul algo in x86 by
2020-11-10 22:20:27 +0800
28dbadf7
feat(imperative): add remap opr by
2020-09-29 15:13:51 +0800
280861ae
feat(atlas): add acl.json support, use for profile by
2020-08-05 15:16:54 +0800
1e0fb127
perf(mge/functional): reduce matmul python overhead by
2020-11-09 00:09:51 +0800