c2293815
fix(autodiff): proxy_graph_detail::make_backward_graph supports multiple opnodes by
2022-03-17 17:20:21 +0800
335d51b4
ci(mge): set env when downloading mkl by
2022-03-25 13:04:01 +0800
0948f587
ci(mge): download mkl from ftp by
2022-03-23 15:56:39 +0800
22be7e11
docs(mge/functional): typo by
2022-03-22 08:57:13 +0000
111c150d
docs(mge/functional): fix typo by
2022-03-22 08:53:46 +0000
240a685f
feat(opencl): optimize lite OpenCL APIs: opencl_clear_global_data and enable_opencl_deploy by
2022-03-11 18:38:13 +0800
4f60fbbb
fix(mge/distributed): add polling to solve xmlrpc server io error by
2022-03-09 17:23:15 +0800
273c0e87
fix(autodiff): fix some bugs in relation to 2nd order grad by
2022-03-01 15:16:53 +0800
bc9aa47a
feat(mge/indexing): support newaxis by
2022-02-28 13:57:52 +0800
9779bc7f
fix(imperative): allow rng op shape inference to be fallible by
2022-03-02 11:20:58 +0800
8f7fa90c
fix(lite): fix lar cuda option crash for mdl model by
2022-03-17 13:52:53 +0800
1071f2ab
build(flatbuffer): fix flatbuffer build with uclibc by
2022-03-22 16:09:00 +0800
7582157a
chore(deps): change flatbuffers repo from google to MegEngine by
2022-03-22 14:42:32 +0800
8fa9a8de
fix(imperative): fix dot-op implementation by
2022-03-19 22:05:07 +0800
6c413ba9
refactor(mge): refactor physical tensor by
2022-03-18 18:33:31 +0800
d56570d9
fix(megbrain): add rdnn to copybara by
2022-03-18 18:03:18 +0800
7de1bb11
fix(mge/utils): disable memory forwarding for subgraph by
2022-03-18 16:20:48 +0800
b7c9361f
perf(mge/functional): add infer_output_attrs_fallible for some ops by
2022-03-17 22:29:30 +0800
a4327c4d
perf(imperative): add dim_expansion transform for conv/bn1d by
2022-03-17 14:13:11 +0800
72a70dd6
perf(imperative): specialize convolution implementation by
2022-03-14 20:17:32 +0800
12a3ef8d
refactor(fastrun): decouple fastrun from computing graph by
2022-03-09 19:07:19 +0800
0a6f4a88
fix(mge/dtr): fix dtr problem by
2022-03-15 14:33:02 +0800
529b394f
fix(imperative): fix profiler problem by
2022-03-16 21:24:48 +0800
e64536a3
fix(imperative): fix the dtype promote problem when amp by
2022-03-11 18:30:32 +0800
2b80806f
perf(imperative/src): improve dot performance by
2022-03-08 20:07:12 +0800
2f3bc2db
perf(mge/utils): move astensor1d into C++ by
2022-03-11 13:50:58 +0800
fa62f6c0
perf(mge/utils): move convert_input into C++ by
2022-03-10 18:18:50 +0800
d98be080
perf(mge): move Const into C++ by
2022-03-09 19:17:43 +0800
1709b394
perf(mge/functional): speed up Broadcast and Reshape by
2022-03-04 16:32:36 +0800
0f736a0a
perf(mge/functional): speed up Dimshuffle by
2022-03-04 17:28:54 +0800
3e5e08b0
perf(mge/functional): speed up RemoveAxis by
2022-03-03 19:29:39 +0800
a4d473c9
perf(mge/functional): speed up AddAxis by
2022-03-03 19:00:53 +0800
3e206d89
perf(mge/functional): speed up Split by
2022-03-03 17:38:47 +0800
730ddc2d
perf(interpreter): improve interpreter performance by
2022-03-09 17:15:32 +0800
729242f9
refactor(imperative): move typecvt code of several ops to C++ by
2022-03-07 15:47:59 +0800
3c3fc6f3
refactor(imperative): move python code of elemwise/reduce/conv2d/bn to C++ by
2022-02-23 17:06:21 +0800
84466261
perf(imperative/src): improve elemwise by
2022-03-03 15:51:29 +0800
e400b7ff
perf(imperative): enable memory forwarding for imperative by
2022-02-09 16:10:46 +0800
84d1a440
fix(imperative): do not use output_desc in rng ops by
2022-03-03 15:02:56 +0800
1ce78aa0
fix(imperative): destruct dnn handles last by
2022-02-16 14:17:28 +0800
0cb60d64
feat(imperative): add output_descs for apply_on_physical_tensor by
2022-02-23 10:32:02 +0800
c7ded2fe
refactor(imperative): remove unnecessary reserve in small vector by
2022-02-17 10:42:27 +0800
8c2b916e
refactor(imperative): remove some methods in proxy graph by
2022-02-15 16:32:58 +0800
2348a963
refactor(imperative): apply workspace limit hook to mini graph by
2022-02-16 21:36:05 +0800
fea46ea9
perf(imperative): add opr cache for apply_on_physical_tensor by
2021-05-11 19:33:02 +0800
ea4e6ab9
fix(mgb/opr): fix shape cache of NvOF by
2021-07-01 18:05:12 +0800
3228fb75
fix(cuda): conv algo heuristic choose by
2022-03-17 20:56:34 +0800
8c415f4e
feat(dnn): cuda nhwc nearest resize supports channel counts other than 1 or 3 by
2022-03-15 20:41:58 +0800
04475744
feat(opencl): add OpenCL cache compat level api by
2022-03-08 13:22:42 +0800
6fb5a343
build(flatbuffer/cx2): fix cx2 build and uclibc flatbuffer build by
2022-03-16 18:52:33 +0800
87de704a
feat(gopt): fuse conv h_swish by
2022-03-02 19:53:38 +0800
4adba378
feat(lite): add example script and some small change for lar by
2022-03-03 15:27:37 +0800
87f00232
fix(mge/gm): fix missing dtype checking while attaching tensors by
2022-03-07 16:24:08 +0800
3726f5cc
feat(gopt): merge consecutive relayout and dimshuffle into one relayout to optimize CD4 performance by
2022-02-25 11:21:51 +0800
1fead9b6
feat(gopt): merge consecutive dimshuffle and relayout into one relayout to optimize CD4 performance by
2022-02-22 19:36:04 +0800
26d1e4f7
feat(gopt): optimize CD4 pass rule for elemwise and typecvt to let CD4 start as soon as possible by
2022-02-22 16:18:24 +0800
ac26bdce
fix(cuda): fix direct conv speed and memory problem by
2022-03-07 20:01:06 +0800
f7994683
feat(cuda): add large kernel direct conv to heuristic algo chooser by
2022-03-07 20:00:14 +0800
6dc0c0b9
fix(dnn): fix the sync problem in some kernels by
2022-03-10 11:50:09 +0800
04193e3b
feat(dnn): add nearest mode for remap and resize by
2022-03-01 14:43:04 +0800
69b89388
docs(mge/functional): fix debug_param set_execution_strategy docstring by
2022-03-04 17:09:13 +0800
93c7e451
feat(arm): delete the redundant implementation by
2022-02-23 15:41:38 +0800
e34a642b
feat(fallback): reduce support general intrinsic by
2022-02-17 18:27:26 +0800
10f23778
feat(fallback): add simd general intrinsic by
2022-02-17 18:26:45 +0800
286051ed
feat(dnn): differentiate sass kernel with cuda version by
2022-03-01 19:21:10 +0800
f78b60ec
feat(bazel): make bazel gensass depend on cuda toolchain version automatically by
2021-11-24 19:19:05 +0800
f48227c0
feat(mgb): show more details for cuda driver api call by
2021-11-17 20:11:00 +0800
bb5af9b4
feat(lite): hide lar gflags symbols for static link by
2022-02-25 11:05:48 +0800
d8bb3ff5
fix(cuda): fix fp16 tensorcore gemm split k workspace by
2022-03-02 10:40:05 +0800
597efed4
feat(lite): add get last error code interface in lite c by
2022-02-25 14:51:27 +0800
90c8a58c
docs(docstring): add pad docstring by
2022-02-24 22:50:43 +0800
5f4501e0
fix(gopt): fix conv bias fuse 2 noline by
2022-02-25 21:37:27 +0800
ac2f548c
docs(imperative/dataloader): update preload description by
2022-02-24 13:25:37 +0800
73b518b7
feat(lite): add get physical addr interface in lite by
2022-02-22 19:00:07 +0800
f67086ad
fix(lite): fix lite global layout transform symvar replace error by
2022-02-28 10:56:33 +0800
c42ce937
(tag: v1.8.2, release-1.8)
feat(mge/third_party): update cutlass version by
2022-02-25 11:20:54 +0800
9902ccfc
chore(release): bump version by
2022-02-27 07:07:24 +0000
8e5410e4
feat(cuda): add fp16 compute 16 kernel by
2022-02-23 02:20:14 +0800
472e2f96
refactor(cuda): depthwise large kernel by
2022-02-21 21:54:44 +0800
e698ec20
feat(cuda): float16 depthwise large kernel conv compute fp32 by
2022-02-15 22:39:17 +0800
48406382
feat(cuda): support float16 depthwise large kernel conv by
2022-02-07 18:28:06 +0800
7042f76b
perf(cuda): speedup conv backward data with small feature map and large filter size by
2022-01-26 19:44:51 +0800
87a2aeeb
perf(cuda): speedup chanwise conv with small feature map and large filter size by
2022-01-21 20:29:19 +0800
2293385e
feat(mge): add conv padding mode by
2022-01-26 13:27:42 +0800
afe9c4b5
feat(dnn/cuda): add implicit bmm kernels for large kernel depthwise convolution backward filter opr by
2022-02-14 16:47:49 +0800
e8a16929
feat(dnn/cuda): add heuristic rule for implicit batched gemm large kernel dwconv2d kernels by
2022-02-11 18:07:54 +0800
38067472
fix(dnn/cuda): fix ci by
2022-02-11 17:38:00 +0800
1da58ae1
feat(dnn/cuda): add implicit bmm large kernel dwconv2d dgrad kernels by
2022-02-10 20:24:43 +0800
96050073
feat(dnn/cuda): add implicit bmm large kernel dwconv2d fprop impl by
2022-02-07 18:54:21 +0800
4462953f
feat(mge/third_party): update cutlass version by
2022-02-25 11:20:54 +0800
d7b0994a
feat(cuda): add fp16 compute 16 kernel by
2022-02-23 02:20:14 +0800
8a2e92bd
refactor(cuda): depthwise large kernel by
2022-02-21 21:54:44 +0800
6b8a69d5
feat(cuda): float16 depthwise large kernel conv compute fp32 by
2022-02-15 22:39:17 +0800
bc385b53
feat(cuda): support float16 depthwise large kernel conv by
2022-02-07 18:28:06 +0800
7d2063e3
perf(cuda): speedup conv backward data with small feature map and large filter size by
2022-01-26 19:44:51 +0800
72403e89
perf(cuda): speedup chanwise conv with small feature map and large filter size by
2022-01-21 20:29:19 +0800
28d48f2f
fix(mgb/src): fix megbrain cmake not supporting android_nn by
2022-02-11 12:41:13 +0800
ab6d12ca
feat(mge): add conv padding mode by
2022-01-26 13:27:42 +0800
177001d5
refactor(dispatch): allow dynamic type creation by
2022-02-16 15:10:11 +0800
150a6a61
perf(dispatch/trace): remove unnecessary h2d for constant by
2022-02-09 03:10:15 +0800