81d8c73a
perf(dispatch/trace): several tricks to speed up trace by
2022-02-07 22:48:57 +0800
4fa61620
perf(dispatch): improve performance of dispatch system by
2022-01-27 23:53:24 +0800
ca001777
perf(dispatch): speed up dispatch system by
2022-01-20 19:40:09 +0800
187c1dc0
fix(jit): copy aux var when shallow copying JITExecutor by
2022-01-05 14:31:31 +0800
7bd848ce
fix(subgraph): fix hand-written backward for several jit-elemwise ops by
2021-12-29 15:00:29 +0800
7be7656c
fix(imperative): explicitly manage global structures by
2021-12-13 18:54:31 +0800
62034fb2
fix(imperative): make CompNode finalize happen before global object destructors by
2021-12-10 19:00:19 +0800
59cbf958
fix(subgraph): use CompiledOp in cpu to avoid workspace error by
2021-12-03 12:54:13 +0800
b6ce02a1
fix(subgraph): fall back to cg if jit unsupported by
2021-09-28 17:32:45 +0800
21f5a7fc
fix(subgraph): fix device recognition and scalar propagation by
2021-09-27 14:58:21 +0800
27346b0b
test(opr): add scalar check for opr_test by
2021-09-27 14:57:03 +0800
22504523
perf(imperative): improve shape inference by
2021-09-26 20:40:47 +0800
df3474ca
perf(functional): rewrite several elemwise ops with jit subgraph by
2021-09-26 19:56:45 +0800
c55fda9a
fix(fastrun): don't kill profiling worker by
2021-09-26 19:54:16 +0800
2775f458
feat(subgraph): subgraph builder supports jit and custom grad by
2021-09-26 19:53:02 +0800
3c61e0e0
feat(ops): add JITFusion op by
2021-09-26 19:51:36 +0800
aa587446
feat(subgraph): support shape inference for CompiledOp by
2021-09-26 19:49:46 +0800
1c1e9b00
fix(rng): init layout strides by
2021-09-26 19:40:37 +0800
9527859c
feat(opcache): add ndim and has_value to cache key by
2021-09-26 19:39:03 +0800
cbb47089
perf(interpreter): add fastpath for GetVarShape by
2021-09-26 19:34:40 +0800
b4581788
feat(opr): add mutable tensor opr by
2021-09-26 19:32:35 +0800
47fe7663
feat(dnn/cuda): add implicit bmm kernels for large kernel depthwise convolution backward filter opr by
2022-02-14 16:47:49 +0800
dcc96935
feat(dnn/cuda): add heuristic rule for implicit batched gemm large kernel dwconv2d kernels by
2022-02-11 18:07:54 +0800
6cefabe7
fix(dnn/cuda): fix ci by
2022-02-11 17:38:00 +0800
888f4e46
feat(dnn/cuda): add implicit bmm large kernel dwconv2d dgrad kernels by
2022-02-10 20:24:43 +0800
08d8635f
feat(dnn/cuda): add implicit bmm large kernel dwconv2d fprop impl by
2022-02-07 18:54:21 +0800
93ceb80a
refactor(imperative): fix broadcast, reshape, reduce by
2022-02-15 14:14:24 +0800
d919aaeb
test(imperative): reopen special interpolate test and sync when testing rng by
2022-02-11 16:46:34 +0800
ca2deebc
fix(imperative/tensor): make the @ operator have the same functionality as the matmul functional by
2022-02-11 17:12:29 +0800
e860a083
refactor(mge/indexing): move indexing into c++ by
2022-01-19 19:12:25 +0800
e6706be2
refactor(imperative): remove infer_output_mem_desc by
2022-01-25 19:17:23 +0800
a5af35c1
refactor(imperative): remove command buffer by
2022-01-05 17:10:19 +0800
bdb853ee
fix(mgb): fix extra device malloc when load MultipleDeviceTensorWithFormatHolder by
2022-02-08 19:03:58 +0800
406115db
fix(imperative): syncbn fp16 support by
2022-01-21 17:18:04 +0800
d5ef7923
perf(lite): optimize lite tensor get data by sharing by
2022-01-26 15:22:49 +0800
19fe2e94
chore(release): bump version by
2022-02-08 05:39:23 +0000
b2f15a24
(tag: v1.8.1)
test(trace): test subtensor on unknown shape by
2022-01-28 14:10:31 +0800
6a2348f4
fix(trace): assume result is not scalar when shape is valid by
2022-01-28 14:00:18 +0800
fc212042
fix(traced_module): fix Module compatibility issue and traced module getattr check by
2022-01-27 17:09:43 +0800
ce9ad07a
feat(ci): update ci and readme by
2022-02-07 14:24:11 +0800
1add4517
test(trace): test subtensor on unknown shape by
2022-01-28 14:10:31 +0800
54eef558
fix(trace): assume result is not scalar when shape is valid by
2022-01-28 14:00:18 +0800
84d99d1c
fix(traced_module): fix Module compatibility issue and traced module getattr check by
2022-01-27 17:09:43 +0800
88486570
test(trace): test subtensor on unknown shape by
2022-01-28 14:10:31 +0800
c34a75d0
fix(trace): assume result is not scalar when shape is valid by
2022-01-28 14:00:18 +0800
bebb2cf4
Merge pull request #428 from P2Oileen:fix-pad by
2022-02-07 12:01:37 +0800
e2b79ea0
feat(mgb): reduce the number of contexts created by trtruntimeopr by
2022-01-17 16:37:26 +0800
6157d9cf
fix(traced_module): fix Module compatibility issue and traced module getattr check by
2022-01-27 17:09:43 +0800
26b52a61
feat(lite): add get model information before create network interface by
2022-01-24 17:28:34 +0800
5e17b3e4
Merge pull request #426 from Qsingle:fix-pixel_suffle by
2022-02-07 12:00:33 +0800
2bebe80e
fix(imperative): fix the default pickle protocol version of save by
2022-01-21 18:13:12 +0800
f02cd2d2
Merge pull request #436 from bealwang/master by
2022-01-30 14:33:28 +0800
df4153dc
docs(readme): add more badges by
2022-01-29 15:23:34 +0800
ea91babb
Merge pull request #435 from MegEngine/try-import by
2022-01-25 16:51:07 +0800
8e94af9d
Merge pull request #400 from jieli-matrix:docstring-svd by
2022-01-25 15:20:26 +0800
260923e1
(tag: v1.8.1.m1)
perf(aarch64): optimize aarch64 uint16 relayout with block_w==3 by
2022-01-24 11:12:27 +0800
b04c3d14
feat(lite): add set address ptr pair interface by
2022-01-21 11:24:40 +0800
17f2dffb
fix(imperative/cpu/multithread): fix multithreading in imperative by
2022-01-21 16:30:06 +0800
3159eeca
fix(init): fix fan_in and fan_out for group conv2d by
2021-09-26 17:23:32 +0800
51c03f3e
Merge pull request #434 from MegEngine/try-import by
2022-01-25 15:08:00 +0800
3a27ee01
Update README.md by
2022-01-19 18:19:06 +0800
9809871e
docs(readme): update README.md by
2021-11-19 18:59:49 +0800
8548cc7e
Merge pull request #389 from weixiao-huang/docs/auto-correct by
2022-01-19 17:56:24 +0800
6d635206
docs(mge): add space between Chinese and English for README_CN.md by
2021-11-13 23:23:52 +0800
24c6dea9
(tmp-test)
ci(test): test2 by
2022-01-19 15:45:35 +0800
ccd1eb97
ci(test): test1 by
2022-01-19 15:45:18 +0800
275b6311
(tag: v1.8.0)
fix(imperative): fix collections usage error from python3.10 by
2022-01-18 15:47:37 +0800
95ac0555
feat(dnn,mgb,imperative): add diag opr implementation by
2021-11-30 18:32:26 +0800
39d77fb5
feat(arm): add arm rnn_cell/lstm_cell/lstm optimized kernel by
2022-01-04 15:55:45 +0800
3ddc32d3
feat(android/whl): support android whl by
2022-01-11 17:44:43 +0800
f509b1be
fix(build): split elemwise_multi_type cpp by
2022-01-11 13:09:24 +0800
3252016e
Merge pull request #401 from LosReturn:patch-1 by
2022-01-18 19:19:58 +0800
f7e034b5
feat(lite): add global layout transform python interface for lite by
2021-12-31 16:18:37 +0800
e70c07a2
feat(lite): add global layout transform c/c++ interface for lite by
2021-12-31 16:14:09 +0800
86ee4638
Merge pull request #402 from AA1HSHH:docstring-reshape by
2022-01-18 15:19:57 +0800
3251f501
fix(mgb/cuda-stub): add libcuda-wrap_11.4.h to fit the CUDA11.4 toolchain by
2022-01-13 17:28:37 +0800
2c2df830
fix(cmake): enable custom op when building develop to avoid the pytest failure by
2022-01-13 18:15:05 +0800
ee0b95e9
feat(dnn/elemwise/arm_common): support part of arm ternary elemwise multithread BCAST111C_VEC_BCAST111C and BCAST101_VEC_BCAST101 by
2022-01-14 14:29:40 +0800
7ea104d7
Revert "fix(mge): replace _full_sync by sync" by
2022-01-17 15:20:15 +0800
cbbca5fb
feat(mge): add softmax op use cudnn api by
2021-12-17 10:35:12 +0800
1d2510b6
fix(module): fix module dumped in old version without _short_name attr by
2022-01-17 16:31:52 +0800
cf5e9488
fix(traced_module): fix module trace transformation by
2022-01-17 15:06:44 +0800
97c90d91
feat(traced_module): add _exclude_from_trace by
2021-12-01 16:30:33 +0800
30e565e5
fix(traced_module): fix error message by
2021-12-01 14:26:05 +0800
de8ffe0c
refactor(imperative): unify interpreter option setting by
2022-01-14 16:36:33 +0800
8b60bdfa
fix(mge): replace _full_sync by sync by
2022-01-14 18:06:29 +0800
20b42a8c
fix(dnn): add naive lstm kernel by
2022-01-04 11:48:43 +0800
2faa6ea5
Merge pull request #213 from kxz18:rnn by
2022-01-16 13:19:38 +0800
f5b8fec4
fix(imperative): remove big tensor from host side by
2021-12-27 12:09:17 +0800
68cde873
fix(mge/imperative): support broadcast with None by
2022-01-11 14:29:41 +0800
0bdd0b14
refactor(dispatch): switch to new dispatch system by
2022-01-14 13:52:29 +0800
d3689c3f
feat(imperative/python): add transformation manager by
2022-01-14 13:51:47 +0800
9ce1f0f5
refactor(dispatch): implement grad by
2022-01-14 13:23:37 +0800
c609c031
refactor(dispatch): implement symbol by
2022-01-14 13:23:14 +0800
e32929df
refactor(dispatch): implement scalar by
2022-01-14 13:22:35 +0800
59084fa8
refactor(dispatch): implement lazy_eval by
2022-01-14 13:22:11 +0800
d2b67c2a
refactor(dispatch): implement trace by
2022-01-14 13:21:46 +0800
39ac606b
refactor(dispatch): implement eval by
2022-01-14 13:21:15 +0800
42759dc7
test(dtr): make dtr_resnet1202 isolated by
2022-01-14 13:19:35 +0800
7f03ae9a
fix(imperative): reduce tls usage by
2022-01-14 13:17:35 +0800