7b2a76d1
refactor(mgb): make conv handle noncontiguous tensors by
2021-06-09 14:49:04 +0800
ca2828dd
fix(dnn/x86): fix x86 int8 matmul ldc bug by
2021-06-28 20:58:57 +0800
aa4e8476
fix(interpreter): release gil when interpreter blocking by
2021-05-28 12:49:42 +0800
bd62a0a6
fix(interpreter): remove notice flag of produce_tensor by
2021-05-18 13:38:04 +0800
c78a7848
test(profiler): simple tests for profiler by
2021-04-26 19:39:45 +0800
d6079db1
feat(profiler): add memory flow format for profiler by
2021-04-26 13:29:34 +0800
4286bb9f
feat(profiler): add launch script for profiler by
2021-04-26 13:27:51 +0800
9d47c3ba
feat(profiler): imperative profiler support tracing by
2021-04-26 13:26:01 +0800
cdcb46ba
feat(profiler): add chrome timeline format for profiler by
2021-05-15 20:54:36 +0800
1d64792b
refactor(profiler): detach profiler from interpreter by
2021-05-15 18:14:44 +0800
f2027b8d
refactor(interpreter): recompute with do_apply_op by
2021-05-15 18:11:34 +0800
5cb35c1b
refactor(profiler): add state structs to replay recorded events by
2021-05-15 14:23:00 +0800
84fc5c92
refactor(profiler): remove dump logic for old profiler by
2021-05-15 13:28:50 +0800
5b5a8261
refactor(profiler): use macro to simplify event recording/definition by
2021-05-15 12:21:41 +0800
1ce40b5b
refactor(interpreter): wrap accesses to channel/worker state by
2021-05-14 19:21:51 +0800
90d39057
feat(mge): add mge._exit by
2021-04-26 13:14:53 +0800
846642a8
docs(misc): add some notes by
2021-04-26 13:12:09 +0800
fb8f1534
fix(cmake/debug): fix asan not working in windows env by
2021-06-25 17:10:38 +0800
40085acb
fix(mgb): remove unnecessary cudnn8 warning by
2021-06-24 18:55:04 +0800
54a4d70e
feat(src/serialization): add support of serializing metadata by
2021-05-11 10:54:04 +0800
721091fa
fix(core): fix thread_local not supported on ios by
2021-06-18 19:24:55 +0800
1d865281
ci(config): change pip mirror to pypi cache proxy by
2021-06-08 18:21:36 +0800
82b95a3a
fix(lar): fix build error by
2021-06-22 19:45:29 +0800
8cfe3c39
fix(lar): update --verbose to --model-info by
2021-06-22 11:36:33 +0800
2e5a473d
feat(lar): support --verbose by
2021-06-21 20:40:26 +0800
b7e596b4
perf(autograd): copy inputs before capture in backward_graph_grad_rule by
2021-06-23 13:54:04 +0800
2ac3c9dc
fix(trace): treat constants in backward graph as ImmutableTensor correctly by
2021-06-15 19:18:00 +0800
7eea1fc6
feat(lite/opencl): fix CMake OpenCL and megenginelite py import and add keywords check by
2021-06-23 21:02:50 +0800
62bd6c82
feat(cmake/debug): misc for build * add asan build option * fix cpuinfo build opt level * fix host release build without debug info * opt "fix lite bazel/cmake symbols MR" * other misc build opt by
2021-06-09 20:28:34 +0800
c7a5c21a
chore(bazel/cmake): fix lite bazel/cmake symbols by
2021-04-02 14:08:54 +0800
3e4e4c46
feat(mgb/jit): add graph_opt_config and jit_config interfaces by
2021-06-04 14:45:08 +0800
1c7d0802
fix(cuda): remove cuda driver version check and runtime minor version by
2021-05-26 17:01:18 +0800
b87af9f7
feat(dnn/cuda): topk support fp16 by
2021-05-28 17:34:25 +0800
34262d90
fix(imperative): fix the size of blob with offset by
2021-06-17 16:46:57 +0800
787f187e
fix(imperative/src): fix dot backward error by
2021-06-10 16:07:45 +0800
f35687ca
Merge pull request #181 from haolongzhangm/update-readme by
2021-07-08 20:18:29 +0800
bf2e2d29
Update README.md by
2021-06-15 10:25:39 +0800
8d76576f
fix typos by
2021-04-24 20:28:36 +0800
e67fefcd
ci(yml): enable try-import branch invoke ci on push by
2021-06-30 17:08:40 +0800
7038a7f5
fix(quant): fix spelling error by
2021-05-27 17:00:21 +0800
4d72e707
deps: update cutlass by
2021-06-23 13:26:02 +0800
355153e1
feat(mge/dtr): add DTR in computing graph by
2021-05-14 14:24:40 +0800
76f4f975
refactor(sublinear): add SeqModifierBase by
2021-05-24 16:52:20 +0800
f584416a
fix(dnn/bn): revise the conditions for inplace flag by
2021-05-14 14:22:08 +0800
a9b60fbf
fix(ci/lite): reopen lite_test build by cmake; for some reason lite_test needs to statically link lite when cuda is enabled on gcc7 and gcc8, otherwise cask_trt::AbiInfo::~AbiInfo is called twice at the atexit stage, which leads to a double free at the end of the test; gcc9 does not have this issue, so for compatibility with all CI envs we use static linking!!! by
2021-06-09 14:17:38 +0800
2eea0009
feat(mgb): add fast run batch size graph option by
2021-05-19 20:37:44 +0800
0ac642b5
fix(imperative): persistent cache write through on put by
2021-06-04 16:47:40 +0800
47dcdf3e
fix(mgb/core): fix dtype and resize modifiers for tensor by
2021-06-09 15:09:17 +0800
29f7cdb8
fix(mgb/opr): correct nvof out shape computation by
2021-06-08 15:43:59 +0800
71cc814e
feat(ci): add aarch64 linux ci by
2021-06-04 16:22:09 +0800
31a1f538
feat(whl/opencl): enable OpenCL in python whl by
2021-06-04 00:25:53 +0800
b07f3728
feat(aarch64/whl): support aarch64 whl by
2021-06-03 15:41:42 +0800
d8ee0d7b
fix(mge/distributed): fix the multi dataloader test error by
2021-06-03 10:48:58 +0800
e275dfec
feat(imperative/python): support pooling mode "average" for avg pool2d module by
2021-06-02 15:57:52 +0800
03ab8136
fix(core): fix asan error caused by wild thread_pool ptr by
2021-06-04 18:24:54 +0800
24a38781
feat(dnn/cuda): add nchw conv u4xs4 support by
2021-04-28 18:57:13 +0800
606540be
feat(dnn/cuda): add nhwc 4bit warp perspective by
2021-05-24 20:09:36 +0800
1e601943
feat(dnn/cuda): add nhwc int4 pooling by
2021-05-25 19:08:46 +0800
0fb9cc41
fix(gopt): fix nchw64 opt pass by
2021-05-20 14:11:41 +0800
e661ae90
feat(dnn/cuda): add base class for cutlass uint4 and int4 algos by
2021-05-21 10:12:28 +0800
319436dd
feat(dnn/cuda): add cutlass impls for uint4 x int4 conv bias by
2021-05-17 19:42:43 +0800
d28eba4e
feat(dnn/cuda): add cutlass impls for int4 conv bias by
2021-05-07 17:06:29 +0800
14b65e4d
feat(dnn/cuda): add reduce_filter_and_update_bias by
2021-05-10 19:13:59 +0800
2d4e62ef
feat(dnn/cuda): add cuda uint4 pooling by
2021-05-07 16:59:38 +0800
19919384
feat(dnn/cuda): add cuda uint warp perspective by
2021-05-17 13:41:56 +0800
01354337
fix(mge/autodiff): fix incorrect handling of tuple dy by
2021-06-03 20:05:22 +0800
5868d1fe
fix(arm_common/pooling): check mode in pooling algo to avoid wrong use AVERAGE_COUNT_EXCLUDE_PADDING by
2021-06-02 17:06:05 +0800
86b69cac
fix(dnn): fixes for int4 by
2021-04-26 17:20:49 +0800
4a802d21
feat(dnn/cuda): add conv u4xs4 sass kernel by
2021-04-16 10:39:52 +0800
adf75a29
perf(dnn/cuda): add sass int4 128x128 by
2021-04-21 18:13:21 +0800
8da2f698
feat(dnn/cuda): support warp perspective/pooling op when channel not aligned to 64 by
2021-04-12 19:48:33 +0800
c218d4b0
feat(dnn/cuda): fallback conv qs4 support channel not aligned to 64 by
2021-04-12 17:50:47 +0800
4fe68ac9
feat(dnn/cuda): support transforming layout between nchw and nchw64 when channel not aligned to 64 by
2021-04-12 17:20:19 +0800
ae6ff2c5
feat(mgb/gopt): add opt pass for nchw64 layout transform by
2021-04-02 17:54:40 +0800
63a9bd30
feat(mgb/gopt): add an opt pass for padding channels to enable fast int8/int4 support on GPU by
2021-03-31 18:46:54 +0800
56e863b7
fix(dnn/cuda): fix int4 epilogue stg bug by
2021-04-19 19:39:16 +0800
cff61a53
perf(dnn/cuda): optimize int4 sass conv main loop and epilogue without fuse_z by
2021-04-12 16:54:26 +0800
12a0e615
feat(dnn/cuda): add cuda elemwise int4 by
2021-04-06 17:28:14 +0800
df1af59b
feat(dnn): warp perspective support int4 by
2021-03-29 18:50:32 +0800
2398df07
feat(dnn/cuda): add cuda int4 pooling by
2021-03-31 21:04:43 +0800
2a2a7f45
test(mgb/opr): add testcase for conv bias int4 by
2021-03-30 10:27:47 +0800
858261af
fix(python_module): fix conversion between numpy-ndarray and mgb tensor for qint4 and quint4 by
2021-03-26 17:51:21 +0800
e250afb0
feat(dnn/cuda): support conv_bias for nchw64 and qint4 by
2021-03-23 10:41:39 +0800
3b9b8780
refactor(dnn): refactor lowbit tensor format by
2021-03-24 14:06:27 +0800
c74660ea
fix(dnn/cuda): fix invalid local read for relayout format kernel by
2021-03-24 14:05:19 +0800
8fef78d0
feat(dnn/cuda): add relayout format when width is an odd number by
2021-03-22 17:47:55 +0800
91d61607
feat(dnn/common): add tensor format for low-bits tensor layout by
2021-03-19 18:01:21 +0800
19a554d6
test(dnn/cuda): add testcase for transforming tensor layout between nchw and nchw64 by
2021-03-18 15:54:21 +0800
71c2f612
feat(dnn/cuda): add relayout format to support layout transform between NCHW and NCHW64 by
2021-03-16 13:52:40 +0800
df009e89
feat(dnn/cuda): add cuda conv bias impls for NCHW format tensors with qint4 data type by
2021-03-12 10:30:34 +0800
ed922075
feat(dnn/cuda): add conv bias impl for int4 data type using sass language by
2021-03-11 14:22:06 +0800
52b55564
refactor(dnn/cuda): refactor reorder filter and bias kernel to support conv imma with data type s4 by
2021-03-10 12:06:51 +0800
d2673c5a
fix(ci/windows): add windows cuda test by
2021-05-17 20:57:57 +0800
2d6827c1
fix(mgb/windows): temporary workaround for cuda-windows python exit code (127), as the windows cuda driver unloads before atexit functions run; may remove this after upgrading the cuda runtime by
2021-06-01 17:05:55 +0800
517cc684
ci(gitlab-ci): add inline lineno checking in copybara linter by
2021-06-01 18:25:19 +0800
23032f50
feat(dnn/cuda): support float16 for index_incr_multi_axis_vec by
2021-05-25 16:54:32 +0800
93894402
fix(mgb/dnn): fix cudnn8 convbias by
2021-05-19 18:56:31 +0800
5427a67c
fix(cmake/subdirectory): fix project import by other sdk via add_subdirectory by
2021-05-27 13:53:23 +0800
241b35a6
refactor(ops): remove BackwardGraph op by
2021-05-24 17:09:19 +0800
d2e33af5
fix(mgb): fix wrong set of strategy in lar by
2021-05-25 19:13:47 +0800