
NAB Data Corpus
---

Data are ordered, timestamped, single-valued metrics. All data files contain anomalies, unless otherwise noted.
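
Because every file is an ordered, timestamped, single-valued metric, one small loader covers the whole corpus. The sketch below assumes each CSV has a header row with the columns `timestamp` and `value` and timestamps formatted like `2014-07-01 00:00:00`; check these assumptions against the actual files. `load_series` is a hypothetical helper, and the sample rows are made up for illustration.

```python
import csv
import io
from datetime import datetime
from typing import List, TextIO, Tuple

# Assumption: timestamps look like "2014-07-01 00:00:00"; adjust if a file differs.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def load_series(handle: TextIO) -> List[Tuple[datetime, float]]:
    """Parse a CSV with assumed "timestamp" and "value" header columns."""
    reader = csv.DictReader(handle)
    series = [
        (datetime.strptime(row["timestamp"], TIMESTAMP_FORMAT), float(row["value"]))
        for row in reader
    ]
    # The corpus is described as ordered, so report violations instead of re-sorting.
    for (prev, _), (curr, _) in zip(series, series[1:]):
        if curr < prev:
            raise ValueError(f"out-of-order timestamp {curr} follows {prev}")
    return series


# Made-up rows for illustration only; real files live in the corpus folders.
sample = io.StringIO(
    "timestamp,value\n"
    "2014-07-01 00:00:00,1.5\n"
    "2014-07-01 00:30:00,2.0\n"
)
series = load_series(sample)
print(len(series), series[0][1])  # 2 1.5
```

Raising on out-of-order rows, rather than sorting silently, keeps the "ordered" property of the corpus an explicit check.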
### Real data

- realAWSCloudwatch/

  AWS server metrics as collected by the Amazon CloudWatch service. Example metrics include CPU Utilization, Network Bytes In, and Disk Read Bytes.

- realAdExchange/

  Online advertisement clicking rates, where the metrics are cost-per-click (CPC) and cost per thousand impressions (CPM). One of the files is normal, without anomalies.

- realKnownCause/

  This is data for which the anomaly causes are known; no hand labeling was used.

  - ambient_temperature_system_failure.csv: The ambient temperature in an office setting.
  - cpu_utilization_asg_misconfiguration.csv: CPU usage monitored by Amazon Web Services (AWS), i.e., average CPU usage across a given cluster. When usage is high, AWS spins up a new machine; when usage is low, it uses fewer machines.
  - ec2_request_latency_system_failure.csv: CPU usage data from a server in Amazon's East Coast datacenter. The dataset ends with complete system failure resulting from a documented failure of AWS API servers. There's an interesting story behind this data in the [Numenta blog](http://numenta.com/blog/anomaly-of-the-week.html).
  - machine_temperature_system_failure.csv: Temperature sensor data of an internal component of a large, industrial machine. The first anomaly is a planned shutdown of the machine. The second anomaly is difficult to detect and directly led to the third anomaly, a catastrophic failure of the machine.
  - nyc_taxi.csv: Number of NYC taxi passengers, where the five anomalies occur during the NYC marathon, Thanksgiving, Christmas, New Year's Day, and a snowstorm. The raw data is from the [NYC Taxi and Limousine Commission](http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml). The data file included here aggregates the total number of taxi passengers into 30-minute buckets.
  - rogue_agent_key_hold.csv: Timing the key holds for several users of a computer, where the anomalies represent a change in the user.
  - rogue_agent_key_updown.csv: Timing the keystrokes for several users of a computer, where the anomalies represent a change in the user.

- realTraffic/

  Real-time traffic data from the Twin Cities Metro area in Minnesota, collected by the [Minnesota Department of Transportation](http://www.dot.state.mn.us/tmc/trafficinfo/developers.html). Metrics include occupancy, speed, and travel time from specific sensors.

- realTweets/

  A collection of Twitter mentions of large publicly traded companies such as Google and IBM. The metric value represents the number of mentions for a given ticker symbol every 5 minutes.
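
Both `nyc_taxi.csv` (passengers per 30-minute bucket) and `realTweets/` (mentions per 5 minutes) turn individual events into a regularly sampled series. A minimal sketch of that aggregation follows, assuming raw events arrive as `(timestamp, weight)` pairs, where the weight is the passenger count of a trip or 1 per tweet. `bucket_totals` is a hypothetical helper, not the code used to build the corpus, and the sample trips are made up.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Dict, Iterable, List, Tuple


def bucket_totals(
    events: Iterable[Tuple[datetime, float]],
    width: timedelta,
    origin: datetime,
) -> List[Tuple[datetime, float]]:
    """Sum event weights into contiguous, fixed-width buckets starting at `origin`."""
    totals: Dict[int, float] = defaultdict(float)
    for ts, weight in events:
        if ts < origin:
            raise ValueError(f"event at {ts} precedes origin {origin}")
        totals[(ts - origin) // width] += weight  # timedelta // timedelta -> int index
    if not totals:
        return []
    # Emit every bucket up to the last non-empty one so quiet intervals show up as 0.
    last = max(totals)
    return [(origin + i * width, totals.get(i, 0.0)) for i in range(last + 1)]


# Made-up trips: (pickup time, passenger count).
origin = datetime(2014, 7, 1)
trips = [
    (datetime(2014, 7, 1, 0, 5), 1),
    (datetime(2014, 7, 1, 0, 20), 2),
    (datetime(2014, 7, 1, 1, 10), 3),
]
for start, total in bucket_totals(trips, timedelta(minutes=30), origin):
    print(start, total)
# 2014-07-01 00:00:00 3.0
# 2014-07-01 00:30:00 0.0
# 2014-07-01 01:00:00 3.0
```

Filling empty buckets with 0 matters for this kind of data: a quiet interval is itself a value in the series, not a missing row.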
### Artificial data

- artificialNoAnomaly/

  Artificially generated data without any anomalies.

- artificialWithAnomaly/

  Artificially generated data with varying types of anomalies.
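
To make the two artificial folders concrete, here is a toy sketch of producing a clean series and then injecting two common anomaly types, a single-point spike and a level shift. This is not the generator NAB used; the sine-plus-noise base signal, the function names, and every parameter value are hypothetical choices for illustration.

```python
import math
import random
from typing import List


def make_series(n: int, period: int, noise: float, seed: int) -> List[float]:
    """Toy anomaly-free series: a sine wave with period `period` plus Gaussian noise."""
    rng = random.Random(seed)  # private generator so results are reproducible
    return [math.sin(2 * math.pi * t / period) + rng.gauss(0.0, noise) for t in range(n)]


def inject_spike(series: List[float], at: int, magnitude: float) -> List[float]:
    """Copy of `series` with a single-point anomaly added at index `at`."""
    out = list(series)
    out[at] += magnitude
    return out


def inject_level_shift(series: List[float], start: int, offset: float) -> List[float]:
    """Copy of `series` whose values from index `start` onward are raised by `offset`."""
    out = list(series)
    for i in range(start, len(out)):
        out[i] += offset
    return out


normal = make_series(n=200, period=50, noise=0.05, seed=0)   # "no anomaly" style
spiky = inject_spike(normal, at=120, magnitude=5.0)           # point anomaly
shifted = inject_level_shift(normal, start=150, offset=2.0)   # persistent change
```

Injecting anomalies into a copy of a known-clean series means the anomaly locations are known exactly, which is what makes artificial data useful for checking a detector.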

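A full-stack automated machine learning system, mainly targeting anomaly detection on multivariate time series data. TODS provides a comprehensive set of modules for building machine-learning-based anomaly detection systems, including data processing, time series processing, feature analysis, detection algorithms, and a reinforcement module. The functionality these modules provide includes common data preprocessing, smoothing or transformation of time series data, extracting features from the time or frequency domain, a wide variety of detection algorithms…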