NAB Data Corpus --- Data are ordered, timestamped, single-valued metrics. All data files contain anomalies, unless otherwise noted. ### Real data - realAWSCloudwatch/ AWS server metrics as collected by the AmazonCloudwatch service. Example metrics include CPU Utilization, Network Bytes In, and Disk Read Bytes. - realAdExchange/ Online advertisement clicking rates, where the metrics are cost-per-click (CPC) and cost per thousand impressions (CPM). One of the files is normal, without anomalies. - realKnownCause/ This is data for which we know the anomaly causes; no hand labeling. - ambient_temperature_system_failure.csv: The ambient temperature in an office setting. - cpu_utilization_asg_misconfiguration.csv: From Amazon Web Services (AWS) monitoring CPU usage – i.e. average CPU usage across a given cluster. When usage is high, AWS spins up a new machine, and uses fewer machines when usage is low. - ec2_request_latency_system_failure.csv: CPU usage data from a server in Amazon's East Coast datacenter. The dataset ends with complete system failure resulting from a documented failure of AWS API servers. There's an interesting story behind this data in the [Numenta blog](http://numenta.com/blog/anomaly-of-the-week.html). - machine_temperature_system_failure.csv: Temperature sensor data of an internal component of a large, industrial mahcine. The first anomaly is a planned shutdown of the machine. The second anomaly is difficult to detect and directly led to the third anomaly, a catastrophic failure of the machine. - nyc_taxi.csv: Number of NYC taxi passengers, where the five anomalies occur during the NYC marathon, Thanksgiving, Christmas, New Years day, and a snow storm. The raw data is from the [NYC Taxi and Limousine Commission](http://www.nyc.gov/html/tlc/html/about/trip_record_data.shtml). The data file included here consists of aggregating the total number of taxi passengers into 30 minute buckets. - rogue_agent_key_hold.csv: Timing the key holds for several users of a computer, where the anomalies represent a change in the user. - rogue_agent_key_updown.csv: Timing the key strokes for several users of a computer, where the anomalies represent a change in the user. - realTraffic/ Real time traffic data from the Twin Cities Metro area in Minnesota, collected by the [Minnesota Department of Transportation](http://www.dot.state.mn.us/tmc/trafficinfo/developers.html). Included metrics include occupancy, speed, and travel time from specific sensors. - realTweets/ A collection of Twitter mentions of large publicly-traded companies such as Google and IBM. The metric value represents the number of mentions for a given ticker symbol every 5 minutes. ### Artificial data - artificialNoAnomaly/ Artificially-generated data without any anomalies. - artificialWithAnomaly/ Artificially-generated data with varying types of anomalies.