By: Laureline David & Michael Rohrer
In this lab you will perform a number of tasks and document your progress in a lab report. Each task specifies one or more deliverables to be produced. Collect all the deliverables in your lab report. Give the lab report a structure that mimics the structure of this document.
No such lines were found in the syslog (attached as an annex) or in the stderr files. This is probably due to the Hadoop distribution used, Amazon 1.0.3.
The overall highest temperature is 38.0 degrees. It was reached in 2003.
Three EC2 instances were created to run this job, as shown in the next screenshot.
This pricing test was run with 3 EMR instances of type m1.small. The job took 19 minutes to complete, so we were charged for one full hour. The price was about $0.18. Note that the cost of the EC2 instances is already included in the EMR price.
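The billing arithmetic can be sketched as follows. This is only an illustration of "every started hour is billed in full, per instance". The function name billed_cost is hypothetical, and the rate of 0.06 per instance-hour is inferred from the total reported above (0.18 / 3 instances / 1 hour), not taken from an official price list.

```python
import math


def billed_cost(runtime_minutes, instance_count, rate_per_instance_hour):
    """Cost when each started hour is billed in full for every instance."""
    billed_hours = math.ceil(runtime_minutes / 60.0)
    return billed_hours * instance_count * rate_per_instance_hour


# Assumption: 0.06 per instance-hour, back-computed from the reported total.
cost = billed_cost(runtime_minutes=19, instance_count=3,
                   rate_per_instance_hour=0.06)
print("%.2f" % cost)
```

Under this model a 19-minute job and a 59-minute job cost the same, while a 61-minute job would be billed for two hours.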
This is not written in the log file, probably because of the Hadoop distribution used (Amazon 1.0.3). Otherwise we could have found it in the Map input records field.
This is not written in the log file, probably because of the Hadoop distribution used (Amazon 1.0.3). Otherwise we could have found it in the Reduce input groups field.
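#!/usr/bin/env python
# Mapper: emits "<temperature>\t1" for every valid temperature reading.
import re
import sys

for line in sys.stdin:
    val = line.strip()
    temp = val[87:91]  # temperature (sign and three digits)
    q = val[92:93]     # quality code
    # Skip readings equal to "+999" and readings with another quality code
    if temp != "+999" and re.match("[01459]", q):
        print("%s\t%s" % (temp, "1"))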
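#!/usr/bin/env python
# Reducer: counts how many times each key (temperature) occurs.
# Relies on the input being sorted by key.
import sys

last_key = None
count_val = 0

for line in sys.stdin:
    (key, val) = line.strip().split("\t")
    # Key changed: emit the count of the previous key and start over
    if last_key and last_key != key:
        print("%s\t%s" % (last_key, count_val))
        count_val = 0
    count_val += 1
    last_key = key

# Emit the count of the final key
if last_key:
    print("%s\t%s" % (last_key, count_val))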
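As a rough way to check the mapper and reducer logic without a cluster, the map, sort and reduce steps can be imitated in a single process. This is a sketch, not the way Hadoop runs the scripts: make_record is a hypothetical helper that builds fake fixed-width lines, and the sample records are invented.

```python
import re
from itertools import groupby


def make_record(temp, quality):
    # Hypothetical helper: pads a fake line so that the temperature sits at
    # columns 87-90 and the quality code at column 92.
    return "0" * 87 + temp + "0" + quality


def mapper(lines):
    # Same filtering rule as the mapper script above.
    for line in lines:
        val = line.strip()
        temp = val[87:91]
        q = val[92:93]
        if temp != "+999" and re.match("[01459]", q):
            yield temp, 1


def reducer(pairs):
    # Input must be sorted by key, as it is after the shuffle phase.
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


# Invented sample: two valid +014 readings, one valid -002 reading,
# one "+999" reading and one reading with a rejected quality code.
records = [
    make_record("+014", "1"),
    make_record("+014", "1"),
    make_record("-002", "5"),
    make_record("+999", "1"),
    make_record("+020", "2"),
]

result = list(reducer(sorted(mapper(records))))
for key, count in result:
    print("%s\t%s" % (key, count))
```

The call to sorted stands in for the sort that happens between the map and reduce phases. Keys are compared as strings, so "+014" is ordered before "-002".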
#!/usr/bin/env python
# Prints a text histogram of the temperature counts read from stdin.
import sys
import math

# Maximum bar width
width = 120

temps = {}
maximum = 0

for line in sys.stdin:
    (key, val) = line.strip().split("\t")
    key = int(key)
    val = int(val)
    temps[key] = val
    maximum = max(maximum, val)

for key, count in sorted(temps.items(), key=lambda row: row[0]):
    # Scale each bar to the largest count, with at least one "X" per row
    bar = "X" * int(max(1, math.floor((count / float(maximum)) * width)))
    print("{:+03d} | [{:6d}] {:s}".format(key, count, bar))
56'530 times
Maximum is 38 degrees, minimum is -25 degrees.
14 degrees, with 114613 occurrences.
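The maximum, the minimum and the most frequent temperature can be read off the reducer output with a few lines of Python. The sketch below assumes the "<temperature>\t<count>" format produced by the reducer. The function name summarize is hypothetical and the counts in the sample are made up, not the real results.

```python
def summarize(lines):
    """Return (highest, lowest, (most_frequent, its_count)) from count lines."""
    counts = {}
    for line in lines:
        key, val = line.strip().split("\t")
        counts[int(key)] = int(val)
    most_frequent = max(counts.items(), key=lambda kv: kv[1])
    return max(counts), min(counts), most_frequent


# Made-up counts, for illustration only.
sample = ["+014\t5", "-025\t1", "+038\t2"]
hottest, coldest, (mode_temp, mode_count) = summarize(sample)
print("max=%d min=%d most frequent=%d (%d times)"
      % (hottest, coldest, mode_temp, mode_count))
```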