Laureline's Wiki

Lab 05: MapReduce in the Cloud

Pedagogical Objectives

  • Perform data analysis in the cloud using a dynamically allocated cluster of machines
  • Write a MapReduce program
  • Become familiar with Hadoop

Tasks

In this lab you will perform a number of tasks and document your progress in a lab report. Each task specifies one or more deliverables to be produced. Collect all the deliverables in your lab report. Give the lab report a structure that mimics the structure of this document.

Task 1 - Using Elastic MapReduce

Copy a screenshot of the EMR console into the report.

Copy the bar chart of maximum temperature by year into the report.

What is the overall highest temperature in the data set?

The overall highest temperature is 38.0 degrees. It was reached in 2003.
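The overall maximum is simply the largest of the per-year maxima that the job outputs. The following is a minimal sketch of the map and reduce steps in plain Python (not the Hadoop job used in the lab). The record layout and all sample values are assumptions made for illustration; only the 38.0 reading in 2003 comes from this report.

```python
from collections import defaultdict

# Hypothetical sample records: (year, temperature in degrees).
# Only the 38.0 reading in 2003 is taken from the report; the rest are invented.
RECORDS = [
    ("2001", 31.5),
    ("2001", 29.0),
    ("2002", 33.2),
    ("2003", 38.0),
    ("2003", 35.1),
]


def map_record(record):
    """Map step: emit one (year, temperature) pair per input record."""
    year, temperature = record
    yield year, temperature


def reduce_year(year, temperatures):
    """Reduce step: keep the highest temperature seen for one year."""
    return year, max(temperatures)


def run_job(records):
    """Run map, group the pairs by key (the shuffle), then reduce."""
    grouped = defaultdict(list)
    for record in records:
        for year, temperature in map_record(record):
            grouped[year].append(temperature)
    return dict(reduce_year(y, ts) for y, ts in sorted(grouped.items()))


max_by_year = run_job(RECORDS)
# The overall maximum is the largest of the per-year maxima.
overall_year, overall_max = max(max_by_year.items(), key=lambda kv: kv[1])
print(max_by_year)
print(overall_year, overall_max)
```

The dictionary `max_by_year` corresponds to the data behind the bar chart, and the last line picks out the year with the highest bar.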

How many EC2 instances were created to run the job?

Three EC2 instances were created to run this job, as the following screenshot shows.

This pricing test was run with three EMR instances of type m1.small. The job took 19 minutes to complete, so we were charged for one full hour. The cost was about $0.18.
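The arithmetic behind this charge can be sketched as follows, assuming that each instance is billed in whole hours (which is what the one-hour charge for a 19-minute job suggests). The per-instance-hour rate is not stated in this report; the value computed below is only what the reported figures imply, not an official price.

```python
import math


def billed_hours(runtime_minutes):
    """Round a runtime up to whole hours, as with per-hour billing."""
    return math.ceil(runtime_minutes / 60)


INSTANCES = 3          # from the report
RUNTIME_MINUTES = 19   # from the report
TOTAL_COST = 0.18      # from the report, in dollars (approximate)

hours = billed_hours(RUNTIME_MINUTES)
instance_hours = INSTANCES * hours
# Implied rate, derived from the report's own figures (an assumption,
# not a published price).
implied_rate = TOTAL_COST / instance_hours

print(hours, instance_hours, round(implied_rate, 2))
```

Under these assumptions the job consumed 3 instance-hours, which works out to roughly $0.06 per instance-hour.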

How many input key-value pairs did all the mappers process in total?

How many input key-value pairs did all the reducers process in total?
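Hadoop reports these totals in the job counters ("Map input records" and "Reduce input records"). As a toy illustration of what is being counted, here is a sketch in plain Python that assumes a job without a combiner and uses invented records. The counter keys are local labels that mirror the Hadoop counter names, and the numbers it prints are not the answers for the lab's data set.

```python
from collections import Counter, defaultdict

# Invented (year, temperature) records, for illustration only.
RECORDS = [("2001", 31.5), ("2001", 29.0), ("2002", 33.2), ("2003", 38.0)]

counters = Counter()
grouped = defaultdict(list)

# Map phase: each input record is one input key-value pair for a mapper.
for year, temperature in RECORDS:
    counters["map_input_records"] += 1
    grouped[year].append(temperature)  # emit (year, temperature)
    counters["map_output_records"] += 1

# Reduce phase: every value delivered to a reducer counts as one input pair,
# and every distinct key counts as one input group.
result = {}
for year, temperatures in grouped.items():
    counters["reduce_input_groups"] += 1
    counters["reduce_input_records"] += len(temperatures)
    result[year] = max(temperatures)

print(dict(counters))
```

Without a combiner, the reducers receive exactly the pairs the mappers emitted, so the reduce input count equals the map output count.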