Getting started with Hadoop on the IBM SmartCloud Enterprise

It couldn’t be easier to set up your Hadoop cluster on the IBM SmartCloud Enterprise. It’s also fast! For example, a three-node Hadoop cluster can be up and running in less than 30 minutes.

BigInsights Basic Edition is IBM’s free distribution of Hadoop with added features. All IBM SmartCloud Enterprise data centers include two types of images for it:

  • IBM BigInsights Basic 1.1 – Hadoop Master Node
  • IBM BigInsights Basic 1.1 – Hadoop Data node

The images provided in the IBM SmartCloud Enterprise run Red Hat Enterprise Linux (RHEL) 5.6, 64-bit, with the pay-as-you-go option. There is no charge for BigInsights Basic Edition, but there is a charge of US$0.30/hour (at the time of writing) for using RHEL and the IBM SmartCloud Enterprise infrastructure.
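To get a feel for what that hourly rate means for a small cluster, here is a rough sketch of the arithmetic. It assumes the US$0.30/hour charge applies to each running instance, which you should confirm against current pricing; the function names are made up for this illustration.

```python
# Assumption: the US$0.30/hour rate quoted above applies to each running instance.
RATE_CENTS_PER_HOUR = 30  # US$0.30, kept in cents to avoid float rounding


def cluster_cost_cents(nodes: int, hours: int) -> int:
    """Estimated charge in US cents for `nodes` instances running for `hours` hours."""
    if nodes < 1 or hours < 0:
        raise ValueError("nodes must be >= 1 and hours must be >= 0")
    return nodes * hours * RATE_CENTS_PER_HOUR


def format_usd(cents: int) -> str:
    """Render a cent amount as a US dollar string."""
    return f"US${cents // 100}.{cents % 100:02d}"


if __name__ == "__main__":
    # A three-node cluster kept up for an 8-hour working day.
    print(format_usd(cluster_cost_cents(nodes=3, hours=8)))  # US$7.20
```

Under that assumption, a three-node cluster left running for a full working day costs only a few dollars.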

If you are new to Hadoop, you can take the free online course Hadoop Fundamentals I at BigDataUniversity.com, which includes videos and lab exercises. The course also includes a video demonstration of the setup and a video demonstration of running some Hadoop commands on the IBM SmartCloud Enterprise. You will find this material in Lesson 1 of the course, in the section Hands-on lab – Creating your own Hadoop cluster, Option 3. If you want to take a more detailed course, IBM offers the fee-based InfoSphere BigInsights Essential class.

If you prefer to read step-by-step instructions while trying this out hands-on, I wrote an article for IBM developerWorks that explains how to provision three instances on the IBM SmartCloud Enterprise to set up a three-node cluster. The article also shows how to verify that your cluster is working by stopping and starting all Hadoop components, testing a few commands, and monitoring your cluster using the BigInsights Web console. You can follow the same instructions to set up a larger cluster that satisfies your needs.

Hadoop uses a master-slave architecture: the master runs the NameNode and the JobTracker, and each slave runs a DataNode and a TaskTracker.
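As a small illustration of that layout, the sketch below lists which daemons would run on each node of a cluster. It follows the description above literally (the master runs only the master daemons), which may not match how a particular image is configured; the node names are placeholders.

```python
# Daemons per role, taken from the description above.
DAEMONS_BY_ROLE = {
    "master": ["NameNode", "JobTracker"],
    "slave": ["DataNode", "TaskTracker"],
}


def cluster_layout(slave_count: int) -> dict:
    """Map each (placeholder) node name to the daemons it would run."""
    if slave_count < 0:
        raise ValueError("slave_count must be >= 0")
    layout = {"master": list(DAEMONS_BY_ROLE["master"])}
    for i in range(1, slave_count + 1):
        layout[f"slave{i}"] = list(DAEMONS_BY_ROLE["slave"])
    return layout


if __name__ == "__main__":
    # One master plus two slaves, i.e. a three-node cluster.
    for node, daemons in cluster_layout(2).items():
        print(f"{node}: {', '.join(daemons)}")
```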

Hadoop can be configured to work in one of three modes. The stand-alone mode does not start all components and works on a single node. The pseudo-distributed mode starts all components and works on a single node. The fully distributed mode starts all components and requires more than one node. The stand-alone and pseudo-distributed modes are typically used in development or testing, while the fully distributed mode is typically used in production scenarios.
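In practice, the mode comes down to a handful of configuration properties. The sketch below guesses the mode from two properties used by Apache Hadoop releases of this era, `fs.default.name` and `mapred.job.tracker`. The `localhost:9000` and `localhost:9001` values are the conventional ones from the Apache quick-start documentation, and the `master.example.com` host is invented; the BigInsights images may well use different hosts and ports, so treat this as a heuristic, not as how the images are configured.

```python
from urllib.parse import urlparse

# Typical single-node settings (assumed from Apache Hadoop conventions).
STAND_ALONE = {"fs.default.name": "file:///", "mapred.job.tracker": "local"}
PSEUDO_DISTRIBUTED = {
    "fs.default.name": "hdfs://localhost:9000",
    "mapred.job.tracker": "localhost:9001",
    "dfs.replication": "1",  # only one DataNode, so keep a single copy of each block
}


def detect_mode(props: dict) -> str:
    """Heuristically classify a set of Hadoop properties into one of the three modes."""
    # With the local job runner, no Hadoop daemons are started at all.
    if props.get("mapred.job.tracker", "local") == "local":
        return "stand-alone"
    # Daemons are started; check whether the file system lives on this machine.
    host = urlparse(props.get("fs.default.name", "file:///")).hostname
    if host in ("localhost", "127.0.0.1"):
        return "pseudo-distributed"
    return "fully distributed"


if __name__ == "__main__":
    print(detect_mode(STAND_ALONE))
    print(detect_mode(PSEUDO_DISTRIBUTED))
    # Hypothetical master host name, for illustration only.
    print(detect_mode({
        "fs.default.name": "hdfs://master.example.com:9000",
        "mapred.job.tracker": "master.example.com:9001",
    }))
```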

When working with the images provided with the IBM SmartCloud Enterprise, you can work in stand-alone or pseudo-distributed mode by provisioning a single node, the Hadoop master node. If you want to work in fully distributed mode, the BigInsights images have been configured so that you build the cluster simply by specifying the IP address of the Hadoop master node when provisioning each Hadoop data node. This means the Hadoop master node instance must be provisioned first.
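The ordering constraint is easy to see in a sketch. The code below does not call any real SmartCloud API: `provision` is a hypothetical callable you would supply, and the fake one used here just hands out addresses from a documentation-only IP range. The point is only that the master's IP address has to exist before any data node can be requested.

```python
import itertools

# Image names as listed earlier in this post.
MASTER_IMAGE = "IBM BigInsights Basic 1.1 – Hadoop Master Node"
DATA_IMAGE = "IBM BigInsights Basic 1.1 – Hadoop Data node"


def build_cluster(provision, data_node_count: int):
    """Provision the master first, then pass its IP to every data node request.

    `provision(image, master_ip)` is a hypothetical callable that returns the
    IP address of the new instance.
    """
    if data_node_count < 0:
        raise ValueError("data_node_count must be >= 0")
    master_ip = provision(MASTER_IMAGE, None)  # the master needs no master IP
    data_ips = [provision(DATA_IMAGE, master_ip) for _ in range(data_node_count)]
    return master_ip, data_ips


def fake_provisioner():
    """Stand-in for a real provisioning call; records each request in `log`."""
    counter = itertools.count(10)
    log = []

    def provision(image, master_ip):
        ip = f"192.0.2.{next(counter)}"  # documentation-only address range
        log.append((image, master_ip, ip))
        return ip

    return provision, log


if __name__ == "__main__":
    provision, log = fake_provisioner()
    master_ip, data_ips = build_cluster(provision, data_node_count=2)
    print("master:", master_ip)
    print("data nodes:", data_ips)
```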

Thanks to the cloud and Hadoop, it is now possible to handle large amounts of structured or unstructured data in a timely manner. However, skills in these areas are still scarce. See the video demonstrations (Hadoop Fundamentals I at BigDataUniversity.com) and the article previously referenced to jump into these technologies with hands-on steps!

If you don’t have an account with IBM SmartCloud Enterprise, you can take advantage of the trial available until November 11, 2011 (sign up by October 28).

Raul Chong

About Raul Chong

Raul F. Chong is a senior DB2, Big Data and Cloud Program Manager at the IBM Information Management Cloud Computing Center of Competence, based at the IBM Canada Laboratory in Toronto. He works as a technical evangelist delivering presentations at educational institutions and conferences around the world showing the latest features of DB2, BigInsights, Data Studio, and related products, and how they work on the Cloud.