Deployment of Hadoop on Isilon – Some Lessons Learned
The explosive growth of information is driving companies to find new ways to store, manage and leverage this information for business value. In 2010 EMC recognized this data storage challenge and moved to acquire Isilon Systems, maker of an industry-leading scale-out clustered NAS storage product. Today EMC Isilon solutions can be deployed for storage needs starting at 18 terabytes in a 3-node cluster and scaling to over 40 petabytes in a 144-node cluster, all with a single filesystem/namespace called OneFS. When considering what to do with a massive influx of data, where to put it, how to access it and how to protect it, the choice for Isilon becomes very clear.
We’ve covered the reasons why Hadoop works well on Isilon; now let’s take a look at some of the lessons learned in the deployment of Cloudera CDH on Isilon.
While Isilon supports a number of the currently available Hadoop distributions, it’s very important to verify the supported version of Hadoop as well as the corresponding supported version of Isilon OneFS. To verify this, check the Isilon Supportability and Compatibility Guide (URL below):
Our experience is that some software version mismatches were not fatal, but some produced confounding issues that were difficult to chalk up to anything but the version issues. Keep your versions in line with the support matrix.
Also, confirm which version of CDH is supported by the version of OneFS you’re using, then verify the versions of the bundled Hadoop tools and whether your applications require a particular version. Pay particular attention to Hive: certain versions of CDH only come with particular versions of Hive, and a mismatch with what your applications need can cost a tremendous amount of time.
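One way to keep these checks honest is to write the support matrix down once and test every planned combination against it. The Python sketch below is only an illustration of that idea: the function name `check_stack` and the two lookup tables are hypothetical, and every version label in them is a made-up placeholder, to be replaced with the combinations listed in the compatibility guide and in the CDH release notes.

```python
# Placeholder support data: every version label below is made up.
# Copy the real combinations from the compatibility guide and the CDH release notes.
SUPPORTED_CDH = {
    "onefs-A": {"cdh-1", "cdh-2"},
    "onefs-B": {"cdh-2", "cdh-3"},
}
BUNDLED_HIVE = {"cdh-1": "hive-x", "cdh-2": "hive-y", "cdh-3": "hive-z"}


def check_stack(onefs, cdh, required_hive,
                supported_cdh=SUPPORTED_CDH, bundled_hive=BUNDLED_HIVE):
    """Return a list of problems; an empty list means the plan matches the tables."""
    problems = []
    # Is this CDH release listed for the OneFS release we run?
    if cdh not in supported_cdh.get(onefs, set()):
        problems.append(f"CDH {cdh} is not listed as supported on OneFS {onefs}")
    # Does the Hive that ships with this CDH match what the application needs?
    hive = bundled_hive.get(cdh)
    if hive is None:
        problems.append(f"no Hive version recorded for CDH {cdh}")
    elif hive != required_hive:
        problems.append(
            f"CDH {cdh} ships Hive {hive}, but the application needs {required_hive}"
        )
    return problems


if __name__ == "__main__":
    for problem in check_stack("onefs-A", "cdh-3", "hive-y"):
        print("MISMATCH:", problem)
```

Running a check like this before any install begins catches the Hive mismatch while it is still cheap to change the plan.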
There are only a few steps that must be taken in order to get Isilon to serve data over HDFS. However, there are a number of further steps needed to ensure that Hadoop jobs will run correctly; these are covered in the Hadoop configuration section in chapter 22 of the OneFS Web Administration Guide (https://support.emc.com/docu56049_OneFS-7.2-Web-Administration-Guide.pdf?language=en_US).
Pay close attention to the following details, as failing to do so will certainly cause problems later on.
- Be sure the Hadoop users are defined on Isilon with the correct access permissions to the filesystem, so that Hadoop jobs can read and write to the appropriate locations. Page 444 of the OneFS Web Administration Guide goes over this.
- Get your IP address ranges defined ahead of time and stick to the plan.
- Get your DNS configured and be sure to use FQDNs everywhere you can.
- Get your NTP services set up and make sure all hosts and Isilon are in sync.
- Install a base Linux VM and use VMware Templates to deploy your Cloudera VMs. Make sure to install VMware Tools in the Template!
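The IP and DNS items in the list above lend themselves to a small pre-flight script. What follows is a minimal sketch in Python, not a complete validator: the function names, range names, host names and addresses are all hypothetical (the addresses come from the reserved documentation ranges), and the resolver functions are passed in as parameters so the checks can be tried without a live DNS server.

```python
import ipaddress
import socket


def overlapping_ranges(plan):
    """Return pairs of range names whose networks overlap.

    `plan` maps a label (hypothetical, e.g. "compute") to a CIDR string.
    """
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in plan.items()}
    names = sorted(nets)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if nets[a].overlaps(nets[b])
    ]


def dns_problems(fqdn, expected_ip,
                 forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Check that a name is fully qualified and that forward and reverse DNS agree."""
    problems = []
    if "." not in fqdn:
        problems.append(f"{fqdn} is not fully qualified")
    try:
        ip = forward(fqdn)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return problems + [f"{fqdn} does not resolve: {exc}"]
    if ip != expected_ip:
        problems.append(f"{fqdn} resolves to {ip}, but the plan says {expected_ip}")
    try:
        name = reverse(ip)[0]  # gethostbyaddr returns (hostname, aliases, addresses)
    except OSError as exc:  # socket.herror is a subclass of OSError
        return problems + [f"no reverse record for {ip}: {exc}"]
    if name.lower().rstrip(".") != fqdn.lower().rstrip("."):
        problems.append(f"reverse lookup of {ip} returns {name}, not {fqdn}")
    return problems


if __name__ == "__main__":
    # Hypothetical plan using documentation-only address space.
    plan = {"compute": "192.0.2.0/25", "isilon": "192.0.2.128/25"}
    print("overlaps:", overlapping_ranges(plan))
    print("dns:", dns_problems("node1.example.com", "192.0.2.10"))
```

Run against every Hadoop VM and every address Isilon will answer on, a script like this makes it obvious when a host has drifted from the plan or is being addressed by a short name.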
One of the most helpful sites, which can serve as a kind of template for deploying Cloudera Hadoop on Isilon, can be found here:
Granted, this site focuses more on the deployment of Cloudera CDH with VMware’s BDE project, but the integration steps on Isilon are very helpful.
Please let me know if you have any questions or comments.