Going to Bio-IT Expo? Stop by Booth 110 to See Cloudera Hadoop with EMC Isilon
According to IDC’s 2014 report on the Digital Universe, worldwide data is growing at a staggering 40% per year into the next decade. By the report’s 2013 metrics, the Digital Universe stood at 4.4 zettabytes (4.4 trillion gigabytes), and it is estimated to reach 44 zettabytes by 2020.
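As a quick sanity check on those figures, here is a minimal Python sketch of the compound-growth arithmetic. The function names are purely illustrative, and the seven-year window (2013 to 2020) is an assumption about how the two numbers relate rather than something taken from the IDC report itself.

```python
# Back-of-the-envelope check of the growth figures quoted above.
# Assumption: growth compounds annually from the 2013 baseline through 2020.

GB_PER_ZB = 10**12  # 1 zettabyte = 10^21 bytes; 1 gigabyte = 10^9 bytes


def project_size(start_zb: float, annual_growth: float, years: int) -> float:
    """Size after compounding `annual_growth` (0.40 means 40%) for `years` years."""
    return start_zb * (1 + annual_growth) ** years


def implied_annual_growth(start_zb: float, end_zb: float, years: int) -> float:
    """Annual growth rate needed to get from start_zb to end_zb in `years` years."""
    return (end_zb / start_zb) ** (1 / years) - 1


years = 2020 - 2013
projected_zb = project_size(4.4, 0.40, years)
needed_rate = implied_annual_growth(4.4, 44.0, years)

print(f"4.4 ZB is {4.4 * GB_PER_ZB:.2e} GB")
print(f"4.4 ZB growing 40% per year for {years} years: {projected_zb:.1f} ZB")
print(f"Rate needed to reach 44 ZB in {years} years: {needed_rate:.1%}")
```

Under these assumptions, 40% annual growth lands a little above 46 zettabytes, and a tenfold increase over seven years needs roughly 39% per year, so the two figures are consistent with each other.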
The explosive growth of information is driving companies to find new ways to store, manage and leverage this information for business value. In 2010, EMC recognized this data storage challenge and moved to acquire Isilon Systems, maker of an industry-leading scale-out clustered NAS storage product. Today, EMC Isilon solutions can be deployed for storage needs starting at 18 terabytes in a 3-node cluster and scaling to over 40 petabytes in a 144-node cluster, all with a single filesystem and namespace called OneFS. When you consider what to do with a massive influx of data, where to put it, how to access it and how to protect it, the case for Isilon becomes very clear.
Another interesting tidbit about the EMC Isilon platform is that any data placed on Isilon’s OneFS filesystem over one of the supported protocols is immediately and simultaneously available over all of the other supported protocols. Given this concept, EMC had the vision to provide Hadoop Namenode and Datanode functionality on Isilon, thereby enabling HDFS to be accessed as a protocol. Functionally, this enables Hadoop users to leverage Isilon HDFS for their Hadoop cluster storage and to push data onto HDFS via a CIFS share on Isilon. Given the built-in data and hardware platform protection of Isilon, HDFS immediately gains two important benefits that are missing in a traditional Hadoop cluster:
1. Namenode redundancy without having to create multiple Namenode instances.
2. Data set protection against node loss and corruption.
The figure below shows conceptually how a Hadoop cluster is connected to Isilon. While Isilon certainly supports a number of Hadoop distributions, we chose to focus on Cloudera CDH.
Cloudera CDH offers some key features that enable rapid and simple deployment and management of Hadoop clusters. Traditionally, deploying Hadoop across an array of physical or virtual servers is a tedious task of deploying packages and updating configuration files. At scale, this can be error-prone, very time-consuming and ultimately unmanageable. Cloudera Manager enables a user to detect target hosts, push management components along with the required and optional Hadoop packages from a central repository, and create a cluster in minutes, all from a central management GUI.
Selecting the appropriate Cloudera CDH (Hadoop) distribution is simple, and Cloudera Manager can deploy multiple differing CDH clusters in the same management realm.
The figure below shows the main Cloudera Manager screen, accessible via a web browser, from which one can manage clusters, change configurations, monitor performance and use many other features.
Integrating Cloudera CDH with Isilon is very straightforward. You’ll recall that EMC enabled HDFS on Isilon as a protocol. The HDFS Namenode URI is therefore provided as part of the Cloudera CDH Isilon Service deployment, and the cluster is then ready to run MapReduce jobs against data sets on Isilon.
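To make the Namenode URI idea concrete, here is a minimal Python sketch that builds such a URI and renders the kind of `core-site.xml` fragment a Hadoop client uses to find its default filesystem. `fs.defaultFS` is the standard Hadoop property for this, but the hostname `isilon.example.com`, the port 8020 default and both helper functions are assumptions for illustration only. In practice Cloudera Manager generates the client configuration for you, so treat this as a picture of what the setting amounts to rather than a deployment step.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse


def namenode_uri(host: str, port: int = 8020) -> str:
    """Build an HDFS URI of the form hdfs://host:port.

    Port 8020 is assumed here as the Namenode RPC port; check your own cluster.
    """
    if not host or "/" in host or ":" in host:
        raise ValueError(f"expected a bare hostname, got {host!r}")
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    return f"hdfs://{host}:{port}"


def core_site_fragment(uri: str) -> str:
    """Render a core-site.xml style fragment that sets fs.defaultFS to `uri`."""
    parsed = urlparse(uri)
    if parsed.scheme != "hdfs" or not parsed.hostname:
        raise ValueError(f"not an hdfs:// URI: {uri!r}")
    configuration = ET.Element("configuration")
    prop = ET.SubElement(configuration, "property")
    ET.SubElement(prop, "name").text = "fs.defaultFS"
    ET.SubElement(prop, "value").text = uri
    return ET.tostring(configuration, encoding="unicode")


# Hypothetical hostname for the Isilon cluster's HDFS endpoint.
uri = namenode_uri("isilon.example.com")
fragment = core_site_fragment(uri)
print(fragment)
```

The point of the sketch is that, from the Hadoop side, Isilon looks like any other Namenode endpoint: the compute nodes only need one URI to reach the data.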
So to recap, EMC Isilon is the first and only scale-out NAS platform to incorporate native support for the HDFS layer. The key benefits of Hadoop on EMC Isilon are as follows:
- Dependable security
- Scalable storage solution
- Continuous availability
- Existing infrastructure and simple integration
- Easy deployment and faster administration
Simply connect Cloudera Enterprise’s analytics compute resources to your Isilon storage system and you are ready to begin your analytics projects immediately, turning your data lake into an enterprise data hub.
Whether motivated by added security capabilities like key management, compliance requirements like robust audit and data lineage, automated backup and disaster recovery, or the critical need for a true enterprise-grade tool like Cloudera Manager to keep the entire Hadoop environment running optimally, customers will get the processing and analytic power of Cloudera Enterprise atop their existing data lake.
If you’d like to see the solution discussed in this blog live and in person, remember to visit us at our booth at the Bio-IT World Conference in Boston, April 21-23, 2015!