Why Hadoop on EMC Isilon?
The explosive growth of information is driving companies to find new ways to store, manage and leverage this information for business value. In 2010 EMC recognized this data storage challenge and moved to acquire Isilon Systems, maker of an industry-leading scale-out clustered NAS storage product. Today EMC Isilon solutions can be deployed for storage needs starting at 18 terabytes in a 3-node cluster and scaling to over 40 petabytes in a 144-node cluster, all with a single filesystem/namespace called OneFS. When considering what to do with a massive influx of data, where to put it, how to access it and how to protect it, the case for Isilon becomes very clear.
Another interesting aspect of the EMC Isilon platform is that any data placed on Isilon’s OneFS filesystem over one of the supported protocols is immediately and simultaneously available over all of the other supported protocols. Given this concept, EMC had the vision to provide Hadoop NameNode and DataNode functionality on Isilon, thereby enabling HDFS to be accessed as a protocol. Functionally, this enables Hadoop users to leverage Isilon HDFS for their Hadoop cluster storage and to push data onto HDFS via a CIFS share on Isilon. Given the built-in data and hardware platform protection of Isilon, HDFS immediately gains two important benefits that are missing in a traditional Hadoop cluster: 1. NameNode redundancy without having to create multiple NameNode instances. 2. Dataset protection against node loss and corruption.
Traditionally, Hadoop is deployed on dedicated infrastructure, which presents the following problems:
- Traditional Hadoop deployments use direct-attached storage (DAS), which is isolated within a cluster and poorly utilized as a result. Because of that isolation, the filesystem has to mirror data across nodes to protect it, which makes the efficiency problem worse.
- Mirroring data across DAS units is a built-in requirement in HDFS, and it places a serious demand on time and resources when ingesting large-scale data sets. In such cases, data sets may need to be broken down into smaller units, which creates more work and delays time to insight.
- DAS infrastructure supports only one distribution of Hadoop at a time on a given hardware cluster.
- Traditional NameNode configurations present a single point of failure: losing a non-clustered NameNode takes the filesystem offline.
- Traditional Hadoop deployments on DAS lack any sort of enterprise-level data protection.
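To put a rough number on the mirroring overhead described in the list above, here is a minimal sketch in Python. It assumes the stock HDFS replication factor of 3 (the `dfs.replication` default). The function names and the 100 TB dataset are hypothetical and chosen only for illustration; this is back-of-the-envelope arithmetic, not a sizing tool.

```python
def raw_capacity_tb(dataset_tb: float, replication_factor: int = 3) -> float:
    """Raw disk consumed when every block is stored `replication_factor` times.

    The default of 3 mirrors the stock HDFS dfs.replication setting; adjust it
    if your cluster is configured differently.
    """
    if dataset_tb < 0:
        raise ValueError("dataset_tb must be non-negative")
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return dataset_tb * replication_factor


def storage_efficiency(replication_factor: int = 3) -> float:
    """Fraction of raw disk that holds unique (non-mirrored) data."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    return 1.0 / replication_factor


if __name__ == "__main__":
    dataset_tb = 100  # hypothetical dataset size, for illustration only
    raw = raw_capacity_tb(dataset_tb)
    eff = storage_efficiency()
    print(f"{dataset_tb} TB of data needs {raw:.0f} TB of raw DAS capacity")
    print(f"Usable fraction of raw capacity: {eff:.0%}")
```

The same multiplier applies at ingest time: every terabyte loaded has to be written out once per replica, which is part of why loading large data sets into a DAS-based cluster takes so long.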
As the world’s first and only scale-out NAS platform with built-in support for the Hadoop Distributed File System (HDFS), Isilon can serve as a platform for in-place analytics: store vast quantities of unstructured data efficiently in a centralized fashion, then point your analytics compute resources at that same location, with no further data movement necessary.
Multiple groups and projects within an organization no longer need to provision distinct DAS-based Hadoop clusters for their own purposes. Isilon offers maximum flexibility by providing a single large pool of storage for a variety of simultaneous uses, including multiple simultaneous Hadoop distributions. Beyond that flexibility, this allows an organization to reduce capital expenditure, make the most of its Isilon environment and increase agility in its analytics efforts.
Beyond the massive scalability and increased efficiency for analytics, Isilon offers built-in data protection capabilities. Because HDFS is integrated into Isilon, the NameNode redundancy problems of traditional Hadoop configurations are resolved on the Isilon platform. Further, datasets are protected by Isilon’s highly available architecture and its snapshot and replication capabilities. With this in mind, unwieldy copy-based backups of the data are not necessary when leveraging Isilon for datasets of any size.
Isilon certainly offers some striking advantages for organizations looking to execute on Hadoop projects. In particular, organizations already invested in Isilon for other purposes can now realize greater value from that investment.
Please let me know if you have any questions or comments.