Scaled power of Hadoop
A leading online marketing and data analytics company was looking for a more cost-effective and secure alternative to AWS for its sensitive Hadoop clusters.
The most challenging aspect of this decision was the significant investment required in the company's private data centers, which needed considerable infrastructure modernization and additional capacity. Any potential solution therefore had to be cost-effective and built on open-source technologies. Additional requirements included minimal manual intervention and continued operation without internet access.
After careful consideration of all requirements, including analysis of existing infrastructure, budgeting and labor costs, we engineered a scalable architecture based on open-source software.
- Main Support Cluster - Before a large number of Hadoop nodes can be deployed, the primary infrastructure must first be created. This includes operating system installation and configuration (Cobbler), support software installation and configuration (Chef plus cookbooks), DNS resolution for service discovery (etcd / confd), and a monitoring stack (Collectd, Sensu, Flapjack, Logstash).
- PXE boot - Using kickstart (ISO) images, Cobbler boots the remaining servers and nodes via PXE netboot, providing both flexibility and scalability within the data center.
- Cloudera Manager - All Hadoop nodes (200-500 nodes) are managed centrally with Cloudera Manager. This requires a Cloudera agent on every Hadoop node for heartbeat sensing and metrics collection, exposed through the Cloudera Manager REST API.
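To illustrate how Cobbler-driven PXE provisioning scales to hundreds of nodes, the sketch below batch-generates `cobbler system add` commands from a node inventory. The hostnames, MAC addresses, IPs, and profile name are hypothetical examples, and the exact Cobbler flags may vary between versions:

```python
# Sketch: batch-register Hadoop nodes in Cobbler for PXE netboot.
# All hostnames, MACs, IPs, and the profile name are illustrative.

def cobbler_system_cmd(hostname: str, mac: str, ip: str,
                       profile: str = "centos7-hadoop") -> str:
    """Build the `cobbler system add` command for one node."""
    return (f"cobbler system add --name={hostname} --profile={profile} "
            f"--interface=eth0 --mac={mac} --ip-address={ip} "
            f"--netboot-enabled=true")

# Hypothetical inventory; in practice this would come from a CMDB or CSV.
nodes = [
    ("hadoop-dn-001", "52:54:00:aa:bb:01", "10.0.1.101"),
    ("hadoop-dn-002", "52:54:00:aa:bb:02", "10.0.1.102"),
]

for host, mac, ip in nodes:
    print(cobbler_system_cmd(host, mac, ip))
```

Generating the commands from an inventory file keeps node registration repeatable, which matters when the cluster grows by dozens of machines at a time.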
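The etcd / confd pairing for DNS resolution can be modeled in-process: nodes publish their addresses as etcd keys, and confd renders a template whenever those keys change. The key layout and hostnames below are hypothetical; a real deployment would use a confd template resource rather than this Python stand-in:

```python
# Sketch: the etcd -> confd rendering flow, modeled in-process.
# The key layout (/hadoop/nodes/<hostname> -> ip) is an assumption.

etcd_keys = {
    "/hadoop/nodes/hadoop-dn-001": "10.0.1.101",
    "/hadoop/nodes/hadoop-dn-002": "10.0.1.102",
}

def render_hosts(keys: dict) -> str:
    """Render an /etc/hosts fragment the way a confd template would."""
    lines = []
    for key, ip in sorted(keys.items()):
        hostname = key.rsplit("/", 1)[-1]  # last path segment is the hostname
        lines.append(f"{ip}\t{hostname}")
    return "\n".join(lines)

print(render_hosts(etcd_keys))
```

Because the rendered file is derived entirely from the key-value store, name resolution keeps working without internet access, one of the stated requirements.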
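For the monitoring side, the agent heartbeats and health summaries that Cloudera Manager collects can be read back over its REST API (e.g. `GET /api/vNN/hosts`). The sketch below parses a trimmed, hypothetical sample of such a response; field names follow the Cloudera Manager API's host resource, but the payload here is invented for illustration:

```python
import json

# Sketch: flag unhealthy nodes from a Cloudera Manager hosts listing.
# The response body below is a trimmed, hypothetical sample.

sample_response = json.dumps({
    "items": [
        {"hostname": "hadoop-dn-001", "healthSummary": "GOOD"},
        {"hostname": "hadoop-dn-002", "healthSummary": "BAD"},
    ]
})

def unhealthy_hosts(api_body: str) -> list:
    """Return hostnames whose health summary is anything but GOOD."""
    hosts = json.loads(api_body)["items"]
    return [h["hostname"] for h in hosts if h.get("healthSummary") != "GOOD"]

print(unhealthy_hosts(sample_response))  # -> ['hadoop-dn-002']
```

A small poller like this can feed the Sensu / Flapjack alerting pipeline instead of requiring operators to watch the Cloudera Manager console for 200-500 nodes.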
Preliminary results show that manual labor costs during the initial installation period have been reduced drastically. The business goal has been achieved, and the solution is now being scaled out with additional hardware.