Building Clusters Journey

This doc is about how we build "BigData"-related clusters such as Hadoop, Spark, and Kafka. Specifically, we'll look into the following clusters:

  • Hadoop 3 cluster
  • Spark cluster:
    • Spark on Standalone
    • Spark on Hadoop/YARN
    • Spark on Kubernetes
  • Hive on Hadoop/YARN
  • Presto
  • Kafka

Note that all of these are Apache Software Foundation projects except for Presto (which is still open source, under the Apache License 2.0).

Contents (& Progress)

  1. Hadoop 3
  2. Spark cluster
     1. Spark on Hadoop/YARN (plus Spark on Standalone)
     2. Spark on Kubernetes (Not yet)
  3. Hive on Hadoop/YARN
  4. Presto (Not yet)
  5. Kafka cluster (Not yet)


Here are the details you need to prepare when building a cluster:

  • Amazon EC2 instances
    • us-east-1
    • AmazonLinux2 (AMI ID: ami-0947d2ba12ee1ff75)
    • Instance type: m5.xlarge
  • clush command - version: clush 1.8.3
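clush (from ClusterShell) is what lets us run the same command on every node of a cluster in parallel; it resolves named node groups from a local config file. A minimal sketch of that config, assuming three hypothetical worker hosts named node01 through node03 (substitute your EC2 instances' hostnames):

```
# /etc/clustershell/groups.d/local.cfg
# Hypothetical hostnames -- adjust to match your instances'
# DNS names or /etc/hosts entries.
[Main]
workers: node[01-03]
all: node[01-03]
```

With that in place, `clush -b -g workers uname -r` runs the command on every worker (`-b` groups identical output so three matching nodes print one block), and `clush -w node[01-02] <cmd>` targets an explicit node set instead of a named group.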