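For a start, you can't tell whether the setup actually works until you've run at least a sample program.
Let's go with the Monte Carlo calculation of pi. In Hadoop terms, it's the pi job from hadoop-examples.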
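In case the Monte Carlo idea is unfamiliar, here is a minimal sketch of it in plain shell plus awk. This is not the SparkPi source, only the same general technique: throw random points into the square [-1, 1] x [-1, 1], count how many land inside the unit circle, and multiply that fraction by 4. The function name `estimate_pi`, the sample count, and the seed are all made up for illustration.

```shell
#!/bin/sh
# Monte Carlo estimate of pi (illustrative sketch, not the SparkPi code).
# estimate_pi is a hypothetical helper: $1 = number of samples, $2 = RNG seed.
estimate_pi() {
    awk -v n="$1" -v seed="$2" 'BEGIN {
        srand(seed)
        inside = 0
        for (i = 0; i < n; i++) {
            # Uniform random point in the square [-1, 1] x [-1, 1]
            x = rand() * 2 - 1
            y = rand() * 2 - 1
            # The unit circle covers pi/4 of that square
            if (x * x + y * y <= 1) inside++
        }
        printf "Pi is roughly %.5f\n", 4 * inside / n
    }'
}

# Arbitrary sample count and seed, chosen only for this example
estimate_pi 200000 42
```

The estimate only gets a couple of digits right at this sample size, which is normal for this method: the error shrinks slowly as the number of samples grows.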
Run it:
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop; export SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-0.9.1-hadoop2.0.0-cdh4.4.0.jar; export SPARK_YARN_APP_JAR=./examples/target/scala-2.10/spark-examples-assembly-0.9.1.jar; export SPARK_YARN_MODE=true; ./bin/run-example org.apache.spark.examples.SparkPi yarn-client
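There is no need at all to put this on a single line, but when I first ran it without export, the way the Spark documentation shows it, the environment variables were not passed on to run-example and I got an error.
So, once the environment variables have been exported in the current shell, later runs only need the run-example command itself.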
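The difference between a plain assignment and an exported one is easy to check in isolation. The sketch below has nothing Spark-specific in it: `DEMO_VAR` and `child_sees` are hypothetical names used only for the demonstration, and the child `sh` process stands in for a script such as run-example.

```shell
#!/bin/sh
# child_sees is a hypothetical helper: it starts a child shell and prints
# the value of DEMO_VAR as that child process sees it.
child_sees() {
    sh -c 'echo "${DEMO_VAR:-<unset>}"'
}

unset DEMO_VAR
DEMO_VAR=from_parent   # plain shell variable: not placed in the environment
child_sees             # the child prints: <unset>

export DEMO_VAR        # now inherited by every child process of this shell
child_sees             # the child prints: from_parent
```

A `VAR=value command` prefix on the same command line also hands the variable to that one command, but a bare `VAR=value` on its own line or before a `;` does not.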
14/05/12 18:31:22 INFO yarn.Client: Command for the ApplicationMaster: $JAVA_HOME/bin/java -server -Xmx640m -Djava.io.tmpdir=$PWD/tmp org.apache.spark.deploy.yarn.WorkerLauncher --class notused --jar ./examples/target/scala-2.10/spark-examples-assembly-0.9.1.jar --args 'sparkclient:33070' --worker-memory 1024 --worker-cores 1 --num-workers 2 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
14/05/12 18:31:22 INFO yarn.Client: Submitting application to ASM
14/05/12 18:31:22 INFO client.YarnClientImpl: Submitted application application_1382610529109_9585 to ResourceManager at resourcemanager/192.168.1.4:8040
14/05/12 18:31:22 INFO cluster.YarnClientSchedulerBackend: Application report from ASM:
appMasterRpcPort: 0
appStartTime: 1399887082601
yarnAppState: ACCEPTED
14/05/12 18:31:37 INFO scheduler.TaskSetManager: Finished TID 0 in 3962 ms on hadoopdatanode3 (progress: 1/2)
14/05/12 18:31:37 INFO util.RackResolver: Resolved hadoopdatanode2 to /default-rack
14/05/12 18:31:37 INFO scheduler.DAGScheduler: Completed ResultTask(0, 1)
14/05/12 18:31:37 INFO scheduler.TaskSetManager: Finished TID 1 in 3846 ms on hadoopdatanode2 (progress: 2/2)
14/05/12 18:31:37 INFO scheduler.DAGScheduler: Stage 0 (reduce at SparkPi.scala:39) finished in 6.291 s
14/05/12 18:31:37 INFO cluster.YarnClientClusterScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
14/05/12 18:31:37 INFO spark.SparkContext: Job finished: reduce at SparkPi.scala:39, took 6.428673126 s
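Pi is roughly 3.13856
Looks good! Judging from the log, it first contacts the ResourceManager on the Hadoop side and then runs its tasks on two of the Hadoop data nodes.
It really is rough, but an approximation of pi does come out.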