CarbonData installation, deployment, and testing

All commands are run on the master node.

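Download the CarbonData jar and save it on the master node as /usr/hdp/current/spark2-thriftserver/jars/carbondata.jar, the path used by the commands below:
https://dist.apache.org/repos/dist/release/carbondata/1.6.0/apache-carbondata-1.6.0-bin-spark2.2.1-hadoop2.7.2.jar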

1. Distribute the jar to every node
scp /usr/hdp/current/spark2-thriftserver/jars/carbondata.jar root@yarn001:/usr/hdp/current/spark2-thriftserver/jars/
scp /usr/hdp/current/spark2-thriftserver/jars/carbondata.jar root@yarn002:/usr/hdp/current/spark2-thriftserver/jars/
scp /usr/hdp/current/spark2-thriftserver/jars/carbondata.jar root@yarn003:/usr/hdp/current/spark2-thriftserver/jars/
scp /usr/hdp/current/spark2-thriftserver/jars/carbondata.jar root@yarn004:/usr/hdp/current/spark2-thriftserver/jars/
scp /usr/hdp/current/spark2-thriftserver/jars/carbondata.jar root@yarn005:/usr/hdp/current/spark2-thriftserver/jars/
scp /usr/hdp/current/spark2-thriftserver/jars/carbondata.jar root@yarn006:/usr/hdp/current/spark2-thriftserver/jars/
scp /usr/hdp/current/spark2-thriftserver/jars/carbondata.jar root@yarn007:/usr/hdp/current/spark2-thriftserver/jars/
scp /usr/hdp/current/spark2-thriftserver/jars/carbondata.jar root@yarn008:/usr/hdp/current/spark2-thriftserver/jars/
scp /usr/hdp/current/spark2-thriftserver/jars/carbondata.jar root@yarn009:/usr/hdp/current/spark2-thriftserver/jars/
scp /usr/hdp/current/spark2-thriftserver/jars/carbondata.jar root@yarn010:/usr/hdp/current/spark2-thriftserver/jars/
scp /usr/hdp/current/spark2-thriftserver/jars/carbondata.jar root@yarn011:/usr/hdp/current/spark2-thriftserver/jars/
scp /usr/hdp/current/spark2-thriftserver/jars/carbondata.jar root@yarn012:/usr/hdp/current/spark2-thriftserver/jars/
scp /usr/hdp/current/spark2-thriftserver/jars/carbondata.jar root@yarn013:/usr/hdp/current/spark2-thriftserver/jars/
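The thirteen scp commands differ only in the host name, so a loop is less error-prone. This is a minimal sketch, assuming the workers are exactly yarn001 to yarn013 and that root can SSH to them from master. The function name distribute_file and the DRY_RUN variable are hypothetical, introduced here for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CarbonData or HDP): copy one local file
# into the same directory on every worker, yarn001 .. yarn013.
# Set DRY_RUN=1 to print the scp commands instead of running them.
distribute_file() {
  local src="$1" dest_dir="$2"
  local i host
  if [ -z "${DRY_RUN:-}" ] && [ ! -f "$src" ]; then
    echo "distribute_file: $src not found" >&2
    return 1
  fi
  for i in $(seq 1 13); do
    host=$(printf 'yarn%03d' "$i")   # yarn001, yarn002, ..., yarn013
    # With DRY_RUN set, the line expands to "echo scp ..."; otherwise to "scp ...".
    ${DRY_RUN:+echo} scp "$src" "root@${host}:${dest_dir}/" || return 1
  done
}
```

Usage for this step: distribute_file /usr/hdp/current/spark2-thriftserver/jars/carbondata.jar /usr/hdp/current/spark2-thriftserver/jars. The same call pattern fits carbon.properties (step 2), spark-defaults.conf (step 3) and the NodeManager copy (step 5).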

2. Distribute carbon.properties to every node

scp /usr/hdp/current/spark2-thriftserver/conf/carbon.properties root@yarn001:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/carbon.properties root@yarn002:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/carbon.properties root@yarn003:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/carbon.properties root@yarn004:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/carbon.properties root@yarn005:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/carbon.properties root@yarn006:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/carbon.properties root@yarn007:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/carbon.properties root@yarn008:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/carbon.properties root@yarn009:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/carbon.properties root@yarn010:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/carbon.properties root@yarn011:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/carbon.properties root@yarn012:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/carbon.properties root@yarn013:/usr/hdp/current/spark2-thriftserver/conf/
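These notes do not list what carbon.properties contains at this point. What follows is a minimal sketch, assuming the only setting needed is a store location matching the path that the spark-shell test in step 4 passes to getOrCreateCarbonSession (hdfs://master:8020/user/carbon/carbonstore). The helper name write_carbon_properties is hypothetical. Check the CarbonData configuration docs for other keys your setup may need.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a minimal carbon.properties to the given path.
# Assumption: the store location matches the path used in the spark-shell
# test (step 4); pass a second argument to override it.
write_carbon_properties() {
  local target="$1"
  local store="${2:-hdfs://master:8020/user/carbon/carbonstore}"
  cat > "$target" <<EOF
# CarbonData system properties (minimal example)
carbon.storelocation=${store}
EOF
}
```

Usage: write_carbon_properties /usr/hdp/current/spark2-thriftserver/conf/carbon.properties, then copy the file to the workers as above.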

3. Configure spark-defaults.conf and distribute it to every node
Add these two lines so that both the driver and the executors load carbon.properties:
spark.executor.extraJavaOptions -Dcarbon.properties.filepath=/usr/hdp/current/spark2-thriftserver/conf/carbon.properties
spark.driver.extraJavaOptions -Dcarbon.properties.filepath=/usr/hdp/current/spark2-thriftserver/conf/carbon.properties

scp /usr/hdp/current/spark2-thriftserver/conf/spark-defaults.conf root@yarn001:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/spark-defaults.conf root@yarn002:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/spark-defaults.conf root@yarn003:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/spark-defaults.conf root@yarn004:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/spark-defaults.conf root@yarn005:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/spark-defaults.conf root@yarn006:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/spark-defaults.conf root@yarn007:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/spark-defaults.conf root@yarn008:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/spark-defaults.conf root@yarn009:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/spark-defaults.conf root@yarn010:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/spark-defaults.conf root@yarn011:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/spark-defaults.conf root@yarn012:/usr/hdp/current/spark2-thriftserver/conf/
scp /usr/hdp/current/spark2-thriftserver/conf/spark-defaults.conf root@yarn013:/usr/hdp/current/spark2-thriftserver/conf/
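Editing spark-defaults.conf by hand is easy to get wrong on re-runs. Here is a minimal sketch that appends the two lines only when the keys are missing. The helper name add_carbon_java_opts is hypothetical. It assumes the usual "key value" layout of spark-defaults.conf. If a key already exists with other -D options, the helper only warns, because writing a second line for the same key would override the existing value rather than merge with it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: ensure spark-defaults.conf passes carbon.properties
# to the driver and the executors. Missing keys are appended; an existing key
# that lacks the option is reported, not overwritten, so it can be merged by hand.
add_carbon_java_opts() {
  local conf="$1"
  local props="${2:-/usr/hdp/current/spark2-thriftserver/conf/carbon.properties}"
  local key
  [ -f "$conf" ] || { echo "add_carbon_java_opts: $conf not found" >&2; return 1; }
  # Make sure the file ends with a newline before appending to it.
  if [ -s "$conf" ] && [ -n "$(tail -c 1 "$conf")" ]; then
    echo >> "$conf"
  fi
  for key in spark.driver.extraJavaOptions spark.executor.extraJavaOptions; do
    if ! grep -q "^[[:space:]]*${key}[[:space:]=]" "$conf"; then
      printf '%s -Dcarbon.properties.filepath=%s\n' "$key" "$props" >> "$conf"
    elif ! grep -q "^[[:space:]]*${key}[[:space:]=].*-Dcarbon.properties.filepath=" "$conf"; then
      echo "WARN: $key is already set in $conf; add -Dcarbon.properties.filepath=$props to it by hand" >&2
    fi
  done
}
```

Usage: add_carbon_java_opts /usr/hdp/current/spark2-thriftserver/conf/spark-defaults.conf. Running it twice leaves the file unchanged the second time.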

4. Test with spark-shell

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._

// Create a CarbonSession whose table store lives at this HDFS path
val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://master:8020/user/carbon/carbonstore")

// sql() returns a DataFrame; .show() prints its rows in the REPL
carbon.sql("show tables").show()

carbon.sql("create table user_carbon_test(id int, name string) STORED BY 'carbondata'")

carbon.sql("insert into user_carbon_test values (1, 'baidu')")

carbon.sql("select * from user_carbon_test").show()

If none of these statements raises an error, the test has passed.
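To repeat this check without typing into the REPL, the same statements can be fed to spark-shell on stdin. This is a minimal sketch, assuming spark-shell is on PATH and the jar is already in Spark's jars directory. The names carbon_smoke_test and SPARK_SHELL are hypothetical. The table is created with IF NOT EXISTS so that re-runs do not fail. The pass/fail decision is a heuristic grep of the output, not an official signal.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the step-4 statements non-interactively.
# SPARK_SHELL overrides the command (default: spark-shell); it is expanded
# unquoted on purpose so it may carry extra arguments.
carbon_smoke_test() {
  local store="${1:-hdfs://master:8020/user/carbon/carbonstore}"
  local log rc=0
  log=$(mktemp) || return 1
  ${SPARK_SHELL:-spark-shell} <<EOF 2>&1 | tee "$log"
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.CarbonSession._
val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("${store}")
carbon.sql("show tables").show()
carbon.sql("create table if not exists user_carbon_test(id int, name string) STORED BY 'carbondata'")
carbon.sql("insert into user_carbon_test values (1, 'baidu')")
carbon.sql("select * from user_carbon_test").show()
EOF
  # Heuristic: REPL compile errors look like "<console>:12: error: ...";
  # uncaught exceptions start a line, e.g. "org.apache...AnalysisException: ...".
  if grep -Eq '^[[:alnum:]_.$]+(Exception|Error)(: |$)|<console>:[0-9]+: error:' "$log"; then
    echo "smoke test FAILED (see output above)" >&2
    rc=1
  fi
  rm -f "$log"
  return "$rc"
}
```

Usage: carbon_smoke_test && echo OK. Plain log lines start with a timestamp, so benign WARN messages that mention an exception class should not trip the check.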

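5. Hive/YARN support (see the official guide: https://carbondata.apache.org/hive-guide.html)
Copy the jar into the YARN NodeManager lib directory on the master node, then distribute it to every node: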

cp /usr/hdp/current/spark2-thriftserver/jars/carbondata.jar /usr/hdp/current/hadoop-yarn-nodemanager/lib
scp /usr/hdp/current/hadoop-yarn-nodemanager/lib/carbondata.jar root@yarn001:/usr/hdp/current/hadoop-yarn-nodemanager/lib
scp /usr/hdp/current/hadoop-yarn-nodemanager/lib/carbondata.jar root@yarn002:/usr/hdp/current/hadoop-yarn-nodemanager/lib
scp /usr/hdp/current/hadoop-yarn-nodemanager/lib/carbondata.jar root@yarn003:/usr/hdp/current/hadoop-yarn-nodemanager/lib
scp /usr/hdp/current/hadoop-yarn-nodemanager/lib/carbondata.jar root@yarn004:/usr/hdp/current/hadoop-yarn-nodemanager/lib
scp /usr/hdp/current/hadoop-yarn-nodemanager/lib/carbondata.jar root@yarn005:/usr/hdp/current/hadoop-yarn-nodemanager/lib
scp /usr/hdp/current/hadoop-yarn-nodemanager/lib/carbondata.jar root@yarn006:/usr/hdp/current/hadoop-yarn-nodemanager/lib
scp /usr/hdp/current/hadoop-yarn-nodemanager/lib/carbondata.jar root@yarn007:/usr/hdp/current/hadoop-yarn-nodemanager/lib
scp /usr/hdp/current/hadoop-yarn-nodemanager/lib/carbondata.jar root@yarn008:/usr/hdp/current/hadoop-yarn-nodemanager/lib
scp /usr/hdp/current/hadoop-yarn-nodemanager/lib/carbondata.jar root@yarn009:/usr/hdp/current/hadoop-yarn-nodemanager/lib
scp /usr/hdp/current/hadoop-yarn-nodemanager/lib/carbondata.jar root@yarn010:/usr/hdp/current/hadoop-yarn-nodemanager/lib
scp /usr/hdp/current/hadoop-yarn-nodemanager/lib/carbondata.jar root@yarn011:/usr/hdp/current/hadoop-yarn-nodemanager/lib
scp /usr/hdp/current/hadoop-yarn-nodemanager/lib/carbondata.jar root@yarn012:/usr/hdp/current/hadoop-yarn-nodemanager/lib
scp /usr/hdp/current/hadoop-yarn-nodemanager/lib/carbondata.jar root@yarn013:/usr/hdp/current/hadoop-yarn-nodemanager/lib
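A silent scp failure on one worker is easy to miss. Here is a minimal sketch that checks every node received an identical jar by comparing SHA-256 checksums with the master's copy. It assumes root SSH access and sha256sum on all hosts. The function names collect_checksums and compare_checksums are hypothetical. The comparison reads "host checksum" lines, so it can be tested without a cluster.

```shell
#!/usr/bin/env bash
# Hypothetical check: compare each worker's copy of a file with the master's.
# compare_checksums reads "host checksum" lines on stdin and prints every host
# whose checksum differs from the expected one (an empty checksum means the
# file was missing or the host was unreachable). Returns 1 on any mismatch.
compare_checksums() {
  local expected="$1" host sum bad=0
  while read -r host sum; do
    if [ "$sum" != "$expected" ]; then
      echo "MISMATCH: $host ($sum)"
      bad=1
    fi
  done
  return "$bad"
}

# Print "host checksum" for yarn001..yarn013 over SSH.
# -n keeps ssh from reading stdin; BatchMode avoids password prompts.
collect_checksums() {
  local path="$1" i host
  for i in $(seq 1 13); do
    host=$(printf 'yarn%03d' "$i")
    echo "$host $(ssh -n -o BatchMode=yes "root@$host" sha256sum "$path" 2>/dev/null | cut -d' ' -f1)"
  done
}
```

Usage on master: jar=/usr/hdp/current/hadoop-yarn-nodemanager/lib/carbondata.jar; collect_checksums "$jar" | compare_checksums "$(sha256sum "$jar" | cut -d' ' -f1)" && echo "all nodes match".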

7. Verify in Hive
hive
select * from user_carbon_test;
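The same query can be run non-interactively with the Hive CLI's -e option. This is a minimal sketch. The names hive_verify and HIVE_CMD are hypothetical. The two set statements are included on the assumption that the linked CarbonData Hive guide asks for them before querying Carbon tables. Check the guide for your version and drop them if they are not needed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the verification query through "hive -e".
# HIVE_CMD overrides the command (default: hive).
# Assumption: the two "set" lines follow the CarbonData Hive guide.
hive_verify() {
  local table="${1:-user_carbon_test}"
  local sql="set hive.mapred.supports.subdirectories=true;
set mapreduce.input.fileinputformat.input.dir.recursive=true;
select * from ${table};"
  ${HIVE_CMD:-hive} -e "$sql"
}
```

Usage: hive_verify, or hive_verify some_other_table. The exit status is that of the hive command.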

8. Integrate Alluxio
On every node, set the following in /usr/hdp/current/spark2-thriftserver/conf/carbon.properties:
carbon.storelocation=alluxio://master:19998/user/carbon/carbonStore
carbon.ddl.base.hdfs.url=alluxio://master:19998/user/carbon/data
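Rather than editing the file on each node by hand, you can change the two keys once on master and then redistribute the file as in step 2. This is a minimal sketch, assuming plain key=value lines with no spaces around "=". The helper name set_property is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set key=value in a .properties file, replacing any
# existing line for that key, or appending one if the key is absent.
# Note: awk -v interprets backslash escapes, which is fine for plain URLs.
set_property() {
  local file="$1" key="$2" value="$3" tmp
  tmp=$(mktemp) || return 1
  if awk -v k="$key" -v v="$value" '
      index($0, k "=") == 1 { print k "=" v; done = 1; next }
      { print }
      END { if (!done) print k "=" v }
    ' "$file" > "$tmp"; then
    cat "$tmp" > "$file"   # overwrite in place, keeping the file's permissions
  else
    rm -f "$tmp"
    return 1
  fi
  rm -f "$tmp"
}
```

Usage: f=/usr/hdp/current/spark2-thriftserver/conf/carbon.properties; set_property "$f" carbon.storelocation alluxio://master:19998/user/carbon/carbonStore; set_property "$f" carbon.ddl.base.hdfs.url alluxio://master:19998/user/carbon/data.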

A simple check, creating a Hive table on an Alluxio path:
CREATE TABLE u_user (userid INT, age INT, gender CHAR(1), occupation STRING, zipcode STRING) LOCATION 'alluxio://master:19998/ml-100k';
Then query it:
select * from u_user;