ClickHouse 的基本用法

1. 建表相关

1.1 数据类型

    类型              备注     
UInt8, UInt16, UInt32, UInt64 unsigned int 类型相关
Int8, Int16, Int32, Int64 int 类型相关
Float32, Float64 Float32 为 Float,Float64 为 Double
Decimal 18.12.13 版本后支持
Boolean
String
FixedString(N) 固定长度的 String,短于 N 时字符串后面填充 Null, 长于 N 时返回异常
UUID
Date 格式为:yyyy-MM-dd
DateTime 格式为:yyyy-MM-dd HH🇲🇲ss
Enum
Array(T)
Nullable 可空字符串表示为 Nullable(String)

1.2 表引擎

    引擎              备注     
MergeTree 需要设定分区字段,并且表内至少需要一个字段为 Date 类型
ReplaceingMergeTree 与 MergeTree 不同在于会删除相同 primary key 的重复数据
TinyLog 没有 Date 字段要求,没有分区,适合用于数据量少的情况
Log 没有 Date 字段要求,没有分区,适合用于数据量少的情况
Memory 内存表
Buffer
External data 外部数据表
Distributed 分布式表,该表本身不存储数据,它会记录配置的机器中相同的表名并整合分布在各机器中的数据

1.3 建表

建表通过 CREATE TABLE 语句创建新的表,支持绝大部分 sql 语句,字段大小写敏感,后面需要设置表的引擎,如 MergeTree,Log 等,如 ssp_report_overall_dm 的建表语句为:

  CREATE TABLE ssp_report_overall_dm (
	publisherid Int32 DEFAULT CAST(0 AS Int32),
	appid Int32 DEFAULT CAST(0 AS Int32),
	countryid Int32 DEFAULT CAST(0 AS Int32),
	carrierid Int32 DEFAULT CAST(0 AS Int32),
	adtype Int32 DEFAULT CAST(0 AS Int32),
	campaignid Int32 DEFAULT CAST(0 AS Int32),
	offerid Int32 DEFAULT CAST(0 AS Int32),
	imageid Int32 DEFAULT CAST(0 AS Int32),
	affsub Nullable(String),
	requestcount Int64 DEFAULT CAST(0 AS Int64),
	sendcount Int64 DEFAULT CAST(0 AS Int64),
	showcount Int64 DEFAULT CAST(0 AS Int64),
	clickcount Int64 DEFAULT CAST(0 AS Int64),
	feereportcount Int64 DEFAULT CAST(0 AS Int64),
	feesendcount Int64 DEFAULT CAST(0 AS Int64),
	feereportprice Float32 DEFAULT CAST(0. AS Float32),
	feesendprice Float32 DEFAULT CAST(0. AS Float32),
	cpcbidprice Float32 DEFAULT CAST(0. AS Float32),
	cpmbidprice Float32 DEFAULT CAST(0. AS Float32),
	conversion Int64 DEFAULT CAST(0 AS Int64),
	allconversion Int64 DEFAULT CAST(0 AS Int64),
	revenue Float32 DEFAULT CAST(0. AS Float32),
	realrevenue Float32 DEFAULT CAST(0. AS Float32),
	l_time DateTime,
	b_date Date,
	publisheramid Int32 DEFAULT CAST(0 AS Int32),
	publisheramname Nullable(String),
	advertiseramid Int32 DEFAULT CAST(0 AS Int32),
	advertiseramname Nullable(String),
	appmodeid Int32 DEFAULT CAST(0 AS Int32),
	appmodename Nullable(String),
	adcategory1id Int32 DEFAULT CAST(0 AS Int32),
	adcategory1name Nullable(String),
	campaignname Nullable(String),
	adverid Int32 DEFAULT CAST(0 AS Int32),
	advername Nullable(String),
	offeroptstatus Int32 DEFAULT CAST(0 AS Int32),
	offername Nullable(String),
	publishername Nullable(String),
	appname Nullable(String),
	iab1name Nullable(String),
	iab2name Nullable(String),
	countryname Nullable(String),
	carriername Nullable(String),
	adtypeid Int32 DEFAULT CAST(0 AS Int32),
	adtypename Nullable(String),
	versionid Int32 DEFAULT CAST(0 AS Int32),
	versionname Nullable(String),
	publisherproxyid Int32 DEFAULT CAST(0 AS Int32),
	data_type Nullable(String),
	feecpctimes Int64 DEFAULT CAST(0 AS Int64),
	feecpmtimes Int64 DEFAULT CAST(0 AS Int64),
	feecpatimes Int64 DEFAULT CAST(0 AS Int64),
	feecpasendtimes Int64 DEFAULT CAST(0 AS Int64),
	feecpcreportprice Float32 DEFAULT CAST(0. AS Float32),
	feecpmreportprice Float32 DEFAULT CAST(0. AS Float32),
	feecpareportprice Float32 DEFAULT CAST(0. AS Float32),
	feecpcsendprice Float32 DEFAULT CAST(0. AS Float32),
	feecpmsendprice Float32 DEFAULT CAST(0. AS Float32),
	feecpasendprice Float32 DEFAULT CAST(0. AS Float32),
	countrycode Nullable(String),
	packagename Nullable(String),
	domain Nullable(String),
	operatingsystem Nullable(String),
	systemlanguage Nullable(String),
	devicebrand Nullable(String),
	devicetype Nullable(String),
	browserkernel Nullable(String),
	b_time DateTime,
	respstatus Int32 DEFAULT CAST(0 AS Int32),
	winprice Float64 DEFAULT 0.,
	winnotices Int64 DEFAULT CAST(0 AS Int64),
	issecondhighpricewin Int32 DEFAULT CAST(0 AS Int32),
	companyid Int32 DEFAULT CAST(0 AS Int32),
	companyname Nullable(String),
	test Int32 DEFAULT CAST(0 AS Int32),
	ruleid Int32 DEFAULT CAST(0 AS Int32),
	smartid Int32 DEFAULT CAST(0 AS Int32),
	proxyid Int32 DEFAULT CAST(0 AS Int32),
	smartname Nullable(String),
	rulename Nullable(String),
	appcompanyid Int32 DEFAULT CAST(0 AS Int32),
	offercompanyid Int32 DEFAULT CAST(0 AS Int32),
	newcount Int64 DEFAULT CAST(0 AS Int64),
	activecount Int64 DEFAULT CAST(0 AS Int64),
	adcategory2id Int32 DEFAULT CAST(0 AS Int32),
	adcategory2name Nullable(String),
	publisherampaid Int32 DEFAULT CAST(0 AS Int32),
	publisherampaname Nullable(String),
	advertiseramaaid Int32 DEFAULT CAST(0 AS Int32),
	advertiseramaaname Nullable(String),
	eventname Nullable(String),
	recommender Int32 DEFAULT CAST(0 AS Int32),
	ratertype Int32 DEFAULT CAST(0 AS Int32),
	raterid Nullable(String)
  ) ENGINE = MergeTree PARTITION BY (b_date, b_time) ORDER BY (b_date, b_time) SETTINGS index_granularity = 8192;

其中 PARTITION BY 后面为分区字段,与 hive 的 PARTITION 含义相近,ORDER BY 后面为排序字段,index_granularity 为索引粒度。

2. 数据导入

目前 ClickHouse 支持 4 中数据类型的导入,分别是 VALUES,CSV,TSV,JSONEachRow

2.1  Values

values 方式为普通的 sql insert 的方式,如:

INSERT INTO country(id, name, alpha2_code) VALUES(1, "All", "UN");

2.2 CSV

csv 需要通过 http 或者 tcp 的方式导入,数据通过逗号分隔,csv 数据的个数需要与 INSERT 语句声明的个数一致,否则插入失败

#tcp方式
clickhouse-client --format_csv_delimiter="|" --query="INSERT INTO country FORMAT CSV" < data.csv

#http方式
curl 'http://node111:8123/?query=INSERT INTO country FORMAT CSV' -d "1,All,UN"

2.3 TSV

tsv 与 csv 方式几乎相同,相比于 csv,tsv 的数据通过 tab 分割

2.4 JSONEachRow

相比于 csv 和 tsv,通过 JSONEachRow 的导入数据的速度最慢,但是兼容性最高。在增加字段的时候,只要设置了默认值,在原来基础上的服务就不需要修改,缺少的字段直接填上默认值。

curl 'http://node111:8123/?query=INSERT INTO country FORMAT JSONEachRow' -d '{"id": 1, "name": "All", "alpha2_code": "UN"}'

3. 数据查询

数据查询支持 JDBC,ODBC,也支持 http,tcp。官方提供了默认 JDBC 包,通过 http 实现,也可以通过 http post 方法发送 sql 获取数据。默认返回的数据格式为逗号分隔,可以通过参数设置成 json 的格式返回

curl 'http://node111:8123/' -d "SELECT * FROM country FORMAT JSON"