
Getting Started with ClickHouse Using Docker

ClickHouse is a distributed, column-oriented OLAP database open-sourced by the Russian search giant Yandex. Its headline features include columnar storage, vectorized query execution, high compression ratios, and near-linear horizontal scalability.

It all sounds very impressive, but ClickHouse's single biggest draw is surely one word: fast. Still, the proof of the pudding is in the eating, so let's try it out.

Single Node

ClickHouse publishes an official Docker image, so a single command is enough:

docker run -d --name clickhouse-server --ulimit nofile=262144:262144 -p 9000:9000 yandex/clickhouse-server:1.1

and clickhouse-server is up and running. We usually want more control over its configuration, though.

First, copy the default configuration out of the image:

mkdir etc
mkdir data

# start a throwaway container, then copy the default config out to the host
docker run -it --rm --entrypoint=/bin/bash -v $PWD:/work --privileged=true --user=root yandex/clickhouse-server:1.1
cp -r /etc/clickhouse-server/* /work/etc/
exit

Then run the server with the host-mounted config and data directories:

docker run -d --name clickhouse-server \
	--ulimit nofile=262144:262144 \
	-p 9000:9000 \
	-v $PWD/etc:/etc/clickhouse-server \
	-v $PWD/data:/var/lib/clickhouse \
	--privileged=true --user=root \
	yandex/clickhouse-server:1.1
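If you have clickhouse-client installed on the host (the cluster sections below assume you do), a quick sanity check that the server is alive:

clickhouse-client --port=9000 --query "SELECT version()"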

Once clickhouse is running, head over to the official tutorial and have a play.

Cluster Deployment

However good ClickHouse's performance is, a single machine has its limits. Fortunately, ClickHouse can scale out through sharding to keep up with data growth.

With docker-compose we can easily bring up a ClickHouse cluster. Let's run one with 3 shards:

mkdir -p clickhouse-3shards/ch01/data
mkdir -p clickhouse-3shards/ch02/data
mkdir -p clickhouse-3shards/ch03/data

cp -r etc clickhouse-3shards/ch01/etc
cp -r etc clickhouse-3shards/ch02/etc
cp -r etc clickhouse-3shards/ch03/etc

# the compose file below uses relative paths, so work from inside the cluster directory
cd clickhouse-3shards
vim docker-compose.yaml

Configuring the ClickHouse cluster

For easier management, we pull the cluster-related settings out into a separate file, metrika.xml. Add the following line to each node's config.xml to bring it in:

<include_from>/etc/clickhouse-server/metrika.xml</include_from>
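Note that include_from only makes the file available for substitution; config.xml must also reference each section through incl attributes. The stock config.xml of this era already carries them, but if yours does not, the relevant lines look roughly like this (a sketch, assuming the default layout):

<remote_servers incl="clickhouse_remote_servers" />
<macros incl="macros" optional="true" />
<zookeeper incl="zookeeper-servers" optional="true" />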

metrika.xml

<yandex>
<clickhouse_remote_servers>
    <!-- cluster name -->
    <perftest_3shards_1replicas>
        <!-- shard addresses -->
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>clickhouse01</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <internal_replication>true</internal_replication>
            <replica> 
                <host>clickhouse02</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>clickhouse03</host>
                <port>9000</port>
            </replica>
        </shard>
    </perftest_3shards_1replicas>
</clickhouse_remote_servers>

<!-- Macros substituted into the {shard}/{replica} placeholders when creating distributed tables; must differ on every node -->
<macros>
    <shard>01</shard>
    <replica>01</replica>
</macros>


<networks>
   <ip>::/0</ip>
</networks>

</yandex>
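As the comment says, the <macros> block is the only part of metrika.xml that differs per node. For this 3-shard cluster, a natural assignment is shard 01/02/03 with replica 01 everywhere; for example, ch02/etc/metrika.xml would carry:

<macros>
    <shard>02</shard>
    <replica>01</replica>
</macros>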

docker-compose.yaml

version: '2'

services:
  clickhouse01:
    image: yandex/clickhouse-server:1.1
    expose:
      - "9000"
    user: root
    ports:
      - "9001:9000"
    volumes:
      - ./ch01/etc:/etc/clickhouse-server 
      - ./ch01/data:/var/lib/clickhouse
    ulimits:
      nofile:
        soft: 262144  
        hard: 262144  
    privileged: true

  clickhouse02:
    image: yandex/clickhouse-server:1.1
    expose:
      - "9000"
    user: root
    ports:
      - "9002:9000"  
    volumes:
      - ./ch02/etc:/etc/clickhouse-server 
      - ./ch02/data:/var/lib/clickhouse
    ulimits:
      nofile:
        soft: 262144  
        hard: 262144  
    privileged: true

  clickhouse03:
    image: yandex/clickhouse-server:1.1
    expose:
      - "9000"
    user: root
    ports:
      - "9003:9000"  
    volumes:
      - ./ch03/etc:/etc/clickhouse-server 
      - ./ch03/data:/var/lib/clickhouse
    ulimits:
      nofile:
        soft: 262144  
        hard: 262144  
    privileged: true  

With everything configured, start the cluster:

docker-compose up -d
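You can confirm all three containers are running with:

docker-compose ps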

Without further ado, let's create a local table and a Distributed table on every clickhouse-server.

clickhouse-client --port=9001
# repeat on the other clickhouse-server nodes:
# clickhouse-client --port=9002
# clickhouse-client --port=9003

CREATE TABLE chtest_local (TDate Date,Value UInt16) ENGINE = MergeTree(TDate, (Value, TDate), 8192);

CREATE TABLE chtest_all AS chtest_local ENGINE = Distributed(perftest_3shards_1replicas, default, chtest_local, rand());
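The Distributed engine's arguments are, in order: the cluster name from metrika.xml, the database, the underlying local table, and the sharding key; rand() scatters rows uniformly across shards. If you wanted rows co-located by date instead, a sketch (with a hypothetical table name) would be:

CREATE TABLE chtest_by_date AS chtest_local ENGINE = Distributed(perftest_3shards_1replicas, default, chtest_local, toYYYYMMDD(TDate));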

Insert some rows on any one node:

clickhouse-client --port=9001

insert into chtest_all (TDate,Value) values ('2017-12-25', 111);
insert into chtest_all (TDate,Value) values ('2017-12-25', 222);
insert into chtest_all (TDate,Value) values ('2017-12-26', 333);
insert into chtest_local (TDate,Value) values ('2017-12-26', 444);

After that, querying chtest_all returns all of the data. Note that rows written directly to chtest_local are also visible through chtest_all.

:) select * from chtest_all;

SELECT *
FROM chtest_all 

┌──────TDate─┬─Value─┐
│ 2017-12-26 │   444 │
└────────────┴───────┘
┌──────TDate─┬─Value─┐
│ 2017-12-25 │   111 │
└────────────┴───────┘
┌──────TDate─┬─Value─┐
│ 2017-12-25 │   222 │
└────────────┴───────┘
┌──────TDate─┬─Value─┐
│ 2017-12-26 │   333 │
└────────────┴───────┘

4 rows in set. Elapsed: 0.008 sec. 

On the first node (port 9001):

:) select * from chtest_local;

SELECT *
FROM chtest_local 

┌──────TDate─┬─Value─┐
│ 2017-12-26 │   444 │
└────────────┴───────┘
┌──────TDate─┬─Value─┐
│ 2017-12-25 │   111 │
└────────────┴───────┘

2 rows in set. Elapsed: 0.006 sec. 

On the second node (port 9002):

:) select * from chtest_local;

SELECT *
FROM chtest_local 

┌──────TDate─┬─Value─┐
│ 2017-12-25 │   222 │
└────────────┴───────┘

1 rows in set. Elapsed: 0.005 sec. 


On the third node (port 9003):

:) select * from chtest_local;

SELECT *
FROM chtest_local 

┌──────TDate─┬─Value─┐
│ 2017-12-26 │   333 │
└────────────┴───────┘

1 rows in set. Elapsed: 0.005 sec. 

When any shard goes down, the Distributed table can no longer be read, and writes to it fail whenever a row is routed to the dead shard:

docker-compose stop clickhouse03

clickhouse-client --port=9001
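With clickhouse03 gone, a query against the Distributed table now fails with a connection error roughly like the following (exact wording varies by version):

:) select * from chtest_all;

Received exception from server:
Code: 279. DB::NetException: All connection tries failed.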

High-Availability Cluster

In a distributed system, keeping the service available requires storing multiple replicas of the data.

ClickHouse relies on ZooKeeper to coordinate replica synchronization, so we modify docker-compose.yaml to add a ZooKeeper service plus a fourth ClickHouse node, and configure the cluster as 2 shards with 2 replicas each.
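The compose file below also expects a ch04 directory prepared like the others (assuming you are still inside the cluster directory from earlier):

mkdir -p ch04/data
cp -r ../etc ch04/etc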

version: '2'

services:
  zookeeper:
    image: zookeeper:3.5
    ports:
      - "2181:2181"
      - "2182:2182"

  clickhouse01:
    image: yandex/clickhouse-server:1.1
    expose:
      - "9000"
    user: root
    privileged: true
    ports:
      - "9001:9000"
    volumes:
      - ./ch01/etc:/etc/clickhouse-server 
      - ./ch01/data:/var/lib/clickhouse
    ulimits:
      nofile:
        soft: 262144  
        hard: 262144  
    depends_on:
      - "zookeeper"

  clickhouse02:
    image: yandex/clickhouse-server:1.1
    expose:
      - "9000"
    user: root
    privileged: true
    ports:
      - "9002:9000"  
    volumes:
      - ./ch02/etc:/etc/clickhouse-server 
      - ./ch02/data:/var/lib/clickhouse
    ulimits:
      nofile:
        soft: 262144  
        hard: 262144  
    depends_on:
      - "zookeeper"

  clickhouse03:
    image: yandex/clickhouse-server:1.1
    expose:
      - "9000"
    user: root
    privileged: true  
    ports:
      - "9003:9000"  
    volumes:
      - ./ch03/etc:/etc/clickhouse-server 
      - ./ch03/data:/var/lib/clickhouse
    ulimits:
      nofile:
        soft: 262144  
        hard: 262144  
    depends_on:
      - "zookeeper"

  clickhouse04:
    image: yandex/clickhouse-server:1.1
    expose:
      - "9000"
    user: root
    privileged: true    
    ports:
      - "9004:9000"  
    volumes:
      - ./ch04/etc:/etc/clickhouse-server 
      - ./ch04/data:/var/lib/clickhouse
    ulimits:
      nofile:
        soft: 262144  
        hard: 262144  
    depends_on:
      - "zookeeper"

Cluster configuration. Every node shares the same metrika.xml except the <macros> block: following the shard layout below, clickhouse01 takes shard 01/replica 01, clickhouse03 shard 01/replica 02, clickhouse02 shard 02/replica 01, and clickhouse04 shard 02/replica 02.

<yandex>
<clickhouse_remote_servers>
    <perftest_2shards_2replicas>
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>clickhouse01</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>clickhouse03</host>
                <port>9000</port>
            </replica>
        </shard>
        <shard>
            <internal_replication>true</internal_replication>
            <replica>
                <host>clickhouse02</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>clickhouse04</host>
                <port>9000</port>
            </replica>
        </shard>
    </perftest_2shards_2replicas>
</clickhouse_remote_servers>

<macros>
    <shard>01</shard>
    <replica>02</replica>
</macros>

<zookeeper-servers>
  <node index="1">
    <host>zookeeper</host>
    <port>2181</port>
  </node>
</zookeeper-servers>

<networks>
   <ip>::/0</ip>
</networks>

</yandex>
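As before, adjust <macros> on each node; for instance clickhouse04, the second replica of the second shard, would carry:

<macros>
    <shard>02</shard>
    <replica>02</replica>
</macros>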

Once the cluster is up, create the tables on every node:

CREATE TABLE chtest_local (TDate Date,Value UInt16) ENGINE = MergeTree(TDate, (Value, TDate), 8192);

CREATE TABLE chtest_replica (TDate Date,Value UInt16)
ENGINE = ReplicatedMergeTree(
    '/clickhouse_perftest/tables/{shard}/chtest',
    '{replica}',
    TDate,
    (Value, TDate),
    8192);
    
CREATE TABLE chtest_all AS chtest_replica ENGINE = Distributed(perftest_2shards_2replicas, default, chtest_replica, rand());    

Afterwards, insert data into chtest_all from any machine; you'll see that ch01 and ch03 hold identical data, as do ch02 and ch04.
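A minimal way to watch the replication in action, using the host ports from the compose file:

clickhouse-client --port=9001 --query="insert into chtest_all (TDate,Value) values ('2017-12-27', 555)"

# both replicas of whichever shard received the row report the same count
clickhouse-client --port=9001 --query="select count() from chtest_replica"
clickhouse-client --port=9003 --query="select count() from chtest_replica"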

Summary

Overall, ClickHouse leaves a very good impression: strong performance, painless distribution, and it basically works out of the box. Cluster management is still its weak spot, though. There is no central control node, so adding or removing nodes means updating the configuration on every node, and you end up building your own management tooling.


References

https://clickhouse.yandex/tutorial.html
http://www.cnblogs.com/gomysql/p/6708650.html
http://jackpgao.github.io/2017/12/13/ClickHouse-Cluster-Beginning-to-End/