Superset 1.1.0 Beginner's Guide
pip install apache-superset==1.1.0 on CentOS 7.6
Apache Superset is an excellent open-source BI tool (38.5k stars). Version 1.1.0 is a complete overhaul, with a redesigned UI and interactions, and is well worth recommending.
What follows is the installation process, which took considerable effort.
Foreword
When deploying with docker-compose, the containers could not reach the external network. With the Running on Kubernetes approach, adding a database failed with: An error occurred while creating databases: (configuration_method) Missing data for required field.
In the end, installing directly on the host with pip install apache-superset==1.1.0 worked, but it was a bumpy road.
Prerequisites
- CentOS 7.6 x86-64
- Python >= 3.7.4
1. Prepare the installation environment
1.1 Install Python 3.7.4
apache-superset==1.1.0 requires Python ~=3.7 (Requires: Python ~=3.7), so the first step is to switch CentOS 7 to Python 3.7.4. Otherwise pip can only install apache-superset 0.38.0.
# wget https://www.python.org/ftp/python/3.7.4/Python-3.7.4.tgz
# tar zxf Python-3.7.4.tgz
# cd Python-3.7.4
# ./configure
# make && make install
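After `make install`, check that the interpreter you will run pip with meets both version constraints mentioned in this post. apache-superset 1.1.0 needs Python ~=3.7 (that is, >=3.7 and <4). As the FAQ explains, pandas 1.2.2 needs Python >=3.7.1. Below is a minimal sketch that hand-codes those two constraints as tuple comparisons; it is an illustration, not pip's own specifier logic. Run it with python3.7.

```python
import sys

# Assumption: hand-coded versions of the two constraints quoted in this post.
#   apache-superset 1.1.0 -> Requires-Python ~=3.7  (>=3.7, <4)
#   pandas 1.2.2          -> Requires-Python >=3.7.1
SUPERSET_MIN = (3, 7)
SUPERSET_MAX_EXCLUSIVE = (4,)
PANDAS_MIN = (3, 7, 1)


def check_python(version=None):
    """Return a list of problems for `version` (defaults to this interpreter); empty means OK."""
    version = tuple(version if version is not None else sys.version_info[:3])
    problems = []
    if not (SUPERSET_MIN <= version < SUPERSET_MAX_EXCLUSIVE):
        problems.append("apache-superset 1.1.0 needs Python ~=3.7")
    if version < PANDAS_MIN:
        problems.append("pandas 1.2.2 needs Python >=3.7.1")
    return problems


if __name__ == "__main__":
    issues = check_python()
    print("OK" if not issues else "; ".join(issues))
```

On a 3.7.0 interpreter this reports only the pandas problem. That matches the `No matching distribution found for pandas<1.3,>=1.2.2` error in the FAQ.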
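1.2 Install system dependencies
Building Python 3.7 from source in step 1.1 also needs some of these packages: gcc to compile, libffi-devel for the _ctypes module, and openssl-devel for the ssl module that pip relies on. It is best to install them before step 1.1.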
# yum install gcc gcc-c++ libffi-devel python3-devel python3-pip python3-wheel openssl-devel cyrus-sasl-devel openldap-devel
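Before running pip, check that a C/C++ compiler and the Python.h header of the interpreter you are using are both present. Those two are behind the gcc errors in the FAQ. Below is a minimal sketch using only the standard library. It inspects the interpreter that runs it, so run it with python3.7. The helper name is hypothetical.

```python
import os
import shutil
import sysconfig


def build_prerequisites():
    """Check for gcc/g++ on PATH and for Python.h of the interpreter running this script.

    These are needed whenever pip has to compile a C or C++ extension from source.
    """
    include_dir = sysconfig.get_paths()["include"]
    return {
        "gcc": shutil.which("gcc") is not None,
        "g++": shutil.which("g++") is not None,
        "Python.h": os.path.isfile(os.path.join(include_dir, "Python.h")),
    }


if __name__ == "__main__":
    for name, present in build_prerequisites().items():
        print(f"{name}: {'found' if present else 'MISSING'}")
```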
1.3 Install Superset
# pip install apache-superset==1.1.0
- Initialize the DB
# superset db upgrade
- Create an admin user
# export FLASK_APP=superset
# superset fab create-admin
- Load some data to play with
# superset load_examples
- Create default roles and permissions
# superset init
- Start the web server
Superset is built on the Flask framework; start the server as follows:
# superset run -h 10.0.0.30 -p 8088 --with-threads
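The steps above can also be scripted. The sketch below is an illustration, not an official Superset tool. It runs the same commands in order with FLASK_APP=superset set. Instead of the interactive prompts, it passes the admin details to `superset fab create-admin` through Flask-AppBuilder's --username/--firstname/--lastname/--email/--password options. The credentials are hypothetical placeholders. By default it is a dry run that only prints the commands, with the password masked.

```python
import os
import subprocess

# Hypothetical placeholder credentials: replace them before a real run.
ADMIN = {
    "username": "admin",
    "firstname": "Superset",
    "lastname": "Admin",
    "email": "admin@example.com",
    "password": "change-me",
}


def init_commands(host="10.0.0.30", port=8088, admin=ADMIN):
    """Return the setup steps from this section as argv lists, in order."""
    create_admin = ["superset", "fab", "create-admin"]
    for option, value in admin.items():
        create_admin += [f"--{option}", value]
    return [
        ["superset", "db", "upgrade"],
        create_admin,
        ["superset", "load_examples"],
        ["superset", "init"],
        # Runs the web server, so this last step blocks until it is stopped.
        ["superset", "run", "-h", host, "-p", str(port), "--with-threads"],
    ]


def masked(argv):
    """Return a printable command line with the value after --password hidden."""
    shown = []
    for previous, arg in zip([""] + argv, argv):
        shown.append("****" if previous == "--password" else arg)
    return " ".join(shown)


def run_all(commands, dry_run=True):
    """Print every command; execute them (stopping at the first failure) only if dry_run is False."""
    env = dict(os.environ, FLASK_APP="superset")
    for argv in commands:
        print("$", masked(argv))
        if not dry_run:
            subprocess.run(argv, env=env, check=True)


if __name__ == "__main__":
    run_all(init_commands())
```

Call `run_all(init_commands(), dry_run=False)` to actually execute the steps once the printed commands look right.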
2. Add the databases to query
The Install Database Drivers page in the Superset documentation explains how to install the driver for each database.
Below are the ones I needed: MySQL, Hive, Impala and Druid.
- MySQL
# pip install mysqlclient
Connection string: mysql://{username}:{password}@{hostname}:{port}/{database}
- Hive
# pip install 'pyhive[hive]'
Connection string: hive://{hostname}:{port, default 10000}/{database}
- Impala
# pip install impyla
Connection string: impala://{hostname}:{port, default 21050}/{database}
- Druid
# pip install pydruid
Connection string: druid://<User>:<password>@<Host>:<Port-default-9088>/druid/v2/sql
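A single misplaced character breaks these URIs (the `@` has to come after the password), so it can help to generate them. Below is a minimal sketch based on the templates above. `sqlalchemy_uri` is a hypothetical helper name. The MySQL default port 3306 is an assumption (the usual MySQL port), since the template leaves the port open. The password is percent-encoded so that characters such as `@` or `:` in it cannot be mistaken for URI separators.

```python
from urllib.parse import quote

# Default ports: 10000, 21050 and 9088 come from the templates above;
# 3306 is the usual MySQL port (an assumption, the template leaves it open).
DEFAULT_PORTS = {"mysql": 3306, "hive": 10000, "impala": 21050, "druid": 9088}


def sqlalchemy_uri(dialect, host, database="", username=None, password=None, port=None):
    """Build a connection string following the templates above (hypothetical helper).

    The password is percent-encoded; a password given without a username is ignored.
    """
    port = port or DEFAULT_PORTS[dialect]
    auth = ""
    if username:
        auth = username
        if password:
            auth += ":" + quote(password, safe="")
        auth += "@"
    # Druid's SQL endpoint uses a fixed path instead of a database name.
    path = "druid/v2/sql" if dialect == "druid" else database
    return f"{dialect}://{auth}{host}:{port}/{path}"


if __name__ == "__main__":
    print(sqlalchemy_uri("mysql", "db.example.com", "superset", username="bi", password="p@ss"))
    # -> mysql://bi:p%40ss@db.example.com:3306/superset
    print(sqlalchemy_uri("hive", "hive-server", "default", username="hdfs"))
    # -> hive://hdfs@hive-server:10000/default
```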
3. Building charts
3.1 Add datasets
3.2 Create a chart
Superset comes with 46 built-in chart types.
3.3 Create a dashboard
3.4 SQL Lab
FAQ
Installing Superset with pip was quite a struggle, with more snags than I can fully describe. These are the errors I ran into.
error: command 'gcc' failed with exit status 1
This happens when the system dependencies from step 1.2 have not been installed.
src/geohash.cpp:538:20: fatal error: Python.h: No such file or directory
error: command 'gcc' failed with exit status 1
ERROR: No matching distribution found for pandas<1.3,>=1.2.2
This is because pandas==1.2.2 requires Python >=3.7.1, so it cannot be installed on Python 3.7.0.
ERROR: Could not find a version that satisfies the requirement pandas<1.3,>=1.2.2 (from apache-superset) (from versions: 0.1, 0.2b0, 0.2b1, 0.2, 0.3.0b0, 0.3.0b2, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.6.0, 0.6.1, 0.7.0rc1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0rc1, 0.8.0rc2, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0rc1, 0.19.0, 0.19.1, 0.19.2, 0.20.0rc1, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.21.0rc1, 0.21.0, 0.21.1, 0.22.0, 0.23.0rc2, 0.23.0, 0.23.1, 0.23.2, 0.23.3, 0.23.4, 0.24.0rc1, 0.24.0, 0.24.1, 0.24.2, 0.25.0rc0, 0.25.0, 0.25.1, 0.25.2, 0.25.3, 1.0.0rc0, 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.1.0rc0, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5)
ERROR: No matching distribution found for pandas<1.3,>=1.2.2
ERROR: No matching distribution found for mysqlclient==2.0.3
When installing mysqlclient, the build reports that mysql_config / mariadb_config cannot be found. Installing mariadb-devel with yum fixes it.
# pip install mysqlclient==2.0.3
Looking in indexes: http://mirrors.tencentyun.com/pypi/simple
Collecting mysqlclient==2.0.3
Downloading http://mirrors.tencentyun.com/pypi/packages/3c/df/59cd2fa5e48d0804d213bdcb1acb4d08c403b61c7ff7ed4dd4a6a2deb3f7/mysqlclient-2.0.3.tar.gz (88 kB)
|████████████████████████████████| 88 kB 9.1 MB/s
ERROR: Command errored out with exit status 1:
command: /usr/local/bin/python3.7 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-bs1gch7r/mysqlclient_a8ecbcb585d4474c8886784ffc1ba4cf/setup.py'"'"'; __file__='"'"'/tmp/pip-install-bs1gch7r/mysqlclient_a8ecbcb585d4474c8886784ffc1ba4cf/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-bwo0e6yy
cwd: /tmp/pip-install-bs1gch7r/mysqlclient_a8ecbcb585d4474c8886784ffc1ba4cf/
Complete output (15 lines):
/bin/sh: mysql_config: command not found
/bin/sh: mariadb_config: command not found
/bin/sh: mysql_config: command not found
mysql_config --version
mariadb_config --version
mysql_config --libs
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/pip-install-bs1gch7r/mysqlclient_a8ecbcb585d4474c8886784ffc1ba4cf/setup.py", line 15, in <module>
metadata, options = get_config()
File "/tmp/pip-install-bs1gch7r/mysqlclient_a8ecbcb585d4474c8886784ffc1ba4cf/setup_posix.py", line 70, in get_config
libs = mysql_config("libs")
File "/tmp/pip-install-bs1gch7r/mysqlclient_a8ecbcb585d4474c8886784ffc1ba4cf/setup_posix.py", line 31, in mysql_config
raise OSError("{} not found".format(_mysql_config_path))
OSError: mysql_config not found
----------------------------------------
WARNING: Discarding http://mirrors.tencentyun.com/pypi/packages/3c/df/59cd2fa5e48d0804d213bdcb1acb4d08c403b61c7ff7ed4dd4a6a2deb3f7/mysqlclient-2.0.3.tar.gz#sha256=f6ebea7c008f155baeefe16c56cd3ee6239f7a5a9ae42396c2f1860f08a7c432 (from http://mirrors.tencentyun.com/pypi/simple/mysqlclient/) (requires-python:>=3.5). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement mysqlclient==2.0.3 (from versions: 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.3.4, 1.3.5, 1.3.6, 1.3.7, 1.3.8, 1.3.9, 1.3.10, 1.3.11rc1, 1.3.11, 1.3.12, 1.3.13, 1.3.14, 1.4.0rc1, 1.4.0rc2, 1.4.0rc3, 1.4.0, 1.4.1, 1.4.2, 1.4.2.post1, 1.4.3, 1.4.4, 1.4.5, 1.4.6, 2.0.0, 2.0.1, 2.0.2, 2.0.3)
ERROR: No matching distribution found for mysqlclient==2.0.3
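The log shows mysqlclient's build shelling out to mysql_config and then mariadb_config, and giving up when neither exists. Below is a small sketch to check for them before running pip. The probe order mirrors the log above, and `mysql_build_config` is a hypothetical helper name.

```python
import shutil


def mysql_build_config():
    """Return the path of mysql_config or mariadb_config, or None if neither is on PATH.

    mysqlclient's build probes these tools to locate the MySQL/MariaDB client
    headers and libraries.
    """
    for tool in ("mysql_config", "mariadb_config"):
        path = shutil.which(tool)
        if path:
            return path
    return None


if __name__ == "__main__":
    found = mysql_build_config()
    print(found or "neither found: yum install mariadb-devel, then retry pip install mysqlclient")
```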
ModuleNotFoundError: No module named '_sqlite3'
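On CentOS, Python 3.6 can be installed with yum, and the versions are close, so copying its module over worked here:
# cp /usr/lib64/python3.6/lib-dynload/_sqlite3.cpython-36m-x86_64-linux-gnu.so /usr/local/lib/python3.7/lib-dynload/_sqlite3.cpython-37m-x86_64-linux-gnu.so
Extension modules built for 3.6 are not guaranteed to be ABI-compatible with 3.7, though. The cleaner fix is to install sqlite-devel (and bzip2-devel for the _bz2 error below) and rebuild Python 3.7.4, because these modules are only compiled when their headers are present at build time.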
ModuleNotFoundError: No module named '_bz2'
Same cause and workaround as above:
# cp /usr/lib64/python3.6/lib-dynload/_bz2.cpython-36m-x86_64-linux-gnu.so /usr/local/lib/python3.7/lib-dynload/_bz2.cpython-37m-x86_64-linux-gnu.so
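Both errors mean the Python 3.7.4 build skipped a C extension module because its development headers were missing at compile time. Below is a minimal sketch that lists such gaps in the running interpreter. The module list is an assumption, so extend it as needed. The package names in the comments are the usual CentOS ones.

```python
import importlib.util

# C extension modules that CPython only builds when the matching headers exist
# at compile time; the usual CentOS -devel package is noted next to each one.
OPTIONAL_MODULES = (
    "_sqlite3",  # sqlite-devel
    "_bz2",      # bzip2-devel
    "_ctypes",   # libffi-devel
    "_ssl",      # openssl-devel
    "_lzma",     # xz-devel
)


def missing_modules(names=OPTIONAL_MODULES):
    """Return the names this interpreter cannot find (find_spec returns None)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print(missing_modules() or "all listed modules are present")
```

Run it with python3.7 after rebuilding to confirm nothing is missing before reinstalling Superset.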
Issue 1002 - The database returned an unexpected error.
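The Hive URI had no username, so the query ran as root (the user running Superset), which has no write permission on the HDFS /user directory. Either put a user that has an HDFS home directory into the URI, e.g. hive://{username}@{hostname}:{port, default 10000}/{database}, or have an HDFS administrator create /user/root.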
DB engine Error
hive error: ('Query error', 'Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:400)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:256)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:194)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1855)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1839)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1798)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:61)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3101)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1123)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:696)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)\n')
This may be triggered by:
Issue 1002 - The database returned an unexpected error.
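The root cause is which user the query runs as. Below is a minimal sketch of that resolution rule. It assumes, as observed in this post, that PyHive falls back to the OS login name of the process running Superset when the URI has no username. `effective_hive_user` is a hypothetical helper, not a PyHive API.

```python
import getpass
from urllib.parse import urlsplit


def effective_hive_user(uri):
    """Return the user a Hive connection made from `uri` would run as.

    Assumption: mirrors PyHive's fallback to the OS login name (getpass.getuser())
    when the URI carries no username.
    """
    return urlsplit(uri).username or getpass.getuser()
```

For example, `effective_hive_user("hive://hdfs@hive-server:10000/default")` returns `"hdfs"`. Without the `hdfs@` part it returns the login name of the process, which is root when Superset is started as root.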
hive: Error while fetching schema list
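When querying Hive tables in SQL Lab, the following error sometimes appears. I have not found the cause yet; it may be a pyhive version-compatibility issue. I am recording it here and will update this post once I find the cause.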
INFO: - - [16/May/2021 16:06:07] "GET /api/v1/database/3/schemas/?q=(force:!f) HTTP/1.1" 500 -
Failed to fetch database function names with error: type object 'TCLIService' has no attribute 'Client'
ERROR:superset.models.core:Failed to fetch database function names with error: type object 'TCLIService' has no attribute 'Client'
ERROR:root:type object 'TCLIService' has no attribute 'Client'
References
- [1] Apache. Superset: Running on Kubernetes
- [2] danny_duan. Superset 1.0.1 Installation and Data Source Integration