
OpenTSDB Introduction

OpenTSDB is a scalable time series database built on HBase, supporting millions of reads and writes per second. Interaction with OpenTSDB happens mainly through one or more running TSDs. Each TSD is independent; there is no master and no shared state, so you can run as many TSDs as needed to handle whatever load is thrown at them.

Cloudera Express manages and runs the production HDFS/HBase/ZooKeeper.

Project Background

OpenTSDB stores traffic data between SDN devices, tagged by dimensions such as source IP, destination IP, upstream/downstream traffic, service type, and protocol.

The traffic statistics interface queries OpenTSDB's raw data directly over a time range, with no rollup pre-aggregation.

Query Bottleneck

The customer, Beijing Unicom, reported that the traffic statistics interface for one of their SDN devices timed out for early November. Testing showed that traffic statistics for other devices were normal, while queries against the Unicom device produced the following error log:

16:14:13.466 ERROR [RegionClient.exceptionCaught] - Unexpected exception from downstream on [id: 0xbbc6970f, /172.31.250.10:50790 => /172.31.120.131:16020]
com.stumbleupon.async.CallbackOverflowError: Too many callbacks in Deferred@1273979129(state=PENDING, result=null, callback=net.opentsdb.tsd.HttpJsonSerializer$1DPsResolver@682fd571 -> net.opentsdb.tsd.HttpJsonSerializer$1DPsResolver@3061522a -> net.opentsdb.tsd.HttpJsonSerializer$1DPsResolver@27fac471 -> net.opentsdb.tsd.HttpJsonSerializer$1DPsResolver@17a290df -> net.opentsdb.tsd.HttpJsonSerializer$1DPsResolver@1304b07 ->
//....
passthrough -> passthrough) (size=16383) when attempting to add cb=net.opentsdb.tsd.HttpJsonSerializer$1DPsResolver@29bbd6fc@700176124, eb=passthrough@1260931085

Someone on GitHub hit this error back in 2016 (CallbackOverflowError with many tag values); the issue is still open today.

Digging into the code that throws the error:

private static final short MAX_CALLBACK_CHAIN_LENGTH = (1 << 14) - 1;

....

else if (last_callback == callbacks.length) {
  final int oldlen = callbacks.length;
  // The array stores callback/errback pairs, so its capacity is twice the chain length.
  if (oldlen == MAX_CALLBACK_CHAIN_LENGTH * 2) {
    throw new CallbackOverflowError("Too many callbacks in " + this
      + " (size=" + (oldlen / 2) + ") when attempting to add cb="
      + cb + '@' + cb.hashCode() + ", eb=" + eb + '@' + eb.hashCode());
  }
  // Grow the array by doubling, up to the hard cap.
  final int len = Math.min(oldlen * 2, MAX_CALLBACK_CHAIN_LENGTH * 2);
  final Callback[] newcbs = new Callback[len];
  System.arraycopy(callbacks, next_callback,       // Outstanding callbacks.
                   newcbs, 0,                      // Move them to the beginning.
                   last_callback - next_callback); // Number of items.
  last_callback -= next_callback;
  next_callback = 0;
  callbacks = newcbs;
}

MAX_CALLBACK_CHAIN_LENGTH caps the length of a Deferred chain, and what enters the addCallbacks method above is one entry per aggregated group; presumably the Unicom problem is that the number of groups exceeded this limit.

Testing also showed that shortening the Unicom device's query range produced a normal response, and the window 2019-11-06 8:00:00 - 2019-11-06 9:00:00 alone returned nearly 15,000 rows. In other words, the Unicom device has nearly 15,000 distinct destination IPs, most carrying little traffic.

The Deferred class comes from a Java library that provides useful building blocks for high-performance, multi-threaded, asynchronous Java applications. Its implementation was inspired by Twisted's asynchronous library (twisted.internet.defer).
Deferred lets you easily build asynchronous processing chains that fire when an asynchronous event (I/O, RPC, etc.) completes. It is broadly useful for building asynchronous APIs in multi-threaded servers and clients.
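To make the limit concrete, here is a minimal, self-contained sketch (not OpenTSDB code; it only assumes the same com.stumbleupon.async library on the classpath) that queues callbacks on a single pending Deferred until it overflows:

import com.stumbleupon.async.Callback;
import com.stumbleupon.async.CallbackOverflowError;
import com.stumbleupon.async.Deferred;

public class DeferredOverflowDemo {
  public static void main(String[] args) {
    // The Deferred has no result yet (state=PENDING, as in the log above),
    // so every callback is queued instead of being run immediately.
    final Deferred<Object> d = new Deferred<Object>();
    final Callback<Object, Object> passthrough = new Callback<Object, Object>() {
      public Object call(Object arg) {
        return arg;
      }
    };
    try {
      // MAX_CALLBACK_CHAIN_LENGTH is (1 << 14) - 1 = 16383 queued callbacks.
      for (int i = 0; i < (1 << 14); i++) {
        d.addCallback(passthrough);
      }
    } catch (CallbackOverflowError e) {
      // Thrown while queueing the 16384th callback.
      System.out.println("overflowed: " + e.getMessage());
    }
  }
}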

Working on a Fix

OpenTSDB offers no parameter to limit the size of the result set, at least not in 2.4.

Huawei Cloud's CloudTable OpenTSDB service does support result limiting: its query API provides a limit parameter.

The traffic statistics interface only reports a device's top-10 flows, so raising MAX_CALLBACK_CHAIN_LENGTH would be pointless: more results, slower queries.

The fix is to limit how much data enters addCallback:

1. OpenTSDB only filters out data points outside the queried time range at the very end, when preparing the response, via three lines of code:

if (dp.timestamp() < data_query.startTime() ||
    dp.timestamp() > data_query.endTime()) {
  continue;
}

So one optimization is to filter out those points before they ever reach addCallback.

2. Provide a limit parameter to cap the result set size. This takes two steps: first, before addCallback, use a TreeSet's automatic ordering so that at most limit entries enter processing; second, when writing to the OutputStream, sort and keep only the first limit entries.

The key is filtering out-of-range data points early, plus the TreeSet's automatic ordering, which discards most of the low-traffic IP series:

final Deferred<Object> cb_chain = new Deferred<Object>();
// Seed the set with `limit` sentinel values (-limit .. -1) so it always holds
// exactly `limit` entries and first() is the smallest sum seen so far.
TreeSet<Number> numberTreeSet = LongStream.range(0 - limit, 0)
    .mapToDouble(i -> 1.0 * i).boxed()
    .collect(Collectors.toCollection(TreeSet::new));

for (DataPoints[] separate_dps : results) {
  for (DataPoints dps : separate_dps) {
    try {
      //filter empty dps, laudukang
      boolean isDpsOk = false;

      // Keep the series only if at least one point falls inside the query range.
      for (final DataPoint dp : dps) {
        if (dp.timestamp() >= data_query.startTime() &&
            dp.timestamp() <= data_query.endTime()) {
          isDpsOk = true;
          break;
        }
      }

      if (isDpsOk && limit > 0) {
        Number dpsSum = dps.sumDps();
        if (numberTreeSet.first().doubleValue() < dpsSum.doubleValue()) {
          // This series beats the current minimum: evict it and keep this sum.
          numberTreeSet.pollFirst();
          numberTreeSet.add(dpsSum.doubleValue());
        } else {
          //filter: the sum is below the running top-`limit` threshold
          isDpsOk = false;
        }
      }

      if (isDpsOk) {
        cb_chain.addCallback(new DPsResolver(dps));
      }
    } catch (Exception e) {
      throw new RuntimeException("Unexpected error durring resolution", e);
    }
  }
}

If the query API request carries a limit parameter, write the data to a ByteArrayOutputStream first (keeping changes to the original code small), deserialize the original response back out of it, then sort descending and write only the first limit entries to the OutputStream:

@SuppressWarnings("unchecked")
private void sortDps(long limit, ByteArrayOutputStream stream, OutputStream output) throws IOException {
  byte[] content = stream.toByteArray();
  TypeReference<List<Map<String, Object>>> typeReference =
      new TypeReference<List<Map<String, Object>>>() {
      };

  List<Map<String, Object>> dataList = JSON.parseToObject(content, typeReference);

  // Non-metric nodes (e.g. summary/stats entries) pass through untouched.
  List<Map<String, Object>> resultMapList = dataList.stream()
      .filter(m -> !m.containsKey("metric"))
      .collect(Collectors.toList());

  List<Map<String, Object>> metricNodeMapList = dataList.stream()
      .filter(m -> m.containsKey("metric"))
      .collect(Collectors.toList());

  //sum dps for sort; values may deserialize as Integer, Long or Double,
  //so go through Number rather than casting directly
  metricNodeMapList.forEach(m -> {
    double value = ((Map<String, Object>) m.get("dps")).values().stream()
        .mapToDouble(i -> ((Number) i).doubleValue())
        .sum();

    m.put("sumDps", value);
  });

  // Sort by the summed dps, descending, and keep only the first `limit` entries.
  Comparator<Map<String, Object>> dpsSorter =
      Comparator.comparingDouble(m -> (double) m.get("sumDps"));
  List<Map<String, Object>> metricLimitNodeMapList = metricNodeMapList.stream()
      .sorted(dpsSorter.reversed()).limit(limit)
      .collect(Collectors.toList());

  resultMapList.addAll(metricLimitNodeMapList);

  final JsonGenerator json = JSON.getFactory().createGenerator(output);
  json.writeObject(resultMapList);
}
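For orientation, the sketch below shows the interposition pattern described earlier: serialize into a buffer, then let sortDps trim and re-emit. The call site inside OpenTSDB's serializer is not shown in this post, so serializeDataPoints and the surrounding variables are hypothetical names for illustration:

// Hypothetical call-site pattern (names are illustrative, not OpenTSDB's):
// serialize into a buffer first, then let sortDps() trim and re-emit onto
// the real output stream when a limit is requested.
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
serializeDataPoints(buffer);       // hypothetical stand-in for the original serialization
if (limit > 0) {
  sortDps(limit, buffer, output);  // deserialize, sort descending, keep the first `limit`
} else {
  buffer.writeTo(output);          // no limit: pass the buffered bytes straight through
}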

Build and Test

Following OpenTSDB General Development, compile OpenTSDB and generate a Debian package; uninstall the official version, then install the patched one.

laudukang/opentsdb-okhttp-client was also modified to support the limit query parameter.

Tests show query latency drops noticeably and the response shrinks as well: no more multi-megabyte JSON payloads.
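As a usage sketch, such a limited query can also be issued with plain OkHttp as below. This is not the patched client's actual API; the host, port, and payload values are illustrative, and limit is the parameter added by this patch:

import okhttp3.MediaType;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.RequestBody;
import okhttp3.Response;

public class LimitQueryDemo {
  public static void main(String[] args) throws Exception {
    OkHttpClient client = new OkHttpClient();
    // "limit" is the patched parameter; everything else is stock /api/query JSON.
    String json = "{\"limit\":10,\"start\":1572998400000,\"end\":1573012800000,"
        + "\"queries\":[{\"aggregator\":\"sum\",\"downsample\":\"0all-sum\","
        + "\"metric\":\"1164.conn.total.in\","
        + "\"filters\":[{\"type\":\"wildcard\",\"tagk\":\"dst_ip\",\"filter\":\"*\",\"groupBy\":true}]}]}";
    Request request = new Request.Builder()
        .url("http://opentsdb-host:4242/api/query")  // illustrative host and port
        .post(RequestBody.create(MediaType.parse("application/json"), json))
        .build();
    try (Response response = client.newCall(request).execute()) {
      System.out.println(response.body().string());
    }
  }
}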

Query example
POST /api/query?summary

{
  "limit": 10,  //caps the number of entries returned, sorted descending
  "delete": false,
  "end": 1573012800000,
  "globalAnnotations": false,
  "msResolution": false,
  "noAnnotations": false,
  "queries": [
    {
      "aggregator": "sum",
      "downsample": "0all-sum",
      "explicitTags": false,
      "filters": [
        {
          "filter": "*",
          "groupBy": true,
          "tagk": "dst_ip",
          "type": "wildcard"
        },
        {
          "filter": "CONNECTION_OVERSEAS",
          "groupBy": false,
          "tagk": "service_type",
          "type": "literal_or"
        }
      ],
      "metric": "1164.conn.total.in",
      "preAggregate": false,
      "rate": false,
      "useMultiGets": false
    }
  ],
  "showQuery": false,
  "showStats": false,
  "showSummary": false,
  "showTSUIDs": false,
  "start": 1572998400000,
  "timezone": "UTC",
  "useCalendar": false
}

response

[
  {
    "metric": "1164.conn.total.in",
    "tags": {
      "protocol": "TCP",
      "service_type": "CONNECTION_OVERSEAS",
      "dst_ip": "107.167.89.234"
    },
    "aggregateTags": [
      "src_ip"
    ],
    "dps": {
      "157296960": 17588304652
    },
    "sumDps": 17588304652
  },
  {
    "metric": "1164.conn.total.in",
    "tags": {
      "src_ip": "240.16.1.109",
      "protocol": "TCP",
      "service_type": "CONNECTION_OVERSEAS",
      "dst_ip": "152.199.39.180"
    },
    "aggregateTags": [],
    "dps": {
      "157296960": 9804591660
    },
    "sumDps": 9804591660
  },

  //remaining 8 entries omitted...
]

A prebuilt package is provided:
Debian package: opentsdb-2.4.0_all.deb

Debian package: opentsdb-2.4.0_all.deb Backup Url

>md5sum opentsdb-2.4.0_all.deb
22c98200d548b04ef8048922c9886cd0 opentsdb-2.4.0_all.deb

Miscellaneous Notes

Install pssh to batch-configure the Spark slaves

wget https://storage.googleapis.com/google-code-archive-downloads/v2/code.google.coms/parallel-ssh/pssh-2.3.1.tar.gz

tar -xzvf pssh-2.3.1.tar.gz

cd pssh-2.3.1

python setup.py install

For usage, see:

System config in master

/work/hosts

root@192.168.4.210
root@192.168.4.211
root@192.168.4.212
root@192.168.4.213
root@192.168.4.214
root@192.168.4.215
root@192.168.4.216
root@192.168.4.217

/etc/hosts

127.0.0.1       localhost
127.0.1.1 ubuntu

# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

192.168.4.211 n1
192.168.4.212 n2
192.168.4.213 n3
192.168.4.214 n4
192.168.4.215 n5
192.168.4.216 n6
192.168.4.217 n7

/etc/environment

LANGUAGE="zh_CN:zh:en_US:en"
LANG=zh_CN.GBK
SPARK_HOME=/work/spark
SCALA_HOME=/opt/scala
JAVA_HOME=/opt/java
J2SDKDIR=/opt/java
J2REDIR=/opt/java/jre
DERBY_HOME=/opt/java/db
PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/opt/java/bin:/opt/scala/bin:/work/spark/bin"
CLASSPATH=..:/opt/java/lib:/opt/java/jre/lib:/opt/scala/lib:/work/spark/jars

/work/spark/conf/spark-env.sh,update ens3 to your network interface

export SPARK_LOCAL_IP=$(ifconfig ens3 | grep "inet addr:" | awk '{print $2}' | cut -c 6-)
export SPARK_MASTER_HOST=n1
export SPARK_MASTER_PORT=7077
export SPARK_MASTER_WEBUI_PORT=8000
export SPARK_HISTORY_OPTS="-Dspark.history.fs.logDirectory=/work/spark/logs-event"
#export SPARK_EXECUTOR_CORES=1
#export SPARK_EXECUTOR_MEMORY=512M
#export SPARK_WORKER_INSTANCES=4

update_hostname.sh, updates the default hostname ubuntu to n{last digit of the machine IP}

#!/usr/bin/env bash
NODE_LOCAL_IP=$(ifconfig ens3 | grep "inet addr:" | awk '{print $2}' | cut -c 6-)
# assumes addresses like 192.168.4.21X, so offset 12 keeps the last digit
NEW_HOSTNAME="n${NODE_LOCAL_IP:12}"

#echo $NEW_HOSTNAME > /proc/sys/kernel/hostname
#echo $NEW_HOSTNAME > /etc/hostname

sed -i 's/127.0.1.1.*/127.0.1.1\t'"$NEW_HOSTNAME"'/g' /etc/hosts
hostnamectl set-hostname $NEW_HOSTNAME

Init master and slaves in master

pssh -h hosts "apt update"
pssh -h hosts "apt install htop -y"

#zsh init
pssh -h hosts "apt install zsh -y"
pssh -h hosts 'sh -c "$(curl -fsSL https://raw.github.com/robbyrussell/oh-my-zsh/master/tools/install.sh)";'
pssh -h hosts chsh -s $(which zsh)

#update dns resolve
pscp -h hosts /etc/resolvconf/resolv.conf.d/base /etc/resolvconf/resolv.conf.d/base
pssh -h hosts "resolvconf -u"

#host update
pscp -h hosts /etc/hosts /etc

#create work directory
pssh -h hosts "mkdir /work"

#copy from master: 192.168.4.210
#please remove 192.168.4.210 in hosts

tar -zxvf jdk-8u172-linux-x64.tar.gz -C /opt
tar -zxvf scala-2.11.12.tgz -C /opt
mv /opt/jdk1.8.0_172 /opt/java
mv /opt/scala-2.11.12 /opt/scala

tar -zxvf spark-2.3.0-bin-hadoop2.7.tgz -C /work
mv /work/spark-2.3.0-bin-hadoop2.7 /work/spark

#copy java, scala
pscp -r -h hosts /opt/* /opt/
#copy spark
pscp -r -h hosts /work/spark /work

#update env variables
pscp -h hosts /etc/environment /etc
pssh -h hosts "source /etc/environment"

#update hostname
pscp -h hosts update_hostname.sh /tmp/
pssh -h hosts "/tmp/update_hostname.sh"

Hadoop config in master

Hadoop is introduced here so the slaves can pull the app's resources from Hadoop when starting it;

alternatively, you can copy the jar to the same path on every slave and launch from there;

we later found that on every app restart, Spark pulls the full set of resources from Hadoop into the spark/work directory again.

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/work/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://n0:9820</value>
  </property>
</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/work/hadoop/tmp/dfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/work/hadoop/tmp/dfs/datanode</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>n0:9870</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.permissions.superusergroup</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>hadoop</value>
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>n0:9868</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>n0:9864</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>n0:9866</value>
  </property>
  <property>
    <name>dfs.datanode.hostname</name>
    <value>n0</value>
  </property>
</configuration>

init hadoop

#create hadoop user in master
sudo useradd -m hadoop -s /bin/bash

#/work/hadoop/bin
./hdfs dfs -mkdir -p /sparkHistoryLogs
./hdfs dfs -mkdir -p /eventLogs
./hdfs dfs -mkdir -p /spark

#./hdfs dfs -rm -R /spark/app/*

copy_app_resouces_to_hadoop.sh, run as the hadoop user

#!/bin/bash
cd /work/hadoop/bin
#./hdfs dfs -mkdir -p /spark
./hdfs dfs -rm -R /spark/app/*
./hdfs dfs -copyFromLocal -f /work/spark/app/log4j.properties /spark/app
#spark config
./hdfs dfs -copyFromLocal -f /work/spark/app/default.conf /spark/app
#app dependencies
./hdfs dfs -copyFromLocal -f /work/spark/app/lib /spark/app
./hdfs dfs -copyFromLocal -f /work/spark/app/node-quality-streaming-0.0.1-SNAPSHOT.jar /spark/app

Exceptions and solutions

  • fix Spark worker WebUI hostname (logs) pointing at the master machine's hostname

    Reading the source, setting SPARK_LOCAL_HOSTNAME in spark-env.sh had no effect.

    Solution: after setting SPARK_PUBLIC_DNS, the jump links in the worker web UI work correctly.

    SPARK_PUBLIC_DNS feeding publicHostName, seriously, what a naming choice.

    As shown in the image below, the stdout links originally pointed to host n0, and n0 is the machine where the master runs:

    spark-worker-ui-hostname.png

    Source reference

    core/src/main/scala/org/apache/spark/ui/WebUI.scala

    protected val publicHostName = Option(conf.getenv("SPARK_PUBLIC_DNS")).getOrElse(
      conf.get(DRIVER_HOST_ADDRESS))

    /** Return the url of web interface. Only valid after bind(). */
    def webUrl: String = s"http://$publicHostName:$boundPort"

    core/src/main/scala/org/apache/spark/util/Utils.scala

    private lazy val localIpAddress: InetAddress = findLocalInetAddress()
    private var customHostname: Option[String] = sys.env.get("SPARK_LOCAL_HOSTNAME")

    /**
     * Get the local machine's URI.
     */
    def localHostNameForURI(): String = {
      customHostname.getOrElse(InetAddresses.toUriString(localIpAddress))
    }
  • spark ConsumerRecord NotSerializableException bug

    java.io.NotSerializableException: org.apache.kafka.clients.consumer.ConsumerRecord
    Serialization stack:
    - object not serializable (class: org.apache.kafka.clients.consumer.ConsumerRecord, value: ConsumerRecord(topic = hi2, partition = 4, offset = 385, CreateTime = 1526369397516, checksum = 2122851237, serialized key size = -1, serialized value size = 45, key = null, value = {"date":1526369397516,"message":"0hh2KcCH4j"}))
    - element of array (index: 0)
    - array (class [Lorg.apache.kafka.clients.consumer.ConsumerRecord;, size 125)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:393)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

    Solution

    set SparkConf

    sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
    sparkConf.set("spark.kryo.registrator", "me.codz.registrator.CunstomRegistrator");

    create CunstomRegistrator

    package me.codz.registrator;

    import com.esotericsoftware.kryo.Kryo;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.spark.serializer.KryoRegistrator;

    public class CunstomRegistrator implements KryoRegistrator {

      @Override
      public void registerClasses(Kryo kryo) {
        kryo.register(ConsumerRecord.class);
      }
    }
  • spark TaskContext.get() causes NullPointerException

    stream.foreachRDD((VoidFunction2<JavaRDD<ConsumerRecord<String, String>>, Time>) (v1, v2) -> {
      OffsetRange[] offsetRanges = ((HasOffsetRanges) v1.rdd()).offsetRanges();

      List<ConsumerRecord<String, String>> consumerRecordList = CollectionTools.emptyWrapper(v1.collect());
      consumerRecordList.forEach(consumerRecord -> {
        // collect() brings the records to the driver, where no task is running,
        // so TaskContext.get() returns null and the next line throws an NPE.
        TaskContext taskContext = TaskContext.get();
        int partitionId = taskContext.partitionId();
        OffsetRange o = offsetRanges[partitionId];

        //...
      });

    });

    Solution:
    use foreachPartition, which runs inside an executor task where TaskContext.get() is non-null

    v1.foreachPartition(consumerRecordIterator -> {
      while (consumerRecordIterator.hasNext()) {
        ConsumerRecord<String, String> consumerRecord = consumerRecordIterator.next();

        //...
      }

      // Runs on an executor inside a task, so TaskContext is available here.
      TaskContext taskContext = TaskContext.get();
      int partitionId = taskContext.partitionId();
      OffsetRange offsetRange = offsetRanges[partitionId];

      //...
    });
    });
  • Spark app hangs when connecting to the Kafka server by IP

    2018-06-07 18:40:16 [ForkJoinPool-1-worker-5] INFO :: [Consumer clientId=consumer-1, groupId=node-quality-streaming] Discovered group coordinator lau.cc:9092 (id: 2147483647 rack: null)
    2018-06-07 18:40:18 [ForkJoinPool-1-worker-5] INFO :: [Consumer clientId=consumer-1, groupId=node-quality-streaming] Group coordinator lau.cc:9092 (id: 2147483647 rack: null) is unavailable or invalid, will attempt rediscovery

    Solution

    vim kafka/config/server.properties

    #add the following line
    advertised.host.name=192.168.3.20

    Kafka's advertised.host.name has been DEPRECATED since 0.10.x; see the 0.10.0 broker configs:

    DEPRECATED: only used when `advertised.listeners` or `listeners` are not set. Use `advertised.listeners` instead. Hostname to publish to ZooKeeper for clients to use. In IaaS environments, this may need to be different from the interface to which the broker binds. If this is not set, it will use the value for `host.name` if configured. Otherwise it will use the value returned from java.net.InetAddress.getCanonicalHostName().

    so the following config also takes effect:

    listeners=PLAINTEXT://192.168.3.20:9092
    advertised.listeners=PLAINTEXT://192.168.3.20:9092
  • mark

    java.lang.OutOfMemoryError: unable to create new native thread

    Offsets out of range with no configured reset policy for partitions

Zookeeper cluster init

Download package from here

tar -zxvf zookeeper-3.4.12.tar.gz -C /work
mv zookeeper-3.4.12 zookeeper

cp conf/zoo_sample.cfg conf/zoo.cfg

vim conf/zoo.cfg

tickTime=2000
initLimit=10
syncLimit=5
dataDir=/work/zookeeper/data
clientPort=2181

server.0=192.168.4.210:2888:3888
server.1=192.168.4.211:2888:3888
server.2=192.168.4.212:2888:3888
server.3=192.168.4.213:2888:3888
server.4=192.168.4.214:2888:3888
server.5=192.168.4.215:2888:3888
server.6=192.168.4.216:2888:3888
server.7=192.168.4.217:2888:3888

bin/init_myid.sh, update ens3 to your network interface

#!/bin/bash
NODE_LOCAL_IP=$(ifconfig ens3 | grep "inet addr:" | awk '{print $2}' | cut -c 6-)
# zoo.cfg numbers the servers 0-7 while the host IPs end in 210-217, so offset by 210
echo $(( ${NODE_LOCAL_IP##*.} - 210 )) > /work/zookeeper/data/myid

init cluster myid

pssh -h /work/hosts "/work/zookeeper/bin/init_myid.sh"

start zookeeper cluster

pssh -h /work/hosts -o out "/work/zookeeper/bin/zkServer.sh start"

Reference Notes

The public key was correctly deployed to the remote machine, and /root/.ssh/authorized_keys contains it:

ssh-copy-id -i ~/.ssh/id_rsa.pub root@192.168.3.20

Permissions all checked out:

ls -al /root/.ssh
total 24
drwxr-xr-x 2 root root 4096 Nov 13 16:09 .
drwx------ 24 root root 4096 Feb 28 15:16 ..
-rw------- 1 root root 785 Oct 9 18:23 authorized_keys
-rw------- 1 root root 1679 Jul 29 2017 id_rsa
-rw-r--r-- 1 root root 395 Jul 29 2017 id_rsa.pub
-rw-r--r-- 1 root root 3548 Feb 7 18:22 known_hosts

Yet logging in still asked for a password; the key didn't take effect.

Finally, checking the log file revealed the crucial line: Authentication refused: bad ownership or modes for directory /root

vim /var/log/auth.log

Feb 28 15:00:45 ubuntu sshd[6209]: Connection closed by 192.168.3.30 port 11364 [preauth]
Feb 28 15:03:12 ubuntu sshd[6206]: Received signal 15; terminating.
Feb 28 15:03:12 ubuntu sshd[6217]: Server listening on 0.0.0.0 port 22.
Feb 28 15:03:12 ubuntu sshd[6217]: Server listening on :: port 22.
Feb 28 15:03:37 ubuntu sshd[6224]: Authentication refused: bad ownership or modes for directory /root
Feb 28 15:03:37 ubuntu sshd[6224]: message repeated 3 times: [ Authentication refused: bad ownership or modes for directory /root]

The permissions on /root were:

drwxr-xr-x   6  501 staff  4096 Feb 28 15:03 root/

F*ck, the system was cloned from a disk image; why did the template change the permissions on /root?!

In the end, changing /root's permissions to 700 (chmod 700 /root) fixed it completely!

Minio Introduction

Minio is an object storage server released under Apache License v2.0.

It is compatible with Amazon S3 cloud storage service.

It is best suited for storing unstructured data such as photos, videos, log files, backups and container / VM images.

Size of an object can range from a few KBs to a maximum of 5TB.

Minio server is light enough to be bundled with the application stack, similar to NodeJS, Redis and MySQL.

cd /tmp
#choose a suitable version from https://dl.minio.io/server/minio/release/
curl -O https://dl.minio.io/server/minio/release/linux-amd64/minio
sudo chmod +x minio
#directory where Minio's systemd startup script expects to find it
sudo mv minio /usr/local/bin

sudo useradd -r minio-user -s /sbin/nologin
sudo chown minio-user:minio-user /usr/local/bin/minio

#create the data storage directories
sudo mkdir -p /work/minio/data1 /work/minio/data2
sudo chown minio-user:minio-user /work/minio

#create the config directory
sudo mkdir /etc/minio
sudo chown minio-user:minio-user /etc/minio

#default config, modify your own MINIO_VOLUMES/MINIO_OPTS/MINIO_ACCESS_KEY/MINIO_SECRET_KEY
cat <<EOF > /etc/default/minio
# Remote node configuration.
MINIO_VOLUMES="http://192.168.3.21/work/minio/data1 http://192.168.3.21/work/minio/data2 http://192.168.3.23/work/minio/data1 http://192.168.3.23/work/minio/data2"

# Use if you want to run Minio on a custom port.
MINIO_OPTS="-C /etc/minio --address :9001"

# Access Key of the server.
MINIO_ACCESS_KEY=YOUR-ACCESS-KEY-HERE

# Secret key of the server.
MINIO_SECRET_KEY=YOUR-SECRET-KEY-HERE
EOF


cd /etc/systemd/system/
curl -O https://raw.githubusercontent.com/minio/minio-service/master/linux-systemd/distributed/minio.service
sudo chmod 755 minio.service
sudo systemctl daemon-reload
sudo systemctl enable minio
sudo systemctl start minio
sudo systemctl status minio

Nginx Configuration

Multiple Minio servers load balance

upstream minio_servers {
    server minio-server-1:9000;
    server minio-server-2:9000;
}

server {
    listen 80;
    server_name www.example.com;

    location / {
        proxy_set_header Host $http_host;
        proxy_pass http://minio_servers;
    }
}

SSL/TLS Termination

server {
    listen 80;
    server_name www.example.com;
    return 301 https://www.example.com$request_uri;
}

server {
    listen 443 ssl;
    server_name www.example.com;

    ssl_certificate www.example.com.crt;
    ssl_certificate_key www.example.com.key;
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_ciphers HIGH:!aNULL:!MD5;

    location / {
        proxy_set_header Host $http_host;
        proxy_pass http://localhost:9000;
    }
}

Nginx rate limiting

limit_req_zone $binary_remote_addr zone=my_req_limit:10m rate=10r/s;

server {
    # ...
    location /images/ {
        limit_req zone=my_req_limit burst=20;
        # ...
    }
}

Reference

{an excerpt from “Lisp: Good News, Bad News, How to Win Big.” [html]}

2.1 The Rise of Worse is Better

I and just about every designer of Common Lisp and CLOS has had extreme exposure to the MIT/Stanford style of design. The essence of this style can be captured by the phrase the right thing. To such a designer it is important to get all of the following characteristics right:

Simplicity – the design must be simple, both in implementation and interface. It is more important for the interface to be simple than the implementation.
Correctness – the design must be correct in all observable aspects. Incorrectness is simply not allowed.
Consistency – the design must not be inconsistent. A design is allowed to be slightly less simple and less complete to avoid inconsistency. Consistency is as important as correctness.
Completeness – the design must cover as many important situations as is practical. All reasonably expected cases must be covered. Simplicity is not allowed to overly reduce completeness.
I believe most people would agree that these are good characteristics. I will call the use of this philosophy of design the MIT approach. Common Lisp (with CLOS) and Scheme represent the MIT approach to design and implementation.

The worse-is-better philosophy is only slightly different:

Simplicity – the design must be simple, both in implementation and interface. It is more important for the implementation to be simple than the interface. Simplicity is the most important consideration in a design.
Correctness – the design must be correct in all observable aspects. It is slightly better to be simple than correct.
Consistency – the design must not be overly inconsistent. Consistency can be sacrificed for simplicity in some cases, but it is better to drop those parts of the design that deal with less common circumstances than to introduce either implementational complexity or inconsistency.
Completeness – the design must cover as many important situations as is practical. All reasonably expected cases should be covered. Completeness can be sacrificed in favor of any other quality. In fact, completeness must be sacrificed whenever implementation simplicity is jeopardized. Consistency can be sacrificed to achieve completeness if simplicity is retained; especially worthless is consistency of interface.
Early Unix and C are examples of the use of this school of design, and I will call the use of this design strategy the New Jersey approach. I have intentionally caricatured the worse-is-better philosophy to convince you that it is obviously a bad philosophy and that the New Jersey approach is a bad approach.

However, I believe that worse-is-better, even in its strawman form, has better survival characteristics than the-right-thing, and that the New Jersey approach when used for software is a better approach than the MIT approach.

Let me start out by retelling a story that shows that the MIT/New-Jersey distinction is valid and that proponents of each philosophy actually believe their philosophy is better.

Two famous people, one from MIT and another from Berkeley (but working on Unix) once met to discuss operating system issues. The person from MIT was knowledgeable about ITS (the MIT AI Lab operating system) and had been reading the Unix sources. He was interested in how Unix solved the PC loser-ing problem. The PC loser-ing problem occurs when a user program invokes a system routine to perform a lengthy operation that might have significant state, such as IO buffers. If an interrupt occurs during the operation, the state of the user program must be saved. Because the invocation of the system routine is usually a single instruction, the PC of the user program does not adequately capture the state of the process. The system routine must either back out or press forward. The right thing is to back out and restore the user program PC to the instruction that invoked the system routine so that resumption of the user program after the interrupt, for example, re-enters the system routine. It is called PC loser-ing because the PC is being coerced into loser mode, where loser is the affectionate name for user at MIT.

The MIT guy did not see any code that handled this case and asked the New Jersey guy how the problem was handled. The New Jersey guy said that the Unix folks were aware of the problem, but the solution was for the system routine to always finish, but sometimes an error code would be returned that signaled that the system routine had failed to complete its action. A correct user program, then, had to check the error code to determine whether to simply try the system routine again. The MIT guy did not like this solution because it was not the right thing.

The New Jersey guy said that the Unix solution was right because the design philosophy of Unix was simplicity and that the right thing was too complex. Besides, programmers could easily insert this extra test and loop. The MIT guy pointed out that the implementation was simple but the interface to the functionality was complex. The New Jersey guy said that the right tradeoff has been selected in Unix – namely, implementation simplicity was more important than interface simplicity.

The MIT guy then muttered that sometimes it takes a tough man to make a tender chicken, but the New Jersey guy didn’t understand (I’m not sure I do either).

Now I want to argue that worse-is-better is better. C is a programming language designed for writing Unix, and it was designed using the New Jersey approach. C is therefore a language for which it is easy to write a decent compiler, and it requires the programmer to write text that is easy for the compiler to interpret. Some have called C a fancy assembly language. Both early Unix and C compilers had simple structures, are easy to port, require few machine resources to run, and provide about 50%-80% of what you want from an operating system and programming language.

Half the computers that exist at any point are worse than median (smaller or slower). Unix and C work fine on them. The worse-is-better philosophy means that implementation simplicity has highest priority, which means Unix and C are easy to port on such machines. Therefore, one expects that if the 50% functionality Unix and C support is satisfactory, they will start to appear everywhere. And they have, haven’t they?

Unix and C are the ultimate computer viruses.

A further benefit of the worse-is-better philosophy is that the programmer is conditioned to sacrifice some safety, convenience, and hassle to get good performance and modest resource use. Programs written using the New Jersey approach will work well both in small machines and large ones, and the code will be portable because it is written on top of a virus.

It is important to remember that the initial virus has to be basically good. If so, the viral spread is assured as long as it is portable. Once the virus has spread, there will be pressure to improve it, possibly by increasing its functionality closer to 90%, but users have already been conditioned to accept worse than the right thing. Therefore, the worse-is-better software first will gain acceptance, second will condition its users to expect less, and third will be improved to a point that is almost the right thing. In concrete terms, even though Lisp compilers in 1987 were about as good as C compilers, there are many more compiler experts who want to make C compilers better than want to make Lisp compilers better.

The good news is that in 1995 we will have a good operating system and programming language; the bad news is that they will be Unix and C++.

There is a final benefit to worse-is-better. Because a New Jersey language and system are not really powerful enough to build complex monolithic software, large systems must be designed to reuse components. Therefore, a tradition of integration springs up.

How does the right thing stack up? There are two basic scenarios: the big complex system scenario and the diamond-like jewel scenario.

The big complex system scenario goes like this:

First, the right thing needs to be designed. Then its implementation needs to be designed. Finally it is implemented. Because it is the right thing, it has nearly 100% of desired functionality, and implementation simplicity was never a concern so it takes a long time to implement. It is large and complex. It requires complex tools to use properly. The last 20% takes 80% of the effort, and so the right thing takes a long time to get out, and it only runs satisfactorily on the most sophisticated hardware.

The diamond-like jewel scenario goes like this:

The right thing takes forever to design, but it is quite small at every point along the way. To implement it to run fast is either impossible or beyond the capabilities of most implementors.

The two scenarios correspond to Common Lisp and Scheme.

The first scenario is also the scenario for classic artificial intelligence software.

The right thing is frequently a monolithic piece of software, but for no reason other than that the right thing is often designed monolithically. That is, this characteristic is a happenstance.

The lesson to be learned from this is that it is often undesirable to go for the right thing first. It is better to get half of the right thing available so that it spreads like a virus. Once people are hooked on it, take the time to improve it to 90% of the right thing.

A wrong lesson is to take the parable literally and to conclude that C is the right vehicle for AI software. The 50% solution has to be basically right, and in this case it isn’t.

But, one can conclude only that the Lisp community needs to seriously rethink its position on Lisp design. I will say more about this later….

Reference

issue

Configuration file: /etc/hostapd/hostapd.conf
Using interface wlp4s0 with hwaddr 3c:33:00:f6:67:2b and ssid "Codz"
random: Cannot read from /dev/random: Resource temporarily unavailable
random: Only 0/20 bytes of strong random data available from /dev/random
random: Not enough entropy pool available for secure operations
WPA: Not enough entropy in random pool for secure operations - update keys later when the first station connects
wlp4s0: interface state UNINITIALIZED->ENABLED
wlp4s0: AP-ENABLED

solution

mv /dev/random /dev/random.orig
ln -s /dev/urandom /dev/random

reference:

Running the following search spits out a pile of errors:

root@vagrant:/tmp# find / -name "my.cnf"

issue

find: '/var/lib/lxcfs/cgroup/devices/devices.allow': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/devices.deny': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/init.scope/devices.allow': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/init.scope/devices.deny': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/devices.allow': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/devices.deny': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/vagrant.mount/devices.allow': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/vagrant.mount/devices.deny': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/mdadm.service/devices.allow': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/mdadm.service/devices.deny': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/vboxadd-x11.service/devices.allow': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/vboxadd-x11.service/devices.deny': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/ifup@enp0s8.service/devices.allow': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/ifup@enp0s8.service/devices.deny': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/boot.mount/devices.allow': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/boot.mount/devices.deny': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/dbus.service/devices.allow': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/dbus.service/devices.deny': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/var-lib-lxcfs.mount/devices.allow': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/var-lib-lxcfs.mount/devices.deny': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/cron.service/devices.allow': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/cron.service/devices.deny': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/run-user-0.mount/devices.allow': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/run-user-0.mount/devices.deny': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/lvm2-lvmetad.service/devices.allow': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/lvm2-lvmetad.service/devices.deny': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/run-lxcfs-controllers-cpu\x2ccpuacct.mount/devices.allow': Permission denied
find: '/var/lib/lxcfs/cgroup/devices/system.slice/run-lxcfs-controllers-cpu\x2ccpuacct.mount/devices.deny': Permission denied

Version info

root@vagrant:/tmp# cat /etc/*-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.1 LTS"
NAME="Ubuntu"
VERSION="16.04.1 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.1 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
UBUNTU_CODENAME=xenial

The bug was fixed in lxcfs (2.0.4-0ubuntu1~ubuntu16.04.1); from debian/changelog:

lxcfs (2.0.4-0ubuntu1~ubuntu16.04.1) xenial; urgency=medium


A temporary workaround:
sudo apt-get remove --auto-remove lxcfs
