Journal of Statistical Computation and Simulation
[SPECIAL ISSUE AsiaSim] A Sliding Window based Multi-stage Clustering and Probabilistic Forecasting Approach for Large Multivariate Time Series Data
Journal: Journal of Statistical Computation and Simulation
Manuscript ID: GSCS-2016-0670
Manuscript Type: Original Paper
Areas of Interest: SPECIAL ISSUE (ASIASIM), TIME SERIES
2010 Mathematics Subject Classification: 62H30
URL: http://mc.manuscriptcentral.com/gscs
A Sliding Window based Multi-stage Clustering and Probabilistic
Forecasting Approach for Large Multivariate Time Series Data
Abstract: Time series data analysis, such as temporal pattern recognition and trend forecasting, plays an increasingly significant part in temporal data statistics and analytics. Yet challenges remain in the efficiency of pattern extraction and trend prediction for large multivariate time series. This paper proposes a multi-stage clustering approach for multivariate time series using dynamic sliding time windows. The segmented multivariate time series are clustered separately in each time window to produce first-stage clustering centers, which are then used to generate second-stage clustering results involving all time windows. The method simplifies large-scale multivariate time series mining problems through multi-stage clustering on multiple sliding time windows, thus achieving improved efficiency. Based on the clustering outcomes, a correlation rule mining method is given to discover frequent patterns in the time series and generate self-correlation rules. The paper then presents a probabilistic forecasting model that leverages the extracted rules to make short-term predictions. Finally, experiments are presented to show the efficiency and effectiveness of the proposed approach.
Keywords: time series, statistics, multi-stage clustering, dynamic sliding time window, time series forecasting model
1. Introduction
A time series is a sequence of data points in successive time order, most commonly represented as a curve or line chart. Time series statistics and analysis aim at extracting meaningful statistical characteristics to identify the underlying patterns as well as to predict future events, and are widely used in signal processing, mathematical finance, earthquake prediction, smart control, and transport trajectory forecasting. With the rise of the Big Data era, time series mining and forecasting are playing an increasingly significant role in data statistics and analytics [1].
Time series statistics and analysis mainly concern two goals. One is discovering the nature of the phenomenon represented by the temporal data, and the other is making predictions of future values by building forecasting models. Time series pattern identification plays an essential part in achieving these goals. Most methods of time series mining involve similarity measurement, time series clustering, and association rule analysis. Time series pattern mining methods usually vary with the problem, since each involves temporal data with specific characteristics. Romani [2] proposed a time series mining method for identifying association rules between meteorological data and remote sensing information data. The method can extract the characteristic patterns of all kinds of data and generate association rules by training. Li [3] introduced an algorithm for mining feature patterns, which can identify the
co-evolutional sequences, with or without missing values, by exploiting smoothness and correlation. Yao [4] developed a mode-dependent stock mining method that can be used for pattern discovery in stock data and trend prediction. In the method, a time search model was used to find frequent patterns in stock data by converting the continuous time series data into a discrete space, and the model can make short-term trend forecasts of stock prices using the patterns. Novak [5] applied soft computing methods such as fuzzy logic and fuzzy transformation to time series data mining, used them to extract a variety of information, and finally integrated the results through natural language. Regarding research on time series clustering, Xi [6] presented a clustering method based on singular value decomposition (SVD), which can be used to address the problem of special structure in time series clustering. This method leverages SVD to obtain memory features of time series and clusters these features by canopy and K-means. Tamura [7] proposed a two-stage clustering method: in the first stage, the one-pass k-median++ method is used to obtain the initial clustering centers without iterating, and clustering is carried out in the second stage. Xu [8] reduced the dimensions of the feature space for ICU data and segmented the time series by time segmentation technology; this method was applied to the prediction of mortality. Wang [9] split large-scale time series into subsequences, each represented as a group of coarse-granularity fuzzy information. As a result, the original problem can be converted from the numeric level to the granular level, achieving more effective clustering.
Regarding time series forecasting models and methods, Ching-Hsue Cheng [10] built a time series model based on indicator selection by applying multivariate adaptive regression splines and stepwise regression; a forecasting model was then established by SVR and optimized by GA. Ching-Hsue Cheng [11] also proposed a rough set rule induction method, which was used to predict stock prices. Winita Sulandari [12] presented a hybrid time series model based on moving averages and weighted fuzzy methods, used to enhance the accuracy of time series trend forecasting. Bindu Garg [13] also established a predictive model based on fuzzy time series; the model focused on the influence of trend and seasonal components and was used to forecast trends in a fuzzy environment. Zhibo Zhu [14] introduced a stock trend analysis method based on feature simplification, which first utilizes the local least squares method to arrange the features and then selects among them; this method can be used for trend prediction of both micro stock data and the macro stock market.
Most of the above methods lay emphasis on extracting time sequence features and transforming continuous time series data into a combination of extracted features. This often leads to data loss in the calculation process, resulting in lower effectiveness in describing the underlying nature as well as lower efficiency in forecasting, especially for large multivariate time series. In addition, although clustering is widely used, pattern computing efficiency still faces challenges when dealing with large multivariate time series. This paper proposes a sliding window based multi-stage clustering approach for multivariate time series using dynamic sliding time windows (DSTW). The segmented multivariate time series are clustered separately in each time window to produce first-stage clustering centers, which are then used to generate second-stage clustering results involving all time windows. The method can simplify large-scale multivariate time series mining problems by multi-stage clustering on multiple sliding
time windows, thus achieving improved efficiency. Based on the clustering outcomes, a correlation rule mining method is given to discover frequent patterns in the time series and generate association rules. The paper then presents a probabilistic forecasting model that leverages the extracted rules to make short-term predictions. Finally, experiments are presented to show the efficiency and effectiveness of the proposed approach.
The rest of the paper is organized as follows. Section 2 presents the multi-stage clustering method using DSTW. Section 3 discusses the forecasting model based on rules obtained by clustering. Section 4 gives experiments that verify the proposed approach. Section 5 concludes the work.
2. Multi-Stage Clustering Method Using DSTW For Multivariate Time Series
In this section, we present the multi-stage clustering method based on DSTW for multivariate
time series data.
2.1. Fuzzy Centers Selection Method For Multivariate Time Series
We improve the clustering algorithm based on K-means [15] to adapt it to multivariate time
series.
The steps of K-means algorithm are described as follows.
(1) Select k elements from the data source as initial centers.
μ = {μ_1, μ_2, μ_3, ..., μ_k} (1)
(2) Calculate the distance between each element and the centers, then assign each element to the cluster with the minimum distance.
c(i) := arg min_j ||x(i) − μ_j||^2 [16] (2)
(3) Recalculate the mean value of each cluster.
μ_j := ( Σ_{i=1..m} 1{c(i) = j} · x(i) ) / ( Σ_{i=1..m} 1{c(i) = j} ) [17] (3)
(4) Calculate the standard measuring function; the algorithm terminates once a stopping condition is met, such as convergence of the centers, otherwise the process returns to step (2).
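The steps above can be sketched as follows. This is a minimal illustration of standard K-means on one-dimensional point data, not the paper's adapted algorithm (the adaptations for time-series curves are described next); the element values, cluster count, and convergence tolerance are illustrative choices.

```python
import random

def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    """Plain K-means following steps (1)-(4): pick k initial centers,
    assign each element to its nearest center (Eq. 2), recompute the
    cluster means (Eq. 3), and stop when the centers converge."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)                          # step (1)
    clusters = [[] for _ in range(k)]
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for x in points:                                     # step (2)
            j = min(range(k), key=lambda j: (x - centers[j]) ** 2)
            clusters[j].append(x)
        new_centers = [sum(c) / len(c) if c else centers[j]  # step (3)
                       for j, c in enumerate(clusters)]
        if all(abs(a - b) < tol for a, b in zip(centers, new_centers)):
            return new_centers, clusters                     # step (4)
        centers = new_centers
    return centers, clusters
```

On well-separated data such as `[1.0, 1.1, 0.9, 10.0, 10.2, 9.8]` with k = 2, the centers converge to the two group means regardless of the random initialization.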
As the clustering objects are multivariate time series, typically represented by a set of curves rather than points, we improve the algorithm by incorporating features of the multiple temporal curves to adapt it to multivariate time series mining. The improvements mainly include the following two aspects.
(1) Initial centers selection
Initial center selection has a crucial impact on the results of the algorithm. Random selection of initial centers is a common approach, but it usually leads to unsatisfactory clustering results. In addition, improper selection of initial centers typically results in superfluous iterations that cost too much computing time. One optimization strategy is to run the algorithm multiple times, choosing a different group of random centers at each run, and then select the clustering with the minimum
quadratic sum of variance. This strategy is simple, and the results depend on the number of clusters as well as the data set. Another strategy is to select each successive initial center as the element farthest from the initial centers already selected. This method guarantees that the selected initial centers are scattered, but outliers may be chosen.
We integrate the two optimization strategies above and make a further improvement. Figure 1 shows the improved process of initial center selection. The first center is selected randomly from the source data set and added to a special collection used for storing centers. Then, in the process of selecting successive centers, once an element is selected randomly, the distance between the element and each existing center is calculated. The element is confirmed as an initial center and added to the collection if the distance meets a predefined threshold; otherwise, the successive center selection continues until the collection is filled. The above process is run multiple times, and the clustering with the minimum quadratic sum of variance is selected.
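A sketch of this selection procedure follows. The function name, the distance threshold, and the number of runs are illustrative assumptions; in the paper's setting `dist` would be the DTW distance between time-series segments, and a threshold too large for the data could stall the candidate search.

```python
import random

def select_initial_centers(data, k, threshold, dist, runs=5, seed=0):
    """Pick the first center at random, then accept further random
    candidates only if they are at least `threshold` away from every
    center already chosen; repeat over several runs and keep the run
    whose centers give the minimum sum of squared distances."""
    rng = random.Random(seed)
    best, best_cost = None, float("inf")
    for _ in range(runs):
        centers = [rng.choice(data)]
        while len(centers) < k:
            cand = rng.choice(data)
            if all(dist(cand, c) >= threshold for c in centers):
                centers.append(cand)
        # score the run: sum of squared distances of every element
        # to its nearest selected center (quadratic sum of variance)
        cost = sum(min(dist(x, c) for c in centers) ** 2 for x in data)
        if cost < best_cost:
            best, best_cost = centers, cost
    return best
```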
Figure 1. Initial centers selection method
(2) Reselection of centers
Since common mathematical methods such as average calculation are not suitable for time
series data element clustering, we present a center reselection method for time series. The method, shown in Figure 2, assumes that if the initial centers were selected appropriately, with small computing error in the clustering process, the elements in a cluster should be relatively close to the current center. The method traverses the clusters that need center reselection twice, computing the distance between each element and the others. Once over 50% of an element's distances are within a preset threshold, the element is chosen as the new center. If no element meets the threshold condition, the initial center of this cluster is regarded as an improper choice, and a new element is selected randomly as the new center.
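The reselection rule can be sketched as below; the helper name is hypothetical, `dist` would again be the DTW distance in the paper's setting, and the majority condition follows the 50% rule in the text.

```python
import random

def reselect_center(cluster, threshold, dist, rng=random):
    """Return the first element whose distance to more than 50% of the
    other elements in the cluster is within `threshold`; if none
    qualifies, fall back to a randomly chosen element as the new center."""
    n = len(cluster)
    for i, cand in enumerate(cluster):
        close = sum(1 for j, other in enumerate(cluster)
                    if j != i and dist(cand, other) <= threshold)
        if close > 0.5 * (n - 1):
            return cand
    # no element satisfies the threshold: the initial center was an
    # improper choice, so pick a random element as the new center
    return rng.choice(cluster)
```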
Figure 2. Selection method of new center
2.2. First-stage Clustering Within Time Windows
Multivariate time series must be divided into segments before clustering is performed in time windows. Building on sliding time windows [18,19], dynamic sliding time windows allow the window size to vary dynamically through parameters configured during the segmentation process. Figure 3 shows the principle of DSTW. The key variables are the size and the sliding step length of the time windows used for data partitioning.
The size configuration of the time windows has an essential impact on computational efficiency. Because we use the dynamic time warping (DTW) distance [20, 21] to compute the distances between multivariate time series, it is not necessary to make segments of equal length; the overall trend of a segment is more important. Instead, the size of the time windows can be set within a range [22]. One important point is that too large a difference between segment lengths should be avoided, because the comparison would no longer be meaningful. In our method, the fluctuation range is set to 1-1.3 times the initial window size. In addition, the sliding step also has a critical influence on the clustering results: too small a step leads to redundant repetitive computation because of the overlap of cross-window time series data sets.
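The segmentation under DSTW can be sketched as follows. The 1-1.3 fluctuation factor follows the text; the fixed step length and the random choice of each window's actual size within the allowed range are illustrative assumptions, since the paper does not fix how the size is chosen per window.

```python
import random

def dstw_segments(series, base_size, step, max_factor=1.3, seed=0):
    """Cut a series into windows whose sizes vary between base_size and
    max_factor * base_size, sliding by `step` points each time.  Since
    segments are later compared with the DTW distance, their unequal
    lengths are acceptable; only the overall trend matters."""
    rng = random.Random(seed)
    segments, start = [], 0
    while start < len(series):
        size = rng.randint(base_size, int(base_size * max_factor))
        segments.append(series[start:start + size])
        start += step
    return segments
```

With a step equal to the base size, consecutive windows overlap only by the dynamic size surplus; a smaller step increases the overlap and hence the redundant computation noted above.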
Figure 3. Dynamic sliding time window
Once the parameters are configured, the first-stage within-window clustering is run. As shown in Figure 4, the within-window data of the segmented multivariate time series are first extracted and stored in memory. Then, for each collection of multivariate time series data, all segmented time series within the same time window are clustered. As a result, the method generates a number of independent within-window clusters that cover all time periods.
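This per-window pass can be expressed as a simple loop; the sketch below assumes `cluster_fn` stands for the improved K-means of Section 2.1 and returns a (centers, clusters) pair, which is how the steps earlier in this section are described.

```python
def first_stage(windows, cluster_fn, k):
    """Cluster each window's segments independently and collect the
    resulting (center, members) pairs: one clustering per time window,
    yielding independent within-window clusters covering all periods."""
    results = []
    for segs in windows:                 # one collection per time window
        centers, clusters = cluster_fn(segs, k)
        results.append(list(zip(centers, clusters)))
    return results
```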
Figure 4. Subsequence clustering in the same period
2.3. Second-stage Clustering Based on Centers
In the second stage, the small segmented clusters from the first stage are aggregated into larger clusters. First, the method gathers the clustering centers from the first-stage results as the original data set of the second stage. A second clustering is then run on this data set to generate clusters of the centers. Finally, the elements of each first-stage cluster whose center appears in a second-stage cluster are copied into that cluster to obtain the final clustering results.
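A sketch of this aggregation step, assuming (as above) that each first-stage result is a list of (center, members) pairs and that `cluster_fn` is again the improved K-means; the equality test on centers is adequate for this illustration, though a real implementation would track cluster identities explicitly.

```python
def second_stage(first_stage_results, cluster_fn, k):
    """Cluster the first-stage centers, then merge each first-stage
    cluster's members into the second-stage cluster its center fell in."""
    pairs = [p for window in first_stage_results for p in window]
    centers = [c for c, _ in pairs]
    _, center_clusters = cluster_fn(centers, k)   # cluster the centers
    final = []
    for group in center_clusters:                 # one group of centers each
        members = []
        for c, elems in pairs:
            if c in group:                        # copy the elements whose
                members.extend(elems)             # first-stage center is here
        final.append(members)
    return final
```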
Figure 5. Multi-stage clustering method based on DSTW
Figure 5 shows the overall process of the multi-stage clustering method. The method takes data from the Data Source Pool, which contains subsequences of all time periods. These data are organized by period, and the first-stage clustering is then executed on each data set by invoking the improved K-means algorithm from the Algorithm Pool. The results are temporarily saved in the Intermediate Result Storage Area, which serves the second-stage clustering. Center clustering is carried out on the centers extracted from the first-stage clusters to generate the center clusters. Finally, the first-stage clustering results in the Intermediate Result Storage Area are integrated into the corresponding clusters according to the centers, and the final clusters are stored in the Cluster Result Storage Area.
3. Forecasting Model Based on Rules from Multi-stage Clustering
3.1. Rule Base Establishment Based on Clustering Results
Self-correlation association rule mining for multivariate time series data is a non-standard process based on rule extraction from the clustering results. First, the method obtains the first result set (the clustering results after pruning). The "antecedents and consequents" are then drawn from the source data of the first result set to constitute a new data set, which is clustered to obtain the second result set. Finally, the derivation rules from the former to the latter are found after computing the confidence of each rule. The rules obtained in this part are of the form:
A → B (4)
confidence = (A, B)count / Acount (5)
Where A is the former part, B is the latter part, (A, B)count is the count of the simultaneous