Journal of Computer and Communications, 2019, 7, 65-71
http://www.scirp.org/journal/jcc
ISSN Online: 2327-5227
ISSN Print: 2327-5219
Comparison of Spatiotemporal Fusion Models
for Producing High Spatiotemporal Resolution
Normalized Difference Vegetation Index Time
Series Data Sets
Zhizhong Han1, Wenya Zhao2
1College of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China
2Chongqing Aerospace Poly Technic, Chongqing, China
How to cite this paper: Han, Z.Z. and
Zhao, W.Y. (2019) Comparison of Spati-
otemporal Fusion Models for Producing
High Spatiotemporal Resolution Norma-
lized Difference Vegetation Index Time
Series Data Sets. Journal of Computer and
Communications, 7, 65-71.
https://doi.org/10.4236/jcc.2019.77007
Received: May 15, 2019
Accepted: July 7, 2019
Published: July 10, 2019
Abstract
It has a great significance to combine multi-source with different spatial res-
olution and temporal resolution to produce high spatiotemporal resolution
Normalized Difference Vegetation Index (NDVI) time series data sets. In this
study, four spatiotemporal fusion models were analyzed and compared with
each other. The models included the spatial and temporal adaptive reflectance
model (STARFM), the enhanced spatial and temporal adaptive reflectance fu-
sion model (ESTARFM), the flexible spatiotemporal data fusion model
(FSDAF), and a spatiotemporal vegetation index image fusion model
(STVIFM). The objective of is to: 1) compare four fusion models using Land-
sat-MODIS NDVI image from the Banan district, Chongqing Province; 2)
analyze the prediction accuracy quantitatively and visually. Results indicate
that STVIFM would be more suitable to produce NDVI time series data sets.
Keywords
Spatiotemporal Fusion, NDVI, Time Series, STVIFM
1. Introduction
The Normalized Difference Vegetation Index (NDVI) is a widely used vegeta-
tion index (VI) and provides a way of evaluating the biophysical or biochemical
information related to vegetation growth [1]. Long term NDVI time-series data-
sets have been widely used for monitoring ecosystem dynamics to understand
the responses of climate change [2] [3]. However, due to financial and technical
DOI: 10.4236/jcc.2019.77007 Jul. 10, 2019
65
Journal of Computer and Communications
Z. Z. Han, W. Y. Zhao
constraints, it is difficult to obtain NDVI data with both high spatial and high
temporal resolution on the same remote sensing instrument [4]. In addition,
long periods of cloud cover problems in some regions have aggravated this mat-
ter [5]. Thus, spatiotemporal fusion techniques which combine NDVI date from
multi-sensors with high spatial and temporal resolution is feasible solution to
acquire remote sensing time series for monitoring surface vegetations dynamics
[6] [7].
Up to now, several spatiotemporal fusion models have been proposed. Gao et
al. [8] proposed a spatial and temporal adaptive reflectance fusion model
(STARFM) to blend MODIS and Landsat image to produce a synthetic surface
reflectance product at 30 m spatial resolution. Based the STARFM, Zhu et al. [9]
developed an enhanced spatial and temporal adaptive reflectance fusion model
(ESTARFM), introducing conversion coefficient between pixels and improving
the prediction accuracy. Zhu et al. [10] proposed the flexible spatiotemporal data
fusion model (FSDAF) which performs better in predicting abrupt land cover
changes. Liao et al. [11] developed a spatiotemporal vegetation index image fu-
sion model (STVIFM) to generate NDVI time series images with high spatial
and temporal resolution in heterogeneous regions. In this study, we made a
comparation between STARFM, ESTARFM, FSDAF, and STVIFM methods,
tested by Landsat and MODIS data acquired in same site and quantitatively as-
sess the accuracy of predicted image generated from each fusion model.
2. Materials and Methods
2.1. Study site and Data Preparation
In this study, a selected study area is shown in Figure 1, which located in Banan
District (29˚34'10''N, 106˚57'35''E) in Chongqing Province to perform the com-
parison between the spatiotemporal fusion models. We select MODIS daily sur-
face reflectance image and Landsat-8 image acquired for these dates during this
period: April 28, 2015, August 02, 2015, and October 21, 2015. All images are
pre-processed and calculated as NDVI data. Scene subset is shown in Figure 2.
DOI: 10.4236/jcc.2019.77007
Figure 1. Location of the study area.
66
Journal of Computer and Communications
Z. Z. Han, W. Y. Zhao
Figure 2. Landsat NDVI (upper row) and MODIS NDVI (lower row) images. From left
to right, they were acquired from April 28, 2015, August 02, 2015, and October 21, 2015,
respectively.
2.2. Selected Spatiotemporal Fusion Models
2.2.1. STARFM
The STARFM is based on the moving window technology, which requires at
least a pair of high-resolution image and coarse-resolution image on the base
time and one coarse-resolution image on the predicted time. By introducing a
weigh function using spectral difference, temporal difference and spatial differ-
ence to determining the contribution of other pixels in the window to the central
pixel. And then a synthetic high Spatiotemporal image (F(t2)) is predicted with
the high- and coarse-resolution data through the proposed weight function. This
model can be written as in Equation (1).
F t
( 2)
=
∑
Wi F t M t
( 1)
( 2)
+
(
−
M t
( 1))
(1)
where, F(t1) and M(t1) denote the high-and coarse resolution date on the base
date, M(t2) is the coarse resolution date at the predicted date, and Wi is the
weight function.
2.2.2. ESTARFM
The ESTARFM needs at least two pairs of high-resolution image and coarse res-
olution image on the base time and one coarse-resolution image on the pre-
dicted time. Compared with STARFM, this method not only considers the spa-
tial and spectral similarity between pixels, but also introduces a conversion coef-
ficient, which is derived from the high-and coarse-resolution data during the
observation period using a linear regression. The final high-resolution predic-
tion is computed as in Equation (2).
F t
( 2)
=
∑
Wi Vi F t M t
* (
( 1)
( 2)
+
−
M t
( 1))
(2)
DOI: 10.4236/jcc.2019.77007
67
Journal of Computer and Communications
Z. Z. Han, W. Y. Zhao
where, F(t1) and M(t1) denote the high-and coarse resolution data on the base
date, M(t2) is the coarse resolution data at the predicted date, and Wi, Vi denote
the weight function and conversion coefficient respectively.
2.2.3. FSDAF
The FSDAF using one pair of high-resolution image and coarse-resolution im-
age on the base time and one coarse-resolution image on the predicted time, and
it also need to use land cover map. This model integrates STARFM, the linear
unmixing method [12] and the thin plate spline (TPS) interpolator that main-
tains the land cover change signals and local variability, which combined the
temporal prediction from the linear unmixing method with the spatial predic-
tion obtained by the TPS and distribute the residual to fine pixel to get the final
prediction. It can be written as Equation (3).
F t
( 2)
=
F t
( 1)
+
∆∑
F
*
Wi
(3)
where, F(t1), F(t2) denote the high-resolution image on the base time and pre-
dicted time respectively. F∆ is referred to the change between t1 and t2, which
computed by the linear unmixing method and TPS. And Wi is the weight func-
tion.
2.2.4. STVIFM
The STVIFM requires two pairs of high- and coarse-resolution images acquired
on the base time and one coarse-resolution on the predicted date. On the one
hand, this model links the mean NDVI change of high-resolution pixels to mean
NDVI change of coarse resolution pixels within a moving window. On the other
hand, it also considers the difference in NDVI change rates at different growing
stages. And the final prediction can be written as Equation (4).
* NDVI
NDVI( 2) NDVI( 1)
(4)
∆∑
Wi
=
t
t
+
where, NDVI(t2), NDVI(t1) are the high-resolution date on the prediction time
and base time respectively. ΔNDVI denote the change between t1 and t2, which
calculated by this model. And the Wi is the weight function.
2.3. Assessing Prediction Accuracy
The model’s prediction performance is quantitatively evaluated by representative
metrics. And the r and RMSE (root mean squared errors) are used to measure
the difference between the predicted image and actual image. The formulations
of these metrics are as follows:
r
=
∑
∑
(
N
j
1
=
N
j
1
=
(
x
j
−
x
j
−
2
x
)
x y
)(
∑
N
j
=
−
y
)
j
(
y
j
2
−
y
)
(5)
2
DOI: 10.4236/jcc.2019.77007
RMSE
=
−
y
j
2
)
∑
N
j
=
1(
x
j
N
(6)
68
Journal of Computer and Communications
Z. Z. Han, W. Y. Zhao
where N is the total number of pixels in the predicted image, xj and yj are the
values of the jth pixel in the predicted image and the actual image respectively.
And x , y represent the mean gray values of the predicted image and the ac-
tual image respectively.
3. Result and Discussion
3.1. Prediction Performance
We use the August 02 Landsat NDVI image as validation source and use April
28 and October 21 to predict the August 02 image. Figure 3 shows the actual
NDVI image and predicted NDVI image by four spatiotemporal fusion models
on August 02, 2015. All the predicted NDVI images are consistent with the ac-
tual image from visual comparison, and water boundaries and clear land can be
predicted obviously, which demonstrate the practicality of these spatiotemporal
models.
3.2. Quantitative Assessment
Scatter plots in Figure 4 indicate the difference between the actual NDVI values
and the predicted NDVI values on August 02 2015. We can see that the pre-
dicted NDVI values by four spatiotemporal fusion models are all fall close to the
1:1 line, which show all four spatiotemporal fusion models can capture changes
in phenology. And the prediction of ESTARFM and STVIFM using one input
pair is relatively accurate than that of STARFM and FSDAF using two input
pairs, which because two input pairs can provide more spatial details.
To better assess the accuracy of predictions, the metrics r and RMSE were
calculated in Table 1. All four methods can get the change details to the base
date image to get the prediction. The accuracy of the predicted NDVI image us-
ing the STVIFM is the best (r = 0.864, RMSE = 0.1191) and a little better than
the accuracy of the predicted NDVI image using ESTARFM (r = 0.867, RMSE =
0.1247). The image predicted by STARFM (r = 0.804, RMSE = 0.1626) and
FSDAF (r = 0.810, RMSE = 0.1446) can also produce an accurate result, but
these two models got inaccurate predictions on some pixels (Figure 3(b), Figure
3(d)), which demonstrate the predictions using two input pairs is relatively
more accurate.
Table 1. Comparison of rand RMSE betweeen actual NDVI and predicted NDVI by using
STARFM, ESTARFM, FSDAF, and STVIFMmodelsin the study area on August 02 2015.
Models
STARFM
ESTARFM
FSDAF
STVIFM
r
0.804
0.867
0.810
0.864
RMSE
0.1626
0.1247
0.1446
0.1191
69
Journal of Computer and Communications
DOI: 10.4236/jcc.2019.77007
Z. Z. Han, W. Y. Zhao
Figure 3. (a) Actual Landsat-8 NDVI image; (b)-(d) are the predicted NDVI images of
STARFM, ESTARFM, FSDAF, and STVIFM respectively.
Figure 4. Scatter plots of the actual and predicted values for NDVI (darker areas indicate
high density, and the line is 1:1 line).
4. Conclusion
This study made a comparison between four spatiotemporal fusion models,
STARFM, ESTARFM, FSDAF, and STVIFM using high-and coarse-resolution
NDVI data, and quantitatively analyzed the performance of these models using r
and RMSE. For the results predicted by all four models, the r varied between
0.804 and 0.867 and the RMSE varied between 0.1191 and 0.1626, which showed
that all the selected models can produce reasonable predictions. And we found
that STVIFM can capture vegetation change and get the predicted results closed
to actual NDVI image than other three methods. In conclusion, the STVIFM is
more suitable for producing high spatiotemporal resolution NDVI time series,
especially for some vegetation with different growing period.
70
Journal of Computer and Communications
DOI: 10.4236/jcc.2019.77007
Z. Z. Han, W. Y. Zhao
Conflicts of Interest
The authors declare no conflicts of interest regarding the publication of this pa-
per.
References
[1] Busetto, L., Meroni, M. and Colombo, R. (2008) Combining Medium and Coarse
Spatial Resolution Satellite Data to Improve the Estimation of Sub-Pixel NDVI
Time Series. Remote Sens. Environ., 112, 118-131.
https://doi.org/10.1016/j.rse.2007.04.004
[2] Tewes, A., Thonfeld, F., Schmidt, M., Oomen, R., Zhu, X., Dubovyk, O., Menz, G.
and Schellberg, J. (2015) Using RapidEye and MODIS Data Fusion to Monitor Ve-
getation Dynamics in Semi-Arid Rangelands in South Africa. Remote Sens., 7,
6510-6534. https://doi.org/10.3390/rs70606510
[3] Bhandari, S., Phinn, S. and Gill, T. (2012) Preparing Landsat Image Time Series
(LITS) for Monitoring Changes in Vegetation Phenology in Queensland, Australia.
Remote Sens., 4, 1856-1886. https://doi.org/10.3390/rs4061856
[4] Gevaert, C.M. and García-Haro, F.J. (2015) A Comparison of STARFM and an
Unmixing Based Algorithm for Landsat and MODIS Data Fusion. Remote Sensing
of Environment, 156, 34-44. https://doi.org/10.1016/j.rse.2014.09.012
[5] Schmidt, M., Udelhoven, T., Gill, T. and Röder, A. (2012) Long Term Data Fusion
for a Dense Time Series Analysis with MODIS and Landsat Imagery in an Australi-
an Savanna. J. Appl. Remote Sens., 6, 63512. https://doi.org/10.1117/1.JRS.6.063512
[6] Fensholt, R. (2004) Earth Observation of Vegetation Status in the Sahelian and Su-
danian West Africa: Comparison of Terra MODIS and NOAA AVHRR Satellite
Data. Int. J. Remote Sens., 25, 1641-1659.
https://doi.org/10.1080/01431160310001598999
[7] Zurita-Milla, R., Clevers, J., van Gijsel, J. and Schaepman, M. (2011) Using MERIS
Fused Images for Classification Mapping and Vegetation Status Assessment in He-
terogeneous Landscapes. Int. J. Remote Sens., 32, 973-991.
https://doi.org/10.1080/01431160903505286
[8] Gao, F., Masek, J., Schwaller, M. and Hall, F. (2006) On the Blending of the MODIS
and Landsat ETM+ Surface Reflectance. IEEE Trans. Geosci. Remote Sens., 44,
2207-2218. https://doi.org/10.1109/TGRS.2006.872081
[9] Zhu, X., Chen, J., Gao, F., Chen, X. and Masek, J.G. (2010) An enhanced Spatial and
Temporal Adaptive Reflectance Fusion Model for Complex Heterogeneous Regions.
Remote Sens. Environ., 114, 2610-2623. https://doi.org/10.1016/j.rse.2010.05.032
[10] Zhu, X., Helmer, E.H., Gao, F., Liu, D., Chen, J. and Lefsky, M.A. (2016) A Flexible
Spatiotemporal Method for Fusing Satellite Images with Different Resolutions. Re-
mote Sens. Environ., 172, 165-177. https://doi.org/10.1016/j.rse.2015.11.016
[11] Liao, C., Wang, J., Pritchard, I., et al. (2017) A Spatio-Temporal Data Fusion Model
for Generating NDVI Time Series in Heterogeneous Regions. Remote Sensing, 9,
1125. https://doi.org/10.3390/rs9111125
[12] Zhukov, B., Oertel, D. and Lanzl, F. (1999) Unmixing-Based Multisensor Multire-
solution Image Fusion. IEEE Trans. Geosci. Remote Sens., 37, 1212-1226.
https://doi.org/10.1109/36.763276
71
Journal of Computer and Communications
DOI: 10.4236/jcc.2019.77007