JSS
Journal of Statistical Software
July 2011, Volume 43, Issue 4.
http://www.jstatsoft.org/
Analyzing Remote Sensing Data in R: The landsat
Package
Sarah C. Goslee
USDA-ARS Pasture Systems and Watershed Management Research Unit
Abstract
Research and development on atmospheric and topographic correction methods for
multispectral satellite data such as Landsat images has far outpaced the availability of
those methods in geographic information systems software. As Landsat and other data
become more widely available, demand for these improved correction methods will in-
crease. Open source R statistical software can help bridge the gap between research and
implementation. Sophisticated spatial data routines are already available, and the ease
of program development in R makes it straightforward to implement new correction al-
gorithms and to assess the results. Collecting radiometric, atmospheric, and topographic
correction routines into the landsat package will make them readily available for evaluation
for particular applications.
Keywords: atmospheric correction, Landsat, radiometric correction, R, remote sensing, satel-
lite, topographic correction.
1. Introduction
Satellite remote sensing data can provide a complement or even alternative to ground-based
research for large scale studies or over long periods. The Landsat platform is the premier
example: images have been collected continuously since the early 1970s. If both Landsat 5
and 7 are considered, every location within the United States is imaged every 8 days. Other
satellite platforms have become available more recently. Images from all platforms can be
used for mapping land cover, tracking land use change, and even estimating plant biomass.
Image characteristics vary from date to date. Some of this variation is due to solar elevation
angle and can be easily corrected for given date and time (radiometric calibration). Addi-
tional variation is caused by atmospheric conditions at the time of imaging: scatter at different
wavelengths due to haze. Atmospheric corrections are very complex. If one image every few
2
landsat: Analyzing Remote Sensing Data in R
years is analyzed, for example to track development patterns, atmospheric corrections may
be unnecessary because of the magnitude of the change in land use during that interval. If
researchers wish to take advantage of the temporal data density offered by Landsat, careful
correction is required. Otherwise differences in apparent reflectance due to atmospheric vari-
ation can swamp differences caused by actual change on the ground. In mountainous areas,
topographic corrections are also necessary to compensate for differences in incident radiation
due to slope and aspect. Without correction, the same ground cover on opposite sides of the
mountain could return a different signal.
Many procedures to implement atmospheric and topographic corrections have been proposed,
each with strengths and weaknesses, but few are readily available in geographic information
systems (GIS) or image processing software. It is impossible to compare methods to identify
the one most suited for a given application.
Indeed, it is not always clear exactly what
proprietary GIS software is doing. R offers an alternative to GIS software for research and
testing of satellite image processing algorithms. The interpreted nature of R makes it possible
to implement, test and modify algorithms easily. The available graphical and statistical tools
are vastly superior to anything available in common GIS packages, making it straightforward
to compare algorithms. Tools for processing and display of spatially-referenced data are
already available in R (R Development Core Team 2011). The sp (Bivand, Pebesma, and
Gomez-Rubio 2008) and rgdal (Keitt, Bivand, Pebesma, and Rowlingson 2011) packages
provide these capabilities for several commonly-used spatial formats.
The landsat package described here, and available from the Comprehensive R Archive Network
at http://CRAN.R-project.org/package=landsat, provides basic tools for working with
satellite imagery such as automated georeferencing and cloud detection. It contains functions
for radiometric normalization, and several different approaches to atmospheric correction.
Four topographic correction algorithms have been implemented. Other useful functions such
as bare soil line and tasseled cap calculations have been included. While these functions were
developed with Landsat data in mind, they are suitable for use with satellite imagery from
other platforms as long as appropriate calibration data are used.
2. Landsat platform characteristics
Landsat 5 (TM) was launched on 1984-03-01 and Landsat 7 (ETM+) on 1999-04-15. Each
revisits a location every 16 days, and the two orbits are staggered so an image is taken for
each location every 8 days. The Landsat Data Continuity Mission is planned to launch in
2011, and will record data in bands compatible with the TM and ETM+ instruments. Due
to a hardware failure on 2003-05-31, Landsat 7 scenes are now missing 22% of the pixels. The
problem is most severe near the edges of a image. Band characteristics are largely consistent,
but ETM+ added an additional band (Table 1).
The Landsat images have been converted to integer digital numbers (DN ) before distribution
to facilitate storage and display. They may require conversion to radiance or reflectance,
topographic correction or atmospheric correction. If using a single image, or images widely
separated in time to examine gross changes, minimal processing may be required. For detailed
comparison of vegetation indices from multiple images, however, careful correction is needed.
The most accurate atmospheric corrections require ground data taken during the satellite
overpass. For retrospective studies this is impossible to obtain, and less-accurate image-based
Journal of Statistical Software
3
Planned wavelength Actual wavelength Resolution
Landsat 5
Band 1 Blue
Band 2 Green
Band 3 Red
Band 4 Near infrared (NIR)
Band 5 Middle infrared (MIR)
Band 6 Thermal infrared (TIR)
Band 7 Middle infrared (SWIR)
0.45–0.52
0.52–0.60
0.63–0.69
0.76–0.90
1.55–1.75
10.40–12.50
2.08–2.35
0.452–0.518
0.528–0.609
0.626–0.693
0.776–0.904
1.567–1.784
10.45–12.42
2.097–2.349
Landsat 7
30
30
30
30
30
120
30
Planned wavelength Actual wavelength Resolution
Band 1 Blue
Band 2 Green
Band 3 Red
Band 4 Near infrared (NIR)
Band 5 Middle infrared (MIR)
Band 6 Thermal infrared (TIR)
Band 7 Middle infrared (SWIR)
Band 8 Panchromatic
0.45–0.52
0.52–0.60
0.63–0.69
0.77–0.90
1.55–1.75
10.40–12.50
2.09–2.35
0.52–0.90
0.452–0.514
0.519–0.601
0.631–0.692
0.772–0.898
1.547–1.748
10.31–12.36
2.065–2.346
0.515–0.896
30
30
30
30
30
60
30
15
Table 1: Band wavelengths (µm) and resolutions (m) for Landsat 5 Thematic Mapper (TM)
and 7 Enhanced Thematic Mapper (ETM+). Band 8 (ETM+ only) is higher-resolution visible
light data. Actual wavelengths are from Chander et al. (2009).
correction methods must be used. The following sections describe the procedures needed for
calibration of Landsat data and the R implementations of these algorithms in the landsat
package.
3. Sample data
The landsat package includes a 300 × 300 pixel subset of United States Geological Survey
(USGS) Landsat ETM+ images from two dates, 2002-07-20 and 2002-11-25 (US Geological
Survey 2010b). These images are in SpatialGridDataFrame format, and can be displayed
using image(). Even after clouds are removed, the July image has a higher mean DN and
greater dynamic range than the November image (Figure 1c, d). Complete metadata are
included in the help files. A USGS digital elevation model (DEM) covering the same area and
at the same resolution has been included (Figure 1f; US Geological Survey 2010a).
4. Tools
This package provides a few basic tools for working with Landsat images. The lssub()
function is an R interface to the image subsetting tools from the Geospatial Data Abstraction
Library (GDAL, GDAL Development Team 2011). This function is provided for convenience;
while the same effect can be obtained using the subsetting property of sp, it is considerably
faster to use the GDAL functions to remove a smaller section of a geotiff image. Landsat
images are usually distributed as geotiff files, and can be imported in SpatialGridDataFrame
4
landsat: Analyzing Remote Sensing Data in R
Figure 1: Band 4 (near infrared) from Landsat images from July and November 2002. The
July image has substantial cloud cover, as identified in the cloud mask produced by clouds().
November 2002 was cloud-free, so no mask is shown.
Journal of Statistical Software
5
format using readGDAL. The lssub() function preserves the geotiff format.
Image processing requires some information not often available in the metadata. The Earth-
Sun distance for a given date can be calculated with ESdist(). A date can be conveniently
converted to decimal format with ddist().
4.1. Automated georeferencing
A simple error-minimization routine can be used to provide relative georeferencing by match-
ing one image to a reference image (georef and geoshift by means of vertical and horizontal
shifts. If the reference image has been absolutely georeferenced, then all of the subsequent
images will also be spatially referenced. This function can find local minima, so results should
always be checked visually. The matching process only needs to be done once for each date;
Band 3 or Band 4 generally provide good results, but any band may be used. The results of
this step can then be used for each band image from that date. Sufficient padding must be
added around the edges of the image to accommodate the magnitude of the shift. The larger
the image area, the more effective the matching process. The below example illustrates the
procedure, but because of the small image sizes used for the demo data the shift coefficients
are unreliable.
R> july.shift <- georef(nov3, july3, maxdist = 50)
R> july1.corr <- geoshift(july1, padx = 10, pady = 10, july.shift$shiftx,
+
july.shift$shifty)
4.2. Topographic calculations
Topographic corrections require the use of slope and aspect data calculated from a DEM of
the same resolution as the satellite data. The slopeasp() function will calculate both given a
DEM. While all GIS software will provide topographic calculations, this function was included
for the convenience of being able to do all the processing within R and to allow exploration
of different algorithms. Most GIS software offers only a tiny subset of the methods that have
been proposed.
The most common method for slope and aspect calculations is the third-order finite difference
weighted by reciprocal of distance (Unwin 1981; Clarke and Lee 2007). This method is the
equivalent of using a 3 × 3 Sobel filter to determine the east-west slope and the north-south
slope. Given a cell Zi,j with east-west cell size EW res and north-south cell size NS res , slope
in either percent or degrees, and aspect with north as 0◦, east as 90◦ and south as 180◦ can
be calculated using Equation 1. A simple smoothing correction (dividing by a smoothing
parameter before taking the arctan) can reduce extreme slope values (Riano, Chuvieco, Salas,
and Aguado 2003).
EW = [(Zi+1,j+1 + 2Zi+1,j + Zi+1,j−1) − (Zi−1,j+1 + 2Zi−1,j + Zi−1,j−1)]/(8EW res )
NS = [(Zi+1,j+1 + 2Zi,j+1 + Zi−1,j+1) − (Zi+1,j−1 + 2Zi,j−1 + Zi−1,j−1)]/(8NS res )
θp = arctan
EW 2 + NS 2
slope% = 100 ·
EW 2 + NS 2
NS
φo = 180◦ − arctan
+ 90◦ EW
|EW |
EW
(1)
6
landsat: Analyzing Remote Sensing Data in R
5. Radiometric calibration
Landsat images are distributed as digital numbers, integer values from 0–255. While less
crucial now, this correction was originally necessary to make it possible to store, distribute
and portray these images efficiently. Radiometric calibration is a two-step process. First
the DN values are converted to at-satellite radiance using parameters provided in the image
metadata. Data on solar intensity are used to convert the at-satellite radiance to at-satellite
reflectance. These parameters are also in the image metadata.
5.1. At-sensor radiance
The first step in processing is to convert DN to at-sensor spectral radiance L, also called
top-of-atmosphere radiance. The conversion coefficients are available in the metadata accom-
panying the images. Whenever possible the metadata values should be used, as coefficients
vary by platform and over time, but standard coefficients are given in Table 2.
Coefficients are provided in one of three band-specific formats: gain 2 and offset; Grescale (also
called gain) and Brescale (bias); or radiances associated with minimum and maximum DN
values (Lmax and Lmin ). Any of the three can be used to convert from DN to at-sensor
radiance (Equation 2).
L =
DN − offset
gain 2
L = Grescale DN + Brescale
L = (
Lmax − Lmin
DN max − DN min
) · (DN − DN min ) + Lmin
(2)
It is not always clear which form of coefficients the metadata contain because “gain” has been
used to refer to both gain 2 and Grescale . Most recent image metadata provide Grescale and
Brescale , but these formulations are interconvertible (Equation 3). The magnitudes of Grescale
and gain 2 are similar for most bands, but the former is paired with Brescale , which will have
Grescale
0.778740
0.798819
Landsat 5
Grescale Brescale Grescale
0.765827 −2.29
0.671339 −2.19
1.448189 −4.29
1.322205 −4.16
1.043976 −2.21
0.876024 −2.39
0.120354 −0.49
1.18
0.055376
0.065551 −0.22
NA
Landsat 7 low gain Landsat 7 high gain
Brescale
Brescale
1.180709 −7.38
−6.98
for TM images taken before 1991-12-31
−7.20
1.209843 −7.61
for TM images taken before 1991-12-31
0.942520 −5.94
−5.62
0.969291 −6.07
−5.74
−1.13
0.191220 −1.19
0.067087 −0.07
3.16
0.066496 −0.42
−0.39
0.975597 −5.68
−5.34
0.621654
0.639764
0.126220
0.037205
0.043898
0.641732
NA
Band 1
Band 2
Band 3
Band 4
Band 5
Band 6
Band 7
Band 8
Table 2: Default gain (Grescale ; W m−2sr−1µm−1DN −1) and bias (Brescale ; W m−2sr−1µm−1)
for Landsat 5 (TM) and Landsat 7 (ETM) from Chander et al. (2009). The metadata will
state whether Landsat 7 images were taken at low gain or high gain.
Journal of Statistical Software
7
Landsat 5 Landsat 7
Band 1
Band 2
Band 3
Band 4
Band 5
Band 7
Band 8
1983
1796
1536
1031
220.0
83.44
NA
1997
1812
1533
1039
230.8
84.90
1362
Table 3: Extra-solar atmospheric constants (Esun ; W m−2µm−1) for Landsat 5 (TM) and 7
(ETM+) from Chander et al. (2009).
(Note: The Landsat 5 column contained errors in
previous versions of this manuscript, corrected on 2012-01-08.)
negative values for non-thermal bands, while gain 2 is paired with offset, which is positive for
non-thermal bands.
Lmax − Lmin
DN max − DN min
Grescale =
Grescale =
1
gain 2
Brescale = Lmin − Grescale DN min
Brescale =
−offset
gain 2
(3)
gain 2 =
offset =
1
Grescale
−Brescale
Grescale
All radiometric functions in landsat accept either gain and offset or Grescale and Brescale .
5.2. At-sensor reflectance
The at-sensor radiance values calculated using Equation 2 must be corrected for solar vari-
ability caused by annual changes in the Earth-Sun distance d, producing unitless at-sensor
(or top-of-atmosphere) reflectance ρAS (Equation 4).
πd2L
ρAS =
(4)
Esun is the band-specific exoatmospheric solar constant (Table 3; W m−2µm−1). The solar
zenith angle θz can be derived from the image metadata, where the solar elevation angle θs is
usually included; θz = 90◦−θs. At-sensor reflectance can be calculated using the radiocorr()
function with method = "apparentreflectance".
Esun cos θz
R> july4.ar <- radiocorr(july4, Grescale = 0.63725, Brescale = -5.1,
+
+
sunelev = 61.4, edist = ESdist("2002-07-20"), Esun = 1039,
method = "apparentreflectance")
8
landsat: Analyzing Remote Sensing Data in R
5.3. Thermal bands
Band 6 contains thermal infrared data. Landsat 7 offers two thermal bands, while Land-
sat 5 provides one. Instead of calculating top-of-atmosphere reflectance, these data can be
converted to temperature (K◦) using thermalband(). This function provides default coeffi-
cients, and requires only the DN data and the band number (6 for Landsat 5; 61 or 62 for
Landsat 7).
R> july61.thermal <- thermalband(july61, band = 61)
6. Cloud identification
Clouds are reflective (high) in Band 1 and cold (low) in Band 6, so the ratio of the two
bands is high over clouds (Martinuzzi, Gould, and Gonz´alez 2007). The absolute value of this
ratio must be adjusted for data type, whether reflectance, radiance, or DN . The clouds()
function will create a cloud mask (1 where clouds are present; NA where they are not) given
Band 1 and Band 6. The default parameters for the ratio level (level) and for adding a buffer
around the cloud edge (buffer) were adequate for the test data once converted to at-sensor
reflectance and temperature. This function can be used with DN data if the level argument
is adjusted appropriately. The mask does not demarcate areas of cloud shadow, but only the
clouds themselves (Figure 1e).
R> july.cloud <- clouds(july1.ar, july61.thermal)
R> nov.cloud <- clouds(nov1.ar, nov61.thermal)
7. Atmospheric correction
For most applications, ground reflectance is of greater interest than at-sensor reflectance so
atmospheric correction is required. Variation in atmospheric conditions at the time of over-
pass can overwhelm any changes in surface reflectance, so it is crucial to correct for these
differences.
If measured atmospheric data such as optical depth are available an accurate
correction can be applied, but these are rarely available for retrospective studies. Most com-
monly, an image-based method is used instead. Two categories of corrections are available.
Relative normalization methods match the spectral characteristics of each image to a refer-
ence image in such a way that each transformed image appears to have been taken using the
same sensor and with the same atmospheric conditions as the reference image. Functions are
available for relative atmospheric correction methods using the entire image or unchanging
subsets.
Absolute atmospheric correction methods rely on a mechanistic understanding of atmospheric
effects to adjust each image individually. Instead of correcting to a reference image, informa-
tion contained in part of an image, for instance the darkest areas, is extracted and used to
correct the rest of the image. This extracted information substitutes for measured parameters.
Three such methods have been included here.
7.1. Relative atmospheric correction using the entire image
Statistical correction methods are entirely empirical and do not consider physical principles