Ir a

Introduction to npphen in R

Authors: Matías Olea, José A. Lastra, Roberto O. Chávez (May 30, 2022)


The R package npphen has been developed primarily to enable land surface phenology reconstruction and extreme anomaly detection by using remote sensing time series. Nevertheless, npphen includes functions to analyze any type of numerical time series. Examples of remote sensing time series that can be tackled using npphen are differents time series (e.g. NDVI, EVI, Temperature, Precipitation, Soil Moisture) from diferents dataset as GIMMS project, the U.S. MODIS Program, Landsat Program the E.S.A. Sentinel 2 Program or the CR2MET products for chilean climate. Examples of numerical time series can be the NPP (Net Primary Productivity) from a FluxNet Tower or the GI (Green Index) from the Phenocams Networks.

In this document we want to facilitate the use of this package by demonstrating in detail what the individual functions do and how they can be applied. For this task we will use a demo dataset of 16-day MODIS Enhanced Vegetation Index (EVI) composites, though generally the package can be used to process a wide range of remotely-sensed products such as Landsat, MODIS, Sentinel-2 or AVHRR.

1 Set up your system

All necessary R packages (such as raster & rgdal) are automatically installed together with npphen by typing and running in an active R session:

> ?npphen #to get familiar with the avaible functions

2 Remote sensing data and data pre-procesing

The demo consists of 16-day MODIS EVI composites of deciduous a Nothofagus pumilio forest in Aysén, Southern Chile, were massive defoliation has been caused by outbreaks of the moth Ormiscodes amphimone in 2016. All in all we used 377 EVI images obtained from the Terra satellite (MOD13Q1 product) which have already been ’cleaned’ using the VI Quality Bands.

2.1 Read EVI data and create a multi-temporal raster object

First we list all TIFF-files in the folder and create a raster stack object. The raster stack now includes all the images, but it lacks the exact date at which each image was taken from. Therefore, we have to use the CSV-file with the dates of the images, convert the column date into a date format, and create a raster time series object.

evi.list <- list.files(path='YourFolder', pattern = glob2rx('*.tif'),full.names=T) <- stack(evi.list) #creates a raster stack from list files
dates.table <- read.csv('YourFolder/mod13q1_377_dates.csv')
date <- as.Date(dates.table$date, format='%d/%m/%Y')

3. Land surface phenology (LSP) and anomaly detection of numerical vectors

3.1 Working with numerical vectors (a single pixel): Phen

In order to understand how the npphen approach works and what the results look like, we first want to apply the algorithm on the EVI time series of one pixel. We can either use a known cell number, or extract a single pixel by a coordinate with the function cellFromXY(), and plot the VI values of each observation. We recommend to repeat the following steps with a couple of different pixels in the dataset, especially if working in a heterogeneous study area.

p <- cellFromXY(, c(282593, 4974144))
ts <- as.numeric([p])
plot(y=ts, x=date, type='l', main='Time series from a pixel', ylab='EVI', xlab='')

The function Phen() estimates the annual phenological cycle from a time series of vegetation greenness. It requires the time series as well as the observation dates as numerical vectors. The argument h indicates the geographic hemisphere and is used to define the starting date of the growing season. h=1 if the study area is in the Northern Hemisphere (season starting at January 1st), h=2 if it is in the Southern Hemisphere (season starting at July 1st). The argument frequency Defines the number of samples for the output phenology and must be one of the this: 'daily' giving output vector of length 365, '8-days' giving output vector of length 46 (i.e MOD13Q1 and MYD13Q1), 'monthly' giving output vector of length 12, 'bi-weekly' giving output vector of length 24 (i.e. GIMMS) or '16-days' (default) giving output vector of length 23 (i.e MOD13Q1 or MYD13Q1). rge is a vector containing the minimum and maximum values of the response variable used in the analysis. We suggest the use of theoretically based limits. For example, in the case of MODIS NDVI or EVI, the values range from 0 to 10,000, so rge=c(0,10000). The output is a numerical vector containing the expected value of the vegetation index for each dates which lengths its determined by the argument frequency. This is in fact the expected annual phenology of a standard growing season, reconstructed from all available historical observations.

PhenPix <- Phen(x=ts, dates=date, h=2, frequency ='16-days', rge=c(0, 10000))
plot(PhenPix, xlab='Days in phenological cycle', ylab='EVI')

3.2  Plot of the annual phenology and its RFD: PhenKplot

The function PhenKplot can be used to visualize the reference frequency distribution (RFD) of VI values at different days of the growing seasons. The dark red line is considered the reference VI phenological curve and is the output of the Phen function. This plot illustrates the LSP of the specific pixels in more detail than the simple plot function by also showing how the vegetation index is ranging through time.

PhenKplot(x=ts, dates=date, h=2, xlab='Days in phenological cycle', ylab='EVI', rge=c(0, 10000))

3.3 Anomaly detection using ExtremeAnom

When our goal is to not only reconstruct the land surface phenology but to map vegetation changes (e.g. the impact of insect infestations on the forest structure), we can use the function ExtremeAnoma(). It will first estimate the regular phenological cycle using a reference period (just as Phen does), and then calculate the anomalies of the observed phenology of a given date from the reference curve (check the red curve in PhenKplot). Therefore, the additional arguments refp and anop define the reference period and the anomaly calculation period, respectively. The new argument output defines the output values. 'both' (default) returns both anomalies and rfd position together as a single numeric vector, 'anomalies' returns only anomalies, 'rfd' returns only rfd values (how extreme the anomalies are) and 'clean' returns only extreme anomalies, i.e. anomalies at which a given rfd is overpass (e.g. 0.90). This critical threshold is set by the users using the rfd argument. And finally, rfd is an argument that only applies when the argument output = 'clean'. It defines the percentile (from 0 to 0.99) of the reference frequency distribution, for which anomalies are not flagged as extreme anomalies. For example, if rfd = 0.90 only anomalies falling outside the '0.90 rfd' (default value) will be flagged as extreme anomalies while the rest will be neglected (NA values).

# Example with all anomalies detected, output='both'
anom <- ExtremeAnom(x=ts, dates=date, h=2, refp=c(1:354),

                                       anop=c(355:377), rge=c(0, 10000), output='both', rfd=0.9)

barplot(anom[1:23],col=ifelse(anom[1:23]>0, 'green', 'red'), names='', main='Anomalies')
barplot(anom[24:46],col=ifelse(anom[24:46]>=90, 'red', 'grey'), names='', main='RFD', ylim=c(0, 100))

# Example with only extreme anomalies, output='clean'
exanom <- ExtremeAnom(x=ts, dates=date, h=2, refp=c(1:354),

                                       anop=c(355:377), rge=c(0, 10000), output='clean', rfd=0.9)

barplot(exanom[1:23],col=ifelse(exanom[1:23]>0, 'green', 'red'), names='', main='Anomalies')
axis(1, at=1:23, labels=names(exanom), las = 2, cex.axis = 0.6)

4. Working with time series raster stacks: PhenMap and PhenAnoMap

Now that we have tested an individual pixels (again, it is recommended to analyze several pixels of the study area) , we want to move on and spatially analyze the land surface phenology in our entire study region. The functions PhenMap() and ExtremeAnoMap() basically work in the same way as Phen() and ExtremeAnom() but reconstruct the land surface phenology of each pixel within a raster stack.

In addition to the parameters of the function Phen(), the argument nCluster allows to manipulate the number of CPU cores to be used for computational calculations. This can significantly reduce computation time, especially when working with large datasets.

The output is a raster stack with the same spatial dimensions as the input stack, and a time dimension equal to the number of dates per growing season of the original data. In our example, as the MOD13Q1 product consists of 23 yearly observations, the output raster has 23 bands.

4.1 Reconstructing LSP with PhenMap

When plotting the phenological cycle we can clearly observe how the vegetation cover increases during summer and decreases during winter.

PhenMap(, dates=date, h=2, frequency= '16-days', nCluster=4, outname='YourDir/Phen_trapa.tif',
              format='GTiff', datatype='INT2S', rge=c(0,10000)) <- stack('YourDir/Phen_trapa.tif')

# Plot all 23 dates of the phenological cycle
plot([[1:12]], zlim=c(0,7000))
plot([[13:23]], zlim=c(0,7000))



4.2 Anomaly detection using ExtremeAnoMap

ExtremeAnoMap()  corresponds to ExtremeAnom() and can be applied to a time-series raster. When mapping the EVI anomalies of the year 2016, we can see the huge impact of insect outbreaks causing wide-spread defoliation in the study region during late summer.

ExtremeAnoMap(, dates=date, h=2, refp=c(1:354), anop=c(355:377), rge=c(0, 10000),
                         output = 'both', rfd = 0.9, nCluster=4, outname='YourDir/Anom_trapa.tif',
                         format='GTiff', datatype='INT2S') <- stack('YourDir/Anom_trapa.tif')

# Plot all 23 dates of the phenological cycle
install.packages('RColorBrewer') # Allows use of different color palettes
plot([[1:12]], zlim=c(-3000,3000), col=brewer.pal(8, 'RdYlBu'))
plot([[13:23]], zlim=c(-3000,3000), col=brewer.pal(8, 'RdYlBu'))


# Plot all 23 RFD
plot([[24:35]], zlim=c(0,100), col=brewer.pal(9, 'BuPu'))
plot([[36:46]], zlim=c(0,100), col=brewer.pal(9, 'BuPu'))

# Now with the output=clean
ExtremeAnoMap(, dates=date, h=2, refp=c(1:354), anop=c(355:377), rge=c(0, 10000),
                         output = 'clean', rfd = 0.9, nCluster=4, outname='YourDir/Anom_trapa.tif',
                         format='GTiff', datatype='INT2S') <- stack('YourDir/Extreme_trapa.tif')

# Plot all 23 dates of the phenological cycle
plot([[1:12]], zlim=c(-3000,3000), col=brewer.pal(8, 'RdYlBu'))
plot([[13:23]], zlim=c(-3000,3000), col=brewer.pal(8, 'RdYlBu'))