Published Date
Remote Sensing of Environment
April 2016, Vol.176:188–201, doi:10.1016/j.rse.2016.01.015
Open Access, Creative Commons license
Author
Abstract
Forest inventory and monitoring programs are needed to provide timely, spatially complete (i.e. mapped), and verifiable information to support forest management, policy formulation, and reporting obligations. Satellite images, in particular data from the Landsat Thematic Mapper and Enhanced Thematic Mapper (TM/ETM +) sensors, are often integrated with field plots from forest inventory programs, leveraging the complete spatial coverage of imagery with detailed ecological information from a sample of plots to spatially model forest conditions and resources. However, in remote and unmanaged areas such as Canada's northern forests, financial and logistic constraints can severely limit the availability of inventory plot data. Additionally, Landsat spectral information has known limitations for characterizing vertical vegetation structure and biomass; while clouds, snow, and short growing seasons can limit development of large area image mosaics that are spectrally and phenologically consistent across space and time. In this study we predict and map forest structure and aboveground biomass over 37 million ha of forestland in Saskatchewan, Canada. We utilize lidar plots—observations of forest structure collected from airborne discrete-return lidar transects acquired in 2010—as a surrogate for traditional field and photo plots. Mapped explanatory data included Tasseled Cap indices and multi-temporal change metrics derived from Landsat TM/ETM + pixel-based image composites. Maps of forest structure and total aboveground biomass were created using a Random Forest (RF) implementation of Nearest Neighbor (NN) imputation. The imputation model had moderate to high plot-level accuracy across all forest attributes (R2 values of 0.42–0.69), as well as reasonable attribute predictions and error estimates (for example, canopy cover above 2 m on validation plots averaged 35.77%, with an RMSE of 13.45%, while unsystematic and systematic agreement coefficients (ACuns and ACsys) had values of 0.63 and 0.97 respectively). Additionally, forest attributes displayed consistent trends in relation to the time since and magnitude of wildfires, indicating model predictions captured the dominant ecological patterns and processes in these forests. Acknowledging methodological and conceptual challenges based upon the use of lidar plots from transects, this study demonstrates that using lidar plots and pixel compositing in imputation mapping can provide forest inventory and monitoring information for regions lacking ongoing or up-to-date field data collection programs.
Keywords
Lidar
Landsat
Pixel composites
Change metrics
Random forest
Imputation
For further details log on website :
http://www.sciencedirect.com/science/article/pii/S0034425716300165
Remote Sensing of Environment
April 2016, Vol.176:188–201, doi:10.1016/j.rse.2016.01.015
Open Access, Creative Commons license
Author
Received 9 July 2015. Revised 2 December 2015. Accepted 20 January 2016. Available online 5 February 2016.
Highlights
- •Imputation mapping traditionally integrates field plots with remotely sensed imagery.
- •Remote unmanaged areas often lack plot data to support imputation mapping of forests.
- •Lidar plots can be used as surrogates for traditional field inventory plots.
- •Lidar plots can be integrated with Landsat pixel composites and change metrics.
- •Maps of lidar metrics and derived attributes provide flexibility for future monitoring efforts.
Forest inventory and monitoring programs are needed to provide timely, spatially complete (i.e. mapped), and verifiable information to support forest management, policy formulation, and reporting obligations. Satellite images, in particular data from the Landsat Thematic Mapper and Enhanced Thematic Mapper (TM/ETM +) sensors, are often integrated with field plots from forest inventory programs, leveraging the complete spatial coverage of imagery with detailed ecological information from a sample of plots to spatially model forest conditions and resources. However, in remote and unmanaged areas such as Canada's northern forests, financial and logistic constraints can severely limit the availability of inventory plot data. Additionally, Landsat spectral information has known limitations for characterizing vertical vegetation structure and biomass; while clouds, snow, and short growing seasons can limit development of large area image mosaics that are spectrally and phenologically consistent across space and time. In this study we predict and map forest structure and aboveground biomass over 37 million ha of forestland in Saskatchewan, Canada. We utilize lidar plots—observations of forest structure collected from airborne discrete-return lidar transects acquired in 2010—as a surrogate for traditional field and photo plots. Mapped explanatory data included Tasseled Cap indices and multi-temporal change metrics derived from Landsat TM/ETM + pixel-based image composites. Maps of forest structure and total aboveground biomass were created using a Random Forest (RF) implementation of Nearest Neighbor (NN) imputation. The imputation model had moderate to high plot-level accuracy across all forest attributes (R2 values of 0.42–0.69), as well as reasonable attribute predictions and error estimates (for example, canopy cover above 2 m on validation plots averaged 35.77%, with an RMSE of 13.45%, while unsystematic and systematic agreement coefficients (ACuns and ACsys) had values of 0.63 and 0.97 respectively). Additionally, forest attributes displayed consistent trends in relation to the time since and magnitude of wildfires, indicating model predictions captured the dominant ecological patterns and processes in these forests. Acknowledging methodological and conceptual challenges based upon the use of lidar plots from transects, this study demonstrates that using lidar plots and pixel compositing in imputation mapping can provide forest inventory and monitoring information for regions lacking ongoing or up-to-date field data collection programs.
Keywords
1 Introduction
Forests cover approximately 31% of global land surface area (FAO, 2010), providing critical ecosystem services such as wood products, wildlife habitat, biodiversity, and regulation of the earth's biogeochemical cycles (Daily, G., 1997 and Millennium Ecosystem Assessment,, 2005). In Canada, forests cover over 400 million ha, representing more than 53% of Canada's land area and accounting for approximately 10% of global forest cover (Natural Resources Canada, 2014). Canada's forests make significant contributions to global bio-geochemical cycles and provide a wide array of other ecosystem services (Natural Resources Canada, 2014). Sustainable management and conservation of forests to maintain these services requires consideration of a wide array of ecological, economic, and societal values. To inform these needs, comprehensive inventory and monitoring systems are required to provide timely, spatially complete (i.e. mapped), and verifiable information on forest structure (i.e. canopy cover, stand height, and stem volume), biomass, and carbon pools.
Inventory and monitoring of forest conditions (i.e. structure, composition, biomass, and carbon) are often conducted by National Forest Inventory programs that rely upon plot-based field sampling (Tomppo et al., 2010). Measurements from field inventory plots include highly detailed information about forest vegetation composition and structure, from which sample-based estimates of forest conditions can be calculated. Information from inventory plots and other long-term plot networks can be used to develop and calibrate growth and yield equations (Smith, S.H., et al., 1984, Lessard, V.C., et al., 2001 and Lacerte, V., et al., 2006), and facilitate the calibration and validation of remotely sensed estimates of forest inventory attributes (Smith, W.B., 2002 and Wulder, M.A., et al., 2004). The re-measurement of permanent sample plots in a forest inventory cycle can also provide critical information for change monitoring (Poso, S., 2006, Woodbury, P.B., et al., 2007, Herold, M., et al., 2011 and Moeur, M., et al., 2011). Despite their widespread use, lack of spatial coverage and lengthy re-measurement intervals can limit the effectiveness of field plots in quantifying forest change, especially in remote and/or unmanaged forests that often have greatly reduced or non-existent field inventory data (Wulder et al., 2004). To accommodate the difficulty and expense in collecting ground plots, photo plots are often used as a substitute for field plots, as well as for stratification purposes in multi-phase plot-based inventory programs (Bechtold, W.A. and Patterson, P.L., 2005 and Gillis, M., et al., 2005). Photo plots provide an opportunity for implementation of a sample-based inventory upon similar statistical underpinnings as plot-based programs. Large area representation is possible with photo plots (Nielsen, U., et al., 1979 and Magnusssen, S. and Russo, G., 2012); however complete spatial coverage is lacking. Furthermore, establishing and measuring photo plots typically requires purpose collected imagery, which in combination with expert interpretation and the need for some level of supporting field plot data, can substantially increase costs in remote and unmanaged forests.
As a complementary approach to field and photo-based inventories, satellite imagery can provide spatially complete information about forests across large areas. Regional and global maps of forest cover and change over time have long been derived from multispectral satellite imagery (Woodcock, C.E., et al., 1994, Cohen, W.B., et al., 2001, Hansen, M.C., et al., 2003 and Hansen, M.C., et al., 2013. Hermosilla, Wulder, White, Coops, & Hobart, 2015a). In particular, Landsat TM/ETM + imagery is widely used for forest mapping because of its free and open data policy, global coverage, long temporal record, large scene-sizes, and spectral and spatial resolutions compatible with characterizing vegetation conditions and dynamics (Cohen, W.B. and Goward, S.N., 2004, Woodcock, C.E., et al., 2008, Wulder, M.A., et al., 2012a and Kennedy, R.E., et al., 2014). Regional and national forest inventory programs increasingly integrate satellite imagery with inventory plots, leveraging the detailed forest conditions provided by sampled field or photo plots with complete spatial coverage provided by satellite imagery to generate spatial predictions (e.g. maps) of forest conditions (Ohmann, J.L. and Gregory, M.J., 2002, Tomppo, E., et al., 2008, Wilson, B.T., et al., 2013 and Beaudoin, A., et al., 2014). As one possible approach, nearest neighbor (NN) imputation methods are widely used in plot/imagery integration. Imputation methods fill in observations that are missing for some records (Y-variables), using related variables that are available for all records (X-variables). In forest mapping applications, Y-variables are usually measures of forest composition or structure derived from a sample of field or photo plots, while mapped X-variables can include multispectral satellite imagery and other spatially complete datasets (i.e. climate, topography). Regression approaches predict new Y-variables when they are missing, but can distort marginal distributions and covariation between Y-variables. In contrast, imputation is a method for filling in missing data by substituting values from donor observations with the underlying assumption that two locations with similar values of X-variables should be similar with respect to Y-variables. A major strength of imputation approaches is these donor-based methods are multivariate, non-parametric, and distribution-free (Eskelson et al., 2009).
Common across large scale imputation mapping projects is the use of satellite imagery as explanatory variables (X-variables). The recent availability of cost-free Landsat images in a consistent, analysis-ready, and easy-to-use format has facilitated a conceptual shift in how Landsat imagery is used in ecosystem inventory, mapping, monitoring (Wulder, M.A., et al., 2012a and Kennedy, R.E., et al., 2014). Advances in pixel-based image composting and change detection using the Landsat time series (LTS) can be especially important for improving the accuracy of forest maps and partially overcoming passive optical imagery limitations. Pixel-based image compositing methods are applied to the Landsat archive to generate cloud-free, radiometrically and phenologically consistent image composites that are spatially contiguous over large areas (Roy, D.P., et al., 2010, Hansen, M.C. and Loveland, T.R., 2012, Griffiths, P., et al., 2013 and White, J., et al., 2014). LTS change detection methods provide pixel-level characterization of forest disturbance, recovery, and other trends (Masek, J.G., et al., 2008, Huang, C., et al., 2010, Kennedy, R.E., et al., 2010 and Hermosilla, T., et al., 2015a). Pixel-based image composites are invaluable for image/plot integration when minimization of year-to-year spectral variability and seamless multi-scene image mosaics are needed to relate to plot data collected across large spatial extents or multiple years (Ohmann et al., 2012). By quantifying disturbance, recovery, and trends, LTS change metrics can improve and partially overcome Landsat limitations in predicting forest vegetation structure (Lu, 2006), because they characterize temporal changes associated with forest processes of mortality, succession, and growth (Pflugmacher, D., et al., 2012 and Zald, H.S., et al., 2014), and facilitate predictions of forest biomass dynamics over time (Powell, S.L., et al., 2010, Main-Knorn, M., et al., 2013 and Pflugmacher, D., et al., 2014).
A more practical approach for large scale inventory in remote areas may be to improve maps of forest attributes using remotely sensed information on vegetation structure. Airborne light detection and ranging (lidar) can provide detailed three-dimensional structure of forest canopies, and has been widely used to characterize forest cover and structure (see reviews by Dubayah, R.O. and Drake, J.B., 2000, Lefsky, M.A., et al., 2002 and Reutebuch, S.E., et al., 2005), been integrated with plot-based samples of forest conditions to accurately map forest structure (Hudak, A.T., et al., 2008, Falkowski, M.J., et al., 2010 and Zald, H.S., et al., 2014), and used to update forest inventory data (Hilker, Wulder, & Coops, 2008). Declining costs have made lidar acquisitions possible for increasingly large areas; yet complete, single-year wall-to-wall lidar coverage for large areas is still costly and logistically prohibitive for many regional and national forest inventory programs. As a result, the use of lidar in mapping forest conditions is often constrained to sub regional extents (Hudak, A.T., et al., 2008, Falkowski, M.J., et al., 2010 and Zald, H.S., et al., 2014), or as a component of multi-phase sampling procedures in larger landscapes (Andersen, H.-E., et al., 2012 and Strunk, J.L., et al., 2014). Alternatively, sample based “lidar plots” may provide detailed, spatially discrete information about vegetation structure, similar to field plots in areas without sufficient field inventory data (Wulder et al., 2012b).
In this paper, we build upon the potential synergies of the Landsat times series and lidar observations to predictively map forest structure (i.e. canopy cover, stand height, basal area, etc.) and aboveground biomass, bringing together national lidar transect information and wall-to-wall surface reflectance composites developed from LTS imagery within the context of Canada's forest inventory and monitoring needs. Canada's National Forest Inventory (NFI) is a sample-based inventory designed to support many inventory and monitoring needs (Gillis et al., 2005). Extending inventory information to provide spatially explicit and complete information would allow for more accurate assessments, yet financial and logistical constraints limit the number of field and photo plots, particularly in northern forests (Wulder, M.A., et al., 2004 and Falkowski, M.J., et al., 2009). For example, establishing and measuring field inventory plots may be prohibitively expensive and logistically difficult in remote, unmanaged, forests that often lack road access. For unmanaged forest areas, photo plots will typically require purpose collected imagery that are outside of regular forest inventory responsibilities. The collection of aerial photography over remote regions, coupled with needs for expert delineation and interpretation can lead to substantial costs. Our primary objective was to demonstrate the integration of lidar plot data and image composites to predictively map forest structure and aboveground biomass in Canada's forests, using the Province of Saskatchewan as an example. We used detailed lidar plots— sampled observations of forest structure collected from lidar transects—as a surrogate for traditional field and photo plots (Wulder et al., 2012b). We then relate forest conditions from lidar plots to spectral conditions and multi-temporal change metrics from pixel-based image composites recently developed from Landsat TM/ETM + imagery for Saskatchewan (White, J., et al., 2014 and Hermosilla, T., et al., 2015a), and use nearest neighbor imputation to generate wall-to-wall maps of forest structure and aboveground biomass.
2 Methods
2.1 Study area
The study area spans 37,881,800 ha of forest and represents 58% of the total area of Saskatchewan (Fig. 1). The study area can be divided into the Taiga Shield, Boreal Shield, Boreal Plains, and Prairies ecozones (Ecological Stratification Working Group, 1996); with the latter being largely unforested. Vegetation in the northern areas is composed primarily of black spruce (Picea mariana) and jack pine (Pinus banksiana), while forests to the south are characterized by quaking aspen (Populus tremuloides), white spruce (Picea glauca), and eastern larch (Larix laricina). Fire is the primary natural disturbance agent (Kurz, W.A. and Apps, M.J., 1999 and Bond-Lamberty, B., et al., 2007), and in Saskatchewan follows a latitudinal gradient with a higher percentage of annual area burned in the largely unmanaged northern forests (Stocks et al., 2002). In the southern portions of the study area, fires are smaller and less frequent in part due to suppression efforts, while timber harvesting is ongoing in these managed areas (Wulder et al., 2004).
2.2 Lidar data and lidar plot sampling
Lidar data for Saskatchewan was collected in the summer of 2010 along three survey transects with a total length of approximately 1560 km (Fig. 1). These transects are part of a larger national lidar data acquisition of 34 survey transects of more than 25,000 km (Wulder et al., 2012b). Due to the remoteness of Canada's northern forests, survey flights were made between airports with suitable runways, fuel availability, and maintenance facilities. Discrete return lidar data was collected by fixed wing aircraft equipped with an Optech ALTM 3100 laser scanner. Desired survey specifications included a flying height of 1200 m above ground level, 70 kHz pulse rate, scan angle of ± 15° from nadir, and a nominal pulse density of 2.8 returns per m2. A variety of conditions (adverse weather, extreme terrain, smoke from wildfires, and restricted airspace) required deviations from desired survey specifications (see Wulder et al., 2012b for details).
Lidar survey plots were created by overlaying a 25 × 25 m grid over the lidar transect swaths (~ 800 m wide), generating over 1.2 million plots, of which 879,482 were forested based on the Landsat-derived land cover data from the Earth Observation for Sustainable Development of Forests map (Wulder et al., 2008, available online at http://tree.pfc.forestry.ca/). The 25 m grid for lidar plots was chosen to conform to best practices in application of lidar in support of forest inventory (White et al., 2013). Gridded vegetation metrics (Table 1) characterizing vegetation height, cover, and vertical distribution were calculated for each grid cell from classified lidar point cloud files using the publicly available software FUSION (McGaughey, 2013). In addition to these gridded metrics, additional forest structure variables and total aboveground biomass were predicted for each 25 m cell (Table 1). These predictions were calculated by applying existing and published regression models developed between lidar metrics, forest structure, and aboveground biomass on 201 plots located across Quebec, Ontario, and the Northwest Territories (Wulder et al., 2012b). Application of these regression models resulted in variable predictions, but no error terms or uncertainty in the regression models associated with each prediction.
Table 1. Response variables directly measured or derived from lidar.
Source | Variable | Description | Units |
---|---|---|---|
Directly measured from lidar | elev_mean | Mean vegetation height | m |
elev_sd | Standard deviation of vegetation height | m | |
elev_cv | Coefficient of variation of vegetation height | % | |
elev_p95 | 95th percentile of vegetation height | m | |
cover_2m | Percentage of first returns above 2 m | % | |
cover_mean | Percentage of first returns above mean vegetation height | % | |
Derived from lidar | lorey_height | Lorey's tree height | m |
basal_area | Basal area | m2 ha− 1 | |
stem_volume | Gross stem volume | m3 ha− 1 | |
total_biomass | Total aboveground biomass | kg− 1ha− 1 |
Note: Derived lidar attributes calculated from equations described by Wulder et al. (2012b) that describe relationships between NFI field plot data and lidar metrics. Basal area and gross stem volume were calculated for trees at 1.3 m above ground level. Total aboveground biomass includes estimates of foliage, branches, crown, bark, wood, and stem biomass.
The 25 m lidar survey plots were sub-sampled to develop training and validation plots for model development, imputation mapping, and accuracy assessment. Sampling occurred as a multi-step process to minimize fine-scale spatial autocorrelation, and avoid potential error and bias associated with higher scan angles at transect edges, small plot sizes, forest edge effects, and mixed structural conditions. A 250 m hexagon lattice was overlain on the lidar transects, and the central 25 m cell within each hexagon was selected as a plot center (Electronic Supplemental Fig. S1). Plot centers were then excluded if they were within 100 m of a lidar transect edge. Lidar plots were then summarized using a 3 × 3 window of 25 m cells (0.5625 ha.), because simulations and empirical research indicate relationships between lidar, field plots, and Landsat imagery decline when plots are smaller than 25 m radius (Frazer, G., et al., 2011 and Zald, H.S., et al., 2014). Lidar plots were excluded if they contained both disturbed and undisturbed conditions or were within 75 m of a disturbance edge, as determined from the 1984–2012 disturbance history for Saskatchewan (Hermosilla, Wulder, White, Coops, & Hobart, 2015b). Lastly, heterogeneous plots with multiple forest conditions were excluded if the coefficient of variation of vegetation height within plots was greater than 50%. These plots were removed because for heterogeneous plots, explanatory variables are an average of values representing often very different forest attributes, resulting in nearest neighbor predictions closer to the central tendency of the overall sample, over predicting for low values and under predicting for high values. These steps resulted in 4340 lidar plots, of which 75% were randomly sampled for use in model training, and the remaining 25% reserved for model validation. The mean, range, and standard deviation of all lidar metrics and derived structural attributes are presented in Electronic Supplemental Table S1. Summary statistics and frequency distributions were compared between all 879,482 25 m forested cells within the lidar transects and the 4340 lidar plots to ensure sampled lidar plots encompassed the full range of forest structural variability present in the original lidar transects (Electronic Supplemental Table S1, Electronic Supplemental Figs. S2–S3).
2.3 Mapped predictor variables and plot-level variable extraction
The primary predictor variables were spectral indices and change metrics derived from Landsat TM/ETM + imagery, as well as topographic variables (Table 2). Cloud-free, radiometrically and phenologically consistent, and spatially contiguous annual best available pixel (BAP) image composites (White, J., et al., 2014 and Hermosilla, T., et al., 2015a) provided surface reflectance data across the province. The forestland in Saskatchewan was represented by 50 scenes (path/rows) of the Landsat Worldwide Referencing System-2 (WRS-2). Candidate images from 1984 to 2012 were identified and downloaded from the USGS Landsat archive as Level-1 Terrain Corrected (L1T) products if they were acquired ± 30 days of August 1 (Julian day 213) and had less than 70% cloud cover. L1T images were pre-processed to remove clouds, cloud shadows, clear land, and water using Fmask version 2.1 (Zhu & Woodcock, 2012). Surface reflectance and opacity values were generated using LEDAPS 1.3.0 (Schmidt, Jenkerson, Masek, Vermote, & Gao, 2013). Each pixel in every candidate image was given a score based on the 1) sensor type, 2) day of year, 3) distance to cloud or cloud shadow, and 4) opacity. The scoring system originally used by White et al. (2014) was modified by downgrading the sensor score for Landsat 7 ETM + SLC-off images (from 0.5 to − 0.7), because geometric fidelity of the Landsat 7 instrument resulted in persistent gaps in the BAP composite if SLC-off data was used in the same area for consecutive years (Hermosilla et al., 2015a). Pixel scores were summed for all L1T candidate images to determine the image with the maximum score at each pixel, BAPs were identified and assigned, and annual BAP composites generated for 1984 to 2012.
Table 2. Mapped predictor variables from Landsat spectral indices, change metrics, and topography.
Type | Variable | Description |
---|---|---|
Spectral indices | TCB | Tasseled cap brightness |
TCG | Tasseled cap greenness | |
TCW | Tasseled cap wetness | |
TCA | Tasseled cap angle, following Powell, Cohen, Yang, Pierce, and Alberti (2008) | |
TCD | Tasseled cap distance, following Duane et al. (2010) | |
Change metrics | Prechange magnitude variation | Difference in normalized burn ration (NBR) values between breakpoints B and A |
Prechange persistence | Number of years between breakpoints B and A | |
Prechange evolution | Ratio of prechange magnitude variation to prechange persistence | |
Years since greatest change | Year in which great change breakpoint occurs, subtracted from imputation map year (2010) | |
Change persistence | Number of years between breakpoints B and C | |
Change magnitude variation | Difference in NBR values between breakpoints B and C | |
Change rate | Ratio of change magnitude variation to change persistence | |
Postchange magnitude variation | Difference in NBR values between breakpoints D and C | |
Postchange persistence | Number of years between breakpoints D and C | |
Postchange evolution | Ratio of postchange magnitude variation to postchange persistence | |
Topography | Elevation | Elevation in meters |
Slope | Slope in degrees | |
TSRI | Topographic solar radiation index, following Roberts and Cooper (1989) | |
TWI | Topogrpahic wetness index, following Beven and Kirkby (1979) |
Note: See Electronic Supplemental Fig. S4 for visual depiction of breakpoints used to generate change metrics.
High specificity in user-defined rules can result in data gaps in BAP composites (White et al., 2014). To fill these gaps in a manner that does not confound detection and quantification of changes over time, we used proxy BAP image composites (Hermosilla et al., 2015a) developed in conjunction with multi-temporal change metrics. Initially, outliers from the temporal trajectories for each pixel through the time series were removed using the filtering method described by Kennedy et al. (2010). Then, changes were detected using the Normalized Burn Ratio (NBR, Key & Benson, 2006) as the spectral index, and a variation of the bottom-up breakpoint detection algorithm (Keogh, E., et al., 2001 and Hermosilla, T., et al., 2015a), which divided the time series into temporal trends. Following the generation of breakpoints, a contextual analysis was conducted to improve the consistency and spatial homogeneity of change events, while change events smaller than 0.5 ha were removed. Breakpoints and trends were used to derive metrics that characterize the year, temporal persistence, and spectral magnitude of disturbances (as the difference in NBR values between breakpoints), as well as pre- and post-change conditions (Table 2, Electronic Supplemental Fig. S4). Finally, data gaps were filled with proxy spectral values computed using a piecewise linear interpolation guided by the temporal trends, resulting in seamless proxy BAP composites. From the proxy BAP composites for 2010, we calculated Tasseled Cap indices of brightness (TCB), greenness (TCG), and wetness (TCW) (Crist, E.P. and Cicone, R.C., 1984 and Kauth, R.J. and Thomas, G., 1976). We also calculated Tasseled Cap angle (TCA) as described by Powell et al. (2008) where TCA = arctan (TCG/TCB), and Tasseled Cap distance (TCD) as described by Duane et al. (2010), where . TCA describes the gradient of percent vegetation cover, whereas TCD is associated with vegetation composition and structure.
Topographic variables used as predictors included elevation, slope, topographic wetness index (TWI), and topographic solar radiation index (TSRI). The digital elevation model comes from the Canadian Digital Elevation Data (CDED, Natural Resources Canada, 1995). The CDED is based on the National Topographic Database (NTDB) digital files with scales ranging from 1:50,000 to 1:250,000. CDED grid spacing is based on geographic coordinates at a maximum and minimum resolution of 0.75 and 3 arc sec for the 1:50,000 scale, and 3 and 12 arc sec for the 1:250,000 scale respectively, depending on latitude. Slope (percent), TWI, and TSRI were calculated from the CDED. The TWI is a model of potential surface moisture, determined by topographic position as described by Beven and Kirkby (1979), where TWI = ln (specific catchment area/ tan (slope in radians)). TSRI is a transformed measure of aspect, calculated as described by Roberts and Cooper (1989), where TSRI = 1 - cos ((pi/180)(aspect - 30))/2. TSRI values range from 0 to 1, with a value of 0 indicating cool NE slopes, and 1 indicating warm SW slopes. Elevation, slope, TWI, and TSRI were reprojected and resampled to match the projection and finer 30 m resolution of Landsat pixel composites and change metrics. Mean values for all predictor variables (spectral indices, change metrics, and topographic variables) were extracted for each lidar plot using weighted mean values to account for the different sizes of LiDAR plots and Landsat pixels (Electronic Supplemental Fig. S1).
2.4 Random forest model and imputation mapping
Sampled response variables of forest structure were related to mapped predictor variables of spectral conditions, disturbance history, and topography using the random forest machine learning algorithm (RF), and predictions then were mapped across the study area using nearest neighbor (NN) imputation. RF builds on the functionality of single classification trees (or regression trees for continuous predictions) by extracting a single prediction from an ensemble of tree models (in this case we used 500 trees as a compromise between adequate number of trees while minimizing excessive computational time required for the large number of pixels to be mapped). Each individual classification tree within a RF model is built from a random subset of observations and explanatory variables (Breiman, 2001). There are many statistical methods for relating response and predictor variables in the context of multi-variate imputation mapping, including canonical correlation analysis (Moeur & Stage, 1995), canonical correspondence analysis (Ohmann & Gregory, 2002), and ensemble machine learning methods such as RF (Hudak, A.T., et al., 2008 and Henderson, E.B., et al., 2014). One advantage of using RF is its efficiency for high-dimensional data (Prasad, Iverson, & Liaw, 2006), which may be especially important for the hierarchical structure of predictor variables resulting from forest change occurring at different times, magnitudes, directions, and recovery rates.
We built RF models using the R-package ‘yaImpute’ (Crookston & Finley, 2008). The method implemented in this R-package amalgamates multiple random forest models, each tuned to a single response variable directly measured from lidar (mean, standard deviation, 95th percentile, and coefficient of variation of vegetation height, percent of first returns above 2 m, and percent of first returns above mean). Derived lidar metrics of forest structure and biomass were attached as ancillary variables to the pixel-level imputation predictions after mapping, meaning these variables are not modeled in RF, but rather joined to the map after plot ids are predicted for each pixel (sensu Zald et al., 2014). We evaluated the relative importance of each predictor variable in the RF model by calculating variable importance scores. For each RF tree the prediction error on the out-of-bag portion of the data is recorded. Then the same is done after permuting each predictor variable. The difference between the two are then averaged over all trees, and normalized by the standard deviation of the differences to generate variable importance scores. To generate spatial predictions (maps), RF chooses neighbor plots based on a non-Euclidean distance measure built from the nodes matrix of the amalgamated RF models. This nodes matrix holds a plot identifier for each terminus (or ‘leaf’) of each classification tree in the RF models. For new locations (map pixels), the terminal nodes where the pixel falls in the RF models are recorded. The nearest-neighbor (kNN = 1) plot for the new pixel is the most frequent plot within its set of nodes. Pixel predictions were made using the RF model and the predict function in the R-package ‘raster’ (Hijmans, 2014).
2.5 Map validation
From the imputation map, predictions of vegetation metrics and forest structure attributes were extracted for map pixels associated with the validation plots using the same methods used to extract values of mapped explanatory variables for training plots. Predictions were compared to observed values using a variety of accuracy measures. We calculated R2, root mean squared error (RMSE), and the normalized root mean squared error (nRMSE, RMSE divided by the range of observed values) for each response variable. We also quantified prediction bias and random error using the systematic and unsystematic agreement coefficients (ACsys and ACuns) following Ji and Gallo (2006). ACsys represents the difference between observed and predicted values that can be predicted by a simple linear model (bias from the 1:1 line), and ACuns represents differences which appear to be random. In calculating ACsys and ACuns, geometric mean functional relationships (GMFR) regression lines were used to describe the relationship between the observed and predicted datasets. Unlike ordinary least squares (OLS) regression, GMFR is a symmetric regression model that assumes both X and Y are subject to error.
We assessed the ability of the model to predict ecologically important gradients in forest structure (canopy cover and total aboveground biomass) associated with the magnitude of recent wildfires, as well as the magnitude and years since past wildfires. First, we used the Canadian National Fire Database (CNFDB, Canadian Forest Service, 2013a) to select five large fires (each > 30,000 ha, 177,238 ha total) that burned across Saskatchewan in 2008 (no large fires burned in 2009), and had little evidence (< 10% of area) of prior fire activity in the preceding century (Electronic Supplemental Fig. S4). Sample points within these selected fire perimeters were generated using a hexagon lattice with 500 m spacing using the function spsample in the R package ‘sp’ (Bivand, Pebesma, & Gomez-Rubio, 2013). From these sample points, 90 ∗ 90 m polygons were created corresponding to 3 ∗ 3 cell windows in the imputation map. Polygons were excluded if they contained non-forest land cover or positive change magnitude variation values, because non-forest land cover is beyond the study scope of inference, and positive change polygons are indicative of spatial errors in the fire perimeter database and/or unburned patches within the fire perimeters. Polygon exclusion resulted in 6403 polygons across the five fires. Boxplots were then generated to visualize trends and variation of predicted canopy cover and total aboveground biomass in relation to change magnitude variation.
We also assessed patterns of predicted canopy cover and total aboveground biomass in relation to change magnitude variation and years since fire (0–26 years, corresponding to fires from 1985–2010). Using the CNFDB, we selected all fires (> 500 ha.) that burned between 1985–2010, resulting in 873 fires totaling over 1.5 million ha (Electronic Supplemental Fig. S6). Using the same methods as described above, we sampled points within selected fire perimeters (although with larger 1000 m hexagon spacing), generated 90 ∗ 90 m sample polygons, and excluded polygons containing non-forest land cover and positive change magnitude variation values. We then binned sample polygons by year of fire (26 bins) and change magnitude variation (6 bins of values ranging from 0 to − 1.5 in 0.25 increments), calculated the average predicted cover and biomass values within each bin, and removed bins with fewer than 50 plots, resulting in 16,415 total plots. We then visualized trends in predicted canopy cover and biomass in relation to the years since fire and change magnitude variation using adjacency plots (Wickham, 2009).
3 Results
3.1 Importance of explanatory variables in random forest model
Overall, Tasseled Cap indices (including TC angle and TC distance) and elevation were the most important predictor variables related to our response variables (Fig. 2). The most important change metrics were change magnitude, post-change magnitude, years since greatest change, and post change evolution rate. The topographic variables slope and TWI were also moderately important in the model. Change persistence, pre-change evolution rate, pre-change magnitude, and TSRI had the lowest importance in the model.
3.2 Summary statistics and prediction accuracy on validation plots
Mean predicted values for all forest attributes closely followed mean observed values on validation plots, while the predicted range and standard deviation were slightly but consistently less than observed (Table 3). R2 values ranged from 0.42 to 0.69, with the highest values for canopy cover (percentage of first returns above 2 m and mean elevation), and the lowest values for vertical variability of vegetation height (elev_stddev and elev_cv). Values for nRMSE were consistently low, ranging from 0.08 to 0.14. ACsys values were consistently high (indicating low bias), ranging from 0.86 to 0.97, while ACuns values were more variable, ranging from 0.18 to 0.63, with less random error for predictions of canopy cover, and greater random error for predictions of vertical variability of vegetation height. For all forest attributes, the model over and under predicts for low and high observed values, respectively (Fig. 3). This bias is most noticeable for derived variables (basal area, gross stem volume, total aboveground biomass), and least noticeable for canopy cover.
Table 3. Summary and accuracy statistics for predicted forest structure variables on 1085 validation plots.
Observed | Predicted | Accuracy metrics | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
Variable | Units | Mean | Range | Stddev | Mean | Range | Stddev | R2 | RMSE | nRMSE | ACuns | ACsys |
elev_mean | m | 5.65 | 2.28–18.82 | 2.65 | 5.54 | 2.55–17.33 | 1.97 | 0.52 | 1.84 | 0.11 | 0.36 | 0.90 |
elev_stddev | m | 1.85 | 0.29–8.19 | 1.10 | 1.78 | 0.42–6.11 | 0.79 | 0.48 | 0.80 | 0.10 | 0.30 | 0.86 |
elev_p95 | % | 8.72 | 2.78–26.39 | 4.01 | 8.52 | 3.29–19.09 | 3.00 | 0.54 | 2.74 | 0.12 | 0.40 | 0.90 |
elev_cv | m | 0.32 | 0.11–0.85 | 0.08 | 0.31 | 0.13–0.87 | 0.06 | 0.42 | 0.06 | 0.08 | 0.18 | 0.90 |
cover_2m | % | 36.35 | 1.46–97.08 | 24.10 | 35.77 | 2.53–94.4 | 20.68 | 0.69 | 13.45 | 0.14 | 0.63 | 0.97 |
cover_mean | % | 17.86 | 0.64–55.28 | 13.02 | 17.56 | 0.87–53.18 | 11.03 | 0.68 | 7.34 | 0.13 | 0.62 | 0.97 |
loreys_height | m | 9.77 | 4.37–22.18 | 3.15 | 9.62 | 4.93–17.52 | 2.40 | 0.55 | 2.11 | 0.12 | 0.43 | 0.91 |
basal_area | m2/ha | 11.04 | 1.6–48.50 | 7.92 | 10.66 | 2.49–36.20 | 6.00 | 0.57 | 5.20 | 0.11 | 0.45 | 0.91 |
stem_volume | m2/ha | 39.69 | 3.26–231.42 | 36.53 | 37.81 | 5.63–158.36 | 26.67 | 0.53 | 24.99 | 0.11 | 0.39 | 0.88 |
total_biomass | kg/ha | 71048.81 | 4869.12–521957.37 | 76988.35 | 67172.29 | 8358.31–343407.77 | 54871.18 | 0.50 | 54607.22 | 0.11 | 0.31 | 0.86 |
Note: stdev = standard deviation, RMSE = root mean squared error, nRMSE = normalized root mean squared error, ACuns = unsystematic agreement coefficient and ACsys = systematic agreement coefficient.
Qualitative analysis of imputation maps also provides a useful assessment of map attributes at multiple spatial scales. Imputation maps of forest structure and total aboveground biomass display clear latitudinal trends at the provincial scale, with higher productivity and forest cover in the southern boreal forest, while the legacy of past wildfires is clearly identifiable as large areas with reduced cover, height, and biomass (Fig. 4, Electronic Supplemental Figs. S7–S8). At much finer scales, the same imputation maps are spatially consistent with wildfire perimeters and intra-fire variability, while undisturbed areas display fine grained patterns likely caused by topographic and edaphic factors.
3.3 Predicted canopy cover and total aboveground biomass in relation to past wildfires
Predicted forest canopy cover and total aboveground biomass displayed consistent trends in relation to magnitude change in selected wildfires that burned in 2008 (Fig. 5). High fire severity resulted in consistently low predictions of canopy cover. As change magnitude values approach zero (fire severity declined), both the mean and variability of predicted canopy cover increased. Total aboveground biomass also displayed increased values with lower severity in selected 2008 wildfires, although changes in both mean values and variability of biomass were smaller than those displayed by canopy cover.
Predicted canopy cover and total aboveground biomass also displayed trends in relation to years since wildfire, although these trends covaried with change magnitude, while also being less consistent than found on the selected wildfires that burned in 2008 (Fig. 6). Predicted canopy cover was generally higher with increased years since fire, but the relationship was nonlinear, with variable predictions 16–25 years since fire. Predicted canopy cover increased as change magnitude values approached zero (decreasing fire severity) for more recent fires, but this relationship was not apparent 16–25 years since fire. Predictions of total aboveground biomass were higher with lower fire severity, and this pattern was mostly consistent for 0–25 year old fires. However, biomass predictions showed little trend 0–9 years since fire, after which predicted biomass declined in forests that had experienced high fire severity, and did not appear to decline over time in lower severity fires.
4 Discussion
Imputation approaches are widely used for estimating and mapping forest attributes, but the use of lidar as a surrogate for traditional inventory plots is a new conceptual approach to mapping forest conditions that has seen little application (although see Ahmed, Franklin, Wulder, & White, 2015). While satellite imagery is widely integrated with inventory plots to map forest conditions, pixel composites and change metrics have only recently become available for large areas, so their use as explanatory variables in mapping forest conditions is less common (Pflugmacher, D., et al., 2014and Zald, H.S., et al., 2014). A key finding of this study is that lidar plots can be used as a surrogate for traditional field inventory plots for imputation mapping applications. Below we discuss the ability of our mapping approach to predict forest structure and biomass and spatially characterize key ecological patterns and processes. We also discuss the value of pixel composites and change metrics derived from Landsat TM/ETM + imagery in the context of imputation mapping. While lidar plots have great potential in forest mapping applications, there are important limitations and uncertainty we discuss that may be inherent to lidar plots, the transect sampling utilized in this study, and imputation procedures.
4.1 Statistical and ecological validity of imputation maps
When combined with spectral and change information from Landsat time series imagery, plot-level predictions of many forest attributes have comparable or superior accuracy to other regional-scale imputation mapping projects that use traditional field plots and satellite imagery. For example, our estimates of basal area were equal or more accurate (R2 = 0.57) than imputation maps generated in Oregon and California using traditional inventory plots and Landsat imagery without change metrics (R2 = 0.16–59) (Ohmann, J.L. and Gregory, M.J., 2002, Pierce, K.B., et al., 2009 and Zald, H.S., et al., 2014). Likewise, our estimates of total aboveground biomass have plot-level accuracy comparable to recent imputation mapping of Canada's forest using photo plots and moderate resolution MODIS imagery (Beaudoin et al., 2014). These comparisons to other studies are partially confounded by different statistical methods, inventory protocols, and forest types; yet they indicate lidar plots can be used at least as accurately as traditional field and photo inventory plots to predict and map forest structure and total aboveground biomass.
In addition to plot-level accuracy assessments, our spatial predictions of forest attributes appear to be ecologically realistic, quantifying dominant ecological patterns and processes. In the ecozones containing most of Saskatchewan's forests (Tiaga Shield, Boreal Shield, Boreal Plains), fire is the dominant disturbance agent shaping ecosystem dynamics (Kurz, W.A. and Apps, M.J., 1999 and Bond-Lamberty, B., et al., 2007). We found that in recent fires, predicted canopy cover and total aboveground biomass were negatively associated with increased fire severity, consistent with known relationships of these forest attributes to burn severity (Campbell, J., et al., 2007 and Miller, J.D., et al., 2009). Predicted total aboveground biomass was less sensitive than canopy cover to fire severity, which should be expected since trees have lower combustion factors compared to litter and soil, resulting in tree biomass largely remaining in forests via transfer from living to dead biomass pools (i.e. snags and logs), and a larger portion of pyrogenic emissions to the atmosphere coming from combusted litter and soil (Campbell et al., 2007). Trends in predicted canopy cover and total aboveground biomass were more variable across 25 years of fires, but were relatively consistent with post-fire vegetation recovery. For example, the relationship of canopy cover and burn severity is much stronger in the initial years after wildfire, but this relationship weakens over time as forests recover, consistent with post-fire spectral trajectories of forest recovery over time (Goetz, Fiske, & Bunn, 2006). More difficult to explain are predictions of total aboveground biomass that do not decline until approximately eight years after wildfires, and then are variable but have largely stopped declining 15 years after fire. These trends may result from how post-fire tree mortality, snag fall, and vegetation recovery dynamics are reflected in lidar metrics such as canopy cover and mean vegetation height used to derive biomass estimates (Bolton, Coops, & Wulder, 2015). Dead trees (snags) can persist for many years after fire (Russell, R.E., et al., 2006, Angers, V.A., et al., 2010 and Dunn, C.J. and Bailey, J.D., 2012), resulting in lidar metrics of canopy height remaining elevated until the majority of snags within a pixel fall or experience significant breakage. Later in post-fire recovery, snagfall-driven reductions in stand height may be partially offset by increases in canopy cover and vegetation height associated with tree regeneration and growth. The stability of lidar vegetation metrics may be consistent with temporal lags of tree mortality and snag fall in post-fire forest dynamics, but they may also represent a source of error where the model is unable to distinguish living trees and snags with similar stand heights, even with inclusion of Landsat-derived disturbance history.
4.2 Importance of pixel composites and change metrics in imputation mapping
Pixel composites and change metrics derived from Landsat TM/ETM + imagery were an important component of the imputation method used in this study. Spatially complete, cloud-free, and radiometrically consistent surface reflectance image composites support the development of spatially complete and consistent predictions of forest attributes, regardless of the statistical methods or plot data used. The opening of the Landsat archive and development of best available pixel (BAP) compositing methods have greatly improved the development of cloud-free image mosaics with the holdings for Canada being spatially and temporally comprehensive (White & Wulder, 2013); notwithstanding the favorable coverage of present data, gaps remain on an annual basis necessitating the development and use of proxy image composites (White, J., et al., 2014 and Hermosilla, T., et al., 2015a). With the launch of Landsat 8 in 2013, Landsat OLI imagery is acquired in a denser and more globally consistent manner than Landsat TM/ETM + (Roy et al., 2014), which should greatly increase the density of cloud-free scenes. Furthermore, the successful launch of the Sentinel 2 mission by the European Space Agency, with similar spatial and spectral characteristics to Landsat (Drusch et al., 2012), will greatly increase the temporal density of imagery available in the future. Despite current and future improvements with Landsat 8 and Sentinel 2, pixel compositing will continue to be necessary for forest inventory and monitoring applications for both historical and future imagery, not only in Canada, but also for other boreal and tropical regions with persistent cloud cover, snow cover, and short phenologic windows (Asner, G.P., 2001 and Potapov, P., et al., 2011).
Previous studies have demonstrated the benefits of incorporating multi-temporal change metrics from Landsat imagery to improve predictions of forest structure and biomass (Pflugmacher, D., et al., 2012, Pflugmacher, D., et al., 2014, Zald, H.S., et al., 2014 and Ahmed, O.S., et al., 2015). However, scaling up from these smaller proof of concept studies has been limited by the availability of change metrics at regional and national scales. We leveraged the recent availability of change metrics and proxy pixel composites for Saskatchewan (Hermosilla, T., et al., 2015a and Hermosilla, T., et al., 2015b). Landsat pixel compositing and multi-temporal change metrics are an area of active research and product development. The specific image products used in this study are not nationally or globally available, yet the emergence of similar national and global image composites and change metrics (Roy, D.P., et al., 2010 and Hansen, M.C., et al., 2013) suggests multi-temporal change mapping will be increasingly available to support regional and national forest inventory programs. In addition to multi-temporal change detection, imputation mapping of forest conditions may greatly benefit from recent advances in change attribution (Kennedy, R.E., et al., 2015 and Hermosilla, T., et al., 2015b). Change attribution may improve prediction accuracy by distinguishing pixel-level spectral conditions, year of change, and magnitude of change that can be very similar for different disturbance agents (for example harvesting versus wildfire), yet these disturbances can result in very different forest structure and biomass. In northern Saskatchewan, change attribution may be less important because fire is the dominant disturbance agent. However, the increased importance of other disturbance agents such as harvesting may lower prediction accuracy in disturbed forests in southern Saskatchewan, and this issue may be especially important in other forest regions with multi-agent disturbance regimes.
4.3 Limitations of lidar plots in imputation mapping
A key component of this study is the use of lidar plots as surrogates for traditional field plots within the imputation technique. This novel element also poses conceptual and methodological limitations for accuracy assessment, uncertainty analysis, and large area estimation of forest resources. The most obvious limitation is the inability to map species composition, because lidar plots lack tree observations that include species identification, and this limitation also applies to any variable that is only observable with field plot data. Imputation mapping relies on relationships between sampled response and mapped predictor variables, constraining predictions to values in the plot sample, so that predictions cannot exceed biological reality as determined from the sample dataset. However, this assumes the sample dataset adequately contains the variables of interest, and represents the range of variation in these variables that are unmeasured in the larger study area. Plot-level accuracy assessments used a validation set of sampled lidar plots spatially constrained within lidar transects. Sampled lidar plots cover the range and frequency distribution of values in the larger lidar transects (Electronic Supplemental Table S2, Electronic Supplemental Figs. S2–S3), but meaningful plot-level validation of predicted forest attributes beyond the lidar transects was precluded by the scarcity of available inventory field plots in the study area, as Provincial and industry inventory data exist, but were not available at the time of this study. Our mean predictions of aboveground biomass for the Boreal Shield (56.47 tons/ha) and Boreal Plains (104.30 tons/ha) ecozones are reasonable when compared to provincial-level estimates derived from National Forest Inventory plots (e.g., 75.6 tons/ha for Boreal Shield and 77.7 tons/ha for Boreal Plains) (Canadian Forest Service, 2013b), but these comparisons have significant caveats. NFI-based biomass estimates incorporate all plot-data for entire ecozones, which extend beyond the boundaries of Saskatchewan and our study area. Moreover, NFI plot density is low for these ecozones (290 for the Boreal Shield, 116 for the Boreal Plains), and entirely lacking for the Taiga Shield and Prairies ecozones. For the two ecozones with NFI data, our maximum biomass estimates are 56 and 59% higher than NFI estimates. However, natural and anthropogenic disturbances prevent most forests from reaching their maximum potential biomass, posing problems for establishing upper bounds using geographically systematic inventory plots (Smithwick, Harmon, Remillard, Acker, & Franklin, 2002), and this issue is compounded by low plot sample numbers in Canada's unmanaged northern forests.
Plot-level accuracy assessments did find a consistent bias, over predicting all attributes at low observed values and under predicting at high values. Under prediction for high observed values is also found in studies relating traditional field inventory plots and space borne lidar to Landsat spectral conditions (Ohmann, J.L. and Gregory, M.J., 2002 and Duncanson, L. I., et al., 2010), consistent with saturation of Landsat reflectance data in forests with high biomass and leaf area index. Prediction biases may also be caused to varying degrees by the lidar data, lidar plot sampling procedures, and nearest neighbor methodology. Lidar data within a 25 m pixel was only used if the percent of first returns above 2 m was greater than zero, resulting in exclusion of lidar data that may represent forest conditions found in canopy gaps, forest openings, and recent harvests. Heterogeneous plots with multiple forest conditions were excluded, but this reduced the frequency of lidar plots with very low canopy cover (Electronic Supplemental Fig. S3). Finally, the nearest neighbor imputation method itself may contribute to prediction bias. Lidar plots occupy a multi-dimensional cloud within the feature space of the explanatory variables, with longer nearest neighbor distances towards the edges of the plot cloud. Therefore, pixels towards the edges of the feature space are more likely to have larger nearest neighbor distances and their nearest neighbor plots are more likely to be towards the centroid of the feature space, resulting in over and under predictions at the low and high ranges, respectively. While beyond the scope of this study, these biases may be partially corrected by approaches that stratify plot selection during sampling or imputation procedures (McRoberts, R.E. and Tomppo, E.O., 2007 and Eskelson, B.N., et al., 2008). However, using the plots themselves to derive strata can potentially violate the assumption that plots are a random sample of a stratum, this can be a particular concern when using nearest neighbor imputation techniques (McRoberts & Tomppo, 2007), requiring other independent data sources for stratification purposes.
In addition to limitations for plot-level accuracy assessment, spatially clustered lidar plots from transacts may require new conceptual and methodological approaches to spatial accuracy assessment, area estimation, and uncertainty analysis. Geographically systematic sampling underlies assessment protocols for continuous geospatial datasets such as imputation maps of forest conditions (Riemann, Wilson, Lister, & Parks, 2010), yet the lidar transects we use are not a spatially systematic probability sample. Additionally, derived lidar attributes such as stem volume and biomass are themselves not true observations but predictions derived from regressions between lidar and values from NFI field plots, posing questions for using accuracy assessments protocols that compare observed and predicted values. While posing problems for accuracy assessments, the transect based sample we use is more widely distributed spatially and much larger than is typically used in global studies of Lefsky (2010) and Simard, Pinto, Fisher, and Baccini (2011), as summarized in Bolton, Coops, and Wulder (2013). Furthermore, field plot based studies are often limited in the number of samples available and are often spatially limited to safe and ready access to road networks, which would adversely constrain and bias plot sampling in Northern Canada, as well as many regions with remote and inaccessible forests.
Nearest neighbor distance maps can provide an indirect metric of spatial error in map predictions (Ohmann, J.L. and Gregory, M.J., 2002 and Beaudoin, A., et al., 2014), but they do not provide direct measures of error associated with the predicted forest conditions of interest. Limitations of spatial accuracy assessment also pose problems for area estimation and uncertainty analysis. For example, from our predictions we can generate total estimates of basal area (365.68 M m2), timber volume (1321.90 M m3), and total aboveground biomass (2.36 Pg) for Saskatchewan, yet we have limited tools for quantifying the uncertainty of such estimates. One approach to quantify uncertainty would be to apply recently developed variance estimators to imputation predictions, but the computational requirements to calculate variance estimates may be prohibitive when using the NN imputation approach over large areas (McRoberts, Tomppo, Finley, & Heikkinen, 2007). There are other approaches that have recently been developed to quantify variance in univariate regressions (Gregoire, T.G., et al., 2011 and Ståhl, G, et al., 2011) and uncertainty in multi-variate imputation procedures (Bell, Gregory, & Ohmann, 2015), and development of such procedures for Random Forest implementations of imputation procedures is an area of future research.
More fundamentally, a major area for future research is to assess how completely lidar plots from transects occupy the feature space of the larger study area of interest, as well as covering the variation of the forest variables of interest. Such knowledge could be used with a priori flight information to plan optimal locations for lidar transects, as well as after lidar collection to determine if additional lidar is needed, quantify areas with diminished plot support, and stratify lidar plot sampling and model based estimation. Regardless of these potential limitations, it is important to recognize the lidar plot sample size used in this study is larger and better distributed than could be expected from a field based program over the same jurisdiction.
Derived forest attributes (basal area, stem volume, total aboveground biomass, etcetera) also have important sources of error and limitations for quantifying their uncertainty. These attributes were calculated using empirical models developed between lidar plots and 201 NFI inventory plots across Canada's boreal forest (Wulder et al., 2012b). Model fit for these attributes was high (adjusted R2 of 0.64–0.84), and our imputation models only use the predicted values from these equations. It is possible to incorporate the uncertainty or confidence intervals from the Wulder et al. (2012b) models into pixel-level imputation predictions as ancillary variables, but there are important reasons why we did not. Most importantly, equations from Wulder et al. (2012b) were developed across multiple ecozones and provinces in order to have a robust NFI plot sample size and nationally consistent equations. While model fit and lidar metrics used as predictor variables are generally consistent with other studies (Hudak, A.T., et al., 2006 and Li, Y., et al., 2008), applying confidence intervals from national models to an area without robust plot support is likely exceeding the data's scope of inference. More broadly, calculating basal area, stem volume, and total aboveground biomass is limited by multiple sources of error. For example, tree species vary in their wood density and canopy structure, and these species-level differences influence biomass equations (Lambert, M.-C., et al., 2005, Ung, C.-H., et al., 2008 and Jenkins, J., et al., 2003), yet we do not incorporate tree species into our predictions. However, Tompalski, Coops, White, and Wulder (2014)found error in height had a larger impact than species information on simulated tree volume estimates, implying our lidar based models of forest structure are likely more accurate that estimates generated with species data but without precise lidar height estimates. Furthermore, field-based biomass estimates themselves have many sources of sampling and measurement error, as well as model misspecifications (Temesgen, Affleck, Poudel, Gray, & Sessions, 2015). One of the strengths of our imputation approach is we only used lidar metrics (canopy height, variation of canopy height, and canopy cover) as response variables in the Random Forest model, while derived forest attributes with higher potential but unquantified uncertainty and error were predicted as ancillary variables. This approach is inherently more flexible as it allows future incorporation of more accurate predictions as new models of the derived forest attributes become available.
5 Conclusions
Canada's National Forest Inventory (NFI) is designed to support many inventory, monitoring, and reporting objectives. As a sample based program, the NFI represents 100% of Canada's forest with a 1% sample (Gillis et al., 2005), resulting in a need to characterize and map forest conditions on the remaining 99%. This information must be temporally and nationally consistent, spatially explicit and exhaustive, at a spatial resolution that can resolve natural and human impacts, and generated in a way that is transparent and scientifically defensible. Approaches that integrate inventory plots with remotely sensed imagery have commonly had limitations in the context of Canada's extensive northern forests (i.e. too few inventory plot samples, temporal mismatch of plots and imagery, and now-mitigated issues with multi-scene Landsat mosaics). By integrating lidar plots with Landsat pixel composites and change metrics, our approach overcomes many of these limitations to provide timely, spatially complete, and verifiable information on forest conditions. This information can inform policy, support jurisdictional (including national) monitoring programs, and facilitate a wide range of scientific inquiry. Furthermore, lidar-based imputation of forest attributes is temporally transferable (Fekety, Falkowski, & Hudak, 2015), indicating that the single year predictions we present in this study can be extended across the Landsat time series to characterize multi-decadal changes in forest conditions.
Acknowledgments
This research was undertaken as part of the “National Terrestrial Ecosystem Monitoring System (NTEMS): Timely and detailed national cross-sector monitoring for Canada” project was jointly funded by the Canadian Space Agency (CSA)Government Related Initiatives Program (GRIP) and the Canadian Forest Service (CFS) of Natural Resources Canada.
Appendix A Supplementary data
For further details log on website :
http://www.sciencedirect.com/science/article/pii/S0034425716300165
No comments:
Post a Comment