- forestatrisk.computeAUC(pos_scores, neg_scores, n_sample=100000)
Compute the AUC index.
Compute the Area Under the ROC Curve (AUC). See Liu et al. 2011.
- Parameters:
pos_scores – Scores of positive observations.
neg_scores – Scores of negative observations.
n_sample – Number of samples used to approximate the AUC.
- Returns:
AUC value.
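The AUC is the probability that a randomly chosen positive observation scores higher than a randomly chosen negative one, which is what makes a sampling approximation possible. The sketch below assumes that n_sample random (positive, negative) pairs are drawn with replacement and that ties count as one half. compute_auc_mc is a hypothetical name, and this is not necessarily how computeAUC is implemented.

```python
import numpy as np

def compute_auc_mc(pos_scores, neg_scores, n_sample=100000, seed=None):
    """Hypothetical helper: Monte Carlo approximation of the AUC.

    The AUC is the probability that a random positive observation scores
    higher than a random negative one; ties count as one half.
    """
    rng = np.random.default_rng(seed)
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    # Draw n_sample (positive, negative) pairs with replacement
    pos_draw = pos[rng.integers(0, pos.size, n_sample)]
    neg_draw = neg[rng.integers(0, neg.size, n_sample)]
    # Share of correctly ranked pairs, ties weighted by 0.5
    return float(np.mean((pos_draw > neg_draw) + 0.5 * (pos_draw == neg_draw)))
```

With perfectly separated scores every sampled pair is correctly ranked, so the estimate is exactly 1.0. Larger n_sample values reduce the sampling error of the estimate.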
- forestatrisk.accuracy_indices(pred, obs)
Compute accuracy indices.
Compute the Overall Accuracy, the Figure of Merit, the Specificity, the Sensitivity, the True Skill Statistic and Cohen’s Kappa from a confusion matrix built from predictions and observations.
- Parameters:
pred – List of predictions.
obs – List of observations.
- Returns:
A dictionary of accuracy indices.
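As a minimal sketch of the first step, the confusion matrix can be built from paired predictions and observations and the simplest indices read off it. It assumes binary 0/1 coding with 1 as the positive class and uses the layout [[n00, n01], [n10, n11]] (predictions in rows, observations in columns). The helper name and the dictionary keys are hypothetical and need not match what accuracy_indices returns.

```python
import numpy as np

def confusion_and_basic_indices(pred, obs):
    """Hypothetical helper: 2x2 confusion matrix plus basic indices.

    Assumes 0/1 coding with 1 as the positive class; the matrix is
    [[n00, n01], [n10, n11]], predictions in rows, observations in columns.
    """
    pred = np.asarray(pred, dtype=int)
    obs = np.asarray(obs, dtype=int)
    mat = np.zeros((2, 2), dtype=int)
    # Add 1 at mat[pred[i], obs[i]] for every pair (repeated pairs handled)
    np.add.at(mat, (pred, obs), 1)
    (n00, n01), (n10, n11) = mat
    n = mat.sum()
    oa = (n00 + n11) / n        # Overall Accuracy
    sen = n11 / (n11 + n01)     # Sensitivity: share of observed 1 predicted as 1
    spe = n00 / (n00 + n10)     # Specificity: share of observed 0 predicted as 0
    tss = sen + spe - 1         # True Skill Statistic
    return mat, {"OA": oa, "Sen": sen, "Spe": spe, "TSS": tss}
```

For pred = [1, 1, 0, 0, 1] and obs = [1, 0, 0, 0, 1], the matrix is [[2, 0], [1, 2]], giving an Overall Accuracy of 0.8 and a Sensitivity of 1.0.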
- forestatrisk.cross_validation(data, formula, mod_type='icar', ratio=30, nrep=5, seed=1234, icar_args={'beta_start': 0, 'burnin': 1000, 'mcmc': 1000, 'n_neighbors': None, 'neighbors': None, 'thin': 1}, rf_args={'n_estimators': 100, 'n_jobs': None})
Model cross-validation.
Performs model cross-validation.
- Parameters:
data – Full dataset.
formula – Model formula.
mod_type – Model type: one of “icar”, “glm”, or “rf”.
ratio – Percentage of data used for testing.
nrep – Number of repetitions for cross-validation.
seed – Seed for reproducibility.
icar_args – Dictionary of arguments for the binomial iCAR model.
rf_args – Dictionary of arguments for the random forest model.
- Returns:
A Pandas data frame with cross-validation results.
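To illustrate how ratio, nrep and seed interact, the following sketch draws nrep random train/test partitions, each holding out ratio percent of the rows for testing. The helper repeated_splits is hypothetical. Model fitting and the computation of accuracy indices are omitted. The seeding scheme (NumPy's default_rng) is an assumption and may differ from what cross_validation does internally.

```python
import numpy as np

def repeated_splits(n_obs, ratio=30, nrep=5, seed=1234):
    """Hypothetical helper: yield (train_idx, test_idx) nrep times,
    holding out `ratio` percent of the observations for testing."""
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible splits
    n_test = int(round(n_obs * ratio / 100))
    for _ in range(nrep):
        perm = rng.permutation(n_obs)
        # The first n_test shuffled indices form the test set
        yield np.sort(perm[n_test:]), np.sort(perm[:n_test])
```

Each repetition would fit the model on the training rows and compute accuracy indices on the test rows. The per-repetition results can then be summarized, for example by their mean.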
- forestatrisk.map_validation(pred, obs, blk_rows=128)
Compute accuracy indices based on predicted and observed forest-cover change (fcc) maps.
Compute the Overall Accuracy, the Figure of Merit, the Specificity, the Sensitivity, the True Skill Statistic and Cohen’s Kappa from a confusion matrix built from predictions and observations.
- Parameters:
pred – Raster of predicted fcc.
obs – Raster of observed fcc.
blk_rows – If > 0, number of rows per block (otherwise, blocks of 256x256 pixels are used).
- Returns:
A dictionary of accuracy indices.
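The blk_rows parameter indicates that the maps are processed in blocks rather than loaded whole. This sketch assumes both maps are already available as 2-D NumPy arrays coded 0/1; raster reading with GDAL or similar is left out. Under that assumption, a confusion matrix can be accumulated over blocks of rows. blockwise_confmat is a hypothetical helper, and treating any value other than 0 or 1 as no-data is an assumption.

```python
import numpy as np

def blockwise_confmat(pred, obs, blk_rows=128):
    """Hypothetical helper: accumulate [[n00, n01], [n10, n11]]
    (predictions in rows, observations in columns) over blocks of rows.
    Pixels not coded 0/1 in both maps are ignored (assumed no-data)."""
    pred = np.asarray(pred)
    obs = np.asarray(obs)
    mat = np.zeros((2, 2), dtype=np.int64)
    for start in range(0, pred.shape[0], blk_rows):
        p = pred[start:start + blk_rows].ravel()
        o = obs[start:start + blk_rows].ravel()
        # Keep only pixels with valid 0/1 values in both maps
        valid = np.isin(p, (0, 1)) & np.isin(o, (0, 1))
        np.add.at(mat, (p[valid].astype(int), o[valid].astype(int)), 1)
    return mat
```

Only one block of each map needs to be in memory at a time. The result is identical to counting over the full arrays at once.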
- forestatrisk.map_confmat(r_obs0, r_obs1, r_pred0, r_pred1, blk_rows=0)
Compute a confusion matrix.
This function computes a confusion matrix at a given resolution. The number of pixels in each category (0, 1) within each spatial cell is given by the r_obs* and r_pred* rasters.
- Parameters:
r_obs0 – Raster counting the number of 0 for observations.
r_obs1 – Raster counting the number of 1 for observations.
r_pred0 – Raster counting the number of 0 for predictions.
r_pred1 – Raster counting the number of 1 for predictions.
blk_rows – If > 0, number of rows per block.
- Returns:
A numpy array of shape (2,2).
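The description does not state how pixels inside a coarse cell are paired between observations and predictions. One plausible rule is assumed here and is not confirmed by the source. For each category, agreement in a cell is the minimum of the observed and predicted counts, and the surplus of predictions goes to the off-diagonal. The rule also assumes each cell has the same number of valid pixels in both maps. coarse_confmat is a hypothetical helper that works on count arrays rather than raster files.

```python
import numpy as np

def coarse_confmat(r_obs0, r_obs1, r_pred0, r_pred1):
    """Hypothetical helper: [[n00, n01], [n10, n11]] (predictions in rows,
    observations in columns) from per-cell pixel counts.
    Assumes obs0 + obs1 == pred0 + pred1 in every cell."""
    o0, o1, p0, p1 = (np.asarray(a, dtype=np.int64)
                      for a in (r_obs0, r_obs1, r_pred0, r_pred1))
    n00 = np.minimum(o0, p0).sum()        # agreement on category 0
    n11 = np.minimum(o1, p1).sum()        # agreement on category 1
    n10 = np.maximum(p1 - o1, 0).sum()    # predicted 1 in excess of observed 1
    n01 = np.maximum(p0 - o0, 0).sum()    # predicted 0 in excess of observed 0
    return np.array([[n00, n01], [n10, n11]])
```

When every cell contains a single pixel, this rule reduces to the ordinary pixel-level confusion matrix. Coarser cells tolerate small spatial shifts between predicted and observed pixels.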
- forestatrisk.map_accuracy(mat)
Compute accuracy indices from a confusion matrix.
Compute the Overall Accuracy, the Expected Accuracy, the Figure of Merit, the Specificity, the Sensitivity, the True Skill Statistic and Cohen’s Kappa from a confusion matrix.
- Parameters:
mat – Confusion matrix. Format: [[n00, n01], [n10, n11]], with predictions in rows and observations in columns.
- Returns:
A dictionary of accuracy indices.
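The sketch below applies the standard formulas to the mat format given above (predictions in rows, observations in columns), treating 1 as the positive class. The Expected Accuracy is the agreement expected by chance from the row and column totals, and Kappa rescales the Overall Accuracy relative to it. indices_from_confmat and its dictionary keys are hypothetical and need not match those returned by map_accuracy.

```python
import numpy as np

def indices_from_confmat(mat):
    """Hypothetical helper: accuracy indices from [[n00, n01], [n10, n11]]
    (predictions in rows, observations in columns, 1 = positive class)."""
    (n00, n01), (n10, n11) = np.asarray(mat, dtype=float)
    n = n00 + n01 + n10 + n11
    oa = (n00 + n11) / n                      # Overall Accuracy
    # Chance agreement: (pred 1 x obs 1 totals + pred 0 x obs 0 totals) / n^2
    ea = ((n10 + n11) * (n01 + n11) + (n00 + n01) * (n00 + n10)) / n**2
    fom = n11 / (n11 + n10 + n01)             # Figure of Merit
    sen = n11 / (n11 + n01)                   # Sensitivity
    spe = n00 / (n00 + n10)                   # Specificity
    return {"OA": oa, "EA": ea, "FOM": fom, "Sen": sen, "Spe": spe,
            "TSS": sen + spe - 1, "K": (oa - ea) / (1 - ea)}
```

For mat = [[2, 0], [1, 2]], the Overall Accuracy is 0.8 and the Expected Accuracy is (3 × 2 + 2 × 3) / 25 = 0.48, so Kappa = 0.32 / 0.52 ≈ 0.615.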
- forestatrisk.r_diffproj(inputA, inputB, output_file='diffproj.tif', blk_rows=128)
Compute a raster of differences for comparison.
This function computes a raster of differences between two rasters of future forest cover. The rasters must have the same extent and resolution.
- Parameters:
inputA – Path to first raster (predictions).
inputB – Path to second raster (second predictions, or observations).
output_file – Name of the output raster file for differences.
blk_rows – If > 0, number of rows for computation by block.
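The coding of the raster of differences is not specified here. As a hypothetical scheme, a pixel code of 10 * a + b gives one code per combination (0, 1, 10, 11), where a and b are the 0/1 values of the first and second rasters. The sketch applies this scheme block by block to NumPy arrays. diff_codes and the no-data value 255 are assumptions, not the behavior of r_diffproj.

```python
import numpy as np

def diff_codes(a, b, blk_rows=128, nodata=255):
    """Hypothetical helper: per-pixel difference codes 10 * a + b for two
    0/1 forest cover arrays; pixels not coded 0/1 in both get `nodata`."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        raise ValueError("rasters must have the same extent and resolution")
    out = np.full(a.shape, nodata, dtype=np.uint8)
    for start in range(0, a.shape[0], blk_rows):
        rows = slice(start, start + blk_rows)
        ab, bb = a[rows], b[rows]
        valid = np.isin(ab, (0, 1)) & np.isin(bb, (0, 1))
        block = out[rows]  # a view, so writing to it updates `out`
        block[valid] = (10 * ab[valid] + bb[valid]).astype(np.uint8)
    return out
```

Under this scheme, code 10 marks pixels where the first raster has forest (1) and the second does not, and code 1 marks the reverse.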
- forestatrisk.mat_diffproj(input_raster, blk_rows=128)
Compute a confusion matrix from a raster of differences.
This function computes a confusion matrix from a raster of differences. The raster of differences can be obtained using function r_diffproj().
- Parameters:
input_raster – Raster of differences obtained with forestatrisk.r_diffproj().
- Returns:
A confusion matrix of the form [[np00, np01], [np10, np11]].
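This sketch assumes the raster of differences uses a hypothetical code of 10 * a + b per pixel, with a taken from the first raster and b from the second. Under that assumption, the confusion matrix is just the count of each code. confmat_from_codes is a hypothetical helper. Rows correspond to the first raster's value and columns to the second's.

```python
import numpy as np

def confmat_from_codes(codes):
    """Hypothetical helper: [[n00, n01], [n10, n11]] from difference codes,
    assuming code = 10 * a + b; other codes (e.g. no-data) are ignored."""
    codes = np.asarray(codes).ravel()
    return np.array([[np.sum(codes == 0), np.sum(codes == 1)],
                     [np.sum(codes == 10), np.sum(codes == 11)]])
```

For a raster stored on disk, the same counts can be accumulated block by block and summed.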
- forestatrisk.resample_sum(input_raster, output_raster, val=0, window_size=2)
Resample to coarser resolution with counts.
This function resamples a raster to a coarser resolution by counting, in each window, the number of pixels with a given value. The window size is limited to 1000 pixels.
- Parameters:
input_raster – Path to input raster.
output_raster – Path to output raster file.
val – Pixel value to consider.
window_size – Size of the window in number of pixels.
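Counting pixels with a given value in non-overlapping windows amounts to a block sum of a 0/1 mask. The NumPy sketch below computes it with a reshape. resample_count is a hypothetical name. Padding partial windows at the right and bottom edges is an assumption about edge handling. File I/O and the 1000-pixel window limit are not reproduced.

```python
import numpy as np

def resample_count(arr, val=0, window_size=2):
    """Hypothetical helper: count pixels equal to `val` in non-overlapping
    window_size x window_size windows. Partial edge windows are padded with
    non-matching pixels, so they count only the pixels they contain."""
    match = (np.asarray(arr) == val).astype(np.int64)
    h, w = match.shape
    # Pad up to a multiple of window_size (padding never matches)
    match = np.pad(match, ((0, -h % window_size), (0, -w % window_size)))
    H, W = match.shape
    # Axes: (window row, row in window, window column, column in window)
    blocks = match.reshape(H // window_size, window_size,
                           W // window_size, window_size)
    return blocks.sum(axis=(1, 3))
```

The output has one cell per window. The sum over all cells equals the number of input pixels equal to val.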
- forestatrisk.validation_npix(r_pred, r_obs, value_f=1, value_d=0, square_size=33, output_file='npix.txt')
Compute non-deforested and deforested pixels per square. (deprecated)
This function computes the number of non-deforested and deforested pixels in squares of a given size for both a raster of predictions and a raster of observations. Results can be used to compute correlations.
- Parameters:
r_pred – Path to raster of predictions.
r_obs – Path to raster of observations.
value_f – Value of non-deforested pixels in rasters.
value_d – Value of deforested pixels in rasters.
square_size – Size of the square side in number of pixels.
output_file – Path to result file.
- Returns:
A pandas DataFrame, each row being one square.
- forestatrisk.validation_udef_arp(fcc_file, time_interval, riskmap_file, tab_file_defor, period='calibration', csize_coarse_grid=300, indices_file_pred='indices.csv', tab_file_pred='pred_obs.csv', fig_file_pred='pred_obs.png', figsize=(6.4, 6.4), dpi=100, verbose=True)
Validation of the deforestation risk map.
This function computes the observed and predicted deforestation (in ha) for the calibration, validation, or historical period. Deforestation density estimates (in ha/pixel/yr) obtained with the defrate_per_cat function are used to compute the predicted deforestation in each grid cell. The function creates both a .csv file with the validation data and a plot comparing predictions vs. observations. It returns, among other indices, the weighted Root Mean Squared Error (wRMSE, in hectares) and the Median Absolute Error (MedAE, in hectares) associated with the deforestation predictions.
- Parameters:
fcc_file – Input raster file of forest cover change at three dates (123): 1 = deforestation during the first period, 2 = deforestation during the second period, 3 = remaining forest at the end of the second period. The no-data value must be 0 (zero).
time_interval – Duration (in years) of the period.
riskmap_file – Input raster file with categories of spatial deforestation risk at the beginning of the period.
tab_file_defor – Path to the .csv input file with estimates of deforestation density (in ha/pixel/yr) for each category of deforestation risk.
period – One of “calibration” (from t1 to t2), “validation” (from t2 to t3), or “historical” (from t1 to t3).
csize_coarse_grid – Spatial cell size in number of pixels. Must correspond to a distance < 10 km. Defaults to 300, corresponding to 9 km for a 30 m resolution raster.
tab_file_pred – Path to the .csv output file with validation data.
fig_file_pred – Path to the .png output file for the predictions vs. observations plot.
figsize – Figure size.
dpi – Resolution for output image.
verbose – Logical. Whether to print messages or not. Defaults to True.
- Returns:
A dictionary with wRMSE, MedAE, and R2: the weighted root mean squared error (in ha), median absolute error (in ha), and R-squared, respectively, for the deforestation predictions; ncell: the number of grid cells with forest cover > 0 at the beginning of the validation period; csize_coarse_grid: the coarse grid cell size in number of pixels; csize_coarse_grid_ha: the coarse grid cell size in ha.
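The core of the comparison can be sketched with NumPy arrays in place of the raster and .csv inputs. The sketch rests on three assumptions:

- a pixel's predicted deforestation is its category's density times time_interval;
- observed deforestation is the number of deforested pixels times the pixel area;
- risk category 0 marks non-forest.

The helper udef_pred_obs and its argument names are hypothetical. It sums both quantities per coarse cell and computes the MedAE over cells that contain forest. The wRMSE and R2 are not reproduced because their weighting and exact definition are not given here.

```python
import numpy as np

def udef_pred_obs(risk_cat, defor_obs, dens, time_interval, csize, pixel_area_ha=0.09):
    """Hypothetical helper: predicted vs. observed deforestation (ha) per
    coarse grid cell, plus the median absolute error (MedAE).

    risk_cat:  2-D int array of risk categories (0 = non-forest, assumed)
    defor_obs: 2-D 0/1 array, 1 = pixel deforested during the period
    dens:      dict {category: deforestation density in ha/pixel/yr}
    csize:     coarse cell size in pixels
    """
    risk_cat = np.asarray(risk_cat, dtype=int)
    # Lookup table: category -> density (categories absent from dens get 0)
    lut = np.zeros(max(int(risk_cat.max()), max(dens)) + 1)
    for cat, d in dens.items():
        lut[cat] = d
    pred_px = lut[risk_cat] * time_interval            # ha per pixel over the period
    obs_px = np.asarray(defor_obs, dtype=float) * pixel_area_ha
    forest_px = (risk_cat > 0).astype(float)

    def cell_sum(x):
        # Sum values in non-overlapping csize x csize cells (edges zero-padded)
        h, w = x.shape
        x = np.pad(x, ((0, -h % csize), (0, -w % csize)))
        H, W = x.shape
        return x.reshape(H // csize, csize, W // csize, csize).sum(axis=(1, 3))

    keep = cell_sum(forest_px) > 0                     # cells with forest at the start
    pred = cell_sum(pred_px)[keep]
    obs = cell_sum(obs_px)[keep]
    medae = float(np.median(np.abs(pred - obs)))
    return pred, obs, medae
```

With the default csize_coarse_grid of 300 pixels at 30 m resolution, a cell measures 9 km × 9 km, which is 81 km² or 8,100 ha. A single 30 m pixel covers 0.09 ha.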