--- title: "Getting started with hexsmoothR" author: "Max M. Lang" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Getting started with hexsmoothR} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 5, warning = FALSE, message = FALSE ) options(hexsmoothR.verbose = FALSE) set.seed(1) ``` ## What hexsmoothR does `hexsmoothR` builds a hexagonal (or square) grid over a study area, extracts raster values into the cells, and applies a Gaussian-weighted spatial-smoother that uses each cell's *N*-order neighbours. The smoother is implemented in C++ with a pure-R fallback. The package is intentionally small. There are four user-facing steps: 1. `create_grid()` - build the grid. 2. `extract_raster_data()` - pull raster values into the cells. 3. `compute_topology()` - find neighbours and weights. 4. `smooth_variables()` - apply the smoother. Helpers (`get_utm_crs()`, `find_hex_cell_size_for_target_cells()`, `hex_flat_to_edge()` and friends) cover common conveniences. ## Setup ```{r libraries} library(hexsmoothR) library(sf) library(terra) ``` We use the small NDVI raster shipped with the package so the vignette runs without external data: ```{r data} ndvi_path <- system.file("extdata", "default.tif", package = "hexsmoothR") ndvi <- rast(ndvi_path) ndvi ``` Build a study area from the raster's extent: ```{r study-area} study_area_wgs <- st_as_sf(st_as_sfc(st_bbox(ndvi))) st_crs(study_area_wgs) <- 4326 ``` ## Cell-size units depend on the CRS Cell-size units are dictated by the CRS that the grid is built in: | Grid CRS | `cell_size` units | |-----------------------|-------------------| | Projected (UTM, etc.) | metres | | Geographic (WGS84) | degrees | For real analyses, work in a projected CRS so distances and areas are meaningful. `get_utm_crs()` picks an appropriate UTM zone automatically: ```{r utm} utm_crs <- get_utm_crs(study_area_wgs) study_area_utm <- st_transform(study_area_wgs, utm_crs) utm_crs ``` ## Step 1: create a grid ```{r grid} hex_grid <- create_grid( study_area = study_area_utm, cell_size = 25000, # 25 km flat-to-flat type = "hexagonal" ) nrow(hex_grid) head(hex_grid[, c("grid_id", "grid_index")]) ``` If you don't know the right cell size up front, search for one that yields approximately a target number of cells: ```{r grid-search} target_size <- find_hex_cell_size_for_target_cells( study_area = study_area_utm, target_cells = 250, cell_size_min = 5000, cell_size_max = 100000 ) round(target_size, 0) ``` `create_grid()` will auto-pick a UTM CRS when `projection_crs = NULL` (the default), so the simplest call is just: ```{r grid-auto, eval = FALSE} hex_grid_wgs_input <- create_grid(study_area_wgs, cell_size = 25000) ``` ## Step 2: extract raster values into cells `extract_raster_data()` accepts either file paths or `terra::SpatRaster` objects. CRS transformations are applied automatically per raster, so the grid and rasters do not need to share a CRS. ```{r extract} extracted <- extract_raster_data( raster_files = list(ndvi = ndvi), hex_grid = hex_grid ) str(extracted, max.level = 1) head(extracted$data) ``` The returned list contains the value table (`data`), the grid actually used (`hex_grid`), and a bit of provenance (`cell_size`, `extent`, `variables`, `n_cells`). ## Step 3: compute topology `compute_topology()` finds the *N*-order neighbours of every cell and derives Gaussian weights from the average inter-cell distance: ```{r topology} topology <- compute_topology(hex_grid, neighbor_orders = 2) str(topology, max.level = 1) ``` The relevant numbers: ```{r topology-numbers} topology$avg_distance # metres between neighbouring centroids topology$sigma # bandwidth used for Gaussian decay topology$weights # normalised centre + per-order weights ``` By construction, all weights (centre + every order) sum to 1: ```{r topology-norm} topology$weights$center_weight + sum(topology$weights$neighbor_weights) ``` You can override the auto-weights by passing `neighbor_weights_param`: ```{r custom-weights} topo_custom <- compute_topology( hex_grid, neighbor_orders = 3, neighbor_weights_param = list(0.5, 0.3, 0.1) ) topo_custom$weights$neighbor_weights ``` ## Step 4: smooth ```{r smooth} result <- smooth_variables( variable_values = list(ndvi = extracted$data$ndvi), neighbors = topology$neighbors, weights = topology$weights ) str(result, max.level = 2) ``` For each variable you get: | Field | Meaning | |--------------------------|-----------------------------------------------------------------| | `raw` | The original (unsmoothed) cell value | | `neighbors_{st,nd,…}` | Mean of valid neighbours at order *N* | | `weighted_combined` | Gaussian-weighted average of the centre and all neighbours | Quick noise-reduction check: ```{r noise-reduction} sd_raw <- sd(result$ndvi$raw, na.rm = TRUE) sd_smoothed <- sd(result$ndvi$weighted_combined, na.rm = TRUE) round(100 * (sd_raw - sd_smoothed) / sd_raw, 1) # % SD reduction ``` ## Plotting Attach the smoothed columns back onto the grid and plot: ```{r plot, fig.width = 7, fig.height = 4} hex_grid$ndvi_raw <- result$ndvi$raw hex_grid$ndvi_smoothed <- result$ndvi$weighted_combined op <- par(mfrow = c(1, 2), mar = c(2, 2, 2, 1)) plot(hex_grid["ndvi_raw"], main = "Raw", key.pos = NULL, reset = FALSE) plot(hex_grid["ndvi_smoothed"], main = "Smoothed", key.pos = NULL, reset = FALSE) par(op) ``` ## N-order smoothing Increasing `neighbor_orders` widens the kernel. The cost is roughly linear in the number of orders for these small grids, but neighbour counts grow quadratically with order, so don't go overboard: ```{r n-order, fig.width = 7, fig.height = 5} results_per_order <- lapply(2:4, function(k) { topo_k <- compute_topology(hex_grid, neighbor_orders = k) smooth_variables( variable_values = list(ndvi = extracted$data$ndvi), neighbors = topo_k$neighbors, weights = topo_k$weights )$ndvi$weighted_combined }) sapply(results_per_order, sd, na.rm = TRUE) # SD shrinks with order ``` ## Multiple variables in one pass `smooth_variables()` processes any number of variables in a single C++ call, which is much faster than per-variable looping: ```{r multi} multi <- smooth_variables( variable_values = list( ndvi = extracted$data$ndvi, ndvi_squared = extracted$data$ndvi^2 ), neighbors = topology$neighbors, weights = topology$weights ) names(multi) ``` ## Tips and gotchas - **Always project before measuring.** If you pass a geographic-CRS grid, `cell_size` is in degrees, distances aren't comparable across latitudes, and the topology distance estimate is meaningless. Use UTM or another projected CRS, or rely on the auto-detection. - **`raw` is the input, not a smoothing output.** It's there so you can carry the original value alongside the smoothed columns without an extra join. - **NA propagation.** `weighted_combined` divides by the *realised* weight (sum of weights of non-NA contributors), so missing neighbours don't bias the result towards zero - they're simply skipped. - **Verbose mode.** Set `options(hexsmoothR.verbose = FALSE)` in scripts and notebooks to silence the progress messages emitted by `create_grid()`, `compute_topology()`, etc.