---
title: "Getting started with hexsmoothR"
author: "Max M. Lang"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with hexsmoothR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment  = "#>",
  fig.width  = 7,
  fig.height = 5,
  warning  = FALSE,
  message  = FALSE
)
options(hexsmoothR.verbose = FALSE)
set.seed(1)
```

## What hexsmoothR does

`hexsmoothR` builds a hexagonal (or square) grid over a study area, extracts
raster values into the cells, and applies a Gaussian-weighted spatial-smoother
that uses each cell's *N*-order neighbours. The smoother is implemented in C++
with a pure-R fallback.

The package is intentionally small. There are four user-facing steps:

1. `create_grid()`           - build the grid.
2. `extract_raster_data()`   - pull raster values into the cells.
3. `compute_topology()`      - find neighbours and weights.
4. `smooth_variables()`      - apply the smoother.

Helpers (`get_utm_crs()`, `find_hex_cell_size_for_target_cells()`,
`hex_flat_to_edge()` and friends) cover common conveniences.

## Setup

```{r libraries}
library(hexsmoothR)
library(sf)
library(terra)
```

We use the small NDVI raster shipped with the package so the vignette runs
without external data:

```{r data}
ndvi_path <- system.file("extdata", "default.tif", package = "hexsmoothR")
ndvi      <- rast(ndvi_path)
ndvi
```

Build a study area from the raster's extent:

```{r study-area}
study_area_wgs <- st_as_sf(st_as_sfc(st_bbox(ndvi)))
st_crs(study_area_wgs) <- 4326
```

## Cell-size units depend on the CRS

Cell-size units are dictated by the CRS that the grid is built in:

| Grid CRS              | `cell_size` units |
|-----------------------|-------------------|
| Projected (UTM, etc.) | metres            |
| Geographic (WGS84)    | degrees           |

For real analyses, work in a projected CRS so distances and areas are
meaningful. `get_utm_crs()` picks an appropriate UTM zone automatically:

```{r utm}
utm_crs        <- get_utm_crs(study_area_wgs)
study_area_utm <- st_transform(study_area_wgs, utm_crs)
utm_crs
```

## Step 1: create a grid

```{r grid}
hex_grid <- create_grid(
  study_area = study_area_utm,
  cell_size  = 25000,        # 25 km flat-to-flat
  type       = "hexagonal"
)

nrow(hex_grid)
head(hex_grid[, c("grid_id", "grid_index")])
```

If you don't know the right cell size up front, search for one that yields
approximately a target number of cells:

```{r grid-search}
target_size <- find_hex_cell_size_for_target_cells(
  study_area     = study_area_utm,
  target_cells   = 250,
  cell_size_min  = 5000,
  cell_size_max  = 100000
)
round(target_size, 0)
```

`create_grid()` will auto-pick a UTM CRS when `projection_crs = NULL`
(the default), so the simplest call is just:

```{r grid-auto, eval = FALSE}
hex_grid_wgs_input <- create_grid(study_area_wgs, cell_size = 25000)
```

## Step 2: extract raster values into cells

`extract_raster_data()` accepts either file paths or `terra::SpatRaster`
objects. CRS transformations are applied automatically per raster, so the grid
and rasters do not need to share a CRS.

```{r extract}
extracted <- extract_raster_data(
  raster_files = list(ndvi = ndvi),
  hex_grid     = hex_grid
)
str(extracted, max.level = 1)
head(extracted$data)
```

The returned list contains the value table (`data`), the grid actually used
(`hex_grid`), and a bit of provenance (`cell_size`, `extent`, `variables`,
`n_cells`).

## Step 3: compute topology

`compute_topology()` finds the *N*-order neighbours of every cell and derives
Gaussian weights from the average inter-cell distance:

```{r topology}
topology <- compute_topology(hex_grid, neighbor_orders = 2)
str(topology, max.level = 1)
```

The relevant numbers:

```{r topology-numbers}
topology$avg_distance      # metres between neighbouring centroids
topology$sigma             # bandwidth used for Gaussian decay
topology$weights           # normalised centre + per-order weights
```

By construction, all weights (centre + every order) sum to 1:

```{r topology-norm}
topology$weights$center_weight + sum(topology$weights$neighbor_weights)
```

You can override the auto-weights by passing `neighbor_weights_param`:

```{r custom-weights}
topo_custom <- compute_topology(
  hex_grid,
  neighbor_orders        = 3,
  neighbor_weights_param = list(0.5, 0.3, 0.1)
)
topo_custom$weights$neighbor_weights
```

## Step 4: smooth

```{r smooth}
result <- smooth_variables(
  variable_values = list(ndvi = extracted$data$ndvi),
  neighbors       = topology$neighbors,
  weights         = topology$weights
)
str(result, max.level = 2)
```

For each variable you get:

| Field                    | Meaning                                                         |
|--------------------------|-----------------------------------------------------------------|
| `raw`                    | The original (unsmoothed) cell value                            |
| `neighbors_<N>{st,nd,…}` | Mean of valid neighbours at order *N*                           |
| `weighted_combined`      | Gaussian-weighted average of the centre and all neighbours      |

Quick noise-reduction check:

```{r noise-reduction}
sd_raw      <- sd(result$ndvi$raw,               na.rm = TRUE)
sd_smoothed <- sd(result$ndvi$weighted_combined, na.rm = TRUE)
round(100 * (sd_raw - sd_smoothed) / sd_raw, 1)  # % SD reduction
```

## Plotting

Attach the smoothed columns back onto the grid and plot:

```{r plot, fig.width = 7, fig.height = 4}
hex_grid$ndvi_raw      <- result$ndvi$raw
hex_grid$ndvi_smoothed <- result$ndvi$weighted_combined

op <- par(mfrow = c(1, 2), mar = c(2, 2, 2, 1))
plot(hex_grid["ndvi_raw"],      main = "Raw",      key.pos = NULL, reset = FALSE)
plot(hex_grid["ndvi_smoothed"], main = "Smoothed", key.pos = NULL, reset = FALSE)
par(op)
```

## N-order smoothing

Increasing `neighbor_orders` widens the kernel. The cost is roughly linear
in the number of orders for these small grids, but neighbour counts grow
quadratically with order, so don't go overboard:

```{r n-order, fig.width = 7, fig.height = 5}
results_per_order <- lapply(2:4, function(k) {
  topo_k <- compute_topology(hex_grid, neighbor_orders = k)
  smooth_variables(
    variable_values = list(ndvi = extracted$data$ndvi),
    neighbors       = topo_k$neighbors,
    weights         = topo_k$weights
  )$ndvi$weighted_combined
})

sapply(results_per_order, sd, na.rm = TRUE)  # SD shrinks with order
```

## Multiple variables in one pass

`smooth_variables()` processes any number of variables in a single C++ call,
which is much faster than per-variable looping:

```{r multi}
multi <- smooth_variables(
  variable_values = list(
    ndvi          = extracted$data$ndvi,
    ndvi_squared  = extracted$data$ndvi^2
  ),
  neighbors = topology$neighbors,
  weights   = topology$weights
)
names(multi)
```

## Tips and gotchas

- **Always project before measuring.** If you pass a geographic-CRS grid,
  `cell_size` is in degrees, distances aren't comparable across latitudes,
  and the topology distance estimate is meaningless. Use UTM or another
  projected CRS, or rely on the auto-detection.
- **`raw` is the input, not a smoothing output.** It's there so you can carry
  the original value alongside the smoothed columns without an extra join.
- **NA propagation.** `weighted_combined` divides by the *realised* weight
  (sum of weights of non-NA contributors), so missing neighbours don't bias
  the result towards zero - they're simply skipped.
- **Verbose mode.** Set `options(hexsmoothR.verbose = FALSE)` in scripts and
  notebooks to silence the progress messages emitted by `create_grid()`,
  `compute_topology()`, etc.