---
title: "WC_config_file"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{WC_config_file}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: 72
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```
## Wildlife Computers QC config file structure
The Wildlife Computers JSON config file has the same 4-block structure
as the [SMRU_config_file](SMRU_config_file.html). The `meta` block is
similarly optional, but some of the parameters within the blocks differ.
Below is the config file for a NSWOO (IRAP) QC workflow on WC SPOT6 tags
deployed on loggerhead turtles in NSW, Australia.
[Figure: example WC JSON config file for the NSWOO (IRAP) QC workflow]
The slightly different config parameter structure accounts for
differences in data structures between the two manufacturers' tags. The
WC config parameters are as follows:

-   `setup` config block specifies the program overseeing data assembly
    & paths to required data, metadata & output directories:
    -   `program` the national (or other) program of which the data is a
        part. For example, `imos`, `atn`, `otn`.
    -   `data.dir` the name of the data directory. Must reside within
        the `wd`.
    -   `meta.file` the metadata filename. Must reside within the `wd`.
        Can be NULL, in which case the `meta` config block (see below)
        must be present & tag-specific metadata are acquired from the WC
        Data Portal.
    -   `maps.dir` the directory path to write diagnostic maps of QC'd
        tracks.
    -   `diag.dir` the directory path to write diagnostic time-series
        plots of QC'd lon & lat.
    -   `output.dir` the directory path to write QC output CSV files.
        Must reside within the `wd`.
    -   `return.R` a logical indicating whether the function should
        return a list of QC-generated, internal R objects. This results
        in a single large object returned to the R workspace containing
        the following elements:
        -   `dropIDs` the WC UUIDs dropped from the QC process
        -   `wc` the WC tag data files downloaded from the WC Data
            Portal
        -   `meta` the deployment metadata formatted for use in the QC
            workflow (i.e., not the final output format)
        -   `locations_sf` the projected location data to be passed as
            input to the SSM
        -   `fit1` the initial SSM output fit object
        -   `fit2` the final SSM output fit object, including re-routed
            locations if specified
        -   `wc_ssm` the SSM-annotated WC tag data files. This output
            object can be useful for troubleshooting undesirable results
            during delayed-mode (supervised) QC workflows.
-   `harvest` config block specifies data harvesting parameters:
    -   `download` a logical indicating whether tag data are to be
        downloaded from the WC Data Portal API or read from the local
        `data.dir`.
    -   `owner.id` the Wildlife Computers collaborator ID, which is
        required if the user does not own or otherwise does not have
        direct access to the tag data. Note that data-sharing
        collaborations must be set up in the Wildlife Computers Data
        Portal prior to accessing data via the API.
    -   `wc.akey` the Wildlife Computers Access Key that all Portal
        users must have to access the API.
    -   `wc.skey` the Wildlife Computers Secret Key.
    -   `tag.list` a .CSV file with a single variable named `uuid`,
        providing the UUIDs of the tag datasets to be downloaded. This
        ensures only the desired subset of tag datasets is downloaded.
        If not provided (NULL), then ALL the tag datasets attributed to
        the `owner.id`, or to the user providing the WC keys, will be
        downloaded.
    -   `dropIDs` a .CSV file with a single variable named `uuid`,
        providing the UUIDs to be ignored by the QC process. Can be
        NULL.
`model` config block specifies model- and data-specific parameters:
-
`model` the aniMotum SSM model to be used for the location QC -
typically either `rw` or `crw`.
-
`vmax` for SSM fitting; max travel rate (m/s) to identify implausible
locations
-
`time.step` the prediction interval (in decimal hours) to be used by the
SSM
-
`proj` the proj4string to be used for the location data & for the
SSM-estimated locations. Can be NULL, which will result in one of 5
projections being used, depending on whether the centroid of the
observed latitudes lies in N or S polar regions, temperate or equatorial
regions, or if tracks straddle (or lie close to) -180,180 longitude.
-
`reroute` a logical; whether QC'd tracks should be re-routed off of land
(default is FALSE). Note, in some circumstances this can substantially
increase processing time. Default land polygon data are sourced from the
`ropensci/rnaturalearthhires` R package.
-
`dist` the distance in km from outside the convex hull of observed
locations from which to select land polygon data for re-routing. Ignored
if `reroute = FALSE`.
-
`barrier` an optional filepath to an alternate polygon data file (shapefile) for
the land (or other) barrier. For example, higher resolution local coastline
data can be supplied, provided the data extend at least `dist` km beyond
the extent of the track data.
-
`buffer` the distance in km to buffer rerouted locations from the
coastline. Ignored if `reroute = FALSE`.
-
`centroids` whether centroids are to be included in the visibility graph
mesh used by the rerouting algorithm. See `?pathroutr::prt_visgraph` for
details. Ignored if `reroute = FALSE`.
-
`cut` logical; should predicted locations be dropped if they lie within
in a large data gap (default is FALSE).
-
`min.gap` the minimum data gap duration (h) to be used for cutting
predicted locations (default is 72 h)
-
`QCmode` one of either `nrt` for Near Real-Time QC or `dm` for Delayed
Mode QC.
-
`pred.int` the prediction interval (in hours) to be used. Typically, this is the
same as the `time.step`.
-
`meta` config block specifies species and deployment location
information. This config block is only necessary when no metadata file
is provided in the `setup` block.
-
`common_name` the species common name (e.g., "loggerhead turtle")
-
`species` the species scientific name (e.g., "Caretta caretta")
-
`release_site` the location where tags were deployed (e.g., "NSW")
-
`state_country` the country/territory name (e.g., "Australia")
With a completed config file, the standard call to initiate the QC workflow
within R is:
```{r fn2 call, eval=FALSE}
wc_qc(wd = "test", config = "irap_config.json")
```
where `wd` is the file path for the working directory within which all
QC data/metadata inputs are downloaded (or read) and outputs are
written.
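
As a rough sketch of how the four blocks fit together, a config file could also be assembled in R as a nested list and written out as JSON. The use of the `jsonlite` package, the choice to encode NULL parameters as JSON `null`, and every value shown (directory names, key placeholders, `dist`, `buffer` and so on) are assumptions for illustration only, not recommended settings or a required workflow.

```r
library(jsonlite)

# All values are hypothetical placeholders, not recommended settings.
config <- list(
  setup = list(
    program = "imos",
    data.dir = "wc_data",
    meta.file = NULL,        # no metadata file, so a `meta` block is supplied
    maps.dir = "maps",
    diag.dir = "diag",
    output.dir = "output",
    return.R = FALSE
  ),
  harvest = list(
    download = TRUE,
    owner.id = NULL,         # assumed NULL when you own the tag data
    wc.akey = "<access key>",
    wc.skey = "<secret key>",
    tag.list = NULL,
    dropIDs = NULL
  ),
  model = list(
    model = "rw",
    vmax = 2,
    time.step = 6,
    proj = NULL,
    reroute = FALSE,
    dist = 100,
    barrier = NULL,
    buffer = 0.5,
    centroids = FALSE,
    cut = FALSE,
    min.gap = 72,
    QCmode = "nrt",
    pred.int = 6
  ),
  meta = list(
    common_name = "loggerhead turtle",
    species = "Caretta caretta",
    release_site = "NSW",
    state_country = "Australia"
  )
)

# Write the nested list as JSON. Encoding R NULLs as JSON null is an
# assumption about how missing values should appear in the config file.
cfg_path <- file.path(tempdir(), "irap_config.json")
write_json(config, cfg_path, auto_unbox = TRUE, pretty = TRUE, null = "null")
```

The resulting file can be inspected in a text editor and compared against a known working config before it is passed to `wc_qc()`.
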
### Additional details on config parameters
The Wildlife Computers API credentials (`owner.id`, `wc.akey`, and
`wc.skey`) may be used to download data directly from the Wildlife
Computers Portal. In this case, data are written to tag-specific
directories within the specified `data.dir` directory. Alternatively,
`wc_qc()` may be used with local copies of Wildlife Computers tag data,
provided they are stored in tag-specific directories within the
`data.dir` directory.
The `proj` argument specifies the projection (as a `proj4string`) to
which the tag-measured locations are converted as input to the QC
state-space model (SSM), i.e., the working projection in `km` for the SSM.
Any valid `proj4string` may be used, provided the units are in `km`. If
`proj` is left as `NULL` then the QC algorithm will project the data
differently depending on the centroid latitude of the tracks. The
default projections are:
| Central Latitude or Longitude | Projection (with `+units=km`) |
|:-----------------:|:---------------------------------------------------:|
| -55 to -25 or 25 to 55 Lat | Equidistant Conic with standard parallels at the tracks' 25th & 75th percentile Latitudes |
| \< -55 or \> 55 Lat | Stereographic with origin at the tracks' centroid |
| -25 to 25 Lat | Mercator with origin at the tracks' centroid |
| -25 to 25 Lat & Long straddles -180,180 | Longitudes are shifted to 0, 360 and a Mercator with origin at tracks' centroid |
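
The selection rule in the table can be sketched in base R as follows. This is a minimal illustration, not the package's internal code: `default_proj()` is a hypothetical helper, the centroid is approximated by the mean of the coordinates, the straddle check (longitude range greater than 180 degrees) is a crude assumption, and the exact proj4 strings used by the QC workflow may differ.

```r
# Hypothetical helper that mimics the default-projection table.
default_proj <- function(lon, lat) {
  stopifnot(length(lon) == length(lat), length(lat) > 0)
  clat <- mean(lat)  # simple centroid latitude (assumption)

  # Crude check for tracks straddling -180,180 (assumption)
  straddles <- diff(range(lon)) > 180
  if (abs(clat) < 25 && straddles) {
    lon <- ifelse(lon < 0, lon + 360, lon)  # shift longitudes to 0, 360
  }
  clon <- mean(lon)

  if (abs(clat) > 55) {
    # Polar: stereographic centred on the track centroid
    sprintf("+proj=stere +lat_0=%.2f +lon_0=%.2f +units=km", clat, clon)
  } else if (abs(clat) >= 25) {
    # Temperate: equidistant conic, standard parallels at the
    # 25th & 75th percentile latitudes
    q <- quantile(lat, c(0.25, 0.75), names = FALSE)
    sprintf(
      "+proj=eqdc +lat_1=%.2f +lat_2=%.2f +lat_0=%.2f +lon_0=%.2f +units=km",
      q[1], q[2], clat, clon
    )
  } else {
    # Equatorial: Mercator centred on the track centroid longitude
    sprintf("+proj=merc +lon_0=%.2f +units=km", clon)
  }
}

# Example: a track off NSW falls in the temperate band
default_proj(lon = c(150, 152, 153), lat = c(-34, -33, -31))
```

Any string produced this way keeps `+units=km`, which is the one requirement stated above for a user-supplied `proj`.
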
The `model` argument specifies the `aniMotum` SSM to be used, typically
either `rw` or `crw`. The latter is usually less biased when data gaps
are absent, whereas the former is best when data gaps are present. A
general recommendation is to use `model`:`rw` as the SSM for
unsupervised (e.g., NRT) QC workflows. The SSM fitting algorithm has a
few fundamental parameters that need to be specified. `vmax` is the
animals' maximum plausible travel rate in m s$^{-1}$. For example,
`vmax`:`3` is usually appropriate for seals and `vmax`:`2` for turtles.
The SSM prediction interval in hours is specified with `time.step`.
Decimal hours can be used for `time.step` values shorter than 1 h. This
time interval determines the temporal resolution of the predicted track.
The predicted track
locations provide the basis for interpolation to the time of each
tag-measured ocean observation or behavioural event. Typically, 6 hours
is appropriate for most Argos data collected from seals and turtles but
a finer time interval may be required for faster moving species and/or
more frequently measured ocean observations, and a coarser interval for
more sporadically observed locations. Further details on SSM fitting to
Argos and GPS data are provided in the associated R package [aniMotum
vignettes](https://ianjonsen.github.io/aniMotum/) and in [Jonsen et al.
2023](https://besjournals.onlinelibrary.wiley.com/doi/full/10.1111/2041-210X.14060).
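
To make the roles of `vmax` and `time.step` concrete, the base R sketch below computes travel rates between consecutive locations, compares them with `vmax`, and builds a regular sequence of prediction times from `time.step`. It is only an illustration under stated assumptions: the coordinates and times are made up, `haversine_m()` is a hypothetical helper, the pairwise speed check is far simpler than the filtering done during SSM fitting, and how the SSM aligns its first prediction time is not shown.

```r
# Great-circle distance in metres (hypothetical helper, spherical Earth)
haversine_m <- function(lon1, lat1, lon2, lat2, r = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

vmax <- 2        # m/s, e.g. for turtles
time.step <- 6   # hours

# Made-up observations, two hours apart
lon <- c(153.0, 153.1, 155.0)
lat <- c(-30, -30, -30)
times <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC") + c(0, 2, 4) * 3600

n <- length(lon)
dist_m <- haversine_m(lon[-n], lat[-n], lon[-1], lat[-1])
dt_s <- as.numeric(diff(times), units = "secs")
speed <- dist_m / dt_s

# TRUE where the step between consecutive locations exceeds vmax
too_fast <- speed > vmax

# Regular prediction times at the chosen time.step (seconds for POSIXct)
pred_times <- seq(from = times[1], to = times[n], by = time.step * 3600)
```

With a 4-hour track and a 6-hour `time.step`, only one prediction time falls inside the track, which illustrates why shorter intervals may be needed for short or fast-moving deployments.
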
When animals pass close to land some SSM-predicted locations may
implausibly lie on land. Often, this is due to the spatial and temporal
resolution of the Argos tracking data. In these cases, SSM-predicted
locations can be adjusted minimally off of land by setting
`reroute`:`true`. The [`pathroutr` R
package](https://jmlondon.github.io/pathroutr/) is used for efficient
rerouting. In this case, additional arguments should be specified:

-   `dist` - the distance in km beyond track locations from which
    coastline polygon data should be sampled (smaller values provide
    less information for path re-routing, larger values increase
    computation time)
-   `barrier` - an optional parameter that can provide an alternate
    spatial polygon dataset, as a shapefile, for the land (or other
    barriers to movement). Typically, this alternate dataset would be a
    localised, high-resolution coastline dataset.
-   `buffer` - the distance in km to buffer rerouted locations from the
    coastline
-   `centroids` - whether to include the visibility graph centroids for
    greater resolution

SSM-predicted tracks can be `cut` (`cut`:`true`) in regions where large
location data gaps exist. These location data gaps can occur when the
tags are unable to transmit for extended periods or when animal
surfacing occurs during periods of Argos satellite unavailability (more
common closer to the equator than at higher latitudes). In this case,
`min.gap` specifies the minimum duration (h) a data gap must have for
SSM-predicted locations within it to be cut. This will limit interpolation
artefacts due to implausible SSM-predicted locations in excessively long
data gap periods.
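
The effect of `cut` and `min.gap` can be sketched in base R as below. This is a simplified illustration, not the package's implementation: the observation and prediction times are made up, and treating a gap as "large" when its duration is at least `min.gap` hours is an assumption about the boundary case.

```r
min.gap <- 72  # hours

# Made-up observation times with a 6-day gap between the 2nd and 3rd
obs_times <- as.POSIXct(
  c("2024-01-01", "2024-01-02", "2024-01-08", "2024-01-09"),
  tz = "UTC"
)

# Daily SSM-style prediction times spanning the observations
pred_times <- seq(obs_times[1], obs_times[4], by = 24 * 3600)

# Duration of each gap between consecutive observations, in hours
gap_h <- as.numeric(diff(obs_times), units = "hours")
big_gaps <- which(gap_h >= min.gap)

# Flag predicted times that fall strictly inside any large gap
in_gap <- rep(FALSE, length(pred_times))
for (i in big_gaps) {
  in_gap <- in_gap |
    (pred_times > obs_times[i] & pred_times < obs_times[i + 1])
}

# Predicted times retained after cutting
pred_kept <- pred_times[!in_gap]
```

Here the predictions for 3 to 7 January are dropped, while those coinciding with observed days on either side of the gap are kept.
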
The `QCmode` sets whether the QC is being conducted in delayed-mode `dm`
or near real-time `nrt`. Delayed mode is reserved for tag deployments
that have ended and usually involves greater user intervention, such as
making decisions on removing aberrant portions of a deployment
(e.g., as tag batteries begin failing). The `nrt` mode is meant to be
fully automated and only used while a deployment is active. In both
cases, the output .CSV and plot file names will include the `QCmode` as
a suffix.