---
title: "SMRU_workflow"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{SMRU_workflow}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
editor_options:
  markdown:
    wrap: 72
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## SMRU SRDL tag QC workflow

The first step in any `ArgosQC` workflow is to construct a `JSON`
config file (see [SMRU_config_file](SMRU_config_file.html)). The SMRU
SRDL tag QC workflow comprises a number of data/metadata processing,
model fitting, data file annotation and output steps. Each of these
steps is encapsulated in an `ArgosQC` function:

Table 1. SMRU SRDL tag QC workflow functions listed in order of
operation. For standard near real-time (NRT) QC workflows, these
functions are implemented via the wrapper function, `smru_qc()`. For
examples of how to implement the individual functions, refer to the R
help pages, e.g., type `?download_data` in the R console.

| SMRU QC function | Description |
|:-------------:|:---------------------------------------:|
| `download_data` | Downloads the tag data .mdb file from the SMRU server & writes it to the specified `data.dir`. |
| `smru_pull_tables` | Extracts tag data files from the downloaded .mdb file to a named list object in the R workspace. See `?smru_pull_tables` for an example. |
| `get_metadata` | Either loads deployment metadata from a CSV file specified in the `config` file `setup` block, or builds the metadata from the tag manufacturer's data portal in combination with species and deployment site attributes provided in the `config` file `meta` block. |
| `smru_prep_loc` | Prepares location data for state-space model (SSM) fitting by restructuring the SMRU diag & gps (if present) tag data files, truncating start and end dates (for NRT QC) based on the dates of the first & last CTD profiles, and projecting locations from lon-lat to the proj4string specified in the `config` file. |
| `multi_filter` | Fits the SSM in a 1st pass using parameters specified in the `config` file `model` block. SSMs are fitted in parallel across n available processors. |
| `redo_multi_filter` | Applies a 2nd pass to refit the SSM to any tag location datasets that failed to converge on the first pass. Uses automatically revised parameters to help ensure convergence. Reroutes any locations off of land if `model:reroute:true` in the `config` file. |
| `ssm_mark_gaps` | Identifies & marks SSM-predicted & rerouted locations in track segments with data gaps of a specified minimum duration. Typically used in delayed-mode (DM) QC workflows only. |
| `smru_append_ssm` | Appends SSM-derived coordinates & uncertainty (lon, lat, x, y, x.se, y.se) to each record of the SMRU tag data files. |
| `smru_clean_diag` | Restructures the diag file for diagnostic plots. |
| `diagnostics` | Generates a map of all QC'd tracks & time-series plots of longitude & latitude for quick assessment of the SSM fit to the tag location datasets. |
| `smru_write_csv` | Combines QC-annotated tag data files across individual tags, applies tests for expected variable types and ranges (IMOS only for now), and writes the data files to CSV as final QC outputs. |

If a more complicated NRT workflow is required, e.g., with custom
processing in between the standard workflow steps, the functions can be
called separately from an R script so that intermediate results can be
checked. This is also the recommended approach for all delayed-mode
(supervised) QC workflows.

The main QC outputs are written to .CSV files in the specified output
directory, `output.dir`. Each .CSV file name includes the name of the
SMRU data table, when present (`ctd`, `diag`, `dive`, `haulout`,
`summary`), or of the QC file (`metadata`, `ssmoutputs`). For QC
workflows with ATN data, each of these file names is appended with the
species' `AnimalAphiaID` and the `ADRProjectID`.
For IMOS and other programs, the file names are appended with the SMRU
campaign ID (e.g., `ct182`).

The diagnostic time-series plots show the SSM fit (red) overlaid on the
tag-measured Argos &/or GPS locations (blue). The dark grey vertical
bars denote time periods when tags were actively recording locations
but the seal(s) either had not yet gone to sea (no recorded diving
activity - left side), or the CTD sensor had failed (e.g., grey bar on
right side of `tu123-Catherine-25`). By default, the QC model is not
fitted to data in these time periods. These plots help judge whether
the SSM fits have artefacts that need addressing, which is typically
done only during a delayed-mode QC workflow.

![](images/lat_coverage_tu123.jpg){width="100%"}

The map file shows the SSM-predicted tracks (blue) and the most recent
estimated location (red) for each deployed tag. The map file names
include the QC date so that they are not overwritten by successive QC
runs.

![](images/map_tu123_2025-11-10.jpg){width="100%"}

#### Output .CSV files

The main QC outputs, the .CSV files, contain all records from the
original SMRU data tables with the following additional columns
appended: `ssm_lat`, `ssm_lon`, `ssm_x`, `ssm_y`, `ssm_x_se`,
`ssm_y_se`. These are the QC'd locations and their uncertainty
estimates interpolated to the time of each record. The `ssm_x`, `ssm_y`
variables are the coordinates in the QC workflow projection (in km) and
`ssm_x_se`, `ssm_y_se` are the associated standard errors (in km).

Note that `NA`s may be present in the QC-appended location variables,
particularly at the start and/or end of individual tracks. These
typically indicate track portions before the animal went to sea (at
deployment start) and portions when either the CTD or pressure sensor
had failed, e.g., due to biofouling or seawater ingress, but the tag
still transmitted locations (near deployment end).

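
As a rough illustration of handling the `NA`s described above, the
sketch below flags records that lack a QC'd location. It is not part of
`ArgosQC`: the table is a toy stand-in written to a temporary file, the
`ref` column and all values are invented for illustration, and only the
`ssm_*` column names are taken from the description above.

```r
# Toy stand-in for a QC-annotated output table. All values are invented
# for illustration; `ref` is a hypothetical tag identifier column.
ctd <- data.frame(
  ref      = rep("tag-01", 5),
  ssm_lon  = c(NA, 70.21, 70.35, 70.52, NA),
  ssm_lat  = c(NA, -49.35, -49.61, -49.90, NA),
  ssm_x    = c(NA, 5210.4, 5221.9, 5236.2, NA),
  ssm_y    = c(NA, -6120.7, -6152.3, -6188.0, NA),
  ssm_x_se = c(NA, 1.8, 2.4, 3.1, NA),
  ssm_y_se = c(NA, 1.6, 2.2, 2.9, NA)
)

# Round-trip through a temporary CSV to mimic reading a QC output file
f <- tempfile(fileext = ".csv")
write.csv(ctd, f, row.names = FALSE)
qc <- read.csv(f)

# Flag records that have a complete set of QC-appended location variables
ssm_cols <- c("ssm_lon", "ssm_lat", "ssm_x", "ssm_y", "ssm_x_se", "ssm_y_se")
has_ssm <- complete.cases(qc[, ssm_cols])

# Number of records without a QC'd location, and the located subset
n_missing <- sum(!has_ssm)
qc_located <- qc[has_ssm, ]
```

Whether records without a QC'd location should be dropped or retained
depends on the analysis; the sketch only separates the two cases.
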

#### Metadata .CSV file

If an input deployment/tag metadata file is provided, then the output
metadata file contains all the original metadata records plus the
following variables describing the QC workflow applied to the data:

- `qc_start_date` - the track datetime (UTC) at which the QC workflow
  was started.
- `qc_end_date` - the track datetime (UTC) at which the QC workflow was
  ended.
- `qc_proj4string` - the projection used for QC'ing the locations, as a
  proj4string.
- `qc_method` - denotes that the `ArgosQC` R package was used.
- `qc_version` - denotes the version number of the `ArgosQC` R package
  used.
- `qc_run_date` - the datetime (UTC) when the QC was applied to the
  data.

Note that these variables are not appended to the metadata for IMOS QC
workflows due to IMOS-AODN metadata specifications.

#### SSMOutputs .CSV file

The SSMOutputs file contains the SSM-predicted locations at the
prediction interval specified by `time.step`. The time of the first
location is set to the time of the first tag-measured location passed
to the model. This may or may not be the first tag-measured location in
the tag data file, depending on whether the animal-borne tag was
immediately at sea.

The location coordinates are provided as `lon`, `lat`, `x`, `y`, and
the location uncertainty as `x_se`, `y_se`. The planar coordinates and
uncertainty estimates are always in km. Their coordinate projection is
provided in the metadata .CSV file (`qc_proj4string`).
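
One way the planar coordinates and their standard errors might be used
is sketched below: approximate 95% intervals around `x` and `y`, plus a
flag for poorly constrained predictions. This is an illustration rather
than an `ArgosQC` feature. It assumes the standard errors can be
treated as approximately Gaussian; the `date` column, the 6-hour
spacing, the 2.5 km threshold and all values are invented.

```r
# Toy stand-in for an SSMOutputs table. All values are invented for
# illustration; `date` is a hypothetical column name and the 6-h spacing
# is an arbitrary example of a prediction interval.
ssm <- data.frame(
  date = as.POSIXct("2025-01-01 00:00:00", tz = "UTC") + (0:3) * 6 * 3600,
  x    = c(5210.4, 5221.9, 5236.2, 5248.8),      # km
  y    = c(-6120.7, -6152.3, -6188.0, -6215.5),  # km
  x_se = c(1.8, 2.4, 3.1, 2.0),                  # km
  y_se = c(1.6, 2.2, 2.9, 1.9)                   # km
)

# Normal-approximation multiplier for a two-sided 95% interval
# (an assumption about the error distribution)
z <- qnorm(0.975)

# Approximate 95% interval bounds on each planar coordinate, in km
ssm$x_lwr <- ssm$x - z * ssm$x_se
ssm$x_upr <- ssm$x + z * ssm$x_se
ssm$y_lwr <- ssm$y - z * ssm$y_se
ssm$y_upr <- ssm$y + z * ssm$y_se

# Flag predictions whose larger standard error exceeds a hypothetical
# threshold of 2.5 km
ssm$max_se <- pmax(ssm$x_se, ssm$y_se)
uncertain <- ssm[ssm$max_se > 2.5, ]
```

Because the bounds are in the projected km units, they should be
interpreted in the projection recorded in `qc_proj4string`, not as
distances in degrees.
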