Compute features on an input time series dataset

Usage

calculate_features(
  data,
  id_var = "id",
  time_var = "timepoint",
  values_var = "values",
  group_var = NULL,
  feature_set = c("catch22", "feasts", "tsfeatures", "Kats", "tsfresh", "TSFEL"),
  catch24 = FALSE,
  tsfresh_cleanup = FALSE,
  features = NULL,
  seed = 123
)

Arguments

data

data.frame with at least 3 columns: an ID variable, a time variable, and a values variable, plus a group variable if one exists

id_var

character specifying the ID variable to identify each time series. Defaults to "id"

time_var

character specifying the time index variable. Defaults to "timepoint"

values_var

character specifying the values variable. Defaults to "values"

group_var

character specifying the grouping variable that each unique series sits under (if one exists). Defaults to NULL

feature_set

character string or character vector naming the set(s) of time-series features to calculate. Defaults to "catch22"

catch24

Boolean specifying whether to compute catch24 in addition to catch22 if catch22 is one of the feature sets selected. Defaults to FALSE

tsfresh_cleanup

Boolean specifying whether to apply the built-in tsfresh relevant-feature filter. Defaults to FALSE

features

named list of user-supplied functions to calculate on data. Each function should take a single argument, which is the time series. Each list entry must be named, because calculate_features uses these names to label the features. Defaults to NULL for no manually specified features. To skip the existing feature sets and compute only the functions passed to features, set feature_set = NULL

seed

integer denoting a fixed number for R's random number generator to ensure reproducibility. Defaults to 123
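
As an illustration of the column layout that the data argument and the *_var arguments describe, the following is a minimal sketch in base R. It assumes a long format with one row per observation; the object name toy_data, the series IDs, and the simulated values are hypothetical and are not part of the package.

```r
# Hypothetical toy dataset: two series of 50 observations each.
# Assumption: one row per (id, timepoint) pair, i.e. long format.
set.seed(123)
n_points <- 50

toy_data <- data.frame(
  id        = rep(c("series_1", "series_2"), each = n_points),          # matches id_var = "id"
  timepoint = rep(seq_len(n_points), times = 2),                        # matches time_var = "timepoint"
  values    = c(rnorm(n_points), cumsum(rnorm(n_points))),              # matches values_var = "values"
  process   = rep(c("Gaussian Noise", "Random Walk"), each = n_points)  # optional grouping column
)

# Inspect the structure before passing the data on
str(toy_data)
```

With column names like these, the defaults for id_var, time_var, and values_var would already match, and the grouping column would be named through group_var = "process".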

Value

object of class feature_calculations that contains the summary statistics for each feature

Author

Trent Henderson

Examples

featMat <- calculate_features(data = simData, 
  id_var = "id", 
  time_var = "timepoint", 
  values_var = "values", 
  group_var = "process", 
  feature_set = "catch22",
  seed = 123)
#> No IDs removed. All value vectors good for feature extraction.
#> Running computations for catch22...
#> Warning: There was 1 warning in `dplyr::reframe()`.
#>  In argument: `Rcatch22::catch22_all(.data$values, catch24 = catch24)`.
#>  In group 1: `id = "Gaussian Noise_1"`, `group = "Gaussian Noise"`.
#> Caused by warning:
#> ! As of 0.1.14 the feature 'CO_f1ecac' returns a double instead of int
#> This warning is displayed once per session.
#> 
#> Calculations completed for catch22.
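
The features argument described above expects a named list of single-argument functions. A minimal sketch of such a list follows, written in base R so it can be checked without the package. The names my_features, mean_abs_change, and value_range are hypothetical, and the sketch assumes each function should return a single numeric value, which the documentation does not state explicitly.

```r
# Hypothetical user-supplied features. Each function takes one argument:
# the vector of values for a single time series.
# Assumption: each function returns a single numeric value.
my_features <- list(
  mean_abs_change = function(y) mean(abs(diff(y))),  # average absolute step size
  value_range     = function(y) max(y) - min(y)      # spread between extremes
)

# Check the functions on a small toy series before using them
toy <- c(1, 3, 2, 5, 4)
toy_values <- vapply(my_features, function(f) f(toy), numeric(1))
print(toy_values)
```

A list like this could then be supplied as features = my_features, together with feature_set = NULL if only the custom functions should be computed.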