Conduct statistical testing on time-series feature classification performance to identify top features or compare entire sets
Source: R/compare_features.R
compare_features.Rd
Arguments
- data
list object containing the classification outputs produced by tsfeature_classifier
- metric
character denoting the classification performance metric to use in statistical testing. Can be one of "accuracy", "precision", "recall", or "f1". Defaults to "accuracy"
- by_set
Boolean specifying whether to compare feature sets (if TRUE) or individual features (if FALSE). Defaults to TRUE, but this is contingent on whether you computed by set or not in tsfeature_classifier
- hypothesis
character denoting the type of test to conduct. Use "null" to calculate p-values for each feature set or feature (depending on the by_set argument) individually relative to the null, which is available if use_null = TRUE was set in tsfeature_classifier. Use "pairwise" to conduct pairwise comparisons between each set or feature on the main model fits only. Defaults to "null"
- p_adj
character denoting the adjustment made to p-values for multiple comparisons. Should be a valid argument to stats::p.adjust. Defaults to "none" for no adjustment. "holm" is recommended as a starting point for adjustments
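As a minimal sketch of what the p_adj options do, the snippet below applies stats::p.adjust directly to a small vector of made-up p-values. The values in p_raw are hypothetical and are not output from compare_features; the snippet only illustrates how "none" and "holm" differ, not how compare_features applies the adjustment internally.

```r
# Hypothetical raw p-values, e.g. from three pairwise comparisons
# (illustrative only; not produced by compare_features)
p_raw <- c(0.01, 0.02, 0.03)

# Character vector of the methods that stats::p.adjust accepts
stats::p.adjust.methods

# "none" returns the p-values unchanged
p_none <- stats::p.adjust(p_raw, method = "none")

# "holm" multiplies the sorted p-values by 3, 2, 1 and enforces monotonicity
p_holm <- stats::p.adjust(p_raw, method = "holm")
print(p_holm)
#> [1] 0.03 0.04 0.04
```

Any string listed in stats::p.adjust.methods should be usable as the p_adj value.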
References
Henderson, T., Bryant, A. G., and Fulcher, B. D. Never a Dull Moment: Distributional Properties as a Baseline for Time-Series Classification. 27th Pacific-Asia Conference on Knowledge Discovery and Data Mining (2023).
Examples
library(theft)
features <- theft::calculate_features(theft::simData,
                                      group_var = "process",
                                      feature_set = NULL,
                                      features = list("mean" = mean, "sd" = sd))
#> No IDs removed. All value vectors good for feature extraction.
#> Running computations for user-supplied features...
#>
#> Calculations completed for user-supplied features.
classifiers <- classify(features,
                        by_set = FALSE,
                        n_resamples = 3)
#> Only one set of 'catch22', 'feasts', 'tsfeatures', or 'Kats' with potential duplicates is in your feature data. Exiting and returning original input data.
#> Fitting model 1/6
#> Fitting model 2/6
#> Fitting model 3/6
#> Fitting model 4/6
#> Fitting model 5/6
#> Fitting model 6/6
compare_features(classifiers,
                 by_set = FALSE,
                 hypothesis = "pairwise")
#> Calculating comparison 1/1
#>             hypothesis   names_a names_b   metric names_a_mean names_b_mean
#> 1 User_mean != User_sd User_mean User_sd accuracy    0.1851852    0.6148148
#>   t_statistic     p.value
#> 1   -8.043153 0.007554182