Perform cluster analysis of time series using their feature vectors
Arguments
- data
feature_calculations
object containing the raw feature matrix produced by theft::calculate_features
- norm_method
character
denoting the rescaling/normalising method to apply. Can be one of "zScore", "Sigmoid", "RobustSigmoid", "MinMax", or "MaxAbs". Defaults to "zScore"
- unit_int
Boolean
whether to rescale into the unit interval [0,1] after applying the normalisation method. Defaults to FALSE
- clust_method
character
specifying the clustering algorithm to use. Can be one of "kmeans" for k-means clustering, "hclust" for hierarchical clustering, or "mclust" for Gaussian mixture model clustering. Defaults to "kmeans"
- k
integer
denoting the number of clusters to extract. Defaults to 2
- features
character
vector denoting the names of time-series features to use in the clustering algorithm. Defaults to NULL for no feature filtering and usage of the entire feature matrix
- na_removal
character
defining the way to deal with NAs produced during feature calculation. Can be one of "feature" or "sample". "feature" removes all features that produced any NAs in any sample, keeping the number of samples the same. "sample" omits all samples that produced at least one NA. Defaults to "feature"
- seed
integer
to fix R's random number generator to ensure reproducibility. Defaults to 123
- ...
arguments to be passed to stats::kmeans, stats::hclust, or mclust::Mclust, depending on the selection in clust_method
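The difference between the two na_removal options can be illustrated with a small base R sketch. The matrix `feat_mat` and its row and column names are hypothetical, and the code only mirrors the behaviour described above; it is not the package's internal implementation.

```r
# Hypothetical feature matrix: 3 samples (rows) by 3 features (columns).
# matrix() fills column by column, so "f1" is (1, 2, NA): one NA in sample "s3".
feat_mat <- matrix(
  c(1, 2, NA,
    4, 5, 6,
    7, 8, 9),
  nrow = 3,
  dimnames = list(c("s1", "s2", "s3"), c("f1", "f2", "f3"))
)

# "feature"-style removal: drop every feature (column) that contains any NA,
# so the number of samples stays the same
by_feature <- feat_mat[, colSums(is.na(feat_mat)) == 0, drop = FALSE]

# "sample"-style removal: drop every sample (row) with at least one NA,
# so the number of features stays the same
by_sample <- feat_mat[stats::complete.cases(feat_mat), , drop = FALSE]

dim(by_feature)  # 3 samples, 2 features
dim(by_sample)   # 2 samples, 3 features
```

Which option is preferable likely depends on whether samples or features are scarcer in the data at hand.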
Value
object of class feature_cluster containing the clustering algorithm and a tidy version of clusters joined to the input dataset ready for further analysis
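As a rough illustration of what "a clustering algorithm plus a tidy version of clusters" can look like, the following base R sketch fits k-means to a simulated matrix and builds a one-row-per-series table. The names `feat_mat`, `tidy_clusters`, `model`, and `data` are hypothetical; the actual structure of a feature_cluster object may differ.

```r
set.seed(123)  # fix the RNG, since k-means uses random starting centres

# Hypothetical stand-in for a normalised feature matrix:
# 20 time series (rows) by 3 features (columns)
feat_mat <- matrix(
  rnorm(60),
  nrow = 20,
  ncol = 3,
  dimnames = list(paste0("ts_", 1:20), c("f1", "f2", "f3"))
)

# Fit the clustering model (k-means with k = 2)
fit <- stats::kmeans(feat_mat, centers = 2)

# Tidy table: one row per time series with its assigned cluster
tidy_clusters <- data.frame(
  id = rownames(feat_mat),
  cluster = unname(fit$cluster)
)

# Bundle the fitted model and the tidy table (element names are assumptions)
result <- list(model = fit, data = tidy_clusters)
head(result$data)
```

Keeping the fitted model alongside the tidy table means both the cluster assignments and the model diagnostics remain available for later analysis.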
Examples
library(theft)
features <- theft::calculate_features(theft::simData,
  group_var = "process",
  feature_set = "catch22")
#> No IDs removed. All value vectors good for feature extraction.
#> Running computations for catch22...
#>
#> Calculations completed for catch22.
clusts <- cluster(features,
  k = 6)