Overview

The MultIvariate sTatisTical procEss coNtrol (MITTEN) package is one of two Python packages (along with MASE) developed by a team of three college students studying Computational Modeling and Data Analytics, Computer Science, Mathematics, and Statistics. Any issues can be reported on GitHub.

mitten.apply_mewma(df, num_in_control, lambd=0.1, alpha=0, plotting=True, save='', plot_title='MEWMA')
Parameters
  • df – multivariate dataset as Pandas DataFrame

  • num_in_control – number of rows before anomalies begin

  • lambd – smoothing parameter between 0 and 1; lower values give more weight to older observations; default is 0.1

  • alpha – the percentage of false positives we want to allow, used for calculating the Upper Control Limit

  • save – the directory in which to save the graphs; if left at the default (''), nothing is saved

  • plot_title – the title for the plot generated

Returns

MEWMA statistic values and a calculated UCL with approximately alpha false positive rate
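
The statistic behind an MEWMA chart can be sketched as follows. This is a minimal illustration, not MITTEN's implementation: the helper name mewma_statistics is hypothetical, and the sketch assumes that the mean and covariance are estimated from the first num_in_control rows and that the exact (time-varying) covariance of the smoothed vector is used. It takes a NumPy array, such as the result of df.to_numpy().

```python
import numpy as np


def mewma_statistics(x, num_in_control, lambd=0.1):
    """Hypothetical helper: MEWMA statistic for each row of a 2-D array x."""
    x = np.asarray(x, dtype=float)
    in_control = x[:num_in_control]
    # Assumption: in-control mean and covariance come from the first rows.
    mean = in_control.mean(axis=0)
    cov = np.atleast_2d(np.cov(in_control, rowvar=False))
    z = np.zeros(x.shape[1])
    stats = np.empty(len(x))
    for t, row in enumerate(x, start=1):
        # Exponentially weighted average of the centered observations.
        z = lambd * (row - mean) + (1 - lambd) * z
        # Covariance of z at time t is scale * cov.
        scale = lambd / (2 - lambd) * (1 - (1 - lambd) ** (2 * t))
        stats[t - 1] = z @ np.linalg.solve(scale * cov, z)
    return stats


# Synthetic demo: 150 in-control rows followed by 50 rows with a mean shift.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
data[150:] += 3.0
mewma_stats = mewma_statistics(data, num_in_control=150)
```

With a small lambd, each new row moves the smoothed vector only slightly, so the statistic reacts slowly to a single outlier but accumulates evidence of a sustained shift.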

mitten.hotelling_t2(df, num_in_control, alpha=0, plotting=True, save='', plot_title='Hotellings T^2')
Parameters
  • df – multivariate dataset as Pandas DataFrame

  • num_in_control – number of in control observations before the anomalies start

  • alpha – the percentage of false positives we want to allow, used for calculating the Upper Control Limit

  • save – the directory in which to save the graphs; if left at the default (''), nothing is saved

  • plot_title – the title for the plot generated

Returns

Hotelling T^2 statistic values and a calculated UCL with approximately alpha false positive rate
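
As a rough illustration of what these return values represent, the sketch below computes a Hotelling T^2 statistic per row and an empirical Upper Control Limit. It is not MITTEN's code: both helper names are hypothetical, and taking the UCL as the (1 - alpha) quantile of the in-control statistics is an assumption about how an "approximately alpha" false positive rate could be obtained.

```python
import numpy as np


def hotelling_t2_statistics(x, num_in_control):
    """Hypothetical helper: Hotelling T^2 for each row of a 2-D array x."""
    x = np.asarray(x, dtype=float)
    in_control = x[:num_in_control]
    mean = in_control.mean(axis=0)
    cov = np.atleast_2d(np.cov(in_control, rowvar=False))
    centered = x - mean
    # Solve cov @ y = centered.T instead of forming the inverse explicitly,
    # then take the row-wise quadratic form.
    return np.einsum("ij,ji->i", centered, np.linalg.solve(cov, centered.T))


def empirical_ucl(stats, num_in_control, alpha):
    """Assumed UCL rule: (1 - alpha) quantile of the in-control statistics."""
    return np.quantile(stats[:num_in_control], 1 - alpha)


# Synthetic demo: 150 in-control rows followed by 50 shifted rows.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
data[150:] += 5.0
t2 = hotelling_t2_statistics(data, num_in_control=150)
ucl = empirical_ucl(t2, num_in_control=150, alpha=0.05)
in_control_rate = np.mean(t2[:150] > ucl)
shifted_rate = np.mean(t2[150:] > ucl)
```

Under this assumed rule, alpha=0 places the UCL at the largest in-control statistic, so no in-control row is flagged.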

mitten.interpret_multivariate_signal(df, stats_list, ucl, batch_size=5, n_most_likely=5, verbose=False)

Designed to interpret an out-of-control signal from a multivariate control chart method. Used to identify the source of the signal by examining each individual feature using successive t-tests. Because this method relies on t-tests, which compare means, it is ill-equipped to handle shifts in process variability.

Parameters
  • df – dataframe which stores the data to be tested, with features as columns and observations as rows.

  • stats_list – list of calculated statistics from an MSPC method

  • ucl – Upper Control Limit (returned by an MSPC method or chosen by the user)

  • batch_size – successive t-tests will be run on subsets of the potentially out of control segment of the dataset contained in df. The size of each subset (aka batch) is controlled by batch_size

  • n_most_likely – if verbose=True, controls how many of the most likely culprit features are printed

  • verbose – if True, prints the most likely culprit features

Returns

A ranked list of features (as a Pandas series) sorted from highest to lowest average t-statistic ranking.
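
The general idea of ranking features by batched t-tests can be sketched as below. This is only one plausible reading of the description above, not MITTEN's algorithm: the helper name rank_features_by_t_statistic is hypothetical, and the choices to start at the first statistic above ucl, to use Welch-style t-statistics against all earlier rows, and to average per-batch ranks are all assumptions.

```python
import numpy as np


def rank_features_by_t_statistic(x, stats, ucl, batch_size=5):
    """Hypothetical helper: column indices of x, most likely culprit first."""
    x = np.asarray(x, dtype=float)
    stats = np.asarray(stats, dtype=float)
    signals = np.flatnonzero(stats > ucl)
    # Assumption: at least two rows before the first signal are needed
    # to estimate a reference variance; batch_size must be at least 2.
    if signals.size == 0 or signals[0] < 2:
        return np.array([], dtype=int)
    start = signals[0]
    reference = x[:start]
    ref_mean = reference.mean(axis=0)
    ref_var = reference.var(axis=0, ddof=1)
    batch_ranks = []
    for lo in range(start, len(x) - batch_size + 1, batch_size):
        batch = x[lo:lo + batch_size]
        # Welch-style standard error of the difference in means, per feature.
        se = np.sqrt(ref_var / len(reference)
                     + batch.var(axis=0, ddof=1) / batch_size)
        t = np.abs(batch.mean(axis=0) - ref_mean) / se
        # Rank 1 goes to the feature with the largest |t| in this batch.
        batch_ranks.append(np.argsort(np.argsort(-t)) + 1)
    if not batch_ranks:
        return np.array([], dtype=int)
    # Sort features by average rank across batches (lowest average first).
    return np.argsort(np.mean(batch_ranks, axis=0))


# Synthetic demo: only feature 2 shifts after row 100.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4))
data[100:, 2] += 6.0
z = (data - data[:100].mean(axis=0)) / data[:100].std(axis=0, ddof=1)
chart_stats = (z ** 2).sum(axis=1)      # stand-in for an MSPC statistic
chart_ucl = chart_stats[:100].max()     # stand-in for a returned UCL
ranking = rank_features_by_t_statistic(data, chart_stats, chart_ucl)
```

Because each t-statistic compares means, a feature whose variance changes while its mean stays put would not rise in this ranking, which matches the limitation noted above.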

mitten.mcusum(df, num_in_control, k, alpha=0, plotting=True, save='', plot_title='MCUSUM')

Implementation of the Multivariate Cumulative Sum (MCUSUM) method.

Based on Kent (2007): https://etd.ohiolink.edu/rws_etd/send_file/send?accession=kent1185558637&disposition=inline

  • Reference 17 : (Jackson 1985)

  • Reference 5 : (Crosier 1988)

Parameters
  • df – multivariate dataset as Pandas DataFrame

  • num_in_control – number of in control observations

  • k – the slack parameter, which determines model sensitivity (should typically be set to 1/2 of the mean shift that you expect to detect)

  • alpha – the percentage of false positives we want to allow, used for calculating the Upper Control Limit

  • save – the directory in which to save the graphs; if left at the default (''), nothing is saved

  • plot_title – the title for the plot generated

Returns

MCUSUM statistic values and a calculated UCL with approximately alpha false positive rate
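
To show how the slack parameter k enters the calculation, here is a minimal sketch of the MCUSUM recursion in the form given by Crosier (1988). It is an illustration rather than MITTEN's code: the helper name mcusum_statistics is hypothetical, and estimating the mean and covariance from the first num_in_control rows is an assumption.

```python
import numpy as np


def mcusum_statistics(x, num_in_control, k):
    """Hypothetical helper: Crosier-style MCUSUM statistic for each row of x."""
    x = np.asarray(x, dtype=float)
    in_control = x[:num_in_control]
    # Assumption: in-control mean and covariance come from the first rows.
    mean = in_control.mean(axis=0)
    cov_inv = np.linalg.inv(np.atleast_2d(np.cov(in_control, rowvar=False)))
    s = np.zeros(x.shape[1])
    stats = np.empty(len(x))
    for t, row in enumerate(x):
        v = s + row - mean
        c = np.sqrt(v @ cov_inv @ v)
        # Reset to zero when the accumulated deviation is within the slack k;
        # otherwise shrink it toward zero by k in Mahalanobis length.
        s = np.zeros_like(s) if c <= k else v * (1 - k / c)
        stats[t] = np.sqrt(s @ cov_inv @ s)
    return stats


# Synthetic demo: 150 in-control rows followed by 50 shifted rows.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
data[150:] += 2.0
mcusum_stats = mcusum_statistics(data, num_in_control=150, k=0.5)
```

A larger k makes the cumulative sum reset more often, so small drifts are ignored and only larger shifts accumulate.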

mitten.pc_mewma(df, num_in_control, num_princ_comps, alpha=0, lambd=0.1, plotting=True, save='', plot_title='PC_MEWMA')

MEWMA on Principal Components. Variables contained in df must have mean 0.

Parameters
  • df – multivariate dataset as Pandas DataFrame

  • num_in_control – number of in control observations

  • num_princ_comps – number of principal components to include

  • alpha – the percentage of false positives we want to allow, used for calculating the Upper Control Limit

  • lambd – smoothing parameter between 0 and 1; lower values give more weight to older observations; default is 0.1

  • save – the directory in which to save the graphs; if left at the default (''), nothing is saved

  • plot_title – the title for the plot generated

Returns

MEWMA statistic values using PCA for dimensionality reduction and a calculated UCL with approximately alpha false positive rate
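
The dimensionality-reduction step can be sketched as follows. This is an illustration under stated assumptions, not MITTEN's implementation: the helper name project_onto_principal_components is hypothetical, and fitting the components on the first num_in_control rows is an assumption. Consistent with the mean-0 requirement above, the data are projected without further centering. The resulting scores would then be monitored with an MEWMA chart.

```python
import numpy as np


def project_onto_principal_components(x, num_in_control, num_princ_comps):
    """Hypothetical helper: scores of x on the leading principal components."""
    x = np.asarray(x, dtype=float)
    # Assumption: components are fitted on the in-control rows only.
    cov = np.atleast_2d(np.cov(x[:num_in_control], rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    # Keep the eigenvectors with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:num_princ_comps]
    return x @ eigvecs[:, order]


# Synthetic demo: four correlated, roughly mean-zero features.
rng = np.random.default_rng(0)
mixing = np.array([[1.0, 0.5, 0.2, 0.0],
                   [0.0, 1.0, 0.5, 0.2],
                   [0.0, 0.0, 1.0, 0.5],
                   [0.0, 0.0, 0.0, 1.0]])
data = rng.normal(size=(200, 4)) @ mixing
scores = project_onto_principal_components(data, num_in_control=150,
                                           num_princ_comps=2)
score_cov = np.cov(scores[:150], rowvar=False)
```

The in-control scores are uncorrelated by construction, so the chart run on them works with fewer, decorrelated variables.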