Overview
The MultIvariate sTatisTical procEss coNtrol (MITTEN) package is one of two Python packages (along with MASE) developed by a team of three college students studying Computational Modeling and Data Analytics, Computer Science, Mathematics, and Statistics. Any issues can be reported on GitHub.
-
mitten.apply_mewma(df, num_in_control, lambd=0.1, alpha=0, plotting=True, save='', plot_title='MEWMA')
- Parameters
df – multivariate dataset as Pandas DataFrame
num_in_control – number of rows before anomalies begin
lambd – smoothing parameter between 0 and 1; lower value = higher weight to older observations; default is 0.1
alpha – the percentage of false positives we want to allow, used for calculating the Upper Control Limit
save – the directory to save the graphs to; if left at the default, nothing is saved
plot_title – the title for the plot generated
- Returns
MEWMA statistic values and a calculated UCL with approximately alpha false positive rate
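To make the role of lambd concrete, here is a minimal NumPy sketch of the textbook MEWMA recursion. It is not taken from the MITTEN source: the helper name mewma_statistics is hypothetical, the in-control mean and covariance are estimated from the first num_in_control rows, and the exact time-varying covariance of the smoothed vector is assumed. The package may differ in these details and in how it calibrates the UCL.

```python
import numpy as np


def mewma_statistics(x, num_in_control, lambd=0.1):
    """Hypothetical helper: textbook MEWMA T^2 statistic for every row of x."""
    x = np.asarray(x, dtype=float)
    # Estimate the in-control mean and covariance from the first rows only.
    mean = x[:num_in_control].mean(axis=0)
    cov = np.cov(x[:num_in_control], rowvar=False)

    z = np.zeros(x.shape[1])  # smoothed vector, Z_0 = 0
    stats = np.empty(len(x))
    for i, row in enumerate(x, start=1):
        # Exponentially weighted update: a small lambd keeps more of the past.
        z = lambd * (row - mean) + (1.0 - lambd) * z
        # Scale factor of the exact covariance of Z_i under the in-control model.
        scale = lambd / (2.0 - lambd) * (1.0 - (1.0 - lambd) ** (2 * i))
        stats[i - 1] = z @ np.linalg.solve(scale * cov, z)
    return stats


# Synthetic data (an assumption for illustration): 200 in-control rows,
# then a mean shift of one standard deviation in every variable.
rng = np.random.default_rng(0)
data = rng.normal(size=(300, 3))
data[200:] += 1.0
t2 = mewma_statistics(data, num_in_control=200)
```

Because each statistic pools information from earlier rows, the values after the shift grow well beyond the in-control level even though the shift is small for any single observation.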
-
mitten.hotelling_t2(df, num_in_control, alpha=0, plotting=True, save='', plot_title='Hotellings T^2')
- Parameters
df – multivariate dataset as Pandas DataFrame
num_in_control – number of in control observations before the anomalies start
alpha – the percentage of false positives we want to allow, used for calculating the Upper Control Limit
save – the directory to save the graphs to; if left at the default, nothing is saved
plot_title – the title for the plot generated
- Returns
Hotelling T^2 statistic values and a calculated UCL with approximately alpha false positive rate
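As an illustration of what the statistic and the UCL represent, the sketch below computes Hotelling's T^2 with NumPy and sets the UCL to the empirical (1 - alpha) quantile of the in-control statistics. Both the helper name hotelling_t2_statistics and the quantile rule are assumptions made for this example; the package's own UCL calculation is not documented here and may differ.

```python
import numpy as np


def hotelling_t2_statistics(x, num_in_control):
    """Hypothetical helper: Hotelling T^2 statistic for every row of x."""
    x = np.asarray(x, dtype=float)
    # In-control mean and inverse covariance from the first rows.
    mean = x[:num_in_control].mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(x[:num_in_control], rowvar=False))
    centered = x - mean
    # Quadratic form (x_i - mean)' S^{-1} (x_i - mean) for all rows at once.
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)


# Synthetic data (an assumption for illustration): 200 in-control rows,
# then 50 rows with a large mean shift.
rng = np.random.default_rng(1)
data = rng.normal(size=(250, 4))
data[200:] += 3.0
t2 = hotelling_t2_statistics(data, num_in_control=200)

alpha = 0.01
# Assumed UCL rule: empirical (1 - alpha) quantile of the in-control statistics.
ucl = np.quantile(t2[:200], 1.0 - alpha)
false_positive_rate = np.mean(t2[:200] > ucl)
detection_rate = np.mean(t2[200:] > ucl)
```

With this rule, roughly a fraction alpha of the in-control rows sit above the UCL by construction, which is one way to read "approximately alpha false positive rate".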
-
mitten.interpret_multivariate_signal(df, stats_list, ucl, batch_size=5, n_most_likely=5, verbose=False)
Designed to interpret an out-of-control signal from a multivariate control chart method. Used to identify the source of the signal by examining each individual feature using successive t-tests. Since this method relies on t-tests, it is ill-equipped to handle shifts in process variability.
- Parameters
df – dataframe which stores the data to be tested, with features as columns and observations as rows.
stats_list – list of calculated statistics from an MSPC method
ucl – Upper Control Limit (returned by an MSPC method or chosen by the user)
batch_size – successive t-tests will be run on subsets of the potentially out-of-control segment of the dataset contained in df. The size of each subset (aka batch) is controlled by batch_size
n_most_likely – if verbose=True then this controls how many of the most likely observations will be printed.
verbose – if True, prints the most likely culprit features
- Returns
A ranked list of features (as a Pandas series) sorted from highest to lowest average t-statistic ranking.
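The batch-wise t-test idea can be sketched as follows. This is one plausible reading of the description above, not the package's implementation: the helper rank_culprit_features is hypothetical, the out-of-control segment is assumed to start at the first statistic above the UCL, each batch is compared with the mean of the rows before that point using a one-sample t-statistic, and features are ordered by their average rank across batches.

```python
import numpy as np


def rank_culprit_features(x, stats_list, ucl, batch_size=5):
    """Hypothetical helper: feature indices, most to least suspicious."""
    x = np.asarray(x, dtype=float)
    stats = np.asarray(stats_list, dtype=float)
    above = np.flatnonzero(stats > ucl)
    if above.size == 0 or above[0] < 2:
        return np.array([], dtype=int)  # no signal, or no baseline to compare to

    start = above[0]                    # assumed start of the shifted segment
    base_mean = x[:start].mean(axis=0)  # per-feature in-control mean
    suspect = x[start:]

    rank_sum = np.zeros(x.shape[1])
    n_batches = 0
    for lo in range(0, len(suspect) - batch_size + 1, batch_size):
        batch = suspect[lo:lo + batch_size]
        # One-sample t-statistic of each feature against its in-control mean.
        std_err = batch.std(axis=0, ddof=1) / np.sqrt(batch_size)
        t = (batch.mean(axis=0) - base_mean) / std_err
        # Rank 0 goes to the feature with the largest absolute t-statistic.
        rank_sum += np.argsort(np.argsort(-np.abs(t)))
        n_batches += 1
    if n_batches == 0:
        return np.array([], dtype=int)
    return np.argsort(rank_sum / n_batches)


# Synthetic data (an assumption for illustration): only feature 2 shifts.
rng = np.random.default_rng(2)
data = rng.normal(size=(200, 3))
data[100:, 2] += 8.0
stats = (data ** 2).sum(axis=1)  # stand-in for a statistic from an MSPC method
ranking = rank_culprit_features(data, stats, ucl=30.0)
```

Because the comparison is on batch means, a change that affects only the variance of a feature would leave these t-statistics largely unchanged, which matches the limitation noted in the description.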
-
mitten.mcusum(df, num_in_control, k, alpha=0, plotting=True, save='', plot_title='MCUSUM')
Implementation of the Multivariate Cumulative Sum (MCUSUM) method.
Based on Kent (2007) https://etd.ohiolink.edu/rws_etd/send_file/send?accession=kent1185558637&disposition=inline
Reference 17: (Jackson 1985)
Reference 5: (Crosier 1988)
- Parameters
df – multivariate dataset as Pandas DataFrame
num_in_control – number of in control observations
k – the slack parameter which determines model sensitivity (should typically be set to 1/2 of the mean shift that you expect to detect)
alpha – the percentage of false positives we want to allow, used for calculating the Upper Control Limit
save – the directory to save the graphs to; if left at the default, nothing is saved
plot_title – the title for the plot generated
- Returns
MCUSUM statistic values and a calculated UCL with approximately alpha false positive rate
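To show how the slack parameter k acts, here is a minimal NumPy sketch of the MCUSUM recursion in the form usually attributed to Crosier (1988). The helper name mcusum_statistics is hypothetical, and estimating the mean and covariance from the first num_in_control rows is an assumption; the package's implementation may differ.

```python
import numpy as np


def mcusum_statistics(x, num_in_control, k):
    """Hypothetical helper: Crosier-style MCUSUM statistic for every row of x."""
    x = np.asarray(x, dtype=float)
    mean = x[:num_in_control].mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(x[:num_in_control], rowvar=False))

    s = np.zeros(x.shape[1])  # cumulative sum vector, S_0 = 0
    stats = np.empty(len(x))
    for i, row in enumerate(x):
        candidate = s + row - mean
        # Mahalanobis length of the candidate cumulative sum.
        c = np.sqrt(candidate @ cov_inv @ candidate)
        # Reset to zero when within the slack k, otherwise shrink by k.
        s = np.zeros_like(s) if c <= k else candidate * (1.0 - k / c)
        stats[i] = np.sqrt(s @ cov_inv @ s)
    return stats


# Synthetic data (an assumption for illustration): a shift of 1.0 in each of
# three independent unit-variance variables has Mahalanobis size sqrt(3).
rng = np.random.default_rng(3)
data = rng.normal(size=(300, 3))
data[200:] += 1.0
stats = mcusum_statistics(data, num_in_control=200, k=0.5 * np.sqrt(3.0))
```

Here k is set to half the size of the shift, following the guidance in the parameter description. A larger k discards more of each step and makes the chart less sensitive to small shifts.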
-
mitten.pc_mewma(df, num_in_control, num_princ_comps, alpha=0, lambd=0.1, plotting=True, save='', plot_title='PC_MEWMA')
MEWMA on principal components. Variables contained in df must have mean 0.
- Parameters
df – multivariate dataset as Pandas DataFrame
num_in_control – number of in control observations
num_princ_comps – number of principal components to include
alpha – the percentage of false positives we want to allow, used for calculating the Upper Control Limit
lambd – smoothing parameter between 0 and 1; lower value = higher weight to older observations; default is 0.1
save – the directory to save the graphs to; if left at the default, nothing is saved
plot_title – the title for the plot generated
- Returns
MEWMA statistic values using PCA for dimensionality reduction and a calculated UCL with approximately alpha false positive rate
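The combination of PCA and MEWMA can be sketched as below. This is an illustrative reading rather than the package's code: the helper pc_mewma_statistics is hypothetical, the principal components are assumed to come from an eigendecomposition of the in-control covariance, and the input is assumed to be centered already, in line with the mean-0 requirement above.

```python
import numpy as np


def pc_mewma_statistics(x, num_in_control, num_princ_comps, lambd=0.1):
    """Hypothetical helper: MEWMA statistic on the leading PC scores of x."""
    x = np.asarray(x, dtype=float)  # columns are assumed to have mean 0
    cov = np.cov(x[:num_in_control], rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the components with the largest in-control variance.
    order = np.argsort(eigvals)[::-1][:num_princ_comps]
    scores = x @ eigvecs[:, order]
    variances = eigvals[order]      # in-control variance of each score

    z = np.zeros(num_princ_comps)
    stats = np.empty(len(x))
    for i, row in enumerate(scores, start=1):
        z = lambd * row + (1.0 - lambd) * z
        scale = lambd / (2.0 - lambd) * (1.0 - (1.0 - lambd) ** (2 * i))
        # Scores are uncorrelated in control, so the covariance is diagonal.
        stats[i - 1] = np.sum(z ** 2 / (scale * variances))
    return stats


# Synthetic data (an assumption for illustration): five variables driven by
# two latent factors, with a shift in the first factor after row 200.
rng = np.random.default_rng(4)
mixing = rng.normal(size=(2, 5))
data = rng.normal(size=(300, 2)) @ mixing + 0.1 * rng.normal(size=(300, 5))
data[200:] += 1.5 * mixing[0]
data -= data[:200].mean(axis=0)     # center using the in-control rows
t2 = pc_mewma_statistics(data, num_in_control=200, num_princ_comps=2)
```

Working on a few principal components keeps the covariance matrix small and well conditioned, at the cost of missing shifts that lie entirely in the discarded directions.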