
Instituto Tecnológico y de Estudios Superiores de Monterrey

Campus Monterrey

School of Engineering and Sciences

A Phase I Nonparametric Shewhart-Type Chart Based on Sequential Normal Scores

A thesis presented by

Guillermo Hernández Zamudio

Submitted to the

School of Engineering and Sciences

in partial fulfillment of the requirements for the degree of Master of Science

in Manufacturing Systems

Monterrey, Nuevo León, June 15th, 2020


The committee members hereby certify that they have read the dissertation presented by Guillermo Hernández Zamudio and that it is fully adequate in scope and quality as a partial requirement for the degree of Master of Science in Manufacturing Systems.

Dr. Víctor Gustavo Tercero Gómez
Tecnologico de Monterrey, School of Engineering and Sciences
Principal Advisor

Dr. William Jay Conover
Texas Tech University
Co-Advisor

Dr. Alvaro Eduardo Cordero Franco
Universidad Autónoma de Nuevo León
Committee Member

Dr. Luis Alejandro Benavides Vázquez
Tecnologico de Monterrey
Committee Member

Dr. Rúben Morales Menéndez
School of Engineering and Sciences
Dean of Graduate Studies

Monterrey, Nuevo León, June 15th, 2020


Declaration of Authorship

I, Guillermo Hernández Zamudio, declare that this thesis, titled “A Phase I Nonparametric Shewhart-Type Chart Based on Sequential Normal Scores,” and the work presented in it are my own. I confirm that:

• This work was done wholly or mainly while in candidature for a research degree at this University.

• Where any part of this thesis has previously been submitted for a degree or any other qualification at this University or any other institution, this has been clearly stated.

• Where I have consulted the published work of others, this is always clearly attributed.

• Where I have quoted from the work of others, the source is always given. With the exception of such quotations, this dissertation is entirely my own work.

• I have acknowledged all main sources of help.

• Where the dissertation is based on work done by myself jointly with others, I have made clear exactly what was done by others and what I have contributed myself.

Guillermo Hernández Zamudio

Monterrey, Nuevo León, June 15th, 2020

©2020 by Guillermo Hernández Zamudio

All rights reserved


Dedication

To my parents, Mr. Víctor and Mrs. Bertha, who have always been by my side in each step I have taken, demanding the best from me, not letting me settle for less than I deserve, and teaching me that the best things in life will not be easy to pursue and will require major endeavors, but that they are totally worth it. For being my very first role models and pushing me to never surrender.

To my siblings, Víctor, Bertha, and Grecia, the smartest people that I know. They have all been excellent models who inspire me.

To my grandparents, to those watching me from heaven and those who I am still lucky to hug. To my aunts who have been an extension of my mother, and to the newest members of the family.

And for all those people who dream big and dare to make their dreams come true.


Acknowledgments

I would like to express my gratitude to all the people that made this achievement possible.

Thanks to my parents and my siblings for believing in me, for all the support and motivation, because in dark times they brought light to my life.

To my advisors, professors, and all professional staff for sharing their knowledge and guidance, always demanding the best of me.

To my friends (and my friends’ parents too) who walked alongside me on this journey.

Special acknowledgment to Tecnologico de Monterrey and CONACyT for trusting and accepting me as part of this program, and for all the economic support provided.

Thanks.


A Phase I Nonparametric Shewhart-Type Chart Based on Sequential Normal Scores.

By

Guillermo Hernández Zamudio

Abstract

Nonparametric statistical methods are gaining importance in industrial process monitoring because of their robustness to the underlying distribution of the data, which frequently departs from normality in real industrial processes. Control charts are regularly used to monitor the behavior of a system over time, often assuming a normal distribution; the accuracy of the results therefore depends on whether those assumptions hold. Nonparametric solutions based on permutations are limited to small samples because of their computational complexity. Approaches based on rank transformations have shown relatively high power; however, their use in the analysis of series, such as control chart monitoring, involves re-ranking calculations that may become too complex when facing large data flows. This can be avoided by restricting the incorporation of new data into the analysis, at the expense of losing power. Sequential rank transformations have shown attractive properties in terms of power and computational complexity, and the normal scores variant has reduced the analytical complexity, extending its applicability by adapting parametric approaches that assume normality. This thesis proposes the use of sequential normal scores (SNS) for industrial process monitoring and compares its performance over a wide variety of practical situations and against other nonparametric alternatives. The performance showed robustness over different distributions, in terms of the Empirical Alarm Probability (EAP), and an increase in power as new observations were incorporated in the analysis.

Keywords

Empirical Alarm Probability, False Alarm Probability, Nonparametric, Phase I, Sequential Ranks, Statistical Process Control


List of Figures

1.1 Number of papers published per year by research topic. Data obtained from ScienceDirect database.

1.2 Fitted distribution of real data observations.

1.3 Anderson-Darling normality test results.

1.4 Fitted distribution of transformed observations.

1.5 $\bar{X}$ control chart over real data set.

1.6 Zone chart and new control limits using Johnson’s transformed observations (Aichouni et al., 2014).

2.1 Technology pillars of an Industry 4.0 system. Saturno et al. (2017).

2.2 Elements of a control chart.

4.1 Probability density function of a Normal(0,1) distribution compared with a T(3) distribution.

4.2 Probability density function of a Normal(0,1) distribution compared with a Gamma(1,0.5) distribution.

4.3 In-control comparison graph. N(0,1) process.

4.4 In-control comparison graph. T(3) process.

4.5 In-control comparison graph. Gamma(1,0.5) process.

4.6 Power comparison graph. Normal(0,1) process. Sustained shift, m=10. The works considered are the sequential normal scores (SNS) represented by a continuous line with a dot as a marker, the mean rank (MR) with a dashed line with a triangle as a marker, the $\bar{X}$ (XB) represented by a dot-dashed line with an x as a marker, and the median chart (Med) represented by a long-dash line with a dot as a marker.

4.7 Power comparison graph. Normal(0,1) process. Sustained shift, m=20. SNS vs Mean Rank vs Median Chart.

4.8 Power comparison graph. Normal(0,1) process. Isolated shift, m=10. SNS vs Mean Rank vs Median Chart.

4.9 Power comparison graph. T(3) process. Sustained shift, m=10. SNS vs Mean Rank vs Median Chart.

4.10 Power comparison graph. T(3) process. Sustained shift, m=20. SNS vs Mean Rank vs Median Chart.

4.11 Power comparison graph. T(3) process. Isolated shift, m=10. SNS vs Mean Rank vs Median Chart.

4.12 Power comparison graph. Gamma(1,0.5) process. Sustained shift, m=10. SNS vs Mean Rank vs Median Chart.

4.13 Power comparison graph. Gamma(1,0.5) process. Sustained shift, m=20. SNS vs Mean Rank vs Median Chart.

4.14 Power comparison graph. Gamma(1,0.5) process. Isolated shift, m=10. SNS vs Mean Rank vs Median Chart.

4.15 Power comparison graph. Normal(0,1) process. Sustained shift, m=10. The works considered are the sequential normal scores (SNS) represented by a continuous line with a dot as a marker and the pooled sequential scores (pSNS) with a dashed line with a triangle as a marker.

4.16 Power comparison graph. Normal(0,1) process. Sustained shift [8:10], m=10. pSNS vs SNS.

4.17 Power comparison graph. Normal(0,1) process. Sustained shift, m=10. The works considered are the sequential normal scores (SNS) represented by a continuous line with a dot as a marker, the pooled sequential scores (pSNS) with a dashed line with a triangle as a marker, and the mean rank (MR) represented by a dot-dashed line with an x as a marker.

4.18 Power comparison graph. T(3) process. Sustained shift, m=10. pSNS vs SNS vs Mean Rank.

4.19 Power comparison graph. Gamma(1,0.5) process. Sustained shift, m=10. pSNS vs SNS vs Mean Rank.

4.20 Power comparison graph. Normal(0,1) process. Isolated shift, m=10. pSNS vs SNS vs Mean Rank.

4.21 Power comparison graph. T(3) process. Isolated shift, m=10. pSNS vs SNS vs Mean Rank.

4.22 Power comparison graph. Gamma(0.5) process. Isolated shift, m=10. pSNS vs SNS vs Mean Rank.

5.1 a) Example of a $\bar{X}$ chart for m=20, n=3 and $FAP_0$ = 0.10. b) Example of a SNS chart for m=20, n=3 and $FAP_0$ = 0.10. c) Example of a pSNS chart for m=20, n=3 and $FAP_0$ = 0.10. Data from construction industry.

List of Tables

2.1 Mean Rank Chart control limits

2.2 Median Chart. Attained FAP values and the control limits (a,b) for $FAP_0$ = 0.10

3.1 Phase I SNS Chart control limits

3.2 Phase I pSNS Chart control limits

4.1 Testing scenarios for experimentation model. Part 1.

4.2 Testing scenarios for experimentation model. Part 2.

4.3 IC performance of charts for different distributions.

4.4 EAP for different shift and subgroup sizes, for m = 10, sustained shift, Normal(0,1) distribution. SNS vs $\bar{X}$, Mean Rank, and Median Charts.

4.5 EAP for different shift and subgroup sizes, for m = 20, sustained shift, Normal(0,1) distribution. SNS vs $\bar{X}$ and Mean Rank.

4.6 EAP for different shift and subgroup sizes, for m = 10, isolated shift, Normal(0,1) distribution. SNS vs $\bar{X}$, Mean Rank, and Median Charts.

4.7 EAP for different shift and subgroup sizes, for m = 10, sustained shift, T(3) distribution. SNS vs Mean Rank and Median Charts.

4.8 EAP for different shift and subgroup sizes, for m = 20, sustained shift, T(3) distribution. SNS vs Mean Rank Charts.

4.9 EAP for different shift and subgroup sizes, for m = 10, isolated shift, T(3) distribution. SNS vs Mean Rank and Median Charts.

4.10 EAP for different shift and subgroup sizes, for m = 10, sustained shift, Gamma(1,0.5) distribution. SNS vs Mean Rank and Median Charts.

4.11 EAP for different shift and subgroup sizes, for m = 20, sustained shift, Gamma(1,0.5) distribution. SNS vs Mean Rank Chart.

4.12 EAP for different shift and subgroup sizes, for m = 10, isolated shift, Gamma(1,0.5) distribution. SNS vs Mean Rank and Median Charts.

4.13 EAP for different shift and subgroup sizes, for m = 10, sustained shift, Normal(0,1) distribution. pSNS vs SNS and Mean Rank Charts.

4.14 EAP for different shift and subgroup sizes, for m = 10, sustained shift, T(3) distribution. pSNS vs SNS and Mean Rank Charts.

4.15 EAP for different shift and subgroup sizes, for m = 10, sustained shift, Gamma(1,0.5) distribution. pSNS vs SNS and Mean Rank Charts.

4.16 EAP for different shift and subgroup sizes, for m = 10, isolated shift, Normal(0,1) distribution. pSNS vs SNS and Mean Rank Charts.

4.17 EAP for different shift and subgroup sizes, for m = 10, isolated shift, T(3) distribution. pSNS vs SNS and Mean Rank Charts.

4.18 EAP for different shift and subgroup sizes, for m = 10, isolated shift, Gamma(1,0.5) distribution. pSNS vs SNS and Mean Rank Charts.

5.1 Data for compressive strength for Mixed Concrete (Kgf/cm2)

5.2 Sequential ranks obtained from data for compressive strength for Mixed Concrete (Kgf/cm2)

5.3 Rankits and Z scores obtained from data for compressive strength for Mixed Concrete (Kgf/cm2)

5.4 Sequential ranks obtained from data for compressive strength for Mixed Concrete (Kgf/cm2). Backward set.

5.5 Rankits obtained from data for compressive strength for Mixed Concrete (Kgf/cm2). Forward and Backward order.

5.6 Z scores and pooled Z scores obtained from data for compressive strength for Mixed Concrete (Kgf/cm2). Forward and Backward order.

Contents

1 Introduction

    1.1 Motivation

    1.2 Problem statement

    1.3 Research questions

    1.4 Research hypotheses

    1.5 Research purposes and objectives

    1.6 Scope and limitations

    1.7 Contributions

2 Background and literature review

    2.1 Key Definitions

        2.1.1 Industry 4.0

        2.1.2 Statistical process control

        2.1.3 Control Charts

        2.1.4 Shewhart control chart

        2.1.5 Evaluation of performance

    2.2 Literature review

        2.2.1 Gap analysis

        2.2.2 Parametric control charts for mean monitoring

        2.2.3 Nonparametric control charts for mean monitoring

        2.2.4 Competing charts

3 Methodology

    3.1 Sequential Normal Scores

        3.1.1 Individual Observations

        3.1.2 Batched observations

    3.2 Pooled Sequential Normal Scores

    3.3 Shewhart control chart for batched observations using SNS

    3.4 Shewhart control chart for batched observations using pSNS

    3.5 Control limits

4 Experimental Design and Results

    4.1 Results

        4.1.1 In Control analysis

        4.1.2 Out of control analysis

5 Example

    5.1 $\bar{X}$ chart

    5.2 SNS chart

    5.3 pSNS chart

    5.4 Results

6 Conclusions


Chapter 1 Introduction

Quality assurance is one of the most demanding and important tasks within industry. Commercial success is closely related to the quality of products or services; as a result, enterprises are becoming increasingly interested in, and committed to, providing high-quality products.

The definition of quality can be somewhat subjective, since it depends on the characteristics of the product or service itself. In most cases, quality is related to the fulfillment of specifications provided by the customer, which makes it quantifiable and, consequently, possible to control and monitor.

Throughout history, different authors have developed methods and tools for quality control. One of the most widely used approaches to achieve high quality standards is statistical process control (SPC).

As the Industry 4.0 era arises, factories are becoming smarter, in the sense that processes become more controllable as the number of sensors involved in them grows. Consequently, more extensive data streams become available for analysis. The importance of this topic is reflected in the number of related articles published in recent years. Figure 1.1 shows that research topics related to SPC, such as Industry 4.0, control charts, and data science, have been gaining relevance since 2012, showing a positive trend and, therefore, an increasing interest among researchers.

Figure 1.1: Number of papers published per year by research topic. Data obtained from ScienceDirect database.

There are two phases in an SPC implementation. During Phase I, a process is assessed to determine its stability, that is, whether the process is in control (IC). The next stage, Phase II, addresses the problem of monitoring the process to detect any change in conditions, that is, to check whether the process is out of control (OC). The elementary tool of SPC is the control chart. Control charts allow production engineers to detect out-of-control observations, provided the charts are accurately constructed. The effectiveness of a Phase II control chart depends to a great extent on the proper construction of a prior Phase I control chart to evaluate the state of control, and on the accuracy of the estimates of the process parameters. If out-of-control observations are detected promptly, processes can be stopped for corrective maintenance to avoid producing nonconforming items that would otherwise result in waste for the company.

Basic control charts work properly under the assumption that the data follow a normal distribution, but may become inefficient under other conditions. To overcome this problem, basic control charts need to be modified, yielding the so-called nonparametric or distribution-free control charts and opening a whole new direction in the use of SPC. Most nonparametric control charts involve statistical transformations of the data. Some statistics are based on permutations, but this approach is limited to small samples because of the computational complexity of the method. Rank transformations have shown great detection power and robustness against outliers, but their accuracy may be affected when the subgroup sizes are small. Sequential ranks, where only the most recent observations are ranked, have also shown good power, while eliminating the dependence between observations in the same subgroup.

In 2017, Conover et al. suggested taking advantage of the relationship between ranks and estimators of a cumulative probability to replace the ranks by normal scores, thus defining, for the first time, the Sequential Normal Scores (SNS). The result is a powerful and convenient method to analyze non-normal data sequences. However, nonparametric control charts have not yet been fully adopted in practice, either because practitioners are unfamiliar with these methods and their interpretation, or because the traditional parametric approach is easier to use.
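As a rough illustration of the idea, the sketch below computes sequential normal scores for individual observations in Python. It assumes the plotting position $(R_i - 0.5)/i$, where $R_i$ is the rank of the $i$-th observation among the first $i$ observations, and it assumes midranks for ties. Both choices, as well as the function name `sequential_normal_scores`, are assumptions made for illustration; the formal definition used in this thesis is given in Chapter 3.

```python
from statistics import NormalDist


def sequential_normal_scores(xs):
    """Return one normal score per observation (illustrative sketch).

    Assumption: the i-th observation is ranked only against the
    observations seen so far, and the rank R_i is mapped to the
    probability (R_i - 0.5) / i before applying the inverse normal CDF.
    """
    data = list(xs)
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    scores = []
    for i, x in enumerate(data, start=1):
        seen = data[:i]  # observations available at time i
        less = sum(1 for v in seen if v < x)
        ties = sum(1 for v in seen if v == x)  # counts x itself
        rank = less + (ties + 1) / 2  # midrank; equals less + 1 without ties
        p = (rank - 0.5) / i  # always strictly between 0 and 1
        scores.append(std_normal.inv_cdf(p))
    return scores


scores = sequential_normal_scores([3.0, 1.0, 2.0])
```

Each score depends only on the observations seen so far, so earlier scores are never recomputed when new data arrive.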

1.1 Motivation

Matching theory with real-life industrial applications can be a hard task for engineers. When analyzing data, observations will scarcely ever fit a normal distribution, or any other known distribution that would allow parametric approaches to be applied. In other cases, the cost of obtaining a priori information makes it hard to collect enough observations to rely on the central limit theorem.

Several authors have tried to cope with this problem, and alternative approaches based on identifying the fitting distribution have been suggested. One of the most widely used models for normalizing real data is the Johnson system of distributions. This transformation is available in many statistical software packages, such as R and Minitab.

To illustrate the problem of using parametric approaches to deal with non-normality during a Phase I analysis, let us use a real data set from the construction industry available in Aichouni et al. (2014). Figure 1.2 shows the fitted distribution of the 66 observations. At first sight, the distribution appears to be normal-like, centered around 357.6 and slightly skewed; however, an Anderson-Darling normality test (see Figure 1.3) yields a p-value lower than 0.005, so normality is rejected. This justifies the use of a Johnson transformation, whose plot is shown in Figure 1.4. After the transformation, a new Anderson-Darling test yields a p-value of 0.455.

The $\bar{X}$ control chart built from the original data and a zone chart using the transformed observations are presented in Figures 1.5 and 1.6, respectively. While the chart from the original data triggers 3 alarms, the one with the Johnson transformation shows a stable process. This practice is widely accepted and adopted in many industrial processes.


Figure 1.2: Fitted distribution of real data observations.

Figure 1.3: Anderson-Darling normality test results.

Figure 1.4: Fitted distribution of transformed observations.


Figure 1.5: $\bar{X}$ control chart over real data set.

Figure 1.6: Zone chart and new control limits using Johnson’s transformed observations (Aichouni et al., 2014).

Nevertheless, one of the main downsides of this method is that, during a Phase I analysis, when data are first examined, it is unknown whether assignable causes of variation are present. If contaminated data are transformed to make them look normal, assignable causes are disguised as artificially normal behavior, making the discriminant analysis harder to carry out. A transformation over a mixture of in-control and out-of-control variables “normalizes” the mixture, effectively reducing the ability to make a distinction. Since transformations are problematic in a Phase I analysis, alternative procedures, such as nonparametric procedures, are worth considering.

At the time this research was done, the only nonparametric approaches available were those of Jones-Farmer et al. (2009) and Graham et al. (2010). The former applies a rank transformation over all Phase I samples, extending the parametric paradigm; the latter relies on dichotomization, which is known to provide low power. Our hypothesis is that, by using Sequential Normal Scores, both issues are avoided and, as a consequence, a more powerful Phase I analysis can be created.

1.2 Problem statement

This thesis deals with the problem of designing a Phase I control chart based on sequential normal scores, measuring its performance in terms of EAP, and comparing the results with current nonparametric alternatives.

1.3 Research questions

1. Is the performance of the proposed control chart in terms of the empirical alarm probability comparable with parametric approaches when the normality conditions are met?

2. Is the performance of the proposed control chart in terms of the empirical alarm probability better than parametric approaches under conditions where normality is not met?

3. Is the performance of the proposed control chart in terms of the empirical alarm probability comparable with nonparametric alternatives under conditions where normality is not met?

1.4 Research hypotheses

1. The performance of the proposed control chart in terms of the empirical alarm probability is similar when compared with parametric approaches when the normality conditions are met.

2. The performance of the proposed control chart in terms of the empirical alarm probability is better than parametric approaches under non-normality conditions.

3. The performance of the proposed control chart in terms of the empirical alarm probability is better than nonparametric alternatives under non-normality conditions.

1.5 Research purposes and objectives

1. Construct a Phase I Shewhart-type chart based on the Sequential Normal Scores for monitoring the location of a process, where observations are grouped in batches larger than one and there is no knowledge about any quantiles.

2. Assess the IC performance of the obtained chart under different scenarios and distributions and compare it with other parametric and nonparametric alternatives.

3. Assess the OC performance of the obtained chart under different scenarios and distributions and compare it with other parametric approaches.

4. Assess the OC performance of the obtained chart under different scenarios and distributions and compare it with other nonparametric alternatives.

1.6 Scope and limitations

The proposed chart can only be used for a Phase I control scheme, namely a retrospective analysis, and not for online monitoring. The control chart designed is a two-sided control chart that monitors exclusively the location (mean) of the process. A varied set of scenarios was used during the experimentation, combining different subgroup sizes, numbers of subgroups, and distributions of the data.

Since the exact distribution of the statistic used in the chart is unknown, numerical approaches such as Monte Carlo simulations were used to evaluate the performance of the method.

1.7 Contributions

Two Shewhart-type Phase I control charts based on Sequential Normal Scores were introduced, similar in performance to parametric methods and better than existing nonparametric alternatives across the distributions considered.

A set of control limits for both charts, covering different subgroup and sample sizes, is provided for reproducibility purposes. $FAP_0$ values of 0.10 and 0.05 were considered.

In addition, guidelines for the use of the proposed control charts were obtained, along with some recommendations for practitioners.


Chapter 2

Background and literature review

This chapter is intended for readers with little or no prior knowledge of statistical process control. Its purpose is to familiarize the audience with the terminology and related work involved in this research.

2.1 Key Definitions

In this section, some common terms used throughout this work are introduced to aid the reader's understanding.

2.1.1 Industry 4.0

Throughout human history, industry has been constantly evolving. Inventions such as the steam engine, which led to the mechanization of processes, electricity-powered devices and equipment, and the computer were, at the time they arose, agents of change that took the concept of industry to a whole new level. These technological leaps, which brought paradigm changes in industrialization, are what we now call “industrial revolutions”.

In the last decade, with the emergence of the Internet, digitization became one of the main drivers of growth within factories, and almost a measure of business success in the industrial field. The combination of Internet technologies and future-oriented technologies in the field of “smart” objects seems to result in a new fundamental paradigm shift in industrial production (Lasi et al., 2014).


Industry 4.0 is a term referring to the fourth industrial revolution. It was first used in Germany in 2011 and has since spread around the world. Nowadays, more and more companies are engaged in the task of transitioning to Industry 4.0 schemes and developing “smart factories”.

An Industry 4.0 system rests on several technological pillars, as can be seen in Figure 2.1. Roughly speaking, the technological changes required to move toward this new paradigm involve:

• Increasing mechanization and automation.

• Digitization and networking (increasing amount of sensor-data).

• Miniaturization

Figure 2.1: Technology pillars of an Industry 4.0 system. Saturno et al. (2017).

In addition, the continuous generation of high-volume data will allow the creation of a Big Data environment, which will require capable analytic tools.

As the term Industry 4.0 gains relevance, new concepts emerge, such as “smart factories” (fully autonomous manufacturing systems) and Cyber-Physical Systems.

Cyber-Physical Systems (CPS) are defined as transformative technologies for managing interconnected systems between their physical assets and computational capabilities (Baheti and Gill, 2011).

Lee et al. (2015) proposed a 5-level CPS structure, named the 5C architecture, which consists of:

1. Smart connection

2. Data-to-information conversion

3. Cyber

4. Cognition

5. Configuration

Such a network configuration will be able to increase the efficiency and resiliency of the machines.

In conclusion, the concept of Industry 4.0 is based on the integration of information and communication technologies and industrial technology, and its purpose is to build a highly flexible production model of personalized and digital products and services, with real-time interactions between people, products and devices during the production process (Zhou et al., 2015).

2.1.2 Statistical process control

Statistical process control (SPC) is a statistical tool for assessing the conformance of a product to its design requirements. Every process has two types of variability associated with it. The first is the inherent variability of the process, which is caused by natural factors. The second is produced by external factors, which cause shifts in the parameters of the process. The main objective of SPC is, then, to detect the presence of the second type of variability in a process as quickly as possible.

SPC can be roughly divided into two phases. In Phase I, the performance of the production process is often unknown, and the main objective is to determine the IC state of the process. This is usually done by letting the process produce a certain number of articles and analyzing the values of the quality characteristic of interest. If the data obtained from the sample do not appear stable, pertinent adjustments to the production process need to be made. If only a few observations seem to be out of control, they can be considered outliers and removed from the sample after a field search for root causes, to create an IC reference sample. This procedure may be repeated as many times as necessary until the IC state is reached.

Once a stable sample of the process is obtained, the parameters are estimated from the reference sample to determine the IC distribution of the quality characteristic. Phase II involves the online monitoring of the process. Whenever a significant shift from the IC distribution is detected, an alarm is triggered, indicating that the process needs to be stopped for root cause identification. This state, in which assignable causes are present, is called an out-of-control (OC) process.
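The iterative Phase I clean-up described above can be sketched in a few lines of Python. This is only a schematic of the loop (chart the sample, remove signaling points, repeat until no signals remain). The trimming rule used here, individual values outside the sample mean plus or minus `k` sample standard deviations, is an assumption chosen for brevity and is not the chart proposed in this thesis; the function name `phase_one_reference_sample` is likewise hypothetical. In practice, a point would only be removed after a field search confirms an assignable cause.

```python
from statistics import mean, stdev


def phase_one_reference_sample(values, k=3.0, max_rounds=10):
    """Iteratively trim points outside mean +/- k*sd (illustrative rule only)."""
    sample = list(values)
    removed = []
    for _ in range(max_rounds):
        if len(sample) < 2:
            break  # not enough data left to estimate the spread
        center, spread = mean(sample), stdev(sample)
        lower, upper = center - k * spread, center + k * spread
        flagged = [v for v in sample if v < lower or v > upper]
        if not flagged:
            break  # no signals: treat the remaining sample as in control
        removed.extend(flagged)
        sample = [v for v in sample if lower <= v <= upper]
    return sample, removed


# Hypothetical data: 20 stable readings plus one obvious outlier.
data = [10.0, 10.2, 9.8, 10.1, 9.9] * 4 + [25.0]
reference, removed = phase_one_reference_sample(data)
```

With these made-up readings, the outlier is removed in the first round and the second round finds no further signals, so the loop stops with the 20 stable readings as the reference sample.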

2.1.3 Control Charts

Control charts are tools that use statistics to sequentially evaluate a time series, looking for changes in it. They can be represented as graphs showing the value of a certain characteristic or variable of a process over time. The y axis represents the value, or statistic, that measures the quality characteristic of interest, and the x axis represents the time or observed order.

There are three basic elements in a control chart presented as horizontal lines in the graph. In Figure 2.2 we can find: (a) the Center Line (CL), representing the target of the quality characteristic, (b) the Upper Control Limit (UCL), and (c) the Lower Control Limit (LCL).

Figure 2.2: Elements of a control chart.

According to Montgomery (2019), these control limits are chosen so that, if the process is in control, nearly all of the sample points will fall between them. Otherwise, if an observation falls below the lower control limit or above the upper control limit, the process is said to be out of control, and corrective actions are required to eliminate the associated causes. There are different types of control charts, such as (a) Shewhart, (b) CUSUM, (c) EWMA, and (d) change point detection (CPD) charts. Each of them has its own characteristics, advantages, and disadvantages relative to the others.

2.1.4 Shewhart control chart

The first control chart documented in a paper was developed by Shewhart (1924), and it was named the Shewhart control chart after its author. This chart, which became one of Ishikawa's seven basic tools of quality, was quickly adopted and spread through industry, as it was used during the war effort in the USA and the post-war reconstruction by one of its main promoters, Dr. William Edwards Deming. For a compilation of Walter Shewhart’s methods and philosophy, see Shewhart (1931).

The general form of this type of chart uses a lower control limit (LCL) and an upper control limit (UCL) based on the mean and the standard deviation of the process when in control. The simple form follows

$$
\begin{aligned}
\mathrm{UCL} &= \mu_0 + Z_{1-\alpha/2}\,\sigma \\
\mathrm{LCL} &= \mu_0 - Z_{1-\alpha/2}\,\sigma
\end{aligned}
\tag{2.1}
$$

where $\mu_0$ represents the true mean of the process and $\sigma$ is its standard deviation. Here $\alpha$ denotes the false alarm rate (FAR), while $Z_{1-\alpha/2}$ stands for the $(1-\alpha/2)$ quantile of the standard normal distribution $N(0,1)$. This latter value represents the distance of the control limits from the mean of the process, expressed in standard deviations. Limits farther from the center line produce fewer false alarms but increase the type II error probability; conversely, limits closer to the center line increase the number of false alarms while reducing the type II error.

Some authors also suggest the designation of “warning limits”, located at shorter distances than the actual control limits, in order to increase the sensitivity of the chart.

For a modern approach to control charts, see Montgomery (2019).

When using a Shewhart chart in a Phase II scheme, this “distance” $Z_{1-\alpha/2}\,\sigma$ is usually determined by the desired average run length (ARL), defined as the number of samples that the chart requires, on average, before signaling an alarm. Under the IC assumptions of independent and identically normally distributed observations, the first alarm will occur, on average, at sample $1/\alpha$.
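As an illustration, the limits in equation (2.1) and the corresponding in-control ARL can be computed with a short Python sketch. The function names and the numerical values used below are assumptions made for this example only; they are not taken from any data set in this work.

```python
from statistics import NormalDist


def shewhart_limits(mu0, sigma, alpha):
    """Return (LCL, UCL) from equation (2.1) for a false alarm rate alpha."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2) quantile of N(0, 1)
    return mu0 - z * sigma, mu0 + z * sigma


def in_control_arl(alpha):
    """Average number of samples until the first false alarm, 1 / alpha."""
    return 1 / alpha


# Illustrative values only.
lcl, ucl = shewhart_limits(mu0=10.0, sigma=2.0, alpha=0.0027)
arl0 = in_control_arl(0.0027)
print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}, ARL0 = {arl0:.1f}")
```

With $\alpha = 0.0027$ the multiplier is very close to 3, which corresponds to the familiar three-sigma limits.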


Nevertheless, in a Phase I analysis, the width of the control limits is better determined by the false alarm probability (FAP), since the number of samples is already fixed. Further details on how to select a proper FAP value are presented later in this work.

2.1.5 Evaluation of performance

In the previous section, two terms involved in the construction of Shewhart-type control charts were introduced: the FAR and the FAP.

Suppose that $m > 1$ independent random subgroups, each of size $n > 1$, are available for the development of a control chart. The FAR is then the probability of a false alarm at each sampling stage. This probability is the same for all $m$ subgroups, and its calculation requires only the marginal distribution of the process when it is in control. On the other hand, the FAP is the probability of at least one false alarm among the $m$ samples or subgroups. Its calculation does require knowledge of the joint distribution of the plotting statistic when the process is in control, because of the simultaneous comparisons being carried out.
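To give a sense of scale for the two quantities, the following Python sketch relates the FAR and the FAP in the idealized reference case where the $m$ plotting statistics are mutually independent, so that $\mathrm{FAP} = 1 - (1 - \mathrm{FAR})^m$. This independence is an assumption of the example; it does not hold when all subgroups are compared against limits estimated from the same Phase I data, which is precisely why the joint distribution is needed in practice. The function names are hypothetical.

```python
def fap_from_far(far, m):
    """FAP for m independent comparisons, each with false alarm rate far."""
    return 1 - (1 - far) ** m


def far_for_target_fap(fap, m):
    """Per-subgroup FAR that attains a target FAP under independence."""
    return 1 - (1 - fap) ** (1 / m)


# Illustrative values: three-sigma limits applied to m = 20 subgroups.
fap_20 = fap_from_far(0.0027, 20)
far_needed = far_for_target_fap(0.05, 20)
print(f"FAP = {fap_20:.4f}, FAR for FAP 0.05 = {far_needed:.5f}")
```

Even under this simplification, a per-subgroup FAR that looks small accumulates quickly as $m$ grows, which motivates controlling the FAP directly in Phase I.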

There are two methods for constructing Phase I charts when the process parameters are unknown. In the first method, attributed to Hillier (1969), the FAR is controlled at a specified value. The second method, proposed by King (1954), consists of calculating control limits that attain a specified FAP value when the process is in control.

In practice, the FAP method is preferred over the FAR method for comparative purposes, because it accounts for the effect of parameter estimation and for the dependence that arises when the $m$ subgroups are compared against the same control limits.

The first step when comparing Phase I control charts is therefore to select an IC FAP value and the corresponding control limits. This value will be called $\mathrm{FAP}_0$. Common $\mathrm{FAP}_0$ values used in practice are 0.10, 0.05, and 0.01.

The decision rule is to define the best control chart as the one with the greatest empirical alarm probability (EAP), that is, the proportion of replicates of the experiment in which the control chart triggers at least one alarm when the process is out of control.
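The EAP can be estimated by simulation. The Python sketch below is a simplified illustration and not the experimental design of this work: it assumes normal data with known in-control mean and standard deviation, a hypothetical OC scenario in which the second half of the subgroups is shifted, and a control limit derived from a target FAP under independence. All names and values are assumptions made for the example.

```python
import random
from statistics import NormalDist, fmean


def chart_signals(subgroups, mu0, sigma, limit):
    """True if any standardized subgroup mean falls outside +/- limit."""
    for group in subgroups:
        z = (fmean(group) - mu0) / (sigma / len(group) ** 0.5)
        if abs(z) > limit:
            return True
    return False


def empirical_alarm_probability(m, n, shift, limit, replicates, seed=0):
    """Proportion of replicates with at least one alarm among m subgroups."""
    rng = random.Random(seed)
    alarms = 0
    for _ in range(replicates):
        # Hypothetical OC scenario: subgroups m//2, ..., m-1 have mean `shift`.
        data = [
            [rng.gauss(shift if i >= m // 2 else 0.0, 1.0) for _ in range(n)]
            for i in range(m)
        ]
        alarms += chart_signals(data, mu0=0.0, sigma=1.0, limit=limit)
    return alarms / replicates


m, n = 20, 5
far = 1 - (1 - 0.05) ** (1 / m)            # per-subgroup FAR for FAP0 = 0.05
limit = NormalDist().inv_cdf(1 - far / 2)  # two-sided limit on the z scale
eap_ic = empirical_alarm_probability(m, n, 0.0, limit, replicates=2000)
eap_oc = empirical_alarm_probability(m, n, 1.0, limit, replicates=2000)
print(f"alarm proportion in control: {eap_ic:.3f}, shifted: {eap_oc:.3f}")
```

With no shift, the alarm proportion estimates the attained FAP; with a shift, it estimates the EAP used to rank competing charts.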


2.2 Literature review

The relevance of SPC has been discussed throughout the evolution of industry. Despite the proven benefits of its use in quality control, some controversies remain about the application of such methods, perhaps due to the heterogeneity of the workforce in quality fields and the lack of statistical background among most workers. Woodall (2000) discussed this and other controversies in SPC, such as the relation of control charts to hypothesis testing. In that work he also gave an extensive discussion of the relevance of SPC and the importance of continued research in the field.

A similar discussion is presented in Yashchin (1993), where similarities and differences between Engineering Process Control and SPC were addressed. In that article, the author also provided theoretical and graphical aspects of control charting, emphasizing two of the most classic control charts: the CUSUM by Page (1954) and the EWMA technique developed by Roberts (1959).

Several authors have also contributed to the discussion through research work and by gathering the literature on applications of SPC. More recently, Woodall and Montgomery (2014) extended the applicability of SPC to important areas such as health-related monitoring, spatiotemporal surveillance, profile monitoring, and the use of autocorrelated data, among others. In this review article, they focused on work done over the preceding decade.

Another recurrent discussion in the field is the role of statistical theory in its application. Narrowing the gap between theory and practice may not be a simple task. Woodall (2017) addressed this issue and discussed some of the most recurrent concerns involved in the transition, with the intention of making statistical process monitoring (SPM) a more useful tool for practitioners.

However, one of the main difficulties in implementing SPM is that the assumptions required for the analysis are often not met, such as knowledge of the distribution of the data or its normality. Several researchers have worked to demonstrate the effects of non-normal distributions on the performance of parametric control charts.

Chakraborti et al. (2001) presented an overview of the literature on nonparametric control charts and highlighted the advantages of their use while pointing out some disadvantages of the more traditional charts. Qiu (2018) (2017) gave some perspectives on issues related to the robustness of conventional SPC charts and to the strengths and limitations of various nonparametric SPC charts. In this article, Qiu (2018) evaluated the robustness of basic EWMA and CUSUM charts for Phase II against two different distributions, the t and the chi-square, in terms of the $\mathrm{ARL}_0$, and obtained sufficient evidence to show that conventional charts are inappropriate when the actual process distribution is not normal. He also wrote about the use of control charts as a tool for analyzing big data and explained some modifications that basic control charts need in order to monitor different types of processes.

Similar conclusions were drawn by Qiu and Zhang (2015) regarding nonparametric control charts. They compared two CUSUM charts for Phase II monitoring, constructed using a nonparametric transformation (Johnson’s and Box-Cox) based on the IC dataset, with two representative nonparametric CUSUM charts, based on the Wilcoxon rank-sum test statistic and on the categorization of Phase II observations, as well as with a conventional CUSUM. Numerical studies showed that the nonparametric charts performed better when the process distribution is non-normal.

Several nonparametric Shewhart-type charts have been proposed in the literature as alternatives to the traditional $\bar{X}$ chart. Among the best known is the one proposed by Bakir (2004), which is based on the Wilcoxon signed-rank statistic, whose distribution, available in Wilcoxon et al. (1970), was used to find the exact FAR and $\mathrm{ARL}_0$ of the chart. Experimental results on the OC performance of the chart suggested that it is a better option when the distribution of the data is symmetric and heavy-tailed.

Continuing the line of signed-rank Shewhart-type charts, Chakraborti and Eryilmaz (2007) developed a similar chart with the addition of a k-of-k runs decision rule. One advantage of this chart over the previous one is that the symmetry assumption is unnecessary. The authors provided instructions on how to create positive-sided and two-sided k-of-k signed-rank charts.

Another nonparametric alternative, proposed by Chakraborti et al. (2004), is the so-called Precedence Chart, named because it is based on the precedence statistic for location, proposed by Nelson (1963) as a nonparametric test.

Among all nonparametric methods, those involving sequential ranks are gaining importance because of their properties in terms of computational complexity and detection power. McDonald (1990) was the first to develop a CUSUM procedure based on sequential ranks. These ranks were standardized by the current number of observations, so that they are evenly distributed between 0 and 1. A Markov chain was defined to obtain approximate calculations of the ARL.

A very promising result in the application of sequential ranks is the sequential normal scores transformation, defined by Conover et al. (2017). Here, sequential ranks are replaced by the corresponding quantiles of a standard normal $N(0,1)$ distribution.

Most of the nonparametric control charts for mean monitoring based on sequential ranks that are available in the literature are defined for CUSUM and EWMA approaches. The sequential normal scores transformation allows us to adapt data sets and use parametric tools to easily construct any type of control chart, whether Shewhart, CUSUM or EWMA.

Senac-Cuesta (2018) continued the work of Conover et al. (2017) and created a Shewhart-type chart for Phase II based on SNS. Its performance was evaluated in terms of the ARL against two other nonparametric charts using different distributions, such as the normal, lognormal and Laplace, thus demonstrating the consistency and robustness of this method across distributions.

2.2.1 Gap analysis

In the available literature, most nonparametric control charts are developed for Phase II analysis. Capizzi (2015) wrote about the need for a nonparametric approach to Phase I analysis. This lack of research may be associated with factories focusing more on monitoring their processes than on obtaining information and richer insights about them.

However, many of these control charts require an IC data set to work properly; see for example Hack and Ledolter (1992), Tapang and Pongpullponsak (2012), SUYI (2011), or Ambartsoumian and Jeske (2015). All these charts have in common the need for an IC data set from which the distribution of the process is estimated or to which new observations are compared.


Therefore, the importance of a prior Phase I analysis should not be neglected within a statistical process monitoring scheme. The effects of estimating parameters from a Phase I reference sample have been studied in depth, for instance in Jensen et al. (2006). Chakraborti et al. (2008) discussed the relevance of Phase I analysis in SPC and highlighted that the success of process monitoring in Phase II depends on the success of the corresponding Phase I charts. In that article the authors debated the ideal measure of performance for comparing Phase I charts and also gave an overview of the available univariate parametric Phase I charts. Jones-Farmer et al. (2014b) extended the review by adding a discussion of multivariate charts and profile monitoring methods for Phase I, while commenting on best practices for basic steps in Phase I charting, such as the use of rational subgrouping, the ideal sample size, and directions for handling outliers. Jones-Farmer et al. (2014a) and Zhang et al. (2013) also drew attention to the lack of Phase I tools for attribute variables.

Many authors agree that researchers have placed little emphasis on Phase I in SPC and that, consequently, better tools need to be developed for practitioners.

For this reason, which is one of the author's main motivations for carrying out this research, two Phase I control charts based on SNS are proposed. The corresponding evaluation of their performance in terms of the EAP, and a comparison against different nonparametric charts, are delivered as part of this work for the case where there is no knowledge of any quantiles or other information about the process.

2.2.2 Parametric control charts for mean monitoring

The $\bar{X}$ chart

One of the most widely used control charts in industry for monitoring the location of a process with numerical variables is the $\bar{X}$ chart. In the last section we defined how to create a Shewhart-type control chart. However, in practice $\mu_0$ and $\sigma$ are almost always unknown, so the use of estimators is justified. We will use the grand sample mean to estimate $\mu_0$:

$$
\hat{\mu}_0 = \bar{\bar{X}} = \frac{1}{m} \sum_{i=1}^{m} \bar{X}_i. \tag{2.2}
$$


The standard deviation was traditionally estimated using the sample ranges because of their computational advantages (perhaps negligible nowadays). Sample ranges are defined as $R_i = X_{i(n)} - X_{i(1)}$, the difference between the largest and the smallest order statistics of the $i$th subgroup.

Consequently, the natural estimator of $\sigma$ is

$$
\hat{\sigma} = \frac{\bar{R}}{d_2(n)} \tag{2.3}
$$

where

$$
\bar{R} = \frac{1}{m} \sum_{i=1}^{m} R_i \tag{2.4}
$$

and $d_2(n)$ is a constant that depends on the subgroup size $n$, assuming normality, and can easily be found in textbooks. Qiu (2013) provides a table of this constant for $2 \le n \le 25$.

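The control limits for the $\bar{X}$ chart are given by

$$
\begin{aligned}
\mathrm{UCL} &= \bar{\bar{X}} + \frac{Z_{1-\alpha/2}}{d_2(n)\sqrt{n}}\,\bar{R} \\
\mathrm{CL} &= \bar{\bar{X}} \\
\mathrm{LCL} &= \bar{\bar{X}} - \frac{Z_{1-\alpha/2}}{d_2(n)\sqrt{n}}\,\bar{R}
\end{aligned}
\tag{2.5}
$$

This control chart gives an alarm at the $i$th subgroup each time $\bar{X}_i$ goes beyond these limits.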
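The computations in equations (2.2) to (2.5) can be sketched in Python as follows. The function names and the small data set are assumptions made for illustration; the $d_2(n)$ values are the usual tabulated constants for normal data and are included only for $n = 2, \ldots, 5$.

```python
from statistics import NormalDist, fmean

# Tabulated d2(n) constants under normality (only n = 2..5 included here).
D2 = {2: 1.128, 3: 1.693, 4: 2.059, 5: 2.326}


def xbar_chart_limits(subgroups, alpha):
    """Return (LCL, CL, UCL) of the X-bar chart from equations (2.2)-(2.5)."""
    n = len(subgroups[0])
    grand_mean = fmean([fmean(group) for group in subgroups])        # eq. (2.2)
    r_bar = fmean([max(group) - min(group) for group in subgroups])  # eq. (2.4)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * r_bar / (D2[n] * n ** 0.5)                      # eq. (2.5)
    return grand_mean - half_width, grand_mean, grand_mean + half_width


def subgroups_out_of_limits(subgroups, lcl, ucl):
    """1-based indices of subgroups whose mean falls outside the limits."""
    return [i for i, group in enumerate(subgroups, start=1)
            if not lcl <= fmean(group) <= ucl]


# Illustrative data: m = 5 subgroups of size n = 3.
data = [[10, 12, 11], [9, 11, 10], [11, 13, 12], [10, 11, 12], [9, 10, 11]]
lcl, cl, ucl = xbar_chart_limits(data, alpha=0.0027)
flagged = subgroups_out_of_limits(data, lcl, ucl)
print(f"LCL = {lcl:.3f}, CL = {cl:.3f}, UCL = {ucl:.3f}, flagged = {flagged}")
```

Note that this sketch uses a per-subgroup $\alpha$; choosing limits that attain a target FAP in Phase I requires the additional considerations discussed in Section 2.1.5.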

2.2.3 Nonparametric control charts for mean monitoring

Although the catalogue of nonparametric Phase I charts in the literature is not extensive, some authors have devoted efforts to developing a few alternatives for practitioners.

For instance, Jones-Farmer et al. (2009) developed a rank-based control chart for Phase I analysis. This chart was named the Mean Rank Chart after the plotting statistic used in its construction. The mean rank of each subgroup was standardized to create a control chart that is symmetric around 0, and its performance was compared against that of an $\bar{X}$ chart. The results showed that the EAP of this chart was better under heavy-tailed (t) and skewed (gamma) distributions, but that the chart was less efficient under normality, especially for small subgroup sizes.

Graham et al. (2010) proposed a median-based Phase I control chart as a nonparametric alternative for monitoring the location of a continuous variable. In this chart, they used the median of the pooled Phase I sample as a pivot and counted the number of observations in each subgroup that are less than it. The chart displayed performance comparable to that of an $\bar{X}$ chart for normal and heavy-tailed (t and uniform) distributions, and favorable results for positive shifts under skewed distributions. One requirement for this chart to work properly is that the subgroup size be large enough.

Another Phase I chart for monitoring the location of a process was developed by Ning et al. (2015). This chart is based on the empirical likelihood ratio test and focuses on the monitoring of individual observations. An application in health care is included as an example.

Phase I control charts for scale monitoring are available in the literature as well; see for example Jones-Farmer and Champ (2010), or Li et al. (2019), where a multi-sample Lepage statistic is used to monitor both location and scale. The applicability of Phase I charts was extended to multivariate analysis by Li et al. (2014), who proposed a change point detection control chart based on data depth. Another multivariate alternative was presented by Bell et al. (2014); it is an analog of the univariate Mean Rank chart of Jones-Farmer et al. (2009). Kazemzadeh et al. (2008) developed three charting methods for monitoring polynomial profiles, using a likelihood-test-based procedure to be able to detect the location of shifts.

All of these charts were reviewed, but they are not discussed further because they fall outside the aim of this research.

2.2.4 Competing charts

In this section we present two nonparametric charts to be compared against our proposals. These two charts were selected because of their similarity to the proposed SNS and pSNS charts.

The selection criteria include the following characteristics:

• The chart needs to be focused on a Phase I analysis.


• The chart needs to be of the Shewhart type.

• The chart needs to be designed to monitor changes in location.

• Observations need to be grouped in batches larger than 1.

• There is no previous knowledge of any quantiles of the null distribution of the data.

Mean Rank Chart

Consider a Phase I data set grouped in $m$ batches of size $n$ each, and let $X_{i,j}$ be a random variable that represents the $j$th observation in batch $i$. All $m$ subgroups are assumed to be mutually independent.

We can treat this subgrouped sample as one large sample of size $N = nm$.

Let $R_{i,j}$ represent the rank of the observation $X_{i,j}$ within the pooled sample of size $N$.

Since it is assumed that the quality characteristic being measured comes from a continuous distribution, ties are not expected. However, in real industrial processes ties are likely to occur because of the limited resolution of measuring devices. In case of ties, the midrank method is used as the tiebreaker.

Let us now calculate the average of the ranks in subgroup $i$ using the following equation:

$$
\bar{R}_i = \frac{\sum_{j} R_{i,j}}{n} \tag{2.6}
$$

When the process is in control, all ranks should be distributed evenly among the subgroups. Therefore $\bar{R}_i$ follows the same distribution for all $i$, and the mean and variance of $\bar{R}_i$ are given by

$$
E(\bar{R}_i) = \frac{N+1}{2}, \qquad
\mathrm{Var}(\bar{R}_i) = \frac{(N-n)(N+1)}{12n},
$$

and the central limit theorem suggests that the statistic

$$
Z_i = \frac{\bar{R}_i - E(\bar{R}_i)}{\sqrt{\mathrm{Var}(\bar{R}_i)}} \tag{2.7}
$$

is approximately standard normal as $n$ gets larger. This statistic is called the Standard Mean Rank. The result is a control chart that is symmetric around 0, with control limits $\mathrm{LCL} = -\mathrm{UCL}$.
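The construction of the plotting statistic in equations (2.6) and (2.7) can be sketched in Python as shown below. The function names and the toy data are assumptions made for this example; ties are handled with midranks, as described above.

```python
def midranks(values):
    """Ranks 1..N of values; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        shared = (start + end) / 2 + 1  # average of positions start+1..end+1
        for k in order[start:end + 1]:
            ranks[k] = shared
        start = end + 1
    return ranks


def standard_mean_ranks(subgroups):
    """Standardized mean rank Z_i of each subgroup, equations (2.6)-(2.7)."""
    m, n = len(subgroups), len(subgroups[0])
    N = m * n
    pooled = [x for group in subgroups for x in group]
    ranks = midranks(pooled)
    expected = (N + 1) / 2
    variance = (N - n) * (N + 1) / (12 * n)
    z = []
    for i in range(m):
        mean_rank = sum(ranks[i * n:(i + 1) * n]) / n       # eq. (2.6)
        z.append((mean_rank - expected) / variance ** 0.5)  # eq. (2.7)
    return z


# Toy data: three subgroups of size three with increasing location.
z_scores = standard_mean_ranks([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print([round(value, 3) for value in z_scores])
```

Each $Z_i$ would then be compared against symmetric limits such as those tabulated by the authors of the chart.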

Control limits

In their article, the authors provided a table of control limits obtained through Monte Carlo simulations. These control limits apply to FAP values of 0.10 and 0.05, batch sizes $n = 3, 4, 5, 10, 15, 20$, and numbers of subgroups $m = 10, 20, 30, 40, 50$. They are listed in Table 2.1.

Median Chart

Let $X_{i,1}, X_{i,2}, X_{i,3}, \ldots, X_{i,n}$ denote the subgroup of size $n > 1$ that comes from the $i$th population with continuous c.d.f. $F_i$, $i = 1, 2, \ldots, m$. Suppose that there are $m$ independent subgroups of size $n$ that form a Phase I sample of size $N = mn$ to be analyzed, and assume a location model $F_i(x) = F(x - \theta_i)$, where $F$ represents an arbitrary cumulative distribution function and $\theta_i$ is the location parameter of interest. Assume that $F(0) = 0.5$, so that $\theta_i$ is the median of the $i$th population; the median is selected because of its robustness and simplicity when designing control charts. The chart is built as follows. First, find the median $M$ of the pooled sample in the usual way, that is

$$
M =
\begin{cases}
X_{((N+1)/2)} & \text{if } N \text{ is odd} \\
\left( X_{(N/2)} + X_{((N+2)/2)} \right)/2 & \text{if } N \text{ is even}
\end{cases}
$$

where $X_{(1)}, X_{(2)}, \ldots, X_{(N)}$ are the order statistics of the whole Phase I preliminary sample.

Next, for each sample $i$, define $U_i$ as the number of observations in the $i$th subgroup that are less than $M$, where $U_i \in \{0, 1, 2, \ldots, n\}$, i.e.,

$$
U_i = \sum_{j=1}^{n} I(X_{i,j} < M) \tag{2.8}
$$

for $i = 1, 2, \ldots, m$, where $I$ is the indicator function that takes the value 1 when the condition is true and 0 otherwise. Finally, each $U_i$ is plotted and compared against the proper control limits, which depend on the subgroup size $n$ and the number of subgroups $m$.

Table 2.1: Mean Rank Chart control limits

| m | n | Control limit ± (desired FAP = 0.10) | Simulated FAP | Control limit ± (desired FAP = 0.05) | Simulated FAP |
|---|---|---|---|---|---|
| 10 | 3 | 2.386 | 0.0767 | 2.524 | 0.0355 |
| 10 | 4 | 2.436 | 0.0932 | 2.615 | 0.0416 |
| 10 | 5 | 2.474 | 0.0899 | 2.666 | 0.0484 |
| 10 | 10 | 2.510 | 0.0993 | 2.734 | 0.0490 |
| 10 | 15 | 2.528 | 0.0973 | 2.760 | 0.0474 |
| 10 | 20 | 2.534 | 0.0980 | 2.766 | 0.0494 |
| 20 | 3 | 2.494 | 0.0942 | 2.595 | 0.0469 |
| 20 | 4 | 2.606 | 0.0922 | 2.738 | 0.0473 |
| 20 | 5 | 2.650 | 0.0927 | 2.808 | 0.0473 |
| 20 | 10 | 2.720 | 0.0997 | 2.924 | 0.0497 |
| 20 | 15 | 2.750 | 0.0979 | 2.954 | 0.0499 |
| 20 | 20 | 2.758 | 0.0984 | 2.972 | 0.0497 |
| 30 | 3 | 2.574 | 0.0859 | 2.664 | 0.0410 |
| 30 | 4 | 2.692 | 0.0912 | 2.807 | 0.0482 |
| 30 | 5 | 2.748 | 0.0992 | 2.895 | 0.0476 |
| 30 | 10 | 2.834 | 0.0993 | 3.031 | 0.0481 |
| 30 | 15 | 2.868 | 0.0988 | 3.071 | 0.0482 |
| 30 | 20 | 2.882 | 0.0984 | 3.087 | 0.0494 |
| 40 | 3 | 2.598 | 0.0976 | 2.682 | 0.0477 |
| 40 | 4 | 2.734 | 0.0979 | 2.853 | 0.0466 |
| 40 | 5 | 2.806 | 0.0953 | 2.946 | 0.0497 |
| 40 | 10 | 2.918 | 0.0982 | 3.102 | 0.0477 |
| 40 | 15 | 2.946 | 0.0996 | 3.143 | 0.0484 |
| 40 | 20 | 2.962 | 0.0988 | 3.164 | 0.0489 |
| 50 | 3 | 2.636 | 0.0992 | 2.716 | 0.0497 |
| 50 | 4 | 2.776 | 0.0968 | 2.889 | 0.0487 |
| 50 | 5 | 2.852 | 0.0980 | 2.990 | 0.0490 |
| 50 | 10 | 2.970 | 0.0996 | 3.153 | 0.0476 |
| 50 | 15 | 3.008 | 0.0994 | 3.203 | 0.0471 |
| 50 | 20 | 3.024 | 0.0998 | 3.219 | 0.0490 |

The result is a control chart with integer limits $0 < a < b < n$, where $b = n - a$ so that the limits are symmetric. One condition for this chart to perform better is that $n > m$.
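The first two steps of the Median Chart, the pooled median $M$ and the counts $U_i$ of equation (2.8), can be sketched in Python as follows. The function name and the toy data are assumptions made for this example; the comparison against the limits $(a, b)$ is left out, since the limits must be taken from the tables of the original article.

```python
from statistics import median


def median_chart_statistics(subgroups):
    """Return the pooled median M and the counts U_i of equation (2.8)."""
    pooled = [x for group in subgroups for x in group]
    # statistics.median averages the two middle order statistics when N is
    # even, which matches the definition of M given in the text.
    M = median(pooled)
    u = [sum(x < M for x in group) for group in subgroups]
    return M, u


# Toy data: N odd (3 subgroups of size 3) and N even (2 subgroups of size 4).
M_odd, u_odd = median_chart_statistics([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
M_even, u_even = median_chart_statistics([[1, 2, 3, 4], [5, 6, 7, 8]])
print(M_odd, u_odd, M_even, u_even)
```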

Control limits


Table 2.2: Median Chart. Attained FAP values and control limits (a,b) for $\mathrm{FAP}_0 = 0.10$. Columns give the number (m) of Phase I samples; rows give the sample size (n).

| n | m = 4 | m = 5 | m = 6 | m = 7 | m = 8 | m = 9 | m = 10 |
|---|---|---|---|---|---|---|---|
| 15 | 0.0563 (3,12) | 0.0873 (3,12) | 0.0205 (2,13) | 0.0272 (2,13) | 0.0337 (2,13) | 0.0406 (2,13) | 0.0472 (2,13) |
| 16 | 0.0305 (3,13) | 0.0486 (3,13) | 0.0671 (3,13) | 0.0858 (3,13) | 0.0181 (2,14) | 0.0219 (2,14) | 0.0257 (2,14) |
| 17 | 0.0836 (4,13) | 0.027 (3,14) | 0.0377 (3,14) | 0.0493 (3,14) | 0.0605 (3,14) | 0.0721 (3,14) | 0.0833 (3,14) |
| 18 | 0.0479 (4,14) | 0.0743 (4,14) | 0.0208 (3,15) | 0.0275 (3,15) | 0.0343 (3,15) | 0.0411 (3,15) | 0.048 (3,15) |
| 19 | 0.0267 (4,15) | 0.0434 (4,15) | 0.0597 (4,15) | 0.077 (4,15) | 0.0936 (4,15) | 0.0232 (3,16) | 0.0272 (3,16) |
| 20 | 0.0681 (5,15) | 0.0243 (4,16) | 0.0345 (4,16) | 0.045 (4,16) | 0.0556 (4,16) | 0.0628 (4,16) | 0.0769 (4,16) |
| 21 | 0.0398 (5,16) | 0.063 (5,16) | 0.0856 (5,16) | 0.0261 (4,17) | 0.0324 (4,17) | 0.0391 (4,17) | 0.0456 (4,17) |
| 22 | 0.0903 (6,16) | 0.0369 (5,17) | 0.0516 (5,17) | 0.0665 (5,17) | 0.0815 (5,17) | 0.0964 (5,17) | 0.0266 (4,18) |
| 23 | 0.0549 (6,17) | 0.0851 (6,17) | 0.0305 (5,18) | 0.0401 (5,18) | 0.0495 (5,18) | 0.0592 (5,18) | 0.0686 (5,18) |
| 24 | 0.0326 (6,18) | 0.0518 (6,18) | 0.0715 (6,18) | 0.0913 (6,18) | 0.0295 (5,19) | 0.0355 (5,19) | 0.0415 (5,19) |

In their article, the authors provided a table of control limits. These control limits were obtained for a FAP value of 0.01, 0.05 and 0.10, for m = 4(1)10 and n = 15(1)24. Some of these limits are presented in Table 2.2.


Chapter 3

Methodology

A Phase I nonparametric Shewhart-type control chart for mean location, based on the SNS of Conover, Tercero and Cordero (2017), was developed with the following characteristics: no previous knowledge of the distribution or of any quantiles, and samples grouped in batches.

Instructions for constructing a Shewhart-type control chart with the same characteristics, using a novel method defined as Pooled Sequential Normal Scores (referred to as pSNS from now on), are provided as well.

3.1 Sequential Normal Scores

3.1.1 Individual Observations

As defined by Conover et al. (2017), let $X_1, X_2, \ldots$ be a sequence of independent, identically distributed random observations with a continuous distribution function $F(x)$. The sequential rank $R_i$ of $X_i$, where $i$ stands for the observed order within the sequence, is defined as the rank of $X_i$ relative to the previous variables in the sequence up to and including $X_i$. The sequential ranks of all the observations before $X_i$ remain unchanged. They obtained $P_i$ to estimate $F(x)$ using the following formula:

$$
P_i = \frac{R_i - 0.5}{i} \tag{3.1}
$$

Sequential normal scores are obtained from $P_i$ using $Z_i = \Phi^{-1}(P_i)$, where $\Phi^{-1}$ represents the inverse cumulative standard normal distribution function. As $i$ gets large, the sequence $\{Z_i : i = 1, \ldots\}$ consists of mutually independent, asymptotically standard normal random variables.
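A minimal Python sketch of this transformation for individual observations is given below. It assumes that there are no ties, consistent with the continuity assumption on $F$; the function name and the toy sequence are hypothetical.

```python
from statistics import NormalDist


def sequential_normal_scores(xs):
    """Sequential normal scores Z_i = Phi^{-1}((R_i - 0.5) / i), eq. (3.1)."""
    inverse_cdf = NormalDist().inv_cdf
    scores = []
    for i in range(1, len(xs) + 1):
        current = xs[i - 1]
        # Sequential rank: rank of X_i among X_1, ..., X_i (ties not expected).
        rank = sum(x <= current for x in xs[:i])
        scores.append(inverse_cdf((rank - 0.5) / i))
    return scores


# Toy sequence: the sequential ranks are 1, 1, 2.
sns = sequential_normal_scores([3.0, 1.0, 2.0])
print([round(z, 4) for z in sns])
```

Because each rank only looks backward, scores already computed do not change when new observations arrive.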


3.1.2 Batched observations

Let $\{X_{i,j} : i = 1, \ldots, m;\ j = 1, \ldots, n\}$ be a sequence of independent, identically distributed random observations with continuous distribution function $F(x)$. These variables are grouped into batches (samples) of size $n$. For the first batch ($i = 1$) the $n$ random observations are ranked relative to the other random observations in that batch, and the ranks are denoted by $R_{1,j}$ for $j = 1, \ldots, n$. For all subsequent batches $i > 1$ the sequential rank $R_{i,j}$ of $X_{i,j}$ is given by

$$
R_{i,j}(X_{i,j}) = \sum_{k=1}^{i-1} \sum_{l=1}^{n} I(X_{k,l} \le X_{i,j}) + 1 \tag{3.2}
$$

where $I(X_{k,l} \le X_{i,j})$ is an indicator function. This sequential rank is computed for each $j = 1, \ldots, n$ in batch $i$. Note that several of these sequential ranks within a batch may be equal to one another, but they do not change as more batches are observed. We will use

$$
P_{1,j} = \frac{R_{1,j} - 0.5}{n} \tag{3.3}
$$

and

$$
P_{i,j} = \frac{R_{i,j} - 0.5}{n(i-1) + 1} \tag{3.4}
$$

for $i > 1$, to estimate $F(X_{i,j})$.

Sequential normal scores are obtained from $P_{i,j}$ using $Z_{i,j} = \Phi^{-1}(P_{i,j})$, where $\Phi^{-1}$ represents the inverse cumulative standard normal distribution function. As $i$ gets large, the sequence $\{Z_{i,j} : i = 1, 2, \ldots, m;\ j = 1, 2, \ldots, n\}$ consists of mutually independent, asymptotically standard normal random variables.

However, when dealing with batches of $n$ observations, it is sometimes more convenient to plot the statistic

$$
Z_i = \frac{\sum_{j=1}^{n} Z_{i,j}}{\sqrt{n}}. \tag{3.5}
$$

As $Z_{i,j}$ approaches a $N(0,1)$ distribution, $Z_i$ is also approximately standard normal, as stated in Conover et al. (2019).
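The batched procedure of equations (3.2) to (3.5) can be sketched in Python as follows. The function name and the toy batches are assumptions made for illustration, and ties within the first batch are not expected under the continuity assumption.

```python
from statistics import NormalDist


def batched_sns(batches):
    """Return (Z_ij per batch, Z_i per batch) following eqs. (3.2)-(3.5)."""
    inverse_cdf = NormalDist().inv_cdf
    n = len(batches[0])
    history = []  # observations from batches 1, ..., i - 1
    z_obs, z_batch = [], []
    for i, batch in enumerate(batches, start=1):
        if i == 1:
            # First batch: rank within the batch, eq. (3.3).
            p = [(sum(y <= x for y in batch) - 0.5) / n for x in batch]
        else:
            # Later batches: rank against all previous batches, eqs. (3.2), (3.4).
            denominator = n * (i - 1) + 1
            p = [(sum(y <= x for y in history) + 1 - 0.5) / denominator
                 for x in batch]
        z = [inverse_cdf(value) for value in p]
        z_obs.append(z)
        z_batch.append(sum(z) / n ** 0.5)  # eq. (3.5)
        history.extend(batch)
    return z_obs, z_batch


# Toy data: m = 2 batches of size n = 2.
z_obs, z_batch = batched_sns([[1.0, 2.0], [3.0, 0.0]])
print([round(z, 4) for z in z_batch])
```

In this toy case the second batch contains one value above and one value below every earlier observation, so its two scores cancel and $Z_2$ is zero up to rounding.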


3.2 Pooled Sequential Normal Scores

Pooled Sequential Normal Scores (pSNS) are defined for the first time in this work as an extension of the traditional SNS. They are intended to overcome some flaws detected in the performance of that method in the presence of early out-of-control observations, namely, OC observations located in the first subgroups of the data set being analyzed.

Let $\{X_{i,j} : i = 1, \ldots, m;\ j = 1, \ldots, n\}$ be a sequence of independent, identically distributed random observations with continuous distribution function $F$. These variables are grouped into batches (samples) of size $n$.

We create two different sets of ranks depending on the order of evaluation.

The sequence of the first set of ranks is defined by the subindex $i$. We will call this set the forward order. The scores in this set are computed following exactly the same procedure as in SNS.

The second set corresponds to the subgroups of the original data taken in inverse order, and it will be called the backward order. This second set of ranks follows the same ranking procedure, but the sequence of observations is inverted as $i' = m - i + 1$. The following equation can also be used to compute these ranks:

$$
R^{B}_{i,j}(X_{i,j}) = \sum_{k=i+1}^{m} \sum_{l=1}^{n} I(X_{k,l} \le X_{i,j}) + 1, \tag{3.6}
$$

for $i < m$, where the superindex $B$ indicates ranking in the backward order. For the specific case of $R^{B}_{m,j}$, the ranking is performed only among the observations in sample $m$, giving rank 1 to the smallest and rank $n$ to the greatest.
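The backward ranks of equation (3.6) can be sketched in Python as shown below. This is an illustrative reading of the definition, not the implementation used in this work; the function name and the toy batches are hypothetical, and the last batch is ranked within itself as described above.

```python
def backward_ranks(batches):
    """Backward sequential ranks R^B_{i,j}, following equation (3.6)."""
    m = len(batches)
    ranks = []
    for i, batch in enumerate(batches, start=1):
        if i == m:
            # Last batch: rank only among its own observations.
            ranks.append([sum(y <= x for y in batch) for x in batch])
        else:
            # Batches i + 1, ..., m are the reference set in the backward order.
            later = [y for other in batches[i:] for y in other]
            ranks.append([sum(y <= x for y in later) + 1 for x in batch])
    return ranks


# Toy data: m = 2 batches of size n = 2.
rb = backward_ranks([[1.0, 2.0], [3.0, 0.0]])
print(rb)
```

The forward ranks are obtained with the same logic applied to the batches in their original order, so the two sets differ only in which batches serve as the reference set.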

To estimate $F(X_{i,j})$, we need to modify equations (3.3) and (3.4). Recalling the SNS methodology, the denominator of these equations is called the rank order. For the forward set, the rank order remains the same as in SNS and will be denoted by $O^{F}$. For the backward set, however, the rank order differs because we are reversing the order of the observations. Thus, $O^{B}_{m} = n$ and $O^{B}_{i} = n(m - i) + 1$ for $i < m$. Note that $O^{F}_{i} = O^{B}_{i'}$. New rankits are then obtained using the equations

$$
P^{B}_{m,j} = \frac{R^{B}_{m,j} - 0.5}{O^{B}_{m}}, \tag{3.7}
$$
