
Psychological assessment has been based primarily on the subjective judgment of the clinician and on classical psychometric test theory. Despite the advantages of the clinical interview, the clinician’s subjective inferences can cause errors in later evaluation and diagnosis (Nordgaard, Sass, & Parnas, 2013). Indeed, the clinician’s evaluation could be affected by underestimation or overestimation of the patient’s symptoms. In the psychometric approach, a total score typically determines the impairment level, which requires that the same items be administered to all respondents. This approach is primarily data oriented, and the product is often a series of scores. The score descriptions are typically unrelated to the person’s overall context and do not address unique problems the person may be facing (Hayes, Nelson, & Jarrett, 1987). In contrast, psychological assessment attempts to evaluate individual data in a broad perspective, with its focus being individual problem solving and decision-making.

Psychological assessment should include the evaluation of features specific to the individual. The central role of the clinician performing psychological assessment is that of an expert in human behaviour who must deal with complex processes and understand test scores in the context of a person’s life (Groth-Marnat, 2009).

Thus, rather than just knowing the labels and definitions for various types of anxiety or thought disorders, clinicians should also have in-depth operational criteria for them. For example, the construct of depression, as represented by the score, can sometimes seem misleadingly straightforward. Depression can manifest with a variety of different symptoms that may be due to a different culture or a different aetiology (as reported in the Chapter on mood disorders). Only through personalized assessment is it possible to distinguish these conditions (Groth-Marnat, 2009). Unless clinicians are familiar with these areas, they are not adequately prepared to understand different types of depression (e.g., agitated depression, depression with flight of ideas, inhibition of depression, etc.).

An alternative to administering a full scale that still achieves a personalized assessment is adaptive testing. In adaptive testing, each individual may receive different scale items that are targeted to their specific impairment level (Fliege, Becker, Walter, Bjorner, Klapp, & Rose, 2005). A person’s initial item responses are used to determine a provisional estimate of his or her standing on the measured trait (for example, depression or anxiety), and this estimate is used to select subsequent items (Wainer, 2000). This form of testing has recently emerged in the field of knowledge assessment and in mental health research (Falmagne & Doignon, 2011; Weiss, 2004). Procedures based on item response theory (Embretson & Reise, 2013) can be used to obtain estimates for individuals (for example, severity of depression) and to identify suitable subsets of items for each individual more efficiently (Gibbons et al., 2008). In particular, in recent years several studies have demonstrated that diagnostic instruments could benefit substantially from modern statistical approaches such as item response theory (IRT) models, e.g., the Rasch model. Indeed, IRT modelling has shown that unidimensionality, an important aspect of test theory, cannot be taken for granted. For example, if a patient suffering from a severe somatic illness reports somatic symptoms in a depression questionnaire, those symptoms may be ascribed either to the somatic illness or to a depressive episode (Forkmann et al., 2009). Moreover, it was shown that questionnaires could be shortened without loss of information. This testing approach is referred to as computerized adaptive testing (CAT) and can be applied to achieve a more effective assessment (Petersen, Groenvold, Aaronson, Fayers, Sprangers, & Bjorner, 2006). The main idea is to administer a small, optimal number of items to the individual, selected according to his or her previous answers, without loss of measurement precision. This process mimics the semi-structured interview, with the difference that the inferences are made by an algorithm that considers all the available information and proceeds step by step through the assessment, following a logically correct process (Spoto, 2011).
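
The adaptive loop described above can be illustrated with a minimal sketch. It assumes a Rasch model, an expected a posteriori (EAP) trait estimate under a standard normal prior evaluated on a grid, and maximum-information item selection; the item difficulties, the stopping rule, and all function names are hypothetical choices made for illustration, not the procedure of any of the cited studies.

```python
import math

# Grid of trait values used for the (assumed) EAP estimate.
GRID = [-4.0 + 0.1 * i for i in range(81)]

def rasch_prob(theta, b):
    """Probability of endorsing an item of difficulty b under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_information(theta, b):
    """Fisher information of a Rasch item at trait level theta."""
    p = rasch_prob(theta, b)
    return p * (1.0 - p)

def eap(responses, difficulties):
    """Posterior mean and standard deviation of theta (standard normal prior)."""
    weights = []
    for theta in GRID:
        w = math.exp(-0.5 * theta ** 2)
        for item, endorsed in responses:
            p = rasch_prob(theta, difficulties[item])
            w *= p if endorsed else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(difficulties, answer, max_items=5, se_target=0.4):
    """Administer items one at a time until the stopping rule is met.

    answer(item) returns True if the respondent endorses the item.
    max_items and se_target are hypothetical stopping criteria.
    """
    theta, se = 0.0, 1.0          # provisional estimate before any response
    responses = []
    remaining = list(range(len(difficulties)))
    while remaining and len(responses) < max_items and se > se_target:
        # Select the unused item that is most informative at the current estimate.
        item = max(remaining, key=lambda i: item_information(theta, difficulties[i]))
        remaining.remove(item)
        responses.append((item, answer(item)))
        theta, se = eap(responses, difficulties)
    return theta, se, [item for item, _ in responses]

# Hypothetical five-item bank and a respondent who endorses every item.
bank = [-2.0, -1.0, 0.0, 1.0, 2.0]
theta, se, administered = run_cat(bank, lambda item: True)
print(administered, round(theta, 2), round(se, 2))
```

Under the Rasch model the most informative item is the one whose difficulty lies closest to the current estimate, so each response moves the next question towards the respondent's provisional level.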

Eggen and Straetmans (2000) combined IRT with statistical procedures, such as the sequential probability ratio test and weighted maximum likelihood, to classify examinees. Other systems use Bayesian statistical techniques instead of IRT in the evaluation of students’ knowledge (e.g., EDUFORM, Nokelainen, Silander, Tirri, Nevgi, & Tirri, 2001; and PARES, Marinagi, Kaburlasos, & Tsoukalas, 2007).
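
As a rough illustration of how a sequential probability ratio test can classify a respondent, the sketch below applies Wald's test to Rasch-scored responses. The cut point, the half-width of the indifference region (`delta`), the error rates, and the function names are assumptions made for this example; it is not a reproduction of the procedure used by Eggen and Straetmans (2000).

```python
import math

def rasch_prob(theta, b):
    """Probability of a positive response to an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def sprt_classify(responses, cut=0.0, delta=0.5, alpha=0.05, beta=0.05):
    """Classify a respondent as above or below a cut point.

    responses: sequence of (item_difficulty, positive_response) pairs,
    in the order in which the items were administered.
    Tests theta = cut - delta against theta = cut + delta and stops as
    soon as the log-likelihood ratio crosses one of Wald's bounds.
    """
    upper = math.log((1.0 - beta) / alpha)
    lower = math.log(beta / (1.0 - alpha))
    llr = 0.0
    for n, (b, positive) in enumerate(responses, start=1):
        p_hi = rasch_prob(cut + delta, b)
        p_lo = rasch_prob(cut - delta, b)
        if positive:
            llr += math.log(p_hi / p_lo)
        else:
            llr += math.log((1.0 - p_hi) / (1.0 - p_lo))
        if llr >= upper:
            return "above", n
        if llr <= lower:
            return "below", n
    return "undecided", len(responses)

# Hypothetical respondent giving ten positive responses to items of difficulty 0.
print(sprt_classify([(0.0, True)] * 10))
```

With these assumed settings each positive response to an item located at the cut point adds 0.5 to the log-likelihood ratio, so the test stops well before the full set of items has been used.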

In the field of knowledge assessment, ALEKS (Assessment and LEarning in Knowledge Spaces) is a complex system able to adaptively assess a subject’s knowledge and to provide a corresponding individualized learning path (Grayce, 2013; Donadello, Spoto, Sambo, Badaloni, Granziol, & Vidotto, 2016). Starting from a set of items on a specific topic, the output of the ALEKS system is the subset of items that the subject is able to answer; this subset is called the “knowledge state” and it refers to the level of knowledge of the individual in a particular field.
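
The notion of a knowledge state can be made concrete with a small, deliberately simplified sketch. It assumes a known knowledge structure (the collection of admissible states) and error-free answers, and it repeatedly asks the item that splits the remaining candidate states most evenly. The structure, the item names, and the function `assess` are hypothetical; the actual ALEKS procedure is probabilistic and is not reproduced here.

```python
def assess(structure, can_solve):
    """Identify the knowledge state by eliminating inconsistent candidates.

    structure: iterable of frozensets, the admissible knowledge states.
    can_solve(item): True if the subject answers the item correctly
    (assumed error-free in this sketch).
    Returns the identified state and the list of items asked.
    """
    candidates = set(structure)
    asked = []
    while len(candidates) > 1:
        items = sorted(set().union(*candidates))

        def count(q):
            return sum(1 for state in candidates if q in state)

        # Keep only items on which the remaining candidate states disagree.
        informative = [q for q in items if 0 < count(q) < len(candidates)]
        # Ask the item that splits the candidates as evenly as possible.
        q = min(informative, key=lambda q: abs(2 * count(q) - len(candidates)))
        asked.append(q)
        outcome = can_solve(q)
        candidates = {state for state in candidates if (q in state) == outcome}
    return next(iter(candidates)), asked

# Hypothetical structure on three items; the subject's true state is {a, b}.
structure = [
    frozenset(),
    frozenset("a"),
    frozenset("ab"),
    frozenset("ac"),
    frozenset("abc"),
]
true_state = frozenset("ab")
state, asked = assess(structure, lambda item: item in true_state)
print(sorted(state), asked)
```

In this toy example the state is identified after two questions rather than three, because the answer to one item already rules out every state that is incompatible with it.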

However, formulating an adaptive algorithm is even more difficult in the clinical setting. The objectivity of the questions, and therefore of the answers given by the subject, is much more questionable, and the probability of erroneous answers increases. Despite this, research has demonstrated that both item response theory and CAT (Baek, 1997) can be applied to the measurement of attitudes and personality variables (Reise & Waller, 1990). In the clinical context, Spiegel and Nenh (2004) developed an expert system that calculates possible symptom combinations and returns all possible risk diagnoses. Yong and colleagues (2007) developed an interactive self-help system for depression diagnosis that provides advice about patients’ levels of impairment. Simms, Goldberg, Roberts, Watson, Welte, and Rotterman (2011) developed the CAT for Personality Disorders (CAT-PD), a computerized adaptive assessment system for personality disorders. CAT has also been applied to develop adaptive classification tests by means of stochastic curtailment, using the CES-D for depression (Finkelman, Smits, Kim, & Riley, 2012; Smits, Finkelman, & Kelderman, 2016).

Gibbons and colleagues (2008) used the combination of item response theory and computerized adaptive testing (CAT) in mood and anxiety disorder assessment. In particular, they applied a bifactor structure consisting of a primary dimension and four sub-factors (mood, panic-agoraphobia, obsessive-compulsive, and social phobia). Participants completed the Mood and Anxiety Spectrum Scales (MASS) on two occasions. The first administration was used to define an adaptive testing version of the MASS; the second confirmed the functioning of the CAT in live computerized testing. The authors created item banks with a large item pool and were able to administer a small set of the items most relevant for a given individual with no loss of information, allowing a substantial reduction in administration time and, consequently, in patient and clinician burden. A chart review was performed for six patients with mood disorders (three with major depressive disorder and three with bipolar disorder) who were interviewed by the psychiatrist. Most of the CAT items that were endorsed were not documented in the six patients’ psychiatric evaluations through the SCID-I. These items included clinically important information, such as a history of manic symptoms and potentially risky behaviours. This study is an important example of how adaptive testing can be effective. Despite this, it has several limitations. First, the proposed model is totally deterministic: it starts from a theory based on the factorial structure and does not take into consideration the possibility that the subject’s answers are not correct. Second, the bifactor model assumes a single main dimension with its related sub-dimensions; if this condition is not satisfied, the model cannot be used. Finally, the model works only if each item loads on the primary dimension and on no more than one sub-dimension. Items related to multiple sub-dimensions are not appropriate for the bifactor model, and therefore this CAT approach is not applicable to them.
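
The structural restriction of the bifactor model mentioned above can be expressed as a simple check on a loading matrix. The sketch below flags items that lack a salient loading on the primary dimension or that load on more than one sub-dimension. The loadings, the item names, and the salience threshold of 0.3 are hypothetical values chosen for illustration and are not taken from Gibbons and colleagues (2008).

```python
def bifactor_violations(loadings, threshold=0.3):
    """Return the items that do not fit a bifactor structure.

    loadings: dict mapping item name to (primary_loading, sub_loadings),
    where sub_loadings is a list with one loading per sub-dimension.
    threshold: assumed cutoff above which a loading counts as salient.
    """
    violations = []
    for item, (primary, subs) in loadings.items():
        salient_subs = sum(1 for value in subs if abs(value) >= threshold)
        if abs(primary) < threshold or salient_subs > 1:
            violations.append(item)
    return violations

# Hypothetical loadings on a primary dimension and four sub-dimensions.
example = {
    "item_1": (0.70, [0.45, 0.05, 0.00, 0.10]),  # fits: one sub-dimension
    "item_2": (0.65, [0.02, 0.08, 0.04, 0.01]),  # fits: primary dimension only
    "item_3": (0.60, [0.40, 0.00, 0.00, 0.35]),  # loads on two sub-dimensions
    "item_4": (0.10, [0.00, 0.55, 0.00, 0.00]),  # no salient primary loading
}
print(bifactor_violations(example))
```

Items returned by such a check would have to be revised or dropped before a bifactor-based item bank could be used.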

Although there have been several attempts at adaptive clinical assessment, as far as we know, no system has been able to combine adaptivity, quantitative and qualitative information, and the estimation of error parameters through a probabilistic model.

Formal Psychological Assessment and its application to mood disorders are the core of this work, and they represent a further step towards overcoming the obstacles encountered so far in adaptive testing.
