Virtual and Augmented Reality: A preliminary study of a new approach to assessing and assisting everyday multi-tasking

MSc and Postgraduate Diploma in Clinical/Applied Neuropsychology

Research Portfolio

Mauricio Molinari Ulate

2337188M

Supervisor: Prof. Jonathan J. Evans

Advisor: Dr. Matthew Jamieson

Submitted in partial fulfilment of the requirements for the degree of MSc (Med Sci) Applied Neuropsychology, Academic Unit of Mental Health and Wellbeing, University of Glasgow

Date of submission: 31/08/18

Declaration of Originality Form

This form must be completed and signed and submitted with all assignments.

Please complete the information below (using BLOCK CAPITALS).

Name ... Student Number ... Course Name ... Assignment Number/Name ...

An extract from the University’s Statement on Plagiarism is provided overleaf. Please read carefully THEN read and sign the declaration below.

I confirm that this assignment is my own work and that I have:

Read and understood the guidance on plagiarism in the Undergraduate Handbook, including the University of Glasgow Statement on Plagiarism

Clearly referenced, in both the text and the bibliography or references, all sources used in the work

Fully referenced (including page numbers) and used inverted commas for all text quoted from books, journals, web etc. (Please check the section on referencing in the ‘Guide to Writing Essays & Reports’ appendix of the Graduate School Research Training Programme handbook.)

Provided the sources for all tables, figures, data etc. that are not my own work

Not made use of the work of any other student(s) past or present without acknowledgement. This includes any of my own work, that has been previously, or concurrently, submitted for assessment, either at this or any other educational institution, including school (see overleaf at 31.2)

Not sought or used the services of any professional agencies to produce this work

In addition, I understand that any false claim in respect of this work will result in disciplinary action in accordance with University regulations

DECLARATION:

I am aware of and understand the University’s policy on plagiarism and I certify that this assignment is my own work, except where indicated by referencing, and that I have followed the good academic practices noted above

Dedication

Acknowledgements

In the first place, my special thanks to the Ministerio de Ciencia, Tecnología y Telecomunicaciones of the Costa Rican Government for awarding me a scholarship to financially support my studies at the University of Glasgow. I strongly encourage them to keep offering this opportunity to develop high-quality professionals who can improve science and technology in Costa Rica, which will contribute to the development of our country.

Second, my gratitude to my supervisor Prof. Jonathan Evans for his constant support and professional advice during the entire process, and for sharing his knowledge with kindness and courtesy, which is already very valuable for my academic and professional career. A special mention also goes to Dr. Matthew Jamieson for offering me the opportunity to work on his project and for his feedback during the development of my dissertation.

Third, I extend my appreciation to Dr. Mark McGill and the Computer Interactive System Section of the School of Computer Science of the University of Glasgow; without your help and effort this project would not have been possible.

Fourth, all my love and gratitude to my classmates Hooi and Leonidas, because their emotional and intellectual support is part of this achievement. Likewise, to my flatmates Jakob, Ellen and Jordanna for their day-to-day motivation, and to my best friends Beto, Carlos, Ricardo and Víquez, who, despite the distance, were constantly interested and sending me motivational words.

Lastly, my deepest gratitude to all my family, particularly my parents and siblings. They have always been by my side throughout my personal, academic and professional development. Your love and support always help me to do my best.

List of Tables

Table 1.1 Inclusion and Exclusion Criteria for Participants ... 7

Table 1.2 Timetable ... 15

Table 2.1 Quality Assessment Scores ... 33

Table 2.2 Data extracted from the final studies included in the systematic review ... 35

Table 3.1 Participants’ Demographics ... 69

Table 3.2 Descriptive Statistics for D-KEFS and both conditions of VR-SST ... 78

List of Figures

Figure 1.1 Study PRISMA flow diagram ... 34

Figure 2.1 Comparison between VR-SST conditions’ means according to test administration time ... 81

Figure 2.2 Supermarket Layout... 96

Figure 2.3 Vending Machine... 96

Figure 2.4 Coffee Machine ... 96

Figure 2.5 Interaction with Pizza To Go ... 97

Figure 2.6 Mobile Shopping List App and Shopping Basket ... 97

Figure 2.7 Coffee message ... 97

Figure 2.8 AR coffee assistance ... 97

Figure 2.9 AR objects highlighting and text assistance ... 98

List of Appendices

Appendix 1.1 Standard Quality Assessment Criteria, Manual for Quality Scoring of Quantitative Studies ... 53

Appendix 2.1 Images of the VR supermarket layout and assistive technologies ... 95

Appendix 2.2 Correlations between primary and secondary outcomes from the Standard VR-SST and the D-KEFS Tower Test ... 99

Appendix 2.3 VR-SST Participants’ Instructions Sheet ... 101

Appendix 2.4 VR-SST Participants’ Information Sheet ... 103

Appendix 2.5 VR-SST Scoring Sheet ... 108

Appendix 2.6 Participants’ Consent Form ... 114

Appendix 2.7 Research Ethics Committee Approval Letter ... 116

Chapter 1

MSc and Postgraduate Diploma in Clinical/Applied Neuropsychology

Research Project Proposal

Virtual and Augmented Reality: A preliminary study of a new approach to assessing and assisting everyday multi-tasking

Mauricio Molinari Ulate

2337188M

Supervisor: Prof. Jonathan J. Evans

Advisor: Dr. Matthew Jamieson

Submitted for examination in part fulfilment of the MSc in Clinical/Applied Neuropsychology at the University of Glasgow

Summary

Introduction

Acquired brain injury (ABI) may affect a variety of brain areas, including cortical and subcortical regions, causing a range of cognitive deficits that can impact many aspects of everyday life (Fasotti, 2017; Malec, 2017). Among the main cognitive difficulties, deficits in executive functions, attention and memory are typically reported after non-progressive and progressive brain injury conditions (Suchy, 2009; Bradley & Kapur, 2010). Because of the importance of these cognitive constructs for activities of daily living (ADL), neuropsychological assessments have targeted specific tests and tasks that provide information about the individual’s performance in these areas. However, cases have been identified in the literature in which individuals performed at or above average levels on standardized cognitive tests but still reported difficulties in everyday life activities (Eslinger & Damasio, 1985; Shallice & Burgess, 1991).

In this context, Shallice & Burgess (1991) began the development of alternative tests that could detect everyday executive difficulties, one of which was the Multiple Errands Test (MET). The MET (Shallice & Burgess, 1991) and subsequent versions, such as the MET-SV (simplified version) (Alderman, Burgess, Knight & Henman, 2003) and the MET-R (revised version) (Morrison, et al., 2013), were shown to be sensitive to reported difficulties in ADL and executive function tasks. These tests share a common format in which participants are assessed in natural environments (e.g. shopping mall, hospital) and are given various tasks that must be completed within a limited time and according to certain rules. However, as Jansari, et al. (2014) indicate, this approach has pragmatic limitations, such as the clinician’s reduced control over the setting, the time needed to transport individuals to suitable settings, and the difficulty of administering these types of assessment in clinical settings.

[...] feasible way. This approach has resulted in the development of several novel virtual reality assessment tests, which vary in the technological equipment used, the virtual environments created (supermarkets, libraries and classrooms), the tasks, and the cognitive constructs assessed (Christiansen, et al., 1998; Rizzo, et al., 2000; Lee, et al., 2003; Kang, et al., 2008; Klinger, Chemin, Lebreton & Marié, 2004; Rand, Rukan, Weiss & Katz, 2009; Raspelli, et al., 2012; Jovanovski, Zakzanis, Campbell, Erb & Nussbaum, 2012; Renison, Ponsford, Testa, Richardson & Brownfield, 2012; Jansari, et al., 2014).

Despite the number of previous studies, the technology used until now does not allow the participant to be fully immersed in the virtual environment, as tests typically run on computer screens, which restricts the vividness of the experience and limits the reliability of the results. By contrast, this project marks a step forward by enhancing the virtual reality assessment approach through a VR Supermarket Shopping Task (VR-SST) that fully immerses the participant within the virtual environment, allowing the individual to walk and move in the real world whilst interacting with the virtual environment. In addition, the VR environment provides the opportunity to examine the use of electronic assistive technology, such as a mobile app (also incorporated into the virtual environment) that can provide in-task reminders (e.g. keeping a list of products to buy during the task). Previous similar tasks, in both real and virtual versions, allowed participants to memorize the list of items or keep it in a paper-and-pencil version. However, with mobile technology increasingly used to support cognition in everyday life, MET tasks need to be able to support the use of such technology during task completion (Shallice & Burgess, 1991; Alderman, et al., 2003; Josman, Schenirderman, Klinger & Shevil, 2009; Rand, et al., 2009; Raspelli, et al., 2012; Jovanovski, et al., 2012; Morrison, et al., 2013; Cipresso, et al., 2014; Kolovopoulos, 2017).

[...] people with cognitive impairment with cues relevant to task completion in the real world. Virtual environments provide the opportunity to test out AR tools without the need to operate in real environments that may present significant risk to participants. The present study will therefore provide the opportunity to study the benefits of an augmented reality technology for participants’ performance, which will be helpful in developing future devices that compensate for cognitive impairments in the real world. A preliminary study (Kolovopoulos, 2017) has shown an effect of AR on participants’ performance in a shopping task by comparing a condition in which participants memorized a list of products and kept a budget in mind with a condition in which an AR assistive technology provided the list and helped the participant with expenses during the task. This project will focus on comparing the benefits of AR assistive technology with those of an existing assistive technology, such as a mobile phone app.

Lastly, the project represents an integration of computing science and clinical neuropsychology, incorporating professionals from the Computing Science Department and the Institute of Health and Wellbeing at the University of Glasgow.

Aims and Hypotheses

Aims

The study has two aims. The first is to examine the validity of the VR-SST as a tool for assessing executive functions through an everyday-task approach.

The second is to explore whether augmented reality technology improves participants’ performance, by comparing the results of an AR condition with a condition in which a standard mobile phone app (created in the virtual environment) is used.

Hypotheses

1. Performance on the VR-SST will correlate with performance on a standardized neuropsychological test (D-KEFS Tower Test).

2. Test performance with the AR assistive technology will differ significantly from performance with the mobile app assistive technology.

Plan of Investigation

Participants

The sample will be composed of 32 healthy individuals aged 18 years or above, mainly students of the University of Glasgow. For all participants recruited, sociodemographic details such as age, gender, ethnicity and education will be collected.

The inclusion and exclusion criteria are shown in the following table:

Table 1.1 Inclusion and Exclusion Criteria for Participants

Inclusion Criteria:
18 years or above
English as primary language

Exclusion Criteria:
Under 18 years of age
Diagnosis of any cognitive deficit
Diagnosis of any mental health disorder
Pregnancy
Participant or family member history of epilepsy, stroke, traumatic brain injury or photosensitivity
Prone to spells of dizziness
Presence of visual disorders
Sleep deprivation
Under the influence of alcohol or drugs
Suffer from a heart condition or other [...]

Recruitment

The participants will be recruited through the Student Voice system, the Psychology Department and the School of Computer Science of the University of Glasgow via email and flyer distribution. The flyer will also be distributed through social media such as Facebook and Instagram. All recruitment will follow the inclusion and exclusion criteria stated above. Each individual will be paid £10 for their participation in the study.

Settings and Equipment

The following settings and equipment will be needed for the study:

1. An empty room of at least 4x4 meters in which to install the VR equipment, allowing the participant to move freely in the real environment without being obstructed.

2. The HTC Vive system: a virtual reality headset released on April 5th, 2016, by the smartphone company HTC and Valve Corporation. The VR system includes two screens, one for each eye, with a display resolution of 1080x1200, streaming data at a rate of 90 Hz. It allows the participant to immerse in a “full room” visual field experience with a 360-degree view, which lets the individual walk around and explore the VR environment, interacting with objects and the surrounding area through two controllers, one per hand, that are also visible in the VR display. The device tracks movement through gyrosensors, an accelerometer and laser position sensors (Kolovopoulos, 2017; D’Orazio & Savov, 2015).

Measures

The following instruments will be used to assess the participants’ performance, motion sickness and the participant experience with the virtual reality system:

1. VR-SST: the task is based on the Multiple Errands Test approach to assessing executive functions in everyday life situations (Shallice & Burgess, 1991) and its VR variants developed in recent years (Christiansen, et al., 1998; Rizzo, et al., 2000; Lee, et al., 2003; Kang, et al., 2008; Klinger, et al., 2004; Rand, et al., 2009; Raspelli, et al., 2012; Jovanovski, et al., 2012; Renison, et al., 2012; Jansari, et al., 2014).

The test consists of five tasks that the examinee must accomplish within the virtual supermarket: a) buy nine products on the shopping list provided in the virtual mobile phone app, b) buy an item of their preference from the vending machine, c) buy a product that will be notified during the task, d) order a large pepperoni pizza from Pizza to Go, and e) collect the large pepperoni pizza from Pizza to Go. During the tasks, the participants must follow these specific rules: a) remain within the budget assigned, b) complete the task in the least amount of time possible, without rushing, c) execute all the tasks, and d) do not talk to the researcher during the task. Regarding the measures of performance, the following categories are assessed: a) task failures (e.g. forgetting to purchase one or more items from the list), b) rule breaks (e.g. spending more money than the budget assigned), c) inefficiencies (e.g. performing the tasks in the same order as the instructions) and d) effective strategies (e.g. buying the products by categories). A total errors score will be calculated, and this will be the primary VR-SST outcome measure. An overall efficiency score will also be calculated by multiplying the inverse of the score for the number of tasks successfully completed by the time taken to complete the test.
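As a minimal sketch of the efficiency score described above, the calculation could be expressed as follows. The function name, the use of seconds as the time unit, and the decision to treat zero completed tasks as undefined are all assumptions made for illustration; none of them is specified in the proposal.

```python
def efficiency_score(tasks_completed: int, time_taken_s: float) -> float:
    """Inverse of the number of tasks completed, multiplied by completion time.

    Assumptions (not stated in the proposal): time is recorded in seconds,
    and a participant who completes no task receives no score because the
    inverse of zero does not exist.
    """
    if tasks_completed <= 0:
        raise ValueError("efficiency is undefined when no task is completed")
    if time_taken_s < 0:
        raise ValueError("time taken cannot be negative")
    return (1 / tasks_completed) * time_taken_s


# Hypothetical participant: all five tasks completed in 10 minutes.
score = efficiency_score(tasks_completed=5, time_taken_s=600)
```

Under this reading the score equals the average time spent per completed task, so lower values would indicate more efficient performance.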

For the purpose of this study, there are two versions of the VR-SST:

b. AR Version: this recreates an augmented reality assistive technology that consists of virtual information (presented through images) overlaid on the real world. The information is presented within the participant’s field of view, without completely obstructing the view of the real world.

The participant will be assisted with a checklist of products, highlighting of the items on the list (which will be highlighted when the participant is close to them, showing the name and the price), and visual text prompts at the shelves indicating the item needed, as well as instructions for using the coffee and vending machines.

2. D-KEFS Tower Test (Delis, Kaplan & Kramer, 2001): this was developed as a tool for the assessment of executive functions, such as spatial planning, rule learning, inhibition of impulsive responding, inhibition of perseverative responding, and establishing and maintaining the instructional set. It consists of five disks of different sizes and a board with three vertical pegs. The examiner places the disks on the pegs in a starting position and shows a picture of the disks in a different position, which the examinee must reach (ending position). The examinee has to move the disks across the pegs and reach the ending position in the fewest moves possible. There are two rules that the examinee must follow: a) move only one disk at a time and b) never place a larger disk on a smaller one. The measures of the task include: total number of moves (primary outcome), first-move completion time, item-completion time, final achievement (correct or incorrect tower) and number of rule violations (Delis, et al., 2001).

For health concerns, the following instruments will assess motion sickness:

2. Illness rating scale: a standard 7-point illness rating scale previously used by one of the researchers (McGill, et al., 2017) to measure motion sickness during each virtual reality condition. If the participant ever reaches the stage of stomach awareness, they will remove the headset and the study will stop (Griffin & Newman, 2004).

To assess the participant’s experience with the virtual reality system:

1. NASA-Task Load Index (NASA-TLX): a six-component scale that shows the contribution of each factor to the workload of a specific activity from the perspective of the rater. The scores of the six components are combined into an overall workload score (Hart & Staveland, 1988).

2. iGroup Presence Questionnaire (IPQ): a subjective rating scale for measuring the sense of presence experienced by participants in a virtual environment. It is composed of three subscales and an additional item not belonging to any of the subscales (Schubert, Friedmann & Regenbrecht, 2001).

3. Translational Gain Questionnaire: is a brief questionnaire developed by the researcher composed of three items assessing the effect of the translational gain experience in the virtual reality conditions.

4. Open-ended interview: six open-ended questions will be asked to the participants about their experience using the assistive technologies assessed through the virtual reality system.

Design and Procedures

[...] following stages: 1) a practice simulation will be loaded so that participants can familiarize themselves with the first VR condition they will perform, the system and the use of the controllers; 2) the first VR version of the Supermarket Shopping Task will be performed; 3) participants will complete the NASA-TLX and the iGroup Presence Questionnaire; 4) participants will be assessed with the D-KEFS Tower Test; 5) a second practice simulation will be loaded so that participants can familiarize themselves with the second VR condition they will perform, the system and the use of the controllers; 6) the second VR version will be administered; 7) participants will again complete the NASA-TLX and the iGroup Presence Questionnaire, and will complete the translational gain questionnaire and the open-ended interview.

The exposure to both versions will be counterbalanced: for some participants the Standard Version will be performed at stage 2 and the AR Version at stage 6, and for the others the AR Version will be performed at stage 2 and the Standard Version at stage 6. In addition, the experiment introduces two different layouts (with two different shopping lists) for both versions, which will also be counterbalanced. All participants will be assessed with the standardized neuropsychological test between the two VR task versions.
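The counterbalancing described above could be planned along the following lines. This is only a sketch: the proposal does not say how participants are allocated, so the even split across the four combinations of version order and layout order, the fixed random seed, and the layout labels ("Layout A", "Layout B") are assumptions introduced for illustration.

```python
import random
from collections import Counter
from itertools import product

# Hypothetical labels: the proposal names the two versions but not the layouts.
VERSION_ORDERS = [("Standard", "AR"), ("AR", "Standard")]
LAYOUT_ORDERS = [("Layout A", "Layout B"), ("Layout B", "Layout A")]


def counterbalance(n_participants: int, seed: int = 0) -> list[dict]:
    """Assign participants evenly to every version-order x layout-order cell."""
    cells = list(product(VERSION_ORDERS, LAYOUT_ORDERS))
    if n_participants % len(cells) != 0:
        raise ValueError("sample size must be a multiple of the number of cells")
    allocation = cells * (n_participants // len(cells))
    # A fixed seed keeps the allocation reproducible (an assumption of this sketch).
    random.Random(seed).shuffle(allocation)
    return [
        {
            "participant": i + 1,
            "first_session": (versions[0], layouts[0]),
            "second_session": (versions[1], layouts[1]),
        }
        for i, (versions, layouts) in enumerate(allocation)
    ]


plan = counterbalance(32)
first_session_counts = Counter(row["first_session"] for row in plan)
```

With a sample of 32, each of the four cells receives eight participants, so every version and every layout appears equally often in the first and second VR sessions.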

Participants’ Safety regarding the use of Virtual Reality equipment

Health concerns (physical and mental) regarding head-mounted display use

Health concerns regarding motion sickness

The incidence of motion/simulator sickness is unknown for this study; however, at the recruitment phase, the exclusion criteria will be applied to reduce the likelihood of individuals suffering any adverse symptoms. The participants will also be assessed on their susceptibility to simulator motion sickness during and after each condition of the VR-SST. For this purpose, a) a standard 7-point illness rating scale will be used during each condition, and if the participant ever reaches the stage of stomach awareness they will have to remove the headset and the study will stop prematurely; and b) the Simulator Sickness Questionnaire (SSQ) will be used after each condition. This will reduce the likelihood of severe motion sickness being induced, and ensure that all participants finish each session with, at most, mild motion sickness. Moreover, participants will be instructed to report any rapid increase in sickness immediately, so that the session can be stopped before symptoms worsen.

If participants suffer from dizziness, nausea or vomiting during the study, a first aid kit and sick bags will be provided, and the study will stop immediately. The room environment will also be controlled to reduce sickness symptoms, with air conditioning providing a cool environment throughout the study, and drinking water will be available. If participants cannot continue with the study, all data recorded to that point will be erased and the participant will be free to leave after a sufficient rest period. In the unlikely event of an extreme case of motion sickness or any other atypical symptom, the experimenter will cease the study and remain with the participant until collection by friends or family, or they will be taken to the nearest A&E unit (Glasgow Royal Infirmary) or their GP for treatment.

Being unaware of what is going on in their physical environment

[...] participant to ensure his/her safety.

The condition of being unaware of the physical environment can, rarely, cause anxiety or stress, and for this reason, any participant with conditions such as claustrophobia, panic attacks or anxiety will be excluded. Additionally, participants will be assured that their personal space will not be violated, and they will not be touched or surprised during the study other than to prevent them walking into a wall (this is unlikely as they would have walked beyond the perimeter of the virtual task).

Data Analysis/Power Calculation

The sample size was estimated using G*Power 3.1 with effect sizes obtained in previous research on virtual reality assessment tests. Jovanovski, et al. (2012) found a correlation between the Plan Score in the VR Multitasking in the City Test and the Modified Six Elements (r=0.40) in a sample of undergraduate students. Lalonde, Henry, Drouin-Germain, Nolin, and Beauchamp (2013) also found an association between errors in the VR Stroop Task and rule violations on the D-KEFS Tower Test (r=0.51).

Considering this evidence, the mean of both effect sizes was obtained (r=0.455), and G*Power indicated that a sample of 32 participants would give 80% power (at alpha=0.05) to detect an effect size of r=0.45. In relation to the difference between conditions (AR vs mobile phone app), with alpha at 0.05, a sample size of 32 will have 80% power to detect an effect size of r=0.51.
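The reasoning above can be explored with a rough sketch. The code below averages the two published correlations and approximates the power of a correlation test using the Fisher z transformation. This is a normal approximation and not G*Power's exact routine, so its output may differ from the figures reported above. The proposal also does not state whether a one- or two-tailed test was assumed, so the sketch computes both. Function and variable names are illustrative.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r: float, n: int, alpha: float = 0.05, tails: int = 2) -> float:
    """Approximate power to detect a population correlation r with n participants.

    Uses the Fisher z transformation, whose standard error is 1 / sqrt(n - 3).
    """
    if n <= 3:
        raise ValueError("the Fisher approximation needs more than 3 participants")
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / tails)
    shift = abs(atanh(r)) * sqrt(n - 3)  # expected test statistic if r is true
    power = 1 - std_normal.cdf(z_crit - shift)
    if tails == 2:
        power += std_normal.cdf(-z_crit - shift)  # negligible opposite tail
    return power


# Effect sizes cited above: Jovanovski et al. (2012) and Lalonde et al. (2013).
mean_r = (0.40 + 0.51) / 2
power_two_tailed = correlation_power(r=0.45, n=32, tails=2)
power_one_tailed = correlation_power(r=0.45, n=32, tails=1)
```

Comparing the two results with the 80% target shows how much the tail assumption matters for judging whether a sample of 32 is adequate.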

Regarding the comparison between the two versions of the VR-SST, if assumptions are met, t-tests will be used to compare the performance on both conditions. The primary comparison will be for the total errors score. Data will be analysed through the software IBM SPSS Statistics 23.
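A minimal sketch of the planned comparison is given below. It assumes a paired-samples t-test, since each participant completes both conditions, and it uses invented error scores purely for illustration. Only the t statistic, the degrees of freedom and Cohen's d_z are computed; the p-value would be obtained from the statistical package.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(condition_a: list[float], condition_b: list[float]) -> tuple[float, int, float]:
    """Return (t statistic, degrees of freedom, Cohen's d_z) for paired scores."""
    if len(condition_a) != len(condition_b):
        raise ValueError("each participant needs a score in both conditions")
    if len(condition_a) < 2:
        raise ValueError("at least two participants are required")
    differences = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(differences)
    sd_diff = stdev(differences)  # sample SD of within-participant differences
    if sd_diff == 0:
        raise ValueError("t is undefined when all differences are identical")
    t_stat = mean(differences) / (sd_diff / sqrt(n))
    d_z = mean(differences) / sd_diff
    return t_stat, n - 1, d_z


# Invented total-error scores for five hypothetical participants.
standard_version_errors = [5, 7, 6, 9, 4]
ar_version_errors = [3, 6, 4, 7, 4]
t_stat, df, d_z = paired_t(standard_version_errors, ar_version_errors)
```

In this sketch a positive t indicates more errors in the Standard Version than in the AR Version for the hypothetical data.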

Practical Application

The present study contributes to the areas of computer science and clinical neuropsychology, and to the interdisciplinary research approach between them. The examination of VR and AR technologies will support their improvement and development, and help validate their usefulness in different fields. If the VR task reflects executive skills and it can be demonstrated that AR technology improves performance in multiple-errand type tasks, this raises the possibility that this technology may be useful in both the assessment and rehabilitation of individuals with cognitive impairments due to different neurological diagnoses.

Timescale

Table 1.2 Timetable

Activities scheduled across months 1 to 9 of 2018:

Submit Research Proposal to Supervisor and Advisor

Make corrections from Supervisor and Advisor

Development of VR-SST

Submit Research Proposal to UofG Ethics

Wait for Approval from UofG Ethics

Data Collection

Analyze Final Data

Write Up of Final Version of Research Project

Submit Research Project

Ethical Approval

References

Alderman, N., Burgess, P. W., Knight, C. & Henman, C. (2003). Ecological validity of a simplified version of the multiple errands shopping test. Journal of the International Neuropsychological Society, 9, 31-44. doi: 10.1017/S1355617703910046

Bradley, V. & Kapur, N. (2010). Neuropsychological assessment of memory disorders. In Gurd, J. M., Kischka, U. & Marshall, J. C. (Eds.), The Handbook of Clinical Neuropsychology. Second Edition (pp. 159-183). Oxford: Oxford University Press.

Christiansen, C., Abreu, B., Ottenbacher, K., Huffman, K., Masel, B. & Culpepper, R. (1998). Task performance in virtual environments used for cognitive rehabilitation after traumatic brain injury. Archives of Physical Medicine and Rehabilitation, 79(8), 888-892. doi: https://doi.org/10.1016/S0003-9993(98)90083-1.

Cipresso, P., Albani, G., Serino, S., Pedroli, E., Pallavicini, F., Mauro, A. & Riva, G. (2014). Virtual multiple errands test (VMET): a virtual reality-based tool to detect early executive functions deficit in Parkinson’s disease. Frontiers in Behavioral Neuroscience, 8, 1-11. doi: 10.3389/fnbeh.2014.00405

Delis, D. C., Kaplan, E. & Kramer, J. H. (2001). Delis–Kaplan Executive Function System (D-KEFS). San Antonio, TX: The Psychological Corporation.

D’Orazio, D. & Savov, V. (2015, March 1). Valve’s VR headset is called the Vive and it’s made by HTC. The Verge. Retrieved from https://www.theverge.com/2015/3/1/8127445/htc-vive-valve-vr-headset

Fasotti, L. (2017). Mechanisms of Recovery After Acquired Brain Injury. In Wilson, B. A., Winegardner, J., Van Heugten, C. M. & Ownsworth, T. (Eds.), Neuropsychological Rehabilitation: The International Handbook (pp. 25-35). New York, Oxon: Routledge.

Griffin, M. J. & Newman, M. M. (2004). Visual field effects on motion sickness in cars. Aviation, Space, and Environmental Medicine, 75(9), 739–748. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/15460624

Hart, S. G. & Staveland, L. E. (1988). Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Human mental workload. Retrieved June 17, 2013 from http://humanfactors.arc.nasa.gov/groups/TLX/downloads/NASA-TLXChapter.pdf

Jansari, A. S., Devlin, A., Agnew, R., Akesson, K., Murphy, L. & Leadbetter, T. (2014). Ecological Assessment of Executive Functions: A New Virtual Reality Paradigm. Brain Impairment, 15(2), 71-87. doi: 10.1017/BrImp.2014.14

Josman, N., Schenirderman, A. E., Klinger, E. & Shevil, E. (2009). Using virtual reality to evaluate executive functioning among persons with schizophrenia: A validity study. Schizophrenia Research, 115, 270–277. doi: 10.1016/j.schres.2009.09.015

Jovanovski, D., Zakzanis, K., Campbell, Z., Erb, S. & Nussbaum, D. (2012). Development of a Novel, Ecologically Oriented Virtual Reality Measure of Executive Function: The Multitasking in the City Test. Applied Neuropsychology: Adult, 19(3), 171-182. doi: 10.1080/09084282.2011.643955

Kennedy, R. S., Lane, N. E., Berbaum, K. S. & Lilienthal, M. G. (1993). Simulator Sickness Questionnaire: An Enhanced Method for Quantifying Simulator Sickness. The International Journal of Aviation Psychology, 3(3), 203–220. doi: https://doi.org/10.1207/s15327108ijap0303_3

Klinger, E., Chemin, I., Lebreton, S. & Marié, R. M. (2004). A virtual supermarket to assess cognitive planning. CyberPsychology & Behavior, 7(3), 292–293.

Kolovopoulos, D. (2017). Testing Augmented Reality Assistive Technologies in Virtual Reality. University of Glasgow, Glasgow.

Lalonde, G., Henry, M., Drouin-Germain, A., Nolin, P. & Beauchamp, M. H. (2013). Assessment of executive function in adolescence: A comparison of traditional and virtual reality tools. Journal of Neuroscience Methods, 219, 76-82. doi: http://dx.doi.org/10.1016/j.jneumeth.2013.07.005

Lee, J. H., Ku, J., Cho, W., Hahn, W. Y., Kim, I. Y., Lee, S., Kang, Y., Kim, D. Y., Yu, T., Wiederhold, B. K., Wiederhold, M. D. & Kim, S. I. (2003). A Virtual Reality System For The Assessment and Rehabilitation Of The Activities Of Daily Living. CyberPsychology & Behavior, 6(4), 383-388. doi: https://doi.org/10.1089/109493103322278763

Malec, J. M. (2017). Assessment for Neuropsychological Rehabilitation Planning. In Wilson, B. A., Winegardner, J., Van Heugten, C. M. & Ownsworth, T. (Eds.), Neuropsychological Rehabilitation: The International Handbook (pp. 25-35). New York, Oxon: Routledge.

Morrison, M. T., Giles, G. M., Ryan, J. D., Baum, C. M., Dromerick, A. W., Polatajko, H. J. & Edwards, D. F. (2013). Multiple Errands Test-Revised (MET-R): A Performance-Based Measure of Executive Function in People with Mild Cerebrovascular Accident. American Journal of Occupational Therapy, 67(4), 460-468. doi: http://dx.doi.org/10.5014/ajot.2013.007880

Rand, D., Rukan, S., Weiss, P. L. & Katz, N. (2009). Validation of the Virtual MET as an assessment tool for executive functions. Neuropsychological Rehabilitation, 19(4), 583-602. doi: 10.1080/09602010802469074

Raspelli, S., Pallavicini, F., Carelli, L., Morganti, F., Pedroli, E., Cipresso, P., Poletti, B., Corra, B., Sangalli, D., Silani, V. & Riva, G. (2012). Validating the Neuro VR-Based Virtual Version of the Multiple Errands Test: Preliminary Results. Presence, 21(1), 31-42.

Renison, B., Ponsford, J., Testa, R., Richardson, B. & Brownfield, K. (2012). The Ecological and Construct Validity of a Newly Developed Measure of Executive Function: The Virtual Library Task. Journal of the International Neuropsychological Society, 18, 440-450. doi: 10.1017/S1355617711001883

Rizzo, A., Buckwalter, J., Van der Zaag, C., Neumann, U., Thiebaux, M., Chua, C., van Rooyen, A., Humphrey, L. & Larson, P. (2000, March 18-22). Virtual environment applications in clinical neuropsychology. Paper presented at Proceedings IEEE Virtual Reality 2000, New Brunswick, NJ, USA. doi: 10.1109/VR.2000.840364

Schubert, T., Friedmann, F. & Regenbrecht, H. (2001). The experience of presence: Factor analytic insights. Presence: Teleoperators and Virtual Environments, 10(3), 266–281.


Chapter 2

MSc and Postgraduate Diploma in Clinical/Applied Neuropsychology

Systematic Literature Review

Validity and Reliability of Immersive Virtual Reality Neuropsychological Tests for Assessing Executive Functions: A systematic review

Mauricio Molinari Ulate 2337188

Submitted for examination in part fulfilment of the MSc in Clinical Neuropsychology at the University of Glasgow

Validity and Reliability of Immersive Virtual Reality Neuropsychological Tests for Assessing Executive Functions: A systematic review

Mauricio Molinari

ABSTRACT

Virtual reality (VR) is a simulation technology that allows the recreation of real-world environments such as apartments, schools and supermarkets. For this reason, it has been considered a viable tool for recreating everyday activities in which neuropsychologists can assess individuals' performance and determine the presence of cognitive deficits. However, most of the VR assessment tools developed to date have used non-immersive VR technology rather than a more immersive approach in which participants are fully immersed in the virtual world, offering a more vivid experience of their actions that is closer to everyday situations.

This systematic review examines the immersive VR neuropsychological tests for the assessment of executive functions (EF) that have been published and analysed for their validity and reliability. Articles published up to 30th July 2018 were included. The search was carried out through the following databases: EbscoHost, PsycBite, PsycNet, PubMed – NCBI, Redalyc, Web of Science (SciELO citations index), SciELO and Sciencedirect, and by screening the reference lists of the articles. According to the selection criteria, a total of 13 studies (n = 683) were included in the review. The review process identified eight immersive VR assessment tools for EF. Only one study assessed the reliability of the tool. Validation studies showed small to medium effect sizes when comparing performance on the VR test with performance on traditional paper-and-pencil or computerised neuropsychological tests, and large effect sizes when discriminating between healthy and clinical populations.

Keywords: Immersive virtual reality, neuropsychological assessment, executive functions

Introduction

practice (Culley & Evans, 2010; Pauly-Takacs, Moulin & Estlin, 2011; Jamieson, et al., 2017).

In terms of assessment, the inclusion of technology has led to the development of novel tests that may overcome several constraints of traditional pencil-paper neuropsychological tests. In particular, it has been argued that traditional tests tend to lack ecological validity, with individuals performing above or at average levels, but nevertheless experiencing difficulties in their daily life activities (Eslinger & Damasio, 1985; Shallice & Burgess, 1991). For this reason, virtual reality (VR) has arisen as one of the approaches to developing tasks closer to activities of daily living (ADLs) that could increase the ecological validity of cognitive evaluation.

VR refers to simulation technology that creates a computer-generated three-dimensional (3D) interface where the user can experience an active interaction with the virtual environment through visual experiences displayed on head mounted displays (HMD) and/or computer or TV screens (Galimberti, Ignazi, Vercesi & Riva, 2013; Shahrbanian, et al., 2012). It allows recreation or simulation of real-world environments such as supermarkets, cities, classrooms, laboratories, apartments, etc. (Lalonde, Henry, Drouin-Germain, Nolin & Beauchamp, 2013; Jovanovski, Zakzanis, Campbell, Erb & Nussbaum, 2012; Jansari, et al., 2014; Parsons & McMahan, 2017; Davison, Deeprose & Terbeck, 2017; Parsons & Barnett, 2017) allowing individuals to perform tasks similar and relevant to everyday living.

environment due to an exocentric navigation, where the user is outside the environment, looking at it through a screen or by 2D interaction devices (Shahrbanian, et al., 2012; Kozhevnikov & Gurlitt, 2013). The individual’s navigation around the virtual environment (VE) can occur using keyboards, mice, joysticks, controllers, video-capture or through room-scale VR; the latter allowing the person to explore the VE via real-world walking (Wilson, McGill, Jamieson, Williamson & Brewster, 2018).

To date, a number of studies have contributed to the development of different types of virtual reality assessments for cognition, including memory, prospective memory, attention, inhibition, executive functions, navigation, and involving a variety of clinical populations such as acquired brain injury (ABI), schizophrenia or attention deficit hyperactivity disorder (ADHD) (Christiansen, et al., 1998; Ku, et al., 2003; Parsons, Bowerly, Buckwalter & Rizzo, 2007; Sweeney, Kersel, Morris, Manly & Evans, 2010; Grewe, et al., 2014). In a previous meta-analytic review, Negut, Matu, Sava & David (2016) studied the sensitivity of some of these VR measures in detecting cognitive deficits by comparing the performance of clinical and healthy populations, identifying that VR measures have a moderate sensitivity effect size in detecting cognitive impairments, considered to be similar to the global effect size identified in traditional paper-pencil and computerised tests.

EF is a neuropsychological construct involving a set of higher-level mental processes that control other cognitive systems (such as memory or reasoning) allowing behaviour to be guided in a purposeful, goal-directed and future-oriented way during novel situations or to overcome an automatic response tendency (Norman & Shallice, 1986; Suchy, 2009; Diamond, 2013; Jansari, et al., 2014). As VR offers the possibility of assessing the performance of individuals in simulated everyday activities, the construct of executive functions was chosen because of its importance in ADLs.

Methods

Literature Search

Seven databases were searched, including EbscoHost, PsycBite, PsycNet, PubMed – NCBI, Redalyc, Web of Science (SciELO citations index), SciELO and Sciencedirect. For papers in English, the search was done through EbscoHost, PsycBite, PsycNet, PubMed – NCBI, Web of Science (SciELO citations index) and Sciencedirect. For studies in Spanish, the databases used were Redalyc, SciELO, Web of Science (SciELO citations index), Sciencedirect and EbscoHost. Some studies were also identified by screening the reference lists of the articles. All articles up to 30th July 2018 were considered for review.

The following search strategy was used for studies in English: An initial search of each database was conducted using the terms:

(1) Virtual Reality AND Cognition

Titles considered likely to meet inclusion criteria were recorded. Next the following terms were used:

(2) Virtual Reality AND Executive Functions

(3) "virtual reality" AND executive function* OR cogniti* AND assess*

(4) ("virtual reality" AND ("activities of daily living" OR ADL) AND assess*)

For each search, potentially relevant studies not already identified were selected.

For studies in Spanish, the following searches were carried out in turn, with potentially relevant studies selected at each stage:

(5) "realidad virtual" AND "funciones ejecutivas"

(6) "realidad virtual" AND cogni* AND evalu*

(7) "realidad virtual" AND cogni* OR "funciones ejecutivas" AND evalu*
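Strings (3) and (7) combine AND and OR without brackets, so the records they return depend on how each database groups the operators; the review does not state which grouping applied. The following is a minimal sketch in Python, assuming a hypothetical record title and using plain substring checks as stand-ins for the databases' own matching, of how two plausible groupings of string (3) can disagree:

```python
def flags(text):
    """Return which of the four term groups of search string (3) occur in `text`."""
    t = text.lower()
    return {
        "vr": "virtual reality" in t,
        "ef": "executive function" in t,  # stands in for: executive function*
        "cog": "cogniti" in t,            # stands in for: cogniti*
        "assess": "assess" in t,          # stands in for: assess*
    }


def and_before_or(f):
    # Grouping A (assumption): AND binds more tightly than OR.
    return (f["vr"] and f["ef"]) or (f["cog"] and f["assess"])


def or_inside_and(f):
    # Grouping B (assumption): the OR is read as a bracketed alternative.
    return f["vr"] and (f["ef"] or f["cog"]) and f["assess"]


# Hypothetical record title, not taken from the review.
record = "Cognitive assessment with paper-and-pencil tests"
f = flags(record)
print(and_before_or(f), or_inside_and(f))
```

Under grouping A the hypothetical record is retrieved even though it never mentions virtual reality, whereas grouping B rejects it. Explicit brackets, as in string (4), remove this ambiguity.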

Studies Selection Procedure

The following criteria were used for the inclusion of studies: a) any experimental study assessing concurrent and/or convergent validity and/or reliability of a virtual reality neuropsychological assessment tool for executive functions (as defined in the Introduction); b) the virtual reality system had to use a head-mounted display (HMD), as this is the main characteristic of immersive VR; c) publications had to be in English or Spanish. Publications such as conference abstracts, case studies, dissertations and books were excluded from the review.

A single rater screened the titles of the papers identified through the electronic databases, excluding those not related to the topic. Afterwards, duplicate papers were excluded. Two raters independently assessed the remaining abstracts, leading to the final full-text article review stage. When disagreements were identified, they were discussed between the raters. A single rater checked the remaining papers, obtaining the final articles considered for the analysis.

The quality of the studies was assessed independently by two raters using the “STANDARD QUALITY ASSESSMENT CRITERIA” (Kmet, Lee & Cook, 2004). The rating

studies, due to the characteristics of the papers included in this review, the quantitative checklist was used. The checklist consists of 14 items, which are scored according to the degree to which they meet the criteria (0 = no, 1 = partial, 2 = yes). Items not applicable are scored as “n/a” and are not considered in the calculation of the summary score (for more details about the items, see Appendix 1.1).

An extra item was included to cover the “assessment of validity” and was scored using the same range as the rest of the items (0-2). When a study compared the VR test with standardized neuropsychological tests and also included an elderly or clinical population, 2 points were given; when it did only one of the two, 1 point was given; when it did neither, the score was 0. The results for each paper can be seen in Table 2.1.

Three of the original items of the scale (items 5, 6 and 7) were considered “not applicable” given the characteristics of the papers analysed in this review. The total score was therefore obtained using the formula explained by Kmet, et al. (2004), including the extra item added for this review.
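The scoring just described can be sketched in Python. The sketch assumes the Kmet, et al. (2004) summary score is the sum of the item scores divided by the maximum possible sum over the applicable items (2 points each); the function names and the example scores are hypothetical and do not correspond to any paper in Table 2.1.

```python
def validity_item(compared_with_standard_tests, included_clinical_or_elderly):
    """Extra 'assessment of validity' item: 2 if both hold, 1 if one holds, 0 if neither."""
    return int(compared_with_standard_tests) + int(included_clinical_or_elderly)


def summary_score(item_scores):
    """Summary score from a list of item scores (0, 1 or 2); None marks an n/a item."""
    applicable = [s for s in item_scores if s is not None]
    if not applicable:
        raise ValueError("at least one applicable item is required")
    if any(s not in (0, 1, 2) for s in applicable):
        raise ValueError("item scores must be 0, 1, 2 or None")
    total_sum = sum(applicable)
    total_possible = 2 * len(applicable)  # n/a items are left out of the maximum
    return total_sum / total_possible


# Hypothetical paper: items 1-14 of the checklist, with items 5, 6 and 7 n/a,
# followed by the extra validity item as item 15.
scores = [2, 2, 2, 2, None, None, None, 2, 1, 2, 2, 1, 2, 2]
scores.append(validity_item(True, False))
print(sum(s for s in scores if s is not None), summary_score(scores))
```

With three items marked n/a and the extra item added, twelve items remain applicable, so the maximum possible sum in this sketch is 24.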

Data extraction

The following information was extracted from the final articles: a) study identification (main author and year), b) name of the virtual reality test, c) sample size, d) mean age of participants, e) participants’ characteristics (e.g. healthy volunteers or clinical population),


Effect sizes

The effect sizes were obtained according to the aims of the studies and the primary outcomes identified for the VR tasks. When outcomes were unclear or multiple outcomes were considered, the researcher selected the one that best reflected the purpose of the task (e.g. the interference score was used for Stroop tasks).

As the purpose of this review was to determine which of the available tasks have been validated to assess executive function, only the effect sizes for convergent and concurrent validity were analysed. For concurrent validity, effect sizes were calculated (when they were not reported) from parametric or non-parametric between- and within-group comparisons. For convergent validity, effect sizes were the correlations between the main outcome of the VR test and the main outcome of the standardized neuropsychological test used. To determine test-retest reliability, effect sizes were extracted from the correlations between the total mean scores at the two administrations.
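The review does not state which formulas were used to derive effect sizes from group comparisons. As an illustration only, the following Python sketch shows three widely used conversions to a correlation-type effect size r: from a t statistic, from a standardised Z of a non-parametric test, and from Cohen's d with equal group sizes. The input values are hypothetical.

```python
import math


def r_from_t(t, df):
    """r from a t statistic and its degrees of freedom: sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t ** 2 / (t ** 2 + df))


def r_from_z(z, n):
    """r from the standardised Z of a non-parametric test and total sample size n."""
    return abs(z) / math.sqrt(n)


def r_from_d(d):
    """r from Cohen's d, assuming two groups of equal size: d / sqrt(d^2 + 4)."""
    return d / math.sqrt(d ** 2 + 4)


# Hypothetical inputs, not taken from any of the reviewed studies.
print(round(r_from_t(2.5, 38), 3))
print(round(r_from_z(2.0, 40), 3))
print(round(r_from_d(0.8), 3))
```

The conversions from t and from d cannot exceed 1 in absolute value, which offers a quick plausibility check on any r obtained this way.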

Results

Literature Search


Studies Characteristics

Of the final 13 papers reviewed, 12 assessed the validity of the tests and one evaluated reliability; none of the studies assessed both psychometric properties. Validity was evaluated by comparing performance on the VR test against standardized neuropsychological tests in seven studies; the remaining five studies compared performance between two groups of participants. Eight studies used only healthy individuals, one only clinical subjects (TBI) and four compared healthy controls with clinical or elderly participants (including ABI, schizophrenia and older participants) (see Table 2.2 for more detail).

Regarding the type of virtual test studied, six papers assessed the validity of Stroop tests, four papers used an ADLs approach (cooking, shopping, parking a car), two used the rule-shifting approach of the Wisconsin Card Sorting Test (WCST) and one followed the Multiple Errands Test (MET) version. Most of the VR tasks used a mouse, keyboard or joystick to interact with the virtual environment. Only one study (Parsons & McMahan, 2017) used the HTC Vive system for the interaction; however, it is not clear whether the participants used the controllers or walked in the real world while performing the task (room-scale virtual reality). To the investigators' knowledge, and according to the information given in the papers, none of the studies used a room-scale virtual reality approach.

Environment Grocery Store (VEGS, Parsons & McMahan, 2017) and two tasks based on the Wisconsin Card Sorting Test with no specific name (Pugnetti, et al., 1998; Ku, et al., 2003).

As “executive functions” is an umbrella term that covers different cognitive constructs, a variety of cognitive domains were targeted. The Stroop tasks focus on assessing inhibition/impulsivity; tasks based on the WCST target perseveration and rule shifting; the VEGS has two versions assessing everyday memory (episodic and prospective memory) and response inhibition; and the rest of the tests cover a broader number of constructs such as planning, problem solving and calculation.

Quality of the Studies

Using the “STANDARD QUALITY ASSESSMENT CRITERIA” (Kmet, et al., 2004), two

Table 2.1. Quality assessment scores obtained from the “STANDARD QUALITY ASSESSMENT CRITERIA” (Kmet, et al., 2004).

Paper\Item 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 Total sum Summary Score

Figure 1.1. Study PRISMA flow diagram

Studies identified through databases in English using all strategies described: 15736

Studies identified through databases in Spanish using all strategies described

7 additional studies identified through other sources

143 studies identified after title screening

19 duplicates removed

124 studies were considered for abstract assessment

Excluded at abstract assessment (N=67): Case studies (n=2); Not focusing on EF (n=26); Not using Fully Immersive (n=20); Not assessment (e.g. rehab) (n=8); Not focus on validity or reliability (n=2); Dissertation (n=1); Literature reviews (n=8); Annual Review (n=4)

57 studies were considered for full-text analysis

Excluded at full-text analysis (N=44): Not using Fully Immersive (n=41); Not assessment (n=2); Not focusing on EF (n=1)

13 studies included in the systematic review
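The counts in the flow diagram can be checked with simple arithmetic. The Python sketch below copies the numbers as they appear in the diagram; the variable names are illustrative only.

```python
after_title_screening = 143
duplicates_removed = 19
abstract_stage = after_title_screening - duplicates_removed

# Reasons for exclusion at abstract assessment, as listed in the diagram.
abstract_reasons = {
    "Case studies": 2,
    "Not focusing on EF": 26,
    "Not using fully immersive VR": 20,
    "Not assessment (e.g. rehab)": 8,
    "Not focus on validity or reliability": 2,
    "Dissertation": 1,
    "Literature reviews": 8,
    "Annual review": 4,
}
excluded_at_abstract = 67  # total stated in the diagram
full_text_stage = abstract_stage - excluded_at_abstract

# Reasons for exclusion at full-text analysis, as listed in the diagram.
full_text_reasons = {
    "Not using fully immersive VR": 41,
    "Not assessment": 2,
    "Not focusing on EF": 1,
}
included = full_text_stage - sum(full_text_reasons.values())

print(abstract_stage, full_text_stage, included)
print(sum(abstract_reasons.values()), sum(full_text_reasons.values()))
```

The stage totals (124, 57 and 13) follow from the stated exclusions. The itemised reasons at abstract assessment, as extracted here, sum to 71 rather than the stated 67; the source does not explain the difference, which could arise if some studies were counted under more than one reason.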

Table 2.2. Data extracted from the studies included in the systematic review.

Columns: Study; Name of the Test; N; Mean Age (years); Participants; Study Design; Aim of the Study; Standardized neuropsychological test; VR task outcome measures; Effect sizes

Study: Armstrong, et al. (2013)
Name of the test: Virtual Reality Stroop Task (VRST)
N: 49
Mean age (years): 28.78
Participants: English-speaking active duty soldiers with at least high school education or GED equivalent
Study design: Correlational study between VR test and standardized neuropsychological tests
Aim of the study: Validity
Standardized neuropsychological tests: D-KEFS CWIT; PASAT; ANAM 4; Virtual Reality City Memory PASAT
VR task outcome measures: VRST response time and accuracy
Effect sizes: VRST interference with D-KEFS CWIT interference (r = .49) and complex interference (r = .32); VRST complex interference with D-KEFS CWIT interference (r = .41); ANAM Stroop interference with VRST interference (r = .64) and complex interference (r = .67)

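Study: Christiansen, et al. (1998)
Name of the test: Virtual Reality Kitchen
N: 30
Mean age (years): 30.96
Participants: Participants with TBI who were receiving rehabilitation
Study design: Test-retest reliability study using intraclass correlation (ICC)
Aim of the study: Reliability
Standardized neuropsychological tests: None
VR task outcome measures: Number of correct responses and time required to complete each task
Effect sizes: Test-retest VR total score correlation (r =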

Study: Davison, et al. (2017)
Name of the test: IVR tasks
N: 40
Mean age (years): Young group (M = 20.55); Old group (M = 69.94)
Participants: 22 healthy young individuals and 18 healthy old individuals
Study design: Between and within subject experimental design; Correlational study
Aim of the study: Validity
Standardized neuropsychological tests: Stroop Colour; Stroop Colour-Word; TMT-A; TMT-B
VR task outcome measures: Task 1 = Time taken to complete parking simulator levels, number of parking simulator levels completed, number of times the virtual car crashed. Task 2 = Time taken to place chair, time taken to place stool and total number of items placed. Task 3 = Time taken to locate objects and total number of items located.
Effect sizes: Stroop CW with # of parking levels completed (r = .43), # of items placed in total (r = .33); TMT A with # of parking levels completed (r = -.49), # of items placed in total (r = -.35) and # of items located (r = -.38); TMT B with # of parking levels completed (r = -.28); Between groups comparison (young more than old) # of parking levels completed (r = .79), # of items placed (r = .72) and # of items

Study: Henry, et al. (2012)
Name of the test: Virtual Apartment Stroop Task
N: 40
Mean age (years): 33.8
Participants: Healthy participants (not clearly mentioned)
Study design: Correlational study and multiple regression analysis
Aim of the study: Validity
Standardized neuropsychological tests: D-KEFS CWIT; The Elevator Counting with Distractors (TEA); The Continuous-Performance Task II (CPT-II); The Stop-it Task
VR task outcome measures: Mean total reaction time, mean reaction time for correct responses, variation of reaction times, variation of reaction times for correct responses, number of correct responses, number of commission errors and number of omission errors.
Effect sizes: Condition 2 = VR Stroop correct responses with D-KEFS CWIT time completion (r = -.45)

Study: Kang, et al. (2008)
Name of the test: Virtual Reality Based-Cognitive Assessment
N: 40
Mean age (years): Stroke participants (M = 55.4); Control participants (M = 48.7)
Participants: 20 participants diagnosed with stroke due to unilateral brain lesions; 20 healthy participants with no brain injury history
Study design: Student t-tests comparison between experimental and control group
Aim of the study: Validity
Standardized neuropsychological tests: None
VR task outcome measures: Total time, total distance, judgment score and executive index
Effect sizes: Executive index between groups comparison (control group better performance) (r =

Study: Ku, et al. (2003)
Name of the test: Not given name
N: 26
Mean age (years): Schizophrenic group (M = 30.07); Control group (M = 27.84)
Participants: 13 schizophrenic patients and 13 controls
Study design: Repeated measure design
Aim of the study: Validity
Standardized neuropsychological tests: Wisconsin Card Sorting Test; Standard Progressive Matrices
VR task outcome measures: Perseveration index, rule finding, total index
Effect sizes: VR without distractors perseveration index with WCST perseveration index (r = -.59); VR with distractors perseveration index with WCST perseveration index (r = .59) and VR WD total index with WCST total index (r = -.55)

Study: Lalonde, et al. (2013)
Name of the test: VR Classroom Stroop Task
N: 38
Mean age (years): 14.69
Participants: English-speaking adolescents
Study design: Correlational study and multiple regression analysis
Aim of the study: Validity
Standardized neuropsychological tests: CBLC; WASI; BRIEF; D-KEFS TT and CWIT, TMT, Verbal Fluency and Twenty Questions
VR task outcome measures: Reaction time, omission errors, commission errors and correct answers
Effect sizes: D-KEFS CWIT and Box VR Stroop (r = .39); D-KEFS CWIT and Word VR Stroop (r = .37)

Study: Parsons and Barnett (2017)
Name of the test: Virtual Apartment Stroop Task
N: 89
Mean age (years): Older age cohort (74.38); College cohort (20.59)
Participants: Two groups: 39 healthy older age participants and 50 healthy college age participants
Study design: Analyses of variance, pairwise comparisons and correlational analyses
Aim of the study: Validity
Standardized neuropsychological tests: D-KEFS CWIT; ANAM Computerized Stroop Task
VR task outcome measures: Mean reaction time, correct responses, throughput
Effect sizes: VR Stroop interference overall performance (old perform more poorly) between groups comparison (r =

Study: Parsons and Carlew (2016)
Name of the test: VR Classroom Stroop Task
N: Study 1: 50
Mean age (years): Study 1 (20.37)
Participants: Study 1: healthy undergraduate students
Study design: Analyses of variance
Aim of the study: Validity
Standardized neuropsychological tests: D-KEFS CWIT; ANAM Computerized Stroop Task
VR task outcome measures: Mean reaction time, correct responses, throughput
Effect sizes: VR Classroom condition: between interference and colour-naming (r = 1.13); between interference and word-reading (r = 1.00)

Study: Parsons and McMahan (2017)
Name of the test: Virtual Environment Grocery Store
N: Study 1: 42; Study 2: 61
Mean age (years): Study 1 (19.83); Study 2 (20.89)
Participants: Study 1: healthy participants; Study 2: healthy undergraduate participants
Study design: Correlational study and analyses of variance
Aim of the study: Validity
Standardized neuropsychological tests: CVLT-II; D-KEFS CWIT
VR task outcome measures: Long delay free and cued recall, time based, event based, number of times check shop list, number of times check map, list items and extra items picked up and basket total budget
Effect sizes: VEGS (without distractors) with long delay (r = .31) and cued (r = .38) recall of CVLT-II; VEGS (with distractors) with long delay (r = .31) and cued (r = .45) recall of CVLT-II; VEGS # of times checked the map correlated with D-KEFS CWIT interference (r = .26) and inhibition
VEGS prescription drop-off (r = .50) and pickup times (r = .50) with D-KEFS inhibition/switching completion time

Study: Parsons, et al. (2013)
Name of the test: Virtual Reality Stroop Task (VRST)
N: 50
Mean age (years): 19.71
Participants: College-aged participants
Study design: Analyses of variance; Paired sample t-tests; Correlational study
Aim of the study: Validity
Standardized neuropsychological tests: D-KEFS CWIT; ANAM Computerized Stroop Task
VR task outcome measures: Colour-word score, simple interference score, complex interference score
Effect sizes: Interference from the VRST with interference from the D-KEFS CWIT (r = .45) and the ANAM Stroop (r = .38)

Study: Pugnetti, et al. (1998)
Name of the test: Not given name
N: Study III: 68
Mean age (years): Study III: control group (36.6) and patients’ group (36.4)
Participants: 32 healthy individuals and 36 ABI patients (25 MS, 4 CVA, 5 TBI and 2 hydrocephalus)
Study design: Correlational study
Aim of the study: Validity
Standardized neuropsychological tests: Wisconsin Card Sorting Test
VR task outcome measures: VR categories achieved, total correct and errors
Effect sizes: Between groups comparison for VR categories achieved (controls more than patients) (r = 1) and VR total errors (patients more than controls) (r = 0.5)

Study: Zhang, et al. (2001)
Name of the test: Virtual Reality Kitchen
N: 60
Mean age (years): Two groups: Healthy volunteers (33.1) and TBI individuals (30.96)
Participants: 30 healthy volunteers and 30 individuals with TBI
Study design: Student t-tests comparison between and within subjects
Aim of the study: Validity
Standardized neuropsychological tests: None
VR task outcome measures: Overall Score
Effect sizes: 1st between groups comparison (clinical groups more errors than healthy volunteers) (r =
2nd between groups comparison (clinical groups more errors than healthy volunteers) (r = 1.18)

Note: D-KEFS: Delis-Kaplan Executive Function System; CWIT: Colour-Word Interference Test; PASAT: Paced Auditory Serial Addition Test; ANAM 4: Automated Neuropsychological Assessment Metrics, version 4; VRST: Virtual Reality Stroop Task; TBI: Traumatic Brain Injury; ICC: intraclass correlation; VR: Virtual Reality; IVR: Immersive Virtual Reality; CW: Colour-word; TMT: Trail Making Test; TEA: Test of Everyday Attention; CPT: Continuous


Effect Sizes

The effect sizes for each task are presented in Table 2.2. Because of the diversity of the tests identified and the different outcomes used in the studies to assess convergent validity, the effect sizes are discussed in this section according to the type of test.

Stroop Tests

The studies evaluating virtual reality versions of the Stroop used the following traditional neuropsychological measures to assess validity: D-KEFS Colour-Word Interference Test (D-KEFS CWIT), Tower Test (D-KEFS TT), Trail Making Test, Verbal Fluency and Twenty Questions, Automated Neuropsychological Assessment Metrics (ANAM) Computerised Stroop task, Paced Auditory Serial Addition Test (PASAT), the Elevator Counting with Distractors (from the TEA), the Continuous Performance Task II (CPT-II) and Stop-it Task. For the purpose of this review, only the effect sizes obtained by comparing the virtual reality Stroop main outcome (determined by the researcher when it was not identified by the authors) against a traditional Stroop test (D-KEFS CWIT or ANAM) and by comparing between groups, were reported.

The Virtual Apartment Stroop Task (Parsons & Barnett, 2017) compared the performance between groups of young and old participants. The results showed large effect sizes, indicating that old volunteers performed more poorly than young individuals.

Wisconsin Card Sorting Test approach

Two studies created a virtual test in which participants had to exit a maze by opening doors. To open the doors, a rule-finding method similar to that of the WCST was used. One of the studies obtained medium effect sizes when comparing the performance on the VR task with performance on the WCST (Ku, et al., 2003). In the other study (Pugnetti, et al., 1998), medium to large effect sizes were obtained when differentiating between the performance of a healthy group and an ABI group.

Activities of Daily Living approach

The tests that used this approach varied according to the type of activity simulated in VR, such as shopping, parking a car, cooking, arranging seats or finding items in a chemistry lab. The shopping task (Kang, et al., 2008) showed large effect sizes when differentiating between a group of healthy participants and a group of stroke individuals using an executive function score. The parking and chemistry lab activities (Davison, et al., 2017) showed small to medium effect sizes when comparing the performance on these tasks with the Stroop interference and TMT A and B.

The virtual kitchen (Christiansen, et al., 1998; Zhang, et al., 2001) is the only test found in this review whose reliability was assessed, through a test-retest approach, showing a medium effect size for the correlation between the two administrations. Also, the validity of the test was studied by comparing healthy participants with a TBI sample at two different times, obtaining large effect sizes indicating that TBI individuals made more errors than healthy controls.

Multiple Errands Test

CVLT-II. Both conditions of the VEGS (with and without distractors) obtained small effect sizes when compared to the CVLT-II. Also, the results demonstrated small and medium effect sizes when correlating distractor performance against the D-KEFS CWIT.

Discussion

The aim of this systematic review was to identify the available VR neuropsychological tests, displayed through head-mounted displays (HMD), for the assessment of executive functions and to determine which of them have better validity and/or reliability. For this review, only effect sizes comparing the performance on the VR tests between groups and/or against traditional neuropsychological tests were analysed.

In the 13 studies included in the review, a total of eight VR tests were identified, including Stroop versions, ADLs assessment approaches, and MET and WCST approaches. When the performance on the VR tests was compared against traditional neuropsychological tests, the majority of the studies showed small to medium correlations; only one study reported large effect sizes (Armstrong, et al., 2013).
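One way to read the correlations reported against traditional tests is through the proportion of variance the two measures share, which is the square of r. The Python sketch below applies this to a few values taken from Table 2.2; the function name is illustrative.

```python
def shared_variance(r):
    """Proportion of variance shared by two measures correlated at r (r squared)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and 1")
    return r * r


# Correlations reported in Table 2.2.
examples = [
    ("Armstrong, et al. (2013): VRST interference with D-KEFS CWIT interference", 0.49),
    ("Armstrong, et al. (2013): ANAM Stroop interference with VRST complex interference", 0.67),
    ("Henry, et al. (2012): VR Stroop correct responses with D-KEFS CWIT completion time", -0.45),
]
for label, r in examples:
    print(f"{label}: r = {r:.2f}, shared variance = {shared_variance(r):.2f}")
```

On this reading, a correlation of .49 corresponds to roughly a quarter of the variance being shared between the VR measure and the traditional test.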

All the studies that assessed concurrent validity by comparing between groups showed large effect sizes, indicating their capacity to differentiate between healthy participants and clinical or elderly populations. These results are similar to the ones obtained by Negut, et al. (2016), who studied the sensitivity of VR tests to distinguish performance between groups in their meta-analysis.

It is challenging to draw general conclusions regarding the validity of immersive VR tasks. First, a wide range of forms of validity were assessed (including convergent, discriminant, concurrent and predictive validity), meaning there was little consistency across studies. For this reason, the review focused on examining the case-control studies and studies examining correlations between performance on the VR tasks and more traditional neuropsychological tests of executive function. This does, however, mean some aspects of validity were not extracted.

Second, even though most of the studies clearly identified the outcomes of the VR tests, none specified a single primary outcome that could be considered the main measure of the cognitive construct being examined. The same was true of the outcomes taken from the neuropsychological tests: different scores were used for the comparisons and, even when the same test was used, the outcomes differed across studies (e.g. D-KEFS CWIT outcomes included mean completion times, correct responses or total errors). Studies therefore frequently reported multiple correlations, which makes the validity of any single result questionable and made it challenging to synthesise results across studies. This limitation can be observed in the results in Table 2.2.

Third, most of the studies stated a broad hypothesis that the VR tests will correlate with traditional neuropsychological tests, which could be considered as a very vague approach to assessing validity. According to this, any correlation between any of the outcomes of the tests could be considered as evidence for the validation of the VR tests. However, strong evidence requires a comparison with outcomes that have been demonstrated to target and measure the cognitive domain of interest.

However, no good evidence was obtained which validated the tests regarding other domains such as planning, problem solving or prospective memory. It is important to mention that these tests maintain a very similar format to the traditional neuropsychological tests, just passing from a paper-pencil version to a virtual reality display.

Despite previous limitations of the studies reviewed, VR does offer some advantages that traditional neuropsychological tests lack. The majority of the tests simulated everyday tasks and distractors that paper-pencil, or even computerised versions, are not able to reproduce. This offers the possibility of developing more ecologically valid assessments to get closer to the everyday performance of the individuals. In fact, participants have to perform the tasks by moving and interacting in a simulated real environment, providing an opportunity to test real functional ability in a controlled way.

Limitations and Future Research

Due to the diversity of definitions of executive functions, some studies might have been excluded because they did not fit the terminology used in this review. However, disagreements on this point between the two reviewers were rare during the selection and exclusion of studies.

Another important limitation is that the description of the virtual reality equipment used was not very detailed in some studies, complicating the inclusion or exclusion decision regarding the immersive virtual reality criterion. Some available virtual tests may not have been considered for this review because they did not mention the use of head-mounted displays, a characteristic treated here as crucial for determining whether a test was carried out using immersive or non-immersive virtual reality.
