Ours is the first natural field experiment in a real labor market to examine how a task’s meaningfulness influences labor supply.
Overall, we found that the more meaningful the task, the more likely a subject is to participate, the more output they produce, the higher the quality of that output, and the less compensation they require for their time. We also observe an interesting asymmetry: high meaning increases the quantity of output (with an insignificant increase in quality), while low meaning decreases the quality of output (with no change in quantity). It is possible that the level of perceived meaning affects how workers trade off effort between task quantity and task quality. The effect sizes were the same in the US and India.
Our findings have important implications for anyone who employs labor in a short-term capacity beyond crowdsourcing, such as temp work or piecework. As more of the world’s work is outsourced to anonymous pools of labor, it is vital to understand the dynamics of this labor market and the degree to which non-pecuniary incentives matter. This study demonstrates that they do matter, and to a significant degree.
This study also illustrates what MTurk offers economists: an excellent platform for natural field experiments with high internal validity that avoid the external validity problems that can arise in laboratory environments.
Acknowledgements
Both authors contributed equally to this work. The authors wish to thank Professor Susan Holmes of Stanford University for comments and for allowing them to adapt the DistributeEyes software for their experiment (funded under NIH grant #R01GM086884-02). They gratefully acknowledge financial support from the George and Obie Schultz Fund and the NSF Graduate Research Fellowship Program. The authors also thank Iwan Barankay, Lawrence Brown, Rob Cohen, Geoff Goodwin, Patrick DeJarnette, John Horton, David Jiménez-Gomez, Emir Kamenica, Abba Krieger, Steven Levitt, Blakeley McShane, Susanne Neckermann, Paul Rozin, Martin Seligman, Jesse Shapiro, Jörg Spenkuch, Jan Stoop, Chad Syverson, Mike Thomas, Adi Wyner, seminar par-
3 Preventing Satisficing in Surveys
Abstract
Researchers are increasingly using online labor markets such as Amazon’s Mechanical Turk (MTurk) as a source of inexpensive data. One of the most popular tasks is answering surveys. However, without adequate controls, researchers should be concerned that respondents may fill out surveys haphazardly in the unsupervised environment of the Internet. Social scientists refer to the mental shortcuts that people take as “satisficing,” and this concept has been applied to how respondents take surveys. We examine the prevalence of survey satisficing on MTurk. We present a question-presentation method, called Kapcha, which we believe reduces satisficing, thereby improving the quality of survey results. We also present an open-source platform for further survey experimentation on MTurk.
3.1 Introduction
It has been well established that survey-takers may “satisfice” (i.e., take mental shortcuts) to economize on the effort and attention they devote to filling out a survey (Krosnick, 1991).2 As a result, the quality of survey data may fall short of researchers’ expectations. Because surveys attempt to measure internal mental processes, their answers are by nature not easily verifiable against external sources. This presents a potential problem for the many researchers who are beginning to employ Amazon’s Mechanical Turk (MTurk) workers to answer surveys and participate in academic research (Paolacci et al., 2010b; Horton and Chilton, 2010; Chandler and Kapelner, 2013, which is also Chapter 2 of this document). Moreover, unlike other tasks completed on MTurk, inaccuracies in survey data cannot be remedied by having multiple workers complete a survey, since each respondent reports on their own mental states; nor is there an easy way to check responses against “gold-standard” data.3
In our experiment, we examine alternative ways to present survey questions so that respondents read and answer them more carefully.
Our first treatment “exhorts” participants to take our survey seriously: a message in prominent red text at the bottom of every question asks for their careful consideration. Surprisingly, this has no effect.
Our other two treatments take a more economic approach, attempting to alter the incentives of survey-takers, who ordinarily have an incentive to answer questions as quickly as possible in order to maximize their hourly wage while exerting minimal cognitive effort. More specifically, both treatments force the participant to view each question for a certain “waiting period.” Combined, these waiting-period treatments improved the quality of survey responses by 10.0% (p < 0.001). Under the waiting-period treatments, the participant is forced to spend more time on each question; we hypothesize that, once there, they will spend that time thinking about the question and answering it thoughtfully.

2 Krosnick (1991) applies Simon’s (1955) famous idea of satisficing to how respondents complete surveys.

3 In an MTurk context, “gold-standard” data refers to questions to which the surveyor already knows the answer, asked as a way to identify bad workers. Although this is straightforward for an image-labeling task (e.g., Holmes and Kapelner, 2010, which is also Chapter 6 of this document; Sorokin and Forsyth, 2008), it is less clear how to apply this concept to surveys.
The first of these two treatments, called simply the Timing control treatment, features a continue button that is disabled for the duration of the waiting period. The second, which we call Kapcha,4 has a waiting period equal to that of the Timing control treatment but also attempts to attract respondents’ attention by sequentially “fading in” each word of the question’s directions, its prompt, and its answer choices. This treatment was the most effective, improving quality by approximately 13%.
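To make the two waiting-period treatments concrete, here is a minimal Python sketch of how a Kapcha-style question page could be scheduled. It assumes, hypothetically, that words are revealed at evenly spaced intervals across the waiting period and that the continue button unlocks exactly when the waiting period ends; the names `WordReveal`, `kapcha_schedule`, and `continue_enabled` are illustrative and are not taken from the authors’ software.

```python
from dataclasses import dataclass


@dataclass
class WordReveal:
    word: str
    reveal_at: float  # seconds after the question is first displayed


def kapcha_schedule(directions: str, prompt: str, choices: list[str],
                    waiting_period: float) -> list[WordReveal]:
    """Fade in each word of the directions, prompt, and answer choices in order.

    Assumption (hypothetical): reveals are spaced evenly so that the last
    word appears before the waiting period ends.
    """
    words = directions.split() + prompt.split()
    for choice in choices:
        words.extend(choice.split())
    if not words:
        return []
    step = waiting_period / len(words)
    return [WordReveal(w, round(i * step, 6)) for i, w in enumerate(words)]


def continue_enabled(elapsed: float, waiting_period: float) -> bool:
    """Shared by both treatments: the continue button stays disabled
    until the waiting period has elapsed."""
    return elapsed >= waiting_period
```

For example, `kapcha_schedule("Please read carefully.", "How old are you?", ["Under 30", "30 or older"], waiting_period=10.0)` yields 12 reveals, the first at 0 seconds and the last shortly before the 10-second mark, so every word is visible by the time the button unlocks. Under the Timing control treatment, only `continue_enabled` would apply, with all words shown at once.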
Because quality is largely unobservable, we proxy for it by introducing a “trick question” into the survey that measures whether people carefully read instructions. We follow the methodology of Oppenheimer et al. (2009), who call this trick question an instructional manipulation check (IMC). Additionally, we give respondents two hypothetical thought experiments in which we ask them to imagine how they would behave under certain conditions. The conditions are identical except for a subtle word change that would only be apparent if the instructions were read carefully. Hence, close readers should show a greater difference in reported behavior between the two conditions than people who were merely skimming.5
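The quality comparisons described in this section can be sketched in Python. This is a minimal illustration, not the authors’ analysis code: it assumes each response is recorded as a simple tuple, labels the two thought-experiment wordings with the hypothetical names `"A"` and `"B"`, and uses a standard pooled two-proportion z-test as one plausible way to compare IMC pass rates between treatments (the text does not state which test produced the reported p-values).

```python
import math
from collections import defaultdict
from statistics import mean


def pass_rates(responses):
    """Tally IMC results per treatment.

    responses: iterable of (treatment, passed_imc) pairs.
    Returns {treatment: (n_passed, n_total)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for treatment, passed in responses:
        counts[treatment][0] += int(passed)
        counts[treatment][1] += 1
    return {t: (p, n) for t, (p, n) in counts.items()}


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Everyone in both groups passed, or everyone failed: no difference to test.
        return 0.0, 1.0
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))


def condition_gap(rows):
    """Gap in mean reported behavior between the two wordings, split by IMC result.

    rows: iterable of (passed_imc, wording, response), wording in {"A", "B"}.
    Returns {passed_imc: mean(B) - mean(A)} for groups that saw both wordings.
    Close readers (passed_imc=True) are expected to show the larger gap.
    """
    groups = defaultdict(list)
    for passed, wording, response in rows:
        groups[(passed, wording)].append(response)
    gaps = {}
    for passed in (True, False):
        a, b = groups.get((passed, "A")), groups.get((passed, "B"))
        if a and b:
            gaps[passed] = mean(b) - mean(a)
    return gaps
```

The pooled z-test is a conventional choice for comparing two pass rates; a chi-squared test without continuity correction on the same 2×2 table gives the same two-sided p-value, since its statistic equals z².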
4 The name was inspired by the “Captcha” Internet challenge-response test, which is used to ensure that a response comes from a human (Von Ahn et al., 2003).
5 This portion of our experiment replicates Study 1 in Oppenheimer et al. (2009). Their primary focus was to identify subsamples of higher-quality data and to eliminate “noisy data” (i.e., participants who did not read the instructions carefully enough to pass the trick question), which enables researchers to increase the statistical power of their experiments.
This paper presents initial evidence on alternative ways to present survey questions in order to reduce satisficing. We hypothesize that the mechanism that reduces satisficing is a change in the cost-benefit analysis undertaken by survey respondents. Our approach has the benefit of improving the quality of results without increasing monetary cost or reducing convenience for the surveyor. We also examine the prevalence of satisficing and how it may vary across respondent demographics. Finally, we discuss ideas for further improving how survey questions are presented.
Section 3.2 explains our experimental methods, Section 3.3 presents our most important results, Section 3.4 concludes, and Section 3.5 discusses future directions. The chapter’s appendices describe the TurkSurveyor open-source package for running experiments and provide links to our source code and data so that others may replicate and verify our results.