In this chapter, we use the term "prototype" to refer to an approach, not necessarily a technical prototype. We prototyped an approach with two goals in mind. The first goal was to identify specialized subcrowds, each having a range of expertise across multiple topics. Just as important, we wanted to know when we could not identify expertise levels or specialized subcrowds.
The second goal was to solicit questions from the crowd that could be differentiated by complexity, and to determine whether these differences correlated with workers' expertise. We also wanted to know what difficulties workers encountered in doing this.
There were three important considerations we had to work through. First, we needed an instrument to measure workers' expertise levels in various scientific topics. Second, we needed an approach to question solicitation that would encourage crowd workers to create questions with rich content and of varying complexity. Third, we needed a way to measure the difficulty level of the questions the crowd generated. We discuss each of these considerations in turn.
5.2.1 Using a Validated Measurement Instrument
For the measurement instrument, we leveraged prior work in measuring civic scientific literacy (e.g., Miller 1998, 2012a, 2012b). We used a validated survey instrument, with suitably updated questions, for measuring crowd workers’ expertise levels. The survey consisted of 42 questions: 35 closed-ended questions (i.e., multiple choice and true/false) and 7 open-ended questions (i.e., written responses). The questions fell into two broad domains: biology and space. Within the biology domain, there were questions covering the topics of DNA, stem cells, molecules, evolution, and genetics. Within the space domain, there were questions covering the topics of planets, the Sun, Earth's atmosphere and geology, and the universe.
The percentage of correct items for a topic, weighted as described next, is used as the measure of expertise in that topic. To compute this measure, all closed-ended responses were weighted on the same binary scale (i.e., one point if correct, zero if not), and the open-ended responses were weighted on a three-point scale, so that a fully correct open-ended response was worth three times as much as a closed-ended item. This weighting of the open-ended responses accomplished two things: it allowed finer-grained measurement of responses of varying quality, and this higher resolution allowed us to tease apart workers at the high end of the expertise scale. This treatment of open-ended responses is similar to Miller's approach to scoring open-ended responses (1998, 2012a, 2012b).
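The weighting above can be sketched as a small scoring function. This is a minimal illustration, not the study's actual code: the `ItemResponse` record, its field names, and the assumption that open-ended items are scored from 0 to 3 are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

# Hypothetical record for one scored survey item; field names are illustrative.
@dataclass
class ItemResponse:
    topic: str
    open_ended: bool
    score: int  # closed-ended: 0 or 1; open-ended: 0..3 (assumed range)

CLOSED_MAX = 1  # binary scale: one point if correct, zero if not
OPEN_MAX = 3    # three-point scale: a full-credit open-ended answer counts three times

def max_points(item: ItemResponse) -> int:
    """Maximum points an item can contribute under the weighting scheme."""
    return OPEN_MAX if item.open_ended else CLOSED_MAX

def topic_expertise(responses: list[ItemResponse], topic: str) -> float:
    """Percentage of available (weighted) points a worker earned on one topic."""
    items = [r for r in responses if r.topic == topic]
    if not items:
        raise ValueError(f"no items for topic {topic!r}")
    earned = possible = 0
    for r in items:
        cap = max_points(r)
        if not 0 <= r.score <= cap:
            raise ValueError(f"score {r.score} outside 0..{cap}")
        earned += r.score
        possible += cap
    return 100.0 * earned / possible
```

For instance, a worker who answers four closed-ended DNA items (three correctly) and earns 2 of 3 points on one open-ended DNA item scores 5 of 7 points, about 71.4%. Under this weighting, the full 42-item survey carries 35 × 1 + 7 × 3 = 56 points.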
We used this measurement approach to establish measures for topics that were well represented in the survey. In particular, we established measures for the space and biology domains and, within those domains, for the topics of DNA, stem cells, the solar system, and the universe.
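One way to picture domain-level and topic-level measures is to tag each scored item with both its domain and its topic and pool points per group. The sketch below assumes that layout; the `ScoredItem` tuple and the `min_items` cut-off (a stand-in for "well represented") are hypothetical, since the section does not say how representation was judged.

```python
from collections import defaultdict
from typing import NamedTuple

class ScoredItem(NamedTuple):
    domain: str    # e.g. "biology" or "space"
    topic: str     # e.g. "DNA", "stem cells"
    earned: int    # weighted points the worker earned on this item
    possible: int  # 1 for closed-ended items, 3 for open-ended items

def expertise_by(items, level, min_items=1):
    """Percentage of weighted points per domain or per topic.

    Groups with fewer than `min_items` items are skipped; this threshold is a
    hypothetical stand-in for deciding which topics are well represented.
    """
    if level not in ("domain", "topic"):
        raise ValueError("level must be 'domain' or 'topic'")
    earned = defaultdict(int)
    possible = defaultdict(int)
    count = defaultdict(int)
    for item in items:
        key = getattr(item, level)
        earned[key] += item.earned
        possible[key] += item.possible
        count[key] += 1
    return {
        key: 100.0 * earned[key] / possible[key]
        for key in earned
        if count[key] >= min_items
    }
```

Because every item counts toward its domain, a domain measure can draw on items from topics that are too sparse to be scored on their own; only the topic-level view applies the threshold in a way that drops groups.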
5.2.2 Soliciting Questions from the Crowd
We needed an approach to question solicitation that would encourage crowd workers to create questions with rich content and of varying complexity. To build our intuition on how to design for this, we performed several pre-studies. We found that the scope of the topics might affect the crowd's ability to create questions that varied in complexity. This is a classic problem in expertise finding: deciding where to draw boundaries between topics and how narrow they should be (Merritt et al. 2016). Topics that were too broad resulted in vague questions, while topics that were too narrow or specialized (e.g., molecular biology) left people struggling to muster enough knowledge to generate multiple questions. Ultimately, we stayed within the science and technology domain, and pre-testing pointed us toward four topics that showed promise: DNA, stem cells, the solar system, and the universe. Because these topics were well represented in the survey instrument, we could also measure expertise in them, which allowed us to correlate workers' topical expertise with the difficulty level of their questions.
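The section does not say which correlation statistic was used to relate topical expertise to question difficulty. As an illustrative assumption only, the sketch below computes Spearman's rank correlation (Pearson correlation of tie-averaged ranks), which suits ordinal quantities such as rubric scores; the pairing of one expertise percentage with one mean difficulty score per worker is likewise hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: per-worker topic expertise (%) and the mean rubric
# difficulty of the questions each worker wrote on that topic.
expertise = [40.0, 55.0, 70.0, 85.0]
difficulty = [1.0, 1.5, 2.0, 2.5]
rho = spearman(expertise, difficulty)
```

A rank-based statistic only assumes that higher expertise tends to accompany harder questions, not that the relationship is linear, which is why it is used for this sketch.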
The pre-studies also showed us that soliciting multiple questions per topic led to richer questions, because it forced workers to go beyond the canned "What is [insert topic here]?" The same held for asking the crowd for multiple levels of difficulty: instead of simply asking for questions, we asked for easy, medium, and hard questions. Not only did this increase the number of questions we asked for, it also prompted workers to think deliberately about the difficulty level of each question, which we found the crowd was able to do.
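The solicitation design (every topic crossed with easy, medium, and hard) and a simple check on whether a worker's questions actually grow harder can be sketched as follows. The prompt wording and the `ordering_consistent` check are hypothetical illustrations, not the study's task text or analysis.

```python
from itertools import product

TOPICS = ("DNA", "stem cells", "the solar system", "the universe")
LEVELS = ("easy", "medium", "hard")

def solicitation_slots(topics=TOPICS, levels=LEVELS):
    """One prompt per (topic, requested difficulty) pair; wording is hypothetical."""
    return [
        f"Write one question about {topic} that you consider {level}."
        for topic, level in product(topics, levels)
    ]

def ordering_consistent(rubric_scores):
    """True if rubric scores never decrease from easy to medium to hard.

    `rubric_scores` maps each requested level to the rubric score the question
    received; ties are allowed here, which is a lenient design choice.
    """
    scores = [rubric_scores[level] for level in LEVELS]
    return all(a <= b for a, b in zip(scores, scores[1:]))
```

With four topics and three levels, each worker fills twelve slots. Requiring strictly increasing scores instead of non-decreasing ones would give a stricter test of whether workers can calibrate difficulty.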
5.2.3 Measuring Question Difficulty with a Rubric
To measure the difficulty level of the crowd-generated questions, we created a rubric for assessing question complexity. The same rubric was also used to score the open-ended responses in the survey. In short, we considered both the complexity of the question content and the complexity of the expected answer: if either the question or the expected answer covered a complex topic, the score reflected that. Although the rubric adds consistency and reliability to the scoring, such scoring always involves some subjectivity. Thus, using the rubric to score both the crowd-generated questions and the open-ended survey responses introduced opinion-based criteria into our measurements of expertise. Combined with the fact-based measurements from the closed-ended survey responses, this places the study somewhere in the middle of the fact-opinion spectrum of types of expertise.
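The rubric's rule that complexity in either the question content or the expected answer raises the score can be read as taking the larger of two ratings. The sketch below encodes that reading; the three-level `Complexity` scale and the max-combination rule are assumptions for illustration, since the actual rubric criteria are not reproduced in this section.

```python
from enum import IntEnum

class Complexity(IntEnum):
    # Hypothetical three-level scale; the real rubric levels are not given here.
    LOW = 1
    MEDIUM = 2
    HIGH = 3

def rubric_score(content: Complexity, answer: Complexity) -> int:
    """Score a question by the more complex of its content and expected answer.

    Taking the maximum encodes "if the question or the expected answer covered
    a complex topic, the score reflects that"; the rule itself is an assumption.
    """
    return int(max(content, answer))

def score_questions(ratings):
    """Apply rubric_score to (content, answer) rating pairs, one per question."""
    return [rubric_score(content, answer) for content, answer in ratings]
```

A rater still has to assign each `Complexity` level by judgment, which is where the subjectivity discussed above enters; the function only makes the combination step consistent.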