As the second tier, the crowd was, overall, particularly good at identifying non-keepers (NoWay ratings), achieving 89% precision and 60% recall (F1=0.72) across all families combined (see Table 3). The crowd was only moderately successful in predicting Definitely ratings (58% precision, 43% recall, 0.50 F1), but for one family it achieved 91% precision (n=35). This family gave significantly more Definitely ratings (n=62) than the other four families, which would drive up the crowd's precision for Definitely ratings, similar to the earlier discussion of the ML's high precision for NoWay ratings.
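The precision, recall, and F1 figures above follow the standard per-class definitions, with F1 = 2PR/(P + R); for example, 89% precision and 60% recall give F1 ≈ 0.72. A minimal sketch, assuming the crowd's rating and the parent's rating for each clip are available as parallel lists of label strings (the function names are hypothetical, not the study's code):

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_scores(crowd: list[str], parent: list[str], label: str) -> tuple[float, float, float]:
    """Precision, recall and F1 for one rating label (e.g. "NoWay" or "Definitely"),
    treating the parent's rating as ground truth."""
    pairs = list(zip(crowd, parent))
    tp = sum(c == label and p == label for c, p in pairs)  # crowd and parent agree on label
    fp = sum(c == label and p != label for c, p in pairs)  # crowd chose label, parent did not
    fn = sum(c != label and p == label for c, p in pairs)  # parent chose label, crowd missed it
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


if __name__ == "__main__":
    # Reproduces the combined NoWay figure from the reported precision and recall.
    print(round(f1_score(0.89, 0.60), 2))  # 0.72
```

When the crowd never correctly assigns a label (no true positives), this sketch reports zero precision, recall, and F1 for that label.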
Note that recall for four of the families ranged from 38% to 67%, meaning that, in general, the crowd was able to uncover a substantial subset of the Definitely ratings. Family C's zero precision and recall for Definitely ratings may have been a consequence of that family's nuanced and idiosyncratic criteria, which we discuss below.
Interestingly, four crowd workers sent unsolicited emails to the research team saying they enjoyed the task of rating audio clips of children. Two workers said they "loved" hearing these clips, commenting on how cute and hilarious the children's utterances were. One worker even remarked: "As mine grow up I wish I had saved so much more audio of them."
Remarkably, in their free-text responses the crowd revealed an interesting array of thought processes and criteria behind their decisions. Beyond frequent statements about recordings being "cute", "silly", and "adorable", workers often identified specific activities they heard as important to parents, such as singing, playing, and a "child calling for her daddy...means so much". Some workers speculated about use cases that would make an audio clip valuable, such as: "meaningful...if long distance" or "to a parent who isn't around at the time this occurred", "put into a musical Christmas card...sent overseas if they have a parent in the military", and "they might want to embarrass their kid when he's older; quite funny". Others set aside their own opinion and judged based on what they thought the parent would choose: "it's a child screaming, perhaps [the parent] thinks it's funny but it's annoying really" and "I think this audio will only be meaningful to the audio owner...while cute, it doesn't mean a lot to people who do not know the child or have some context to go with audio." Finally, many workers were willing to share personal thoughts about the recordings themselves: "Reminds me of my kids", "heart breaking child wishing for parents, so moving", "sounded awesome; [my] favorite so far", "I love kids just being kids", and "children grow up so fast".
4.6.3.1 Identifying specialized crowds
Given the crowd's apparent ability to draw on their own experience or on in-depth thought processes to predict ratings for others, we wanted to understand whether a subset of the paid crowd performed better than the rest. To identify such a "specialized" crowd, we ran an experiment in which 40 crowd workers rated 30 randomly chosen recordings from two families (15 from each family). In addition to soliciting the ratings, we asked the workers demographic questions: their age range, gender, how long they had been a parent, and how many children they had. We calculated each worker's accuracy by comparing their ratings to the ground truth (i.e., the ratings given by the parents in the user study). Among workers who have children (n=16), those 35 and younger rated less accurately than those 36 and older (47.8% vs. 63.3%, respectively; Fisher's exact test p=0.02). In contrast, among workers without children the same age comparison showed the opposite pattern: workers 35 and younger were more accurate (55.8%) than those 36 and older (48.0%), although this difference was not statistically significant (Fisher's exact test p=0.24).
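The accuracy comparison above can be sketched as follows. This is a minimal illustration, not the study's code: it assumes accuracy means exact agreement between a worker's rating and the parent's rating for the same clip, and that age is reduced to a hypothetical `is_36_or_older` flag derived from the self-reported age range. The section does not say whether group accuracy was averaged per worker or pooled over ratings, so the sketch provides both.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Worker:
    worker_id: str
    is_36_or_older: bool     # hypothetical flag derived from the self-reported age range
    has_children: bool
    ratings: dict[str, str]  # clip_id -> rating label the worker chose


def worker_accuracy(worker: Worker, ground_truth: dict[str, str]) -> float:
    """Share of the worker's ratings that exactly match the parent's rating."""
    scored = [clip for clip in worker.ratings if clip in ground_truth]
    if not scored:
        return 0.0
    correct = sum(worker.ratings[clip] == ground_truth[clip] for clip in scored)
    return correct / len(scored)


def select(workers: list[Worker], *, parents: bool, older: bool) -> list[Worker]:
    """Workers in one cell of the parent x age-group breakdown."""
    return [w for w in workers if w.has_children == parents and w.is_36_or_older == older]


def group_accuracy(workers: list[Worker], ground_truth: dict[str, str], *, parents: bool, older: bool) -> float:
    """Mean per-worker accuracy for one cell; NaN if the cell is empty."""
    group = select(workers, parents=parents, older=older)
    return mean(worker_accuracy(w, ground_truth) for w in group) if group else float("nan")


def pooled_counts(workers: list[Worker], ground_truth: dict[str, str], *, parents: bool, older: bool) -> tuple[int, int]:
    """(correct, incorrect) rating counts pooled over one cell, e.g. one row of a 2x2 table."""
    correct = incorrect = 0
    for w in select(workers, parents=parents, older=older):
        for clip, rating in w.ratings.items():
            if clip in ground_truth:
                if rating == ground_truth[clip]:
                    correct += 1
                else:
                    incorrect += 1
    return correct, incorrect
```

Two rows of `pooled_counts` (for example, younger vs. older parents) form the kind of 2x2 contingency table to which a Fisher's exact test can be applied.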
Because these data suggest that "older" parents rate more accurately, we took a deeper look at why this might be. We found that, in most cases, worker age was a by-product of how long a worker had been a parent. Workers who had been parents for 16 or more years (n=5) rated more accurately than those who had been parents for 5 years or less (70.7% vs. 53.6%, respectively; Fisher's exact test p=0.006).
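The p-values above come from Fisher's exact test. The section does not state how the contingency tables were built; a plausible reading, assumed here, is a 2x2 table of correct versus incorrect ratings for the two worker groups. The sketch below computes the two-sided p-value from the hypergeometric distribution using only the standard library (the function names are hypothetical); `scipy.stats.fisher_exact` should give the same two-sided value.

```python
from math import comb


def table_probability(a: int, row1: int, col1: int, n: int) -> float:
    """Hypergeometric probability that the top-left cell equals `a`, given fixed margins."""
    return comb(row1, a) * comb(n - row1, col1 - a) / comb(n, col1)


def fisher_exact_two_sided(table: list[list[int]]) -> float:
    """Two-sided Fisher's exact test for a 2x2 table [[a, b], [c, d]].

    Sums the probabilities of every table with the same row and column
    totals that is no more likely than the observed table.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_observed = table_probability(a, row1, col1, n)
    lowest = max(0, col1 - (n - row1))   # smallest feasible top-left count
    highest = min(row1, col1)            # largest feasible top-left count
    p_value = 0.0
    for x in range(lowest, highest + 1):
        p_x = table_probability(x, row1, col1, n)
        # Small relative tolerance so floating-point ties with the observed table are counted.
        if p_x <= p_observed * (1 + 1e-7):
            p_value += p_x
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts, NOT the study's data:
    # rows = worker group, columns = (correct ratings, incorrect ratings).
    print(fisher_exact_two_sided([[5, 0], [0, 5]]))
```

For this hypothetical table the script prints a value of about 0.0079 (exactly 2/252). An exact test suits this setting because the groups are small (e.g., n=5 long-time parents), where large-sample approximations such as the chi-squared test are unreliable.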
These results suggest there is a subset of the crowd with substantially more expertise in predicting how parents of young children would judge the sentimental quality of audio recordings. This specialized crowd, as it were, appears to consist of middle-aged (and older) workers who are parents; those who have been parents for a long time (16+ years) seem to have even more expertise. If this is true, it would imply that their lived experience as parents has equipped them with specialized knowledge they draw upon for this problem. It is important to note that all the parents in the user study had been parenting for nine years or less. This could mean that, to be most helpful, a specialized crowd for curating digital audio recordings might need to have been parenting at least as long as the end users.