The concept of crowdsourcing incorporates several approaches. The term combines the two key elements of the process: a large number of workers or online users (the crowd) who contribute to outsourced work (sourcing) by performing tasks or providing potential ideas or solutions.
The concept first appeared in an article written by Howe (2006), where he states that crowdsourcing is:
“an umbrella term for a highly varied group of approaches that share one obvious attribute in common: they all depend on some contribution from the crowd. But the nature of those contributions can differ tremendously”.
The primary definition of crowdsourcing quoted by Doan, Ramakrishnan, and Halevy (2011) is:
“.. enlists a crowd of humans to help solve a problem defined by the owners of the system”
In other words, crowdsourcing is defined as a system where the requesters (i.e., those who need data-related tasks to be completed) outsource their tasks by posting them online as an open call, via a website or platform, to a crowd of individuals who will perform the tasks for them. Brabham (2008) defines crowdsourcing in his article as:
“an online, distributed problem-solving and production model that leverages the collective intelligence of online communities to serve specific organisational goals.”
This definition focuses on the type of problem that needs to be solved. Estellés-Arolas and González-Ladrón-De-Guevara (2012) proposed a general definition of crowdsourcing by aggregating 40 definitions that had appeared between 2006 and 2011 and analysing eight factors derived from three main elements: the Crowd, the Initiator, and the Process. The definition is as follows:
“Crowdsourcing is a type of participative online activity in which an individual, an institution, a non-profit organization, or company proposes to a group of individuals of varying knowledge, heterogeneity, and number, via a flexible open call, the voluntary undertaking of a task. The undertaking of the task, of variable complexity and modularity, and in which the crowd should participate bringing their work, money, knowledge and/or experience, always entails mutual benefit. The user will receive the satisfaction of a given type of need, be it economic, social recognition, self-esteem, or the development of individual skills, while the crowdsourcer will obtain and utilize to their advantage that what the user has brought to the venture, whose form will depend on the type of activity undertaken.”
Brabham (2013) describes this definition as “wordy but complete”, and it covers most aspects of what crowdsourcing represents. He argues that, since the rise of the crowdsourcing perspective in 2006, several cases have been described as crowdsourcing in the literature that technically are not. The next section describes the origins of crowdsourcing and the historical background of the concept.
2.3.2 The Origin of Crowdsourcing
The theme of Surowiecki’s book (Surowiecki, 2004) was the inspiration for Howe (2006) to outline the concept of crowdsourcing and name some examples that represent the main idea behind it. He used Threadless.com, InnoCentive.com, Amazon’s Mechanical Turk, and iStockphoto.com as examples of crowdsourcing models. There are many successful examples on the web where people pool their knowledge and opinions as resources to support non-profit organisations. In other cases, people use their creativity and skills to design a product or to serve business goals by solving a particular problem. However, some studies have identified conditions regarding the definition of a crowdsourcing system.
In his book, Brabham (2013), setting out what he considers crowdsourcing is and is not, describes the three factors that a crowdsourcing system requires (see Figure 2.2). These factors are: (1) the traditional top-down management from the organisation or the requester to the crowd; (2) the bottom-up, open creativity process from the crowd; and (3) the position of the locus of control of the innovation in an openly exchangeable platform between the organisation and the crowd.
FIGURE 2.2: The key ingredients of crowdsourcing systems as described in (Brabham, 2013).
These key ingredients, as distinguished by Brabham, are essential to crowdsourcing. Both the business (or the requester) and the workers should share control over creating the solution to the crowdsourced task; once the locus of control moves down to the crowd or up to the business management, the system violates one of the key conditions of crowdsourcing. For example, Wikipedia, the world’s largest knowledge base on the web, was established in 2001 and has been used as a base for contributions to more than 200 projects from different companies (Halder, 2014). The locus of control rests with the users, and it is only through their contributions that the scope of the work increases. However, the company still retains control over removing or editing the content that appears on the Wikipedia website through WikiProject, which consists of groups of contributors, each group examining articles pertaining to a specific topic and assessing the quality of these articles.
A similar situation applies to any open source project, such as the Mozilla Firefox Web browser, where the supervision of management is absent and the collaboration flows horizontally between the users. Although there are users who act as management assessors in these examples, these projects are not considered crowdsourcing systems according to Brabham’s definition. The constraints of Brabham’s definition conflict with the first definition given by Howe (2006) and, moreover, disagree with several papers and studies, such as Yuen, King, and Leung (2011) and Dawson and Bynghall (2011), that present a survey and taxonomy of crowdsourcing systems and applications, as will be described in Sections 2.5 and 2.6.4.
2.3.3 The Value of Using Crowdsourcing
AI systems aim to develop an adaptive learning system that can operate in and assess many different situations based on training datasets that contain similar situations stored in knowledge banks. However, in some cases, such as medical diagnoses, the dataset may be limited or may not exist at all (Holzinger, 2016). In these cases, using human expertise is the best way to find the right diagnosis and to add new information to the database; this is described in the literature as a “Human in the Loop” approach (Holzinger, 2016; Dautenhahn, 1998).
Using Humans in the Loop at every stage can improve machine learning performance by helping to create training data and by helping to solve unknown cases, which yields more accurate results and makes the algorithm smarter. Over the past few years, the term “Crowdsourcing” appears to have replaced “Human Computation” in several cases where employees could be replaced with people outside the business by means of an open call, as opposed to outsourcing, where specific professionals are assigned the job (Holzinger, 2016). Crowdsourcing thus appears to be an extension of Human Computation (HC).
The benefit of using crowdsourcing platforms (i.e., where work requests and offers from the crowd come together) is the possibility of carrying out a job very quickly, with reasonable quality, and at a low cost in comparison with the traditional way of completing the same job (Alonso, 2013). In this context, a study by Crump, McDonnell, and Gureckis (2013) attempted to validate Amazon Mechanical Turk as a tool for collecting data in behavioural cognitive research. They designed several types of experiments and compared the results with those obtained through traditional laboratory data collection. The findings of this study showed that the quality of the data collected under the experimental conditions in Amazon Mechanical Turk is highly similar to the quality of data collected in the traditional laboratory setting. Despite some concerns related to the technical and visual design limitations of the task and to unexpected behaviour, such as participants dropping out of a task before finishing it, collecting data with crowdsourcing saves time and money and can reach a wide range of users within seconds.