Crowdsourcing as a Map Reduce Job
Navy SBIR 2013.1 - Topic N131-063
ONR - Ms. Lore Anne Ponirakis
Opens: December 17, 2012 - Closes: January 16, 2013
N131-063 TITLE: Crowdsourcing as a Map Reduce Job
TECHNOLOGY AREAS: Information Systems, Human Systems
ACQUISITION PROGRAM: PMMI, MCSC
OBJECTIVE: Develop a methodology based on cloud computing for using crowdsourcing to recognize social changes or activity of immediate significance to crisis situations.
DESCRIPTION: In crisis situations, whether caused by natural disasters or regional instability, accurate and timely information is needed for mission success. Some of this information can be retrieved from existing data sources (e.g., local weather), but much of it must be reasoned about by humans (e.g., how locals will react to the insertion of forces). During the early phases of a crisis, decision makers such as Marine Expeditionary Unit (MEU) commanders are often confronted with more raw information than they could process even if they had an organic capability to interpret the data. A capability to surge analytic effort is needed to support crisis response and optimize tactical operations. Crowdsourcing is a methodology that distributes analytic effort across a large group in order to produce a more accurate product faster. It is a distributed problem-solving model that has existed for many years but has become more effective through the use of the Internet (1). Its benefits include access to information from a wider range of sources than any single organization is likely to have. In addition, communities can become both contributors to and beneficiaries of group teamwork. The power of social networking was demonstrated in the DARPA Network Challenge (2), which showed that groups can collaborate to locate moored weather balloons. It has also been shown that disclosing problem information to a large group of contributors is an effective means of solving scientific problems (3). The value of open information sharing extends beyond science problems to cultural products. The developers of the Ushahidi reporting service demonstrated that citizen reports collected through mobile phones could document acts of violence better than mainstream media coverage did (4). Technologies have evolved to bring such information together via the Internet.
The evolving DoD cloud architecture, with its map reduce application construct, may be well suited to enabling crowdsourcing during crisis response periods. To enable this capability, innovative data collection, clustering, and dissemination methods are needed (the map step), as well as innovation in how many inputs can be reduced quickly and automatically into actionable information (the reduce step). However, the implementation of crowdsourcing as a distributed analytic capability for a MEU commander assigned a crisis response mission must be very different from the commercial use of this technique. The MEU commander will not have time to sift through a large number of responses. Rather than the manual reduce process typically used by crowdsourcers, a military crisis response requires an automated reduce process. The other innovation that proposers must explore for crisis response is the application of crowdsourcing principles to responses received from both humans and machine analytic fusion nodes. Deriving metrics for such a map reduce task carries technical risk but is foundational to the use of cloud-enabled collaborative workflows.
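The map, shuffle, and automated reduce flow described above can be sketched in a few lines. This is a minimal, single-process illustration, not a DoD cloud implementation. It assumes each human or machine node carries a reliability estimate (the probability that it answers correctly, e.g. derived from past performance), and it substitutes log-odds weighted voting, a common aggregation heuristic for independent yes/no judgments, as the automated reduce. The Response type, node identifiers, and reliability values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import log


@dataclass(frozen=True)
class Response:
    question_id: str
    node_id: str        # hypothetical identifier for a human analyst or machine fusion node
    answer: str
    reliability: float  # assumed estimate of P(node answers correctly), 0 < r < 1


def map_response(resp):
    """Map step: key each response by question and weight it by log-odds reliability."""
    r = min(max(resp.reliability, 1e-6), 1 - 1e-6)  # clamp to avoid log(0) or division by zero
    yield resp.question_id, (resp.answer, log(r / (1 - r)))


def reduce_question(values):
    """Reduce step: sum weights per candidate answer; return the best answer and its score."""
    totals = defaultdict(float)
    for answer, weight in values:
        totals[answer] += weight
    best = max(totals, key=totals.get)
    return best, totals[best]


def run_job(responses):
    """Single-process stand-in for a MapReduce job: map, shuffle (group by key), reduce."""
    grouped = defaultdict(list)
    for resp in responses:
        for key, value in map_response(resp):
            grouped[key].append(value)
    return {key: reduce_question(values) for key, values in grouped.items()}


# Illustrative inputs only: three weak human responses and one strong machine response.
responses = [
    Response("q1", "analyst-1", "yes", 0.6),
    Response("q1", "analyst-2", "yes", 0.6),
    Response("q1", "analyst-3", "yes", 0.6),
    Response("q1", "fusion-node-1", "no", 0.95),
    Response("q2", "analyst-1", "yes", 0.6),
    Response("q2", "fusion-node-1", "yes", 0.95),
]
result = run_job(responses)
```

With these illustrative numbers, q1 resolves to "no": one fusion node at reliability 0.95 (weight log 19, about 2.94) outweighs three analysts at 0.6 (about 0.41 each). Weighting by reliability rather than counting heads is one way an automated reduce could avoid handing the commander a raw tally; how those reliabilities are estimated is itself part of the metric-derivation risk the topic describes.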
Challenges for this topic include:
A mature system should be able to apply crowdsourcing principles within cloud architecture tenants to show that questions related to recognizing social changes or activity of immediate significance can be answered more accurately and in less time.
PHASE I: Determine the technical feasibility of, and develop a proof-of-concept for, a prototype system that can assess the suitability of a question for distributed crowdsourcing. The concept should address the capability to: 1) automate the discovery of available and relevant human and machine processing nodes; 2) map a question into parts that can be processed by distributed nodes with access to disparate data, and reduce the disparate responses into a single, more complete and accurate answer; 3) identify and track key technical performance parameters; and 4) demonstrate the concept in a manner that clearly shows how much risk, relative to the production of a full prototype system, has been mitigated.
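Phase I items 1 and 2 (discovering relevant nodes and mapping question parts onto them) can be illustrated with a simple capability-matching sketch. It assumes nodes advertise capability tags (data sources or expertise) and an availability flag, and that a question has already been split into sub-questions that each list the tags they require; automatically decomposing a question is the harder research problem and is not shown. The Node and SubQuestion types, the tags, the sample questions, and the fanout parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str                # "human" or "machine"
    capabilities: frozenset  # hypothetical tags: data sources or expertise the node can reach
    available: bool = True


@dataclass(frozen=True)
class SubQuestion:
    sq_id: str
    text: str
    required: frozenset      # tags a node must cover to answer this part


def discover(nodes, sub_question):
    """Return available nodes whose capabilities cover every tag the sub-question needs."""
    return [n for n in nodes if n.available and sub_question.required <= n.capabilities]


def map_question(sub_questions, nodes, fanout=3):
    """Assign each sub-question to up to `fanout` eligible nodes; report parts nobody can take."""
    assignments, unassigned = {}, []
    for sq in sub_questions:
        eligible = sorted(discover(nodes, sq), key=lambda n: n.node_id)  # deterministic order
        if eligible:
            assignments[sq.sq_id] = [n.node_id for n in eligible[:fanout]]
        else:
            unassigned.append(sq.sq_id)
    return assignments, unassigned


# Hypothetical node registry and pre-decomposed question.
nodes = [
    Node("h1", "human", frozenset({"local-language", "region-x"})),
    Node("h2", "human", frozenset({"region-x"}), available=False),
    Node("m1", "machine", frozenset({"social-media", "region-x"})),
]
parts = [
    SubQuestion("sq1", "Is there unusual movement near the port?", frozenset({"region-x"})),
    SubQuestion("sq2", "Has local social media sentiment shifted?", frozenset({"social-media"})),
    SubQuestion("sq3", "Are main roads passable?", frozenset({"imagery"})),
]
assignments, unassigned = map_question(parts, nodes)
```

Here sq1 goes to h1 and m1 (h2 matches but is unavailable), sq2 goes only to m1, and sq3 is reported as unassigned because no node has imagery access. Surfacing unassignable parts is one possible input to judging whether a question is suitable for crowdsourcing at all.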
PHASE II: Develop a proof-of-concept prototype system that can detect social change or significant events more completely and accurately by using a cloud architecture populated with human and machine nodes and a map reduce processing paradigm. The prototype system should demonstrate an increase in the accuracy of the produced answer relative to the assessment produced by an average node. The demonstration should use real responses from distributed human nodes. During Phase II, system requirements may include the processing of classified data.
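The Phase II success criterion compares the reduced answer against "an average node." One possible reading, assumed here, is to score each node and the aggregated answer against questions with known ground truth, then compare the aggregate's accuracy with the mean per-node accuracy. The sketch uses plain majority vote as a stand-in reducer; the function names and sample data are hypothetical.

```python
from collections import Counter


def majority(votes):
    """Plain majority vote; ties go to the answer seen first."""
    return Counter(votes).most_common(1)[0][0]


def accuracy(predicted, truth):
    """Fraction of ground-truth questions answered correctly (missing answers count as wrong)."""
    return sum(predicted.get(q) == a for q, a in truth.items()) / len(truth)


def aggregate_vs_average(node_answers, truth):
    """Return (aggregate accuracy, mean per-node accuracy, gain of aggregate over average node)."""
    aggregated = {}
    for q in truth:
        votes = [ans[q] for ans in node_answers.values() if q in ans]
        if votes:
            aggregated[q] = majority(votes)
    agg_acc = accuracy(aggregated, truth)
    node_avg = sum(accuracy(ans, truth) for ans in node_answers.values()) / len(node_answers)
    return agg_acc, node_avg, agg_acc - node_avg


# Hypothetical ground truth and node responses.
truth = {"q1": "yes", "q2": "no", "q3": "yes", "q4": "no"}
node_answers = {
    "a": {"q1": "yes", "q2": "no", "q3": "no", "q4": "no"},
    "b": {"q1": "yes", "q2": "yes", "q3": "yes", "q4": "yes"},
    "c": {"q1": "no", "q2": "no", "q3": "yes", "q4": "yes"},
}
agg_acc, node_avg, gain = aggregate_vs_average(node_answers, truth)
```

With this sample data, the majority answer is correct on 3 of 4 questions (0.75), while the individual nodes average about 0.58, a gain of roughly 0.17. A real evaluation would need many more questions and independent ground truth before such a difference is meaningful.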
PHASE III: Produce a system capable of deployment and operational evaluation. The system should address all of the social behavior and significant event detection requirements of the transition program. Enhance performance over the Phase II demonstration by showing increases both in the breadth of questions that can be addressed and in the accuracy and completeness with which the system can address them.
PRIVATE SECTOR COMMERCIAL POTENTIAL/DUAL-USE APPLICATIONS: Media outlets already mine the Internet and social media for news. The technology developed under this topic would enable these outlets to draw on members of social media channels to help produce more accurate and complete stories, using both discovered information and human responses to challenge questions. Budget realities prevent news organizations from keeping reporters at every location in the world where newsworthy events may occur. The developed technology also has great applicability to NGOs engaged in crisis response.
REFERENCES:
2. DARPA Network Challenge. http://archive.darpa.mil/networkchallenge/.
3. Lakhani, Karim, Lars Bo Jeppesen, Peter A. Lohse, and Jill A. Panetta. 2007. "The Value of Openness in Scientific Problem Solving." http://www.hbs.edu/research/pdf/07-050.pdf.
4. Shirky, Clay. 2010. Cognitive Surplus: Creativity and Generosity in a Connected Age. New York: Penguin Press.
KEYWORDS: Crowd Sourcing, Map Reduce, Reporting, Cloud Architecture, Distributed Processing, Collaboration, Mixed Initiative, Fusion