Context-Specific Dynamic Collaborative Information Analysis
Navy SBIR 2011.2 - Topic N112-152
ONR - Mrs. Tracy Frost - [email protected]
Opens: May 26, 2011 - Closes: June 29, 2011

N112-152 TITLE: Context-Specific Dynamic Collaborative Information Analysis

TECHNOLOGY AREAS: Information Systems

OBJECTIVE: The objective of this topic is to empower analysts working on complex intelligence tasks by dynamically forming collaborative research groups of individuals who are currently investigating similar issues, while requiring minimal or no direct input from each analyst about the goals of their current mission. To this end, the goal is to develop technologies and a tool that not only let each analyst explore information in a huge database of documents, but also suggest other analysts who may provide key insights for the task at hand.

DESCRIPTION: In recent years, the web has experienced a huge increase in the number of webpages, blog posts, tweets and other information sources published daily, forcing users to cope with information overload. The task of guiding users through this flood of information has thus become critical. The DoD faces similar information overload challenges: an intelligence analyst must wade through huge numbers of potentially relevant documents, both from classified sources and from the web. The main tool at the analyst's disposal, namely keyword search, provides little help when the number of matching documents is huge, or when the most relevant search terms are unclear.

To address this problem, the first step of this topic is to provide a general technology and tool for addressing such information overload challenges on any huge collection of documents. The modes of search interaction must go beyond simple keyword matching technology toward a more semantic understanding of language, allowing for a richer description of the analyst's information needs through, for example, analysts inputting specific documents related to their search goals and rating responses in an interactive fashion. Through the use of machine learning techniques, the proposed technology should learn the goals of the analyst's current task and improve the quality of the responses to the current context.
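
As one illustration of the interactive rating loop described above, the following is a minimal sketch of classical Rocchio relevance feedback over bag-of-words vectors. It is not a method prescribed by this topic: the tokenizer, the function names (tf_vector, rocchio_update, rank), and the weights alpha, beta and gamma are all assumptions chosen for illustration, and a real system would likely use a richer semantic representation.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector; lowercase whitespace tokenization is an assumption."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average of a list of sparse vectors (empty dict for an empty list)."""
    total = Counter()
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    n = len(vectors)
    return {t: w / n for t, w in total.items()} if n else {}


def rocchio_update(query, liked, disliked, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward documents rated relevant and away from the rest.

    alpha, beta and gamma are illustrative defaults, not values from the topic.
    Terms whose updated weight is not positive are dropped.
    """
    pos, neg = centroid(liked), centroid(disliked)
    updated = {}
    for t in set(query) | set(pos) | set(neg):
        w = alpha * query.get(t, 0.0) + beta * pos.get(t, 0.0) - gamma * neg.get(t, 0.0)
        if w > 0:
            updated[t] = w
    return updated


def rank(query, docs):
    """Return document ids ordered by decreasing similarity to the query."""
    return sorted(docs, key=lambda d: cosine(query, tf_vector(docs[d])), reverse=True)
```

In use, the analyst's initial query or example documents form the starting vector, each round of ratings is passed to rocchio_update, and rank is called again with the updated vector, so the results drift toward the analyst's current task without an explicit task description.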

Often analysts will repeat significant work performed by other analysts, or miss key insights that are available to analysts working on related tasks. If analysts with similar goals could be connected, we could achieve a transformative improvement in productivity and in the quality of the final product. Unfortunately, only very few individuals, e.g., those managing a group of analysts, may know enough about the current and past tasks of each individual to form these connections, which significantly limits the potential for such synergy.

In this topic, we also seek technologies and a tool to effectively connect analysts working on related tasks or those who could provide new insights to an analyst's current task. Although a natural approach would be for each analyst to provide a detailed description of their current task, this process could be so burdensome to the individual that the tool would not be used effectively. Instead, the goal is to form these research groups based solely on each individual's interaction with the proposed search tool, and perhaps other small elements of information, such as chats between analysts and microblogs (tweets) of free text about their current goals and objectives.
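
One way to picture forming groups from tool usage alone is sketched below, under stated assumptions: each analyst's logged queries, rated documents, and chat snippets are summed into a term profile, and pairs of analysts whose profiles have high cosine similarity are suggested to each other. The log format, the function names (profile, suggest_collaborators), and the 0.3 threshold are hypothetical, and the all-pairs comparison is only suitable for small groups.

```python
import math
from collections import Counter
from itertools import combinations


def profile(interactions):
    """Sum term counts over every text an analyst queried, rated, or wrote."""
    vec = Counter()
    for text in interactions:
        vec.update(text.lower().split())
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse term-count mappings."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def suggest_collaborators(logs, threshold=0.3):
    """Return (analyst_a, analyst_b, score) for pairs at or above the threshold.

    logs maps an analyst name to a list of interaction texts (assumed format).
    Pairs are returned in order of decreasing similarity.
    """
    profiles = {name: profile(texts) for name, texts in logs.items()}
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        score = cosine(profiles[a], profiles[b])
        if score >= threshold:
            pairs.append((a, b, score))
    return sorted(pairs, key=lambda p: p[2], reverse=True)
```

Because the profiles are built only from what the tool already logs, no analyst has to describe their task explicitly; the threshold then trades off missed connections against spurious suggestions.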

A major challenge that should be addressed in this topic is scalability. The tool and methodology should progressively scale to an environment that, by the completion of Phase III, can involve hundreds of analysts researching millions of documents in an interactive fashion.

PHASE I: Develop a detailed technical plan and architecture for a tool that goes beyond keyword search and that individual analysts can use effectively to search a large document database. Demonstrate the viability of this approach in a database with ten thousand documents. Based on this tool, demonstrate the ability to connect multiple analysts who are working on similar topics based only on their usage of the tool.

PHASE II: Develop a tool for individual analysts to explore a database of hundreds of thousands of documents. This tool must enable a description of the analyst's information needs that goes beyond the typical keyword matching or database query, improving the performance on their task through interaction. Based on this tool, provide and demonstrate a technology that connects tens of analysts who can simultaneously use the tool. Through its use, the system should connect analysts who can collaborate or provide key insights to the task at hand, in the user's current context. This tool should be able to exploit not only the database of documents and a user's current usage, but also the analyst's previous usage history and their communications through chats.

PHASE III: Extend the scale of the technology to where hundreds of analysts examining millions of documents can use the tool at the same time. The technologies and products developed under this topic will have applications in intelligence analysis, law enforcement, and security. In particular, the approach will significantly decrease "missed opportunities" for "connecting the dots" in complex intelligence tasks in the Navy and DoD at large.

PHASE III DUAL USE APPLICATIONS: Applications of the developed tools also have significant potential impact in social networking, providing a new way to dynamically connect users, and in novel methodologies for searching information on the web.

REFERENCES:
1. Shaparenko and Joachims: Information Genealogy: Uncovering the Flow of Ideas in Non-Hyperlinked Document Databases. Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2007.

2. El-Arini, Veda, Shahaf, and Guestrin: Turning Down the Noise in the Blogosphere. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2009.

3. Goldenberg, Zheng, Fienberg, and Airoldi: A Survey of Statistical Network Models. Foundations and Trends in Machine Learning, 2, pp 129-233, 2009.

4. Shahaf and Guestrin: Connecting the Dots Between News Articles. ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2010.

5. Namata, Sharara, and Getoor: A Survey of Link Mining Tasks for Analyzing Noisy and Incomplete Networks. In Link Mining: Models, Algorithms, and Applications, Springer, 2010.

6. Carlson, Betteridge, Kisiel, Settles, Hruschka, and Mitchell: Toward an Architecture for Never-Ending Language Learning. Proceedings of the Conference on Artificial Intelligence (AAAI), 2010.

KEYWORDS: Huge databases of documents, learning queries, discovering collaborative groups, link mining.

** TOPIC AUTHOR (TPOC) **
DoD Notice:  
Between April 26 and May 25, 2011, you may talk directly with the Topic Authors to ask technical questions about the topics. Their contact information is listed above. For reasons of competitive fairness, direct communication between proposers and topic authors is not allowed starting May 26, 2011, when DoD begins accepting proposals for this solicitation.
However, proposers may still submit written questions about solicitation topics through the DoD's SBIR/STTR Interactive Topic Information System (SITIS), in which the questioner and respondent remain anonymous and all questions and answers are posted electronically for general viewing until the solicitation closes. All proposers are advised to monitor SITIS (11.2 Q&A) during the solicitation period for questions and answers, and other significant information, relevant to the SBIR 11.2 topic under which they are proposing.

If you have general questions about the DoD SBIR program, please contact the DoD SBIR Help Desk at (866) 724-7457 or email weblink.