This solicitation is now closed
Enhanced Summarizations of Streaming Text
Navy SBIR 2012.1 - Topic N121-078
NSMA - Mr. Chris Coleman - [email protected]
Opens: December 12, 2011 - Closes: January 11, 2012

N121-078 TITLE: Enhanced Summarizations of Streaming Text

TECHNOLOGY AREAS: Information Systems

RESTRICTION ON PERFORMANCE BY FOREIGN CITIZENS (i.e., those holding non-U.S. Passports): This topic is "ITAR Restricted". The information and materials provided pursuant to or resulting from this topic are restricted under the International Traffic in Arms Regulations (ITAR), 22 CFR Parts 120 - 130, which control the export of defense-related material and services, including the export of sensitive technical data. Foreign Citizens may perform work under an award resulting from this topic only if they hold the "Permanent Resident Card", or are designated as "Protected Individuals" as defined by 8 U.S.C. 1324b(a)(3). If a proposal for this topic contains participation by a foreign citizen who is not in one of the above two categories, the proposal will be rejected.

OBJECTIVE: To develop methods and tools that produce real-time topic-focused summaries of streaming text data sources.

DESCRIPTION: Researchers and analysts often have to extract meaningful information from large collections of unstructured text. They do not have the time to read documents that are not pertinent to their topic. Thus, algorithms and tools that automatically summarize documents with little or no loss of information are highly desirable. We seek innovative theoretical, methodological, and technical approaches for real-time summarization of text, where the summaries are relevant to a specific topic.

Text-based data sources that arrive in a streaming manner or have a time stamp associated with them include, but are not limited to, newscasts, tweets, chats, and blogs. These types of data sources are often noisy, short, and bursty, and they use non-standard language, which makes the problem more challenging. The ability to exploit and account for the time attribute associated with the document summaries will speed up and enhance the analytic process.

Conventional summarization can be thought of as generic or topic-focused, where the latter generates summaries that are related to a given topic. Recent research efforts in academia are addressing the problem of updating the summary at each time step while taking the current summary into account. For example, content from prior documents that has already been read by users should not be included in the updated summary.
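As an illustration only, the update behavior described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a method prescribed by this topic or taken from the cited references: the class name UpdateSummarizer, the word-overlap relevance score, the Jaccard redundancy check, and the 0.5 threshold are all hypothetical choices made for the example.

```python
import re


def tokens(text):
    """Return the set of lowercase word tokens (a deliberately crude tokenizer)."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


class UpdateSummarizer:
    """Hypothetical topic-focused update summarizer for batches of streaming text."""

    def __init__(self, topic, novelty_threshold=0.5):
        self.topic = tokens(topic)
        self.novelty_threshold = novelty_threshold  # assumed cutoff, not tuned
        self.seen = []  # token sets of sentences already shown to the user

    def update(self, sentences, k=2):
        """Return up to k on-topic sentences that are not redundant with earlier output."""
        scored = []
        for sentence in sentences:
            toks = tokens(sentence)
            # Relevance: fraction of topic terms that appear in the sentence.
            relevance = len(toks & self.topic) / len(self.topic) if self.topic else 0.0
            if relevance > 0:
                scored.append((relevance, sentence, toks))
        # Most relevant first; the sort is stable, so ties keep arrival order.
        scored.sort(key=lambda item: -item[0])

        summary = []
        for _, sentence, toks in scored:
            if len(summary) == k:
                break
            # Skip content that overlaps heavily with what the user has already read.
            if any(jaccard(toks, old) >= self.novelty_threshold for old in self.seen):
                continue
            summary.append(sentence)
            self.seen.append(toks)
        return summary


summarizer = UpdateSummarizer("port strike")
first = summarizer.update([
    "Workers began a strike at the port today.",
    "The weather is sunny.",
])
second = summarizer.update([
    "Workers began a strike at the port today.",
    "The port strike has halted container shipments.",
])
print(first)
print(second)
```

In this sketch the second call drops the repeated sentence because the user has already seen it and returns only the new on-topic sentence. A real system would need a far more robust treatment of noisy, short, non-standard text than simple word overlap.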

The goal of this effort is to research and develop novel techniques and tools that will update topic-focused summaries of noisy individual texts from a large streaming corpus. Of additional interest is to expand these approaches to provide the ability to extract and update topic-focused summarizations of related corpora, track and connect the important content in the documents, and provide some measure of associated uncertainty. The approaches and methodologies developed under this effort should be able to work with existing text analytic and natural language processing techniques and tools.

PHASE I: Research and develop new methods and approaches for real-time summarization of text, where the summaries and documents are relevant to a specific topic.

PHASE II: Implement promising methods in a prototype software tool and demonstrate that it meets performance needs, such as real-time processing, production of cogent summaries, and the ability to work with streaming data (e.g., broadcast media). Consideration should be given to customizing the system for domain and user preferences.

PHASE III: Productize the system to cover multiple domains and very large corpora.

PRIVATE SECTOR COMMERCIAL POTENTIAL/DUAL-USE APPLICATIONS: A system capable of assisting users to ingest and understand large amounts of information would be very useful to researchers and analysts in the legal, journalism, medical, business, and economic domains.

REFERENCES:
1. Li, X., et al., "Graph-based marginal ranking for update summarization," SIAM Conference on Data Mining (SDM 2011), pp. 486-497

2. http://www-nlpir.nist.gov/projects/duc/duc2007/tasks.html

3. Wan, X., "TimedTextRank: Adding the temporal dimension to multi-document summarization," Proceedings of SIGIR 07, pp. 867-868

4. Lin, C. and E. Hovy, "The automated acquisition of topic signatures for text summarization," COLING 2000, Proceedings of the 18th Conference on Computational Linguistics

5. R. M. Aliguliyev, "Clustering techniques and discrete particle swarm optimization algorithm for multi-document summarization," Computational Intelligence, 2010

KEYWORDS: Paraphrase generation; text analytics; text summarization; natural language processing; text mining; data mining

** TOPIC AUTHOR (TPOC) **
DoD Notice:  
Between November 9 and December 11, 2011, you may talk directly with the Topic Authors to ask technical questions about the topics. Their contact information is listed above. For reasons of competitive fairness, direct communication between proposers and topic authors is not allowed starting December 12, 2011, when DoD begins accepting proposals for this solicitation.
However, proposers may still submit written questions about solicitation topics through the DoD's SBIR/STTR Interactive Topic Information System (SITIS), in which the questioner and respondent remain anonymous and all questions and answers are posted electronically for general viewing until the solicitation closes. All proposers are advised to monitor SITIS (12.1 Q&A) during the solicitation period for questions and answers, and other significant information, relevant to the SBIR 12.1 topic under which they are proposing.

If you have general questions about the DoD SBIR program, please contact the DoD SBIR Help Desk at (866) 724-7457 or by email.