Authorship Analysis for Cognitive Information Operations
Navy SBIR 2012.1 - Topic N121-080
ONR - Ms. Tracy Frost - [email protected]
Opens: December 12, 2011 - Closes: January 11, 2012
N121-080 TITLE: Authorship Analysis for Cognitive Information Operations
TECHNOLOGY AREAS: Information Systems, Human Systems
ACQUISITION PROGRAM: PEO C4I & Space and OA Enterprise Services EC Product
OBJECTIVE: Use authorship analysis to obtain signatures of individuals and groups and so gain an understanding of their thoughts and perceptions, and provide methods for optimizing feature selection to profile authors across online media forms and languages.
DESCRIPTION: Authorship analysis includes author identification, characterization and similarity detection. It relies on lexical, syntactic, structural and content-specific features. Progress has been made over many years in identifying relevant writing features [ref. 1]. Examples of writing features include: 1) lexical: word or sentence length, 2) syntactic: frequency of words, 3) structural: paragraph length, and 4) content-based: key words. The features that are relevant for signature detection vary with document form and language. Additionally, the accuracy of authorship classification varies greatly with the author writing samples and the number of authors; however, accuracy as high as 70 to 90% has been reported.
The global nature of terrorism necessitates tracking individual online communication across media forms (e.g., email, chat, blogs, newsgroups) and languages. The anonymous nature of online message distribution has made identity tracing important, and the size of the Internet makes it a much more challenging problem. Most author identification research has been done on the English language. A few studies have been done on Chinese [ref. 1] and Arabic [ref. 2]. The most popular language on the Internet is English at 35.8%, followed by Chinese at 14.1% [ref. 1]. Authorship analysis needs to be advanced across languages, thereby gaining insight into cognitive information.
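For illustration only, the four feature families named in the description (lexical, syntactic, structural, content-based) can be sketched as a small feature extractor. This is a minimal sketch, not a method prescribed by the topic: the function name stylometric_features, the FUNCTION_WORDS list and the keywords parameter are all hypothetical, the function-word list is English-only, and the word pattern assumes whitespace-delimited scripts, so a language such as Chinese would need a real segmenter.

```python
import re
from collections import Counter

# Hypothetical, English-only function-word list standing in for syntactic features.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "a", "that", "is")


def stylometric_features(text, keywords=()):
    """Return a dict of simple lexical, syntactic, structural and content features."""
    # Paragraphs are assumed to be separated by blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    # Sentences are approximated by splitting on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words are runs of Unicode letters, lowercased.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(words)
    n_words = max(len(words), 1)  # avoid division by zero on empty input

    features = {
        # Lexical: word and sentence length.
        "avg_word_length": sum(len(w) for w in words) / n_words,
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        # Structural: paragraph length in words.
        "avg_paragraph_length": len(words) / max(len(paragraphs), 1),
    }
    # Syntactic (approximation): relative frequency of each function word.
    for w in FUNCTION_WORDS:
        features["fw_" + w] = counts[w] / n_words
    # Content-based: relative frequency of caller-supplied key words.
    for k in keywords:
        features["kw_" + k.lower()] = counts[k.lower()] / n_words
    return features


if __name__ == "__main__":
    sample = "The cat sat. The dog ran.\n\nA bird sang."
    for name, value in stylometric_features(sample, keywords=["dog"]).items():
        print(name, round(value, 3))
```

Returning a flat dictionary keeps the features easy to compare across documents; which features actually discriminate between authors in a given medium or language is the open question the topic poses.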
Methods of authorship analysis rely on statistical analysis and machine learning [ref. 3]. Recent work indicates potential for cognitive insight into author in-group and out-group alignment [ref. 4]. The specific technical challenges of this topic include: 1) identification of features that are useful for establishing authorship, 2) development of clustering algorithms that can separate authors based on authorship features, and 3) development of a tool that can run against document streams. The Navy is interested in innovative research and development solutions that include technical and scientific merit and involve technical risk.
PHASE I: Develop methods to determine document author signatures from features derived from text. Identify a group of the most promising features as well as key technical risks, and track risk mitigation through the measurement of key technical parameters. Explore how selected features can be generalized across languages. Conduct a proof-of-concept demonstration. Results from the model development and tests are to be documented in a technical report and presented at a selected conference.
PHASE II: Produce a prototype system that is capable of ingesting more than two data type sources. The prototype system will be able to automatically process and group documents by author or group. The model(s) and techniques are to include analysis of data available besides text content, such as selection of formats, styles, hyperlinks and image types. A proof-of-concept should be shown on a relevant document corpus, with a goal of 90% detection accuracy and missed declarations of no greater than 10%.
PHASE III: Produce a system capable of deployment and operational evaluation. Package the developed application as an Ozone Widget ready for incorporation into the Distributed Common Ground Station program.
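The Phase II goal (90% detection accuracy, missed declarations of no greater than 10%) can be made concrete with a small scoring sketch. The topic does not define either measure, so the definitions here are assumptions: detection accuracy is taken as the fraction of documents attributed to the correct author, and a missed declaration is taken as a document for which the system names no author (represented as None). The function names score_attribution and meets_phase2_goal are hypothetical.

```python
def score_attribution(true_authors, predicted_authors):
    """Return (detection_accuracy, missed_declaration_rate) for paired label lists.

    Assumption: a prediction of None means the system declined to name an
    author, and that document is counted as a missed declaration.
    """
    if len(true_authors) != len(predicted_authors):
        raise ValueError("label lists must have the same length")
    total = len(true_authors)
    if total == 0:
        raise ValueError("no documents to score")
    # A document is correct only when the predicted label equals the true one.
    correct = sum(t == p for t, p in zip(true_authors, predicted_authors))
    # A document is missed when no author was declared at all.
    missed = sum(p is None for p in predicted_authors)
    return correct / total, missed / total


def meets_phase2_goal(accuracy, miss_rate, min_accuracy=0.90, max_miss_rate=0.10):
    """Check both thresholds; the defaults mirror the figures stated for Phase II."""
    return accuracy >= min_accuracy and miss_rate <= max_miss_rate


if __name__ == "__main__":
    truth = ["author_a"] * 6 + ["author_b"] * 4
    predicted = ["author_a"] * 6 + ["author_b"] * 3 + [None]
    accuracy, miss_rate = score_attribution(truth, predicted)
    print(accuracy, miss_rate, meets_phase2_goal(accuracy, miss_rate))
```

Under these definitions a wrong attribution lowers accuracy without counting as a miss; a program that defines missed declarations differently would need to adjust the second measure.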
PRIVATE SECTOR COMMERCIAL POTENTIAL/DUAL-USE APPLICATIONS: There are many commercial applications, including law enforcement, business activity monitoring, and security monitoring. Presently, there is a strong need to protect military and civilian personnel from gangs and terror cells. Developed systems should operate in a net-centric environment and provide reliable performance. Commercial value and cost savings are enhanced by operation in a distributed service-oriented architecture with other applications.
REFERENCES:
2. Ahmed Abbasi and Hsinchun Chen, "Applying Authorship Analysis to Extremist-Group Web Forum Messages", Homeland Security, IEEE Computer Society, 1541-1672/05, Sept/Oct 2005, www.computer.org/Intelligent.
3. Efstathios Stamatatos, "A Survey of Modern Authorship Attribution Methods", Journal of the American Society for Information Science and Technology, 60 (3): 538-556, 2009.
4. Marion Ceruti, Scott McGirr, and Joan Kaina, "Interaction of Language, Culture and Cognition in Group Dynamics for Understanding the Adversary", NSSDF, Las Vegas, July 2010.
KEYWORDS: Authorship Analysis; Document Signatures; Cognitive Science; Writing Styles; Natural Language Processing; Lexicalization