Pattern of Life Calculation from Big Graphs
Navy SBIR 2014.1 - Topic N141-075
ONR - Ms. Lore Anne Ponirakis - [email protected]
Opens: Dec 20, 2013 - Closes: Jan 22, 2014

N141-075 TITLE: Pattern of Life Calculation from Big Graphs

TECHNOLOGY AREAS: Information Systems, Human Systems

OBJECTIVE: The research objective is to mature big graph data analytics to model patterns of life and automatically detect changes indicative of anomalous events. A capability that scales linearly with graph complexity is needed.

DESCRIPTION: Sailors and Marines are responsible for conducting missions such as assaults, embassy protection, non-combatant evacuation, and disaster relief. Mission planning involves understanding normal patterns of life (POL) so anomalies can be recognized. For example, observed changes in imagery over time have been used to derive POL for areas of interest. This type of data can be expressed as a vector of terms, enabling expected values and estimates of change to be calculated as Euclidean distances from mean values. Relevant POL modeling, however, needs to consider far richer data sets that include information on the content and frequency of interaction of nodes. Relevant data sets must also include open source (e.g., social media), cyber (transactional) and messaging (theme and concept spread) data, in addition to considering the relationship between movement, people and places. This data, even when bounded in time and place, results in very large graphs.
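
The vector-based baseline described above can be sketched as follows. This is a minimal illustration only: the feature names and values are hypothetical and are not drawn from any program data set.

```python
import math

def mean_vector(history):
    """Component-wise mean of equal-length feature vectors."""
    n = len(history)
    return [sum(column) / n for column in zip(*history)]

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical daily observations for one area of interest:
# [vehicles counted, market stalls open, lit windows]
history = [
    [40.0, 12.0, 300.0],
    [42.0, 11.0, 310.0],
    [38.0, 13.0, 290.0],
]

# The "expected" pattern is the mean of past observations.
baseline = mean_vector(history)

def change_score(observation):
    """Estimate of change: distance of a new observation from the baseline."""
    return euclidean_distance(observation, baseline)
```

The cost of each score grows linearly with the number of vector terms, which is the scaling property the richer graph-based methods are asked to approach.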

A new capability to enable POL calculation, based on large graph theory, is needed. While Euclidean distance calculation scales linearly with the number of nodes, graph analytics do not. Spurred by social networking data, approximate analytics for big graphs are maturing toward linear scaling with the number of nodes. These developments help answer questions such as whom a person may know or what a person's preferences are. POL calculations must consider more diverse data sets and richer graph representations, calling for even more efficient distance calculations.
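
As one simple point of comparison, a distance between two graph snapshots can be computed in a single pass over their edges. The sketch below uses a weighted Jaccard distance on interaction counts; this measure is chosen here purely for illustration and is not a method specified by this topic. Node names are hypothetical.

```python
from collections import Counter

def edge_counts(interactions):
    """Count interactions per undirected node pair."""
    counts = Counter()
    for u, v in interactions:
        counts[tuple(sorted((u, v)))] += 1
    return counts

def weighted_jaccard_distance(g1, g2):
    """1 - sum(min) / sum(max) over the union of edges.

    Runs in time proportional to the number of distinct edges in both
    snapshots. A Counter returns 0 for edges it has not seen.
    """
    edges = g1.keys() | g2.keys()
    overlap = sum(min(g1[e], g2[e]) for e in edges)
    total = sum(max(g1[e], g2[e]) for e in edges)
    return 0.0 if total == 0 else 1.0 - overlap / total

# Hypothetical snapshots: a baseline period and a current period.
baseline_graph = edge_counts([("a", "b"), ("a", "b"), ("b", "c")])
current_graph = edge_counts([("b", "a"), ("c", "d")])
distance = weighted_jaccard_distance(baseline_graph, current_graph)
```

A measure of this kind captures only which pairs interact and how often. It ignores the content of interactions and higher-order structure, which is where the richer representations called for above become necessary.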

Network analysis provides powerful means of studying structural connections [1]. Of interest for this topic is POL for situation awareness. Military and intelligence operators typically rely on their own data sources and analysis to determine threat activities. Much of the data reported concerns events that are unfolding or have already taken place; such data is important for response, situation assessment, and historical baselining. However, this data does not provide the normal POL that could serve as an early indicator of change. This topic will explore analytics of nontraditional open source data fused with conventional data that can provide insight into anomalous activities.

Nontraditional data sources can provide insight into real-time activity [2]. Examples of data include open source text, satellite imagery available through search engines, video feeds of vehicle traffic flow, video feeds from public areas available on the Internet, public utility patterns (electricity, water, etc.), weather station reports and many more. Combining multiple diverse information sources into a unified graph that can be mapped to POL indicators presents challenges. For instance, how do we assess quality of sources? How do we normalize data types? How do we form graph structures?
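
One possible way to approach these questions is to map each feed's records onto a common edge record that carries a source-quality weight and the originating feed. The sketch below is illustrative only: the feed names, record fields, and reliability scores are all hypothetical assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    source_node: str
    target_node: str
    timestamp: float
    weight: float  # assumed reliability score of the feed, in [0, 1]
    origin: str    # feed that produced the edge

# Hypothetical reliability scores; real values would need to be assessed.
SOURCE_QUALITY = {"traffic_cam": 0.9, "social_post": 0.4}

def from_traffic_cam(rec):
    """Map a hypothetical camera record to a camera-to-place edge."""
    return Edge(f"camera:{rec['camera']}", f"place:{rec['road_segment']}",
                rec["time"], SOURCE_QUALITY["traffic_cam"], "traffic_cam")

def from_social_post(rec):
    """Map a hypothetical geotagged post to an account-to-place edge."""
    return Edge(f"account:{rec['author']}", f"place:{rec['geotag']}",
                rec["time"], SOURCE_QUALITY["social_post"], "social_post")

NORMALIZERS = {
    "traffic_cam": from_traffic_cam,
    "social_post": from_social_post,
}

def normalize(kind, rec):
    """Convert one raw record of the given feed type into a common Edge."""
    return NORMALIZERS[kind](rec)

edge = normalize("social_post",
                 {"author": "u1", "geotag": "market", "time": 100.0})
```

Typed node prefixes such as `place:` let edges from different feeds meet at shared nodes, so a unified graph forms without each feed needing to know about the others.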

The technical challenges of this topic are as follows: 1) automating collection of data for big graph formation; 2) data enrichment and fusion; 3) construction and maintenance of a dynamic big graph representation [3]; 4) calculation of relevant static and dynamic POL metrics from very large and diverse graphs [4]; 5) setting filters for event detection relevant to user needs (i.e., location, time and tasking); 6) providing means to optimize data collections over time by monitoring and adjusting data sources (i.e., user data needs and haves); and 7) identifying analytic methods to scale processing. Practical system building needs to be considered, as well as metrics to measure development success.
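
Challenges 3 through 5 can be illustrated with a small sketch: a graph that retains only recent interactions, a per-node activity metric, and a threshold filter for flagging change. The class name, window length, and threshold below are hypothetical choices, and the sketch assumes events arrive in time order.

```python
from collections import defaultdict, deque
from statistics import mean, pstdev

class SlidingWindowGraph:
    """Keeps only interactions observed within the last `window` time units."""

    def __init__(self, window):
        self.window = window
        self.events = deque()           # (timestamp, u, v) in arrival order
        self.degree = defaultdict(int)  # interactions per node in the window

    def add(self, t, u, v):
        """Record an interaction at time t, then drop expired ones."""
        self.events.append((t, u, v))
        self.degree[u] += 1
        self.degree[v] += 1
        self._expire(t)

    def _expire(self, now):
        # Remove events that are `window` or more time units old.
        while self.events and self.events[0][0] <= now - self.window:
            _, u, v = self.events.popleft()
            self.degree[u] -= 1
            self.degree[v] -= 1

def is_anomalous(history, value, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the mean."""
    if len(history) < 2:
        return False
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold

graph = SlidingWindowGraph(window=10)
graph.add(0, "a", "b")
graph.add(5, "a", "c")
graph.add(12, "a", "d")  # the interaction at t=0 has now expired
```

Each update costs time proportional to the number of expired events, so maintenance stays close to linear in the event stream. A fielded system would need richer metrics and filters keyed to location, time and tasking.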

Creative solutions are desired. Public data gathering should be done on a "not to interfere" basis with providers and should comply with policies for use of the data acquired. Data used, and modeling methods, should be relevant to a potential customer for product transition, such as a government agency, program of record or commercial marketplace. Use of open standards is encouraged to reduce costs and improve system interoperability.

PHASE I: Develop processes and techniques to characterize the content, as it relates to patterns of life, of big graphs over time. The data behind graphs should contain information extracted from diverse data sources. Key technical risks should be identified as well as key technical parameters that measure progress against the risk areas. Results from analysis and concept feasibility tests should be documented in a technical report or paper at a selected conference. The final Phase I brief/demonstration should show risk reduction to the development of a fully responsive Phase II product as well as plans for Phase I Option and Phase II.

PHASE II: Produce a prototype system capable of rapidly detecting changes to features describing patterns of life from dynamic large graphs populated by multiple data types, and of providing mission-relevant early threat indicators, all enabled by big graph analytics. The prototype system should be able to automatically process, display and alert on activity discoveries relevant to the specific user location and mission interests. The system should support data acquisition, large graph data storage and analytics, and alert dissemination. It is desired that context and pedigree of information be maintained for operator review. At this point the performer should focus on a proof-of-concept of capability using data sources that are of interest to a transition program. It is possible that some data sources of interest may be classified secret, such as multi-intelligence data (IMINT, HUMINT, MASINT, ELINT).

PHASE III: Produce a system capable of deployment and operational evaluation. The system should address POL indicators that are of value to a transition program or commercial application. Machine-based processing steps and inferences about patterns of life should be accessible to the operator and presented in human-understandable form. The software and hardware should be modified to operate in accordance with guidelines provided by the transition sponsor.

PRIVATE SECTOR COMMERCIAL POTENTIAL/DUAL-USE APPLICATIONS: The capability specified by this topic is highly relevant to non-government organizations involved in disaster relief who need to track life disruptions and return to normalcy over time.

REFERENCES:
1. Haixun Wang, "Managing and Mining Billion Node", Computing Machinery's Special Interest Group on Knowledge Discovery and Data Mining Summer School on "Mining the Big Data", August 11, 2012. http://kdd2012.sigkdd.org/sites/images/summerschool/Haixun-Wang.pdf

2. Carter T. Butts, "Revisiting the Foundations of Network Analysis", Science 325, 414, 2009.

3. T. von Landesberger et al., "Visual Analysis of Large Graphs: State-of-the-Art and Future Research Challenges", Computer Graphics Forum. http://onlinelibrary.wiley.com/doi/10.1111/j.1467-8659.2011.01898.x/abstract

4. Erica Naone, "The New Big Data: Today's big data is forcing researchers to find new techniques for knowledge discovery and data mining", MIT Technology Review, Aug 22, 2011. http://www.technologyreview.com/news/425090/the-new-big-data/

KEYWORDS: Big Graph Analytics; Patterns of Life; Activity Detection; Dynamic Analysis; Change Detection; Analytics; Graphs; Scalability

** TOPIC AUTHOR (TPOC) **
DoD Notice:  
Between November 20 and December 19 you may talk directly with the Topic Authors (TPOC) to ask technical questions about the topics. Their contact information is listed above. For reasons of competitive fairness, direct communication between proposers and topic authors is not allowed starting Dec 20, 2013, when DoD begins accepting proposals for this solicitation. However, proposers may still submit written questions about solicitation topics through the DoD's SBIR/STTR Interactive Topic Information System (SITIS), in which the questioner and respondent remain anonymous and all questions and answers are posted electronically for general viewing until the solicitation closes. All proposers are advised to monitor SITIS (14.1 Q&A) during the solicitation period for questions and answers, and other significant information, relevant to the SBIR 14.1 topic under which they are proposing.

If you have general questions about the DoD SBIR program, please contact the DoD SBIR Help Desk at (866) 724-7457 or by email.