Artificial Intelligence-Driven Multi-Intelligence Multi-Attribute Metadata Enabling All-Domain Preemptive Measures

Navy SBIR 22.2 - Topic N222-118
ONR - Office of Naval Research
Opens: May 18, 2022 - Closes: June 15, 2022 (12:00 p.m. ET)

N222-118 TITLE: Artificial Intelligence-Driven Multi-Intelligence Multi-Attribute Metadata Enabling All-Domain Preemptive Measures

OUSD (R&E) MODERNIZATION PRIORITY: Artificial Intelligence (AI)/Machine Learning (ML); Cybersecurity; Networked C3

TECHNOLOGY AREA(S): Battlespace Environments; Information Systems; Sensors

The technology within this topic is restricted under the International Traffic in Arms Regulation (ITAR), 22 CFR Parts 120-130, which controls the export and import of defense-related material and services, including export of sensitive technical data, or the Export Administration Regulation (EAR), 15 CFR Parts 730-774, which controls dual use items. Offerors must disclose any proposed use of foreign nationals (FNs), their country(ies) of origin, the type of visa or work permit possessed, and the statement of work (SOW) tasks intended for accomplishment by the FN(s) in accordance with the Announcement. Offerors are advised foreign nationals proposed to perform on this topic may be restricted due to the technical data under US Export Control Laws.

OBJECTIVE: Develop a system of Artificial Intelligence (AI)-driven multi-attribute metadata analytic tool sets that can be fully integrated with proper associative databases to monitor and track developing activities/signals in all operational domains. The system will utilize available multi-INT indicators and observables to isolate persistent threats, including those engaged in undesired reconnaissance activities. The multi-INT information sphere encompasses all physical domains (undersea, surface, air, space, land) as well as cyber. Associative databases serve as the living ground-truth repository of wide-ranging information. This AI framework serves as a unifying platform among disparate surveillance sources. It is a persistent AI-driven evidentiary metadata rendition of activities, context, and content: not just a snapshot of events but the active process of mining, fusing, and expressive tagging of multimodal, multidomain sensory contents (acoustics, thermal, full motion video, wide area motion imagery, etc.), including social media contents as evidence into a collaborative multi-level knowledge database. The multi-level metadata control measures and access points ensure content quality, validity, reliability, and accuracy, including: origination source (temporal, geospatial, operator, modalities); sensor types; signal characteristics (including format, encoding, file size, duration); scene narration; content validity and attributes (raw or time-stamped modification by end user…); security and privacy restriction policy; and chain of custody. These control measures ensure a trusted collaborative knowledge medium that can be searched, processed, annotated, linked to relevant disparate data sources, and shared among military and Intelligence Community (IC) analysts, federal and local law enforcement, and other Government personnel in real time.
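
For illustration only, a minimal Python sketch of a multi-attribute metadata record carrying the control measures enumerated above (origination source, sensor type, signal characteristics, scene narration, validity attributes, access policy, and chain of custody). All field names are illustrative assumptions, not a mandated schema.

# Minimal sketch of a multi-attribute metadata record; field names are
# illustrative assumptions, not a schema prescribed by this topic.
from dataclasses import dataclass, field
from typing import List


@dataclass
class CustodyEvent:
    timestamp_utc: str   # when the content was touched
    actor: str           # operator or system that touched it
    action: str          # e.g., "ingested", "annotated", "clipped"


@dataclass
class MetadataRecord:
    # Origination source: temporal, geospatial, operator, modality
    captured_utc: str
    latitude: float
    longitude: float
    operator_id: str
    modality: str                      # e.g., "FMV", "WAMI", "acoustic"
    # Sensor type and signal characteristics
    sensor_type: str
    encoding: str                      # e.g., "H.264"
    file_size_bytes: int
    duration_s: float
    # Scene narration and validity attributes
    scene_narration: str = ""
    raw: bool = True                   # False once modified by an end user
    # Security/privacy policy tag and chain of custody
    access_policy: str = "UNMARKED"
    chain_of_custody: List[CustodyEvent] = field(default_factory=list)

    def record_action(self, timestamp_utc: str, actor: str, action: str) -> None:
        """Append a custody event so provenance survives edits and clipping."""
        self.chain_of_custody.append(CustodyEvent(timestamp_utc, actor, action))
        if action != "ingested":
            self.raw = False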

DESCRIPTION: Analysts supporting naval missions develop actionable intelligence from an extensive array of data sources. National Intelligence, Surveillance, and Reconnaissance (ISR) assets such as Global Hawk and Predator have proven invaluable in multiple theaters of interest. These systems provide high-resolution sensory content that has been used to detect adversarial activities, such as movement of fighters and weapons, implanting of decoys and IEDs, or gatherings of key leaders. Unfortunately, multimodal streaming contents are time consuming to analyze and cumbersome to annotate and distribute for further review, analysis, or approval. For example, the large size of video files encourages segmenting the video data into small pieces containing highly valuable and sensitive information. When this is done, metadata links are broken, causing the loss of temporal- and geo-tracking, both of which are important for further refinement of intelligence and valuable as evidentiary information in support of ongoing operations. Threat assessment efforts require a multi-disciplinary approach that can automatically ingest and process structured and unstructured data from an expanding array of sensors and information sources. Automated content tagging and multimodal sensor fusion are critical components of proactive threat assessment and course-of-action determination. This SBIR topic seeks development of novel AI metadata methods to automatically create, explicitly document, manage, control, and preserve time-critical sensory content for the development of actionable intelligence. Synchronization of different data types and formats will be an important component. Metadata promotes assessment of the captured behavioral indicators and observables of potentially threatening activities. The multi-attribute metadata provides an aggregated array of chronicled indicators that brings into focus the likelihood of a specific entity or group being engaged in the identified hostile activity, as a basis for concern. Analysts can then assess the gathered observables to justify additional ISR operations, precautionary defensive measures, or preemptive actions. This technology will be an essential building block for a seamless all-domain interactive offensive and defensive kill chain.
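
For illustration only, a minimal late-fusion sketch of the multimodal sensor fusion mentioned above: per-modality feature vectors are concatenated and scored by a single classifier. The feature extractors and the linear scorer are hypothetical stand-ins, not components prescribed by this topic.

# Minimal late-fusion sketch: per-modality features are concatenated and
# scored together. Extractors and scorer are illustrative stand-ins.
import numpy as np


def video_features(frame: np.ndarray) -> np.ndarray:
    # Stand-in extractor: mean intensity per channel as a 3-vector.
    return frame.reshape(-1, frame.shape[-1]).mean(axis=0)


def audio_features(samples: np.ndarray) -> np.ndarray:
    # Stand-in extractor: RMS energy and zero-crossing rate as a 2-vector.
    rms = np.sqrt(np.mean(samples ** 2))
    zcr = np.mean(np.abs(np.diff(np.sign(samples)))) / 2.0
    return np.array([rms, zcr])


def fused_score(frame: np.ndarray, samples: np.ndarray,
                weights: np.ndarray, bias: float) -> float:
    """Concatenate modality features and apply a linear (sigmoid) scorer."""
    x = np.concatenate([video_features(frame), audio_features(samples)])
    return 1.0 / (1.0 + np.exp(-(weights @ x + bias)))


# Toy usage with random data and untrained weights, for shape checking only.
rng = np.random.default_rng(0)
frame = rng.random((120, 160, 3))
samples = rng.standard_normal(16_000)
print(fused_score(frame, samples, weights=rng.standard_normal(5), bias=0.0))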

Weaknesses of current approaches: Metadata schemes vary based on mission objectives and operational domain. Lack of alignment and compatibility between the metadata schemes complicates the ability to share information and make systems interoperable for cross agency collaboration to mitigate future threats. For instance, metadata included in the video transport wrapper can vary from typical information about the video source and playback parameters to extensive information as detailed by the Motion Imagery Standards Board. Descriptive metadata consisting of geo-, time-, and other references may be directly overlaid onto the video image. While this is compact and avoids the challenge of synchronizing metadata to the video stream, it offers limited metadata content and occludes significant portions of the video image. Descriptive metadata, such as analyst annotations included in the transport wrapper, often trace events by noting the number of frames from the initial I-frame of the video file; however, this type of reference schema is easily broken when video is cut into smaller clips to be sent to other analysts. The goal is to improve efficiency and accuracy through automation.
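
For illustration only, a minimal sketch of one way to keep annotation references from breaking when video is clipped, as described above: rebase frame-offset annotations onto absolute capture timestamps, which remain valid in any clip cut from the same source. All names are hypothetical.

# Minimal sketch: rebase frame-offset annotations onto absolute timestamps
# so they survive clipping. Field names are illustrative.
from dataclasses import dataclass


@dataclass
class Annotation:
    frame_offset: int     # frames from the source video's first I-frame
    label: str


@dataclass
class AbsoluteAnnotation:
    utc_seconds: float    # absolute capture time; clip-independent
    label: str


def rebase(ann: Annotation, source_start_utc: float, fps: float) -> AbsoluteAnnotation:
    """Convert a frame-relative reference to an absolute timestamp."""
    return AbsoluteAnnotation(source_start_utc + ann.frame_offset / fps, ann.label)


def to_clip_frame(ann: AbsoluteAnnotation, clip_start_utc: float, fps: float) -> int:
    """Recover the frame index inside any clip cut from the same source."""
    return round((ann.utc_seconds - clip_start_utc) * fps)


# A vehicle tagged at frame 900 of a 30 fps source starting at t = 1000 s
a = rebase(Annotation(900, "vehicle"), source_start_utc=1000.0, fps=30.0)
print(a.utc_seconds)                                      # 1030.0
print(to_clip_frame(a, clip_start_utc=1025.0, fps=30.0))  # frame 150 in the clip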

Note 1: Work produced in Phase II may become classified. The prospective contractor(s) must be U.S. owned and operated with no foreign influence as defined by DoD 5220.22-M, National Industrial Security Program Operating Manual, unless acceptable mitigating procedures have been implemented and approved by the Defense Counterintelligence and Security Agency (DCSA). The selected contractor must be able to acquire and maintain a secret-level facility clearance and Personnel Security Clearances, as set forth by DCSA and ONR, in order to perform on advanced phases of this project and to gain access to classified information pertaining to the national defense of the United States and its allies; this will be an inherent requirement. The selected company will be required to safeguard classified material IAW DoD 5220.22-M during the advanced phases of this contract.

Note 2: Phase I will be UNCLASSIFIED, and classified data is not required. For test and evaluation, the contractor needs to define the ground truth for a scenario and develop a storyboard to serve as an overarching scenario guiding the test and evaluation of this SBIR technology in a realistic context. Supporting datasets must have acceptable real-world data quality and complexity for the case studies to be considered rich in content. For example, an image/video dataset of at least 4,000 collected images and frames for a case study is considered content-rich.

Note 3: Contractors must provide appropriate dataset release authorization for use in their case studies, tests, and demonstrations, and certify that there are no legal or privacy issues, limitations, or restrictions with using the proposed data for this SBIR project.

PHASE I: Determine technical feasibility, design, and prototype an AI-enabled multi-attribute metadata generation system, as detailed below:

• Develop metadata attribute representation methods to express: operational coverage; organic domain features; anomalous entities, events, observations, and relations; and perceived intent relevant to the aforementioned naval sensory domains.

• Motivate the design with three compelling scenarios for emerging situations, supported by relevant datasets.

• Develop an ontology framework for representing and annotating multimodal events and entity relationships.

• Develop machine learning, recognition, and reasoning schemes for metadata annotation to infer content, context, association, and activity by interpreting the variety of behaviors attached to collected text, video, audio, images, documents, diagrams, etc. At a minimum, the following metadata information types are required (a minimal sketch of these three types appears after this list): (a) organic content metadata representing the various salient features and signatures captured from a scene; when combined into a feature vector, these features can be used as input to a machine learning system to form the final metadata annotation; (b) content-independent (tagged) metadata representing the originator, geospatial and temporal details, etc.; and (c) semantically descriptive metadata that conveys the significance of the scene by applying machine learning along with ontology-based techniques; for example, video frames and audio data can describe intention, depict the escalation of an event, reveal depth of emotion, or convey the implication of the scene.

• Develop metadata synchronization methods for multi-sensory content types that maintain temporal alignment.

• Performance metrics (noting that outcomes depend on the quality of the datasets):

1. Analytic Completeness: not just identifying and stopping a hostile act, but identifying how it occurred by synthesizing the entire chain of events, i.e., what would have happened had it not been stopped > 90%

2. Uniqueness: Signature attributes definable and retrievable (who, what, why, where, when) > 90%

3. Validity: Supporting evidence > 95%

4. Consistency: Updated metadata attributes from various sources that reinforce linkages > 90%

5. Accuracy: Overcoming noisy data > 90%

• Deliverables: Analytics, signal processing tools, models, T&E and demonstration results, a final Phase I report, and a Phase II plan.
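
For illustration only, a minimal Python sketch of the three required metadata types from the work statement above: (a) organic content features combined into a feature vector, (b) content-independent tags, and (c) a semantically descriptive label. The feature names and the nearest-centroid labeler are hypothetical assumptions, not a mandated design.

# Minimal sketch of the three required metadata types; names and the
# nearest-centroid labeler are illustrative, not prescribed.
import numpy as np


def organic_features(brightness: float, motion_energy: float,
                     audio_rms: float) -> np.ndarray:
    """(a) Salient per-scene features combined into one feature vector."""
    return np.array([brightness, motion_energy, audio_rms])


def content_independent_tags(originator: str, lat: float, lon: float,
                             captured_utc: str) -> dict:
    """(b) Originator, geospatial, and temporal details; independent of content."""
    return {"originator": originator, "lat": lat, "lon": lon,
            "captured_utc": captured_utc}


def semantic_label(features: np.ndarray) -> str:
    """(c) A toy semantic annotation: nearest centroid over labeled prototypes.
    A real system would use a trained model plus an ontology."""
    prototypes = {"routine": np.array([0.5, 0.1, 0.1]),
                  "escalating": np.array([0.5, 0.8, 0.7])}
    return min(prototypes, key=lambda k: np.linalg.norm(features - prototypes[k]))


fv = organic_features(brightness=0.5, motion_energy=0.9, audio_rms=0.6)
record = {"organic": fv.tolist(),
          "tagged": content_independent_tags("sensor-07", 36.8, -76.3,
                                             "2022-05-18T12:00:00Z"),
          "semantic": semantic_label(fv)}
print(record)   # semantic -> "escalating"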

PHASE II: Conduct proof-of-concept and prototype development incorporating the recommended candidate technology from Phase I. Demonstrate operational effectiveness based on the following criteria: (a) prioritized sensor alerts; (b) prioritized threat escalation; (c) measured severity of events; and (d) a measure of analytic completeness, i.e., not just identifying and stopping a hostile act but identifying how it occurred by synthesizing the entire chain of events (what would have happened had it not been stopped). Apply the prototype to the synchronization of dissimilar multimodal data streams in real time, with at least one of the sources being high-definition video. Ensure that the prototype is compatible with a cloud-type architecture and presents a scalable solution. Test and demonstrate the improved capability based on the performance metrics detailed for Phase I with the following requirements: Analytic Completeness > 95%, Uniqueness > 95%, Validity > 98%, Consistency > 98%, and Accuracy > 98%. Develop a final report to include a detailed design of the system and a plan for transition to the program of record in Phase III. Deliverables: analytics, signal processing tools, models, prototypes, T&E and demonstration results, interface requirements, and a final report.
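
For illustration only, a minimal sketch of the timestamp-based alignment implied by the real-time synchronization requirement above: each video frame timestamp is paired with the nearest sample from a lower-rate stream within a tolerance. The stream names, rates, and tolerance are assumptions.

# Minimal sketch: align dissimilar streams by pairing each frame timestamp
# with the nearest sensor timestamp within a tolerance.
import bisect
from typing import List, Optional, Tuple


def align(frame_times: List[float], sensor_times: List[float],
          tolerance_s: float = 0.05) -> List[Tuple[float, Optional[float]]]:
    """Pair each frame timestamp with the nearest sensor timestamp, or None."""
    pairs = []
    for t in frame_times:
        i = bisect.bisect_left(sensor_times, t)
        candidates = [sensor_times[j] for j in (i - 1, i)
                      if 0 <= j < len(sensor_times)]
        best = min(candidates, key=lambda s: abs(s - t), default=None)
        if best is not None and abs(best - t) <= tolerance_s:
            pairs.append((t, best))
        else:
            pairs.append((t, None))   # no sample close enough to trust
    return pairs


# 30 fps video frames against a 10 Hz acoustic stream with a small clock offset
frames = [k / 30 for k in range(6)]
acoustic = [k / 10 + 0.01 for k in range(3)]
print(align(frames, acoustic, tolerance_s=0.04))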

Note 4: The work, prototyping, testing, simulation, and validation will likely become classified in Phase II (see Note 1 in the Description section for details). However, the proposal for Phase II will be UNCLASSIFIED.

Note 5: If the selected Phase II contractor does not have the required certification for classified work, ONR or the related DON Program Office will work with the contractor to facilitate certification of related personnel and facility.

PHASE III DUAL USE APPLICATIONS: Further develop the AI-driven multi-attribute metadata analytic tools to TRL-8 for integration with representative multi-INT naval data sources, demonstrating potential naval all-domain tactical preemptive measures expected in the Indo-Pacific region within the Minerva INP, the Maritime Tactical Command and Control, or the MAGTF Command, Control, and Communications programs. Once validated, demonstrate dual-use applications of this technology in civilian law enforcement and commercial security services.

REFERENCES:

  1. Algur S.P. and Bhat P.; "Web Video Mining: Metadata Predictive Analysis using Classification Techniques"; International Journal of Information Technology and Computer Science, pp. 68-76, Feb. 2016.
  2. Balasubramanian V., Doraisamy S. G., and Kanakarajan N. K.; "A Multimodal Approach for Extracting Content Descriptive Metadata from Lecture Videos"; Journal of Intelligent Information Systems, vol. 46, pp. 121-145, 2015.
  3. Gibbon D.C., Liu Z., Basso A. and Shahraray B.; "Automated Content Metadata Extraction Services Based on MPEG Standards"; The Computer Journal; Dec. 2012.
  4. Rangaswamy S., Ghosh S., Jha S., and S. Ramalingam; "Metadata Extraction and Classification of YouTube Videos Using Sentiment Analysis", Orlando: IEEE Intl. Carnahan Conf. on Security Technology, Oct. 2016.

KEYWORDS: Artificial Intelligence; Metadata; Machine Learning; Kill Chain; Intent; Geospatial; Temporal

** TOPIC NOTICE **

The Navy Topic above is an "unofficial" copy from the overall DoD 22.2 SBIR BAA. Please see the official DoD Topic website at www.defensesbirsttr.mil/SBIR-STTR/Opportunities/#announcements for any updates.

The DoD issued its 22.2 SBIR BAA pre-release on April 20, 2022, which opens to receive proposals on May 18, 2022, and closes June 15, 2022 (12:00 p.m. ET).

Direct Contact with Topic Authors: During the pre-release period (April 20, 2022 through May 17, 2022), proposing firms have an opportunity to directly contact the Technical Point of Contact (TPOC) to ask technical questions about the specific BAA topic. Once DoD begins accepting proposals on May 18, 2022, no further direct contact between proposers and topic authors is allowed unless the Topic Author is responding to a question submitted during the pre-release period.

SITIS Q&A System: After the pre-release period, proposers may submit written questions through SITIS (SBIR/STTR Interactive Topic Information System) at www.dodsbirsttr.mil/topics-app/; log in and follow the instructions. In SITIS, the questioner and respondent remain anonymous, but all questions and answers are posted for general viewing.

Topics Search Engine: Visit the DoD Topic Search Tool at www.dodsbirsttr.mil/topics-app/ to find topics by keyword across all DoD Components participating in this BAA.

Help: If you have general questions about the DoD SBIR program, please contact the DoD SBIR Help Desk via email at [email protected]

** TOPIC Q&A **
Questions answered 05/31/22
Q1. Do we need to fuse multiple sources of data to produce unique information that is otherwise not attainable from analyzing each individual source? For example, extracting latent features from a satellite video feed and fusing them with embeddings from social media text geofenced for that specific location to create an anomaly detector that alerts for a specific event?
A1. Yes, and it is clearly stated in the Objective section: "Not just a snapshot of events but the active process of mining, fusing, and expressive tagging of multimodal, multidomain sensory contents (acoustics, thermal, full motion video, wide area motion imagery, etc.), including social media contents as evidence into a collaborative multi-level knowledge database. The multi-level metadata control measures and access points ensure content quality, validity, reliability, and accuracy, including: origination source (temporal, geospatial, operator, modalities); sensor types; signal characteristics (including format, encoding, file size, duration); scene narration; content validity and attributes (raw or time-stamped modification by end user…); security and privacy restriction policy; and chain of custody."
Q2. Can we extract concrete features (i.e., look for specific objects) from the satellite video feed and keywords from the social media text independently, and provide a way to query the synchronized outputs for further analysis by an analyst?
A2. Analysts can perform such queries today. The goal of this SBIR topic is to develop an end-to-end automated reasoning technology that extracts entities of interest from various multimodal sources, reasons about documented/captured activities (i.e., supported by collected metadata and associated databases), and connects the dots between the entities, their activities, and their associates by processing various pieces of evidence into a complete picture.
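
For illustration only, a minimal "connect the dots" sketch in the spirit of this answer: entities extracted from different sources are linked to activities in a graph, and shared-activity queries recover associates. networkx is a real graph library; the entities, activities, and source names are invented for illustration.

# Minimal sketch: link entities to activities in a graph and recover
# associates through shared activities. All names are invented.
import networkx as nx

g = nx.Graph()
# Evidence from dissimilar sources, reduced to entity-activity edges
g.add_edge("vessel_A", "rendezvous_1", source="WAMI")
g.add_edge("vessel_B", "rendezvous_1", source="acoustic")
g.add_edge("vessel_B", "port_call_2", source="AIS")
g.add_edge("person_X", "port_call_2", source="social_media")


def associates(graph: nx.Graph, entity: str) -> set:
    """Entities linked to `entity` through any shared activity."""
    out = set()
    for activity in graph.neighbors(entity):
        out.update(n for n in graph.neighbors(activity) if n != entity)
    return out


print(associates(g, "vessel_A"))   # vessel_B
print(associates(g, "vessel_B"))   # vessel_A and person_X
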
Questions answered 05/31/22
Q1. The topic description defines performance thresholds relating to Analytic Completeness, Uniqueness, Validity, Consistency, and Accuracy. Should metrics for these requirements be developed by the proposer in the Phase I and/or Phase II work programs, or does the customer have specific metrics in mind? At what point will the proposed system be assessed against these metrics?
A1. Review N222-118 Topic Notes 2 and 3 and Phase I/II work statements. The proposed system performance will be evaluated during Phase I and II against the following metrics:
  • Analytic Completeness: not just identifying and stopping a hostile act but identifying how it occurred by synthesizing the entire chain of events (i.e., what would have happened had it not been stopped): greater than 90% in Phase I; greater than 95% in Phase II
  • Uniqueness: Signature attributes definable and retrievable: greater than 90% in Phase I; greater than 95% in Phase II
  • Validity: Supporting evidence: greater than 95% in Phase I; greater than 98% in Phase II
  • Consistency: Updated metadata attributes from various sources that reinforce linkages: greater than 90% in Phase I; greater than 98% in Phase II
  • Accuracy: Overcoming noisy data: greater than 90% in Phase I; greater than 98% in Phase II