Distributed Relational Learning for Cloud Data Fusion
Navy SBIR FY2013.2


Sol No.: Navy SBIR FY2013.2
Topic No.: N132-135
Topic Title: Distributed Relational Learning for Cloud Data Fusion
Proposal No.: N132-135-1262
Firm: Commonwealth Computer Research, Inc.
1422 Sachem Pl., Unit #1
Charlottesville, Virginia 22901
Contact: Nicholas Hamblet
Phone: (434) 284-9415
Web Site: www.ccri.com
Abstract: The US military and intelligence community have been successfully fusing the data they gather into actionable intelligence. However, the volume of data is increasing to the point that it can no longer be processed on a single server, calling for distributed data fusion algorithms that operate across a cloud. As data grows to the point of requiring distributed storage, machine learning algorithms capable of producing situational awareness must rise to the challenge of working with distributed storage as well. The problem is to design distributed fusion algorithms that not only perform as well as single-server solutions, but also leverage larger volumes of data to produce higher quality analytics. This proposal outlines an architecture that works with distributed data sources without needing data to be directly shared between compute nodes. Data fusion without shared memory is a difficult task; however, we develop techniques to minimize the amount of information sent between nodes while maintaining high quality fusion. We propose to use models for which both model learning and inference can leverage distributed storage and computation. Inference should be fast, and detached model instances should be readily deployable to local servers for real-time use, while maintaining data and model integrity with the cloud.
Benefits: In this effort, CCRi will develop prototype algorithms to facilitate large-scale distributed fusion, including level 1 (entity resolution) and level 2 (inference) fusion. These algorithms will make it feasible to achieve fusion from a large multi-source knowledge store using scalable but accurate machine learning models. Entity resolution and inference will be possible across nodes without requiring that all data be shared over limited-bandwidth networks, making it possible to fuse data and infer new information without a full local copy of the data set. This will augment evolving cloud computing systems in the Department of Defense and Intelligence Community with a platform for automated reasoning on large sets of enriched data, a capability that is currently lacking. Moreover, these goals will be accomplished by developing a statistical relational learning algorithm that leverages scalable learning algorithms composed of Restricted Boltzmann Machines, laying the groundwork for distributed deep learning and graph learning services that can be applied to a wide range of problems.
