Automated Audio Clustering
Navy SBIR 2011.2 - Topic N112-163
ONR - Mrs. Tracy Frost - [email protected]
Opens: May 26, 2011 - Closes: June 29, 2011

N112-163 TITLE: Automated Audio Clustering

TECHNOLOGY AREAS: Information Systems, Sensors

ACQUISITION PROGRAM: PM Intel

OBJECTIVE: Provide a system that can autonomously cluster a large database of audio files by speaker.

DESCRIPTION: Advances in open availability and collection technology for audio data are contributing to the overall large data problem for the DoD. As the gap between collection capacity and analytic throughput grows, so does the need for automated analysis. An important enabler is the ability to use sound characteristics to cluster audio files by unique speaker. There is both military and commercial value in being able to rapidly search for and retrieve all additional comments made by a newly discovered speaker of interest from a large library of previously untagged audio files. Related technology exists, such as voice print matching, which is used as a biometric to establish identity using text dependent matchers. Reliable speaker ID algorithms that are text independent are limited in that they generally rely on the availability of training data collected under controlled conditions. The goal of the topic is to support research that can cluster a large data store of audio files by unique speaker using sound characteristics without the availability of training data. The topic will require a performer to demonstrate that algorithms such as vector quantization, mixture models, self organizing maps or artificial intelligence can be successfully employed to cluster very noisy frequency based data. It is possible that sound will first have to be automatically translated to phonemes or words before clustering algorithms can be applied. A successful performer will develop a system that can cluster files with useful true and false positive rates. Both text dependent and text independent techniques can be considered, but if text dependent algorithms are used the system must utilize one standard set of phonemes/words that can be identified automatically with high confidence. The objective system should assign a unique ID to each cluster.
When new audio data is discovered by the system, those new audio files should automatically be assigned to an existing cluster or designated as a new cluster. Periodically the system should re-run clustering across the entire data set.
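
The assign-or-create behavior described above can be illustrated with a minimal sketch. It is not a prescribed design. It assumes each audio file has already been reduced to a fixed-length feature vector, and it uses a nearest-centroid rule with a distance threshold. The class name `IncrementalSpeakerClusters`, the `threshold` parameter, and the toy two-dimensional vectors are all hypothetical and do not come from the topic.

```python
import itertools
import math


class IncrementalSpeakerClusters:
    """Toy online clustering: nearest centroid within a distance threshold."""

    def __init__(self, threshold):
        self.threshold = threshold      # assumed tuning parameter, not from the topic
        self.centroids = {}             # cluster ID -> running mean feature vector
        self.counts = {}                # cluster ID -> number of files assigned
        self._ids = itertools.count(1)  # source of unique cluster IDs

    def assign(self, features):
        """Return the cluster ID for one file's feature vector."""
        best_id, best_dist = None, math.inf
        for cid, centroid in self.centroids.items():
            dist = math.dist(features, centroid)
            if dist < best_dist:
                best_id, best_dist = cid, dist
        if best_id is None or best_dist > self.threshold:
            # No existing cluster is close enough: open a new one.
            best_id = next(self._ids)
            self.centroids[best_id] = list(features)
            self.counts[best_id] = 1
            return best_id
        # Fold the new file into the running mean of the matched cluster.
        n = self.counts[best_id] + 1
        self.centroids[best_id] = [
            c + (x - c) / n for c, x in zip(self.centroids[best_id], features)
        ]
        self.counts[best_id] = n
        return best_id


# Hypothetical per-file feature vectors (two well-separated "speakers").
clusters = IncrementalSpeakerClusters(threshold=1.0)
ids = [clusters.assign(v) for v in [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]]
print(ids)
```

A streaming rule of this kind depends on the order in which files arrive, which is one reason a periodic batch re-clustering of the whole data set, as called for above, remains useful.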

Challenges for this topic include: 1) optimization of extractable voice features for downstream clustering; 2) implementation of the optimized text independent audio feature extraction algorithms in both batch and streaming data architectures; 3) development of a word list that can be easily and reliably recognized and that is also useful for extracting voiceprints; 4) demonstration of the viability of vector quantization, self organizing maps, mixture models or a related technique to perform accurate audio clustering using text independent features, text dependent features, or both, without training data; 5) extraction of features from a cluster of audio files that can be used as training data for subsequent matches.
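
As a rough illustration of challenge 4, vector quantization can be sketched as a codebook learned with Lloyd's algorithm (k-means), with each feature vector mapped to its nearest codeword. The sketch is only one possible reading of the technique. It assumes the number of codewords `k` is known in advance, which the topic does not guarantee, and it seeds the codebook with the first `k` vectors so the example is deterministic. The function names and the toy data are hypothetical.

```python
import math


def quantize(vector, codebook):
    """Return the index of the nearest codeword (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(vector, codebook[i]))


def train_codebook(vectors, k, iterations=20):
    """Learn k codewords with Lloyd's algorithm (k-means)."""
    # Deterministic seeding with the first k vectors is an assumption made
    # for this sketch; practical systems usually use randomized seeding.
    codebook = [list(v) for v in vectors[:k]]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in vectors:
            groups[quantize(v, codebook)].append(v)
        for i, group in enumerate(groups):
            if group:  # keep the old codeword if no vector mapped to it
                codebook[i] = [sum(col) / len(group) for col in zip(*group)]
    return codebook


# Hypothetical 2-D feature vectors drawn from two loose groups.
frames = [(0.0, 0.1), (4.0, 4.2), (0.2, 0.0), (4.1, 3.9), (0.1, 0.2), (3.9, 4.0)]
codebook = train_codebook(frames, k=2)
labels = [quantize(f, codebook) for f in frames]
print(labels)
```

No labeled training data is used here: the codebook is estimated from the unlabeled vectors themselves, which is the property the topic asks performers to exploit.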

Advances in voice print matching and speaker ID technology can be leveraged along with recent work in clustering multi-dimensional data to provide a capability responsive to the topic.
The Navy will only fund proposals that are innovative, address R&D, and involve technical risk.

PHASE I: Complete a feasibility study, research plan and component algorithm testing in order to mature an approach for the development of an audio file clustering system that can be run in batch mode and kept current in streaming mode. Identify the critical technology issues that must be overcome to achieve success. Technical work should focus on the reduction of key risk areas. For a constrained set of audio files, demonstrate that the Phase I risk reduction work has shown that a full implementation of the approach is technically tractable. Prepare a revised research plan for Phase II that addresses critical issues.

PHASE II: Produce a prototype audio file clustering service that can produce accurate clusters with defining metadata. The prototype should enable a demonstration of the capability to be conducted using relevant data sources, some of which may be classified. The prototype should be capable of operating in both batch and real-time streaming modes. The prototype should be relevant to both DoD and commercial use cases.

PHASE III: Produce a system capable of deployment in an operational setting of interest against relevant data loading. Test the system in a relevant setting in a stand-alone mode and as a component of a larger system (programs of record). The work should focus on tailoring the developed capability in order to achieve a transition to a program of record in one or more of the military Services. The system should provide metrics for performance assessment.
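
One possible form for such performance metrics is a pairwise true and false positive rate: every pair of files is scored on whether the system placed both files in the same cluster and whether they truly share a speaker. The sketch below assumes that a labeled evaluation subset with known speaker identities is available for scoring only. That subset, the function name `pairwise_rates`, and the sample labels are hypothetical, and the topic does not mandate this particular metric.

```python
from itertools import combinations


def pairwise_rates(predicted, truth):
    """Return (true positive rate, false positive rate) over all file pairs."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j]  # system put both in one cluster
        same_true = truth[i] == truth[j]          # files really share a speaker
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


# Hypothetical cluster IDs from the system and reference speaker labels.
predicted = [1, 1, 2, 2, 2]
truth = ["a", "a", "a", "b", "b"]
tpr, fpr = pairwise_rates(predicted, truth)
print(f"TPR={tpr:.2f} FPR={fpr:.2f}")
```

Because the rates are computed over pairs, they do not depend on how cluster IDs are numbered, so results remain comparable after a periodic re-clustering reassigns IDs.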

REFERENCES:
1. M.A. Siegler, U. Jain, B. Raj, and R.M. Stern. "Automatic Segmentation, Classification and Clustering of Broadcast News Audio". Proceedings of DARPA Speech Recognition Workshop, 1997.

2. S. Arya and D.M. Mount. "Algorithms for Fast Vector Quantization". Proc. Data Compression Conference, J.A. Storer and M. Cohn, eds., Snowbird, Utah, 1993, IEEE Computer Society Press, pp. 381-390.

3. D.A. Reynolds, T.F. Quatieri, and R.B. Dunn. "Speaker Verification Using Adapted Gaussian Mixture Models". Digital Signal Processing, 10, pp. 19-41 (2000).

KEYWORDS: Clustering, Audio, Speaker ID, Voice Prints, Vector Quantization, Self Organizing Maps, Mixture Models

** TOPIC AUTHOR (TPOC) **
DoD Notice:  
Between April 26 and May 25, 2011, you may talk directly with the Topic Authors to ask technical questions about the topics. Their contact information is listed above. For reasons of competitive fairness, direct communication between proposers and topic authors is not allowed starting May 26, 2011, when DoD begins accepting proposals for this solicitation.
However, proposers may still submit written questions about solicitation topics through the DoD's SBIR/STTR Interactive Topic Information System (SITIS), in which the questioner and respondent remain anonymous and all questions and answers are posted electronically for general viewing until the solicitation closes. All proposers are advised to monitor SITIS (11.2 Q&A) during the solicitation period for questions and answers, and other significant information, relevant to the SBIR 11.2 topic under which they are proposing.

If you have general questions about the DoD SBIR program, please contact the DoD SBIR Help Desk at (866) 724-7457 or email weblink.