Spontaneous Text-independent Audio Clustering Software

Navy SBIR FY2011.2

Sol No.:	Navy SBIR FY2011.2
Topic No.:	N112-163
Topic Title:	Spontaneous Text-independent Audio Clustering Software
Proposal No.:	N112-163-0084
Firm:	Physical Optics Corporation Applied Technologies Division 1845 West 205th Street Torrance, California 90501
Contact:	Kevin Degrood
Phone:	(310) 320-3088
Web Site:	www.poc.com
Abstract:	To address the Navy's need for a system that can autonomously cluster a large database of audio files by speaker, Physical Optics Corporation (POC) proposes to develop a new Spontaneous Text-independent Audio Clustering Software (STACS) system. The proposed technology is based on a unique combination of sophisticated signal processing algorithms that analyze and identify voice data within audio files. Its innovative dual-transform architecture will enable STACS to perform robust speaker identification as the basis for clustering files. Files with similar audio characteristics are considered to contain speech from the same speaker and are clustered together. Conversely, files with dissimilar characteristics are deemed to contain speech from different speakers and are not clustered together. As a result, the technology offers excellent identification accuracy and strong noise immunity without the need for training data, directly addressing the PM Intel requirements. In Phase I, POC will demonstrate the feasibility of STACS by processing data files associated with multiple speakers, quantifying the computational latency, and verifying the clustering accuracy. In Phase II, POC plans to enhance STACS to accommodate all standard codec and container file formats used in modern cellular telephony.
Benefits:	STACS will offer immediate benefits to the Navy by providing superior speaker identification that results in highly accurate audio file clustering, as required by the Program Manager - Intelligence. Both military and commercial intelligence, surveillance, and reconnaissance programs would benefit from the capability provided by STACS to automatically segregate large data depositories on the basis of unique speaker characteristics. Once the depository is established, newly discovered speakers of interest can be tracked backward in time through the database to confirm suspected associations with known speakers of interest. A voiceprint database can also be generated directly from the STACS depository, and the entire database can be reanalyzed with the STACS technology to strengthen the clustering. The STACS technology can be easily adapted for use by television and radio stations to identify and cluster legacy audio recordings for automated archiving and retrieval purposes. Similarly, audio library records could be quickly and accurately indexed, thus promoting improved data access and storage.
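
Illustration:	The abstract describes grouping files whose audio characteristics are similar and keeping dissimilar files apart, with no training data. The sketch below shows one simple way such a rule could work. It is not POC's method: the dual-transform feature extraction is not described in this listing, so each file is assumed to be already reduced to a numeric feature vector. The function names, the cosine similarity measure, and the 0.9 threshold are all hypothetical choices made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_files(features, threshold=0.9):
    """Greedy, training-free clustering of files by feature similarity.

    features:  dict mapping a file name to its feature vector (assumed to be
               precomputed by some speaker-feature extractor, not shown here).
    threshold: hypothetical minimum similarity for joining an existing cluster.
    Returns a list of clusters, each a list of file names.
    """
    clusters = []  # each entry: {"centroid": [...], "files": [...]}
    for name, vec in features.items():
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine_similarity(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            # Dissimilar to every known cluster: treat as a new speaker.
            clusters.append({"centroid": list(vec), "files": [name]})
        else:
            # Similar enough: add the file and update the running mean centroid.
            n = len(best["files"])
            best["centroid"] = [
                (m * n + x) / (n + 1) for m, x in zip(best["centroid"], vec)
            ]
            best["files"].append(name)
    return [cluster["files"] for cluster in clusters]
```

Because no labeled speakers are needed, a new file either joins the closest existing cluster or starts a new one. A real system would likely need a more robust similarity measure and a second pass over the whole database, in the spirit of the reanalysis mentioned under Benefits.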
