Automatic Unsupervised Recognition and Allocation of Speakers (AURAS)

Navy SBIR FY2011.2

Sol No.:	Navy SBIR FY2011.2
Topic No.:	N112-163
Topic Title:	Automatic Unsupervised Recognition and Allocation of Speakers (AURAS)
Proposal No.:	N112-163-0446
Firm:	DECISIVE ANALYTICS Corporation 1235 South Clark Street Suite 400 Arlington, Virginia 22202
Contact:	Jon Clausen
Phone:	(703) 414-5020
Web Site:	http://www.dac.us
Abstract:	The growing availability of audio data, and of methods for collecting it, has only added to the problem of data overload for the DoD. Automated analysis is the only way to process the necessary volume of audio data in a timely manner. Both the military and industry need to harness these automated methods to rapidly and reliably cluster segments of audio by unique speaker. The proposed solution, Automatic Unsupervised Recognition and Allocation of Speakers (AURAS), advances the state of the art in this area. Able to operate on either a static collection of audio files or an audio stream, AURAS automatically groups segments of audio by unique speaker using each speaker's individual sound characteristics. Unlike less advanced methods, AURAS requires no training data and automatically detects the number of speakers present in the data. AURAS is completely language-independent, but when the language being spoken is known, the system can leverage the words used by speakers to enhance its accuracy. Additionally, AURAS "learns as it analyzes" and can therefore operate continuously, with no downtime to "re-learn" when it encounters a group of completely new speakers. In fact, performance improves as it encounters more new speakers.
Benefits:	The proposed solution, Automatic Unsupervised Recognition and Allocation of Speakers (AURAS), provides analysts with access to groups of audio segments, each labeled with a unique speaker ID. The grouping is done in a completely unsupervised manner, freeing the analyst from the tedium of "hand-labeling" every segment. Indeed, identifying the speaker in a single audio segment (by ear, or from other information pertaining to the recording, such as SIGINT) effectively identifies the speaker for the entire group. This can almost immediately rule out untold hours of audio from consideration, enabling analysts to focus their time on persons of interest.
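
The abstract does not describe how AURAS works internally, so the following is only a minimal sketch of the general idea it describes: grouping segments by speaker without training data, without a preset number of speakers, and one segment at a time. It swaps in a simple threshold-based online ("leader") clustering over per-segment feature vectors. The class `OnlineSpeakerClusterer`, the function `propagate_label`, the `threshold` value, and the two-dimensional feature vectors are all hypothetical and chosen for illustration; a real system would derive its features from the audio itself.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


class OnlineSpeakerClusterer:
    """Hypothetical threshold-based online clustering (not the AURAS method).

    Each incoming segment joins the nearest existing cluster if it is
    within `threshold`; otherwise it starts a new cluster. The number of
    clusters is therefore discovered from the data rather than preset.
    """

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed value, would need tuning
        self.centroids = []         # running mean vector per cluster
        self.counts = []            # number of segments per cluster
        self.assignments = []       # cluster ID per segment, in arrival order

    def add_segment(self, features):
        best_id, best_dist = None, float("inf")
        for cluster_id, centroid in enumerate(self.centroids):
            dist = cosine_distance(features, centroid)
            if dist < best_dist:
                best_id, best_dist = cluster_id, dist

        if best_id is None or best_dist > self.threshold:
            # No sufficiently similar speaker seen so far: open a new cluster.
            self.centroids.append(list(features))
            self.counts.append(1)
            best_id = len(self.centroids) - 1
        else:
            # Update the running mean of the matched cluster.
            n = self.counts[best_id] + 1
            old = self.centroids[best_id]
            self.centroids[best_id] = [c + (x - c) / n for c, x in zip(old, features)]
            self.counts[best_id] = n

        self.assignments.append(best_id)
        return best_id


def propagate_label(assignments, segment_index, name):
    """Naming one segment's speaker names every segment in its cluster."""
    cluster_id = assignments[segment_index]
    return {i: name for i, a in enumerate(assignments) if a == cluster_id}
```

As a usage example under the same assumptions, feeding the clusterer the toy vectors `[1.0, 0.1]`, `[0.9, 0.2]`, `[0.1, 1.0]`, `[1.0, 0.0]`, `[0.2, 0.9]` in that order yields two clusters, and calling `propagate_label(clusterer.assignments, 0, "Speaker A")` labels segments 0, 1, and 3 together. This mirrors the benefit described above: identifying the speaker in a single segment identifies the speaker for the whole group.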
