Speech & Audio

One of our main research areas in speech technology is the indexing and retrieval of spoken documents, where we aim to make large video collections more accessible by applying a wide range of speech and audio analysis components. A major difficulty is the heterogeneity of the material, which can range from a professional media archive to a YouTube-like internet video portal with lo-fi user-generated content.

We therefore study techniques that address this challenge by combining results from structural audio analysis, robust speech recognition, semantic result analysis, speaker recognition and information retrieval into a robust and adaptive system. We continuously improve this process by adding further information extraction components, for example by automatically exploiting the web for model adaptation.

While most of our activities concentrate on speech, we are also interested in extracting information from non-speech audio content. This ranges from detecting the jingle of your favorite news show to estimating noise levels in public transport vehicles.