Speech Recognition
Recent papers in Speech Recognition
This paper proposes a text-independent automatic speaker verification system using IMFCC (Inverse/Reverse Mel Frequency Coefficients) and IT-EM (Information Theoretic Expectation Maximization). To perform speaker verification, feature...
A new form of augmentative and alternative communication (AAC) device for people with severe speech impairment, the voice-input voice-output communication aid (VIVOCA), is described. The VIVOCA recognizes the disordered speech of the user...
Most elderly people monitoring systems include the detection of abnormal situations, in particular distress situations, as one of their main goals. In order to reach this objective, many solutions end up combining several modalities such...
This paper describes a study on tone statistics of people's names in Mandarin Chinese. The problem arose when we tried to apply an English version of a speech recognizer to a Chinese voice tag dialing task. The questions were:...
For many audiovisual applications, the integration and synchronization of audio and video signals is essential. The objective of this paper is to develop a system that displays the active objects in the captured video signal, integrated...
This paper presents an application, "LentInfo", which is a system used to provide information about programmes for the Festival Lent in Slovenia. The Festival Lent consists of different open-air theatre and music performances and raws...
The estimation of initial language models for new applications of spoken dialogue systems without large task-specific training corpora is becoming an increasingly important issue. This paper investigates two different approaches in which...
Vocal communication is most effective when the listener is able to observe the mouth of the speaker. This is especially true for the hearing impaired, and dramatically true for the deaf, who rely on lip-reading for comprehending speech.
We present an improved system combination technique, iROVER. Our approach obtains significant improvements over ROVER, and is consistently better across varying numbers of component systems. A classifier is trained on features from the...
The Sphinx-4 speech recognition system is the latest addition to Carnegie Mellon University's repository of Sphinx speech recognition systems. It has been jointly designed by Carnegie Mellon University, Sun Microsystems Laboratories...
This paper proposes several speech technology improvements for increasing robustness, reliability and ergonomics in speech interfaces for controlling aerial vehicles. These improvements consist of including a statistical language model...
We present two real-time hidden Markov model-based systems for recognizing sentence-level continuous American Sign Language (ASL) using a single camera to track the user's unadorned hands. The first system observes the user from a desk...
This paper describes a method to detect smiles and laughter sounds from the video of natural dialogue. A smile is the most common facial expression observed in a dialogue. Detecting a user's smiles and laughter sounds can be useful for...
With the advancement of technology, we can implement a variety of ideas to serve mankind in numerous ways. Inspired by this, we have developed a smart hand glove system which will be able to help people with hearing and speech...
Speech recognition and speaker identification are important for authentication and verification for security purposes, but they are difficult to achieve. Speaker identification methods can be divided into text-independent and text-dependent....
In this paper, we present a mismatch-aware stochastic matching (MASM) algorithm to alleviate the performance degradation under mismatched training and testing conditions. MASM first computes a reliability measure of applying a set of...
A methodology and environment for building adaptive speech recognition systems is presented. The development environment is designed for isolated word recognition systems. A small speech recognition system is developed for a home...
Prosody has been widely used in many speech-related applications including speaker and word recognition, emotion and accent identification, topic and sentence segmentation, and text-to-speech applications. An important application we...
Growing needs for French closed-captioning of live TV broadcasts in Canada cannot be met only with stenography-based technology because of a chronic shortage of skilled stenographers. Using speech recognition for live closed-captioning,...
This paper deals with the introduction of an efficient speech front-end for automatic speech recognition. This front-end not only performs well, in comparison to the traditional and widely used MFCC, but is also efficiently implemented in...
This paper describes a database of dysarthric speech produced by 19 speakers with cerebral palsy. Speech materials consist of 765 isolated words per speaker: 300 distinct uncommon words and 3 repetitions of digits, computer commands,...
This paper focuses on microphone arrays to realize distant-talking speech recognition in real environments. In distant-talking situations, users can speak at arbitrary positions while moving. Therefore, it is very important for high...
We present an approach to automatically recognize sign language and translate it into a spoken language. A system to address these tasks is created based on state-of-the-art techniques from statistical machine translation, speech...
It is well known that the introduction of acoustic background distortion and the variability resulting from environmentally induced stress cause speech recognition algorithms to fail. In this paper, several causes for recognition...
We describe a system for model-based speech separation which achieves superhuman recognition performance when two talkers speak at similar levels. The system can separate the speech of two speakers from a single channel recording with...
STRAIGHT, a speech analysis, modification and synthesis system, is an extension of the classical channel VOCODER that exploits the advantages of progress in information processing technologies and a new conceptualization of the role of...
Extractive speech summarization, which purports to select an indicative set of sentences from a spoken document so as to succinctly represent the most important aspects of the document, has garnered much research over the years. In this...
The main steps of document processing have been reviewed, especially those implemented on Arabic writing. The techniques used in this research, such as Vector Quantization (VQ), Hidden Markov Models (HMM), and Induction of Decision Trees...
This paper presents an emerging application of multimodal interface research to distributed applications. We have developed the QuickSet prototype, a pen/voice system running on a hand-held PC, communicating via wireless LAN through an...
This paper considers the problem of constructing an efficient inverted index for the spoken term detection (STD) task. More specifically, we construct a deterministic weighted finite-state transducer storing soft-hits in the form of...
In this paper, we present a set of optimizations for a spoken language interface for mobile devices that can improve the recognition accuracy and user interaction experience. A comparison between a speech and a graphical interface, when...
Acoustic signals recorded simultaneously in a reverberant environment can be described as sums of differently convolved sources. The task of source separation is to identify the multiple channels and possibly to invert those in order to...
Since 1990 the DRA Speech Research Unit has conducted research into applications of speech recognition technology to speech and language development for young children. This has been done in collaboration with Hereford and Worcester...
This paper presents the design of an FPGA-based hardware co-processor, based on the SPHINX 3 speech recognition engine from CMU, capable of performing Acoustic Modeling (AM) for medium sized vocabularies in real-time. By creating an...
This paper describes the development and validation of an Embedded Isolated Word Recognition System (IWR) for the Argentinian Spanish language, implemented on the STM32F4-Discovery platform. Its front-end extracts Mel Frequency Cepstral...
The hearing abilities of a group of 30 elderly (67–93 yr of age) subjects were compared with those of a group of 30 young (19–27 yr of age) normal-hearing volunteers with the aim of characterizing the changes in the peripheral and central...
There exists a large conceptual gap between symbolic models and emergent models for the mind. Many emergent models work on low-level sensory data, while many symbolic models deal with high-level abstract (i.e., action) symbols. There has...
Current predictors of speech intelligibility are inadequate for understanding and predicting speech confusions caused by acoustic interference. We develop a model of auditory speech processing that includes a phenomenological...
Spoken document retrieval (SDR) has been extensively studied in recent years because of its potential use in navigating large multi-media collections in the near future. Considering the characteristics and monosyllabic structure of the...
This article discusses how advances in human language technology can help overcome some of the barriers that prevent community participation in cyberspace. Human language technology refers to the set of technologies, such as speech...
Sound is essential to enhance visual experience and human-robot interaction, but most research and development efforts are directed mainly towards sound generation, speech synthesis and speech recognition. The reason why only a little...
In the case of a trigram language model, the probability of the next word conditioned on the previous two words is estimated from a large corpus of text. The resulting static trigram language model (STLM) has fixed probabilities that are...
An input device should be natural and convenient for a user to transmit information to a computer, and should be designed from an understanding of the task to be performed and the interrelationship between the task and the device from the...