Papers by Claude Chapdelaine
Providing blind and visually impaired people with descriptions of key visual elements can greatly improve the accessibility of video, film and television. This project presents a Website platform for rendering videodescription (VD) using an adapted player. Our goal is to test the usability of an accessible player that provides end users with various levels of VD, on demand. This paper summarizes the user evaluations covering 1) the usability of the player and its controls, and 2) the quality and quantity of the VD selected. The complete results of these evaluations, including the accessibility of the Website, will be presented in the poster. Final results show that 90% of the participants agreed on the relevance of a multi-level VD player. All of them rated the player easy to use. Some improvements were also identified. We found that there is a great need to provide blind and visually impaired people with more flexible tools to access rich media content.
Producing off-line captions for deaf and hearing-impaired people is a labor-intensive task that can require up to 18 hours of production per hour of film. Captions are placed manually close to the region of interest, while avoiding masking human faces, text or any moving objects that might be relevant to the story flow. Our goal is to use image processing techniques to shorten the off-line caption production process by automatically placing the captions on the proper consecutive frames. We implemented a computer-assisted captioning software tool which integrates detection of faces, text and visual motion regions. Near-frontal faces are detected using a cascade of weak classifiers and tracked with a particle filter. Frames are then scanned to perform text spotting and build a region map suitable for text recognition. Finally, motion mapping is based on the Lucas-Kanade optical flow algorithm and provides MPEG-7 motion descriptors. The combined detected items are then fed to a rule-based algorithm that determines the best caption placement for the related sequences of frames. This paper focuses on the rules defined to assist human captioners and on the results of a user evaluation of this approach.
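As a rough illustration of the placement rules described above, the sketch below scores the top and bottom halves of a frame by detected faces and motion, then picks the quieter half for the caption. It is a minimal sketch assuming OpenCV's stock Haar cascade and dense Farneback flow as a stand-in for the paper's sparse Lucas-Kanade tracker; the production rules are richer than this.

```python
# Minimal caption-placement heuristic: avoid faces and high-motion areas.
# Assumes OpenCV; Farneback flow stands in for the paper's Lucas-Kanade tracker.
import cv2
import numpy as np

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def caption_zone(prev_gray, gray):
    """Return 'top' or 'bottom': the half of the frame with the fewest
    faces and the least optical-flow motion."""
    h = gray.shape[0]
    faces = face_cascade.detectMultiScale(gray, 1.1, 5)
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    motion = np.linalg.norm(flow, axis=2)
    score = {"top": motion[: h // 2].sum(),
             "bottom": motion[h // 2 :].sum()}
    for (x, y, w, fh) in faces:
        # Hard penalty: never cover a detected face.
        score["top" if y + fh / 2 < h / 2 else "bottom"] += 1e6
    return min(score, key=score.get)
```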
Producing captions for the deaf and hearing impaired is a labor-intensive task. We implemented a software tool, named SmartCaption, for assisting the caption production process with automatic visual detection techniques aimed at reducing the production workload. This paper presents the results of an eye-tracking analysis of facial regions of interest, carried out to understand the nature of the task: not only to measure the quantity of data but also to assess its importance to the end user, the viewer. We also report on two interaction design approaches that were implemented and tested to cope with the inevitable outcomes of automatic detection, such as false recognitions and false alarms. These approaches were compared with a Keystroke-Level Model (KLM), showing that the adopted approach yielded a gain of 43% in efficiency.
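For readers unfamiliar with KLM, the toy comparison below shows how such an efficiency gain is predicted: each interaction is decomposed into standard operators with fixed times. The operator sequences here are illustrative assumptions, not the ones measured in the study.

```python
# Toy Keystroke-Level Model comparison, using the classic
# Card/Moran/Newell operator times (seconds per operator).
KLM = {"K": 0.20, "P": 1.10, "H": 0.40, "M": 1.35}

def task_time(ops):
    return sum(KLM[o] for o in ops)

manual = "MPKMPKMPK"   # illustrative: reposition a caption by hand
assisted = "MPK"       # illustrative: accept an automatic suggestion
gain = 1 - task_time(assisted) / task_time(manual)
print(f"predicted efficiency gain: {gain:.0%}")
```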
We present the first results of a study exploring the factors that could enable an adequate coupling between live captioning and different degrees of motion in the image. In this article, we quantify the rate of French captioning and demonstrate the effectiveness of a tactile secondary task for analyzing visual attention in this context.
We present an application of video indexing/summarization to produce videodescription (VD) for the blind. Audio and computer vision technologies can automatically detect and recognize many elements that are pertinent to VD, which can speed up the VD production process. We have developed and integrated many of them into a first computer-assisted VD production software tool. The paper presents the main outcomes of this R&D activity, started five years ago in our laboratory. Up to now, usability evaluations on various video and TV series types have shown a reduction of up to 50% in VD production time.
Universal Access in the Information Society, 2009
This paper presents the status of an R&D project targeting the development of computer-vision tools to assist humans in generating and rendering video description for people with vision loss. Three principal issues are discussed: (1) production practices, (2) needs of people with vision loss, and (3) current system design, core technologies and implementation. The paper provides the main conclusions of consultations with producers of video description regarding their practices and with end users regarding their needs, as well as an analysis of described productions that leads to a proposed video description typology. The current status of a prototype software system, the audio-vision manager, is also presented; it uses many computer-vision technologies (shot transition detection, key-frame identification, key-face recognition, key-text spotting, visual motion, gait/gesture characterization, key-place identification, key-object spotting and image categorization) to automatically extract visual content, associate textual descriptions and add them to the audio track with a synthetic voice. A proof of concept is also briefly described for a first adaptive video description player which allows end users to select various levels of video description.
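As one concrete example of the listed technologies, a shot-transition detector can be as simple as thresholding the histogram difference between consecutive frames. The sketch below implements that plain heuristic; the paper does not specify its detector at this level of detail.

```python
# Minimal shot-transition detector: flag a cut when consecutive color
# histograms correlate poorly. Assumes OpenCV.
import cv2

def shot_boundaries(path, threshold=0.5):
    cap = cv2.VideoCapture(path)
    prev_hist, cuts, i = None, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None,
                            [8, 8, 8], [0, 256] * 3)
        cv2.normalize(hist, hist)
        if prev_hist is not None and \
           cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
            cuts.append(i)  # low correlation => likely shot boundary
        prev_hist, i = hist, i + 1
    cap.release()
    return cuts
```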
Deaf and hearing-impaired people capture information in video through visual content and captions. Those activities require different visual attention strategies, and up to now little is known about how caption readers balance these two visual attention demands. Understanding these strategies could suggest more efficient ways of producing captions. Eye tracking and attention overload detection are used to study these strategies. Eye tracking is monitored using a pupil-center corneal-reflection apparatus. Gaze fixations are then analyzed for each region of interest, such as the caption area, high-motion areas and face locations. This data is also used to identify scanpaths. The collected data is used to establish specifications for a caption adaptation approach based on the location of visual action and the presence of character faces. This approach is implemented in a computer-assisted captioning software tool which uses a face detector and a motion detection algorithm based on the Lucas-Kanade optical flow algorithm. The different scanpaths obtained among the subjects provide us with alternatives for conflicting caption positioning. This implementation is now undergoing a user evaluation with hearing-impaired participants to validate the efficiency of our approach.
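The core of such a fixation analysis is mapping each gaze sample to a named region of interest and accumulating dwell time. The sketch below assumes fixations as (x, y, duration) tuples and ROIs as named rectangles; these field names are illustrative, not taken from the study's apparatus.

```python
# Accumulate gaze dwell time per region of interest.
from collections import defaultdict

def dwell_by_roi(fixations, rois):
    """fixations: [(x, y, dur_ms)]; rois: {name: (x0, y0, x1, y1)}.
    Returns total dwell time per ROI; unmatched gaze goes to 'elsewhere'."""
    dwell = defaultdict(float)
    for x, y, dur in fixations:
        for name, (x0, y0, x1, y1) in rois.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                dwell[name] += dur
                break
        else:
            dwell["elsewhere"] += dur
    return dict(dwell)

# Usage: caption area at the bottom of a 720x480 frame, one face box.
rois = {"caption": (0, 400, 720, 480), "face": (250, 80, 420, 260)}
print(dwell_by_roi([(300, 440, 180), (330, 150, 240)], rois))
```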
This paper reports on the development status of a Multimedia Asset Management (MAM) test-bed for content-based indexing and retrieval of audio-visual documents within the MPEG-7 standard. The project, called "MPEG-7 Audio-Visual Document Indexing System" (MADIS), specifically targets the indexing and retrieval of video shots and key frames from documentary film archives, based on audio-visual content like face recognition, motion activity, speech recognition and semantic clustering. The MPEG-7/XML encoding of the film database is done off-line. The description decomposition is based on a temporal decomposition into visual segments (shots), key frames and audio/speech sub-segments. The visible outcome will be a Website that allows video retrieval using a proprietary XQuery-based search engine, accessible to members at the Canadian National Film Board (NFB) Cineroute site. For example, end users will be able to request movie shots in the database that were produced in a specific year, that contain the face of a specific actor speaking a specific word, and in which there is no motion activity. Video streaming is performed over the high-bandwidth CA*net network deployed by CANARIE, a public Canadian Internet development organization.
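To make the retrieval example concrete, the sketch below runs the same kind of conjunctive query over a toy MPEG-7-style XML description using Python's ElementTree, standing in for MADIS's proprietary XQuery engine; the element and attribute names are illustrative, not the actual MADIS schema.

```python
# Conjunctive query over a toy MPEG-7-style description:
# year + actor's face + spoken word + no motion activity.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<VideoSegments>
  <Shot id="s12" year="1967" motionActivity="low">
    <Face name="actorX"/><Word text="freedom"/>
  </Shot>
</VideoSegments>""")

hits = [s.get("id") for s in doc.iter("Shot")
        if s.get("year") == "1967"
        and s.get("motionActivity") == "low"
        and any(f.get("name") == "actorX" for f in s.iter("Face"))
        and any(w.get("text") == "freedom" for w in s.iter("Word"))]
print(hits)  # ['s12']
```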
Growing needs for French closed-captioning of live TV broadcasts in Canada cannot be met with stenography-based technology alone because of a chronic shortage of skilled stenographers. Using speech recognition for live closed-captioning, however, requires several specific problems to be solved, such as the need for low-latency real-time recognition, remote operation, automated model updates, and collaborative work. In this paper we describe our solutions to these problems and the implementation of a live captioning system based on the CRIM speech recognizer. We report results from field deployment in several projects; the oldest in operation has been broadcasting real-time closed captions for more than two years.
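The low-latency requirement mentioned above essentially dictates a chunked pipeline that emits partial hypotheses on a fixed clock. The sketch below shows only that pipeline shape, with a hypothetical recognize_partial() callable; CRIM's actual recognizer API is not described here, so treat this as an assumption-laden illustration.

```python
# Bounded-latency captioning loop: consume short audio chunks, emit
# partial hypotheses on a fixed clock. recognize_partial() is hypothetical.
import queue
import time

def caption_worker(audio_q, recognize_partial, emit, flush_every=1.0):
    """Read ~100 ms audio chunks from audio_q; every flush_every seconds,
    emit the current partial hypothesis so on-air latency stays bounded
    while the decoder keeps refining."""
    buf, last_emit = [], time.monotonic()
    while True:
        chunk = audio_q.get()
        if chunk is None:          # sentinel: end of stream
            break
        buf.append(chunk)
        if time.monotonic() - last_emit >= flush_every:
            emit(recognize_partial(b"".join(buf)))
            last_emit = time.monotonic()

# A bounded queue caps memory and applies back-pressure to the capture side.
audio_q = queue.Queue(maxsize=50)
```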
This paper describes the system currently under development at CRIM whose aim is to provide real-time closed captioning of live TV broadcast news in Canadian French. This project is done in collaboration with the TVA Network, a national TV broadcaster, and the RQST (a Québec association which promotes the use of subtitling). The automated closed-captioning system will use CRIM's transducer-based large-vocabulary French recognizer. The system will be fully integrated with the broadcaster's existing equipment and working methods. First "on-air" use will take place in February 2004.
Advances in Human Factors/Ergonomics, 1995
The objective of the study presented here was to investigate usability problems encountered by people using the Mosaic browser to access the World Wide Web (WWW). It was found that 48% of the observed problems could be attributed to the implementation of the browser, mainly problems caused by improper feedback. The other 52% of the observed problems could be attributed to the design of the documents, mainly problems related to structure and presentation. New browsers might eliminate some of the problems observed with Mosaic. Still, the need remains for more precise and complete guidelines for hypermedia document design in order to assist content producers in improving the overall usability of the WWW.
Advances in Human Factors/Ergonomics, 1995
A proposed classification of information in hypermedia documents into hypermedia, interfaces, end nodes and embellishments is used as a basis for raising some questions concerning the usability of networked hypermedia on the WWW. An experiment was conducted to see whether embellishments, here defined as document elements that have a decorative function, enhance memory of hypermedia documents. The results showed better recall of document names, and a tendency toward better recall of document content, for documents with embellishments. However, many other factors are at play, and in a networked environment document transfer time also has to be taken into account.