Towards Natural Gesture/Speech HCI: A Case Study of Weather Narration
Indrajit Poddar, Yogesh Sethi, Ercan Ozyidiz, Rajeev Sharma
School of Computer Science and Engineering,
Pennsylvania State University,
University Park PA 16802
Comments:
Summary:
Introduction:
This paper is concerned with developing recognition techniques capable of handling continuous, natural gesture and speech input. Previous studies have focused on isolated data, in which each gesture has a clearly bounded beginning and end. This paper applies a hidden Markov model (HMM) approach to naturalistic data collected from a weather narrator. A continuous gesture recognition framework is combined with a co-occurrence analysis of speech to determine how the two modalities interact.
Gestures and Speech:
Three gesture types were identified as clearly conveying information and were selected for recognition:
Pointing
Area
Contour
Each of these can be further decomposed into:
Preparation phase at the beginning
Actual stroke phase in the middle
Retraction phase at the end
Compound gestures can therefore be formed by combining any of the above-mentioned gesture types. The HMMs used in the implementation are built on this premise.
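The phase-based premise can be illustrated with a small composite HMM: each gesture type contributes a left-to-right chain of preparation, stroke and retraction states, joined through a shared rest state, and Viterbi decoding assigns a phase label to every frame of a continuous stream. This is a minimal sketch, not the authors' model. The discretized observation alphabet (`rest`, `approach`, `withdraw`, `hold`, `circle`, `trace`), the state names and all probabilities are hypothetical values chosen for illustration; a real system would learn them from the extracted hand features.

```python
import math

# Hypothetical discrete observation symbols standing in for real hand features.
SYMBOLS = ["rest", "approach", "withdraw", "hold", "circle", "trace"]
# Hypothetical: the symbol each gesture type's stroke phase tends to emit.
GESTURES = {"point": "hold", "area": "circle", "contour": "trace"}


def build_model():
    """Build a composite HMM: rest -> (prep -> stroke -> retr) per gesture -> rest."""
    states = ["rest"]
    preferred = {"rest": "rest"}
    trans = {"rest": {"rest": 0.7}}
    for g, stroke_sym in GESTURES.items():
        prep, stroke, retr = f"{g}_prep", f"{g}_stroke", f"{g}_retr"
        states += [prep, stroke, retr]
        preferred.update({prep: "approach", stroke: stroke_sym, retr: "withdraw"})
        trans["rest"][prep] = 0.1          # 0.7 + 3 * 0.1 = 1.0
        trans[prep] = {prep: 0.5, stroke: 0.5}
        trans[stroke] = {stroke: 0.7, retr: 0.3}
        trans[retr] = {retr: 0.5, "rest": 0.5}
    # Each state emits its preferred symbol with p=0.75 and the other five with p=0.05.
    emit = {s: {o: (0.75 if o == preferred[s] else 0.05) for o in SYMBOLS} for s in states}
    return states, trans, emit


def log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, trans, emit, start="rest"):
    """Return the most likely state sequence for obs (log-space Viterbi)."""
    # delta[s]: best log-probability of any path that ends in state s at this frame.
    delta = {s: (log(emit[s][obs[0]]) if s == start else float("-inf")) for s in states}
    backptrs = []
    for o in obs[1:]:
        new_delta, ptr = {}, {}
        for s in states:
            best_prev, best_score = None, float("-inf")
            for p in states:
                score = delta[p] + log(trans[p].get(s, 0.0))
                if score > best_score:
                    best_prev, best_score = p, score
            new_delta[s] = best_score + log(emit[s][o])
            ptr[s] = best_prev
        delta = new_delta
        backptrs.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: delta[s])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    return path[::-1]


states, trans, emit = build_model()
obs = ["rest", "rest", "approach", "approach", "hold", "hold", "hold",
       "withdraw", "withdraw", "rest"]
decoded = viterbi(obs, states, trans, emit)
```

On this toy stream the decoder segments a single pointing gesture into its three phases without being told where the gesture starts or ends, which is the property that continuous (as opposed to isolated) recognition needs. Compound gestures could be modelled in the same way by adding transitions from one gesture's stroke or retraction states into another gesture's states.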
The streaming video was segmented to extract critical features of the presenter's face and two hands. Bootstrapping was employed to improve performance. Gestures were mostly single-handed, but it was important to identify which hand was being used. As expected, continuous gesture recognition achieved a lower recognition rate than isolated gesture recognition.
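The paper does not say here how the active hand was determined. One simple heuristic, shown below purely as a hypothetical sketch, compares how far each tracked hand travels within a gesture window and treats the clearly more mobile hand as the gesturing one. The function names, the `min_ratio` threshold and the input format (per-frame 2-D hand centroids from the segmentation step) are all assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # hypothetical per-frame hand centroid (x, y) in pixels


def path_length(track: Sequence[Point]) -> float:
    """Total distance travelled by one hand across consecutive frames."""
    return sum(math.dist(a, b) for a, b in zip(track, track[1:]))


def active_hand(left: Sequence[Point], right: Sequence[Point],
                min_ratio: float = 1.5) -> str:
    """Return 'left', 'right', 'both' or 'none' for one gesture window.

    Hypothetical heuristic: a hand counts as the gesturing hand if it travels
    at least min_ratio times farther than the other hand.
    """
    l, r = path_length(left), path_length(right)
    if l == 0 and r == 0:
        return "none"            # neither hand moved
    if r >= min_ratio * l:
        return "right"
    if l >= min_ratio * r:
        return "left"
    return "both"


left = [(100.0, 200.0)] * 4                                          # resting hand
right = [(300.0, 200.0), (310.0, 190.0), (325.0, 170.0), (340.0, 150.0)]  # moving hand
```

Motion energy is only a rough cue, because a narrator may hold one hand still while pointing. A practical system would probably combine it with position relative to the face or the weather map.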
Keywords:
Examining the interaction of speech with gestures first required identifying keywords and their relationship to gestures.
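Keyword identification can be sketched as a lookup over a word-level, timestamped transcript, which yields the keyword times needed for any later co-occurrence analysis. This is a minimal sketch. The keyword set below is hypothetical and is not the list used in the paper, and the transcript format `(token, start_s, end_s)` is an assumption.

```python
from typing import Iterable, List, Tuple

Word = Tuple[str, float, float]  # (token, start time in s, end time in s)

# Hypothetical keyword set, for illustration only.
KEYWORDS = {"here", "this", "north", "south", "east", "west", "front"}


def spot_keywords(transcript: Iterable[Word], keywords=KEYWORDS) -> List[Word]:
    """Return (keyword, start, end) for every transcript word in the keyword set."""
    hits = []
    for token, start, end in transcript:
        norm = token.lower().strip(".,!?;:")   # ignore case and trailing punctuation
        if norm in keywords:
            hits.append((norm, start, end))
    return hits


transcript = [("The", 0.0, 0.2), ("front", 0.2, 0.5), ("moves", 0.5, 0.8),
              ("east,", 0.8, 1.1), ("right", 1.1, 1.3), ("here.", 1.3, 1.6)]
hits = spot_keywords(transcript)
```

Exact matching over a transcript is the simplest option. With live audio, the same role would be played by a keyword spotter that produces word timings directly.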
Evaluation:
A meaningful gesture is accompanied by a spoken keyword 85% of the time.
83% of the time a co-occurrence of the keyword
56% of the time a co-occurrence of the keyword
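The figures above are co-occurrence rates. A rate of this kind can be computed as the fraction of gesture intervals that overlap at least one keyword interval. The exact co-occurrence criterion used in the paper is not stated here, so the overlap rule with an optional tolerance `tol` is an assumption, and the intervals below are toy data that are not meant to reproduce the reported percentages.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s)


def overlaps(a: Interval, b: Interval, tol: float = 0.0) -> bool:
    """True if interval b overlaps interval a widened by tol seconds on each side."""
    return b[0] <= a[1] + tol and b[1] >= a[0] - tol


def cooccurrence_rate(gestures: List[Interval], keywords: List[Interval],
                      tol: float = 0.0) -> float:
    """Fraction of gestures accompanied by at least one keyword (assumed criterion)."""
    if not gestures:
        return 0.0
    hits = sum(any(overlaps(g, k, tol) for k in keywords) for g in gestures)
    return hits / len(gestures)


gestures = [(0.0, 1.0), (2.0, 3.0), (4.0, 5.0), (8.0, 9.0)]   # toy gesture intervals
keywords = [(0.5, 0.7), (2.9, 3.2), (5.3, 5.5)]               # toy keyword intervals
strict = cooccurrence_rate(gestures, keywords)            # exact overlap only
loose = cooccurrence_rate(gestures, keywords, tol=0.5)    # allow a 0.5 s lag
```

The tolerance matters because speech and gesture are rarely perfectly aligned. In this toy data, widening the window by half a second turns a keyword that follows the third gesture into a co-occurrence.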
Discussion:
The use of multimodal input in HCI interfaces is a very interesting concept. In normal human interaction, it is a critical element of communication. It is all too familiar that a face-to-face conversation is much more revealing than one conducted over the telephone or some other restricted channel.