Monday, May 3, 2010

Webcam Mouse Using Face and Eye Tracking in Various Illumination Environments

Yuan-Pin Lin, Chung-Chih Lin, Jyh-Horng Chen

Institute of Electrical Engineering,

National Taiwan University


Yi-Ping Chao,

Department of Computer Science and Information Engineering,

Chang Gung University


Comments:


Manoj’s Blog




Summary:


Introduction:

This article presents a way of using a screen-mounted webcam to acquire human gestures as a substitute input for the mouse. A K-nearest neighbor (KNN) classifier in combination with an adaptive skin model is used to provide real-time tracking. The system, implemented on a standard laptop, is able to perform at 15 fps.


This system is proposed as an alternative interface for the disabled. Unlike traditional solutions, it is minimally intrusive and does not require the user to wear additional devices. The system requires a dedicated machine to perform the visual tracking; the tracking data is forwarded to the computer that is being operated by the user.

Face tracking is performed using a nonlinear skin color transform to overcome variations in lighting. Iris tracking is used to identify the eyes and their position. The cursor position is then calculated from the eye position relative to the head position, so turning the head moves the cursor from side to side.
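As a rough illustration of the cursor mapping described above, here is a minimal sketch. It assumes a simple linear mapping from the eye offset to screen coordinates; the function name, the gain value and the pixel coordinates are all made up for the example and are not taken from the paper.

```python
def cursor_from_gaze(face_center, eye_center, screen_size, gain=8.0):
    """Map the eye-centre offset from the face centre to a screen position.

    The linear mapping and the gain are illustrative assumptions.
    """
    dx = eye_center[0] - face_center[0]
    dy = eye_center[1] - face_center[1]
    width, height = screen_size
    # Start from the screen centre and scale the offset by the gain.
    x = width / 2 + gain * dx
    y = height / 2 + gain * dy
    # Clamp to the visible screen area.
    x = min(max(x, 0), width - 1)
    y = min(max(y, 0), height - 1)
    return int(round(x)), int(round(y))


print(cursor_from_gaze((160, 120), (170, 118), (1280, 800)))
```

A larger gain makes small head movements travel further on screen, at the cost of a jumpier cursor.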


Feature Recognition:

After defining the KNN features for recognizing illumination conditions, an elliptical model was used with 10 images under different illumination conditions. An average of 92% accuracy for skin detection was achieved. The face was distinguishable due to its clustered appearance.
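To make the KNN step more concrete, here is a small sketch of a nearest neighbor vote over labeled samples. The two features (mean brightness and mean red chroma), the labels and all the numbers are hypothetical; the paper's actual features are not given in this summary.

```python
from collections import Counter
import math


def knn_classify(query, samples, k=3):
    """Return the majority label among the k samples nearest to query.

    samples is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(samples, key=lambda s: math.dist(query, s[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training data: (mean brightness, mean red chroma) -> label.
training = [
    ((200.0, 150.0), "bright"),
    ((190.0, 148.0), "bright"),
    ((210.0, 152.0), "bright"),
    ((60.0, 135.0), "dim"),
    ((70.0, 138.0), "dim"),
    ((55.0, 133.0), "dim"),
]

print(knn_classify((65.0, 136.0), training))
```

The predicted illumination label could then be used to pick which skin color model to apply to the frame.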

Eye tracking was then performed with a frame capture comparing the centers of the face and eyes.


Results:

The system runs on a standard 2.4-GHz laptop dedicated to tracking the position of the head and eyes at 15 fps, occupying 45% of the Windows resources. It successfully tracks the head and eyes under a variety of lighting conditions with complex real-world backgrounds.


Discussion:

Calculating the vectors of the face and eyes separately and then determining the point of gaze from the eye vectors relative to the face is a good approach which should lead to improved accuracy. Although it is mentioned that the system is under development, it would have been beneficial to have provided some details as to how the system was tested.

Wednesday, April 28, 2010

Real-Time Hand-Tracking as a User Input Device

Robert Y. Wang

Computer Science and Artificial Intelligence Laboratory,

Massachusetts Institute of Technology

Cambridge, MA USA



Comments:


Franck’s Blog




Summary:


Introduction:

This paper presents an affordable 3D articulated user input system for the hand. An ordinary cloth glove imprinted with colors is used, which makes a KNN approach more effective. This implementation allows the user to perform true 3D gestures which are not constrained to a single plane. The system is a compromise between bare hand capture and a wearable motion detection system. The graphical pattern on the glove facilitates faster and more accurate pose estimation.


Pose Estimation:

The system is based on single-frame pose estimation. With bare hand estimation, very different poses can map to the same image unless intensive evaluation is performed. The use of the colored glove ensures that very different poses always map to different images. The implementation of the KNN is adapted to use a Hausdorff-like distance metric.
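For readers unfamiliar with this family of metrics, the sketch below shows one averaged, symmetric variant of the Hausdorff distance between two sets of 2D points. It is only an assumption about what "Hausdorff-like" might look like; it ignores the glove colors, and the function names are made up.

```python
import math


def directed_distance(a, b):
    """Average distance from each point in a to its closest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def hausdorff_like(a, b):
    """Symmetric, averaged variant of the Hausdorff distance between point sets."""
    return directed_distance(a, b) + directed_distance(b, a)


query = [(0, 0), (1, 0)]
candidate = [(0, 0), (1, 1)]
print(hausdorff_like(query, candidate))
```

Averaging instead of taking the maximum makes the score less sensitive to a single stray pixel.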


Database Sampling:

Low dispersion sampling of hand poses is used to provide a maximum bound on the estimation error that the algorithm can make. The result is a database in which the distance from a query image to its nearest neighbors is minimized.
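One common way to approximate a low dispersion sample is greedy farthest-point selection, sketched below. This is a stand-in technique chosen for illustration, not necessarily the sampling scheme used in the paper, and the 2D points stand in for much higher dimensional hand poses. It assumes count is no larger than the number of candidates.

```python
import math


def farthest_point_sample(candidates, count):
    """Greedily pick `count` points that are spread out over the candidates."""
    chosen = [candidates[0]]
    # Distance from every candidate to its closest chosen point so far.
    gap = [math.dist(c, chosen[0]) for c in candidates]
    while len(chosen) < count:
        # Take the candidate that is currently worst covered.
        idx = max(range(len(candidates)), key=gap.__getitem__)
        chosen.append(candidates[idx])
        gap = [min(g, math.dist(c, candidates[idx])) for g, c in zip(gap, candidates)]
    return chosen


poses = [(0, 0), (1, 0), (10, 0), (5, 0), (9, 0)]
print(farthest_point_sample(poses, 3))
```

Each step shrinks the largest uncovered gap, which is what bounds the worst-case distance from any pose to its nearest database entry.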


Fast Nearest Neighbor Search:

KNN search over large databases can be computationally expensive. To speed up classification, each pose is compressed into a 128-bit binary code.
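The benefit of binary codes is that two entries can be compared with a single XOR and a bit count. The sketch below assumes the codes are compared by Hamming distance and stores them as Python integers; how the paper derives the codes from the images is not covered in this summary, so the codes here are arbitrary.

```python
def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def nearest_code(query, database):
    """Return the index of the database code closest to query."""
    return min(range(len(database)), key=lambda i: hamming(query, database[i]))


# Short 8-bit codes for readability; 128-bit integers work the same way.
database = [0b11110000, 0b00001111, 0b10101010]
print(nearest_code(0b00001110, database))
```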


Database Coverage Evaluation:

The average performance on 50 randomly sampled test poses is measured. The distance to the nearest neighbor is measured for each pose. Both the pose distance and the image distance to the nearest neighbor improve with increasing database size.


Applications:

This application was intended for 3D direct manipulation tasks such as 3D modeling. Other applications could include 3D character control, which would certainly be an improvement on the paper summarized earlier in the semester where the implementation was based around a P5 data glove, and interaction with large displays from a distance.


Discussion:

This is a very nice approach with the use of a simple yet highly effective solution in the colored fabric glove. It would have been useful to have included some data to illustrate the performance of the implementation. This type of feature identification for video tracking is very powerful and has a wide range of potential applications.

Wednesday, April 21, 2010

Liquids, Smoke, and Soap Bubbles -

Reflections on Materials for Ephemeral User Interfaces

Axel Sylvester

AGIS,

University of Hamburg,

33537 Hamburg, Germany


Tanja Döring, Albrecht Schmidt

Pervasive Computing,

University of Duisburg-Essen,

45117 Essen, Germany


Comments:


Drew’s Blog

Franck’s Blog



Summary:


Introduction:

This article presents an interface based on the creation, manipulation, and short life of bubbles. The choice of this implementation encourages users to interact with a medium that is very fragile, has a short life and exhibits random behavior.

The use of different materials to provoke specific qualities in human computer interaction is the key point explored by this work.


Implementation:

A 20-inch circular pool with an under-mounted camera is used, on which bubbles are floated. The bubbles can be either clear or smoke-filled. The camera is able to track the diameter and position of the bubbles by the ring that is formed by surface tension between the skin of the bubble and the liquid in the pool. The user can move the bubbles by either blowing on or touching them.

The system was used to control the lighting in a room where the size of a bubble was used to control brightness.
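A minimal sketch of how a tracked bubble size might drive the lighting is shown below. It assumes a simple linear mapping with clamping; the pixel limits and the function name are invented for the example and do not come from the paper.

```python
def brightness_from_diameter(diameter_px, min_px=20, max_px=200):
    """Linearly map a tracked bubble diameter to a brightness in [0.0, 1.0]."""
    if max_px <= min_px:
        raise ValueError("max_px must be greater than min_px")
    fraction = (diameter_px - min_px) / (max_px - min_px)
    # Bubbles smaller or larger than the limits saturate the output.
    return min(max(fraction, 0.0), 1.0)


print(brightness_from_diameter(110))
```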


Discussion:

An interesting direction to go. The random component of the setup could add interest and a life of its own to the application. The artistic component and soft imprecise nature of bubbles could be a nice contrast to many applications where the interface could provide both control and feedback.


Monday, April 19, 2010

EyeDraw: Enabling Children with Severe Motor Impairments to Draw with Their Eyes


Anthony J. Hornof

Computer and Information Science,

University of Oregon,

Eugene, OR 97403 USA


Anna Cavender

Computer Science and Engineering,

University of Washington,

Seattle, WA 98195 USA


Comments:


Manoj’s Blog

Franck’s Blog



Summary:


Introduction:

This is an application that enables children with severe motor disabilities to draw pictures with their eyes. The application runs on a computer equipped with an eye tracking device.

In order to draw a line, the start and end points are identified from successive gaze positions. The line is then automatically created by connecting the dots. The Midas touch problem is avoided by the use of dwell time accompanied by visual feedback in the form of a change in the shape of the cursor.


Drawing Commands:

Alternation between viewing and drawing is performed by dwell time. The cursor changes from green for viewing to red for drawing. The dwell time threshold is adjustable based on experience but starts at a half second. A drawing command can be cancelled by extending the dwell for another half second.
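The dwell mechanism can be pictured as a small state machine fed with gaze samples. The sketch below is an assumption about how such a switch could work: the class name, the 30 pixel fixation radius and the sample values are invented, and the cancel-by-extended-dwell behavior is left out for brevity.

```python
import math


class DwellSwitch:
    """Toggle between 'viewing' (green) and 'drawing' (red) on a steady gaze."""

    def __init__(self, threshold_s=0.5, radius_px=30.0):
        self.threshold_s = threshold_s
        self.radius_px = radius_px
        self.mode = "viewing"
        self._anchor = None   # where the current fixation started
        self._start = None    # when the current fixation started
        self._fired = False   # whether this fixation already toggled the mode

    def update(self, t, x, y):
        """Feed one gaze sample (time in seconds, position in pixels)."""
        moved = (
            self._anchor is None
            or math.dist(self._anchor, (x, y)) > self.radius_px
        )
        if moved:
            # Gaze left the fixation area: restart the dwell timer here.
            self._anchor, self._start, self._fired = (x, y), t, False
        elif not self._fired and t - self._start >= self.threshold_s:
            self.mode = "drawing" if self.mode == "viewing" else "viewing"
            self._fired = True
        return self.mode


switch = DwellSwitch()
for sample in [(0.0, 100, 100), (0.3, 102, 101), (0.6, 101, 99)]:
    print(switch.update(*sample))
```

Requiring the gaze to leave the fixation area before the next toggle is what keeps a long stare from flipping the mode repeatedly.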


Version 1:

This is a minimal control version which has the following tools:

Line drawing,

Circle drawing,

Undo Button,

Grid to assist with dwell stabilization,

Save and open drawings.


Evaluation of Version 1:

Participants without disabilities were first recruited. After calibration of the eye tracker and a short familiarization with the controls, the participants were asked to make some drawings.

500ms was found to be the preferred dwell time.

The easiest functions were clicking buttons.

The hardest functions were drawing and controlling the drawing.

The grid was found to be useful.

Overall the task required a lot of focussed attention.

A second evaluation was performed by impaired users, which led to the second version of the program.


Version 2:

This version of the program was in response to the needs of the impaired users and contained the following additions:

Image of what the camera sees so the user can stay in an optimal position while drawing.

More user defined settings such as dwell time etc.

Audio feedback on the current state of the cursor while drawing.

Rectangle and polygon tools were added.

Colors.

On off switch for the eye tracking.


Evaluation of Version 2:

As before, the evaluation was divided into two groups. Both found the rich palette of tools engaging, with the consequence that an extended period of familiarization was required. The next version will have a feature that reveals tools one at a time to enable first-time users to start drawing and not be overwhelmed by the apparent complexity of the application.


Discussion:

This is a very interesting article. The evaluation of the first two versions of the application is very thorough in its descriptions of each individual's reaction to the system. Some information about the implementation of the eye tracking would have been of interest. One of the users is reported to have written text at the bottom of their picture. Again, some information on how the keyboard was represented and controlled would have been of interest. Very little is mentioned about the participants' reactions to using eye movements deliberately. This is only touched upon by mentioning that a lot of attention was required.

Wednesday, April 7, 2010

Towards Natural Gesture/Speech HCI: A Case Study of Weather Narration

Indrajit Poddar, Yogesh Sethi, Ercan Ozyildiz, Rajeev Sharma

School of Computer Science and Engineering,

Pennsylvania State University,

University Park PA 16802


Comments:


Drew’s Blog

Franck’s Blog



Summary:


Introduction:

This paper is concerned with developing recognition techniques which are capable of handling continuous natural gesture and speech inputs. Previous studies have focussed on presenting data which is bounded with a clear beginning and end. This paper uses an HMM approach with naturalistic data collected from a weather narrator. A continuous gesture recognition framework has been combined with a co-occurrence analysis of speech to determine the interaction between the two modalities.


Gestures and Speech:

Three gesture types were identified for recognition as clearly conveying information:

Pointing

Area

Contour


Each of these can be further decomposed into:

Preparation phase in the beginning

Retraction phase at the end

Actual stroke phase in the middle


Compound gestures can therefore be formed through the combination of any of the above mentioned gesture types. The HMMs used in the implementation are based on this premise.


The streaming video was segmented to extract critical features of the presenter's face and two hands. Bootstrapping was employed to improve performance. Gestures were mostly single-handed, but it was important to identify which hand was being used. As expected, continuous gesture recognition has a lower recognition rate than isolated gesture recognition.


Keywords:

here, up here, down here

east(ern), west(ern), etc.

Name of some place


Examining the interaction of speech with gestures first required identifying keywords and their interrelationship with gestures.


Evaluation:

85% of the time that a meaningful gesture is made, it is accompanied by a spoken keyword.

83% of the time, the keyword co-occurs with a gesture phase.

56% of the time, the keyword co-occurs with “point”.
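Percentages like these come from counting how often the two modalities overlap in time. The sketch below shows one simple way such a rate could be computed, assuming gestures are given as (start, end) intervals and keywords as timestamps in seconds. The data and the function name are invented and are not the paper's numbers.

```python
def cooccurrence_rate(gestures, keyword_times):
    """Fraction of gesture intervals that contain at least one keyword time."""
    if not gestures:
        return 0.0
    hits = sum(
        any(start <= t <= end for t in keyword_times)
        for start, end in gestures
    )
    return hits / len(gestures)


gestures = [(0, 2), (3, 5), (6, 7), (8, 10)]
keyword_times = [1.0, 4.5, 9.0]
print(cooccurrence_rate(gestures, keyword_times))
```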


Discussion:

The use of multimodal input in HCI interfaces is a very interesting concept. In normal human interaction this is a critical element of communication. It is all too familiar that a face-to-face conversation is much more revealing than one conducted over the telephone or some other restricted modality channel.

Coming to Grips with the Objects We Grasp:

Detecting Interactions with Efficient Wrist-Worn Sensors

Eugen Berlin, Jun Liu, Kristof Van Laerhoven, Bernt Schiele

Department of Computer Science,

Technische Universität Darmstadt,

Darmstadt, Germany


Comments:


Manoj’s Blog

Franck’s Blog



Summary:


Introduction:

This article describes a wrist-worn sensor that is able to identify nearby objects with RFID tags and also the gestures performed by the user through accelerometer data. In particular, this paper presents the technical challenges encountered in the development of a wrist-worn device to perform this task.


System:

RFID antenna

M1-mini SkyeTek reader interface circuitry

3D accelerometer

Skin temperature sensor

Two ambient light sensors


Challenges:

Extending the range of the RFID from the wrist to the hand.

Choice of antenna to best perform.

Density of RFID tags and frequency of measurements to be able to determine which item is being picked up.

Classification of 3D accelerometer data.


The Box Test:

Used in optimizing the antenna.

Objects with RFID tags were placed in a box and then individually picked up. The Hit Rate was then calculated as the number of correct identifications divided by the overall number of attempts. This test had the advantage that it had many elements which would be present in a real-world application and was therefore more representative.
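The Hit Rate calculation is simple enough to write out. In the sketch below each attempt is recorded as a pair of the tag that was actually picked up and the tag the reader reported (or None for no read); the tag names are made up.

```python
def hit_rate(attempts):
    """attempts: list of (expected_tag, detected_tag or None) pairs."""
    if not attempts:
        raise ValueError("no attempts recorded")
    correct = sum(1 for expected, detected in attempts if expected == detected)
    return correct / len(attempts)


attempts = [
    ("trowel", "trowel"),
    ("rake", None),        # nothing was read
    ("spade", "spade"),
    ("hose", "rake"),      # a neighboring tag was read instead
]
print(hit_rate(attempts))
```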

Variables to optimize using the box test:

Antenna shape

Q-Value, the quality or reading range

RFID Reading Frequency


Optimizing Accelerometer Data (Inertial Sensing):

This section describes how the essence of the gestures was captured. The segmentation and classification phases were combined. The data is compacted into a series of linear segments, which are then compared to members of a known set using a sliding window paradigm.
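To give a feel for the compaction step only, the sketch below approximates a one-dimensional signal by straight line segments, starting a new segment whenever the samples stray too far from the current line. This is a generic piecewise linear approximation written for illustration, not the paper's algorithm, and the tolerance and sample values are invented. The matching against known gestures is not shown.

```python
def linear_segments(samples, tolerance=0.5):
    """Approximate a 1-D signal by line segments; returns breakpoint indices."""
    breakpoints = [0]
    start = 0
    for end in range(2, len(samples)):
        # Check how far the samples stray from the straight line start -> end.
        slope = (samples[end] - samples[start]) / (end - start)
        worst = max(
            abs(samples[i] - (samples[start] + slope * (i - start)))
            for i in range(start + 1, end)
        )
        if worst > tolerance:
            # Close the segment at the previous sample and start a new one there.
            start = end - 1
            breakpoints.append(start)
    breakpoints.append(len(samples) - 1)
    return breakpoints


# A rise followed by a fall is reduced to two segments.
print(linear_segments([0, 1, 2, 3, 2, 1, 0]))
```

Storing only the breakpoints keeps the representation small, which matters on a wrist-worn device with limited memory.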


Gardening and Domestic Cleaning Scenarios:

These were chosen as test beds because they are real environments containing multiple pre-tagged tools and tasks.


Discussion:

This paper presents a more practically usable implementation of an RFID/accelerometer bracelet combination for identifying objects and gestures. The various challenges encountered in the development are outlined leading to an efficient lightweight solution. A classification method for dealing with the accelerometer data was also adapted to run on the reduced resources of the mobile platform.

Monday, April 5, 2010

The Peppermill: A Human-Powered User Interface Device

Nicolas Villar,

Steve Hodges,

Microsoft Research

Cambridge, UK


Comments:


Manoj’s Blog

Franck’s Blog



Summary:


Introduction:

This paper presents a user interface that sources its power from the effort required to operate it. The proof of concept device, called the Peppermill, is used as an input device for multimedia browsing applications. The action involved in using the device is presented as a viable alternative to batteries, with economic and ecological benefits. The transducer both senses the user's intent and generates the power required for the device to function. These characteristics are requirements of what are called human-powered devices.


Background:

The Space Commander was developed by the Zenith corporation in 1955 as a TV remote control. This device was mechanical, generating high frequency sounds which were received by a microphone and decoded. Simpler devices such as windup flashlights and radios have been around for a long time. Body-worn systems that harvest energy to power cell phones etc. are more recent additions. Another example which is mentioned is the MIT self-powered button.


The Implementation:

The circuit diagram is presented showing the relatively simple implementation.

Standard components are used with a motor and reduction gear assembly providing the source of power.


The PepperMill Controller:

The controller is designed as a generic controller reminiscent of a culinary pepper mill, hence the name. There are a total of four input DOF, with the three buttons acting as modifiers to the rotary action. A microcontroller and radio transmitter are also incorporated in the package. A receiver is required on the other end, which in this case was a USB serial device.


Evaluation:

A video browsing application was developed as an artificial test scenario. Users could choose channels and adjust the volume. The rate of rotation determined the speed of selection of items on the screen.


Discussion:

A very interesting idea. The Zenith remote from the 50s is a brilliant use of available resources. Why they chose a human-powered device when they could have just as easily had a radio version is additionally interesting. Small human-powered devices have many potential applications; one always seems to need to change batteries at the least convenient moment. The smaller the device, the harder and more delicate this procedure can be. A very interesting paper.