Determining the Psychological Involvement in Multimedia Interactions

Abstract-Human computer interaction (HCI) is currently aimed at the design of interactive computer applications for human use while preventing user frustration. When considering the nature of modern computer applications, such as e-learning systems and computer games, it appears that human involvement cannot be improved by traditional approaches alone, such as attractive user interfaces. For pleasant human involvement, these applications require computers that can naturally adapt to their users, which in turn requires the ability to recognize user emotions. Currently, the most preferred research approach to emotion recognition is based on facial expressions, which seems to have many limitations. Therefore, in this paper, we propose a method to determine the psychological involvement of a human during a multimedia interaction session using eye movement activity and arousal evaluation. Our approach uses a low cost hardware/software combination, which determines eye movement activity from electrooculogram (EOG) signals and the level of arousal from galvanic skin response (GSR) signals. The results obtained from six individuals show that these affect signals can be used to recognize the nature of involvement, distinguishing optimal levels from distracted conditions.

Index Terms-Arousal, Attention, Cognition, Emotion, EOG, Eye Movement Activity, GSR, HCI, Human Involvement, Multimedia Interactions.

Hiran B. Ekanayake, Damitha D. Karunarathna and Kamalanath P. Hewagamage

Manuscript received March 12, 2009. Accepted November 20th, 2009. This research was funded by the National Science Foundation, Sri Lanka.

I. INTRODUCTION
Most modern multimedia applications come with very attractive user interfaces, and HCI studies the design of user interfaces in great detail [20]. While most applications benefit from attractive user interfaces, such as iPhone Twitterrific, there is another category of applications where human-machine interaction could be improved by having machines naturally adapt to their users, for instance tutoring systems. In such systems the adaptation involves the consideration of emotional information, possibly including the expression of frustration, dislike, confusion, excitement, etc. This emotional communication, along with the handling of affective information in HCI, is currently studied under affective computing [35].
Human emotions are believed to contain an emotional judgment about one's general state together with bodily reactions [6] [42]. For instance, when one is cut off by another driver, one would experience physiological changes such as increases in heart rate and breathing rate, as well as the feeling and expression of fear and anger. The literature suggests many research approaches that can be used to recognize emotions, such as facial expressions [39], changes in tone of voice, affect signals [17] and electroencephalography (EEG) [7]. However, each of these approaches has its own limitations.
Humans' abilities to make decisions and judgments and to keep information in memory are all studied under cognitive science [15]. Although there is no widely accepted definition of human attention, it is considered a cognitive process that helps humans to concentrate selectively on a few tasks while ignoring others, for instance concentrating on a movie played on a computer screen while ignoring what is happening outside. According to recent developments in cognitive science, emotions also play a major role in human cognition, especially in decision making and memory, such as flashbulb memory [15] [42].
The research work discussed in this paper identifies human involvement in HCI as a measurable psychological phenomenon and proposes several involvement types. Some of these involvement types can be considered as improving HCI, while others are conditions of little or no involvement. To determine human involvement, the work proposes a low-cost hardware/software approach that consists of GSR and EOG sensing devices and recording software.
The remaining sections of this paper are organized as follows. In the related work section, the two popular approaches to improving human involvement in multimedia interactions are presented with their positive and negative aspects. The section also gives a brief overview of mental tasks, the role of attention in coordinating mental tasks, the correlation between eye activity and visual attention, the role of emotions in humans, recognizing emotions from psychophysiological signals, especially using the changes in skin conductance, and human involvement under cognitive emotional valences. The methodology section discusses the proposed methodology for determining psychological involvement in multimedia interactions and identifies several involvement types with their distinguishing characteristics with respect to eye activity and GSR activity. This section also includes a brief discussion of the proposed low-cost hardware/software framework for evaluating the proposed involvement cases. The experimental process and results are presented in the results section. Finally, the last section concludes by commenting on the findings and suggesting future work.
The International Journal on Advances in ICT for Emerging Regions 2009 02 (01) : 11 - 20

A. Improving Human Involvement in Multimedia Interactions
HCI attempts to improve human involvement in computer-based multimedia interactions by improving user interfaces and the presentation of multimedia content [20]. This also includes monitoring and profiling human behavioural patterns, such as preferred visiting paths and selections, and personalizing multimedia content to support the preferences of individual users [5][11] [19].
Although this method can assign similar patterns to similar multimedia content, the predicted patterns may not give acceptable outputs for different types of content. Another drawback is that the method is less sensitive to human mood changes and long-term behavioural changes. Therefore, as an improvement, it has been proposed that human emotional information captured in real time can be used as feedback to make machines adapt to their users and change the presentation accordingly [36]. Currently, the most prevalent method for capturing human emotional information is capturing the facial expressions of users. The challenges for this method are that the quality of a facial expression analyser depends on a number of properties, such as accurate facial image acquisition, lighting conditions and head motion [33], and that emotional communication can be masked [36], for instance by a "poker face" that hides true confusion.
In contrast, another school of researchers is developing theories to model human-like cognition and related aspects in a computer, so that machines think and act like humans, with the expectation that these models can predict and decide how to communicate information to humans in a human-like manner, thereby improving the relationship. These attempts vary from artificial intelligence (AI) based techniques [9] [32] to cognitive modelling techniques such as ACT-R models [3]. Although cognitive science reveals, day by day, many other functions and relationships between human cognition, emotion and physiology, it is doubtful whether true human behaviour can ever be modelled in a computer when the biological and sub-symbolic nature of humans is considered [42].

B. Mental Tasks, Attention, and Eye Activity
In psychology it is believed that people have only a certain amount of mental energy to devote to all the possible tasks and all the incoming information confronting them [15]. Nature has resolved this problem by giving humans the ability to filter out unwanted information and to focus cognitive resources on a few tasks or events rather than on many, through a mechanism called attention [15][40]. According to the model of memory [15], working memory is the area that contains information about currently attended tasks. Attention plays an important role in coordinating the information in working memory that comes from the outside environment with information from long-term memory. Kahneman's model of attention and effort explains the relationship between the mental effort required by tasks and attention [15] [25]. According to this model, attention is enforced through an allocation policy that is affected by both involuntary and voluntary activities. For example, while opera lovers are likely to concentrate during an opera session, others may feel drowsy even if they want to stay awake.
Although there are many more theories of attention, one prevailing theory of visual attention is the spotlight metaphor, which compares attention to a spotlight that highlights whatever information the system is currently focused on [15][40]. According to this theory, one can attend to only one region of space at a time, and a shift of attention is considered a change of the previously focused task. Spotlight theory is used in implementing visual attention in modern cognitive modelling architectures [2]. Apart from visual attention, auditory attention also plays an important role. Theories such as Broadbent's filter theory, Treisman's attenuation theory and Deutsch and Deutsch's theory suggest that all incoming messages are processed up to a certain level before they are selectively attended to [15] [40].
The published evidence supports that eye movements directed to a location in space are preceded by a shift of visual attention to the same location [18][21] [23]. However, attention is free to move independently of the eyes. Eye tracking is one of the most active research areas studying eye movements, which consist of saccades and fixations [12]. Saccades are rapid ballistic changes in eye position that occur at a rate of about 3-4 per second, and the eye is blind during these movements. Information is acquired during the longer fixations of about 250 milliseconds that intervene between these saccades.
Biomedical investigations recognize EOG as a technique that can be used to measure the gaze angle and assess the dynamics of eye movements [24] [26] [27] [37]. In this method, sensors are placed at the sides of the eyes to measure horizontal eye motion, and above and below the eyes if vertical eye motion is also studied.

C. Emotions and Psychophysiological Signals
There is no universally agreed explanation for emotional responses in humans. The literature suggests many causes of emotions and many other factors that influence them, such as limbic system activity [6][42], asymmetries between the cerebral hemispheres [29], gender differences [34], mental states and dispositions, and disequilibria between the self and the world [10].
Emotional responses are considered to have two levels [42]: a cognitive judgment about one's general state, and a bodily reaction. The cognitive judgment mainly contributes to the motivation for goal accomplishment, memory processing, deliberative reasoning, and learning. The bodily reaction takes two forms, i.e. expressions and physiological signals. This response is considered to have two dimensions: pleasure (pleasant or unpleasant) and arousal (aroused or unaroused). Emotion research mainly focuses on this bodily reaction for emotion recognition.
Facial expression analysis is one of the most heavily researched areas in recognizing emotions. Paul Ekman studied the presence of basic emotional categories expressed by facial expressions across different cultures and ethnicities and identified eight facial expressions [13]: happiness, contempt, sadness, anger, surprise, fear, disgust, and neutral. It is believed that these basic emotions provide the ingredients for more complex emotions, such as guilt, pride and shame [22]. One of the major challenges for facial expression based approaches is people's ability to mask their true expressions when they do not wish to communicate their true feelings [36].
In contrast, most emerging methods for emotion recognition rely on peripheral and central nervous system signals [7][10] [29]. Sympathetic activation of the autonomic nervous system, which is part of the peripheral nervous system, and activation of the endocrine system introduce changes to heart rate, skin conductivity, blood volume pressure, respiration, and many other sympathetically innervated organs, and these changes can be detected using biofeedback sensing instruments. Healey and Picard [17] present how emotions can be recognized from these physiological signals with higher accuracy. Apart from the peripheral signals, EEG signals from the brain have also been shown to be usable for assessing the level of arousal [7].
GSR, or skin conductance response (SCR), is another popular measure, known to have a nearly linear correlation with a person's arousal level and thus with the person's cognitive emotional activity [38] [41]. Therefore, GSR is used as a method for quantifying a person's emotional reaction to the different stimuli presented. The literature suggests that a low level of cortical arousal is associated with relaxation, hypnosis, and the subjective experience of psyche states and unconscious manifestations, whereas a high level of cortical arousal is associated with increased power of reflection, focused concentration, increased reading speed, and increased capacity for long-term recall. Skin conductivity, or GSR, is related to this cortical arousal as follows: when the arousal of the cortex increases, the conductivity of the skin also increases, and when the arousal of the cortex decreases, the conductivity of the skin also decreases. This results from the "fight or flight" behaviour of the autonomic nervous system. However, the literature shows a few other responses that can affect the electrical resistance of the skin. The following summarizes the causes of skin electrical activity:
• Tarchanoff response is a change in DC potential across neurons of the autonomic nervous system connected to the sensory-motor strip of the cortex. It has an immediate effect (0.2 to 0.5 seconds) on the subject's level of arousal, and this effect can be detected using hand-held electrodes, because the hands have a particularly large representation of nerve endings on the sensory-motor strip of the cortex.
• "Fight or flight" stress response of the autonomic nervous system comes into action as the arousal increases, as a result of increased sweating due to the release of adrenaline. This is a slow response compared to the Tarchanoff response.
• Forebrain arousal is a complex physiological response, unique to man, affecting the resistance in the thumb and forefinger. Changes in alpha rhythms cause blood capillaries to enlarge, and ultimately this too affects the skin resistance.

D. Cognition, Emotion and Brain's Involvement
The emotional reaction of the body has sympathetic effects that prepare it for "fight or flight" behaviour, and it also increases activation in the reticular activation system (RAS) [4], [38]. The reticular activation system is the centre of attention and motivation of the brain [1], [28], [30]. It is also the centre of balance for the other systems involved in learning, self-control or inhibition, and motivation. When it functions normally, it provides the neural connections that are needed for processing and learning information and for paying attention to the correct task. However, if over-excited, it distracts the individual through excessive startle responses, hyper-vigilance, touching everything, talking too much, and restless and hyperactive behaviour. Fig. 1 shows the correlation of attention and arousal.
During a computer-based multimedia interaction session, optimal involvement can be expected when the human participant looks towards the computer screen and psychologically experiences the content presented by the computer. However, in real situations this expected involvement cannot be observed all the time, because disturbances can occur as a result of outside events and internal stress responses of the participant.
In our research, we hypothesize that eye movement activity measured as EOG signals can be used to distinguish whether or not a participant is attending to the visual content presented on the computer. The reason for using an EOG based approach is that EOG signals can be captured using low cost hardware (about USD 1000), in contrast to expensive eye tracking systems (about USD 10000). We expect saccadic eye movements to produce lower magnitude EOG signals when the participant's visual space is limited by the screen dimensions than when the participant is attending to the general visual space or the environment (Fig. 2). Since the primary task during a multimedia interaction session is to pay attention to the content that appears on the computer screen, we assume that the fixations during eye movements are mostly located within the visual space defined by the screen dimensions. Apart from distinguishing visual focus, EOG signals may be used to identify inattention conditions, such as drowsiness. There is empirical evidence that one's eye blink rate increases as one gets drowsy [31].
Although EOG can be used to identify visual focus, it is less effective in determining whether the participant is psychologically experiencing the multimedia content, because simply looking at the computer screen does not mean that the participant is mentally attending to the content that is seen. Therefore, to identify this mental involvement we employ skin conductivity based measurements, i.e. GSR. The literature points out that maximum attention, or optimal involvement, is obtained when arousal is moderate, whereas too much or too little arousal does not give satisfactory levels of involvement.
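The two-signal hypothesis described above can be sketched as a simple decision rule. This is only an illustration in Python: the threshold values, the normalisation of arousal to the range 0 to 1, and the names `Thresholds` and `classify_involvement` are assumptions made for this sketch and are not taken from our implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    # All three values are hypothetical placeholders, not measured quantities.
    eog_on_screen_max: float = 1.0  # mean |EOG| above this suggests off-screen gaze
    arousal_low: float = 0.3        # normalised arousal below this: too little arousal
    arousal_high: float = 0.7       # normalised arousal above this: too much arousal


def classify_involvement(mean_abs_eog: float, arousal: float,
                         t: Thresholds = Thresholds()) -> str:
    """Combine an EOG magnitude and a GSR-derived arousal level into a label.

    EOG decides whether the gaze stays within the screen; arousal (assumed to
    be normalised to 0..1) decides whether the on-screen involvement is
    likely to be optimal.
    """
    if mean_abs_eog > t.eog_on_screen_max:
        return "off-screen"
    if arousal < t.arousal_low:
        return "on-screen, under-aroused"
    if arousal > t.arousal_high:
        return "on-screen, over-aroused"
    return "on-screen, optimal"


if __name__ == "__main__":
    print(classify_involvement(0.4, 0.5))
    print(classify_involvement(2.5, 0.5))
```

In practice the thresholds would have to be calibrated per participant, since both EOG amplitude and skin conductance vary considerably between individuals.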
In our research we propose a low cost hardware and software solution to capture EOG and GSR signals. Our EOG hardware unit is based on Grant's sound card EEG project [8] (Fig. 3). The cost of building this unit is around USD 100, whereas commercial products are ten times more expensive. The software to interface with this device is freely available on the project website, and for our solution we have used NeuroProbe. To detect EOG signals, electrodes must be placed at both sides of the eyes and in the middle of the forehead. We have developed a headband on which these electrodes are mounted, so that the electrodes can be placed easily and without much inconvenience to the participant. The EOG hardware receives EOG potentials, which are in the range of 10-100 microvolts, and these potentials are amplified, modulated and transmitted to the computer through the sound card. At the computer, the NeuroProbe software demodulates the signal and filters out unwanted components, such as 50 Hz AC interference and electromyography (EMG) signals, recovering the original EOG waveforms. For capturing GSR signals we used a LEGO Mindstorms brick based solution [16] (Fig. 4). This unit costs about USD 100, whereas commercial GSR recorders cost more than USD 1000. The electrodes are wrapped around the middle and index fingers of the participant's left hand, so that the right hand is free for other tasks, such as controlling the mouse. Although the brick can take readings at a rate of about 40 samples/second, the transmission to the computer is through an IR link, so the achievable transmission rate is about 2 samples/second. To reduce errors, we have implemented a Gaussian smoother in the brick software. Moreover, the readings are represented as a value between 0 and 1023, called the raw value, and the relationship between the actual skin resistance (SR) and the reading is given by reading value = 1023 * SR / (SR + 10000). We considered this rate sufficient because emotion research usually analyses response windows of 1 to 10 seconds [17]. Finally, the EOG signals received from NeuroProbe and the GSR signals received from the brick are fused in software developed by us. This software is also capable of annotating the signals based on media events, such as media transitions and user defined events. The recorded signals are then analysed using the MATLAB signal processing toolbox [43].
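As an illustration of the GSR processing described above, the following Python sketch inverts the stated relationship, reading value = 1023 * SR / (SR + 10000), to recover SR from a raw value, and applies a small Gaussian smoother. It is not the brick software itself: the kernel width (`sigma`, `radius`), the edge handling, the assumption that the constant 10000 is in ohms, and all function names are choices made for this sketch.

```python
import math

R_REF = 10000.0    # constant from the reading formula (assumed to be in ohms)
FULL_SCALE = 1023  # largest raw value


def resistance_to_raw(sr: float) -> float:
    """Forward relation from the text: raw = 1023 * SR / (SR + 10000)."""
    return FULL_SCALE * sr / (sr + R_REF)


def raw_to_resistance(raw: float) -> float:
    """Inverse relation, solved for SR: SR = 10000 * raw / (1023 - raw)."""
    if not 0 <= raw < FULL_SCALE:
        raise ValueError("raw value must satisfy 0 <= raw < 1023")
    return R_REF * raw / (FULL_SCALE - raw)


def gaussian_kernel(sigma: float, radius: int) -> list:
    """Normalised Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_smooth(samples: list, sigma: float = 1.0, radius: int = 2) -> list:
    """Smooth a sequence; samples beyond the ends are replaced by the edge value."""
    kernel = gaussian_kernel(sigma, radius)
    n = len(samples)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)  # clamp index to valid range
            acc += w * samples[idx]
        smoothed.append(acc)
    return smoothed


if __name__ == "__main__":
    raw_readings = [500, 505, 620, 510, 508, 507]  # made-up raw values
    smoothed = gaussian_smooth(raw_readings)
    print([round(raw_to_resistance(r)) for r in smoothed])
```

Smoothing is applied to the raw values before conversion here; because the raw-to-resistance relation is nonlinear, smoothing after conversion would give slightly different results.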

IV. RESULTS
The experiments were conducted using six volunteers labelled A, B, C, D, E and F (Fig. 5). The multimedia interaction session consisted of several multimedia types, labelled I01, I02, etc. Table I gives a brief description of the multimedia interactions used in the experiment. After the computer-based multimedia interactions, the recording was continued for a while without informing the subject. For each subject and each interaction, GSR and EOG signals were recorded. The letter G represents a GSR waveform. The time is measured in seconds.
Fig. 6 shows the GSR waveforms recorded for each subject over the interactions I01, I02 and I03.
Graphs (a) and (c) in Fig. 6 show some relationship between the GSR signal waveforms of the subjects. However, graph (b) shows neither much change in its signal waveforms nor a clear relationship between the signals. In order to check the quality of visual attention over the three types of multimedia interactions, I01, I02 and I03, the mean signal magnitudes of the EOG waveforms were calculated for each subject and tabulated in Table II.
The results in Table II show that the interaction I03 has the lowest average EOG value compared to the interactions I01 and I02. Table III gives the mean values of the EOG signal waveforms obtained for the interactions I03, I05, I06, I08 and I13, and compares the increase in average EOG signal magnitude for the interaction I13 with respect to the interactions I03, I05, I06 and I08 (denoted as I03..8). From Fig. 7 and the results in Table III it is apparent that the average EOG signal magnitude for the off-screen interaction is 1.5 to 3.5 times the average EOG magnitudes for the on-screen interactions.
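The on-screen versus off-screen comparison can be sketched in Python as follows. The sketch assumes that "mean signal magnitude" means the mean of the absolute sample values, which is one plausible reading and not necessarily the exact computation behind Table III; the function names and the sample values are illustrative only.

```python
def mean_magnitude(samples):
    """Mean absolute value of one EOG recording (assumed definition)."""
    if not samples:
        raise ValueError("recording is empty")
    return sum(abs(s) for s in samples) / len(samples)


def off_to_on_ratio(off_screen, on_screen_recordings):
    """Ratio of the off-screen mean magnitude to the average on-screen one.

    on_screen_recordings is a list of recordings, e.g. one per on-screen
    interaction (I03, I05, I06, I08); their mean magnitudes are averaged.
    """
    on_means = [mean_magnitude(r) for r in on_screen_recordings]
    return mean_magnitude(off_screen) / (sum(on_means) / len(on_means))


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    on_screen = [[1.0, -1.0, 1.0, -1.0], [2.0, -2.0]]
    off_screen = [3.0, -3.0]
    print(off_to_on_ratio(off_screen, on_screen))
```

A ratio clearly above 1 for a recording would then indicate larger eye excursions than those seen while the gaze stays within the screen.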
Fig. 8 shows the GSR waveforms recorded for each subject over the interactions I05, I08 and I13. Table IV gives the means calculated for each of the GSR signals. From Fig. 6 (c) and Fig. 8 (b) it can be observed that the GSR waveforms of all the participants show a similar pattern over the time segment 150-200 seconds of the interaction I03 and the time segment 25-35 seconds of the interaction I08. During these time segments the participants were watching the exciting events contained in those multimedia documents, and from their facial expressions it was observed that they became excited for a while.
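The similarity between participants noted above was judged by inspecting the waveforms. One way it could be quantified, shown here only as an assumption-laden sketch, is the Pearson correlation between two participants' GSR samples within the same time segment. The 2 samples/second rate is the transmission rate of the GSR unit; the function names and the synthetic traces are hypothetical.

```python
import math


def segment(samples, rate_hz, start_s, end_s):
    """Samples recorded between start_s and end_s seconds (end exclusive)."""
    return samples[int(start_s * rate_hz):int(end_s * rate_hz)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rate = 2  # samples per second
    # Synthetic traces: both rise after t = 25 s, with different baselines.
    gsr_p = [400 + (5 * (i - 50) if i > 50 else 0) for i in range(100)]
    gsr_q = [520 + (3 * (i - 50) if i > 50 else 0) for i in range(100)]
    print(pearson(segment(gsr_p, rate, 25, 35), segment(gsr_q, rate, 25, 35)))
```

Because the correlation ignores offset and scale, it compares the shape of the responses rather than the absolute skin resistance, which differs between individuals.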
From Fig. 8 (a) it can be observed that the GSR waveforms of participants B and D show very similar patterns and that, except for participant E, all the GSR waveforms show little variance. During this interaction the participants were watching the video lecture, and the facial expressions of all participants except E showed that they were concentrating on the interaction. Participant E, however, was observed making some body movements.
Fig. 8 (c) shows the GSR waveforms when the participants are not attending to on-screen interactions. From Table IV it is apparent that, for all the participants, the off-screen interaction gives lower mean GSR signal values than the on-screen interactions. The emotionally significant interactions, i.e. I03 and I08, give moderate mean GSR signal values, and the video lecture interactions, i.e. I05 and I06, give the highest mean GSR signal values.
Fig. 9 shows the GSR waveforms for subjects B and D over the interactions I05 and I06.
During the interaction I05 it was observed from their facial expressions that both participants B and D were concentrating on the interaction. However, when the interaction was repeated (i.e. I06), bored (inattentive) behaviour was observed. The bored behaviour was distinguishable by periodic rapid eye activity and frustrated facial expressions. Fig. 10 shows instances of EOG waveforms when the participant is concentrating during the interaction I05 and falling into a bored and drowsy (inattention) state during the interaction I06. Table V gives the means and standard deviations calculated over windows of 10 seconds of the EOG signal waveforms, for randomly selected instances of the interaction I05 and for instances of the interaction I06 in which concentration type (active) and drowsy type behaviours were present.
From the results in Table V it can be seen that in most situations the EOG mean and standard deviation values for the interaction I06 are about 25% higher than the values for the interaction I05.

V. DISCUSSION
The results in Table III show the effectiveness of using EOG signals to differentiate the users' attention to on-screen interactions from off-screen interactions. Further, the results support the appropriateness of using EOG and GSR signals to distinguish human involvement with multimedia interactions. Multimedia documents containing emotionally significant events can result in more active involvement, which can be identified from GSR signals with higher variances, moderate mean GSR values, and changes in the GSR pattern that correlate with the emotional events in the media. The GSR waveform becomes smoother and reaches a higher mean value when the human is concentrating on an interaction with little or no emotionally significant content. However, this type of involvement can also fall into inattention if the human finds the interaction boring. Since GSR alone cannot distinguish concentration from inattention, the EOG signal patterns are analysed in fixed-size windows. The results show that under inattention the EOG waveforms have higher mean and standard deviation values than under the concentration type of involvement. The off-screen type of involvement was easily distinguished by higher magnitude EOG signal waveforms and low GSR mean values.
Apart from the involvement types, the results also indicate the significance of having auditory content in addition to visual content in a multimedia interaction for improving human involvement. However, for a robust conclusion, more focused research is required to identify the correlation between different media types and the resulting types of involvement.
The work reported in this paper has considered only a limited set of psychological factors. For a more complete investigation, factors such as gender differences, age and cultural aspects also need to be considered. Moreover, our experiments were conducted using low cost hardware/software with many limitations. Although low cost hardware/software is more realistic when practical use is considered, on the negative side it hinders the ability to identify more psychophysiological patterns with respect to human involvement in multimedia interactions. This was evident from the GSR signals recorded for participant C, where the readings did not show much variance.
As a future continuation of this work, an application, for instance for e-learning, can be developed that is able to determine the user's involvement and to change the presentation dynamically, giving the human a pleasant multimedia experience while avoiding negative psychological conditions, such as boredom and fatigue.

ACKNOWLEDGMENT

picture without auditory content
I03   A video clip containing an exciting event
I05   A video lecture without exciting events
I06   Repeat of the same interaction I05
I08   A video clip containing an exciting event
I13

Fig. 7 shows samples of EOG waveforms recorded for each subject over the interactions I03 and I13. The letters L and R represent left and right eye EOG waveforms respectively. For instance, B05L corresponds to the left eye EOG waveform for the interaction I05 for the subject B.

Fig. 9. GSR waveforms for the interactions I05 and I06 of (a) participant B and (b) participant D.

Fig. 10. EOG waveform instances of the subject B (a) concentrating over the interaction I05, (b) active during the interaction I06, and (c) drowsy during the interaction I06.
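The fixed-window EOG analysis, in which means and standard deviations are computed over windows of 10 seconds, can be sketched in Python as below. The EOG sampling rate is not stated in the text, so `rate_hz` is a free parameter; the use of non-overlapping windows and of the population standard deviation are assumptions of this sketch, and the function name is illustrative.

```python
import statistics


def window_stats(samples, rate_hz, window_s=10.0):
    """(mean, standard deviation) for consecutive non-overlapping windows.

    Trailing samples that do not fill a whole window are ignored.
    """
    size = int(rate_hz * window_s)
    if size < 2:
        raise ValueError("a window must contain at least two samples")
    stats = []
    for start in range(0, len(samples) - size + 1, size):
        window = samples[start:start + size]
        stats.append((statistics.fmean(window), statistics.pstdev(window)))
    return stats


if __name__ == "__main__":
    # Synthetic signal at an assumed 1 sample/second: a calm window followed
    # by a window with larger swings.
    signal = [0.0, 2.0] * 5 + [0.0, 4.0] * 5
    for mean, std in window_stats(signal, rate_hz=1):
        print(round(mean, 2), round(std, 2))
```

Comparing the per-window values of two interactions, for example as a relative increase, gives the kind of figure reported for I06 against I05.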

TABLE I. MULTIMEDIA INTERACTIONS USED IN THE EXPERIMENT

TABLE II. MEAN SIGNAL MAGNITUDES OF EOG WAVEFORMS OF EACH PARTICIPANT OVER INTERACTIONS I01, I02 AND I03

TABLE III. A COMPARISON OF MEAN SIGNAL MAGNITUDES OF EOG WAVEFORMS BETWEEN ON-SCREEN INTERACTIONS AND AN OFF-SCREEN INTERACTION