Is Alexa Happy or Angry? Perceptions and Attributions of Emotional Displays of Smart Technologies in Residential Homes

By inergency On Mar 26, 2024

[ad_1]

1. Introduction

Virtual assistants such as Amazon’s Alexa, Microsoft’s Cortana, and Apple’s Siri are increasingly used in residential and commercial homes to control various functions [1]. For example, many residents use their virtual assistants to control their residential temperature [2,3]. Virtual assistants provide an important source of influence that has the potential to affect perceptions and behaviors of their users including sustainable and environmentally friendly behaviors. A defining feature through which computers act socially is through providing spoken feedback messages to users’ requests in a veneer which mimics human conversation [4]. A question arises as to whether these computer messages can influence subsequent behaviors of their users and what the underlying psychological processes are that drive potential behavior changes [5]. Research indicates that individuals use the behavior of computers to form inferences about feelings, goals, motives, and intentions of the computer as a social actor [6]. These inferences can influence subsequent user behavior in a systematic way, even when users are aware that their interlocutor is a computer [7,8,9,10].

Energy consumption in households plays a significant role in meeting environmental goals [11,12]. Thus, it is important to understand which messages of virtual assistants are most effective and efficient in helping residents to meet their energy goals. In a recent study, Kim et al. [3] observed that feedback delivered through voice assistants led to decreased energy usage by influencing residents to change the temperature settings to more efficient schedules. Kim et al. conducted a longitudinal study with residents in a multi-unit residential community, in which they introduced an energy-saving program using a modeling approach for personalized eco-feedback. The personalized feedback was provided through visual (wall-mounted tablet) and voice (Alexa) user interfaces. Kim et al. observed an increase in the indoor temperature during the cooling season compared to a baseline measure that speaks to the potential of the energy-saving program to increase sustainable behavior.

We set out to contribute to this literature on behavior-based feedback by looking at the emotionality of the spoken message of virtual assistants as one specific characteristic of verbal feedback. The mimicry of human emotion is one particular computer behavior which has the potential to influence human behavior, as emotional messages that provide feedback can be particularly persuasive [13]. However, as a prerequisite, to have the intended effects on energy saving behaviors, Alexa’s emotional messages have to trigger at least three basic social cognitions: (1) the emotional displays (e.g., the expression of happiness or anger) have to be identified by residents; (2) residents have to correctly identify their behavior as a target of the emotional display; and (3) residents have to attribute the emotional display to that behavior.

In the remainder of this paper, we first introduce research on attribution theory and emotions in computers and virtual assistants and advance five hypotheses about users’ social cognitions, including two hypotheses about the role of distinctiveness in attributions that are informed by Kelley’s covariance theory. Next, we report two studies that tested the proposed hypotheses and identified conditions that triggered the described three basic social cognitions. Whereas Study 1 tested the first three hypotheses related to the proposed basic social cognitions, Study 2 included the hypotheses on distinctiveness. We conclude by discussing the limitations of the provided studies and offering questions for future research.

2. The Scenario of Residential Internet of Things (IoT): Attributional Outcomes

One common situation in which virtual assistants are regularly used and in which a virtual assistant’s emotional displays may, thus, impact user behavior are interactions in which individuals regulate their residential internet of things (IoT) [14]. Prior research suggests that the behavioral impact of emotional displays in situations like this is dependent on the inferences and affective reactions of users [15]. For emotional displays to influence a user, the device does not have to directly interact with a user, but it can also exert influence if interactions and its use are observed by a user [15].

To study how actors and observers attribute emotional displays of virtual assistants, a context was defined in which individuals would utilize a virtual assistant in order to manipulate an IoT device. In this temperature regulation scenario, individuals imagine they are residents of an apartment complex and regulate their residential temperature by making requests for the virtual assistant to manipulate the temperature via a thermostat in the virtual assistant’s control. Individuals in a residence can ask for a variety of temperature changes and can make such requests multiple times as they decide to regulate the temperature of their residence. In response to the users’ requests, a virtual assistant can proffer a confirmation of a temperature change, provide extraneous information to a user’s request, and can display a facsimile of an emotional state.

Feedback from a virtual assistant offers an avenue to influence and shape users’ subsequent behavior: research suggests that the impact of emotional displays will depend on the social inferences made about the cause of the emotional display [15,16]. If individuals infer that a user has caused an emotional display with a specific request, this may affect whether that individual will make the same request in the future. However, individuals can also make attributions to other causes [16]. In the described scenario, four reasonable attributions can be distinguished: individuals can make situational attributions of the emotional display to a specific request made by users (e.g., Alexa is happy about my temperature change); to a specific user (e.g., Alexa likes me); or to extraneous information injected into the interaction by the virtual assistant (e.g., Alexa provides a weather report). Last but not least, users can also make a dispositional attribution to the virtual assistant’s general disposition towards displaying the emotional state featured in the display. For example, when observing that Alexa is providing feedback in a happy manner, users may assume that Alexa is always happy and always provides feedback in a happy manner. Effects of feedback that are attributed to a disposition of Alexa may be different from a situation in which users attribute the display of happiness to their specific behavior and, thus, interpret the emotional display as positive feedback on their behavior.

3. The Display of Emotions by Virtual Assistants

To have the intended effect on residents’ behavior, emotional displays have to be identified by residents. Thus, producing effects of emotional displays on energy-saving behaviors of residents entails the task of designing a variety of distinguishable emotional displays. The virtual assistant has to display emotions in a way such that observers can correctly identify the portrayed emotion of the assistant. This task is not trivial, as non-verbal auditory communication can complement or contradict verbally stated emotions. As part of this project, we developed a method to program a virtual assistant in a way that would display the emotion of happiness and anger. The virtual assistant was an Amazon Alexa, programmed with different voices to appear as a unique virtual assistant. Two male voices and two female voices were used. Happiness and anger conditions were varied through the virtual assistant’s response to user requests, where the response contained an expression of happiness or anger through the use of semantics, rate of speech, volume, and pitch.

In happiness conditions, the virtual assistant said “I am happy with…” with an increase in pitch (out of three options in Amazon’s Speech Synthesis Markup Language), an increase in volume, and at 5% reduction in speed. In anger conditions, the virtual assistant said “I am annoyed with…” with a decrease in pitch (out of three options in Amazon’s Speech Synthesis Markup Language), an increase in volume, and a 5% reduction in speed.

As one important goal, the study aimed to find out if the emotional displays of happiness and anger of an Alexa that is programmed in this way and explicitly expresses that it is happy or angry about an object will be correctly identified, yielding the first hypothesis:

Hypothesis 1 (H1).

Emotional displays of happiness and anger that are programmed through the use of semantics, rate of speech, volume, and pitch are identified and recognized by users of Alexa: happy displays are being judged as happier than angry displays and vice versa.

4. Emotional Display Targets and Attributions

To have the potential to influence behavior, emotional displays do not only have to be identified; an observer also has to be able to infer that the emotional display targeted the behavior that should be repeated (positive response) or changed (negative response) and attribute the emotion to that behavior. The target of an emotion should not be confused with the cause or attribution of an emotion. The target of an emotion such as happiness or anger refers to the object of the emotion, that is, what an agent is happy or angry about. Conversely, the attribution refers to the reason or cause of why an agent showed the respective emotion. An individual can infer that an expressed target of an emotional display is synonymous or divergent from the cause of a display [14]. For instance, after observing a virtual assistant telling a user that a request—such as a request to change the temperature of a thermostat or to play some music—made it happy, the user (or someone observing the user) can infer that the virtual assistant is indeed expressing happiness at the user’s request or can infer that the virtual assistant is simply happy towards everything the user does. This plays out in behavioral outcomes, in that a display of happiness in response to a user’s request might encourage a user to make a specific request again, or simply encourage them to use the virtual assistant more in general.

4.1. Hillebrandt and Barclay’s Study

Hillebrandt and Barclay [16] conducted a series of studies that provide insights about the attribution of emotional displays in the context of social interactions that are relevant to understanding how interactants may attribute emotional displays of computers. In their studies, they used the context of a negotiation task to investigate the attributions individuals made to a negotiation partner’s emotional displays of either happiness or anger and how their behaviors (namely, their concessions in the negotiation task) changed as a result of their attributions.

As a key part of their studies, Hillebrandt and Barclay [16] manipulated the target of an emotional display such that the display had either an integral or incidental target and tested if the change in target impacted whether participants attributed the emotional display to their own behavior or not. They defined an integral target as an object of an emotional display that is directly related to the immediate interaction, whereas an incidental target refers to an object of an emotional display unrelated to the interaction at hand. In their studies, participants engaged in a negotiation game with a confederate. Between rounds, the participant was made to overhear a telephone call made by the confederate. In this call, the confederate made an emotional display of either happiness or anger, saying, “Yes, I know I sound [angry/happy]”. This was then followed by an emotional display with either an incidental target or an integral target where the confederate said either, “I feel [angry/happy] because of something that happened outside of work,” or “I feel [angry/happy] because of the offer I just received”.

Hillebrandt and Barclay [16] found that when the target of the emotional display was perceived to be integral to the interaction (offer-related) as opposed to incidental (work-related), participants were more likely to attribute the cause of their negotiation partner’s emotional display to their own behavior. Additionally, Hillebrandt and Barclay found that when the negotiation partner’s emotional displays had integral targets, participants made greater concessions than when the emotional displays had incidental targets. While angry displays predicted greater self-attributions from the participant than happy displays, emotional displays with integral targets led to greater self-attributions than displays with incidental targets for both happy and angry displays. Furthermore, when participants attributed the cause of the negotiation partner’s display of anger to their own behavior, the participants were significantly more likely to concede greater amounts in the negotiation task.

This would imply that, when faced with emotional displays from a virtual assistant, individuals are more likely to attribute the display of emotion to a user’s behavior when the emotional display has an integral target. Specifically, in an interaction with a virtual assistant wherein a user makes a request of the virtual assistant, one such integral target would be the user’s request. In such a situation, Hillebrandt and Barclay’s findings suggest that individuals are more likely to attribute the virtual assistant’s emotional display to the user’s behavior—namely, the request the user made—than in a situation in which the emotional display has an incidental target.

This rationale yielded the following hypotheses:

Hypothesis 2 (H2).

Incidental and integral targets of an emotional display can be correctly identified and remembered by at least 90% of users of a virtual assistant and observers of an interaction of a user with a virtual assistant.

Hypothesis 3 (H3).

If a virtual assistant’s emotional display targets a specific request and is, thus, integral to an interaction, individuals will be more likely to attribute the cause of the emotional display to the specific request, in comparison to when the target of a virtual assistant’s emotional display appears incidental to the interaction.

We used a 90% criterion in Hypothesis 2 as an aspiration level that would suggest that the message and context triggered the intended identification of targets. In addition to looking at the effects of the emotional display and the target, we also aimed to explore the role of the distinctiveness of the displayed emotion, which has been shown to directly affect attributions and to qualify the effects of targets on social inferences [17] (see Figure 1).

4.2. Kelley’s Covariance Theory: The Role of Distinctiveness

Kelley’s [17] covariance theory of attribution suggests that attributions result from the covariance over time between a repeated behavior and potential causes which the behavior could be attributed to. Kelley proposed that the dimensions of distinctiveness, consistency, and consensus account for this covariance. Kelley’s theoretical framework suggests that individuals attribute the cause of a behavior such as an emotional display either to the disposition of the person displaying the behavior or to situational features based on whichever appears to be the most distinct and consistent covarying feature across instances of the behavior [18,19,20]. In Kelley’s theory, distinctiveness refers to situational features that co-occur most closely with the behavior in question. As an example, if a virtual assistant only responds to specific requests with displays of happiness, then the most distinctive covarying situational feature is that specific request. Conversely, if a virtual assistant were to respond to requests with displays of happiness but only when its owner made the query, as opposed to other users, then the owner would be the most distinctive feature covarying with the display of emotion over time. Consistency refers to the strength of the covariance, that is, the degree to which the behavior is only present along with the most distinct situational feature with which the behavior co-occurs. Finally, consensus as the third dimension refers to the degree to which the observer believes that the behavior would be displayed by other social actors in the same situation [17,18,20].

Of the three dimensions proposed by Kelley, distinctiveness addresses both the temporal nature of information influencing attributions as well as the variability in potential causes which an emotional display can be attributed to. As such, a primary focus of the current work towards predicting how emotional displays are attributed over time was in the understanding of the role of distinctiveness and how distinctiveness interacts with the manipulated target.

In the case of individuals interacting with a virtual assistant, a variety of users could submit a variety of requests to the virtual assistant. If the virtual assistant were to subsequently display an emotion, Kelley’s theory [17] would suggest that the cause an individual would attribute to the virtual assistant’s emotional display would be based on the individual’s experience observing that particular emotional display from the virtual assistant in the past. Research suggests that the specific attributions individuals make are context-specific and attributions to any specific dispositional or situational causes do not preclude attributions to other causes [21,22]. Based on this rationale, the following two additional hypotheses on the role of distinctiveness were advanced:

Hypothesis 4 (H4).

If the virtual assistant’s emotional displays are distinct (with regard to the user or an integral or incidental target), observers of the assistant’s interactions with users recognize the dimension of distinctiveness (in terms of the user or an integral or incidental target) in the displayed emotions.

Hypothesis 5 (H5).

If a virtual assistant’s emotional display targets a specific request in a distinct way, individuals will be more likely to attribute the cause of the emotional display to the specific request, in comparison to when the target of a virtual assistant’s emotional display appears incidental to the interaction or when the emotional display is not distinct.

5. Overview of Current Research

To test the proposed hypotheses and to explore the role of dispositional attributions, we conducted two studies that included a variety of different measures of participants’ attributions of Alexa’s emotional displays. Both studies involved a scenario wherein users interacted with a virtual assistant to regulate the temperature of their residence. Participants were put in the role of an observer, watching videos of confederates interacting with a virtual assistant. The studies set out to test whether individuals are able to recognize the emotional display of Alexa and the target of Alexa’s messages. In addition, the studies aimed to find out when individuals make dispositional attributions and when individuals make situational attributions while observing virtual assistants’ emotional displays to test for a relationship between the target of a virtual assistant’s emotional displays and attributions to user requests.

Recognition of emotional display of speaker and receiver [23] goes hand in hand in detecting whether one of the four maxims that govern effective communication [24] are violated. According to Hortensius et al. [25], the arrival of emotionally expressive artificial agents is imminent, yet facial and bodily expressions are the most studied. Recognition of emotions by voice, however, without facial expression, may prove to be more difficult for some populations using standard metrics [26]. We were, thus, interested in exploring to what extent a voice-based virtual assistant’s emotional display (in its imperfect state) affects attributions.

Even though the studies focused on situational attributions, we also included measures on dispositional attributions. Situational attributions attribute a behavior, including the display of an emotion, to causes that are rooted in the situation in which the behavior is shown. Attributing an emotion to a specific request would be an example of a situational attribution. Conversely, dispositional attributions attribute a behavior to the disposition of the actor or the observer.

Research comparing attributions of human and computer behavior suggests that computer behaviors are more prone to being attributed to the disposition of the computer in comparison to interpersonal interactions, in which situational attributions are more often used as primary attributions [5,27,28,29,30]. This same research also suggests that there are some differences in the degree to which individuals make situational attributions about computers, depending on whether the behavior had positive or negative outcomes and consequences [5].

Social inferences about computers can also be influenced by the emulation of social characteristics [31,32]. Specifically, the use of an emulation of a human voice can influence attributions as well. Studies on computer-synthesized voices have found that manipulations of tone and semantic message can impact inferences about the computer’s gender and intentions [31,33,34]. We conducted two studies that aimed to control for these factors while testing the proposed hypotheses. Whereas Study 1 focused on Hypotheses 1–3, Study 2 tested all five hypotheses.

6. Study 1

In Study 1, participants watched a video of users interacting with a virtual assistant and were asked to answer questions about the cause of the virtual assistant’s emotional display (see Figure 2). The study tested Hypotheses 1–3 and explored whether an attribution to the disposition of the virtual assistant was the primary attribution being made across conditions and if emotional displays that used integral targets yielded a higher percentage of attributions to specific requests than displays that used incidental targets.

6.1. Methods

6.1.1. Participants and Design

Amazon’s Mechanical Turk Program was used to recruit participants for the study. Participants were required to have a 95% success rate for other HITs hosted on Amazon’s Mechanical Turk Program and live in the United States. They were compensated USD 1.82 for approximately 15 min of work [35]. Participants were removed from the data analyses if they failed an attention check question or failed to complete the study.

For Study 1, the initial sample was 232 participants. Seventeen participants were not included in the analysis due to failing the attention check, and 21 participants were removed for failing to complete the study, leaving a sample of 194 participants. Of these participants, 126 were male, 66 were female, and 2 chose not to answer. The age of participants ranged from 23 to 78 years with M = 36.74 (SD = 10.92). In terms of education, 7.2% of the participants had graduated from high school or had a GED; 8.8% indicated that they had some college education; 36.6% had a bachelor’s degree; 5.7% had attended some graduate school; 40.7% had a graduate degree; and 1% chose not to answer.

The study varied two factors. First, the emotional display of the virtual assistant was either angry or happy. Second, the emotional display either targeted a request by the user or extraneous information (integral vs. incidental target). The extraneous information referred to HBO or something on the news. The gender emulated by the voice of the virtual assistant included male as well as female speakers. Factors were manipulated in a between-subjects design. As dependent variables, the study measured perceived emotion and target and included a forced-choice measure of primary attribution and an instrument measuring attributions to specific requests.

6.1.2. Procedure

Participants were instructed that they would be watching an interaction between a virtual assistant and a user and would be asked about what they thought of the virtual assistant’s emotional display. Participants were told that the user lives in an apartment and uses the virtual assistant to control the temperature of the apartment. Participants then watched a video displaying a single interaction between the user and Alexa where the user requested a temperature change from the virtual assistant (see Figure 2 for an overview). In all videos, the participant observed a user asking the virtual assistant to change the indoor temperature from 68° to the outdoor temperature of 72°. The virtual assistant responded with an acknowledgement of the temperature change (e.g., “Changed the temperature to 72°”) as well as a piece of extraneous information, an HBO update, or a news update (e.g., “By the way, HBO got new movies”). The virtual assistant also responded with either a display of happy or angry emotion with either the request or the extraneous information as the target (e.g., “I’m happy with that temperature change” vs. “I’m happy with those new movies”). Following the video, participants were asked a series of questions on their perception of emotion in the virtual assistant’s response, perceptions of the virtual assistant’s target, causes attributed to the emotional displays observed, and general demographic questions.

6.1.3. Materials

The study was administered through Qualtrics. The videos showed confederates interacting with a virtual assistant. The virtual assistant was an Amazon Alexa, programmed with different voices to appear as a unique virtual assistant. Two male voices and two female voices were used. Happiness and anger conditions were varied through the virtual assistant’s response to user requests, where the response contained an expression of happiness or anger through the use of semantics, rate of speech, volume, and pitch. Target was varied through the virtual assistant indicating that it was either happy or angry about the temperature change (integral target) or the extraneous information provided by the virtual assistant (incidental target). Two pieces of extraneous information were used to separate the impact of extraneous information from the impact of incidental targets (the virtual assistant telling the confederate that “By the way, HBO got new movies” or “By the way, I got a breaking news update”).

6.1.4. Measures

Perceived Emotions. Instruments on perceived anger and happiness were used to test Hypothesis 1 as the state of emotion being emulated in the virtual assistant’s emotional displays. Namely, a three-item instrument used by Hillebrandt and Barclay [16] was used to measure happiness (“The virtual assistant appeared happy”, “The virtual assistant appeared satisfied”, and “The virtual assistant appeared joyful”); and a three-item measure from their study for anger (“The virtual assistant appeared annoyed”, “The virtual assistant appeared aggravated”, and “The virtual assistant appeared irritated”) was also used. All items used Likert scales ranging from strongly disagree (1) to strongly agree (5). The three items of the perceived happiness scale were subsequently averaged (Cronbach’s α = 0.92), as were the three items of the perceived anger scale (Cronbach’s α = 0.92).

Perceived Target. The perceived target was assessed with a single forced-choice item adapted from Hillebrandt and Barclay [16]. It asked participants to determine whether a virtual assistant’s emotional display was directed either towards (a) the user’s request or (b) something other than their request (“The virtual assistant indicated that its feelings were due to the requested temperature change” vs. “The virtual assistant indicated that its feelings were due to something besides the user’s requests”).

Attributions. The following attributions were measured through instruments adapted from Hillebrandt and Barclay [16]: attributions to specific requests, attributions to a specific user, attributions to the virtual assistant, and attributions to extraneous information provided by the virtual assistant. The specific request attribution scale (Cronbach’s α = 0.88), user attribution scale (Cronbach’s α = 0.83), virtual assistant attribution scale (Cronbach’s α = 0.82), HBO attribution scale (Cronbach’s α = 0.88), and news attribution scale (Cronbach’s α = 0.86) all consisted of three Likert-type items (for example, “The virtual assistant’s feelings were most likely caused by HBO’s new movie selection”, “The HBO update most likely caused the virtual assistant’s feelings”, and “The virtual assistant’s feelings were most likely caused by the HBO movie selections”) ranging from 1 (strongly disagree) to 5 (strongly agree).

An additional forced-choice item asked participants to identify which of the following most likely caused the virtual assistant’s display of emotion: (a) a specific request made by the user, (b) the user regardless of request, (c) the virtual assistant’s programming, (d) HBO’s new movies, or (e) the news.

6.2. Results

6.2.1. Emotion

Hypothesis 1 predicted that participants judge the emotional display in the happy condition to be happier than in the angry condition; likewise, the emotional display in the angry condition was expected to be judged to appear angrier than the display in the happy condition. As perceived happiness and anger were measured independently, two ANOVAs were conducted comparing the respective scores for happiness and anger between conditions for emotion and target. Results showed that the variation of the virtual assistant’s display of emotion was effective. Participants in the happy condition reported higher perceived happiness (M = 3.82, SD = 0.84) than those in the anger condition (M = 3.28, SD = 1.38) in the analysis of perceived happiness (main effect emotion, F(1,190) = 9.88, p = 0.002, η_p² = 0.05). Likewise, participants in the happy condition reported lower perceived anger (M = 2.52, SD = 1.31) than those in the anger condition (M = 3.62, SD = 1.12) in the analysis of perceived anger (main effect emotion, F(1,190) = 38.45, p < 0.001, η_p² = 0.17). The target manipulation did not significantly impact perceptions of happiness (all Fs < 1.18) or anger (all Fs < 0.49).

6.2.2. Targets

While 73 (82.02%) participants in the incidental target condition correctly identified incidental targets, only 44 (41.90%) participants correctly identified integral targets. These findings were independent of the emotional display. Participants in the integral target happy condition made inferences about the target that were similar to those in the integral target angry condition. In the integral target happy condition, correct identification was achieved by 21 (42%) participants, and in the incidental target happy condition, 34 (80.95%) participants correctly identified the emotional display target, while 23 (41.82%) participants in the integral target angry condition and 39 (82.98%) participants in the incidental target angry condition correctly identified the target. Neither the used gender of Alexa nor the variation of extraneous information had an effect on perceived emotion or the perception of targets. Thus, Hypothesis 2 was not supported. Many participants had difficulty identifying and remembering when Alexa referred to the specific temperature change request.

6.2.3. Attributions

Overall, 65 (33.51%) participants chose a specific request as the primary attribution, 19 (9.79%) chose the user, 71 (36.60%) chose the virtual assistant, and 39 (20.10%) chose the extraneous information the virtual assistant provided. When split by emotion, participants were more likely to attribute the cause of the virtual assistant’s emotions to the virtual assistant when it was angry (44, 43.10%) and more likely to attribute it to a specific request when it was happy (27, 29.30%) (anger: χ²(3, 102) = 23.02, p < 0.001; happiness: χ²(3, 92) = 20.52, p < 0.001).

Hypothesis 3 suggested that participants in the integral target condition would make greater attributions to a specific request than those in the incidental target condition. When the virtual assistant responded to user requests with an emotional display that targeted their request (the integral target condition), participants’ attributions to the specific request averaged M = 3.57 (SD = 1.04) across the three items used to assess attributions to specific requests. When the virtual assistant targeted information extraneous to the user’s request (the incidental target condition), participants’ attributions to the specific request averaged M = 3.34 (SD = 1.26). The predicted pattern held for the happy condition wherein the means for those in integral vs. incidental target conditions were significantly different (integral: M = 3.77, SD = 0.81; incidental: M = 3.38, SD = 1.19; t(67.97) = 2.18, p = 0.02) but not for the angry condition (integral: M = 3.38, SD = 1.19; incidental: M = 3.39, SD = 1.29; t(100) = 0.06, p = 48). A two-way ANOVA comparing the effect of target and emotion on attributions to a specific request resulted in insignificant main effects for target (F(1,190) = 2.06, p = 0.15) and emotion (F(1,190) = 0.791, p = 0.38), with no significant interaction between the two (F(1,190) = 2.32, p = 0.13). As participants in the happy condition had made significantly greater attributions to specific requests to emotional displays with integral targets compared to those which had incidental targets, Hypothesis 3 was partially supported.

An additional test of Hypothesis 3 was conducted through assessing the impact of target and emotion on the forced-choice single item question, asking participants to choose the most likely cause of the virtual assistant’s emotional display. When attributions to the two extraneous information sources were condensed into one category, 38 (36.19%) participants in the integral target condition selected user requests as the primary cause of the emotional display, and 27 (30.34%) participants in the incidental target condition selected user request as the primary cause attributed to the emotional display.

As participants had low accuracy in identifying targets, we also tested Hypothesis 3 on participants who were able to correctly identify the target of the virtual assistant’s emotional display (N = 117). Within this group of participants, the average score for attributions to the specific request in the integral target condition (M = 3.31; SD = 1.02) did not systematically differ from the score in the incidental target condition (M = 3.25; SD = 1.32; t(170.58) = 0.29, p = 0.39), resembling the findings for the whole study sample.

6.3. Discussion

Study 1 demonstrated how a virtual assistant’s emotional displays of happiness and anger can be manipulated such that they are identified and recognized by individuals who watch an interaction of an Alexa user with the virtual assistant. This is an important requirement to be able to influence energy-saving behaviors of residents through emotional displays. At the same time, the study revealed that only about 40% of observers correctly identified a user’s request to change the room temperature as the target of an emotional display when Alexa explicitly referred to it while expressing an emotion (integral target condition). As suggested by the literature on attributions concerning human-computer interactions, a significant percentage of attributions were dispositional (36%). Together, attributions towards the virtual assistant and the specific user request accounted for 70% of the chosen attributions. Hypothesis 3 predicted that emotional displays that are integral to an interaction will be more likely attributed to the specific request of the conversation than displays that have an incidental target. There was some support for the hypothesis for happy displays but not for angry displays.

7. Study 2

Study 2 had the goal to test all five proposed hypotheses, including the hypotheses on distinctiveness. Study 2 extended Study 1 by having participants watch not only one video, but a series of videos of three users interacting with the virtual assistant prior to watching a final video of a user–assistant interaction. Participants were then asked to make attributions of cause to the emotional display from the virtual assistant in the final video. This was done in order to assess whether changes in the distinctiveness with which situational features co-occur with the virtual assistant’s emotional displays over subsequent interactions impact participants’ attributions. The use of multiple videos allowed participants to observe combinations of situational features, with emotional displays only occurring when a specific situational feature was present. Specifically, the sequence of videos made it so that either a specific user request, a specific user, or a specific piece of extraneous information (an HBO, news, or software update) was the most distinct situational feature co-occurring with videos containing an emotional display from the virtual assistant. As gender had not significantly impacted attributions in Study 1, it was dropped as a variable for the present study. Study 2 also served the purpose of replicating the main findings of Study 1.

7.1. Methods

7.1.1. Participants and Design

For Study 2, the initial sample had 458 participants. The final sample had 353 participants, as 54 were not included in the analysis due to failing the attention check, and 51 were removed for failing to complete the study. There were 227 male participants, 124 female participants, and 2 chose not to indicate their gender. The age of participants ranged from 20 to 76 years (M = 36.17, SD = 10.12). In terms of education, 6.2% of the participants had graduated from high school or had a GED; 8.5% indicated that they had some college education; 35.4% had a bachelor’s degree; 7.1% had attended some graduate school; 42.5% had a graduate degree; and 0.3% chose not to answer. As with Study 1, participants were compensated USD 1.82 for taking part in the approximately 15 min-long study.

The design had three factors: emotion (happiness vs. anger), target (integral vs. incidental), and distinctiveness (distinct user vs. distinct request vs. distinct extraneous information vs. indistinct). In addition, the study used three different forms of extraneous information (HBO vs. news vs. software). The same measures of attribution (scales for each potential attribution plus a forced choice measure) were used, as in Study 1.

7.1.2. Procedure

As with Study 1, participants were recruited through Amazon Mechanical Turk to take part in a study administered through Qualtrics that tested their perceptions of a virtual assistant. Participants were given the same task introduction as in Study 1 but were told that they would be watching a series of interactions between residents and a virtual assistant before being asked questions about a final interaction. Participants then watched a series of 19 videos where users made requests of the virtual assistant. The number of videos was a result of three users each making six requests of the virtual assistant, followed by a final video which participants were asked questions about.

Four levels of distinctiveness were designed (distinct user, distinct request, distinct extraneous information, and indistinctiveness) to account for variance in potential situational causes over the course of multiple interactions between the virtual assistant and the three users. For each distinctiveness condition, a different situational feature was always accompanied by an emotional display which only occurred when that situational feature was present. The exception was the indistinctiveness condition, in which the virtual assistant used an emotional display in every video.

The use of three users allowed for a condition wherein only one user was responded to with emotional displays. Each user made three requests twice, with each repeated request accompanied by a differing piece of extraneous information, which allowed one of the three requests to be the most distinctive feature in the distinct request condition. The use of three pieces of extraneous information (as opposed to two pieces from Study 1) allowed each piece of extraneous information to be accompanied by two different requests, and thus be the most distinctive situational feature in distinct extraneous information conditions. Using the distinct request condition as an example, across the first 18 videos, participants observed the virtual assistant using emotional displays in six videos, twice for each user and with two differing pieces of extraneous information for each user.

Each video featured a user asking the virtual assistant for one of three temperature changes and receiving a response from the virtual assistant which, as with Study 1, included an acknowledgement of the request, an emotional display, and a piece of extraneous information. As an example of the integral target distinct extraneous information condition, a single video might contain a request to change the temperature to 72°, an acknowledgement of the temperature change, the virtual assistant telling the user it received an HBO update, and then the virtual assistant indicating that it was happy with the new movies. For each distinctiveness condition, if the situational feature was not present, then the video did not contain an emotional display. The order of the videos was randomized across participants. Participants were then given one last video to answer questions about, which contained the situational feature which had distinctly covaried with the emotional display during the 18 videos.

Following that final nineteenth video, participants were asked a series of questions about their perceptions of emotion in the virtual assistant’s response including the perceived emotion and target of the display and its distinctiveness and possible attributions.

7.1.3. Materials and Measures

Study 2 used the same materials as Study 1, albeit with the inclusion of more videos as per the procedure. The same measures for perceived emotion and perceived target were used in Study 2 as were used in Study 1 (happiness scale: Cronbach’s α = 0.88; anger scale: Cronbach’s α = 0.89).

Distinctiveness was measured through a series of four single-item Likert-type scales ranging from 1 (strongly disagree) to 7 (strongly agree), adapted from Tamborini et al. [36]. These distinctiveness items were used for identifying the degree to which users thought that Alexa responded differently to (a) distinct requests (“The virtual assistant responds to certain requests differently than other requests”), (b) distinct users (“The virtual assistant responds to certain users differently than other users”), (c) distinct extraneous information (“The virtual assistant responds to certain updates differently than other updates”), and (d) indistinctiveness in its emotional displays in all interactions (“The virtual assistant doesn’t respond to certain requests, users, or updates any differently than it does to others”).

As with Study 1, attributions were measured through instruments adapted from Hillebrandt and Barclay [16]: the specific request attribution scale (Cronbach’s α = 0.92), the user attribution scale (Cronbach’s α = 0.90), and the virtual assistant attribution scale (Cronbach’s α = 0.82). Attributions to the various pieces of extraneous information were also measured through three instruments depending on the extraneous information being targeted. The HBO attribution scale (Cronbach’s α = 0.91), the news attribution scale (Cronbach’s α = 0.92), and the software update attribution scale (Cronbach’s α = 0.91), all consisted of three Likert-type scales ranging from 1 (strongly disagree) to 5 (strongly agree). The same forced-choice single item assessing primary attributions from Study 1 was used as well.

7.2. Results

7.2.1. Emotion

As with Study 1, two ANOVAs were conducted comparing the respective scores for anger and happiness between conditions for emotion, target, and distinctiveness for Study 2. The variation of the virtual assistants’ emotional displays was effective, and Hypothesis 1 was supported. Participants in the happy condition reported higher perceived happiness (M = 3.77, SD = 0.89) than those in the anger condition (M = 3.27, SD = 1.14), and the analysis checking participants’ perceptions of happiness yielded a main effect for emotion displayed, F(1,337) = 25.26, p < 0.001, η_p² = 0.07. Likewise, participants in the happy condition reported lower perceived anger (M = 2.70, SD = 1.27) than did those in the anger condition (M = 3.57, SD = 0.92), and the analysis using perceived anger as a dependent variable yielded a main effect for emotion F(1,337) = 43.89, p < 0.001, η_p² = 0.12. Thus, Hypothesis 1 was supported.

7.2.2. Target

An analysis of participants’ identification of targets revealed that integral targets were correctly identified in 115 cases (64.61%), and incidental targets were correctly identified in 102 cases (58.29%). This held for both happiness and anger conditions, as 57 (66.28%) participants in the integral target happy condition and 51 (58.62%) participants in the incidental target happy condition correctly identified the target, while 58 (63.04%) participants in the integral target angry condition and 51 (57.95%) participants in the incidental target angry condition correctly identified the target. Thus, different from Study 1, the identification of integral and incidental targets did not systematically differ from each other. As more than one-third of participants did not correctly remember the target of the emotional displays, the 90% criterion of Hypothesis 2 was not met.

7.2.3. Distinctiveness

Participants’ perceptions of distinctiveness were measured on four levels: the degree to which the virtual assistant responded differently to distinct requests, distinct users, distinct extraneous information, or indistinctly responded to all situational features. Except for the distinct request condition, the means for participants’ perceptions of distinctiveness were highest for the feature that was most distinct (distinct request, distinct extraneous information, or distinct user), and perceptions of no distinct specific request, specific user, or extraneous information (indistinctiveness) were highest in the indistinctiveness condition.

In the distinct request condition, participants did not perceive a greater degree of distinctiveness in Alexa’s responses to users’ specific requests (M = 3.33, SD = 1.09) than those in other conditions (M = 3.36, SD = 1.04) F(3,337) = 0.34, p = 0.79. In the distinct user condition, participants perceived a greater degree of distinctiveness of Alexa’s responses between users (M = 3.70, SD = 1.13) than those in other distinctiveness conditions (M = 3.32, SD = 1.23) F(3,337) = 3.62, p = 0.01, η_p² = 0.03. In the distinct extraneous information condition, participants reported a greater degree of distinctiveness in Alexa’s responses to different types of extraneous information (M = 3.76, SD = 0.99) than in the other distinctiveness conditions (M = 3.22, SD = 1.15) F(3,337) = 4.92, p = 0.002, η_p² = 0.04. Finally, in the indistinct condition, participants perceived a greater degree of indistinctiveness in Alexa’s responses (M = 3.58, SD = 1.16) than they did in other distinctiveness conditions (M = 3.11, SD = 1.29) F(3,337) = 4.33, p = 0.01, η_p² = 0.04. Thus, Hypothesis 4 was partially supported. Participants were sensitive to distinctiveness in the responses of Alexa and perceived greater distinctiveness for the variation across users and extraneous information. However, Alexa’s responses to specific requests by users were not perceived as being more distinct.

7.2.4. Attributions

In Study 2, 97 (27.48%) participants chose a specific request as the primary attribution; 32 (9.07%) chose the user; 138 (39.09%) chose the virtual assistant; and 86 (24.36%) chose the extraneous information the virtual assistant provided. Participants averaged M = 3.36 (SD = 1.25) in their attributions to specific requests; M = 3.22 (SD = 1.12) in their attributions to extraneous information; M = 3.63 (SD = 1.02) in their attributions to Alexa’s disposition; and M = 3.24 (SD = 1.25) in their attributions to a specific user.

It was hypothesized that participants’ attributions to user requests in the integral target distinct request condition would be higher than those in other conditions (Hypothesis 5). Participants in the integral target distinct request condition had the highest mean scores for attributions to user requests compared to the other seven combinations of target and distinctiveness conditions (see Table 1; integral target distinct user, integral target distinct extraneous information, integral target indistinct, incidental target distinct request, incidental target distinct user, incidental target distinct extraneous information, and incidental target indistinct). A one-way ANOVA, testing differences across these distinctiveness and target conditions, yielded a significant effect indicating systematic differences among the seven combinations of conditions, F(7,345) = 3.27, p = 0.002, η_p = 0.06. A Bonferroni post-hoc test revealed that the attributions to user request in the integral target distinct request condition were significantly different from the incidental target distinct request condition in a direct comparison (p = 0.04). In accordance with Hypothesis 5, in the integral distinct request condition, participants attributed requests more to specific user’ requests (M = 3.86; SD = 0.85) than in the remaining conditions (overall M = 3.33; SD = 1.27; t(27.67) = 2.78, p

7.3. Discussion

Study 2 replicated the test of Hypothesis 1 from Study 1 indicating that participants were able to distinguish between happy and angry emotional displays. Even though more participants correctly identified the intended integral target in Study 2 than in Study 1, Hypothesis 2 was again not supported as more than one-third of participants selected a wrong target.

Emotion target had a main effect on attributions to requests. Distinctiveness was found to significantly impact attributions to specific users in the angry condition but did not significantly impact attributions to specific requests or the virtual assistant. These findings, as well as limitations of the study which qualify these findings, are discussed below.

Whereas Study 1 had only found an effect for emotion targets on attributions in the happy condition, emotion targets had significant impacts on attributions to specific requests regardless of emotion over repeated interactions and had a marginally significant impact on attributions to the virtual assistant’s disposition in Study 2. This suggests that repeated experience may amplify the effect of emotion targets. It also suggests that the impact of integral emotion targets on user attributions can matter even in the face of situational evidence. Distinctiveness had a significant effect on the degree to which individuals attributed angry displays of emotion to specific users. A post-hoc test comparing all combinations of emotion target and distinctiveness showed a significant difference in attribution requests between integral and incidental targets for the distinct request condition.

Distinctiveness appears to affect situational attributions to virtual assistant’s emotional displays based on the emotion being displayed but has limited effects on dispositional attributions. This does provide support for the assumption that it is not just Kelley’s covariances, but also the content of communication, that influences attributions.

8. Conclusions

The reported studies have implications for our understanding of the role of emotional targets, Kelley’s attribution theory, and human–computer interactions with virtual assistants such as Alexa. The studies suggest that emotional display targets can affect observer attributions and that repeated exposure can make the attributions suggested by these targets more convincing to observers.

Collectively, these studies suggest that individuals often attribute emotional displays from virtual assistants to the virtual assistants themselves, and that there are circumstances in which individuals attribute the cause of emotional displays more to a user’s behavior when the virtual assistant’s emotional display target is relevant to the user’s behavior. In other words, the reported studies suggest that individuals see user input as impacting the virtual assistant’s displays of emotion when a virtual assistant’s emotional displays appear relevant to their requests, especially if it is backed by a consistent use of emotional displays. This aligns with work by Nass [34] and Edwards et al. [6] suggesting that individuals can infer characteristic tendencies in computer interactants and adds to this line of research that these tendencies are also the primary attribution for computer behaviors.

On the whole, participants were able to identify the presented emotional displays, which provides an important step for future research. Moreover, Study 1 and Study 2 both found significant differences between integral and incidental targets for happiness. Additionally, anger qualified the distinctiveness perception in Study 2 and anger displays were more recognizable to participants than happy displays, based on perceived emotions across both studies. As such, there may also be expectations about virtual assistant’s use of emotions that have built norms over how happiness is displayed and directed by virtual assistants. In future studies, it would be interesting to investigate the role of expectations in attributions of emotional displays, such as whether the attributional impact of unusual and unexpected emotional behaviors from virtual assistants differs from expected displays.

The presented studies have several limitations. One limitation refers to the demographics of the studied samples. The educational background of the Mechanical Turk samples indicates that the samples were composed of a disproportionately high percentage of participants with graduate school education. As such, this may have influenced the attitudes and norms that participants had towards virtual assistants prior to the study, as participants from differing educational backgrounds may have differing attitudes towards technology, or may differ in the degree to which they attribute emotional displays to situational vs. dispositional causes. Additionally, socioeconomic status and cultural variables were not assessed. Moreover, the studies did not require participants to be native English speakers, which may have influenced participants’ perceptions of emotional displays in English. Collectively, these three variables (education, cultural background, and socioeconomic status) could all affect the preconceptions and reactions individuals have towards virtual assistants, their recognition and interpretation of the spoken emotional displays used in these studies, and the degree to which social attributes are assigned to computers in general. Another important limitation of the presented studies refers to the use of a simulated environment in which participants observe thermostat users interact with a digital assistant. In future studies, it would be important to test the proposed hypotheses with residents interacting with digital assistants instead of observers of users that control their thermostats using digital assistants and receiving spoken feedback (see Kim et al. [3] for a study using digital assistants that provide feedback to residential thermostat users).

As a prerequisite, to have an intended effect on energy saving behaviors, Alexa’s emotional messages have to trigger at least three basic social cognitions: (1) the emotional displays (e.g., the expression of happiness or anger) have to be identified by residents; (2) residents have to correctly identify user inputs as a target of the emotional display; and (3) residents have to attribute the emotional display to that behavior. The described studies identified conditions that triggered these three basic social cognitions in a simulated environment. Whereas the variation and implementation of the emotional displays of happiness and anger were successful across various conditions in both studies (1), more research is needed to identify more effective references to integral targets that are perceived and remembered by potential residents (2), and to identify and delineate more effective and robust methods to trigger inferences that attribute Alexa’s emotional displays to a specific request if this is the focal target of Alexa’s emotional display (3).

Prior research suggests that emotional displays can improve the acceptance of computers as social agents and has issued calls for greater investigation into computer-generated emotions as an avenue for better understanding consumers’ trust in computers [37,38]. As applied to computers’ impact on environmentally friendly choices by consumers, virtual assistants’ ability to leverage emotions may improve their ability to develop sustainable behaviors by providing effective feedback and, thus, can help shape and reinforce users’ residential energy choices.

While the conducted studies did not explicitly test the persuasive impact of virtual assistants’ displays of emotion on energy-saving behaviors, the reported findings suggest that the target of an emotional display, and not just the emotional label used, can have an effect on the inferences that individuals make about virtual assistants’ emulations of emotion. This is important because individuals’ reactions to human emotional displays vary based on the perceived object and cause of an emotional display [16]. Replicating and extending these findings on emotional displays of humans in the context of a virtual assistant provides an important puzzle piece to understand if the persuasive effect of emotions might also depend on similarly nuanced message features coming from virtual assistants. A next step to advancing this research would be to replicate the reported observer inferences and attributions with participants as users rather than observers.

As applied to sustainability contexts, positive emotions expressed by a virtual assistant may ratify or fulfil the anticipated positive emotions of individuals seeking to make environmentally conscious choices about residential energy usage [39,40].

Author Contributions

Conceptualization, H.B., T.R. and J.R.; Methodology, H.B. and T.R.; Software, H.B. and D.Z.; Validation, H.B. and D.Z.; Formal analysis, H.B. and T.R.; Resources, H.B. and D.Z.; Writing—original draft, H.B. and T.R.; Writing—review & editing, H.B., T.R and J.R.; Funding acquisition, T.R. and J.R. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Science Foundation under Grant NSF1737591 to Torsten Reimer and Julia Rayz.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Institutional Review Board of Purdue University (protocol code IRB-2022-80).

Informed Consent Statement

Informed consent was obtained from all participants involved in the study.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

Hoy, M.B. Alexa, Siri, Cortana, and more: An introduction to voice assistants. Med. Ref. Serv. Q. 2018, 37, 81–88. [Google Scholar] [CrossRef]
Johnson, N.; Reimer, T. The adoption and use of smart assistants in residential homes: The matching hypothesis. Sustainability 2023, 15, 9224. [Google Scholar] [CrossRef]
Kim, H.; Ham, S.; Promann, M.; Devarapalli, H.; Bihani, G.; Ringenberg, T.; Kwarteng, V.; Bilionis, I.; Braun, J.E.; Taylor Rayz, J.; et al. MySmartE—An eco-feedback and gaming platform to promote energy conserving thermostat-adjustment behaviors in multi-unit residential buildings. Build. Environ. 2022, 221, 109252. [Google Scholar] [CrossRef]
Ram, A.; Prasad, R.; Khatri, C.; Venkatesh, A.; Gabriel, R.; Liu, Q.; Nunn, J.; Hedayatnia, B.; Cheng, M.; Nagar, A.; et al. Conversational ai: The science behind the Alexa prize. arXiv 2018, arXiv:1801.03604. [Google Scholar] [CrossRef]
Edwards, A.; Edwards, C. Does the correspondence bias apply to social robots?: Dispositional and situational attributions of human versus robot behavior. Front. Robot. AI 2021, 8, 788242. [Google Scholar] [CrossRef]
Van Kleef, G.A. The emerging view of emotion as social information. Soc. Personal. Psychol. Compass 2010, 4, 331–343. [Google Scholar] [CrossRef]
Nass, C.; Steuer, J.; Tauber, E.R. Computers are social actors. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Boston, MA, USA, 24–28 April 1994; Volume 1, pp. 72–78. [Google Scholar] [CrossRef]
Nass, C.; Steuer, J.; Tauber, E.; Reeder, H. Anthropomorphism, agency, and ethopoeia. In INTERACT’93 and CHI’93 Conference Companion on Human Factors in Computing Systems; ACM: New York, NY, USA, 1993; Volume 1, pp. 111–112. [Google Scholar] [CrossRef]
Nass, C.; Moon, Y.; Fogg, B.J.; Reeves, B.; Dryer, D.C. Can computer personalities be human personalities? Int. J. Hum.-Comput. Stud. 1995, 43, 223–239. [Google Scholar] [CrossRef]
Zhao, S. Humanoid social robots as a medium of communication. New Media Soc. 2006, 8, 401–419. [Google Scholar] [CrossRef]
Loaiza-Ramírez, J.P.; Reimer, T.; Moreno-Mantilla, C.E. Who prefers renewable energy? A moderated mediation model including perceived comfort and consumers’ protected values in green energy adoption and willingness to pay a premium. Energy Res. Soc. Sci. 2022, 91, 102753. [Google Scholar] [CrossRef]
Loaiza-Ramirez, J.P.; Moreno-Mantilla, C.E.; Reimer, T. Do consumers care about companies’ green supply chain management (GSCM) efforts? Analyzing the role of protected values and the halo effect in product evaluation. Clean. Logist. Supply Chain. 2022, 3, 100027. [Google Scholar] [CrossRef]
Fogg, B.J.; Nass, C. Silicon sycophants: The effects of computers that flatter. Int. J. Hum.-Comput. Stud. 1997, 46, 551–561. [Google Scholar] [CrossRef]
Ammari, T.; Kaye, J.; Tsai, J.Y.; Bentley, F. Music, search, and IoT: How people (really) use voice assistants. ACM Trans. Comput.-Hum. Interact. 2019, 26, 17. [Google Scholar] [CrossRef]
Van Kleef, G.A. The Interpersonal Dynamics of Emotion; Cambridge University Press: Cambridge, UK, 2016. [Google Scholar]
Hillebrandt, A.; Barclay, L.J. Comparing integral and incidental emotions: Testing insights from emotions as social information theory and attribution theory. J. Appl. Psychol. 2017, 102, 732–752. [Google Scholar] [CrossRef]
Kelley, H.H. The processes of causal attribution. Am. Psychol. 1973, 28, 107–128. [Google Scholar] [CrossRef]
Kelley, H.H.; Michella, J.L. Attribution theory and research. Annu. Rev. Psychol. 1980, 31, 457–501. [Google Scholar] [CrossRef]
Martinko, M.J.; Thomson, N.F. A synthesis and extension of the Weiner and Kelley attribution models. Basic Appl. Soc. Psychol. 1998, 20, 271–284. [Google Scholar] [CrossRef]
Shaver, K.G. An Introduction to Attribution Processes; Routledge: London, UK, 1983. [Google Scholar]
Lichtenstein, D.R.; Bearden, W.O. Measurement and structure of Kelley’s covariance theory. J. Consum. Res. 1986, 13, 290–296. [Google Scholar] [CrossRef]
Solomon, S. Measuring dispositional and situational attributions. Personal. Soc. Psychol. Bull. 1978, 4, 589–594. [Google Scholar] [CrossRef]
Hovy, D.; Yang, D. The Importance of Modeling Social Factors of Language: Theory and Practice. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 588–602. [Google Scholar] [CrossRef]
Grice, H.P. Logic and conversation. In Speech Acts; Brill: Leiden, The Netherlands, 1975; pp. 41–58. [Google Scholar]
Hortensius, R.; Hekele, F.; Cross, E.S. The Perception of Emotion in Artificial Agents. IEEE Trans. Cogn. Dev. Syst. 2018, 10, 852–864. [Google Scholar] [CrossRef]
Davis, S.K.; Morningstar, M.; Dirks, M.A.; Qualter, P. Ability emotional intelligence: What about recognition of emotion in voices? Personal. Individ. Differ. 2020, 160, 109938. [Google Scholar] [CrossRef]
Lei, X.; Rau, P.L.P. Should I blame the human or the robot? Attribution within a human–robot group. Int. J. Soc. Robot. 2021, 13, 363–377. [Google Scholar] [CrossRef]
Li, Z.; Rau, P.L.P. Effects of self-disclosure on attributions in human–IoT conversational agent interaction. Interact. Comput. 2019, 31, 13–26. [Google Scholar] [CrossRef]
Li, Z.; Rau, P.L.P. Talking with an IoT-CA: Effects of the use of internet of things conversational agents on face-to-face conversations. Interact. Comput. 2021, 33, 238–249. [Google Scholar] [CrossRef]
Serenko, A. Are interface agents scapegoats? Attributions of responsibility in human–agent interaction. Interact. Comput. 2007, 19, 293–303. [Google Scholar] [CrossRef]
Lee, K.M.; Nass, C. Social-psychological origins of feelings of presence: Creating social presence with machine-generated voices. Media Psychol. 2005, 7, 31–45. [Google Scholar] [CrossRef]
Watkins, H.; Pak, R. Investigating user perceptions and stereotypic responses to gender and age of voice assistants. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting; Sage: Los Angeles, CA, USA, 2020; Volume 64, pp. 1800–1804. [Google Scholar]
Breazeal, C.; Aryananda, L. Recognition of affective communicative intent in robot directed speech. Auton. Robot. 2002, 12, 83–104. [Google Scholar] [CrossRef]
Edwards, C.; Edwards, A.; Spence, P.R.; Westerman, D. Initial interaction expectations with robots: Testing the human-to-human interaction script. Commun. Stud. 2016, 67, 227–238. [Google Scholar] [CrossRef]
Silberman, M.S.; Tomlinson, B.; LaPlante, R.; Ross, J.; Irani, L.; Zaldivar, A. Responsible research with crowds: Pay crowdworkers at least minimum wage. Commun. ACM 2018, 61, 39–41. [Google Scholar] [CrossRef]
Tamborini, R.; Grall, C.; Prabhu, S.; Hofer, M.; Novotny, E.; Hahn, L.; Klebig, B.; Kryston, K.; Baldwin, J.; Aley, M.; et al. Using attribution theory to explain the affective dispositions of tireless moral monitors toward narrative characters. J. Commun. 2018, 68, 842–871. [Google Scholar] [CrossRef]
Becker, C.; Kopp, S.; Wachsmuth, I. Why emotions should be integrated into conversational agents. In Conversational Informatics: An Engineering Approach; Nishida, T., Ed.; Wiley: Hoboken, NJ, USA, 2007; pp. 49–68. [Google Scholar]
Toader, D.C.; Boca, G.; Toader, R.; Măcelaru, M.; Toader, C.; Ighian, D.; Rădulescu, A.T. The effect of social presence and chatbot errors on trust. Sustainability 2019, 12, 256. [Google Scholar] [CrossRef]
Ahn, I.; Kim, S.H. Measuring the motivation: A scale for positive consequences in pro-environmental behavior. Sustainability 2023, 16, 250. [Google Scholar] [CrossRef]
Mastria, S.; Vezzil, A.; De Cesarei, A. Going green: A review on the role of motivation in sustainable behavior. Sustainability 2023, 15, 15429. [Google Scholar] [CrossRef]

Figure 1.
Conceptual figure showing the proposed influence of message features from an Amazon Alexa. Note. The introduced hypotheses test assumptions of a framework that assumes that future user behaviors are not only affected by the content of feedback but also by the emotional display with which feedback is provided. Accordingly, emotional displays do not directly affect behavior but the exerted impact on behavior depends on how the observed emotional displays are attributed. Attributions are affected by the emotional displays as well as the targets of the displayed emotions and their distinctiveness.

Figure 2.
Diagram illustrating how interaction and message features are expected to influence attributions of cause for emotional displays from an Amazon Alexa. Note: Study 1 manipulated emotional display targets in a video of a single interaction with a confederate, whereas Study 2 also manipulated distinctiveness of the emotional displays over the course of 19 videos. Conditions that were expected to facilitate attributions to user requests are highlighted in italics (see Hypotheses 3 and 5).

Table 1.
Study 2 attribution to specific request by condition.

Design Conditions	Target
	Integral	Incidental
	M (SD)	M (SD)
Distinct Request
Happiness	3.76 (0.73)	2.94 (1.50)
Anger	3.97 (0.97)	2.37 (1.42)
Distinct Extraneous Information
Happiness	3.48 (1.14)	3.54 (1.27)
Anger	3.72 (0.96)	3.05 (1.53)
Distinct user
Happiness	3.42 (1.29)	2.71 (1.28)
Anger	3.65 (1.05)	3.19 (1.22)
Indistinct
Happiness	3.64 (1.07)	2.69 (1.31)
Anger	3.73 (1.07)	3.76 (1.17)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

[ad_2]