User Perceptions of Smart Replies

Despite a growing body of research on the design and use of chatbots, little is known about how smart agents of any kind are perceived and interacted with in human-to-human communication. In this generative research, we investigated user perceptions and potential effects of smart replies.

Methodology

We conducted in-lab experiments (N=72) and followed up with semi-structured interviews (N=5).
• Participants (N=72, 36 dyads) carried out one of three communication tasks in our lab, using either a traditional messaging app or a messaging app featuring smart replies. By varying the tasks, we hoped to uncover insights about communication across different contexts. Conversations were recorded for analysis.
• Five randomly selected participants from the smart-reply group were asked to return for interviews. While viewing a screen recording of their conversation, they were asked to “think aloud”, elaborate on their thoughts in retrospect, and comment on the app itself and the smart replies. They were then asked some additional questions.
• I then transcribed the conversations, analyzed them linguistically, and thematically analyzed the interview data.

Main Findings

Smart replies were overwhelmingly positive.
• During an informal review, I noticed the overwhelmingly positive sentiment of the smart replies.
• I then formally classified the sentiment of all smart replies using Mechanical Turk (5 workers per response; 1012 unique smart replies) and found that 43.8% of all smart replies were classified as having a positive sentiment, while only 3.95% were classified as having a negative sentiment.
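
As an illustration of how per-reply labels from five workers can be reduced to the reported percentages, here is a minimal aggregation sketch in Python. The reply texts, label names, and majority-vote rule are assumptions made for the example, not the study’s actual pipeline.

from collections import Counter

# Hypothetical input format: each unique smart reply mapped to the labels
# assigned by its five MTurk workers (1,012 such entries in the real data).
worker_labels = {
    "Sounds great!": ["positive", "positive", "positive", "neutral", "positive"],
    "I don't think so.": ["negative", "negative", "neutral", "negative", "negative"],
}

def majority_label(labels):
    """Return the label most workers agreed on for one smart reply."""
    return Counter(labels).most_common(1)[0][0]

# One aggregated label per reply, then the overall sentiment distribution.
aggregated = [majority_label(labels) for labels in worker_labels.values()]
distribution = Counter(aggregated)
for sentiment, count in distribution.items():
    print(f"{sentiment}: {count / len(aggregated):.1%}")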

Smart replies were seldom used and often did not make sense contextually.
• All conversations and smart replies were analyzed using Linguistic Inquiry and Word Count (LIWC), a dictionary-based text analysis tool that reports the percentage of words reflecting a number of linguistic processes, psychological processes, and personal concerns (a minimal sketch of this kind of dictionary-based count appears after this finding).
• The smart replies were hardly ever used. On average, a smart reply was chosen 6.24% of the time.
• The smart replies often did not reflect what participants actually wanted to express. Examples of this can be seen in the figure below, where users wanted to disagree or suggest an opposing viewpoint, but the smart replies only offered expressions of agreement and positive reinforcement.

• Participants expressed that while they could see the potential usefulness of the smart replies, the suggestions were not particularly relevant to completing the task.

“Since the task was pretty specific, maybe not all suggestions were exactly useful, but there were some that were nice. In terms of general contact between a friend, I think that it would be helpful in terms of, ‘Hey how are you doing’, and stuff like that.”
- Participant 16
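
To make the dictionary-based approach behind LIWC concrete, here is a toy sketch of the same idea: count what share of a text’s words fall into each category. The categories and word lists below are invented for illustration; LIWC itself uses its own validated dictionaries and far more categories.

import re

# Toy category dictionaries (assumptions for illustration only; not LIWC's).
CATEGORIES = {
    "positive_emotion": {"nice", "great", "helpful", "good", "agree"},
    "negative_emotion": {"bad", "wrong", "hate", "oppose", "disagree"},
}

def category_percentages(text):
    """Percentage of words in the text that belong to each category."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {name: 0.0 for name in CATEGORIES}
    return {
        name: 100 * sum(word in vocab for word in words) / len(words)
        for name, vocab in CATEGORIES.items()
    }

print(category_percentages("That sounds great, happy to help with the task"))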

Participants suspected that smart replies might be influencing their conversations.

The AI “… would kind of guide me. It was what I was already thinking in my head, but then like okay, that’s also an appropriate thing to say.”
- Participant 8

“Let’s say if the other person asked for specific rank, and the prompts are all positive, and then I just agreed with it, I just went with the positive one just for the sake of completing the task. But if I hadn’t seen the prompts, maybe I would have opposed that question.”
- Participant 28

Recommendations

Each time responses are suggested, users should be presented with a mix of positive and negative options.
• Smart replies are currently overwhelmingly positive.
• Existing smart reply algorithms already draw on conversation data, so the sentiment of these suggestions could quantitatively correspond to the positive and negative makeup of conversations in the database.
• The reply-generating beam search could be biased toward paths that lead to responses reflecting this sentiment makeup, as sketched below.
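
A minimal sketch of what such a sentiment-biased beam search could look like is shown below. The candidate generator, the word-level sentiment lexicon, the target sentiment, and the bias weight are all hypothetical stand-ins chosen to keep the example runnable; a production smart-reply system would score continuations with a learned language model and a trained sentiment classifier.

import heapq

# Toy stand-ins (assumptions, not a real smart-reply model): a fixed set of
# candidate tokens with log-probabilities and a word-level sentiment lexicon.
CANDIDATES = {"sounds": -0.5, "great": -0.7, "no": -1.2, "thanks": -0.9, "sorry": -1.5}
SENTIMENT = {"great": 1.0, "thanks": 0.6, "sounds": 0.2, "no": -0.7, "sorry": -0.5}

def next_tokens(seq):
    """Return (token, log-probability) pairs for possible continuations."""
    return CANDIDATES.items()

def sentiment_of(seq):
    """Average word-level sentiment of a candidate reply, in [-1, 1]."""
    return sum(SENTIMENT.get(word, 0.0) for word in seq) / max(len(seq), 1)

def sentiment_biased_beam_search(target_sentiment, beam_width=3, max_len=4, bias_weight=0.5):
    """Beam search whose path scores are nudged toward a target sentiment."""
    beams = [(0.0, [])]  # (score, token sequence)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for token, log_prob in next_tokens(seq):
                new_seq = seq + [token]
                # Penalize drift from the sentiment makeup estimated from the
                # conversation database (the target need not be fully positive).
                penalty = abs(sentiment_of(new_seq) - target_sentiment)
                candidates.append((score + log_prob - bias_weight * penalty, new_seq))
        # Keep only the highest-scoring partial replies.
        beams = heapq.nlargest(beam_width, candidates, key=lambda c: c[0])
    return [" ".join(seq) for _, seq in beams]

# A negative target surfaces replies built from "no" / "sorry"; a positive
# target favors "great" / "thanks".
print(sentiment_biased_beam_search(target_sentiment=-0.5))
print(sentiment_biased_beam_search(target_sentiment=0.8))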
