Is a Conversational interface the best interface?
Two weeks ago we shared our thoughts on chatbots as a key ‘interface’ towards employees and customers. Building on that, we’d like to walk you through a comparison of a few popular interaction interfaces. To make sure we are all aligned, let’s first clarify what we mean by ‘interactions’ and ‘interfaces’. Interactions are patterns you can use to communicate information or intent to another person or a machine. Interfaces are the physical or abstract channels through which you perform these interactions.
In this blog post we will explore the differences between several interfaces, and try to help you answer whether a ‘conversational interface’ is, by definition, the best interface for interacting with employees or customers. Read on to discover which communication channel is best suited to a given situation.
1. You are not alone
You are not alone, and neither am I. We live in a globalized world where we can hardly survive on our own. We need access to food and water, medical aid, personal interactions, safety & security, etc. Most importantly: We need people to thrive!
People around us have skills which we don’t have. We are pack animals: we need social interaction to function properly and to keep our sanity. And since there are so many people in our societies, we benefit from economies of scale, resulting in a broad range of different skills.
2. Speak up
To survive in this globalized world, we need to master the art of communicating our intent. You can easily find dozens of examples that show we are really bad at communicating intent: from rocky personal relationships to software not being developed according to the users’ needs. But let’s not go there.
Instead, let’s focus on what matters in communication: intent. We need to be able to communicate what we want, in order to get services and goods from other people. And these days, from machines as well!
In the following paragraphs, we’d like to take the time to revisit the typical interactions we have on a daily basis, and which interfaces enable these interactions.
Afterwards we will score and compare these interfaces using a simple set of criteria. We’ll also look at conversational interfaces and see what they offer in addition to the channels we already know.
3. Interface round-up
3.1 Human-to-Human (H2H)
Let’s start with the oldest form of interaction: between humans. There are three types of interaction.¹
In non-verbal communication, you observe a person to understand their intent. For example, a person that shows you to the door with hand and arm gestures, to make it clear that you can go first.
In written communication, you read what a person wrote to understand their intent. For example, when a door sign reads “PULL”, you know that person wants you to open the door by pulling. Common channels here are books, posters, post-it notes, stone carvings, …
In verbal communication, you listen to a person to understand their intent. For example, when you approach a door together with someone who says, “after you”, you know that that person wants you to go through the door first.
Note that all of these channels also carry empathic artefacts that influence the intent. In written communication, there is a clear difference between “please pull”, “PULL” and “Pull only!”. In verbal communication, intonation should also be considered, and in non-verbal communication, a lot can be added through facial expressions and other accents in movement.
The goal of these empathic artefacts is to communicate sentiment in a way that is appropriate for the intent.
¹ Mason Carpenter, Talya Bauer and Berrin Erdogan, Principles of Management 1.1, 2010.
3.2 Human-to-Machine (H2M)
Now things are getting interesting, and far closer to our comfort zone 😉
Whereas H2H interaction can rely on sentiment and continuous learning, machine interactions are not as fluent in human understanding and behavior (yet). Again, we have the same three types of interaction.
Non-verbal communication consists of physical ways to interact, which are required before any other form of communication with machines is possible. The following are common types of interaction:
- Keyboard and mouse: once a novelty, but at the same time a logical evolution from typewriters. They allow you to enter information efficiently and to gesture inside the abstractions a machine creates. For example, clicking a button means triggering an action. The machine prepares the button and knows what your intent is when you click it.
- Touch: touch sensitive displays that started by replacing mouse clicks. This interaction type is far more intuitive and efficient than working with keyboard and mouse. For example, most toddlers have already associated swiping a screen with navigation intents.
- Movement: this includes devices like Nintendo’s Wii, Microsoft’s Kinect, … This is a similar evolution to touch, but it is not intuitive in all contexts and has restrictions on when and where it can be used. For example, when I swing a controller from behind me to in front of me, my intent is to throw a bowling ball.
Written communication was the most important H2M interaction up until the late 1990s. We include “visual” communication as well here. We can identify the following types of interactions:
- Command-line interfaces: like DOS and *NIX shells. To use these interfaces, you must know exactly what you want to achieve. For example, `copy sourceFileName.txt targetFolder/targetFileName.txt` expresses my intent to copy a file from one location to another.
- Graphical User Interface: like Windows or iOS. They provide abstractions from the real world like recycle bins, windows, mice, menus, folders, “web” browsers, …
- Chat-based interfaces: allow users to use text and images to converse with a machine. This type of interaction is gaining a lot of traction in the market lately. For example, you can have a bot that informs you of the latest news, based on your interest. Your intent is to get up to speed on the topics of your preference.
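To illustrate how a chat-based interface maps free text onto an intent, here is a minimal keyword-based sketch in Python. The intents and keywords are invented for illustration; real chatbots use trained language-understanding models rather than keyword lookup:

```python
# Minimal sketch of intent matching in a chat-based interface.
# The intents and keywords below are illustrative assumptions,
# not a real bot framework.

INTENTS = {
    "get_news": ["news", "headlines", "latest"],
    "copy_file": ["copy", "duplicate"],
    "get_weather": ["rain", "weather", "forecast"],
}

def match_intent(message: str) -> str:
    """Return the first intent whose keywords appear in the message."""
    words = message.lower().split()
    for intent, keywords in INTENTS.items():
        if any(keyword in words for keyword in keywords):
            return intent
    return "unknown"

print(match_intent("Give me the latest news please"))  # get_news
```

The hard part a production bot solves on top of this sketch is handling phrasing it has never seen before, which is exactly where trained models beat keyword lists.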
Verbal communication started with initiatives like Microsoft Sam or Lernout & Hauspie’s speech technology. There have been different interaction types throughout the years:
- Machine-to-human: your typical screen reader apps. Your intent here is to have information from the screen read to you, without using the screen itself.
- Human-to-machine: voice commands. These are not very popular yet in mainstream applications, but could have great potential.
- Conversational interfaces: allow users to speak with a machine to serve various intents. Think of intents like “Will it rain today?” that you can ask Google Assistant, Cortana, Siri or Alexa. In 2018, Google is working hard to push the boundary, showcasing Google Assistant making a phone call on your behalf to schedule a hair salon appointment, just like a real-life personal assistant would do for you.
Physical integrations can also be realized using the latest Internet-of-Things (IoT) capabilities, like (medical) body sensors, eye-movement detection, … For example, your intent may be to have someone alerted when heart-rate values drop below a threshold. This is also the area of Virtual and Augmented Reality.
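The heart-rate example above can be sketched as a simple threshold check. The threshold value and the alert handler here are placeholder assumptions for illustration, not medical guidance:

```python
# Sketch of an IoT-style alert: notify someone when a heart-rate
# reading drops below a threshold. Threshold and alert handler
# are illustrative assumptions.

LOW_HEART_RATE_BPM = 50  # assumed threshold for this sketch

def check_heart_rate(bpm: int, alert=print) -> bool:
    """Call the alert handler when the reading is below threshold."""
    if bpm < LOW_HEART_RATE_BPM:
        alert(f"ALERT: heart rate {bpm} bpm is below {LOW_HEART_RATE_BPM} bpm")
        return True
    return False

# Simulated sensor readings; only the last one triggers the alert
for reading in [72, 65, 48]:
    check_heart_rate(reading)
```

In a real deployment the `alert` handler would be a pager, SMS gateway, or nurse-station integration rather than a print statement.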
3.3 Human-to-Machine-to-Human (H2M2H)
Further building on H2M interactions, we can think of many advanced implementations of interactions between humans, using machines.
Written communication mostly takes the form of emails, but let’s not forget social media, where billions of people are connected and share their intents with each other over a central machine-driven channel.
Verbal communication has its roots in telephony and Voice-over-IP (VoIP) where verbal human communications are channelled over machine-based infrastructure.
And finally, we have non-verbal communication channelled over video conferencing like FaceTime, Skype, etc., which allows conversations to carry visual cues and sentiment on top of voice. The latest advancements here include new physical integrations between people over machine channels, like virtual kissing and hugging apps.
4. Comparing these interfaces
OK, so how do the different interaction channels perform on basic criteria? You can find our findings in a graph below; the criteria we used are explained here:
- Easy to learn: how hard is it to learn this new channel? We can see that touch and voicebots are the easiest to learn.
- Ethical: how hard is it to keep these channels compliant with privacy regulations and other social and ethical constructs? It seems like voice and chatbots are the hardest to keep ethical. The basic problem there is trust: can we trust a computer with every intent we can think of, or do we want to keep some veil of privacy around our personal behavior?
- Empathic: how human-like is our interaction? Most machine-based interaction scores really badly, of course: I’ve never had a command line that actually made me feel good, for example. But voice and chat bots can already go a long way.
- Cultural dependency: once you show text or start talking, language is involved. And language is largely culturally dependent. Movement and touch are least affected by this, but we must consider cultural barriers when talking about interactions with humans.
- Availability: H2H scores badly as expected, and all H2M interactions score high, but you can’t use a command line while driving, can you?
- Quality: and finally, what is the quality of the channel today? How certain are we, for each of these interaction types, that our intent will be understood correctly? More formal channels like GUIs and command lines score well, but voice and chat bots still have a long way to go.