What ChatGPT and LLMs Mean for How We Build Conversational Interfaces for the Future

First things first: this is not an article about what LLMs (large language models) and ChatGPT are. If you have been living under a rock and are unfamiliar with these names and this terminology, then this article written by ChatGPT explaining itself is a good starting point.

We have been receiving questions from – and participating in many discussions with – our customers and peers about this exciting new tech, and we wanted to clarify where we see the opportunities and weaknesses at the current stage, as well as look ahead to a potential hybridized future. The biggest talking point has been the need for conversation design in an increasingly automated and generative world.

From our perspective as experts in conversational interfaces and conversation design, we see two main paths along which this technology and trend will continue to develop: consumer-facing applications, and the technology as a tool and force multiplier. Neither will eliminate the need for humans behind the wheel, steering the technology, anytime soon.

Hopping on the LLM bandwagon

Broadly speaking, this technology and its implications are spreading at breakneck speed, and many platforms are currently trying to capitalize on this gold-rush-like state. You may have heard of Microsoft implementing ChatGPT in Bing and Google looking at fusing its proprietary equivalent LaMDA with its own search engine. These search engines follow a trend that companies such as SoundHound have been pursuing for a while: responding to users not with lists of search results, but with concrete answers in natural language.

Other examples of quick wins in this brand-new space are bot platforms such as Voiceflow and Cognigy.AI, which already apply LLMs heavily to dynamically generate system responses or training data for intent training. Some platforms, like Cognigy.AI, are also considering going a step further and looking into empowering conversation designers by letting them create flows and elements through natural language prompts, greatly speeding up the process of setting up new conversations and thus contributing to the rapid prototyping capabilities of these low-code platforms. Will these features result in conversations that are production-ready, about to be rolled out to millions of users, out of the box? Of course not. But they provide a good first framework to expand upon.
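To make the training-data use case a little more tangible, here is a minimal sketch of how an LLM could be prompted to generate candidate utterances for a single intent. It assumes an OpenAI-style chat completions API; the intent name, prompt wording and model choice are illustrative placeholders, not code from any of the platforms mentioned above.

```python
# Sketch: generating synthetic training utterances for an intent with an LLM.
# Assumes the openai Python package and an OPENAI_API_KEY environment variable;
# the intent and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def generate_intent_utterances(intent: str, description: str, n: int = 10) -> list[str]:
    """Ask the model for n varied phrases a user might say for this intent."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system",
             "content": "You generate short, varied user utterances for training "
                        "an intent classifier. Return one utterance per line."},
            {"role": "user",
             "content": f"Intent: {intent}\nDescription: {description}\n"
                        f"Generate {n} utterances."},
        ],
    )
    text = response.choices[0].message.content or ""
    return [line.strip("-• ").strip() for line in text.splitlines() if line.strip()]

if __name__ == "__main__":
    for utterance in generate_intent_utterances(
        "book_flight", "The user wants to book a flight to a destination on a date"
    ):
        print(utterance)
```

In practice, a conversation designer would still review and curate such generated utterances before they go anywhere near production, which is exactly the human-in-the-loop point made above.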

Trust in the system and the tech is dwindling

Widely broadcast anecdotes from tech journalists and influencers, as well as hearsay from colleagues and friends, have recently led to a lot of skepticism about the current state of the technology. Articles quoting unsettling exchanges, individual erroneous responses and odd behavioral patterns reinforce the negative connotations that LLMs carry in today's world. This obviously has a huge negative impact on consumer-facing applications.

Finding an appropriate place for LLMs should not be difficult

Focusing on this new technology as a force multiplier and enablement tool is therefore the more stable path from our perspective, at least while the technology matures and new, more refreshing consumer-facing experiences improve the public's perception in the mid-term.

On a more immediate and applied note, ChatGPT and LLMs are a great vehicle for innovation and a popular driver of change, but they are tools and will not replace human experts in conversation design. They are good at filling gaps and handling repetitive tasks, but for a while yet they will not provide the confidence and accuracy of dialogues designed by humans.

The conversation designer is still the agent of change for this new tech

Our workflows in the future could consist of conversation designers laying down the structure of a dialogue, such as the starting point, the goal of the conversation and some checkpoints along the way, with the generative AI or LLM filling the gaps.

In an ideal world we would provide the AI with a purpose and a personality, and no actual dialogue would need to be written by humans. The conversation designer would focus entirely on the strategic purpose of the interface and on defining the personality and tone of voice of the bot.

Paul Krizsan, Director Conversational AI
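To illustrate what such a division of labor could look like, here is a minimal sketch of a designer-defined dialogue skeleton whose individual turns are filled in by an LLM. The structure, field names and the `generate_turn` helper are purely illustrative assumptions, not a feature of any existing platform.

```python
# Sketch: the designer defines the skeleton (persona, goal, checkpoints);
# an LLM fills in the actual wording at runtime. All names here are
# illustrative assumptions, not part of any existing product.
from dataclasses import dataclass, field


@dataclass
class DialogueSkeleton:
    persona: str                 # tone of voice and character, set by the designer
    goal: str                    # strategic purpose of the conversation
    checkpoints: list[str] = field(default_factory=list)  # milestones the dialogue must pass


def generate_turn(skeleton: DialogueSkeleton, checkpoint: str, user_message: str) -> str:
    """Build the prompt for one turn; the actual LLM call is left out of this sketch."""
    return (
        f"You are {skeleton.persona}. Your goal: {skeleton.goal}.\n"
        f"Current checkpoint to reach: {checkpoint}.\n"
        f"User said: {user_message}\n"
        "Reply in character and move the conversation toward the checkpoint."
    )


booking_flow = DialogueSkeleton(
    persona="a friendly, concise airline assistant",
    goal="help the user book a flight",
    checkpoints=[
        "greet and ask for the destination",
        "confirm travel dates",
        "summarize and confirm the booking",
    ],
)

print(generate_turn(booking_flow, booking_flow.checkpoints[0], "Hi, I need to fly to Lisbon."))
```

The designer owns the skeleton and reviews the generated turns; the model only fills the gaps between the checkpoints.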

So while staying up to date with the developments of this exciting new technology is vital, we do not share the currently ubiquitous sentiment that users are ready for unfettered access to potentially reputation-harming experiences before some of the kinks of current LLMs have been ironed out over the course of 2023.

Are you interested in talking about conversational interfaces, LLMs and how to design for conversations? Talk to us!

An insight into the cosmos of voice assistants

In a world full of voice assistants, the choice is notoriously difficult. What are the strengths and weaknesses of the individual assistants and which ones are compatible with me? We have addressed these questions and created an overview of the most popular and competent voice assistants.

A brief overview

Google Assistant

Company: Google
Compatibility: Android, iOS
Languages: 8
Topics: Travel, commute, search engine
Hotword: “Ok Google, Hey Google”
Features: Smart home integration, Android auto integration

Alexa

Company: Amazon
Compatibility: Android, iOS
Languages: 4
Topics: Shopping, Entertainment
Hotword: “Alexa” (customizable)
Features: Smart Home integration, direct shopping on Amazon

Siri

Company: Apple
Compatibility: iOS
Languages: 20+
Topics: Planning, productivity, every day life
Hotword: “Hey Siri”
Features: Apple CarPlay integration

Cortana

Company: Microsoft
Compatibility: Android, iOS, Windows
Languages: 12
Topics: Planning, search engine
Hotword: “Hey Cortana”
Features: Integration to Alexa hardware

Our prediction for the future: see "Voice assistants: So, where are these mysterious voices actually taking us?" below.

Bixby

Company: Samsung
Compatibility: Android
Languages: 2
Topics: Planning, Shopping
Hotword: “Hi Bixby” (customizable)
Features: Smartphone control, virtual shopping

Hound

Company: SoundHound
Compatibility: Android, iOS
Languages: English
Topics: Planning, search engine
Hotword: “Ok Hound”
Features: super fast, excellent at complex requests

Nomi

Company: Nio
Compatibility: Car integration
Languages: Chinese
Topics: Vehicle management, daily companion
Hotword: unknown
Features: Very emotional through visualization, extension of the car

Personality and Voice

While features make up only a small part of a good voice assistant, the personality of the virtual assistant is probably the most important part. The personality determines how willing we are to listen to the voice and how the hierarchy between human and machine plays out.

If the assistant is more human, shows emotion and has as realistic a voice as possible, we tend to listen more closely and to voice our wishes and requests conversationally. If the voice is tinny and clearly machine-like, we deal with the technology differently.

Our positioning of the personalities of the listed voice assistants (inspired by magenta.as)

The hierarchy between the user and the voice assistant can vary depending on personality: an always-obedient, machine-sounding assistant is quickly regarded and treated as a will-less subordinate of the user, while a human-like personality tends to make us regard the assistant as an equal.

Due to the relevance of the personality, various voice assistants also rely on visual support, the so-called embodiment. The voice assistant then manifests itself not only in audible but also in visual form, whether physical or digital.

We design and develop your voice assistant!

think moto develops on-brand conversational user interfaces for voice assistants and chatbots and also deals with questions of visualization and embodiment of voice assistants.

Voice assistants: So, where are these mysterious voices actually taking us?

Anyone currently walking past advertising posters for Google's voice assistant is probably wondering where this voice assistant arms race will lead. Almost every year, the digital companions are equipped with new features and better capabilities. Will we ultimately be left with a homogeneous set of voice assistants that do everything perfectly, or will we face a wild variety of smaller assistants that are highly specialized but generally weaker, living in symbiotic relationships with each other? Or will it look different altogether?

Symbiosis

It is no secret that established voice assistants like Amazon’s Alexa and Microsoft’s Cortana have unique strengths, but also equally individual weaknesses. While Alexa demonstrates excellence among voice assistants in the areas of shopping, entertainment and as a companion outside the work environment, Cortana’s strengths lie in organizing daily routines and supporting the user’s productivity.

In May 2018, the manufacturers of the two assistants therefore announced a collaboration: in the future, one would not only be able to address and command Alexa via Amazon Echo, but also call up Cortana, complete with Cortana's voice.

This form of symbiosis is meant to strengthen Alexa in particular, but it is also a sign that Cortana will probably not expand further on the competitive stage. It is more likely that Cortana will focus on deepening the topics it already covers.

Similar to Alexa and Cortana, the most popular voice assistants, Google Assistant and Siri, have their strengths and weaknesses. The resulting gaps are now being filled by a new generation of voice assistants that can often only do a few things, but do them much better than general-purpose assistants such as those from Google, Apple and Co.

Companies like the U.S.-based SoundHound, whose Hound assistant shines especially when it comes to complex questions and commands, hope to participate in the market alongside giants like Amazon by licensing out their own framework. This allows corporations that would benefit from speech recognition and voice commands to use SoundHound's technology without spending the resources to develop their own.

Voices and embodiment

While for manufacturers of mobile smart devices it is primarily the voice, in lieu of a physical manifestation, that serves as the avatar of the personality, companies from industries such as smart home and automotive have the opportunity to give the personality of their assistants a visual form as well. Whether physical or digital, this is referred to as embodiment: lending the assistant a visual language of form.

Amazon Echo

The embodiment can take different forms: Amazon gives Alexa broad character traits through the design of the Echo products. Thus, the voice assistant does not appear extremely feminine, but rather neutral and open, educated and likeable.

Amazon Echo, 2nd generation. Source: expertreviews.co.uk

Jibo

A good example of exaggerated embodiment is Jibo. Jibo is a curious and always joyful five-year-old in the cute body of a table lamp. By rotating its three body parts, the fun robot can, among other things, dance, tilt its head questioningly, blink, and show other emotions thanks to the eye in its display.

Although Jibo’s functions are limited and not nearly as elaborate as those of its competitors, Jibo wins people over with charm thanks to its physical form.

Jibo. Source: jibo.com

Nio Nomi

The automotive industry also sees a lot of potential in voice assistants. For many, our four-wheeled companions already count as family members; you couldn’t ask for a simpler platform. Unlike smartwatches and smartphones, and not least because cars are long-lived and never need to be carried around, AIs in cars can also take physical form. Much like Jibo, the AI of the Chinese electric vehicle manufacturer Nio is intended to be perceived primarily as a social companion. Nomi, as Nio’s AI has been christened, can simulate an astonishing range of human emotions thanks to a display above the car’s center console, and awaken them in its passengers, too. It’s true that you feel like Luke Skywalker with a droid in a spaceship, but who can resist those cute eyes?

Nio’s Nomi. Source: Wall Street Journal

A Forecast

However, the biggest technological leaps in the field of voice assistants are still happening with the market leaders. In May 2018, for example, Google unveiled a demo version of the Google Assistant that could independently make phone calls to humans with such authenticity that the people on the other end could not tell that the caller was an artificial intelligence. Google Duplex, as the demo was called, based its human feel not only on the emulation of human speech, but also on the regular interspersion of filler words, such as a searching “ahm” or a confirming “mhm”.

Do you want to give users the feeling that they are talking to a real person or does the machine have to be recognizable as such?

That this involves the deliberate deception of people, and that the potential for abuse is outrageously close at hand, is obvious. One of the biggest questions in the design of voice assistants in the coming years must therefore be the question of ethics: do we want to give users the feeling that they are talking to a real human being, or must the machine be recognizable as such?
