The 3 next steps in conversational AI
Conversational AI is a subfield of artificial intelligence focused on producing natural and seamless conversations between humans and computers. We’ve seen several amazing advances on this front in recent years, with significant improvements in automatic speech recognition (ASR), text to speech (TTS), and intent recognition, as well as the rocket ship growth of voice assistant devices like the Amazon Echo and Google Home, with estimates of close to 100 million devices in homes in 2018.
But we’re still a long way away from the effortless human-machine conversation promised in science fiction. Here are some key advances we should see over the coming years that could get us closer to that long-term vision.
New tools beyond machine learning
Machine learning, and in particular deep learning, has become an extremely popular technique within the field of AI over the past few years. It has already fueled significant advances in domains such as facial recognition, speech recognition, and object recognition, leading many to believe it will solve all of the problems of conversational AI. In reality, though, it will be only one valuable tool in our toolbox.
We’ll need additional techniques to manage all aspects of a real human-computer conversation.
Machine learning is particularly well suited to problems that involve finding patterns in large corpora of data. Or as Turing Award winner Judea Pearl pithily said, machine learning fundamentally boils down to curve fitting. There are several problems in conversational AI that map well to this type of solution, such as speech recognition and speech synthesis.
The technique has also been applied to intent recognition (taking a textual sentence of human language and converting it into a high-level description of the user’s intent or desire) with good success, though there are some limitations in using this technique to capture meaning from natural language, which is inherently stateful, sensitive to context, and often ambiguous.
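To make the idea of intent recognition concrete, here is a minimal sketch in Python. It is not a machine-learned model: it swaps in a few hand-written patterns as a stand-in for a trained classifier, and the intent names, slot names, and the `recognize_intent` function are all hypothetical, chosen only for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Intent:
    """High-level description of what the user wants."""
    name: str
    slots: dict = field(default_factory=dict)


# Hypothetical intents and patterns (assumptions for this sketch only).
# A production system would typically use a trained classifier instead.
PATTERNS = [
    ("get_weather", re.compile(r"\bweather\b(?:.*\bin (?P<city>[a-z ]+))?")),
    ("set_timer", re.compile(r"\btimer\b.*?(?P<minutes>\d+) minutes?")),
]


def recognize_intent(sentence: str) -> Intent:
    """Map one sentence to an intent, looking at that sentence in isolation."""
    text = sentence.lower().strip(" ?!.")
    for name, pattern in PATTERNS:
        match = pattern.search(text)
        if match:
            # Keep only the slots that were actually found in the sentence.
            slots = {k: v for k, v in match.groupdict().items() if v is not None}
            return Intent(name, slots)
    return Intent("unknown")
```

Given "What’s the weather in Boston?", this sketch returns a `get_weather` intent with the city slot filled in. Given the follow-up "What about tomorrow?", it returns `unknown`, because the meaning of that sentence depends on the previous turn, which a sentence-at-a-time recognizer never sees.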
However, there are certainly problems in computer conversation that are not as well suited to machine learning. Think of human-machine conversation as being composed of two parts:
- Natural language understanding (NLU)
- Natural language generation (NLG)
Much of the attention of late has been focused on that first part, but many challenges remain on the generation side, and these tend not to be well suited to machine learning because response generation isn’t simply a product of gathering and analyzing lots of data. The challenge of maintaining a believable, ongoing, and stateful conversation will require more focus on these NLG and dialog management parts of the problem over the coming years.
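As a rough illustration of what dialog management and NLG involve, the Python sketch below keeps a small piece of conversation state across turns and fills a response template from it. Everything here is an assumption made for the example: the `DialogState` class, the `update_state` and `generate_response` functions, the intent and slot names, and the response templates. It assumes some separate NLU step has already produced an intent and slots for each turn.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DialogState:
    """What the system remembers between turns (hypothetical structure)."""
    last_intent: Optional[str] = None
    slots: dict = field(default_factory=dict)


def update_state(state: DialogState, intent: Optional[str], slots: dict) -> DialogState:
    """Dialog management step: merge the current turn into the running state."""
    if intent is not None:
        # A turn with no recognizable intent inherits the previous one.
        state.last_intent = intent
    state.slots.update(slots)
    return state


def generate_response(state: DialogState) -> str:
    """Template-based NLG: build a reply from the accumulated state."""
    if state.last_intent == "get_weather":
        city = state.slots.get("city")
        if city is None:
            return "Which city do you mean?"
        day = state.slots.get("day", "today")
        return f"Here is the weather for {city.title()} {day}."
    return "Sorry, I did not understand that."


state = DialogState()
update_state(state, "get_weather", {"city": "boston"})
first_reply = generate_response(state)   # answers for Boston, today
update_state(state, None, {"day": "tomorrow"})
second_reply = generate_response(state)  # still about Boston, now tomorrow
```

The second turn carries no intent of its own, yet the reply is still about Boston because the state remembers the earlier turn. Even this toy version shows that the hard part is deciding what to remember and how to phrase a reply, and neither decision comes from fitting a curve to data.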