09 Aug ESL & Machine Learning Hold The Key To An Accelerated ‘Voice First Revolution’
The accelerated adoption of chatbots and voice enabled applications is unprecedented. Almost every day new stats appear about the thousands upon thousands of new chatbots appearing in the main messaging platforms or the hundreds of skills being added to Amazon’s Alexa.
This “Voice First Revolution” is catching fire, and is without doubt a big part of the future of human to machine interaction, but is it in danger of becoming a flash fire, or hitting a wall?
As globalisation continues to make the world a smaller place, common language, predominantly English language, is a key driver in bringing people together.
In the UK the number of pupils who speak English as a second language (ESL) continues to rise, now topping 1.1M, but most voice applications currently focus on English and other native tongues. With so many other languages vying for priority, these pupils, along with their parents and fellow ESL speakers around the world, are left out of the voice first revolution, unable to converse with technology in either their native tongue or their adopted language.
A piece I read recently about research at MIT got me thinking about some of the early problems I faced when building the Indigo virtual assistant a few years ago: unless you could speak English like a native of southern England, the speech recognition and NLP were far less forgiving of your input.
I distinctly remember watching a senior Korean exec getting frustrated as his accent and word choices failed to elicit a suitable response from Indigo, and even some of my English colleagues who over-pronounced words failed to be understood by the VA.
For a decade or so I lived in London, a thriving hub of multi-nationals, where particularly in places like Camden (where I lived for a time) you could hear upwards of 20 different languages spoken on your walk around town. As globalisation took hold over the last 15 years the population became more diverse and interesting.
All of these nationalities speak English with slight nuances and turns of phrase. Grammatically they may be incorrect, but that doesn't stop you understanding context and holding perfectly meaningful conversations. Unfortunately, speech and natural language technologies have not kept pace with this globalisation of language.
As I continue my search for a new challenging role, it's increasingly clear that more and more companies, whatever their location or nationality, use English as the first language of business and conduct all group conversations and correspondence in English.
As Brits we’re thoroughly spoilt that our native tongue is so widely spoken. Even though I speak German and a little French I’m sometimes embarrassed not to be able to converse coherently in other people’s languages, when they’ve made such an effort to learn mine. My children who are 6 & 8 have already started learning French and I’m keen for them to continue their language education.
Before I get off track, my point here is that globalisation has brought English and ESL to the world as a by-product, and that's too useful to be ignored in the race for progress.
Language is the very foundation of human communication that every person knows, whatever their understanding of technology or access to education.
Uneducated people all over the world have an excellent grasp of language, many of them at least a rudimentary grasp of English, and that makes the 'voice first revolution' very powerful for commerce, technology and humanity.
The 'voice first revolution' can be significantly accelerated by catering for the ESL population.
ASR (automatic speech recognition) engines are already making strides in this direction with variants of English accent catered for, including British English, US English and Indian English, but NLP (Natural Language Processing) engines are still way behind.
Most NLP engines will handle US English variants, as we did with Indigo, but Indian, Chinese, and a host of European variants still need to be catered for. Some even cater for the mutant SMS language of the 1990s (gr8 = great), but ESL seems to be largely ignored.
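Catering for SMS-speak of that kind often amounts to a normalisation pass before the NLP engine sees the text. A minimal sketch of the idea, using a hand-built (and entirely hypothetical) mapping of variant tokens to standard forms:

```python
# A toy pre-NLP normalisation step: map known SMS-speak and common
# variant tokens to standard English before intent matching.
# The dictionary entries here are illustrative, not a real lexicon.
NORMALISATIONS = {
    "gr8": "great",
    "u": "you",
    "pls": "please",
    "wanna": "want to",
}

def normalise(utterance: str) -> str:
    """Replace known variant tokens with their standard English forms."""
    tokens = utterance.lower().split()
    return " ".join(NORMALISATIONS.get(t, t) for t in tokens)

print(normalise("pls play gr8 music"))  # -> "please play great music"
```

The same pattern could be extended with ESL-specific substitutions, which is exactly the gap the engines currently leave open.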
This is where machine learning can help. Companies generate lakes of data every day containing exactly the raw material machine learning needs to improve the recognition and understanding of ASR & NLP engines. This would broaden the adoption of these technologies exponentially by including the ESL population in the 'voice first revolution'.
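To make the idea concrete, here is a deliberately tiny sketch of how logged utterances could train an intent matcher that accepts both native and ESL phrasings. The training data and the word-overlap scoring are illustrative assumptions, not how any production NLP engine works:

```python
from collections import Counter

# Hypothetical logged utterances per intent, mixing native English
# phrasings with ESL-style word orders from real usage data.
TRAINING = {
    "play_music": ["play some music", "music play please", "i want music playing"],
    "weather": ["what is the weather", "weather today how", "tell weather please"],
}

def bag(text: str) -> Counter:
    """Bag-of-words representation of an utterance."""
    return Counter(text.lower().split())

def classify(utterance: str) -> str:
    """Pick the intent whose training utterances share the most words."""
    words = bag(utterance)
    scores = {
        intent: sum(sum((bag(u) & words).values()) for u in examples)
        for intent, examples in TRAINING.items()
    }
    return max(scores, key=scores.get)

print(classify("please play the music"))  # -> "play_music"
```

Because the ESL phrasings sit in the training data alongside the native ones, non-standard word order still resolves to the right intent; scaled up, that is the effect the data lakes could have.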
Of all the platforms in the virtual assistant space Amazon’s Echo/Alexa strategy is the most interesting and progressive, and one I have great affinity with having had similar aspirations for Indigo.
Amazon has an open door policy for use of the Alexa platform, allowing developers to generate new skills for Alexa, and even plant her on OEM devices. At a re/code event recently Jeff Bezos mentioned that over 1,000 people were now working on this product. That puts Amazon's investment in the product line into the billions, and they are currently recruiting for 400+ people to join the team. That's serious investment! In Europe they are opening another office in Turin, and the EU team alone will rise by 50%+.
What's surprising about Amazon's stake in this market is that we haven't seen any real acquisition strategy yet: there's a real focus on adding manpower, but very little on acquiring technology to improve the product stack. Recruitment is a slow process even in agile organisations like Amazon, whereas use of existing machine learning tools via acquisition or contract could rapidly increase their progress against the vast Amazon data lake.
There comes a point in every growth or acceleration strategy, where acquisition is the best strategy. I believe Amazon’s Echo has reached this buffer, and acquisition will hopefully come in the next cycle of Alexa’s story.
Alexa is great but it lacks true conversational ability. Building that rather than buying it is a task not to be underestimated, because conversation is inextricably tied to core NLP functions.
Alexa's SDK is powerful but requires significant programming experience, and is still a little fragmented: you have to open a number of interfaces to get a skill configured and up and running. Wouldn't it be great if you could build a skill or a bot as easily as you could build a website? After all, this is a pure content play; the beauty of a good intelligent assistant is not in the lines of code, it's in the content (quality dialog) and the navigation (conversational structure). Many of the big NLP players (Apple, Microsoft, Google etc.) now employ writers and journalists for their bot strategies for exactly this reason – to create engaging content. A good CMS for generating skills and bots would open the doors to so many more creative people to join in and pioneer.
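The "content play" point can be sketched in a few lines: imagine a skill defined purely as data (the kind of structure a CMS could export), walked by one small generic engine. This is a hypothetical format for illustration, not the real Alexa SDK:

```python
# A skill expressed as content: each state has a line of dialog and
# keyword routes to other states. A writer could author this without
# touching the engine below. (Illustrative format, not an Alexa API.)
SKILL = {
    "welcome": {"say": "Welcome! Ask me for a fact or a joke.",
                "routes": {"fact": "fact", "joke": "joke"}},
    "fact": {"say": "Honey never spoils.", "routes": {}},
    "joke": {"say": "Why did the bot cross the road? To parse the other side.",
             "routes": {}},
}

def handle(state: str, utterance: str):
    """Return the next state and its dialog line for a user utterance."""
    for keyword, target in SKILL[state]["routes"].items():
        if keyword in utterance.lower():
            return target, SKILL[target]["say"]
    return state, SKILL[state]["say"]  # no match: repeat current prompt

state, line = handle("welcome", "tell me a joke")
print(line)  # prints the joke line
```

Everything a writer would touch lives in `SKILL`; the engine never changes. That separation of dialog content from code is exactly what a skills CMS would give non-developers.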
Providing the widest possible audience with the opportunity to participate and the tools to get involved is what will accelerate the Voice First Revolution.
For me that means supporting the ESL population and creating non-dev tools for greater participation.