The Present And Future Of Voice Recognition Technology

21 Aug 2015 - 5 min read

As usual when I start writing, the first thing I do is open a new Google Docs document, where I draft the next interesting topic to feed to the blog. Then I choose among technologies as carefully as the letters I’m presenting them with, and Google seems to be doing the same.

Google’s Alphabet

As I write this, Google’s announcement of Alphabet is still fresh from last week. In case it slipped under anyone’s radar, Alphabet is Google’s move to give more independence to the “far afield” businesses that previously flew under Google’s wing. According to Larry Page, CEO of Alphabet, the new structure is meant to make the company cleaner, more accountable and slimmed down. Alphabet is the parent company that will house a collection of companies ranging from Boston Dynamics to DeepMind (the B and the D of the Alphabet), but what caught my attention was the technology behind the letter O.

Ok Google

Enabled through voice recognition technology, Ok Google offers voice search and voice actions for waking up your web or mobile apps. Manage your calendar, navigation, entertainment and more by voice, and let Ok Google decipher what you say. You just go “Ok, Google…”.

“For example, say ‘Ok Google do I need an umbrella tomorrow’ to see if there's rain in the weather forecast.”

We can expect Ok Google to start working in offline mode pretty soon, according to an Android Police report. Apparently, new code in the app clearly refers to the possibility of carrying out voice commands without an Internet connection. However, the strings are limited to a small number of actions, like playing music or sending texts.


From Audrey to Siri…And Barbie

Voice recognition technology has been around for some decades now, but it is by no means getting old. Born in the 1950s with systems like Audrey, speech recognition “took off” in the 1970s, but progress has still been p-r-e-t-t-y…s-l-o-w (did the system catch that?). During the last couple of years, however, things have started to move.

First question asked of AI: “Is there a god?” First AI answer: “There is now.”


Google’s speech recognition technology now has an error rate of only 8% (compared to 23% in 2013). “Put simply, voice recognition in machines…will completely change the way humans interact with their computing devices”, writes Tim Tuttle, Chief Executive Officer of Expect Labs. Because of deep speech and the virtuous cycle of AI, voice recognition is getting “freakishly good”, with progress over the last 18 months that is more aggressive than what we’ve seen in the last 15 years combined. According to Tuttle, computers will start listening to us 24/7, and intelligent voice interfaces will soon move into all kinds of apps.



The Virtuous Cycle of AI: the more it is used, the more data it gathers and the better it works, which in turn brings in more users.


The market is trying out this newly improved technology in every conceivable way. Say Hello (to) the smart Barbie, a doll equipped with speech recognition that is able to converse with its users, i.e. the children playing with it. This very first interactive doll has understandably raised privacy concerns, since recordings of children’s conversations with the doll are being sent to third-party companies. “A lack of user understanding, and the involvement of children, who are potentially incapable of understanding that their actions are being monitored” is an issue that has to be dealt with before this Barbie doll gets housebroken.

Hot words to hide?

One of the popular stats circulating on the Internet right now is that 83% of millennials sleep with their phone. That’s a fun way of saying that people nowadays, and especially the younger generations, constantly carry their beloved devices around. It also underlines the apocalyptic eavesdropping scenario that’s moving from the Hollywood screen into real users’ minds. Will god-like systems have access to everything I say, and then feed the words to their preachers, the companies?

Ok Google is trying to get around the privacy worry simply by using “Ok Google” as a hotword. Only once the hotword is spoken is the technology activated, and only then does the device start listening to what the user is saying. But it’s debatable whether a hotword is reassuring enough.
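To make the hotword idea concrete, here is a minimal sketch of the gating logic in Python. It is a toy that works on already-transcribed text, not on audio, and it is not how Google implements Ok Google; the function name `commands_after_hotword` and the normalization rules are assumptions made purely for illustration.

```python
from typing import Iterable, List

HOTWORD = "ok google"  # the hotword named in this post


def commands_after_hotword(utterances: Iterable[str], hotword: str = HOTWORD) -> List[str]:
    """Keep only what is said right after the hotword; drop everything else.

    Hypothetical helper: a real assistant detects the hotword in the audio
    stream itself, which this text-only sketch does not attempt.
    """
    commands = []
    for utterance in utterances:
        # Normalize so that "Ok, Google ..." and "ok google ..." match alike.
        normalized = utterance.lower().replace(",", "").strip()
        if normalized.startswith(hotword + " "):
            command = normalized[len(hotword):].strip()
            if command:
                commands.append(command)
        # Anything not prefixed by the hotword is never stored or forwarded.
    return commands
```

Fed the utterances "what a day", "Ok, Google do I need an umbrella tomorrow" and a bare "ok google", the function keeps only the umbrella question. The privacy debate is about exactly the part this sketch glosses over: to spot the hotword at all, something has to be listening the whole time.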

S for Security

I’m amazed by the great potential that comes with voice in machines: from getting the weather forecast without a hassle and calling cheaply to people across the globe, to engaging in authentic conversations with robots. But for voice to really reach a smart future, it must simultaneously approach a secure one.

Today, constantly carrying around a digital ear creates a sense of being monitored by technology rather than being the one monitoring it. In terms of the Alphabet, that makes me think Google should have assigned the S to Security.

Click to learn more about voice API from Sinch.

For additional reading, check out this Awesome list of 70 Google Now voice commands.

