How Voice Recognition Is Close To Eliminating Typing

With Facebook's recent acquisition of voice-recognition company Wit.ai, all four major players in the post-PC market (Apple, Google, Microsoft and Facebook) now have significant infrastructure for hands-free communication with your device. But what will that mean for how we talk to our devices? Is voice just another input method, or are we on the cusp of a revolution in computer communication?

Picture: Getty Images/Hulton Archive

How old is your keyboard, anyway?

The humble computer mouse was created in the 1960s by engineer Doug Engelbart. The keyboard, through its ancestor the teleprinter, is even older, developed in the early 1900s by mechanical engineer Charles Krum and later connected to a video display terminal whose ancestry traces back to a device developed in the 1930s.

Despite their age, these devices remain the main input devices for your desktop or laptop computer.

Sure, they have more buttons, more colours or higher resolution, but the basic input mechanism for the average home computer is the same now as it was in 1984, when the Macintosh became the first commercially successful computer to offer a graphical user interface with mouse and keyboard input.

Picture: Marcin Wichary

Even the multi-touch screen, made famous by the iPhone and other devices in 2007, could be considered a direct descendant of the mouse, simply moving control of the pointer from an indirect method on your desk to a more direct method on the screen.

But perhaps that is all about to change, with voice-recognition technology finally becoming important to the main players and other technology changing the way we interact with computers.

Your voice is your password to a world of possibilities

Like the mouse and keyboard, voice-recognition technology has been around for a number of years. Commercial voice-recognition software has been available for computers since the early 1990s.

But it was only with the advent of technologies such as Apple's Siri and Google's Voice Search around 2010 that voice recognition became part of many people's lives.

Through a natural-language, context-aware interface that is always connected to the Internet, technologies such as Siri let users address a vast range of needs without touching their device at all, relying on voice alone to set timers, check the weather, find movie times and even ask where to hide a body.

In 2014, Microsoft introduced Cortana, a Siri-like competitor, meaning that all three leading smartphone platforms had voice recognition. Also in 2014, Apple introduced the "Hey Siri" feature in iOS 8, allowing users to "hail" a smartphone from across the room (as long as it's plugged in) and ask it a question without touching any buttons at all.

Google, meanwhile, released Google Now in 2012, an extension to Google Voice Search that provides users with contextual information before they ask for it, such as traffic conditions as you leave the office or a list of good restaurants when you arrive in a new town.

And it's widely rumoured that both Google and Apple have plans for voice-recognition technology in their television products as well.

While these solutions still have a way to go (John Malkovich surely remains the only person in history to get Siri to correctly interpret "Linguica"), they present a starkly different view from the old mouse-and-keyboard combination.

It surely won't be long before users can have a standard conversation with their device, talking it through a problem rather than frantically tapping the on-screen keyboard or clicking the mouse.

Blending the digital and the physical world

The revolution extends beyond our voice to other devices as well. It would appear that along with replacing old-fashioned input devices, output devices like the monitor are slowly being phased out.

Last year, before it acquired Wit.ai, Facebook made news for acquiring pioneering virtual-reality company Oculus VR for a staggering US$2.3 billion. The major product of Oculus VR is the Oculus Rift, a virtual reality headset that immerses you fully in a 3D virtual experience.

Using positional sensors, the Oculus Rift tracks your head movements to let you look around the environment. The device is still in development, but with the development kit priced at around US$400, the final product is expected to retail for less than US$500, bringing virtual reality to the everyday consumer.

Even if you don't want full immersion, new output products are making it easier for us to step in and out of a digital world without needing a computer monitor. Products such as the new Android Wear watches from Motorola and others, as well as the Pebble smartwatch and upcoming Apple Watch, provide us with small, customised views into the digital world.

These can put notifications, music control, sleep and activity monitoring and all the power of those voice-control systems literally at our fingertips, without the need for a full-sized input or output device.

Even in your car, Android Auto and Apple's CarPlay provide a glanceable, touchscreen and voice-controlled interface to your smartphone to ensure you're always connected to the cloud.

Sensors everywhere

Beyond these standard options, input devices and data-gathering devices are continuing to pop up in places that we don't expect, making it easier to interact with your devices and control your digital world.

At the Consumer Electronics Show (CES) this year, gadgets using Bluetooth Low Energy to communicate with your home network abounded, from a smart chair that helps you work out to a plant pot that monitors your plants' vitals and lets you water them at the touch of a button.

These join products from the past year such as the connected toothbrush, which monitors your brushing time and style and reports on how you're doing, and the Vessyl cup, a smart cup that can tell you the calorie and caffeine content of your beverages as well as keep track of your daily water intake.

No longer are we tied to our keyboard and mouse to look up and record this data. Our devices will now do it for us automatically and let us know when something needs to be changed.

This trend towards the Internet of Things has been brewing for a number of years, but if CES is any indication, this year marks a real explosion in external input devices that collect data about us and feed it into the cloud.

It will be interesting to see what the future brings. It could be argued that these new ways of communicating with your computer are already here, if only in their infancy.

As the year progresses and these models mature, perhaps it won't be long before we are speaking to our device using natural language while wearing a VR headset and being instantly alerted about the status of our plants and how much activity, sleep and caffeine we've had so far today.

With all of these solutions, the mouse and keyboard are finally looking mighty old-fashioned.

Michael Cowling is Senior Lecturer & Discipline Leader, Mobile Computing & Applications at Central Queensland University.

This article was originally published on The Conversation. Read the original article.


Comments

    Oh yeah, THAT'S going to work in an office environment.

    Everyone talking to their computers... that's not going to be fucking distracting at all.

      You just beat me to the noisy aspect - it would be completely untenable in my current workplace anyhow...

    Voice recognition can't always be the practical solution. Maybe it's just me being technologically backwards, but dictating, editing and formatting a 60,000-word report for work with voice commands doesn't sound that exciting to me. Not to mention it would make for a very noisy office. I don't think it would really be more efficient or faster either.

    Everyone is always skeptical, but look at the way the workforce is moving: more people are working from home or remotely, meaning the noise factor would be less of an issue. It's also not going to work for every role, but consider that voice will not take over completely, and will not hit the mainstream until it provides a strong enhancement and lets you do things more easily than before.

    Consider this: you are typing a 60,000-word report and a pop-up notification appears on your task bar for an email you want to answer straight away. Instead of moving your hands from the typing position to the mouse, you could say "respond", which would bring up the email reply. You could then type the reply, or dictate it, and send it with another verbal command, leaving you right where you were, typing away at your report.

    As long as it becomes an enhancement to your environment and doesn't make things harder, it will begin to get wider adoption until it becomes the norm.

    I think when most people hear about voice recognition taking over, they straight away start thinking about all the places it wouldn't work and then disregard it. We need to start considering the uses where it could improve things and make your work life more efficient.
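    The workflow this commenter describes boils down to a keyword-to-action mapping layered on top of a speech-recognition engine. A minimal sketch in Python, assuming the speech has already been transcribed to text upstream (the command names and handler functions here are hypothetical, purely for illustration):

```python
# Toy voice-command dispatcher: maps transcribed phrases to actions.
# Assumes an upstream speech-recognition engine supplies the transcript;
# the commands and handlers below are purely illustrative.

def reply_to_email():
    return "email reply window opened"

def send_email():
    return "email sent"

# Registry of spoken keywords -> handlers (hypothetical commands).
COMMANDS = {
    "respond": reply_to_email,
    "send": send_email,
}

def dispatch(transcript: str) -> str:
    """Run the first registered command whose keyword appears in the transcript."""
    for keyword, handler in COMMANDS.items():
        if keyword in transcript.lower():
            return handler()
    return "no command recognised"

print(dispatch("Respond"))      # -> email reply window opened
print(dispatch("Send it now"))  # -> email sent
print(dispatch("keep typing"))  # -> no command recognised
```

    The point of the sketch is that the voice layer only needs to intercept a handful of keywords; everything else (the report you're typing) continues to flow through the keyboard untouched.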

    Can Lifehacker write an article on the difference between voice recognition and speech recognition please? So many people do not know the difference, like the author of this article.

    Speech recognition = machine recognition of what was spoken, speaker independent
    Voice recognition = machine recognition of who said it, not worrying about what was said

    Siri and Cortana are both speech recognition software, NOT voice recognition. They work out what was said without regard to who is saying it, and work with anybody and everybody.

    Voice recognition is mainly used in security systems where the machine tries to figure out who said it. It doesn't really matter what words were spoken but who it was that said it. Voice recognition is a much harder problem, and it really hasn't been solved yet.

    A fast typist can not only enter text very quickly but also edit and format it at the same time, all very rapidly. Telling the machine to do the same thing requires a completely new set of skills - something that helps explain why speech/voice recognition has been so slow to take off.

    Imagine what is involved in instructing the machine to, say, change the order of a sentence, or change a misspelt 'too' to 'two'. You have to remember the correct way to do it or it won't work; you have to consciously think about it and how to express it correctly. Until you learn the new skills required, the old ways will be quicker - and how many people (a) know where to learn these skills, or (b) are willing to invest the time?
