Research Paper for NMD102
NMD Spring 2017
Keywords: voice, recognition, personal assistant, interface, speech
What is a “voice-user interface”? A voice-user interface (VUI) is what makes it possible for us humans to interact with our computers or other equipped technologies using just our voice, whether we’re asking for the time, the weather, or an upcoming date, or telling our stove to preheat itself to 400 degrees for our frozen pizzas. I chose this topic because of a recent increase in how often I see these technologies being used by my friends and the people around me. I personally never use any voice-user interfaces, because my phone doesn’t have Siri and I don’t see the real point in using them yet when I can just Google something myself if need be. I do notice how often some of my friends use Siri or their Amazon Alexas, and I do see the use for them, but it just doesn’t stick with me. I’m much more of a hands-on type of person, so yelling across the room to a little speaker that’s always listening to what I have to say doesn’t really interest me. Yet the idea of these VUIs does.
Shawn DuBravac, quoted on TechRepublic, says that “We’ve seen more progress in this technology in the past 30 months than we have in the past 30 years. Ultimately vocal computing is replacing the traditional graphical user interface.” If VUIs replace our standard graphical user interfaces, what would that be like? Would people just walk around with little speakers in their ears, or clipped to their shirts like microphones, or would some even be embedded in us to get rid of the outside element itself? Spike Jonze directed a movie called Her, starring Joaquin Phoenix, that shows what a world with only VUIs would be like. With little earpieces for talking to their handheld computers, the world seems even smaller than it already was. Everything at this point is connected by talking to your computer instead of using graphical interfaces or even communicating with other people. Who needs people when you’ve got the whole internet in your ear?
Google, Apple, Baidu, Amazon, and Microsoft, the major providers of internet services and household devices, all offer voice-user interfaces. Microsoft even has a piece of technology called the Kinect, an attachment for its Xbox consoles that lets the user be the controller: whether they are playing a video game or just browsing the Xbox menus, their body controls what’s happening on the screen. Along with watching, recording, and analyzing your body and movements whenever the console is turned on, the Kinect also listens to you. The user can control what’s happening with their voice, using basic commands such as “Xbox on,” “Xbox off,” or “Xbox record that,” so much of the time it is listening to what you say.
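To get a feel for how fixed command phrases like these could be handled, here is a minimal sketch in Python. It assumes the hard part (turning speech into text) has already happened and only shows the last step: matching the transcribed words to an action. The action names and the functions `normalize` and `match_command` are made up for illustration; this is not how the Xbox actually does it.

```python
# Hypothetical mapping from spoken phrases to action names. The phrases are
# the examples mentioned above; the action names are invented for this sketch.
COMMANDS = {
    "xbox on": "power_on",
    "xbox off": "power_off",
    "xbox record that": "record_clip",
}


def normalize(transcript):
    """Lowercase the text, drop punctuation, and collapse extra spaces."""
    kept = "".join(ch for ch in transcript.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def match_command(transcript):
    """Return the action name for a recognized phrase, or None otherwise."""
    return COMMANDS.get(normalize(transcript))


if __name__ == "__main__":
    for heard in ["Xbox, on!", "xbox record that", "what time is it"]:
        print(repr(heard), "->", match_command(heard))
```

A lookup table like this only understands exact phrases, which is one reason assistants such as Siri or Alexa, which take open-ended questions, need much more than a list of commands.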
With Google it’s Google Voice Search, software like Apple’s Siri that allows you to use your voice to surf the web. It’s like having your own personal assistant on your phone, laptop, tablet, or anything else you can install it on, which makes it almost universally accessible.
Baidu, or the “Google of China,” has Deep Speech 2, a speech-recognition system that uses cloud computing and machine learning to train a neural network. A neural network is a computer model loosely inspired by the human brain (freaky, right?), which makes it a machine that learns. Deep Speech 2 has mastered a certain number of linguistic concepts, which allows it to “learn” specific languages more easily than usual. Given how Deep Speech 2 is progressing, its developers think it may be the first step toward a universal translation engine that could recognize any language and translate it for you almost instantly.
Apple has Siri, which most people have these days if they own an iPhone 4S or newer (the S in 4S reportedly stands for Siri). Siri is a personal assistant application for iOS that uses voice recognition to answer questions, perform simple tasks, and make recommendations based on the user. Siri is probably the most commonly used VUI, even among people who don’t really know what a VUI is. People use Siri every day to set alarms and reminders, send texts, ask about the weather, perform basic web searches, and even to find their phone by saying “Hey Siri” loud enough that the phone dings.
That brings us to Amazon, which has the new, sleek, game-changing Amazon Echo. The Amazon Echo is a hands-free, voice-enabled speaker equipped with a far-field speech recognition system that lets you make an array of questions and requests from an impressive distance without even touching it. You can even hook the Echo up to some of your household items, such as the updated GE Appliances that work with Amazon’s Alexa (the voice assistant that powers the Echo). This makes doing things around the house much simpler; you don’t even have to lift a finger.
With all these VUIs already in use, what is in store for us in the future? We can already see that before Christmas of 2016 the Amazon Echo was in 4% of American households… 4%... That’s around 8 million households. That’s a substantial number of people talking to a cylinder on top of their tables or countertops. Recode.net has an article about the future of these voice-user interfaces, and it says that voice will be a primary interface in households everywhere. Soon we’ll be able to turn on lights, kitchen appliances, water, alarm systems, sound systems, and other things with just our voices. Soon there will even be cars with VUIs that let us control or interact with them by voice. VUIs will also be a large part of hands-free workplaces like hospitals, warehouses, laboratories, and production plants. With these voice-controlled technologies rapidly taking over our workplaces, what is going to happen to all the real people who are likely to be replaced by machines? The Guardian reports that robots will eliminate 6% of all US jobs by 2021. This is especially obvious in the customer service industry, in retail and fast food jobs where we practically don’t even need a human anymore as it is. I personally can’t wait to just converse with a computer rather than a person when I’m ordering my food; the computer will most likely get my order correct.
With all that being said, we can expect a large increase in VUIs in the coming years. Whether they are updated versions of the ones we already have or new creations for workplace machines, we’ll soon be talking to inanimate objects much more.
"B Corporation." Technologies of Voice Interface Ltd. | B Corporation. N.p., 01 Nov. 2015. Web. 20 Mar. 2017.
"How voice technology is transforming computing." The Economist. The Economist Newspaper, 07 Jan. 2017. Web. 20 Mar. 2017.
Wesemann, Darren L., Dong-Kyun Nam, Richard T. Newton, and Talk2 Technology, Inc. "Patent US6349132 - Voice interface for electronic documents." Google Books. N.p., 16 Dec. 1999. Web. 20 Mar. 2017.
Condon, Stephanie. "CES 2017: Voice is the next computer interface." ZDNet. ZDNet, 05 Jan. 2017. Web. 20 Mar. 2017.
Tuttle, Tim. "The Future of Voice: What's Next After Siri, Alexa and Ok Google." Recode. Recode, 27 Oct. 2015. Web. 20 Mar. 2017.
Milazzo, Joe. "How Baidu's Deep Speech 2 Is Winning The Speech Recognition Game." Chinese Learning Tips. N.p., n.d. Web. 20 Mar. 2017.
Crist, Ry, and David Carnoy. CNET. N.p., 15 Feb. 2016. Web. 20 Mar. 2017.
Solon, Olivia. "Robots will eliminate 6% of all US jobs by 2021, report says." The Guardian. Guardian News and Media, 13 Sept. 2016. Web. 20 Mar. 2017.