5 Myths about Speaker Dependent Voice Solutions

Posted on Aug 12, 2015 9:32:00 PM


For voice solutions that give workers task direction and accept their spoken commands as input, recognition quality is everything. Any speech recognition system faces significant challenges, including:

  • The variety of ways the same word or phrase is said by different users
  • Changes in the user's voice within a day or from day to day
  • Loud background noise from fans, loudspeakers, radios, conveyors, forklifts, sirens, etc.
  • Changes in that background noise from one area of the facility to another
  • Acoustic diversity of industrial and manufacturing environments

Myth 1: Speaker dependent solutions are the only way to achieve excellent speech recognition.

Fact: Speaker independent solutions, when implemented correctly, provide excellent voice recognition in noisy industrial environments and have significant advantages over their speaker dependent counterparts, including recognition quality and improved productivity.

  1. Not so long ago, speaker independent solutions were poor at recognizing users in industrial environments. However, through a combination of faster processors, improved speaker independent recognition engines, better microphones, and more sophisticated software noise elimination techniques, speaker independent systems now provide excellent speech recognition. In fact, best-of-breed speaker independent systems have improved to the point where users' speech is recognized approximately 99.6% of the time. Workers get immediate ease of use and can focus on their tasks without worrying about whether the technology is working.
  2. Speaker dependent training is beneficial for the small percentage of users who have unique ways of saying a word or command. A speaker independent system lets you address this by training specific words for this small group instead of making the entire user group train their words.
  3. Not all speaker independent solutions can handle the variety of speakers and background noises. Many have yet to achieve the right combination of factors to provide consistent, excellent speech recognition. Do your research when investigating speaker independent solutions. Before you buy, talk to users of the vendor's system.
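To put the approximately 99.6% figure in practical terms, the short sketch below converts a recognition rate into an expected number of misrecognized utterances. The function name and the utterance count in the usage line are illustrative assumptions, not figures from any vendor.

```python
def expected_misrecognitions(utterances: int, recognition_rate: float) -> float:
    """Expected number of misrecognized utterances.

    `utterances` is a hypothetical count of spoken responses (for example,
    per worker per shift); `recognition_rate` is a fraction such as 0.996.
    """
    if not 0.0 <= recognition_rate <= 1.0:
        raise ValueError("recognition_rate must be between 0 and 1")
    return utterances * (1.0 - recognition_rate)


# Hypothetical workload: 1,000 spoken responses in a shift.
misses = expected_misrecognitions(1000, 0.996)
print(f"Expected misrecognitions: {misses:.1f}")
```

Under that assumed workload, a worker would expect to repeat a response only a handful of times per shift.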

Myth 2: New entries into the voice enablement market use speaker independent engines because they are inexpensive and easier to develop.

Fact: Yes, speaker independent systems are easier to develop and cost the user less. That is a good thing for developers and users. Why build a voice solution around an old technology when the new technology is better?

  1. In reality, companies moving into the speech market have realized there is no need to develop the basic voice engine, because engines are available from several well-established providers. By using a commercially available speech engine, a company can focus on adapting that engine to its customers' needs. As advances in recognition technology are made, the speech engine manufacturer is equipped to implement them quickly, so existing clients can move the enhancements to the field without delay.
  2. No speech provider has developed a speaker dependent engine in the last six to ten years. The existing speaker independent engines are just that good.

Myth 3: Training speaker dependent systems is trivial and easy to do.

Fact: Each user of a speaker dependent solution has to train it to their voice. Speaker dependent providers downplay this training, but it may take anywhere from 20 to 40 minutes to train a system to a specific user's voice. Also, because users are new to the system, they may train it differently from the way they speak when they are actually performing their tasks. This causes recognition errors and can frequently force a user to train the system a second, third, or fourth time.

  1. As the user's voice changes from day to day and from first thing in the shift to late in the shift, the user may have to retrain words in their voice template again. Any training or retraining of words is lost productivity.
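The cost of that training adds up across a workforce. The sketch below is a rough back-of-the-envelope calculation using the 20 to 40 minute range quoted above; the workforce size and the number of training sessions are hypothetical inputs chosen only for illustration.

```python
def training_hours(users: int, minutes_per_session: float, sessions: int = 1) -> float:
    """Total hours spent on voice training across all users.

    `sessions` counts the initial training plus any retraining passes;
    all inputs here are illustrative assumptions, not measured data.
    """
    return users * minutes_per_session * sessions / 60.0


# Hypothetical site with 60 users, using the 20 to 40 minute range.
low = training_hours(60, 20)
high = training_hours(60, 40)
print(f"Initial training: {low:.0f} to {high:.0f} hours")

# If every user ends up training twice at the high end of the range:
print(f"With one retraining pass: {training_hours(60, 40, sessions=2):.0f} hours")
```

Every hour in that total is time not spent picking, which is the lost productivity the point above describes.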

Myth 4: Speaker dependent solutions allow you to provide any responses you want and still achieve good recognition.

Fact: When user responses get too short, they can cause what are called "inserts." An insert occurs when the speaker dependent engine returns what it took to be a valid response when in fact it heard a horn beep, a gate squeak, or a PA system.

  1. Frequently users need to train a number as "fiver" instead of five or "niner" instead of nine to get adequate recognition.
  2. Speaker independent systems can suffer from the same behavior. That is why you need to pick a speech provider that has a solution built and tested to work in noisy and frequently changing environments. What works in a conference room may not work next to a noisy conveyor.
  3. The system should be able to adapt automatically without making any configuration changes or retraining any users when the background sound changes significantly.

Myth 5: Once a speaker dependent solution is trained, it remains trained.

Fact: A user's voice is not always the same. The voice can become tired at the end of a hard day and cause recognition issues for a speaker dependent solution. A person who has a cold sounds very different from that same person without the cold. The voice of a person who comes to work the day after coaching their son's Little League game can be significantly different from their normal voice. All of these can cause problems for a speaker dependent solution.

  1. Speaker dependent solutions have also been known to need retraining when a user starts working in a different part of the warehouse with very different background noise, for example when moving from floor pick, where there is little noise, to pick-to-belt, where there is a noisy conveyor.
  2. When a speaker dependent solution has been trained improperly, the usual remedies are either to ignore the "inserts" or to retrain voice templates in the hope that this addresses the issue. These solutions rely heavily on noise cancelling microphones, a woefully inadequate "fix" when noise comes from all directions (especially from behind the user). The burden is on the user to train templates, retrain words, and test for background noise changes. These retraining and trial-and-error remediation tasks can significantly reduce productivity.
  3. With speaker dependent solutions, templates for each user must be stored and managed, adding to the overhead and cost of the solution. They require IT resources, a server, and user time. These activities and requirements all take away from the return on investment (ROI) expected from a voice solution.

Moving from the Myths to the Facts

Speaker dependent solutions used to be the only way to go for industrial environments. That has changed. Speaker independent solutions now provide speech recognition accuracy that has set a new bar in the industry, without the problems inherent in speaker dependent solutions. At least one speaker dependent provider now offers a speaker independent solution as well.

Today's best-of-breed speaker independent solutions offer the least hassle for your users, the best voice recognition in noisy or changing environments, and the best ROI in voice productivity.


Topics: Warehouse Management
