In Other WORDS Technical Writing, Translation, Training, USA and Israel
  


 

Using Text-to-Speech Technology in Presentation Software

by Dr. Joel Harband, founder of Tuval Software Industries, Ltd. Dr. Harband has a PhD in Mathematics from New York University and extensive professional experience in both software engineering and technical documentation. His company has recently developed PowerSpeak, a state-of-the-art text-to-speech-based platform for cost-effective speech-empowerment of PowerPoint® presentations.

Text-to-speech (TTS) technology is not new, but its quality has recently improved to the point where it can be used in presentation software like Microsoft® PowerPoint® to empower presentations with affordable, quality narration.

Overcoming Limitations in PowerPoint
These advances in TTS technology, which are described in this article, come just in time to answer a need in PowerPoint presentations that is being discussed more and more in the literature.
Verbal Channel vs. Visual Channel
Richard E. Mayer, Professor of Psychology at the University of California, Santa Barbara, says that three important features of the human information processing system are particularly relevant for PowerPoint users:
  1. Dual-channels – people have separate information processing channels for visual material and verbal material;
  2. Limited capacity – people can pay attention to only a few pieces of information in each channel at a time;
  3. Active processing – people learn best by actively organizing the presented material and integrating it with their prior knowledge.
Based on the dual-channel characteristic, Professor Mayer has defined the modality principle: people learn better from animation with spoken text than animation with printed text. Accordingly, PowerPoint presentations should use both visual and verbal forms of communication. And the verbal information should be more than just reading aloud what already appears on the screen.

In connection with the limited capacity trait, research has shown that there can be a visual overload in presentation software – more information is presented as words and pictures than the viewer can absorb.
Balancing Narration with Powerful Visuals
The proposed solution is to seek a balance of screen visuals and verbal narration to influence the viewer on both channels. In a live presentation, a capable presenter can provide the required narration, and so can high-quality synchronized voice-over recordings. However, a stand-alone presentation without narration, such as a promotional or a training presentation, is inherently limited in its ability to influence the viewer, no matter how good the visuals are.

This is where the new TTS technology fits in nicely: it is an affordable and convenient way to provide audio information synchronized with the presentation visuals to have the maximum effect on the viewer. It also lets the presentation designer remove written text from the screen, freeing up valuable real estate for the more powerful pictures and graphics that better illustrate the narration.

The result is so good that presenters may choose to play parts or all of the presentation like a movie clip, pausing to add information or to answer questions.
What is Text-to-Speech?
Text-to-Speech (TTS) is the automated synthesis of speech from text. The heart of the system is the TTS engine – a sophisticated piece of software that:
  • parses the text input,
  • analyzes its grammar, sentence structure, punctuation and capitalization, and
  • activates voice simulations to produce a vocal rendering of the text.
The data for individual voices, including regional accents, are provided in separate files called "voices". The TTS engine can work with any of the voices interchangeably. The TTS system is illustrated in the following diagram.

[Diagram: TTS system]
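The three engine stages listed above, and the idea of interchangeable voices, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular TTS engine is built: the `Voice` class and the `parse`, `analyze`, `render` and `speak` functions are hypothetical names, and the "rendering" step returns a text description instead of audio.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Voice:
    """Hypothetical stand-in for a voice file: a name plus a language tag."""
    name: str
    language: str


def parse(text: str) -> list[str]:
    """Stage 1: split the input into sentences at '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def analyze(sentence: str) -> dict:
    """Stage 2: derive simple cues from punctuation and capitalization."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return {
        "words": [w.lower() for w in words],
        "is_question": sentence.endswith("?"),
        # All-caps words are treated as emphasized (an assumption of this sketch).
        "emphasized": [w.lower() for w in words if len(w) > 1 and w.isupper()],
    }


def render(analysis: dict, voice: Voice) -> str:
    """Stage 3: placeholder for synthesis; describes what would be spoken."""
    contour = "rising" if analysis["is_question"] else "falling"
    return f"[{voice.name}/{voice.language}, {contour}] " + " ".join(analysis["words"])


def speak(text: str, voice: Voice) -> list[str]:
    """Run the whole pipeline with whichever voice is passed in."""
    return [render(analyze(sentence), voice) for sentence in parse(text)]
```

Because the voice is just a parameter of `speak`, the same pipeline can be run with a different `Voice` object without touching the engine code, which mirrors the separation between engine and voice files described above.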
Improved TTS Technology
Today's TTS technology is much improved over that of even a few years ago. The older systems, which produced the robotic sounds that people tend to associate with computer voices, used the parametric or formant synthesis method to simulate the acoustic properties of speech.
Using a Real Voice
Recently, voices that use the concatenation method have become commercially available: the voice of a real human speaker is divided into phonemes, which are stored in the voice file. In a particular application, the TTS engine assembles the phonemes according to the input text, reconstructing the original human voice to speak the text. Because a real human voice is used, it is sometimes hard to tell the synthesized speech from a live recording.
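The concatenation idea can be illustrated with a toy example. The sketch below assumes a hypothetical voice file (`VOICE_UNITS`) that maps unit labels to recorded samples and a hypothetical pronunciation lexicon (`LEXICON`); the short integer lists stand in for real audio, and a real engine would also smooth the joins between units.

```python
# Hypothetical "voice file": each unit label maps to recorded samples.
# Real voice files hold audio captured from a human speaker; the short
# integer lists here are placeholders.
VOICE_UNITS = {
    "HH": [1, 2],
    "AY": [3, 4, 5],
    "T": [6],
    "IY": [7, 8],
}

# Hypothetical pronunciation lexicon mapping words to unit sequences.
LEXICON = {
    "hi": ["HH", "AY"],
    "tea": ["T", "IY"],
}

PAUSE = [0]  # one silent sample inserted between words


def text_to_units(text):
    """Look up the unit sequence for every word in the input text."""
    sequences = []
    for word in text.lower().split():
        if word not in LEXICON:
            raise KeyError(f"no pronunciation for {word!r}")
        sequences.append(LEXICON[word])
    return sequences


def synthesize(text):
    """Concatenate the stored samples in the order the text requires."""
    samples = []
    for index, word_units in enumerate(text_to_units(text)):
        if index > 0:
            samples.extend(PAUSE)
        for unit in word_units:
            samples.extend(VOICE_UNITS[unit])
    return samples
```

Calling `synthesize("hi tea")` strings together the stored samples for each unit in order, with the pause sample between the two words. Changing the narration means changing only the input text; the recorded units are reused.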
TTS Applications
TTS technology can help businesses save time and money, especially when compared to the alternative of using pre-recorded speech files. With a TTS system, the cost per minute of audio track is less than half that of studio-based recording and synchronization. The savings are especially significant in the following cases:
  • Where the text can change – with TTS, changes are made by simple text editing as opposed to expensive studio re-recording
  • Multi-lingual applications – with TTS, you simply switch voices to the language desired
  • When you need to make deadlines – the TTS system is available any time, any place
PowerSpeak Platform
However, to make this work in a cost-effective way, you need an efficient platform for integrating the TTS system with the presentation software. Tuval Software Industries' PowerSpeak™ is such a platform for PowerPoint. In Other WORDS can speech-empower your presentations with PowerSpeak.
References:
PowerPoint discussions:
http://www.sociablemedia.com/articles_mayer.htm
http://www.sociablemedia.com/PDF/atkinson_mayer_powerpoint_4_23_04.pdf
http://www.ntlf.com/html/sf/notevil.htm
http://www.sethgodin.com/freeprize/reallybad-1.pdf

TTS system in applications:
http://www.tmaa.com/tts/TTS%20overview.pdf
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnnetspeech/html/txt2spch.asp


© 2007 In Other WORDS All Rights Reserved