Wednesday, April 11, 2007
Text-To-Speech and education
I've been toying with Text-To-Speech (TTS). These products let you input text (typed in, from a document, from a web page, ...) and hear it read aloud by a synthetic voice.
The million dollar question: is TTS technology good enough today to include in e-learning? The answer: no. There has been remarkable progress, and some premium voices sound quite natural. But you can still tell the difference, which distracts the learner from the content. The voices are OK for short periods, but you don't want to listen to a TTS voice for 15 minutes unless you really have to.
The usage
That doesn't mean TTS cannot be used for learning. I suggest these uses:
- Inclusive learning: provide TTS as an alternative medium for students with eye or reading problems. Here there is no choice: for them it's either a monotone robot voice or nothing at all. But there are many text readers available, even as a standard part of the Windows XP or Vista operating system, so rather than including TTS in your learning package, just give blind people one of the available programs that can read the screen.
- Short sentences: you can use TTS for short pieces of speech, such as an instruction ("Click next to see your results") or objectives ("In this lesson you will learn nothing useful but it will keep you off the street").
- Prototyping: the biggest use for TTS is in prototyping. Before you hire an expensive professional speaker, include TTS audio files in the mock-up of your e-learning course. The big advantage of TTS is that you can change sentences at little or no extra cost, whereas paying someone to rerecord a sentence for one extra word is expensive. When the prototype is accepted, go to the recording studio.
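One way to keep changes cheap while prototyping is to regenerate only the clips whose text actually changed. The sketch below is a minimal illustration of that idea, not a feature of any tool mentioned in this post: the script is a plain dictionary of hypothetical clip ids and sentences, and a small manifest of text hashes decides which clips need a new TTS run.

```python
from __future__ import annotations

import hashlib


def text_hash(sentence: str) -> str:
    """Return a stable fingerprint of a narration sentence."""
    return hashlib.sha256(sentence.strip().encode("utf-8")).hexdigest()


def clips_to_regenerate(script: dict[str, str], manifest: dict[str, str]) -> list[str]:
    """List the clip ids whose text is new or changed since the last run.

    script   maps a (hypothetical) clip id to its current sentence.
    manifest maps a clip id to the hash stored after the last TTS run.
    """
    return sorted(
        clip_id
        for clip_id, sentence in script.items()
        if manifest.get(clip_id) != text_hash(sentence)
    )


def updated_manifest(script: dict[str, str]) -> dict[str, str]:
    """Build the manifest to store once every clip is up to date."""
    return {clip_id: text_hash(sentence) for clip_id, sentence in script.items()}


if __name__ == "__main__":
    script_v1 = {
        "intro": "Welcome to this lesson.",
        "next": "Click next to see your results.",
    }
    manifest = updated_manifest(script_v1)
    # Reword one sentence; only that clip should be listed.
    script_v2 = dict(script_v1, intro="Welcome to lesson one.")
    print(clips_to_regenerate(script_v2, manifest))
```

With a human speaker every reworded sentence means another studio session; here a reworded sentence only means rerunning the conversion for the listed clips.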
Two terms you will come across:
- TTS: Text-To-Speech, from written to spoken word.
- SAPI: the Speech API, Microsoft's standard for speech applications. Most products today use either the older SAPI4 interface or the newer SAPI5 interface. Make sure your software and voices all adhere to the SAPI standard.
To produce TTS audio you need two things:
- The software to generate the sound output from the text input
- The voices
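The SAPI point above can be checked mechanically: a voice can only be used in a tool if both support at least one common SAPI version. The sketch below encodes the versions this post states for a few tools and voices. The dictionary layout, the function name and the SAPI5-only reader are my own illustration, not data from any vendor.

```python
from __future__ import annotations

# SAPI versions as stated in this post. Tools whose SAPI support the post
# does not mention are deliberately left out.
TOOL_SAPI = {
    "TextAloud": {4, 5},
    "Balabolka": {4, 5},
    "Hypothetical SAPI5-only reader": {5},  # invented, for illustration only
}

VOICE_SAPI = {
    "Microsoft Sam": {4, 5},
    "L&H TruVoice": {4},
}


def common_sapi_versions(tool: str, voice: str) -> set[int]:
    """Return the SAPI versions shared by a tool and a voice (empty if none)."""
    return TOOL_SAPI[tool] & VOICE_SAPI[voice]


if __name__ == "__main__":
    for tool in TOOL_SAPI:
        for voice in VOICE_SAPI:
            shared = common_sapi_versions(tool, voice)
            status = "OK" if shared else "incompatible"
            print(f"{tool} + {voice}: {status}")
```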
The software
There are many text readers available. For our usage we need a tool that can also export the sound to files, preferably in a batch or automated process. For complete automation, the tool should support an API (interface) or command line that other programs can call.
I recommend TextAloud from Nextup.com. It's popular, good shareware that is very cheap ($29.95, with a $5 discount when you purchase some voices as well). You can try it out for 30 days. It has an easy interface, supports both SAPI4 and SAPI5 voices, and lets you change the pitch, tone and volume of a voice. Out of the box it exports to mp3 and wav, and when you install a free extra ActiveX encoder it also exports to wma. But the most interesting feature is batch conversion: put the text files in a folder, point the tool to it, and it creates the corresponding voice files. Other features I like are the ability to change voice within a text and to add your own vocabulary.
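The batch idea is simple enough to sketch: walk a folder and, for every text file, produce an audio file with the same base name. The code below is a minimal sketch of that pattern and not how TextAloud is implemented. The `synthesize` callable is a placeholder that you would replace with a call into whatever TTS engine or tool you use.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable


def batch_convert(
    text_dir: Path,
    audio_dir: Path,
    synthesize: Callable[[str, Path], object],
    extension: str = ".wav",
) -> list[Path]:
    """Convert every .txt file in text_dir to an audio file in audio_dir.

    synthesize(text, target) is a placeholder: it must write the spoken
    version of text to the target path.
    """
    audio_dir.mkdir(parents=True, exist_ok=True)
    created = []
    for text_file in sorted(text_dir.glob("*.txt")):
        # lesson1.txt becomes lesson1.wav in the output folder
        target = audio_dir / (text_file.stem + extension)
        synthesize(text_file.read_text(encoding="utf-8"), target)
        created.append(target)
    return created
```

Keeping the synthesizer as a parameter means the same loop works whether the audio comes from a SAPI voice, a command-line tool or a stub used for testing.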
TextAloud also has an API and a command line interface, but for that you need to pay for an extra license of $250.
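Once a tool has a command line, another program can drive it. The sketch below shows the general pattern with Python's `subprocess` module. The executable name `tts-tool` and its `--in`, `--out` and `--voice` flags are invented placeholders, not TextAloud's actual options, so check your tool's documentation for the real syntax.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def build_command(text_file: Path, audio_file: Path, voice: str) -> list[str]:
    """Build the argument list for a hypothetical command-line TTS tool."""
    return [
        "tts-tool",                 # placeholder executable name
        "--in", str(text_file),     # placeholder flag: input text file
        "--out", str(audio_file),   # placeholder flag: output audio file
        "--voice", voice,           # placeholder flag: voice name
    ]


def convert(text_file: Path, audio_file: Path, voice: str) -> None:
    """Run the tool; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_command(text_file, audio_file, voice), check=True)
```

Passing the arguments as a list rather than one long string avoids quoting problems with file names that contain spaces.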
Other tools I came across:
- Ultra Hal reader: comes with the NeoSpeech voices Kate and Paul for only $24.95, which makes it a cheaper package. No batch export or any automation.
- TextSound from ByteCool: another shareware tool, with a command line tool, for about $29.95. You can download a trial that lasts 50 conversions.
- Balabolka (thanks for the link, Ralph!): a freeware tool that reads text in SAPI4 or SAPI5 voices and can export to wav. No automation or batch. But totally free and good.
My recommendation for the tool: buy TextAloud. All samples below were created with it.
The voices
Regardless of the tool you use for conversion, you want good voices. Together with MS Agent, Microsoft released the voices 'Sam', 'Mike' and 'Mary'. That was 5 years ago, and you can tell by listening to the samples below. They are the typical robot-like monotone voices we have come to hate. Most tools ship with them because they are free. They are available in SAPI4 and SAPI5 versions.
Sample of Microsoft Sam (wav)
Sample of Microsoft Mike (wav)
Sample of Microsoft Mary (mp3)
Microsoft also made the L&H TruVoice voice engines available as a free download. (See the section 'free voices' at the end of this link for download.) Without going into the dramatic national story of the Belgian pride Lernout & Hauspie Speech Technologies going bust, these voices are what is left from that era. They are equally old now but better, and they include non-English voices. I guess they are free because who will sue you for using them? Available in SAPI4 versions.
Sample of TruVoice CarolUK (mp3)
Sample of TruVoice PeterUK (mp3)
Now we get to the acceptable voices. The two examples below are 'special voices', also free, that represent a whisper and a robot voice. Since neither is meant to sound like natural speech, it doesn't matter so much that the voice is synthetic.
Sample of male whisper (mp3)
Sample of robot voice (wma)
So much for the free voices, which are also available at http://www.bytecool.com/voices.htm.
There are also many 'premium voices' available from companies like AT&T, Cepstral, NeoSpeech, Acapela and others. They charge for voices, but the quality is much, much better. These companies have invested millions in their voices, so voices for commercial use can become quite expensive.
Sample of Cepstral Amy (mp3) - unregistered voice includes a registration message
Sample of Cepstral David (mp3) - unregistered voice includes a registration message
You can buy voices here. They cost between $30 and $50 each for personal use.
Very nice and recommended voices are available from NeoSpeech. You can try your own text in the online demos:
NeoSpeech demo: http://www.neospeech.com/demo/demo_text.php
Acapela demo: http://demo.acapela-group.com
But license restrictions apply: you cannot distribute the sound files unless you pay extra. In general, AT&T Natural Voices licensing can be very expensive, and Cepstral and NeoSpeech are more reasonable, but none of the redistribution rights licenses start below $1500. The only affordable distribution licenses are available in the Cepstral web store, which offers an Audio Distribution License for $199 per voice.
My recommendation: buy the NeoSpeech Paul and Kate voices ($35) for personal use. For commercial use or distribution, buy Cepstral voices.
Labels: Cepstral, NeoSpeech, Text-To-Speech, TextAloud, TruVoice, TTS

