Text to Speech for EFL ESL Materials

Text to Speech (TTS) technology has come a long way in recent years, and this is nowhere more evident than on the Read The Words website.

I've just been having a look at the site and trying to decide whether it has real potential for helping EFL ESL students with their listening, reading and pronunciation.

As an experiment, I decided to select quite a challenging text and see what the site could do. I also decided to select a British English accent, as I know that in the past TTS systems struggled more with UK accents than US ones, due to the wider range of sounds in UK English.

Anyway, here are the results. The text is from Wikipedia.org and is about the challenges of text normalisation in TTS.

  • Click here to watch Elizabeth read the text to you
This is the actual text you should be hearing:

"Text normalization challenges

The process of normalizing text is rarely straightforward. Texts are full of heteronyms, numbers, and abbreviations that all require expansion into a phonetic representation. There are many spellings in English which are pronounced differently based on context. For example, "My latest project is to learn how to better project my voice" contains two pronunciations of "project".

Most text-to-speech (TTS) systems do not generate semantic representations of their input texts, as processes for doing so are not reliable, well understood, or computationally effective. As a result, various heuristic techniques are used to guess the proper way to disambiguate homographs, like examining neighboring words and using statistics about frequency of occurrence.

Deciding how to convert numbers is another problem that TTS systems have to address. It is a simple programming challenge to convert a number into words, like "1325" becoming "one thousand three hundred twenty-five." However, numbers occur in many different contexts; when a year or perhaps a part of an address, "1325" should likely be read as "thirteen twenty-five", or, when part of a social security number, as "one three two five". A TTS system can often infer how to expand a number based on surrounding words, numbers, and punctuation, and sometimes the system provides a way to specify the context if it is ambiguous.

Similarly, abbreviations can be ambiguous. For example, the abbreviation "in" for "inches" must be differentiated from the word "in", and the address "12 St John St." uses the same abbreviation for both "Saint" and "Street". TTS systems with intelligent front ends can make educated guesses about ambiguous abbreviations, while others provide the same result in all cases, resulting in nonsensical (and sometimes comical) outputs."
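
For anyone curious what the "heuristic techniques" in that text might look like in practice, here is a minimal Python sketch. It is only my own illustration of the idea, not how Read The Words or any real TTS engine works. All the function names and the lists of cue words (such as "in" or "number") are made up for the example, and it only copes with numbers up to 9999.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def two_digits(n):
    """Spell out a number from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] + ("-" + ONES[ones] if ones else "")


def cardinal(n):
    """Spell out 0-9999 as a quantity (US style, without 'and')."""
    parts = []
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    if thousands:
        parts.append(ONES[thousands] + " thousand")
    if hundreds:
        parts.append(ONES[hundreds] + " hundred")
    if rest or not parts:
        parts.append(two_digits(rest))
    return " ".join(parts)


def as_year(n):
    """Read a four-digit number in pairs, as for a year or street address.

    Crude on purpose: round thousands such as 2000 are not handled well.
    """
    high, low = divmod(n, 100)
    if low == 0:
        return two_digits(high) + " hundred"
    if low < 10:
        return two_digits(high) + " oh " + ONES[low]
    return two_digits(high) + " " + two_digits(low)


def as_digits(n):
    """Read a number one digit at a time, as for an ID number."""
    return " ".join(ONES[int(d)] for d in str(n))


def expand_number(token, previous_word):
    """Guess a reading from the word before the number (assumed cue words)."""
    n = int(token)
    prev = previous_word.lower()
    if prev in {"in", "since", "year"} and len(token) == 4:
        return as_year(n)
    if prev in {"number", "pin", "code"}:
        return as_digits(n)
    return cardinal(n)


def expand_st(text):
    """Read 'St' as 'Saint' before a capitalised word, otherwise 'Street'."""
    text = re.sub(r"\bSt\.?(?=\s+[A-Z])", "Saint", text)
    return re.sub(r"\bSt\b\.?", "Street", text)


print(expand_number("1325", "in"))      # year-style reading
print(expand_number("1325", "number"))  # digit-by-digit reading
print(expand_number("1325", "paid"))    # plain quantity
print(expand_st("12 St John St."))
```

A guess like this is easy to fool. A sentence such as "He lives on Main St. Next door is a bakery." would come out as "Main Saint Next door", which is exactly the kind of comical output the text describes.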

What I like about the site
  • The site is free, though you do have to register.
  • The site offers a number of options once it has converted the text to speech. These include creating an MP3 file to download, creating an embed code to add the audio to a blog or website, and downloading to an iPod.
  • They have quite a selection of avatars and voices.
  • The site can convert text from a number of sources including Word, PDF, a website (just type in the URL) or even an RSS feed!
  • You can make the texts private or public
  • There doesn't seem to be a limit on how many you can create.
What I wasn't so sure about
  • I found it hard to get a link to the avatar reading the text. It would have been nice to be able to embed her into my blog, but I just couldn't get that to work.
  • Processing the text can take a while.
I haven't added any teaching suggestions yet for this posting, as I'm interested to see what other teachers think about this before I do that.

So, if you've listened to the text, please do send in a comment and let me know what you think about the usability of a tool like this with EFL ESL students.