(This post has been recorded as an AudioBlog. See bottom of page.)
By popular demand, here’s a summary of my experience with using Text-to-Speech (TTS) technology to record an audiobook edition of Irish Firebrands.
As a person with multiple disabilities, I’m acutely aware of the limited options for people like me. In addition, I wanted to make an audible copy of my first novel for my mother, who had gone blind while I was writing it.
My mother was an avid, eclectic reader who amassed an enormous personal library, read to me from my infancy, and taught me to read. Cataract surgery restored enough of her sight for her to enjoy the landscapes visible from her windows and to watch television, but because of eye damage from other causes, she can see only parts of pictures. Her brain makes Gestalts to fill in what’s missing, although a related disadvantage is that she has also developed Charles Bonnet syndrome (visual hallucinations that can afflict previously sighted people who lose their vision).
It’s also impossible for my mother to read large-print books or even magnified characters on screens, so for many years she’s had to rely on talking books from the National Library Service for the Blind and Physically Handicapped, at The Library of Congress. When I published Irish Firebrands, all she could do was hold a paper copy in her hands and admire the cover art.
There are many TTS software packages, most of which use a combination of operating system voices and proprietary voices from other sources. They cost a fraction of what hiring voice talent would cost, but even so, the programs are still too pricey for my nonexistent Indie budget. So I downloaded Balabolka, free software that uses a computer’s built-in SAPI 4 or SAPI 5 voices. It reads text in 16 formats (including DOC, DOCX, EPUB, HTML, MOBI, PDF, and RTF), and records in formats with these filename extensions: .wav, .mp3, .mp4, .ogg, .wma, .m4a, .m4b, and .awb.
For the basic document, you choose one voice and set its rate, pitch, and volume, but you can record different sections separately and combine them, and the software will also combine different recording formats into one audio file. You need to know how to nest HTML-like tags (for temporary changes to rate, pitch, and volume), but no other programming ability is necessary. Balabolka is supposed to be able to accept changes to its pronunciation database and to add emphasis, but I haven’t been able to get either feature to work, although that may be a limitation of the voices rather than the software.
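For anyone who hasn’t nested this kind of markup before, here is a minimal Python sketch of what it looks like. It assumes Balabolka hands SAPI 5 XML tags (`<rate speed>`, `<pitch middle>`, `<volume level>`) through to a SAPI 5 voice; the helper name `wrap_prosody` and the example values are hypothetical. Rate and pitch are relative offsets in SAPI 5’s -10 to 10 range, and volume is a level from 0 to 100.

```python
from xml.sax.saxutils import escape


def wrap_prosody(text, rate=None, pitch=None, volume=None):
    """Wrap text in nested SAPI 5 XML tags for a temporary change.

    Hypothetical helper. rate and pitch are relative offsets (-10..10);
    volume is a level (0..100). Omitted settings add no tag.
    """
    # Escape &, <, > so the text itself can't break the markup.
    out = escape(text)
    # Build from the inside out: volume innermost, rate outermost,
    # so every tag closes in the reverse order it was opened.
    if volume is not None:
        if not 0 <= volume <= 100:
            raise ValueError("volume must be 0..100")
        out = f'<volume level="{volume}">{out}</volume>'
    if pitch is not None:
        if not -10 <= pitch <= 10:
            raise ValueError("pitch must be -10..10")
        out = f'<pitch middle="{pitch}">{out}</pitch>'
    if rate is not None:
        if not -10 <= rate <= 10:
            raise ValueError("rate must be -10..10")
        out = f'<rate speed="{rate}">{out}</rate>'
    return out


# A shouted line: a little faster, higher, and louder than the surrounding text.
print(wrap_prosody("Warning! Danger!", rate=2, pitch=3, volume=90))
# <rate speed="2"><pitch middle="3"><volume level="90">Warning! Danger!</volume></pitch></rate>
```

Only the wrapped words change; the voice returns to the document’s base settings once the tags close.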
Similar problems exist with Adobe Acrobat’s Read Out Loud utility, which uses only whatever built-in voices are available. This makes Read Out Loud of limited use as an audiobook option, because the changes you make to the text to fix pronunciation problems for one computer voice don’t necessarily work when the document is read on another person’s computer. It also has the annoying habit of reading everything on the page, including headers and footers, and it pauses at page breaks and at the end of every line that terminates with a hard return. And depending on the PDF conversion settings, it may read the punctuation aloud along with the text.
For best results in Read Out Loud, you have to strip out page breaks, headers, footers, and apostrophes; then convert the file to PDF using Standard formatting (no conversion alterations). When you listen to the PDF, note any additional pronunciation problems, fix them in your source document, and convert it again. Anyone else who listens to the document must use the same voice preference settings you used.
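If you work from a plain-text export of the manuscript, part of that cleanup can be automated. This is a minimal Python sketch, assuming page breaks come through as form-feed characters and that headers and footers have already been deleted in the word processor (the sketch doesn’t try to detect them); the function name is hypothetical.

```python
import re


def prepare_for_read_out_loud(text):
    """Clean a plain-text export before converting it to PDF.

    Hypothetical helper. Assumes headers and footers were already removed;
    this sketch does not try to detect them.
    """
    # Normalize Windows line endings so the splitting below behaves the same everywhere.
    text = text.replace("\r\n", "\n")
    # Treat form feeds (page breaks in many plain-text exports) as paragraph breaks.
    text = text.replace("\f", "\n\n")
    # Drop straight and curly apostrophes.
    text = re.sub("['\u2019]", "", text)
    # Split on blank lines, then join the hard-wrapped lines inside each paragraph
    # so the voice doesn't pause at the end of every line.
    paragraphs = re.split(r"\n\s*\n", text)
    joined = [
        " ".join(line.strip() for line in p.splitlines() if line.strip())
        for p in paragraphs
    ]
    # Discard paragraphs left empty (e.g., a page break next to a blank line).
    return "\n\n".join(p for p in joined if p)


sample = "It\u2019s a\nfine day.\fThe next page\u2019s text."
print(prepare_for_read_out_loud(sample))
```

The cleaned text still has to go through the Standard PDF conversion and a listening pass; the script only removes the pauses and stray punctuation before that step.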
From Text To Speech is a free online service, and you can save the files you record. It offers a selection of proprietary voices in American and UK English, as well as pronunciation for other major languages. Its proprietary UK voices, Peter and Rachel, both sound good, with relatively few mispronunciation problems and the best ability to add emphasis and interrogatory inflection automatically. I’ve used Peter and Rachel to narrate recordings that appear on this blog. The website’s drawbacks are that it offers few voice adjustment options, it may be set up to periodically block the ISPs of frequent users, and it takes so long to generate an MP3 that it’s appropriate only for short reading selections.
After replacing the computer that I used to write Irish Firebrands, I discovered that the Windows 8 OS came with three new SAPI 5 voices: David and Zira (American English) and Hazel (UK English). Hazel is the only one of the three that automatically pronounces “Celtic” properly, with a hard C, but, oddly enough, she can’t say the name of my female main character, Lana. (UK Peter, on the other hand, can’t pronounce the name of my male MC, Dillon.) Although they’re afflicted with the same limitations as most other computer-generated voices (they don’t automatically elide, nor can they express emphasis and questions without help), their otherwise lifelike timbre made them a vast improvement over the SAPI 4 generation of voices.
Aside from difficulties due to hearing loss, I find most SAPI 4 voices impossible to listen to for any length of time, although some Sci-Fi writers may like to use them for their hollow, “robotic” qualities. In the Olden Days of cinematic and television sci-fi, it was assumed that robots would express themselves in flat, unfeeling tones – until the advent of the shouting, gesticulating robot in Lost in Space (“Warning! Danger, Will Robinson!”), who struggled with his emotions.
He was followed by the frankly psychotic HAL 9000 (“I’m sorry, Dave…”). Eventually Droids came out of the closet with their feelings: in Star Wars, a machine sounds like a man (C-3PO and his many emotional meltdowns), while a man sounds like a machine (James Earl Jones’s sinister inflection, helped out with a SCUBA respirator, as Darth Vader). R2-D2 still “speaks” only with beeps and boops, but his whistles and squeals are distinctly anthropomorphic.
Before starting on the recording, I had to learn how to use the voices at my disposal. To do this, I recorded a book trailer with a voice-over track, using all three of the new voices and MovieMaker software. The work took about a week.
On the basis of this virtual audition, and about six months of additional testing, I decided that I liked Hazel, the UK voice. To me, the enunciation of most British actors naturally sounds more clipped than that of Americans (who elide, or drop, most of the Gs in their -ing endings and many middle Ts, and soften lots of terminal Ds). Hazel uses non-rhotic Received Pronunciation (dropping Rs or, paradoxically, inserting them where they don’t exist, such as between a word that ends with a vowel and one that begins with a vowel), but I was willing to put up with Hazel’s non-rhotic English habits in exchange for not having to create endless elisions for David or Zira.
Since then, I’ve figured out how to trick Hazel into pronouncing some Rs, which has improved the clarity of a few words, but she definitely doesn’t sound Irish because, like most varieties of American English, Hiberno-English is rhotic: the Irish pronounce their Rs. But Hazel has learned a little bit of Gaeilge, with the help of the synthesizer at abair.ie.
I’ve learned to correct the multitude of bizarre mispronunciations that crop up unexpectedly by creatively misspelling words, hyphenating syllables, running words together, changing pitch and speed, dropping terminal punctuation, and adding a few elisions. Unfortunately, very few of these changes can be made with Balabolka’s global find-and-replace function, because most of Hazel’s mispronunciations depend on syntax.
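For the few fixes that don’t depend on syntax, a whole-word respelling table is essentially what a global find-and-replace does. Here is a minimal Python sketch; the function name and the table entries are hypothetical illustrations of the misspelling and hyphenating tricks, not the actual fixes used for this book.

```python
import re

# Hypothetical respelling table: manuscript spelling -> spelling the voice says correctly.
RESPELLINGS = {
    "Celtic": "Keltic",   # creative misspelling to force a hard C
    "Dillon": "Dill-on",  # hyphenated syllables
}


def apply_respellings(text, table):
    """Replace whole words only, so 'Celtic' won't match inside 'Celtics'."""
    if not table:
        return text
    # Longest keys first, so a longer entry wins over a shorter one it contains.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], text)


print(apply_respellings("Celtic music, said Dillon.", RESPELLINGS))
# Keltic music, said Dill-on.
```

A table like this only helps when a word is mispronounced the same way everywhere; anything that changes with the surrounding sentence still has to be fixed by hand, line by line.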
Many people dislike computer-generated voices on principle: the owner of an audiobook hosting service refused to accept my recording when it came out that I was doing it with TTS technology, even though many of the human-read stories on the site are badly performed or poorly recorded (e.g., sloppy diction, uneven volume, background noise). It’s also been difficult to recruit and retain beta readers, so I’m very grateful to those who have stuck with the project. Their feedback has been invaluable while I’ve worked to whip the narration into shape. When it’s “as clean as humanly (and robotically) possible,” the Irish Firebrands audiobook will be available for distribution to the visually impaired … beginning with Mama.
Readers and writers who decide to try Balabolka are welcome to ask me questions (in comments here, or via the Guestbook page on the Feedback menu) about specific pronunciation problems they’re encountering. I may have already found a tweaking trick that will work for you, too. And if anyone out there has some favorite fixes, please share them with us! No sense in all of us reinventing the wheel! Eyes – ears – even sanity – may be at stake! Thanks!
This blog post was recorded in Microsoft Hazel United Kingdom English, edited for rate, pitch, and pronunciation, using Balabolka text-to-speech converter. How many pronunciation edits can you find?
Seeking Visually Disabled Beta Readers for Irish Firebrands text-to-speech (TTS) audiobook testing. Click HERE for Details.