Speak to me!
Ubuntu comes with a built-in speech synthesizer and an application called Orca, which acts as a screen reader. Orca “looks” at what is on the page, decides what to say, then passes the text to an intermediate service called speech dispatcher. Speech dispatcher then decides which voice synthesizer to use and gets the text read out. It is designed so that it can use different synthesizers for different languages or different voices. So the chain is like this:
Application that wants to speak “hello world” -> speech dispatcher -> speech synthesizer -> audio output
By default the speech synthesizer is espeak, which has a number of synthetic voices (they sound like a robot should sound). You can try this now: open a terminal window and enter the following:
spd-say "hello world"
If all is working you should hear your computer talking. This is the default voice included on the size-restricted Ubuntu CD. Espeak is pretty good, and synthetic voices perform well when sped up (some blind people listen very, very fast), but there are more realistic voices and more sophisticated speech synthesizers available.
The one I think is probably the most promising is called OpenMARY. It has a range of voices available, including a number based on the rather good Hidden Markov Model (HMM) technique. OpenMARY runs as a web server with a REST API: you request the right URL and you get back a .wav file with the sound you asked for. Feel free to have a play with an OpenMARY server I installed on one of our servers. I have installed a bunch of decent HMM voices, so try saying different things with them.
This is all rather fun, and you can hear the difference in quality between the rather robotic espeak and the nearly human-sounding OpenMARY. The next step is to get it working through speech dispatcher.
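As a rough illustration of the REST call described above, here is a minimal sketch that builds the request URL for a given phrase. It assumes an OpenMARY server listening on localhost port 59125; the host, the port and the function name mary_url are assumptions for illustration, and the query parameters are copied from the configuration line quoted later in this post. Only spaces are encoded, so punctuation such as & or ? in the text would need more careful handling.

```shell
#!/bin/sh
# Assumed server location: change these to point at a server you actually run.
MARY_HOST=localhost
MARY_PORT=59125

# mary_url TEXT [VOICE]
# Prints the /process URL that asks the server for TEXT as a wav file.
# VOICE is optional; when omitted the parameter is simply left out.
mary_url() {
    # Replace spaces with %20 (the same minimal encoding the config line uses).
    text=$(printf '%s' "$1" | sed 's/ /%20/g')
    url="http://$MARY_HOST:$MARY_PORT/process?INPUT_TEXT=$text&INPUT_TYPE=TEXT&OUTPUT_TYPE=AUDIO&AUDIO=WAVE_FILE&LOCALE=en_GB"
    if [ -n "${2:-}" ]; then
        url="$url&VOICE=$2"
    fi
    printf '%s\n' "$url"
}

# Example use (needs a running server, plus curl and sox's play):
#   curl -s "$(mary_url 'hello world')" -o hello.wav && play hello.wav
```

Keeping the URL construction in its own function makes it easy to print the URL and paste it into a browser when something goes wrong.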
To do this I am using the speech dispatcher generic module, which is a simple way of getting the two to play together. Writing a proper module specific to OpenMARY would allow a few more features to be used. To try this out we will install just a configuration file on your machine, which tells speech dispatcher to pass the text it wants to say to our OpenMARY server on the internet. You get back a wav file and it is played. This is just meant for playing about, so don’t rely on our server being there all the time. It also means that if I felt like it I could look at a log file on the server and see what you are saying (I can’t be bothered, and I don’t care what you say).
- Check that spd-say "hello world" actually works. If it doesn’t, go and fix that first.
- Download the config file from http://people.ubuntu.com/~alanbell/openmary.conf
- Copy the config file to /etc/speech-dispatcher/modules/
sudo cp openmary.conf /etc/speech-dispatcher/modules/
- Edit the speech dispatcher config to load your new module configuration file
sudo nano /etc/speech-dispatcher/speechd.conf
- Find the bit with all the AddModule lines (most are commented out with a #) and add a line containing:
AddModule "openmary" "sd_generic" "openmary.conf"
- Save and exit
- Stop the speech dispatcher service if it is running
sudo killall speech-dispatcher
- Try it out
spd-say -o openmary "hello world"
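If the last step stays silent, a quick preflight check can narrow down what is missing. The sketch below is only an illustration: the function names has_openmary_module and missing_tools are made up for this example, and it assumes the config lives at the path used in the steps above. It checks that an uncommented AddModule line for openmary exists and that the commands the generic module shells out to (curl, and play from the sox package) are on the PATH.

```shell
#!/bin/sh
# has_openmary_module FILE
# Succeeds if FILE contains an active (not commented out) AddModule line
# for "openmary".
has_openmary_module() {
    grep -Eq '^[[:space:]]*AddModule[[:space:]]+"openmary"' "$1"
}

# missing_tools NAME...
# Prints, one per line, each named command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example use:
#   has_openmary_module /etc/speech-dispatcher/speechd.conf || echo "AddModule line not found"
#   missing_tools spd-say curl play
```

Both functions only read state; neither edits the config nor installs anything, so they are safe to run repeatedly.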
If you want to run your own OpenMARY server locally (possibly better performance, works offline, more privacy), edit the openmary.conf file and change the mumble.libertus.co.uk bits to localhost (or whatever server you want to point it at). To use your new voice in Orca, go to the preferences window and select speech dispatcher and openmary as the synthesizer. The generic module only seems to allow you to use the default voice (it doesn’t report the list of available voices back to Orca). A proper module would do that and make other features available for Orca to control. If anyone wants to help with that, it would be great.
As Stephane Graber pointed out, you may need the sox and curl packages for this to work. I thought they were part of the default install, but maybe not. The bit that makes the magic happen is this line:
"curl "http://mumble.libertus.co.uk:59125/process?INPUT_TEXT=`echo $DATA|sed 's/ /%20/g'`&INPUT_TYPE=TEXT&OUTPUT_TYPE=AUDIO&AUDIO=WAVE_FILE&LOCALE=en_GB&VOICE=$VOICE" > $TMPDIR/openmary.wav && play $TMPDIR/openmary.wav >/dev/null"
which inserts $DATA (the phrase it wants to say, with spaces replaced by %20 by the sed expression) and $VOICE into a URL that returns a wav file. curl retrieves the file and writes it to $TMPDIR/openmary.wav (for example /tmp/openmary.wav). Then the play command, which comes from the sox package, turns the wav file we just downloaded into sound.
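The sed expression in that line only deals with spaces, so text containing characters such as & or ? could produce a broken URL. As a sketch of a more thorough approach (not what the config file above does), here is a small percent-encoding function in plain POSIX sh. The name urlencode is just illustrative, and it has only been reasoned through for ASCII input. If you would rather not roll your own, curl can do the encoding itself via its -G and --data-urlencode options.

```shell
#!/bin/sh
# urlencode TEXT
# Prints TEXT with every character percent-encoded except the RFC 3986
# unreserved set (letters, digits, '.', '~', '_' and '-').
# Intended for ASCII input; non-ASCII text is not handled reliably here.
# Note: POSIX sh has no local variables, so s, rest, c and out are global.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character on its own
        case $c in
            [a-zA-Z0-9.~_-]) out=$out$c ;;
            *) out=$out$(printf '%%%02X' "'$c") ;;  # e.g. space becomes %20
        esac
        s=$rest
    done
    printf '%s\n' "$out"
}

# Example use:
#   urlencode 'hello world'
```

The loop spawns a subshell for each encoded character, which is slow for long text but fine for the short phrases a screen reader sends.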