Speak to me!

Ubuntu comes with a built-in speech synthesizer and an application called Orca, which acts as a screen reader. Orca “looks” at what is on the page, decides what to say, then passes it to an intermediate service called Speech Dispatcher. Speech Dispatcher then decides which voice synthesizer to use and gets the text read out. It is designed so that it can use different synthesizers for different languages or different voices. So the chain is like this:

Application that wants to speak “hello world” -> speech dispatcher -> speech synthesizer -> audio output

By default the speech synthesizer is espeak, which has a number of synthetic voices (they sound like a robot should sound). You can try this now: go to a terminal window and enter the following:

spd-say "hello world"

If all is working you should hear your computer talking. This is the default voice included on the size-restricted Ubuntu CD. Whilst espeak is pretty good, and synthetic voices perform well when sped up (some blind people listen very, very fast), there are more realistic voices and more sophisticated speech synthesizers available. The one I think is probably the most promising is called OpenMARY. It has a range of voices available, including a number based on the rather good Hidden Markov Model (HMM) technique. OpenMARY runs as a web server with a REST API: you go to the right URL and you get back a .wav file with the sound you asked for.

Feel free to have a play with an OpenMARY server I installed on one of our servers. I have installed a bunch of decent HMM voices; try saying different things with them. This is all rather fun, and you can hear the difference in quality between the rather robotic espeak and the nearly human-sounding OpenMARY. The next step is to get it working through Speech Dispatcher.
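
To make the "go to the right URL" idea concrete, here is a rough sketch that only builds and prints a request URL. The port and query parameters are the ones used in the openmary.conf line quoted later in this post; MARY_HOST is a hypothetical variable I made up for the example, and leaving out VOICE assumes the server falls back to a default voice. Like the config file, it only encodes spaces, so treat it as a toy.

```shell
#!/bin/sh
# Sketch: build the URL for a text-to-WAV request to an OpenMARY server.
# MARY_HOST is a hypothetical variable; it defaults to localhost.
mary_url() {
    # Encode spaces only, mirroring the sed call in openmary.conf.
    encoded=$(printf '%s' "$1" | sed 's/ /%20/g')
    printf 'http://%s:59125/process?INPUT_TEXT=%s&INPUT_TYPE=TEXT&OUTPUT_TYPE=AUDIO&AUDIO=WAVE_FILE&LOCALE=en_GB\n' \
        "${MARY_HOST:-localhost}" "$encoded"
}

# Print the URL. To actually hear it you would fetch and play it, e.g.:
#   curl "$(mary_url 'hello world')" > /tmp/hello.wav && play /tmp/hello.wav
mary_url "hello world"
```

Printing the URL first makes it easy to paste into a browser and check that the server answers before involving Speech Dispatcher at all.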

To do this I am using the Speech Dispatcher generic module, which is a simple way of getting the two to play together. Writing a proper module specific to OpenMARY would allow a few more features to be used. To try this out we will install just a configuration file on your machine, which tells Speech Dispatcher to pass the text it wants to say to our OpenMARY server on the internet; you get back a wav file and it plays it. This is just meant for playing about really. Don’t rely on our server being there all the time, and bear in mind that if I felt like it I could look at a log file on the server and see what you are saying (I can’t be bothered, and I don’t care what you say).

  • Check that spd-say "hello world" actually works. If it doesn’t, go fix that first.
  • Download the config file from http://people.ubuntu.com/~alanbell/openmary.conf
    wget http://people.ubuntu.com/~alanbell/openmary.conf
  • Copy the config file to /etc/speech-dispatcher/modules/
    sudo cp openmary.conf /etc/speech-dispatcher/modules/
  • Edit the Speech Dispatcher config to load your new module configuration file
    sudo nano /etc/speech-dispatcher/speechd.conf
  • Find the bit with all the AddModule lines (most are commented out with a #) and add a line containing:
    AddModule "openmary" "sd_generic" "openmary.conf"
  • Save and exit
    Ctrl+X, then Y, then Enter
  • Stop the Speech Dispatcher service if it is running, so that it picks up the new configuration when it next starts
    sudo killall speech-dispatcher
  • Try it out
    spd-say -o openmary "hello world"
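
If you would rather not hand-edit speechd.conf, the AddModule step above can be sketched as a small function that appends the line only when it is not already there. The function name is my own, and appending at the end of the file (rather than next to the other AddModule lines) is an assumption that I believe works but have not checked on every version. The demo runs against a throwaway temp file, not your real config.

```shell
#!/bin/sh
# Sketch: add the openmary AddModule line to a config file exactly once.
# add_module_line is a hypothetical helper, not part of Speech Dispatcher.
add_module_line() {
    conf="$1"
    line='AddModule "openmary" "sd_generic" "openmary.conf"'
    # -x: match the whole line, -F: treat the pattern as a fixed string
    grep -qxF "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
}

# Demo on a temporary stand-in file.
demo_conf=$(mktemp)
printf '%s\n' '# stand-in for the existing AddModule section' > "$demo_conf"
add_module_line "$demo_conf"
add_module_line "$demo_conf"   # second call is a no-op
cat "$demo_conf"
rm -f "$demo_conf"
```

To use it for real you would run it as root against /etc/speech-dispatcher/speechd.conf, after taking a backup copy.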

If you want to run your own OpenMARY server locally (possibly better performance, works offline, more privacy), edit the openmary.conf file and change the mumble.libertus.co.uk bits to localhost (or whatever server you want to point it at). To use your new voice in Orca, go to the preferences window and select Speech Dispatcher as the speech system and openmary as the synthesizer. The generic module only seems to allow you to use the default voice (it doesn’t report the list of available voices back to Orca). A proper module would do that and make other features available for Orca to control. If anyone wants to help with that, it would be great.
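
Swapping the hostname can be scripted too. This is a minimal sketch assuming the only thing that needs changing is the literal hostname mumble.libertus.co.uk; the function name is hypothetical, and the demo works on a temp file holding a made-up one-line stand-in for the config.

```shell
#!/bin/sh
# Sketch: rewrite the server hostname in a copy of openmary.conf.
# point_openmary_at is a hypothetical helper name.
point_openmary_at() {
    conf="$1"
    host="$2"
    # Write to a sibling file first, then replace the original.
    sed "s/mumble\.libertus\.co\.uk/$host/g" "$conf" > "$conf.new" &&
        mv "$conf.new" "$conf"
}

# Demo on a temporary file containing a shortened stand-in line.
demo=$(mktemp)
printf '%s\n' 'curl "http://mumble.libertus.co.uk:59125/process"' > "$demo"
point_openmary_at "$demo" localhost
cat "$demo"
rm -f "$demo"
```

Because mv replaces the file, ownership and permissions may change; for the copy under /etc/speech-dispatcher/modules/ it is probably simpler to edit your downloaded copy and sudo cp it into place again.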

-Edit-

As Stephane Graber pointed out, you may need the sox and curl packages for this to work. I thought they were part of the default install, but maybe not. The bit that makes the magic happen is this line:

"curl "http://mumble.libertus.co.uk:59125/process?INPUT_TEXT=`echo $DATA|sed 's/ /%20/g'`&INPUT_TYPE=TEXT&OUTPUT_TYPE=AUDIO&AUDIO=WAVE_FILE&LOCALE=en_GB&VOICE=$VOICE" > $TMPDIR/openmary.wav && play $TMPDIR/openmary.wav >/dev/null"

which inserts $DATA (the phrase it wants to say) and $VOICE into a URL which returns a wav file; curl retrieves the file and we write it to $TMPDIR/openmary.wav. Then we use the play command (which comes with sox) to turn the wav file we just downloaded into sound.
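
Since that line quietly depends on curl and play both being installed, a quick check before blaming Speech Dispatcher can save some head-scratching. Here is a small sketch; missing_tools is a name I invented, and the apt-get hint assumes the usual Ubuntu package names (sox provides play).

```shell
#!/bin/sh
# Sketch: report which of the given commands are not on the PATH.
# missing_tools is a hypothetical helper name.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing=$(missing_tools curl play)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Missing commands:" $missing
    echo "On Ubuntu, try: sudo apt-get install sox curl"
else
    echo "curl and play are both available"
fi
```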

8 Comments

  • You’ll need both “sox” and “curl” installed for your config to work.

  • Kai says:

    First, thanks for pointing me towards OpenMARY – some of those voices are freakishly human, and I’ve been stuck in the uncanny valley for about an hour now. That’s an honest thank you, not a sarcastic one, by the way. I’m enjoying myself being unproductive.

    My only problem is, no matter how I configure speech dispatcher, it outputs in eSpeak. And since this is the only thing I can find anywhere on the internet that even mentions using OpenMARY with spd, I’m going to randomly comment asking if you have any ideas? Sorry about that, I guess.

    • If you want Speech Dispatcher to use OpenMary as the standard system, you’re going to have to go into “/etc/speech-dispatcher/speechd.conf” and change DefaultModule from espeak to openmary.

      There should be a GUI for Speech Dispatcher.

  • I’m going to try to do the same for Jovie on KDE. I really think speech synthesis services on desktop environments as a whole need more work.

  • sai says:

    Nice. However, reading multiple sentences with special characters doesn’t work with this setup. After I saw the following line in your script, I knew why:
    GenericExecuteSynth \
    "curl \"http://localhost:59125/process?INPUT_TEXT=`echo $DATA|sed 's/ /%20/g'`&INPUT_TYPE=TEXT&OUTPUT_TYPE=AUDIO&AUDIO=WAVE_FILE&LOCALE=en_GB&VOICE=$VOICE\" > $TMPDIR/openmary.wav && play $TMPDIR/openmary.wav >/dev/null"

    I replaced it with a less error-prone line of code:
    GenericExecuteSynth \
    "curl -G \"http://localhost:59125/process\" --data-urlencode \"INPUT_TEXT=$DATA\" --data-urlencode \"INPUT_TYPE=TEXT\" --data-urlencode \"OUTPUT_TYPE=AUDIO\" --data-urlencode \"AUDIO=WAVE_FILE\" --data-urlencode \"LOCALE=en_GB\" --data-urlencode \"VOICE=$VOICE\" > $TMPDIR/openmary.wav && play $TMPDIR/openmary.wav >/dev/null"

  • anonymous says:

    This won’t work for me.
    When I type spd-say -o openmary "helloworld"
    I get no sound, but
    spd-say "helloworld"
    works

    • Alan Bell says:

      My OpenMARY server has been decommissioned, so this script won’t work any more, but you are welcome to run your own or change the conf file to point at a public OpenMARY server.
