- What is gnuspeech?
- What is the goal of the gnuspeech project?
- Why is it called gnuspeech?
- Getting help with gnuspeech
- Additional packages for gnuspeech
- Further information
- If you want to help with gnuspeech
- Those who have helped research, develop and port gnuspeech
gnuspeech makes it easy to produce high quality computer speech output, design new language databases, and create controlled speech stimuli for psychophysical experiments.
The suite of programs uses a true articulatory model of the vocal tract and incorporates models of English rhythm and intonation based on extensive research that sets a new standard for synthetic speech.
The original NeXT computer implementation is complete. The ports to both OS X and GNU/Linux provide English text-to-speech capability, but parts of the database creation tools are still in the process of being ported.
Credits for research and implementation of the gnuspeech system appear in the section Those who have helped, below. Some of the features of gnuspeech, together with the tools that are part of the software suite, include:
- A Tube Resonance Model (TRM) for the human vocal tract (also known as a transmission-line analog, or a waveguide model) that truly represents the physical properties of the tract, including the energy balance between the nasal and oral cavities as well as the radiation impedance at lips and nose.
- A TRM Control Model, based on formant sensitivity analysis, that provides a simple, but accurate method of low-level articulatory control. By using the Distinctive Region Model (DRM) only eight slowly varying tube section radii need be specified. The glottal (vocal fold) waveform and various suitably “coloured” random noise signals may be injected at appropriate places to emulate voicing, aspiration, frication and noise bursts.
- Databases which specify: the characteristics of the articulatory postures (which loosely correspond to phonemes); rules for combinations of postures; and information about voicing, frication and aspiration. These are the data required to produce specific spoken languages from an augmented phonemic input. Currently, only the database for the English language exists, though French vowel postures are also included.
- A text-to-augmented-phonetics conversion module (the Parser) to convert arbitrary text, preferably incorporating normal punctuation, into the form required for applying the synthesis methods.
- Realistic models of English rhythm and intonation that are applied automatically.
- Monet—a database creation and editing system, with a carefully designed graphical user interface (GUI) that allows the databases containing the necessary phonetic data and dynamic rules to be set up and modified in order that the computer can “speak” arbitrary languages.
- A 70,000+ word English pronouncing dictionary, with rules for derivatives such as plurals and adverbs, that includes 6,000 given names. The dictionary also provides part-of-speech information for the later addition of grammatical parsing that can further improve the already excellent pronunciation, rhythm and intonation.
- Sub-dictionaries that allow different user- or application-specific pronunciations to be substituted for the default pronunciations coming from the main dictionary.
- Letter-to-sound rules to deal with spellings and words that are not in the dictionaries.
- Tools for managing the dictionary and carrying out analysis of speech.
- Synthesizer—a GUI-based application to allow experimentation with a stand-alone TRM. All parameters, both static and dynamic, may be varied and the output can be monitored and analysed. It is an important component in the research needed to create the databases for target languages.
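The acoustic idea behind the TRM is the digital waveguide: the vocal tract is treated as a chain of short cylindrical sections, and a pressure wave is partly reflected wherever the cross-sectional area changes. The following is a minimal, illustrative sketch of that Kelly-Lochbaum scattering scheme, not gnuspeech's actual TRM code; all names and parameter values here are invented for illustration, and the real TRM adds, among other things, a nasal branch, a proper radiation impedance, and noise injection points for frication.

```python
import numpy as np

def kelly_lochbaum(areas, excitation, glottal_refl=0.75, lip_refl=-0.85):
    """One-sample-per-section digital waveguide simulation of an acoustic tube.

    areas      -- cross-sectional areas of the tube sections, glottis to lips
    excitation -- pressure samples injected at the glottal end
    Returns the pressure wave radiated at the lips.
    """
    n = len(areas)
    # Reflection coefficient at each junction between adjacent sections:
    # a pressure wave meeting a change of area is partly reflected.
    k = [(areas[i] - areas[i + 1]) / (areas[i] + areas[i + 1])
         for i in range(n - 1)]
    fwd = np.zeros(n)   # right-going wave component in each section
    bwd = np.zeros(n)   # left-going wave component in each section
    out = np.zeros(len(excitation))
    for t, x in enumerate(excitation):
        new_fwd = np.zeros(n)
        new_bwd = np.zeros(n)
        # Glottal end: inject the excitation plus a partial reflection.
        new_fwd[0] = x + glottal_refl * bwd[0]
        # Kelly-Lochbaum scattering at each interior junction.
        for i in range(n - 1):
            new_fwd[i + 1] = (1 + k[i]) * fwd[i] - k[i] * bwd[i + 1]
            new_bwd[i] = k[i] * fwd[i] + (1 - k[i]) * bwd[i + 1]
        # Lip end: mostly reflected with sign inversion (open end);
        # the remainder is radiated as the output signal.
        new_bwd[n - 1] = lip_refl * fwd[n - 1]
        out[t] = (1 + lip_refl) * fwd[n - 1]
        fwd, bwd = new_fwd, new_bwd
    return out
```

Driving such a tube with a glottal pulse train instead of a single impulse yields a vowel-like sound whose formants are determined entirely by the area profile, which is why controlling only a few slowly varying radii (as the DRM does) suffices for articulatory synthesis.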
Overview of the main Articulatory Speech Synthesis System
More detailed information on the components noted above appears in what follows, with an indication of their current state in the OS X and GNU/Linux GNUStep ports, their origin, and some suggestions for further work.
It is a play on words. This is a new (g-nu) “event-based” approach to speech synthesis from text that uses an accurate articulatory model rather than a formant-based approximation. It is also a GNU project, aimed at providing high-quality text-to-speech output for GNU/Linux (and Mac OS X). In addition, it provides comprehensive tools for psychophysical and linguistic experiments as well as for creating the databases for arbitrary languages.
The goal of the project is to create the best speech synthesis software on the planet.
Although no official release has been made yet (one will occur “real soon now”), unofficial packages for GNUStep, Mac OS X and NeXT (NeXTSTEP 3.0) are available for anonymous download from the SVN repository. All provide text-to-speech capability. For GNUStep and OS X the database creation and inspection tools (such as Synthesizer) can be used as intended, but work remains to be done to complete the database creation components of Monet that are needed for psychophysical/linguistic experiments, and for setting up new languages. The SVN repository material may soon be migrated to a Git repository instead. Stay tuned.
It would be very helpful if those obtaining and using the pre-release material would also join the mailing list (as explained below), and provide some feedback, ask questions, and so on.
Those willing to help with the project are invited to contact the authors/developers through the gnu project facilities. Both helpers and users can join the project mailing list by visiting the subscription page, and send mail to the group. Offers of help receive special attention!
The project implementation history, explaining the components, is presented on a new page to reduce clutter.
In summary, much of the core software has been ported to the Mac under OS X, and to GNU/Linux under GNUStep. All sources and builds are currently in the SVN repository under three branches (gnustep/, nextstep/ and osx/). Speech may be produced from input text. The development facilities for managing and creating new language databases, or modifying the existing English database for text-to-speech, are incomplete, but coming along. These facilities also provide the tools needed for psychophysical and linguistic experiments. Synthesizer, which gives direct access to the tube model, is about 70% complete and already usefully functional; some of the data displays are stubs at present and clean-up is needed. Some accessory tools are available.
gnuspeech is currently fully available as a NeXTSTEP 3.x version, and partly available (with working text-to-speech) as a version that compiles for both Mac OS X and GNU/Linux under GNUStep. Additionally, OS X .dmg files and NeXT packages that can be directly installed and run are available. These files are held in the Subversion repository (not the CVS repository) on the Savannah web page for the project, under “Browse Sources Repository” in the “Development Tools” section. The material is organised according to the three branches previously mentioned (gnustep/, nextstep/ and osx/).
The original NeXT User and Developer Kits are complete, but do not run under OS X or under GNUStep on GNU/Linux. They also suffer from the limitations of a slow machine, so that shorter TRM lengths cannot be used. The NeXT kits can be activated by choosing any password from the very large selection provided in the file “nextstep/trunk/priv/SerialNumbers”, such as “bb976d4a” for User 26 or “ebe22748” for Dev 15. But you need a NeXT computer, of course (try Black Hole, Inc. if you'd like one).
Developers should contact the authors/developers through the gnu project facilities. To join the project mailing list, you can go directly to the subscription page. Papers and manuals are available on-line (see below).
A number of papers and manuals relevant to gnuspeech exist:
One provides a reasonably detailed explanation of the theory underlying the tube resonance model.
A heavily cross-referenced conceptionary is available to provide access to some of the background terms and research in the relevant scientific fields.
A guide to the pronunciation notation used in the text-to-speech work, showing the relationship between standard forms (IPA, Webster's) and the ASCII-friendly form used in gnuspeech, with examples of actual pronunciations.
The Tube Resonance Model: a write-up of the waveguide model of the acoustic tubes that form the underlying model of the human vocal apparatus.
Additional material, including sound files, is also available on Professor Hill's university web site.
Papers related to the research that has led to gnuspeech are also collected on Professor Hill's university web site. These include the development of the “event-based” approach to speech synthesis, which is also applicable to speech recognition.
Some examples of the papers by other researchers that helped us in developing gnuspeech include:
- Carré, R. & Mrayati, M. (1992) “Distinctive regions in acoustic tubes. Speech production modelling.” J. Acoustique 5, 141-151.
- Fant, G. & Pauli, S. (1974) “Spatial characteristics of vocal tract resonance models.” Proc. Stockholm Speech Communication Seminar, KTH, Stockholm, Sweden.
- Smith, J.O. (1992) “Physical modelling using digital waveguides.” Computer Music Journal 16 (4), 74-91.
- Cook, P.R. (1989) “Synthesis of the singing voice using a physically parameterised model of the human vocal tract.” International Computer Music Conference, Columbus, Ohio.
- Liberman, A.M., Ingemann, F., Lisker, L., Delattre, P. & Cooper, F.S. (1959) “Minimal rules for synthesising speech.” J. Acoust. Soc. Amer. 31 (11), 1490-1499.
- ’t Hart, J. & Cohen, A. (1973) “Intonation by rule: a perceptual quest.” Journal of Phonetics 1 (4), 309-327.
- Wells, J.C. (1963) “A study of the formants of the pure vowels of British English.” Progress report for July, University College, London.
There are far too many to list them all. Further papers may be found in the citations incorporated in the relevant papers noted above and/or listed on David Hill's university web site.
See the section on Manuals and papers.
To contact the maintainers of gnuspeech, report a bug, contribute fixes or improvements, join the development team, or join the gnuspeech mailing list, please visit the gnuspeech project page and use the facilities provided. The mailing list can be accessed under the section “Communication Tools”. To help with the project work you can also contact Professor David Hill directly.
The research that provides the foundation of the system was carried out in research departments in France, Sweden, Poland, and Canada, and is ongoing. The original system was commercialised by a now-liquidated University of Calgary spin-off company, Trillium Sound Research Inc. All the software has subsequently been donated by its creators to the Free Software Foundation, forming the basis of the GNU Project gnuspeech. It is freely available under the General Public Licence, as described herein.
Many people have contributed to the work, either directly on the project, or indirectly through relevant research. The latter appear in the citations to the papers referenced above. Of particular note are Perry Cook and Julius Smith (Center for Computer Research in Music and Acoustics) for the waveguide model and the DSP Music Kit, and René Carré (at the Département Signal, École Nationale Supérieure des Télécommunications in Paris). Carré’s work was, in turn, based on work on formant sensitivity analysis by Gunnar Fant and his colleagues at the Speech Technology Lab of the Royal Institute of Technology in Stockholm, Sweden. The original gnuspeech system was created over several years, from 1990 to 1995, by the University of Calgary technology-transfer spin-off company Trillium Sound Research Inc., founded by David Hill, Leonard Manzara and Craig Schock at Leonard's suggestion. The work then and since was mainly performed by the following:
- David Hill designed the event-based approach to speech synthesis and supplied the basic knowledge of speech, gleaned from many years working in the field at places like Edinburgh University Department of Phonetics and Linguistics and visiting major research centres around the western world. He compiled the pronunciation dictionary, following initial work by Adam Rostis, ported Synthesizer to Mac OS X, and ran the project.
- Walter Lawrence, the inventor of the Parametric Artificial Talker (PAT), the first complete formant-based speech synthesiser, provided hardware for the initial research.
- Elizabeth Uldall of the Edinburgh University Department of Phonetics and Linguistics introduced David Hill to speech synthesis and PAT during several visits.
- Miroslav Preucil of the Czech Technical University, Prague, Czechoslovakia, worked with David Hill on an improved PAT used in the early work on formant-based speech synthesis by rules.
- Wiktor Jassem of the Polish Instytut Podstawowych Problemów Techniki, Ian Witten at Essex University, and Neal Reid, a University of Calgary student, worked with David Hill on the basic research needed to understand how to create reasonably natural rhythm and intonation.
- Julius Smith and Perry Cook at the Center for Computer Research in Music and Acoustics, whose work and correspondence helped Leonard Manzara develop the Tube Resonance Model (see below).
- René Carré, who developed the Distinctive Region Model used as a basis for the TRM Control Model which, in turn, was based on research into formant sensitivity analysis by Gunnar Fant and his colleagues at the Royal Institute of Technology (KTH) in Stockholm, Sweden.
- Craig Schock designed and developed the database editor Monet, used to create the databases needed to re-implement David Hill’s “event-based” approach to speech synthesis in the new gnuspeech system. He also created the dictionary creation tools, and wrote WhosOnFirst, the “say” command-line tool, the Speech Manager, and others. He was the project software architect.
- Leonard Manzara wrote the C implementation of the tube model that forms the acoustic core of the synthesis system, and then re-implemented it on the DSP56000 to make it run in real time. He created the original Synthesizer app for the NeXT, and wrote BigMouth to add speech as a service.
- Vince Demarco and David Marwood wrote the original PrEditor.
- Eric Zoerner did an initial but incomplete port of PrEditor to Mac OS X.
- Michael Forbes refactored PrEditor.
- The Savannah hackers set up the original GNU project files, including the CVS repository, now migrated to the SVN repository.
- Adam Fedor and Greg Casamento worked through the original NeXTSTEP source code to bring it to OpenStep standards which was a huge help to get the port started.
- Steven Nygard provided the major effort needed to port the original NeXTSTEP version of Monet and related items to Mac OS X, adding to the original CVS repository material in the process.
- Dalmazio Brisinda took over from Steven and extended the Mac OS X port of gnuspeech modules, including integrating the parser and migrating all the material to the current SVN repository, reorganizing it to make it easier to manage and access in the process.
- Marcelo Matuda worked with Dalmazio to produce the first port to GNU/Linux under GNUStep.
David Hill is responsible for writing this gnuspeech page. Thanks to Steve Nygard for his helpful criticisms.
Copyright (C) 1998, 2001 Free Software Foundation, Inc.,
59 Temple Place - Suite 330, Boston, MA 02111, USA
Verbatim copying and distribution of this article in its entirety is
permitted in any medium, provided this copyright notice is preserved.
Page originally created in the mists of time (2004?)
Last modified: Sun Jun 3 19:03:39 PDT 2012