The current state and a little history of the project are as follows. The descriptions also provide a reference for the original system components that are available in the NeXT source archive in the project SVN repository, and help in understanding the scope of gnuspeech.
No database creation and manipulation components or interactive interfaces are provided for the TextToSpeech Server itself. Those are only appropriate for Monet and other applications that use it. However, provision is made to set the parameters controlling static aspects of the synthesis (tube length, mean pitch, and so on: the so-called “utterance-rate parameters”). These static parameters are normally held in a system library as a “defaults database”. This refinement is not yet included in the ports but is a function of ServerTest (see below). The TextToSpeech Server computes the event framework from the input text via the intermediate input syntax produced by the Parser. This pre-processing includes dictionary look-up to get the correct pronunciation. There is no significant parsing in terms of normal English grammar, and no attempt is made to determine meaning (which would allow different pronunciations of words with the same spelling to be disambiguated, and would allow slightly more accurate rhythm and intonation to be generated). Such abilities should eventually be added. The word stress information from the dictionary is used to help determine the rhythmic framework according to the Jones/Abercrombie/Halliday “tendency-towards-isochrony” theory of British English speech, by placing “foot” boundaries before the stressed syllable in words that have a word-stressed syllable.
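The foot-boundary rule described above can be illustrated with a minimal sketch. This is not the gnuspeech implementation: the `Word` representation, the function name `split_into_feet`, and the sample syllabification are all assumptions made for illustration. It only shows the idea of opening a new foot immediately before each word-stressed syllable, with any leading unstressed syllables simply grouped on their own.

```python
from typing import List, Optional, Tuple

# Hypothetical representation (not the gnuspeech data format): each word is
# a list of syllables plus the index of its word-stressed syllable, or None
# when the word carries no word stress.
Word = Tuple[List[str], Optional[int]]


def split_into_feet(words: List[Word]) -> List[List[str]]:
    """Group syllables into feet, opening a new foot before each stressed syllable."""
    feet: List[List[str]] = []
    current: List[str] = []
    for syllables, stress_index in words:
        for i, syllable in enumerate(syllables):
            # A stressed syllable closes the foot collected so far (if any)
            # and becomes the first syllable of the next foot.
            if i == stress_index and current:
                feet.append(current)
                current = []
            current.append(syllable)
    if current:
        feet.append(current)
    return feet


# Illustrative input; the syllable splits and stress positions are assumed.
utterance: List[Word] = [
    (["the"], None),
    (["com", "pu", "ter"], 1),
    (["is"], None),
    (["speak", "ing"], 0),
]
feet = split_into_feet(utterance)
for foot in feet:
    print(" ".join(foot))
```

In this sketch a foot runs from one stressed syllable up to, but not including, the next, so unstressed words such as "is" attach to the preceding foot. How the real system treats timing within each foot is outside the scope of this example.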
There's a diagram of the relationships between the various TTS components of the complete system on the project Home Page.
These applications were designed to allow the TextToSpeech Server to be tested and, in the case of the TextToSpeech Server Plus, provide certain “hidden” methods that were restricted to Trillium's “in-house” use. Now that the whole system is available under a GPL, the restricted “ServerTest” version is obsolete and the name ServerTest will refer to a reimplementation of ServerTestPlus. For example, one of the 18 originally-hidden methods allowed plain text to be converted into the intermediate Monet input syntax. That method was originally hidden to keep the main dictionary material proprietary, as it could have been used to completely decode the encoded dictionary. The ported version of gnuspeech incorporates use of the Parser in various accessible roles.
WhosOnFirst was the first publicly available software associated with the Trillium TextToSpeech system and was designed as a bit of a teaser. As issued, it provided an indication, on the NeXT console, of remote logins. It also told the user that, if they had the Trillium TextToSpeech system, they could get voice alerts not only for remote logins but also for other system activity such as application launches. WhosOnFirst was written by Craig Schock and was instrumental in catching and identifying a hacker trying to break into our system soon after it was set up. WhosOnFirst has not yet been ported.
A command line interface to the TextToSpeech Server that can be used from a terminal or in shell scripts. It was written by Craig Schock and has not been ported yet, though there is a similar facility for the GNU/Linux GNUStep version.
The SpeechManager was provided to allow the TextToSpeech Server operating system parameters to be optimised for different systems, because no particular setting of priorities, initial silence fill, and so on, could be right for all systems. In particular, in networked systems, or systems with a high compute load from other tasks, the speech would sometimes crackle due to interference from other tasks. The SpeechManager, which could only be run as root, allowed the TextToSpeech Server to be restarted, and the various parameters controlling priority and so on to be set to new values to avoid crackling whilst minimising the use of system resources. These functions are almost certainly obsolete these days, given the increased compute-power available. Some functions (such as reporting the version of the main dictionary in use, or restarting the TextToSpeech Server) may still be required in some form. The SpeechManager was written by Craig Schock. It has not been ported.
An applet that was provided to allow any of the TextToSpeech Kits to be registered, using a password, and was run under the root account. The original function is now obsolete, but may be useful, in revised form, as a way of building user groups for the ported system. It was written by Craig Schock. It has not been ported.
The TrilliumSoundEditor is a speech editor and analysis program intended to provide a more versatile replacement for the publicly available Sonagram program written by Hiroshi Momose. Although TrilliumSoundEditor was never completely finished, it provided the basic spectrographic analysis functionality required for speech development and could be finished, upgraded, or ported at some point in the future. The program was written by Craig Schock. None of the TrilliumSoundEditor has yet been ported, but the source is available. With the advent of Praat (see the Monet manual in the initial distribution of gnuspeech), the TrilliumSoundEditor is probably redundant.
In summary, much of the core software has been ported, and some is still being ported, to the Mac under OS X and to GNU/Linux under GNUStep. All sources and builds for the current work are in the Git repository, with older material in the SVN repository under three branches (for the NeXT, Mac OS X, and GNU/Linux under GNUStep versions; see below). Speech may be produced from input text. The development facilities for managing and creating new language databases, or modifying the existing English database for text-to-speech, mainly lack the file-writing components. The gnuspeech facilities also provide the tools needed for psychophysical and linguistic experiments. TRAcT, which gives direct access to the tube model, is functional, although a few of the logarithmic data displays remain to be finished and clean-up is needed. Some accessory tools are available. As well as the acknowledgements above, Greg Casamento, Adam Fedor and the Savannah Hackers provided valuable support getting the gnuspeech project established, as well as initial work that facilitated the port, including making ubiquitous and tedious changes to the entire NeXT source code to bring it up to OpenStep standards. This work and support is gratefully acknowledged. It involved a lot of effort, is largely invisible to all but the developers involved, and made the actual port to OS X and GNUStep much less painful.