EP0214274A1 - Collection and editing system for speech data
- Publication number: EP0214274A1 (EP19860902100, EP86902100A)
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
COLLECTION AND EDITING SYSTEM FOR SPEECH DATA
The invention pertains generally to interactive computer systems which transform digitally encoded speech data into understandable speech and is more particularly directed to a system for the collection and editing of speech data into digitally encoded speech data segments so that they can be spoken by the interactive application programs of computer systems.
Speech processors which can operate in parallel with a programmable digital computer have come into widespread use in recent years. An example is the speech synthesizer used as an accessory for the Texas Instruments TI-99/4A personal computer. The speech synthesizer accepts a stream of data from the personal computer that defines the individual phonemes of a spoken message. The stream of data must be presented to the synthesizer at a controlled rate of speed with a continuous involvement of the computer in the speech generation process.
An alternative arrangement is disclosed in U.S. Patent No. 4,335,277 issued to Puri. In this system, a read only memory containing a plurality of variable length, digitally encoded speech data segments is connected to a speech synthesizer and a personal computer. The computer can initiate the conversion of an entire single data segment into speech by passing the address of the segment to the read only memory and by then starting the synthesizer. In this manner, a single speech segment may be generated in response to a single request from the computer with no further intervention.
Further, in response to the need for flexibility in speech data segment and overall message length, and to provide for arbitrary spoken messages which can be programmed not only in advance of their speaking but also in an easily variable format, a system including a programmable digital speech processor and a host personal computer was developed. This system is more fully described in U.S. Patent application Serial No. 469,482 filed in the names of Raymond et al. on February 24, 1983 and entitled "Phrase Programmable Digital Speech System". The Raymond et al. system generates an entire speech message upon command, without further host computer intervention. The speech message is programmed to contain an arbitrarily arranged sequence of spoken phrases, each derived from separate and nonconsecutively stored, variable length, digitally encoded speech data segments or phrases.
A further improvement to Raymond et al. is found in U.S. Patent application Serial No. 543,490 filed in the names of Raymond et al. on September 21, 1983 and entitled "Speech Processor with Auxiliary Memory Access". Raymond et al., '490, discloses an apparatus for permitting a personal computer to have access to the additional RAM space on a speech processor. The disclosures of Raymond et al. '490 and '482 are hereby expressly incorporated by reference.
In these speech systems the digitally encoded data which is spoken must be encoded from some source. Some earlier speech processing systems which were used for collecting and encoding data were embodied as word translators where, in response to a typewritten word, a number of translation routines were used to convert the word into digitally encoded data to be later spoken. These translation routines used fixed rules for transforming the words into phoneme combinations which, when spoken by a speech processor, would produce intelligible language. One of the main drawbacks with a collection system of this sort is the unnatural quality of the speech produced. The speech lacks inflection, tonal qualities, natural pauses, and all those other undefinable characteristics which make up the personality of a speaker.
More modern collection systems use direct analog to digital coding techniques using filters and adaptive feedback to change a spoken phrase into digital code. The most advantageous conversions are not simple PCM techniques, but ones that change the spoken words into a digital code which is readily transformed into speech by a digital speech processor. One code in use which does not omit the inflection and tonal qualities of a speaker is linear predictive code or LPC. The linear predictive code is developed by a recursive filtering of the input waveform and generates a number of filter coefficients along with energy and pitch parameters which can be used to reconstruct the analog voice waveform. An example of an apparatus which converts spoken words directly into linear predictive code is a Portable Analysis Synthesis System, currently referred to as system SDS-50, manufactured by Texas Instruments, Inc. of Dallas, Texas.
While advantageous for retaining the personality of the speaker, the SDS-50 collection system is not flexible enough to combine easily with the power of the referenced Raymond et al. systems, where arbitrary lengths of messages and phrases can be programmed. A direct conversion system such as the PASS system does not provide an advantageous method of storing and then programming arbitrary phrases from direct conversions into a soundtrack or a group of data segments which can be output from a speech synthesizer.
Another problem with direct conversion systems is that there has been, until the present invention, no facile way of editing large amounts of input data. Although direct conversion techniques retain the personality of the speaker, they additionally record microphone noise and other extraneous sounds such as hissing, popping, and dropouts. Further, if a conversion of all the words or information from a voice is not taken in one session, the person may sound significantly different in another session. Thus, while it is important to be able to edit the data generated by the direct conversion technique, it is even more advantageous to edit it off line or after a live recording session when all the information needed for a soundtrack has been recorded.
In addition, a collection of phrases that have been edited for good tonal quality and inflection should be usable to build more than one message. A collection of phrases forms a library from which other messages can be built and collected while ensuring that the humanistic quality of the spoken words is not lost in the editing and concatenating.
SUMMARY OF THE INVENTION
Therefore, it is an object of the invention to provide a speech collection system which can store directly converted spoken phrases in a phrase storage memory with index means for determining the location of any stored phrase in that memory.
Another object of the invention is to provide a speech collection system having a means to edit a phrase stored in linear predictive code.
Still another object of the invention is to provide means for selecting among the stored information in the phrase storage memory means and for assembling sentences and entire instructional messages from those stored phrases to form a soundtrack.
In accordance with the foregoing objects, the invention provides a system for collecting, editing, and concatenating directly converted phrases into a soundtrack. The system comprises a collection means for storing the directly converted phrases in a phrase storage memory, editing means for permitting those stored phrases to be reviewed and edited, and soundtrack formation means for selecting and assembling edited phrases or unedited phrases into a soundtrack.
In a preferred implementation a personal computer is used to provide these means operating under the control of a collection program for the collection function, an editing and review program for the editing function, and a soundtrack formation program for the soundtrack generation function. The personal computer system is supplemented for these operations by a system including a linear predictive code (LPC) generator for directly converting intervals of spoken words into LPC format digital data and speech collector circuitry for interfacing the personal computer to the LPC generator. The system further includes a speech processor as is described in Raymond et al. '490 for providing auxiliary memory useful in concatenating phrases in the soundtrack formation operation.
These and other features, aspects, and objects of the invention will be readily apparent and more fully described upon reading the following detailed description in conjunction with the appended drawings wherein:
Brief Description of the Drawings
FIGURE 1 is a series of operational diagrams describing the sequence of operations of a system constructed in accordance with the invention using a collection program, an editing program, and a soundtrack formation program;
FIGURE 2 is a system hardware block diagram of an apparatus for collecting, editing, and assembling directly converted phrases into a soundtrack which is constructed in accordance with the invention;
FIGURE 3 is a more detailed schematic block diagram of the expansion circuitry which supplements the basic architecture of the personal computer illustrated in FIGURE 2;
FIGURE 4 is a detailed electrical schematic diagram of the circuitry implementing the speech collector circuit 32 illustrated in FIGURE 3;
FIGURE 5 is a detailed schematic block diagram of the circuitry implementing the speech processor circuit 34 illustrated in FIGURE 3;
FIGURE 6 is a pictorial representation of the memory segmentation for the speech processor circuit 34 illustrated in FIGURE 3;
FIGURE 7 is a detailed flowchart of the collection program which controls the personal computer illustrated in FIGURE 2 for the collection and indexing functions;
FIGURE 8 is a detailed flowchart of the subroutine COLLECT DATA called from the collection program illustrated in FIGURE 7;
FIGURE 9 is a detailed flowchart of the subroutine SPEAK called from the collection program illustrated in FIGURE 7;
FIGURE 10 is a pictorial representation of the digital format for spoken phrases after they have been converted into linear predictive code;
FIGURE 11 is a pictorial functional block diagram of the operation of the editing program;
FIGURE 12 is a pictorial representation of the graphical display of the pitch and energy parameters of an LPC digital format segment as generated during the editing program;
FIGURE 13 is a pictorial representation of the listing display of the pitch, energy, and filter coefficient parameters of an LPC digital format segment as generated during the editing program;
FIGURE 14 is a system flow chart of the editing program which controls the personal computer illustrated in FIGURE 2 for the review and editing function;
FIGURE 15 is a pictorial representation of the operational commands available for the editing program illustrated in FIGURE 14;
FIGURE 16 is a pictorial representation of the operational commands available for the editing mode command illustrated in FIGURE 15; and
FIGURE 17 is a functional block diagram of the soundtrack formation program which controls the personal computer illustrated in FIGURE 2 for the selection and assembling function.
Detailed Description of the Preferred Embodiment
The operational sequence of forming a soundtrack from spoken phrases and its use with an educational program are shown more fully in Figure 1.
As illustrated in steps A, B and C of the sequence, the system uses a collection program 1, an editing and review program 3, and a soundtrack formation program 5 to produce soundtrack disks 9. The soundtrack disks 9 in combination with an application program 11 in step D form an educational program 7 which outputs graphics 13, video 15, and speech 17 in an interactive manner with a student. After the application program 11 and soundtrack disks 9 are combined into the educational program 7, the information stored therein can be loaded into a speech processor memory to be played directly on cue from the educational program 7 or integrated into the educational program by actually programming sequences of the soundtrack disks with commands loaded in the command buffer of a speech processor of the type illustrated in Raymond et al. '490. The input to the collection program 1 in step A is a plurality of segments of linear predictive code in digital format which have been converted from directly spoken phrases 19. The collection program output from this input is one or more as yet unedited phrase storage disks 21 each storing a plurality of the collected but unedited phrases. The collection program additionally provides means for indexing or labeling each phrase such that its location in the phrase storage memory is known. In step B the unedited speech segments on disks 21 are reviewed and edited by the editing program and output as edited phrase storage disks 23. The edited phrase storage disks 23 are physically the same disks as the unedited phrase storage disks 21 but may contain the stored information in modified form. From the edited phrase storage disks 23, in step C, the soundtrack formation program 5 selects and arranges the phrases to form selected phrase disks 25. A plurality of selected phrase disks 25 are then concatenated or linked by the soundtrack formation program 5 to produce one or more soundtrack disks 9.
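The flow of steps A through C can be pictured with a small sketch. It is only an illustration under stated assumptions: the class `PhraseStore`, the function `form_soundtrack`, the text labels, and the (offset, length) index layout are all hypothetical and are not taken from the collection, editing, or soundtrack formation programs themselves.

```python
from dataclasses import dataclass, field


@dataclass
class PhraseStore:
    """Hypothetical stand-in for a phrase storage disk with an index."""
    data: bytearray = field(default_factory=bytearray)
    index: dict = field(default_factory=dict)  # label -> (offset, length)

    def collect(self, label: str, lpc_bytes: bytes) -> None:
        # Step A: append the phrase and record where it was stored.
        self.index[label] = (len(self.data), len(lpc_bytes))
        self.data.extend(lpc_bytes)

    def fetch(self, label: str) -> bytes:
        # Look the phrase up through the index rather than by searching.
        offset, length = self.index[label]
        return bytes(self.data[offset:offset + length])

    def edit(self, label: str, edited: bytes) -> None:
        # Step B: store the edited version and repoint the index at it.
        self.collect(label, edited)


def form_soundtrack(store: PhraseStore, labels: list) -> bytes:
    # Step C: select phrases by label and concatenate them in order.
    return b"".join(store.fetch(label) for label in labels)
```

Because phrases are addressed through the index, the same stored phrase can appear in several places in a soundtrack, or in several soundtracks, without being recorded again.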
The memory storage for the collection program 1 is shown as unedited phrase storage disks 21, and that for the soundtrack formation program 5 is shown as selected phrase disks 25 and soundtrack disks 9. The disks can be standard 5.25 in. floppy disks of the kind used in personal computers and are advantageous in the system because of the present extra cost of providing expanded main memory space for small personal computers. As memory costs decrease, it is evident that the phrase storage memory, selected phrase memory, and soundtrack memory could also be embodied in main memory.
Figure 2 shows a basic hardware block diagram for a collection, editing, and assembling system for speech data constructed in accordance with the invention. The system includes an interactive processor means or apparatus 10 which can be embodied as a personal computer. The personal computer 10 is provided with a standard disk operating system and a dual disk drive including a program disk drive 26 and a data disk drive 28. As is conventional with these small personal computing systems, the processor means 10 includes a means for interfacing the microprocessor of the system with operator controlled devices such as a keyboard 27 and a video monitor 29.
The processor means 10 under program control collects the speech data segments which are output from a linear predictive code generator 16 and processes such data into the soundtrack disks 9 which can then be spoken by the processor means 10 as an accompaniment to an application program.
The LPC generator 16 receives its input from one of two sources. The LPC generator 16 may receive spoken words directly transduced into an analog waveform from a microphone 18 when switch 24 is closed or from a prerecorded tape played on a commercial audio recorder 20 when switch 22 is closed. The LPC generator 16 receives intervals of speech from either of these sources and converts them into LPC digital code which can be spoken directly by many of today's speech synthesis chips. Although many LPC generators may be used to convert analog voice data into digital LPC format code, the LPC generator 16 is preferably a portable analysis synthesis system (SDS-50) manufactured by the Texas Instruments Corporation of Dallas, Texas. This system contains an analysis circuit which converts intervals of spoken words in the form of analog voltages to a digital format and extracts energy, pitch and ten filter parameters forming the LPC code for each spoken phrase or segment.
As the speech segments are collected by the personal computer 10 they are stored on phrase storage disks 21 and later edited and concatenated into the soundtrack. A speaker 12 which can produce audio sounds is driven by the personal computer 10 to provide a means for listening to the speech data segments as they are collected. The speaker 12 further provides a means for listening to the speech data segments or portions thereof during the review and editing process. Another audio amplifier 14 which is driven by the personal computer 10 produces audio sounds from the soundtrack after it has been assembled.
Figure 3 shows an overall block diagram of the collecting and editing system for speech data including the microprocessor 30 of the personal computer 10 and further including a speech collector circuit 32 and a speech processor circuit 34. While many types of personal computers can be used in implementing the invention, the Apple IIe computer manufactured by Apple Computer Incorporated of California was selected for use in constructing the preferred embodiment of the invention because of its suitability for use in computer aided instruction, its relatively low cost, and its present popularity among educators. The overall architecture will therefore be described with respect to the Apple IIe and its operating system, peripheral communication apparatus, and programming language.
On the Apple IIe computer the peripheral connector 31, shown as the interface between the PC and a plurality of expansion slots, generally includes 8 Winchester No. 2HW25C0-111 50 pin printed circuit card edge connectors, the wiring of which is described in the "Apple IIe Reference Manual" published by the manufacturer. Through these connectors eight accessories may be connected to the Apple IIe architecture. The speech collector circuit 32 and the speech processor circuit 34 each fit into one of the expansion slots on the peripheral connector 31 of the Apple IIe, which provides communications to the circuits via the control bus, data bus and address bus of the microprocessor 30. The speech processor circuit 34 generates an audio drive signal to control amplifier 14 to produce audio sounds. The speech collector circuit 32 further generates an audio drive signal to control speaker 12 to produce audio sounds. The speech collector circuit 32 additionally inputs the digital LPC formatted segments for their collection and storage by the personal computer 10.
FIGURE 3 shows only the connections of the microprocessor 30 to the speech collector circuit 32 and speech processor circuit 34, leaving out the normal connections of the keyboard 27, disk drives 26, 28, and video monitor 29 of the personal computer 10 for the purpose of clarity. Additionally, the block diagram of the complete system would, as implemented, show an EPROM with a system program for the microprocessor 30 and a random access memory (RAM) on the PC motherboard or as an accessory plugged into one of the expansion slots of the peripheral connector 31. These connections and system architecture are more fully illustrated and described in the above-referenced "Apple IIe Reference Manual", the disclosure of which is incorporated herein by reference.
The function of the speech collector circuit 32 is to allow digital LPC data to be input for storage in the memory of microprocessor 30 and further to allow the LPC data to be spoken by means of the speaker 12 during the collection and editing processes. The speech collector circuit 32 therefore acts as an interface between the microprocessor 30 and the LPC generator 16 and speaker 12.
The speech processor circuit 34, which is more fully described in the above-referenced Raymond et al. '490 application, provides a memory means for storing speech data as it is assembled into a soundtrack from a plurality of the selected phrase disks 25 and a means to speak long, concatenated sentences in a message form comprising an arbitrary number of phrases. Further, as more fully set forth in Raymond et al. '490, the speech processor circuit 34 is used for programming of the soundtrack associated with certain application programs on the personal computer 10. The programming of the speech processor 34 controls the device to transform the digitally encoded speech data segments of a soundtrack storage disk into understandable speech in conjunction with the running of the interactive application program.
FIGURE 4 is a detailed electrical schematic of the speech collector circuit 32 shown connected to the transmit (XMT) and receive (RCV) terminals of the LPC generator 16. These designations generally describe the serial data inputs and outputs of an SDS-50 apparatus. Further, the connections to the specific control bus lines of the Apple IIe through the peripheral connector 31 are illustrated. The control bus lines used for the interface are the read/write line (R/W), the device select line (DEVSEL), the reset line (*RST), and the phase zero clock line (Φ0). The data bus lines of the Apple IIe connected to the speech collector circuitry are the 8 bit data path (D0-D7). The address bus provides the three address lines A0-A2 for selecting the various devices and operations of the speech collector circuitry 32.
The speech collector circuitry 32 comprises a peripheral interface adaptor PIA 50, a speech synthesis chip 52, a buffer 54, and an asynchronous communication interface adaptor ACIA chip 66 along with their associated circuitry. The buffer 54 provides an interface between the data bus lines D0-D7 of the Apple IIe and an eight bit internal data path 55 having data lines D0-D7. The buffer 54 is bidirectional and provides two bidirectional data input ports DA0-DA7 and DB0-DB7. The direction of the data flow is determined by the logic level on the R/W control line which is applied to the *W input of the buffer 54. The buffer is enabled for data transfer by a low logic level on the DEVSEL control line applied to its *E input. Thus, data bytes can be passed from the Apple IIe data bus to the data bus 55 or vice versa.
For communicating with the LPC generator 16 the speech collection circuitry uses the ACIA 66. Data input to the ACIA 66 or data received from the device is passed over data path 55 to or from its data terminals D0-D7. The direction of the data flow is determined by the logic level of the R/W control line applied to the R/W input of the ACIA 66. The device is adapted to operate in a transmission mode and a receiving mode. The mode of the device is selected by the logical bit combination applied to the A0 and A2 address lines of the personal computer 10, which are connected to the RS and CS0 inputs respectively of the ACIA 66. The different bit combinations on these lines allow for transmission and reception of data bytes between the LPC generator 16 and personal computer 10. Enablement of the ACIA 66 is provided by a low logic level on the DEVSEL control line which is applied to the *CS1 input of the device while a phase zero clock logic level is applied to the enable input E of the device.
The transmission of a byte of data from the ACIA 66 is a serial string of 8 bits from the TD output of the ACIA 66 to a bus driver circuit. The bus driver circuit comprises a PNP transistor 84 with its emitter connected to a source of positive voltage +V and its collector connected to the transmit terminal XMT of the LPC generator 16 and further to a source of negative voltage -V through resistor 86. The base of the transistor 84 is connected to the TD output of the ACIA 66 via resistor 80, and bias is provided by a resistor 82 connected between the base and emitter of the transistor 84.
When the transistor 84 is in an off state the XMT terminal is at approximately -V, while if the TD terminal is caused to sink current, the transistor 84 is turned on and the XMT voltage level makes a transition to nearly +V. In this manner a serial digital transmission is provided for the bytes of information from the Apple IIe to the LPC generator 16. Conversely, the receive terminal RCV is connected to a receiver amplifier circuit comprising NPN transistor 74 whose emitter is connected to ground and whose collector is connected to the receive terminal RD of the ACIA 66 and further to a source of positive voltage +V through resistor 72. A base resistor 78 provides a connection between the base of the transistor 74 and the receive terminal RCV of the LPC generator 16. A diode 76, which protects the receive transistor 74, is connected across the base to emitter junction of the device.
By raising the voltage on its base from -V to +V, the transistor 74 is turned on, which lowers the voltage at the RD input of the ACIA device nearly to ground. When the RCV terminal is returned to a -V voltage by the passage of a bit, the RD terminal returns to nearly +V as the transistor 74 will then turn off. In this manner a serial transmission from the LPC generator 16 is received by the ACIA device 66.
The data rate of a serial transmission and reception at the ACIA 66 is determined by the frequency of a clock signal applied to its CLK input. The clock signal is generated from an oscillator 70 and a counter 68 which divides down the oscillator signal into a lower frequency compatible with the LPC generator communication rate.
In operation, the ACIA 66 asynchronously receives a byte of data from the LPC generator 16, stores it in a register, and signals the Apple IIe that a byte of data has been received. The Apple IIe thereafter can read the stored byte via the data path 55 and data bus D0-D7 through buffer 54. For transmitting a data byte, the Apple IIe commands the buffer 54 to reverse the data direction and commands the ACIA to store a data byte coming in from data path 55 into a transmission register. The personal computer 10 thereafter commands the ACIA 66 to transmit the data in serial fashion to the XMT terminal of the LPC generator 16.
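The behavior of the two level-shifting stages described above can be summarized as a pair of small functions. This is a minimal logical model only, assuming idealized switching (the text says "approximately" and "nearly" for the real voltages); the function names `xmt_level` and `rd_level` and the string labels for the voltage levels are hypothetical.

```python
def xmt_level(td_sinking: bool) -> str:
    """Transmit driver (PNP transistor 84).

    The transistor turns on when the TD terminal sinks current, pulling
    XMT up to nearly +V; otherwise resistor 86 holds XMT near -V.
    """
    return "+V" if td_sinking else "-V"


def rd_level(rcv: str) -> str:
    """Receive amplifier (NPN transistor 74).

    With RCV at +V the transistor turns on and pulls RD nearly to ground;
    with RCV at -V it turns off and resistor 72 pulls RD up to nearly +V.
    """
    if rcv not in ("+V", "-V"):
        raise ValueError("rcv must be '+V' or '-V'")
    return "GND" if rcv == "+V" else "+V"
```

Both stages invert: a low level at TD produces a high level at XMT, and a high level at RCV produces a low level at RD.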
The other function of speaking the data is provided by the PIA 50 and the speech synthesis chip 52. The speech synthesis chip preferably is a Texas Instruments TMS 5220 speech synthesizer which has data inputs D0-D7 for receiving a byte of data. The speech synthesizer chip includes a number of interface command inputs *WS, *RS, *INT and *RDY connected to the port B terminals B2, B4, B6, and B7 of the PIA 50. A timing circuit 58 is further connected between a source of negative voltage -V and the -5 input and the OSC input of the chip 52. The interface thus consists of an eight-bit bidirectional data bus (D0-D7), separate select inputs for a read (*RS) or a write (*WS) operation, a ready line (*RDY) for synchronization, and an interrupt line (*INT) to indicate a status change requiring the attention of the personal computer 10.
The audio output AUDIO of the speech synthesis chip 52 is filtered and attenuated by a circuit 64 before being applied to the inputs of operational amplifier 60.
After being amplified the audio signal is capacitively coupled to the speaker 12.
The data terminals D0-D7 of the speech synthesis chip (SSC) 52 receive data and provide status information to the port A inputs A0-A7 of the PIA 50. The bit combination on address lines A1 and A2 is used to control the mode of the PIA 50 by applying signals to the RS0 and RS1 inputs of the device. The PIA is selected for operation by a low logic level on the DEVSEL line being applied to its *CS2 input during a clock transition of the Φ0 clock applied to its enable E input. The direction of data flow between the personal computer 10 and the speech synthesis chip 52 is determined by the logic level on the R/W control line which is connected to the R/W input of the PIA 50. The PIA 50 can be reset by the reset control signal *RST which is received at its *R input.
In operation the PIA 50 has three functions to perform. The first is to pass along bytes of digital data from the personal computer 10 to be spoken by the SSC 52. The second is to transfer the status of the SSC 52 to the personal computer 10. The third is to set the correct logic levels on the control inputs of the SSC 52 from commands generated by the personal computer 10.
The speech data which has been compressed using pitch excited linear predictive coding by the LPC generator 16 is supplied to the SSC 52 by the personal computer 10 through the PIA 50. The SSC 52 decodes this data to construct a time-varying digital filter model of the vocal tract. This model is excited with a digital representation of either glottal air impulses (voiced sounds) or the rush of air (unvoiced sounds). The output of the model is passed through an eight-bit digital to analog converter to produce a synthetic speech waveform output from the AUDIO terminal of the chip 52.
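The idea of exciting a vocal tract filter with either impulses or noise can be illustrated with a generic pitch-excited synthesis loop. This is a sketch of the general technique, not of the TMS 5220's internal arithmetic, which is not described here: the direct-form all-pole filter, the function name `synthesize_frame`, and its parameters are all assumptions made for illustration.

```python
import random


def synthesize_frame(coeffs, gain, pitch_period, n_samples, seed=0):
    """Generic all-pole synthesis: y[n] = gain*e[n] - sum(a_k * y[n-k]).

    coeffs       -- hypothetical direct-form filter coefficients a_1..a_p
    gain         -- stands in for the frame's energy parameter
    pitch_period -- samples between impulses; 0 selects unvoiced noise
    """
    rng = random.Random(seed)
    history = [0.0] * len(coeffs)  # most recent outputs: y[n-1], y[n-2], ...
    out = []
    for n in range(n_samples):
        if pitch_period > 0:
            # Voiced: an impulse train models the glottal air impulses.
            excitation = 1.0 if n % pitch_period == 0 else 0.0
        else:
            # Unvoiced: white noise models the rush of air.
            excitation = rng.uniform(-1.0, 1.0)
        y = gain * excitation - sum(a * h for a, h in zip(coeffs, history))
        out.append(y)
        history = [y] + history[:-1]
    return out
```

With a single coefficient of -0.5 and a voiced excitation, each impulse decays by half per sample, which is the simplest case of the filter "ringing" that shapes the excitation into a speech-like waveform.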
With reference to Figure 5, a detailed electrical schematic of the speech processor 200 is illustrated. At the heart of the speech processor 200 is a microprocessor 202 which generates the control signals that control the operation of the speech processor 200. The microprocessor 202 is preferably a Rockwell International Corporation R6500/1 microcomputer system, the details of which are described in a publication dated June, 1979, entitled "Document No. 29650-N48: R6500 Microcomputer System Product Description R6501/1 One Chip Microcomputer", published by Rockwell International Corporation.
The speech processor 200 also contains a speech memory 214 which is a 64k byte random access memory constructed from eight 64k bit dynamic random access memory units No. TMS 4164 manufactured by Texas Instruments, Inc. In addition, the processor contains a speech synthesizer chip 216, preferably Model TMS 5220, also manufactured by Texas Instruments, Inc. The speech synthesizer chip 216 is more fully described in the publication DM-02, dated June, 1981 and entitled "TMS 5220 Voice Synthesis Processor Data Manual" published by Texas Instruments, Inc.
The speech memory 214, if the speech processing mode is to be used, is first loaded with speech data by the personal computer 10. The memory 214 has the capacity to hold about five minutes worth of speech which may be broken up into 605 discrete speech segments or phrases. To load the memory 214 with speech data, the personal computer 10 first generates the necessary signals at peripheral connector 31 to cause bus input logic 300 to generate a signal *COUNT on line 217. The signal resets an 8-bit counter 218 to a zero count and also passes through a gate 206 and resets an 8-bit counter 220 to a zero count. The outputs of the counters 218, 220 on lines 222 and 224 are alternately connected to a 2-1 select logic device 212 and then through a 2-1 select logic device 210 to the ADDR inputs of the speech memory 214. The 2-1 select logic device 212 is driven by the PHASE 0 timing signal on line 228 into alternate states at a rapid rate and thereby presents a complete 16-bit memory address to the speech memory 214 once during a complete cycle of the PHASE 0 signal.
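The addressing scheme introduced here, together with the auto-increment on *CAS described in the text that follows, can be modeled in a few lines. This is a behavioral sketch only: the class name `SpeechMemoryModel` and its method names are hypothetical, and signal timing (PHASE 0 multiplexing, *RAS/*CAS sequencing) is deliberately left out.

```python
class SpeechMemoryModel:
    """Behavioral model of speech memory 214 and its two address counters."""

    def __init__(self):
        self.low = 0    # 8-bit counter 220, advanced by each *CAS
        self.high = 0   # 8-bit counter 218, advanced by carry from 220
        self.ram = bytearray(0x10000)  # 64k bytes

    def count_reset(self):
        # *COUNT resets both counters, so the next access is at $0000.
        self.low = 0
        self.high = 0

    @property
    def address(self):
        # The two 8-bit counts together form one 16-bit address.
        return (self.high << 8) | self.low

    def _advance(self):
        self.low = (self.low + 1) & 0xFF
        if self.low == 0:  # overflow of counter 220 carries into 218
            self.high = (self.high + 1) & 0xFF

    def write(self, byte):
        self.ram[self.address] = byte & 0xFF
        self._advance()

    def read(self):
        value = self.ram[self.address]
        self._advance()
        return value
```

Because every access advances the counters, the host never supplies an address: it resets the count once and then streams bytes in or out sequentially.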
Accordingly, the counters 218 and 220 determine which memory location within the memory 214 is to be accessed, and the *COUNT signal on line 217 forces these counters to address the location $0000. When the address counters for the speech memory 214 are a zero count, the personal computer 10 can proceed to write data into successive locations within the speech memory. The personal computer 10 presents the data one byte at a time to the peripheral connector 31 on data bus lines D0-D7 and simultaneously generates the necessary signals to cause the bus input logic 300 and timing logic 400 to generate a RAS signal on line 236, a *WRITE signal on line 232, and a *CAS signal on line 234. The *CAS signal on line 234 and the immediately proceeding *RAS signal on line 236 cause the speech memory 214 to accept a byte of data from the data bus 230 and store this byte into location $0000 of the memory 214. The *CAS signal on line 234 also advances the 8 bit counter 220 so that it will now address location $0001. The above process is repeated until all the necessary data (up to 64k bytes) have been loaded into the speech memory by the personal computer 10. Each time a new byte is presented the *CAS, *RAS and *WRITE signals on lines 234, 236 and 232 respectively load the byte into the next successive location of the speech memory 214 and the *CAS signal adds one to the address count presented by the counters 218 and 220. The counters 218 and 220 are interconnected such that an overflow or carry forward from the counter 220 flows into the count input of the counter 218 through a gate 238. The personal computer 10 thus loads speech data or other data into the speech memory 214 almost as rapidly as it can load data into its own internal RAM. To read the data from memory 214, the personal computer 10 reverses the loading process. The *COUNT signal on line 217 is generated by the bus input logic 300 to counters 218, 220. 
The signal on line 217 resets the counter 218 to a zero count and also passes through the gate 206 and resets the counter 220 to a zero count. Having set the address counter for the speech memory to its zero address position, the personal computer 10 may now proceed to read data from successive locations within the speech memory 214. The personal computer executes the read operation by generating the necessary signals to cause the bus input logic 300 and timing logic 400 to generate a *READ signal on line 237, a *RAS signal on line 236, and a *CAS signal on line 234. The *CAS and *RAS signals cause a strobe of the memory 214 at the location addressed, which is presently location $0000. The contents of that address are then output on the bus 258, which is returned to the personal computer 10 via the data bus lines D0-D7 of data bus 230. A buffer 259 connects the data bus 258 to the data bus 230 when enabled by the *READ signal on line 237. As previously described, the *CAS signal on line 234 increments the address counters 218, 220 so that they will now point to the next location, in the example $0001. The personal computer 10 can thus repeat this read sequence as often as required to access the data that is stored in the memory 214.
Figure 6 illustrates the normal memory map of the storage area for the speech processor implementing memory 214. The LPC speech data is stored in the higher order addresses $0540-$FFFF. From address $0082 to address $053F are located phrase pointers and identification bytes. For example, from address $0082 to address $053B are located the 605 phrase pointers which point to the 605 phrases loaded into the LPC speech data buffer. If not all 605 phrases are used, then the unused pointers, instead of addressing a particular buffer location, address location $0081, where a stop code $FF is stored. The address $0080 is reserved as a check sum for the total information stored in this area, and addresses $053E and $053F are used to store a two byte soundtrack identification number. A command buffer from addresses $0000-$007F is loaded with a plurality of coded commands for executing the phrases of the storage area in an arbitrary sequence. The command buffer and its operation and programming are more fully described in the above-referenced Raymond et al. application.
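The memory map of Figure 6 can be summarized in a few constants. The following Python sketch is illustrative only: the names are hypothetical, the two-byte pointer size is inferred from the stated range ($0082 to $053B holding 605 pointers), and the high-byte-first pointer order is an assumption that the text does not confirm.

```python
# Addresses taken from the description of Figure 6.
COMMAND_BUFFER_START = 0x0000
COMMAND_BUFFER_END = 0x007F
CHECKSUM_ADDR = 0x0080
STOP_CODE_ADDR = 0x0081
STOP_CODE = 0xFF
POINTER_TABLE_START = 0x0082
NUM_PHRASES = 605
POINTER_SIZE = 2              # inferred: (0x053B - 0x0082 + 1) / 605 == 2
SOUNDTRACK_ID_ADDR = 0x053E   # two bytes, $053E and $053F
LPC_DATA_START = 0x0540


def pointer_address(phrase_index):
    """Return the address of the pointer for a 0-based phrase index."""
    if not 0 <= phrase_index < NUM_PHRASES:
        raise ValueError("phrase index out of range")
    return POINTER_TABLE_START + POINTER_SIZE * phrase_index


def build_pointer_table(phrase_addresses):
    """Build the bytes for $0082-$053B.

    Unused entries point at STOP_CODE_ADDR, where the stop code is stored.
    Byte order (high byte first) is an assumption.
    """
    if len(phrase_addresses) > NUM_PHRASES:
        raise ValueError("too many phrases")
    table = bytearray()
    for i in range(NUM_PHRASES):
        if i < len(phrase_addresses):
            target = phrase_addresses[i]
        else:
            target = STOP_CODE_ADDR
        table += target.to_bytes(POINTER_SIZE, "big")
    return bytes(table)
```

With a single phrase stored at $0540, the first pointer holds $0540 and every remaining pointer holds $0081.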
Whenever the personal computer 10 wishes to have the speech processor 200 produce speech, the personal computer proceeds as described above to store a particular soundtrack and causes the speech address counters 218 and 220 to be set to a zero count. Next the personal computer 10 feeds two bytes of command data to a first command location of the command buffer. The personal computer 10 then actuates the peripheral connector 31 to produce signals in such a manner as to cause the bus input logic 300 to generate an enable signal on line 242. The enable signal on line 242 sets a bistable 240 which causes the generation of a BDEN signal to place the microprocessor 202 into operation.
The microprocessor 202 then examines the contents of the initial command location and takes whatever action the personal computer 10 has called for. A number of command options are available. The simplest command that the personal computer 10 can place into the location is a phrase address. Upon receiving a phrase address as a command, the microprocessor 202 causes the speech synthesizer 216 to generate the corresponding phrase as speech. If the command supplied is a number within the range of $0800 to $083F, the command informs the microprocessor 202 that a multiple series of commands has been placed in the command buffer of the speech memory 214. The least significant six bits of the number supplied indicate how many commands have been placed in sequential memory locations starting at $0002 through to the end of the multiple command set. The individual commands within this set may be phrase address commands or silent interval commands or both. If a command stored in either the initial command location or the multiple commands falls within the numeric range of $C000 to $FFFF, then the command is a time delay that orders the speech processor to do nothing for a specific number of 12.5 millisecond time intervals specified by the least significant 14 bits of the command.
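The command ranges described above can be illustrated with a short decoder. This Python sketch is not the microprocessor's actual firmware. The function name and return values are hypothetical, and any 16-bit word outside the two special ranges is assumed to be a phrase address.

```python
DELAY_UNIT_MS = 12.5  # one time interval, as stated in the text


def decode_command(word):
    """Classify a 16-bit command word.

    Returns one of:
      ("delay", milliseconds)    for $C000-$FFFF
      ("multiple", count)        for $0800-$083F
      ("phrase", address)        otherwise (assumption)
    """
    if not 0 <= word <= 0xFFFF:
        raise ValueError("command must be a 16-bit value")
    if word >= 0xC000:
        intervals = word & 0x3FFF          # least significant 14 bits
        return ("delay", intervals * DELAY_UNIT_MS)
    if 0x0800 <= word <= 0x083F:
        return ("multiple", word & 0x3F)   # least significant 6 bits
    return ("phrase", word)
```

For example, the word $C050 carries the value 80 in its low 14 bits and therefore requests a silent interval of 80 x 12.5 = 1000 milliseconds.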
The personal computer 10 can, by proper manipulation of the peripheral connector 31 control signals, cause a STATUS signal 244 to be generated by the bus input logic 300 which causes status data presented at Port B of the microprocessor 202 to be gated through a gate 246 and presented to the data bus D0-D7 230 from which the status data of the speech processor may be read. This data can indicate, for example, whether the microprocessor 202 is busy generating speech or otherwise executing a series of commands. A special number presented to the personal computer 10 on leads D0-D7 of data lines 230 can identify the speech processor to assist the computer in searching the peripheral connector 31 slots looking for the speech processor 200. Other status leads can indicate such things as the size of the speech memory, if it is variable in size.
Once placed into operation by the BDEN signal 241, the microprocessor 202 generates a LOAD signal 248 that resets the bistable 240, flows through the gate 206 to clear the counter 220, and flows to the LD (Load) input of the counter 218, thereby causing the counter 218 to load itself with the number 250 presented at Port A of the microprocessor 202. At this time, Port A presents $0000 to the counter 218. Accordingly, the two counters are cleared to zero count so they address the single command data stored in location $0000 of the memory 214.
Thereafter, the microprocessor 202 generates a NEXT signal 252 which a NEXT signal pulse generator 500 converts into a carefully synchronized NEXT PULSE 254. The NEXT PULSE flows into the timing logic 400 and initiates an *RAS and *CAS signal sequence that transfers the contents of location $0000 within the speech memory into the latch 256 over the memory output bus 258 and that advances the counter 220 to a count of $01 so the counters 218 and 220 now address location $0001.
The microprocessor 202 then terminates the NEXT signal 252 and initiates an EN DATA signal 260 that displays the contents of the latch 256 to the bus 250 and to Port A of the microprocessor 202. The microprocessor then accepts the byte of data from the bus 250. Immediately thereafter, the microprocessor 202 again generates the NEXT and EN DATA signals in rapid sequence and thereby reads a second byte of data from location $0001 within the speech memory 214, leaving the counters 218 and 220 addressing memory location $0002. The microprocessor 202 next examines the 16-bit command it has retrieved from the speech memory and takes whatever action is appropriate, as is explained more fully below. If the address counters 218 and 220 need to be reset to point to a specific address, the microprocessor 202 presents the most significant byte of the desired address to the counter 218 over the bus 250 extending from Port A, and then it generates the LOAD signal to clear the counter 220 and load the most significant byte of the address into counter 218. Then, if the least significant byte of the desired address is nonzero, the microprocessor 202 generates the NEXT signal 252 a number of times equal to the numeric value of the least significant byte. Since each NEXT signal causes the timing logic 400 to generate a *CAS signal which advances the counter 220, the net effect of these operations is to set the counters 218 and 220 to the desired address value. By then generating the NEXT signal 252 and the EN DATA signal 260 in alternate sequence, the microprocessor 202 can step through and examine the contents of the speech memory locations starting with the specified address. The microprocessor 202 maintains its status at Port B where it is available to the personal computer 10, including one data bit that indicates whether the microprocessor is "BUSY."
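The addressing scheme just described, in which the high byte is loaded and the low byte is reached by counting, can be modeled in software. The Python sketch below is a simplified behavioral model under stated assumptions: the class and method names are hypothetical, each call to step() stands for one *CAS pulse, and the memory read that accompanies each NEXT pulse in the hardware is ignored.

```python
class AddressCounters:
    """Behavioral model of the 8-bit counters 218 (high) and 220 (low)."""

    def __init__(self):
        self.high = 0  # counter 218
        self.low = 0   # counter 220

    def reset(self):
        """Model of the *COUNT signal: both counters go to zero."""
        self.high = 0
        self.low = 0

    def load(self, high_byte):
        """Model of LOAD: clear counter 220, load counter 218 from Port A."""
        self.high = high_byte & 0xFF
        self.low = 0

    def step(self):
        """Model of one *CAS pulse: advance 220, carry into 218."""
        self.low = (self.low + 1) & 0xFF
        if self.low == 0:
            self.high = (self.high + 1) & 0xFF

    @property
    def address(self):
        return (self.high << 8) | self.low


def seek(counters, address):
    """Set the counters to a 16-bit address using LOAD plus repeated steps."""
    counters.load(address >> 8)
    for _ in range(address & 0xFF):
        counters.step()
```

Seeking to $0540 in this model takes one load of $05 followed by 64 steps. Stepping a further 192 times carries out of the low counter and leaves the address at $0600.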
Since the speech memory 214 is dynamic, it must be "refreshed" periodically to keep it from losing data. At times when the speech processor 200 and personal computer 10 are otherwise not using the memory 214, a REFRESH signal 262 connects the address input of the speech memory 214 to an 8-bit counter 264 that counts upwards continuously in synchronism with the PHASE 0 signal on line 228. The *RAS signal on line 236 continuously pulses the memory 214 even in the absence of the *CAS signal and thereby causes locations within the speech memory 214 to be addressed by the count output of the counter 264. The RESET signal 266 from the personal computer 10 is applied to the microprocessor 202 and the NEXT signal pulse generator 500 to reset these devices whenever the personal computer 10 is reset. The Q3 signal on line 268 is a timing signal from the personal computer 10 that is fed into the microprocessor 202 to serve as a clock signal and to synchronize the operation of the microprocessor 202 with the operation of the speech memory 214, which derives its timing from the PHASE 0 signal of line 228. The timing relationship of the Q3 signal and the PHASE 0 signal is illustrated in the "Apple II Reference Manual" book mentioned above. The Q3 signal fluctuates at precisely twice the frequency of the PHASE 0 signal, going high for 300 nanoseconds each half cycle of the PHASE 0 signal.
The Q3 signal 268 is applied to input CLK of the microprocessor 202, and the RESET signal 266 is applied to input RST. The remaining four signals, NEXT, EN DATA, LOAD, BDEN, connect to bit positions of Port C of the microprocessor. The remaining four Port C signals are connected to the speech synthesizer 216 by a bus 270, and are connected to the control inputs *RDY, *INT, *RS, and *WS of that device. Port D, bits 0-7, connect respectively to the speech synthesizer 216 input leads labeled D0-D7 via a Bus 272. A capacitor 274 connects the OSC lead of the speech synthesizer 216 to ground, and resistor 276 and variable resistor 278 connect the same lead to a negative supply 280 which also connects to the voice synthesizer input -5.
The system flowchart and operation of the collection routine are illustrated more fully in Figure 7. The program is loaded into the Apple IIe by inserting a program disk which contains the software of the routine and a DOS such as APPLE DOS 3.3 into the program disk drive 26. A phrase storage disk 21 is also inserted in the data disk drive 28 to provide an area for storing the collected phrases that are produced by this process. The system is then booted into memory and control transferred from the DOS to the collection program. The routine begins in block A10 where initial instructions are provided to the operator on the video monitor 29. Such instructions as "Turn on the power to the SDS-50 unit", "Turn on power to the terminal", and "Toggle reset for SDS-50 unit" are given to assist in initialization. After the operator has read and executed these instructions he will press a return key on keyboard 27 to continue the program, which is sensed by block A12. The user then inputs the date on which the collection occurs and causes the program to sequence to the next step.
The program now prompts the operator with a menu on the video monitor 29 displaying three choices for collection operation. The first choice is whether he wishes to initialize the phrase storage disk 21 that is mounted in the disk drive 28. A second option is to actually store phrases on that phrase storage disk and a third option is to stop. By picking a particular option and hitting a key on the keyboard 27 corresponding to that option, the operator will cause the program to advance. If option 3 is chosen, as indicated at block A18, then an affirmative branch from that test will cause the program to stop. In this context, a stop command is a transfer of control from this particular application program of the Apple IIe back to the operating system monitor such that another application program may be run. If option 1 is chosen, as sensed in block A20, then an affirmative branch from that test will transfer control to a path headed by block A34. In block A34 the program will prompt the operator with a visual message on the display screen to insert a disk and a file name for the particular phrase storage disk 21 he inserts in data disk drive 28. After the disk has been inserted in the data disk drive 28, or if a data disk is already mounted as in the present example, then the operator presses a return key, as sensed by block A36, to continue the sequence of the program. The program will then display a warning message on the video monitor 29 that the disk will be erased if the program continues.
Another prompt to the operator is given in block A40 requesting him to affirmatively indicate whether the program should continue this operation.
The operator replies by pressing either the yes ("Y") or no ("N") key on keyboard 27 which is tested for in block A50. If the answer to the prompt is negative, then the program loops back to block A16 where the command menu is displayed on the video monitor 29 so the process can start again. If, however, the yes ("Y") key is pressed then the program will prepare the disk in the system format which is operationally represented in block A52. The disk preparation places a blank index table on the memory device so that as phrases are stored thereon, the table will be filled and the positions of the arbitrary length stored phrases will be easily locatable. Further, the phrase storage disk 21 is labeled with the file name given the memory segment in block A34. After the disk preparation is finished, the program tests for whether or not the return key has been pressed. As soon as the operator operates the return key, as sensed in block A54, then the program will transfer control back to block A16 where the program menu is again displayed.
Returning now to block A20, if the first two tests for the option number were negative, then a test in block A22 is performed to determine if the operator has chosen the record or collect function. If the test result here is negative, then the program defaults back to the menu display in block A16 where the choice of functions is continued.
However, if the result of the test in block A22 is affirmative, then the program will sequence to block A24 where a message is displayed on the video monitor 29 to insert a disk on which the collected phrases may be stored and then to press return. The program delays at block A26 until the program senses that the operator has pressed the return key after loading the disk. The program thereafter displays the disk identity on the video monitor 29, as represented by block A28, to indicate what is stored on the header portion or index table of the phrase storage disk. This table is the one initially stored on disk during option 1 when the disk was initialized in blocks A34-A54. Next the program will display the message, "Is this the correct disc?" on the video monitor 29 as illustrated in block A30. This operation allows an operator to insure that the disk on which he intends to store phrases is the correct one and that a mistake has not been made.
The operator in reply to the prompt answers with the yes ("Y") or no ("N") keys from the keyboard 27 which are decoded by software as represented by block A32. If the answer to the question is no, then the program returns to displaying the menu in block A16. If the operator has mounted the correct disk on which he wishes to store phrases, then the program continues to block A56 where the rest of the index table of the disk is printed out on the video monitor 29 including the disk status, the number of phrases, the remaining free space on the disk, and the phrase names of the phrases collected during the current recording session as they are collected. The program determines if the particular disk that is mounted in the data drive 28 is full or not in block A58. If the disk is full, then in block A60 the program will display a message indicating that status on the video monitor 29 for the operator. The program will then delay in block A62 until the operator presses the return key such that the program can continue to block A16 and the menu display.
If the disk is not full, then the operator may collect more phrases thereon and the program continues to block A64 where a message is displayed on the video monitor 29 indicating the system is "Ready to Collect".
The system collects a phrase by calling the subroutine COLLECT DATA in block A66 where a converted phrase is transferred to the personal computer 10 from the PASS system. A phrase is input to the system through the speech collector circuit 32 and is saved in an intermediate RAM buffer where it is decoded to determine which command was used for the collection process.
The command is generated by an operator on the PASS system during the conversion process and is used to command a number of functions from the recording or collection program. If the command is detected as a "space" in block A68, then the phrase or segment input will be stored in block A80 on the data disk preceded by a phrase number which is automatically incremented each time a new phrase is saved. If the command is decoded as four or fewer alphanumerics in block A70, then in block A82 the phrase is saved to the disk 21 using that code name as its phrase name in the table. If the command is an "S", then a subroutine SPEAK is called in block A84 to verbally output the stored phrase with speech synthesis chip 52. If the phrase is only spoken by this command, then it is discarded after the input of another data group by overlaying it in the RAM buffer. If the command is decoded as "S.", then the subroutine SPEAK is again called and the phrase is stored on the storage disk and labeled by the automatically incremented numbering scheme. If the command is "S." followed by four alphanumerics, then the subroutine SPEAK is called and the phrase is saved with the four alphanumerics for a label. Alternatively, whenever an escape key (esc) is pressed on keyboard 27 the program returns to the menu display in block A16.
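The command handling of blocks A68 through A84 amounts to a small dispatch table. The Python sketch below is one possible reading of it, not the collection program itself. The function name is hypothetical, a lone "S" is assumed to be tested as the speak command before the general name test, and a label after "S." is assumed to allow one to four alphanumerics.

```python
def parse_collect_command(command):
    """Return (speak, save, name) for a collection command string.

    name is None when the phrase is to be labeled by the automatically
    incremented phrase number, or when it is not saved at all.
    """
    def is_label(text):
        return 1 <= len(text) <= 4 and text.isascii() and text.isalnum()

    if command == " ":
        return (False, True, None)        # save, auto-numbered
    if command == "S":
        return (True, False, None)        # speak only, then discard
    if command.startswith("S."):
        label = command[2:]
        if label == "":
            return (True, True, None)     # speak, save auto-numbered
        if is_label(label):
            return (True, True, label)    # speak, save under label
        raise ValueError("invalid label after 'S.'")
    if is_label(command):
        return (False, True, command)     # save under the given name
    raise ValueError("unrecognized command")
```

Under these assumptions the command "S.HI01" speaks the phrase and saves it as HI01, while "AB12" saves it as AB12 without speaking it.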
After a phrase has either been stored or discarded by the collection routine, the program cycles back to the COLLECT DATA routine to wait for another phrase input from the PASS unit. In this manner the phrases may be stored on a phrase storage memory (disk) one by one in any amount. When one disk is full another one can be initialized and recorded on. A single standard floppy disk generally will hold about 90 phrases of approximately ten seconds apiece. Thus, this expandable phrase storage memory using the phrase storage disks 21 can be used to store an entire message session without having to go back and record missing data at a future time.
The collection program provides a facile means for storing the phrases to disks in that once an operator is set up and ready to collect, the system will collect the data faster than a PASS system can accomplish the conversions of the analog waveforms into LPC data. The operator thus is free to concentrate on the conversion and merely transmits the converted data and the command he wants to accomplish at the end of every conversion. The collection program waits in the storage loop for the PASS input, accepts it, and then labels such for subsequent retrieval.
FIGURE 8 illustrates a more detailed flow chart of the collection process as accomplished by the routine COLLECT DATA. The ACIA 66 is reset in the first block A80 and the program then displays the message "waiting" on the video monitor 29. During this time, the operator has initialized the PASS unit and has input an analog speech waveform of a predetermined interval and converted it into LPC code. He then appends a command to the converted data and operates a sequence of commands for the transmit function on the SDS-50 unit to send the converted data and command to the ACIA 66. When the first byte of data is received, the ACIA notifies the personal computer 10 that data transmission has started, and this operation permits the program to continue from block A84 where it had been waiting. The command is stripped from the data and a nonessential tag removed in block A86 before progressing. The incoming LPC code is then collected in block A90 and tested for errors in block A92. The data is then checked to determine if an end of transmission character has occurred. The data byte, if there were no errors, is saved in the intermediate buffer for transferral to the phrase storage disk or speaking. The process repeats until an entire phrase or data segment has been input to the intermediate buffer or an error occurs in the transmission of the data. If either one of these happens then the program will exit immediately to the calling program.
A more detailed description of the subroutine SPEAK will now be disclosed with reference to FIGURE 9. Initially for the speak commands the data to be converted back into speech is in the intermediate buffer and is of LPC form which can be directly converted by the TMS 5220 into speech. Therefore, in block A98 the SSC 52 is reset with the signal *RST to the *R input of the device. Next, eight bytes of data are transmitted through the PIA 50 to fill the buffer of the SSC 52. After the data has been sent, the personal computer 10 will monitor the status of the SSC 52 to determine if it is done talking in block A102 or whether it needs more data in block A104. Data is sent to the SSC 52 as the status of the device requests it until an end of message signal causes the program to exit.
Figure 10 is a pictorial representation of the linear predictive code format for the phrase data. The LPC format compresses the data during conversion into a number of consecutive frames. As seen in the figure there are anywhere from 4-52 bits forming each frame of data. A phrase is made up of one or more frames which characterize the verbalizations of the analog waveform in the digital LPC code. There are five basic formats for the frames including voiced sounds, unvoiced sounds, repeats, silence and stops. A voiced frame 290 is represented by a 4 bit energy parameter and a 6 bit pitch parameter in addition to ten filter coefficients K1-K10. Each of the filter coefficients may have a different number of values, and therefore bits, depending upon their importance and frequency. The number of bits for each K coefficient is from 3 to 5 for each of the ten filter coefficients.
Normally, the voiced frames 290 are at a fixed rate, but by appending two bits to the beginning of a frame a variable frame rate system can be provided. The next format for the frames is an unvoiced sound 292 which contains an energy parameter but has a pitch parameter of zero. Further, only the first four filter coefficients are needed to define the unvoiced parameters. As was the case with the voiced frames, the unvoiced frames may have a variable or fixed frame rate. The variable frame rate is provided by appending two extra bits to the beginning of the frame. In the voiced and unvoiced frames, the bit separating the energy parameter from the pitch parameter has been zero. However, for a repeat format 294 this bit becomes a 1 and the different frames may change in energy and pitch but the K parameters for the repeat frame stay the same. As was the case previously, a variable frame rate is provided for this format by appending two extra bits to the beginning of the frame. The silence format 296 sets the four bits of the energy parameter equal to zero but can include two initial bits for determining whether a variable frame rate is desired. The fifth and final format 298 for the frames is a stop format where the energy parameter is set equal to all ones. The stop format 298 can be at either a fixed or variable frame rate depending upon whether an initial two bits are appended.
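The five frame formats can be distinguished mechanically from the leading bits of each frame. The Python sketch below illustrates this for the fixed frame rate case only. It is an illustration under several assumptions that are not taken from the text: the per-coefficient widths (5, 5, 4, 4, 4, 4, 4, 3, 3, 3) are the allocation commonly documented for the TMS 5220 and agree with the stated 3 to 5 bit range and the 50-bit maximum, the bits are read most significant bit first, and the function name and dictionary layout are hypothetical.

```python
K_BITS = (5, 5, 4, 4, 4, 4, 4, 3, 3, 3)  # assumed widths for K1..K10


def parse_frames(bits):
    """Split a string of '0'/'1' characters into LPC frames.

    Assumes a fixed frame rate (no two-bit prefix) and MSB-first fields.
    Parsing ends at the first stop frame.
    """
    pos = 0

    def take(width):
        nonlocal pos
        if pos + width > len(bits):
            raise ValueError("bit stream ended inside a frame")
        value = int(bits[pos:pos + width], 2)
        pos += width
        return value

    frames = []
    while True:
        energy = take(4)
        if energy == 0:                       # silence format 296
            frames.append({"type": "silence", "energy": 0})
            continue
        if energy == 15:                      # stop format 298
            frames.append({"type": "stop", "energy": 15})
            return frames
        repeat = take(1)
        pitch = take(6)
        if repeat:                            # repeat format 294
            frames.append({"type": "repeat", "energy": energy,
                           "pitch": pitch})
            continue
        count = 4 if pitch == 0 else 10       # unvoiced uses K1-K4 only
        k = [take(width) for width in K_BITS[:count]]
        kind = "unvoiced" if pitch == 0 else "voiced"
        frames.append({"type": kind, "energy": energy,
                       "pitch": pitch, "k": k})
```

With these widths a voiced frame occupies 4 + 1 + 6 + 39 = 50 bits and an unvoiced frame 4 + 1 + 6 + 18 = 29 bits. The two optional variable frame rate bits would bring the longest frame to the 52 bits mentioned above.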
Figure 11 is a functional block diagram of the operational characteristics of the editing program. This block diagram should be viewed in conjunction with the hardware illustrated in Figure 1 to more fully understand the operation of the system under control of this routine. Initially, the editing routine is loaded into the Apple IIe by mounting an editing program disk in the operating disk drive 26. The program disk has the editing routine and an Apple disk operating system, such as APPLE DOS 3.3, stored thereon. A phrase storage disk 300, previously loaded with phrases stored by the collection routine, is mounted in the data disk drive 28 and serves as a source for unedited phrases and a destination for edited phrases. The DOS of the Apple IIe is used to load the editing routine from the program disk into the RAM of the microprocessor 30 of the personal computer 10 and thereafter to transfer control of the PC to that software which runs as an application program.
During operation of the editing routine, the program can download a phrase from the disk 300 and store the phrase in a RAM buffer area defined as codepack data storage 308 as illustrated by functional block 306. In the following context codepack data is defined as a data segment in all LPC or binary which is quite hard to decipher by an operator. Frame data is defined as that binary LPC code converted to a display code, such as ASCII, which can be humanly interpreted. From the codepack buffer 308 the phrase is automatically unpacked from LPC code into humanly intelligible parameters for viewing on the video monitor 29 as illustrated in functional block 312. The unpacked FRAME data is stored in another RAM buffer area defined as frame data storage 316.
In addition to the automatic unpacking of the codepack data from disk 300 into storage buffer 308, the editing routine has a means for manually packing the data in the frame buffer 316 back into CODEPACK data as illustrated by functional block 314. This is a manual routine and can be used to command the packing of a phrase into codepack at any time it is stored in the frame data storage buffer 316. A manual operation of the unpacking function of block 308 can be called to replace FRAME data with the original CODEPACK data version. In addition to the functions just described, CODEPACK data in the storage buffer 308 can be spoken by an audio output function as represented by block 310 and/or saved back to the disk 300 as represented by functional block 304. Other functions allow the CODEPACK data to be stored in either a prefix buffer 309 or a suffix buffer 311. CODEPACK data in either of these buffers can be spoken by the audio output represented by functional block 310. A final option for operation on the CODEPACK data is that a phrase may be identified and deleted from the disk 300 as represented by function block 302.
The FRAME data stored in the storage buffer 316 may be printed out visually on the video monitor 29 or on a printer means 33 as represented by block 318. This function provides a hard copy duplication of the phrase being edited and is useful in training operators in the techniques of editing phrases. Further, the FRAME data in the frame storage buffer 316 may be temporarily converted to codepack by the function of block 322 so that it can be spoken by an audio output block 324. Additionally, the FRAME data in frame storage buffer 316 may be reviewed and edited as represented by functional block 320. The review and the editing of the FRAME data and the advantageous operational structure shown produce a means for modifying and correcting data with an ease unknown prior to the invention.
The structure previously shown in FIGURE 11 is used by the editing routine to perform numerous review and modification operations on the phrases stored on the phrase storage disk 300. A system flowchart of the editing routine will now be more fully described with reference to FIGURE 14. The program begins in block A90 by indicating to the monitor program that the video monitor 29 should be set to the high resolution mode. The video monitor 29 is then prepared for text in block A91 and the screen cleared by erasing previously stored material and moving the cursor to the home position in block A92. Thereafter, the editing routine sets the system operating monitor for a restart or a reset by providing its own starting address as the jump location for the start up routine in blocks A93, A94. Next, the message area at the bottom of the CRT monitor is cleared in block A95 and the program will then prompt the operator with the question "command?" in block A96. This is a prompt for the operator to input one of a number of commands which will call the several separate subroutines of the editing program to perform the various operations called. The program waits in block A97 for a command to be given, which the operator does by entering a predetermined sequence of keys on keyboard 27. After the operation that is commanded has been performed in block A99, the program will loop back to the beginning of the program and be ready for another command.
The commands which may be given to the system during the editing routine are more fully illustrated in Figure 15. Initially, a command can be given to load a phrase from the phrase storage disk 300 as illustrated in Block A100. During this operation the editing program will call the necessary subroutines to download the index table listing the contents of the phrase storage disk 300. The table contents are displayed in their entirety, each entry listing a two digit number indicating the number of the phrase on the disk, separated by a hyphen from a four digit alphanumeric designator indicating the name of the phrase. The program will display a visual table on the video monitor 29 in this format and prompt the operator with a question which asks "which number?". After the operator enters the number of the phrase that is to be loaded from the disk 300, the program automatically transfers that phrase to the codepack data storage area 308 and automatically converts it from codepack to frame data. The frame data is then loaded into the frame data storage area 316 prior to the program returning. After the phrase has been loaded into the two storage areas, the program blanks the video monitor 29 and returns with the prompt for another command.
The second in the list of commands, in block A102, is to provide the operator a function for listing the frame data. During this operation the program calls those subroutines necessary to list the frame data stored in the frame data storage buffer 316 onto the video monitor 29 so that the operator can see in an intelligible format what the value of each parameter of the frames is. A pictorial representation of the listed display is illustrated in FIGURE 13. The phrase is listed on the video screen as a number of lines from 1-19, each of which contains the 12 parameters of one frame. The first column of a line corresponds to the energy parameter of an LPC coded frame and the second column corresponds to the pitch parameter. The other ten columns are respectively the filter coefficients K1-K10 in numerical order for the specific frame.
In this manner, the operator has a numerical display in a frame array format representing the phrase in column format so that he may look at a specific parameter and/or line to vary or change the data. This provides a facile method of displaying a phrase in information which is easily recognizable to an operator and lends itself to an understandable modification of the data. Further, to assist in the modification of the phrase down to the parameter level the routine also makes use of the cursor of the video monitor 29 to point to a particular parameter or field in the data.
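The column layout of the listing (energy, pitch, then K1-K10) can be illustrated with a small formatting helper. This Python sketch is hypothetical: the function name, the column widths, and the choice to show zero for coefficients that a frame does not carry are all assumptions and are not taken from FIGURE 13.

```python
def format_frame_line(line_number, energy, pitch, k):
    """Format one frame as a listing line with 12 parameter columns."""
    if len(k) > 10:
        raise ValueError("a frame has at most ten K coefficients")
    # Pad missing coefficients with zero (an assumption for display only).
    values = [energy, pitch] + list(k) + [0] * (10 - len(k))
    columns = " ".join(f"{value:2d}" for value in values)
    return f"{line_number:2d}: {columns}"
```

An unvoiced frame with four coefficients would then occupy the first six columns of its line, with the remaining K columns shown as zero.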
The next command illustrated in block A104 is provided to allow the operator to speak the phrase which is stored in the codepack data storage buffer 308. This is done directly by outputting the LPC code in the buffer 308 to the speech synthesis chip 52 of the speech collector 32 as has previously been described. After the operation of speaking the phrase has been completed, the program will jump back to the beginning of the main editing routine and prompt the operator for another command.
The next command, illustrated in block A106, allows the operator to speak the phrase which is stored in the frame data storage buffer 316. This is done by converting the frame data in buffer 316 into codepack 322 and outputting the converted LPC code to the speech synthesis chip 52 of the speech collector, as illustrated in block 324. The speak command A104 and the repeat command A106 allow the operator to compare the unedited speech in the buffer 308 with the edited speech in the buffer 316. When the operation is complete the program cycles back to the beginning of the editing routine, where the operator is again prompted by the command question.
Another command related to the speak and repeat commands is the speak slow command in block A108, to which the operator adds a numeric designator between 2 and 9. The numeric designator indicates the speed with which the phrase is spoken by the speech synthesizer chip 52 and allows the operator to determine where an error in a phrase may be by varying the speed with which it is spoken. In this manner the operator can easily narrow down the approximate location of an error, or of something that he desires to change, before a slower and more detailed analysis of the phrase is attempted. With the various speak commands, many phrases may be reviewed without having to go to a detailed analysis of each frame, as will be more fully described hereinafter. If they pass review under the initial listening tests as defined by the speak, repeat and speak slow commands, then no further editing and review are necessary.
When the phrase that is stored in the buffers 308 and 316 is determined to contain an error or something in need of correction, a command to enter an edit mode can be given, as represented in block A110. This mode will be more fully discussed hereinafter and generally provides a second group of options with various control commands for producing considerably more detailed editing and review functions.
The next three commands, shown in blocks A112, A114 and A116, are concerned with a graphical display of the phrase in the data buffer 316. After giving the command in block A112, which turns the graph mode on, the operator may generate a second command to list the frame data, as illustrated in block A102, to display the energy and pitch parameters for the frames graphically as a bar graph of amplitude, such as that illustrated in Figure 12. The energy and pitch parameters are shown only as an example of this function, while in actual operation the program will provide graphical illustrations of all twelve of the LPC parameters. The graphical display of these two parameters is an excellent visual indication of where errors in the LPC code have been produced during encoding. Particularly, for the energy parameter the amplitude in a phrase will not rapidly vary from frame to frame and is relatively constant throughout a series of voiced sounds until the phrase goes to silence.
Therefore, one of the errors seen immediately from the graphical mode during sequential voiced frames is a missing energy parameter value, or one that is discontinuous with the general average of the surrounding voiced frames, such as at 350. This parameter may then be replaced to cure the error in the phrase so that a smooth transition across the energy or pitch parameters can be maintained. Normally, as with the energy parameter, the pitch for a particular voice does not change discontinuously between sequential frames, and therefore the graphical mode shows a distortion of this sort extremely well, such as at 352. The operator may toggle between the frame data in a listed display on the video monitor and the same data shown in the graphical mode by giving the toggle text/graph command in block A114. To remove the system from the graphical mode capability, the operator generates the command indicative of the graph mode off operation, as indicated in block A116. Those discontinuities in the graphs of the energy and pitch found during the graphical mode may then be repaired in the edit mode, which is entered via block A110 by generating the correct command.
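The kind of discontinuity an operator looks for in the bar graph can also be illustrated with a short Python sketch. The specification describes only a visual inspection; the automatic check below, the function names `find_discontinuities` and `bar_graph`, and the threshold value are all assumptions added for illustration.

```python
from typing import List


def find_discontinuities(values: List[int], threshold: int = 4) -> List[int]:
    """Return indices whose value departs from the mean of both neighbours by more than `threshold`.

    Frames next to a zero-valued neighbour are skipped, since a transition
    to or from silence is expected to be abrupt.
    """
    flagged = []
    for i in range(1, len(values) - 1):
        before, after = values[i - 1], values[i + 1]
        if before == 0 or after == 0:
            continue  # adjacent to silence: a jump here is not treated as an error
        neighbour_mean = (before + after) / 2
        if abs(values[i] - neighbour_mean) > threshold:
            flagged.append(i)
    return flagged


def bar_graph(values: List[int]) -> List[str]:
    """Render one text bar per frame, a rough stand-in for the bar graph display."""
    return [f"{i + 1:3d} " + "#" * v for i, v in enumerate(values)]
```

For an energy track such as `[0, 10, 11, 10, 0, 11, 10, 9, 0]`, the missing value in the middle of the voiced run (index 4) is the one flagged, which corresponds to the sort of dropout the text describes at 350.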
The next command illustrated in block A118 is to determine how much free space is left on the phrase storage disk 300. Upon receiving this command the program will read the disk index table and also the end of file record to determine how many bytes are open for the storage of more phrases. After the routine has calculated this number and displayed it on the screen, the operator is asked to press any key to return the program to the main routine where another command can be accepted.
The operator may also delete a phrase from the disk by keying in the command illustrated by block A120. When the machine is performing this function, the operation of the system is similar to the load a phrase from disk command discussed previously. The table of contents for the phrase storage disk 300 is displayed on the video monitor screen. Thereafter, the program will prompt the operator with the question "which number?". The phrases that are stored on the phrase storage disk 300 are listed by a two digit number from 1 to 90, and the program will attempt to match the number input by the operator with one of these stored phrases. If it finds a match, the program will delete that number from the table and also erase the phrase from the phrase storage disk. When the phrase is erased from the phrase storage disk 300, the number of bytes of free disk space is increased, which allows more data to be stored on the disk. After the phrase has been deleted, the program will return to the main routine where the operator will again be prompted to input another command.
Another command that can be given is to call the subroutines performing a hard copy printout of the phrase, as illustrated by block A122. These routines will take the data stored in the frame data storage buffer 316 and, similar to the list frame data command, will output this data in humanly intelligible form on a printer or other hard copy device. After the operation is completed, the program will cycle back to where the prompt for an input command is generated.
The next two commands unpack and pack represented by blocks A124, A126 respectively, are used to transfer data between the two buffers 308 and 316 while a conversion takes place during the transfer. The unpack command transfers the data from the codepack data storage buffer 308 to the frame data storage buffer 316 while converting the data from codepack to frame data. The packing command does the opposite and transfers data from the frame data storage buffer 316 to the codepack data storage buffer 308 while converting frame data to codepack data. The unpack command is useful when during the editing of frame data mistakes have been made in the editing and it is too difficult to change the frame data back to where the editor started. In this instance he uses the manual unpacking command to merely overlay the frame data with new frame data equivalent to the codepack data stored in buffer 308. The pack command is generally used as the last command before the codepack data is saved to the phrase storage disk 300.
Generally, the frame data in buffer 316 has been fully edited and it has been determined that this phrase should be restored to the disk. At this point the original codepack data is overlaid with the new frame data because of the pack command in block A126. This data, now in the codepack data storage buffer 308, can be saved back to the edited phrase storage disk with the command generated by block A130. In a similar manner, the commands illustrated as functional blocks A133, A135 can be used to transfer the data in the codepack data storage buffer 308 into the prefix buffer 309 and the suffix buffer 311, respectively. The prefix buffer and suffix buffer are smaller than the codepack buffer and any overlapping data is truncated by the transfer. These commands are particularly useful in assembling sentences or substantially longer phrases which must fit two or more shorter speech segments together.
In addition to the printout of a single phrase on hard copy, all the phrases of a particular phrase storage disk 300 can be made available in this form by giving the command illustrated in block A128. Since this could be up to 512 lines of 12 columns for each of 90 phrases, the hard copy command for all phrases is generally limited in use. However, it does provide an archival method for storing in hard copy an entire phrase storage disk 300 in humanly intelligible form.
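The relationship between the two working buffers under the pack, unpack, and prefix transfer commands described above can be sketched as follows. The codepack bit layout is not reproduced in this section, so `to_frames` and `to_codepack` are stand-in codecs that simply use one byte per parameter; the class name `EditBuffers`, its attributes, and the prefix capacity are likewise hypothetical.

```python
from typing import List

PARAMS_PER_FRAME = 12  # energy, pitch, K1-K10


def to_frames(codepack: bytes) -> List[List[int]]:
    """Stand-in decoder: one byte per parameter (not the real codepack format)."""
    usable = len(codepack) - len(codepack) % PARAMS_PER_FRAME
    return [list(codepack[i:i + PARAMS_PER_FRAME])
            for i in range(0, usable, PARAMS_PER_FRAME)]


def to_codepack(frames: List[List[int]]) -> bytes:
    """Stand-in encoder matching `to_frames`."""
    return bytes(value for frame in frames for value in frame)


class EditBuffers:
    def __init__(self, codepack: bytes, prefix_size: int = 24):
        self.codepack = codepack            # plays the role of buffer 308
        self.frames = to_frames(codepack)   # plays the role of buffer 316
        self.prefix = b""                   # plays the role of prefix buffer 309
        self.prefix_size = prefix_size      # assumed capacity in bytes

    def unpack(self) -> None:
        """Discard edits: overlay the frame data with the decoded codepack data."""
        self.frames = to_frames(self.codepack)

    def pack(self) -> None:
        """Commit edits: overlay the codepack data with the encoded frame data."""
        self.codepack = to_codepack(self.frames)

    def copy_to_prefix(self) -> None:
        """Copy codepack data into the smaller prefix buffer, truncating any excess."""
        self.prefix = self.codepack[:self.prefix_size]
```

In this sketch an operator who has made a mess of the frame data calls `unpack()` to start over from the unedited codepack data, and calls `pack()` only once the edited frames are ready to be saved.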
The last two commands that may be given to the system when the editing routine is commanding the personal computer are the quit and reset commands. The quit command has the effect of transferring control back to the Apple operating monitor, where other application programs can be input and run. On the other hand, the reset command, as illustrated in block A134, results in a restart of the edit program by transferring control to the initial operating address of the editing program.
The specific and detailed editing functions of the edit mode will now be more fully explained with respect to Figure 16. The edit mode, as represented by block A110, provides a number of operations for the detailed modification and change of the phrase in the frame data storage buffer 316. The data that is changed is the frame data which is stored therein, and any individual parameter of any frame of the phrase can be modified. Basically, the edit mode is entered by the generation of the edit command, which produces a visual representation of the phrase data similar to the list command in the main routine. This produces the column format where the phrase is illustrated as lines, each having one frame of data with between 2 and 12 columns or parameters. A present location cursor is provided to point from parameter to parameter along any of the lines, and thus any parameter, including the energy, pitch, or filter coefficients of any frame, can be isolated and then modified.
Initially, the options of the edit mode include a command to speak the frame presently pointed to by the cursor, as illustrated in block A148. This allows the operator to review the phrase in more detail by listening to each frame of a phrase as it is spoken to determine if errors can be found or enhancements or modifications are needed. Another command of similar operation is illustrated in block A150, where the system can speak the present frame and automatically advance the cursor to the next line.
This allows an operator to rapidly speak each frame in order and go through an entire phrase very quickly looking for errors. Further modifications on these commands are the command illustrated in block A158 which allows the speed at which the frame data is spoken to be varied upon the appending of a numeric indicator from 2 to 9. Another editing command which is a variation of the speak commands is the command illustrated in block A146 which allows the operator to speak the codepack data from buffer 308. With this command and the command illustrated in block A136, a comparison can be made between the two to determine if a mistake that was found in the frame data has been corrected.
A group of commands in the edit mode, in blocks A136-A142, allows an operator to more readily determine if an error is present in a single line or frame. In block A136 a command allows an entire phrase to be repeated by the editor. This command is similar in function to the repeat command of the main command group. Further, the frames around a particular frame that the cursor is presently pointing to can be spoken by giving the command in block A138 to speak a window. The window is ten frames long and consists of the three previous frames, the present frame, and the six subsequent frames. Block A142 provides a command to allow the window described above to be spoken at different rates depending upon a number from 2-9 input by the editor. Initially, the speak window and speak window slow commands can find errors very rapidly. For review of several frames of data, the command in block A140 allows a window to be spoken and the cursor to automatically advance to the next window.
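The ten-frame window described above can be expressed as a pair of slice bounds around the cursor. The following Python sketch uses hypothetical function names, clips the window at the ends of the phrase, and assumes that "advance to the next window" means moving the cursor forward by one full window length; the text does not state these details.

```python
from typing import Tuple


def window_bounds(cursor: int, frame_count: int,
                  before: int = 3, after: int = 6) -> Tuple[int, int]:
    """Return (start, stop) slice bounds for the window around a 0-based cursor.

    The window covers `before` previous frames, the present frame, and
    `after` subsequent frames, clipped to the phrase.
    """
    start = max(0, cursor - before)
    stop = min(frame_count, cursor + after + 1)
    return start, stop


def advance_window(cursor: int, frame_count: int,
                   before: int = 3, after: int = 6) -> int:
    """Move the cursor so the next window begins where the current one ends (assumed behaviour)."""
    return min(frame_count - 1, cursor + before + after + 1)
```

With the cursor on frame index 10 of a 50-frame phrase, `window_bounds(10, 50)` selects indices 7 through 16, which is ten frames, and `advance_window(10, 50)` moves the cursor to index 20 so that the following window starts at index 17.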
The command represented by block A144 allows the cursor to be decremented to the previous line and the command represented in block A152 provides the operator with a jump command to place the cursor at any line desired. Thus, after initially reviewing a phrase with the speaking and error finding commands and making notes on the errors which are present, the cursor movement commands in blocks A144 and A152 can be used to position the cursor for editing those particular parameters in the phrase found to be incorrect or in need of enhancement.
To provide editing capability, the edit mode contains several modification commands, including the zero frame and advance command illustrated in block A156. This command sets the parameters of a line or frame in the phrase to zero and then advances the cursor to the next line. Additionally, another modification command is found in block A154, where the operator can insert a silent transition between two frames. The silent transition is provided by changing the filter coefficient parameters to a set group used particularly for this purpose. These parameters are more fully described in the operating manual for the SDS-50 system.
Further in the editing commands are a number of commands that act entirely on one frame. These commands are represented by blocks A160, A162, A164, A166 and A168. The command in block A160 allows a frame to be deleted from the phrase entirely. This is different from the zero frame command given in block A156 because there is still a place holder frame in that modification while the present operation completely removes the time difference between the end of the previous frame and the start of the subsequent frame. The next full frame edit command is found in block A162 and allows the editor to generate a copy of the present frame. This copy is inserted between the previous frame and the frame the cursor is presently pointing to. Similarly in blocks A164 and A166 commands are provided to copy either the previous frame or the subsequent frame respectively. The copied frame is copied to (replaces) the frame the cursor is presently pointing to. The last command in this group is merely to accept the frame and advance the cursor which is illustrated by block A168.
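The difference between zeroing a frame, deleting it, and copying frames can be made concrete with list operations. In this Python sketch a phrase is a list of frames and each frame is a list of twelve integers; the function names and the cursor values returned after each operation are assumptions, since the text does not say where the cursor lands after a delete.

```python
from typing import List

Frame = List[int]  # twelve parameters: energy, pitch, K1-K10


def zero_frame(frames: List[Frame], cursor: int) -> int:
    """Zero every parameter of the current frame (a place holder remains) and advance."""
    frames[cursor] = [0] * len(frames[cursor])
    return min(cursor + 1, len(frames) - 1)


def delete_frame(frames: List[Frame], cursor: int) -> int:
    """Remove the current frame entirely, shortening the phrase by one frame time."""
    del frames[cursor]
    return max(0, min(cursor, len(frames) - 1))  # assumed cursor placement


def duplicate_frame(frames: List[Frame], cursor: int) -> None:
    """Insert a copy of the current frame between the previous frame and the current one."""
    frames.insert(cursor, list(frames[cursor]))


def copy_previous(frames: List[Frame], cursor: int) -> None:
    """Replace the current frame with a copy of the previous frame, if there is one."""
    if cursor > 0:
        frames[cursor] = list(frames[cursor - 1])


def copy_next(frames: List[Frame], cursor: int) -> None:
    """Replace the current frame with a copy of the subsequent frame, if there is one."""
    if cursor < len(frames) - 1:
        frames[cursor] = list(frames[cursor + 1])
```

Note that `zero_frame` leaves the phrase length unchanged while `delete_frame` shortens it, which is the distinction drawn above between blocks A156 and A160.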
Another group of commands for the editing mode allows the actual parameters of a single frame to be edited and changed. These include commands found in blocks A170, A172, A174, A176 and A178. Initially, when the cursor is moved or advanced, it points to the energy parameter, that is, the first field or column in a particular line. The command represented by block A170 allows the cursor to move field by field along the line so as to point to individual parameters. The command in block A172 provides a left tab lock for a particular column, such that when the line is changed the cursor advances to the column where the tab is locked instead of pointing to the energy parameter. Conversely, in block A174 there is a command allowing the editor to unlock that left tab and have the cursor return to the first field position of each line.
When the cursor is pointing to a particular parameter in a line or frame, the commands found in blocks A176 and A178 allow the parameter to be either decremented or incremented, respectively.
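The cursor behaviour described above, including the left tab lock and the increment and decrement commands, can be sketched as a small Python class. The class name `Cursor`, the wrap-around in `next_field`, and the clamping of parameters at zero are assumptions for illustration; the valid range of each parameter is not given in this section.

```python
from typing import List, Optional


class Cursor:
    def __init__(self, frames: List[List[int]]):
        self.frames = frames
        self.line = 0
        self.column = 0
        self.tab_lock: Optional[int] = None  # column held by the left tab lock, if any

    def next_field(self) -> None:
        """Move field by field along the line (wrapping at the end is an assumption)."""
        self.column = (self.column + 1) % len(self.frames[self.line])

    def lock_tab(self) -> None:
        """Lock the left tab at the current column."""
        self.tab_lock = self.column

    def unlock_tab(self) -> None:
        """Release the lock so the cursor returns to the first field of each line."""
        self.tab_lock = None

    def move_to_line(self, line: int) -> None:
        """Change line; land on the locked column, or on the energy field if unlocked."""
        self.line = max(0, min(line, len(self.frames) - 1))
        self.column = self.tab_lock if self.tab_lock is not None else 0

    def adjust(self, delta: int) -> None:
        """Increment (+1) or decrement (-1) the parameter under the cursor, clamped at zero."""
        value = self.frames[self.line][self.column] + delta
        self.frames[self.line][self.column] = max(0, value)
```

Locking the tab on the pitch column, for example, lets an operator step down the phrase and adjust the pitch of successive frames without repositioning the cursor on each line.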
If information from the codepack buffer 308 has been loaded into either the prefix buffer 309 or the suffix buffer 311, a group of commands represented by blocks A182, A184, and A186 may be used to link that data with the data presently in the frame buffer and speak the combined result. The command represented in block A184 allows the editor to speak the frame buffer preceded by the speech data of the prefix buffer, while the command represented in block A186 allows the editor to speak the frame buffer followed by the speech data of the suffix buffer. The command represented in block A182 allows the editor to listen to the frame data preceded by the prefix buffer data and followed by the suffix buffer data. These commands greatly facilitate the linking of phrases into an integrated speech segment. The critical transitions between the smaller segments can be reviewed and changed if necessary by the system.
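The three linking commands amount to concatenating the working phrase with an optional prefix and an optional suffix before it is spoken. A minimal Python sketch, assuming for simplicity that all three buffers are held as sequences of frames (the function name `linked_phrase` is hypothetical):

```python
from typing import List, Optional

Frame = List[int]


def linked_phrase(frames: List[Frame],
                  prefix: Optional[List[Frame]] = None,
                  suffix: Optional[List[Frame]] = None) -> List[Frame]:
    """Return prefix + frames + suffix as a new list, leaving the inputs unchanged."""
    return list(prefix or []) + list(frames) + list(suffix or [])
```

Passing only `prefix` corresponds to the command of block A184, only `suffix` to block A186, and both to block A182.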
The last command in the editing mode is the quit command as illustrated in block A180. The quit command ends the edit mode and produces a transfer of control back to the command sequence of the editing routine where blocks A100-A134 may be selected. Thus there has been shown and illustrated in detail a powerful editing means for LPC frame data which permits the modification and enhancement of digital data which will be spoken.
An overall functional description of the soundtrack formation routine will now be more fully explained with reference to Figure 17. The soundtrack formation program basically takes those phrases which have been stored and edited on the phrase storage discs 326 and provides a means for an operator to choose which he will collect onto selected phrase storage discs 328. During the collection process from the phrase storage discs 326, the selected phrases may be arranged and thus put into the order in which they are to be played back on the soundtrack. Once the arrangement of the selected phrases is produced on one or more selected phrase storage discs 328, they are concatenated on a soundtrack storage disc 330 by linking the contents of a number of the selected phrase storage discs 328.
Because a soundtrack can be up to 605 phrases in length, concatenating them into a soundtrack requires an extended RAM buffer. The auxiliary DRAM of the speech processor 34 is used in the soundtrack formation process to assemble the necessary quantity of information required by a soundtrack. The soundtrack, once completed and stored in the speech processor 34, can be spoken to provide a final assurance of accuracy and then transferred to the soundtrack storage disk 330.
While the preferred embodiment of the invention has been illustrated, it will be obvious to those skilled in the art that various modifications and changes may be made thereto without departing from the spirit and scope of the invention as defined in the appended claims.
Priority Applications (2)
|Application Number||Priority Date||Filing Date||Title|
|Publication Number||Publication Date|
|EP0214274A1 (en)||1987-03-18|
|EP0214274A4 (en)||1987-07-30|
Family Applications (1)
|Application Number||Priority Date||Filing Date||Title|
|EP19860902100 Withdrawn EP0214274A4 (en)||1985-02-25||1986-02-24||Collection and editing system for speech data.|
Country Status (3)
|EP (1)||EP0214274A4 (en)|
|JP (1)||JPS62501938A (en)|
|WO (1)||WO1986005025A1 (en)|
Families Citing this family (3)
|Publication number||Priority date||Publication date||Assignee||Title|
|JP2880592B2 (en) *||1990-10-30||1999-04-12||インターナショナル・ビジネス・マシーンズ・コーポレイション||Editing apparatus and method of a composite audio information|
|US5600756A (en) *||1994-05-11||1997-02-04||Sony Corporation||Method of labelling takes in an audio editing system|
|US20010025289A1 (en) *||1998-09-25||2001-09-27||Jenkins Michael D.||Wireless pen input device|
Family Cites Families (5)
|Publication number||Priority date||Publication date||Assignee||Title|
|US4150429A (en) *||1974-09-23||1979-04-17||Atex, Incorporated||Text editing and display system having a multiplexer circuit interconnecting plural visual displays|
|US4193112A (en) *||1976-01-22||1980-03-11||Racal-Milgo, Inc.||Microcomputer data display communication system with a hardwire editing processor|
|GB2059203B (en) *||1979-09-18||1984-02-29||Victor Company Of Japan||Digital gain control|
|JPS5774799A (en) *||1980-10-28||1982-05-11||Sharp Kk||Word voice notifying system|
|US4398059A (en) *||1981-03-05||1983-08-09||Texas Instruments Incorporated||Speech producing system|
Patent Citations (1)
|Publication number||Priority date||Publication date||Assignee||Title|
|US4406626A (en) *||1979-07-31||1983-09-27||Anderson Weston A||Electronic teaching aid|
Non-Patent Citations (6)
|ELECTRONIC DESIGN, vol. 29, no. 17, 20th August 1981, pages 107-112, Waseca, MN, US; T. BRIGHTMAN: "Speech-synthesizer software generated from text or speech" *|
|ELECTRONICS INTERNATIONAL, vol. 55, no. 17, 25th August 1982, pages 68,70, New York, US; J. GOSCH: "Voice-synthesizer editor displays speech as curves easily alterable by keyboard" *|
|ELECTRONIQUE INDUSTRIELLE, no. 59, 15th October 1983, pages 65-68, Paris, FR; C. GROSS: "Développement d'un vocabulaire de synthèse de la parole avec Centigram" *|
|IBM TECHNICAL DISCLOSURE BULLETIN, vol. 12, no. 5, October 1969, pages 640-642, New York, US; R. BAKIS: "Improving the fundamental frequency contour in speech synthesis" *|
|ICASSP 80 PROCEEDINGS, IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH & SIGNAL PROCESSING SOCIETY, 9th-11th April 1980, Denver, Colorado, vol. 2, pages 402-405, IEEE, New York, US; D.E. MORRIS et al.: "A new speech synthesis chip set" *|
|See also references of WO8605025A1 *|
|AK||Designated contracting states||Kind code of ref document: A1; Designated state(s): AT BE CH DE FR GB IT LI LU NL SE|
|17P||Request for examination filed||Effective date: 19861013|
|A4||Supplementary search report||Effective date: 19870730|
|18D||Deemed to be withdrawn||Effective date: 19871015|
Inventor name: MILLER, RICKY, LEE
Inventor name: MORGAN, ROBERT, LEE
Inventor name: PFEIFFER, JAMES, EDWARD
Inventor name: RAYMOND, WILLIAM, JOSEPH