EP0214274A1 - System for the acquisition and preparation of speech data - Google Patents

System for the acquisition and preparation of speech data

Info

Publication number
EP0214274A1
Authority
EP
European Patent Office
Prior art keywords
frame
speech
data
set forth
phrase
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP86902100A
Other languages
German (de)
English (en)
Other versions
EP0214274A4 (fr)
Inventor
William Joseph Raymond
Robert Lee Morgan
Ricky Lee Miller
James Edward Pfeiffer
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JOSTENS LEARNING SYSTEMS Inc
Original Assignee
JOSTENS LEARNING SYSTEMS Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JOSTENS LEARNING SYSTEMS Inc filed Critical JOSTENS LEARNING SYSTEMS Inc
Publication of EP0214274A1 publication Critical patent/EP0214274A1/fr
Publication of EP0214274A4 publication Critical patent/EP0214274A4/fr

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 - Sound input; Sound output

Definitions

  • the invention pertains generally to interactive computer systems which transform digitally encoded speech data into understandable speech and, is more particularly directed to a system for the collection and editing of speech data into digitally encoded speech data segments so that they can be spoken by the interactive application programs of computer systems.
  • Speech processors which can operate in parallel with a programmable digital computer have come into widespread use in recent years.
  • An example is the speech synthesizer used as an accessory for the Texas Instruments TI-99/4A personal computer.
  • the speech synthesizer accepts a stream of data from the personal computer that defines the individual phonemes of a spoken message.
  • the stream of data must be presented to the synthesizer at a controlled rate of speed with a continuous involvement of the computer in the speech generation process.
  • a read only memory containing a plurality of variable length, digitally encoded speech data segments is connected to a speech synthesizer and a personal computer.
  • the computer can initiate the conversion of an entire single data segment into speech by passing the address of the segment to the read only memory and by then starting the synthesizer. In this manner, a single speech segment may be generated in response to a single request from the computer with no further intervention.
  • the Raymond et al. system generates an entire speech message, without further host computer intervention upon command.
  • the speech message is programmed to contain an arbitrarily arranged sequence of spoken phrases, each derived from separate and nonconsecutively stored, variable length, digitally encoded speech data segments or phrases.
  • linear predictive code is developed by a recursive filtering of the input waveform and generates a number of filter coefficients along with energy and pitch parameters which can be used to reconstruct the analog voice waveform.
  • the SDS-50 collection system is not flexible enough to easily combine with the power of the referenced Raymond et al. systems where arbitrary lengths of messages and phrases can be programmed.
  • a direct conversion system such as the PASS system does
  • a collection of phrases that have been edited for good tonal quality and inflection should be used to build more than one message.
  • a collection of phrases forms a library from which other messages can be built and collected while ensuring that the humanistic quality of the spoken words is not lost in the editing and concatenating.
  • a speech collection system which can store directly converted spoken phrases in a phrase storage memory with index means for determining the location of any stored phrase in that memory.
  • Another object of the invention is to provide a speech collection system having a means to edit a phrase stored in linear predictive code.
  • Still another object of the invention is to provide means for selecting among the stored information in the phrase storage memory means and for assembling sentences and entire instructional messages from those stored phrases to form a soundtrack.
  • the invention provides a system for collecting, editing, and concatenating directly converted phrases into a soundtrack.
  • the system comprises a collection means for storing the directly converted phrases in a phrase storage memory, editing means for permitting those stored phrases to be reviewed and edited, and soundtrack formation means for selecting and assembling edited phrases or unedited phrases into a soundtrack.
  • a personal computer is used to provide these means operating under the control of a collection program for the collection function, an editing and review program for the editing function, and a soundtrack formation program for the soundtrack generation function.
  • the personal computer system is supplemented for these operations by a system including a linear predictive code (LPC) generator for directly converting intervals of spoken words into LPC format digital data and speech collector circuitry for interfacing the personal computer to the LPC generator.
  • LPC linear predictive code
  • the system further includes a speech processor as is described in Raymond et al. '490 for providing auxiliary memory useful in concatenating phrases in the soundtrack formation operation.
  • FIGURE 1 is a series of operational diagrams describing the sequence of operations of a system constructed in accordance with the invention using a collection program, an editing program, and a soundtrack formation program.
  • FIGURE 2 is a system hardware block diagram of an apparatus for collecting, editing, and assembling directly converted phrases into a soundtrack which is constructed in accordance with the invention;
  • FIGURE 3 is a more detailed schematic block diagram of the expansion circuitry which supplements the basic architecture of the personal computer illustrated in FIGURE 2;
  • FIGURE 4 is a detailed electrical schematic diagram of the circuitry implementing the speech collector circuit 32 illustrated in FIGURE 3;
  • FIGURE 5 is a detailed schematic block diagram of the circuitry implementing the speech processor circuit 34 illustrated in FIGURE 3;
  • FIGURE 6 is a pictorial representation of the memory segmentation for the speech processor circuit 34 illustrated in FIGURE 3;
  • FIGURE 7 is a detailed flowchart of the collection program which controls the personal computer illustrated in FIGURE 2 for the collection and indexing functions;
  • FIGURE 8 is a detailed flowchart of the subroutine COLLECT DATA called from the collection program illustrated in FIGURE 7;
  • FIGURE 9 is a detailed flowchart of the subroutine SPEAK called from the collection program illustrated in FIGURE 7;
  • FIGURE 10 is a pictorial representation of the digital format for spoken phrases after they have been converted into linear predictive code;
  • FIGURE 11 is a pictorial functional block diagram of the operation of the editing program;
  • FIGURE 12 is a pictorial representation of the graphical display of the pitch and energy parameters of an LPC digital format segment as generated during the editing program;
  • FIGURE 13 is a pictorial representation of the listing display of the pitch, energy, and filter coefficient parameters of an LPC digital format segment as generated during the editing program;
  • FIGURE 14 is a system flow chart of the editing program which controls the personal computer illustrated in FIGURE 2 for the review and editing function;
  • FIGURE 15 is a pictorial representation of the operational commands available for the editing program illustrated in FIGURE 14;
  • FIGURE 16 is a pictorial representation of the operational commands available for the editing mode command illustrated in FIGURE 15;
  • FIGURE 17 is a functional block diagram of the soundtrack formation program which controls the personal computer illustrated in FIGURE 2 for the selection and assembling function.
  • the system uses a collection program 1, an editing and review program 3, and a soundtrack formation program 5 to produce soundtrack disks 9.
  • the soundtrack disks 9 in combination with an application program 11 in step D form an educational program 7 which outputs graphics 13, video 15, and speech 17 in an interactive manner with a student.
  • the information stored therein can be loaded into a speech processor memory to be played directly on cue from the educational program 7 or integrated into the educational program by actually programming sequences of the soundtrack disks with commands loaded in the command buffer of a speech processor of the type illustrated in Raymond et al. '490.
  • the input to the collection program 1 in step A is a plurality of segments of linear predictive code in digital format which have been converted from directly spoken phrases 19.
  • the collection program output from this input is one or more as yet unedited phrase storage disks 21 each storing a plurality of the collected but unedited phrases.
  • the collection program additionally provides means for indexing or labeling each phrase such that its location in the phrase storage memory is known.
  • in step B the unedited speech segments on disks 21 are reviewed and edited by the editing program and output as edited phrase storage disks 23.
  • the edited phrase storage disks 23 are physically the same disks as the unedited phrase storage disks 21 but may contain the stored information in modified form.
  • the soundtrack formation program 5 selects and arranges the phrases to form selected phrase disks 25. A plurality of selected phrase disks 25 are then concatenated or linked by the soundtrack formation program 5 to produce one or more soundtrack disks 9.
  • the memory storage for the collection program 1 is shown as unedited phrase storage disks 21, and for the soundtrack formation program 5 is shown as selected phrase disks 25, and soundtrack disks 9.
  • the disks can be standard 5.25 in. floppy disks used in personal computers and are advantageous in the system because of the present extra cost of providing expanded main memory space for small personal computers. As memory costs decrease, it is evident that the phrase storage memory, selected phrase memory, and soundtrack memory could also be embodied in main memory.
  • Figure 2 shows a basic hardware block diagram for a collection, editing, and assembling system for speech data constructed in accordance with the invention.
  • the system includes an interactive processor means or apparatus 10 which can be embodied as a personal computer.
  • the personal computer 10 is provided with a standard disk operating system and a dual disk drive including a program disk drive 26 and a data disk drive 28.
  • the processor means 10 includes a means for interfacing the microprocessor of the system with operator controlled devices such as a keyboard 27 and a video monitor 29.
  • the processor means 10 under program control collects the speech data segments which are output from a linear predictive code generator 16 and processes such data into the soundtrack disks 9 which can then be spoken by the processor means 10 as an accompaniment to an application program.
  • the LPC generator 16 receives its input from one of two sources.
  • the LPC generator 16 may receive spoken words directly transduced into an analog waveform from a microphone 18 when switch 24 is closed or from a prerecorded tape played on a commercial audio recorder
  • the LPC generator 16 receives intervals of speech from either of these sources and converts them into LPC digital code which can be spoken directly by many of today's speech synthesis chips. Although many LPC generators may be used to convert analog voice data into digital LPC format code, the LPC generator 16 is preferably a portable analysis synthesis system (SDS-50) manufactured by the Texas Instruments Corporation of Dallas, Texas. This system contains an analysis circuit which converts intervals of spoken words in the form of analog voltages to a digital format and extracts energy, pitch and ten filter parameters forming the LPC code for each spoken phrase or segment.
  • SDS-50 portable analysis synthesis system
  • a speaker 12 which can produce audio sounds is driven by the personal computer 10 to provide a means for listening to the speech data segments as they are collected.
  • the speaker 12 further provides a means for listening to the speech data segments or portions thereof during the review and editing process.
  • Another audio amplifier 14 which is driven by the personal computer 10 produces audio sounds from the soundtrack after it has been assembled.
  • Figure 3 shows an overall block diagram of the collecting and editing system for speech data including the microprocessor 30 of the personal computer 10 and further including a speech collector circuit 32 and a speech processor circuit 34. While many types of personal computers can be used in implementing the invention, the Apple IIe computer manufactured by Apple Computer, Inc. is preferred.
  • the peripheral connector 31 shown as the interface between the PC and a plurality of expansion slots generally includes 8 Winchester No. 2HW25C0-111 50 pin printed circuit card edge connectors, the wiring of which is described in the "Apple IIe Reference Manual" published by the manufacturer. Through these connectors eight accessories may be connected to the Apple IIe architecture.
  • the speech collector circuit 32 and the speech processor circuit 34 each fit into one of the expansion slots on the peripheral connector 31 of the Apple IIe which provides communications to the circuits via the control bus, data bus and address bus of the microprocessor 30.
  • the speech processor circuit 34 generates an audio drive signal to control amplifier 14 to produce audio sounds.
  • the speech collector circuit 32 further generates an audio drive signal to control speaker 12 to produce audio sounds.
  • the speech collector circuit 32 additionally inputs the digital LPC formatted segments for their collection and storage by the personal computer 10.
  • FIGURE 3 shows only the connections of the microprocessor 30 to the speech collector circuit 32 and speech processor circuit 34 leaving out the normal connections of the keyboard 27, disk drives 26, 28, and video monitor 29 of the personal computer 10 for the purpose of clarity. Additionally, the block diagram of the complete system would as implemented show an EPROM with a system program for the microprocessor 30 and a random access memory RAM on the PC motherboard or as an accessory plugged into one of the expansion slots of the peripheral connector 31. These connections and system architecture are more fully illustrated and described in the above-referenced "Apple IIe Reference Manual" the disclosure of which is incorporated herein by reference.
  • the function of the speech collector circuit 32 is to allow digital LPC data to be input for storage in the memory of microprocessor 30 and further to allow the LPC data to be spoken by means of the speaker 12 during the collection and editing processes.
  • the speech collector circuit 32 therefore acts as an interface between the microprocessor 30 and the LPC generator 16 and speaker 12.
  • the speech processor circuit 34 which is more fully described in the above-referenced Raymond et al. '490 application, provides a memory means for storing speech data as it is assembled into a soundtrack from a plurality of the selected phrase disks 25 and a means to speak long, concatenated sentences in a message form comprising an arbitrary number of phrases. Further, as more fully set forth in Raymond et al. '490, the speech processor circuit 34 is used for programming of the soundtrack associated with certain application programs on the personal computer 10. The programming of the speech processor 34 controls the device to transform the digitally encoded speech data segments of a soundtrack storage disk into understandable speech in conjunction with the running of the interactive application program.
  • FIGURE 4 is a detailed electrical schematic of the speech collector circuit 32 shown connected to the transmit (XMT) and receive (RCV) terminals of the LPC generator 16. These designations generally describe the serial data inputs and outputs of a SDS-50 apparatus. Further, the connections to the specific control bus lines of the Apple IIe through the peripheral connector 31 are shown.
  • the control bus lines used for the interface are the read/write line (R/W), the device select line (DEVSEL), the reset line (*RST), and the phase zero clock line (Φ0).
  • the data bus lines of the Apple IIe connected to the speech collector circuitry are data lines D0-D7.
  • the address bus provides the three address lines A0-A2 for selecting the various devices and operations of the speech collector circuitry 32.
  • the speech collector circuitry 32 comprises a peripheral interface adaptor PIA 50, a speech synthesis chip 52, a buffer 54, and an asynchronous communication interface adaptor ACIA chip 66 along with their associated circuitry.
  • the buffer 54 provides an interface between the data bus lines D0-D7 of the Apple IIe and an eight bit internal data path 55 having data lines D0-D7.
  • the buffer 54 is bidirectional and provides two bidirectional data input ports DA0-DA7 and DB0-DB7. The direction of the data flow is determined by the logic level on the R/W control line which is applied to the *W input of the buffer 54.
  • the buffer is enabled for data transfer by a low logic level on the DEVSEL control line applied to its *E input. Thus, data bytes can be passed from the Apple IIe data bus to the data bus 55 or vice versa.
  • the speech collection circuitry uses the ACIA 66.
  • Data input to the ACIA 66 or data received from the device is passed over data path 55 to or from its data terminals D0-D7.
  • the direction of the data flow is determined by the logic level of the R/W control line applied to the R/W input of the ACIA 66.
  • the device is adapted to operate in a transmission mode and a receiving mode.
  • the mode of the device is selected by the logical bit combination applied to the A0 and A2 control lines of the personal computer 10 which are connected to the RS and CSO inputs respectively of the ACIA 66.
  • the different bit combinations on the control lines allow for transmission and reception of data bytes between the LPC generator 16 and personal computer 10.
  • Enablement of the ACIA 66 is provided by a low logic level on the DEVSEL control line which is applied to the *CS1 input of the device while a phase zero clock logic level is applied to the enable input E of the device.
  • data transmitted by the ACIA 66 is a serial string of 8 bits passed from the TD output of the ACIA 66 to a bus driver circuit.
  • the bus driver circuit comprises a PNP transistor 84 with its emitter connected to a source of positive voltage +V and its collector connected to the transmit terminal XMT of the LPC generator 16 and further to a source of negative voltage -V through resistor 86.
  • the base of the transistor 84 is connected to the TD output of the ACIA 66 via resistor 80 and bias is provided by a resistor 82 connected between the base and emitter of the transistor 84.
  • the receive terminal RCV is connected to a receiver amplifier circuit comprising NPN transistor 74 whose emitter is connected to ground and whose collector is connected to the receive terminal RD of the ACIA 66 and further to a source of positive voltage +V through resistor 72.
  • a base resistor 78 provides a connection between the base of the transistor 74 and the receive terminal RCV of the LPC generator 16.
  • a diode 76 protecting the receive transistor 74 is connected across the base-emitter junction of the device.
  • the transistor 74, when turned on, lowers the voltage at the RD input of the ACIA device nearly to ground.
  • when the transistor 74 turns off, the RD terminal returns to nearly +V.
  • the data rate of a serial transmission and reception at the ACIA 66 is determined by the frequency of a clock signal applied to its CLK input.
  • the clock signal is generated from an oscillator 70 and a counter 68 which divides down the oscillator signal into a lower frequency compatible with the LPC generator communication rate.
  • the ACIA 66 asynchronously receives a byte of data from the LPC generator 16 and stores it in a register and signals the Apple IIe that a byte of data has been received.
  • the Apple IIe thereafter can read the stored byte via the datapath 55 and databus D0-D7 through buffer 54.
  • the Apple IIe commands the buffer 54 to reverse the data direction and commands the ACIA to store a data byte coming in from data path 55 into a transmission register.
  • the personal computer 10 thereafter commands the ACIA 66 to transmit the data in serial fashion to the XMT terminal of the LPC generator 16.
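The polled byte transfer between the personal computer and the ACIA 66 can be summarized in a short behavioral sketch. The status-bit names below (RDRF for "receive data register full", TDRE for "transmit data register empty") follow common ACIA conventions and are assumptions made for illustration, not details taken from the patent.

```python
# Minimal, illustrative model of the polled transfer through the ACIA 66.
RDRF = 0x01  # a received byte is waiting to be read
TDRE = 0x02  # the transmit register can accept another byte


class AciaModel:
    def __init__(self, incoming=()):
        self.rx = list(incoming)   # bytes "arriving" serially from the LPC generator
        self.tx = []               # bytes queued for serial transmission to it

    def status(self):
        s = TDRE                   # this model can always accept a byte to send
        if self.rx:
            s |= RDRF
        return s

    def read_data(self):           # the personal computer reads the receive register
        return self.rx.pop(0)

    def write_data(self, b):       # the personal computer writes the transmit register
        self.tx.append(b)


def read_bytes(acia, count):
    """Poll the status register and read `count` bytes, as the collection
    program does while the SDS-50 streams a converted phrase."""
    out = bytearray()
    while len(out) < count:
        if acia.status() & RDRF:
            out.append(acia.read_data())
    return bytes(out)


# usage sketch
acia = AciaModel(incoming=[0x10, 0x2A, 0x3F])
print(read_bytes(acia, 3).hex())
```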
  • the other function of speaking the data is provided by the PIA 50 and the speech synthesis chip 52.
  • the speech synthesis chip preferably is a Texas Instruments TMS 5220 speech synthesizer which has data inputs D0-D7 for receiving a byte of data.
  • the speech synthesizer chip includes a number of interface command inputs *WS, *RS, *INT and *RDY connected to the port B terminals B2, B4, B6, and B7 of the PIA 50.
  • a timing circuit 58 is further connected between a source of negative voltage -V and the -5 input and the OSC input of the chip 52.
  • the interface thus consists of an eight-bit bidirectional data bus (D0-D7), separate select inputs for a read (*RS) or a write (*WS) operation, a ready line (*RDY) for synchronization, and an interrupt line (*INT) to indicate a status change requiring the attention of the personal computer 10.
  • the audio output AUDIO of the speech synthesis chip 52 is filtered and attenuated by a circuit 64 before being applied to the inputs of operational amplifier 60.
  • after being amplified, the audio signal is capacitively coupled to the speaker 12.
  • the data terminals D0-D7 of the speech synthesis chip (SSC) 52 receive data and provide status information to the port A inputs A0-A7 of the PIA 50.
  • the bit combination on control lines A1, A2 is used to control the mode of the PIA 50 by applying signals to the RS0 and RS1 inputs of the device.
  • the PIA is selected for operation by a low logic level on the DEVSEL line being applied to its *CS2 input during a clock transition of the Φ0 clock applied to its enable E input.
  • the direction of data flow between the personal computer 10 and the speech synthesis chip 52 is determined by the logic level on the R/W control line which is connected to the R/W input of the PIA 50.
  • the PIA 50 can be reset by the reset control signal *RST which is received at its *R input.
  • the PIA has three operations to perform where the first is passing along bytes of digital data from the personal computer 10 to be spoken by the SSC 52.
  • a second operation is to transfer the status of the SSC 52 to the personal computer 10 and the third is to set the correct logic levels on the control inputs of the SSC from commands generated by the personal computer 10.
  • the speech data which has been compressed using pitch excited linear predictive coding by the LPC generator 16 is supplied to the SSC 52 by the personal computer 10 through the PIA 50.
  • the SSC 52 decodes this data to construct a time-varying digital filter model of the vocal tract.
  • This model is excited with a digital representation of either glottal air impulses (voiced sounds) or the rush of air (unvoiced sounds) .
  • the output of the model is passed through an eight-bit digital to analog converter to produce a synthetic speech waveform output from the AUDIO terminal of the chip 52.
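The decode-and-excite behavior just described can be sketched as follows. The lattice-filter form, the frame length, and the scaling are common LPC conventions assumed here for illustration; the patent states only that voiced frames are driven by periodic glottal pulses, unvoiced frames by noise, and that the result passes through an eight-bit digital to analog converter.

```python
import random

def synthesize_frame(energy, pitch, k, state, n_samples=200):
    """Illustrative sketch of one frame of LPC synthesis: build an excitation
    (a pulse train for voiced frames, noise for unvoiced), then pass it
    through an all-pole lattice filter defined by reflection coefficients
    k[0..9].  Frame length, scaling and sign convention are assumptions."""
    out = []
    for n in range(n_samples):
        if pitch > 0:                            # voiced: glottal pulse train
            e = energy if n % pitch == 0 else 0.0
        else:                                    # unvoiced: random "rush of air"
            e = energy * random.uniform(-1.0, 1.0)

        # all-pole lattice synthesis filter (one common sign convention)
        f = e
        for i in range(len(k) - 1, -1, -1):
            f = f + k[i] * state[i]              # forward path
            state[i + 1] = state[i] - k[i] * f   # delayed backward path
        state[0] = f
        out.append(max(-128, min(127, int(f))))  # 8-bit DAC quantization
    return out

# usage: ten reflection coefficients, state carries one extra delay element
state = [0.0] * 11
samples = synthesize_frame(energy=60, pitch=50, k=[0.1] * 10, state=state)
```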
  • a detailed electrical schematic of the speech processor 200 is illustrated.
  • a microprocessor 202 which generates the control signals that control the operation of the speech processor 200.
  • the microprocessor 202 is preferably a Rockwell International Corporation R6500/1 microcomputer system, the details of which are described in a publication dated June, 1979, entitled "document No. 29650-N48: R6500 Microcomputer System Product
  • the speech processor 200 also contains a speech memory 214 which is a 64k byte random access memory constructed from eight 64k bit dynamic random access memory units No. TMS 4164 manufactured by Texas Instruments, Inc.
  • the processor contains a speech synthesizer chip 216, preferably Model TMS 5220, also manufactured by Texas Instruments, Inc.
  • the speech synthesizer chip 216 is more fully described in the publication DM-02, dated June, 1981 and entitled "TMS 5220 Voice Synthesis Processor Data Manual" published by Texas Instruments, Inc.
  • the speech memory 214, if the speech processing mode is to be used, is first loaded with speech data by the personal computer 10.
  • the memory 214 has the capacity to hold about five minutes' worth of speech which may be broken up into 605 discrete speech segments or phrases.
  • the personal computer 10 first generates the necessary signals at peripheral connector 31 to cause bus input logic 300 to generate a signal *COUNT on line 217.
  • the signal resets an 8-bit counter 218 to a zero count and also passes through a gate 206 and resets an 8-bit counter 220 to a zero count.
  • the outputs of the counters 218, 220 on lines 222 and 224 are alternately connected to a 2-1 select logic device 212 and then through a 2-1 select logic device 210 to the ADDR inputs of the speech memory 214.
  • the 2-1 select logic device 212 is driven by the PHASE 0 timing signal on line 228 into alternate states at a rapid rate and thereby presents a complete 16-bit memory address to the speech memory 214 once during a complete cycle of the PHASE 0 signal.
  • the counters 218 and 220 determine which memory location within the memory 214 is to be accessed, and the *COUNT signal on line 217 forces these counters to address the location $0000.
  • the personal computer 10 can proceed to write data into successive locations within the speech memory.
  • the personal computer 10 presents the data one byte at a time to the peripheral connector 31 on data bus lines D0-D7 and simultaneously generates the necessary signals to cause the bus input logic 300 and timing logic 400 to generate an *RAS signal on line 236, a *WRITE signal on line 232, and a *CAS signal on line 234.
  • the *CAS signal on line 234 and the immediately preceding *RAS signal on line 236 cause the speech memory 214 to accept a byte of data from the data bus 230 and store this byte into location $0000 of the memory 214.
  • the *CAS signal on line 234 also advances the 8 bit counter 220 so that it will now address location $0001. The above process is repeated until all the necessary data (up to 64k bytes) have been loaded into the speech memory by the personal computer 10.
  • the *CAS, *RAS and *WRITE signals on lines 234, 236 and 232 respectively load the byte into the next successive location of the speech memory 214 and the *CAS signal adds one to the address count presented by the counters 218 and 220.
  • the counters 218 and 220 are interconnected such that an overflow or carry forward from the counter 220 flows into the count input of the counter 218 through a gate 238.
  • the personal computer 10 thus loads speech data or other data into the speech memory 214 almost as rapidly as it can load data into its own internal RAM. To read the data from memory 214, the personal computer 10 reverses the loading process.
  • the *COUNT signal on line 217 is generated by the bus input logic 300 and applied to counters 218, 220.
  • the signal on line 217 resets the counter 218 to a zero count and also passes through the gate 206 and resets the counter 220 to a zero count. Having set the address counter for the speech memory to its zero address position the personal computer 10 may now proceed to read data from successive locations within the speech memory 214.
  • the personal computer executes the read operation by generating the necessary signals to cause the bus input logic 300 and timing logic 400 to generate a *READ signal on line 237, a *RAS signal on line 236, and a *CAS signal on line 234.
  • the *CAS and *RAS signals cause a strobe of the memory 214 at the location addressed which is presently location $0000.
  • the contents of that address are then output on the bus 258 which is returned to the personal computer 10 via the data bus lines D0-D7 of data bus 230.
  • a buffer 259 connects the data bus 258 to the data bus 230 when enabled by the *READ signal on line 237.
  • the *CAS signal on line 234 increments the address counters 218, 220 so that they will now point to the next location, in the example location $0001.
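The byte-sequential load and read mechanism can be modeled behaviorally as below. This is a software stand-in for counters 218/220 and the *COUNT/*RAS/*CAS/*WRITE signalling, not the actual hardware.

```python
class SpeechMemoryModel:
    """Behavioral sketch of memory 214 addressed by counters 218 and 220.

    Counter 220 supplies the low address byte and carries into counter 218
    (the high byte); *COUNT clears both, and every *CAS strobe that
    accompanies a read or write advances the address by one."""
    def __init__(self, size=0x10000):
        self.mem = bytearray(size)
        self.high = 0      # counter 218
        self.low = 0       # counter 220

    def count_reset(self):             # *COUNT: both counters to zero
        self.high = 0
        self.low = 0

    def _address(self):
        return (self.high << 8) | self.low

    def _advance(self):                # *CAS increments 220, with carry into 218
        self.low = (self.low + 1) & 0xFF
        if self.low == 0:
            self.high = (self.high + 1) & 0xFF

    def write_next(self, byte):        # *RAS/*CAS/*WRITE cycle
        self.mem[self._address()] = byte
        self._advance()

    def read_next(self):               # *RAS/*CAS/*READ cycle
        byte = self.mem[self._address()]
        self._advance()
        return byte


# usage: load a few bytes starting at $0000, then read them back
m = SpeechMemoryModel()
m.count_reset()
for b in (0x08, 0x02, 0x06, 0x20):
    m.write_next(b)
m.count_reset()
print([hex(m.read_next()) for _ in range(4)])
```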
  • FIG. 6 illustrates the normal memory map of the storage area for the speech processor implementing memory 214.
  • the LPC speech data is stored in the higher order addresses $0540-$FFFF.
  • From address $0082 to address $053F are located phrase pointers and identification bytes.
  • From address $0082 to address $053B are located the 605 phrase pointers which point to the 605 phrases loaded into the LPC speech data buffer. If not all 605 phrases are used, then the unused pointers, instead of addressing a particular buffer location, address $0081, where a stop code $FF is stored.
  • the address $0080 is reserved as a check sum for the total information stored in this area and addresses $053E and $053F are used to store a two byte soundtrack identification number.
  • a command buffer from addresses $0000-$007F is loaded with a plurality of coded commands for executing the phrases of the storage area in an arbitrary sequence. The command buffer and its operation and programming are more fully described in the above-referenced Raymond et al. application.
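The memory map of FIGURE 6 as described above can be captured as a small set of constants; this is a minimal sketch, and the two-byte little-endian pointer layout is an assumption made for illustration (the patent says only that 605 phrase pointers occupy $0082-$053B).

```python
# Constants mirroring the speech memory map described above.
COMMAND_BUFFER   = range(0x0000, 0x0080)   # coded commands
CHECKSUM_ADDR    = 0x0080                  # checksum for the pointer/ID area
STOP_CODE_ADDR   = 0x0081                  # unused pointers aim here ($FF stored)
PHRASE_POINTERS  = range(0x0082, 0x053C)   # 605 two-byte phrase pointers
SOUNDTRACK_ID    = (0x053E, 0x053F)        # two-byte soundtrack identification
LPC_SPEECH_DATA  = range(0x0540, 0x10000)  # the LPC speech data itself


def phrase_pointer(mem, index):
    """Return the speech-memory address of phrase `index` (0-604), or None if
    the pointer is unused (it then points at the stop code location)."""
    base = PHRASE_POINTERS.start + 2 * index
    addr = mem[base] | (mem[base + 1] << 8)     # byte order assumed
    if addr == STOP_CODE_ADDR:
        return None
    return addr


# usage sketch
mem = bytearray(0x10000)
mem[0x0082:0x0084] = (0x0540).to_bytes(2, "little")   # phrase 0 starts at $0540
print(hex(phrase_pointer(mem, 0)))                    # 0x540
```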
  • whenever the personal computer 10 wishes to have the speech processor 200 produce speech, the personal computer proceeds as described above to store a particular soundtrack and causes the speech address counters 218 and 220 to be set to a zero count. Next the personal computer 10 feeds two bytes of command data to a first command location of the command buffer. The personal computer 10 then actuates the peripheral connector 31 to produce signals in such a manner as to cause the bus input logic 300 to generate an enable signal on line 242. The enable signal on line 242 sets a bistable 240 which causes the generation of a BDEN signal to place the microprocessor 202 into operation.
  • the microprocessor 202 then examines the contents of the initial command location and takes whatever action the personal computer 10 has called for. A number of command options are available. The simplest command that the personal computer 10 can place into the location is a phrase address. Upon receiving a phrase address as a command, the microprocessor 202 causes the speech synthesizer 216 to generate the corresponding phrase as speech. If the command supplied is a number within the range of $0800 to $083F, the command informs the microprocessor 202 that a multiple series of commands have been placed in the command buffer of the speech memory 214. The least significant six bits of the number supplied indicates how many commands have been placed in sequential memory locations starting at $0002 through to the end of the multiple command set.
  • the individual commands within this set may be phrase address commands or silent interval commands or both. If a command stored in either the initial command location or the multiple commands falls within the numeric range of between $C000 to $FFFF, then the command is a time delay that orders the speech processor to do nothing for a specific number of 12.5 millisecond time intervals specified by the least significant 14 bits of the command.
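The command ranges just described lend themselves to a simple classifier. The sketch below restates those ranges directly; it is illustrative only and is not the firmware of the microprocessor 202.

```python
def decode_command(word):
    """Classify a 16-bit command word from the command buffer, following the
    ranges described above; values outside the named ranges are treated as
    phrase addresses, the simplest command."""
    if 0x0800 <= word <= 0x083F:
        # multiple-command set: the low six bits give the number of commands
        # stored sequentially starting at location $0002
        return ("multiple", word & 0x3F)
    if 0xC000 <= word <= 0xFFFF:
        # silent interval: the low 14 bits count 12.5 ms units of silence
        return ("delay_ms", (word & 0x3FFF) * 12.5)
    return ("speak_phrase_at", word)


# usage
print(decode_command(0x0805))   # ('multiple', 5)
print(decode_command(0xC050))   # ('delay_ms', 1000.0)
print(decode_command(0x0540))   # ('speak_phrase_at', 1344) -> phrase stored at $0540
```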
  • the personal computer 10 can, by proper manipulation of the peripheral connector 31 control signals, cause a STATUS signal 244 to be generated by the bus input logic 300 which causes status data presented at Port B of the microprocessor 202 to be gated through a gate 246 and presented to the data bus D0-D7 230 from which the status data of the speech processor may be read.
  • This data can indicate, for example, whether the microcomputer 202 is busy generating speech or otherwise executing a series of commands.
  • a special number presented to the personal computer 10 on leads D0-D7 of data lines 230 can identify the speech processor to assist the computer in searching the peripheral connector 31 slots looking for the speech processor 200.
  • Other status leads can indicate such things as the size of the speech memory, if it is variable in size.
  • once placed into operation by the BDEN signal 241, the microprocessor 202 generates a LOAD signal 248 that resets the bistable 240, flows through the gate 206 to clear the counter 220, and flows to the LD (Load) input of the counter 218, thereby causing the counter 218 to load itself with the number presented on the bus 250 at Port A of the microprocessor 202. At this time, Port A presents $0000 to the counter 218. Accordingly, the two counters are cleared to zero count so they address the single command data stored in location $0000 of the memory 214.
  • the microprocessor 202 generates a NEXT signal 252 which a NEXT signal pulse generator 500 converts into a carefully synchronized NEXT PULSE 254.
  • the NEXT PULSE flows into the timing logic 400 and initiates an *RAS and *CAS signal sequence that transfers the contents of location $0000 within the speech memory into the latch 256 over the memory output bus 258 and that advances the counter 220 to a count of $01 so the counters 218 and 220 now address location $0001.
  • the microprocessor 202 then terminates the NEXT signal 252 and initiates an EN DATA signal 260 that displays the contents of the latch 256 to the bus 250 and to Port A of the microprocessor 202.
  • the microprocessor accepts the byte of data from the bus 250.
  • the microprocessor 202 again generates the NEXT and EN DATA signals in rapid sequence and thereby reads a second byte of data from location $0001 within the speech memory 214, leaving the counters 218 and 220 addressing memory location $0002.
  • the microprocessor 202 next examines the 16-bit command it has retrieved from the speech memory and takes whatever action is appropriate, as is explained more fully below.
  • if the address counters 218 and 220 need to be reset to point to a specific address, the microprocessor 202 presents the most significant byte of the desired address to the counter 218 over the bus 250 extending from Port A, and then it generates the LOAD signal to clear the counter 220 and load the most significant byte of the address into counter 218. Then, if the least significant byte of the desired address is nonzero, the microprocessor 202 generates the NEXT signal 252 a number of times equal to the numeric value of the least significant byte. Since each NEXT signal causes the timing logic 400 to generate a *CAS signal which advances the counter 220, the net effect of these operations is to set the counters 218 and 220 to the desired address value.
  • the microprocessor 202 can step through and examine the contents of the speech memory locations starting with the specified address.
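A short sketch of the LOAD/NEXT addressing sequence just described, modeling counters 218 and 220 in software (illustrative only).

```python
class AddressCounters:
    """Sketch of counters 218 (high byte) and 220 (low byte)."""
    def __init__(self):
        self.high = 0
        self.low = 0

    def load(self, msb):        # LOAD signal: load counter 218, clear counter 220
        self.high = msb & 0xFF
        self.low = 0

    def next(self):             # NEXT -> *CAS: advance counter 220, carry into 218
        self.low = (self.low + 1) & 0xFF
        if self.low == 0:
            self.high = (self.high + 1) & 0xFF

    def value(self):
        return (self.high << 8) | self.low


def set_address(counters, target):
    """Point the counters at `target` using only LOAD and repeated NEXT
    pulses, exactly as described above."""
    counters.load(target >> 8)
    for _ in range(target & 0xFF):
        counters.next()


c = AddressCounters()
set_address(c, 0x0542)
print(hex(c.value()))   # 0x542
```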
  • the microprocessor 202 maintains its status at Port B where it is available to the personal computer 10, including one data bit that indicates whether the microprocessor is "BUSY.”
  • a REFRESH signal 262 connects the address input of the speech memory 214 to an 8-bit counter 264 that counts upwards continuously in synchronism with the PHASE 0 signal on line 228.
  • the *RAS signal on line 236 continuously pulses the memory 214 even in the absence of the *CAS signal and thereby causes locations within the speech memory 214 to be addressed by the count output of the counter 264.
  • the RESET signal 266 from the personal computer 10 is applied to the microprocessor 202 and to the NEXT signal pulse generator 500.
  • the Q3 signal on line 268 is a timing signal from the personal computer 10 that is fed into the microprocessor 202 to serve as a clock signal and to synchronize the operation of the microprocessor 202 with the operation of the speech memory 214, which derives its timing from the PHASE 0 signal of line 228.
  • the timing relationship of the Q3 signal and the PHASE 0 signal is illustrated in the "Apple IIe Reference Manual" mentioned above.
  • the Q3 signal fluctuates at precisely twice the frequency of the PHASE 0 signal, going high for 300 nanoseconds each half cycle of the PHASE 0 signal.
  • the Q3 signal 268 is applied to input CLK of the microprocessor 202, and the RESET signal 266 is applied to input RST.
  • the remaining four signals, NEXT, EN DATA, LOAD, BDEN, connect to bit positions of Port C of the microprocessor.
  • the remaining four Port C signals are connected to the speech synthesizer 216 by a bus 270, and are connected to the control inputs *RDY, *INT, *RS, and *WS of that device.
  • Port D, bits 0-7 connect respectively to the speech synthesizer 216 input leads labeled D0-D7 via a Bus 272.
  • a capacitor 274 connects the OSC lead of the speech synthesizer 216 to ground, and resistor 276 and variable resistor 278 connect the same lead to a negative supply 280 which also connects to the voice synthesizer input -5.
  • the system flowchart and operation of the collection routine are illustrated more fully in Figure 7.
  • the program is loaded into the Apple IIe by inserting a program disk which contains the software of the routine and a DOS such as APPLE DOS 3.3 into the program disk drive 26.
  • a phrase storage disk 21 is also inserted in the data disk drive 28 to provide an area for storing the collected phrases that are produced by this process.
  • the system is then booted into memory and control transferred from the DOS to the collection program.
  • the routine begins in block A10 where initial instructions are provided to the operator on the video monitor 29. Such instructions as "Turn on the power to the SDS-50 unit", "Turn on power to the terminal", and "Toggle reset for SDS-50 unit" are given to assist in initialization. After the operator has read and executed these instructions he will press a return key on keyboard 27 to continue the program which is sensed by block A12. The user then inputs the date on which the collection occurs and causes the program to sequence to the next step.
  • the program now prompts the operator with a menu on the video monitor 29 displaying three choices for collection operation.
  • the first choice is whether he wishes to initialize the phrase storage disk 21 that is mounted in the disk drive 28.
  • a second option is to actually store phrases on that phrase storage disk and a third option is to stop.
  • by picking a particular option and hitting a key on the keyboard 27 corresponding to that option, the operator will cause the program to advance. If option 3 is chosen, as indicated at block A18, then an affirmative branch from that test will cause the program to stop.
  • a stop command is a transfer of control from this particular application program of the Apple IIe back to the operating system monitor such that another application program may be run.
  • in block A34 the program will prompt the operator with a visual message on the display screen to insert a disk and to enter a file name for the particular phrase storage disk 21 he inserts in data disk drive 28. After the disk has been inserted in the data disk drive 28, or if a data disk is already mounted as in the present example, then the operator presses a return key, as sensed by block A36, to continue the sequence of the program. The program will then display a warning message on the video monitor 29 that the disk will be erased if the program continues.
  • the operator replies by pressing either the yes ("Y") or no ("N") key on keyboard 27 which is tested for in block A50. If the answer to the prompt is negative, then the program loops back to block A16 where the command menu is displayed on the video monitor 29 so the process can start again. If, however, the yes ("Y") key is pressed then the program will prepare the disk in the system format which is operationally represented in block A52. The disk preparation places a blank index table on the memory device so that as phrases are stored thereon, the table will be filled and the positions of the arbitrary length stored phrases will be easily locatable. Further, the phrase storage disk 21 is labeled with the file name given the memory segment in block A34. After the disk preparation is finished, the program tests for whether or not the return key has been pressed. As soon as the operator operates the return key, as sensed in block A54, then the program will transfer control back to block A16 where the program menu is again displayed.
  • a test in block A22 is performed to determine if the operator has chosen the record or collect function. If the test result here is negative, then the program defaults back to the menu display in block A16 where the choice of functions is continued.
  • the program will sequence to block A24 where a message is displayed on the video monitor 29 to insert a disk on which the collected phrases may be stored and then to press return.
  • the program delays at block A26 until the program senses that the operator has pressed the return key after loading the disk.
  • the program thereafter displays the disk identity on the video monitor 29, as represented by block A28, to indicate what is stored on the header portion or index table of the phrase storage disk. This table is the one initially stored on disk during option 1 when the disk was initialized in blocks A34-A54.
  • the program will display the message, "Is this the correct disc?” on the video monitor 29 as illustrated in block A30. This operation allows an operator to insure that the disk on which he intends to store phrases is the correct one and that a mistake has not been made.
  • the system collects a phrase by calling the subroutine COLLECT DATA in block A66 where a converted phrase is transferred to the personal computer 10 from the PASS system.
  • a phrase is input to the system through the speech collector circuit 32 and is saved in an intermediate RAM buffer where it is decoded to determine which command was used for the collection process.
  • the command is generated by an operator on the PASS system during the conversion process and is used to command a number of functions from the recording or collection program. If the command is detected as a "space" in block A68, then the phrase or segment input will be stored in block A80 on the data disk preceded by a phrase number which is automatically incremented each time a new phrase is saved. If the command is decoded as four or less alphanumerics in block A70, then in block A82 the phrase is saved to the disk 21 using that code name as its phrase name in the table. If the command is an "S", then a subroutine SPEAK is called in block A84 to verbally output the stored phrase with speech synthesis chip 52.
  • phrase storage memory disk
  • the program cycles back to the COLLECT DATA routine to wait for another phrase input from the PASS unit.
  • the phrases may be stored on a phrase storage memory (disk) one by one in any amount. When one disk is full another one can be initialized and recorded on.
  • a single standard floppy disk generally will hold about 90 phrases of approximately ten seconds apiece.
  • this expandable phrase storage memory using the phrase storage disks 21 can be used to store an entire message session without having to go back and record missing data at a future time.
  • the collection program provides a facile means for storing the phrases to disks in that once an operator is set up and ready to collect, the system will collect the data faster than a PASS system can accomplish the conversions of the analog waveforms into LPC data. The operator thus is free to concentrate on the conversion and merely transmits the converted data and the command he wants to accomplish at the end of every conversion.
  • the collection program waits in the storage loop for the PASS input, accepts it, and then labels such for subsequent retrieval.
  • FIGURE 8 illustrates a more detailed flow chart of the collection process as accomplished by the routine COLLECT DATA.
  • the ACIA 66 is reset in the first block A80 and the program then displays the message "waiting" on the video monitor 29.
  • the operator has initialized the PASS unit and has input an analog speech waveform of a predetermined interval and converted it into LPC code. He then appends a command to the converted data and operates a sequence of commands for the transmit function on the SDS-50 unit to send the converted data and command to the ACIA 66.
  • the ACIA notifies the personal computer 10 that data transmission has started and this operation permits the program to continue from block A84 where it had been waiting.
  • the command is stripped from the data and a nonessential tag removed in block A86 before progressing.
  • the incoming LPC code is then collected in block A90 and tested for errors in block A92.
  • the data is then checked to determine if an end of transmission character has occurred.
  • the data byte, if there were no errors, is saved in the intermediate buffer for transfer to the phrase storage disk or for speaking. The process repeats until an entire phrase or data segment has been input to the intermediate buffer or an error occurs in the transmission of the data. If either one of these happens then the program will exit immediately to the calling program.
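A sketch of the COLLECT DATA flow just described: separate the leading command character, buffer each following byte, and stop at an end-of-transmission character or on an error. The end-of-transmission value and the error test below are placeholders; the patent does not specify either.

```python
def collect_data(incoming, eot=0x04):
    """Illustrative COLLECT DATA loop: returns the operator command, the
    collected phrase bytes, and a success flag (assumptions noted above)."""
    stream = iter(incoming)
    command = next(stream)              # command appended by the PASS operator
    buffer = bytearray()                # intermediate RAM buffer
    for byte in stream:
        if byte == eot:                 # end of the converted phrase
            return command, bytes(buffer), True
        if byte < 0 or byte > 0xFF:     # stand-in for the transmission error test
            return command, bytes(buffer), False
        buffer.append(byte)
    return command, bytes(buffer), False


# usage: command "S" followed by three LPC bytes and an end-of-transmission marker
cmd, phrase, ok = collect_data([ord("S"), 0x21, 0x7E, 0x3C, 0x04])
print(chr(cmd), phrase.hex(), ok)
```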
  • Figure 10 is a pictorial representation of the linear predictive code format for the phrase data.
  • the LPC format compresses the data during conversion into a number of consecutive frames. As seen in the figure there are anywhere from 4-52 bits forming each frame of data.
  • a phrase is made up of one or more frames which characterize the verbalizations of the analog waveform in the digital LPC code. There are five basic formats for the frames including voiced sounds, unvoiced sounds, repeats, silence and stops.
  • a voiced frame 290 is represented by a 4 bit energy parameter and a 6 bit pitch parameter in addition to ten filter coefficients
  • Each of the filter coefficients may have a different number of values, and therefore bits, depending upon their importance and frequency.
  • the number of bits for each K coefficient is from 3 to 5 for each of the ten filter coefficients.
  • the voiced frames 290 are at a fixed rate, but by appending two bits to the beginning of a frame a variable frame rate system can be provided.
  • the next format for the frames is an unvoiced sound 292 which contains an energy parameter but has a pitch parameter of zero. Further, only the first four filter coefficients are needed to define the unvoiced parameters.
  • the unvoiced frames may have a variable or fixed frame rate.
  • the variable frame rate is provided by appending two extra bits to the beginning of the frame.
  • in the voiced and unvoiced formats, the bit separating the energy parameter from the pitch parameter is zero. For a repeat format 294, however, this bit becomes a 1; successive frames may change in energy and pitch, but the K parameters for the repeat frame stay the same.
  • the silence format 296 sets the four bits of the energy parameter equal to zero but can include two initial bits for determining whether a variable frame rate is desired.
  • the fifth and final format 298 for the frames is a stop format where the energy parameter is set equal to all ones.
  • the stop format 298 can be at either a fixed or variable frame rate depending upon whether an initial two bits are appended.
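The five frame formats lend themselves to a simple classifier. The sketch below restates the tests given above; the per-coefficient bit split shown is an assumption consistent with the stated 3-to-5-bit range, and it yields 50 bits for a fixed-rate voiced frame, or 52 with the two variable-rate prefix bits, matching the 4-52 bit range mentioned earlier.

```python
def classify_frame(energy, repeat_bit, pitch):
    """Classify an LPC frame using the five formats described above
    (4-bit energy field, repeat bit, 6-bit pitch field)."""
    if energy == 0x0:
        return "silence"                 # energy bits all zero
    if energy == 0xF:
        return "stop"                    # energy bits all ones
    if repeat_bit == 1:
        return "repeat"                  # energy/pitch may change, K's reused
    if pitch == 0:
        return "unvoiced"                # noise-excited, only K1-K4 follow
    return "voiced"                      # pulse-excited, K1-K10 follow


# bits carried by each filter coefficient in a voiced frame (3 to 5 bits each;
# the exact per-coefficient split is an illustrative assumption)
K_BITS = [5, 5, 4, 4, 4, 4, 4, 3, 3, 3]
VOICED_FRAME_BITS = 4 + 1 + 6 + sum(K_BITS)   # energy + repeat + pitch + K1-K10
print(classify_frame(0x6, 0, 21), VOICED_FRAME_BITS)   # voiced 50
```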
  • Figure 11 is a functional block diagram of the operational characteristics of the editing program. This block diagram should be viewed in conjunction with the hardware illustrated in Figure 2 to more fully understand the operation of the system under control of this routine.
  • the editing routine is loaded into the Apple IIe by mounting an editing program disk in the operating disk drive 26.
  • the program disk has the editing routine and an Apple disc operating system, such as APPLE DOS 3.3, stored thereon.
  • a phrase storage disc 300 previously loaded with phrases stored by the collection routine, is mounted in the data disk drive 28 and serves as a source for unedited phrases and a destination for edited phrases.
  • the DOS of the Apple IIe is used to load the editing routine from the program disk into the RAM of the microprocessor 30 of the personal computer 10 and thereafter to transfer control of the PC to that software which runs as an application program.
  • CODEPACK data is defined as a data segment entirely in LPC or binary form, which is quite hard for an operator to decipher.
  • Frame data is defined as that binary LPC code converted to a display code, such as ASCII, which can be humanly interpreted.
  • from the codepack buffer 308 the phrase is automatically unpacked from LPC code into humanly intelligible parameters for viewing on the video monitor 29 as illustrated in functional block 312.
  • the unpacked FRAME data is stored in another RAM buffer area defined as frame data storage 316.
  • the editing routine has a means for manually packing the data in the frame buffer 316 back into CODEPACK data as illustrated by functional block 314.
  • This is a manual routine and can be used to command the packing of a phrase into codepack at any time it is stored in the frame data storage buffer 316.
  • a manual operation of the unpacking function of block 308 can be called to replace FRAME data with the original CODEPACK data version.
  • CODEPACK data in the storage buffer 308 can be spoken by an audio output function as represented by block 310 and/or saved back to the disk 300 as represented by functional block 304.
  • other functions allow the CODEPACK data to be stored in either a prefix buffer 309 or a suffix buffer 311. CODEPACK data in either of these buffers can be spoken by the audio output represented by functional block 310.
  • a final option for operation on the CODEPACK data is that a phrase may be identified and deleted from the disk 300 as represented by function block 302.
  • the FRAME data stored in the storage buffer 316 may be printed out visually on the video monitor 29 or on a printer means 33 as represented by block 318. This function provides a hard copy duplication of the phrase being edited and is useful in training operators in the techniques of editing phrases. Further, the FRAME data in the frame storage buffer 316 may be temporarily converted to codepack by the function of block 322 so that it can be spoken by an audio output block 324. Additionally the PHRASE data in frame storage buffer 316 may be reviewed and edited as represented by functional block 320. The review and the editing of the FRAME data and the advantageous operational structure shown produce a means for modifying and correcting data with an ease unknown prior to the invention.
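The buffer arrangement of FIGURE 11 can be summarized structurally as below. The pack() and unpack() methods are deliberately trivial stand-ins for the real bit-level conversion, which depends on the FIGURE 10 frame formats; only the data flow between the buffers follows the description.

```python
class PhraseEditor:
    """Structural sketch of the FIGURE 11 buffers: packed CODEPACK data in
    buffer 308, human-readable FRAME data in buffer 316, and the prefix and
    suffix buffers 309/311."""
    def __init__(self):
        self.codepack = b""        # buffer 308: packed LPC phrase from disk
        self.frames = []           # buffer 316: one dict of parameters per frame
        self.prefix = b""          # buffer 309
        self.suffix = b""          # buffer 311

    def load(self, packed_phrase):             # block 306: disk -> codepack
        self.codepack = packed_phrase
        self.frames = self.unpack(packed_phrase)   # block 312: automatic unpack

    def unpack(self, packed):                  # placeholder conversion only
        return [{"energy": b >> 4, "pitch": b & 0x0F} for b in packed]

    def pack(self):                            # block 314: frames -> codepack
        self.codepack = bytes((f["energy"] << 4) | f["pitch"] for f in self.frames)

    def save(self):                            # block 304: codepack -> disk
        return self.codepack


# usage sketch
ed = PhraseEditor()
ed.load(b"\x93\x84")
ed.frames[0]["energy"] = 8          # block 320: review and edit a parameter
ed.pack()
print(ed.save().hex())              # '8384'
```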
  • the structure previously shown in FIGURE 11 is used by the editing routine to perform numerous review and modification operations on the phrases stored on the phrase storage disk 300.
  • a system flowchart of the editing routine will now be more fully described with reference to FIGURE 14.
  • the program begins in block A90 by indicating to the monitor program that the video monitor 29 should be set to the high resolution mode.
  • the video monitor 29 is then prepared for text in block A91 and the screen cleared by erasing previously stored material and moving the cursor to the home position in block A92.
  • the editing routine sets the system operating monitor for a restart or a reset by providing its own starting address as the jump location for the start up routine in blocks A93, A94.
  • the message area at the bottom of the CRT monitor is cleared in block A95 and the program will then prompt the operator with the question "command?" in block A96.
  • the program waits for a command to be given in block A97, which is given by the operator entering a predetermined sequence of keys on keyboard 27. After the operation that is commanded has been performed in block A99, the program will loop back to the beginning of the program and be ready for another command.
  • a command can be given to load a phrase from the phrase storage disk 300 as illustrated in Block A100.
  • the editing program will call the necessary subroutines to download the index table listing the contents of the phrase storage disk 300.
  • the table contents are displayed in their entirety and list a two digit number indicating the number of the phrase on the disk, separated by a hyphen from a four digit alphanumeric designator indicating the name of the phrase.
  • the program will display a visual table on the video monitor 29 in this format and prompt the operator with a question which asks "which number?".
  • after the operator enters the number of the phrase that is to be loaded from the disk 300, the program automatically transfers that phrase to the codepack data storage area 308 and automatically converts it from codepack to frame data. The frame data is then loaded into the frame data storage area 316 prior to the program returning. After the phrase has been loaded into the two storage areas, the program blanks the video monitor 29 and returns with the prompt for another command.
  • the next command, illustrated in block A102, is to provide the operator with a function for listing the frame data.
  • the program calls those subroutines necessary to list the frame data stored in the frame data storage buffer 316 onto the video monitor 29 so that the operator can see, in an intelligible format, what the value of each parameter of each frame is.
  • a pictorial representation of the listed display is illustrated in FIGURE 13. The phrase is listed on the video screen as a number of lines from 1-19, each of which contains the twelve parameters of one frame.
  • the first column of a line corresponds to the energy parameter of an LPC coded frame and the second column corresponds to the pitch parameter.
  • the other ten columns are respectively the filter coefficients K1-K10 in numerical order for the specific frame.
  • the operator has a numerical display in a frame array format representing the phrase in column format so that he may look at a specific parameter and/or line to vary or change the data.
  • This provides a facile method of displaying a phrase in information which is easily recognizable to an operator and lends itself to an understandable modification of the data.
  • the routine also makes use of the cursor of the video monitor 29 to point to a particular parameter or field in the data.
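A small sketch of the FIGURE 13 column layout just described (line number, then energy, pitch, and K1-K10 for each frame); the exact spacing is illustrative.

```python
def list_frames(frames):
    """Print frames in the FIGURE 13 column layout: energy, pitch, then
    K1-K10, one numbered line per frame."""
    header = ["EN", "PI"] + [f"K{i}" for i in range(1, 11)]
    print("    " + " ".join(f"{h:>3}" for h in header))
    for line_no, f in enumerate(frames, start=1):
        row = [f["energy"], f["pitch"]] + f["k"]
        print(f"{line_no:>3} " + " ".join(f"{v:>3}" for v in row))


# usage with two illustrative frames
list_frames([
    {"energy": 9, "pitch": 22, "k": [3, 7, 5, 4, 6, 2, 1, 3, 2, 4]},
    {"energy": 8, "pitch": 23, "k": [3, 7, 5, 4, 6, 2, 1, 3, 2, 4]},
])
```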
  • the next command illustrated in block A104 is provided to allow the operator to speak the phrase which is stored in the codepack data storage buffer 308. This is done directly by outputting the LPC code in the buffer 308 to the speech synthesis chip 52 of the speech collector 32 as has previously been described. After the operation of speaking the phrase has been completed, the program will jump back to the beginning of the main editing routine and prompt the operator for another command.
  • the next command, the repeat command illustrated in block A106, allows the operator to speak the phrase which is stored in the frame data storage buffer 316. This is done by converting the frame data in buffer 316 into codepack 322 and outputting the converted LPC code to the speech synthesis chip 52 of the speech collector as illustrated in block 324.
  • the speak command A104 and the repeat command A106 allow the operator to compare the unedited speech in the buffer 308 with the edited speech in the buffer 316. When the operation is complete, the program will cycle back to the beginning of the editing routine where the operator will again be prompted by the command question.
  • Another command related to the speak and repeat commands is the speak slow command in block A108, to which the operator appends a numeric designator between 2 and 9.
  • the numeric designator indicates the speed with which the phrase is spoken by the speech synthesizer chip 52 and allows the operator to determine where an error in a phrase may be by varying the speed with which it is spoken. In this manner the operator can easily narrow down the approximate location of an error or something that he desires to change before a slower and more detailed analysis of the phrase is attempted.
  • many phrases may be reviewed without having to go to a detailed analysis of each frame as will be more fully described hereinafter. If they pass review under the initial listening tests as defined by the speak, repeat and speak slow commands, then no further editing and review are necessary.
  • when the phrase that is stored in the buffers requires more detailed work, a command to enter an edit mode can be given as represented in block A110.
  • This mode will be more fully discussed hereinafter and generally provides a second group of options with various control commands for producing considerably more detailed editing and review functions.
  • the next three commands as shown in blocks A112, A114 and A116 are concerned with a graphical display of the phrase in the data buffer 316.
  • the operator, by generating a second command to list the frame data as illustrated in block A102, may graphically display the energy and pitch parameters of the frames, in terms of amplitude, as a bar graph such as that illustrated in Figure 12.
  • the energy and pitch parameters are shown only as an example of this function while in actual operation the program will provide graphical illustrations of all twelve of the LPC parameters.
  • the graphical display of these two parameters is an excellent visual indication of where errors in the LPC code have been produced during encoding. In particular, the energy amplitude in a phrase will not vary rapidly from frame to frame and is relatively constant throughout a series of voiced sounds until the phrase goes to silence.
  • one of the errors seen immediately from the graphical mode during sequential voiced frames is a missing energy parameter value, or one that is discontinuous with the general average of the surrounding voiced frames, such as at 350 (a simple scan for this kind of discontinuity is sketched after this list).
  • This parameter may then be replaced to cure the error in the phrase so that a smooth transition across the energy or pitch parameters can be maintained.
  • the graphical mode shows a distortion of this sort extremely well, such as at 352.
  • the operator may toggle between the frame data in a listed display on the video monitor or as shown in a graphical mode by giving the command for the toggle text/graph in block A114.
  • to turn the graphical display off, the operator generates the command indicative of the graph mode off operation as indicated in block A116. Those discontinuities in the graphs of the energy and pitch found during the graphical mode may then be repaired in the edit mode, which is entered via block A110 by generating the correct command.
  • the next command illustrated in block A118 is to determine how much free space is left on the phrase storage disk 300.
  • Upon receiving this command, the program will read the disk index table and also the end of file record to determine how many bytes are open for the storage of more phrases. After the routine has calculated this number and displayed it on the screen, the operator is asked to press any key to return the program to the main routine where another command can be accepted.
  • the operator may also delete a phrase from the disk by keying in the command illustrated by block A120.
  • the operation of the system is similar to the load a phrase from disk command as discussed previously.
  • the table of contents for the phrase storage disk 300 is displayed on the video monitor screen. Thereafter, the program will prompt the operator with the question "which number?".
  • the phrases that are stored on the phrase storage disk 300 are listed by a two digit number from 1 to 90 and the program will attempt to match the number input by the operator with one of these stored phrases. If it finds a match, the program will delete that number from the table and also erase the phrase from the phrase storage disk. When the phrase is erased from the phrase storage disk 300, the number of bytes of free disk space is increased, which allows more data to be stored on the disk.
  • the program will return to the main routine where the operator will again be prompted to input another command.
  • Another command that can be given is to call the subroutines performing a hard copy printout of the phrase as illustrated by block A122. These routines take the data stored in the frame data storage buffer 316 and, similar to the list frame data command, output this data in humanly intelligible form on a printer or other hard copy device. After the operation is completed, the program will cycle back to where the prompt for an input command is generated.
  • the next two commands, unpack and pack, represented by blocks A124 and A126 respectively, are used to transfer data between the two buffers 308 and 316 while a conversion takes place during the transfer.
  • the unpack command transfers the data from the codepack data storage buffer 308 to the frame data storage buffer 316 while converting the data from codepack to frame data.
  • the packing command does the opposite and transfers data from the frame data storage buffer 316 to the codepack data storage buffer 308 while converting frame data to codepack data.
  • the unpack command is useful when mistakes have been made during the editing of the frame data and it is too difficult to change the frame data back to its starting state. In this instance the editor uses the unpack command to simply overlay the frame data with new frame data equivalent to the codepack data stored in buffer 308 (a simplified sketch of the pack and unpack conversions appears after this list).
  • the pack command is generally used as the last command before the codepack data is saved to the phrase storage disk 300.
  • at this point the frame data in buffer 316 has been fully edited and it has been determined that this phrase should be restored to the disk.
  • the original codepack data is overlaid with the new frame data by the pack command in block A126.
  • This data, now in the codepack data storage buffer 308, can be saved back to the edited phrase storage disk with the command generated by block A130.
  • the commands illustrated as functional blocks A133, A135 can be used to transfer the data in the codepack data storage buffer 308 into the prefix buffer 309 and the suffix buffer 311, respectively.
  • the prefix buffer and suffix buffer are smaller than the codepack buffer and any overlapping data is truncated by the transfer.
  • all the phrases of a particular phrase storage disk 300 can be made available in hard copy form by giving the command illustrated in block A128. Since this could be up to 512 lines of 12 columns for each of 90 phrases, the hard copy command for all phrases is generally limited in use. However, it does provide an archival method for storing in hard copy an entire phrase storage disk 300 in humanly intelligible form.
  • the last two commands that may be given to the system when the editing routine is commanding the personal computer are the quit and reset commands.
  • the quit command has the effect of transferring control back to the Apple operating monitor where other application programs can be input and run.
  • the reset command as illustrated in block A134 results in a restart of the edit program by transferring command to the initial operating address of the editing program.
  • the specific and detailed editing functions of the edit mode will now be more fully explained with respect to Figure 16.
  • the edit mode as represented by block A110 provides a number of operations for the detailed modification and change of the phrase in the frame data storage buffer 316.
  • the data that is changed is the frame data which is stored therein and any individual parameter of any frame of the phrase can be modified.
  • the edit mode is entered by the generation of the edit command which produces a visual representation of the phrase data similar to the list command in the main routine.
  • a present location cursor is provided to point from parameter to parameter along any of the lines and thus any parameter including the energy, pitch, or filter coefficients of any frame can be isolated and then modified.
  • the options of the edit mode include a command to speak the frame presently pointed to by the cursor as illustrated in block A148. This allows the operator to review the phrase in more detail by listening to each frame of a phrase as it is spoken to determine if errors can be found or enhancements or modifications are needed.
  • Another command of similar operation is illustrated in block A150 where the system can speak the present frame and automatically advance the cursor to the next line. This allows an operator to rapidly speak each frame in order and go through an entire phrase very quickly looking for errors. A further variation of these commands is the command illustrated in block A158, which allows the speed at which the frame data is spoken to be varied by appending a numeric indicator from 2 to 9. Another editing command which is a variation of the speak commands is the command illustrated in block A146, which allows the operator to speak the codepack data from buffer 308. With this command and the command illustrated in block A136, a comparison can be made between the two to determine if a mistake that was found in the frame data has been corrected.
  • the commands in blocks A136-A142 allow an operator to more readily determine if an error is present in a single line or frame.
  • a command allows an entire phrase to be repeated by the editor. This command is similar in function to the repeat command of the main command group.
  • the frames around a particular frame that the cursor is presently pointing to can be spoken by giving the command in block A138 to speak a window.
  • the window is ten frames long, comprising the three previous frames, the present frame, and the six subsequent frames.
  • Block A142 provides a command to allow the window described above to be spoken at different rates depending upon a number from 2-9 input by the editor. Initially, the speak window and speak window slow commands can find errors very rapidly. For review of several frames of data, the command in block A140 allows a window to be spoken and the cursor to advance automatically to the next window (a small sketch of the window computation appears after this list).
  • the command represented by block A144 allows the cursor to be decremented to the previous line and the command represented in block A152 provides the operator with a jump command to place the cursor at any line desired.
  • the cursor movement commands in blocks A144 and A152 can be used to position the cursor for editing those particular parameters in the phrase found to be incorrect or in need of enhancement.
  • the edit mode contains several modification commands, including the zero frame and advance command illustrated in block A156.
  • This command sets the parameters of a line or frame in the phrase to zero and then advances the cursor to the next line. Additionally, another modification command is found in block A154 where the operator can insert a silent transition between two frames. The silent transition is provided by changing the filter coefficient parameters to a set group used particularly for this purpose. These parameters are more fully described in the operating manual for the SDS-50 system.
  • the next group of commands act entirely on one frame. These commands are represented by blocks A160, A162, A164, A166 and A168.
  • the command in block A160 allows a frame to be deleted from the phrase entirely. This is different from the zero frame command given in block A156 because there is still a place holder frame in that modification while the present operation completely removes the time difference between the end of the previous frame and the start of the subsequent frame.
  • the next full frame edit command is found in block A162 and allows the editor to generate a copy of the present frame. This copy is inserted between the previous frame and the frame the cursor is presently pointing to.
  • in blocks A164 and A166, commands are provided to copy either the previous frame or the subsequent frame, respectively. The copied frame replaces the frame the cursor is presently pointing to (these whole-frame operations are sketched after this list).
  • the last command in this group is merely to accept the frame and advance the cursor which is illustrated by block A168.
  • a further group of commands for the editing mode allow the actual parameters of a single frame to be edited and changed. These include the commands found in blocks A170, A172, A174, A176 and A178. Initially, when the cursor is moved or advanced, it points to the energy parameter, the first field or column in a particular line. The command represented by block A170 allows the cursor to move field by field along the line so as to point to individual parameters. The command in block A172 provides a left tab lock for a particular column such that when the cursor is advanced, instead of pointing to the energy parameter it will advance to the same column where the tab is locked when the line is changed. Conversely, in block A174 there is a command allowing the editor to unlock that left tab and have the cursor return to the first field position of each line.
  • the commands in blocks A176 and A178 allow the parameter pointed to by the cursor to be decremented or incremented, respectively.
  • once speech data has been placed in the prefix buffer 309 or the suffix buffer 311, a group of commands represented by blocks A182, A184, and A186 may be used to link that data with the data presently in the frame buffer and speak the combined result.
  • the command represented in block A184 allows the editor to speak the frame buffer preceded by the speech data of the prefix buffer, while the command represented in block A186 allows the editor to speak the frame buffer followed by the speech data of the suffix buffer.
  • the command represented in block A182 allows the editor to listen to the frame data preceded by the prefix buffer data and followed by the suffix buffer data (a small concatenation sketch appears after this list).
  • the last command in the editing mode is the quit command as illustrated in block A180.
  • the quit command ends the edit mode and produces a transfer of control back to the command sequence of the editing routine where blocks A100-A134 may be selected.
  • the soundtrack formation program basically takes those phrases which have been stored and edited on the phrase storage discs 326 and provides a means for an operator to choose which phrases he will collect onto selected phrase storage discs 328. During the collection process from the phrase storage discs 326, the selected phrases may be arranged and thus put into the order in which they are to be played back in the soundtrack. Once the arrangement of the selected phrases is produced on one or more selected phrase storage discs 328, they are concatenated on a soundtrack storage disc 330 by linking the contents of a number of the selected phrase storage discs 328.
  • because a soundtrack can be up to 605 phrases in length, concatenating them into a soundtrack requires an extended RAM buffer (a minimal soundtrack-assembly sketch appears after this list).
  • the auxiliary DRAM of the speech processor 34 is used in the soundtrack formation process to assemble the quantity of information required by a soundtrack.
  • the soundtrack, once completed and stored in the speech processor 34, can be spoken to provide a final assurance of accuracy and then transferred to the soundtrack storage disk 330.
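
The columnar frame listing described for block A102 and FIGURE 13 can be illustrated with a very small sketch. The code below is not the original program (which ran on an Apple personal computer); the Frame class, the field widths, the 19-line page size as a default, and the use of Python are assumptions made only to show the energy / pitch / K1-K10 layout of one line per frame.

    # A minimal sketch (assumed layout, not the patent's code) of the
    # "list frame data" display: one line per frame, twelve columns --
    # energy, pitch, then filter coefficients K1 to K10.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Frame:
        energy: int
        pitch: int
        k: List[int] = field(default_factory=lambda: [0] * 10)  # K1..K10

    def list_frame_data(frames: List[Frame], lines_per_page: int = 19) -> str:
        """Render one page of frames in the columnar style of FIGURE 13."""
        labels = ["LN".rjust(2), "EN".rjust(3), "PT".rjust(3)]
        labels += [f"K{i}".rjust(3) for i in range(1, 11)]
        rows = ["  ".join(labels)]
        for n, f in enumerate(frames[:lines_per_page], start=1):
            cols = [f"{n:2d}", f"{f.energy:3d}", f"{f.pitch:3d}"]
            cols += [f"{c:3d}" for c in f.k]
            rows.append("  ".join(cols))
        return "\n".join(rows)

    print(list_frame_data([Frame(energy=12, pitch=40, k=list(range(1, 11)))]))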
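The kind of error the bar-graph display makes visible at 350, an energy value that is missing or far out of line with its voiced neighbours, can also be found by a simple numeric scan. The window size and threshold below are illustrative assumptions, not values taken from the patent.

    # A hedged sketch: flag frames whose energy deviates sharply from the
    # local average of the surrounding frames (cf. the discontinuity at 350).
    from typing import List

    def find_energy_discontinuities(energy: List[int],
                                    window: int = 2,
                                    ratio: float = 0.5) -> List[int]:
        suspects = []
        for i in range(window, len(energy) - window):
            neighbours = energy[i - window:i] + energy[i + 1:i + 1 + window]
            local_avg = sum(neighbours) / len(neighbours)
            if local_avg > 0 and abs(energy[i] - local_avg) > ratio * local_avg:
                suspects.append(i)
        return suspects

    # frame 3 has lost its energy value
    print(find_energy_discontinuities([10, 11, 10, 0, 11, 10, 10]))  # -> [3]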
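The ten-frame "speak window" of blocks A138-A142 (three frames before the cursor, the present frame, and six frames after it) reduces to a small index computation. Clipping the window at the ends of the phrase is an assumption about behaviour the patent does not spell out.

    # Sketch of the speak-window bounds: 3 frames back, the cursor frame,
    # and 6 frames forward, clipped to the length of the phrase.
    from typing import Tuple

    def window_bounds(cursor: int, n_frames: int,
                      before: int = 3, after: int = 6) -> Tuple[int, int]:
        start = max(0, cursor - before)
        end = min(n_frames, cursor + after + 1)
        return start, end      # frames[start:end] would be sent to the synthesizer

    print(window_bounds(cursor=10, n_frames=40))  # -> (7, 17): ten frames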
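The whole-frame editing operations of blocks A154-A166 are, at bottom, simple list operations on the frames of a phrase. In the sketch below each frame is a plain list of twelve parameters; the coefficient set used for a silent transition is system-specific (the SDS-50 manual is cited for it above), so SILENT_KS is only a placeholder.

    # Whole-frame edits as list operations (illustrative only).
    from copy import deepcopy
    from typing import List

    SILENT_KS = [0] * 10                  # placeholder, not the real SDS-50 set

    def zero_frame(frames: List[List[int]], cur: int) -> None:
        frames[cur] = [0] * 12            # block A156: zero the line

    def insert_silence(frames: List[List[int]], cur: int) -> None:
        frames.insert(cur, [0, 0] + list(SILENT_KS))      # block A154

    def delete_frame(frames: List[List[int]], cur: int) -> None:
        del frames[cur]                   # block A160: no placeholder remains

    def duplicate_frame(frames: List[List[int]], cur: int) -> None:
        frames.insert(cur, deepcopy(frames[cur]))         # block A162

    def copy_neighbour(frames: List[List[int]], cur: int, offset: int) -> None:
        frames[cur] = deepcopy(frames[cur + offset])      # A164: offset=-1, A166: offset=+1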
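The pack and unpack commands (blocks A124 and A126) convert between editable frame data and the packed codepack form that is stored and spoken. The real LPC codepack uses variable bit widths per parameter; the fixed six-bit fields below are an assumption used only to show the round trip between the two representations.

    # Simplified pack/unpack: frame data <-> a bit-packed byte stream.
    from typing import List

    BITS = 6                              # illustrative field width only

    def pack(frame_data: List[List[int]]) -> bytes:
        bits = "".join(f"{p & (2**BITS - 1):0{BITS}b}"
                       for frame in frame_data for p in frame)
        bits += "0" * (-len(bits) % 8)    # pad to a whole byte
        return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

    def unpack(codepack: bytes, params_per_frame: int = 12) -> List[List[int]]:
        bits = "".join(f"{b:08b}" for b in codepack)
        values = [int(bits[i:i + BITS], 2)
                  for i in range(0, len(bits) - BITS + 1, BITS)]
        return [values[i:i + params_per_frame]
                for i in range(0, len(values) - params_per_frame + 1, params_per_frame)]

    phrase = [[10, 40] + list(range(1, 11))]   # energy, pitch, K1..K10
    assert unpack(pack(phrase)) == phrase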
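Blocks A133/A135 copy the codepack buffer into the smaller prefix buffer 309 or suffix buffer 311, truncating whatever does not fit, and blocks A182-A186 then speak the frame-buffer phrase with that context around it. The buffer capacities below are assumptions; the description states only that the two buffers are smaller than the codepack buffer.

    # Prefix/suffix context: copy-with-truncation, then concatenate to speak.
    PREFIX_SIZE = 1024                    # assumed capacity
    SUFFIX_SIZE = 1024                    # assumed capacity

    def copy_to_context(codepack: bytes, capacity: int) -> bytes:
        return codepack[:capacity]        # overflow is truncated (A133 / A135)

    def linked_speech(frame_codepack: bytes,
                      prefix: bytes = b"", suffix: bytes = b"") -> bytes:
        return prefix + frame_codepack + suffix   # data sent to the synthesizer (A182-A186)

    prefix = copy_to_context(b"\x01" * 2000, PREFIX_SIZE)
    print(len(linked_speech(b"\x02" * 300, prefix=prefix)))  # -> 1324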
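Soundtrack formation arranges the selected, edited phrases in playback order and concatenates them into one large buffer (the auxiliary DRAM of speech processor 34) before the result is saved to the soundtrack storage disk 330. The 605-phrase limit is stated above; the index of start offsets is an assumed detail added so that individual phrases could still be addressed.

    # Sketch of soundtrack assembly: concatenate phrases, record start offsets.
    from typing import List, Tuple

    MAX_PHRASES = 605

    def build_soundtrack(selected: List[bytes]) -> Tuple[bytes, List[int]]:
        if len(selected) > MAX_PHRASES:
            raise ValueError("a soundtrack can hold at most 605 phrases")
        offsets, soundtrack = [], bytearray()
        for phrase in selected:
            offsets.append(len(soundtrack))
            soundtrack.extend(phrase)
        return bytes(soundtrack), offsets

    track, index = build_soundtrack([b"\x10" * 50, b"\x20" * 80, b"\x30" * 20])
    print(index)                          # -> [0, 50, 130]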

Abstract

A system for generating a soundtrack of digitally encoded speech from analog waveforms representing spoken words. The speech, represented by an analog waveform, is transformed phrase by phrase into a digital linear predictive code (19). An interactive personal-computer-based system is used to identify each transformed phrase and to store the phrases on a plurality of diskettes during an acquisition operation (21). Once the phrases have been collected, individual disks can be processed by another application program which performs a preparation function (3). Detailed processing allows the system to locate and change any parameter in any frame of a phrase, providing the capability for extensive modifications. The processing also includes means for rapid review of frame and field parameters, allowing automatic scanning for errors. After modification, a frame of a phrase can be placed back into the phrase and another frame can be checked. The invention further comprises means for selecting certain phrases from among all of the stored phrases, as well as means for storing the selected results on diskettes (25). In addition, means are provided for rearranging the selected phrases into the order desired for a soundtrack. From the selected phrases, a soundtrack is prepared by producing a concatenation of all the selected phrases (9).
EP19860902100 1985-02-25 1986-02-24 Systeme d'acquisition et de preparation des donnees de parole. Withdrawn EP0214274A4 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US70551485A 1985-02-25 1985-02-25
US705514 1985-02-25

Publications (2)

Publication Number Publication Date
EP0214274A1 true EP0214274A1 (fr) 1987-03-18
EP0214274A4 EP0214274A4 (fr) 1987-07-30

Family

ID=24833818

Family Applications (1)

Application Number Title Priority Date Filing Date
EP19860902100 Withdrawn EP0214274A4 (fr) 1985-02-25 1986-02-24 Systeme d'acquisition et de preparation des donnees de parole.

Country Status (4)

Country Link
EP (1) EP0214274A4 (fr)
JP (1) JPS62501938A (fr)
AU (1) AU5549686A (fr)
WO (1) WO1986005025A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2880592B2 (ja) * 1990-10-30 1999-04-12 インターナショナル・ビジネス・マシーンズ・コーポレイション 複合音声情報の編集装置および方法
US5600756A (en) * 1994-05-11 1997-02-04 Sony Corporation Method of labelling takes in an audio editing system
US20010025289A1 (en) * 1998-09-25 2001-09-27 Jenkins Michael D. Wireless pen input device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4406626A (en) * 1979-07-31 1983-09-27 Anderson Weston A Electronic teaching aid

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4150429A (en) * 1974-09-23 1979-04-17 Atex, Incorporated Text editing and display system having a multiplexer circuit interconnecting plural visual displays
US4193112A (en) * 1976-01-22 1980-03-11 Racal-Milgo, Inc. Microcomputer data display communication system with a hardwire editing processor
GB2059203B (en) * 1979-09-18 1984-02-29 Victor Company Of Japan Digital gain control
JPS5774799A (en) * 1980-10-28 1982-05-11 Sharp Kk Word voice notifying system
US4398059A (en) * 1981-03-05 1983-08-09 Texas Instruments Incorporated Speech producing system

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4406626A (en) * 1979-07-31 1983-09-27 Anderson Weston A Electronic teaching aid

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
ELECTRONIC DESIGN, vol. 29, no. 17, 20th August 1981, pages 107-112, Waseca, MN, US; T. BRIGHTMAN: "Speech-synthesizer software generated from text or speech" *
ELECTRONICS INTERNATIONAL, vol. 55, no. 17, 25th August 1982, pages 68,70, New York, US; J. GOSCH: "Voice-synthesizer editor displays speech as curves easily alterable by keyboard" *
ELECTRONIQUE INDUSTRIELLE, no. 59, 15th October 1983, pages 65-68, Paris, FR; C. GROSS: "Développement d'un vocabulaire de synthèse de la parole avec Centigram" *
IBM TECHNICAL DISCLOSURE BULLETIN, vol. 12, no. 5, October 1969, pages 640-642, New York, US; R. BAKIS: "Improving the fundamental frequency contour in speech synthesis" *
ICASSP 80 PROCEEDINGS, IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH & SIGNAL PROCESSING SOCIETY, 9th-11th April 1980, Denver, Colorado, vol. 2, pages 402-405, IEEE, New York, US; D.E. MORRIS et al.: "A new speech synthesis chip set" *
See also references of WO8605025A1 *

Also Published As

Publication number Publication date
WO1986005025A1 (fr) 1986-08-28
JPS62501938A (ja) 1987-07-30
EP0214274A4 (fr) 1987-07-30
AU5549686A (en) 1986-09-10

Similar Documents

Publication Publication Date Title
CN101171624B (zh) 语音合成装置及语音合成方法
US20030036911A1 (en) Translating method and machine for recording translation of a list of phrases and playing back translations lof phrases as selected by an operator
JPH05181491A (ja) 音声合成装置
KR101108003B1 (ko) 사용자 단어검색 이력을 통한 학습컨텐츠 제공 시스템
EP3462443A1 (fr) Procédé d'assistance d'édition de voix chantée et dispositif assistant d'édition de voix chantée
CN1813285B (zh) 语音合成设备和方法
EP0047175B2 (fr) Synthétiseur de parole
US4587635A (en) Information retrieval system equipped with video disk
EP0214274A1 (fr) Systeme d'acquisition et de preparation des donnees de parole
US5852802A (en) Speed engine for analyzing symbolic text and producing the speech equivalent thereof
JP2005326811A (ja) 音声合成装置および音声合成方法
Gibson Using digitized auditory stimuli on the Macintosh computer
JPH0554960B2 (fr)
JP4189653B2 (ja) 画像記録再生方法および画像記録再生装置
JPS58160993A (ja) 文書編集装置の音声確認方法
JP2005070604A (ja) 音声ラベリングエラー検出装置、音声ラベリングエラー検出方法及びプログラム
Bunta et al. Bridging the digital divide: Aspects of computerized data collection and analysis for language professionals
JP2868256B2 (ja) プログラム編集装置
Mac Lochlainn Sintéiseoir 1.0: a multidialectical TTS application for Irish
JP3949546B2 (ja) 語学教材データ生成方法
JPS60205599A (ja) 音声登録方式
Draxler Speech databases
JPH02251998A (ja) 音声合成装置
Akbar Waveedit, an interactive speech processing environment for microsoft windows platform.
JPH0736905A (ja) テキスト音声変換装置

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 19861013

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH DE FR GB IT LI LU NL SE

A4 Supplementary search report drawn up and despatched

Effective date: 19870730

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 19871015

RIN1 Information on inventor provided before grant (corrected)

Inventor name: MORGAN, ROBERT, LEE

Inventor name: MILLER, RICKY, LEE

Inventor name: PFEIFFER, JAMES, EDWARD

Inventor name: RAYMOND, WILLIAM, JOSEPH