GB2250405A - Speech analysis and image synthesis - Google Patents

Speech analysis and image synthesis

Info

Publication number
GB2250405A
GB2250405A (Application GB9119492A)
Authority
GB
United Kingdom
Prior art keywords
speech
mouth
store
data
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
GB9119492A
Other versions
GB9119492D0 (en)
Inventor
Alison Diane Simons
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
British Telecommunications PLC
Original Assignee
British Telecommunications PLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by British Telecommunications PLC filed Critical British Telecommunications PLC
Publication of GB9119492D0
Publication of GB2250405A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/205 3D [Three Dimensional] animation driven by audio data
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G10L21/10 Transforming into visible information
    • G10L2021/105 Synthesis of the lips movements from speech, e.g. for talking heads
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/20 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using video object coding

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Successive frames of speech (at 1) are analysed (2, 3, 4) to produce a sequence of codewords identifying the character of each frame. A store 6 stores probability values Pei indicating the probability that any codeword was produced by one of a set of standard 'mouth shapes', whilst a second store 7 stores values tij indicating the probability of one mouth shape following another. A Viterbi decoder (8) examines the sequence of codewords and, using the probabilities, estimates the most likely sequence of mouth shapes to correspond to the speech. This can be used to generate a synthetic "talking face" moving image, e.g. for videophone or audio-conferencing applications.

Description

SPEECH ANALYSIS AND IMAGE SYNTHESIS
The present application relates to the analysis of speech, and more particularly to the analysis of speech to estimate the visual appearance of the mouth by which the speech is uttered. One specific application of such analysis is the synthesis, on the basis of an input speech signal, of a moving image of a human face for display to accompany that speech. Such synthesis may be desired for a video terminal in low bit-rate transmission systems, or for enhanced audio-conferencing facilities.
In our European patent application no. 86308732.6 (Publication No. 0225729A) we describe an apparatus for synthesis of a moving image which has a store for an image of a face, and a store for storing a set of data blocks each corresponding to the mouth area of the face and representing a respective different mouth shape. In operation, an input audio signal is analysed to produce sequences of spectral parameters which are then used to access a table relating these parameters to codewords identifying mouth data blocks, the codewords obtained being employed to select the corresponding mouth data blocks for output to a display device.
The present invention is defined in the claims.
Some embodiments of the invention will now be described with reference to the accompanying drawings, in which:
Figure 1 is a block diagram of one form of speech analysis apparatus in accordance with the invention;
Figure 2 is a schematic diagram illustrating the operation of the apparatus; and
Figure 3 shows an apparatus for synthesising a moving image, incorporating the speech analysis apparatus of Figure 1.
The purpose of the speech analysis apparatus shown in Figure 1 is to receive an input speech signal at an input 1, analyse it, and estimate at intervals the mouth shape which was most likely to have produced that portion of speech. The output of the apparatus is a sequence of codewords each of which identifies an entry in a codebook of mouth shapes. In this example, the codebook is assumed to contain 16 entries. The actual mouth shapes are not stored in the apparatus of Figure 1.
In the embodiment shown, the speech is first sampled at 8 kHz and converted into digital form in an analogue-to-digital converter 2, then processed by an LPC analysis unit 3 to produce, for successive frames of speech (e.g. of 20 ms duration), a set of eight LPC coefficients defining a filter having a spectral response similar to that of the speech frame. Any of the conventional LPC analysis methods commonly used for LPC speech coders may be employed for this purpose. The coefficients are vector quantised in a VQ unit 4, which matches each set of coefficients to the nearest entry in a codebook of (e.g. sixty-four) coefficient sets stored in a speech codebook store 5. This process is again conventional; for example, the entry chosen may be that for which the City Block distance (viz. the sum of the moduli of the inter-coefficient differences) between the actual set and the stored set is a minimum.
The use of vector quantised LPC coefficients is one possible example; LPC cepstral coefficients, or extraction of other speech features, as is common in speech recognition systems, may alternatively be employed.
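By way of illustration, the City Block matching just described might be sketched as follows; the array names and the random example data are purely illustrative and not part of the patent.

```python
import numpy as np

def quantise_frame(lpc_coeffs, speech_codebook):
    """Return the index (speech codeword) of the codebook entry nearest to a
    frame's LPC coefficient set, using the City Block (L1) distance, i.e. the
    sum of the moduli of the inter-coefficient differences."""
    lpc_coeffs = np.asarray(lpc_coeffs)            # shape (8,)
    speech_codebook = np.asarray(speech_codebook)  # shape (64, 8)
    distances = np.abs(speech_codebook - lpc_coeffs).sum(axis=1)
    return int(np.argmin(distances))

# Example: one frame's eight LPC coefficients against a dummy 64-entry codebook.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(64, 8))
frame = rng.normal(size=8)
e = quantise_frame(frame, codebook)   # speech codeword for this 20 ms frame
```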
The apparatus also includes a store 6 containing probability values Pei each of which indicates the relative probability that the speech represented by codeword e originated from a mouth having the shape represented by codeword i. In this example, the store has 16 x 64 = 1024 entries.
A further store 7 contains transition probability values tij each of which indicates the relative probability that mouth shape i is followed by mouth shape j; thus it has 16 x 16 = 256 entries.
The mouth codebook, the speech codebook, and the probability values are generated by a training process, analysing a video signal and its accompanying speech. In a test, fifty sentences totalling 200 seconds were used, thereby generating 10,000 frames of speech data and (at 25 frames/second) 5,000 frames of video data.
The speech codebook is generated by selecting that set of 64 coefficient sets which substantially minimises the average distance between a training frame and the nearest entry in the set. Similarly, the mouth codebook was generated by selecting that set of sixteen mouth shapes which substantially minimises the average distance between mouth portions of a training video frame and the nearest entry in the set. The distance measure used was simply the City Block distance between two mouths as represented by the height and width of the mouth opening, but naturally more sophisticated measures could be used if desired.
The mouth/speech probability values Pei are generated by, for each frame, matching the video and speech frames to the nearest codebook entries; the number of occurrences of each pair is recorded. For this part of the training, each video frame was repeated once (so that each 40 ms video frame aligned with two 20 ms speech frames). Likewise, the transitional mouth probability values tij are obtained by counting the number of occurrences of mouth shapes i and j appearing consecutively.
If any particular event did not occur in the training sequence, the corresponding probability value was set to a small value rather than zero.
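A minimal sketch of this counting stage follows, under the assumption that the training material has already been reduced to parallel per-frame sequences of speech codewords (0-63) and mouth-shape codewords (0-15); the names and the floor value are illustrative, since the patent only says "a small value".

```python
import numpy as np

N_SPEECH, N_MOUTH, FLOOR = 64, 16, 1e-6

def train_probabilities(speech_codes, mouth_codes):
    """Estimate P[e, i] (speech codeword e given mouth shape i) and
    t[i, j] (mouth shape i followed by shape j) from aligned training
    sequences, replacing events never seen in training by a small floor."""
    P = np.zeros((N_SPEECH, N_MOUTH))
    t = np.zeros((N_MOUTH, N_MOUTH))
    for e, i in zip(speech_codes, mouth_codes):
        P[e, i] += 1                      # speech/mouth co-occurrence counts
    for i, j in zip(mouth_codes[:-1], mouth_codes[1:]):
        t[i, j] += 1                      # consecutive-shape counts
    # Normalise (per mouth shape) and floor zero counts rather than leaving them at 0.
    P = P / np.maximum(P.sum(axis=0, keepdims=True), 1)
    t = t / np.maximum(t.sum(axis=1, keepdims=True), 1)
    return np.maximum(P, FLOOR), np.maximum(t, FLOOR)
```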
Returning to Figure 1, the apparatus includes a Viterbi algorithm unit 8. Its purpose is, for a passage of speech containing N frames, to determine the most likely sequence of mouth shapes having regard to (1) the observed speech as represented by the codewords e(n) (n = 1 ... N) output by the VQ unit 4, (2) the speech/mouth shape probability values Pei stored in the store 6, and (3) the transitional probability values tij stored in the store 7. This process is illustrated schematically in Figure 2.
The application of the Viterbi algorithm to the analysis of the speech information will now be described.
For an utterance (or other portion) of speech, a sequence of speech codewords has been produced, one for each frame of speech. The basic procedure is to calculate, for each frame in succession, the probability that that frame resulted from each of the permitted mouth shapes, taking into account the speech codeword for that frame, the calculated probabilities for the preceding frame, and the stored probability values. When the end of the sentence is reached, the mouth shape associated with the largest of the calculated probabilities is chosen for that speech frame, whereupon one then revisits successive preceding frames and makes a similar decision, taking into account the decision already made in respect of the following frame.
We recall that the probability that a particular speech frame having codeword e was generated by mouth shape i is Pei, and that the probability of mouth shape i being followed by shape j is tij, these values being stored in the stores 6 and 7. We define Pi(n), for the nth frame, as the calculated probability that that frame resulted from mouth shape i.
We commence by finding Pi(1) (i = 0 ... 15) for the first frame. There is no previous frame, so we estimate these probabilities on the basis of the codeword e(1) for that frame. Thus: Pi(1) = Pe(1)i (i = 0 ... 15).
For the second frame, we first apply the stored transitional probability values tij to the calculated probabilities Pi(1) from frame 1. For each candidate second-frame mouth shape j, we multiply Pi(1) by the corresponding transitional probability:

Tij(2) = Pi(1) tij   (i = 0 ... 15)

We then select the largest of these, Tmax j(2), noting also the value of i associated with it, which we write imax(2, j). The significance of this is that if mouth shape j is the shape chosen for frame 2, then shape imax(2, j) is the most likely one to precede it in frame 1, having regard to the calculated probabilities and the transitional probability values tij. Having found all these maxima Tmax j(2) (j = 0 ... 15), we then use the frame 2 codeword e(2) to obtain the probability values Pe(2)j from the store 6 and multiply the two to obtain

Pj(2) = Tmax j(2) Pe(2)j

which are the calculated probabilities for the second frame.
This process is repeated for successive frames until we have found Pi(N) for the last frame of the utterance. At this point the first actual decision is made: the mouth shape I(N) associated with the largest of the set Pi(N) is chosen, where I(N) denotes the associated value of i.
Recalling that each time we selected the maximum value of Tij(n) we recorded the previous-frame mouth shape imax(n-1, j) associated with the choice, we can now go back to the penultimate frame N-1 and deduce that this implies selection of shape imax(N-1, I(N)) for it; tracing back in the same way, frame by frame, yields a mouth shape for every frame of the utterance.
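A compact sketch of this decoding pass over a whole utterance follows, reusing the P and t arrays of the training sketch above; the variable names are illustrative, not the patent's. The forward pass keeps, for each frame and mouth shape, the best running probability and the best predecessor, and the traceback then recovers the shape sequence.

```python
import numpy as np

def viterbi_mouth_shapes(speech_codes, P, t):
    """speech_codes: e(1)..e(N); P[e, i] = Pei; t[i, j] = tij.
    Returns the most likely sequence of mouth-shape indices."""
    N, n_mouth = len(speech_codes), t.shape[0]
    prob = np.zeros((N, n_mouth))              # Pi(n)
    prev = np.zeros((N, n_mouth), dtype=int)   # imax(n-1, j)
    prob[0] = P[speech_codes[0]]               # Pi(1) = Pe(1)i: no previous frame
    for n in range(1, N):
        T = prob[n - 1][:, None] * t           # Tij(n) = Pi(n-1) * tij
        prev[n] = T.argmax(axis=0)             # best predecessor for each shape j
        prob[n] = T.max(axis=0) * P[speech_codes[n]]   # Tmax j(n) * Pe(n)j
    # Traceback: choose I(N), then follow the recorded predecessors.
    shapes = [int(prob[-1].argmax())]
    for n in range(N - 1, 0, -1):
        shapes.append(int(prev[n][shapes[-1]]))
    return shapes[::-1]
```

In practice one would work with log probabilities to avoid numerical underflow on long utterances; the patent does not prescribe this detail.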
In the above description, the calculated probability value for the first frame was estimated simply on the basis of the received speech, since there was no previous frame. In a modification, a further store 9 is included which contains, for each mouth shape, the probability of its occurring at the beginning of an utterance. These values are then used for the first frame in the same way that the quantities Tmax i(n) are used for later frames.
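In terms of the sketch above, this modification would amount to changing only the first-frame initialisation; initial_shape_probs here is a hypothetical 16-element array read from the further store 9.

```python
import numpy as np

def initial_frame_probabilities(first_codeword, initial_shape_probs, P):
    """Pi(1) when a store of start-of-utterance shape probabilities is used:
    combine those stored values with the observation probabilities Pe(1)i."""
    return np.asarray(initial_shape_probs) * P[first_codeword]
```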
This description assumes that the application of the Viterbi algorithm is performed after the whole of an utterance (e.g. a sentence) has been received. Where it is desired to perform the analysis in real time this may involve an undesirable delay, and a modified approach may be preferred. Since the algorithm relies upon the history of the speech over a period, some delay is inherent; however, tests indicate that traceback over a period greater than 200 ms is of little value.
In the modified method, assuming operation over a window m frames in length, suppose we start analysis at frame n, the mouth shape for frame n-1 having already been fixed (unless n = 1, in which case that frame is dealt with as already discussed for the first frame). As frame n-1 is fixed, the previous-frame probability Pi(n-1) is unity for the selected shape and zero for all other values of i. The above procedure is then followed up to frame n+m, the speech codeword for which has just been generated by the VQ unit 4, with traceback to frame n. The mouth shape for frame n is then fixed, and the decisions for later frames discarded. When frame n+m+1 is available, the process is then repeated, starting at frame n+1 and extending up to frame n+m+1.
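The sliding-window variant might be sketched as follows (again with illustrative names); each call fixes only the oldest frame in the window, so the decision delay is bounded by the window length, e.g. ten 20 ms frames for roughly the 200 ms traceback horizon mentioned above.

```python
import numpy as np

def fix_window_start(window_codes, fixed_prev_shape, P, t):
    """window_codes: speech codewords for frames n .. n+m.
    fixed_prev_shape: shape already decided for frame n-1, or None if n = 1.
    Returns the mouth shape fixed for frame n; later decisions are discarded."""
    length, n_mouth = len(window_codes), t.shape[0]
    prob = np.zeros((length, n_mouth))
    prev = np.zeros((length, n_mouth), dtype=int)
    if fixed_prev_shape is None:
        prob[0] = P[window_codes[0]]           # start of utterance
    else:
        # Pi(n-1) is unity for the fixed shape and zero elsewhere, so frame n
        # only weighs transitions out of that shape.
        prob[0] = t[fixed_prev_shape] * P[window_codes[0]]
    for k in range(1, length):
        T = prob[k - 1][:, None] * t
        prev[k] = T.argmax(axis=0)
        prob[k] = T.max(axis=0) * P[window_codes[k]]
    # Trace back from the provisional best shape at frame n+m down to frame n.
    j = int(prob[-1].argmax())
    for k in range(length - 1, 0, -1):
        j = int(prev[k][j])
    return j
```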
Figure 3 shows an apparatus for synthesis of a moving image including a human face having a mouth which moves in correspondence with a speech signal received at an input 1. The input 1 is connected to the input of an analysis apparatus 100, which is of the structure described above and shown in Figure 1.
A store 101 contains data representing a stored image of a face; this may be for example a stored digital representation of a raster-scan television picture, the store being a conventional video frame store.
A second store 102 stores sixteen data blocks each being a digital representation of a mouth having a respective one of the mouth shapes discussed above.
For the purposes of the present description it is assumed that each block is stored in pixel map form, i.e. a mouth could be superimposed on the face stored in the store 101 simply by writing each picture element from the block into the appropriate location in the store 101. However, other, parametric, methods of mouth representation could be used.
The analysis apparatus 100 produces at its output, every 20 ms, a codeword identifying one of the data blocks in the store 102. An output unit 103 reads the face data from the store 101 every 20 ms to form a raster-scan television signal; when, however, it requires picture data in respect of a portion of the picture area corresponding to the mouth, it reads the data instead from the portion of the mouth data store 102 identified by the codeword supplied by the analysis apparatus 100, so that the desired mouth shape is incorporated into the image. The video signal output on an output line 104 can be displayed on a video monitor 105.
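A minimal sketch of that compositing step, assuming pixel-map images held as NumPy arrays and an illustrative rectangle (top, left, height, width) describing where the mouth region sits in the face image; none of these names or dimensions come from the patent.

```python
import numpy as np

def compose_frame(face_image, mouth_blocks, codeword, mouth_rect):
    """Return one output video frame: the stored face image with the mouth
    region overwritten by the mouth data block selected by `codeword`."""
    top, left, height, width = mouth_rect
    frame = face_image.copy()
    frame[top:top + height, left:left + width] = mouth_blocks[codeword]
    return frame

# Example with dummy data: a 288x352 grey face and 16 stored 64x96 mouth blocks.
face = np.full((288, 352), 128, dtype=np.uint8)
mouths = [np.full((64, 96), v, dtype=np.uint8) for v in range(0, 256, 16)]
frame = compose_frame(face, mouths, codeword=5, mouth_rect=(180, 128, 64, 96))
```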
This description assumes a video rate of 50 frames per second; in order to generate a signal at the UK standard (System I) rate of 25 frames per second one could use a 40 ms speech frame period, but if it is preferred to retain the 20 ms analysis period then the frame rate could be reduced by omitting alternate mouth shapes, or (more preferably, to avoid aliasing) by temporally filtering the mouth image sequence and then discarding alternate images. Obviously other video frame rates, such as the 30 frames/second used in System M, can be achieved by similar adjustments.
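The temporal-filtering route might look like the following sketch, in which the "filter" is simply a pairwise average of successive mouth images; the patent does not specify a particular filter, so this is only one plausible choice.

```python
import numpy as np

def halve_frame_rate(mouth_images):
    """Reduce a 50 frames/s mouth image sequence to 25 frames/s: temporally
    filter (here, average each pair of successive images) and then discard
    alternate images."""
    filtered = [((a.astype(np.uint16) + b.astype(np.uint16)) // 2).astype(np.uint8)
                for a, b in zip(mouth_images[:-1], mouth_images[1:])]
    return filtered[::2]
```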

Claims (9)

1. A speech analysis apparatus comprising means for analysing a speech signal to generate at intervals items of speech information each representative of the speech during that interval; first store means storing, for each possible item of speech information, data representing the probabilities that a particular portion of speech corresponding to that item of information has been generated by a mouth having each of a predetermined plurality of mouth shapes; second store means storing, for each possible one of the plurality of mouth shapes, data representing the probabilities that that shape is followed by that shape and by each other one of those shapes; and decoding means responsive to the generated items of speech information, to the probability data stored in said first store means in respect of those generated items, and to the probability data stored in the second store means to determine a sequence of mouth shapes deemed to be substantially most likely to correspond to the said generated items.
2. An apparatus according to claim 1 in which the speech analysis means comprises (a) means to analyse each of successive frames of speech to produce therefor a set of parameters representing the spectral content thereof; (b) a fourth store storing a plurality of reference sets of parameters; and (c) means to determine, for each produced parameter set, which of the stored sets it most closely resembles; the said items of speech information being codewords identifying the determined reference sets.
3. An apparatus according to claim 2 in which the parameters are the coefficients of a linear prediction filter.
4. An apparatus according to claim 1, 2 or 3 in which the decoding means is a Viterbi decoder.
5. An apparatus according to any one of the preceding claims including further store means storing for each of the plurality of mouth shapes data representing the probability of its occurrence at the commencement of an utterance, the decoding means being responsive also to the data stored in the further store means.
6. An apparatus for synthesis of a moving image, comprising (a) means for storage and output of data representing an image of a face; (b) means for storage and output of a set of mouth data blocks each representing an image of a mouth having a respective one of the said shapes; (c) a speech signal input; (d) a speech analysis apparatus according to any one of the preceding claims to receive the speech input; and (e) control means responsive to the output of the speech analysis apparatus to select for output from the mouth data storage means the data blocks corresponding to the determined sequence.
7. An apparatus according to claim 6 further including video signal generating means operable to generate video frames each representing a face image corresponding to the stored face data having superimposed thereon a mouth image represented by a said selected mouth data block.
8. An apparatus for speech analysis substantially as herein described with reference to the accompanying drawings.
9. An apparatus for synthesis of a moving image substantially as herein described with reference to the accompanying drawings.
GB9119492A 1990-09-11 1991-09-11 Speech analysis and image synthesis Withdrawn GB2250405A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
GB909019829A GB9019829D0 (en) 1990-09-11 1990-09-11 Speech analysis and image synthesis

Publications (2)

Publication Number Publication Date
GB9119492D0 GB9119492D0 (en) 1991-10-23
GB2250405A true GB2250405A (en) 1992-06-03

Family

ID=10682008

Family Applications (2)

Application Number Title Priority Date Filing Date
GB909019829A Pending GB9019829D0 (en) 1990-09-11 1990-09-11 Speech analysis and image synthesis
GB9119492A Withdrawn GB2250405A (en) 1990-09-11 1991-09-11 Speech analysis and image synthesis

Family Applications Before (1)

Application Number Title Priority Date Filing Date
GB909019829A Pending GB9019829D0 (en) 1990-09-11 1990-09-11 Speech analysis and image synthesis

Country Status (1)

Country Link
GB (2) GB9019829D0 (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0056507A1 (en) * 1981-01-19 1982-07-28 Richard Welcher Bloomstein Apparatus and method for creating visual images of lip movements
EP0179701A1 (en) * 1984-10-02 1986-04-30 Yves Guinet Television method for multilingual programmes
EP0225729A1 (en) * 1985-11-14 1987-06-16 BRITISH TELECOMMUNICATIONS public limited company Image encoding and synthesis
GB2231246A (en) * 1989-03-08 1990-11-07 Kokusai Denshin Denwa Co Ltd Converting text input into moving-face picture

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP0603809A3 (en) * 1992-12-21 1994-08-17 Casio Computer Co Ltd Object image display devices.
EP0603809A2 (en) * 1992-12-21 1994-06-29 Casio Computer Co., Ltd. Object image display devices
US5608839A (en) * 1994-03-18 1997-03-04 Lucent Technologies Inc. Sound-synchronized video system
EP0673170A2 (en) * 1994-03-18 1995-09-20 AT&T Corp. Video signal processing systems and methods utilizing automated speech analysis
EP0674315A1 (en) * 1994-03-18 1995-09-27 AT&T Corp. Audio visual dubbing system and method
EP0673170A3 (en) * 1994-03-18 1996-06-26 At & T Corp Video signal processing systems and methods utilizing automated speech analysis.
EP0689362A2 (en) * 1994-06-21 1995-12-27 AT&T Corp. Sound-synchronised video system
EP0689362A3 (en) * 1994-06-21 1996-06-26 At & T Corp Sound-synchronised video system
WO1996002898A1 (en) * 1994-07-18 1996-02-01 477250 B.C. Ltd. Process of producing personalized video cartoons
EP0710929A2 (en) * 1994-11-07 1996-05-08 AT&T Corp. Acoustic-assisted image processing
EP0710929A3 (en) * 1994-11-07 1996-07-03 At & T Corp Acoustic-assisted image processing
EP0734162A2 (en) * 1995-03-24 1996-09-25 Deutsche Thomson-Brandt Gmbh Communication terminal
EP0734162A3 (en) * 1995-03-24 1997-05-21 Thomson Brandt Gmbh Communication terminal
WO1997046974A1 (en) * 1996-06-03 1997-12-11 Pronier Jean Luc Device and method for transmitting animated and sound images
FR2749420A1 (en) * 1996-06-03 1997-12-05 Alfonsi Philippe METHOD AND DEVICE FOR FORMING MOVING IMAGES OF A CONTACT PERSON
EP0860811A3 (en) * 1997-02-24 1999-02-10 Digital Equipment Corporation Automated speech alignment for image synthesis
EP0860811A2 (en) * 1997-02-24 1998-08-26 Digital Equipment Corporation Automated speech alignment for image synthesis
US6385580B1 (en) 1997-03-25 2002-05-07 Telia Ab Method of speech synthesis
WO1998043236A3 (en) * 1997-03-25 1998-12-23 Telia Ab Method of speech synthesis
WO1998043235A3 (en) * 1997-03-25 1998-12-23 Telia Ab Device and method for prosody generation at visual synthesis
WO1998043236A2 (en) * 1997-03-25 1998-10-01 Telia Ab (Publ) Method of speech synthesis
WO1998043235A2 (en) * 1997-03-25 1998-10-01 Telia Ab (Publ) Device and method for prosody generation at visual synthesis
US6389396B1 (en) 1997-03-25 2002-05-14 Telia Ab Device and method for prosody generation at visual synthesis
EP0893923A1 (en) * 1997-07-23 1999-01-27 Texas Instruments France Video communication system
WO2000045380A1 (en) * 1999-01-27 2000-08-03 Bright Spark Technologies (Proprietary) Limited Voice driven mouth animation system
US7369992B1 (en) * 2002-05-10 2008-05-06 At&T Corp. System and method for triphone-based unit selection for visual speech synthesis
US7933772B1 (en) 2002-05-10 2011-04-26 At&T Intellectual Property Ii, L.P. System and method for triphone-based unit selection for visual speech synthesis
US9583098B1 (en) 2002-05-10 2017-02-28 At&T Intellectual Property Ii, L.P. System and method for triphone-based unit selection for visual speech synthesis

Also Published As

Publication number Publication date
GB9119492D0 (en) 1991-10-23
GB9019829D0 (en) 1990-10-24

Similar Documents

Publication Publication Date Title
US6330023B1 (en) Video signal processing systems and methods utilizing automated speech analysis
GB2250405A (en) Speech analysis and image synthesis
EP0225729B1 (en) Image encoding and synthesis
US5890120A (en) Matching, synchronization, and superposition on orginal speaking subject images of modified signs from sign language database corresponding to recognized speech segments
CN111816158B (en) Speech synthesis method and device and storage medium
EP1141939B1 (en) System and method for segmentation of speech signals
EP0453649B1 (en) Method and apparatus for modeling words with composite Markov models
KR880700387A (en) Speech processing system and voice processing method
Chen et al. Lip synchronization using speech-assisted video processing
CN108648745B (en) Method for converting lip image sequence into voice coding parameter
EP0731348A3 (en) Voice storage and retrieval system
SE470577B (en) Method and apparatus for encoding and / or decoding background noise
JPS59101700A (en) Method and apparatus for spoken voice recognition
JPH0527792A (en) Voice emphasizing device
Chen et al. Speech-assisted video processing: Interpolation and low-bitrate coding
JP2611728B2 (en) Video encoding / decoding system
JPH09198082A (en) Speech recognition device
FR2692070A1 (en) Variable speed voice synthesis method and device.
KR20040037099A (en) Viseme based video coding
Wrench A realtime implementation of a text independent speaker recognition system
JP3254696B2 (en) Audio encoding device, audio decoding device, and sound source generation method
JPS59111699A (en) Speaker recognition system
JP2709198B2 (en) Voice synthesis method
Shah et al. An image/speech relational database and its application
Shah et al. Lip synchronization through alignment of speech and image data

Legal Events

Date Code Title Description
WAP Application withdrawn, taken to be withdrawn or refused ** after publication under section 16(1)