CN1556496A

CN1556496A - Lip shape identifying sound generator

Info

Publication number: CN1556496A
Application number: CNA2003101220227A
Authority: CN
Inventors: 刚李; 李刚; 解国明; 林凌; 任惠茹
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2003-12-31
Filing date: 2003-12-31
Publication date: 2004-12-22

Abstract

The invention discloses a lip-shape recognition sound generator, connected as follows: a miniature camera is connected to an image acquisition unit; the output of the image acquisition unit is connected to a lip-shape image pattern recognition unit; the signal from the recognition unit is output to a speech synthesis unit, which is connected to a voice storage unit; the speech synthesis unit retrieves speech synthesis elements from the voice storage unit, synthesizes a sound signal, and outputs it to a sounding unit. A loudspeaker then emits the sound corresponding to the lip shape and its sequence of changes. By recognizing the speaker's lip shape, the device determines the content of speech, synthesizes that content as speech, and emits it through the loudspeaker in real time. It can help people who cannot produce sound because their larynx or vocal cords have been removed, and deaf-mute people who can form words with their lips, to produce speech, making it easier for them to communicate with people who speak normally.

Description

Lip shape identifying sound generator

Technical field

The present invention relates to a sound generator, and in particular to a lip-shape recognition sound generator.

Background technology

Clinically, many patients undergo resection of the larynx or vocal cords because of laryngeal or vocal cord disease. After surgery they cannot produce sound, which hinders their communication with people who speak normally. Deaf-mute people generally communicate by reading the other person's lips to determine what is being said, but they find it difficult to make others understand their own meaning. An instrument that combines lip image recognition with speech synthesis can help people who cannot produce sound to speak, and so remove the barrier between them and people who speak normally. At present, however, no instrument or technical solution exists that helps these patients and deaf-mute people produce speech and communicate conveniently with others.

Summary of the invention

The object of the present invention is to provide a sound-generating instrument that helps the above patients and deaf-mute people produce speech and communicate conveniently with people who speak normally. The invention recognizes the speaker's lip shape, determines the content of speech by pattern recognition, and then produces sound by speech synthesis. When a person speaks, most sounds of a language have a definite lip shape. The invention places the speaker's lip shapes in one-to-one correspondence with the "sounds" the speaker intends to produce, and emits those sounds through a loudspeaker using speech synthesis.
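The correspondence between lip shapes and intended sounds can be pictured as a lookup table. The Python fragment below is an illustrative sketch only: the lip-shape labels, the table LIP_TO_SOUND, and the function lips_to_sounds are assumptions introduced here and do not appear in the original text. It further assumes that an earlier stage has already assigned one lip-shape label to each video frame.

```python
from itertools import groupby

# Hypothetical table: lip-shape class -> intended sound (illustrative labels only).
LIP_TO_SOUND = {
    "open_wide": "a",
    "rounded": "o",
    "spread": "i",
    "closed": "m",
}

def lips_to_sounds(frame_labels):
    """Map per-frame lip-shape labels to a sequence of sounds.

    Consecutive identical labels are treated as one held lip shape, so the
    output follows the order in which the lip shape changes.
    Labels missing from the table are skipped.
    """
    held_shapes = [label for label, _ in groupby(frame_labels)]
    return [LIP_TO_SOUND[s] for s in held_shapes if s in LIP_TO_SOUND]

frames = ["closed", "closed", "open_wide", "open_wide", "closed", "open_wide"]
print(lips_to_sounds(frames))  # ['m', 'a', 'm', 'a']
```

Collapsing repeated labels is one simple way to make the output depend on the lip shape and its order of change rather than on how many frames each shape lasts.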

The present invention is realized by the following technical solution:

1. The speaker's lip image is captured by a camera and an image acquisition unit.

2. The lip image is processed, lip features are extracted dynamically in real time, and a lip pattern recognition algorithm then determines the content of speech.

3. According to the pattern recognition result, the speech synthesis unit retrieves speech from the voice storage unit, synthesizes the content of speech, and emits it through the sounding unit.
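The three steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the recognition algorithm of the invention, which the text does not specify. Frames are assumed to be small binary grids in which 1 marks an open-mouth pixel, the features are assumed to be the width and height of the mouth opening, and recognition is assumed to be a nearest-template match. All names (extract_features, recognize, synthesize, run, TEMPLATES, VOICE_STORE) are hypothetical.

```python
import math

def extract_features(frame):
    """Step 2a: measure the mouth opening in a binary frame (1 = open-mouth pixel).

    Returns (width, height) of the bounding box of all 1-pixels, or (0, 0) if none.
    """
    rows = [r for r, row in enumerate(frame) if any(row)]
    cols = [c for row in frame for c, v in enumerate(row) if v]
    if not rows:
        return (0, 0)
    return (max(cols) - min(cols) + 1, max(rows) - min(rows) + 1)

def recognize(features, templates):
    """Step 2b: return the label whose template is closest to the features."""
    return min(templates, key=lambda label: math.dist(features, templates[label]))

def synthesize(labels, voice_store):
    """Step 3: concatenate the stored units for the recognized labels."""
    return [unit for label in labels for unit in voice_store[label]]

# Assumed reference lip shapes (width, height) and placeholder voice units.
TEMPLATES = {"a": (4, 3), "o": (2, 2), "m": (0, 0)}
VOICE_STORE = {"a": ["unit_a"], "o": ["unit_o"], "m": ["unit_m"]}

def run(frames):
    """Step 1 is represented by the list of already captured frames."""
    labels = [recognize(extract_features(f), TEMPLATES) for f in frames]
    return labels, synthesize(labels, VOICE_STORE)

closed = [[0] * 6 for _ in range(5)]
wide = [[0] * 6, [0, 1, 1, 1, 1, 0], [0, 1, 1, 1, 1, 0], [0, 1, 1, 1, 1, 0], [0] * 6]
rounded = [[0] * 6, [0, 0, 1, 1, 0, 0], [0, 0, 1, 1, 0, 0], [0] * 6, [0] * 6]

labels, units = run([closed, wide, rounded])
print(labels)  # ['m', 'a', 'o']
print(units)   # ['unit_m', 'unit_a', 'unit_o']
```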

As shown in Fig. 1, the miniature camera 1 is connected to the image acquisition unit 2; the output of the image acquisition unit 2 is connected to the lip-shape image pattern recognition unit 3; the signal from the lip-shape image pattern recognition unit 3 is output to the speech synthesis unit 4; the speech synthesis unit 4 is connected to the voice storage unit 5, retrieves speech synthesis elements from it, synthesizes a sound signal, and outputs that signal to the sounding unit 6; the loudspeaker 7 then emits the sound corresponding to the lip shape and its sequence of changes.

The lip image processing and pattern recognition unit, the speech synthesis unit, and the voice storage unit can be implemented with a processor 8, which may be a digital signal processor (DSP) or another microprocessor (such as an ARM).

The miniature camera 1 may be a camera with digital signal output that is integrated with the image acquisition unit, such as a CCD camera or another image sensor.

The sounding unit 6 may consist of a digital-to-analog converter and an amplifier, or may use a codec.
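In the device, the conversion and amplification are performed in hardware. The digital side of that path can nevertheless be sketched in software. The fragment below is a hedged illustration that assumes normalized samples in the range [-1.0, 1.0], 16-bit mono output, and an 8000 Hz sample rate; none of these figures come from the original text, and the names to_pcm16 and write_wav are hypothetical. The gain parameter stands in for the amplifier.

```python
import io
import struct
import wave

def to_pcm16(samples, gain=1.0):
    """Scale samples by gain, clip to [-1.0, 1.0], and quantize to signed 16-bit."""
    pcm = []
    for s in samples:
        scaled = max(-1.0, min(1.0, s * gain))
        pcm.append(int(round(scaled * 32767)))
    return pcm

def write_wav(pcm, sample_rate=8000):
    """Pack 16-bit mono samples into an in-memory WAV file and return its bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
    return buf.getvalue()

pcm = to_pcm16([0.0, 1.0, -1.0, 2.0])
print(pcm)  # [0, 32767, -32767, 32767]; the out-of-range sample is clipped
wav_bytes = write_wav(pcm)
```

Clipping before quantization keeps an over-amplified signal inside the 16-bit range instead of letting it overflow.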

By recognizing the speaker's lip shape, the present invention determines the content of speech, synthesizes that content as speech, and emits it through the loudspeaker in real time. The invention can help people who cannot produce sound because their larynx or vocal cords have been removed, and deaf-mute people who can form words with their lips, to produce speech, making it easier for them to communicate with people who speak normally.

Description of drawings

Fig. 1 is a block diagram of the system connections of the present invention.

Fig. 2 shows a lip-shape recognition sound generator according to the present invention.

Embodiment

The present invention is described in detail below with reference to the accompanying drawings:

The connections are as shown in Fig. 1: the miniature camera 1 is connected to the image acquisition unit 2; the output of the image acquisition unit 2 is connected to the lip-shape image pattern recognition unit 3; the signal from the lip-shape image pattern recognition unit 3 is output to the speech synthesis unit 4; the speech synthesis unit 4 is connected to the voice storage unit 5, retrieves speech synthesis elements from it, synthesizes a sound signal, and outputs that signal to the sounding unit 6; the loudspeaker 7 then emits the sound corresponding to the lip shape and its sequence of changes.

A miniature camera 1 is used to reduce the size of the device. The miniature camera is placed in front of the lips so that it captures only the lip image and no other part of the face; its output is connected to the image acquisition unit. The image acquisition unit 2 uses a video capture processor; its input is connected to the output of the miniature camera, and its output is connected to the image processing and pattern recognition unit 3. The image processing and pattern recognition unit is the core of the instrument. It uses a digital signal processor (DSP) or another microprocessor (such as an ARM) and mainly performs preprocessing, feature extraction, and pattern recognition on the lip image. The speech synthesis unit 4 synthesizes speech according to the result of lip pattern recognition; it is also implemented by the digital signal processor. The voice storage unit 5 is a database that stores all basic phonemes and uses mass storage. The sounding unit 6 consists of a digital-to-analog converter and an amplifier. The digital-to-analog converter converts the digital audio signal into an analog audio signal, which is amplified by the amplifier and then drives the loudspeaker 7. The sounding unit may also use a codec. The loudspeaker emits the sound.
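The role of the voice storage unit as a database of basic phonemes, and of the speech synthesis unit that draws on it, can be illustrated with a small concatenation sketch. Everything specific here is an assumption made for illustration: short sine tones stand in for recorded phonemes, the sample rate is taken as 8000 Hz, and the names tone, PHONEME_DB, and concatenate_phonemes are hypothetical. The original text does not say how phonemes are stored or joined.

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate in Hz

def tone(freq_hz, duration_ms=50):
    """Generate a sine tone as a placeholder for a recorded phoneme."""
    n = SAMPLE_RATE * duration_ms // 1000
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

# Hypothetical phoneme database: phoneme label -> stored waveform.
PHONEME_DB = {"a": tone(440), "o": tone(330), "m": tone(220)}

def concatenate_phonemes(phonemes, db=PHONEME_DB, gap_ms=10):
    """Join stored phoneme waveforms in order, with a short silence between them."""
    silence = [0.0] * (SAMPLE_RATE * gap_ms // 1000)
    out = []
    for i, p in enumerate(phonemes):
        if i:
            out.extend(silence)
        out.extend(db[p])
    return out

signal = concatenate_phonemes(["m", "a", "m"])
print(len(signal))  # 1360 samples: 3 phonemes of 400 samples plus 2 gaps of 80
```

The resulting list of samples is the kind of digital audio signal that the sounding unit would then convert and amplify.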

The miniature camera and the image acquisition unit of this embodiment may be implemented as an integrated image sensor.

The processor 8 used for the lip image processing and pattern recognition unit, the speech synthesis unit, and the voice storage unit of this embodiment may be a digital signal microprocessor or digital signal microprocessor system, a microprocessor or microprocessor system, or an ARM microprocessor or ARM microprocessor system.

The sounding unit of this embodiment consists of a digital-to-analog converter and an amplifier.

For ease of use, the device is shaped like a headset. The miniature camera is placed where an ordinary headset carries its microphone, the loudspeaker is led out by a wire, and the other functional circuits of the instrument are placed at the ear position, as shown in Fig. 2.

The user wears the device like a headset, pulls down the miniature camera, aims it at his or her own lips, turns on the switch, and begins to speak. Although the user cannot produce sound, the device emits the correct sound as long as the lips move as they would in normal speech. Some users whose lip shapes are nonstandard when speaking need a certain amount of training. For a trained user, the instrument can meet the needs of everyday communication.

Claims

1. A lip-shape recognition sound generator consisting of six parts: a miniature camera, an image acquisition unit, a lip image processing and pattern recognition unit, a speech synthesis unit, a voice storage unit, and a sounding unit; characterized in that the miniature camera (1) is connected to the image acquisition unit (2); the output of the image acquisition unit (2) is connected to the lip-shape image pattern recognition unit (3); the signal from the lip-shape image pattern recognition unit (3) is output to the speech synthesis unit (4); the speech synthesis unit (4) is connected to the voice storage unit (5), retrieves speech synthesis elements from the voice storage unit (5), synthesizes a sound signal, and outputs it to the sounding unit (6); and the loudspeaker (7) then emits the sound corresponding to the lip shape and its sequence of changes.

2. The lip-shape recognition sound generator according to claim 1, characterized in that the miniature camera and the image acquisition unit are implemented as an integrated image sensor.

3. The lip-shape recognition sound generator according to claim 1, characterized in that the lip image processing and pattern recognition unit, the speech synthesis unit, and the voice storage unit use a digital signal microprocessor or a digital signal microprocessor system.

4. The lip-shape recognition sound generator according to claim 1, characterized in that the lip image processing and pattern recognition unit, the speech synthesis unit, and the voice storage unit use a microprocessor or a microprocessor system.

5. The lip-shape recognition sound generator according to claim 1, characterized in that the lip image processing and pattern recognition unit, the speech synthesis unit, and the voice storage unit use an ARM microprocessor or an ARM microprocessor system.

6. The lip-shape recognition sound generator according to claim 1, characterized in that the sounding unit consists of a digital-to-analog converter and an amplifier.

7. The lip-shape recognition sound generator according to claim 1, characterized in that the sounding unit uses a codec.

8. The lip-shape recognition sound generator according to claim 1, characterized in that the miniature camera is arranged in front of the lips.