CN104217218A - Lip language recognition method and system - Google Patents

Lip language recognition method and system

Info

Publication number
CN104217218A
Authority
CN
China
Prior art keywords
signal
mouth
user
wireless signal
motion feature
Prior art date
Legal status
Granted
Application number
CN201410462392.3A
Other languages
Chinese (zh)
Other versions
CN104217218B (en)
Inventor
王冠华
伍楷舜
倪明选
Current Assignee
Guangzhou HKUST Fok Ying Tung Research Institute
Original Assignee
Guangzhou HKUST Fok Ying Tung Research Institute
Priority date
Filing date
Publication date
Application filed by Guangzhou HKUST Fok Ying Tung Research Institute
Priority to CN201410462392.3A
Publication of CN104217218A
Application granted
Publication of CN104217218B
Legal status: Active
Anticipated expiration

Landscapes

  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a lip language recognition method, comprising: directionally transmitting wireless signals so that the signals cover a user's face; receiving the wireless signals reflected by the user's face and filtering them to obtain the mouth reflection signal produced while the user's mouth moves; segmenting the mouth reflection signal into segment signals, each segment being the reflected signal of one uttered speech event, and extracting the waveform feature pattern of each segment; and comparing the waveform feature pattern of each segment with all pre-sampled mouth motion feature patterns, then reading out the speech event corresponding to the pattern with the highest similarity. Correspondingly, an embodiment of the invention further provides a lip language recognition system. With the method and system, the user's mouth motion can be detected through wireless signals to achieve lip language recognition, improving recognition efficiency and accuracy.

Description

Lip reading recognition method and system
Technical field
The present invention relates to the field of mobile communication technology, and in particular to a lip reading recognition method and system.
Background technology
Applications of wireless sensing and recognition have reached a new high, including motion detection, gesture recognition, localization, and material classification. By measuring and analyzing signal reflections, a wireless sensing system can detect motion through walls, recognize human gestures, and even detect and locate tumors inside the human body.
In the prior art, however, recognition of a user's speech is realized only with acoustic sensors or cameras. Systems of this kind are expensive to deploy and have limited sensing and communication range. They also suffer detection latency, because the sensor must first record the sound, or the camera must first capture the images, before the data can be processed and sent to a receiver. Moreover, acoustic-sensor systems cannot decode speech in overly noisy environments.
Summary of the invention
Embodiments of the present invention propose a lip reading recognition method and system that detect the motion of a user's mouth through wireless signals, thereby realizing lip reading recognition and improving recognition efficiency and accuracy.
An embodiment of the present invention provides a lip reading recognition method, comprising:
directionally transmitting a wireless signal so that the wireless signal covers the user's face;
receiving the wireless signal reflected by the user's face, and filtering the reflected signal to obtain the mouth reflection signal produced while the user's mouth moves;
segmenting the mouth reflection signal to obtain segment signals, and extracting the waveform feature pattern of each segment signal, where a segment signal is the reflected signal produced each time a speech event is uttered;
comparing the waveform feature pattern of each segment signal with all pre-sampled mouth motion feature patterns, and reading out the speech event corresponding to the mouth motion feature pattern with the highest similarity, where a mouth motion feature pattern is the wireless signal waveform feature pattern recorded when a speech event is uttered.
Further, directionally transmitting the wireless signal so that it covers the user's face specifically comprises:
rotating the transmitted wireless signal at a constant speed, and recording the time point at which the variation of the wireless signal is greatest;
calculating the directional transmission angle of the wireless signal from the angular velocity of the constant-speed rotation and the recorded time point;
directionally transmitting the wireless signal at the calculated angle so that it covers the user's face.
Further, receiving the wireless signal reflected by the user's face and filtering the reflected signal to obtain the mouth reflection signal specifically comprises:
receiving the wireless signal reflected by the user's face, and filtering the reflected signal with a Butterworth filter to obtain a filtered signal;
setting a delay threshold, and removing the components of the filtered signal whose delay exceeds the threshold, to obtain the mouth reflection signal produced while the user's mouth moves.
Further, setting the delay threshold and removing the components whose delay exceeds it specifically comprises:
applying an inverse fast Fourier transform to the channel state information (CSI) of the filtered signal to obtain its time-domain CSI;
setting a delay threshold, and removing the components whose time-domain CSI delay exceeds the threshold, to obtain the mouth reflection signal in time-domain CSI form;
applying a fast Fourier transform to the time-domain CSI of the mouth reflection signal to obtain the mouth reflection signal produced while the user's mouth moves.
Further, segmenting the mouth reflection signal and extracting the waveform feature pattern of each segment signal specifically comprises:
segmenting the mouth reflection signal with a wavelet transform algorithm to obtain segment signals;
in the CSI of each segment signal, selecting for each time period the subcarrier with the greatest signal strength variation, and stitching the selected subcarriers together to obtain the waveform feature pattern of the segment signal, the CSI having 30 subcarriers.
Further, comparing the waveform feature pattern of each segment signal with all pre-sampled mouth motion feature patterns and reading out the speech event with the highest similarity specifically comprises:
comparing, by a least squares algorithm, the waveform feature pattern of each segment signal with all pre-sampled mouth motion feature patterns, and reading out the speech event corresponding to the mouth motion feature pattern with the highest similarity.
Correspondingly, an embodiment of the present invention further provides a lip reading recognition system, comprising a transmitting end and a receiving end, the receiving end comprising a signal filtering module, a feature extraction module, and a feature comparison module;
the transmitting end is configured to directionally transmit a wireless signal so that the wireless signal covers the user's face;
the signal filtering module is configured to receive the wireless signal reflected by the user's face and filter the reflected signal to obtain the mouth reflection signal produced while the user speaks;
the feature extraction module is configured to segment the mouth reflection signal into segment signals and extract the waveform feature pattern of each segment signal, where a segment signal is the reflected signal produced each time a speech event is uttered;
the feature comparison module is configured to compare the waveform feature pattern of each segment signal with all pre-sampled mouth motion feature patterns and read out the speech event corresponding to the mouth motion feature pattern with the highest similarity, where a mouth motion feature pattern is the wireless signal waveform feature pattern of the user uttering a speech event.
Implementing the embodiments of the present invention yields the following beneficial effects:
The lip reading recognition method and system provided by the embodiments detect the motion of the user's mouth through wireless signals, extract the waveform feature pattern of the mouth reflection signal, and compare it with pre-sampled mouth motion feature patterns, thereby realizing lip reading recognition and improving recognition efficiency and accuracy. No extra devices need to be deployed, so the cost is low, and lip reading can still be recognized accurately in noisy environments. Directional transmission makes the wireless signal cover the user's face, reducing irrelevant multipath effects and improving the precision of the detected signal. When extracting the waveform feature pattern, the subcarrier with the greatest signal strength variation in each time period is selected as the feature, which reduces computational complexity and improves recognition efficiency. Because a given user speaks with a consistent rhythm, a mouth motion feature pattern archive is built for each user, and a user's lip reading is compared directly against that user's archive, improving recognition accuracy. A context-aware error correction technique verifies the recognized lip reading and further improves accuracy.
Brief description of the drawings
Fig. 1 is a schematic flowchart of an embodiment of the lip reading recognition method provided by the invention;
Fig. 2 is a schematic flowchart of an embodiment of step S1 of the method;
Fig. 3 is a schematic flowchart of an embodiment of step S2 of the method;
Fig. 4 is a schematic flowchart of an embodiment of step S22 in the embodiment shown in Fig. 3;
Fig. 5 is a schematic flowchart of an embodiment of step S3 of the method;
Fig. 6 is a schematic structural diagram of an embodiment of the lip reading recognition system provided by the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are obviously only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, which is a schematic flowchart of an embodiment of the lip reading recognition method provided by the invention, the method comprises:
S1, directionally transmitting a wireless signal so that the wireless signal covers the user's face;
S2, receiving the wireless signal reflected by the user's face, and filtering the reflected signal to obtain the mouth reflection signal produced while the user's mouth moves;
S3, segmenting the mouth reflection signal to obtain segment signals, and extracting the waveform feature pattern of each segment signal, where a segment signal is the reflected signal produced each time a speech event is uttered;
S4, comparing the waveform feature pattern of each segment signal with all pre-sampled mouth motion feature patterns, and reading out the speech event corresponding to the mouth motion feature pattern with the highest similarity, where a mouth motion feature pattern is the wireless signal waveform feature pattern recorded when a speech event is uttered.
In a preferred embodiment, as shown in Fig. 2, step S1 specifically comprises:
S11, rotating the transmitted wireless signal at a constant speed, and recording the time point at which the variation of the wireless signal is greatest;
S12, calculating the directional transmission angle of the wireless signal from the angular velocity of the constant-speed rotation and the recorded time point;
S13, directionally transmitting the wireless signal at the calculated angle so that it covers the user's face.
In another preferred embodiment, step S1 specifically comprises:
S111, the transmitting end keeps the vertical elevation angle of the transmitted wireless signal constant and rotates the signal 360 degrees in the horizontal plane at a constant speed;
S112, the receiving end records the first time point at which the horizontal variation of the wireless signal is greatest, and feeds this time point back to the transmitting end;
S113, the transmitting end adjusts the horizontal angle of the wireless signal according to the angular velocity of the horizontal rotation and the first time point;
S114, the transmitting end fixes the horizontal angle and rotates the wireless signal 360 degrees in the vertical plane at a constant speed;
S115, the receiving end records the second time point at which the vertical variation of the wireless signal is greatest, and feeds this time point back to the transmitting end;
S116, the transmitting end adjusts the vertical angle of the wireless signal according to the angular velocity of the vertical rotation and the second time point;
S117, the wireless signal is directionally transmitted at the horizontal and vertical angles so that it covers the user's face.
In another embodiment, the directional transmission angle is determined by having the user repeat a predefined, known speech event from a fixed position. For example, the user utters the predefined sound once per second while the transmitting end rotates the wireless signal at a constant speed; the receiving end compares each received waveform with the pre-sampled waveform feature pattern of that sound and finds the time point at which the similarity is highest, i.e. where the waveforms match best. The directional transmission angle is then calculated from that time point and the angular velocity of rotation of the wireless signal.
It should be noted that the transmitting end scans the surrounding environment through 360 degrees with the wireless signal; the scan can be implemented by mounting the transmitting end on a rotating stepper motor. The receiving end senses the degree of signal variation and records the time point at which it is greatest. The transmitting end may repeat the scan several times from the same position; the receiving end records the time point of each scan, analyzes the records to exclude accidental errors, and feeds the correct time point back to the transmitting end, which adjusts the directional transmission angle accordingly. During subsequent signal matching, the receiving end can also feed back more precise time point information to further refine the transmission direction of the wireless signal.
Directionally transmitting the wireless signal so that it covers the user's face reduces irrelevant multipath effects and improves the precision of the detected signal.
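To make the angle calculation concrete, the following minimal Python sketch recovers the directional transmission angle from the angular velocity of the constant-speed rotation and the time point of maximum signal variation, as in steps S11 to S13; the function name and the example sweep parameters are illustrative assumptions, not values given by the patent.

```python
# Hypothetical sketch of the calibration in steps S11-S13: the transmitter
# rotates at a known constant angular velocity, the receiver reports the
# time point of maximum signal variation, and the transmission angle is
# angle = angular_velocity * elapsed_time (modulo a full turn).

def directional_angle(angular_velocity_deg_per_s: float,
                      sweep_start_time_s: float,
                      peak_variation_time_s: float) -> float:
    """Return the directional transmission angle in degrees, in [0, 360)."""
    elapsed = peak_variation_time_s - sweep_start_time_s
    return (angular_velocity_deg_per_s * elapsed) % 360.0

# Example: a full sweep in 12 s (30 deg/s), peak variation observed 4.2 s in.
print(directional_angle(30.0, 0.0, 4.2))  # -> 126.0 degrees
```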
Further, as shown in Fig. 3, step S2 specifically comprises:
S21, receiving the wireless signal reflected by the user's face, and filtering the reflected signal with a Butterworth filter to obtain a filtered signal.
A third-order Butterworth band-pass filter is adopted, configured for a maximally flat frequency response in the pass band, which preserves the fidelity of the signal in the target frequency range while removing out-of-band noise. The Butterworth band-pass filter retains the perturbations that mouth motion imposes on the signal and filters out the information in other frequency bands.
S22, delay threshold value is set, removes the filtering signal being greater than described delay threshold value time delay, obtain the mouth reflected signal during motion of user's mouth.
Wireless signal sends from transmitting terminal, along different multipath tolerant, i.e. multipath reflection, finally can arrive receiving end.And due to mouth motion, as the motion of tongue, lip and lower jaw, be nonrigid, one group of multipath reflection may reflect the movable information of mouth different piece.Therefore, delay threshold value is set, removes the multipath component (usually from the reflection of surrounding static environment) exceeding time delay and postpone threshold value.Wherein, postponing threshold value is rule of thumb to select and assorting process based on mouth motion feature figure carries out adjusting.Because the maximum extra latency of typical indoor channel was less than for 500 nanoseconds usually, therefore, usually arranging delay threshold value was 500 nanoseconds.
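As a concrete illustration of the band-pass stage of step S21 above, the following is a minimal SciPy sketch; the pass band of 1 to 80 Hz and the CSI sampling rate of 500 Hz are illustrative assumptions, since the patent specifies the filter order and the maximally flat response but not the cut-off frequencies.

```python
# A minimal sketch of the 3rd-order Butterworth band-pass filter of step S21.
# The pass band (1-80 Hz) and sampling rate (500 Hz) are assumptions chosen
# to cover slow mouth-motion-induced CSI amplitude variations.
import numpy as np
from scipy.signal import butter, filtfilt

fs = 500.0  # assumed CSI sampling rate in Hz
b, a = butter(N=3, Wn=[1.0, 80.0], btype="bandpass", fs=fs)

def bandpass_csi(csi_amplitude: np.ndarray) -> np.ndarray:
    """Zero-phase band-pass filtering of one subcarrier's amplitude series."""
    return filtfilt(b, a, csi_amplitude)
```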
Further, as shown in Fig. 4, step S22 specifically comprises:
S221, applying an inverse fast Fourier transform to the channel state information (CSI) of the filtered signal to obtain its time-domain CSI;
S222, setting a delay threshold, and removing the components whose time-domain CSI delay exceeds the threshold, to obtain the mouth reflection signal in time-domain CSI form;
S223, applying a fast Fourier transform to the time-domain CSI of the mouth reflection signal to obtain the mouth reflection signal produced while the user's mouth moves.
CSI (Channel State Information) represents the fine-grained channel frequency response of each subcarrier. The filtered signal is further filtered according to the CSI power delay profile in the time domain. First, an inverse fast Fourier transform converts the frequency-domain CSI of the filtered signal into its time-domain power delay profile. Then a delay threshold is set and the multipath components whose delay exceeds the threshold are removed. Finally, a fast Fourier transform converts the time-domain CSI of the retained multipath components back into frequency-domain CSI, yielding the mouth reflection signal.
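The delay gating of steps S221 to S223 can be sketched as follows; the 20 MHz channel bandwidth, which gives a tap spacing of 50 ns, is an assumption for an 802.11n link and is not specified by the patent.

```python
# Sketch of steps S221-S223: IFFT the frequency-domain CSI into a power
# delay profile, zero the taps whose delay exceeds the 500 ns threshold
# (removing static multipath), and FFT the kept taps back to the frequency
# domain. Bandwidth, and therefore tap spacing, is an assumption.
import numpy as np

def gate_by_delay(csi_freq: np.ndarray,
                  bandwidth_hz: float = 20e6,
                  delay_threshold_s: float = 500e-9) -> np.ndarray:
    """csi_freq: complex CSI across subcarriers; returns delay-gated CSI."""
    taps = np.fft.ifft(csi_freq)              # time-domain CSI taps
    tap_spacing = 1.0 / bandwidth_hz          # 50 ns per tap at 20 MHz
    delays = np.arange(len(taps)) * tap_spacing
    taps[delays > delay_threshold_s] = 0.0    # drop late (static) multipath
    return np.fft.fft(taps)                   # back to frequency domain
```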
Further, as shown in Fig. 5, step S3 specifically comprises:
S31, segmenting the mouth reflection signal with a wavelet transform algorithm to obtain segment signals;
S32, in the CSI of each segment signal, selecting for each time period the subcarrier with the greatest signal strength variation, and stitching the selected subcarriers together to obtain the waveform feature pattern of the segment signal, the CSI having 30 subcarriers.
Within each time period, the CSI of a segment signal has 30 subcarriers, i.e. 30 groups of data, each group representing the amplitude and phase of one subcarrier. The subcarrier whose signal strength (waveform peak-to-peak value) varies the most among the 30 is selected, and the other 29 subcarriers of that time period are discarded. The selected subcarrier serves as the single representative value of its time period, and the representative values of all time periods are stitched together to form the signal over the whole duration of the segment; this is the waveform feature pattern of the segment signal. Selecting the subcarrier with the greatest signal strength variation in each time period for subsequent processing simplifies the computation and improves efficiency.
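A sketch of the feature extraction of step S32 follows, with a simple wavelet smoothing (PyWavelets) standing in for the wavelet-based segmentation of step S31; the window length, wavelet, and decomposition level are illustrative assumptions.

```python
# Sketch of step S32: per time window, keep the subcarrier with the largest
# peak-to-peak amplitude change and concatenate the picks into the waveform
# feature pattern. Each subcarrier is first smoothed by keeping only the
# approximation coefficients of a level-2 'db4' wavelet decomposition
# (an assumption standing in for the patent's wavelet segmentation step).
import numpy as np
import pywt

def waveform_feature(csi: np.ndarray, window: int = 50) -> np.ndarray:
    """csi: (num_samples, 30) amplitude matrix -> 1-D feature pattern."""
    smooth_cols = []
    for col in csi.T:
        coeffs = pywt.wavedec(col, "db4", level=2)
        coeffs = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]
        smooth_cols.append(pywt.waverec(coeffs, "db4")[: len(col)])
    smooth = np.stack(smooth_cols, axis=1)

    pieces = []
    for start in range(0, len(smooth) - window + 1, window):
        seg = smooth[start:start + window]
        best = int(np.argmax(seg.max(axis=0) - seg.min(axis=0)))  # peak-to-peak
        pieces.append(seg[:, best])
    return np.concatenate(pieces)
```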
Further, step S4 specifically comprises:
comparing, by a least squares algorithm, the waveform feature pattern of each segment signal with all pre-sampled mouth motion feature patterns, and reading out the speech event corresponding to the mouth motion feature pattern with the highest similarity.
It should be noted that a given user speaks with a similar rhythm. The user's mouth motion feature patterns are sampled in advance, so that a generalized least squares algorithm can directly compare the waveform feature pattern of a segment signal with all pre-sampled mouth motion feature patterns and find the one with the highest similarity. Reading out the speech event corresponding to that pattern completes the lip reading recognition for the user.
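A sketch of this comparison follows; the linear resampling that makes two patterns the same length is an added assumption, and the plain sum-of-squared-differences score stands in for the patent's generalized least squares weighting.

```python
# Sketch of step S4: score each pre-sampled mouth motion feature pattern
# against the segment's waveform feature pattern with a least-squares
# residual and return the speech event of the best match.
import numpy as np

def recognize(segment: np.ndarray, profiles: dict[str, np.ndarray]) -> str:
    """profiles maps a speech event label to its pre-sampled pattern."""
    def resample(x: np.ndarray, n: int) -> np.ndarray:
        return np.interp(np.linspace(0, 1, n), np.linspace(0, 1, len(x)), x)

    def residual(a: np.ndarray, b: np.ndarray) -> float:
        n = max(len(a), len(b))
        return float(np.sum((resample(a, n) - resample(b, n)) ** 2))

    return min(profiles, key=lambda event: residual(segment, profiles[event]))

# Usage: recognize(feature, {"ba": pattern_ba, "ma": pattern_ma})
```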
Further, before step S2, the method also comprises:
sampling the waveform feature pattern of the wireless signal while the user utters a known speech event, to obtain the mouth motion feature pattern corresponding to that known speech event;
classifying the mouth motion feature patterns so that the patterns corresponding to known speech events with the same pronunciation form one class.
Before lip reading recognition is performed for a user, the user's mouth motion feature patterns must first be sampled, and a separate mouth motion feature archive is established for each user. The sampling method is the same as the method for extracting the waveform feature pattern of a segment signal described above and is not detailed here.
Speech events with different pronunciations involve different mouth motions and therefore affect the wireless signal waveform differently. Speech events with the same pronunciation involve essentially the same mouth motion and affect the waveform identically, so mouth motion feature patterns with identical effects on the waveform are grouped into one class.
Because consecutive speech events uttered by a user are correlated, after lip reading recognition is completed, a context-aware error correction technique verifies the recognized lip reading, reducing recognition errors among patterns of the same class and further improving accuracy.
Preferably, a speech event is a syllable or a word.
When sampling a user's mouth motion feature patterns in advance, the pattern for a single syllable or for a whole word can be sampled. Correspondingly, the mouth reflection signal can be segmented within words or between words. With intra-word segmentation, a word is divided into several syllables and recognized from the combination of its syllables. With inter-word segmentation, words are separated by detecting silent intervals, since a speaker usually pauses briefly (about 300 milliseconds) between two consecutive words; a sketch of this gap detection follows below.
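The gap detection mentioned above can be sketched as follows; the 50 ms energy window and the energy threshold are illustrative assumptions, while the roughly 300 ms inter-word pause comes from the description above.

```python
# Sketch of inter-word segmentation: smooth the signal energy, mark
# low-activity samples as quiet, and close a word whenever a quiet run
# of at least gap_s seconds (about 300 ms between words) is observed.
import numpy as np

def split_words(signal: np.ndarray, fs: float, gap_s: float = 0.3,
                quiet_ratio: float = 0.1) -> list[np.ndarray]:
    """Split a mouth reflection amplitude series on silent gaps >= gap_s."""
    win = max(1, int(0.05 * fs))                      # 50 ms energy window
    energy = np.convolve(signal ** 2, np.ones(win) / win, mode="same")
    quiet = energy < quiet_ratio * energy.max()       # low-activity samples
    words, start, run = [], None, 0
    for i, q in enumerate(quiet):
        run = run + 1 if q else 0
        if not q and start is None:
            start = i                                 # a word begins
        if start is not None and run >= int(gap_s * fs):
            words.append(signal[start:i - run + 1])   # gap closes the word
            start = None
    if start is not None:
        words.append(signal[start:])
    return words
```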
Referring to Fig. 6, which is a schematic structural diagram of an embodiment of the lip reading recognition system provided by the invention, the system comprises a transmitting end 101 and a receiving end 102; the receiving end 102 comprises a signal filtering module 103, a feature extraction module 104, and a feature comparison module 105;
the transmitting end 101 is configured to directionally transmit a wireless signal so that the wireless signal covers the user's face;
the signal filtering module 103 is configured to receive the wireless signal reflected by the user's face and filter the reflected signal to obtain the mouth reflection signal produced while the user speaks;
the feature extraction module 104 is configured to segment the mouth reflection signal into segment signals and extract the waveform feature pattern of each segment signal, where a segment signal is the reflected signal produced each time a speech event is uttered;
the feature comparison module 105 is configured to compare the waveform feature pattern of each segment signal with all pre-sampled mouth motion feature patterns and read out the speech event corresponding to the mouth motion feature pattern with the highest similarity, where a mouth motion feature pattern is the wireless signal waveform feature pattern of the user uttering a speech event.
The transmitting end 101 uses a directional antenna or beamforming capability to ensure directional transmission of the wireless signal. Multiple receiving ends 102 deployed at different angles can be used to receive the reflected signal and improve recognition precision.
Preferably, the speech event is a syllable or a word.
It should be noted that the embodiments above describe lip reading recognition for a single user by way of example; in practice, lip reading recognition can also be performed for several users simultaneously.
Before recognition, each user is sampled separately and a corresponding mouth motion feature pattern archive is established. During recognition, the transmitting end transmits different wireless signals so that each signal targets a different user, and the receiving end uses MIMO (Multiple-Input Multiple-Output) technology to decode the mouth motions of several users at the same time. Simultaneous recognition of several users is realized with zigzag cancellation. For example, when the lip reading of two users is recognized simultaneously, the first speech event of the first user is recognized first; when the second speech event of the first user overlaps the first speech event of the second user, the second speech event of the first user is cancelled and predicted from that user's first speech event, while the first speech event of the second user is recognized. Repeating this process realizes lip reading recognition for several users without deploying extra equipment.
The above are preferred embodiments of the present invention. It should be pointed out that those skilled in the art can make improvements and modifications without departing from the principles of the invention, and such improvements and modifications are also regarded as falling within the protection scope of the invention.

Claims (10)

1. A lip reading recognition method, characterized in that it comprises:
directionally transmitting a wireless signal so that the wireless signal covers the user's face;
receiving the wireless signal reflected by the user's face, and filtering the reflected signal to obtain the mouth reflection signal produced while the user's mouth moves;
segmenting the mouth reflection signal to obtain segment signals, and extracting the waveform feature pattern of each segment signal, where a segment signal is the reflected signal produced each time a speech event is uttered;
comparing the waveform feature pattern of each segment signal with all pre-sampled mouth motion feature patterns, and reading out the speech event corresponding to the mouth motion feature pattern with the highest similarity, where a mouth motion feature pattern is the wireless signal waveform feature pattern recorded when a speech event is uttered.
2. The lip reading recognition method of claim 1, characterized in that directionally transmitting the wireless signal so that it covers the user's face specifically comprises:
rotating the transmitted wireless signal at a constant speed, and recording the time point at which the variation of the wireless signal is greatest;
calculating the directional transmission angle of the wireless signal from the angular velocity of the rotation and the recorded time point;
directionally transmitting the wireless signal at the calculated angle so that it covers the user's face.
3. The lip reading recognition method of claim 1, characterized in that receiving the wireless signal reflected by the user's face and filtering the reflected signal to obtain the mouth reflection signal specifically comprises:
receiving the wireless signal reflected by the user's face, and filtering the reflected signal with a Butterworth filter to obtain a filtered signal;
setting a delay threshold, and removing the components of the filtered signal whose delay exceeds the threshold, to obtain the mouth reflection signal produced while the user's mouth moves.
4. The lip reading recognition method of claim 3, characterized in that setting the delay threshold and removing the components whose delay exceeds it specifically comprises:
applying an inverse fast Fourier transform to the channel state information (CSI) of the filtered signal to obtain its time-domain CSI;
setting a delay threshold, and removing the components whose time-domain CSI delay exceeds the threshold, to obtain the mouth reflection signal in time-domain CSI form;
applying a fast Fourier transform to the time-domain CSI of the mouth reflection signal to obtain the mouth reflection signal produced while the user's mouth moves.
5. The lip reading recognition method of claim 1, characterized in that segmenting the mouth reflection signal and extracting the waveform feature pattern of each segment signal specifically comprises:
segmenting the mouth reflection signal with a wavelet transform algorithm to obtain segment signals;
in the CSI of each segment signal, selecting for each time period the subcarrier with the greatest signal strength variation, and stitching the selected subcarriers together to obtain the waveform feature pattern of the segment signal, the CSI having 30 subcarriers.
6. The lip reading recognition method of claim 1, characterized in that comparing the waveform feature pattern of each segment signal with all pre-sampled mouth motion feature patterns and reading out the speech event with the highest similarity specifically comprises:
comparing, by a least squares algorithm, the waveform feature pattern of each segment signal with all pre-sampled mouth motion feature patterns, and reading out the speech event corresponding to the mouth motion feature pattern with the highest similarity.
7. The lip reading recognition method of claim 1, characterized in that, before receiving the wireless signal reflected by the user's face and filtering it to obtain the mouth reflection signal, the method further comprises:
sampling the waveform feature pattern of the wireless signal while the user utters a known speech event, to obtain the mouth motion feature pattern corresponding to that known speech event;
classifying the mouth motion feature patterns so that the patterns corresponding to known speech events with the same pronunciation form one class.
8. The lip reading recognition method of any one of claims 1 to 7, characterized in that the speech event is a syllable or a word.
9. A lip reading recognition system, characterized in that it comprises a transmitting end and a receiving end, the receiving end comprising a signal filtering module, a feature extraction module, and a feature comparison module;
the transmitting end is configured to directionally transmit a wireless signal so that the wireless signal covers the user's face;
the signal filtering module is configured to receive the wireless signal reflected by the user's face and filter the reflected signal to obtain the mouth reflection signal produced while the user speaks;
the feature extraction module is configured to segment the mouth reflection signal into segment signals and extract the waveform feature pattern of each segment signal, where a segment signal is the reflected signal produced each time a speech event is uttered;
the feature comparison module is configured to compare the waveform feature pattern of each segment signal with all pre-sampled mouth motion feature patterns and read out the speech event corresponding to the mouth motion feature pattern with the highest similarity, where a mouth motion feature pattern is the wireless signal waveform feature pattern of the user uttering a speech event.
10. The lip reading recognition system of claim 9, characterized in that the speech event is a syllable or a word.
CN201410462392.3A 2014-09-11 2014-09-11 Lip reading recognition method and system Active CN104217218B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410462392.3A CN104217218B (en) 2014-09-11 2014-09-11 Lip reading recognition method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410462392.3A CN104217218B (en) 2014-09-11 2014-09-11 Lip reading recognition method and system

Publications (2)

Publication Number Publication Date
CN104217218A 2014-12-17
CN104217218B CN104217218B (en) 2018-09-11

Family

ID=52098681

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410462392.3A Active CN104217218B (en) 2014-09-11 2014-09-11 Lip reading recognition method and system

Country Status (1)

Country Link
CN (1) CN104217218B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH1011089A (en) * 1996-06-24 1998-01-16 Nippon Soken Inc Input device using infrared ray detecting element
CN2936309Y (en) * 2006-02-15 2007-08-22 马成元 Full automatic fire-fighting gun fire extinguishing device for infrared intelligent monitoring and homing
CN101236684A (en) * 2008-03-06 2008-08-06 中国人民解放军第二炮兵装备研究院第三研究所 Fire disaster detector and detection positioning method
CN101832760A (en) * 2010-04-23 2010-09-15 清华大学 Remote three-dimensional micro-deformation visual on-line monitoring method and system
US20140198896A1 (en) * 2011-07-04 2014-07-17 Koninklijke Philips N.V. Adapting a scan motion in an x-ray imaging apparatus
CN102749613A (en) * 2012-06-20 2012-10-24 暨南大学 Indoor positioning method on basis of rotary antenna
CN103117059A (en) * 2012-12-27 2013-05-22 北京理工大学 Voice signal characteristics extracting method based on tensor decomposition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUANHUA WANG et al.: "We can hear you with Wi-Fi!", Proceedings of the 20th Annual International Conference on Mobile Computing and Networking *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107038401A (en) * 2016-02-03 2017-08-11 北方工业大学 Lip contour segmentation and feature extraction method
CN106572268A (en) * 2016-11-14 2017-04-19 北京小米移动软件有限公司 Information display method and device
CN106572268B (en) * 2016-11-14 2020-07-03 北京小米移动软件有限公司 Information display method and device
CN108198558A (en) * 2017-12-28 2018-06-22 电子科技大学 A kind of audio recognition method based on CSI data
CN108538283B (en) * 2018-03-15 2020-06-26 上海电力学院 Method for converting lip image characteristics into voice coding parameters
CN108648745A (en) * 2018-03-15 2018-10-12 上海电力学院 A kind of conversion method by lip image sequence to speech coding parameters
CN108538283A (en) * 2018-03-15 2018-09-14 上海电力学院 A kind of conversion method by lip characteristics of image to speech coding parameters
CN108648745B (en) * 2018-03-15 2020-09-01 上海电力学院 Method for converting lip image sequence into voice coding parameter
CN110262278A (en) * 2019-07-31 2019-09-20 珠海格力电器股份有限公司 The control method and device of intelligent appliance equipment, intelligent electric appliance
CN110262278B (en) * 2019-07-31 2020-12-11 珠海格力电器股份有限公司 Control method and device of intelligent household electrical appliance and intelligent household electrical appliance
WO2022062884A1 (en) * 2020-09-27 2022-03-31 华为技术有限公司 Text input method, electronic device, and computer-readable storage medium
CN114356109A (en) * 2020-09-27 2022-04-15 华为终端有限公司 Character input method, electronic device and computer readable storage medium
CN113611287A (en) * 2021-06-29 2021-11-05 深圳大学 Pronunciation error correction method and system based on machine learning
CN113611287B (en) * 2021-06-29 2023-09-12 深圳大学 Pronunciation error correction method and system based on machine learning

Also Published As

Publication number Publication date
CN104217218B (en) 2018-09-11

Similar Documents

Publication Publication Date Title
CN104217218A (en) Lip language recognition method and system
CN105844216B (en) Detection and matching mechanism for recognizing handwritten letters by WiFi signals
EP2188922B1 (en) Ultrasound detectors
CN101730851B (en) System and method for positioning
CN105807935B (en) A kind of gesture control man-machine interactive system based on WiFi
CN106604394A (en) CSI-based indoor human body motion speed judgment model
CN106899968A (en) A kind of active noncontact identity identifying method based on WiFi channel condition informations
CN106772219A (en) Indoor orientation method based on CSI signals
CN105277921B (en) A kind of passive acoustic localization method based on smart mobile phone
CN102763160A (en) Microphone array subset selection for robust noise reduction
CN106328130A (en) Robot voice addressed rotation system and method
CN107202559B (en) Object identification method based on indoor acoustic channel disturbance analysis
CN106879068A (en) The arrival time method of estimation of signal under a kind of strong multi-path environment
CN105825857A (en) Voiceprint-recognition-based method for assisting deaf patient in determining sound type
Phelan et al. Source localization using unique characterizations of multipath propagation in an urban environment
CN108198558B (en) Voice recognition method based on CSI data
CN111142668B (en) Interaction method based on Wi-Fi fingerprint positioning and activity gesture joint recognition
CN112394324A (en) Microphone array-based remote sound source positioning method and system
KR20190091132A (en) Apparatus and Method for Detecting Gesture Using Radar
CN112929141B (en) Unmanned aerial vehicle detection and recognition method and system based on graph signaling signal matching
Wang et al. Accurate combined keystrokes detection using acoustic signals
CN110621038B (en) Method and device for realizing multi-user identity recognition based on WiFi signal detection gait
CN109347571B (en) Wireless broadcast communication method and system based on ultrasonic waves
CN114024630B (en) Gesture recognition method based on wireless signals
CN113314127B (en) Bird song identification method, system, computer equipment and medium based on space orientation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant