CN105022470A - Method and device of terminal operation based on lip reading - Google Patents

Method and device of terminal operation based on lip reading Download PDF

Info

Publication number
CN105022470A
CN105022470A CN201410153736.2A
Authority
CN
China
Prior art keywords
lip
user
recognition result
sequence
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201410153736.2A
Other languages
Chinese (zh)
Inventor
尚国强 (Shang Guoqiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp filed Critical ZTE Corp
Priority to CN201410153736.2A priority Critical patent/CN105022470A/en
Priority to PCT/CN2014/084557 priority patent/WO2015158082A1/en
Publication of CN105022470A publication Critical patent/CN105022470A/en
Withdrawn legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition

Abstract

The invention discloses a lip-reading-based terminal operation method and device in the technical field of multimedia communication. The method comprises the following steps: recognizing a user's lip motion and speech separately to obtain a lip-motion recognition result and a speech recognition result for the user; matching the obtained lip-motion recognition result against the speech recognition result to obtain a matching result; and operating the terminal according to the matching result. The method and device improve the accuracy of terminal operation, bring convenience to users, and improve the user experience.

Description

Lip-reading-based terminal operation method and device
Technical field
The present invention relates to the technical field of multimedia communication, and in particular to a lip-reading-based terminal operation method and device.
Background art
With the development of hardware and software technology, fields such as 2D imaging, 3D rendering, voice processing, and digital imaging have advanced greatly: images are ever clearer, imaging modules ever smaller, and devices such as mobile terminals are in wide use. The processors of mobile terminals have likewise grown more powerful and can readily handle high-resolution video and images as well as wideband voice. Advances in speech recognition have greatly expanded how a mobile terminal can be controlled: the user interacts with the terminal by voice, freeing both hands for other tasks.
The growth of voice technology on mobile terminals has pushed terminal manufacturers to keep exploring more convenient and more accurate human-computer interaction. Voice interaction on mobile terminals still has unresolved shortcomings, however: recognition accuracy declines in noisy environments, drops quickly when multiple sound sources are present, and falls off sharply with distance, to the point where a spoken command cannot be recognized at all.
To address these problems, the present invention provides a lip-reading-based terminal operation method and device.
Summary of the invention
The object of the present invention is to provide a lip-reading-based terminal operation method and device that solve the prior-art problem of low accuracy when a terminal is operated by speech recognition in a noisy environment or over a large distance.
According to one aspect of the present invention, a lip-reading-based terminal operation method is provided, comprising the following steps:
recognizing a user's lip motion and speech separately to obtain a lip-motion recognition result and a speech recognition result for the user;
matching the obtained lip-motion recognition result against the obtained speech recognition result to obtain a matching result; and
operating the terminal according to the matching result.
Preferably, recognizing the user's lip motion to obtain the user's lip-motion recognition result comprises:
acquiring a sequence of facial images of the user;
identifying the lip region in the acquired facial image sequence to obtain a lip feature sequence for the user;
matching the obtained lip feature sequence against standard lip sequence features prestored in the terminal to find the standard lip sequence feature that matches the user's lip feature sequence; and
taking the operating instruction corresponding to the matched standard lip sequence feature as the user's lip-motion recognition result.
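The matching step above can be pictured with a minimal sketch that is not part of the patent: a user lip feature sequence is compared with each prestored standard sequence by a simple distance, and the closest standard sequence within a threshold is taken as the match. The feature values, the `standards` table, the threshold, and the command names are all invented for illustration; a real implementation would use a proper sequence-alignment model rather than a fixed-length distance.

```python
def sequence_distance(a, b):
    """Mean squared difference between two equal-length feature sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def match_lip_sequence(user_seq, standard_seqs, threshold=0.5):
    """standard_seqs maps a command name to its stored standard sequence.
    Returns the best-matching command, or None when nothing is close enough."""
    best_cmd, best_dist = None, float("inf")
    for cmd, std_seq in standard_seqs.items():
        d = sequence_distance(user_seq, std_seq)
        if d < best_dist:
            best_cmd, best_dist = cmd, d
    return best_cmd if best_dist <= threshold else None

# Invented example data: two prestored standard sequences.
standards = {"call": [0.1, 0.8, 0.3], "hang_up": [0.9, 0.2, 0.7]}
```

For instance, `match_lip_sequence([0.12, 0.79, 0.31], standards)` returns `"call"`, while a sequence far from every standard returns `None`, corresponding to the "no match" case.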
Preferably, recognizing the user's speech to obtain the user's speech recognition result comprises:
performing speech recognition on the captured user speech to obtain a voice feature sequence for the user;
matching the obtained voice feature sequence against standard voice sequence features prestored in the terminal to find the standard voice feature sequence that matches the user's voice feature sequence; and
taking the operating instruction corresponding to the matched standard voice sequence feature as the user's speech recognition result.
Preferably, matching the obtained lip-motion recognition result against the speech recognition result to obtain the matching result comprises:
judging whether the obtained lip-motion recognition result and speech recognition result match;
when the two results match, taking the matched recognition result as the matching result; and
when the two results do not match, taking either the lip-motion recognition result or the speech recognition result as the matching result.
Preferably, taking the operating instruction corresponding to the matched standard lip sequence feature as the user's lip-motion recognition result comprises:
establishing a first mapping table from standard lip sequence features to operating instructions;
looking up, in the established first mapping table, the operating instruction corresponding to the standard lip sequence feature that matches the user's lip feature sequence; and
taking that operating instruction as the user's lip-motion recognition result.
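The "first mapping table" described above is, in essence, a lookup from a matched standard-sequence identifier to an operating instruction. A minimal sketch under invented names follows; the identifiers and instruction strings are not from the patent.

```python
# Invented identifiers and instructions, for illustration only.
FIRST_MAPPING_TABLE = {
    "lip_seq_001": "MAKE_CALL",
    "lip_seq_002": "OPEN_MESSAGES",
    "lip_seq_003": "TAKE_PHOTO",
}

def lip_motion_result(matched_seq_id):
    """The instruction mapped to the matched standard sequence becomes the
    lip-motion recognition result; an unknown id yields no result."""
    return FIRST_MAPPING_TABLE.get(matched_seq_id)
```

The second mapping table, from standard voice sequence features to operating instructions, would have the same shape.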
Preferably, taking the operating instruction corresponding to the matched standard voice sequence feature as the user's speech recognition result comprises:
establishing a second mapping table from standard voice sequence features to operating instructions;
looking up, in the established second mapping table, the operating instruction corresponding to the standard voice sequence feature that matches the user's voice feature sequence; and
taking that operating instruction as the user's speech recognition result.
According to a further aspect of the present invention, a lip-reading-based terminal operation device is provided, comprising:
a recognition module, configured to recognize a user's lip motion and speech separately and obtain a lip-motion recognition result and a speech recognition result for the user;
a matching module, configured to match the obtained lip-motion recognition result against the speech recognition result and obtain a matching result; and
an operation module, configured to operate the terminal according to the matching result.
Preferably, the recognition module comprises:
an acquisition unit, configured to acquire a sequence of facial images of the user, identify the lip region in the acquired sequence, and obtain a lip feature sequence for the user;
a lip-motion matching unit, configured to match the obtained lip feature sequence against standard lip sequence features prestored in the terminal and find the standard lip sequence feature that matches the user's lip feature sequence; and
a lip-motion result unit, configured to take the operating instruction corresponding to the matched standard lip sequence feature as the user's lip-motion recognition result.
Preferably, the recognition module further comprises:
a voice feature sequence unit, configured to perform speech recognition on the captured user speech and obtain a voice feature sequence for the user;
a voice matching unit, configured to match the obtained voice feature sequence against standard voice sequence features prestored in the terminal and find the standard voice feature sequence that matches the user's voice feature sequence; and
a speech result unit, configured to take the operating instruction corresponding to the matched standard voice sequence feature as the user's speech recognition result.
Preferably, the matching module comprises:
a judging unit, configured to judge whether the obtained lip-motion recognition result and speech recognition result match; and
an operating unit, configured to take the matched recognition result as the matching result when the two results match, and to take either the lip-motion recognition result or the speech recognition result as the matching result when they do not.
Compared with the prior art, the beneficial effect of the present invention is:
by operating the terminal through both speech recognition and lip recognition, the present invention improves the accuracy of terminal operation and brings convenience to the user.
Brief description of the drawings
Fig. 1 is a flowchart of the lip-reading-based terminal operation method provided by the present invention;
Fig. 2 is a schematic diagram of the lip-reading-based terminal operation device provided by the present invention;
Fig. 3 is a flowchart of the terminal operation method using lip reading alone, provided by the first embodiment of the present invention;
Fig. 4 is a flowchart of the terminal operation method using both lip reading and speech, provided by the second embodiment of the present invention;
Fig. 5 is a schematic diagram of the lip-reading-based terminal operation device provided by the third embodiment of the present invention.
Detailed description
Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the preferred embodiments described below are intended only to illustrate and explain the present invention, not to limit it.
Fig. 1 shows a flowchart of the lip-reading-based terminal operation method provided by the present invention. As shown in Fig. 1, the method comprises the following steps:
Step S101: recognize the user's lip motion and speech separately to obtain a lip-motion recognition result and a speech recognition result for the user.
Specifically, recognizing the user's lip motion to obtain the lip-motion recognition result comprises: acquiring a sequence of facial images of the user; identifying the lip region in the acquired sequence to obtain a lip feature sequence; matching the obtained lip feature sequence against standard lip sequence features prestored in the terminal to find the matching standard lip sequence feature; and taking the operating instruction corresponding to that feature as the lip-motion recognition result. Recognizing the user's speech to obtain the speech recognition result comprises: performing speech recognition on the captured user speech to obtain a voice feature sequence; matching it against standard voice sequence features prestored in the terminal to find the matching standard voice feature sequence; and taking the operating instruction corresponding to that feature as the speech recognition result. The facial image sequence may be acquired in several ways, for example from a camera, from a video file, or from another file type such as an animation sequence.
More particularly, taking the operating instruction corresponding to the matched standard lip sequence feature as the lip-motion recognition result comprises: establishing a first mapping table from standard lip sequence features to operating instructions; looking up, in that table, the operating instruction corresponding to the standard lip sequence feature that matches the user's lip feature sequence; and taking that instruction as the lip-motion recognition result. Likewise, taking the operating instruction corresponding to the matched standard voice sequence feature as the speech recognition result comprises: establishing a second mapping table from standard voice sequence features to operating instructions; looking up, in that table, the operating instruction corresponding to the matching standard voice sequence feature; and taking that instruction as the speech recognition result.
Step S102: match the obtained lip-motion recognition result against the speech recognition result to obtain a matching result.
Specifically, first judge whether the obtained lip-motion recognition result and speech recognition result match; then, if they match, take the matched recognition result as the matching result, and if they do not, take either the lip-motion recognition result or the speech recognition result as the matching result.
Step S103: operate the terminal according to the matching result.
Specifically, when the lip-motion recognition result and the speech recognition result match, the terminal is operated according to the matched result; when they do not match, the terminal is operated according to either the lip-motion recognition result or the speech recognition result.
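Steps S102 and S103 reduce to a small decision rule, sketched below with invented command strings. Which result to prefer on a mismatch is left configurable, since the text allows either choice.

```python
def operate_terminal(lip_result, speech_result, prefer="speech"):
    """Return the command to execute: the shared result when the two
    recognition results match, otherwise the preferred single result."""
    if lip_result == speech_result:
        return lip_result
    return speech_result if prefer == "speech" else lip_result
```

For example, when both recognizers yield `"call"` the terminal executes `"call"`; on a mismatch the configured preference decides.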
Fig. 2 shows a schematic diagram of the lip-reading-based terminal operation device provided by the present invention. As shown in Fig. 2, the device comprises: a recognition module 201, configured to recognize the user's lip motion and speech separately and obtain a lip-motion recognition result and a speech recognition result; a matching module 202, configured to match the obtained lip-motion recognition result against the speech recognition result and obtain a matching result; and an operation module 203, configured to operate the terminal according to the matching result.
Specifically, the recognition module 201 comprises: an acquisition unit, configured to acquire a sequence of facial images of the user, identify the lip region in the acquired sequence, and obtain a lip feature sequence; a lip-motion matching unit, configured to match the obtained lip feature sequence against standard lip sequence features prestored in the terminal and find the matching standard lip sequence feature; and a lip-motion result unit, configured to take the operating instruction corresponding to the matched feature as the lip-motion recognition result. It further comprises: a voice feature sequence unit, configured to perform speech recognition on the captured user speech and obtain a voice feature sequence; a voice matching unit, configured to match the obtained voice feature sequence against standard voice sequence features prestored in the terminal and find the matching standard voice feature sequence; and a speech result unit, configured to take the operating instruction corresponding to the matched feature as the speech recognition result.
The matching module 202 comprises: a judging unit, configured to judge whether the obtained lip-motion recognition result and speech recognition result match; and an operating unit, configured to take the matched recognition result as the matching result when the two results match, and to take either the lip-motion recognition result or the speech recognition result as the matching result when they do not.
Fig. 3 shows a flowchart of the terminal operation method using lip reading alone, provided by the first embodiment of the present invention. As shown in Fig. 3, the method comprises the following steps:
Step S301: acquire an image sequence.
The terminal starts the lip-reading application, acquires the corresponding images containing a face, and identifies the face region; that is, the image sequence contains the face region, or at least the lip region.
The image sequence may be acquired in several ways, for example from a camera, from a video file, or from another file type such as an animation sequence.
Step S302: identify the lip region in the acquired image sequence and obtain the user's lip feature sequence.
From the image sequence acquired in step S301, the lip region is identified: feature points that chiefly affect lip reading, such as the left and right lip corners, the upper-lip apex, the lowest point of the lower lip, and the lip contour lines, are identified and marked, forming a lip-region feature sequence.
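As an illustration of the feature points just named, the sketch below turns per-frame lip landmarks (left and right lip corners, upper-lip apex, lower-lip low point) into a simple per-frame shape descriptor and stacks the descriptors in time order. The landmark format and the two derived features are assumptions for illustration; a real system would obtain landmarks from a face tracker and use richer contour features.

```python
def frame_features(landmarks):
    """landmarks: dict of (x, y) points keyed 'left', 'right', 'upper',
    'lower'. Returns (mouth width, mouth opening) for that frame."""
    width = landmarks["right"][0] - landmarks["left"][0]
    opening = landmarks["lower"][1] - landmarks["upper"][1]
    return (width, opening)

def build_lip_sequence(frames):
    # One descriptor per frame, in time order: the lip-region
    # feature sequence fed to the recognition module.
    return [frame_features(f) for f in frames]
```

Stacking these per-frame descriptors in time order yields the kind of sequence that the matching step compares against prestored standards.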
Step S303: extract the lip-region feature sequence and pass it to the lip feature recognition module to obtain a recognition result.
The lip-region feature sequence is extracted and arranged in time order to form a lip feature motion sequence chart, on which model recognition is performed; that is, the lip-region feature sequence is extracted and exchanged with the lip feature recognition module to obtain the recognition result.
Step S304: match the recognition result to the corresponding command so as to operate the terminal device.
The recognition result is matched by the interaction command module and converted into the corresponding command; the terminal performs the corresponding operation in response, and one interaction is complete.
The following specific embodiment, in which lip-reading recognition directly controls the device, illustrates the present invention in detail:
Step 1: open the device camera and start the lip-reading application module.
Step 2: track the user's head and face through the interactive interface. The device first identifies the head and face, then, based on their attributes, locates the lips, identifies the lip region, and tracks the lip movement.
Step 3: extract the features of the lip movement to form a feature sequence R; this sequence is extracted as the input of the lip recognition module, which outputs a matching result S.
Step 4: match the result S against the human-computer interaction library. If a match exists, the corresponding device operation is performed, such as 'make a phone call'; if the match fails, the user is prompted that the command is invalid.
The description above applies the lip-reading result directly to terminal interaction; that is, the lip-reading result drives the interaction on its own. Lip reading can also complement voice interaction: when no speech recognition result can be obtained, the lip-reading result is converted into a voice match, and the matched result is applied to the terminal interaction. This is illustrated by the embodiment of Fig. 4:
Fig. 4 shows a flowchart of the terminal operation method using both lip reading and speech, provided by the second embodiment of the present invention. As shown in Fig. 4, the method comprises the following steps:
Step S401: start the device's voice feature recognition module, open the device camera, and start the lip-reading application module.
Besides performing speech recognition, the device can perform lip recognition at the same time.
Step S402: identify the lip region through the interactive interface and track the lip movement.
The user's head and face are tracked through the interactive interface and the head and face region is identified; based on its attributes, the lips are located, the lip region is identified, and the lip movement is tracked.
Step S403: extract the features of the lip movement.
The features of the lip movement are extracted to form a feature sequence R; a feature set R1 extracted from this sequence serves as the input of the lip recognition module, which outputs a matching result S.
Step S404: compare the speech recognition result with the lip recognition result and operate the terminal according to the comparison.
The speech recognition result and the lip recognition result are compared. If they are identical, the human-computer interaction executes the command. If they differ, the user is prompted to choose between the speech recognition result and the lip-reading result; alternatively, the user may configure the terminal to prefer speech commands and to use the lip recognition result only when speech recognition yields no result. Speech recognition and lip recognition can thus confirm and supplement each other, with the aim of obtaining a better recognition result.
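The confirm-or-fall-back behaviour of step S404 can be sketched as follows, with invented function and command names: execute on agreement, substitute lip reading when speech yields nothing, and otherwise signal a disagreement so the user can choose.

```python
def fuse_results(speech_result, lip_result, lip_fallback=True):
    """speech_result is None when speech recognition produced no result."""
    if speech_result is None:
        # Noisy environment or distant speaker: lip reading substitutes.
        return lip_result if lip_fallback else None
    if speech_result == lip_result:
        return speech_result  # the two results confirm each other
    return None               # disagreement: prompt the user to choose
```

A caller would treat a `None` return as "ask the user", matching the prompt behaviour described above.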
Fig. 5 shows a schematic diagram of the lip-reading-based terminal operation device provided by the third embodiment of the present invention. As shown in Fig. 5, the device comprises a lip-reading application module, a lip feature recognition module, a voice feature recognition module, and an interaction command module. The lip-reading application module is any application that uses the lip-reading output, such as a messaging application driven by lip reading. The lip feature recognition module is a shared module called by the lip-reading application module; it performs face recognition, lip recognition, lip feature extraction, and lip-language recognition. The voice feature recognition module implements speech recognition. The interaction command module is the application that realizes the human-computer interaction; it has interactive interfaces to the lip-reading application module, the lip feature recognition module, and the voice feature recognition module, receives inputs from these modules, and performs the corresponding interactive actions or produces the corresponding outputs.
In summary, the present invention provides a lip-reading-based human-computer interaction device, and in particular a lip-reading application for mobile terminals, covering matters such as the execution of interactive commands and the conversion of lip reading into speech. Lip reading can be used alone to control the terminal, or as an effective supplement to speech recognition control: when voice control fails to recognize a command in time, lip reading completes it.
In summary, the present invention has the following technical effect:
by making full use of terminal capabilities such as the front and rear cameras and high-definition image processing, the present invention puts lip reading to work, raises the ceiling of human-computer interaction, and extends existing interaction capabilities to a certain extent.
Although the present invention has been described above in detail, it is not limited thereto, and those skilled in the art can make various modifications according to the principle of the present invention. Therefore, all modifications made according to the principle of the present invention shall be understood to fall within the protection scope of the present invention.

Claims (10)

1. A lip-reading-based terminal operation method, characterized by comprising the following steps:
recognizing a user's lip motion and speech separately to obtain a lip-motion recognition result and a speech recognition result for the user;
matching the obtained lip-motion recognition result against the obtained speech recognition result to obtain a matching result; and
operating the terminal according to the matching result.
2. The method according to claim 1, characterized in that recognizing the user's lip motion to obtain the user's lip-motion recognition result comprises:
acquiring a sequence of facial images of the user;
identifying the lip region in the acquired facial image sequence to obtain a lip feature sequence for the user;
matching the obtained lip feature sequence against standard lip sequence features prestored in the terminal to find the standard lip sequence feature that matches the user's lip feature sequence; and
taking the operating instruction corresponding to the matched standard lip sequence feature as the user's lip-motion recognition result.
3. The method according to claim 1, characterized in that recognizing the user's speech to obtain the user's speech recognition result comprises:
performing speech recognition on the captured user speech to obtain a voice feature sequence for the user;
matching the obtained voice feature sequence against standard voice sequence features prestored in the terminal to find the standard voice feature sequence that matches the user's voice feature sequence; and
taking the operating instruction corresponding to the matched standard voice sequence feature as the user's speech recognition result.
4. The method according to claim 2 or 3, characterized in that matching the obtained lip-motion recognition result against the speech recognition result to obtain the matching result comprises:
judging whether the obtained lip-motion recognition result and speech recognition result match;
when the two results match, taking the matched recognition result as the matching result; and
when the two results do not match, taking either the lip-motion recognition result or the speech recognition result as the matching result.
5. The method according to claim 2, characterized in that taking the operating instruction corresponding to the matched standard lip sequence feature as the user's lip-motion recognition result comprises:
establishing a first mapping table from standard lip sequence features to operating instructions;
looking up, in the established first mapping table, the operating instruction corresponding to the standard lip sequence feature that matches the user's lip feature sequence; and
taking that operating instruction as the user's lip-motion recognition result.
6. The method according to claim 3, characterized in that taking the operating instruction corresponding to the matched standard voice sequence feature as the user's speech recognition result comprises:
establishing a second mapping table from standard voice sequence features to operating instructions;
looking up, in the established second mapping table, the operating instruction corresponding to the standard voice sequence feature that matches the user's voice feature sequence; and
taking that operating instruction as the user's speech recognition result.
7. A lip-reading-based terminal operation device, characterized by comprising:
a recognition module, configured to recognize a user's lip motion and speech separately and obtain a lip-motion recognition result and a speech recognition result for the user;
a matching module, configured to match the obtained lip-motion recognition result against the speech recognition result and obtain a matching result; and
an operation module, configured to operate the terminal according to the matching result.
8. The device according to claim 7, characterized in that the recognition module comprises:
an acquisition unit, configured to acquire a sequence of facial images of the user, identify the lip region in the acquired sequence, and obtain a lip feature sequence for the user;
a lip-motion matching unit, configured to match the obtained lip feature sequence against standard lip sequence features prestored in the terminal and find the standard lip sequence feature that matches the user's lip feature sequence; and
a lip-motion result unit, configured to take the operating instruction corresponding to the matched standard lip sequence feature as the user's lip-motion recognition result.
9. The device according to claim 7, characterized in that the recognition module further comprises:
a voice feature sequence unit, configured to perform speech recognition on the captured user speech and obtain a voice feature sequence for the user;
a voice matching unit, configured to match the obtained voice feature sequence against standard voice sequence features prestored in the terminal and find the standard voice feature sequence that matches the user's voice feature sequence; and
a speech result unit, configured to take the operating instruction corresponding to the matched standard voice sequence feature as the user's speech recognition result.
10. The device according to claim 8 or claim 9, characterized in that the matching module comprises:
a judging unit, configured to judge whether the obtained lip motion recognition result and voice recognition result of the user match;
an operating unit, configured to take the matched recognition result as the matching result when the lip motion recognition result matches the voice recognition result, and to take either the lip motion recognition result or the voice recognition result as the matching result when they do not match.
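The matching and operation flow of claims 7 and 10 reduces to a small decision function: when the two modalities agree, the agreed instruction is the matching result; when they differ, the claim allows either single result to be used. The sketch below prefers the lip motion result in the unmatched case, which is one choice the claim leaves open, and the dispatch table is illustrative.

```python
def matching_module(lip_result, voice_result):
    """Judging + operating units of claim 10: produce the matching
    result from the two recognition results."""
    if lip_result == voice_result:
        return lip_result                  # results match: use the agreed one
    return lip_result or voice_result      # unmatched: fall back to either

def operation_module(terminal_actions, matching_result):
    """Operate the terminal according to the matching result by
    dispatching the instruction through a lookup table of actions."""
    action = terminal_actions.get(matching_result)
    return action() if action else None
```

For example, if both recognizers return `"open_camera"`, the matching result is `"open_camera"` and the corresponding terminal action is executed.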
CN201410153736.2A 2014-04-17 2014-04-17 Method and device of terminal operation based on lip reading Withdrawn CN105022470A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201410153736.2A CN105022470A (en) 2014-04-17 2014-04-17 Method and device of terminal operation based on lip reading
PCT/CN2014/084557 WO2015158082A1 (en) 2014-04-17 2014-08-15 Lip-reading based terminal operation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410153736.2A CN105022470A (en) 2014-04-17 2014-04-17 Method and device of terminal operation based on lip reading

Publications (1)

Publication Number Publication Date
CN105022470A (en) 2015-11-04

Family

ID=54323443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410153736.2A Withdrawn CN105022470A (en) 2014-04-17 2014-04-17 Method and device of terminal operation based on lip reading

Country Status (2)

Country Link
CN (1) CN105022470A (en)
WO (1) WO2015158082A1 (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101472066A (en) * 2007-12-27 2009-07-01 华晶科技股份有限公司 Near-end control method of image viewfinding device and image viewfinding device applying the method
CN101510256A (en) * 2009-03-20 2009-08-19 深圳华为通信技术有限公司 Mouth shape language conversion method and device
CN102023703A (en) * 2009-09-22 2011-04-20 现代自动车株式会社 Combined lip reading and voice recognition multimodal interface system
CN102298443A (en) * 2011-06-24 2011-12-28 华南理工大学 Smart home voice control system combined with video channel and control method thereof
CN202110564U (en) * 2011-06-24 2012-01-11 华南理工大学 Intelligent household voice control system combined with video channel
CN102324035A (en) * 2011-08-19 2012-01-18 广东好帮手电子科技股份有限公司 Method and system of applying lip posture assisted speech recognition technique to vehicle navigation
CN102664008A (en) * 2012-04-27 2012-09-12 上海量明科技发展有限公司 Method, terminal and system for transmitting data
EP2562746A1 (en) * 2011-08-25 2013-02-27 Samsung Electronics Co., Ltd. Apparatus and method for recognizing voice by using lip image

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105632497A (en) * 2016-01-06 2016-06-01 昆山龙腾光电有限公司 Voice output method, voice output system
CN106250829A (en) * 2016-07-22 2016-12-21 中国科学院自动化研究所 Digit recognition method based on lip texture structure
CN108227904A (en) * 2016-12-21 2018-06-29 深圳市掌网科技股份有限公司 A kind of virtual reality language interactive system and method
WO2018113649A1 (en) * 2016-12-21 2018-06-28 深圳市掌网科技股份有限公司 Virtual reality language interaction system and method
WO2018113650A1 (en) * 2016-12-21 2018-06-28 深圳市掌网科技股份有限公司 Virtual reality language interaction system and method
CN108227903A (en) * 2016-12-21 2018-06-29 深圳市掌网科技股份有限公司 A kind of virtual reality language interactive system and method
CN108227903B (en) * 2016-12-21 2020-01-10 深圳市掌网科技股份有限公司 Virtual reality language interaction system and method
CN107293300A (en) * 2017-08-01 2017-10-24 珠海市魅族科技有限公司 Audio recognition method and device, computer installation and readable storage medium storing program for executing
CN108052858A (en) * 2017-10-30 2018-05-18 珠海格力电器股份有限公司 The control method and smoke exhaust ventilator of smoke exhaust ventilator
CN107839440A (en) * 2017-11-07 2018-03-27 蔡璟 A kind of vehicular air purifier based on Intelligent Recognition
CN107911614A (en) * 2017-12-25 2018-04-13 腾讯数码(天津)有限公司 A kind of image capturing method based on gesture, device and storage medium
CN107911614B (en) * 2017-12-25 2019-09-27 腾讯数码(天津)有限公司 A kind of image capturing method based on gesture, device and storage medium
CN111201786B (en) * 2018-01-17 2022-04-08 Jvc建伍株式会社 Display control device, communication device, display control method, and storage medium
CN111201786A (en) * 2018-01-17 2020-05-26 Jvc建伍株式会社 Display control device, communication device, display control method, and program
CN108521516A (en) * 2018-03-30 2018-09-11 百度在线网络技术(北京)有限公司 Control method and device for terminal device
CN108537207A (en) * 2018-04-24 2018-09-14 Oppo广东移动通信有限公司 Lip reading recognition methods, device, storage medium and mobile terminal
CN111176430A (en) * 2018-11-13 2020-05-19 奇酷互联网络科技(深圳)有限公司 Interaction method of intelligent terminal, intelligent terminal and storage medium
CN111176430B (en) * 2018-11-13 2023-10-13 奇酷互联网络科技(深圳)有限公司 Interaction method of intelligent terminal, intelligent terminal and storage medium
CN110570862A (en) * 2019-10-09 2019-12-13 三星电子(中国)研发中心 voice recognition method and intelligent voice engine device
WO2021196802A1 (en) * 2020-03-31 2021-10-07 科大讯飞股份有限公司 Method, apparatus, and device for training multimode voice recognition model, and storage medium
CN114708642A (en) * 2022-05-24 2022-07-05 成都锦城学院 Business English simulation training device, system, method and storage medium
CN114708642B (en) * 2022-05-24 2022-11-18 成都锦城学院 Business English simulation training device, system, method and storage medium

Also Published As

Publication number Publication date
WO2015158082A1 (en) 2015-10-22

Similar Documents

Publication Publication Date Title
CN105022470A (en) Method and device of terminal operation based on lip reading
CN110349081B (en) Image generation method and device, storage medium and electronic equipment
CN105979035A (en) AR image processing method and device as well as intelligent terminal
TW201805744A (en) Control system and control processing method and apparatus capable of directly controlling a device according to the collected information with a simple operation
US11048326B2 (en) Information processing system, information processing method, and program
WO2020078319A1 (en) Gesture-based manipulation method and terminal device
CN105518579A (en) Information processing device and information processing method
CN105635776B (en) Pseudo operation graphical interface remoting control method and system
CN106155315A (en) The adding method of augmented reality effect, device and mobile terminal in a kind of shooting
CN104306118A (en) Smartphone based family monitoring system on intelligent wheelchair
US10388325B1 (en) Non-disruptive NUI command
EP3343936A3 (en) Smart electronic device
CN110349232A (en) Generation method, device, storage medium and the electronic equipment of image
CN106446861A (en) Sign language recognition system, device and method
CN111522524B (en) Presentation control method and device based on conference robot, storage medium and terminal
CN111107278A (en) Image processing method and device, electronic equipment and readable storage medium
CN112241199B (en) Interaction method and device in virtual reality scene
US20150181161A1 (en) Information Processing Method And Information Processing Apparatus
CN207718803U (en) Multiple source speech differentiation identifying system
CN111104827A (en) Image processing method and device, electronic equipment and readable storage medium
CN104104899B (en) The method and apparatus that information transmits in video conference
CN102880288B (en) The method of the man-machine interaction of a kind of 3D display, device and equipment
CN105786361A (en) 3D vehicle-mounted terminal man-machine interaction system
CN110955331A (en) Human-computer interaction system based on computer virtual interface
CN110796096A (en) Training method, device, equipment and medium for gesture recognition model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20151104