CN109461437B - Verification content generation method and related device for lip language identification


Info

Publication number
CN109461437B
CN109461437B (application CN201811430520.0A)
Authority
CN
China
Prior art keywords
lip
verification
pronunciation
objects
recognition
Prior art date
Legal status
Active
Application number
CN201811430520.0A
Other languages
Chinese (zh)
Other versions
CN109461437A (en)
Inventor
庞烨
王义文
王健宗
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201811430520.0A
Publication of CN109461437A
Priority to PCT/CN2019/088800 (WO2020107834A1)
Application granted
Publication of CN109461437B
Legal status: Active

Classifications

    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G06F18/24 Classification techniques
    • G06V40/20 Recognition of biometric, human-related or animal-related patterns in image or video data: movements or behaviour, e.g. gesture recognition
    • G10L15/08 Speech classification or search
    • G10L15/25 Speech recognition using non-acoustical features: position of the lips, movement of the lips or face analysis
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L17/02 Speaker identification or verification: preprocessing operations, e.g. segment selection; pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; feature selection or extraction
    • G10L17/14 Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L2015/025 Phonemes, fenemes or fenones being the recognition units

Abstract

The embodiment of the invention discloses a method and a device for generating verification content for lip language identification, suitable for identifying facial actions in living body detection. The method comprises the following steps: acquiring a lip language identification request from a terminal device, and acquiring verification request parameters according to the request; determining the number of verification objects required for the lip language identification verification, and selecting recognition objects from a plurality of preset recognition object groups to serve as verification objects that form the verification content of the lip language identification, wherein adjacent verification objects in the verification content belong to different recognition object groups; and outputting the verification content to a verification interface of the lip language identification, and performing lip language identification verification of the verification content on the user of the terminal device. With the method and the device, verification content in which adjacent verification objects have different pronunciation lip-shape changes can be generated, reducing the probability that a lip-shape change is difficult to recognize and thereby improving the accuracy and convenience of lip language recognition.

Description

Verification content generation method and related device for lip language identification
Technical Field
The invention relates to the technical field of computers, in particular to a verification content generation method for lip language identification and a related device.
Background
At present, the demand for remote identity authentication of clients is growing, and face-based identification is one of its common means. In the face recognition process, to resist fraud, living-body detection strategies such as blinking, head shaking, and lip-reading checks of random numbers and/or characters are added on top of video verification. However, the lip language recognition used in such lip-reading checks is mainly based on voice recognition matched against the lips, even though a lip language check is nominally performed. In the prior art, lip language identification directly detects a human face in an image by machine vision, extracts the person's continuous mouth-shape variation features, inputs them into a lip language recognition model to identify the corresponding pronunciation, and then computes the most likely natural-language sentence. This approach mainly depends on context semantics to obtain the most probable match, and when the single-character mouth shapes of many Chinese characters and/or digits largely coincide, lip language recognition alone becomes very difficult.
Disclosure of Invention
The embodiment of the invention provides a verification content generation method and a related device for lip language identification, which can reduce the probability that a lip-shape change is difficult to recognize during lip language identification, improve the accuracy and convenience of lip language identification, and have strong applicability.
In one aspect, the embodiment of the invention provides a method for generating verification content for lip language identification, which comprises the following steps:
acquiring a lip language identification request of terminal equipment, and acquiring verification request parameters according to the lip language identification request;
determining the number n of verification objects required for lip language identification verification according to the verification request parameters, and selecting n recognition objects from a plurality of preset recognition object groups as n verification objects to form the verification content of the lip language identification, wherein the n verification objects belong to at least two recognition object groups, adjacent verification objects in the verification content belong to different recognition object groups, and the pronunciation lip-shape changes of the recognition objects included in different recognition object groups are different;
outputting the verification content to a verification interface of lip language identification, and carrying out lip language identification verification of the verification content on a user of the terminal equipment based on the verification interface.
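As a concrete illustration of the selection step above, the following minimal Python sketch (not part of the patent text; the group labels and the digit-to-group assignment are assumptions for illustration) picks n verification objects so that adjacent objects come from different recognition object groups:

```python
import random

# Hypothetical recognition-object groups: digits grouped by pronunciation
# lip-shape change (labels and memberships are illustrative only).
RECOGNITION_GROUPS = {
    "half_open": ["0", "1", "7"],
    "full_open": ["8"],
    "w_shape":   ["5"],
    "ou_shape":  ["6", "9"],
}

def generate_verification_content(n: int) -> list[str]:
    """Pick n verification objects such that adjacent objects
    always come from different lip-shape groups."""
    content = []
    prev_group = None
    for _ in range(n):
        # Candidate groups exclude the group used by the previous object.
        candidates = [g for g in RECOGNITION_GROUPS if g != prev_group]
        group = random.choice(candidates)
        content.append(random.choice(RECOGNITION_GROUPS[group]))
        prev_group = group
    return content

print(generate_verification_content(4))
```

Note that for n of at least 2, the adjacency constraint alone already guarantees that the verification objects belong to at least two recognition object groups, matching the requirement in the step above.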
Wherein the method further comprises: the pronunciation lip-shape change of the first verification object in the verification content of the lip language identification does not include a lip-shape change that starts from a half-open or closed mouth.
Wherein the method further comprises:
acquiring a plurality of recognition objects, wherein the plurality of recognition objects include recognition objects with at least two types of lip-shape change;
determining the pronunciation lip-shape change of each of the plurality of recognition objects;
dividing the recognition objects whose pronunciation lip-shape change is the first of the at least two types into a first recognition object group, and dividing the recognition objects whose pronunciation lip-shape change is the second of the at least two types into a second recognition object group, so as to obtain a plurality of recognition object groups.
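The grouping step above can be sketched as follows (a hypothetical Python illustration; the lip-change labels assigned to each object are assumptions, and only the group-by-identical-change logic follows the text):

```python
from collections import defaultdict

# Hypothetical mapping from recognition object to its pronunciation
# lip-shape change (the tuple values are illustrative labels).
LIP_CHANGE = {
    "0": ("half",), "1": ("half",), "4": ("half", "w"),
    "5": ("w",), "2": ("half", "ou"),
}

def group_by_lip_change(lip_changes: dict) -> dict:
    """Objects whose pronunciation lip-shape change is identical fall
    into the same recognition-object group; changes differ between groups."""
    groups = defaultdict(list)
    for obj, change in lip_changes.items():
        groups[change].append(obj)
    return dict(groups)

print(group_by_lip_change(LIP_CHANGE))
```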
Optionally, the determining the pronunciation lip-shape change of each of the plurality of recognition objects includes:
classifying a plurality of Chinese phonemes according to pronunciation lip shape to obtain a phoneme classification result, wherein the phoneme classification result includes the correspondence between phonemes and pronunciation lip shapes;
performing phoneme decomposition on the pinyin of any recognition object among the plurality of recognition objects, and determining the pronunciation lip shape corresponding to each decomposed phoneme according to the correspondence between phonemes and pronunciation lip shapes;
and combining the pronunciation lip shapes corresponding to the phonemes to generate the pronunciation lip-shape change of that recognition object, so as to obtain the pronunciation lip-shape changes of the plurality of recognition objects.
Optionally, the classifying the plurality of Chinese phonemes according to pronunciation lip shape to obtain a phoneme classification result includes:
acquiring a plurality of Chinese phonemes, wherein the plurality of Chinese phonemes include phonemes of at least two pronunciation lip shapes;
classifying the Chinese phonemes corresponding to the first of the at least two pronunciation lip shapes into a first category, and classifying the Chinese phonemes corresponding to the second of the at least two pronunciation lip shapes into a second category;
and storing the Chinese phonemes of the first category and of the second category into the phoneme classification result.
Specifically, the decomposing the pinyin of any recognition object into phonemes, and determining the pronunciation lip shape corresponding to each decomposed phoneme according to the correspondence between phonemes and pronunciation lip shapes, includes:
decomposing the pinyin of the recognition object into a consonant phoneme and a vowel phoneme, matching the consonant phoneme and the vowel phoneme against the phoneme classification result, and obtaining the consonant pronunciation lip shape corresponding to the consonant phoneme and the vowel pronunciation lip shape corresponding to the vowel phoneme from the correspondence between phonemes and pronunciation lip shapes in the phoneme classification result;
the combining of the pronunciation lip shapes corresponding to the phonemes to generate the pronunciation lip-shape change of the recognition object then includes: combining the consonant pronunciation lip shape with the vowel pronunciation lip shape to obtain the pronunciation lip-shape change of the recognition object.
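A minimal sketch of this decomposition-and-matching step, assuming a small illustrative phoneme-to-lip-shape table and a longest-prefix split of a pinyin syllable into an initial (consonant) phoneme and a final (vowel) phoneme (the table contents and helper names are not from the patent):

```python
# Illustrative phoneme -> lip-shape correspondence (a tiny subset,
# labels follow the half/AO/w naming used elsewhere in the text).
PHONEME_LIP = {
    "l": "half", "ing": "half", "y": "half", "i": "half",
    "ao": "AO", "w": "w", "u": "w",
}

# Pinyin initials, multi-letter ones first so "zh" is not split as "z"+"h".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

def decompose_pinyin(pinyin: str) -> tuple[str, str]:
    """Split a pinyin syllable into its initial (consonant) phoneme and
    final (vowel) phoneme by prefix match on the initials table."""
    for ini in INITIALS:
        if pinyin.startswith(ini):
            return ini, pinyin[len(ini):]
    return "", pinyin  # zero-initial syllable

def lip_change(pinyin: str) -> tuple[str, ...]:
    """Match each phoneme to its lip shape, then combine them,
    merging consecutive identical shapes into one."""
    ini, fin = decompose_pinyin(pinyin)
    shapes = [PHONEME_LIP[p] for p in (ini, fin) if p]
    collapsed = [s for i, s in enumerate(shapes) if i == 0 or s != shapes[i - 1]]
    return tuple(collapsed)

print(lip_change("ling"))  # ('half',)
print(lip_change("yao"))   # ('half', 'AO')
```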
Another aspect of the embodiment of the present invention provides a verification content generating apparatus for lip language identification, including:
the response module is used for acquiring a lip language identification request of the terminal equipment and acquiring verification request parameters according to the lip language identification request;
the processing module is configured to determine, according to the verification request parameters obtained by the response module, the number n of verification objects required for lip language identification verification, and to select n recognition objects from a plurality of preset recognition object groups as n verification objects to form the verification content of the lip language identification, wherein the n verification objects belong to at least two recognition object groups, adjacent verification objects in the verification content belong to different recognition object groups, and the pronunciation lip-shape changes of the recognition objects included in different recognition object groups are different;
and the output module is used for outputting the verification content formed by the processing module to a verification interface of the lip language identification, and carrying out the lip language identification verification of the verification content on the user of the terminal equipment based on the verification interface.
Wherein the processing module is further configured to:
acquiring a plurality of recognition objects, wherein the plurality of recognition objects include recognition objects with at least two types of lip-shape change;
determining the pronunciation lip-shape change of each of the plurality of recognition objects;
dividing the recognition objects whose pronunciation lip-shape change is the first of the at least two types into a first recognition object group, and dividing the recognition objects whose pronunciation lip-shape change is the second of the at least two types into a second recognition object group, so as to obtain a plurality of recognition object groups.
Wherein the processing module is further configured to:
classifying a plurality of Chinese phonemes according to pronunciation lip shape to obtain a phoneme classification result, wherein the phoneme classification result includes the correspondence between phonemes and pronunciation lip shapes;
performing phoneme decomposition on the pinyin of any recognition object among the plurality of recognition objects, and determining the pronunciation lip shape corresponding to each decomposed phoneme according to the correspondence between phonemes and pronunciation lip shapes;
and combining the pronunciation lip shapes corresponding to the phonemes to generate the pronunciation lip-shape change of that recognition object, so as to obtain the pronunciation lip-shape changes of the plurality of recognition objects.
Wherein the processing module is further configured to:
acquiring a plurality of Chinese phonemes, wherein the plurality of Chinese phonemes include phonemes of at least two pronunciation lip shapes;
classifying the Chinese phonemes corresponding to the first of the at least two pronunciation lip shapes into a first category, and classifying the Chinese phonemes corresponding to the second of the at least two pronunciation lip shapes into a second category;
and storing the Chinese phonemes of the first category and of the second category into the phoneme classification result.
Wherein the processing module is further configured to:
decomposing the pinyin of any recognition object among the plurality of recognition objects into a consonant phoneme and a vowel phoneme, matching the consonant phoneme and the vowel phoneme against the phoneme classification result, and obtaining the consonant pronunciation lip shape corresponding to the consonant phoneme and the vowel pronunciation lip shape corresponding to the vowel phoneme from the correspondence between phonemes and pronunciation lip shapes in the phoneme classification result;
the combining of the pronunciation lip shapes corresponding to the phonemes to generate the pronunciation lip-shape change of the recognition object then includes: combining the consonant pronunciation lip shape with the vowel pronunciation lip shape to obtain the pronunciation lip-shape change of the recognition object.
Wherein the apparatus further comprises:
the storage module is used for storing the phoneme classification result, a plurality of identification object groups and formulated verification content generation rules;
and the lip recognition module is configured to recognize the lip actions of the user and match them against the pronunciation lip deformations of the generated verification content, so as to obtain the lip language recognition result for the verification content.
Wherein the processing module is further configured to:
and selecting a generation rule for the verification content of the lip language identification according to system requirements, and, according to the generation rule, selecting recognition objects as verification objects and combining them into the verification content of the lip language identification.
Another aspect of the embodiment of the present invention provides a terminal device, including: a processor, a transceiver and a memory, the processor, the transceiver and the memory being interconnected, wherein the memory is adapted to store a computer program comprising program instructions, the processor and the transceiver being configured to invoke the program instructions for performing a method as described in an aspect of an embodiment of the invention.
Another aspect of the embodiments of the present invention provides a computer readable storage medium storing computer program instructions which, when executed by a processor, cause the processor to perform a method as in one aspect of the embodiments of the present invention.
According to the embodiment of the invention, Chinese phonemes are classified according to pronunciation lip shape, the recognition objects are decomposed into pinyin, and the decomposition result is matched against the phoneme classification result to obtain the pronunciation lip-shape change corresponding to each recognition object. The recognition objects are then grouped by pronunciation lip-shape change, with objects whose lip-shape changes are identical placed in the same group. When a lip language identification request from a terminal device is received, recognition objects are selected from the plurality of recognition object groups as verification objects according to a preset verification content generation rule, so that no two adjacent verification objects have the same pronunciation lip-shape change, and the verification content is generated by combination. This reduces cases in which the pronunciation lip-shape change is difficult to recognize and improves the accuracy of lip language recognition.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flowchart of recognition object grouping provided by an embodiment of the present invention;
FIG. 2 is a diagram of a phoneme classification result according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating digital sound lip deformation provided by an embodiment of the present invention;
fig. 4 is a flow chart of a verification content generation method for lip language identification according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a scenario of verification content generation for lip language recognition provided by an embodiment of the present invention;
FIG. 6-a is an interactive flowchart of a method for generating verification content for lip language identification according to an embodiment of the present invention;
FIG. 6-b is a further interactive flow chart of a verification content generation method for lip language identification provided by an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a verification content generating device for lip language identification according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Living-body verification based on lip language recognition is a living-body detection method distinct from voice verification. Lip language identification belongs to face recognition detection: by recognizing the speaker's lip actions through machine vision alone, the content spoken by the speaker can be read. Lip language identification can therefore assist voice interaction and image recognition; for example, when the surrounding noise is excessive, interference can be avoided through lip language identification, greatly improving the accuracy of system recognition. At present, lip language identification extracts a human face from an image, obtains the person's continuous mouth-shape variation features, and matches the corresponding pronunciation identified by a lip language recognition model, from which the most likely natural-language sentence is computed; a result cannot be obtained directly.
The verification content generation method for lip language identification provided by the embodiment of the invention (hereinafter, for convenience of description, the method provided by the embodiment of the invention) is applicable to mobile phones, computers, mobile internet devices (mobile internet device, MID) or other terminal devices capable of capturing lip images; the specific device may be determined according to the actual application scenario and is not limited herein. For convenience of description, a terminal device is taken as the example. Specifically, in the embodiment of the invention, a plurality of recognition objects are selected from preset recognition object groups according to a verification content generation rule, and the selected recognition objects serve as verification objects forming the verification content used for lip language identification verification. The recognition object groups are obtained by grouping together recognition objects with the same pronunciation lip-shape change, the lip-shape changes differing between groups. The method provided by the embodiment of the invention can thus select recognition objects with different pronunciation lip-shape changes, based on the recognition object groups, to form the verification content, so that the pronunciation lip-shape changes of adjacent recognition objects in the verification content differ, improving the accuracy of lip language identification with strong applicability. For convenience of description, the generation of the preset recognition object groups is described first.
Referring to fig. 1, fig. 1 is a schematic flowchart of recognition object grouping according to an embodiment of the present invention. As shown in fig. 1, the recognition object grouping process according to the embodiment of the present invention may include the following steps:
step S101, classifying the Chinese phonemes according to pronunciation lips to obtain a phoneme classification result.
In some possible embodiments, Chinese phonemes are the smallest phonetic units divided according to the natural properties of speech, abbreviated as phonemes. Analyzed by the pronunciation actions within a syllable, one pronunciation action forms one phoneme, sounds produced by the same pronunciation action are the same phoneme, and each pronunciation action corresponds to one pronunciation lip shape. Phonemes are divided into two major classes, vowels and consonants: the vowels include a, o, e, i, u, -i (front i), -i (back i) and er, and the consonants include b, p, m, f, z, c, s, d, t, n, l, zh, ch, sh, r, j, q, x, g, k, h and ng. Based on the pronunciation lip shape of each phoneme, all Chinese phonemes can be classified by pronunciation lip shape to obtain a phoneme classification result. Specifically, the phoneme classification result includes the correspondence between phonemes and pronunciation lip shapes, and all Chinese phonemes can be classified into 7 types. The phoneme classification result is shown in fig. 2, which is a schematic diagram of the phoneme classification result provided in the embodiment of the present invention. As shown in fig. 2, the 7 types of the phoneme classification result are as follows:
the first class (deformation 1) is a half-open lip shape, shown as mouth shape 1 in fig. 2; the phoneme classification result of the first class may include the phonemes e, i, d, t, n, l, g, k, h, j, q, x, z, c, s, zh, ch, sh, ng and y;
the second class (deformation 2) is a fully open lip shape, shown as mouth shape 2 in fig. 2; the phoneme classification result of the second class may include the phonemes a and er;
the third class (deformation 3) is an ao-shaped lip, shown as mouth shape 3 in fig. 2; the phoneme classification result of the third class may include ao;
the fourth class (deformation 4) is a w-shaped lip, shown as mouth shape 4 in fig. 2; the phoneme classification result of the fourth class may include u, v, o and w;
the fifth class (deformation 5) is an ou-shaped lip, shown as mouth shape 5 in fig. 2; the phoneme classification result of the fifth class may include ou and iu;
the sixth class (deformation 6) is a closed-mouth lip shape, shown as mouth shape 6 in fig. 2; the phoneme classification result of the sixth class may include b, p and m;
the seventh class (deformation 7) is a labiodental lip shape, shown as mouth shape 7 in fig. 2; the phoneme classification result of the seventh class may include f.
Here ao, ou and iu are each composed of multiple phonemes, but under the single-mouth-shape matching principle, since each of these phoneme combinations is pronounced with a single unchanging lip shape, they are treated here as single phonemes. The single-mouth-shape matching principle takes a phoneme, or a phoneme combination whose lip shape does not change, as the unit for subsequent pronunciation lip-shape matching of recognition objects. It can be understood that phonemes, as the smallest phonetic units, are the basic units composing the pinyin of a recognition object and can serve as the basis for the subsequent pronunciation lip-shape matching of the verification content.
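The seven classes above can be written directly as a lookup table, and inverting it yields the phoneme-to-lip-shape correspondence used for matching. The following Python sketch is illustrative only; the English class labels are paraphrases of the lip-shape names in the text:

```python
# The 7-way phoneme classification described above, encoded as a table.
PHONEME_CLASSES = {
    "half_open": ["e", "i", "d", "t", "n", "l", "g", "k", "h",
                  "j", "q", "x", "z", "c", "s", "zh", "ch", "sh", "ng", "y"],
    "full_open": ["a", "er"],
    "ao_shape":  ["ao"],
    "w_shape":   ["u", "v", "o", "w"],
    "ou_shape":  ["ou", "iu"],
    "closed":    ["b", "p", "m"],
    "labiodental": ["f"],
}

# Invert into the phoneme -> pronunciation-lip-shape correspondence.
PHONEME_TO_LIP = {p: shape for shape, ps in PHONEME_CLASSES.items() for p in ps}

print(PHONEME_TO_LIP["sh"])  # half_open
```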
Step S102, performing pinyin decomposition on any recognition object of the plurality of recognition objects, and performing pronunciation lip matching on the pinyin decomposition result and the phoneme classification result.
In some possible embodiments, the pronunciation of any recognition object may be an independent syllable, and each syllable may be composed of phonemes or combinations of phonemes. Specifically, a plurality of recognition objects may be acquired and the pinyin of any one of them decomposed, yielding the consonant phoneme and the vowel phoneme that compose that pinyin. The consonant phoneme and the vowel phoneme are then matched for pronunciation lip shape against the phoneme classification result obtained in step S101: the consonant pronunciation lip shape corresponding to the consonant phoneme and the vowel pronunciation lip shape corresponding to the vowel phoneme are determined from the correspondence between phonemes and pronunciation lip shapes in the phoneme classification result, and the two are combined to obtain the pronunciation lip-shape change of that recognition object.
Here, assume the recognition objects are digits, comprising the ten recognition objects 0 to 9. The consonant phoneme and vowel phoneme of each digit are obtained by pinyin decomposition and matched against the phoneme classification result to obtain the corresponding consonant and vowel pronunciation lip shapes, which are combined to obtain the pronunciation lip-shape change of each digit. Referring specifically to fig. 3, which is a schematic diagram of digit pronunciation lip deformation provided by the embodiment of the present invention, the pronunciation lip-shape changes of the digits are as follows:
The first recognition object is, for example, the digit 0, whose pinyin ling may be decomposed into the consonant deformation l and the vowel deformation ing, both of which correspond to the first class (i.e., deformation 1) in the phoneme classification result. The pronunciation lip shape corresponding to the first class is half-open, so combining the consonant deformation l and the vowel deformation ing gives the digit 0 a half-open pronunciation lip shape, as in deformation 1.
The second recognition object is, for example, the digit 1, whose pinyin yi may be decomposed into the consonant deformation y and the vowel deformation i, both of which correspond to the first class (i.e., deformation 1). The pronunciation lip shape of the first class is half-open, so the digit 1 read as yi has a half-open pronunciation lip shape, as in deformation 1.
The second recognition object, the digit 1, may also be read as yao, which may be decomposed into the consonant deformation y and the vowel deformation ao, corresponding to the first class (i.e., deformation 1) and the third class (i.e., deformation 3), respectively. The pronunciation lip shape of the first class is half-open and that of the third class is ao-shaped, so combining the two gives the digit 1 read as yao a pronunciation lip-shape change of half-open to ao-shaped, i.e., deformation 1 to deformation 3.
The third recognition object is, for example, the digit 2, whose pinyin er may be decomposed into the consonant deformation e and the vowel deformation er, corresponding to the first class (i.e., deformation 1) and the second class (i.e., deformation 2), respectively. The pronunciation lip shape of the first class is half-open and that of the second class is fully open, so the digit 2 has a pronunciation lip-shape change of half-open to fully open, i.e., deformation 1 to deformation 2.
The fourth recognition object is, for example, the digit 3, whose pinyin san may be decomposed into the consonant deformation s and the vowel deformation an, corresponding to the first class (i.e., deformation 1) and the second class (i.e., deformation 2), respectively. The pronunciation lip shape of the first class is half-open and that of the second class is fully open, so the digit 3 has a pronunciation lip-shape change of half-open to fully open, i.e., deformation 1 to deformation 2.
The fifth recognition object is, for example, the digit 4, whose pinyin si may be decomposed into the consonant deformation s and the vowel deformation i, both of which correspond to the first class (i.e., deformation 1). The pronunciation lip shape of the first class is half-open, so the digit 4 has a half-open pronunciation lip shape, as in deformation 1.
The sixth recognition object is, for example, the digit 5, whose pinyin wu may be decomposed into the consonant deformation w and the vowel deformation u, both of which correspond to the fourth class (i.e., deformation 4). The pronunciation lip shape of the fourth class is W-shaped, so the digit 5 has a W-shaped pronunciation lip shape, as in deformation 4.
The seventh recognition object is, for example, the digit 6, whose pinyin liu may be decomposed into the consonant deformation l and the vowel deformation iu, corresponding to the first class (i.e., deformation 1) and the fifth class (i.e., deformation 5), respectively. The pronunciation lip shape of the first class is half-open and that of the fifth class is ou-shaped, so the digit 6 has a pronunciation lip-shape change of half-open to ou-shaped, i.e., deformation 1 to deformation 5.
The eighth recognition object is, for example, the digit 7, whose pinyin qi may be decomposed into the consonant deformation q and the vowel deformation i, both of which correspond to the first class (i.e., deformation 1). The pronunciation lip shape of the first class is half-open, so the digit 7 has a half-open pronunciation lip shape, as in deformation 1.
The ninth recognition object is, for example, the digit 8, whose pinyin ba may be decomposed into the consonant deformation b and the vowel deformation a, corresponding to the sixth class (i.e., deformation 6) and the second class (i.e., deformation 2), respectively. The pronunciation lip shape of the sixth class is closed and that of the second class is fully open, so the digit 8 has a pronunciation lip-shape change of closed to fully open, i.e., deformation 6 to deformation 2.
The tenth recognition object is, for example, the digit 9, whose pinyin jiu may be decomposed into the consonant deformation j and the vowel deformation iu, corresponding to the first class (i.e., deformation 1) and the fifth class (i.e., deformation 5), respectively. The pronunciation lip shape of the first class is half-open and that of the fifth class is ou-shaped, so the digit 9 has a pronunciation lip-shape change of half-open to ou-shaped, i.e., deformation 1 to deformation 5.
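The digit decomposition above can be sketched in code. The phoneme-to-class mapping and lip-shape names are transcribed from the description in this embodiment; the table and function names are illustrative, and only the yi reading of the digit 1 is included:

```python
# Hypothetical phoneme -> lip-shape class mapping (deformation 1..6),
# transcribed from the digit examples above.
PHONEME_CLASS = {
    # class 1: half-open
    "l": 1, "y": 1, "s": 1, "e": 1, "q": 1, "j": 1, "i": 1, "ing": 1,
    # class 2: fully open
    "er": 2, "an": 2, "a": 2,
    # class 3: ao-shaped
    "ao": 3,
    # class 4: W-shaped
    "w": 4, "u": 4,
    # class 5: ou-shaped
    "iu": 5,
    # class 6: closed
    "b": 6,
}

LIP_SHAPE = {1: "half-open", 2: "fully open", 3: "ao-shaped",
             4: "W-shaped", 5: "ou-shaped", 6: "closed"}

# Pinyin decomposition of each digit into (initial, final);
# the digit 1 is given its yi reading only.
DIGIT_PINYIN = {
    "0": ("l", "ing"), "1": ("y", "i"), "2": ("e", "er"), "3": ("s", "an"),
    "4": ("s", "i"), "5": ("w", "u"), "6": ("l", "iu"), "7": ("q", "i"),
    "8": ("b", "a"), "9": ("j", "iu"),
}

def lip_change(obj: str) -> str:
    """Combine the initial's and final's lip shapes into one lip-shape change."""
    initial, final = DIGIT_PINYIN[obj]
    start = LIP_SHAPE[PHONEME_CLASS[initial]]
    end = LIP_SHAPE[PHONEME_CLASS[final]]
    # Identical start and end shapes collapse to a single static shape.
    return start if start == end else f"{start} to {end}"
```

For example, `lip_change("8")` combines the closed initial b with the fully open final a, matching the ninth entry above.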
Optionally, if the recognition objects are Chinese characters, because the base of characters is large, pinyin decomposition and pronunciation lip-shape matching may be performed only on high-frequency characters, selected according to the usage frequency of characters in verification codes generated for identity liveness detection, so as to obtain the pronunciation lip-shape change of each such character. In addition, a new character may be learned as a new recognition object according to a preset update time or an update instruction: the pinyin decomposition result of the new character is obtained by pinyin decomposition, and the result is matched against the phoneme classification result to obtain the pronunciation lip-shape change of the new character. The pinyin decomposition and pronunciation lip-shape generation of characters are the same as those of digits. Here it is assumed that five character recognition objects are obtained, glossed as "number" (hao), "language" (yu), "abundant" (feng), "valley" (gu), and "so" (gu).
Specifically, the pinyin hao of "number" may be decomposed into the consonant deformation h and the vowel deformation ao, which correspond to the first and third classes, respectively, in the phoneme classification result shown in fig. 2. As can be seen from fig. 2, the pronunciation lip shape of the consonant deformation h is half-open and that of the vowel deformation ao is ao-shaped, so combining the two gives "number" a pronunciation lip-shape change of half-open to ao-shaped.
The pinyin yu of "language" may be decomposed into the consonant deformation y and the vowel deformation u, which correspond to the first and fourth classes, respectively, in the phoneme classification result shown in fig. 2. As can be seen from fig. 2, the pronunciation lip shape of the consonant deformation y is half-open and that of the vowel deformation u is W-shaped, so combining the two gives "language" a pronunciation lip-shape change of half-open to W-shaped.
The pinyin feng of "abundant" may be decomposed into the consonant deformation f and the vowel deformation eng, which correspond to the seventh and first classes, respectively, in the phoneme classification result shown in fig. 2. As can be seen from fig. 2, the pronunciation lip shape of the consonant deformation f is lip-bite and that of the vowel deformation eng is half-open, so combining the two gives "abundant" a pronunciation lip-shape change of lip-bite to half-open.
The pinyin gu of "valley" may be decomposed into the consonant deformation g and the vowel deformation u, which correspond to the first and fourth classes, respectively, in the phoneme classification result shown in fig. 2. As can be seen from fig. 2, the pronunciation lip shape of the consonant deformation g is half-open and that of the vowel deformation u is W-shaped, so combining the two gives "valley" a pronunciation lip-shape change of half-open to W-shaped.
The pinyin gu of "so" may likewise be decomposed into the consonant deformation g and the vowel deformation u, which correspond to the first and fourth classes, respectively, so combining the two gives "so" a pronunciation lip-shape change of half-open to W-shaped.
The five characters "number", "language", "abundant", "valley", and "so" above merely exemplify the pinyin decomposition and pronunciation lip-shape generation of characters; the characters used include but are not limited to these five and may be determined according to the actual application scenario, which is not limited herein.
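Applying the same procedure to the five example characters can be sketched as follows; the decomposition table, the gloss-based keys, and the function name are illustrative assumptions transcribed from the description above:

```python
# Hypothetical (initial, final) decomposition of the five example characters,
# keyed by their English gloss plus pinyin.
CHAR_PINYIN = {
    "number (hao)": ("h", "ao"),
    "language (yu)": ("y", "u"),
    "abundant (feng)": ("f", "eng"),
    "valley (gu)": ("g", "u"),
    "so (gu)": ("g", "u"),
}

# Phoneme classes and lip shapes as described for fig. 2 (class 7 is lip-bite).
PHONEME_CLASS = {"h": 1, "y": 1, "g": 1, "eng": 1, "ao": 3, "u": 4, "f": 7}
LIP_SHAPE = {1: "half-open", 3: "ao-shaped", 4: "W-shaped", 7: "lip-bite"}

def char_lip_change(char: str) -> str:
    """Combine the character's initial and final lip shapes, as for digits."""
    initial, final = CHAR_PINYIN[char]
    start = LIP_SHAPE[PHONEME_CLASS[initial]]
    end = LIP_SHAPE[PHONEME_CLASS[final]]
    return start if start == end else f"{start} to {end}"
```

As in the text, "valley" and "so" share the pinyin gu and therefore the same lip-shape change.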
Step S103, grouping the plurality of recognition objects based on the pronunciation lip-shape changes obtained by the lip-shape matching.
In some possible embodiments, the plurality of recognition objects may be grouped based on the pronunciation lip-shape change of each recognition object: recognition objects with the same lip-shape change are placed in the same group, and recognition objects with different lip-shape changes are placed in different groups. Specifically, after step S102 has been performed on a recognition object, the existing recognition object groups may be searched. If a group already exists whose lip-shape change is the same as that of the recognition object, the recognition object is stored in that group. If no such group exists, a new recognition object group is created for that lip-shape change and the recognition object is stored in it. By applying the same operation to every recognition object, each object is finally placed into the group corresponding to its lip-shape change, yielding a plurality of recognition object groups.
Optionally, the pronunciation lip-shape changes of the plurality of recognition objects may be marked: after the statistics of the lip-shape changes are completed, the same mark is added to recognition objects with the same lip-shape change, and objects with the same mark are placed into one group, yielding a plurality of recognition object groups.
Alternatively, assuming that the recognition objects are the ten digits 0 to 9, the digits may be grouped sequentially according to the pronunciation lip deformations obtained in step S102 and shown in fig. 3, yielding a plurality of digit recognition object groups. Specifically, grouping the digits by pronunciation lip deformation gives the following:
a first group, comprising 0, 1 (yi), 4, and 7, whose corresponding lip-shape change is half-open;
a second group, comprising 2, 3, and 1 (yao), whose corresponding lip-shape change is half-open to fully open / ao-shaped;
a third group, comprising 6 and 9, whose corresponding lip-shape change is half-open to ou-shaped;
a fourth group, comprising 5, whose corresponding lip shape is W-shaped;
a fifth group, comprising 8, whose corresponding lip-shape change is closed to fully open.
Here the lip-shape change of 1 (yao), half-open to ao-shaped, is treated as similar to that of 2 and 3, so 1 (yao) can be placed in the same group.
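The grouping in step S103 can be sketched as follows. The lip-change labels are transcribed from the embodiment; note that this exact-match sketch keeps 1 (yao) in a group of its own, whereas the embodiment above merges it into the second group by treating its ending as similar:

```python
from collections import defaultdict

# Lip-shape change of each digit, as derived in step S102; the digit 1
# appears twice, once per reading (yi and yao).
LIP_CHANGES = {
    "0": "half-open", "1(yi)": "half-open", "4": "half-open", "7": "half-open",
    "2": "half-open to fully open", "3": "half-open to fully open",
    "1(yao)": "half-open to ao-shaped",
    "6": "half-open to ou-shaped", "9": "half-open to ou-shaped",
    "5": "W-shaped",
    "8": "closed to fully open",
}

def group_by_lip_change(lip_changes: dict) -> dict:
    """Step S103 sketch: objects with the same lip-shape change share one
    group; a new group is created when no existing group matches."""
    groups = defaultdict(list)
    for obj, change in lip_changes.items():
        groups[change].append(obj)
    return dict(groups)
```

With exact matching this yields six groups for the digits; merging the ao-shaped ending into the fully open group, as the embodiment does, would reduce this to the five groups listed above.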
Optionally, if the recognition objects are characters and a plurality of digit recognition object groups already exist, step S102 is performed sequentially on the acquired character recognition objects to obtain the pronunciation lip-shape change of each, which is then matched against the existing digit groups. If a group with the same lip-shape change as a character recognition object exists, the character is stored in that group; otherwise a new recognition object group is created and the character is stored in it.
Alternatively, it is assumed here that the digit recognition object groups already exist and that the character recognition objects are stored at the same addresses as the digit groups. The five character recognition objects from step S102 are grouped as follows. The lip-shape change of "number" is half-open to ao-shaped, the same as that of the second group, so "number" is placed in the second group. The lip-shape change of "language" is half-open to W-shaped, which is detected to be the same as the W-shaped change of the fourth group, so "language" is placed in the fourth group. The lip-shape change of "abundant" is lip-bite to half-open, for which no existing group matches, so a sixth group is newly created and "abundant" is placed in it. The lip-shape change of "valley" is half-open to W-shaped, detected to be the same as that of the fourth group, so "valley" is placed in the fourth group. The lip-shape change of "so" is the same as that of "valley" and is grouped in the same way, so "so" is also placed in the fourth group.
Specifically, the obtained grouping of recognition objects is as follows:
a first group, comprising 0, 1 (yi), 4, and 7, whose corresponding lip-shape change is half-open;
a second group, comprising 2, 3, 1 (yao), and "number", whose corresponding lip-shape change is half-open to fully open / ao-shaped;
a third group, comprising 6 and 9, whose corresponding lip-shape change is half-open to ou-shaped;
a fourth group, comprising 5, "language", "valley", and "so", whose corresponding lip-shape change is W-shaped (half-open to W-shaped for the characters);
a fifth group, comprising 8, whose corresponding lip-shape change is closed to fully open;
and a sixth group, comprising "abundant", whose corresponding lip-shape change is lip-bite to half-open.
Alternatively, the digit recognition object groups may be stored separately from the character recognition object groups, that is, in different address spaces, so that after the type of verification object composing the verification content is determined, recognition objects can be selected from the corresponding groups as verification objects. Optionally, the grouping process is the same as when digits and characters are stored together, except that the digit recognition objects are accessed when grouping digits and the character recognition objects are accessed when grouping characters.
According to the above grouping process, recognition objects can be grouped by pronunciation lip deformation: objects in the same group have the same lip-shape change, and objects in different groups have different lip-shape changes. This provides a selection basis for generating verification content for lip language identification: as long as the recognition objects selected as adjacent verification objects do not belong to the same group, the lip-shape changes of adjacent verification objects differ, making it easy to obtain verification content whose adjacent objects have different lip-shape changes, reducing cases in which the verification content is difficult to recognize and improving the accuracy of lip language recognition.
Based on the multiple recognition object groups generated in the steps shown in fig. 1, the method provided by the embodiment of the present invention may generate the verification content required for lip language identification verification, as described below with reference to fig. 4.
Referring to fig. 4, fig. 4 is a flowchart of a verification content generation method for lip language identification according to an embodiment of the present invention. As shown in fig. 4, the method for generating verification content for lip language identification according to the embodiment of the present invention may include the following steps:
Step S401, a lip language identification request of the terminal equipment is obtained, and verification request parameters are obtained according to the lip language identification request.
In some possible embodiments, after the lip language identification request of the terminal device is obtained, the verification request parameters of the terminal device are obtained from the request; the verification request parameters include at least the number n of verification objects required for the lip language identification verification. Optionally, the method provided by the embodiment of the invention may be executed by the terminal device: after the terminal device obtains the user's lip language identification instruction through the verification request interface, it sends a lip language identification request to its processor, and the processor determines information such as the number of verification objects composing the verification content, thereby obtaining the verification request parameters. Optionally, the terminal device may display the lip language identification verification interface on its display and, based on the user operation instruction obtained on that interface, send a lip language identification verification request to a server connected to the terminal device. Here the server may store data such as the recognition object groups produced by steps S101 to S103 shown in fig. 1 and the pronunciation lip-shape changes of each group.
Specifically, assume a user needs to perform identity verification, the verification mode is lip language identification, and the verification content is generated by a server. The user submits a verification application on the identity verification interface of the terminal device; after receiving the application, the terminal device sends a lip language identification request to the server. The server obtains the request and extracts the information in it based on the identity verification interface of the terminal device, including the verification interface information of the terminal device (the verification content length and content type), thereby obtaining the verification request parameters of the terminal device.
Step S402, determining the number of verification objects required for the lip language identification verification according to the verification request parameters, and selecting recognition objects from a plurality of preset recognition object groups as verification objects to compose the verification content.
In some possible embodiments, n recognition objects are selected from the plurality of preset recognition object groups as the n verification objects composing the verification content of the lip language identification, the preset groups being those obtained in steps S101 to S103 shown in fig. 1. The verification content is composed according to a generation rule: the n verification objects belong to at least two recognition object groups, and adjacent verification objects in the verification content belong to different groups. Specifically, assuming the verification content is composed of four verification objects (i.e., n is 4), one object is selected from the plurality of recognition object groups as the first verification object; one object whose lip-shape change differs from the first is selected from the other groups as the second verification object; the third and fourth verification objects are selected in the same way; and finally the four verification objects are combined to generate the verification content.
In some possible embodiments, the verification content may be generated by a verification content generation rule formulated according to the grouping of the recognition objects, with recognition objects selected from the groups as verification objects to compose the verification content, where adjacent verification objects do not belong to the same recognition object group. Specifically, if two adjacent verification objects both belong to the first group, no lip deformation occurs between them; when the initial state of the mouth is half-open or closed, a first verification object from the first group produces no detectable lip-shape change, and a first verification object from the second or fifth group may not yield a detectable group-specific lip-shape change. Rules may therefore be formulated as follows: 1. adjacent verification objects do not both belong to the first group; 2. a recognition object in the first group is not used as the first verification object of the verification content; 3. a recognition object in the second or fifth group is not used as the first verification object of the verification content; 4. only one verification object in the verification content belongs to the first group; 5. no two adjacent verification objects belong to the same group; 6. no two verification objects in the verification content belong to the same group; and so on.
Specifically, assume four-digit verification content composed of digits is to be generated and the acquired generation rule is rule 5. One recognition object is randomly selected from the plurality of recognition object groups as the first verification object; assume it is 4, which belongs to the first group. According to the generation rule, a recognition object is then selected from the second to fifth groups as the second verification object; assume it is 5, which belongs to the fourth group. A recognition object is selected from the first, second, third, and fifth groups as the third verification object; assume it is 2, which belongs to the second group. A recognition object is selected from the first, third, fourth, and fifth groups as the fourth verification object; assume it is 6, which belongs to the third group. Composing the four verification objects yields the four-digit verification content 4526.
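The selection in step S402 under rule 5 (adjacent verification objects from different groups) can be sketched as follows; the digit groups follow the embodiment, while the function name and the use of random.Random are illustrative:

```python
import random

# Digit recognition object groups from step S103 (group id -> members).
GROUPS = {
    1: ["0", "1", "4", "7"],
    2: ["2", "3"],
    3: ["6", "9"],
    4: ["5"],
    5: ["8"],
}
# Reverse lookup: object -> its group id.
GROUP_OF = {obj: gid for gid, members in GROUPS.items() for obj in members}

def generate_verification_content(n: int, rng: random.Random) -> str:
    """Rule 5: each verification object must come from a different group
    than the object immediately before it."""
    content = []
    prev_group = None
    for _ in range(n):
        candidates = [o for o in GROUP_OF if GROUP_OF[o] != prev_group]
        obj = rng.choice(candidates)
        content.append(obj)
        prev_group = GROUP_OF[obj]
    return "".join(content)
```

Any output of this sketch, like the 4526 example above, has the property that no two adjacent digits share a group.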
Step S403, outputting the verification content to a verification interface of the lip language identification, and performing the lip language identification verification of the verification content on the user of the terminal equipment based on the verification interface.
The verification content generated in the above steps is output to the verification page, and lip language identification verification of the verification content is performed on the user of the terminal device: the user's lip actions are obtained, the user's lip features are extracted and matched to phonemes to form the corresponding recognition objects, the recognition objects are compared with the output verification content to obtain a verification result, and the result is fed back to the user.
Specifically, the generated verification content is output to the lip language identification verification interface; an identification image is obtained based on the verification interface; a face is continuously recognized from the image; the user's continuous lip deformation features are extracted and matched against the phoneme classification result to obtain the corresponding phonemes; the phonemes are combined into the corresponding pronunciations; and the pronunciations are compared with the verification content to obtain the lip language identification result. Assuming the verification content 4526 generated in step S402 is to be recognized, after the user's continuous lip deformation features are extracted and recognized, the lip-shape changes obtained are half-open, W-shaped, half-open to fully open, and half-open to ou-shaped. Matching these against the recognition object groups yields the first, fourth, second, and third groups, respectively; the lip language identification content is finally obtained according to the real-data learning result and compared with the verification content 4526, and the identification result is displayed on the verification interface as feedback to the user.
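The group-level comparison of step S403 can be sketched as follows; the embodiment further refines the recognized content with a real-data learning result, so this sketch, with an illustrative function name, only checks that the observed group sequence matches the issued verification content:

```python
# Digit -> group id, per the grouping obtained in step S103 (1 uses its yi reading).
GROUP_OF = {"0": 1, "1": 1, "4": 1, "7": 1,
            "2": 2, "3": 2, "6": 3, "9": 3, "5": 4, "8": 5}

def verify(content: str, observed_groups: list) -> bool:
    """Pass only if every group recovered from the user's lip deformations
    matches the group of the corresponding verification object."""
    expected = [GROUP_OF[c] for c in content]
    return observed_groups == expected
```

For the content 4526, the expected group sequence is first, fourth, second, third, matching the example above.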
The embodiment of the invention selects recognition objects from the recognition object groups obtained in steps S101 to S103 shown in fig. 1 as verification objects to compose the verification content, where adjacent verification objects do not belong to the same recognition object group, so the pronunciation lip-shape changes of adjacent verification objects differ, reducing cases in which the verification content is difficult to recognize and improving the accuracy of lip language recognition.
Referring to fig. 5, fig. 5 is a schematic diagram of a scenario for verification content generation of lip recognition according to an embodiment of the present invention. Assume the terminal device is a smart phone 200a, and the grouping of recognition objects, the phoneme lip-shape classification result, the acquired verification content generation rules and the like are stored in a server 100a. After the smart phone 200a finishes information input, living body detection is required to ensure that the input information was provided by the user in person, and the lip language recognition provided by the embodiment of the present invention is used here. When the lip language identification request is obtained, a verification page 201 is generated, where the verification page 201 includes a verification content display interface and a face recognition interface, and the verification content display interface displays the verification content 202. The verification content 202 is assumed to consist of four verification objects: a first verification object 2021, a second verification object 2022, a third verification object 2023 and a fourth verification object 2024.
Optionally, after receiving the verification request for lip language recognition, the terminal device 200a randomly selects a recognition object from the server 100a according to the verification content generation rule and adds it to the verification content 202 as the first verification object 2021; it then selects a recognition object whose lip shape change differs from that of the first verification object 2021 as the second verification object 2022, a recognition object whose lip shape change differs from that of the second verification object 2022 as the third verification object 2023, and a recognition object whose lip shape change differs from that of the third verification object 2023 as the fourth verification object 2024, adding each to the verification content 202. In other words, the second verification object 2022 and the first verification object 2021 do not belong to the same recognition object group, the third verification object 2023 and the second verification object 2022 do not belong to the same recognition object group, and the fourth verification object 2024 and the third verification object 2023 do not belong to the same recognition object group. Finally, the first verification object 2021, the second verification object 2022, the third verification object 2023 and the fourth verification object 2024 are combined to generate the verification content 202, which is output to the verification content display interface of the verification page 201; a face image of the user is then obtained through the face recognition interface, the user's lip changes are captured, the lip features are extracted, and lip recognition verification is performed.
Optionally, the terminal device 200a may send a verification request for lip identification to the server 100a. After receiving the request, the server 100a performs the selection of recognition objects as verification objects (described above for the terminal device) to obtain the first verification object 2021, the second verification object 2022, the third verification object 2023 and the fourth verification object 2024, combines them to generate the verification content 202, and sends the verification content 202 to the terminal device 200a, which displays it in the verification page 201 so that the user can perform lip identification of the verification content.
Alternatively, after a preset update time elapses or an update instruction is acquired, new recognition objects may be learned; steps S102 to S103 shown in fig. 1 are performed on the new recognition objects, which are then added to the plurality of recognition object groups so as to update them.
Optionally, a lip language identification request may be sent to a server through a terminal device, with the verification content generated by the server; see fig. 6-a, which is an interaction flow chart of a verification content generation method for lip language identification provided by an embodiment of the present invention. As shown in fig. 6-a, with the server as the executing body, the interactive flow of the verification content generation method for lip language identification is as follows:
Step S601a, a lip language identification request is sent.
Specifically, when the user performs the lip recognition, the terminal device sends a verification request for the lip recognition to the server, and specifically reference may be made to step S401 shown in fig. 4.
Step S602a, determining the number of verification objects and the generation rule of the verification content.
Specifically, the server determines the number n of verification objects constituting the verification content and the verification content generation rule according to the received verification request for lip language identification. Optionally, the server may store a correspondence between terminal device applications and verification content generation rule flags, or the server may randomly select a verification content generation rule when receiving the verification request.
Specifically, assume a user uses the WeChat application on a terminal device and identity authentication is required at login because the account has not been used for a long time, and assume the authentication mode is lip language recognition. After receiving the verification request, the server looks up the verification content generation rule corresponding to WeChat and determines how the verification objects of the verification content are selected and combined.
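The rule lookup described in step S602a can be sketched as below. The application identifiers and rule names are hypothetical placeholders; the patent only specifies that a per-application correspondence may exist, with random selection as a fallback.

```python
import random

# Hypothetical correspondence between terminal-device applications and
# verification content generation rule flags (names are assumptions).
APP_RULES = {"wechat": "digits-len4", "bank": "digits-len6"}
ALL_RULES = ["digits-len4", "digits-len6"]

def select_generation_rule(app_id, rules=APP_RULES, fallback=ALL_RULES):
    """Look up the generation rule registered for the requesting
    application; fall back to a randomly chosen rule when no
    correspondence is registered, as the embodiment optionally allows."""
    return rules.get(app_id) or random.choice(fallback)
```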
Step S603a, selecting a plurality of identification objects as verification objects according to the generation rule of the verification content to form the verification content. For details, see step S403 in fig. 4.
Step S604a, the verification content is sent to the terminal equipment.
Step S605a, the terminal device acquires a verification image of the user.
Specifically, the verification image is a user verification image obtained by a verification interface of the terminal device, namely a face image of the user.
Step S606a, the terminal device feeds back the acquired verification image to the server.
Step S607a, the server extracts the continuous lip change of the verification image.
Step 608a, the server identifies the continuous lip deformation, obtains corresponding pronunciation, and matches with verification content.
Step S609a, the server feeds back the lip language identification result of the verification content to the terminal equipment, and the terminal equipment displays the lip language identification result.
Specifically, steps S604a to S609a constitute the process of lip language recognition of the verification content: a face is continuously recognized from the images through machine vision, the continuous mouth-shape variation features are extracted and input into the recognition model, the features are matched with the plurality of recognition object groups to obtain the corresponding recognition object pronunciations, and those pronunciations are compared with the verification content to obtain the lip language recognition result, which is fed back to the verification interface for the user.
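The final comparison in this process reduces to checking the recognized pronunciation sequence against the expected verification content. A minimal sketch (the upstream lip-reading model that produces the recognized sequence is assumed):

```python
def verify_lip_content(recognized, expected):
    """Compare the pronunciation sequence recovered from the user's
    continuous lip deformation with the verification content; the
    verification passes only on an exact, in-order match."""
    return list(recognized) == list(expected)
```

For instance, if the model recognized the objects "4", "5", "2", "6" in order, verification against the content "4526" succeeds; a shorter or reordered sequence fails.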
Specifically, the terminal device sends a lip recognition request to the server, and the server executes steps S602a to S604a and S607a to S609a. Alternatively, steps S601a to S609a may all be executed on the server side: in step S601a the application of the terminal device sends the lip recognition request to the server, which the terminal device accesses in order to invoke the verification content generation procedure of lip recognition.
Alternatively, the terminal device may directly perform the verification content generation process of lip language identification, and the generation of the verification content is achieved by accessing the data in the memory. The memory may be an internal memory or an external memory of the terminal device, or may be a cloud server that is shared with other terminal devices, and the memory stores data obtained in steps S101 to S103 shown in fig. 1, including a phoneme classification result, a plurality of recognition object groups, and pronunciation lip changes corresponding to the respective recognition object groups.
Specifically, referring to fig. 6-b, fig. 6-b is an interaction schematic diagram of another verification content generation method for lip language identification according to an embodiment of the present invention. The concrete steps are as follows:
Step 601b, a lip language identification request is obtained.
Step 602b, determining the number of verification objects and the generation rule of the verification content.
Step 603b, selecting a plurality of identification objects according to the generation rule of the verification content.
Specifically, the terminal device sequentially selects the identification objects from the memory storing the data in steps S101 to S103 according to the generation rule of the verification content.
Step 604b, forming verification content by using the plurality of identification objects as verification objects.
Step 605b, the authentication content is fed back to the user.
In step 606b, the terminal device obtains a verification image of the user.
Step 607b, extracting the continuous lip change of the verification image.
Step 608b, identifying the continuous lip deformation, obtaining corresponding pronunciation, and matching with verification content.
Specifically, after step 607b, the terminal device obtains the phoneme classification result, the multiple recognition object groups, the pronunciation lip shape changes corresponding to the respective recognition object groups, and the like from the memory, so as to match with the continuous lip shape changes, and obtain the verification result of lip language recognition.
The specific implementation of steps S601b to S608b corresponds to that of steps S601a to S608a shown in fig. 6-a, except that here the terminal device executes the steps directly and only needs to obtain from the memory the data of steps S101 to S103 shown in fig. 1, thereby implementing the selection of recognition objects and the lip language recognition verification of the verification content.
Optionally, referring to fig. 7, fig. 7 is a verification content generating apparatus for lip language identification according to an embodiment of the present invention. As shown in fig. 7, the verification content generating apparatus 70 for lip language identification may be used for the terminal device in the embodiment corresponding to fig. 5, and the apparatus may include: a response module 701, a processing module 704 and an output module 705.
The response module 701 is configured to obtain a lip language identification request of the terminal device, and obtain a verification request parameter according to the lip language identification request.
And a processing module 704, configured to determine, according to the verification request parameter obtained by the response module, the number n of verification objects required for lip language identification verification, and select n recognition objects from a preset plurality of recognition object groups as n verification objects to form verification contents of lip language identification, where the n verification objects at least belong to two recognition object groups and adjacent verification objects in the verification contents respectively belong to different recognition object groups, where pronunciation lip changes of recognition objects included in the different recognition object groups are different.
And the output module 705 is configured to output the verification content to a verification interface for lip language identification, and perform lip language identification verification of the verification content on a user of the terminal device based on the verification interface.
Wherein, the processing module 704 is further configured to:
acquiring a plurality of identification objects, wherein the plurality of identification objects comprise identification objects with at least two types of lip changes;
determining a pronunciation lip change of each recognition object in the plurality of recognition objects;
dividing the recognition objects whose pronunciation lip change is the first type of lip change among the at least two types of lip changes into a first recognition object group, and dividing the recognition objects whose pronunciation lip change is the second type of lip change among the at least two types of lip changes into a second recognition object group, so as to obtain the plurality of recognition object groups.
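The grouping the processing module performs amounts to partitioning recognition objects by their pronunciation lip-shape change. A minimal sketch, with hypothetical lip-change labels (the patent does not publish its actual labels):

```python
from collections import defaultdict

def group_recognition_objects(lip_changes):
    """Partition recognition objects into recognition object groups:
    objects sharing the same pronunciation lip-shape change fall into
    the same group."""
    groups = defaultdict(list)
    for obj, change in lip_changes.items():
        groups[change].append(obj)
    return dict(groups)

# Hypothetical per-object lip-shape changes (labels are assumptions).
LIP_CHANGES = {
    "1": ("spread", "half-open"),
    "7": ("spread", "half-open"),
    "5": ("round",),
    "0": ("spread", "round"),
}
```

With this input, "1" and "7" land in one group while "5" and "0" each form their own, since their lip-change tuples differ.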
Wherein, the processing module 704 is further configured to:
classifying a plurality of Chinese phonemes according to pronunciation lips to obtain a phoneme classification result, wherein the phoneme classification result comprises a corresponding relation between the phonemes and the pronunciation lips;
performing phoneme decomposition on the pinyin of any recognition object in the plurality of recognition objects, and determining the pronunciation lip corresponding to each phoneme according to the correspondence between each phoneme obtained by decomposition and the phoneme and the pronunciation lip;
and combining the pronunciation lips corresponding to the phonemes to generate pronunciation lip changes of any recognition object so as to obtain pronunciation lip changes of the recognition objects.
Wherein, the processing module 704 is further configured to:
acquiring a plurality of Chinese phonemes, wherein the plurality of Chinese phonemes comprise phonemes of at least two pronunciation lips;
classifying the Chinese phonemes corresponding to a first pronunciation lip of the at least two pronunciation lips into a first category, and classifying the Chinese phonemes corresponding to a second pronunciation lip of the at least two pronunciation lips into a second category;
and storing the Chinese phonemes of the first category and the Chinese phonemes of the second category into the phoneme classification result.
Wherein, the processing module 704 is further configured to:
decomposing the pinyin of any recognition object in the plurality of recognition objects into a consonant phoneme and a vowel phoneme, performing lip-shape matching on the consonant phoneme and the vowel phoneme and the phoneme classification result, and obtaining a consonant pronunciation lip corresponding to the consonant phoneme and a vowel pronunciation lip corresponding to the vowel phoneme according to the correspondence between the phonemes and pronunciation lips in the phoneme classification result;
the combining the pronunciation lips corresponding to the phonemes to generate pronunciation lip changes of the any recognition object includes: and combining the consonant pronunciation lip corresponding to the consonant phoneme with the vowel pronunciation lip corresponding to the vowel phoneme to obtain pronunciation lip change of any recognition object.
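The decomposition and combination just described can be sketched as below: split a pinyin syllable into its consonant (initial) and vowel (final) phonemes, look each up in the phoneme classification result, and concatenate the lip shapes. The initial list, the lip-shape labels, and the toy phoneme table are illustrative assumptions.

```python
# Hypothetical phoneme -> pronunciation lip shape correspondence, i.e. a
# toy phoneme classification result (labels are assumptions).
PHONEME_LIP = {"l": "spread", "w": "round", "u": "round",
               "a": "open", "i": "spread"}

# Standard pinyin initials, longest first so "zh"/"ch"/"sh" match before "z"/"c"/"s".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "w", "y")

def decompose_pinyin(syllable):
    """Split a pinyin syllable into its consonant (initial) and vowel
    (final) phonemes; the initial may be empty for vowel-initial syllables."""
    for ini in sorted(INITIALS, key=len, reverse=True):
        if syllable.startswith(ini):
            return ini, syllable[len(ini):]
    return "", syllable

def pronunciation_lip_change(syllable, phoneme_lip=PHONEME_LIP):
    """Look up the lip shape of each phoneme and combine them into the
    syllable's pronunciation lip-shape change."""
    ini, fin = decompose_pinyin(syllable)
    return tuple(phoneme_lip[p] for p in (ini, fin) if p)
```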
Wherein, the device can also include: a storage module 702 and a lip language identification module 703.
A storage module 702, configured to store the phoneme classification result, the plurality of recognition object groups, the formulated verification content generation manner, and other data used when verification content is generated;
the lip recognition module 703 is configured to recognize a lip motion of a user, and match the lip motion of the user with a generated pronunciation lip of the verification content of the lip recognition, so as to obtain a result of the lip recognition.
Wherein, the processing module 704 is further configured to:
and selecting a generation rule of the verification content of the lip language identification according to the system requirement, and selecting the identification object as the verification object according to the generation rule to combine into the verification content of the lip language identification.
In a specific implementation, the apparatus may implement the functions implemented in the above embodiments by implementing the implementation manners provided by the steps in the implementation manners provided in fig. 1 or fig. 4 by using the above respective modules, and specifically reference may be made to corresponding descriptions provided by the steps in the method embodiment shown in fig. 1 or fig. 4, which are not repeated herein.
In the embodiment of the invention, the verification content generating device (or simply the device) can select recognition objects from the preset recognition object groups as verification objects to form the verification content, where the recognition objects corresponding to adjacent verification objects do not belong to the same recognition object group; that is, the pronunciation lip shape changes of adjacent verification objects differ, so the formed verification content exhibits a lip change between adjacent verification objects. This reduces cases in which the verification content is difficult to recognize and improves the accuracy of lip recognition.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present invention. As shown in fig. 8, the terminal device in the present embodiment may include: one or more processors 801, memory 802, and a transceiver 803. The processor 801, the memory 802, and the transceiver 803 are connected through a bus 804. The memory 802 is used to store a computer program comprising program instructions. The processor 801 and the transceiver 803 are configured to call the program instructions stored in the memory 802, and perform the following operations:
the transceiver 803 is configured to obtain a lip language identification request of the terminal device.
The processor 801 is configured to obtain a verification request parameter according to the lip recognition request obtained by the transceiver 803, determine the number n of verification objects required for lip recognition verification according to the verification request parameter, and select n recognition objects from a preset plurality of recognition object groups as n verification objects to form the verification content of lip recognition, where the n verification objects belong to at least two recognition object groups and adjacent verification objects in the verification content belong to different recognition object groups, the pronunciation lip changes of the recognition objects included in different recognition object groups being different;
The transceiver 803 is further configured to output verification content to a verification interface for lip language identification.
The processor 801 is further configured to perform lip recognition verification of verification content for a user of the terminal device based on the verification interface.
In some possible embodiments, the pronunciation lip shape change of the first verification object in the verification content of the above-mentioned lip language identification does not include a pronunciation lip shape change starting with a half-open mouth or a closed mouth.
In some possible embodiments, the processor 801 is configured to:
acquiring a plurality of identification objects, wherein the plurality of identification objects comprise identification objects with at least two types of lip changes;
determining a pronunciation lip change of each recognition object in the plurality of recognition objects;
dividing the recognition objects whose pronunciation lip change is the first type of lip change among the at least two types of lip changes into a first recognition object group, and dividing the recognition objects whose pronunciation lip change is the second type of lip change among the at least two types of lip changes into a second recognition object group, so as to obtain the plurality of recognition object groups.
In some possible embodiments, the processor 801 is configured to:
Classifying a plurality of Chinese phonemes according to pronunciation lips to obtain a phoneme classification result, wherein the phoneme classification result comprises a corresponding relation between the phonemes and the pronunciation lips;
performing phoneme decomposition on the pinyin of any one of the plurality of recognition objects, and determining a pronunciation lip corresponding to each phoneme according to the correspondence between each phoneme obtained by decomposition and the phoneme and the pronunciation lip;
and combining the pronunciation lips corresponding to the phonemes to generate pronunciation lip changes of any recognition object so as to obtain pronunciation lip changes of the recognition objects.
In some possible embodiments, the processor 801 is configured to:
acquiring a plurality of Chinese phonemes, wherein the plurality of Chinese phonemes comprise phonemes of at least two pronunciation lips;
classifying the Chinese phonemes corresponding to a first pronunciation lip of the at least two pronunciation lips into a first category, and classifying the Chinese phonemes corresponding to a second pronunciation lip of the at least two pronunciation lips into a second category;
and storing the first category and the second category into a phoneme classification result.
In some possible embodiments, the processor 801 is configured to:
decomposing the pinyin of any one recognition object of the plurality of recognition objects into a consonant phoneme and a vowel phoneme, performing lip matching on the consonant phoneme and the vowel phoneme and the phoneme classification result, obtaining a consonant pronunciation lip corresponding to the consonant phoneme and a vowel pronunciation lip corresponding to the vowel phoneme according to the correspondence between the phonemes and pronunciation lips in the phoneme classification result, and combining the consonant pronunciation lip and the vowel pronunciation lip to obtain the pronunciation lip change of any recognition object.
In some possible embodiments, the processor 801 may be a central processing unit (central processing unit, CPU), which may also be other general purpose processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), off-the-shelf programmable gate arrays (field-programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 802 may include read only memory and random access memory and provide instructions and data to the processor 801 and the transceiver 803. A portion of memory 802 may also include non-volatile random access memory. For example, the memory 802 may also store information of device type.
In a specific implementation, the terminal device may execute, through each built-in functional module, an implementation provided by each step in fig. 1 or fig. 4, and specifically, the implementation provided by each step in fig. 1 or fig. 4 may be referred to, which is not described herein again.
In the embodiment of the invention, the terminal device can select recognition objects from the preset recognition object groups as verification objects to form the verification content, where the recognition objects corresponding to adjacent verification objects do not belong to the same recognition object group; that is, the pronunciation lip shape changes of adjacent verification objects differ, so the formed verification content exhibits a lip change between adjacent verification objects. This reduces cases in which the verification content is difficult to recognize and improves the accuracy of lip recognition.
The embodiment of the present invention further provides a computer readable storage medium, where the computer readable storage medium stores a computer program, where the computer program includes program instructions, where the program instructions, when executed by a processor, implement a method for generating verification content for identifying lips provided in each step of fig. 1 or fig. 4, and specifically refer to an implementation manner provided in each step of fig. 1 or fig. 4, which is not described herein again.
The computer readable storage medium may be an internal storage unit of the verification content generating apparatus for lip language identification provided in any of the foregoing embodiments or of the terminal device, for example a hard disk or memory of the electronic device. The computer readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card provided on the electronic device. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of the electronic device. The computer readable storage medium is used to store the computer program and other programs and data required by the electronic device, and may also be used to temporarily store data that has been output or is to be output.
The terms "first", "second" and the like in the description, claims and drawings of the embodiments of the invention are used to distinguish different objects and not to describe a particular sequential order. Furthermore, the term "include" and any variations thereof are intended to cover a non-exclusive inclusion. For example, a process, method, apparatus, article or device that comprises a list of steps or elements is not limited to the listed steps or modules but may optionally include other steps or modules not listed or inherent to such process, method, apparatus, article or device. In addition, the term "at least" is used in some cases to describe one possible implementation of the process and is not intended to limit the method to only the implementations presented.
The method and related apparatus provided in the embodiments of the present invention are described with reference to the flowchart and/or schematic structural diagrams of the method provided in the embodiments of the present invention, and each flow and/or block of the flowchart and/or schematic structural diagrams of the method may be implemented by computer program instructions, and combinations of flows and/or blocks in the flowchart and/or block diagrams. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing disclosure is illustrative of the present invention and is not to be construed as limiting the scope of the invention, which is defined by the appended claims.

Claims (10)

1. A verification content generation method for lip language identification, comprising:
acquiring a lip language identification request of terminal equipment, and acquiring verification request parameters according to the lip language identification request;
Determining the number n of verification objects required by lip language identification verification according to the verification request parameters, and selecting n identification objects from a plurality of preset identification object groups as n verification objects to form verification content of lip language identification, wherein the n verification objects at least belong to two identification object groups, and adjacent verification objects in the verification content respectively belong to different identification object groups, wherein the pronunciation lip shape changes of the identification objects included in the different identification object groups are different;
outputting the verification content to a verification interface of lip language identification, and carrying out lip language identification verification of the verification content on a user of the terminal equipment based on the verification interface.
2. The method of claim 1, wherein the pronunciation lip change of the first verification object in the verification content of the lip recognition does not include a pronunciation lip change starting with a half-open or closed mouth.
3. The method of claim 1 or 2, wherein the method further comprises:
acquiring a plurality of identification objects, wherein the plurality of identification objects comprise identification objects with at least two types of lip changes;
determining a pronunciation lip change of each recognition object in the plurality of recognition objects;
Dividing the recognition objects with the pronunciation lips changed to the first type of lip change in the at least two types of lip changes in the plurality of recognition objects into a first recognition object group, and dividing the recognition objects with the pronunciation lips changed to the second type of lip change in the at least two types of lip changes in the plurality of recognition objects into a second recognition object group so as to obtain a plurality of recognition object groups.
4. The method of claim 3, wherein the determining a pronunciation lip change for each of the plurality of recognition objects comprises:
classifying a plurality of Chinese phonemes according to pronunciation lips to obtain a phoneme classification result, wherein the phoneme classification result comprises a corresponding relation between the phonemes and the pronunciation lips;
performing phoneme decomposition on the pinyin of any recognition object in the plurality of recognition objects, and determining the pronunciation lip corresponding to each phoneme according to the correspondence between each phoneme obtained by decomposition and the phoneme and the pronunciation lip;
and combining the pronunciation lips corresponding to the phonemes to generate pronunciation lip changes of any recognition object so as to obtain pronunciation lip changes of the recognition objects.
5. The method of claim 4, wherein classifying the plurality of Chinese phonemes according to pronunciation lip shape to obtain a phoneme classification result comprises:
acquiring a plurality of Chinese phonemes, wherein the plurality of Chinese phonemes comprise phonemes of at least two pronunciation lip shapes;
classifying the Chinese phonemes corresponding to a first pronunciation lip shape of the at least two pronunciation lip shapes into a first category, and classifying the Chinese phonemes corresponding to a second pronunciation lip shape of the at least two pronunciation lip shapes into a second category;
and storing the Chinese phonemes of the first category and the Chinese phonemes of the second category in the phoneme classification result.
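Claim 5's classification can be sketched as bucketing phonemes by their lip shape. The phonemes and lip-shape labels used in the test are illustrative assumptions, not taken from the patent:

```python
def classify_phonemes(phoneme_lips):
    """Build a phoneme classification result from (phoneme, lip_shape) pairs:
    phonemes sharing a pronunciation lip shape land in the same category."""
    result = {}
    for phoneme, lip in phoneme_lips:
        result.setdefault(lip, []).append(phoneme)
    return result
```

The returned mapping is the "phoneme classification result" that the decomposition step of claims 4 and 6 consults.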
6. The method of claim 4 or 5, wherein performing phoneme decomposition on the pinyin of any recognition object among the plurality of recognition objects, and determining the pronunciation lip shape corresponding to each decomposed phoneme according to the correspondence between phonemes and pronunciation lip shapes, comprises:
decomposing the pinyin of any recognition object among the plurality of recognition objects into a consonant phoneme and a vowel phoneme, matching the consonant phoneme and the vowel phoneme against the phoneme classification result, and obtaining, according to the correspondence between phonemes and pronunciation lip shapes in the phoneme classification result, the consonant pronunciation lip shape corresponding to the consonant phoneme and the vowel pronunciation lip shape corresponding to the vowel phoneme;
wherein combining the pronunciation lip shapes corresponding to the phonemes to generate the pronunciation lip change of the recognition object comprises: combining the consonant pronunciation lip shape corresponding to the consonant phoneme with the vowel pronunciation lip shape corresponding to the vowel phoneme to obtain the pronunciation lip change of the recognition object.
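Claim 6's decomposition-and-matching step can be sketched as splitting a pinyin syllable into its initial (consonant) and final (vowel) and looking each up in the classification result. The initial list and the lip-shape table used in the test are simplified assumptions:

```python
# Two-letter initials must be tried before single letters.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")

def split_syllable(pinyin):
    """Split a pinyin syllable into (initial, final); initial may be empty
    for zero-initial syllables such as 'an'."""
    for init in INITIALS:
        if pinyin.startswith(init):
            return init, pinyin[len(init):]
    return "", pinyin

def syllable_lip_change(pinyin, phoneme_to_lip):
    """Combine the consonant and vowel lip shapes into the syllable's
    pronunciation lip change."""
    initial, final = split_syllable(pinyin)
    return [phoneme_to_lip[p] for p in (initial, final) if p]
```

The resulting sequence of lip shapes is the "pronunciation lip change" that the grouping step of claim 3 classifies.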
7. A verification content generation apparatus for lip language identification, comprising:
a response module, configured to acquire a lip-language recognition request from a terminal device and to obtain verification request parameters according to the lip-language recognition request;
a processing module, configured to determine, according to the verification request parameters obtained by the response module, the number n of verification objects required for lip-language recognition verification, and to select n recognition objects from a plurality of preset recognition object groups as the n verification objects forming the lip-language recognition verification content, wherein the n verification objects belong to at least two recognition object groups, adjacent verification objects in the verification content belong to different recognition object groups, and the recognition objects included in different recognition object groups differ in pronunciation lip change;
and an output module, configured to output the verification content formed by the processing module to a lip-language recognition verification interface and to perform lip-language recognition verification of the verification content for the user of the terminal device via the verification interface.
8. The apparatus of claim 7, wherein the processing module is further configured to:
acquire a plurality of recognition objects, wherein the plurality of recognition objects comprise recognition objects with at least two types of lip change;
determine a pronunciation lip change of each recognition object in the plurality of recognition objects;
and divide the recognition objects whose pronunciation lip change belongs to a first type of lip change among the at least two types into a first recognition object group, and the recognition objects whose pronunciation lip change belongs to a second type of lip change among the at least two types into a second recognition object group, so as to obtain a plurality of recognition object groups.
9. A terminal device, characterized in that the terminal device comprises a processor, a transceiver and a memory, the processor, the transceiver and the memory being interconnected, wherein the memory is adapted to store a computer program comprising program instructions, the processor and the transceiver being configured to invoke the program instructions to perform the method according to any of claims 1-6.
10. A computer readable storage medium storing computer program instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1-6.
CN201811430520.0A 2018-11-28 2018-11-28 Verification content generation method and related device for lip language identification Active CN109461437B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811430520.0A CN109461437B (en) 2018-11-28 2018-11-28 Verification content generation method and related device for lip language identification
PCT/CN2019/088800 WO2020107834A1 (en) 2018-11-28 2019-05-28 Verification content generation method for lip-language recognition, and related apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811430520.0A CN109461437B (en) 2018-11-28 2018-11-28 Verification content generation method and related device for lip language identification

Publications (2)

Publication Number Publication Date
CN109461437A CN109461437A (en) 2019-03-12
CN109461437B true CN109461437B (en) 2023-05-09

Family

ID=65611807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811430520.0A Active CN109461437B (en) 2018-11-28 2018-11-28 Verification content generation method and related device for lip language identification

Country Status (2)

Country Link
CN (1) CN109461437B (en)
WO (1) WO2020107834A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109461437B (en) * 2018-11-28 2023-05-09 平安科技(深圳)有限公司 Verification content generation method and related device for lip language identification
CN109830236A (en) * 2019-03-27 2019-05-31 广东工业大学 A kind of double vision position shape of the mouth as one speaks synthetic method
CN111242029A (en) * 2020-01-13 2020-06-05 湖南世优电气股份有限公司 Device control method, device, computer device and storage medium
CN113743160A (en) * 2020-05-29 2021-12-03 北京中关村科金技术有限公司 Method, apparatus and storage medium for biopsy
CN112104457B (en) * 2020-08-28 2022-06-17 苏州云葫芦信息科技有限公司 Method and system for generating verification code for converting numbers into Chinese character types
CN112241521A (en) * 2020-12-04 2021-01-19 北京远鉴信息技术有限公司 Identity verification method and device of plosive, electronic equipment and medium
CN112749629A (en) * 2020-12-11 2021-05-04 东南大学 Engineering optimization method for Chinese lip language recognition of identity verification system
CN113807234B (en) * 2021-09-14 2023-12-19 深圳市木愚科技有限公司 Method, device, computer equipment and storage medium for checking mouth-shaped synthesized video
CN114267374B (en) * 2021-11-24 2022-10-18 北京百度网讯科技有限公司 Phoneme detection method and device, training method and device, equipment and medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN106453278A (en) * 2016-09-23 2017-02-22 财付通支付科技有限公司 Information verification method and verification platform
CN106529379A (en) * 2015-09-15 2017-03-22 阿里巴巴集团控股有限公司 Method and device for recognizing living body
CN106778496A (en) * 2016-11-22 2017-05-31 重庆中科云丛科技有限公司 Biopsy method and device
CN107133608A (en) * 2017-05-31 2017-09-05 天津中科智能识别产业技术研究院有限公司 Identity authorization system based on In vivo detection and face verification
CN107358085A (en) * 2017-07-28 2017-11-17 惠州Tcl移动通信有限公司 A kind of unlocking terminal equipment method, storage medium and terminal device
CN107404381A (en) * 2016-05-19 2017-11-28 阿里巴巴集团控股有限公司 A kind of identity identifying method and device

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN101101752B (en) * 2007-07-19 2010-12-01 华中科技大学 Monosyllabic language lip-reading recognition system based on vision character
CN104361276B (en) * 2014-11-18 2017-07-18 新开普电子股份有限公司 A kind of multi-modal biological characteristic identity identifying method and system
CN104598796B (en) * 2015-01-30 2017-08-25 科大讯飞股份有限公司 Personal identification method and system
CN104992095B (en) * 2015-06-29 2018-07-10 百度在线网络技术(北京)有限公司 Information Authentication method and system
US9996732B2 (en) * 2015-07-20 2018-06-12 International Business Machines Corporation Liveness detector for face verification
CN105930713A (en) * 2016-04-14 2016-09-07 深圳市金立通信设备有限公司 Method and terminal for generating identifying codes
CN109461437B (en) * 2018-11-28 2023-05-09 平安科技(深圳)有限公司 Verification content generation method and related device for lip language identification

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
CN106529379A (en) * 2015-09-15 2017-03-22 阿里巴巴集团控股有限公司 Method and device for recognizing living body
CN107404381A (en) * 2016-05-19 2017-11-28 阿里巴巴集团控股有限公司 A kind of identity identifying method and device
CN106453278A (en) * 2016-09-23 2017-02-22 财付通支付科技有限公司 Information verification method and verification platform
CN106778496A (en) * 2016-11-22 2017-05-31 重庆中科云丛科技有限公司 Biopsy method and device
CN107133608A (en) * 2017-05-31 2017-09-05 天津中科智能识别产业技术研究院有限公司 Identity authorization system based on In vivo detection and face verification
CN107358085A (en) * 2017-07-28 2017-11-17 惠州Tcl移动通信有限公司 A kind of unlocking terminal equipment method, storage medium and terminal device

Also Published As

Publication number Publication date
CN109461437A (en) 2019-03-12
WO2020107834A1 (en) 2020-06-04

Similar Documents

Publication Publication Date Title
CN109461437B (en) Verification content generation method and related device for lip language identification
US11657799B2 (en) Pre-training with alignments for recurrent neural network transducer based end-to-end speech recognition
JP6541673B2 (en) Real time voice evaluation system and method in mobile device
CN106575500B (en) Method and apparatus for synthesizing speech based on facial structure
CN111046133A (en) Question-answering method, question-answering equipment, storage medium and device based on atlas knowledge base
WO2017127296A1 (en) Analyzing textual data
CN108447471A (en) Audio recognition method and speech recognition equipment
CN109801349B (en) Sound-driven three-dimensional animation character real-time expression generation method and system
CN110335608B (en) Voiceprint verification method, voiceprint verification device, voiceprint verification equipment and storage medium
KR20200087977A (en) Multimodal ducument summary system and method
CN112562640B (en) Multilingual speech recognition method, device, system, and computer-readable storage medium
CN115511704B (en) Virtual customer service generation method and device, electronic equipment and storage medium
KR102312993B1 (en) Method and apparatus for implementing interactive message using artificial neural network
CN114401431A (en) Virtual human explanation video generation method and related device
CN110826637A (en) Emotion recognition method, system and computer-readable storage medium
CN109859747A (en) Voice interactive method, equipment and storage medium
CN113094478B (en) Expression reply method, device, equipment and storage medium
CN110781327B (en) Image searching method and device, terminal equipment and storage medium
CN113707124A (en) Linkage broadcasting method and device of voice operation, electronic equipment and storage medium
CN111898363A (en) Method and device for compressing long and difficult sentences of text, computer equipment and storage medium
CN110781329A (en) Image searching method and device, terminal equipment and storage medium
CN116414959A (en) Digital person interaction control method and device, electronic equipment and storage medium
CN114138960A (en) User intention identification method, device, equipment and medium
CN114155841A (en) Voice recognition method, device, equipment and storage medium
CN113886644A (en) Digital human video generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant