WO2020107834A1 - Verification content generation method for lip language recognition and related device - Google Patents

Verification content generation method for lip language recognition and related device

Info

Publication number
WO2020107834A1
Authority
WO
WIPO (PCT)
Prior art keywords: lip, recognition, verification, pronunciation, objects
Application number: PCT/CN2019/088800
Other languages: English (en), French (fr)
Inventors: 庞烨, 王义文, 王健宗
Original Assignee: 平安科技(深圳)有限公司
Application filed by 平安科技(深圳)有限公司
Publication of WO2020107834A1

Classifications

    • G10L 15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G06F 18/24: Pattern recognition; Classification techniques
    • G06V 40/20: Recognition of biometric, human-related or animal-related patterns in image or video data; Movements or behaviour, e.g. gesture recognition
    • G10L 15/08: Speech classification or search
    • G10L 15/25: Speech recognition using non-acoustical features; using position of the lips, movement of the lips or face analysis
    • G10L 15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L 17/02: Speaker identification or verification; preprocessing operations, e.g. segment selection; pattern representation or modelling; feature selection or extraction
    • G10L 17/14: Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • G10L 17/22: Interactive procedures; Man-machine interfaces
    • G10L 2015/025: Phonemes, fenemes or fenones being the recognition units

Definitions

  • The present application relates to the field of computer technology, and in particular to a verification content generation method for lip recognition and a related device.
  • The embodiments of the present application provide a verification content generation method for lip recognition and a related device, which can reduce the probability that a pronunciation lip shape change is difficult to recognize during lip recognition, and improve the accuracy, convenience, and applicability of lip recognition.
  • An aspect of an embodiment of the present application provides a verification content generation method for lip recognition, including:
  • obtaining a lip recognition request of a terminal device and obtaining verification request parameters according to the request; determining, based on the verification request parameters, the number n of verification objects required for lip recognition verification; and selecting n identification objects from a plurality of preset identification object groups as the n verification objects to form the verification content for lip recognition.
  • The n verification objects belong to at least two identification object groups, and adjacent verification objects in the verification content belong to different identification object groups; the pronunciation lip shape changes of the identification objects in different identification object groups differ.
  • Another aspect of an embodiment of the present application provides a verification content generation device for lip recognition, including:
  • a response module, configured to obtain the lip recognition request of the terminal device and obtain the verification request parameters according to the request;
  • a processing module, configured to determine, based on the verification request parameters obtained by the response module, the number n of verification objects required for lip recognition verification, and to select n identification objects from a plurality of preset identification object groups as the n verification objects to form the verification content for lip recognition. The n verification objects belong to at least two identification object groups, and adjacent verification objects in the verification content belong to different identification object groups, where the pronunciation lip shape changes of the identification objects in different groups differ;
  • an output module, configured to output the verification content composed by the processing module to a verification interface for lip recognition, and to perform lip recognition verification of the verification content on the user of the terminal device based on the verification interface.
  • Another aspect of an embodiment of the present application provides a terminal device, including a processor, a transceiver, and a memory connected to each other, where the memory is used to store a computer program. The computer program includes program instructions, and the processor and the transceiver are configured to call the program instructions to perform the method described in the first aspect of the embodiments of the present application.
  • Another aspect of an embodiment of the present application provides a computer-readable storage medium that stores computer program instructions which, when executed by a processor, cause the processor to execute the method described in the first aspect of the embodiments of the present application.
  • Adopting the embodiments of the present application reduces cases in which the pronunciation lip shape change is difficult to recognize, and improves the accuracy of lip recognition.
  • FIG. 1 is a schematic flowchart of identification object grouping provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a phoneme classification result provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a lip shape change of digital pronunciation provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a verification content generation method for lip recognition provided by an embodiment of the present application
  • FIG. 5 is a schematic diagram of a scenario for generating verification content of lip recognition provided by an embodiment of the present application.
  • FIG. 6-a is an interactive flowchart of a verification content generation method for lip recognition provided by an embodiment of the present application.
  • FIG. 6-b is another interactive flowchart of the verification content generation method for lip recognition provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a verification content generation device for lip recognition provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a terminal device provided by an embodiment of the present application.
  • Liveness verification based on lip recognition is a liveness detection method distinct from speech verification.
  • Lip recognition is a part of face recognition detection. Through machine vision, what a speaker says can be interpreted by recognizing the speaker's lip movements. Lip recognition can therefore assist voice interaction and image recognition; for example, when surrounding noise is too loud, lip recognition can avoid the interference and greatly improve the system's recognition accuracy.
  • Lip recognition extracts a person's face from images to obtain the person's continuous lip-change features and matches them to the corresponding pronunciations recognized by a lip recognition model, thereby computing the most likely natural-language sentence, which cannot be obtained directly from the images.
  • The verification content generation method for lip recognition and related device provided by the embodiments of the present application aim to reduce, from the generation stage onward, cases in which the pronunciation lip shape change is difficult to recognize, and to improve the accuracy of lip recognition.
  • The verification content generation method for lip recognition provided in the embodiments of the present application (for convenience, referred to below as the method provided in the embodiments of the present application) can be applied to mobile phones, computers, mobile Internet devices (MID), or other terminal devices capable of capturing lip images; the terminal device can be determined according to the actual application scenario and is not limited here. The terminal device is taken as an example in the description below.
  • Multiple identification objects are selected from the preset identification object groups according to certain verification content generation rules, and the selected identification objects are used as verification objects to form the verification content for lip recognition verification.
  • The identification object groups are obtained by grouping identification objects with the same pronunciation lip shape change into one group; the pronunciation lip shape change differs between groups.
  • The method provided in the embodiments of the present application can select recognition objects with different pronunciation lip shapes from the recognition object groups to form the verification content of lip recognition verification, so that the pronunciation lip shapes of the recognition objects in the verification content differ, which improves the accuracy of lip recognition and makes it more widely applicable.
  • The generation process of the preset identification object groups is described first below.
  • FIG. 1 is a schematic flowchart of identification object grouping provided by an embodiment of the present application.
  • The identification object grouping process provided by the embodiment of the present application may include the following steps:
  • Step S101: classify Chinese phonemes according to pronunciation lip shape to obtain phoneme classification results.
  • A Chinese phoneme is the smallest phonetic unit divided according to the natural attributes of speech, referred to as a phoneme for short. One pronunciation action constitutes one phoneme, sounds produced by the same pronunciation action are the same phoneme, and each pronunciation action corresponds to one pronunciation lip shape.
  • Phonemes are divided into vowels and consonants. Vowels include a, o, e, ê, i, u, ü, -i[ɿ] (front i), -i[ʅ] (back i), and er; consonants include b, p, m, f, z, c, s, d, t, n, l, zh, ch, sh, r, j, q, x, g, k, h, ng.
  • All Chinese phonemes can be classified according to pronunciation lip shape to obtain the phoneme classification results, which include the correspondence between phonemes and pronunciation lip shapes. All Chinese phonemes can be divided into 7 types, as shown in FIG. 2, a schematic diagram of the phoneme classification results. As shown in FIG. 2, the phoneme classification results are as follows:
  • The first type (deformation 1) is the half-open lip shape; the phonemes of the first type may include e, i, d, t, n, l, g, k, h, j, q, x, z, c, s, zh, ch, sh, ng, y, and others;
  • the second type (deformation 2) is the fully open lip shape, as shown in picture 2; the phonemes of the second type may include a, er, and others;
  • the third type (deformation 3) is the ao-shaped lip shape, as shown in picture 3; the phonemes of the third type may include ao;
  • the fourth type (deformation 4) is the w-shaped lip shape, as shown in picture 4; the phonemes of the fourth type may include u, v, o, and w;
  • the fifth type (deformation 5) is the ou-shaped lip shape, as shown in picture 5; the phonemes of the fifth type may include ou and iu;
  • the sixth type (deformation 6) is the closed lip shape, as shown in picture 6; the phonemes of the sixth type may include b, p, and m;
  • the seventh type (deformation 7) is the lip-biting lip shape, as shown in picture 7; the phonemes of the seventh type may include f.
  • ao, ou, and iu are each composed of multiple phonemes, but based on the single-lip-shape matching principle, since the pronunciation lip shape of each of these phoneme combinations is a single shape, they are also treated as single phonemes for pronunciation lip shape classification.
  • The single-lip-shape matching principle is the standard of performing pronunciation lip shape matching with phonemes, or phoneme combinations whose lip shape does not change, as the subsequent recognition units. It can be understood that the phoneme, as the smallest phonetic unit, is the basic unit composing the pinyin of a recognition object, and can serve as the basis for the subsequent pronunciation lip shape matching of the verification content.
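To make the classification concrete, here is a minimal sketch (in Python, chosen for illustration) of the seven-class phoneme table described above. The class names paraphrase the lip shapes of FIG. 2, the dictionary layout is an assumption rather than the patent's implementation, and the class assignments for ing and an follow the worked digit examples later in the text.

```python
# Seven pronunciation lip-shape classes ("deformations") from FIG. 2.
LIP_CLASS = {
    1: "half-open",   # deformation 1
    2: "fully open",  # deformation 2
    3: "ao-shaped",   # deformation 3
    4: "w-shaped",    # deformation 4
    5: "ou-shaped",   # deformation 5
    6: "closed",      # deformation 6
    7: "lip-biting",  # deformation 7
}

# Phoneme -> lip-shape class, per the classification above ("ing" and
# "an" follow the worked digit examples: 0 "ling" -> class 1,
# 3 "san" -> classes 1 then 2).
PHONEME_CLASS = {}
PHONEME_CLASS.update({p: 1 for p in [
    "e", "i", "d", "t", "n", "l", "g", "k", "h", "j", "q", "x",
    "z", "c", "s", "zh", "ch", "sh", "ng", "y", "ing"]})
PHONEME_CLASS.update({p: 2 for p in ["a", "er", "an"]})
PHONEME_CLASS["ao"] = 3
PHONEME_CLASS.update({p: 4 for p in ["u", "v", "o", "w"]})
PHONEME_CLASS.update({p: 5 for p in ["ou", "iu"]})
PHONEME_CLASS.update({p: 6 for p in ["b", "p", "m"]})
PHONEME_CLASS["f"] = 7
```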
  • Step S102: perform pinyin decomposition on any one of the plurality of recognition objects, and match the decomposition result against the phoneme classification results to determine the pronunciation lip shapes.
  • The pronunciation of any recognition object may be an independent syllable, and each syllable may be composed of a phoneme or a combination of phonemes.
  • Multiple recognition objects can be acquired, and any one of them can be decomposed into the consonant phonemes and vowel phonemes that constitute its pinyin. These phonemes are then matched against the phoneme classification results obtained in step S101: the correspondence between phonemes and pronunciation lip shapes determines the pronunciation lip shape of the consonant phoneme and that of the vowel phoneme, and combining the two yields the pronunciation lip shape change of the recognition object.
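Continuing the sketch above (it assumes the LIP_CLASS and PHONEME_CLASS tables just defined, and a syllable already decomposed into an initial and a final), the pronunciation lip shape change of one syllable can be obtained by looking up both parts and collapsing repeated classes:

```python
def lip_shape_change(initial, final):
    """Return the pronunciation lip shape change of one syllable as a
    tuple of lip-shape names: one element for a static shape, two for a
    transition between different shapes."""
    classes = [PHONEME_CLASS[initial]]
    if PHONEME_CLASS[final] != classes[-1]:
        classes.append(PHONEME_CLASS[final])
    return tuple(LIP_CLASS[c] for c in classes)

# Worked examples from the text:
assert lip_shape_change("l", "ing") == ("half-open",)             # 0, "ling"
assert lip_shape_change("y", "ao") == ("half-open", "ao-shaped")  # 1, "yao"
assert lip_shape_change("b", "a") == ("closed", "fully open")     # 8, "ba"
```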
  • For example, the recognition objects are numbers: the ten digits 0 to 9. The consonant and vowel phonemes corresponding to each digit are obtained and matched against the phoneme classification results.
  • FIG. 3 is a schematic diagram of the lip shape changes of digit pronunciations provided by an embodiment of the present application. As shown in FIG. 3, the lip shape changes of the digit pronunciations are as follows:
  • The first recognition object is the digit 0, whose pinyin ling can be decomposed into consonant deformation l and vowel deformation ing, both of which correspond to the first type (deformation 1) in the phoneme classification results. The pronunciation lip shape corresponding to the first type is half-open, so combining consonant deformation l and vowel deformation ing gives the pronunciation lip shape change of the digit 0 as half-open, as shown in picture 1.
  • The second recognition object is the digit 1, whose pinyin yi can be decomposed into consonant deformation y and vowel deformation i, both of which correspond to the first type (deformation 1) in the phoneme classification results. The pronunciation lip shape corresponding to the first type is half-open, so combining consonant deformation y and vowel deformation i gives the pronunciation lip shape change of the digit 1 as half-open, as shown in picture 1.
  • The pinyin of the digit 1 can also be yao, which can be decomposed into consonant deformation y and vowel deformation ao, corresponding respectively to the first type (deformation 1) and the third type (deformation 3) in the phoneme classification results. The pronunciation lip shape of the first type is half-open and that of the third type is ao-shaped, so combining consonant deformation y and vowel deformation ao gives the pronunciation lip shape change of the digit 1 as half-open to ao-shaped, as shown in pictures 1 to 3.
  • The third recognition object is the digit 2, whose pinyin er can be decomposed into consonant deformation e and vowel deformation er, corresponding respectively to the first type (deformation 1) and the second type (deformation 2) in the phoneme classification results. The pronunciation lip shape of the first type is half-open and that of the second type is fully open, so combining consonant deformation e and vowel deformation er gives the pronunciation lip shape change of the digit 2 as half-open to fully open, as shown in pictures 1 to 2.
  • The fourth recognition object is the digit 3, whose pinyin san can be decomposed into consonant deformation s and vowel deformation an, corresponding respectively to the first type (deformation 1) and the second type (deformation 2) in the phoneme classification results. The pronunciation lip shape of the first type is half-open and that of the second type is fully open, so combining consonant deformation s and vowel deformation an gives the pronunciation lip shape change of the digit 3 as half-open to fully open, as shown in pictures 1 to 2.
  • The fifth recognition object is the digit 4, whose pinyin si can be decomposed into consonant deformation s and vowel deformation i, both of which correspond to the first type (deformation 1) in the phoneme classification results. The pronunciation lip shape of the first type is half-open, so combining consonant deformation s and vowel deformation i gives the pronunciation lip shape change of the digit 4 as half-open, as shown in picture 1.
  • The sixth recognition object is the digit 5, whose pinyin wu can be decomposed into consonant deformation w and vowel deformation u, both of which correspond to the fourth type (deformation 4) in the phoneme classification results. The pronunciation lip shape of the fourth type is w-shaped, so combining consonant deformation w and vowel deformation u gives the pronunciation lip shape change of the digit 5 as w-shaped, as shown in picture 4.
  • The seventh recognition object is the digit 6, whose pinyin liu can be decomposed into consonant deformation l and vowel deformation iu, corresponding respectively to the first type (deformation 1) and the fifth type (deformation 5) in the phoneme classification results. The pronunciation lip shape of the first type is half-open and that of the fifth type is ou-shaped, so combining consonant deformation l and vowel deformation iu gives the pronunciation lip shape change of the digit 6 as half-open to ou-shaped, as shown in pictures 1 to 5.
  • The eighth recognition object is the digit 7, whose pinyin qi can be decomposed into consonant deformation q and vowel deformation i, both of which correspond to the first type (deformation 1) in the phoneme classification results. The pronunciation lip shape of the first type is half-open, so combining consonant deformation q and vowel deformation i gives the pronunciation lip shape change of the digit 7 as half-open, as shown in picture 1.
  • The ninth recognition object is the digit 8, whose pinyin ba can be decomposed into consonant deformation b and vowel deformation a, corresponding respectively to the sixth type (deformation 6) and the second type (deformation 2) in the phoneme classification results. The pronunciation lip shape of the sixth type is closed and that of the second type is fully open, so combining consonant deformation b and vowel deformation a gives the pronunciation lip shape change of the digit 8 as closed to fully open, as shown in pictures 6 to 2.
  • The tenth recognition object is the digit 9, whose pinyin jiu can be decomposed into consonant deformation j and vowel deformation iu, corresponding respectively to the first type (deformation 1) and the fifth type (deformation 5) in the phoneme classification results. The pronunciation lip shape of the first type is half-open and that of the fifth type is ou-shaped, so combining consonant deformation j and vowel deformation iu gives the pronunciation lip shape change of the digit 9 as half-open to ou-shaped, as shown in pictures 1 to 5.
  • The recognition objects may also be characters. The characters may be selected through big-data statistical analysis, according to the frequency with which characters are used in verification codes generated for liveness recognition. The pinyin decomposition and pronunciation lip shape change generation for characters are the same as for the digits above.
  • For example, the pinyin hao of "号" can be decomposed into consonant deformation h and vowel deformation ao, which correspond respectively to the first and third types in the phoneme classification results shown in FIG. 2. As can be seen from FIG. 2, the pronunciation lip shape of consonant deformation h is half-open and that of vowel deformation ao is ao-shaped, so the pronunciation lip shape of "号" changes from half-open to ao-shaped.
  • The pinyin yu of "语" can be decomposed into consonant deformation y and vowel deformation u, which correspond respectively to the first and fourth types in the phoneme classification results shown in FIG. 2. As can be seen from FIG. 2, the pronunciation lip shape of consonant deformation y is half-open and that of vowel deformation u is w-shaped, so, combining the two, the pronunciation lip shape of "语" changes from half-open to w-shaped.
  • The pinyin feng of "丰" can be decomposed into consonant deformation f and vowel deformation ng, which correspond respectively to the seventh and first types in the phoneme classification results shown in FIG. 2. As can be seen from FIG. 2, the pronunciation lip shape of consonant deformation f is lip-biting and that of vowel deformation ng is half-open, so, combining the two, the pronunciation lip shape of "丰" changes from lip-biting to half-open.
  • The pinyin gu of "谷" can be decomposed into consonant deformation g and vowel deformation u, which correspond respectively to the first and fourth types in the phoneme classification results shown in FIG. 2. As can be seen from FIG. 2, the pronunciation lip shape of consonant deformation g is half-open and that of vowel deformation u is w-shaped, so, combining the two, the pronunciation lip shape of "谷" changes from half-open to w-shaped.
  • the "gu" pinyin gu can be decomposed into consonant deformation g and vowel modification u.
  • the above consonant modification g and vowel modification u correspond to the first and fourth categories in the phoneme classification results shown in Fig. 2, respectively, as shown in Fig. 2
  • the pronunciation lip shape of the corresponding consonant shape change g is half
  • the pronunciation lip shape of the vowel shape change u is W shape.
  • the lip shape changes from half-open to W-shaped.
  • The pinyin decomposition and pronunciation lip shape change generation for the five character recognition objects "号, 语, 丰, 谷, 故" above are only examples of the method for characters; the characters include but are not limited to these five and can be determined according to the actual application scenario, which is not limited here.
  • Step S103: group the plurality of recognition objects based on the pronunciation lip shape changes obtained by the above lip shape matching.
  • The multiple recognition objects are grouped based on their pronunciation lip shape changes: recognition objects with the same change are placed in one group, and different groups have different changes. Specifically, after step S102 is performed on a recognition object, the existing groups are searched. If a group whose pronunciation lip shape change matches that of the recognition object exists, the recognition object is stored in that group. If no group matches, a new recognition object group is created, the recognition object is stored in it, and its pronunciation lip shape change is recorded as the change of the new group. In this way, all acquired recognition objects are eventually placed into the groups corresponding to their respective pronunciation lip shape changes, yielding multiple recognition object groups.
  • For example, for digit recognition objects, the digits may be grouped sequentially according to their pronunciation lip shape changes, yielding multiple identification object groups. Specifically, grouping the digits by pronunciation lip shape change gives the following:
  • the first group includes 0, 1 (yi), 4, and 7; the corresponding pronunciation lip shape change is half-open;
  • the second group includes 2, 3, and 1 (yao); the corresponding pronunciation lip shape changes from half-open to fully open / ao-shaped;
  • the third group includes 6 and 9; the corresponding pronunciation lip shape changes from half-open to ou-shaped;
  • the fourth group includes 5; the corresponding pronunciation lip shape change is w-shaped;
  • the fifth group includes 8; the corresponding pronunciation lip shape changes from closed to fully open.
  • The pronunciation lip shape of 1 (yao) changes from half-open to ao-shaped, which is similar to the changes of 2 and 3, so it can be placed in the same group.
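Here is a sketch of this grouping step, reusing lip_shape_change from the earlier sketch. One caveat: the text additionally merges groups whose changes are judged similar (1 (yao)'s half-open to ao-shaped is placed with 2 and 3); that similarity merge is omitted below, so 1 (yao) lands in its own bin.

```python
# Pinyin decomposition of each digit, following the worked examples
# above (digit 1 appears under both its "yi" and "yao" readings).
DIGIT_PINYIN = {
    "0": ("l", "ing"), "1(yi)": ("y", "i"), "1(yao)": ("y", "ao"),
    "2": ("e", "er"), "3": ("s", "an"), "4": ("s", "i"),
    "5": ("w", "u"), "6": ("l", "iu"), "7": ("q", "i"),
    "8": ("b", "a"), "9": ("j", "iu"),
}

def group_by_lip_change(pinyin_table):
    """Bin recognition objects by lip shape change signature, creating a
    new group whenever no existing group matches (step S103)."""
    groups = {}
    for obj, (initial, final) in pinyin_table.items():
        groups.setdefault(lip_shape_change(initial, final), []).append(obj)
    return groups

# group_by_lip_change(DIGIT_PINYIN) yields:
#   ("half-open",)               -> 0, 1(yi), 4, 7
#   ("half-open", "fully open")  -> 2, 3
#   ("half-open", "ao-shaped")   -> 1(yao)   (merged with 2, 3 in the text)
#   ("half-open", "ou-shaped")   -> 6, 9
#   ("w-shaped",)                -> 5
#   ("closed", "fully open")     -> 8
```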
  • When the recognition objects are characters and are grouped together with the digit recognition objects above, step S102 is performed in turn on each of the acquired character recognition objects to obtain its pronunciation lip shape change, and the characters are matched against the digit groups. If an existing group has the same pronunciation lip shape change as a character recognition object, the character is stored in that group; otherwise a new recognition object group is created and the character is stored in the newly created group. In this case, the character recognition object groups and the digit recognition object groups are stored at the same address.
  • The five character recognition objects from step S102 are grouped as follows. The pronunciation lip shape of "号" changes from half-open to ao-shaped, the same as the change of the second group above, so "号" is placed in the second group. The pronunciation lip shape change of "语" is the same as that of the fourth group, so "语" is placed in the fourth group. The pronunciation lip shape of "丰" changes from lip-biting to half-open, which matches no existing group, so a sixth group is created and "丰" is placed in it. The pronunciation lip shape of "谷" changes from half-open to w-shaped, which is detected to be the same as the w-shaped change of the fourth group, so "谷" is placed in the fourth group. "故" has the same pronunciation lip shape change as "谷" and is grouped in the same way, so "故" is also placed in the fourth group.
  • After grouping, the identification object groups are as follows:
  • the first group includes 0, 1 (yi), 4, and 7; the corresponding pronunciation lip shape change is half-open;
  • the second group includes 2, 3, 1 (yao), and 号; the corresponding pronunciation lip shape changes from half-open to fully open / ao-shaped;
  • the third group includes 6 and 9; the corresponding pronunciation lip shape changes from half-open to ou-shaped;
  • the fourth group includes 5, 语, 谷, and 故; the corresponding pronunciation lip shape change is w-shaped;
  • the fifth group includes 8; the corresponding pronunciation lip shape changes from closed to fully open;
  • the sixth group includes 丰; the corresponding pronunciation lip shape changes from lip-biting to half-open.
  • The digit recognition object groups and the character recognition object groups can also be stored separately, that is, in different address spaces, so that after the type of verification object constituting the verification content is obtained, identification objects are selected from the corresponding groups as verification objects to form the verification content. The grouping process is the same as when digits and characters are stored together, except that the digit groups are accessed when grouping digits and the character groups are accessed when grouping characters.
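As a sketch of the separate storage described here (reusing group_by_lip_change from above), keeping the digit and character groups under different dictionary keys stands in for the different address spaces; the layout is an assumption:

```python
# Character pinyin decompositions, following the worked examples above.
CHAR_PINYIN = {
    "号": ("h", "ao"), "语": ("y", "u"), "丰": ("f", "ng"),
    "谷": ("g", "u"), "故": ("g", "u"),
}

# Digit and character identification object groups kept under separate
# keys, so the generator can draw from whichever type the verification
# request parameters specify.
GROUP_STORE = {
    "digit": group_by_lip_change(DIGIT_PINYIN),
    "char": group_by_lip_change(CHAR_PINYIN),
}
```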
  • The recognition object grouping process provided in the embodiments of the present application groups recognition objects by pronunciation lip shape change: recognition objects in the same group have the same change, and recognition objects in different groups have different changes. This provides the selection basis for generating lip recognition verification content: it is only necessary to select identification objects as verification objects such that adjacent verification objects do not belong to the same identification object group, so that adjacent verification objects have different pronunciation lip shapes. Verification content in which adjacent verification objects have different pronunciation lip shapes is thus easily obtained, which reduces cases where the verification content is difficult to identify and improves the accuracy of lip recognition.
  • The method provided by the embodiments of the present application can generate the verification content required for lip recognition verification based on the multiple identification object groups generated by the steps shown in FIG. 1. The method is described below in conjunction with FIG. 4.
  • FIG. 4 is a flowchart of a method for generating verification content of lip recognition provided by an embodiment of the present application.
  • The verification content generation method provided by an embodiment of the present application may include the following steps:
  • Step S401: obtain the lip recognition request of the terminal device, and obtain the verification request parameters according to the request. The verification request parameters include at least the number n of verification objects required for lip recognition verification.
  • The method provided in this embodiment of the present application may be executed by a terminal device. Specifically, the terminal device may obtain a user's lip recognition instruction through a lip recognition verification request interface and then send a lip recognition request, based on that interface, to the processor of the terminal device; the processor determines information such as the number of verification objects constituting the verification content, thereby obtaining the verification request parameters.
  • Optionally, the terminal device may display a verification interface for lip recognition on its display and, based on the user operation instruction obtained on the verification interface, send a lip recognition verification request to a server connected to the terminal device. The server may be used to store the data of each recognition object group processed in steps S101 to S103 shown in FIG. 1, along with the pronunciation lip shape change of each group.
  • For example, suppose the identity verification method is lip recognition and the verification content is generated by the server. The user submits a verification application on the identity verification interface of the terminal device; after receiving the application, the terminal device sends a lip recognition request to the server. The server obtains the lip recognition request and extracts the information in it based on the terminal device's identity verification interface, including the verification interface information of the terminal device (the verification content length and the verification content type), thereby obtaining the verification request parameters of the terminal device.
  • Step S402: determine, based on the verification request parameters, the number n of verification objects required for lip recognition verification, and select n identification objects from a plurality of preset identification object groups as the n verification objects to form the verification content for lip recognition. The preset identification object groups are the multiple identification object groups obtained in steps S101-S103 shown in FIG. 1.
  • The n verification objects belong to at least two identification object groups, and adjacent verification objects in the verification content belong to different identification object groups; the verification objects are composed into the verification content according to the verification content generation rules, which enforce these constraints.
  • For example, suppose the verification content is composed of four verification objects (that is, n is 4). One identification object is selected from the multiple identification object groups as the first verification object; then one identification object is selected, from the groups whose lip shape changes differ from that of the first verification object's group, as the second verification object; the third and fourth verification objects are selected in the same way; and finally the first, second, third, and fourth verification objects are combined to generate the verification content.
  • The verification content may be generated according to the verification content generation rules: the rules formulated from the identification object grouping are acquired, and identification objects are selected from the multiple identification object groups as verification objects to form the verification content.
  • The adjacent verification objects constituting the verification content must not belong to the same identification object group. Specifically, if two adjacent verification objects both belong to the first group, there is no lip shape change between them; when the initial mouth state is half-open or closed, if the first verification object belongs to the first group, its pronunciation lip shape change cannot be detected, and if the first verification object belongs to the second or fifth group, the identification object group corresponding to the specific pronunciation lip shape change cannot be determined. Rules can be formulated from this: 1. adjacent verification objects must not both belong to the first group; 2. an identification object in the first group is not used as the first verification object of the verification content; 3. an identification object in the second or fifth group is not used as the first verification object of the verification content; 4. only one verification object in the verification content belongs to the first group; 5. two adjacent verification objects do not belong to the same group; 6. no two verification objects in the verification content belong to the same group; and so on.
  • For example, suppose the verification content is composed of digits and the generation rule is rule 5. One identification object is randomly selected from the multiple identification object groups as the first verification object; assume it is 4, which belongs to the first group. Then, according to the generation rule, an identification object is selected from the second to fifth groups as the second verification object; assume it is 5, which belongs to the fourth group. An identification object is then selected from the first, second, third, and fifth groups as the third verification object; assume it is 2, which belongs to the second group. An identification object is then selected from the first, third, fourth, and fifth groups as the fourth verification object; assume it is 6, which belongs to the third group. The first, second, third, and fourth verification objects then constitute the verification content, giving the four-digit verification content 4526.
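The selection under rule 5 can be sketched as follows, using the five digit groups listed earlier; the list-of-lists layout and the function shape are assumptions, and the other rules (such as restricting which group may supply the first verification object) are omitted:

```python
import random

# The five digit groups from step S103 (digit 1 under its yao reading).
DIGIT_GROUPS = [
    ["0", "1", "4", "7"],  # half-open
    ["2", "3", "1"],       # half-open -> fully open / ao-shaped
    ["6", "9"],            # half-open -> ou-shaped
    ["5"],                 # w-shaped
    ["8"],                 # closed -> fully open
]

def generate_verification_content(n, groups):
    """Pick n verification objects such that adjacent objects come from
    different identification object groups (rule 5)."""
    content, prev = [], None
    for _ in range(n):
        g = random.choice([i for i in range(len(groups)) if i != prev])
        content.append(random.choice(groups[g]))
        prev = g
    return "".join(content)

print(generate_verification_content(4, DIGIT_GROUPS))  # e.g. "4526"
```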
  • Step S403: output the verification content to the verification interface for lip recognition, and perform lip recognition verification of the verification content on the user of the terminal device based on the verification interface.
  • The verification content generated by the above steps is output to the verification page, and lip recognition of the verification content is performed on the user of the terminal device: the user's lip movements are captured, the user's lip features are extracted and matched to phonemes, the corresponding recognition objects are composed, the result is compared with the output verification content to obtain the verification result, and the result is fed back to the user.
  • The generated verification content is output to the verification interface for lip recognition; recognition images are obtained through the verification interface; a human face is continuously recognized from the images; the user's continuous lip shape change features are extracted and matched against the phoneme classification results to obtain the corresponding phonemes; the phonemes are combined to obtain the corresponding pronunciations; and these are compared with the verification content to obtain the lip recognition result.
  • For example, for the verification content 4526 generated in step S402: after the user's continuous lip shape change features are extracted and recognized, the user's lip shape changes are obtained as half-open; w-shaped; half-open to fully open; and half-open to ou-shaped. After matching against the recognition object groups, these changes correspond to the first, fourth, second, and third groups. The final lip recognition content is then obtained based on results learned from real data, compared with the verification content 4526, and the verification result is displayed on the verification interface and fed back to the user.
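The group-level part of this comparison can be sketched as follows, reusing the grouping sketch above: each lip shape change recognized from the video must select the group containing the corresponding displayed object, while distinguishing objects within a group (say, 4 from 7) is left to the learned recognition model the text mentions. The data formats are assumptions:

```python
def verify_by_groups(observed_changes, expected_content, groups_by_change):
    """Group-level check: each lip shape change recognized from the
    video must select the group that contains the corresponding object
    of the displayed verification content."""
    if len(observed_changes) != len(expected_content):
        return False
    return all(obj in groups_by_change.get(change, ())
               for change, obj in zip(observed_changes, expected_content))

# For the verification content "4526", the recognized changes map to the
# first, fourth, second and third groups, matching the text:
digit_groups = group_by_lip_change(
    {"4": ("s", "i"), "5": ("w", "u"), "2": ("e", "er"), "6": ("l", "iu")})
observed = [("half-open",), ("w-shaped",),
            ("half-open", "fully open"), ("half-open", "ou-shaped")]
print(verify_by_groups(observed, ["4", "5", "2", "6"], digit_groups))  # True
```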
  • The embodiment of the present application, based on the identification object groups obtained from steps S101 to S103 shown in FIG. 1, selects identification objects from those groups as verification objects to form the verification content, where adjacent verification objects do not belong to the same identification object group. In this way, the pronunciation lip shape changes of adjacent verification objects differ, which reduces cases where the verification content is difficult to identify and improves the accuracy of lip recognition.
  • FIG. 5 is a schematic diagram of a scenario for generating verification content of lip recognition provided by an embodiment of the present application.
  • Assume the terminal device is the smartphone 200a, and the identification object groups, the pronunciation lip shape classification results, and the verification content generation rules are stored on the server 100a.
  • After the smartphone 200a completes information entry, liveness detection is required in order to confirm that the acquired information was provided by the person themselves, and the lip recognition provided by the embodiment of the present application is used.
  • A verification page 201 is generated, where the verification page 201 includes a verification content display interface and a face recognition interface, and the verification content display interface displays the verification content 202. The verification content 202 is composed of four verification objects: a first verification object 2021, a second verification object 2022, a third verification object 2023, and a fourth verification object 2024.
  • The terminal device 200a randomly selects an identification object from the server 100a according to the verification content generation rules and inputs it into the verification content 202 as the first verification object 2021; selects from the server 100a a recognition object whose pronunciation lip shape change differs from that of the first verification object 2021 as the second verification object 2022 and inputs it into the verification content 202; selects a recognition object whose pronunciation lip shape change differs from that of the second verification object 2022 as the third verification object 2023 and inputs it into the verification content 202; and selects a recognition object whose pronunciation lip shape change differs from that of the third verification object 2023 as the fourth verification object 2024 and inputs it into the verification content 202. That is, the second verification object 2022 and the first verification object 2021 do not belong to the same identification object group, the third verification object 2023 and the second verification object 2022 do not belong to the same identification object group, and the fourth verification object 2024 and the third verification object 2023 do not belong to the same identification object group.
  • The verification content 202 is generated from the combination of the first verification object 2021, the second verification object 2022, the third verification object 2023, and the fourth verification object 2024, and is output to the verification content display interface of the verification page 201. The face recognition part acquires the user's face image, obtains the user's lip shape changes, extracts the user's lip shape features, and performs lip recognition verification.
  • Optionally, the terminal device 200a may send a lip recognition verification request to the server 100a, and the server 100a may perform the operation of selecting identification objects as verification objects to form the verification content: the first verification object 2021, the second verification object 2022, the third verification object 2023, and the fourth verification object 2024 are obtained and combined to generate the verification content 202. The server 100a then sends the verification content 202 to the terminal device 200a, the terminal device displays it on the verification page 201, and lip recognition of the verification content is performed on the user.
  • Optionally, the terminal device may send a lip recognition request to the server, and the server generates the verification content for lip recognition.
  • FIG. 6-a is an interactive flowchart of the verification content generation method for lip recognition provided by an embodiment of the present application. As shown in FIG. 6-a, the lip recognition verification process is realized with the server as the main executing body. The interaction process of the verification content generation method is as follows:
  • Step S601a: Send a lip recognition request.
  • When the user performs lip recognition, the terminal device sends a lip recognition verification request to the server. For details, refer to step S401 shown in FIG. 4.
  • Step S602a: Determine the number of verification objects and the verification content generation rule.
  • The server determines the number n of verification objects constituting the verification content and the verification content generation rule according to the received lip recognition verification request.
  • The server may store the correspondence between terminal device application programs and verification content generation rule flags, or the server may randomly select a verification content generation rule when receiving the lip recognition verification request.
  • For example, if the request comes from WeChat, the server will, after receiving the lip recognition verification request, look up the verification content generation rule corresponding to WeChat, which determines how the verification objects of the verification content are selected and combined.
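A small sketch of this lookup follows; the application identifiers, rule numbers, and random fallback are illustrative assumptions:

```python
import random

# Application program -> verification content generation rule flag
# (entries are hypothetical).
RULE_BY_APP = {"wechat": 5}

def pick_generation_rule(app_id):
    """Use the rule stored for the requesting application, or pick one
    of the six rules at random when no correspondence is stored."""
    return RULE_BY_APP.get(app_id, random.randint(1, 6))
```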
  • Step S603a: Select multiple identification objects as verification objects to form the verification content according to the verification content generation rule. For details, refer to step S402 shown in FIG. 4.
  • Step S604a: Send the verification content to the terminal device.
  • Step S605a: The terminal device acquires the verification image of the user.
  • The verification image is the user image acquired through the verification interface of the terminal device, that is, a face image of the user.
  • Step S606a: The terminal device feeds the acquired verification image back to the server.
  • Step S607a: The server extracts the continuous lip shape changes from the verification image.
  • Step S608a: The server recognizes the continuous lip shape changes, obtains the corresponding pronunciations, and matches them with the verification content.
  • Step S609a: The server feeds the lip recognition result of the verification content back to the terminal device, and the terminal device displays it.
  • The above steps S604a to S609a are the lip recognition process for the verification content: a human face is continuously recognized from the images through machine vision, the continuously changing mouth shape features are extracted and input into the recognition model, and they are matched against the multiple recognition object groups to obtain the corresponding recognition object pronunciations. The recognition object pronunciations are compared with the verification content to obtain the lip recognition result of the verification content, which is displayed on the verification interface and fed back to the user.
  • In the interaction above, the terminal device sends the lip recognition request to the server, and the server performs steps S602a-S604a and S607a-S609a. Optionally, steps S601a to S609a may all be performed by the server, in which case step S601a is that the application program of the terminal device sends a lip recognition request to the server.
  • The server may be accessed by the terminal device shown in FIG. 5 to invoke the verification content generation process of lip recognition.
  • Optionally, the verification content generation process of lip recognition can be performed directly by the terminal device, which generates the verification content by accessing the data in a memory. The memory may be an internal or external memory of the terminal device, or a cloud server that can be shared with other terminal devices. The memory stores the data obtained in steps S101-S103 shown in FIG. 1, including the phoneme classification results, the multiple recognition object groups, and the pronunciation lip shape change corresponding to each group.
  • FIG. 6-b is an interaction schematic diagram of another verification content generation method for lip recognition provided by an embodiment of the present application. The details are as follows:
  • Step S601b: Obtain a lip recognition request.
  • Step S602b: Determine the number of verification objects and the verification content generation rule.
  • Step S603b: Select multiple identification objects according to the verification content generation rule.
  • The terminal device sequentially selects the recognition objects, according to the verification content generation rule, from the memory storing the data of steps S101-S103.
  • Step S604b: Use the multiple identification objects as verification objects to form the verification content.
  • Step S605b: Feed the verification content back to the user.
  • Step S606b: The terminal device acquires the verification image of the user.
  • Step S607b: Extract the continuous lip shape changes from the verification image.
  • Step S608b: Recognize the continuous lip shape changes, obtain the corresponding pronunciations, and match them with the verification content.
  • The terminal device obtains from the memory the phoneme classification results, the multiple recognition object groups, and the pronunciation lip shape change corresponding to each group, so as to match the continuous lip shape changes and obtain the lip recognition verification result.
  • The specific implementation of steps S601b to S608b is as described for steps S601a to S608a shown in FIG. 6-a. Here the steps are performed directly by the terminal device, which only retrieves from the memory the data of steps S101 to S103 shown in FIG. 1 in order to select the recognition objects and to perform lip recognition verification of the verification content.
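  • As an illustrative sketch of the selection in steps S602b to S604b, assuming the simple generation rule that adjacent verification objects must come from different recognition object groups (the group labels and function name are assumptions for illustration):

    import random

    # Hypothetical recognition object groups keyed by their shared
    # pronunciation lip-shape change, as in the digit example.
    GROUPS = {
        "half-open":            ["0", "1", "4", "7"],
        "half-open->full-open": ["2", "3"],
        "half-open->ou":        ["6", "9"],
        "w-shape":              ["5"],
        "closed->full-open":    ["8"],
    }

    def generate_verification_content(n):
        """Pick n objects so that adjacent picks come from different groups."""
        content, last_group = [], None
        for _ in range(n):
            group = random.choice([g for g in GROUPS if g != last_group])
            content.append(random.choice(GROUPS[group]))
            last_group = group
        return "".join(content)

    print(generate_verification_content(4))  # e.g. '4526'

  • Stricter generation rules from the embodiments, such as constraints on the first verification object, can be layered on top by filtering the candidate groups.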
  • FIG. 7 shows a verification content generation device for lip recognition provided by an embodiment of the present application. The verification content generation device 70 can be used in the terminal device of the embodiment corresponding to FIG. 5 above, and may include a response module 701, a processing module 704, and an output module 705.
  • The response module 701 is configured to obtain the lip recognition request of the terminal device and to obtain the verification request parameters according to the request.
  • The processing module 704 is configured to determine, based on the verification request parameters obtained by the response module, the number n of verification objects required for lip recognition, and to select n recognition objects from multiple preset recognition object groups as n verification objects to compose the verification content of lip recognition. The n verification objects belong to at least two recognition object groups, adjacent verification objects in the verification content belong to different recognition object groups, and the recognition objects included in different recognition object groups differ in pronunciation lip-shape change.
  • The output module 705 is configured to output the verification content to the verification interface for lip recognition and, based on that interface, to perform lip recognition verification of the verification content on the user of the terminal device.
  • The processing module 704 is further configured to:
  • Obtain multiple recognition objects, the multiple recognition objects including recognition objects of at least two types of lip-shape change;
  • Determine the pronunciation lip-shape change of each of the multiple recognition objects;
  • Divide the recognition objects whose pronunciation lip-shape change is the first of the at least two types of lip-shape change into a first recognition object group, and divide those whose pronunciation lip-shape change is the second type into a second recognition object group, to obtain multiple recognition object groups, as the sketch below illustrates.
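  • A sketch of that grouping step, assuming a helper that already yields each object's pronunciation lip-shape change (all names and the digit table are illustrative assumptions):

    from collections import defaultdict

    # Illustrative per-digit lip-shape changes, per FIG. 3 of the embodiments.
    DIGIT_CHANGES = {
        "0": "half-open", "1": "half-open", "4": "half-open", "7": "half-open",
        "2": "half-open->full-open", "3": "half-open->full-open",
        "6": "half-open->ou", "9": "half-open->ou",
        "5": "w-shape", "8": "closed->full-open",
    }

    def group_by_lip_change(objects, derive_lip_change):
        """Objects sharing the same pronunciation lip-shape change land in
        the same recognition object group; unseen changes open new groups."""
        groups = defaultdict(list)
        for obj in objects:
            groups[derive_lip_change(obj)].append(obj)
        return dict(groups)

    print(group_by_lip_change(sorted(DIGIT_CHANGES), DIGIT_CHANGES.get))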
  • The processing module 704 is further configured to:
  • Classify multiple Chinese phonemes by pronunciation lip shape to obtain a phoneme classification result, the phoneme classification result including the correspondence between phonemes and pronunciation lip shapes;
  • Decompose the pinyin of any one of the multiple recognition objects into phonemes and, according to the correspondence between phonemes and pronunciation lip shapes, determine the pronunciation lip shape corresponding to each phoneme obtained by the decomposition;
  • Combine the pronunciation lip shapes corresponding to the phonemes to generate that recognition object's pronunciation lip-shape change, thereby obtaining the pronunciation lip-shape change of each recognition object.
  • The processing module 704 is further configured to:
  • Obtain multiple Chinese phonemes, the multiple Chinese phonemes including phonemes of at least two pronunciation lip shapes;
  • Classify the Chinese phonemes whose pronunciation lip shape is the first of the at least two pronunciation lip shapes into a first category, and classify those whose pronunciation lip shape is the second of the at least two pronunciation lip shapes into a second category;
  • Store the Chinese phonemes of the first category and of the second category in the phoneme classification result.
  • The processing module 704 is further configured to:
  • Decompose the pinyin of any one of the multiple recognition objects into a consonant phoneme and a vowel phoneme, match the consonant phoneme and the vowel phoneme against the phoneme classification result, and obtain, through the correspondence between phonemes and pronunciation lip shapes, the consonant pronunciation lip shape corresponding to the consonant phoneme and the vowel pronunciation lip shape corresponding to the vowel phoneme;
  • Combine the consonant pronunciation lip shape and the vowel pronunciation lip shape to obtain that recognition object's pronunciation lip-shape change, as sketched below.
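  • A sketch of that decomposition and combination, assuming a small phoneme table and a longest-prefix split into consonant and vowel (the table, the split heuristic, and the arrow notation are illustrative assumptions):

    # Illustrative phoneme -> lip-shape table; the embodiments classify
    # all Chinese phonemes.
    PHONEME_LIP = {
        "l": "half-open", "i": "half-open", "s": "half-open",
        "iu": "ou-shape", "b": "closed", "a": "full-open",
        "w": "w-shape", "u": "w-shape", "an": "full-open",
    }

    # Two-letter initials first so "zh" is not split as "z" + "h...".
    CONSONANTS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n",
                  "l", "g", "k", "h", "j", "q", "x", "z", "c", "s",
                  "r", "y", "w")

    def lip_change(pinyin):
        """Split a pinyin syllable into (consonant, vowel), look up each
        phoneme's lip shape, and collapse repeats so an unchanged shape
        reads as a single state (the single-mouth-shape principle)."""
        consonant = next((c for c in CONSONANTS if pinyin.startswith(c)), "")
        vowel = pinyin[len(consonant):]
        shapes = [PHONEME_LIP[p] for p in (consonant, vowel) if p]
        change = [s for i, s in enumerate(shapes) if i == 0 or s != shapes[i - 1]]
        return "->".join(change)

    print(lip_change("liu"))  # 'half-open->ou-shape'  (digit 6)
    print(lip_change("ba"))   # 'closed->full-open'    (digit 8)
    print(lip_change("si"))   # 'half-open'            (digit 4)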
  • The device may further include a storage module 702 and a lip recognition module 703.
  • The storage module 702 is configured to store the phoneme classification result, the multiple recognition object groups, the established verification content generation rules, and other data used when generating verification content.
  • The lip recognition module 703 is configured to recognize the user's lip motion, match that motion against the pronunciation lip shapes of the generated verification content, and obtain the lip recognition result.
  • The processing module 704 is further configured to:
  • Select the generation rule for the verification content of lip recognition according to system requirements, and select recognition objects as verification objects according to that rule to compose the verification content of lip recognition.
  • In specific implementations, the device can perform, through the above modules, the implementations provided by the steps in FIG. 1 or FIG. 4 and realize the functions of the above embodiments; refer to the corresponding descriptions of the steps in the method embodiments shown in FIG. 1 or FIG. 4, which are not repeated here.
  • In this embodiment of the present application, the verification content generation device selects recognition objects from the preset recognition object groups as verification objects to compose the verification content, where the recognition objects corresponding to adjacent verification objects do not belong to the same recognition object group; that is, adjacent verification objects differ in pronunciation lip-shape change, so the lip shape visibly changes between adjacent verification objects in the composed content. This reduces the occurrence of verification content that is difficult to recognize and improves the accuracy of lip recognition.
  • FIG. 8 is a schematic structural diagram of a terminal device provided by an embodiment of the present application. The terminal device in this embodiment may include one or more processors 801, a memory 802, and a transceiver 803, connected through a bus 804. The memory 802 stores a computer program that includes program instructions; the processor 801 and the transceiver 803 are configured to call those program instructions and perform the following operations:
  • The transceiver 803 is configured to obtain the lip recognition request of the terminal device.
  • The processor 801 is configured to obtain the verification request parameters according to the lip recognition request acquired by the transceiver 803, determine from those parameters the number n of verification objects required for lip recognition verification, and select n recognition objects from multiple preset recognition object groups as n verification objects to compose the verification content of lip recognition. The n verification objects belong to at least two recognition object groups, adjacent verification objects in the verification content belong to different recognition object groups, and the recognition objects included in different recognition object groups differ in pronunciation lip-shape change.
  • The transceiver 803 is further configured to output the verification content to the verification interface for lip recognition.
  • The processor 801 is further configured to perform, based on the verification interface, lip recognition verification of the verification content on the user of the terminal device.
  • In some feasible implementations, the pronunciation lip-shape change of the first verification object in the verification content of lip recognition does not include a change that starts with a half-open or closed mouth, since those shapes coincide with the mouth's likely initial state and the start of the change could go undetected.
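  • A short sketch of that constraint as a filter (labels and function name are illustrative assumptions):

    # Reject first-object candidates whose lip-shape change starts half-open
    # or closed, per the constraint above.
    BAD_STARTS = ("half-open", "closed")

    def valid_first_object(lip_change: str) -> bool:
        return not lip_change.startswith(BAD_STARTS)

    print(valid_first_object("w-shape"))               # True  (e.g. digit 5)
    print(valid_first_object("half-open->full-open"))  # False (e.g. digit 2)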
  • In some feasible implementations, the processor 801 is configured to:
  • Obtain multiple recognition objects, the multiple recognition objects including recognition objects of at least two types of lip-shape change;
  • Determine the pronunciation lip-shape change of each of the multiple recognition objects;
  • Divide the recognition objects whose pronunciation lip-shape change is the first of the at least two types of lip-shape change into a first recognition object group, and divide those whose pronunciation lip-shape change is the second type into a second recognition object group, to obtain multiple recognition object groups.
  • In some feasible implementations, the processor 801 is configured to:
  • Classify multiple Chinese phonemes by pronunciation lip shape to obtain a phoneme classification result, the phoneme classification result including the correspondence between phonemes and pronunciation lip shapes;
  • Decompose the pinyin of any one of the multiple recognition objects into phonemes and, according to the correspondence between phonemes and pronunciation lip shapes, determine the pronunciation lip shape corresponding to each phoneme obtained by the decomposition;
  • Combine the pronunciation lip shapes corresponding to the phonemes to generate that recognition object's pronunciation lip-shape change, thereby obtaining the pronunciation lip-shape change of each recognition object.
  • In some feasible implementations, the processor 801 is configured to:
  • Obtain multiple Chinese phonemes, the multiple Chinese phonemes including phonemes of at least two pronunciation lip shapes;
  • Classify the Chinese phonemes whose pronunciation lip shape is the first of the at least two pronunciation lip shapes into a first category, and classify those whose pronunciation lip shape is the second of the at least two pronunciation lip shapes into a second category;
  • Store the first category and the second category in the phoneme classification result.
  • In some feasible implementations, the processor 801 is configured to:
  • Decompose the pinyin of any one of the multiple recognition objects into a consonant phoneme and a vowel phoneme, match the consonant phoneme and the vowel phoneme against the phoneme classification result, obtain through the correspondence between phonemes and pronunciation lip shapes the consonant pronunciation lip shape corresponding to the consonant phoneme and the vowel pronunciation lip shape corresponding to the vowel phoneme, and combine the consonant pronunciation lip shape and the vowel pronunciation lip shape to obtain that recognition object's pronunciation lip-shape change.
  • The processor 801 may be a central processing unit (CPU); it may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
  • The memory 802 may include read-only memory and random access memory, and it provides instructions and data to the processor 801 and the transceiver 803. A portion of the memory 802 may also include non-volatile random access memory; for example, the memory 802 may also store device type information.
  • In specific implementations, the terminal device can perform, through its built-in functional modules, the implementations provided by the steps in FIG. 1 or FIG. 4.
  • In this embodiment of the present application, the terminal device selects recognition objects from the preset recognition object groups as verification objects to compose the verification content, where the recognition objects corresponding to adjacent verification objects do not belong to the same recognition object group; that is, adjacent verification objects differ in pronunciation lip-shape change, so the lip shape visibly changes between adjacent verification objects in the composed content, reducing the occurrence of verification content that is difficult to recognize and improving the accuracy of lip recognition.
  • Embodiments of the present application also provide a computer-readable storage medium that stores a computer program; the computer program includes program instructions which, when executed by a processor, implement the verification content generation method for lip recognition provided by the steps in FIG. 1 or FIG. 4. For details, refer to the implementations provided in those steps, which are not repeated here.
  • The computer-readable storage medium may be the verification content generation device for lip recognition provided in any of the foregoing embodiments, or an internal storage unit of the terminal device, such as a hard disk or memory of an electronic device.
  • The computer-readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the electronic device.
  • Further, the computer-readable storage medium may include both an internal storage unit of the electronic device and an external storage device.
  • The computer-readable storage medium is used to store the computer program and the other programs and data required by the electronic device.
  • The computer-readable storage medium can also be used to temporarily store data that has been output or that is to be output.

Abstract

A verification content generation method and device for lip-language recognition. The method includes: obtaining a lip recognition request from a terminal device, and obtaining verification request parameters according to the lip recognition request (S401); determining the number of verification objects required for lip recognition verification, and selecting recognition objects from multiple preset recognition object groups as verification objects to compose the verification content of lip recognition, where adjacent verification objects in the verification content belong to different recognition object groups (S402); and outputting the verification content to a verification interface for lip recognition, and using the verification content to perform lip recognition verification on the user of the terminal device (S403). The method can generate verification content in which adjacent verification objects differ in pronunciation lip-shape change, improving the accuracy and convenience of lip-language recognition.

Claims (20)

  1. A verification content generation method for lip-language recognition, comprising:
    obtaining a lip recognition request of a terminal device, and obtaining verification request parameters according to the lip recognition request;
    determining, according to the verification request parameters, the number n of verification objects required for lip recognition verification, and selecting n recognition objects from a plurality of preset recognition object groups as n verification objects to compose the verification content of lip recognition, the n verification objects belonging to at least two recognition object groups and adjacent verification objects in the verification content belonging to different recognition object groups, wherein the recognition objects included in different recognition object groups differ in pronunciation lip-shape change;
    outputting the verification content to a verification interface of lip recognition, and performing, based on the verification interface, lip recognition verification of the verification content on a user of the terminal device.
  2. The method according to claim 1, wherein the pronunciation lip-shape change of the first verification object in the verification content of lip recognition does not include a pronunciation lip-shape change that starts with a half-open or closed mouth.
  3. The method according to claim 1 or 2, further comprising:
    obtaining a plurality of recognition objects, the plurality of recognition objects including recognition objects of at least two types of lip-shape change;
    determining the pronunciation lip-shape change of each recognition object among the plurality of recognition objects;
    dividing the recognition objects whose pronunciation lip-shape change is the first type of lip-shape change among the at least two types into a first recognition object group, and dividing the recognition objects whose pronunciation lip-shape change is the second type of lip-shape change among the at least two types into a second recognition object group, to obtain a plurality of recognition object groups.
  4. The method according to claim 3, wherein determining the pronunciation lip-shape change of each recognition object comprises:
    classifying a plurality of Chinese phonemes by pronunciation lip shape to obtain a phoneme classification result, the phoneme classification result including the correspondence between phonemes and pronunciation lip shapes;
    decomposing the pinyin of any one recognition object among the plurality of recognition objects into phonemes, and determining, according to the correspondence between phonemes and pronunciation lip shapes, the pronunciation lip shape corresponding to each phoneme obtained by the decomposition;
    combining the pronunciation lip shapes corresponding to the respective phonemes to generate the pronunciation lip-shape change of the recognition object, so as to obtain the pronunciation lip-shape change of each recognition object.
  5. The method according to claim 4, wherein classifying the plurality of Chinese phonemes by pronunciation lip shape to obtain the phoneme classification result comprises:
    obtaining a plurality of Chinese phonemes, the plurality of Chinese phonemes including phonemes of at least two pronunciation lip shapes;
    classifying the Chinese phonemes whose pronunciation lip shape is the first of the at least two pronunciation lip shapes into a first category, and classifying the Chinese phonemes whose pronunciation lip shape is the second of the at least two pronunciation lip shapes into a second category;
    storing the Chinese phonemes of the first category and the Chinese phonemes of the second category in the phoneme classification result.
  6. The method according to claim 4 or 5, wherein the decomposing and determining comprise:
    decomposing the pinyin of any one recognition object among the plurality of recognition objects into a consonant phoneme and a vowel phoneme, performing lip-shape matching of the consonant phoneme and the vowel phoneme against the phoneme classification result, and obtaining, through the correspondence between phonemes and pronunciation lip shapes in the phoneme classification result, the consonant pronunciation lip shape corresponding to the consonant phoneme and the vowel pronunciation lip shape corresponding to the vowel phoneme;
    and wherein the combining comprises: combining the consonant pronunciation lip shape corresponding to the consonant phoneme and the vowel pronunciation lip shape corresponding to the vowel phoneme to obtain the pronunciation lip-shape change of the recognition object.
  7. A verification content generation device for lip-language recognition, comprising:
    a response module configured to obtain a lip recognition request of a terminal device, and to obtain verification request parameters according to the lip recognition request;
    a processing module configured to determine, according to the verification request parameters obtained by the response module, the number n of verification objects required for lip recognition verification, and to select n recognition objects from a plurality of preset recognition object groups as n verification objects to compose the verification content of lip recognition, the n verification objects belonging to at least two recognition object groups and adjacent verification objects in the verification content belonging to different recognition object groups, wherein the recognition objects included in different recognition object groups differ in pronunciation lip-shape change;
    an output module configured to output the verification content composed by the processing module to a verification interface of lip recognition, and to perform, based on the verification interface, lip recognition verification of the verification content on a user of the terminal device.
  8. The device according to claim 7, wherein the pronunciation lip-shape change of the first verification object in the verification content of lip recognition does not include a pronunciation lip-shape change that starts with a half-open or closed mouth.
  9. The device according to claim 7 or 8, wherein the processing module is further configured to:
    obtain a plurality of recognition objects, the plurality of recognition objects including recognition objects of at least two types of lip-shape change;
    determine the pronunciation lip-shape change of each recognition object among the plurality of recognition objects;
    divide the recognition objects whose pronunciation lip-shape change is the first type of lip-shape change among the at least two types into a first recognition object group, and divide the recognition objects whose pronunciation lip-shape change is the second type of lip-shape change among the at least two types into a second recognition object group, to obtain a plurality of recognition object groups.
  10. The device according to claim 9, wherein the processing module is further configured to:
    classify a plurality of Chinese phonemes by pronunciation lip shape to obtain a phoneme classification result, the phoneme classification result including the correspondence between phonemes and pronunciation lip shapes;
    decompose the pinyin of any one recognition object among the plurality of recognition objects into phonemes and determine, according to the correspondence between phonemes and pronunciation lip shapes, the pronunciation lip shape corresponding to each phoneme obtained by the decomposition;
    combine the pronunciation lip shapes corresponding to the respective phonemes to generate the pronunciation lip-shape change of the recognition object, so as to obtain the pronunciation lip-shape change of each recognition object.
  11. The device according to claim 10, wherein the processing module is further configured to:
    obtain a plurality of Chinese phonemes, the plurality of Chinese phonemes including phonemes of at least two pronunciation lip shapes;
    classify the Chinese phonemes whose pronunciation lip shape is the first of the at least two pronunciation lip shapes into a first category, and classify the Chinese phonemes whose pronunciation lip shape is the second of the at least two pronunciation lip shapes into a second category;
    store the Chinese phonemes of the first category and the Chinese phonemes of the second category in the phoneme classification result.
  12. The device according to claim 10 or 11, wherein the processing module is further configured to:
    decompose the pinyin of any one recognition object among the plurality of recognition objects into a consonant phoneme and a vowel phoneme, perform lip-shape matching of the consonant phoneme and the vowel phoneme against the phoneme classification result, and obtain, through the correspondence between phonemes and pronunciation lip shapes in the phoneme classification result, the consonant pronunciation lip shape corresponding to the consonant phoneme and the vowel pronunciation lip shape corresponding to the vowel phoneme;
    combine the consonant pronunciation lip shape corresponding to the consonant phoneme and the vowel pronunciation lip shape corresponding to the vowel phoneme to obtain the pronunciation lip-shape change of the recognition object.
  13. The device according to claim 7, further comprising:
    a storage module configured to store the phoneme classification result, the plurality of recognition object groups, and the established verification content generation rules;
    a lip recognition module configured to recognize the lip motion of the user, match the lip motion of the user against the pronunciation lip-shape changes of the generated verification content of lip recognition, and obtain the lip recognition result of the verification content.
  14. A terminal device, comprising a processor, a transceiver, and a memory connected to one another, wherein the memory is configured to store a computer program comprising program instructions, and the processor and the transceiver are configured to call the program instructions to perform the following operations:
    the processor is configured to obtain a lip recognition request of the terminal device, and to obtain verification request parameters according to the lip recognition request;
    the processor is further configured to determine, according to the verification request parameters, the number n of verification objects required for lip recognition verification, and to select n recognition objects from a plurality of preset recognition object groups as n verification objects to compose the verification content of lip recognition, the n verification objects belonging to at least two recognition object groups and adjacent verification objects in the verification content belonging to different recognition object groups, wherein the recognition objects included in different recognition object groups differ in pronunciation lip-shape change;
    the transceiver is configured to output the verification content to a verification interface of lip recognition, and lip recognition verification of the verification content is performed on a user of the terminal device based on the verification interface.
  15. The terminal device according to claim 14, wherein the pronunciation lip-shape change of the first verification object in the verification content of lip recognition does not include a pronunciation lip-shape change that starts with a half-open or closed mouth.
  16. The terminal device according to claim 14 or 15, wherein the processor is further configured to:
    obtain a plurality of recognition objects, the plurality of recognition objects including recognition objects of at least two types of lip-shape change;
    determine the pronunciation lip-shape change of each recognition object among the plurality of recognition objects;
    divide the recognition objects whose pronunciation lip-shape change is the first type of lip-shape change among the at least two types into a first recognition object group, and divide the recognition objects whose pronunciation lip-shape change is the second type of lip-shape change among the at least two types into a second recognition object group, to obtain a plurality of recognition object groups.
  17. The terminal device according to claim 16, wherein the processor is configured to:
    classify a plurality of Chinese phonemes by pronunciation lip shape to obtain a phoneme classification result, the phoneme classification result including the correspondence between phonemes and pronunciation lip shapes;
    decompose the pinyin of any one recognition object among the plurality of recognition objects into phonemes and determine, according to the correspondence between phonemes and pronunciation lip shapes, the pronunciation lip shape corresponding to each phoneme obtained by the decomposition;
    combine the pronunciation lip shapes corresponding to the respective phonemes to generate the pronunciation lip-shape change of the recognition object, so as to obtain the pronunciation lip-shape change of each recognition object.
  18. The terminal device according to claim 17, wherein the processor is configured to:
    obtain a plurality of Chinese phonemes, the plurality of Chinese phonemes including phonemes of at least two pronunciation lip shapes;
    classify the Chinese phonemes whose pronunciation lip shape is the first of the at least two pronunciation lip shapes into a first category, and classify the Chinese phonemes whose pronunciation lip shape is the second of the at least two pronunciation lip shapes into a second category;
    store the Chinese phonemes of the first category and the Chinese phonemes of the second category in the phoneme classification result.
  19. The terminal device according to claim 14, wherein the processor is configured to:
    decompose the pinyin of any one recognition object among the plurality of recognition objects into a consonant phoneme and a vowel phoneme, perform lip-shape matching of the consonant phoneme and the vowel phoneme against the phoneme classification result, and obtain, through the correspondence between phonemes and pronunciation lip shapes in the phoneme classification result, the consonant pronunciation lip shape corresponding to the consonant phoneme and the vowel pronunciation lip shape corresponding to the vowel phoneme;
    combine the consonant pronunciation lip shape corresponding to the consonant phoneme and the vowel pronunciation lip shape corresponding to the vowel phoneme to obtain the pronunciation lip-shape change of the recognition object.
  20. A computer-readable storage medium, wherein the computer-readable storage medium stores computer program instructions which, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 6.
PCT/CN2019/088800 2018-11-28 2019-05-28 Verification content generation method for lip-language recognition and related device WO2020107834A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811430520.0A CN109461437B (zh) 2018-11-28 2018-11-28 Verification content generation method for lip-language recognition and related device
CN201811430520.0 2018-11-28

Publications (1)

Publication Number Publication Date
WO2020107834A1 true WO2020107834A1 (zh) 2020-06-04

Family

ID=65611807

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/088800 WO2020107834A1 (zh) 2018-11-28 2019-05-28 Verification content generation method for lip-language recognition and related device

Country Status (2)

Country Link
CN (1) CN109461437B (zh)
WO (1) WO2020107834A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113807234B (zh) * 2021-09-14 2023-12-19 Shenzhen Muyu Technology Co., Ltd. Mouth-shape synthesis video verification method and apparatus, computer device, and storage medium

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109461437B (zh) 2018-11-28 2023-05-09 Ping An Technology (Shenzhen) Co., Ltd. Verification content generation method for lip-language recognition and related device
CN109830236A (zh) 2019-03-27 2019-05-31 Guangdong University of Technology A dual-viseme mouth-shape synthesis method
CN111242029A (zh) 2020-01-13 2020-06-05 Hunan Shiyou Electric Co., Ltd. Device control method and apparatus, computer device, and storage medium
CN113743160A (zh) 2020-05-29 2021-12-03 Beijing Zhongguancun Kejin Technology Co., Ltd. Liveness detection method and apparatus, and storage medium
CN112104457B (zh) 2020-08-28 2022-06-17 Suzhou Yunhulu Information Technology Co., Ltd. Digit-to-Chinese-character verification code generation method and verification system
CN112241521A (zh) 2020-12-04 2021-01-19 Beijing Yuanjian Information Technology Co., Ltd. Plosive-based identity verification method and apparatus, electronic device, and medium
CN112749629A (zh) 2020-12-11 2021-05-04 Southeast University Engineering optimization method for Chinese lip-language recognition in an identity verification system
CN114267374B (zh) 2021-11-24 2022-10-18 Beijing Baidu Netcom Science and Technology Co., Ltd. Phoneme detection method and apparatus, training method and apparatus, device, and medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101101752A (zh) * 2007-07-19 2008-01-09 Huazhong University of Science and Technology Monosyllabic-language lip-reading recognition system based on visual features
CN104361276A (zh) * 2014-11-18 2015-02-18 Newcapec Electronics Co., Ltd. Multi-modal biometric identity authentication method and system
CN104598796A (zh) * 2015-01-30 2015-05-06 iFlytek Co., Ltd. Identity recognition method and system
CN104992095A (zh) * 2015-06-29 2015-10-21 Baidu Online Network Technology (Beijing) Co., Ltd. Information verification method and system
CN105930713A (zh) * 2016-04-14 2016-09-07 Shenzhen Gionee Communication Equipment Co., Ltd. Verification code generation method and terminal
US20170024608A1 (en) * 2015-07-20 2017-01-26 International Business Machines Corporation Liveness detector for face verification
CN106529379A (zh) * 2015-09-15 2017-03-22 Alibaba Group Holding Ltd. Liveness recognition method and device
CN109461437A (zh) * 2018-11-28 2019-03-12 Ping An Technology (Shenzhen) Co., Ltd. Verification content generation method for lip-language recognition and related device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107404381A (zh) * 2016-05-19 2017-11-28 Alibaba Group Holding Ltd. Identity authentication method and apparatus
CN106453278B (zh) * 2016-09-23 2019-04-30 Tenpay Payment Technology Co., Ltd. Information verification method and verification platform
CN106778496A (zh) * 2016-11-22 2017-05-31 Chongqing Zhongke CloudWalk Technology Co., Ltd. Liveness detection method and apparatus
CN107133608A (zh) * 2017-05-31 2017-09-05 Tianjin Zhongke Intelligent Identification Industry Technology Research Institute Co., Ltd. Identity authentication system based on liveness detection and face verification
CN107358085A (zh) * 2017-07-28 2017-11-17 Huizhou TCL Mobile Communication Co., Ltd. Terminal device unlocking method, storage medium, and terminal device

Also Published As

Publication number Publication date
CN109461437A (zh) 2019-03-12
CN109461437B (zh) 2023-05-09

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19888916; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established (Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06.09.2021))
122 Ep: pct application non-entry in european phase (Ref document number: 19888916; Country of ref document: EP; Kind code of ref document: A1)