CN114242075A - Identity authentication method, device and equipment based on face and voiceprint - Google Patents

Identity authentication method, device and equipment based on face and voiceprint

Info

Publication number
CN114242075A
Authority
CN
China
Prior art keywords
feature
identity
face
authentication
voiceprint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111524358.0A
Other languages
Chinese (zh)
Inventor
徐波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GUANGDONG LIWEI NETWORK TECHNOLOGY CO LTD
Duoyi Network Co ltd
Original Assignee
GUANGDONG LIWEI NETWORK TECHNOLOGY CO LTD
Duoyi Network Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by GUANGDONG LIWEI NETWORK TECHNOLOGY CO LTD, Duoyi Network Co ltd
Priority to CN202111524358.0A
Publication of CN114242075A

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/24 - Speech recognition using non-acoustical features
    • G10L 15/25 - Speech recognition using non-acoustical features using position of the lips, movement of the lips or face analysis
    • G10L 17/00 - Speaker identification or verification techniques
    • G10L 17/02 - Preprocessing operations, e.g. segment selection; pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; feature selection or extraction
    • G10L 17/04 - Training, enrolment or model building
    • G10L 17/22 - Interactive procedures; man-machine interfaces
    • G10L 17/24 - Interactive procedures; man-machine interfaces: the user being prompted to utter a password or a predefined phrase

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention relates to the technical field of security, and in particular to an identity authentication method, device, equipment and storage medium based on a face and a voiceprint, wherein the method comprises the following steps: acquiring face video data and audio data of a user; inputting the face video data into a preset face recognition model to obtain a first face information feature, which comprises a first face feature and a first lip action feature; inputting the audio data into a preset voiceprint recognition model to obtain a first voiceprint feature; inputting the audio data into a preset speech recognition model to obtain recognized text information data, and inputting the text information data into a preset word embedding model to obtain a first authentication word feature; splicing the first face feature, the first voiceprint feature and the first authentication word feature to obtain a first identity feature, and storing the first identity feature and the first lip action feature in a database as the identity registration feature; and responding to an identity verification instruction of the user, acquiring an identity authentication feature, and verifying the identity of the user according to the identity authentication feature and the identity registration feature to obtain an identity authentication result.

Description

Identity authentication method, device and equipment based on face and voiceprint
Technical Field
The invention relates to the technical field of security, and in particular to an identity authentication method, device, equipment and storage medium based on a face and a voiceprint.
Background
At present, security locks are increasingly widely applied in scenarios such as storage cabinets and mobile terminals. One prior technical scheme is based on face recognition: a face feature extraction model is trained by deep learning on large-scale data; when locking, the device takes a picture with its camera and computes the owner's face feature as the authentication reference feature; when unlocking, the device takes a picture again, obtains a new feature, compares it with the reference feature, and grants authentication if the comparison passes.
Another scheme is based on voiceprint recognition: a voiceprint feature extraction model is trained by deep learning on a large-scale speech corpus; when locking, the device records audio with its recording device and computes the owner's voiceprint feature as the authentication reference feature; when unlocking, it records again, obtains a new feature, compares it with the reference feature, and grants authentication if the comparison passes.
However, both methods have defects. The 2D face information acquired by a camera can easily be spoofed with a photo, so the risk is high and the security is insufficient. The voiceprint feature, like the face feature, has a low feature dimension and distinguishes poorly in big-data scenarios; and with the development of voice imitation technology, the reliability of voiceprint recognition keeps decreasing.
Disclosure of Invention
Based on this, the present invention aims to provide an identity authentication method, apparatus, device and storage medium based on a face and a voiceprint. By combining face recognition, voiceprint recognition and authentication word recognition, it improves the extraction of face information features and voice features, realizes identity authentication of the user, and at the same time addresses the weak security of features extracted from a single 2D picture or a simple voiceprint, thereby enhancing the security of identity authentication.
The technical solution is as follows:
In a first aspect, an embodiment of the present application provides an identity authentication method based on a face and a voiceprint, including the following steps:
acquiring face video data and audio data of a user;
inputting the face video data into a preset face recognition model to obtain a first face information feature, wherein the first face information feature comprises a first face feature and a first lip action feature;
inputting the audio data into a preset voiceprint recognition model to obtain a first voiceprint feature;
inputting the audio data into a preset speech recognition model to obtain recognized text information data, and inputting the text information data into a preset word embedding model to obtain a first authentication word feature, wherein the first authentication word feature is the word vector feature of the authentication word used for authentication in the audio data;
splicing the first face feature, the first voiceprint feature and the first authentication word feature to obtain a first identity feature, and storing the first identity feature and the first lip action feature in a database as the identity registration feature;
and responding to an identity verification instruction of the user, wherein the identity verification instruction comprises face video data and audio data captured when the user authenticates, acquiring an identity authentication feature, and verifying the identity of the user according to the identity authentication feature and the identity registration feature to obtain an identity authentication result.
In a second aspect, an embodiment of the present application provides an identity authentication apparatus based on a face and a voiceprint, including:
the acquisition module is used for acquiring face video data and audio data of a user;
the face information feature acquisition module is used for inputting the face video data into a preset face recognition model to obtain a first face information feature, wherein the first face information feature comprises a first face feature and a first lip action feature;
the voiceprint feature acquisition module is used for inputting the audio data into a preset voiceprint recognition model to obtain a first voiceprint feature;
the authentication word feature acquisition module is used for inputting the audio data into a preset speech recognition model to obtain recognized text information data, and inputting the text information data into a preset word embedding model to obtain a first authentication word feature, wherein the first authentication word feature is the word vector feature of the authentication word used for authentication in the audio data;
the identity registration module is used for splicing the first face feature, the first voiceprint feature and the first authentication word feature to obtain a first identity feature, and storing the first identity feature and the first lip action feature in a database as the identity registration feature;
the identity verification module is used for responding to an identity verification instruction of the user, wherein the identity verification instruction comprises face video data and audio data captured when the user authenticates, acquiring an identity authentication feature, and verifying the identity of the user according to the identity authentication feature and the identity registration feature to obtain an identity authentication result.
In a third aspect, an embodiment of the present application provides a computer device, including: a processor, a memory, and a computer program stored on the memory and executable on the processor; the computer program, when executed by the processor, performs the steps of the identity authentication method based on a face and a voiceprint according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a storage medium storing a computer program which, when executed by a processor, implements the steps of the identity authentication method based on a face and a voiceprint according to the first aspect.
In the embodiments of the application, combining face recognition, voiceprint recognition and authentication word recognition improves the extraction of face information features and voice features; identity authentication of the user is realized while the weak security of features extracted from a single 2D picture or a simple voiceprint is addressed, thereby enhancing the security of identity authentication.
For a better understanding and practice, the invention is described in detail below with reference to the accompanying drawings.
Drawings
Fig. 1 is a schematic flowchart of an identity authentication method based on a face and a voiceprint according to an embodiment of the present application;
fig. 2 is a schematic flowchart of S2 in the identity authentication method based on a face and a voiceprint according to an embodiment of the present application;
fig. 3 is a face feature diagram obtained by an identity authentication method based on a face and a voiceprint according to an embodiment of the present application;
fig. 4 is a schematic flowchart of S3 in the identity authentication method based on a face and a voiceprint according to an embodiment of the present application;
fig. 5 is a schematic flowchart of S5 in the identity authentication method based on a face and a voiceprint according to an embodiment of the present application;
fig. 6 is a schematic flowchart of S6 in the identity authentication method based on a face and a voiceprint according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of an identity authentication apparatus based on a face and a voiceprint according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an apparatus according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all embodiments consistent with the present application; rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
Referring to fig. 1, fig. 1 is a schematic flow chart of an identity authentication method based on a face and a voiceprint according to an embodiment of the present application, where the method includes the following steps:
S1: acquiring face video data and audio data of the user.
The execution body of the identity authentication method based on a face and a voiceprint is the authentication device of the method (hereinafter referred to as the authentication device). In an optional embodiment, the authentication device may be a computer device, a server, or a server cluster formed by combining multiple computer devices.
In this embodiment, the authentication device is provided with a user interaction interface through which a user can send an identity registration instruction and an identity verification instruction to the authentication device. When the user sends the identity registration instruction, the authentication device responds by controlling the camera to record a video of the user, obtaining the face video data of the user at locking time, and by controlling the microphone to record, obtaining the audio data of the user at locking time.
S2: inputting the face video data into a preset face recognition model to obtain a first face information feature, wherein the first face information feature comprises a first face feature and a first lip action feature.
The face recognition model comprises a face frame detection module, a face feature extraction module and a face feature point tracking module. The face frame detection module adopts a Dlib model and, based on HOG features and a linear classifier, obtains the face frame with the largest area in the face video data. In an optional embodiment, when the face video data contains several faces, the largest face is taken for feature calculation.
The face feature extraction module adopts a FaceNet model. Because the FaceNet training objective is to minimize the intra-class distance and maximize the inter-class distance of the input data, the feature data obtained by passing the output of the last Inception layer in FaceNet through the final pooling layer can effectively distinguish different faces, and is used to obtain the face features.
The face feature point tracking module adopts a Dlib model and, based on GBDT (gradient boosted decision trees), obtains a face feature map for the face frame output by the face frame detection module, wherein the face feature map comprises a number of face feature points and their coordinate data, and the coordinate data comprises abscissa and ordinate data.
In this embodiment, the authentication device inputs the face video data into the preset face recognition model, obtains the N face frames in the face video data from the face frame detection module, stretches or compresses the pixel matrix corresponding to each of the N face frames into a matrix Input of size 299 × 299 × 3, and, using Input as the input data of the pre-trained FaceNet model, obtains the first face feature D_1 corresponding to the N face frames output by the FaceNet model.
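As a minimal illustrative sketch of this step (not the patent's exact implementation): Dlib's HOG-based frontal face detector finds the largest face frame, the crop is stretched or compressed to 299 × 299 × 3, and a hypothetical pre-loaded FaceNet-style model handle facenet_model produces the feature row; the [0, 1] pixel scaling is an assumption, since the patent does not specify the preprocessing.

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()  # HOG features + linear classifier

def largest_face_feature(frame_bgr, facenet_model):
    """Detect the largest face in one video frame and return its FaceNet feature."""
    boxes = detector(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY), 1)
    if not boxes:
        return None
    box = max(boxes, key=lambda b: b.width() * b.height())          # largest-area face frame
    crop = frame_bgr[max(box.top(), 0):box.bottom(), max(box.left(), 0):box.right()]
    inp = cv2.resize(crop, (299, 299)).astype(np.float32) / 255.0   # 299 x 299 x 3 Input
    return facenet_model.predict(inp[np.newaxis, ...])[0]           # one row of D_1
```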
The authentication device obtains the first lip action feature from the face feature point tracking module.
Referring to fig. 2, fig. 2 is a schematic flow chart of S2 in the identity authentication method based on a face and a voiceprint according to an embodiment of the present application, including steps S201 to S202, as follows:
S201: obtaining face feature points, wherein the face feature points comprise eye feature points, lip feature points and nose tip feature points.
As shown in fig. 3, in this embodiment, the authentication device obtains the face feature points corresponding to each frame of the face video data from the face feature point tracking module. The face feature points comprise eye feature points, lip feature points and nose tip feature points: the eye feature points comprise a left inner canthus feature point at position 40 and a right inner canthus feature point at position 43, the lip feature points are points 49-68, and the nose tip feature point is at position 34.
S202: acquiring the first lip action feature according to the face feature points and a normalization algorithm.
The normalization algorithm is as follows:

d_i = sqrt((x_i - x_34)^2 + (y_i - y_34)^2) / sqrt((x_40 - x_43)^2 + (y_40 - y_43)^2)

where d_i is the normalized distance for lip feature point i; x_i is the abscissa of face feature point i; x_34 is the abscissa of the nose tip feature point; x_40 is the abscissa of the left inner canthus feature point; x_43 is the abscissa of the right inner canthus feature point; y_i is the ordinate of face feature point i; y_34 is the ordinate of the nose tip feature point; y_40 is the ordinate of the left inner canthus feature point; y_43 is the ordinate of the right inner canthus feature point.
In this scheme, only the lip feature points (points 49-68) are tracked relative to the nose tip feature point 34. For each frame of the face video data, the distance between every lip feature point and the nose tip is normalized by the above formula, so that each frame yields 68 - 49 + 1 = 20 distance values. Stacking the per-frame values gives a matrix that reflects the motion trajectory of the user's lips, and this matrix is taken as the first lip action feature Matrix_pre.
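The sketch below builds the lip trajectory matrix under the formula reconstructed above; the 0-based index mapping (for a 68-point Dlib-style landmark array) and the function name are our assumptions.

```python
import numpy as np

def lip_motion_features(landmarks_per_frame):
    """Build Matrix_pre: one row of 20 normalized lip-to-nose distances per frame."""
    rows = []
    for pts in landmarks_per_frame:          # pts: (68, 2) array of (x, y) landmarks
        nose = pts[33]                       # patent point 34 (nose tip)
        eye_l, eye_r = pts[39], pts[42]      # patent points 40 and 43 (inner canthi)
        scale = np.linalg.norm(eye_l - eye_r)             # inter-canthus normalization scale
        lips = pts[48:68]                    # patent points 49-68: 20 lip landmarks
        rows.append(np.linalg.norm(lips - nose, axis=1) / scale)
    return np.asarray(rows)                  # shape: (num_frames, 20)
```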
S3: inputting the audio data into a preset voiceprint recognition model to obtain a first voiceprint feature.
The voiceprint recognition model is a VGG-style model that supports extracting semantically meaningful 128-dimensional embedding feature vectors from audio waveforms, and is used to obtain the voiceprint features.
In this embodiment, the authentication device inputs the audio data to a preset voiceprint recognition model to obtain a first voiceprint feature.
Referring to fig. 4, fig. 4 is a schematic flowchart of S3 in the identity authentication method based on a face and a voiceprint according to an embodiment of the present application, which includes steps S301 to S302, specifically as follows:
S301: resampling the audio data to obtain single-channel audio data.
In this embodiment, the authentication device resamples the audio data to obtain single-channel audio data; specifically, 16 kHz single-channel audio data.
S302: obtaining a spectrogram of the single-channel audio data, calculating the Mel spectrum features of the single-channel audio data, and inputting the Mel spectrum features into the voiceprint recognition model to obtain the first voiceprint feature.
In this embodiment, the authentication device windows the single-channel audio data with a 25 ms Hanning window, and applies a short-time Fourier transform to the windowed single-channel audio data with a 10 ms frame shift to obtain a spectrogram;
maps the spectrogram onto a 64-band Mel filter bank to calculate the Mel spectrum features;
and processes the Mel spectrum features according to the formula log(mel-spectrum + 0.01) to obtain the processed Mel spectrum features, which are input into the voiceprint recognition model to obtain the first voiceprint feature. In an optional embodiment, the Mel spectrum features are framed into patches of 0.96 s duration, for a total of audio_len / 0.96 frames (division rounded down), where audio_len is the audio length in seconds (s); this matrix is used as the input of the voiceprint recognition model, and the first voiceprint feature output by the model is obtained, the size of the first voiceprint feature matrix being (audio_len / 0.96) × 128.
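These parameters (16 kHz single-channel audio, 25 ms window, 10 ms shift, 64 Mel bands, log(mel + 0.01), 0.96 s patches, 128-dimensional embeddings) match the VGGish audio front end. The following sketch reproduces the preprocessing with librosa; n_fft = 400 (25 ms at 16 kHz) and hop_length = 160 (10 ms) are our assumptions for the exact FFT settings.

```python
import numpy as np
import librosa

def log_mel_patches(path):
    """Resample to 16 kHz mono and build 0.96 s log-Mel patches for the voiceprint model."""
    y, sr = librosa.load(path, sr=16000, mono=True)             # S301: 16 kHz, single channel
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=400, win_length=400, hop_length=160,  # 25 ms Hanning window, 10 ms shift
        window="hann", n_mels=64)                               # 64-band Mel filter bank
    log_mel = np.log(mel + 0.01).T                              # log(mel-spectrum + 0.01), (frames, 64)
    n = log_mel.shape[0] // 96                                  # 0.96 s = 96 frames, rounded down
    return log_mel[:n * 96].reshape(n, 96, 64)                  # model then emits an (n, 128) matrix
```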
S4: inputting the audio data into a preset speech recognition model to obtain recognized text information data, and inputting the text information data into a preset word embedding model to obtain a first authentication word feature, wherein the first authentication word feature is the word vector feature of the authentication word used for authentication in the audio data.
The speech recognition model is a Transformer model, used to recognize the text information data in the audio data; the word embedding model is a word2vec (word-to-vector) model, used to encode the text information data so that its processing can be simplified into vector operations in a vector space.
In this embodiment, the authentication device inputs the audio data into the Transformer model and obtains the text information data in the audio data recognized by the speech recognition model; it then inputs the text information data into the word2vec model, which encodes the text information data into one-hot codes, each one-hot code corresponding to a word in the audio data; according to the one-hot codes, the word vector corresponding to the context of the word is obtained as the first authentication word feature.
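As an illustrative sketch only: the patent does not state how the per-word vectors are pooled into a single first authentication word feature, so the mean pooling below, the gensim library choice, and the function name are all assumptions.

```python
import numpy as np
from gensim.models import Word2Vec

def authentication_word_feature(tokens, w2v_model):
    """Pool the word2vec vectors of the recognized authentication words into one feature D_3."""
    vecs = [w2v_model.wv[t] for t in tokens if t in w2v_model.wv]  # skip out-of-vocabulary tokens
    return np.mean(vecs, axis=0) if vecs else None
```

Here tokens would be the word sequence output by the speech recognition model for the authentication utterance.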
S5: splicing the first face feature, the first voiceprint feature and the first authentication word feature to obtain a first identity feature, and storing the first identity feature and the first lip action feature in a database as the identity registration feature.
In this embodiment, the authentication device splices the first face feature, the first voiceprint feature and the first authentication word feature to obtain the first identity feature, combines the first identity feature and the first lip action feature as the identity registration feature, and stores the identity registration feature in a database.
Referring to fig. 5, fig. 5 is a schematic flow chart of S5 in the identity authentication method based on a face and a voiceprint according to an embodiment of the present application, which includes steps S501 to S502, specifically as follows:
S501: stretching the first face feature, the first voiceprint feature and the first authentication word feature into one-dimensional vectors.
S502: splicing the first face feature, the first voiceprint feature and the first authentication word feature according to a matrix splicing algorithm to obtain the first identity feature.
The matrix splicing algorithm is as follows:
D_pos = con(D_1, D_2, D_3)

where D_pos is the first identity feature; D_1 is the first face feature; D_2 is the first voiceprint feature; D_3 is the first authentication word feature.
In this embodiment, the authentication device concatenates the stretched first face feature, first voiceprint feature and first authentication word feature end-to-end to obtain the first identity feature.
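A short sketch of the splicing step with numpy (the function name is ours): each feature is stretched to a one-dimensional vector and the three vectors are joined end-to-end.

```python
import numpy as np

def splice_identity_feature(d1, d2, d3):
    """D_pos = con(D_1, D_2, D_3): flatten each feature to 1-D, then concatenate end-to-end."""
    return np.concatenate([np.ravel(d1), np.ravel(d2), np.ravel(d3)])
```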
S6: responding to an identity verification instruction of the user, wherein the identity verification instruction comprises the face video data and audio data captured when the user authenticates; acquiring an identity authentication feature, and verifying the identity of the user according to the identity authentication feature and the identity registration feature to obtain an identity authentication result.
In this embodiment, the user sends an identity verification instruction to the authentication device through the user interaction interface. The authentication device receives the instruction and responds: it controls the camera to record a video of the user, obtaining the face video data at authentication time, and controls the microphone to record, obtaining the audio data at authentication time; it then obtains the identity authentication feature from the face video data and audio data captured at authentication time, and verifies the identity of the user according to the identity authentication feature and the identity registration feature to obtain the identity authentication result.
Referring to fig. 6, fig. 6 is a schematic flowchart of S6 in the identity authentication method based on a face and a voiceprint according to an embodiment of the present application, including steps S601 to S602, as follows:
S601: acquiring a second face feature, a second lip action feature, a second voiceprint feature and a second authentication word feature; splicing the second face feature, the second voiceprint feature and the second authentication word feature to obtain a second identity feature; and combining the second identity feature and the second lip action feature as the identity authentication feature.
In this embodiment, the authentication device obtains the second face feature and the second lip action feature from the face video data captured at authentication time and the face recognition model; obtains the second voiceprint feature from the audio data and the voiceprint recognition model; obtains the second authentication word feature from the audio data, the speech recognition model and the word embedding model; splices the second face feature, the second voiceprint feature and the second authentication word feature to obtain the second identity feature; and combines the second identity feature and the second lip action feature as the identity authentication feature.
S602: acquiring a distance value according to the identity authentication feature, the identity registration feature and a matrix distance algorithm, and obtaining the identity authentication result according to the distance value and a preset distance threshold.
The matrix distance algorithm is as follows:
z = (dis_cos(Matrix_pos, Matrix_pre) + dis_cos(D_pos, D_pre)) / 2

where z is the distance value; dis_cos is the cosine distance; Matrix_pos is the second lip action feature; Matrix_pre is the first lip action feature; D_pos is the first identity feature; D_pre is the second identity feature;
where the cosine distance is expressed as

dis_cos(A, B) = 1 - (A · B) / (|A| |B|)

that is, 1 minus the cosine similarity, so that a smaller value indicates greater similarity.
In this embodiment, according to the matrix distance algorithm, the authentication device obtains a first cosine distance between the first identity feature and the second identity feature, which characterizes their similarity, and a second cosine distance between the first lip action feature and the second lip action feature, which characterizes their similarity; it then averages the first and second cosine distances to obtain the distance value. This improves the accuracy of judging the similarity between the identity authentication feature and the identity registration feature, so the identity of the user is authenticated more reliably.
According to the distance value and the preset distance threshold, an identity authentication success result is obtained when the distance value is smaller than the distance threshold, and an identity authentication failure result is obtained when the distance value is larger than the distance threshold.
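A sketch of the verification decision under the assumptions stated above: the cosine distance is taken as 1 minus the cosine similarity (so smaller means more similar), the two lip action matrices are assumed to have been brought to the same shape, and the 0.3 threshold is purely illustrative since the patent leaves its value unspecified.

```python
import numpy as np

def cosine_distance(a, b):
    """dis_cos(A, B) = 1 - (A . B) / (|A| |B|); 0 means identical direction."""
    a, b = np.ravel(a), np.ravel(b)
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def authenticate(d_pre, m_pre, d_pos, m_pos, threshold=0.3):
    """z = (dis_cos(Matrix_pos, Matrix_pre) + dis_cos(D_pos, D_pre)) / 2, then threshold."""
    z = (cosine_distance(m_pos, m_pre) + cosine_distance(d_pos, d_pre)) / 2
    return z < threshold   # True: authentication success; False: authentication failure
```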
In an optional embodiment, the authentication device may perform the corresponding unlocking action on the device terminal upon an identity authentication success result, and may control the corresponding device terminal to raise an alarm upon an identity authentication failure result.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an identity authentication apparatus based on a face and a voiceprint according to an embodiment of the present application. The apparatus may implement all or part of the identity authentication method based on a face and a voiceprint through software, hardware, or a combination of the two, and the apparatus 7 includes:
an obtaining module 71, configured to obtain face video data and audio data of a user;
a face information feature obtaining module 72, configured to input the face video data into a preset face recognition model and obtain a first face information feature, where the first face information feature includes a first face feature and a first lip action feature;
a voiceprint feature obtaining module 73, configured to input the audio data into a preset voiceprint recognition model and obtain a first voiceprint feature;
an authentication word feature obtaining module 74, configured to input the audio data into a preset speech recognition model, obtain recognized text information data, input the text information data into a preset word embedding model, and obtain a first authentication word feature, where the first authentication word feature is the word vector feature of the authentication word used for authentication in the audio data;
an identity registration module 75, configured to splice the first face feature, the first voiceprint feature and the first authentication word feature to obtain a first identity feature, and store the first identity feature and the first lip action feature in a database as the identity registration feature;
an identity verification module 76, configured to respond to an identity verification instruction of the user, where the identity verification instruction includes the face video data and audio data captured when the user authenticates, obtain an identity authentication feature, and verify the identity of the user according to the identity authentication feature and the identity registration feature to obtain an identity authentication result.
In the embodiment of the application, the face video data and audio data of a user are acquired through the obtaining module; the face video data is input into a preset face recognition model through the face information feature obtaining module to obtain a first face information feature, which includes a first face feature and a first lip action feature; the audio data is input into a preset voiceprint recognition model through the voiceprint feature obtaining module to obtain a first voiceprint feature; the audio data is input into a preset speech recognition model through the authentication word feature obtaining module to obtain recognized text information data, which is input into a preset word embedding model to obtain a first authentication word feature, namely the word vector feature of the authentication word used for authentication in the audio data; the first face feature, the first voiceprint feature and the first authentication word feature are spliced through the identity registration module to obtain a first identity feature, which is stored together with the first lip action feature in a database as the identity registration feature; and the identity verification module responds to an identity verification instruction of the user, which includes the face video data and audio data captured when the user authenticates, obtains an identity authentication feature, and verifies the identity of the user according to the identity authentication feature and the identity registration feature to obtain an identity authentication result. By combining face recognition, voiceprint recognition and authentication word recognition, the extraction of face information features and voice features is improved; identity authentication of the user is realized while the weak security of features extracted from a single 2D picture or a simple voiceprint is addressed, enhancing the security of identity authentication.
Referring to fig. 8, fig. 8 is a schematic structural diagram of a computer device according to an embodiment of the present application, where the computer device 8 includes: a processor 81, a memory 82, and a computer program 83 stored on the memory 82 and operable on the processor 81; the computer device may store a plurality of instructions, which are suitable for being loaded by the processor 81 and executing the method steps in the embodiments described in fig. 1 to 2 and fig. 4 to 6, and the specific execution process may refer to the specific description of the embodiments described in fig. 1 to 2 and fig. 4 to 6, which is not repeated herein.
Processor 81 may include one or more processing cores. The processor 81 connects various parts within the server using various interfaces and lines, and performs the various functions of the identity authentication apparatus 7 based on a face and a voiceprint and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory 82 and calling data in the memory 82. Optionally, the processor 81 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA) and Programmable Logic Array (PLA). The processor 81 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like, wherein the CPU mainly handles the operating system, user interface, application programs and the like; the GPU renders and draws the content to be displayed by the touch display screen; and the modem handles wireless communication. It is understood that the modem may also not be integrated into the processor 81 and may instead be implemented by a single chip.
The memory 82 may include Random Access Memory (RAM) or Read-Only Memory (ROM). Optionally, the memory 82 includes a non-transitory computer-readable medium. The memory 82 may be used to store instructions, programs, code, code sets or instruction sets. The memory 82 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing the operating system, instructions for at least one function (such as touch functions), instructions for implementing the above method embodiments, and the like; the data storage area may store the data involved in the above method embodiments. The memory 82 may optionally also be at least one storage device located remotely from the processor 81.
An embodiment of the present application further provides a storage medium, where the storage medium may store a plurality of instructions, and the instructions are suitable for being loaded by a processor and executing the method steps in the embodiments described in fig. 1 to fig. 2 and fig. 4 to fig. 6, and a specific execution process may refer to specific descriptions of the embodiments described in fig. 1 to fig. 2 and fig. 4 to fig. 6, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc.
The present invention is not limited to the above-described embodiments, and various modifications and variations of the present invention are intended to be included within the scope of the claims and the equivalent technology of the present invention if they do not depart from the spirit and scope of the present invention.

Claims (8)

1. An identity authentication method based on a face and a voiceprint, characterized by comprising the following steps:
acquiring face video data and audio data of a user;
inputting the face video data into a preset face recognition model to obtain a first face information feature, wherein the first face information feature comprises a first face feature and a first lip action feature;
inputting the audio data into a preset voiceprint recognition model to obtain a first voiceprint feature;
inputting the audio data into a preset speech recognition model to obtain recognized text information data, and inputting the text information data into a preset word embedding model to obtain a first authentication word feature, wherein the first authentication word feature is the word vector feature of the authentication word used for authentication in the audio data;
splicing the first face feature, the first voiceprint feature and the first authentication word feature to obtain a first identity feature, and storing the first identity feature and the first lip action feature in a database as the identity registration feature;
and responding to an identity verification instruction of the user, wherein the identity verification instruction comprises face video data and audio data captured when the user authenticates, acquiring an identity authentication feature, and verifying the identity of the user according to the identity authentication feature and the identity registration feature to obtain an identity authentication result.
2. The identity authentication method based on a face and a voiceprint according to claim 1, wherein the step of inputting the face video data into a preset face recognition model to obtain a first face information feature comprises the steps of:
acquiring face feature points, wherein the face feature points comprise eye feature points, lip feature points and nose tip feature points;
acquiring the first lip action feature according to the face feature points and a normalization algorithm, wherein the normalization algorithm is as follows:
d_i = sqrt((x_i - x_34)^2 + (y_i - y_34)^2) / sqrt((x_40 - x_43)^2 + (y_40 - y_43)^2)

where d_i is the normalized distance for lip feature point i; x_i is the abscissa of face feature point i; x_34 is the abscissa of the nose tip feature point; x_40 is the abscissa of the left inner canthus feature point; x_43 is the abscissa of the right inner canthus feature point; y_i is the ordinate of face feature point i; y_34 is the ordinate of the nose tip feature point; y_40 is the ordinate of the left inner canthus feature point; y_43 is the ordinate of the right inner canthus feature point.
3. The identity authentication method based on a face and a voiceprint according to claim 1, wherein the step of inputting the audio data into a preset voiceprint recognition model to obtain a first voiceprint feature comprises the steps of:
resampling the audio data to obtain single-channel audio data;
obtaining a spectrogram of the single-channel audio data, calculating the Mel spectrum features of the single-channel audio data, and inputting the Mel spectrum features into the voiceprint recognition model to obtain the first voiceprint feature.
4. The identity authentication method based on a face and a voiceprint according to claim 1, wherein the step of splicing the first face feature, the first voiceprint feature and the first authentication word feature to obtain the first identity feature comprises the steps of:
stretching the first face feature, the first voiceprint feature and the first authentication word feature into one-dimensional vectors;
splicing the first face feature, the first voiceprint feature and the first authentication word feature according to a matrix splicing algorithm to obtain the first identity feature, wherein the matrix splicing algorithm is as follows:
D_pos = con(D_1, D_2, D_3)

where D_pos is the first identity feature; D_1 is the first face feature; D_2 is the first voiceprint feature; D_3 is the first authentication word feature.
5. The identity authentication method based on a face and a voiceprint according to claim 1 or 4, wherein the step of responding to an identity verification instruction of the user, the identity verification instruction comprising face video data and audio data captured when the user authenticates, acquiring an identity authentication feature, and verifying the identity of the user according to the identity authentication feature and the identity registration feature to obtain an identity authentication result, comprises the steps of:
acquiring a second face feature, a second lip action feature, a second voiceprint feature and a second authentication word feature; splicing the second face feature, the second voiceprint feature and the second authentication word feature to obtain a second identity feature; and combining the second identity feature and the second lip action feature as the identity authentication feature;
obtaining a distance value according to the identity authentication feature, the identity registration feature and a matrix distance algorithm, and obtaining the identity authentication result according to the distance value and a preset distance threshold, wherein the matrix distance algorithm is as follows:
z = (dis_cos(Matrix_pos, Matrix_pre) + dis_cos(D_pos, D_pre)) / 2

where z is the distance value; dis_cos is the cosine distance; Matrix_pos is the second lip action feature; Matrix_pre is the first lip action feature; D_pos is the first identity feature; D_pre is the second identity feature.
6. An identity authentication device based on a face and a voiceprint, comprising:
the acquisition module is used for acquiring face video data and audio data of a user;
the face information feature acquisition module is used for inputting the face video data into a preset face recognition model to obtain a first face information feature, wherein the first face information feature comprises a first face feature and a first lip action feature;
the voiceprint feature acquisition module is used for inputting the audio data into a preset voiceprint recognition model to obtain a first voiceprint feature;
the authentication word feature acquisition module is used for inputting the audio data into a preset speech recognition model to obtain recognized text information data, and inputting the text information data into a preset word embedding model to obtain a first authentication word feature, wherein the first authentication word feature is the word vector feature of the authentication word used for authentication in the audio data;
the identity registration module is used for splicing the first face feature, the first voiceprint feature and the first authentication word feature to obtain a first identity feature, and storing the first identity feature and the first lip action feature in a database as the identity registration feature;
the identity verification module is used for responding to an identity verification instruction of the user, wherein the identity verification instruction comprises face video data and audio data captured when the user authenticates, acquiring an identity authentication feature, and verifying the identity of the user according to the identity authentication feature and the identity registration feature to obtain an identity authentication result.
7. A computer device, comprising: a processor, a memory, and a computer program stored on the memory and executable on the processor; the computer program, when executed by the processor, implements the steps of the identity authentication method based on a face and a voiceprint according to any one of claims 1 to 5.
8. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, performs the steps of the identity authentication method based on a face and a voiceprint according to any one of claims 1 to 5.
CN202111524358.0A 2021-12-14 2021-12-14 Identity authentication method, device and equipment based on face and voiceprint Pending CN114242075A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111524358.0A CN114242075A (en) 2021-12-14 2021-12-14 Identity authentication method, device and equipment based on face and voiceprint

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111524358.0A CN114242075A (en) 2021-12-14 2021-12-14 Identity authentication method, device and equipment based on face and voiceprint

Publications (1)

Publication Number Publication Date
CN114242075A true CN114242075A (en) 2022-03-25

Family

ID=80755617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111524358.0A Pending CN114242075A (en) 2021-12-14 2021-12-14 Identity authentication method, device and equipment based on face and voiceprint

Country Status (1)

Country Link
CN (1) CN114242075A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115225326A (en) * 2022-06-17 2022-10-21 中国电信股份有限公司 Login verification method and device, electronic equipment and storage medium
CN115225326B (en) * 2022-06-17 2024-06-07 中国电信股份有限公司 Login verification method and device, electronic equipment and storage medium

Similar Documents

Publication Title
US10789343B2 (en) Identity authentication method and apparatus
EP3477519B1 (en) Identity authentication method, terminal device, and computer-readable storage medium
US10275672B2 (en) Method and apparatus for authenticating liveness face, and computer program product thereof
CN103475490B (en) A kind of auth method and device
US20190188903A1 (en) Method and apparatus for providing virtual companion to a user
US8983207B1 (en) Mitigating replay attacks using multiple-image authentication
KR20190022432A (en) ELECTRONIC DEVICE, IDENTIFICATION METHOD, SYSTEM, AND COMPUTER READABLE STORAGE MEDIUM
JP7412496B2 (en) Living body (liveness) detection verification method, living body detection verification system, recording medium, and training method for living body detection verification system
CN114187547A (en) Target video output method and device, storage medium and electronic device
Thabet et al. Enhanced smart doorbell system based on face recognition
CN112017633B (en) Speech recognition method, device, storage medium and electronic equipment
CN114242075A (en) Identity authentication method, device and equipment based on face and voiceprint
JP6225612B2 (en) Program, information processing apparatus, and method
CN112992155B (en) Far-field voice speaker recognition method and device based on residual error neural network
CN103984415B (en) A kind of information processing method and electronic equipment
CN110363187B (en) Face recognition method, face recognition device, machine readable medium and equipment
CN116883900A (en) Video authenticity identification method and system based on multidimensional biological characteristics
CN109858355B (en) Image processing method and related product
CN111507139A (en) Image effect generation method and device and electronic equipment
CN115454287A (en) Virtual digital human interaction method, device, equipment and readable storage medium
CN109815359B (en) Image retrieval method and related product
CN108182392A (en) The identifying system and method for a kind of body language
CN109102810B (en) Voiceprint recognition method and device
CN111079472A (en) Image comparison method and device
CN220983921U (en) Recognition device based on face and voiceprint

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination