WO2019174131A1 - Identity authentication method, server, and computer-readable storage medium - Google Patents

Identity authentication method, server, and computer-readable storage medium

Info

Publication number
WO2019174131A1
Authority
WO
WIPO (PCT)
Prior art keywords
lip
image
target user
universal
dynamic
Prior art date
Application number
PCT/CN2018/089204
Other languages
English (en)
French (fr)
Inventor
王义文
王健宗
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019174131A1

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F 21/31 User authentication
    • G06F 21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F 18/24133 Distances to prototypes
    • G06F 18/24143 Distances to neighbourhood prototypes, e.g. restricted Coulomb energy networks [RCEN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V 10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G06V 40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 Classification, e.g. identification

Definitions

  • the present application relates to the field of information security, and in particular, to an identity authentication method, a server, and a computer readable storage medium.
  • the present application proposes an identity authentication method and a server to solve the problem of how to quickly and accurately verify a user's identity.
  • an identity authentication method is applied to a server, the method comprising the steps of: extracting a dynamic lip image of the target user from video data;
  • inputting the dynamic lip image into a pre-trained universal lip recognition model, the universal lip recognition model recognizing the lip-shape semantics corresponding to the dynamic lip image and mapping the dynamic lip image against the user database in the universal lip recognition model to obtain the target user identity corresponding to the dynamic lip image;
  • comparing the lip-shape semantics with the preset reference lip-shape semantics to perform liveness authentication on the target user; if the two are similar or identical in meaning, liveness authentication of the target user succeeds, and if the target user has the target user identity, identity verification of the target user succeeds;
  • the preset reference lip-shape semantics is the reading information given by the system.
  • the present application further provides a server, including a memory, a processor, and an identity authentication system stored on the memory and operable on the processor; the identity authentication system implements the steps of the identity authentication method described above when executed by the processor.
  • the present application further provides a computer readable storage medium storing an identity authentication system, the identity authentication system being executable by at least one processor to cause the at least one processor to perform the steps of the identity authentication method described above.
  • the identity authentication method, server, and computer readable storage medium proposed by the present application first extract the target user's dynamic lip image after receiving the video data read by the target user to be authenticated according to the system prompt; next, the dynamic lip image is input into a pre-trained universal lip recognition model, which recognizes the lip-shape semantics corresponding to the dynamic lip image and maps the dynamic lip image against the user database in the model to obtain the corresponding target user identity; finally, the lip-shape semantics are compared with the reference lip-shape semantics to perform liveness authentication on the user while the target user's identity is verified.
  • the identity authentication method, server, and computer readable storage medium proposed by this application can verify a user's identity quickly, improving the speed and accuracy of identity verification while greatly reducing cost and improving work efficiency.
  • FIG. 1 is a schematic diagram of an optional hardware architecture of the server of the present application.
  • FIG. 2 is a schematic diagram of the program modules of the first embodiment of the identity authentication system of the present application.
  • FIG. 3 is a schematic flowchart of the first embodiment of the identity authentication method of the present application.
  • FIG. 4 is a schematic flowchart of the second embodiment of the identity authentication method of the present application.
  • FIG. 5 is a schematic flowchart of the third embodiment of the identity authentication method of the present application.
  • Referring to FIG. 1, a schematic diagram of an optional hardware architecture of the server 1 of the present application is shown.
  • the server 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 that can communicate with each other through a system bus. Note that FIG. 1 only shows the server 1 with components 11-13, but it should be understood that not all illustrated components are required; more or fewer components may be implemented instead.
  • the server 1 may be a computing device such as a rack server, a blade server, a tower server, or a cabinet server.
  • the server 1 may be an independent server or a server cluster composed of multiple servers.
  • the memory 11 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like.
  • the memory 11 may be an internal storage unit of the server 1, such as a hard disk or memory of the server 1.
  • the memory 11 may also be an external storage device of the server 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the server 1.
  • the memory 11 can also include both the internal storage unit of the server 1 and its external storage device.
  • the memory 11 is generally used to store an operating system installed in the server 1 and various types of application software, such as program codes of the identity authentication system 2. Further, the memory 11 can also be used to temporarily store various types of data that have been output or are to be output.
  • the processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments.
  • the processor 12 is typically used to control the overall operation of the server 1.
  • the processor 12 is configured to run program code or process data stored in the memory 11, such as running the identity authentication system 2 and the like.
  • the network interface 13 may comprise a wireless network interface or a wired network interface, which is typically used to establish a communication connection between the server 1 and other electronic devices.
  • the present application proposes an identity authentication system 2.
  • Referring to FIG. 2, a program module diagram of the first embodiment of the identity authentication system 2 of the present application is shown.
  • the identity authentication system 2 includes a series of computer program instructions stored in the memory 11, and when the computer program instructions are executed by the processor 12, the identity authentication operation of the embodiments of the present application can be implemented.
  • the identity authentication system 2 can be divided into one or more modules based on the particular operations implemented by the various portions of the computer program instructions. For example, in FIG. 2, the identity authentication system 2 is divided into a receiving and pre-processing module 20, an extraction module 21, a recognition module 22, a mapping module 23, a comparison module 24, and a confirmation module 25, wherein:
  • the receiving and pre-processing module 20 is configured to receive the video data recorded by the target user to be authenticated according to the system prompt, and to pre-process the video data.
  • the information that the system prompts the user to read may be a number, a string, a sentence, or a combination thereof, such as "abc", "13579abc", or "Hello 9527".
  • the means for receiving the video data may be a webcam.
  • the extraction module 21 is configured to extract a dynamic lip image of the target user in the video data.
  • the step of acquiring the dynamic lip image of the target user includes: acquiring a video image of the person being verified; performing face detection on the video image; performing lip positioning on the detected face; and performing feature extraction on the lips to obtain effective lip-shape features for recognizing the lip-shape semantics.
  • the feature extraction includes the steps of lip contour extraction, lip contour tracking, and lip contour feature extraction.
  • the recognition module 22 is configured to input the dynamic lip image into a pre-trained universal lip recognition model, and the universal lip recognition model recognizes the lip-shape semantics corresponding to the dynamic lip image.
  • the universal lip recognition model is built on a deep neural network trained with the processed audio-video data. The structure of the network includes 9 layers: three STCNN layers are followed by a Flatten layer, which flattens the data produced by the three convolutional layers so that the multi-dimensional data becomes one-dimensional; the Flatten layer is followed by two Bi-GRU layers that train the deep neural network; the second Bi-GRU layer is followed by a fully connected (Dense) layer that connects every unit of the previous layer to the next; the Dense layer is followed by a softmax normalization layer; and the softmax layer is followed by a CTC layer. CTC is connectionist temporal classification; it allows probability computation over multiple sequences, which are all possible character-level transcriptions of the speech sample.
  • STCNN is spatio-temporal convolution. A convolutional neural network performs stacked convolution operations over image space, which helps improve the performance of computer vision tasks; spatio-temporal convolution additionally convolves in the time domain so that video data can be processed. A bias-free convolution from C channels to C' channels is computed as:
$$[\mathrm{stcnn}(x; w)]_{c'tij} = \sum_{c=1}^{C} \sum_{t'=1}^{k_t} \sum_{i'=1}^{k_w} \sum_{j'=1}^{k_h} w_{c'ct'i'j'}\, x_{c,\,t+t',\,i+i',\,j+j'}$$
where x is the input and w is the convolution kernel weight.
  • GRU stands for gated recurrent unit. A GRU is a type of recurrent neural network (RNN); T is determined by the GRU's input z, and in this model z is the output of the STCNN. The CTC loss function is widely used in speech recognition; it eliminates the step of aligning the input with the target output. Given a model that outputs a sequence of discrete distributions over the vocabulary augmented with a special "blank" token, CTC computes the probability of a possible sequence by marginalizing over all sequences defined as equivalent to that sequence, avoiding a one-to-one alignment of input and output.
  • the mapping module 23 is configured to map the dynamic lip image against the user database in the universal lip recognition model to obtain the target user identity corresponding to the dynamic lip image.
  • the universal lip recognition model further includes a user database containing the user identities corresponding to the input audio-video data; that is, after training through the deep neural network, the universal lip recognition model maps each user identity one-to-one to the lip image data in the audio-video data.
  • different users have different lip-shape features: the recognition system deep-mines the user's lip image data through the deep neural network model and builds deep features of the user's speaking lip shapes and the context of their speech, so that the identities of user A, user B, and user C correspond to their respective lip images.
  • the comparison module 24 compares the lip-shape semantics with the reference lip-shape semantics to perform liveness authentication on the user.
  • the reference lip-shape semantics may include semantics obtained by speech recognition: the universal lip recognition model may include a speech recognition model that acquires the target user's speech data and derives the corresponding semantics by analysis. The reference semantics may also be semantics provided by the system: during identity verification the system presents characters, strings, or sentences for the target user to read aloud, so at that point the system knows the correct semantics.
  • the confirmation module 25 is configured to confirm the liveness authentication and identity authentication of the target user.
  • if the recognized lip-shape semantics are similar or identical to the reference semantics, liveness authentication succeeds; if, at the same time, the target user is successfully mapped to a user in the user database, verification of the target user's identity is completed and succeeds.
  • to further improve reliability, the lip recognition verification is performed at least three times, and successful identity authentication is indicated only after the repeated checks succeed: after providing the first character combination, the system randomly provides a second character combination, which the person being verified reads aloud and which is then verified; if the second lip recognition verification passes, a third verification is performed with a third randomly provided character combination; after three successful verifications, the system indicates that verification has passed.
  • the present application also proposes an identity authentication method.
  • Referring to FIG. 3, a schematic flowchart of the first embodiment of the identity authentication method of the present application is shown.
  • the order of execution of the steps in the flowchart shown in FIG. 3 may be changed according to different requirements, and some steps may be omitted.
  • Step S110: extract the dynamic lip image of the target user from the video data.
  • Step S120: input the dynamic lip image into a pre-trained universal lip recognition model; the universal lip recognition model recognizes the lip-shape semantics corresponding to the dynamic lip image and maps the dynamic lip image against the user database in the model to obtain the target user identity corresponding to the dynamic lip image.
  • the universal lip recognition model is built on a deep neural network whose structure includes 9 layers: three STCNN layers followed by a Flatten layer that flattens the convolved data so that the multi-dimensional data becomes one-dimensional; two Bi-GRU layers connected after the Flatten layer to train the deep neural network; a fully connected (Dense) layer after the second Bi-GRU that connects every unit of the previous layer to the next; a softmax normalization layer after the Dense layer; and a CTC (connectionist temporal classification) layer after the softmax layer, which allows probability computation over multiple sequences, namely all possible character-level transcriptions of the speech sample.
  • Step S130: compare the lip-shape semantics with the reference lip-shape semantics to perform liveness authentication on the user. If the recognized semantics are similar or identical to the reference semantics, liveness authentication succeeds and, at the same time, verification of the target user's identity is completed successfully.
  • to increase the reliability of identity verification, recognition methods such as face recognition and speech recognition may be added; for example, after lip recognition verification passes, face recognition and speech recognition are added to enhance recognition accuracy.
  • the face recognition method may be the eigenface (PCA) method: the eigenface method is a face recognition method based on the KL transform, which is an optimal orthogonal transform for image compression. Applying the KL transform to the high-dimensional image space yields a new set of orthogonal bases; the important orthogonal bases are retained and span a low-dimensional linear space. If the projections of faces onto these low-dimensional linear spaces are assumed to be separable, the projections can be used as feature vectors for recognition; this is the basic idea of the eigenface method. These methods require relatively many training samples and are based entirely on the statistical properties of image gray levels.
  • alternatively, face recognition methods based on geometric features, on neural networks, on elastic graph matching, on the line-segment Hausdorff distance (LHD), and the like may be used; these are not repeated here.
  • Referring to FIG. 4, a schematic flowchart of the second embodiment of the identity authentication method of the present application is shown.
  • the execution order of the steps in the flowchart shown in FIG. 4 may be changed according to different requirements, and some steps may be omitted.
  • the method includes the following steps:
  • Step S210: receive the video data read by the target user to be authenticated according to the system prompt, and pre-process the video data.
  • the pre-processing step includes enhancing the face image by methods such as level adjustment, contrast, color balance, sharpening, noise reduction, deblurring, super-resolution, and histogram equalization.
  • Step S220: extract the dynamic lip image of the target user from the video data.
  • the step of acquiring the dynamic lip image includes: acquiring a real-time image of the person being verified (via webcam), video image pre-processing, face detection, lip positioning, and feature extraction (lip contour extraction, lip contour tracking, and lip contour feature extraction).
  • Step S230: input the dynamic lip image into the pre-trained universal lip recognition model; the model recognizes the lip-shape semantics corresponding to the dynamic lip image and maps the dynamic lip image against the user database in the model to obtain the corresponding target user identity.
  • Step S240: compare the lip-shape semantics with the reference lip-shape semantics to perform liveness authentication on the user; if the recognized semantics are similar or identical to the reference semantics, liveness authentication succeeds and verification of the target user's identity is completed successfully.
  • Steps S220-S240 of the identity authentication method of FIG. 4 are similar to steps S110-S130 of the first embodiment; the difference is that the method further includes step S210.
  • in the third embodiment, the universal lip recognition model of the identity authentication method is built on a deep neural network, and the method for constructing the universal lip recognition model includes the following steps:
  • Step S310: acquire the audio-video samples in the server and process them; the processing includes face detection, lip positioning, data labeling, and video framing.
  • processing the audio-video samples makes them better conform to the requirements for training the universal lip recognition model.
  • face detection means judging whether a face image exists in a dynamic scene or against a complicated background, and separating that face image out. Common methods include: the reference template method, which first designs one or several standard face templates, then computes the degree of match between the collected test sample and the standard templates and judges via a threshold whether a face exists; the face rule method, which, since the face has certain structural distribution features, extracts these features to generate corresponding rules for judging whether the test sample contains a face; the sample learning method, which adopts the artificial-neural-network approach from pattern recognition, generating a classifier by learning from a face image sample set and a non-face image sample set; the skin color model method, which detects faces based on the tendency of facial skin color to cluster in color space; and the eigenface method, which treats the set of all face images as a face image subspace and judges whether a face image exists based on the distance between the detected sample and its projection onto that subspace.
  • Step S320: construct the deep neural network.
  • the structure of the network includes 9 layers: three STCNN layers are followed by a Flatten layer; the data produced by the three convolutional layers is flattened so that the multi-dimensional data becomes one-dimensional; the Flatten layer is followed by two Bi-GRU layers that train the deep neural network; the second Bi-GRU layer is followed by a fully connected (Dense) layer connecting every unit of the previous layer to the next; the Dense layer is followed by a softmax normalization layer; and the softmax layer is followed by a CTC layer, where CTC is connectionist temporal classification and allows probability computation over multiple sequences, which are all possible character-level transcriptions of the speech sample.
  • Step S330: train the deep neural network with the processed audio-video data to obtain the universal lip recognition model, which can recognize the lip-shape semantics corresponding to an input lip image.
  • different users have different lip-shape features: the recognition system deep-mines the user's lip image data through the deep neural network model and builds deep features of the user's speaking lip shapes and the context of their speech, so that the identities of user A, user B, and user C correspond to their respective lip images.
  • the model takes single characters as the basic unit and uses context as a bridge to achieve sentence-level recognition.
  • the identity authentication method, server, and computer readable storage medium proposed by the present application first extract the target user's dynamic lip image after receiving the video data read by the target user to be authenticated according to the system prompt; next, the dynamic lip image is input into a pre-trained universal lip recognition model, which recognizes the lip-shape semantics corresponding to the dynamic lip image and maps the dynamic lip image against the user database in the model to obtain the corresponding target user identity; finally, the lip-shape semantics are compared with the reference semantics to perform liveness authentication on the user while the target user's identity is verified.
  • the identity authentication method, server, and computer readable storage medium proposed by this application can verify a user's identity quickly, improving the speed and accuracy of identity verification while greatly reducing cost and improving work efficiency.
  • the methods of the foregoing embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, but in many cases the former is the better implementation. Based on this understanding, the part of the technical solution of the present application that is essential or that contributes over the prior art can be embodied as a software product stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device) to perform the methods described in the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Collating Specific Patterns (AREA)
  • Image Analysis (AREA)

Abstract

An identity authentication method, comprising: extracting a dynamic lip image of a target user from video data; a universal lip recognition model recognizing the lip-shape semantics corresponding to the dynamic lip image, and mapping the dynamic lip image against a user database to obtain the target user identity corresponding to the dynamic lip image; and comparing the lip-shape semantics with reference lip-shape semantics to perform liveness authentication on the user, thereby completing identity authentication. The identity authentication method, server, and computer-readable storage medium can verify a user's identity quickly, improving the speed and accuracy of identity verification while greatly reducing cost and improving work efficiency.

Description

Identity authentication method, server, and computer-readable storage medium
Claim of priority
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on March 12, 2018, with application number 201810198704.2 and entitled "Identity authentication method, server and computer-readable storage medium", the entire content of which is incorporated into this application by reference.
Technical field
This application relates to the field of information security, and in particular to an identity authentication method, a server, and a computer-readable storage medium.
Background
At present, many companies have a wide business scope covering many areas, and every business domain requires communication with customers. Much of this business consists of large amounts of repetitive dialogue, such as business consultation and collecting customer information for handling transactions. To meet business needs, many companies currently handle such business mainly through manual work and computer-guided flows. However, when a company's customer base is large, handling business manually is time-consuming and labor-intensive and increases business costs, while computer guidance is inflexible and can only target specific business processes; moreover, during computer handling it is difficult to learn the customer's true identity quickly and accurately, so the customer's identity cannot be verified quickly and effectively.
Therefore, how to quickly recognize and verify customer identity has become a major problem in urgent need of a solution.
Summary
In view of this, the present application proposes an identity authentication method and a server to solve the problem of how to quickly and accurately verify a user's identity.
First, to achieve the above object, the present application proposes an identity authentication method, the method comprising the steps of:
An identity authentication method, applied to a server, wherein the method comprises the steps of:
extracting the dynamic lip image of the target user;
inputting the dynamic lip image into a pre-trained universal lip recognition model, the universal lip recognition model recognizing the lip-shape semantics corresponding to the dynamic lip image, and the universal lip recognition model mapping the dynamic lip image against the user database in the universal lip recognition model to obtain the target user identity corresponding to the dynamic lip image; and
comparing the lip-shape semantics with preset reference lip-shape semantics to perform liveness authentication on the target user; if the lip-shape semantics are similar or identical in meaning to the reference lip-shape semantics, liveness authentication of the target user succeeds, and if the target user has the target user identity, identity verification of the target user succeeds;
wherein the preset reference lip-shape semantics is the reading information given by the system.
In addition, to achieve the above object, the present application further provides a server, including a memory, a processor, and an identity authentication system stored on the memory and operable on the processor; the identity authentication system implements the steps of the identity authentication method described above when executed by the processor.
Further, to achieve the above object, the present application also provides a computer-readable storage medium storing an identity authentication system, the identity authentication system being executable by at least one processor to cause the at least one processor to perform the steps of the identity authentication method described above.
Compared with the prior art, with the identity authentication method, server, and computer-readable storage medium proposed by the present application, the target user's dynamic lip image is first extracted after receiving the video data read by the target user to be authenticated according to the system prompt; next, the dynamic lip image is input into a pre-trained universal lip recognition model, which recognizes the lip-shape semantics corresponding to the dynamic lip image and maps the dynamic lip image against the user database in the model to obtain the corresponding target user identity; finally, the lip-shape semantics are compared with the reference lip-shape semantics to perform liveness authentication on the user while verification of the target user's identity is completed. The identity authentication method, server, and computer-readable storage medium proposed by the present application can verify a user's identity quickly, improving the speed and accuracy of identity verification while greatly reducing cost and improving work efficiency.
Brief description of the drawings
FIG. 1 is a schematic diagram of an optional hardware architecture of the server of the present application;
FIG. 2 is a schematic diagram of the program modules of the first embodiment of the identity authentication system of the present application;
FIG. 3 is a schematic flowchart of the first embodiment of the identity authentication method of the present application;
FIG. 4 is a schematic flowchart of the second embodiment of the identity authentication method of the present application;
FIG. 5 is a schematic flowchart of the third embodiment of the identity authentication method of the present application.
The realization of the objects, functional features, and advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed description
To make the objects, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present application and are not intended to limit it. Based on the embodiments in the present application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of the present application.
It should be noted that descriptions involving "first", "second", and the like in the present application are for descriptive purposes only and cannot be understood as indicating or implying their relative importance or implicitly indicating the number of technical features indicated. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments can be combined with one another, provided that the combination can be realized by a person of ordinary skill in the art; when a combination of technical solutions is contradictory or cannot be realized, it should be considered that the combination does not exist and is not within the scope of protection claimed by the present application.
Referring to FIG. 1, a schematic diagram of an optional hardware architecture of the server 1 of the present application is shown.
In this embodiment, the server 1 may include, but is not limited to, a memory 11, a processor 12, and a network interface 13 that can communicate with each other through a system bus. Note that FIG. 1 only shows the server 1 with components 11-13, but it should be understood that not all illustrated components are required; more or fewer components may be implemented instead.
The server 1 may be a computing device such as a rack server, a blade server, a tower server, or a cabinet server, and may be an independent server or a server cluster composed of multiple servers.
The memory 11 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 11 may be an internal storage unit of the server 1, such as its hard disk or internal memory. In other embodiments, the memory 11 may also be an external storage device of the server 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the server 1. Of course, the memory 11 may include both the internal storage unit of the server 1 and its external storage device. In this embodiment, the memory 11 is generally used to store the operating system and various application software installed on the server 1, such as the program code of the identity authentication system 2. In addition, the memory 11 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 12 may, in some embodiments, be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip. The processor 12 is generally used to control the overall operation of the server 1. In this embodiment, the processor 12 is used to run the program code stored in the memory 11 or to process data, for example to run the identity authentication system 2.
The network interface 13 may include a wireless network interface or a wired network interface; it is generally used to establish a communication connection between the server 1 and other electronic devices.
So far, the hardware structure and functions of the devices relevant to the present application have been introduced in detail. The various embodiments of the present application are presented below on the basis of the above introduction.
First, the present application proposes an identity authentication system 2.
Referring to FIG. 2, a program module diagram of the first embodiment of the identity authentication system 2 of the present application is shown.
In this embodiment, the identity authentication system 2 includes a series of computer program instructions stored in the memory 11; when these computer program instructions are executed by the processor 12, the identity authentication operations of the various embodiments of the present application can be realized. In some embodiments, the identity authentication system 2 may be divided into one or more modules based on the specific operations implemented by the parts of the computer program instructions. For example, in FIG. 2, the identity authentication system 2 may be divided into a receiving and pre-processing module 20, an extraction module 21, a recognition module 22, a mapping module 23, a comparison module 24, and a confirmation module 25, wherein:
The receiving and pre-processing module 20 is configured to receive the video data recorded by the target user to be authenticated according to the system prompt, and to pre-process the video data.
Specifically, the information the system prompts the user to read may be a number, a character string, a sentence, or a combination of these, for example "abc", "13579abc", or "Hello 9527".
Specifically, the device used to receive the video data may be a network camera.
The extraction module 21 is configured to extract the dynamic lip image of the target user from the video data.
Specifically, the step of acquiring the target user's dynamic lip image includes: acquiring a video image of the person being verified; performing face detection on the video image; performing lip positioning on the detected face; and performing feature extraction on the lips, so as to obtain effective lip-shape features for recognizing the lip-shape semantics.
Specifically, the feature extraction includes the steps of lip contour extraction, lip contour tracking, and lip contour feature extraction.
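As an illustration of the face-detection and lip-positioning steps, a minimal sketch follows, assuming dlib's 68-point facial landmark model (points 48-67 outline the mouth); the patent does not prescribe a specific detector or landmarking library, so this is one possible realization, not the author's implementation.

```python
# Minimal sketch of face detection + lip localisation for one video frame,
# assuming dlib's 68-point landmark model (points 48-67 outline the mouth).
# Illustrative only; the patent does not name a specific library.
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_lip_roi(frame_bgr, pad=10):
    """Return the cropped lip region of the first detected face, or None."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(48, 68)],
                   dtype=np.int32)
    x, y, w, h = cv2.boundingRect(pts)
    return frame_bgr[max(y - pad, 0):y + h + pad, max(x - pad, 0):x + w + pad]
```

Running this per frame and stacking the crops yields the dynamic lip image sequence used below.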
The recognition module 22 is configured to input the dynamic lip image into the pre-trained universal lip recognition model, and the universal lip recognition model recognizes the lip-shape semantics corresponding to the dynamic lip image.
Specifically, the universal lip recognition model is built on a deep neural network, and the deep neural network is trained with the processed audio-video data. The structure of the network includes 9 layers: three STCNN layers are followed by a Flatten layer; the data produced by the three convolutional layers is flattened by the Flatten layer so that the multi-dimensional data becomes one-dimensional; the Flatten layer is followed by two Bi-GRU layers that train the deep neural network; the second Bi-GRU layer is followed by a fully connected (Dense) layer that connects every unit of the previous layer to the next; the Dense layer is followed by a softmax normalization layer; and the softmax layer is followed by a CTC layer. CTC is connectionist temporal classification; it allows probability computation over multiple sequences, which are all possible character-level transcriptions of the speech sample.
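For orientation, a sketch of the described stack in Keras follows. The kernel sizes, channel counts, 75-frame 50x100 input, 28-symbol vocabulary, and the added spatial pooling (to keep the per-frame feature size manageable) are illustrative assumptions, not values fixed by the patent; the Flatten step is applied per time step (TimeDistributed) so the Bi-GRUs can run over the time axis, and the CTC loss is attached at training time rather than as a literal layer.

```python
# Hedged sketch of the 9-layer network: 3x STCNN -> Flatten -> 2x Bi-GRU
# -> Dense -> softmax, trained with a CTC loss. All sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

T, H, W, VOCAB = 75, 50, 100, 28      # frames, height, width, chars + blank

model = models.Sequential([
    layers.Input(shape=(T, H, W, 3)),
    layers.Conv3D(32, (3, 5, 5), padding="same", activation="relu"),
    layers.MaxPool3D((1, 2, 2)),
    layers.Conv3D(64, (3, 5, 5), padding="same", activation="relu"),
    layers.MaxPool3D((1, 2, 2)),
    layers.Conv3D(96, (3, 3, 3), padding="same", activation="relu"),
    layers.MaxPool3D((1, 2, 2)),
    layers.TimeDistributed(layers.Flatten()),   # one feature vector per frame
    layers.Bidirectional(layers.GRU(256, return_sequences=True)),
    layers.Bidirectional(layers.GRU(256, return_sequences=True)),
    layers.Dense(VOCAB, activation="softmax"),  # per-frame character distribution
])

# CTC marginalises over all alignments of the label to the T frames:
# loss = tf.keras.backend.ctc_batch_cost(labels, y_pred, input_len, label_len)
```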
Specifically, STCNN is spatio-temporal convolution. A convolutional neural network performs stacked convolution operations over image space, which helps improve the performance of computer vision tasks. Spatio-temporal convolution can additionally convolve in the time domain so that video data can be processed. A bias-free convolution from C channels to C' channels is computed as:
$$[\mathrm{stcnn}(x; w)]_{c'tij} = \sum_{c=1}^{C} \sum_{t'=1}^{k_t} \sum_{i'=1}^{k_w} \sum_{j'=1}^{k_h} w_{c'ct'i'j'}\, x_{c,\,t+t',\,i+i',\,j+j'}$$
where x is the input, w is the convolution kernel weight, and k_t, k_w, k_h are the temporal, width, and height extents of the kernel.
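For illustration, the bias-free C-to-C' spatio-temporal convolution above can be instantiated directly with an off-the-shelf 3-D convolution; the channel counts and kernel extents below are placeholders, not values from the patent.

```python
# Bias-free spatio-temporal convolution from C = 3 to C' = 32 channels with
# kernel extents (k_t, k_h, k_w) = (3, 5, 5); values are illustrative.
import tensorflow as tf

conv = tf.keras.layers.Conv3D(32, (3, 5, 5), use_bias=False)  # valid padding
x = tf.random.normal((1, 75, 50, 100, 3))   # (batch, T, H, W, C)
y = conv(x)
print(y.shape)                              # (1, 73, 46, 96, 32)
```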
GRU stands for gated recurrent unit. A GRU is a recurrent neural network (RNN) that improves on earlier RNNs by adding cells and control gates and learning to control the information flow; gated recurrent units give the RNN a more persistent memory and therefore support longer sequences. We use a bidirectional GRU, i.e. Bi-GRU; training the deep neural network with two Bi-GRU layers lets it learn more and helps considerably in improving prediction accuracy. The standard GRU equations are:
$$[u_t, r_t]^{\top} = \mathrm{sigm}(W_z z_t + W_h h_{t-1} + b_g)$$
$$\tilde{h}_t = \tanh(\tilde{W}_z z_t + r_t \odot (\tilde{W}_h h_{t-1}) + \tilde{b}_h)$$
$$h_t = (1 - u_t) \odot h_{t-1} + u_t \odot \tilde{h}_t$$
where z := {z_1, ..., z_T} is the input sequence of the RNN and ⊙ denotes element-wise multiplication. We use a bidirectional GRU (Bi-GRU), whose RNN sequences are:
$$h_t^{\rightarrow} = \mathrm{GRU}(z_t, h_{t-1}^{\rightarrow}), \qquad h_t^{\leftarrow} = \mathrm{GRU}(z_{T-t+1}, h_{t-1}^{\leftarrow}), \qquad h_t = [h_t^{\rightarrow};\, h_t^{\leftarrow}]$$
For every time step t, let p(u_t | z) = softmax(mlp(h_t; W_mlp)), where mlp is a feed-forward neural network with weights W_mlp. We can then define the distribution over time sequences:
$$p(u_1, \ldots, u_T \mid z) = \prod_{1 \le t \le T} p(u_t \mid z)$$
where T is determined by the GRU's input z. In this model, z is the output of the STCNN.
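A minimal sketch of the two stacked Bi-GRU layers over the per-frame STCNN features follows; the feature and hidden sizes are illustrative assumptions. The bidirectional wrapper concatenates the forward and backward states, matching h_t = [h_t^→; h_t^←] above.

```python
# Two stacked bidirectional GRUs over T per-frame feature vectors z_t.
# Feature size 512 and hidden size 256 are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers

bigru = tf.keras.Sequential([
    layers.Input(shape=(75, 512)),                       # (T, feature dim)
    layers.Bidirectional(layers.GRU(256, return_sequences=True)),
    layers.Bidirectional(layers.GRU(256, return_sequences=True)),
])
h = bigru(tf.random.normal((1, 75, 512)))
print(h.shape)   # (1, 75, 512): forward and backward halves concatenated
```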
The CTC loss function is widely used in speech recognition; it eliminates the step of aligning the input with the target output. Given a model that outputs a sequence of discrete distributions over the vocabulary augmented with a special "blank" token, CTC computes the probability of a possible sequence by marginalizing over all sequences defined as identical to that sequence, avoiding a one-to-one alignment of input and output.
Let the vocabulary for a single time step, augmented with the blank, be
$$\hat{V} := V \cup \{\circ\}$$
where ○ is the "blank" token. Define the function
$$B : \hat{V}^{*} \to V^{*}$$
which, given a string in $\hat{V}^{*}$, deletes adjacent duplicate characters and removes blanks. For a label sequence y ∈ V*, CTC defines:
$$p(y \mid z) = \sum_{u \in B^{-1}(y),\ |u| = T}\ \prod_{t} p(u_t \mid z)$$
where T is the time-step length of the model sequence. For example, with T = 3, CTC defines the probability of the string "am" as:
$$p(aam) + p(amm) + p(\circ am) + p(a \circ m) + p(am \circ)$$
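The T = 3 example can be checked mechanically: enumerating all length-3 strings over {a, m, ○} and collapsing them shows that exactly the five listed paths map to "am". A small self-contained check, where collapse implements B and "o" stands in for the blank:

```python
# Enumerate the blank-augmented paths of length T = 3 that CTC collapses
# to the string "am"; "o" stands in for the blank token.
from itertools import product

def collapse(path):
    # B: merge adjacent duplicate characters, then delete blanks
    merged = [path[0]]
    for ch in path[1:]:
        if ch != merged[-1]:
            merged.append(ch)
    return "".join(ch for ch in merged if ch != "o")

paths = ["".join(p) for p in product("amo", repeat=3)
         if collapse("".join(p)) == "am"]
print(sorted(paths))   # ['aam', 'amm', 'amo', 'aom', 'oam'] -> the five terms
```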
The mapping module 23 is configured to map the dynamic lip image against the user database in the universal lip recognition model to obtain the target user identity corresponding to the dynamic lip image.
Specifically, the universal lip recognition model also includes a user database containing the user identities corresponding to the input audio-video data; that is, after training through the deep neural network, the universal lip recognition model maps each user identity one-to-one to the lip image data in the audio-video data.
Specifically, different users have different lip-shape features. Through the deep neural network model, the recognition system deep-mines the user's lip image data and builds deep features of the user's speaking lip shapes and the context of their speech, so that the identities of user A, user B, and user C correspond one-to-one to their respective lip images.
The comparison module 24 compares the lip-shape semantics with the reference lip-shape semantics to perform liveness authentication on the user.
Specifically, the reference lip-shape semantics may include semantics from speech recognition: for example, the universal lip recognition model may include a speech recognition model that acquires the target user's speech data and obtains the corresponding semantics after analysis. The reference semantics may also be semantics provided by the system: for example, during identity verification the system presents characters, strings, or sentences for the target user to read aloud, so at that point the system knows the correct semantics.
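A minimal sketch of this comparison step follows, assuming a normalized string-similarity measure with an arbitrary 0.8 threshold; the patent only requires the two semantics to be similar or identical, so the measure and threshold are illustrative assumptions.

```python
# Liveness comparison: accept when the recognised semantics are identical
# or sufficiently similar to the reference prompt. The similarity measure
# and the 0.8 threshold are illustrative assumptions.
from difflib import SequenceMatcher

def liveness_check(recognised: str, reference: str,
                   threshold: float = 0.8) -> bool:
    a, b = recognised.strip().lower(), reference.strip().lower()
    return a == b or SequenceMatcher(None, a, b).ratio() >= threshold

print(liveness_check("hello 9527", "Hello 9527 "))   # True
```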
The confirmation module 25 is configured to confirm the liveness authentication and identity authentication of the target user.
Specifically, if the recognized lip-shape semantics are similar or identical to the reference lip-shape semantics, liveness authentication succeeds; if, at the same time, the target user is successfully mapped to a user in the user database, verification of the target user's identity is completed and succeeds.
Specifically, to further improve the reliability of lip recognition verification, in this application the lip recognition verification module performs at least three rounds of checking and indicates successful identity authentication only after the repeated checks succeed.
Specifically, after the first verification succeeds, the system, having provided the first character combination, randomly provides a second character combination, which the person being verified reads aloud and which is then verified; if the second lip recognition verification passes, a third verification is performed, with the system randomly providing a third character combination, which the person being verified reads aloud and which is then verified. After three successful verifications, the system indicates that verification has passed. Of course, more rounds of verification can be configured as needed; the number is not limited.
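The multi-round flow can be sketched as follows; generate_challenge, capture_video, and verify_round are hypothetical stand-ins for the prompt generator, camera capture, and lip-recognition pipeline, not names defined by the patent.

```python
# Sketch of the at-least-three-round challenge flow described above.
import random
import string

def generate_challenge(n: int = 8) -> str:
    return "".join(random.choices(string.ascii_lowercase + string.digits, k=n))

def authenticate(capture_video, verify_round, rounds: int = 3) -> bool:
    for _ in range(rounds):
        challenge = generate_challenge()       # random character combination
        video = capture_video(prompt=challenge)
        if not verify_round(video, challenge): # lip semantics vs. challenge
            return False                       # any failed round aborts
    return True                                # all rounds passed
```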
Further, all modules of this system run online, with computation accelerated in parallel by GPUs. This can significantly reduce training and recognition time and can effectively lower the labor cost of the related business.
In addition, the present application also proposes an identity authentication method.
Referring to FIG. 3, a schematic flowchart of the first embodiment of the identity authentication method of the present application is shown. In this embodiment, the execution order of the steps in the flowchart shown in FIG. 3 may be changed and some steps may be omitted according to different needs.
Step S110: extract the dynamic lip image of the target user from the video data.
Step S120: input the dynamic lip image into the pre-trained universal lip recognition model; the universal lip recognition model recognizes the lip-shape semantics corresponding to the dynamic lip image and maps the dynamic lip image against the user database in the model to obtain the target user identity corresponding to the dynamic lip image.
Specifically, the universal lip recognition model is built on a deep neural network whose structure includes 9 layers: three STCNN layers followed by a Flatten layer that flattens the convolved data so that the multi-dimensional data becomes one-dimensional; two Bi-GRU layers after the Flatten layer to train the deep neural network; a fully connected (Dense) layer after the second Bi-GRU connecting every unit of the previous layer to the next; a softmax normalization layer after the Dense layer; and a CTC (connectionist temporal classification) layer after softmax, which allows probability computation over multiple sequences, namely all possible character-level transcriptions of the speech sample.
Step S130: compare the lip-shape semantics with the reference lip-shape semantics to perform liveness authentication on the user; if the recognized semantics are similar or identical to the reference semantics, liveness authentication succeeds and, at the same time, verification of the target user's identity is completed successfully.
Specifically, to increase the reliability of identity verification, recognition methods such as face recognition and speech recognition may be added; for example, after lip recognition verification passes, face recognition and speech recognition are added to enhance recognition accuracy.
Specifically, the face recognition method may be the eigenface (PCA) method: the eigenface method is a face recognition method based on the KL transform, which is an optimal orthogonal transform for image compression. Applying the KL transform to the high-dimensional image space yields a new set of orthogonal bases; the important orthogonal bases are retained and span a low-dimensional linear space. If the projections of faces onto these low-dimensional linear spaces are assumed to be separable, the projections can be used as the feature vectors for recognition; this is the basic idea of the eigenface method. These methods require relatively many training samples and are based entirely on the statistical properties of image gray levels. Of course, in other implementations, face recognition methods based on geometric features, on neural networks, on elastic graph matching, on the line-segment Hausdorff distance (LHD), and the like may also be used; they are not repeated here.
As shown in FIG. 4, a schematic flowchart of the second embodiment of the identity authentication method of the present application is shown. In this embodiment, the execution order of the steps in the flowchart shown in FIG. 4 may be changed and some steps may be omitted according to different needs.
The method includes the following steps:
Step S210: receive the video data read by the target user to be authenticated according to the system prompt, and pre-process the video data.
Specifically, the pre-processing step includes enhancing the face image by methods such as level adjustment, contrast, color balance, sharpening, noise reduction, deblurring, super-resolution, and histogram equalization.
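A sketch of this frame-enhancement step with OpenCV follows (denoising, CLAHE histogram equalization, unsharp-mask sharpening); the parameter values are illustrative assumptions, and the remaining adjustments (levels, color balance, deblurring, super-resolution) would be added analogously.

```python
# Frame enhancement sketch: denoise, equalise luminance (CLAHE), sharpen.
# Parameter values are illustrative assumptions.
import cv2

def enhance_frame(frame_bgr):
    den = cv2.fastNlMeansDenoisingColored(frame_bgr, None, 5, 5, 7, 21)
    lab = cv2.cvtColor(den, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    l = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8)).apply(l)
    eq = cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)
    blur = cv2.GaussianBlur(eq, (0, 0), sigmaX=2)
    return cv2.addWeighted(eq, 1.5, blur, -0.5, 0)   # unsharp mask
```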
Step S220: extract the dynamic lip image of the target user from the video data.
Specifically, the step of acquiring the dynamic lip image includes: acquiring a real-time image of the person being verified (via a network camera), video image pre-processing, face detection, lip positioning, and feature extraction (lip contour extraction, lip contour tracking, and lip contour feature extraction).
Step S230: input the dynamic lip image into the pre-trained universal lip recognition model; the universal lip recognition model recognizes the lip-shape semantics corresponding to the dynamic lip image and maps the dynamic lip image against the user database in the model to obtain the target user identity corresponding to the dynamic lip image.
Step S240: compare the lip-shape semantics with the reference lip-shape semantics to perform liveness authentication on the user; if the recognized lip-shape semantics are similar or identical to the reference semantics, liveness authentication succeeds and, at the same time, verification of the target user's identity is completed successfully.
Steps S220-S240 of the identity authentication method of FIG. 4 are similar to steps S110-S130 of the first embodiment; the difference is that the method further includes step S210.
As shown in FIG. 5, a schematic flowchart of the third embodiment of the identity authentication method of the present application is shown. In this embodiment, the universal lip recognition model of the identity authentication method is built on a deep neural network, and the method for constructing the universal lip recognition model includes the following steps:
Step S310: acquire the audio-video samples in the server and process them; the processing includes face detection, lip positioning, data labeling, and video framing.
Specifically, processing the audio-video samples makes them better conform to the requirements for training the universal lip recognition model.
Specifically, face detection means judging whether a face image exists in a dynamic scene or against a complicated background, and separating that face image out. Common methods include: the reference template method, which first designs one or several standard face templates, then computes the degree of match between the collected test sample and the standard templates and judges via a threshold whether a face exists; the face rule method, which, since the face has certain structural distribution features, extracts these features to generate corresponding rules for judging whether the test sample contains a face; the sample learning method, which adopts the artificial-neural-network approach from pattern recognition, generating a classifier by learning from a face image sample set and a non-face image sample set; the skin color model method, which detects faces based on the tendency of facial skin color to cluster in color space; and the eigenface method, which treats the set of all face images as a face image subspace and judges whether a face image exists based on the distance between the detected sample and its projection onto that subspace.
Step S320: construct the deep neural network.
Specifically, the structure of the network includes 9 layers: three STCNN layers are followed by a Flatten layer; the data produced by the three convolutional layers is flattened so that the multi-dimensional data becomes one-dimensional; the Flatten layer is followed by two Bi-GRU layers that train the deep neural network; the second Bi-GRU layer is followed by a fully connected (Dense) layer connecting every unit of the previous layer to the next; the Dense layer is followed by a softmax normalization layer; and the softmax layer is followed by a CTC layer, where CTC is connectionist temporal classification and allows probability computation over multiple sequences, which are all possible character-level transcriptions of the speech sample.
Step S330: train the deep neural network with the processed audio-video data to obtain the universal lip recognition model, which can recognize the lip-shape semantics corresponding to an input lip image.
Specifically, different users have different lip-shape features. Through the deep neural network model, the recognition system deep-mines the user's lip image data and builds deep features of the user's speaking lip shapes and the context of their speech, so that the identities of user A, user B, and user C correspond one-to-one to their respective lip images.
Specifically, this model takes single characters as the basic unit and uses context as a bridge to achieve sentence-level recognition.
Compared with the prior art, with the identity authentication method, server, and computer-readable storage medium proposed by the present application, the target user's dynamic lip image is first extracted after receiving the video data read by the target user to be authenticated according to the system prompt; next, the dynamic lip image is input into a pre-trained universal lip recognition model, which recognizes the lip-shape semantics corresponding to the dynamic lip image and maps the dynamic lip image against the user database in the model to obtain the corresponding target user identity; finally, the lip-shape semantics are compared with the reference lip-shape semantics to perform liveness authentication on the user while verification of the target user's identity is completed. The identity authentication method, server, and computer-readable storage medium proposed by the present application can verify a user's identity quickly, improving the speed and accuracy of identity verification while greatly reducing cost and improving work efficiency.
The serial numbers of the above embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.
Through the description of the above implementations, a person skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, but in many cases the former is the better implementation. Based on this understanding, the part of the technical solution of the present application that is essential or that contributes over the prior art can be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) and including a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the various embodiments of the present application.
The above are only preferred embodiments of the present application and do not thereby limit its patent scope; any equivalent structural or process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims (20)

  1. An identity authentication method, applied to a server, wherein the method comprises the steps of:
    extracting a dynamic lip image of a target user from video data;
    inputting the dynamic lip image into a pre-trained universal lip recognition model, the universal lip recognition model recognizing the lip-shape semantics corresponding to the dynamic lip image, and the universal lip recognition model mapping the dynamic lip image against a user database in the universal lip recognition model to obtain a target user identity corresponding to the dynamic lip image; and
    comparing the lip-shape semantics with preset reference lip-shape semantics to perform liveness authentication on the target user, wherein if the lip-shape semantics are similar or identical in meaning to the reference lip-shape semantics, liveness authentication of the target user succeeds, and if the target user has the target user identity, identity verification of the target user succeeds;
    wherein the preset reference lip-shape semantics is the reading information given by the system.
  2. The identity authentication method of claim 1, wherein the step of acquiring the dynamic lip image of the target user comprises:
    acquiring a video image of the target user;
    performing face detection on the video image;
    performing lip positioning on the detected face; and
    performing feature extraction on the lips to obtain effective lip-shape features, the effective lip-shape features serving as the dynamic lip image.
  3. The identity authentication method of claim 2, wherein the feature extraction on the lips comprises:
    performing lip contour extraction on the lips;
    performing lip contour tracking on the lips; and
    performing lip contour feature extraction on the lips.
  4. The identity authentication method of any one of claims 1-3, wherein before the dynamic lip image is acquired the method further comprises the step of: receiving the video data recorded by the target user according to the system prompt and pre-processing the video data, the pre-processing comprising the step of: enhancing the video image through level adjustment, contrast, color balance, sharpening, noise reduction, deblurring, super-resolution, and histogram equalization.
  5. The identity authentication method of claim 4, wherein the universal lip recognition model is built on a deep neural network whose structure comprises 9 layers: three spatio-temporal convolution layers are followed by a flattening layer, the flattening layer is followed by two Bi-GRU layers, the second Bi-GRU layer is followed by a fully connected layer, the fully connected layer is followed by a normalization layer, and the normalization layer is followed by a connectionist temporal classification layer; wherein the spatio-temporal convolution can convolve in the time domain so that video data can be processed, the flattening layer makes multi-dimensional data one-dimensional, and the Bi-GRU is a bidirectional gated recurrent unit.
  6. The identity authentication method of claim 5, wherein the step of constructing the universal lip recognition model comprises:
    acquiring audio-video samples in the server and processing the audio-video samples, the processing comprising face detection, lip positioning, data labeling, and video framing;
    constructing the deep neural network; and
    training the deep neural network with the processed audio-video data to obtain the universal lip recognition model, the universal lip recognition model being able to recognize the lip-shape semantics corresponding to an input lip image.
  7. The identity authentication method of claim 6, wherein the universal lip recognition model further comprises a user database, the user database comprising the user identities corresponding to the input audio-video data, and the universal lip recognition model maps each user identity one-to-one to the lip image data in the audio-video data.
  8. The identity authentication method of claim 7, wherein the identity verification of the target user is checked at least three times, and successful identity authentication is indicated after the multiple checks succeed.
  9. A server, wherein the server comprises a memory, a processor, and an identity authentication system stored on the memory and operable on the processor, the identity authentication system implementing the following steps when executed by the processor:
    extracting a dynamic lip image of a target user from video data;
    inputting the dynamic lip image into a pre-trained universal lip recognition model, the universal lip recognition model recognizing the lip-shape semantics corresponding to the dynamic lip image, and the universal lip recognition model mapping the dynamic lip image against a user database in the universal lip recognition model to obtain a target user identity corresponding to the dynamic lip image; and
    comparing the lip-shape semantics with preset reference lip-shape semantics to perform liveness authentication on the target user, wherein if the lip-shape semantics are similar or identical in meaning to the reference lip-shape semantics, liveness authentication of the target user succeeds, and if the target user has the target user identity, identity verification of the target user succeeds;
    wherein the preset reference lip-shape semantics is the reading information given by the system.
  10. The server of claim 9, wherein the step of acquiring the dynamic lip image of the target user comprises:
    acquiring a video image of the target user;
    performing face detection on the video image;
    performing lip positioning on the detected face; and
    performing feature extraction on the lips to obtain effective lip-shape features, the effective lip-shape features serving as the dynamic lip image.
  11. The server of claim 10, wherein the feature extraction on the lips comprises:
    performing lip contour extraction on the lips;
    performing lip contour tracking on the lips; and
    performing lip contour feature extraction on the lips.
  12. The server of claim 9, wherein before the dynamic lip image is acquired the method further comprises the step of: receiving the video data recorded by the target user according to the system prompt and pre-processing the video data, the pre-processing comprising the step of: enhancing the video image through level adjustment, contrast, color balance, sharpening, noise reduction, deblurring, super-resolution, and histogram equalization.
  13. The server of claim 12, wherein the universal lip recognition model is built on a deep neural network whose structure comprises 9 layers: three spatio-temporal convolution layers are followed by a flattening layer, the flattening layer is followed by two Bi-GRU layers, the second Bi-GRU layer is followed by a fully connected layer, the fully connected layer is followed by a normalization layer, and the normalization layer is followed by a connectionist temporal classification layer; wherein the spatio-temporal convolution can convolve in the time domain so that video data can be processed, the flattening layer makes multi-dimensional data one-dimensional, and the Bi-GRU is a bidirectional gated recurrent unit.
  14. The server of claim 13, wherein the step of constructing the universal lip recognition model comprises:
    acquiring audio-video samples in the server and processing the audio-video samples, the processing comprising face detection, lip positioning, data labeling, and video framing;
    constructing the deep neural network; and
    training the deep neural network with the processed audio-video data to obtain the universal lip recognition model, the universal lip recognition model being able to recognize the lip-shape semantics corresponding to an input lip image.
  15. The server of claim 14, wherein the universal lip recognition model further comprises a user database, the user database comprising the user identities corresponding to the input audio-video data, and the universal lip recognition model maps each user identity one-to-one to the lip image data in the audio-video data.
  16. The server of claim 15, wherein the identity verification of the target user is checked at least three times, and successful identity authentication is indicated after the multiple checks succeed.
  17. A computer-readable storage medium, the computer-readable storage medium storing an identity authentication system, wherein the identity authentication system, when executed by at least one processor, implements the following steps:
    extracting a dynamic lip image of a target user from video data;
    inputting the dynamic lip image into a pre-trained universal lip recognition model, the universal lip recognition model recognizing the lip-shape semantics corresponding to the dynamic lip image, and the universal lip recognition model mapping the dynamic lip image against a user database in the universal lip recognition model to obtain a target user identity corresponding to the dynamic lip image; and
    comparing the lip-shape semantics with preset reference lip-shape semantics to perform liveness authentication on the target user, wherein if the lip-shape semantics are similar or identical in meaning to the reference lip-shape semantics, liveness authentication of the target user succeeds, and if the target user has the target user identity, identity verification of the target user succeeds;
    wherein the preset reference lip-shape semantics is the reading information given by the system.
  18. The computer-readable storage medium of claim 17, wherein the step of acquiring the dynamic lip image of the target user comprises:
    acquiring a video image of the target user;
    performing face detection on the video image;
    performing lip positioning on the detected face; and
    performing feature extraction on the lips to obtain effective lip-shape features, the effective lip-shape features serving as the dynamic lip image.
  19. The computer-readable storage medium of claim 18, wherein the feature extraction on the lips comprises:
    performing lip contour extraction on the lips;
    performing lip contour tracking on the lips; and
    performing lip contour feature extraction on the lips.
  20. The computer-readable storage medium of claim 17, wherein before the dynamic lip image is acquired the method further comprises the step of: receiving the video data recorded by the target user according to the system prompt and pre-processing the video data, the pre-processing comprising the step of: enhancing the video image through level adjustment, contrast, color balance, sharpening, noise reduction, deblurring, super-resolution, and histogram equalization.
PCT/CN2018/089204 2018-03-12 2018-05-31 Identity authentication method, server and computer-readable storage medium WO2019174131A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810198704.2A CN108427874A (zh) 2018-03-12 2018-03-12 Identity authentication method, server and computer-readable storage medium
CN201810198704.2 2018-03-12

Publications (1)

Publication Number Publication Date
WO2019174131A1 true WO2019174131A1 (zh) 2019-09-19

Family

ID=63158176

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/089204 WO2019174131A1 (zh) 2018-03-12 2018-05-31 Identity authentication method, server and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN108427874A (zh)
WO (1) WO2019174131A1 (zh)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705438A (zh) * 2019-09-27 2020-01-17 腾讯科技(深圳)有限公司 Gait recognition method, apparatus, device, and storage medium
CN111126124A (zh) * 2019-10-12 2020-05-08 深圳壹账通智能科技有限公司 User identity verification method and apparatus for multi-party video, and computer device
CN111860454A (zh) * 2020-08-04 2020-10-30 北京深醒科技有限公司 Model switching algorithm based on face recognition
CN112491840A (zh) * 2020-11-17 2021-03-12 平安养老保险股份有限公司 Information modification method and apparatus, computer device, and storage medium
CN113314145A (zh) * 2021-06-09 2021-08-27 广州虎牙信息科技有限公司 Sample generation, model training, and lip-shape driving methods, apparatus, device, and medium
CN114495908A (zh) * 2022-02-08 2022-05-13 北京中科深智科技有限公司 Method and system for speech-driven lip shapes based on temporal convolution

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109040781B (zh) * 2018-08-27 2021-04-30 北京京东尚科信息技术有限公司 Video authentication method, apparatus, system, electronic device, and readable medium
CN109450639A (zh) * 2018-10-23 2019-03-08 出门问问信息科技有限公司 Identity verification method and apparatus, electronic device, and computer-readable storage medium
CN111160047A (zh) * 2018-11-08 2020-05-15 北京搜狗科技发展有限公司 Data processing method and apparatus, and apparatus for data processing
CN110223710A (zh) * 2019-04-18 2019-09-10 深圳壹账通智能科技有限公司 Multiple joint authentication method and apparatus, computer apparatus, and storage medium
CN110210196B (zh) * 2019-05-08 2023-01-06 北京地平线机器人技术研发有限公司 Identity authentication method and apparatus
CN110955874A (zh) * 2019-10-12 2020-04-03 深圳壹账通智能科技有限公司 Identity verification method and apparatus, computer device, and storage medium
CN113657135A (zh) * 2020-05-12 2021-11-16 北京中关村科金技术有限公司 Liveness detection method and apparatus based on deep learning, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103177238A (zh) * 2011-12-26 2013-06-26 宇龙计算机通信科技(深圳)有限公司 Terminal and user identification method
CN104217212A (zh) * 2014-08-12 2014-12-17 优化科技(苏州)有限公司 Live person identity verification method
US20160162729A1 (en) * 2013-09-18 2016-06-09 IDChecker, Inc. Identity verification using biometric data
CN106126017A (zh) * 2016-06-20 2016-11-16 北京小米移动软件有限公司 Intelligent recognition method, apparatus, and terminal device
US20170024608A1 (en) * 2015-07-20 2017-01-26 International Business Machines Corporation Liveness detector for face verification

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140010417A1 (en) * 2012-07-04 2014-01-09 Korea Advanced Institute Of Science And Technology Command input method of terminal and terminal for inputting command using mouth gesture
CN104200146A (zh) * 2014-08-29 2014-12-10 华侨大学 Identity verification method combining video face and digital lip-movement password
CN104966053B (zh) * 2015-06-11 2018-12-28 腾讯科技(深圳)有限公司 Face recognition method and recognition system
CN105787428A (zh) * 2016-01-08 2016-07-20 上海交通大学 Lip-language feature identity authentication method based on sparse coding

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103177238A (zh) * 2011-12-26 2013-06-26 宇龙计算机通信科技(深圳)有限公司 Terminal and user identification method
US20160162729A1 (en) * 2013-09-18 2016-06-09 IDChecker, Inc. Identity verification using biometric data
CN104217212A (zh) * 2014-08-12 2014-12-17 优化科技(苏州)有限公司 Live person identity verification method
US20170024608A1 (en) * 2015-07-20 2017-01-26 International Business Machines Corporation Liveness detector for face verification
CN106126017A (zh) * 2016-06-20 2016-11-16 北京小米移动软件有限公司 Intelligent recognition method, apparatus, and terminal device

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110705438A (zh) * 2019-09-27 2020-01-17 腾讯科技(深圳)有限公司 Gait recognition method, apparatus, device, and storage medium
CN110705438B (zh) * 2019-09-27 2023-07-25 腾讯科技(深圳)有限公司 Gait recognition method, apparatus, device, and storage medium
CN111126124A (zh) * 2019-10-12 2020-05-08 深圳壹账通智能科技有限公司 User identity verification method and apparatus for multi-party video, and computer device
CN111860454A (zh) * 2020-08-04 2020-10-30 北京深醒科技有限公司 Model switching algorithm based on face recognition
CN111860454B (zh) * 2020-08-04 2024-02-09 北京深醒科技有限公司 Model switching algorithm based on face recognition
CN112491840A (zh) * 2020-11-17 2021-03-12 平安养老保险股份有限公司 Information modification method and apparatus, computer device, and storage medium
CN112491840B (zh) * 2020-11-17 2023-07-07 平安养老保险股份有限公司 Information modification method and apparatus, computer device, and storage medium
CN113314145A (zh) * 2021-06-09 2021-08-27 广州虎牙信息科技有限公司 Sample generation, model training, and lip-shape driving methods, apparatus, device, and medium
CN114495908A (zh) * 2022-02-08 2022-05-13 北京中科深智科技有限公司 Method and system for speech-driven lip shapes based on temporal convolution

Also Published As

Publication number Publication date
CN108427874A (zh) 2018-08-21

Similar Documents

Publication Publication Date Title
WO2019174131A1 (zh) Identity authentication method, server, and computer-readable storage medium
CN109214360B (zh) Method for constructing a face recognition model based on the ParaSoftMax loss function, and application thereof
WO2019120115A1 (zh) Face recognition method and apparatus, and computer apparatus
US7873189B2 (en) Face recognition by dividing an image and evaluating a similarity vector with a support vector machine
WO2020077895A1 (zh) Signing intention determination method and apparatus, computer device, and storage medium
WO2019179036A1 (zh) Deep neural network model, electronic apparatus, identity verification method, and storage medium
CN112232117A (zh) Face recognition method and apparatus, and storage medium
WO2019062080A1 (zh) Identity recognition method, electronic apparatus, and computer-readable storage medium
US20130226587A1 (en) Lip-password Based Speaker Verification System
WO2019179029A1 (zh) Electronic apparatus, identity verification method, and computer-readable storage medium
CN112395979B (zh) Image-based health status recognition method, apparatus, device, and storage medium
US20200065573A1 (en) Generating variations of a known shred
WO2021056710A1 (zh) Multi-round question-answering recognition method and apparatus, computer device, and storage medium
CN103839041A (zh) Client feature recognition method and apparatus
CN102737633A (zh) Speaker recognition method based on tensor subspace analysis, and apparatus therefor
US10423817B2 (en) Latent fingerprint ridge flow map improvement
CN109635625B (zh) Intelligent identity verification method, device, storage medium, and apparatus
CN111898550B (zh) Method and apparatus for building an expression recognition model, computer device, and storage medium
CN109376717A (zh) Identity recognition method based on face comparison, apparatus, electronic device, and storage medium
CN111104852B (zh) Face recognition technique based on heuristic Gaussian cloud transform
CN112966685B (zh) Attack network training method and apparatus for scene text recognition, and related devices
US11893773B2 (en) Finger vein comparison method, computer equipment, and storage medium
US20230012235A1 (en) Using an enrolled biometric dataset to detect adversarial examples in biometrics-based authentication system
CN105681324A (zh) Internet financial transaction system and method
CN116152870A (zh) Face recognition method and apparatus, electronic device, and computer-readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18909890

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 08/12/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18909890

Country of ref document: EP

Kind code of ref document: A1