CN111104833A - Method and apparatus for in vivo examination, storage medium, and electronic device - Google Patents

Method and apparatus for in vivo examination, storage medium, and electronic device

Info

Publication number
CN111104833A
Authority
CN
China
Prior art keywords
face
image
living body
feature
depth map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201811269906.8A
Other languages
Chinese (zh)
Inventor
唐宇晨
邱迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN201811269906.8A priority Critical patent/CN111104833A/en
Priority to PCT/CN2019/100261 priority patent/WO2020088029A1/en
Publication of CN111104833A publication Critical patent/CN111104833A/en
Withdrawn legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/166Detection; Localisation; Normalisation using acquisition arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40Spoof detection, e.g. liveness detection
    • G06V40/45Detection of the body part being alive

Abstract

An object of the present disclosure is to provide a liveness detection method and apparatus, a storage medium, and an electronic device, to address the problem in the related art that liveness detection accuracy is low when face recognition is performed. The method comprises the following steps: acquiring a face depth map and a face thermal image of an object to be examined; extracting feature information from the face depth map and the face thermal image through a preset feature extraction model; and determining whether the object to be examined belongs to the living body category according to the feature information and a judgment model of the category to which the features correspond.

Description

Method and apparatus for in vivo examination, storage medium, and electronic device
Technical Field
The present disclosure relates to the field of image processing, and in particular, to a liveness detection method and apparatus, a storage medium, and an electronic device.
Background
With advances in science and technology, data processing has become more efficient, and the ways in which a user's legitimate identity is verified are evolving rapidly. In the related art, schemes have been proposed that verify a user's legitimate identity by collecting biometric features of the person to be verified, where the biometric features may be, for example, fingerprint features or facial features.
Like other biological characteristics of the human body (fingerprints, irises, and the like), the human face is innate, and its uniqueness and the fact that it is not easily copied provide the necessary preconditions for identity authentication. Compared with other types of biometric recognition, face recognition is contactless: the device can acquire a face image without the user having to touch it. In addition, multiple faces can be sorted, judged, and recognized in practical application scenarios.
Disclosure of Invention
An object of the present disclosure is to provide a liveness detection method and apparatus, a storage medium, and an electronic device, to address the problem in the related art that liveness detection accuracy is low when face recognition is performed.
To achieve the above object, in a first aspect, the present disclosure provides a liveness detection method, the method comprising:
acquiring a face depth image and a face thermography of an object to be examined;
extracting feature information from the face depth map and the face thermal image through a preset feature extraction model;
and determining whether the object to be detected belongs to the living body category or not according to the characteristic information and the judgment model of the category to which the characteristic corresponds.
Optionally, the acquiring a facial depth map and a facial thermography of the object to be examined includes:
simultaneously acquiring an RGB (red, green and blue) image, a depth image and a thermal image of an object to be detected;
acquiring face position coordinates from the RGB image;
and mapping the face position coordinates to the areas of the depth map and the thermal image to be a face depth map and a face thermal image respectively.
Optionally, the method further includes:
enhancing contrast of a face region and a background region in one or more of the RGB map, the depth map and the thermographic map by histogram equalization.
Optionally, the extracting, by using a preset feature extraction model, feature information from the face depth map and the face thermal image includes:
fusing pixel points corresponding to the face depth image and the face thermal image to obtain a fused face image;
extracting a feature matrix of the fused face image through a pre-trained convolutional neural network model, wherein the feature information comprises the feature matrix.
Optionally, the determining, according to the feature information and the determination model of the category to which the feature corresponds, whether the object to be detected belongs to the living body category includes:
inputting the image feature matrix into a preset classification function model to obtain a probability value of the fused face image belonging to a living body class;
and if the probability value is larger than a preset probability threshold value, determining that the object to be detected is a living body.
Optionally, the preset classification function model is a Softmax classification function;
the inputting the image feature matrix into a preset classification function model to obtain a probability value of the fused face image belonging to a living body class includes:
performing full-connection layer transformation on the image feature matrix to obtain an output multi-dimensional feature vector, wherein the dimension number of the multi-dimensional feature vector corresponds to the category number of the Softmax classification function;
determining and obtaining a probability value of the fused face image belonging to the living body class according to the following formula:
a_i = exp(z_i) / Σ_k exp(z_k)
where a_i represents the probability value of the i-th Softmax class, and z_i is the i-th value in the multi-dimensional feature vector.
Optionally, the fusing the pixel points corresponding to the face depth map and the face thermal map to obtain a fused face image, including:
and taking the value obtained by weighted average of the values of the first pixel points in the face depth image and the value of the second pixel points corresponding to the first pixel points in a target image channel in the face thermography as the value of the pixel points corresponding to the first pixel points in the fused face image, wherein the target image channel is any one image channel in the face thermography.
In a second aspect, the present disclosure provides a liveness detection device, the device comprising:
an acquisition module for acquiring a facial depth map and a facial thermography of an object to be examined;
the feature extraction module is used for extracting feature information from the face depth image and the face thermal image through a preset feature extraction model;
and the determining module is used for determining whether the object to be detected belongs to the living body category or not according to the characteristic information and the judging model of the category to which the characteristic corresponds.
Optionally, the obtaining module is configured to:
simultaneously acquiring an RGB (red, green and blue) image, a depth image and a thermal image of an object to be detected;
acquiring face position coordinates from the RGB image;
and mapping the face position coordinates to the areas of the depth map and the thermal image to be a face depth map and a face thermal image respectively.
Optionally, the obtaining module is configured to:
enhancing contrast of a face region and a background region in one or more of the RGB map, the depth map and the thermographic map by histogram equalization.
Optionally, the feature extraction module is configured to fuse pixel points corresponding to the facial depth map and the facial thermal map to obtain a fused facial image; extracting a feature matrix of the fused face image through a pre-trained convolutional neural network model, wherein the feature information comprises the feature matrix.
Optionally, the determining module is configured to input the image feature matrix into a preset classification function model, so as to obtain a probability value that the fused facial image belongs to a living body class; and if the probability value is larger than a preset probability threshold value, determine that the object to be detected is a living body.
Optionally, the preset classification function model is a Softmax classification function;
the determining module is configured to:
performing full-connection layer transformation on the image feature matrix to obtain an output multi-dimensional feature vector, wherein the dimension number of the multi-dimensional feature vector corresponds to the category number of the Softmax classification function;
determining and obtaining a probability value of the fused face image belonging to the living body class according to the following formula:
a_i = exp(z_i) / Σ_k exp(z_k)
where a_i represents the probability value of the i-th Softmax class, and z_i is the i-th value in the multi-dimensional feature vector.
Optionally, the feature extraction module is configured to take a value obtained by weighted average of values of first pixel points in the face depth map and a value of a second pixel point corresponding to the first pixel point in a target image channel in the face thermography as a value of a pixel point corresponding to the first pixel point in the fused face image, where the target image channel is any image channel in the face thermography.
In a third aspect, the present disclosure provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of any of the liveness detection methods described above.
In a fourth aspect, the present disclosure provides an electronic device comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of any of the liveness detection methods described above.
The technical scheme can at least achieve the following technical effects:
the method comprises the steps of obtaining a face depth image and a face thermal image of an object to be detected, extracting feature information from the face depth image and the face thermal image through a preset feature extraction model, and determining whether the object to be detected belongs to a living body category or not according to the feature information and a judgment model of the category to which the features correspond. Thus, the accuracy of the biopsy is improved, and the situation that the mask or the face model is disguised as the identity of a legal person to invade can be identified.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:
FIG. 1 is a flow chart illustrating a liveness detection method according to an exemplary embodiment.
Fig. 2.1 is a flow chart illustrating another liveness detection method according to an exemplary embodiment.
Fig. 2.2 is a schematic diagram of the convolutional neural network used in the method of Fig. 2.1.
Fig. 2.3 is a schematic diagram of the convolution operation used in the method of Fig. 2.1.
FIG. 3 is a block diagram illustrating a liveness detection apparatus according to an exemplary embodiment.
FIG. 4 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed Description
The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.
In the related art, identity authentication techniques have been proposed that judge the legitimate identity of an object under inspection by photographing the object and analyzing its face image. Because such authentication is based on a 2D image, it cannot accurately judge whether the object under inspection is a living body. As a result, an attacker can imitate the facial features of a legitimate user with a mask or a three-dimensional face model, impersonate that user, and steal the user's account information.
In view of the above, the embodiments of the present disclosure provide a liveness detection method to improve the accuracy of liveness detection during face recognition.
FIG. 1 is a flow chart illustrating a liveness detection method according to an exemplary embodiment. The method can be applied to devices with a face recognition function, such as mobile terminals and automated teller machines. The method comprises the following steps:
s11, acquiring a face depth map and a face thermography of the object to be examined.
Wherein the depth map (DepthMap) is an image or image channel containing information about the distance of the surface of the object to be examined from the viewpoint. Each pixel value of the depth map is the actual distance of the sensor from the object.
Thermography, also called infrared thermography, reflects the temperature distribution on the surface of the object.
Specifically, the acquiring a facial depth map and a facial thermography of the object to be examined includes: simultaneously acquiring an RGB (red, green and blue) image, a depth image and a thermal image of an object to be detected; acquiring face position coordinates from the RGB image; and mapping the face position coordinates to the areas of the depth map and the thermal image to be a face depth map and a face thermal image respectively.
In specific implementation, the RGB image, the depth image and the thermography of the object to be detected can be acquired simultaneously through the 3D camera. The RGB image has better face identifiability, so the face position coordinate of the object to be detected can be obtained through the RGB image.
In yet another possible implementation, an RGB image, a depth image, and a thermal image of the object to be detected may be obtained from the video, and in order to ensure the alignment effect of the pixel coordinate positions in the later period, the obtained RGB image, depth image, and thermal image may correspond to the same frame in the video file.
In addition, because the three images are acquired simultaneously, their pixels are in one-to-one correspondence, which facilitates aligning the pixel coordinates of the three images.
In a specific implementation, the three images can be preprocessed by histogram equalization to enhance the contrast between the face region and the background region, so that effective facial feature information is highlighted. In this way, the accuracy of subsequent feature extraction can be improved.
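As a minimal sketch of this preprocessing step (assuming the three frames are already pixel-aligned and of equal size, and using OpenCV's Haar cascade face detector purely as an illustrative stand-in for whatever detector an implementation actually uses), the face region found in the RGB image can be cropped from the depth and thermal maps and then histogram-equalized:

```python
import cv2
import numpy as np

def crop_and_equalize(rgb, depth, thermal):
    """Detect the face in the RGB frame, crop the same region from the
    aligned depth and thermal frames, and enhance contrast by histogram
    equalization. All three frames are assumed to share one resolution."""
    gray = cv2.cvtColor(rgb, cv2.COLOR_BGR2GRAY)
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None  # no face found in this frame
    x, y, w, h = faces[0]  # use the first detected face

    # Map the face coordinates from the RGB image onto the other two maps.
    face_depth = depth[y:y + h, x:x + w]
    face_thermal = thermal[y:y + h, x:x + w]

    # equalizeHist expects single-channel 8-bit input, so rescale first.
    to_u8 = lambda img: cv2.normalize(img, None, 0, 255,
                                      cv2.NORM_MINMAX).astype(np.uint8)
    return cv2.equalizeHist(to_u8(face_depth)), cv2.equalizeHist(to_u8(face_thermal))
```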
And S12, extracting feature information from the face depth map and the face thermal image through a preset feature extraction model.
For example, in the region covering the nose and eyes, facial features such as the highest point of the face, the lowest point of the face, and the relative positions of these points can be obtained from the depth map. For another example, the temperature distribution over the face is uneven: in winter, the area around the mouth and nose is warmer, while the area near the sides of the face is cooler.
For a living body, these facial features follow certain patterns, and preset living-body facial feature information can be generated from these patterns.
As another example, the feature matrices of the face depth map and the face thermal image may be extracted by a pre-trained convolutional neural network model. These feature matrices can then be fed into a judgment model for the category to which the features correspond, namely a preset classification function model used together with the pre-trained convolutional neural network model, to determine whether the object to be detected belongs to the living body category.
And S13, determining whether the object to be detected belongs to the living body category or not according to the characteristic information and the judgment model of the category to which the characteristic corresponds.
Specifically, the judgment model of the category to which the features correspond includes preset living-body facial feature information. The extracted feature information is compared with this preset information; if the matching degree between the feature information extracted this time and the preset living-body facial feature information is greater than a preset threshold, the object to be detected is determined to be a living body; otherwise, it is determined not to be a living body.
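The disclosure does not fix a particular matching measure or threshold; the following sketch simply assumes both the extracted feature information and the preset living-body feature information are numeric vectors and uses cosine similarity with an illustrative threshold of 0.8:

```python
import numpy as np

def matches_living_template(features, live_template, threshold=0.8):
    """Compare extracted features with preset living-body feature information.
    Cosine similarity and the 0.8 threshold are illustrative assumptions."""
    a = np.ravel(features).astype(np.float64)
    b = np.ravel(live_template).astype(np.float64)
    matching_degree = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return matching_degree > threshold
```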
If the object to be detected is a non-living body, the current access attempt is illegitimate; accordingly, when the face-recognition-related operation is performed, the object to be detected can be denied access to the account, or alarm information can be generated.
The technical scheme can at least achieve the following technical effects:
the method comprises the steps of obtaining a face depth image and a face thermal image of an object to be detected, extracting feature information from the face depth image and the face thermal image through a preset feature extraction model, and determining whether the object to be detected belongs to a living body category or not according to the feature information and a judgment model of the category to which the features correspond. Thus, the accuracy of the biopsy is improved, and the situation that the mask or the face model is disguised as the identity of a legal person to invade can be identified.
In addition, compared with the in-vivo test only through a thermal image, the technical scheme improves the stability and robustness of the in-vivo test.
Compared with the scheme of carrying out facial recognition by shooting a video indicating the object to be detected to execute a series of action instructions, the technical scheme does not need special cooperation of the object to be detected, improves user experience, and also improves the execution efficiency of in-vivo detection operation during facial recognition.
Fig. 2.1 is a flow chart illustrating a liveness detection method according to an exemplary embodiment. The method can be applied to devices with a face recognition function, such as mobile terminals and automated teller machines. The method comprises the following steps:
s21, acquiring a face depth map and a face thermography of the object to be examined.
Wherein the depth map (DepthMap) is an image or image channel containing information about the distance of the surface of the object to be examined from the viewpoint. Each pixel value of the depth map is the actual distance of the sensor from the object.
Thermography, also called infrared thermography, reflects the temperature distribution on the surface of the object.
Specifically, the acquiring a facial depth map and a facial thermography of the object to be examined includes: simultaneously acquiring an RGB (red, green and blue) image, a depth image and a thermal image of an object to be detected; acquiring face position coordinates from the RGB image; and mapping the face position coordinates to the areas of the depth map and the thermal image to be a face depth map and a face thermal image respectively.
In specific implementation, the RGB image, the depth image and the thermography of the object to be detected can be acquired simultaneously through the 3D camera. The RGB image has better face identifiability, so the face position coordinate of the object to be detected can be obtained through the RGB image.
In addition, because the three images are acquired simultaneously, their pixels are in one-to-one correspondence, which facilitates aligning the pixel coordinates of the three images.
In a specific implementation, the three images can be preprocessed by histogram equalization to enhance the contrast between the face region and the background region, so that effective facial feature information is highlighted. In this way, the accuracy of subsequent feature extraction can be improved.
And S22, fusing the corresponding pixel points of the face depth image and the face thermal image to obtain a fused face image.
Specifically, fusing the corresponding pixels of the face depth map and the face thermal map to obtain the fused face image includes: taking the weighted average of the value of a first pixel in the face depth map and the value of the second pixel corresponding to the first pixel in a target image channel of the face thermal map as the value of the corresponding pixel in the fused face image, where the target image channel is any one image channel of the face thermal map.
For example, if A(i, j) is a pixel in the face depth map and B(i, j) is a pixel in a single channel of the face thermal map, the pixel C(i, j) in the fused image can be obtained by the following formula:
C(i, j) = w_A(i, j) · A(i, j) + w_B(i, j) · B(i, j)
w_A(i, j) + w_B(i, j) = 1
By adjusting the weight w_A(i, j) corresponding to the face depth map and the weight w_B(i, j) corresponding to the face thermal map, the proportions of the depth-distribution factor and the temperature-distribution factor in the fused image can be adjusted flexibly, so that the extracted facial feature matrix reflects the facial characteristics of the object to be detected more clearly.
The fused face image retains the useful information of the depth map and the thermal map to the greatest extent, increases the information content of the resulting image, and helps to acquire image features more accurately, reliably, and comprehensively.
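A minimal sketch of this per-pixel fusion, assuming NumPy arrays of equal shape and a single scalar weight applied uniformly to every pixel (the formula above also allows the weights to vary per pixel); the 0.5 default is purely illustrative:

```python
import numpy as np

def fuse(face_depth, face_thermal_channel, w_a=0.5):
    """C(i,j) = w_A * A(i,j) + w_B * B(i,j), with w_A + w_B = 1.
    face_thermal_channel is any single channel of the face thermal map."""
    a = face_depth.astype(np.float32)
    b = face_thermal_channel.astype(np.float32)
    return w_a * a + (1.0 - w_a) * b
```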
And S23, extracting a feature matrix of the fused face image through a pre-trained convolutional neural network model.
A convolutional neural network is a multi-layer neural network comprising convolutional layers, pooling layers, and so on. Its artificial neurons respond to surrounding units and progressively reduce the dimensionality of image recognition problems with huge data volumes, so that after training the network can perform classification, localization, detection, and similar tasks. Training a convolutional neural network requires differentiating the hidden-layer nodes by the chain rule, that is, back-propagation based on gradient descent and the chain rule.
Fig. 2.2 is a schematic diagram of a convolutional neural network provided by an embodiment of the present disclosure: data flows from the input layer through the hidden layers to the output layer, where the hidden layers comprise a number of different layers (convolutional layers, pooling layers, activation function layers, fully connected layers, etc.).
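The disclosure does not specify the exact layer configuration; the following PyTorch sketch assumes a single-channel 64 × 64 fused face image and two output classes (non-living / living), purely to illustrate the input layer → hidden layers (convolution, activation, pooling, fully connected) → output layer structure:

```python
import torch
import torch.nn as nn

class LivenessCNN(nn.Module):
    """Minimal convolutional network: conv -> pool -> conv -> pool -> FC.
    Input: a 1 x 64 x 64 fused face image; output: 2 logits (non-living, living)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                               # 16 x 32 x 32
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                               # 32 x 16 x 16
        )
        self.classifier = nn.Linear(32 * 16 * 16, 2)       # fully connected layer

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.classifier(x)  # logits z_i, passed to Softmax afterwards
```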
Feature extraction is completed by convolution and down-sampling in the convolutional layers, generating a feature matrix. Fig. 2.3 is a schematic diagram of this operation provided by an embodiment of the present disclosure.
First, the fused face image serves as the 7 × 7 input matrix in the figure, and each element of the matrix is a source pixel. A 3 × 3 filter window (filter) traverses the input matrix, and a convolution operation is performed at each position to obtain the output value. The filter window is also called a convolution matrix (convolution kernel).
The specific convolution operation can be seen in the column at the upper right corner of the figure; the result of the operation is -8.
The center value (the pixel with value 1) of the window framed in the input matrix of Fig. 2.3 is replaced by the result of the convolution operation, that is, by the value -8.
It should be noted that the above calculation process may be repeated several times according to actual needs until the required feature matrix is generated.
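The concrete 7 × 7 input and 3 × 3 kernel values of Fig. 2.3 are not reproduced here; the sketch below only illustrates the sliding-window convolution itself with placeholder values, and it uses 'valid' convolution (no padding), so the output map is smaller than the input rather than written back at the window centers as in the figure:

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the input and take the element-wise product sum
    at each position ('valid' convolution, no padding)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.float32)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 7 x 7 input and a 3 x 3 kernel produce a 5 x 5 feature map.
image = np.random.rand(7, 7).astype(np.float32)      # placeholder fused face patch
kernel = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]], dtype=np.float32)
print(convolve2d_valid(image, kernel).shape)          # (5, 5)
```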
And S24, inputting the image feature matrix into a preset classification function model to obtain the probability value of the fused face image belonging to the living body class.
In an optional embodiment, the preset classification function model is a Softmax classification function. Specifically, the feature matrix is input into the Softmax classification function model, which maps the outputs of multiple neurons into the interval 0 to 1. That is, after the feature matrix of the fused image is extracted by the convolutional neural network, it is input to the Softmax layer, which finally outputs a probability value for each category.
Specifically, a fully connected layer transformation is applied to the image feature matrix to obtain a multi-dimensional feature vector, where the number of dimensions of the vector corresponds to the number of categories of the Softmax classification function.
After the convolution operation exemplified above, the output of the neuron can be expressed by the following formula:
z_i = Σ_j (w_ij · x_ij) + b
where x_ij is the j-th input value of the i-th neuron, w_ij is the j-th weight of the i-th neuron, b is an offset value, and z_i is the i-th output of the network, that is, the i-th value in the multi-dimensional feature vector.
Further, the probability value of the face image after fusion belonging to the living body class is determined and obtained according to the following formula:
a_i = exp(z_i) / Σ_k exp(z_k)
where a_i represents the probability value of the i-th Softmax class, and z_i is the i-th value in the multi-dimensional feature vector.
For example, suppose the Softmax classification function has two classes: the first class is "non-living", with corresponding probability value a_1; the second class is "living", with corresponding probability value a_2. The probability value a_2 that the fused face image belongs to the living body class can then be obtained through the probability formula above.
S25, if the probability value is larger than a preset probability threshold value, determining that the object to be detected is a living body.
For example, if the preset probability threshold is 0.5 and the probability value P belonging to the living body class is greater than 0.5, it can be determined that the object to be inspected is a living body; otherwise, if the probability value P belonging to the living body class is not more than 0.5, the object to be detected is determined to be a non-living body.
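A minimal sketch of S24 and S25 combined, assuming the flattened feature matrix is a 1-D vector, the fully connected layer is given by a weight matrix W with one row per class and a bias b, and the class order is [non-living, living]; the 0.5 threshold matches the example above:

```python
import numpy as np

def softmax(z):
    """a_i = exp(z_i) / sum_k exp(z_k), computed in a numerically stable way."""
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def classify_living(feature_vector, W, b, threshold=0.5):
    """Fully connected transform z = W x + b, then Softmax over the classes.
    Returns True when the living-body probability exceeds the threshold."""
    z = W @ feature_vector + b      # one value z_i per class
    probs = softmax(z)
    return probs[1] > threshold     # index 1 is assumed to be the 'living' class
```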
If the object to be detected is a non-living body, the current access attempt is illegitimate; accordingly, when the face-recognition-related operation is performed, the object to be detected can be denied access to the account, or alarm information can be generated.
The technical scheme can at least achieve the following technical effects:
the method comprises the steps of obtaining a face depth image and a face thermal image of an object to be detected, extracting feature information from the face depth image and the face thermal image through a preset feature extraction model, and determining whether the object to be detected belongs to a living body category or not according to the feature information and a judgment model of the category to which the features correspond. Thus, the accuracy of the biopsy is improved, and the situation that the mask or the face model is disguised as the identity of a legal person to invade can be identified.
In addition, compared with the in-vivo test only through a thermal image, the technical scheme improves the stability and robustness of the in-vivo test.
Compared with the scheme of carrying out facial recognition by shooting a video indicating the object to be detected to execute a series of action instructions, the technical scheme does not need special cooperation of the object to be detected, improves user experience, and also improves the execution efficiency of in-vivo detection operation during facial recognition.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of combined actions, but those skilled in the art should understand that the present disclosure is not limited by the described order of actions. Furthermore, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and that the actions involved are not necessarily required by the present disclosure.
FIG. 3 is a block diagram illustrating a liveness detection apparatus according to an exemplary embodiment. The apparatus can be applied to devices with a face recognition function, such as mobile terminals and automated teller machines. The apparatus comprises:
an acquisition module 310 for acquiring a facial depth map and a facial thermography of an object to be examined;
a feature extraction module 320, configured to extract feature information from the facial depth map and the facial thermography map through a preset feature extraction model;
and the determining module 330 is configured to determine whether the object to be detected belongs to a living body category according to the feature information and the determination model of the category to which the feature corresponds.
The technical scheme can at least achieve the following technical effects:
the method comprises the steps of obtaining a face depth image and a face thermal image of an object to be detected, extracting feature information from the face depth image and the face thermal image through a preset feature extraction model, and determining whether the object to be detected belongs to a living body category or not according to the feature information and a judgment model of the category to which the features correspond. Thus, the accuracy of the biopsy is improved, and the situation that the mask or the face model is disguised as the identity of a legal person to invade can be identified.
In addition, compared with the in-vivo test only through a thermal image, the technical scheme improves the stability and robustness of the in-vivo test.
Compared with the scheme of carrying out facial recognition by shooting a video indicating the object to be detected to execute a series of action instructions, the technical scheme does not need special cooperation of the object to be detected, improves user experience, and also improves the execution efficiency of in-vivo detection operation during facial recognition.
Optionally, the obtaining module is configured to:
simultaneously acquiring an RGB (red, green and blue) image, a depth image and a thermal image of an object to be detected;
acquiring face position coordinates from the RGB image;
and mapping the face position coordinates to the areas of the depth map and the thermal image to be a face depth map and a face thermal image respectively.
Optionally, the obtaining module is configured to:
enhancing contrast of a face region and a background region in one or more of the RGB map, the depth map and the thermographic map by histogram equalization.
Optionally, the feature extraction module is configured to fuse pixel points corresponding to the facial depth map and the facial thermal map to obtain a fused facial image; extracting a feature matrix of the fused face image through a pre-trained convolutional neural network model;
the determining module is used for inputting the image feature matrix into a preset classified function model to obtain a probability value of the fused face image belonging to a living body class; and if the probability value is larger than a preset probability threshold value, determining that the object to be detected is a living body.
Optionally, the preset classification function model is a Softmax classification function;
the determining module is configured to:
performing full-connection layer transformation on the image feature matrix to obtain an output multi-dimensional feature vector, wherein the dimension number of the multi-dimensional feature vector corresponds to the category number of the Softmax classification function;
determining and obtaining a probability value of the fused face image belonging to the living body class according to the following formula:
a_i = exp(z_i) / Σ_k exp(z_k)
where a_i represents the probability value of the i-th Softmax class, and z_i is the i-th value in the multi-dimensional feature vector.
Optionally, the feature extraction module is configured to take a value obtained by weighted average of values of first pixel points in the face depth map and a value of a second pixel point corresponding to the first pixel point in a target image channel in the face thermography as a value of a pixel point corresponding to the first pixel point in the fused face image, where the target image channel is any image channel in the face thermography.
It will be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional units (modules) is merely used as an example, and in practical applications, the above function distribution may be performed by different functional units (modules) according to needs, that is, the internal structure of the device is divided into different functional units (modules) to perform all or part of the above described functions. The specific working process of the functional unit (module) described above may refer to the corresponding process in the foregoing method embodiment, and is not described herein again.
The disclosed embodiments provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the liveness detection method.
An embodiment of the present disclosure provides an electronic device, including:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the liveness detection method.
Fig. 4 is a block diagram illustrating an electronic device 400 according to an example embodiment. As shown in fig. 4, the electronic device 400 may include: a processor 401 and a memory 402. The electronic device 400 may also include one or more of a multimedia component 403, an input/output (I/O) interface 404, and a communications component 405.
The processor 401 is configured to control the overall operation of the electronic device 400 so as to complete all or part of the steps of the liveness detection method described above. The memory 402 is configured to store various types of data to support operation of the electronic device 400, such as instructions for any application or method operating on the electronic device 400 and application-related data, for example the pre-trained convolutional neural network model, thermal-map and depth-map data of the object to be detected, identity data of legitimate users, messages sent and received, audio, video, and so on. The memory 402 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. The multimedia component 403 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals; the received audio signal may further be stored in the memory 402 or transmitted through the communication component 405. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 404 provides an interface between the processor 401 and other interface modules, such as a keyboard, a mouse, or buttons. These buttons may be virtual buttons or physical buttons. The communication component 405 is used for wired or wireless communication between the electronic device 400 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them; accordingly, the communication component 405 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic device 400 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components, for performing the liveness detection method described above.
In another exemplary embodiment, a computer-readable storage medium comprising program instructions is also provided; when the program instructions are executed by a processor, the steps of the liveness detection method described above are implemented. For example, the computer-readable storage medium may be the memory 402 described above, which comprises program instructions executable by the processor 401 of the electronic device 400 to perform the liveness detection method described above.
The preferred embodiments of the present disclosure are described in detail with reference to the accompanying drawings, however, the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solution of the present disclosure within the technical idea of the present disclosure, and these simple modifications all belong to the protection scope of the present disclosure. It should be noted that, in the foregoing embodiments, various features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various combinations that are possible in the present disclosure are not described again.
In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims (10)

1. A liveness detection method, the method comprising:
acquiring a face depth image and a face thermography of an object to be examined;
extracting feature information from the face depth map and the face thermal image through a preset feature extraction model;
and determining whether the object to be detected belongs to the living body category or not according to the characteristic information and the judgment model of the category to which the characteristic corresponds.
2. The method of claim 1, wherein the acquiring of the facial depth map and the facial thermography of the object to be examined comprises:
simultaneously acquiring an RGB (red, green and blue) image, a depth image and a thermal image of an object to be detected;
acquiring face position coordinates from the RGB image;
and mapping the face position coordinates to the areas of the depth map and the thermal image to be a face depth map and a face thermal image respectively.
3. The method of claim 2, further comprising:
enhancing contrast of a face region and a background region in one or more of the RGB map, the depth map and the thermographic map by histogram equalization.
4. The method according to any one of claims 1-3, wherein the extracting feature information from the face depth map and the face thermography map by a preset feature extraction model comprises:
fusing pixel points corresponding to the face depth image and the face thermal image to obtain a fused face image;
extracting a feature matrix of the fused face image through a pre-trained convolutional neural network model, wherein the feature information comprises the feature matrix.
5. The method according to claim 4, wherein the determining whether the object to be examined belongs to a living body class according to the feature information and a judgment model of the class to which the feature corresponds comprises:
inputting the image feature matrix into a preset classification function model to obtain a probability value of the fused face image belonging to a living body class;
and if the probability value is larger than a preset probability threshold value, determining that the object to be detected is a living body.
6. The method of claim 5, wherein the preset classification function model is a Softmax classification function;
the inputting the image feature matrix into a preset classification function model to obtain a probability value of the fused face image belonging to a living body class includes:
performing full-connection layer transformation on the image feature matrix to obtain an output multi-dimensional feature vector, wherein the dimension number of the multi-dimensional feature vector corresponds to the category number of the Softmax classification function;
determining and obtaining a probability value of the fused face image belonging to the living body class according to the following formula:
a_i = exp(z_i) / Σ_k exp(z_k)
where a_i represents the probability value of the i-th Softmax class, and z_i is the i-th value in the multi-dimensional feature vector.
7. The method of claim 4, wherein the fusing the pixel points corresponding to the face depth map and the face thermal map to obtain a fused face image comprises:
and taking the value obtained by weighted average of the values of the first pixel points in the face depth image and the value of the second pixel points corresponding to the first pixel points in a target image channel in the face thermography as the value of the pixel points corresponding to the first pixel points in the fused face image, wherein the target image channel is any one image channel in the face thermography.
8. A liveness detection device, the device comprising:
an acquisition module for acquiring a facial depth map and a facial thermography of an object to be examined;
the feature extraction module is used for extracting feature information from the face depth image and the face thermal image through a preset feature extraction model;
and the determining module is used for determining whether the object to be detected belongs to the living body category or not according to the characteristic information and the judging model of the category to which the characteristic corresponds.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
10. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 7.
CN201811269906.8A 2018-10-29 2018-10-29 Method and apparatus for in vivo examination, storage medium, and electronic device Withdrawn CN111104833A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811269906.8A CN111104833A (en) 2018-10-29 2018-10-29 Method and apparatus for in vivo examination, storage medium, and electronic device
PCT/CN2019/100261 WO2020088029A1 (en) 2018-10-29 2019-08-12 Liveness detection method, storage medium, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811269906.8A CN111104833A (en) 2018-10-29 2018-10-29 Method and apparatus for in vivo examination, storage medium, and electronic device

Publications (1)

Publication Number Publication Date
CN111104833A true CN111104833A (en) 2020-05-05

Family

ID=70419919

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811269906.8A Withdrawn CN111104833A (en) 2018-10-29 2018-10-29 Method and apparatus for in vivo examination, storage medium, and electronic device

Country Status (2)

Country Link
CN (1) CN111104833A (en)
WO (1) WO2020088029A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738176A (en) * 2020-06-24 2020-10-02 支付宝实验室(新加坡)有限公司 Living body detection model training method, living body detection device, living body detection equipment and living body detection medium
CN111881729A (en) * 2020-06-16 2020-11-03 深圳数联天下智能科技有限公司 Live body flow direction discrimination method, device and equipment based on thermal imaging and storage medium

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111651626B (en) * 2020-05-25 2023-08-22 腾讯科技(深圳)有限公司 Image classification method, device and readable storage medium
CN111881786B (en) * 2020-07-13 2023-11-03 深圳力维智联技术有限公司 Store operation behavior management method, store operation behavior management device and storage medium
CN111862084B (en) * 2020-07-31 2024-02-02 东软教育科技集团有限公司 Image quality evaluation method, device and storage medium based on complex network
CN112528908A (en) * 2020-12-18 2021-03-19 平安科技(深圳)有限公司 Living body detection method, living body detection device, electronic apparatus, and storage medium
CN113033307B (en) * 2021-02-22 2024-04-02 浙江大华技术股份有限公司 Object matching method and device, storage medium and electronic device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090048516A1 (en) * 2005-05-20 2009-02-19 Hideki Yoshikawa Image diagnosing device
US20170024608A1 (en) * 2015-07-20 2017-01-26 International Business Machines Corporation Liveness detector for face verification
CN106845345A (en) * 2016-12-15 2017-06-13 重庆凯泽科技股份有限公司 Biopsy method and device
CN107451510A (en) * 2016-05-30 2017-12-08 北京旷视科技有限公司 Biopsy method and In vivo detection system
CN107590430A (en) * 2017-07-26 2018-01-16 百度在线网络技术(北京)有限公司 Biopsy method, device, equipment and storage medium
CN107609383A (en) * 2017-10-26 2018-01-19 深圳奥比中光科技有限公司 3D face identity authentications and device
CN107945192A (en) * 2017-12-14 2018-04-20 北京信息科技大学 A kind of pallet carton pile type real-time detection method
CN108399617A (en) * 2018-02-14 2018-08-14 中国农业大学 A kind of detection method and device of animal health condition

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017210331A1 (en) * 2016-06-01 2017-12-07 Carnegie Mellon University Hybrid depth and infrared image sensing system and method for enhanced touch tracking on ordinary surfaces
CN106774856B (en) * 2016-08-01 2019-08-30 深圳奥比中光科技有限公司 Exchange method and interactive device based on lip reading
CN107808145B (en) * 2017-11-13 2021-03-30 河南大学 Interactive identity authentication and tracking method and system based on multi-mode intelligent robot

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090048516A1 (en) * 2005-05-20 2009-02-19 Hideki Yoshikawa Image diagnosing device
US20170024608A1 (en) * 2015-07-20 2017-01-26 International Business Machines Corporation Liveness detector for face verification
CN107451510A (en) * 2016-05-30 2017-12-08 北京旷视科技有限公司 Biopsy method and In vivo detection system
CN106845345A (en) * 2016-12-15 2017-06-13 重庆凯泽科技股份有限公司 Biopsy method and device
CN107590430A (en) * 2017-07-26 2018-01-16 百度在线网络技术(北京)有限公司 Biopsy method, device, equipment and storage medium
CN107609383A (en) * 2017-10-26 2018-01-19 深圳奥比中光科技有限公司 3D face identity authentications and device
CN107945192A (en) * 2017-12-14 2018-04-20 北京信息科技大学 A kind of pallet carton pile type real-time detection method
CN108399617A (en) * 2018-02-14 2018-08-14 中国农业大学 A kind of detection method and device of animal health condition

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111881729A (en) * 2020-06-16 2020-11-03 深圳数联天下智能科技有限公司 Live body flow direction discrimination method, device and equipment based on thermal imaging and storage medium
CN111881729B (en) * 2020-06-16 2024-02-06 深圳数联天下智能科技有限公司 Living body flow direction screening method, device, equipment and storage medium based on thermal imaging
CN111738176A (en) * 2020-06-24 2020-10-02 支付宝实验室(新加坡)有限公司 Living body detection model training method, living body detection device, living body detection equipment and living body detection medium

Also Published As

Publication number Publication date
WO2020088029A1 (en) 2020-05-07

Similar Documents

Publication Publication Date Title
CN111104833A (en) Method and apparatus for in vivo examination, storage medium, and electronic device
WO2020207189A1 (en) Method and device for identity authentication, storage medium, and computer device
CN110059579B (en) Method and apparatus for in vivo testing, electronic device, and storage medium
CN112801057B (en) Image processing method, image processing device, computer equipment and storage medium
RU2431190C2 (en) Facial prominence recognition method and device
CN111754396B (en) Face image processing method, device, computer equipment and storage medium
CN107590430A (en) Biopsy method, device, equipment and storage medium
CN106469302A (en) A kind of face skin quality detection method based on artificial neural network
TW202026948A (en) Methods and devices for biological testing and storage medium thereof
CN112232155B (en) Non-contact fingerprint identification method and device, terminal and storage medium
CN109766755A (en) Face identification method and Related product
CN112232163B (en) Fingerprint acquisition method and device, fingerprint comparison method and device, and equipment
CN112801054B (en) Face recognition model processing method, face recognition method and device
CN106169075A (en) Auth method and device
CN107092821A (en) A kind of distributed face authentication information generating method, authentication method and device
CN112052830B (en) Method, device and computer storage medium for face detection
CN112016525A (en) Non-contact fingerprint acquisition method and device
KR20150128510A (en) Apparatus and method for liveness test, and apparatus and method for image processing
CN113642639B (en) Living body detection method, living body detection device, living body detection equipment and storage medium
CN111767879A (en) Living body detection method
CN112232157B (en) Fingerprint area detection method, device, equipment and storage medium
CN112308035A (en) Image detection method, image detection device, computer equipment and storage medium
CN108460811B (en) Face image processing method and device and computer equipment
CN113255531B (en) Method and device for processing living body detection model, computer equipment and storage medium
CN112232152B (en) Non-contact fingerprint identification method and device, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200505