CN114627534A - Living body discrimination method, electronic device, and storage medium - Google Patents

Living body discrimination method, electronic device, and storage medium

Info

Publication number
CN114627534A
Authority
CN
China
Prior art keywords
image
detection frame
living body
neural network
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210252551.1A
Other languages
Chinese (zh)
Inventor
时勇杰
陆进
刘玉宇
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN202210252551.1A
Publication of CN114627534A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

A living body discrimination method, an electronic device, and a storage medium, which are mainly intended to improve the stability of living body discrimination. The method includes: acquiring a target image and generating a first detection frame and a second detection frame in the target image, where the first detection frame and the second detection frame are both used for positioning an object to be distinguished, and the detection area of the first detection frame is larger than that of the second detection frame; capturing a first image and a second image from the target image with the first detection frame and the second detection frame respectively, inputting the first image into a first preset neural network obtained by self-supervision pre-training for feature extraction to obtain first feature information, and inputting the second image into a second preset neural network for feature extraction to obtain second feature information; and, if a non-living feature is identified from the first feature information, determining the object to be distinguished as a non-living body, otherwise performing living body discrimination according to the first feature information and the second feature information to obtain a living body discrimination result.

Description

Living body discrimination method, electronic device, and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a living body identification method, an electronic device, and a storage medium.
Background
In face recognition, living body discrimination is an important anti-fraud means for recognizing whether the target object in an image is a real human being. Existing living body discrimination approaches mainly generate a mask image of the face image and discriminate the living body according to the probability values of the pixels of the mask image. In practice, it is found that this approach is strongly affected by image quality, and it is difficult to distinguish a high-definition reproduced image (such as a high-definition screen or photographic paper) from a real living body, so the stability of living body discrimination is low.
Disclosure of Invention
The present application provides a living body discrimination method, an electronic apparatus, and a storage medium, and mainly aims to improve the stability of living body discrimination.
In order to achieve the above object, an embodiment of the present application provides a living body identification method, including:
acquiring a target image, and generating a first detection frame and a second detection frame in the target image, wherein the first detection frame and the second detection frame are both used for positioning an object to be distinguished, and the detection area of the first detection frame is larger than that of the second detection frame;
intercepting a first image from the target image by using the first detection frame, and inputting the first image into a first preset neural network for feature extraction to obtain first feature information, wherein the first preset neural network is obtained based on self-supervision pre-training;
intercepting a second image from the target image by using the second detection frame, and inputting the second image into a second preset neural network for feature extraction to obtain second feature information;
identifying non-living features according to the first feature information, and if the non-living features are identified, determining the object to be judged as a non-living body; and if the non-living body characteristics are not identified, carrying out living body judgment according to the first characteristic information and the second characteristic information to obtain a living body judgment result.
In some embodiments, the training of the first and second pre-set neural networks comprises:
acquiring an image sample and a living body distinguishing label of the image sample;
generating a third detection frame and a fourth detection frame in the image sample, wherein the detection area of the third detection frame is larger than that of the fourth detection frame;
intercepting a third image from the image sample by using the third detection frame, and intercepting a fourth image from the image sample by using the fourth detection frame;
generating an automatic supervision characteristic of the third image according to a preset characteristic type, wherein the preset characteristic type is a characteristic type with difference between a living body image and a non-living body image;
training a first preset neural network by using the third image to obtain a first training result, and performing self-supervision learning on the first preset neural network by using the self-supervision characteristics to obtain a self-supervision result;
training a second preset neural network by using the fourth image to obtain a second training result;
performing living body judgment according to the first training result and the second training result to obtain a target judgment result;
and adjusting parameters of the first preset neural network and the second preset neural network according to the self-supervision result, the target judgment result and the living body judgment label until a training end condition is reached.
In some embodiments, the training of the first preset neural network by using the third image to obtain a first training result includes:
inputting the third image into a first preset neural network for N times of feature extraction to obtain first extraction information corresponding to the Nth time of feature extraction, wherein N is a positive integer; determining the first extraction information as a first training result;
the performing self-supervision learning on the first preset neural network by using the self-supervision features to obtain a self-supervision result comprises the following steps:
acquiring second extraction information corresponding to the N-m times of feature extraction, wherein m represents the number of self-supervision layers, m is a positive integer and belongs to [1, N-1 ]; and inputting the second extraction information and the self-supervision features into a pre-trained self-supervision branch to obtain a self-supervision result.
In some embodiments, the generating the self-supervision feature of the third image according to the preset feature type includes:
generating a target feature map of the third image according to a preset feature type; acquiring a preset self-supervision layer number, and acquiring a characteristic diagram size matched with the self-supervision layer number; and adjusting the size of the target feature map according to the size of the feature map to obtain the self-supervision feature of the third image.
In some embodiments, said truncating a third image from the image sample with the third detection box comprises:
intercepting a sub-image corresponding to the third detection frame from the image sample; determining the sub-image corresponding to the third detection frame as a third image, or performing data enhancement processing on the sub-image corresponding to the third detection frame to obtain a third image;
the intercepting a fourth image from the image sample with the fourth detection frame includes:
intercepting a sub-image corresponding to the fourth detection frame from the image sample; and determining the sub-image corresponding to the fourth detection frame as a fourth image, or performing data enhancement processing on the sub-image corresponding to the fourth detection frame to obtain a fourth image.
In some embodiments, the inputting the first image into a first preset neural network for feature extraction to obtain first feature information includes:
carrying out standardization processing on the first image to obtain a fifth image; inputting the fifth image into a first preset neural network for feature extraction to obtain first feature information;
inputting the second image into a second preset neural network for feature extraction to obtain second feature information, wherein the feature extraction comprises the following steps:
the second image is subjected to standardization processing to obtain a sixth image, wherein a first image size corresponding to the fifth image and a second image size corresponding to the sixth image meet specified multiples, and the first image size is larger than the second image size; and inputting the sixth image into a second preset neural network for feature extraction to obtain second feature information.
In some embodiments, the generating a first detection frame and a second detection frame in the target image includes:
generating an initial detection frame in the target image, wherein the initial detection frame is used for positioning an object to be distinguished;
determining the initial detection frame as a second detection frame, and performing external expansion processing on the initial detection frame to obtain a first detection frame;
or carrying out twice external expansion processing on the initial detection frame to obtain a first detection frame and a second detection frame, wherein the detection range of the first detection frame is larger than that of the second detection frame.
In some embodiments, the performing living body discrimination according to the first feature information and the second feature information to obtain a living body discrimination result includes:
acquiring a pre-trained living body distinguishing model, wherein the living body distinguishing model comprises a fusion unit and a classification unit; inputting the first characteristic information and the second characteristic information into the fusion unit for characteristic fusion to obtain target characteristic information; inputting the target characteristic information into the classification unit to obtain a living body probability and a non-living body probability, wherein the living body probability represents the probability that the object to be distinguished is a living body, and the non-living body probability represents the probability that the object to be distinguished is a non-living body; and generating a living body judgment result according to the living body probability and the non-living body probability.
In order to achieve the above object, an embodiment of the present application further provides a living body distinguishing device, including:
the acquisition module is used for acquiring a target image;
a generating module, configured to generate a first detection frame and a second detection frame in the target image, where the first detection frame and the second detection frame are both used to position an object to be distinguished, and a detection area of the first detection frame is larger than a detection area of the second detection frame;
the first extraction module is used for intercepting a first image from the target image by using the first detection frame, inputting the first image into a first preset neural network for feature extraction to obtain first feature information, wherein the first preset neural network is obtained based on self-supervision pre-training;
the second extraction module is used for intercepting a second image from the target image by using the second detection frame and inputting the second image into a second preset neural network for feature extraction to obtain second feature information;
the identification module is used for identifying non-living body characteristics according to the first characteristic information;
the discrimination module is used for determining the object to be discriminated as a non-living body when non-living features are identified; and when no non-living feature is identified, performing living body discrimination according to the first feature information and the second feature information to obtain a living body discrimination result.
In order to achieve the above object, an embodiment of the present application further provides an electronic device, which includes a memory and a processor, where the memory stores a program, and the program implements the steps of the foregoing method when executed by the processor.
To achieve the above object, the present application provides a storage medium for a computer-readable storage, the storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of the aforementioned method.
According to the living body discrimination method, the electronic device and the storage medium, a first detection frame and a second detection frame for positioning the object to be distinguished are generated in the target image, and the detection area of the first detection frame is larger than that of the second detection frame, so that the first detection frame covers more of the background area of the target image while the second detection frame focuses on the local area of the target image that contains the object to be distinguished. Then, the first image captured from the target image with the first detection frame is input into the first preset neural network to obtain first feature information, and the second image captured with the second detection frame is input into the second preset neural network to obtain second feature information. Because the first preset neural network is obtained by self-supervision pre-training, it has the capability of mining image features for a specific self-supervision learning task; and because its input is the first image, it can further mine the global features corresponding to the image background area, which helps reflect the global difference between living body images and non-living body images. On this basis, non-living features are identified according to the first feature information, and if a non-living feature is identified, the object to be distinguished is directly determined as a non-living body. If no non-living feature is identified, living body discrimination is performed according to the first feature information and the second feature information to obtain a living body discrimination result. In this way, non-living features are identified at the image background level, and living body discrimination combines local and global image features at the same time, without being limited by image quality, which improves the stability and generalization of living body discrimination.
Drawings
Fig. 1 is a block diagram of an electronic device to which an embodiment of the present application is applied;
FIG. 2 is a schematic flow chart of a living body identification method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a training process of a first predetermined neural network and a second predetermined neural network in an embodiment of the present application;
FIG. 4 is a detailed flow chart of step S340 in FIG. 3;
FIG. 5 is a schematic structural diagram of the first preset neural network, the second preset neural network, the self-supervision branch and the living body discrimination model according to an embodiment of the present application;
fig. 6 is a block diagram showing a living body discriminating apparatus according to an embodiment of the present application.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for the convenience of description of the present application and have no specific meaning by themselves. Thus, "module", "component" and "unit" may be used interchangeably.
The embodiment of the application can acquire and process related data based on an artificial intelligence technology. Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. The artificial intelligence software technology mainly includes several directions such as a computer vision technology (such as face recognition), a robot technology, a biological recognition technology, a voice processing technology, a natural language processing technology, and machine learning/deep learning.
In face recognition, living body discrimination is an important anti-fraud means for recognizing whether the target object in an image is a real human being. Existing living body discrimination approaches mainly generate a mask image of the face image and discriminate the living body according to the probability values of the pixels of the mask image. In practice, it is found that this approach is strongly affected by image quality, and it is difficult to distinguish a high-definition reproduced image (such as a high-definition screen or photographic paper) from a real living body, so the stability of living body discrimination is low.
In order to solve the above problem, the present application provides a living body identification method applied to an electronic device. Referring to fig. 1, fig. 1 is a block diagram of an electronic device to which an embodiment of the present application is applied.
In the embodiment of the present application, the electronic device may be a device having a computing function, such as a server, a smart phone, a tablet computer, a portable computer, or a desktop computer.
The electronic device includes: memory 11, processor 12, network interface 13, and data bus 14.
The memory 11 includes at least one type of readable storage medium, which may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory, or the like. In some embodiments, the readable storage medium may be an internal storage unit of the electronic device, such as a hard disk of the electronic device. In other embodiments, the readable storage medium may be an external memory of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the electronic device.
In the present embodiment, the readable storage medium of the memory 11 is generally used for storing a living body discrimination program installed in an electronic device, a plurality of sample sets, a model trained in advance, and the like. The memory 11 may also be used to temporarily store data that has been output or is to be output.
The processor 12 may be a central processing unit (CPU), a microprocessor or another data processing chip in some embodiments, and is used for executing program code stored in the memory 11 or processing data, such as executing the living body discrimination program.
The network interface 13 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), and is typically used to establish a communication link between the electronic device and other electronic devices.
The data bus 14 is used to enable connection communication between these components.
Optionally, the electronic device may further include a user interface. The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another device with a voice recognition function, and a voice output device such as a loudspeaker or a headset; optionally, the user interface may also include a standard wired interface or a wireless interface.
Optionally, the electronic device may further include a display, which may also be referred to as a display screen or a display unit. In some embodiments, the display device may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an Organic Light-Emitting Diode (OLED) touch device, or the like. The display is used for displaying information processed in the electronic device and for displaying a visualized user interface.
Optionally, the electronic device further comprises a touch sensor. The area provided by the touch sensor for the user to perform touch operation is referred to as a touch area. Further, the touch sensor here may be a resistive touch sensor, a capacitive touch sensor, or the like. The touch sensor may include not only a contact type touch sensor but also a proximity type touch sensor. Further, the touch sensor may be a single sensor, or may be a plurality of sensors arranged in an array, for example.
In addition, the area of the display of the electronic device may be the same as or different from the area of the touch sensor. Optionally, the display is stacked with the touch sensor to form a touch display screen. The device detects touch operation triggered by a user based on the touch display screen.
A living body discrimination method disclosed in the embodiment of the present application will be specifically described below.
As shown in fig. 2, fig. 2 is a schematic flow chart of a living body identification method according to an embodiment of the present application. Based on the electronic apparatus shown in fig. 1, the processor 12 implements steps S200 to S270 as follows when executing the program stored in the memory 11.
Step S200: and acquiring a target image, and generating a first detection frame and a second detection frame in the target image.
In the embodiment of the present application, the target image refers to an image including an object to be determined, and the object to be determined may be any object having specific properties (including shape, gray scale, texture, and the like), and is specifically classified as a living object or a non-living object, the living object includes but is not limited to a human face, a human body, and the like, and the non-living object includes but is not limited to paper, an electronic screen, a work card, a mask, and the like.
Specifically, the manner of acquiring the target image includes, but is not limited to: shooting an object to be distinguished by using a shooting device of the electronic equipment; retrieving a pre-stored image from a designated database or other storage module; receiving an image uploaded to an electronic device through a user interface; and receiving images sent to the electronic equipment by other equipment (such as an entrance guard photographic device or a road monitoring device).
In the embodiment of the application, the first detection frame and the second detection frame are used for positioning the object to be distinguished, and the detection area of the first detection frame is larger than that of the second detection frame, so that the first detection frame can cover more background areas of the target image, and the second detection frame focuses on a local area containing the object to be distinguished in the target image, which is beneficial to improving the non-living body defense force.
Specifically, the generating of the first detection frame and the second detection frame in the target image may be: generating an initial detection frame in the target image, where the initial detection frame is used for positioning the object to be distinguished. The manner of generating the initial detection frame includes, but is not limited to, a Single Shot MultiBox Detector (SSD) based face detection algorithm, an SSH (Single Stage Headless face detector) network based algorithm, and a multi-task cascaded convolutional network (MTCNN) based algorithm. In an exemplary practical application, the target image is directly input into a pre-trained face detection network or model to obtain detection frame information, where the detection frame information at least includes the vertex coordinates of the detection frame in the target image (such as the coordinates of the top-left corner of the detection frame) and the width and height of the detection frame. The initial detection frame is then generated in the target image according to the detection frame information, so that the initial detection frame covers the main feature area of the object to be distinguished.
Based on this, in one implementation, the initial detection frame may be determined as the second detection frame, and the first detection frame may be obtained by performing outward-expansion processing on the initial detection frame. Optionally, during the outward-expansion processing, the central coordinate is determined first, and the initial detection frame is then expanded outwards about the central coordinate by a factor of R1, where R1 may be an artificially specified positive integer; for example, when R1 is 4, the first detection frame and the second detection frame satisfy a high-accuracy living body discrimination requirement. The central coordinate may be the central coordinate of the image area corresponding to the initial detection frame in the target image. Alternatively, the central coordinate may be calculated from the coordinates of a plurality of key points in the image area, where the type of key point is related to the object to be distinguished; for example, if the object to be distinguished is a human face, the key points may include, but are not limited to, the left eye, the right eye, the nose tip, and the lips, and the central coordinate (xo, yo) satisfies
xo = (x1 + x2 + ... + xa) / a,  yo = (y1 + y2 + ... + ya) / a,
where (xi, yi) is the coordinate of any key point and a is the number of key points. The manner of obtaining the central coordinate is not particularly limited.
In another implementation, the initial detection frame may also be subjected to two outward-expansion processes to obtain the first detection frame and the second detection frame, with the detection range of the first detection frame larger than the detection range of the second detection frame. That is, the initial detection frame is expanded outwards by R2 times to obtain the first detection frame, and the initial detection frame is expanded outwards by R3 times to obtain the second detection frame, where R2 > R3 and R2 and R3 are both artificially specified positive integers, e.g. R2 = 4 and R3 = 2.
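For illustration only, the outward-expansion processing described above can be sketched as follows; the function names, the key-point values and the expansion factor R1 = 4 in this sketch are assumptions for the example rather than limitations of the present application:

```python
import numpy as np

def keypoint_center(keypoints):
    """Central coordinate as the mean of the key-point coordinates (xi, yi)."""
    pts = np.asarray(keypoints, dtype=np.float32)
    return float(pts[:, 0].mean()), float(pts[:, 1].mean())

def expand_box(box, ratio, center=None):
    """Expand a detection frame (x, y, w, h) by `ratio` about a center point.

    If no explicit center is given, the geometric center of the frame is used;
    otherwise the key-point-based central coordinate can be passed in.
    """
    x, y, w, h = box
    cx, cy = (x + w / 2.0, y + h / 2.0) if center is None else center
    new_w, new_h = w * ratio, h * ratio
    return (cx - new_w / 2.0, cy - new_h / 2.0, new_w, new_h)

# Illustrative usage: the initial frame serves as the second detection frame,
# and the first detection frame is obtained by expanding it R1 = 4 times.
initial_box = (120.0, 80.0, 100.0, 100.0)                       # assumed (x, y, w, h)
center = keypoint_center([(150, 110), (190, 110), (170, 140), (170, 160)])
second_box = initial_box
first_box = expand_box(initial_box, ratio=4, center=center)
```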
In other implementations, there may also be two or more second detection frames, so as to meet the needs of more kinds of expansion multiples and enrich the extracted image area information.
Step S210: and intercepting a first image from the target image by using a first detection frame.
Optionally, if the first detection frame exceeds the image edge of the target image, no pixel filling (padding) operation is performed on the image edge when the first image is captured from the target image, so as to avoid introducing salient non-living features such as a black border (for example, the edge of an electronic device).
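As a non-limiting illustration of capturing an image without padding, the following sketch clips the detection frame to the image border instead of filling pixels; the function name and the example values are assumptions of the sketch:

```python
import numpy as np

def crop_without_padding(image, box):
    """Crop `box` (x, y, w, h) from `image`, clipping it to the image border.

    No pixel filling is applied when the frame exceeds the image edge, so no
    artificial black border (a salient non-living cue) is introduced.
    """
    img_h, img_w = image.shape[:2]
    x, y, w, h = box
    x0, y0 = max(int(round(x)), 0), max(int(round(y)), 0)
    x1, y1 = min(int(round(x + w)), img_w), min(int(round(y + h)), img_h)
    return image[y0:y1, x0:x1]

# Illustrative usage with a dummy image and an expanded frame that overruns the edge.
dummy_image = np.zeros((480, 640, 3), dtype=np.uint8)
first_image = crop_without_padding(dummy_image, (-30.0, -20.0, 400.0, 400.0))
```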
Step S220: and inputting the first image into a first preset neural network for feature extraction to obtain first feature information, wherein the first preset neural network is obtained based on self-supervision pre-training.
In this embodiment of the application, the first preset neural network may adopt a ResNet network, a VGG network, or another convolutional neural network, which is not specifically limited. In the training stage of the first preset neural network, a specific self-supervision learning task is used to mine supervision information from large-scale unlabeled data, and this supervision information is used to train the first preset neural network, so that the first preset neural network learns representations that are valuable for downstream tasks. Because the input of the first preset neural network is the first image, the first preset neural network can further mine global features corresponding to the background area of the image, which helps reflect the global difference between living body images and non-living body images.
Step S230: and intercepting a second image from the target image by using a second detection frame.
Optionally, if the second detection frame exceeds the image edge of the target image, no pixel filling (padding) operation is performed on the image edge when the second image is captured from the target image, so as to avoid introducing salient non-living features such as a black border (for example, the edge of an electronic device).
Step S240: and inputting the second image into a second preset neural network for feature extraction to obtain second feature information.
In this embodiment of the application, the second preset neural network may adopt a ResNet network, a VGG network, or another convolutional neural network, and the like, which is not specifically limited.
In an optional implementation manner, step S220 may specifically be: and carrying out standardization processing on the first image to obtain a fifth image. And inputting the fifth image into a first preset neural network for feature extraction to obtain first feature information. Correspondingly, step S240 may specifically be: and carrying out standardization processing on the second image to obtain a sixth image, so that the first image size corresponding to the fifth image and the second image size corresponding to the sixth image meet the specified multiple, and the first image size is larger than the second image size. And inputting the sixth image into a second preset neural network for feature extraction to obtain second feature information.
The normalization processing is used to transform the corresponding image into a designated format. The designated format and the specified multiple may be set manually, and the designated format includes, but is not limited to, at least one of an image size and an image pixel value, which is not limited. For example, the specified multiple may be related to a parameter of the outward-expansion processing, such as the specified multiple being R2 ÷ R3. This ensures that the two images satisfy the specified size relationship, which benefits the accuracy of feature extraction and improves the generalization of the overall mechanism.
In another optional implementation, the specified first image size and the specified second image size may be determined, and then the first image is directly normalized to the first image size, and the second image is normalized to the second image size, so as to complete the image adjustment based on the specified sizes. Specifically, the first image size is W × H × C = 224 × 224 × 3, the second image size is W × H × C = 112 × 112 × 3, W denotes the image width, H denotes the image height, and C denotes the number of image channels.
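The normalization processing above may, for example, be sketched as follows (an illustrative sketch only; the pixel-value scaling to [0, 1] and the use of OpenCV resizing are assumptions, since the embodiment only requires transforming the images into the designated format and sizes):

```python
import cv2
import numpy as np

FIRST_SIZE = (224, 224)    # W x H of the fifth (global) image
SECOND_SIZE = (112, 112)   # W x H of the sixth (local) image, satisfying the 2x relation

def normalize(image, size):
    """Resize the image to the given size and scale pixel values to [0, 1]."""
    resized = cv2.resize(image, size, interpolation=cv2.INTER_LINEAR)
    return resized.astype(np.float32) / 255.0

fifth_image = normalize(np.zeros((400, 400, 3), np.uint8), FIRST_SIZE)    # 224 x 224 x 3
sixth_image = normalize(np.zeros((200, 200, 3), np.uint8), SECOND_SIZE)   # 112 x 112 x 3
```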
In an alternative embodiment, please refer to fig. 3, wherein fig. 3 is a schematic diagram illustrating a training process of a first predetermined neural network and a second predetermined neural network according to an embodiment of the present application. As shown in fig. 3, the training steps of the first preset neural network and the second preset neural network include:
step S300: an image sample and a living body discrimination label of the image sample are acquired.
The image sample comprises a target object, the target object is a living object or a non-living object, and the living body distinguishing label is used for indicating that the target object is a living body or a non-living body.
Step S310: and generating a third detection frame and a fourth detection frame in the image sample, wherein the detection area of the third detection frame is larger than that of the fourth detection frame.
The implementation of step S310 may specifically refer to the description of step S200, and is not described herein again.
Step S320: a third image is cut out of the image sample with a third detection frame and a fourth image is cut out of the image sample with a fourth detection frame.
In one implementation, the capturing the third image from the image sample by using the third detection frame specifically includes the following steps: and intercepting a sub-image corresponding to the third detection frame from the image sample. And determining the sub-image corresponding to the third detection frame as a third image, or performing data enhancement processing on the sub-image corresponding to the third detection frame to obtain a third image.
Correspondingly, the step of capturing the fourth image from the image sample by using the fourth detection frame may specifically include the following steps: and intercepting a sub-image corresponding to the fourth detection frame from the image sample. And determining the sub-image corresponding to the fourth detection frame as a fourth image, or performing data enhancement processing on the sub-image corresponding to the fourth detection frame to obtain the fourth image.
Wherein the data enhancement processing includes, but is not limited to, at least one of flipping, rotating, mirroring, chrominance transformation, lighting, glass, and fogging. Therefore, in the network training stage, the sample types and combinations for training the network are enriched by enhancing the data of the image samples, and the network precision can be effectively improved on the premise of limited sample data.
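A minimal sketch of the data enhancement processing is given below; only a few of the listed transforms (mirroring, rotation, lighting) are shown, and the probabilities and parameter ranges are illustrative assumptions:

```python
import random
import cv2
import numpy as np

def augment(image):
    """Randomly apply a few simple enhancements: mirror, rotation, lighting."""
    out = image.copy()
    if random.random() < 0.5:
        out = cv2.flip(out, 1)                                   # horizontal mirror
    if random.random() < 0.5:
        out = np.rot90(out, random.choice([1, 2, 3])).copy()     # 90/180/270 degree rotation
    if random.random() < 0.5:
        gain = random.uniform(0.7, 1.3)                          # simple lighting change
        out = np.clip(out.astype(np.float32) * gain, 0, 255).astype(np.uint8)
    return out

fourth_image = augment(np.zeros((112, 112, 3), dtype=np.uint8))
```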
Step S330: and generating the self-supervision characteristic of the third image according to a preset characteristic type, wherein the preset characteristic type is a characteristic type with difference between the living body image and the non-living body image.
In the embodiment of the present application, the preset feature type may be specified manually, and may specifically include, but is not limited to, at least one of color texture, non-rigid motion deformation, material (such as skin, paper, or mirror), frequency-domain features, and feature descriptors (such as the histogram of oriented gradients (HOG) or speeded-up robust features (SURF)).
Specifically, in step S330, if the preset feature type includes a feature descriptor, the feature descriptor of the third image may be calculated by using the SURF algorithm, the feature descriptor is subjected to principal component projection by using a principal component analysis (PCA) algorithm, and then principal component coding is performed by using a Gaussian Mixture Model (GMM), so as to obtain the self-supervision feature of the third image. If the preset feature type includes a frequency-domain feature, Fourier transform processing can be performed on the third image to obtain a Fourier frequency-domain image as the self-supervision feature of the third image; the Fourier frequency-domain image shows an obvious feature difference between living body images and non-living body images. In other implementations, the third image may also be input into an attention model trained based on the preset feature type, so as to obtain the self-supervision feature of the third image.
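For the frequency-domain option described above, a possible sketch of generating the self-supervision feature (and adjusting it to a feature-map size, as also described later) is given below; the log-magnitude spectrum, the normalization and the 28 × 28 output size are assumptions of this sketch:

```python
import cv2
import numpy as np

def fourier_self_supervision_feature(image_bgr, out_size=(28, 28)):
    """Log-magnitude Fourier spectrum of the image, resized to a feature-map size.

    Living body images and reproduced (screen / paper) images tend to differ in
    their frequency content, so the spectrum can serve as the self-supervision
    feature; the resizing matches it to the supervised layer's feature map.
    """
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    spectrum = np.fft.fftshift(np.fft.fft2(gray))
    magnitude = np.log1p(np.abs(spectrum))
    magnitude = (magnitude - magnitude.min()) / (magnitude.max() - magnitude.min() + 1e-8)
    return cv2.resize(magnitude, out_size, interpolation=cv2.INTER_LINEAR)

third_image = np.random.randint(0, 255, (224, 224, 3), np.uint8)
self_supervision_feature = fourier_self_supervision_feature(third_image)
```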
Step S340: and training the first preset neural network by using the third image to obtain a first training result, and performing self-supervision learning on the first preset neural network by using the self-supervision characteristics to obtain a self-supervision result.
In one implementation, as shown in fig. 4, training a first preset neural network by using a third image to obtain a first training result specifically includes the following steps S341 to S342:
step S341: and inputting the third image into a first preset neural network for N times of feature extraction to obtain first extraction information corresponding to the Nth time of feature extraction.
Wherein N is a positive integer. Specifically, the first preset neural network may include N convolutional layers, and convolution parameters of the convolutional layers gradually decrease until the last convolutional layer outputs a feature map of a specified image size, that is, the first extraction information.
Step S342: the first extracted information is determined as a first training result.
Correspondingly, the self-supervision learning is performed on the first preset neural network by using the self-supervision characteristics to obtain a self-supervision result, and the method specifically includes the following steps S343 to S344:
step S343: and acquiring second extraction information corresponding to the N-m times of feature extraction, wherein m represents the number of self-supervision layers.
Wherein m is a positive integer and m is within the scope of [1, N-1 ]. The second extraction information corresponding to the N-m times of feature extraction may be a feature map output by the N-m convolutional layers in the first preset neural network. When the self-supervision branch is introduced after the N-m convolutional layers in the first preset neural network, the self-supervision result can assist in guiding the feature extraction learning process of the 1 st to N-m-1 th convolutional layers in the first preset neural network, and the subsequent convolutional layers can be ensured to more accurately extract the non-living features, so that the smaller the value of m is, the more the convolutional layers guided by the self-supervision assistance are, and the larger the self-supervision influence is. Therefore, self-supervision based on the preset feature type can be used for intervening specific nodes in the image feature extraction process, and supervision modes are more flexible and diversified.
Step S344: and inputting the second extraction information and the self-supervision characteristics into a pre-trained self-supervision branch to obtain a self-supervision result.
Specifically, the self-supervision branch is used to extract the target feature corresponding to the preset feature type from the second extraction information, and the self-supervision branch may therefore include a plurality of pre-trained convolutional layers. A first loss value between the second extraction information and the target feature is calculated through the self-supervision branch, and the first loss value is determined as the self-supervision result. The loss function for calculating the first loss value includes, but is not limited to, the mean squared error (MSE), the cross-entropy loss function, and the like. In addition, the self-supervision branch does not participate in the final inference deployment; it only plays an auxiliary role in improving network precision during the training stage of the first preset neural network.
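As an illustrative sketch of a first preset neural network with a self-supervision branch introduced after the (N-m)-th convolutional layer, the following PyTorch code uses N = 5 stride-2 stages, m = 2, and an MSE self-supervision loss; the channel widths, layer counts and auxiliary-branch structure are assumptions of the sketch, not the claimed network:

```python
import torch
import torch.nn as nn

class FirstPresetNetwork(nn.Module):
    """Backbone with N conv stages; stage N - m also feeds a self-supervision branch."""

    def __init__(self, n_stages=5, m=2):
        super().__init__()
        chans = [3, 16, 32, 64, 128, 256][: n_stages + 1]
        self.stages = nn.ModuleList([
            nn.Sequential(nn.Conv2d(chans[i], chans[i + 1], 3, stride=2, padding=1),
                          nn.BatchNorm2d(chans[i + 1]), nn.ReLU(inplace=True))
            for i in range(n_stages)])
        self.tap_index = n_stages - m - 1            # 0-based index of stage N - m
        # Auxiliary (self-supervision) branch: regresses the self-supervision map.
        self.aux = nn.Sequential(nn.Conv2d(chans[self.tap_index + 1], 32, 3, padding=1),
                                 nn.ReLU(inplace=True),
                                 nn.Conv2d(32, 1, 1))

    def forward(self, x):
        aux_map = None
        for i, stage in enumerate(self.stages):
            x = stage(x)
            if i == self.tap_index:
                aux_map = self.aux(x)                # from the second extraction information
        return x, aux_map                            # (first training result, auxiliary map)

net = FirstPresetNetwork()
first_image = torch.randn(2, 3, 224, 224)
target_map = torch.rand(2, 1, 28, 28)                # self-supervision feature at stage N - m
features, aux_map = net(first_image)
first_loss = nn.functional.mse_loss(aux_map, target_map)   # the self-supervision result
```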
In an alternative embodiment, step S330 may include the following steps:
and generating a target feature map of the third image according to the preset feature type. And acquiring a preset self-supervision layer number, and acquiring the size of a characteristic diagram matched with the self-supervision layer number. And adjusting the size of the target feature map according to the size of the feature map to obtain the self-supervision feature of the third image. Specifically, the feature map size matched with the number of self-supervision layers may refer to an image size corresponding to the second extracted information, and the image size of the self-supervision feature is ensured to be consistent with the image size of the second extracted information by size adjustment.
Step S350: and training the second preset neural network by using the fourth image to obtain a second training result.
Step S360: and performing living body judgment according to the first training result and the second training result to obtain a target judgment result.
In one implementation, a living body discrimination branch may be constructed, and the living body discrimination branch may be trained simultaneously in the training process of the first preset neural network and the second preset neural network, that is, the first training result and the second training result are input to the living body discrimination branch to obtain the target discrimination result. In another implementation manner, the first training result and the second training result may also be directly input into a pre-trained living body discrimination model to obtain a target discrimination result.
Step S370: and adjusting parameters of the first preset neural network and the second preset neural network according to the self-supervision result, the target judgment result and the living body judgment label until a training end condition is reached.
Specifically, in step S370, a second loss value may be calculated based on the target discrimination result and the living body discrimination label. And verifying whether the training end condition is met according to the first loss value and the second loss value. And if the training end condition is met, ending the training. If the training end condition is not met, adjusting parameters of the first preset neural network and the second preset neural network (or the first preset neural network, the second preset neural network and the living body distinguishing branch) according to the first loss value and the second loss value, increasing the number of samples and re-executing the training steps.
The loss function used for calculating the second loss value includes, but is not limited to, a cross entropy loss function and an edge loss function. Therefore, the loss value of the self-supervision branch is added to serve as an auxiliary item of the loss function, and the effect of multi-scale auxiliary loss is achieved.
Optionally, the training end condition includes, but is not limited to, a specified loss threshold, and verifying whether the training end condition is met according to the first loss value and the second loss value then includes: calculating a target loss value according to the first loss value and the second loss value; if the target loss value is less than or equal to the loss threshold, the training end condition is met, and if the target loss value is greater than the loss threshold, the training end condition is not met. Ways to calculate the target loss value include, but are not limited to: target loss value = first weight × first loss value + second weight × second loss value, where both the first weight and the second weight may be manually set weights. In some implementations, the first weight can be related to the value of N-m, for example directly proportional to it.
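A possible sketch of the target loss value combining the self-supervision result and the discrimination loss is shown below; the weights 0.5 and 1.0 and the loss threshold are illustrative assumptions:

```python
import torch
import torch.nn as nn

def target_loss(aux_map, aux_target, logits, labels, first_weight=0.5, second_weight=1.0):
    """target loss = first weight x self-supervision loss + second weight x discrimination loss."""
    first_loss = nn.functional.mse_loss(aux_map, aux_target)        # self-supervision result
    second_loss = nn.functional.cross_entropy(logits, labels)       # living body discrimination loss
    return first_weight * first_loss + second_weight * second_loss

loss = target_loss(torch.rand(2, 1, 28, 28), torch.rand(2, 1, 28, 28),
                   torch.randn(2, 2), torch.tensor([0, 1]))
loss_threshold = 0.05                                # assumed specified loss threshold
training_finished = loss.item() <= loss_threshold
```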
It can be seen that, by implementing the above steps S300 to S370, during the training of the first preset neural network and the second preset neural network, self-supervision learning is performed on the first preset neural network by using self-supervision features based on the preset feature type, which can effectively guide the first preset neural network to learn the non-living features in the image.
Step S250: identification of non-living body features is performed based on the first feature information, and step S260 or step S270 is performed.
In the embodiment of the present application, the non-living features include, but are not limited to, black edges, borders, moire fringes, imaging deformities, infrared reflected light, and the like.
Step S260: and if the non-living body characteristics are identified, determining the object to be distinguished as a non-living body.
Step S270: and if the non-living body characteristic is not identified, carrying out living body judgment according to the first characteristic information and the second characteristic information to obtain a living body judgment result.
Therefore, based on steps S250 to S270, preliminary living body discrimination can be performed by combining the non-living body features of the background region in the target image, and if the non-living body features are identified, the object to be discriminated is directly determined as a non-living body, which is beneficial to improving discrimination efficiency. If the non-living body characteristics are not identified, secondary living body judgment is carried out by combining the first characteristic information and the second characteristic information, and the accuracy and the rigor of living body judgment are improved.
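The two-stage decision flow of steps S250 to S270 can be illustrated as follows; identify_non_living and fuse_and_classify are placeholder callables standing in for the non-living feature identification and the trained living body discrimination model:

```python
def discriminate(first_feature, second_feature, identify_non_living, fuse_and_classify):
    """Two-stage decision of steps S250 to S270 (control flow illustration only).

    identify_non_living checks the global features for salient non-living cues
    (black border, frame, moire, etc.); fuse_and_classify stands for the trained
    living body discrimination model returning (living_prob, non_living_prob).
    """
    if identify_non_living(first_feature):
        return "non-living"                                              # step S260: early rejection
    living_prob, non_living_prob = fuse_and_classify(first_feature, second_feature)
    return "living" if living_prob > non_living_prob else "non-living"   # step S270

# Illustrative usage with stand-in callables.
result = discriminate(
    first_feature=[0.1, 0.9],
    second_feature=[0.3, 0.7],
    identify_non_living=lambda feats: max(feats) > 0.95,
    fuse_and_classify=lambda a, b: (0.8, 0.2),
)
```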
In an alternative embodiment, please refer to fig. 5, where fig. 5 is a schematic structural diagram of the first preset neural network, the second preset neural network, the self-supervision branch and the living body discrimination model in the embodiment of the present application. Based on fig. 5, performing living body discrimination according to the first feature information and the second feature information to obtain a living body discrimination result may specifically include the following steps:
step S271: and acquiring a pre-trained living body discrimination model, wherein the living body discrimination model comprises a fusion unit and a classification unit.
Step S272: and inputting the first characteristic information and the second characteristic information into a fusion unit for characteristic fusion to obtain target characteristic information.
The feature fusion performed by the fusion unit may adopt a Blending fusion mode, a Stacking fusion mode, or the like, which is not specifically limited. For example, if the fusion unit adopts the Stacking fusion mode, the fusion unit may include at least two learners and a fusion layer, and each learner may include a batch normalization layer, a long short-term memory (LSTM) network layer, a pooling layer, and a flatten layer, which is not particularly limited. The first feature information is input into one learner to obtain new first feature information, the second feature information is input into the other learner to obtain new second feature information, and the new first feature information and the new second feature information are input into the fusion layer for feature fusion to obtain the target feature information, so as to realize the fusion of local and global image features.
In practical applications, in order to improve the accuracy of image feature fusion, the convolution strides adopted by the first preset neural network and the second preset neural network satisfy the specified multiple, so that the first feature information output by the first preset neural network and the second feature information output by the second preset neural network both have the specified image size. In one implementation, the image size of the first image input into the first preset neural network is 224 × 224 and the image size of the second image input into the second preset neural network is 112 × 112; the overall convolution stride adopted by the first preset neural network is then 32, and the overall convolution stride adopted by the second preset neural network is 16.
Step S273: and inputting the target characteristic information into a classification unit to obtain a living body probability and a non-living body probability, wherein the living body probability represents the probability that the object to be distinguished is a living body, and the non-living body probability represents the probability that the object to be distinguished is a non-living body.
Wherein, the classification unit at least comprises a full connection layer and a softmax classification layer.
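A simplified sketch of the living body discrimination model (fusion unit plus classification unit) is given below; here each learner is reduced to a pooled fully connected layer for brevity, whereas the embodiment above also allows batch normalization, LSTM, pooling and flatten layers, so the exact layer composition and channel sizes are assumptions of the sketch:

```python
import torch
import torch.nn as nn

class LivingBodyDiscriminationModel(nn.Module):
    """Fusion unit (two learners + fusion layer) followed by the classification unit."""

    def __init__(self, first_channels=256, second_channels=256, hidden=128):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                          # pooling + flatten
        self.learner1 = nn.Sequential(nn.Linear(first_channels, hidden), nn.ReLU(inplace=True))
        self.learner2 = nn.Sequential(nn.Linear(second_channels, hidden), nn.ReLU(inplace=True))
        self.fusion = nn.Linear(2 * hidden, hidden)                  # fusion layer
        self.classifier = nn.Linear(hidden, 2)                       # fully connected layer

    def forward(self, first_feature, second_feature):
        f1 = self.learner1(self.pool(first_feature).flatten(1))      # new first feature information
        f2 = self.learner2(self.pool(second_feature).flatten(1))     # new second feature information
        fused = torch.relu(self.fusion(torch.cat([f1, f2], dim=1)))  # target feature information
        return torch.softmax(self.classifier(fused), dim=1)          # (living_prob, non_living_prob)

# Both branches are assumed to deliver 7 x 7 feature maps (224 / 32 and 112 / 16).
model = LivingBodyDiscriminationModel()
probs = model(torch.randn(1, 256, 7, 7), torch.randn(1, 256, 7, 7))
```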
Step S274: and generating a living body judgment result according to the living body probability and the non-living body probability.
Illustratively, in one case, the object to be discriminated is determined as a living body if the living body probability is larger than the non-living body probability, and the object to be discriminated is determined as a non-living body if the living body probability is smaller than the non-living body probability. In another case, if the living body probability is greater than the specified probability, the object to be determined is determined as a living body, otherwise, the object to be determined is determined as a non-living body, and the specified probability is specified artificially, such as 0.6 or 0.7, and the like, which is not particularly limited.
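The two ways of generating the living body discrimination result from the probabilities can be illustrated as follows (the probability values and the threshold 0.6 are example assumptions):

```python
def decide(living_prob, non_living_prob, specified_prob=None):
    """Generate the living body discrimination result from the two probabilities."""
    if specified_prob is not None:                       # threshold variant
        return "living" if living_prob > specified_prob else "non-living"
    return "living" if living_prob > non_living_prob else "non-living"   # comparison variant

print(decide(0.83, 0.17))          # -> "living"
print(decide(0.55, 0.45, 0.6))     # -> "non-living"
```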
Therefore, implementing the above method embodiment enables non-living features to be identified at the image background level and living body discrimination to be performed by combining local and global image features at the same time, without being limited by image quality, which improves the stability and generalization of living body discrimination.
The embodiment of the application also provides a living body distinguishing device. Referring to fig. 6, fig. 6 is a block diagram illustrating a living body distinguishing apparatus according to an embodiment of the present application. As shown in fig. 6, the living body identification apparatus 600 includes an acquisition module 610, a generation module 620, a first extraction module 630, a second extraction module 640, an identification module 650, and an identification module 660, wherein:
an obtaining module 610, configured to obtain a target image.
The generating module 620 is configured to generate a first detection frame and a second detection frame in the target image, where the first detection frame and the second detection frame are both used to position the object to be determined, and a detection area of the first detection frame is larger than a detection area of the second detection frame.
The first extraction module 630 is configured to intercept a first image from the target image by using the first detection frame, and input the first image into a first preset neural network for feature extraction, so as to obtain first feature information, where the first preset neural network is obtained based on an auto-supervised pre-training.
The second extraction module 640 is configured to intercept a second image from the target image by using the second detection frame, and input the second image into a second preset neural network for feature extraction, so as to obtain second feature information.
And the identification module 650 is used for identifying the non-living body characteristics according to the first characteristic information.
The judging module 660 is used for determining the object to be judged as a non-living body when non-living features are identified; and when no non-living feature is identified, performing living body judgment according to the first feature information and the second feature information to obtain a living body judgment result.
In some optional embodiments, the living body distinguishing apparatus may further include a training module, the training module includes an acquiring unit, a generating unit, a clipping unit, a first training unit, a second training unit, and a distinguishing unit, wherein:
the acquiring unit is used for acquiring the image sample and the living body distinguishing label of the image sample;
a generation unit configured to generate a third detection frame and a fourth detection frame in the image sample, a detection area of the third detection frame being larger than a detection area of the fourth detection frame;
an intercepting unit, which is used for intercepting a third image from the image sample by using a third detection frame and intercepting a fourth image from the image sample by using a fourth detection frame;
the first training unit is used for generating the self-supervision characteristic of the third image according to a preset characteristic type, wherein the preset characteristic type is a characteristic type with difference between a living body image and a non-living body image; training the first preset neural network by using the third image to obtain a first training result, and performing self-supervision learning on the first preset neural network by using the self-supervision characteristics to obtain a self-supervision result;
the second training unit is used for training a second preset neural network by using a fourth image to obtain a second training result;
the judging unit is used for judging the living body according to the first training result and the second training result to obtain a target judging result; and adjusting parameters of the first preset neural network and the second preset neural network according to the self-supervision result, the target judgment result and the living body judgment label until a training end condition is reached.
Further, optionally, the first training unit is further configured to input the third image into a first preset neural network for N times of feature extraction, so as to obtain first extraction information corresponding to the nth time of feature extraction, where N is a positive integer; determining the first extraction information as a first training result; acquiring second extraction information corresponding to the N-m times of feature extraction, wherein m represents the number of self-supervision layers, m is a positive integer and belongs to [1, N-1 ]; and inputting the second extraction information and the self-supervision characteristics into a pre-trained self-supervision branch to obtain a self-supervision result.
Further, optionally, the first training unit is further configured to generate a target feature map of the third image according to the preset feature type; acquire a preset number of self-supervision layers and a feature map size matched with that number of layers; and adjust the size of the target feature map according to the matched feature map size to obtain the self-supervision feature of the third image.
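One possible realization, under the assumption that the preset feature type is a simple high-frequency (texture-like) map and that the matched feature-map size is already known, is:

```python
# Illustrative generation of the self-supervision feature: a target feature map built from
# the third image (a high-frequency residual is an assumed choice of feature type) and
# resized to the feature-map size matched to the preset number of self-supervision layers.
import torch
import torch.nn.functional as F

def make_self_supervision_feature(third_image: torch.Tensor, matched_hw: tuple) -> torch.Tensor:
    # third_image: (B, C, H, W); matched_hw: (h, w) looked up for the self-supervision layer.
    blurred = F.avg_pool2d(third_image, kernel_size=3, stride=1, padding=1)
    target_feature_map = third_image - blurred  # crude high-frequency / texture cue
    return F.interpolate(target_feature_map, size=matched_hw,
                         mode="bilinear", align_corners=False)
```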
Further, optionally, the intercepting unit is further configured to intercept a sub-image corresponding to the third detection frame from the image sample; determine that sub-image as the third image, or perform data enhancement processing on it to obtain the third image; intercept a sub-image corresponding to the fourth detection frame from the image sample; and determine that sub-image as the fourth image, or perform data enhancement processing on it to obtain the fourth image.
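A minimal sketch of this crop-then-optionally-augment step is given below; the particular enhancements (a horizontal flip and a small rotation) are assumptions, not the enhancements prescribed by the disclosure.

```python
# Illustrative crop of a detection frame with optional data enhancement (augmentations assumed).
import random
from PIL import Image, ImageOps

def crop_with_optional_augmentation(sample: Image.Image, frame, augment: bool = False) -> Image.Image:
    # frame: (left, top, right, bottom) of the third or fourth detection frame.
    sub_image = sample.crop(frame)
    if augment:
        if random.random() < 0.5:
            sub_image = ImageOps.mirror(sub_image)                 # horizontal flip
        sub_image = sub_image.rotate(random.uniform(-10.0, 10.0))  # small random rotation
    return sub_image
```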
In some optional embodiments, the first extracting module 630 is further configured to perform a normalization process on the first image to obtain a fifth image; and inputting the fifth image into a first preset neural network for feature extraction to obtain first feature information. The second extraction module 640 is further configured to perform normalization processing on the second image to obtain a sixth image, where a first image size corresponding to the fifth image and a second image size corresponding to the sixth image satisfy a specified multiple, and the first image size is larger than the second image size; and inputting the sixth image into a second preset neural network for feature extraction to obtain second feature information.
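The normalization and the size relationship could be organized as in the following sketch, where the per-image standardization scheme, the base size of 112, and the multiple of 2 are all assumed values rather than values taken from the disclosure.

```python
# Illustrative standardization of the two crops, with the fifth image kept at an assumed
# fixed multiple (2x) of the sixth image's size; inputs are (B, C, H, W) tensors.
import torch
import torch.nn.functional as F

def standardize_pair(first_image: torch.Tensor, second_image: torch.Tensor,
                     base_size: int = 112, multiple: int = 2):
    def standardize(x: torch.Tensor) -> torch.Tensor:
        return (x - x.mean()) / (x.std() + 1e-6)   # assumed per-image standardization

    fifth_image = F.interpolate(standardize(first_image),
                                size=(base_size * multiple, base_size * multiple),
                                mode="bilinear", align_corners=False)
    sixth_image = F.interpolate(standardize(second_image),
                                size=(base_size, base_size),
                                mode="bilinear", align_corners=False)
    return fifth_image, sixth_image  # fifth is `multiple` times the size of sixth
```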
In some optional embodiments, the generating module 620 is specifically configured to generate an initial detection frame in the target image, where the initial detection frame is used for positioning the object to be distinguished; to determine the initial detection frame as the second detection frame and perform external expansion processing on the initial detection frame to obtain the first detection frame; or to perform external expansion processing twice on the initial detection frame to obtain the first detection frame and the second detection frame, where the detection range of the first detection frame is larger than that of the second detection frame.
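A minimal sketch of this frame derivation, with assumed expansion ratios of 0.1 and 0.4, might look like this:

```python
# Illustrative derivation of the first and second detection frames from an initial frame
# by outward expansion; the expansion ratios are assumed values.
def expand_frame(frame, ratio, image_w, image_h):
    # frame: (left, top, right, bottom); ratio: fraction of width/height added on each side.
    left, top, right, bottom = frame
    dw, dh = (right - left) * ratio, (bottom - top) * ratio
    return (max(0.0, left - dw), max(0.0, top - dh),
            min(float(image_w), right + dw), min(float(image_h), bottom + dh))

def make_detection_frames(initial_frame, image_w, image_h, two_expansions: bool = False):
    if two_expansions:  # expand twice with different ratios
        second_frame = expand_frame(initial_frame, 0.1, image_w, image_h)
    else:               # keep the initial frame as the second detection frame
        second_frame = initial_frame
    first_frame = expand_frame(initial_frame, 0.4, image_w, image_h)
    return first_frame, second_frame  # the first frame covers the larger detection area
```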
In some optional embodiments, the discrimination module 660 is specifically configured to obtain a pre-trained living body discrimination model, where the living body discrimination model includes a fusion unit and a classification unit; inputting the first characteristic information and the second characteristic information into a fusion unit for characteristic fusion to obtain target characteristic information; inputting the target characteristic information into a classification unit to obtain a living body probability and a non-living body probability, wherein the living body probability represents the probability that the object to be judged is a living body, and the non-living body probability represents the probability that the object to be judged is a non-living body; and generating a living body judgment result according to the living body probability and the non-living body probability.
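A possible shape for the fusion unit and classification unit, assuming simple concatenation followed by a small fully connected head (the feature dimensions are placeholders), is sketched below. In use, the returned live and non-living probabilities would then feed the final judgment, for example declaring a living body when the live probability exceeds the non-living probability.

```python
# Illustrative fusion unit and classification unit (dimensions and the concatenation-based
# fusion are assumptions, not the disclosed model's actual architecture).
import torch
import torch.nn as nn

class LivenessDiscriminationHead(nn.Module):
    def __init__(self, dim1: int = 512, dim2: int = 512, hidden: int = 256):
        super().__init__()
        self.fusion_unit = nn.Sequential(nn.Linear(dim1 + dim2, hidden), nn.ReLU())
        self.classification_unit = nn.Linear(hidden, 2)  # [non-living, living] logits

    def forward(self, first_feature: torch.Tensor, second_feature: torch.Tensor) -> dict:
        target_feature = self.fusion_unit(torch.cat([first_feature, second_feature], dim=1))
        probs = torch.softmax(self.classification_unit(target_feature), dim=1)
        return {"non_live_prob": probs[:, 0], "live_prob": probs[:, 1]}
```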
It should be noted that, for the specific implementation process of this embodiment, reference may be made to the specific implementation process of the foregoing method embodiment, and details are not described again.
An embodiment of the present application further provides an electronic device, which includes a memory and a processor, where the memory stores a program that, when executed by the processor, implements the living body discrimination method described above.
Embodiments of the present application further provide a computer-readable storage medium, where the storage medium stores one or more programs, and the one or more programs are executable by one or more processors to implement the living body discrimination method described above.
One of ordinary skill in the art will appreciate that all or some of the steps of the methods, and the systems and functional modules/units in the devices, disclosed above may be implemented as software, firmware, hardware, or suitable combinations thereof.
In a hardware implementation, the division between the functional modules/units mentioned above does not necessarily correspond to the division of physical components; for example, one physical component may have multiple functions, or one function or step may be performed by several physical components in cooperation. Some or all of the physical components may be implemented as software executed by a processor, such as a central processing unit, a digital signal processor, or a microprocessor, as hardware, or as an integrated circuit, such as an application-specific integrated circuit.

Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for the storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer. In addition, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and include any information delivery media.
The preferred embodiments of the present application have been described above with reference to the accompanying drawings, but this does not limit the scope of the claims. Any modifications, equivalent substitutions, and improvements made by those skilled in the art without departing from the scope and spirit of the present application shall fall within the scope of the claims of the present application.

Claims (10)

1. A living body discrimination method, comprising:
acquiring a target image, and generating a first detection frame and a second detection frame in the target image, wherein the first detection frame and the second detection frame are both used for positioning an object to be distinguished, and the detection area of the first detection frame is larger than that of the second detection frame;
intercepting a first image from the target image by using the first detection frame, and inputting the first image into a first preset neural network for feature extraction to obtain first feature information, wherein the first preset neural network is obtained based on self-supervision pre-training;
intercepting a second image from the target image by using the second detection frame, and inputting the second image into a second preset neural network for feature extraction to obtain second feature information;
identifying non-living features according to the first feature information, and if the non-living features are identified, determining the object to be judged as a non-living body; and if the non-living body characteristics are not identified, carrying out living body judgment according to the first characteristic information and the second characteristic information to obtain a living body judgment result.
2. The method of claim 1, wherein the training of the first preset neural network and the second preset neural network comprises:
acquiring an image sample and a living body distinguishing label of the image sample;
generating a third detection frame and a fourth detection frame in the image sample, wherein the detection area of the third detection frame is larger than that of the fourth detection frame;
intercepting a third image from the image sample by using the third detection frame, and intercepting a fourth image from the image sample by using the fourth detection frame;
generating a self-supervision feature of the third image according to a preset feature type, wherein the preset feature type is a feature type that differs between a living body image and a non-living body image;
training a first preset neural network by using the third image to obtain a first training result, and performing self-supervision learning on the first preset neural network by using the self-supervision feature to obtain a self-supervision result;
training a second preset neural network by using the fourth image to obtain a second training result;
performing living body judgment according to the first training result and the second training result to obtain a target judgment result;
and adjusting parameters of the first preset neural network and the second preset neural network according to the self-supervision result, the target judgment result, and the living body distinguishing label until a training end condition is reached.
3. The method of claim 2, wherein the training the first pre-set neural network with the third image to obtain a first training result comprises:
inputting the third image into a first preset neural network for N times of feature extraction to obtain first extraction information corresponding to the Nth time of feature extraction, wherein N is a positive integer;
determining the first extraction information as a first training result;
the performing self-supervision learning on the first preset neural network by using the self-supervision feature to obtain a self-supervision result comprises:
acquiring second extraction information corresponding to the (N-m)-th time of feature extraction, wherein m represents the number of self-supervision layers, m is a positive integer, and m ∈ [1, N-1];
and inputting the second extraction information and the self-supervision feature into a pre-trained self-supervision branch to obtain the self-supervision result.
4. The method of claim 2, wherein the generating a self-supervision feature of the third image according to a preset feature type comprises:
generating a target feature map of the third image according to the preset feature type;
acquiring a preset number of self-supervision layers, and acquiring a feature map size matched with the number of self-supervision layers;
and adjusting the size of the target feature map according to the matched feature map size to obtain the self-supervision feature of the third image.
5. The method of claim 2, wherein the intercepting a third image from the image sample by using the third detection frame comprises:
intercepting a sub-image corresponding to the third detection frame from the image sample;
determining the sub-image corresponding to the third detection frame as a third image, or performing data enhancement processing on the sub-image corresponding to the third detection frame to obtain a third image;
the intercepting a fourth image from the image sample by using the fourth detection frame comprises:
intercepting a sub-image corresponding to the fourth detection frame from the image sample;
and determining the sub-image corresponding to the fourth detection frame as a fourth image, or performing data enhancement processing on the sub-image corresponding to the fourth detection frame to obtain a fourth image.
6. The method according to any one of claims 1 to 5, wherein the inputting the first image into a first preset neural network for feature extraction to obtain first feature information comprises:
carrying out standardization processing on the first image to obtain a fifth image;
inputting the fifth image into a first preset neural network for feature extraction to obtain first feature information;
the inputting the second image into a second preset neural network for feature extraction to obtain second feature information comprises:
carrying out standardization processing on the second image to obtain a sixth image, wherein a first image size corresponding to the fifth image and a second image size corresponding to the sixth image satisfy a specified multiple, and the first image size is larger than the second image size;
and inputting the sixth image into a second preset neural network for feature extraction to obtain second feature information.
7. The method according to any one of claims 1 to 5, wherein the generating a first detection frame and a second detection frame in the target image comprises:
generating an initial detection frame in the target image, wherein the initial detection frame is used for positioning an object to be distinguished;
determining the initial detection frame as a second detection frame, and performing external expansion processing on the initial detection frame to obtain a first detection frame;
or, performing external expansion processing twice on the initial detection frame to obtain a first detection frame and a second detection frame, wherein the detection range of the first detection frame is larger than that of the second detection frame.
8. The method according to any one of claims 1 to 5, wherein the performing living body discrimination according to the first feature information and the second feature information to obtain a living body discrimination result comprises:
acquiring a pre-trained living body distinguishing model, wherein the living body distinguishing model comprises a fusion unit and a classification unit;
inputting the first characteristic information and the second characteristic information into the fusion unit for characteristic fusion to obtain target characteristic information;
inputting the target characteristic information into the classification unit to obtain a living body probability and a non-living body probability, wherein the living body probability represents the probability that the object to be distinguished is a living body, and the non-living body probability represents the probability that the object to be distinguished is a non-living body;
and generating a living body judgment result according to the living body probability and the non-living body probability.
9. An electronic device, characterized in that the electronic device comprises a memory, a processor, a program stored on the memory and executable on the processor, and a data bus for connection and communication between the processor and the memory, wherein the program, when executed by the processor, implements the steps of the living body discrimination method according to any one of claims 1 to 8.
10. A storage medium for computer-readable storage, characterized in that the storage medium stores one or more programs executable by one or more processors to implement the steps of the living body discrimination method according to any one of claims 1 to 8.
CN202210252551.1A 2022-03-15 2022-03-15 Living body discrimination method, electronic device, and storage medium Pending CN114627534A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210252551.1A CN114627534A (en) 2022-03-15 2022-03-15 Living body discrimination method, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
CN114627534A true CN114627534A (en) 2022-06-14

Family

ID=81901868

Country Status (1)

Country Link
CN (1) CN114627534A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107818313A (en) * 2017-11-20 2018-03-20 腾讯科技(深圳)有限公司 Vivo identification method, device, storage medium and computer equipment
CN110598580A (en) * 2019-08-25 2019-12-20 南京理工大学 Human face living body detection method
CN111178341A (en) * 2020-04-10 2020-05-19 支付宝(杭州)信息技术有限公司 Living body detection method, device and equipment
CN112149570A (en) * 2020-09-23 2020-12-29 平安科技(深圳)有限公司 Multi-person living body detection method and device, electronic equipment and storage medium
CN112215045A (en) * 2019-07-12 2021-01-12 普天信息技术有限公司 Living body detection method and device
CN113033465A (en) * 2021-04-13 2021-06-25 北京百度网讯科技有限公司 Living body detection model training method, device, equipment and storage medium
CN113158773A (en) * 2021-03-05 2021-07-23 普联技术有限公司 Training method and training device for living body detection model
CN113269010A (en) * 2020-02-14 2021-08-17 深圳云天励飞技术有限公司 Training method and related device for human face living body detection model
CN113496215A (en) * 2021-07-07 2021-10-12 浙江大华技术股份有限公司 Method and device for detecting human face of living body and electronic equipment
CN113792581A (en) * 2021-08-02 2021-12-14 深圳市一心视觉科技有限公司 Living body detection method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination