CN112861586B - Living body detection, image classification and model training method, device, equipment and medium - Google Patents

Living body detection, image classification and model training method, device, equipment and medium

Info

Publication number
CN112861586B
CN112861586B (application CN201911186211.8A)
Authority
CN
China
Prior art keywords
deformable
image
depth
convolution
map
Prior art date
Legal status
Active
Application number
CN201911186211.8A
Other languages
Chinese (zh)
Other versions
CN112861586A (en)
Inventor
付华
赵立军
高砚
Current Assignee
Mashang Consumer Finance Co Ltd
Original Assignee
Mashang Consumer Finance Co Ltd
Priority date
Filing date
Publication date
Application filed by Mashang Consumer Finance Co Ltd filed Critical Mashang Consumer Finance Co Ltd
Priority to CN201911186211.8A
Publication of CN112861586A
Application granted
Publication of CN112861586B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/40 Spoof detection, e.g. liveness detection
    • G06V40/45 Detection of the body part being alive
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a living body detection, image classification and model training method, device, equipment and medium, relates to the technical field of data processing, and aims to improve the speed of living body detection. The method comprises the following steps: acquiring a target face image group, wherein the target face image group comprises a frame of RGB (red, green, blue) image and a frame of depth image corresponding to the RGB image; fusing the RGB image and the depth image in a first fusion mode to obtain a first fusion image; and inputting the first fusion image into a first model to obtain a first living body detection result. The embodiment of the invention can improve the speed of living body detection.

Description

Living body detection, image classification and model training method, device, equipment and medium
Technical Field
The invention relates to the technical field of image processing, in particular to a method, a device, equipment and a medium for living body detection, image classification and model training.
Background
With the wide application of technologies such as face recognition and face unlocking in daily-life scenarios such as finance, access control and mobile devices, face anti-spoofing/living body detection (Face Anti-Spoofing) technology has gained more and more attention in recent years. Based on deeper and more complex deep neural network models, living body detection models running on the server side can reach an accuracy of 99%. With the increase of application scenarios, a living body detection model that runs in real time on a mobile terminal is needed.
Currently, mobile terminals mostly adopt an interactive mode to perform living body detection. However, this method requires the detected object to perform cooperative actions, which is time-consuming and thus affects the detection speed.
Disclosure of Invention
The embodiment of the invention provides a method, a device, equipment and a medium for living body detection, image classification and model training, which are used for improving the speed of living body detection.
In a first aspect, an embodiment of the present invention provides a method for detecting a living body, including:
acquiring a target face image group, wherein the target face image group comprises a frame of RGB (Red, green, blue) image and a frame of depth image corresponding to the RGB image;
fusing the RGB image and the depth image in a first fusion mode to obtain a first fusion image;
and inputting the first fusion image into a first model to obtain a first living body detection result.
In a second aspect, an embodiment of the present invention further provides a model training method, including:
obtaining a model training sample set, wherein the model training sample set comprises a plurality of fusion images, and each fusion image is obtained by fusing a frame of RGB image and a frame of depth image corresponding to the RGB image;
and inputting the training sample set into a machine learning network model, and training to obtain a first model.
In a third aspect, an embodiment of the present invention further provides an image classification method, including:
acquiring a target image group, wherein the target image group comprises a frame of RGB image and a frame of depth image corresponding to the RGB image;
fusing the RGB image and the depth image in a first fusion mode to obtain a first fusion image;
and inputting the first fusion image into a first model to obtain an image classification result.
In a fourth aspect, an embodiment of the present invention further provides a living body detection apparatus, including:
the first acquisition module is configured to acquire a target face image group, wherein the target face image group comprises a frame of RGB image and a frame of depth image corresponding to the RGB image; the size of a face area in the RGB image meets a first preset requirement, and the depth of the depth image meets a second preset requirement;
the first fusion module is used for fusing the RGB image and the depth image in a first fusion mode to obtain a first fusion image;
and the first processing module is used for inputting the first fusion image into a first model to obtain a first living body detection result.
In a fifth aspect, an embodiment of the present invention further provides a model training apparatus, including:
the first acquisition module is configured to acquire a model training sample set, wherein the model training sample set comprises a plurality of fusion images, and each fusion image is obtained by fusing a frame of RGB image and a frame of depth image corresponding to the RGB image;
and the training module is used for inputting the training sample set into a machine learning network model and training to obtain a first model.
In a sixth aspect, an embodiment of the present invention further provides an image classification apparatus, including:
the first acquisition module is configured to acquire a target image group, wherein the target image group comprises a frame of RGB image and a frame of depth image corresponding to the RGB image;
the first fusion module is used for fusing the RGB image and the depth image in a first fusion mode to obtain a first fusion image;
and the first processing module is used for inputting the first fusion image into a first model to obtain an image classification result.
In a seventh aspect, an embodiment of the present invention further provides an electronic device, including: a transceiver, a memory, a processor and a program stored on the memory and executable on the processor, the processor implementing the steps in the method according to the first aspect or the second aspect or the third aspect as described above when executing the program.
In an eighth aspect, the embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the method according to the first aspect, the second aspect, or the third aspect described above.
In the embodiment of the invention, a single-frame RGB image in the acquired target face image group and the corresponding depth image are fused, and the fused result is used as the input of the model to obtain the living body detection result. Therefore, with the method and device provided by the embodiment of the invention, the detected object does not need to perform cooperative actions, so that the detection speed is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of a method for detecting a living body according to an embodiment of the present invention;
FIG. 2 is a flowchart of selecting a target face image group according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an image fusion process provided by an embodiment of the invention;
FIG. 4 is a second flowchart of a method for detecting a living body according to an embodiment of the present invention;
FIG. 5 is a block diagram of three modules stacked into CNN Stem according to an embodiment of the present invention;
FIG. 6 is a third flowchart of a method for detecting a living body according to an embodiment of the present invention;
FIG. 7 is a block diagram of a Fire Module provided in accordance with an embodiment of the present invention;
FIG. 8 is a flow chart of a model training method provided by an embodiment of the present invention;
FIG. 9 is a flowchart of an image classification method provided by an embodiment of the invention;
FIG. 10 is a structural diagram of a living body detection apparatus provided in an embodiment of the present invention;
FIG. 11 is a block diagram of a model training apparatus according to an embodiment of the present invention;
FIG. 12 is a block diagram of an image classification apparatus provided in an embodiment of the present invention;
FIG. 13 is a block diagram of an electronic device according to an embodiment of the present invention;
FIG. 14 is a second block diagram of an electronic device according to an embodiment of the invention;
FIG. 15 is a third structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
Referring to fig. 1, fig. 1 is a flowchart of a living body detection method according to an embodiment of the present invention, which is applied to an electronic device, such as a mobile terminal. As shown in fig. 1, the method comprises the following steps:
step 101, obtaining a target face image group, wherein the target face image group comprises a frame of RGB image and a frame of depth image corresponding to the RGB image.
In the embodiment of the invention, the target face image group can be acquired through the camera provided by the electronic equipment. In practical application, a plurality of face image groups can be acquired through a camera provided by electronic equipment. In the embodiment of the invention, in order to improve the accuracy of judgment, the size of a face area in an RGB image in a target face image group is required to meet a first preset requirement, and the depth of a depth map is required to meet a second preset requirement. The first preset requirement and the second preset requirement can be set according to needs.
For example, the first preset requirement may be that the size of the face region is greater than a certain preset value, and the second preset requirement may be that the depth is greater than a certain preset value.
Thus, prior to step 101, the method may further comprise: the method comprises the steps of obtaining a face image group to be detected, wherein the face image group to be detected comprises a frame of RGB image and a frame of depth image corresponding to the RGB image, and then selecting a target face image group from the face image group to be detected.
Referring to fig. 2, a process of selecting the target face image group is shown. For one frame of RGB image in the obtained face image group to be detected and the corresponding frame of depth image, it is first judged whether a face area exists in the RGB image. If so, the subsequent processing continues; otherwise, the face image group can be acquired again. When a face area exists, the face area in the RGB image is determined, and it is judged whether the size of the face area meets the requirement. If it does, the subsequent processing continues; otherwise, the face image group is acquired again. When the size of the face area meets the preset requirement, the face area is cropped out of the RGB image. Within the cropped face area, the pixel positions of the RGB image and the depth image correspond to each other one by one. It is then judged whether the depth of the cropped face area meets the requirement. If it does, the subsequent processing continues; otherwise, the face image group can be acquired again. At the same time, it is judged whether the face in the cropped face area is occluded. If not, the subsequent processing continues; otherwise, the face image group can be acquired again. If the face is not occluded and the depth of the cropped face area meets the preset requirement, the cropped images can be used as the target face image group for subsequent processing.
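A minimal Python sketch of this selection flow is given below. The helper routines capture_image_group, detect_face and face_is_occluded are hypothetical placeholders, and the two threshold values merely stand in for the first and second preset requirements; none of these names or numbers come from the patent text.

```python
# Minimal sketch of the target-face-group selection flow described above.
import numpy as np

MIN_FACE_SIZE = 96       # assumed "first preset requirement" (pixels)
MIN_MEAN_DEPTH = 200.0   # assumed "second preset requirement" (depth units)

def select_target_group(capture_image_group, detect_face, face_is_occluded):
    while True:
        rgb, depth = capture_image_group()      # one RGB frame + aligned depth frame
        box = detect_face(rgb)                  # returns (x, y, w, h) or None
        if box is None:
            continue                            # no face: re-acquire the image group
        x, y, w, h = box
        if min(w, h) < MIN_FACE_SIZE:
            continue                            # face region too small
        face_rgb = rgb[y:y + h, x:x + w]
        face_depth = depth[y:y + h, x:x + w]    # pixel positions correspond one by one
        valid = face_depth[face_depth > 0]
        if valid.size == 0 or valid.mean() < MIN_MEAN_DEPTH:
            continue                            # depth does not meet the requirement
        if face_is_occluded(face_rgb):
            continue                            # face is occluded: re-acquire
        return face_rgb, face_depth             # the target face image group
```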
And step 102, fusing the RGB image and the depth image in a first fusion mode to obtain a first fusion image.
Referring to fig. 3, in an embodiment of the present invention, the fusion manner may include the following:
(1) Only the Depth map is reserved, obtaining a single-channel map (denoted as A, Depth(1));
(2) The Depth map is mapped into a Color map (denoted as B), and the Color map and the RGB map are superposed (for example, superposed with different weights) to obtain a three-channel map (Depth(3) + Color(3));
(3) Only the depth map is reserved to obtain a single-channel map; the single-channel map is added as an Alpha channel to the RGB map to obtain a four-channel map (Color(3) + Depth(A));
(4) The Depth map is mapped into a color map (denoted as B) (Depth(3));
(5) The RGB map is converted into a single-channel gray-scale map, and the depth map is mapped into a color map; the single-channel gray-scale map is added as an Alpha channel to the Color map, resulting in a four-channel map (Depth(3) + Color(A)).
Accordingly, in this step, the first fusion mode may be any one of the above fusion modes. Specifically, the RGB map and the depth map are fused in the first fusion mode according to any one of the following manners to obtain the first fused image:
only the depth map is reserved to obtain a first single-channel map; or
Mapping the depth map into a first color map, and superposing the first color map and the RGB map to obtain a three-channel map; or
Only the depth map is reserved to obtain a second single channel map; adding the second single-channel image to an Alpha channel of the RGB image to obtain a four-channel image; or
Mapping the depth map into a second color map; or
Converting the RGB map to a single channel grayscale map, mapping the depth map to a second color map; and adding the single-channel gray image to an Alpha channel of the second color image to obtain a four-channel image.
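For illustration only, the five fusion modes can be sketched with OpenCV and NumPy as follows. The inputs are assumed to be an 8-bit H×W×3 RGB map and an 8-bit single-channel depth map that are already aligned and scaled to 0-255; the JET color map and the 0.5/0.5 superposition weights are arbitrary choices, not values given in the patent.

```python
# Illustrative sketch of the five RGB/depth fusion modes described above.
import cv2
import numpy as np

def fuse(rgb, depth, mode, w_rgb=0.5, w_depth=0.5):
    if mode == 1:                                   # (1) depth only -> 1 channel
        return depth
    if mode == 2:                                   # (2) colorized depth + RGB -> 3 channels
        depth_color = cv2.applyColorMap(depth, cv2.COLORMAP_JET)
        return cv2.addWeighted(rgb, w_rgb, depth_color, w_depth, 0)
    if mode == 3:                                   # (3) depth as Alpha channel of RGB -> 4 channels
        return np.dstack([rgb, depth])
    if mode == 4:                                   # (4) colorized depth alone -> 3 channels
        return cv2.applyColorMap(depth, cv2.COLORMAP_JET)
    if mode == 5:                                   # (5) colorized depth + gray RGB as Alpha -> 4 channels
        gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
        depth_color = cv2.applyColorMap(depth, cv2.COLORMAP_JET)
        return np.dstack([depth_color, gray])
    raise ValueError("unknown fusion mode")
```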
And 103, inputting the first fusion image into a first model to obtain a first living body detection result.
In the embodiment of the present invention, the first model may be, for example, one of FeatherNetA (Feather Network A), FeatherNetB, MobileNet, ShuffleNet, EfficientNet, and SqueezeNet.
In the embodiment of the present invention, FeatherNet is taken as an example and is modified to serve as the first model. Thus, in the embodiments of the present invention, FeatherNet may refer to the modified FeatherNet.
The CNN Stem (convolutional neural network backbone) of the FeatherNet of the embodiment of the invention comprises a Deformable Depthwise Convolution (DDWConv); a Deformable Depthwise Convolution is also included in the Streaming Module of FeatherNet.
The Deformable Depthwise Convolution is obtained by combining Depthwise Convolution (DWConv) with Deformable Convolution (for example, Deformable Convolution V2, the second version of Deformable Convolution).
Alternatively, in practical application, the CNN Stem of FeatherNet is a 3 × 3 Deformable Depthwise Convolution, which is obtained by combining the 3 × 3 Depthwise Convolution with any one of the following Convolution modes: Deformable Convolution, or Dilated Convolution.
Alternatively, the CNN Stem of FeatherNet is a combination of a 1 × 3 DDWConv and a 3 × 1 DDWConv.
The Streaming Module of FeatherNet is a k × k Deformable Depthwise Convolution, which is obtained by combining the k × k Depthwise Convolution with any one of the following Convolution modes: Deformable Convolution, or Dilated Convolution.
Specifically, the k × k Deformable Depthwise Convolution may be a 7 × 7 Deformable Depthwise Convolution, and correspondingly the k × k Depthwise Convolution is a 7 × 7 Depthwise Convolution.
Alternatively, the Streaming Module of FeatherNet is a combination of a 1 × k Deformable Depthwise Convolution and a k × 1 Deformable Depthwise Convolution. Specifically, the 1 × k Deformable Depthwise Convolution may be a 1 × 7 Deformable Depthwise Convolution, and the k × 1 Deformable Depthwise Convolution may be a 7 × 1 Deformable Depthwise Convolution.
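The following is a minimal, illustrative PyTorch sketch of such a Deformable Depthwise Convolution: a depthwise convolution (groups equal to the number of channels) whose sampling offsets and modulation weights are predicted by a small side branch, in the spirit of Deformable Convolution V2. It is a reconstruction for illustration, not the patented network definition, and it assumes a torchvision version whose DeformConv2d forward() accepts a mask argument.

```python
# Illustrative sketch of a 3x3 Deformable Depthwise Convolution (DDWConv).
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DDWConv(nn.Module):
    def __init__(self, channels, kernel_size=3, stride=1):
        super().__init__()
        k, pad = kernel_size, kernel_size // 2
        # 2*k*k offset channels (x and y per sampling point) + k*k modulation weights
        self.offset_mask = nn.Conv2d(channels, 3 * k * k, k, stride, pad)
        self.dconv = DeformConv2d(channels, channels, k, stride, pad, groups=channels)
        self.split = 2 * k * k

    def forward(self, x):
        om = self.offset_mask(x)
        offset, mask = om[:, :self.split], om[:, self.split:]
        return self.dconv(x, offset, torch.sigmoid(mask))

# e.g. DDWConv(32)(torch.randn(1, 32, 56, 56)) keeps the (1, 32, 56, 56) shape
```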
In the embodiment of the invention, a single-frame RGB image in the acquired target face image group and the corresponding depth image are fused, and the fused result is used as the input of the model to obtain the living body detection result. Therefore, with the method and device provided by the embodiment of the invention, the detected object does not need to perform cooperative actions, so that the detection speed is improved.
Referring to fig. 4, fig. 4 is a flowchart of a living body detection method according to an embodiment of the present invention, which is applied to an electronic device, such as a mobile terminal. As shown in fig. 4, the method comprises the following steps:
step 401, train the first model.
Wherein the first model may comprise FeatherNet.
Taking FeatherNet as an example, in the embodiment of the present invention, FeatherNet is modified to obtain the modified FeatherNet. In the embodiment of the invention, FeatherNet is mainly formed by connecting a CNN Stem network and a Streaming Module. According to the difference in the CNN Stem, it can be divided into two FeatherNet types: FeatherNetA and FeatherNetB.
The CNN Stem of FeatherNet comprises a 3 × 3 Deformable Depthwise Convolution; a 7 × 7 Deformable Depthwise Convolution is included in the Streaming Module of FeatherNet. The Deformable Depthwise Convolution is obtained by combining a 3 × 3 Depthwise Convolution (DWConv) with Deformable Convolution V2. Alternatively, in practical application, the 3 × 3 Deformable Depthwise Convolution is obtained by combining the 3 × 3 Depthwise Convolution with any one of the following Convolution modes: Deformable Convolution, or Dilated Convolution (hole Convolution); or the CNN Stem of FeatherNet is a combination of a 1 × 3 Deformable Depthwise Convolution and a 3 × 1 Deformable Depthwise Convolution;
the Streaming Module of FeatherNet is a 7 × 7 Deformable Depthwise Convolution, which is obtained by combining the 7 × 7 Depthwise Convolution with any one of the following Convolution modes: Deformable Convolution, or Dilated Convolution (hole Convolution); or the Streaming Module of FeatherNet is a combination of a 1 × k Deformable Depthwise Convolution and a k × 1 Deformable Depthwise Convolution, for example, the 1 × k Deformable Depthwise Convolution is a 1 × 7 Deformable Depthwise Convolution and the k × 1 Deformable Depthwise Convolution is a 7 × 1 Deformable Depthwise Convolution, where k is a positive integer greater than 1.
FIG. 5 is a block diagram of three blocks, namely Block A, Block B and Block C, stacked into the CNN Stem. Specifically, referring to FIG. 5, in the embodiment of the present invention, in the three modules, the 3 × 3 DWConv in the original CNN Stem is combined with Deformable Convolution V2 to obtain the Deformable Depthwise Convolution (DDWConv): that is, 3 more dimensions are added to the 3 × 3 DWConv for learning the offsets (in the x and y directions) and the weight term of each sampling position, respectively. Meanwhile, the 3 × 3 DWConv is replaced with the 3 × 3 DDWConv.
Meanwhile, the 7 × 7 DWConv in the original Streaming Module is replaced by a 7 × 7 DDWConv, which preserves the design intention of the Streaming Module and allows an Effective Receptive Field to be learned.
Deformable Convolution V2 adds a weight term on the basis of the V1 version and achieves a better effect.
In this step, a model training sample set is obtained, where the model training sample set includes a plurality of fusion images, where each fusion image is obtained by performing fusion processing on a frame of RGB image and a frame of depth image corresponding to the RGB image, and then the training sample set is input into a machine learning network model to train to obtain the first model.
Step 402, obtaining a face image group to be detected, wherein the face image group to be detected comprises a frame of RGB image and a frame of depth image corresponding to the RGB image.
And 403, selecting a target face image group from the face image group to be detected. The target face image group comprises a frame of RGB image and a frame of depth image corresponding to the RGB image.
And step 404, fusing the RGB image and the depth image of the target face image group in a first fusion mode to obtain a first fusion image.
Step 405, inputting the first fusion image into the first model to obtain a first living body detection result.
In the embodiment of the present invention, the first living body detection result may be a numerical value. By comparing the value with a preset threshold value, whether a real face image is included can be determined. In addition, if the value of the first living body detection result meets a preset requirement, for example, the value is within a certain value range, subsequent cascade judgment can be performed in order to improve the accuracy of the detection result.
Step 406, fusing the RGB image and the depth image in a second fusion mode to obtain a second fusion image; the second fusion mode is different from the first fusion mode.
The specific contents of the first fusion mode and the second fusion mode can refer to the description of the foregoing embodiments.
And 407, inputting the second fusion image into the first model or the second model to obtain a second living body detection result.
Wherein the first model and the second model are different models. The second model may be, for example, a SqueezeNet. In practical applications, the second model may also be trained in advance.
And step 408, obtaining a final living body detection result according to the first living body detection result and the second living body detection result.
In the embodiment of the present invention, the second living body detection result may also be a numerical value. The first and second living body detection results are then combined by an operation, and the result of the operation is taken as the final living body detection result.
The operation comprises any one of the following: calculating the product of the first living body detection result and a first weighting value, calculating the product of the second living body detection result and a second weighting value, and summing the obtained products; or calculating the average value of the first and second living body detection results. Of course, there may be other calculation manners in practical application, and the embodiments of the present invention are not limited thereto.
The obtained operation value is compared with a certain preset value to determine whether a real face image is included.
After the first living body detection result is obtained, the second living body detection result is obtained, and the first and second living body detection results are integrated to obtain the final living body detection result. Through this cascade detection, the accuracy of the detection result can be improved.
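A minimal sketch of this score combination is shown below, assuming both detection results are scalar scores in [0, 1]; the weights and the decision threshold are illustrative values, not values given in the patent.

```python
# Minimal sketch of combining the two liveness scores into a final result.
def final_liveness(score1, score2, w1=0.6, w2=0.4, threshold=0.5, use_average=False):
    fused = (score1 + score2) / 2 if use_average else w1 * score1 + w2 * score2
    return fused, fused >= threshold   # (final score, judged to be a real face)
```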
In the embodiment of the invention, a single-frame RGB image in the acquired target face image group and the corresponding depth image are fused, and the fused result is used as the input of the model to obtain the living body detection result. Therefore, with the solution provided by the embodiment of the invention, the detected object does not need to perform cooperative actions, so that the detection speed is improved. In addition, since FeatherNet is adopted in the scheme of the embodiment of the invention and the model is very small, the scheme is suitable for being deployed on mobile terminals.
Referring to fig. 6, fig. 6 is a flowchart of a method for detecting a living body according to an embodiment of the present invention, which is applied to an electronic device, such as a mobile terminal. As shown in fig. 6, the method comprises the following steps:
step 601, training a first model.
Wherein the first model may comprise SqueezeNet, and the like.
Taking SqueezeNet as an example, in the embodiment of the invention, SqueezeNet is improved to obtain the improved SqueezeNet. In the embodiment of the present invention, the SqueezeNet includes a Fire Module and a Streaming Module.
FIG. 7 is a block diagram of the Fire Module in an embodiment of the present invention. The Fire Module comprises a Squeeze layer, an Expand layer and a BatchNorm layer. The function of the Squeeze layer and the Expand layer is the same as in the prior art; the difference is that the Squeeze layer and the Expand layer perform the Convolution operation by using a 1 × 1 Convolution kernel and a Deformable Depthwise Convolution (DDWConv). The BatchNorm layer is used to make the model converge; by converging the model, the speed of obtaining an accurate model can be increased. The Deformable Depthwise Convolution is obtained by combining a 3 × 3 Depthwise Convolution (DWConv) with Deformable Convolution V2. Alternatively, in practical application, the Deformable Depthwise Convolution is obtained by combining the 3 × 3 DWConv with any one of the following Convolution kernels: Deformable Convolution V1, Dilated Convolution, or a combination of 1 × 3 and 3 × 1 Convolution kernels.
The Streaming Module is used for carrying out weighting calculation on each region of the image to be processed, so that the accuracy of the model can be improved.
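A minimal sketch of such a modified Fire Module is given below. DDWConv refers to the deformable depthwise convolution class sketched earlier; the channel sizes and the extra 1 × 1 projection after the depthwise branch are assumptions made so that the two expand branches produce the same number of channels, and are not taken from the patent.

```python
# Illustrative sketch of the modified Fire Module: 1x1 squeeze, an expand stage
# mixing a 1x1 convolution with a deformable depthwise convolution, and a
# BatchNorm layer to help the model converge.
import torch
import torch.nn as nn

class FireModule(nn.Module):
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Sequential(nn.Conv2d(in_ch, squeeze_ch, 1), nn.ReLU(inplace=True))
        self.expand_1x1 = nn.Conv2d(squeeze_ch, expand_ch, 1)
        self.expand_ddw = nn.Sequential(
            DDWConv(squeeze_ch),                  # deformable depthwise branch (class sketched earlier)
            nn.Conv2d(squeeze_ch, expand_ch, 1),  # pointwise projection (assumed)
        )
        self.bn = nn.BatchNorm2d(2 * expand_ch)

    def forward(self, x):
        s = self.squeeze(x)
        out = torch.cat([self.expand_1x1(s), self.expand_ddw(s)], dim=1)
        return torch.relu(self.bn(out))
```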
In this step, a model training sample set is obtained, where the model training sample set includes multiple fusion images, where each fusion image is obtained by fusing a frame of RGB image and a frame of depth image corresponding to the RGB image, and then the training sample set is input into a machine learning network model to train and obtain the first model.
Step 602, a face image group to be detected is obtained, wherein the face image group to be detected comprises a frame of RGB image and a frame of depth image corresponding to the RGB image.
Step 603, selecting a target face image group from the face image group to be detected. The target face image group comprises a frame of RGB image and a frame of depth image corresponding to the RGB image; the size of the face area in the RGB image meets a first preset requirement, and the depth of the depth image meets a second preset requirement.
And step 604, fusing the RGB image and the depth image of the target face image group in a first fusion mode to obtain a first fusion image.
Step 605, inputting the first fusion image into the first model to obtain a first living body detection result.
In the embodiment of the present invention, the first living body detection result may be a numerical value. By comparing the value with a preset threshold value, whether a real face image is included can be determined. In addition, if the value of the first living body detection result meets a preset requirement, for example, the value is within a certain value range, subsequent cascade judgment can be performed in order to improve the accuracy of the detection result.
Step 606, fusing the RGB image and the depth image in a second fusion mode to obtain a second fusion image; the second fusion mode is different from the first fusion mode.
The specific contents of the first fusion mode and the second fusion mode can refer to the description of the foregoing embodiments.
And 607, inputting the second fusion image into the first model or the second model to obtain a second living body detection result.
Wherein the first model and the second model are different models. The second model can be, for example, FeatherNet, MobileNet, ShuffleNet, EfficientNet, and the like. In practical applications, the second model may also be trained in advance.
And step 608, obtaining a final living body detection result according to the first living body detection result and the second living body detection result.
In the embodiment of the present invention, the second living body detection result may also be a numerical value. The first and second living body detection results are then combined by an operation, and the result of the operation is taken as the final living body detection result.
The operation comprises any one of the following: calculating the product of the first living body detection result and a first weighting value, calculating the product of the second living body detection result and a second weighting value, and summing the obtained products; or calculating the average value of the first and second living body detection results. Of course, there may be other calculation manners in practical application, and the embodiments of the present invention are not limited thereto.
The obtained operation value is compared with a certain preset value to determine whether a real face image is included.
After the first living body detection result is obtained, the second living body detection result is obtained, and the first and second living body detection results are integrated to obtain the final living body detection result. Through this cascade detection, the accuracy of the detection result can be improved.
In the embodiment of the invention, a single-frame RGB image in the acquired target face image group and the corresponding depth image are fused, and the fused result is used as the input of the model to obtain the living body detection result. Therefore, with the solution provided by the embodiment of the invention, the detected object does not need to perform cooperative actions, so that the detection speed is improved. In addition, since SqueezeNet is adopted in the scheme of the embodiment of the invention and the model is very small, the scheme is suitable for being deployed on mobile terminals.
Referring to fig. 8, fig. 8 is a flowchart of a model training method according to an embodiment of the present invention. As shown in fig. 8, the method comprises the following steps:
step 801, obtaining a model training sample set, wherein the model training sample set comprises a plurality of fusion images, and each fusion image is obtained by fusing a frame of RGB image and a frame of depth image corresponding to the RGB image.
In this step, an image to be processed may be acquired, and then a label is added to the image to be processed. The image to be processed comprises a frame of RGB image and a frame of depth image corresponding to the RGB image. When labeling, both the RGB map and the depth map may be labeled, or only the RGB map or the depth map may be labeled. The annotation is used for indicating whether a real face image exists in the image. And then, fusing the RGB image and the depth image to obtain a fused image. The fusion mode can refer to the description of the foregoing embodiments.
In the method, an α-balanced focal loss (α-Balanced Focal Loss) is used as the loss function for training the classification model, and labels are added to the image to be processed, so that the problems of unbalanced class distribution and of unevenly distributed easy and hard training samples can be effectively alleviated, and the generalization capability and accuracy of the model are improved.
The balanced cross-entropy (focal) loss function is calculated as follows:
FL(p_t) = -α_t · (1 - p_t)^γ · log(p_t)
where FL is a cross-entropy loss function with a dynamically adjustable scale and contains two parameters, α_t and γ: α_t mainly addresses the imbalance between positive and negative samples, and γ mainly addresses the imbalance between easy and hard samples.
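A minimal PyTorch sketch of this α-balanced focal loss for the binary live/spoof case is given below; α = 0.25 and γ = 2.0 are common defaults, not values specified in the patent.

```python
# Minimal sketch of the alpha-balanced focal loss for binary classification.
# Assumes raw logits and 0/1 targets of the same shape.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    targets = targets.float()
    p = torch.sigmoid(logits)
    p_t = torch.where(targets == 1, p, 1 - p)                  # probability of the true class
    alpha_t = torch.where(targets == 1, torch.full_like(p, alpha), torch.full_like(p, 1 - alpha))
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")  # -log(p_t)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()          # FL = -a_t (1 - p_t)^gamma log(p_t)
```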
And step 802, inputting the training sample set into a machine learning network model, and training to obtain a first model.
In an embodiment of the present invention, the first model includes one of FeatherNet, MobileNet, ShuffleNet, EfficientNet, and SqueezeNet.
Taking FeatherNet as an example, calculation is performed by using the Deformable Depthwise Convolution in the convolutional neural network backbone (CNN Stem) of FeatherNet, and by using the Deformable Depthwise Convolution in the Streaming Module of FeatherNet; the Deformable Depthwise Convolution is obtained by combining the Depthwise Convolution with the Deformable Convolution. Alternatively, the CNN Stem of FeatherNet is a 3 × 3 Deformable Depthwise Convolution, which is obtained by combining the 3 × 3 Depthwise Convolution with any one of the following Convolution modes: Deformable Convolution, or Dilated Convolution (hole Convolution); or the CNN Stem of FeatherNet is a combination of a 1 × 3 Deformable Depthwise Convolution and a 3 × 1 Deformable Depthwise Convolution.
The Streaming Module of FeatherNet is a 7 × 7 Deformable Depthwise Convolution, which is obtained by combining the 7 × 7 Depthwise Convolution with any one of the following Convolution modes: Deformable Convolution, or Dilated Convolution (hole Convolution).
Alternatively, the Streaming Module of FeatherNet is a combination of a 1 × k Deformable Depthwise Convolution and a k × 1 Deformable Depthwise Convolution. Specifically, the 1 × k Deformable Depthwise Convolution may be a 1 × 7 Deformable Depthwise Convolution, and the k × 1 Deformable Depthwise Convolution may be a 7 × 1 Deformable Depthwise Convolution, where k is a positive integer greater than 1.
By combining DWConv with Deformable Convolution V2 in FeatherNet, the Convolution kernel is made to concentrate on a more effective receptive area, which strengthens the feature extraction of the model and improves the model accuracy; moreover, DWConv reduces the model size, making the model more suitable for use on mobile terminals.
On the basis of the above embodiment, pruning and retraining can be performed on the trained model, so that the model size is further reduced.
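A minimal sketch of such pruning with torch.nn.utils.prune is shown below; the 30% sparsity and the choice of L1 unstructured pruning are assumptions, and the fine-tuning (retraining) loop itself is omitted.

```python
# Minimal sketch of magnitude pruning followed by retraining.
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_model(model, amount=0.3):
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            prune.l1_unstructured(module, name="weight", amount=amount)  # zero the smallest weights
    # ... fine-tune (retrain) the pruned model here, then make pruning permanent:
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            prune.remove(module, "weight")
    return model
```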
As can be seen from the above description, in the embodiment of the present invention, the single-frame RGB image and the depth image are fused in multiple ways, so that the processing speed is increased, and the accuracy of the detection result is improved through the cascade judgment. The FeatherNet model in the embodiment of the invention is small and is therefore suitable for running on a mobile terminal. Meanwhile, in the process of obtaining the FeatherNet model, DWConv and Deformable Convolution V2 are combined, so that the Convolution kernel concentrates on a more effective receptive area to enhance the feature extraction of the model, the accuracy of the model is improved, and the model size can be reduced.
Referring to fig. 9, fig. 9 is a flowchart of an image classification method according to an embodiment of the present invention. As shown in fig. 9, the method comprises the following steps:
step 901, obtaining a target image group, wherein the target image group includes a frame of RGB image and a frame of depth image corresponding to the RGB image.
The target image group may be an image including any content, such as a human face, a landscape, and the like.
And 902, fusing the RGB image and the depth image in a first fusion mode to obtain a first fusion image.
The fusion mode can be referred to the description of the previous embodiment.
And 903, inputting the first fusion image into a first model to obtain an image classification result.
Different image classification results can be obtained for different classification targets. For example, the classification result may distinguish images containing a face from images not containing a face, or images containing a landscape from images not containing a landscape, and so on; the image classification method can be applied to the field of living body detection as well as to other fields. For the first model, reference may be made to the structure of the FeatherNet model and the corresponding training process in the previous embodiments.
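A minimal inference sketch is shown below, assuming the trained first model outputs class logits and the fused image is an H×W×C array; the preprocessing here is simplified to a type conversion and is not prescribed by the patent.

```python
# Minimal sketch: classify one fused image with the trained first model.
import torch

def classify(model, fused_image, device="cpu"):
    x = torch.from_numpy(fused_image).float().permute(2, 0, 1).unsqueeze(0).to(device)
    model.eval()
    with torch.no_grad():
        probs = torch.softmax(model(x), dim=1)
    return probs.argmax(dim=1).item(), probs.max().item()   # (class index, confidence)
```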
In the embodiment of the invention, the acquired single-frame RGB image and the corresponding depth image are fused, and the fused result is used as the input of the model, so that the image classification result is obtained. Therefore, the speed of image classification is improved by using the device provided by the embodiment of the invention.
The embodiment of the invention also provides a living body detection apparatus. Referring to fig. 10, fig. 10 is a structural diagram of a living body detection apparatus according to an embodiment of the present invention. Since the principle by which the living body detection apparatus solves the problem is similar to that of the living body detection method in the embodiment of the present invention, the implementation of the apparatus can refer to the implementation of the method, and the repeated description is omitted.
As shown in fig. 10, the living body detecting apparatus includes: a first obtaining module 1001, configured to obtain a target face image group, where the target face image group includes a frame of RGB image and a frame of depth image corresponding to the RGB image; a first fusion module 1002, configured to fuse the RGB map and the depth map in a first fusion manner to obtain a first fusion image; the first processing module 1003 is configured to input the first fused image into a first model, so as to obtain a first living body detection result.
Optionally, the first fusion module 1002 fuses the RGB map and the depth map in a first fusion manner according to any one of the following manners to obtain a first fusion image:
only the depth map is reserved to obtain a first single-channel map; or,
mapping the depth map into a first color map, and superposing the first color map and the RGB map to obtain a three-channel map; or,
only the depth map is reserved to obtain a second single-channel map; adding the second single-channel image to an Alpha channel of the RGB image to obtain a four-channel image; or,
mapping the depth map into a second color map; or,
converting the RGB map to a single channel grayscale map, mapping the depth map to a second color map; and adding the single-channel gray-scale image to an Alpha channel of the second color image to obtain a four-channel image.
Optionally, the apparatus may further include:
the second fusion module is used for fusing the RGB image and the depth image in a second fusion mode to obtain a second fusion image; the second fusion mode is different from the first fusion mode;
the second processing module is used for inputting the second fusion image into the first model or the second model to obtain a second living body detection result; wherein the first model and the second model are different models;
and the third processing module is used for obtaining a final living body detection result according to the first living body detection result and the second living body detection result.
Optionally, the third processing module is configured to perform an operation on the first living body detection result and the second living body detection result, and use the operation result as the final living body detection result;
the operation comprises any one of the following:
calculating the product of the first living body detection result and a first weighting value, calculating the product of the second living body detection result and a second weighting value, and summing the obtained products; or
calculating the average value of the first and second living body detection results.
Optionally, the apparatus may further include:
the second acquisition module is used for acquiring a face image group to be detected, wherein the face image group to be detected comprises a frame of RGB image and a frame of depth image corresponding to the RGB image;
and the selection module is used for selecting the target face image group from the face image group to be detected.
The meaning of the first model can be referred to the description of the method embodiments described above.
The apparatus provided in the embodiment of the present invention may implement the method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
The embodiment of the invention also provides a model training device. Referring to fig. 11, fig. 11 is a structural diagram of a model training apparatus according to an embodiment of the present invention. Because the principle of solving the problem of the model training device is similar to the model training method in the embodiment of the invention, the implementation of the model training device can refer to the implementation of the method, and repeated details are not repeated.
As shown in fig. 11, the model training apparatus includes: a first obtaining module 1101, configured to obtain a model training sample set, where the model training sample set includes multiple fusion images, and each fusion image is obtained by fusing a frame of RGB image and a frame of depth image corresponding to the RGB image; the training module 1102 is configured to input the training sample set into a machine learning network model, and train to obtain a first model.
Optionally, calculation is performed by using the Deformable Depthwise Convolution in the convolutional neural network backbone (CNN Stem) of FeatherNet;
and by using the Deformable Depthwise Convolution in the Streaming Module of FeatherNet;
wherein the CNN Stem of FeatherNet is a 3 × 3 Deformable Depthwise Convolution, which is obtained by combining the 3 × 3 Depthwise Convolution with any one of the following Convolution modes: Deformable Convolution, or Dilated Convolution (hole Convolution); or the CNN Stem of FeatherNet is a combination of a 1 × 3 Deformable Depthwise Convolution and a 3 × 1 Deformable Depthwise Convolution.
The Streaming Module of FeatherNet is a k × k Deformable Depthwise Convolution, which is obtained by combining the k × k Depthwise Convolution with any one of the following Convolution modes: Deformable Convolution, or Dilated Convolution (hole Convolution); or the Streaming Module of FeatherNet is a combination of a 1 × k Deformable Depthwise Convolution and a k × 1 Deformable Depthwise Convolution, where k is a positive integer greater than 1.
Specifically, the k × k Deformable Depthwise Convolution may be a 7 × 7 Deformable Depthwise Convolution; correspondingly, the 1 × k Deformable Depthwise Convolution may be a 1 × 7 Deformable Depthwise Convolution, and the k × 1 Deformable Depthwise Convolution may be a 7 × 1 Deformable Depthwise Convolution.
Optionally, the fused image is obtained by any one of the following methods:
only the depth map is reserved to obtain a first single-channel map; or,
mapping the depth map into a first color map, and superposing the first color map and the RGB map to obtain a three-channel map; or,
only the depth map is reserved to obtain a second single channel map; adding the second single-channel image to an Alpha channel of the RGB image to obtain a four-channel image; or,
mapping the depth map into a second color map; or,
converting the RGB map to a single channel grayscale map, mapping the depth map to a second color map; and adding the single-channel gray-scale image to an Alpha channel of the second color image to obtain a four-channel image.
The apparatus provided in the embodiment of the present invention may implement the method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
The embodiment of the invention also provides an image classification device. Referring to fig. 12, fig. 12 is a structural diagram of an image classification apparatus according to an embodiment of the present invention. Because the principle of the image classification device for solving the problems is similar to the image classification method in the embodiment of the invention, the implementation of the image classification device can be referred to the implementation of the method, and repeated details are not repeated.
As shown in fig. 12, the image classification apparatus includes: a first obtaining module 1201, configured to obtain a target image group, where the target image group includes a frame of RGB image and a frame of depth image corresponding to the frame of RGB image; a first fusion module 1202, configured to fuse the RGB map and the depth map in a first fusion manner to obtain a first fusion image; the first processing module 1203 is configured to input the first fused image into a first model, so as to obtain an image classification result.
The meaning of the first model can be referred to the description of the method embodiments described above.
The apparatus provided in the embodiment of the present invention may implement the method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
As shown in fig. 13, the electronic device according to the embodiment of the present invention includes: a processor 1300, for reading the program in the memory 1320, executes the following processes:
acquiring a target face image group, wherein the target face image group comprises a frame of RGB image and a frame of depth image corresponding to the RGB image;
fusing the RGB image and the depth image in a first fusion mode to obtain a first fusion image;
and inputting the first fusion image into a first model to obtain a first living body detection result.
A transceiver 1310 for receiving and transmitting data under the control of the processor 1300.
In fig. 13, among other things, the bus architecture may include any number of interconnected buses and bridges with various circuits being linked together, particularly one or more processors represented by processor 1300 and memory represented by memory 1320. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art and therefore will not be described further herein. The bus interface provides an interface. The transceiver 1310 can be a number of elements, including a transmitter and a receiver, that provide a means for communicating with various other apparatus over a transmission medium. The processor 1300 is responsible for managing the bus architecture and general processing, and the memory 1320 may store data used by the processor 1300 in performing operations.
The processor 1300 is further configured to read the program and execute the following steps:
according to any one of the following modes, fusing the RGB image and the depth image in a first fusion mode to obtain a first fused image:
only the depth map is reserved to obtain a first single-channel map; or,
mapping the depth map into a first color map, and superposing the first color map and the RGB map to obtain a three-channel map; or,
only the depth map is reserved to obtain a second single channel map; adding the second single-channel image to an Alpha channel of the RGB image to obtain a four-channel image; or,
mapping the depth map into a second color map; or,
converting the RGB map to a single channel grayscale map, mapping the depth map to a second color map; and adding the single-channel gray-scale image to an Alpha channel of the second color image to obtain a four-channel image.
The processor 1300 is further configured to read the program and execute the following steps:
fusing the RGB image and the depth image in a second fusion mode to obtain a second fusion image; the second fusion mode is different from the first fusion mode;
inputting the second fusion image into the first model or the second model to obtain a second living body detection result; wherein the first model and the second model are different models;
and obtaining a final living body detection result according to the first living body detection result and the second living body detection result.
The processor 1300 is further configured to read the program and execute the following steps:
calculating the first living body detection result and the second living body detection result, and taking the calculation result as the final living body detection result;
the operation comprises any one of the following:
calculating the product of the first living body detection result and a first weighting value, calculating the product of the second living body detection result and a second weighting value, and summing the obtained products; or,
calculating the average value of the first and second living body detection results.
Wherein the meaning of the first model can refer to the description of the previous embodiments.
As shown in fig. 14, the electronic device according to the embodiment of the present invention includes: the processor 1400 is used for reading the program in the memory 1420 and executing the following processes:
obtaining a model training sample set, wherein the model training sample set comprises a plurality of fusion images, and each fusion image is obtained by fusing a frame of RGB image and a frame of depth image corresponding to the RGB image;
and inputting the training sample set into a machine learning network model, and training to obtain a first model.
A transceiver 1410 for receiving and transmitting data under the control of the processor 1400.
In fig. 14, the bus architecture may include any number of interconnected buses and bridges, particularly one or more processors represented by processor 1400 and various circuits of memory represented by memory 1420, linked together. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, power management circuits, and the like, which are well known in the art and therefore will not be described further herein. The bus interface provides an interface. The transceiver 1410 may be a number of elements, including a transmitter and a receiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 1400 is responsible for managing the bus architecture and general processing, and the memory 1420 may store data used by the processor 1400 in performing operations.
Wherein the first model comprises one of FeatherNet, MobileNet, ShuffleNet, EfficientNet and SqueezeNet.
Wherein the first model is FeatherNet; calculation is performed by using the Deformable Depthwise Convolution in the convolutional neural network backbone (CNN Stem) of FeatherNet, and by using the Deformable Depthwise Convolution in the Streaming Module of FeatherNet;
wherein the Deformable Depthwise Convolution is obtained by combining the Depthwise Convolution (DWConv) with Deformable Convolution V2.
Wherein the first model is FeatherNet; a Deformable Depthwise Convolution is used for computation in the CNN Stem of FeatherNet;
a Deformable Depthwise Convolution is used for computation in the Streaming Module of FeatherNet;
wherein the Deformable Depthwise Convolution is obtained by combining DWConv with any one of the following convolution types:
the first version of the Deformable Convolution (Deformable Convolution V1), or a Dilated Convolution (hole Convolution); or, the CNN Stem of the FeatherNet is a combination of a 1 × 3 Deformable Depthwise Convolution and a 3 × 1 Deformable Depthwise Convolution;
the Streaming Module of the FeatherNet is a k × k Deformable Depthwise Convolution, wherein k is a positive integer greater than 1; further, the k × k Deformable Depthwise Convolution is obtained by combining a k × k Depthwise Convolution with any one of the following convolution modes: a Deformable Convolution or a Dilated Convolution;
or, the Streaming Module of the FeatherNet is a combination of a 1 × k Deformable Depthwise Convolution and a k × 1 Deformable Depthwise Convolution.
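For illustration only, two of the alternatives listed above are sketched below in plain PyTorch: a depthwise convolution combined with dilation (the Dilated Convolution option, with the deformable offsets omitted for brevity) and the 1 × k / k × 1 factorized depthwise pair; the channel counts and k values are assumptions of this sketch:

# Sketch of (a) a dilated depthwise convolution and (b) a 1 x k followed by
# k x 1 depthwise pair, as in the factorized CNN Stem / Streaming Module
# variants described above. Deformable offsets are not modeled here.
import torch
import torch.nn as nn

def dilated_depthwise(channels: int, k: int = 3, dilation: int = 2) -> nn.Conv2d:
    # "same" padding for odd k: dilation * (k - 1) // 2
    return nn.Conv2d(channels, channels, k, padding=dilation * (k - 1) // 2,
                     dilation=dilation, groups=channels)

def factorized_depthwise(channels: int, k: int = 7) -> nn.Sequential:
    # 1 x k followed by k x 1, both depthwise (groups == channels).
    return nn.Sequential(
        nn.Conv2d(channels, channels, (1, k), padding=(0, k // 2), groups=channels),
        nn.Conv2d(channels, channels, (k, 1), padding=(k // 2, 0), groups=channels))

x = torch.randn(1, 64, 56, 56)
print(dilated_depthwise(64)(x).shape)     # torch.Size([1, 64, 56, 56])
print(factorized_depthwise(64)(x).shape)  # torch.Size([1, 64, 56, 56])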
Wherein the fused image is obtained by any one of the following methods:
only the depth map is retained to obtain a first single-channel map; or,
mapping the depth map into a first color map, and superimposing the first color map and the RGB map to obtain a three-channel map; or,
only the depth map is retained to obtain a second single-channel map; adding the second single-channel map as an Alpha channel of the RGB map to obtain a four-channel image; or,
mapping the depth map into a second color map; or,
converting the RGB map into a single-channel grayscale map, and mapping the depth map into a second color map; and adding the single-channel grayscale map as an Alpha channel of the second color map to obtain a four-channel image.
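For illustration only, two of the fusion modes above are sketched below using OpenCV and NumPy; the array shapes, the JET color map, and the min-max depth normalization are assumptions of this sketch rather than parameters prescribed by this embodiment:

# Sketch of two fusion modes: (a) depth mapped to a color map and blended
# with the RGB frame (three-channel), and (b) depth kept as a single channel
# and appended as an Alpha channel of the RGB frame (four-channel).
import cv2
import numpy as np

def fuse_three_channel(rgb: np.ndarray, depth: np.ndarray,
                       alpha: float = 0.5) -> np.ndarray:
    d8 = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    depth_color = cv2.applyColorMap(d8, cv2.COLORMAP_JET)       # H x W x 3
    return cv2.addWeighted(rgb, alpha, depth_color, 1.0 - alpha, 0)

def fuse_four_channel(rgb: np.ndarray, depth: np.ndarray) -> np.ndarray:
    d8 = cv2.normalize(depth, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
    return np.dstack([rgb, d8])                                 # H x W x 4

rgb = np.random.randint(0, 256, (112, 112, 3), dtype=np.uint8)
depth = np.random.randint(0, 4000, (112, 112), dtype=np.uint16)
print(fuse_three_channel(rgb, depth).shape)  # (112, 112, 3)
print(fuse_four_channel(rgb, depth).shape)   # (112, 112, 4)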
As shown in fig. 15, the electronic device according to the embodiment of the present invention includes: the processor 1500 is used for reading the program in the memory 1520 and executing the following processes:
acquiring a target image group, wherein the target image group comprises a frame of RGB image and a frame of depth image corresponding to the RGB image;
fusing the RGB image and the depth image in a first fusion mode to obtain a first fusion image;
and inputting the first fusion image into a first model to obtain an image classification result.
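For illustration only, a minimal inference sketch of this classification flow follows; the tensor conversion and the model are assumed placeholders (any of the fusion modes described earlier, and any of the candidate first models, could be substituted):

# Sketch of the classification flow: take one fused RGB+depth image,
# convert it to a tensor, and run the first model to get a class label.
import numpy as np
import torch

@torch.no_grad()
def classify(model: torch.nn.Module, fused: np.ndarray) -> int:
    """fused: H x W x C first fused image; returns the predicted class id."""
    model.eval()
    x = torch.from_numpy(fused).float().permute(2, 0, 1).unsqueeze(0) / 255.0
    logits = model(x)                      # shape (1, num_classes)
    return int(logits.argmax(dim=1).item())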
The transceiver 1510 is used to receive and transmit data under the control of the processor 1500.
In fig. 15, the bus architecture may include any number of interconnected buses and bridges, with one or more processors, represented by processor 1500, and various circuits of memory, represented by memory 1520, linked together. The bus architecture may also link together various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and therefore will not be described further herein. The bus interface provides an interface. The transceiver 1510 may be a plurality of elements, including a transmitter and a receiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 1500 is responsible for managing the bus architecture and general processing, and the memory 1520 may store data used by the processor 1500 in performing operations.
For the meaning of the first model, reference may be made to the description of the method embodiments above.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of the above-mentioned embodiment of the living body detection method, the model training method, or the image classification method, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of another like element in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. With such an understanding, the technical solutions of the present invention or portions thereof contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the methods according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (18)

1. A method of model training, comprising:
obtaining a model training sample set, wherein the model training sample set comprises a plurality of fusion images, each fusion image is obtained by fusing a frame of red, green and blue (RGB) image and a frame of depth image corresponding to the RGB image, and the model training sample set comprises fusion images of at least two different fusion modes;
inputting the model training sample set into a machine learning network model, and training to obtain a first model;
inputting the model training sample set into a machine learning network model, and training to obtain a second model, wherein the first model and the second model are different models;
inputting a fusion image obtained by a first fusion mode into the first model in the process of living body detection or image classification to obtain a first living body detection result or a first image classification result; inputting a fusion image obtained by a second fusion mode into the first model or the second model to obtain a second living body detection result or a second image classification result; calculating the first living body detection result and the second living body detection result to obtain a living body detection result; or, calculating the first image classification result and the second image classification result to obtain an image classification result; the second fusion mode is different from the first fusion mode.
2. The method of claim 1, wherein the first model comprises one of FeatherNet, MobileNet, ShuffleNet, EfficientNet, and SqueezeNet.
3. The method of claim 2, wherein the first model is a FeatherNet;
a convolutional neural network backbone CNN Stem of the FeatherNet is a 3 × 3 Deformable Depthwise Convolution;
a Streaming Module of the FeatherNet is a k × k Deformable Depthwise Convolution, wherein k is a positive integer greater than 1;
the Deformable Depthwise Convolution is obtained by combining a Depthwise Convolution and a Deformable Convolution.
4. The method of claim 2, wherein the first model is a FeatherNet;
the CNN Stem of the FeatherNet is a 3 × 3 Deformable Depthwise Convolution; wherein the 3 × 3 Deformable Depthwise Convolution is obtained by combining a 3 × 3 Depthwise Convolution with any one of the following convolution modes: a Deformable Convolution or a Dilated Convolution;
or, the CNN Stem of the FeatherNet is a combination of a 1 × 3 Deformable Depthwise Convolution and a 3 × 1 Deformable Depthwise Convolution;
the Streaming Module of the FeatherNet is a k × k Deformable Depthwise Convolution; the k × k Deformable Depthwise Convolution is obtained by combining a k × k Depthwise Convolution with any one of the following convolution modes: a Deformable Convolution or a Dilated Convolution;
or, the Streaming Module of the FeatherNet is a combination of a 1 × k Deformable Depthwise Convolution and a k × 1 Deformable Depthwise Convolution;
and k is a positive integer greater than 1.
5. The method of claim 1, wherein the fused image is obtained by any one of:
only the depth map is retained to obtain a first single-channel map; or,
mapping the depth map into a first color map, and superimposing the first color map and the RGB map to obtain a three-channel map; or,
only the depth map is retained to obtain a second single-channel map; adding the second single-channel map as an Alpha channel of the RGB map to obtain a four-channel image; or,
mapping the depth map into a second color map; or,
converting the RGB map into a single-channel grayscale map, and mapping the depth map into a second color map; and adding the single-channel grayscale map as an Alpha channel of the second color map to obtain a four-channel image.
6. A living body detection method, comprising:
acquiring a target face image group, wherein the target face image group comprises a frame of RGB image and a frame of depth image corresponding to the RGB image;
fusing the RGB image and the depth image in a first fusion mode to obtain a first fusion image;
inputting the first fusion image into a first model to obtain a first living body detection result;
fusing the RGB image and the depth image in a second fusion mode to obtain a second fusion image; the second fusion mode is different from the first fusion mode;
inputting the second fusion image into the first model or the second model to obtain a second living body detection result; wherein the first model and the second model are different models;
and calculating the first living body detection result and the second living body detection result to obtain a living body detection result.
7. The method according to claim 6, wherein the fusing the RGB map and the depth map in a first fusion manner to obtain a first fused image comprises any one of the following manners:
only the depth map is retained to obtain a first single-channel map; or,
mapping the depth map into a first color map, and superimposing the first color map and the RGB map to obtain a three-channel map; or,
only the depth map is retained to obtain a second single-channel map; adding the second single-channel map as an Alpha channel of the RGB map to obtain a four-channel image; or,
mapping the depth map into a second color map; or,
converting the RGB map into a single-channel grayscale map, and mapping the depth map into a second color map; and adding the single-channel grayscale map as an Alpha channel of the second color map to obtain a four-channel image.
8. The method of claim 6, wherein the calculating the first living body detection result and the second living body detection result to obtain a final living body detection result comprises:
performing an operation on the first living body detection result and the second living body detection result, and taking the operation result as the final living body detection result;
the operation comprises any one of the following:
calculating the product of the first living body detection result and a first weighting value, calculating the product of the second living body detection result and a second weighting value, and summing the obtained products; or,
calculating the average value of the first living body detection result and the second living body detection result.
9. The method of claim 6, wherein the first model comprises one of FeatherNet, MobileNet, ShuffleNet, EfficientNet, and SqueezeNet.
10. The method of claim 9, wherein the first model is a FeatherNet;
a convolutional neural network backbone CNN Stem of the FeatherNet is a 3 × 3 Deformable Depthwise Convolution;
a Streaming Module of the FeatherNet is a k × k Deformable Depthwise Convolution, wherein k is a positive integer greater than 1;
the Deformable Depthwise Convolution is obtained by combining a Depthwise Convolution and a Deformable Convolution.
11. The method of claim 9, wherein the first model is a FeatherNet;
the CNN Stem of the FeatherNet is a 3 × 3 Deformable Depthwise Convolution; wherein the 3 × 3 Deformable Depthwise Convolution is obtained by combining a 3 × 3 Depthwise Convolution with any one of the following convolution modes: a Deformable Convolution or a Dilated Convolution;
or, the CNN Stem of the FeatherNet is a combination of a 1 × 3 Deformable Depthwise Convolution and a 3 × 1 Deformable Depthwise Convolution;
the Streaming Module of the FeatherNet is a k × k Deformable Depthwise Convolution; the k × k Deformable Depthwise Convolution is obtained by combining a k × k Depthwise Convolution with any one of the following convolution modes: a Deformable Convolution or a Dilated Convolution;
or, the Streaming Module of the FeatherNet is a combination of a 1 × k Deformable Depthwise Convolution and a k × 1 Deformable Depthwise Convolution;
and k is a positive integer greater than 1.
12. An image classification method, comprising:
acquiring a target image group, wherein the target image group comprises a frame of RGB image and a frame of depth image corresponding to the RGB image;
fusing the RGB image and the depth image in a first fusion mode to obtain a first fusion image;
inputting the first fusion image into a first model to obtain a first image classification result;
fusing the RGB image and the depth image in a second fusion mode to obtain a second fusion image; the second fusion mode is different from the first fusion mode;
inputting the second fusion image into the first model or the second model to obtain a second image classification result; wherein the first model and the second model are different models;
and calculating the first image classification result and the second image classification result to obtain an image classification result.
13. The method of claim 12, wherein the first model comprises one of FeatherNet, MobileNet, ShuffleNet, EfficientNet, and SqueezeNet.
14. The method of claim 12, wherein the first model is FeatherNet;
a convolutional neural network backbone CNN Stem of the FeatherNet is a 3 × 3 Deformable Depthwise Convolution;
a Streaming Module of the FeatherNet is a k × k Deformable Depthwise Convolution, wherein k is a positive integer greater than 1;
the Deformable Depthwise Convolution is obtained by combining a Depthwise Convolution and a Deformable Convolution.
15. The method of claim 12, wherein the first model is a FeatherNet;
the CNN Stem of the FeatherNet is a 3 × 3 Deformable Depthwise Convolution; wherein the 3 × 3 Deformable Depthwise Convolution is obtained by combining a 3 × 3 Depthwise Convolution with any one of the following convolution modes: a Deformable Convolution or a Dilated Convolution;
or, the CNN Stem of the FeatherNet is a combination of a 1 × 3 Deformable Depthwise Convolution and a 3 × 1 Deformable Depthwise Convolution;
the Streaming Module of the FeatherNet is a k × k Deformable Depthwise Convolution; the k × k Deformable Depthwise Convolution is obtained by combining a k × k Depthwise Convolution with any one of the following convolution modes: a Deformable Convolution or a Dilated Convolution;
or, the Streaming Module of the FeatherNet is a combination of a 1 × k Deformable Depthwise Convolution and a k × 1 Deformable Depthwise Convolution;
and k is a positive integer greater than 1.
16. The method according to claim 12, wherein the fusing the RGB map and the depth map in a first fusion manner to obtain a first fused image comprises any one of the following manners:
only the depth map is retained to obtain a first single-channel map; or,
mapping the depth map into a first color map, and superimposing the first color map and the RGB map to obtain a three-channel map; or,
only the depth map is retained to obtain a second single-channel map; adding the second single-channel map as an Alpha channel of the RGB map to obtain a four-channel image; or,
mapping the depth map into a second color map; or,
converting the RGB map into a single-channel grayscale map, and mapping the depth map into a second color map; and adding the single-channel grayscale map as an Alpha channel of the second color map to obtain a four-channel image.
17. An electronic device, comprising: a transceiver, a memory, a processor, and a program stored on the memory and executable on the processor; characterized in that,
the processor is configured to read the program in the memory to implement the steps in the method of any one of claims 1 to 5; or to implement the steps in the method of any one of claims 6 to 11; or to implement the steps in the method of any one of claims 12 to 16.
18. A computer-readable storage medium for storing a computer program, wherein the computer program, when executed by a processor, implements the steps in the method of any one of claims 1 to 5; or implements the steps in the method of any one of claims 6 to 11; or implements the steps in the method of any one of claims 12 to 16.
CN201911186211.8A 2019-11-27 2019-11-27 Living body detection, image classification and model training method, device, equipment and medium Active CN112861586B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911186211.8A CN112861586B (en) 2019-11-27 2019-11-27 Living body detection, image classification and model training method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911186211.8A CN112861586B (en) 2019-11-27 2019-11-27 Living body detection, image classification and model training method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN112861586A CN112861586A (en) 2021-05-28
CN112861586B true CN112861586B (en) 2022-12-13

Family

ID=75985103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911186211.8A Active CN112861586B (en) 2019-11-27 2019-11-27 Living body detection, image classification and model training method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN112861586B (en)

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3156942A1 (en) * 2015-10-16 2017-04-19 Thomson Licensing Scene labeling of rgb-d data with interactive option
CN109034102B (en) * 2018-08-14 2023-06-16 腾讯科技(深圳)有限公司 Face living body detection method, device, equipment and storage medium
CN109376608B (en) * 2018-09-26 2021-04-27 中国计量大学 Human face living body detection method
CN109711243B (en) * 2018-11-01 2021-02-09 长沙小钴科技有限公司 Static three-dimensional face in-vivo detection method based on deep learning
CN109460733A (en) * 2018-11-08 2019-03-12 北京智慧眼科技股份有限公司 Recognition of face in-vivo detection method and system based on single camera, storage medium
CN109635770A (en) * 2018-12-20 2019-04-16 上海瑾盛通信科技有限公司 Biopsy method, device, storage medium and electronic equipment
CN109840475A (en) * 2018-12-28 2019-06-04 深圳奥比中光科技有限公司 Face identification method and electronic equipment
CN109977757B (en) * 2019-01-28 2020-11-17 电子科技大学 Multi-modal head posture estimation method based on mixed depth regression network
CN109934195A (en) * 2019-03-21 2019-06-25 东北大学 A kind of anti-spoofing three-dimensional face identification method based on information fusion
CN110175566B (en) * 2019-05-27 2022-12-23 大连理工大学 Hand posture estimation system and method based on RGBD fusion network

Also Published As

Publication number Publication date
CN112861586A (en) 2021-05-28

Similar Documents

Publication Publication Date Title
US11887311B2 (en) Method and apparatus for segmenting a medical image, and storage medium
CN108171701B (en) Significance detection method based on U network and counterstudy
EP3989104A1 (en) Facial feature extraction model training method and apparatus, facial feature extraction method and apparatus, device, and storage medium
CN111881706B (en) Living body detection, image classification and model training method, device, equipment and medium
CN109871845B (en) Certificate image extraction method and terminal equipment
CN110781976B (en) Extension method of training image, training method and related device
US20190340473A1 (en) Pattern recognition method of autoantibody immunofluorescence image
CN113822951A (en) Image processing method, image processing device, electronic equipment and storage medium
CN111582459B (en) Method for executing operation, electronic equipment, device and storage medium
CN107066980A (en) A kind of anamorphose detection method and device
CN117710921A (en) Training method, detection method and related device of target detection model
CN112861586B (en) Living body detection, image classification and model training method, device, equipment and medium
CN114764942B (en) Difficult positive and negative sample online mining method and face recognition method
CN115063795A (en) Urinary sediment classification detection method and device, electronic equipment and storage medium
CN113902044A (en) Image target extraction method based on lightweight YOLOV3
CN111651626B (en) Image classification method, device and readable storage medium
CN114913513A (en) Method and device for calculating similarity of official seal images, electronic equipment and medium
CN116563898A (en) Palm vein image recognition method, device, equipment and medium based on GhostNet network
CN108805190B (en) Image processing method and device
CN113033305A (en) Living body detection method, living body detection device, terminal equipment and storage medium
US11055512B2 (en) Method, apparatus and server for determining mental state of human
CN106446902B (en) non-character image recognition method and device
CN115731588B (en) Model processing method and device
RU2747214C1 (en) Hardware-software complex designed for training and (or) re-training of processing algorithms for aerial photographs in visible and far infrared band for detection, localization and classification of buildings outside of localities
CN116109823B (en) Data processing method, apparatus, electronic device, storage medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant