CN109886087B - Living body detection method based on neural network and terminal equipment

Publication number: CN109886087B (application number CN201910007987.2A; also published as CN109886087A)
Authority: CN (China)
Prior art keywords: living body; newly added; training; body detection; neural network
Legal status: Active (granted)
Application number: CN201910007987.2A
Other languages: Chinese (zh)
Other versions: CN109886087A
Inventors: 刘振轩, 陆进, 陈斌, 宋晨, 郭锦昆
Assignee (current and original): Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910007987.2A
Publication of CN109886087A (application publication), followed by grant and publication of CN109886087B


Abstract

The application relates to the technical field of computers, and provides a living body detection method based on a neural network, and terminal equipment. The method comprises the following steps: acquiring a pre-trained convolutional neural network model; adding at least one group of newly added convolution layers after the last convolution layer of the convolutional neural network model to generate a living body detection model, each group of newly added convolution layers corresponding to one action detection type; respectively inputting the action training image samples corresponding to each group of newly added convolution layers into the living body detection model, and carrying out local training on each group of newly added convolution layers; and inputting whole-network training image samples into the locally trained living body detection model, and carrying out overall training on the locally trained living body detection model. According to the application, newly added convolution layers corresponding to each action detection type are added to the pre-trained convolutional neural network model, and local training and overall training are carried out with the corresponding image samples, so that the living body detection model can realize multi-action detection of living bodies.

Description

Living body detection method based on neural network and terminal equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a living body detection method and a terminal device based on a neural network.
Background
At present, living body action detection such as eye opening and mouth opening is mostly judged based on facial feature points (Facial Landmark Detection), which is highly unstable. Existing living body detection based on a neural network model is usually used for detecting a single action and cannot meet the requirement of detecting multiple actions of a living body.
Disclosure of Invention
In view of the above, the embodiment of the application provides a living body detection method and terminal equipment based on a neural network, so as to solve the problem that the existing living body detection model based on the neural network can only detect a single action of a living body.
A first aspect of an embodiment of the present application provides a neural network-based living body detection method, including:
acquiring a pre-trained convolutional neural network model;
adding at least one group of newly added convolutional layers after the last convolutional layer of the convolutional neural network model to generate a living body detection model; each group of newly added convolution layers corresponds to one action detection type; each motion detection type corresponds to a group of motion training image samples;
respectively inputting the motion training image samples corresponding to each group of newly added convolution layers into the living body detection model, and carrying out local training on each group of newly added convolution layers;
and inputting the whole-network training image sample into the living body detection model after local training, and carrying out overall training on the living body detection model after local training.
A second aspect of an embodiment of the present application provides a neural network-based living body detection apparatus, including:
the acquisition module is used for acquiring a pre-trained convolutional neural network model;
the generating module is used for adding at least one group of newly added convolution layers after the last convolution layer of the convolution neural network model to generate a living body detection model; each group of newly added convolution layers corresponds to one action detection type; each motion detection type corresponds to a group of motion training image samples;
the first training module is used for respectively inputting the action training image samples corresponding to each group of newly added convolution layers into the living body detection model and carrying out local training on each group of newly added convolution layers;
and the second training module is used for inputting the whole-network training image sample into the living body detection model after local training and carrying out overall training on the living body detection model after local training.
A third aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the neural network-based living body detection method in the first aspect.
A fourth aspect of the embodiments of the present application provides a terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the neural network-based living body detection method in the first aspect when executing the computer program.
Compared with the prior art, the embodiment of the application has the beneficial effects that: adding at least one group of newly added convolutional layers after the last convolutional layer of the convolutional neural network model trained in advance to generate a living body detection model; each group of newly added convolution layers corresponds to one action detection type; and carrying out local training on the living body detection model through action training image samples corresponding to the newly added convolution layers of each group, and then carrying out overall training on the living body detection model subjected to the local training through the whole-network training image samples. According to the embodiment of the application, the newly added convolution layers corresponding to each action detection type are added in the pre-trained convolution neural network model, and the local training and the overall training are carried out through the corresponding image samples, so that the living body detection model can realize multi-action detection of living bodies.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of an implementation of a neural network-based living body detection method provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of a pre-trained convolutional neural network model in one implementation example provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of adding full-connection layer branches in an implementation example provided by an embodiment of the present application;
FIG. 4 is a flowchart of a local training implementation of each newly added convolutional layer in a neural network-based living body detection method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a stage P1 of a local training process in one implementation example provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a stage P2 of a local training process in one implementation example provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a stage P3 of a local training process in one implementation example provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of a living body detection model in one embodiment example provided by an embodiment of the present application;
fig. 9 is a schematic diagram of a living body detection apparatus based on a neural network according to an embodiment of the present application;
fig. 10 is a schematic diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
In order to illustrate the technical scheme of the application, the following description is made by specific examples.
Fig. 1 is a flowchart of an implementation of a living body detection method based on a neural network according to an embodiment of the present application, which is described in detail below:
in S101, a pre-trained convolutional neural network model is acquired.
In this embodiment, the pre-trained convolutional neural network model may be obtained by building a convolutional neural network model and then training it on image samples, or by directly acquiring a convolutional neural network model that has already been trained, for example a pre-trained VGG model, MobileNet model or the like.
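As a hedged illustration only (the patent does not specify any framework or model library), a pre-trained backbone such as VGG or MobileNet could be obtained roughly as follows; the torchvision calls are real, but the exact weight-loading argument depends on the library version, and train_on_image_samples is a hypothetical helper.

```python
import torchvision.models as models

# Option 1: directly acquire an already trained model (pretrained=True is the
# older torchvision form; newer versions use a weights= argument).
backbone = models.mobilenet_v2(pretrained=True).features   # convolutional part only

# Option 2: build a convolutional neural network model and train it on image
# samples first, then reuse its convolutional layers as the pre-trained model.
# train_on_image_samples is a hypothetical helper, not a library function.
# backbone = train_on_image_samples(models.vgg16(num_classes=2)).features
```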
In S102, adding at least one group of newly added convolution layers after the last convolution layer of the convolution neural network model to generate a living body detection model; each group of newly added convolution layers corresponds to one action detection type; each motion detection type corresponds to a set of motion training image samples.
In this embodiment, a group of newly added convolution layers includes at least one newly added convolution layer, for example two or three convolution layers. An action detection type is a type of action that needs to be detected, such as eye opening, mouth opening, nodding or head shaking. Each group of newly added convolution layers corresponds to one action detection type and is used to identify and judge that action. For example, if there are three action detection types, namely eye opening, mouth opening and nodding, then three groups of newly added convolution layers are added after the last convolution layer of the pre-trained convolutional neural network model: the newly added convolution layers corresponding to eye opening, those corresponding to mouth opening and those corresponding to nodding, and each group of newly added convolution layers is added in parallel after the last convolution layer of the convolutional neural network model.
In this embodiment, one motion detection type corresponds to a set of motion training image samples, and a newly added convolution layer corresponding to the motion detection type can be trained by the corresponding set of motion training image samples. For example, there are three types of motion detection, namely, eye opening, mouth opening and nodding, and three groups of motion training image samples, namely, a motion training image sample corresponding to eye opening, a motion training image sample corresponding to mouth opening and a motion training image sample corresponding to nodding.
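The following is a minimal sketch, in PyTorch (an assumption; the patent does not prescribe a framework), of how parallel groups of newly added convolution layers, one group per action detection type, might be grafted after the last convolution layer of the pre-trained backbone. The branch names, channel sizes and layer counts are illustrative only.

```python
import torch
import torch.nn as nn

class LivenessDetectionModel(nn.Module):
    def __init__(self, backbone, backbone_out_channels, action_types):
        super().__init__()
        self.backbone = backbone  # pre-trained convolution layers (e.g. Conv1-Conv3)
        # one group of newly added convolution layers per action detection type,
        # attached in parallel after the last convolution layer of the backbone
        self.action_branches = nn.ModuleDict({
            name: nn.Sequential(
                nn.Conv2d(backbone_out_channels, 128, 3, padding=1),  # conv4_<action>
                nn.ReLU(inplace=True),
                nn.Conv2d(128, 128, 3, padding=1),                    # conv5_<action>
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
            )
            for name in action_types
        })

    def forward(self, x):
        features = self.backbone(x)  # shared features from the pre-trained layers
        return {name: branch(features) for name, branch in self.action_branches.items()}

# e.g. three action detection types: eye opening, mouth opening and nodding
# model = LivenessDetectionModel(backbone, 128, ["eye", "mouth", "nod"])
```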
As an embodiment of the present application, after S101, before S102, the method may further include:
and deleting the convolution layer with the last preset layer number of the convolution neural network model.
In this embodiment, for an obtained pre-trained convolutional neural network model, if the running speed of the model is lower than a preset speed threshold and/or the memory space occupied by the model exceeds a preset memory threshold, the last preset number of convolution layers of the convolutional neural network model may be deleted first, and the newly added convolution layers corresponding to the action detection types may then be added. In this way, the computation speed of the generated living body detection model can be ensured and the detection efficiency improved.
Fig. 2 is a schematic diagram of a pre-trained convolutional neural network model according to an embodiment of the present application, where Conv denotes a convolution layer. This convolutional neural network model has five convolution layers; the last two layers, namely the fourth convolution layer Conv4 and the fifth convolution layer Conv5, can be deleted first, and one or more groups of newly added convolution layers are then added after the third convolution layer Conv3.
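A short sketch of this pruning step under assumptions (PyTorch; the placeholder layers merely stand in for Conv1 to Conv5 of Fig. 2): the last two convolution layers are dropped before the new branches are grafted on.

```python
import torch.nn as nn

# placeholder Conv1-Conv5 standing in for the pre-trained network of Fig. 2
pretrained = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),    # Conv1
    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),   # Conv2
    nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),  # Conv3
    nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(), # Conv4
    nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(), # Conv5
)

# delete the last two convolution layers (Conv4 and Conv5), keeping Conv1-Conv3
backbone = nn.Sequential(*list(pretrained.children())[:6])
```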
As an embodiment of the present application, after S102, the method may further include:
adding at least one full-connection layer branch after the newly added convolution layer corresponding to the first action detection type; the first action detection type is an action detection type with misjudgment factors; each full-connection layer branch corresponds to one misjudgment factor of the first action detection type.
In this embodiment, a misjudgment factor is a factor that easily causes an error in the detection result of an action when that action is detected for living body detection. Some action detection types have misjudgment factors and some do not. An action detection type with misjudgment factors is taken as a first action detection type. In order to eliminate the influence of the misjudgment factors on the detection result, one or more full-connection layer branches can be added after the newly added convolution layer corresponding to the first action detection type. Each full-connection layer branch corresponds to one misjudgment factor of the first action detection type and is used to reduce the influence of that misjudgment factor on the detection result. The full-connection layer branches corresponding to the first action detection type may be added in parallel after the newly added convolution layer corresponding to the first action detection type.
As one embodiment of the present application, the first action detection type includes open-close eye detection, and the misjudgment factors of open-close eye detection include a jitter factor and/or a shielding factor; as shown in fig. 3, the step of adding at least one full-connection layer branch after the newly added convolution layer corresponding to the first action detection type may include:
and adding a full-connection layer branch corresponding to a jitter factor and/or a full-connection layer branch corresponding to a shielding factor after the newly added convolution layer corresponding to the open-close eye detection.
In this embodiment, open-close eye detection means detecting the eye opening and eye closing actions of the target. Image jitter may cause detection misjudgment, so image jitter is taken as a jitter factor. If a conventional two-class neural network model is used to judge whether the eyes are open, the image blur produced during image jitter makes the predicted probability of open eyes very inaccurate, which makes action-based living body detection difficult to use, degrades the user experience, and may even allow the living body detection to be broken by shaking a printed photograph. The method provided by this embodiment trains a full-connection layer branch corresponding to the jitter factor, thereby eliminating the influence of the jitter factor on the living body detection result, preventing the living body detection model from being broken by paper jitter, image jitter and the like, and improving the detection accuracy and security of the living body detection model.
Shielding of the target may also cause detection misjudgment, so shielding of the target is taken as a shielding factor. On the one hand, when the face area is covered by a mask, a conventional open-eye model may still detect open eyes, which violates the principle of living body detection, because a mask belongs to the non-living, non-genuine-person attack type. On the other hand, a conventional open-close eye judging model can be made to report an eye-closing living body action when fingers or other foreign objects are used to shield the eyes of a printed face photograph, thereby breaking the living body check. In this embodiment, a full-connection layer branch corresponding to the shielding factor is trained to determine whether the target is shielded; if so, the living body detection model may refuse the judgment or prompt that the target is shielded. By means of the full-connection layer branch corresponding to the shielding factor, this embodiment eliminates the influence of the shielding factor on the living body detection result, prevents the living body detection model from being broken by mask shielding, foreign object shielding and the like, and improves the detection accuracy and security of the living body detection model.
Fig. 3 is a schematic diagram of adding full-connection layer branches according to an embodiment of the present application. Fig. 3 shows a group of newly added convolution layers corresponding to open-close eye detection, consisting of two layers, namely the convolution layers conv4_eye and conv5_eye in the figure, added after the third convolution layer of the convolutional neural network model shown in fig. 2. After the newly added convolution layers corresponding to open-close eye detection, three full-connection layer branches are added: a full-connection layer branch FC_OpenEye corresponding to the jitter factor, a full-connection layer branch FC_BlockEye corresponding to the shielding factor, and a full-connection layer branch FC_new corresponding to a new misjudgment factor. A new misjudgment factor means that if open-close eye detection has some misjudgment factor other than the jitter factor and the shielding factor, a full-connection layer branch corresponding to that factor can be added.
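The branch structure of Fig. 3 might look roughly as follows in PyTorch (a sketch under assumptions: two-way classification per branch and illustrative channel sizes). conv4_eye and conv5_eye are shared by the branches, while FC_OpenEye and FC_BlockEye are separate full-connection heads.

```python
import torch
import torch.nn as nn

class EyeBranch(nn.Module):
    def __init__(self, in_channels=128, feature_dim=128):
        super().__init__()
        self.conv4_eye = nn.Conv2d(in_channels, feature_dim, 3, padding=1)
        self.conv5_eye = nn.Conv2d(feature_dim, feature_dim, 3, padding=1)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # one full-connection layer branch per misjudgment factor, each a two-way classifier
        self.fc_open_eye = nn.Linear(feature_dim, 2)   # FC_OpenEye: eye open / closed under jitter
        self.fc_block_eye = nn.Linear(feature_dim, 2)  # FC_BlockEye: eyes shielded / not shielded
        # an FC_new head for a further misjudgment factor could be appended here later

    def forward(self, x):
        x = torch.relu(self.conv4_eye(x))
        x = torch.relu(self.conv5_eye(x))
        x = self.pool(x).flatten(1)
        return {"FC_OpenEye": self.fc_open_eye(x),
                "FC_BlockEye": self.fc_block_eye(x)}
```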
In S103, motion training image samples corresponding to the newly added convolutional layers of each group are input to the living body detection model, and local training is performed on the newly added convolutional layers of each group.
In this embodiment, the motion training image samples corresponding to a group of newly added convolution layers are the group of motion training image samples corresponding to the action detection type of that group of newly added convolution layers. Taking this group of samples as input to the living body detection model, the corresponding group of newly added convolution layers in the living body detection model can be trained; the network parameters of each group of newly added convolution layers can be trained in turn using its group of motion training image samples. Local training means training the network parameters of each group of newly added convolution layers, while the other network parameters in the living body detection model are adjusted little during this step.
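A hedged sketch of this local-training idea (assuming the LivenessDetectionModel layout sketched above, with a backbone and an action_branches ModuleDict): gradients are enabled only for the newly added layers of the branch being trained, so the rest of the model is barely adjusted.

```python
import torch

def set_local_training(model, branch_name, lr=1e-3):
    """Enable gradients only for the newly added layers of one action branch."""
    for p in model.backbone.parameters():
        p.requires_grad = False                      # pre-trained layers barely change
    for name, branch in model.action_branches.items():
        for p in branch.parameters():
            p.requires_grad = (name == branch_name)  # train only the targeted branch
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.SGD(trainable, lr=lr)

# e.g. train the eye branch with its own action training image samples:
# optimizer = set_local_training(model, "eye")
```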
As one embodiment of the present application, the motion training image samples corresponding to the first motion detection type include motion training image samples corresponding to each misjudgment factor of the first motion detection type; as shown in fig. 4, S103 may include:
in S401, motion training image samples corresponding to the misjudgment factors of the first motion detection type are input to the living body detection model, and distribution training is performed on all the full-connection layer branches corresponding to the first motion detection type.
In this embodiment, one misjudgment factor of the first motion detection type corresponds to a set of motion training image samples, and the set of motion training image samples may be used to train the full-connection layer leg corresponding to the misjudgment factor, and adjust the parameters corresponding to the full-connection layer leg. And training each connecting layer branch by using the action training image samples corresponding to each misjudgment factor.
In S402, the motion training image samples corresponding to the first motion detection type are input to the living body detection model subjected to distributed training, and the newly added convolution layer corresponding to the first motion detection type and all the full-connection layer branches corresponding to the first motion detection type are jointly trained according to preset loss function weight values; the preset loss function weight values are the weight values of the loss functions corresponding to the misjudgment factors of the first motion detection type.
In this embodiment, the preset weight value of the loss function may be determined according to the influence degree of the erroneous judgment factor on the detection result. For example, the first action detection type has two misjudgment factors, and the weight values of the loss functions corresponding to the two misjudgment factors may be set to 0.3 and 0.7, or both may be set to 0.5, or the like. In this step, the motion training image samples corresponding to the first motion detection type include, but are not limited to, motion training image samples corresponding to each misjudgment factor of the first motion detection type. The network parameters of the newly added convolution layer corresponding to the first action detection type and all the full-connection layer branches corresponding to the first action detection type can be trained and adjusted through the action training image samples corresponding to the first action detection type.
Fig. 5 to Fig. 7 are schematic diagrams of a local training process according to an embodiment of the present application. Open-close eye detection corresponds to two misjudgment factors, namely a jitter factor and a shielding factor, and the two misjudgment factors correspond to two full-connection layer branches respectively. The training process for open-close eye detection can be divided into the following three stages in sequence. In the P1 distributed-training stage, jittered face images and non-jittered face images are used as training input samples of the living body detection model, as shown in fig. 5; in this stage, only the network parameters of the newly added convolution layers (conv4_eye and conv5_eye) in the rectangular dotted frame and of the full-connection layer branch (FC_OpenEye) corresponding to the jitter factor are adjusted. In the P2 distributed-training stage, mask-shielded face images and face images not shielded by a mask are used as training input samples of the living body detection model, as shown in fig. 6; in this stage, only the network parameters of the newly added convolution layers (conv4_eye and conv5_eye) in the rectangular dotted frame and of the full-connection layer branch (FC_BlockEye) corresponding to the shielding factor are adjusted. In the P3 joint stage, the jittered face images, non-jittered face images, mask-shielded face images and non-shielded face images are all used as training input samples of the living body detection model, as shown in fig. 7; in this stage, the network parameters of the newly added convolution layers (conv4_eye and conv5_eye) in the rectangular dotted frame, of the full-connection layer branch (FC_OpenEye) corresponding to the jitter factor and of the full-connection layer branch (FC_BlockEye) corresponding to the shielding factor are adjusted.
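The three stages could be driven by a weighted joint loss along the following lines (a sketch; PyTorch and a cross-entropy loss per branch are assumptions, and the weights shown are merely examples such as the 0.3/0.7 or 0.5/0.5 settings mentioned above).

```python
import torch.nn.functional as F

def joint_loss(outputs, targets, loss_weights):
    """Weighted sum of per-branch losses.

    outputs / targets: dicts keyed by full-connection layer branch name.
    loss_weights: preset loss function weight per misjudgment factor,
    e.g. {"FC_OpenEye": 0.3, "FC_BlockEye": 0.7}.
    """
    total = 0.0
    for name, weight in loss_weights.items():
        total = total + weight * F.cross_entropy(outputs[name], targets[name])
    return total

# P1: jittered vs. non-jittered faces; only conv4_eye/conv5_eye and FC_OpenEye are updated
# P2: mask-shielded vs. unshielded faces; only conv4_eye/conv5_eye and FC_BlockEye are updated
# P3: all four kinds of samples; conv4_eye/conv5_eye and both branches are updated with joint_loss
```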
According to this embodiment, the motion training image samples corresponding to each misjudgment factor are first input to the living body detection model for distributed training, and the motion training image samples corresponding to the first motion detection type are then input to the living body detection model subjected to distributed training for joint training, so that local training of the living body detection model can be realized and the detection precision improved; the preset loss function weight values also make it convenient to flexibly adjust the weight of the loss function of the full-connection layer branch corresponding to each misjudgment factor, which can improve the accuracy and security of the living body detection model.
In S104, the whole-network training image samples are input to the locally trained living body detection model, and the locally trained living body detection model is subjected to overall training.
In this embodiment, a whole-network training image sample refers to a training image used for performing living body detection on a target. The whole-network training image samples include, but are not limited to, the motion training image samples corresponding to the first motion detection type and the motion training image samples corresponding to each misjudgment factor. For example, the network parameters of the living body detection model can be optimized with general face image samples; this mainly optimizes and trains the network parameters of the convolution layers of the original convolutional neural network other than the newly added convolution layers and the full-connection layer branches in the living body detection model, such as the first convolution layer Conv1, the second convolution layer Conv2 and the third convolution layer Conv3 in fig. 3.
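A brief sketch of this overall-training step (again assuming the PyTorch layout sketched earlier): all parameters, including the original backbone convolution layers Conv1 to Conv3, are unfrozen and updated with whole-network training image samples.

```python
import torch

def set_overall_training(model, lr=1e-4):
    """Unfreeze every parameter for whole-network training with general face samples."""
    for p in model.parameters():
        p.requires_grad = True
    return torch.optim.SGD(model.parameters(), lr=lr)
```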
In the embodiment of the application, at least one group of newly added convolution layers is added after the last convolution layer of the pre-trained convolution neural network model to generate a living body detection model; each group of newly added convolution layers corresponds to one action detection type; and carrying out local training on the living body detection model through action training image samples corresponding to the newly added convolution layers of each group, and then carrying out overall training on the living body detection model subjected to the local training through the whole-network training image samples. According to the embodiment of the application, the newly added convolution layers corresponding to each action detection type are added in the pre-trained convolution neural network model, and the local training and the overall training are carried out through the corresponding image samples, so that the living body detection model can realize multi-action detection of living bodies.
In the embodiment of the application, for the trained living body detection model, if other living body action detection requirements (such as opening and closing the eyes, shaking the head, nodding the head and the like) or other factors influencing judgment stability (such as camera jitter, shielding, wearing a mask and the like) arise later, corresponding newly added convolution layers and full-connection layer branches can be added to the living body detection model according to the steps of the above method. In this way, more action detection requirements can be met with only small changes to the model.
For example, referring to fig. 8, if a mouth opening and closing action newly needs to be judged for living body detection, newly added network layers corresponding to mouth open-close detection (such as the convolution layers conv4_mouth and conv5_mouth in the rectangular frame) and full-connection layer branches (such as the full-connection layer branches FC_OpenMouth and FC_BlockMouth in the rectangular frame) may be added; the newly added network layers and full-connection layer branches are trained with the motion training image samples corresponding to mouth open-close detection, and the living body detection model is then trained as a whole with the whole-network training image samples.
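Grafting such a mouth branch onto an already trained model might look as follows (a sketch under assumptions: the action_branches ModuleDict from the earlier sketch, illustrative layer sizes, and names conv4_mouth, conv5_mouth and FC_OpenMouth taken from the figure). Only this new branch then needs local training before a final overall training pass.

```python
import torch.nn as nn

def add_mouth_branch(model, backbone_out_channels=128, feature_dim=128):
    """Graft a mouth open/close branch onto an already trained model (cf. fig. 8)."""
    mouth = nn.Sequential(
        nn.Conv2d(backbone_out_channels, feature_dim, 3, padding=1),  # conv4_mouth
        nn.ReLU(inplace=True),
        nn.Conv2d(feature_dim, feature_dim, 3, padding=1),            # conv5_mouth
        nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(feature_dim, 2),  # FC_OpenMouth: mouth open / closed
    )
    model.action_branches["mouth"] = mouth  # existing, already trained branches stay untouched
    return model
```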
The application provides a method for training a living body action judgment model based on a neural network grafting mode. The model is stable; its structure can cope with possible future attack scenarios by adding new branches, and it is convenient for iterative optimization. The embodiment of the application has the following advantages: 1. the training mode using multiple full-connection layer branches largely prevents misjudgment caused by jitter, shielding and the like, improves the stability of the whole model and improves the detection accuracy; 2. the network grafting mode integrates the neural network layers for detecting different actions into one network, which reduces the size of the living body detection model and increases the speed of the living body detection model; 3. when a new living body action detection requirement arises, it can be met by adding convolution layers and full-connection layer branches on the basis of the original model, so that more action detection requirements can be realized with only small changes to the model.
In the embodiment of the application, at least one group of newly added convolution layers is added after the last convolution layer of the pre-trained convolution neural network model to generate a living body detection model; each group of newly added convolution layers corresponds to one action detection type; and carrying out local training on the living body detection model through action training image samples corresponding to the newly added convolution layers of each group, and then carrying out overall training on the living body detection model subjected to the local training through the whole-network training image samples. According to the embodiment of the application, the newly added convolution layers corresponding to each action detection type are added in the pre-trained convolution neural network model, and the local training and the overall training are carried out through the corresponding image samples, so that the living body detection model can realize multi-action detection of living bodies.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Fig. 9 shows a schematic diagram of a living body detection apparatus based on a neural network according to an embodiment of the present application, corresponding to the living body detection method based on a neural network described in the above embodiment. For convenience of explanation, only the portions related to the present embodiment are shown.
Referring to fig. 9, the apparatus includes an acquisition module 91, a generating module 92, a first training module 93 and a second training module 94.
An acquisition module 91, configured to acquire a convolutional neural network model trained in advance.
A generating module 92, configured to add at least one set of newly added convolutional layers after the last convolutional layer of the convolutional neural network model, to generate a living body detection model; each group of newly added convolution layers corresponds to one action detection type; each motion detection type corresponds to a set of motion training image samples.
The first training module 93 is configured to input motion training image samples corresponding to each set of newly added convolution layers to the living body detection model, and perform local training on each set of newly added convolution layers.
A second training module 94 is configured to input the whole-network training image sample into the locally trained living body detection model, and perform overall training on the locally trained living body detection model.
Optionally, the apparatus further comprises a preprocessing module, wherein the preprocessing module is used for:
and deleting the convolution layer with the last preset layer number of the convolution neural network model.
Optionally, the apparatus further comprises a processing module, the processing module is configured to:
adding at least one full-connection layer branch after the newly added convolution layer corresponding to the first action detection type; the first action detection type is an action detection type with misjudgment factors; each full connection layer leg corresponds to one false positive factor of the first action detection type.
Optionally, the first action detection type includes open-close eye detection, and the misjudgment factors of open-close eye detection include a jitter factor and/or a shielding factor;
the processing module is used for:
and adding a full-connection layer branch corresponding to a jitter factor and/or a full-connection layer branch corresponding to a shielding factor after the newly added convolution layer corresponding to the open-close eye detection.
Optionally, the motion training image samples corresponding to the first motion detection type include motion training image samples corresponding to each misjudgment factor of the first motion detection type; the first training module 93 is configured to:
respectively inputting motion training image samples corresponding to each misjudgment factor of the first motion detection type into the living body detection model, and carrying out distributed training on each full-connection layer branch corresponding to the first motion detection type;
inputting the motion training image sample corresponding to the first motion detection type into a living body detection model subjected to distributed training, and carrying out joint training on the newly added convolution layer corresponding to the first motion detection type and all the full-connection layer branches corresponding to the first motion detection type according to a preset loss function weight value; the preset loss function weight value is the weight value of the loss function corresponding to each misjudgment factor of the first action detection type.
In the embodiment of the application, at least one group of newly added convolution layers is added after the last convolution layer of the pre-trained convolution neural network model to generate a living body detection model; each group of newly added convolution layers corresponds to one action detection type; and carrying out local training on the living body detection model through action training image samples corresponding to the newly added convolution layers of each group, and then carrying out overall training on the living body detection model subjected to the local training through the whole-network training image samples. According to the embodiment of the application, the newly added convolution layers corresponding to each action detection type are added in the pre-trained convolution neural network model, and the local training and the overall training are carried out through the corresponding image samples, so that the living body detection model can realize multi-action detection of living bodies.
Fig. 10 is a schematic diagram of a terminal device according to an embodiment of the present application. As shown in fig. 10, the terminal device 10 of this embodiment includes: a processor 100, a memory 101, and a computer program 102, e.g. a program, stored in the memory 101 and executable on the processor 100. The processor 100, when executing the computer program 102, implements the steps of the various method embodiments described above, such as steps 101 to 104 shown in fig. 1. Alternatively, the processor 100 may perform the functions of the modules/units of the apparatus embodiments described above, such as the functions of the modules 91 to 94 shown in fig. 9, when executing the computer program 102.
Illustratively, the computer program 102 may be partitioned into one or more modules/units that are stored in the memory 101 and executed by the processor 100 to accomplish the present application. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions for describing the execution of the computer program 102 in the terminal device 10.
The terminal device 10 may be a computing device such as a desktop computer, a notebook computer, a palm computer, a cloud server, etc. The terminal device may include, but is not limited to, a processor 100, a memory 101. It will be appreciated by those skilled in the art that fig. 10 is merely an example of the terminal device 10 and is not limiting of the terminal device 10, and may include more or fewer components than shown, or may combine certain components, or different components, e.g., the terminal device may also include input and output devices, network access devices, buses, displays, etc.
The processor 100 may be a central processing unit (Central Processing Unit, CPU), other general purpose processors, digital signal processors (Digital Signal Processor, DSP), application specific integrated circuits (Application Specific Integrated Circuit, ASIC), off-the-shelf programmable gate arrays (Field-Programmable Gate Array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory 101 may be an internal storage unit of the terminal device 10, such as a hard disk or a memory of the terminal device 10. The memory 101 may also be an external storage device of the terminal device 10, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the terminal device 10. Further, the memory 101 may also include both an internal storage unit and an external storage device of the terminal device 10. The memory 101 is used for storing the computer program and other programs and data required by the terminal device. The memory 101 may also be used to temporarily store data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, the description of each embodiment has its own emphasis. For parts that are not described or detailed in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical function division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing related hardware through a computer program, which may be stored in a computer readable storage medium; when the computer program is executed by a processor, the steps of each of the method embodiments described above may be implemented. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer readable media do not include electrical carrier signals and telecommunications signals.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (7)

1. A neural network-based living body detection method, characterized by comprising:
acquiring a pre-trained convolutional neural network model;
adding at least one group of newly added convolutional layers after the last convolutional layer of the convolutional neural network model to generate a living body detection model; each group of newly added convolution layers corresponds to one action detection type; each motion detection type corresponds to a group of motion training image samples;
respectively inputting the motion training image samples corresponding to each group of newly added convolution layers into the living body detection model, and carrying out local training on each group of newly added convolution layers;
inputting the whole-network training image sample into a living body detection model subjected to local training, and carrying out overall training on the living body detection model subjected to local training;
after adding at least one set of newly added convolution layers after the last convolution layer of the convolution neural network model, and generating a living body detection model, inputting motion training image samples corresponding to each set of newly added convolution layers into the living body detection model respectively, and before carrying out local training on each set of newly added convolution layers, further comprising:
adding at least one full-connection layer branch after the newly added convolution layer corresponding to the first action detection type; the first action detection type is an action detection type with misjudgment factors; each full connection layer branch corresponds to one misjudgment factor of the first action detection type;
the action training image samples corresponding to the first action detection type comprise action training image samples corresponding to each misjudgment factor of the first action detection type;
the step of respectively inputting the motion training image samples corresponding to each group of newly added convolution layers into the living body detection model, and the step of carrying out local training on each group of newly added convolution layers comprises the following steps:
respectively inputting motion training image samples corresponding to each misjudgment factor of the first motion detection type into the living body detection model, and carrying out distributed training on each full-connection layer branch corresponding to the first motion detection type;
inputting the motion training image sample corresponding to the first motion detection type into a living body detection model subjected to distributed training, and carrying out joint training on the newly added convolution layer corresponding to the first motion detection type and all the full-connection layer branches corresponding to the first motion detection type according to a preset loss function weight value; the preset loss function weight value is the weight value of the loss function corresponding to each misjudgment factor of the first action detection type.
2. The neural network-based living body detection method of claim 1, wherein after the acquiring the pre-trained convolutional neural network model, adding at least one set of newly added convolutional layers after a last convolutional layer of the convolutional neural network model, and before generating the living body detection model, further comprising:
deleting the last preset number of convolution layers of the convolutional neural network model.
3. The neural network-based living body detection method as claimed in claim 1, wherein the first action detection type includes open-eye and closed-eye detection, and the misjudgment factors of open-eye and closed-eye detection include a jitter factor and/or a shielding factor;
the adding at least one full connection layer branch after the newly added convolution layer corresponding to the first action detection type includes:
and adding a full-connection layer branch corresponding to a jitter factor and/or a full-connection layer branch corresponding to a shielding factor after the newly added convolution layer corresponding to the open-close eye detection.
4. A neural network-based living body detection apparatus for implementing the neural network-based living body detection method according to any one of claims 1 to 3, the neural network-based living body detection apparatus comprising:
the acquisition module is used for acquiring a pre-trained convolutional neural network model;
the generating module is used for adding at least one group of newly added convolution layers after the last convolution layer of the convolution neural network model to generate a living body detection model; each group of newly added convolution layers corresponds to one action detection type; each motion detection type corresponds to a group of motion training image samples;
the first training module is used for respectively inputting the action training image samples corresponding to each group of newly added convolution layers into the living body detection model and carrying out local training on each group of newly added convolution layers;
and the second training module is used for inputting the whole-network training image sample into the living body detection model after local training and carrying out overall training on the living body detection model after local training.
5. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 3.
6. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, realizes the steps of:
acquiring a pre-trained convolutional neural network model;
adding at least one group of newly added convolutional layers after the last convolutional layer of the convolutional neural network model to generate a living body detection model; each group of newly added convolution layers corresponds to one action detection type; each motion detection type corresponds to a group of motion training image samples;
respectively inputting the motion training image samples corresponding to each group of newly added convolution layers into the living body detection model, and carrying out local training on each group of newly added convolution layers;
inputting the whole-network training image sample into a living body detection model subjected to local training, and carrying out overall training on the living body detection model subjected to local training;
after adding at least one set of newly added convolution layers after the last convolution layer of the convolution neural network model, and generating a living body detection model, inputting motion training image samples corresponding to each set of newly added convolution layers into the living body detection model respectively, and before carrying out local training on each set of newly added convolution layers, further comprising:
adding at least one full-connection layer branch after the newly added convolution layer corresponding to the first action detection type; the first action detection type is an action detection type with misjudgment factors; each full connection layer branch corresponds to one misjudgment factor of the first action detection type;
the action training image samples corresponding to the first action detection type comprise action training image samples corresponding to each misjudgment factor of the first action detection type;
the step of respectively inputting the motion training image samples corresponding to each group of newly added convolution layers into the living body detection model, and the step of carrying out local training on each group of newly added convolution layers comprises the following steps:
respectively inputting motion training image samples corresponding to each misjudgment factor of the first motion detection type into the living body detection model, and carrying out distributed training on each full-connection layer branch corresponding to the first motion detection type;
inputting the motion training image sample corresponding to the first motion detection type into a living body detection model subjected to distributed training, and carrying out joint training on the newly added convolution layer corresponding to the first motion detection type and all the full-connection layer branches corresponding to the first motion detection type according to a preset loss function weight value; the preset loss function weight value is the weight value of the loss function corresponding to each misjudgment factor of the first action detection type.
7. The terminal device of claim 6, wherein after the acquiring the pre-trained convolutional neural network model, adding at least one set of newly added convolutional layers after a last convolutional layer of the convolutional neural network model, before generating the in-vivo detection model, further comprising:
and deleting the convolution layer with the last preset layer number of the convolution neural network model.
CN201910007987.2A (filed 2019-01-04, priority date 2019-01-04): Living body detection method based on neural network and terminal equipment. Status: Active. Granted as CN109886087B.

Priority Application (1)

CN201910007987.2A, priority date 2019-01-04, filing date 2019-01-04: Living body detection method based on neural network and terminal equipment

Publications (2)

CN109886087A, published 2019-06-14
CN109886087B, granted and published 2023-10-20

Family ID: 66925560
Family application (1): CN201910007987.2A (Active), granted as CN109886087B
Country status (1): CN, CN109886087B

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110674730A (en) * 2019-09-20 2020-01-10 华南理工大学 Monocular-based face silence living body detection method
CN111027400A (en) * 2019-11-15 2020-04-17 烟台市广智微芯智能科技有限责任公司 Living body detection method and device
CN112101281B (en) * 2020-09-25 2023-06-16 北京百度网讯科技有限公司 Face image detection method and device
CN112149615A (en) * 2020-10-12 2020-12-29 平安科技(深圳)有限公司 Face living body detection method, device, medium and electronic equipment
WO2022246612A1 (en) * 2021-05-24 2022-12-01 华为技术有限公司 Liveness detection method, training method for liveness detection model, apparatus thereof, and system


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106557726A (en) * 2015-09-25 2017-04-05 北京市商汤科技开发有限公司 A kind of band is mourned in silence the system for face identity authentication and its method of formula In vivo detection
WO2017078627A1 (en) * 2015-11-04 2017-05-11 Jing King Tech Holdings Pte. Ltd. Method and system for face in vivo detection
CN106022264A (en) * 2016-05-19 2016-10-12 中国科学院自动化研究所 Interactive face in vivo detection method and device based on multi-task self encoder
CN108921100A (en) * 2018-07-04 2018-11-30 武汉高德智感科技有限公司 A kind of face identification method merged based on visible images with infrared image and system
CN109117762A (en) * 2018-07-27 2019-01-01 阿里巴巴集团控股有限公司 In vivo detection system, method and apparatus



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant