CN115995028A - Living body detection model training method, living body detection method and living body detection system


Info

Publication number: CN115995028A
Application number: CN202211425567.4A
Authority: CN (China)
Prior art keywords: living body, training, training image, image set, body detection
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventors: 曹佳炯, 丁菁汀
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202211425567.4A
Publication of CN115995028A


Landscapes

  • Image Analysis (AREA)

Abstract

This specification provides a living body detection model training method, a living body detection method, and a living body detection system. After a first training image set is acquired, a second training image set is generated based on the first training image set, where the living body resolvability of the second training image set on a basic living body detection model is lower than that of the first training image set on the basic living body detection model. A target neural network model is then trained with a comprehensive training image set, which comprises the first training image set and the second training image set, to obtain a target living body detection model whose living body resolution performance is superior to that of the basic living body detection model. The scheme can improve the efficiency of collecting living body detection data from different domains.

Description

Living body detection model training method, living body detection method and living body detection system
Technical Field
This specification relates to the field of Internet of Things technology, and in particular to a living body detection model training method, a living body detection method, and a living body detection system.
Background
The development of face recognition technology has brought great convenience to people's lives, but face recognition systems also face the security threat of living body attacks. For example, after an attacker obtains a face image of user A, the attacker can present the image to the face recognition system on a mobile phone screen or as a printout, thereby mounting an attack. Once a living body attack succeeds, it causes significant loss to the user; therefore, living body attack detection is necessary.
Currently, living body detection is generally performed with a deep learning model. However, deep learning models generalize poorly across domains. To address this, one approach is to train on multi-domain data, i.e., collect training data from as many different data domains as possible, so as to obtain a deep learning model that performs well in various scenarios. Another approach is few-shot fine-tuning, i.e., collecting a small number of samples from a data domain on which the trained model performs poorly and optimizing the model on them to improve its performance. However, both approaches require collecting samples from each data domain, so collecting the domain data takes a great deal of time and the collection efficiency is low.
Disclosure of Invention
This specification provides a living body detection model training method, a living body detection method, and a living body detection system that can improve the efficiency of collecting domain data.
In a first aspect, this specification provides a method for training a living body detection model, including: acquiring a first training image set; generating a second training image set based on the first training image set, wherein the living body resolvability of the second training image set on a basic living body detection model is lower than that of the first training image set on the basic living body detection model; and training a target neural network model with a comprehensive training image set to obtain a target living body detection model whose living body resolution performance is superior to that of the basic living body detection model, wherein the comprehensive training image set comprises the first training image set and the second training image set.
In some embodiments, the basic living body detection model is trained with the following steps: acquiring a third training image set, and taking each training image in the third training image set as a target training image; randomly shuffling the image blocks corresponding to the target training image with a preset basic living body detection network to obtain at least two perturbed training images corresponding to the target training image; performing feature extraction and living body category prediction on the at least two perturbed training images to obtain feature vectors and living body classification results corresponding to the at least two perturbed training images; and determining first loss information based on the feature vectors and living body classification results corresponding to the at least two perturbed training images, and converging the preset basic living body detection network based on the first loss information to obtain the basic living body detection model.
In some embodiments, determining the first loss information based on the feature vectors and living body classification results corresponding to the at least two perturbed training images includes: determining feature comparison loss information based on the feature vectors corresponding to the at least two perturbed training images, where the constraint of the feature comparison loss information is that the similarity between feature vectors of perturbed training images derived from the same target training image is maximized, while the similarity between feature vectors of perturbed training images derived from different target training images is minimized; determining at least two first living body classification loss informations corresponding to the at least two perturbed training images based on their living body classification results and the labeled living body category of the target training image; and obtaining the first loss information as the accumulated sum of the feature comparison loss information and the at least two first living body classification loss informations.
In some embodiments, generating the second training image set based on the first training image set includes: combining image blocks of a plurality of training images in the first training image set with a trained reinforcement learning network to obtain an intermediate training image set corresponding to the first training image set; and performing domain sample division on the intermediate training image set to obtain the second training image set.
In some embodiments, the reinforcement learning network is trained with the following steps: acquiring a fourth training image set, and combining image blocks of a plurality of training images in the fourth training image set with a preset reinforcement learning network to obtain a new training image set corresponding to the fourth training image set; predicting the living body categories of the new training image set with the basic living body detection model to obtain living body classification results corresponding to the new training image set; and determining second loss information based on the living body classification results corresponding to the new training image set and the labeled living body categories of the fourth training image set, and converging the preset reinforcement learning network in the direction that maximizes the second loss information to obtain the trained reinforcement learning network.
In some embodiments, performing domain sample division on the intermediate training image set to obtain the second training image set includes: inputting the intermediate training image set into the basic living body detection model to obtain feature vectors corresponding to the intermediate training image set; performing N-class clustering on the feature vectors corresponding to the intermediate training image set to obtain N new domain categories corresponding to the intermediate training image set, where N is an integer greater than or equal to 1; and obtaining the second training image set based on the intermediate training image set and the N new domain categories corresponding to it.
In some embodiments, the N-class clustering uses the KMeans clustering method.
In some embodiments, the comprehensive training image set corresponds to sub-training image sets of a plurality of domain categories, the plurality of domain categories including M domain categories corresponding to the first training image set and N new domain categories corresponding to the second training image set, where M is an integer greater than or equal to 1; the target neural network model includes a feature extraction network, a multi-branch classification network, and a fusion classification network, the multi-branch classification network including a plurality of sub-classification networks. Training the target neural network model with the comprehensive training image set to obtain a target living body detection model whose living body resolution performance is superior to that of the basic living body detection model includes the following steps: performing feature extraction on the sub-training image sets of the plurality of domain categories with the feature extraction network to obtain a plurality of first training feature sets corresponding to the plurality of sub-training image sets; performing feature extraction and living body category prediction on the plurality of first training feature sets with the plurality of sub-classification networks, respectively, to obtain a plurality of second training feature sets and a plurality of sub living body classification results corresponding to the plurality of first training feature sets; performing living body category fusion prediction with the fusion classification network based on the plurality of second training feature sets and the plurality of sub living body classification results to obtain a fused living body classification result; and determining second loss information based on the fused living body classification result and the labeled living body categories corresponding to the comprehensive training image set, and converging the fusion classification network, the multi-branch classification network, and the feature extraction network based on the second loss information to obtain the target living body detection model.
In some embodiments, performing living body category fusion prediction with the fusion classification network based on the plurality of second training feature sets and the plurality of sub living body classification results to obtain the fused living body classification result includes: adaptively adjusting the weights of the plurality of second training feature sets with the fusion classification network, based on the differences between the sub living body classification results and the labeled living body categories, to obtain the weights of the plurality of second training feature sets; performing feature fusion on the plurality of second training feature sets based on those weights to obtain the fused training features of the plurality of sub-training image sets; and predicting the living body categories of the plurality of sub-training image sets based on the fused training features to obtain the fused living body classification result of the plurality of sub-training image sets.
In some embodiments, determining the second loss information based on the fused living body classification result and the labeled living body categories corresponding to the comprehensive training image set, and on the plurality of sub living body classification results and their corresponding labeled living body categories, includes: obtaining a fused living body classification loss based on the fused living body classification result and the labeled living body categories corresponding to the comprehensive training image set; determining a plurality of sub living body classification losses based on the plurality of sub living body classification results and their corresponding labeled living body categories; and obtaining the second loss information as the accumulated sum of the fused living body classification loss and the plurality of sub living body classification losses.
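This accumulation can be illustrated with a minimal sketch (Python/PyTorch); the cross-entropy form and the function name second_loss are assumptions for illustration, since the text does not fix a concrete loss function:

```python
import torch
import torch.nn.functional as F

def second_loss(fused_logits: torch.Tensor,
                sub_logits_list: list[torch.Tensor],
                labels: torch.Tensor) -> torch.Tensor:
    """Second loss information: the fused living body classification loss plus
    the accumulated sub living body classification losses (cross-entropy assumed)."""
    loss = F.cross_entropy(fused_logits, labels)           # fused classification loss
    for sub_logits in sub_logits_list:                     # one loss per sub-network
        loss = loss + F.cross_entropy(sub_logits, labels)
    return loss
```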
In a second aspect, this specification also provides a living body detection method, including: acquiring an original image of a target object; and performing living body detection on the original image with a living body detection model to obtain a living body detection result, where the living body detection model is obtained with the living body detection model training method of the first aspect.
In a third aspect, this specification also provides a living body detection system, comprising: at least one storage medium storing at least one instruction set for living body detection; and at least one processor in communication with the at least one storage medium, wherein, when the living body detection system is in operation, the at least one processor reads the living body detection model and implements the living body detection method of the second aspect, the living body detection model being trained according to the living body detection model training method of the first aspect.
As can be seen from the above technical scheme, after the first training image set is acquired, the second training image set is generated based on the first training image set, where the living body resolvability of the second training image set on the basic living body detection model is lower than that of the first training image set, and the target neural network model is trained with the first training image set and the second training image set to obtain a target living body detection model whose living body resolution performance is better than that of the basic living body detection model. Because the second training image set is generated from the acquired first training image set, and its living body resolvability on the basic living body detection model is lower than that of the first training image set, the second training image set consists of new-domain samples whose living body categories the basic living body detection model has difficulty distinguishing. New-domain samples can therefore be generated from existing-domain samples, which improves the efficiency of collecting domain data.
Other functions of the living body detection model training method, living body detection method, and living body detection system provided in this specification are set forth in part in the description that follows, and will in part be apparent to those of ordinary skill in the art from the description and examples presented below. The inventive aspects of the living body detection model training methods, apparatuses, and systems provided in this specification may be fully explained by practicing or using the methods, apparatuses, and combinations described in the detailed examples below.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of this specification, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of this specification; other drawings may be obtained from them by a person of ordinary skill in the art without inventive effort.
Fig. 1 shows an application scenario schematic of a living body detection system provided according to an embodiment of the present specification;
FIG. 2 illustrates a hardware architecture diagram of a computing device provided in accordance with an embodiment of the present description;
FIG. 3 shows a flow chart of a method of training a living body detection model provided in accordance with an embodiment of the present disclosure;
FIG. 4 shows a training flow of a living body detection model provided in accordance with an embodiment of the present disclosure;
fig. 5 shows a flowchart of a living body detection method provided according to an embodiment of the present specification.
Detailed Description
The following description is presented to enable one of ordinary skill in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the disclosure. Thus, the present description is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.
The terminology used herein is for the purpose of describing particular example embodiments only and is not intended to be limiting. For example, as used herein, the singular forms "a", "an" and "the" include plural referents unless the context clearly dictates otherwise. The terms "comprises," "comprising," "includes," and/or "including," when used in this specification, are taken to specify the presence of stated integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
These and other features of this specification, as well as the operation and functions of the related structural elements and the combination of parts and economies of manufacture, will become more apparent upon consideration of the following description with reference to the accompanying drawings, all of which form a part of this specification. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended to limit the scope of this specification. It should also be understood that the drawings are not drawn to scale.
The flowcharts used in this specification illustrate operations implemented by systems according to some embodiments in this specification. It should be clearly understood that the operations of the flow diagrams may be implemented out of order. Rather, operations may be performed in reverse order or concurrently. Further, one or more other operations may be added to the flowchart. One or more operations may be removed from the flowchart.
For convenience of description, terms appearing in the specification are explained first as follows:
Living body anti-attack: in a face recognition system, the algorithmic technology for detecting and intercepting living body attacks (including attack behaviors such as mobile phone photos, paper photos, and masks).
Cross-domain living body detection: performing living body detection across different data domains with the same living body detection algorithm.
Different data domains: data acquired by image acquisition in different scenarios. For example, differences in people's postures, the height of the image acquisition device, the operating time of the device (some devices are used at night, some in the daytime), interaction time, and so on all give the acquired images different characteristics.
Patch-wise: in this scheme, patches (image blocks) from different domains are combined into images of new domains, so as to expand the set of domain categories.
Before describing the specific embodiments of the present specification, the application scenario of the present specification will be described as follows:
The living body detection method provided in this specification can be applied to any scenario requiring identity verification, such as a face-brushing payment scenario in an offline retail store, an artificial intelligence (AI) vending machine, an access control machine, or other scenarios requiring face-brushing payment. When the living body detection result of the detection object is determined to be a living body, the next step of identity verification is performed. For example, in a payment scenario, living body detection may be performed on the user with the living body detection method provided in this specification; face recognition is then performed when the user is determined to be a living body, and the face-brushing payment operation is allowed when the user is determined to be a legitimate user. Similarly, in an access control scenario, living body detection may be performed on the user, face recognition is performed when the user is determined to be a living body, and passage is allowed when the user is determined to be a legitimate user; and in an information query scenario, living body detection may be performed on the user, face recognition is performed when the user is determined to be a living body, and the information query is allowed when the user is determined to be a legitimate user.
It should be understood by those skilled in the art that applying the living body detection method of this specification to other usage scenarios also falls within the protection scope of this specification.
Fig. 1 shows an application scenario schematic diagram of a living body detection system 001 provided according to an embodiment of the present specification. As shown in fig. 1, the living body detection system 001 (hereinafter, referred to as system 001) may include: target user 100, terminal device 200, server 300, and network 400. The terminal device 200 and the server 300 are connected to the network 400.
The target user 100 may be a user who triggers living body detection of the part to be detected, and the target user 100 may perform the living body detection operation on the terminal device 200. For example, the target user 100 may enter the face-brushing payment step through a series of operations in a payment service, trigger the living body detection function by placing the face within the face detection range in an access control scenario, or enter the face-brushing verification step through a series of operations in an information query scenario.
The terminal device 200 may be a device that performs living body detection of the part to be detected in response to the living body detection operation of the target user 100. In some embodiments, the living body detection model training method and the living body detection method may be performed on the terminal device 200. In this case, the terminal device 200 may store data or instructions for performing the living body detection model training method and the living body detection method described in this specification, and may execute or be used to execute those data or instructions. In some embodiments, the terminal device 200 may include a hardware device having a data information processing function and the programs necessary to drive the hardware device. As shown in fig. 1, the terminal device 200 may be communicatively connected to the server 300. In some embodiments, the server 300 may be communicatively coupled to a plurality of terminal devices 200. In some embodiments, the terminal device 200 may interact with the server 300 through the network 400 to receive or transmit messages and the like, such as an original image or a living body detection result. In some embodiments, the terminal device 200 may include a mobile device, a tablet, a laptop, a built-in device of a motor vehicle, or the like, or any combination thereof. In some embodiments, the mobile device may include a smart home device, a smart mobile device, a virtual reality device, an augmented reality device, or the like, or any combination thereof. In some embodiments, the smart home device may include a smart television, a desktop computer, or the like, or any combination thereof. In some embodiments, the smart mobile device may include a smartphone, a personal digital assistant, a gaming device, a navigation device, or the like, or any combination thereof. In some embodiments, the virtual reality device or augmented reality device may include a virtual reality helmet, virtual reality glasses, a virtual reality patch, an augmented reality helmet, augmented reality glasses, an augmented reality patch, or the like, or any combination thereof; for example, Google Glass, a head-mounted display, VR devices, and the like. In some embodiments, the built-in device of a motor vehicle may include an on-board computer, an on-board television, and the like. In some embodiments, the terminal device 200 may include an image acquisition device for acquiring the first training image set, the third training image set, the fourth training image set, and the original image. In some embodiments, the image acquisition device may be a two-dimensional image acquisition device (such as an RGB camera), or a two-dimensional image acquisition device (such as an RGB camera) combined with a depth image acquisition device (such as a 3D structured-light camera, a laser detector, etc.).
In some embodiments, the terminal device 200 may have one or more applications (APPs) installed. An APP can provide the target user 100 with an interface and the ability to interact with the outside world via the network 400. APPs include, but are not limited to: web browser APPs, search APPs, chat APPs, shopping APPs, video APPs, financial APPs, instant messaging tools, mailbox clients, social platform software, and the like. In some embodiments, a target APP may be installed on the terminal device 200. The target APP may acquire the first training image set, the third training image set, the fourth training image set, the original image, or the living body detection result through the terminal device 200. In some embodiments, the target user 100 may also trigger a living body detection request through the target APP, and the target APP may perform the living body detection method described in this specification in response to that request. In some embodiments, the target user 100 may also trigger a living body detection model training request through the target APP, and the target APP may perform the living body detection model training method described in this specification in response to that request. The living body detection model training method and the living body detection method will be described in detail later.
The server 300 may be a server providing various services, such as a background server supporting the training image sets and original images acquired on the terminal device 200. In some embodiments, the living body detection model training method and the living body detection method may be performed on the server 300. In this case, the server 300 may store data or instructions for performing the living body detection model training method and the living body detection method described in this specification, and may execute or be used to execute those data or instructions. In some embodiments, the server 300 may include a hardware device having a data information processing function and the programs necessary to drive the hardware device. The server 300 may be communicatively connected to a plurality of terminal devices 200 and receive data transmitted by them.
The network 400 is the medium that provides the communication connection between the terminal device 200 and the server 300. The network 400 may facilitate the exchange of information or data. As shown in fig. 1, the terminal device 200 and the server 300 may be connected to the network 400 and transmit information or data to each other through it. In some embodiments, the network 400 may be any type of wired or wireless network, or a combination thereof. For example, the network 400 may include a cable network, a wireline network, a fiber-optic network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth™ network, a ZigBee™ network, a near field communication (NFC) network, or the like. In some embodiments, the network 400 may include one or more network access points. For example, the network 400 may include wired or wireless network access points, such as base stations or Internet exchange points, through which one or more components of the terminal device 200 and the server 300 may connect to the network 400 to exchange data or information.
It should be understood that the number of terminal devices 200, servers 300, and networks 400 in fig. 1 are merely illustrative. There may be any number of terminal devices 200, servers 300, and networks 400, as desired for implementation.
The living body detection model training method and the living body detection method may be executed entirely on the terminal device 200, entirely on the server 300, or partially on the terminal device 200 and partially on the server 300.
Fig. 2 illustrates a hardware architecture diagram of a computing device 500 provided according to an embodiment of this specification. The computing device 500 may perform the living body detection model training method and the living body detection method described in this specification; both methods are described elsewhere in this specification. When the methods are executed on the terminal device 200, the computing device 500 may be the terminal device 200. When they are executed on the server 300, the computing device 500 may be the server 300. When they are executed partially on the terminal device 200 and partially on the server 300, the computing device 500 may be both the terminal device 200 and the server 300.
As shown in fig. 2, computing device 500 may include at least one storage medium 530 and at least one processor 520. In some embodiments, computing device 500 may also include a communication port 550 and an internal communication bus 510. Meanwhile, the computing device 500 may also include an I/O component 560.
Internal communication bus 510 may connect the different system components including storage medium 530, processor 520, and communication port 550.
I/O component 560 supports input/output between computing device 500 and other components.
The communication port 550 is used for data communication between the computing device 500 and the outside world, for example, the communication port 550 may be used for data communication between the computing device 500 and the network 400. The communication port 550 may be a wired communication port or a wireless communication port.
The storage medium 530 may include a data storage device. The data storage device may be a non-transitory storage medium or a transitory storage medium. For example, the data storage device may include one or more of a magnetic disk 532, a read-only memory (ROM) 534, or a random access memory (RAM) 536. The storage medium 530 further includes at least one instruction set stored in the data storage device. The instructions are computer program code, which may include programs, routines, objects, components, data structures, procedures, modules, and the like that perform the living body detection model training method and the living body detection method provided in this specification.
The at least one processor 520 may be communicatively coupled with the at least one storage medium 530 and the communication port 550 via the internal communication bus 510. The at least one processor 520 is configured to execute the at least one instruction set. When the computing device 500 is running, the at least one processor 520 reads the at least one instruction set and, according to its instructions, performs the living body detection model training method and the living body detection method provided in this specification. The processor 520 may perform all the steps involved in these methods. The processor 520 may be in the form of one or more processors. In some embodiments, the processor 520 may include one or more hardware processors, such as a microcontroller, a microprocessor, a reduced instruction set computer (RISC), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a central processing unit (CPU), a graphics processing unit (GPU), a physics processing unit (PPU), a microcontroller unit, a digital signal processor (DSP), a field-programmable gate array (FPGA), an advanced RISC machine (ARM), a programmable logic device (PLD), any circuit or processor capable of performing one or more functions, or the like, or any combination thereof. For illustrative purposes only, a single processor 520 is depicted in the computing device 500 in this specification. It should be noted, however, that the computing device 500 may also include multiple processors; thus, the operations and/or method steps disclosed in this specification may be performed by one processor, or jointly by multiple processors. For example, if the processor 520 of the computing device 500 performs steps A and B in this specification, it should be understood that steps A and B may also be performed by two different processors 520, jointly or separately (e.g., a first processor performs step A and a second processor performs step B, or the first and second processors perform steps A and B together).
Fig. 3 shows a flowchart of a living body detection model training method P100 provided according to an embodiment of this specification. As described above, the computing device 500 may perform the living body detection model training method P100 of this specification. Specifically, the processor 520 may read an instruction set stored in its local storage medium and then, as specified by the instruction set, execute the living body detection model training method P100. As shown in fig. 3, the method P100 may include:
s110: a first training image set is acquired.
The first training image set includes a plurality of training images, each of which is an image of a target detection part of a training object; the training object is the object on which living body detection is to be performed in the training image, such as a user. When the training object is a user, the target detection part may be a human body part from which biological features can be extracted, such as a face, an iris, or a fingerprint.
The first training image set may be acquired in various ways, for example as follows:
For example, the processor 520 may receive a set of face training images of users uploaded by the target user 100 through the terminal device; it may acquire at least one face image from a network or a public image data set to obtain the first training image set; or it may receive a model training request and acquire the first training image set based on the storage address of the first training image set carried in the model training request.
S120: a second training image set is generated based on the first training image set.
Here, the living body resolvability of the second training image set on the basic living body detection model is lower than that of the first training image set on the basic living body detection model. Living body resolvability refers to how well the basic living body detection model can resolve the living body category; for example, the basic living body detection model can resolve the living body categories of the training objects in the first training image set well, but cannot do so well for the second training image set.
The second training image set may be generated based on the first training image set in various ways. For example, the processor 520 combines image blocks of a plurality of training images in the first training image set with the trained reinforcement learning network to obtain an intermediate training image set corresponding to the first training image set, and performs domain sample division on the intermediate training image set to obtain the second training image set. Based on a patch-wise sample generation method, the reinforcement learning network can combine existing domain data into new domain data, where each new domain sample is composed of patches from different existing domains, and the domain category of the new domain data is one that does not appear in the existing domain data.
Both the training of the reinforcement learning network and the domain sample division require a trained basic living body detection model, so the training process of the basic living body detection model is introduced before that of the reinforcement learning network.
The basic living body detection model can be trained as follows: the processor 520 may acquire a third training image set and take each training image in it as a target training image; randomly shuffle the image blocks corresponding to the target training image with a preset basic living body detection network to obtain at least two perturbed training images corresponding to the target training image; perform feature extraction and living body category prediction on the at least two perturbed training images to obtain their feature vectors and living body classification results; determine first loss information based on those feature vectors and living body classification results; and converge the preset basic living body detection network based on the first loss information to obtain the basic living body detection model.
The training images in the third training image set may completely or partially overlap the training images in the first training image set; for example, the first training image set may be used as the third training image set, or the first training image set may be expanded to obtain the third training image set.
The basic living body detection model is a model insensitive to the face structure, obtained by training the preset basic living body detection network with a patch-wise random shuffle strategy. As a result, the model does not rely excessively on face structure information such as the distribution of facial features and the face contour, but focuses on texture information in the face image, such as screen reflections and paper material, from which it can judge whether the object in the face image is a living body. Because training uses perturbed training images obtained by randomly shuffling the image blocks of the training images in the third training image set, the dependence of living body detection on the face structure is removed, so that in some cross-domain scenarios where the face structure changes, such as changes in face pose or face angle, the living body detection performance of the model remains stable and good.
The preset basic living body detection network includes a patch-wise random shuffle sample generator and a feature extraction module; the feature extraction module may be any network structure used for feature extraction, such as resnet18, resnet50, or a Transformer network. The patch-wise random shuffle sample generator randomly shuffles the image blocks corresponding to a target training image to obtain at least two perturbed training images corresponding to it. The feature extraction module extracts features from the at least two perturbed training images corresponding to each training image in the third training image set to obtain the feature vectors of those perturbed training images, and performs living body judgment based on those feature vectors to obtain the final living body classification results.
The patch-wise random shuffle sample generator may shuffle the image blocks as follows: for example, the target training image is divided into w×h regions, and the order of the w×h regions is then randomly permuted to obtain a shuffled version of the target training image, i.e., a perturbed training image. Each random shuffle of the image-block order of a target training image yields one perturbed training image, so shuffling at least twice with different orders yields the at least two perturbed training images corresponding to the target training image.
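The shuffle generator can be illustrated with the following sketch (PyTorch); the helper name patch_shuffle is hypothetical, and a fixed w×h grid with image dimensions divisible by the grid size is assumed:

```python
import torch

def patch_shuffle(image: torch.Tensor, w: int = 4, h: int = 4) -> torch.Tensor:
    """Split a (C, H, W) image into a w x h grid of image blocks and return a
    copy with the block order randomly permuted, i.e. one perturbed image."""
    c, img_h, img_w = image.shape
    ph, pw = img_h // h, img_w // w
    blocks = [image[:, r * ph:(r + 1) * ph, col * pw:(col + 1) * pw]
              for r in range(h) for col in range(w)]
    order = torch.randperm(len(blocks)).tolist()          # random new order
    rows = [torch.cat([blocks[order[r * w + col]] for col in range(w)], dim=2)
            for r in range(h)]                            # stitch blocks per row
    return torch.cat(rows, dim=1)                         # stitch rows

# Two different shuffles of the same image give the "at least two" perturbed
# training images used below.
img = torch.rand(3, 128, 128)
view_a, view_b = patch_shuffle(img), patch_shuffle(img)
```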
After the at least two perturbed training images corresponding to each training image in the third training image set are obtained, they can be input into the feature extraction network to obtain the feature vectors and living body classification results of the perturbed training images. The living body classification result characterizes the living body category of the training object, i.e., whether the training object in the training image is a living body. For example, the living body categories may include living images/objects/samples and attack images/objects/samples.
After feature extraction and living body classification prediction are performed on the perturbed training images of the third training image set, the processor 520 may determine the first loss information based on their feature vectors and living body classification results. This can be done in various ways; for example, the processor 520 may determine feature comparison loss information based on the feature vectors corresponding to the at least two perturbed training images, determine at least two first living body classification loss informations based on the living body classification results of the at least two perturbed training images and the labeled living body category of the target training image, and obtain the first loss information as the accumulated sum of the feature comparison loss information and the at least two first living body classification loss informations. The constraint of the feature comparison loss information is that the similarity between feature vectors of perturbed training images derived from the same target training image is maximized, while the similarity between feature vectors of perturbed training images derived from different target training images is minimized; that is, the feature vectors of two differently shuffled versions of the same training image should stay consistent, while the feature vectors of different training images, even when shuffled in the same order, should stay apart. The first living body classification loss information characterizes the loss produced by the difference between the living body categories predicted for the at least two perturbed training images and the labeled living body category.
After determining the feature comparison loss information and the at least two first living body classification loss informations, the processor 520 may accumulate them to obtain the first loss information, which may be expressed as formula (1):

Loss_base = Loss_contrastive + Loss_cls    (1)

In formula (1), Loss_base is the first loss information, Loss_contrastive is the feature comparison loss information, and Loss_cls is the accumulated sum of the at least two first living body classification loss informations.
The aim of the feature comparison loss information is that two perturbed training images derived from the same training image remain similar in feature space after dimensionality reduction (feature extraction), while perturbed training images derived from different training images remain dissimilar in feature space, even when their image blocks were shuffled in the same order.
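Formula (1) can be instantiated, for example, with an InfoNCE-style contrastive term over the two shuffled views plus per-view cross-entropy. The sketch below (PyTorch) is such an illustrative instantiation, not the fixed formulation of this scheme; the temperature and the batch-level pairing are assumptions:

```python
import torch
import torch.nn.functional as F

def loss_base(feat_a, feat_b, logits_a, logits_b, labels, temperature=0.1):
    """Loss_base = Loss_contrastive + Loss_cls for a batch whose two views
    (feat_a, feat_b) come from two different patch shuffles of the same images.
    Matching rows of feat_a/feat_b are pulled together; rows from different
    source images are pushed apart."""
    z_a = F.normalize(feat_a, dim=1)
    z_b = F.normalize(feat_b, dim=1)
    sim = z_a @ z_b.t() / temperature                  # pairwise similarities
    targets = torch.arange(sim.size(0), device=sim.device)
    loss_contrastive = F.cross_entropy(sim, targets)   # diagonal = positives
    # One live/attack classification loss per perturbed view.
    loss_cls = F.cross_entropy(logits_a, labels) + F.cross_entropy(logits_b, labels)
    return loss_contrastive + loss_cls
```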
After determining the first loss information, the processor 520 may converge the preset basic living body detection network based on it to obtain the trained basic living body detection model. This can be done in various ways; for example, the processor 520 may update the network parameters of the preset basic living body detection network with a gradient descent algorithm based on the first loss information, and continue training until the preset training end condition is reached, obtaining the trained basic living body detection model.
The preset training end condition may be, for example, that the number of training iterations reaches a preset number or that the first loss information is minimized.
After the basic living body detection model is trained, the reinforcement learning network can be trained based on the trained basic living body detection model, as follows: for example, the processor 520 acquires a fourth training image set; combines image blocks of a plurality of training images in the fourth training image set with a preset reinforcement learning network to obtain a new training image set corresponding to the fourth training image set; predicts the living body categories of the new training image set with the basic living body detection model to obtain the living body classification results corresponding to the new training image set; determines second loss information based on those living body classification results and the labeled living body categories of the fourth training image set; and converges the preset reinforcement learning network in the direction that maximizes the second loss information to obtain the trained reinforcement learning network.
The preset reinforcement learning network may be a DQN (Deep Q-Network). In each iteration of training the preset reinforcement learning network, when image blocks of a plurality of training images in the fourth training image set are combined, at least two training images are randomly selected from the fourth training image set and a new training image is generated from them. This can be done in various ways; for example, the processor 520 randomly selects at least two training images, divides them into image blocks, selects and combines some of those image blocks into a new training image, inputs the new training image into the basic living body detection model to predict its living body category, and determines the second loss information based on the predicted living body category and the labeled living body category of the new training image.
The second loss information may take the form of Loss_cls in formula (1) above; it characterizes the loss produced by the difference between the predicted living body category of the new training image and its labeled living body category. After determining the second loss information, the processor 520 may optimize the loss function in the direction that maximizes it. Maximizing the second loss information of the basic living body detection model aims to make the living body resolvability, on the basic living body detection model, of the new training images learned by the reinforcement learning network low. Moreover, during the training of the reinforcement learning network, the loss function of the basic living body detection model is used to update only the parameters of the reinforcement learning network; the parameters of the basic living body detection model are not updated. That is, during reinforcement learning network training the performance of the basic living body detection model remains unchanged, and the trained basic living body detection model is only used to train the reinforcement learning network. In this way, the basic living body detection model can be used to evaluate the new training images generated by the reinforcement learning network: the lower the living body resolvability of those new training images on the basic living body detection model, the more reliably the reinforcement learning network generates images of new, previously untrained domain categories.
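The text specifies a DQN but leaves the action space and reward open; the sketch below uses a simpler REINFORCE-style policy update to show the key mechanics: the composed image is scored by the frozen basic living body detection model, whose classification loss is the reward being maximized. The policy interface (one distribution over K source images per grid cell) and the assumption that the K sources share one labeled living body category are illustrative choices, not fixed by the scheme:

```python
import torch
import torch.nn.functional as F

def generator_step(policy, base_model, images, label, optimizer, grid=4):
    """One policy-gradient step. images: (K, C, H, W) source images sharing the
    living body label `label` (scalar tensor, an assumption); policy(images)
    -> (grid*grid, K) logits choosing, per grid cell, the source image."""
    k, c, hh, ww = images.shape
    ph, pw = hh // grid, ww // grid
    dist = torch.distributions.Categorical(logits=policy(images))
    choice = dist.sample()                          # source index per grid cell
    blocks = []
    for idx in range(grid * grid):
        r, col = divmod(idx, grid)
        src = images[choice[idx]]
        blocks.append(src[:, r*ph:(r+1)*ph, col*pw:(col+1)*pw])
    rows = [torch.cat(blocks[r*grid:(r+1)*grid], dim=2) for r in range(grid)]
    new_image = torch.cat(rows, dim=1).unsqueeze(0)
    with torch.no_grad():                           # base model stays frozen
        reward = F.cross_entropy(base_model(new_image), label.view(1))
    loss = -(reward * dist.log_prob(choice).sum())  # ascend on the reward
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return new_image.squeeze(0).detach(), reward.item()
```

The optimizer here holds only the policy's parameters, matching the point above that the basic living body detection model is never updated.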
After the reinforcement learning network has generated the intermediate training image set from the first training image set, the domain categories of the training images in the intermediate training image set need to be determined for the subsequent training of the target neural network model. This can be implemented in various ways, specifically as follows: for example, the processor 520 inputs the intermediate training image set into the basic living body detection model to obtain the corresponding feature vectors, performs N-class clustering on those feature vectors to obtain N domain categories corresponding to the intermediate training image set, where N is an integer greater than or equal to 1, and obtains the second training image set based on the intermediate training image set and its N domain categories. The N domain categories are domain categories that do not appear in the first training image set; because they do not appear in the first training image set, the basic living body detection model cannot resolve their living body categories well, and the generated intermediate training image set thus represents new domain categories relative to the domain categories of the first training image set.
In some embodiments, to ensure that the basic living body detection model can better serve the training of the reinforcement learning network, the domain categories corresponding to the first training image set may be substantially the same as those corresponding to the third training image set; that is, the domain categories of the first training image set may be those of the training image set used to train the basic living body detection model. This ensures that, during the training of the reinforcement learning network, the generated second training image set consists of training images of new domain categories whose living body categories the basic living body detection model cannot distinguish.
N-class clustering groups the feature vectors of the training images in the intermediate training image set that belong to the same domain category into one class. Since the feature vectors correspond one-to-one to the training images, the clustering result of the feature vectors represents the clustering result of the training images; that is, the domain category of a training image's feature vector is the domain category of the training image. Various clustering algorithms can be used, such as the KMeans clustering method.
It should be understood that the KMeans clustering method is only an example; performing N-class clustering on the feature vectors corresponding to the intermediate training image set with other clustering algorithms is also within the protection scope of this specification.
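A sketch of this division step (Python, using scikit-learn's KMeans; base_model.extract_features is an assumed accessor for the model's feature vectors, not an API named in the text):

```python
import torch
from sklearn.cluster import KMeans

@torch.no_grad()
def divide_domain_samples(base_model, intermediate_images, n_domains):
    """Cluster the basic living body detection model's feature vectors of the
    intermediate training images into N new domain categories."""
    feats = base_model.extract_features(intermediate_images)   # (B, D) tensor
    domain_ids = KMeans(n_clusters=n_domains, n_init=10).fit_predict(
        feats.cpu().numpy())
    # Each image paired with its new domain category id forms the second
    # training image set.
    return list(zip(intermediate_images, domain_ids.tolist()))
```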
S130: and training the target neural network model by using the comprehensive training image set to obtain a target living body detection model with living body resolution performance superior to that of the basic living body detection model.
The comprehensive training image set includes the first training image set and the second training image set. After the first training image set and the second training image set are obtained, they can be combined, and the combined comprehensive training image set is used to train the target neural network model to obtain the target living body detection model.
After domain sample division, the training images in the intermediate training image set are clustered into N sub-training image sets, each corresponding to one of the N domain categories. Each training image in the first training image set carries a manually labeled domain category; these can be recorded as M domain categories, i.e., the first training image set can be regarded as the sub-training image sets corresponding to M domain categories, where M is an integer greater than or equal to 1. From the M domain categories corresponding to the first training image set and the N new domain categories corresponding to the second training image set, the sub-training image sets of the comprehensive training image set for all domain categories are obtained, which amounts to dividing the training images of the comprehensive training image set that belong to the same domain category into one sub-training image set.
The target neural network model includes a feature extraction network, a multi-branch classification network, and a fusion classification network, where the multi-branch classification network includes a plurality of sub-classification networks. The feature extraction network may be a Backbone-type network for performing feature extraction; the plurality of sub-classification networks share the same structure, and each sub-classification network may be a residual structure of several convolutional layers plus ReLU layers; the fusion classification network may be a structure of a fully connected layer plus an SE block.
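Purely as an illustrative sketch of this three-part structure (the layer sizes, feature dimension, and class count are assumptions for the example, and TargetNet, SubBranch, and SEBlock are names chosen here rather than taken from this specification), the model may be organized as follows, with each training image routed to the sub-classification network of its domain category:

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation style re-weighting over feature channels."""
    def __init__(self, dim, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(),
            nn.Linear(dim // reduction, dim), nn.Sigmoid())

    def forward(self, x):            # x: (batch, dim)
        return x * self.fc(x)

class SubBranch(nn.Module):
    """One sub-classification network: a residual conv+ReLU block plus a classifier."""
    def __init__(self, dim):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1))
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.cls = nn.Linear(dim, 2)  # living body vs. attack

    def forward(self, feat_map):
        h = torch.relu(feat_map + self.block(feat_map))   # residual connection
        vec = self.pool(h).flatten(1)                     # second feature vector
        return vec, self.cls(vec)                         # sub-living body logits

class TargetNet(nn.Module):
    def __init__(self, num_domains, dim=64):
        super().__init__()
        # Stand-in for a real Backbone trunk (the feature extraction network).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, dim, 7, stride=2, padding=3), nn.ReLU())
        self.branches = nn.ModuleList(SubBranch(dim) for _ in range(num_domains))
        self.se = SEBlock(dim)                 # adaptive feature fusion
        self.fusion_cls = nn.Linear(dim, 2)    # fusion classification head

    def forward(self, images, domain_ids):
        feat = self.backbone(images)           # first feature maps
        vecs, logits = [], []
        for i, d in enumerate(domain_ids):     # route each image to its domain branch
            v, l = self.branches[int(d)](feat[i:i + 1])
            vecs.append(v)
            logits.append(l)
        vecs = torch.cat(vecs)                 # second training features
        fused = self.fusion_cls(self.se(vecs)) # fused living body logits
        return torch.cat(logits), vecs, fused

The SE block here re-weights the fused feature channels adaptively, which is one simple way to realize the adaptive feature fusion described below.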
After the comprehensive training image set is obtained, the target neural network model can be trained on it to obtain a target living body detection model whose living body resolution performance is superior to that of the basic living body detection model. This training can be carried out in various ways, specifically as follows:
for example, the processor 520 performs feature extraction on the sub-training image sets of the plurality of domain categories by using the feature extraction network to obtain a plurality of first training feature sets corresponding to the plurality of sub-training image sets; performs feature extraction and living body category prediction on the plurality of first training feature sets by using the plurality of sub-classification networks, respectively, to obtain a plurality of second training feature sets and a plurality of sub-living body classification results corresponding to the plurality of first training feature sets; performs living body category fusion prediction by using the fusion classification network based on the plurality of second training feature sets and the plurality of sub-living body classification results to obtain a fused living body classification result; determines second loss information based on the fused living body classification result and the labeled living body categories corresponding to the comprehensive training image set; and converges the fusion classification network, the multi-branch classification network, and the feature extraction network based on the second loss information to obtain the target living body detection model.
The input of the feature extraction network is all training images in the comprehensive training image set, and its output is the first feature vector of each of these training images. The first feature vectors of the training images of the same domain category are then input into the sub-classification network corresponding to that domain category to obtain the second feature vectors and sub-living body classification results of those training images. For example, the first feature vectors of the training images of domain category A are input into the sub-classification network for domain category A, the first feature vectors of the training images of domain category B are input into the sub-classification network for domain category B, and so on, so that the second feature vectors and sub-living body classification results are obtained for the sub-training image set of each of the plurality of domain categories. The sub-living body classification result characterizes whether the training object of each sub-training image set belongs to the living body category.
After the second feature vectors and the sub-living body classification results are obtained for the sub-training image set of each domain category, they can be input into the fusion classification network, which performs adaptive feature fusion through an SE block (Squeeze-and-Excitation block) to obtain the fused living body classification result corresponding to the comprehensive training image set. The fused classification result characterizes whether the training objects in the comprehensive training image set belong to the living body category. The fusion classification network can fuse the second training feature sets of the plurality of sub-training image sets and predict living body categories in various ways. For example, the processor 520 uses the fusion classification network to adaptively adjust the weights of the plurality of second training feature sets based on the differences between the plurality of sub-classification results and the labeled living body categories, obtaining the weights of the plurality of second training feature sets; fuses the plurality of second training feature sets based on these weights to obtain the fused training features of the plurality of sub-training image sets; and predicts the living body categories of the plurality of sub-training image sets based on the fused training features to obtain the fused living body classification result of the plurality of sub-training image sets.
The difference between a sub-classification result and the labeled living body category is inversely related to the weight of the corresponding second training feature: the smaller the difference between the sub-classification result and the labeled living body category, the larger the weight of the corresponding second training feature; conversely, the larger the difference, the smaller the weight.
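This inverse relation can be sketched as follows; the softmax over negative per-branch cross-entropy is an illustrative choice of weighting function rather than one mandated by this specification, and the per-branch logits layout is assumed:

import torch
import torch.nn.functional as F

def adaptive_branch_weights(branch_logits, labels):
    """branch_logits: assumed list of (batch, 2) sub-living body results, one per branch."""
    # Per-branch cross-entropy measures the difference from the labeled category.
    diffs = torch.stack([F.cross_entropy(l, labels) for l in branch_logits])
    # Smaller difference -> larger weight (the inverse relation described above).
    return F.softmax(-diffs, dim=0)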
After the fused living body classification result corresponding to the comprehensive training image set is obtained, the second loss information is determined based on the fused living body classification result and its labeled living body categories together with the sub-living body classification results of the plurality of sub-training image sets and their labeled living body categories, and the fusion classification network, the multi-branch classification network, and the feature extraction network are converged based on the second loss information to obtain the target living body detection model. The second loss information can be determined in various ways. For example, the processor 520 may determine the fused living body classification loss based on the fused living body classification result and the labeled living body categories corresponding to the plurality of sub-training image sets, determine a plurality of sub-living body classification losses based on the sub-living body classification results of the plurality of sub-training image sets and their labeled living body categories, and obtain the second loss information as the accumulated sum of the fused living body classification loss and the plurality of sub-living body classification losses.
The fused living body classification loss represents the loss caused by the difference between the fused living body classification result and the labeled living body categories corresponding to the plurality of sub-training image sets, and each sub-living body classification loss represents the loss caused by the difference between the sub-living body classification result of a sub-training image set and its labeled living body category. After the fused living body classification loss and the plurality of sub-living body classification losses are determined, they can be accumulated to obtain the second loss information.
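A minimal sketch of this accumulation, assuming cross-entropy as the classification loss and the output layout of the TargetNet sketch above:

import torch.nn.functional as F

def second_loss(fused_logits, sub_logits, labels):
    # Fused living body classification loss.
    fusion_loss = F.cross_entropy(fused_logits, labels)
    # Accumulated sub-living body classification losses (one term per routed sample).
    sub_loss = F.cross_entropy(sub_logits, labels, reduction="sum")
    return fusion_loss + sub_loss   # accumulated sum = second loss information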
After determining the second loss information, the processor 520 may converge the feature extraction network, the multi-branch classification network, and the fusion classification network based on it, thereby obtaining the trained target living body detection model. This convergence can be carried out in various ways. For example, the processor 520 may update the network parameters of the preset feature extraction network, multi-branch classification network, and fusion classification network with a gradient descent algorithm based on the second loss information, and continue training until the networks reach a preset training end condition, such as the training count reaching a preset number or the second loss information being minimized, thereby obtaining the trained target living body detection model.
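A sketch of this convergence step, assuming the TargetNet and second_loss helpers sketched above; the SGD optimizer, the learning rate, and a fixed epoch count standing in for the preset training end condition are illustrative assumptions:

import torch

def train_target_model(model, loader, epochs=10, lr=1e-3):
    opt = torch.optim.SGD(model.parameters(), lr=lr)   # gradient descent
    for _ in range(epochs):                            # preset training count
        for images, domain_ids, labels in loader:
            sub_logits, _, fused = model(images, domain_ids)
            loss = second_loss(fused, sub_logits, labels)
            opt.zero_grad()
            loss.backward()   # updates feature extraction, branches, and fusion head
            opt.step()
    return model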
After the feature extraction network, the multi-branch classification network, and the fusion classification network have been trained to obtain the living body detection model, the model can be used to carry out living body detection.
In order to reduce the labor cost and the time cost of collecting domain sample data and to improve its collection efficiency, this scheme provides a cross-domain generalization living body detection method based on patch-wise domain generation. The overall flow of the method, shown in fig. 4, consists of four parts: basic living body detection model training, performance-decay-based patch-wise sample generation, joint training based on multi-domain samples, and living body detection, specifically as follows:
(1) Basic living body detection model training: basic living body model training is performed with a patch-wise random shuffle strategy to obtain a basic living body model that is insensitive to face structure information.
(2) Performance-decay-based patch-wise sample generation: using reinforcement learning, new training images of several domains that are hardest to distinguish on the basic model are generated; each new training image is assembled from patches of different domains in the existing training images and can be regarded as a training image of a brand-new domain (a minimal patch-level sketch follows this list).
(3) Joint training based on multi-domain samples: multi-domain, multi-branch joint training is performed on the generated patch-wise training images.
(4) Living body detection: the multi-branch, adaptively fused living body detection model obtained from the joint training on multi-domain samples is deployed on the corresponding device to perform living body detection.
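By way of illustration of the patch-wise idea in parts (1) and (2), the following sketch cuts an image into a grid of patches, shuffles them within one image, or mixes patches across images from two domains. The grid size and the random mixing policy are assumptions for the example; this specification learns the combination policy with a reinforcement learning network rather than choosing patches at random.

import torch

def to_patches(img, grid=4):
    # img: (C, H, W) -> (grid*grid, C, H/grid, W/grid)
    c, h, w = img.shape
    p = img.unfold(1, h // grid, h // grid).unfold(2, w // grid, w // grid)
    return p.permute(1, 2, 0, 3, 4).reshape(grid * grid, c, h // grid, w // grid)

def from_patches(patches, grid=4):
    # Reassemble a (grid*grid, C, ph, pw) stack into a (C, H, W) image.
    rows = [torch.cat(list(patches[r * grid:(r + 1) * grid]), dim=2)
            for r in range(grid)]
    return torch.cat(rows, dim=1)

def random_shuffle(img, grid=4):
    # Patch-wise random shuffle: perturbs face structure while keeping content.
    patches = to_patches(img, grid)
    return from_patches(patches[torch.randperm(len(patches))], grid)

def combine_domains(img_a, img_b, grid=4, mix=0.5):
    # Patch-wise cross-domain combination: a brand-new-domain training image.
    pa, pb = to_patches(img_a, grid), to_patches(img_b, grid)
    mask = torch.rand(len(pa)) < mix    # which patches come from img_b
    pa[mask] = pb[mask]
    return from_patches(pa, grid)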
After training to obtain the target living body detection model, the living body detection model can be used for living body detection.
Fig. 5 illustrates a flowchart of a living body detection method P200 provided according to some embodiments of the present specification. As previously described, the computing device 500 may perform the living body detection method P200 described herein. Specifically, the processor 520 may read the instruction set stored in its local storage medium and then execute, as specified by the instruction set, the living body detection method P200 described in this specification. As shown in fig. 5, the method P200 may include:
S210: an original image of an object to be detected is acquired.
The object to be detected refers to an object to be subjected to living body detection, such as a human body. The original image may be a face image of the object to be detected acquired by the image acquisition device.
S220: inputting the original image into the living body detection model to obtain a living body detection result corresponding to the original image, where the living body detection model is trained according to the living body detection model training method described above.
For example, the processor 520 may directly input the original image to the living body detection model trained by the training method of fig. 3, so as to perform living body detection on the object to be detected in the original image, thereby obtaining a living body detection result.
When the living body detection model performs living body detection on the original image, the original image is first input into the feature extraction network for feature extraction, yielding the first feature vector of the original image. This first feature vector is then input into all of the sub-classification networks simultaneously, and each sub-classification network performs feature extraction on it, yielding a plurality of second feature vectors for the original image. The plurality of second feature vectors are input into the fusion classification network, which performs adaptive feature fusion through the SE block to obtain a living body classification probability. Based on this probability, one of two operations is performed: the first operation determines that the object to be detected in the original image is an attacker if the living body classification probability is greater than a threshold T; the second operation determines that the object to be detected in the original image is a living body if the living body classification probability is less than or equal to the threshold T.
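As an illustrative sketch of this inference flow, assuming the TargetNet structure sketched earlier: every branch processes the shared first feature vector, a simple mean over branch features stands in for the adaptive weighting, class index 1 is assumed to denote an attack, and T = 0.5 is an arbitrary threshold:

import torch

@torch.no_grad()
def detect(model, image, threshold=0.5):
    model.eval()
    feat = model.backbone(image.unsqueeze(0))                # first feature vector
    vecs = torch.stack([b(feat)[0].squeeze(0) for b in model.branches])
    fused_vec = model.se(vecs.mean(0, keepdim=True))         # stand-in for adaptive fusion
    prob = model.fusion_cls(fused_vec).softmax(dim=1)[0, 1]  # living body classification probability
    # First operation: probability above T -> attacker; second operation: living body.
    return "attack" if prob.item() > threshold else "living body"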
Adaptive feature fusion means that the fusion classification network can adaptively adjust the weight of the feature vector received from each branch, weight the feature vectors from the branches by the adjusted weights, fuse them, and perform living body detection based on the fused feature vector.
In the adaptive feature fusion process, the fusion classification network fuses the feature vectors produced by the multiple branches with an attention-based method to obtain the final living body classification result. The attention-based method is a similarity measure: the more similar a feature vector is to the target state, the larger the weight of that feature vector, indicating that the current output depends more on that input feature vector.
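A minimal sketch of such attention-based weighting, assuming a learned query vector as the target state and cosine similarity as the similarity measure; both are illustrative choices rather than requirements of this specification:

import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionFusion(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.query = nn.Parameter(torch.randn(dim))  # learned "target state"

    def forward(self, branch_feats):  # (num_branches, dim)
        sim = F.cosine_similarity(branch_feats, self.query.unsqueeze(0), dim=1)
        weights = sim.softmax(dim=0)  # higher similarity -> larger weight
        return (weights.unsqueeze(1) * branch_feats).sum(0)  # fused feature vector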
The specification has the following specific characteristics: (1) A patch-wise sample generation method is adopted, and new domain data are obtained by combining existing domain data, which alleviates the data diversity problem and improves the collection efficiency of different domain data. (2) During the training of the basic detection model, the face structure is directly perturbed at the input through patch-wise random shuffle sample generation and contrastive learning, removing the dependence of living body detection on face structure. (3) During the training of the target neural network model, a multi-branch network structure is adopted so that each branch learns the training images of several similar domains, and the features produced by the branches are fused with an attention-based method to obtain the final living body classification result; this reduces the generalization requirement on each branch and yields better overall performance.
In summary, after the first training image set is acquired, a second training image set is generated from it; the living body resolvability of the second training image set on the basic living body detection model is lower than that of the first training image set, and the two sets are used together to train the target neural network model, yielding a target living body detection model whose living body resolution performance is superior to that of the basic living body detection model. Because the second training image set is generated from the acquired first training image set and its living body resolvability on the basic living body detection model is lower than that of the first training image set, it constitutes new-domain samples whose living body categories the basic living body detection model has difficulty distinguishing; new-domain samples can therefore be generated from existing domain samples, improving the collection efficiency of domain data.
In another aspect of the present specification, a non-transitory storage medium is provided, storing at least one set of executable instructions for performing living body detection model training and living body detection. When executed by a processor, the executable instructions direct the processor to implement the steps of the living body detection model training method P100 and the living body detection method P200 described herein. In some possible implementations, aspects of the specification can also be implemented in the form of a program product including program code. When the program product runs on the computing device 500, the program code causes the computing device 500 to perform the steps of the living body detection model training method P100 and the living body detection method P200 described herein. The program product implementing the above methods may employ a portable compact disc read-only memory (CD-ROM) containing program code and may run on the computing device 500. However, the program product of the present specification is not limited thereto; in this specification, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system. The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. A readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium other than a readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing. Program code for carrying out the operations of the present specification may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the computing device 500, partly on the computing device 500, as a stand-alone software package, partly on the computing device 500 and partly on a remote computing device, or entirely on a remote computing device.
The foregoing describes specific embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In view of the foregoing, it will be evident to a person skilled in the art that the foregoing detailed disclosure is presented by way of example only and is not limiting. Although not explicitly stated herein, those skilled in the art will appreciate that the present specification is intended to encompass various reasonable adaptations, improvements, and modifications of the embodiments. Such alterations, improvements, and modifications are intended to be proposed by this specification and to be within the spirit and scope of its exemplary embodiments.
Furthermore, certain terms in the present specification have been used to describe embodiments of the present specification. For example, "one embodiment," "an embodiment," and/or "some embodiments" mean that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present specification. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment," "one embodiment," or "an alternative embodiment" in various portions of this specification do not necessarily all refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined as appropriate in one or more embodiments of the present specification.
It should be appreciated that, in the foregoing description of the embodiments of the present specification, various features are sometimes combined in a single embodiment, drawing, or description thereof in order to simplify the specification and assist the understanding of one feature. However, this does not mean that the combination of these features is necessary; a person skilled in the art, upon reading this specification, may well extract some of these features as separate embodiments. That is, the embodiments in this specification may also be understood as an integration of multiple secondary embodiments, where each secondary embodiment is satisfied by less than all of the features of a single foregoing disclosed embodiment.
Each patent, patent application, publication of a patent application, and other material, such as articles, books, specifications, publications, and documents, cited herein is hereby incorporated by reference, except for any prosecution file history associated with it, any of it that is inconsistent with or in conflict with this document, and any of it that may have a limiting effect on the broadest scope of the claims now or later associated with this document. For example, if there is any inconsistency or conflict between the description, definition, and/or use of a term associated with any of the incorporated material and that associated with this document, the description, definition, and/or use of the term in this document shall prevail.
Finally, it should be understood that the embodiments disclosed herein illustrate the principles of the embodiments of the present specification. Other modified embodiments are also within the scope of this specification. Accordingly, the embodiments disclosed in this specification are by way of example only and not limitation. Those skilled in the art may adopt alternative configurations to implement the application in this specification based on the embodiments herein. Therefore, the embodiments of the present specification are not limited to those precisely described in the application.

Claims (12)

1. A method of training a living body detection model, comprising:
acquiring a first training image set; generating a second training image set based on the first training image set, wherein the living body resolvability of the second training image set on a basic living body detection model is lower than the living body resolvability of the first training image set on the basic living body detection model; and
training a target neural network model by using a comprehensive training image set to obtain a target living body detection model with living body resolution performance superior to that of the basic living body detection model, wherein the comprehensive training image set comprises the first training image set and the second training image set.
2. The method of claim 1, wherein the basic living body detection model is trained using the following method steps:
acquiring a third training image set, and recording each training image in the third training image set as a target training image;
randomly shuffling the image blocks corresponding to the target training image by adopting a preset basic living body detection network to obtain at least two shuffled training images corresponding to the target training image;
performing feature extraction and living body category prediction on the at least two shuffled training images to obtain feature vectors and living body classification results corresponding to the at least two shuffled training images; and
determining first loss information based on the feature vectors and living body classification results corresponding to the at least two shuffled training images, and converging the preset basic living body detection network based on the first loss information to obtain the basic living body detection model.
3. The method of claim 2, wherein the determining the first loss information based on the feature vectors and living body classification results corresponding to the at least two shuffled training images comprises:
determining feature contrast loss information based on the feature vectors corresponding to the at least two shuffled training images, wherein the constraint of the feature contrast loss information is to maximize the similarity between the feature vectors corresponding to the at least two shuffled training images of the same target training image and to minimize the similarity between the feature vectors of shuffled training images corresponding to different target training images;
determining at least two first living body classification loss information corresponding to the at least two shuffled training images based on the living body classification results corresponding to the at least two shuffled training images and the labeled living body category of the target training image; and
obtaining the first loss information based on the accumulated sum of the feature contrast loss information and the at least two first living body classification loss information.
4. The method of claim 1, wherein the generating a second training image set based on the first training image set comprises:
combining image blocks of a plurality of training images in the first training image set by adopting a trained reinforcement learning network to obtain an intermediate training image set corresponding to the first training image set; and
performing domain sample division on the intermediate training image set to obtain the second training image set.
5. The method of claim 4, wherein the reinforcement learning network is trained using the method steps of:
acquiring a fourth training image set, and combining image blocks of a plurality of training images in the fourth training image set by adopting a preset reinforcement learning network to obtain a new training image set corresponding to the fourth training image set;
predicting the living body category corresponding to the new training image set by adopting the basic living body detection model to obtain a living body classification result corresponding to the new training image set; and
determining second loss information based on the living body classification result corresponding to the new training image set and the labeled living body categories of the fourth training image set, and converging the preset reinforcement learning network in the direction of maximizing the second loss information to obtain the trained reinforcement learning network.
6. The method of claim 4, wherein the performing domain sample division on the intermediate training image set to obtain the second training image set comprises:
inputting the intermediate training image set into the basic living body detection model to obtain feature vectors corresponding to the intermediate training image set;
performing N-class clustering on the feature vectors corresponding to the intermediate training image set to obtain N new domain categories corresponding to the intermediate training image set, wherein N is an integer greater than or equal to 1; and
obtaining the second training image set based on the intermediate training image set and the N new domain categories corresponding to the intermediate training image set.
7. The method of claim 6, wherein the method of N-class clustering is the KMeans clustering method.
8. The method of any of claims 1-7, wherein the comprehensive training image set corresponds to sub-training image sets of a plurality of domain categories, the plurality of domain categories including M domain categories corresponding to the first training image set and N new domain categories corresponding to the second training image set, M being an integer greater than or equal to 1, and wherein the target neural network model comprises a feature extraction network, a multi-branch classification network, and a fusion classification network, the multi-branch classification network comprising a plurality of sub-classification networks; and
the training the target neural network model by using the comprehensive training image set to obtain the target living body detection model with living body resolution performance superior to that of the basic living body detection model comprises the following steps:
extracting features of the sub-training image sets of the plurality of domain categories by adopting the feature extraction network to obtain a plurality of first training feature sets corresponding to the plurality of sub-training image sets;
performing feature extraction and living body category prediction on the plurality of first training feature sets respectively by adopting the plurality of sub-classification networks to obtain a plurality of second training feature sets and a plurality of sub-living body classification results corresponding to the plurality of first training feature sets;
performing living body category fusion prediction by adopting the fusion classification network based on the plurality of second training feature sets and the plurality of sub-living body classification results to obtain a fused living body classification result; and
determining second loss information based on the fused living body classification result and the labeled living body categories corresponding to the comprehensive training image set, and converging the fusion classification network, the multi-branch classification network, and the feature extraction network based on the second loss information to obtain the target living body detection model.
9. The method of claim 8, wherein the performing living body category fusion prediction by adopting the fusion classification network based on the plurality of second training feature sets and the plurality of sub-living body classification results to obtain the fused living body classification result comprises:
adaptively adjusting, by adopting the fusion classification network, the weights of the plurality of second training feature sets based on the differences between the plurality of sub-living body classification results and the labeled living body categories, to obtain the weights of the plurality of second training feature sets;
performing feature fusion on the plurality of second training feature sets based on the weights of the plurality of second training feature sets to obtain fused training features of the plurality of sub-training image sets; and
predicting the living body categories of the plurality of sub-training image sets based on the fused training features to obtain the fused living body classification result of the plurality of sub-training image sets.
10. The method of claim 8, wherein the determining second loss information based on the fused living body classification result and the labeled living body categories corresponding to the comprehensive training image set and on the plurality of sub-living body classification results and their corresponding labeled living body categories comprises:
obtaining a fused living body classification loss based on the fused living body classification result and the labeled living body categories corresponding to the comprehensive training image set;
determining a plurality of sub-living body classification losses based on the plurality of sub-living body classification results and their corresponding labeled living body categories; and
obtaining the second loss information based on the accumulated sum of the fused living body classification loss and the plurality of sub-living body classification losses.
11. A living body detection method, comprising:
acquiring an original image of an object to be detected; and
inputting the original image into a living body detection model to obtain a living body detection result corresponding to the original image, wherein the living body detection model is trained according to the living body detection model training method of any one of claims 1-10.
12. A living body detection system, comprising:
at least one storage medium storing at least one instruction set for performing living body detection; and
at least one processor communicatively coupled to the at least one storage medium,
wherein, when the living body detection system is running, the at least one processor reads the at least one instruction set and implements the living body detection method of claim 11, the living body detection model being trained according to the living body detection model training method of any one of claims 1-10.
CN202211425567.4A 2022-11-15 2022-11-15 Living body detection model training method, living body detection method and living body detection system Pending CN115995028A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211425567.4A CN115995028A (en) 2022-11-15 2022-11-15 Living body detection model training method, living body detection method and living body detection system

Publications (1)

Publication Number Publication Date
CN115995028A 2023-04-21

Family

ID=85994417



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination