CN114140851A - Image detection method and method for training image detection model - Google Patents

Image detection method and method for training image detection model

Info

Publication number
CN114140851A
Authority
CN
China
Prior art keywords: classification, class, image, prediction, confidence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111452667.1A
Other languages
Chinese (zh)
Other versions
CN114140851B (en)
Inventor
王珂尧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111452667.1A priority Critical patent/CN114140851B/en
Publication of CN114140851A publication Critical patent/CN114140851A/en
Application granted granted Critical
Publication of CN114140851B publication Critical patent/CN114140851B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/243 - Classification techniques relating to the number of classes
    • G06F18/2431 - Multiple classes

Abstract

The disclosure provides an image detection method and a method for training an image detection model. It relates to the field of artificial intelligence, in particular to deep learning and computer vision, and can be applied to scenarios such as face recognition and face image processing. The implementation scheme is as follows: perform at least two binary classification predictions on a target image and obtain at least two corresponding prediction results, where the at least two binary classification predictions correspond respectively to at least two classifications, and where, for each of the at least two binary classification predictions, the corresponding prediction result corresponds to a first classification and a second classification, the first classification being different from any of the at least two classifications and the second classification being the one of the at least two classifications corresponding to that binary classification prediction; and obtain a detection result based on the at least two prediction results, where the detection result includes either a first detection result indicating that the target image corresponds to the first classification or a second detection result indicating that the target image corresponds to one of the at least two classifications.

Description

Image detection method and method for training image detection model
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, in particular to the field of deep learning and computer vision technologies, and may be applied to scenes such as face recognition and face image processing, and in particular to an image detection method, and a method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product for training an image detection model.
Background
Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), covering technologies at both the hardware level and the software level. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing. Artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, and knowledge graph technologies.
Image processing techniques based on artificial intelligence have penetrated various fields. Artificial-intelligence-based face liveness detection determines, from image data input by a user, whether that image data originates from a live human face.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered as having been acknowledged in any prior art.
Disclosure of Invention
The present disclosure provides an image detection method, apparatus, electronic device, computer-readable storage medium, and computer program product.
According to an aspect of the present disclosure, there is provided an image detection method including: performing at least two binary classification predictions on a target image and obtaining at least two corresponding prediction results, wherein the at least two binary classification predictions correspond respectively to at least two classifications, and wherein, for each of the at least two binary classification predictions, the corresponding prediction result corresponds to a first classification and a second classification, the first classification being different from any of the at least two classifications and the second classification being the one of the at least two classifications corresponding to that binary classification prediction; and obtaining a detection result based on the at least two prediction results, wherein the detection result includes any one of the following: a first detection result indicating that the target image corresponds to the first classification, and a second detection result indicating that the target image corresponds to one of the at least two classifications.
According to another aspect of the present disclosure, there is provided a method for training an image detection model, the detection model including at least two binary classification networks, wherein the at least two binary classification networks correspond respectively to at least two classifications, and wherein each of the at least two binary classification networks is configured to perform binary classification prediction between a first classification and the second classification, of the at least two classifications, corresponding to that network, the first classification being different from any of the at least two classifications. The method includes: obtaining a training image set including a plurality of first images corresponding to the first classification and a plurality of second images corresponding to each of the at least two classifications; and, for each of the at least two binary classification networks, performing the corresponding binary classification training on that network using a training image subset composed of the plurality of first images and the corresponding plurality of second images, the training corresponding to the first classification and to the second classification, of the at least two classifications, corresponding to that network.
According to another aspect of the present disclosure, there is provided an image detection apparatus including: a classification unit configured to perform at least two binary classification predictions on a target image and obtain at least two corresponding prediction results, wherein the at least two binary classification predictions correspond respectively to at least two classifications, and wherein, for each of the at least two binary classification predictions, the corresponding prediction result corresponds to a first classification and a second classification, the first classification being different from any of the at least two classifications and the second classification being the one of the at least two classifications corresponding to that binary classification prediction; and an obtaining unit configured to obtain a detection result based on the at least two prediction results, wherein the detection result includes any one of the following: a first detection result indicating that the target image corresponds to the first classification, and a second detection result indicating that the target image corresponds to one of the at least two classifications.
According to another aspect of the present disclosure, there is provided a training apparatus for a detection model for face liveness detection, the detection model including at least two binary classification networks, wherein the at least two binary classification networks correspond respectively to at least two classifications, and wherein each of the at least two binary classification networks is configured to perform binary classification prediction between a first classification and the second classification, of the at least two classifications, corresponding to that network, the first classification being different from any of the at least two classifications. The apparatus includes: an image acquisition unit configured to acquire a training image set including a plurality of first images corresponding to the first classification and a plurality of second images corresponding to each of the at least two classifications; and a first training unit configured to, for each of the at least two binary classification networks, perform the corresponding binary classification training on that network using a training image subset composed of the plurality of first images and the corresponding plurality of second images, the training corresponding to the first classification and to the second classification, of the at least two classifications, corresponding to that network.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to implement a method according to the above.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to implement the method according to the above.
According to another aspect of the present disclosure, a computer program product is provided comprising a computer program, wherein the computer program realizes the method according to the above when executed by a processor.
According to one or more embodiments of the present disclosure, at least two binary classification predictions are performed on a target image simultaneously, each predicting between a first classification and one of at least two second classifications different from the first classification; a detection result determining whether the target image belongs to the first classification can then be obtained from the at least two prediction results. In other words, the classification of the target image is obtained with a single pass of computation over the target image, which effectively saves computing power. At the same time, because the detection result depends on at least two prediction results from at least two binary classification predictions, its accuracy is effectively improved.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the embodiments and, together with the description, serve to explain the exemplary implementations of the embodiments. The illustrated embodiments are for purposes of illustration only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.
FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, according to an embodiment of the present disclosure;
FIG. 2 shows a flow diagram of an image detection method according to an embodiment of the present disclosure;
FIG. 3 shows a flowchart of a process of obtaining a detection result based on at least two prediction results in an image detection method according to an embodiment of the present disclosure;
FIG. 4 shows a flowchart of a process of obtaining a detection result based on a first confidence average in an image detection method according to an embodiment of the present disclosure;
FIG. 5 shows a flow diagram of a method for training an image detection model according to an embodiment of the present disclosure;
FIG. 6 shows an architecture diagram of a detection model according to an embodiment of the present disclosure;
FIG. 7 shows a flowchart of a process of performing the corresponding binary classification training on a binary classification network using a training image subset composed of a plurality of first images and a corresponding plurality of second images, in a method for training an image detection model according to an embodiment of the present disclosure;
FIG. 8 shows a flowchart of a process of binary supervised training of a plurality of binary classification networks in a method for training an image detection model according to an embodiment of the present disclosure;
FIG. 9 shows a block diagram of the structure of an image detection apparatus according to an embodiment of the present disclosure;
FIG. 10 shows a block diagram of an apparatus for training an image detection model according to an embodiment of the present disclosure; and
FIG. 11 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional relationship, the timing relationship, or the importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, based on the context, they may also refer to different instances.
The terminology used in the description of the various described examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the elements may be one or more. Furthermore, the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of the listed items.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented in accordance with embodiments of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120. Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.
In embodiments of the present disclosure, the server 120 may run one or more services or software applications that enable the image detection method to be performed.
In some embodiments, the server 120 may also provide other services or software applications that may include non-virtual environments and virtual environments. In certain embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.
In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof, which may be executed by one or more processors. A user operating a client device 101, 102, 103, 104, 105, and/or 106 may, in turn, utilize one or more client applications to interact with the server 120 to take advantage of the services provided by these components. It should be understood that a variety of different system configurations are possible, which may differ from system 100. Accordingly, fig. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
The user may view the searched objects using client devices 101, 102, 103, 104, 105, and/or 106. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that any number of client devices may be supported by the present disclosure.
Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptops), workstation computers, wearable devices, smart screen devices, self-service terminal devices, service robots, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and so forth. These computer devices may run various types and versions of software applications and operating systems, such as MICROSOFT Windows, APPLE iOS, UNIX-like operating systems, Linux, or Linux-like operating systems (e.g., GOOGLE Chrome OS); or include various Mobile operating systems such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, Android. Portable handheld devices may include cellular telephones, smart phones, tablets, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head-mounted displays (such as smart glasses) and other devices. The gaming system may include a variety of handheld gaming devices, internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), Short Message Service (SMS) applications, and may use a variety of communication protocols.
Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a variety of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, one or more networks 110 may be a Local Area Network (LAN), an ethernet-based network, a token ring, a Wide Area Network (WAN), the internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., bluetooth, WIFI), and/or any combination of these and/or other networks.
The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-end servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture involving virtualization (e.g., one or more flexible pools of logical storage that may be virtualized to maintain virtual storage for the server). In various embodiments, the server 120 may run one or more services or software applications that provide the functionality described below.
The computing units in server 120 may run one or more operating systems including any of the operating systems described above, as well as any commercially available server operating systems. The server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.
In some implementations, the server 120 may include one or more applications to analyze and consolidate data feeds and/or event updates received from users of the client devices 101, 102, 103, 104, 105, and 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and 106.
In some embodiments, the server 120 may be a server of a distributed system, or a server incorporating a blockchain. The server 120 may also be a cloud server, or an intelligent cloud computing server or intelligent cloud host with artificial intelligence technology. A cloud server is a host product in a cloud computing service system that addresses the difficult management and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services.
The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and object files. The databases 130 may reside in various locations. For example, a database used by the server 120 may be local to the server 120, or may be remote from the server 120 and communicate with it via a network-based or dedicated connection. The databases 130 may be of different types. In certain embodiments, a database used by the server 120 may be a relational database. One or more of these databases may store, update, and retrieve data in response to commands.
In some embodiments, one or more of the databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key-value stores, object stores, or regular stores supported by a file system.
The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.
Referring to FIG. 2, an image detection method 200 according to some embodiments of the present disclosure includes:
Step S210: performing at least two binary classification predictions on a target image and obtaining at least two corresponding prediction results; and
Step S220: obtaining a detection result based on the at least two prediction results.
In step S210, the at least two binary classification predictions correspond respectively to at least two classifications, and, for each of the at least two binary classification predictions, the corresponding prediction result corresponds to a first classification and a second classification, the first classification being different from any of the at least two classifications and the second classification being the one of the at least two classifications corresponding to that binary classification prediction. In step S220, the detection result includes any one of the following: a first detection result indicating that the target image corresponds to the first classification, and a second detection result indicating that the target image corresponds to one of the at least two classifications.
According to one or more embodiments of the present disclosure, at least two binary classification predictions are performed on the target image simultaneously, each predicting between the first classification and one of at least two second classifications different from the first classification; a detection result determining whether the target image belongs to the first classification can then be obtained from the at least two prediction results. In other words, the classification of the target image is obtained with a single pass of computation over the target image, which effectively saves computing power. At the same time, because the detection result depends on at least two prediction results from at least two binary classification predictions, its accuracy is effectively improved.
In the related art, face liveness detection is performed on image data input by a user to determine whether the input image data comes from a live human face. After the image data is processed into a target image, composite image detection and face liveness detection are performed on the target image in sequence. First, composite image detection is performed to judge whether the image data corresponding to the target image comes from a composite image; only after judging that it does not is face liveness detection performed to judge whether the image data comes from a live human face. The whole process requires two rounds of detection and judgment on the target image and therefore consumes considerable computing power. Moreover, if the composite image detection stage judges wrongly, the final detection result is wrong, so accuracy is difficult to guarantee. For example, if a target image that in fact corresponds to a composite image is judged not to correspond to a composite image, it proceeds to face liveness detection and may be judged to be a live human face; in some application scenarios the image data would then pass verification, which can cause huge property loss and even personal injury.
According to embodiments of the present disclosure, image detection is performed directly on the target image derived from the image data input by the user, and whether that image data corresponds to the first classification, i.e., comes from a live human face, is obtained directly; the detection result is obtained with a single round of detection and judgment on the target image, reducing the required computing power. Meanwhile, the detection result is obtained from multiple binary classification predictions on the target image and is related to the result of each of them, while the individual prediction results are independent of one another; the detection result therefore does not become wrong merely because a single binary classification prediction is wrong, which greatly improves its accuracy.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of users' personal information comply with the relevant laws and regulations and do not violate public order and good morals.
In some embodiments, the method 200 further comprises, prior to performing the at least two binary classification predictions on the target image, acquiring the target image.
According to some embodiments, acquiring the target image comprises: acquiring image data input by a user, and acquiring the target image based on the image data.
In some embodiments, the image data input by the user may be, without limitation, a video, a photograph, or the like.
In some embodiments, the target image includes an image containing a human face, and, based on the image data, acquiring the target image includes: acquiring an image to be detected based on the image data; and preprocessing the image to be detected to obtain the target image. The preprocessing includes: face detection, acquisition of a region image, normalization of the region image, data enhancement, and the like.
For example, taking a frame of a video input by the user as the image to be detected, the preprocessing process for obtaining the target image is as follows:
First, face detection is performed on the image to be detected to obtain a detection frame surrounding the face. In some examples, face key points are detected in the image to be detected, and the detection frame is obtained based on those key points.
Then, a region image is obtained based on the detection frame. In some examples, the region of the image to be detected enclosed by the detection frame is taken as the region image. In other examples, the detection frame is enlarged by a predetermined multiple (e.g., three times) and the region enclosed by the enlarged frame is taken as the region image.
Finally, the region image is normalized and data-enhanced to obtain the target image. In some examples, the region image is normalized by mapping the pixel values at each location into the interval [-0.5, 0.5]. In some examples, random data enhancement is applied to the normalized image.
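The following is a minimal sketch of this preprocessing pipeline, offered only as an illustration under stated assumptions: the helper name preprocess, the use of OpenCV and NumPy, the 224x224 output size, and the horizontal-flip augmentation are all illustrative choices, not requirements of the disclosure, which only calls for a detection frame, optional enlargement, normalization to [-0.5, 0.5], and random data enhancement.

```python
import cv2          # OpenCV, assumed available for cropping and resizing
import numpy as np

def preprocess(image: np.ndarray, box: tuple, expand: float = 3.0,
               size: int = 224, augment: bool = False) -> np.ndarray:
    """Crop an (optionally enlarged) face region and normalize it to [-0.5, 0.5]."""
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box                       # face detection frame
    # Enlarge the detection frame by a predetermined multiple around its center.
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    half_w = (x1 - x0) * expand / 2.0
    half_h = (y1 - y0) * expand / 2.0
    x0, x1 = max(0, int(cx - half_w)), min(w, int(cx + half_w))
    y0, y1 = max(0, int(cy - half_h)), min(h, int(cy + half_h))
    region = cv2.resize(image[y0:y1, x0:x1], (size, size))
    # Normalize pixel values at each location into [-0.5, 0.5].
    region = region.astype(np.float32) / 255.0 - 0.5
    # Random data enhancement (a horizontal flip stands in for richer schemes).
    if augment and np.random.rand() < 0.5:
        region = region[:, ::-1].copy()
    return region
```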
It should be understood that the above examples of obtaining the target image are exemplary; an image to be detected that has undergone other forms of preprocessing, or no preprocessing at all, can also serve as the target image for the image detection method of the present disclosure.
In some embodiments, the target image is input into at least two binary classification networks to implement step S210.
In some embodiments, the target image comprises an image comprising a human face, the first classification comprises a live human face classification, and the at least two classifications comprise an attack classification and a composite map classification.
In some examples, the live face classification indicates that the target image is from a live face.
For example, a real face is photographed to obtain a target image in which a face corresponds to the real face.
In some examples, the attack classification indicates that the target image is from an attack, and the attack may include, but is not limited to, a screen attack, a paper attack, and a three-dimensional model attack.
For example, when the attack is a three-dimensional model attack, the three-dimensional model is photographed to obtain a target image, and the target image includes a human face, but the human face corresponds to the three-dimensional model.
In some examples, the composite map classification indicates that the target image comes from a composite map, which may be, for example, an image generated by processing a source image containing a face; the target image then contains a face, and that face corresponds to the face in the source image.
In one example, step S210 is implemented by inputting the target image into a live face vs. composite map binary classification network and a live face vs. attack binary classification network, the former classifying the target image into one of the live face classification and the composite map classification, and the latter into one of the live face classification and the attack classification.
During binary classification prediction on the target image, the live face vs. composite map network and the live face vs. attack network process the target image in parallel, realizing a structure similar to a two-stream network, and the prediction results of both networks are obtained simultaneously from a single image input.
In some embodiments, for each of the at least two binary classification predictions, the corresponding prediction result comprises one of:
a first binary result indicating that the target image corresponds to the first classification, and
a second binary result indicating that the target image corresponds to the second classification to which that binary classification prediction corresponds; and obtaining the detection result based on the at least two prediction results comprises:
obtaining the first detection result in response to determining that the prediction result corresponding to each of the at least two binary classification predictions is a first binary result; and
obtaining the second detection result in response to determining that the prediction result corresponding to any one of the at least two binary classification predictions is a second binary result.
The detection result is judged to be the first detection result, indicating that the target image corresponds to the first classification, only when the prediction result of every binary classification prediction is a first binary result; as soon as the prediction result of any binary classification prediction is a second binary result, indicating that the target image corresponds to a second classification different from the first (i.e., not the first classification), the detection result is judged to be the second detection result, indicating that the target image corresponds to one of the at least two classifications. This makes the conditions under which the first detection result is obtained strict, and greatly reduces the probability of the erroneous case in which a target image that should receive the second detection result is instead given the first detection result.
For example, when detecting whether a target image corresponds to the live face classification, the detection result is obtained by performing a live face vs. composite map binary prediction and a live face vs. attack binary prediction on the target image. When both predictions yield the live face classification, the detection result is that the target image corresponds to the live face classification. When either prediction does not yield the live face classification, for example when the live face vs. composite map prediction yields the composite map classification, the detection result is that the target image corresponds to one of the two classifications (composite map and attack) different from the live face classification.
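A minimal sketch of this strict decision rule follows, assuming each binary classification prediction has already been reduced to a first or second binary result; the type and function names are illustrative, not taken from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class BinaryPrediction:
    name: str       # e.g. "live-vs-composite" or "live-vs-attack"
    is_first: bool  # True: first binary result (live face); False: second

def detect(predictions: list) -> str:
    """First detection result only if every prediction yields a first binary result."""
    if all(p.is_first for p in predictions):
        return "first detection result: live face classification"
    return "second detection result: one of the other classifications"

# The example above: the live-vs-composite prediction yields the composite map
# classification, so the second detection result is obtained.
preds = [BinaryPrediction("live-vs-composite", False),
         BinaryPrediction("live-vs-attack", True)]
print(detect(preds))
```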
In some embodiments, each of the at least two binary classification networks obtains its prediction result by computing a first confidence that the target image corresponds to the first classification and a second confidence that the target image corresponds to the network's second classification. When the first confidence is greater than or equal to the second confidence, the prediction result is the first binary result; when the first confidence is less than the second confidence, the prediction result is the second binary result.
In some embodiments, for each of the at least two binary classification predictions, the corresponding prediction result comprises:
a first confidence that the target image corresponds to the first classification, and
a second confidence that the target image corresponds to the second classification to which that binary classification prediction corresponds. As shown in FIG. 3, obtaining the detection result based on the at least two prediction results includes:
Step S310: obtaining a first confidence average based on the first confidence corresponding to each of the at least two binary classification predictions; and
Step S320: obtaining the detection result based on the first confidence average.
The first confidence average is obtained from the first confidence, produced by each of the at least two binary classification predictions, that the target image corresponds to the first classification. This average represents the probability that the target image corresponds to the first classification across the different binary classification predictions, i.e., a probability that takes all of the candidate classifications into account, and is therefore more accurate. A detection result obtained from the first confidence average is correspondingly more accurate.
In some embodiments, the detection result is obtained by comparing the first confidence average with a preset threshold. For example, when the first confidence average is higher than a preset threshold (e.g., 80%), the first detection result is obtained; when it is lower than the threshold, the second detection result is obtained.
In some embodiments, the target image is detected to determine whether it corresponds to the live face classification. The strictness with which a target image is judged to be a live face can then be tuned through the preset threshold, and different thresholds can be set for different application scenarios.
In some embodiments, as shown in FIG. 4, obtaining the detection result based on the first confidence average includes:
Step S410: comparing the first confidence average with the second confidence corresponding to each of the at least two binary classification predictions;
Step S420: obtaining the first detection result in response to determining that the second confidence corresponding to each of the at least two binary classification predictions is not greater than the first confidence average; and
Step S430: obtaining the second detection result in response to determining that the first confidence average is less than the second confidence of any of the at least two binary classification predictions.
Comparing the first confidence average with the second confidence of each of the at least two binary classification predictions to obtain the detection result makes the obtained detection result more accurate.
For example, when detecting whether a target image corresponds to the live face classification, the detection result is obtained by performing a live face vs. composite map binary prediction and a live face vs. attack binary prediction on the target image. Suppose the live face vs. composite map prediction yields a first confidence of 80% for the live face classification and a second confidence of 20% for the composite map classification, while the live face vs. attack prediction yields a first confidence of 30% for the live face classification and a second confidence of 70% for the attack classification. The first confidence average is then 55%, which is less than the 70% second confidence of the attack prediction, so the second detection result is obtained: the target image corresponds to one of the composite map and attack classifications.
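A minimal sketch of steps S310 through S430, under the assumption that each prediction result is reported as a (first confidence, second confidence) pair; the function name and the tuple representation are illustrative.

```python
def detect_by_average(predictions: list) -> str:
    """predictions: one (first_confidence, second_confidence) pair per
    binary classification prediction."""
    avg_first = sum(first for first, _ in predictions) / len(predictions)
    # Step S420: first detection result if no second confidence exceeds the average.
    if all(second <= avg_first for _, second in predictions):
        return "first detection result"
    # Step S430: otherwise, the second detection result.
    return "second detection result"

# Worked example from the text: (0.80, 0.20) and (0.30, 0.70) give an average
# first confidence of 0.55; the attack prediction's 0.70 exceeds it.
print(detect_by_average([(0.80, 0.20), (0.30, 0.70)]))  # -> second detection result
```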
In some embodiments, the method 200 further comprises: in response to the detection result including the second detection result, obtaining a third classification, among the at least two classifications, corresponding to the target image, wherein the third classification corresponds to a first binary classification prediction of the at least two binary classification predictions whose second confidence is greater than the second confidence of any other of the at least two binary classification predictions.
After the second detection result is obtained, i.e., after the target image is determined to correspond to one of the at least two classifications, the specific classification to which it corresponds is further obtained, so that multi-class classification of any target image is achieved.
For example, when detecting whether a target image corresponds to the live face classification, the detection result is obtained by performing a live face vs. composite map binary prediction and a live face vs. attack binary prediction. When the detection result is the second detection result, i.e., the target image corresponds to one of the composite map and attack classifications, the specific classification, say the composite map classification, is further obtained. Two results are thus obtained in a single detection: whether the target image corresponds to the composite map classification and whether it corresponds to the live face classification are judged at the same time. Compared with the related art, which must perform composite image detection and face liveness detection on the target image in sequence, this greatly reduces computation and processing steps.
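Continuing the sketch above, the third classification can be read off as the classification of the binary prediction with the greatest second confidence; the numbers below are illustrative and chosen so that, as in this example, the composite map classification is obtained.

```python
def third_classification(predictions: list, labels: list) -> str:
    """After a second detection result, return the classification whose binary
    prediction produced the greatest second confidence."""
    idx = max(range(len(predictions)), key=lambda i: predictions[i][1])
    return labels[idx]

print(third_classification([(0.30, 0.70), (0.80, 0.20)],
                           ["composite map", "attack"]))  # -> composite map
```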
According to another aspect of the present disclosure, there is also provided a method for training an image detection model, the detection model including at least two binary classification networks, wherein the at least two binary classification networks correspond respectively to at least two classifications, and wherein each of the at least two binary classification networks is configured to perform binary classification prediction between a first classification and the second classification, of the at least two classifications, corresponding to that network, the first classification being different from any of the at least two classifications.
Referring to FIG. 5, a method 500 for training an image detection model according to some embodiments includes:
Step S510: obtaining a training image set including a plurality of first images corresponding to the first classification and a plurality of second images corresponding to each of the at least two classifications; and
Step S520: for each of the at least two binary classification networks, performing the corresponding binary classification training on that network using a training image subset composed of the plurality of first images and the corresponding plurality of second images, the training corresponding to the first classification and to the second classification, of the at least two classifications, corresponding to that network.
By performing the corresponding binary classification training on each of the at least two binary classification networks in the detection model, each network learns to independently perform its own binary classification prediction, so that the trained detection model can obtain a detection result judging whether the target image belongs to the first classification from the at least two prediction results of the at least two networks. That is, the detection model performs a single pass of computation on an input image and obtains a detection result indicating its classification, which effectively saves computing power. At the same time, since the detection result depends on at least two prediction results from at least two binary classification networks, its accuracy is effectively improved.
Meanwhile, during training, each of the at least two binary classification networks is trained independently on its own training image subset, so that the training of each network matches its own binary classification prediction; this avoids training interference between the networks and improves training precision and efficiency.
The detection model includes at least two parallel binary classification networks, realizing a structure similar to a two-stream network, and the prediction results of the binary classification networks can be obtained simultaneously from a single image input.
Referring to FIG. 6, an architecture of a detection model according to some embodiments of the present disclosure is shown. The detection model 600 includes a first binary classification network 610 and a second binary classification network 620. The first binary classification network 610 includes a feature extraction network 611 and a fully connected layer 612; the second binary classification network 620 includes a feature extraction network 621 and a fully connected layer 622. The two networks obtain their respective output results 610B and 620B from a common input A. In some embodiments, as shown in FIG. 6, the detection model 600 further includes a processing module 630 that obtains the detection result for output based on the output results 610B and 620B of the first binary classification network 610 and the second binary classification network 620.
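The PyTorch sketch below mirrors the FIG. 6 layout: two parallel binary classification branches over a common input, each with its own feature extraction network and fully connected layer, plus a processing module that combines the two outputs. The tiny convolutional backbone, the feature dimension, and the averaging in the processing module are placeholder assumptions; the disclosure does not specify a concrete feature extraction network.

```python
import torch
import torch.nn as nn

class BinaryBranch(nn.Module):
    """One binary classification network: feature extractor + fully connected layer."""
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.features = nn.Sequential(   # placeholder feature extraction network
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        self.fc = nn.Linear(feat_dim, 2)  # logits: [first class, second class]

    def forward(self, x):
        feat = self.features(x)
        return feat, self.fc(feat)

class DetectionModel(nn.Module):
    """Two parallel branches (610/620) and a processing module (630)."""
    def __init__(self):
        super().__init__()
        self.live_vs_composite = BinaryBranch()
        self.live_vs_attack = BinaryBranch()

    def forward(self, x):
        _, logits_c = self.live_vs_composite(x)
        _, logits_a = self.live_vs_attack(x)
        conf_c = logits_c.softmax(dim=1)  # output 610B
        conf_a = logits_a.softmax(dim=1)  # output 620B
        # Processing module: average the two first-class confidences.
        avg_first = (conf_c[:, 0] + conf_a[:, 0]) / 2
        return conf_c, conf_a, avg_first

model = DetectionModel()
conf_c, conf_a, avg_first = model(torch.randn(1, 3, 224, 224))
```

Both branches see the same input once, so a single forward pass yields both prediction results, matching the two-stream behavior described above.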
In some embodiments, the training image may be an image frame extracted based on a video, an image input by a user, or the like.
In some embodiments, obtaining the training image further includes pre-processing the acquired image frames or the user input images. In some examples, the preprocessing process may be consistent with the preprocessing process for obtaining the target image in the foregoing embodiments, and is not limited herein.
In some embodiments, as shown in FIG. 7, performing the corresponding binary classification training on the binary classification network using the training image subset composed of the plurality of first images and the corresponding plurality of second images includes, for each image in the subset of training images:
Step S710: inputting the image into the binary classification network to obtain a corresponding prediction result, wherein the corresponding prediction result comprises:
a first confidence that the image corresponds to the first classification, and
a second confidence that the image corresponds to the second classification, of the at least two classifications, corresponding to that network; and
Step S720: adjusting parameters of the binary classification network based on the corresponding prediction result and on the image's actual classification among the first classification and the at least two classifications.
During training of a binary classification network, the network is trained by obtaining, for each training image, a first confidence that the image corresponds to the first classification and a second confidence that it corresponds to the network's second classification, so that the network learns to produce the corresponding first and second confidences for an input image, and the trained network outputs different prediction results according to the obtained first and second confidences.
In some examples, the prediction result of the binary classification network includes the first confidence and the second confidence.
In some examples, the prediction result of the binary classification network includes one of a first binary result and a second binary result, obtained from the first confidence and the second confidence, where the first binary result indicates that the input image corresponds to the first classification and the second binary result indicates that the input image corresponds to the network's second classification.
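A minimal sketch of the per-network training of steps S710/S720, reusing the BinaryBranch sketch above; cross-entropy over the two confidences and the Adam optimizer are assumed choices, as the disclosure does not name a loss or optimizer.

```python
import torch
import torch.nn as nn

def train_branch(branch: nn.Module, loader, epochs: int = 1, lr: float = 1e-3):
    """Train one binary classification network on its training image subset.
    Labels: 0 = first classification (live face), 1 = this branch's second
    classification (composite map or attack)."""
    opt = torch.optim.Adam(branch.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:   # images: (N, 3, H, W); labels in {0, 1}
            _, logits = branch(images)  # BinaryBranch returns (features, logits)
            loss = loss_fn(logits, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()                  # adjust this binary network's parameters only
```

Each branch would be trained on its own subset (the first images plus that branch's second images), so the two trainings do not interfere, as described above.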
In some embodiments, each of the at least two binary classification networks comprises a feature extraction network, and the method for training an image detection model according to the present disclosure further comprises performing binary supervised training on the at least two binary classification networks. As shown in FIG. 8, the binary supervised training includes, for each image in the training image set:
Step S810: fusing the features extracted from the image by the feature extraction network of each of the at least two binary classification networks;
Step S820: inputting the fused features into a binary supervision network to obtain a binary supervised prediction result, which is one of a first prediction result indicating that the image corresponds to the first classification and a second prediction result indicating that the image does not correspond to the first classification; and
Step S830: adjusting parameters of the feature extraction network of each of the at least two binary classification networks based on the binary supervised prediction result and on the image's actual classification among the first classification and the at least two classifications.
Because the at least two binary classification networks of the present disclosure operate on the same input image, they can perform the same or similar processing when extracting the features used to recognize the first classification. Therefore, during training, using the same images corresponding to the first classification as training images, the feature extraction networks of the at least two binary classification networks can undergo unified supervised training: images corresponding to the first classification are treated as one class, and images corresponding to any of the at least two classifications are treated as the other class, realizing binary supervised training. Through steps S810 to S830, binary supervised training is applied to the feature extraction networks of the at least two binary classification networks, making the trained detection model more accurate.
In some embodiments, the first classification comprises a live face classification, and the at least two classifications comprise an attack classification and a composite map classification.
With continued reference to FIG. 6, in binary supervised training, the features extracted by the feature extraction network 611 of the first binary classification network 610 and by the feature extraction network 621 of the second binary classification network 620 are fused and then input into the binary supervision network 640 for binary supervised training.
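A minimal sketch of one step of this binary supervised training, again assuming the DetectionModel sketch above (with model instantiated there); feature concatenation and the linear supervision head are assumptions, since the disclosure only specifies that the extracted features are fused and fed to a binary supervision network.

```python
import torch
import torch.nn as nn

def supervised_fusion_step(model, supervisor, images, live_labels, opt):
    """One binary supervised step: fuse both branches' features (S810), classify
    live (0) vs. non-live (1) (S820), and update the feature extractors (S830)."""
    feat_c, _ = model.live_vs_composite(images)
    feat_a, _ = model.live_vs_attack(images)
    fused = torch.cat([feat_c, feat_a], dim=1)           # step S810: feature fusion
    logits = supervisor(fused)                           # step S820: supervision net
    loss = nn.functional.cross_entropy(logits, live_labels)
    opt.zero_grad()
    loss.backward()                                      # step S830: adjust extractors
    opt.step()
    return loss

supervisor = nn.Linear(256, 2)  # 2 x feat_dim fused features -> {live, non-live}
opt = torch.optim.Adam(list(model.parameters()) + list(supervisor.parameters()))
```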
According to another aspect of the present disclosure, there is also provided an image detection apparatus. Referring to FIG. 9, the apparatus 900 includes: a classification unit 910 configured to perform at least two binary classification predictions on a target image and obtain at least two corresponding prediction results, wherein the at least two binary classification predictions correspond respectively to at least two classifications, and wherein, for each of the at least two binary classification predictions, the corresponding prediction result corresponds to a first classification and to the second classification, of the at least two classifications, corresponding to that binary classification prediction, the first classification being different from any of the at least two classifications; and an obtaining unit 920 configured to obtain a detection result based on the at least two prediction results, wherein the detection result includes any one of the following: a first detection result indicating that the target image corresponds to the first classification, and a second detection result indicating that the target image corresponds to one of the at least two classifications.
In some embodiments, for each of the at least two binary classification predictions, the corresponding prediction result comprises one of: a first binary result indicating that the target image corresponds to the first classification, and a second binary result indicating that the target image corresponds to the second classification to which that binary classification prediction corresponds; and the obtaining unit 920 includes: a first obtaining unit configured to obtain the first detection result in response to determining that the prediction result corresponding to each of the at least two binary classification predictions is a first binary result; and a second obtaining unit configured to obtain the second detection result in response to determining that the prediction result corresponding to any one of the at least two binary classification predictions is a second binary result.
In some embodiments, for each of the at least two binary classification predictions, the corresponding prediction result comprises: a first confidence that the target image corresponds to the first classification and a second confidence that the target image corresponds to the second classification to which that binary classification prediction corresponds; and the obtaining unit 920 includes: a calculating unit configured to obtain a first confidence average based on the first confidence corresponding to each of the at least two binary classification predictions; and an obtaining subunit configured to obtain the detection result based on the first confidence average.
In some embodiments, the obtaining subunit includes: a comparison unit configured to compare the first confidence average with the second confidence corresponding to each of the at least two binary classification predictions; a first obtaining subunit configured to obtain the first detection result in response to determining that the second confidence corresponding to each of the at least two binary classification predictions is not greater than the first confidence average; and a second obtaining subunit configured to obtain the second detection result in response to determining that the first confidence average is less than the second confidence of any one of the at least two binary classification predictions.
In some embodiments, the apparatus further comprises: a classification obtaining unit configured to obtain, in response to the detection result including the second detection result, a third classification corresponding to the target image among the at least two classifications, wherein the third classification corresponds to a first binary classification prediction of the at least two binary classification predictions, and the second confidence of the first binary classification prediction is greater than the second confidence of any binary classification prediction of the at least two binary classification predictions that is different from the first binary classification prediction.
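To make the confidence-based decision rule of the above embodiments concrete, here is a minimal sketch of one plausible reading of it: average the first confidences over all binary classification predictions, return the first detection result if no second confidence exceeds that average, and otherwise return the second detection result together with the third classification whose second confidence is largest. The function name, the tuple representation of a prediction result, and the string return values are illustrative assumptions, not the patent's implementation.

```python
from typing import List, Optional, Tuple

def detect(predictions: List[Tuple[float, float]]) -> Tuple[str, Optional[int]]:
    """predictions[i] = (first_confidence, second_confidence) of the
    binary classification prediction for the i-th classification."""
    first_avg = sum(first for first, _ in predictions) / len(predictions)
    second_confs = [second for _, second in predictions]
    if all(second <= first_avg for second in second_confs):
        # First detection result: the target image corresponds to the
        # first classification (e.g. a live human face).
        return "first_detection_result", None
    # Second detection result: the third classification is the one whose
    # binary classification prediction has the largest second confidence.
    third = max(range(len(second_confs)), key=second_confs.__getitem__)
    return "second_detection_result", third
```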
According to another aspect of the present disclosure, there is also provided a training apparatus for a detection model for face liveness detection, the detection model comprising at least two binary classification networks, wherein the at least two binary classification networks correspond to at least two classifications, respectively, and wherein, for each of the at least two binary classification networks, the binary classification network is configured to perform a binary classification prediction corresponding to a first classification and a second classification, the first classification being different from any one of the at least two classifications, and the second classification being the classification of the at least two classifications corresponding to the binary classification network. As shown in fig. 10, the apparatus 1000 includes: an image acquisition unit 1010 configured to acquire a training image set including a plurality of first images corresponding to the first classification and a plurality of second images corresponding to each of the at least two classifications; and a first training unit 1020 configured to, for each of the at least two binary classification networks, perform corresponding binary classification training on the binary classification network using a training image subset composed of the plurality of first images and the corresponding plurality of second images, the corresponding binary classification training corresponding to the first classification and to the second classification of the at least two classifications corresponding to the binary classification network.
In some embodiments, the first training unit 1020 comprises: an input unit configured to input, for each image in the training image subset, the image into the binary classification network to obtain a corresponding prediction result, wherein the corresponding prediction result includes: a first confidence that the image corresponds to the first classification and a second confidence that the image corresponds to the second classification of the at least two classifications corresponding to the binary classification network; and an adjusting unit configured to adjust parameters of the binary classification network based on the corresponding classification of the image among the first classification and the at least two classifications and the corresponding prediction result.
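As a hedged illustration of such a per-network training pass (not taken from the disclosure), the sketch below assumes a PyTorch-style network whose two output logits yield, after softmax, the first and second confidences; the function name binary_training_step and the 0/1 label convention are hypothetical.

```python
import torch.nn.functional as F

def binary_training_step(net, optimizer, images, labels):
    """One update for a single binary classification network.
    labels: 0 -> first classification (e.g. live face),
            1 -> the second classification this network handles."""
    optimizer.zero_grad()
    logits = net(images)                    # shape (batch, 2); softmax of
    loss = F.cross_entropy(logits, labels)  # these logits yields the first
    loss.backward()                         # and second confidences
    optimizer.step()
    return loss.item()
```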
In some embodiments, each of the at least two binary classification networks comprises a feature extraction network, and the apparatus further comprises: a second training unit configured to, for each image in the training image set: fuse the features extracted from the image by the feature extraction network of each of the at least two binary classification networks, and input the fused features into a binary supervision network to obtain a binary supervision prediction result, wherein the binary supervision prediction result comprises one of a first prediction result indicating that the image corresponds to the first classification and a second prediction result indicating that the image does not correspond to the first classification; and adjust parameters of the feature extraction network of each of the at least two binary classification networks based on the corresponding classification of the image among the first classification and the at least two classifications and the binary supervision prediction result.
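The shared supervision pass could then look roughly as follows; again this is a sketch under assumptions (PyTorch, a feature_extractor attribute on each binary classification network, concatenation as the fusion, cross-entropy as the loss), not the disclosed implementation.

```python
import torch
import torch.nn.functional as F

def supervision_step(networks, supervision_head, optimizer, images, is_first):
    """One binary-supervision update over a batch drawn from the full
    training image set. is_first: 1 where the image corresponds to the
    first classification, 0 otherwise."""
    optimizer.zero_grad()
    # Extract and fuse the features from every binary classification network.
    feats = [net.feature_extractor(images) for net in networks]
    fused = torch.cat(feats, dim=1)
    logits = supervision_head(fused)
    loss = F.cross_entropy(logits, is_first)
    loss.backward()   # gradients flow back into each feature extractor,
    optimizer.step()  # so their parameters are adjusted jointly
    return loss.item()
```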
According to another aspect of the present disclosure, there is also provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores a computer program which, when executed by the at least one processor, implements the method described above.
According to another aspect of the present disclosure, there is also provided a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method described above.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method described above.
Referring to fig. 11, a block diagram of an electronic device 1100, which may be a server or a client of the present disclosure and is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. The electronic device is intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 11, the electronic device 1100 includes a computing unit 1101, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 1102 or a computer program loaded from a storage unit 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the electronic device 1100 may also be stored. The computing unit 1101, the ROM 1102, and the RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.
A number of components in the electronic device 1100 are connected to the I/O interface 1105, including: an input unit 1106, an output unit 1107, the storage unit 1108, and a communication unit 1109. The input unit 1106 may be any type of device capable of inputting information to the electronic device 1100; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a track pad, a track ball, a joystick, a microphone, and/or a remote control. The output unit 1107 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 1108 may include, but is not limited to, a magnetic disk or an optical disk. The communication unit 1109 allows the electronic device 1100 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver and/or chipset, such as a Bluetooth™ device, an 802.11 device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
The computing unit 1101 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 1101 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The computing unit 1101 performs the various methods and processes described above, such as the method 200. For example, in some embodiments, the method 200 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1108. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1100 via the ROM 1102 and/or the communication unit 1109. When the computer program is loaded into the RAM 1103 and executed by the computing unit 1101, one or more steps of the method 200 described above may be performed. Alternatively, in other embodiments, the computing unit 1101 may be configured to perform the method 200 by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be performed in parallel, sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
Although embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the above-described methods, systems, and apparatuses are merely exemplary embodiments or examples, and that the scope of the present disclosure is not limited by these embodiments or examples but only by the granted claims and their equivalents. Various elements in the embodiments or examples may be omitted or replaced with equivalents thereof. Further, the steps may be performed in an order different from that described in the present disclosure. Further, various elements in the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced by equivalent elements that appear after the present disclosure.

Claims (21)

1. An image detection method, comprising:
performing at least two binary classification predictions on a target image and obtaining at least two corresponding prediction results, wherein the at least two binary classification predictions correspond to at least two classifications, respectively, and wherein, for each of the at least two binary classification predictions, the corresponding prediction result corresponds to a first classification and a second classification, the first classification being different from any one of the at least two classifications, and the second classification being the classification of the at least two classifications to which the binary classification prediction corresponds; and
obtaining a detection result based on the at least two prediction results, wherein the detection result comprises any one of:
a first detection result indicating that the target image corresponds to the first classification, and
a second detection result indicating that the target image corresponds to one of the at least two classifications.
2. The method of claim 1, wherein, for each of the at least two binary classification predictions, the corresponding prediction result comprises one of:
a first binary classification result indicating that the target image corresponds to the first classification, and
a second binary classification result indicating that the target image corresponds to the second classification to which the binary classification prediction corresponds, and wherein
the obtaining a detection result based on the at least two prediction results comprises:
obtaining the first detection result in response to determining that the prediction result corresponding to each of the at least two binary classification predictions is a first binary classification result; and
obtaining the second detection result in response to determining that the prediction result corresponding to any one of the at least two binary classification predictions is a second binary classification result.
3. The method of claim 1, wherein, for each of the at least two binary classification predictions, the corresponding prediction result comprises:
a first confidence that the target image corresponds to the first classification, and
a second confidence that the target image corresponds to the second classification to which the binary classification prediction corresponds, and wherein the obtaining a detection result based on the at least two prediction results comprises:
obtaining a first confidence average based on the first confidence corresponding to each of the at least two binary classification predictions; and
obtaining the detection result based on the first confidence average.
4. The method of claim 3, wherein the obtaining the detection result based on the first confidence average comprises:
comparing the first confidence average with the second confidence corresponding to each of the at least two binary classification predictions;
obtaining the first detection result in response to determining that the second confidence corresponding to each of the at least two binary classification predictions is not greater than the first confidence average; and
obtaining the second detection result in response to determining that the first confidence average is less than the second confidence of any one of the at least two binary classification predictions.
5. The method of claim 3 or 4, further comprising:
in response to the detection result comprising the second detection result, obtaining a third classification of the at least two classifications corresponding to the target image, wherein the third classification corresponds to a first binary classification prediction of the at least two binary classification predictions, and wherein the second confidence of the first binary classification prediction is greater than the second confidence of any binary classification prediction of the at least two binary classification predictions that is different from the first binary classification prediction.
6. The method of claim 1, wherein the target image comprises an image containing a human face, the first classification comprises a live human face classification, and the at least two classifications comprise an attack classification and a composite image classification.
7. A method for training an image detection model, the detection model comprising at least two binary classification networks, wherein the at least two binary classification networks correspond to at least two classifications, respectively, and wherein, for each of the at least two binary classification networks, the binary classification network is configured to perform a binary classification prediction corresponding to a first classification and a second classification, the first classification being different from any one of the at least two classifications, and the second classification being the classification of the at least two classifications corresponding to the binary classification network, the method comprising:
obtaining a training image set comprising a plurality of first images corresponding to the first classification and a plurality of second images corresponding to each of the at least two classifications; and
for each of the at least two binary classification networks, performing corresponding binary classification training on the binary classification network using a training image subset composed of the plurality of first images and the corresponding plurality of second images, wherein the corresponding binary classification training corresponds to the first classification and to the second classification of the at least two classifications corresponding to the binary classification network.
8. The method of claim 7, wherein the performing corresponding binary classification training on the binary classification network using a training image subset composed of the plurality of first images and the corresponding plurality of second images comprises:
for each image in the training image subset:
inputting the image into the binary classification network to obtain a corresponding prediction result, wherein the corresponding prediction result comprises:
a first confidence that the image corresponds to the first classification, and
a second confidence that the image corresponds to the second classification of the at least two classifications corresponding to the binary classification network; and
adjusting parameters of the binary classification network based on the corresponding classification of the image among the first classification and the at least two classifications and the corresponding prediction result.
9. The method of claim 7 or 8, wherein each of the at least two binary classification networks comprises a feature extraction network, the method further comprising:
for each image in the training image set:
fusing the features extracted from the image by the feature extraction network of each of the at least two binary classification networks;
inputting the fused features into a binary supervision network to obtain a binary supervision prediction result, wherein the binary supervision prediction result comprises one of a first prediction result indicating that the image corresponds to the first classification and a second prediction result indicating that the image does not correspond to the first classification; and
adjusting parameters of the feature extraction network of each of the at least two binary classification networks based on the corresponding classification of the image among the first classification and the at least two classifications and the binary supervision prediction result.
10. The method of claim 7, wherein the first classification comprises a live human face classification and the at least two classifications comprise an attack classification and a composite image classification.
11. An image detection apparatus, comprising:
a classification unit configured to perform at least two binary classification predictions on a target image and obtain at least two corresponding prediction results, wherein the at least two binary classification predictions correspond to at least two classifications, respectively, and wherein, for each of the at least two binary classification predictions, the corresponding prediction result corresponds to a first classification that is different from any one of the at least two classifications and a second classification that is the classification of the at least two classifications to which the binary classification prediction corresponds; and
an obtaining unit configured to obtain a detection result based on the at least two prediction results, wherein the detection result includes any one of the following:
a first detection result indicating that the target image corresponds to the first classification, and
a second detection result indicating that the target image corresponds to one of the at least two classifications.
12. The apparatus of claim 11, wherein, for each of the at least two binary classification predictions, the corresponding prediction result comprises one of:
a first binary classification result indicating that the target image corresponds to the first classification, and
a second binary classification result indicating that the target image corresponds to the second classification to which the binary classification prediction corresponds, and wherein the obtaining unit includes:
a first obtaining unit configured to obtain the first detection result in response to determining that the prediction result corresponding to each of the at least two binary classification predictions is a first binary classification result; and
a second obtaining unit configured to obtain the second detection result in response to determining that the prediction result corresponding to any one of the at least two binary classification predictions is a second binary classification result.
13. The apparatus of claim 11, wherein, for each of the at least two binary classification predictions, the corresponding prediction result comprises:
a first confidence that the target image corresponds to the first classification, and
a second confidence that the target image corresponds to the second classification to which the binary classification prediction corresponds, and wherein the obtaining unit includes:
a calculating unit configured to obtain a first confidence average based on the first confidence corresponding to each of the at least two binary classification predictions; and
an obtaining subunit configured to obtain the detection result based on the first confidence average.
14. The apparatus of claim 13, wherein the obtaining subunit comprises:
a comparison unit configured to compare the first confidence average with the second confidence corresponding to each of the at least two binary classification predictions;
a first obtaining subunit configured to obtain the first detection result in response to determining that the second confidence corresponding to each of the at least two binary classification predictions is not greater than the first confidence average; and
a second obtaining subunit configured to obtain the second detection result in response to determining that the first confidence average is less than the second confidence of any one of the at least two binary classification predictions.
15. The apparatus of claim 13 or 14, further comprising:
a classification obtaining unit configured to obtain, in response to the detection result including the second detection result, a third classification corresponding to the target image among the at least two classifications, wherein the third classification corresponds to a first binary classification prediction of the at least two binary classification predictions, and wherein the second confidence corresponding to the first binary classification prediction is greater than the second confidence of any binary classification prediction of the at least two binary classification predictions that is different from the first binary classification prediction.
16. A training apparatus for a detection model for face liveness detection, the detection model comprising at least two binary classification networks, wherein the at least two binary classification networks correspond to at least two classifications, respectively, and wherein, for each of the at least two binary classification networks, the binary classification network is configured to perform a binary classification prediction corresponding to a first classification and a second classification, the first classification being different from any one of the at least two classifications, and the second classification being the classification of the at least two classifications corresponding to the binary classification network, the apparatus comprising:
an image acquisition unit configured to acquire a training image set including a plurality of first images corresponding to the first classification and a plurality of second images corresponding to each of the at least two classifications; and
a first training unit configured to, for each of the at least two binary classification networks, perform corresponding binary classification training on the binary classification network using a training image subset composed of the plurality of first images and the corresponding plurality of second images, the corresponding binary classification training corresponding to the first classification and to the second classification of the at least two classifications corresponding to the binary classification network.
17. The apparatus of claim 16, wherein the first training unit comprises:
an input unit configured to input, for each image in the training image subset, the image into the binary classification network to obtain a corresponding prediction result, wherein the corresponding prediction result includes:
a first confidence that the image corresponds to the first classification, and
a second confidence that the image corresponds to the second classification of the at least two classifications corresponding to the binary classification network; and
an adjusting unit configured to adjust parameters of the binary classification network based on the corresponding classification of the image among the first classification and the at least two classifications and the corresponding prediction result.
18. The apparatus of claim 16, wherein each of the at least two binary classification networks comprises a feature extraction network, the apparatus further comprising:
a second training unit configured to, for each image in the training image set:
fuse the features extracted from the image by the feature extraction network of each of the at least two binary classification networks;
input the fused features into a binary supervision network to obtain a binary supervision prediction result, wherein the binary supervision prediction result comprises one of a first prediction result indicating that the image corresponds to the first classification and a second prediction result indicating that the image does not correspond to the first classification; and
adjust parameters of the feature extraction network of each of the at least two binary classification networks based on the corresponding classification of the image among the first classification and the at least two classifications and the binary supervision prediction result.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-10.
20. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-10.
21. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-10.
CN202111452667.1A 2021-12-01 2021-12-01 Image detection method and method for training image detection model Active CN114140851B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111452667.1A CN114140851B (en) 2021-12-01 2021-12-01 Image detection method and method for training image detection model

Publications (2)

Publication Number Publication Date
CN114140851A true CN114140851A (en) 2022-03-04
CN114140851B CN114140851B (en) 2023-08-11

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180357501A1 (en) * 2017-06-07 2018-12-13 Alibaba Group Holding Limited Determining user authenticity with face liveness detection
CN110688967A (en) * 2019-09-30 2020-01-14 上海依图信息技术有限公司 System and method for static human face living body detection
CN111931594A (en) * 2020-07-16 2020-11-13 广州广电卓识智能科技有限公司 Face recognition living body detection method and device, computer equipment and storage medium
CN113343826A (en) * 2021-05-31 2021-09-03 北京百度网讯科技有限公司 Training method of human face living body detection model, human face living body detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant