CN115171225A - Image detection method and training method of image detection model

Info

Publication number: CN115171225A
Application number: CN202210771396.4A
Authority: CN (China)
Prior art keywords: image, feature, classification, features, target
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 张国生
Original and current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority: CN202210771396.4A

Classifications

    • G06V Image or video recognition or understanding (G: Physics; G06: Computing; calculating or counting)
    • G06V40/45 Detection of the body part being alive (under G06V40/40 Spoof detection, e.g. liveness detection; G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data)
    • G06V10/40 Extraction of image or video features
    • G06V10/764 Recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/765 Classification using rules for classification or partitioning the feature space
    • G06V10/774 Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting (under G06V10/77 Processing image or video features in feature spaces, e.g. PCA, ICA, or SOM)

Abstract

The disclosure provides an image detection method, a training method of an image detection model, and corresponding apparatuses, devices, storage media, and computer program products. It relates to the field of artificial intelligence, specifically to deep learning, image processing, and computer vision, and can be applied to scenarios such as living body detection. The specific implementation scheme is as follows: acquire an image to be detected; extract at least two image features with different feature scales from the image to be detected; perform scale combination on the image features to determine a target image feature; and perform image classification on the target image feature to obtain a detection result. Detection accuracy is thereby improved.

Description

Image detection method and training method of image detection model
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, in particular to the fields of deep learning, image processing, and computer vision, which can be applied to scenarios such as living body detection, and in particular to an image detection method, a training method of an image detection model, an image detection apparatus, a training apparatus of an image detection model, an electronic device, a storage medium, and a computer program product.
Background
At present, when performing living body detection on images, live images and attack images are generally treated as two categories, and the detection result is obtained by binary classification. However, this approach learns all attack images at the same feature scale, which degrades the detection effect. Alternatively, detection can be performed by anomaly detection, but anomaly detection lacks supervision from attack features and likewise degrades the detection effect.
Disclosure of Invention
The present disclosure provides an image detection method, a training method of an image detection model, corresponding apparatuses, a device, a storage medium, and a computer program product, which improve detection accuracy.
According to an aspect of the present disclosure, there is provided an image detection method including: acquiring an image to be detected; extracting at least two image features with different feature scales from an image to be detected; carrying out scale combination on the image characteristics to determine target image characteristics; and carrying out image classification on the target image characteristics to obtain a detection result.
According to another aspect of the present disclosure, there is provided a training method of an image detection model, including: acquiring a training sample set, wherein the training samples comprise living body image samples and at least two types of attack image samples; the following training steps are performed: selecting an image sample from a training sample set; extracting at least two image features with different feature scales from the selected image sample based on the initial image detection model, and carrying out scale combination and image classification on the extracted at least two image features to obtain a target loss value and a trained image detection model; and determining the trained image detection model as the target image detection model in response to the target loss value being less than the loss threshold value.
According to still another aspect of the present disclosure, there is provided an image detection apparatus including: a first acquisition module configured to acquire an image to be detected; an extraction module configured to extract at least two image features with different feature scales from the image to be detected; a combination module configured to perform scale combination on the image features and determine a target image feature; and a classification module configured to perform image classification on the target image feature to obtain a detection result.
According to still another aspect of the present disclosure, there is provided a training apparatus of an image detection model, including: a second acquisition module configured to acquire a training sample set, wherein the training samples include living body image samples and at least two types of attack image samples; a training module configured to perform the following training steps: selecting an image sample from a training sample set; extracting at least two image features with different feature scales from the selected image sample based on the initial image detection model, and carrying out scale combination and image classification on the extracted at least two image features to obtain a target loss value and a trained image detection model; and determining the trained image detection model as the target image detection model in response to the target loss value being less than the loss threshold value.
According to still another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executable by the at least one processor to enable the at least one processor to execute the image detection method and the training method of the image detection model.
According to still another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the image detection method and the training method of the image detection model.
According to yet another aspect of the present disclosure, a computer program product is provided, which comprises a computer program, which when executed by a processor, implements the image detection method and the training method of the image detection model.
It should be understood that the statements in this section are not intended to identify key or critical features of the embodiments of the present disclosure, nor are they intended to limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of an image detection method according to the present disclosure;
FIG. 3 is a flow chart of another embodiment of an image detection method according to the present disclosure;
FIG. 4 is a flow diagram of one embodiment of a training method of an image detection model according to the present disclosure;
FIG. 5 is a flow diagram of another embodiment of a training method of an image detection model according to the present disclosure;
FIG. 6 is a schematic diagram of a training method of an image detection model according to the present disclosure;
FIG. 7 is a schematic block diagram of one embodiment of an image detection apparatus according to the present disclosure;
FIG. 8 is a schematic diagram of one embodiment of a training apparatus of an image detection model according to the present disclosure;
FIG. 9 is a block diagram of an electronic device for implementing an image detection method or a training method of an image detection model according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of embodiments of the present disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 illustrates an exemplary system architecture 100 to which an embodiment of the image detection method or training method of the image detection model or training apparatus of the image detection model of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. Network 104 is the medium used to provide communication links between terminal devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to obtain detection results or target image detection models, etc. The terminal devices 101, 102, 103 may have various client applications installed thereon, such as an image processing application and the like.
The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they can be installed in the electronic devices described above and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module. This is not specifically limited here.
The server 105 may provide various services based on the obtained detection results or the target image detection model. For example, the server 105 may analyze and process the images to be detected acquired from the terminal devices 101, 102, 103, and generate a processing result (e.g., determine a detection result).
The server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module. This is not specifically limited here.
It should be noted that the image detection method or the training method of the image detection model provided by the embodiment of the present disclosure is generally executed by the server 105, and accordingly, the image detection apparatus or the training apparatus of the image detection model is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of an image detection method according to the present disclosure is shown. The image detection method comprises the following steps:
Step 201, obtaining an image to be detected.
In this embodiment, the executing entity of the image detection method (for example, the server 105 shown in FIG. 1) may acquire an image to be detected. The image to be detected may be an image captured of a real human face, animal, or plant, or it may be a non-real human face, animal, or plant image, where a non-real image may be a screenshot, a printed image, a synthesized image, an image occluded by a mask, and the like; the present disclosure does not limit this. The executing entity may select an image from a public image database as the image to be detected, capture an image as the image to be detected, or draw an image as the image to be detected; the present disclosure does not limit this either.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good customs.
Step 202, extracting at least two image features with different feature scales from the image to be detected.
In this embodiment, after obtaining the image to be detected, the executing entity may extract at least two image features with different feature scales from it. Specifically, the image to be detected may be downsampled multiple times to obtain images at different resolutions, which serve as images at multiple scales; any image feature extraction model may then be selected, the image at each scale input into it as input data, and the corresponding image feature output at the model's output end, yielding at least two image features with different feature scales.
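For illustration only, the following is a minimal sketch of this step, assuming a PyTorch implementation; the disclosure does not name a framework, and the `backbone` placeholder (an arbitrary image feature extraction model) and the bilinear downsampling are assumptions of this example rather than part of the claimed method.

```python
# Sketch of step 202 under the assumptions stated above.
import torch
import torch.nn.functional as F

def extract_multiscale_features(image: torch.Tensor, backbone, num_scales: int = 3):
    """Downsample the image repeatedly and run each resolution through the
    same feature extractor, yielding one image feature per feature scale."""
    features = []
    current = image  # (N, 3, H, W)
    for _ in range(num_scales):
        features.append(backbone(current))
        # Halve the resolution to form the next, coarser scale.
        current = F.interpolate(current, scale_factor=0.5,
                                mode="bilinear", align_corners=False)
    return features
```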
The image features may be any features that characterize the image to be detected; for example, they may be moire features, color difference features, paper crease features, and the like, which the present disclosure does not limit.
Image features with different feature scales are feature points extracted from images at different resolutions. The feature scales may include a low feature scale, a medium feature scale, and a high feature scale, where the low feature scale corresponds to contour features of the image and the high feature scale corresponds to detail features of the image. For example, moire features are low-level texture features that can be captured by the shallow layers of an image feature extraction network, so they may be regarded as features of a low feature scale; color difference features and paper crease features are relatively complex middle-layer texture features, so they may be regarded as features of a medium feature scale; and if the image to be detected is occluded by a mask, the mask features are more realistic and more complex, so they may be regarded as features of a high feature scale.
Step 203, performing scale combination on the image features to determine the target image feature.
In this embodiment, after acquiring the at least two image features with different feature scales, the executing entity may perform scale combination on them to determine the target image feature. Specifically, one of the at least two image features with different feature scales may be selected as the target image feature; alternatively, several image features may be selected from them, combined according to a preset scale operation rule, and the calculation result determined as the target image feature. The present disclosure does not limit this.
Step 204, performing image classification on the target image feature to obtain a detection result.
In this embodiment, after acquiring the target image feature, the executing entity may perform image classification on it to obtain a detection result. Specifically, any image classification model may be selected, the target image feature input into it as input data, and the detection result output at the model's output end. The detection result may be that the image to be detected is a real image or that it is a non-real image.
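As an illustration of this classification step, the following is a minimal sketch assuming PyTorch, a single linear head, and a target image feature already pooled to a vector; the class indexing (0 for real, 1 for non-real) and the feature dimension are assumptions of this example.

```python
# Sketch of step 204 under the assumptions stated above.
import torch
import torch.nn as nn

class FeatureClassifier(nn.Module):
    """Binary head over the (pooled) target image feature."""
    def __init__(self, feature_dim: int = 256):
        super().__init__()
        self.fc = nn.Linear(feature_dim, 2)  # assumed: class 0 real, class 1 non-real

    def forward(self, target_feature: torch.Tensor) -> torch.Tensor:
        return self.fc(target_feature)  # logits over the two classes

# Usage: the argmax over the logits is the detection result.
# logits = FeatureClassifier()(pooled_target_feature)        # (N, 2)
# result = "real" if logits.argmax(dim=-1).item() == 0 else "non-real"
```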
The image detection method provided by this embodiment of the present disclosure first acquires an image to be detected and extracts at least two image features with different feature scales from it; it then performs scale combination on the image features to determine the target image feature; finally, it performs image classification on the target image feature to obtain the detection result. By dividing the image features into multiple different feature scales, the determined target image feature is more discriminative and the detection result more accurate.
Continuing further with reference to fig. 3, a flow 300 of another embodiment of an image detection method according to the present disclosure is shown. The image detection method comprises the following steps:
Step 301, acquiring an image to be detected.
In this embodiment, the specific operation of step 301 has been described in detail in step 201 in the embodiment shown in fig. 2, and is not described herein again.
It should be noted that after the image to be detected is obtained, it may be input into a pre-trained image detection model comprising a pyramid feature extraction network, a scale classification network, and a feature classification network. The image detection model is a model that detects the image to be detected and produces a detection result; the pyramid feature extraction network extracts the image features of the image to be detected; the scale classification network screens the more discriminative image features from the extracted features; and the feature classification network classifies the screened features to obtain the detection result. The following steps may be performed based on this image detection model.
Step 302, inputting an image to be detected into a pyramid feature extraction network for feature extraction, so as to obtain at least two initial image features with different feature scales.
In this embodiment, after obtaining the image to be detected, the executing entity may input it into the pyramid feature extraction network for feature extraction, yielding at least two initial image features with different feature scales. Specifically, the image to be detected is input into the pyramid feature extraction network as input data, and the initial image features are output at its output end. The pyramid feature extraction network has multiple layers, each extracting feature points of the image at one scale; for example, the first layer may extract feature points from the original image, and the image scale of each remaining layer is the result of downsampling the image of the previous layer, so the initial image features output by the network have different feature scales. For example, three initial image features may be output, corresponding to the low, medium, and high feature scales respectively.
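For illustration, a minimal sketch of such a pyramid feature extraction network follows, assuming PyTorch; the three stages, the stride-2 convolutions, and the channel counts (64/128/256) are illustrative assumptions, not the disclosure's mandated architecture.

```python
# Sketch of a three-level pyramid extractor under the assumptions above.
import torch
import torch.nn as nn

class PyramidExtractor(nn.Module):
    """Each stage halves the resolution, so the three outputs correspond to
    the low, medium, and high feature scales described in the text."""
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1),
                                    nn.ReLU(inplace=True))
        self.stage2 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1),
                                    nn.ReLU(inplace=True))
        self.stage3 = nn.Sequential(nn.Conv2d(128, 256, 3, stride=2, padding=1),
                                    nn.ReLU(inplace=True))

    def forward(self, x: torch.Tensor):
        c1 = self.stage1(x)   # finest resolution
        c2 = self.stage2(c1)  # half the resolution of c1
        c3 = self.stage3(c2)  # half the resolution of c2
        return c1, c2, c3     # initial image features differing in D, H and W
```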
Step 303, performing a side link operation and an upsampling operation on the at least two initial image features to obtain at least two image features with different feature scales.
In this embodiment, after obtaining the initial image features, the executing entity may perform a side link operation and an upsampling operation on them to obtain at least two image features with different feature scales. Each initial image feature is three-dimensional and may be represented as a D × H × W feature, where the D dimension is the number of channels and the H and W dimensions are the height and width, i.e., the spatial size of the feature. The initial image features output by the pyramid feature extraction network differ in three-dimensional size; a side link operation may be performed on the at least two initial image features to obtain initial image features with an equal number of channels, since the side link operation makes the D dimensions of the initial image features the same, unifying their channel counts.
In some optional implementations of this embodiment, the side link may be a convolutional network: the at least two initial image features are input into the convolutional network as input data, and at least two equal-channel-count initial image features are output at its output end.
An upsampling operation is then performed on the equal-channel-count initial image features to obtain the at least two image features with different feature scales; the upsampling operation makes the H and W dimensions of these features the same, unifying their spatial sizes. Through the side link operation and the upsampling operation, the resulting image features of different feature scales all share the same three-dimensional size.
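A minimal sketch of the side link and upsampling operations follows, assuming PyTorch; reading the side link as one 1×1 convolution per pyramid level is one common interpretation and an assumption of this example.

```python
# Sketch of step 303 under the assumptions stated above.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureAligner(nn.Module):
    def __init__(self, in_channels=(64, 128, 256), out_channels: int = 256):
        super().__init__()
        # Side link: one 1x1 conv per level unifies the channel dimension D.
        self.laterals = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels)

    def forward(self, initial_feats):
        feats = [conv(f) for conv, f in zip(self.laterals, initial_feats)]
        # Upsampling: resize every level to the finest spatial size, so the
        # H and W dimensions also match across levels.
        size = feats[0].shape[-2:]
        return [F.interpolate(f, size=size, mode="bilinear",
                              align_corners=False) for f in feats]
```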
Step 304, performing a feature linking operation and a global average pooling operation on the image features to obtain an input feature.
In this embodiment, after acquiring the image features with different feature scales, the executing entity may perform a feature linking operation and a global average pooling operation on them to obtain an input feature. Specifically, the feature linking operation splices the image features of different feature scales into a linked image feature; a global average pooling operation is then performed on the linked image feature to obtain the input feature, which reduces the parameter count and helps prevent overfitting.
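A minimal sketch of the feature linking and global average pooling operations, assuming PyTorch tensors of identical shape from the previous step:

```python
# Sketch of step 304: concatenate per-scale features, then pool globally.
import torch
import torch.nn.functional as F

def build_input_feature(image_feats):
    # Feature linking: splice the per-scale features along the channel axis.
    linked = torch.cat(image_feats, dim=1)     # (N, K*D, H, W) for K scales
    # Global average pooling collapses H x W, reducing the downstream
    # parameter count and helping to prevent overfitting.
    pooled = F.adaptive_avg_pool2d(linked, 1)  # (N, K*D, 1, 1)
    return pooled.flatten(1)                   # (N, K*D)
```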
Step 305, inputting the input feature into the scale classification network for calculation to obtain the probability of each image feature.
In this embodiment, after obtaining the input feature, the executing entity may input it into the scale classification network for calculation, obtaining the probability of the image feature of each feature scale. Specifically, the input feature is input into the scale classification network as input data, and the probability of each feature scale's image feature is output at its output end. The probabilities of all the image features sum to 1, and the larger the probability, the more discriminative the corresponding image feature. Different images to be detected yield different probabilities for the image feature of each feature scale.
Step 306, performing weighted summation on the obtained at least two probabilities and the image features to obtain the target image feature.
In this embodiment, after obtaining the probabilities, the executing entity may perform a weighted summation over them and the image features of different feature scales to obtain the target image feature. Specifically, each probability is multiplied by the image feature of the corresponding feature scale, and all products are summed to yield the target image feature.
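A minimal sketch covering steps 305 and 306, assuming PyTorch; modeling the scale classification network as a single linear layer followed by a softmax is an assumption of this example, not the disclosure's exact design.

```python
# Sketch of steps 305-306 under the assumptions stated above.
import torch
import torch.nn as nn

class ScaleClassifier(nn.Module):
    def __init__(self, input_dim: int, num_scales: int = 3):
        super().__init__()
        self.fc = nn.Linear(input_dim, num_scales)

    def forward(self, input_feature: torch.Tensor, image_feats):
        # Step 305: probabilities over the feature scales; they sum to 1,
        # and a larger probability marks a more discriminative scale.
        probs = self.fc(input_feature).softmax(dim=-1)  # (N, K)
        # Step 306: multiply each scale's feature by its probability and sum.
        stacked = torch.stack(image_feats, dim=1)       # (N, K, D, H, W)
        weights = probs.view(probs.shape[0], probs.shape[1], 1, 1, 1)
        return (weights * stacked).sum(dim=1)           # target image feature
```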
Step 307, inputting the target image feature into the feature classification network for image classification to obtain an image classification result.
In this embodiment, after obtaining the target image feature, the executing entity may input it into the feature classification network for image classification, obtaining an image classification result. Specifically, the target image feature is input into the feature classification network as input data, and the image classification result is output at its output end. The image classification result may be that the image to be detected is a real image or that it is a non-real image.
Step 308, determining the image classification result as the detection result.
In this embodiment, after obtaining the image classification result, the executing entity may directly determine it as the detection result.
As can be seen from FIG. 3, compared with the embodiment corresponding to FIG. 2, the image detection method in this embodiment obtains the probability of the image feature of each feature scale from the scale classification network and determines the target image feature based on those probabilities, so that the features used for binary classification are more discriminative, further improving detection accuracy.
With further continuing reference to FIG. 4, a flow 400 of one embodiment of a method of training an image detection model according to the present disclosure is illustrated. The training method of the image detection model comprises the following steps:
Step 401, acquiring a training sample set, wherein the training samples include living body image samples and at least two types of attack image samples.
In this embodiment, the executing entity of the training method of the image detection model (e.g., the server 105 shown in FIG. 1) may obtain a training sample set. The executing entity may obtain an existing sample set stored in a public database, or may collect samples through terminal devices (for example, the terminal devices 101, 102, and 103 shown in FIG. 1); in the latter case it may receive the samples collected by the terminal devices and store them locally, thereby generating the training sample set.
The training sample set may include living body image samples and at least two types of attack image samples. A living body image sample may be an image captured of a real human face, animal, or plant, and an attack image sample may be a non-real human face, animal, or plant image. There are various types of attack image samples; for example, they may include screenshot-type, print-type, and mask-type attack image samples, which the present disclosure does not limit.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, and disclosure of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good customs.
Step 402, selecting an image sample from a training sample set.
In this embodiment, after obtaining the training sample set, the executing entity may select an image sample from it. Specifically, an image sample may be selected randomly, or according to a preset sample selection rule; the present disclosure does not limit this.
Step 403, extracting at least two image features with different feature scales from the selected image sample based on the initial image detection model, and performing scale combination and image classification on the extracted image features to obtain a target loss value and a trained image detection model.
In this embodiment, the executing entity may extract at least two image features with different feature scales from the selected image sample based on the initial image detection model, and perform scale combination and image classification on them to obtain the target loss value and the trained image detection model. The initial image detection model is a model that detects an input image sample to produce a detection result. The selected image sample is input into it as input data, and the model performs feature extraction on the sample to obtain multiple image features, each at a different feature scale.
The initial image detection model may then perform scale combination on the extracted image features. For example, one of the image features with different feature scales may be selected as the combined image feature; alternatively, several image features may be selected, combined according to a preset scale operation rule, and the calculation result determined as the combined image feature. The present disclosure does not limit this.
After the combined image feature is obtained, the initial image detection model may perform image classification on it to obtain a classification result, namely that the input image sample is a real image or a non-real image. A target loss value can be calculated based on the classification result, the parameters of the initial image detection model adjusted based on that loss value, and the adjusted model determined as the trained image detection model.
After the target loss value is obtained, it may be compared with a preset loss threshold, and step 404 or step 405 is then performed according to the comparison result.
Step 404, in response to the target loss value being greater than or equal to the loss threshold, taking the trained image detection model as the initial image detection model and performing the training step again.
In this embodiment, after obtaining the target loss value, the executing entity may compare it with a preset loss threshold; if the target loss value is greater than or equal to the threshold, the trained image detection model is taken as the initial image detection model and the training step is performed again, i.e., steps 402-403 are executed again based on the trained model. For example, the loss threshold may be set to 0.05.
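The loop formed by steps 402 to 405 can be sketched as follows; `train_step` is a hypothetical helper standing in for step 403 (forward pass, loss computation, and parameter update), and the 0.05 threshold follows the example above.

```python
# Sketch of the training loop; train_step is an assumed helper, not part of
# the disclosure.
import random

LOSS_THRESHOLD = 0.05  # example value from the text

def train_until_converged(model, training_samples, train_step):
    while True:
        sample = random.choice(training_samples)  # step 402
        target_loss = train_step(model, sample)   # step 403 (assumed helper)
        if target_loss < LOSS_THRESHOLD:          # step 405
            return model                          # the target image detection model
        # step 404: loss still too high; keep training the updated model
```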
Step 405, determining the trained image detection model as the target image detection model in response to the target loss value being smaller than the loss threshold.
In this embodiment, after obtaining the target loss value, the executing entity may compare it with the preset loss threshold; if the target loss value is smaller than the threshold, training is determined to be finished, and the trained image detection model is determined as the target image detection model.
The training method of the image detection model provided by this embodiment of the present disclosure first obtains a training sample set and then performs the following training steps: selecting an image sample from the training sample set; extracting at least two image features with different feature scales from the selected sample based on the initial image detection model, and performing scale combination and image classification on the extracted features to obtain a target loss value and a trained image detection model; and determining the trained image detection model as the target image detection model in response to the target loss value being smaller than the loss threshold. By extracting multiple image features with different feature scales and performing scale combination and image classification on them, the detection results of the trained target image detection model are more accurate.
With further continued reference to fig. 5, a flow 500 of another embodiment of a method of training an image detection model according to the present disclosure is shown. The training method of the image detection model comprises the following steps:
step 501, a training sample set is obtained, wherein the training samples comprise living body image samples and at least two types of attack image samples.
Step 502, selecting an image sample from a training sample set.
In this embodiment, the specific operations of steps 501 to 502 have been described in detail in steps 401 to 402 in the embodiment shown in fig. 4, and are not described herein again.
It should be noted that the initial image detection model may include a pyramid feature extraction network, a scale classification network, and a feature classification network. The pyramid feature extraction network may be a network that extracts image features of an input image sample, the scale classification network may be a network that screens more discriminative image features from the extracted image features, and the feature classification network may be a network that classifies based on the screened image features to obtain a classification result. The following steps may be performed based on the initial image detection model described above.
Step 503, inputting the selected image sample into a pyramid feature extraction network for feature extraction, and obtaining at least two image features with different feature scales, wherein the number of categories of the feature scales is the same as the number of categories of the attack image sample.
In this embodiment, after selecting the image sample, the executing entity may input it into the pyramid feature extraction network for feature extraction, obtaining at least two image features with different feature scales. Specifically, the selected image sample is input into the pyramid feature extraction network as input data, and the at least two image features with different feature scales are output at its output end.
In some optional implementation manners of the embodiment, the selected image sample is input into a pyramid feature extraction network for feature extraction, so as to obtain at least two initial image features with different feature scales; and performing side edge linking operation and up-sampling operation on the at least two initial image features to obtain at least two image features with different feature scales.
Specifically, the selected image sample may be input into the pyramid feature extraction network as input data, and multiple initial image features of different feature scales output at its output end. A side link operation and an upsampling operation are then performed on these initial image features to obtain multiple image features of different feature scales. Each initial image feature is three-dimensional and may be represented as a D × H × W feature, where the D dimension is the number of channels and the H and W dimensions are the height and width, i.e., the spatial size of the feature. The initial image features output by the pyramid feature extraction network differ in three-dimensional size; the side link operation makes their D dimensions the same, unifying their channel counts, and the subsequent upsampling operation makes their H and W dimensions the same, unifying their spatial sizes. Through the side link operation and the upsampling operation, the resulting image features of different feature scales all share the same three-dimensional size.
The number of categories of feature scales is the same as the number of categories of attack image samples.
Step 504, inputting the extracted at least two image features into the scale classification network for calculation to obtain the target image feature.
In this embodiment, after obtaining the image features of at least two different feature scales, the executing entity may input the extracted at least two image features into a scale classification network for calculation, so as to obtain the target image feature. Specifically, a plurality of image features with different feature scales may be input into the scale classification network as input data, and the target image feature may be output from an output end of the scale classification network.
In some optional implementations of this embodiment, a feature linking operation and a global average pooling operation may be performed on the extracted at least two image features to obtain an input feature; the input feature is input into the scale classification network for calculation to obtain the probability of each image feature; and the obtained at least two probabilities and the extracted at least two image features are weighted and summed to obtain the target image feature.
Specifically, a feature linking operation may be performed on the image features of different feature scales, splicing them into a linked image feature; a global average pooling operation is then performed on the linked image feature to obtain the input feature, which reduces the parameter count and helps prevent overfitting. The input feature is then input into the scale classification network as input data, and the probability of the image feature of each feature scale is output at its output end. The probabilities of all the image features sum to 1, and the larger the probability, the more discriminative the corresponding image feature; different input image samples yield different probabilities for each feature scale. After the probabilities are obtained, each probability is multiplied by the image feature of the corresponding feature scale, and all products are summed to yield the target image feature.
In some optional implementations of this embodiment, after the probability of the image feature of each feature scale is obtained, target probabilities may further be determined: in response to the selected image sample being a living body image sample, each of the at least two target probabilities is determined to be the reciprocal of the number of categories of attack image samples; in response to the selected image sample being an attack image sample, the target probability corresponding to the category of the selected attack image sample is determined to be 1 and the remaining target probabilities to be 0. A scale loss value is then calculated based on the obtained at least two probabilities and the at least two target probabilities, where the target probabilities correspond one-to-one with the at least two categories of attack image samples.
Specifically, if the selected image sample is a living body image sample, all target probabilities are determined to be the reciprocal of the number of attack image sample categories. If the selected image sample is an attack image sample, its category is first determined, for example as a screenshot-type, print-type, or mask-type attack image sample; the target probability corresponding to that category is then set to 1 and the remaining target probabilities to 0. After the target probabilities are determined, the calculated probabilities are compared with the target probabilities to obtain the scale loss value, the target probabilities corresponding one-to-one with the categories of attack image samples.
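A minimal sketch of the target probabilities and the scale loss follows, assuming PyTorch; the text only says the calculated probabilities and the target probabilities are compared, so the cross-entropy comparison used here is an assumption of this example.

```python
# Sketch of the scale loss under the assumptions stated above.
import torch

def scale_loss(scale_probs: torch.Tensor, sample_is_live: bool,
               attack_class: int = 0) -> torch.Tensor:
    num_attack_classes = scale_probs.shape[-1]
    if sample_is_live:
        # Living body sample: every target probability is the reciprocal of
        # the number of attack image sample categories.
        target = torch.full_like(scale_probs, 1.0 / num_attack_classes)
    else:
        # Attack sample: 1 for the matching attack category, 0 for the rest.
        target = torch.zeros_like(scale_probs)
        target[..., attack_class] = 1.0
    # Compare predicted and target distributions (cross entropy, assumed).
    return -(target * scale_probs.clamp_min(1e-8).log()).sum(dim=-1).mean()
```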
Step 505, inputting the target image feature into the feature classification network for image classification to obtain a target loss value and a trained image detection model.
In this embodiment, after obtaining the target image feature, the executing entity may input it into the feature classification network for image classification, obtaining a target loss value and a trained image detection model. Specifically, the target image feature is input into the feature classification network as input data, and the image sample classification result is output at its output end; the result may be that the input image sample is a real image or a non-real image. The target loss value and the trained image detection model are then obtained based on the image sample classification result.
In some optional implementations of this embodiment, the target image feature may be input into the feature classification network for image classification to obtain a classification result; a classification loss value is calculated based on the classification result; a target loss value is calculated based on the scale loss value and the classification loss value; and the scale classification network and the feature classification network of the initial image detection model are parameter-adjusted based on the target loss value to obtain the trained image detection model.
Specifically, the target image feature may be input into the feature classification network as input data and the image sample classification result output at its output end. The actual type of the input image sample is then obtained and compared with the output classification result to obtain the classification loss value. The first parameter is multiplied by the scale loss value to obtain a first product; the second parameter, equal to 1 minus the first parameter, is multiplied by the classification loss value to obtain a second product; and the sum of the two products is determined as the target loss value. The sum of the first and second parameters equals 1; the first parameter can be set to 0.9 at the initial stage of training and gradually reduced to 0.5 as training progresses, a setting that focuses early training on the learning of the scale classification network. The scale classification network and the feature classification network of the initial image detection model are then parameter-adjusted based on the target loss value to obtain the trained image detection model.
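The loss combination can be sketched as follows; the linear decay of the first parameter from 0.9 to 0.5 over training is an assumption of this example, since the text does not specify the shape of the decay.

```python
# Sketch of the target-loss combination; the decay schedule is assumed.
def first_parameter(step: int, total_steps: int) -> float:
    # Starts at 0.9 so early training focuses on the scale classification
    # network, then decays toward 0.5 as training progresses.
    frac = min(step / max(total_steps, 1), 1.0)
    return 0.9 - 0.4 * frac

def target_loss(scale_loss_value: float, classification_loss_value: float,
                step: int, total_steps: int) -> float:
    a = first_parameter(step, total_steps)  # first parameter
    b = 1.0 - a                             # second parameter, a + b == 1
    return a * scale_loss_value + b * classification_loss_value
```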
After the target loss value is obtained, it may be compared with a preset loss threshold, and step 506 or step 507 is then performed according to the comparison result.
Step 506, in response to the target loss value being greater than or equal to the loss threshold, taking the trained image detection model as the initial image detection model and performing the training step again.
Step 507, determining the trained image detection model as the target image detection model in response to the target loss value being smaller than the loss threshold.
In this embodiment, the specific operations of steps 506 to 507 have been described in detail in steps 404 to 405 in the embodiment shown in fig. 4, and are not described herein again.
As can be seen from FIG. 5, compared with the embodiment corresponding to FIG. 4, the training method of the image detection model in this embodiment obtains the probability of the image feature of each feature scale from the scale classification network and determines the target image feature based on those probabilities, so that the features used for binary classification are more discriminative, further improving the detection accuracy of the trained target image detection model. In addition, the target loss value is calculated from the scale loss value and the classification loss value, and the scale classification network and the feature classification network of the initial image detection model are parameter-adjusted based on this target loss value, making both networks more accurate and thus further improving the detection accuracy of the trained target image detection model.
With further continued reference to FIG. 6, a schematic diagram 600 of a training method of an image detection model according to the present disclosure is shown. As can be seen from FIG. 6, a training sample set is first obtained, where the training samples include living body image samples and at least two types of attack image samples. An image sample is then selected from the training sample set and input into the pyramid feature extraction network for feature extraction, yielding image features of at least two different feature scales; these are input into the scale classification network for calculation to obtain the target image feature; and finally the target image feature is input into the feature classification network for image classification to obtain the target loss value. The detection accuracy of the trained target image detection model is thereby improved.
With further reference to fig. 7, as an implementation of the above-described image detection method, the present disclosure provides an embodiment of an image detection apparatus, which corresponds to the method embodiment shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 7, the image detection apparatus 700 of this embodiment may include a first obtaining module 701, an extracting module 702, a combining module 703, and a classifying module 704. The first acquiring module 701 is configured to acquire an image to be detected; an extraction module 702 configured to extract image features of at least two different feature scales from an image to be detected; a combination module 703 configured to perform scale combination on the image features to determine target image features; and a classification module 704 configured to perform image classification on the target image features to obtain a detection result.
In the present embodiment, the image detection apparatus 700: the specific processing of the first obtaining module 701, the extracting module 702, the combining module 703 and the classifying module 704 and the technical effects thereof can refer to the related descriptions of steps 201 to 204 in the corresponding embodiment of fig. 2, which are not repeated herein.
In some optional implementations of the present embodiment, the image detection apparatus 700 further includes: the image detection module is configured to input an image to be detected into a pre-trained image detection model, and the image detection model comprises a pyramid feature extraction network, a scale classification network and a feature classification network; the extraction module 702 includes: the first extraction submodule is configured to input an image to be detected into a pyramid feature extraction network for feature extraction, so that at least two initial image features with different feature scales are obtained; and the sampling sub-module is configured to perform side edge linking operation and up-sampling operation on the at least two initial image features to obtain at least two image features with different feature scales.
In some optional implementations of this embodiment, the combining module 703 includes: the pooling submodule is configured to perform feature linking operation and global average pooling operation on the image features to obtain input features; the first calculation submodule is configured to input the input features into the scale classification network for calculation to obtain the probability of each image feature; and the second calculation submodule is configured to perform weighted summation on the obtained at least two probabilities and the image characteristics to obtain the target image characteristics.
In some optional implementations of this embodiment, the classification module 704 includes: the first classification submodule is configured to input the target image features into a feature classification network for image classification to obtain an image classification result; a determination submodule configured to determine the image classification result as a detection result.
With further reference to fig. 8, as an implementation of the training method for the image detection model, the present disclosure provides an embodiment of an apparatus for training an image detection model, where the apparatus embodiment corresponds to the method embodiment shown in fig. 4, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 8, the training apparatus 800 for an image detection model of the present embodiment may include a second obtaining module 801 and a training module 802. The second acquiring module 801 is configured to acquire a training sample set, where the training samples include living body image samples and at least two types of attack image samples; a training module 802 configured to perform the following training steps: selecting an image sample from a training sample set; extracting at least two image features with different feature scales from a selected image sample based on an initial image detection model, and carrying out scale combination and image classification on the extracted at least two image features to obtain a target loss value and a trained image detection model; and determining the trained image detection model as the target image detection model in response to the target loss value being less than the loss threshold value.
In this embodiment, the training apparatus 800 for the image detection model: the detailed processing and technical effects of the second obtaining module 801 and the training module 802 can refer to the related descriptions of steps 401 to 405 in the corresponding embodiment of fig. 4, which are not repeated herein.
In some optional implementations of the present embodiment, the training apparatus 800 for an image detection model further includes: and the retraining module is configured to respond to the target loss value being greater than or equal to the loss threshold value, take the trained image detection model as the initial image detection model and execute the training step again.
In some optional implementations of this embodiment, the initial image detection model includes a pyramid feature extraction network, a scale classification network, and a feature classification network; the training module 802 includes: the second extraction submodule is configured to input the selected image sample into a pyramid feature extraction network for feature extraction to obtain at least two image features with different feature scales, wherein the category number of the feature scales is the same as that of the attack image sample; the third calculation submodule is configured to input the extracted at least two image features into a scale classification network for calculation to obtain target image features; and the second classification submodule is configured to input the target image features into the feature classification network for image classification to obtain a target loss value and a trained image detection model.
In some optional implementations of this embodiment, the second extraction submodule includes: an extraction unit configured to input the selected image sample into the pyramid feature extraction network for feature extraction to obtain at least two initial image features with different feature scales; and a sampling unit configured to perform a side link (lateral connection) operation and an up-sampling operation on the at least two initial image features to obtain the at least two image features with different feature scales.
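The side link and up-sampling operations correspond to the familiar top-down pathway of a feature pyramid network; the sketch below illustrates one plausible realization in PyTorch, with channel sizes and module names chosen for illustration rather than taken from the disclosure.

```python
# Hypothetical FPN-style realization of the side link (lateral connection)
# and up-sampling operations; channel sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownMerge(nn.Module):
    def __init__(self, in_channels: list[int], out_channels: int = 256):
        super().__init__()
        # 1x1 convolutions acting as the lateral (side link) connections
        self.lateral = nn.ModuleList(
            nn.Conv2d(c, out_channels, kernel_size=1) for c in in_channels
        )

    def forward(self, initial_features: list[torch.Tensor]) -> list[torch.Tensor]:
        # initial_features: finest to coarsest, e.g. strides 4, 8, 16, 32
        laterals = [conv(f) for conv, f in zip(self.lateral, initial_features)]
        for i in range(len(laterals) - 2, -1, -1):
            # up-sample the coarser map and merge it into the finer one
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest"
            )
        return laterals  # at least two image features with different feature scales
```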
In some optional implementations of this embodiment, the third calculation submodule includes: a pooling unit configured to perform a feature linking operation and a global average pooling operation on the extracted at least two image features to obtain input features; a first calculation unit configured to input the input features into the scale classification network for calculation to obtain a probability for each image feature; and a second calculation unit configured to perform weighted summation on the obtained at least two probabilities and the extracted at least two image features to obtain the target image features.
In some optional implementations of this embodiment, for the processing after the probability of each image feature is obtained, the third calculation submodule further includes: a first determining unit configured to, in response to the selected image sample being a living body image sample, determine that the at least two target probabilities are each the reciprocal of the number of categories of attack image samples; a second determining unit configured to, in response to the selected image sample being an attack image sample, determine that the target probability corresponding to the category of the selected attack image sample is 1 and the remaining target probabilities are 0; and a third calculation unit configured to calculate a scale loss value based on the obtained at least two probabilities and the at least two target probabilities, where the at least two target probabilities are in one-to-one correspondence with the at least two categories of the attack image samples.
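The target-probability construction and the scale loss can be written out as follows. This is a hedged sketch that interprets the scale loss as a cross-entropy between the predicted scale probabilities and the uniform or one-hot targets described above; the disclosure does not fix the exact loss form, so the cross-entropy choice is an assumption.

```python
# Hypothetical sketch of the scale-loss targets and loss: uniform targets
# (1 / number of attack categories) for living-body samples, one-hot targets
# for attack samples, and a cross-entropy against the predicted probabilities.
import torch

def scale_loss(pred_probs, is_live, attack_class):
    # pred_probs:   (N, K) probabilities from the scale classification network
    # is_live:      (N,) bool, True for living-body samples
    # attack_class: (N,) long, attack category index (arbitrary for live samples)
    n, k = pred_probs.shape
    one_hot = torch.zeros(n, k).scatter_(
        1, attack_class.clamp(0, k - 1).unsqueeze(1), 1.0)
    uniform = torch.full((n, k), 1.0 / k)
    target = torch.where(is_live.unsqueeze(1), uniform, one_hot)
    # cross-entropy between the target distribution and the predictions
    return -(target * torch.log(pred_probs.clamp_min(1e-8))).sum(dim=1).mean()
```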
In some optional implementations of this embodiment, the second classification submodule includes: a classification unit configured to input the target image features into the feature classification network for image classification to obtain a classification result; a fourth calculation unit configured to calculate a classification loss value based on the classification result; a fifth calculation unit configured to calculate the target loss value based on the scale loss value and the classification loss value; and an adjusting unit configured to perform parameter adjustment on the scale classification network and the feature classification network of the initial image detection model based on the target loss value to obtain the trained image detection model.
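Finally, a sketch of how the scale loss and classification loss might be combined into the target loss and used to adjust both networks. Equal weighting of the two losses is an assumption (the disclosure leaves the combination unspecified), as are the placeholder model methods `extract_multi_scale`, `combine_scales`, and `classify`; the `scale_loss` function is the sketch given above.

```python
# Hypothetical combination of the scale loss and the classification loss into
# the target loss; equal weighting and the model methods are assumptions.
import torch.nn.functional as F

def training_step_full(model, optimizer, images, labels, is_live, attack_class):
    features = model.extract_multi_scale(images)         # pyramid feature extraction
    probs, target_feat = model.combine_scales(features)  # scale classification
    logits = model.classify(target_feat)                 # feature classification
    cls_loss = F.cross_entropy(logits, labels)           # classification loss value
    target_loss = scale_loss(probs, is_live, attack_class) + cls_loss
    optimizer.zero_grad()
    target_loss.backward()  # parameter adjustment of both classification networks
    optimizer.step()
    return target_loss.item()
```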
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the respective methods and processes described above, such as the image detection method or the training method of the image detection model. For example, in some embodiments, the image detection method or the training method of the image detection model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the above-described image detection method or training method of the image detection model may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured by any other suitable means (e.g., by means of firmware) to perform the image detection method or the training method of the image detection model.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a server of a distributed system or a server incorporating a blockchain; it may also be a cloud server, or an intelligent cloud computing server or intelligent cloud host incorporating artificial intelligence technology.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders; this is not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (25)

1. An image detection method, comprising:
acquiring an image to be detected;
extracting at least two image features with different feature scales from the image to be detected;
carrying out scale combination on the image features to determine target image features;
and carrying out image classification on the target image features to obtain a detection result.
2. The method of claim 1, further comprising:
inputting the image to be detected into a pre-trained image detection model, wherein the image detection model comprises a pyramid feature extraction network, a scale classification network and a feature classification network;
the step of extracting at least two image features with different feature scales from the image to be detected comprises the following steps:
inputting the image to be detected into the pyramid feature extraction network for feature extraction to obtain at least two initial image features with different feature scales;
and performing a side link operation and an up-sampling operation on the at least two initial image features to obtain the image features with at least two different feature scales.
3. The method of claim 2, wherein the scale combining the image features, determining a target image feature comprises:
performing feature linking operation and global average pooling operation on the image features to obtain input features;
inputting the input features into the scale classification network for calculation to obtain the probability of each image feature;
and carrying out weighted summation on the obtained at least two probabilities and the image features to obtain the target image features.
4. The method of claim 3, wherein the image classification of the target image feature to obtain a detection result comprises:
inputting the target image features into the feature classification network for image classification to obtain an image classification result;
and determining the image classification result as the detection result.
5. A training method of an image detection model comprises the following steps:
acquiring a training sample set, wherein the training samples comprise living body image samples and at least two types of attack image samples;
the following training steps are performed: selecting an image sample from the training sample set; extracting at least two image features with different feature scales from the selected image sample based on the initial image detection model, and carrying out scale combination and image classification on the extracted at least two image features to obtain a target loss value and a trained image detection model; and determining the trained image detection model as a target image detection model in response to the target loss value being less than a loss threshold value.
6. The method of claim 5, further comprising:
and in response to the target loss value being greater than or equal to the loss threshold value, taking the trained image detection model as the initial image detection model, and executing the training step again.
7. The method of claim 6, wherein the initial image detection model comprises a pyramid feature extraction network, a scale classification network, a feature classification network;
the method for extracting at least two image features with different feature scales from a selected image sample based on an initial image detection model, and performing scale combination and image classification on the at least two extracted image features to obtain a target loss value and a trained image detection model comprises the following steps:
inputting the selected image sample into the pyramid feature extraction network for feature extraction to obtain the image features of the at least two different feature scales, wherein the number of feature scales is the same as the number of categories of the attack image samples;
inputting the extracted at least two image features into the scale classification network for calculation to obtain target image features;
and inputting the target image features into the feature classification network for image classification to obtain the target loss value and the trained image detection model.
8. The method of claim 7, wherein the inputting the selected image sample into the pyramid feature extraction network for feature extraction to obtain the image features of the at least two different feature scales comprises:
inputting the selected image sample into the pyramid feature extraction network for feature extraction to obtain at least two initial image features with different feature scales;
and performing a side link operation and an up-sampling operation on the at least two initial image features to obtain the image features with at least two different feature scales.
9. The method of claim 8, wherein the inputting the extracted at least two image features into the scale classification network for computation to obtain a target image feature comprises:
performing feature linking operation and global average pooling operation on the extracted at least two image features to obtain input features;
inputting the input features into the scale classification network for calculation to obtain the probability of each image feature;
and carrying out weighted summation on the obtained at least two probabilities and the extracted at least two image features to obtain the target image features.
10. The method of claim 9, wherein after obtaining the probability for each image feature, the method further comprises:
in response to the selected image sample being a living body image sample, determining that the at least two target probabilities are each the reciprocal of the number of categories of the attack image samples;
in response to the selected image sample being an attack image sample, determining that one target probability corresponding to the category of the selected attack image sample is 1 and the other target probabilities are 0;
and calculating to obtain a scale loss value based on the obtained at least two probabilities and the at least two target probabilities, wherein the at least two target probabilities are in one-to-one correspondence with at least two categories of the attack image sample.
11. The method of claim 10, wherein the inputting the target image features into the feature classification network for image classification to obtain the target loss value and the trained image detection model comprises:
inputting the target image features into the feature classification network for image classification to obtain a classification result;
calculating a classification loss value based on the classification result;
calculating the target loss value based on the scale loss value and the classification loss value;
and carrying out parameter adjustment on the scale classification network and the feature classification network of the initial image detection model based on the target loss value to obtain the trained image detection model.
12. An image detection apparatus, the apparatus comprising:
the first acquisition module is configured to acquire an image to be detected;
the extraction module is configured to extract image features with at least two different feature scales from the image to be detected;
the combination module is configured to perform scale combination on the image features and determine target image features;
and the classification module is configured to perform image classification on the target image features to obtain a detection result.
13. The apparatus of claim 12, further comprising:
the input module is configured to input the image to be detected into a pre-trained image detection model, and the image detection model comprises a pyramid feature extraction network, a scale classification network and a feature classification network;
the extraction module comprises:
the first extraction submodule is configured to input the image to be detected into the pyramid feature extraction network for feature extraction, so that at least two initial image features with different feature scales are obtained;
and the sampling sub-module is configured to perform a side link operation and an up-sampling operation on at least two initial image features to obtain the image features of the at least two different feature scales.
14. The apparatus of claim 13, wherein the combining module comprises:
the pooling submodule is configured to perform feature linking operation and global average pooling operation on the image features to obtain input features;
the first calculation submodule is configured to input the input features into the scale classification network for calculation, and the probability of each image feature is obtained;
and the second calculation submodule is configured to perform weighted summation on the obtained at least two probabilities and the image features to obtain the target image features.
15. The apparatus of claim 14, wherein the classification module comprises:
the first classification submodule is configured to input the target image features into the feature classification network for image classification to obtain an image classification result;
a determination sub-module configured to determine the image classification result as the detection result.
16. An apparatus for training an image inspection model, the apparatus comprising:
a second acquisition module configured to acquire a training sample set, wherein the training samples include living body image samples and at least two types of attack image samples;
a training module configured to perform the following training steps: selecting an image sample from the training sample set; extracting at least two image features with different feature scales from a selected image sample based on an initial image detection model, and carrying out scale combination and image classification on the extracted at least two image features to obtain a target loss value and a trained image detection model; and determining the trained image detection model as a target image detection model in response to the target loss value being less than a loss threshold value.
17. The apparatus of claim 16, further comprising:
and the retraining module is configured to take the trained image detection model as the initial image detection model and execute the training step again in response to the target loss value being greater than or equal to the loss threshold value.
18. The apparatus of claim 17, wherein the initial image detection model comprises a pyramid feature extraction network, a scale classification network, a feature classification network;
the training module comprises:
the second extraction submodule is configured to input the selected image sample into the pyramid feature extraction network for feature extraction to obtain the image features of the at least two different feature scales, wherein the number of feature scales is the same as the number of categories of the attack image samples;
the third calculation submodule is configured to input the extracted at least two image features into the scale classification network for calculation to obtain target image features;
and the second classification submodule is configured to input the target image features into the feature classification network for image classification, so that the target loss value and the trained image detection model are obtained.
19. The apparatus of claim 18, wherein the second extraction submodule comprises:
the extraction unit is configured to input the selected image sample into the pyramid feature extraction network for feature extraction to obtain at least two initial image features with different feature scales;
and the sampling unit is configured to perform a side link operation and an up-sampling operation on at least two initial image features to obtain the image features of at least two different feature scales.
20. The apparatus of claim 19, wherein the third calculation sub-module comprises:
the pooling unit is configured to perform feature linking operation and global average pooling operation on the extracted at least two image features to obtain input features;
the first calculation unit is configured to input the input features into the scale classification network for calculation, and the probability of each image feature is obtained;
and the second calculation unit is configured to perform weighted summation on the obtained at least two probabilities and the extracted at least two image features to obtain the target image features.
21. The apparatus of claim 20, wherein, for processing after the probability of each image feature is obtained, the third calculation sub-module further comprises:
a first determining unit configured to, in response to the selected image sample being a living body image sample, determine that the at least two target probabilities are each the reciprocal of the number of categories of the attack image samples;
a second determining unit configured to determine that one target probability corresponding to the category of the selected attack image sample is 1 and the remaining target probabilities are 0 in response to the selected image sample being an attack image sample;
and a third calculating unit configured to calculate a scale loss value based on the obtained at least two probabilities and the at least two target probabilities, wherein the at least two target probabilities are in one-to-one correspondence with at least two classes of the attack image sample.
22. The apparatus of claim 21, wherein the second classification submodule comprises:
the classification unit is configured to input the target image features into the feature classification network for image classification to obtain a classification result;
a fourth calculation unit configured to calculate a classification loss value based on the classification result;
a fifth calculation unit configured to calculate the target loss value based on the scale loss value and the classification loss value;
and the adjusting unit is configured to perform parameter adjustment on the scale classification network and the feature classification network of the initial image detection model based on the target loss value to obtain the trained image detection model.
23. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-11.
24. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-11.
25. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-11.
CN202210771396.4A 2022-06-30 2022-06-30 Image detection method and training method of image detection model Pending CN115171225A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210771396.4A CN115171225A (en) 2022-06-30 2022-06-30 Image detection method and training method of image detection model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210771396.4A CN115171225A (en) 2022-06-30 2022-06-30 Image detection method and training method of image detection model

Publications (1)

Publication Number Publication Date
CN115171225A true CN115171225A (en) 2022-10-11

Family

ID=83489745

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210771396.4A Pending CN115171225A (en) 2022-06-30 2022-06-30 Image detection method and training method of image detection model

Country Status (1)

Country Link
CN (1) CN115171225A (en)

Similar Documents

Publication Publication Date Title
CN108229418B (en) Human body key point detection method and apparatus, electronic device, storage medium, and program
CN113971751A (en) Training feature extraction model, and method and device for detecting similar images
CN112949767B (en) Sample image increment, image detection model training and image detection method
CN112784778B (en) Method, apparatus, device and medium for generating model and identifying age and sex
CN112861885B (en) Image recognition method, device, electronic equipment and storage medium
CN113869449A (en) Model training method, image processing method, device, equipment and storage medium
CN113239807B (en) Method and device for training bill identification model and bill identification
CN112651451B (en) Image recognition method, device, electronic equipment and storage medium
CN114612743A (en) Deep learning model training method, target object identification method and device
CN113947188A (en) Training method of target detection network and vehicle detection method
CN113379627A (en) Training method of image enhancement model and method for enhancing image
CN112580666A (en) Image feature extraction method, training method, device, electronic equipment and medium
CN113537192B (en) Image detection method, device, electronic equipment and storage medium
CN114120454A (en) Training method and device of living body detection model, electronic equipment and storage medium
CN113643260A (en) Method, apparatus, device, medium and product for detecting image quality
CN113705361A (en) Method and device for detecting model in living body and electronic equipment
CN113592932A (en) Training method and device for deep completion network, electronic equipment and storage medium
CN116310356B (en) Training method, target detection method, device and equipment of deep learning model
CN115019057A (en) Image feature extraction model determining method and device and image identification method and device
CN113610856A (en) Method and device for training image segmentation model and image segmentation
CN115171225A (en) Image detection method and training method of image detection model
CN113989152A (en) Image enhancement method, device, equipment and storage medium
CN114049684A (en) Human body sitting posture identification method and device, electronic equipment and storage medium
CN114093006A (en) Training method, device and equipment of living human face detection model and storage medium
CN113591969B (en) Face similarity evaluation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination