CN113139471A - Target detection method and device, electronic equipment and storage medium - Google Patents

Target detection method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN113139471A
Authority
CN
China
Prior art keywords
image
target
detection
network
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110448199.4A
Other languages
Chinese (zh)
Inventor
吴凌云
胡志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Intelligent Technology Co Ltd
Original Assignee
Shanghai Sensetime Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Intelligent Technology Co Ltd
Priority to CN202110448199.4A
Publication of CN113139471A
Legal status: Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a target detection method and apparatus, an electronic device, and a storage medium, wherein the method includes: acquiring a target image; detecting the target image by using a target detection network to obtain a first detection result of a target object in the target image; the target detection network is obtained based on a sample image and supervision information of the sample image, and the supervision information of the sample image comprises a mask map. The embodiment of the disclosure can be applied to medical scenes, can perform target detection on medical images, and improves the detection precision of a target detection network.

Description

Target detection method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of computer vision technologies, and in particular, to a target detection method and apparatus, an electronic device, and a storage medium.
Background
Computer vision technology is a technology that simulates biological vision by using a computer and related equipment; information about a corresponding scene can be obtained by processing a captured image or video, thereby simulating the observation of a living being. Object detection is an important task in computer vision.
Target detection can be applied to various scenarios. For example, in a medical scenario, target detection can be used to identify a lesion area in a medical image and thus assist in analyzing the medical image. However, in current target detection tasks, the accuracy of the detection results is low.
Disclosure of Invention
The present disclosure provides a technical scheme for target detection.
According to an aspect of the present disclosure, there is provided an object detection method including:
acquiring a target image; detecting the target image by using a target detection network to obtain a first detection result of a target object in the target image; the target detection network is obtained based on a sample image and supervision information of the sample image, and the supervision information of the sample image comprises a mask map.
In the embodiment of the disclosure, a target image to be detected may be obtained, and a target detection network is used to detect the target image, so as to obtain a first detection result of a target object in the target image. The target detection network is obtained based on the sample image and the supervision information of the sample image, the supervision information of the sample image comprises a mask map, and the mask map can supervise the spatial information corresponding to the sample image, so that the target detection network obtained by the sample image and the supervision information of the sample image can reserve the spatial information of the target image, and the detection precision of the target detection network is improved.
In one or more possible implementation manners, the detecting the target image by using a target detection network to obtain a first detection result of a target object in the target image includes: determining at least one first candidate region based on the target detection network; and obtaining a first detection result comprising the category information and the position information of the target object based on the image characteristics of the at least one first candidate region.
At least one first candidate region where the target object may be located can be determined through the target detection network, and a target object possibly present in the first candidate region can then be detected, so that target objects are detected more comprehensively.
In one or more possible implementations, the method further includes: processing the sample image through a neural network to obtain a second detection result and a thermodynamic diagram (i.e., a heat map) of a reference object in the sample image; determining a network loss of the neural network based on the second detection result, the thermodynamic diagram and supervision information of the sample image, wherein the thermodynamic diagram is used for indicating a predicted position of the reference object in the sample image; and adjusting network parameters of the neural network based on the network loss to obtain a trained neural network, wherein the trained neural network comprises the target detection network.
Here, since the thermodynamic diagram, which provides spatial information, is supervised during training of the neural network, the target detection network obtained from the trained neural network may have higher detection accuracy.
In one or more possible implementations, the neural network includes a detection branch and a segmentation branch; the processing the sample image through a neural network to obtain a second detection result and a thermodynamic diagram of the reference object in the sample image includes: performing feature extraction on the sample image by using the neural network to obtain at least one second candidate region; inputting the at least one second candidate region into the detection branch to obtain a second detection result of the reference object in the sample image; and inputting the at least one second candidate region into the segmentation branch to obtain the thermodynamic diagram of the reference object.
Here, since the detection branch and the segmentation branch of the neural network share the image feature of the sample image, the segmentation branch can retain structural information of the feature space, so that the detection accuracy of the detection branch can be improved. The segmentation branch is supervised with a radial mask map; compared with a complete mask map that finely annotates the object contour, this can greatly reduce the manual annotation cost.
In one or more possible implementations, the inputting the at least one second candidate region into the segmentation branch to obtain a thermodynamic diagram of the reference object includes: performing at least one deconvolution operation and/or at least one upsampling operation on the first image feature of the at least one second candidate region by using the segmentation branch to obtain a second image feature of the at least one second candidate region; and carrying out normalization operation on the second image characteristics to obtain a thermodynamic diagram of the reference object.
Here, the thermodynamic diagram may be obtained from a second image feature of the second candidate region; since the second image feature has a larger feature size than the first image feature, it can better represent the position of the reference object in the sample image, and thus the thermodynamic diagram can directly present the position corresponding to the reference object in an image manner.
In one or more possible implementations, the supervisory information further includes a reference label indicating category information and location information of a reference object in the sample image; the determining a network loss of the neural network based on the second detection result, the thermodynamic diagram, and the supervisory information comprises: determining a first loss of the neural network according to a comparison result of the second detection result and the reference label; determining a second loss of the neural network according to the comparison result of the thermodynamic diagram and the mask diagram; and obtaining the network loss of the neural network according to the first loss and the second loss.
By using the mask diagram and the reference label as the supervision information of the neural network, the neural network can be better supervised, and the detection precision of the neural network is further improved.
In one or more possible implementations, the determining a second loss of the neural network according to the comparison of the thermodynamic diagram and the mask diagram includes: and according to the comparison result of the thermodynamic diagram and the mask diagram at each pixel position, performing weighted summation on the comparison result of the thermodynamic diagram and the mask diagram at a plurality of pixel positions to determine a second loss of the neural network, wherein the weight corresponding to the comparison result of one pixel position is related to the relative position relationship of the pixel position relative to the central pixel position.
In one or more possible implementations, the method further includes: acquiring a sample image marked with the reference area as a marked image; and setting the reference region as a first characteristic value, and setting a non-reference region except the reference region in the annotation image as a second characteristic value to obtain the mask image, wherein the first characteristic value of each pixel position in the reference region is determined by the relative position relationship between each pixel position and the central pixel position. Different first characteristic values are set for different pixel positions in the reference region, the center of the reference object can be indicated, the neural network is supervised through the mask diagram on the basis of reducing the labeling workload, and the detection precision of the obtained target detection network can be improved.
In one or more possible implementations, the method further includes: and determining a first characteristic value of each pixel position in the reference area according to the distance between each pixel position in the reference area and the central pixel position, wherein the first characteristic value of each pixel position is in negative correlation with the distance.
In this way, the reference region of the mask map varies radially, which can improve the accuracy of the target detection network and gives the target detection network a good detection effect in target detection tasks where the target object is blurred or close to the image edge.
According to an aspect of the present disclosure, there is provided an object detection apparatus including:
the acquisition module is used for acquiring a target image;
the detection module is used for detecting the target image by using a target detection network to obtain a first detection result of a target object in the target image;
the target detection network is obtained based on a sample image and supervision information of the sample image, and the supervision information of the sample image comprises a mask map.
In one or more possible implementations, the detection module is configured to determine at least one first candidate region based on the target detection network; and obtaining a first detection result comprising the category information and the position information of the target object based on the image characteristics of the at least one first candidate region.
In one or more possible implementations, the apparatus further includes: the training module is used for processing the sample image through a neural network to obtain a second detection result and a thermodynamic diagram of a reference object in the sample image; determining a network loss of the neural network based on the second detection result, the thermodynamic diagram and supervised information of the sample image, wherein the thermodynamic diagram is used for indicating a predicted position of the reference object in the sample image; and adjusting network parameters of the neural network based on the network loss to obtain a trained neural network, wherein the trained neural network comprises the target detection network.
In one or more possible implementations, the neural network includes a detection branch and a segmentation branch; the training module is used for extracting the characteristics of the sample image by using the neural network to obtain at least one second candidate region; inputting the at least one second candidate region into the detection branch to obtain a second detection result of the reference object in the sample image; and inputting the at least one second candidate region into the segmentation branch to obtain the thermodynamic diagram of the reference object.
In one or more possible implementations, the training module is configured to perform at least one deconvolution operation and/or at least one upsampling operation on the first image feature of the at least one second candidate region by using the segmentation branch to obtain a second image feature of the at least one second candidate region; and carrying out normalization operation on the second image characteristics to obtain a thermodynamic diagram of the reference object.
In one or more possible implementations, the supervisory information further includes a reference label indicating category information and location information of a reference object in the sample image; the training module is used for determining a first loss of the neural network according to a comparison result of the second detection result and the reference label; determining a second loss of the neural network according to the comparison result of the thermodynamic diagram and the mask diagram; and obtaining the network loss of the neural network according to the first loss and the second loss.
In one or more possible implementation manners, the training module is configured to perform weighted summation on the comparison results of the thermodynamic diagram and the mask diagram at multiple pixel positions according to the comparison result of the thermodynamic diagram and the mask diagram at each pixel position, and determine the second loss of the neural network, where a weight corresponding to the comparison result of one pixel position is related to a relative position relationship of the pixel position with respect to a central pixel position.
In one or more possible implementation manners, the training module is further configured to obtain a sample image labeled with the reference region as a labeled image; and setting the reference region as a first characteristic value, and setting a non-reference region except the reference region in the annotation image as a second characteristic value to obtain the mask image, wherein the first characteristic value of each pixel position in the reference region is determined by the relative position relationship between each pixel position and the central pixel position.
In one or more possible implementations, the training module is further configured to determine a first feature value of each pixel position in the reference region according to a distance between each pixel position in the reference region and the center pixel position, where the first feature value of each pixel position is negatively correlated with the distance.
According to an aspect of the present disclosure, there is provided an electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the above object detection method.
According to an aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described object detection method.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1 shows a flow diagram of a target detection method according to an embodiment of the present disclosure.
FIG. 2 illustrates a block diagram of an object detection network according to an embodiment of the disclosure.
Figure 3 illustrates a block diagram of a neural network training process in accordance with an embodiment of the present disclosure.
Fig. 4 shows a block diagram of an object detection apparatus according to an embodiment of the present disclosure.
Fig. 5 shows a block diagram of an example of an electronic device according to an embodiment of the present disclosure.
Fig. 6 shows a block diagram of an example of an electronic device according to an embodiment of the present disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the present disclosure will be described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used exclusively herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., A and/or B, may mean: A exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements and circuits that are well known to those skilled in the art have not been described in detail so as not to obscure the present disclosure.
The technical solution provided by the embodiments of the disclosure can be applied to application scenarios such as target detection and target recognition in images or videos, and the embodiments of the disclosure do not limit the application scenario. For example, in a medical scenario, a target detection network may be used to perform target detection on a lesion area in an endoscopic image or a contrast image, so that a lesion area that may exist in the endoscopic image or the contrast image can be identified and useful information can be provided to a user.
Fig. 1 shows a flow diagram of a target detection method according to an embodiment of the present disclosure. The object detection method may be performed by a terminal device, a server, or other types of electronic devices, where the terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the object detection method may be implemented by a processor calling computer readable instructions stored in a memory. The following describes an object detection method according to an embodiment of the present disclosure, taking an electronic device as an execution subject.
In step S11, a target image is acquired.
In the embodiment of the present disclosure, the electronic device may have an image capturing function, and may perform image capturing on a scene to obtain a target image to be detected. Alternatively, the electronic device may acquire the target image to be detected from another device, for example, from an image capturing device, a monitoring device, or the like. The target image may be an image awaiting target detection. In some implementations, the target image may be an image frame in a video. For example, the electronic device may acquire a medical image to be detected, such as an endoscopic image or a contrast image.
Step S12, detecting the target image by using a target detection network, and obtaining a first detection result of the target object in the target image.
In the embodiment of the disclosure, the target image may be detected by using a target detection network, for example, the target image is input into the target detection network, image features of the target image are extracted by using the target detection network, so as to obtain image features of the target image, and then a first detection result of the target object in the target image may be obtained according to the image features of the target image. The target detection network may include network layers such as a convolution layer, a pooling layer, and an activation layer, and may perform convolution operation, pooling operation, activation operation, and the like on image features of the target image by using the target detection network to obtain a first detection result in the target image. The first detection result may indicate a category of the target object and a position in the target image. Here, the target image may include a plurality of target objects, and target detection may be performed on categories and positions of the plurality of target objects, respectively, to obtain a first detection result for each target object. The target detection network may be trained on a neural network.
In the embodiment of the present disclosure, the target detection network may be obtained by training based on the sample images and the monitoring information of the sample images, for example, the neural network may be trained by using a plurality of sample images, for example, the plurality of sample images are input into the neural network to obtain a second detection result of each sample image output by the neural network, the second detection result output by the neural network is further monitored by using the monitoring information of the sample images, so as to implement training of the neural network, and the target detection network may be obtained after the training is completed. The supervision information may include a mask map, and the mask map may be generated based on a relative positional relationship between a plurality of pixel positions and a center pixel position of a reference region, where the reference region is a region in which the reference object is located in the sample image, for example, a mask map of the sample image may be generated by setting feature values for the plurality of pixel positions of the reference region according to the relative positional relationship between the plurality of pixel positions and the center pixel position of the reference region, for example, setting the feature values for the plurality of pixel positions of the reference region to pixel values, luminance values, and the like of the corresponding pixel positions. The area where the reference object is located can be indicated through the characteristic values of different pixel positions in the mask image, so that the neural network is supervised by using the mask image, the position information of the target object in the target image can be reserved by the target detection network obtained by the neural network training, the detection precision of the target detection network is improved, and the accuracy of the first detection result is improved.
In some implementations, the first detection result may include category information and location information, so that the category and location of the target object in the target image may be determined by the target detection network. The category information may indicate a category of the target object, for example, the category information may indicate that the target object belongs to a category of a background, a person, a vehicle, a building, and the like. The position information may indicate an image position of the target object, for example, the position information may include center position coordinates of the target object and an image size, and in some implementations, the position information may also include corner position coordinates of the target object and an image size.
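As a purely illustrative aside (not part of the disclosure), the two position representations mentioned above can be converted into one another; the helper below is hypothetical:

```python
def center_to_corner(cx: float, cy: float, w: float, h: float):
    """Convert a (center, size) box to a (top-left corner, size) box -- illustrative only."""
    return cx - w / 2.0, cy - h / 2.0, w, h

corner_box = center_to_corner(100.0, 80.0, 40.0, 30.0)  # -> (80.0, 65.0, 40.0, 30.0)
```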
Here, in the case that the target image is detected by using the target detection network to obtain the first detection result of the target object in the target image, at least one first candidate region may be determined based on the target detection network, and then the first detection result including the category information and the position information of the target object may be obtained based on the image feature of the at least one first candidate region. In some implementations, the target image may be input into a target detection network, and feature extraction may be performed on the target image by using the target detection network, for example, a convolution operation, a pooling operation, a feature fusion operation, and the like may be performed on the target image by using the target detection network to extract image features of the target image. Then, at least one first candidate region may be determined according to the image feature of the target image, for example, at least one first candidate region may be randomly determined in a feature map corresponding to the image feature. The first candidate regions may be feature regions where the target object may be located, for example, one or more first candidate regions where the target object may be located may be determined by further performing a convolution operation or the like on the extracted image features of the target image. Further, the feature size of at least one first candidate region (i.e., the feature size of the image feature of the extracted first candidate region) may be scaled to a first preset size, for example, the feature size of the first candidate region may be scaled to 7 × 7 × 256. Then, for the image features of the plurality of first candidate regions, a plurality of pooling operations and full connection operations may be performed on the image feature of each first candidate region, so as to obtain category information and position information of the target object in each first candidate region. In some implementation manners, some first candidate regions may overlap or correspond to the same target object, so that the detection results of multiple first candidate regions may be further integrated to obtain the category information and the position information of each target object in the target image. At least one first candidate area where the target object is possibly located can be determined through the target detection network, and the target object possibly existing in the first candidate area can be detected, so that the detected target object is more comprehensive.
For example, the target detection network may include a feature extraction subnetwork and a detection subnetwork. The target image is input into the target detection network, the image feature of the target image can be extracted by using the feature extraction sub-network, at least one first candidate region is determined according to the image feature of the target image, and the at least one first candidate region can be further converted into a first preset size. Further, the detection subnetwork may perform a plurality of pooling operations and full-link operations on the image features of the first candidate regions to obtain a first detection result of the target object in the target image. Here, the first detection result may include category information and position information of the target object.
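A rough PyTorch sketch of how such a detection branch could look is given below. It is an illustration under assumptions, not the patented implementation: the ROI feature size of 7 × 7 × 256 and the pooling plus fully connected structure follow the description above, while the hidden width of 1024, the number of classes, and the module names are invented for the example.

```python
import torch
import torch.nn as nn

class DetectionBranch(nn.Module):
    """Hypothetical detection head: ROI feature -> class scores + box coordinates."""
    def __init__(self, num_classes: int = 2, roi_channels: int = 256):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)                # pooling over the 7 x 7 spatial grid
        self.fc = nn.Sequential(
            nn.Linear(roi_channels, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, 1024), nn.ReLU(inplace=True),
        )
        self.cls_head = nn.Linear(1024, num_classes)       # category information
        self.box_head = nn.Linear(1024, 4)                 # position information (cx, cy, w, h)

    def forward(self, roi_feats: torch.Tensor):
        # roi_feats: (num_rois, 256, 7, 7) -- image features of the first candidate regions
        x = self.pool(roi_feats).flatten(1)
        x = self.fc(x)
        return self.cls_head(x), self.box_head(x)

# usage sketch with 3 candidate regions
scores, boxes = DetectionBranch()(torch.randn(3, 256, 7, 7))
```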
In some implementations, the detection subnetwork may also include a detection branch and a segmentation branch. Fig. 2 shows a block diagram of a target detection network according to an embodiment of the present disclosure. The detection branch may be the network structure for obtaining the first detection result, which is not described herein again. The segmentation branch (indicated by a dashed box) may perform a deconvolution operation and/or an upsampling operation on the image features of the target image to obtain a thermodynamic diagram of the target object. The thermodynamic diagram of the target object may indicate the image position of the target object in the target image; for example, a characteristic region with a characteristic value greater than 0 in the thermodynamic diagram of the target object may correspond to the image region where the target object is located, and the larger the characteristic value, the higher the probability that the characteristic position corresponds to the image position of the target object. The feature position where the feature value is maximum in the thermodynamic diagram of the target object may correspond to the center position of the target object. Since the first detection result output by the detection branch can be expressed in vector form and lacks the spatial information of the target object in the target image, introducing the segmentation branch can provide the spatial information of the target object in the target image through the thermodynamic diagram.
Here, the detection subnetwork of the target detection network may comprise only the detection branch described above, and in some implementations, the detection subnetwork of the target detection network may comprise both a detection branch and a segmentation branch.
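For illustration only (this helper is hypothetical and not described in the disclosure), the center position of a target object can be read off a thermodynamic diagram (heat map) by locating its peak response, as noted above:

```python
import torch

def heatmap_center(heatmap: torch.Tensor):
    """Return the (row, col) of the peak response in a single-object heat map of shape (H, W)."""
    w = heatmap.shape[1]
    row, col = divmod(int(torch.argmax(heatmap)), w)
    return row, col

center = heatmap_center(torch.rand(28, 28))  # hypothetical 28 x 28 heat map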
In the embodiment of the present disclosure, target detection may be performed on a target image by using a target detection network, where the target detection network may be trained by a neural network. The training process to obtain the target detection network is described below in one or more implementations.
In one or more implementations, the sample image may be processed through a neural network to obtain a second detection result and a thermodynamic diagram of the reference object in the sample image. The network loss of the neural network may be determined based on the second detection result of the reference object, the thermodynamic diagram indicating the predicted position of the reference object in the sample image, and the supervised information of the sample image. And adjusting network parameters of the neural network based on the network loss to obtain a trained neural network, wherein the trained neural network comprises a target detection network.
In this implementation, the sample images may be training samples for neural network training, e.g., sample images in a training set may be acquired. Each sample image has corresponding surveillance information, which may indicate a category of the reference object and position information in the sample image. Inputting the sample image into the constructed neural network, performing target detection on the reference object in the sample image by using the neural network, for example, extracting image features from the sample image, and then obtaining a second detection result and a thermodynamic diagram of the reference object in the sample image according to the image features of the sample image. The thermodynamic diagram may be used to indicate the predicted position of the reference object in the sample image, i.e., in addition to obtaining the second detection result of the reference object, a thermodynamic diagram indicating the predicted position of the reference object in the sample image may be obtained. And then, comparing the second detection result and the thermodynamic diagram obtained by the neural network with the supervision information of the sample image to obtain a comparison result. And determining the network loss of the neural network according to the comparison result, and adjusting the network parameters of the neural network by using the determined network loss to finally obtain the trained neural network. Here, the network parameter of the neural network may be a weight parameter. In the case of determining the network loss of the neural network, the network loss of the neural network may be calculated using some loss functions, for example, the network loss of the neural network may be determined using a loss function such as a logarithmic loss function, an L1 loss function, a cross entropy loss function, or the like.
Here, the output information of the neural network may include the second detection result and a thermodynamic diagram, and since the thermodynamic diagram may indicate the predicted position of the reference object in the sample image in an image manner, the trained neural network may retain spatial information of the sample image. The target detection network may include a part of the trained neural network, for example, a part of the trained neural network that obtains the second detection result, and since the part of the neural network that obtains the second detection result and the part of the neural network that obtains the thermodynamic diagram are trained together, the two parts of the network share the extracted image features, so that the target detection network obtained from the part of the trained neural network that obtains the second detection result can also retain the spatial information of the target image, so that the target detection network obtained from the part of the trained neural network has higher detection accuracy, that is, the detection accuracy of the target detection network can be improved by obtaining the target detection network through the training method.
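The following is a minimal sketch of one training iteration under the scheme described above, assuming a hypothetical model that returns class logits, box predictions, and a heat map (thermodynamic diagram); the particular loss functions (cross entropy, L1, binary cross entropy) and the weighting factor are example choices, not prescribed by the disclosure:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, sample_image, ref_label, mask_map, seg_weight=1.0):
    """One hypothetical training iteration: the network outputs a detection result and a
    heat map (thermodynamic diagram); both are supervised, and the combined loss is used
    to adjust the network parameters."""
    cls_logits, box_pred, heatmap = model(sample_image)   # second detection result + heat map

    # first loss: compare the second detection result with the reference label
    cls_loss = F.cross_entropy(cls_logits, ref_label["category"])
    box_loss = F.l1_loss(box_pred, ref_label["box"])

    # second loss: compare the heat map with the (radial) mask map
    seg_loss = F.binary_cross_entropy(heatmap, mask_map)

    loss = cls_loss + box_loss + seg_weight * seg_loss    # network loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss.detach())
```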
In one example of this implementation, the neural network described above includes a detection branch and a segmentation branch. Under the condition that the sample image is input into the constructed neural network to obtain a second detection result and a thermodynamic diagram of the reference object in the sample image, the neural network can be used for carrying out feature extraction on the sample image to obtain at least one second candidate region, and then the at least one second candidate region is input into the detection branch to obtain a second detection result of the reference object in the sample image. And inputting at least one second candidate region into the segmentation branch to obtain the thermodynamic diagram of the reference object.
In this example, the neural network may include a feature extraction subnetwork and a detection subnetwork, where the detection subnetwork may include a detection branch and a segmentation branch. The image features of the sample image may be extracted by using the feature extraction sub-network of the constructed neural network, for example, the image features of the sample image may be extracted by performing a convolution operation, a pooling operation, a feature fusion operation, and the like on the sample image by using the feature extraction sub-network. At least one second candidate region may be selected further based on image characteristics of the sample image. The second candidate regions may be feature regions where the reference object may be located, and for example, one or more second candidate regions where the reference object may be located may be randomly selected in the feature map of the image feature of the extracted sample image. Further, at least one second candidate region may be input to the detection branch, and a second detection result of the reference object in the sample image may be obtained from the detection branch, and the second detection result may include category information and position information, and the at least one second candidate region may be input to the segmentation branch, and a thermodynamic diagram of the reference object may be obtained from the segmentation branch.
Here, since the output information of the neural network includes the second detection result and the thermodynamic diagram, the output of the second detection result and the thermodynamic diagram may be different network branches. Since the thermodynamic diagram can graphically indicate the predicted position of the reference object in the sample image, the trained neural network can retain spatial information of the sample image. The target detection network may include a partially trained neural network, and since the second detection result and the thermodynamic diagram may both indicate an image position of the reference object in the sample image, where the second detection result indicates the image position of the reference object in the sample image in a numerical manner, and the thermodynamic diagram indicates the image position of the reference object in the sample image in an image manner, a network branch outputting the second detection result or the thermodynamic diagram may be selected to generate the target detection network, for example, a detection branch of the trained neural network may be selected as the target detection network. Even if the target detection network is not completely the same as the trained neural network, since the outputs of the two network branches are obtained based on the same image features (the image features extracted by the feature extraction sub-network), one network branch can improve the detection accuracy of the other network branch, and the target detection network obtained from one network branch also has higher detection accuracy, that is, the detection accuracy of the target detection network can be improved by obtaining the target detection network in this way.
In an example of this implementation, the segmentation branch may be used to perform at least one deconvolution operation and/or at least one upsampling operation on the first image feature of the at least one second candidate region to obtain a second image feature of the at least one second candidate region, and then the normalization operation may be performed on the second image feature to obtain a thermodynamic diagram of the reference object.
In this example, the segmentation branch may scale the feature size of the at least one second candidate region (i.e., the feature size of the extracted image feature of the second candidate region) to a second preset size, which may be different from or the same as the first preset size. For example, the feature size of the second candidate region is scaled to 14 × 14 × 256 to obtain the first image feature of the second candidate region. Then, for the first image features of the plurality of second candidate regions, at least one deconvolution operation and/or at least one upsampling operation may be performed on the first image feature of each second candidate region to further increase the feature size, so as to obtain the second image feature of each second candidate region. Further, performing a normalization operation on the second image feature of the second candidate region, for example a softmax operation, may yield the thermodynamic diagram corresponding to the second candidate region. The thermodynamic diagram can thus be obtained from the second image feature of the second candidate region; since the second image feature has a larger feature size than the first image feature, it can better reflect the position of the reference object in the sample image, and the thermodynamic diagram can directly present the position corresponding to the reference object in an image manner.
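A minimal PyTorch sketch of such a segmentation branch is shown below, assuming a single-channel output heat map and reading the normalization step as a softmax over spatial positions; the layer choices and channel counts are assumptions for illustration:

```python
import torch
import torch.nn as nn

class SegmentationBranch(nn.Module):
    """Hypothetical segmentation branch: ROI feature (14x14x256) -> upsampled feature -> heat map."""
    def __init__(self, in_channels: int = 256):
        super().__init__()
        # a transposed convolution (deconvolution) doubles the spatial size: 14x14 -> 28x28
        self.deconv = nn.ConvTranspose2d(in_channels, 256, kernel_size=2, stride=2)
        self.relu = nn.ReLU(inplace=True)
        self.out = nn.Conv2d(256, 1, kernel_size=1)        # one response map per region (assumption)

    def forward(self, roi_feats: torch.Tensor) -> torch.Tensor:
        # roi_feats: (num_rois, 256, 14, 14) -- first image features of the second candidate regions
        x = self.relu(self.deconv(roi_feats))              # second image features, now 28 x 28
        logits = self.out(x)                               # (num_rois, 1, 28, 28)
        n, c, h, w = logits.shape
        heatmap = torch.softmax(logits.view(n, c, h * w), dim=-1)  # normalization over spatial positions
        return heatmap.view(n, c, h, w)

heatmaps = SegmentationBranch()(torch.randn(3, 256, 14, 14))
```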
In an example of this implementation, the above-mentioned supervision information may include a reference label indicating category information and position information of a reference object in the sample image, in addition to the mask map. Here, the reference label and the mask map may be obtained according to the actual type of the reference object and the actual position of the reference object in the sample image, and may be considered to be accurate, so that the reference label and the mask map may be used as the supervision information of the neural network to better supervise the neural network, thereby improving the detection accuracy of the neural network.
Under the condition that the network loss of the neural network is determined, the output of the detection branch and the output of the segmentation branch can be respectively compared with corresponding supervision information, namely, a second detection result obtained by the detection branch is compared with a reference label of the label information to obtain a comparison result of the second detection result and the reference label, and a thermodynamic diagram obtained by the segmentation branch is compared with a mask diagram of the supervision information to obtain a comparison result of the thermodynamic diagram and the mask diagram. Here, the second detection result may include category information and position information of the reference object, and when the second detection result is compared with the reference label of the label information, the category information of the second detection result may be compared with the category information of the label information, and the position information of the second detection result may be compared with the position information of the label information. Then, the comparison result of the category information of the second detection result and the category information of the label information and the comparison result of the position information of the second detection result and the position information of the label information may be integrated, for example, the two comparison results may be added, so that the first loss of the neural network may be determined. Accordingly, a second loss of the neural network may be determined from the comparison of the thermodynamic diagram to the mask diagram. The first loss and the second loss may then be added, or the first loss and the second loss may be added proportionally, and the network loss of the neural network may be obtained. Here, the comparison result of the category information of the second detection result and the category information of the label information, the comparison result of the position information of the second detection result and the position information of the label information, and the comparison result of the thermodynamic diagram and the mask diagram may be determined by a certain calculation method, for example, by using a specific loss function. The loss function used to determine the first loss of the neural network may be different from the loss function used to determine the second loss of the neural network.
In this example, the reference label and the mask map included in the supervision information may be used to supervise the second detection result and the thermodynamic diagram output by the neural network, and the mask map may indicate the position of the reference object in the sample image in an image manner, so that the network loss obtained through the mask map and the reference label is more beneficial to training the neural network, and the detection accuracy of the neural network may be improved.
In the embodiment of the disclosure, the second loss of the neural network may be determined according to the comparison result of the thermodynamic diagram and the mask diagram, so that the thermodynamic diagram output by the neural network may be constrained by using the mask diagram. In some implementations, in the case of determining the second loss of the neural network, the comparison result of the thermodynamic diagram and the mask diagram at a plurality of pixel positions may be weighted and summed according to the comparison result of the thermodynamic diagram and the mask diagram at each pixel position, so as to determine the second loss of the neural network, wherein the weight corresponding to the comparison result of one pixel position is related to the relative position relationship of the pixel position with respect to the central pixel position.
In the embodiment of the present disclosure, the thermodynamic diagram and the mask diagram may have the same image size, so that the characteristic value of the thermodynamic diagram and the characteristic value of the mask diagram may be compared at each pixel position to obtain a comparison result for each pixel position. The characteristic value of the thermodynamic diagram and the mask diagram at each pixel position may be a pixel value, a gray value, or the like. After the comparison result for each pixel position is obtained, the weight corresponding to the comparison result of each pixel position can be determined. Here, the weight corresponding to the comparison result of one pixel position may be determined according to the relative positional relationship of the pixel position with respect to the center pixel position; for example, it may be determined according to the distance between the pixel position and the center pixel position, with the weight becoming smaller as the distance becomes larger, that is, the weight corresponding to the comparison result of one pixel position may be inversely proportional to the distance between the pixel position and the center pixel position. The center pixel position may be the center position where the reference object is located, so that a maximum weight may be set for the comparison result of the center pixel position. The second loss is obtained by performing a weighted summation of the comparison results of the thermodynamic diagram and the mask diagram at a plurality of pixel positions, so that the obtained target detection network can retain more spatial structure information, reducing missed detections and inaccurate detection of strip-shaped (elongated) targets.
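The following sketch illustrates one possible reading of this weighted comparison, assuming the center pixel position is taken as the peak of the mask map and the weights decay with distance from it; the per-pixel comparison here uses binary cross entropy, which is only one possible choice:

```python
import torch
import torch.nn.functional as F

def weighted_second_loss(heatmap: torch.Tensor, mask_map: torch.Tensor) -> torch.Tensor:
    """Hypothetical weighted per-pixel comparison of a heat map and a mask map:
    pixels closer to the mask center contribute more to the second loss."""
    h, w = mask_map.shape
    cy, cx = divmod(int(torch.argmax(mask_map)), w)       # assumed center pixel position (mask peak)

    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    dist = torch.sqrt((ys - cy).float() ** 2 + (xs - cx).float() ** 2)
    weights = 1.0 / (1.0 + dist)                          # larger distance -> smaller weight

    per_pixel = F.binary_cross_entropy(heatmap, mask_map, reduction="none")
    return (weights * per_pixel).sum() / weights.sum()

# dummy tensors purely for illustration
loss = weighted_second_loss(torch.rand(28, 28), torch.rand(28, 28))
```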
In the embodiment of the disclosure, the neural network may be trained by using a mask diagram indicating the position of the reference object in the sample image, so that the obtained target detection network may retain the spatial structure information of the target image, and improve the accuracy of target detection. The process of obtaining the mask map is described below by one implementation.
In one possible implementation manner, the sample image labeled with the reference region may be obtained as a labeled image, then the reference region is set as a first feature value, and a non-reference region except the reference region in the labeled image is set as a second feature value, so as to obtain the mask map. And determining the first characteristic value of each pixel position in the reference area according to the relative position relation between each pixel position and the central pixel position.
In this embodiment, the sample image to which the reference region is added may be used as the added image of the sample image, and the reference region may be an image region in which the reference object is located. Here, the reference region may be labeled by a preset figure, for example, the reference region may be labeled by a preset figure such as a circle or an ellipse. Compared with the outline of the reference object as the label, the preset graph has lower fineness, so that the workload of labeling can be reduced by taking the preset graph as the label. Further, the mask map may be obtained by setting each pixel position in the reference region of the annotation image as a first feature value and setting each pixel position in the non-reference region of the annotation image other than the reference region as a second feature value. The first characteristic value is different from the second characteristic value, so that the reference area where the reference object is located and the non-reference area outside the reference area can be distinguished through the characteristic value corresponding to each pixel position in the mask image. Here, the second characteristic value may be a fixed value, that is, the characteristic values of the non-reference areas may coincide. The first characteristic value for each pixel position in the reference area may be determined by the relative positional relationship of each pixel position to the central pixel position, i.e. the characteristic values may be different for different pixel positions within the reference area. Different first characteristic values are set for different pixel positions in the reference region, the center of the reference object can be indicated, the neural network is supervised through the mask diagram on the basis of reducing the labeling workload, and the detection precision of the obtained target detection network can be improved.
In one example of this implementation, the first feature value of each pixel position in the reference region may be determined according to a distance between each pixel position in the reference region and the center pixel position, for example, a cosine distance or a euclidean distance between one pixel position and the center pixel position may be calculated, and the first feature value of the pixel position may be determined according to the cosine distance or the euclidean distance between the pixel position and the center pixel position. The first characteristic value of each pixel position is in negative correlation with the distance, that is, the smaller the distance of a pixel position from the central pixel position is, the larger the first characteristic value of the pixel position can be, and the larger the distance of a pixel position from the central pixel position is, the smaller the first characteristic value of the pixel position can be. For example, if one pixel position is a center pixel position, the first feature value of the pixel position may be set to 1, and if one pixel position is an edge pixel position of the reference region, the first feature value of the pixel position may be set to 0. The first characteristic value of each pixel position in the reference area can be determined through the distance between each pixel position and the central pixel position, so that the reference area of the mask image can be changed radially, the network precision of the target detection network can be improved, and the target detection network has good detection effect aiming at target detection tasks with target objects being fuzzy or close to the image edge.
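A minimal sketch of generating such a radial mask map from a circular annotation is given below (a hypothetical helper; the linear decay from 1 at the center pixel to 0 at the region edge is one way to realize the negative correlation described above):

```python
import numpy as np

def radial_mask(height: int, width: int, center, radius: float) -> np.ndarray:
    """Hypothetical radial mask map: inside the annotated (circular) reference region the
    first feature value decays linearly from 1 at the center pixel to 0 at the region edge;
    the non-reference region keeps a fixed second feature value of 0."""
    cy, cx = center
    ys, xs = np.mgrid[0:height, 0:width]
    dist = np.sqrt((ys - cy) ** 2 + (xs - cx) ** 2)

    mask = np.zeros((height, width), dtype=np.float32)    # second feature value (background)
    inside = dist <= radius
    mask[inside] = 1.0 - dist[inside] / radius            # first feature value, decreasing with distance
    return mask

mask_map = radial_mask(128, 128, center=(64, 70), radius=20.0)
```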
In the embodiment of the present disclosure, the neural network may be supervised based on the mask map, so as to obtain the trained neural network. The following describes a training process of a neural network by an example. Figure 3 illustrates a block diagram of a neural network training process in accordance with an embodiment of the present disclosure.
In one example, the training process is illustrated with the sample image being an image frame in an endoscopic video, and the constructed neural network being used for lesion detection in the sample image. The constructed neural network may include a feature extraction sub-network and a detection sub-network, wherein the detection sub-network includes a detection branch and a segmentation branch. A sample image is input into the constructed neural network: first, the image features of the sample image are extracted by the feature extraction sub-network, and at least one second candidate region is then determined according to the image features of the sample image. Further, the at least one second candidate region is input into the detection branch, the feature size of the at least one second candidate region is converted into 7 × 7 × 256 by the detection branch, and a plurality of pooling operations and full connection operations are then performed on the image features of the second candidate region by the detection branch to obtain a second detection result. Accordingly, the at least one second candidate region may be input into the segmentation branch, the feature size of the at least one second candidate region may be converted into 14 × 14 × 256 by the segmentation branch, the deconvolution operation and the upsampling operation are then performed on the image feature of the second candidate region by the segmentation branch, enlarging the feature size of the second candidate region from 14 × 14 × 256 to 28 × 28 × 256, and the thermodynamic diagram corresponding to the second candidate region may then be obtained through the softmax operation. The neural network may then be supervised using the supervision information of the sample image. The supervision process of the detection branch may be: calculating the first loss from the second detection result and the reference label, and supervising the detection branch with the first loss. The supervision process of the segmentation branch may be: calculating the cross-entropy loss (the second loss) between the thermodynamic diagram and the radial mask map, and supervising the segmentation branch with the cross-entropy loss.
The target detection network may comprise the network structures of the neural network described above other than the segmentation branch. Since the detection branch and the segmentation branch of the neural network share the image features of the sample image (the image features output by the feature extraction sub-network), the segmentation branch helps retain the structural information of the feature space, which can improve the detection accuracy of the detection branch. The segmentation branch is supervised with the radial mask map; compared with a complete mask that finely annotates the contour of the object, this can greatly reduce the manual annotation cost.
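As a hedged illustration of this deployment detail, the sketch below reuses the hypothetical `DetectionSubNetwork` above and assumes `backbone` and `propose_regions` callables standing in for the feature extraction sub-network and the candidate-region stage; at inference time only the detection branch is evaluated, so the segmentation branch adds no runtime cost.

```python
import torch

@torch.no_grad()
def detect(backbone, propose_regions, head, target_image):
    """Run the deployed target detection network on a target image (illustrative only).
    `backbone`, `propose_regions` and the 7 x 7 x 256 region features are assumptions;
    the segmentation branch of `head` is never called here."""
    features = backbone(target_image)                      # shared image features
    boxes, det_feat = propose_regions(features)            # first candidate regions + region features
    x = head.det_fc(det_feat)                              # detection branch only
    category_scores = head.cls_head(x).softmax(dim=-1)     # category information
    box_refinement = head.box_head(x)                      # position information
    return boxes, category_scores, box_refinement
```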
It can be understood that the above-mentioned method embodiments of the present disclosure may be combined with one another to form combined embodiments without departing from the principles and logic thereof; for reasons of space, the details are not repeated in the present disclosure.
In addition, the present disclosure also provides an apparatus, an electronic device, a computer-readable storage medium, and a program, all of which can be used to implement any one of the target detection methods provided by the present disclosure; for the corresponding technical solutions and descriptions, reference may be made to the corresponding descriptions in the method sections, which are not repeated here.
It will be understood by those skilled in the art that, in the methods of the present disclosure, the order in which the steps are written does not imply a strict order of execution or impose any limitation on the implementation; the specific order of execution of the steps should be determined by their functions and possible internal logic.
Fig. 4 shows a block diagram of a target detection apparatus according to an embodiment of the present disclosure; as shown in Fig. 4, the apparatus includes:
an acquisition module 31 for acquiring a target image;
the detection module 32 is configured to detect the target image by using a target detection network, so as to obtain a first detection result of a target object in the target image;
the target detection network is obtained based on a sample image and supervision information of the sample image, and the supervision information of the sample image comprises a mask map.
In one or more possible implementations, the detection module 32 is configured to: determine at least one first candidate region based on the target detection network; and obtain a first detection result comprising category information and position information of the target object based on image features of the at least one first candidate region.
In one or more possible implementations, the apparatus further includes: a training module, which is used for processing the sample image through a neural network to obtain a second detection result and a thermodynamic diagram of a reference object in the sample image; determining a network loss of the neural network based on the second detection result, the thermodynamic diagram and supervision information of the sample image, wherein the thermodynamic diagram is used for indicating a predicted position of the reference object in the sample image; and adjusting network parameters of the neural network based on the network loss to obtain a trained neural network, wherein the trained neural network comprises the target detection network.
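Purely as an illustration of how such a training module might combine these quantities (and not the disclosed procedure), the sketch below assumes that `model` returns the second detection result and the thermodynamic diagram for a sample image, that the radial mask map has already been cropped and resized to the heatmap resolution, and that the two losses are combined with an assumed weight `alpha`.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, sample_image, reference_label, mask_map, alpha=1.0):
    """One illustrative training step, not the disclosed procedure. `reference_label`
    is assumed to be a dict with integer class targets and box targets for the
    candidate regions; `mask_map` is assumed to match the heatmap resolution."""
    cls_logits, box_pred, heatmap = model(sample_image)
    # First loss: comparison of the second detection result with the reference label.
    first_loss = F.cross_entropy(cls_logits, reference_label["classes"]) \
               + F.smooth_l1_loss(box_pred, reference_label["boxes"])
    # Second loss: cross entropy between the thermodynamic diagram and the radial mask map.
    second_loss = F.binary_cross_entropy(heatmap, mask_map)
    # Network loss obtained from the first loss and the second loss.
    network_loss = first_loss + alpha * second_loss
    optimizer.zero_grad()
    network_loss.backward()   # adjust the network parameters based on the network loss
    optimizer.step()
    return float(network_loss)
```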
In one or more possible implementations, the neural network includes a detection branch and a segmentation branch; the training module is used for extracting the characteristics of the sample image by using the neural network to obtain at least one second candidate region; inputting the at least one second candidate region into the detection branch to obtain a second detection result of the reference object in the sample image; and inputting the at least one second candidate region into the segmentation branch to obtain the thermodynamic diagram of the reference object.
In one or more possible implementations, the training module is configured to perform at least one deconvolution operation and/or at least one upsampling operation on the first image feature of the at least one second candidate region by using the segmentation branch to obtain a second image feature of the at least one second candidate region; and carrying out normalization operation on the second image characteristics to obtain a thermodynamic diagram of the reference object.
In one or more possible implementations, the supervisory information further includes a reference label indicating category information and location information of a reference object in the sample image; the training module is used for determining a first loss of the neural network according to a comparison result of the second detection result and the reference label; determining a second loss of the neural network according to the comparison result of the thermodynamic diagram and the mask diagram; and obtaining the network loss of the neural network according to the first loss and the second loss.
In one or more possible implementation manners, the training module is configured to perform weighted summation on the comparison results of the thermodynamic diagram and the mask diagram at multiple pixel positions according to the comparison result of the thermodynamic diagram and the mask diagram at each pixel position, and determine the second loss of the neural network, where a weight corresponding to the comparison result of one pixel position is related to a relative position relationship of the pixel position with respect to a central pixel position.
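A hedged sketch of one way such position-dependent weighting could be realized is shown below; the Gaussian weighting and the value of `sigma` are assumptions, since the disclosure only states that the weight is related to the relative position of a pixel with respect to the central pixel position.

```python
import torch

def weighted_second_loss(heatmap, mask_map, center, sigma=0.5, eps=1e-6):
    """Weighted per-pixel comparison of the thermodynamic diagram and the mask map
    (illustrative only): pixels closer to the central pixel position get larger weights."""
    h, w = heatmap.shape[-2:]
    ys = torch.arange(h, dtype=torch.float32).view(-1, 1).expand(h, w)
    xs = torch.arange(w, dtype=torch.float32).view(1, -1).expand(h, w)
    cy, cx = center
    # Assumed weighting scheme: Gaussian of the normalized distance to the center pixel.
    dist2 = ((ys - cy) / h) ** 2 + ((xs - cx) / w) ** 2
    weight = torch.exp(-dist2 / (2 * sigma ** 2))
    # Per-pixel cross entropy between the predicted heatmap and the (soft) mask values.
    ce = -(mask_map * torch.log(heatmap + eps) + (1 - mask_map) * torch.log(1 - heatmap + eps))
    return (weight * ce).sum() / (weight.sum() + eps)
```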
In one or more possible implementation manners, the training module is further configured to obtain a sample image labeled with the reference region as a labeled image; and setting the reference region as a first characteristic value, and setting a non-reference region except the reference region in the annotation image as a second characteristic value to obtain the mask image, wherein the first characteristic value of each pixel position in the reference region is determined by the relative position relationship between each pixel position and the central pixel position.
In one or more possible implementations, the training module is further configured to determine a first feature value of each pixel position in the reference region according to a distance between each pixel position in the reference region and the center pixel position, where the first feature value of each pixel position is negatively correlated with the distance.
In some embodiments, the functions of, or the modules included in, the apparatus provided in the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments; for their specific implementation, reference may be made to the descriptions of the above method embodiments, which are not repeated here for brevity.
Fig. 5 is a block diagram illustrating an apparatus 800 for target detection in accordance with an example embodiment. For example, the apparatus 800 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 5, the apparatus 800 may include one or more of the following components: processing component 802, memory 804, power component 806, multimedia component 808, audio component 810, input/output (I/O) interface 812, sensor component 814, and communication component 816.
The processing component 802 generally controls overall operation of the device 800, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 802 may include one or more processors 820 to execute instructions to perform all or a portion of the steps of the methods described above. Further, the processing component 802 can include one or more modules that facilitate interaction between the processing component 802 and other components. For example, the processing component 802 can include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operations at the apparatus 800. Examples of such data include instructions for any application or method operating on device 800, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 804 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 806 provide power to the various components of device 800. The power components 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen that provides an output interface between the device 800 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 808 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 800 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a Microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, audio component 810 also includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 814 includes one or more sensors for providing various aspects of state assessment for the device 800. For example, the sensor assembly 814 may detect the open/closed status of the device 800 and the relative positioning of components, such as the display and keypad of the device 800; the sensor assembly 814 may also detect a change in the position of the device 800 or of a component of the device 800, the presence or absence of user contact with the device 800, the orientation or acceleration/deceleration of the device 800, and a change in the temperature of the device 800. The sensor assembly 814 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate communications between the apparatus 800 and other devices in a wired or wireless manner. The device 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 816 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium, such as the memory 804, is also provided that includes computer program instructions executable by the processor 820 of the device 800 to perform the above-described methods.
An embodiment of the present disclosure further provides an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein the processor is configured as the above method.
The electronic device may be provided as a terminal, server, or other form of device.
Fig. 6 is a block diagram illustrating an electronic device 1900 according to an example embodiment. For example, the electronic device 1900 may be provided as a server. Referring to fig. 6, the electronic device 1900 includes a processing component 1922, which further includes one or more processors, and memory resources represented by a memory 1932 for storing instructions, such as application programs, executable by the processing component 1922. The application programs stored in the memory 1932 may include one or more modules each corresponding to a set of instructions. Further, the processing component 1922 is configured to execute the instructions to perform the above-described methods.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in the memory 1932, such as the Microsoft server operating system (Windows Server™), the graphical-user-interface-based operating system of Apple Inc. (Mac OS X™), the multi-user multi-process computer operating system (Unix™), the free and open source Unix-like operating system (Linux™), the open source Unix-like operating system (FreeBSD™), or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium, such as the memory 1932, is also provided that includes computer program instructions executable by the processing component 1922 of the electronic device 1900 to perform the above-described methods.
The present disclosure may be systems, methods, and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for causing a processor to implement various aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), may be personalized by utilizing state information of the computer-readable program instructions, and the electronic circuitry may execute the computer-readable program instructions, thereby implementing aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium, and in another alternative embodiment, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present disclosure, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (12)

1. A target detection method, comprising:
acquiring a target image;
detecting the target image by using a target detection network to obtain a first detection result of a target object in the target image;
the target detection network is obtained based on a sample image and supervision information of the sample image, and the supervision information of the sample image comprises a mask map.
2. The method according to claim 1, wherein the detecting the target image by using a target detection network to obtain a first detection result of a target object in the target image comprises:
determining at least one first candidate region based on the target detection network;
and obtaining a first detection result comprising the category information and the position information of the target object based on the image characteristics of the at least one first candidate region.
3. The method according to claim 1 or 2, characterized in that the method further comprises:
processing the sample image through a neural network to obtain a second detection result and a thermodynamic diagram of a reference object in the sample image;
determining a network loss of the neural network based on the second detection result, the thermodynamic diagram and supervision information of the sample image, wherein the thermodynamic diagram is used for indicating a predicted position of the reference object in the sample image;
and adjusting network parameters of the neural network based on the network loss to obtain a trained neural network, wherein the trained neural network comprises the target detection network.
4. The method of claim 3, wherein the neural network comprises a detection branch and a segmentation branch; the processing the sample image through a neural network to obtain a second detection result and a thermodynamic diagram of the reference object in the sample image includes:
performing feature extraction on the sample image by using the neural network to obtain at least one second candidate region;
inputting the at least one second candidate region into the detection branch to obtain a second detection result of the reference object in the sample image;
and inputting the at least one second candidate region into the segmentation branch to obtain the thermodynamic diagram of the reference object.
5. The method of claim 4, wherein inputting the at least one second candidate region into the segmentation branch to obtain a thermodynamic diagram of the reference object comprises:
performing at least one deconvolution operation and/or at least one upsampling operation on the first image feature of the at least one second candidate region by using the segmentation branch to obtain a second image feature of the at least one second candidate region;
and carrying out normalization operation on the second image characteristics to obtain a thermodynamic diagram of the reference object.
6. The method according to any one of claims 3 to 5, wherein the supervision information further includes a reference label indicating category information and position information of a reference object in the sample image; the determining a network loss of the neural network based on the second detection result, the thermodynamic diagram, and the supervision information comprises:
determining a first loss of the neural network according to a comparison result of the second detection result and the reference label;
determining a second loss of the neural network according to a comparison result of the thermodynamic diagram and the mask map;
and obtaining the network loss of the neural network according to the first loss and the second loss.
7. The method of claim 6, wherein the determining the second loss of the neural network according to the comparison result of the thermodynamic diagram and the mask map comprises:
performing, according to the comparison result of the thermodynamic diagram and the mask map at each pixel position, weighted summation on the comparison results of the thermodynamic diagram and the mask map at a plurality of pixel positions to determine the second loss of the neural network, wherein a weight corresponding to the comparison result of one pixel position is related to the relative position relationship of the pixel position with respect to a central pixel position.
8. The method according to any one of claims 1 to 7, further comprising:
acquiring a sample image annotated with the reference region as an annotated image;
and setting the reference region to a first characteristic value, and setting a non-reference region other than the reference region in the annotated image to a second characteristic value, to obtain the mask map, wherein the first characteristic value of each pixel position in the reference region is determined by the relative position relationship between that pixel position and a central pixel position.
9. The method of claim 8, further comprising:
and determining a first characteristic value of each pixel position in the reference area according to the distance between each pixel position in the reference area and the central pixel position, wherein the first characteristic value of each pixel position is in negative correlation with the distance.
10. A target detection device, comprising:
the acquisition module is used for acquiring a target image;
the detection module is used for detecting the target image by using a target detection network to obtain a first detection result of a target object in the target image;
the target detection network is obtained based on a sample image and supervision information of the sample image, and the supervision information of the sample image comprises a mask map.
11. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the memory-stored instructions to perform the method of any of claims 1 to 9.
12. A computer readable storage medium having computer program instructions stored thereon, which when executed by a processor implement the method of any one of claims 1 to 9.
CN202110448199.4A 2021-04-25 2021-04-25 Target detection method and device, electronic equipment and storage medium Withdrawn CN113139471A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110448199.4A CN113139471A (en) 2021-04-25 2021-04-25 Target detection method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113139471A true CN113139471A (en) 2021-07-20

Family

ID=76812506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110448199.4A Withdrawn CN113139471A (en) 2021-04-25 2021-04-25 Target detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113139471A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214349A (en) * 2018-09-20 2019-01-15 天津大学 A kind of object detecting method based on semantic segmentation enhancing
CN110633632A (en) * 2019-08-06 2019-12-31 厦门大学 Weak supervision combined target detection and semantic segmentation method based on loop guidance
CN111368923A (en) * 2020-03-05 2020-07-03 上海商汤智能科技有限公司 Neural network training method and device, electronic equipment and storage medium

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743254A (en) * 2021-08-18 2021-12-03 北京格灵深瞳信息技术股份有限公司 Sight estimation method, sight estimation device, electronic equipment and storage medium
CN113743254B (en) * 2021-08-18 2024-04-09 北京格灵深瞳信息技术股份有限公司 Sight estimation method, device, electronic equipment and storage medium
CN113920538A (en) * 2021-10-20 2022-01-11 北京多维视通技术有限公司 Object detection method, device, equipment, storage medium and computer program product
CN115496911A (en) * 2022-11-14 2022-12-20 腾讯科技(深圳)有限公司 Target point detection method, device, equipment and storage medium
CN115496911B (en) * 2022-11-14 2023-03-24 腾讯科技(深圳)有限公司 Target point detection method, device, equipment and storage medium
CN115880672A (en) * 2023-02-08 2023-03-31 中国第一汽车股份有限公司 Target detection method, device, storage medium and equipment
CN115880672B (en) * 2023-02-08 2023-06-02 中国第一汽车股份有限公司 Target detection method, device, storage medium and equipment

Similar Documents

Publication Publication Date Title
CN110647834B (en) Human face and human hand correlation detection method and device, electronic equipment and storage medium
CN109670397B (en) Method and device for detecting key points of human skeleton, electronic equipment and storage medium
CN110688951B (en) Image processing method and device, electronic equipment and storage medium
CN110674719B (en) Target object matching method and device, electronic equipment and storage medium
CN109522910B (en) Key point detection method and device, electronic equipment and storage medium
CN113139471A (en) Target detection method and device, electronic equipment and storage medium
CN113538517A (en) Target tracking method and device, electronic equipment and storage medium
CN112465843A (en) Image segmentation method and device, electronic equipment and storage medium
CN114078118A (en) Defect detection method and device, electronic equipment and storage medium
CN111340048B (en) Image processing method and device, electronic equipment and storage medium
CN110532956B (en) Image processing method and device, electronic equipment and storage medium
CN112016443B (en) Method and device for identifying same lines, electronic equipment and storage medium
CN112184787A (en) Image registration method and device, electronic equipment and storage medium
CN114187498A (en) Occlusion detection method and device, electronic equipment and storage medium
CN110633715B (en) Image processing method, network training method and device and electronic equipment
CN111435422B (en) Action recognition method, control method and device, electronic equipment and storage medium
CN113052874B (en) Target tracking method and device, electronic equipment and storage medium
CN114066856A (en) Model training method and device, electronic equipment and storage medium
CN110929545A (en) Human face image sorting method and device
CN111507131A (en) Living body detection method and apparatus, electronic device, and storage medium
CN114519794A (en) Feature point matching method and device, electronic equipment and storage medium
CN111784772B (en) Attitude estimation model training method and device based on domain randomization
CN114565962A (en) Face image processing method and device, electronic equipment and storage medium
CN114329015A (en) Image processing method and device, electronic equipment and storage medium
CN114549983A (en) Computer vision model training method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210720

WW01 Invention patent application withdrawn after publication