CN113240638B - Target detection method, device and medium based on deep learning - Google Patents

Target detection method, device and medium based on deep learning

Info

Publication number
CN113240638B
CN113240638B (application CN202110518366.8A)
Authority
CN
China
Prior art keywords
target
pixel point
image
detection
detection frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110518366.8A
Other languages
Chinese (zh)
Other versions
CN113240638A (en)
Inventor
曲国祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai United Imaging Intelligent Healthcare Co Ltd
Original Assignee
Shanghai United Imaging Intelligent Healthcare Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai United Imaging Intelligent Healthcare Co Ltd filed Critical Shanghai United Imaging Intelligent Healthcare Co Ltd
Priority to CN202110518366.8A priority Critical patent/CN113240638B/en
Publication of CN113240638A publication Critical patent/CN113240638A/en
Application granted granted Critical
Publication of CN113240638B publication Critical patent/CN113240638B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • Radiology & Medical Imaging (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a target detection method, device and medium based on deep learning. The method comprises the following steps: acquiring an image to be detected; processing the image to be detected through a first detection network to obtain a target first detection frame; extracting a feature map of the image to be detected; cutting the feature map of the image to be detected to obtain a target clipping image containing the target first detection frame; inputting the target clipping image into a second detection network to obtain the centrality of each pixel point in the target clipping image and the offset between each target pixel point and each boundary of the target second detection frame to which it belongs, wherein the centrality is the probability that the corresponding pixel point is the center point of a second detection frame, and a target pixel point is a pixel point whose centrality is greater than a preset centrality threshold; and determining each target second detection frame according to the offsets between each target pixel point and the boundaries of its corresponding target second detection frame. The application can solve the problem that adjacent objects in an image are difficult to distinguish accurately.

Description

Target detection method, device and medium based on deep learning
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method, apparatus, and medium for detecting a target based on deep learning.
Background
Image-based computer-aided diagnosis mainly uses machine learning or deep learning to automatically detect and identify a target structure or lesion. Conventional detection techniques generally locate and detect the target structure or lesion directly with methods such as ellipse fitting, Fast R-CNN or RetinaNet. However, because the human body is structurally complex, the imaging manifestations of different diseases are diverse and can be influenced by other diseases and by anatomical structures, so several detection targets lying close to each other are often detected as a single object, causing false detections. This distorts the count of target objects and introduces large errors in the estimate of their size, which in turn affects the physician's assessment of the patient's condition.
Taking lymph node detection as an example, the document [Liu Fang, Qu Qiuyi, Li Lingling, et al., A gastric CT image lymph node detection system and method based on shape and ellipse fitting, 2013] discloses an automatic lymph node detection system consisting of preprocessing, boundary-point-of-interest detection, boundary ellipse fitting, region merging, and lymph node tracking and extraction modules. The preprocessing module preprocesses the image to be detected; the boundary-point-of-interest detection module further processes the preprocessed image to obtain boundary points of interest; the boundary ellipse fitting module performs ellipse fitting on the curve formed by the boundary points of interest to obtain ellipse-like closed regions; the region merging module eliminates ambiguous regions formed by intersecting ellipses; and the lymph node tracking and extraction module performs window-feature matching and tracking on suspected lymph nodes to complete lymph node extraction. Although this scheme can extract lymph nodes, its generalization is poor and its accuracy on lymph nodes with complex shapes is low.
The document [Cao Hanjiang, Xu Guoping, A lymph node detection method based on an improved SegNet segmentation network, 2019] discloses a deep learning-based lymph node detection method: a SegNet segmentation network is first constructed using dilated (hole) convolutions, the network is then trained on a training set and optimized with a sine-cosine cross-entropy loss as the objective function, and the optimized SegNet segmentation network is used to identify and segment lymph nodes. Although this method can segment and detect lymph nodes, it tends to detect several closely spaced lymph nodes as one, leading to detection errors.
Disclosure of Invention
In order to solve the problem that adjacent objects in an image are difficult to accurately distinguish in the prior art, the application provides a target detection method, device and medium based on deep learning.
In order to achieve the above object, the present application provides a target detection method based on deep learning, including:
acquiring an image to be detected;
processing the image to be detected through a first detection network trained in advance to obtain a target first detection frame;
extracting a feature map of the image to be detected;
cutting the feature map of the image to be detected to obtain a target cutting image containing the target first detection frame;
inputting the target clipping image into a pre-trained second detection network to obtain the centrality of each pixel point in the target clipping image and the offset between each target pixel point and each boundary of the target second detection frame, wherein the centrality is the probability that the corresponding pixel point is the central point of the second detection frame, and the target pixel point is the pixel point with the centrality larger than a preset centrality threshold value;
and determining each target second detection frame according to the offset between each target pixel point and each boundary of the corresponding target second detection frame.
In a preferred embodiment of the present application, the processing, by the first detection network trained in advance, the image to be detected to obtain a target first detection frame includes:
inputting the image to be detected into the first detection network to obtain the central positions, the sizes and the confidence coefficients of a plurality of first detection frames;
and when the confidence coefficient of a certain first detection frame is larger than a preset confidence coefficient threshold value, determining the first detection frame as the target first detection frame.
In a preferred embodiment of the present application, the cropping the feature map of the image to be detected to obtain a target cropped image including the target first detection frame includes:
and cutting the feature map of the image to be detected by taking the center position of the target first detection frame as the center and taking m times of the size of the target first detection frame as the cutting size, wherein m > =1.
In a preferred embodiment of the application, the method further comprises:
and removing the overlapped target second detection frames from the target second detection frames.
In a preferred embodiment of the present application, the training procedure of the first detection network is as follows:
acquiring a first sample set, wherein the first sample set comprises a plurality of first sample images and a first detection frame gold standard;
inputting the first sample image into a preset first detection network to obtain the central position and the size of a first prediction detection frame;
calculating a first model loss according to the central position and the size of the first prediction detection frame and the corresponding first detection frame gold standard;
training the first detection network according to the first model loss.
In a preferred embodiment of the present application, the training procedure of the second detection network is as follows:
acquiring a second sample set, wherein the second sample set comprises a plurality of second sample images and a second detection frame gold standard, the second detection frame gold standard comprises second labeling detection frames labeled in the second sample images, and at least part of the second sample images are labeled with two or more adjacent second labeling detection frames;
inputting the second sample image into a preset second detection network to obtain the prediction centrality of each pixel point in the second sample image and the prediction offset between each target pixel point and each boundary of a target second prediction detection frame to which the target pixel point belongs, wherein the prediction centrality is the prediction probability that the corresponding pixel point is the central point of the second prediction detection frame to which the target pixel point belongs, and the target pixel point in the second sample image is the pixel point with the prediction centrality greater than the preset centrality threshold value;
calculating a second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the target second prediction detection frame to which the target pixel point belongs, and a corresponding second detection frame gold standard;
and training the second detection network type according to the second model loss.
In a preferred embodiment of the present application, the calculating the second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the target second prediction detection frame to which each target pixel point belongs, and the corresponding second detection frame gold standard includes:
acquiring a second detection frame gold standard, wherein the second detection frame gold standard comprises a plurality of second labeling detection frames corresponding to the second sample image;
calculating the standard centrality of each pixel point in the second sample image based on the second annotation detection frames, wherein the standard centrality is the standard probability that the corresponding pixel point is the central point of the second annotation detection frame;
calculating standard offset between each target pixel point and each boundary of the second labeling detection frame;
and calculating the second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the target second prediction detection frame to which it belongs, the standard centrality of each pixel point, and the standard offset between each target pixel point and each boundary of the second annotation detection frame to which it belongs.
In a preferred embodiment of the present application, the calculating, based on the plurality of second labeling detection frames, a standard centrality of each pixel point in the second sample image includes:
when the second sample image is a 2D image, calculating the standard centrality C of each pixel point inside the second labeling detection frame in the second sample image according to the following formula:
wherein l, r, u and d respectively represent the distances from the corresponding pixel point to the left, right, upper and lower boundaries of the second labeling detection frame;
when the second sample image is a 3D image, calculating the standard centrality C of each pixel point inside the second labeling detection frame in the second sample image according to the following formula:
wherein l, r, u, d, f and b respectively represent the distances from the corresponding pixel point to the left, right, upper, lower, front and rear boundaries of the second labeling detection frame;
when the second sample image is a 2D or 3D image, the standard centrality C of each pixel point outside the second labeling detection frames in the second sample image is 0;
when a certain pixel point in the second sample image is simultaneously located in n second labeling detection frames, with n greater than 1, its centrality is C = max(C1, C2, …, Cn), where Ci represents the centrality of the pixel point obtained based on the i-th second labeling detection frame.
To achieve the above object, the present application also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the aforementioned method when executing the computer program.
In order to achieve the above object, the present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the aforementioned method.
By adopting the technical scheme, the application has the following beneficial effects:
firstly, the image to be detected is processed by a pre-trained first detection network to obtain a target first detection frame; then a feature map of the image to be detected is extracted and cut to obtain a target clipping image containing the target first detection frame; finally, the target clipping image is input into a pre-trained second detection network to obtain the centrality of each pixel point in the target clipping image and the offset between each target pixel point and each boundary of the target second detection frame it belongs to, and each target second detection frame is determined from these offsets. In this way, a second detection network for fine detection is added on top of the coarse detection performed by the first detection network. The second detection network detects the centrality of each pixel point in the target clipping image; when the centrality of a pixel point is greater than a preset probability threshold, that pixel point is taken as a target pixel point representing the center point of a corresponding target object to be detected in the target clipping image. The number of target objects (i.e., target second detection frames) can be determined from the number of such center points, and each corresponding target second detection frame can be located by combining the offsets between the pixel point and the boundaries of the frame it belongs to.
Drawings
Fig. 1 is a flow chart of a target detection method based on deep learning in embodiment 1 of the present application;
FIG. 2 is a schematic diagram of lymph node detection according to the deep learning-based target detection method of embodiment 1 of the present application;
fig. 3 is a schematic flow chart of training a first detection network in embodiment 2 of the present application;
FIG. 4 is a flow chart of training a second detection network in embodiment 3 of the present application;
FIG. 5 is a block diagram showing the structure of a deep learning-based object detection system according to embodiment 4 of the present application;
fig. 6 is a hardware architecture diagram of an electronic device according to embodiment 5 of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
Example 1
The embodiment provides a target detection method based on deep learning, as shown in fig. 1 and fig. 2, the method specifically includes the following steps:
s11, acquiring an image to be detected.
In this embodiment, the image to be detected may be acquired from the PACS (Picture Archiving and Communication Systems, image archiving and communication system), or may be acquired in real time from the image acquisition device.
Optionally, the image to be detected may be a computed tomography (CT) image, a magnetic resonance imaging (MRI) image, a low-dose positron emission tomography (PET) image or an image of another modality; this embodiment does not specifically limit the modality of the image to be detected.
S12, processing the image to be detected through a first detection network trained in advance to obtain a target first detection frame.
In this embodiment, the first detection network CNN1 may be any suitable one-stage or multi-stage detection network, such as Fast R-CNN, RetinaNet or YOLO. The image to be detected is input into the first detection network CNN1 for coarse detection, and the center positions, sizes and confidences of a plurality of first detection frames are obtained; when the confidence of a first detection frame is greater than a preset confidence threshold, that first detection frame is determined to be a target first detection frame, as shown in fig. 2. A target first detection frame may contain more than one adjacent target object.
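As a concrete illustration of this coarse-detection step, the minimal sketch below filters candidate first detection frames by confidence. It is not the patent's implementation; the function name, the array layout and the 0.5 threshold are assumptions made for illustration.

```python
import numpy as np

def select_target_first_boxes(centers, sizes, scores, score_thresh=0.5):
    """Keep only the coarse (first-stage) detection frames whose confidence
    exceeds the preset confidence threshold, as described for CNN1.

    centers: (N, 2) or (N, 3) array of box center coordinates
    sizes:   (N, 2) or (N, 3) array of box sizes, e.g. (w, h) or (w, h, d)
    scores:  (N,) array of confidence values in [0, 1]
    """
    centers, sizes, scores = map(np.asarray, (centers, sizes, scores))
    keep = scores > score_thresh          # preset confidence threshold
    return centers[keep], sizes[keep], scores[keep]
```

Each retained frame is a target first detection frame and may still contain more than one adjacent target object, which is why the fine-detection stage follows.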
Optionally, before the image to be detected is input into the first detection network CNN1, this embodiment also preprocesses it, for example by window width/level adjustment, pixel normalization and Gaussian filtering, so as to reduce the interference of noise on the network, make the image features more distinct and reduce the learning difficulty.
S13, extracting the feature map of the image to be detected.
In this embodiment, a multi-scale feature map of the image to be detected may be extracted by a plurality of residual blocks connected by symmetric skip connections, so that both shallow and deep image features are extracted effectively.
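The patent does not give the exact layout of this feature extractor, so the following PyTorch sketch is only one plausible arrangement of residual blocks with a symmetric skip connection between a shallow and a deep stage; the channel counts, depth and layer choices are assumptions.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Basic residual block; the exact block design is an assumption."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return torch.relu(x + self.body(x))

class FeatureExtractor(nn.Module):
    """Shallow and deep residual stages joined by a symmetric skip connection,
    so the output feature map mixes shallow and deep information."""
    def __init__(self, in_ch=1, ch=32):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, ch, 3, padding=1)
        self.enc = ResBlock(ch)
        self.down = nn.Conv2d(ch, ch, 3, stride=2, padding=1)
        self.deep = ResBlock(ch)
        self.up = nn.ConvTranspose2d(ch, ch, 2, stride=2)
        self.dec = ResBlock(ch)
    def forward(self, x):
        s = self.enc(self.stem(x))       # shallow features
        d = self.deep(self.down(s))      # deep features
        return self.dec(self.up(d) + s)  # symmetric skip connection
```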
S14, cutting the feature map of the image to be detected to obtain a target cutting image containing the target first detection frame.
In this embodiment, the clipping method includes, but is not limited to, ROI Pooling or ROI Align. Specifically, the feature map of the image to be detected is clipped with the center position of the target first detection frame as the center and m times the size of the target first detection frame as the clipping size, where m ≥ 1. The resulting target clipping image is shown in fig. 2.
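A minimal sketch of this cropping step follows. It uses plain array slicing rather than ROI Pooling or ROI Align, and the boundary clamping and the example value m = 1.5 are assumptions; the patent only requires that the crop be centered on the target first detection frame with m ≥ 1 times its size.

```python
import numpy as np

def crop_target_image(feature_map, center, box_size, m=1.5):
    """Crop the feature map around one target first detection frame: the crop
    is centered on the frame's center, with side lengths m times the frame
    size (m >= 1).

    feature_map: (C, H, W) array
    center:      (cy, cx) center of the target first detection frame
    box_size:    (h, w) size of the target first detection frame
    """
    _, H, W = feature_map.shape
    half = (np.asarray(box_size, dtype=float) * m) / 2.0
    y0 = int(max(center[0] - half[0], 0))
    x0 = int(max(center[1] - half[1], 0))
    y1 = int(min(center[0] + half[0], H))
    x1 = int(min(center[1] + half[1], W))
    return feature_map[:, y0:y1, x0:x1]
```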
S15, inputting the target clipping image into a pre-trained second detection network to obtain the centrality of each pixel point in the target clipping image and the offset between each target pixel point and each boundary of the target second detection frame to which it belongs, wherein the centrality (a value between 0 and 1) is the probability that the corresponding pixel point is the center point of a second detection frame, and a target pixel point is a pixel point whose centrality is greater than a preset centrality threshold.
When the target clipping image is input into the second detection network CNN2, as shown in fig. 2, a centrality map of the target clipping image is obtained, together with the offset (i.e., the number of pixels of offset) between each target pixel point and each boundary of the second detection frame to which it belongs.
Preferably, before the target clipping image is input into the second detection network, it may first be interpolated so that its resolution is the same as that of the image to be detected.
S16, determining each target second detection frame according to the offset between each target pixel point and each boundary of the corresponding target second detection frame.
For example, when the centrality of pixel points A and B in the target clipping image is greater than the preset centrality threshold, the target clipping image is considered to contain two independent target objects whose centers are pixel points A and B, respectively. The boundary positions of the two target second detection frames can then be determined from the coordinates of A and B and from the offsets between A, B and the boundaries of their respective target second detection frames, so that adjacent target objects are accurately identified.
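The decoding described above can be sketched as follows for the 2D case: every pixel whose centrality exceeds the threshold becomes a target pixel point, and its four offsets give the boundaries of its target second detection frame. The function name, the (l, r, u, d) channel order and the 0.5 threshold are assumptions.

```python
import numpy as np

def decode_second_boxes(centrality, offsets, c_thresh=0.5):
    """Turn the second network's outputs into target second detection frames.

    centrality: (H, W) map, probability that each pixel is a box center
    offsets:    (H, W, 4) map of per-pixel distances to the left, right,
                upper and lower boundaries of the box the pixel belongs to
    Returns boxes as (x_min, y_min, x_max, y_max), one per target pixel point.
    """
    ys, xs = np.where(centrality > c_thresh)
    boxes = []
    for y, x in zip(ys, xs):
        l, r, u, d = offsets[y, x]
        boxes.append((x - l, y - u, x + r, y + d))
    return boxes
```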
Preferably, the method of this embodiment may further include removing superfluous, overlapping target second detection frames from the determined target second detection frames. In particular, NMS, a greedy algorithm or another optimization algorithm may be used to eliminate redundant overlapping detection frames.
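As one example of such an optimization, the sketch below implements greedy NMS over the decoded target second detection frames, scoring each frame by the centrality of its center pixel; the IoU threshold of 0.5 is an assumption.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression over (x_min, y_min, x_max, y_max) boxes;
    keeps the highest-scoring frame and drops frames overlapping it too much."""
    boxes, scores = np.asarray(boxes, float), np.asarray(scores, float)
    order = scores.argsort()[::-1]
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        xx0 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy0 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx1 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy1 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(xx1 - xx0, 0) * np.maximum(yy1 - yy0, 0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_o = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_o - inter + 1e-9)
        order = order[1:][iou <= iou_thresh]
    return keep
```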
Thus, in this embodiment a second detection network is added for fine detection on top of the coarse detection performed by the first detection network. The second detection network detects the centrality of each pixel point in the target clipping image; when the centrality of a pixel point is greater than a preset probability threshold, that pixel point is taken as a target pixel point representing the center point of a corresponding target object to be detected in the target clipping image. The number of target objects (i.e., target second detection frames) can be determined from the number of such center points, and each corresponding target second detection frame can be located by combining the offsets between the pixel point and the boundaries of the frame it belongs to.
Example 2
This embodiment is a further improvement of embodiment 1, and as shown in fig. 3, the training process of the first detection network is specifically defined as follows:
s21, acquiring a first sample set, wherein the first sample set comprises a plurality of first sample images and a first detection frame gold standard.
In this embodiment, the first sample image may be acquired from a PACS (Picture Archiving and Communication Systems, image archiving and communication system) or may be acquired in real time from an image acquisition device.
Optionally, the first sample image may be a computed tomography (CT) image, a magnetic resonance imaging (MRI) image, a low-dose positron emission tomography (PET) image or an image of another modality; this embodiment does not specifically limit the modality of the first sample image, but it should be understood that its modality should be consistent with that of the image to be detected described above.
S22, inputting the first sample image into a preset first detection network CNN1 to obtain the central position, the size and the confidence of the first prediction detection frame.
For example, when the first sample image is a 2D image, the first detection network CNN1 obtains the center position coordinates (x, y), the two-dimensional size (w, h) and the confidence p of each first prediction detection frame, and outputs the frames whose confidence p is greater than the confidence threshold p0 as the prediction result. When the first sample image is a 3D image, the first detection network CNN1 obtains the center position coordinates (x, y, z), the three-dimensional size (w, h, d) and the confidence p of each first prediction detection frame, and outputs the frames whose confidence p is greater than the confidence threshold p0 as the prediction result.
Optionally, the present embodiment further comprises preprocessing the first sample image before inputting the first sample image into the first detection network CNN 1. The preprocessing process is consistent with the preprocessing process of the image to be detected.
S23, calculating a first model loss according to the central position, the size and the confidence coefficient of the first prediction detection frame output by the first detection network CNN1 and the corresponding first detection frame gold standard.
In this embodiment, a first detection frame gold standard (including a center position, a size, and a confidence) that is manually marked is used as a gold standard of the first detection network, so as to calculate a corresponding first model loss.
S24, performing iterative training on the first detection network according to the first model loss until the first model loss converges or a preset number of iterations is reached.
In this embodiment, the form of the first model loss function depends on the specific structure of the first detection network.
The first detection network trained by the embodiment can accurately obtain the center position, the size and the confidence of the first prediction detection frame in the image to be detected.
Example 3
This example is a further improvement over examples 1 or 2. As shown in fig. 4, the training process of the second detection network is specifically defined as follows:
s31, a second sample set is obtained, the second sample set comprises a plurality of second sample images and a second detection frame gold standard, the second detection frame gold standard comprises second labeling detection frames labeled in the second sample images, and at least part of the second sample images are labeled with two or more adjacent second labeling detection frames.
In this embodiment, the second sample image may be obtained by rectangular cropping of the feature map of the first sample image. The crop may cover a single target object in the first sample image, several adjacent target objects, or a blank area (i.e., an area containing no target object), so that the second sample images may contain a single target object, several adjacent target objects, or no target object at all, which improves the robustness of the second detection network.
S32, inputting the second sample image into a preset second detection network to obtain the prediction centrality of each pixel point in the second sample image and the prediction offset between each target pixel point and each boundary of the second prediction detection frame of the target to which the target pixel point belongs, wherein the prediction centrality is the prediction probability that the corresponding pixel point is the central point of the second prediction detection frame to which the target pixel point belongs, and the target pixel point in the second sample image is the pixel point with the prediction centrality larger than the preset centrality threshold value.
S33, calculating a second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the target second prediction detection frame to which the target pixel point belongs, and a corresponding second detection frame gold standard. The specific implementation process is as follows:
s331, acquiring the second detection frame gold standard, wherein the second detection frame gold standard comprises a plurality of second annotation detection frames corresponding to the second sample image.
And S332, calculating the standard centrality of each pixel point in the second sample image based on the second annotation detection frames, wherein the standard centrality is the standard probability that the corresponding pixel point is the central point of the second annotation detection frame.
Specifically, when the second sample image is a 2D image, the standard centrality C of each pixel point inside the second labeling detection frame in the second sample image is calculated according to the following formula:
wherein l, r, u and d respectively represent the distances from the corresponding pixel point to the left, right, upper and lower boundaries of the second labeling detection frame.
When the second sample image is a 3D image, the standard centrality C of each pixel point inside the second labeling detection frame in the second sample image is calculated according to the following formula:
wherein l, r, u, d, f and b respectively represent the distances from the corresponding pixel point to the left, right, upper, lower, front and rear boundaries of the second labeling detection frame.
When the second sample image is a 2D or 3D image, the standard centrality C of each pixel point outside the second labeling detection frames in the second sample image is 0.
When a certain pixel point in the second sample image is located in n (n greater than 1) second labeling detection frames simultaneously, its centrality is C = max(C1, C2, …, Cn), where Ci represents the centrality of the pixel point obtained based on the i-th second labeling detection frame, i.e., Ci is the probability that the pixel point is the center point of the i-th second labeling detection frame, and max(C1, C2, …, Cn) denotes the maximum of C1, C2, …, Cn.
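The gold-standard centrality can be sketched as below for the 2D case. The patent's exact centrality formula is given as an image and is not reproduced in this text, so the per-box formula here assumes an FCOS-style centerness purely for illustration; the handling of pixels outside all frames (centrality 0) and of pixels inside several frames (take the maximum) follows the text above.

```python
import numpy as np

def centerness_2d(l, r, u, d):
    """Per-box centrality of one pixel. This FCOS-style form is an assumption:
    it peaks at 1 at the box center and falls to 0 at the boundaries."""
    return np.sqrt((min(l, r) / max(l, r)) * (min(u, d) / max(u, d)))

def standard_centrality(pixel, boxes):
    """Gold-standard centrality of one pixel given the second labeling
    detection frames. Pixels outside every frame get 0; a pixel inside
    several frames takes the maximum C = max(C1, ..., Cn).

    pixel: (x, y); boxes: list of (x_min, y_min, x_max, y_max)
    """
    x, y = pixel
    values = [0.0]
    for x0, y0, x1, y1 in boxes:
        if x0 <= x <= x1 and y0 <= y <= y1:
            l, r, u, d = x - x0, x1 - x, y - y0, y1 - y
            values.append(centerness_2d(max(l, 1e-6), max(r, 1e-6),
                                        max(u, 1e-6), max(d, 1e-6)))
    return max(values)
```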
S333, calculating the standard offset between each target pixel point and each boundary of the second annotation detection frame.
Specifically, when the centrality of a pixel point is greater than the preset centrality threshold c0, the pixel point is a target pixel point, and the standard offset can be obtained by calculating the distance between that pixel point and each boundary of the corresponding second annotation detection frame.
S334, calculating the second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the second target prediction detection frame, the standard centrality of each pixel point and the standard offset between each target pixel point and each boundary of the second label detection frame.
In this embodiment, the second model loss may be L1, L2, or other regression loss.
S34, performing iterative training on the second detection network according to the second model loss until the second model loss converges or a preset number of iterations is reached.
The second detection network obtained through this training can accurately locate the target second detection frames in the image to be detected.
Example 4
The present embodiment provides a target detection system based on deep learning, as shown in fig. 5, the system includes: the device comprises an image acquisition module 11, a first detection network processing module 12, a feature extraction module 13, a clipping module 14, a second detection network processing module 15 and a target detection module 16. The functions of the above modules are described in detail below:
the image acquisition module 11 is used for acquiring an image to be detected.
In this embodiment, the image to be detected may be acquired from the PACS (Picture Archiving and Communication Systems, image archiving and communication system), or may be acquired in real time from the image acquisition device.
Optionally, the image to be detected may be a computed tomography (CT) image, a magnetic resonance imaging (MRI) image, a low-dose positron emission tomography (PET) image or an image of another modality; this embodiment does not specifically limit the modality of the image to be detected.
The first detection network processing module 12 is configured to process the image to be detected through a first detection network trained in advance, so as to obtain a target first detection frame.
In this embodiment, the first detection network CNN1 may be any suitable one-stage or multi-stage detection network, such as Fast R-CNN, RetinaNet or YOLO. The image to be detected is input into the first detection network CNN1 for coarse detection, and the center positions, sizes and confidences of a plurality of first detection frames are obtained; when the confidence of a first detection frame is greater than a preset confidence threshold, that first detection frame is determined to be a target first detection frame, as shown in fig. 2. A target first detection frame may contain more than one adjacent target object.
Optionally, before the image to be detected is input into the first detection network CNN1, this embodiment also preprocesses it, for example by window width/level adjustment, pixel normalization and Gaussian filtering, so as to reduce the interference of noise on the network, make the image features more distinct and reduce the learning difficulty.
The feature extraction module 13 is configured to extract a feature map of the image to be detected.
In this embodiment, a multi-scale feature map of the image to be detected may be extracted by a plurality of residual blocks connected by symmetric skip connections, so that both shallow and deep image features are extracted effectively.
The cropping module 14 is configured to crop the feature map of the image to be detected, so as to obtain a target cropping image including the target first detection frame.
In this embodiment, the cropping method includes, but is not limited to, ROI Pooling or ROI Align. Specifically, the feature map of the image to be detected is cropped with the center position of the target first detection frame as the center and m times the size of the target first detection frame as the crop size, where m ≥ 1. The resulting target cropping image is shown in fig. 2.
The second detection network processing module 15 is configured to input the target cropping image into a pre-trained second detection network and obtain the centrality of each pixel point in the target cropping image and the offset between each target pixel point and each boundary of the target second detection frame to which it belongs, where the centrality (a value between 0 and 1) is the probability that the corresponding pixel point is the center point of a second detection frame, and a target pixel point is a pixel point whose centrality is greater than a preset centrality threshold.
When the target cropping image is input into the second detection network CNN2, as shown in fig. 2, a centrality map of the target cropping image is obtained, together with the offset (i.e., the number of pixels of offset) between each target pixel point and each boundary of the second detection frame to which it belongs.
Preferably, before the target cropping image is input into the second detection network, it may first be interpolated so that its resolution is the same as that of the image to be detected.
The target detection module 16 is configured to determine each target second detection frame according to an offset between each target pixel point and each boundary of the target second detection frame to which the target pixel point belongs.
For example, when the centrality of pixel points A and B in the target cropping image is greater than the preset centrality threshold, the target cropping image is considered to contain two independent target objects whose centers are pixel points A and B, respectively. The boundary positions of the two target second detection frames can then be determined from the coordinates of A and B and from the offsets between A, B and the boundaries of their respective target second detection frames, so that adjacent target objects are accurately identified.
Preferably, the system of this embodiment may further remove superfluous, overlapping target second detection frames from the determined target second detection frames. In particular, NMS, a greedy algorithm or another optimization algorithm may be used to eliminate redundant overlapping detection frames.
Thus, in this embodiment a second detection network is added for fine detection on top of the coarse detection performed by the first detection network. The second detection network detects the centrality of each pixel point in the target cropping image; when the centrality of a pixel point is greater than a preset probability threshold, that pixel point is taken as a target pixel point representing the center point of a corresponding target object to be detected in the target cropping image. The number of target objects (i.e., target second detection frames) can be determined from the number of such center points, and each corresponding target second detection frame can be located by combining the offsets between the pixel point and the boundaries of the frame it belongs to.
The training process of the first detection network and the second detection network in this embodiment is shown with reference to embodiments 2 and 3.
Example 5
The present embodiment provides an electronic device, which may take the form of a computing device (for example, a server device), comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor can implement the deep learning-based object detection method provided in embodiment 1, 2 or 3 when executing the computer program.
Fig. 6 shows a schematic diagram of the hardware structure of the present embodiment, and as shown in fig. 6, the electronic device 9 specifically includes:
at least one processor 91, at least one memory 92, and a bus 93 for connecting the different system components (including the processor 91 and the memory 92), wherein:
the bus 93 includes a data bus, an address bus, and a control bus.
The memory 92 includes volatile memory such as Random Access Memory (RAM) 921 and/or cache memory 922, and may further include Read Only Memory (ROM) 923.
Memory 92 also includes a program/utility 925 having a set (at least one) of program modules 924, such program modules 924 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The processor 91 executes various functional applications and data processing such as the deep learning-based object detection method provided in embodiment 1, 2 or 3 of the present application by running a computer program stored in the memory 92.
The electronic device 9 may further communicate with one or more external devices 94 (e.g., keyboard, pointing device, etc.). Such communication may occur through an input/output (I/O) interface 95. Also, the electronic device 9 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN) and/or a public network, such as the Internet, through a network adapter 96. The network adapter 96 communicates with other modules of the electronic device 9 via the bus 93. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in connection with the electronic device 9, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, data backup storage systems, and the like.
It should be noted that although several units/modules or sub-units/modules of an electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present application. Conversely, the features and functions of one unit/module described above may be further divided into ones that are embodied by a plurality of units/modules.
Example 6
The present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the deep learning-based object detection method provided in embodiment 1, 2 or 3.
More specifically, the readable storage medium may include, but is not limited to: a portable disk, a hard disk, a random access memory, a read-only memory, an erasable programmable read-only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a possible embodiment, the application may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps of implementing the deep learning based object detection method as described in embodiments 1, 2 or 3 when said program product is run on the terminal device.
Wherein the program code for carrying out the application may be written in any combination of one or more programming languages, which program code may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on the remote device or entirely on the remote device.
While specific embodiments of the application have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and the scope of the application is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the principles and spirit of the application, but such changes and modifications fall within the scope of the application.

Claims (10)

1. The target detection method based on deep learning is characterized by comprising the following steps of:
acquiring an image to be detected;
processing the image to be detected through a first detection network trained in advance to obtain a target first detection frame;
extracting a feature map of the image to be detected;
cutting the feature map of the image to be detected to obtain a target cutting image containing the target first detection frame;
inputting the target clipping image into a pre-trained second detection network to obtain the centrality of each pixel point in the target clipping image and the offset between each target pixel point and each boundary of the target second detection frame, wherein the centrality is the probability that the corresponding pixel point is the central point of the second detection frame, and the target pixel point is the pixel point with the centrality larger than a preset centrality threshold value;
and determining each target second detection frame according to the offset between each target pixel point and each boundary of the corresponding target second detection frame.
2. The method for detecting a target according to claim 1, wherein the processing the image to be detected through the first detection network trained in advance to obtain a first detection frame of the target includes:
inputting the image to be detected into the first detection network to obtain the central positions, the sizes and the confidence coefficients of a plurality of first detection frames;
and when the confidence coefficient of a certain first detection frame is larger than a preset confidence coefficient threshold value, determining the first detection frame as the target first detection frame.
3. The method for detecting an object according to claim 1, wherein the cropping the feature map of the image to be detected to obtain a target cropped image including the target first detection frame comprises:
and cutting the feature map of the image to be detected by taking the center position of the target first detection frame as the center and taking m times of the size of the target first detection frame as the cutting size, wherein m ≥ 1.
4. The target detection method according to claim 1, characterized in that the method further comprises:
and removing the overlapped target second detection frames from the target second detection frames.
5. The target detection method according to claim 1, wherein the training process of the first detection network is as follows:
acquiring a first sample set, wherein the first sample set comprises a plurality of first sample images and a first detection frame gold standard;
inputting the first sample image into a preset first detection network to obtain the central position and the size of a first prediction detection frame;
calculating a first model loss according to the central position and the size of the first prediction detection frame and the corresponding first detection frame gold standard;
training the first detection network according to the first model loss.
6. The target detection method according to claim 1, wherein the training process of the second detection network is as follows:
acquiring a second sample set, wherein the second sample set comprises a plurality of second sample images and a second detection frame gold standard, the second detection frame gold standard comprises second labeling detection frames labeled in the second sample images, and at least part of the second sample images are labeled with two or more adjacent second labeling detection frames;
inputting the second sample image into a preset second detection network to obtain the prediction centrality of each pixel point in the second sample image and the prediction offset between each target pixel point and each boundary of a target second prediction detection frame to which the target pixel point belongs, wherein the prediction centrality is the prediction probability that the corresponding pixel point is the central point of the second prediction detection frame to which the target pixel point belongs, and the target pixel point in the second sample image is the pixel point with the prediction centrality greater than the preset centrality threshold value;
calculating a second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the target second prediction detection frame to which the target pixel point belongs, and a corresponding second detection frame gold standard;
and training the second detection network according to the second model loss.
7. The method of claim 6, wherein calculating a second model loss based on the predicted centrality of each pixel point in the second sample image, the predicted offset between each target pixel point and each boundary of the second predicted detection frame of the target to which the target pixel point belongs, and a corresponding second detection frame gold standard, comprises:
calculating the standard centrality of each pixel point in the second sample image based on the second annotation detection frame, wherein the standard centrality is the standard probability that the corresponding pixel point is the central point of the second annotation detection frame;
calculating standard offset between each target pixel point and each boundary of the second labeling detection frame;
and calculating the second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the target second prediction detection frame to which it belongs, the standard centrality of each pixel point, and the standard offset between each target pixel point and each boundary of the second annotation detection frame to which it belongs.
8. The method of claim 7, wherein calculating the standard centrality of each pixel in the second sample image based on the second labeling detection frame comprises:
when the second sample image is a 2D image, calculating the standard centrality C of each pixel point inside the second labeling detection frame in the second sample image according to the following formula:
wherein l, r, u and d respectively represent the distances from the corresponding pixel point to the left, right, upper and lower boundaries of the second labeling detection frame;
when the second sample image is a 3D image, calculating the standard centrality C of each pixel point inside the second labeling detection frame in the second sample image according to the following formula:
wherein l, r, u, d, f and b respectively represent the distances from the corresponding pixel point to the left, right, upper, lower, front and rear boundaries of the second labeling detection frame;
when the second sample image is a 2D or 3D image, the standard centrality C of each pixel point outside the second annotation detection frame in the second sample image is 0;
when a certain pixel point in the second sample image is simultaneously located in n second labeling detection frames, with n greater than 1, the centrality C of the pixel point is C = max(C1, C2, …, Cn), wherein Ci represents the centrality of the pixel point obtained based on the i-th second annotation detection frame.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 when the computer program is executed by the processor.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
CN202110518366.8A 2021-05-12 2021-05-12 Target detection method, device and medium based on deep learning Active CN113240638B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110518366.8A CN113240638B (en) 2021-05-12 2021-05-12 Target detection method, device and medium based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110518366.8A CN113240638B (en) 2021-05-12 2021-05-12 Target detection method, device and medium based on deep learning

Publications (2)

Publication Number Publication Date
CN113240638A CN113240638A (en) 2021-08-10
CN113240638B true CN113240638B (en) 2023-11-10

Family

ID=77133732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110518366.8A Active CN113240638B (en) 2021-05-12 2021-05-12 Target detection method, device and medium based on deep learning

Country Status (1)

Country Link
CN (1) CN113240638B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658273B (en) * 2021-08-19 2024-04-26 上海新氦类脑智能科技有限公司 Scene self-adaptive target positioning method and system based on space perception

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111091091A (en) * 2019-12-16 2020-05-01 北京迈格威科技有限公司 Method, device and equipment for extracting target object re-identification features and storage medium
WO2020164282A1 (en) * 2019-02-14 2020-08-20 平安科技(深圳)有限公司 Yolo-based image target recognition method and apparatus, electronic device, and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304835B (en) * 2018-01-30 2019-12-06 百度在线网络技术(北京)有限公司 character detection method and device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020164282A1 (en) * 2019-02-14 2020-08-20 平安科技(深圳)有限公司 Yolo-based image target recognition method and apparatus, electronic device, and storage medium
CN111091091A (en) * 2019-12-16 2020-05-01 北京迈格威科技有限公司 Method, device and equipment for extracting target object re-identification features and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Aircraft target detection in remote sensing images based on deep neural networks; Li Wenbin (李文斌); He Ran (何冉); Computer Engineering (07); full text *

Also Published As

Publication number Publication date
CN113240638A (en) 2021-08-10

Similar Documents

Publication Publication Date Title
CN109685060B (en) Image processing method and device
US10891473B2 (en) Method and device for use in hand gesture recognition
US7916919B2 (en) System and method for segmenting chambers of a heart in a three dimensional image
CN112950651B (en) Automatic delineation method of mediastinal lymph drainage area based on deep learning network
US8199977B2 (en) System and method for extraction of features from a 3-D point cloud
CN110363817B (en) Target pose estimation method, electronic device, and medium
WO2017214595A1 (en) Systems and methods for performing three-dimensional semantic parsing of indoor spaces
WO2023151237A1 (en) Face pose estimation method and apparatus, electronic device, and storage medium
CN110598771A (en) Visual target identification method and device based on deep semantic segmentation network
CN116152266A (en) Segmentation method, device and system for ultrasonic image of puncture needle
CN113240638B (en) Target detection method, device and medium based on deep learning
Li et al. Guided neighborhood affine subspace embedding for feature matching
CN114022531A (en) Image processing method, electronic device, and storage medium
CN111709269B (en) Human hand segmentation method and device based on two-dimensional joint information in depth image
CN111753736A (en) Human body posture recognition method, device, equipment and medium based on packet convolution
Geng et al. SANet: A novel segmented attention mechanism and multi-level information fusion network for 6D object pose estimation
CN115239776B (en) Point cloud registration method, device, equipment and medium
CN114155485B (en) Intelligent community intelligent security monitoring management system based on 3D vision
CN112530554B (en) Scanning positioning method and device, storage medium and electronic equipment
CN115223173A (en) Object identification method and device, electronic equipment and storage medium
US10390798B2 (en) Computer-aided tracking and motion analysis with ultrasound for measuring joint kinematics
CN114387308A (en) Machine vision characteristic tracking system
CN113869163A (en) Target tracking method and device, electronic equipment and storage medium
CN112884804A (en) Action object tracking method and related equipment
CN112131902A (en) Closed loop detection method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant