CN113240638A - Target detection method, device and medium based on deep learning - Google Patents

Target detection method, device and medium based on deep learning

Info

Publication number
CN113240638A
CN113240638A (application CN202110518366.8A)
Authority
CN
China
Prior art keywords
target
image
detection
pixel point
detection frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110518366.8A
Other languages
Chinese (zh)
Other versions
CN113240638B (en)
Inventor
曲国祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai United Imaging Intelligent Healthcare Co Ltd
Original Assignee
Shanghai United Imaging Intelligent Healthcare Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai United Imaging Intelligent Healthcare Co Ltd filed Critical Shanghai United Imaging Intelligent Healthcare Co Ltd
Priority to CN202110518366.8A priority Critical patent/CN113240638B/en
Publication of CN113240638A publication Critical patent/CN113240638A/en
Application granted granted Critical
Publication of CN113240638B publication Critical patent/CN113240638B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target detection method, device and medium based on deep learning, wherein the method comprises the following steps: acquiring an image to be detected; processing the image to be detected through a first detection network to obtain a target first detection frame; extracting a feature map of the image to be detected; cropping the feature map of the image to be detected to obtain a target cut image containing the target first detection frame; inputting the target cut image into a second detection network to obtain the centrality of each pixel point in the target cut image and the offset between each target pixel point and each boundary of the target second detection frame to which it belongs, wherein the centrality is the probability that the corresponding pixel point is the central point of a second detection frame, and a target pixel point is a pixel point whose centrality is greater than a preset centrality threshold; and determining each target second detection frame according to the offset between each target pixel point and each boundary of the target second detection frame to which it belongs. The invention can solve the problem that adjacent objects in an image are difficult to distinguish accurately.

Description

Target detection method, device and medium based on deep learning
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, an apparatus, and a medium for target detection based on deep learning.
Background
The image-based computer-aided diagnosis technology mainly realizes automatic detection and identification of target structures or lesions through machine learning or deep learning. The traditional detection technology generally uses methods such as ellipse fitting, Fast-RCNN and Retina-Net to directly locate and detect a target structure or focus. However, because the human body structure is complex, and the imaging appearance of different diseases is diverse and can be influenced by other diseases and by the human anatomy, several detection targets that are close to each other are often detected as a single object, thereby causing false detection. This not only affects the statistics of the number of target objects, but can also lead to large deviations in judging the size of the target objects, which in turn indirectly influences the doctor's judgment of the patient's condition.
Taking lymph node detection as an example, the document [Liu Fang et al., "Stomach CT image lymph node detection system and method based on shape and ellipse fitting", 2013] discloses an automatic lymph node detection system comprising preprocessing, interest boundary point detection, boundary ellipse fitting, region merging, and lymph node tracking and extraction functional modules. The preprocessing module is used to preprocess the image to be detected, and the interest boundary point detection module further processes the preprocessed image to obtain interest boundary points; the boundary ellipse fitting module performs ellipse fitting on the curve formed by the interest boundary points to obtain an ellipse-like closed region; the region merging module eliminates ambiguous regions formed by intersecting ellipses; and the lymph node tracking and extraction module performs window feature matching and tracking on suspected lymph nodes to complete lymph node extraction. Although this scheme can extract lymph nodes, its generalization performance is poor, and its detection accuracy for lymph nodes with complex shapes is not high.
The document [Cao Hanqiang, Xu Zhouping, "A lymph node detection method based on an improved SegNet segmentation network", 2019] discloses a lymph node detection method that improves the SegNet segmentation network. Although this method can segment and detect lymph nodes, it tends to detect several lymph nodes that are close to each other as a single one, which causes detection errors.
Disclosure of Invention
The invention provides a target detection method, device and medium based on deep learning, aiming at solving the problem that adjacent objects in an image are difficult to distinguish accurately in the prior art.
In order to achieve the above object, the present invention provides a target detection method based on deep learning, including:
acquiring an image to be detected;
processing the image to be detected through a pre-trained first detection network to obtain a target first detection frame;
extracting a feature map of the image to be detected;
cropping the feature map of the image to be detected to obtain a target cut image containing the target first detection frame;
inputting the target clipping image into a pre-trained second detection network to obtain the centrality of each pixel point in the target clipping image and the offset between each target pixel point and each boundary of a second target detection frame, wherein the centrality is the probability that the corresponding pixel point is the central point of the second target detection frame, and the target pixel point is the pixel point with the centrality larger than a preset centrality threshold;
and determining each target second detection frame according to the offset between each target pixel point and each boundary of the target second detection frame.
In a preferred embodiment of the present invention, the processing the image to be detected through a pre-trained first detection network to obtain a first target detection frame includes:
inputting the image to be detected into the first detection network to obtain the central position, the size and the confidence coefficient of a plurality of first detection frames;
and when the confidence coefficient of a certain first detection frame is greater than a preset confidence coefficient threshold value, determining that the first detection frame is the target first detection frame.
In a preferred embodiment of the present invention, the cropping the feature map of the image to be detected to obtain an object cropped image including the first object detection frame includes:
and cropping the feature map of the image to be detected by taking the central position of the first target detection frame as the center and m times the size of the first target detection frame as the cropping size, wherein m ≥ 1.
In a preferred embodiment of the present invention, the method further comprises:
removing the overlapped target second detection frame from each of the target second detection frames.
In a preferred embodiment of the present invention, the training process of the first detection network is as follows:
acquiring a first sample set, wherein the first sample set comprises a plurality of first sample images and a first detection frame gold standard;
inputting the first sample image into a preset first detection network to obtain the central position and the size of a first prediction detection frame;
calculating a first model loss according to the central position and the size of the first prediction detection frame and the corresponding first detection frame gold standard;
and training the first detection network according to the first model loss.
In a preferred embodiment of the present invention, the training process of the second detection network is as follows:
acquiring a second sample set, wherein the second sample set comprises a plurality of second sample images and a second detection frame gold standard, the second detection frame gold standard comprises a second labeling detection frame labeled in the second sample images, and at least part of the second sample images are labeled with two or more adjacent second labeling detection frames;
inputting the second sample image into a preset second detection network to obtain the prediction centrality of each pixel point in the second sample image and the prediction offset between each target pixel point and each boundary of the target second prediction detection frame, wherein the prediction centrality is the prediction probability that the corresponding pixel point is the central point of the second prediction detection frame, and the target pixel point in the second sample image is the pixel point of which the prediction centrality is greater than the preset centrality threshold;
calculating a second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the target second prediction detection frame to which the target pixel point belongs, and a corresponding second detection frame gold standard;
and training the second detection network according to the second model loss.
In a preferred embodiment of the present invention, the calculating a second model loss according to the prediction center degree of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the target second prediction detection frame to which the target pixel point belongs, and the corresponding second detection frame gold standard includes:
acquiring a second detection frame gold standard, wherein the second detection frame gold standard comprises a plurality of second labeling detection frames corresponding to the second sample image;
calculating the standard centrality of each pixel point in the second sample image based on the plurality of second labeling detection frames, wherein the standard centrality is the standard probability that the corresponding pixel point is the central point of the second labeling detection frame to which the corresponding pixel point belongs;
calculating the standard offset between each target pixel point and each boundary of the second labeling detection frame to which the target pixel point belongs;
and calculating the second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the second target prediction detection frame, the standard centrality of each pixel point and the standard offset between each target pixel point and each boundary of the second target labeling detection frame.
In a preferred embodiment of the present invention, the calculating the standard centrality of each pixel point in the second sample image based on the plurality of second annotation detection frames includes:
when the second sample image is a 2D image, calculating the standard centrality C of each pixel point in the second label detection frame in the second sample image according to the following formula:
C = \sqrt{\frac{\min(l, r)}{\max(l, r)} \cdot \frac{\min(u, d)}{\max(u, d)}}
wherein l, r, u and d respectively represent the distances between the corresponding pixel point and the left, right, upper and lower boundaries of the second labeling detection frame to which it belongs;
when the second sample image is a 3D image, calculating the standard centrality C of each pixel point in the second label detection frame in the second sample image according to the following formula:
C = \sqrt[3]{\frac{\min(l, r)}{\max(l, r)} \cdot \frac{\min(u, d)}{\max(u, d)} \cdot \frac{\min(f, b)}{\max(f, b)}}
wherein l, r, u, d, f and b respectively represent the distances between the corresponding pixel point and the left, right, upper, lower, front and rear boundaries of the second labeling detection frame to which it belongs;
when the second sample image is a 2D or 3D image, the standard centrality C of each pixel point outside the second label detection frame in the second sample image is 0;
when a certain pixel point in the second sample image is located in n second labeling detection frames simultaneously, with n greater than 1, the centrality C of the pixel point is C = max(C_1, C_2, …, C_n), where C_i represents the centrality of the pixel point obtained based on the i-th second labeling detection frame.
In order to achieve the above object, the present invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the aforementioned method when executing the computer program.
In order to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the aforementioned method.
By adopting the technical scheme, the invention has the following beneficial effects:
Firstly, the image to be detected is processed through a pre-trained first detection network to obtain a target first detection frame. Then a feature map of the image to be detected is extracted and cropped to obtain a target cut image containing the target first detection frame. Finally, the target cut image is input into a pre-trained second detection network to obtain the centrality of each pixel point in the target cut image and the offset between each target pixel point and each boundary of the target second detection frame to which it belongs, and each target second detection frame is determined according to these offsets. Thus, the invention adds a second detection network for fine detection on the basis of the coarse detection performed by the first detection network. The second detection network detects the centrality of each pixel point in the target cut image; when the centrality of a pixel point is greater than a preset centrality threshold, the pixel point is taken as a target pixel point representing the center point of a corresponding target object to be detected in the target cut image. The number of target objects (i.e., target second detection frames) can be determined from the number of such center points, and each corresponding target second detection frame can then be located by combining the offsets between the target pixel point and the boundaries of that frame.
Drawings
Fig. 1 is a schematic flowchart of a deep learning-based target detection method according to embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of lymph node detection by the deep learning-based target detection method according to embodiment 1 of the present invention;
fig. 3 is a schematic flowchart of training a first detection network according to embodiment 2 of the present invention;
fig. 4 is a schematic flowchart of training a second detection network according to embodiment 3 of the present invention;
fig. 5 is a block diagram of a deep learning-based target detection system according to embodiment 4 of the present invention;
fig. 6 is a hardware architecture diagram of an electronic device according to embodiment 5 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
Example 1
The present embodiment provides a target detection method based on deep learning, as shown in fig. 1 and fig. 2, the method specifically includes the following steps:
and S11, acquiring an image to be detected.
In the present embodiment, the image to be detected may be acquired from a PACS (Picture Archiving and Communication Systems), or may be acquired in real time from an image acquisition device.
Alternatively, the image to be detected may be a Computed Tomography (CT) image, a Magnetic Resonance Imaging (MRI) image, a low-dose Positron Emission Tomography (PET) image, or other modality images, and the modality of the image to be detected is not particularly limited in this embodiment.
And S12, processing the image to be detected through a pre-trained first detection network to obtain a first target detection frame.
In this embodiment, the first detection network CNN1 may be Fast-RCNN, Retina-Net, Yolo, or any other suitable one-stage or multi-stage detection network. The image to be detected is input into the first detection network CNN1 for coarse detection, yielding the central positions, sizes and confidences of a plurality of first detection frames; when the confidence of a first detection frame is greater than a preset confidence threshold, that first detection frame is determined to be the target first detection frame, as shown in fig. 2. The target first detection frame may contain more than one adjacent target object.
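By way of non-limiting illustration, the confidence-thresholding step can be sketched in Python as follows; the function name, array layout and threshold value are assumptions for illustration and are not taken from the patent.

```python
import numpy as np

# Hypothetical names and layout: centers/sizes/scores arrays and conf_thr are
# illustrative assumptions, not identifiers from the patent.
def select_target_first_boxes(centers, sizes, scores, conf_thr=0.5):
    """Keep only the first detection frames whose confidence exceeds the threshold."""
    keep = scores > conf_thr                      # boolean mask over all coarse detections
    return centers[keep], sizes[keep], scores[keep]

# Example: three coarse 2D detections, two of which pass the threshold.
centers = np.array([[64.0, 80.0], [120.0, 40.0], [30.0, 30.0]])
sizes = np.array([[32.0, 24.0], [28.0, 28.0], [16.0, 16.0]])
scores = np.array([0.92, 0.67, 0.31])
target_centers, target_sizes, target_scores = select_target_first_boxes(centers, sizes, scores)
```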
Optionally, before the image to be detected is input into the first detection network CNN1, this embodiment further includes preprocessing the image to be detected, for example window width/level adjustment, pixel normalization and Gaussian filtering, so as to reduce the interference of noise on the network, make the image features more distinct, and reduce the difficulty of learning.
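A minimal preprocessing sketch along these lines is shown below; the window center, window width and Gaussian sigma are illustrative assumptions rather than the patent's settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Window center/width and sigma are illustrative assumptions, not the patent's settings.
def preprocess_image(image, window_center=40.0, window_width=400.0, sigma=1.0):
    """Window width/level adjustment, pixel normalization to [0, 1], and Gaussian filtering."""
    low = window_center - window_width / 2.0
    high = window_center + window_width / 2.0
    windowed = np.clip(image, low, high)              # window width/level adjustment
    normalized = (windowed - low) / (high - low)      # pixel normalization
    return gaussian_filter(normalized, sigma=sigma)   # Gaussian filtering to suppress noise
```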
And S13, extracting a feature map of the image to be detected.
In this embodiment, a multi-scale feature map of the image to be detected can be extracted through a plurality of residual blocks connected by symmetric skip connections, so as to ensure effective extraction of both shallow and deep image features.
And S14, cropping the feature map of the image to be detected to obtain a target cut image containing the target first detection frame.
In this embodiment, the cropping manner includes, but is not limited to, ROI Pooling or ROI Align. Specifically, the feature map of the image to be detected is cropped by taking the central position of the target first detection frame as the center and m times the size of the target first detection frame as the cropping size, wherein m ≥ 1. The resulting target cut image is shown in fig. 2, for example.
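The following simplified 2D sketch illustrates such a crop on a feature map using a plain index window rather than ROI Pooling/ROI Align; the function name and the choice m = 1.5 are assumptions.

```python
import numpy as np

# Simplified index-window crop on a (C, H, W) feature map; a real implementation would
# use ROI Pooling or ROI Align. The function name and m = 1.5 are assumptions.
def crop_around_box(feature_map, center, size, m=1.5):
    """Crop an m-times-enlarged window centered on the target first detection frame."""
    cy, cx = center
    h, w = size[0] * m, size[1] * m
    y0, y1 = int(round(cy - h / 2)), int(round(cy + h / 2))
    x0, x1 = int(round(cx - w / 2)), int(round(cx + w / 2))
    y0, x0 = max(y0, 0), max(x0, 0)                   # clamp the window to the image bounds
    return feature_map[:, y0:y1, x0:x1]
```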
And S15, inputting the target clipping image into a pre-trained second detection network to obtain the centrality of each pixel point in the target clipping image and the offset between each target pixel point and each boundary of the target second detection frame to which it belongs, wherein the centrality (a value between 0 and 1) is the probability that the corresponding pixel point is the central point of the second detection frame, and a target pixel point is a pixel point whose centrality is greater than a preset centrality threshold.
When the target clipping image is input into the second detection network CNN2, as shown in fig. 2, a centrality map of the target clipping image is obtained, together with the offset (i.e., the number of offset pixels) between each target pixel point and each boundary of the second detection frame to which it belongs.
Preferably, before the target trimming image is input to the second detection network, the target trimming image may be subjected to interpolation processing in advance so that the resolution thereof is the same as that of the image to be detected.
S16, determining each target second detection frame according to the offset between each target pixel point and each boundary of the target second detection frame.
For example, when the centrality of pixel points A and B in the target clipping image is greater than the preset centrality threshold, it is determined that there are two separate target objects in the target clipping image, with pixel points A and B as their centers. The positions of the boundaries of the two target second detection frames can then be determined from the coordinate positions of pixel points A and B and the offsets between each of them and the boundaries of the target second detection frame to which it belongs, so that adjacent target objects are accurately distinguished.
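A minimal sketch of this decoding step for the 2D case is given below; the array layouts, names and threshold value are illustrative assumptions.

```python
import numpy as np

# Decoding sketch for the 2D case. Assumed layouts: centrality is an (H, W) map of
# per-pixel center probabilities; offsets is a (4, H, W) array of distances to the
# left, right, upper and lower boundaries. The threshold value is illustrative.
def decode_second_boxes(centrality, offsets, centrality_thr=0.5):
    """Return one (x_min, y_min, x_max, y_max) box per pixel above the centrality threshold."""
    boxes = []
    ys, xs = np.where(centrality > centrality_thr)    # candidate center points
    for y, x in zip(ys, xs):
        l, r, u, d = offsets[:, y, x]
        boxes.append((x - l, y - u, x + r, y + d))
    return boxes
```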
Preferably, the method of this embodiment may further include removing redundantly overlapping target second detection frames from the determined target second detection frames. In particular, redundant overlapping detection frames may be eliminated using NMS (non-maximum suppression), a greedy algorithm, or another optimization algorithm.
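For illustration, a plain greedy NMS over 2D boxes might look like the following sketch; the box format and the IoU threshold are assumptions, and other optimization algorithms could be substituted.

```python
import numpy as np

# Minimal greedy NMS sketch; the (x_min, y_min, x_max, y_max) box format and the
# IoU threshold are assumptions.
def nms(boxes, scores, iou_thr=0.5):
    """Return the indices of boxes kept after suppressing redundant overlaps."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.asarray(scores, dtype=float).argsort()[::-1]   # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(xx2 - xx1, 0) * np.maximum(yy2 - yy1, 0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_rest - inter)
        order = rest[iou <= iou_thr]                  # drop boxes that overlap too much
    return keep
```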
It can be seen that, in this embodiment, a second detection network for fine detection is added on the basis of the coarse detection performed by the first detection network. The second detection network detects the centrality of each pixel point in the target clipping image; when the centrality of a pixel point is greater than the preset centrality threshold, the pixel point is taken as a target pixel point representing the center point of a target object to be detected in the target clipping image. The number of target objects (i.e., target second detection frames) can be determined from the number of such center points, and each corresponding target second detection frame can then be located by combining the offsets between the target pixel point and the boundaries of that frame. In this way, each target object in the target clipping image can be accurately located, which solves the problem that adjacent objects are difficult to distinguish accurately.
Example 2
The embodiment is a further improvement of embodiment 1, and as shown in fig. 3, the embodiment specifically defines the training process of the first detection network as follows:
s21, a first sample set is obtained, and the first sample set comprises a plurality of first sample images and a first detection frame gold standard.
In the present embodiment, the first sample image may be acquired from a PACS (Picture Archiving and Communication Systems) or may be acquired in real time from an image capturing apparatus.
Alternatively, the first sample image may be a Computed Tomography (CT) image, a Magnetic Resonance Imaging (MRI) image, a low-dose Positron Emission Tomography (PET) image or other modality image, and the modality of the first sample image is not particularly limited, but it should be understood that the modality of the first sample image should be consistent with the modality of the image to be detected.
And S22, inputting the first sample image into a preset first detection network CNN1 to obtain the central position, the size and the confidence of the first prediction detection frame.
For example, when the first sample image is a 2D image, the first detection network CNN1 calculates the center position coordinates (x, y), the two-dimensional size (w, h), and the confidence p of each first prediction detection frame, and outputs, as the prediction result, all first prediction detection frames whose confidence p is greater than the confidence threshold p0. When the first sample image is a 3D image, the first detection network CNN1 obtains the center position coordinates (x, y, z), the three-dimensional size (w, h, d), and the confidence p of each first prediction detection frame, and outputs, as the prediction result, all first prediction detection frames whose confidence p is greater than the confidence threshold p0.
Optionally, before inputting the first sample image into the first detection network CNN1, the present embodiment further includes preprocessing the first sample image. Wherein the preprocessing process is consistent with the preprocessing process of the image to be detected.
S23, calculating a first model loss according to the center position, size and confidence of the first prediction detection frame output by the first detection network CNN1 and the corresponding first detection frame gold standard.
In this embodiment, manually labeled first detection frame gold criteria (including center position, size, confidence) are used as the gold criteria for the first detection network to calculate the corresponding first model loss.
And S24, performing iterative training on the first detection network according to the first model loss until the first model loss converges or a preset iteration number is reached.
In this embodiment, the function of the first model penalty depends on the specific structure of the first detection network.
The first detection network obtained through training of the embodiment can accurately obtain the central position, the size and the confidence coefficient of the first prediction detection frame in the image to be detected.
Example 3
This example is a further modification of example 1 or 2. As shown in fig. 4, this embodiment specifically defines the training process of the second detection network as follows:
and S31, acquiring a second sample set, wherein the second sample set comprises a plurality of second sample images and a second detection frame gold standard, the second detection frame gold standard comprises a second labeling detection frame labeled in the second sample images, and at least part of the second sample images are labeled with two or more adjacent second labeling detection frames.
In this embodiment, the second sample image may be an image obtained by rectangular-clipping the feature map of the first sample image. When the first sample image is cropped, a single target object in the first sample image may be cropped, a plurality of adjacent target objects in the first sample image may be cropped, or a blank area (i.e., an area not containing a target object) in the first sample image may be cropped, so that the second sample image may include a single target object, or a plurality of adjacent target objects, or no target object, thereby improving the robustness of the second detection network.
And S32, inputting the second sample image into a preset second detection network to obtain the prediction centrality of each pixel point in the second sample image and the prediction offset between each target pixel point and each boundary of the target second prediction detection frame, wherein the prediction centrality is the prediction probability that the corresponding pixel point is the central point of the second prediction detection frame, and the target pixel point in the second sample image is the pixel point of which the prediction centrality is greater than the preset centrality threshold.
And S33, calculating a second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the target second prediction detection frame to which the target pixel point belongs, and the corresponding second detection frame gold standard. The specific implementation process is as follows:
and S331, obtaining the second detection frame gold standard, wherein the second detection frame gold standard comprises a plurality of second labeling detection frames corresponding to the second sample image.
And S332, calculating the standard centrality of each pixel point in the second sample image based on the plurality of second labeling detection frames, wherein the standard centrality is the standard probability that the corresponding pixel point is the central point of the second labeling detection frame to which the corresponding pixel point belongs.
Specifically, when the second sample image is a 2D image, the standard centrality C of each pixel point in the second annotation detection frame in the second sample image is calculated by the following formula:
C = \sqrt{\frac{\min(l, r)}{\max(l, r)} \cdot \frac{\min(u, d)}{\max(u, d)}}
wherein l, r, u and d respectively represent the distances between the corresponding pixel point and the left, right, upper and lower boundaries of the second labeling detection frame to which it belongs.
When the second sample image is a 3D image, calculating the standard centrality C of each pixel point in the second label detection frame in the second sample image according to the following formula:
C = \sqrt[3]{\frac{\min(l, r)}{\max(l, r)} \cdot \frac{\min(u, d)}{\max(u, d)} \cdot \frac{\min(f, b)}{\max(f, b)}}
wherein l, r, u, d, f and b respectively represent the distances between the corresponding pixel point and the left, right, upper, lower, front and rear boundaries of the second labeling detection frame to which it belongs.
And when the second sample image is a 2D or 3D image, the standard centrality C of each pixel point outside the second labeling detection frame in the second sample image is 0.
When a certain pixel point in the second sample image is located in n (n greater than 1) second labeling detection frames simultaneously, the centrality C of the pixel point is C = max(C_1, C_2, …, C_n), where C_i represents the centrality of the pixel point obtained based on the i-th second labeling detection frame, i.e., the probability that the pixel point is the center point of the i-th second labeling detection frame, and max(C_1, C_2, …, C_n) denotes the maximum of C_1, C_2, …, C_n.
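A sketch of computing this standard-centrality target for one pixel of a 2D sample is shown below; it follows the square-root centerness form given by the formula above (itself a reconstruction), handles overlapping boxes by taking the maximum, and returns 0 for pixels outside every box. The function name and box format are illustrative assumptions.

```python
import numpy as np

# Standard-centrality target for one 2D pixel, following the square-root centerness
# form used above (an assumption). Overlapping boxes are handled by taking the maximum,
# and pixels outside every box get centrality 0.
def standard_centrality_2d(y, x, boxes):
    """boxes: iterable of (x_min, y_min, x_max, y_max) second labeling detection frames."""
    best = 0.0
    for x_min, y_min, x_max, y_max in boxes:
        l, r = x - x_min, x_max - x                   # distances to the left/right boundaries
        u, d = y - y_min, y_max - y                   # distances to the upper/lower boundaries
        if min(l, r, u, d) < 0:                       # pixel lies outside this labeled box
            continue
        c = np.sqrt((min(l, r) / max(l, r)) * (min(u, d) / max(u, d)))
        best = max(best, c)                           # take the maximum over overlapping boxes
    return best
```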
S333, calculating the standard offset between each target pixel point and each boundary of the second labeling detection frame.
Specifically, when the centrality of a certain pixel point is greater than the preset centrality threshold c0, that pixel point is a target pixel point, and the standard offsets can be obtained by calculating the distances between the pixel point and each boundary of the corresponding second labeling detection frame.
And S334, calculating the second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the second prediction detection frame of the target to which the pixel point belongs, the standard centrality of each pixel point, and the standard offset between each target pixel point and each boundary of the second labeling detection frame of the target to which the pixel point belongs.
In this embodiment, the second model loss may be L1, L2, or other regression loss.
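A hedged sketch of such a loss in PyTorch is shown below; the tensor layouts, the equal weighting of the two terms, and the use of L1 throughout are assumptions rather than the patent's exact formulation.

```python
import torch
import torch.nn.functional as F

# Tensor layouts, equal weighting of the two terms, and the use of L1 throughout
# are assumptions, not the patent's exact formulation.
def second_model_loss(pred_centrality, pred_offsets, std_centrality, std_offsets,
                      centrality_thr=0.5):
    """pred/std_centrality: (B, 1, H, W); pred/std_offsets: (B, 4, H, W)."""
    centrality_loss = F.l1_loss(pred_centrality, std_centrality)
    # Regress offsets only at target pixels, i.e. where the standard centrality is high.
    target_mask = (std_centrality > centrality_thr).expand_as(pred_offsets)
    if target_mask.any():
        offset_loss = F.l1_loss(pred_offsets[target_mask], std_offsets[target_mask])
    else:
        offset_loss = pred_offsets.sum() * 0.0        # no target pixels in this batch
    return centrality_loss + offset_loss
```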
And S34, performing iterative training on the second detection network according to the second model loss until the second model loss converges or a preset number of iterations is reached.
The second detection network obtained through training of the embodiment can accurately position the position of the target second prediction detection frame in the image to be detected.
Example 4
The present embodiment provides a target detection system based on deep learning, as shown in fig. 5, the system includes: the system comprises an image acquisition module 11, a first detection network processing module 12, a feature extraction module 13, a cutting module 14, a second detection network processing module 15 and an object detection module 16. The functions of the above modules are described in detail below:
the image acquisition module 11 is used for acquiring an image to be detected.
In the present embodiment, the image to be detected may be acquired from a PACS (Picture Archiving and Communication Systems), or may be acquired in real time from an image acquisition device.
Alternatively, the image to be detected may be a Computed Tomography (CT) image, a Magnetic Resonance Imaging (MRI) image, a low-dose Positron Emission Tomography (PET) image, or other modality images, and the modality of the image to be detected is not particularly limited in this embodiment.
The first detection network processing module 12 is configured to process the image to be detected through a pre-trained first detection network to obtain a first target detection frame.
In this embodiment, the first detection network CNN1 may be Fast-RCNN, Retina-Net, Yolo, or any other suitable one-stage or multi-stage detection network. The image to be detected is input into the first detection network CNN1 for coarse detection, yielding the central positions, sizes and confidences of a plurality of first detection frames; when the confidence of a first detection frame is greater than a preset confidence threshold, that first detection frame is determined to be the target first detection frame, as shown in fig. 2. The target first detection frame may contain more than one adjacent target object.
Optionally, before the image to be detected is input into the first detection network CNN1, this embodiment further includes preprocessing the image to be detected, for example window width/level adjustment, pixel normalization and Gaussian filtering, so as to reduce the interference of noise on the network, make the image features more distinct, and reduce the difficulty of learning.
The feature extraction module 13 is configured to extract a feature map of the image to be detected.
In this embodiment, a multi-scale feature map of the image to be detected can be extracted through a plurality of residual blocks connected by symmetric skip connections, so as to ensure effective extraction of both shallow and deep image features.
And the cropping module 14 is used for cropping the feature map of the image to be detected to obtain a target cut image containing the target first detection frame.
In this embodiment, the cropping manner includes, but is not limited to, ROI Pooling or ROI Align. Specifically, the feature map of the image to be detected is cropped by taking the central position of the target first detection frame as the center and m times the size of the target first detection frame as the cropping size, wherein m ≥ 1. The resulting target cut image is shown in fig. 2, for example.
The second detection network processing module 15 is configured to input the target clipping image into a pre-trained second detection network to obtain the centrality of each pixel point in the target clipping image and the offset between each target pixel point and each boundary of the target second detection frame to which it belongs, where the centrality (a value between 0 and 1) is the probability that the corresponding pixel point is the central point of the second detection frame, and a target pixel point is a pixel point whose centrality is greater than a preset centrality threshold.
When the target clipping image is input into the second detection network CNN2, as shown in fig. 2, a centrality map of the target clipping image is obtained, together with the offset (i.e., the number of offset pixels) between each target pixel point and each boundary of the second detection frame to which it belongs.
Preferably, before the target trimming image is input to the second detection network, the target trimming image may be subjected to interpolation processing in advance so that the resolution thereof is the same as that of the image to be detected.
The object detection module 16 is configured to determine each of the object second detection frames according to an offset between each of the object pixel points and each of the boundaries of the object second detection frame to which the object second detection frame belongs.
For example, when the centrality of pixel points A and B in the target clipping image is greater than the preset centrality threshold, it is determined that there are two separate target objects in the target clipping image, with pixel points A and B as their centers. The positions of the boundaries of the two target second detection frames can then be determined from the coordinate positions of pixel points A and B and the offsets between each of them and the boundaries of the target second detection frame to which it belongs, so that adjacent target objects are accurately distinguished.
Preferably, this embodiment may further include removing redundantly overlapping target second detection frames from the determined target second detection frames. In particular, redundant overlapping detection frames may be eliminated using NMS (non-maximum suppression), a greedy algorithm, or another optimization algorithm.
It can be seen that, in this embodiment, a second detection network for fine detection is added on the basis of the coarse detection performed by the first detection network. The second detection network detects the centrality of each pixel point in the target clipping image; when the centrality of a pixel point is greater than the preset centrality threshold, the pixel point is taken as a target pixel point representing the center point of a target object to be detected in the target clipping image. The number of target objects (i.e., target second detection frames) can be determined from the number of such center points, and each corresponding target second detection frame can then be located by combining the offsets between the target pixel point and the boundaries of that frame. In this way, each target object in the target clipping image can be accurately located, which solves the problem that adjacent objects are difficult to distinguish accurately.
The training procedure of the first detection network and the second detection network in this embodiment is described with reference to embodiments 2 and 3.
Example 5
The present embodiment provides an electronic device, which may be represented in the form of a computing device (for example, may be a server device), including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor may implement the deep learning based object detection method provided in embodiments 1, 2, or 3 when executing the computer program.
Fig. 6 shows a schematic diagram of a hardware structure of the present embodiment, and as shown in fig. 6, the electronic device 9 specifically includes:
at least one processor 91, at least one memory 92, and a bus 93 for connecting the various system components (including the processor 91 and the memory 92), wherein:
the bus 93 includes a data bus, an address bus, and a control bus.
Memory 92 includes volatile memory, such as Random Access Memory (RAM)921 and/or cache memory 922, and can further include Read Only Memory (ROM) 923.
Memory 92 also includes a program/utility 925 having a set (at least one) of program modules 924, such program modules 924 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The processor 91 executes various functional applications and data processing, such as the target detection method based on deep learning provided in embodiments 1, 2, or 3 of the present invention, by running the computer program stored in the memory 92.
The electronic device 9 may further communicate with one or more external devices 94 (e.g., a keyboard, a pointing device, etc.). Such communication may be through an input/output (I/O) interface 95. Also, the electronic device 9 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 96. The network adapter 96 communicates with the other modules of the electronic device 9 via the bus 93. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 9, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems, etc.
It should be noted that although in the above detailed description several units/modules or sub-units/modules of the electronic device are mentioned, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module, according to embodiments of the application. Conversely, the features and functions of one unit/module described above may be further divided into embodiments by a plurality of units/modules.
Example 6
The present embodiment provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the steps of the deep learning-based object detection method provided in embodiments 1, 2 or 3.
More specific examples, among others, that the readable storage medium may employ may include, but are not limited to: a portable disk, a hard disk, random access memory, read only memory, erasable programmable read only memory, optical storage device, magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation, the present invention can also be implemented in the form of a program product including program code for causing a terminal device to perform the steps of implementing the deep learning based object detection method described in embodiment 1, 2 or 3 when the program product is run on the terminal device.
Where program code for carrying out the invention is written in any combination of one or more programming languages, the program code may be executed entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device or entirely on the remote device.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims (10)

1. A target detection method based on deep learning is characterized by comprising the following steps:
acquiring an image to be detected;
processing the image to be detected through a pre-trained first detection network to obtain a target first detection frame;
extracting a feature map of the image to be detected;
cropping the feature map of the image to be detected to obtain a target cut image containing the target first detection frame;
inputting the target clipping image into a pre-trained second detection network to obtain the centrality of each pixel point in the target clipping image and the offset between each target pixel point and each boundary of a second target detection frame, wherein the centrality is the probability that the corresponding pixel point is the central point of the second target detection frame, and the target pixel point is the pixel point with the centrality larger than a preset centrality threshold;
and determining each target second detection frame according to the offset between each target pixel point and each boundary of the target second detection frame.
2. The method for detecting the target of claim 1, wherein the processing the image to be detected through the pre-trained first detection network to obtain the first target detection frame comprises:
inputting the image to be detected into the first detection network to obtain the central position, the size and the confidence coefficient of a plurality of first detection frames;
and when the confidence coefficient of a certain first detection frame is greater than a preset confidence coefficient threshold value, determining that the first detection frame is the target first detection frame.
3. The object detection method of claim 1, wherein the cropping the feature map of the image to be detected to obtain an object cropped image including the first object detection frame comprises:
and cropping the feature map of the image to be detected by taking the central position of the first target detection frame as the center and m times the size of the first target detection frame as the cropping size, wherein m ≥ 1.
4. The object detection method of claim 1, further comprising:
removing the overlapped target second detection frame from each of the target second detection frames.
5. The object detection method of claim 1, wherein the training process of the first detection network is as follows:
acquiring a first sample set, wherein the first sample set comprises a plurality of first sample images and a first detection frame gold standard;
inputting the first sample image into a preset first detection network to obtain the central position and the size of a first prediction detection frame;
calculating a first model loss according to the central position and the size of the first prediction detection frame and the corresponding first detection frame gold standard;
and training the first detection network according to the first model loss.
6. The object detection method of claim 1, wherein the training process of the second detection network is as follows:
acquiring a second sample set, wherein the second sample set comprises a plurality of second sample images and a second detection frame gold standard, the second detection frame gold standard comprises a second labeling detection frame labeled in the second sample images, and at least part of the second sample images are labeled with two or more adjacent second labeling detection frames;
inputting the second sample image into a preset second detection network to obtain the prediction centrality of each pixel point in the second sample image and the prediction offset between each target pixel point and each boundary of the target second prediction detection frame, wherein the prediction centrality is the prediction probability that the corresponding pixel point is the central point of the second prediction detection frame, and the target pixel point in the second sample image is the pixel point of which the prediction centrality is greater than the preset centrality threshold;
calculating a second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the target second prediction detection frame to which the target pixel point belongs, and a corresponding second detection frame gold standard;
and training the second detection network according to the second model loss.
7. The method of claim 6, wherein the calculating a second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the target second prediction detection frame to which it belongs, and the corresponding second detection frame gold standard comprises:
calculating the standard centrality of each pixel point in the second sample image based on the second labeling detection frame, wherein the standard centrality is the standard probability that the corresponding pixel point is the central point of the second labeling detection frame;
calculating the standard offset between each target pixel point and each boundary of the second labeling detection frame to which the target pixel point belongs;
and calculating the second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the second target prediction detection frame, the standard centrality of each pixel point and the standard offset between each target pixel point and each boundary of the second target labeling detection frame.
8. The method of claim 7, wherein the calculating the standard centrality of each pixel point in the second sample image based on the second annotation detection box comprises:
when the second sample image is a 2D image, calculating the standard centrality C of each pixel point in the second label detection frame in the second sample image according to the following formula:
C = \sqrt{\frac{\min(l, r)}{\max(l, r)} \cdot \frac{\min(u, d)}{\max(u, d)}}
wherein l, r, u and d respectively represent the distances between the corresponding pixel point and the left, right, upper and lower boundaries of the second labeling detection frame to which it belongs;
when the second sample image is a 3D image, calculating the standard centrality C of each pixel point in the second label detection frame in the second sample image according to the following formula:
C = \sqrt[3]{\frac{\min(l, r)}{\max(l, r)} \cdot \frac{\min(u, d)}{\max(u, d)} \cdot \frac{\min(f, b)}{\max(f, b)}}
wherein l, r, u, d, f and b respectively represent the distances between the corresponding pixel point and the left, right, upper, lower, front and rear boundaries of the second labeling detection frame to which it belongs;
when the second sample image is a 2D or 3D image, the standard centrality C of each pixel point outside the second label detection frame in the second sample image is 0;
when a certain pixel point in the second sample image is located in n second labeling detection frames simultaneously, with n greater than 1, the centrality C of the pixel point is C = max(C_1, C_2, …, C_n), where C_i represents the centrality of the pixel point obtained based on the i-th second labeling detection frame.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 8 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
CN202110518366.8A 2021-05-12 2021-05-12 Target detection method, device and medium based on deep learning Active CN113240638B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110518366.8A CN113240638B (en) 2021-05-12 2021-05-12 Target detection method, device and medium based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110518366.8A CN113240638B (en) 2021-05-12 2021-05-12 Target detection method, device and medium based on deep learning

Publications (2)

Publication Number Publication Date
CN113240638A true CN113240638A (en) 2021-08-10
CN113240638B CN113240638B (en) 2023-11-10

Family

ID=77133732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110518366.8A Active CN113240638B (en) 2021-05-12 2021-05-12 Target detection method, device and medium based on deep learning

Country Status (1)

Country Link
CN (1) CN113240638B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658273A (en) * 2021-08-19 2021-11-16 上海新氦类脑智能科技有限公司 Scene self-adaptive target positioning method and system based on spatial perception

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190272438A1 (en) * 2018-01-30 2019-09-05 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for detecting text
CN111091091A (en) * 2019-12-16 2020-05-01 北京迈格威科技有限公司 Method, device and equipment for extracting target object re-identification features and storage medium
WO2020164282A1 (en) * 2019-02-14 2020-08-20 平安科技(深圳)有限公司 Yolo-based image target recognition method and apparatus, electronic device, and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190272438A1 (en) * 2018-01-30 2019-09-05 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for detecting text
WO2020164282A1 (en) * 2019-02-14 2020-08-20 平安科技(深圳)有限公司 Yolo-based image target recognition method and apparatus, electronic device, and storage medium
CN111091091A (en) * 2019-12-16 2020-05-01 北京迈格威科技有限公司 Method, device and equipment for extracting target object re-identification features and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李文斌 (Li Wenbin); 何冉 (He Ran): "Aircraft target detection in remote sensing images based on deep neural networks" (基于深度神经网络的遥感图像飞机目标检测), 计算机工程 (Computer Engineering), no. 07

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658273A (en) * 2021-08-19 2021-11-16 上海新氦类脑智能科技有限公司 Scene self-adaptive target positioning method and system based on spatial perception
CN113658273B (en) * 2021-08-19 2024-04-26 上海新氦类脑智能科技有限公司 Scene self-adaptive target positioning method and system based on space perception

Also Published As

Publication number Publication date
CN113240638B (en) 2023-11-10

Similar Documents

Publication Publication Date Title
US10891473B2 (en) Method and device for use in hand gesture recognition
US7787683B2 (en) Tree structure based 2D to 3D registration
CN110363817B (en) Target pose estimation method, electronic device, and medium
CN110136153B (en) Image processing method, device and storage medium
US9349207B2 (en) Apparatus and method for parsing human body image
US20200226392A1 (en) Computer vision-based thin object detection
CN113743385A (en) Unmanned ship water surface target detection method and device and unmanned ship
CN109145752B (en) Method, apparatus, device and medium for evaluating object detection and tracking algorithms
CN111275040A (en) Positioning method and device, electronic equipment and computer readable storage medium
CN115496923B (en) Multi-mode fusion target detection method and device based on uncertainty perception
CN111696133A (en) Real-time target tracking method and system
CN112116635A (en) Visual tracking method and device based on rapid human body movement
CN114820639A (en) Image processing method, device and equipment based on dynamic scene and storage medium
CN115797929A (en) Small farmland image segmentation method and device based on double-attention machine system
CN113240638B (en) Target detection method, device and medium based on deep learning
CN114119695A (en) Image annotation method and device and electronic equipment
CN111709269B (en) Human hand segmentation method and device based on two-dimensional joint information in depth image
CN112884804A (en) Action object tracking method and related equipment
US20220155441A1 (en) Lidar localization using optical flow
CN115223173A (en) Object identification method and device, electronic equipment and storage medium
CN112184766B (en) Object tracking method and device, computer equipment and storage medium
US20220245860A1 (en) Annotation of two-dimensional images
US20180161013A1 (en) Computer-aided tracking and motion analysis with ultrasound for measuring joint kinematics
CN110717406A (en) Face detection method and device and terminal equipment
CN117078761B (en) Automatic positioning method, device, equipment and medium for slender medical instrument

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant