CN113240638A - Target detection method, device and medium based on deep learning - Google Patents

Target detection method, device and medium based on deep learning

Info

Publication number
CN113240638A
CN113240638A (application CN202110518366.8A)
Authority
CN
China
Prior art keywords
target
image
detection
pixel point
detection frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110518366.8A
Other languages
Chinese (zh)
Other versions
CN113240638B (en)
Inventor
曲国祥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai United Imaging Intelligent Healthcare Co Ltd
Original Assignee
Shanghai United Imaging Intelligent Healthcare Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai United Imaging Intelligent Healthcare Co Ltd filed Critical Shanghai United Imaging Intelligent Healthcare Co Ltd
Priority to CN202110518366.8A priority Critical patent/CN113240638B/en
Publication of CN113240638A publication Critical patent/CN113240638A/en
Application granted granted Critical
Publication of CN113240638B publication Critical patent/CN113240638B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0012Biomedical image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10072Tomographic images
    • G06T2207/10081Computed x-ray tomography [CT]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target detection method, device and medium based on deep learning, wherein the method comprises the following steps: acquiring an image to be detected; processing the image to be detected through a first detection network to obtain a target first detection frame; extracting a feature map of the image to be detected; cropping the feature map of the image to be detected to obtain a target cut image containing the target first detection frame; inputting the target cut image into a second detection network to obtain the centrality of each pixel point in the target cut image and the offset between each target pixel point and each boundary of the target second detection frame to which it belongs, wherein the centrality is the probability that the corresponding pixel point is the central point of a second detection frame, and a target pixel point is a pixel point whose centrality is greater than a preset centrality threshold; and determining each target second detection frame according to the offset between each target pixel point and each boundary of the target second detection frame to which it belongs. The invention can solve the problem that adjacent objects in an image are difficult to distinguish accurately.

Description

Target detection method, device and medium based on deep learning
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method, an apparatus, and a medium for target detection based on deep learning.
Background
The image-based computer-aided diagnosis technology mainly realizes automatic detection and identification of target structures or lesions through machine learning or deep learning. The traditional detection technology generally uses methods such as ellipse fitting, Fast-RCNN and Retina-Net to directly locate and detect a target structure or focus. However, because the human body structure is complex, and the imaging appearance of different diseases is diverse and can be influenced by other diseases and by the human anatomy, several detection targets that are close to each other are often detected as a single object, thereby causing false detection. This not only affects the statistics of the number of target objects, but can also lead to large deviations in judging the size of the target objects, which in turn indirectly influences the doctor's judgment of the patient's condition.
Taking lymph node detection as an example, the document [Liu Fang et al., "Stomach CT image lymph node detection system and method based on shape and ellipse fitting", 2013] discloses an automatic lymph node detection system comprising preprocessing, interest boundary point detection, boundary ellipse fitting, region merging, and lymph node tracking and extraction functional modules. The preprocessing module is used to preprocess the image to be detected, and the interest boundary point detection module further processes the preprocessed image to obtain interest boundary points; the boundary ellipse fitting module performs ellipse fitting on the curve formed by the interest boundary points to obtain an ellipse-like closed region; the region merging module eliminates ambiguous regions formed by intersecting ellipses; and the lymph node tracking and extraction module performs window feature matching and tracking on suspected lymph nodes to complete lymph node extraction. Although this scheme can extract lymph nodes, its generalization performance is poor, and its detection accuracy for lymph nodes with complex shapes is not high.
The document [Cao Hanqiang, Xu Zhouping, "A lymph node detection method based on an improved SegNet segmentation network", 2019] discloses a lymph node detection method that improves the SegNet segmentation network. Although this method can segment and detect lymph nodes, it tends to detect several lymph nodes that are close to each other as a single one, which causes detection errors.
Disclosure of Invention
The invention provides a target detection method, device and medium based on deep learning, aiming at solving the problem that adjacent objects in an image are difficult to distinguish accurately in the prior art.
In order to achieve the above object, the present invention provides a target detection method based on deep learning, including:
acquiring an image to be detected;
processing the image to be detected through a pre-trained first detection network to obtain a target first detection frame;
extracting a feature map of the image to be detected;
cropping the feature map of the image to be detected to obtain a target cut image containing the target first detection frame;
inputting the target clipping image into a pre-trained second detection network to obtain the centrality of each pixel point in the target clipping image and the offset between each target pixel point and each boundary of a second target detection frame, wherein the centrality is the probability that the corresponding pixel point is the central point of the second target detection frame, and the target pixel point is the pixel point with the centrality larger than a preset centrality threshold;
and determining each target second detection frame according to the offset between each target pixel point and each boundary of the target second detection frame.
In a preferred embodiment of the present invention, the processing the image to be detected through a pre-trained first detection network to obtain a first target detection frame includes:
inputting the image to be detected into the first detection network to obtain the central position, the size and the confidence coefficient of a plurality of first detection frames;
and when the confidence coefficient of a certain first detection frame is greater than a preset confidence coefficient threshold value, determining that the first detection frame is the target first detection frame.
In a preferred embodiment of the present invention, the cropping the feature map of the image to be detected to obtain an object cropped image including the first object detection frame includes:
and cropping the feature map of the image to be detected by taking the central position of the first target detection frame as the center and m times the size of the first target detection frame as the cropping size, wherein m ≥ 1.
In a preferred embodiment of the present invention, the method further comprises:
removing the overlapped target second detection frame from each of the target second detection frames.
In a preferred embodiment of the present invention, the training process of the first detection network is as follows:
acquiring a first sample set, wherein the first sample set comprises a plurality of first sample images and a first detection frame gold standard;
inputting the first sample image into a preset first detection network to obtain the central position and the size of a first prediction detection frame;
calculating a first model loss according to the central position and the size of the first prediction detection frame and the corresponding first detection frame gold standard;
and training the first detection network according to the first model loss.
In a preferred embodiment of the present invention, the training process of the second detection network is as follows:
acquiring a second sample set, wherein the second sample set comprises a plurality of second sample images and a second detection frame gold standard, the second detection frame gold standard comprises a second labeling detection frame labeled in the second sample images, and at least part of the second sample images are labeled with two or more adjacent second labeling detection frames;
inputting the second sample image into a preset second detection network to obtain the prediction centrality of each pixel point in the second sample image and the prediction offset between each target pixel point and each boundary of the target second prediction detection frame, wherein the prediction centrality is the prediction probability that the corresponding pixel point is the central point of the second prediction detection frame, and the target pixel point in the second sample image is the pixel point of which the prediction centrality is greater than the preset centrality threshold;
calculating a second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the target second prediction detection frame to which the target pixel point belongs, and a corresponding second detection frame gold standard;
and training the second detection network according to the second model loss.
In a preferred embodiment of the present invention, the calculating a second model loss according to the prediction center degree of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the target second prediction detection frame to which the target pixel point belongs, and the corresponding second detection frame gold standard includes:
acquiring a second detection frame gold standard, wherein the second detection frame gold standard comprises a plurality of second labeling detection frames corresponding to the second sample image;
calculating the standard centrality of each pixel point in the second sample image based on the plurality of second labeling detection frames, wherein the standard centrality is the standard probability that the corresponding pixel point is the central point of the second labeling detection frame to which the corresponding pixel point belongs;
calculating the standard offset between each target pixel point and each boundary of the second labeling detection frame to which the target pixel point belongs;
and calculating the second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the second target prediction detection frame, the standard centrality of each pixel point and the standard offset between each target pixel point and each boundary of the second target labeling detection frame.
In a preferred embodiment of the present invention, the calculating the standard centrality of each pixel point in the second sample image based on the plurality of second annotation detection frames includes:
when the second sample image is a 2D image, calculating the standard centrality C of each pixel point in the second label detection frame in the second sample image according to the following formula:
C = \sqrt{\frac{\min(l, r)}{\max(l, r)} \cdot \frac{\min(u, d)}{\max(u, d)}}
wherein l, r, u and d respectively represent the distances between the corresponding pixel point and the left, right, upper and lower boundaries of the second labeling detection frame to which it belongs;
when the second sample image is a 3D image, calculating the standard centrality C of each pixel point in the second label detection frame in the second sample image according to the following formula:
C = \sqrt[3]{\frac{\min(l, r)}{\max(l, r)} \cdot \frac{\min(u, d)}{\max(u, d)} \cdot \frac{\min(f, b)}{\max(f, b)}}
wherein l, r, u, d, f and b respectively represent the distances between the corresponding pixel point and the left, right, upper, lower, front and rear boundaries of the second labeling detection frame to which it belongs;
when the second sample image is a 2D or 3D image, the standard centrality C of each pixel point outside the second label detection frame in the second sample image is 0;
when a certain pixel point in the second sample image is located in n second labeling detection frames simultaneously, with n greater than 1, the centrality C of the pixel point is C = max(C_1, C_2, …, C_n), where C_i represents the centrality of the pixel point obtained based on the i-th second labeling detection frame.
In order to achieve the above object, the present invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the aforementioned method when executing the computer program.
In order to achieve the above object, the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the aforementioned method.
By adopting the technical scheme, the invention has the following beneficial effects:
Firstly, the image to be detected is processed through a pre-trained first detection network to obtain a target first detection frame. Then a feature map of the image to be detected is extracted and cropped to obtain a target cut image containing the target first detection frame. Finally, the target cut image is input into a pre-trained second detection network to obtain the centrality of each pixel point in the target cut image and the offset between each target pixel point and each boundary of the target second detection frame to which it belongs, and each target second detection frame is determined according to these offsets. Thus, the invention adds a second detection network for fine detection on the basis of the coarse detection performed by the first detection network. The second detection network detects the centrality of each pixel point in the target cut image; when the centrality of a pixel point is greater than a preset centrality threshold, the pixel point is taken as a target pixel point representing the center point of a corresponding target object to be detected in the target cut image. The number of target objects (i.e., target second detection frames) can be determined from the number of such center points, and each corresponding target second detection frame can then be located by combining the offsets between the target pixel point and the boundaries of that frame.
Drawings
Fig. 1 is a schematic flowchart of a deep learning-based target detection method according to embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of lymph node detection by the deep learning-based target detection method according to embodiment 1 of the present invention;
fig. 3 is a schematic flowchart of training a first detection network according to embodiment 2 of the present invention;
fig. 4 is a schematic flowchart of training a second detection network according to embodiment 3 of the present invention;
fig. 5 is a block diagram of a deep learning-based target detection system according to embodiment 4 of the present invention;
fig. 6 is a hardware architecture diagram of an electronic device according to embodiment 5 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
Example 1
The present embodiment provides a target detection method based on deep learning, as shown in fig. 1 and fig. 2, the method specifically includes the following steps:
and S11, acquiring an image to be detected.
In the present embodiment, the image to be detected may be acquired from a PACS (Picture Archiving and Communication Systems), or may be acquired in real time from an image acquisition device.
Alternatively, the image to be detected may be a Computed Tomography (CT) image, a Magnetic Resonance Imaging (MRI) image, a low-dose Positron Emission Tomography (PET) image, or other modality images, and the modality of the image to be detected is not particularly limited in this embodiment.
And S12, processing the image to be detected through a pre-trained first detection network to obtain a first target detection frame.
In this embodiment, the first detection network CNN1 may be Fast-RCNN, Retina-Net, Yolo, or any other suitable one-stage or multi-stage detection network. The image to be detected is input into the first detection network CNN1 for coarse detection, yielding the central positions, sizes and confidences of a plurality of first detection frames; when the confidence of a first detection frame is greater than a preset confidence threshold, that first detection frame is determined to be the target first detection frame, as shown in fig. 2. The target first detection frame may contain more than one adjacent target object.
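By way of non-limiting illustration, the confidence-thresholding step can be sketched in Python as follows; the function name, array layout and threshold value are assumptions for illustration and are not taken from the patent.

```python
import numpy as np

# Hypothetical names and layout: centers/sizes/scores arrays and conf_thr are
# illustrative assumptions, not identifiers from the patent.
def select_target_first_boxes(centers, sizes, scores, conf_thr=0.5):
    """Keep only the first detection frames whose confidence exceeds the threshold."""
    keep = scores > conf_thr                      # boolean mask over all coarse detections
    return centers[keep], sizes[keep], scores[keep]

# Example: three coarse 2D detections, two of which pass the threshold.
centers = np.array([[64.0, 80.0], [120.0, 40.0], [30.0, 30.0]])
sizes = np.array([[32.0, 24.0], [28.0, 28.0], [16.0, 16.0]])
scores = np.array([0.92, 0.67, 0.31])
target_centers, target_sizes, target_scores = select_target_first_boxes(centers, sizes, scores)
```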
Optionally, before the image to be detected is input into the first detection network CNN1, this embodiment further includes preprocessing the image to be detected, for example window width/level adjustment, pixel normalization and Gaussian filtering, so as to reduce the interference of noise on the network, make the image features more distinct, and reduce the difficulty of learning.
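A minimal preprocessing sketch along these lines is shown below; the window center, window width and Gaussian sigma are illustrative assumptions rather than the patent's settings.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Window center/width and sigma are illustrative assumptions, not the patent's settings.
def preprocess_image(image, window_center=40.0, window_width=400.0, sigma=1.0):
    """Window width/level adjustment, pixel normalization to [0, 1], and Gaussian filtering."""
    low = window_center - window_width / 2.0
    high = window_center + window_width / 2.0
    windowed = np.clip(image, low, high)              # window width/level adjustment
    normalized = (windowed - low) / (high - low)      # pixel normalization
    return gaussian_filter(normalized, sigma=sigma)   # Gaussian filtering to suppress noise
```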
And S13, extracting a feature map of the image to be detected.
In this embodiment, a multi-scale feature map of the image to be detected can be extracted through a plurality of residual blocks connected by symmetric skip connections, so as to ensure effective extraction of both shallow and deep image features.
And S14, cropping the feature map of the image to be detected to obtain a target cut image containing the target first detection frame.
In this embodiment, the cropping manner includes, but is not limited to, ROI Pooling or ROI Align. Specifically, the feature map of the image to be detected is cropped by taking the central position of the target first detection frame as the center and m times the size of the target first detection frame as the cropping size, wherein m ≥ 1. The resulting target cut image is shown in fig. 2, for example.
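The following simplified 2D sketch illustrates such a crop on a feature map using a plain index window rather than ROI Pooling/ROI Align; the function name and the choice m = 1.5 are assumptions.

```python
import numpy as np

# Simplified index-window crop on a (C, H, W) feature map; a real implementation would
# use ROI Pooling or ROI Align. The function name and m = 1.5 are assumptions.
def crop_around_box(feature_map, center, size, m=1.5):
    """Crop an m-times-enlarged window centered on the target first detection frame."""
    cy, cx = center
    h, w = size[0] * m, size[1] * m
    y0, y1 = int(round(cy - h / 2)), int(round(cy + h / 2))
    x0, x1 = int(round(cx - w / 2)), int(round(cx + w / 2))
    y0, x0 = max(y0, 0), max(x0, 0)                   # clamp the window to the image bounds
    return feature_map[:, y0:y1, x0:x1]
```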
And S15, inputting the target clipping image into a pre-trained second detection network to obtain the centrality of each pixel point in the target clipping image and the offset between each target pixel point and each boundary of the target second detection frame to which it belongs, wherein the centrality (a value between 0 and 1) is the probability that the corresponding pixel point is the central point of the second detection frame, and a target pixel point is a pixel point whose centrality is greater than a preset centrality threshold.
When the target clipping image is input into the second detection network CNN2, as shown in fig. 2, a centrality map of the target clipping image is obtained, together with the offset (i.e., the number of offset pixels) between each target pixel point and each boundary of the second detection frame to which it belongs.
Preferably, before the target trimming image is input to the second detection network, the target trimming image may be subjected to interpolation processing in advance so that the resolution thereof is the same as that of the image to be detected.
S16, determining each target second detection frame according to the offset between each target pixel point and each boundary of the target second detection frame.
For example, when the centrality of pixel points A and B in the target clipping image is greater than the preset centrality threshold, it is determined that there are two separate target objects in the target clipping image, with pixel points A and B as their centers. The positions of the boundaries of the two target second detection frames can then be determined from the coordinate positions of pixel points A and B and the offsets between each of them and the boundaries of the target second detection frame to which it belongs, so that adjacent target objects are accurately distinguished.
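A minimal sketch of this decoding step for the 2D case is given below; the array layouts, names and threshold value are illustrative assumptions.

```python
import numpy as np

# Decoding sketch for the 2D case. Assumed layouts: centrality is an (H, W) map of
# per-pixel center probabilities; offsets is a (4, H, W) array of distances to the
# left, right, upper and lower boundaries. The threshold value is illustrative.
def decode_second_boxes(centrality, offsets, centrality_thr=0.5):
    """Return one (x_min, y_min, x_max, y_max) box per pixel above the centrality threshold."""
    boxes = []
    ys, xs = np.where(centrality > centrality_thr)    # candidate center points
    for y, x in zip(ys, xs):
        l, r, u, d = offsets[:, y, x]
        boxes.append((x - l, y - u, x + r, y + d))
    return boxes
```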
Preferably, the method of this embodiment may further include removing redundantly overlapping target second detection frames from the determined target second detection frames. In particular, redundant overlapping detection frames may be eliminated using NMS (non-maximum suppression), a greedy algorithm, or another optimization algorithm.
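For illustration, a plain greedy NMS over 2D boxes might look like the following sketch; the box format and the IoU threshold are assumptions, and other optimization algorithms could be substituted.

```python
import numpy as np

# Minimal greedy NMS sketch; the (x_min, y_min, x_max, y_max) box format and the
# IoU threshold are assumptions.
def nms(boxes, scores, iou_thr=0.5):
    """Return the indices of boxes kept after suppressing redundant overlaps."""
    boxes = np.asarray(boxes, dtype=float)
    order = np.asarray(scores, dtype=float).argsort()[::-1]   # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.maximum(xx2 - xx1, 0) * np.maximum(yy2 - yy1, 0)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_rest = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_rest - inter)
        order = rest[iou <= iou_thr]                  # drop boxes that overlap too much
    return keep
```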
It can be seen that, in this embodiment, a second detection network for fine detection is added on the basis of the coarse detection performed by the first detection network. The second detection network detects the centrality of each pixel point in the target clipping image; when the centrality of a pixel point is greater than the preset centrality threshold, the pixel point is taken as a target pixel point representing the center point of a target object to be detected in the target clipping image. The number of target objects (i.e., target second detection frames) can be determined from the number of such center points, and each corresponding target second detection frame can then be located by combining the offsets between the target pixel point and the boundaries of that frame. In this way, each target object in the target clipping image can be accurately located, which solves the problem that adjacent objects are difficult to distinguish accurately.
Example 2
The embodiment is a further improvement of embodiment 1, and as shown in fig. 3, the embodiment specifically defines the training process of the first detection network as follows:
s21, a first sample set is obtained, and the first sample set comprises a plurality of first sample images and a first detection frame gold standard.
In the present embodiment, the first sample image may be acquired from a PACS (Picture Archiving and Communication Systems) or may be acquired in real time from an image capturing apparatus.
Alternatively, the first sample image may be a Computed Tomography (CT) image, a Magnetic Resonance Imaging (MRI) image, a low-dose Positron Emission Tomography (PET) image or other modality image, and the modality of the first sample image is not particularly limited, but it should be understood that the modality of the first sample image should be consistent with the modality of the image to be detected.
And S22, inputting the first sample image into a preset first detection network CNN1 to obtain the central position, the size and the confidence of the first prediction detection frame.
For example, when the first sample image is a 2D image, the first detection network CNN1 calculates the center position coordinates (x, y), the two-dimensional size (w, h), and the confidence p of each first prediction detection frame, and outputs, as the prediction result, all first prediction detection frames whose confidence p is greater than the confidence threshold p0. When the first sample image is a 3D image, the first detection network CNN1 obtains the center position coordinates (x, y, z), the three-dimensional size (w, h, d), and the confidence p of each first prediction detection frame, and outputs, as the prediction result, all first prediction detection frames whose confidence p is greater than the confidence threshold p0.
Optionally, before inputting the first sample image into the first detection network CNN1, the present embodiment further includes preprocessing the first sample image. Wherein the preprocessing process is consistent with the preprocessing process of the image to be detected.
S23, calculating a first model loss according to the center position, size and confidence of the first prediction detection frame output by the first detection network CNN1 and the corresponding first detection frame gold standard.
In this embodiment, manually labeled first detection frame gold criteria (including center position, size, confidence) are used as the gold criteria for the first detection network to calculate the corresponding first model loss.
And S24, performing iterative training on the first detection network according to the first model loss until the first model loss converges or a preset iteration number is reached.
In this embodiment, the function of the first model penalty depends on the specific structure of the first detection network.
The first detection network obtained through training of the embodiment can accurately obtain the central position, the size and the confidence coefficient of the first prediction detection frame in the image to be detected.
Example 3
This example is a further modification of example 1 or 2. As shown in fig. 4, this embodiment specifically defines the training process of the second detection network as follows:
and S31, acquiring a second sample set, wherein the second sample set comprises a plurality of second sample images and a second detection frame gold standard, the second detection frame gold standard comprises a second labeling detection frame labeled in the second sample images, and at least part of the second sample images are labeled with two or more adjacent second labeling detection frames.
In this embodiment, the second sample image may be an image obtained by rectangular-clipping the feature map of the first sample image. When the first sample image is cropped, a single target object in the first sample image may be cropped, a plurality of adjacent target objects in the first sample image may be cropped, or a blank area (i.e., an area not containing a target object) in the first sample image may be cropped, so that the second sample image may include a single target object, or a plurality of adjacent target objects, or no target object, thereby improving the robustness of the second detection network.
And S32, inputting the second sample image into a preset second detection network to obtain the prediction centrality of each pixel point in the second sample image and the prediction offset between each target pixel point and each boundary of the target second prediction detection frame, wherein the prediction centrality is the prediction probability that the corresponding pixel point is the central point of the second prediction detection frame, and the target pixel point in the second sample image is the pixel point of which the prediction centrality is greater than the preset centrality threshold.
And S33, calculating a second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the target second prediction detection frame to which the target pixel point belongs, and the corresponding second detection frame gold standard. The specific implementation process is as follows:
and S331, obtaining the second detection frame gold standard, wherein the second detection frame gold standard comprises a plurality of second labeling detection frames corresponding to the second sample image.
And S332, calculating the standard centrality of each pixel point in the second sample image based on the plurality of second labeling detection frames, wherein the standard centrality is the standard probability that the corresponding pixel point is the central point of the second labeling detection frame to which the corresponding pixel point belongs.
Specifically, when the second sample image is a 2D image, the standard centrality C of each pixel point in the second annotation detection frame in the second sample image is calculated by the following formula:
C = \sqrt{\frac{\min(l, r)}{\max(l, r)} \cdot \frac{\min(u, d)}{\max(u, d)}}
wherein l, r, u and d respectively represent the distances between the corresponding pixel point and the left, right, upper and lower boundaries of the second labeling detection frame to which it belongs.
When the second sample image is a 3D image, calculating the standard centrality C of each pixel point in the second label detection frame in the second sample image according to the following formula:
C = \sqrt[3]{\frac{\min(l, r)}{\max(l, r)} \cdot \frac{\min(u, d)}{\max(u, d)} \cdot \frac{\min(f, b)}{\max(f, b)}}
wherein l, r, u, d, f and b respectively represent the distances between the corresponding pixel point and the left, right, upper, lower, front and rear boundaries of the second labeling detection frame to which it belongs.
And when the second sample image is a 2D or 3D image, the standard centrality C of each pixel point outside the second labeling detection frame in the second sample image is 0.
When a certain pixel point in the second sample image is located in n (n greater than 1) second labeling detection frames simultaneously, the centrality C of the pixel point is C = max(C_1, C_2, …, C_n), where C_i represents the centrality of the pixel point obtained based on the i-th second labeling detection frame, i.e., the probability that the pixel point is the center point of the i-th second labeling detection frame, and max(C_1, C_2, …, C_n) denotes the maximum of C_1, C_2, …, C_n.
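A sketch of computing this standard-centrality target for one pixel of a 2D sample is shown below; it follows the square-root centerness form given by the formula above (itself a reconstruction), handles overlapping boxes by taking the maximum, and returns 0 for pixels outside every box. The function name and box format are illustrative assumptions.

```python
import numpy as np

# Standard-centrality target for one 2D pixel, following the square-root centerness
# form used above (an assumption). Overlapping boxes are handled by taking the maximum,
# and pixels outside every box get centrality 0.
def standard_centrality_2d(y, x, boxes):
    """boxes: iterable of (x_min, y_min, x_max, y_max) second labeling detection frames."""
    best = 0.0
    for x_min, y_min, x_max, y_max in boxes:
        l, r = x - x_min, x_max - x                   # distances to the left/right boundaries
        u, d = y - y_min, y_max - y                   # distances to the upper/lower boundaries
        if min(l, r, u, d) < 0:                       # pixel lies outside this labeled box
            continue
        c = np.sqrt((min(l, r) / max(l, r)) * (min(u, d) / max(u, d)))
        best = max(best, c)                           # take the maximum over overlapping boxes
    return best
```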
S333, calculating the standard offset between each target pixel point and each boundary of the second labeling detection frame.
Specifically, when the centrality of a certain pixel point is greater than the preset centrality threshold c0, that pixel point is a target pixel point, and the standard offsets can be obtained by calculating the distances between the pixel point and each boundary of the corresponding second labeling detection frame.
And S334, calculating the second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the second prediction detection frame of the target to which the pixel point belongs, the standard centrality of each pixel point, and the standard offset between each target pixel point and each boundary of the second labeling detection frame of the target to which the pixel point belongs.
In this embodiment, the second model loss may be L1, L2, or other regression loss.
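A hedged sketch of such a loss in PyTorch is shown below; the tensor layouts, the equal weighting of the two terms, and the use of L1 throughout are assumptions rather than the patent's exact formulation.

```python
import torch
import torch.nn.functional as F

# Tensor layouts, equal weighting of the two terms, and the use of L1 throughout
# are assumptions, not the patent's exact formulation.
def second_model_loss(pred_centrality, pred_offsets, std_centrality, std_offsets,
                      centrality_thr=0.5):
    """pred/std_centrality: (B, 1, H, W); pred/std_offsets: (B, 4, H, W)."""
    centrality_loss = F.l1_loss(pred_centrality, std_centrality)
    # Regress offsets only at target pixels, i.e. where the standard centrality is high.
    target_mask = (std_centrality > centrality_thr).expand_as(pred_offsets)
    if target_mask.any():
        offset_loss = F.l1_loss(pred_offsets[target_mask], std_offsets[target_mask])
    else:
        offset_loss = pred_offsets.sum() * 0.0        # no target pixels in this batch
    return centrality_loss + offset_loss
```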
And S34, performing iterative training on the second detection network according to the second model loss until the second model loss converges or a preset number of iterations is reached.
The second detection network obtained through training of the embodiment can accurately position the position of the target second prediction detection frame in the image to be detected.
Example 4
The present embodiment provides a target detection system based on deep learning, as shown in fig. 5, the system includes: the system comprises an image acquisition module 11, a first detection network processing module 12, a feature extraction module 13, a cutting module 14, a second detection network processing module 15 and an object detection module 16. The functions of the above modules are described in detail below:
the image acquisition module 11 is used for acquiring an image to be detected.
In the present embodiment, the image to be detected may be acquired from a PACS (Picture Archiving and Communication Systems), or may be acquired in real time from an image acquisition device.
Alternatively, the image to be detected may be a Computed Tomography (CT) image, a Magnetic Resonance Imaging (MRI) image, a low-dose Positron Emission Tomography (PET) image, or other modality images, and the modality of the image to be detected is not particularly limited in this embodiment.
The first detection network processing module 12 is configured to process the image to be detected through a pre-trained first detection network to obtain a first target detection frame.
In this embodiment, the first detection network CNN1 may be Fast-RCNN, Retina-Net, Yolo, or any other suitable one-stage or multi-stage detection network. The image to be detected is input into the first detection network CNN1 for coarse detection, yielding the central positions, sizes and confidences of a plurality of first detection frames; when the confidence of a first detection frame is greater than a preset confidence threshold, that first detection frame is determined to be the target first detection frame, as shown in fig. 2. The target first detection frame may contain more than one adjacent target object.
Optionally, before the image to be detected is input into the first detection network CNN1, this embodiment further includes preprocessing the image to be detected, for example window width/level adjustment, pixel normalization and Gaussian filtering, so as to reduce the interference of noise on the network, make the image features more distinct, and reduce the difficulty of learning.
The feature extraction module 13 is configured to extract a feature map of the image to be detected.
In this embodiment, a multi-scale feature map of the image to be detected can be extracted through a plurality of residual blocks connected by symmetric skip connections, so as to ensure effective extraction of both shallow and deep image features.
And the cropping module 14 is used for cropping the feature map of the image to be detected to obtain a target cut image containing the target first detection frame.
In this embodiment, the cropping manner includes, but is not limited to, ROI Pooling or ROI Align. Specifically, the feature map of the image to be detected is cropped by taking the central position of the target first detection frame as the center and m times the size of the target first detection frame as the cropping size, wherein m ≥ 1. The resulting target cut image is shown in fig. 2, for example.
The second detection network processing module 15 is configured to input the target clipping image into a pre-trained second detection network to obtain the centrality of each pixel point in the target clipping image and the offset between each target pixel point and each boundary of the target second detection frame to which it belongs, where the centrality (a value between 0 and 1) is the probability that the corresponding pixel point is the central point of the second detection frame, and a target pixel point is a pixel point whose centrality is greater than a preset centrality threshold.
When the target clipping image is input into the second detection network CNN2, as shown in fig. 2, a centrality map of the target clipping image is obtained, together with the offset (i.e., the number of offset pixels) between each target pixel point and each boundary of the second detection frame to which it belongs.
Preferably, before the target trimming image is input to the second detection network, the target trimming image may be subjected to interpolation processing in advance so that the resolution thereof is the same as that of the image to be detected.
The object detection module 16 is configured to determine each of the object second detection frames according to an offset between each of the object pixel points and each of the boundaries of the object second detection frame to which the object second detection frame belongs.
For example, when the centrality of pixel points A and B in the target clipping image is greater than the preset centrality threshold, it is determined that there are two separate target objects in the target clipping image, with pixel points A and B as their centers. The positions of the boundaries of the two target second detection frames can then be determined from the coordinate positions of pixel points A and B and the offsets between each of them and the boundaries of the target second detection frame to which it belongs, so that adjacent target objects are accurately distinguished.
Preferably, this embodiment may further include removing redundantly overlapping target second detection frames from the determined target second detection frames. In particular, redundant overlapping detection frames may be eliminated using NMS (non-maximum suppression), a greedy algorithm, or another optimization algorithm.
It can be seen that, in this embodiment, a second detection network for fine detection is added on the basis of the coarse detection performed by the first detection network. The second detection network detects the centrality of each pixel point in the target clipping image; when the centrality of a pixel point is greater than the preset centrality threshold, the pixel point is taken as a target pixel point representing the center point of a target object to be detected in the target clipping image. The number of target objects (i.e., target second detection frames) can be determined from the number of such center points, and each corresponding target second detection frame can then be located by combining the offsets between the target pixel point and the boundaries of that frame. In this way, each target object in the target clipping image can be accurately located, which solves the problem that adjacent objects are difficult to distinguish accurately.
The training procedure of the first detection network and the second detection network in this embodiment is described with reference to embodiments 2 and 3.
Example 5
The present embodiment provides an electronic device, which may be represented in the form of a computing device (for example, may be a server device), including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor may implement the deep learning based object detection method provided in embodiments 1, 2, or 3 when executing the computer program.
Fig. 6 shows a schematic diagram of a hardware structure of the present embodiment, and as shown in fig. 6, the electronic device 9 specifically includes:
at least one processor 91, at least one memory 92, and a bus 93 for connecting the various system components (including the processor 91 and the memory 92), wherein:
the bus 93 includes a data bus, an address bus, and a control bus.
Memory 92 includes volatile memory, such as Random Access Memory (RAM)921 and/or cache memory 922, and can further include Read Only Memory (ROM) 923.
Memory 92 also includes a program/utility 925 having a set (at least one) of program modules 924, such program modules 924 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The processor 91 executes various functional applications and data processing, such as the target detection method based on deep learning provided in embodiments 1, 2, or 3 of the present invention, by running the computer program stored in the memory 92.
The electronic device 9 may further communicate with one or more external devices 94 (e.g., a keyboard, a pointing device, etc.). Such communication may be through an input/output (I/O) interface 95. Also, the electronic device 9 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet) via the network adapter 96. The network adapter 96 communicates with the other modules of the electronic device 9 via the bus 93. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 9, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems, etc.
It should be noted that although in the above detailed description several units/modules or sub-units/modules of the electronic device are mentioned, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more of the units/modules described above may be embodied in one unit/module, according to embodiments of the application. Conversely, the features and functions of one unit/module described above may be further divided into embodiments by a plurality of units/modules.
Example 6
The present embodiment provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the steps of the deep learning-based object detection method provided in embodiments 1, 2 or 3.
More specific examples, among others, that the readable storage medium may employ may include, but are not limited to: a portable disk, a hard disk, random access memory, read only memory, erasable programmable read only memory, optical storage device, magnetic storage device, or any suitable combination of the foregoing.
In a possible implementation, the present invention can also be implemented in the form of a program product including program code for causing a terminal device to perform the steps of implementing the deep learning based object detection method described in embodiment 1, 2 or 3 when the program product is run on the terminal device.
Where program code for carrying out the invention is written in any combination of one or more programming languages, the program code may be executed entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device or entirely on the remote device.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims (10)

1. A target detection method based on deep learning is characterized by comprising the following steps:
acquiring an image to be detected;
processing the image to be detected through a pre-trained first detection network to obtain a target first detection frame;
extracting a feature map of the image to be detected;
cropping the feature map of the image to be detected to obtain a target cut image containing the target first detection frame;
inputting the target clipping image into a pre-trained second detection network to obtain the centrality of each pixel point in the target clipping image and the offset between each target pixel point and each boundary of a second target detection frame, wherein the centrality is the probability that the corresponding pixel point is the central point of the second target detection frame, and the target pixel point is the pixel point with the centrality larger than a preset centrality threshold;
and determining each target second detection frame according to the offset between each target pixel point and each boundary of the target second detection frame.
2. The method for detecting the target of claim 1, wherein the processing the image to be detected through the pre-trained first detection network to obtain the first target detection frame comprises:
inputting the image to be detected into the first detection network to obtain the central position, the size and the confidence coefficient of a plurality of first detection frames;
and when the confidence coefficient of a certain first detection frame is greater than a preset confidence coefficient threshold value, determining that the first detection frame is the target first detection frame.
3. The object detection method of claim 1, wherein the cropping the feature map of the image to be detected to obtain an object cropped image including the first object detection frame comprises:
and cropping the feature map of the image to be detected by taking the central position of the first target detection frame as the center and m times the size of the first target detection frame as the cropping size, wherein m ≥ 1.
4. The object detection method of claim 1, further comprising:
removing the overlapped target second detection frame from each of the target second detection frames.
5. The object detection method of claim 1, wherein the training process of the first detection network is as follows:
acquiring a first sample set, wherein the first sample set comprises a plurality of first sample images and a first detection frame gold standard;
inputting the first sample image into a preset first detection network to obtain the central position and the size of a first prediction detection frame;
calculating a first model loss according to the central position and the size of the first prediction detection frame and the corresponding first detection frame gold standard;
and training the first detection network according to the first model loss.
6. The object detection method of claim 1, wherein the training process of the second detection network is as follows:
acquiring a second sample set, wherein the second sample set comprises a plurality of second sample images and a second detection frame gold standard, the second detection frame gold standard comprises a second labeling detection frame labeled in the second sample images, and at least part of the second sample images are labeled with two or more adjacent second labeling detection frames;
inputting the second sample image into a preset second detection network to obtain the prediction centrality of each pixel point in the second sample image and the prediction offset between each target pixel point and each boundary of the target second prediction detection frame, wherein the prediction centrality is the prediction probability that the corresponding pixel point is the central point of the second prediction detection frame, and the target pixel point in the second sample image is the pixel point of which the prediction centrality is greater than the preset centrality threshold;
calculating a second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the target second prediction detection frame to which the target pixel point belongs, and a corresponding second detection frame gold standard;
and training the second detection network according to the second model loss.
7. The method of claim 6, wherein the calculating a second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the target second prediction detection frame to which it belongs, and the corresponding second detection frame gold standard comprises:
calculating the standard centrality of each pixel point in the second sample image based on the second labeling detection frame, wherein the standard centrality is the standard probability that the corresponding pixel point is the central point of the second labeling detection frame;
calculating the standard offset between each target pixel point and each boundary of the second labeling detection frame to which the target pixel point belongs;
and calculating the second model loss according to the prediction centrality of each pixel point in the second sample image, the prediction offset between each target pixel point and each boundary of the second target prediction detection frame, the standard centrality of each pixel point and the standard offset between each target pixel point and each boundary of the second target labeling detection frame.
8. The method of claim 7, wherein the calculating the standard centrality of each pixel point in the second sample image based on the second annotation detection box comprises:
when the second sample image is a 2D image, calculating the standard centrality C of each pixel point in the second label detection frame in the second sample image according to the following formula:
C = \sqrt{\frac{\min(l, r)}{\max(l, r)} \cdot \frac{\min(u, d)}{\max(u, d)}}
wherein l, r, u and d respectively represent the distances between the corresponding pixel point and the left, right, upper and lower boundaries of the second labeling detection frame to which it belongs;
when the second sample image is a 3D image, calculating the standard centrality C of each pixel point in the second label detection frame in the second sample image according to the following formula:
C = \sqrt[3]{\frac{\min(l, r)}{\max(l, r)} \cdot \frac{\min(u, d)}{\max(u, d)} \cdot \frac{\min(f, b)}{\max(f, b)}}
wherein l, r, u, d, f and b respectively represent the distances between the corresponding pixel point and the left, right, upper, lower, front and rear boundaries of the second labeling detection frame to which it belongs;
when the second sample image is a 2D or 3D image, the standard centrality C of each pixel point outside the second label detection frame in the second sample image is 0;
when a certain pixel point in the second sample image is located in n second labeling detection frames simultaneously, with n greater than 1, the centrality C of the pixel point is C = max(C_1, C_2, …, C_n), where C_i represents the centrality of the pixel point obtained based on the i-th second labeling detection frame.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 8 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.
CN202110518366.8A 2021-05-12 2021-05-12 Target detection method, device and medium based on deep learning Active CN113240638B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110518366.8A CN113240638B (en) 2021-05-12 2021-05-12 Target detection method, device and medium based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110518366.8A CN113240638B (en) 2021-05-12 2021-05-12 Target detection method, device and medium based on deep learning

Publications (2)

Publication Number Publication Date
CN113240638A true CN113240638A (en) 2021-08-10
CN113240638B CN113240638B (en) 2023-11-10

Family

ID=77133732

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110518366.8A Active CN113240638B (en) 2021-05-12 2021-05-12 Target detection method, device and medium based on deep learning

Country Status (1)

Country Link
CN (1) CN113240638B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658273A (en) * 2021-08-19 2021-11-16 上海新氦类脑智能科技有限公司 Scene self-adaptive target positioning method and system based on spatial perception

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190272438A1 (en) * 2018-01-30 2019-09-05 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for detecting text
CN111091091A (en) * 2019-12-16 2020-05-01 北京迈格威科技有限公司 Method, device and equipment for extracting target object re-identification features and storage medium
WO2020164282A1 (en) * 2019-02-14 2020-08-20 平安科技(深圳)有限公司 Yolo-based image target recognition method and apparatus, electronic device, and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190272438A1 (en) * 2018-01-30 2019-09-05 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for detecting text
WO2020164282A1 (en) * 2019-02-14 2020-08-20 平安科技(深圳)有限公司 Yolo-based image target recognition method and apparatus, electronic device, and storage medium
CN111091091A (en) * 2019-12-16 2020-05-01 北京迈格威科技有限公司 Method, device and equipment for extracting target object re-identification features and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李文斌 (Li Wenbin); 何冉 (He Ran): "Aircraft target detection in remote sensing images based on deep neural networks" (基于深度神经网络的遥感图像飞机目标检测), 计算机工程 (Computer Engineering), no. 07

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658273A (en) * 2021-08-19 2021-11-16 上海新氦类脑智能科技有限公司 Scene self-adaptive target positioning method and system based on spatial perception
CN113658273B (en) * 2021-08-19 2024-04-26 上海新氦类脑智能科技有限公司 Scene self-adaptive target positioning method and system based on space perception

Also Published As

Publication number Publication date
CN113240638B (en) 2023-11-10

Similar Documents

Publication Publication Date Title
US10891473B2 (en) Method and device for use in hand gesture recognition
US7787683B2 (en) Tree structure based 2D to 3D registration
CN110363817B (en) Target pose estimation method, electronic device, and medium
CN110136153B (en) Image processing method, device and storage medium
US9349207B2 (en) Apparatus and method for parsing human body image
US20200226392A1 (en) Computer vision-based thin object detection
CN113743385A (en) Unmanned ship water surface target detection method and device and unmanned ship
CN109145752B (en) Method, apparatus, device and medium for evaluating object detection and tracking algorithms
CN111275040A (en) Positioning method and device, electronic equipment and computer readable storage medium
CN115496923B (en) Multi-mode fusion target detection method and device based on uncertainty perception
CN111696133A (en) Real-time target tracking method and system
CN112116635A (en) Visual tracking method and device based on rapid human body movement
CN114820639A (en) Image processing method, device and equipment based on dynamic scene and storage medium
CN115797929A (en) Small farmland image segmentation method and device based on double-attention machine system
CN113240638B (en) Target detection method, device and medium based on deep learning
CN114119695A (en) Image annotation method and device and electronic equipment
CN111709269B (en) Human hand segmentation method and device based on two-dimensional joint information in depth image
CN112884804A (en) Action object tracking method and related equipment
US20220155441A1 (en) Lidar localization using optical flow
CN115223173A (en) Object identification method and device, electronic equipment and storage medium
CN112184766B (en) Object tracking method and device, computer equipment and storage medium
US20220245860A1 (en) Annotation of two-dimensional images
US20180161013A1 (en) Computer-aided tracking and motion analysis with ultrasound for measuring joint kinematics
CN110717406A (en) Face detection method and device and terminal equipment
CN117078761B (en) Automatic positioning method, device, equipment and medium for slender medical instrument

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant