CN111160140A - Image detection method and device - Google Patents

Image detection method and device Download PDF

Info

Publication number
CN111160140A
CN111160140A (application CN201911284783.XA)
Authority
CN
China
Prior art keywords
feature map
feature
map
attention
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911284783.XA
Other languages
Chinese (zh)
Other versions
CN111160140B (en)
Inventor
崔婵婕
任宇鹏
卢维
熊剑平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN201911284783.XA priority Critical patent/CN111160140B/en
Publication of CN111160140A publication Critical patent/CN111160140A/en
Application granted granted Critical
Publication of CN111160140B publication Critical patent/CN111160140B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/05 Recognition of patterns representing particular kinds of hidden objects, e.g. weapons, explosives, drugs

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image detection method and device. The method comprises the following steps: acquiring an image containing one or more objects to be detected; extracting features of the image according to a neural network to obtain N feature maps, where N is an integer greater than or equal to 2; fusing the N feature maps to obtain a first feature map H; extracting semantic information from the first feature map H to obtain an attention map; obtaining a second feature map according to the attention map and the first feature map H; and fusing the N feature maps with the second feature map in sequence to obtain a detection result. With this method, dangerous goods in luggage can be detected, and the accuracy and efficiency of dangerous goods detection are improved.

Description

Image detection method and device
Technical Field
The present application relates to the field of communications technologies, and in particular, to an image detection method and apparatus.
Background
Dangerous goods detection based on an X-ray machine generally involves detecting objects such as knives, guns, containers and lighters inside packages. Dangerous goods such as knives and guns appear in real scenes with very low probability, so little training data is available for training a target detection network, and when they do appear they are often carefully concealed and oriented at awkward angles, which places particularly high demands on the target detection algorithm. Dangerous goods such as containers appear in real-life scenes with very high probability, but they vary in type and size and are often severely occluded or overlapping, which makes them difficult for a target detection algorithm to annotate. Because of the particularity of this detection task, a missed detection threatens public safety and a false detection degrades the user experience, so the target detection algorithm must provide both high accuracy and a high detection rate.
Disclosure of Invention
The embodiments of the application provide an image detection method and device, which are used to solve the problem that dangerous goods are difficult to detect in the prior art.
In a first aspect, an image detection method provided in an embodiment of the present application includes:
acquiring an image containing one or more objects to be detected;
extracting the features of the image according to a neural network to obtain N feature maps; n is an integer greater than or equal to 2;
performing fusion processing on the N feature maps to obtain a first feature map H;
extracting semantic information from the first feature map H to obtain an attention map;
obtaining a second feature map according to the attention map and the first feature map H;
and fusing the N feature maps with the second feature map in sequence to obtain a detection result.
Optionally, obtaining the first feature map H by performing fusion processing on the N feature maps includes:
the N feature maps comprise a plurality of layers; the M feature maps of the (i+1)-th layer are obtained by convolving and fusing the N feature maps of the i-th layer, where M = N - 1 and i is an integer greater than or equal to 1;
and taking the feature map of the last layer as the first feature map H.
Optionally, obtaining the M feature maps of the (i+1)-th layer by convolving and fusing the N feature maps of the i-th layer includes:
the j-th feature map of the M feature maps is obtained by fusing the j-th feature map and the (j+1)-th feature map of the i-th layer, where j is an integer greater than or equal to 1.
Optionally, the extracting semantic information from the first feature map H to obtain an attention map includes:
performing convolution dimensionality reduction on the first feature map H to obtain a feature map Q and a feature map K;
obtaining an attention weight according to the feature map Q and the feature map K;
and normalizing the attention weight to obtain the attention map.
Optionally, after extracting semantic information from the first feature map H to obtain the attention map, the method further includes:
performing convolution on the first feature map H to obtain a feature map V;
obtaining a third feature map S according to the feature map V and the first feature map H; wherein the third feature map S satisfies the following formula:
S=(V*A)+H;
wherein S is the third feature map, V is the feature map V, A is the attention map, and H is the first feature map H;
performing convolution on the third feature map S to generate a segmentation result;
and if the segmentation result is consistent with the marked region in the attention map, obtaining a second feature map according to the attention map and the first feature map H.
Optionally, obtaining the second feature map according to the attention map and the first feature map H includes:
the second feature map satisfies the following formula:
F=H*(A+1);
wherein F is the second feature map, A is the attention map, and H is the first feature map.
Optionally, fusing the N feature maps with the second feature map in sequence to obtain a detection result includes: fusing the second feature map with the last feature map of each layer, except the last layer, of the N feature maps to obtain the detection result.
In a second aspect, an image detection apparatus provided in an embodiment of the present application includes: an acquisition module, configured to acquire an image containing one or more objects to be detected;
the processing module is used for extracting the features of the image according to the neural network to obtain N feature maps; n is an integer greater than or equal to 2;
the processing module is further configured to perform fusion processing on the N feature maps to obtain a first feature map H;
the processing module is further configured to extract semantic information from the first feature map H to obtain an attention map;
the processing module is further used for obtaining a second feature map according to the attention map and the first feature map H;
and the processing module is further configured to fuse the N feature maps with the second feature map in sequence to obtain a detection result.
In a third aspect, an electronic device provided in an embodiment of the present application includes: at least one processor, and a memory communicatively coupled to the at least one processor, wherein: the memory stores instructions executable by the at least one processor to enable the at least one processor to perform one or more of the steps of the above-described method.
In a fourth aspect, embodiments of the present application provide a computer-readable medium storing computer-executable instructions for performing the above method.
The application provides an image detection method, which comprises the following steps: acquiring an image containing one or more objects to be detected; extracting features of the image according to a neural network to obtain N feature maps, where N is an integer greater than or equal to 2; fusing the N feature maps to obtain a first feature map H; extracting semantic information from the first feature map H to obtain an attention map; obtaining a second feature map according to the attention map and the first feature map H; and fusing the N feature maps with the second feature map in sequence to obtain a detection result. With this method, dangerous goods in luggage can be detected, and the accuracy and efficiency of dangerous goods detection are improved.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1A is a schematic network structure diagram of an image detection algorithm provided in an embodiment of the present application;
fig. 1B is a schematic flowchart of an image detection method according to an embodiment of the present disclosure;
fig. 2 is a schematic flow chart of a semantic information extraction module provided in the embodiment of the present application;
fig. 3 is a schematic diagram of a second characteristic diagram F generated according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an image detection apparatus according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to solve the problems that dangerous goods in luggage are difficult to identify and that identification accuracy and identification efficiency cannot both be achieved at the same time, the embodiments of the application provide a dangerous goods detection method and device.
The preferred embodiments of the present application will be described below with reference to the accompanying drawings of the specification. It should be understood that the preferred embodiments described herein are merely for illustrating and explaining the present application and are not intended to limit it, and that the embodiments and the features of the embodiments in the present application may be combined with each other without conflict.
It should be understood that the terms first, second, etc. in the description of the embodiments of the present application are used for distinguishing between the descriptions and not for indicating or implying relative importance or order. In the description of the embodiments of the present application, "a plurality" means two or more.
The term "and/or" in the embodiment of the present application is only one kind of association relationship describing an associated object, and means that three kinds of relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship.
The purpose of image detection is to accurately predict the types and positions of the objects present in an image and to output each detected target as a class label and a detection box. The traditional target detection method comprises three steps: generating detection boxes with different aspect ratios at different scales with a sliding window, extracting features by means such as SIFT, and classifying the detection boxes with methods such as a support vector machine. Such a method usually requires a huge amount of computation, its feature extraction uses only low-order visual information and can hardly capture semantic information in complex scenes, and because the three steps are executed and optimized independently, a globally optimal solution is difficult to obtain, so the detection accuracy is low.
Target detection algorithms based on deep learning can effectively avoid the defects of the traditional image detection method, and are mainly divided into detection-box-based target detection and pixel-level instance segmentation. Instance segmentation detects objects at the pixel level, so its precision is high but its real-time performance is poor. Detection-box-based target detection algorithms are further divided into two-step detection algorithms and one-step detection algorithms. Two-step detection algorithms, such as the Region-based Convolutional Neural Network (R-CNN) algorithm and its variants, first use a detection box generator to generate a large number of detection boxes, then extract feature layers from the detection boxes, and finally use a classifier to predict the class of each detection box. One-step detection algorithms, such as the YOLO algorithm and its variants, directly perform class prediction at each location of the feature map. Two-step detection algorithms have high precision but poor real-time performance; one-step detection algorithms have good real-time performance but low precision, so target accuracy and detection rate are difficult to balance.
To meet the high requirements of dangerous goods detection, the prior art adopts a two-step detection algorithm. Although the detection accuracy is improved, the amount of computation is large, the execution is complex, the computational efficiency is low, the real-time performance is poor, and the real object boundary cannot be fitted well.
Therefore, the image detection method provided by the application constructs a new image detection algorithm based on a one-step detection algorithm and fuses extracted semantic information into the original target detection network, so that the detection result is more accurate and the efficiency of detecting objects is improved. Referring to fig. 1A, fig. 1A is a schematic diagram of the network structure of the image detection algorithm provided in an embodiment of the present application, where the network structure includes the following four modules:
1) Feature extraction module
The feature extraction module adopts the multi-level skip-connected image classification network DLA-34 (Deep Layer Aggregation) used in the CenterNet target detection network. The feature extraction module is used for extracting features of the acquired image to be detected to obtain one or more first feature layers.
2) Fusion module
The high-order feature layers are rich in semantic features but lack spatial information, while the low-order feature layers are just the opposite; the fusion module is therefore used to fuse feature layers of different orders.
3) Semantic segmentation module
In order to improve the accuracy of target detection, a semantic segmentation module is designed into the DLA-34 network in the embodiment of the application: pixels in a feature map are segmented according to preset semantic rules to obtain extracted semantic information.
4) Detection module
In order to further fuse high-level semantic information, the detection module is configured to fuse the one or more second feature layers with the second feature map, respectively, to obtain a detection result.
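For orientation, the following is a minimal PyTorch-style sketch of how these four modules could be wired together. All names in it (HazardDetector, backbone, fusion, segmentation, head) are placeholders introduced for this sketch only, not identifiers from the patent, and the attention map A is assumed to broadcast over the channels of H.

    import torch.nn as nn

    class HazardDetector(nn.Module):
        """Placeholder wiring of the four modules described above."""
        def __init__(self, backbone, fusion, segmentation, head):
            super().__init__()
            self.backbone = backbone          # 1) feature extraction, e.g. DLA-34
            self.fusion = fusion              # 2) fuses feature layers of different orders
            self.segmentation = segmentation  # 3) semantic segmentation / attention
            self.head = head                  # 4) detection module

        def forward(self, image):
            feats = self.backbone(image)            # N feature maps
            H, skips = self.fusion(feats)           # first feature map H plus per-layer last maps
            A, seg_logits = self.segmentation(H)    # attention map and segmentation result
            F = H * (A + 1)                         # second feature map (step 5 below)
            return self.head(F, skips), seg_logits  # detection result (segmentation kept for supervision)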
The following describes the implementation process of the whole image detection algorithm in detail with reference to fig. 1A and 1B. Referring to fig. 1B, fig. 1B is a schematic flow chart of an image detection method according to an embodiment of the present application, where the method includes the following steps:
step 1: an image containing one or more objects to be detected is acquired.
The computer acquires an image of the object to be detected. The image may be a transmission image from an X-ray machine or an image captured by a mobile phone; the embodiment of the application is not particularly limited in this respect.
Step 2: extracting the features of the image according to a neural network to obtain N feature maps; n is an integer greater than or equal to 2.
It should be noted that the feature maps shown in fig. 1A are only some of the N feature maps, and the number of feature maps is also only an example; neither limits the network structure of the image detection algorithm provided in the present application.
Illustratively, the neural network may adopt the classic multi-level skip-connected image classification network DLA-34 to perform feature extraction on the image to be detected, so as to obtain N feature maps. As shown in fig. 1A, there are four layers of feature maps in the DLA-34 network: the feature maps layer12, layer13, layer14 and layer15 are the feature maps of the first layer, which are low-order feature maps whose resolutions and channel numbers differ from one another; the spatial information in the first-layer feature maps is rich but the semantic feature information is scarce. The feature maps layer23, layer24 and layer25 are the feature maps of the second layer; the feature maps layer34 and layer35 are the feature maps of the third layer; and the first feature map H is the feature map of the fourth layer.
It should be understood that "layer" herein refers to the abbreviation of "feature layer" described above, and the following description is made for all purposes.
Step 3: performing fusion processing on the N feature maps to obtain a first feature map H. Optionally, obtaining the first feature map H by performing fusion processing on the N feature maps includes: the N feature maps comprise a plurality of layers; the M feature maps of the (i+1)-th layer are obtained by convolving and fusing the N feature maps of the i-th layer, where M = N - 1 and i is an integer greater than or equal to 1; and the feature map of the last layer is taken as the first feature map H.
Optionally, obtaining the M feature maps of the (i+1)-th layer by convolving and fusing the N feature maps of the i-th layer includes: the j-th feature map of the M feature maps is obtained by fusing the j-th feature map and the (j+1)-th feature map of the i-th layer, where j is an integer greater than or equal to 1.
For example, assuming that the N feature maps comprise 4 layers and taking fig. 1A as an example, the feature maps of the second layer are obtained by fusing the feature maps of the first layer, the feature maps of the third layer are obtained by fusing the feature maps of the second layer, and the feature map of the fourth layer, namely the first feature map H, is obtained by fusing the feature maps of the third layer.
The process of fusion is explained in detail below by specific examples.
Taking the layer25 feature map of the second layer as an example, to obtain the layer25 feature map, the layer14 feature map and the layer15 feature map of the first layer need to be fused according to the following formula:
Layer25=Conv(Layer14+Deconv(Conv(Layer15)));
Specifically, a 3 × 3 convolution is first performed on the layer15 feature map so that its number of output channels is the same as that of the layer14 feature map; the convolved layer15 feature map is then deconvolved to enlarge its resolution to that of the layer14 feature map;
the convolved and deconvolved layer15 feature map is summed with the layer14 feature map, and a 3 × 3 convolution is applied to the sum to obtain the layer25 feature map, whose number of output channels is the same as that of the layer14 feature map. Similarly, the layer12 feature map and the layer13 feature map of the first layer are fused to obtain the layer23 feature map; and the layer13 feature map and the layer14 feature map of the first layer are fused to obtain the layer24 feature map.
In the same manner, the layer23 feature map and the layer24 feature map of the second layer are fused to obtain the layer34 feature map; the layer24 feature map and the layer25 feature map of the second layer are fused to obtain the layer35 feature map; and the layer34 feature map and the layer35 feature map of the third layer are fused to obtain the first feature map H.
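As an illustration, a minimal PyTorch sketch of this fusion step follows. The concrete channel counts and the stride-2 deconvolution are assumptions made for the sketch; the patent itself only fixes the 3 × 3 kernels and the order of operations (match channels, match resolution, sum, then a final 3 × 3 convolution).

    import torch.nn as nn

    def make_fuse(c_fine, c_coarse):
        # c_fine: channels of the finer map (e.g. layer14); c_coarse: channels of the coarser map (e.g. layer15)
        conv_align = nn.Conv2d(c_coarse, c_fine, kernel_size=3, padding=1)       # match channel count
        deconv_up = nn.ConvTranspose2d(c_fine, c_fine, kernel_size=2, stride=2)  # match resolution (assumed x2 gap)
        conv_out = nn.Conv2d(c_fine, c_fine, kernel_size=3, padding=1)           # final 3 x 3 convolution
        def fuse(fine, coarse):
            return conv_out(fine + deconv_up(conv_align(coarse)))                # Conv(fine + Deconv(Conv(coarse)))
        return fuse

    # e.g. layer25 = make_fuse(256, 512)(layer14, layer15), assuming layer15 has half the
    # resolution and twice the channels of layer14.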
Step 4: extracting semantic information from the first feature map H to obtain an attention map.
The specific implementation of the semantic segmentation module as shown in FIG. 1A includes a number of steps. The semantic segmentation process provided by the embodiment of the present application will be described in detail below with reference to fig. 2. Referring to fig. 2, fig. 2 is a schematic view of a semantic segmentation process according to an embodiment of the present disclosure.
201: and (5) performing dimension reduction on the first feature diagram H to obtain a feature diagram Q and a feature diagram K.
Illustratively, the first feature map H is reduced in dimension by 1 × 1 convolution kernels to generate the two feature maps Q and K.
The number of channels of the feature map Q and of the feature map K is reduced to 1/8 of the number of channels of the first feature map H. The feature map Q is arranged as H × W × C/8, and the feature map K is arranged as C/8 × H × W.
202: and obtaining an attention diagram according to the characteristic diagram Q and the characteristic diagram K.
Illustratively, the attention weight is obtained by multiplying the feature map Q and the feature map K, and then the attention weight is normalized by using a softmax function to obtain the attention map A.
203: and (5) convolving the first feature map H to obtain a feature map V.
Illustratively, the feature map V is generated by convolving the first feature map H with a convolution kernel of 1 × 1.
204: a feature map F is obtained from the feature map V, the attention map A and the first feature map H.
For example, as shown in fig. 2, the feature map V is multiplied by the attention map a and then added to the first feature map H to obtain the feature map S. Wherein, the characteristic diagram S satisfies the following formula:
S=(V*A)+H;
wherein S is the feature map S, V is the feature map V, A is the attention map, and H is the first feature map.
205: and (5) performing convolution on the feature map S to obtain a segmentation result.
Illustratively, the segmentation result is obtained by convolving the feature map S with a convolution kernel of 1 × 1.
It should be understood that the various classes in the segmentation result have been labeled by classification.
206: and processing the segmentation result according to a loss function.
Illustratively, the feature map S is convolved according to a convolution kernel of 1 × 1 to generate segmentation results, the preset classes are assumed to be five classes, each class has different marks, each pixel in the segmentation results is marked according to the class, and in order to ensure the accuracy of semantic information extraction, the cross entropy Loss function segmentation results are adopted for supervision processing.
It should be understood that after the segmentation result is obtained, the attention map is verified according to the segmentation result to ensure the accuracy of the attention map, and the second feature map is further obtained according to the attention map and the first feature map H.
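A minimal PyTorch sketch of steps 201 to 206 follows. It assumes the first feature map H has C channels, reads the "*" in S = (V*A)+H as applying the (H·W) × (H·W) attention matrix to the flattened feature map V in the usual self-attention manner, and uses five preset classes; these readings and the module and variable names are assumptions made for illustration.

    import torch
    import torch.nn as nn
    import torch.nn.functional as nnf

    class SemanticSegmentation(nn.Module):
        def __init__(self, channels, num_classes=5):
            super().__init__()
            self.q = nn.Conv2d(channels, channels // 8, kernel_size=1)   # 201: dimension reduction for Q
            self.k = nn.Conv2d(channels, channels // 8, kernel_size=1)   # 201: dimension reduction for K
            self.v = nn.Conv2d(channels, channels, kernel_size=1)        # 203: feature map V
            self.seg = nn.Conv2d(channels, num_classes, kernel_size=1)   # 205: segmentation head

        def forward(self, H):
            b, c, h, w = H.shape
            q = self.q(H).flatten(2).transpose(1, 2)        # b x (h*w) x c/8
            k = self.k(H).flatten(2)                        # b x c/8 x (h*w)
            A = torch.softmax(torch.bmm(q, k), dim=-1)      # 202: attention map, b x (h*w) x (h*w)
            v = self.v(H).flatten(2)                        # b x c x (h*w)
            S = torch.bmm(v, A.transpose(1, 2)).view(b, c, h, w) + H   # 204: S = (V*A) + H
            seg_logits = self.seg(S)                        # 205: segmentation result
            return A, seg_logits

    # 206: the segmentation result is supervised with a cross-entropy loss against
    # per-pixel class labels, e.g. loss = nnf.cross_entropy(seg_logits, target_mask).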
Step 5: obtaining a second feature map according to the attention map and the first feature map H.
Optionally, the obtaining a second feature map according to the attention map and the first feature map H includes:
the second feature map satisfies the following formula:
F=H*(A+1);
wherein F is the second feature map, A is the attention map, and H is the first feature map H.
For example, please refer to fig. 3, which is a schematic diagram of generating the second feature map F according to an embodiment of the present application. As shown in fig. 3, the segmentation information obtained by extracting semantic information after semantic segmentation of the first feature map is the attention map A. To make the detection result more accurate, the attention map A is multiplied with the first feature map H output by the fusion module and then added to it, yielding the second feature map F, which is the input feature map of the detection module and into which the segmented semantic information has been fused.
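A minimal sketch of this step is shown below; it assumes the attention map A has already been reduced or reshaped to a per-pixel weight map that broadcasts over the channels of the first feature map H, since fig. 3 only shows an element-wise multiplication followed by an addition.

    def second_feature_map(H, A):
        # H: b x C x h x w; A: b x 1 x h x w (assumed per-pixel attention weights)
        return H * (A + 1.0)   # F = H * (A + 1)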
Step 6: fusing the N feature maps with the second feature map in sequence to obtain a detection result.
Optionally, fusing the N feature maps with the second feature map in sequence to obtain a detection result includes: fusing the second feature map with the last feature map of each layer, except the last layer, of the N feature maps to obtain the detection result.
In a possible implementation manner, the feature map F and the last feature map of each layer in the N feature maps are used as the input of the detection module, and the feature map F and the last feature map of each layer in the N feature maps are fused at one time to obtain the detection result.
In another possible implementation, the second feature map is fused with the last feature map of the penultimate layer of the N feature maps to obtain a first intermediate feature map; the first intermediate feature map is fused with the last feature map of the (h+2)-th layer counted from the end of the N feature maps to obtain a g-th intermediate feature map, where h is an integer greater than or equal to 1 and g is an integer greater than or equal to 2; and when the (h+2)-th layer counted from the end is the 2nd layer of the N feature maps, the penultimate intermediate feature map is fused with the last feature map of the first layer of the N feature maps to obtain a last intermediate feature map;
and the last intermediate feature map is fused with the last feature map of the first layer of the N feature maps to obtain the detection result. For example, taking fig. 1A as an example, first, the second feature map F and the layer35 feature map are used as inputs of the detection module and are fused to obtain a feature map F1; the feature map F1 is then fused with the layer25 feature map to obtain a feature map F2; and the feature map F2 is fused with the layer15 feature map to obtain a feature map F3, from which the detection result is output.
Taking the second feature map F and layer35 feature map fusion as an example, the fusion is performed according to the following formula:
F1=Conv(F+Deconv(Conv(Layer35)));
wherein Conv represents convolution and Deconv represents deconvolution.
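A minimal sketch of the detection module's iterative fusion follows, reusing the same conv / deconv / conv pattern as the fusion-module sketch above; the per-step modules conv_in, deconv and conv_out, and the prediction head that turns F3 into the final classes and boxes, are assumptions made for illustration.

    def detection_fusion(F, skips, conv_in, deconv, conv_out):
        # skips = [layer35, layer25, layer15]; conv_in/deconv/conv_out are per-step module lists
        x = F
        for i, skip in enumerate(skips):
            x = conv_out[i](x + deconv[i](conv_in[i](skip)))   # F1, F2, F3 in turn
        return x   # F3, from which the detection result is predicted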
The technical solution of the present application will be explained in detail with reference to fig. 1A, fig. 2, fig. 3 and the specific embodiments.
Taking a transmission image from an X-ray machine as an example, the neural network may adopt the classic multi-level skip-connected image classification network DLA-34 to perform feature extraction on the image to be detected. As shown in fig. 1A, assuming that the feature maps in the DLA-34 network have 4 layers, the feature extraction module performs feature extraction on the image to be detected to obtain the feature maps of the first layer: the feature map layer12, the feature map layer13, the feature map layer14 and the feature map layer15. The feature maps of the first layer are fused to obtain the feature maps of the second layer: the feature map layer23, the feature map layer24 and the feature map layer25;
fusing the layer14 characteristic diagram and the layer15 characteristic diagram in the characteristic diagram of the first layer to obtain a layer25 characteristic diagram; 3 × 3 convolution is performed on the layer15 feature map to enable the number of output channels of the layer15 feature map to be the same as that of the layer14 feature map, and then deconvolution is performed on the convolved layer15 feature map to expand the resolution to be the same as that of the layer14 feature map. And summing the layer15 feature diagram and the layer14 feature diagram after convolution and deconvolution, and performing 3 x 3 convolution on the summed layer15 feature diagram and the layer14 feature diagram to obtain a layer25 feature diagram, wherein the number of output channels of the layer25 feature diagram is the same as that of the layer14 feature diagram.
Similarly, a layer12 feature map and a layer13 feature map in the feature map of the first layer are fused to obtain a layer23 feature map; and fusing the layer13 feature map and the layer14 feature map in the feature map of the first layer to obtain a layer24 feature map.
After the feature map of the second layer is obtained, in order to make the final feature map have more spatial information, the feature maps layer23, layer24 and layer25 in the feature map of the second layer are further fused to obtain a feature map of a third layer.
Fusing a feature map layer23 and a feature map layer24 in the feature map of the second layer to obtain a feature map layer 34; and fusing the layer24 feature map and the layer25 feature map in the feature map of the second layer to obtain a feature map layer 35.
The feature map layer34 and the feature map layer35 of the third layer are fused to obtain the first feature map H.
After the first feature map H is obtained, dimension reduction is performed on it with 1 × 1 convolution kernels to generate the two feature maps Q and K, whose numbers of channels are reduced to 1/8 of the number of channels of the first feature map H. The feature map Q is arranged as H × W × C/8, and the feature map K is arranged as C/8 × H × W.
As shown in fig. 2, the attention weight is obtained by multiplying the feature map Q and the feature map K, and the attention weight is normalized by the softmax function to obtain the attention map a.
As shown in fig. 2, the first feature map H is convolved with the convolution kernel 1 × 1 to generate a feature map V, and the feature map V is multiplied by the attention map and added to the first feature map H to obtain a third feature map S.
The third feature map S is convolved with a 1 × 1 convolution kernel to generate a segmentation result, the segmentation result is supervised with a cross-entropy loss function, and the accuracy of the attention map is judged from the segmentation result.
To make the detection result more accurate, as shown in fig. 3, the attention map A is multiplied with the first feature map H output by the fusion module and then added to it to obtain the second feature map F, which is the input of the detection module and into which the segmented semantic information is fused.
As shown in fig. 1A, the detection module takes the second feature map F as input, fuses the second feature map F with the feature map layer35 to obtain a feature map F1, fuses the feature map F1 with the feature map layer25 to obtain a feature map F2, and fuses the feature map F2 with the feature map layer15 to obtain a feature map F3; the detection result is then output from the feature map F3, in which the dangerous goods in the image are marked.
Based on the same inventive concept, the present application further provides an image detection apparatus, please refer to fig. 4, where fig. 4 is a schematic structural diagram of an image detection apparatus provided in an embodiment of the present application, and the apparatus includes: an acquisition module 401 and a processing module 402;
an obtaining module 401, configured to obtain an image including one or more objects to be detected;
a processing module 402, configured to perform feature extraction on the image according to a neural network to obtain N feature maps; n is an integer greater than or equal to 2;
the processing module 402 is further configured to perform fusion processing on the N feature maps to obtain a first feature map H;
the processing module 402 is further configured to perform semantic information extraction on the first feature map H to obtain an attention map;
the processing module 402 is further configured to obtain a second feature map according to the attention map and the first feature map H;
the processing module 402 is further configured to fuse the N feature maps with the second feature map in sequence to obtain a detection result.
Optionally, the processing module 402 is configured to perform fusion processing on the N feature maps to obtain a first feature map H, and specifically configured to:
the N feature maps comprise a plurality of layers; the M feature maps of the i +1 th layer are obtained by convolution and fusion of N feature maps of the i-th layer, wherein M is N-1; i is an integer greater than or equal to 1;
and taking the feature map of the last layer as the first feature map H.
Optionally, the M feature maps of the i +1 th layer are obtained by convolution and fusion of N feature maps of the i-th layer, and the method includes:
and the jth feature map in the M feature maps is obtained by fusing the jth feature map of the ith layer and the jth +1 feature map, wherein j is an integer greater than or equal to 1.
Optionally, the processing module 402 performs semantic information extraction on the first feature map H to obtain an attention map, and is specifically configured to:
performing convolution dimensionality reduction on the first feature map H to obtain a feature map Q and a feature map K;
obtaining attention weight according to the feature map Q and the feature map K;
and normalizing the attention weight to obtain the attention diagram.
Optionally, the processing module 402 is configured to perform semantic information extraction on the first feature map H, and after obtaining an attention map, further configured to:
performing convolution on the first feature map H to obtain a feature map V;
obtaining a third feature map S according to the feature map V and the first feature map H; wherein the third characteristic diagram S satisfies the following formula:
S=(V*A)+H;
wherein S is the third feature map, V is the feature map V, A is the attention map, and H is the first feature map H;
performing convolution on the third feature map S to generate a segmentation result;
if the segmentation result is consistent with the marked region in the attention map, obtaining a second feature map according to the attention map and the first feature map H;
and if the segmentation result is inconsistent with the marked region in the attention map, extracting semantic information of the first feature map H again.
Optionally, the processing module 402 is configured to obtain a second feature map according to the attention map and the first feature map H, and specifically is configured to:
the second characteristic diagram satisfies the following formula:
F=H*(A+1);
wherein F is the second feature map, A is the attention map, and H is the first feature map.
Optionally, the processing module 402 is configured to fuse the N feature maps with the second feature map in sequence to obtain a detection result, which includes: fusing the second feature map with the last feature map of each layer, except the last layer, of the N feature maps to obtain the detection result.
Based on the same inventive concept, an electronic device with an image detection function is provided in the embodiments of the present application, please refer to fig. 5 for description, and fig. 5 is a schematic structural diagram of the electronic device provided in the embodiments of the present application. The electronic device with the image detection function includes at least one processor 502 and a memory 501 connected to the at least one processor, in this embodiment of the present application, a specific connection medium between the processor 502 and the memory 501 is not limited, fig. 5 illustrates an example in which the processor 502 and the memory 501 are connected by a bus 500, the bus 500 is represented by a thick line in fig. 5, and a connection manner between other components is merely a schematic illustration, and is not limited thereto. The bus 500 may be divided into an address bus, a data bus, a control bus, etc., and is shown with only one thick line in fig. 5 for ease of illustration, but does not represent only one bus or one type of bus.
In the embodiment of the present application, the memory 501 stores instructions executable by the at least one processor 502, and by calling the instructions stored in the memory 501, the at least one processor 502 can execute the steps included in the aforementioned image detection method.
The processor 502 is a control center of the electronic device provided with the image detection function, and can connect various parts of the whole electronic device provided with the image detection function by using various interfaces and lines, and implement various functions of the electronic device provided with the image detection function by executing instructions stored in the memory 501. Optionally, the processor 502 may include one or more processing units, and the processor 502 may integrate an application processor and a modem processor, wherein the application processor mainly handles operating systems, user interfaces, application programs, and the like, and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 502. In some embodiments, the processor 502 and the memory 501 may be implemented on the same chip, or in some embodiments, they may be implemented separately on separate chips.
The memory 501, as a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 501 may include at least one type of storage medium, for example a flash memory, a hard disk, a multimedia card, a card-type memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and so on. The memory 501 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 501 in the embodiments of the present application may also be a circuit or any other device capable of implementing a storage function, for storing program instructions and/or data.
The processor 502 may be a general-purpose processor, such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit, field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like, that may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method for detecting an image disclosed in the embodiments of the present application may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor.
By programming the processor 502, the code corresponding to the image detection method described in the foregoing embodiments may be fixed in a chip, so that the chip can execute the steps of the image detection method when running.
Based on the above embodiments, in the embodiments of the present application, a computer-readable storage medium is provided, on which a computer program is stored, which, when executed by a processor, implements the image detection method in any of the above method embodiments.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. An image detection method, comprising:
acquiring an image containing one or more objects to be detected;
extracting the features of the image according to a neural network to obtain N feature maps; n is an integer greater than or equal to 2;
performing fusion processing on the N feature maps to obtain a first feature map H;
extracting semantic information from the first feature map H to obtain an attention map;
obtaining a second feature map according to the attention map and the first feature map H;
and fusing the N feature maps with the second feature map in sequence to obtain a detection result.
2. The method according to claim 1, wherein the fusing the N feature maps to obtain a first feature map H includes:
the N feature maps comprise a plurality of layers; the M feature maps of the (i+1)-th layer are obtained by convolving and fusing the N feature maps of the i-th layer, where M = N - 1 and i is an integer greater than or equal to 1;
and taking the feature map of the last layer as the first feature map H.
3. The method of claim 2, wherein obtaining the M feature maps of the (i+1)-th layer by convolving and fusing the N feature maps of the i-th layer comprises:
the j-th feature map of the M feature maps is obtained by fusing the j-th feature map and the (j+1)-th feature map of the i-th layer, wherein j is an integer greater than or equal to 1.
4. The method of claim 1, wherein the extracting semantic information from the first feature map H to obtain an attention map comprises:
performing convolution dimensionality reduction on the first feature map H to obtain a feature map Q and a feature map K;
obtaining attention weight according to the feature map Q and the feature map K;
and normalizing the attention weight to obtain the attention map.
5. The method of claim 1, wherein after extracting semantic information from the first feature map H to obtain an attention map, the method further comprises:
performing convolution on the first feature map H to obtain a feature map V;
obtaining a third feature map S according to the feature map V and the first feature map H; wherein the third feature map S satisfies the following formula:
S=(V*A)+H;
wherein S is the third feature map, V is the feature map V, A is the attention map, and H is the first feature map H;
performing convolution on the third feature map S to generate a segmentation result;
and if the segmentation result is consistent with the marked region in the attention map, obtaining a second feature map according to the attention map and the first feature map H.
6. The method of claim 1, wherein the deriving a second profile from the attention map and the first profile H comprises:
the second feature map satisfies the following formula:
F=H*(A+1);
wherein F is the second feature map, A is the attention map, and H is the first feature map.
7. The method according to claim 1, wherein said fusing the N feature maps with the second feature map in sequence to obtain a detection result comprises:
and fusing the second feature map with the last feature map of each layer, except the last layer, of the N feature maps to obtain the detection result.
8. An image detection apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire an image containing one or more objects to be detected;
the processing module is used for extracting the features of the image according to the neural network to obtain N feature maps; n is an integer greater than or equal to 2;
the processing module is further configured to perform fusion processing on the N feature maps to obtain a first feature map H;
the processing module is further configured to extract semantic information from the first feature map H to obtain an attention map;
the processing module is further used for obtaining a second feature map according to the attention map and the first feature map H;
and the processing module is further configured to fuse the N feature maps with the second feature map in sequence to obtain a detection result.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method of any of claims 1-7 are implemented when the program is executed by the processor.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program when executed by a processor implements the steps of the method of any one of claims 1 to 7.
CN201911284783.XA 2019-12-13 2019-12-13 Image detection method and device Active CN111160140B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911284783.XA CN111160140B (en) 2019-12-13 2019-12-13 Image detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911284783.XA CN111160140B (en) 2019-12-13 2019-12-13 Image detection method and device

Publications (2)

Publication Number Publication Date
CN111160140A true CN111160140A (en) 2020-05-15
CN111160140B CN111160140B (en) 2023-04-18

Family

ID=70557099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911284783.XA Active CN111160140B (en) 2019-12-13 2019-12-13 Image detection method and device

Country Status (1)

Country Link
CN (1) CN111160140B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104636755A (en) * 2015-01-31 2015-05-20 华南理工大学 Face beauty evaluation method based on deep learning
US20170124432A1 (en) * 2015-11-03 2017-05-04 Baidu Usa Llc Systems and methods for attention-based configurable convolutional neural networks (abc-cnn) for visual question answering
CN109242845A (en) * 2018-09-05 2019-01-18 北京市商汤科技开发有限公司 Medical imaging processing method and processing device, electronic equipment and storage medium
CN109389129A (en) * 2018-09-15 2019-02-26 北京市商汤科技开发有限公司 A kind of image processing method, electronic equipment and storage medium
CN109829429A (en) * 2019-01-31 2019-05-31 福州大学 Security protection sensitive articles detection method under monitoring scene based on YOLOv3
CN110298361A (en) * 2019-05-22 2019-10-01 浙江省北大信息技术高等研究院 A kind of semantic segmentation method and system of RGB-D image
CN110533045A (en) * 2019-07-31 2019-12-03 中国民航大学 A kind of luggage X-ray contraband image, semantic dividing method of combination attention mechanism
GB2586858A (en) * 2019-09-06 2021-03-10 Smiths Heimann Sas Image retrieval system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
FISHER YU: "Deep layer aggregation" *
XIANG JIE; BU WEI; WU XIANGQIAN: "Research on hand segmentation algorithms based on deep learning" *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112102244A (en) * 2020-08-17 2020-12-18 湖南大学 Fetus ultrasonic standard section image detection method, computer equipment and storage medium
CN111709951A (en) * 2020-08-20 2020-09-25 成都数之联科技有限公司 Target detection network training method and system, network, device and medium
CN111709951B (en) * 2020-08-20 2020-11-13 成都数之联科技有限公司 Target detection network training method and system, network, device and medium
CN112508848A (en) * 2020-11-06 2021-03-16 上海亨临光电科技有限公司 Deep learning multitask end-to-end-based remote sensing image ship rotating target detection method
CN112508848B (en) * 2020-11-06 2024-03-26 上海亨临光电科技有限公司 Deep learning multitasking end-to-end remote sensing image ship rotating target detection method
CN112633156A (en) * 2020-12-22 2021-04-09 浙江大华技术股份有限公司 Vehicle detection method, image processing apparatus, and computer-readable storage medium
CN112633156B (en) * 2020-12-22 2024-05-31 浙江大华技术股份有限公司 Vehicle detection method, image processing device, and computer-readable storage medium
CN112967264A (en) * 2021-03-19 2021-06-15 深圳市商汤科技有限公司 Defect detection method and device, electronic equipment and storage medium
CN113673557A (en) * 2021-07-12 2021-11-19 浙江大华技术股份有限公司 Feature processing method, action positioning method and related equipment

Also Published As

Publication number Publication date
CN111160140B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN111160140B (en) Image detection method and device
CN110738207B (en) Character detection method for fusing character area edge information in character image
CN110598788B (en) Target detection method, target detection device, electronic equipment and storage medium
US20150279048A1 (en) Method for generating a hierarchical structured pattern based descriptor and method and device for recognizing object using the same
CN111640089A (en) Defect detection method and device based on feature map center point
CN111274981B (en) Target detection network construction method and device and target detection method
CN110428414A (en) The method and device of bill quantity in a kind of identification image
Zhang et al. Fine localization and distortion resistant detection of multi-class barcode in complex environments
CN109344824A (en) A kind of line of text method for detecting area, device, medium and electronic equipment
KR102576157B1 (en) Method and apparatus for high speed object detection using artificial neural network
US9710703B1 (en) Method and apparatus for detecting texts included in a specific image
CN114494823A (en) Commodity identification, detection and counting method and system in retail scene
CN113723352A (en) Text detection method, system, storage medium and electronic equipment
CN111738069A (en) Face detection method and device, electronic equipment and storage medium
CN111062385A (en) Network model construction method and system for image text information detection
Eilertsen et al. BriefMatch: Dense binary feature matching for real-time optical flow estimation
CN116310688A (en) Target detection model based on cascade fusion, and construction method, device and application thereof
CN109815975A (en) A kind of objective classification method and relevant apparatus based on robot
CN113936288A (en) Inclined text direction classification method and device, terminal equipment and readable storage medium
Pototzky et al. Self-supervised learning for object detection in autonomous driving
Dong et al. SiameseDenseU‐Net‐based Semantic Segmentation of Urban Remote Sensing Images
CN112733741A (en) Traffic signboard identification method and device and electronic equipment
US9740947B2 (en) Hardware architecture for linear-time extraction of maximally stable extremal regions (MSERs)
Gao et al. Spatial Cross-Attention RGB-D Fusion Module for Object Detection
CN116958981B (en) Character recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant