CN113642572A - Image target detection method, system and device based on multi-level attention

Info

Publication number
CN113642572A
Authority
CN
China
Prior art keywords
image
attention
module
target
detection
Prior art date
Legal status
Granted
Application number
CN202110798192.5A
Other languages
Chinese (zh)
Other versions
CN113642572B (en)
Inventor
张重阳
赵炳堃
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202110798192.5A
Publication of CN113642572A
Application granted
Publication of CN113642572B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method, a system and a device for image target detection based on multi-level attention, comprising the following steps: constructing a feature extractor based on a deep convolutional neural network as a backbone network, and inputting an image into the backbone network to extract the depth features of the image; constructing a branch network based on a convolutional neural network as an attention branch; inputting the depth features of the image into the attention branch to obtain a multi-level attention weight map; multiplying the multi-level attention weight map with the depth features of the image to obtain a weighted feature map; inputting the weighted feature map into an RPN module to obtain target candidate frames; and sending the weighted feature map corresponding to the target candidate frames to a classification and regression module to obtain the target detection frames. The invention can extract targets in an image at different levels according to their degree of interest, thereby greatly reducing false detections caused by disturbances in background regions when detecting specific image targets such as personnel in a monitoring system or defects in industrial products.

Description

Image target detection method, system and device based on multi-level attention
Technical Field
The invention relates to the technical field of image target detection, in particular to a method, a system and a device for detecting an image target based on multi-level attention.
Background
The task of target detection is to find all targets (objects) of interest in an image and determine their positions and sizes. It is a core research direction in the field of computer vision and has wide application requirements in scenes such as automatic driving, security monitoring and industrial manufacturing.
Traditional target detection is mainly based on hand-crafted features, as in methods such as HOG and DPM: features of the target of interest are designed manually and then classified with a classifier such as an SVM. Traditional target detection algorithms are only suitable for scenes with salient and simple characteristics; for complex scenes, it is difficult to design appropriate features by hand.
In recent years, with the rapid development of deep learning in the field of computer vision, target detection algorithms based on deep learning have made great progress. After R. Girshick et al. proposed the pioneering R-CNN (Regions with CNN features) algorithm in 2014, the field of target detection entered the deep learning era. The R-CNN algorithm first uses selective search to generate a series of region proposals, then feeds the proposals into a CNN model to extract features, and finally classifies all proposals with an SVM linear classifier, thereby achieving excellent detection performance. Fast R-CNN subsequently designed a multi-task loss function that unifies the classification task and bounding-box regression in the same network, which significantly improved training speed. The subsequent Faster R-CNN algorithm proposed by S. Ren et al. further broke through the speed bottleneck of Fast R-CNN by introducing an RPN network to generate region proposals instead of selective search, thereby truly realizing end-to-end target detection. In addition, there are single-stage (One-Stage) target detection algorithms, represented by YOLO and SSD, which convert the target detection task into a regression task. The YOLO algorithm abandons the earlier extract-candidates-then-verify paradigm and applies a single neural network to the whole image, thereby achieving a high detection speed. It first divides the whole image into grid cells, each of which is responsible for detecting targets whose center points fall within it, predicts a certain number of detection boxes and their confidences for each cell, and finally screens the predicted boxes through non-maximum suppression.
However, the above deep-learning-based target detection is basically aimed at general objects. In practical application scenarios the scenes are often more complex and the features of the object to be detected may not be sufficiently prominent, so the detector may be interfered with by the background or by other objects with similar features, causing a large number of false detections. For example, in defect detection of a reed switch, the detection of foreign matter in the reed contact area is easily affected by dirt or fine fibers on the glass tube wall, because these have characteristics similar to the foreign matter in the reed contact area, leading to erroneous judgments. Some existing solutions to such problems employ a two-stage detection scheme: the region to which the target object is most likely to be attached is detected first, and the target object is then detected within that region, so as to filter out false-detection samples outside the region of interest. However, firstly, this scheme runs two model inference processes, which sacrifices detection efficiency to a certain extent; secondly, it needs to train two detection models, so no end-to-end structure is formed; in addition, it computes the features of the same area twice, which wastes computational resources.
Therefore, how to design an end-to-end network structure at the algorithm level, together with a complete detection system and apparatus, to address the pain points encountered in the above practical industrial scenarios, is a problem well worth studying.
A prior-art search shows that Chinese patent CN112686304A discloses a target detection method, device and storage medium based on an attention mechanism and multi-scale feature fusion. It only focuses on global attention, does not consider the interference of specific background regions on the target to be detected, and therefore has difficulty effectively filtering out objects in background regions whose features are similar to those of the target to be detected.
Disclosure of Invention
Aiming at the problems in the above scenarios, the invention provides a method, a system and a device for image target detection based on multi-level attention, in which an attention mechanism is used to apply multi-level weighting to the feature map, so that hard samples outside the region of interest are filtered out to a certain extent and the false detection rate is reduced.
In a first aspect of the present invention, an image target detection method based on a multi-stage attention mechanism is provided, which includes:
s1, constructing a feature extractor based on the deep convolutional neural network, using the feature extractor as a backbone network, and inputting the image into the backbone network to extract the depth features of the image;
s2, constructing a branch network based on the convolutional neural network as an attention branch for extracting a multi-level attention weight map;
s3, inputting the depth feature of the image into the attention branch to obtain a multi-level attention weight map;
s4, multiplying the multilevel attention weight map and the depth feature of the image to obtain a weighted feature map;
s5, inputting the weighted feature map into an RPN module to obtain a series of target candidate frames;
and S6, sending the weighted feature map corresponding to the target candidate frame to a classification and regression module to finally obtain a target detection frame.
Preferably, the S2 includes:
s21, performing dimensionality reduction on the depth features of the image of S1 through convolution operation to obtain an output feature map with the same scale and the channel number of 1;
and S22, performing convolution operation on the output characteristic diagram obtained in the step S21 to obtain a multi-level attention weight diagram with a value between 0 and 1 as the output of the attention branch.
Preferably, the S3 further comprises providing supervision information for the attention branch, including:
S31, collecting a large number of images containing the object to be detected, constructing a training data set, labeling the training data set, and marking the position, size and category information of the object to be detected as well as the position, size and category information of the area to which the object to be detected is attached, namely the region of interest;
S32, generating a zero matrix with the same scale as the depth features of the image of S1, and transforming the position and size of the region of interest in the training image in equal proportion to obtain transformed coordinates;
S33, in the zero matrix, according to the transformed coordinates obtained in S32, assigning values to the positions corresponding to the transformed regions of interest as primary region of interest, secondary region of interest and region of no interest respectively, where different regions correspond to different values;
S34, during training, using the matrix assigned in S33 as the supervision information of the attention weight map, and calculating the loss of the attention branch through a loss function, denoted as Loss_a.
Preferably, the attention weight map output in S2 represents the degree of importance of different regions, wherein the primary region of interest has the largest weight and the secondary region of interest has the next highest weight. In S4, the weighted feature map is obtained by multiplying the attention weight map by the feature map output in S1.
Preferably, in S6, the classification and regression module includes: the classification network is used for classifying the weighted feature maps corresponding to the target candidate frames and outputting specific categories of the weighted feature maps; and the regression network is used for finely adjusting the position of the target candidate frame. The classification and regression networks respectively obtain a Loss during training, and the two losses are added with the Loss of the attention branch obtained in S3 to serve as the total Loss of the whole network, so that end-to-end training is realized.
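To make the structure of the classification and regression module concrete, the following is a PyTorch-style sketch that assumes torchvision's roi_align is used to pool the weighted features inside each candidate frame; the layer widths, pooling resolution and class count are illustrative assumptions, since the patent does not specify them, and typical choices such as cross-entropy for the classification loss and smooth-L1 for the regression loss are likewise not named by the patent.

```python
import torch.nn as nn
from torchvision.ops import roi_align

class ClsRegHead(nn.Module):
    """Illustrative classification-and-regression head: pools the weighted
    feature map M2 inside each candidate frame, then predicts a class score
    (classification network) and a frame refinement (regression network)."""
    def __init__(self, in_channels=512, num_classes=2, pool_size=7):
        super().__init__()
        self.pool_size = pool_size
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_channels * pool_size * pool_size, 1024),
            nn.ReLU(inplace=True),
        )
        self.cls_score = nn.Linear(1024, num_classes)       # classification network
        self.bbox_pred = nn.Linear(1024, 4 * num_classes)   # regression network (frame refinement)

    def forward(self, m2, boxes, spatial_scale):
        # boxes: list of (K_i, 4) candidate frames per image, in image coordinates
        feats = roi_align(m2, boxes, output_size=self.pool_size,
                          spatial_scale=spatial_scale, aligned=True)
        x = self.fc(feats)
        return self.cls_score(x), self.bbox_pred(x)
```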
In a second aspect of the present invention, there is provided an image target detection system based on a multi-stage attention mechanism, comprising:
the characteristic extraction module constructs a characteristic extractor based on the deep convolutional neural network, and the characteristic extractor is used as a backbone network to input the image into the backbone network to extract the depth characteristic of the image;
the attention branch module is used for constructing a branch network based on the convolutional neural network to serve as an attention branch and extracting a multi-level attention weight map;
a multi-level attention weight map acquisition module, which inputs the depth features of the image obtained by the feature extraction module into the attention branches constructed by the attention branch module to obtain a multi-level attention weight map;
a weighted feature map acquisition module, which multiplies the multi-level attention weight map obtained by the multi-level attention weight map acquisition module by the depth feature of the image obtained by the feature extraction module to obtain a weighted feature map;
a target candidate frame acquisition module, which inputs the weighted feature map obtained by the weighted feature map acquisition module into an RPN module to obtain a series of target candidate frames;
and the classification and regression module is used for classifying and regressing according to the weighted feature map corresponding to the target candidate frame obtained by the target candidate frame obtaining module to obtain the target detection frame.
In a third aspect of the present invention, there is provided an image object detection apparatus based on multi-level attention, comprising:
the image acquisition module is used for capturing a target to be detected, acquiring an image or a video containing the target to be detected in a specific scene and then carrying out subsequent detection;
the detection module is used for detecting the image acquired by the image acquisition module to obtain a specific detection result and displaying or feeding back the detection result to the control module; the detection adopts the image target detection method based on the multi-stage attention mechanism.
In a fourth aspect of the present invention, there is provided a computer device comprising at least one processor and at least one memory, wherein the memory stores a computer program which, when executed by the processor, enables the processor to perform the multi-level attention mechanism-based image object detection method.
In a fifth aspect of the present invention, there is provided a computer-readable storage medium, wherein instructions, when executed by a processor within a device, enable the device to perform the above-mentioned image object detection method based on a multi-level attention mechanism.
Compared with the prior art, the invention has the following advantages:
(1) the method assigns weights to the extracted depth features through a multi-level attention mechanism, giving different regions different weights according to their degree of interest, for example a higher weight to regions of greater interest and a lower weight to regions prone to false detection, which reduces to a certain extent the probability of false detections caused by background disturbances;
(2) by designing a branch structure and adding its loss to the total loss, the invention realizes end-to-end training, making the training and inference of the detection process simpler;
(3) the branch structure introduced by the invention adds only a small amount of computation, and compared with a two-stage detection method that performs inference twice, it avoids repeated computation of the feature map, thereby improving inference efficiency in the detection process;
(4) based on the method, the invention provides a detection system and a detection device to automatically detect surface defects in the production process of industrial products, thereby replacing manual labor to a certain extent and saving labor cost.
Drawings
Embodiments of the invention are further described below with reference to the accompanying drawings:
FIG. 1 is a flowchart of a multi-level attention-based image target detection method according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of the attention branch according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the supervision information used during training of the attention branch according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a system and an apparatus for multi-level attention-based image target detection according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications can be made by persons skilled in the art without departing from the spirit of the invention, all of which fall within the scope of the present invention.
Fig. 1 is a flowchart illustrating a multi-level attention-based image target detection method according to an embodiment of the present invention.
Referring to fig. 1, the method for detecting an image target based on a multi-level attention mechanism of the present embodiment includes:
s1, constructing a feature extractor which is called a backbone network based on the deep convolutional neural network, inputting the image into the backbone network, and extracting the depth features of the image, which are marked as M1;
s2, constructing a branch network based on the convolutional neural network as an attention branch for extracting a multi-level attention weight map;
s3, taking the depth feature M1 of the image obtained in S1 as the input of the attention branch, and outputting a multi-level attention weight map, which is marked as W;
s4, multiplying the output W of S3 and the output M1 of S1 to obtain a weighted characteristic diagram, which is marked as M2;
s5, taking the weighted feature map M2 of S4 as the input of the RPN module to obtain a series of target candidate frames, and marking the target candidate frames as B1;
and S6, sending the weighted feature map corresponding to the target candidate box output in the S5 to a classification and regression module, and finally obtaining a target detection box marked as B2.
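As a concrete illustration of how S1-S6 fit together, the following is a minimal PyTorch-style sketch of the forward wiring; the module names (backbone, attention_branch, rpn, roi_head) and the feature shapes in the comments are placeholders for this sketch, not identifiers defined by the patent.

```python
def forward_pipeline(image, backbone, attention_branch, rpn, roi_head):
    # S1: extract the depth feature M1 with the backbone network
    m1 = backbone(image)                 # e.g. (N, 512, H', W')
    # S2/S3: the attention branch outputs a single-channel weight map W in [0, 1]
    w = attention_branch(m1)             # (N, 1, H', W')
    # S4: element-wise multiplication (broadcast over channels) gives M2
    m2 = m1 * w
    # S5: the RPN proposes candidate frames B1 from the weighted feature map
    b1 = rpn(m2)
    # S6: classification and regression on the pooled candidate features give B2
    b2 = roi_head(m2, b1)
    return b2
```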
In S1, in a preferred embodiment, feature extraction is performed through a backbone network, where ResNet-50 may be used as the backbone network; each stage applies convolution and pooling operations to reduce the size of the feature map and increase its number of channels, and the output feature map of the fourth stage is finally selected as the output of the backbone network, i.e., the depth feature M1 of the image. Of course, in other embodiments, other networks may be used, and the choice is not limited to ResNet-50.
In a specific embodiment, S1 may refer to the following operations:
S11, preprocessing the input image (for example, scaling it), and then passing it through a convolution layer and a pooling layer in sequence to obtain the first-stage features, namely shallow features, whose size is reduced to half that of the preprocessed image and whose channel number is 64;
S12, performing convolution and pooling on the shallow features to obtain the second-stage features, namely intermediate features, whose size is reduced to half that of the shallow features and whose channel number is 128;
S13, performing convolution and pooling on the intermediate features to obtain the third-stage features, namely deeper features, whose size is reduced to half that of the intermediate features and whose channel number is 256;
and S14, performing convolution and pooling on the deeper features to obtain the fourth-stage features, namely deep features, which serve as the output feature map of the backbone network; their size is reduced to half that of the deeper features and the channel number is 512.
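The four stages S11-S14 can be sketched as follows. This is only an illustration that follows the channel counts stated above (64/128/256/512) and halves the spatial size once per stage; it is simpler than the actual ResNet-50 stage structure, so it should be read as a stand-in rather than the backbone actually used.

```python
import torch.nn as nn

class FourStageBackbone(nn.Module):
    """Sketch of the S11-S14 feature extractor: each stage halves the spatial
    size via pooling and doubles the channel count (64 -> 128 -> 256 -> 512)."""
    def __init__(self, in_channels=3):
        super().__init__()
        def stage(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2, stride=2),   # halves H and W
            )
        self.stage1 = stage(in_channels, 64)    # S11: shallow features
        self.stage2 = stage(64, 128)            # S12: intermediate features
        self.stage3 = stage(128, 256)           # S13: deeper features
        self.stage4 = stage(256, 512)           # S14: deep features, output M1

    def forward(self, x):
        return self.stage4(self.stage3(self.stage2(self.stage1(x))))
```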
In a preferred example, in S2, in order to construct a branch network for extracting a multi-level attention weight map, the method may include:
S21, applying a 3x3 convolution to the output feature map M1 of S1 to reduce its channel dimension to 1, obtaining an output feature map with the same spatial scale as M1 and a channel number of 1;
and S22, applying another 3x3 convolution to the output feature map of S21 to obtain an attention weight map with values between 0 and 1, which serves as the output of the attention branch and is denoted as W.
In this embodiment, different regions in the image are classified into different attention levels according to the degree of interest, the different attention levels correspond to different values, and the higher the attention level is, the larger the value is. Repeated calculation of the feature map is avoided through the introduced branch structure, and therefore processing efficiency is improved.
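A small sketch of the attention branch of S21-S22 follows; the sigmoid at the end is an assumption, since the text only states that the output values lie between 0 and 1 without naming the squashing function.

```python
import torch
import torch.nn as nn

class AttentionBranch(nn.Module):
    """S21: a 3x3 convolution reduces M1 to a single channel; S22: a second 3x3
    convolution refines it and the result is squashed into (0, 1) to form the
    multi-level attention weight map W."""
    def __init__(self, in_channels=512):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, 1, kernel_size=3, padding=1)   # S21
        self.refine = nn.Conv2d(1, 1, kernel_size=3, padding=1)             # S22

    def forward(self, m1):
        w = self.refine(self.reduce(m1))
        return torch.sigmoid(w)   # W has the same spatial size as M1, one channel
```

The weighted feature map of S4 is then simply m2 = m1 * w, where the single-channel W broadcasts over the channels of M1.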
In the preferred example, in S3, a multi-level attention weight map is obtained through the attention branch. In order to guide the attention branch to generate larger weights for the region of interest, the attention branch should be provided with corresponding supervision information. For example, in one embodiment, the region of interest is the contact area of the reed, since foreign matter is always attached to the reed contact area. Specifically, the method comprises the following steps:
S31, labeling the training data set: in addition to the position, size and category information of the object to be detected, i.e., the foreign matter in the reed contact area, the area to which the object to be detected is attached, i.e., the region of interest, which in this example is the reed contact area, is also labeled with its position, size and category information.
And S32, generating a zero matrix with the same scale as the output feature map M1 of S1, and scaling the position and size of the region of interest, namely the reed contact area, in the training picture in equal proportion. For example, assume that the width and height of the preprocessed input picture of S1 are W and H, and that the width and height of the output feature map M1 of S1 are W1 and H1; the position coordinates of the region of interest, i.e., the reed contact area, in the picture are ((x11, y11), (x12, y12)). The position coordinates of the corresponding area after transformation are derived by the following formulas:
x21 = x11 · W1 / W
y21 = y11 · H1 / H
x22 = x12 · W1 / W
y22 = y12 · H1 / H
where x21, y21, x22, y22 respectively denote the abscissa of the upper-left corner, the ordinate of the upper-left corner, the abscissa of the lower-right corner and the ordinate of the lower-right corner of the transformed region of interest. This set of coordinates corresponds to the coordinates of the reed contact area after scaling.
And S33, in the zero matrix, according to the transformed coordinates obtained in S32, assigning the value 1 to the positions corresponding to the transformed region of interest, namely the reed contact area; this region represents the primary region of interest, and its practical meaning is that foreign matter in the reed contact area affects the switching performance of the reed and should be detected accurately. Other positions lying on the same horizontal line as the primary region of interest are assigned 0.5 and are called the secondary region of interest; the practical meaning is that foreign matter on the reed non-contact area does not affect the performance of the reed switch for the moment, but may later move to the reed contact area, and can therefore be reported with a lower degree of confidence. The remaining area, which keeps the value 0, is called the region of no interest and represents other areas that are easy to misdetect, such as smudges or fibers on the glass tube wall, which should be filtered out. Of course, in other embodiments, other assignment rules may be adopted; the values mainly serve to distinguish different areas, and this is only an example, not a limitation of the present invention.
S34, during training, the matrix assigned in S33 is used as the supervision information of the attention weight map, and the loss of the attention branch is calculated through a loss function and is denoted as Loss_a.
The supervision information obtained in this embodiment guides the attention branch to generate larger weights for the region of interest, which increases the probability of detecting targets in that region; conversely, regions with a low probability of containing targets, such as the background, receive relatively smaller weights, so that false detections caused by background interference are reduced.
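The construction of the supervision information in S31-S34 can be sketched as follows. The 1 / 0.5 / 0 assignments follow S33; the use of a mean-squared-error loss for Loss_a is an assumption, since the patent only speaks of "a loss function".

```python
import torch
import torch.nn.functional as F

def build_attention_target(h1, w1, img_h, img_w, primary_boxes, secondary_boxes=()):
    """S32-S33: build the supervision matrix for one image.
    Boxes are (x11, y11, x12, y12) in original image coordinates; h1, w1 are the
    height and width of the feature map M1. Primary regions of interest are set
    to 1, secondary regions to 0.5, and everything else stays 0."""
    target = torch.zeros(h1, w1)

    def to_feature_coords(box):
        x11, y11, x12, y12 = box
        return (int(x11 * w1 / img_w), int(y11 * h1 / img_h),
                int(x12 * w1 / img_w), int(y12 * h1 / img_h))

    for box in secondary_boxes:                 # e.g. the reed non-contact area
        x21, y21, x22, y22 = to_feature_coords(box)
        target[y21:y22, x21:x22] = 0.5
    for box in primary_boxes:                   # e.g. the reed contact area
        x21, y21, x22, y22 = to_feature_coords(box)
        target[y21:y22, x21:x22] = 1.0
    return target

def attention_loss(weight_map, target):
    """S34: Loss_a between the predicted weight map W, shape (N, 1, H1, W1),
    and the supervision matrices stacked over the batch, shape (N, H1, W1);
    MSE is used here as a stand-in."""
    return F.mse_loss(weight_map.squeeze(1), target)
```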
In a preferred embodiment, the attention weight graph W output in S2 represents the importance of different regions, wherein the primary region of interest, i.e., the reed contact region, has the greatest weight, the secondary region of interest, i.e., the reed non-contact region, has the next greatest weight, and the other regions of no interest have a weight of 0. In S4, the attention weight map W is multiplied by the feature map M1 output in S1 to obtain a weighted feature map, which is denoted as M2.
In the preferred embodiment, in S6, the classification and regression networks respectively obtain a Loss during training, and the two losses are added to the Loss of the attention branch obtained in S3 to obtain the total Loss of the entire network, thereby achieving end-to-end training. As shown in the following equation:
Loss = Loss_a + Loss_cls + Loss_reg
where Loss represents the total loss of the entire network, Loss_a represents the loss of the attention branch, Loss_cls represents the loss of the classification network, and Loss_reg represents the loss of the bounding-box regression network.
In this embodiment, the loss of the branch is added to the total loss, so that end-to-end training is realized and the training and inference of the network are simpler.
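A short sketch of how the three loss terms are combined for end-to-end training follows; loss_cls and loss_reg stand for whatever scalar losses the classification and box-regression heads return in a concrete implementation.

```python
def training_step(optimizer, loss_a, loss_cls, loss_reg):
    """One optimisation step over the total loss Loss = Loss_a + Loss_cls + Loss_reg,
    so that the attention branch is trained jointly with the detection heads."""
    total_loss = loss_a + loss_cls + loss_reg
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    return total_loss.detach()
```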
Based on the same technical concept, another embodiment of the present invention further provides an image target detection system based on a multi-stage attention mechanism, including:
the feature extraction module is used for constructing a feature extractor based on the deep convolutional neural network, and inputting the image into the backbone network to extract the depth features of the image;
the attention branch module is used for constructing a branch network based on the convolutional neural network to serve as an attention branch and extracting a multi-level attention weight map;
the multi-level attention weight map acquisition module inputs the depth features of the image obtained by the feature extraction module into the attention branches constructed by the attention branch module to obtain a multi-level attention weight map;
a weighted feature map acquisition module, which multiplies the multi-level attention weight map obtained by the multi-level attention weight map acquisition module by the depth feature of the image obtained by the feature extraction module to obtain a weighted feature map;
a target candidate frame acquisition module, which inputs the weighted feature map obtained by the weighted feature map acquisition module into the RPN module to obtain a series of target candidate frames;
and the classification and regression module is used for classifying and regressing according to the weighted feature map corresponding to the target candidate frame obtained by the target candidate frame obtaining module to obtain the target detection frame.
The specific implementation technology of each module in the embodiment of the image target detection system based on multi-level attention of the present invention may refer to the steps corresponding to the method, and will not be described herein again. The embodiment of the invention can meet the requirement of real-time detection and is more suitable for application in industrial scenes.
Based on the detection method and system, in another embodiment of the present invention, a multi-level attention-based image object detection apparatus is provided, in which the multi-level attention-based image object detection method is adopted for implementing a task of detecting a specific object in an image. Specifically, the image target detection device based on multi-level attention comprises: the image acquisition module is used for capturing a target to be detected, acquiring an image or a video containing the target to be detected in a specific scene and then carrying out subsequent detection; the detection module is used for detecting the image acquired by the image acquisition module to obtain a specific detection result and displaying or feeding back the detection result to the control module; the detection adopts the image target detection method based on the multi-stage attention mechanism in any one of the above embodiments.
Further, in order to clearly understand the technical solutions described above, the following description will be given in detail by taking a case of defect detection applied in an industrial scene as an example, but it should be understood that the example is not intended to limit the application of the present invention.
Specifically, referring to fig. 4, the image target detection method based on multi-level attention is applied to defect detection in an industrial scene. The embodiment is applied to defect detection in an industrial scene, and the detection target is a tiny foreign matter on a reed contact area in the magnetic reed switch. Since dirt or some fine fibers are often present on the wall of the reed switch glass tube, and their characteristics are similar to those of foreign objects on the reed contact area, it is difficult to completely distinguish the two by using the conventional target detection algorithm. These objects do not affect the function of the reed switch, and therefore a large number of false detections exist in the detection process. In view of this, the present embodiment adopts a product defect detecting apparatus based on multi-level attention, which includes: the device comprises a mechanical transmission module, an image acquisition module, a detection module and a software and hardware communication module. The mechanical transmission module conveys, rotates and grabs a product to be detected (a magnetic reed switch); the image acquisition module acquires images of conveyed and rotated products to be detected (magnetic reed switches) in the working process of the mechanical transmission module; the detection module processes and analyzes the image acquired by the image acquisition module, specifically adopts the image target detection method based on multi-level attention in the embodiment to obtain a specific detection result, and feeds the detection result back to the mechanical transmission module, and the mechanical transmission module carries out classified grabbing on the product to be detected (the magnetic reed switch) according to the fed-back detection result. Further, the device may further comprise a communication module for communication between the mechanical transmission module and the detection module.
Specifically, in a preferred embodiment, the mechanical transmission module performs conveying, rotating and grabbing of the product to be detected, and may include:
the material conveying module: the reed switch is automatically conveyed in a pipeline mode so as to carry out detection and classification.
The material rotation module: the magnetic reed switch can be grabbed, rotated and the like, rotated at any angle according to the axis, and continuously grabbed and subjected to image acquisition in the rotating process, so that the magnetic reed switch can be subjected to all-dimensional dead-angle-free detection.
The material dividing module: the magnetic reed switches are classified according to detection results, when the magnetic reed switches are conveyed to the tail end of the conveying belt by the conveying module, the material sorting module is used for grabbing the magnetic reed switches and putting the magnetic reed switches according to the detection results, the magnetic reed switches are divided into good products and defective products, and the defective products are further classified according to specific defect categories.
The programmable logic controller: the material distribution module is used for controlling the operation of the whole machine, and comprises a material conveying module, a material rotating module and a material distribution module, so that an operable interface is formed, and the operation such as control, parameter setting and the like can be performed on a mechanical device.
Specifically, in a preferred embodiment, the image capturing module may employ an optical image capturing device and may include: an optical microscope, used for imaging the surface of the reed switch, which magnifies the industrial product at a certain magnification for image acquisition; a light source system, comprising a light source and a light source controller, used for providing good illumination conditions for the optical microscope; and an industrial camera, used for capturing the optical images formed by the optical microscope as a series of images or videos for subsequent inspection.
Of course, in other embodiments, the optical image capturing device may alternatively be another device, including but not limited to a video microscope or a monitoring probe, for optically imaging the object to be inspected, such as an industrial product, and capturing it as a digital image.
Specifically, in a preferred embodiment, the detection module may include two parts, hardware and software. The hardware is a computer, such as a high-performance GPU computer, configured to run the detection software, detect the images acquired by the image acquisition module, and feed the detection results back to the control system, such as an industrial material sorting module or a security alarm linkage module, so as to finally realize the sorting of defective industrial products, audible and visual alarms for abnormal conditions, and the like. In this embodiment, the detection result is fed back to the sorting module, finally realizing the detection and sorting of the reed switches. The detection software mainly detects the acquired pictures and includes a graphical user interface through which a user can view the real-time pictures acquired by the image acquisition module and the result of each detection; the software also provides functions such as parameter configuration, data statistics and log recording.
Specifically, in a preferred embodiment, the hardware and software communication module is a signal conversion and transmission module, including but not limited to a switching-value module, which is used for signal conversion and communication between the control system and the high-performance GPU computer. The signal conversion is the conversion of digital quantities into analog or switching quantities, used to realize signal control of the mechanical devices. In this embodiment, a switching-value module may be used for communication between the high-performance GPU computer and mechanical devices such as the material rotation module and the material sorting module. Specifically, when the material rotation module starts rotating, it can send a start-detection signal to the computer through the switching-value module, so that the computer begins detecting the current reed switch; after the computer has finished detecting a reed switch, it can send the detection result to the material sorting module through the switching-value module, so that the reed switch can be sorted.
In another embodiment of the present invention, there is further provided a computer device, including at least one processor and at least one memory, where the memory stores a computer program, and when the program is executed by the processor, the processor is enabled to execute the image target detection method based on the multi-level attention mechanism of any one of the above embodiments.
In another embodiment of the present invention, a computer-readable storage medium is further provided, wherein when the instructions in the storage medium are executed by a processor in a device, the device is enabled to execute the image target detection method based on the multi-level attention mechanism of any one of the above embodiments.
It should be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. An image target detection method based on a multi-stage attention mechanism is characterized by comprising the following steps:
s1, constructing a feature extractor based on the deep convolutional neural network, using the feature extractor as a backbone network, and inputting the image into the backbone network to extract the depth features of the image;
s2, constructing a branch network based on the convolutional neural network as an attention branch for extracting a multi-level attention weight map;
s3, inputting the depth feature of the image into the attention branch to obtain a multi-level attention weight map;
s4, multiplying the multilevel attention weight map and the depth feature of the image to obtain a weighted feature map;
s5, inputting the weighted feature map into an RPN module to obtain a series of target candidate frames;
and S6, sending the weighted feature map corresponding to the target candidate frame to a classification and regression module to finally obtain a target detection frame.
2. The image target detection method based on multi-stage attention mechanism as claimed in claim 1, wherein said S2 comprises:
s21, performing dimensionality reduction on the depth features of the image of S1 through convolution operation to obtain an output feature map with the same scale and the channel number of 1;
and S22, performing convolution operation on the output characteristic diagram obtained in the step S21 to obtain a multi-level attention weight diagram with a value between 0 and 1 as the output of the attention branch.
3. The method for image object detection based on multi-level attention mechanism as claimed in claim 1, wherein said S3 further comprises providing supervision information for said attention branch, including:
S31, collecting a large number of images containing the object to be detected, constructing a training data set, labeling the training data set, and marking the position, size and category information corresponding to the object to be detected and the area to which the object to be detected is attached, namely the position, size and category information of the region of interest;
S32, generating a zero matrix with the same depth characteristic scale as that of the image of S1, and carrying out equal-proportion transformation on the position and the size of the region of interest in the training image to obtain a transformation coordinate;
S33, in the zero matrix, according to the transformation coordinates obtained in S32, assigning values to the positions corresponding to the transformed interested areas according to the primary interested area, the secondary interested area and the uninteresting area respectively, wherein different areas correspond to different values;
S34, during training, using the matrix assigned in the step S33 as the supervision information of the attention weight map, and calculating the loss of the attention branch through a loss function, denoted as Loss_a.
4. The method for detecting image targets based on multi-level attention mechanism as claimed in claim 3, wherein in S32, it is assumed that the width and height of the original image inputted into S1 are W and H, that the width and height of the output feature map of S1 are W1 and H1, and that the position coordinates of the region of interest in the picture are ((x11, y11), (x12, y12)); the position coordinates of the corresponding area after transformation are derived by the following formulas:
x21 = x11 · W1 / W
y21 = y11 · H1 / H
x22 = x12 · W1 / W
y22 = y12 · H1 / H
wherein x21, y21, x22, y22 respectively represent the abscissa of the upper-left corner, the ordinate of the upper-left corner, the abscissa of the lower-right corner and the ordinate of the lower-right corner of the transformed region of interest.
5. The image target detection method based on the multi-level attention mechanism as claimed in claim 3, wherein in S33, the position corresponding to the region of interest after transformation is assigned with a larger value ranging from 0 to 1, where the region represents a primary region of interest; assigning a smaller value with the value range of 0 to 1 to other positions which are positioned in the same horizontal line with the primary interested area, and calling the smaller value as a secondary interested area; the remaining area remains 0, called the region of no interest.
6. The image target detection method based on multi-stage attention mechanism as claimed in claim 1, wherein in the step S6, the classification and regression module comprises:
the classification network is used for classifying the weighted feature maps corresponding to the target candidate frames and outputting specific categories of the weighted feature maps;
the regression network is used for finely adjusting the position of the target candidate frame;
the classification and regression networks respectively obtain a Loss during training, and the two losses are added with the Loss of the attention branch of S3 to serve as the total Loss of the whole network, so that end-to-end training is realized.
7. An image target detection system based on a multi-stage attention mechanism, comprising:
the characteristic extraction module constructs a characteristic extractor based on the deep convolutional neural network, and the characteristic extractor is used as a backbone network to input the image into the backbone network to extract the depth characteristic of the image;
the attention branch module is used for constructing a branch network based on the convolutional neural network to serve as an attention branch and extracting a multi-level attention weight map;
a multi-level attention weight map acquisition module, which inputs the depth features of the image obtained by the feature extraction module into the attention branches constructed by the attention branch module to obtain a multi-level attention weight map;
a weighted feature map acquisition module, which multiplies the multi-level attention weight map obtained by the multi-level attention weight map acquisition module by the depth feature of the image obtained by the feature extraction module to obtain a weighted feature map;
a target candidate frame acquisition module, which inputs the weighted feature map obtained by the weighted feature map acquisition module into an RPN module to obtain a series of target candidate frames;
and the classification and regression module is used for classifying and regressing according to the weighted feature map corresponding to the target candidate frame obtained by the target candidate frame obtaining module to obtain the target detection frame.
8. An image object detecting apparatus based on multi-level attention, comprising:
the image acquisition module is used for capturing a target to be detected, acquiring an image or a video containing the target to be detected in a specific scene and then carrying out subsequent detection;
the detection module is used for detecting the image acquired by the image acquisition module to obtain a specific detection result and displaying or feeding back the detection result to the control module; the detection adopts the image object detection method based on the multi-stage attention mechanism as claimed in any one of claims 1-6.
9. A computer device comprising at least one processor and at least one memory, wherein the memory stores a computer program which, when executed by the processor, enables the processor to perform the method of image object detection based on a multi-level attention mechanism of any one of claims 1-6.
10. A computer readable storage medium having instructions which, when executed by a processor within an apparatus, enable the apparatus to perform the method of image object detection based on a multi-level attention mechanism of any one of claims 1 to 6.
CN202110798192.5A 2021-07-15 2021-07-15 Image target detection method, system and device based on multi-level attention Active CN113642572B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110798192.5A CN113642572B (en) 2021-07-15 2021-07-15 Image target detection method, system and device based on multi-level attention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110798192.5A CN113642572B (en) 2021-07-15 2021-07-15 Image target detection method, system and device based on multi-level attention

Publications (2)

Publication Number Publication Date
CN113642572A true CN113642572A (en) 2021-11-12
CN113642572B CN113642572B (en) 2023-10-27

Family

ID=78417381

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110798192.5A Active CN113642572B (en) 2021-07-15 2021-07-15 Image target detection method, system and device based on multi-level attention

Country Status (1)

Country Link
CN (1) CN113642572B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109213868A (en) * 2018-11-21 2019-01-15 中国科学院自动化研究所 Entity level sensibility classification method based on convolution attention mechanism network
CN110135243A (en) * 2019-04-02 2019-08-16 上海交通大学 A kind of pedestrian detection method and system based on two-stage attention mechanism
CN111046871A (en) * 2019-12-11 2020-04-21 厦门大学 Region-of-interest extraction method and system
US20200356854A1 (en) * 2017-11-03 2020-11-12 Siemens Aktiengesellschaft Weakly-supervised semantic segmentation with self-guidance
US20210012146A1 (en) * 2019-07-12 2021-01-14 Wuyi University Method and apparatus for multi-scale sar image recognition based on attention mechanism
CN112686304A (en) * 2020-12-29 2021-04-20 山东大学 Target detection method and device based on attention mechanism and multi-scale feature fusion and storage medium
CN112767466A (en) * 2021-01-20 2021-05-07 大连理工大学 Light field depth estimation method based on multi-mode information
CN113065550A (en) * 2021-03-12 2021-07-02 国网河北省电力有限公司 Text recognition method based on self-attention mechanism


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李文涛 等 (Li Wentao et al.): "多尺度通道注意力融合网络的小目标检测算法" (Small target detection algorithm with multi-scale channel attention fusion network), 《计算机科学与探索》 (Journal of Frontiers of Computer Science and Technology), pages 2390-2400 *

Also Published As

Publication number Publication date
CN113642572B (en) 2023-10-27

Similar Documents

Publication Publication Date Title
Jia et al. Detection and segmentation of overlapped fruits based on optimized mask R-CNN application in apple harvesting robot
CN112884064B (en) Target detection and identification method based on neural network
CN111899227A (en) Automatic railway fastener defect acquisition and identification method based on unmanned aerial vehicle operation
CN108985169B (en) Shop cross-door operation detection method based on deep learning target detection and dynamic background modeling
CN110009622B (en) Display panel appearance defect detection network and defect detection method thereof
CN114120093B (en) Coal gangue target detection method based on improved YOLOv algorithm
CN113642474A (en) Hazardous area personnel monitoring method based on YOLOV5
CN110599458A (en) Underground pipe network detection and evaluation cloud system based on convolutional neural network
CN114037684B (en) Defect detection method based on yolov and attention mechanism model
CN112561885B (en) YOLOv 4-tiny-based gate valve opening detection method
CN113469938A (en) Pipe gallery video analysis method and system based on embedded front-end processing server
CN113642572B (en) Image target detection method, system and device based on multi-level attention
CN116152722A (en) Video anomaly detection method based on combination of residual attention block and self-selection learning
CN112730437B (en) Spinneret plate surface defect detection method and device based on depth separable convolutional neural network, storage medium and equipment
CN113642473A (en) Mining coal machine state identification method based on computer vision
CN115564031A (en) Detection network for glass defect detection
CN112861681B (en) Pipe gallery video intelligent analysis method and system based on cloud processing
Chang et al. Deep Learning Approaches for Dynamic Object Understanding and Defect Detection
CN114387564A (en) Head-knocking engine-off pumping-stopping detection method based on YOLOv5
CN114140879A (en) Behavior identification method and device based on multi-head cascade attention network and time convolution network
CN112967335A (en) Bubble size monitoring method and device
CN113888604A (en) Target tracking method based on depth optical flow
Tennakoon et al. Visual Inspection of Storm-Water Pipe Systems using Deep Convolutional Neural Networks.
He et al. Fabric defect detection based on improved object as point
Wu et al. Express parcel detection based on improved faster regions with CNN features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant