WO2023159927A1

WO2023159927A1 - Rapid object detection method based on conditional branches and expert systems

Info

Publication number: WO2023159927A1
Application number: PCT/CN2022/120298
Authority: WO
Inventors: 高红霞; 黄滨; 廖宏宇; 牛世成
Original assignee: 华南理工大学
Priority date: 2022-02-25
Filing date: 2022-09-21
Publication date: 2023-08-31
Also published as: CN114626443A; CN114626443B

Abstract

A rapid object detection method based on conditional branches and expert systems. The method comprises: 1) acquiring an X-ray image; 2) obtaining image feature maps of RGB, HSV and gradient by means of conditional branches; 3) obtaining an ROI region using a region proposal network; 4) obtaining three ROI feature maps by means of branch feature alignment; 5) calculating the contribution degrees of the three feature maps, and performing feature concatenation on the basis of the contribution degrees, so as to obtain feature vectors subjected to weighted fusion; 6) inputting the three feature vectors subjected to weighted fusion into three expert system networks, so as to obtain object categories and positions; and 7) performing weighted fusion on prediction results of the three expert system networks, and identifying and marking the category and position of a tested object. Object detection is performed on the basis of conditional branches and expert systems, and a complex network is decomposed into network branches for parallel calculation, such that the inference speed of the network is increased, and the capability of mapping between a feature space and a solution space is also enhanced, thereby improving the speed and precision of object detection.

Description

A Fast Object Detection Method Based on Conditional Branch and Expert System

technical field

The present invention relates to the technical field of smart home appliance detection, in particular to a rapid object detection method based on conditional branches and an expert system, which realizes automatic detection, reduces work costs, and improves product defects on PCBA home appliance production and assembly lines and X-ray security inspection. Contraband detection accuracy and efficiency.

Background technique

With the development of artificial intelligence, the use of machines to replace manpower to achieve labor work has gradually become a new trend of technological development, especially in the fields of smart home appliance detection and X-ray safety inspection. PCBA intelligent detection was born in response to the current lagging manual/semi-automatic platform testing and the increasing demand for production efficiency. The seamless connection with the existing production line is realized through the universal connecting station, and a complete automatic test line can be formed with the existing ICT and functional test equipment, and the fully automatic online test. With the advancement of machine vision theory, X-ray safety detection has been updated to a certain extent. Relevant agencies often install X-ray security inspection devices in public places such as subways and airports for safety detection, so as to prevent danger from the source.

In the existing technology, the intelligent detection of home appliance PCBA uses algorithms to realize automatic detection, but the traditional algorithms currently used rely too much on prior knowledge, and the algorithm is fixedly designed according to the characteristics of the current short-term detection objects, such as feature selection and threshold limit wait. Although the above traditional algorithms can realize automatic detection, their generalization ability is poor. When a new batch of data is introduced, the algorithm needs to be readjusted to adapt to the new data. In order to improve the detection performance, a large number of judgment conditions are often added to the algorithm, which greatly reduces the detection speed of the object and causes the problem of poor real-time detection. The same problem also exists in the field of X-ray security detection. The current detection methods mainly rely on Manual operation requires a lot of human resources and requires long-term professional training for testing personnel. In the process of detection, due to long-term concentration, it may lead to the decrease of the staff's attention and distraction in the detection, which will lead to the increase of time in the detection, and the situation of missed detection and false detection often occurs. Sometimes it is necessary to adjust the running speed of the security inspection channel so that the inspectors can find out the contraband.

Therefore, whether it is PCBA home appliance inspection or X-ray safety inspection, the currently used inspection methods are very inefficient and not suitable for long-term operation and maintenance.

Contents of the invention

The purpose of the present invention is to overcome the shortcomings and deficiencies of the prior art, and propose a rapid object detection method based on conditional branching and expert systems, to realize automatic detection of home appliance production lines and contraband security checks, without training special staff , reduce the input of manpower and material resources, and can maintain stable detection accuracy and detection speed, and realize high-efficiency work.

In order to achieve the above object, the technical solution provided by the present invention is: a method for rapid object detection based on conditional branching and expert system, comprising the following steps:

1) Collect the X-ray image of the detection object on the conveyor belt;

2) Input the X-ray image into the three conditional branches to obtain the image feature maps of RGB, HSV and gradient respectively;

3) Input the RGB image feature map into the region proposal network to obtain the ROI region;

4) Align the ROI area using branch features to obtain ROI feature maps corresponding to the three feature maps of RGB, HSV, and gradient;

5) For the ROI area, calculate the contribution degree of the three ROI feature maps to the detection, assign corresponding weight vectors to the three conditional branches according to the contribution degree, and perform feature concatenation according to the respective weight vectors, wherein each ROI The feature map must calculate a contribution vector, and do point multiplication with the contribution vector to obtain three weighted and fused feature vectors;

6) Input the three weighted and fused feature vectors into the corresponding three expert system networks to obtain the object category and position;

7) According to the contribution vector, the prediction results of the three expert system networks are weighted and fused, and the category and location of the detected object are identified and marked.

Further, in step 1), the detection object is placed on the conveyor belt, and when the conveyor belt transports the detection object to the detection area, the X-ray instrument emits a fan-shaped ray beam through the collimator to scan the detection object, and the fan-shaped ray beam passes through the detection area. The inside of the object is projected on the receiving screen, and the X-ray image of the detected object is obtained through computer rendering technology.

Further, in step 2), each branch is provided with a feature extraction network, and the X-ray image is sent to three conditional branches after color space transformation, and the image feature maps of RGB, HSV and gradient are obtained after the operation;

The feature extraction network is a deep network consisting of a convolutional layer, a pooling layer and a nonlinear mapping layer;

Its convolution process is as follows:

In the formula, f ₁ [x, y] is the data of the image in the (x, y) area, w[x, y] is the convolution kernel, f ₂ [x, y] is the convolution in the (x, y) area The resulting features, n _i , _nj are the offset distances from the convolution center, n ₁ , n ₂ are the maximum offset distances in the vertical and horizontal directions of the convolution, respectively, f[x+n _i ,y+ n _j ] is the value of the image at (x+n _i ,y+n _j ), w[n _i ,n _j ] is the weight of the convolution kernel at (n _i ,n _j );

Its nonlinear mapping process:

f ₃ [x,y]=max(0,f ₂ [x,y])

In the formula, f ₃ [x, y] is the feature map obtained after nonlinear mapping.

Further, in step 3), each point in the RGB image feature map is defined as an anchor point, and each anchor point defines 9 anchor boxes centered on itself, and the anchor boxes beyond the image area are removed, and the remaining anchor boxes Feature map for binary classification and bounding box regression:

a. Two classifications: y=f[f ₄ (x,y)]

In the formula, y is the classification prediction of the foreground border, f ₄ (x, y) is the feature map of the anchor box, f is the classifier, and the classifier artificially sets a threshold, and the prediction greater than this threshold is the foreground, and is added to the subsequent steps to calculate , predictions smaller than this threshold are background and discarded;

b. Border regression: r=[Δx,Δy,Δh,Δw]=g(f ₄ [x,y])

In the formula, r is the offset of the foreground border, g is the linear regression function; Δx, Δy are the center offset predictions of the anchor frame; Δh, Δw are the scale factors of the anchor frame; the position and scale of the anchor frame are calculated according to the foreground regression Adjustment; then use non-maximum value suppression to filter the anchor boxes, and remove overlapping anchor boxes; then take the top n anchor boxes with the highest confidence as the ROI area, and enter the next step for processing.

Further, in step 4), after obtaining the ROI area extracted by the region proposal network, the ROI area is scale-adapted, scaled according to the ratio of the original image to the size of the feature map, and then the scaled area is aligned to RGB, HSV and gradient Feature map, get three different ROI feature maps.

Further, in step 5), for the ROI region, calculate the contribution degree of the three ROI feature maps for detection, assign corresponding weight vectors to the three conditional branches according to the contribution degree, and perform feature concatenation according to the respective weight vectors;

The contribution rate is calculated by the following formula:

W=softmax([V ₁ ,V ₂ ,V ₃ ])

In the formula, c is the maximum number of feature channels, f _i ^k is the feature value of the i-th channel after the k-th feature passes through the channel pooling layer, m _k is the feature mean value of the k-th feature after passing through the channel pooling layer, V _k is the contribution degree of each feature, and W is the final contribution vector. A contribution vector must be calculated for each ROI feature map and multiplied with the contribution vector to obtain three weighted and fused feature vectors.

Further, in step 6), three expert system networks are set, and the feature vectors after three weighted fusions are input into corresponding three expert system networks respectively, and each expert system network reasoning obtains object category and position;

Each expert system network needs to complete two tasks of classification and regression:

Classification: y′=max(h(f _p ))

In the formula, f _p is the feature vector of weighted fusion, h is a multi-classifier, and the output y' is the confidence degree of each class;

Classify all feature vectors obtained by reweighting each ROI feature map, and take the classification result with the highest confidence as the classification result of the ROI feature map;

Regression: r'=[Δx',Δy',Δh',Δw']=g(f _p )

In the formula, r' is the offset of the predicted frame; Δx', Δy' are the center offset prediction of the predicted frame; Δh', Δw' are the scaling factors of the predicted frame; g is the linear regression function;

Regression is performed on each ROI area to obtain a more accurate ROI area.

Further, in step 7), according to the contribution vector obtained in step 5), the prediction results of each expert system network in step 6) are weighted and fused to obtain the final prediction result:

In the formula, y _f is the final classification prediction result, r _f is the final regression prediction result, W _i is the contribution of the i-th branch to the classification prediction, y _i is the classification prediction result of the i-th branch, W _j is The contribution of the jth branch to the regression prediction, r _j is the regression prediction result of the jth branch; after the above process, the final prediction result is obtained, which is marked in the detection image to obtain the category and position of the object.

Compared with the prior art, the present invention has the following advantages and beneficial effects:

1. Compared with other deep learning detection methods, the present invention improves the detection speed while maintaining the detection accuracy. The proposed method splits the complex feature network into multiple conditional branches, and splits the detection head network into multiple expert In the system network, each network is small in scale and is calculated in parallel with each other, so the overall reasoning time is reduced. At the same time, the use of branch feature alignment avoids redundant calculation of region proposals under multiple branches and improves detection efficiency.

2. For the first time in the field of X-ray detection, the present invention adopts conditional branching for object detection, decomposes and expands the feature space, enables the network to dig out more distinguishing features, and avoids feature redundancy under massive data sets, resulting in excessive utilization The problem of overfitting.

3. The present invention sets a plurality of expert system networks, and each expert system network focuses on reasoning object categories belonging to its own branch, which improves the mapping ability between the feature space and the solution space, and is small for inter-class distances and large intra-class distances dataset, the method proposed by the present invention has higher detection accuracy.

4. The method of the present invention has a wide application space in computer vision tasks, can realize end-to-end training and detection, has strong data adaptability, and has broad application prospects.

Description of drawings

Fig. 1 is the test picture of this embodiment.

Fig. 2 is a characteristic heat map of the present embodiment.

FIG. 3 is a schematic diagram of the detection results of this embodiment.

Detailed ways

The present invention will be further described in detail below in conjunction with the embodiments and the accompanying drawings, but the embodiments of the present invention are not limited thereto.

This embodiment discloses a method for fast object detection based on conditional branching and expert system, comprising the following steps:

1) Place the package detection object with a business card on the conveyor belt. When the conveyor belt transports the detection object to the detection area, the X-ray instrument emits a fan-shaped ray beam through the collimator to scan the detection object, and the fan-shaped ray beam passes through the detection area. The inside of the object is projected on the receiving screen, and the X-ray image of the business card is obtained through computer rendering technology, as shown in Figure 1.

2) Transform the X-ray image of the business card into the color space and send it to three conditional branches respectively. Each conditional branch is equipped with a feature extraction network. After the image is processed by the three conditional branches, RGB, HSV and gradient images are respectively obtained. Feature map, calculate the low-resolution feature heat map after superimposing the three feature maps, scale the size of the low-resolution feature heat map to the same size as the original image, and then superimpose it with the original image to generate the final feature heat map, as shown in Figure 2 As shown, it can be found that the features are concentrated on the object surface.

The feature extraction network is a deep network, mainly composed of a convolutional layer, a pooling layer and a nonlinear mapping layer.

Its convolution process is as follows:

In the formula, f ₁ [x, y] is the data of the image in the (x, y) area, w[x, y] is the convolution kernel, f ₂ [x, y] is the convolution in the (x, y) area The resulting features, n _i , n _j are the offset distances from the convolution center, n ₁ , n ₂ are the maximum offset distances in the vertical direction and horizontal direction of the convolution, respectively, f[x+n _i ,y +n _j ] is the value of the image at (x+n _i ,y+n _j ), w[n _i ,n _j ] is the weight of the convolution kernel at (n _i ,n _j );

Its nonlinear mapping process:

f ₃ [x,y]=max(0,f ₂ [x,y])

For the detection object whose RGB input component is difficult to fit the prediction curve in the original algorithm, by decomposing the feature space, the object features of three different dimensions can be obtained, which improves the feature expression ability.

3) Input the RGB image feature map into the region proposal network to obtain the ROI region.

Each point in the RGB image feature map is defined as an anchor point. In order to better match objects of different sizes, each anchor point defines three sizes with itself as the center, and three anchor boxes with three aspect ratios combined with each other. Anchor boxes that exceed the image area, perform binary classification and border regression on the remaining anchor box feature maps:

a. Two classifications: y=f[f ₄ (x,y)]

In the formula, y is the classification prediction of the foreground border, f ₄ (x, y) is the feature map of the anchor box, f is the classifier, and the classifier artificially sets a threshold, and the prediction greater than this threshold is the foreground, and is added to the subsequent steps to calculate , predictions smaller than this threshold are considered background and discarded.

b. Border regression: r=[Δx,Δy,Δh,Δw]=g(f ₄ [x,y])

In the formula, r is the offset of the foreground frame, g is the linear regression function; Δx, Δy are the center offset predictions of the anchor box; Δh, Δw are the scale factors of the anchor box. Adjust the position and scale of the anchor frame according to the foreground regression. The anchor boxes are then screened using non-maximum suppression to remove overlapping anchor boxes. Then take the first n anchor boxes with the highest confidence as the ROI area, and enter the next step for processing.

4) After obtaining the ROI area extracted by the region proposal network, scale the ROI area, scale it according to the ratio of the original image to the feature map size, and then align the scaled area to the RGB, HSV, and gradient feature maps to obtain three Different ROI feature maps, this method of combining single feature calculation ROI+ROI multi-feature alignment can avoid redundant calculation of ROI area under multi-branch, and improve the reasoning speed.

5) For the ROI area, calculate the contribution degree of the three ROI feature maps to the detection, assign corresponding weight vectors to the three conditional branches according to the contribution degree, and perform feature concatenation according to the respective weight vectors. The salient features in each detected object are different, and using data-driven learning to learn the features that are more conducive to detection among different features of the object and applying the attention mechanism can improve the reasoning ability of the expert system network.

Contribution can be calculated by the following formula:

W=softmax([V ₁ ,V ₂ ,V ₃ ])

In the formula, c is the maximum number of feature channels, f _i ^k is the feature value of the i-th channel after the k-th feature passes through the channel pooling layer, and m _k is the feature mean value of the k-th feature after passing through the channel pooling layer. V _k is the contribution degree of each feature, and W is the final contribution vector. A contribution vector must be calculated for each ROI feature map and multiplied with the contribution vector to obtain three weighted and fused feature vectors.

6) Set up three expert system networks, input the three weighted and fused feature vectors into the corresponding three expert system networks, each expert system network deduces the object category and location, for the sake of simplicity, the three expert system networks Using the same structure, it consists of a channel reduction convolutional layer and a fully connected layer;

Classification: y′=max(h(f _p ))

Classify all the feature vectors obtained by reweighting each ROI feature map, and take the classification result with the highest confidence as the classification result of the ROI feature map.

Regression: r'=[Δx',Δy',Δh',Δw']=g(f _p )

In the formula, r' is the offset of the predicted frame; Δx', Δy' are the center offset predictions of the predicted frame; Δh', Δw' are the scaling factors of the predicted frame; g is the linear regression function;

Regression is performed on each ROI area to obtain a more accurate ROI area.

7) According to the contribution vector obtained in step 5), the prediction results of each expert system network in step 6) are weighted and fused to obtain the final prediction result.

In the formula, y _f is the final classification prediction result, r _f is the final regression prediction result, W _i is the contribution of the i-th branch to the classification prediction, y _i is the classification prediction result of the i-th branch, W _j is The contribution of the jth branch to the regression prediction, r _j is the regression prediction result of the jth branch; after the above process, the final prediction result is obtained, which is marked in the detection image to obtain the category and position of the object, and the final The test results are shown in Figure 3.

The above-mentioned embodiment is a preferred embodiment of the present invention, but the embodiment of the present invention is not limited by the above-mentioned embodiment, and any other changes, modifications, substitutions, combinations, Simplifications should be equivalent replacement methods, and all are included in the protection scope of the present invention.

Claims

The object quick detection method based on conditional branch and expert system, is characterized in that, comprises the following steps:

1) Collect the X-ray image of the detection object on the conveyor belt;

2) Input the X-ray image into the three conditional branches to obtain the image feature maps of RGB, HSV and gradient respectively;

3) Input the RGB image feature map into the region proposal network to obtain the ROI region;

4) Align the ROI area using branch features to obtain ROI feature maps corresponding to the three feature maps of RGB, HSV, and gradient;

5) For the ROI area, calculate the contribution degree of the three ROI feature maps to the detection, assign corresponding weight vectors to the three conditional branches according to the contribution degree, and perform feature concatenation according to the respective weight vectors, wherein each ROI The feature map must calculate a contribution vector, and do point multiplication with the contribution vector to obtain three weighted and fused feature vectors;

6) Input the three weighted and fused feature vectors into the corresponding three expert system networks to obtain the object category and position;

7) According to the contribution vector, the prediction results of the three expert system networks are weighted and fused, and the category and location of the detected object are identified and marked.
The object rapid detection method based on conditional branch and expert system according to claim 1, characterized in that, in step 1), the detection object is placed on the conveyor belt, and when the conveyor belt transports the detection object to the detection area, the X-ray instrument The fan-shaped ray beam is emitted by the collimator to scan the detection object, the fan-shaped ray beam passes through the inside of the detection object and is projected on the receiving screen, and the X-ray image of the detection object is obtained through computer rendering technology.
The object rapid detection method based on conditional branch and expert system according to claim 1, it is characterized in that, in step 2), each branch is provided with a feature extraction network, X-ray images are sent into respectively after color space transformation Three conditional branches, after the operation, the image feature maps of RGB, HSV and gradient are obtained;

The feature extraction network is a deep network consisting of a convolutional layer, a pooling layer and a nonlinear mapping layer;

Its convolution process is as follows:

In the formula, f 1 [x, y] is the data of the image in the (x, y) area, w[x, y] is the convolution kernel, f 2 [x, y] is the convolution in the (x, y) area The resulting features, n i , n j are the offset distances from the convolution center, n 1 , n 2 are the maximum offset distances in the vertical direction and horizontal direction of the convolution, respectively, f[x+n i ,y +n j ] is the value of the image at (x+n i ,y+n j ), w[n i ,n j ] is the weight of the convolution kernel at (n i ,n j );

Its nonlinear mapping process:

f 3 [x,y]=max(0,f 2 [x,y])

In the formula, f 3 [x, y] is the feature map obtained after nonlinear mapping.
The object rapid detection method based on conditional branch and expert system according to claim 1, it is characterized in that: in step 3), each point in the RGB image feature map is defined as an anchor point, and each anchor point takes itself as Define 9 anchor boxes in the center, remove the anchor boxes beyond the image area, and perform binary classification and border regression on the remaining anchor box feature maps:

a. Two classifications: y=f[f 4 (x,y)]

In the formula, y is the classification prediction of the foreground border, f 4 (x, y) is the feature map of the anchor box, f is the classifier, and the classifier artificially sets a threshold, and the prediction greater than this threshold is the foreground, and is added to the subsequent steps to calculate , predictions smaller than this threshold are background and discarded;

b. Border regression: r=[Δx,Δy,Δh,Δw]=g(f 4 [x,y])

In the formula, r is the offset of the foreground border, g is the linear regression function; Δx, Δy are the center offset predictions of the anchor frame; Δh, Δw are the scale factors of the anchor frame; the position and scale of the anchor frame are calculated according to the foreground regression Adjustment; then use non-maximum value suppression to filter the anchor boxes, and remove overlapping anchor boxes; then take the top n anchor boxes with the highest confidence as the ROI area, and enter the next step for processing.
The object rapid detection method based on conditional branch and expert system according to claim 1, characterized in that: in step 4), after obtaining the ROI region extracted by the region suggestion network, the ROI region is scale-adapted, according to the original image Scale in proportion to the size of the feature map, and then align the scaled area to RGB, HSV, and gradient feature maps to obtain three different ROI feature maps.
The object rapid detection method based on conditional branching and expert system according to claim 1, characterized in that: in step 5), for the ROI region, calculate three kinds of ROI feature maps for the degree of contribution that can be detected, according to available The contribution degree assigns corresponding weight vectors to the three conditional branches and performs feature concatenation according to the respective weight vectors;

The contribution rate is calculated by the following formula:

W＝soft max([V 1 ,V 2 ,V 3 ])

where c is the maximum number of feature channels,
is the feature value of the i-th channel after the k-th feature passes through the channel pooling layer, m k is the feature mean value of the k-th feature after passing through the channel pooling layer, V k is the contribution degree of each feature, and W is For the final contribution vector, a contribution vector must be calculated for each ROI feature map, and dot multiplication with the contribution vector to obtain three weighted and fused feature vectors.
The object rapid detection method based on conditional branch and expert system according to claim 1, is characterized in that: in step 6), three expert system networks are set, and the feature vectors after three weighted fusions are input into corresponding Three expert system networks, each expert system network infers the object category and location;

Each expert system network needs to complete two tasks of classification and regression:

Classification: y′=max(h(f p ))

In the formula, f p is the feature vector of weighted fusion, h is a multi-classifier, and the output y' is the confidence degree of each class;

Classify all feature vectors obtained by reweighting each ROI feature map, and take the classification result with the highest confidence as the classification result of the ROI feature map;

Regression: r'=[Δx',Δy',Δh',Δw']=g(f p )

In the formula, r' is the offset of the predicted frame; Δx', Δy' are the center offset predictions of the predicted frame; Δh', Δw' are the scaling factors of the predicted frame; g is the linear regression function;

Regression is performed on each ROI area to obtain a more accurate ROI area.
The object rapid detection method based on conditional branch and expert system according to claim 1, characterized in that: in step 7), according to the contribution vector obtained in step 5), for each expert system network in step 6) The prediction results are weighted and fused to get the final prediction result:

In the formula, y f is the final classification prediction result, r f is the final regression prediction result, W i is the contribution of the i-th branch to the classification prediction, y i is the classification prediction result of the i-th branch, W j is The contribution of the jth branch to the regression prediction, r j is the regression prediction result of the jth branch; after the above process, the final prediction result is obtained, which is marked in the detection image to obtain the category and position of the object.