CN110070536B - Deep learning-based PCB component detection method - Google Patents

Deep learning-based PCB component detection method

Info

Publication number
CN110070536B
CN110070536B (application CN201910333652.XA)
Authority
CN
China
Prior art keywords
network
component
pcb
layer
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910333652.XA
Other languages
Chinese (zh)
Other versions
CN110070536A (en)
Inventor
高浩
杨泽宇
胡海东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN201910333652.XA
Publication of CN110070536A
Application granted
Publication of CN110070536B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • G06F18/24137Distances to cluster centroïds
    • G06F18/2414Smoothing the distance, e.g. radial basis function networks [RBFN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/22Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63Scene text, e.g. street names
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30108Industrial image inspection
    • G06T2207/30141Printed circuit board [PCB]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep learning-based PCB component detection method comprising the following steps: acquiring a large number of PCB images and annotating them for network training; training a Faster R-CNN to detect component positions and crop out the components; training a simple convolutional network to judge component polarity; training an EAST network to detect the positions of text boxes in the component images and crop them; training a CRNN network to recognize the text content in the cropped text-box images; and comparing the polarity and text content with the PCB design file to obtain the result. The invention realizes fully automatic recognition of component identifiers and solves the current difficulty of interfacing the individual detection stages.

Description

Deep learning-based PCB component detection method
Technical Field
The invention relates to the technical field of automatic detection of PCB components, in particular to a PCB component detection method based on deep learning.
Background
The PCB, i.e., the printed circuit board, is an important part of various electronic devices and serves as the carrier for electronic components; almost every kind of electronic device commonly used in daily life, such as electronic watches, calculators, computers and communication equipment, requires a PCB. The ever-wider application of PCBs is inseparable from their high reliability and high density, and these characteristics in turn impose very strict accuracy requirements on every component, so large-scale inspection has become one of the important processes in PCB production.
In recent years, as components on PCBs have become smaller and more varied, purely manual visual inspection can no longer meet production requirements in either accuracy or speed, and electrical testing and machine-vision inspection methods have been continuously developed in the hope of automatically monitoring indicators such as the appearance, type, position, polarity and model of the components on a board. Most traditional vision-based methods rely on a workbench, a robotic arm, a CCD lens and the like to compare the board against a standard image; such methods are slow and offer a low degree of automation. With the rapid development of deep learning, neural-network-based object detection has become a popular research direction for PCB inspection: it is fast and precise and permits an end-to-end detection scheme with a higher degree of automation. Its function, however, is usually single; most existing methods detect only the positions and types of components, and a highly integrated, comprehensive automatic inspection system that also covers information such as component polarity and model is still lacking.
Disclosure of Invention
The invention aims to solve the problems in the prior art and provides a deep learning-based PCB component detection method that can efficiently and automatically detect incorrect components on a PCB, packages the traditional image processing, object detection and text recognition algorithms in a unified way, and improves the automation and accuracy of inspection.
To achieve this purpose, the invention adopts the following technical scheme:
a PCB component detection method based on deep learning comprises the following steps:
s1, acquiring a PCB data set to be detected and preprocessing the PCB data set;
s2, identifying the position and the type of the component on the PCB board by using a target detection network, and cutting out an image of a single component;
s3, constructing a neural network model, and detecting the polarity of the components in each component image;
s4, detecting a text box in each component image by using an EAST network, and identifying characters in the text box by using a CRNN network;
and S5, comparing the detected polarity of the component and the recognized characters with a PCB design file to obtain a detection result.
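Expressed as code, the five steps chain into one inspection routine. The following Python sketch is purely illustrative: the five injected callables are hypothetical stand-ins for the preprocessing step and the four trained networks described below, not functions defined by the invention.

```python
# A minimal sketch of the S1-S5 pipeline. All callables passed in are
# hypothetical placeholders for the trained networks described later.
def inspect_pcb(image, design, preprocess, detect_components,
                classify_polarity, detect_text_boxes, recognize_text):
    image = preprocess(image)                              # S1: corrected image
    errors = []
    for ref, crop, category in detect_components(image):   # S2: Faster R-CNN
        polarity = classify_polarity(crop)                 # S3: polarity CNN
        texts = [recognize_text(crop, box)                 # S4: EAST + CRNN
                 for box in detect_text_boxes(crop)]
        spec = design[ref]                                 # S5: design check
        if (category != spec["category"] or polarity != spec["polarity"]
                or spec["model"] not in texts):
            errors.append((ref, category, polarity, texts))
    return errors
```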
Preferably, S1 further includes: capturing a large number of sample images of the PCB to be inspected and estimating the camera pose with the aid of markers; performing affine and perspective transformations according to the marker corner coordinates to correct the images, and annotating the component information in the images to obtain the required PCB data set.
Specifically, a large number of sample images of the PCB to be inspected are captured. Since the orientation of each component's identifier differs, four cameras facing different directions are erected to collect image information from multiple viewpoints. Because the resulting viewing angles are unfavorable for image detection and text recognition, the images must be corrected; to improve correction accuracy, the camera pose can be estimated with the aid of markers.
Specifically, four markers are placed at fixed positions; the images are binarized and edge detection is applied to find the contours and corner coordinates of the markers, affine and perspective transformations are performed according to the corner coordinates to obtain corrected standard images, and the component information in the images is annotated.
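For illustration, this correction step can be sketched with OpenCV as follows. The assumption that the four marker contours are the four largest, the crude corner ordering and the output size DST_SIZE are simplifications, not details fixed by the invention.

```python
import cv2
import numpy as np

DST_SIZE = (2000, 2000)  # assumed resolution of the corrected standard image

def correct_image(img: np.ndarray) -> np.ndarray:
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binarization
    edges = cv2.Canny(binary, 50, 150)                              # edge detection
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    centers = []                      # assume markers are the 4 largest contours
    for c in sorted(contours, key=cv2.contourArea, reverse=True)[:4]:
        m = cv2.moments(c)
        centers.append((m["m10"] / m["m00"], m["m01"] / m["m00"]))
    src = np.float32(sorted(centers, key=lambda p: (p[1], p[0])))   # crude order
    dst = np.float32([[0, 0], [DST_SIZE[0], 0],
                      [0, DST_SIZE[1]], [DST_SIZE[0], DST_SIZE[1]]])
    H = cv2.getPerspectiveTransform(src, dst)       # perspective transformation
    return cv2.warpPerspective(img, H, DST_SIZE)
```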
Preferably, in S2, the object detection network adopts a Faster R-CNN network structure.
Specifically, the backbone network for object detection is the residual neural network ResNet-101. Compared with a traditional convolutional neural network such as VGG, its complexity is reduced and fewer parameters are required; it does not suffer from vanishing gradients when a deep network structure is built; and it alleviates the degradation problem of deep networks, so the high-precision target recognition effect is better.
Specifically, Faster R-CNN is adopted as the object detection framework. Compared with other detection frameworks such as YOLO and SSD, Faster R-CNN is more accurate but slower; since it is applied here to product defect inspection, where the real-time requirement is not high, it is better suited to this scenario.
Specifically, object detection yields the vertex coordinates and device category name of each component, which are compared with the standard design drawing; when a component's category is inconsistent with the drawing, the error information is fed back to system maintenance personnel for manual confirmation. For components whose detailed coordinate information is given in the design drawing, once the detected category is correct, the crop can be adjusted according to the drawing's standard coordinates, facilitating further detection downstream.
Preferably, in S3, the neural network model adopts a classification network composed of five convolutional layers, two fully connected layers and a softmax activation function.
Specifically, polarity detection must be considered among the detected components because some devices on a PCB are distinguished by positive and negative polarity. Since the features distinguishing polarity are not obvious, a separate polarity-judgment step is added between the object detection stage and the identifier character recognition stage (i.e., text-box character recognition), with error information fed back to maintenance personnel. The polarity judgment step is a simple binary classification task.
Preferably, in S4, the EAST network has a two-stage workflow: the first stage directly generates word- or text-line-level predictions using a fully convolutional network model, and the second stage sends the generated predictions to non-maximum suppression to produce the final result; the two stages can be trained end to end.
Specifically, identifier recognition is divided into text localization (i.e., text-box detection) and text recognition. Because text consists of discontinuous characters, a plain object detection algorithm cannot localize it accurately; at the same time, to improve the automation of the system, the text direction must be distinguished and corrected, so the scene-text detection algorithm EAST is adopted. This algorithm has a two-stage workflow: the first stage uses a fully convolutional network (FCN) model that directly generates word- or text-line-level predictions, eliminating redundant and slow intermediate steps; the second stage sends the generated text predictions to non-maximum suppression to produce the final result. The network structure can be trained and optimized end to end.
Preferably, in S4, the CRNN network feeds the image features as a sequence into a long short-term memory network so that continuous text of arbitrary length can be processed.
Specifically, after the coordinates of each component's identifier are extracted, that is, after the text box in each component image is determined, text recognition is performed on the identifier (i.e., on the text in the text box).
Specifically, the deep neural network framework CRNN, which fuses the CNN, RNN and CTC network structures, is selected. It has the following advantages: it can be trained end to end; it can process sequences of arbitrary length without character segmentation or horizontal/vertical normalization; it requires no predefined words in either lexicon-free or lexicon-based text recognition tasks; and its model is smaller, which makes it better suited to practical application scenarios and gives it stronger generalization.
Specifically, the final detection result is assembled from the individual detection steps: the type, polarity, model and other information of each detected component are compared with the standard design information, and error information together with low-confidence component information is provided to manual inspectors for review.
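A minimal sketch of this comparison step, assuming the design file has already been parsed into a dictionary keyed by reference designator and that CONF_THRESH is the review threshold; both are assumptions for illustration.

```python
CONF_THRESH = 0.9  # assumed threshold below which results go to manual review

def compare_with_design(detections: list, design: dict) -> list:
    issues = []
    for det in detections:                    # det: dict from the earlier stages
        spec = design.get(det["ref"])
        if spec is None:
            issues.append({**det, "issue": "component not in design file"})
        elif (det["category"] != spec["category"]
              or det["polarity"] != spec["polarity"]
              or det["model_text"] != spec["model"]):
            issues.append({**det, "issue": "mismatch with design file"})
        elif det["confidence"] < CONF_THRESH:
            issues.append({**det, "issue": "low confidence, manual review"})
    return issues
```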
Compared with the prior art, the invention has the following beneficial effects: the object detection and text detection/recognition frameworks are packaged behind a unified interface, realizing fully automatic recognition of component identifiers and solving the current difficulty of interfacing the individual detection stages; and the CRNN framework unifies the loss functions of its three components, measures the discrepancy of results by the logarithmic loss between predicted and true labels, and can likewise be trained end to end.
Drawings
FIG. 1 is a schematic flow diagram of a method of the present invention according to an embodiment;
FIG. 2 is a schematic diagram of an EAST network architecture according to an embodiment;
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a PCB component detection method based on deep learning, which comprises the following steps:
s1, acquiring a PCB data set to be detected and preprocessing the PCB data set;
s2, identifying the position and the type of the component on the PCB board by using a target detection network, and cutting out an image of a single component;
s3, constructing a neural network model, and detecting the polarity of the components in each component image;
s4, detecting a text box in each component image by using an EAST network, and identifying characters in the text box by using a CRNN network;
and S5, comparing the detected polarity of the component and the recognized characters with a PCB design file to obtain a detection result.
Examples
The embodiment provides a deep learning-based PCB component detection method, as shown in FIG. 1, which specifically includes the steps of:
(1) Preparing a data set:
A large number of sample images of the PCB are captured. Since the orientation of each component's identifier differs, four cameras facing different directions are erected to collect image information from multiple viewpoints. Because the resulting viewing angles are unfavorable for image detection and text recognition, the images must be corrected; to improve correction accuracy, the camera pose can be estimated with the aid of markers. Four markers are placed at fixed positions; the images are binarized and edge detection is applied to find the contours and corner coordinates of the markers, affine and perspective transformations are performed according to the corner coordinates to obtain corrected standard images, and the component information in the images is annotated with labelme.
The annotation is divided into two steps: first, the type and position of each component are marked on the image, and all components are then cropped from the original image to form a large number of individual component images; second, the text on the individual component images is annotated to train the text detection network. Both data sets are expanded by affine transformations (rotation, translation, scaling, etc.) and image transformations (noise, color shift, Gaussian blur, sharpening, etc.), and each is split into a training set, a validation set and a test set. Samples are typically divided into three independent parts: a training set (train set), a validation set (validation set) and a test set (test set). The original data are divided into these three sets in order to select the most effective model with the best generalization capability.
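For illustration, the expansion can be sketched with plain OpenCV as below; the parameter ranges (rotation angle, scale, shift, noise level) are assumed values, not ones fixed by the invention.

```python
import random
import cv2
import numpy as np

def augment(img: np.ndarray) -> np.ndarray:
    h, w = img.shape[:2]
    # affine transformation: random rotation, scaling and translation
    M = cv2.getRotationMatrix2D((w / 2, h / 2),
                                random.uniform(-10, 10),   # rotation (deg)
                                random.uniform(0.9, 1.1))  # scale
    M[:, 2] += np.random.uniform(-5, 5, size=2)            # translation (px)
    out = cv2.warpAffine(img, M, (w, h))
    # image transformation: Gaussian blur plus additive noise
    out = cv2.GaussianBlur(out, (3, 3), 0)
    noise = np.random.normal(0, 5, out.shape).astype(np.int16)
    return np.clip(out.astype(np.int16) + noise, 0, 255).astype(np.uint8)
```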
Splitting the original data into the three sets also prevents model overfitting. If all of the raw data were used to train the model, the result would likely be a model that fits the raw data to the maximum extent, i.e., a model that exists only to fit all of the raw data; testing such a model on other data sets may not work as well, and when new samples appear and the model is used for prediction, the effect would probably be worse than that of a model trained with only part of the data.
One typical division assigns 50% of the total samples to the training set and 25% to each of the other two, all three drawn randomly from the samples. When the number of samples is small, a small portion is usually held out as the test set and K-fold cross-validation is applied to the remaining N samples: the samples are shuffled and divided evenly into K parts; K − 1 parts are used in turn for training and the remaining part for validation; the sum of squared prediction errors is computed each time, and the average over the K runs serves as the basis for selecting the best model structure. Taking K = N gives the leave-one-out method.
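A minimal sketch of this split using scikit-learn's KFold; the 10% test fraction and K = 5 are assumed values, and setting K = N reproduces the leave-one-out method.

```python
import numpy as np
from sklearn.model_selection import KFold

def kfold_split(n_samples: int, k: int = 5, test_frac: float = 0.1):
    rng = np.random.default_rng(0)
    idx = rng.permutation(n_samples)              # shuffle the samples
    n_test = int(n_samples * test_frac)
    test_idx, rest = idx[:n_test], idx[n_test:]   # hold out a small test set
    folds = [(rest[tr], rest[va])                 # K-1 parts train, 1 validates
             for tr, va in KFold(n_splits=k).split(rest)]
    return folds, test_idx
```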
(2) Detecting the position coordinates and device names of all components on the PCB with the object detection algorithm and checking them against the standard design drawing to give error information:
The backbone network for object detection is the residual neural network ResNet-101; compared with a traditional convolutional neural network such as VGG, its complexity is reduced and fewer parameters are required, it does not suffer from vanishing gradients when a deep network structure is built, and it alleviates the degradation problem of deep networks, so the high-precision recognition effect is better. In this embodiment, Faster R-CNN is adopted as the object detection framework; compared with other detection frameworks such as YOLO and SSD it is more accurate but slower, and since it is applied here to product defect inspection, where the real-time requirement is not high, it is better suited to this scenario.
Framework overview: an image containing a number of RoIs (regions of interest) is fed into a multi-layer convolutional network to obtain feature maps; each RoI is then pooled into a fixed-size feature map, which the fully connected layers stretch into a feature vector. For each RoI, the feature vector obtained after the fully connected layers is finally shared by two branches: one performs softmax regression after a fully connected layer to classify the object in the RoI region, and the other performs bounding-box (b-box) regression after a fully connected layer for corrective localization, making the bounding box more accurate. The components are segmented from the original image according to the bounding boxes and sent to the subsequent networks for judgment.
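For illustration, this stage can be sketched with torchvision's Faster R-CNN using a ResNet-101 FPN backbone as named above; NUM_CLASSES and the input size are assumptions, and in practice trained weights would be loaded before inference.

```python
import torch
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.backbone_utils import resnet_fpn_backbone

NUM_CLASSES = 21  # assumed: component categories + background

backbone = resnet_fpn_backbone(backbone_name="resnet101", weights=None)
model = FasterRCNN(backbone, num_classes=NUM_CLASSES)
model.eval()  # trained weights would be loaded here in practice

image = torch.rand(3, 800, 800)  # one corrected PCB image
with torch.no_grad():
    pred = model([image])[0]     # dict with "boxes", "labels", "scores"

# segment each detected component from the image for the subsequent networks
crops = [image[:, int(y1):int(y2), int(x1):int(x2)]
         for x1, y1, x2, y2 in pred["boxes"].tolist()]
```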
(3) Constructing a neural network model, and detecting the polarity of each component:
the polarity determination can be classified as a multi-classification problem. The solution can be solved by several layers of convolution pooling and finally adding a full connection layer.
The first layer of input data is the original 227 x 3 image, which is convolved by 96 convolution kernels of 11 x 3. The feature map pixels generated 55 x 96 form the pixel layer after convolution of the original image. These pixel layers are processed by the relu unit to generate pixel layer data of activated pixel layers, still 55 x 96 in size. These pixel layers are processed by pool operations (pooling operations) with a scale of 3 x 3 and a step size of 2, and the size of the pooled image is (55-3)/2+1= 27. I.e. the size of the pooled pixels is 27 x 96; then normalization processing is carried out, and the scale of normalization operation is 5 x 5; the pixel layer formed after the first convolution operation is completed has a size of 27 × 96. Respectively corresponding to 96 convolution kernels.
The second layer was fed with a signature 27 x 96, using 256 convolution kernels of 5 x 96 and 0 fill to maintain the signature size to generate a signature 27 x 256, which was processed by the relu cell to generate activated pixel layers, again of size 27 x 256. After pool calculation with step size 2 through 3 × 3, 13 × 256 feature maps were obtained.
The third layer uses 384 convolution kernels of 3 x 256 to convolve the upper layer outputs, and the signature of 13 x 384 is generated using 0 fill hold image size via relu to generate the activated pixel layers. The fourth layer is the same as the third layer.
And adding a pool layer on the third layer to generate a characteristic diagram of 6 x 512 for inputting into the full-connection layer.
The sixth and seventh layers are fully connected layers with dimensions 4096 x 4096 and 4096 x polarity direction numbers, respectively. And adding a drop operation in the full connection layer in the training process to prevent overfitting.
And finally, outputting the classification probability by a softmax activation function, and obtaining the direction of the input image by taking the maximum value.
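Putting the layer walkthrough together, a minimal PyTorch sketch of the polarity classifier might look as follows. The kernel sizes and dimensions follow the text where stated; the stride of the first convolution, the number of polarity directions and the fifth convolution layer feeding the 6 × 6 × 512 map are assumptions.

```python
import torch
import torch.nn as nn

class PolarityNet(nn.Module):
    def __init__(self, num_directions: int = 4):   # assumed direction count
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, 11, stride=4), nn.ReLU(True),     # 227 -> 55
            nn.MaxPool2d(3, 2),                                # (55-3)/2+1 = 27
            nn.LocalResponseNorm(size=5),                      # 5 x 5 normalization
            nn.Conv2d(96, 256, 5, padding=2), nn.ReLU(True),   # keep 27
            nn.MaxPool2d(3, 2),                                # 27 -> 13
            nn.Conv2d(256, 384, 3, padding=1), nn.ReLU(True),
            nn.Conv2d(384, 384, 3, padding=1), nn.ReLU(True),  # 4th = 3rd
            nn.Conv2d(384, 512, 3, padding=1), nn.ReLU(True),  # assumed 5th conv
            nn.MaxPool2d(3, 2),                                # 13 -> 6
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(6 * 6 * 512, 4096), nn.ReLU(True),
            nn.Dropout(0.5),                  # the dropout operation in training
            nn.Linear(4096, num_directions),
        )

    def forward(self, x):
        # softmax gives the direction probabilities; training would normally
        # fold it into nn.CrossEntropyLoss instead.
        return torch.softmax(self.classifier(self.features(x)), dim=1)

probs = PolarityNet()(torch.randn(1, 3, 227, 227))  # argmax gives the direction
```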
(4) Determining the precise position of the identifier text on each component:
the invention adopts a text-based detection algorithm EAST, which has a two-stage workflow, as shown in fig. 2. The first stage of the process uses a complete convolution network (FCN) model that directly generates word or text line level predictions, eliminating redundant and slow intermediate steps. The second stage sends the generated text prediction to non-maximum suppression to produce the final result, and the network structure can realize end-to-end training and optimization.
Since the size of a word region varies greatly, a feature from the back of the neural network is required to determine the position of a large word, and feature information at an early stage is required to predict a region containing a small word. A structure similar to U-net is used here to enable gradual fusion of various levels of features to enable the exploitation of multi-scale features without adding too much computational burden. Thus, a full convolutional network can be roughly divided into three parts: a feature extraction backbone network, a feature fusion branch and an output layer.
The feature extraction backbone network can be used in ImageNet and classical backbone networks such as trained PVANet, VGG16 and the like, and feature extraction is realized by removing a full connection layer and retaining a convolution result.
The feature fusion branch is actually an upsampling process, the size of a feature map is expanded by two times from the last layer of a backbone network through an unpool layer, the number of channels of the feature map is reduced, the feature map is connected with the feature map on the last layer of the backbone network in series, the feature map is sent to the unpool on the next layer through two layers of convolution, the operation is repeated, and finally a character area geometric prediction map with the same size as an original image and a pixel confidence score map corresponding to the character area geometric prediction map are obtained through several 1 × 1 convolutions. And (3) reserving the geometric prediction graph with higher confidence coefficient through a preset threshold, removing redundant geometric prediction by using a non-maximum suppression algorithm, obtaining a final original upper character boundary box, and dividing the final original upper character boundary box into the following flows.
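The threshold-plus-NMS step at the end of this branch can be sketched as follows; the threshold values and the use of OpenCV's generic NMSBoxes (standing in for EAST's locality-aware NMS) are assumptions.

```python
import numpy as np
import cv2

SCORE_THRESH = 0.8  # assumed confidence threshold
NMS_THRESH = 0.3    # assumed overlap threshold

def east_postprocess(scores: np.ndarray, boxes: np.ndarray) -> list:
    """scores: (N,) per-candidate confidence; boxes: (N, 4) as x, y, w, h."""
    keep = scores >= SCORE_THRESH                  # keep high-confidence geometry
    boxes, scores = boxes[keep], scores[keep]
    idx = cv2.dnn.NMSBoxes(boxes.tolist(), scores.tolist(),
                           SCORE_THRESH, NMS_THRESH)   # remove redundant boxes
    return [boxes[i] for i in np.asarray(idx).flatten()]
```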
During training, the network is first pre-trained on the natural-scene text detection data set COCO-Text so that it learns to extract features as well as possible and overfitting is prevented; it is then fine-tuned on the annotated PCB component image data set to adapt to the business data.
(5) Performing text recognition on the text box of each component and comparing the result with the standard model information:
In this embodiment, the deep neural network framework CRNN, which integrates the CNN, RNN and CTC network structures, is used to recognize text. The CRNN framework is divided into three main layers. The first layer, a deep convolutional layer (DCNN), compresses the original input images to the same size and performs feature extraction through convolution, pooling and full connection, finally yielding a feature map that serves as the input of the next layer; at this point one column of the feature map corresponds to a rectangular region of the original image, the exact mapping relation depending on the choice of convolutions and pooling.
The second layer is a bidirectional sequence-feature extraction layer (Bidirectional Recurrent Neural Network); here the row vectors of the feature-map matrix act as features, the column vectors become sequence data, and the layer outputs a sequence of features whose length equals the number of columns and whose width is 1. A bidirectional LSTM is chosen for sequence-feature extraction for five reasons: first, it captures features well for text with contextual relations and for wide characters that cannot be represented by a single column of the feature map; second, it can back-propagate to the DCNN layer, so the whole network framework can share one loss function; third, it can process text of arbitrary length; fourth, it overcomes the long-term dependence problem when the text is long; and fifth, bidirectional propagation resolves the forward-backward dependence problem.
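A minimal PyTorch sketch of this three-part structure (DCNN, bidirectional LSTM, CTC transcription) is given below; the channel counts, the 32-pixel input height and the 37-symbol alphabet are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.dcnn = nn.Sequential(        # first layer: convolutional features
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(True), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(True), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(True),
            nn.MaxPool2d((2, 1), (2, 1)),  # halve height only, keep width
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(True),
            nn.MaxPool2d((2, 1), (2, 1)),  # input height 32 -> feature height 2
        )
        self.blstm = nn.LSTM(256 * 2, 256, bidirectional=True)  # second layer
        self.fc = nn.Linear(512, num_classes)  # per-column label distribution

    def forward(self, x):                  # x: (B, 1, 32, W)
        f = self.dcnn(x)                   # (B, 256, 2, W/4)
        b, c, h, w = f.shape
        seq = f.permute(3, 0, 1, 2).reshape(w, b, c * h)  # columns -> sequence
        out, _ = self.blstm(seq)
        return self.fc(out).log_softmax(2) # (T, B, classes), ready for CTC

model = CRNN(num_classes=37)               # e.g. 26 letters + 10 digits + blank
ctc = nn.CTCLoss(blank=36)                 # third layer: CTC transcription
```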
The third layer is a sequence labeling layer (transcription layer), whose task is to convert the sequence features into the required labels. The conversion is divided into a dictionary-based mode and a dictionary-free mode: with a dictionary, the obtained label is the word with the highest probability in the dictionary; without one, the label is formed from the characters with the highest probability, and the label as a whole need not be in the dictionary. The final output of the second layer has the form $y = (y_1, \ldots, y_T)$, where $T$ is the number of columns of the feature-map matrix; $y_{\pi_t}^{t}$ denotes the probability that the character at column $t$ is $\pi_t$, and $\pi = (\pi_1, \ldots, \pi_T)$ assigns a label to each column of the matrix. A predicted label $l = B(\pi)$ is obtained from $\pi$ by the mapping $B$, which removes repeated and blank characters; $l$ likewise denotes the correct label given in training. The predicted probability that the label is correct is

$$p(l \mid y) = \sum_{\pi : B(\pi) = l} \prod_{t=1}^{T} y_{\pi_t}^{t}.$$

When the correct label is not known during testing, two methods are used to obtain the result. The first is not based on a dictionary:

$$l^{*} = B\Big(\arg\max_{\pi} p(\pi \mid y)\Big),$$

i.e., the most probable character is taken in each column, and repeats and blanks are then removed.
The second method is based on a dictionary: all labels in the dictionary are traversed and the label with the maximum probability, $l^{*} = \arg\max_{l \in \mathcal{D}} p(l \mid y)$, is selected as the output. This method has the problem of consuming too much time when the dictionary contains too many labels, so in the dictionary-based mode the length of the text is first determined in the dictionary-free mode and labels are then selected from the dictionary, which greatly reduces the number of labels to traverse. The final prediction result is compared with the standard model designation to determine whether the selection of the component is correct.
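For illustration, the dictionary-free decoding and the length-based dictionary speed-up can be sketched as follows; the blank index, the per-length dictionary layout and the injected scorer are assumptions.

```python
import torch

BLANK = 36  # assumed index of the CTC blank symbol

def ctc_greedy_decode(log_probs: torch.Tensor) -> list:
    """Dictionary-free mode: log_probs is (T, num_classes); take the most
    probable class per column, then apply B (collapse repeats, drop blanks)."""
    decoded, prev = [], BLANK
    for k in log_probs.argmax(dim=1).tolist():
        if k != BLANK and k != prev:       # B: remove blanks and repeated chars
            decoded.append(k)
        prev = k
    return decoded

def ctc_lexicon_decode(log_probs, lexicon_by_length: dict, score):
    """Dictionary-based mode with the speed-up described above: determine the
    text length dictionary-free, then score only entries of that length.
    `score(label, log_probs)` would compute p(l|y) via the CTC forward
    algorithm (not shown here)."""
    length = len(ctc_greedy_decode(log_probs))
    candidates = lexicon_by_length.get(length, [])
    if not candidates:
        return ctc_greedy_decode(log_probs)
    return max(candidates, key=lambda label: score(label, log_probs))
```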
In this way, the component localization and polarity judgment problems in PCB component inspection can be solved.
The invention constructs a complete, unified neural network framework for PCB component inspection. It meets the high-reliability and high-accuracy requirements of the PCB manufacturing process, greatly reduces both the manpower consumed by manual inspection and the hardware requirements of traditional mechanical methods, and, once the model is trained, can serve as an inspection platform or inspection system for real-time inspection of PCBs on the production line, overcoming the inability of traditional inspection methods to run in real time.
The beneficial effects of the invention are: the object detection and text detection/recognition frameworks are packaged behind a unified interface, realizing fully automatic recognition of component identifiers and solving the current difficulty of interfacing the individual detection stages; and the CRNN framework unifies the loss functions of its three components, measures the discrepancy of results by the logarithmic loss between predicted and true labels, and can likewise be trained end to end.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. A PCB component detection method based on deep learning is characterized by comprising the following steps:
s1, acquiring a PCB data set to be detected and preprocessing the PCB data set;
s2, identifying the position and the type of the component on the PCB board by using a target detection network, and cutting out an image of a single component;
s3, constructing a neural network model, and detecting the polarity of the components in each component image;
s4, detecting a text box in each component image by using an EAST network, and identifying characters in the text box by using a CRNN network;
s5, comparing the detected polarity of the component and the recognized text with the PCB design file to obtain a detection result, further comprising: selecting a deep neural network framework CRNN fusing various network structures of CNN, RNN and CTC to identify text content, wherein the network framework of the CRNN is mainly divided into three layers, the first layer is a deep convolution layer, compressing an original input graph to the same size, performing feature extraction in a convolution, pooling and full-connection mode, and finally obtaining a feature map as the input of the next layer, wherein at the moment, one column of the feature map is equivalent to a matrix area of the original image, and the specific mapping relation is relevant according to the selection of convolution and pooling; the second layer is a bidirectional sequence feature extraction layer, the row vector of the feature map matrix is equivalent to the feature, the column vector becomes sequence data, and the layer outputs a sequence feature with the length equal to the column and the width of 1; the third layer is a sequence labeling layer, and the task of the third layer is to convert the sequence characteristics into the required labels; and comparing the final prediction result with the standard model to know whether the selection of the component is correct or not.
2. The deep learning-based PCB component detection method of claim 1, wherein S1 further comprises: capturing a large number of sample images of the PCB to be inspected and estimating the camera pose with the aid of markers; and performing affine and perspective transformations according to the marker corner coordinates to correct the images, and annotating the component information in the images to obtain the required PCB data set.
3. The deep learning-based PCB component detection method of claim 1, wherein in S2 the object detection network adopts a Faster R-CNN network structure.
4. The deep learning-based PCB component detection method of claim 1, wherein in S3 the neural network model adopts a classification network composed of five convolutional layers, two fully connected layers and a softmax activation function.
5. The deep learning-based PCB component detection method of claim 1, wherein in S4 the EAST network has a two-stage workflow, the first stage directly generating word- or text-line-level predictions using a fully convolutional network model and the second stage sending the generated predictions to non-maximum suppression to produce the final result, the two stages being trained end to end.
6. The deep learning-based PCB component detection method of claim 1, wherein in S4 the CRNN network feeds the image features as a sequence into a long short-term memory network to process continuous text of arbitrary length.
CN201910333652.XA 2019-04-24 2019-04-24 Deep learning-based PCB component detection method Active CN110070536B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910333652.XA CN110070536B (en) 2019-04-24 2019-04-24 Deep learning-based PCB component detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910333652.XA CN110070536B (en) 2019-04-24 2019-04-24 Deep learning-based PCB component detection method

Publications (2)

Publication Number Publication Date
CN110070536A (en) 2019-07-30
CN110070536B (en) 2022-08-30

Family

ID=67368585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910333652.XA Active CN110070536B (en) 2019-04-24 2019-04-24 Deep learning-based PCB component detection method

Country Status (1)

Country Link
CN (1) CN110070536B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4031885A1 (en) * 2019-09-17 2022-07-27 Battelle Memorial Institute System and method for rapid inspection of printed circuit board using multiple modalities
JP7317647B2 (en) * 2019-09-19 2023-07-31 株式会社Screenホールディングス LEARNING DEVICE, INSPECTION DEVICE, LEARNING METHOD AND INSPECTION METHOD
CN111079815B (en) * 2019-12-11 2023-05-26 常州大学 Automatic high-value electronic device identification method for disassembly of waste circuit board
CN111079749B (en) * 2019-12-12 2023-12-22 创新奇智(重庆)科技有限公司 End-to-end commodity price tag character recognition method and system with gesture correction
CN111209957B (en) * 2020-01-03 2023-07-18 平安科技(深圳)有限公司 Vehicle part identification method, device, computer equipment and storage medium
CN111274961B (en) * 2020-01-20 2021-12-07 华南理工大学 Character recognition and information analysis method for flexible IC substrate
CN111462094A (en) * 2020-04-03 2020-07-28 联觉(深圳)科技有限公司 PCBA component detection method and device and computer readable storage medium
CN111461122B (en) * 2020-05-18 2024-03-22 南京大学 Certificate information detection and extraction method
CN111626177B (en) * 2020-05-22 2023-11-21 深圳技术大学 PCB element identification method and device
CN113804704A (en) * 2020-06-11 2021-12-17 广东美的白色家电技术创新中心有限公司 Circuit board detection method, visual detection equipment and device with storage function
CN111783590A (en) * 2020-06-24 2020-10-16 西北工业大学 Multi-class small target detection method based on metric learning
CN112102242A (en) * 2020-08-13 2020-12-18 南京航空航天大学 PCB component detection method based on lightweight network
CN111914508A (en) * 2020-08-19 2020-11-10 创新奇智(南京)科技有限公司 Component detection method and device, electronic equipment and readable storage medium
CN111814919A (en) * 2020-08-31 2020-10-23 江西小马机器人有限公司 Instrument positioning and identifying system based on deep learning
CN112132215B (en) * 2020-09-22 2024-04-16 平安国际智慧城市科技股份有限公司 Method, device and computer readable storage medium for identifying object type
CN112381175A (en) * 2020-12-05 2021-02-19 中国人民解放军32181部队 Circuit board identification and analysis method based on image processing
CN112733924A (en) * 2021-01-04 2021-04-30 哈尔滨工业大学 Multi-patch component detection method
CN113205511B (en) * 2021-05-25 2023-09-29 中科芯集成电路有限公司 Electronic component batch information detection method and system based on deep neural network
CN113450321B (en) * 2021-06-18 2022-05-03 电子科技大学 Single-stage target detection method based on edge detection
CN113674207B (en) * 2021-07-21 2023-04-07 电子科技大学 Automatic PCB component positioning method based on graph convolution neural network
CN113688821B (en) * 2021-09-07 2023-05-23 四川中电启明星信息技术有限公司 OCR text recognition method based on deep learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150066180A1 (en) * 2012-05-09 2015-03-05 Vayo (Shanghai) Technology Co., Ltd. Quick processing system and method for smt equipment
CN104463209A (en) * 2014-12-08 2015-03-25 厦门理工学院 Method for recognizing digital code on PCB based on BP neural network
CN105469400A (en) * 2015-11-23 2016-04-06 广州视源电子科技股份有限公司 Rapid identification and marking method of electronic component polarity direction and system thereof

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150066180A1 (en) * 2012-05-09 2015-03-05 Vayo (Shanghai) Technology Co., Ltd. Quick processing system and method for smt equipment
CN104463209A (en) * 2014-12-08 2015-03-25 厦门理工学院 Method for recognizing digital code on PCB based on BP neural network
CN105469400A (en) * 2015-11-23 2016-04-06 广州视源电子科技股份有限公司 Rapid identification and marking method of electronic component polarity direction and system thereof

Also Published As

Publication number Publication date
CN110070536A (en) 2019-07-30

Similar Documents

Publication Publication Date Title
CN110070536B (en) Deep learning-based PCB component detection method
CN109726643B (en) Method and device for identifying table information in image, electronic equipment and storage medium
CN111179251B (en) Defect detection system and method based on twin neural network and by utilizing template comparison
CN111325203B (en) American license plate recognition method and system based on image correction
CN108898137B (en) Natural image character recognition method and system based on deep neural network
CN112949564B (en) Pointer type instrument automatic reading method based on deep learning
CN108918536B (en) Tire mold surface character defect detection method, device, equipment and storage medium
CN111860348A (en) Deep learning-based weak supervision power drawing OCR recognition method
CN109118473B (en) Angular point detection method based on neural network, storage medium and image processing system
CN113160192A (en) Visual sense-based snow pressing vehicle appearance defect detection method and device under complex background
CN111814850A (en) Defect detection model training method, defect detection method and related device
CN105678322A (en) Sample labeling method and apparatus
CN112085024A (en) Tank surface character recognition method
CN113516650B (en) Circuit board hole plugging defect detection method and device based on deep learning
CN111899241A (en) Quantitative on-line detection method and system for defects of PCB (printed Circuit Board) patches in front of furnace
CN114372955A (en) Casting defect X-ray diagram automatic identification method based on improved neural network
CN112581462A (en) Method and device for detecting appearance defects of industrial products and storage medium
CN115116074A (en) Handwritten character recognition and model training method and device
CN111310826A (en) Method and device for detecting labeling abnormity of sample set and electronic equipment
CN112036249A (en) Method, system, medium and terminal for end-to-end pedestrian detection and attribute identification
CN108615401B (en) Deep learning-based indoor non-uniform light parking space condition identification method
CN115775246A (en) Method for detecting defects of PCB (printed circuit board) components
CN114255212A (en) FPC surface defect detection method and system based on CNN
CN115147418A (en) Compression training method and device for defect detection model
CN116188756A (en) Instrument angle correction and indication recognition method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant