CN114708593B - Heterogeneous multi-model-based brand recognition method for waste electronic products - Google Patents


Info

Publication number
CN114708593B
CN114708593B (application CN202111673248.0A)
Authority
CN
China
Prior art keywords
model
character
output
image
electronic product
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111673248.0A
Other languages
Chinese (zh)
Other versions
CN114708593A (en)
Inventor
Tang Jian (汤健)
Wang Zixuan (王子轩)
Zhang Xiaoxiao (张晓晓)
Jing Zhongling (荆中岭)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202111673248.0A priority Critical patent/CN114708593B/en
Publication of CN114708593A publication Critical patent/CN114708593A/en
Application granted granted Critical
Publication of CN114708593B publication Critical patent/CN114708593B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/243 Classification techniques relating to the number of classes
    • G06F 18/24323 Tree-organised classifiers
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/08 Learning methods


Abstract

Aiming at the problem that the scarcity of related data sets makes it difficult for existing identification methods to reach the accuracy required in actual industrial practice, a heterogeneous multi-model method for brand recognition of waste electronic products is proposed. The back character area of the electronic product is extracted with the CRAFT algorithm, and a VGG19 model pre-trained on ImageNet serves as the image feature embedding model to extract both the character-part features and the overall features of the electronic product to be recycled. An OCR character recognition model is constructed on the character features to obtain the OCR sub-model recognition result, and a deep forest classification model is constructed on the character and overall features to obtain the deep forest sub-model recognition result. The OCR recognition result and the deep forest classification vector are linearly combined, a class weight vector is obtained with the softmax nonlinear function, and the class with the highest weight is taken as the electronic product brand recognition result. The effectiveness of the method is verified on real phone and tablet images shot by waste electronic product recycling equipment.

Description

Heterogeneous multi-model-based brand recognition method for waste electronic products
Technical Field
The invention belongs to the field of recovery of waste electronic products.
Background
With the development of technology and the rapid spread of 5G, smart electronic products are being replaced at an ever-increasing pace. Strategy Analytics predicts that global smart device shipments in 2021 will rebound by 6.5%, to a total of 1.38 billion units. The speed at which people replace their electronic products is the main driver of this growth, and the stock of idle personal devices rises year by year. The market at home and abroad therefore places higher demands on the efficiency of the electronic product recycling industry. Waste electronic products are a typical urban renewable resource; recycling them with unmanned, intelligent equipment can save a great deal of labor cost. An intelligent identification method for waste electronic products is the key to completing this task.
Image recognition is widely applied in fields such as target detection and face recognition, and how to use related data sets to build classification models that intelligently recognize waste electronic products is a research focus of current intelligent recycling equipment. However, building image-based deep neural network models relies on massive numbers of labeled samples. The data for the waste electronic product identification problem come only from pictures actually shot by the recycling equipment prototype: the data volume is small, so an effective neural network classifier is hard to build; the images shot in the industrial process have low definition; and irregular user operation causes problems such as incomplete product images and specular reflection on parts of the product. How to classify electronic product brands under small sample size and low sample quality has become the major problem to be solved.
Based on this state of research, the inventors previously proposed a waste electronic product identification system based on parallel differential evolution and gradient-feature deep forests, which builds a phone brand classification model from back images of waste phones and reaches a classification accuracy of 80.12%; and a waste electronic product identification system based on optical character recognition, which builds a character classifier from the back characters of the products and maps the character recognition result to a brand through mapping rules, reaching a classification accuracy of 86.37%. However, these methods build the classifier from only a single angle, such as texture features or character features, and the model accuracy still hardly meets actual industrial requirements. The invention therefore proposes a heterogeneous multi-model method for identifying waste electronic products.
First, the back character area of the electronic product is extracted with the CRAFT algorithm. Then an ImageNet pre-trained VGG19 model performs feature extraction on the back image of the electronic product and on its character area, replacing single low-dimensional features with high-dimensional convolutional features. Next, an optical character recognition (OCR) model is built on the character features, and a deep forest classifier is built on the product image features and character features. Finally, the classification results of the different models are linearly spliced, and the final classification result is obtained through a softmax activation function. The effectiveness of the algorithm for waste electronic product identification is verified on a typical electronic product image data set from the Telecommunication Equipment Certification Center of the Ministry of Industry and Information Technology.
Disclosure of Invention
The heterogeneous multi-model method for identifying waste electronic products comprises three parts: an image preprocessing module, a multi-element feature extraction module and a heterogeneous multi-model recognition module. The overall system structure is shown in fig. 1.
The meaning of the variables appearing in the present invention is shown in Table 1.
TABLE 1 variable meaning Table
The input of the image preprocessing module is the raw back image captured by the recycling equipment; the output of the data enhancement preprocessing is X_img, and the output of the character preprocessing, which uses the CRAFT character-level target detection algorithm, is X_digit;
The multi-element feature extraction module obtains representations of the character features and of the whole back-image pixel features in a high-dimensional space using a VGG19 network pre-trained on ImageNet; the inputs of the module are X_img and X_digit, and its outputs are the embedded features x̂_img and x̂_digit respectively.
The heterogeneous multi-model recognition module comprises 3 parts: an OCR character recognition sub-module, a deep forest electronic product recognition sub-module and a softmax nonlinear output layer sub-module, wherein: the OCR sub-module takes x̂_digit as input and outputs ŷ_OCR; the deep forest sub-module takes x̂_digit and x̂_img as input and outputs ŷ_DF; the softmax nonlinear output layer sub-module maps the outputs of the classification sub-modules to a class weight vector, and the label with the highest score is the final output ŷ.
2.1 Image preprocessing Module
2.1.1 Data enhancement preprocessing
Randomly altering the training samples through data enhancement reduces the dependence of the model on particular properties and thus improves its generalization ability. Common data enhancement methods include geometric transformation, color space transformation, kernel filters, image mixing, random erasure, adversarial training, augmentation based on generative adversarial networks, and neural style transfer. Geometric transformation addresses position deviation in the training samples; in the waste electronic product recycling process, differences in how users place the device shift the position of the product in the image, so the data enhancement used in the invention is mainly geometric. It specifically includes: rotation, flipping, mirroring, translation, addition of Gaussian noise, and so on.
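A minimal NumPy sketch of the geometric enhancements listed above (rotation, flip, mirror, translation, Gaussian noise); the specific angle, shift amount, and noise level are illustrative assumptions, not parameters taken from the patent:

```python
import numpy as np

def augment_geometric(img: np.ndarray, rng: np.random.Generator):
    """Yield geometrically transformed copies of a back image (H, W)."""
    yield np.rot90(img, k=1)                # 90-degree rotation
    yield np.flipud(img)                    # vertical flip
    yield np.fliplr(img)                    # horizontal mirror
    yield np.roll(img, shift=10, axis=1)    # crude horizontal translation
    noisy = img.astype(np.float64) + rng.normal(0.0, 5.0, size=img.shape)
    yield np.clip(noisy, 0, 255).astype(img.dtype)  # additive Gaussian noise
```

Applying several such transforms per source image is how a single back picture expands to a dozen training samples.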
2.1.2 Character enhancement preprocessing
In the recycling process, irregular user operation can leave the camera with incomplete images or specular reflections on the product, and a model built directly on the raw images collected by the equipment predicts brands poorly. The back characters of an electronic product are an important basis for identifying its brand, but they wear and become occluded during use, so a model that relies on back characters alone is limited. The authors therefore select the character features in the back image as one of the classification bases: a CRAFT character-level image positioning algorithm locates the characters of the product and segments them, and the character features are linearly spliced with the whole picture as the input of subsequent models. This addresses both the difficulty of building a classifier from the phone image alone and the limitation of building it on character features alone.
Extensive experiments show that target detection algorithms such as YOLOv3 and Fast R-CNN are widely applied in face detection, license plate detection and similar fields, but the targets they detect have relatively fixed aspect ratios and rarely suffer deformation or wear. In the electronic product recycling problem, character strings are rotated and deformed by differences in placement, and the characters on waste products show wear, so directly calibrating character positions for training performs poorly; moreover, the workload of manually labeling single characters in an electronic product image data set is hard to estimate, and few public target detection data sets carry character-level labels. The CRAFT algorithm trains on a synthetic data set with character labels in a weakly supervised manner: when a back picture without character segmentation is given as input, the model detects characters, synthesizes the corresponding character labels, and then recognizes them, predicting text regions from the closeness between characters. The CRAFT model training process is shown in fig. 2.
For the synthetic data set, a Gaussian heat map of each single character is available, and the CRAFT algorithm trains on this part with full supervision. For the electronic product back-image data set, the text box area in the image is first annotated and stretched into an upright rectangle by perspective transformation; then the watershed algorithm produces a position box for every single character, the corresponding Gaussian heat map is generated, and after the inverse transformation it is pasted back onto the label map at the position matching the original image. The confidence score of the watershed segmentation result is computed as:

S_c(p) = ( l(w) − min( l(w), | l(w) − l_c(w) | ) ) / l(w)   (1)

wherein l(w) denotes the character length of the text box in the electronic product image, and l_c(w) denotes the character-string length obtained from the watershed segmentation.
After the watershed algorithm yields the character-string length, the algorithm score is obtained from formula (1): if the segmented length matches the true character length, the confidence S_c(p) is 1, and the lower the score, the less reliable the segmentation result.
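The confidence computation of formula (1) can be sketched directly; a split whose length matches the true word length earns confidence 1, and the score decays as the lengths diverge:

```python
def watershed_confidence(true_len: int, split_len: int) -> float:
    """Confidence S_c of a watershed character split per formula (1):
    1.0 when the split recovers exactly l(w) characters, decreasing
    as the split length l_c(w) diverges from the word length l(w)."""
    l, lc = true_len, split_len
    return (l - min(l, abs(l - lc))) / l
```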
2.2 Multi-element feature extraction module
The electronic product image collected by the industrial equipment is denoted X_img, of size 400 × 300; the character image obtained after the image preprocessing of section 2.1 is denoted X_digit = [x_d^1, ..., x_d^m], wherein x_d^i is the i-th character in the character image, each single character is 50 × 50, and the whole character image is 50 × (50 × m). To visualize the pixel flow through the multi-element feature extraction module, fig. 3 takes a five-character image as an example, for which X_digit has size 50 × 250. The output dimension of the module is set manually, so the output dimension of an m-character image is consistent with that of the five-character image. The module structure is shown in fig. 3.
The module adopts a VGG19 model pre-trained on ImageNet as the base model. First, the parameters of the convolution and pooling layers of VGG19 are frozen; then fully connected layers of different sizes are built for the different image features; finally, the linear combination of the model outputs for the different images serves as the input feature of the subsequent classifiers. The feature dimension after multi-element feature extraction is determined by the dimension of the fully connected layer. The feature extraction process for the phone image X_img and the character image X_digit of different sizes is shown in formula (2):

x̂_img = f_VGG(X_img),  x̂_digit = f_VGG(X_digit)   (2)

wherein f_VGG(·) denotes the VGG19 model output process.
2.3 Heterogeneous Multi-model identification Module
It has been verified that the accuracy of electronic product classifiers built from only a single angle, such as texture features or character features, still hardly reaches actual industrial requirements. The invention therefore adopts the Stacking ensemble idea: features from different angles are linearly combined, a heterogeneous multi-classifier is constructed, and the accuracy of the overall model is improved by integrating several different models. For the waste electronic product brand classification problem, an OCR character recognition model and a deep forest recognition model are constructed; the sub-model structures are shown below.
2.3.1 OCR character recognition model sub-module
The OCR back-character recognition process uses only the character features x̂_digit described in section 2.2 as input. First, a bidirectional LSTM extracts character sequence features containing full context information; then a CTC network resolves the mismatch between the input features and the output sequence; finally, the Levenshtein distance between the OCR output string and the known labels determines the electronic product brand classification result ŷ_OCR. The OCR character recognition model structure is shown in fig. 4.
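The distance-based mapping from a raw OCR string to the nearest known brand label can be sketched as follows; the label list and case folding are illustrative assumptions:

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            # deletion, insertion, substitution/match
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def map_to_brand(ocr_string: str, labels: list) -> str:
    """Map a raw OCR string to the nearest known brand label."""
    return min(labels, key=lambda lab: levenshtein(ocr_string.lower(), lab.lower()))
```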
As shown in fig. 4, the OCR character recognition model builds k LSTM basic units (where k > m) on the character features x̂_digit obtained in the image preprocessing part. The bidirectional LSTM network contains two sub-networks; formulas (3) and (4) represent the forward and backward passes respectively:

h_i^f = f(x_i, h_{i-1}^f)   (3)
h_i^b = f(x_i, h_{i+1}^b)   (4)

wherein k is the LSTM basic-unit hyperparameter, h_i^f denotes the output of the forward LSTM at time i, h_i^b denotes the output of the backward LSTM at time i, and x_i denotes the input at time i. The bidirectional LSTM output at time i is:

h_i = [h_i^f, h_i^b]   (5)
The CTC network then de-duplicates the repeatedly recognized characters in the bidirectional LSTM output [h_1, h_2, ..., h_k], turning it into [y_1, y_2, ..., y_n]. Because there are more bidirectional LSTM basic units than phone characters n, characters are segmented repeatedly; for example, "honor" may be segmented as "hoonorr". Such substrings of "hoonorr" map to the correct result "honor", as shown in formula (6):

B("hoonorr") = "honor"   (6)
Given the input X, the CTC network obtains the final result Y by maximizing the posterior probability P(Y|X), as shown in formula (7):

P(Y|X) = Σ_{π ∈ B^{-1}(Y)} Π_{t=1}^{k} p(π_t | X)   (7)

wherein π ∈ B^{-1}(Y) denotes the set of all paths that can be collapsed into Y.
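The many-to-one map B in formulas (6) and (7) merges consecutive repeats and removes blanks; a minimal sketch, assuming '-' as the CTC blank symbol (the blank glyph is not specified in the text):

```python
BLANK = "-"  # assumption: '-' never appears as a real character

def ctc_collapse(path: str) -> str:
    """Many-to-one map B: merge consecutive repeats, then drop blanks.
    Both 'hoonorr' and 'h-onno-rr' collapse to 'honor'."""
    out = []
    prev = None
    for ch in path:
        if ch != prev and ch != BLANK:
            out.append(ch)
        prev = ch
    return "".join(out)
```

Summing the path probabilities over all inputs that collapse to the same string is exactly the maximization in formula (7).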
2.3.2 Deep forest recognition model sub-module
In the deep forest recognition process, the character features x̂_digit and image features x̂_img are linearly combined into the deep forest input feature X_DF, as shown in formula (8):

X_DF = [x̂_digit, x̂_img]   (8)

First, different random forests are built on X_DF, producing different random forest outputs (class probability vectors). These outputs are linearly combined with X_DF and passed as input to the next layer, where different random forests are built again; whether another layer is grown is decided by the classification accuracy of the current model. Finally, when the accuracy no longer improves, model growth stops, and the classification results of the last layer's forests are weighted to give the final classification result ŷ_DF. The deep forest recognition model structure is shown in fig. 5.
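A minimal scikit-learn sketch of the cascade growth described above, using one random forest and one GBDT per layer as in the experiments; the validation-based stopping rule and estimator counts are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

def grow_cascade(X, y, X_val, y_val, max_layers=3):
    """gcForest-style cascade: each layer's class-probability vectors are
    concatenated back onto the raw features X_DF as the next layer's input;
    growth stops when validation accuracy no longer improves."""
    feats, feats_val = X, X_val
    best_acc, layers = 0.0, []
    for _ in range(max_layers):
        layer = [RandomForestClassifier(n_estimators=20, random_state=0),
                 GradientBoostingClassifier(n_estimators=20, random_state=0)]
        probs, probs_val = [], []
        for clf in layer:
            clf.fit(feats, y)
            probs.append(clf.predict_proba(feats))
            probs_val.append(clf.predict_proba(feats_val))
        acc = (np.mean(probs_val, axis=0).argmax(axis=1) == y_val).mean()
        if layers and acc <= best_acc:
            break                                  # accuracy stopped improving
        best_acc = acc
        layers.append(layer)
        feats = np.hstack([X] + probs)             # augment raw features
        feats_val = np.hstack([X_val] + probs_val)
    return layers, best_acc
```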
2.3.3 Multi-model output weighting sub-module
In the heterogeneous multi-model recognition module, the OCR model output ŷ_OCR is a continuous character string that is mapped to an electronic product brand through a distance measure, while the deep forest model output ŷ_DF gives probabilities over all brands. To reconcile the different output forms and possibly inconsistent results of the heterogeneous models, the invention appends a multi-model output weighting module at the end of the classifier. The module is shown in fig. 6.
The softmax function, also known as the normalized exponential function, is a classifier widely used in the supervised-learning part of deep networks. It is shown in formula (9):

softmax(z)_i = e^{z_i} / Σ_{j=1}^{n+1} e^{z_j}   (9)

wherein n+1 is the dimension of the heterogeneous multi-model output vector and e is the base of the natural logarithm. The classification model of the invention has n waste electronic product labels; the OCR character recognition result ŷ_OCR and the deep forest recognition result ŷ_DF are linearly spliced into an (n+1)-dimensional result vector, which is fed to the softmax function to obtain the corresponding weights; the label with the highest weight is the final waste electronic product classification result ŷ.
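The (n+1)-dimensional splice-and-softmax step of formula (9) can be sketched as follows; reducing the OCR result to a single scalar score is an illustrative simplification of how the string output enters the vector:

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())            # max-shift for numerical stability
    return e / e.sum()

def fuse(ocr_score: float, df_probs: np.ndarray) -> np.ndarray:
    """Linearly splice the OCR entry with the n deep-forest class
    probabilities into an (n+1)-dim vector and softmax it into weights."""
    return softmax(np.concatenate([[ocr_score], df_probs]))
```

The index of the largest weight in the fused vector selects the final brand label.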
1. Features of the waste electronic product image are extracted with an ImageNet pre-trained VGG19 network. As the convolution layers deepen, the receptive field of each feature grows and its representational power strengthens, outperforming single-angle feature extraction. Compared with building a deep forest classifier only on texture-sensitive HOG features, the accuracy of the model built with VGG19 improves markedly.
2. A classification model for waste electronic products is built with the heterogeneous multi-model method: classifiers for different tasks are constructed from the same data set, and a nonlinear function finally weights the several model outputs into the final classification result. Experiments show that, compared with a single OCR recognition model or a single deep forest recognition model, the accuracy of the proposed heterogeneous multi-model improves markedly.
Drawings
FIG. 1 is a diagram of a structure of a method for identifying waste electronic products by heterogeneous multi-model
FIG. 2 CRAFT positioning and cutting module structure diagram
Fig. 3 multiple feature extraction module
FIG. 4 OCR character recognition model
Figure 5 depth forest identification model
FIG. 6 is a diagram of a multi-model weighted output architecture
FIG. 7 application scenario of waste electronic product recovery equipment
FIG. 8 data enhancement effect diagram
FIG. 9 image preprocessing results
FIG. 10 OCR recognition model confusion matrix
Figure 11 depth forest recognition model confusion matrix
FIG. 12 heterogeneous multi-model old and useless electronic product classification model confusion matrix
Detailed Description
An application scene of the waste electronic product recycling equipment is shown in fig. 7; the experimental data of the invention come from pictures actually shot by the equipment. The data set contains 123 images covering 10 brand categories of waste electronic products: Huawei phones (HUAWEI), Huawei tablets (MatePad), Honor, Xiaomi (Mi), ZTE, OPPO, VIVO, Apple phones (iPhone), Apple tablets (iPad) and other brands (Others).
Because the recycling equipment captures few real samples, data enhancement is used to expand the training and test sets before the classifier is built. Taking the back images of Honor phones as an example, rotation, flipping, noise addition and similar operations expand each back image from 1 sample to 12, growing the total from 400 samples to 4800. The sample expansion is illustrated in fig. 8.
Then the images of the waste electronic products to be recycled are segmented with the CRAFT character segmentation algorithm, yielding the corresponding character data set; the image preprocessing result is shown in fig. 9.
The multi-element feature extraction part uses the ImageNet data set, about 14 million pictures in 20,000 classes, to pre-train the VGG19 model, denoted f_VGG(·). Depending on the input image, fully connected layers of different sizes are appended to the VGG model: a 1024-dimensional layer for the 400 × 300 waste electronic product images and a 512-dimensional layer for the 50 × 50 character images.
The OCR character recognition module builds 128 LSTM basic units, i.e. k = 128, on the EasyOCR Chinese-English pre-trained model. The deep forest recognition model uses random forests and GBDT as the base classifiers of each layer, each built from 100 decision trees, with the GBDT loss function optimized using L1 + L2 regularization.
The classification confusion matrix of the OCR character recognition model built on the preprocessed character pictures is shown in fig. 10, and that of the deep forest recognition model built on the waste electronic product images and character pictures is shown in fig. 11.
Integrating the results of the two sub-models through the multi-model output weighting module yields the heterogeneous multi-model classification confusion matrix for waste electronic products shown in fig. 12; the classification accuracy reaches 90.17%.
To verify the effectiveness of the method, 10-class models of single feature + deep forest and of VGG feature + OCR are built on the same waste electronic product data set. The accuracies of the different brand classifiers on the waste electronic product image data set are shown in table 1.
Table 1 Accuracy comparison of waste electronic product identification models

Claims (1)

1. The heterogeneous multi-model method for identifying waste electronic products, characterized by comprising three parts: an image preprocessing module, a multi-element feature extraction module and a heterogeneous multi-model recognition module;
The meaning of the variables appearing is as follows;
The input of the image preprocessing module is the raw back image captured by the recycling equipment; the output of the data enhancement preprocessing is X_img, and the output of the character preprocessing, which uses the CRAFT character-level target detection algorithm, is X_digit; the multi-element feature extraction module obtains representations of the character features and of the whole back-image pixel features in a high-dimensional space using a VGG19 network pre-trained on ImageNet, wherein its inputs are X_img and X_digit, and its outputs are x̂_img and x̂_digit;
The heterogeneous multi-model recognition module comprises 3 parts: an OCR character recognition sub-module, a deep forest electronic product recognition sub-module and a softmax nonlinear output layer sub-module, wherein: the OCR sub-module takes x̂_digit as input and outputs ŷ_OCR; the deep forest sub-module takes x̂_digit and x̂_img as input and outputs ŷ_DF; the softmax nonlinear output layer sub-module maps the outputs of the classification sub-modules to a class weight vector, and the label with the highest score is the final output ŷ;
The image preprocessing module comprises data enhancement preprocessing and character enhancement preprocessing;
Character enhancement preprocessing selects the character features in the back image of the electronic product as one of the classification bases; a CRAFT character-level image positioning algorithm determines the character positions of the electronic product and segments the characters, and the character features are linearly spliced with the whole picture as the input of subsequent models;
For the synthetic data set, a Gaussian heat map of each single character is available, and the CRAFT algorithm trains on this part with full supervision; for the electronic product back-image data set, the text box area in the image is first annotated and stretched into an upright rectangle by perspective transformation; then the watershed algorithm produces a position box for every single character, the corresponding Gaussian heat map is generated, and after the inverse transformation it is pasted back onto the label map at the position matching the original image; the confidence score of the watershed segmentation result is computed as:

S_c(p) = ( l(w) − min( l(w), | l(w) − l_c(w) | ) ) / l(w)   (1)

wherein l(w) denotes the character length of the text box in the electronic product image, and l_c(w) denotes the character-string length obtained from the watershed segmentation;
After the watershed algorithm yields the character-string length, the algorithm score is obtained from formula (1): if the segmented length matches the true character length, the confidence S_c(p) is 1, and the lower the score, the less reliable the segmentation result;
The data collected by the industrial equipment, after data enhancement preprocessing, is denoted X_img, of image size 400 × 300; after character preprocessing it is denoted X_digit = [x_d^1, ..., x_d^m], wherein x_d^i is the i-th character in the character image, each single character is 50 × 50, and the whole character image is 50 × (50 × m); the output dimension of an m-character image is consistent with that of a five-character image;
Taking a VGG19 model pre-trained based on ImageNet as a base model, and firstly, solidifying parameters of a convolution layer and a pooling layer in the VGG19 model; then, constructing full-connection layers with different sizes according to different image features; finally, outputting the models of different images to be linearly combined to be used as the input characteristics of the subsequent classification model; the feature dimension after the multi-element feature extraction is determined by the dimension of the full connection layer; the feature extraction process of the pre-processed image X img and the character pre-processed image X digit for data enhancement of different sizes is as shown in formula (2):
where f_VGG(·) denotes the output of the VGG19 model;
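The multi-source extraction can be sketched in numpy; the random projections below are only stand-ins for the frozen VGG19 conv/pool stack and the per-source fully connected heads, and every dimension except the 400 × 300 and 50 × 50 sizes from the text is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_backbone(x, W):
    """Stand-in for the frozen VGG19 conv+pool stack: the parameters W
    are fixed (never trained); only the FC heads below would be learned."""
    return np.maximum(x @ W, 0.0)          # ReLU-activated feature vector

def fc_head(feat, W_fc):
    """Source-specific fully connected layer sized to its image type."""
    return feat @ W_fc

# Two sources with different raw sizes: whole back image vs. character strip.
x_img   = rng.random(400 * 300)            # flattened 400x300 back image
x_digit = rng.random(50 * 250)             # flattened 50x(50*5) character image

W_img,   W_dig   = rng.random((400 * 300, 16)), rng.random((50 * 250, 16))
Wfc_img, Wfc_dig = rng.random((16, 8)),         rng.random((16, 8))

f_img   = fc_head(frozen_backbone(x_img, W_img), Wfc_img)
f_digit = fc_head(frozen_backbone(x_digit, W_dig), Wfc_dig)

# "Linear combination" here is concatenation: the joint feature vector
# that feeds the downstream classifiers, with dimension set by the heads.
features = np.concatenate([f_img, f_digit])
```

As formula (2) suggests, the two inputs share the backbone but get separate heads, so the combined feature dimension is controlled entirely by the FC-layer sizes.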
An OCR character recognition model and a depth forest recognition model are constructed; the submodel structures are as follows:
a) OCR character recognition model submodule
In the OCR recognition of the characters on the back of the electronic product, only the character features x_digit are used as input. First, character-sequence features containing complete context information are extracted with a bidirectional LSTM; then a CTC network resolves the mismatch between the input feature length and the output sequence length; finally, the Levenshtein distance between the OCR output string and each known label determines the electronic-product brand classification result ŷ_OCR;
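The final distance-mapping step can be sketched as follows; the brand list and helper names are illustrative, not from the patent:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def nearest_brand(ocr_text, brands):
    """Map a noisy OCR string to the closest known brand label."""
    return min(brands, key=lambda b: levenshtein(ocr_text.lower(), b.lower()))

# A one-character OCR error ("1" for "i") still maps to the right brand.
result = nearest_brand("HUAWE1", ["huawei", "xiaomi", "oppo"])
```

Choosing the minimum-distance label makes the classifier tolerant of single-character OCR mistakes, which is exactly why a raw string-equality match would be too brittle here.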
From the character features x_digit produced by the image-preprocessing stage, the OCR character recognition model builds k LSTM basic units, with k > m. The bidirectional LSTM network consists of two subnetwork structures; formulas (3) and (4) give the forward and backward passes respectively:

h_i^f = LSTM(x_i, h_{i-1}^f)   (3)
h_i^b = LSTM(x_i, h_{i+1}^b)   (4)
where k is the hyperparameter for the number of LSTM basic units, h_i^f is the output of the forward LSTM at time i, h_i^b is the output of the backward LSTM at time i, and x_i is the input at time i. The bidirectional LSTM output at time i is:

h_i = [h_i^f ; h_i^b]   (5)
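The forward/backward recursion and the per-step concatenation described above can be sketched as a toy numpy bidirectional LSTM (untrained random weights, illustrative dimensions; the gate layout and helper names are my own):

```python
import numpy as np

rng = np.random.default_rng(1)
d_in, d_h, T = 4, 3, 5                       # input dim, hidden dim, time steps

def make_params():
    """One parameter block per direction: gates [i, f, o, g] stacked."""
    return (rng.standard_normal((4 * d_h, d_in)) * 0.1,
            rng.standard_normal((4 * d_h, d_h)) * 0.1,
            np.zeros(4 * d_h))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c, params):
    W, U, b = params
    z = W @ x + U @ h + b
    i, f, o, g = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def run(xs, params, reverse=False):
    """Forward pass of formula (3), or the backward pass of (4) when reversed."""
    h, c = np.zeros(d_h), np.zeros(d_h)
    outs = []
    for x in (reversed(xs) if reverse else xs):
        h, c = lstm_step(x, h, c, params)
        outs.append(h)
    return outs[::-1] if reverse else outs   # realign to time order

xs = [rng.standard_normal(d_in) for _ in range(T)]
fwd = run(xs, make_params())
bwd = run(xs, make_params(), reverse=True)

# Formula (5): concatenate forward and backward states at each time step.
h_bi = [np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]
```

Each h_bi[i] thus sees the whole sequence: its first half summarizes x_1..x_i and its second half x_i..x_T, which is the "complete context information" the text refers to.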
The CTC network then de-duplicates the repeatedly recognized characters in the bidirectional LSTM output [h_1, h_2, ..., h_k], collapsing it to [y_1, y_2, ..., y_n]. Since there are more bidirectional LSTM basic units than the n characters on the phone, characters are split repeatedly, as shown in formula (6);
Given the input X, the CTC network obtains the final result Y by maximizing the posterior probability P(Y|X), as shown in formula (7):

P(Y|X) = Σ_{π ∈ B(Y)} P(π|X)   (7)
where π ∈ B(Y) ranges over all label paths that can be collapsed into Y;
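The collapsing map used by CTC (merge consecutive repeats, then drop blanks) can be sketched as follows; the blank symbol and the example strings are illustrative:

```python
def ctc_collapse(path, blank="-"):
    """The CTC mapping B: merge consecutive duplicate symbols, then
    remove blanks. Genuinely repeated characters must be separated
    by a blank in the path to survive the merge."""
    out = []
    prev = None
    for ch in path:
        if ch != prev and ch != blank:
            out.append(ch)
        prev = ch
    return "".join(out)

collapsed = ctc_collapse("hh-u-aa-ww-ee-i")   # a k-step path for "huawei"
doubled   = ctc_collapse("a-ab")              # blank preserves the double "a"
```

All k-length paths that collapse to the same string Y form the set B(Y) summed over in formula (7), which is how CTC sidesteps the mismatch between the k LSTM outputs and the n target characters.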
b) Depth forest recognition model submodule
In the deep forest recognition of waste-electronic-product images, the character features x_digit and the image features x_img are linearly combined into the depth forest input feature X_DF, as shown in formula (8):

X_DF = [x_digit, x_img]   (8)
First, X_DF is used to build several different random forests, giving outputs ŷ_RF. These outputs are linearly combined with X_DF and passed as input to the next layer, where further random forests are built; whether another layer is added depends on the classification accuracy of the current model. Model growth stops when the accuracy no longer improves, and the classification results of the final layer's random forests are weighted to obtain the depth forest result ŷ_DF;
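The layer-growing rule can be sketched schematically; the nearest-centroid scorer below is only a stand-in for a real random forest, and all names, dimensions, and toy data are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_stub_forest(X, y, n_classes):
    """Stand-in for one random forest: a nearest-centroid scorer that
    returns per-class probabilities (a real system would use bagged trees)."""
    centroids = np.stack([X[y == c].mean(axis=0) for c in range(n_classes)])
    def predict_proba(Z):
        d = np.linalg.norm(Z[:, None, :] - centroids[None], axis=2)
        s = np.exp(-d)
        return s / s.sum(axis=1, keepdims=True)
    return predict_proba

def cascade(X, y, n_classes, n_forests=2, max_layers=5):
    """gcForest-style cascade: each layer appends its class-probability
    vectors to the raw features; growth stops when accuracy stalls."""
    feats, best = X, 0.0
    for _ in range(max_layers):
        probas = [fit_stub_forest(feats, y, n_classes)(feats)
                  for _ in range(n_forests)]
        acc = (np.mean(probas, axis=0).argmax(axis=1) == y).mean()
        if acc <= best:
            break                            # accuracy no longer improves
        best = acc
        feats = np.concatenate([X] + probas, axis=1)  # augment next layer input
    return np.mean(probas, axis=0), best     # weighted (averaged) final output

# Toy 3-class data standing in for the X_DF features.
X = rng.standard_normal((60, 8)) + np.repeat(np.arange(3), 20)[:, None]
y = np.repeat(np.arange(3), 20)
proba, acc = cascade(X, y, n_classes=3)
```

The key structural point is the feature augmentation: every layer sees the original X_DF plus the previous layer's class-probability vectors, and depth is decided by validation accuracy rather than fixed in advance.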
c) Multi-model output weighting submodule
In the heterogeneous multi-model recognition module, the OCR model output ŷ_OCR is a continuous character string that is mapped to an electronic-product brand via the distance metric, while the depth forest model outputs ŷ_DF, the probabilities of all electronic-product brands. To reconcile the heterogeneous models' different output forms and possibly inconsistent results, a multi-model output weighting module is added to the classification model;
The Softmax function is shown in formula (9):

softmax(z)_j = e^{z_j} / Σ_{i=0}^{n} e^{z_i},  j = 0, 1, ..., n   (9)
where n+1 is the dimension of the heterogeneous multi-model output vector and e is the base of the natural logarithm. With n waste-electronic-product labels in the classification model, the OCR character recognition result ŷ_OCR and the depth forest result ŷ_DF are linearly spliced into an (n+1)-dimensional result vector that is fed to the softmax function; the resulting values serve as weights, and the label with the largest weight is taken as the final waste-electronic-product classification result ŷ.
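The weighting step can be sketched as follows, under the assumption (mine, not stated explicitly in the text) that the (n+1)-dimensional vector is the n depth forest probabilities spliced with one OCR confidence score; all numbers are illustrative:

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of scores (formula (9))."""
    m = max(z)                               # subtract max to avoid overflow
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

brands = ["huawei", "xiaomi", "oppo"]        # n = 3 labels (illustrative)
df_scores = [0.6, 0.3, 0.1]                  # depth forest output, one per brand
ocr_score = [0.9]                            # OCR distance-based confidence

# Linear splice -> (n+1)-dim vector -> softmax weights over all entries.
weights = softmax(df_scores + ocr_score)
best = max(range(len(weights)), key=weights.__getitem__)
```

Here the heaviest-weighted entry (the OCR score, index 3) would dominate the final decision, illustrating how a confident OCR match can override a weaker depth forest vote.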
CN202111673248.0A 2021-12-31 2021-12-31 Heterogeneous multi-model-based brand recognition method for waste electronic products Active CN114708593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111673248.0A CN114708593B (en) 2021-12-31 2021-12-31 Heterogeneous multi-model-based brand recognition method for waste electronic products

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111673248.0A CN114708593B (en) 2021-12-31 2021-12-31 Heterogeneous multi-model-based brand recognition method for waste electronic products

Publications (2)

Publication Number Publication Date
CN114708593A CN114708593A (en) 2022-07-05
CN114708593B true CN114708593B (en) 2024-06-14

Family

ID=82167256

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111673248.0A Active CN114708593B (en) 2021-12-31 2021-12-31 Heterogeneous multi-model-based brand recognition method for waste electronic products

Country Status (1)

Country Link
CN (1) CN114708593B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931953A (en) * 2020-07-07 2020-11-13 北京工业大学 Multi-scale characteristic depth forest identification method for waste mobile phones
WO2021022970A1 (en) * 2019-08-05 2021-02-11 青岛理工大学 Multi-layer random forest-based part recognition method and system


Also Published As

Publication number Publication date
CN114708593A (en) 2022-07-05

Similar Documents

Publication Publication Date Title
CN110866140A (en) Image feature extraction model training method, image searching method and computer equipment
CN113657450B (en) Attention mechanism-based land battlefield image-text cross-modal retrieval method and system
Kang et al. Deep learning-based weather image recognition
CN111680176A (en) Remote sensing image retrieval method and system based on attention and bidirectional feature fusion
Shamrat et al. Bangla numerical sign language recognition using convolutional neural networks
Butt et al. Detecting video surveillance using VGG19 convolutional neural networks
Xia et al. Weakly supervised multimodal kernel for categorizing aerial photographs
CN112069891A (en) Deep fake face identification method based on illumination characteristics
Bai et al. Multimodal information fusion for weather systems and clouds identification from satellite images
Tao et al. Smoke vehicle detection based on spatiotemporal bag-of-features and professional convolutional neural network
CN117197763A (en) Road crack detection method and system based on cross attention guide feature alignment network
Alshehri A content-based image retrieval method using neural network-based prediction technique
Liu et al. A semi-supervised high-level feature selection framework for road centerline extraction
Kaur et al. A systematic review of object detection from images using deep learning
CN113537173B (en) Face image authenticity identification method based on face patch mapping
CN116597267B (en) Image recognition method, device, computer equipment and storage medium
Huynh et al. An efficient model for copy-move image forgery detection
CN114708593B (en) Heterogeneous multi-model-based brand recognition method for waste electronic products
CN116935100A (en) Multi-label image classification method based on feature fusion and self-attention mechanism
CN114913337A (en) Camouflage target frame detection method based on ternary cascade perception
Katarki et al. Estimating change detection of forest area using satellite imagery
Liu et al. Weather recognition of street scene based on sparse deep neural networks
Li et al. Forest fire recognition based on lightweight convolutional neural network
Deshpande et al. Abnormal Activity Recognition with Residual Attention-based ConvLSTM Architecture for Video Surveillance.
Matuszewski et al. Recognition of alphanumeric characters using artificial neuron networks and MSER algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant