CN111583502A - Renminbi (RMB) crown word number multi-label identification method based on deep convolutional neural network - Google Patents

Renminbi (RMB) crown word number multi-label identification method based on deep convolutional neural network

Info

Publication number
CN111583502A
CN111583502A (application CN202010381442.0A; granted publication CN111583502B)
Authority
CN
China
Prior art keywords
layer
image
neural network
convolutional neural
deep convolutional
Prior art date
Legal status (assumption, not a legal conclusion)
Granted
Application number
CN202010381442.0A
Other languages
Chinese (zh)
Other versions
CN111583502B (en)
Inventor
田莹
王澧冰
董惠文
汪洋
崔龙磊
苗丰泽
Current Assignee
University of Science and Technology Liaoning USTL
Original Assignee
University of Science and Technology Liaoning USTL
Priority date
Filing date
Publication date
Application filed by University of Science and Technology Liaoning USTL
Priority to CN202010381442.0A
Publication of CN111583502A
Application granted
Publication of CN111583502B
Legal status: Active
Anticipated expiration: pending

Classifications

    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07D HANDLING OF COINS OR VALUABLE PAPERS, e.g. TESTING, SORTING BY DENOMINATIONS, COUNTING, DISPENSING, CHANGING OR DEPOSITING
    • G07D7/00 Testing specially adapted to determine the identity or genuineness of valuable papers or for segregating those which are unacceptable, e.g. banknotes that are alien to a currency
    • G07D7/004 Testing specially adapted to determine the identity or genuineness of valuable papers or for segregating those which are unacceptable, e.g. banknotes that are alien to a currency, using digital security elements, e.g. information coded on a magnetic thread or strip
    • G07D7/0047 Testing specially adapted to determine the identity or genuineness of valuable papers or for segregating those which are unacceptable, e.g. banknotes that are alien to a currency, using digital security elements, e.g. information coded on a magnetic thread or strip, using checkcodes, e.g. coded numbers derived from serial number and denomination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/33 Determination of transform parameters for the alignment of images, i.e. image registration, using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07D HANDLING OF COINS OR VALUABLE PAPERS, e.g. TESTING, SORTING BY DENOMINATIONS, COUNTING, DISPENSING, CHANGING OR DEPOSITING
    • G07D7/00 Testing specially adapted to determine the identity or genuineness of valuable papers or for segregating those which are unacceptable, e.g. banknotes that are alien to a currency
    • G07D7/20 Testing patterns thereon
    • G07D7/2008 Testing patterns thereon using pre-processing, e.g. de-blurring, averaging, normalisation or rotation
    • G PHYSICS
    • G07 CHECKING-DEVICES
    • G07D HANDLING OF COINS OR VALUABLE PAPERS, e.g. TESTING, SORTING BY DENOMINATIONS, COUNTING, DISPENSING, CHANGING OR DEPOSITING
    • G07D7/00 Testing specially adapted to determine the identity or genuineness of valuable papers or for segregating those which are unacceptable, e.g. banknotes that are alien to a currency
    • G07D7/20 Testing patterns thereon
    • G07D7/2016 Testing patterns thereon using feature extraction, e.g. segmentation, edge detection or Hough-transformation

Abstract

The invention relates to the technical field of banknote identification, and in particular to a multi-label method for recognizing Renminbi (RMB) crown word (serial) numbers based on a deep convolutional neural network. The banknote image is first preprocessed. The crown word number is coarsely located using prior knowledge and then located precisely to obtain an RMB crown word number image; all crown word number images are scaled to a preset uniform size. Image features are extracted with a deep convolutional neural network and a model is trained to produce a prediction vector; the model is saved once it reaches a given accuracy. In the prediction stage, the image is fed into the deep convolutional neural network to extract features; the feature map is flattened and fed into a fully connected layer to obtain a prediction vector; a Sigmoid operation is applied to the prediction vector; the resulting vector is cut into ten slices, the maximum of each slice is found and mapped to the corresponding label vector, yielding the final classification result. Compared with traditional recognition methods, the method is fast, stable, and highly accurate.

Description

Renminbi (RMB) crown word number multi-label identification method based on deep convolutional neural network
Technical Field
The invention relates to the technical field of banknote identification, and in particular to a multi-label method for recognizing RMB crown word numbers based on a deep convolutional neural network.
Background
The serial (crown word) number of a banknote records the order in which banknotes are issued; it serves both to control the quantity of banknotes issued and to deter counterfeiting. The crown word number can be regarded as the identity card of each banknote: banks and self-service financial equipment record the crown word numbers of incoming and outgoing banknotes for evidence management and for tracing the flow of notes. Self-service equipment such as automatic teller machines and deposit-withdrawal machines can also verify the authenticity of a banknote by its crown word number. Accurate recognition of the banknote crown word number is therefore very important.
At present, crown word number recognition methods at home and abroad include the following: transmitting banknote images to a host computer over USB for processing, where the limited USB transfer speed leads to poor real-time performance; and performing recognition on a DSP platform, where inefficient methods for banknote edge detection, orientation recognition, crown word number region localization and segmentation, and character recognition result in poor recognition quality and weak software robustness. For example, when outliers are not removed during banknote edge detection, the detected edges are inaccurate, which degrades localization and recognition of the crown word number; similarly, recognizing the facing direction of the banknote with coarse grid features severely reduces program efficiency.
The main drawbacks of these methods are low efficiency, poor recognition quality, and a low crown word number recognition rate.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a method for recognizing RMB crown word numbers based on a deep convolutional neural network that is fast, stable, and more accurate than traditional recognition methods.
The RMB crown word number multi-label recognition method based on a deep convolutional neural network specifically comprises the following steps:
1) preprocessing the banknote image, including enhancing brightness, extracting the crown word number image, and registering it;
2) coarsely locating the crown word number using prior knowledge, then locating it precisely to obtain an RMB crown word number image;
3) scaling all RMB crown word number images to a preset uniform size;
4) extracting image features with a deep convolutional neural network and training a model to obtain a prediction vector, the model being saved once it reaches a given accuracy;
5) in the prediction stage, feeding the image into the deep convolutional neural network to extract image features;
6) flattening the feature map and feeding it into a fully connected layer to obtain a prediction vector;
7) applying a Sigmoid operation to the prediction vector;
8) cutting the Sigmoid-transformed prediction vector into ten slices, finding the maximum of each slice, and mapping it to the corresponding label vector to obtain the final classification result.
Step 1 specifically comprises: on the basis of grayscale conversion, improving the binarization of the banknote image by combining it with a top-hat transform; extracting the rectangular region containing the banknote to remove irrelevant background information; and registering the image with a homography matrix to correct tilt and eliminate perspective distortion.
The registration in step 1 specifically comprises:
1) establishing the correspondence between the four corner coordinates of the banknote image before registration and those of the registered image;
2) solving the homography matrix from this coordinate correspondence;
3) using the homography matrix to find, for each point of the registered image, the corresponding point in the banknote image before registration;
4) assigning pixel values to the registered banknote image by bilinear interpolation.
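Steps 1) and 2) above admit a compact direct formulation: with the bottom-right homography entry fixed to 1, the four corner correspondences give eight linear equations in the eight remaining entries of H. A minimal NumPy sketch (the helper names are illustrative, not from the patent):

```python
import numpy as np

def solve_homography(src, dst):
    """Solve the 3x3 homography H (with H[2,2] fixed to 1) from exactly
    four point correspondences src -> dst, via an 8x8 linear system."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b += [u, v]
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Map 2-D points through H, applying the perspective divide."""
    pts = np.hstack([pts, np.ones((len(pts), 1))])
    q = pts @ H.T
    return q[:, :2] / q[:, 2:3]
```

Since four non-degenerate point pairs determine H exactly, mapping the source corners through the recovered H reproduces the target corners.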
Step 2 specifically comprises: coarsely locating, using prior knowledge, the rectangular region lying roughly in the left 1/4 and lower 1/3 of the registered image; then locating precisely based on block-wise binarization, i.e. dividing the coarse localization map into a left block and a right block, binarizing each with its own global threshold, and stitching the two blocks back together for scan-based localization.
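The two-step localization described above can be sketched as follows. This is a NumPy illustration under assumptions: the exact crop fractions and the global threshold rule are not specified beyond the text, so a per-block mean threshold stands in.

```python
import numpy as np

def locate_crown_region(gray):
    """Coarse step: keep roughly the lower 1/3 x left 1/4 of the
    registered grayscale note, where the crown word number lies."""
    h, w = gray.shape
    return gray[2 * h // 3:, : w // 4]

def blockwise_binarize(roi):
    """Fine step: binarize the left and right halves with separate global
    thresholds (mean threshold as a stand-in), then stitch them together."""
    half = roi.shape[1] // 2
    out = np.zeros_like(roi, dtype=np.uint8)
    for sl in (np.s_[:, :half], np.s_[:, half:]):
        block = roi[sl]
        out[sl] = (block > block.mean()).astype(np.uint8) * 255
    return out
```

The stitched binary map is then scanned to fix the exact character positions.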
Step 4 specifically comprises: in the training stage, feeding the normalized binary crown word number image into the deep convolutional neural network, which learns a feature vector; flattening the feature map and feeding it into a fully connected layer to obtain a prediction vector; and training on the prediction vector and the label vector with a Sigmoid cross-entropy loss to obtain the final model.
Step 5 specifically comprises: in the prediction stage, feeding the image into the saved deep convolutional neural network model to extract features.
The structure of the deep convolutional neural network model is as follows:
first, 4 layers: a convolutional layer, a batch normalization layer, an activation layer, and a max pooling layer;
the input image size is (128, 64, 3); the convolution kernel of the convolutional layer is 7x7 with depth 64 and stride 2;
the batch normalization layer normalizes its input without changing its size;
the activation layer adds nonlinearity to the network without changing the size of its input;
the max pooling layer uses a 3x3 window with stride 2; it reduces the model size and speeds up computation while making the extracted features more robust;
next come the bottleneck modules; each bottleneck module comprises nine layers: a convolutional layer, a batch normalization layer, an activation layer, a trimming layer, a convolutional layer, a batch normalization layer, an activation layer, a convolutional layer, and a batch normalization layer;
a total of 16 bottleneck modules are stacked;
next, a shortcut residual block: within a bottleneck module, a weighted skip connection spans three layers from the first layer to the third, which effectively mitigates gradient vanishing in deep networks; the weight of the shortcut path is set to 1;
then a global average pooling layer, which reduces the number of parameters and thereby mitigates overfitting;
finally, the fully connected layer outputs a highly distilled feature, which is passed to the final classifier for classification.
Step 6 specifically comprises: flattening the feature map and feeding it into the fully connected layer to obtain a prediction vector.
Step 7 specifically comprises: applying the Sigmoid function to the prediction vector to obtain a prediction vector with values in the range 0 to 1.
Step 8 specifically comprises: since the crown word number image always contains ten characters, the vector is cut into 10 slices; the maximum of each slice is found and mapped to the corresponding label vector, and the prediction result is output.
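Steps 7 and 8 can be sketched as follows. The character set is an assumption: the patent fixes ten characters per crown word number but does not enumerate the per-character classes, so a 34-class set (digits plus letters without I and O) is used for illustration.

```python
import numpy as np

# Hypothetical character set; the patent does not list the exact classes.
CHARSET = "0123456789ABCDEFGHJKLMNPQRSTUVWXYZ"  # 34 classes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def decode_prediction(logits, n_chars=10):
    """Apply Sigmoid, cut the vector into n_chars equal slices, take the
    arg-max of each slice, and map it to a character of CHARSET."""
    probs = sigmoid(np.asarray(logits, float))
    slices = probs.reshape(n_chars, -1)   # one row per character position
    return "".join(CHARSET[i] for i in slices.argmax(axis=1))
```

Because Sigmoid is monotonic, the arg-max of each slice is the same before and after it is applied; the Sigmoid of step 7 simply maps the scores into the 0 to 1 range before slicing.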
Compared with the prior art, the invention has the following beneficial effects:
1. The homography-based banknote image registration method ensures that input banknote images of different angles, backgrounds, illumination intensities, and resolutions are all output as a uniform top-down view of the banknote.
2. Based on the image texture features of RMB banknotes and the preprocessed, registered banknote image, the invention quickly determines whether a note is upside down and whether it shows the front or the back, and quickly locates the RMB crown word number.
3. The invention requires no complicated character segmentation of the RMB crown word number, greatly improves recognition efficiency, and achieves a character recognition accuracy of 99.84%.
Drawings
FIG. 1 is a block diagram of a RMB crown word number recognition system according to the present invention;
FIG. 2 is a diagram of a homography matrix based banknote image registration process of the present invention;
FIG. 3 is an exemplary diagram of an input image according to the present invention;
FIG. 4 is a diagram of a deep convolutional neural network architecture of the present invention;
FIG. 5 is an exemplary diagram of an image for extracting the Renminbi crown word number according to the present invention;
FIG. 6 is a view of the bottleneck module of the present invention;
FIG. 7 is a diagram of the convolution calculation process of the present invention;
FIG. 8 is a diagram of a pooling calculation process of the present invention;
FIG. 9 is a block diagram of the shortcut residual block of the present invention.
Detailed Description
The invention discloses a multi-label method for recognizing Renminbi (RMB) crown word numbers based on a deep convolutional neural network. Those skilled in the art can, in light of the disclosure herein, appropriately adjust the parameters of the implementation. It is expressly noted that all such similar substitutions and modifications obvious to those skilled in the art are deemed to be included in the invention. While the methods and applications of this invention have been described in terms of preferred embodiments, it will be apparent to those skilled in the art that variations, modifications, and suitable combinations of the methods and applications described herein may be made to implement and use the techniques of this invention without departing from its spirit and scope.
Example (b):
As shown in FIGS. 1-9, the RMB crown word number multi-label recognition method based on a deep convolutional neural network comprises the following steps. First, the banknote image is preprocessed, including correcting severe exposure, extracting the banknote image, registering it, and so on. The rectangular region containing the banknote is extracted to remove irrelevant background information; the image is registered with a homography matrix to correct tilt and eliminate perspective distortion; whether the note is upside down is judged from the left-right distribution of pixels in the binarized banknote image, and front or back is then judged from the hue of the lower-left region of the note. This preprocessing is well suited to the subsequent localization and recognition of the crown word number: it places few constraints on the input image, and as long as the note is visually clear, an upright banknote image can be produced from any angle, illumination intensity, and resolution.
The crown word number is located in two steps: first coarse localization with prior knowledge, then precise localization of the crown word number.
Finally, the deep convolutional neural network is used to train on and predict the images, achieving a high recognition rate.
To output a uniform top-down view of the banknote, a homography-based registration method is used to correct tilt in the banknote image and eliminate perspective distortion. Let the banknote image before registration be A; after registration the registered image B is obtained, where B = HA and H is the homography matrix.
The banknote image registration proceeds as follows:
establishing the correspondence between the four corner coordinates of the original image and the registered image;
solving the homography matrix H from this coordinate correspondence;
using H to find the point of A corresponding to each point of B;
assigning pixel values to B by bilinear interpolation.
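The final assignment step can be sketched as a per-pixel bilinear lookup. This is a minimal NumPy sketch; in practice a library routine such as OpenCV's warpPerspective performs the same interpolation over the whole image.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample img at the (generally non-integer) location (x, y) by
    bilinear interpolation of the four surrounding pixels."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, img.shape[1] - 1)
    y1 = min(y0 + 1, img.shape[0] - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0, x0] + ax * img[y0, x1]   # blend along x, upper row
    bot = (1 - ax) * img[y1, x0] + ax * img[y1, x1]   # blend along x, lower row
    return (1 - ay) * top + ay * bot                  # blend along y
```

Each pixel of the registered image B is assigned the value sampled at its H-mapped location in A.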
Whether the note is upside down is judged from the image texture of the RMB banknote and the left-right distribution of the two pixel values in the binarized image; front or back is judged from the hue of the lower-left region.
The deep convolutional neural network model has the following structure:
first, 4 layers: a convolutional layer, a batch normalization layer, an activation layer, and a max pooling layer.
The input image size is (128, 64, 3). The features of the input picture are extracted by a convolution operation to obtain the convolutional layer, whose kernel is 7x7 with depth 64 and stride 2; the layer consists of 32 feature maps, for (7x7+1)x32 = 1600 parameters in total.
The batch normalization layer normalizes its input without changing its size.
The activation layer adds nonlinearity to the model, likewise without changing the size of its input.
The max pooling layer uses a 3x3 window with stride 2; it reduces the model size, speeds up computation, and makes the extracted features more robust. After the pooling operation a 64x32x32 feature map is obtained.
Each bottleneck module comprises nine layers: a convolutional layer, a batch normalization layer, an activation layer, a trimming layer, a convolutional layer, a batch normalization layer, an activation layer, a convolutional layer, and a batch normalization layer.
A total of 16 bottleneck modules are stacked.
Next comes the shortcut residual block: within a bottleneck module, a weighted skip connection spans three layers from the first layer to the third, which effectively mitigates gradient vanishing in deep networks; the weight of the shortcut path is set to 1.
Then follows a global average pooling layer, which reduces the number of parameters and thereby mitigates overfitting.
Finally, the fully connected layer outputs a highly distilled feature used by the classifier for classification.
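The spatial sizes implied by this structure follow the standard convolution output formula. A small sketch; the padding values are assumptions, since the text states kernel sizes and strides but not padding.

```python
def conv_out(n, k, s, p):
    """Spatial output size of a convolution or pooling layer:
    floor((n + 2p - k) / s) + 1 for input size n, kernel k, stride s, padding p."""
    return (n + 2 * p - k) // s + 1
```

With an assumed padding of 3 for the 7x7 convolution and 1 for the 3x3 pooling, a 128x64 input halves twice, to 64x32 after the convolution and 32x16 after the pooling.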
Convolution operation: in convolutional neural networks, the two most common computations are the convolution operation and the pooling operation. Convolution is effectively an integral operation that describes the input-output relationship of a linear time-invariant system: the output is obtained by convolving the input with the function characterizing the system. When f(n) has finite length N and s(n) has finite length M, the convolution f(n)*s(n) can be computed directly from the definition

(f*s)(n) = Σ_m f(m) · s(n − m).

If f(n) and s(n) are both real signals, MN multiplications are required, so the complexity of direct convolution is O(M·N).
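The direct method can be sketched in a few lines; each of the N outer by M inner iterations performs exactly one multiplication, making the O(M·N) cost explicit. A NumPy sketch, checked against numpy.convolve:

```python
import numpy as np

def direct_convolve(f, s):
    """Direct convolution by the definition (f*s)(n) = sum_m f(m) s(n-m).
    For len(f) = N and len(s) = M this uses N*M multiplications."""
    N, M = len(f), len(s)
    out = np.zeros(N + M - 1)
    for n in range(N):
        for m in range(M):
            out[n + m] += f[n] * s[m]   # one multiplication per (n, m) pair
    return out
```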
Convolution process of an image: a trainable convolution kernel W is convolved with the input image and a bias b is added to obtain the convolutional feature layer, i.e. f((x*W)_ij + b), where f is the ReLU function.
Pooling operation: the convolution operation extracts the most basic features of the image. In theory these features could be used directly for classification, but they do not necessarily represent abstract concepts, the data volume is large, and overfitting occurs easily. A higher level of abstraction over the image is therefore needed: the pooling operation, i.e. secondary feature extraction, which allows more feature information to be detected while reducing computational complexity.
In the pooling operation, a matrix window scans the tensor and the maximum or the average of each window is taken, reducing the number of elements. If contiguous pixels of the image are chosen as pooling regions and only features produced by the same hidden units are pooled, the pooled units are translation invariant, which is very important for recognition.
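The window scan described above can be sketched as follows (a minimal NumPy sketch; window size and stride are parameters, with max pooling shown):

```python
import numpy as np

def max_pool(x, k=2, s=2):
    """Scan x with a k x k window at stride s and keep each window's maximum."""
    h = (x.shape[0] - k) // s + 1
    w = (x.shape[1] - k) // s + 1
    out = np.empty((h, w), dtype=x.dtype)
    for i in range(h):
        for j in range(w):
            out[i, j] = x[i * s:i * s + k, j * s:j * s + k].max()
    return out
```

Swapping `.max()` for `.mean()` gives average pooling; the global average pooling layer later in the network is the special case where the window covers the whole feature map.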
Bottleneck module: the bottleneck module contains three convolutional layers in total. The first is a 1x1 convolution that reduces the channel dimension, so that the second, 3x3 convolutional layer operates on a relatively low-dimensional input, improving computational efficiency. The third, 1x1 convolutional layer restores the dimension. The remaining layers mainly serve to optimize the model.
Shortcut residual block: as network depth increases, convolutional neural networks perform better but become harder to train. This is mainly because, when training with stochastic gradient descent, back-propagating the error signal through many layers easily causes gradient vanishing. Special weight initialization strategies and batch normalization can alleviate this, but once the model converges, the training error increases rather than decreases as network depth grows. The shortcut residual block effectively solves this depth-induced training difficulty.
The residual calculation formula is: y = F(x, w_F) · T(x, w_T) + x · C(x, w_C); a traditional convolutional neural network has no T or C term. T(x, w_T) is a nonlinear transformation called the transform gate, which controls the strength of the transformation. C(x, w_C) is likewise a nonlinear transformation, called the carry gate, which controls how much of the original input signal is retained. In other words, y is a weighted combination of F and x, with T and C controlling the two corresponding weights, where T + C = 1. The residual block makes training the model easier than training the original function.
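With the carry gate tied to the transform gate as C = 1 − T, the gated combination reduces to a convex blend of the transform output and the input. A small numeric sketch (scalar gates for illustration; in the network the gates are learned nonlinear functions of x):

```python
import numpy as np

def gated_residual(x, F, T):
    """y = F*T + x*C with the carry gate tied as C = 1 - T:
    T = 0 passes the input straight through, T = 1 keeps only the transform."""
    C = 1.0 - T
    return F * T + x * C
```

The T = 0 case is what makes very deep stacks trainable: a layer can default to the identity and let the gradient flow through the shortcut unchanged.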
The homography-based banknote image registration method ensures that input banknote images of different angles, backgrounds, illumination intensities, and resolutions are all output as a uniform top-down view of the banknote. Based on the image texture features of RMB banknotes and the preprocessed, registered banknote image, the invention quickly determines whether a note is upside down and whether it shows the front or the back, and quickly locates the RMB crown word number. No complicated character segmentation of the crown word number is needed, recognition efficiency is greatly improved, and the character recognition accuracy reaches 99.84%.
The above is only a preferred embodiment of the present invention, and the scope of the invention is not limited thereto; any equivalent substitution or modification of the technical solutions and inventive concepts herein, made by a person skilled in the art within the scope disclosed by the present invention, shall fall within the protection scope of the invention.

Claims (9)

1. A RMB crown word number multi-label recognition method based on a deep convolutional neural network, characterized by comprising the following steps:
1) preprocessing the banknote image, including enhancing brightness, extracting the crown word number image, and registering it;
2) coarsely locating the crown word number using prior knowledge, then locating it precisely to obtain an RMB crown word number image;
3) scaling all RMB crown word number images to a preset uniform size;
4) extracting image features with a deep convolutional neural network and training a model to obtain a prediction vector, the model being saved once it reaches a given accuracy;
5) in the prediction stage, feeding the image into the deep convolutional neural network to extract image features;
6) flattening the feature map and feeding it into a fully connected layer to obtain a prediction vector;
7) applying a Sigmoid operation to the prediction vector;
8) cutting the Sigmoid-transformed prediction vector into ten slices, finding the maximum of each slice, and mapping it to the corresponding label vector to obtain the final classification result.
2. The RMB crown word number multi-label recognition method based on a deep convolutional neural network of claim 1, wherein step 1 specifically comprises: on the basis of grayscale conversion, improving the binarization of the banknote image by combining it with a top-hat transform; extracting the rectangular region containing the banknote to remove irrelevant background information; and registering the image with a homography matrix to correct tilt and eliminate perspective distortion;
the registration in step 1 specifically comprising:
1) establishing the correspondence between the four corner coordinates of the banknote image before registration and those of the registered image;
2) solving the homography matrix from this coordinate correspondence;
3) using the homography matrix to find, for each point of the registered image, the corresponding point in the banknote image before registration;
4) assigning pixel values to the registered banknote image by bilinear interpolation.
3. The RMB crown word number multi-label recognition method based on a deep convolutional neural network of claim 1, wherein step 2 specifically comprises: coarsely locating, using prior knowledge, the rectangular region lying roughly in the left 1/4 and lower 1/3 of the registered image; then locating precisely based on block-wise binarization, i.e. dividing the coarse localization map into a left block and a right block, binarizing each with its own global threshold, and stitching the two blocks back together for scan-based localization.
4. The RMB crown word number multi-label recognition method based on a deep convolutional neural network of claim 1, wherein step 4 specifically comprises: in the training stage, feeding the normalized binary crown word number image into the deep convolutional neural network, which learns a feature vector; flattening the feature map and feeding it into a fully connected layer to obtain a prediction vector; and training on the prediction vector and the label vector with a Sigmoid cross-entropy loss to obtain the final model.
5. The RMB crown word number multi-label recognition method based on a deep convolutional neural network of claim 1, wherein step 5 specifically comprises: in the prediction stage, feeding the image into the saved deep convolutional neural network model to extract image features.
6. The Renminbi number multi-label identification method based on the deep convolutional neural network as claimed in claim 5, wherein the deep convolutional neural network model structure is as follows:
firstly, 4 layers are provided, namely a convolutional layer, a batch normalization layer, an activation layer and a max pooling layer;
the input image size is (128,64,3); the convolution kernel size of the convolutional layer is 7x7, the depth of the convolution kernel is 64, and the convolution stride is 2;
the batch normalization layer normalizes the input without changing its size;
the activation layer increases the nonlinearity of the neural network without changing the size of the input;
the pooling window of the max pooling layer is 3x3 with stride 2; max pooling reduces the model size and speeds up computation while improving the robustness of the extracted features;
next comes a bottleneck module comprising nine layers: the first layer is a convolutional layer, the second a batch normalization layer, the third an activation layer, the fourth a trimming layer, the fifth a convolutional layer, the sixth a batch normalization layer, the seventh an activation layer, the eighth a convolutional layer, and the ninth a batch normalization layer;
a total of 16 bottleneck modules are stacked;
next come the shortcut residual connections: within a bottleneck module, the first layer is connected across three layers to the third layer by a weighted shortcut, which effectively alleviates the gradient vanishing problem in deep networks; the weight in the shortcut channel is set to 1;
then a global average pooling layer follows, which reduces the number of parameters and thereby the risk of model overfitting;
finally, the fully connected layer outputs a highly refined feature, which is passed to the final classifier for classification.
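One possible reading of the structure described above, sketched in PyTorch; the channel widths inside the bottleneck, the exact placement of the shortcut addition, and the output dimension are assumptions not fixed by the claim:

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Bottleneck module: conv-BN-ReLU, conv-BN-ReLU, conv-BN, with an
    identity shortcut (weight fixed to 1) added before the final activation."""
    def __init__(self, channels):
        super().__init__()
        mid = channels // 4  # assumed bottleneck width
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1, bias=False), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1, bias=False), nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(x + self.body(x))  # shortcut weight is 1

class CrownNumberNet(nn.Module):
    def __init__(self, num_outputs=360):  # assumed: 10 characters x 36 classes
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False),  # 7x7, depth 64, stride 2
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.MaxPool2d(3, stride=2, padding=1),                  # 3x3 window, stride 2
        )
        self.blocks = nn.Sequential(*[Bottleneck(64) for _ in range(16)])  # 16 stacked modules
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.fc = nn.Linear(64, num_outputs)  # final fully connected layer

    def forward(self, x):
        x = self.stem(x)
        x = self.blocks(x)
        x = self.pool(x).flatten(1)
        return self.fc(x)
```

For a (128,64,3) input (NCHW in PyTorch: (1,3,128,64)), the stem halves each spatial dimension twice and the global average pool collapses what remains, so only the channel count reaches the fully connected layer.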
7. The method for identifying the Renminbi crown word number with multiple labels based on the deep convolutional neural network as claimed in claim 1, wherein the step 6 specifically comprises: flattening the feature map and inputting it into the fully connected layer to obtain a prediction vector.
8. The method for identifying the Renminbi crown word number with multiple labels based on the deep convolutional neural network as claimed in claim 1, wherein the step 7 specifically comprises: applying the Sigmoid function to the prediction vector to obtain a prediction vector with values in the range 0-1.
9. The method for identifying the Renminbi crown word number with multiple labels based on the deep convolutional neural network as claimed in claim 1, wherein the step 8 specifically comprises: dividing the vector into 10 segments, finding the maximum value in each segment, mapping it to the corresponding label vector, and outputting the prediction result.
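The segment-and-argmax decoding of claim 9 can be sketched as follows; the 36-character alphabet and the 360-element vector length are assumptions consistent with a 10-character crown number:

```python
import numpy as np

CHARSET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed 36-class alphabet

def decode(pred):
    """Split the 0-1 prediction vector into 10 equal segments, take the
    arg-max of each segment, and map it back to a character."""
    segments = np.split(np.asarray(pred), 10)
    return "".join(CHARSET[int(np.argmax(seg))] for seg in segments)
```

Because each segment is decoded independently, one forward pass yields all ten characters at once instead of requiring per-character segmentation of the image.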
CN202010381442.0A 2020-05-08 2020-05-08 Renminbi (RMB) crown word number multi-label identification method based on deep convolutional neural network Active CN111583502B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010381442.0A CN111583502B (en) 2020-05-08 2020-05-08 Renminbi (RMB) crown word number multi-label identification method based on deep convolutional neural network

Publications (2)

Publication Number Publication Date
CN111583502A true CN111583502A (en) 2020-08-25
CN111583502B CN111583502B (en) 2022-06-03

Family

ID=72113338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010381442.0A Active CN111583502B (en) 2020-05-08 2020-05-08 Renminbi (RMB) crown word number multi-label identification method based on deep convolutional neural network

Country Status (1)

Country Link
CN (1) CN111583502B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113033635A (en) * 2021-03-12 2021-06-25 中钞长城金融设备控股有限公司 Coin invisible image-text detection method and device

Citations (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07121719A (en) * 1993-10-21 1995-05-12 Glory Ltd Method for optimizing mask using genetic algorithm for pattern recognition
US20100042568A1 (en) * 2004-01-06 2010-02-18 Neuric Technologies, Llc Electronic brain model with neuron reinforcement
CN102024262A (en) * 2011-01-06 2011-04-20 西安电子科技大学 Method for performing image segmentation by using manifold spectral clustering
CN102800148A (en) * 2012-07-10 2012-11-28 中山大学 RMB sequence number identification method
CN104036271A (en) * 2014-06-11 2014-09-10 新达通科技股份有限公司 Method and device for identifying crown word number of paper money and character, and ATM (Automatic Teller Machine)
CN105957238A (en) * 2016-05-20 2016-09-21 聚龙股份有限公司 Banknote management method and system
CN106056751A (en) * 2016-05-20 2016-10-26 聚龙股份有限公司 Prefix number identification method and system
CN107025716A (en) * 2017-06-05 2017-08-08 深圳怡化电脑股份有限公司 The method and device that detection paper money number is stained
CN107358575A (en) * 2017-06-08 2017-11-17 清华大学 A kind of single image super resolution ratio reconstruction method based on depth residual error network
CN108320374A (en) * 2018-02-08 2018-07-24 中南大学 A kind of multinational paper money number character identifying method based on finger image
CN108427921A (en) * 2018-02-28 2018-08-21 辽宁科技大学 A kind of face identification method based on convolutional neural networks
WO2018168521A1 (en) * 2017-03-14 2018-09-20 Omron Corporation Learning result identifying apparatus, learning result identifying method, and program therefor
CN108734142A (en) * 2018-05-28 2018-11-02 西南交通大学 A kind of core in-pile component surface roughness appraisal procedure based on convolutional neural networks
CN110033021A (en) * 2019-03-07 2019-07-19 华中科技大学 A kind of Fault Classification based on one-dimensional multichannel convolutional neural networks
CN110276881A (en) * 2019-05-10 2019-09-24 广东工业大学 A kind of banknote serial number recognition methods based on convolution loop neural network
CN110517277A (en) * 2019-08-05 2019-11-29 西安电子科技大学 SAR image segmentation method based on PCANet Yu high-order CRF
CN111046866A (en) * 2019-12-13 2020-04-21 哈尔滨工程大学 Method for detecting RMB crown word number region by combining CTPN and SVM

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG, ZHIWEN: "Application of convolutional neural networks in banknote crown word number recognition", Journal of University of Science and Technology Liaoning *

Also Published As

Publication number Publication date
CN111583502B (en) 2022-06-03

Similar Documents

Publication Publication Date Title
CN107133622B (en) Word segmentation method and device
CN112686812B (en) Bank card inclination correction detection method and device, readable storage medium and terminal
CN108121997A (en) Use the object classification in the image data of machine learning model
CN108197644A (en) A kind of image-recognizing method and device
CN109740572A (en) A kind of human face in-vivo detection method based on partial color textural characteristics
CN110781882A (en) License plate positioning and identifying method based on YOLO model
CN113361432B (en) Video character end-to-end detection and identification method based on deep learning
CN112733614B (en) Pest image detection method with similar size enhanced identification
CN113052170B (en) Small target license plate recognition method under unconstrained scene
CN110991201B (en) Bar code detection method and related device
CN110689003A (en) Low-illumination imaging license plate recognition method and system, computer equipment and storage medium
CN110443184A (en) ID card information extracting method, device and computer storage medium
CN110598703B (en) OCR (optical character recognition) method and device based on deep neural network
CN112417931B (en) Method for detecting and classifying water surface objects based on visual saliency
CN111583502B (en) Renminbi (RMB) crown word number multi-label identification method based on deep convolutional neural network
CN113688821A (en) OCR character recognition method based on deep learning
CN114444566A (en) Image counterfeiting detection method and device and computer storage medium
CN111476226B (en) Text positioning method and device and model training method
CN112365451A (en) Method, device and equipment for determining image quality grade and computer readable medium
CN113243018A (en) Target object identification method and device
CN108460775A (en) A kind of forge or true or paper money recognition methods and device
CN110633705A (en) Low-illumination imaging license plate recognition method and device
CN115953744A (en) Vehicle identification tracking method based on deep learning
CN115909378A (en) Document text detection model training method and document text detection method
CN113887381A (en) Lightweight satellite cloud chart neural network training method and rainfall detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant