CN113610085B - Character wheel image identification method based on attention mechanism


Info

Publication number
CN113610085B
Authority
CN
China
Prior art keywords
pixels
inter
attention
character wheel
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111178572.5A
Other languages
Chinese (zh)
Other versions
CN113610085A (en)
Inventor
朱炼
贾忠友
常关羽
吴忝睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Qianjia Technology Co Ltd
Original Assignee
Chengdu Qianjia Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Qianjia Technology Co Ltd filed Critical Chengdu Qianjia Technology Co Ltd
Priority to CN202111178572.5A priority Critical patent/CN113610085B/en
Publication of CN113610085A publication Critical patent/CN113610085A/en
Application granted granted Critical
Publication of CN113610085B publication Critical patent/CN113610085B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a character wheel image identification method based on an attention mechanism, which comprises the following steps: acquiring the character wheel image captured by a camera terminal, and traversing each row of pixels in the character wheel image to calculate the inter-class variance of each row of pixels; inputting the character wheel image captured by the camera terminal into an attention-based neural network, and obtaining a feature image after sampling; obtaining the maximum inter-class variance of each row of pixels, and inputting the maxima of all rows into the attention-based neural network as attention parameters; and applying the attention parameters to the feature image, and obtaining by calculation the characters that the character wheel image is ultimately required to yield. The method first calculates the maximum inter-class variance of each row of pixels in the character wheel image and injects the maxima of all rows into the neural network as attention parameters, enabling the network to segment and recognize the characters in the character wheel image more accurately.

Description

Character wheel image identification method based on attention mechanism
Technical Field
The invention relates to the technical field of image recognition, in particular to a character wheel image recognition method based on an attention mechanism.
Background
The camera-based gas meter is mainly used to retrofit ordinary gas meters: a clip-on unit carrying a camera and a communication module is mounted on the base meter of an old ordinary gas meter to read the meter's number wheel. In such a scenario, because the base meters come from different manufacturers, the character wheel images (also called character images) on the gas meters captured by the installed cameras also differ.
Typical cases are: (1) images containing only the characters to be recognized; (2) character wheel images with a shadow superimposed on them; (3) character wheel images with interference rows at the upper end; (4) character wheel images with interference rows at both the upper and lower ends. In general, compared with standard images, the character wheel images from different base-meter manufacturers share two characteristics: (1) the interference rows are concentrated at the upper and lower ends; (2) the interference rows consist mainly of background, i.e. pure white or pure black, so the pixel standard deviation of each such row is low.
Although an existing document (publication number CN107610144B) discloses an infrared image segmentation method based on the maximum inter-class variance method, that method segments regions where the contrast between the target and a sky background is large. When the character wheel image of a camera-equipped gas meter is captured there is not necessarily a sky background, and the contrast between the interference rows and the characters is small, so the method of that document does not segment the characters in a character wheel image with full accuracy.
Disclosure of Invention
The invention aims to inject an attention mechanism into a convolutional neural network, and provides a character wheel image identification method based on the attention mechanism that recognizes the characters in a character wheel image more accurately.
In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:
The character wheel image identification method based on the attention mechanism comprises the following steps:
Step S1: acquiring the character wheel image captured by a camera terminal, and traversing each row of pixels in the character wheel image to calculate the inter-class variance of each row of pixels;
Step S2: inputting the character wheel image captured by the camera terminal into an attention-based neural network, and obtaining a feature image after sampling;
Step S3: obtaining the maximum inter-class variance of each row of pixels, and inputting the maxima of all rows into the attention-based neural network as attention parameters; applying the attention parameters to the feature image, and obtaining by calculation the characters that the character wheel image is ultimately required to yield.
In this scheme, the maximum inter-class variance of each row of pixels in the character wheel image is calculated first, and the maxima of all rows are injected into the neural network as attention parameters, turning an ordinary neural network into an attention-based one, so that the network segments and recognizes the characters in the character wheel image more accurately.
The step of traversing each row of pixels in the character wheel image to calculate the inter-class variance of each row of pixels comprises:
Step S11: for an image with n rows of pixels in total, calculating the minimum gray value qmin and the maximum gray value qmax of each row of pixels;
Step S12: traversing every gray value between the minimum gray value qmin and the maximum gray value qmax of each row of pixels, and calculating the inter-class variance at each gray value, giving the set G = {g_i}, where g_i denotes the inter-class variance at the gray value of the i-th traversal;
Step S13: taking the maximum of the inter-class variances g_i within each row as the inter-class variance of that row, i.e. Ln = max(G).
The step of calculating the inter-class variance g at each gray value comprises:
Step S121: denoting the gray value of each traversal as t_i, where t_i represents the gray value of the i-th traversal;
Step S122: using t_i as the separation threshold, taking the pixels in the row whose gray value is less than or equal to t_i as background pixels, and those greater than t_i as foreground pixels;
Step S123: calculating the mean gray value μ_0 of the background pixels and the ratio ω_0 of the number of background pixels to the total number of pixels in the row; calculating the mean gray value μ_1 of the foreground pixels and the ratio ω_1 of the number of foreground pixels to the total number of pixels in the row;
Step S124: calculating the current inter-class variance g = ω_0 · ω_1 · (μ_0 - μ_1)², as illustrated in the sketch below.
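For illustration only, steps S11 to S13 and S121 to S124 can be sketched in Python/NumPy as follows; the helper names (row_max_interclass_variance, attention_parameters) and the 8-bit grayscale assumption are this sketch's own, not the patent's:

```python
import numpy as np

def row_max_interclass_variance(row: np.ndarray) -> float:
    """Maximum inter-class variance of one pixel row over all thresholds
    between the row's minimum and maximum gray values (steps S11-S13)."""
    qmin, qmax = int(row.min()), int(row.max())
    n = row.size
    best = 0.0
    for t in range(qmin, qmax):                  # traverse every gray value t_i (step S121)
        background = row[row <= t]               # pixels <= t_i are background (step S122)
        foreground = row[row > t]                # pixels >  t_i are foreground
        if background.size == 0 or foreground.size == 0:
            continue
        omega0 = background.size / n             # background proportion omega_0 (step S123)
        omega1 = foreground.size / n             # foreground proportion omega_1
        mu0 = background.mean()                  # background mean gray value mu_0
        mu1 = foreground.mean()                  # foreground mean gray value mu_1
        g = omega0 * omega1 * (mu0 - mu1) ** 2   # inter-class variance (step S124)
        best = max(best, g)                      # Ln = max(G) (step S13)
    return best

def attention_parameters(image: np.ndarray) -> np.ndarray:
    """One attention parameter per pixel row of the character wheel image (step S1)."""
    return np.array([row_max_interclass_variance(r) for r in image])
```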
The attention-based neural network consists of 1 convolutional neural network and 1 BP neural network; the convolutional neural network consists of 1 input layer, 2 convolutional layers, 2 pooling layers and 1 fully connected layer; the BP neural network consists of 1 input layer, 1 hidden layer and 1 output layer.
The step of inputting the character wheel image captured by the camera terminal into the attention-based neural network and obtaining a feature image after sampling comprises:
Step S21: the character wheel image captured by the camera terminal is 28 × 28 pixels; it is input into the first convolutional layer of the convolutional neural network, and after computation by the 6 convolution kernels of 5 × 5 of the first convolutional layer, 6 feature maps of 24 × 24 pixels are generated; the 6 feature maps of 24 × 24 pixels are input into the first pooling layer, whose 2 × 2 maximum down-sampling converts each 24 × 24 feature map into a 12 × 12 one, generating 6 feature maps of 12 × 12 pixels;
Step S22: the 6 feature maps of 12 × 12 pixels generated by the first pooling layer are input into the second convolutional layer, and after computation by the 12 convolution kernels of 5 × 5 × 6 of the second convolutional layer, 12 feature maps of 8 × 8 pixels are generated; the 12 feature maps of 8 × 8 pixels are input into the second pooling layer, whose 2 × 2 maximum down-sampling converts each 8 × 8 feature map into a 4 × 4 one, generating 12 feature maps of 4 × 4 pixels; a sketch of this network and its sampling pipeline follows.
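For illustration, a minimal PyTorch sketch of this attention-based network is given below. The patent does not specify activation functions, padding, or the number of output classes, so the ReLU activations, the sigmoid units in the BP branch, and the assumption of 10 output classes (digits 0-9) are this sketch's own choices, as is the class name AttentionWheelNet; the hidden-layer width of 10 is taken from the embodiment described later:

```python
import torch
import torch.nn as nn

class AttentionWheelNet(nn.Module):
    """Sketch of the attention-based network: a small CNN plus a BP branch."""
    def __init__(self):
        super().__init__()
        # Convolutional branch: 28x28 -> 6@24x24 -> 6@12x12 -> 12@8x8 -> 12@4x4
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)   # 6 convolution kernels of 5x5
        self.conv2 = nn.Conv2d(6, 12, kernel_size=5)  # 12 convolution kernels of 5x5x6
        self.pool = nn.MaxPool2d(2)                   # 2x2 maximum down-sampling
        self.fc = nn.Linear(12 * 4 * 4, 10)           # fully connected layer; 10 classes assumed
        # BP branch: 28 attention parameters -> 10 hidden nodes -> 4 output neurons
        self.bp = nn.Sequential(
            nn.Linear(28, 10), nn.Sigmoid(),
            nn.Linear(10, 4), nn.Sigmoid(),
        )

    def forward(self, image: torch.Tensor, attn: torch.Tensor) -> torch.Tensor:
        x = self.pool(torch.relu(self.conv1(image)))  # step S21
        x = self.pool(torch.relu(self.conv2(x)))      # step S22: 12 feature maps of 4x4
        p = self.bp(attn)                             # step S31: 4 attention neurons
        x = x * p.view(-1, 1, 4, 1)                   # step S32: weight row j of every map by p[j]
        return self.fc(x.flatten(1))                  # class scores for the character
```

Where exactly the fully connected layer sits relative to the attention weighting is not spelled out in the claims; here the weighting precedes it, following the order of steps S31 and S32.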
The step of obtaining the maximum inter-class variance of each row of pixels, inputting the maxima of all rows into the attention-based neural network as attention parameters, applying the attention parameters to the feature image, and obtaining by calculation the characters that the character wheel image is ultimately required to yield comprises:
Step S31: obtaining the maximum inter-class variance of each of the 28 rows of pixels, and inputting these maxima as 28 attention parameters into the input layer of the BP neural network; after evaluation by the nodes of the hidden layer, the output layer outputs 4 neurons;
Step S32: applying the 4 neurons output by the output layer to each of the 4 × 4 feature maps generated by the second pooling layer, obtaining the final feature map after the weighting processing, and extracting the required characters from the feature map.
In this scheme, the maxima of the per-row inter-class variances of the character wheel image are injected into the convolutional neural network as attention parameters, which is equivalent to adding an attention mechanism, realized as the newly added BP neural network, to an ordinary convolutional neural network. By incorporating the inter-class variance of the character wheel image, the convolutional neural network has a better basis for recognizing the characters of the character wheel image, so the finally recognized characters are more accurate.
Compared with the prior art, the invention has the beneficial effects that:
the method comprises the steps of firstly calculating the maximum value of the inter-class variance of each row of pixels in a character wheel image, injecting the maximum value of the inter-class variance of all rows into a neural network as an attention parameter, and enabling the neural network to more accurately segment and identify characters in the character wheel image by using the neural network.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a flow chart of the character wheel image identification method of the present invention;
FIG. 2 is a diagram illustrating an example of an interfering row at the top and bottom ends of a character;
FIG. 3 is a diagram illustrating a maximum value curve of variance between pixel classes according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an attention-based neural network according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a Bp neural network structure according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating the calculation of a feature map with the attention parameters according to the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Also, in the description of the present invention, the terms "first", "second", and the like are used for distinguishing between descriptions and not necessarily for describing a relative importance or implying any actual relationship or order between such entities or operations.
Embodiment:
The invention is realized by the following technical scheme. As shown in FIG. 1, the character wheel image identification method based on the attention mechanism comprises the following steps:
Step S1: acquire the character wheel image captured by the camera terminal, and traverse each row of pixels in the character wheel image to calculate the inter-class variance of each row of pixels.
This embodiment is described with a character wheel image of 28 × 28 pixels, i.e. 28 rows of pixels, and the minimum gray value qmin and maximum gray value qmax of each row of pixels are calculated. For example, let the minimum gray value of the first row of pixels be qmin_1 and its maximum gray value be qmax_1. If there are 10 gray values between qmin_1 and qmax_1, the inter-class variance at each gray value is calculated in turn, giving the set G = {g_1, g_2, ..., g_10}, where g_1 denotes the inter-class variance at the gray value of the 1st traversal, i.e. the 1st gray value between qmin_1 and qmax_1; likewise g_10 denotes the inter-class variance at the gray value of the 10th traversal.
During the traversal, the gray value visited each time is denoted t_i, with t_i representing the gray value of the i-th traversal; e.g. t_1 denotes the gray value of the 1st traversal and t_10 that of the 10th. Since the gray values are traversed sequentially, when the 1st gray value is visited, t_1 is used as the separation threshold: the pixels in the 1st row whose gray value is less than or equal to t_1 are taken as background pixels, and those greater than t_1 as foreground pixels.
Suppose that of the 27 pixels in the 1st row (excluding, at this point, the pixel whose value equals t_1), 15 pixels are less than or equal to t_1 and 12 pixels are greater than t_1; the 15 pixels are then taken as background pixels and the 12 pixels as foreground pixels.
The mean gray value μ_0 of the background pixels and the ratio ω_0 of the number of background pixels to the total number of pixels in the row are then calculated, followed by the mean gray value μ_1 of the foreground pixels and the ratio ω_1 of the number of foreground pixels to the total number of pixels in the row. With these parameters, the inter-class variance at the 1st traversed gray value can be calculated as g_1 = ω_0 · ω_1 · (μ_0 - μ_1)².
By analogy, the inter-class variances for all 10 traversed gray values of the 1st row of pixels can be calculated, giving the set G = {g_1, g_2, ..., g_10}, and the maximum Ln = max(G) of the set is taken as the inter-class variance of the first row. After this calculation has been performed for all 28 rows, the maximum inter-class variance of each of the 28 rows is obtained.
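Using the hypothetical row_max_interclass_variance helper sketched earlier, this behavior can be checked on toy rows (values invented for illustration):

```python
import numpy as np

row = np.array([12, 15, 13, 200, 210, 205, 14, 11], dtype=np.uint8)  # dark background with bright character strokes
flat = np.full(8, 14, dtype=np.uint8)                                # a pure-background interference row
print(row_max_interclass_variance(row))   # large value: the row contains character pixels
print(row_max_interclass_variance(flat))  # 0.0: qmin == qmax, so no threshold separates two classes
```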
After the per-row maxima of the inter-class variance are calculated in step S1, the intervals at the upper and lower ends of the character "0" shown in FIG. 2 (dashed boxes A and B in FIG. 2) can be distinguished. FIG. 3 shows the curve of these maxima: the abscissa is the row index, with 28 rows labeled in total, and the ordinate is the maximum inter-class variance of each row. It can be seen that the maxima of the first and last rows are very small because those rows contain no character pixels. Therefore, once the per-row maxima of the inter-class variance are calculated, the rows that contain characters can be distinguished from those that do not.
Step S2: input the character wheel image captured by the camera terminal into the attention-based neural network, and obtain a feature image after sampling.
Referring to FIG. 4, the attention-based neural network consists of 1 convolutional neural network and 1 BP neural network. The convolutional neural network consists of 1 input layer, 2 convolutional layers (the first and second convolutional layers), 2 pooling layers (the first and second pooling layers) and 1 fully connected layer; the BP neural network consists of 1 input layer, 1 hidden layer and 1 output layer. It should be noted that the dashed boxes in FIG. 4 merely delimit the structure of the convolutional neural network and that of the BP neural network, and have no other specific meaning.
The 28 × 28 pixel character wheel image captured by the camera terminal is input into the first convolutional layer of the convolutional neural network; after computation by the 6 convolution kernels of 5 × 5 of the first convolutional layer, 6 feature maps of 24 × 24 pixels are generated. The 6 feature maps of 24 × 24 pixels are input into the first pooling layer, whose 2 × 2 maximum down-sampling converts each 24 × 24 feature map into a 12 × 12 one, generating 6 feature maps of 12 × 12 pixels.
The 6 feature maps of 12 × 12 pixels generated by the first pooling layer are then input into the second convolutional layer; after computation by the 12 convolution kernels of 5 × 5 × 6 of the second convolutional layer, 12 feature maps of 8 × 8 pixels are generated. These are input into the second pooling layer, whose 2 × 2 maximum down-sampling converts each 8 × 8 feature map into a 4 × 4 one, generating 12 feature maps of 4 × 4 pixels.
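Continuing the hypothetical AttentionWheelNet sketch given earlier, the dimensions of this sampling pipeline can be verified with synthetic tensors:

```python
import torch

net = AttentionWheelNet()
img = torch.randn(1, 1, 28, 28)    # one synthetic 28 x 28 character wheel image
attn = torch.rand(1, 28)           # 28 per-row attention parameters from step S1
with torch.no_grad():
    scores = net(img, attn)        # internally: 6@24x24 -> 6@12x12 -> 12@8x8 -> 12@4x4
print(scores.shape)                # torch.Size([1, 10])
```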
Step S3: obtain the maximum inter-class variance of each row of pixels, and input the maxima of all rows into the attention-based neural network as attention parameters; apply the attention parameters to the feature image, and obtain by calculation the characters that the character wheel image is ultimately required to yield.
Referring to FIG. 4, this solution is equivalent to adding an attention network, namely the BP neural network, to the convolutional neural network: the maxima of the inter-class variances of the 28 pixel rows obtained from the inter-class variance calculation are fed as 28 attention parameters into the input layer of the BP neural network. Referring to FIG. 5, after evaluation by the 10 nodes of the hidden layer, the output layer outputs 4 neurons. It should be noted that the dashed boxes in FIG. 5 merely delimit the input layer, the hidden layer and the output layer, and have no other specific meaning.
The 4 neurons output by the output layer are then applied to each of the 4 × 4 feature maps generated by the second pooling layer. Referring to FIG. 6, suppose the 1st of the 12 feature maps of 4 × 4 pixels is S1; after the 4 neurons P1 act on it, the 1st row of S1 is multiplied by the 1st component of P1 to give the 1st row of N1, and so on, yielding the weighted feature map N1. In this way 12 weighted feature maps are obtained; the final feature map is obtained after the weighting processing, and the required characters can be extracted from it.
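The action of P1 on S1 in FIG. 6 amounts to a row-wise broadcast multiplication; a minimal NumPy illustration, with the S1 and P1 values invented for the example, is:

```python
import numpy as np

S1 = np.arange(16, dtype=float).reshape(4, 4)    # hypothetical 1st 4 x 4 feature map
P1 = np.array([0.1, 0.9, 0.8, 0.2])              # hypothetical 4 attention neurons
N1 = S1 * P1[:, np.newaxis]                      # row j of N1 = row j of S1 times P1[j]

maps = np.random.rand(12, 4, 4)                  # all 12 feature maps from the second pooling layer
weighted = maps * P1[np.newaxis, :, np.newaxis]  # the same weighting applied to every map
```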
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (3)

1. A character wheel image identification method based on an attention mechanism, characterized by comprising the following steps:
Step S1: acquiring the character wheel image captured by a camera terminal, and traversing each row of pixels in the character wheel image to calculate the inter-class variance of each row of pixels;
Step S2: inputting the character wheel image captured by the camera terminal into an attention-based neural network, and obtaining a feature image after sampling;
the attention-based neural network consists of 1 convolutional neural network and 1 BP neural network; the convolutional neural network consists of 1 input layer, 2 convolutional layers, 2 pooling layers and 1 fully connected layer; the BP neural network consists of 1 input layer, 1 hidden layer and 1 output layer;
the step of inputting the character wheel image captured by the camera terminal into the attention-based neural network and obtaining a feature image after sampling comprises:
Step S21: the character wheel image captured by the camera terminal is 28 × 28 pixels; it is input into the first convolutional layer of the convolutional neural network, and after computation by the 6 convolution kernels of the first convolutional layer, 6 feature maps of 24 × 24 pixels are generated; the 6 feature maps of 24 × 24 pixels are input into the first pooling layer, whose 2 × 2 maximum down-sampling converts each 24 × 24 feature map into a 12 × 12 one, generating 6 feature maps of 12 × 12 pixels;
Step S22: the 6 feature maps of 12 × 12 pixels generated by the first pooling layer are input into the second convolutional layer, and after computation by the 12 convolution kernels of 5 × 5 × 6 of the second convolutional layer, 12 feature maps of 8 × 8 pixels are generated; the 12 feature maps of 8 × 8 pixels are input into the second pooling layer, whose 2 × 2 maximum down-sampling converts each 8 × 8 feature map into a 4 × 4 one, generating 12 feature maps of 4 × 4 pixels;
Step S3: obtaining the maximum inter-class variance of each row of pixels, and inputting the maxima of all rows into the attention-based neural network as attention parameters; applying the attention parameters to the feature image, and obtaining by calculation the characters that the character wheel image is ultimately required to yield;
the step of obtaining the maximum inter-class variance of each row of pixels, inputting the maxima of all rows into the attention-based neural network as attention parameters, applying the attention parameters to the feature image, and obtaining by calculation the characters that the character wheel image is ultimately required to yield comprises:
Step S31: obtaining the maximum inter-class variance of each of the 28 rows of pixels, and inputting these maxima as 28 attention parameters into the input layer of the BP neural network; after evaluation by the nodes of the hidden layer, the output layer outputs 4 neurons;
Step S32: applying the 4 neurons output by the output layer to each of the 4 × 4 feature maps generated by the second pooling layer, obtaining the final feature map after the weighting processing, and extracting the required characters from the feature map.
2. The attention-based character wheel image identification method of claim 1, wherein the step of traversing each row of pixels in the character wheel image to calculate the inter-class variance of each row of pixels comprises:
Step S11: for an image with n rows of pixels in total, calculating the minimum gray value qmin and the maximum gray value qmax of each row of pixels;
Step S12: traversing every gray value between the minimum gray value qmin and the maximum gray value qmax of each row of pixels, and calculating the inter-class variance at each gray value, giving the set G = {g_i}, where g_i denotes the inter-class variance at the gray value of the i-th traversal;
Step S13: taking the maximum of the inter-class variances g_i within each row as the inter-class variance of that row, i.e. Ln = max(G).
3. The attention-based character wheel image identification method of claim 2, wherein the step of calculating the inter-class variance g at each gray value comprises:
Step S121: denoting the gray value of each traversal as t_i, where t_i represents the gray value of the i-th traversal;
Step S122: using t_i as the separation threshold, taking the pixels in the row whose gray value is less than or equal to t_i as background pixels, and those greater than t_i as foreground pixels;
Step S123: calculating the mean gray value μ_0 of the background pixels and the ratio ω_0 of the number of background pixels to the total number of pixels in the row; calculating the mean gray value μ_1 of the foreground pixels and the ratio ω_1 of the number of foreground pixels to the total number of pixels in the row;
Step S124: calculating the current inter-class variance g = ω_0 · ω_1 · (μ_0 - μ_1)².
CN202111178572.5A 2021-10-10 2021-10-10 Character wheel image identification method based on attention mechanism Active CN113610085B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111178572.5A CN113610085B (en) 2021-10-10 2021-10-10 Character wheel image identification method based on attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111178572.5A CN113610085B (en) 2021-10-10 2021-10-10 Character wheel image identification method based on attention mechanism

Publications (2)

Publication Number Publication Date
CN113610085A CN113610085A (en) 2021-11-05
CN113610085B true CN113610085B (en) 2021-12-07

Family

ID=78343443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111178572.5A Active CN113610085B (en) 2021-10-10 2021-10-10 Character wheel image identification method based on attention mechanism

Country Status (1)

Country Link
CN (1) CN113610085B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114866297B (en) * 2022-04-20 2023-11-24 中国科学院信息工程研究所 Network data detection method and device, electronic equipment and storage medium


Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030171851A1 (en) * 2002-03-08 2003-09-11 Peter J. Brickfield Automatic energy management and energy consumption reduction, especially in commercial and multi-building systems
WO2020132478A1 (en) * 2018-12-20 2020-06-25 Dqpn, Llc Diet quality fingerprinting
CN109902751B (en) * 2019-03-04 2022-07-08 福州大学 Dial digital character recognition method integrating convolution neural network and half-word template matching
US20200359550A1 (en) * 2019-05-13 2020-11-19 Bao Tran Farm ecosystem
CA3145371A1 (en) * 2019-06-25 2020-12-30 Owkin Inc. Systems and methods for image preprocessing
US11423678B2 (en) * 2019-09-23 2022-08-23 Proscia Inc. Automated whole-slide image classification using deep learning
CN111008639B (en) * 2019-10-17 2024-02-27 安徽清新互联信息科技有限公司 License plate character recognition method based on attention mechanism
US12062178B2 (en) * 2020-03-05 2024-08-13 Case Western Reserve University Automated segmentation and guided correction of endothelial cell images
CN111582241B (en) * 2020-06-01 2022-12-09 腾讯科技(深圳)有限公司 Video subtitle recognition method, device, equipment and storage medium

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010128590A (en) * 2008-11-25 2010-06-10 The Chugoku Electric Power Co Inc Meter-reading terminal and meter reading system using the same
CN107610144A (en) * 2017-07-21 2018-01-19 哈尔滨工程大学 A kind of improved IR image segmentation method based on maximum variance between clusters
US10425687B1 (en) * 2017-10-10 2019-09-24 Facebook, Inc. Systems and methods for determining television consumption behavior
CN109255344A (en) * 2018-08-15 2019-01-22 华中科技大学 A kind of digital display instrument positioning and Recognition of Reading method based on machine vision
CN110458797A (en) * 2019-06-18 2019-11-15 南开大学 A kind of conspicuousness object detecting method based on depth map filter
CN110414377A (en) * 2019-07-09 2019-11-05 武汉科技大学 A kind of remote sensing images scene classification method based on scale attention network
CN110728697A (en) * 2019-09-30 2020-01-24 华中光电技术研究所(中国船舶重工集团有限公司第七一七研究所) Infrared dim target detection tracking method based on convolutional neural network
CN111031222A (en) * 2019-12-27 2020-04-17 山东厚德测控技术股份有限公司 Real-time recognition device and method for character wheel of camera type gas meter
CN111275714A (en) * 2020-01-13 2020-06-12 武汉大学 Prostate MR image segmentation method based on attention mechanism 3D convolutional neural network
CN111832924A (en) * 2020-06-30 2020-10-27 北方工业大学 Dynamic risk assessment method and device for community gas system based on graph neural network
CN111915613A (en) * 2020-08-11 2020-11-10 华侨大学 Image instance segmentation method, device, equipment and storage medium
CN113283419A (en) * 2021-04-29 2021-08-20 国网浙江省电力有限公司湖州供电公司 Convolutional neural network pointer instrument image reading identification method based on attention
CN113205574A (en) * 2021-04-30 2021-08-03 武汉大学 Art character style migration system based on attention system

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
"Automatic Utility Meter Reading ";Kaicheng Xie;《https://engagedscholarship.csuohio.edu/cgi/viewcontent.cgi?article=1682&context=etdarchive》;20201013;1-83 *
"Convolutional neural networks for automatic meter reading";Rayson Laroca等;《Journal of Electronic Imaging 》;20190205;第28卷(第1期) *
"Improving the Performance of Convolutional Neural Network for the Segmentation of Optic Disc in Fundus Images Using Attention Gates and Conditional Random Fields";B. J. Bhatkalkar等;《 IEEE 》;20200207;第 8卷;29299-29310 *
"基于卷积神经网络的图像识别研究综述";黄志强 等;《汽车工程师 》;20201025(第10期);11-13+31 *
"基于图像识别与LoRa的自动远传抄表系统设计与实现";熊诚;《中国优秀硕士学位论文全文数据库 工程科技Ⅱ辑》;20181215(第(2018)12期);C030-98 *
"基于深度学习的工业仪表识别读数算法研究及应用;何配林;《中国优秀硕士学位论文全文数据库 (基础科学辑)》;20200715(第(2020)7期);C030-83 *
"燃气表读数的数字图像识别研究";李发传;《中国优秀硕士学位论文全文数据库 信息科技辑》;20150215(第(2016)2期);I138-1883 *

Also Published As

Publication number Publication date
CN113610085A (en) 2021-11-05

Similar Documents

Publication Publication Date Title
CN109584248B (en) Infrared target instance segmentation method based on feature fusion and dense connection network
CN108229490B (en) Key point detection method, neural network training method, device and electronic equipment
US10733477B2 (en) Image recognition apparatus, image recognition method, and program
CN108108746B (en) License plate character recognition method based on Caffe deep learning framework
CN111460931A (en) Face spoofing detection method and system based on color channel difference image characteristics
CN107273870A (en) The pedestrian position detection method of integrating context information under a kind of monitoring scene
CN112036260B (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN113989794B (en) License plate detection and recognition method
CN113610085B (en) Character wheel image identification method based on attention mechanism
CN111815528A (en) Bad weather image classification enhancement method based on convolution model and feature fusion
WO2023011280A1 (en) Image noise degree estimation method and apparatus, and electronic device and storage medium
CN111428753B (en) Training set acquisition method, electric power facility detection method and device
CN115841438A (en) Infrared image and visible light image fusion method based on improved GAN network
CN107274425B (en) A kind of color image segmentation method and device based on Pulse Coupled Neural Network
Zhang et al. Single image dehazing based on bright channel prior model and saliency analysis strategy
CN113554568B (en) Unsupervised cyclic rain-removing network method based on self-supervision constraint and unpaired data
CN108710881B (en) Neural network model, candidate target area generation method and model training method
CN113807237A (en) Training of in vivo detection model, in vivo detection method, computer device, and medium
Zhang et al. Dehazing with improved heterogeneous atmosphere light estimation and a nonlinear color attenuation prior model
CN110781936B (en) Construction method of threshold learnable local binary network based on texture description and deep learning and remote sensing image classification method
Satrasupalli et al. End to end system for hazy image classification and reconstruction based on mean channel prior using deep learning network
CN116012299A (en) Composite insulator hydrophobicity grade detection method based on target identification
CN111223050A (en) Real-time image edge detection algorithm
Yadav et al. Image detection in noisy images
JP7279817B2 (en) Image processing device, image processing method and image processing program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: No. 536, Section 1, airport 1st Road, Southwest Airport, Shuangliu District, Chengdu, Sichuan 610211

Patentee after: Chengdu Qianjia Technology Co.,Ltd.

Address before: No. 536, Section 1, airport 1st Road, Southwest Airport, Shuangliu District, Chengdu, Sichuan 610211

Patentee before: Chengdu Qianjia Technology Co.,Ltd.