CN112364726B - Part code-spraying character positioning method based on improved EAST
- Publication number: CN112364726B (application CN202011163480.5A)
- Authority
- CN
- China
- Prior art keywords
- candidate
- improved
- east
- area
- loss function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
- G06V10/225—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
Abstract
The application provides a part code-spraying character positioning method based on an improved EAST network. Starting from the existing EAST network structure, the method improves the network structure, label generation, loss function and candidate region processing of the existing EAST network, obtaining an improved network structure, improved label generation, an improved loss function and candidate region post-processing, and uses the improved EAST network to position characters. The application optimizes the network structure, label generation, loss function and candidate region post-processing on the basis of the existing EAST algorithm; the improved deep learning network can effectively enlarge the receptive field, improve the detection of long texts and achieve accurate positioning of part code-spraying characters.
Description
Technical Field
The invention relates to the technical field of character recognition, in particular to a part code-spraying character positioning method based on improved EAST.
Background
With the development of information science and technology, image processing and machine vision techniques have been widely applied in industry. Character and text detection is among the most widely used machine vision technologies, and its industrial application has long been a focus and a difficulty of research. The first step of character text detection is locating the character text region. The EAST algorithm is one of the better natural scene text localization algorithms and performs well on many public datasets. However, limited by its receptive field size, training process settings and the like, the EAST algorithm is not ideal for character positioning, and its detection of long texts in particular needs improvement.
Therefore, there is a need for an end-to-end character recognition method that depends little on character segmentation and has strong generality.
Disclosure of Invention
In view of this, the present invention provides a part code-spraying character positioning method based on improved EAST, characterized in that: the method starts from the existing EAST network structure and improves the network structure, label generation, loss function and candidate region processing of the existing EAST network, obtaining an improved network structure, improved label generation, an improved loss function and candidate region post-processing, and positions characters using the improved EAST network;
Improving the network structure of the existing EAST network specifically comprises:
constructing a VGG16 model, wherein the VGG16 model comprises 5 convolution layers and 5 pooling layers, the convolution kernel size is 3×3 and the stride is 1;
improving the VGG16 model by replacing the convolution operations of the last stage of the VGG16 network structure with hybrid dilated convolution (HDC) with dilation rates of [1, 2, 5];
the improved label generation specifically comprises: shrinking the quadrilateral formed by the character region of the target training image, treating the long and short sides differently;
the improved loss function specifically comprises: replacing the balanced cross-entropy loss function in the EAST algorithm with a dice loss function;
the improved candidate region processing specifically comprises: using a pre-sorting-based locality-aware NMS algorithm that sorts all candidate boxes according to their upper-left corner coordinates.
Further, the improved label generation specifically comprises the following steps:
shrinking the quadrilateral formed by the character region of the target training image, treating the long side and the short side separately;
the short side of the quadrilateral is shrunk by 0.3 times the short-side length of the original label annotation;
the long side of the quadrilateral is shrunk by shrunk_rate times the long-side length of the original label annotation,
where shrunk_rate denotes the shrink factor of the long side, length_s denotes the length of the short side of the quadrilateral, and length_l denotes the length of the long side of the quadrilateral.
Further, the dice loss function is determined by the following method:
L = L_s + λ_g·L_g = L_s + λ_g·(L_IoU + λ_θ·L_θ)  (2)
where L denotes the improved loss function, L_s the score map loss, L_g the geometry loss, λ_g the coefficient of the geometry loss, L_IoU the overlap-area loss, λ_θ the coefficient of the angle loss, and L_θ the angle loss.
Further, the improved candidate region processing specifically comprises a pre-sorting-based locality-aware NMS algorithm, and the pre-sorting-based locality-aware NMS algorithm specifically comprises the following steps:
S1: initialize the set boxes of all candidate rectangular boxes, an overlap threshold λ and an oscillation threshold ω;
S2: extract the upper-left corner coordinates (x1, y1) of all rectangular boxes in boxes, sort them in ascending order of x1 and of y1 respectively, compute and compare the oscillation counts of x1 and y1 according to ω, and arrange boxes in ascending order of oscillation count;
S3: let s be the empty set and p be empty; for g ∈ boxes, where g denotes an element of the set boxes of all candidate rectangular boxes, i.e. a candidate rectangular box, process all rectangular boxes in boxes in ascending order of oscillation count: if p is not empty and the overlap ratio of p and g is greater than λ, p becomes the rectangular box obtained by merging p and g; if p is not empty and the overlap ratio of p and g is not greater than λ, p is put into s and p = g; if p is empty, p = g;
S4: if p is not empty, put p into s;
S5: output the set s, where s denotes the filtered set of rectangular boxes.
Further, the improved candidate region processing further comprises candidate box merging, the candidate box merging being determined by the following method:
S(a) = η_g·S(g) + η_p·S(p)  (3)
a_i = η_g·S(g)·g_i + η_p·S(p)·p_i  (4)
where a is the merged rectangular box, g and p denote the two candidate rectangular boxes satisfying the overlap condition, i indexes the parameters of a rectangular box (the 4 vertices contribute 8 coordinates, plus 1 angle value), S denotes the score value of a rectangular box, η_g denotes the area coefficient of candidate rectangular box g, and η_p denotes the area coefficient of candidate rectangular box p;
the area coefficients η_g and η_p are set by the following method:
judge whether the area of candidate rectangular box g is larger than that of candidate rectangular box p; if so, η_g = 1.1 and η_p = 1; if not, η_g = 1 and η_p = 1.1.
The beneficial technical effects of the application are as follows: the application optimizes the network structure, label generation, loss function and candidate region post-processing on the basis of the existing EAST algorithm, and the improved deep learning network can effectively enlarge the receptive field, improve the detection of long texts and achieve accurate positioning of part code-spraying characters.
Drawings
The invention is further described below with reference to the accompanying drawings and examples:
Fig. 1 is a schematic diagram of an improved EAST algorithm network architecture proposed by the present invention.
Fig. 2 is a schematic diagram of the label generation process of the present invention.
FIG. 3 compares the detection results of the present invention with those of the original algorithm.
Detailed Description
The invention is further described below with reference to the accompanying drawings of the specification:
The invention provides a part code-spraying character positioning method based on improved EAST, characterized in that: the method starts from the existing EAST network structure and improves the network structure, label generation, loss function and candidate region processing of the existing EAST network, obtaining an improved network structure, improved label generation, an improved loss function and candidate region post-processing, and positions characters using the improved EAST network;
Improving the network structure of the existing EAST network specifically comprises:
constructing a VGG16 model, wherein the VGG16 model comprises 5 convolution layers and 5 pooling layers, the convolution kernel size is 3×3 and the stride is 1;
improving the VGG16 model by replacing the convolution operations of the last stage of the VGG16 network structure with hybrid dilated convolution (HDC) with dilation rates of [1, 2, 5];
the improved label generation specifically comprises: shrinking the quadrilateral formed by the character region of the target training image, treating the long and short sides differently;
the improved loss function specifically comprises: replacing the balanced cross-entropy loss function in the EAST algorithm with a dice loss function;
the improved candidate region processing specifically comprises: using a pre-sorting-based locality-aware NMS algorithm that sorts all candidate boxes according to their upper-left corner coordinates.
In a convolutional neural network, the size of the receptive field is determined by parameters such as the convolution kernel and the convolution stride. In the standard VGG16 model, the receptive field is enlarged by stacking convolution layers and by pooling operations. However, enlarging the receptive field in this way is limited by the number of convolution layers and so has limited effect; enlarging the convolution kernel introduces too many parameters; and enlarging the receptive field through pooling layers loses information. These factors limit the receptive field of the VGG16 network. Dilated convolution keeps the convolution kernel size unchanged but inserts holes into the kernel, enlarging the effective convolution region so that the receptive field grows without adding parameters; however, it can suffer from the gridding effect and may fail to detect small targets. The hybrid dilated convolution (HDC) scheme avoids these problems as long as the dilation rates satisfy three design requirements when the network is built. Following this idea, the convolution operations of the final stage of VGG16 (Conv_5) are replaced: the original three standard convolutions are replaced with hybrid dilated convolutions (HDC) with dilation rates of [1, 2, 5], enlarging the receptive field of this stage from 7×7 to 17×17 and yielding a VGG16 network with a larger receptive field. The new network structure is shown in FIG. 1; the dark blue part is the improved portion of the VGG16 network, where Maxpooling denotes the max-pooling operation, Up Sampling denotes the upsampling operation, and Concat denotes feature map concatenation.
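For illustration, a minimal PyTorch sketch of this kind of replacement is given below: three standard 3×3 convolutions are swapped for three dilated 3×3 convolutions with dilation rates [1, 2, 5]. The module name, channel count and ReLU placement are assumptions made for the example, not details taken from the patent.

```python
import torch
import torch.nn as nn

class HDCStage(nn.Module):
    """Hybrid dilated convolution (HDC) stage replacing three standard 3x3
    convolutions with dilated 3x3 convolutions (dilation rates 1, 2, 5).
    The channel count of 512 mirrors the last VGG16 stage and is an
    illustrative assumption."""

    def __init__(self, channels: int = 512, dilations=(1, 2, 5)):
        super().__init__()
        layers = []
        for d in dilations:
            # padding = dilation keeps the spatial size unchanged for a 3x3 kernel
            layers.append(nn.Conv2d(channels, channels, kernel_size=3,
                                    padding=d, dilation=d))
            layers.append(nn.ReLU(inplace=True))
        self.block = nn.Sequential(*layers)

    def forward(self, x):
        return self.block(x)

if __name__ == "__main__":
    x = torch.randn(1, 512, 32, 32)
    print(HDCStage()(x).shape)  # torch.Size([1, 512, 32, 32])
```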
According to the above technical scheme, the network structure, label generation, loss function and candidate region post-processing are optimized on the basis of the existing EAST algorithm; the improved deep learning network can effectively enlarge the receptive field, improve the detection of long texts and achieve accurate positioning of part code-spraying characters.
In this embodiment, the improved label generation specifically comprises the following steps:
shrinking the quadrilateral formed by the character region of the target training image, treating the long side and the short side separately;
the short side of the quadrilateral is shrunk by 0.3 times the short-side length of the original label annotation;
the long side of the quadrilateral is shrunk by shrunk_rate times the long-side length of the original label annotation,
where shrunk_rate denotes the shrink factor of the long side, length_s denotes the length of the short side of the quadrilateral, and length_l denotes the length of the long side of the quadrilateral.
During network training, the character region label of every training image is a quadrilateral, and the coordinates of its four vertices are expressed in the format (x1, y1, x2, y2, x3, y3, x4, y4), where x and y denote the abscissa and ordinate of a point and the subscripts 1, 2, 3, 4 denote the upper-left, upper-right, lower-right and lower-left corners respectively, i.e. the points are labelled in clockwise order. During EAST training, training labels are generated in the RBOX form with channels {d1, d2, d3, d4, θ}, where d_i denotes the distance from a point inside the label to each of the four edges of the label (text boxes in FIG. 1) and θ denotes the rotation angle (angle in FIG. 1). One further channel records the likelihood score that each point in the image is a character point (score map in FIG. 1). To reduce the interference caused by labelling errors, the score map in the EAST algorithm is shrunk by 0.3 times on the basis of the minimum enclosing rectangle of the original label annotation. The overall label generation process is shown in FIG. 2. Shrinking the label rectangle in this way reduces the interference from character edge regions and improves positioning accuracy. However, as is apparent from FIG. 2, because the long side of a character region is long, a 0.3-fold shrink causes a large part of the character region to fall outside the score map, which is very harmful to long-text positioning during training. To address this problem, the label generation of the EAST algorithm is improved by treating the long and short sides differently: the short side of the label rectangle is still shrunk by 0.3 times, while the shrink factor of the long side is the ratio of the short-side length to the long-side length of the rectangle.
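As a small sketch of the rule above (equation (1) itself is not reproduced in the source text), the fragment below takes the long-side shrink factor literally as the short-to-long side ratio and applies it to an axis-aligned box. The function names, the axis-aligned simplification and the remark about an additional 0.3 scaling are assumptions made for illustration.

```python
def long_side_shrink_rate(length_s: float, length_l: float) -> float:
    """Shrink factor for the long side of a label quadrilateral.

    Equation (1) is not reproduced in the source text; following the
    description literally, the factor is taken as length_s / length_l.
    Some implementations may additionally scale this by the 0.3 used for
    the short side, so treat the exact formula as an assumption.
    """
    return length_s / length_l


def shrink_axis_aligned_box(x_min, y_min, x_max, y_max, short_rate=0.3):
    """Shrink an axis-aligned label box, treating long and short sides
    differently (illustrative simplification; the patent labels general
    quadrilaterals by four vertices)."""
    w, h = x_max - x_min, y_max - y_min
    length_s, length_l = min(w, h), max(w, h)
    long_rate = long_side_shrink_rate(length_s, length_l)
    rate_x = long_rate if w >= h else short_rate  # shrink rate along x
    rate_y = short_rate if w >= h else long_rate  # shrink rate along y
    dx, dy = 0.5 * rate_x * w, 0.5 * rate_y * h
    return x_min + dx, y_min + dy, x_max - dx, y_max - dy
```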
In this embodiment, the dice loss function is determined by the following method:
L = L_s + λ_g·L_g = L_s + λ_g·(L_IoU + λ_θ·L_θ)  (2)
where L denotes the improved loss function, L_s the score map loss, L_g the geometry loss, λ_g the coefficient of the geometry loss, L_IoU the overlap-area loss, λ_θ the coefficient of the angle loss, and L_θ the angle loss.
For neural network training, the loss function is the optimization target of the whole network; how it is set directly affects the training result and the detection performance of the final parameters. For the FCN network proposed by the present invention, the loss function L consists mainly of two parts, the score map loss L_s and the geometry loss L_g, where the geometry loss L_g consists of the overlap-area loss L_IoU and the angle loss L_θ. The score map loss L_s is essentially a binary classification loss: the character region is the positive sample and the other regions are negative samples, and the two are highly imbalanced because the character region is much smaller in area than the rest of the image. The EAST algorithm uses a balanced cross-entropy loss to handle this sample imbalance, but research in recent years has found that the dice loss performs better on imbalanced samples. The present invention therefore expresses the score map loss with the dice loss, which copes with sample imbalance better than the EAST algorithm does.
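A minimal PyTorch sketch of the dice loss used for the score map is shown below; the optional training mask and the epsilon smoothing term are common implementation choices assumed here, not details specified in the patent.

```python
import torch

def dice_loss(pred_score, gt_score, training_mask=None, eps=1e-6):
    """Dice loss for the score map, replacing balanced cross-entropy.

    pred_score, gt_score: tensors of shape (N, 1, H, W) with values in [0, 1].
    training_mask: optional mask excluding ignored regions (an assumption
    borrowed from common EAST implementations).
    """
    if training_mask is None:
        training_mask = torch.ones_like(gt_score)
    pred = pred_score * training_mask
    gt = gt_score * training_mask
    intersection = (pred * gt).sum()
    union = pred.sum() + gt.sum() + eps
    return 1.0 - 2.0 * intersection / union
```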
In this embodiment, the improved candidate region processing specifically comprises a pre-sorting-based locality-aware NMS algorithm, and the pre-sorting-based locality-aware NMS algorithm specifically comprises the following steps:
S1: initialize the set boxes of all candidate rectangular boxes, an overlap threshold λ and an oscillation threshold ω;
S2: extract the upper-left corner coordinates (x1, y1) of all rectangular boxes in boxes, sort them in ascending order of x1 and of y1 respectively, compute and compare the oscillation counts of x1 and y1 according to ω, and arrange boxes in ascending order of oscillation count;
S3: let s be the empty set and p be empty; for g ∈ boxes, where g denotes an element of the set boxes of all candidate rectangular boxes, i.e. a candidate rectangular box, process all rectangular boxes in boxes in ascending order of oscillation count: if p is not empty and the overlap ratio of p and g is greater than λ, p becomes the rectangular box obtained by merging p and g; if p is not empty and the overlap ratio of p and g is not greater than λ, p is put into s and p = g; if p is empty, p = g;
S4: if p is not empty, put p into s;
S5: output the set s, where s denotes the filtered set of rectangular boxes.
Typically, when the network is used for character region detection, the FCN may extract hundreds or even thousands of candidate boxes, most of which overlap one another, so merging this huge number of candidate boxes is the last step of the whole character region positioning algorithm. Candidate box merging has traditionally used the non-maximum suppression (NMS) algorithm, but its time complexity is O(n²), which is very time-consuming. For this reason, the EAST algorithm uses a locality-aware NMS algorithm, which only compares adjacent candidate regions: if their overlap exceeds a threshold they are merged, otherwise both are kept. Its time complexity depends on the ordering of the candidate regions: the best case is O(n) (candidate boxes that should be merged are adjacent in the ordering) and the worst case is O(n²) (candidate boxes that do not overlap one another are interleaved). Moreover, rather than discarding one of two candidate boxes that satisfy the overlap condition as in standard NMS, locality-aware NMS merges them with weights given by their scores.
As can be seen, the time complexity of the locality-aware NMS algorithm depends on the arbitrary arrangement of the candidate boxes. To address this problem, the present invention improves the locality-aware NMS algorithm and proposes a pre-sorting-based locality-aware NMS algorithm: all candidate boxes are sorted according to their upper-left corner coordinates, so that candidate boxes likely to be merged are arranged next to each other, bringing the time complexity as close to O(n) as possible. At the same time, arranging the candidate boxes likely to be merged adjacently helps improve merging precision and therefore the final text positioning result.
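The sketch below illustrates pre-sorted locality-aware NMS on plain axis-aligned boxes. Boxes are simply sorted by their upper-left corners; the oscillation-count ranking of step S2, the handling of rotated quadrilaterals and the area-weighted merge (see the area-weighted merge sketch further below) are omitted, so the IoU computation and the simple score-weighted merge here are simplifications assumed for illustration.

```python
import numpy as np

def iou_axis_aligned(a, b):
    """IoU of two axis-aligned boxes (x1, y1, x2, y2); the patent works with
    rotated quadrilaterals, so this is a simplification."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-6)

def presorted_locality_aware_nms(boxes, scores, iou_thresh=0.2):
    """Pre-sorted locality-aware NMS sketch: sort candidates by upper-left
    corner so boxes likely to merge become adjacent, then merge adjacent
    boxes whose IoU exceeds the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: (boxes[i][1], boxes[i][0]))
    kept, kept_scores = [], []
    p, sp = None, None
    for i in order:
        g, sg = np.asarray(boxes[i], dtype=float), float(scores[i])
        if p is not None and iou_axis_aligned(p, g) > iou_thresh:
            p = (sp * p + sg * g) / (sp + sg)  # simple score-weighted merge
            sp = sp + sg
        else:
            if p is not None:
                kept.append(p)
                kept_scores.append(sp)
            p, sp = g, sg
    if p is not None:
        kept.append(p)
        kept_scores.append(sp)
    return kept, kept_scores
```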
In this embodiment, the improved candidate region processing further comprises candidate box merging, the candidate box merging being determined by the following method:
S(a) = η_g·S(g) + η_p·S(p)  (3)
a_i = η_g·S(g)·g_i + η_p·S(p)·p_i  (4)
where a is the merged rectangular box, g and p denote the two candidate rectangular boxes satisfying the overlap condition, i indexes the parameters of a rectangular box (the 4 vertices contribute 8 coordinates, plus 1 angle value), S denotes the score value of a rectangular box, η_g denotes the area coefficient of candidate rectangular box g, and η_p denotes the area coefficient of candidate rectangular box p;
the area coefficients η_g and η_p are set by the following method:
judge whether the area of candidate rectangular box g is larger than that of candidate rectangular box p; if so, η_g = 1.1 and η_p = 1; if not, η_g = 1 and η_p = 1.1.
In addition, the original locality-aware NMS algorithm considers only the scores when merging candidate rectangular boxes. For long texts, however, the area information of the candidate boxes is also important: a candidate box with a larger area is more likely to cover the whole text region. Therefore, in the candidate box merging herein, area coefficients η_g and η_p are introduced: the coefficient of the candidate box with the larger area is set to 1.05, and the other to 1.
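Below is a sketch of the area-weighted merge of equations (3) and (4). The shoelace area computation, the choice of 1.1 as the larger coefficient (the claims use 1.1 while the paragraph above mentions 1.05), and in particular the normalisation of equation (4) by the merged score S(a), following the weighted merge of the original EAST algorithm, are assumptions made for illustration.

```python
import numpy as np

def polygon_area(coords):
    """Shoelace area of a quadrilateral given as (x1, y1, ..., x4, y4)."""
    xs, ys = coords[0::2], coords[1::2]
    return 0.5 * abs(sum(xs[k] * ys[(k + 1) % 4] - xs[(k + 1) % 4] * ys[k]
                         for k in range(4)))

def area_weighted_merge(g, p, score_g, score_p, eta_large=1.1):
    """Merge two overlapping candidate boxes per equations (3) and (4).

    g, p: parameter vectors of the two boxes (8 vertex coordinates followed
    by an angle).  The box with the larger area receives coefficient
    eta_large and the other 1.0.  Equation (4) as printed is an unnormalised
    sum; dividing by the merged score S(a), as in the original EAST weighted
    merge, is an ASSUMPTION made so the result stays in coordinate space.
    """
    g, p = np.asarray(g, dtype=float), np.asarray(p, dtype=float)
    area_g, area_p = polygon_area(g[:8]), polygon_area(p[:8])
    eta_g, eta_p = (eta_large, 1.0) if area_g > area_p else (1.0, eta_large)
    s_a = eta_g * score_g + eta_p * score_p                 # equation (3)
    a = (eta_g * score_g * g + eta_p * score_p * p) / s_a   # equation (4), normalised by S(a)
    return a, s_a
```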
FIG. 3 compares the detection results before and after the improvement, obtained with the improved EAST-based character positioning algorithm. It is evident that the original EAST algorithm is deficient in long text detection: the rightmost rectangular box in the first row of characters does not cover the whole character region, and the final character "1" in the second row is only half covered. With the optimized algorithm of the present invention, the whole character region of the long text is well covered, and neither the head nor the tail of the character region is missed.
Finally, it is noted that the above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it. Although the present invention has been described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications and equivalent substitutions may be made without departing from the spirit and scope of the technical solution of the present invention, all of which are intended to be covered by the scope of the claims of the present invention.
Claims (1)
1. A part code-spraying character positioning method based on improved EAST, characterized in that: the method starts from the existing EAST network structure and improves the network structure, label generation, loss function and candidate region processing of the existing EAST network, obtaining an improved network structure, improved label generation, an improved loss function and candidate region post-processing, and positions characters using the improved EAST network;
improving the network structure of the existing EAST network specifically comprises:
constructing a VGG16 model, wherein the VGG16 model comprises 5 convolution layers and 5 pooling layers, the convolution kernel size is 3×3 and the stride is 1;
improving the VGG16 model by replacing the convolution operations of the last stage of the VGG16 network structure with hybrid dilated convolution (HDC) with dilation rates of [1, 2, 5];
the improved label generation specifically comprises: shrinking the quadrilateral formed by the character region of the target training image, treating the long and short sides differently;
the improved loss function specifically comprises: replacing the balanced cross-entropy loss function in the EAST algorithm with a dice loss function;
the improved candidate region processing specifically comprises: adopting a pre-sorting-based locality-aware NMS algorithm that sorts all candidate boxes according to their upper-left corner coordinates;
the improved label generation specifically comprises the following steps:
shrinking the quadrilateral formed by the character region of the target training image, treating the long side and the short side separately;
the short side of the quadrilateral is shrunk by 0.3 times the short-side length of the original label annotation;
the long side of the quadrilateral is shrunk by shrunk_rate times the long-side length of the original label annotation,
wherein shrunk_rate denotes the shrink factor of the long side, length_s denotes the length of the short side of the quadrilateral, and length_l denotes the length of the long side of the quadrilateral;
the dice loss function is determined by the following method:
L = L_s + λ_g·L_g = L_s + λ_g·(L_IoU + λ_θ·L_θ)  (2)
wherein L denotes the improved loss function, L_s denotes the score map loss, L_g denotes the geometry loss, λ_g denotes the coefficient of the geometry loss, L_IoU denotes the overlap-area loss, λ_θ denotes the coefficient of the angle loss, and L_θ denotes the angle loss;
the improved candidate region processing specifically comprises a pre-sorting-based locality-aware NMS algorithm, and the pre-sorting-based locality-aware NMS algorithm specifically comprises the following steps:
S1: initializing the set boxes of all candidate rectangular boxes, an overlap threshold λ and an oscillation threshold ω;
S2: extracting the upper-left corner coordinates (x1, y1) of all rectangular boxes in boxes, sorting them in ascending order of x1 and of y1 respectively, computing and comparing the oscillation counts of x1 and y1 according to ω, and arranging boxes in ascending order of oscillation count;
S3: letting s be the empty set and p be empty; for g ∈ boxes, where g denotes an element of the set boxes of all candidate rectangular boxes, i.e. a candidate rectangular box, processing all rectangular boxes in boxes in ascending order of oscillation count: if p is not empty and the overlap ratio of p and g is greater than λ, p becomes the rectangular box obtained by merging p and g; if p is not empty and the overlap ratio of p and g is not greater than λ, p is put into s and p = g; if p is empty, p = g;
S4: if p is not empty, putting p into s;
S5: outputting the set s, wherein s denotes the filtered set of rectangular boxes;
the improved candidate region processing further comprises candidate box merging, the candidate box merging being determined by the following method:
S(a) = η_g·S(g) + η_p·S(p)  (3)
a_i = η_g·S(g)·g_i + η_p·S(p)·p_i  (4)
wherein a is the merged rectangular box, g and p denote the two candidate rectangular boxes satisfying the overlap condition, i indexes the parameters of a rectangular box (the 4 vertices contribute 8 coordinates, plus 1 angle value), S denotes the score value of a rectangular box, η_g denotes the area coefficient of candidate rectangular box g, and η_p denotes the area coefficient of candidate rectangular box p;
the area coefficients η_g and η_p are set by the following method:
judging whether the area of candidate rectangular box g is larger than that of candidate rectangular box p; if so, η_g = 1.1 and η_p = 1; if not, η_g = 1 and η_p = 1.1.
Priority Applications (1)
- CN202011163480.5A (CN112364726B): priority date 2020-10-27, filing date 2020-10-27, title "Part code-spraying character positioning method based on improved EAST"
Publications (2)
- CN112364726A, published 2021-02-12
- CN112364726B, published 2024-06-04
Family
- ID=74510665
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant