CN112364726A

CN112364726A - Part code spraying character positioning method based on improved EAST

Info

Publication number: CN112364726A
Application number: CN202011163480.5A
Authority: CN
Inventors: 唐倩; 冯琪翔; 李代杨; 张志豪
Original assignee: Chongqing University
Current assignee: Chongqing University
Priority date: 2020-10-27
Filing date: 2020-10-27
Publication date: 2021-02-12
Anticipated expiration: 2040-10-27
Also published as: CN112364726B

Abstract

The invention provides a part code spraying character positioning method based on improved EAST. Starting from the existing EAST network structure, the method improves the network structure, label generation, loss function and candidate region processing of the existing EAST network, and positions characters using the improved EAST network. By optimizing the network structure, the label generation, the loss function and the candidate region post-processing of the existing EAST algorithm, the improved deep learning network effectively enlarges the receptive field, improves the recognition efficiency for long text, and achieves accurate positioning of part code spraying characters.

Description

Part code spraying character positioning method based on improved EAST

Technical Field

The invention relates to the technical field of character recognition, in particular to a part code spraying character positioning method based on improved EAST.

Background

With the development of information science and technology, image processing and machine vision technologies have been widely used in the industrial field. Character text detection, the most widely used machine vision technology, has long been both a focus and a difficulty of research. The first step of character text detection is to locate the regions of character text. The EAST algorithm, one of the best natural scene text localization algorithms, performs well on a number of public data sets. However, because of the limited size of its receptive field and the settings of its training process, the EAST algorithm is not ideal for character positioning, and its detection of long text needs to be improved.

Therefore, an end-to-end character recognition method with less dependency on character segmentation and high communication is needed.

Disclosure of Invention

In view of the above, the present invention provides a method for positioning a part code-spraying character based on improved EAST, which is characterized in that: the method is based on the existing EAST network structure, improves the network structure, label generation, loss function and candidate area processing of the existing EAST network, obtains the improved network structure, the improved label generation, the improved loss function and candidate area post-processing, and uses the improved EAST network to position characters;

the improved network structure of the existing EAST network specifically comprises:

constructing a VGG16 model, wherein the VGG16 model comprises 5 convolutional layers and 5 pooling layers, the convolution kernel size is 3 × 3, and the stride is 1;

improving the VGG16 model by replacing the convolution operation of the last stage in the VGG16 network structure with hybrid dilated (hole) convolution (HDC) with dilation rates [1, 2, 5];

the improved label generation specifically comprises: reducing the quadrilateral formed by the character region of the target training image while treating the long and short sides differently;

the improved loss function specifically comprises: replacing the balanced cross-entropy loss function in the EAST algorithm with a dice loss function;

the improved candidate region processing specifically comprises: using a locality-aware NMS algorithm based on pre-sorting, which sorts all candidate boxes according to the values of their upper-left corner coordinates.

Further, the improved label generation specifically comprises the following steps:

reducing the quadrilateral formed by the character region of the target training image while treating the long and short sides differently;

the short side of the quadrilateral is reduced by a factor of 0.3 relative to the length of the short side in the original label annotation;

the long side of the quadrilateral is reduced by a factor of scrub_rate relative to the length of the long side in the original label annotation:

scrub_rate＝0.3×length_s/length_l (1)

where scrub_rate represents the reduction factor of the long side, length_s represents the length of the short side of the quadrilateral, and length_l represents the length of the long side of the quadrilateral.

Further, the dice loss function is determined by the following method:

L＝L_s+λ_gL_g＝L_s+λ_g(L_IoU+λ_θL_θ) (2)

where L represents the modified dice loss function, L_s represents the score map loss, L_g represents the geometric loss, λ_g represents the coefficient of the geometric loss, L_IoU represents the overlap-area loss, λ_θ represents the coefficient of the angle loss, and L_θ represents the angle loss.

Further, the improved candidate region processing specifically includes a locality-aware NMS algorithm based on pre-sorting, which specifically includes the following steps:

S1: initializing the set boxes of all candidate rectangular boxes, an overlap threshold λ and an abrupt-change threshold ω;

S2: extracting the upper-left corner coordinates (x₁, y₁) of all rectangular boxes in boxes; sorting in ascending order of x₁ and of y₁ respectively; using ω, calculating and comparing the numbers of abrupt changes of x₁ and of y₁ respectively; and arranging boxes in ascending order of the number of abrupt changes;

S3: setting S to an empty set and p to empty; for each g belonging to boxes, where g represents an element of the set boxes of all candidate rectangular boxes, i.e., a candidate rectangular box, performing the following in ascending order of the number of abrupt changes of the rectangular boxes in boxes: if p is not empty and the overlap area ratio of p and g is greater than λ, p becomes the rectangular box formed by merging p and g; if p is not empty and the overlap area ratio of p and g is not greater than λ, p is put into S; if p is empty, p is set to g;

S4: if p is not empty, putting p into S;

S5: outputting the set S, where S represents the filtered set of rectangular boxes.

Further, the improved candidate region processing further includes candidate box merging, where the merged candidate box is determined by the following method:

S(a)＝η_gS(g)+η_pS(p) (3)

a_i＝η_gS(g)g_i+η_pS(p)p_i (4)

where a is the merged rectangular box; g and p denote candidate rectangular boxes that satisfy the overlap condition; i denotes the parameter index within a rectangular box, i = 1, 2, 3, …, 9, covering the 8 coordinates of the 4 vertices and 1 angle value; S denotes the score of a rectangular box; η_g denotes the area coefficient of candidate rectangular box g; and η_p denotes the area coefficient of candidate rectangular box p;

the area coefficients η_g and η_p are set as follows:

determining whether the area of candidate rectangular box g is larger than that of candidate rectangular box p; if so, η_g＝1.1 and η_p＝1; if not, η_g＝1 and η_p＝1.1.

The beneficial technical effects of the invention are as follows: by optimizing the network structure, the label generation, the loss function and the candidate region post-processing of the existing EAST algorithm, the improved deep learning network effectively enlarges the receptive field, improves the recognition efficiency for long text, and achieves accurate positioning of part code spraying characters.

Drawings

The invention is further described below with reference to the following figures and examples:

FIG. 1 is a schematic diagram of the network structure of the improved EAST algorithm proposed by the present invention.

FIG. 2 is a schematic diagram of the tag generation process of the present invention.

FIG. 3 is a comparison graph of the detection effect of the present invention and the detection effect of the original algorithm.

Detailed Description

The invention is further described with reference to the accompanying drawings in which:

the invention provides a part code spraying character positioning method based on improved EAST, which is characterized by comprising the following steps: the method is based on the existing EAST network structure, improves the network structure, label generation, loss function and candidate area processing of the existing EAST network, obtains the improved network structure, the improved label generation, the improved loss function and candidate area post-processing, and uses the improved EAST network to position characters;

In a convolutional neural network, the size of the receptive field is determined by parameters such as the convolution kernel size and the convolution stride. In the standard VGG16 model, the receptive field is enlarged by stacking convolutional layers and by pooling. However, enlargement by stacking is limited by the number of convolutional layers, so its effect is very limited; enlarging the convolution kernel leads to too many parameters; and enlarging the receptive field through pooling layers causes information loss. These factors limit the receptive field of the VGG16 network.

Dilated (hole) convolution keeps the kernel size unchanged but inserts holes to enlarge the overall convolution matrix and the region it covers, so the receptive field grows without additional parameters; its drawbacks are the gridding effect and the inability to detect small targets. A hybrid dilated convolution scheme has been proposed to address this, which holds that these problems can be avoided as long as the dilation rates chosen when building the neural network meet three requirements. Following this idea, the convolution operation in the last stage of VGG16 (Conv_5) is replaced: the original 3 consecutive standard convolutions are replaced by hybrid dilated convolution (HDC) with dilation rates [1, 2, 5], which expands the receptive field of this part from the original 7 × 7 to 17 × 17 and yields a VGG16 network with a larger receptive field. The new network structure is shown in FIG. 1, where the dark blue part is the improved part of the VGG16 network, MaxPooling denotes the max pooling operation, Up Sampling denotes the upsampling operation, and Concat denotes feature map concatenation.
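The 7 × 7 and 17 × 17 figures can be checked with a short calculation. The sketch below assumes stride-1 convolutions with 3 × 3 kernels stacked in sequence; the function name `stacked_receptive_field` is illustrative and does not come from the patent.

```python
def stacked_receptive_field(kernel_size, dilations):
    """Receptive field (one side, in pixels) of stride-1 convolutions stacked in sequence."""
    rf = 1
    for d in dilations:
        # A k x k kernel with dilation d spans d * (k - 1) + 1 pixels,
        # so each stride-1 layer widens the receptive field by d * (k - 1).
        rf += d * (kernel_size - 1)
    return rf


# Three consecutive standard 3 x 3 convolutions (dilation 1 each).
standard_rf = stacked_receptive_field(3, [1, 1, 1])
# The same three layers with dilation rates [1, 2, 5].
hdc_rf = stacked_receptive_field(3, [1, 2, 5])
print(standard_rf, hdc_rf)
```

The number of kernel weights is the same in both cases, since only the spacing between the sampled positions changes.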

With this technical scheme, the network structure, the label generation, the loss function and the candidate region post-processing are optimized on the basis of the existing EAST algorithm; the improved deep learning network effectively enlarges the receptive field, improves the recognition efficiency for long text, and achieves accurate positioning of part code spraying characters.

In this embodiment, the improved label generation specifically includes the following steps:

In the network training process, the character region labels of all training images are quadrilaterals, and the coordinates of the four points are expressed in the format (x₁, y₁, x₂, y₂, x₃, y₃, x₄, y₄), where x and y are the abscissa and ordinate of a point. The subscripts 1, 2, 3 and 4 denote the upper-left, upper-right, lower-right and lower-left corner points respectively, i.e., clockwise order. In EAST training, labels are generated in RBOX form, namely 5 channels {d₁, d₂, d₃, d₄, θ}, where d_i are the distances from a point inside the training label to the four sides of the label (text boxes in FIG. 1) and θ is the rotation angle (angle in FIG. 1). One further channel records the likelihood score (score map in FIG. 1) that each point in the image is a character point.

To reduce interference from labeling errors, the score map in the EAST algorithm is reduced by a factor of 0.3 relative to the minimum bounding rectangle of the original label annotation. The whole label generation process is shown in FIG. 2. Reducing the label rectangle reduces interference from the character edge region and improves character positioning accuracy. However, as FIG. 2 shows, because the long side of a character region is large, reducing it by a factor of 0.3 can leave a very large part of the character region outside the score map, and a network trained this way positions long character strings poorly. To address this problem, the label generation of the EAST algorithm is improved by treating the long and short sides differently: the short side of the label rectangle is still reduced by a factor of 0.3, while the reduction factor of the long side is 0.3 times the ratio of the short-side length to the long-side length of the rectangle.
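The effect of treating the two sides differently can be illustrated numerically. The following is a minimal sketch under stated assumptions: it only computes how much length each side loses, not the shrunken vertex coordinates, and the names `SHORT_SIDE_RATE`, `long_side_rate` and `side_reductions` are hypothetical.

```python
SHORT_SIDE_RATE = 0.3  # reduction factor of the short side, as stated in the text


def long_side_rate(length_s, length_l):
    """Long-side reduction factor: 0.3 times the short-to-long side ratio."""
    return SHORT_SIDE_RATE * length_s / length_l


def side_reductions(length_s, length_l):
    """Return (short_cut, long_cut_uniform, long_cut_improved) in pixels."""
    short_cut = SHORT_SIDE_RATE * length_s
    # Uniform rule described in the text: the long side is also reduced by a factor of 0.3.
    long_cut_uniform = SHORT_SIDE_RATE * length_l
    # Improved rule: the long side is reduced by the smaller, ratio-dependent factor.
    long_cut_improved = long_side_rate(length_s, length_l) * length_l
    return short_cut, long_cut_uniform, long_cut_improved


# Example: a 20 x 200 pixel label rectangle, i.e. a long text line.
short_cut, long_cut_uniform, long_cut_improved = side_reductions(20.0, 200.0)
print(short_cut, long_cut_uniform, long_cut_improved)
```

For this example rectangle the uniform rule removes about 60 pixels from the long side, while the improved rule removes about 6 pixels, the same absolute amount as on the short side, so far more of a long text line stays inside the score map.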

In this embodiment, the dice loss function is determined by the following method:

L＝L_s+λ_gL_g＝L_s+λ_g(L_IoU+λ_θL_θ) (2)

In neural network training, the loss function is the optimization target of the whole training process, and how well it is designed directly affects the training result of the network and the detection performance of the final parameters. For the FCN network proposed by the present invention, the loss function L consists mainly of two parts, the score map loss L_s and the geometric loss L_g, where the geometric loss L_g in turn consists of the overlap-area loss L_IoU and the angle loss L_θ. The score map loss L_s is essentially a binary classification loss: character regions are positive samples and all other regions are negative samples. The positive and negative samples are highly imbalanced, because the area of the character regions is far smaller than that of the other regions. The EAST algorithm uses a balanced cross-entropy loss function to address the sample imbalance, but recent research has found that dice loss handles sample imbalance better. The method therefore uses dice loss as the score map loss, which handles the sample imbalance problem better than the EAST algorithm.
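As an illustration of the structure of equation (2), the sketch below uses a common formulation of dice loss on flattened score maps. The patent text does not give the exact dice expression, so the formula in `dice_loss`, the smoothing term `eps`, and the coefficient values in the example are assumptions, and `total_loss` simply combines already-computed loss terms.

```python
def dice_loss(pred, target, eps=1e-6):
    """Common dice loss form: 1 - 2*|P∩G| / (|P| + |G|). Assumed, not quoted from the patent."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # eps avoids division by zero when both maps are empty.
    return 1.0 - 2.0 * intersection / (total + eps)


def total_loss(l_s, l_iou, l_theta, lambda_g, lambda_theta):
    """Equation (2): L = L_s + lambda_g * (L_IoU + lambda_theta * L_theta)."""
    return l_s + lambda_g * (l_iou + lambda_theta * l_theta)


# Flattened score maps: 1 marks a character pixel, 0 marks background.
target = [1.0, 0.0, 0.0, 1.0]
perfect = dice_loss([1.0, 0.0, 0.0, 1.0], target)   # close to 0
all_miss = dice_loss([0.0, 0.0, 0.0, 0.0], target)  # close to 1

# Coefficient values here are placeholders chosen only for the example.
combined = total_loss(l_s=0.5, l_iou=0.2, l_theta=0.01, lambda_g=1.0, lambda_theta=10.0)
print(perfect, all_miss, combined)
```

Because the dice term is a ratio of overlap to total predicted and labeled area, the large number of background pixels does not dominate it, which is the property the text relies on for imbalanced samples.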

In this embodiment, the improved candidate region processing specifically includes a locality-aware NMS algorithm based on pre-sorting, whose steps are as follows:

S1: initializing the set boxes of all candidate rectangular boxes, an overlap threshold λ and an abrupt-change threshold ω;

S2: extracting the upper-left corner coordinates (x₁, y₁) of all rectangular boxes in boxes; sorting in ascending order of x₁ and of y₁ respectively; using ω, calculating and comparing the numbers of abrupt changes of x₁ and of y₁ respectively; and arranging boxes in ascending order of the number of abrupt changes;

S3: setting S to an empty set and p to empty; for each g belonging to boxes, where g represents an element of the set boxes of all candidate rectangular boxes, i.e., a candidate rectangular box, performing the following in ascending order of the number of abrupt changes of the rectangular boxes in boxes: if p is not empty and the overlap area ratio of p and g is greater than λ, p becomes the rectangular box formed by merging p and g; if p is not empty and the overlap area ratio of p and g is not greater than λ, p is put into S; if p is empty, p is set to g;

S4: if p is not empty, putting p into S;

S5: outputting the set S, where S represents the filtered set of rectangular boxes.

In general, when character region detection is performed using the network, the FCN may extract hundreds or even thousands of candidate boxes. Most of them overlap, so merging this large number of candidate boxes is the last step of the whole character region positioning algorithm. Earlier candidate box merging typically used non-maximum suppression (the NMS algorithm), but its time complexity is O(n²), which is very time-consuming. For this reason, the EAST algorithm uses a locality-aware NMS algorithm. The algorithm only compares adjacent candidate regions in sequence: if the overlap is greater than a threshold the two are merged, and if it is smaller than the threshold both are kept. The time complexity depends on the arrangement of the candidate regions: the best case is O(n) (the candidate boxes that should be merged are arranged together), and the worst case is O(n²) (non-overlapping candidate boxes alternate). Moreover, for two candidate boxes that satisfy the overlap condition, the algorithm does not discard one and keep the other as standard NMS does; instead, it merges the two according to their scores.

It can be seen that the time complexity of the locality-aware NMS algorithm depends on the arbitrary arrangement of the candidate boxes. To address this problem, the invention improves the locality-aware NMS algorithm and proposes a locality-aware NMS algorithm based on pre-sorting, which sorts all candidate boxes by the values of their upper-left corner coordinates so that candidate boxes likely to be merged are arranged together, bringing the time complexity as close to O(n) as possible. Arranging candidate boxes that are likely to be merged together also helps improve merging accuracy and thus the final text positioning result.
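The benefit of pre-sorting can be sketched as follows. This is an illustrative reading, not the patented implementation: boxes are simplified to axis-aligned tuples `(x1, y1, x2, y2, score)`, the merge is a plain score-weighted average, and the pre-sort interprets the "number of abrupt changes" of a box as the number of gaps larger than ω that precede it when the coordinate is sorted. All function names are hypothetical. The sketch also sets p to g after p is put into S so that the scan continues.

```python
def iou(a, b):
    """Overlap ratio (intersection over union) of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_merge(g, p):
    """Score-weighted average of coordinates; the merged score is the sum of scores."""
    s = g[4] + p[4]
    return tuple((g[i] * g[4] + p[i] * p[4]) / s for i in range(4)) + (s,)


def jump_counts(values, omega):
    """For each value, count the gaps larger than omega that precede it in sorted order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    counts = [0] * len(values)
    jumps = 0
    for prev, cur in zip(order, order[1:]):
        if values[cur] - values[prev] > omega:
            jumps += 1
        counts[cur] = jumps
    return counts


def presort(boxes, omega):
    """Order boxes by their jump counts in y1, then in x1 (assumed reading of step S2)."""
    cx = jump_counts([b[0] for b in boxes], omega)
    cy = jump_counts([b[1] for b in boxes], omega)
    order = sorted(range(len(boxes)), key=lambda i: (cy[i], cx[i]))
    return [boxes[i] for i in order]


def lanms(ordered_boxes, overlap_thresh):
    """Single pass over the boxes, merging each box with the running box p when they overlap."""
    kept, p = [], None
    for g in ordered_boxes:
        if p is not None and iou(p, g) > overlap_thresh:
            p = weighted_merge(g, p)
        else:
            if p is not None:
                kept.append(p)
            p = g
    if p is not None:
        kept.append(p)
    return kept


# Two text regions whose candidate boxes arrive interleaved.
boxes = [(0, 0, 10, 10, 0.9), (100, 0, 110, 10, 0.8),
         (1, 0, 11, 10, 0.7), (101, 0, 111, 10, 0.6)]
unsorted_result = lanms(boxes, 0.5)
presorted_result = lanms(presort(boxes, 5.0), 0.5)
print(len(unsorted_result), len(presorted_result))
```

With the interleaved input, the single pass never sees two overlapping boxes next to each other and leaves all four unmerged, whereas after pre-sorting the overlapping boxes are adjacent and are merged into two boxes in one pass.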

In this embodiment, the improved candidate region processing further includes candidate box merging, where the merged candidate box is determined by the following method:

S(a)＝η_gS(g)+η_pS(p) (3)

a_i＝η_gS(g)g_i+η_pS(p)p_i (4)

the area coefficients η_g and η_p are set as follows:

In addition, when the original locality-aware NMS algorithm merges candidate rectangular boxes, it considers only their scores. For long text, however, the area of a candidate box is also important, because a candidate box with a larger area is more likely to cover the whole text region. Therefore, an area coefficient η is introduced into candidate box merging: the η of whichever of g and p has the larger area is set to 1.05, and the other is set to 1.
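Equations (3) and (4) with the area coefficient can be sketched as below. Assumptions are labeled: each box is a list of 9 parameters (8 vertex coordinates followed by 1 angle), the area is computed from the vertices with the shoelace formula, and the weighted sum of equation (4) is divided by S(a) so that the result is again a set of coordinates, a normalization that the text does not state. The names `quad_area` and `merge_boxes` are hypothetical, and `eta_large` is passed in rather than fixed.

```python
def quad_area(params):
    """Area of the quadrilateral given by the first 8 parameters (shoelace formula)."""
    xs, ys = params[0:8:2], params[1:8:2]
    twice_area = sum(xs[i] * ys[(i + 1) % 4] - xs[(i + 1) % 4] * ys[i] for i in range(4))
    return abs(twice_area) / 2.0


def merge_boxes(g, s_g, p, s_p, eta_large):
    """Merge boxes g and p with scores s_g and s_p, favoring the larger-area box."""
    if quad_area(g) > quad_area(p):
        eta_g, eta_p = eta_large, 1.0
    else:
        eta_g, eta_p = 1.0, eta_large
    # Equation (3): score of the merged box.
    s_a = eta_g * s_g + eta_p * s_p
    # Equation (4): weighted sum of each of the 9 parameters.
    weighted = [eta_g * s_g * gi + eta_p * s_p * pi for gi, pi in zip(g, p)]
    # Assumed normalization so the merged parameters stay in coordinate units.
    a = [v / s_a for v in weighted]
    return a, s_a


# g is the wider box (area 20), p the narrower one (area 16); equal scores.
g = [0.0, 0.0, 10.0, 0.0, 10.0, 2.0, 0.0, 2.0, 0.0]
p = [0.0, 0.0, 8.0, 0.0, 8.0, 2.0, 0.0, 2.0, 0.0]
a, s_a = merge_boxes(g, 0.5, p, 0.5, eta_large=1.05)
print(a[2], s_a)
```

With equal scores, a plain score-weighted average would place the right edge at 9.0; the area coefficient pulls it slightly toward the larger box, which is the intended bias for long text.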

The improved EAST-based character positioning algorithm was implemented, and the detection results before and after the improvement are compared in FIG. 3. The original EAST algorithm clearly falls short on long text: the rightmost rectangle of the first line of characters does not cover the whole character region, and the last "1" of the second line of characters is only half covered. When the optimized algorithm of the invention is used on the long text, the whole character region is well covered, and neither the head nor the tail of the character region is missed.

Finally, the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, and all of them should be covered in the claims of the present invention.

Claims

1. A method for positioning part code spraying characters based on improved EAST is characterized in that: the method is based on the existing EAST network structure, improves the network structure, label generation, loss function and candidate area processing of the existing EAST network, obtains the improved network structure, the improved label generation, the improved loss function and candidate area post-processing, and uses the improved EAST network to position characters;

the improved loss function specifically comprises: replacing the balanced cross-entropy loss function in the EAST algorithm with a dice loss function;

2. The method for improved EAST-based part inkjet character positioning as recited in claim 1, wherein: the improved label generation specifically comprises the following steps:

3. The method for improved EAST-based part inkjet character positioning as recited in claim 1, wherein: the dice loss function is determined by the following method:

L＝L_s+λ_gL_g＝L_s+λ_g(L_IoU+λ_θL_θ) (2)

4. The method for improved EAST-based part inkjet character positioning as claimed in claim 3, wherein: the improved candidate region processing specifically includes a locality-aware NMS algorithm based on pre-sorting, which specifically includes the following steps:

S2: extracting the upper-left corner coordinates (x₁, y₁) of all rectangular boxes in boxes; sorting in ascending order of x₁ and of y₁ respectively; using ω, calculating and comparing the numbers of abrupt changes of x₁ and of y₁ respectively; and arranging boxes in ascending order of the number of abrupt changes;

S4: if p is not empty, putting p into S;

5. The method for improved EAST-based part inkjet character positioning as recited in claim 4, wherein: the processing for improving the candidate region further comprises candidate frame merging, wherein the candidate frame merging is determined by adopting the following method:

S(a)＝η_gS(g)+η_pS(p) (3)

a_i＝η_gS(g)g_i+η_pS(p)p_i (4)

the area coefficients η_g and η_p are set as follows: