CN111274863A - Text prediction method based on text peak probability density - Google Patents


Info

Publication number
CN111274863A
CN111274863A
Authority
CN
China
Prior art keywords
text
predicted
mask
center
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911237840.9A
Other languages
Chinese (zh)
Other versions
CN111274863B (en)
Inventor
张发恩
孙天齐
袁智超
陆强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Innovation Qizhi (Chengdu) Technology Co Ltd
Original Assignee
Innovation Qizhi (Chengdu) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Innovation Qizhi (Chengdu) Technology Co Ltd
Priority to CN201911237840.9A
Publication of CN111274863A
Application granted
Publication of CN111274863B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Operations Research (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Algebra (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a text localization method based on text peak probability density, which comprises: preprocessing the data and generating the required text mask and text peak probability map from the existing annotation information of a public dataset; applying data augmentation to the processed training set, feeding the data into a convolutional neural network for training, and continuously optimizing the network parameters through a combined loss function to obtain an optimal model; and, at prediction time, first loading the trained model parameters, passing the images to be predicted into the deep learning network to obtain a text prediction result, and post-processing the prediction information to obtain an accurate text localization result. In this technical solution, the neural network additionally learns the probability information of the text center; combined with a post-processing algorithm that computes the segmentation centroids of the text prediction map, the centerline of curved text can be obtained and from it the curved text boundary, so that long, multi-oriented, and curved text is handled well.

Description

Text prediction method based on text peak probability density
Technical Field
The invention relates to the technical field of image processing, and in particular to a text prediction method based on text peak probability density.
Background
Text is visible everywhere in daily life. With the wide adoption of intelligent terminals and digital products, people can photograph scenes of interest anytime and anywhere. Such scenes contain a large amount of useful text information, for example road signs, shop names, and advertising slogans in natural scenes, and this text information can provide decision input for intelligent transportation, autonomous driving, data intelligence, and the like.
In industrial scenes, laser-engraved, printed, and spot-welded numbers on parts need to be entered into a system by workers so that products can be traced back to the exact production step, but human error can occur when workers read the codes by eye, record them, and type them into a computer.
Building an intelligent, fast, and efficient way of life is the current trend. In different scenes the background is complex and varied, illumination and occlusion in the surrounding environment affect text localization, and differing scripts and fonts also interfere strongly with a text localization algorithm; a robust text localization algorithm is therefore the foundation of text recognition.
Existing text localization methods mainly fall into the following two types:
The first, based on a segmentation idea, divides the whole image into patches, sends each patch to a trained text detector to judge whether it contains text, then merges the patches containing text into an image that is sent to a text detection model for prediction, yielding the text localization information.
The second, based on a deep learning idea, extracts features from the image, proposes at least one candidate text region with a region proposal network, refines the candidate regions with a convolutional neural network, and finally merges the candidate regions to obtain the final text region position.
The first method performs feature extraction and image merging in stages with two models; it is not end-to-end, localization is slow, and the patch segmentation easily splits characters apart, which easily causes classification errors. The second method has difficulty separating adjacent text and curved text, cannot effectively localize characters of different sizes in a digital image, cannot recognize the characters, and localizes text poorly against complex backgrounds.
Disclosure of Invention
The invention provides a text localization method based on text peak probability density, designs a multi-scale text detection network, and solves for the centerline by a segmentation-centroid method so as to obtain an accurate prediction result.
A text positioning method based on text peak probability density comprises the following four steps:
S1) image preprocessing: perform data preprocessing and generate the required text mask and text peak probability map from the existing annotation information of a public dataset;
S2) data augmentation: apply data augmentation to the processed training set, including at least one random transformation among brightness, contrast, hue, saturation, rotation, mirroring, cropping, and scaling;
S3) network training: feed the data into a convolutional neural network for training, and continuously optimize the network parameters through a combined loss function to obtain an optimal model;
S4) prediction process: first load the trained model parameters, pass the images to be predicted into the deep learning network to obtain a text prediction result, and post-process the prediction information to obtain an accurate text localization result.
In the above technical solution, step S1) specifically includes:
S11) data preprocessing: set the feature map of the text-line center area to consist of gray levels 0-255, where the closer a pixel is to the centerline of the text line, the larger its value: the text center is whiter, the edges are blacker, and a gradient effect is presented;
S12) text mask generation: generate the required text mask and text-line center mask from the existing annotation information of a public dataset, where the text mask is a rectangular text mask generated from the text-block localization information in the dataset, generally the coordinates of the 4 corner points of a text block;
S13) text peak probability map: obtain the centerline of a text block from its 4 annotated corner-point coordinates; the centerline is the summit of the peak with value 255, the values decrease gradually from the centerline to the edge of the text block, and the edge value is 0, yielding the text peak probability map.
In the above technical solution, the basic modules of the convolutional neural network in step S3) include a residual network module, a concat module, an upsampling module, and the like.
In the foregoing technical solution, the post-processing of the prediction information in step S4) is specifically:
S41) first compute the connected components of the predicted text mask and text peak probability map and eliminate the smaller connected components;
S42) compute the coordinates of the four corner points of the predicted text box, divide the text box into blocks, and compute the centroid coordinates of each block from the prediction information;
S43) fit a circle center and radius to all the centroids of a text box by the least squares method; predicted text boxes whose longest side is less than 2 times the short side are not fitted; offset the line connecting the centroids to obtain accurate text prediction information; apply a polar coordinate transformation to curved text boxes to finally obtain horizontal text; and crop the text directly from predicted boxes that do not meet the segmentation condition.
The beneficial effects of the invention are:
1. Unlike traditional text detection or segmentation approaches, the method adds learning of text-center probability information to the neural network. The text peak probability map is shaped like a mountain (the summit is the center, the foot is the boundary), and the summits make it easier to separate text instances (via semantic segmentation). Although a text mask exists, its boundary between edge and center is not accurate enough; making full use of the text peak probability map helps the network distinguish text localization information well. Combined with a post-processing algorithm that computes the segmentation centroids of the text prediction map, the centerline of curved text can be obtained and from it the curved text boundary, so long, multi-oriented, and curved text is handled well.
2. The conventional approach of learning text-region coordinates is abandoned; a text mask and a text peak probability map are generated from the annotated data so that the convolutional neural network can learn the mask and peak probabilities simultaneously.
3. A multi-scale text detection network is designed.
4. High-pass filtering is applied to the predicted probability map to suppress low-probability pixels and further separate adjacent text, and a connected-component method eliminates feature regions with few elements, reducing prediction errors.
5. The centerline is solved by a segmentation-centroid method so as to obtain an accurate prediction result.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of a convolutional neural network architecture design of a text localization method based on text peak probability density.
Fig. 2 is a schematic diagram of a training process of a text positioning method based on text peak probability density according to the present invention.
Fig. 3 is a schematic diagram of a prediction flow of a text positioning method based on text peak probability density according to the present invention.
FIG. 4 is a schematic diagram of the post-processing steps of the prediction information of the text positioning method based on the text peak probability density.
FIG. 5 is a schematic diagram of centroid calculation of a text localization method based on text peak probability density according to the present invention.
Fig. 6 shows a picture to be predicted according to an embodiment of the method of the present invention.
FIG. 7 is a text mask predicted by the embodiment of FIG. 6.
FIG. 8 is a graph of the text peak probability predicted by the embodiment of FIG. 6.
FIG. 9 is a diagram of the embodiment of FIG. 6 calculating the centroid of a text mask within a text block to obtain a text centerline.
Fig. 10 is an image obtained after the embodiment in fig. 6 is fitted.
Fig. 11 is a text prediction result image obtained by the embodiment in fig. 6.
FIG. 12 is a final image of the warped text of FIG. 6 after correction and segmentation.
Detailed Description
The technical solution of the invention is further described below with reference to the accompanying drawings:
a text positioning method based on text peak probability density comprises the following four steps:
S1) image preprocessing: perform data preprocessing and generate the required text mask and text peak probability map from the existing annotation information of a public dataset;
S2) data augmentation: apply data augmentation to the processed training set, including at least one random transformation among brightness, contrast, hue, saturation, rotation, mirroring, cropping, and scaling;
S3) network training: feed the data into a convolutional neural network for training, and continuously optimize the network parameters through a combined loss function to obtain an optimal model;
S4) prediction process: first load the trained model parameters, pass the images to be predicted into the deep learning network to obtain a text prediction result, and post-process the prediction information to obtain an accurate text localization result.
In the above technical solution, step S1) specifically includes:
S11) data preprocessing: set the feature map of the text-line center area to consist of gray levels 0-255, where the closer a pixel is to the centerline of the text line, the larger its value: the text center is whiter, the edges are blacker, and a gradient effect is presented;
S12) text mask generation: generate the required text mask and text-line center mask from the existing annotation information of a public dataset, where the text mask is a rectangular text mask generated from the text-block localization information in the dataset, generally the coordinates of the 4 corner points of a text block;
S13) text peak probability map: obtain the centerline of a text block from its 4 annotated corner-point coordinates; the centerline is the summit of the peak with value 255, the values decrease gradually from the centerline to the edge of the text block, and the edge value is 0, yielding the text peak probability map.
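The gradient described in steps S11) and S13) can be sketched as follows, under the simplifying assumption of an axis-aligned text box whose centerline runs horizontally (the patent works from 4 arbitrary annotated corner points; the function name is illustrative):

```python
import numpy as np

def peak_probability_map(h, w):
    """Text peak probability map for an axis-aligned h x w text box:
    255 on the horizontal centerline, falling linearly to 0 at the
    top and bottom edges, as described in step S13)."""
    ys = np.arange(h, dtype=float)
    center = (h - 1) / 2.0
    denom = max(center, 1.0)  # guard against degenerate 1-pixel-high boxes
    col = 255.0 * (1.0 - np.abs(ys - center) / denom)
    # Every column shares the same vertical gradient.
    return np.tile(col[:, None], (1, w)).astype(np.uint8)
```

For a 5-pixel-high box this yields the column profile [0, 127, 255, 127, 0] repeated across the width, i.e. the "mountain" whose summit is the centerline and whose foot is the text edge.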
In the above technical solution, the basic modules of the convolutional neural network in step S3) include a residual network module, a concat module, an upsampling module, and the like; the network architecture is designed as shown in FIG. 1, where Conv2d_block is one convolution layer followed by a BN layer and a ReLU layer, Res_block is a residual module, upsample is an upsampling layer, Conv2d is a convolutional layer, h is the image height, and w is the image width. The network structure of the invention markedly improves multi-scale text detection and the separation of adjacent text lines in an image.
In the above technical solution, the network training process of step S3) is shown schematically in FIG. 2.
In the above technical solution, the prediction process of step S4) is shown schematically in FIG. 3.
In the above technical solution, the post-processing step of the prediction information in step S4) is as shown in fig. 4, and specifically includes:
s41) firstly, calculating the connected domain of the predicted text mask and the text peak probability graph and eliminating the smaller connected domain;
s42) calculating coordinates of four corner points of the predicted text box, dividing the text box, and calculating mass center coordinates of each divided block by using the prediction information; the principle is shown in fig. 5, wherein the small circle is a local centroid coordinate calculated by a text prediction mask of each segment;
s43) fitting the circle center and the radius of all centroids calculated by the text box by using a least square method, carrying out non-calculation on the longest edge of the predicted text box which is less than 2 times of the broadside, carrying out deviation on a centroid connecting line to obtain accurate text prediction information, carrying out polar coordinate transformation on a curved text box to finally obtain a horizontal text, and directly carrying out text cutting on the predicted box which does not accord with the segmentation condition.
The following describes in detail the implementation steps of the text localization method based on the text peak probability density according to the present invention with specific examples according to the attached drawings.
As shown in FIG. 2, the image preprocessing, data augmentation, and network training steps of the method are as follows:
first, preprocess the public dataset: decode the annotation information, generate the text mask and the text-center peak probability map, place the original images and the two preprocessed images into three folders respectively, and build the training and test sets;
apply data augmentation to the three kinds of images of the training set in batches, enlarging the training set and increasing data density to avoid overfitting;
construct a convolutional neural network;
establish a combined loss function: use the L1 distance to constrain the predicted text mask (tp) and text peak probability (tcp), and sum the two losses into a comprehensive loss function, Loss = tp_loss + λ·tcp_loss; through extensive experiments λ is assigned the value 5, with which the loss function decreases fastest;
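The combined loss above can be sketched in NumPy as follows (a minimal sketch: tp and tcp are treated here as dense prediction maps, and reducing the L1 distance by a per-pixel mean is an assumption the patent does not spell out):

```python
import numpy as np

def combined_loss(tp_pred, tp_gt, tcp_pred, tcp_gt, lam=5.0):
    """Loss = tp_loss + lambda * tcp_loss, with the L1 distance for both
    the text-mask branch (tp) and the text-peak-probability branch (tcp).
    lambda = 5 is the value the patent reports as converging fastest."""
    tp_loss = np.abs(tp_pred - tp_gt).mean()
    tcp_loss = np.abs(tcp_pred - tcp_gt).mean()
    return tp_loss + lam * tcp_loss
```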
train the model and save the optimal model parameters obtained in each training iteration.
As shown in FIG. 3, the prediction process of the present invention is described as follows:
load the optimal model parameters;
run the model prediction: scale the image to be predicted, shown in FIG. 6, and pass it into the model;
predict the text mask and the text peak probability map, as shown in FIG. 7 and FIG. 8; eliminate low-probability text regions by threshold filtering, and apply Gaussian filtering to the predicted information to smooth the junctions;
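The threshold filtering of this step can be sketched as follows (the threshold value is an assumption, as the patent does not specify one, and the Gaussian smoothing that follows it is omitted here):

```python
import numpy as np

def suppress_low_probability(prob_map, threshold=0.5):
    """Zero out pixels whose predicted probability falls below the
    threshold, so that weak responses between adjacent text lines do
    not merge separate text instances."""
    out = np.asarray(prob_map, dtype=float).copy()
    out[out < threshold] = 0.0
    return out
```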
generate connected components, eliminate the smaller ones, label the connected components numerically, traverse them, and compute the minimum bounding rectangle of each connected component to obtain the 4 coordinates of the text box;
after obtaining the text bounding rectangle, divide the rectangle into blocks along the direction perpendicular to its longest side;
compute the centroid of the text mask within each text block; computing and connecting the centroids of all the regions into which the rectangle is divided yields the text centerline, as shown in FIG. 9;
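The per-block centroid computation can be sketched as follows, assuming the long side of the rectangle is horizontal so that the blocks are vertical strips (the block count and function name are illustrative):

```python
import numpy as np

def block_centroids(mask, n_blocks):
    """Split a binary text mask into n_blocks vertical strips and return
    the centroid (x, y) of the foreground pixels in each strip; connecting
    these centroids approximates the text centerline."""
    h, w = mask.shape
    centroids = []
    for i in range(n_blocks):
        x0, x1 = i * w // n_blocks, (i + 1) * w // n_blocks
        ys, xs = np.nonzero(mask[:, x0:x1])
        if xs.size == 0:  # skip strips containing no text pixels
            continue
        centroids.append((xs.mean() + x0, ys.mean()))
    return centroids
```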
obtain a plurality of centroid points within the rectangular box and fit a curvature and radius to them by the least squares method; set a threshold, and when the fitted radius is larger than the width or height of the image, no least-squares fitting is applied to that text box; the fitted circle center is marked with a circular dot, as shown in FIG. 10;
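The least-squares fit of a circle center and radius to the centroid points can be sketched with the classic Kåsa linearization (the patent does not say which least-squares formulation it uses, so this is one reasonable choice):

```python
import numpy as np

def fit_circle(points):
    """Kasa least-squares circle fit: x^2 + y^2 = 2*a*x + 2*b*y + c is
    linear in (a, b, c), where (a, b) is the center and c = r^2 - a^2 - b^2."""
    pts = np.asarray(points, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    A = np.column_stack([2 * x, 2 * y, np.ones(len(pts))])
    rhs = x ** 2 + y ** 2
    (cx, cy, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return (cx, cy), np.sqrt(c + cx ** 2 + cy ** 2)
```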
shift the fitted text centerline to both sides along the perpendicular of the rectangle's long side, by a distance of half the rectangle width, and connect the two ends to complete the detection of the curved text;
according to the curvature and radius of the detected curved text, apply a polar coordinate transformation to it; for non-curved text, crop directly with the minimum bounding rectangle;
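The polar-coordinate rectification of a curved text band can be sketched as follows, sampling the image along concentric arcs with nearest-neighbour lookup (all parameter and function names are illustrative assumptions; production code would interpolate):

```python
import numpy as np

def unwarp_along_arc(img, center, r_inner, r_outer, theta0, theta1, out_w):
    """Map the curved band between radii r_inner and r_outer (angles
    theta0 to theta1 around `center`) onto a horizontal strip, turning
    curved text into horizontal text."""
    cx, cy = center
    out_h = int(round(r_outer - r_inner))
    thetas = np.linspace(theta0, theta1, out_w)
    radii = np.linspace(r_outer, r_inner, out_h)  # top row = outer arc
    rr, tt = np.meshgrid(radii, thetas, indexing="ij")
    # Nearest-neighbour sample of the source image, clipped to its bounds.
    xs = np.clip(np.round(cx + rr * np.cos(tt)).astype(int), 0, img.shape[1] - 1)
    ys = np.clip(np.round(cy + rr * np.sin(tt)).astype(int), 0, img.shape[0] - 1)
    return img[ys, xs]
```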
the text prediction result shown in FIG. 11 and the corrected, segmented curved text shown in FIG. 12 are finally obtained.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, which are described in the specification and illustrated only to illustrate the principle of the present invention, but that various changes and modifications may be made therein without departing from the spirit and scope of the present invention, which fall within the scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (4)

1. A text localization method based on text peak probability density, characterized by comprising the following four steps:
S1) image preprocessing: performing data preprocessing and generating the required text mask and text peak probability map from the existing annotation information of a public dataset;
S2) data augmentation: applying data augmentation to the processed training set, including at least one random transformation among brightness, contrast, hue, saturation, rotation, mirroring, cropping, and scaling;
S3) network training: feeding the data into a convolutional neural network for training, and continuously optimizing the network parameters through a combined loss function to obtain an optimal model;
S4) prediction process: first loading the trained model parameters, passing the images to be predicted into the deep learning network to obtain a text prediction result, and post-processing the prediction information to obtain an accurate text localization result.
2. The method as claimed in claim 1, wherein step S1) specifically comprises:
S11) data preprocessing: setting the feature map of the text-line center area to consist of gray levels 0-255, where the closer a pixel is to the centerline of the text line, the larger its value: the text center is whiter, the edges are blacker, and a gradient effect is presented;
S12) text mask generation: generating the required text mask and text-line center mask from the existing annotation information of a public dataset, where the text mask is a rectangular text mask generated from the coordinates of the 4 corner points of the text blocks in the dataset;
S13) text peak probability map: obtaining the centerline of a text block from its 4 annotated corner-point coordinates, where the centerline is the summit of the peak with value 255, the values decrease gradually from the centerline to the edge of the text block, and the edge value is 0, yielding the text peak probability map.
3. The method as claimed in claim 1, wherein the basic modules of the convolutional neural network in step S3) comprise a residual network module, a concat module, and an upsampling module.
4. The method as claimed in claim 1, wherein the post-processing of the prediction information in step S4) specifically comprises:
S41) first computing the connected components of the predicted text mask and text peak probability map and eliminating the smaller connected components;
S42) computing the coordinates of the four corner points of the predicted text box, dividing the text box into blocks, and computing the centroid coordinates of each block from the prediction information;
S43) fitting a circle center and radius to all the centroids of a text box by the least squares method, with no fitting performed for predicted text boxes whose longest side is less than 2 times the short side; offsetting the line connecting the centroids to obtain accurate text prediction information; applying a polar coordinate transformation to curved text boxes to finally obtain horizontal text; and cropping the text directly from predicted boxes that do not meet the segmentation condition.
CN201911237840.9A 2019-12-06 2019-12-06 Text prediction method based on text mountain probability density Active CN111274863B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911237840.9A CN111274863B (en) 2019-12-06 2019-12-06 Text prediction method based on text mountain probability density


Publications (2)

Publication Number Publication Date
CN111274863A true CN111274863A (en) 2020-06-12
CN111274863B CN111274863B (en) 2023-07-07

Family

ID=71001506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911237840.9A Active CN111274863B (en) 2019-12-06 2019-12-06 Text prediction method based on text mountain probability density

Country Status (1)

Country Link
CN (1) CN111274863B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107862834A (en) * 2017-12-08 2018-03-30 梁金凤 A kind of mountain flood monitoring and warning system based on cloud framework
RU2016142229A (en) * 2016-10-26 2018-04-26 Федеральное государственное казенное военное образовательное учреждение высшего образования "Военный учебно-научный центр Военно-воздушных сил "Военно-воздушная академия имени профессора Н.Е. Жуковского и Ю.А. Гагарина" (г. Воронеж) Министерства обороны Российской Федерации METHOD FOR DETECTING AND LOCALIZING TEXT FORMS IN IMAGES
CN109147010A (en) * 2018-08-22 2019-01-04 广东工业大学 Band attribute Face image synthesis method, apparatus, system and readable storage medium storing program for executing
CN109801438A (en) * 2019-01-18 2019-05-24 创新奇智(南京)科技有限公司 A kind of intelligent sales counter based on recognition of face and interactive voice
CN109903358A (en) * 2019-01-23 2019-06-18 贵州师范大学 A kind of world natural heritage declares the method for drafting of graph


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
ZHU, YX, ET AL: "TextMountain: Accurate scene text detection via instance segmentation", PATTERN RECOGNITION, vol. 110, 28 February 2021 (2021-02-28) *
SUN Hongxing et al.: "A method for judging the color polarity of characters in text regions", Journal of Northeastern University (Natural Science), no. 03, 28 March 2007 (2007-03-28) *
YUE Ying et al.: "Text visualization of discourse organization based on attitudinal meaning", Foreign Language Research, no. 02, 30 April 2017 (2017-04-30) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446356A (en) * 2020-12-15 2021-03-05 西北工业大学 Method for detecting text with any shape in natural scene based on multiple polar coordinates
CN112446356B (en) * 2020-12-15 2024-05-24 西北工业大学 Method for detecting text with arbitrary shape in natural scene based on multiple polar coordinates
CN113034415A (en) * 2021-03-23 2021-06-25 哈尔滨市科佳通用机电股份有限公司 Method for amplifying small parts of railway locomotive image
CN113034415B (en) * 2021-03-23 2021-09-14 哈尔滨市科佳通用机电股份有限公司 Method for amplifying small parts of railway locomotive image

Also Published As

Publication number Publication date
CN111274863B (en) 2023-07-07

Similar Documents

Publication Publication Date Title
CN111160352B (en) Workpiece metal surface character recognition method and system based on image segmentation
EP3309703B1 (en) Method and system for decoding qr code based on weighted average grey method
CN107273502B (en) Image geographic labeling method based on spatial cognitive learning
CN110766008A (en) Text detection method facing any direction and shape
CN110647829A (en) Bill text recognition method and system
CN112418216B (en) Text detection method in complex natural scene image
CN111340028A (en) Text positioning method and device, electronic equipment and storage medium
CN112990205B (en) Method and device for generating handwritten character sample, electronic equipment and storage medium
CN111259878A (en) Method and equipment for detecting text
CN111242024A (en) Method and system for recognizing legends and characters in drawings based on machine learning
CN114155527A (en) Scene text recognition method and device
CN111523622B (en) Method for simulating handwriting by mechanical arm based on characteristic image self-learning
CN110443235B (en) Intelligent paper test paper total score identification method and system
CN111626292B (en) Text recognition method of building indication mark based on deep learning technology
CN112446259A (en) Image processing method, device, terminal and computer readable storage medium
CN111680690A (en) Character recognition method and device
CN110598698A (en) Natural scene text detection method and system based on adaptive regional suggestion network
CN109741273A (en) A kind of mobile phone photograph low-quality images automatically process and methods of marking
CN111126266B (en) Text processing method, text processing system, equipment and medium
CN111274863A (en) Text prediction method based on text peak probability density
CN116597466A (en) Engineering drawing text detection and recognition method and system based on improved YOLOv5s
CN114758341A (en) Intelligent contract image identification and contract element extraction method and device
CN114155540B (en) Character recognition method, device, equipment and storage medium based on deep learning
CN116030472A (en) Text coordinate determining method and device
CN114187445A (en) Method and device for recognizing text in image, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant