CN110956088B - Overlapped text line positioning and segmentation method and system based on deep learning


Info

Publication number: CN110956088B
Application number: CN201911053860.0A
Authority: CN (China)
Other versions: CN110956088A (original language: Chinese (zh))
Inventors: 王勇, 朱军民, 康铁钢, 施维
Current assignee: Beijing Yidao Boshi Technology Co ltd
Original assignee: Beijing Yidao Boshi Technology Co ltd
Application filed by Beijing Yidao Boshi Technology Co ltd
Legal status: Active (granted)

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 — Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 — Document-oriented image-based pattern recognition
    • G06V10/00 — Arrangements for image or video recognition or understanding
    • G06V10/20 — Image preprocessing
    • G06V10/28 — Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06V30/10 — Character recognition
    • G06V30/14 — Image acquisition
    • G06V30/148 — Segmentation of character regions
    • G06V30/153 — Segmentation of character regions using recognition of characters or words
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deep learning based method and system for positioning and segmenting overlapping text lines, belonging to the field of computer vision. The method comprises the following steps: preprocessing an original image; inputting the preprocessed image into a trained instance segmentation fully convolutional neural network, which outputs a non-overlapping text line region feature score map, an overlapping text line region feature score map, and a feature score map of link information between text line region pixels; acquiring the contours of the non-overlapping and overlapping text line regions; merging the non-overlapping text line regions into the overlapping text line regions; and performing quadrilateral fitting on the merged text line regions to obtain their circumscribed quadrilaterals, thereby positioning and segmenting the overlapping text lines. The method effectively solves the difficult problem of positioning and segmenting overlapping text lines, completing a task that traditional methods cannot. Moreover, good algorithm performance is achieved with relatively little training data, few training iterations, and simple post-processing.

Description

Overlapped text line positioning and segmentation method and system based on deep learning
Technical Field
The invention relates to the field of computer vision, in particular to an overlapping text line positioning and dividing method and system based on deep learning.
Background
In many application scenarios, there is a need to capture document picture content electronically in order to generate structured data and complete automated entry. Such a need can be addressed using OCR (Optical Character Recognition) techniques. In general, OCR comprises two major steps: text detection and text recognition. Conventional text detection methods typically employ connected component analysis (CCA) or a sliding window (SW) detection mechanism. These methods usually require manually designing a series of rules to extract low-level or mid-level features from the image, combined with complex pre- and post-processing, to accomplish the text detection task. Owing to the limited representation capability of hand-designed features and the complexity of the processing pipeline, these traditional methods struggle to perform well, especially in difficult scenarios such as blurred characters, overlapping characters, and scene text with complex backgrounds.
In recent years, deep learning techniques have evolved rapidly and have been successfully applied to text detection and recognition tasks. In essence, deep learning is a feature-learning algorithm: it approximates a potential function mapping from input to output by automatically learning and extracting features of the input object (image, text, etc.) and fitting a specific target output label. A deep learning model is typically composed of a series of sequential operations that must be differentiable, so that optimization methods such as gradient descent can be used for end-to-end training.
Although deep learning techniques have greatly improved the performance of document text detection algorithms, and even of the harder scene text detection task, it must be acknowledged that some especially difficult text detection tasks, such as overlapping text line detection, remain quite challenging, as shown in fig. 1. Such overlapping text lines appear in large numbers in pictures of tickets, forms, and documents, and are typically caused by offset, tilted, or even nested printing. Solving their detection and recognition would greatly improve the performance of structured data entry, and therefore has great practical application value.
Disclosure of Invention
The invention relates to a deep learning based method for positioning and segmenting overlapping text lines. It solves the detection problem of overlapping text lines in various notes, forms, and document images captured by scanners, high-speed document cameras, and mobile phones; provides more accurate text line region information for subsequent recognition tasks, improving overall recognition precision; and thereby completes high-quality automated entry of structured data.
According to a first aspect of the present invention, there is provided a method for positioning and dividing overlapping text lines based on deep learning, the method comprising the steps of:
step 1, inputting an original image containing overlapped text lines, and preprocessing the original image;
step 2, training an instance segmentation fully convolutional neural network, inputting the preprocessed original image into the trained instance segmentation fully convolutional neural network, and outputting a non-overlapping text line region feature score map, an overlapping text line region feature score map and a feature score map of link information between text line region pixels;
step 3, acquiring the contours of the non-overlapping text line regions and the overlapping text line regions by a connected domain analysis method, based on the non-overlapping text line region feature score map, the overlapping text line region feature score map and the feature score map of link information between text line region pixels;
step 4, merging the non-overlapping text line area to the overlapping text line area according to the outline of the non-overlapping text line area and the outline of the overlapping text line area;
and 5, performing quadrilateral fitting on the combined text line areas to obtain circumscribed quadrilaterals of the text line areas, and realizing positioning segmentation of overlapped text lines.
Further, the step 1 specifically includes: padding the borders of the input original image by N units and then performing 1/M downsampling to obtain the preprocessed original image, where M and N are integers greater than or equal to 1 and N is an integer multiple of M.
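For illustration, the preprocessing in step 1 can be sketched as follows. This is a minimal sketch, assuming a grayscale image and using average pooling as one simple realization of the 1/M downsampling (in practice the network's strided convolutions may perform this reduction); the function name and defaults are illustrative, not part of the patent.

```python
import numpy as np

def preprocess(image, m=16):
    """Pad the bottom/right borders with zeros so that height and width
    become integer multiples of m (the downsampling factor), then
    downsample by 1/m via m x m average pooling (grayscale images only)."""
    h, w = image.shape
    pad_h = (m - h % m) % m
    pad_w = (m - w % m) % m
    padded = np.pad(image, ((0, pad_h), (0, pad_w)), mode="constant")
    h2, w2 = padded.shape[0] // m, padded.shape[1] // m
    # block-wise mean: reshape each m x m tile onto its own axes, then average
    pooled = padded.reshape(h2, m, w2, m).mean(axis=(1, 3))
    return padded, pooled
```

A 10x10 image with m=4 is padded to 12x12 and pooled down to 3x3, so the downsampled size is exact with no fractional remainder.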
Further, the step 2 specifically includes:
step 21: label each sample image in the training sample set with the contours of its text line regions, each represented by a quadrilateral, and generate the corresponding label file;
step 22: feed the label files and sample images into the instance segmentation fully convolutional network for training; to supervise the learning of overlapping text lines, the network automatically computes the contours of the overlapping text line regions from the contours of the text line regions in the label file, uses them as the supervision targets for the overlapping regions, and combines them with the contours of the non-overlapping text line regions to complete the training process and form a preliminary model;
step 23: test the preliminary model on a test sample set and evaluate the detection and segmentation precision for the non-overlapping and overlapping text line regions; if the precision requirement is met, terminate training and take the preliminary model as the trained instance segmentation fully convolutional neural network; if not, increase the training sample size, adjust the network structure and training parameters, and repeat the training process until a trained network meeting the precision requirement is obtained;
step 24: input the preprocessed original image into the trained instance segmentation fully convolutional neural network and output the non-overlapping text line region feature score map, the overlapping text line region feature score map, and the feature score map of link information between text line region pixels.
Further, the step 3 specifically includes:
step 31: setting a first threshold for the non-overlapping text line region feature score map, a second threshold for the overlapping text line region feature score map, and a third threshold for the link information feature score map between text line region pixels;
step 32: binarize the non-overlapping text line region feature score map with the first threshold, the overlapping text line region feature score map with the second threshold, and the inter-pixel link information feature score map with the third threshold; this separates non-overlapping text line region pixels from background pixels in the first map, overlapping text line region pixels from background pixels in the second map, and linked states from non-linked states in the third map;
step 33: combine the non-overlapping text line region pixel points with the link state information to obtain the pixel regions of the non-overlapping text lines, combine the overlapping text line region pixel points with the link state information to obtain the pixel regions of the overlapping text lines, and represent the contour of each pixel region by its connected domain.
Further, the value ranges of the first threshold, the second threshold and the third threshold are all [0,1].
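The thresholding in steps 31-32 can be sketched as below. The concrete threshold values t1-t3 are illustrative assumptions; the patent only requires them to lie in [0, 1].

```python
import numpy as np

def binarize_score_maps(text_score, overlap_score, link_score,
                        t1=0.7, t2=0.5, t3=0.6):
    """Threshold the three score maps (values in [0, 1]) into binary masks:
    non-overlapping text pixels vs background, overlapping text pixels vs
    background, and positive vs negative link state."""
    text_mask = text_score >= t1
    overlap_mask = overlap_score >= t2
    link_mask = link_score >= t3
    return text_mask, overlap_mask, link_mask
```

Each mask is a boolean array of the same shape as its score map; pixels at or above the threshold are foreground (or positive links), the rest are background.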
Further, the step 4 specifically includes:
step 41: combining the pixel point area of the non-overlapping text line area and the pixel point area of the overlapping text line area;
step 42: judging adjacent information between adjacent pixel points, combining a characteristic score graph of link information between pixels of a text line region, and merging two pixel points into a connected domain when the two pixel points are adjacent and the link information of the two pixel points is positive;
further, two pixels are adjacent to each other: the two pixel points differ by 1-3 pixels in the X-direction pixel coordinate axis or the Y-direction pixel coordinate axis.
Step 43: and acquiring an optimal distance threshold value on a variable distance threshold value test set by adopting a strategy based on variable distance threshold value combination and adopting a dynamic searching distance threshold value mode based on end-to-end detection precision, and carrying out combination operation if the distance between two connected domains is within the optimal distance threshold value range.
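The link-aware connected domain analysis of steps 41-42 can be sketched with a union-find structure. This is a simplified sketch: it assumes 4-adjacency rather than the patent's more general 1-3 pixel adjacency, and the function names are illustrative.

```python
import numpy as np

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

def link_connected_components(mask, link_mask):
    """Group text pixels into connected domains: two pixels join the same
    domain only when they are adjacent AND both carry a positive link
    prediction.  Returns a label map (-1 for background)."""
    h, w = mask.shape
    uf = UnionFind(h * w)
    ys, xs = np.nonzero(mask & link_mask)
    for y, x in zip(ys, xs):
        for dy, dx in ((0, 1), (1, 0)):  # right and down neighbours
            ny, nx = y + dy, x + dx
            if ny < h and nx < w and mask[ny, nx] and link_mask[ny, nx]:
                uf.union(y * w + x, ny * w + nx)
    labels = {}
    out = -np.ones((h, w), dtype=int)
    for y, x in zip(ys, xs):
        out[y, x] = labels.setdefault(uf.find(y * w + x), len(labels))
    return out
```

Two separated pixel groups yield two distinct labels; severing the link prediction between adjacent pixels keeps them in different domains, which is what distinguishes this from plain connected component labeling.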
According to a second aspect of the present invention, there is provided a deep learning based overlapping text line positioning and segmentation device, comprising:
an original image input section for inputting an original image containing overlapping text lines, the original image being preprocessed;
the feature score map output component is used for inputting the preprocessed original image into the trained instance segmentation full convolution neural network and outputting a non-overlapping text line region feature score map, an overlapping text line region feature score map and a link information feature score map among text line region pixels;
contour obtaining means for obtaining, by a connected domain analysis method, contours of the non-overlapping text line region and the overlapping text line region based on the non-overlapping text line region feature score map, the overlapping text line region feature score map, and the link information feature score map between pixels of the text line region;
a region merging section for merging the non-overlapping text line regions into the overlapping text line regions according to the contours of the non-overlapping and overlapping text line regions;
and a result output section for performing quadrilateral fitting on the merged text line regions to obtain their circumscribed quadrilaterals, thereby positioning and segmenting the overlapping text lines.
According to a third aspect of the present invention, there is provided a deep learning based overlapping text line localization segmentation system, the system comprising:
a processor and a memory for storing executable instructions;
wherein the processor is configured to execute the executable instructions to perform the deep learning based overlapping text line location segmentation method as set forth in any of the preceding aspects.
According to a fourth aspect of the present invention, there is provided a computer-readable storage medium, having stored thereon a computer program,
the computer program, when executed by a processor, implements the deep learning based overlapping text line localization segmentation method as described in any of the previous aspects.
The invention has the beneficial effects that:
1. image features are automatically extracted and learned based on a deep learning full convolution network instance segmentation method, so that difficult manual rule design and complex preprocessing and post-processing flows are avoided;
2. the method can adapt to different types of document images and different types of text line overlapping patterns, and solves the problem that the traditional method cannot solve;
3. the designed fully convolutional network outputs score maps representing prediction confidence, which can effectively guide subsequent recognition and even structuring work;
4. the method is simple and efficient, the whole flow is composed of the FCN network and simple and efficient post-processing logic, and the requirements of practical application are met.
5. The labeling training process is simple, and the training process can be efficiently completed on the premise that the overlapped area does not need to be specially labeled.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to the structures shown in these drawings without inventive effort for a person skilled in the art.
FIG. 1 illustrates a typical example of overlapping text lines in the prior art;
FIG. 2 shows a flow chart of an overlapping text line location segmentation method based on deep learning according to the present invention;
FIG. 3 illustrates an example of overlapping text lines in a deep learning based overlapping text line localization segmentation method in accordance with the present invention;
FIG. 4 illustrates a non-overlapping text line region feature score plot in a deep learning based overlapping text line location segmentation method in accordance with the present invention;
FIG. 5 illustrates an overlapping text line region feature score plot in a deep learning based overlapping text line location segmentation method in accordance with the present invention;
FIG. 6 shows a schematic diagram of an optimal threshold search process in a deep learning based overlapping text line location segmentation method according to the present invention;
FIG. 7 illustrates an exemplary circumscribed quadrilateral schematic of overlapping text lines in a deep learning based overlapping text line location segmentation method in accordance with the present invention;
fig. 8 shows an effect diagram of an overlapping text line location segmentation method based on deep learning according to the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein, for example.
Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The term "a plurality" means two or more.
The term "and/or" in this disclosure merely describes an association between related objects and indicates that three relationships may exist: for example, "A and/or B" may mean that A exists alone, that both A and B exist, or that B exists alone.
The invention relates to a deep learning-based overlapped text line positioning and segmentation method, which comprises the following steps:
inputting an original image containing overlapped text lines, and preprocessing the original image;
inputting the preprocessed original image into a trained instance segmentation full convolution neural network, and outputting a non-overlapping text line region feature score map, an overlapping text line region feature score map and a link information feature score map among pixels of the text line region;
acquiring outlines of a non-overlapping text line region and an overlapping text line region based on the non-overlapping text line region feature score map, the overlapping text line region feature score map and the link information feature score map among pixels of the text line region by a connected domain analysis method;
merging the non-overlapping text line region into the overlapping text line region according to the outline of the non-overlapping text line region and the overlapping text line region;
and performing quadrilateral fitting on the text line regions to obtain their circumscribed quadrilaterals, realizing the positioning and segmentation of overlapping text lines.
Examples
The invention provides an accurate positioning and segmentation method for overlapping text lines. To tackle the difficult problem of overlapping text line detection in bills, forms, and document images, the team innovatively adopts an instance segmentation fully convolutional neural network model that integrates text region detection and the segmentation of different text line instance regions into one network, together with an innovative post-processing method that combines the text line segmentation map and the overlapping region segmentation map to generate complete and accurate text line instance contours, thereby accurately determining the coordinates of the different text line regions.
The specific flow is shown in fig. 2:
the first step of image preprocessing: the input image is preprocessed, it is important that the boundaries are padded so that the width and height of the image are not affected by the downsampling, and the value of the aligned boundaries is generally consistent with the downsampled value. For example, where the downsampling is 1/16, the boundary alignment is 16 units or pixels, or an integer multiple of 16, such as 32,64, etc.
Second step of text instance segmentation at pixel level: and sending the preprocessed image into a trained instance segmentation full convolution neural network, and outputting a plurality of feature Score Maps (Score Maps) which respectively represent the background, the text region, the overlapped text region and the feature map of link information among pixels of the text region.
Training process for the instance segmentation fully convolutional network: first, each sample in the sample set is labeled; the labels mainly consist of the contours of the text line regions, each represented by a quadrilateral, and no special labeling of the overlapping text line regions is required. This generates a label file, which is fed together with the image file into the fully convolutional instance segmentation network for training. To supervise the learning of overlapping text line regions, the network's preprocessing automatically computes the contour information of the overlapping regions from the text line instance contours in the label file; these contours serve as the supervision target for the overlapping regions and, combined with the targets for the non-overlapping regions, jointly drive the training task. When a round of training is completed, the trained model's overall text line detection and segmentation accuracy must be evaluated on the test set, including the accuracy of the overlapping regions. If the expected effect and accuracy targets are reached, training can be terminated and the model used for prediction; if not, the training sample size is increased, the structure and training parameters of the fully convolutional instance segmentation network are adjusted if necessary, and the training process is repeated until the evaluated performance meets the requirement.
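The automatic derivation of overlap supervision from the per-line annotations can be sketched as follows. This is a simplified sketch: boxes are assumed to be axis-aligned (x0, y0, x1, y1) rectangles rasterized onto a mask, whereas the patent labels general quadrilaterals; the function name is illustrative.

```python
import numpy as np

def overlap_supervision(shape, boxes):
    """Derive the overlap-region supervision mask automatically from
    per-line annotations, so overlaps need no manual labeling.
    A pixel covered by >= 2 text lines belongs to the overlapping region;
    a pixel covered by exactly 1 line belongs to a non-overlapping region.
    Returns (non_overlap_mask, overlap_mask)."""
    coverage = np.zeros(shape, dtype=int)
    for x0, y0, x1, y1 in boxes:
        coverage[y0:y1, x0:x1] += 1  # count how many lines cover each pixel
    return coverage == 1, coverage >= 2
```

With two 4x4 boxes overlapping on a 2x2 patch, the overlap mask covers 4 pixels and the non-overlap mask the remaining 24 covered pixels, matching the patent's claim that overlap targets follow directly from the line contours.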
Possible adjustments of the model structure usually come from two aspects. One is adjustment of model capacity, aimed at improving the model's feature learning ability: the number of layers of the convolutional neural network, the number of filters in each layer's convolution, the feature map fusion scheme, the type of nonlinear activation function, and so on. The other is adjustment of model generalization ability, such as tuning the regularization term parameters in the network, aimed at improving the model's performance on the test set (i.e., on unseen samples). Possible adjustments of the training parameters generally also cover two aspects: the hyperparameters of the training process, such as the learning rate decay strategy and initial value, the training batch size, and the total number of iterations; and the training loss function, including its form and its associated hyperparameters.
Taking fig. 3 as an example, a non-overlapping text line area feature score map and an overlapping text line area feature score map are described below.
1) The non-overlapping text line area Score Map, as shown in fig. 4, has each pixel value representing the confidence that the pixel is inside the text line area, normalized to the [0,1] interval.
2) The overlapping text line area Score Map, as shown in fig. 5, each pixel value represents the confidence that the pixel is in the overlapping text area, again normalized to the [0,1] interval.
Third step, extracting text region and overlapping region contours: based on connected domain analysis, the contour information of the non-overlapping and overlapping text line regions is obtained by creatively combining the score maps of the non-overlapping text line regions, the overlapping text line regions, and the inter-pixel link information within the text regions.
Fourth step, merging the overlapping text line region contours with the non-overlapping text line region contours: this step merges the contour of each text line instance with the contours of the overlapping regions, using connected domain analysis and multi-connected-domain merging to generate the complete contour of each text line region. The connected domain analysis creatively combines the adjacency information between pixels with the link information between adjacent pixels predicted by the fully convolutional network: two pixels are merged into one connected domain only when they are adjacent and their predicted link information is positive. The multi-connected-domain merging innovatively adopts a variable distance threshold strategy, obtaining the optimal distance threshold on a test set by dynamically searching distance thresholds against end-to-end detection precision. Once the optimal threshold is determined, any two connected domains whose distance falls within the threshold are merged. The optimal threshold search process, shown in fig. 6, specifically includes:
and testing the full convolution instance segmentation network through a variable distance threshold test set, setting a distance threshold search range interval, such as [0,5], and carrying out optimal distance threshold search to obtain an optimal distance threshold.
Setting a distance threshold search range interval to perform optimal distance threshold search comprises:
traversing a threshold interval, wherein the step length is 1; applying the threshold to perform text line contour merging; calculating the overall detection segmentation precision of the test set; and storing the maximum precision and the optimal threshold, and taking the threshold as an optimal distance threshold.
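The threshold search above can be sketched as a plain grid search. The `evaluate(threshold)` callback is a hypothetical stand-in (not part of the patent text) for running the contour merging at that threshold and returning end-to-end detection/segmentation accuracy on the test set.

```python
def search_best_distance_threshold(evaluate, lo=0, hi=5, step=1):
    """Grid-search the component-merging distance threshold on a held-out
    test set, keeping the threshold that maximizes accuracy."""
    best_t, best_acc = lo, float("-inf")
    t = lo
    while t <= hi:
        acc = evaluate(t)
        if acc > best_acc:  # store the maximum precision and its threshold
            best_acc, best_t = acc, t
        t += step
    return best_t, best_acc
```

With step 1 over [0, 5] this costs six merge-and-evaluate passes, which is cheap relative to training and fixes the threshold once before deployment.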
Experiments show that, combining these two innovations, the algorithm achieves relatively high accuracy in extracting overlapping text contours and in merging text contours.
Fifth step, quadrilateral fitting: perform quadrilateral fitting on each text line instance region to obtain the circumscribed quadrilateral of the text line, as shown in fig. 7.
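As a simplified sketch of this step, the function below returns the axis-aligned circumscribed quadrilateral of a region mask; for skewed lines a rotated fit (e.g. OpenCV's cv2.minAreaRect followed by cv2.boxPoints) would be the closer match to the patent's general quadrilateral fitting.

```python
import numpy as np

def circumscribed_quad(region_mask):
    """Return the axis-aligned circumscribed quadrilateral of a boolean
    region mask as four (x, y) corners in clockwise order: TL, TR, BR, BL."""
    ys, xs = np.nonzero(region_mask)
    x0, x1 = int(xs.min()), int(xs.max())
    y0, y1 = int(ys.min()), int(ys.max())
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
```

The four corner coordinates per line are exactly the region coordinates handed to the downstream recognition stage.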
Fig. 8 shows some examples of overlapping text line positioning and segmentation using the technical solution of the present invention. Experiments show that the method very effectively solves the difficult problem of positioning and segmenting overlapping text lines and completes tasks that traditional methods cannot. Moreover, it achieves good algorithm performance with relatively little training data, few training iterations, and simple post-processing, and therefore has great practical application value.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be apparent to those skilled in the art that the methods above may be implemented by software plus a necessary general-purpose hardware platform, or of course by hardware, although in many cases the former is preferred. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the method according to the embodiments of the present invention.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to the above-described embodiments, which are merely illustrative and not restrictive. Those of ordinary skill in the art may derive many other forms without departing from the spirit of the present invention and the scope of the claims, all of which fall within the protection of the present invention.

Claims (7)

1. An overlapping text line positioning and segmentation method based on deep learning, characterized by comprising the following steps:
step 1, inputting an original image containing overlapping text lines, and preprocessing the original image;
step 2, training an instance segmentation full convolution neural network, inputting the preprocessed original image into the trained instance segmentation full convolution neural network, and outputting a non-overlapping text line region feature score map, an overlapping text line region feature score map, and a link information feature score map between text line region pixels;
step 3, acquiring the contours of the non-overlapping text line regions and the overlapping text line regions by a connected domain analysis method, based on the non-overlapping text line region feature score map, the overlapping text line region feature score map, and the link information feature score map between text line region pixels;
step 4, merging the non-overlapping text line regions into the overlapping text line regions according to the contours of the non-overlapping text line regions and the contours of the overlapping text line regions;
step 5, performing quadrilateral fitting on the merged text line regions to obtain the circumscribed quadrilateral of each text line region, thereby realizing the positioning and segmentation of overlapping text lines,
wherein, the step 3 specifically includes:
step 31: setting a first threshold for the non-overlapping text line region feature score map, a second threshold for the overlapping text line region feature score map, and a third threshold for the link information feature score map between text line region pixels;
step 32: binarizing the non-overlapping text line region feature score map with the first threshold, the overlapping text line region feature score map with the second threshold, and the link information feature score map between text line region pixels with the third threshold, thereby obtaining non-overlapping text line region pixel points and background pixel points from the non-overlapping text line region feature score map, overlapping text line region pixel points and background pixel points from the overlapping text line region feature score map, and link state information and non-link state information from the link information feature score map between text line region pixels;
step 33: obtaining the pixel point region of the non-overlapping text line region by combining the pixel points of the non-overlapping text line region with the link state information, obtaining the pixel point region of the overlapping text line region by combining the pixel points of the overlapping text line region with the link state information, and representing the contour of each pixel point region with a connected domain;
wherein, the step 4 specifically includes:
step 41: combining the pixel point region of the non-overlapping text line region with the pixel point region of the overlapping text line region;
step 42: judging the adjacency information between neighboring pixel points in combination with the link information feature score map between text line region pixels, and merging two pixel points into one connected domain when they are adjacent and their link state information is positive, wherein two pixel points are adjacent when their pixel coordinates differ by 1 to 3 pixels along the X-direction or Y-direction pixel coordinate axis;
step 43: acquiring an optimal distance threshold on a variable distance threshold test set by adopting a merging strategy based on a variable distance threshold and a dynamic distance threshold search based on end-to-end detection precision, and performing a merging operation if the distance between two connected domains is within the optimal distance threshold.
2. The deep learning based overlapping text line positioning and segmentation method according to claim 1, wherein the step 1 specifically includes: padding the boundary of the input original image by N units and then performing 1/M downsampling to obtain the preprocessed original image, wherein M and N are integers greater than or equal to 1, and M is an integer multiple of N.
3. The deep learning based overlapping text line positioning and segmentation method according to claim 1, wherein the step 2 specifically comprises:
step 21: labeling each sample image in the training sample set with the contour of each text line region, represented by a quadrilateral, and generating a label file;
step 22: feeding the label file and the sample image into the instance segmentation full convolution neural network for training, wherein, in order to complete the supervised learning of overlapping text lines, the instance segmentation full convolution neural network automatically calculates the contour of each overlapping text line region from the contours of the text line regions in the label file, takes this contour as the supervised learning target of the overlapping text line region, and combines it with the contours of the non-overlapping text line regions to complete the training process and form a preliminary training model;
step 23: testing the preliminary training model on a test sample set and evaluating the detection and segmentation precision of the non-overlapping text line regions and the overlapping text line regions; if the precision requirement is met, terminating the training process and taking the preliminary training model as the trained instance segmentation full convolution neural network; if the precision requirement is not met, increasing the size of the training sample set, adjusting the structure and training parameters of the instance segmentation full convolution neural network, and repeating the training process until a trained instance segmentation full convolution neural network meeting the precision requirement is obtained;
step 24: inputting the preprocessed original image into the trained instance segmentation full convolution neural network, and outputting the non-overlapping text line region feature score map, the overlapping text line region feature score map, and the link information feature score map between text line region pixels.
4. The deep learning based overlapping text line positioning and segmentation method according to claim 1, wherein the first threshold, the second threshold, and the third threshold all take values in the range [0, 1].
5. A deep learning based overlapping text line positioning and segmentation device, characterized in that the device operates according to the deep learning based overlapping text line positioning and segmentation method of any one of claims 1 to 4, the device comprising:
an original image input part for inputting an original image containing overlapping text lines and preprocessing the original image;
a feature score map output part for inputting the preprocessed original image into the trained instance segmentation full convolution neural network and outputting a non-overlapping text line region feature score map, an overlapping text line region feature score map, and a link information feature score map between text line region pixels;
a contour acquisition part for acquiring, by a connected domain analysis method, the contours of the non-overlapping text line regions and the overlapping text line regions based on the non-overlapping text line region feature score map, the overlapping text line region feature score map, and the link information feature score map between text line region pixels;
a region merging part for merging the non-overlapping text line regions into the overlapping text line regions according to the contours of the non-overlapping text line regions and the contours of the overlapping text line regions;
a result output part for performing quadrilateral fitting on the merged text line regions to obtain the circumscribed quadrilateral of each text line region, thereby realizing the positioning and segmentation of overlapping text lines.
6. An overlapping text line positioning and segmentation system based on deep learning, the system comprising:
a processor and a memory for storing executable instructions;
wherein the processor is configured to execute the executable instructions to perform the deep learning based overlapping text line positioning and segmentation method of any one of claims 1 to 4.
7. A computer-readable storage medium, having a computer program stored thereon,
the computer program, when executed by a processor, implements the deep learning based overlapping text line positioning and segmentation method of any one of claims 1 to 4.
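The pixel-merging rule in steps 41-42 of claim 1 can be sketched with a union-find structure. This is a minimal illustration, not the patented implementation: `link_state` is a hypothetical predicate standing in for the link information feature score map, and the adjacency test (coordinates differing by 1 to 3 pixels along one axis while the other is equal) is one possible reading of the claim wording.

```python
class UnionFind:
    """Disjoint-set forest used to group pixels into connected domains."""
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, i, j):
        self.parent[self.find(i)] = self.find(j)

def adjacent(p, q):
    """Adjacency per claim 1 (assumed reading): coordinates differ by
    1-3 pixels along the X or Y axis, equal along the other axis."""
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    return (1 <= dx <= 3 and dy == 0) or (1 <= dy <= 3 and dx == 0)

def merge_pixels(pixels, link_state):
    """Group pooled text-line pixels (from the non-overlapping and
    overlapping score maps) into connected domains: merge two pixels
    when they are adjacent and their link state is positive."""
    uf = UnionFind(len(pixels))
    for i in range(len(pixels)):
        for j in range(i + 1, len(pixels)):
            if adjacent(pixels[i], pixels[j]) and link_state(pixels[i], pixels[j]):
                uf.union(i, j)
    groups = {}
    for i, p in enumerate(pixels):
        groups.setdefault(uf.find(i), []).append(p)
    return list(groups.values())
```

Each returned group is one connected domain; the subsequent variable-distance-threshold merging of step 43 then operates on these domains.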
CN201911053860.0A 2019-10-31 2019-10-31 Overlapped text line positioning and segmentation method and system based on deep learning Active CN110956088B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911053860.0A CN110956088B (en) 2019-10-31 2019-10-31 Overlapped text line positioning and segmentation method and system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911053860.0A CN110956088B (en) 2019-10-31 2019-10-31 Overlapped text line positioning and segmentation method and system based on deep learning

Publications (2)

Publication Number Publication Date
CN110956088A CN110956088A (en) 2020-04-03
CN110956088B true CN110956088B (en) 2023-06-30

Family

ID=69976607

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911053860.0A Active CN110956088B (en) 2019-10-31 2019-10-31 Overlapped text line positioning and segmentation method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN110956088B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111539269A (en) * 2020-04-07 2020-08-14 北京达佳互联信息技术有限公司 Text region identification method and device, electronic equipment and storage medium
CN112232336A (en) * 2020-09-02 2021-01-15 深圳前海微众银行股份有限公司 Certificate identification method, device, equipment and storage medium
CN114419641B (en) * 2022-03-15 2022-06-21 腾讯科技(深圳)有限公司 Training method and device of text separation model, electronic equipment and storage medium
CN116152842B (en) * 2022-11-18 2023-11-03 北京中卡信安电子设备有限公司 Certificate image processing method and device, storage medium and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN109902622A (en) * 2019-02-26 2019-06-18 中国科学院重庆绿色智能技术研究院 A kind of text detection recognition methods for boarding pass information verifying
CN109919025A (en) * 2019-01-30 2019-06-21 华南理工大学 Video scene Method for text detection, system, equipment and medium based on deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9430703B2 (en) * 2014-12-19 2016-08-30 Konica Minolta Laboratory U.S.A., Inc. Method for segmenting text words in document images using vertical projections of center zones of characters

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on a YOLO-based method for locating slanted text in natural scenes; Zhou Xiangyu et al.; Computer Engineering and Applications, No. 09; full text *
Arbitrary-orientation text recognition based on semantic segmentation; Wang Tao et al.; Applied Science and Technology; 2017-07-04, No. 03; full text *

Also Published As

Publication number Publication date
CN110956088A (en) 2020-04-03

Similar Documents

Publication Publication Date Title
CN110956088B (en) Overlapped text line positioning and segmentation method and system based on deep learning
EP1693782B1 (en) Method for facial features detection
US11475681B2 (en) Image processing method, apparatus, electronic device and computer readable storage medium
CN108416902B (en) Real-time object identification method and device based on difference identification
CN110647829A (en) Bill text recognition method and system
CN109118473B (en) Angular point detection method based on neural network, storage medium and image processing system
CN110866871A (en) Text image correction method and device, computer equipment and storage medium
CN111259889A (en) Image text recognition method and device, computer equipment and computer storage medium
CN110569878A (en) Photograph background similarity clustering method based on convolutional neural network and computer
CN110287952B (en) Method and system for recognizing characters of dimension picture
CN113221925B (en) Target detection method and device based on multi-scale image
KR101997048B1 (en) Method for recognizing distant multiple codes for logistics management and code recognizing apparatus using the same
CN112101195B (en) Crowd density estimation method, crowd density estimation device, computer equipment and storage medium
CN108830175A (en) Iris image local enhancement methods, device, equipment and storage medium
Xing et al. Traffic sign recognition using guided image filtering
US11367206B2 (en) Edge-guided ranking loss for monocular depth prediction
CN111914596B (en) Lane line detection method, device, system and storage medium
CN114913338A (en) Segmentation model training method and device, and image recognition method and device
Aliev et al. Algorithm for choosing the best frame in a video stream in the task of identity document recognition
CN113989604A (en) Tire DOT information identification method based on end-to-end deep learning
CN112365451A (en) Method, device and equipment for determining image quality grade and computer readable medium
CN116798041A (en) Image recognition method and device and electronic equipment
JP7246104B2 (en) License plate identification method based on text line identification
Huang et al. FAST and FLANN for feature matching based on SURF
CN115953744A (en) Vehicle identification tracking method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100083 office A-501, 5th floor, building 2, yard 1, Nongda South Road, Haidian District, Beijing

Applicant after: BEIJING YIDAO BOSHI TECHNOLOGY Co.,Ltd.

Address before: 100083 office a-701-1, a-701-2, a-701-3, a-701-4, a-701-5, 7th floor, building 2, No.1 courtyard, Nongda South Road, Haidian District, Beijing

Applicant before: BEIJING YIDAO BOSHI TECHNOLOGY Co.,Ltd.

CB03 Change of inventor or designer information

Inventor after: Wang Yong

Inventor after: Zhu Junmin

Inventor after: Kang Tiegang

Inventor after: Shi Wei

Inventor before: Wang Yong

Inventor before: Zhu Junmin

Inventor before: Kang Tiegang

Inventor before: Shi Wei

GR01 Patent grant