CN110956088A - Method and system for positioning and segmenting overlapped text lines based on deep learning

Info

Publication number
CN110956088A
Authority
CN
China
Legal status
Granted
Application number
CN201911053860.0A
Other languages
Chinese (zh)
Other versions
CN110956088B
Inventor
王勇
朱军民
康铁刚
施维
Current Assignee
Beijing Yidao Boshi Technology Co ltd
Original Assignee
Beijing Yidao Boshi Technology Co ltd
Priority date
Application filed by Beijing Yidao Boshi Technology Co ltd
Priority to CN201911053860.0A
Publication of CN110956088A
Application granted
Publication of CN110956088B
Legal status: Active

Classifications

    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G06V30/10 Character recognition
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a method and a system for positioning and segmenting overlapped text lines based on deep learning, belonging to the field of computer vision. The method comprises the following steps: preprocessing an original image; inputting the preprocessed image into a trained instance-segmentation fully convolutional neural network, which outputs a non-overlapped text line region feature score map, an overlapped text line region feature score map, and a feature score map of link information between text line region pixels; obtaining the contours of the non-overlapped and overlapped text line regions by connected-component analysis; merging the non-overlapped text line regions into the overlapped text line regions; and performing quadrilateral fitting on each merged text line region to obtain its circumscribed quadrilateral, thereby achieving positioning and segmentation of the overlapped text lines. The invention effectively solves the difficult problem of positioning and segmenting overlapped text lines and completes a task that traditional methods cannot. Moreover, good algorithm performance is achieved with only a small amount of training data, few training iterations, and simple post-processing.

Description

Method and system for positioning and segmenting overlapped text lines based on deep learning
Technical Field
The invention relates to the field of computer vision, in particular to a method and a system for positioning and segmenting overlapped text lines based on deep learning.
Background
In many application scenarios, document picture content needs to be digitized to generate structured data and complete automated entry. Such requirements can be addressed with OCR (Optical Character Recognition) technology. OCR generally comprises two major steps: text detection and text recognition. Conventional text detection methods typically employ Connected Component Analysis (CCA) or Sliding Window (SW) detection mechanisms. These methods usually require manually designing a series of rules to extract low-level or mid-level image features, combined with complex pre-processing and post-processing procedures, to complete the text detection task. Limited by the weak feature representation capability of hand-designed rules and by their complex processing flows, traditional methods struggle to achieve high performance, especially in difficult recognition scenes such as blurred characters, overlapped characters, and scene text with complex backgrounds.
In recent years, deep learning techniques have developed rapidly and been successfully applied to text detection and recognition tasks. In essence, deep learning is a family of feature learning algorithms that approximate a latent mapping from input to output by automatically learning and extracting features of the input objects (images, text, etc.) and fitting specific target output labels. A deep learning model is usually composed of a series of sequential operations that must be differentiable, so that end-to-end training can be optimized with methods such as gradient descent.
Although deep learning has greatly improved the performance of document text detection algorithms, even on difficult scene-text detection tasks, some especially hard text detection tasks remain challenging, such as the detection of overlapped text lines, as shown in fig. 1. Such overlapping text lines appear in large numbers in pictures of tickets, forms, and documents, and are often caused by offset, skewed, or even nested printing. Solving the detection and recognition of such text would greatly improve the performance of structured data entry, and therefore has great practical application value.
Disclosure of Invention
The invention provides a method for positioning and segmenting overlapped text lines based on deep learning. It solves the problem of detecting overlapped text lines in various kinds of bills, forms, and document images captured by scanners, high-speed document cameras, and mobile phones; provides more accurate text line region information for subsequent recognition tasks; improves overall recognition accuracy; and thereby completes high-quality automated entry of structured data.
According to a first aspect of the present invention, there is provided a method for positioning and segmenting overlapped text lines based on deep learning, the method comprising the following steps:
step 1, inputting an original image containing overlapped text lines, and preprocessing the original image;
step 2, training an instance-segmentation fully convolutional neural network, inputting the preprocessed original image into the trained network, and outputting a non-overlapped text line region feature score map, an overlapped text line region feature score map, and a feature score map of link information between text line region pixels;
step 3, acquiring outlines of the non-overlapping text line region and the overlapping text line region by a connected domain analysis method based on the non-overlapping text line region feature score map, the overlapping text line region feature score map and the link information feature score map among the text line region pixels;
step 4, combining the non-overlapped text line region to the overlapped text line region according to the outlines of the non-overlapped text line region and the overlapped text line region;
and 5, performing quadrilateral fitting on the combined text line region to obtain an external quadrilateral of the text line region, and realizing the positioning segmentation of the overlapped text lines.
Further, step 1 specifically includes: padding the boundary of the input original image by N units and then downsampling by 1/M to obtain the preprocessed original image, where M and N are integers not less than 1 and N is an integer multiple of M.
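The padding-and-downsampling of step 1 can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: zero-padding at the right and bottom borders and naive strided subsampling stand in for whatever padding value and downsampling operator the network actually uses, and the function name is illustrative.

```python
import numpy as np

def preprocess(image: np.ndarray, pad_unit: int = 16, downsample: int = 16) -> np.ndarray:
    """Pad H and W up to a multiple of `pad_unit`, then subsample by 1/`downsample`."""
    h, w = image.shape[:2]
    new_h = -(-h // pad_unit) * pad_unit  # ceil to the next multiple of pad_unit
    new_w = -(-w // pad_unit) * pad_unit
    padded = np.zeros((new_h, new_w) + image.shape[2:], dtype=image.dtype)
    padded[:h, :w] = image
    # strided subsampling stands in for the network's internal downsampling
    return padded[::downsample, ::downsample]
```

With a 100x50 input and the embodiment's 1/16 downsampling, the image is first padded to 112x64 so both dimensions divide evenly.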
Further, the step 2 specifically includes:
step 21: labeling each sample image in the training sample set, using a quadrilateral to represent each text line region contour, and generating a label file;
step 22: feeding the label files and sample images into the instance-segmentation fully convolutional network for training; to supervise learning of overlapped text lines, the network automatically computes the contours of the overlapped text line regions from the text line region contours in the label files and uses them as supervision targets for the overlapped regions, completing the training process together with the contours of the non-overlapped text line regions to form a preliminary training model;
step 23: testing the preliminary training model on a test sample set and evaluating the detection and segmentation accuracy for the non-overlapped and overlapped text line regions; if the accuracy requirement is met, terminating the training process and taking the preliminary training model as the trained instance-segmentation fully convolutional neural network; if not, increasing the training sample size, adjusting the network structure and training parameters, and repeating the training process until a trained network meeting the accuracy requirement is obtained;
step 24: inputting the preprocessed original image into a trained example segmentation full convolution neural network, and outputting a non-overlapping text line region feature score image, an overlapping text line region feature score image and a link information feature score image among text line region pixels.
Further, the step 3 specifically includes:
step 31: setting a first threshold value for the characteristic score map of the non-overlapping text line region, setting a second threshold value for the characteristic score map of the overlapping text line region, and setting a third threshold value for the characteristic score map of the link information between the pixels of the text line region;
step 32: performing binarization processing on the non-overlapping text line region characteristic score map according to a first threshold value, performing binarization processing on the overlapping text line region characteristic score map according to a second threshold value, performing binarization processing on the link information characteristic score map among the text line region pixels according to a third threshold value, obtaining non-overlapping text line region pixel points and background pixel points in the non-overlapping text line region characteristic score map, obtaining overlapping text line region pixel points and background pixel points in the overlapping text line region characteristic score map, and obtaining link state information and non-link state information in the link information characteristic score map among the text line region pixels;
step 33: and combining the link state information according to the pixel points of the non-overlapping text line region to obtain the pixel point region of the non-overlapping text line region, combining the link state information according to the pixel points of the overlapping text line region to obtain the pixel point region of the overlapping text line region, and expressing the outline of the pixel point region by using a connected domain.
Further, the value ranges of the first threshold, the second threshold, and the third threshold are all [0, 1].
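Steps 31 to 33 can be sketched as a thresholding pass followed by a link-gated flood fill. This is a minimal NumPy sketch under stated assumptions: the 4-neighbor connectivity and the single link map are simplifications of the patent's per-direction pixel link predictions, and all names are illustrative.

```python
import numpy as np

def extract_regions(score_map, link_map, score_thr=0.5, link_thr=0.5):
    """Binarize a region score map, then group foreground pixels into connected
    regions, joining two adjacent pixels only when the link map is positive."""
    fg = score_map >= score_thr      # foreground vs. background pixels
    link = link_map >= link_thr      # positive vs. negative link state
    h, w = fg.shape
    labels = -np.ones((h, w), dtype=int)
    regions = []
    for sy in range(h):
        for sx in range(w):
            if not fg[sy, sx] or labels[sy, sx] >= 0:
                continue
            stack, comp = [(sy, sx)], []
            labels[sy, sx] = len(regions)
            while stack:
                y, x = stack.pop()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    # merge only adjacent foreground pixels whose link is positive
                    if 0 <= ny < h and 0 <= nx < w and fg[ny, nx] \
                            and labels[ny, nx] < 0 and link[ny, nx]:
                        labels[ny, nx] = len(regions)
                        stack.append((ny, nx))
            regions.append(comp)
    return regions
```

Each returned region is a list of pixel coordinates whose boundary can then be taken as the connected-domain contour.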
Further, the step 4 specifically includes:
step 41: merging the pixel point regions of the non-overlapped text line regions with the pixel point regions of the overlapped text line regions;
step 42: judging adjacency information between neighboring pixel points in combination with the feature score map of link information between text line region pixels; two pixel points are merged into one connected domain only when they are adjacent and their link information is positive;
Further, two pixel points are considered adjacent when they differ by 1 to 3 pixels along the X or Y pixel coordinate axis.
Step 43: adopting a merging strategy based on a variable distance threshold: using end-to-end detection accuracy as the criterion, an optimal distance threshold is obtained on a variable distance threshold test set by dynamic distance threshold search, and two connected domains are merged if the distance between them is within the optimal distance threshold.
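The merging of step 4 can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: regions are represented as (y, x) point lists, the distance between two regions is taken as the minimum point-to-point distance, and copying a nearby overlap region into each qualifying text line region is one plausible reading of the merge; the patent does not fix these details.

```python
import numpy as np

def region_distance(a, b):
    """Minimum pixel distance between two regions given as (y, x) point lists."""
    pa, pb = np.asarray(a, float), np.asarray(b, float)
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=2)
    return d.min()

def merge_regions(text_regions, overlap_regions, dist_thr=2.0):
    """Merge each overlap region into every text line region within dist_thr,
    so a shared overlap patch is copied into both crossing text lines."""
    merged = []
    for tr in text_regions:
        out = list(tr)
        for ov in overlap_regions:
            if region_distance(tr, ov) <= dist_thr:
                out.extend(ov)
        merged.append(out)
    return merged
```

In the patent, `dist_thr` would be the optimal distance threshold found by the dynamic search of step 43.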
According to a second aspect of the present invention, there is provided an overlapped text line positioning and segmenting device based on deep learning, comprising the following components:
an original image input means for inputting an original image containing overlapping text lines, and preprocessing the original image;
the characteristic score graph output component is used for inputting the preprocessed original image into a trained example segmentation full convolution neural network and outputting a non-overlapping text line region characteristic score graph, an overlapping text line region characteristic score graph and a link information characteristic score graph among text line region pixels;
the outline acquisition component is used for acquiring outlines of the non-overlapped text line region and the overlapped text line region based on the non-overlapped text line region feature score map, the overlapped text line region feature score map and the link information feature score map among the text line region pixels by a connected domain analysis method;
the region merging component is used for merging the non-overlapped text line region into the overlapped text line region according to the outlines of the non-overlapped text line region and the overlapped text line region;
and the result output component is used for performing quadrilateral fitting on the text line region to obtain an external quadrilateral of the text line region and realize the positioning segmentation of the overlapped text lines.
According to a third aspect of the present invention, there is provided a deep learning based overlapping text line location segmentation system, the system comprising:
a processor and a memory for storing executable instructions;
wherein the processor is configured to execute the executable instructions to perform the method of deep learning based overlapping text line location segmentation according to any of the preceding aspects.
According to a fourth aspect of the present invention, there is provided a computer-readable storage medium having stored thereon a computer program,
the computer program, when executed by a processor, implements a method of deep learning based overlapping text line localization segmentation as described in any of the preceding aspects.
The invention has the beneficial effects that:
1. the method based on deep learning full convolution network instance segmentation automatically extracts and learns the image characteristics, and avoids difficult manual rule design and complex pre-processing and post-processing flows;
2. the method can adapt to different types of document images and different types of text line overlapping styles, and solves the problem that the traditional method cannot solve;
3. the designed fully convolutional network outputs score maps representing prediction confidence; this confidence can effectively guide subsequent recognition and even structuring work;
4. the method is simple and efficient, the whole process consists of an FCN network and simple and efficient post-processing logic, and the requirements of practical application are met.
5. The marking training process is simple, and the training process can be efficiently completed on the premise of not needing to specially mark the overlapped area.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.
FIG. 1 illustrates overlapped text lines commonly encountered in practice;
FIG. 2 illustrates a flow chart of a method for deep learning based segmentation of overlapping text line locations in accordance with the present invention;
FIG. 3 illustrates an example of overlapping text lines in the method for segmentation based on deep learning for locating overlapping text lines according to the present invention;
FIG. 4 is a graph illustrating feature scores of non-overlapping text line regions in the method for segmentation of overlapping text line positioning based on deep learning according to the present invention;
FIG. 5 is a diagram illustrating feature scores of overlapped text line regions in the method for positioning and segmenting overlapped text lines based on deep learning according to the present invention;
FIG. 6 is a schematic diagram illustrating an optimal threshold search process in the method for segmentation of overlapped text lines based on deep learning according to the present invention;
FIG. 7 is a schematic diagram illustrating an exemplary enclosing quadrilateral of an overlapped text line in the method for positioning and segmenting the overlapped text line based on deep learning according to the present invention;
FIG. 8 is a diagram illustrating the effect of the method for locating and segmenting the overlapped text lines based on deep learning according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terms "first," "second," and the like in the description and in the claims of the present disclosure are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein.
Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The term "a plurality" means two or more.
And/or, it should be understood that, for the term "and/or" as used in this disclosure, it is merely one type of association that describes an associated object, meaning that three types of relationships may exist. For example, a and/or B, may represent: a exists alone, A and B exist simultaneously, and B exists alone.
The invention relates to a method for positioning and segmenting overlapped text lines based on deep learning, which comprises the following steps:
inputting an original image containing overlapped text lines, and preprocessing the original image;
inputting the preprocessed original image into a trained example segmentation full convolution neural network, and outputting a non-overlapping text line region feature score image, an overlapping text line region feature score image and a link information feature score image among text line region pixels;
acquiring outlines of the non-overlapping text line region and the overlapping text line region by a connected domain analysis method based on the non-overlapping text line region feature score map, the overlapping text line region feature score map and a link information feature score map among text line region pixels;
combining the non-overlapped text line region to the overlapped text line region according to the outlines of the non-overlapped text line region and the overlapped text line region;
and performing quadrilateral fitting on the text line region to obtain an external quadrilateral of the text line region, and realizing the positioning segmentation of the overlapped text lines.
Examples
The invention provides an accurate method for positioning and segmenting overlapped text lines. For the problem of detecting overlapped text lines in bills, forms, and document images, the team creatively adopts an instance-segmentation fully convolutional neural network model that integrates text region detection and segmentation of different text line instance regions in one network, and adopts an innovative post-processing method that combines the text line segmentation map and the overlapped region segmentation map to generate complete and accurate text line instance contours, thereby accurately determining the coordinates of different text line regions.
The specific flow is shown in fig. 2:
the first step of image preprocessing: the important thing for preprocessing the input image is the boundary alignment, so that the width and height of the image can be downsampled without influence, and the value of the alignment boundary generally coincides with the value of downsampling. For example where the present embodiment downsamples are 1/16, the boundary alignment is 16 units or pixels, or an integer multiple of 16, such as 32,64, etc.
The second step is pixel-level text instance segmentation: the preprocessed image is fed into the trained instance-segmentation fully convolutional neural network, which outputs several feature score maps (Score Maps) representing, respectively, the background, the text regions, the overlapped text regions, and the link information between text region pixels.
Training procedure for the instance-segmentation fully convolutional network: first, each sample in the sample set is labeled; the labeled content mainly comprises text line region contours represented by quadrilaterals, with no special labeling needed for the overlapped text line regions, producing a label file. The label files and image files are fed into the fully convolutional instance segmentation network for training. To supervise learning of the overlapped text line regions, the network preprocessing automatically computes the contour information of the overlapped regions from the contours of the text line instances in the label files, uses it as the supervision target for the overlapped regions, and completes the training task jointly with the targets of the non-overlapped regions. After one round of training, the accuracy of overall text line detection and segmentation is evaluated on the test set with the trained model, including the accuracy of the overlapped regions. If the expected effect and accuracy indexes are achieved, training may be terminated and the model used for prediction; if not, the training sample size is increased, the network structure and training parameters are adjusted as needed, and the training process is repeated until the evaluated performance meets the requirements.
The adjustment of the possible model structure generally comes from two aspects, namely the adjustment of the model capacity on one hand, and the aim is to improve the feature learning capability of the model, including the adjustment of the number of layers of the convolutional neural network, the number of filters of the convolutional operation of each layer, the feature map fusion mode, the style of the nonlinear activation function and the like; another aspect is the tuning of the generalization capability of the model, such as the tuning of the regularization term parameters in the network, with the goal of improving the performance of the model on the test set (i.e., the unlearned samples). The possible adjustment of the training parameters generally includes several aspects, on one hand, the adjustment of the super parameters of the training process, such as the adjustment of the learning rate attenuation strategy and the initial size, the adjustment of the training batch size and the whole iteration number, etc.; on the other hand, the adjustment of the training loss function includes the adjustment of the loss function style and the super parameters involved in the loss function.
Taking fig. 3 as an example, the following describes the feature score map of the non-overlapping text line region and the feature score map of the overlapping text line region.
1) The non-overlapping text line region Score Map, as shown in fig. 4, has each pixel value representing the confidence that the pixel is located inside the text line region, normalized in the [0,1] interval.
2) The overlapping text line region Score Map, as shown in fig. 5, has a pixel value representing the confidence that the pixel is located in the overlapping text region, also normalized to the [0,1] interval.
The third step extracts the contours of the text regions and overlapped regions: based on connected-component analysis, the contour information of the non-overlapped and overlapped text line regions is obtained by innovatively combining the non-overlapped text line region score map, the overlapped text line region score map, and the score map of link information between pixels in the text regions.
The fourth step merges the contours of the overlapped text line regions with the contours of the non-overlapped text line regions: this step merges the contour map of each text line instance with the contour map of the overlapped regions, using connected-component analysis and multi-connected-domain merging to generate the complete contour of each text line region. In the connected-component analysis, adjacency information between pixels is creatively combined with the link information between adjacent pixels predicted by the fully convolutional network: two pixels are merged into one connected domain only when they are adjacent and the predicted link information between them is positive. In multi-connected-domain merging, a variable distance threshold strategy is innovatively adopted: with end-to-end detection accuracy as the criterion, the optimal distance threshold is obtained on the test set by dynamic distance threshold search. Once the optimal threshold is determined, a merge is performed whenever the distance between connected domains is within the threshold. As shown in fig. 6, the optimal threshold search process specifically includes:
and testing the full convolution example segmentation network through a variable distance threshold test set, setting a distance threshold search range interval, such as [0, 5], and performing optimal distance threshold search to obtain an optimal distance threshold.
The step of setting the distance threshold search range interval to perform the optimal distance threshold search comprises the following steps:
traversing a threshold interval, wherein the step length is 1; applying the threshold to perform text line outline combination; calculating the integral detection and segmentation precision of the test set; and storing the maximum precision and the optimal threshold, and taking the threshold as the optimal distance threshold.
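The search loop above can be sketched as a plain grid search. This is a minimal sketch under stated assumptions: `evaluate_accuracy` is a hypothetical callback that runs contour merging at a given threshold and returns the test-set accuracy; the patent only specifies the traversal with step size 1 and keeping the best threshold.

```python
def search_best_threshold(evaluate_accuracy, lo=0, hi=5, step=1):
    """Traverse [lo, hi] with the given step and keep the threshold that
    maximizes the end-to-end detection and segmentation accuracy."""
    best_thr, best_acc = lo, float("-inf")
    thr = lo
    while thr <= hi:
        acc = evaluate_accuracy(thr)
        if acc > best_acc:
            best_acc, best_thr = acc, thr
        thr += step
    return best_thr, best_acc
```

The callback hides all the merging work, so the same loop applies whatever distance metric the deployment uses.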
Experiments show that by combining the two innovative methods, the algorithm shows higher precision on the combination of the overlapped text outline and the text outline.
The fifth step performs quadrilateral fitting on each text line instance region to obtain the circumscribed quadrilateral of the text line, as shown in fig. 7.
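The quadrilateral fitting can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: it returns the axis-aligned circumscribed quadrilateral as the simplest stand-in; for rotated text lines a minimum-area rotated rectangle (e.g. OpenCV's cv2.minAreaRect followed by cv2.boxPoints) would be the natural fit, but the patent does not name a specific fitting routine.

```python
import numpy as np

def bounding_quad(points):
    """Axis-aligned circumscribed quadrilateral of a region's (y, x) points,
    returned clockwise as four (x, y) corners."""
    pts = np.asarray(points)
    y0, x0 = map(int, pts.min(axis=0))
    y1, x1 = map(int, pts.max(axis=0))
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
```

The four corners are the coordinates reported for each detected text line region.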
Fig. 8 shows some example results of positioning and segmenting overlapped text lines with the technical solution of the present invention. Experiments show that the method effectively solves the problem of positioning and segmenting overlapped text lines and completes tasks that traditional methods cannot. Moreover, good algorithm performance is achieved with only a small amount of training data, few training iterations, and simple post-processing, so the method has very high practical application value.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the above implementation method can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation method. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A method for positioning and segmenting overlapped text lines based on deep learning is characterized by comprising the following steps:
step 1, inputting an original image containing overlapped text lines, and preprocessing the original image;
step 2, training the example segmentation full convolution neural network, inputting the preprocessed original image into the trained example segmentation full convolution neural network, and outputting a non-overlapping text line region feature score image, an overlapping text line region feature score image and a link information feature score image among text line region pixels;
step 3, acquiring outlines of the non-overlapping text line region and the overlapping text line region by a connected domain analysis method based on the non-overlapping text line region feature score map, the overlapping text line region feature score map and the link information feature score map among the text line region pixels;
step 4, combining the non-overlapped text line region to the overlapped text line region according to the outlines of the non-overlapped text line region and the overlapped text line region;
and 5, performing quadrilateral fitting on the combined text line region to obtain an external quadrilateral of the text line region, and realizing the positioning segmentation of the overlapped text lines.
2. The method for positioning and segmenting overlapped text lines based on deep learning according to claim 1, wherein the step 1 specifically comprises: performing boundary completion on the input original image in units of N, then performing 1/M down-sampling to obtain the preprocessed original image, wherein M and N are integers not less than 1, and M is an integer multiple of N.
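One plausible reading of this preprocessing (an assumption on our part, since the claim does not fix N and M) is to pad the image so that its sides become multiples of M, then subsample by 1/M:

```python
import numpy as np

def preprocess(img, m=4):
    """Pad the image borders so height and width become multiples of m,
    then down-sample by 1/m.

    m = 4 is an illustrative value only; the claim merely requires M to
    be an integer multiple of N, with both at least 1.
    """
    h, w = img.shape[:2]
    pad_h, pad_w = (-h) % m, (-w) % m
    pads = ((0, pad_h), (0, pad_w)) + ((0, 0),) * (img.ndim - 2)
    padded = np.pad(img, pads)  # boundary completion with zeros
    return padded[::m, ::m]    # naive 1/m subsampling
```

Padding to a multiple of M keeps the down-sampled dimensions integral, which matches the stride requirements of a fully convolutional network.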
3. The method for positioning and segmenting overlapped text lines based on deep learning according to claim 1, wherein the step 2 specifically comprises:
step 21: labeling each sample image in the training sample set by using a quadrilateral to represent the outline of a text line area, and generating a labeled file with labels;
step 22: sending the label file and the sample images into the example segmentation full convolution network for training, wherein, in order to complete supervised learning of the overlapped text lines, the network automatically calculates the outline of the overlapped text line region according to the outlines of the text line regions in the label file and takes it as the supervised learning target for the overlapped text line region; combined with the outlines of the non-overlapped text line regions, the training process produces a preliminary training model;
step 23: testing the preliminary training model on a test sample set and evaluating the detection and segmentation precision for the non-overlapped and overlapped text line regions; if the precision requirement is met, terminating the training process and taking the preliminary training model as the trained example segmentation full convolution neural network; if the precision requirement is not met, increasing the training sample size, adjusting the structure and training parameters of the example segmentation full convolution network, and repeating the training process until a trained example segmentation full convolution neural network meeting the precision requirement is obtained;
step 24: inputting the preprocessed original image into a trained example segmentation full convolution neural network, and outputting a non-overlapping text line region feature score image, an overlapping text line region feature score image and a link information feature score image among text line region pixels.
4. The method for positioning and segmenting overlapped text lines based on deep learning according to claim 1, wherein the step 3 specifically comprises:
step 31: setting a first threshold value for the characteristic score map of the non-overlapping text line region, setting a second threshold value for the characteristic score map of the overlapping text line region, and setting a third threshold value for the characteristic score map of the link information between the pixels of the text line region;
step 32: performing binarization processing on the non-overlapping text line region characteristic score map according to a first threshold value, performing binarization processing on the overlapping text line region characteristic score map according to a second threshold value, performing binarization processing on the link information characteristic score map among the text line region pixels according to a third threshold value, obtaining non-overlapping text line region pixel points and background pixel points in the non-overlapping text line region characteristic score map, obtaining overlapping text line region pixel points and background pixel points in the overlapping text line region characteristic score map, and obtaining link state information and non-link state information in the link information characteristic score map among the text line region pixels;
step 33: and combining the link state information according to the pixel points of the non-overlapping text line region to obtain the pixel point region of the non-overlapping text line region, combining the link state information according to the pixel points of the overlapping text line region to obtain the pixel point region of the overlapping text line region, and expressing the outline of the pixel point region by using a connected domain.
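The thresholding of the three score maps in steps 31 and 32 can be sketched as follows; the 0.5 defaults are purely illustrative, since claim 5 only fixes the thresholds' value range:

```python
import numpy as np

def binarize_score_maps(nonoverlap, overlap, link, t1=0.5, t2=0.5, t3=0.5):
    """Binarize the three score maps produced by the network.

    Scores are assumed to lie in [0, 1]. Returns boolean masks:
    non-overlapping text line pixels, overlapping text line pixels,
    and positive link state between adjacent pixels; False entries
    correspond to background / non-link state.
    """
    return nonoverlap >= t1, overlap >= t2, link >= t3
```

The resulting masks feed directly into the connected domain analysis of step 33: foreground pixels come from the first two masks, and the third mask decides which adjacent pixels may be linked.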
5. The method according to claim 4, wherein the first threshold, the second threshold and the third threshold each have a value range of [0, 1].
6. The method for positioning and segmenting overlapped text lines based on deep learning according to claim 1, wherein the step 4 specifically comprises:
step 41: combining the pixel point regions which are not overlapped text line regions and the pixel point regions which are overlapped text line regions;
step 42: judging adjacent information between adjacent pixel points, combining a characteristic score chart of link information between pixels in a text line region, and merging the two pixel points into a connected domain when the two pixel points are adjacent and the link information of the two pixel points is positive;
step 43: and acquiring an optimal distance threshold on the variable distance threshold test set by adopting a strategy based on variable distance threshold merging and adopting a mode of dynamically searching the distance threshold according to the end-to-end detection precision, and merging if the distance between two connected domains is within the optimal distance threshold range.
7. The method for locating and segmenting the overlapped text lines based on the deep learning of claim 6, wherein the two adjacent pixel points are: the difference between the two pixel points is 1-3 pixels on the X-direction pixel coordinate axis or the Y-direction pixel coordinate axis.
8. An overlapped text line positioning and segmentation device based on deep learning is characterized by comprising the following components:
an original image input means for inputting an original image containing overlapping text lines, and preprocessing the original image;
the characteristic score graph output component is used for inputting the preprocessed original image into a trained example segmentation full convolution neural network and outputting a non-overlapping text line region characteristic score graph, an overlapping text line region characteristic score graph and a link information characteristic score graph among text line region pixels;
the outline acquisition component is used for acquiring outlines of the non-overlapped text line region and the overlapped text line region based on the non-overlapped text line region feature score map, the overlapped text line region feature score map and the link information feature score map among the text line region pixels by a connected domain analysis method;
the region merging component is used for merging the non-overlapped text line region into the overlapped text line region according to the outlines of the non-overlapped text line region and the overlapped text line region;
and the result output component is used for performing quadrilateral fitting on the text line region to obtain an external quadrilateral of the text line region and realize the positioning segmentation of the overlapped text lines.
9. A system for overlapping text line location segmentation based on deep learning, the system comprising:
a processor and a memory for storing executable instructions;
wherein the processor is configured to execute the executable instructions to perform the method of deep learning based overlapping text line location segmentation of any one of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon a computer program,
the computer program when executed by a processor implements a method of deep learning based overlapping text line localization segmentation according to any one of claims 1 to 7.
CN201911053860.0A 2019-10-31 2019-10-31 Overlapped text line positioning and segmentation method and system based on deep learning Active CN110956088B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911053860.0A CN110956088B (en) 2019-10-31 2019-10-31 Overlapped text line positioning and segmentation method and system based on deep learning

Publications (2)

Publication Number Publication Date
CN110956088A true CN110956088A (en) 2020-04-03
CN110956088B CN110956088B (en) 2023-06-30

Family

ID=69976607

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911053860.0A Active CN110956088B (en) 2019-10-31 2019-10-31 Overlapped text line positioning and segmentation method and system based on deep learning

Country Status (1)

Country Link
CN (1) CN110956088B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160180163A1 (en) * 2014-12-19 2016-06-23 Konica Minolta Laboratory U.S.A., Inc. Method for segmenting text words in document images using vertical projections of center zones of characters
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN109919025A (en) * 2019-01-30 2019-06-21 华南理工大学 Video scene Method for text detection, system, equipment and medium based on deep learning
CN109902622A (en) * 2019-02-26 2019-06-18 中国科学院重庆绿色智能技术研究院 A kind of text detection recognition methods for boarding pass information verifying

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHOU Xiangyu et al., "Research on YOLO-based oblique text localization in natural scenes", Computer Engineering and Applications *
WANG Tao et al., "Arbitrary-orientation text recognition based on semantic segmentation", Applied Science and Technology *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111539269A (en) * 2020-04-07 2020-08-14 北京达佳互联信息技术有限公司 Text region identification method and device, electronic equipment and storage medium
CN113515920A (en) * 2020-04-09 2021-10-19 北京庖丁科技有限公司 Method, electronic device and computer readable medium for extracting formula from table
CN112232336A (en) * 2020-09-02 2021-01-15 深圳前海微众银行股份有限公司 Certificate identification method, device, equipment and storage medium
CN114419641A (en) * 2022-03-15 2022-04-29 腾讯科技(深圳)有限公司 Training method and device of text separation model, electronic equipment and storage medium
CN114419641B (en) * 2022-03-15 2022-06-21 腾讯科技(深圳)有限公司 Training method and device of text separation model, electronic equipment and storage medium
CN116152842A (en) * 2022-11-18 2023-05-23 北京中卡信安电子设备有限公司 Certificate image processing method and device, storage medium and electronic equipment
CN116152842B (en) * 2022-11-18 2023-11-03 北京中卡信安电子设备有限公司 Certificate image processing method and device, storage medium and electronic equipment

Also Published As

Publication number Publication date
CN110956088B (en) 2023-06-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: 100083 office A-501, 5th floor, building 2, yard 1, Nongda South Road, Haidian District, Beijing
Applicant after: BEIJING YIDAO BOSHI TECHNOLOGY Co.,Ltd.
Address before: 100083 office a-701-1, a-701-2, a-701-3, a-701-4, a-701-5, 7th floor, building 2, No.1 courtyard, Nongda South Road, Haidian District, Beijing
Applicant before: BEIJING YIDAO BOSHI TECHNOLOGY Co.,Ltd.
CB03 Change of inventor or designer information
Inventor after: Wang Yong; Zhu Junmin; Kang Tiegang; Shi Wei
Inventor before: Wang Yong; Zhu Junmin; Kang Tiegang; Shi Wei
GR01 Patent grant