CN113642401A - Document line segmentation and classification method and system based on deep learning network - Google Patents

Document line segmentation and classification method and system based on deep learning network

Info

Publication number
CN113642401A
CN113642401A (application CN202110790181.2A)
Authority
CN
China
Prior art keywords
deep learning
learning network
network model
line segmentation
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110790181.2A
Other languages
Chinese (zh)
Inventor
汪昕 (Wang Xin)
郭骏 (Guo Jun)
闫科萍 (Yan Keping)
潘正颐 (Pan Zhengyi)
侯大为 (Hou Dawei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou Weiyizhi Technology Co Ltd
Original Assignee
Changzhou Weiyizhi Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou Weiyizhi Technology Co Ltd filed Critical Changzhou Weiyizhi Technology Co Ltd
Priority to CN202110790181.2A priority Critical patent/CN113642401A/en
Publication of CN113642401A publication Critical patent/CN113642401A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a document line segmentation and classification method and system based on a deep learning network, comprising the following steps: step S1: establishing a deep learning network model capable of segmenting text; step S2: training the deep learning network model with synthetic text pictures to obtain a trained deep learning network model; step S3: performing line segmentation and classification on documents with the trained deep learning network model. The method combines the sample prior probability with the A-Res algorithm to generate synthetic text, so that the deep learning network model completes its training on synthetic text; it achieves a better line segmentation effect than the histogram method, and its labor cost is lower than that of completing deep model training on manually labeled data.

Description

Document line segmentation and classification method and system based on deep learning network
Technical Field
The invention relates to the technical field of deep learning and computer vision, in particular to a document line segmentation and classification method and system based on a deep learning network, and more particularly to a deep learning network for line segmentation and classification of documents.
Background
Document line detection is an important sub-direction of the OCR field; its task is to locate the upper and lower boundaries of each document line and to label its category. Unlike the general object detection task, the input data has clear regularity. A complete algorithm application flow generally includes: collecting document pictures, annotating the pictures, training a model, and deploying the model.
In the current field of document line detection, a huge neural network structure is generally adopted in order to obtain higher line segmentation accuracy; the bottleneck is that its large number of parameters must be fitted with massive labeled samples. In addition, since the training set is always a subset of the real samples, under the independent and identically distributed assumption of machine learning, manpower must repeatedly be invested in labeling data in order to generalize the model to new unlabeled samples.
Patent document CN112257586A (application number: 202011135858.0) discloses a truth-box selection method, device, storage medium and apparatus in object detection, belonging to the technical field of image processing. The method comprises: acquiring a target feature map obtained after feature extraction on an image, the target feature map comprising a plurality of grids of a preset size; acquiring, in the target feature map, a plurality of detection boxes corresponding to each small target object in the image; for each detection box, calculating the centrality score of the grids whose predetermined points lie inside the detection box, the predetermined points being corner points and/or center points of the grids; and, for each small target object, determining the detection box with the maximum centrality score among its detection boxes as the truth box of that object. In contrast to that patent, the application object of the present invention is the line segmentation and classification of documents, whose particularity is that line segmentation only considers the split position in the vertical direction. This particularity means that the detection-box scheme of that patent cannot be transferred to the present invention, because a detection-box scheme considers split positions in both the horizontal and vertical directions. For document line segmentation and classification, the present invention is specially designed as follows: the square root is removed, and the horizontal-direction term is set to 1, which ensures that the centrality calculation is not affected by the different task.
Traditional modeling generally takes one of two forms: two-stage and one-stage. The two-stage approach first completes text line segmentation with a horizontal projection histogram algorithm and then completes text line classification with a statistical model or a deep model; its drawback is that the horizontal projection histogram is easily disturbed by noise, so the result is not robust. The one-stage approach labels the whole image and then uses a single deep model to complete segmentation and classification of the different text lines on one image; classical methods include Fast R-CNN, R-FCN, YOLO and FCOS.
For the document line segmentation and classification task, traditional two-stage modeling splits the task into two subtasks. Its weakness is that the horizontal projection histogram algorithm is easily disturbed by noise and produces low-confidence results. Fig. 1 shows the visualization result of the horizontal projection histogram algorithm on a line segmentation task.
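To make the noise sensitivity of that histogram baseline concrete, the following is a minimal Python sketch of a horizontal-projection line splitter; it is not part of the claimed invention, and the function name split_lines_by_projection and the noise_ratio threshold are illustrative assumptions.

```python
import numpy as np

def split_lines_by_projection(binary_page: np.ndarray, noise_ratio: float = 0.05):
    """Split a binarized page (text pixels = 1) into (top, bottom) line bands.
    Rows whose ink count stays below a noise threshold are treated as gaps;
    noise such as smudges or bleed-through shifts the profile and can merge
    two lines or split one line, which is the weakness described above.
    """
    profile = binary_page.sum(axis=1)            # horizontal projection histogram
    threshold = noise_ratio * profile.max()      # illustrative noise threshold
    is_text = profile > threshold

    bands, start = [], None
    for y, flag in enumerate(is_text):
        if flag and start is None:
            start = y                            # a text line starts here
        elif not flag and start is not None:
            bands.append((start, y))             # line ends at the first gap row
            start = None
    if start is not None:
        bands.append((start, len(is_text)))
    return bands
```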
Traditional one-stage modeling fully adopts deep model training and end-to-end modeling; the general process is: collecting data, labeling data, training the model, and running model prediction. However, a deep model contains a large number of parameters to fit, so a large amount of data must be annotated, and the cost of purely manual annotation is very high. In addition, machine learning assumes that data are independent and identically distributed, which means data of various distributions must be labeled to meet the generalization requirement; this again increases the labeling cost.
The problem to be solved here is how to ensure the accuracy of model prediction while minimizing the labeling cost. The approach adopted by the invention is to use synthetic text as the training set, so the problem becomes how to ensure the generalization performance of a model trained on synthetic text; the specific solution is introduced below.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide a document line segmentation and classification method and system based on a deep learning network.
The invention provides a document line segmentation and classification method based on a deep learning network, which comprises the following steps:
step S1: establishing a deep learning network model capable of segmenting texts;
step S2: training a deep learning network model by utilizing the synthetic text picture to obtain a trained deep learning network model;
step S3: and performing line segmentation and classification on the documents by using the trained deep learning network model.
Preferably, the deep learning network model comprises: a preset number of Convolution layers with ReLU activation functions, a preset number of BN layers, a preset number of BiLSTM layers, a preset number of Convolution layers, and a preset number of fully-connected layer branches.
Preferably, in the deep learning network model: a picture of a preset size is dimension-reduced through a preset number of Convolution layers with ReLU activation functions and BN layers; the dimension-reduced feature map is input, with rows as time steps, into a preset number of BiLSTM layers, and the output of the last time step is retained; the output feature map is reshaped into a feature map of a preset size, features are extracted through a preset number of Convolution layers to obtain an extracted feature map, and the extracted feature map is passed through three fully-connected layer branches to generate the localization, classification and centrality outputs respectively.
Preferably, the centrality is taken as:
centrality = min(exp(t*), exp(b*)) / max(exp(t*), exp(b*))
where t* represents the distance from the pixel point to the top of the Ground Truth box, and b* represents the distance from the pixel point to the bottom of the Ground Truth box.
Preferably, generating the synthetic text picture comprises: obtaining, through offline statistics, the probability distribution of each element in different documents, including font, character size, paragraph spacing and line spacing; and generating, with the A-Res algorithm combined with these prior probabilities, synthetic text pictures whose distribution is consistent with that of real documents.
The invention provides a document line segmentation and classification system based on a deep learning network, which comprises the following modules:
module M1: establishing a deep learning network model capable of segmenting texts;
module M2: training a deep learning network model by utilizing the synthetic text picture to obtain a trained deep learning network model;
module M3: and performing line segmentation and classification on the documents by using the trained deep learning network model.
Preferably, the deep learning network model comprises: a preset number of Convolution layers with ReLU activation functions, a preset number of BN layers, a preset number of BiLSTM layers, a preset number of Convolution layers, and a preset number of fully-connected layer branches.
Preferably, in the deep learning network model: a picture of a preset size is dimension-reduced through a preset number of Convolution layers with ReLU activation functions and BN layers; the dimension-reduced feature map is input, with rows as time steps, into a preset number of BiLSTM layers, and the output of the last time step is retained; the output feature map is reshaped into a feature map of a preset size, features are extracted through a preset number of Convolution layers to obtain an extracted feature map, and the extracted feature map is passed through three fully-connected layer branches to generate the localization, classification and centrality outputs respectively.
Preferably, the centrality is taken as:
centrality = min(exp(t*), exp(b*)) / max(exp(t*), exp(b*))
where t* represents the distance from the pixel point to the top of the Ground Truth box, and b* represents the distance from the pixel point to the bottom of the Ground Truth box.
Preferably, generating the synthetic text picture comprises: obtaining, through offline statistics, the probability distribution of each element in different documents, including font, character size, paragraph spacing and line spacing; and generating, with the A-Res algorithm combined with these prior probabilities, synthetic text pictures whose distribution is consistent with that of real documents.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a novel deep learning network model structure that introduces the idea of centrality into the text line detection task and ensures segmentation accuracy;
2. Synthetic text is generated with the A-Res algorithm combined with the sample prior probability, so the deep learning network model completes its training on synthetic text; this achieves a better line segmentation effect than the histogram method and a lower labor cost than completing deep model training on manually labeled data;
3. By providing a lightweight, end-to-end deep text detection network, the invention achieves 3 FPS on a Tesla K80 GPU for 1024 × 724 picture input.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a diagram illustrating the line segmentation effect of the horizontal projection histogram algorithm.
Fig. 2 is a schematic diagram of the line segmentation effect of the depth model according to the present invention.
Fig. 3 is a schematic structural diagram of a deep learning network model.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that various changes and modifications obvious to those skilled in the art can be made without departing from the spirit of the invention; all such changes and modifications fall within the scope of the present invention.
Example 1
The invention provides a document line segmentation and classification method based on a deep learning network, which comprises the following steps:
step S1: establishing a deep learning network model capable of segmenting text;
step S2: training the deep learning network model with synthetic text pictures to obtain a trained deep learning network model;
step S3: and performing line segmentation and classification on the documents by using the trained deep learning network model.
Specifically, the deep learning network model in step S1 comprises: a preset number of Convolution layers with ReLU activation functions, a preset number of BN layers, a preset number of BiLSTM layers, a preset number of Convolution layers, and a preset number of fully-connected layer branches.
Specifically, the deep learning network model in step S1 works as follows. As shown in Fig. 3, a grayscale picture of size 1024 × 724 is passed through two Convolution layers with ReLU activation functions and one BN layer; the purpose of this step is dimensionality reduction, because the original picture is too large to be fed into the BiLSTM layers directly. The resulting 4 × 256 × 181 feature map is then input into two BiLSTM layers with rows as time steps, and the output of the last time step is retained. The output 1024 × 1024 feature map is reshaped into a 16 × 256 feature map, features are extracted through 4 Convolution layers to obtain a 256 × 8 feature map, and this feature map is passed through three fully-connected layer branches to generate the localization, classification and centrality outputs.
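For illustration only, the following is a minimal PyTorch sketch of the topology just described. The patent's structural figure is not reproduced here, so the channel counts, BiLSTM hidden size, head shapes and the class name LineSegClassNet are assumptions rather than the exact claimed configuration.

```python
import torch
import torch.nn as nn

class LineSegClassNet(nn.Module):
    """Illustrative sketch: Conv+ReLU and BN for down-sampling, a 2-layer BiLSTM
    over image rows as time steps, Conv feature extraction, and three
    fully-connected heads for localization, classification and centrality."""

    def __init__(self, in_h=1024, in_w=724, channels=4, lstm_hidden=512, num_classes=4):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(1, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.BatchNorm2d(channels),
        )
        feat_w = in_w // 4                                   # 724 -> 181 columns
        self.rows = in_h // 4                                # 1024 -> 256 rows
        # each row of the down-sampled feature map is one BiLSTM time step
        self.rnn = nn.LSTM(channels * feat_w, lstm_hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.conv_head = nn.Sequential(
            nn.Conv1d(1, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv1d(64, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv1d(64, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv1d(64, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        feat_dim = 64 * (2 * lstm_hidden // 4)               # after two stride-2 convs
        self.loc_head = nn.Linear(feat_dim, 2 * self.rows)   # top/bottom offset per row
        self.cls_head = nn.Linear(feat_dim, num_classes * self.rows)
        self.ctr_head = nn.Linear(feat_dim, self.rows)       # centrality per row

    def forward(self, x):                                    # x: (B, 1, 1024, 724)
        f = self.down(x)                                     # (B, C, 256, 181)
        b, c, h, w = f.shape
        seq = f.permute(0, 2, 1, 3).reshape(b, h, c * w)     # rows as time steps
        out, _ = self.rnn(seq)                               # (B, 256, 2 * hidden)
        last = out[:, -1, :].unsqueeze(1)                    # keep last time step only
        feat = self.conv_head(last).flatten(1)
        return self.loc_head(feat), self.cls_head(feat), self.ctr_head(feat)

# Usage sketch: loc, cls, ctr = LineSegClassNet()(torch.zeros(1, 1, 1024, 724))
```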
In particular, a common text detection model requires only two outputs, localization and classification; here a centrality branch is added. The idea of a centrality branch was first proposed in the FCOS paper: centrality reflects how far a pixel is from the center of the Ground Truth and is ultimately used as the weight of the localization loss. This greatly enhances the segmentation precision of the model. The original formula is as follows:
centerness* = sqrt( (min(l*, r*) / max(l*, r*)) × (min(t*, b*) / max(t*, b*)) )    (1)
here, |*,t*,r*And b*The position of a target detection box is uniquely determined, and the text line segmentation task is introduced into the text line segmentation task of the invention, only proper simplification needs to be carried out, and a simplified centrality calculation formula is shown below.
centrality = min(t*, b*) / max(t*, b*)    (2)
Here t* represents the distance from the pixel point to the top of the Ground Truth box, and b* represents the distance from the pixel point to the bottom of the Ground Truth box.
The application object of the invention is document line segmentation and classification, whose particularity is that line segmentation only considers the split position in the vertical direction. For this particularity, the square root is removed from the existing centrality calculation and the horizontal-direction term is set to 1, which ensures that the centrality calculation is not affected by the different task.
In experiments, formula (2) cannot distinguish the weights of different positions well, so, drawing on the idea of the softmax classifier, t* and b* in formula (2) are replaced by exp(t*) and exp(b*), which strengthens the influence of the central point on the final prediction. Experiments prove that this is effective in our scenario. The modified formula is as follows:
centrality = min(exp(t*), exp(b*)) / max(exp(t*), exp(b*))    (3)
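As a small worked example of formula (3), the sketch below computes the centrality weight in Python; the helper name centrality is an assumption. Note that min(exp(t*), exp(b*)) / max(exp(t*), exp(b*)) simplifies to exp(-|t* - b*|), which avoids overflow of exp() for large pixel distances.

```python
import math

def centrality(t_star: float, b_star: float) -> float:
    """Centrality weight of a pixel from its distances to the top (t*) and
    bottom (b*) of the Ground Truth line box, per the modified formula (3):
        min(exp(t*), exp(b*)) / max(exp(t*), exp(b*))
    computed here in its equivalent, overflow-free form exp(-|t* - b*|).
    """
    return math.exp(-abs(t_star - b_star))

# A pixel exactly between top and bottom (t* == b*) gets weight 1.0;
# a pixel 3 px from the top and 5 px from the bottom gets exp(-2), about 0.135.
```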
Specifically, synthesizing the text pictures in step S2 proceeds as follows. The synthesized text must be as close to real text as possible. The font, character size, paragraph spacing and line spacing may differ between documents, and the occurrence of these document elements follows certain probability distributions. Through offline statistics, the probability distribution of each element in different documents, including font, character size, paragraph spacing and line spacing, is obtained; the A-Res algorithm is then used in combination with these prior probabilities to generate synthetic text pictures whose distribution is consistent with that of real documents, and the later experimental results prove the method effective.
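For illustration, the sketch below shows how A-Res (the weighted reservoir sampling algorithm of Efraimidis and Spirakis) can draw document elements according to such measured priors; the element names and prior values in the example are hypothetical and are not statistics disclosed by the patent.

```python
import heapq
import random

def a_res_sample(weighted_items, k):
    """A-Res weighted reservoir sampling: each item with weight w draws a key
    u ** (1 / w) with u ~ Uniform(0, 1); the k items with the largest keys are
    kept, so heavier items are proportionally more likely to be selected."""
    heap = []  # min-heap holding the k largest (key, item) pairs seen so far
    for item, weight in weighted_items:
        key = random.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, item))
    return [item for _, item in heap]

# Hypothetical prior over fonts measured offline; analogous priors would be
# sampled for character size, paragraph spacing and line spacing.
font_prior = [("SimSun", 0.55), ("SimHei", 0.25), ("KaiTi", 0.15), ("FangSong", 0.05)]
fonts_for_page = a_res_sample(font_prior, k=2)
```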
The invention provides a document line segmentation and classification system based on a deep learning network, which comprises the following modules:
module M1: establishing a deep learning network model capable of segmenting texts;
module M2: training a deep learning network model by utilizing the synthetic text picture to obtain a trained deep learning network model;
module M3: and performing line segmentation and classification on the documents by using the trained deep learning network model.
Specifically, the deep learning network model in module M1 comprises: a preset number of Convolution layers with ReLU activation functions, a preset number of BN layers, a preset number of BiLSTM layers, a preset number of Convolution layers, and a preset number of fully-connected layer branches.
Specifically, the deep learning network model in module M1 works as follows. As shown in Fig. 3, a grayscale picture of size 1024 × 724 is passed through two Convolution layers with ReLU activation functions and one BN layer; the purpose of this step is dimensionality reduction, because the original picture is too large to be fed into the BiLSTM layers directly. The resulting 4 × 256 × 181 feature map is then input into two BiLSTM layers with rows as time steps, and the output of the last time step is retained. The output 1024 × 1024 feature map is reshaped into a 16 × 256 feature map, features are extracted through 4 Convolution layers to obtain a 256 × 8 feature map, and this feature map is passed through three fully-connected layer branches to generate the localization, classification and centrality outputs.
In particular, a common text detection model requires only two outputs, localization and classification; here a centrality branch is added. The idea of a centrality branch was first proposed in the FCOS paper: centrality reflects how far a pixel is from the center of the Ground Truth and is ultimately used as the weight of the localization loss. This greatly enhances the segmentation precision of the model. The original formula is as follows:
centerness* = sqrt( (min(l*, r*) / max(l*, r*)) × (min(t*, b*) / max(t*, b*)) )    (1)
here, |*,t*,r*And b*The position of a target detection box is uniquely determined, and the text line segmentation task is introduced into the text line segmentation task of the invention, only proper simplification needs to be carried out, and a simplified centrality calculation formula is shown below.
centrality = min(t*, b*) / max(t*, b*)    (2)
Here t* represents the distance from the pixel point to the top of the Ground Truth box, and b* represents the distance from the pixel point to the bottom of the Ground Truth box.
The application object of the invention is document line segmentation and classification, whose particularity is that line segmentation only considers the split position in the vertical direction. For this particularity, the square root is removed from the existing centrality calculation and the horizontal-direction term is set to 1, which ensures that the centrality calculation is not affected by the different task.
In experiments, formula (2) cannot distinguish the weights of different positions well, so, drawing on the idea of the softmax classifier, t* and b* in formula (2) are replaced by exp(t*) and exp(b*), which strengthens the influence of the central point on the final prediction. Experiments prove that this is effective in our scenario. The modified formula is as follows:
centrality = min(exp(t*), exp(b*)) / max(exp(t*), exp(b*))    (3)
Specifically, synthesizing the text pictures in module M2 proceeds as follows. The synthesized text must be as close to real text as possible. The font, character size, paragraph spacing and line spacing may differ between documents, and the occurrence of these document elements follows certain probability distributions. Through offline statistics, the probability distribution of each element in different documents, including font, character size, paragraph spacing and line spacing, is obtained; the A-Res algorithm is then used in combination with these prior probabilities to generate synthetic text pictures whose distribution is consistent with that of real documents, and the later experimental results prove the method effective.
In the experiments, two English novels are used as real samples. As shown in Fig. 2 and Tables 1 and 2, the trained deep learning network model achieves 98.1% line segmentation accuracy and 99.5% classification accuracy on Pride and Prejudice, and 98.5% line segmentation accuracy and 99.7% classification accuracy on The Secret of Plato's Atlantis.
Table 1: comparison of precision on Pride and Prejudge by different methods
(Table 1 appears only as an image in the original publication; its content is not reproduced here.)
Table 2: comparison Of The precision Of The different methods on The Secret Of Plato's Atlantis
(Table 2 appears only as an image in the original publication; its content is not reproduced here.)
Those skilled in the art will appreciate that, in addition to implementing the systems, apparatus, and various modules thereof provided by the present invention in purely computer readable program code, the same procedures can be implemented entirely by logically programming method steps such that the systems, apparatus, and various modules thereof are provided in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system, the device and the modules thereof provided by the present invention can be considered as a hardware component, and the modules included in the system, the device and the modules thereof for implementing various programs can also be considered as structures in the hardware component; modules for performing various functions may also be considered to be both software programs for performing the methods and structures within hardware components.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. A document line segmentation and classification method based on a deep learning network is characterized by comprising the following steps:
step S1: establishing a deep learning network model capable of segmenting texts;
step S2: training a deep learning network model by utilizing the synthetic text picture to obtain a trained deep learning network model;
step S3: and performing line segmentation and classification on the documents by using the trained deep learning network model.
2. The deep learning network-based document line segmentation and classification method according to claim 1, wherein the deep learning network model comprises: a preset number of Convolution layers with ReLU activation functions, a preset number of BN layers, a preset number of BiLSTM layers, a preset number of Convolution layers, and a preset number of fully-connected layer branches.
3. The deep learning network-based document line segmentation and classification method according to claim 2, wherein in the deep learning network model: a picture of a preset size is dimension-reduced through a preset number of Convolution layers with ReLU activation functions and BN layers; the dimension-reduced feature map is input, with rows as time steps, into a preset number of BiLSTM layers, and the output of the last time step is retained; the output feature map is reshaped into a feature map of a preset size, features are extracted through a preset number of Convolution layers to obtain an extracted feature map, and the extracted feature map is passed through three fully-connected layer branches to generate the localization, classification and centrality outputs respectively.
4. The deep learning network-based document line segmentation and classification method according to claim 3, wherein the centrality is as follows:
centrality = min(exp(t*), exp(b*)) / max(exp(t*), exp(b*))
where t* represents the distance from the pixel point to the top of the Ground Truth box, and b* represents the distance from the pixel point to the bottom of the Ground Truth box.
5. The deep learning network-based document line segmentation and classification method according to claim 1, wherein generating the synthetic text picture comprises: obtaining, through offline statistics, the probability distribution of each element in different documents, including font, character size, paragraph spacing and line spacing; and generating, with the A-Res algorithm combined with these prior probabilities, synthetic text pictures whose distribution is consistent with that of real documents.
6. A system for document line segmentation and classification based on a deep learning network, comprising:
module M1: establishing a deep learning network model capable of segmenting texts;
module M2: training a deep learning network model by utilizing the synthetic text picture to obtain a trained deep learning network model;
module M3: and performing line segmentation and classification on the documents by using the trained deep learning network model.
7. The deep learning network-based document line segmentation and classification system of claim 6, wherein the deep learning network model comprises: a preset number of Convolution layers with ReLU activation functions, a preset number of BN layers, a preset number of BiLSTM layers, a preset number of Convolution layers, and a preset number of fully-connected layer branches.
8. The deep learning network-based document line segmentation and classification system of claim 7, wherein in the deep learning network model: a picture of a preset size is dimension-reduced through a preset number of Convolution layers with ReLU activation functions and BN layers; the dimension-reduced feature map is input, with rows as time steps, into a preset number of BiLSTM layers, and the output of the last time step is retained; the output feature map is reshaped into a feature map of a preset size, features are extracted through a preset number of Convolution layers to obtain an extracted feature map, and the extracted feature map is passed through three fully-connected layer branches to generate the localization, classification and centrality outputs respectively.
9. The deep learning network-based document line segmentation and classification system according to claim 8, wherein the centrality is computed as:
centrality = min(exp(t*), exp(b*)) / max(exp(t*), exp(b*))
where t* represents the distance from the pixel point to the top of the Ground Truth box, and b* represents the distance from the pixel point to the bottom of the Ground Truth box.
10. The deep learning network-based document line segmentation and classification system of claim 6, wherein generating the synthetic text picture comprises: obtaining, through offline statistics, the probability distribution of each element in different documents, including font, character size, paragraph spacing and line spacing; and generating, with the A-Res algorithm combined with these prior probabilities, synthetic text pictures whose distribution is consistent with that of real documents.
CN202110790181.2A 2021-07-13 2021-07-13 Document line segmentation and classification method and system based on deep learning network Pending CN113642401A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110790181.2A CN113642401A (en) 2021-07-13 2021-07-13 Document line segmentation and classification method and system based on deep learning network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110790181.2A CN113642401A (en) 2021-07-13 2021-07-13 Document line segmentation and classification method and system based on deep learning network

Publications (1)

Publication Number Publication Date
CN113642401A true CN113642401A (en) 2021-11-12

Family

ID=78417193

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110790181.2A Pending CN113642401A (en) 2021-07-13 2021-07-13 Document line segmentation and classification method and system based on deep learning network

Country Status (1)

Country Link
CN (1) CN113642401A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016547A (en) * 2020-08-20 2020-12-01 上海天壤智能科技有限公司 Image character recognition method, system and medium based on deep learning
CN113065396A (en) * 2021-03-02 2021-07-02 国网湖北省电力有限公司 Automatic filing processing system and method for scanned archive image based on deep learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112016547A (en) * 2020-08-20 2020-12-01 上海天壤智能科技有限公司 Image character recognition method, system and medium based on deep learning
CN113065396A (en) * 2021-03-02 2021-07-02 国网湖北省电力有限公司 Automatic filing processing system and method for scanned archive image based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XIN WANG ET AL: "A Model for Text Line Segmentation and Classification in Printed Documents", ICMLC 2021: 2021 13th International Conference on Machine Learning and Computing *
LI XIAOSHUANG ET AL: "Multimodal Music Emotion Classification Based on Optimized Residual Network", Computer and Modernization *
GUO XIAOYONG: "Research on News Recommendation Algorithm Based on Deep Learning", China Master's Theses Full-text Database (Information Science and Technology) *

Similar Documents

Publication Publication Date Title
Dong et al. PGA-Net: Pyramid feature fusion and global context attention network for automated surface defect detection
Zhang et al. Research on face detection technology based on MTCNN
CN107358262B (en) High-resolution image classification method and classification device
US8379994B2 (en) Digital image analysis utilizing multiple human labels
CN111401371A (en) Text detection and identification method and system and computer equipment
CN108764242A (en) Off-line Chinese Character discrimination body recognition methods based on deep layer convolutional neural networks
CN111832573B (en) Image emotion classification method based on class activation mapping and visual saliency
CN111967313A (en) Unmanned aerial vehicle image annotation method assisted by deep learning target detection algorithm
CN109886159B (en) Face detection method under non-limited condition
CN111951154B (en) Picture generation method and device containing background and medium
CN112766170B (en) Self-adaptive segmentation detection method and device based on cluster unmanned aerial vehicle image
CN113689436A (en) Image semantic segmentation method, device, equipment and storage medium
CN114519819B (en) Remote sensing image target detection method based on global context awareness
CN113205047A (en) Drug name identification method and device, computer equipment and storage medium
CN112508000B (en) Method and equipment for generating OCR image recognition model training data
CN107918759A (en) Automatic segmentation recognition method, electronic equipment and the storage medium of indoor object
CN116778164A (en) Semantic segmentation method for improving deep V < 3+ > network based on multi-scale structure
KR102026280B1 (en) Method and system for scene text detection using deep learning
CN113642401A (en) Document line segmentation and classification method and system based on deep learning network
CN108898188A (en) A kind of image data set aid mark system and method
CN116912872A (en) Drawing identification method, device, equipment and readable storage medium
CN114926851A (en) Method, system and storage medium for identifying table structure in table picture
CN113191942A (en) Method for generating image, method for training human detection model, program, and device
CN112926670A (en) Garbage classification system and method based on transfer learning
CN112395834A (en) Brain graph generation method, device and equipment based on picture input and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20211112