CN111126128A - Method for detecting and dividing document layout area - Google Patents
- Publication number
- CN111126128A (application CN201911036942.4A)
- Authority
- CN
- China
- Prior art keywords
- detection model
- detection
- document
- data set
- detecting
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
- G06V10/225—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/153—Segmentation of character regions using recognition of characters or words
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/416—Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors
Abstract
The invention provides a method for detecting and segmenting document layout areas. The method acquires document pictures and builds a training data set; creates a first detection model and trains it on the training data set to obtain a trained second detection model; and then detects and segments the document picture to be processed using the second detection model. This achieves automatic detection and segmentation of document pictures with high accuracy.
Description
Technical Field
The invention relates to the technical field of image detection, and in particular to a method for detecting and dividing document layout areas.
Background
Current OCR technology typically first recognizes all the text in an entire picture and then analyzes the content to extract useful information. When documents are digitized with OCR and turned into electronic books, it is not enough to detect the recognized characters; the layout of the original book must also be preserved. This requires determining the effective content area, framed areas (such as black boxes), headers, footers, and the like within a page. Because layouts vary widely across documents, they are difficult to divide with hand-written rules, and no existing technology achieves automatic layout division.
Therefore, a method for detecting and dividing document layout areas is needed that can automatically detect and divide document pictures with high accuracy.
Disclosure of Invention
Technical problem to be solved
To solve the above problems in the prior art, the present invention provides a method for detecting and segmenting document layout areas that achieves automatic detection and segmentation of document pictures with high accuracy.
(II) technical scheme
To achieve this purpose, the invention mainly adopts the following technical scheme:
A method for detecting and dividing document layout areas comprises the following steps:
S1, acquiring a document picture and establishing a training data set;
S2, creating a first detection model, and training the first detection model through the training data set to obtain a trained second detection model; and
S3, detecting and segmenting the document picture to be detected and segmented according to the second detection model.
(III) advantageous effects
The beneficial effects of the invention are as follows: a training data set is built from acquired document pictures; a first detection model is created and trained on the training data set to obtain a trained second detection model; and the document picture to be processed is detected and segmented with the second detection model. Automatic detection and segmentation of document pictures is thereby achieved with high accuracy.
Drawings
Fig. 1 is a flowchart of a method for detecting and dividing a document layout area according to an embodiment of the present invention.
Detailed Description
For the purpose of better explaining the present invention and to facilitate understanding, the present invention will be described in detail by way of specific embodiments with reference to the accompanying drawings.
A method for detecting and dividing a document layout area comprises the following steps:
S1, acquiring a document picture and establishing a training data set;
S2, creating a first detection model, and training the first detection model through the training data set to obtain a trained second detection model; and
S3, detecting and segmenting the document picture to be detected and segmented according to the second detection model.
As can be seen from the above description, a training data set is established by acquiring document pictures; a first detection model is created and trained on the training data set to obtain a trained second detection model; and the document picture to be processed is detected and segmented with the second detection model, achieving automatic detection and segmentation of document pictures with high accuracy.
Further, step S1 is specifically:
acquiring document pictures with different formats and establishing a first detection data set.
Further, step S1 further includes:
labeling the pictures in the first detection data set to obtain a second detection data set.
According to the above description, acquiring document pictures with different formats and labeling the pictures in the first detection data set to obtain a second detection data set improves the accuracy of the subsequent detection and segmentation.
Further, step S2 is specifically:
creating a first neural network YOLO V3 detection model, and training the first neural network YOLO V3 detection model through the training data set to obtain a trained second neural network YOLO V3 detection model.
From the above description, it can be seen that creating the first neural network YOLO V3 detection model and training it on the training data set to obtain the trained second neural network YOLO V3 detection model improves both the efficiency and the accuracy of the detection and segmentation.
Further, the training of the first neural network YOLO V3 detection model through the training data set specifically includes:
training the first neural network YOLO V3 detection model through the second detection data set.
As can be seen from the above description, training the first neural network YOLO V3 detection model on the second detection data set ensures the accuracy of the trained model's detection and segmentation.
Further, step S3 is specifically:
detecting and segmenting the document picture to be detected and segmented according to the second neural network YOLO V3 detection model.
From the above description, detecting and segmenting the document picture according to the second neural network YOLO V3 detection model improves both the accuracy and the efficiency of the detection and segmentation.
Further, before the document picture to be detected and segmented is processed according to the second detection model, the method further comprises:
carrying out standardization processing on the document picture to be detected and segmented.
As can be seen from the above description, standardizing the document picture to be detected and segmented helps improve the accuracy of the detection and segmentation.
Example one
Referring to Fig. 1, a method for detecting and dividing a document layout area includes the steps of:
S1, acquiring a document picture and establishing a training data set.
Step S1 specifically includes:
acquiring document pictures with different formats and establishing a first detection data set.
Step S1 further includes:
labeling the pictures in the first detection data set to obtain a second detection data set.
S2, creating a first detection model, and training the first detection model through the training data set to obtain a trained second detection model.
Step S2 specifically includes:
creating a first neural network YOLO V3 detection model, and training it through the training data set to obtain a trained second neural network YOLO V3 detection model.
The training of the first neural network YOLO V3 detection model through the training data set specifically includes:
training the first neural network YOLO V3 detection model through the second detection data set.
S3, detecting and segmenting the document picture to be detected and segmented according to the second detection model.
Step S3 specifically includes:
detecting and segmenting the document picture to be detected and segmented according to the second neural network YOLO V3 detection model.
Before the document picture to be detected and segmented is processed according to the second detection model, the method further comprises:
carrying out standardization processing on the document picture to be detected and segmented.
Example two
This embodiment differs from the first embodiment in that it describes, with reference to a specific application scenario, how the method for detecting and dividing document layout areas is implemented:
One, collecting data
Acquiring a document picture, and establishing a training data set;
acquiring document pictures with different formats and establishing a first detection data set.
Specifically, document pictures of various layouts are collected according to business requirements, then analyzed and sorted. As many pictures and as many layout types as possible should be gathered; the data volume is on the order of tens of thousands of pictures.
Two, marking data
The pictures in the first detection data set are labeled to obtain a second detection data set.
Specifically, each picture is annotated with a labeling tool by manually dividing its layout areas. The coordinates of every layout area of a picture are recorded in a TXT file that serves as the picture's region label, with one label file per picture. The label file content format is as follows:
X1,Y1,X2,Y2
Table 1. Description of the label file contents
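As a hypothetical illustration, a per-image label file in this format could be read back as follows; the filename convention and integer coordinates are assumptions, since the patent only specifies the `X1,Y1,X2,Y2` line format:

```python
# Hypothetical sketch: parse one per-image label file in the
# "X1,Y1,X2,Y2" format described above, one region per line.
# The filename convention is an assumption, not specified by the patent.

def parse_label_file(path):
    """Return a list of (x1, y1, x2, y2) region boxes from a TXT label file."""
    boxes = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            x1, y1, x2, y2 = (int(v) for v in line.split(","))
            boxes.append((x1, y1, x2, y2))
    return boxes
```

Each tuple then serves directly as one region label for the corresponding picture during training.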
Three, training the model
A first neural network YOLO V3 detection model is created and trained through the training data set to obtain a trained second neural network YOLO V3 detection model.
The YOLO V3 framework is built with a 105-layer structure and its main hyper-parameters are defined; it consists of the darknet-53 feature-extraction module and the feature interaction layers of the YOLO network.
darknet-53: layers 0 through 74 comprise 53 convolutional layers; the remainder are res (residual) layers. The convolutional layers extract image features, while the res layers counteract vanishing or exploding gradients in the network. It serves as the backbone of YOLO V3 feature extraction and uses a series of 3 × 3 and 1 × 1 convolutions.
Feature interaction layers: layers 75 through 105 form the network's feature interaction stage, divided into three scales. Within each scale, local feature interaction between feature maps is realized with convolution kernels (3 × 3 and 1 × 1); the effect is similar to a fully-connected layer, which realizes global feature interaction.
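The three scales can be made concrete with a small sketch. Assuming the standard YOLO V3 strides of 32, 16 and 8 (values the patent does not state), the 416 × 416 input used by this method produces 13 × 13, 26 × 26 and 52 × 52 output grids:

```python
# Illustrative sketch: grid sizes of the three YOLO V3 detection scales.
# Strides of 32, 16 and 8 are the standard YOLO V3 values and are an
# assumption here; the patent itself does not list them.

def yolo_v3_grid_sizes(input_size=416, strides=(32, 16, 8)):
    """Return the (rows, cols) shape of each of the three output scales."""
    return [(input_size // s, input_size // s) for s in strides]

# yolo_v3_grid_sizes() → [(13, 13), (26, 26), (52, 52)]
```

The coarse 13 × 13 grid favors large layout regions (e.g. whole content areas), while the 52 × 52 grid helps with small elements such as headers and footers.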
The first neural network YOLO V3 detection model is then trained through the second detection data set.
Four, applying the detection model
The document picture to be detected and segmented is processed according to the second neural network YOLO V3 detection model.
Specifically, after the model training stage, the trained second neural network YOLO V3 detection model is obtained, and pictures are passed to it to invoke the detection and segmentation function.
The process flow of applying the second detection model is as follows:
1. Standardized preprocessing is first carried out on the picture to be detected and segmented that is input by the user.
Before the document picture to be detected and segmented is processed according to the second detection model, the method further comprises:
carrying out standardization processing on the document picture to be detected and segmented.
The second detection model imposes strict requirements on the size and format of the input picture, but pictures supplied by users vary widely in size and format, so the input picture must be standardized.
The preprocessing comprises the following steps:
the pictures are uniformly converted into RGB format;
the size is uniformly scaled to 416 × 416;
and the mean value is uniformly subtracted from the picture content.
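A minimal, dependency-free sketch of this standardization; a real pipeline would use PIL or OpenCV for the RGB conversion and resizing, so only the mean-subtraction step is shown concretely, on a flat list of pixel values:

```python
# Sketch of the standardisation step above. TARGET_SIZE comes from the
# text (416 x 416); per-image mean subtraction is an assumption about
# which mean is meant, as the patent only says "subtract the mean value".

TARGET_SIZE = (416, 416)  # input size expected by the detection model

def subtract_mean(pixels):
    """Center pixel values by subtracting their per-image mean."""
    mean = sum(pixels) / len(pixels)
    return [p - mean for p in pixels]

# subtract_mean([0, 100, 200]) → [-100.0, 0.0, 100.0]
```

After these steps every input picture has the same size, channel layout, and zero-centered values, which is what the fixed-input YOLO V3 model requires.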
2. The second detection model is invoked.
The preprocessed picture is fed into the trained second detection model, which outputs detection data (x, y, w, h, confidence).
The second detection model outputs the following data:
Table 2. Description of the second detection model's output data
3. Data post-processing
The raw data obtained when the second detection model processes the picture to be detected and segmented is not suitable for direct use by the user, so it is post-processed into corner coordinates of the form:
(x1,y1,x2,y2)
Data obtained by the post-processing:
Table 3. Description of the post-processed output of the second detection model
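A small sketch of this post-processing step, assuming (as is usual for YOLO-style detectors, though the patent does not say so explicitly) that (x, y, w, h) is a box center plus width and height:

```python
# Hedged sketch of the post-processing: convert the model's (x, y, w, h)
# output -- assumed to be a box centre plus width and height -- into the
# (x1, y1, x2, y2) corner form handed to the user.

def to_corners(x, y, w, h):
    """Convert a centre-format box to (x1, y1, x2, y2) corners."""
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)

# to_corners(100, 50, 40, 20) → (80.0, 40.0, 120.0, 60.0)
```

Corner coordinates can be used directly to crop the layout region out of the original picture.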
4. The related data is written into a JSON file.
So that the user can make better use of the results, the data related to the picture and the data obtained from the second detection model are integrated and written into a single JSON file.
The JSON file format is as follows:
Table 4. Description of the JSON file contents
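A hypothetical sketch of this final step using Python's standard `json` module; the field names ("image", "regions", "confidence") are assumptions, since Table 4, which defines the actual format, is not reproduced here:

```python
# Hypothetical sketch: merge image metadata with the detected regions
# and write one JSON file per picture. All field names are assumed,
# not taken from the patent's Table 4.

import json

def write_result_json(path, image_name, boxes):
    """Write detected layout regions for one image to a JSON file."""
    record = {
        "image": image_name,
        "regions": [
            {"x1": x1, "y1": y1, "x2": x2, "y2": y2, "confidence": conf}
            for (x1, y1, x2, y2, conf) in boxes
        ],
    }
    with open(path, "w") as f:
        json.dump(record, f, indent=2)
```

A single file per picture keeps the picture data and all of its detected regions together for downstream consumers.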
The above description covers only embodiments of the present invention and is not intended to limit its scope; all equivalent changes made using the contents of this specification and the drawings, whether applied directly or indirectly in related technical fields, fall within the scope of the present invention.
Claims (7)
1. A method for detecting and dividing a document layout area, characterized by comprising the following steps:
S1, acquiring a document picture and establishing a training data set;
S2, creating a first detection model, and training the first detection model through the training data set to obtain a trained second detection model; and
S3, detecting and segmenting the document picture to be detected and segmented according to the second detection model.
2. The method for detecting and dividing a document layout area according to claim 1, wherein step S1 is specifically:
acquiring document pictures with different formats and establishing a first detection data set.
3. The method for detecting and dividing a document layout area according to claim 2, wherein step S1 further comprises:
labeling the pictures in the first detection data set to obtain a second detection data set.
4. The method for detecting and dividing a document layout area according to claim 1, wherein step S2 is specifically:
creating a first neural network YOLO V3 detection model, and training the first neural network YOLO V3 detection model through the training data set to obtain a trained second neural network YOLO V3 detection model.
5. The method for detecting and dividing a document layout area according to claim 4, wherein the training of the first neural network YOLO V3 detection model through the training data set specifically comprises:
training the first neural network YOLO V3 detection model through the second detection data set.
6. The method for detecting and dividing a document layout area according to claim 4, wherein step S3 is specifically:
detecting and segmenting the document picture to be detected and segmented according to the second neural network YOLO V3 detection model.
7. The method for detecting and dividing a document layout area according to claim 1, wherein before the detecting and segmenting of the document picture to be detected and segmented according to the second detection model, the method further comprises:
carrying out standardization processing on the document picture to be detected and segmented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911036942.4A CN111126128A (en) | 2019-10-29 | 2019-10-29 | Method for detecting and dividing document layout area |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111126128A true CN111126128A (en) | 2020-05-08 |
Family
ID=70495434
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911036942.4A Pending CN111126128A (en) | 2019-10-29 | 2019-10-29 | Method for detecting and dividing document layout area |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111126128A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060222239A1 (en) * | 2005-03-31 | 2006-10-05 | Bargeron David M | Systems and methods for detecting text |
CN108921152A (en) * | 2018-06-29 | 2018-11-30 | 清华大学 | English character cutting method and device based on object detection network |
CN109800756A (en) * | 2018-12-14 | 2019-05-24 | 华南理工大学 | A kind of text detection recognition methods for the intensive text of Chinese historical document |
CN110020615A (en) * | 2019-03-20 | 2019-07-16 | 阿里巴巴集团控股有限公司 | The method and system of Word Input and content recognition is carried out to picture |
CN110348280A (en) * | 2019-03-21 | 2019-10-18 | 贵州工业职业技术学院 | Water book character recognition method based on CNN artificial neural |
Legal Events
Date | Code | Title | Description |
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200508 |