CN111126128A - Method for detecting and dividing document layout area - Google Patents

Method for detecting and dividing document layout area

Info

Publication number
CN111126128A
CN111126128A
Authority
CN
China
Prior art keywords
detection model
detection
document
data set
detecting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911036942.4A
Other languages
Chinese (zh)
Inventor
张�雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Cross Strait Information Technology Co Ltd
Original Assignee
Fujian Cross Strait Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Cross Strait Information Technology Co Ltd filed Critical Fujian Cross Strait Information Technology Co Ltd
Priority to CN201911036942.4A
Publication of CN111126128A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/225 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/416 Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method for detecting and segmenting document layout areas, comprising: obtaining document pictures and establishing a training data set; creating a first detection model and training it on the training data set to obtain a trained second detection model; and detecting and segmenting the document picture to be processed with the second detection model. The method enables automatic detection and segmentation of document pictures with high accuracy.

Description

Method for detecting and dividing document layout area
Technical Field
The invention relates to the technical field of image detection, in particular to a method for detecting and dividing a document layout area.
Background
Current OCR technology typically first recognizes all the text in an entire picture and then analyzes the content to extract useful information. When documents are digitized with OCR and made into electronic books, it is not enough to detect and recognize the characters: the layout of the original book must also be preserved, so the effective content area, framed areas (such as black boxes), headers, footers, and the like on each page must be identified. Because different documents have widely varying layouts, the layout is difficult to divide with hand-written rules, and no existing technology achieves automatic layout division.
Therefore, a method for detecting and dividing document layout areas is needed that can automatically detect and segment document pictures with high accuracy.
Disclosure of Invention
Technical problem to be solved
To solve the above problems in the prior art, the present invention provides a method for detecting and segmenting document layout areas that achieves automatic detection and segmentation of document pictures with high accuracy.
(II) technical scheme
To achieve this purpose, the invention adopts the following main technical solution:
a method for detecting and dividing document layout areas comprises the following steps:
s1, acquiring a document picture, and establishing a training data set;
s2, creating a first detection model, and training the first detection model through the training data set to obtain a trained second detection model;
and S3, detecting and segmenting the document picture to be detected and segmented according to the second detection model.
(III) advantageous effects
The invention has the following beneficial effects: a document picture is obtained and a training data set is established; a first detection model is created and trained on the training data set to obtain a trained second detection model; and the document picture to be processed is detected and segmented with the second detection model. This enables automatic detection and segmentation of document pictures with high accuracy.
Drawings
Fig. 1 is a flowchart of a method for detecting and dividing a document layout area according to an embodiment of the present invention.
Detailed Description
For the purpose of better explaining the present invention and to facilitate understanding, the present invention will be described in detail by way of specific embodiments with reference to the accompanying drawings.
A method for detecting and dividing a document layout area is characterized by comprising the following steps:
s1, acquiring a document picture, and establishing a training data set;
s2, creating a first detection model, and training the first detection model through the training data set to obtain a trained second detection model;
and S3, detecting and segmenting the document picture to be detected and segmented according to the second detection model.
As can be seen from the above description, a training data set is established by acquiring document pictures; a first detection model is created and trained on the training data set to obtain a trained second detection model; and the document picture to be processed is detected and segmented with the second detection model, enabling automatic detection and segmentation of document pictures with high accuracy.
Further, step S1 is specifically:
acquiring document pictures with different layouts, and establishing a first detection data set.
Further, step S1 further includes:
and marking the pictures in the first detection data set to obtain a second detection data set.
According to the above description, acquiring document pictures with different layouts and labeling the pictures in the first detection data set to obtain the second detection data set improves the accuracy of subsequent detection and segmentation.
Further, step S2 is specifically:
and creating a first neural network YOLO V3 detection model, and training the first neural network YOLO V3 detection model through the training data set to obtain a trained second neural network YOLO V3 detection model.
From the above description, it can be seen that creating the first neural network YOLO V3 detection model and training it on the training data set to obtain the trained second neural network YOLO V3 detection model improves the efficiency and accuracy of detection and segmentation.
Further, the training the first neural network YOLO V3 detection model through the training data set specifically includes:
training the first neural network YOLO V3 detection model through the second detection data set.
As can be seen from the above description, the first neural network YOLO V3 detection model is trained through the second detection data set, so that the accuracy of detection and segmentation of the trained model is ensured.
Further, step S3 is specifically:
and carrying out detection segmentation on the document picture to be detected and segmented according to the second neural network YOLO V3 detection model.
From the above description, detecting and segmenting the document picture with the second neural network YOLO V3 detection model improves the accuracy and efficiency of document-picture detection and segmentation.
Further, before detecting and segmenting the document picture according to the second detection model, the method further comprises:
and carrying out standardization processing on the document picture to be detected and segmented.
As can be seen from the above description, standardizing the document picture before detection and segmentation helps improve accuracy.
Example one
Referring to fig. 1, a method for detecting and dividing a document layout area includes the steps of:
s1, acquiring a document picture, and establishing a training data set;
step S1 specifically includes:
acquiring document pictures with different layouts, and establishing a first detection data set.
Step S1 further includes:
and marking the pictures in the first detection data set to obtain a second detection data set.
S2, creating a first detection model, and training the first detection model through the training data set to obtain a trained second detection model;
step S2 specifically includes:
and creating a first neural network YOLO V3 detection model, and training the first neural network YOLO V3 detection model through the training data set to obtain a trained second neural network YOLO V3 detection model.
The training of the first neural network YOLO V3 detection model by the training data set specifically includes:
training the first neural network YOLO V3 detection model through the second detection data set.
And S3, detecting and segmenting the document picture to be detected and segmented according to the second detection model.
Step S3 specifically includes:
and carrying out detection segmentation on the document picture to be detected and segmented according to the second neural network YOLO V3 detection model.
The method further comprises the following steps before the document picture to be detected and segmented is detected and segmented according to the second detection model:
and carrying out standardization processing on the document picture to be detected and segmented.
Example two
This embodiment differs from the first embodiment in that it further describes how the above method for detecting and dividing document layout areas is implemented in a specific application scenario:
First, collecting data
Acquiring a document picture, and establishing a training data set;
acquiring document pictures with different layouts, and establishing a first detection data set.
Specifically, document pictures of various layouts are collected according to business requirements, and the data are analyzed and sorted. As many pictures and as many layout types as possible should be collected; the data volume is on the order of tens of thousands of pictures.
Second, labeling data
And marking the pictures in the first detection data set to obtain a second detection data set.
Specifically, each picture is labeled with an annotation tool by manually dividing the layout areas, and the coordinates of every layout area are recorded in a TXT file that serves as the picture's region label; there is one label file per picture. The label file content has the following format:
X1,Y1,X2,Y2
Table 1: label file content description
[Table 1 is provided as an image in the original publication; its content is not reproducible from this text.]
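As an illustration of the labeling format just described, the sketch below parses one such TXT label file into region tuples. This is my own minimal reading of the "X1,Y1,X2,Y2" line format shown above; the function name and any parsing details beyond that format are assumptions, not part of the patent.

```python
# Hypothetical sketch: reading per-image layout-region labels stored as
# "X1,Y1,X2,Y2" lines in a TXT file, as the annotation step describes.

def parse_label_file(text: str) -> list:
    """Parse one label file; each line holds the top-left (X1, Y1) and
    bottom-right (X2, Y2) corners of one layout region."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        x1, y1, x2, y2 = (int(v) for v in line.split(","))
        regions.append((x1, y1, x2, y2))
    return regions

# Example: a page with two labeled layout regions
labels = "10,20,300,80\n10,100,300,400"
print(parse_label_file(labels))  # [(10, 20, 300, 80), (10, 100, 300, 400)]
```

In practice one such file would exist per training picture, mirroring the "one label file per picture" convention stated above.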
Third, training the model
And creating a first neural network YOLO V3 detection model, and training the first neural network YOLO V3 detection model through the training data set to obtain a trained second neural network YOLO V3 detection model.
The YOLO V3 framework is built with a 105-layer structure; the main hyperparameters are defined, and the darknet-53 feature-extraction module and the feature interaction layers of the YOLO network are adopted.
darknet-53: from layer 0 to layer 74 there are 53 convolutional layers in total; the remainder are res (residual) layers. The convolutional layers extract image features, and the res layers counteract vanishing or exploding gradients in the network. It serves as the main network structure for YOLO V3 feature extraction and uses a series of 3 × 3 and 1 × 1 convolutions.
Feature interaction layers: layers 75 to 105 form the network's feature interaction part, divided into three scales. Within each scale, local feature interaction between feature maps is realized through convolution kernels (3 × 3 and 1 × 1), with an effect similar to a fully connected (fc) layer (the fc layer realizes global feature interaction).
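The role of the res layers mentioned above can be sketched in a few lines. This is a pure-Python illustration of the identity-shortcut idea only (output = input + F(input)), not the patent's actual network code: `conv_stub` is a stand-in for the real 1 × 1 / 3 × 3 convolution pair, and all names are my own.

```python
# Illustrative sketch of a residual ("res") connection: the block's output
# adds the input back to the transformed features, which is what lets
# gradients flow through deep stacks like darknet-53 without vanishing
# or exploding.

def conv_stub(x, weight):
    # Stand-in for a convolution pair; here just a scalar scaling.
    return [weight * v for v in x]

def residual_block(x, weight):
    # y = x + F(x): the identity shortcut is the res layer.
    fx = conv_stub(x, weight)
    return [a + b for a, b in zip(x, fx)]

print(residual_block([1.0, 2.0, 3.0], 0.5))  # [1.5, 3.0, 4.5]
```

Even if `conv_stub` learned weights near zero, the block would still pass its input through unchanged, which is the property the description credits with stabilizing training.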
Training the first neural network YOLO V3 detection model through the second detection dataset;
Fourth, applying the detection model
And carrying out detection segmentation on the document picture to be detected and segmented according to the second neural network YOLO V3 detection model.
Specifically, after the model training stage, the second neural network YOLO V3 detection model is obtained and can be called on a picture to perform single-character detection and segmentation.
The process flow of applying the second detection model is as follows:
1. Perform standardized preprocessing on the to-be-detected picture input by the user.
Before the document picture is detected and segmented according to the second detection model, the method further comprises:
and carrying out standardization processing on the document picture to be detected and segmented.
The second detection model imposes strict requirements on the size and format of its input picture, but user-supplied pictures vary in size and format, so the input picture must be standardized.
The preprocessing steps are as follows:
the pictures are uniformly converted to RGB format,
the size is uniformly scaled to 416 × 416,
and the mean value is uniformly subtracted from the picture content.
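The normalization pipeline above can be sketched as follows. In practice the RGB conversion and the 416 × 416 resize would use an image library (for example Pillow's `Image.convert("RGB")` and `Image.resize`); here only the mean-subtraction step is shown concretely, on a small flat pixel list, so the arithmetic is explicit. All names are illustrative assumptions, not from the patent.

```python
# Illustrative sketch of the standardization step before calling the model.

TARGET_SIZE = (416, 416)  # YOLO V3's fixed input resolution, per the text

def subtract_mean(pixels: list) -> list:
    """Center pixel values by subtracting their per-image mean,
    as the third preprocessing step describes."""
    mean = sum(pixels) / len(pixels)
    return [p - mean for p in pixels]

print(subtract_mean([10.0, 20.0, 30.0]))  # mean is 20 -> [-10.0, 0.0, 10.0]
```

The patent does not say whether the mean is computed per image or per channel over the training set; the per-image variant here is only one plausible reading.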
2. A second detection model is invoked.
The preprocessed picture is fed into the trained second detection model, which outputs detection data (x, y, w, h, confidence).
The second detection model outputs data:
Table 2: second detection model output data description
[Table 2 is provided as an image in the original publication; its content is not reproducible from this text.]
3. Data post-processing
The (x, y, w, h, confidence) data produced by the second detection model on the to-be-detected picture is not suitable for direct use by the user, so it is post-processed into corner coordinates:
(x1, y1, x2, y2)
Data obtained by the post-processing:
Table 3: second detection model data post-processing output description
[Table 3 is provided as an image in the original publication; its content is not reproducible from this text.]
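The coordinate conversion in this post-processing step can be sketched as below. The patent does not state whether (x, y) denotes the box center or a corner; the sketch assumes the usual YOLO convention of center coordinates plus width and height, so treat it as one plausible reading rather than the patent's definitive formula.

```python
# Sketch of post-processing: converting a model box (x, y, w, h), assumed
# here to be center coordinates plus width/height, into the corner form
# (x1, y1, x2, y2) that is written out for the user.

def to_corners(x: float, y: float, w: float, h: float) -> tuple:
    x1 = x - w / 2  # left edge
    y1 = y - h / 2  # top edge
    x2 = x + w / 2  # right edge
    y2 = y + h / 2  # bottom edge
    return (x1, y1, x2, y2)

print(to_corners(100.0, 50.0, 40.0, 20.0))  # (80.0, 40.0, 120.0, 60.0)
```

The confidence value passes through this step unchanged and is carried alongside the corner coordinates.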
4. Write the related data for single-character detection into a json file.
To make the data easier for the user to consume, the data related to the picture and the data produced by the second detection model are integrated and written into a single json file.
The Json file format is as follows:
Table 4: json file content description
[Table 4 is provided as an image in the original publication; its content is not reproducible from this text.]
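A minimal sketch of this final serialization step is shown below. Since the actual schema in Table 4 is only available as an image, every field name here (`image`, `regions`, `x1`, `confidence`, and so on) is an illustrative assumption, not the patent's format.

```python
import json

# Hypothetical sketch: merge picture metadata with the post-processed
# detection results and serialize everything to one JSON document.

def build_result_json(image_name: str, regions: list) -> str:
    result = {
        "image": image_name,
        "regions": [
            {"x1": x1, "y1": y1, "x2": x2, "y2": y2, "confidence": conf}
            for (x1, y1, x2, y2, conf) in regions
        ],
    }
    return json.dumps(result, indent=2)

# One detected layout region with its confidence score
doc = build_result_json("page_001.png", [(80, 40, 120, 60, 0.97)])
print(doc)
```

Writing one self-contained file per picture keeps the detection output portable for whatever downstream system consumes it.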
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent changes made by using the contents of the present specification and the drawings, or directly or indirectly applied in the related technical fields, are included in the scope of the present invention.

Claims (7)

1. A method for detecting and dividing a document layout area is characterized by comprising the following steps:
s1, acquiring a document picture, and establishing a training data set;
s2, creating a first detection model, and training the first detection model through the training data set to obtain a trained second detection model;
and S3, detecting and segmenting the document picture to be detected and segmented according to the second detection model.
2. The method for detecting and dividing the layout area of the document according to claim 1, wherein the step S1 is specifically as follows:
acquiring document pictures with different layouts, and establishing a first detection data set.
3. The document layout area detection and segmentation method according to claim 2, wherein the step S1 further comprises:
and marking the pictures in the first detection data set to obtain a second detection data set.
4. The method for detecting and dividing the layout area of the document according to claim 1, wherein the step S2 is specifically as follows:
and creating a first neural network YOLO V3 detection model, and training the first neural network YOLO V3 detection model through the training data set to obtain a trained second neural network YOLO V3 detection model.
5. The method for detecting and segmenting document layout areas according to claim 4, wherein the training of the first neural network YOLO V3 detection model through the training data set specifically comprises:
training the first neural network YOLO V3 detection model through the second detection data set.
6. The method for detecting and dividing the layout area of the document according to claim 4, wherein the step S3 is specifically as follows:
and detecting and segmenting the document picture to be detected and segmented according to the second neural network YOLO V3 detection model.
7. The method for detecting and segmenting the document layout area according to claim 1, wherein before the detecting and segmenting the document picture to be detected and segmented according to the second detection model, the method further comprises:
and carrying out standardization processing on the document picture to be detected and segmented.
CN201911036942.4A 2019-10-29 2019-10-29 Method for detecting and dividing document layout area Pending CN111126128A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911036942.4A CN111126128A (en) 2019-10-29 2019-10-29 Method for detecting and dividing document layout area

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911036942.4A CN111126128A (en) 2019-10-29 2019-10-29 Method for detecting and dividing document layout area

Publications (1)

Publication Number Publication Date
CN111126128A true CN111126128A (en) 2020-05-08

Family

ID=70495434

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911036942.4A Pending CN111126128A (en) 2019-10-29 2019-10-29 Method for detecting and dividing document layout area

Country Status (1)

Country Link
CN (1) CN111126128A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060222239A1 (en) * 2005-03-31 2006-10-05 Bargeron David M Systems and methods for detecting text
CN108921152A (en) * 2018-06-29 2018-11-30 清华大学 English character cutting method and device based on object detection network
CN109800756A (en) * 2018-12-14 2019-05-24 华南理工大学 A kind of text detection recognition methods for the intensive text of Chinese historical document
CN110020615A (en) * 2019-03-20 2019-07-16 阿里巴巴集团控股有限公司 The method and system of Word Input and content recognition is carried out to picture
CN110348280A * 2019-03-21 2019-10-18 贵州工业职业技术学院 Water book character recognition method based on CNN artificial neural network



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200508