CN111126128A - Method for detecting and dividing document layout area - Google Patents

Method for detecting and dividing document layout area

Info

Publication number
CN111126128A
CN111126128A
Authority
CN
China
Prior art keywords
detection model
detection
document
data set
detecting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911036942.4A
Other languages
Chinese (zh)
Inventor
张�雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Cross Strait Information Technology Co Ltd
Original Assignee
Fujian Cross Strait Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Cross Strait Information Technology Co Ltd filed Critical Fujian Cross Strait Information Technology Co Ltd
Priority to CN201911036942.4A
Publication of CN111126128A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V10/225 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/416 Extracting the logical structure, e.g. chapters, sections or page numbers; Identifying elements of the document, e.g. authors

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method for detecting and segmenting document layout areas, comprising: obtaining document pictures and establishing a training data set; creating a first detection model and training it on the training data set to obtain a trained second detection model; and detecting and segmenting the document picture to be processed with the second detection model. The method enables automatic detection and segmentation of document pictures with high accuracy.

Description

Method for detecting and dividing document layout area
Technical Field
The invention relates to the technical field of image detection, in particular to a method for detecting and dividing a document layout area.
Background
Current OCR technology typically first recognizes all the text in an entire picture and then analyzes the content to extract useful information. When documents are digitized with OCR and made into electronic books, it is not enough to detect and recognize the characters: the layout of the original book must also be preserved, so the effective content area, framed areas (such as black boxes), headers, footers, and the like on each page must be identified. Because different documents have widely varying layouts, the layout is difficult to divide with hand-written rules, and no existing technology achieves automatic layout division.
Therefore, a method for detecting and dividing document layout areas is needed that can automatically detect and segment document pictures with high accuracy.
Disclosure of Invention
Technical problem to be solved
To solve the above problems in the prior art, the present invention provides a method for detecting and segmenting document layout areas that achieves automatic detection and segmentation of document pictures with high accuracy.
(II) technical scheme
To achieve this purpose, the invention adopts the following main technical solution:
a method for detecting and dividing document layout areas comprises the following steps:
s1, acquiring a document picture, and establishing a training data set;
s2, creating a first detection model, and training the first detection model through the training data set to obtain a trained second detection model;
and S3, detecting and segmenting the document picture to be detected and segmented according to the second detection model.
(III) advantageous effects
The invention has the following beneficial effects: a document picture is obtained and a training data set is established; a first detection model is created and trained on the training data set to obtain a trained second detection model; and the document picture to be processed is detected and segmented with the second detection model. This enables automatic detection and segmentation of document pictures with high accuracy.
Drawings
Fig. 1 is a flowchart of a method for detecting and dividing a document layout area according to an embodiment of the present invention.
Detailed Description
For the purpose of better explaining the present invention and to facilitate understanding, the present invention will be described in detail by way of specific embodiments with reference to the accompanying drawings.
A method for detecting and dividing a document layout area is characterized by comprising the following steps:
s1, acquiring a document picture, and establishing a training data set;
s2, creating a first detection model, and training the first detection model through the training data set to obtain a trained second detection model;
and S3, detecting and segmenting the document picture to be detected and segmented according to the second detection model.
As can be seen from the above description, a training data set is established by acquiring document pictures; a first detection model is created and trained on the training data set to obtain a trained second detection model; and the document picture to be processed is detected and segmented with the second detection model, enabling automatic detection and segmentation of document pictures with high accuracy.
Further, step S1 is specifically:
acquiring document pictures with different layouts, and establishing a first detection data set.
Further, step S1 further includes:
and marking the pictures in the first detection data set to obtain a second detection data set.
According to the above description, acquiring document pictures with different layouts and labeling the pictures in the first detection data set to obtain the second detection data set improves the accuracy of subsequent detection and segmentation.
Further, step S2 is specifically:
and creating a first neural network YOLO V3 detection model, and training the first neural network YOLO V3 detection model through the training data set to obtain a trained second neural network YOLO V3 detection model.
From the above description, it can be seen that creating the first neural network YOLO V3 detection model and training it on the training data set to obtain the trained second neural network YOLO V3 detection model improves the efficiency and accuracy of detection and segmentation.
Further, the training the first neural network YOLO V3 detection model through the training data set specifically includes:
training the first neural network YOLO V3 detection model through the second detection data set.
As can be seen from the above description, the first neural network YOLO V3 detection model is trained through the second detection data set, so that the accuracy of detection and segmentation of the trained model is ensured.
Further, step S3 is specifically:
and carrying out detection segmentation on the document picture to be detected and segmented according to the second neural network YOLO V3 detection model.
From the above description, detecting and segmenting the document picture with the second neural network YOLO V3 detection model improves the accuracy and efficiency of document-picture detection and segmentation.
Further, before detecting and segmenting the document picture according to the second detection model, the method further comprises:
and carrying out standardization processing on the document picture to be detected and segmented.
As can be seen from the above description, standardizing the document picture before detection and segmentation helps improve accuracy.
Example one
Referring to fig. 1, a method for detecting and dividing a document layout area includes the steps of:
s1, acquiring a document picture, and establishing a training data set;
step S1 specifically includes:
acquiring document pictures with different layouts, and establishing a first detection data set.
Step S1 further includes:
and marking the pictures in the first detection data set to obtain a second detection data set.
S2, creating a first detection model, and training the first detection model through the training data set to obtain a trained second detection model;
step S2 specifically includes:
and creating a first neural network YOLO V3 detection model, and training the first neural network YOLO V3 detection model through the training data set to obtain a trained second neural network YOLO V3 detection model.
The training of the first neural network YOLO V3 detection model by the training data set specifically includes:
training the first neural network YOLO V3 detection model through the second detection data set.
And S3, detecting and segmenting the document picture to be detected and segmented according to the second detection model.
Step S3 specifically includes:
and carrying out detection segmentation on the document picture to be detected and segmented according to the second neural network YOLO V3 detection model.
The method further comprises the following steps before the document picture to be detected and segmented is detected and segmented according to the second detection model:
and carrying out standardization processing on the document picture to be detected and segmented.
Example two
This embodiment differs from the first embodiment in that it further describes how the above method for detecting and dividing document layout areas is implemented in a specific application scenario:
First, collecting data
Acquiring a document picture, and establishing a training data set;
acquiring document pictures with different layouts, and establishing a first detection data set.
Specifically, document pictures of various layouts are collected according to business requirements, and the data are analyzed and sorted. As many pictures and as many layout types as possible should be collected; the data volume is on the order of tens of thousands of pictures.
Second, labeling data
And marking the pictures in the first detection data set to obtain a second detection data set.
Specifically, each picture is labeled with an annotation tool by manually dividing the layout areas, and the coordinates of every layout area are recorded in a TXT file that serves as the picture's region label; there is one label file per picture. The label file content has the following format:
X1,Y1,X2,Y2
Table 1: label file content description
[Table 1 is provided as an image in the original publication; its content is not reproducible from this text.]
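As an illustration of the labeling format just described, the sketch below parses one such TXT label file into region tuples. This is my own minimal reading of the "X1,Y1,X2,Y2" line format shown above; the function name and any parsing details beyond that format are assumptions, not part of the patent.

```python
# Hypothetical sketch: reading per-image layout-region labels stored as
# "X1,Y1,X2,Y2" lines in a TXT file, as the annotation step describes.

def parse_label_file(text: str) -> list:
    """Parse one label file; each line holds the top-left (X1, Y1) and
    bottom-right (X2, Y2) corners of one layout region."""
    regions = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        x1, y1, x2, y2 = (int(v) for v in line.split(","))
        regions.append((x1, y1, x2, y2))
    return regions

# Example: a page with two labeled layout regions
labels = "10,20,300,80\n10,100,300,400"
print(parse_label_file(labels))  # [(10, 20, 300, 80), (10, 100, 300, 400)]
```

In practice one such file would exist per training picture, mirroring the "one label file per picture" convention stated above.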
Third, training the model
And creating a first neural network YOLO V3 detection model, and training the first neural network YOLO V3 detection model through the training data set to obtain a trained second neural network YOLO V3 detection model.
The YOLO V3 framework is built with a 105-layer structure; the main hyperparameters are defined, and the darknet-53 feature-extraction module and the feature interaction layers of the YOLO network are adopted.
darknet-53: from layer 0 to layer 74 there are 53 convolutional layers in total; the remainder are res (residual) layers. The convolutional layers extract image features, and the res layers counteract vanishing or exploding gradients in the network. It serves as the main network structure for YOLO V3 feature extraction and uses a series of 3 × 3 and 1 × 1 convolutions.
Feature interaction layers: layers 75 to 105 form the network's feature interaction part, divided into three scales. Within each scale, local feature interaction between feature maps is realized through convolution kernels (3 × 3 and 1 × 1), with an effect similar to a fully connected (fc) layer (the fc layer realizes global feature interaction).
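The role of the res layers mentioned above can be sketched in a few lines. This is a pure-Python illustration of the identity-shortcut idea only (output = input + F(input)), not the patent's actual network code: `conv_stub` is a stand-in for the real 1 × 1 / 3 × 3 convolution pair, and all names are my own.

```python
# Illustrative sketch of a residual ("res") connection: the block's output
# adds the input back to the transformed features, which is what lets
# gradients flow through deep stacks like darknet-53 without vanishing
# or exploding.

def conv_stub(x, weight):
    # Stand-in for a convolution pair; here just a scalar scaling.
    return [weight * v for v in x]

def residual_block(x, weight):
    # y = x + F(x): the identity shortcut is the res layer.
    fx = conv_stub(x, weight)
    return [a + b for a, b in zip(x, fx)]

print(residual_block([1.0, 2.0, 3.0], 0.5))  # [1.5, 3.0, 4.5]
```

Even if `conv_stub` learned weights near zero, the block would still pass its input through unchanged, which is the property the description credits with stabilizing training.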
Training the first neural network YOLO V3 detection model through the second detection dataset;
Fourth, applying the detection model
And carrying out detection segmentation on the document picture to be detected and segmented according to the second neural network YOLO V3 detection model.
Specifically, after the model training stage, the second neural network YOLO V3 detection model is obtained and can be called on a picture to perform single-character detection and segmentation.
The process flow of applying the second detection model is as follows:
1. Perform standardized preprocessing on the to-be-detected picture input by the user.
Before the document picture is detected and segmented according to the second detection model, the method further comprises:
and carrying out standardization processing on the document picture to be detected and segmented.
The second detection model imposes strict requirements on the size and format of its input picture, but user-supplied pictures vary in size and format, so the input picture must be standardized.
The preprocessing steps are as follows:
the pictures are uniformly converted to RGB format,
the size is uniformly scaled to 416 × 416,
and the mean value is uniformly subtracted from the picture content.
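The normalization pipeline above can be sketched as follows. In practice the RGB conversion and the 416 × 416 resize would use an image library (for example Pillow's `Image.convert("RGB")` and `Image.resize`); here only the mean-subtraction step is shown concretely, on a small flat pixel list, so the arithmetic is explicit. All names are illustrative assumptions, not from the patent.

```python
# Illustrative sketch of the standardization step before calling the model.

TARGET_SIZE = (416, 416)  # YOLO V3's fixed input resolution, per the text

def subtract_mean(pixels: list) -> list:
    """Center pixel values by subtracting their per-image mean,
    as the third preprocessing step describes."""
    mean = sum(pixels) / len(pixels)
    return [p - mean for p in pixels]

print(subtract_mean([10.0, 20.0, 30.0]))  # mean is 20 -> [-10.0, 0.0, 10.0]
```

The patent does not say whether the mean is computed per image or per channel over the training set; the per-image variant here is only one plausible reading.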
2. A second detection model is invoked.
The preprocessed picture is fed into the trained second detection model, which outputs detection data (x, y, w, h, confidence).
The second detection model outputs data:
Table 2: second detection model output data description
[Table 2 is provided as an image in the original publication; its content is not reproducible from this text.]
3. Data post-processing
The (x, y, w, h, confidence) data produced by the second detection model on the to-be-detected picture is not suitable for direct use by the user, so it is post-processed into corner coordinates:
(x1, y1, x2, y2)
Data obtained by the post-processing:
Table 3: second detection model data post-processing output description
[Table 3 is provided as an image in the original publication; its content is not reproducible from this text.]
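The coordinate conversion in this post-processing step can be sketched as below. The patent does not state whether (x, y) denotes the box center or a corner; the sketch assumes the usual YOLO convention of center coordinates plus width and height, so treat it as one plausible reading rather than the patent's definitive formula.

```python
# Sketch of post-processing: converting a model box (x, y, w, h), assumed
# here to be center coordinates plus width/height, into the corner form
# (x1, y1, x2, y2) that is written out for the user.

def to_corners(x: float, y: float, w: float, h: float) -> tuple:
    x1 = x - w / 2  # left edge
    y1 = y - h / 2  # top edge
    x2 = x + w / 2  # right edge
    y2 = y + h / 2  # bottom edge
    return (x1, y1, x2, y2)

print(to_corners(100.0, 50.0, 40.0, 20.0))  # (80.0, 40.0, 120.0, 60.0)
```

The confidence value passes through this step unchanged and is carried alongside the corner coordinates.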
4. Write the related data for single-character detection into a json file.
To make the data easier for the user to consume, the data related to the picture and the data produced by the second detection model are integrated and written into a single json file.
The Json file format is as follows:
Table 4: json file content description
[Table 4 is provided as an image in the original publication; its content is not reproducible from this text.]
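A minimal sketch of this final serialization step is shown below. Since the actual schema in Table 4 is only available as an image, every field name here (`image`, `regions`, `x1`, `confidence`, and so on) is an illustrative assumption, not the patent's format.

```python
import json

# Hypothetical sketch: merge picture metadata with the post-processed
# detection results and serialize everything to one JSON document.

def build_result_json(image_name: str, regions: list) -> str:
    result = {
        "image": image_name,
        "regions": [
            {"x1": x1, "y1": y1, "x2": x2, "y2": y2, "confidence": conf}
            for (x1, y1, x2, y2, conf) in regions
        ],
    }
    return json.dumps(result, indent=2)

# One detected layout region with its confidence score
doc = build_result_json("page_001.png", [(80, 40, 120, 60, 0.97)])
print(doc)
```

Writing one self-contained file per picture keeps the detection output portable for whatever downstream system consumes it.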
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all equivalent changes made by using the contents of the present specification and the drawings, or directly or indirectly applied in the related technical fields, are included in the scope of the present invention.

Claims (7)

1. A method for detecting and dividing a document layout area is characterized by comprising the following steps:
s1, acquiring a document picture, and establishing a training data set;
s2, creating a first detection model, and training the first detection model through the training data set to obtain a trained second detection model;
and S3, detecting and segmenting the document picture to be detected and segmented according to the second detection model.
2. The method for detecting and dividing the layout area of the document according to claim 1, wherein the step S1 is specifically as follows:
acquiring document pictures with different layouts, and establishing a first detection data set.
3. The document layout area detection and segmentation method according to claim 2, wherein the step S1 further comprises:
and marking the pictures in the first detection data set to obtain a second detection data set.
4. The method for detecting and dividing the layout area of the document according to claim 1, wherein the step S2 is specifically as follows:
and creating a first neural network YOLO V3 detection model, and training the first neural network YOLO V3 detection model through the training data set to obtain a trained second neural network YOLO V3 detection model.
5. The method for detecting and segmenting document layout areas according to claim 4, wherein the training of the first neural network YOLO V3 detection model through the training data set specifically comprises:
training the first neural network YOLO V3 detection model through the second detection data set.
6. The method for detecting and dividing the layout area of the document according to claim 4, wherein the step S3 is specifically as follows:
and detecting and segmenting the document picture to be detected and segmented according to the second neural network YOLO V3 detection model.
7. The method for detecting and segmenting the document layout area according to claim 1, wherein before the detecting and segmenting the document picture to be detected and segmented according to the second detection model, the method further comprises:
and carrying out standardization processing on the document picture to be detected and segmented.
CN201911036942.4A 2019-10-29 2019-10-29 Method for detecting and dividing document layout area Pending CN111126128A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911036942.4A CN111126128A (en) 2019-10-29 2019-10-29 Method for detecting and dividing document layout area

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911036942.4A CN111126128A (en) 2019-10-29 2019-10-29 Method for detecting and dividing document layout area

Publications (1)

Publication Number Publication Date
CN111126128A true CN111126128A (en) 2020-05-08

Family

ID=70495434

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911036942.4A Pending CN111126128A (en) 2019-10-29 2019-10-29 Method for detecting and dividing document layout area

Country Status (1)

Country Link
CN (1) CN111126128A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060222239A1 (en) * 2005-03-31 2006-10-05 Bargeron David M Systems and methods for detecting text
CN108921152A (en) * 2018-06-29 2018-11-30 清华大学 English character cutting method and device based on object detection network
CN109800756A (en) * 2018-12-14 2019-05-24 华南理工大学 A kind of text detection recognition methods for the intensive text of Chinese historical document
CN110020615A (en) * 2019-03-20 2019-07-16 阿里巴巴集团控股有限公司 The method and system of Word Input and content recognition is carried out to picture
CN110348280A * 2019-03-21 2019-10-18 贵州工业职业技术学院 Water book character recognition method based on CNN artificial neural network



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200508