CN113065396A - Automatic filing processing system and method for scanned archive image based on deep learning - Google Patents

Automatic filing processing system and method for scanned archive image based on deep learning

Info

Publication number
CN113065396A
CN113065396A
Authority
CN
China
Prior art keywords
picture
image
processing
document
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110230772.4A
Other languages
Chinese (zh)
Inventor
陈文正
栾杉
李琳
占娜
魏馨霆
王溪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei Central China Technology Development Of Electric Power Co ltd
State Grid Hubei Electric Power Co Ltd
Original Assignee
Hubei Central China Technology Development Of Electric Power Co ltd
State Grid Hubei Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei Central China Technology Development Of Electric Power Co ltd, State Grid Hubei Electric Power Co Ltd filed Critical Hubei Central China Technology Development Of Electric Power Co ltd
Priority to CN202110230772.4A priority Critical patent/CN113065396A/en
Publication of CN113065396A publication Critical patent/CN113065396A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/243 Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words

Abstract

The invention provides an automatic filing processing system and method for scanned archive images based on deep learning. The method comprises the following steps: step one: data preprocessing and model training; step two: subject identification; step three: tilt correction; step four: automatic thresholding and image reconstruction; step five: processing of table picture data; step six: table segmentation, in which the data annotated in step five are fed into a Unet network for training to obtain a table segmentation model, and input pictures are segmented into cell pictures according to that model; step seven: text line segmentation, in which a CTPN model performs text line segmentation on each cell obtained in step six. The invention has the advantages that the overall accuracy of archive picture processing reaches 91.7% or higher, the efficiency and standardization of archive filing can be improved, and the personnel and equipment investment and the environmental requirements of the processing site are reduced.

Description

Automatic filing processing system and method for scanned archive image based on deep learning
Technical Field
The invention relates to the field of automatic document processing, in particular to an automatic filing processing system and method for scanned archive images based on deep learning.
Background
With the continuing application of informatization, networking and digitization across industries, digitized management has become widely accepted. Digital archive management, however, is still advancing slowly, mainly because of the sheer volume of files and the limited capacity of archive management staff. In the electric power industry, for example, intelligent terminals have been widely adopted in specialties such as operations and maintenance, marketing, safety supervision and office work, and the proportion of unstructured data they generate has risen sharply.
The unstructured data generated by intelligent terminals consist mainly of photographed images of complex documents, and such images typically exhibit abnormalities such as skew, uneven illumination, noise interference, softened edges and geometric distortion. A large amount of cleanup and processing work is therefore required to convert the unstructured data into structured data before they can be used effectively; the workload is enormous and is at present performed mostly by hand. Faced with an ever-growing volume of photographed document images, valuable unstructured information and knowledge must be acquired quickly, effectively and correctly, so intensive research into automated techniques for cleaning up complex photographed document images and extracting their information is urgently needed.
Archive digitization is labor-intensive. Automating the scanning and standardized filing of archive images promises to let machines replace manual labor in the filing step, improve filing efficiency and standardization, and reduce the personnel and equipment investment and the environmental requirements of the processing site.
Disclosure of Invention
To address these problems in automated digital archive filing, the invention provides an automatic filing processing system and method for scanned archive images based on deep learning.
An automatic filing processing method for scanned archive images based on deep learning comprises the following steps:
step one: data preprocessing and model training
the pictures to be processed are divided into five categories: drawings, handwriting, tables, photos and others, and the document body and text lines are annotated for each category of picture; the preprocessed pictures are then trained with the Object Detection framework and a Faster R-CNN model to obtain a picture classification and document body localization model;
step two: subject identification
using the document body localization model obtained in step one, the document body and text lines are located in the input picture, and the document body is segmented out to obtain the text line portions of the document;
step three: tilt correction
pixel points of the text line portion obtained in step two are selected and a straight line is fitted through them to obtain the overall inclination angle of the document; the document body cropped out in step two is then rotated by this angle to obtain a corrected document picture;
step four: automated thresholding and image reconstruction
automatic thresholding and image reconstruction are applied to the corrected document picture obtained in step three to obtain a standardized output picture;
step five: processing of tabular picture data
a subset of table archive images is selected from the standardized output pictures of step four, and the line edges of the tables are annotated with labelme;
step six: table partitioning
the data annotated in step five are fed into a Unet network for training to obtain a table segmentation model, and the tables in input pictures are segmented according to the table segmentation model to obtain cell pictures;
step seven: text line segmentation
a CTPN model performs text line segmentation on each cell segmented in step six.
Further, the automatic thresholding in step four establishes a dynamic threshold from the local pixel distribution of the picture and applies threshold segmentation to the picture according to that dynamic threshold, so that most details of the picture are retained and loss of picture content is avoided; the image reconstruction in step four outputs the pictures in standardized form, at A4 or A3 size according to category.
Further, in step four, the thresholding part of the system implements dynamic thresholding based on local image characteristics. Let σ_xy and m_xy denote the standard deviation and mean of the set of pixels in a neighborhood S_xy centered on coordinates (x, y) of the image. The general form of the variable local threshold is:

T_xy = a·σ_xy + b·m_xy

where a and b are non-negative constants. The segmented image is then computed as:

g(x, y) = 1 if f(x, y) > T_xy, and 0 otherwise

where f(x, y) is the input image. This rule is evaluated at every pixel location, and a different threshold T_xy is computed at each point (x, y) from the pixels in its neighborhood S_xy.
Further, in step seven, the CTPN model is trained on an open-source data set, with the training pictures split into a training set and a validation set at a ratio of 99:1. The data are generated from a Chinese corpus: 5,990 distinct characters, covering Chinese characters, English letters, digits and punctuation, are rendered with random variations in font, size, gray level, blur, perspective and stretching; each sample contains a fixed 10 characters randomly excerpted from sentences in the corpus, and picture resolution is unified at 280×32.
Further, 1000 table archive images are selected in step five.
Further, in step six the annotated data are fed into the Unet for training, with 80,000 iterations.
An automatic filing processing system for scanned archive images based on deep learning comprises an image finishing system and an automatic document processing system;
the image finishing system is used for carrying out data preprocessing and model training on a picture to be processed, obtaining a document main body positioning model based on deep learning through training, identifying and carrying out inclination correction on a main body of an image document by using the document main body positioning model, and then carrying out automatic threshold processing on a text to obtain a form archive image after image finishing;
the automatic document processing system is used for segmenting the table by using a deep learning segmentation network Unet and segmenting each cell in the table by using a CTPN model to the table archive image processed by the image finishing system, and provides a basis for OCR recognition and data extraction of subsequent table data.
Further, the automatic thresholding establishes a dynamic threshold from the local pixel distribution of the picture and applies threshold segmentation to the picture according to that dynamic threshold, so that most details of the picture are retained and loss of picture content is avoided.
Further, the thresholding part of the image finishing system implements dynamic thresholding based on local image characteristics. Let σ_xy and m_xy denote the standard deviation and mean of the set of pixels in a neighborhood S_xy centered on coordinates (x, y) of the image. The general form of the variable local threshold is:

T_xy = a·σ_xy + b·m_xy

where a and b are non-negative constants. The segmented image is then computed as:

g(x, y) = 1 if f(x, y) > T_xy, and 0 otherwise

where f(x, y) is the input image. This rule is evaluated at every pixel location, and a different threshold T_xy is computed at each point (x, y) from the pixels in its neighborhood S_xy.
Further, the CTPN model is trained on an open-source data set, with the training pictures split into a training set and a validation set at a ratio of 99:1. The data are generated from a Chinese corpus: 5,990 distinct characters, covering Chinese characters, English letters, digits and punctuation, are rendered with random variations in font, size, gray level, blur, perspective and stretching; each sample contains a fixed 10 characters randomly excerpted from sentences in the corpus, and picture resolution is unified at 280×32.
The method first performs tilt correction on the document body using the document body localization model obtained by training, then applies automatic thresholding to the text, and finally performs region segmentation and text extraction. Experimental analysis shows that the overall accuracy of the system's archive image processing reaches 91.7% or higher, demonstrating the usability of the system in the field of archive digitization.
Drawings
FIG. 1 is a schematic flow chart of an automated archive processing method for scanned archive images based on deep learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a Unet network architecture for use with the present invention;
FIG. 3 is a comparison of the effect of a conventional single threshold versus a dynamic threshold;
FIG. 4 is a process effect diagram of the system of the present invention;
fig. 5 is a schematic diagram of CTPN implementation steps of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
Referring to fig. 1, an embodiment of the invention provides an automatic filing processing system for scanned archive images based on deep learning, comprising an image finishing system and an automatic document processing system.
The image finishing system performs data preprocessing and model training on the pictures to be processed, obtains a deep-learning document body localization model through training, uses that model to identify the body of an image document and correct its tilt, and then applies automatic thresholding to the text to obtain table archive images after image finishing. The image finishing system divides image documents into five categories: drawings, handwriting, tables, photos and others, applies different processing according to category, and performs image reconstruction after body recognition, tilt calculation, tilt correction and thresholding to obtain the preprocessed image result; the output pictures are likewise divided into the five categories above.
the automatic document processing system is used for segmenting the table by using a deep learning segmentation network Unet and segmenting each cell in the table by using a CTPN model to the table archive image processed by the image finishing system, and provides a basis for OCR recognition and data extraction of subsequent table data.
The embodiment of the invention also provides an automatic filing processing method for scanned archive images based on deep learning, comprising the following steps:
Step one: Data preprocessing and model training
Experiments show that no single processing mode can meet the processing requirements of all pictures, so the pictures to be processed are first preprocessed. Specifically, they are divided into five categories: drawings, handwriting, tables, photos and others, and the document body and text lines are annotated for each category of picture; the preprocessed pictures are then trained with the Object Detection framework and a Faster R-CNN model to obtain a picture classification and document body localization model.
Step two: subject identification
Using the document body localization model obtained in step one, the document body and text lines are located in the input picture, and the document body is segmented out to obtain the text line portions of the document.
Step three: tilt correction
Pixel points of the text line portion obtained in step two are selected and a straight line is fitted through them to obtain the overall inclination angle of the document; the document body cropped out in step two is then rotated by this angle to obtain a corrected document picture.
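As an illustration of this step, the following is a minimal sketch using OpenCV and NumPy. The function names, the least-squares line fit via cv2.fitLine, and the white border fill are our assumptions; the patent specifies neither the fitting routine nor the rotation implementation.

import cv2
import numpy as np

def estimate_skew_angle(text_line_mask):
    # Collect the foreground pixel coordinates of the detected text line
    ys, xs = np.nonzero(text_line_mask)
    pts = np.column_stack((xs, ys)).astype(np.float32)
    # Fit a straight line through the pixels; return its angle in degrees
    vx, vy, _, _ = cv2.fitLine(pts, cv2.DIST_L2, 0, 0.01, 0.01).ravel()
    return float(np.degrees(np.arctan2(vy, vx)))

def deskew(document_body, angle_deg):
    # Rotate the cropped document body about its center by the fitted angle
    h, w = document_body.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle_deg, 1.0)
    return cv2.warpAffine(document_body, M, (w, h),
                          flags=cv2.INTER_LINEAR,
                          borderValue=(255, 255, 255))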
Step four: automated thresholding and image reconstruction
Automatic thresholding and image reconstruction are applied to the corrected document picture obtained in step three. The automatic thresholding establishes a dynamic threshold from the local pixel distribution of the picture and applies threshold segmentation according to it, so that most details of the picture are retained and loss of picture content is avoided; the image reconstruction outputs the normalized picture at A4 or A3 size according to its category.
During image processing it is desirable to retain as much of the valuable information in the archive image as possible while eliminating noise and similar defects, so the thresholding part of the system implements dynamic thresholding based on local image characteristics. Compared with a traditional single (global) threshold, dynamic thresholding segments unevenly illuminated images better. Let σ_xy and m_xy denote the standard deviation and mean of the set of pixels in a neighborhood S_xy centered on coordinates (x, y) of the image. The general form of the variable local threshold is:

T_xy = a·σ_xy + b·m_xy

where a and b are non-negative constants. The segmented image is then computed as:

g(x, y) = 1 if f(x, y) > T_xy, and 0 otherwise

where f(x, y) is the input image. This rule is evaluated at every pixel location, and a different threshold T_xy is computed at each point (x, y) from the pixels in its neighborhood S_xy. Dynamic thresholding based on local image characteristics avoids loss of image information and preserves the quality of the processed archive image.
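A minimal sketch of this variable-threshold rule follows. The square neighborhood, its size, and the constants a and b are illustrative choices of ours; the patent fixes none of them.

import numpy as np
from scipy.ndimage import uniform_filter

def dynamic_threshold(f, size=15, a=0.3, b=0.8):
    # Segment image f with the variable local threshold
    # T_xy = a*sigma_xy + b*m_xy, where m_xy and sigma_xy are the mean and
    # standard deviation of the size x size neighborhood S_xy at each pixel.
    f = f.astype(np.float64)
    m = uniform_filter(f, size)                   # local mean m_xy
    m2 = uniform_filter(f * f, size)              # local mean of squares
    sigma = np.sqrt(np.maximum(m2 - m * m, 0.0))  # local std sigma_xy
    T = a * sigma + b * m                         # threshold T_xy at each pixel
    return (f > T).astype(np.uint8)               # g(x, y) = 1 where f > T_xy

With a = 0 the rule reduces to a scaled local-mean threshold; the patent leaves the choice of a and b open.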
After threshold segmentation, the image must be reconstructed to 2480 × 3508 pixels to meet the requirement of a resolution of 300 pixels/inch; when the image is smaller than 2480 × 3508, pixels are padded around it to bring it to A4 size, ensuring standardized output.
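A sketch of the padding step under the stated A4 target of 2480 × 3508 pixels at 300 pixels/inch; centering the image and filling with white are our assumptions.

import cv2

A4_W, A4_H = 2480, 3508  # A4 at 300 pixels/inch, portrait

def pad_to_a4(img):
    # Center an image smaller than A4 on a white A4-sized canvas
    h, w = img.shape[:2]
    top = max((A4_H - h) // 2, 0)
    left = max((A4_W - w) // 2, 0)
    bottom = max(A4_H - h - top, 0)
    right = max(A4_W - w - left, 0)
    return cv2.copyMakeBorder(img, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=(255, 255, 255))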
The results obtained with a conventional single threshold and with the dynamic threshold are compared in fig. 3. Comparing the left and right halves of the figure, characters on the right side of the left half are blurred or lost, while the right half preserves them well. The reason is that the right side of the original image is poorly lit, so the pixel values of the text region are large (the larger the pixel value, the closer the color is to white); processing the image with a single threshold therefore loses detail. With a dynamic threshold, the threshold is chosen from the local characteristics of the picture, so little detail is lost.
Step five: processing of tabular picture data
Steps one through four constitute the image finishing stage, which yields the five categories of standardized output pictures. From these, 1000 table archive images are selected and the line edges of the tables are annotated with labelme.
Step six: table partitioning
The data annotated in step five are fed into a Unet network for training, with 80,000 iterations, to obtain a table segmentation model; the tables in input pictures are then segmented according to this model to obtain cell pictures.
The embodiment of the invention uses the Unet network structure to further segment the table pictures output by image finishing. As shown in fig. 2, the Unet network consists of two parts: a contracting path on the left (down) and an expanding path on the right (up). The contracting path follows the typical structure of a convolutional network: repeated application of two 3 × 3 convolutions (unpadded), each followed by a rectified linear unit (ReLU), and a 2 × 2 max pooling operation with stride 2 for downsampling; the number of feature channels is doubled at each downsampling step.
In the expanding path, each step upsamples the feature map, then applies a 2 × 2 convolution that halves the number of feature channels, concatenates the correspondingly cropped feature map from the contracting path, and finally applies two 3 × 3 convolutions, each followed by a ReLU. The cropping is necessary because border pixels are lost in every convolution. At the last layer, a 1 × 1 convolution maps each 64-dimensional feature vector to the output layer of the network. As fig. 2 shows, the network has 23 convolutional layers in total.
The network adopts the common encoder-decoder structure and adds to it skip connections that pass information directly from the encoder into the decoder; this effectively preserves the edge detail of the original image and prevents excessive loss of edge information. Note that to ensure seamless tiling of the output segmentation map, the input picture size must be chosen carefully so that every max pooling operation is applied to a layer with even x- and y-dimensions.
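To make the block structure concrete, the following PyTorch sketch implements a reduced-depth version of the architecture just described: two unpadded 3 × 3 convolutions with ReLU per block, 2 × 2 max pooling with stride 2, transposed-convolution upsampling that halves the channels, cropped skip connections, and a final 1 × 1 convolution. It is an illustration under those assumptions, not the exact 23-layer network of fig. 2, and the input size must likewise keep every pooled dimension even.

import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    # Two repeated 3x3 unpadded convolutions, each followed by a ReLU
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3), nn.ReLU(inplace=True),
    )

def center_crop(enc_feat, target):
    # Crop the encoder feature map to the decoder map's size, compensating
    # for the border pixels lost to the unpadded convolutions
    _, _, h, w = target.shape
    _, _, H, W = enc_feat.shape
    dy, dx = (H - h) // 2, (W - w) // 2
    return enc_feat[:, :, dy:dy + h, dx:dx + w]

class MiniUNet(nn.Module):
    def __init__(self, in_ch=1, n_classes=2):
        super().__init__()
        self.enc1 = double_conv(in_ch, 64)
        self.enc2 = double_conv(64, 128)                    # channels double per step
        self.pool = nn.MaxPool2d(2, stride=2)               # 2x2 max pooling, stride 2
        self.up = nn.ConvTranspose2d(128, 64, 2, stride=2)  # upsample, halve channels
        self.dec1 = double_conv(128, 64)
        self.out = nn.Conv2d(64, n_classes, 1)              # final 1x1 convolution

    def forward(self, x):
        e1 = self.enc1(x)                                   # contracting path
        e2 = self.enc2(self.pool(e1))
        d1 = self.up(e2)                                    # expanding path
        d1 = torch.cat([center_crop(e1, d1), d1], dim=1)    # skip connection
        return self.out(self.dec1(d1))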
Step seven: text line segmentation
Text line segmentation is performed on each segmented cell with a CTPN model. The CTPN model is trained on an open-source data set of roughly 3.64 million pictures, split into a training set and a validation set at a ratio of 99:1. The data are generated from a Chinese corpus (news plus classical Chinese): 5,990 distinct characters, covering Chinese characters, English letters, digits and punctuation, are rendered with random variations in font, size, gray level, blur, perspective and stretching; each sample contains a fixed 10 characters randomly excerpted from sentences in the corpus, and picture resolution is unified at 280×32.
The CTPN pipeline comprises three stages: detecting small-scale text boxes, recurrently connecting the text boxes, and refining the text line edges. The concrete implementation steps, illustrated in fig. 5, are as follows:
1. extract features using VGG16 as the base net, taking the conv5_3 features as the feature map, of size W × H × C;
2. slide a 3 × 3 window over the feature map; each window yields a feature vector of length 3 × 3 × C, and each sliding-window center predicts offsets relative to k anchors;
3. feed the features obtained in the previous step into a bidirectional LSTM to obtain an output of length W × 256, followed by a 512-unit fully connected layer to prepare the outputs;
4. the output layer has three main outputs: first, 2k vertical coordinates, expressed as offsets relative to the anchors, since each anchor is described by two values, the y-coordinate of its center and the height of its box; second, 2k scores, a text and a non-text score for each of the k predicted proposals; third, k side-refinements, which represent the horizontal offset of each proposal and are mainly used to refine the two end points of a text line;
5. filter out redundant text proposals using a standard non-maximum suppression algorithm;
6. finally, merge the resulting text segments into text lines using a graph-based text line construction algorithm.
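The network portion of this pipeline (steps 1 through 4) can be sketched in PyTorch as follows. The layer sizes follow the description above, while k = 10 anchors, the module names, and the use of torchvision's VGG16 are our assumptions.

import torch
import torch.nn as nn
from torchvision.models import vgg16

class CTPNHead(nn.Module):
    def __init__(self, k=10):                            # k anchors per position
        super().__init__()
        self.base = vgg16(weights=None).features[:-1]    # VGG16 up to conv5_3 (C = 512)
        self.window = nn.Conv2d(512, 512, 3, padding=1)  # 3x3 sliding window on features
        self.rnn = nn.LSTM(512, 128, bidirectional=True,
                           batch_first=True)             # bidirectional LSTM, 2 x 128 = 256
        self.fc = nn.Linear(256, 512)                    # 512-unit fully connected layer
        self.coords = nn.Linear(512, 2 * k)              # 2k vertical coordinates (offsets)
        self.scores = nn.Linear(512, 2 * k)              # 2k text / non-text scores
        self.sides = nn.Linear(512, k)                   # k side-refinement offsets

    def forward(self, img):
        f = self.window(self.base(img))                  # B x 512 x H x W feature map
        B, C, H, W = f.shape
        seq = f.permute(0, 2, 3, 1).reshape(B * H, W, C) # one sequence per feature-map row
        h, _ = self.rnn(seq)                             # (B*H) x W x 256
        h = torch.relu(self.fc(h))                       # (B*H) x W x 512
        return self.coords(h), self.scores(h), self.sides(h)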
Through these steps, the text lines in the cells are segmented and recognized, providing a basis for OCR recognition of the text lines and extraction of the data. Table segmentation is required because, without it, global OCR applied directly to the document image cannot digitally restore a structured document such as a table; for example, when the table document is reconstructed as an Excel sheet, the correspondence between its cells is lost. Text line segmentation within the table cells, in turn, mainly improves the accuracy of text recognition.
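To illustrate why the cell correspondence matters, the following sketch (our own, using openpyxl; the cell bounding boxes and recognized texts are assumed inputs from the segmentation and OCR stages) rebuilds a spreadsheet by sorting cell boxes into rows and columns rather than discarding their positions.

from openpyxl import Workbook

def cells_to_excel(cells, path, row_tol=15):
    # cells: list of ((x, y, w, h), text) for each segmented table cell.
    # Group the boxes into rows by y-coordinate, then order each row by x,
    # so the cell correspondence of the table is preserved.
    cells = sorted(cells, key=lambda c: (c[0][1], c[0][0]))
    rows, current, last_y = [], [], None
    for (x, y, w, h), text in cells:
        if last_y is not None and abs(y - last_y) > row_tol:
            rows.append(current)
            current = []
        current.append(((x, y), text))
        last_y = y
    if current:
        rows.append(current)

    wb = Workbook()
    ws = wb.active
    for r, row in enumerate(rows, start=1):
        for c, (_, text) in enumerate(sorted(row, key=lambda t: t[0][0]), start=1):
            ws.cell(row=r, column=c, value=text)
    wb.save(path)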
Results and analysis of the experiments
Experimental setup:
the data used in the embodiment of the invention is real data extracted from paper archives of a company in Wuhan City in nearly 30 years. Because of the large number of files, the samples with high frequency and strong characteristics in the files are selected. And finally, 6779 samples are obtained by selection, and the ratio of the training sample to the verification sample is 1: 1.
The system involves two parts: the first is image finishing and the second is automatic document processing. The model training processes of the two parts are independent of each other, and the output of the first part serves as the input of the second. The detailed experimental procedures for the two parts follow.
For the image finishing part, the 6779 archive samples selected for the experiment were annotated for text localization features with the LabelImg tool to produce a VOC2007-format data set. The features fall into two classes: one class for text body localization and the other for text line localization. For the training parameters, batch_size was set to 1; the learning rate was initialized to 0.002, dropped automatically to 0.00002 after 90,000 iterations and to 0.000002 after 120,000 iterations; pooling was set to 2 × 2; the number of iterations was set to 200,000; the remaining parameters were left at their defaults. The accuracy of the trained model was then evaluated: if the recognition success rate of the first class is a1 and that of the second class is b1, the overall recognition success rate is a1 × b1. By analyzing the success rates and related information, weaknesses of the weight model were identified and the data set and some parameters were adjusted until the overall accuracy reached about 90%. The learning-rate schedule is sketched after this paragraph.
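The stepped learning-rate schedule described above amounts to the following rule; this is a sketch, and the framework hook that applies it is not specified in the patent.

def learning_rate(iteration):
    # 0.002 initially, 0.00002 beyond 90,000 iterations,
    # 0.000002 beyond 120,000 iterations
    if iteration > 120000:
        return 0.000002
    if iteration > 90000:
        return 0.00002
    return 0.002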
For the automatic document processing part, 1000 table archive images were selected and the line edges of the tables annotated with labelme. These were fed into the Unet for training, with 80,000 iterations. After the tables were segmented by the Unet, text line segmentation was applied to each segmented table cell with the CTPN model. The CTPN model was trained on an open-source data set of roughly 3.64 million pictures, split into a training set and a validation set at a ratio of 99:1. The data are generated from a Chinese corpus (news plus classical Chinese): 5,990 distinct characters, covering Chinese characters, English letters, digits and punctuation, are rendered with random variations in font, size, gray level, blur, perspective and stretching; each sample contains a fixed 10 characters randomly excerpted from sentences in the corpus, and picture resolution is unified at 280×32.
Results and analysis:
table 1 below is the training effect of the image finishing,
TABLE 1 image grooming training results
(table reproduced as an image in the original publication)
A test set of 100 pictures that were not used in training was used to verify the Unet table segmentation; the results are shown in Table 2. Of the 100 pictures, 50 had been preprocessed by image finishing and 50 had not.
Table 2 Unet table segmentation evaluation
(table reproduced as an image in the original publication)
For CTPN, the invention also replaces the original CNN backbone VGG-16 [8] with a DenseNet [9] network; the comparison is shown in Table 3.
TABLE 3 CTPN text line segmentation evaluation
(table reproduced as an image in the original publication)
The system processing effect is shown in fig. 4.
Analysis of the experimental results shows that the overall accuracy of image finishing reaches 96.5%. Some errors remain because certain input images have weak features, such as blur or overly dense type, which prevent text lines from being extracted effectively and degrade the processing result. The Unet table segmentation performed well on the 100 test pictures: segmentation errors do occur, but with low probability, and they do not greatly affect the subsequent data extraction. The experiment also compared the processing speed and accuracy of the two CNN backbones, VGG-16 and DenseNet, within CTPN; both are sufficiently accurate, and DenseNet achieves higher accuracy at the cost of time, so the choice between them can be made according to specific requirements.
Through the above work, pictures with irregularities such as skew, uneven illumination and noise interference can be finished into the standardized pictures specified for digitized paper archives; on that basis, the automatic processing scheme of the invention can segment and label table archive pictures, providing a well-formed picture data format for extracting data from designated positions of table archives.
The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. An automatic filing processing method for scanned archive images based on deep learning, characterized in that the method comprises the following steps:
step one: data preprocessing and model training
the pictures to be processed are divided into five categories: drawings, handwriting, tables, photos and others, and the document body and text lines are annotated for each category of picture; the preprocessed pictures are then trained with the Object Detection framework and a Faster R-CNN model to obtain a picture classification and document body localization model;
step two: subject identification
using the document body localization model obtained in step one, the document body and text lines are located in the input picture, and the document body is segmented out to obtain the text line portions of the document;
step three: tilt correction
pixel points of the text line portion obtained in step two are selected and a straight line is fitted through them to obtain the overall inclination angle of the document; the document body cropped out in step two is then rotated by this angle to obtain a corrected document picture;
step four: automated thresholding and image reconstruction
automatic thresholding and image reconstruction are applied to the corrected document picture obtained in step three to obtain a standardized output picture;
step five: processing of tabular picture data
a subset of table archive images is selected from the standardized output pictures of step four, and the line edges of the tables are annotated with labelme;
step six: table partitioning
the data annotated in step five are fed into a Unet network for training to obtain a table segmentation model, and the tables in input pictures are segmented according to the table segmentation model to obtain cell pictures;
step seven: text line segmentation
a CTPN model performs text line segmentation on each cell segmented in step six.
2. The automatic filing processing method for scanned archive images based on deep learning according to claim 1, characterized in that: the automatic thresholding in step four establishes a dynamic threshold from the local pixel distribution of the picture and applies threshold segmentation to the picture according to that dynamic threshold, so that most details of the picture are retained and loss of picture content is avoided; the image reconstruction in step four outputs the pictures in standardized form, at A4 or A3 size according to category.
3. The automatic filing processing method for scanned archive images based on deep learning according to claim 1, characterized in that: in step four, the thresholding part of the system implements dynamic thresholding based on local image characteristics. Let σ_xy and m_xy denote the standard deviation and mean of the set of pixels in a neighborhood S_xy centered on coordinates (x, y) of the image. The general form of the variable local threshold is:

T_xy = a·σ_xy + b·m_xy

where a and b are non-negative constants. The segmented image is then computed as:

g(x, y) = 1 if f(x, y) > T_xy, and 0 otherwise

where f(x, y) is the input image. This rule is evaluated at every pixel location, and a different threshold T_xy is computed at each point (x, y) from the pixels in its neighborhood S_xy.
4. The automatic filing processing method for scanned archive images based on deep learning according to claim 1, characterized in that: in step seven, the CTPN model is trained on an open-source data set, with the training pictures split into a training set and a validation set at a ratio of 99:1; the data are generated from a Chinese corpus: 5,990 distinct characters, covering Chinese characters, English letters, digits and punctuation, are rendered with random variations in font, size, gray level, blur, perspective and stretching; each sample contains a fixed 10 characters randomly excerpted from sentences in the corpus, and picture resolution is unified at 280×32.
5. The automatic filing processing method for scanned archive images based on deep learning according to claim 1, characterized in that: in step five, 1000 table archive images are selected.
6. The automatic filing processing method for scanned archive images based on deep learning according to claim 1, characterized in that: in step six, the annotated data are fed into the Unet for training, with 80,000 iterations.
7. An automatic filing processing system for scanned archive images based on deep learning, characterized in that it comprises an image finishing system and an automatic document processing system;
the image finishing system is used for carrying out data preprocessing and model training on a picture to be processed, obtaining a document main body positioning model based on deep learning through training, identifying and carrying out inclination correction on a main body of an image document by using the document main body positioning model, and then carrying out automatic threshold processing on a text to obtain a form archive image after image finishing;
the automatic document processing system is used for segmenting the table by using a deep learning segmentation network Unet and segmenting each cell in the table by using a CTPN model to the table archive image processed by the image finishing system, and provides a basis for OCR recognition and data extraction of subsequent table data.
8. The automatic filing processing system for scanned archive images based on deep learning according to claim 7, characterized in that: the automatic thresholding establishes a dynamic threshold from the local pixel distribution of the picture and applies threshold segmentation to the picture according to that dynamic threshold, so that most details of the picture are retained and loss of picture content is avoided.
9. The automatic filing processing system for scanned archive images based on deep learning according to claim 8, characterized in that: the thresholding part of the image finishing system implements dynamic thresholding based on local image characteristics. Let σ_xy and m_xy denote the standard deviation and mean of the set of pixels in a neighborhood S_xy centered on coordinates (x, y) of the image. The general form of the variable local threshold is:

T_xy = a·σ_xy + b·m_xy

where a and b are non-negative constants. The segmented image is then computed as:

g(x, y) = 1 if f(x, y) > T_xy, and 0 otherwise

where f(x, y) is the input image. This rule is evaluated at every pixel location, and a different threshold T_xy is computed at each point (x, y) from the pixels in its neighborhood S_xy.
10. The automatic filing processing system for scanned archive images based on deep learning according to claim 7, characterized in that: the CTPN model is trained on an open-source data set, with the training pictures split into a training set and a validation set at a ratio of 99:1; the data are generated from a Chinese corpus: 5,990 distinct characters, covering Chinese characters, English letters, digits and punctuation, are rendered with random variations in font, size, gray level, blur, perspective and stretching; each sample contains a fixed 10 characters randomly excerpted from sentences in the corpus, and picture resolution is unified at 280×32.
CN202110230772.4A 2021-03-02 2021-03-02 Automatic filing processing system and method for scanned archive image based on deep learning Pending CN113065396A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110230772.4A CN113065396A (en) 2021-03-02 2021-03-02 Automatic filing processing system and method for scanned archive image based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110230772.4A CN113065396A (en) 2021-03-02 2021-03-02 Automatic filing processing system and method for scanned archive image based on deep learning

Publications (1)

Publication Number Publication Date
CN113065396A true CN113065396A (en) 2021-07-02

Family

ID=76559534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110230772.4A Pending CN113065396A (en) 2021-03-02 2021-03-02 Automatic filing processing system and method for scanned archive image based on deep learning

Country Status (1)

Country Link
CN (1) CN113065396A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113642401A (en) * 2021-07-13 2021-11-12 常州微亿智造科技有限公司 Document line segmentation and classification method and system based on deep learning network
CN113793264A (en) * 2021-09-07 2021-12-14 北京航星永志科技有限公司 Archive image processing method and system based on convolution model and electronic equipment
CN114141108A (en) * 2021-12-03 2022-03-04 中国科学技术大学 Blind-aiding voice-aided reading equipment and method
CN116433494A (en) * 2023-04-19 2023-07-14 南通大学 File scanning image automatic correction and trimming method based on deep learning

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163198A (en) * 2018-09-27 2019-08-23 腾讯科技(深圳)有限公司 A kind of Table recognition method for reconstructing, device and storage medium
CN110348294A (en) * 2019-05-30 2019-10-18 平安科技(深圳)有限公司 The localization method of chart, device and computer equipment in PDF document
CN110363102A (en) * 2019-06-24 2019-10-22 北京融汇金信信息技术有限公司 A kind of identification of objects process method and device of pdf document
CN111027297A (en) * 2019-12-23 2020-04-17 海南港澳资讯产业股份有限公司 Method for processing key form information of image type PDF financial data
CN112052853A (en) * 2020-09-09 2020-12-08 国家气象信息中心 Text positioning method of handwritten meteorological archive data based on deep learning

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163198A (en) * 2018-09-27 2019-08-23 腾讯科技(深圳)有限公司 A kind of Table recognition method for reconstructing, device and storage medium
CN110348294A (en) * 2019-05-30 2019-10-18 平安科技(深圳)有限公司 The localization method of chart, device and computer equipment in PDF document
CN110363102A (en) * 2019-06-24 2019-10-22 北京融汇金信信息技术有限公司 A kind of identification of objects process method and device of pdf document
CN111027297A (en) * 2019-12-23 2020-04-17 海南港澳资讯产业股份有限公司 Method for processing key form information of image type PDF financial data
CN112052853A (en) * 2020-09-09 2020-12-08 国家气象信息中心 Text positioning method of handwritten meteorological archive data based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
孙婧婧 et al.: "Text detection in natural scenes based on a lightweight network", 《电子测量技术》 (Electronic Measurement Technology) *
应自炉 et al.: "Document image layout analysis with multi-feature fusion", 《中国图象图形学报》 (Journal of Image and Graphics) *
盛业华: "Mathematical Morphology Theory and Map Scanning Recognition Technology" (《数学形态学理论与地图扫描识别技术》), 30 June 1999, China University of Mining and Technology Press *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113642401A (en) * 2021-07-13 2021-11-12 常州微亿智造科技有限公司 Document line segmentation and classification method and system based on deep learning network
CN113793264A (en) * 2021-09-07 2021-12-14 北京航星永志科技有限公司 Archive image processing method and system based on convolution model and electronic equipment
CN113793264B (en) * 2021-09-07 2022-11-15 北京航星永志科技有限公司 Archive image processing method and system based on convolution model and electronic equipment
CN114141108A (en) * 2021-12-03 2022-03-04 中国科学技术大学 Blind-aiding voice-aided reading equipment and method
CN116433494A (en) * 2023-04-19 2023-07-14 南通大学 File scanning image automatic correction and trimming method based on deep learning
CN116433494B (en) * 2023-04-19 2024-02-02 南通大学 File scanning image automatic correction and trimming method based on deep learning

Similar Documents

Publication Publication Date Title
CN110210413B (en) Multidisciplinary test paper content detection and identification system and method based on deep learning
CN109241894B (en) Bill content identification system and method based on form positioning and deep learning
CN111814722B (en) Method and device for identifying table in image, electronic equipment and storage medium
CN113065396A (en) Automatic filing processing system and method for scanned archive image based on deep learning
CN111401372A (en) Method for extracting and identifying image-text information of scanned document
CN112836650B (en) Semantic analysis method and system for quality inspection report scanning image table
CN110647885B (en) Test paper splitting method, device, equipment and medium based on picture identification
CN110647795A (en) Form recognition method
CN111626292B (en) Text recognition method of building indication mark based on deep learning technology
CN112052852A (en) Character recognition method of handwritten meteorological archive data based on deep learning
CN112364834A (en) Form identification restoration method based on deep learning and image processing
CN109741273A (en) A kind of mobile phone photograph low-quality images automatically process and methods of marking
CN115331245A (en) Table structure identification method based on image instance segmentation
CN115578741A (en) Mask R-cnn algorithm and type segmentation based scanned file layout analysis method
CN111652117A (en) Method and medium for segmenting multi-document image
CN116824608A (en) Answer sheet layout analysis method based on target detection technology
CN112200789B (en) Image recognition method and device, electronic equipment and storage medium
CN113139535A (en) OCR document recognition method
CN112509026A (en) Insulator crack length identification method
JP5211449B2 (en) Program, apparatus and method for adjusting recognition distance, and program for recognizing character string
CN115171133A (en) Table structure detection method for leveling irregular table image
CN115880566A (en) Intelligent marking system based on visual analysis
CN114565749A (en) Method and system for identifying key content of visa document of power construction site
CN114066861A (en) Coal and gangue identification method based on cross algorithm edge detection theory and visual features
CN112565549A (en) Book image scanning method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination