CN116433494A - File scanning image automatic correction and trimming method based on deep learning - Google Patents

File scanning image automatic correction and trimming method based on deep learning

Info

Publication number
CN116433494A
CN116433494A (application CN202310420534.9A)
Authority
CN
China
Prior art keywords
image
model
trimming
module
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310420534.9A
Other languages
Chinese (zh)
Other versions
CN116433494B (en)
Inventor
孙强
吉红慧
陈逸彬
蒋行健
曹张华
邵蔚
黄勋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong University
Original Assignee
Nantong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong University
Priority to CN202310420534.9A
Publication of CN116433494A
Application granted
Publication of CN116433494B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/60: Rotation of whole images or parts thereof
    • G06T 3/608: Rotation of whole images or parts thereof by skew deformation, e.g. two-pass or three-pass rotation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformations in the plane of the image
    • G06T 3/40: Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4046: Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00: Image enhancement or restoration
    • G06T 5/90: Dynamic range modification of images or parts thereof
    • G06T 5/94: Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/10: Segmentation; Edge detection
    • G06T 7/11: Region-based segmentation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10: Character recognition
    • G06V 30/14: Image acquisition
    • G06V 30/146: Aligning or centring of the image pick-up or image-field
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00: Indexing scheme for image analysis or image enhancement
    • G06T 2207/20: Special algorithmic details
    • G06T 2207/20084: Artificial neural networks [ANN]
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of automatic correction and trimming of file scan images, and in particular to a deep-learning-based method for automatically correcting and trimming file scan images, comprising the following steps: preprocessing the file scan images; feeding the processed image data set into an angle-correction and edge-trimming network model for training; extracting features from the images; performing automatic correction and trimming on the images; and processing file scan images with the trained model and outputting the automatically corrected and trimmed results. The model comprises a feature extraction module, a deviation-rectifying module and a trimming module. An adaptive convolution module and a channel attention module are added after the deviation-rectifying module and the trimming module respectively, so that small angular offsets can be handled accurately and the edge blurriness of the image can be reduced. The method increases the computation speed of the model, makes the model more lightweight, improves the efficiency of correcting and trimming file scan images, and accurately handles images with small angular offsets.

Description

File scanning image automatic correction and trimming method based on deep learning
Technical Field
The invention relates to the technical field of automatic correction and trimming of file scan images, and in particular to a deep-learning-based method for automatically correcting and trimming file scan images.
Background
In modern society, digitization has become the dominant mode of information processing, and digitizing all kinds of documents is increasingly common. When files are scanned, improper scanner placement, unevenly placed documents and similar causes lead to distortion, tilt, missing corners and other defects in the scanned image. These problems seriously affect subsequent document recognition, classification, retrieval and browsing. Automatic correction and trimming of file scan images has therefore become an important link in digital archive processing.
At present, many methods exist for document correction and trimming. Traditional corner-detection methods exploit the geometric features of the image to correct skew, but can produce large errors when the document image contains noise, shadows and the like. Edge-detection methods likewise use image edge information for correction. In recent years, deep learning has markedly improved document deskewing, and correction methods based on convolutional neural networks (CNNs) are widely applied. For example, one such method performs adaptive document correction by building a CNN model that learns the structural information of the document, but corrects inaccurately when the distortion angle is small. Another CNN-based method corrects and crops documents by learning their feature points and boundary information, but handles documents with redundant borders poorly.
Although existing document-correction methods achieve some success, it remains challenging to obtain accurate results on file scan images that are not warped, have no missing corners, and are offset by only a very small angle or hardly at all, yet still need their redundant borders removed. The rotation angles of such document images are so small that conventional rotation-angle detection struggles to detect them, and the border information of the images is unclear. There is therefore a need for a deep-learning-based automatic correction and trimming method that achieves high-precision correction and trimming on such file scan images, copes with difficult cases such as small angles and unclear border information, and improves the efficiency and precision of digital archive processing. The invention aims to overcome these defects and limitations of the prior art and to improve the efficiency and accuracy of automatic correction and trimming of file scan images.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a deep-learning-based method for automatically correcting and trimming file scan images, which achieves higher-precision automatic correction and trimming, copes with difficult cases such as small angles and unclear border information, and improves the efficiency and precision of digital archive processing.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a file scanning image automatic correction and trimming method based on deep learning comprises the following steps:
S11, preprocessing an input file scan image, including: 1) image edge cropping, 2) image resizing, 3) data set synthesis, and 4) data set grouping, so that the data set meets the model's input requirements;
S12, feeding the processed file scan image data set into the angle-correction and edge-trimming network model for training; the model extracts features from the input images;
S13, using the ACMCN model to automatically detect the edges in the image and correct the skew according to the edge positions, and to automatically detect the content in the image and trim it according to the content position, removing redundant border regions;
S14, judging whether the processed image meets the set correction and trimming requirements, namely that the average angular deviation of all straight lines in the processed file scan image lies between 0° and 1° and that the edge blurriness of all edges lies in [0, 0.1]; if the requirements are met, outputting the processed image; if not, iterating the correction and trimming until the requirements are met and then outputting the processed image;
S15, processing file scan images with the trained model and outputting the automatically corrected and trimmed file scan images.
Preferably, in step S11, said 1): image edge cropping means first reading n original pictures that are already roughly cropped and deflected by 0°, any picture not at 0° being manually corrected to 0°; a further cropping pass then removes 60 pixels from the top, bottom, left and right of each image to cut away the mottled noise at the document edges, generating a new image data set;
said 2): image resizing means setting the height of every image in the new data set to 480 pixels and scaling the width proportionally, so that image sizes are unified, generating a new image data set;
said 3): data set synthesis means taking 75% of the new image data set for vertical flipping and 75% for horizontal flipping, selecting 50% of the images for rotation by an arbitrary angle in [-90°, 90°], then applying compression enhancement to all images with the post-compression image quality bounded below by a JFIF quality of 30 and above by 80, and selecting 70% of the images for random shadow enhancement, with the shadow region allowed anywhere in the image and the number of shadows between 0 and 1; 50% of the pictures then receive random brightness and contrast enhancement, with the contrast adjustment range set to 0.1-0.34 and the image brightness reduced by 50%; finally, z background pictures common to archival scans are randomly composited with the n processed images, generating the final data set;
said 4): data set grouping means dividing the final data set into groups of 6 pictures, taking 1 picture from each group as the validation set and the remaining 5 as the training set, thereby splitting the final data set into a training set and a validation set.
Preferably, in step S12, the angle-correction and edge-trimming network model comprises: a) a feature extraction module, b) a deviation-rectifying module, and c) a trimming module, with the following architecture:
a) The feature extraction module has a network depth of 16 layers: 5 convolutional layers, 3 fully connected layers and 8 nonlinear activation layers, each nonlinear activation layer using the ReLU activation function; a dropout layer follows each fully connected layer to prevent overfitting, with dropout rates of 0.4, 0.3 and 0.25 respectively, and a softmax activation function performs the final classification;
b) The deviation-rectifying module has a network depth of 9 layers: 4 convolutional layers, 4 pooling layers and 1 fully connected layer; each convolutional layer uses 3×3 kernels, with 32, 64, 128 and 256 kernels in turn; each pooling layer uses a 2×2 pooling window; all convolutional and pooling layers use the ReLU activation function; and the fully connected layer has 1 neuron with a sigmoid activation function;
c) The trimming module has a network depth of 4 layers: 3 convolutional layers and 1 fully connected stage; each convolutional layer uses 3×3 kernels, with 32, 64 and 128 kernels in turn; in the fully connected stage, the first hidden layer has 256 neurons and the second 128. In addition, an adaptive convolution module and a channel attention module are added after the deviation-rectifying module and the trimming module. The adaptive convolution module comprises one one-dimensional convolutional layer and two adaptive convolutional layers: the one-dimensional convolutional layer has 1 output channel; the first adaptive convolutional layer has 1 input channel and 2 output channels; and the second has 2 input channels and 1 output channel. The size and shape of an adaptive convolutional layer's kernel are generated dynamically and adapt to the size and shape of the input feature map. The channel attention module comprises a global average pooling layer, two fully connected layers and a sigmoid activation layer: the global average pooling layer average-pools the input feature map along the channel dimension to obtain a one-dimensional feature map; the two fully connected layers respectively compress and expand the input along the channel dimension to obtain two one-dimensional feature maps; the sigmoid activation layer adds the two one-dimensional feature maps and normalizes them with a sigmoid function to obtain a channel attention weight vector; and finally the weight vector is multiplied channel by channel with the input feature map to obtain the weighted feature map.
Preferably, in step S15, the trained model means that the training set is fed into the ACMCN model for training, with num_epochs set to 100, i.e. the model traverses the whole training set 100 times, BATCH_size set to 16, i.e. the model trains on 16 samples at a time, and num_classes set to 2, i.e. the model performs binary classification; the performance of the trained model in step S15 is evaluated with an IOU function and a Loss function, where the IOU measures the performance of object detection and semantic segmentation, its value lies between 0 and 1, and the larger the value, the more accurate the prediction, i.e. the better the performance; the formula is expressed as follows:
IOU = Area(A∩B) / Area(A∪B) = (w_overlap × h_overlap) / (w_A × h_A + w_B × h_B - w_overlap × h_overlap)
where Area(A∩B) denotes the intersection area of A and B, Area(A∪B) denotes their union area, w_overlap and h_overlap denote the width and height of the overlap between A and B, and w_A, h_A, w_B and h_B denote the widths and heights of A and B respectively; A represents the predicted correction-and-trimming result for the file scan image, and B represents the actual correction-and-trimming result;
Loss is the loss function: during training, the ACMCN model makes predictions from the input data, and the difference between the predicted result and the true result is the value of the loss function; the smaller the Loss value, the closer the model's prediction is to the truth, i.e. the better the performance; the formula is expressed as follows:
L_i = -[y_i · log(p_i) + (1 - y_i) · log(1 - p_i)], Loss = (1/N) · Σ L_i (summing over i = 1, …, N)
where y_i is the label of the i-th file scan image's prediction (a file scan image needing correction and trimming is labelled 1, one that does not is labelled 0), p_i is the predicted probability that the i-th file scan image needs correction and trimming, L_i is the value of the cross-entropy loss function for the i-th file scan image, and N is the total number of training samples;
whether the model has finished training is measured by the IOU and the Loss: when the IOU reaches 0.5 and the Loss falls below 0.1, the target is considered correctly detected.
Compared with the prior art, the invention has the following beneficial effects:
1. In contrast to traditional document-deskewing data sets, which feature heavy distortion, missing corners and large tilt angles, the data set of the invention effectively reduces the complexity of model processing, increases computation speed, makes the model more lightweight, and improves the efficiency of correcting and trimming file scan images.
2. Traditional document correction and trimming methods perform poorly on file scan images that are not warped, have no missing corners, and are offset by a small angle or hardly at all, yet need their redundant borders removed. The invention therefore proposes the ACMCN model, adding an adaptive convolution module and a channel attention module after the deviation-rectifying module; the size and shape of the adaptive convolutional layer's kernel are generated dynamically and adapt to the size and shape of the input feature map. The channel attention module weights the channels of the input feature map, further improving feature expressiveness and discriminability. Small angular offsets of scanned documents can thus be handled accurately, improving the correction result.
3. The ACMCN model of the invention also adds an adaptive convolution module and a channel attention module after the trimming module, letting the trimming module detect the edges of the file scan image more sharply, reducing edge blurriness, and effectively improving the model's discrimination ability and robustness with respect to the input image.
4. The invention further adds a judgment mechanism after ACMCN processing: if the processed image meets the set requirements, it is output; otherwise the correction and trimming are iterated until the requirements are met, and the processed image is then output. This mechanism greatly strengthens the model's handling of file scan images.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of data preprocessing in the present invention;
FIG. 3 is a flow chart of ACMCN model training in the present invention;
FIG. 4 is a block diagram of a feature extraction module of the ACMCN model of the present invention;
FIG. 5 is a block diagram of a correction module of the ACMCN model according to the present invention;
fig. 6 is a block diagram of a trimming module of the ACMCN model according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings, so that those skilled in the art can better understand the advantages and features of the invention and its scope of protection is more clearly defined. The described embodiments are only some, not all, embodiments of the invention; all other embodiments obtained by a person of ordinary skill in the art without inventive effort fall within the scope of the invention.
As shown in FIG. 1, the method for automatically rectifying and trimming the file scan image based on deep learning comprises the following steps:
S11, preprocessing an input file scan image, including: 1) image edge cropping, 2) image resizing, 3) data set synthesis, and 4) data set grouping, so that the data set meets the model's input requirements;
S12, feeding the processed file scan image data set into the angle-correction and edge-trimming network model for training; the model extracts features from the input images;
S13, using the ACMCN model to automatically detect the edges in the image and correct the skew according to the edge positions, and to automatically detect the content in the image and trim it according to the content position, removing redundant border regions;
S14, judging whether the processed image meets the set correction and trimming requirements, namely that the average angular deviation of all straight lines in the processed file scan image lies between 0° and 1° and that the edge blurriness of all edges lies in [0, 0.1]; if the requirements are met, outputting the processed image; if not, iterating the correction and trimming until the requirements are met and then outputting the processed image;
The average angular deviation of all straight lines in the processed file scan image refers to the deviation of each straight line from the expected angle: the absolute values of all deviations are summed and divided by the number of straight lines to give the average deviation. The calculation formula is as follows:
θ_avg = (1/l) · Σ |θ_k - θ_0| (summing over k = 1, …, l)
where l is the number of straight lines, θ_k is the angle of the k-th straight line, and θ_0 is the expected angle, here set to 0.
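For illustration, this formula translates directly into a few lines of Python; the line angles themselves are assumed to come from an upstream line detector (e.g. a Hough transform), which the patent does not specify:
```python
import numpy as np

def average_angle_deviation(line_angles_deg, expected_deg=0.0):
    """Mean absolute deviation of detected line angles from the expected
    angle theta_0, per the formula above: (1/l) * sum(|theta_k - theta_0|)."""
    angles = np.asarray(line_angles_deg, dtype=np.float64)
    return float(np.abs(angles - expected_deg).mean())

# lines detected at 0.3, -0.5 and 0.8 degrees -> about 0.53 degrees,
# which satisfies the 0-1 degree requirement of step S14
print(average_angle_deviation([0.3, -0.5, 0.8]))
```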
The requirement that the edge blurriness of all edges lie in [0, 0.1] refers to the degree of edge blur in image processing, which describes the sharpness of image edges: smaller values indicate sharper edges and larger values indicate blurrier edges. The invention uses the Sobel operator; the specific calculation formula is as follows:
Blurry = (1/m) · Σ |I(i,j) - I_avg| (summing over i = 0, …, u and j = 0, …, v)
where I(i,j) denotes the pixel value in row i, column j of the (Sobel-filtered) image, I_avg denotes the average pixel value of the whole image, and m denotes the total number of pixels in the image. The summation accumulates the blur contribution of every one of the m pixels. To avoid negative values, pixel values are normalized to the range [0, 1].
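A minimal sketch of this measure under the interpretation above, i.e. the mean absolute deviation of the normalized Sobel gradient magnitude; the use of OpenCV and the combination of the two Sobel directions into a gradient magnitude are assumptions, not details given in the patent:
```python
import cv2
import numpy as np

def edge_blurriness(gray: np.ndarray) -> float:
    """Blur measure sketched from the formula above: Sobel gradient
    magnitude, normalized to [0, 1], then mean absolute deviation from
    the image-wide average response."""
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)
    mag = np.sqrt(gx ** 2 + gy ** 2)
    mag /= mag.max() + 1e-12  # normalize to [0, 1] to avoid negative values
    return float(np.abs(mag - mag.mean()).mean())

# an image passes the S14 check when this value falls inside [0, 0.1]
```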
S15, processing the file scanning image by using the model obtained through training and outputting the file scanning image after automatic correction and trimming processing.
Specifically, as shown in FIG. 2, the steps for preprocessing an input file scan image are as follows:
s21, the image edge cutting means that 8000 original pictures which are basically cut and are deflected by 0 degree are read, and if the original pictures are not 0 degree, the manual deviation correction is 0 degree. Then, a next round of cropping operation is performed, 60 pixels are cropped out from the image up, down, left and right, so that the variegation of the document edge is cropped out, and a new image data set is generated.
S22, adjusting the image size refers to setting the height pixels of all images of the new image data set to 480dpi, and adjusting the width pixels of the images according to the corresponding proportion, so that the image size is unified, and the new image data set is generated.
S23, synthesizing a data set, namely respectively taking 75% of a new image data set to perform vertical overturning and horizontal overturning, simultaneously selecting 50% of images to perform rotating operation, taking any value between intervals of-90 degrees and 90 degrees, then performing compression enhancement on all the images, designating the lower limit of the quality of the compressed images as 30JFIF, the upper limit as 80JFIF, and selecting 70% of the images to perform random shadow enhancement, wherein the possible occurrence area of designated shadows is the whole image, the lower limit of the number of shadows is 0, and the upper limit is 1. Then, 50% of the pictures are selected for random brightness and contrast enhancement, wherein the range of the designated contrast adjustment is between 0.1 and 0.34, and the designated image brightness is reduced by 50%. And finally, randomly synthesizing the common background pictures of 100 archival scanning images and 8000 processed images, and generating a final data set.
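The S23 recipe maps naturally onto off-the-shelf augmentation libraries. The following hypothetical sketch uses albumentations (not named in the patent; parameter names follow its 1.3-era API), with the stated percentages treated as per-image probabilities:
```python
import albumentations as A

# Sketch of the S23 synthesis step; the background compositing is omitted.
augment = A.Compose([
    A.VerticalFlip(p=0.75),
    A.HorizontalFlip(p=0.75),
    A.Rotate(limit=90, p=0.5),                    # any angle in [-90, 90] degrees
    A.ImageCompression(quality_lower=30,          # JFIF quality bounded in [30, 80]
                       quality_upper=80, p=1.0),
    A.RandomShadow(shadow_roi=(0, 0, 1, 1),       # shadows may appear anywhere
                   num_shadows_lower=0,
                   num_shadows_upper=1, p=0.7),
    A.RandomBrightnessContrast(brightness_limit=(-0.5, -0.5),  # brightness reduced by 50%
                               contrast_limit=(0.1, 0.34), p=0.5),
])

# usage: out = augment(image=np.asarray(pil_image))["image"]
```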
In S23, applying compression enhancement to all images with the post-compression quality bounded below by 30 and above by 80 (JFIF) refers to the JFIF format commonly used to represent JPEG images in image processing and storage. The lower and upper bounds on compressed image quality are specified through the compression quality parameter of the JFIF format: the quality parameter Q of a JPEG image is usually an integer between 0 and 100, where Q = 100 means the highest quality and lowest compression ratio and Q = 0 the lowest quality and highest compression ratio. The lower compression bound is the minimum compression ratio the JPEG encoder achieves when compressing an image, and the upper compression bound is the maximum. The formula is as follows:
Y=X/Q
where Y is the file size of the compressed file scan image, X is the file size of the original uncompressed file scan image, and Q is the compression ratio; all file scan images in the invention are JPEG images.
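As a sketch of this relationship using Pillow (an assumption; the patent does not name an encoder): note that the patent uses Q both for the JPEG/JFIF quality parameter (0-100) and for the compression ratio in Y = X / Q, and the snippet below recovers the latter as X / Y:
```python
import io
import os
from PIL import Image

def jpeg_compress(path: str, quality: int) -> bytes:
    """Re-encode a scan as JPEG at the given JFIF quality (30-80 in S23)."""
    buf = io.BytesIO()
    Image.open(path).convert("RGB").save(buf, format="JPEG", quality=quality)
    return buf.getvalue()

# compression ratio Q recovered from Y = X / Q:
# X = os.path.getsize("scan.jpg")
# Y = len(jpeg_compress("scan.jpg", 30))
# Q = X / Y
```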
S24, data set grouping: the final data set is divided into groups of 6 pictures; from each group 1 picture is taken as the validation set and the remaining 5 as the training set, splitting the final data set into a training set and a validation set.
Specifically, as shown in FIG. 3, the architecture of the angle-correction and edge-trimming network model is as follows:
s31, a feature extraction module: as shown in fig. 4, the network depth is 16 layers, the network contains 5 convolutional layers, 3 fully-connected layers, and 8 nonlinear active layers, each using a ReLU activation function, the network uses dropout layers after each fully-connected layer to prevent overfitting, discard rates of 0.4, 0.3, and 0.25, respectively, and final classification using a softmax activation function.
S32, a deviation rectifying module: as shown in fig. 5, the network depth is 9 layers, the network comprises 4 convolution layers, 4 pooling layers and 1 fully connected layer, wherein the convolution kernel size of each convolution layer is 3×3, the number of convolution kernels is 32, 64, 128 and 256 in turn, the pooling window size of each pooling layer is 2×2, all convolution layers and pooling layers use a ReLU activation function, and the number of neurons of the fully connected layer is 1 and uses a sigmoid activation function.
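Read literally, S32 can be sketched in PyTorch as follows; the padding, the 480×480 input resolution, the class name, and the reading of the single sigmoid output as a normalized skew-angle estimate are all assumptions the patent leaves open:
```python
import torch
import torch.nn as nn

class RectifyModule(nn.Module):
    """Deviation-rectifying branch per S32: 4x (3x3 conv + ReLU + 2x2 max
    pool) with 32/64/128/256 kernels, then one fully connected neuron
    with a sigmoid activation."""
    def __init__(self, in_channels: int = 3, input_size: int = 480):
        super().__init__()
        layers, ch = [], in_channels
        for out_ch in (32, 64, 128, 256):
            layers += [nn.Conv2d(ch, out_ch, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True),
                       nn.MaxPool2d(kernel_size=2)]
            ch = out_ch
        self.features = nn.Sequential(*layers)
        feat = input_size // 16                   # four 2x2 poolings halve 4 times
        self.head = nn.Sequential(nn.Flatten(),
                                  nn.Linear(256 * feat * feat, 1),
                                  nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))       # normalized angle estimate

# RectifyModule()(torch.randn(1, 3, 480, 480)) -> tensor of shape (1, 1)
```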
S33, a trimming module: as shown in fig. 6, the network depth is 4 layers, and the network comprises 3 convolution layers and 1 full connection layer, wherein the convolution kernel size of each convolution layer is 3×3, the number of convolution kernels is 32, 64 and 128 in sequence, and in the full connection layer, the number of neurons of the first hidden layer is 256, and the number of neurons of the second hidden layer is 128.
In addition, the invention adds an adaptive convolution module and a channel attention module after the deviation correcting module and the trimming module, wherein the adaptive convolution module comprises a one-dimensional convolution layer, two adaptive convolution layers, the number of output channels of the one-dimensional convolution layer is 1, the number of input channels of the first adaptive convolution layer is 1, the number of output channels is 2, the number of input channels of the second adaptive convolution layer is 2, and the number of output channels is 1. The size and the shape of the convolution kernel of the self-adaptive convolution layer are dynamically generated, and can be self-adaptively adjusted according to the size and the shape of the input feature map. The channel attention module comprises a global average pooling layer, two full connection layers and a sigmoid activation layer. The global average pooling layer carries out average pooling on the input feature images along the channel dimension to obtain a one-dimensional feature image, the two full-connection layers respectively compress and expand the input feature images along the channel dimension to obtain two one-dimensional feature images, the sigmoid activation layer adds the two one-dimensional feature images and normalizes the two one-dimensional feature images by using a sigmoid function to obtain a channel attention weight vector, and finally the channel attention weight vector and the input feature images are multiplied channel by channel to obtain the weighted feature images.
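A hedged PyTorch sketch of the two add-on modules follows. The channel attention implements the pattern the text describes (global average pooling, compress/expand fully connected layers, sigmoid normalization, channel-by-channel multiplication). For the adaptive convolution, only the kernel weights are generated dynamically here, from global context; the patent's dynamically varying kernel size and shape are not reproduced, and the 3×3 kernel and reduction factor are assumptions:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Global average pooling -> compressing FC -> expanding FC -> sigmoid,
    then channel-by-channel multiplication with the input feature map."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = max(1, channels // reduction)
        self.fc1 = nn.Linear(channels, hidden)   # compress along channels
        self.fc2 = nn.Linear(hidden, channels)   # expand along channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = F.adaptive_avg_pool2d(x, 1).view(b, c)        # one value per channel
        w = torch.sigmoid(self.fc2(F.relu(self.fc1(w))))  # attention weights
        return x * w.view(b, c, 1, 1)                     # weighted feature map

class AdaptiveConv2d(nn.Module):
    """Per-sample dynamic convolution: a small generator predicts the 3x3
    kernel weights from the input's global context, and the kernels are
    applied with a grouped conv (one group per batch element)."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, k
        self.gen = nn.Linear(in_ch, out_ch * in_ch * k * k)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, _, h, w = x.shape
        kernels = self.gen(x.mean(dim=(2, 3)))            # (B, out*in*k*k)
        kernels = kernels.view(b * self.out_ch, self.in_ch, self.k, self.k)
        y = F.conv2d(x.reshape(1, b * self.in_ch, h, w),
                     kernels, padding=self.k // 2, groups=b)
        return y.view(b, self.out_ch, h, w)

# e.g. x = torch.randn(2, 1, 64, 64)
# AdaptiveConv2d(1, 2)(x).shape -> (2, 2, 64, 64); ChannelAttention(2) keeps the shape
```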
Specifically, the constructed training set is fed into the ACMCN model for training, with num_epochs set to 100, i.e. the model traverses the whole training set 100 times, BATCH_size set to 16, i.e. the model trains on 16 samples at a time, and num_classes set to 2, i.e. the model performs binary classification. The performance of the trained model is evaluated with an IOU function and a Loss function, where the IOU measures the performance of object detection and semantic segmentation; its value lies between 0 and 1, and the larger the value, the more accurate the prediction, i.e. the better the performance. The formula is expressed as follows:
IOU = Area(A∩B) / Area(A∪B) = (w_overlap × h_overlap) / (w_A × h_A + w_B × h_B - w_overlap × h_overlap)
where Area(A∩B) denotes the intersection area of A and B, Area(A∪B) denotes their union area, w_overlap and h_overlap denote the width and height of the overlap between A and B, and w_A, h_A, w_B and h_B denote the widths and heights of A and B respectively; A represents the predicted correction-and-trimming result for the file scan image, and B represents the actual correction-and-trimming result;
Loss is the loss function: during training, the ACMCN model makes predictions from the input data, and the difference between the predicted result and the true result is the value of the loss function. The smaller the Loss value, the closer the model's prediction is to the truth, i.e. the better the performance. The formula is expressed as follows:
L_i = -[y_i · log(p_i) + (1 - y_i) · log(1 - p_i)], Loss = (1/N) · Σ L_i (summing over i = 1, …, N)
where y_i is the label of the i-th file scan image's prediction (a file scan image needing correction and trimming is labelled 1, one that does not is labelled 0), p_i is the predicted probability that the i-th file scan image needs correction and trimming, L_i is the value of the cross-entropy loss function for the i-th file scan image, and N is the total number of training samples.
Therefore, whether the model has finished training is measured by the IOU and the Loss: when the IOU reaches 0.5 and the Loss falls below 0.1, the target can be considered correctly detected.
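Under the stated hyper-parameters (num_epochs = 100, BATCH_size = 16, binary labels), a training-loop sketch could look as follows; the dataset object, the optimizer choice and the learning rate are placeholders and assumptions, not the patent's implementation:
```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

NUM_EPOCHS, BATCH_SIZE = 100, 16

def train(model: nn.Module, train_set, device: str = "cuda") -> None:
    """Binary cross-entropy training matching the Loss formula above;
    the model outputs one logit per image (needs correction-and-trimming
    vs. not)."""
    loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)
    criterion = nn.BCEWithLogitsLoss()          # numerically stable BCE
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    model.to(device).train()
    for epoch in range(NUM_EPOCHS):
        total = 0.0
        for images, labels in loader:
            images, labels = images.to(device), labels.float().to(device)
            optimizer.zero_grad()
            loss = criterion(model(images).squeeze(1), labels)
            loss.backward()
            optimizer.step()
            total += loss.item()
        print(f"epoch {epoch + 1}: mean loss {total / len(loader):.4f}")
```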
How p_i is calculated is explained below:
the ACMCN model transmits the input image data to the neural network for processing, and a final output result is obtained. This output is a vector with each element representing the score of the corresponding class, i.e. the likelihood that the sample belongs to each class.
To convert the output into probability values, a sigmoid transformation is applied to each score. The sigmoid function is as follows:
sigmoid(x) = 1 / (1 + e^(-x))
where x is the score output by the neural network, i.e. the score that the i-th file scan image needs correction and trimming; substituting x into the sigmoid function yields p_i, the probability that the sample is a file scan image needing correction and trimming. The calculation formula for p_i is as follows:
p_i = 1 / (1 + e^(-x))
in addition, other metrics may be selected by those skilled in the art to evaluate the performance of the model. When the index value of the predicted result given by the model on the test set reaches a certain threshold value, the trained ACMCN model can be considered to give an accurate predicted result, and the invention does not limit the threshold value.
The description and practice of the invention disclosed herein will be readily apparent to those skilled in the art and may be modified and adapted in several ways without departing from the principles of the invention. Accordingly, modifications or improvements made without departing from the spirit of the invention are also to be considered within its scope of protection.

Claims (4)

1. A deep-learning-based method for automatically correcting and trimming file scan images, characterized by comprising the following steps:
S11, preprocessing an input file scan image, including: 1) image edge cropping, 2) image resizing, 3) data set synthesis, and 4) data set grouping, so that the data set meets the model's input requirements;
S12, feeding the processed file scan image data set into the angle-correction and edge-trimming network model for training; the model extracts features from the input images;
S13, using the ACMCN model to automatically detect the edges in the image and correct the skew according to the edge positions, and to automatically detect the content in the image and trim it according to the content position, removing redundant border regions;
S14, judging whether the processed image meets the set correction and trimming requirements, namely that the average angular deviation of all straight lines in the processed file scan image lies between 0° and 1° and that the edge blurriness of all edges lies in [0, 0.1]; if the requirements are met, outputting the processed image; if not, iterating the correction and trimming until the requirements are met and then outputting the processed image;
S15, processing file scan images with the trained model and outputting the automatically corrected and trimmed file scan images.
2. The method for automatically correcting and trimming file scan images based on deep learning according to claim 1, wherein in step S11, said 1): image edge cropping means first reading n original pictures that are already roughly cropped and deflected by 0°, any picture not at 0° being manually corrected to 0°; a further cropping pass then removes 60 pixels from the top, bottom, left and right of each image to cut away the mottled noise at the document edges, generating a new image data set;
said 2): image resizing means setting the height of every image in the new data set to 480 pixels and scaling the width proportionally, so that image sizes are unified, generating a new image data set;
said 3): data set synthesis means taking 75% of the new image data set for vertical flipping and 75% for horizontal flipping, selecting 50% of the images for rotation by an arbitrary angle in [-90°, 90°], then applying compression enhancement to all images with the post-compression image quality bounded below by a JFIF quality of 30 and above by 80, and selecting 70% of the images for random shadow enhancement, with the shadow region allowed anywhere in the image and the number of shadows between 0 and 1; 50% of the pictures then receive random brightness and contrast enhancement, with the contrast adjustment range set to 0.1-0.34 and the image brightness reduced by 50%; finally, z background pictures common to archival scans are randomly composited with the n processed images, generating the final data set;
said 4): data set grouping means dividing the final data set into groups of 6 pictures, taking 1 picture from each group as the validation set and the remaining 5 as the training set, thereby splitting the final data set into a training set and a validation set.
3. The method for automatically correcting and trimming file scan images based on deep learning according to claim 1, wherein in step S12 the angle-correction and edge-trimming network model comprises: a) a feature extraction module, b) a deviation-rectifying module, and c) a trimming module, with the following architecture:
a) The feature extraction module has a network depth of 16 layers: 5 convolutional layers, 3 fully connected layers and 8 nonlinear activation layers, each nonlinear activation layer using the ReLU activation function; a dropout layer follows each fully connected layer to prevent overfitting, with dropout rates of 0.4, 0.3 and 0.25 respectively, and a softmax activation function performs the final classification;
b) The deviation-rectifying module has a network depth of 9 layers: 4 convolutional layers, 4 pooling layers and 1 fully connected layer; each convolutional layer uses 3×3 kernels, with 32, 64, 128 and 256 kernels in turn; each pooling layer uses a 2×2 pooling window; all convolutional and pooling layers use the ReLU activation function; and the fully connected layer has 1 neuron with a sigmoid activation function;
c) The trimming module has a network depth of 4 layers: 3 convolutional layers and 1 fully connected stage; each convolutional layer uses 3×3 kernels, with 32, 64 and 128 kernels in turn; in the fully connected stage, the first hidden layer has 256 neurons and the second 128; in addition, an adaptive convolution module and a channel attention module are added after the deviation-rectifying module and the trimming module, wherein the adaptive convolution module comprises one one-dimensional convolutional layer and two adaptive convolutional layers: the one-dimensional convolutional layer has 1 output channel, the first adaptive convolutional layer has 1 input channel and 2 output channels, and the second has 2 input channels and 1 output channel; the size and shape of an adaptive convolutional layer's kernel are generated dynamically and adapt to the size and shape of the input feature map; the channel attention module comprises a global average pooling layer, two fully connected layers and a sigmoid activation layer: the global average pooling layer average-pools the input feature map along the channel dimension to obtain a one-dimensional feature map, the two fully connected layers respectively compress and expand the input along the channel dimension to obtain two one-dimensional feature maps, the sigmoid activation layer adds the two one-dimensional feature maps and normalizes them with a sigmoid function to obtain a channel attention weight vector, and finally the weight vector is multiplied channel by channel with the input feature map to obtain the weighted feature map.
4. The method for automatically correcting and trimming file scan images based on deep learning according to claim 2, wherein in step S15 the trained model means that the training set of claim 2 is fed into the ACMCN model for training, with num_epochs set to 100, i.e. the model traverses the whole training set 100 times, BATCH_size set to 16, i.e. the model trains on 16 samples at a time, and num_classes set to 2, i.e. the model performs binary classification; the performance of the trained model in step S15 is evaluated with an IOU function and a Loss function, where the IOU measures the performance of object detection and semantic segmentation, its value lies between 0 and 1, and the larger the value, the more accurate the prediction, i.e. the better the performance; the formula is expressed as follows:
IOU = Area(A∩B) / Area(A∪B) = (w_overlap × h_overlap) / (w_A × h_A + w_B × h_B - w_overlap × h_overlap)
where Area(A∩B) denotes the intersection area of A and B, Area(A∪B) denotes their union area, w_overlap and h_overlap denote the width and height of the overlap between A and B, and w_A, h_A, w_B and h_B denote the widths and heights of A and B respectively; A represents the predicted correction-and-trimming result for the file scan image, and B represents the actual correction-and-trimming result;
Loss is the loss function: during training, the ACMCN model makes predictions from the input data, and the difference between the predicted result and the true result is the value of the loss function; the smaller the Loss value, the closer the model's prediction is to the truth, i.e. the better the performance; the formula is expressed as follows:
L_i = -[y_i · log(p_i) + (1 - y_i) · log(1 - p_i)], Loss = (1/N) · Σ L_i (summing over i = 1, …, N)
where y_i is the label of the i-th file scan image's prediction (a file scan image needing correction and trimming is labelled 1, one that does not is labelled 0), p_i is the predicted probability that the i-th file scan image needs correction and trimming, L_i is the value of the cross-entropy loss function for the i-th file scan image, and N is the total number of training samples;
whether the model has finished training is measured by the IOU and the Loss: when the IOU reaches 0.5 and the Loss falls below 0.1, the target is considered correctly detected.
CN202310420534.9A 2023-04-19 2023-04-19 File scanning image automatic correction and trimming method based on deep learning Active CN116433494B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310420534.9A CN116433494B (en) 2023-04-19 2023-04-19 File scanning image automatic correction and trimming method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310420534.9A CN116433494B (en) 2023-04-19 2023-04-19 File scanning image automatic correction and trimming method based on deep learning

Publications (2)

Publication Number Publication Date
CN116433494A (en) 2023-07-14
CN116433494B (en) 2024-02-02

Family

ID=87081211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310420534.9A Active CN116433494B (en) 2023-04-19 2023-04-19 File scanning image automatic correction and trimming method based on deep learning

Country Status (1)

Country Link
CN (1) CN116433494B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110209632A (en) * 2019-05-27 2019-09-06 武汉市润普网络科技有限公司 A kind of electronics folder with case production, turn shelves system
CN113065396A (en) * 2021-03-02 2021-07-02 国网湖北省电力有限公司 Automatic filing processing system and method for scanned archive image based on deep learning
US20210407045A1 (en) * 2020-06-26 2021-12-30 Adobe Inc. Methods and systems for automatically correcting image rotation
CN114066919A (en) * 2021-11-18 2022-02-18 吉林省通联信用服务有限公司 Method for automatically cutting edges, correcting, denoising and replacing background of file image
CN114358137A (en) * 2021-12-10 2022-04-15 同略科技有限公司 Automatic image correction method for file scanning piece based on deep learning
CN115619656A (en) * 2022-09-19 2023-01-17 郑州大学 Digital file deviation rectifying method and system


Also Published As

Publication number Publication date
CN116433494B (en) 2024-02-02

Similar Documents

Publication Publication Date Title
CN111814722B (en) Method and device for identifying table in image, electronic equipment and storage medium
CN111325203B (en) American license plate recognition method and system based on image correction
CN109993040B (en) Text recognition method and device
CN109190625B (en) Large-angle perspective deformation container number identification method
CN109360179B (en) Image fusion method and device and readable storage medium
CN110647795A (en) Form recognition method
CN111626292B (en) Text recognition method of building indication mark based on deep learning technology
CN111353961A (en) Document curved surface correction method and device
CN113065396A (en) Automatic filing processing system and method for scanned archive image based on deep learning
CN110969164A (en) Low-illumination imaging license plate recognition method and device based on deep learning end-to-end
CN110689003A (en) Low-illumination imaging license plate recognition method and system, computer equipment and storage medium
CN113537211A (en) Deep learning license plate frame positioning method based on asymmetric IOU
CN109657682B (en) Electric energy representation number identification method based on deep neural network and multi-threshold soft segmentation
CN113095445B (en) Target identification method and device
CN116433494B (en) File scanning image automatic correction and trimming method based on deep learning
CN113139535A (en) OCR document recognition method
CN113052234A (en) Jade classification method based on image features and deep learning technology
JP5211449B2 (en) Program, apparatus and method for adjusting recognition distance, and program for recognizing character string
CN113989823B (en) Image table restoration method and system based on OCR coordinates
CN116152824A (en) Invoice information extraction method and system
CN113269136B (en) Off-line signature verification method based on triplet loss
CN115222652A (en) Method for identifying, counting and centering end faces of bundled steel bars and memory thereof
CN114821174A (en) Power transmission line aerial image data cleaning method based on content perception
CN114549649A (en) Feature matching-based rapid identification method for scanned map point symbols
CN114821582A (en) OCR recognition method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant