CN113591798A - Document character reconstruction method and device, electronic equipment and computer storage medium - Google Patents
- Publication number
- CN113591798A (application number CN202110969444.6A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/40—Scaling the whole image or part thereof
- G06T3/4007—Interpolation-based scaling, e.g. bilinear interpolation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/40—Scaling the whole image or part thereof
- G06T3/4046—Scaling the whole image or part thereof using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/40—Scaling the whole image or part thereof
- G06T3/4053—Super resolution, i.e. output image resolution higher than sensor resolution
Abstract
The application discloses a method and a device for reconstructing document characters, electronic equipment and a computer storage medium, wherein the method comprises the following steps: acquiring an original picture of a document to be processed; detecting each line of text lines in the original picture, and cutting to obtain a plurality of original text line pictures; respectively inputting each original text line picture into a pre-trained super-resolution network model to obtain a super-resolution text line picture corresponding to each original text line picture; amplifying the original picture to obtain an amplified picture corresponding to the original picture; and replacing the local image where each text line in the amplified picture is located with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture. Therefore, a super-resolution and clear reconstructed picture of the original picture is obtained based on the super-resolution network model and the text line picture replacement.
Description
Technical Field
The present application relates to the field of text reconstruction technologies, and in particular, to a method and an apparatus for reconstructing document characters, an electronic device, and a computer storage medium.
Background
Since documents frequently need to be transmitted in daily life, the compression techniques applied during network transmission or scanning transmission can reduce document resolution and blur the text, causing discomfort or difficulty in reading for users.
In order to solve the problem of low resolution arising during document transmission, one commonly used method today is to enlarge the document by a method such as bilinear interpolation, thereby enlarging the document to a specified multiple. Another way is to recognize the text based on Optical Character Recognition (OCR), and reconstruct the document from the recognition result.
However, although the first method enlarges the document, it also makes the text more blurred. The second method is limited by the accuracy of OCR recognition at low resolution: since recognition accuracy on low-resolution documents is low, the reconstructed document is prone to containing erroneous text.
Disclosure of Invention
Based on the defects of the prior art, the application provides a method and a device for reconstructing document characters, an electronic device and a computer storage medium, so as to solve the problem that the prior art cannot effectively process a low-resolution and unclear document into a high-resolution and accurate and clear document.
In order to achieve the above object, the present application provides the following technical solutions:
the first aspect of the present application provides a method for reconstructing a document text, including:
acquiring an original picture of a document to be processed;
detecting each line of text lines in the original picture, and cutting each line of text lines to obtain a plurality of original text line pictures;
inputting each original text line picture into a pre-trained super-resolution network model respectively to obtain a super-resolution text line picture corresponding to each original text line picture;
amplifying the original picture to obtain an amplified picture corresponding to the original picture; the super-resolution text line picture and the amplification picture have the same amplification factor relative to the original picture;
and replacing the local image where each text line in the amplified picture is located with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
Optionally, in the above method, the method further includes:
recording the position parameters of each original text line picture in the original picture; the position parameters comprise the horizontal and vertical coordinates of the upper left corner of the original text line picture, and the width and height of the original text line picture;
replacing the local image where each text line in the enlarged picture is located with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture, wherein the method comprises the following steps:
amplifying the position parameters of each original text line picture in the original pictures to a target multiple to obtain the position parameters in a plurality of amplified pictures; wherein the target magnification is equal to the magnification of the magnified picture relative to the original picture;
and replacing the local image of the amplified picture indicated by the position parameter in each amplified picture with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
Optionally, in the above method, the super-resolution network model is obtained by training a generated countermeasure network formed by the super-resolution network model and the discriminator in advance by using multiple sets of training data; and each group of training data comprises a high-resolution text line picture and a low-resolution text line picture of the same text line.
Optionally, in the above method, the training method of the super-resolution network model includes:
acquiring the multiple groups of training data;
respectively inputting the low-resolution text line pictures in each group of training data into a super-resolution network model to be trained to obtain super-resolution text line pictures corresponding to each group of training data;
adjusting parameters of the super-resolution network model to be trained based on a text error loss function and a discriminant network loss function, and returning to the step of respectively inputting the low-resolution text line pictures in each set of training data into the super-resolution network model to be trained, until the text error loss function and the discriminant network loss function converge, to obtain the trained super-resolution network model;
the text error loss function is obtained based on a trained first neural network CRNN and is used for reflecting the error, on the text, between the high-resolution text line pictures in the training data and the corresponding super-resolution text line pictures; the discriminant network loss function is obtained based on a second neural network CRNN serving as the discriminator and is used for reflecting the error, on the image, between the high-resolution text line pictures in the training data and the corresponding super-resolution text line pictures.
Optionally, in the above method, the acquiring multiple sets of training data includes:
acquiring a plurality of high-resolution PDF files;
respectively carrying out picture conversion on each PDF file by using a format conversion tool to obtain a low-resolution image and a high-resolution image corresponding to each PDF file;
detecting and cutting text lines in the low-resolution image and the high-resolution image by respectively utilizing a text line detection model aiming at the low-resolution image and the high-resolution image corresponding to each PDF file to obtain a plurality of high-resolution text line images and a plurality of low-resolution text line images;
and forming a set of training data by using the high-resolution text line pictures and the low-resolution text line pictures corresponding to the same text line.
A second aspect of the present application provides a device for reconstructing text in a document, including:
the device comprises a first acquisition unit, a second acquisition unit and a processing unit, wherein the first acquisition unit is used for acquiring an original picture of a document to be processed;
the first cutting unit is used for detecting each line of text lines in the original pictures and cutting each line of text lines to obtain a plurality of original text line pictures;
the processing unit is used for respectively inputting each original text line picture into a pre-trained super-resolution network model to obtain a super-resolution text line picture corresponding to each original text line picture;
the amplifying unit is used for carrying out bilinear interpolation amplification processing on the original picture to obtain an amplified picture corresponding to the original picture; the super-resolution text line picture and the amplified picture have the same amplification factor relative to the original picture;
and the replacing unit is used for replacing the local image where each text line in the amplified picture is located with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
Optionally, in the above apparatus, further comprising:
the recording unit is used for recording the position parameters of each original text line picture in the original picture; the position parameters comprise the horizontal and vertical coordinates of the upper left corner of the original text line picture, and the width and height of the original text line picture;
wherein the replacement unit includes:
the parameter amplifying unit is used for amplifying the position parameters of each original text line picture in the original pictures to target multiples to obtain the position parameters in a plurality of amplified pictures; wherein the target magnification is equal to the magnification of the magnified picture relative to the original picture;
and the replacing subunit is used for replacing the local image of the enlarged picture indicated by the position parameter in each enlarged picture with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
Optionally, in the above apparatus, the apparatus further includes a training unit, wherein the training unit includes:
a second obtaining unit, configured to obtain the multiple sets of training data;
the training unit is used for respectively inputting the low-resolution text line pictures in each group of training data into a super-resolution network model to be trained to obtain super-resolution text line pictures corresponding to each group of training data;
the parameter adjusting unit is used for adjusting parameters of the super-resolution network model to be trained based on a text error loss function and a judgment network loss function, and returning to execute the step of respectively inputting the low-resolution text line pictures in each group of training data into the super-resolution network model to be trained until the text error loss function and the judgment network loss function are converged to obtain the trained super-resolution network model;
the text error loss function is obtained based on a trained first neural network CRNN and is used for reflecting the error, on the text, between the high-resolution text line pictures in the training data and the corresponding super-resolution text line pictures; the discriminant network loss function is obtained based on a second neural network CRNN serving as the discriminator and is used for reflecting the error, on the image, between the high-resolution text line pictures in the training data and the corresponding super-resolution text line pictures.
Optionally, in the above apparatus, the second obtaining unit includes:
the file acquisition unit is used for acquiring a plurality of high-resolution PDF files;
the conversion unit is used for respectively carrying out image conversion on each PDF file by using a format conversion tool to obtain a low-resolution image and a high-resolution image corresponding to each PDF file;
the second cutting unit is used for detecting and cutting text lines in the low-resolution image and the high-resolution image by respectively utilizing a text line detection model aiming at the low-resolution image and the high-resolution image corresponding to each PDF file to obtain a plurality of high-resolution text line images and a plurality of low-resolution text line images;
and the composition unit is used for composing the high-resolution text line pictures and the low-resolution text line pictures corresponding to the same text line into a group of training data.
A third aspect of the present application provides an electronic device comprising:
a memory and a processor;
wherein the memory is used for storing programs;
the processor is configured to execute the program, and when the program is executed, the program is specifically configured to implement the method for reconstructing a document text as described in any one of the above items.
A fourth aspect of the present application provides a computer storage medium for storing a computer program, which when executed, is configured to implement the method for reconstructing document text as described in any one of the above.
The embodiment of the application provides a method for reconstructing document characters. A super-resolution network model is trained in advance. An original picture of a document to be processed is acquired; each line of text in the original picture is detected and cut to obtain a plurality of original text line pictures; and each original text line picture is input into the pre-trained super-resolution network model, which processes the low-resolution, blurred original text line pictures into super-resolution, clear text line pictures. Because the super-resolution text line pictures are enlarged, the original picture is amplified to obtain an amplified picture with the same amplification factor, so that the local image where each text line in the amplified picture is located can be replaced by the corresponding super-resolution text line picture, yielding a super-resolution, clear reconstructed picture of the original picture.
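The overall flow summarized above can be sketched in Python. The helper names (`detect_lines`, `super_resolve`, `upscale`) and the box format are assumptions for illustration, not the patent's actual implementation:

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in original-picture coordinates

def reconstruct(original,                # H x W image array of the document page
                detect_lines: Callable,  # returns a list of Box, one per text line
                super_resolve: Callable, # maps a line crop to a scale-x larger crop
                upscale: Callable,       # e.g. bilinear enlargement of the whole page
                scale: int = 2):
    """Sketch of the claimed method: detect text lines, super-resolve each crop,
    enlarge the whole page by the same factor, then paste each super-resolved
    crop back over the corresponding region of the enlarged page."""
    boxes: List[Box] = detect_lines(original)
    crops = [original[y:y + h, x:x + w] for (x, y, w, h) in boxes]
    sr_crops = [super_resolve(c) for c in crops]
    page = upscale(original, scale)      # the "amplified picture"
    for (x, y, w, h), sr in zip(boxes, sr_crops):
        # position parameters are scaled by the same target multiple
        xs, ys, ws, hs = x * scale, y * scale, w * scale, h * scale
        page[ys:ys + hs, xs:xs + ws] = sr
    return page
```

The design point is that detection runs once on the small original picture, and only the recorded coordinates are rescaled, so the super-resolved crops land exactly where the (blurrier) interpolated text would otherwise be.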
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a flowchart of a training method of a super-resolution network model according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a method for obtaining training data according to another embodiment of the present application;
FIG. 3 is a schematic diagram of a network architecture of a training model according to another embodiment of the present application;
FIG. 4 is a comparison graph of an example output result provided by another embodiment of the present application;
FIG. 5 is a flowchart illustrating a method for reconstructing text in a document according to another embodiment of the present application;
fig. 6 is a flowchart of a method for replacing pictures according to another embodiment of the present application;
fig. 7 is a schematic structural diagram of a device for reconstructing document text according to another embodiment of the present application;
FIG. 8 is a schematic structural diagram of a training unit according to another embodiment of the present application;
fig. 9 is a schematic structural diagram of a second obtaining unit according to another embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device according to another embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In this application, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The application provides a method for reconstructing document characters, which aims to solve the problem that a low-resolution and unclear document cannot be effectively processed into a high-resolution and accurate and clear document in the prior art.
The document character reconstruction method is realized based on a super-resolution network model trained with a generative adversarial network. It should be noted that a generative adversarial network is composed of a generator and a discriminator; in the embodiment of the present application, the super-resolution network model serves as the generator and is combined with a discriminator to form the generative adversarial network, and the super-resolution network model is trained based on this network. Therefore, optionally, an embodiment of the present application provides a method for training a super-resolution network model, as shown in fig. 1, including the following steps:
s101, acquiring a plurality of groups of training data.
Each set of training data comprises a high-resolution text line picture and a low-resolution text line picture of the same text line.
Specifically, the same text line on the same picture is processed into a high-resolution text line picture and a low-resolution text line picture respectively, and the two pictures form a group of training data. The low-resolution text line picture is input into the super-resolution network model for reconstruction to obtain a super-resolution text line picture, while the high-resolution text line picture is equivalent to a label: it is compared with the super-resolution text line picture output by the model to determine the training effect of the super-resolution network model.
Optionally, in another embodiment of the present application, a specific implementation manner of the step S101, as shown in fig. 2, includes the following steps:
s201, acquiring a plurality of high-resolution PDF files.
It should be noted that, in this embodiment, a standard bicubic interpolation algorithm is not used to construct the training data; instead, a more common document degradation method is used. Image degradation due to DPI reduction during PDF-to-image conversion is more common for document files, especially scanned files. Of course, for applications in other scenarios, training data obtained by a bicubic interpolation algorithm may be added, or training data may be constructed in other ways.
S202, respectively carrying out picture conversion on each PDF file by using a format conversion tool to obtain a low-resolution image and a high-resolution image corresponding to each PDF file.
The format conversion tool may be pdftoppm.
Specifically, the PDF file is converted into a low-resolution map and a high-resolution map by setting different DPI (scale) factors. For example, taking 2x magnification as an example, a high-resolution map of the PDF file is obtained by setting the dpi parameter of pdftoppm to 200, and a low-resolution map of the PDF file is obtained by setting the dpi parameter of pdftoppm to 100.
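Under the stated DPI settings, the conversion step can be sketched with the poppler `pdftoppm` tool invoked from Python; file names here are hypothetical, and `-r` is pdftoppm's flag for the rasterization resolution in DPI:

```python
import subprocess

def pdftoppm_cmd(pdf_path: str, out_prefix: str, dpi: int):
    """Build the pdftoppm command line: -r sets DPI, -png selects PNG output."""
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, out_prefix]

def pdf_to_images(pdf_path: str, out_prefix: str, dpi: int) -> None:
    """Rasterize every page of the PDF at the given DPI (requires poppler-utils)."""
    subprocess.run(pdftoppm_cmd(pdf_path, out_prefix, dpi), check=True)

# A 2x training pair: rasterize the same PDF at 200 DPI (high resolution)
# and at 100 DPI (low resolution), matching the example in the text.
# pdf_to_images("doc.pdf", "doc_hr", 200)
# pdf_to_images("doc.pdf", "doc_lr", 100)
```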
And S203, detecting text lines in the low-resolution image and the high-resolution image and cutting the text lines by using a text line detection model respectively according to the low-resolution image and the high-resolution image corresponding to each PDF file to obtain a plurality of high-resolution text line images and a plurality of low-resolution text line images.
Specifically, an OCR character line detection model, such as EAST, is used to detect the low resolution image and the high resolution image, respectively, detect the text lines in the low resolution image and the high resolution image, and cut each line of text lines, thereby obtaining a plurality of high resolution text line pictures and a plurality of low resolution text line pictures.
And S204, forming a group of training data by the high-resolution text line pictures and the low-resolution text line pictures corresponding to the same text line.
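Steps S203 and S204 can be sketched as follows. Note that the pairing strategy here (detect boxes on the low-resolution image and scale them up for the high-resolution crop) is an assumption made to keep the pairs aligned; the patent itself runs a text line detection model on both images:

```python
import numpy as np

def make_pairs(lr_img, hr_img, lr_boxes, scale=2):
    """Cut matching text-line pairs from a low-/high-resolution page pair.
    lr_boxes are (x, y, w, h) boxes from a line detector run on the
    low-resolution image; the high-resolution crop reuses the same box
    scaled by the magnification factor so both crops cover the same line."""
    pairs = []
    for (x, y, w, h) in lr_boxes:
        lr_crop = lr_img[y:y + h, x:x + w]
        hr_crop = hr_img[y * scale:(y + h) * scale, x * scale:(x + w) * scale]
        pairs.append((hr_crop, lr_crop))  # (label picture, model input picture)
    return pairs
```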
S102, respectively inputting the low-resolution text line pictures in each set of training data into the super-resolution network model to be trained to obtain the super-resolution text line pictures corresponding to each set of training data.
S103, judging whether the text error loss function and the discriminant network loss function are converged.
In the embodiment of the present application, the discriminant network loss function is denoted as D OCR Loss, which may be a vanilla GAN loss. The discriminant network loss function is obtained based on a second neural network CRNN serving as the discriminator and is used for reflecting the error, on the image, between the high-resolution text line pictures in the training data and the corresponding super-resolution text line pictures. That is, the second neural network CRNN is used as a discriminator to form a generative adversarial network with the super-resolution network model: the super-resolution network model acts as the generator producing the super-resolution text line picture, and the second neural network CRNN discriminates whether the super-resolution text line picture is a real picture, i.e., judges its similarity to the high-resolution text line picture in the training data. The discriminant network loss function is therefore obtained through the second neural network CRNN.
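A minimal sketch of the vanilla GAN loss mentioned for D OCR Loss, written as a quantity the discriminator minimizes; `d_real` and `d_fake` are assumed discriminator output probabilities for high-resolution and generated pictures respectively:

```python
import numpy as np

def vanilla_d_loss(d_real, d_fake, eps=1e-8):
    """Vanilla GAN discriminator loss: -E[log D(x_hr)] - E[log(1 - D(x_sr))].
    d_real/d_fake are arrays of discriminator probabilities in [0, 1];
    eps avoids log(0)."""
    return -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
```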
Because the main purpose of the present application is to reconstruct text rather than a generic picture, in order to ensure that the characters in the constructed super-resolution text line picture are clearer, the embodiment of the present application further adds a first neural network CRNN and thus a text error loss function, denoted ocr loss, which may be an l1 loss. Therefore, in the embodiment of the present application, the super-resolution network model is not only trained in the generated countermeasure network formed with the discriminator, but also trained based on the first neural network CRNN, which is mainly used for recognizing characters in the picture.
The text error loss function is obtained based on the trained first neural network CRNN and is used for reflecting errors of the high-resolution text line pictures in the training data and the super-resolution text line pictures corresponding to the training data on the text.
Specifically, the trained first neural network CRNN can recognize the characters in a picture, so both the high-resolution text line picture in the training data and the generated super-resolution text line picture can be recognized, and the difference between the two recognition results is compared to construct the text error loss function. The whole network architecture for training the super-resolution network model in the embodiment of the present application is shown in fig. 3.
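A minimal sketch of such a text error loss, assuming it is taken as an L1 distance between the frozen recognizer's outputs on the two pictures; the patent states only that ocr loss may be an l1 loss reflecting the textual error, so using per-frame logits is an assumption:

```python
import numpy as np

def text_error_loss(recognizer, hr_line, sr_line):
    """ocr loss sketch: L1 distance between the frozen recognizer's output
    sequences (e.g. per-frame character logits of the trained first CRNN)
    on the high-resolution label picture and the generated picture."""
    hr_out = recognizer(hr_line)   # shape: (time_steps, num_classes)
    sr_out = recognizer(sr_line)
    return float(np.mean(np.abs(hr_out - sr_out)))
```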
Parameters are adjusted based on the text error loss function and the discriminant network loss function, thereby guiding the super-resolution network model to train in a direction that makes its output easier to recognize, so that the super-resolution text line pictures it generates are clearer. For example, as shown in fig. 4, the first line of characters comes from a low-resolution text line picture input to the model; the second line comes from a super-resolution text line picture generated without ocr loss and D ocr loss; the third line comes from a super-resolution text line picture generated with ocr loss and D ocr loss; and the fourth line comes from the high-resolution text line picture in the training data. It can be seen that after ocr loss and D ocr loss are used, the obtained characters are clearer and closer to the characters in the high-resolution original image.
If it is determined that the text error loss function or the discriminant network loss function does not converge, step S104 is executed. If both the text error loss function and the discriminant network loss function converge, step S105 is executed.
And S104, adjusting parameters of the super-resolution network model to be trained based on the text error loss function and the judgment network loss function.
After step S104 is executed, step S102 is executed again until the text error loss function and the discriminant network loss function converge, so as to obtain the trained super-resolution network model.
S105, determining that the super-resolution network model has been trained.
Based on the super-resolution network model provided above, another embodiment of the present application provides a document text reconstruction method, as shown in fig. 5, including the following steps:
S501, obtaining an original picture of the document to be processed.
S502, detecting each line of text lines in the original picture, and cutting each line of text lines to obtain a plurality of original text line pictures.
Similarly, an OCR text line detection model, such as the EAST model, may be used to detect all text lines in the original picture and crop them, obtaining an original text line picture for each text line, so that each original text line picture contains one text line of the document to be processed.
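Cropping each detected text line from the page can be sketched as follows, assuming the detector returns axis-aligned boxes as `(x, y, w, h)` tuples (EAST itself outputs rotated quadrilaterals, so this is a simplification):

```python
def crop_text_lines(image, boxes):
    # image: 2-D list of pixel rows; boxes: (x, y, w, h) per detected line,
    # with (x, y) the upper-left corner. Returns one crop per box.
    return [[row[x:x + w] for row in image[y:y + h]]
            for (x, y, w, h) in boxes]
```

Each returned crop is itself a 2-D list, ready to be fed to the super-resolution model one text line at a time.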
S503, respectively inputting each original text line picture into a pre-trained super-resolution network model to obtain a super-resolution text line picture corresponding to each original text line picture.
Optionally, the super-resolution network model may be obtained by training, in advance, a generative adversarial network composed of the super-resolution network model and a discriminator with multiple sets of training data, that is, by training the super-resolution network model as the generator of the adversarial network. Each set of training data comprises a high-resolution text line picture and a low-resolution text line picture of the same text line. For the specific training process, reference may be made to the method provided in the embodiment shown in fig. 1, which is not repeated here. Of course, this is only one alternative; the super-resolution network model can also be constructed and trained separately.
S504, enlarging the original picture to obtain an enlarged picture corresponding to the original picture.
The super-resolution text line pictures and the enlarged picture have the same magnification factor relative to the original picture.
It should be noted that, because the super-resolution network model processes each original text line picture at a specified magnification factor, the resulting super-resolution text line picture is magnified relative to the original picture and no longer matches it in size. It is therefore necessary to first enlarge the original picture to obtain the enlarged picture, and then execute step S505.
Optionally, the original picture may be enlarged by using bilinear interpolation.
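A minimal pure-Python sketch of bilinear interpolation for a single-channel picture, using align-corners coordinate mapping for simplicity; production code would normally call an image library instead:

```python
def bilinear_resize(img, scale):
    # img: 2-D list of grayscale values; scale: integer magnification factor.
    h, w = len(img), len(img[0])
    new_h, new_w = h * scale, w * scale
    out = []
    for i in range(new_h):
        # Map the output row back to a fractional source row (align corners).
        y = i * (h - 1) / (new_h - 1) if new_h > 1 else 0.0
        y0 = int(y); y1 = min(y0 + 1, h - 1); fy = y - y0
        row = []
        for j in range(new_w):
            x = j * (w - 1) / (new_w - 1) if new_w > 1 else 0.0
            x0 = int(x); x1 = min(x0 + 1, w - 1); fx = x - x0
            # Interpolate horizontally on both rows, then vertically.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            row.append(top * (1 - fy) + bot * fy)
        out.append(row)
    return out
```

The align-corners convention keeps the four corner pixels of the source exactly at the four corners of the enlarged picture.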
S505, replacing the local image where each text line is located in the enlarged picture with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
It should be noted that the super-resolution text line picture corresponding to the local image where the text line in the enlarged picture is located refers to a super-resolution text line picture containing the same text content.
Optionally, in another embodiment of the present application, the method may further include the steps of:
Recording the position parameters of each original text line picture in the original picture.
The position parameters comprise the horizontal and vertical coordinates of the upper left corner of the original text line picture, and the width and length of the original text line picture. Of course, this is only one alternative; the coordinates of the four corners of the original text line picture may be recorded instead, or any other parameters that indicate the position of the original text line picture in the original picture.
Specifically, when each original text line picture is cropped out, its position parameters in the original picture are recorded. Recording the position parameters corresponding to each original text line picture facilitates the subsequent replacement with the super-resolution text line picture.
Accordingly, the specific implementation of step S505 in the present application, as shown in fig. 6, includes the following steps:
S601, enlarging the position parameters of each original text line picture in the original picture by a target multiple to obtain the corresponding position parameters in the enlarged picture.
The target multiple is equal to the magnification factor of the enlarged picture relative to the original picture. Since the enlarged picture is obtained by enlarging the original picture, positions in the original picture are scaled up correspondingly; therefore, the position parameters of each original text line picture in the original picture need to be enlarged by the magnification factor of the enlarged picture. Specifically, each position parameter is multiplied by the target multiple.
S602, replacing the local image of the enlarged picture indicated by each set of enlarged position parameters with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
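Steps S601 and S602 can be sketched together: multiply each recorded box by the target multiple `k`, then overwrite that region of the enlarged picture with the super-resolution patch (assuming each patch is exactly `k` times its original crop):

```python
def paste_super_res(enlarged, sr_patches, boxes, k):
    # boxes hold (x, y, w, h) recorded on the ORIGINAL picture.
    for patch, (x, y, w, h) in zip(sr_patches, boxes):
        # S601: enlarge the position parameters by the target multiple k.
        X, Y = x * k, y * k
        # S602: replace the local image indicated by the scaled box.
        for i, row in enumerate(patch):
            enlarged[Y + i][X:X + len(row)] = row
    return enlarged
```

The slice assignment overwrites the enlarged page in place, so after processing all boxes the result is the reconstructed picture.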
The embodiment of the present application provides a method for reconstructing document text. A trained super-resolution network model is obtained in advance by training a generative adversarial network, composed of the super-resolution network model and a discriminator, with multiple sets of training data that each include a high-resolution text line picture and a low-resolution text line picture. An original picture of a document to be processed is acquired; each text line in the original picture is detected and cropped to obtain multiple original text line pictures; and each original text line picture is input into the pre-trained super-resolution network model, which processes it to obtain a corresponding super-resolution text line picture, so that low-resolution, blurred original text line pictures are effectively turned into super-resolution, clear text line pictures. Because the super-resolution text line pictures are magnified, the original picture is enlarged by bilinear interpolation to obtain an enlarged picture with the same magnification factor, so that the local image where each text line is located in the enlarged picture can be replaced with the corresponding super-resolution text line picture, yielding a super-resolution, clear reconstructed picture of the original picture.
Another embodiment of the present application provides a device for reconstructing a document text, as shown in fig. 7, including the following units:
a first obtaining unit 701, configured to obtain an original picture of a document to be processed.
A first cropping unit 702, configured to detect each line of text in the original picture, and crop each line of text to obtain multiple original text line pictures.
And the processing unit 703 is configured to input each original text line image into a pre-trained super-resolution network model to obtain a super-resolution text line image corresponding to each original text line image.
Optionally, the super-resolution network model is obtained by training, in advance, a generative adversarial network composed of the super-resolution network model and a discriminator with multiple sets of training data. Each set of training data includes a high-resolution text line picture and a low-resolution text line picture of the same text line.
And the amplifying unit 704 is configured to amplify the original picture to obtain an amplified picture corresponding to the original picture.
The super-resolution text line pictures and the enlarged picture have the same magnification factor relative to the original picture.
Optionally, the enlarging unit 704 may specifically enlarge the original picture by using bilinear interpolation.
A replacing unit 705, configured to replace the local image where each text line in the enlarged picture is located with the corresponding super-resolution text line picture, so as to obtain a reconstructed picture of the original picture.
Optionally, the apparatus for reconstructing document text provided in another embodiment of the present application further includes:
a recording unit, configured to record the position parameters of each original text line picture in the original picture. The position parameters comprise the horizontal and vertical coordinates of the upper left corner of the original text line picture, and the width and length of the original text line picture.
Wherein, the replacement unit in this application embodiment includes:
a parameter enlarging unit, configured to enlarge the position parameters of each original text line picture in the original picture by a target multiple to obtain the corresponding position parameters in the enlarged picture.
Wherein the target magnification is equal to the magnification of the magnified picture relative to the original picture.
And the replacing subunit is used for replacing the local image of the enlarged picture indicated by the position parameter in each enlarged picture with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
Optionally, the apparatus for reconstructing a document text provided in another embodiment of the present application further includes a training unit. Wherein, the training unit, as shown in fig. 8, comprises:
a second obtaining unit 801, configured to obtain the multiple sets of training data.
The training unit 802 is configured to input the low-resolution text line pictures in each set of training data into a super-resolution network model to be trained, and process the low-resolution text line pictures through the super-resolution network model to obtain super-resolution text line pictures corresponding to each set of training data.
And a parameter adjusting unit 803, configured to adjust parameters of the super-resolution network model to be trained based on a text error loss function and a discriminant network loss function, and return to execute to input the low-resolution text line pictures in each set of training data into the super-resolution network model to be trained respectively until the text error loss function and the discriminant network loss function converge, so as to obtain the trained super-resolution network model.
The text error loss function is obtained based on the trained first neural network CRNN and reflects the textual error between the high-resolution text line picture in the training data and the corresponding super-resolution text line picture. The discriminant network loss function is obtained based on a second neural network CRNN serving as the discriminator and reflects the image error between the high-resolution text line picture in the training data and the corresponding super-resolution text line picture.
Optionally, in the apparatus for reconstructing document text provided in another embodiment of the present application, as shown in fig. 9, the second obtaining unit includes:
a file acquiring unit 901 configured to acquire a plurality of high-resolution PDF files.
A converting unit 902, configured to perform picture conversion on each PDF file by using a format conversion tool, so as to obtain a low-resolution image and a high-resolution image corresponding to each PDF file.
A second clipping unit 903, configured to, for a low-resolution image and a high-resolution image corresponding to each PDF file, detect and clip text lines in the low-resolution image and the high-resolution image by using a text line detection model, respectively, so as to obtain multiple high-resolution text line images and multiple low-resolution text line images.
A composing unit 904, configured to compose a set of training data from the high-resolution text line pictures and the low-resolution text line pictures corresponding to the same text line.
Another embodiment of the present application provides an electronic device, as shown in fig. 10, including:
a memory 1001 and a processor 1002.
The memory 1001 is used for storing a program, and the processor 1002 is used for executing the program stored in the memory 1001; when the program runs, it is specifically used to implement the method for reconstructing document text provided in any one of the above embodiments.
Another embodiment of the present application provides a computer storage medium for storing a computer program, where the computer program is used to implement the method for reconstructing document text as provided in any one of the above embodiments.
Computer storage media, including permanent and non-permanent, removable and non-removable media, can store information by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape and magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer readable media does not include transitory computer readable media (transitory media) such as modulated data signals and carrier waves.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (11)
1. A method for reconstructing a document text, comprising:
acquiring an original picture of a document to be processed;
detecting each line of text lines in the original picture, and cutting each line of text lines to obtain a plurality of original text line pictures;
inputting each original text line picture into a pre-trained super-resolution network model respectively to obtain a super-resolution text line picture corresponding to each original text line picture;
amplifying the original picture to obtain an amplified picture corresponding to the original picture; the super-resolution text line picture and the amplification picture have the same amplification factor relative to the original picture;
and replacing the local image where each text line in the amplified picture is located with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
2. The method of claim 1, further comprising:
recording the position parameters of each original text line picture in the original picture; the position parameters comprise the horizontal and vertical coordinates of the upper left corner of the original text line picture, and the width and length of the original text line picture;
replacing the local image where each text line in the enlarged picture is located with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture, wherein the method comprises the following steps:
enlarging the position parameters of each original text line picture in the original picture by a target multiple to obtain the corresponding position parameters in the enlarged picture; wherein the target multiple is equal to the magnification factor of the enlarged picture relative to the original picture;
and replacing the local image of the amplified picture indicated by the position parameter in each amplified picture with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
3. The method according to claim 1, wherein the super-resolution network model is obtained by training, in advance, a generative adversarial network composed of the super-resolution network model and a discriminator with a plurality of sets of training data; and each set of training data comprises a high-resolution text line picture and a low-resolution text line picture of the same text line.
4. The method of claim 3, wherein the training method of the super-resolution network model comprises:
acquiring the multiple groups of training data;
respectively inputting the low-resolution text line pictures in each group of training data into a super-resolution network model to be trained to obtain super-resolution text line pictures corresponding to each group of training data;
adjusting parameters of the super-resolution network model to be trained based on a text error loss function and a discriminant network loss function, and returning to execute to input the low-resolution text line pictures in each set of training data into the super-resolution network model to be trained respectively until the text error loss function and the discriminant network loss function are converged to obtain the trained super-resolution network model;
the text error loss function is obtained based on a trained first neural network CRNN and is used for reflecting errors of the high-resolution text line pictures in the training data and the super-resolution text line pictures corresponding to the training data on the text; the discriminant network loss function is obtained based on a second neural network CRNN serving as a discriminant and is used for reflecting the error of the high-resolution text line picture in the training data and the error of the super-resolution text line picture corresponding to the training data on an image.
5. The method of claim 4, wherein the obtaining the plurality of sets of training data comprises:
acquiring a plurality of high-resolution PDF files;
respectively carrying out picture conversion on each PDF file by using a format conversion tool to obtain a low-resolution image and a high-resolution image corresponding to each PDF file;
detecting and cutting text lines in the low-resolution image and the high-resolution image by respectively utilizing a text line detection model aiming at the low-resolution image and the high-resolution image corresponding to each PDF file to obtain a plurality of high-resolution text line images and a plurality of low-resolution text line images;
and forming a set of training data by using the high-resolution text line pictures and the low-resolution text line pictures corresponding to the same text line.
6. An apparatus for reconstructing text of a document, comprising:
the device comprises a first acquisition unit, a second acquisition unit and a processing unit, wherein the first acquisition unit is used for acquiring an original picture of a document to be processed;
the first cutting unit is used for detecting each line of text lines in the original pictures and cutting each line of text lines to obtain a plurality of original text line pictures;
the processing unit is used for respectively inputting each original text line picture into a pre-trained super-resolution network model to obtain a super-resolution text line picture corresponding to each original text line picture;
the amplifying unit is used for amplifying the original picture to obtain an amplified picture corresponding to the original picture; the super-resolution text line picture and the amplification picture have the same amplification factor relative to the original picture;
and the replacing unit is used for replacing the local image where each text line in the amplified picture is located with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
7. The apparatus of claim 6, further comprising:
the recording unit is used for recording the position parameters of each original text line picture in the original picture; the position parameters comprise the horizontal and vertical coordinates of the upper left corner of the original text line picture, and the width and length of the original text line picture;
wherein the replacement unit includes:
the parameter enlarging unit is used for enlarging the position parameters of each original text line picture in the original picture by a target multiple to obtain the corresponding position parameters in the enlarged picture; wherein the target multiple is equal to the magnification factor of the enlarged picture relative to the original picture;
and the replacing subunit is used for replacing the local image of the enlarged picture indicated by the position parameter in each enlarged picture with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
8. The apparatus of claim 6, further comprising a training unit, wherein the training unit comprises:
a second obtaining unit, configured to obtain the multiple sets of training data;
the training unit is used for respectively inputting the low-resolution text line pictures in each group of training data into a super-resolution network model to be trained to obtain super-resolution text line pictures corresponding to each group of training data;
the parameter adjusting unit is used for adjusting parameters of the super-resolution network model to be trained based on a text error loss function and a discriminant network loss function, and returning to the step of respectively inputting the low-resolution text line pictures in each set of training data into the super-resolution network model to be trained until the text error loss function and the discriminant network loss function converge, so as to obtain the trained super-resolution network model;
the text error loss function is obtained based on a trained first neural network CRNN and is used for reflecting errors of the high-resolution text line pictures in the training data and the super-resolution text line pictures corresponding to the training data on the text; the discriminant network loss function is obtained based on a second neural network CRNN serving as a discriminant and is used for reflecting the error of the high-resolution text line picture in the training data and the error of the super-resolution text line picture corresponding to the training data on an image.
9. The apparatus of claim 8, wherein the second obtaining unit comprises:
the file acquisition unit is used for acquiring a plurality of high-resolution PDF files;
the conversion unit is used for respectively carrying out image conversion on each PDF file by using a format conversion tool to obtain a low-resolution image and a high-resolution image corresponding to each PDF file;
the second cutting unit is used for detecting and cutting text lines in the low-resolution image and the high-resolution image by respectively utilizing a text line detection model aiming at the low-resolution image and the high-resolution image corresponding to each PDF file to obtain a plurality of high-resolution text line images and a plurality of low-resolution text line images;
and the composition unit is used for composing the high-resolution text line pictures and the low-resolution text line pictures corresponding to the same text line into a group of training data.
10. An electronic device, comprising:
a memory and a processor;
wherein the memory is used for storing programs;
the processor is configured to execute the program, and when the program is executed, the program is specifically configured to implement the method for reconstructing a document text according to any one of claims 1 to 5.
11. A computer storage medium storing a computer program which, when executed, implements a method of reconstructing a document literal according to any of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110969444.6A CN113591798B (en) | 2021-08-23 | 2021-08-23 | Method and device for reconstructing text of document, electronic equipment and computer storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113591798A true CN113591798A (en) | 2021-11-02 |
CN113591798B CN113591798B (en) | 2023-11-03 |
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant