CN113591798A - Document character reconstruction method and device, electronic equipment and computer storage medium - Google Patents


Info

Publication number
CN113591798A
CN113591798A (application CN202110969444.6A)
Authority
CN
China
Prior art keywords
picture
resolution
text line
original
super
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110969444.6A
Other languages
Chinese (zh)
Other versions
CN113591798B (en)
Inventor
张陆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Holding Co Ltd filed Critical Jingdong Technology Holding Co Ltd
Priority to CN202110969444.6A priority Critical patent/CN113591798B/en
Publication of CN113591798A publication Critical patent/CN113591798A/en
Application granted granted Critical
Publication of CN113591798B publication Critical patent/CN113591798B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformation in the plane of the image
    • G06T 3/40 Scaling the whole image or part thereof
    • G06T 3/4007 Interpolation-based scaling, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformation in the plane of the image
    • G06T 3/40 Scaling the whole image or part thereof
    • G06T 3/4046 Scaling the whole image or part thereof using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformation in the plane of the image
    • G06T 3/40 Scaling the whole image or part thereof
    • G06T 3/4053 Super resolution, i.e. output image resolution higher than sensor resolution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The application discloses a method and a device for reconstructing document characters, electronic equipment and a computer storage medium, wherein the method comprises the following steps: acquiring an original picture of a document to be processed; detecting each line of text lines in the original picture, and cutting to obtain a plurality of original text line pictures; respectively inputting each original text line picture into a pre-trained super-resolution network model to obtain a super-resolution text line picture corresponding to each original text line picture; amplifying the original picture to obtain an amplified picture corresponding to the original picture; and replacing the local image where each text line in the amplified picture is located with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture. Therefore, a super-resolution and clear reconstructed picture of the original picture is obtained based on the super-resolution network model and the text line picture replacement.

Description

Document character reconstruction method and device, electronic equipment and computer storage medium
Technical Field
The present application relates to the field of text reconstruction technologies, and in particular, to a method and an apparatus for reconstructing document characters, an electronic device, and a computer storage medium.
Background
Documents frequently need to be transmitted in daily life, and the compression techniques used in network transmission or scanning of documents can reduce document resolution and blur the writing, causing discomfort or obstacles to reading for users.
In order to solve the problem of low resolution caused during transmission, one main method used today is to enlarge the document by bilinear interpolation or a similar method, thereby enlarging the document size by a specified multiple. Another way is to recognize the text with Optical Character Recognition (OCR) and reconstruct the document from the recognition result.
However, although the first method enlarges the document size, the document becomes more blurred. The second approach is limited by the accuracy of OCR recognition at low resolution: since recognition accuracy for low-resolution documents is low, erroneous document information is liable to occur.
Disclosure of Invention
Based on the defects of the prior art, the application provides a method and a device for reconstructing document characters, an electronic device and a computer storage medium, so as to solve the problem that the prior art cannot effectively process a low-resolution and unclear document into a high-resolution and accurate and clear document.
In order to achieve the above object, the present application provides the following technical solutions:
the first aspect of the present application provides a method for reconstructing a document text, including:
acquiring an original picture of a document to be processed;
detecting each line of text lines in the original picture, and cutting each line of text lines to obtain a plurality of original text line pictures;
inputting each original text line picture into a pre-trained super-resolution network model respectively to obtain a super-resolution text line picture corresponding to each original text line picture;
amplifying the original picture to obtain an amplified picture corresponding to the original picture; the super-resolution text line picture and the amplification picture have the same amplification factor relative to the original picture;
and replacing the local image where each text line in the amplified picture is located with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
Optionally, in the above method, the method further includes:
recording the position parameters of each original text line picture in the original picture; the position parameters comprise the horizontal coordinate and the vertical coordinate of the upper left corner of the original text line picture, and the width and the height of the original text line picture;
replacing the local image where each text line in the enlarged picture is located with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture, wherein the method comprises the following steps:
amplifying the position parameters of each original text line picture in the original pictures to a target multiple to obtain the position parameters in a plurality of amplified pictures; wherein the target magnification is equal to the magnification of the magnified picture relative to the original picture;
and replacing the local image of the amplified picture indicated by the position parameter in each amplified picture with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
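The coordinate scaling described above can be sketched as follows; this is a minimal illustration under the assumption that a text line's position is a (x, y, w, h) tuple in original-picture pixels (the function name is hypothetical):

```python
def scale_position(x, y, w, h, factor):
    """Scale a text line's bounding box (top-left x, y, width, height)
    from original-picture coordinates to enlarged-picture coordinates,
    where factor is the magnification of the enlarged picture."""
    return (x * factor, y * factor, w * factor, h * factor)

# A box at (12, 30) of size 200x24 in the original picture maps to
# (24, 60, 400, 48) in a 2x enlarged picture.
box = scale_position(12, 30, 200, 24, 2)
```

The scaled tuple then indicates the local region of the enlarged picture to overwrite with the corresponding super-resolution text line picture.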
Optionally, in the above method, the super-resolution network model is obtained by training, in advance and with multiple sets of training data, a generative adversarial network formed by the super-resolution network model and a discriminator; each set of training data comprises a high-resolution text line picture and a low-resolution text line picture of the same text line.
Optionally, in the above method, the training method of the super-resolution network model includes:
acquiring the multiple groups of training data;
respectively inputting the low-resolution text line pictures in each group of training data into a super-resolution network model to be trained to obtain super-resolution text line pictures corresponding to each group of training data;
adjusting parameters of the super-resolution network model to be trained based on a text error loss function and a discriminator network loss function, and returning to the step of respectively inputting the low-resolution text line pictures in each set of training data into the super-resolution network model to be trained, until the text error loss function and the discriminator network loss function converge, so as to obtain the trained super-resolution network model;
the text error loss function is obtained based on a trained first neural network CRNN and reflects the error, on the text, between the high-resolution text line picture in the training data and the corresponding super-resolution text line picture; the discriminator network loss function is obtained based on a second neural network CRNN serving as the discriminator and reflects the error, on the image, between the high-resolution text line picture in the training data and the corresponding super-resolution text line picture.
Optionally, in the above method, the acquiring multiple sets of training data includes:
acquiring a plurality of high-resolution PDF files;
respectively carrying out picture conversion on each PDF file by using a format conversion tool to obtain a low-resolution image and a high-resolution image corresponding to each PDF file;
detecting and cutting text lines in the low-resolution image and the high-resolution image by respectively utilizing a text line detection model aiming at the low-resolution image and the high-resolution image corresponding to each PDF file to obtain a plurality of high-resolution text line images and a plurality of low-resolution text line images;
and forming a set of training data by using the high-resolution text line pictures and the low-resolution text line pictures corresponding to the same text line.
A second aspect of the present application provides a device for reconstructing text in a document, including:
the device comprises a first acquisition unit, a second acquisition unit and a processing unit, wherein the first acquisition unit is used for acquiring an original picture of a document to be processed;
the first cutting unit is used for detecting each line of text lines in the original pictures and cutting each line of text lines to obtain a plurality of original text line pictures;
the processing unit is used for respectively inputting each original text line picture into a pre-trained super-resolution network model to obtain a super-resolution text line picture corresponding to each original text line picture;
the amplifying unit is used for carrying out bilinear interpolation enlargement processing on the original picture to obtain an amplified picture corresponding to the original picture; the super-resolution text line picture and the amplified picture have the same magnification factor relative to the original picture;
and the replacing unit is used for replacing the local image where each text line in the amplified picture is located with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
Optionally, in the above apparatus, further comprising:
the recording unit is used for recording the position parameters of each original text line picture in the original picture; the position parameters comprise the horizontal coordinate and the vertical coordinate of the upper left corner of the original text line picture, and the width and the height of the original text line picture;
wherein the replacement unit includes:
the parameter amplifying unit is used for amplifying the position parameters of each original text line picture in the original pictures to target multiples to obtain the position parameters in a plurality of amplified pictures; wherein the target magnification is equal to the magnification of the magnified picture relative to the original picture;
and the replacing subunit is used for replacing the local image of the enlarged picture indicated by the position parameter in each enlarged picture with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
Optionally, in the above apparatus, the apparatus further includes a training unit, wherein the training unit includes:
a second obtaining unit, configured to obtain the multiple sets of training data;
the training unit is used for respectively inputting the low-resolution text line pictures in each group of training data into a super-resolution network model to be trained to obtain super-resolution text line pictures corresponding to each group of training data;
the parameter adjusting unit is used for adjusting parameters of the super-resolution network model to be trained based on a text error loss function and a discriminator network loss function, and returning to the step of respectively inputting the low-resolution text line pictures in each group of training data into the super-resolution network model to be trained, until the text error loss function and the discriminator network loss function converge, so as to obtain the trained super-resolution network model;
the text error loss function is obtained based on a trained first neural network CRNN and reflects the error, on the text, between the high-resolution text line picture in the training data and the corresponding super-resolution text line picture; the discriminator network loss function is obtained based on a second neural network CRNN serving as the discriminator and reflects the error, on the image, between the high-resolution text line picture in the training data and the corresponding super-resolution text line picture.
Optionally, in the above apparatus, the second obtaining unit includes:
the file acquisition unit is used for acquiring a plurality of high-resolution PDF files;
the conversion unit is used for respectively carrying out image conversion on each PDF file by using a format conversion tool to obtain a low-resolution image and a high-resolution image corresponding to each PDF file;
the second cutting unit is used for detecting and cutting text lines in the low-resolution image and the high-resolution image by respectively utilizing a text line detection model aiming at the low-resolution image and the high-resolution image corresponding to each PDF file to obtain a plurality of high-resolution text line images and a plurality of low-resolution text line images;
and the composition unit is used for composing the high-resolution text line pictures and the low-resolution text line pictures corresponding to the same text line into a group of training data.
A third aspect of the present application provides an electronic device comprising:
a memory and a processor;
wherein the memory is used for storing programs;
the processor is configured to execute the program, and when the program is executed, the program is specifically configured to implement the method for reconstructing a document text as described in any one of the above items.
A fourth aspect of the present application provides a computer storage medium for storing a computer program, which when executed, is configured to implement the method for reconstructing document text as described in any one of the above.
The embodiment of the application provides a method for reconstructing document characters. A super-resolution network model is trained in advance; an original picture of a document to be processed is acquired; each text line in the original picture is detected and cut to obtain a plurality of original text line pictures; and each original text line picture is input into the pre-trained super-resolution network model to obtain a corresponding super-resolution text line picture. In this way, low-resolution, blurred original text line pictures are processed by the super-resolution network model into super-resolution, clear text line pictures. Because the super-resolution text line pictures are enlarged, the original picture is enlarged to obtain an amplified picture with the same magnification factor, so that the local image where each text line is located in the amplified picture can be replaced with the corresponding super-resolution text line picture, yielding a super-resolution, clear reconstructed picture of the original picture.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a flowchart of a training method of a super-resolution network model according to an embodiment of the present disclosure;
fig. 2 is a flowchart of a method for obtaining training data according to another embodiment of the present application;
FIG. 3 is a schematic diagram of a network architecture of a training model according to another embodiment of the present application;
FIG. 4 is a comparison graph of an example output result provided by another embodiment of the present application;
FIG. 5 is a flowchart illustrating a method for reconstructing text in a document according to another embodiment of the present application;
fig. 6 is a flowchart of a method for replacing pictures according to another embodiment of the present application;
fig. 7 is a schematic structural diagram of a device for reconstructing document text according to another embodiment of the present application;
FIG. 8 is a schematic structural diagram of a training unit according to another embodiment of the present application;
fig. 9 is a schematic structural diagram of a second obtaining unit according to another embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device according to another embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In this application, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The application provides a method for reconstructing document characters, which aims to solve the problem that a low-resolution and unclear document cannot be effectively processed into a high-resolution and accurate and clear document in the prior art.
The document character reconstruction method is realized based on a super-resolution network model produced by a generative adversarial network. It should be noted that a generative adversarial network is composed of a generator and a discriminator; in the embodiment of the present application, the super-resolution network model serves as the generator and is combined with a discriminator to form the generative adversarial network, and the super-resolution network model is trained based on this network. Accordingly, an embodiment of the present application provides a method for training the super-resolution network model, as shown in fig. 1, including the following steps:
s101, acquiring a plurality of groups of training data.
Each set of training data comprises a high-resolution text line picture and a low-resolution text line picture of the same text line.
Specifically, the same text line on the same picture is processed into a high-resolution text line picture and a low-resolution text line picture respectively, and the two text line pictures form a group of training data. The low-resolution text line picture is input into the super-resolution network model for reconstruction to obtain a super-resolution text line picture, while the high-resolution text line picture is equivalent to a label and is compared with the super-resolution text line picture output by the model to determine the training effect of the super-resolution network model.
Optionally, in another embodiment of the present application, a specific implementation manner of the step S101, as shown in fig. 2, includes the following steps:
s201, acquiring a plurality of high-resolution PDF files.
It should be noted that, in this embodiment, the training data is not constructed with the standard bicubic interpolation algorithm, but with a more common document degradation method: image degradation caused by DPI reduction during PDF-to-image conversion, which is more common in document files, especially scanned files. Of course, for application in other scenarios, training data obtained by a bicubic interpolation algorithm may be added, or training data may be constructed in other ways.
S202, respectively carrying out picture conversion on each PDF file by using a format conversion tool to obtain a low-resolution image and a high-resolution image corresponding to each PDF file.
The format conversion tool may be pdftoppm.
Specifically, the PDF file is converted into a low-resolution map and a high-resolution map by setting different magnification factors (scale). For example, taking a magnification factor of 2 as an example, a high-resolution map of the PDF file is obtained by setting the dpi parameter of pdftoppm to 200, and a low-resolution map is obtained by setting the dpi parameter of pdftoppm to 100.
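As a sketch, the two pdftoppm invocations could be built as follows; file paths and prefix names are hypothetical, and `-r` (rendering DPI) and `-png` (PNG output) are standard pdftoppm options:

```python
def pdftoppm_cmd(pdf_path, out_prefix, dpi):
    """Build the pdftoppm command line that renders a PDF to PNG pages
    at the given DPI (pass the list to subprocess.run in practice)."""
    return ["pdftoppm", "-png", "-r", str(dpi), pdf_path, out_prefix]

# One PDF rendered twice: high-resolution at 200 dpi, low-resolution at 100 dpi.
hr_cmd = pdftoppm_cmd("doc.pdf", "doc_hr", 200)
lr_cmd = pdftoppm_cmd("doc.pdf", "doc_lr", 100)
```

The two renderings of the same page are then in pixel-exact 2:1 correspondence, which is what allows text line boxes detected in one to be matched with the other.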
And S203, detecting text lines in the low-resolution image and the high-resolution image and cutting the text lines by using a text line detection model respectively according to the low-resolution image and the high-resolution image corresponding to each PDF file to obtain a plurality of high-resolution text line images and a plurality of low-resolution text line images.
Specifically, an OCR character line detection model, such as EAST, is used to detect the low resolution image and the high resolution image, respectively, detect the text lines in the low resolution image and the high resolution image, and cut each line of text lines, thereby obtaining a plurality of high resolution text line pictures and a plurality of low resolution text line pictures.
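The detect-and-cut step can be sketched as follows, under the simplifying assumption that the detector returns axis-aligned boxes as (x, y, w, h) tuples (EAST in general outputs rotated quadrilaterals, which would first be rectified):

```python
def crop_lines(img, boxes):
    """Cut each detected text line box (x, y, w, h) out of a page image,
    where img is a grayscale image stored as a list of pixel rows."""
    return [[row[x:x + w] for row in img[y:y + h]]
            for (x, y, w, h) in boxes]
```

Running this on the low-resolution and high-resolution renderings of the same page yields the paired text line pictures that make up the training data.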
And S204, forming a group of training data by the high-resolution text line pictures and the low-resolution text line pictures corresponding to the same text line.
S102, respectively inputting the low-resolution text line pictures in each set of training data into the super-resolution network model to be trained to obtain the super-resolution text line pictures corresponding to each set of training data.
S103, judging whether the text error loss function and the discriminator network loss function have converged.
In the embodiment of the present application, the discriminator network loss function is denoted as D OCR Loss, which may be a Vanilla Loss. The discriminator network loss function is obtained based on a second neural network CRNN serving as the discriminator and reflects the error, on the image, between the high-resolution text line picture in the training data and the corresponding super-resolution text line picture. That is, the second neural network CRNN acts as the discriminator and forms an adversarial network with the super-resolution network model: the super-resolution network model is equivalent to a generator that produces the super-resolution text line picture, and the second CRNN discriminates whether the super-resolution text line picture is a real picture, that is, judges its similarity to the high-resolution text line picture in the training data. The discriminator network loss function can therefore be obtained through the second neural network CRNN.
Because the main purpose of the present application is to reconstruct text rather than a generic picture, and in order to ensure that the characters in the constructed super-resolution text line picture are clearer, a first neural network CRNN is further added in the embodiment of the present application, introducing a text error loss function denoted as ocr loss. The ocr loss may be an l1 loss. Therefore, in the embodiment of the present application, the super-resolution network model is trained in advance not only based on the generative adversarial network formed with the discriminator, but also based on the first neural network CRNN, which is mainly used to recognize the characters in the picture.
The text error loss function is obtained based on the trained first neural network CRNN and is used for reflecting errors of the high-resolution text line pictures in the training data and the super-resolution text line pictures corresponding to the training data on the text.
Specifically, the trained first neural network CRNN may identify the characters in the image, so that the high-resolution text line image and the super-resolution text line image in the training data may be identified, so as to compare the difference between the two characters, and construct a text error loss function. Therefore, the whole network architecture for training the super-resolution network model in the embodiment of the present application can be as shown in fig. 3.
Adjusting parameters based on the text error loss function and the discriminator network loss function guides the super-resolution network model to train in a direction that makes its output easier to recognize, so that the super-resolution text line picture it produces is clearer. For example, as shown in fig. 4, the first line shows characters on the low-resolution text line picture input to the model, the second line shows characters in a super-resolution text line picture generated without ocr loss and D OCR Loss, the third line shows characters in a super-resolution text line picture generated with ocr loss and D OCR Loss, and the fourth line shows characters on the high-resolution text line picture in the training data. It can be seen that after ocr loss and D OCR Loss are used, the obtained characters are clearer and closer to the characters on the high-resolution original image.
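The two loss terms can be combined into a single generator objective. A minimal sketch, in which the L1 form follows the statement that ocr loss may be an l1 loss, while the weighted-sum combination and the weights are illustrative assumptions not specified in the application:

```python
def l1_loss(pred, target):
    """Mean absolute error between two equal-length feature sequences,
    e.g. CRNN text features of the super-resolution and the
    high-resolution text line pictures."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)

def generator_loss(ocr_loss_val, d_ocr_loss_val, w_ocr=1.0, w_adv=1.0):
    """Combine the text error loss (ocr loss) and the discriminator loss
    (D OCR Loss); the weights w_ocr / w_adv are hypothetical."""
    return w_ocr * ocr_loss_val + w_adv * d_ocr_loss_val
```

Training then alternates between minimizing this objective for the generator and updating the discriminator, as is usual for adversarial training.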
If it is determined that the text error loss function or the discriminator network loss function has not converged, step S104 is executed. If both the text error loss function and the discriminator network loss function have converged, step S105 is executed.
S104, adjusting parameters of the super-resolution network model to be trained based on the text error loss function and the discriminator network loss function.
After step S104 is executed, step S102 is executed again until the text error loss function and the discriminant network loss function converge, so as to obtain the trained super-resolution network model.
And S105, determining that the super-resolution network model is trained.
Based on the provided super-resolution network model, another embodiment of the present application provides a document text reconstruction method, as shown in fig. 5, including the following steps:
s501, obtaining an original picture of the document to be processed.
S502, detecting each line of text lines in the original picture, and cutting each line of text lines to obtain a plurality of original text line pictures.
Similarly, an OCR character line detection model, such as EAST model, may be used to detect all text lines in the original image and perform cropping to obtain an original text line image corresponding to each text line, so that one original text line image includes one text line of the document to be processed.
S503, respectively inputting each original text line picture into a pre-trained super-resolution network model to obtain a super-resolution text line picture corresponding to each original text line picture.
Optionally, the super-resolution network model may be obtained by training, in advance, a generative adversarial network composed of the super-resolution network model and a discriminator using multiple sets of training data, that is, the super-resolution network model is trained as the generator of the adversarial network. Each set of training data includes a high-resolution text line picture and a low-resolution text line picture of the same text line. For the specific training process, reference may be made to the method provided in the embodiment shown in fig. 1, which is not repeated here. Of course, this is only one alternative; the super-resolution network model may also be constructed and trained separately.
S504, enlarging the original picture to obtain an enlarged picture corresponding to the original picture.
The super-resolution text line picture and the enlarged picture have the same magnification factor relative to the original picture.
It should be noted that, since the super-resolution network model processes the original text line picture at a specified magnification factor, the obtained super-resolution text line picture is enlarged relative to the original picture and no longer matches it in size. Therefore, it is necessary to first enlarge the original picture to obtain the enlarged picture, and then execute step S505.
Optionally, the original picture may be enlarged by using bilinear interpolation.
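Bilinear interpolation computes each output pixel as a distance-weighted average of its four nearest source pixels. A minimal grayscale sketch follows; a real implementation would typically call a library resize routine, and the function name here is an assumption for illustration:

```python
import numpy as np

def bilinear_resize(img, scale):
    """Enlarge a 2-D grayscale image by `scale` using bilinear interpolation."""
    h, w = img.shape
    out_h, out_w = int(h * scale), int(w * scale)
    # Map each output pixel back to fractional coordinates in the source.
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]   # vertical weights, one column per output row
    wx = (xs - x0)[None, :]   # horizontal weights, one row per output column
    # Blend the four neighbouring source pixels.
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy
```

For color documents the same computation is applied per channel; production code would normally use an optimized routine such as a library's bilinear resize rather than this explicit form.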
S505, replacing the local image where each text line is located in the enlarged picture with the corresponding super-resolution text line picture, to obtain a reconstructed picture of the original picture.
It should be noted that the super-resolution text line picture corresponding to the local image where the text line in the enlarged picture is located refers to a super-resolution text line picture containing the same text content.
Optionally, in another embodiment of the present application, the method may further include the steps of:
and recording the position parameter of each original text line picture in the original picture.
The position parameters comprise the horizontal and vertical coordinates of the upper left corner of the original text line picture, and the width and the height of the original text line picture. Of course, this is only one alternative way; the coordinates of the four corners of the original text line picture, or other parameters that can indicate the position of the original text line picture in the original picture, may be recorded instead.
Specifically, when each original text line picture is cropped out, the position parameter of that original text line picture in the original picture is recorded. Recording the position parameters corresponding to each original text line picture facilitates the subsequent replacement with the super-resolution text line pictures.
Accordingly, the specific implementation of step S505 in the present application, as shown in fig. 6, includes the following steps:
S601, enlarging the position parameters of each original text line picture in the original picture by a target multiple, to obtain the position parameters of each text line in the enlarged picture.
Wherein the target multiple is equal to the magnification factor of the enlarged picture relative to the original picture. Since the enlarged picture is obtained by enlarging the original picture, positions within it are scaled up accordingly; therefore, the position parameters of each original text line picture in the original picture need to be enlarged by the magnification factor of the enlarged picture. Specifically, each position parameter is multiplied by the target multiple.
S602, replacing the local image of the enlarged picture indicated by each enlarged position parameter with the corresponding super-resolution text line picture, to obtain a reconstructed picture of the original picture.
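Steps S601 and S602 together can be sketched as follows, assuming the position parameters were recorded as (x, y, w, h) tuples in original-picture coordinates and that each super-resolution text line picture already has the target size (the function name and data layout are assumptions for illustration):

```python
import numpy as np

def paste_sr_lines(enlarged, sr_lines, positions, target_multiple):
    """Replace each text-line region of the enlarged picture with its
    super-resolution counterpart.  `positions` holds (x, y, w, h) tuples
    recorded in ORIGINAL-picture coordinates."""
    out = enlarged.copy()
    for sr, (x, y, w, h) in zip(sr_lines, positions):
        # S601: multiply every position parameter by the target multiple.
        X, Y, W, H = (int(v * target_multiple) for v in (x, y, w, h))
        # S602: overwrite the indicated local image with the SR line picture.
        out[Y:Y + H, X:X + W] = sr[:H, :W]
    return out
```

The clipping `sr[:H, :W]` guards against off-by-one mismatches between the scaled box and the model's output size; in the described method the two sizes agree because the super-resolution model and the enlargement use the same magnification factor.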
The embodiment of the present application provides a method for reconstructing document text, in which a generative adversarial network composed of a super-resolution network model and a discriminator is trained in advance using multiple sets of training data, each including a high-resolution text line picture and a low-resolution text line picture, so as to obtain the trained super-resolution network model. The method obtains an original picture of a document to be processed, detects each text line in the original picture, and crops each text line to obtain multiple original text line pictures; each original text line picture is then input into the pre-trained super-resolution network model, which processes it to obtain a corresponding super-resolution text line picture, so that low-resolution, blurry original text line pictures are effectively converted into super-resolution, clear text line pictures. Because the super-resolution text line pictures are enlarged, the original picture is enlarged by bilinear interpolation to obtain an enlarged picture with the same magnification factor, so that the local image where each text line is located in the enlarged picture can be replaced with the corresponding super-resolution text line picture, yielding a super-resolution, clear reconstructed picture of the original picture.
Another embodiment of the present application provides a device for reconstructing a document text, as shown in fig. 7, including the following units:
a first obtaining unit 701, configured to obtain an original picture of a document to be processed.
A first cropping unit 702, configured to detect each line of text in the original picture, and crop each line of text to obtain multiple original text line pictures.
And the processing unit 703 is configured to input each original text line image into a pre-trained super-resolution network model to obtain a super-resolution text line image corresponding to each original text line image.
Optionally, the super-resolution network model is obtained by training, in advance, a generative adversarial network composed of the super-resolution network model and a discriminator using multiple sets of training data. Each set of the training data includes a high-resolution text line picture and a low-resolution text line picture of the same text line.
And the amplifying unit 704 is configured to amplify the original picture to obtain an amplified picture corresponding to the original picture.
And the super-resolution text line picture and the amplification picture have the same amplification factor relative to the original picture.
Optionally, the enlarging unit 704 may specifically enlarge the original picture by using bilinear interpolation.
A replacing unit 705, configured to replace the local image where each text line in the enlarged picture is located with the corresponding super-resolution text line picture, so as to obtain a reconstructed picture of the original picture.
Optionally, the apparatus for reconstructing document text provided in another embodiment of the present application further includes:
And the recording unit is used for recording the position parameters of each original text line picture in the original picture. The position parameters comprise the horizontal and vertical coordinates of the upper left corner of the original text line picture, and the width and the height of the original text line picture.
Wherein, the replacing unit in this embodiment of the application includes:
and the parameter amplifying unit is used for amplifying the position parameters of each original text line picture in the original pictures to target multiples to obtain the position parameters in the amplified pictures.
Wherein the target magnification is equal to the magnification of the magnified picture relative to the original picture.
And the replacing subunit is used for replacing the local image of the enlarged picture indicated by the position parameter in each enlarged picture with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
Optionally, the apparatus for reconstructing a document text provided in another embodiment of the present application further includes a training unit. Wherein, the training unit, as shown in fig. 8, comprises:
a second obtaining unit 801, configured to obtain the multiple sets of training data.
The training unit 802 is configured to input the low-resolution text line pictures in each set of training data into a super-resolution network model to be trained, and process the low-resolution text line pictures through the super-resolution network model to obtain super-resolution text line pictures corresponding to each set of training data.
And a parameter adjusting unit 803, configured to adjust parameters of the super-resolution network model to be trained based on a text error loss function and a discriminant network loss function, and return to the step of respectively inputting the low-resolution text line pictures in each set of training data into the super-resolution network model to be trained, until the text error loss function and the discriminant network loss function converge, so as to obtain the trained super-resolution network model.
The text error loss function is obtained based on a trained first neural network CRNN and is used for reflecting the error, in terms of text, between the high-resolution text line picture in the training data and the corresponding super-resolution text line picture. The discriminant network loss function is obtained based on a second neural network CRNN serving as a discriminator and is used for reflecting the error, in terms of the image, between the high-resolution text line picture in the training data and the corresponding super-resolution text line picture.
Optionally, in the apparatus for reconstructing document text provided in another embodiment of the present application, as shown in fig. 9, the second obtaining unit includes:
a file acquiring unit 901 configured to acquire a plurality of high-resolution PDF files.
A converting unit 902, configured to perform picture conversion on each PDF file by using a format conversion tool, so as to obtain a low-resolution image and a high-resolution image corresponding to each PDF file.
A second clipping unit 903, configured to, for a low-resolution image and a high-resolution image corresponding to each PDF file, detect and clip text lines in the low-resolution image and the high-resolution image by using a text line detection model, respectively, so as to obtain multiple high-resolution text line images and multiple low-resolution text line images.
A composing unit 904, configured to compose a set of training data from the high-resolution text line pictures and the low-resolution text line pictures corresponding to the same text line.
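The pairing performed by the composing unit 904 can be sketched as follows, assuming each cropped line is keyed by a (file, line index) tuple so that the high- and low-resolution renders of the same PDF yield matching keys — the keying scheme and function name are assumptions for illustration:

```python
def compose_training_pairs(high_lines, low_lines):
    """Pair the high- and low-resolution crops of the same text line into
    sets of training data.  Both dicts are keyed by a hypothetical
    (file, line index) tuple identifying one text line."""
    pairs = []
    for key, hi in high_lines.items():
        lo = low_lines.get(key)
        if lo is not None:              # keep only lines found in both renders
            pairs.append((hi, lo))      # one set of training data
    return pairs
```

Keying by line index relies on the text line detector returning the lines of the two renders in the same order; a more robust pairing could match lines by their (scaled) positions instead.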
Another embodiment of the present application provides an electronic device, as shown in fig. 10, including:
a memory 1001 and a processor 1002.
The memory 1001 is used for storing a program, and the processor 1002 is used for executing the program stored in the memory 1001; when executed, the program is specifically used for implementing the method for reconstructing document text provided in any one of the above embodiments.
Another embodiment of the present application provides a computer storage medium for storing a computer program, where the computer program is used to implement the method for reconstructing document text as provided in any one of the above embodiments.
Computer storage media, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer readable media does not include transitory computer readable media (transitory media) such as modulated data signals and carrier waves.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A method for reconstructing a document text, comprising:
acquiring an original picture of a document to be processed;
detecting each line of text lines in the original picture, and cutting each line of text lines to obtain a plurality of original text line pictures;
inputting each original text line picture into a pre-trained super-resolution network model respectively to obtain a super-resolution text line picture corresponding to each original text line picture;
amplifying the original picture to obtain an amplified picture corresponding to the original picture; the super-resolution text line picture and the amplification picture have the same amplification factor relative to the original picture;
and replacing the local image where each text line in the amplified picture is located with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
2. The method of claim 1, further comprising:
recording the position parameters of each original text line picture in the original picture; the position parameters comprise the horizontal and vertical coordinates of the upper left corner of the original text line picture, and the width and the height of the original text line picture;
replacing the local image where each text line in the enlarged picture is located with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture, wherein the method comprises the following steps:
amplifying the position parameters of each original text line picture in the original pictures to a target multiple to obtain the position parameters in a plurality of amplified pictures; wherein the target magnification is equal to the magnification of the magnified picture relative to the original picture;
and replacing the local image of the amplified picture indicated by the position parameter in each amplified picture with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
3. The method according to claim 1, wherein the super-resolution network model is obtained by training, in advance, a generative adversarial network composed of the super-resolution network model and a discriminator using a plurality of sets of training data; and each set of training data comprises a high-resolution text line picture and a low-resolution text line picture of the same text line.
4. The method of claim 3, wherein the training method of the super-resolution network model comprises:
acquiring the multiple groups of training data;
respectively inputting the low-resolution text line pictures in each group of training data into a super-resolution network model to be trained to obtain super-resolution text line pictures corresponding to each group of training data;
adjusting parameters of the super-resolution network model to be trained based on a text error loss function and a discriminant network loss function, and returning to the step of respectively inputting the low-resolution text line pictures in each set of training data into the super-resolution network model to be trained, until the text error loss function and the discriminant network loss function converge, so as to obtain the trained super-resolution network model;
the text error loss function is obtained based on a trained first neural network CRNN and is used for reflecting the error, in terms of text, between the high-resolution text line picture in the training data and the corresponding super-resolution text line picture; the discriminant network loss function is obtained based on a second neural network CRNN serving as a discriminator and is used for reflecting the error, in terms of the image, between the high-resolution text line picture in the training data and the corresponding super-resolution text line picture.
5. The method of claim 4, wherein the obtaining the plurality of sets of training data comprises:
acquiring a plurality of high-resolution PDF files;
respectively performing picture conversion on each PDF file by using a format conversion tool to obtain a low-resolution image and a high-resolution image corresponding to each PDF file;
detecting and cutting text lines in the low-resolution image and the high-resolution image by respectively utilizing a text line detection model aiming at the low-resolution image and the high-resolution image corresponding to each PDF file to obtain a plurality of high-resolution text line images and a plurality of low-resolution text line images;
and forming a set of training data by using the high-resolution text line pictures and the low-resolution text line pictures corresponding to the same text line.
6. An apparatus for reconstructing text of a document, comprising:
the device comprises a first acquisition unit, a second acquisition unit and a processing unit, wherein the first acquisition unit is used for acquiring an original picture of a document to be processed;
the first cutting unit is used for detecting each line of text lines in the original pictures and cutting each line of text lines to obtain a plurality of original text line pictures;
the processing unit is used for respectively inputting each original text line picture into a pre-trained super-resolution network model to obtain a super-resolution text line picture corresponding to each original text line picture;
the amplifying unit is used for amplifying the original picture to obtain an amplified picture corresponding to the original picture; the super-resolution text line picture and the amplification picture have the same amplification factor relative to the original picture;
and the replacing unit is used for replacing the local image where each text line in the amplified picture is located with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
7. The apparatus of claim 6, further comprising:
the recording unit is used for recording the position parameters of each original text line picture in the original picture; the position parameters comprise the horizontal and vertical coordinates of the upper left corner of the original text line picture, and the width and the height of the original text line picture;
wherein the replacement unit includes:
the parameter amplifying unit is used for amplifying the position parameters of each original text line picture in the original pictures to target multiples to obtain the position parameters in a plurality of amplified pictures; wherein the target magnification is equal to the magnification of the magnified picture relative to the original picture;
and the replacing subunit is used for replacing the local image of the enlarged picture indicated by the position parameter in each enlarged picture with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
8. The apparatus of claim 6, further comprising a training unit, wherein the training unit comprises:
a second obtaining unit, configured to obtain the multiple sets of training data;
the training unit is used for respectively inputting the low-resolution text line pictures in each group of training data into a super-resolution network model to be trained to obtain super-resolution text line pictures corresponding to each group of training data;
the parameter adjusting unit is used for adjusting parameters of the super-resolution network model to be trained based on a text error loss function and a discriminant network loss function, and returning to the step of respectively inputting the low-resolution text line pictures in each set of training data into the super-resolution network model to be trained, until the text error loss function and the discriminant network loss function converge, so as to obtain the trained super-resolution network model;
the text error loss function is obtained based on a trained first neural network CRNN and is used for reflecting the error, in terms of text, between the high-resolution text line picture in the training data and the corresponding super-resolution text line picture; the discriminant network loss function is obtained based on a second neural network CRNN serving as a discriminator and is used for reflecting the error, in terms of the image, between the high-resolution text line picture in the training data and the corresponding super-resolution text line picture.
9. The apparatus of claim 8, wherein the second obtaining unit comprises:
the file acquisition unit is used for acquiring a plurality of high-resolution PDF files;
the conversion unit is used for respectively carrying out image conversion on each PDF file by using a format conversion tool to obtain a low-resolution image and a high-resolution image corresponding to each PDF file;
the second cutting unit is used for detecting and cutting text lines in the low-resolution image and the high-resolution image by respectively utilizing a text line detection model aiming at the low-resolution image and the high-resolution image corresponding to each PDF file to obtain a plurality of high-resolution text line images and a plurality of low-resolution text line images;
and the composition unit is used for composing the high-resolution text line pictures and the low-resolution text line pictures corresponding to the same text line into a group of training data.
10. An electronic device, comprising:
a memory and a processor;
wherein the memory is used for storing programs;
the processor is configured to execute the program, and when the program is executed, the program is specifically configured to implement the method for reconstructing a document text according to any one of claims 1 to 5.
11. A computer storage medium storing a computer program which, when executed, implements a method of reconstructing a document literal according to any of claims 1 to 5.