CN113591798B - Method and device for reconstructing text of document, electronic equipment and computer storage medium - Google Patents

Method and device for reconstructing text of document, electronic equipment and computer storage medium

Info

Publication number
CN113591798B
CN113591798B (application CN202110969444.6A)
Authority
CN
China
Prior art keywords
resolution
picture
text line
super
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110969444.6A
Other languages
Chinese (zh)
Other versions
CN113591798A (en)
Inventor
张陆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Holding Co Ltd
Priority to CN202110969444.6A
Publication of CN113591798A
Application granted
Publication of CN113591798B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4007 Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4046 Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4053 Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The application discloses a method and device for reconstructing document text, an electronic device, and a computer storage medium. The method comprises the following steps: acquiring an original picture of a document to be processed; detecting each text line in the original picture and cropping to obtain a plurality of original text line pictures; inputting each original text line picture into a pre-trained super-resolution network model to obtain a super-resolution text line picture corresponding to each original text line picture; enlarging the original picture to obtain an enlarged picture corresponding to the original picture; and replacing the local image where each text line is located in the enlarged picture with the corresponding super-resolution text line picture, to obtain a reconstructed picture of the original picture. Based on the super-resolution network model and text line picture replacement, a super-resolution, clear reconstruction of the original picture is obtained.

Description

Method and device for reconstructing text of document, electronic equipment and computer storage medium
Technical Field
The present application relates to the field of text reconstruction technologies, and in particular, to a method and apparatus for reconstructing text in a document, an electronic device, and a computer storage medium.
Background
Documents frequently need to be transmitted in daily life, and the compression techniques applied during network transmission, scanning, and the like reduce document resolution and blur the text, making the document uncomfortable or difficult to read.
To address the loss of resolution that documents suffer during transmission, one of the main approaches adopted today is to enlarge the document with a method such as bilinear interpolation, scaling it to a specified multiple of its size. Another is to recognize the text using optical character recognition (OCR) and reconstruct the document from the recognition result.
However, the first approach enlarges the document but also makes it more blurred. The second approach is limited by the accuracy of OCR at low resolution: since recognition accuracy on low-resolution documents is low, erroneous document content is liable to appear.
Disclosure of Invention
Based on the defects of the prior art, the present application provides a method and device for reconstructing document text, an electronic device, and a computer storage medium, so as to solve the problem that the prior art cannot effectively turn a low-resolution, unclear document into a high-resolution, accurate, and clear one.
In order to achieve the above object, the present application provides the following technical solutions:
the first aspect of the application provides a method for reconstructing text of a document, which comprises the following steps:
acquiring an original picture of a document to be processed;
detecting each text line in the original picture, and cutting each text line to obtain a plurality of original text line pictures;
inputting each original text line picture into a pre-trained super-resolution network model respectively to obtain super-resolution text line pictures corresponding to each original text line picture;
amplifying the original picture to obtain an amplified picture corresponding to the original picture; the super-resolution text line picture and the enlarged picture have the same magnification factor relative to the original picture;
and replacing the local image of each text line in the enlarged picture with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
Optionally, in the above method, further comprising:
recording the position parameters of each original text line picture in the original picture; wherein the position parameters comprise the horizontal and vertical coordinates of the upper left corner of the original text line picture, and the width and height of the original text line picture;
The step of replacing the local image of each text line in the enlarged picture with the corresponding super-resolution text line picture to obtain the reconstructed picture of the original picture comprises the following steps:
amplifying the position parameters of each original text line picture in the original picture to target multiple to obtain the position parameters in a plurality of amplified pictures; the target multiple is equal to the magnification of the amplified picture relative to the original picture;
and replacing the local image of the amplified picture indicated by the position parameter in each amplified picture with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
Optionally, in the above method, the super-resolution network model is obtained by training, in advance, a generative adversarial network composed of the super-resolution network model and a discriminator with multiple sets of training data; wherein each set of training data comprises a high-resolution text line picture and a low-resolution text line picture of the same text line.
Optionally, in the above method, the training method of the super-resolution network model includes:
acquiring the plurality of sets of training data;
inputting the low-resolution text line pictures in each set of training data into the super-resolution network model to be trained, respectively, to obtain super-resolution text line pictures corresponding to each set of training data;
performing parameter adjustment on the super-resolution network model to be trained based on a text error loss function and a discrimination network loss function, and returning to the step of inputting the low-resolution text line pictures in each set of training data into the super-resolution network model to be trained, until the text error loss function and the discrimination network loss function converge, so as to obtain the trained super-resolution network model;
wherein the text error loss function is obtained based on a trained first neural network CRNN and reflects the text-level error between the high-resolution text line pictures in the training data and the super-resolution text line pictures corresponding to the training data; and the discrimination network loss function is obtained based on a second neural network CRNN serving as the discriminator and reflects the image-level error between the high-resolution text line pictures in the training data and the super-resolution text line pictures corresponding to the training data.
Optionally, in the above method, the acquiring multiple sets of training data includes:
acquiring a plurality of high-resolution PDF files;
respectively carrying out picture conversion on each PDF file by using a format conversion tool to obtain a low-resolution image and a high-resolution image corresponding to each PDF file;
for the low-resolution image and the high-resolution image corresponding to each PDF file, detecting text lines in the low-resolution image and the high-resolution image by using a text line detection model, and cropping to obtain a plurality of high-resolution text line pictures and a plurality of low-resolution text line pictures;
and forming a group of training data by the high-resolution text line pictures and the low-resolution text line pictures corresponding to the same text line.
The second aspect of the present application provides a device for reconstructing text of a document, comprising:
the first acquisition unit is used for acquiring an original picture of a document to be processed;
the first clipping unit is used for detecting each text line in the original pictures and clipping the text lines to obtain a plurality of original text line pictures;
the processing unit is used for respectively inputting each original text line picture into a pre-trained super-resolution network model to obtain a super-resolution text line picture corresponding to each original text line picture;
The amplifying unit is used for performing bilinear interpolation enlargement on the original picture to obtain an amplified picture corresponding to the original picture; the super-resolution text line picture and the enlarged picture have the same magnification factor relative to the original picture;
and the replacing unit is used for replacing the local image of each text line in the amplified picture with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
Optionally, in the above device, the method further includes:
the recording unit is used for recording the position parameters of each original text line picture in the original picture; wherein the position parameters comprise the horizontal and vertical coordinates of the upper left corner of the original text line picture, and the width and height of the original text line picture;
wherein the replacement unit includes:
the parameter amplifying unit is used for amplifying the position parameters of each original text line picture in the original picture to target multiple so as to obtain the position parameters in a plurality of amplified pictures; the target multiple is equal to the magnification of the amplified picture relative to the original picture;
And the replacing subunit is used for replacing the local image of the amplified picture indicated by the position parameter in each amplified picture with the corresponding super-resolution text line picture to obtain the reconstructed picture of the original picture.
Optionally, in the above device, the device further includes a training unit, where the training unit includes:
the second acquisition unit is used for acquiring the plurality of groups of training data;
the training unit is used for respectively inputting the low-resolution text line pictures in each group of training data into a super-resolution network model to be trained to obtain super-resolution text line pictures corresponding to each group of training data;
the parameter adjusting unit is used for adjusting parameters of the super-resolution network model to be trained based on a text error loss function and a discrimination network loss function, and returning to execute the steps of inputting the low-resolution text line pictures in each group of training data into the super-resolution network model to be trained until the text error loss function and the discrimination network loss function are converged, so as to obtain the trained super-resolution network model;
the text error loss function is obtained based on a trained first neural network CRNN and reflects the text-level error between the high-resolution text line pictures in the training data and the super-resolution text line pictures corresponding to the training data; the discrimination network loss function is obtained based on a second neural network CRNN serving as the discriminator and reflects the image-level error between the high-resolution text line pictures in the training data and the super-resolution text line pictures corresponding to the training data.
Optionally, in the above apparatus, the second obtaining unit includes:
a file acquisition unit for acquiring a plurality of high-resolution PDF files;
the conversion unit is used for respectively carrying out picture conversion on each PDF file by using a format conversion tool to obtain a low-resolution image and a high-resolution image corresponding to each PDF file;
the second clipping unit is used for detecting text lines in the low-resolution image and the high-resolution image by using a text line detection model respectively aiming at the low-resolution image and the high-resolution image corresponding to each PDF file, and clipping the text lines to obtain a plurality of high-resolution text line pictures and a plurality of low-resolution text line pictures;
and the composition unit is used for composing the high-resolution text line pictures and the low-resolution text line pictures corresponding to the same text line into a group of training data.
A third aspect of the present application provides an electronic device, comprising:
a memory and a processor;
wherein the memory is used for storing programs;
the processor is configured to execute the program, where the program is executed, and specifically configured to implement the method for reconstructing a document text according to any one of the foregoing embodiments.
A fourth aspect of the present application provides a computer storage medium storing a computer program which, when executed, is adapted to carry out a method of reconstructing a document text as set out in any one of the preceding claims.
The embodiment of the application provides a method for reconstructing document text in which a super-resolution network model is trained in advance. An original picture of a document to be processed is acquired, each text line in the original picture is detected, and each text line is cropped to obtain a plurality of original text line pictures. Each original text line picture is then input into the pre-trained super-resolution network model, which processes it to obtain a corresponding super-resolution text line picture; in this way, the low-resolution, blurred original text line pictures are effectively turned into super-resolution, clear text line pictures. Because the super-resolution text line pictures are magnified, the original picture is enlarged to obtain an enlarged picture with the same magnification, so that the local image where each text line is located in the enlarged picture can be replaced with the corresponding super-resolution text line picture, yielding a super-resolution, clear reconstructed picture of the original picture.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present application, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a training method of a super-resolution network model according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for acquiring training data according to another embodiment of the present application;
FIG. 3 is a schematic diagram of a training model network architecture according to another embodiment of the present application;
FIG. 4 is a graph showing an exemplary comparison of output results provided by another embodiment of the present application;
FIG. 5 is a flowchart of a method for reconstructing text of a document according to another embodiment of the present application;
FIG. 6 is a flowchart of a method for replacing a picture according to another embodiment of the present application;
FIG. 7 is a schematic diagram of a device for reconstructing text of a document according to another embodiment of the present application;
FIG. 8 is a schematic diagram of a training unit according to another embodiment of the present application;
fig. 9 is a schematic structural diagram of a second obtaining unit according to another embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device according to another embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
In the present application, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises it.
The application provides a method for reconstructing document text, aiming to solve the problem that the prior art cannot effectively turn a low-resolution, unclear document into a high-resolution, accurate, and clear one.
The application provides a method for reconstructing document text that is realized with a super-resolution network model and a generative adversarial network. In the embodiment of the present application, the super-resolution network model serves as the generator and, combined with a discriminator, forms the generative adversarial network, within which the super-resolution network model is trained. Accordingly, an embodiment of the present application provides a training method for the super-resolution network model, as shown in fig. 1, including the following steps:
S101, acquiring a plurality of groups of training data.
Wherein each set of training data includes a high resolution text line picture and a low resolution text line picture of the same text line.
Specifically, the same text line on the same picture is processed into a high-resolution text line picture and a low-resolution text line picture, and the two together form one set of training data. The low-resolution text line picture is input into the super-resolution network model for reconstruction, yielding a super-resolution text line picture; the high-resolution text line picture acts as the label and is compared against the super-resolution text line picture output by the model, so that the training effect of the super-resolution network model can be assessed.
Optionally, in another embodiment of the present application, a specific implementation of step S101, as shown in fig. 2, includes the following steps:
S201, a plurality of high-resolution PDF files are obtained.
It should be noted that, in this embodiment, the training data is not constructed with the standard bicubic interpolation algorithm, but with a more common mode of document degradation: the image degradation caused by the reduction of DPI when PDF documents are transferred, which is common in document files and especially in scan files. Of course, training data obtained with the bicubic interpolation algorithm, or constructed in other ways, may be added for other application scenarios.
S202, respectively carrying out picture conversion on each PDF file by using a format conversion tool to obtain a low-resolution image and a high-resolution image corresponding to each PDF file.
Wherein the format conversion tool may be pdftoppm.
Specifically, each PDF file is converted into a low-resolution map and a high-resolution map by setting different scale factors. For example, taking a magnification of 2 times, the high-resolution map of a PDF file is obtained by setting the dpi parameter of pdftoppm to 200, and the low-resolution map by setting the dpi parameter of pdftoppm to 100.
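As a minimal sketch of this conversion step (S202), assuming the pdftoppm command-line tool is installed: the -r flag sets the rasterization DPI, and the 200/100 values mirror the 2x example above. The file layout and naming are illustrative assumptions, not part of the disclosure.

```python
import subprocess
from pathlib import Path

def render_pair(pdf_path: str, out_dir: str, hi_dpi: int = 200, lo_dpi: int = 100) -> None:
    """Render one PDF twice with pdftoppm to obtain a high/low-resolution page pair."""
    stem = Path(pdf_path).stem
    for tag, dpi in (("hi", hi_dpi), ("lo", lo_dpi)):
        subprocess.run(
            ["pdftoppm", "-r", str(dpi), "-png", pdf_path, f"{out_dir}/{stem}_{tag}"],
            check=True,  # raise if pdftoppm fails
        )
```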
S203, detecting text lines in the low-resolution image and the high-resolution image by using a text line detection model respectively aiming at the low-resolution image and the high-resolution image corresponding to each PDF file, and cutting to obtain a plurality of high-resolution text line images and a plurality of low-resolution text line images.
Specifically, an OCR text line detection model, such as EAST, is run on the low-resolution image and on the high-resolution image respectively to detect the text lines in each, and every text line is cropped, thereby obtaining a plurality of high-resolution text line pictures and a plurality of low-resolution text line pictures.
S204, forming a group of training data by the high-resolution text line pictures and the low-resolution text line pictures corresponding to the same text line.
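The following is a sketch of steps S203 and S204 combined, assuming a detector (e.g. EAST) that returns axis-aligned line boxes. Note an assumption: the patent detects lines on both renderings, whereas for brevity this sketch detects on the low-resolution image only and scales the boxes up.

```python
import numpy as np

def crop_line_pairs(lo_img: np.ndarray, hi_img: np.ndarray, detect_text_lines, scale: int = 2):
    """Cut matching low/high-resolution crops for every detected text line (S203)
    and pair them into training samples (S204)."""
    pairs = []
    for (x, y, w, h) in detect_text_lines(lo_img):  # boxes in low-res coordinates
        lo_crop = lo_img[y:y + h, x:x + w]
        hi_crop = hi_img[y * scale:(y + h) * scale, x * scale:(x + w) * scale]
        pairs.append((hi_crop, lo_crop))            # (label, model input)
    return pairs
```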
S102, respectively inputting the low-resolution text line pictures in each group of training data into a super-resolution network model to be trained, and obtaining the super-resolution text line pictures corresponding to each group of training data.
S103, judging whether the text error loss function and the discrimination network loss function have converged.
The discrimination network loss function is denoted as D OCR Loss in the embodiment of the present application, and D OCR Loss may be a vanilla GAN loss. The discrimination network loss function is obtained based on the second neural network CRNN serving as the discriminator and reflects the image-level error between the high-resolution text line pictures in the training data and the super-resolution text line pictures corresponding to the training data. That is, the second neural network CRNN serves as the discriminator and forms an adversarial network with the super-resolution network model: the super-resolution network model acts as the generator producing super-resolution text line pictures, and the second neural network CRNN judges whether a super-resolution text line picture is a real picture, i.e. determines its degree of similarity to the high-resolution text line pictures in the training data. The second neural network CRNN is therefore used to obtain a discrimination network loss function reflecting the image-level error between the high-resolution text line pictures in the training data and the corresponding super-resolution text line pictures.
Since the main purpose of the application is not to reconstruct pictures but to reconstruct text, and in order to ensure that the characters in the constructed super-resolution text line pictures are clearer, the embodiment of the application further adds a first neural network CRNN and, with it, a text error loss function, denoted ocr loss. The ocr loss may be an L1 loss. Therefore, in the embodiment of the present application, the super-resolution network model is not only trained in advance within the generative adversarial network formed with the discriminator, but also trained against the first neural network CRNN, whose main role is to recognize the characters in a picture.
The text error loss function is obtained based on the trained first neural network CRNN and reflects the text-level error between the high-resolution text line pictures in the training data and the super-resolution text line pictures corresponding to the training data.
Specifically, the trained first neural network CRNN can recognize the characters in pictures, so the text error loss function can be constructed by recognizing the characters in the high-resolution text line pictures and in the corresponding super-resolution text line pictures and comparing their differences at the character level. The overall network architecture used to train the super-resolution network model in the embodiment of the application is shown in fig. 3.
Parameter adjustment based on the text error loss function and the discrimination network loss function guides the super-resolution network model to train in the direction of text that is easier to recognize, so that the super-resolution text line pictures it outputs are clearer. For example, as shown in fig. 4, the first line shows the characters of a low-resolution text line picture input to the model, the second line shows the characters of a super-resolution text line picture generated without ocr loss and D ocr loss, the third line shows the characters of a super-resolution text line picture generated with ocr loss and D ocr loss, and the fourth line shows the characters of the high-resolution text line picture in the training data. It can be seen that with ocr loss and D ocr loss the resulting text is clearer and closer to the text of the high-resolution original.
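The following PyTorch sketch shows one way the two losses could be combined for the generator update. The CRNN wrappers, the binary-cross-entropy (vanilla GAN) formulation, and the loss weights are assumptions; the patent only fixes that ocr loss is an L1 term derived from a trained CRNN and that D OCR Loss comes from a CRNN-style discriminator.

```python
import torch
import torch.nn.functional as F

def generator_loss(sr_img, hr_img, crnn_fixed, crnn_disc, w_ocr=1.0, w_adv=0.1):
    """ocr loss: L1 between the trained CRNN's outputs on the SR and HR line
    pictures (text-level error). Adversarial term: vanilla GAN loss against a
    CRNN-style discriminator (image-level error)."""
    ocr_loss = F.l1_loss(crnn_fixed(sr_img), crnn_fixed(hr_img))
    logits_fake = crnn_disc(sr_img)
    adv_loss = F.binary_cross_entropy_with_logits(
        logits_fake, torch.ones_like(logits_fake))  # generator tries to look "real"
    return w_ocr * ocr_loss + w_adv * adv_loss
```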
If it is determined that the text error loss function or the discrimination network loss function has not converged, step S104 is performed. If it is determined that both the text error loss function and the discrimination network loss function have converged, step S105 is performed.
S104, based on the text error loss function and the discrimination network loss function, parameter adjustment is carried out on the super-resolution network model to be trained.
After step S104 is performed, step S102 is performed again until the text error loss function and the discrimination network loss function converge, so as to obtain a trained super-resolution network model.
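Continuing the sketch above (and reusing generator_loss from it), one alternating iteration of S102-S104 might look like this; the optimization schedule is an assumption, since the patent only states that parameters are adjusted until both losses converge.

```python
import torch
import torch.nn.functional as F

def train_step(generator, crnn_fixed, crnn_disc, opt_g, opt_d, lr_batch, hr_batch):
    """One assumed iteration: update the CRNN-style discriminator, then the generator."""
    sr_batch = generator(lr_batch)

    # Discriminator update: real HR line pictures vs. generated SR line pictures.
    d_real = crnn_disc(hr_batch)
    d_fake = crnn_disc(sr_batch.detach())  # detach: no generator gradients here
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: text error (ocr loss) plus adversarial term (D OCR Loss).
    g_loss = generator_loss(sr_batch, hr_batch, crnn_fixed, crnn_disc)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return g_loss.item(), d_loss.item()
```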
S105, determining that the super-resolution network model training is completed.
Based on the super-resolution network model provided above, another embodiment of the present application provides a method for reconstructing document text, as shown in fig. 5, including the following steps:
S501, acquiring an original picture of a document to be processed.
S502, detecting each text line in the original picture, and cutting each text line to obtain a plurality of original text line pictures.
Similarly, an OCR text line detection model, such as the EAST model, may be used to detect all text lines in the original picture and crop them to obtain an original text line picture for each text line, so that one original text line picture contains one line of text of the document to be processed.
S503, respectively inputting each original text line picture into a pre-trained super-resolution network model to obtain a super-resolution text line picture corresponding to each original text line picture.
Optionally, the super-resolution network model may be obtained by training, in advance, a generative adversarial network composed of the super-resolution network model and a discriminator with multiple sets of training data, that is, by training the super-resolution network model as the generator in the adversarial network. Each set of training data includes a high-resolution text line picture and a low-resolution text line picture of the same text line. For the specific training process, reference may be made to the method provided in the embodiment shown in fig. 1, which is not repeated here. Of course, this is an alternative; the super-resolution network model may also be constructed and trained separately.
S504, amplifying the original picture to obtain an amplified picture corresponding to the original picture.
And the super-resolution text line picture and the enlarged picture have the same magnification factor relative to the original picture.
It should be noted that, since the super-resolution network model processes the original text line pictures at a specified magnification, the resulting super-resolution text line pictures are magnified relative to the original picture and no longer match it in size. Therefore, the original picture is enlarged to obtain the enlarged picture, and then step S505 is performed.
Alternatively, a bilinear interpolation method may be used to amplify the original picture.
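For instance, with OpenCV (an assumed tooling choice; the patent only requires bilinear interpolation at the same magnification as the super-resolution model):

```python
import cv2

def enlarge_bilinear(img, scale: int = 2):
    """S504: enlarge the original picture by the model's magnification factor."""
    h, w = img.shape[:2]
    # INTER_LINEAR is OpenCV's bilinear interpolation
    return cv2.resize(img, (w * scale, h * scale), interpolation=cv2.INTER_LINEAR)
```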
S505, replacing the local image of each text line in the amplified picture with a corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
The super-resolution text line picture "corresponding to" a local image where a text line is located in the enlarged picture is the super-resolution text line picture containing the same text content.
Optionally, in another embodiment of the present application, the method may further include the steps of:
And recording the position parameters of each original text line picture in the original picture.
Wherein the position parameters include the horizontal and vertical coordinates of the upper left corner of the original text line picture, and the width and height of the original text line picture. Of course, this is just one alternative; the coordinates of the four corners of the original text line picture may be recorded instead, or any other parameters that can represent the position of the original text line picture in the original picture.
Specifically, when each original text line picture is cropped out, its position parameters in the original picture are recorded. Keeping the position parameters corresponding to each original text line picture facilitates the subsequent replacement with the super-resolution text line pictures.
Accordingly, in the present application, the specific embodiment of step S505, as shown in fig. 6, includes the following steps:
S601, amplifying the position parameters of each original text line picture in the original picture by the target multiple to obtain the position parameters in the enlarged picture.
Wherein the target multiple is equal to the magnification of the enlarged picture relative to the original picture. Since the enlarged picture is obtained by enlarging the original picture, positions on the original picture are enlarged correspondingly, so the position parameters of each original text line picture in the original picture need to be enlarged by the magnification of the enlarged picture; in particular, each position parameter is multiplied by the target multiple.
S602, replacing the local image of the amplified picture indicated by the position parameter in each amplified picture with a corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
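A sketch of S601-S602, assuming each recorded position parameter is the tuple (x, y, w, h) in original-picture coordinates discussed above:

```python
import cv2
import numpy as np

def paste_sr_lines(enlarged: np.ndarray, boxes, sr_lines, scale: int = 2) -> np.ndarray:
    """Scale each recorded position by the target multiple (S601), then overwrite
    the indicated local image with the SR text line picture (S602)."""
    for (x, y, w, h), line in zip(boxes, sr_lines):
        X, Y, W, H = x * scale, y * scale, w * scale, h * scale
        if line.shape[:2] != (H, W):  # guard against one-pixel size drift from the SR model
            line = cv2.resize(line, (W, H), interpolation=cv2.INTER_LINEAR)
        enlarged[Y:Y + H, X:X + W] = line
    return enlarged
```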
The embodiment of the application provides a method for reconstructing document text. A generative adversarial network consisting of the super-resolution network model and a discriminator is trained in advance with multiple sets of training data comprising high-resolution and low-resolution text line pictures, yielding the trained super-resolution network model. An original picture of the document to be processed is acquired, each text line in it is detected and cropped to obtain a plurality of original text line pictures, and each original text line picture is input into the pre-trained super-resolution network model, which processes it into a corresponding super-resolution text line picture; the low-resolution, blurred original text line pictures are thus effectively turned into super-resolution, clear ones. Because the super-resolution text line pictures are magnified, the original picture is enlarged by bilinear interpolation to obtain an enlarged picture with the same magnification, so that the local image where each text line is located in the enlarged picture can be replaced with the corresponding super-resolution text line picture, yielding a super-resolution, clear reconstructed picture of the original picture.
Another embodiment of the present application provides a document text reconstruction device, as shown in FIG. 7, comprising the following units:
a first obtaining unit 701, configured to obtain an original picture of a document to be processed.
The first clipping unit 702 is configured to detect each text line in the original picture and clip each text line, so as to obtain a plurality of original text line pictures.
The processing unit 703 is configured to input each of the original text line pictures into a pre-trained super-resolution network model to obtain a super-resolution text line picture corresponding to each of the original text line pictures.
Optionally, the super-resolution network model is obtained by training the generated countermeasure network consisting of the super-resolution network model and the discriminator by utilizing multiple sets of training data in advance. Each set of the training data comprises a high resolution text line picture and a low resolution text line picture of the same text line.
And the amplifying unit 704 is configured to amplify the original picture to obtain an amplified picture corresponding to the original picture.
The super-resolution text line picture and the enlarged picture have the same magnification factor relative to the original picture.
Alternatively, the enlarging unit 704 may specifically enlarge the original picture using bilinear interpolation.
And a replacing unit 705, configured to replace a local image where each text line in the enlarged picture is located with a corresponding super-resolution text line picture, so as to obtain a reconstructed picture of the original picture.
Optionally, in the device for reconstructing document text provided in another embodiment of the present application, the device further includes:
and the recording unit is used for recording the position parameters of each original text line picture in the original picture. Wherein the position parameters include the left and right horizontal coordinates of the upper left corner of the original text line picture, and the width and length of the original text line picture.
The replacing unit in the embodiment of the application comprises the following components:
and the parameter amplifying unit is used for amplifying the position parameters of each original text line picture in the original picture to target multiple so as to obtain the position parameters in a plurality of amplified pictures.
The target magnification is equal to the magnification of the enlarged picture relative to the original picture.
And the replacing subunit is used for replacing the local image of the amplified picture indicated by the position parameter in each amplified picture with the corresponding super-resolution text line picture to obtain the reconstructed picture of the original picture.
Optionally, in the document text reconstruction device provided in another embodiment of the present application, the device further includes a training unit. The training unit, as shown in fig. 8, includes:
a second obtaining unit 801, configured to obtain the multiple sets of training data.
The training unit 802 is configured to input the low-resolution text line pictures in each set of training data into a super-resolution network model to be trained, and process the low-resolution text line pictures through the super-resolution network model to obtain super-resolution text line pictures corresponding to each set of training data.
And a parameter tuning unit 803, configured to tune the super-resolution network model to be trained based on a text error loss function and a discrimination network loss function, and return to perform input of the low-resolution text line pictures in each set of training data into the super-resolution network model to be trained until the text error loss function and the discrimination network loss function converge, thereby obtaining the trained super-resolution network model.
The text error loss function is obtained based on a trained first neural network CRNN and reflects the text-level error between the high-resolution text line pictures in the training data and the super-resolution text line pictures corresponding to the training data. The discrimination network loss function is obtained based on a second neural network CRNN serving as the discriminator and reflects the image-level error between the high-resolution text line pictures in the training data and the super-resolution text line pictures corresponding to the training data.
Optionally, in the apparatus for reconstructing a document text provided in another embodiment of the present application, as shown in fig. 9, the second obtaining unit includes:
a file acquisition unit 901 for acquiring a plurality of PDF files of high resolution.
And a conversion unit 902, configured to convert the pictures of each PDF file by using a format conversion tool, so as to obtain a low resolution map and a high resolution map corresponding to each PDF file.
The second clipping unit 903 is configured to detect text lines in the low-resolution image and the high-resolution image by using a text line detection model for the low-resolution image and the high-resolution image corresponding to each PDF file, and clip the text lines to obtain a plurality of high-resolution text line pictures and a plurality of low-resolution text line pictures.
A composing unit 904, configured to compose a set of training data from the high resolution text line picture and the low resolution text line picture corresponding to the same text line.
Another embodiment of the present application provides an electronic device, as shown in fig. 10, including:
a memory 1001 and a processor 1002.
The memory 1001 is configured to store a program, and the processor 1002 is configured to execute the program stored in the memory 1001; when run, the program is specifically configured to implement the method for reconstructing document text provided in any one of the embodiments.
Another embodiment of the present application provides a computer storage medium storing a computer program for implementing the method for reconstructing a document text provided in any one of the above embodiments when the computer program is executed.
Computer storage media, including volatile and non-volatile, removable and non-removable media, may be implemented in any method or technology for storage of information. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A method for reconstructing text of a document is characterized by comprising the following steps:
acquiring an original picture of a document to be processed;
detecting each text line in the original picture, and cutting each text line to obtain a plurality of original text line pictures;
inputting each original text line picture into a pre-trained super-resolution network model respectively to obtain super-resolution text line pictures corresponding to each original text line picture; the super-resolution network model is obtained by continuously adjusting parameters based on a text error loss function and a discrimination network loss function until the text error loss function and the discrimination network loss function converge; the text error loss function is obtained based on a trained first neural network CRNN and reflects the text-level error between high-resolution text line pictures in training data and the super-resolution text line pictures corresponding to the training data; the discrimination network loss function is obtained based on a second neural network CRNN serving as a discriminator and reflects the image-level error between the high-resolution text line pictures in the training data and the super-resolution text line pictures corresponding to the training data; the super-resolution text line pictures corresponding to the training data are pictures obtained by inputting low-resolution text line pictures in the training data into the super-resolution network model to be trained;
amplifying the original picture to obtain an amplified picture corresponding to the original picture; the super-resolution text line picture and the enlarged picture have the same magnification factor relative to the original picture;
and replacing the local image of each text line in the enlarged picture with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
2. The method as recited in claim 1, further comprising:
recording the position parameters of each original text line picture in the original picture; wherein the position parameters comprise the horizontal and vertical coordinates of the upper left corner of the original text line picture, and the width and height of the original text line picture;
the step of replacing the local image of each text line in the enlarged picture with the corresponding super-resolution text line picture to obtain the reconstructed picture of the original picture comprises the following steps:
amplifying the position parameters of each original text line picture in the original picture to target multiple to obtain the position parameters in a plurality of amplified pictures; the target multiple is equal to the magnification of the amplified picture relative to the original picture;
And replacing the local image of the amplified picture indicated by the position parameter in each amplified picture with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
3. The method according to claim 1, wherein the super-resolution network model is obtained by training, in advance, a generative adversarial network composed of the super-resolution network model and a discriminator with multiple sets of training data; wherein each set of training data comprises a high-resolution text line picture and a low-resolution text line picture of the same text line.
4. A method according to claim 3, wherein the training method of the super-resolution network model comprises:
acquiring a plurality of groups of training data;
respectively inputting the low-resolution text line pictures in each group of training data into the super-resolution network model to be trained to obtain super-resolution text line pictures corresponding to each group of training data;
and performing parameter adjustment on the super-resolution network model to be trained based on the text error loss function and the discrimination network loss function, and returning to execute the step of inputting the low-resolution text line pictures in each group of training data into the super-resolution network model to be trained until the text error loss function and the discrimination network loss function are converged, so as to obtain the trained super-resolution network model.
5. The method of claim 4, wherein the obtaining the plurality of sets of training data comprises:
acquiring a plurality of high-resolution PDF files;
respectively carrying out picture conversion on each PDF file by using a format conversion tool to obtain a low-resolution image and a high-resolution image corresponding to each PDF file;
for the low-resolution image and the high-resolution image corresponding to each PDF file, detecting text lines in the low-resolution image and the high-resolution image by using a text line detection model, and cropping to obtain a plurality of high-resolution text line pictures and a plurality of low-resolution text line pictures;
and forming a group of training data by the high-resolution text line pictures and the low-resolution text line pictures corresponding to the same text line.
6. A document text reconstruction device, comprising:
the first acquisition unit is used for acquiring an original picture of a document to be processed;
the first clipping unit is used for detecting each text line in the original pictures and clipping the text lines to obtain a plurality of original text line pictures;
the processing unit is used for respectively inputting each original text line picture into a pre-trained super-resolution network model to obtain a super-resolution text line picture corresponding to each original text line picture; the super-resolution network model is obtained by continuously adjusting parameters based on a text error loss function and a discrimination network loss function until the text error loss function and the discrimination network loss function converge; the text error loss function is obtained based on a trained first neural network CRNN and reflects the text-level error between high-resolution text line pictures in training data and the super-resolution text line pictures corresponding to the training data; the discrimination network loss function is obtained based on a second neural network CRNN serving as a discriminator and reflects the image-level error between the high-resolution text line pictures in the training data and the super-resolution text line pictures corresponding to the training data; the super-resolution text line pictures corresponding to the training data are pictures obtained by inputting low-resolution text line pictures in the training data into the super-resolution network model to be trained;
The amplifying unit is used for amplifying the original picture to obtain an amplified picture corresponding to the original picture; the super-resolution text line picture and the enlarged picture have the same magnification factor relative to the original picture;
and the replacing unit is used for replacing the local image of each text line in the amplified picture with the corresponding super-resolution text line picture to obtain a reconstructed picture of the original picture.
7. The apparatus as recited in claim 6, further comprising:
the recording unit is used for recording the position parameters of each original text line picture in the original picture; wherein the position parameters comprise the horizontal and vertical coordinates of the upper left corner of the original text line picture, and the width and height of the original text line picture;
wherein the replacement unit includes:
the parameter amplifying unit is used for amplifying the position parameters of each original text line picture in the original picture to target multiple so as to obtain the position parameters in a plurality of amplified pictures; the target multiple is equal to the magnification of the amplified picture relative to the original picture;
And the replacing subunit is used for replacing the local image of the amplified picture indicated by the position parameter in each amplified picture with the corresponding super-resolution text line picture to obtain the reconstructed picture of the original picture.
8. The apparatus of claim 6, further comprising a training unit, wherein the training unit comprises:
the second acquisition unit is used for acquiring a plurality of groups of training data;
the training unit is used for respectively inputting the low-resolution text line pictures in each group of training data into the super-resolution network model to be trained to obtain super-resolution text line pictures corresponding to each group of training data;
and the parameter adjusting unit is used for adjusting parameters of the super-resolution network model to be trained based on the text error loss function and the discrimination network loss function, and returning to the training unit to execute the steps of respectively inputting the low-resolution text line pictures in each group of training data into the super-resolution network model to be trained until the text error loss function and the discrimination network loss function are converged, so as to obtain the trained super-resolution network model.
9. The apparatus of claim 8, wherein the second acquisition unit comprises:
a file acquisition unit for acquiring a plurality of high-resolution PDF files;
the conversion unit is used for respectively carrying out picture conversion on each PDF file by using a format conversion tool to obtain a low-resolution image and a high-resolution image corresponding to each PDF file;
the second clipping unit is used for detecting text lines in the low-resolution image and the high-resolution image by using a text line detection model respectively aiming at the low-resolution image and the high-resolution image corresponding to each PDF file, and clipping the text lines to obtain a plurality of high-resolution text line pictures and a plurality of low-resolution text line pictures;
and the composition unit is used for composing the high-resolution text line pictures and the low-resolution text line pictures corresponding to the same text line into a group of training data.
10. An electronic device, comprising:
a memory and a processor;
wherein the memory is used for storing programs;
the processor is configured to execute the program, wherein the program, when executed, is specifically configured to implement the method for reconstructing text of a document according to any one of claims 1 to 5.
11. A computer storage medium storing a computer program which, when executed, carries out the method for reconstructing text of a document according to any one of claims 1 to 5.
CN202110969444.6A 2021-08-23 2021-08-23 Method and device for reconstructing text of document, electronic equipment and computer storage medium Active CN113591798B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110969444.6A CN113591798B (en) 2021-08-23 2021-08-23 Method and device for reconstructing text of document, electronic equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110969444.6A CN113591798B (en) 2021-08-23 2021-08-23 Method and device for reconstructing text of document, electronic equipment and computer storage medium

Publications (2)

Publication Number Publication Date
CN113591798A (en) 2021-11-02
CN113591798B (en) 2023-11-03

Family

ID=78239065

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110969444.6A Active CN113591798B (en) 2021-08-23 2021-08-23 Method and device for reconstructing text of document, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN113591798B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114155543B (en) * 2021-12-08 2022-11-29 北京百度网讯科技有限公司 Neural network training method, document image understanding method, device and equipment
CN115829837A (en) * 2022-11-15 2023-03-21 深圳市新良田科技股份有限公司 Text image super-resolution reconstruction method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108022212B (en) * 2017-11-24 2022-07-01 腾讯科技(深圳)有限公司 High-resolution picture generation method, generation device and storage medium
CN109949255B (en) * 2017-12-20 2023-07-28 华为技术有限公司 Image reconstruction method and device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986029A (en) * 2018-07-03 2018-12-11 南京览笛信息科技有限公司 Character image super resolution ratio reconstruction method, system, terminal device and storage medium
CN109410239A (en) * 2018-11-07 2019-03-01 南京大学 A kind of text image super resolution ratio reconstruction method generating confrontation network based on condition
CN110633755A (en) * 2019-09-19 2019-12-31 北京市商汤科技开发有限公司 Network training method, image processing method and device and electronic equipment
CN111461134A (en) * 2020-05-18 2020-07-28 南京大学 Low-resolution license plate recognition method based on generation countermeasure network
CN112419159A (en) * 2020-12-07 2021-02-26 上海互联网软件集团有限公司 Character image super-resolution reconstruction system and method
CN112734647A (en) * 2021-01-20 2021-04-30 支付宝(杭州)信息技术有限公司 Image processing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Text image super-resolution reconstruction based on Markov networks; Li Ruiming; Zhang Ye; Shanxi Electronic Technology (04); full text *

Also Published As

Publication number Publication date
CN113591798A (en) 2021-11-02

Similar Documents

Publication Publication Date Title
CN113591798B (en) Method and device for reconstructing text of document, electronic equipment and computer storage medium
US10783367B2 (en) System and method for data extraction and searching
KR20190095651A (en) Apparatus for generating training data for character learning and method thereof
Kumar et al. A dataset for quality assessment of camera captured document images
CN112560861A (en) Bill processing method, device, equipment and storage medium
US20220222284A1 (en) System and method for automated information extraction from scanned documents
JP2011525736A (en) How to find hard copy media orientation and date
US10616443B1 (en) On-device artificial intelligence systems and methods for document auto-rotation
CN111291661B (en) Method and equipment for identifying text content of icon in screen
CN108765532B (en) Child drawing model building method, reading robot and storage device
CN111914597A (en) Document comparison identification method and device, electronic equipment and readable storage medium
CN111428656A (en) Mobile terminal identity card identification method based on deep learning and mobile device
CN112749694B (en) Method and device for recognizing image direction and nameplate characters
CN112464629A (en) Form filling method and device
CN111611986B (en) Method and system for extracting and identifying focus text based on finger interaction
Li et al. Research on improving OCR recognition based on bending correction
CN110941947A (en) Document editing method and device, computer storage medium and terminal
JP2011524570A (en) Determining the location of scanned hardcopy media
CN114463758A (en) OCR double-layer file generation method capable of retaining native content
CN100511267C (en) Graph and text image processing equipment and image processing method thereof
JP2010009579A (en) System and method for detecting document content in real time
Agegnehu et al. Offline Handwritten Amharic Digit and Punctuation Mark Script Recognition using Deep learning
US20220179597A1 (en) Modify and output printout including data in predefined format
CN117632852A (en) Method, device and equipment for converting PDF format and readable storage medium
CN117253244A (en) Book data processing method and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant