CN112419159B - Text image super-resolution reconstruction system and method - Google Patents

Info

Publication number
CN112419159B
CN112419159B CN202011417305.4A CN202011417305A CN112419159B CN 112419159 B CN112419159 B CN 112419159B CN 202011417305 A CN202011417305 A CN 202011417305A CN 112419159 B CN112419159 B CN 112419159B
Authority
CN
China
Prior art keywords
image
resolution
super
loss function
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011417305.4A
Other languages
Chinese (zh)
Other versions
CN112419159A (en
Inventor
张晓东
张月
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Internet Software Group Co ltd
Original Assignee
Shanghai Internet Software Group Co ltd
Filing date
Publication date
Application filed by Shanghai Internet Software Group Co ltd
Priority to CN202011417305.4A
Publication of CN112419159A
Application granted
Publication of CN112419159B

Abstract

The invention discloses a text image super-resolution reconstruction system and method. The method comprises the following steps: a feature extraction module extracts a set feature layer corresponding to the text image to be processed; the feature layer is input to a super-resolution image reconstruction module, which upsamples the feature layer and performs feature extraction on the upsampled feature layer to obtain a reconstructed super-resolution text image; the feature layer is input to a character recognition module, which downsamples the feature layer, extracts time-sequence features from the downsampled feature layer, and performs character recognition on the extracted time-sequence features to obtain the text content of the text image to be processed; and the feature layer is input to a super-resolution gradient map reconstruction module, which upsamples the feature layer and performs feature extraction on the upsampled feature layer to obtain a reconstructed super-resolution gradient map. The multi-task text image super-resolution reconstruction system and method can improve the sharpness and credibility of the reconstructed text image.

Description

Text image super-resolution reconstruction system and method
Technical Field
The invention belongs to the technical field of image processing, relates to an image processing system, and particularly relates to a text image super-resolution reconstruction system and method.
Background
A deep neural network is a complex mathematical model. Input data are passed through the network to obtain corresponding output data, and a loss function is constructed from the difference between the output data and the labeled data. The gradients of the loss function with respect to the network parameters are computed, the parameters are updated through gradient back propagation, and repeated updates continuously reduce the difference between the output data and the labeled data. The input data and the labeled data together form the training data required to train the deep neural network, and the performance of the network depends on both its structure and the training data. Deep neural networks outperform traditional methods in fields such as image, speech, and natural language processing, and are widely applied.
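The training loop described above can be illustrated on a toy problem. The following is a minimal sketch only, assuming a single-parameter linear model, a mean squared-error loss, and a hand-picked learning rate, none of which come from the patent:

```python
def train(xs, ys, lr=0.05, steps=200):
    """Fit y = w * x by gradient descent on a mean squared-error loss."""
    w = 0.0  # the single trainable parameter
    for _ in range(steps):
        # Gradient of L = mean((w * x - y)^2) with respect to w
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad  # parameter update that reduces the loss
    return w

# Labeled data generated by y = 2x; the fitted parameter should approach 2.
w_fit = train([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A deep neural network follows the same pattern, with many parameters and with the gradients obtained by back propagation instead of a closed-form expression.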
Image super-resolution reconstruction refers to reconstructing a corresponding high-resolution image from an observed low-resolution image. With the rapid development of deep learning, methods based on deep neural networks currently achieve the best performance in image super-resolution reconstruction.
An image super-resolution reconstruction method based on a deep neural network generally comprises two major modules, as shown in fig. 1: a feature extraction module 21 and a super-resolution image reconstruction module 31, which together produce a reconstructed super-resolution text image 41. During training, an image loss function 51 is calculated between the reconstructed super-resolution text image 41 and the high-resolution image corresponding to the text image 11 to be processed; image training gradients are back-propagated based on the image loss function 51 to update the parameters of the feature extraction module 21 and the super-resolution image reconstruction module 31, so that the feature extraction module 21 learns to extract the image information of the text image 11 to be processed. Existing methods of this kind perform well in natural image reconstruction. When they are applied directly to super-resolution reconstruction of a text image, however, the reconstructed super-resolution text image suffers from blurred character edges and low credibility:
Compared with a natural image, a text image contains a large amount of gradient information. When an existing image super-resolution reconstruction method is applied directly to a text image, this gradient information cannot be fully utilized, so the character edges of the reconstructed super-resolution text image are blurred;
Super-resolution reconstruction is essentially an ill-posed problem: a low-resolution image often corresponds to many high-resolution images. As a result, the text content of the reconstructed super-resolution text image may be changed, which lowers the credibility of the reconstructed image.
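The ill-posedness can be seen in a two-pixel-wide example. The sketch below assumes, purely for illustration, that the low-resolution image is produced by 2x2 average pooling; the patent does not specify a degradation model:

```python
def downsample_2x(img):
    """Average each non-overlapping 2x2 block (height and width must be even)."""
    h, w = len(img), len(img[0])
    return [
        [(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
         for c in range(0, w, 2)]
        for r in range(0, h, 2)
    ]

# Two different high-resolution patches (opposite diagonal strokes)...
hr_a = [[1.0, 0.0], [0.0, 1.0]]
hr_b = [[0.0, 1.0], [1.0, 0.0]]

# ...collapse to exactly the same low-resolution pixel.
lr_a = downsample_2x(hr_a)
lr_b = downsample_2x(hr_b)
```

Both patches yield the same low-resolution value, so the low-resolution image alone cannot tell a reconstruction method which stroke was present.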
In view of this, there is a strong need to design a new text image super-resolution reconstruction method that overcomes at least some of the above drawbacks of existing methods.
Disclosure of Invention
The invention provides a text image super-resolution reconstruction system and method that can reconstruct a reduced-resolution text image into a super-resolution text image and provide clear, credible images for high-level tasks such as text detection and recognition.
In order to solve the technical problems, according to one aspect of the present invention, the following technical scheme is adopted:
a text image super-resolution reconstruction system, the system comprising:
The feature extraction module is used for extracting a set feature layer corresponding to the image to be processed;
the super-resolution image reconstruction module is connected with the feature extraction module and is used for upsampling the feature layer and extracting features of the upsampled feature layer to obtain a reconstructed super-resolution text image;
The character recognition module is connected with the feature extraction module and used for downsampling the feature layer, extracting time sequence features of the downsampled feature layer, and recognizing characters of the extracted time sequence features to obtain character contents in the character image to be processed;
and the super-resolution gradient map reconstruction module is connected with the feature extraction module and is used for upsampling the feature layer and extracting features of the upsampled feature layer to obtain a reconstructed super-resolution gradient map.
As an embodiment of the present invention, the system further comprises:
the image loss function acquisition module is used for calculating an image loss function according to the super-resolution text image acquired by the super-resolution image reconstruction module;
The text loss function acquisition module is used for calculating a text loss function according to the text content acquired by the character recognition module;
the gradient loss function acquisition module is used for calculating a gradient loss function according to the super-resolution gradient map acquired by the super-resolution gradient map reconstruction module;
the loss function fusion module is used for fusing the three loss functions of the image loss function acquired by the image loss function acquisition module, the text loss function acquired by the text loss function acquisition module and the gradient loss function acquired by the gradient loss function acquisition module to acquire a fusion loss function; and training the multi-task text image super-resolution reconstruction network by using the fusion loss function.
As one embodiment of the present invention, the feature extraction module is configured to obtain an advanced feature layer of a text image to be processed, where the advanced feature layer includes deep feature information of the text image to be processed;
The super-resolution image reconstruction module is used for carrying out up-sampling on the advanced feature layer by the deep neural network, and carrying out feature extraction on the feature layer after up-sampling to obtain the features output by the deep neural network of each layer; determining the characteristics output by the last layer of deep neural network as reconstructed super-resolution text images;
The character recognition module is used for downsampling the advanced feature layer with a deep neural network comprising a pooling layer, so that the height of the downsampled feature layer is a set value; feeding the downsampled feature layer into a bidirectional LSTM network to extract the time-sequence features of the text image to be processed; and passing the time-sequence features through a fully connected layer and a softmax function for further feature extraction, the features output by the last layer being determined as the text content of the text image to be processed;
the super-resolution gradient map reconstruction module is used for upsampling the advanced feature layer by the deep neural network, and extracting features of the upsampled feature layer to obtain the output features of the deep neural network of each layer; and determining the characteristics output by the final layer of deep neural network as a reconstructed super-resolution gradient map.
As one embodiment of the present invention, the image loss function acquisition module is configured to back-propagate the calculated image loss function to the feature extraction module through an image training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich image information and the super-resolution text image reconstructed by the super-resolution image reconstruction module is more realistic;
The text loss function acquisition module is configured to back-propagate the calculated text loss function to the feature extraction module through a text training gradient, so that the feature layer extracted by the feature extraction module contains rich text information, the text content of the super-resolution text image reconstructed by the super-resolution image reconstruction module is more accurate, and the credibility of the reconstructed super-resolution text image is improved;
The gradient loss function acquisition module is configured to back-propagate the calculated gradient loss function to the feature extraction module through a gradient-map training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich gradient information, the character edges of the super-resolution text image reconstructed by the super-resolution image reconstruction module are clearer, and the sharpness of the reconstructed super-resolution text image is improved.
As one embodiment of the present invention, the image loss function acquisition module is configured to calculate the image loss function, specifically: calculating the L1 loss between the reconstructed super-resolution text image and the high-resolution text image corresponding to the text image to be processed, so that the pixel values of the reconstructed super-resolution text image approach those of the corresponding high-resolution text image;
The text loss function acquisition module is configured to calculate the text loss function, specifically: calculating the CTC loss between the text content of the text image to be processed, as obtained by the character recognition module, and the corresponding labeled text content, so that the text content recognized by the character recognition module is more accurate;
The gradient loss function acquisition module is configured to calculate the gradient loss function, specifically: computing a gradient map with the Sobel operator from the high-resolution text image corresponding to the text image to be processed, to obtain a target gradient map; and calculating the L1 loss between the target gradient map and the reconstructed super-resolution gradient map, so that the pixel values of the reconstructed super-resolution gradient map approach those of the target gradient map;
And the loss function fusion module performs weighted summation on the three loss functions of the image loss function, the text loss function and the gradient loss function to obtain a fusion loss function.
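The three losses and their weighted summation can be written compactly as follows. The notation is an assumption introduced here for illustration and does not appear in the patent: I_SR and I_HR denote the reconstructed and the reference high-resolution text images, G_SR the reconstructed gradient map, y-hat and y the recognized and the labeled text content, and the lambda terms the unspecified fusion weights.

```latex
\begin{aligned}
L_{\mathrm{img}}   &= \lVert I_{\mathrm{SR}} - I_{\mathrm{HR}} \rVert_1, \\
L_{\mathrm{text}}  &= \mathrm{CTC}(\hat{y},\, y), \\
L_{\mathrm{grad}}  &= \lVert G_{\mathrm{SR}} - \mathrm{Sobel}(I_{\mathrm{HR}}) \rVert_1, \\
L_{\mathrm{fused}} &= \lambda_{\mathrm{img}} L_{\mathrm{img}}
                    + \lambda_{\mathrm{text}} L_{\mathrm{text}}
                    + \lambda_{\mathrm{grad}} L_{\mathrm{grad}}.
\end{aligned}
```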
According to one aspect of the invention, the following technical scheme is adopted: a text image super-resolution reconstruction method, the method comprising:
The feature extraction module extracts a set feature layer corresponding to the image to be processed;
Inputting the feature layer to a super-resolution image reconstruction module, up-sampling the feature layer, and extracting features of the up-sampled feature layer to obtain a reconstructed super-resolution text image;
Inputting the feature layer into a character recognition module, downsampling the feature layer, extracting time sequence features of the downsampled feature layer, and recognizing characters of the extracted time sequence features to obtain character contents in a character image to be processed;
and inputting the feature layer to a super-resolution gradient map reconstruction module, up-sampling the feature layer, and extracting features of the up-sampled feature layer to obtain a reconstructed super-resolution gradient map.
As an embodiment of the present invention, the method further comprises:
the image loss function acquisition module calculates an image loss function according to the super-resolution text image acquired by the super-resolution image reconstruction module;
the text loss function acquisition module calculates a text loss function according to the text content acquired by the text recognition module;
The gradient loss function acquisition module calculates a gradient loss function according to the super-resolution gradient map acquired by the super-resolution gradient map reconstruction module;
The loss function fusion module fuses the three loss functions of the image loss function acquired by the image loss function acquisition module, the text loss function acquired by the text loss function acquisition module and the gradient loss function acquired by the gradient loss function acquisition module to acquire a fusion loss function; and training the multi-task text image super-resolution reconstruction network by using the fusion loss function.
As one embodiment of the present invention, the feature extraction module obtains an advanced feature layer of a text image to be processed, where the advanced feature layer includes deep feature information of the text image to be processed;
The super-resolution image reconstruction module performs up-sampling of the advanced feature layer by using a deep neural network, and performs feature extraction on the feature layer after up-sampling to obtain features output by the deep neural network of each layer; determining the characteristics output by the last layer of deep neural network as reconstructed super-resolution text images;
The character recognition module downsamples the advanced feature layer with a deep neural network comprising a pooling layer, so that the height of the downsampled feature layer is a set value; feeds the downsampled feature layer into a bidirectional LSTM network to extract the time-sequence features of the text image to be processed; and passes the time-sequence features through a fully connected layer and a softmax function for further feature extraction, the features output by the last layer being determined as the text content of the text image to be processed;
the super-resolution gradient map reconstruction module performs up-sampling on the advanced feature layer by using a deep neural network, performs feature extraction on the feature layer after up-sampling, and obtains features output by the deep neural network of each layer; and determining the characteristics output by the final layer of deep neural network as a reconstructed super-resolution gradient map.
As one embodiment of the present invention, the image loss function acquisition module back-propagates the calculated image loss function to the feature extraction module through an image training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich image information and the super-resolution text image reconstructed by the super-resolution image reconstruction module is more realistic;
The text loss function acquisition module back-propagates the calculated text loss function to the feature extraction module through a text training gradient, so that the feature layer extracted by the feature extraction module contains rich text information, the text content of the super-resolution text image reconstructed by the super-resolution image reconstruction module is more accurate, and the credibility of the reconstructed super-resolution text image is improved;
The gradient loss function acquisition module back-propagates the calculated gradient loss function to the feature extraction module through a gradient-map training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich gradient information, the character edges of the super-resolution text image reconstructed by the super-resolution image reconstruction module are clearer, and the sharpness of the reconstructed super-resolution text image is improved.
As one embodiment of the present invention, the image loss function acquisition module calculates the image loss function, specifically: calculating the L1 loss between the reconstructed super-resolution text image and the high-resolution text image corresponding to the text image to be processed, so that the pixel values of the reconstructed super-resolution text image approach those of the corresponding high-resolution text image;
The text loss function acquisition module calculates the text loss function, specifically: calculating the CTC loss between the text content of the text image to be processed, as obtained by the character recognition module, and the corresponding labeled text content, so that the text content recognized by the character recognition module is more accurate;
The gradient loss function acquisition module calculates the gradient loss function, specifically: computing a gradient map with the Sobel operator from the high-resolution text image corresponding to the text image to be processed, to obtain a target gradient map; and calculating the L1 loss between the target gradient map and the reconstructed super-resolution gradient map, so that the pixel values of the reconstructed super-resolution gradient map approach those of the target gradient map;
And the loss function fusion module performs weighted summation on the three loss functions of the image loss function, the text loss function and the gradient loss function to obtain a fusion loss function.
The invention has the following beneficial effects: the multi-task text image super-resolution reconstruction system and method provided by the invention reconstruct a reduced-resolution text image into a super-resolution text image, solve the problems of blurred character edges and low credibility of text content that arise when existing deep-neural-network-based image super-resolution reconstruction methods are applied to text images, and provide clear and credible images for high-level tasks such as semantic analysis of text images.
Compared with the existing image super-resolution reconstruction method based on the deep neural network, the method has the following two advantages:
(1) The reconstructed super-resolution text image has clear text edges:
According to the multi-task text image super-resolution reconstruction method provided by the invention, a super-resolution gradient map reconstruction module is added in parallel with the super-resolution image reconstruction module and a gradient loss function is calculated. When the network parameters are updated, back propagation of the gradient-map training gradient makes the advanced feature layer extracted by the feature extraction module contain rich gradient information, so that the character edges of the super-resolution text image reconstructed by the super-resolution image reconstruction module are clearer.
(2) The reliability of the text content of the reconstructed super-resolution text image is high:
According to the multi-task text image super-resolution reconstruction method provided by the invention, a character recognition module is added in parallel with the super-resolution image reconstruction module and a text loss function is calculated. When the network parameters are updated, back propagation of the text training gradient makes the advanced feature layer extracted by the feature extraction module contain rich text information, so that the text content of the super-resolution text image reconstructed by the super-resolution image reconstruction module is correct and highly credible.
Drawings
Fig. 1 is a schematic diagram of the composition of a conventional text image super-resolution reconstruction system.
Fig. 2 is a schematic diagram of the composition of a text image super-resolution reconstruction system according to an embodiment of the present invention.
Fig. 3 is a schematic diagram of the composition of a text image super-resolution reconstruction system according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the composition of a text image super-resolution reconstruction system according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
For a further understanding of the present invention, preferred embodiments of the invention are described below in conjunction with the examples, but it should be understood that these descriptions are merely intended to illustrate further features and advantages of the invention, and are not limiting of the claims of the invention.
The description of this section is intended to be illustrative of only a few exemplary embodiments and the invention is not to be limited in scope by the description of the embodiments. It is also within the scope of the description and claims of the invention to interchange some of the technical features of the embodiments with other technical features of the same or similar prior art.
The description of the steps in the various embodiments in the specification is merely for convenience of description, and the implementation of the present application is not limited by the order in which the steps are implemented. "connected" in the specification includes both direct and indirect connections.
The invention discloses a text image super-resolution reconstruction system; fig. 2 and fig. 3 are schematic diagrams of the composition of the system in an embodiment of the invention. Referring to fig. 2 and fig. 3, the system includes: a feature extraction module 1, a super-resolution image reconstruction module 2, a character recognition module 3, and a super-resolution gradient map reconstruction module 4. The feature extraction module 1 is connected to the super-resolution image reconstruction module 2, the character recognition module 3, and the super-resolution gradient map reconstruction module 4, respectively.
The feature extraction module 1 is used for extracting a set feature layer corresponding to the image to be processed; the super-resolution image reconstruction module 2 is used for upsampling the feature layer, extracting features of the upsampled feature layer, and obtaining a reconstructed super-resolution text image; the character recognition module 3 is used for downsampling the feature layer, extracting time sequence features of the downsampled feature layer, and recognizing characters of the extracted time sequence features to obtain character contents in the character image to be processed; the super-resolution gradient map reconstruction module 4 is configured to upsample the feature layer, perform feature extraction on the upsampled feature layer, and obtain a reconstructed super-resolution gradient map.
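The data flow among the four modules can be sketched as a shared feature extractor feeding three parallel branches. This is only an illustration of the connection pattern: the function name `multitask_forward` and the stand-in callables are hypothetical and do not reproduce the networks of the patent.

```python
def multitask_forward(image, extract, sr_head, rec_head, grad_head):
    """Run the shared feature extractor once, then the three parallel branches."""
    feature = extract(image)  # shared feature layer, computed a single time
    return {
        "sr_image": sr_head(feature),           # super-resolution image branch
        "text": rec_head(feature),              # character recognition branch
        "sr_gradient_map": grad_head(feature),  # super-resolution gradient map branch
    }

# Stand-in callables, used only to make the data flow visible.
calls = []

def toy_extract(image):
    calls.append("extract")
    return [v * 2 for v in image]

outputs = multitask_forward(
    [1, 2, 3],
    extract=toy_extract,
    sr_head=lambda f: [v for v in f for _ in range(2)],        # repeat each value twice
    rec_head=lambda f: max(range(len(f)), key=f.__getitem__),  # index of the largest value
    grad_head=lambda f: [b - a for a, b in zip(f, f[1:])],     # neighbouring differences
)
```

The feature layer is computed once and reused by all three branches, which is what later allows each branch's loss to influence the same feature extraction module.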
In an embodiment of the present invention, the feature extraction module is configured to obtain an advanced feature layer of the text image to be processed, where the advanced feature layer includes deep feature information of the text image to be processed. In one embodiment, the text image to be processed may be input to the feature extraction module of the ESRGAN generator network to obtain the advanced feature layer.
FIG. 4 is a schematic diagram of the composition of a text image super-resolution reconstruction system according to an embodiment of the present invention. Referring to fig. 4, in an embodiment of the present invention, the super-resolution image reconstruction module 2 is configured to upsample the advanced feature layer with a deep neural network and perform feature extraction on the upsampled feature layer to obtain the features output by each layer of the deep neural network; the features output by the last layer of the deep neural network are determined as the reconstructed super-resolution text image.
The character recognition module 3 is configured to downsample the advanced feature layer with a deep neural network including a pooling layer, so that the height of the downsampled feature layer is 1; the downsampled feature layer is fed into a bidirectional LSTM network to extract the time-sequence features of the text image to be processed; the time-sequence features are then passed through a fully connected layer and a softmax function for further feature extraction, and the features output by the last layer are determined as the text content of the text image to be processed.
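The step of reducing the feature height to 1 and reading the result as a left-to-right sequence can be sketched as follows. This is a simplified illustration under several assumptions: average pooling over the height axis is used, the bidirectional LSTM and the fully connected layer are omitted so that the channels are treated directly as class scores, and greedy decoding with class 0 as the blank symbol is used to turn the softmax outputs into text. None of these choices is specified in the patent.

```python
import math

def pool_height_to_one(feature):
    """feature[c][h][w] -> sequence[w][c], averaging over the height axis."""
    channels, height, width = len(feature), len(feature[0]), len(feature[0][0])
    return [
        [sum(feature[c][h][w] for h in range(height)) / height for c in range(channels)]
        for w in range(width)
    ]

def softmax(scores):
    """Numerically stable softmax over one time step."""
    m = max(scores)
    exps = [math.exp(v - m) for v in scores]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_decode(probs, blank=0):
    """Pick the best class per step, merge repeats, then drop blanks."""
    decoded, previous = [], None
    for step in probs:
        best = max(range(len(step)), key=step.__getitem__)
        if best != previous and best != blank:
            decoded.append(best)
        previous = best
    return decoded

# Toy feature layer: 3 channels (blank, class 1, class 2), height 2, width 5.
feature = [
    [[0, 0, 4, 0, 0], [0, 0, 4, 0, 0]],  # channel 0: blank
    [[3, 3, 0, 3, 0], [3, 3, 0, 3, 0]],  # channel 1
    [[0, 0, 0, 0, 5], [0, 0, 0, 0, 5]],  # channel 2
]
sequence = pool_height_to_one(feature)  # 5 time steps, 3 scores each
probabilities = [softmax(step) for step in sequence]
decoded = greedy_decode(probabilities)
```

The per-step best classes are 1, 1, blank, 1, 2, so the repeated 1 is merged and the blank separates it from the next 1, giving the label sequence 1, 1, 2.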
The super-resolution gradient map reconstruction module 4 is used for upsampling the advanced feature layer by a deep neural network, and extracting features of the upsampled feature layer to obtain the output features of the deep neural network of each layer; and determining the characteristics output by the final layer of deep neural network as a reconstructed super-resolution gradient map.
As shown in fig. 4, in an embodiment of the present invention, the system further includes an image loss function obtaining module 5, a text loss function obtaining module 6, a gradient loss function obtaining module 7, and a loss function fusion module 8.
The image loss function obtaining module 5 is used for calculating an image loss function according to the super-resolution text image obtained by the super-resolution image reconstruction module; the text loss function obtaining module 6 is used for calculating a text loss function according to the text content obtained by the text recognition module; the gradient loss function obtaining module 7 is configured to calculate a gradient loss function according to the super-resolution gradient map obtained by the super-resolution gradient map reconstruction module. The loss function fusion module 8 is configured to fuse the three loss functions of the image loss function acquired by the image loss function acquisition module 5, the text loss function acquired by the text loss function acquisition module 6, and the gradient loss function acquired by the gradient loss function acquisition module 7, so as to acquire a fusion loss function; and training the multi-task text image super-resolution reconstruction network by using the fusion loss function.
In one embodiment of the present invention, the image loss function acquisition module 5 is configured to calculate the image loss function, specifically: calculating the L1 loss between the reconstructed super-resolution text image and the high-resolution text image corresponding to the text image to be processed, so that the pixel values of the reconstructed super-resolution text image approach those of the corresponding high-resolution text image.
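A minimal sketch of such an L1 image loss is given below, assuming single-channel images stored as nested lists and a mean reduction over pixels; the patent does not state whether the absolute differences are averaged or summed.

```python
def l1_loss(predicted, target):
    """Mean absolute difference between two equally sized 2-D images."""
    diffs = [
        abs(p - t)
        for pred_row, target_row in zip(predicted, target)
        for p, t in zip(pred_row, target_row)
    ]
    return sum(diffs) / len(diffs)

sr_image = [[0.0, 0.5], [1.0, 1.0]]  # reconstructed super-resolution image (toy values)
hr_image = [[0.0, 1.0], [1.0, 0.0]]  # reference high-resolution image (toy values)
image_loss = l1_loss(sr_image, hr_image)
```

The loss is zero only when every reconstructed pixel equals the reference pixel, which is why minimizing it pulls the reconstruction toward the high-resolution image.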
The text loss function acquisition module 6 is configured to calculate the text loss function, specifically: calculating the CTC loss between the text content of the text image to be processed, as obtained by the character recognition module, and the corresponding labeled text content, so that the text content recognized by the character recognition module is more accurate.
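For reference, the CTC loss can be computed with the standard forward recursion over the label sequence interleaved with blanks. The sketch below is a plain, unoptimized illustration that works in probability space rather than log space and assumes class 0 is the blank symbol; it is not the patent's implementation.

```python
import math

def ctc_loss(probs, target, blank=0):
    """Negative log-likelihood of `target` given per-step class probabilities.

    probs[t][k] is the softmax probability of class k at time step t.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank]
    extended = [blank]
    for label in target:
        extended += [label, blank]
    size = len(extended)

    # alpha[s]: probability of all alignments that end at extended[s] at the current step
    alpha = [0.0] * size
    alpha[0] = probs[0][extended[0]]
    if size > 1:
        alpha[1] = probs[0][extended[1]]

    for t in range(1, len(probs)):
        new_alpha = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # A blank may be skipped only between two different labels
            if s >= 2 and extended[s] != blank and extended[s] != extended[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][extended[s]]
        alpha = new_alpha

    likelihood = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    return -math.log(likelihood) if likelihood > 0.0 else math.inf

# Two time steps, two classes (blank and label 1), uniform probabilities.
uniform = [[0.5, 0.5], [0.5, 0.5]]
text_loss = ctc_loss(uniform, [1])
```

In this example three of the four possible alignments (1-1, 1-blank, blank-1) collapse to the label sequence [1], so the likelihood is 0.75 and the loss is -ln(0.75).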
The gradient loss function acquisition module 7 is configured to calculate the gradient loss function, specifically: computing a gradient map with the Sobel operator from the high-resolution text image corresponding to the text image to be processed, to obtain a target gradient map; and calculating the L1 loss between the target gradient map and the reconstructed super-resolution gradient map, so that the pixel values of the reconstructed super-resolution gradient map approach those of the target gradient map.
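The target gradient map and the gradient loss can be sketched as follows. The sketch assumes a single-channel image, combines the horizontal and vertical Sobel responses into a gradient magnitude, leaves border pixels at zero, and averages the absolute differences; the patent names only the Sobel operator and the L1 loss, so these details are illustrative choices.

```python
def sobel_gradient_map(img):
    """Gradient magnitude of a 2-D grayscale image; border pixels stay at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            # Horizontal Sobel response: right column minus left column
            gx = (img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r][c - 1] - img[r + 1][c - 1])
            # Vertical Sobel response: bottom row minus top row
            gy = (img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r - 1][c] - img[r - 1][c + 1])
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out

def mean_abs_diff(a, b):
    """L1 loss with mean reduction between two equally sized maps."""
    diffs = [abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

# High-resolution toy image with one vertical edge between columns 1 and 2.
hr_image = [[0.0, 0.0, 1.0, 1.0] for _ in range(3)]
target_map = sobel_gradient_map(hr_image)

# Hypothetical output of the gradient map branch, slightly too weak at one pixel.
predicted_map = [[0.0] * 4, [0.0, 3.0, 4.0, 0.0], [0.0] * 4]
gradient_loss = mean_abs_diff(target_map, predicted_map)
```

The target map is non-zero only along the stroke edge, so the loss concentrates the training signal on exactly the pixels that determine edge sharpness.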
The loss function fusion module 8 performs weighted summation on the three loss functions of the image loss function, the text loss function and the gradient loss function to obtain a fusion loss function.
With continued reference to fig. 3 and fig. 4, in an embodiment of the present invention, the image loss function acquisition module 5 is configured to back-propagate the calculated image loss function to the feature extraction module through an image training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich image information and the super-resolution text image reconstructed by the super-resolution image reconstruction module is more realistic.
The text loss function acquisition module 6 is configured to back-propagate the calculated text loss function to the feature extraction module through a text training gradient, so that the feature layer extracted by the feature extraction module contains rich text information, which helps make the text content of the super-resolution text image reconstructed by the super-resolution image reconstruction module more accurate and improves the credibility of the reconstructed super-resolution text image.
The gradient loss function acquisition module 7 is configured to back-propagate the calculated gradient loss function to the feature extraction module through a gradient-map training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich gradient information, which helps make the character edges of the super-resolution text image reconstructed by the super-resolution image reconstruction module clearer and improves the sharpness of the reconstructed super-resolution text image.
The loss function fusion module 8 is further configured to back-propagate the fusion loss function to the image loss function acquisition module 5, the text loss function acquisition module 6, and the gradient loss function acquisition module 7.
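Read together with the weighted summation performed by the loss function fusion module, this back-propagation can be written out as follows. The weights \lambda and the parameter symbols \theta are notation introduced here for illustration; the patent does not name them or give their values. The sketch assumes that each loss depends only on its own branch and on the shared feature extraction module.

```latex
\begin{aligned}
L_{\mathrm{fuse}} &= \lambda_{\mathrm{img}} L_{\mathrm{img}}
                   + \lambda_{\mathrm{txt}} L_{\mathrm{txt}}
                   + \lambda_{\mathrm{grad}} L_{\mathrm{grad}} \\[4pt]
\frac{\partial L_{\mathrm{fuse}}}{\partial \theta_{\mathrm{feat}}}
  &= \lambda_{\mathrm{img}} \frac{\partial L_{\mathrm{img}}}{\partial \theta_{\mathrm{feat}}}
   + \lambda_{\mathrm{txt}} \frac{\partial L_{\mathrm{txt}}}{\partial \theta_{\mathrm{feat}}}
   + \lambda_{\mathrm{grad}} \frac{\partial L_{\mathrm{grad}}}{\partial \theta_{\mathrm{feat}}} \\[4pt]
\frac{\partial L_{\mathrm{fuse}}}{\partial \theta_{k}}
  &= \lambda_{k} \frac{\partial L_{k}}{\partial \theta_{k}},
  \qquad k \in \{\mathrm{img}, \mathrm{txt}, \mathrm{grad}\}
\end{aligned}
```

Here \theta_{\mathrm{feat}} stands for the parameters of the shared feature extraction module and \theta_{k} for the parameters of branch k. The second line is why the shared features come to carry image, text and gradient information at once: the feature extraction module receives the sum of all three training gradients, while each branch receives only its own.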
The invention also discloses a multi-task text image super-resolution reconstruction method. Referring to fig. 4, the method comprises the following steps:
The feature extraction module extracts a set feature layer corresponding to the image to be processed;
Inputting the feature layer to a super-resolution image reconstruction module, up-sampling the feature layer, and extracting features of the up-sampled feature layer to obtain a reconstructed super-resolution text image;
Inputting the feature layer to a text recognition module, downsampling the feature layer, extracting time-sequence features from the downsampled feature layer, and performing text recognition on the extracted time-sequence features to obtain the text content of the text image to be processed;
and inputting the feature layer to a super-resolution gradient map reconstruction module, up-sampling the feature layer, and extracting features of the up-sampled feature layer to obtain a reconstructed super-resolution gradient map.
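The four steps above share one feature layer across three branches. A minimal dataflow sketch follows; `forward`, `Outputs` and the four callables passed in are hypothetical names standing in for the modules of the method, not an interface defined by the patent.

```python
from typing import Any, Callable, NamedTuple


class Outputs(NamedTuple):
    sr_image: Any          # reconstructed super-resolution text image
    text: Any              # recognized text content
    sr_gradient_map: Any   # reconstructed super-resolution gradient map


def forward(image: Any,
            extract: Callable[[Any], Any],
            image_head: Callable[[Any], Any],
            text_head: Callable[[Any], Any],
            gradient_head: Callable[[Any], Any]) -> Outputs:
    # The feature layer is computed once and reused by all three branches.
    features = extract(image)
    return Outputs(
        sr_image=image_head(features),
        text=text_head(features),
        sr_gradient_map=gradient_head(features),
    )
```

Computing the features once is what lets the recognition and gradient branches influence the image branch during training: all three read from, and send gradients back into, the same feature extraction module.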
In an embodiment of the present invention, the feature extraction module obtains an advanced feature layer of the text image to be processed, where the advanced feature layer includes deep feature information of the text image to be processed.
In an embodiment of the present invention, the super-resolution image reconstruction module uses a deep neural network to upsample the feature layer and to extract features from the upsampled feature layer, obtaining the features output by each layer of the deep neural network; the features output by the last layer of the deep neural network are taken as the reconstructed super-resolution text image.
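The patent does not say how the deep neural network upsamples the feature layer. One common choice in super-resolution networks is sub-pixel rearrangement (often called pixel shuffle), in which r*r low-resolution channels are interleaved into one channel that is r times larger in each dimension. The sketch below shows only that rearrangement, as an assumed example; the function name and the list-based data layout are hypothetical.

```python
def pixel_shuffle(channels, r):
    """Interleave r*r maps of size H x W into one map of size rH x rW.

    channels[i * r + j][y][x] is written to out[y * r + i][x * r + j].
    """
    if len(channels) != r * r:
        raise ValueError("expected r*r input channels")
    h, w = len(channels[0]), len(channels[0][0])
    out = [[0.0] * (w * r) for _ in range(h * r)]
    for i in range(r):
        for j in range(r):
            src = channels[i * r + j]
            for y in range(h):
                for x in range(w):
                    out[y * r + i][x * r + j] = src[y][x]
    return out
```

In a network, convolution layers before this step would produce the r*r channels, and further convolution layers after it would perform the feature extraction on the upsampled feature layer that the embodiment describes.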
The text recognition module downsamples the advanced feature layer with a deep neural network that includes a pooling layer, so that the height of the downsampled feature layer equals a set value; the downsampled feature layer is fed into a bidirectional LSTM network to extract the time-sequence features of the text image to be processed; the time-sequence features are then passed through a fully connected layer and a softmax function, and the output of the last layer is taken as the text content of the text image to be processed.
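Two ends of this branch can be illustrated without a neural network library: collapsing the feature map to a fixed height so that each column becomes one time step, and turning per-step softmax outputs into a string. The sketch assumes a set height of 1, average pooling, and greedy CTC decoding with class 0 as the blank; none of these details is stated in the patent, the function names are hypothetical, and the bidirectional LSTM and fully connected layer in between are omitted.

```python
import math


def collapse_height(fmap):
    """fmap[c][y][x] -> seq[x][c], averaging over y (resulting height 1)."""
    channels, height, width = len(fmap), len(fmap[0]), len(fmap[0][0])
    return [[sum(fmap[c][y][x] for y in range(height)) / height
             for c in range(channels)]
            for x in range(width)]


def softmax(scores):
    """Numerically stable softmax over one time step."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_ctc_decode(step_scores, alphabet, blank=0):
    """Pick the most probable class per step, merge repeats, drop blanks.

    alphabet[k] is the character for class k; alphabet[blank] is unused.
    """
    decoded = []
    prev = blank
    for scores in step_scores:
        probs = softmax(scores)
        k = max(range(len(probs)), key=probs.__getitem__)
        if k != blank and k != prev:
            decoded.append(alphabet[k])
        prev = k
    return "".join(decoded)
```

The blank class is what allows a doubled letter to be recognized: two identical labels separated by a blank are kept as two characters, whereas adjacent repeats are merged into one.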
The super-resolution gradient map reconstruction module uses a deep neural network to upsample the advanced feature layer and to extract features from the upsampled feature layer, obtaining the features output by each layer of the deep neural network; the features output by the last layer of the deep neural network are taken as the reconstructed super-resolution gradient map.
In an embodiment of the invention, the method further comprises a training process.
With continued reference to fig. 4, in one embodiment of the present invention, the method further includes:
the image loss function acquisition module calculates an image loss function according to the super-resolution text image acquired by the super-resolution image reconstruction module;
the text loss function acquisition module calculates a text loss function according to the text content acquired by the text recognition module;
The gradient loss function acquisition module calculates a gradient loss function according to the super-resolution gradient map acquired by the super-resolution gradient map reconstruction module;
The loss function fusion module fuses three loss functions, namely the image loss function acquired by the image loss function acquisition module, the text loss function acquired by the text loss function acquisition module and the gradient loss function acquired by the gradient loss function acquisition module, to obtain a fusion loss function; the multi-task text image super-resolution reconstruction network is then trained with the fusion loss function.
In an embodiment of the present invention, the image loss function acquisition module calculates the image loss function as follows: it calculates the L1 loss between the reconstructed super-resolution text image and the high-resolution text image corresponding to the text image to be processed, so that the pixel values of the reconstructed super-resolution text image approach those of the corresponding high-resolution text image.
The text loss function acquisition module calculates the text loss function as follows: it calculates the CTC loss between the text content that the text recognition module recognizes in the text image to be processed and the corresponding annotated text content, so that the text content recognized by the text recognition module becomes more accurate.
The gradient loss function acquisition module calculates the gradient loss function as follows: it applies a Sobel operator to the high-resolution text image corresponding to the text image to be processed to obtain a target gradient map, and then calculates the L1 loss between the target gradient map and the reconstructed super-resolution gradient map, so that the pixel values of the reconstructed super-resolution gradient map approach those of the target gradient map.
The loss function fusion module computes a weighted sum of the image loss function, the text loss function and the gradient loss function to obtain the fusion loss function.
With continued reference to fig. 4, in one embodiment of the present invention, the method further includes:
The image loss function acquisition module back-propagates the calculated image loss function to the feature extraction module as an image training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich image information and the super-resolution text image reconstructed by the super-resolution image reconstruction module is more realistic;
The text loss function acquisition module back-propagates the calculated text loss function to the feature extraction module as a text training gradient, so that the feature layer extracted by the feature extraction module contains rich text information, the text content of the super-resolution text image reconstructed by the super-resolution image reconstruction module is more accurate, and the credibility of the reconstructed super-resolution text image is improved;
The gradient loss function acquisition module back-propagates the calculated gradient loss function to the feature extraction module as a gradient-map training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich gradient information, the text edges of the super-resolution text image reconstructed by the super-resolution image reconstruction module are sharper, and the clarity of the reconstructed super-resolution text image is improved;
the loss function fusion module reversely propagates the fusion loss function to the image loss function acquisition module, the text loss function acquisition module and the gradient loss function acquisition module.
In summary, the multi-task text image super-resolution reconstruction system and method provided by the invention reconstruct a low-resolution text image into a super-resolution text image. They address the blurred text edges and low credibility of text content that arise when conventional deep-neural-network-based image super-resolution methods are applied to text image reconstruction, and thereby provide clear and credible images for high-level tasks such as semantic analysis of text images.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware; for example, an Application Specific Integrated Circuit (ASIC), a general purpose computer, or any other similar hardware device may be employed. In some embodiments, the software program of the present application may be executed by a processor to implement the above steps or functions. Likewise, the software program of the present application (including the related data structures) may be stored in a computer-readable recording medium; such as RAM memory, magnetic or optical drives or diskettes, and the like. In addition, some steps or functions of the present application may be implemented in hardware; for example, as circuitry that cooperates with the processor to perform various steps or functions.
The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination involves no contradiction, it should be considered within the scope of this description.
The description and applications of the present invention herein are illustrative and are not intended to limit the scope of the invention to the embodiments described above. Effects or advantages referred to in the embodiments may not be embodied in the embodiments due to interference of various factors, and description of the effects or advantages is not intended to limit the embodiments. Variations and modifications of the embodiments disclosed herein are possible, and alternatives and equivalents of the various components of the embodiments are known to those of ordinary skill in the art. It will be clear to those skilled in the art that the present invention may be embodied in other forms, structures, arrangements, proportions, and with other assemblies, materials, and components, without departing from the spirit or essential characteristics thereof. Other variations and modifications of the embodiments disclosed herein may be made without departing from the scope and spirit of the invention.

Claims (2)

1. A text image super-resolution reconstruction system, the system comprising:
The feature extraction module is used for extracting a set feature layer corresponding to the image to be processed;
the super-resolution image reconstruction module is connected with the feature extraction module and is used for upsampling the feature layer and extracting features of the upsampled feature layer to obtain a reconstructed super-resolution text image;
The text recognition module is connected with the feature extraction module and is used for downsampling the feature layer, extracting time-sequence features from the downsampled feature layer, and performing text recognition on the extracted time-sequence features to obtain the text content of the text image to be processed;
The super-resolution gradient map reconstruction module is connected with the feature extraction module and is used for upsampling the feature layer and extracting features of the upsampled feature layer to obtain a reconstructed super-resolution gradient map;
the image loss function acquisition module is used for calculating an image loss function according to the super-resolution text image acquired by the super-resolution image reconstruction module;
The text loss function acquisition module is used for calculating a text loss function according to the text content acquired by the text recognition module;
the gradient loss function acquisition module is used for calculating a gradient loss function according to the super-resolution gradient map acquired by the super-resolution gradient map reconstruction module;
the loss function fusion module is used for fusing the three loss functions of the image loss function acquired by the image loss function acquisition module, the text loss function acquired by the text loss function acquisition module and the gradient loss function acquired by the gradient loss function acquisition module to acquire a fusion loss function; training a multi-task text image super-resolution reconstruction network by utilizing the fusion loss function;
the feature extraction module is used for obtaining an advanced feature layer of the text image to be processed, wherein the advanced feature layer comprises deep feature information of the text image to be processed;
The super-resolution image reconstruction module is used for upsampling the advanced feature layer with a deep neural network and extracting features from the upsampled feature layer to obtain the features output by each layer of the deep neural network; the features output by the last layer of the deep neural network are determined as the reconstructed super-resolution text image;
The text recognition module is used for downsampling the advanced feature layer with a deep neural network comprising a pooling layer, so that the height of the downsampled feature layer equals a set value; feeding the downsampled feature layer into a bidirectional LSTM network to extract the time-sequence features of the text image to be processed; and passing the time-sequence features through a fully connected layer and a softmax function, the output of the last layer being determined as the text content of the text image to be processed;
The super-resolution gradient map reconstruction module is used for upsampling the advanced feature layer with a deep neural network and extracting features from the upsampled feature layer to obtain the features output by each layer of the deep neural network; the features output by the last layer of the deep neural network are determined as the reconstructed super-resolution gradient map;
The image loss function acquisition module is used for back-propagating the calculated image loss function to the feature extraction module as an image training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich image information and the super-resolution text image reconstructed by the super-resolution image reconstruction module is more realistic;
The text loss function acquisition module is used for back-propagating the calculated text loss function to the feature extraction module as a text training gradient, so that the feature layer extracted by the feature extraction module contains rich text information, the text content of the super-resolution text image reconstructed by the super-resolution image reconstruction module is more accurate, and the credibility of the reconstructed super-resolution text image is improved;
The gradient loss function acquisition module is used for back-propagating the calculated gradient loss function to the feature extraction module as a gradient-map training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich gradient information, the text edges of the super-resolution text image reconstructed by the super-resolution image reconstruction module are sharper, and the clarity of the reconstructed super-resolution text image is improved;
The image loss function acquisition module is used for calculating the image loss function, specifically: calculating the L1 loss between the reconstructed super-resolution text image and the high-resolution text image corresponding to the text image to be processed, so that the pixel values of the reconstructed super-resolution text image approach those of the corresponding high-resolution text image;
The text loss function acquisition module is used for calculating the text loss function, specifically: calculating the CTC loss between the text content that the text recognition module recognizes in the text image to be processed and the corresponding annotated text content, so that the text content recognized by the text recognition module becomes more accurate;
The gradient loss function acquisition module is used for calculating the gradient loss function, specifically: applying a Sobel operator to the high-resolution text image corresponding to the text image to be processed to obtain a target gradient map; and calculating the L1 loss between the target gradient map and the reconstructed super-resolution gradient map, so that the pixel values of the reconstructed super-resolution gradient map approach those of the target gradient map;
And the loss function fusion module computes a weighted sum of the image loss function, the text loss function and the gradient loss function to obtain the fusion loss function.
2. A text image super-resolution reconstruction method, characterized by comprising the following steps:
The feature extraction module extracts a set feature layer corresponding to the image to be processed;
Inputting the feature layer to a super-resolution image reconstruction module, up-sampling the feature layer, and extracting features of the up-sampled feature layer to obtain a reconstructed super-resolution text image;
Inputting the feature layer to a text recognition module, downsampling the feature layer, extracting time-sequence features from the downsampled feature layer, and performing text recognition on the extracted time-sequence features to obtain the text content of the text image to be processed;
Inputting the feature layer to a super-resolution gradient map reconstruction module, up-sampling the feature layer, and extracting features of the up-sampled feature layer to obtain a reconstructed super-resolution gradient map;
the image loss function acquisition module calculates an image loss function according to the super-resolution text image acquired by the super-resolution image reconstruction module;
the text loss function acquisition module calculates a text loss function according to the text content acquired by the text recognition module;
The gradient loss function acquisition module calculates a gradient loss function according to the super-resolution gradient map acquired by the super-resolution gradient map reconstruction module;
The loss function fusion module fuses the three loss functions of the image loss function acquired by the image loss function acquisition module, the text loss function acquired by the text loss function acquisition module and the gradient loss function acquired by the gradient loss function acquisition module to acquire a fusion loss function; training a multi-task text image super-resolution reconstruction network by utilizing the fusion loss function;
the feature extraction module acquires an advanced feature layer of the text image to be processed, wherein the advanced feature layer comprises deep feature information of the text image to be processed;
The super-resolution image reconstruction module upsamples the advanced feature layer with a deep neural network and extracts features from the upsampled feature layer to obtain the features output by each layer of the deep neural network; the features output by the last layer of the deep neural network are determined as the reconstructed super-resolution text image;
The text recognition module downsamples the advanced feature layer with a deep neural network comprising a pooling layer, so that the height of the downsampled feature layer equals a set value; the downsampled feature layer is fed into a bidirectional LSTM network to extract the time-sequence features of the text image to be processed; the time-sequence features are passed through a fully connected layer and a softmax function, and the output of the last layer is determined as the text content of the text image to be processed;
The super-resolution gradient map reconstruction module upsamples the advanced feature layer with a deep neural network and extracts features from the upsampled feature layer to obtain the features output by each layer of the deep neural network; the features output by the last layer of the deep neural network are determined as the reconstructed super-resolution gradient map;
the image loss function acquisition module back-propagates the calculated image loss function to the feature extraction module as an image training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich image information and the super-resolution text image reconstructed by the super-resolution image reconstruction module is more realistic;
The text loss function acquisition module back-propagates the calculated text loss function to the feature extraction module as a text training gradient, so that the feature layer extracted by the feature extraction module contains rich text information, the text content of the super-resolution text image reconstructed by the super-resolution image reconstruction module is more accurate, and the credibility of the reconstructed super-resolution text image is improved;
the gradient loss function acquisition module back-propagates the calculated gradient loss function to the feature extraction module as a gradient-map training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich gradient information, the text edges of the super-resolution text image reconstructed by the super-resolution image reconstruction module are sharper, and the clarity of the reconstructed super-resolution text image is improved;
The image loss function acquisition module calculates the image loss function, specifically: calculating the L1 loss between the reconstructed super-resolution text image and the high-resolution text image corresponding to the text image to be processed, so that the pixel values of the reconstructed super-resolution text image approach those of the corresponding high-resolution text image;
the text loss function acquisition module calculates the text loss function, specifically: calculating the CTC loss between the text content that the text recognition module recognizes in the text image to be processed and the corresponding annotated text content, so that the text content recognized by the text recognition module becomes more accurate;
the gradient loss function acquisition module calculates the gradient loss function, specifically: applying a Sobel operator to the high-resolution text image corresponding to the text image to be processed to obtain a target gradient map; and calculating the L1 loss between the target gradient map and the reconstructed super-resolution gradient map, so that the pixel values of the reconstructed super-resolution gradient map approach those of the target gradient map;
And the loss function fusion module computes a weighted sum of the image loss function, the text loss function and the gradient loss function to obtain the fusion loss function.
CN202011417305.4A 2020-12-07 Text image super-resolution reconstruction system and method Active CN112419159B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011417305.4A CN112419159B (en) 2020-12-07 Text image super-resolution reconstruction system and method

Publications (2)

Publication Number Publication Date
CN112419159A CN112419159A (en) 2021-02-26
CN112419159B CN112419159B (en) 2024-06-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant