CN112419159B

CN112419159B - Text image super-resolution reconstruction system and method

Info

Publication number: CN112419159B
Application number: CN202011417305.4A
Authority: CN
Inventors: 张晓东; 张月
Original assignee: Shanghai Internet Software Group Co ltd
Current assignee: Shanghai Internet Software Group Co ltd
Filing date: 2020-12-07
Publication date: 2024-06-04
Anticipated expiration: 2040-12-07

Abstract

The invention discloses a text image super-resolution reconstruction system and method. The method comprises the following steps: a feature extraction module extracts a set feature layer corresponding to the text image to be processed; the feature layer is input to a super-resolution image reconstruction module, which upsamples the feature layer and performs feature extraction on the upsampled feature layer to obtain a reconstructed super-resolution text image; the feature layer is input to a character recognition module, which downsamples the feature layer, extracts time-sequence features from the downsampled feature layer, and performs character recognition on the extracted time-sequence features to obtain the text content of the text image to be processed; and the feature layer is input to a super-resolution gradient map reconstruction module, which upsamples the feature layer and performs feature extraction on the upsampled feature layer to obtain a reconstructed super-resolution gradient map. The multi-task text image super-resolution reconstruction system and method can improve the clarity and the credibility of the reconstructed text image.

Description

Text image super-resolution reconstruction system and method

Technical Field

The invention belongs to the technical field of image processing, relates to an image processing system, and particularly relates to a text image super-resolution reconstruction system and method.

Background

A deep neural network is a complex mathematical model. Input data pass through the deep neural network to produce corresponding output data, and a loss function is constructed from the difference between the output data and the labeled data. The gradients of the loss function with respect to the parameters of the deep neural network are computed, and the parameters are updated through gradient back propagation; by continuously updating the parameters, the difference between the output data and the labeled data is continuously reduced. The input data and the labeled data together form the training data required for training the deep neural network, and the performance of the deep neural network depends on both the network structure and the training data. Deep neural networks outperform traditional methods in fields such as image, speech and natural language processing, and are widely applied.
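The training cycle described above (forward pass, loss, gradient, parameter update) can be illustrated with a deliberately tiny model. The following is a minimal sketch, assuming a one-parameter model y = w * x and a mean squared error loss; the data, the learning rate and all function names are hypothetical and chosen only for illustration.

```python
# Toy training data: inputs and labeled data generated with w = 2 (hypothetical).
inputs = [1.0, 2.0, 3.0, 4.0]
labels = [2.0, 4.0, 6.0, 8.0]

def loss_and_gradient(w):
    """Mean squared error of y = w * x and its derivative with respect to w."""
    n = len(inputs)
    loss = sum((w * x - y) ** 2 for x, y in zip(inputs, labels)) / n
    grad = sum(2 * (w * x - y) * x for x, y in zip(inputs, labels)) / n
    return loss, grad

def train(w=0.0, learning_rate=0.01, steps=200):
    """Repeatedly move the parameter against the gradient of the loss."""
    history = []
    for _ in range(steps):
        loss, grad = loss_and_gradient(w)
        history.append(loss)
        w -= learning_rate * grad
    return w, history

w, history = train()
```

Each update shrinks the gap between output and label, so the recorded loss falls as the parameter approaches the value that generated the labels.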

Image super-resolution reconstruction refers to reconstructing a corresponding high-resolution image from an observed low-resolution image. With the rapid development of deep learning technology, image super-resolution reconstruction methods based on deep neural networks are currently the best-performing image super-resolution reconstruction methods.

An image super-resolution reconstruction method based on a deep neural network generally comprises two major modules, a feature extraction module 21 and a super-resolution image reconstruction module 31, as shown in fig. 1. The text image 11 to be processed passes through the feature extraction module 21 and the super-resolution image reconstruction module 31 to obtain a reconstructed super-resolution text image 41. During training, an image loss function 51 is calculated between the reconstructed super-resolution text image 41 and the high-resolution image corresponding to the text image 11 to be processed; image training gradient back propagation is performed based on the image loss function 51 to update the parameters of the feature extraction module 21 and the super-resolution image reconstruction module 31, so that the feature extraction module 21 can extract the image information of the text image 11 to be processed. Existing image super-resolution reconstruction methods based on deep neural networks achieve good performance in natural image reconstruction. However, when an existing method is directly used for super-resolution reconstruction of a text image, the reconstructed super-resolution text image suffers from blurred text edges and low credibility:

Compared with a natural image, a text image contains a large amount of gradient information. When an existing image super-resolution reconstruction method is directly used for super-resolution reconstruction of a text image, the gradient information in the text image cannot be fully utilized, so that the reconstructed super-resolution text image has blurred text edges;

Super-resolution reconstruction is essentially an ill-posed problem, that is, one low-resolution image often corresponds to many high-resolution images. This may cause the text content in the reconstructed super-resolution text image to change, resulting in low credibility of the reconstructed super-resolution text image.
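The ill-posedness can be made concrete with a small example in which two different high-resolution patches collapse to the same low-resolution pixel. This is only an illustrative sketch: the degradation model (averaging each 2x2 block) and the pixel values are assumptions, not something specified in this document.

```python
def downsample_2x(image):
    """Average each non-overlapping 2x2 block of a 2-D list of pixel values."""
    height, width = len(image), len(image[0])
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]

# Two different high-resolution patches (hypothetical stroke fragments).
high_res_a = [[0, 255], [255, 0]]
high_res_b = [[255, 0], [0, 255]]

low_res_a = downsample_2x(high_res_a)
low_res_b = downsample_2x(high_res_b)
```

Both patches map to the same low-resolution value, so the low-resolution observation alone cannot tell a reconstruction network which stroke shape was originally present.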

In view of this, there is a strong need to design a new text image super-resolution reconstruction method in order to overcome at least some of the above-mentioned drawbacks of the existing methods.

Disclosure of Invention

The invention provides a text image super-resolution reconstruction system and method, which can reconstruct a low-resolution text image into a super-resolution text image and provide clear and credible images for high-level tasks such as text detection and recognition.

In order to solve the technical problems, according to one aspect of the present invention, the following technical scheme is adopted:

a text image super-resolution reconstruction system, the system comprising:

The feature extraction module is used for extracting a set feature layer corresponding to the image to be processed;

the super-resolution image reconstruction module is connected with the feature extraction module and is used for upsampling the feature layer and extracting features of the upsampled feature layer to obtain a reconstructed super-resolution text image;

The character recognition module is connected with the feature extraction module and used for downsampling the feature layer, extracting time sequence features of the downsampled feature layer, and recognizing characters of the extracted time sequence features to obtain character contents in the character image to be processed;

and the super-resolution gradient map reconstruction module is connected with the feature extraction module and is used for upsampling the feature layer and extracting features of the upsampled feature layer to obtain a reconstructed super-resolution gradient map.

As an embodiment of the present invention, the system further comprises:

the image loss function acquisition module is used for calculating an image loss function according to the super-resolution text image acquired by the super-resolution image reconstruction module;

The character loss function acquisition module is used for calculating a character loss function according to the character content acquired by the character recognition module;

the gradient loss function acquisition module is used for calculating a gradient loss function according to the super-resolution gradient map acquired by the super-resolution gradient map reconstruction module;

the loss function fusion module is used for fusing the three loss functions of the image loss function acquired by the image loss function acquisition module, the text loss function acquired by the text loss function acquisition module and the gradient loss function acquired by the gradient loss function acquisition module to acquire a fusion loss function; and training the multi-task text image super-resolution reconstruction network by using the fusion loss function.

As one embodiment of the present invention, the feature extraction module is configured to obtain an advanced feature layer of a text image to be processed, where the advanced feature layer includes deep feature information of the text image to be processed;

The super-resolution image reconstruction module is used for carrying out up-sampling on the advanced feature layer by the deep neural network, and carrying out feature extraction on the feature layer after up-sampling to obtain the features output by the deep neural network of each layer; determining the characteristics output by the last layer of deep neural network as reconstructed super-resolution text images;

The character recognition module is used for downsampling the advanced feature layer with a deep neural network comprising a pooling layer, so that the height of the downsampled feature layer is a set value; sending the downsampled feature layer into a bidirectional LSTM network to extract time-sequence features, and obtaining the time-sequence feature output of the text image to be processed; further extracting features from the time-sequence features through a fully connected layer and a softmax function, and determining the features of the last layer to be the text content of the text image to be processed;

the super-resolution gradient map reconstruction module is used for upsampling the advanced feature layer by the deep neural network, and extracting features of the upsampled feature layer to obtain the output features of the deep neural network of each layer; and determining the characteristics output by the final layer of deep neural network as a reconstructed super-resolution gradient map.

As one embodiment of the present invention, the image loss function acquisition module is configured to back-propagate the calculated image loss function to the feature extraction module through an image training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich image information and the super-resolution text image reconstructed by the super-resolution image reconstruction module is more realistic;

The text loss function acquisition module is used for back-propagating the calculated text loss function to the feature extraction module through a text training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich text information, the text content of the super-resolution text image reconstructed by the super-resolution image reconstruction module is more accurate, and the credibility of the reconstructed super-resolution text image is improved;

The gradient loss function acquisition module is used for back-propagating the calculated gradient loss function to the feature extraction module through a gradient-map training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich gradient information, the text edges of the super-resolution text image reconstructed by the super-resolution image reconstruction module are clearer, and the clarity of the reconstructed super-resolution text image is improved.

As one embodiment of the present invention, the image loss function acquisition module is configured to calculate the image loss function, which specifically includes: calculating the L₁ loss between the reconstructed super-resolution text image and the high-resolution text image corresponding to the text image to be processed, so that the pixel values of the reconstructed super-resolution text image approach those of the corresponding high-resolution text image;

The text loss function acquisition module is used for calculating the text loss function, which specifically includes: calculating the CTC loss between the text content of the text image to be processed, as obtained by the character recognition module, and the corresponding labeled text content, so that the text content recognized by the character recognition module is more accurate;

The gradient loss function acquisition module is used for calculating the gradient loss function, which specifically includes: calculating a gradient map of the high-resolution text image corresponding to the text image to be processed through a Sobel operator to obtain a target gradient map; and calculating the L₁ loss between the target gradient map and the reconstructed super-resolution gradient map, so that the pixel values of the reconstructed super-resolution gradient map approach those of the target gradient map;

And the loss function fusion module performs weighted summation on the three loss functions of the image loss function, the text loss function and the gradient loss function to obtain a fusion loss function.
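Written out, the three losses and their weighted sum might take the following form. The notation is introduced here for illustration and does not appear in the original: I_SR and I_HR denote the reconstructed and the corresponding high-resolution text image, G_SR the reconstructed gradient map, ŷ and y the recognized and the labeled text content, N the number of pixels, and the λ terms the fusion weights, whose values the text does not specify. Normalizing the L₁ losses by N is likewise an assumption.

```latex
\begin{aligned}
\mathcal{L}_{\mathrm{img}}    &= \frac{1}{N}\sum_{p=1}^{N}\bigl| I_{\mathrm{SR}}(p) - I_{\mathrm{HR}}(p) \bigr| \\
\mathcal{L}_{\mathrm{grad}}   &= \frac{1}{N}\sum_{p=1}^{N}\bigl| G_{\mathrm{SR}}(p) - \operatorname{Sobel}(I_{\mathrm{HR}})(p) \bigr| \\
\mathcal{L}_{\mathrm{text}}   &= \operatorname{CTC}(\hat{y},\, y) \\
\mathcal{L}_{\mathrm{fusion}} &= \lambda_{\mathrm{img}}\,\mathcal{L}_{\mathrm{img}}
                               + \lambda_{\mathrm{text}}\,\mathcal{L}_{\mathrm{text}}
                               + \lambda_{\mathrm{grad}}\,\mathcal{L}_{\mathrm{grad}}
\end{aligned}
```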

According to one aspect of the invention, the following technical scheme is adopted: a text image super-resolution reconstruction method, the method comprising:

The feature extraction module extracts a set feature layer corresponding to the image to be processed;

Inputting the feature layer to a super-resolution image reconstruction module, up-sampling the feature layer, and extracting features of the up-sampled feature layer to obtain a reconstructed super-resolution text image;

Inputting the feature layer into a character recognition module, downsampling the feature layer, extracting time sequence features of the downsampled feature layer, and recognizing characters of the extracted time sequence features to obtain character contents in a character image to be processed;

and inputting the feature layer to a super-resolution gradient map reconstruction module, up-sampling the feature layer, and extracting features of the up-sampled feature layer to obtain a reconstructed super-resolution gradient map.

As an embodiment of the present invention, the method further comprises:

the image loss function acquisition module calculates an image loss function according to the super-resolution text image acquired by the super-resolution image reconstruction module;

the text loss function acquisition module calculates a text loss function according to the text content acquired by the text recognition module;

The gradient loss function acquisition module calculates a gradient loss function according to the super-resolution gradient map acquired by the super-resolution gradient map reconstruction module;

The loss function fusion module fuses the three loss functions of the image loss function acquired by the image loss function acquisition module, the text loss function acquired by the text loss function acquisition module and the gradient loss function acquired by the gradient loss function acquisition module to acquire a fusion loss function; and training the multi-task text image super-resolution reconstruction network by using the fusion loss function.

As one embodiment of the present invention, the feature extraction module obtains an advanced feature layer of a text image to be processed, where the advanced feature layer includes deep feature information of the text image to be processed;

The super-resolution image reconstruction module performs up-sampling of the advanced feature layer by using a deep neural network, and performs feature extraction on the feature layer after up-sampling to obtain features output by the deep neural network of each layer; determining the characteristics output by the last layer of deep neural network as reconstructed super-resolution text images;

The character recognition module downsamples the advanced feature layer with a deep neural network comprising a pooling layer, so that the height of the downsampled feature layer is a set value; sends the downsampled feature layer into a bidirectional LSTM network to extract time-sequence features, and obtains the time-sequence feature output of the text image to be processed; further features are extracted from the time-sequence features through a fully connected layer and a softmax function, and the features of the last layer are determined to be the text content of the text image to be processed;

the super-resolution gradient map reconstruction module performs up-sampling on the advanced feature layer by using a deep neural network, performs feature extraction on the feature layer after up-sampling, and obtains features output by the deep neural network of each layer; and determining the characteristics output by the final layer of deep neural network as a reconstructed super-resolution gradient map.

As one embodiment of the present invention, the image loss function acquisition module back-propagates the calculated image loss function to the feature extraction module through an image training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich image information and the super-resolution text image reconstructed by the super-resolution image reconstruction module is more realistic;

The text loss function acquisition module back-propagates the calculated text loss function to the feature extraction module through a text training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich text information, the text content of the super-resolution text image reconstructed by the super-resolution image reconstruction module is more accurate, and the credibility of the reconstructed super-resolution text image is improved;

The gradient loss function acquisition module back-propagates the calculated gradient loss function to the feature extraction module through a gradient-map training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich gradient information, the text edges of the super-resolution text image reconstructed by the super-resolution image reconstruction module are clearer, and the clarity of the reconstructed super-resolution text image is improved.

As one embodiment of the present invention, the image loss function acquisition module calculates the image loss function, which specifically includes: calculating the L₁ loss between the reconstructed super-resolution text image and the high-resolution text image corresponding to the text image to be processed, so that the pixel values of the reconstructed super-resolution text image approach those of the corresponding high-resolution text image;

The text loss function acquisition module calculates the text loss function, which specifically includes: calculating the CTC loss between the text content of the text image to be processed, as obtained by the character recognition module, and the corresponding labeled text content, so that the text content recognized by the character recognition module is more accurate;

The gradient loss function acquisition module calculates the gradient loss function, which specifically includes: calculating a gradient map of the high-resolution text image corresponding to the text image to be processed through a Sobel operator to obtain a target gradient map; and calculating the L₁ loss between the target gradient map and the reconstructed super-resolution gradient map, so that the pixel values of the reconstructed super-resolution gradient map approach those of the target gradient map;

The invention has the following beneficial effects: the multi-task text image super-resolution reconstruction system and method provided by the invention reconstruct a low-resolution text image into a super-resolution text image, solve the problems of blurred text edges and low text content credibility that arise when existing image super-resolution reconstruction methods based on deep neural networks are applied to text images, and provide clear and credible images for high-level tasks such as semantic analysis of text images.

Compared with the existing image super-resolution reconstruction method based on the deep neural network, the method has the following two advantages:

(1) The reconstructed super-resolution text image has clear text edges:

According to the multi-task text image super-resolution reconstruction method provided by the invention, a super-resolution gradient map reconstruction module is added in parallel with the super-resolution image reconstruction module, and a gradient loss function is calculated. When the network parameters are updated, the gradient-map training gradient is back-propagated so that the advanced feature layer extracted by the feature extraction module contains rich gradient information, and the text edges of the super-resolution text image reconstructed by the super-resolution image reconstruction module are therefore clearer.

(2) The reliability of the text content of the reconstructed super-resolution text image is high:

According to the multi-task text image super-resolution reconstruction method provided by the invention, a character recognition module is added in parallel with the super-resolution image reconstruction module, and a text loss function is calculated. When the network parameters are updated, the text training gradient is back-propagated so that the advanced feature layer extracted by the feature extraction module contains rich text information, and the text content of the super-resolution text image reconstructed by the super-resolution image reconstruction module is therefore correct and highly credible.

Drawings

Fig. 1 is a schematic diagram of the composition of a conventional text image super-resolution reconstruction system.

Fig. 2 is a schematic diagram of the composition of a text image super-resolution reconstruction system according to an embodiment of the present invention.

Fig. 3 is a schematic diagram of the composition of a text image super-resolution reconstruction system according to an embodiment of the present invention.

Fig. 4 is a schematic diagram of the composition of a text image super-resolution reconstruction system according to an embodiment of the present invention.

Detailed Description

Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

For a further understanding of the present invention, preferred embodiments of the invention are described below in conjunction with the examples, but it should be understood that these descriptions are merely intended to illustrate further features and advantages of the invention, and are not limiting of the claims of the invention.

The description of this section is intended to be illustrative of only a few exemplary embodiments and the invention is not to be limited in scope by the description of the embodiments. It is also within the scope of the description and claims of the invention to interchange some of the technical features of the embodiments with other technical features of the same or similar prior art.

The description of the steps in the various embodiments in the specification is merely for convenience of description, and the implementation of the present application is not limited by the order in which the steps are implemented. "connected" in the specification includes both direct and indirect connections.

The invention discloses a text image super-resolution reconstruction system; fig. 2 and fig. 3 are schematic diagrams of the composition of the text image super-resolution reconstruction system in an embodiment of the invention. Referring to fig. 2 and fig. 3, the system includes a feature extraction module 1, a super-resolution image reconstruction module 2, a character recognition module 3 and a super-resolution gradient map reconstruction module 4; the feature extraction module 1 is connected with the super-resolution image reconstruction module 2, the character recognition module 3 and the super-resolution gradient map reconstruction module 4, respectively.

The feature extraction module 1 is used for extracting a set feature layer corresponding to the image to be processed; the super-resolution image reconstruction module 2 is used for upsampling the feature layer, extracting features of the upsampled feature layer, and obtaining a reconstructed super-resolution text image; the character recognition module 3 is used for downsampling the feature layer, extracting time sequence features of the downsampled feature layer, and recognizing characters of the extracted time sequence features to obtain character contents in the character image to be processed; the super-resolution gradient map reconstruction module 4 is configured to upsample the feature layer, perform feature extraction on the upsampled feature layer, and obtain a reconstructed super-resolution gradient map.
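The wiring of the four modules can be sketched as one shared feature extractor feeding three parallel branches. The sketch below only illustrates the data flow: the class name, field names and placeholder callables are hypothetical, and in a real system each callable would be a trained neural network.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class MultiTaskTextSR:
    """Hypothetical container mirroring modules 1 to 4 of the described system."""
    feature_extractor: Callable[[Any], Any]   # feature extraction module 1
    image_head: Callable[[Any], Any]          # super-resolution image reconstruction module 2
    recognition_head: Callable[[Any], Any]    # character recognition module 3
    gradient_head: Callable[[Any], Any]       # super-resolution gradient map reconstruction module 4

    def forward(self, low_res_image):
        # The feature layer is computed once and shared by all three branches.
        features = self.feature_extractor(low_res_image)
        return {
            "sr_image": self.image_head(features),
            "text": self.recognition_head(features),
            "sr_gradient_map": self.gradient_head(features),
        }

# Placeholder callables stand in for trained networks (illustration only).
calls = []

def fake_extractor(image):
    calls.append("extract")
    return {"features_of": image}

model = MultiTaskTextSR(
    feature_extractor=fake_extractor,
    image_head=lambda f: ("image", f["features_of"]),
    recognition_head=lambda f: ("text", f["features_of"]),
    gradient_head=lambda f: ("gradient", f["features_of"]),
)
outputs = model.forward("lr.png")
```

Sharing one feature layer is what lets the losses of the recognition and gradient branches influence the features that the image branch reconstructs from.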

In an embodiment of the present invention, the feature extraction module is configured to obtain an advanced feature layer of the text image to be processed, where the advanced feature layer includes deep feature information of the text image to be processed. In one embodiment, the text image to be processed may be input into the feature extraction module of the ESRGAN generator network to obtain the advanced feature layer.

Fig. 4 is a schematic diagram of the composition of a text image super-resolution reconstruction system in an embodiment of the present invention. Referring to fig. 4, in an embodiment of the present invention, the super-resolution image reconstruction module 2 is configured to upsample the advanced feature layer with a deep neural network, and perform feature extraction on the upsampled feature layer to obtain the features output by each layer of the deep neural network; the features output by the last layer of the deep neural network are determined to be the reconstructed super-resolution text image.
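The upsampling step can be illustrated on a single-channel feature map. The text does not say which upsampling operator is used, so nearest-neighbor interpolation with a scale factor of 2 is an assumption here, and the function name is hypothetical; in a network this step would typically be followed by learned layers that perform the subsequent feature extraction.

```python
def upsample_nearest(feature_map, scale=2):
    """Repeat every element `scale` times along both height and width."""
    upsampled = []
    for row in feature_map:
        # Widen the row first, then repeat the widened row vertically.
        wide_row = [value for value in row for _ in range(scale)]
        for _ in range(scale):
            upsampled.append(list(wide_row))
    return upsampled

features = [[1, 2], [3, 4]]
upsampled = upsample_nearest(features, scale=2)
```

A 2x2 map becomes a 4x4 map in which each original value covers a 2x2 block.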

The character recognition module 3 is configured to downsample the advanced feature layer with a deep neural network including a pooling layer, so that the height of the downsampled feature layer is 1; send the downsampled feature layer into a bidirectional LSTM network to extract time-sequence features, and obtain the time-sequence feature output of the text image to be processed; further features are extracted from the time-sequence features through a fully connected layer and a softmax function, and the features of the last layer are determined to be the text content of the text image to be processed.
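The shape handling in this branch can be sketched in plain Python: pooling the feature layer to height 1 turns it into a left-to-right sequence with one vector per column, and a softmax over each vector gives per-step class probabilities. This is a simplified sketch under several assumptions: average pooling is used, the bidirectional LSTM and the fully connected layer are omitted (the channels are treated directly as class scores), and the greedy decoding with a blank symbol at index 0 is a common companion to CTC training rather than something stated in the text. All names and values are hypothetical.

```python
import math

def pool_to_height_one(feature_map):
    """Average over the height axis: H x W x C becomes W x C (one vector per column)."""
    height = len(feature_map)
    width = len(feature_map[0])
    channels = len(feature_map[0][0])
    return [
        [sum(feature_map[r][c][k] for r in range(height)) / height for k in range(channels)]
        for c in range(width)
    ]

def softmax(scores):
    """Numerically stable softmax over one score vector."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_decode(step_probs, alphabet, blank=0):
    """Pick the best class per step, merge repeats, then drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in step_probs]
    chars, previous = [], None
    for index in best:
        if index != previous and index != blank:
            chars.append(alphabet[index])
        previous = index
    return "".join(chars)

# Hypothetical 2 x 4 x 3 feature layer: height 2, width 4, 3 classes ("-" is the blank).
feature_map = [
    [[0, 2, 0], [0, 3, 0], [4, 0, 0], [0, 0, 5]],
    [[0, 4, 0], [0, 1, 0], [2, 0, 0], [0, 0, 1]],
]
alphabet = ["-", "a", "b"]

sequence = pool_to_height_one(feature_map)
step_probs = [softmax(scores) for scores in sequence]
decoded = greedy_decode(step_probs, alphabet)
```

Here the four columns are read as "a", "a", blank, "b", which collapses to the string "ab".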

The super-resolution gradient map reconstruction module 4 is used for upsampling the advanced feature layer by a deep neural network, and extracting features of the upsampled feature layer to obtain the output features of the deep neural network of each layer; and determining the characteristics output by the final layer of deep neural network as a reconstructed super-resolution gradient map.

As shown in fig. 4, in an embodiment of the present invention, the system further includes an image loss function obtaining module 5, a text loss function obtaining module 6, a gradient loss function obtaining module 7, and a loss function fusion module 8.

The image loss function obtaining module 5 is used for calculating an image loss function according to the super-resolution text image obtained by the super-resolution image reconstruction module; the text loss function obtaining module 6 is used for calculating a text loss function according to the text content obtained by the text recognition module; the gradient loss function obtaining module 7 is configured to calculate a gradient loss function according to the super-resolution gradient map obtained by the super-resolution gradient map reconstruction module. The loss function fusion module 8 is configured to fuse the three loss functions of the image loss function acquired by the image loss function acquisition module 5, the text loss function acquired by the text loss function acquisition module 6, and the gradient loss function acquired by the gradient loss function acquisition module 7, so as to acquire a fusion loss function; and training the multi-task text image super-resolution reconstruction network by using the fusion loss function.

In one embodiment of the present invention, the image loss function obtaining module 5 is configured to calculate the image loss function, which specifically includes: calculating the L₁ loss between the reconstructed super-resolution text image and the high-resolution text image corresponding to the text image to be processed, so that the pixel values of the reconstructed super-resolution text image approach those of the corresponding high-resolution text image.

The text loss function obtaining module 6 is configured to calculate the text loss function, which specifically includes: calculating the CTC loss between the text content of the text image to be processed, as obtained by the character recognition module, and the corresponding labeled text content, so that the text content recognized by the character recognition module is more accurate.
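For readers unfamiliar with CTC (connectionist temporal classification), the loss is the negative log of the total probability of every frame-level alignment that collapses to the labeled text. The following is a minimal sketch of the standard forward recursion, assuming per-step class probabilities as input and a blank symbol at index 0; it works in plain probabilities rather than log space, so it is suitable only for short illustrative sequences, and the function name is hypothetical.

```python
import math

def ctc_loss(step_probs, target, blank=0):
    """Negative log-likelihood of `target` given T per-step probability vectors.

    step_probs: list of T lists, each a probability distribution over classes.
    target: list of label indices without blanks.
    Raises ValueError from math.log if no alignment is possible (likelihood 0).
    """
    # Interleave blanks: [a, b] becomes [blank, a, blank, b, blank].
    extended = [blank]
    for label in target:
        extended += [label, blank]
    size = len(extended)

    # alpha[s]: total probability of partial alignments that end at extended[s].
    alpha = [0.0] * size
    alpha[0] = step_probs[0][extended[0]]
    if size > 1:
        alpha[1] = step_probs[0][extended[1]]

    for probs in step_probs[1:]:
        new_alpha = [0.0] * size
        for s in range(size):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # A blank may be skipped only between two different labels.
            if s >= 2 and extended[s] != blank and extended[s] != extended[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[extended[s]]
        alpha = new_alpha

    # Valid alignments end on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    return -math.log(likelihood)

# Two time steps, two classes (blank and "a"), uniform probabilities.
uniform = [[0.5, 0.5], [0.5, 0.5]]
loss = ctc_loss(uniform, target=[1])
```

With two uniform steps, three of the four possible paths ("aa", "a-", "-a") collapse to "a", so the likelihood is 0.75 and the loss is -log(0.75).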

The gradient loss function obtaining module 7 is configured to calculate the gradient loss function, which specifically includes: calculating a gradient map of the high-resolution text image corresponding to the text image to be processed through a Sobel operator to obtain a target gradient map; and calculating the L₁ loss between the target gradient map and the reconstructed super-resolution gradient map, so that the pixel values of the reconstructed super-resolution gradient map approach those of the target gradient map.
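The target gradient map and its L₁ comparison can be sketched as follows. The 3x3 Sobel kernels are the standard ones, but several choices are assumptions made for illustration: combining the horizontal and vertical responses into a gradient magnitude, leaving border pixels at zero, and averaging the absolute differences over all pixels. The function names and the sample image are hypothetical.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_gradient_map(image):
    """Gradient magnitude at interior pixels; border pixels are left at 0."""
    height, width = len(image), len(image[0])
    grad = [[0.0] * width for _ in range(height)]
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            gx = sum(SOBEL_X[i][j] * image[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * image[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            grad[r][c] = (gx * gx + gy * gy) ** 0.5
    return grad

def l1_loss(map_a, map_b):
    """Mean absolute difference between two equally sized 2-D maps."""
    total = sum(abs(a - b)
                for row_a, row_b in zip(map_a, map_b)
                for a, b in zip(row_a, row_b))
    return total / (len(map_a) * len(map_a[0]))

# Hypothetical high-resolution patch with one vertical stroke edge.
high_res = [[0, 0, 1, 1] for _ in range(4)]
target_gradient_map = sobel_gradient_map(high_res)

# A reconstructed gradient map that missed the edge entirely.
flat_prediction = [[0.0] * 4 for _ in range(4)]
gradient_loss = l1_loss(target_gradient_map, flat_prediction)
```

The target map responds only along the stroke edge, so a prediction that misses the edge is penalized exactly where the text boundary lies.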

The loss function fusion module 8 performs weighted summation on the three loss functions of the image loss function, the text loss function and the gradient loss function to obtain a fusion loss function.

With continued reference to fig. 3 and fig. 4, in an embodiment of the present invention, the image loss function obtaining module 5 is configured to back-propagate the calculated image loss function to the feature extraction module through an image training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich image information and the super-resolution text image reconstructed by the super-resolution image reconstruction module is more realistic.

The text loss function obtaining module 6 is configured to back-propagate the calculated text loss function to the feature extraction module through a text training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich text information, which helps the text content of the super-resolution text image reconstructed by the super-resolution image reconstruction module to be more accurate and improves the credibility of the reconstructed super-resolution text image.

The gradient loss function obtaining module 7 is configured to back-propagate the calculated gradient loss function to the feature extraction module through a gradient-map training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich gradient information, which helps the text edges of the super-resolution text image reconstructed by the super-resolution image reconstruction module to be clearer and improves the clarity of the reconstructed super-resolution text image.

The loss function fusion module 8 is further configured to back-propagate the fusion loss function to the image loss function acquisition module 5, the text loss function acquisition module 6, and the gradient loss function acquisition module 7.
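Because the fusion loss is a weighted sum, the gradient that reaches the shared feature extraction module is the weighted sum of the three per-task gradients, which is how image, text and gradient information all shape the advanced feature layer. The notation below is introduced here for illustration and is not taken from the original: θ_f denotes the parameters of the feature extraction module, the λ terms denote the fusion weights (whose values are not specified), and η is the learning rate of a plain gradient descent update, which is an assumed choice of optimizer.

```latex
\begin{aligned}
\frac{\partial \mathcal{L}_{\mathrm{fusion}}}{\partial \theta_f}
  &= \lambda_{\mathrm{img}}\,\frac{\partial \mathcal{L}_{\mathrm{img}}}{\partial \theta_f}
   + \lambda_{\mathrm{text}}\,\frac{\partial \mathcal{L}_{\mathrm{text}}}{\partial \theta_f}
   + \lambda_{\mathrm{grad}}\,\frac{\partial \mathcal{L}_{\mathrm{grad}}}{\partial \theta_f} \\
\theta_f &\leftarrow \theta_f - \eta\,\frac{\partial \mathcal{L}_{\mathrm{fusion}}}{\partial \theta_f}
\end{aligned}
```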

The invention also discloses a multi-task text image super-resolution reconstruction method. Referring to fig. 4, the method comprises the following steps:

In an embodiment of the present invention, the feature extraction module obtains an advanced feature layer of the text image to be processed, where the advanced feature layer includes deep feature information of the text image to be processed.

In an embodiment of the present invention, the super-resolution image reconstruction module upsamples the feature layer by using a deep neural network and performs feature extraction on the upsampled feature layer to obtain the features output by each layer of the deep neural network; the features output by the last layer of the deep neural network are taken as the reconstructed super-resolution text image.
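The upsampling step can be pictured with a small sketch. The description does not say which upsampling operation the network uses, so nearest-neighbour repetition is assumed here purely for illustration; `upsample_nearest` and its `scale` parameter are hypothetical names. A learned upsampling layer would replace this function in practice.

```python
# Hypothetical sketch: nearest-neighbour upsampling is an assumption,
# not the operation specified by the description.

def upsample_nearest(feature_map, scale=2):
    """Repeat every element `scale` times along both height and width."""
    upsampled = []
    for row in feature_map:
        # Widen the row by repeating each value `scale` times.
        wide_row = [value for value in row for _ in range(scale)]
        # Repeat the widened row `scale` times to grow the height.
        for _ in range(scale):
            upsampled.append(list(wide_row))
    return upsampled


low_res = [[1, 2],
           [3, 4]]
high_res = upsample_nearest(low_res, scale=2)
```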

The character recognition module downsamples the advanced feature layer with a deep neural network that includes a pooling layer, so that the height of the downsampled feature layer equals a set value. The downsampled feature layer is fed into a bidirectional LSTM network to extract time sequence features of the text image to be processed. The time sequence features are then passed through a fully connected layer and a softmax function, and the output of the last layer is taken as the text content of the text image to be processed.
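Two of the operations in this branch, pooling the height to a set value and applying softmax to per-step class scores, can be sketched as follows. This is a minimal illustration under stated assumptions: max pooling is assumed (the description only says "pooling layer"), the width axis is assumed to act as the time axis, and the bidirectional LSTM and fully connected layer are omitted. The names `pool_to_height` and `softmax` are hypothetical.

```python
import math

# Hypothetical sketch: max pooling and "width = time axis" are assumptions.

def pool_to_height(feature_map, target_height=1):
    """Max-pool a 2D feature map (rows x columns) down to `target_height` rows.

    Assumes the input height is an integer multiple of `target_height`.
    """
    height = len(feature_map)
    if height % target_height != 0:
        raise ValueError("height must be a multiple of target_height")
    window = height // target_height
    pooled = []
    for start in range(0, height, window):
        rows = feature_map[start:start + window]
        # zip(*rows) walks the window column by column.
        pooled.append([max(column) for column in zip(*rows)])
    return pooled


def softmax(logits):
    """Numerically stable softmax over one time step's class scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


features = [[0.1, 0.9, 0.3],
            [0.7, 0.2, 0.8],
            [0.4, 0.5, 0.6],
            [0.0, 1.0, 0.2]]
sequence = pool_to_height(features, target_height=1)[0]  # one value per time step
probs = softmax([2.0, 1.0, 0.1])  # class probabilities for a single time step
```

Fixing the height to 1 turns the feature map into a left-to-right sequence, which is the form a recurrent network such as a bidirectional LSTM consumes.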

In an embodiment of the invention, the method further comprises a training process.

With continued reference to fig. 4, in one embodiment of the present invention, the method further includes:

In an embodiment of the present invention, the image loss function obtaining module calculates an image loss function, specifically including: calculating the L₁ loss between the reconstructed super-resolution text image and the high-resolution text image corresponding to the text image to be processed, so that the pixel values of the reconstructed super-resolution text image approach those of the corresponding high-resolution text image.
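The image loss described above can be sketched in a few lines. The description names only an L₁ loss; averaging over pixels (rather than summing) is an assumption made here, and `l1_loss` is a hypothetical helper name. Images are represented as nested lists of pixel values for simplicity.

```python
# Hypothetical sketch: mean reduction is an assumption; the description
# only specifies that an L1 loss is used.

def l1_loss(predicted, target):
    """Mean absolute difference between two equally sized 2D images."""
    if len(predicted) != len(target):
        raise ValueError("images must have the same height")
    total = 0.0
    count = 0
    for pred_row, target_row in zip(predicted, target):
        if len(pred_row) != len(target_row):
            raise ValueError("images must have the same width")
        for p, t in zip(pred_row, target_row):
            total += abs(p - t)
            count += 1
    return total / count


super_resolved = [[0.0, 0.5],
                  [1.0, 0.25]]
high_resolution = [[0.0, 1.0],
                   [1.0, 0.75]]
loss = l1_loss(super_resolved, high_resolution)
```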

The text loss function obtaining module calculates a text loss function, specifically including: calculating the CTC loss between the text content of the text image to be processed, as obtained by the character recognition module, and the corresponding labeled text content, so that the text content recognized by the character recognition module is more accurate.
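For readers unfamiliar with CTC, the loss can be sketched with the standard forward algorithm. This is an illustrative implementation, not the one used by the invention: it works directly on probabilities rather than log-probabilities (so it would underflow on long sequences), it assumes class index 0 is the blank symbol, and `ctc_loss` is a hypothetical name.

```python
import math

# Hypothetical sketch of the CTC forward algorithm in the probability domain.
# Assumption: class index 0 is the blank symbol.

def ctc_loss(probs, target, blank=0):
    """Negative log-likelihood of `target` under CTC.

    probs[t][k] is the softmax probability of class k at time step t.
    target is a list of class indices that contains no blanks.
    """
    # Interleave blanks: [a, b] -> [blank, a, blank, b, blank].
    extended = [blank]
    for label in target:
        extended += [label, blank]
    steps, size = len(probs), len(extended)

    # A valid path starts on the leading blank or on the first label.
    alpha = [0.0] * size
    alpha[0] = probs[0][extended[0]]
    if size > 1:
        alpha[1] = probs[0][extended[1]]

    for t in range(1, steps):
        previous = alpha
        alpha = [0.0] * size
        for s in range(size):
            total = previous[s]
            if s >= 1:
                total += previous[s - 1]
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and extended[s] != blank and extended[s] != extended[s - 2]:
                total += previous[s - 2]
            alpha[s] = total * probs[t][extended[s]]

    # A valid path ends on the last label or on the trailing blank.
    likelihood = alpha[-1] + (alpha[-2] if size > 1 else 0.0)
    if likelihood <= 0.0:
        return float("inf")  # target cannot be emitted in this many steps
    return -math.log(likelihood)


# Two time steps, two classes (blank and one character), uniform predictions.
uniform = [[0.5, 0.5], [0.5, 0.5]]
loss = ctc_loss(uniform, target=[1])
```

In this toy case three alignments produce the target (character-character, character-blank, blank-character), each with probability 0.25, so the likelihood is 0.75.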

The gradient loss function obtaining module calculates a gradient loss function, specifically including: applying a Sobel operator to the high-resolution text image corresponding to the text image to be processed to obtain a target gradient map; and calculating the L₁ loss between the target gradient map and the reconstructed super-resolution gradient map, so that the pixel values of the reconstructed super-resolution gradient map approach those of the target gradient map.
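The target gradient map and the gradient loss can be sketched as follows. The description names the Sobel operator and an L₁ loss but not the details, so several choices here are assumptions: the horizontal and vertical responses are combined as a gradient magnitude, border pixels are left at zero, and the L₁ loss is averaged over pixels. The names `sobel_gradient_map` and `mean_abs_diff` are hypothetical.

```python
import math

# Hypothetical sketch: magnitude combination, zero borders and mean
# reduction are assumptions, not details given in the description.

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_gradient_map(image):
    """Gradient magnitude of a 2D image; border pixels are left at 0."""
    height, width = len(image), len(image[0])
    gradient = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            gradient[y][x] = math.hypot(gx, gy)
    return gradient


def mean_abs_diff(a, b):
    """Mean absolute difference (L1 loss) between two equally sized 2D maps."""
    diffs = [abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


# A 4x4 image with a vertical edge between the second and third columns.
high_resolution = [[0, 0, 1, 1] for _ in range(4)]
target_gradient = sobel_gradient_map(high_resolution)

# Compare against an all-zero prediction standing in for the network output.
predicted_gradient = [[0.0] * 4 for _ in range(4)]
gradient_loss = mean_abs_diff(target_gradient, predicted_gradient)
```

The gradient map responds only along the edge, which is why supervising it pushes the shared features toward sharp character boundaries.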

The image loss function acquisition module back-propagates the calculated image loss function to the feature extraction module as an image training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich image information and the super-resolution text image reconstructed by the super-resolution image reconstruction module is more realistic;

the gradient loss function acquisition module back-propagates the calculated gradient loss function to the feature extraction module as a training gradient of the gradient map branch, so that the advanced feature layer extracted by the feature extraction module contains rich gradient information, the text edges of the super-resolution text image reconstructed by the super-resolution image reconstruction module are clearer, and the definition of the reconstructed super-resolution text image is improved;

the loss function fusion module reversely propagates the fusion loss function to the image loss function acquisition module, the text loss function acquisition module and the gradient loss function acquisition module.

In summary, the multi-task text image super-resolution reconstruction system and method provided by the invention reconstruct a reduced-resolution text image into a super-resolution text image. They address two problems that arise when conventional deep-neural-network image super-resolution methods are applied to text images, namely blurred text edges and low credibility of the text content in the reconstructed super-resolution text image, and thereby provide clear and credible images for high-level tasks such as semantic analysis of text images.

It should be noted that the present application may be implemented in software and/or a combination of software and hardware; for example, an Application Specific Integrated Circuit (ASIC), a general purpose computer, or any other similar hardware device may be employed. In some embodiments, the software program of the present application may be executed by a processor to implement the above steps or functions. Likewise, the software program of the present application (including the related data structures) may be stored in a computer-readable recording medium, such as RAM, magnetic or optical drives, diskettes, and the like. In addition, some steps or functions of the present application may be implemented in hardware; for example, as circuitry that cooperates with the processor to perform various steps or functions.

The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered to fall within the scope of this description.

The description and applications of the present invention herein are illustrative and are not intended to limit the scope of the invention to the embodiments described above. Effects or advantages referred to in the embodiments may not be embodied in the embodiments due to interference of various factors, and description of the effects or advantages is not intended to limit the embodiments. Variations and modifications of the embodiments disclosed herein are possible, and alternatives and equivalents of the various components of the embodiments are known to those of ordinary skill in the art. It will be clear to those skilled in the art that the present invention may be embodied in other forms, structures, arrangements, proportions, and with other assemblies, materials, and components, without departing from the spirit or essential characteristics thereof. Other variations and modifications of the embodiments disclosed herein may be made without departing from the scope and spirit of the invention.

Claims

1. A text image super-resolution reconstruction system, the system comprising:

The super-resolution gradient map reconstruction module is connected with the feature extraction module and is used for upsampling the feature layer and extracting features of the upsampled feature layer to obtain a reconstructed super-resolution gradient map;

the loss function fusion module is used for fusing the three loss functions of the image loss function acquired by the image loss function acquisition module, the text loss function acquired by the text loss function acquisition module and the gradient loss function acquired by the gradient loss function acquisition module to acquire a fusion loss function; training a multi-task text image super-resolution reconstruction network by utilizing the fusion loss function;

the feature extraction module is used for obtaining an advanced feature layer of the character image to be processed, wherein the advanced feature layer comprises deep feature information of the character image to be processed;

The super-resolution gradient map reconstruction module is used for upsampling the advanced feature layer with a deep neural network and extracting features of the upsampled feature layer to obtain the features output by each layer of the deep neural network; the features output by the last layer of the deep neural network are determined to be the reconstructed super-resolution gradient map;

The image loss function acquisition module is used for back-propagating the calculated image loss function to the feature extraction module as an image training gradient, so that the advanced feature layer extracted by the feature extraction module contains rich image information and the super-resolution text image reconstructed by the super-resolution image reconstruction module is more realistic;

The gradient loss function acquisition module is used for back-propagating the calculated gradient loss function to the feature extraction module as a training gradient of the gradient map branch, so that the advanced feature layer extracted by the feature extraction module contains rich gradient information, the text edges of the super-resolution text image reconstructed by the super-resolution image reconstruction module are clearer, and the definition of the reconstructed super-resolution text image is improved;

The image loss function obtaining module is used for calculating an image loss function, which specifically comprises: calculating the L₁ loss between the reconstructed super-resolution text image and the high-resolution text image corresponding to the text image to be processed, so that the pixel values of the reconstructed super-resolution text image approach those of the corresponding high-resolution text image;

2. A text image super-resolution reconstruction method, characterized by comprising the following steps:

Inputting the feature layer into a character recognition module, downsampling the feature layer, extracting time sequence features of the downsampled feature layer, and recognizing characters of the extracted time sequence features to obtain character contents in a character image to be processed; inputting the feature layer to a super-resolution gradient map reconstruction module, up-sampling the feature layer, and extracting features of the up-sampled feature layer to obtain a reconstructed super-resolution gradient map;

The loss function fusion module fuses the three loss functions of the image loss function acquired by the image loss function acquisition module, the text loss function acquired by the text loss function acquisition module and the gradient loss function acquired by the gradient loss function acquisition module to acquire a fusion loss function; training a multi-task text image super-resolution reconstruction network by utilizing the fusion loss function;

the feature extraction module acquires an advanced feature layer of the character image to be processed, wherein the advanced feature layer comprises deep feature information of the character image to be processed;

The super-resolution gradient map reconstruction module upsamples the advanced feature layer by using a deep neural network and performs feature extraction on the upsampled feature layer to obtain the features output by each layer of the deep neural network; the features output by the last layer of the deep neural network are determined to be the reconstructed super-resolution gradient map;

The image loss function obtaining module calculates an image loss function, which specifically includes: calculating the L₁ loss between the reconstructed super-resolution text image and the high-resolution text image corresponding to the text image to be processed, so that the pixel values of the reconstructed super-resolution text image approach those of the corresponding high-resolution text image;