WO2023035531A1 - Super-resolution reconstruction method for text image and related device thereof - Google Patents

Super-resolution reconstruction method for text image and related device thereof

Info

Publication number
WO2023035531A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
resolution picture
super
loss function
low
Prior art date
Application number
PCT/CN2022/071883
Other languages
French (fr)
Chinese (zh)
Inventor
郑喜民
翟尤
舒畅
陈又新
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformation in the plane of the image
    • G06T3/40 Scaling the whole image or part thereof
    • G06T3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning

Definitions

  • the present application relates to the technical field of artificial intelligence, in particular to a text image super-resolution reconstruction method and related equipment.
  • Super-resolution reconstruction means that for any given low-resolution picture, the corresponding high-resolution picture is generated through a convolutional neural network, and the details and textures in the picture are preserved and restored as much as possible.
  • Super-resolution reconstruction technology plays a good role in promoting the development of related fields such as image classification, segmentation, tracking, and dehazing, and plays an important role in the development of neural networks.
  • The inventors realized that text pictures differ from natural scenes: text has fixed shapes and clear edges, so the reconstruction requirements are higher. In ordinary pictures, most of the content is natural and random, and converting a low-resolution picture into a high-resolution one is easier. For text in a scene, distortion, abrupt color changes, or blurred text edges in the reconstructed picture significantly reduce its quality.
  • The purpose of the embodiments of the present application is to propose a text image super-resolution reconstruction method and related device that ensure the quality of text image super-resolution reconstruction.
  • The embodiment of the present application provides a text image super-resolution reconstruction method, which adopts the following technical solution:
  • A text image super-resolution reconstruction method, comprising the steps of:
  • Receive a low-resolution picture and a corresponding high-resolution picture, input the low-resolution picture into a pre-trained scene text recognition model, and obtain output text position information and text content information;
  • Receive the low-resolution picture to be converted, input the low-resolution picture to be converted into the trained adversarial network, and obtain the output target super-resolution picture.
  • the embodiment of the present application also provides a text image super-resolution reconstruction device, which adopts the following technical solutions:
  • a text image super-resolution reconstruction device comprising:
  • a receiving module configured to receive a low-resolution picture and a corresponding high-resolution picture, input the low-resolution picture into a pre-trained scene text recognition model, and obtain output text position information and text content information;
  • an upsampling module configured to generate a text mask based on the text position information and the text content information, and perform upsampling on the text mask to obtain a target mask;
  • a generating module configured to input the low-resolution picture and the target mask into the generation layer of a preset adversarial network to obtain an output super-resolution picture;
  • a discrimination module configured to simultaneously input the super-resolution picture and the high-resolution picture into the discrimination layer of the adversarial network, obtain an output discrimination result, and calculate the discrimination accuracy based on the discrimination result;
  • a calculation module configured to calculate the loss function of the adversarial network based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than the accuracy threshold, whereupon the trained adversarial network is obtained;
  • an obtaining module configured to receive the low-resolution picture to be converted, input the low-resolution picture to be converted into the trained adversarial network, and obtain the output target super-resolution picture.
  • the embodiment of the present application also provides a computer device, which adopts the following technical solution:
  • A computer device comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and the processor, when executing the computer-readable instructions, implements the following steps of the text image super-resolution reconstruction method:
  • Receive a low-resolution picture and a corresponding high-resolution picture, input the low-resolution picture into a pre-trained scene text recognition model, and obtain output text position information and text content information;
  • Receive the low-resolution picture to be converted, input the low-resolution picture to be converted into the trained adversarial network, and obtain the output target super-resolution picture.
  • The embodiment of the present application also provides a computer-readable storage medium, which adopts the following technical solution:
  • A computer-readable storage medium on which computer-readable instructions are stored, wherein, when the computer-readable instructions are executed by a processor, the following steps of the text image super-resolution reconstruction method are implemented:
  • Receive a low-resolution picture and a corresponding high-resolution picture, input the low-resolution picture into a pre-trained scene text recognition model, and obtain output text position information and text content information;
  • Receive the low-resolution picture to be converted, input the low-resolution picture to be converted into the trained adversarial network, and obtain the output target super-resolution picture.
  • The application obtains text position information and text content information from the received low-resolution picture and generates a text mask based on them.
  • Because the generation of the text mask takes both the text position information and the text content information into account, the boundary between the text in the picture and the surrounding image can be made clear, so that the text in the subsequently generated super-resolution picture is clear and the quality of the reconstructed picture is significantly improved.
  • Upsampling the text mask enlarges it and improves its resolution, which facilitates the subsequent generation of super-resolution pictures.
  • the trained adversarial network is obtained to generate a target super-resolution image with better quality.
  • FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of the text image super-resolution reconstruction method according to the present application;
  • FIG. 3 is a schematic structural diagram of an embodiment of a text image super-resolution reconstruction device according to the present application;
  • FIG. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • a system architecture 100 may include terminal devices 101 , 102 , 103 , a network 104 and a server 105 .
  • the network 104 is used as a medium for providing communication links between the terminal devices 101 , 102 , 103 and the server 105 .
  • The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
  • Users can use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like.
  • Various communication client applications can be installed on the terminal devices 101, 102, 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, social platform software, and the like.
  • The terminal devices 101, 102, 103 can be various electronic devices that have display screens and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptops and desktop computers.
  • the server 105 may be a server that provides various services, such as a background server that provides support for pages displayed on the terminal devices 101 , 102 , 103 .
  • the text image super-resolution reconstruction method provided in the embodiment of the present application is generally executed by a server/terminal device, and correspondingly, the text image super-resolution reconstruction device is generally set in the server/terminal device.
  • The terminal devices, networks and servers in FIG. 1 are only illustrative; depending on implementation needs, there may be any number of each.
  • FIG. 2 shows a flowchart of an embodiment of a text image super-resolution reconstruction method according to the present application.
  • the described text image super-resolution reconstruction method comprises the following steps:
  • S1 Receive a low-resolution picture and a corresponding high-resolution picture, input the low-resolution picture into a pre-trained scene text recognition model, and obtain output text position information and text content information.
  • the size of the low resolution image (Low Resolution image, LR image) is W*H.
  • The scene text recognition model of this application is the text recognition model ASTER ("ASTER: An Attentional Scene Text Recognizer with Flexible Rectification"). The text recognition model is trained in advance.
  • The electronic device on which the text image super-resolution reconstruction method runs can receive the low-resolution picture and the corresponding high-resolution picture through a wired or wireless connection.
  • The above wireless connection methods may include but are not limited to 3G/4G connection, WiFi connection, Bluetooth connection, WiMAX connection, Zigbee connection, UWB (ultra wideband) connection, and other wireless connection methods known now or developed in the future.
  • S2 Generate a text mask based on the text position information and the text content information, and perform upsampling on the text mask to obtain a target mask.
  • a text mask emphasizing only the text part is generated based on the text position information and the text content information, and the size of the text mask is the same as that of the low-resolution picture.
  • the pixels where the text exists are marked as 1, and the pixels where the text does not exist are marked as 0, that is, a two-dimensional mask is obtained.
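The 0/1 marking described above can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes the text position information arrives as axis-aligned pixel rectangles, and the function name `build_text_mask` and the `(x0, y0, x1, y1)` box format are hypothetical.

```python
import numpy as np

def build_text_mask(height, width, boxes):
    """Return an H x W binary mask: 1 where text exists, 0 elsewhere.

    `boxes` is a hypothetical list of (x0, y0, x1, y1) pixel rectangles
    standing in for the recognizer's text position output.
    """
    mask = np.zeros((height, width), dtype=np.uint8)
    for x0, y0, x1, y1 in boxes:
        # Clip to the image bounds so out-of-range boxes cannot raise.
        x0, x1 = max(0, x0), min(width, x1)
        y0, y1 = max(0, y0), min(height, y1)
        if x0 < x1 and y0 < y1:
            mask[y0:y1, x0:x1] = 1
    return mask

# One text region covering columns 1-3 and rows 1-2 of a 4 x 6 picture.
mask = build_text_mask(4, 6, [(1, 1, 4, 3)])
```

The resulting array has the same height and width as the low-resolution picture, which is the property the description relies on.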
  • The text mask has size W*H and is therefore a low-resolution text mask.
  • Upsampling the text mask yields a new text mask, the target mask, of size rW*rH. The target mask then has the same size as the generated super-resolution picture. The received high-resolution picture is resized to match the size of the target mask, which simplifies subsequent calculations.
  • the target mask is used for subsequent supervision of the generation result of the generation layer (ie, the super-resolution image). This application does not need to label high-resolution images to complete the scene text recognition part of the operation.
  • The step of generating a text mask based on the text position information and the text content information includes:
  • correcting the text position information based on the text content information to obtain target text position information;
  • generating the text mask based on the target text position information.
  • The text mask is thus generated from both the text position information and the text content information.
  • Some regions of a picture may be captured unclearly, or the recognized position may be wrong; having the computer also use the recognized text content produces a more accurate mask. For example, a network that ignores content may output a mask for "goed", while a network that considers content outputs "good".
  • The step of upsampling the text mask to obtain the target mask includes:
  • performing multiple upsampling on the text mask to obtain the target mask.
  • The application upsamples the text mask by a factor of 5: the text mask is enlarged 5 times to improve its resolution, and the generated super-resolution picture is 5 times larger than the low-resolution picture.
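As an illustration of the upsampling step, the sketch below enlarges a binary mask by an integer factor r. The source does not state which interpolation is used; nearest-neighbour repetition is assumed here because it keeps the mask strictly 0/1. The name `upsample_mask` is hypothetical.

```python
import numpy as np

def upsample_mask(mask, r):
    """Enlarge a binary mask by an integer factor r (nearest neighbour)."""
    if r < 1:
        raise ValueError("r must be a positive integer")
    # Repeat each row r times, then each column r times.
    return np.repeat(np.repeat(mask, r, axis=0), r, axis=1)

lr_mask = np.array([[0, 1], [1, 0]], dtype=np.uint8)
target_mask = upsample_mask(lr_mask, 5)  # 2 x 2 mask becomes 10 x 10
```

With r = 5, an H x W mask becomes a 5H x 5W target mask, matching the size of the generated super-resolution picture.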
  • S3 Input the low-resolution picture and the target mask into the generation layer of a preset adversarial network to obtain an output super-resolution picture.
  • After the scene text is recognized, the computer generates a super-resolution picture through the generation layer of the generative adversarial network (Generative Adversarial Network, GAN): the low-resolution picture and the target mask are input simultaneously into the generation layer (Generator), which generates a super-resolution picture (Super Resolution image, SR image).
  • S4 Simultaneously input the super-resolution picture and the high-resolution picture into the discrimination layer of the adversarial network, obtain an output discrimination result, and calculate a discrimination accuracy rate based on the discrimination result.
  • The super-resolution picture and the high-resolution picture are input into the discrimination layer (Discriminator) at the same time, and the discrimination layer outputs a discrimination result indicating whether a picture is a super-resolution picture or a high-resolution picture. For example, the discrimination layer outputs 0 or 1, where 0 means the picture is a generated picture (super-resolution picture) and 1 means the picture is a real picture (high-resolution picture).
  • When the accuracy rate is lower than the accuracy threshold, the discrimination layer is judged to have difficulty distinguishing whether its input is a real picture or a super-resolution picture generated by the generation layer. This indicates that the generated super-resolution pictures are of high quality and similar to real pictures; the training goal is then met and the network can be used in practical applications.
  • The accuracy rate is the ratio of the number of correct discrimination results output by the discrimination layer within a preset time period to the total number of discrimination results.
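The accuracy computation and the stopping condition can be sketched as below. The function names and the threshold value 0.8 are hypothetical; the source specifies neither a threshold value nor how convergence of the loss is detected, so convergence is passed in as a flag.

```python
def discrimination_accuracy(predictions, labels):
    """Fraction of discriminator outputs that match the true labels.

    Both sequences use 0 for a generated (super-resolution) picture and
    1 for a real (high-resolution) picture.
    """
    if len(predictions) != len(labels) or not labels:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(int(p == t) for p, t in zip(predictions, labels))
    return correct / len(labels)

def training_done(loss_converged, accuracy, threshold):
    """Stop once the loss has converged and the discriminator is fooled."""
    return loss_converged and accuracy < threshold

acc = discrimination_accuracy([1, 0, 1, 1], [1, 0, 0, 1])  # 3 of 4 correct
done = training_done(True, acc, threshold=0.8)  # 0.8 is a placeholder
```

A low accuracy is the desired outcome here: it means the discrimination layer can no longer tell generated pictures from real ones.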
  • The loss function involved in this application is mainly the loss function of the generative adversarial network used to generate pictures. It includes a content loss function (content loss), an adversarial loss function (adversarial loss), a regularization loss function (regularization loss), and a text perception loss (text perceptual loss) designed for text masks.
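The source lists four loss terms but does not say how they are combined. One common choice, assumed here purely for illustration, is a weighted sum; the function name `total_loss` and the weights are hypothetical placeholders, not values from the application.

```python
def total_loss(content, adversarial, regularization, text_aware,
               weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four loss terms (weights are placeholders)."""
    terms = (content, adversarial, regularization, text_aware)
    return sum(w * t for w, t in zip(weights, terms))

# Example with arbitrary term values and weights.
loss = total_loss(0.5, 2.0, 100.0, 0.25, weights=(1.0, 0.5, 0.0, 2.0))
```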
  • The step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
  • The content loss function computes the mean square error; the width and length of both the super-resolution picture and the high-resolution picture are rW and rH respectively.
  • This application sums the pixel-wise differences between the super-resolution picture and the high-resolution picture over all rW × rH positions and divides by the number of pixels to obtain the content loss. What is calculated is the loss between the super-resolution picture and the high-resolution picture.
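A minimal sketch of a mean-square-error content loss follows. The source's formula did not survive extraction, so the squared-difference form is an assumption based on the phrase "mean square error"; the name `content_loss` is hypothetical.

```python
import numpy as np

def content_loss(sr, hr):
    """Mean squared error between two images of identical shape."""
    sr = np.asarray(sr, dtype=np.float64)
    hr = np.asarray(hr, dtype=np.float64)
    if sr.shape != hr.shape:
        raise ValueError("images must have the same shape")
    return float(np.mean((sr - hr) ** 2))

sr = np.array([[0.0, 1.0], [2.0, 3.0]])
hr = np.array([[0.0, 1.0], [2.0, 5.0]])
mse = content_loss(sr, hr)  # one pixel differs by 2: 4 / 4 pixels
```

The shape check reflects the requirement that the high-resolution picture be resized to rW × rH before the loss is computed.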
  • The step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
  • The adversarial loss function is characterized as follows:
  • $G_{\theta_G}(I^{LR})$ is the super-resolution picture;
  • $D_{\theta_D}$ is the discrimination layer;
  • $M$ is the total number of super-resolution pictures input into the discrimination layer;
  • $m$ is the index of a super-resolution picture.
  • The adversarial loss requires the discrimination layer D to successfully distinguish between the super-resolution picture generated by the generation layer G and the natural high-resolution picture input to it.
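The adversarial loss formula itself is missing from this text. The sketch below assumes the widely used generator-side form, the sum of -log D(G(I_LR)) over the M generated pictures; this is an assumption, not a confirmed reading of the application, and `adversarial_loss` is a hypothetical name.

```python
import numpy as np

def adversarial_loss(d_scores, eps=1e-12):
    """Sum of -log D(G(I_LR)) over the M generated pictures.

    `d_scores` holds the discriminator's probability that each generated
    picture is real; `eps` guards against log(0).
    """
    d = np.clip(np.asarray(d_scores, dtype=np.float64), eps, 1.0)
    return float(-np.sum(np.log(d)))

loss_fooled = adversarial_loss([0.9, 0.9])  # discriminator mostly fooled
loss_caught = adversarial_loss([0.1, 0.1])  # discriminator mostly right
```

Under this form the generation layer is penalized more heavily when the discrimination layer confidently identifies its output as generated.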
  • The step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
  • In the regularization loss function, $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position $(x, y)$, $rW$ and $rH$ are the width and length of the super-resolution picture, $r^2WH$ is the total number of pixels in the target mask, and $\lVert\cdot\rVert$ denotes the norm.
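The regularization formula is also missing here. Given the symbols that survive (a per-pixel value, a norm, and division by r²WH), a total-variation style penalty is a plausible reading and is assumed in the sketch below; the forward-difference gradient and the name `regularization_loss` are illustrative choices.

```python
import numpy as np

def regularization_loss(sr):
    """Average gradient magnitude of a 2-D image (total variation)."""
    sr = np.asarray(sr, dtype=np.float64)
    # Forward differences, left at zero in the last column / row.
    dx = np.zeros_like(sr)
    dy = np.zeros_like(sr)
    dx[:, :-1] = sr[:, 1:] - sr[:, :-1]
    dy[:-1, :] = sr[1:, :] - sr[:-1, :]
    # L2 norm of the gradient at each pixel, averaged over all pixels.
    return float(np.mean(np.sqrt(dx ** 2 + dy ** 2)))

flat = regularization_loss(np.ones((4, 4)))
edge = regularization_loss(np.array([[0.0, 1.0], [0.0, 1.0]]))
```

A constant picture gives zero loss, while abrupt pixel-to-pixel changes increase it, which encourages spatially coherent output.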
  • The step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
  • $l_{TR}$ is the text perception loss function and $N$ is the total number of pixels at positions where text exists; the remaining two terms of the function are the target mask and the super-resolution picture.
  • Summing all the differences and dividing by N gives the text perception loss. Through the text perception loss, the generation layer produces clearer text when constructing new pictures. In this application, pixels at positions where text exists are marked 1 in the mask and all other pixels are marked 0, and the target mask is generated after upsampling. The target mask supervises the output of the generation layer through the text perception loss function so that only the text is emphasized.
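A sketch of a mask-weighted loss in the spirit of the text perception loss follows. The exact formula is not recoverable from this text: the sketch assumes the "differences" are squared differences between the super-resolution and high-resolution pictures, restricted to pixels the target mask marks as text. The name `text_perception_loss` is hypothetical.

```python
import numpy as np

def text_perception_loss(sr, hr, target_mask):
    """Mean squared SR/HR difference over pixels the mask marks as text."""
    sr = np.asarray(sr, dtype=np.float64)
    hr = np.asarray(hr, dtype=np.float64)
    m = np.asarray(target_mask, dtype=np.float64)
    n = m.sum()  # N: number of pixels where text exists
    if n == 0:
        return 0.0  # no text pixels, nothing to penalize
    return float(np.sum(m * (sr - hr) ** 2) / n)

sr = np.array([[1.0, 5.0], [1.0, 1.0]])
hr = np.array([[1.0, 1.0], [3.0, 1.0]])
mask = np.array([[0, 0], [1, 1]])  # only the bottom row contains text
l_tr = text_perception_loss(sr, hr, mask)
```

In this example the large error in the top row is ignored because the mask marks it as background; only errors on text pixels contribute.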
  • S6 Receive the low-resolution picture to be converted, input the low-resolution picture to be converted into the trained adversarial network, and obtain the output target super-resolution picture.
  • a target super-resolution picture with higher quality can be generated to ensure that the text information in the picture is clear and complete.
  • The application obtains text position information and text content information from the received low-resolution picture and generates a text mask based on them.
  • Because the generation of the text mask takes both the text position information and the text content information into account, the boundary between the text in the picture and the surrounding image can be made clear, so that the text in the subsequently generated super-resolution picture is clear and the quality of the reconstructed picture is significantly improved.
  • Upsampling the text mask enlarges it and improves its resolution, which facilitates the subsequent generation of super-resolution pictures.
  • the trained adversarial network is obtained to generate a target super-resolution image with better quality.
  • The above-mentioned trained adversarial network can also be stored in a blockchain node.
  • A blockchain is essentially a decentralized database: a series of data blocks linked to one another using cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • The blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • Artificial intelligence (AI) uses digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, robotics technology, biometrics technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • the application can be applied in the field of smart medical treatment, and can be used to restore low-resolution pictures in the medical field, thereby promoting the construction of a smart city.
  • The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (Read-Only Memory, ROM), or it may be a random access memory (Random Access Memory, RAM).
  • The present application provides an embodiment of a text image super-resolution reconstruction device, which corresponds to the method embodiment shown in FIG. 2. The device can be applied to various electronic devices.
  • The text image super-resolution reconstruction device 300 in this embodiment includes: a receiving module 301, an upsampling module 302, a generating module 303, a discrimination module 304, a calculation module 305 and an obtaining module 306.
  • The receiving module 301 is configured to receive the low-resolution picture and the corresponding high-resolution picture, input the low-resolution picture into the pre-trained scene text recognition model, and obtain the output text position information and text content information;
  • the upsampling module 302 is configured to generate a text mask based on the text position information and the text content information, and upsample the text mask to obtain a target mask;
  • the generating module 303 is configured to input the low-resolution picture and the target mask into the generation layer of the preset adversarial network to obtain the output super-resolution picture;
  • the discrimination module 304 is configured to simultaneously input the super-resolution picture and the high-resolution picture into the discrimination layer of the adversarial network, obtain the output discrimination result, and calculate the discrimination accuracy rate based on the discrimination result;
  • the calculation module 305 is configured to calculate the loss function of the adversarial network based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than the accuracy threshold, whereupon the trained adversarial network is obtained;
  • the obtaining module 306 is configured to receive the low-resolution picture to be converted, input it into the trained adversarial network, and obtain the output target super-resolution picture.
  • The application obtains text position information and text content information from the received low-resolution picture and generates a text mask based on them. Because the generation of the text mask takes both the text position information and the text content information into account, the boundary between the text in the picture and the surrounding image can be made clear, so that the text in the subsequently generated super-resolution picture is clear and the quality of the reconstructed picture is significantly improved.
  • Upsampling the text mask enlarges it and improves its resolution, which facilitates the subsequent generation of super-resolution pictures.
  • the trained adversarial network is obtained to generate a target super-resolution image with better quality.
  • The upsampling module 302 includes a correction submodule and a generation submodule. The correction submodule is used to correct the text position information based on the text content information to obtain target text position information; the generation submodule is used to generate the text mask based on the target text position information.
  • the upsampling module 302 is further configured to: perform multiple upsampling on the text mask to obtain the target mask.
  • the calculation module 305 is further configured to: calculate a content loss function of the adversarial network based on the low-resolution picture, and the content loss function is characterized by:
  • the calculation module 305 is further configured to: calculate the adversarial loss function of the adversarial network based on the low-resolution picture, and the characteristics of the adversarial loss function are:
  • $G_{\theta_G}(I^{LR})$ is the super-resolution picture;
  • $D_{\theta_D}$ is the discrimination layer;
  • $M$ is the total number of super-resolution pictures;
  • $m$ is the index of a super-resolution picture.
  • the calculation module 305 is further configured to: calculate a regularization loss function of the adversarial network based on the low-resolution picture, and the regularization loss function is characterized by:
  • In the regularization loss function, $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position $(x, y)$, $rW$ and $rH$ are the width and length of the super-resolution picture, $r^2WH$ is the total number of pixels in the target mask, and $\lVert\cdot\rVert$ denotes the norm.
  • the calculation module 305 is further configured to: calculate a text-aware loss function of the adversarial network based on the low-resolution picture, and the characteristics of the text-aware loss function are:
  • $l_{TR}$ is the text perception loss function and $N$ is the total number of pixels at positions where text exists; the remaining two terms of the function are the target mask and the super-resolution picture.
  • FIG. 4 is a block diagram of the basic structure of the computer device in this embodiment.
  • The computer device 200 includes a memory 201, a processor 202, and a network interface 203 that communicate with one another through a system bus. Note that the figure shows only the computer device 200 with components 201-203; not all of the illustrated components are required, and more or fewer components may be implemented instead. As those skilled in the art will understand, the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions. Its hardware includes but is not limited to microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), and embedded devices.
  • the computer equipment may be computing equipment such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device can perform human-computer interaction with the user through keyboard, mouse, remote controller, touch panel or voice control device.
  • The memory 201 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disc, etc.
  • The computer-readable storage medium may be non-volatile or volatile.
  • The memory 201 may be an internal storage unit of the computer device 200, such as a hard disk or internal memory of the computer device 200.
  • The memory 201 can also be an external storage device of the computer device 200, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash memory card (Flash Card) fitted to the computer device 200.
  • The memory 201 may also include both the internal storage unit of the computer device 200 and its external storage device.
  • the memory 201 is generally used to store the operating system and various application software installed in the computer device 200 , such as computer-readable instructions of a text image super-resolution reconstruction method and the like.
  • the memory 201 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 202 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chips in some embodiments.
  • the processor 202 is generally used to control the overall operation of the computer device 200.
  • the processor 202 is configured to run computer-readable instructions stored in the memory 201 or process data, such as computer-readable instructions for running the text image super-resolution reconstruction method.
  • the network interface 203 may include a wireless network interface or a wired network interface, and the network interface 203 is generally used to establish a communication connection between the computer device 200 and other electronic devices.
  • the generation of the text mask takes into account the position information and content information of the text, so that the boundary between the text in the picture and the surrounding image can be clarified, and the quality of the reconstructed picture can be significantly improved.
  • the trained adversarial network is obtained to generate a target super-resolution image with better quality.
  • the present application also provides another implementation, namely a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor, so that the at least one processor executes the steps of the above text image super-resolution reconstruction method.
  • the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, but in many cases the former is the better implementation.
  • the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disc) and contains several instructions to make a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods described in the various embodiments of the present application.

Abstract

Embodiments of the present application belong to the technical field of artificial intelligence, are applied to the field of intelligent medical treatment, and relate to a super-resolution reconstruction method for a text image and a related device thereof. The method comprises: inputting a low-resolution picture into a scene text recognition model to obtain text position and text content information; generating a text mask on the basis of the text position information and the text content information, and upsampling the text mask to obtain a target mask; inputting the low-resolution picture and the target mask into an adversarial network to obtain a discrimination result, and calculating discrimination accuracy on the basis of the discrimination result; calculating a loss function on the basis of the low-resolution picture and the target mask until the loss function converges, and obtaining a trained adversarial network when the discrimination accuracy is lower than an accuracy threshold; and inputting a received low-resolution picture to be converted into the trained adversarial network to obtain a target super-resolution picture. The trained adversarial network may be stored in a blockchain. The present application may ensure the quality of the super-resolution reconstruction of a text image.

Description

Text image super-resolution reconstruction method and related device
This application claims priority to Chinese patent application No. 202111061974.7, filed with the China Patent Office on September 10, 2021 and entitled "Text Image Super-Resolution Reconstruction Method and Related Device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to a text image super-resolution reconstruction method and related device.
Background
Super-resolution reconstruction means that, for any given low-resolution picture, a corresponding high-resolution picture is generated by a convolutional neural network, with the details and textures in the picture preserved and restored as far as possible. Super-resolution reconstruction technology has helped advance related fields such as image classification, segmentation, tracking and dehazing, and occupies an important position in the development of neural networks.
The inventors realized that text pictures differ from natural scenes: text content has fixed shapes and sharp edges, so the reconstruction requirements are higher. In ordinary pictures, most of the scenery is natural and irregular, and converting a low-resolution picture into a high-resolution one is relatively easy. For text in a scene, however, distortion, abrupt color changes, or text edges that blend into other scenery and become blurred in the reconstructed picture will all significantly reduce its quality.
Summary
The purpose of the embodiments of the present application is to propose a text image super-resolution reconstruction method and related device, so as to ensure the quality of super-resolution reconstruction of text images.
To solve the above technical problem, an embodiment of the present application provides a text image super-resolution reconstruction method, which adopts the following technical solution:
A text image super-resolution reconstruction method, comprising the following steps:
receiving a low-resolution picture and a corresponding high-resolution picture, and inputting the low-resolution picture into a pre-trained scene text recognition model to obtain output text position information and text content information;
generating a text mask based on the text position information and the text content information, and upsampling the text mask to obtain a target mask;
inputting the low-resolution picture and the target mask into a generation layer of a preset adversarial network to obtain an output super-resolution picture;
inputting the super-resolution picture and the high-resolution picture simultaneously into a discrimination layer of the adversarial network to obtain an output discrimination result, and calculating a discrimination accuracy based on the discrimination result;
calculating a loss function of the adversarial network based on the low-resolution picture and the target mask, and obtaining a trained adversarial network when the loss function has converged and the discrimination accuracy is lower than an accuracy threshold;
receiving a low-resolution picture to be converted, and inputting it into the trained adversarial network to obtain an output target super-resolution picture.
To solve the above technical problem, an embodiment of the present application further provides a text image super-resolution reconstruction device, which adopts the following technical solution:
A text image super-resolution reconstruction device, comprising:
a receiving module, configured to receive a low-resolution picture and a corresponding high-resolution picture, and input the low-resolution picture into a pre-trained scene text recognition model to obtain output text position information and text content information;
an upsampling module, configured to generate a text mask based on the text position information and the text content information, and upsample the text mask to obtain a target mask;
a generating module, configured to input the low-resolution picture and the target mask into a generation layer of a preset adversarial network to obtain an output super-resolution picture;
a discrimination module, configured to input the super-resolution picture and the high-resolution picture simultaneously into a discrimination layer of the adversarial network to obtain an output discrimination result, and calculate a discrimination accuracy based on the discrimination result;
a calculation module, configured to calculate a loss function of the adversarial network based on the low-resolution picture and the target mask, and obtain a trained adversarial network when the loss function has converged and the discrimination accuracy is lower than an accuracy threshold;
an obtaining module, configured to receive a low-resolution picture to be converted, and input it into the trained adversarial network to obtain an output target super-resolution picture.
To solve the above technical problem, an embodiment of the present application further provides a computer device, which adopts the following technical solution:
A computer device, comprising a memory and a processor, wherein the memory stores computer-readable instructions, and the processor, when executing the computer-readable instructions, implements the following steps of the text image super-resolution reconstruction method:
receiving a low-resolution picture and a corresponding high-resolution picture, and inputting the low-resolution picture into a pre-trained scene text recognition model to obtain output text position information and text content information;
generating a text mask based on the text position information and the text content information, and upsampling the text mask to obtain a target mask;
inputting the low-resolution picture and the target mask into a generation layer of a preset adversarial network to obtain an output super-resolution picture;
inputting the super-resolution picture and the high-resolution picture simultaneously into a discrimination layer of the adversarial network to obtain an output discrimination result, and calculating a discrimination accuracy based on the discrimination result;
calculating a loss function of the adversarial network based on the low-resolution picture and the target mask, and obtaining a trained adversarial network when the loss function has converged and the discrimination accuracy is lower than an accuracy threshold;
receiving a low-resolution picture to be converted, and inputting it into the trained adversarial network to obtain an output target super-resolution picture.
To solve the above technical problem, an embodiment of the present application further provides a computer-readable storage medium, which adopts the following technical solution:
A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the following steps of the text image super-resolution reconstruction method:
receiving a low-resolution picture and a corresponding high-resolution picture, and inputting the low-resolution picture into a pre-trained scene text recognition model to obtain output text position information and text content information;
generating a text mask based on the text position information and the text content information, and upsampling the text mask to obtain a target mask;
inputting the low-resolution picture and the target mask into a generation layer of a preset adversarial network to obtain an output super-resolution picture;
inputting the super-resolution picture and the high-resolution picture simultaneously into a discrimination layer of the adversarial network to obtain an output discrimination result, and calculating a discrimination accuracy based on the discrimination result;
calculating a loss function of the adversarial network based on the low-resolution picture and the target mask, and obtaining a trained adversarial network when the loss function has converged and the discrimination accuracy is lower than an accuracy threshold;
receiving a low-resolution picture to be converted, and inputting it into the trained adversarial network to obtain an output target super-resolution picture.
Compared with the prior art, the embodiments of the present application mainly have the following beneficial effects:
The present application obtains text position information and text content information from the received low-resolution picture and generates a text mask based on both. Because the text mask takes into account the position and the content of the text, the boundary between the text and the surrounding image can be made explicit, so that the text in the subsequently generated super-resolution picture is clear and the quality of the reconstructed picture is significantly improved. Upsampling enlarges the text mask and raises its resolution, which facilitates the subsequent generation of the super-resolution picture. Through adversarial training of the generation layer and the discrimination layer in the adversarial network, a trained adversarial network is obtained that generates a higher-quality target super-resolution picture.
Brief Description of the Drawings
To explain the solutions in the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is an exemplary system architecture diagram to which the present application can be applied;
FIG. 2 is a flowchart of an embodiment of the text image super-resolution reconstruction method according to the present application;
FIG. 3 is a schematic structural diagram of an embodiment of the text image super-resolution reconstruction device according to the present application;
FIG. 4 is a schematic structural diagram of an embodiment of a computer device according to the present application.
Reference numerals: 200, computer device; 201, memory; 202, processor; 203, network interface; 300, text image super-resolution reconstruction device; 301, receiving module; 302, upsampling module; 303, generating module; 304, discrimination module; 305, calculation module; 306, obtaining module.
Detailed Description
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present application. The terms used in the specification are only for the purpose of describing specific embodiments and are not intended to limit the present application. The terms "comprising" and "having" and any variations thereof in the specification, the claims and the above description of the drawings are intended to cover non-exclusive inclusion. The terms "first", "second" and the like in the specification, the claims or the above drawings are used to distinguish different objects, not to describe a specific order.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
To enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings.
As shown in FIG. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables.
Users can use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages and the like. Various communication client applications can be installed on the terminal devices 101, 102, 103, such as web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, 103 can be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers and desktop computers.
The server 105 may be a server that provides various services, such as a background server that supports the pages displayed on the terminal devices 101, 102, 103.
It should be noted that the text image super-resolution reconstruction method provided in the embodiments of the present application is generally executed by a server/terminal device, and correspondingly, the text image super-resolution reconstruction device is generally arranged in the server/terminal device.
It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are only illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.
With continued reference to FIG. 2, a flowchart of an embodiment of the text image super-resolution reconstruction method according to the present application is shown. The text image super-resolution reconstruction method comprises the following steps:
S1: Receive a low-resolution picture and a corresponding high-resolution picture, and input the low-resolution picture into a pre-trained scene text recognition model to obtain output text position information and text content information.
In this embodiment, the low-resolution picture (low-resolution image, LR image) has a size of W*H. The low-resolution picture is input into the scene text recognition model to obtain the position and content of the scene text. The scene text recognition model of the present application is the text recognition model ASTER ("ASTER: An Attentional Scene Text Recognizer with Flexible Rectification"), which is trained in advance.
In this embodiment, the electronic device on which the text image super-resolution reconstruction method runs (such as the server/terminal device shown in FIG. 1) can receive the low-resolution picture and the corresponding high-resolution picture through a wired or wireless connection. It should be pointed out that the wireless connection may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a Zigbee connection, a UWB (ultra wideband) connection, and other wireless connection methods now known or developed in the future.
S2: Generate a text mask based on the text position information and the text content information, and upsample the text mask to obtain a target mask.
In this embodiment, a text mask that emphasizes only the text regions is generated based on the text position information and the text content information, and the text mask has the same size as the low-resolution picture. In the text mask, pixels where text exists are marked 1 and pixels where no text exists are marked 0, which yields a two-dimensional mask. A text mask of size H*W is the text mask of a low-resolution picture. The text mask is upsampled to obtain a new text mask, the target mask, of size rW*rH, which is the same size as the generated high-resolution picture. The received high-resolution picture needs to be resized to match the size of the target mask to facilitate subsequent calculations. The target mask is later used to supervise the output of the generation layer (that is, the super-resolution picture). The present application does not need to annotate the high-resolution picture; this completes the scene text recognition part of the operation.
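The 0/1 labelling described above can be illustrated with a minimal sketch. It assumes that the text position information is available as axis-aligned rectangles `(x0, y0, x1, y1)` in pixel coordinates; this box format and the name `make_text_mask` are hypothetical and are not specified by the application.

```python
def make_text_mask(width, height, boxes):
    """Return a height x width binary mask: 1 inside any text box, 0 elsewhere.

    boxes: iterable of (x0, y0, x1, y1) pixel rectangles, x1 and y1 exclusive.
    The rectangular box format is an assumption made for illustration only.
    """
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        # Clip each box to the picture so out-of-range detections are harmless.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = 1
    return mask


# An 8 x 4 low-resolution picture with one detected text region.
mask = make_text_mask(width=8, height=4, boxes=[(1, 1, 5, 3)])
for row in mask:
    print("".join(str(v) for v in row))
```

The resulting mask has the same width and height as the low-resolution picture, as the paragraph above requires.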
Specifically, the step of generating a text mask based on the text position information and the text content information includes:
correcting the text position information based on the text content information to obtain target text position information;
generating the text mask based on the target text position information.
In this embodiment, the text mask is generated based on the text position information and the text content information. In a picture, some regions may be captured unclearly, or the position recognition result may be wrong; having recognized the text content information, the computer can generate a more accurate mask. For example, without considering content the output mask may read "goed", whereas a network that considers content outputs "good".
In addition, as another embodiment of the present application, the step of upsampling the text mask to obtain a target mask includes:
upsampling the text mask by a multiple to obtain the target mask.
In this embodiment, the present application upsamples the text mask by a factor of 5. The text mask is enlarged 5 times and its resolution is increased, so the generated super-resolution picture is 5 times larger than the low-resolution picture.
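As a rough sketch of this step, the mask can be enlarged by an integer factor r with nearest-neighbour replication, which keeps the mask strictly binary. The application does not state which interpolation method is used, so nearest-neighbour is an assumption here, and `upsample_mask` is a hypothetical helper name.

```python
def upsample_mask(mask, r):
    """Nearest-neighbour upsampling: every mask pixel becomes an r x r block.

    The interpolation method is an assumption; the source only gives the factor.
    """
    if r < 1:
        raise ValueError("r must be a positive integer")
    out = []
    for row in mask:
        # Repeat each value r times horizontally, then the widened row r times vertically.
        wide = [v for v in row for _ in range(r)]
        out.extend(list(wide) for _ in range(r))
    return out


low = [[0, 1],
       [1, 0]]
target = upsample_mask(low, 5)  # a W*H = 2*2 mask becomes rW*rH = 10*10
print(len(target[0]), len(target))
```

With r = 5, a mask of size W*H becomes a target mask of size 5W*5H, matching the size of the super-resolution picture.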
S3: Input the low-resolution picture and the target mask into a generation layer of a preset adversarial network to obtain an output super-resolution picture.
In this embodiment, after scene text recognition, the computer produces the super-resolution picture through the generation layer of a generative adversarial network (GAN): the low-resolution picture and the target mask are input together into the generation layer (generator) of the generative adversarial network, and the generator produces the super-resolution picture (super-resolution image, SR image).
S4: Input the super-resolution picture and the high-resolution picture simultaneously into a discrimination layer of the adversarial network to obtain an output discrimination result, and calculate a discrimination accuracy based on the discrimination result.
In this embodiment, the super-resolution picture and the high-resolution picture (high-resolution image, HR image) are input together into the discrimination layer (discriminator), and the discrimination layer outputs a discrimination result, that is, whether the input is a super-resolution picture or a high-resolution picture. For example, the discrimination layer outputs 0 or 1, where 0 means the picture is a generated picture (super-resolution picture) and 1 means the picture is a real picture (high-resolution picture). Through adversarial training of the generation layer and the discrimination layer, the super-resolution pictures produced by the generation layer become more and more similar to high-resolution pictures of natural scenes and increasingly difficult to tell apart. When the accuracy of the discrimination layer output falls below the accuracy threshold, it is determined that the discrimination layer can hardly tell whether its input is a real picture or a super-resolution picture produced by the generation layer. This indicates that the generated super-resolution pictures are of high quality and similar to real pictures, so the training objective has been met and the network can be used in practical applications. The accuracy is calculated as the ratio of the number of correct discrimination results output by the discrimination layer within a preset time period to the total number of discriminations.
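The accuracy defined above is a plain ratio of correct results to total results. A minimal sketch follows, using the 0/1 labelling from the text; the function name and the sample data are illustrative only.

```python
def discrimination_accuracy(predictions, labels):
    """Fraction of discriminator outputs that match the true labels.

    Labels follow the convention in the text: 0 = generated (super-resolution)
    picture, 1 = real (high-resolution) picture.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one discrimination result is required")
    correct = sum(1 for p, t in zip(predictions, labels) if p == t)
    return correct / len(labels)


# Hypothetical results collected over one evaluation window.
labels = [0, 1, 0, 1, 0, 1, 0, 1]
predictions = [0, 1, 1, 1, 0, 0, 1, 1]
accuracy = discrimination_accuracy(predictions, labels)
print(accuracy)  # 5 of 8 correct -> 0.625
```

A value close to 0.5 means the discriminator is doing little better than guessing, which is the situation the accuracy threshold is meant to detect.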
S5: Calculate a loss function of the adversarial network based on the low-resolution picture and the target mask, and obtain a trained adversarial network when the loss function has converged and the discrimination accuracy is lower than an accuracy threshold.
In this embodiment, the loss functions involved are mainly those used when a generative adversarial network generates pictures, including a content loss function (content loss), an adversarial loss function (adversarial loss) and a regularization loss function (regularization loss), together with a text perceptual loss designed for the text mask. When the loss function has converged and the discrimination accuracy is lower than the accuracy threshold, training of the adversarial network is determined to be complete, and an adversarial network with better performance is obtained.
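The two-part stopping condition can be sketched as follows. The application does not define how convergence is tested, so this sketch assumes a simple criterion: the loss is treated as converged when its spread over the last few recorded values is below a tolerance. The names `window` and `tolerance` and the sample values are hypothetical.

```python
def training_finished(loss_history, accuracy, accuracy_threshold,
                      window=5, tolerance=1e-3):
    """True when the loss has flattened out AND the discriminator accuracy is low.

    Convergence test (an assumption): the max-min spread of the last `window`
    loss values is smaller than `tolerance`.
    """
    if len(loss_history) < window:
        return False
    recent = loss_history[-window:]
    converged = max(recent) - min(recent) < tolerance
    return converged and accuracy < accuracy_threshold


losses = [0.9, 0.5, 0.3, 0.2001, 0.2000, 0.2002, 0.2001, 0.2000]
done = training_finished(losses, accuracy=0.52, accuracy_threshold=0.55)
not_done = training_finished(losses, accuracy=0.80, accuracy_threshold=0.55)
print(done, not_done)  # True False
```

Both conditions must hold: a converged loss with a discriminator that still separates real from generated pictures easily does not end training.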
Specifically, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
calculating the content loss function of the adversarial network based on the low-resolution picture, the content loss function being characterized by:
[Formula: Figure PCTCN2022071883-appb-000001]
where [Figure PCTCN2022071883-appb-000002] is the content loss function, [Figure PCTCN2022071883-appb-000003] is the value of the pixel of the high-resolution picture at position (x, y), $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position (x, y), rW and rH are the width and length of the super-resolution picture respectively, and $r^{2}WH$ is the total number of pixels in the super-resolution picture.
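The formula itself survives in this text only as an image reference. A reconstruction consistent with the symbol definitions above, assuming the usual pixel-wise mean-square-error form, might read as follows. The names $l_{\mathrm{content}}$ and $I^{HR}_{x,y}$ are placeholders chosen here, since the original symbols are not recoverable from the text.

$$
l_{\mathrm{content}} = \frac{1}{r^{2}WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left(I^{HR}_{x,y} - G_{\theta_G}(I^{LR})_{x,y}\right)^{2}
$$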
In this embodiment, the content loss function computes the mean squared error. The super-resolution picture and the high-resolution picture both have width rW and length rH. The squared differences between the pixels of the two pictures are summed over all positions and divided by the number of pixels to give the content loss, which therefore measures the loss between the super-resolution picture and the high-resolution picture.
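The mean-square computation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the applicant's implementation: pictures are represented as nested lists of single-channel pixel values, and the name `content_loss` is hypothetical.

```python
from typing import List

Image = List[List[float]]  # rH rows, each holding rW pixel values


def content_loss(hr: Image, sr: Image) -> float:
    """Mean of squared pixel differences between an HR and an SR picture of equal size."""
    if len(hr) != len(sr) or any(len(a) != len(b) for a, b in zip(hr, sr)):
        raise ValueError("hr and sr must have the same width and length")
    total = 0.0
    count = 0
    for hr_row, sr_row in zip(hr, sr):
        for h, s in zip(hr_row, sr_row):
            total += (h - s) ** 2
            count += 1
    if count == 0:
        raise ValueError("pictures must contain at least one pixel")
    return total / count


# Two of the four pixels differ by 0.5, the other two match.
hr_picture = [[0.0, 1.0], [1.0, 0.0]]
sr_picture = [[0.0, 0.5], [1.0, 0.5]]
loss = content_loss(hr_picture, sr_picture)
```

The squared differences sum to 0.5 over four pixels, giving 0.125.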
As another embodiment of the present application, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
calculating the adversarial loss function of the adversarial network based on the low-resolution picture, the adversarial loss function being characterized by:
[Formula: Figure PCTCN2022071883-appb-000004]
where [Figure PCTCN2022071883-appb-000005] is the adversarial loss function, $G_{\theta_G}(I^{LR})$ is the super-resolution picture, $D_{\theta_D}$ is the discrimination layer, M is the total number of super-resolution pictures, and m is the ordinal number of a super-resolution picture.
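The formula is again available only as an image reference. Assuming the generator-side adversarial loss commonly used in super-resolution GANs, which is consistent with the symbols listed above, a plausible reconstruction is the following, where $l_{\mathrm{adv}}$ is a placeholder name and $I^{LR}_{m}$ denotes the m-th low-resolution input (the subscript is added here for clarity):

$$
l_{\mathrm{adv}} = \sum_{m=1}^{M} -\log D_{\theta_D}\!\left(G_{\theta_G}\!\left(I^{LR}_{m}\right)\right)
$$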
In this embodiment, the adversarial loss requires the discrimination layer D to successfully distinguish the super-resolution pictures produced by the generation layer G from the natural high-resolution pictures input to it. Through adversarial training of the generation layer and the discrimination layer, the quality of the super-resolution pictures produced by the network gradually improves. M is the total number of super-resolution pictures input into the discrimination layer.
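As a rough illustration of how such a loss behaves, the sketch below sums a negative log term over M generated pictures. It assumes the discrimination layer yields, for each picture, a probability in (0, 1) that the picture is real. That is an assumption: the text only describes a 0/1 output, and the negative-log form is the common GAN choice rather than something confirmed by the source. The name `adversarial_loss` and the `eps` guard are hypothetical.

```python
import math
from typing import Sequence


def adversarial_loss(d_scores_on_sr: Sequence[float], eps: float = 1e-12) -> float:
    """Sum of -log(score) over the M super-resolution pictures.

    d_scores_on_sr: assumed probabilities that each generated picture is real.
    eps: lower bound that keeps log() finite when a score is 0.
    """
    return sum(-math.log(max(score, eps)) for score in d_scores_on_sr)


# The loss is small when the discrimination layer is fooled (scores near 1)
# and large when it confidently rejects the generated pictures (scores near 0).
loss_when_fooled = adversarial_loss([0.9, 0.9])
loss_when_rejected = adversarial_loss([0.1, 0.1])
```

Minimizing this quantity therefore pushes the generation layer toward pictures that the discrimination layer rates as real.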
In addition, as yet another embodiment of the present application, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
calculating the regularization loss function of the adversarial network based on the low-resolution picture, the regularization loss function being characterized by:
[Formula: Figure PCTCN2022071883-appb-000006]
where [Figure PCTCN2022071883-appb-000007] is the regularization loss function, $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position (x, y), rW and rH are the width and length of the target mask respectively, $r^{2}WH$ is the total number of pixels in the target mask, ‖·‖ denotes the norm, and [Figure PCTCN2022071883-appb-000008] denotes the gradient.
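Since the formula is only an image reference here, the following is a reconstruction under the assumption of a total-variation style regularizer, which matches the listed symbols (a norm of the gradient of the super-resolution picture, averaged over $r^{2}WH$ pixels). The name $l_{\mathrm{reg}}$ is a placeholder, and the type of norm is not specified in the text.

$$
l_{\mathrm{reg}} = \frac{1}{r^{2}WH}\sum_{x=1}^{rW}\sum_{y=1}^{rH}\left\lVert \nabla G_{\theta_G}(I^{LR})_{x,y} \right\rVert
$$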
In this embodiment, adding the regularization loss function prevents the network from overfitting and speeds up convergence of the overall loss function.
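A gradient-norm regularizer of this kind can be sketched as follows. The discretization is an assumption, because the source specifies neither the norm nor how the gradient is approximated: this sketch uses forward differences, a Euclidean norm, and a zero gradient component at the last row and column. The name `regularization_loss` is hypothetical.

```python
import math
from typing import List

Image = List[List[float]]  # rows of pixel values


def regularization_loss(sr: Image) -> float:
    """Mean Euclidean norm of the forward-difference gradient of a picture."""
    height = len(sr)
    width = len(sr[0]) if height else 0
    if height == 0 or width == 0:
        raise ValueError("picture must contain at least one pixel")
    total = 0.0
    for y in range(height):
        for x in range(width):
            # Forward differences; the component is taken as 0 at the last column or row.
            dx = sr[y][x + 1] - sr[y][x] if x + 1 < width else 0.0
            dy = sr[y + 1][x] - sr[y][x] if y + 1 < height else 0.0
            total += math.hypot(dx, dy)
    return total / (height * width)


flat_picture = [[0.5, 0.5], [0.5, 0.5]]   # no variation at all
edge_picture = [[0.0, 1.0], [0.0, 1.0]]   # one vertical edge
flat_loss = regularization_loss(flat_picture)
edge_loss = regularization_loss(edge_picture)
```

A constant picture incurs no penalty, while a picture with sharp transitions is penalized in proportion to how much its pixel values vary.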
As another embodiment of the present application, the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask includes:
calculating the text-aware loss function of the adversarial network based on the low-resolution picture, the text-aware loss function being characterized by:
[Formula: Figure PCTCN2022071883-appb-000009]
where $l_{TR}$ is the text-aware loss function, N is the total number of pixels at positions where text exists, [Figure PCTCN2022071883-appb-000010] is the target mask, and [Figure PCTCN2022071883-appb-000011] is the super-resolution picture.
In this embodiment, the pixel value difference is calculated between the positions where text exists in the target mask [Figure PCTCN2022071883-appb-000012] and the corresponding positions of the picture generated by the generation layer, and N is the total number of pixels at positions where text exists. Summing all the differences and dividing by N gives the text-aware loss function. With the text-aware loss function, the generation layer produces clearer text when constructing a new picture. In this application, pixels of the mask at positions where text exists are labeled 1 and pixels at positions where no text exists are labeled 0; the target mask is generated after upsampling. Through the text-aware loss function, the target mask supervises the output of the generation layer so that only the text is emphasized.
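The averaging over text positions can be sketched as below. The exact formula is not recoverable from this text, so two points are assumptions: the difference is taken as an absolute value, and the picture that the super-resolution output is compared against is passed in as `target`. The passage reads as if this were the target mask itself; a high-resolution reference would be another reading. The name `text_aware_loss` is hypothetical.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[int]]  # 1 where text exists, 0 elsewhere


def text_aware_loss(target: Image, sr: Image, mask: Mask) -> float:
    """Average absolute pixel difference over the N positions where mask == 1."""
    total = 0.0
    n_text_pixels = 0
    for target_row, sr_row, mask_row in zip(target, sr, mask):
        for t, s, m in zip(target_row, sr_row, mask_row):
            if m == 1:
                total += abs(t - s)
                n_text_pixels += 1
    if n_text_pixels == 0:
        return 0.0  # no text in the picture, so nothing to penalize
    return total / n_text_pixels


# Only the left column is text; the right column is ignored by the loss.
mask = [[1, 0], [1, 0]]
target = [[1.0, 0.3], [0.0, 0.3]]
sr = [[0.5, 0.9], [0.5, 0.1]]
loss = text_aware_loss(target, sr, mask)
```

Because pixels outside the mask contribute nothing, errors in the background do not affect this term, which is what lets it emphasize only the text.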
S6: Receive a low-resolution picture to be converted, input it into the trained adversarial network, and obtain the output target super-resolution picture.
In this embodiment, the trained adversarial network can generate a target super-resolution picture of better quality, ensuring that the text information in the picture is clear and complete.
The present application obtains text position information and text content information from the received low-resolution picture and generates a text mask based on them. Because the text mask is generated taking both the position and the content of the text into account, the boundary between the text and the surrounding image is made explicit, so that the text in the subsequently generated super-resolution picture is clear and the quality of the reconstructed picture is significantly improved. Upsampling enlarges the text mask and raises its resolution, which facilitates the subsequent generation of the super-resolution picture. Adversarial training of the generation layer and the discrimination layer yields the trained adversarial network, which is used to generate target super-resolution pictures of better quality.
It should be emphasized that, to further ensure the privacy and security of the trained adversarial network, the trained adversarial network may also be stored in a node of a blockchain.
The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic methods, where each data block contains the information of a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
The embodiments of the present application may acquire and process relevant data based on artificial intelligence technology. Artificial intelligence (AI) comprises the theories, methods, technologies, and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
Basic artificial intelligence technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. AI software technologies mainly include computer vision, robotics, biometrics, speech processing, natural language processing, and machine learning/deep learning.
The present application can be applied in the field of smart healthcare to restore low-resolution pictures in the medical field, thereby promoting the construction of smart cities.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions that direct the relevant hardware. The computer-readable instructions can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM), and the like.
It should be understood that although the steps in the flowcharts of the drawings are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
With further reference to FIG. 3, as an implementation of the method shown in FIG. 2, the present application provides an embodiment of a text image super-resolution reconstruction apparatus. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied in various electronic devices.
As shown in FIG. 3, the text image super-resolution reconstruction apparatus 300 of this embodiment includes a receiving module 301, an upsampling module 302, a generation module 303, a discrimination module 304, a calculation module 305, and an obtaining module 306. The receiving module 301 is configured to receive a low-resolution picture and a corresponding high-resolution picture, input the low-resolution picture into a pre-trained scene text recognition model, and obtain the output text position information and text content information. The upsampling module 302 is configured to generate a text mask based on the text position information and the text content information, and to upsample the text mask to obtain a target mask. The generation module 303 is configured to input the low-resolution picture and the target mask into the generation layer of a preset adversarial network and obtain the output super-resolution picture. The discrimination module 304 is configured to input the super-resolution picture and the high-resolution picture into the discrimination layer of the adversarial network at the same time, obtain the output discrimination result, and calculate the discrimination accuracy based on the discrimination result. The calculation module 305 is configured to calculate the loss function of the adversarial network based on the low-resolution picture and the target mask, and to obtain the trained adversarial network when the loss function has converged and the discrimination accuracy is lower than the accuracy threshold. The obtaining module 306 is configured to receive a low-resolution picture to be converted, input it into the trained adversarial network, and obtain the output target super-resolution picture.
In this embodiment, the present application obtains text position information and text content information from the received low-resolution picture and generates a text mask based on them. Because the text mask is generated taking both the position and the content of the text into account, the boundary between the text and the surrounding image is made explicit, so that the text in the subsequently generated super-resolution picture is clear and the quality of the reconstructed picture is significantly improved. Upsampling enlarges the text mask and raises its resolution, which facilitates the subsequent generation of the super-resolution picture. Adversarial training of the generation layer and the discrimination layer yields the trained adversarial network, which is used to generate target super-resolution pictures of better quality.
The upsampling module 302 includes a correction submodule and a generation submodule. The correction submodule is configured to correct the text position information based on the text content information to obtain target text position information; the generation submodule is configured to generate the text mask based on the target text position information.
In some optional implementations of this embodiment, the upsampling module 302 is further configured to perform multi-fold upsampling on the text mask to obtain the target mask.
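Mask generation and multi-fold upsampling can be sketched as follows. Everything concrete here is an assumption made for illustration: text positions are taken to be axis-aligned boxes `(left, top, right, bottom)` with exclusive right and bottom edges, the upsampling is nearest-neighbour by an integer factor, and the names `make_text_mask` and `upsample_mask` are hypothetical. The source states only that text pixels are labeled 1, other pixels 0, and that the mask is then upsampled.

```python
from typing import List, Tuple

Mask = List[List[int]]
Box = Tuple[int, int, int, int]  # hypothetical (left, top, right, bottom), right/bottom exclusive


def make_text_mask(width: int, height: int, boxes: List[Box]) -> Mask:
    """Label pixels inside any text box with 1 and all other pixels with 0."""
    mask = [[0] * width for _ in range(height)]
    for left, top, right, bottom in boxes:
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                mask[y][x] = 1
    return mask


def upsample_mask(mask: Mask, factor: int) -> Mask:
    """Nearest-neighbour upsampling: each pixel becomes a factor-by-factor block."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    upsampled: Mask = []
    for row in mask:
        wide_row = [value for value in row for _ in range(factor)]
        upsampled.extend(list(wide_row) for _ in range(factor))
    return upsampled


# One text box covering the two right-hand pixels of the top row.
low_res_mask = make_text_mask(width=3, height=2, boxes=[(1, 0, 3, 1)])
target_mask = upsample_mask(low_res_mask, factor=2)
```

Nearest-neighbour replication keeps the mask strictly binary after enlargement, which suits a mask whose values are only 0 and 1; an interpolating method would introduce fractional values at text boundaries.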
In some optional implementations of this embodiment, the calculation module 305 is further configured to calculate the content loss function of the adversarial network based on the low-resolution picture, the content loss function being characterized by:
[Formula: Figure PCTCN2022071883-appb-000013]
where [Figure PCTCN2022071883-appb-000014] is the content loss function, [Figure PCTCN2022071883-appb-000015] is the value of the pixel of the high-resolution picture at position (x, y), $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position (x, y), rW and rH are the width and length of the super-resolution picture respectively, and $r^{2}WH$ is the total number of pixels in the super-resolution picture.
In some optional implementations of this embodiment, the calculation module 305 is further configured to calculate the adversarial loss function of the adversarial network based on the low-resolution picture, the adversarial loss function being characterized by:
[Formula: Figure PCTCN2022071883-appb-000016]
where [Figure PCTCN2022071883-appb-000017] is the adversarial loss function, $G_{\theta_G}(I^{LR})$ is the super-resolution picture, $D_{\theta_D}$ is the discrimination layer, M is the total number of super-resolution pictures, and m is the ordinal number of a super-resolution picture.
In some optional implementations of this embodiment, the calculation module 305 is further configured to calculate the regularization loss function of the adversarial network based on the low-resolution picture, the regularization loss function being characterized by:
[Formula: Figure PCTCN2022071883-appb-000018]
where [Figure PCTCN2022071883-appb-000019] is the regularization loss function, $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position (x, y), rW and rH are the width and length of the target mask respectively, $r^{2}WH$ is the total number of pixels in the target mask, ‖·‖ denotes the norm, and [Figure PCTCN2022071883-appb-000020] denotes the gradient.
In some optional implementations of this embodiment, the calculation module 305 is further configured to calculate the text-aware loss function of the adversarial network based on the low-resolution picture, the text-aware loss function being characterized by:
[Formula: Figure PCTCN2022071883-appb-000021]
where $l_{TR}$ is the text-aware loss function, N is the total number of pixels at positions where text exists, [Figure PCTCN2022071883-appb-000022] is the target mask, and [Figure PCTCN2022071883-appb-000023] is the super-resolution picture.
The present application obtains text position information and text content information from the received low-resolution picture and generates a text mask based on them. Because the text mask is generated taking both the position and the content of the text into account, the boundary between the text and the surrounding image is made explicit, so that the text in the subsequently generated super-resolution picture is clear and the quality of the reconstructed picture is significantly improved. Upsampling enlarges the text mask and raises its resolution, which facilitates the subsequent generation of the super-resolution picture. Adversarial training of the generation layer and the discrimination layer yields the trained adversarial network, which is used to generate target super-resolution pictures of better quality.
To solve the above technical problems, an embodiment of the present application further provides a computer device. Refer to FIG. 4, which is a block diagram of the basic structure of the computer device of this embodiment.
The computer device 200 includes a memory 201, a processor 202, and a network interface 203 that are communicatively connected to one another through a system bus. Note that the figure shows only a computer device 200 with components 201-203; it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead. A person skilled in the art will understand that the computer device here is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and that its hardware includes but is not limited to microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, and the like.
The computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The computer device can interact with the user through a keyboard, a mouse, a remote control, a touchpad, a voice-control device, or the like.
The memory 201 includes at least one type of readable storage medium, including flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disc, and the like. The computer-readable storage medium may be non-volatile or volatile. In some embodiments, the memory 201 may be an internal storage unit of the computer device 200, such as a hard disk or internal memory of the computer device 200. In other embodiments, the memory 201 may be an external storage device of the computer device 200, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) fitted to the computer device 200. The memory 201 may also include both an internal storage unit of the computer device 200 and an external storage device. In this embodiment, the memory 201 is generally used to store the operating system and the various application software installed on the computer device 200, such as the computer-readable instructions of the text image super-resolution reconstruction method. In addition, the memory 201 can be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 202 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 202 is generally used to control the overall operation of the computer device 200. In this embodiment, the processor 202 is configured to run the computer-readable instructions stored in the memory 201 or to process data, for example to run the computer-readable instructions of the text image super-resolution reconstruction method.
The network interface 203 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 200 and other electronic devices.
In this embodiment, the text mask is generated taking both the position information and the content information of the text into account, which makes the boundary between the text and the surrounding image explicit and significantly improves the quality of the reconstructed picture. Adversarial training of the generation layer and the discrimination layer yields the trained adversarial network, which is used to generate target super-resolution pictures of better quality.
The present application further provides another implementation, namely a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor, so that the at least one processor performs the steps of the text image super-resolution reconstruction method described above.
In this embodiment, the text mask is generated taking both the position information and the content information of the text into account, which makes the boundary between the text and the surrounding image explicit and significantly improves the quality of the reconstructed picture. Adversarial training of the generation layer and the discrimination layer yields the trained adversarial network, which is used to generate target super-resolution pictures of better quality.
From the description of the above implementations, a person skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present application.
Obviously, the embodiments described above are only some of the embodiments of the present application rather than all of them. The drawings show preferred embodiments of the present application but do not limit its patent scope. The present application can be implemented in many different forms; these embodiments are provided so that the disclosure of the present application will be understood more thoroughly and completely. Although the present application has been described in detail with reference to the foregoing embodiments, a person skilled in the art can still modify the technical solutions described in the foregoing specific implementations or make equivalent replacements for some of their technical features. Any equivalent structure made using the contents of the description and drawings of the present application, and applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of the present application.

Claims (20)

  1. A text image super-resolution reconstruction method, comprising the following steps:
    receiving a low-resolution picture and a corresponding high-resolution picture, inputting the low-resolution picture into a pre-trained scene text recognition model, and obtaining output text position information and text content information;
    generating a text mask based on the text position information and the text content information, and upsampling the text mask to obtain a target mask;
    inputting the low-resolution picture and the target mask into a generation layer of a preset adversarial network to obtain an output super-resolution picture;
    inputting the super-resolution picture and the high-resolution picture simultaneously into a discrimination layer of the adversarial network, obtaining an output discrimination result, and calculating a discrimination accuracy based on the discrimination result;
    calculating a loss function of the adversarial network based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, thereby obtaining a trained adversarial network;
    receiving a low-resolution picture to be converted, inputting the low-resolution picture to be converted into the trained adversarial network, and obtaining an output target super-resolution picture.
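The training stop condition in claim 1 (loss converged and discrimination accuracy below a threshold) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function names, the 0.5 score cutoff, the window-based convergence test, and all default values are assumptions introduced here.

```python
from typing import Sequence


def discrimination_accuracy(real_scores: Sequence[float],
                            fake_scores: Sequence[float],
                            cutoff: float = 0.5) -> float:
    """Fraction of pictures the discrimination layer labels correctly.

    real_scores: outputs for high-resolution pictures.
    fake_scores: outputs for generated super-resolution pictures.
    A score >= cutoff is read as "real" (assumed convention).
    """
    total = len(real_scores) + len(fake_scores)
    if total == 0:
        raise ValueError("no discriminator scores given")
    correct = sum(1 for s in real_scores if s >= cutoff)
    correct += sum(1 for s in fake_scores if s < cutoff)
    return correct / total


def training_finished(loss_history: Sequence[float],
                      accuracy: float,
                      accuracy_threshold: float = 0.6,
                      window: int = 5,
                      tolerance: float = 1e-3) -> bool:
    """True when the loss has flattened AND the accuracy is below the threshold.

    Convergence is approximated (assumption) as: the last `window` loss
    values differ from each other by less than `tolerance`.
    """
    if len(loss_history) < window:
        return False
    recent = loss_history[-window:]
    converged = max(recent) - min(recent) < tolerance
    return converged and accuracy < accuracy_threshold
```

Both conditions must hold together: a flat loss alone does not end training if the discrimination layer can still tell generated pictures from real ones.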
  2. The text image super-resolution reconstruction method according to claim 1, wherein the step of generating a text mask based on the text position information and the text content information comprises:
    correcting the text position information based on the text content information to obtain target text position information;
    generating the text mask based on the target text position information.
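As an illustration of the second step of claim 2, the sketch below rasterizes text positions into a binary mask. It assumes, purely for illustration, that the target text position information is a list of axis-aligned boxes `(x0, y0, x1, y1)` with exclusive upper bounds; the claim does not fix a representation, and the content-based correction step is not modeled here.

```python
from typing import List, Sequence, Tuple

# Hypothetical box format: (x0, y0, x1, y1), with x1 and y1 exclusive.
Box = Tuple[int, int, int, int]


def make_text_mask(height: int, width: int,
                   boxes: Sequence[Box]) -> List[List[int]]:
    """Return a height x width mask with 1 inside any text box, else 0."""
    mask = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        # Clip each box to the picture so out-of-range boxes are harmless.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = 1
    return mask
```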
  3. The text image super-resolution reconstruction method according to claim 1, wherein the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask comprises:
    calculating a content loss function of the adversarial network based on the low-resolution picture, the content loss function being characterized by:
    [Figure PCTCN2022071883-appb-100001]
    where [Figure PCTCN2022071883-appb-100002] is the content loss function, [Figure PCTCN2022071883-appb-100003] is the value of the pixel of the high-resolution picture at position (x, y), $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position (x, y), $rW$ and $rH$ are respectively the width and height of the super-resolution picture, and $r^2WH$ is the total number of pixels in the super-resolution picture.
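The formula of claim 3 survives only as an image reference, but the listed symbols match a pixel-wise mean squared error between the high-resolution and super-resolution pictures, normalized by the pixel count $r^2WH$. The sketch below implements that reading; the MSE form is an assumption inferred from the symbol definitions, and `content_loss` is a hypothetical name.

```python
from typing import List, Sequence


def content_loss(hr: Sequence[Sequence[float]],
                 sr: Sequence[Sequence[float]]) -> float:
    """Mean squared pixel difference between two same-sized pictures.

    hr: high-resolution picture, indexed hr[y][x].
    sr: super-resolution picture G(I_LR), indexed sr[y][x].
    Assumed form: (1 / (rW * rH)) * sum over x, y of (hr - sr)^2.
    """
    height, width = len(hr), len(hr[0])
    if len(sr) != height or any(len(row) != width for row in sr):
        raise ValueError("pictures must have the same size")
    total = sum((hr[y][x] - sr[y][x]) ** 2
                for y in range(height) for x in range(width))
    return total / (height * width)
```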
  4. The text image super-resolution reconstruction method according to claim 1, wherein the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask comprises:
    calculating an adversarial loss function of the adversarial network based on the low-resolution picture, the adversarial loss function being characterized by:
    [Figure PCTCN2022071883-appb-100004]
    where [Figure PCTCN2022071883-appb-100005] is the adversarial loss function, $G_{\theta_G}(I^{LR})$ is the super-resolution picture, $D_{\theta_D}$ is the discrimination layer, $M$ is the total number of super-resolution pictures, and $m$ denotes the ordinal number of a super-resolution picture.
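The adversarial loss formula of claim 4 is likewise only an image reference. A common generator-side form consistent with the listed symbols is the sum, over the M generated pictures, of $-\log D_{\theta_D}(G_{\theta_G}(I^{LR}))$. The sketch below assumes that form; whether the original sums or averages over M is not recoverable from the text, and the clamping constant is added here only for numerical safety.

```python
import math
from typing import Sequence


def adversarial_loss(disc_probs: Sequence[float], eps: float = 1e-12) -> float:
    """Sum of -log D(G(I_LR)) over a batch of generated pictures.

    disc_probs: discrimination-layer outputs in (0, 1], one per generated
    super-resolution picture, read as "probability the picture is real".
    Values are clamped to eps to avoid log(0) (assumption, not in the claim).
    """
    return sum(-math.log(max(p, eps)) for p in disc_probs)
```

The loss is zero when every generated picture is judged real with probability 1 and grows as the discrimination layer becomes more confident that the pictures are generated.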
  5. The text image super-resolution reconstruction method according to claim 1, wherein the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask comprises:
    calculating a regularization loss function of the adversarial network based on the low-resolution picture, the regularization loss function being characterized by:
    [Figure PCTCN2022071883-appb-100006]
    where [Figure PCTCN2022071883-appb-100007] is the regularization loss function, $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position (x, y), $rW$ and $rH$ are respectively the width and height of the target mask, $r^2WH$ is the total number of pixels in the target mask, $\|\cdot\|$ denotes a norm, and [Figure PCTCN2022071883-appb-100008] denotes the gradient.
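The symbols in claim 5 (a norm of the gradient of the super-resolution picture, averaged over $r^2WH$ pixels) match a total-variation style regularizer. The sketch below assumes that reading and makes two further choices not stated in the claim: gradients are forward differences that are zero at the last row and column, and the norm is Euclidean.

```python
import math
from typing import Sequence


def tv_regularization_loss(sr: Sequence[Sequence[float]]) -> float:
    """Average gradient magnitude of a picture (total-variation style).

    sr: super-resolution picture, indexed sr[y][x].
    Assumed form: (1 / (rW * rH)) * sum over x, y of ||grad sr[y][x]||.
    """
    height, width = len(sr), len(sr[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            # Forward differences; zero where no neighbor exists.
            dx = sr[y][x + 1] - sr[y][x] if x + 1 < width else 0.0
            dy = sr[y + 1][x] - sr[y][x] if y + 1 < height else 0.0
            total += math.hypot(dx, dy)
    return total / (height * width)
```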
  6. The text image super-resolution reconstruction method according to claim 1, wherein the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask comprises:
    calculating a text-aware loss function of the adversarial network based on the low-resolution picture, the text-aware loss function being characterized by:
    [Figure PCTCN2022071883-appb-100009]
    where $l_{TR}$ is the text-aware loss function, $N$ is the total number of pixels at positions where text is present, [Figure PCTCN2022071883-appb-100010] is the target mask, and [Figure PCTCN2022071883-appb-100011] is the super-resolution picture.
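The text-aware formula of claim 6 is not reproduced in the text, so its exact form is unknown. One plausible reading, shown below strictly as an assumption, is a masked error: the squared difference between the super-resolution and high-resolution pictures is accumulated only where the target mask is 1, then divided by N, the number of text pixels. The comparison against the high-resolution picture is a guess introduced here; the claim's symbol list names only the mask and the super-resolution picture.

```python
from typing import Sequence


def text_aware_loss(mask: Sequence[Sequence[int]],
                    hr: Sequence[Sequence[float]],
                    sr: Sequence[Sequence[float]]) -> float:
    """Mean squared error restricted to text pixels (assumed form).

    mask: binary target mask, 1 where text is present.
    hr, sr: high-resolution and super-resolution pictures, same size as mask.
    Returns 0.0 when the mask contains no text pixels.
    """
    n_text = sum(sum(row) for row in mask)  # N in the claim
    if n_text == 0:
        return 0.0
    total = sum(mask[y][x] * (hr[y][x] - sr[y][x]) ** 2
                for y in range(len(mask)) for x in range(len(mask[0])))
    return total / n_text
```

Restricting the error to masked pixels concentrates the penalty on text regions, so background errors do not dilute it.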
  7. The text image super-resolution reconstruction method according to claim 1, wherein the step of upsampling the text mask to obtain a target mask comprises:
    performing multi-fold upsampling on the text mask to obtain the target mask.
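The multi-fold upsampling of claim 7 can be illustrated with nearest-neighbor repetition by an integer factor, which keeps the mask binary. The interpolation method and the integer factor are assumptions; the claim only requires that the mask be enlarged by a multiple.

```python
from typing import List, Sequence


def upsample_mask(mask: Sequence[Sequence[int]], factor: int) -> List[List[int]]:
    """Enlarge a mask by repeating every pixel `factor` times in x and y."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out: List[List[int]] = []
    for row in mask:
        # Repeat each value horizontally, then repeat the row vertically.
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out
```

With a factor equal to the super-resolution scale r, a W x H mask becomes rW x rH, matching the size of the generated picture.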
  8. A text image super-resolution reconstruction device, comprising:
    a receiving module, configured to receive a low-resolution picture and a corresponding high-resolution picture, input the low-resolution picture into a pre-trained scene text recognition model, and obtain output text position information and text content information;
    an upsampling module, configured to generate a text mask based on the text position information and the text content information, and upsample the text mask to obtain a target mask;
    a generating module, configured to input the low-resolution picture and the target mask into a generation layer of a preset adversarial network to obtain an output super-resolution picture;
    a discrimination module, configured to input the super-resolution picture and the high-resolution picture simultaneously into a discrimination layer of the adversarial network, obtain an output discrimination result, and calculate a discrimination accuracy based on the discrimination result;
    a calculation module, configured to calculate a loss function of the adversarial network based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, thereby obtaining a trained adversarial network;
    an obtaining module, configured to receive a low-resolution picture to be converted, input the low-resolution picture to be converted into the trained adversarial network, and obtain an output target super-resolution picture.
  9. A computer device, comprising a memory and a processor, wherein computer-readable instructions are stored in the memory, and the processor, when executing the computer-readable instructions, implements the following steps of a text image super-resolution reconstruction method:
    receiving a low-resolution picture and a corresponding high-resolution picture, inputting the low-resolution picture into a pre-trained scene text recognition model, and obtaining output text position information and text content information;
    generating a text mask based on the text position information and the text content information, and upsampling the text mask to obtain a target mask;
    inputting the low-resolution picture and the target mask into a generation layer of a preset adversarial network to obtain an output super-resolution picture;
    inputting the super-resolution picture and the high-resolution picture simultaneously into a discrimination layer of the adversarial network, obtaining an output discrimination result, and calculating a discrimination accuracy based on the discrimination result;
    calculating a loss function of the adversarial network based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, thereby obtaining a trained adversarial network;
    receiving a low-resolution picture to be converted, inputting the low-resolution picture to be converted into the trained adversarial network, and obtaining an output target super-resolution picture.
  10. The computer device according to claim 9, wherein the step of generating a text mask based on the text position information and the text content information comprises:
    correcting the text position information based on the text content information to obtain target text position information;
    generating the text mask based on the target text position information.
  11. The computer device according to claim 9, wherein the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask comprises:
    calculating a content loss function of the adversarial network based on the low-resolution picture, the content loss function being characterized by:
    [Figure PCTCN2022071883-appb-100012]
    where [Figure PCTCN2022071883-appb-100013] is the content loss function, [Figure PCTCN2022071883-appb-100014] is the value of the pixel of the high-resolution picture at position (x, y), $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position (x, y), $rW$ and $rH$ are respectively the width and height of the super-resolution picture, and $r^2WH$ is the total number of pixels in the super-resolution picture.
  12. The computer device according to claim 9, wherein the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask comprises:
    calculating an adversarial loss function of the adversarial network based on the low-resolution picture, the adversarial loss function being characterized by:
    [Figure PCTCN2022071883-appb-100015]
    where [Figure PCTCN2022071883-appb-100016] is the adversarial loss function, $G_{\theta_G}(I^{LR})$ is the super-resolution picture, $D_{\theta_D}$ is the discrimination layer, $M$ is the total number of super-resolution pictures, and $m$ denotes the ordinal number of a super-resolution picture.
  13. The computer device according to claim 9, wherein the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask comprises:
    calculating a regularization loss function of the adversarial network based on the low-resolution picture, the regularization loss function being characterized by:
    [Figure PCTCN2022071883-appb-100017]
    where [Figure PCTCN2022071883-appb-100018] is the regularization loss function, $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position (x, y), $rW$ and $rH$ are respectively the width and height of the target mask, $r^2WH$ is the total number of pixels in the target mask, $\|\cdot\|$ denotes a norm, and [Figure PCTCN2022071883-appb-100019] denotes the gradient.
  14. The computer device according to claim 9, wherein the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask comprises:
    calculating a text-aware loss function of the adversarial network based on the low-resolution picture, the text-aware loss function being characterized by:
    [Figure PCTCN2022071883-appb-100020]
    where $l_{TR}$ is the text-aware loss function, $N$ is the total number of pixels at positions where text is present, [Figure PCTCN2022071883-appb-100021] is the target mask, and [Figure PCTCN2022071883-appb-100022] is the super-resolution picture.
  15. The computer device according to claim 9, wherein the step of upsampling the text mask to obtain a target mask comprises:
    performing multi-fold upsampling on the text mask to obtain the target mask.
  16. A computer-readable storage medium storing computer-readable instructions which, when executed by a processor, implement the following steps of a text image super-resolution reconstruction method:
    receiving a low-resolution picture and a corresponding high-resolution picture, inputting the low-resolution picture into a pre-trained scene text recognition model, and obtaining output text position information and text content information;
    generating a text mask based on the text position information and the text content information, and upsampling the text mask to obtain a target mask;
    inputting the low-resolution picture and the target mask into a generation layer of a preset adversarial network to obtain an output super-resolution picture;
    inputting the super-resolution picture and the high-resolution picture simultaneously into a discrimination layer of the adversarial network, obtaining an output discrimination result, and calculating a discrimination accuracy based on the discrimination result;
    calculating a loss function of the adversarial network based on the low-resolution picture and the target mask until the loss function converges and the discrimination accuracy is lower than an accuracy threshold, thereby obtaining a trained adversarial network;
    receiving a low-resolution picture to be converted, inputting the low-resolution picture to be converted into the trained adversarial network, and obtaining an output target super-resolution picture.
  17. The computer-readable storage medium according to claim 16, wherein the step of generating a text mask based on the text position information and the text content information comprises:
    correcting the text position information based on the text content information to obtain target text position information;
    generating the text mask based on the target text position information.
  18. The computer-readable storage medium according to claim 16, wherein the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask comprises:
    calculating a content loss function of the adversarial network based on the low-resolution picture, the content loss function being characterized by:
    [Figure PCTCN2022071883-appb-100023]
    where [Figure PCTCN2022071883-appb-100024] is the content loss function, [Figure PCTCN2022071883-appb-100025] is the value of the pixel of the high-resolution picture at position (x, y), $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position (x, y), $rW$ and $rH$ are respectively the width and height of the super-resolution picture, and $r^2WH$ is the total number of pixels in the super-resolution picture.
  19. The computer-readable storage medium according to claim 16, wherein the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask comprises:
    calculating an adversarial loss function of the adversarial network based on the low-resolution picture, the adversarial loss function being characterized by:
    [Figure PCTCN2022071883-appb-100026]
    where [Figure PCTCN2022071883-appb-100027] is the adversarial loss function, $G_{\theta_G}(I^{LR})$ is the super-resolution picture, $D_{\theta_D}$ is the discrimination layer, $M$ is the total number of super-resolution pictures, and $m$ denotes the ordinal number of a super-resolution picture.
  20. The computer-readable storage medium according to claim 16, wherein the step of calculating the loss function of the adversarial network based on the low-resolution picture and the target mask comprises:
    calculating a regularization loss function of the adversarial network based on the low-resolution picture, the regularization loss function being characterized by:
    [Figure PCTCN2022071883-appb-100028]
    where [Figure PCTCN2022071883-appb-100029] is the regularization loss function, $G_{\theta_G}(I^{LR})_{x,y}$ is the value of the pixel of the super-resolution picture at position (x, y), $rW$ and $rH$ are respectively the width and height of the target mask, $r^2WH$ is the total number of pixels in the target mask, $\|\cdot\|$ denotes a norm, and [Figure PCTCN2022071883-appb-100030] denotes the gradient.
PCT/CN2022/071883 2021-09-10 2022-01-13 Super-resolution reconstruction method for text image and related device thereof WO2023035531A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111061974.7 2021-09-10
CN202111061974.7A CN113763249A (en) 2021-09-10 2021-09-10 Text image super-resolution reconstruction method and related equipment thereof

Publications (1)

Publication Number Publication Date
WO2023035531A1 (en) 2023-03-16

Family

ID=78794915

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071883 WO2023035531A1 (en) 2021-09-10 2022-01-13 Super-resolution reconstruction method for text image and related device thereof

Country Status (2)

Country Link
CN (1) CN113763249A (en)
WO (1) WO2023035531A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113763249A (en) * 2021-09-10 2021-12-07 平安科技(深圳)有限公司 Text image super-resolution reconstruction method and related equipment thereof
CN116368512A (en) * 2021-10-29 2023-06-30 京东方科技集团股份有限公司 Image processing method, electronic device, and non-transitory computer readable medium
CN114172873B (en) * 2021-12-13 2023-05-30 中国平安财产保险股份有限公司 Resolution adjustment method, resolution adjustment device, server and computer readable storage medium
CN115829837A (en) * 2022-11-15 2023-03-21 深圳市新良田科技股份有限公司 Text image super-resolution reconstruction method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107154023A (en) * 2017-05-17 2017-09-12 电子科技大学 Face super-resolution reconstruction method based on generation confrontation network and sub-pix convolution
CN112508782A (en) * 2020-09-10 2021-03-16 浙江大华技术股份有限公司 Network model training method, face image super-resolution reconstruction method and equipment
US11003865B1 (en) * 2020-05-20 2021-05-11 Google Llc Retrieval-augmented language model pre-training and fine-tuning
CN113256494A (en) * 2021-06-02 2021-08-13 同济大学 Text image super-resolution method
CN113763249A (en) * 2021-09-10 2021-12-07 平安科技(深圳)有限公司 Text image super-resolution reconstruction method and related equipment thereof

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385318A (en) * 2023-06-06 2023-07-04 湖南纵骏信息科技有限公司 Image quality enhancement method and system based on cloud desktop
CN116385318B (en) * 2023-06-06 2023-10-10 湖南纵骏信息科技有限公司 Image quality enhancement method and system based on cloud desktop

Also Published As

Publication number Publication date
CN113763249A (en) 2021-12-07

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22866011

Country of ref document: EP

Kind code of ref document: A1