CN112801911B - Method and device for removing text noise in natural image and storage medium - Google Patents


Info

Publication number
CN112801911B
CN112801911B (application CN202110172477.8A)
Authority
CN
China
Prior art keywords
image
repaired
area
mask
network
Prior art date
Legal status
Active
Application number
CN202110172477.8A
Other languages
Chinese (zh)
Other versions
CN112801911A (en)
Inventor
王波
张百灵
崔嵬
Current Assignee
Suzhou Changzuichu Software Co ltd
Original Assignee
Suzhou Changzuichu Software Co ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Changzuichu Software Co ltd filed Critical Suzhou Changzuichu Software Co ltd
Priority to CN202110172477.8A priority Critical patent/CN112801911B/en
Publication of CN112801911A publication Critical patent/CN112801911A/en
Application granted granted Critical
Publication of CN112801911B publication Critical patent/CN112801911B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/70 Denoising; Smoothing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/77 Retouching; Inpainting; Scratch removal
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30176 Document

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Processing (AREA)
  • Character Input (AREA)

Abstract

The application discloses a method, a device, and a storage medium for removing text noise in natural images. The method comprises the following steps: an image semantic segmentation network detects the regions containing text elements in the image to be repaired, and the segmentation result is taken as the mask of the region to be repaired; according to an image restoration model, the regions containing text elements in the image to be repaired are restored using the mask of the region to be repaired, where the image restoration model is the generator of a generative adversarial network. With this method and device, text regions commonly found in the image to be repaired can be detected quickly and automatically, text noise elements in the natural image can be removed selectively and automatically, and the regions to be repaired can be corrected through manual interaction. Because the restoration method is based on a generative adversarial network, the restored image is more natural and lifelike.

Description

Method and device for removing text noise in natural image and storage medium
Technical Field
The embodiments of the present application relate to the technical field of image classification, and in particular to a method and device for removing text noise in natural images, and a storage medium.
Background
In recent years, with the advent of the big-data era and advances in computer hardware, artificial intelligence has become increasingly common in daily life. Deep learning is widely applied in computer vision, where image recognition is among the most widely deployed technologies, for example photo-based recognition, face recognition, traffic-sign recognition, gesture recognition, and garbage classification. These techniques find corresponding applications in e-commerce, the automotive industry, gaming, and manufacturing.
Due to human factors, images often carry overlaid elements such as text. These text elements spoil the appearance of the image, hinder its reuse, and reduce its preservation value and quality. Many application scenarios therefore require removing the text elements from natural-scene images to obtain a clean image. However, text elements in natural images come in varied styles and uneven distributions, such as handwriting, subtitles, watermarks, and scratches, all of which increase the difficulty of removal. Existing mainstream removal methods generally require manually annotating the text mask regions before performing image restoration; they suffer from poor restored-image quality that does not blend with the characteristics of the natural image, and they are time-consuming and labor-intensive.
On the other hand, conventional diffusion-based image restoration methods use the edge information of the region to be restored to determine the direction of diffusion and diffuse known information inward from the edges. Images restored this way are unnatural, blurred, and lack texture detail, and large defect regions cannot be restored. Other traditional methods share similar problems: complex processing pipelines, heavy computation, and poor generalization.
Disclosure of Invention
In view of this, the embodiments of the present application provide a method and apparatus for removing text noise in natural images, and a storage medium.
According to a first aspect of the present application, there is provided a method for removing text noise in a natural image, including:
detecting the regions containing text elements in the image to be repaired using an image semantic segmentation network, and taking the segmentation result as the mask of the region to be repaired;
restoring the regions containing text elements in the image to be repaired with the mask of the region to be repaired, according to an image restoration model; the image restoration model is the generator of a generative adversarial network.
As an implementation, detecting the regions containing text elements in the image to be repaired using the image semantic segmentation network and taking the segmentation result as the mask of the region to be repaired further includes:
after detecting the regions containing text elements in the image to be repaired using the image semantic segmentation network, determining whether the user chooses manual interaction to repair the image; if so, receiving the user's corrections to the region to be repaired through delete, modify, and add operations; otherwise, taking the segmentation result directly as the mask of the region to be repaired.
As an implementation, the image semantic segmentation network adopts the U-shaped skip-connection structure of the U-Net segmentation network, with an atrous spatial pyramid pooling (ASPP) module added on top of U-Net to extract and fuse multi-scale context features.
As an implementation, the method further includes:
adding an attention mechanism to enhance the feature representation capability of the image semantic segmentation network;
the attention mechanism uses a channel attention module to assign weights to the individual channels, and a spatial attention module to assign spatial feature weights.
As an implementation, the method further includes:
the channel attention module applies global pooling to each channel's feature map to obtain global information, learns the weight of each channel with two fully connected layers, and multiplies the weights into the initial features;
the spatial attention module first compresses the channel count of the resulting feature map with a 1×1 convolution; adaptive pooling then normalizes the spatial features to 4 different scales; the 4 pooled scales are concatenated, resized, and fed into two fully connected layers to learn different local weights of the spatial features; the learned weight parameters are resized back to the scale of the compressed features; a 1×1 convolution restores the spatial parameters to the spatial size of the channel attention features, which are then multiplied in; finally, the resulting spatial features are added to the original features to obtain the final attention features.
As an implementation, the method further includes:
the image restoration model is the generator G of a trained Pixel2Pixel generative adversarial network model; the Pixel2Pixel model adopts a U-Net segmentation network model as the generator G.
According to a second aspect of the present application, there is provided a device for removing text noise in natural images, including:
a detection and mask generation unit, configured to detect the regions containing text elements in the image to be repaired using the image semantic segmentation network, and to take the segmentation result as the mask of the region to be repaired;
an image restoration unit, configured to restore the regions containing text elements in the image to be repaired with the mask of the region to be repaired, according to an image restoration model; the image restoration model is the generator of a generative adversarial network.
As an implementation, the apparatus further includes:
a manual interaction unit, configured to determine, after the detection and mask generation unit detects the regions containing text elements in the image to be repaired using the image semantic segmentation network, whether the user chooses manual interaction to repair the image; if so, to receive the user's corrections to the region to be repaired through delete, modify, and add operations; otherwise, to notify the detection and mask generation unit to take the segmentation result as the mask of the region to be repaired.
As an implementation, the image semantic segmentation network in the detection and mask generation unit adopts the U-shaped skip-connection structure of the U-Net segmentation network, with an atrous spatial pyramid pooling (ASPP) module added on top of U-Net to extract and fuse multi-scale context features.
As an implementation, the detecting and mask generating unit is further configured to:
adding an attention mechanism to enhance the feature representation capability of the image semantic segmentation network;
the attention mechanism uses a channel attention module to assign weights to the individual channels, and a spatial attention module to assign spatial feature weights.
As an implementation, the detecting and mask generating unit is further configured to:
the channel attention module applies global pooling to each channel's feature map to obtain global information, learns the weight of each channel with two fully connected layers, and multiplies the weights into the initial features;
the spatial attention module first compresses the channel count of the resulting feature map with a 1×1 convolution; adaptive pooling then normalizes the spatial features to 4 different scales; the 4 pooled scales are concatenated, resized, and fed into two fully connected layers to learn different local weights of the spatial features; the learned weight parameters are resized back to the scale of the compressed features; a 1×1 convolution restores the spatial parameters to the spatial size of the channel attention features, which are then multiplied in; finally, the resulting spatial features are added to the original features to obtain the final attention features.
As an implementation, the image restoration unit is further configured to:
the image restoration model is the generator G of a trained Pixel2Pixel generative adversarial network model; the Pixel2Pixel model adopts a U-Net segmentation network model as the generator G.
According to a third aspect of the present application, there is provided a storage medium storing an executable program which, when executed by a processor, implements the steps of the method for removing text noise in natural images.
With the method, device, and storage medium for removing text noise in natural images, the regions containing text elements in the image to be repaired are detected by the image semantic segmentation network, and the segmentation result is taken as the mask of the region to be repaired; according to the image restoration model, the regions containing text elements are restored using the mask, where the image restoration model is the generator of a generative adversarial network. Text regions commonly found in the image to be repaired can thus be detected quickly and automatically, text noise elements in the natural image can be removed selectively and automatically, and the regions to be repaired can be corrected through manual interaction. Because the restoration method is based on a generative adversarial network, the restored image is more natural and lifelike.
Drawings
Fig. 1 is a schematic flow chart of a method for removing text noise in a natural image according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a semantic segmentation model according to an embodiment of the present application;
FIG. 3 is a flowchart of a specific example of a method for removing text noise in natural images according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an attention module structure according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a Pixel2Pixel model training architecture provided in an embodiment of the present application;
fig. 6 is a schematic diagram of a composition structure of a device for removing text noise in a natural image according to an embodiment of the present application.
Detailed Description
The following describes in detail the essence of the technical solution of the embodiments of the present application with reference to examples.
With the rise of deep learning, deep convolutional neural networks can readily detect text in document or natural-scene images and locate the text regions. Mainstream deep-learning text detection methods fall into two families: object detection and semantic segmentation. Compared with object detection algorithms, whose precision is limited to regressed rectangular boxes, semantic segmentation methods recognize at the pixel level, localize more accurately, impose no strict requirement on text orientation, and fit the contour of the text region more closely. Mainstream semantic segmentation architectures are encoder-decoders, such as the FCN, U-Net, and DeepLab families of segmentation models.
Deep-learning image restoration methods based on generative adversarial networks (GANs) can learn rich semantic information from large-scale datasets and then fill in the missing content of an image end to end; the restored images are more natural and lifelike, achieving a better restoration effect.
The embodiments of the present application combine recent semantic segmentation and image restoration techniques: text regions in the natural image are obtained by semantic segmentation, a manual interaction mechanism is incorporated, and finally the natural image is restored using a generative adversarial network. For different application scenarios, image restoration combines two decision mechanisms for the text regions, automatic selection and manual interaction; the method is convenient to use, light on labor, and produces natural, lifelike restored images.
Fig. 1 is a schematic flow chart of a method for removing text noise in a natural image according to an embodiment of the present application, as shown in fig. 1, where the method for removing text noise in a natural image according to an embodiment of the present application includes the following processing steps:
Step 101: detect the regions containing text elements in the image to be repaired using the image semantic segmentation network, and take the segmentation result as the mask of the region to be repaired.
In the embodiment of the application, after the regions containing text elements in the image to be repaired are detected by the image semantic segmentation network, it is determined whether the user chooses manual interaction to repair the image; if so, the user's corrections to the region to be repaired are received through delete, modify, and add operations; otherwise, the segmentation result is taken directly as the mask of the region to be repaired.
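The decision flow above can be sketched as simple mask operations. This is an illustrative sketch only: the helper names, the 0.5 threshold, and the rectangular edit regions are assumptions, not specified by the patent.

```python
import numpy as np

def segmentation_to_mask(prob_map, threshold=0.5):
    """Binarize the segmentation network's probability map into a repair mask.
    (Hypothetical helper; the threshold value is an assumption.)"""
    return (prob_map >= threshold).astype(np.uint8)

def delete_region(mask, y0, y1, x0, x1):
    """Interactive 'delete' correction: clear a falsely detected region."""
    mask = mask.copy()
    mask[y0:y1, x0:x1] = 0
    return mask

def add_region(mask, y0, y1, x0, x1):
    """Interactive 'add' correction: mark a missed text region for repair."""
    mask = mask.copy()
    mask[y0:y1, x0:x1] = 1
    return mask

prob = np.zeros((8, 8)); prob[2:4, 2:6] = 0.9  # stand-in network output
mask = segmentation_to_mask(prob)              # automatic mask
mask = delete_region(mask, 2, 4, 4, 6)         # user removes a false positive
mask = add_region(mask, 6, 8, 0, 2)            # user adds a missed patch
```

A "modify" operation would combine the two edits; with no interaction, the binarized mask is used as-is.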
In the embodiment of the application, the image semantic segmentation network adopts the U-shaped skip-connection structure of the U-Net segmentation network, with an atrous spatial pyramid pooling (ASPP) module added on top of U-Net to extract and fuse multi-scale context features.
The improved semantic segmentation model of the embodiment is shown in fig. 2; the overall U-Net structure resembles a large letter U. The input is first downsampled; deconvolution then upsamples it while fusing in the corresponding earlier (shallower) layers; the result is upsampled again. This process repeats to produce the network output.
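The downsample-upsample-fuse cycle can be illustrated at the shape level. In this sketch, max pooling and nearest-neighbour upsampling stand in for U-Net's convolution and deconvolution layers, which are omitted; the sizes are illustrative.

```python
import numpy as np

def downsample(x):
    """2x2 max pooling halves the spatial size (convs omitted for brevity)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample(x):
    """Nearest-neighbour 2x upsampling standing in for deconvolution."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def skip_fuse(up, skip):
    """U-Net skip connection: concatenate decoder features with the
    same-resolution encoder features along the channel axis."""
    return np.concatenate([up, skip], axis=-1)

x  = np.random.rand(32, 32, 16)        # encoder feature map
d1 = downsample(x)                     # 16x16 encoder features
d2 = downsample(d1)                    # 8x8 bottleneck
u1 = skip_fuse(upsample(d2), d1)       # 16x16, channels doubled by the skip
u2 = skip_fuse(upsample(u1[..., :16]), x)  # back to 32x32 with input-level skip
```

The channel doubling at each fusion is why U-Net decoders follow each skip with convolutions that mix the concatenated features.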
In the embodiment of the application, atrous spatial pyramid pooling (ASPP) applies atrous (dilated) convolutions to the given input in parallel at different sampling rates, which is equivalent to capturing the context of the image at multiple scales.
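The idea can be sketched in one dimension; the 2-D ASPP module works the same way along each spatial axis. The kernel and the rates (1, 2, 4) here are illustrative assumptions, not the patent's actual configuration.

```python
import numpy as np

def dilated_conv1d(x, kernel, rate):
    """'Same'-padded 1-D atrous convolution: kernel taps are spaced
    `rate` samples apart, enlarging the receptive field without
    adding parameters."""
    k = len(kernel)
    span = rate * (k - 1)
    xp = np.pad(x, (span // 2, span - span // 2))
    return np.array([sum(kernel[j] * xp[i + j * rate] for j in range(k))
                     for i in range(len(x))])

x = np.arange(8, dtype=float)
kernel = np.array([1.0, 1.0, 1.0])
# ASPP: the same input is convolved in parallel at several rates,
# then the branch outputs are fused (here simply stacked).
branches = [dilated_conv1d(x, kernel, r) for r in (1, 2, 4)]
aspp_out = np.stack(branches)  # multi-scale context features
```

Each branch sees a wider neighbourhood (3, 5, and 9 samples respectively) with the same 3-tap kernel, which is exactly the multi-scale context fusion ASPP provides.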
In the embodiment of the application, the method further adds an attention mechanism to enhance the feature representation capability of the image semantic segmentation network; the attention mechanism uses a channel attention module to assign weights to the individual channels, and a spatial attention module to assign spatial feature weights.
In the embodiment of the application, the channel attention module applies global pooling to each channel's feature map to obtain global information, learns the weight of each channel with two fully connected layers, and multiplies the weights into the initial features;
the spatial attention module first compresses the channel count of the resulting feature map with a 1×1 convolution; adaptive pooling then normalizes the spatial features to 4 different scales; the 4 pooled scales are concatenated, resized, and fed into two fully connected layers to learn different local weights of the spatial features; the learned weight parameters are resized back to the scale of the compressed features; a 1×1 convolution restores the spatial parameters to the spatial size of the channel attention features, which are then multiplied in; finally, the resulting spatial features are added to the original features to obtain the final attention features.
Step 102: restore the regions containing text elements in the image to be repaired with the mask of the region to be repaired, according to the image restoration model; the image restoration model is the generator of a generative adversarial network.
In this embodiment, the image restoration model (image restoration module) is the generator G of a trained Pixel2Pixel generative adversarial network model; the Pixel2Pixel model adopts a U-Net segmentation network model as the generator G.
After the repair-region mask is generated, the selected region is repaired by the image restoration module, which uses the generator G of the trained Pixel2Pixel model to synthesize a realistic natural image. Pixel2Pixel is a generative adversarial network whose training inputs are image pairs; it consists mainly of a generator G and a discriminator D. To improve image detail and preserve information at different scales, a U-Net model is adopted as the generator G.
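The training objective of such a paired-image GAN can be sketched as follows. The patent does not give the losses; the non-saturating log terms and the lambda = 100 weight on the L1 term follow the original pix2pix formulation and are assumptions here.

```python
import numpy as np

def pixel2pixel_g_loss(d_fake, g_out, target, lam=100.0):
    """Generator objective of a Pixel2Pixel-style GAN (a sketch):
    fool the discriminator while staying close to the paired
    ground-truth image in L1."""
    adv = -np.mean(np.log(d_fake + 1e-8))   # non-saturating adversarial term
    l1 = np.mean(np.abs(g_out - target))    # per-pixel fidelity term
    return adv + lam * l1

def pixel2pixel_d_loss(d_real, d_fake):
    """Discriminator objective: score real pairs toward 1, generated pairs
    toward 0."""
    return (-np.mean(np.log(d_real + 1e-8))
            - np.mean(np.log(1.0 - d_fake + 1e-8)))
```

The L1 term is what pushes the U-Net generator toward the coarse structure of the target, while the adversarial term supplies the texture detail that makes the repaired region look natural.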
Embodiments of the present application are described in further detail below in conjunction with specific examples.
The embodiments are described taking a natural image as an example; it should be noted that other pictures or images, such as screenshots and pictures containing text, can also use the technical means of the embodiments of the present application.
Fig. 3 is a flowchart of a specific example of a method for removing text noise in a natural image according to an embodiment of the present application, where specific steps are as follows:
first, the user loads the image to be repaired. And automatically detecting the region containing the literal element in the natural image by a literal element detection module. The character detection module adopts a trained image semantic segmentation network to detect character areas, and takes segmentation recognition results as masks of the areas to be repaired. The semantic segmentation network model refers to a U-shaped jump layer connection network structure of a classical segmentation network U-Net. Aiming at the character characteristics, an ASPP module is added on the basis of the original U-Net to extract and fuse multi-scale context characteristics, and further a new attention mechanism is provided to enhance the characteristic characterization capability of the network, and the overall structure of the model is shown in figure 2.
In particular, the attention mechanism enhances both channel and spatial features. It first applies a channel attention module, whose main function is to assign weights to the individual channels, and then a spatial attention module to assign spatial feature weights. The channel attention module applies global pooling to each channel's feature map to obtain global information, learns the weight of each channel with two fully connected (fc) layers, and multiplies the weights into the initial features. On this basis, the spatial attention module first compresses the channel count of the new feature map with a 1×1 convolution to reduce computation. Adaptive pooling then normalizes the spatial features to 4 different scales, such as [1×1, 8×8, 16×16, 32×32], to summarize global or local statistics of the feature maps. Next, the pooled features of the 4 scales are concatenated and resized, then fed into two fully connected (fc) layers to learn different local weights of the spatial features; the learned weight parameters are resized to the size of the compressed features from the previous step. A 1×1 convolution then restores the spatial parameters to the spatial size of the channel attention features and multiplies them in. Finally, the newly obtained spatial features are added to the original features to obtain the final attention features. The attention module structure is shown in fig. 4.
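The channel-attention half of the module can be sketched as follows; the spatial half follows analogously with the pool-concat-fc steps described above. The weights here are randomly initialized purely for illustration, and the reduction ratio is an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feat, w1, w2):
    """Channel attention as described: global-pool each channel, learn
    per-channel weights with two fc layers, multiply into the features.
    feat: (H, W, C); w1: (C, C//r), w2: (C//r, C) are the fc weights."""
    pooled = feat.mean(axis=(0, 1))                       # global pooling -> (C,)
    weights = sigmoid(np.maximum(pooled @ w1, 0.0) @ w2)  # fc - relu - fc - sigmoid
    return feat * weights                                 # reweight each channel

C, r = 16, 4
rng = np.random.default_rng(0)
feat = rng.random((8, 8, C))                 # non-negative input features
w1 = rng.standard_normal((C, C // r))
w2 = rng.standard_normal((C // r, C))
out = channel_attention(feat, w1, w2)
```

Because the learned weights pass through a sigmoid, each channel is scaled by a factor in (0, 1): informative channels are preserved, uninformative ones suppressed.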
Specifically, the system determines whether the user chooses to correct and modify U-Net's predicted region to be repaired through manual interaction. If manual interaction is required, the user can correct the region to be repaired by deleting, modifying, or adding before the final region mask is generated. If no manual interaction is used, the mask of the region to be repaired is generated directly from the predicted text region.
After the repair-region mask is generated, the selected region is repaired by the image restoration module, which uses the generator G of the trained Pixel2Pixel model to synthesize a realistic natural image. Pixel2Pixel is a generative adversarial network whose training inputs are image pairs; it consists mainly of a generator G and a discriminator D. To improve image detail and preserve information at different scales, a U-Net model is adopted as the generator G. The training architecture of Pixel2Pixel is shown in fig. 5.
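A common final step when applying a generator only inside a masked region is to composite its output with the untouched original pixels. The patent does not spell out this exact compositing, so treat the sketch below as an assumption about one reasonable implementation.

```python
import numpy as np

def composite_repair(image, mask, generated):
    """Keep the original pixels outside the repair mask and take the
    generator's output inside it. image, generated: (H, W, 3) in [0, 1];
    mask: (H, W) with 1 marking the region to be repaired."""
    m = mask[..., None].astype(float)          # (H, W) -> (H, W, 1) for broadcast
    return m * generated + (1.0 - m) * image   # per-pixel blend

img = np.ones((4, 4, 3))                       # stand-in original image
gen = np.zeros((4, 4, 3))                      # stand-in generator output
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1                             # repair the centre 2x2 patch
out = composite_repair(img, mask, gen)
```

This guarantees that only the detected text region changes, while the rest of the natural image is preserved bit-for-bit.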
The restored natural images are saved; once all images have been processed, the system exits.
Fig. 6 is a schematic diagram of a composition structure of a device for removing text noise in a natural image according to an embodiment of the present application, as shown in fig. 6, where the device for removing text noise in a natural image according to an embodiment of the present application includes:
a detection and mask generation unit 61, configured to detect the regions containing text elements in the image to be repaired using the image semantic segmentation network, and to take the segmentation result as the mask of the region to be repaired;
an image restoration unit 62, configured to restore the regions containing text elements in the image to be repaired with the mask of the region to be repaired, according to an image restoration model; the image restoration model is the generator of a generative adversarial network.
The apparatus further comprises:
a manual interaction unit 63, configured to determine, after the detection and mask generation unit 61 detects the regions containing text elements in the image to be repaired using the image semantic segmentation network, whether the user chooses manual interaction to repair the image; if so, to receive the user's corrections to the region to be repaired through delete, modify, and add operations; otherwise, to notify the detection and mask generation unit 61 to take the segmentation result as the mask of the region to be repaired.
The image semantic segmentation network in the detection and mask generation unit 61 adopts the U-shaped skip-connection structure of the U-Net segmentation network, with an atrous spatial pyramid pooling (ASPP) module added on top of U-Net to extract and fuse multi-scale context features.
The detection and mask generation unit 61 is further configured to:
adding an attention mechanism to enhance the feature representation capability of the image semantic segmentation network;
the attention mechanism uses a channel attention module to assign weights to the individual channels, and a spatial attention module to assign spatial feature weights.
The detection and mask generation unit 61 is further configured to:
the channel attention module applies global pooling to each channel's feature map to obtain global information, learns the weight of each channel with two fully connected layers, and multiplies the weights into the initial features;
the spatial attention module first compresses the channel count of the resulting feature map with a 1×1 convolution; adaptive pooling then normalizes the spatial features to 4 different scales; the 4 pooled scales are concatenated, resized, and fed into two fully connected layers to learn different local weights of the spatial features; the learned weight parameters are resized back to the scale of the compressed features; a 1×1 convolution restores the spatial parameters to the spatial size of the channel attention features, which are then multiplied in; finally, the resulting spatial features are added to the original features to obtain the final attention features.
The image restoration unit 62 is further configured to:
the image restoration model is the generator G of a trained Pixel2Pixel generative adversarial network model; the Pixel2Pixel model adopts a U-Net segmentation network model as the generator G.
In an exemplary embodiment, the processing units of the device for removing text noise in natural images described above may be implemented by one or more central processing units (CPUs), graphics processing units (GPUs), baseband processors (BPs), application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers (MCUs), microprocessors, or other electronic components.
In the embodiments of the present disclosure, the specific manner in which each processing unit of the device shown in fig. 6 performs its operations has been described in detail in the method embodiments and will not be elaborated here.
An embodiment of the present application further provides a storage medium storing an executable program which, when executed by a processor, implements the steps of the method for removing text noise in natural images.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present invention, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present invention. The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are only illustrative; for example, the division of the units is only one logical-function division, and there may be other divisions in practice, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present invention may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.
The foregoing is merely an embodiment of the present invention, but the scope of the present invention is not limited thereto; any person skilled in the art could readily conceive of changes or substitutions within the technical scope disclosed by the present invention, and such changes and substitutions are intended to be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (13)

1. A method for removing text noise in a natural image, the method comprising:
detecting a region containing text elements in an image to be repaired using an image semantic segmentation network, and using the segmentation recognition result as a mask of the region to be repaired;
repairing the region containing text elements in the image to be repaired according to an image restoration model, using the mask of the region to be repaired; wherein the image restoration model is a generator of a generative adversarial network.
2. The method according to claim 1, wherein the detecting the region containing text elements in the image to be repaired using the image semantic segmentation network and using the segmentation recognition result as the mask of the region to be repaired further comprises:
after detecting the region containing text elements in the image to be repaired using the image semantic segmentation network, determining whether a user has selected a manual interaction mode for repairing the image to be repaired; if so, receiving the user's corrections to the region to be repaired through delete, modify, and add operations; otherwise, automatically using the segmentation recognition result as the mask of the region to be repaired.
3. The method according to claim 1, wherein the image semantic segmentation network is a U-shaped skip-connection network structure based on a U-Net segmentation network; and an atrous spatial pyramid pooling (ASPP) network is added on the basis of U-Net to extract and fuse multi-scale context features.
4. A method according to claim 3, characterized in that the method further comprises:
adding an attention mechanism to enhance the feature representation capability of the image semantic segmentation network;
wherein the attention mechanism uses a channel attention module to assign a weight to each channel, and a spatial attention module to assign spatial feature weights.
5. The method according to claim 4, wherein the method further comprises:
the channel attention module performs global pooling on the feature map of each channel to obtain global information; the weight of each channel is learned by two fully connected layers and multiplied with the initial features;
the spatial attention module compresses the channel count of the input feature map with a 1×1 convolution operation; adaptive pooling normalizes the spatial features to 4 different scales; the 4 pooled features are concatenated and reshaped, then fed into two fully connected layers that learn weights for different local regions of the spatial features; the learned weight parameters are reshaped to the scale of the compressed features; the spatial weights are then restored to the spatial size of the channel attention features and multiplied with them via a 1×1 convolution; finally, the resulting spatial features are added to the original features to obtain the final attention features.
6. The method according to claim 1, wherein the method further comprises:
the image restoration model is the generator G of a trained Pixel2Pixel generative adversarial network model, and the Pixel2Pixel generative adversarial network model uses a U-Net segmentation network model as the generator G.
7. A device for removing text noise in natural images, the device comprising:
a detection and mask generation unit, configured to detect a region containing text elements in an image to be repaired using an image semantic segmentation network, and to use the segmentation recognition result as a mask of the region to be repaired;
an image restoration unit, configured to repair the region containing text elements in the image to be repaired according to an image restoration model, using the mask of the region to be repaired; wherein the image restoration model is a generator of a generative adversarial network.
8. The apparatus of claim 7, wherein the apparatus further comprises:
a manual interaction unit, configured to determine, after the detection and mask generation unit detects the region containing text elements in the image to be repaired using the image semantic segmentation network, whether a user has selected a manual interaction mode for repairing the image to be repaired; if so, to receive the user's corrections to the region to be repaired through delete, modify, and add operations; otherwise, to notify the detection and mask generation unit to automatically use the segmentation recognition result as the mask of the region to be repaired.
9. The apparatus according to claim 7, wherein the image semantic segmentation network in the detection and mask generation unit is a U-shaped skip-connection network structure based on a U-Net segmentation network; and an atrous spatial pyramid pooling (ASPP) network is added on the basis of U-Net to extract and fuse multi-scale context features.
10. The apparatus of claim 9, wherein the detection and mask generation unit is further configured to:
adding an attention mechanism to enhance the feature representation capability of the image semantic segmentation network;
wherein the attention mechanism uses a channel attention module to assign a weight to each channel, and a spatial attention module to assign spatial feature weights.
11. The apparatus of claim 10, wherein the detection and mask generation unit is further configured to:
the channel attention module performs global pooling on the feature map of each channel to obtain global information; the weight of each channel is learned by two fully connected layers and multiplied with the initial features;
the spatial attention module compresses the channel count of the input feature map with a 1×1 convolution operation; adaptive pooling adjusts the spatial features to 4 different scales; the 4 pooled features are concatenated and reshaped, then fed into two fully connected layers that learn weights for different local regions of the spatial features; the learned weight parameters are restored to the scale of the compressed features; the spatial weights are then restored to the spatial size of the channel attention features and multiplied with them via a 1×1 convolution; finally, the resulting spatial features are added to the original features to obtain the final attention features.
12. The apparatus of claim 7, wherein the image restoration unit is further configured to:
the image restoration model is the generator G of a trained Pixel2Pixel generative adversarial network model; the Pixel2Pixel generative adversarial network model uses a U-Net segmentation network model as the generator G.
13. A storage medium having stored thereon an executable program which, when executed by a processor, performs the steps of the method for removing text noise in natural images according to any one of claims 1 to 6.
CN202110172477.8A 2021-02-08 2021-02-08 Method and device for removing text noise in natural image and storage medium Active CN112801911B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110172477.8A CN112801911B (en) 2021-02-08 2021-02-08 Method and device for removing text noise in natural image and storage medium

Publications (2)

Publication Number Publication Date
CN112801911A CN112801911A (en) 2021-05-14
CN112801911B true CN112801911B (en) 2024-03-26

Family

ID=75814802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110172477.8A Active CN112801911B (en) 2021-02-08 2021-02-08 Method and device for removing text noise in natural image and storage medium

Country Status (1)

Country Link
CN (1) CN112801911B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116670683A (en) * 2021-12-28 2023-08-29 华为技术有限公司 Image processing method, device and storage medium
CN114627389B (en) * 2022-03-23 2023-01-31 中国科学院空天信息创新研究院 Raft culture area extraction method based on multi-temporal optical remote sensing image

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
CN107609560A (en) * 2017-09-27 2018-01-19 北京小米移动软件有限公司 Character recognition method and device
CN108805840A (en) * 2018-06-11 2018-11-13 Oppo(重庆)智能科技有限公司 Method, apparatus, terminal and the computer readable storage medium of image denoising
CN109359550A (en) * 2018-09-20 2019-02-19 大连民族大学 Language of the Manchus document seal Abstraction and minimizing technology based on depth learning technology
CN109583449A (en) * 2018-10-29 2019-04-05 深圳市华尊科技股份有限公司 Character identifying method and Related product
CN110287960A (en) * 2019-07-02 2019-09-27 中国科学院信息工程研究所 The detection recognition method of curve text in natural scene image
WO2019238560A1 (en) * 2018-06-12 2019-12-19 Tomtom Global Content B.V. Generative adversarial networks for image segmentation
CN110738207A (en) * 2019-09-10 2020-01-31 西南交通大学 character detection method for fusing character area edge information in character image
CN110956579A (en) * 2019-11-27 2020-04-03 中山大学 Text image rewriting method based on semantic segmentation graph generation
CN111080723A (en) * 2019-12-17 2020-04-28 易诚高科(大连)科技有限公司 Image element segmentation method based on Unet network
CN111160352A (en) * 2019-12-27 2020-05-15 创新奇智(北京)科技有限公司 Workpiece metal surface character recognition method and system based on image segmentation
CN111199550A (en) * 2020-04-09 2020-05-26 腾讯科技(深圳)有限公司 Training method, segmentation method, device and storage medium of image segmentation network
WO2020219915A1 (en) * 2019-04-24 2020-10-29 University Of Virginia Patent Foundation Denoising magnetic resonance images using unsupervised deep convolutional neural networks

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Semantic Prior Based Generative Adversarial Network for Video Super-Resolution; Xinyi Wu; 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019); 2019-07-11; full text *
A survey of scene text detection and recognition based on deep learning; 艾合麦提江・麦提托合提; 艾斯卡尔・艾木都拉; 阿布都萨拉木・达吾提; Video Engineering (电视技术), (14); full text *
Applications of generative adversarial networks in medical image processing; 陈锟; 乔沁; 宋志坚; Life Science Instruments (生命科学仪器), 2018-10-25 (Z1); full text *

Also Published As

Publication number Publication date
CN112801911A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN109670558B (en) Digital image completion using deep learning
CN112232349B (en) Model training method, image segmentation method and device
CN111292264B (en) Image high dynamic range reconstruction method based on deep learning
CN110414499A (en) Text position localization method and system and model training method and system
CN112801911B (en) Method and device for removing text noise in natural image and storage medium
CN111062903A (en) Automatic processing method and system for image watermark, electronic equipment and storage medium
CN109472193A (en) Method for detecting human face and device
CN110781980B (en) Training method of target detection model, target detection method and device
WO2021238420A1 (en) Image defogging method, terminal, and computer storage medium
CN113160062A (en) Infrared image target detection method, device, equipment and storage medium
CN112906794A (en) Target detection method, device, storage medium and terminal
CN108710893A (en) A kind of digital image cameras source model sorting technique of feature based fusion
CN116645592B (en) Crack detection method based on image processing and storage medium
CN110689495A (en) Image restoration method for deep learning
CN110310224A (en) Light efficiency rendering method and device
CN113378812A (en) Digital dial plate identification method based on Mask R-CNN and CRNN
CN113824884A (en) Photographing method and apparatus, photographing device, and computer-readable storage medium
CN113468946A (en) Semantically consistent enhanced training data for traffic light detection
CN108520263A (en) A kind of recognition methods of panoramic picture, system and computer storage media
CN111951373B (en) Face image processing method and equipment
CN108810319A (en) Image processing apparatus and image processing method
CN116091784A (en) Target tracking method, device and storage medium
CN116167910A (en) Text editing method, text editing device, computer equipment and computer readable storage medium
CN113034432B (en) Product defect detection method, system, device and storage medium
CN113033645A (en) Multi-scale fusion depth image enhancement method and device for RGB-D image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant