CN112801911B - Method and device for removing text noise in natural image and storage medium - Google Patents
- Publication number: CN112801911B
- Application number: CN202110172477A (filed as CN202110172477.8A)
- Authority
- CN
- China
- Prior art keywords
- image
- repaired
- area
- mask
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration; G06T5/70—Denoising; Smoothing; G06T5/77—Retouching; Inpainting; Scratch removal
- G06T7/00—Image analysis; G06T7/10—Segmentation; Edge detection; G06T7/11—Region-based segmentation
- G06T2207/00—Indexing scheme for image analysis or image enhancement; G06T2207/20081—Training; Learning; G06T2207/30176—Document
- G06F—ELECTRIC DIGITAL DATA PROCESSING; G06F18/00—Pattern recognition; G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
The application discloses a method, a device and a storage medium for removing text noise in natural images. The method comprises the following steps: an image semantic segmentation network detects the regions containing text elements in the image to be repaired, and the segmentation result is taken as the mask of the region to be repaired; the regions containing text elements are then repaired using this mask according to an image restoration model, where the image restoration model is the generator of a generative adversarial network. With the method and device of this application, the common text-element regions in an image to be repaired can be detected quickly and automatically, text noise in natural images can be removed selectively and automatically, and the regions to be repaired can be corrected through manual interaction. Because the restoration method is based on a generative adversarial network, the restored image is more natural and lifelike.
Description
Technical Field
The embodiments of the present application relate to the field of image processing technology, and in particular to a method, a device and a storage medium for removing text noise in natural images.
Background
In recent years, with the advent of the big-data era and the development of computer hardware, artificial intelligence has become increasingly common in everyday life. Deep learning is widely applied in computer vision, and image recognition is one of its most widely used technologies, with examples such as photo-based recognition, face recognition, traffic-sign recognition, gesture recognition and garbage classification. These techniques find corresponding applications in the e-commerce, automotive, gaming and manufacturing industries.
Due to human factors, images often contain elements such as text. These text elements spoil the aesthetics of an image, prevent its reuse, and reduce its archival value and quality. Many application scenarios therefore require removing the text elements from natural-scene images to obtain clean images. However, text elements in natural images often appear in varied styles and uneven distributions, such as handwriting, subtitles, watermarks and scratches, all of which increase the difficulty of removal. Existing mainstream text-removal methods generally require manually annotating text mask regions before performing image restoration; these methods suffer from poor restored-image quality that does not match natural image characteristics, and they are time-consuming and labor-intensive.
On the other hand, conventional diffusion-based image restoration uses the edge information of the region to be repaired to determine the diffusion direction and diffuses known information inward from the edges. Images restored in this way are unnatural, blurred and lack texture details, and large defective regions cannot be repaired. Other traditional methods suffer from similar problems, such as complex processing pipelines, heavy computation and poor generalization.
Disclosure of Invention
In view of this, the embodiments of the present application provide a method, a device and a storage medium for removing text noise in natural images.
According to a first aspect of the present application, there is provided a method for removing text noise in a natural image, including:
detecting the region containing text elements in the image to be repaired with an image semantic segmentation network, and taking the segmentation result as the mask of the region to be repaired;
repairing the region containing text elements in the image to be repaired with the mask according to an image restoration model, where the image restoration model is the generator of a generative adversarial network.
As an implementation, detecting the region containing text elements in the image to be repaired with the image semantic segmentation network and taking the segmentation result as the mask of the region to be repaired further includes:
after detecting the region containing text elements in the image to be repaired with the image semantic segmentation network, determining whether the user chooses manual interaction to repair the image; if so, receiving the user's corrections to the region to be repaired through delete, modify and add operations; otherwise, taking the segmentation result directly as the mask of the region to be repaired.
As an implementation, the image semantic segmentation network adopts the U-shaped skip-connection structure of the U-Net segmentation network, with an atrous spatial pyramid pooling (ASPP) module added on top of U-Net to extract and fuse multi-scale context features.
As an implementation, the method further includes:
adding an attention mechanism to enhance the feature-characterization capability of the image semantic segmentation network;
the attention mechanism uses a channel attention module to assign a weight to each channel, and a spatial attention module to assign spatial feature weights.
As an implementation, the method further includes:
the channel attention module applies global pooling to the feature map of each channel to obtain global information, learns the weight of each channel with two fully connected layers, and multiplies the weights with the initial features;
the spatial attention module compresses the channel count of the resulting feature map with a 1×1 convolution; normalizes the spatial features to 4 different scales with adaptive pooling; concatenates and reshapes the 4-scale pooled features and feeds them into two fully connected layers to learn different local weights for the spatial features; reshapes the learned weight parameters to the scale of the compressed features; restores the spatial parameters to the spatial size of the channel-attention features with a 1×1 convolution and multiplies them in; and finally adds the resulting spatial features to the original features to obtain the final attention features.
As an implementation, the method further includes:
the image restoration model is the generator G of a trained Pixel2Pixel generative adversarial network model; the Pixel2Pixel model adopts a U-Net segmentation network model as the generator G.
According to a second aspect of the present application, there is provided a device for removing text noise in natural images, including:
a detection and mask generation unit, configured to detect the region containing text elements in the image to be repaired with an image semantic segmentation network and take the segmentation result as the mask of the region to be repaired;
an image restoration unit, configured to repair the region containing text elements in the image to be repaired with the mask according to an image restoration model, where the image restoration model is the generator of a generative adversarial network.
As an implementation, the device further includes:
a manual interaction unit, configured to determine, after the detection and mask generation unit detects the region containing text elements in the image to be repaired with the image semantic segmentation network, whether the user chooses manual interaction to repair the image; if so, to receive the user's corrections to the region to be repaired through delete, modify and add operations; otherwise, to notify the detection and mask generation unit to take the segmentation result as the mask of the region to be repaired.
As an implementation, the image semantic segmentation network in the detection and mask generation unit adopts the U-shaped skip-connection structure of the U-Net segmentation network, with an atrous spatial pyramid pooling (ASPP) module added on top of U-Net to extract and fuse multi-scale context features.
As an implementation, the detection and mask generation unit is further configured to:
add an attention mechanism to enhance the feature-characterization capability of the image semantic segmentation network;
the attention mechanism uses a channel attention module to assign a weight to each channel, and a spatial attention module to assign spatial feature weights.
As an implementation, the detection and mask generation unit is further configured such that:
the channel attention module applies global pooling to the feature map of each channel to obtain global information, learns the weight of each channel with two fully connected layers, and multiplies the weights with the initial features;
the spatial attention module compresses the channel count of the resulting feature map with a 1×1 convolution; normalizes the spatial features to 4 different scales with adaptive pooling; concatenates and reshapes the 4-scale pooled features and feeds them into two fully connected layers to learn different local weights for the spatial features; reshapes the learned weight parameters to the scale of the compressed features; restores the spatial parameters to the spatial size of the channel-attention features with a 1×1 convolution and multiplies them in; and finally adds the resulting spatial features to the original features to obtain the final attention features.
As an implementation, the image restoration unit is further configured such that:
the image restoration model is the generator G of a trained Pixel2Pixel generative adversarial network model; the Pixel2Pixel model adopts a U-Net segmentation network model as the generator G.
According to a third aspect of the present application, there is provided a storage medium having stored thereon an executable program which when executed by a processor implements the steps of the method of removing text noise in natural images.
With the method, device and storage medium for removing text noise in natural images described above, the region containing text elements in the image to be repaired is detected with an image semantic segmentation network, and the segmentation result is taken as the mask of the region to be repaired; the region containing text elements is then repaired with the mask according to an image restoration model, where the image restoration model is the generator of a generative adversarial network. In this way, the common text-element regions in an image to be repaired can be detected quickly and automatically, text noise in natural images can be removed selectively and automatically, and the regions to be repaired can be corrected through manual interaction. Because the restoration method is based on a generative adversarial network, the restored image is more natural and lifelike.
Drawings
Fig. 1 is a schematic flow chart of a method for removing text noise in a natural image according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a semantic segmentation model according to an embodiment of the present application;
FIG. 3 is a flowchart of a specific example of a method for removing text noise in natural images according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an attention module structure according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a Pixel2Pixel model training architecture provided in an embodiment of the present application;
fig. 6 is a schematic diagram of a composition structure of a device for removing text noise in a natural image according to an embodiment of the present application.
Detailed Description
The following describes in detail the essence of the technical solution of the embodiments of the present application with reference to examples.
With the rise of deep learning, deep convolutional neural networks can easily detect text in document or natural-scene images and locate the text regions. Mainstream deep learning text detection methods are based on two approaches: object detection and semantic segmentation. Compared with object detection algorithms, which regress rectangular boxes, semantic segmentation methods recognize at the pixel level, locate text more accurately, impose no strict requirement on text orientation, and fit the contours of text regions more closely. Mainstream semantic segmentation network structures are all encoder-decoders, such as the FCN, U-Net and DeepLab families of segmentation models.
Image restoration methods based on generative adversarial networks (GANs) can learn rich semantic information from large-scale datasets and then fill in the missing content of an image end to end; the restored images are more natural and lifelike, achieving a better restoration effect.
The embodiments of the present application combine the latest semantic segmentation and image restoration techniques: text regions in a natural image are obtained through semantic segmentation, a manual interaction mechanism is incorporated, and finally the natural image is repaired using a generative adversarial network. For different application scenarios, image restoration combines two decision mechanisms for the text regions, automatic selection and manual interaction, making the method convenient to use, light in labor cost, and the restored images natural and lifelike.
Fig. 1 is a schematic flow chart of a method for removing text noise in a natural image according to an embodiment of the present application, as shown in fig. 1, where the method for removing text noise in a natural image according to an embodiment of the present application includes the following processing steps:
Step 101: detect the region containing text elements in the image to be repaired with the image semantic segmentation network, and take the segmentation result as the mask of the region to be repaired.
In the embodiments of the present application, after the region containing text elements is detected with the image semantic segmentation network, it is determined whether the user chooses manual interaction to repair the image; if so, the user's corrections to the region to be repaired are received through delete, modify and add operations; otherwise, the segmentation result is used directly as the mask of the region to be repaired.
In the embodiments of the present application, the image semantic segmentation network adopts the U-shaped skip-connection structure of the U-Net segmentation network, with an atrous spatial pyramid pooling (ASPP) module added on top of U-Net to extract and fuse multi-scale context features.
The improved semantic segmentation model of the embodiments is shown in FIG. 2. The overall U-Net structure resembles a large letter "U": the input is first downsampled, then upsampled by deconvolution and fused with the corresponding earlier (shallower) layers, then upsampled again; this process repeats until the output feature map is obtained.
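The downsample / upsample / skip-fusion pattern described above can be sketched with plain NumPy. This is an illustrative sketch of the idea only, not the patent's actual network; the function names are invented, mean pooling stands in for strided convolution, and nearest-neighbour repetition stands in for deconvolution.

```python
import numpy as np

def down2(x):
    """Encoder step: 2x2 mean pooling halves the spatial dimensions."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def up2(x):
    """Decoder step: nearest-neighbour upsampling doubles the spatial dimensions."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_level(x):
    """One U-Net level: keep a shallow feature, go down, come back up,
    and fuse the upsampled deep feature with the shallow one channel-wise."""
    skip = x                      # shallow feature kept for the skip connection
    deep = down2(x)               # downsample
    restored = up2(deep)          # upsample (stand-in for deconvolution)
    return np.concatenate([skip, restored], axis=-1)

x = np.random.rand(8, 8, 3)
y = unet_level(x)                 # spatial size preserved, channels doubled
```

The skip connection is what lets the decoder recover fine spatial detail that pure downsampling would discard.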
In the embodiments of the present application, atrous spatial pyramid pooling (ASPP) applies atrous (dilated) convolutions to the input in parallel at several different sampling rates, which is equivalent to capturing the context of the image at multiple scales.
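The parallel multi-rate sampling can be illustrated in one dimension. This is a hedged sketch of the atrous idea, not the network's real ASPP layer; the function names, kernel, and fusion-by-averaging are invented for illustration.

```python
import numpy as np

def dilated_conv1d(x, kernel, rate):
    """Atrous ('with holes') 1-D convolution: kernel taps are spaced
    `rate` samples apart, so the receptive field grows with the rate
    while the number of parameters stays the same."""
    k = len(kernel)
    span = (k - 1) * rate                  # receptive field minus one
    out = np.empty(len(x) - span)
    for i in range(len(out)):
        taps = x[i : i + span + 1 : rate]  # sample the input with holes
        out[i] = float(np.dot(taps, kernel))
    return out

def aspp_1d(x, kernel, rates=(1, 2, 4)):
    """ASPP idea: run the same kernel at several dilation rates in
    parallel, crop the branches to a common length, and fuse them."""
    branches = [dilated_conv1d(x, kernel, r) for r in rates]
    n = min(len(b) for b in branches)
    return np.mean([b[:n] for b in branches], axis=0)

signal = np.arange(10, dtype=float)
fused = aspp_1d(signal, kernel=[1.0, 1.0, 1.0])
```

A real ASPP layer fuses the branches by concatenation followed by a 1×1 convolution rather than by averaging; the averaging here only keeps the sketch short.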
In the embodiments of the present application, the method further includes adding an attention mechanism to enhance the feature-characterization capability of the image semantic segmentation network; the attention mechanism uses a channel attention module to assign a weight to each channel, and a spatial attention module to assign spatial feature weights.
In the embodiments of the present application, the channel attention module applies global pooling to the feature map of each channel to obtain global information, learns the weight of each channel with two fully connected layers, and multiplies the weights with the initial features;
the spatial attention module compresses the channel count of the resulting feature map with a 1×1 convolution; normalizes the spatial features to 4 different scales with adaptive pooling; concatenates and reshapes the 4-scale pooled features and feeds them into two fully connected layers to learn different local weights for the spatial features; reshapes the learned weight parameters to the scale of the compressed features; restores the spatial parameters to the spatial size of the channel-attention features with a 1×1 convolution and multiplies them in; and finally adds the resulting spatial features to the original features to obtain the final attention features.
Step 102: repair the region containing text elements in the image to be repaired with the mask of the region to be repaired according to the image restoration model, where the image restoration model is the generator of a generative adversarial network.
In the embodiments of the present application, the image restoration model (image restoration module) is the generator G of a trained Pixel2Pixel generative adversarial network model; the Pixel2Pixel model adopts a U-Net segmentation network model as the generator G.
In the embodiments of the present application, after the repair-area mask is generated, the selected region is repaired by the image restoration module, which uses the generator G of the trained Pixel2Pixel model to synthesize a realistic natural image. Pixel2Pixel is a generative adversarial network whose training inputs are paired images; it consists mainly of a generator G and a discriminator D. To enhance image detail and preserve information at different scales, a U-Net model is adopted as the generator G.
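The mask-guided repair step amounts to a per-pixel blend: generated content replaces only the masked text pixels, and everything else keeps its original value. The following is a minimal sketch of that compositing idea under the assumption that `generated` is the generator's output; `composite_repair` is a hypothetical name, not a function from the patent.

```python
import numpy as np

def composite_repair(image, mask, generated):
    """Blend the generator output into the image to be repaired: only
    pixels inside the text mask are replaced by generated content,
    while all other pixels keep their original values."""
    m = mask.astype(image.dtype)[..., None]  # (H, W, 1), broadcasts over channels
    return m * generated + (1.0 - m) * image

# toy example: a 4x4 RGB image with a 2x2 masked patch
img = np.zeros((4, 4, 3))
gen = np.ones((4, 4, 3))
msk = np.zeros((4, 4), dtype=bool)
msk[1:3, 1:3] = True
out = composite_repair(img, msk, gen)
```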
Embodiments of the present application are described in further detail below in conjunction with specific examples.
In the embodiments of the present application, a natural image is used as the example for explanation; it should be noted that other pictures or images, such as screenshots or photographed text, may also use the technical means of the embodiments.
Fig. 3 is a flowchart of a specific example of a method for removing text noise in a natural image according to an embodiment of the present application, where specific steps are as follows:
first, the user loads the image to be repaired. And automatically detecting the region containing the literal element in the natural image by a literal element detection module. The character detection module adopts a trained image semantic segmentation network to detect character areas, and takes segmentation recognition results as masks of the areas to be repaired. The semantic segmentation network model refers to a U-shaped jump layer connection network structure of a classical segmentation network U-Net. Aiming at the character characteristics, an ASPP module is added on the basis of the original U-Net to extract and fuse multi-scale context characteristics, and further a new attention mechanism is provided to enhance the characteristic characterization capability of the network, and the overall structure of the model is shown in figure 2.
In particular, the attention mechanism considers both enhanced channels and spatial features. The mechanism first uses a channel attention module whose main function is to assign weights to the individual channels, and then uses a spatial attention module to assign spatial feature weights. The channel attention module carries out global pooling on the feature map of each channel to obtain global information, then learns by adopting two full-connection layers (fc layers) to obtain the weight of each channel, and carries out multiplication operation with the initial feature. On the basis, the space attention module firstly compresses the channel number of the newly obtained feature map by using 1*1 convolution operation to reduce the calculated amount, then adopts self-adaptive pooling to normalize the space feature to 4 different scales, such as [1 x 1,8 x 8,16 x 16,32 x 32] and the like, so as to count global or local features of different feature maps, next splices and normalizes pooled features of the 4 scales, then inputs the pooled features into two layers of fully connected layers (fc layers) to learn different local weights of the space feature, normalizes the learned weight parameters to the size of the compressed feature of the previous step, then uses 1*1 convolution to restore the space parameter size to the space size of the channel attention feature, and performs multiplication operation on the space feature and the original feature, and finally performs addition operation on the latest obtained space feature to obtain the final attention feature. The attention module structure is shown in fig. 4. Fig. 4 is a schematic diagram of an attention module structure according to an embodiment of the present invention.
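The channel-then-spatial attention flow above can be sketched in NumPy. This is a simplified illustration under stated assumptions, not the patent's module: `w1`/`w2` are random stand-ins for the learned fc weights, a channel mean stands in for the 1×1 compression convolution, and a single sigmoid gate stands in for the learned spatial fc layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feat, w1, w2):
    """Channel attention as described above: global average pooling per
    channel, two fully connected layers producing one weight per channel,
    then a broadcast multiplication with the input features."""
    gap = feat.mean(axis=(0, 1))               # (C,) global pooling
    weights = sigmoid(w2 @ np.tanh(w1 @ gap))  # two fc layers -> (C,)
    return feat * weights

def adaptive_pool(m, out):
    """Adaptive average pooling of a single 2-D map to (out, out)."""
    h, w = m.shape
    ys = np.linspace(0, h, out + 1).astype(int)
    xs = np.linspace(0, w, out + 1).astype(int)
    return np.array([[m[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].mean()
                      for j in range(out)] for i in range(out)])

def spatial_attention(feat, scales=(1, 8, 16, 32)):
    """Simplified spatial attention: a channel mean stands in for the 1x1
    compression convolution, adaptive pooling collects statistics at the
    four scales, a fixed sigmoid gate stands in for the learned fc layers,
    the gated spatial map reweights the features, and a residual add follows."""
    compressed = feat.mean(axis=-1)            # channel compression (H, W)
    stats = np.concatenate([adaptive_pool(compressed, s).ravel()
                            for s in scales])  # multi-scale statistics
    gate = sigmoid(stats.mean())               # stand-in for learned weights
    spatial = sigmoid(compressed)[..., None] * gate
    return feat * spatial + feat               # multiply, then residual add

rng = np.random.default_rng(0)
feat = rng.random((32, 32, 8))
c = feat.shape[-1]
w1, w2 = rng.random((c, c)) * 0.1, rng.random((c, c)) * 0.1
attended = spatial_attention(channel_attention(feat, w1, w2))
```

The output keeps the input's shape, as an attention module must, since it reweights rather than resizes features.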
Specifically, the system determines whether the user chooses to correct and modify the U-Net's predicted region to be repaired through manual interaction. If manual interaction is required, the user can correct the region to be repaired through delete, modify, add and similar operations before the final region mask is generated. If no manual interaction is used, the mask of the region to be repaired is generated directly from the predicted text region.
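The delete/add corrections amount to boolean edits on the predicted mask. The following minimal sketch assumes rectangular corrections expressed as slice pairs; `correct_mask` is a hypothetical helper, not part of the patent.

```python
import numpy as np

def correct_mask(mask, deletions=(), additions=()):
    """Manual-interaction correction of a predicted boolean mask:
    each correction is a (row_slice, col_slice) rectangle; deletions
    clear false positives, additions mark missed text regions."""
    out = mask.copy()
    for rs, cs in deletions:
        out[rs, cs] = False   # user removes a wrongly detected region
    for rs, cs in additions:
        out[rs, cs] = True    # user adds a missed text region
    return out

pred = np.zeros((6, 6), dtype=bool)
pred[0:2, 0:2] = True                          # predicted text region
final = correct_mask(pred,
                     deletions=[(slice(0, 2), slice(0, 2))],
                     additions=[(slice(4, 6), slice(4, 6))])
```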
After the repair-area mask is generated, the selected region is repaired by the image restoration module, which uses the generator G of the trained Pixel2Pixel model to synthesize a realistic natural image. Pixel2Pixel is a generative adversarial network whose training inputs are paired images; it consists mainly of a generator G and a discriminator D. To enhance image detail and preserve information at different scales, a U-Net model is adopted as the generator G. FIG. 5 shows the Pixel2Pixel training architecture.
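A paired-image GAN of this kind is commonly trained with a discriminator loss plus a generator loss that mixes an adversarial term with an L1 reconstruction term against the paired ground truth. The sketch below shows those objectives in NumPy; it is an assumption about the training setup, not the patent's stated loss, and the L1 weight `lam` is an invented illustrative value.

```python
import numpy as np

def pix2pix_losses(d_real, d_fake, gen_out, target, lam=100.0):
    """Pixel2Pixel-style objectives on paired images: the discriminator
    D is trained to score real pairs high and generated pairs low, and
    the generator G combines an adversarial term with an L1 term that
    pulls its output toward the paired ground-truth image."""
    eps = 1e-8                                # numerical safety for log
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1.0 - d_fake + eps))
    g_adv = -np.mean(np.log(d_fake + eps))    # reward fooling the discriminator
    g_l1 = np.mean(np.abs(gen_out - target))  # stay close to the paired target
    return d_loss, g_adv + lam * g_l1
```

In an actual training loop, the two losses would be minimized alternately with respect to D's and G's parameters.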
The restored natural images are saved; once all images have been processed, the system exits.
Fig. 6 is a schematic diagram of a composition structure of a device for removing text noise in a natural image according to an embodiment of the present application, as shown in fig. 6, where the device for removing text noise in a natural image according to an embodiment of the present application includes:
a detection and mask generation unit 61, configured to detect the region containing text elements in the image to be repaired with an image semantic segmentation network and take the segmentation result as the mask of the region to be repaired;
an image restoration unit 62, configured to repair the region containing text elements in the image to be repaired with the mask according to an image restoration model, where the image restoration model is the generator of a generative adversarial network.
The apparatus further comprises:
a manual interaction unit 63, configured to determine, after the detection and mask generation unit 61 detects the region containing text elements in the image to be repaired with the image semantic segmentation network, whether the user chooses manual interaction to repair the image; if so, to receive the user's corrections to the region to be repaired through delete, modify and add operations; otherwise, to notify the detection and mask generation unit 61 to take the segmentation result as the mask of the region to be repaired.
The image semantic segmentation network in the detection and mask generation unit 61 adopts the U-shaped skip-connection structure of the U-Net segmentation network, with an atrous spatial pyramid pooling (ASPP) module added on top of U-Net to extract and fuse multi-scale context features.
The detection and mask generation unit 61 is further configured to:
add an attention mechanism to enhance the feature-characterization capability of the image semantic segmentation network;
the attention mechanism uses a channel attention module to assign a weight to each channel, and a spatial attention module to assign spatial feature weights.
The detection and mask generation unit 61 is further configured such that:
the channel attention module applies global pooling to the feature map of each channel to obtain global information, learns the weight of each channel with two fully connected layers, and multiplies the weights with the initial features;
the spatial attention module compresses the channel count of the resulting feature map with a 1×1 convolution; normalizes the spatial features to 4 different scales with adaptive pooling; concatenates and reshapes the 4-scale pooled features and feeds them into two fully connected layers to learn different local weights for the spatial features; reshapes the learned weight parameters to the scale of the compressed features; restores the spatial parameters to the spatial size of the channel-attention features with a 1×1 convolution and multiplies them in; and finally adds the resulting spatial features to the original features to obtain the final attention features.
The image restoration unit 62 is further configured such that:
the image restoration model is the generator G of a trained Pixel2Pixel generative adversarial network model; the Pixel2Pixel model adopts a U-Net segmentation network model as the generator G.
In an exemplary embodiment, the processing units of the device for removing text noise in natural images described above may be implemented by one or more central processing units (CPUs), graphics processing units (GPUs), baseband processors (BPs), application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers (MCUs), microprocessors, or other electronic components.
In the embodiments of the present disclosure, the specific manner in which each processing unit of the apparatus for removing text noise in natural images performs its operations in the embodiment shown in fig. 6 has been described in detail in the method embodiments and will not be repeated here.
An embodiment of the present application further provides a storage medium storing an executable program which, when executed by a processor, implements the steps of the method for removing text noise in natural images.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present invention, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present invention. The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above described device embodiments are only illustrative, e.g. the division of the units is only one logical function division, and there may be other divisions in practice, such as: multiple units or components may be combined or may be integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communicative connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communicative connection between devices or units may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; can be located in one place or distributed to a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present invention may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.
The foregoing is merely an embodiment of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present invention, and the changes and substitutions are intended to be covered by the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (13)
1. A method for removing text noise in a natural image, the method comprising:
detecting an area containing text elements in an image to be repaired according to an image semantic segmentation network, and using the segmentation recognition result as a mask of the area to be repaired;
repairing, according to an image restoration model, the area containing the text elements in the image to be repaired by using the mask of the area to be repaired; the image restoration model is a generator of a generative adversarial network.
2. The method according to claim 1, wherein the detecting the area containing text elements in the image to be repaired according to the image semantic segmentation network and using the segmentation recognition result as the mask of the area to be repaired further comprises:
after detecting the area containing text elements in the image to be repaired according to the image semantic segmentation network, determining whether the user selects a manual interaction mode to repair the image to be repaired; if yes, receiving the user's correction of the area to be repaired through delete, modify, and add operations; otherwise, automatically using the segmentation recognition result as the mask of the area to be repaired.
3. The method according to claim 1, wherein the image semantic segmentation network is a U-shaped skip-connection network structure of a U-Net segmentation network; and an atrous spatial pyramid pooling (ASPP) network is added on the basis of the U-Net to extract and fuse multi-scale context features.
4. The method according to claim 3, wherein the method further comprises:
adding an attention mechanism to enhance the feature characterization capability of the image semantic segmentation network;
the attention mechanism uses a channel attention module to assign weights to the channels, and uses a spatial attention module to assign spatial feature weights.
5. The method according to claim 4, wherein the method further comprises:
the channel attention module performs global pooling on the feature map of each channel to obtain global information; the weight of each channel is then learned by two fully connected layers and multiplied with the initial features;
the spatial attention module compresses the number of channels of the resulting feature map with a 1×1 convolution; adaptive pooling normalizes the spatial features to 4 different scales; the pooled features at the 4 scales are concatenated and reshaped, then fed into two fully connected layers to learn local weights for the spatial features; the learned weight parameters are reshaped back to the scale of the compressed features; the spatial parameters are restored to the spatial size of the channel attention features by a 1×1 convolution and multiplied with them; and the resulting spatial features are added to the original features to obtain the final attention features.
6. The method according to claim 1, wherein the method further comprises:
the image restoration model is the generator G of a trained Pixel2Pixel generative adversarial network model, and the Pixel2Pixel generative adversarial network model adopts a U-Net segmentation network model as the generator G.
7. A device for removing text noise in natural images, the device comprising:
the detection and mask generation unit is configured to detect an area containing text elements in an image to be repaired according to an image semantic segmentation network, and use the segmentation recognition result as a mask of the area to be repaired;
the image restoration unit is configured to repair, according to an image restoration model, the area containing the text elements in the image to be repaired by using the mask of the area to be repaired; the image restoration model is a generator of a generative adversarial network.
8. The apparatus of claim 7, wherein the apparatus further comprises:
the manual interaction unit is configured to determine, after the detection and mask generation unit detects the area containing text elements in the image to be repaired according to the image semantic segmentation network, whether the user selects a manual interaction mode to repair the image to be repaired; if yes, receive the user's correction of the area to be repaired through delete, modify, and add operations; otherwise, notify the detection and mask generation unit to automatically use the segmentation recognition result as the mask of the area to be repaired.
9. The apparatus according to claim 7, wherein the image semantic segmentation network in the detection and mask generation unit is a U-shaped skip-connection network structure of a U-Net segmentation network; and an atrous spatial pyramid pooling (ASPP) network is added on the basis of the U-Net to extract and fuse multi-scale context features.
10. The apparatus of claim 9, wherein the detection and mask generation unit is further configured to:
add an attention mechanism to enhance the feature characterization capability of the image semantic segmentation network;
the attention mechanism uses a channel attention module to assign weights to the channels, and uses a spatial attention module to assign spatial feature weights.
11. The apparatus of claim 10, wherein the detection and mask generation unit is further configured to:
the channel attention module performs global pooling on the feature map of each channel to obtain global information; the weight of each channel is then learned by two fully connected layers and multiplied with the initial features;
the spatial attention module compresses the number of channels of the resulting feature map with a 1×1 convolution; adaptive pooling normalizes the spatial features to 4 different scales; the pooled features at the 4 scales are concatenated and reshaped, then fed into two fully connected layers to learn local weights for the spatial features; the learned weight parameters are reshaped back to the scale of the compressed features; the spatial parameters are restored to the spatial size of the channel attention features by a 1×1 convolution and multiplied with them; and the resulting spatial features are added to the original features to obtain the final attention features.
12. The apparatus of claim 7, wherein the image restoration unit is further configured to:
the image restoration model is the generator G of a trained Pixel2Pixel generative adversarial network model; the Pixel2Pixel generative adversarial network model adopts a U-Net segmentation network model as the generator G.
13. A storage medium having stored thereon an executable program which, when executed by a processor, performs the steps of the method for removing text noise in natural images according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110172477.8A CN112801911B (en) | 2021-02-08 | 2021-02-08 | Method and device for removing text noise in natural image and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110172477.8A CN112801911B (en) | 2021-02-08 | 2021-02-08 | Method and device for removing text noise in natural image and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112801911A CN112801911A (en) | 2021-05-14 |
CN112801911B true CN112801911B (en) | 2024-03-26 |
Family
ID=75814802
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110172477.8A Active CN112801911B (en) | 2021-02-08 | 2021-02-08 | Method and device for removing text noise in natural image and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112801911B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116670683A (en) * | 2021-12-28 | 2023-08-29 | 华为技术有限公司 | Image processing method, device and storage medium |
CN114627389B (en) * | 2022-03-23 | 2023-01-31 | 中国科学院空天信息创新研究院 | Raft culture area extraction method based on multi-temporal optical remote sensing image |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105574513A (en) * | 2015-12-22 | 2016-05-11 | 北京旷视科技有限公司 | Character detection method and device |
CN107609560A (en) * | 2017-09-27 | 2018-01-19 | 北京小米移动软件有限公司 | Character recognition method and device |
CN108805840A (en) * | 2018-06-11 | 2018-11-13 | Oppo(重庆)智能科技有限公司 | Method, apparatus, terminal and the computer readable storage medium of image denoising |
CN109359550A (en) * | 2018-09-20 | 2019-02-19 | 大连民族大学 | Language of the Manchus document seal Abstraction and minimizing technology based on depth learning technology |
CN109583449A (en) * | 2018-10-29 | 2019-04-05 | 深圳市华尊科技股份有限公司 | Character identifying method and Related product |
CN110287960A (en) * | 2019-07-02 | 2019-09-27 | 中国科学院信息工程研究所 | The detection recognition method of curve text in natural scene image |
WO2019238560A1 (en) * | 2018-06-12 | 2019-12-19 | Tomtom Global Content B.V. | Generative adversarial networks for image segmentation |
CN110738207A (en) * | 2019-09-10 | 2020-01-31 | 西南交通大学 | character detection method for fusing character area edge information in character image |
CN110956579A (en) * | 2019-11-27 | 2020-04-03 | 中山大学 | Text image rewriting method based on semantic segmentation graph generation |
CN111080723A (en) * | 2019-12-17 | 2020-04-28 | 易诚高科(大连)科技有限公司 | Image element segmentation method based on Unet network |
CN111160352A (en) * | 2019-12-27 | 2020-05-15 | 创新奇智(北京)科技有限公司 | Workpiece metal surface character recognition method and system based on image segmentation |
CN111199550A (en) * | 2020-04-09 | 2020-05-26 | 腾讯科技(深圳)有限公司 | Training method, segmentation method, device and storage medium of image segmentation network |
WO2020219915A1 (en) * | 2019-04-24 | 2020-10-29 | University Of Virginia Patent Foundation | Denoising magnetic resonance images using unsupervised deep convolutional neural networks |
2021-02-08: CN application CN202110172477.8A granted as patent CN112801911B (en), status Active
Non-Patent Citations (4)
Title |
---|
Semantic Prior Based Generative Adversarial Network for Video Super-Resolution; Xinyi Wu; 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019); 2019-07-11; full text *
Survey of scene text detection and recognition based on deep learning; 艾合麦提江・麦提托合提, 艾斯卡尔・艾木都拉, 阿布都萨拉木・达吾提; Video Engineering (电视技术), no. 14; full text *
Applications of generative adversarial networks in medical image processing; 陈锟, 乔沁, 宋志坚; Life Science Instruments (生命科学仪器); 2018-10-25 (Z1); full text *
Also Published As
Publication number | Publication date |
---|---|
CN112801911A (en) | 2021-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109670558B (en) | Digital image completion using deep learning | |
CN112232349B (en) | Model training method, image segmentation method and device | |
CN111292264B (en) | Image high dynamic range reconstruction method based on deep learning | |
CN110414499A (en) | Text position localization method and system and model training method and system | |
CN112801911B (en) | Method and device for removing text noise in natural image and storage medium | |
CN111062903A (en) | Automatic processing method and system for image watermark, electronic equipment and storage medium | |
CN109472193A (en) | Method for detecting human face and device | |
CN110781980B (en) | Training method of target detection model, target detection method and device | |
WO2021238420A1 (en) | Image defogging method, terminal, and computer storage medium | |
CN113160062A (en) | Infrared image target detection method, device, equipment and storage medium | |
CN112906794A (en) | Target detection method, device, storage medium and terminal | |
CN108710893A (en) | A kind of digital image cameras source model sorting technique of feature based fusion | |
CN116645592B (en) | Crack detection method based on image processing and storage medium | |
CN110689495A (en) | Image restoration method for deep learning | |
CN110310224A (en) | Light efficiency rendering method and device | |
CN113378812A (en) | Digital dial plate identification method based on Mask R-CNN and CRNN | |
CN113824884A (en) | Photographing method and apparatus, photographing device, and computer-readable storage medium | |
CN113468946A (en) | Semantically consistent enhanced training data for traffic light detection | |
CN108520263A (en) | A kind of recognition methods of panoramic picture, system and computer storage media | |
CN111951373B (en) | Face image processing method and equipment | |
CN108810319A (en) | Image processing apparatus and image processing method | |
CN116091784A (en) | Target tracking method, device and storage medium | |
CN116167910A (en) | Text editing method, text editing device, computer equipment and computer readable storage medium | |
CN113034432B (en) | Product defect detection method, system, device and storage medium | |
CN113033645A (en) | Multi-scale fusion depth image enhancement method and device for RGB-D image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |