CN112801911A - Method and device for removing text-like noise in a natural image, and storage medium - Google Patents

Info

Publication number
CN112801911A
CN112801911A (application CN202110172477.8A; granted publication CN112801911B)
Authority
CN
China
Prior art keywords
image
repaired
spatial
features
channel
Prior art date
Legal status
Granted
Application number
CN202110172477.8A
Other languages
Chinese (zh)
Other versions
CN112801911B (en)
Inventor
王波
张百灵
崔嵬
Current Assignee
Suzhou Changzuichu Software Co ltd
Original Assignee
Suzhou Changzuichu Software Co ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Changzuichu Software Co ltd filed Critical Suzhou Changzuichu Software Co ltd
Priority to CN202110172477.8A
Publication of CN112801911A
Application granted
Publication of CN112801911B
Legal status: Active
Anticipated expiration


Classifications

    • G06T5/70
    • G06F18/214 (pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting)
    • G06T5/77
    • G06T7/11 (image analysis: region-based segmentation)
    • G06T2207/20081 (training; learning)
    • G06T2207/30176 (subject of image: document)

Abstract

The application discloses a method, a device, and a storage medium for removing text-like noise from natural images. An image semantic segmentation network detects the regions of the image to be repaired that contain text elements, and the segmentation result serves as the mask of the region to be repaired; an image inpainting model then repairs the masked text regions, where the inpainting model is the generator of a generative adversarial network. The scheme can quickly and automatically detect common text-element regions in the image to be repaired and remove the text-like noise automatically, or let the user correct the regions to be repaired through manual interaction. Because inpainting is based on a generative adversarial network, the restored image is more natural and realistic.

Description

Method and device for removing text-like noise in a natural image, and storage medium
Technical Field
Embodiments of this application relate to the technical field of image classification, and in particular to a method, a device, and a storage medium for removing text-like noise from natural images.
Background
In recent years, with the arrival of the big-data era and advances in computer hardware, artificial intelligence has become increasingly common in everyday life. Deep learning is widely applied in computer vision, and image recognition is one of its most widely deployed technologies, covering tasks such as object recognition from photos, face recognition, traffic-sign recognition, gesture recognition, and garbage classification. These technologies find corresponding applications in e-commerce, the automotive industry, gaming, and manufacturing.
Images often carry human-added elements such as text. These text elements spoil the appearance of an image, hinder its reuse, and reduce its storage value and quality. A large number of application scenarios therefore require removing text-like elements from natural-scene images to obtain a clean image. However, text elements in natural images vary widely in style and distribution (handwriting, subtitles, watermarks, scratches, and so on), all of which increase the difficulty of removal. Mainstream removal methods generally require manually annotating a text mask region before inpainting; the restored images are of poor quality and inconsistent with natural-image characteristics, and the process is time-consuming and labor-intensive.
Traditional diffusion-based image inpainting, on the other hand, uses the edge information of the region to be repaired to determine the diffusion direction and propagates known information inward from the edges. Images restored this way are unnatural, blurry, and lacking in texture detail, and large defect regions cannot be repaired. Other traditional methods suffer from similar problems: complex processing pipelines, heavy computation, and poor generalization.
Disclosure of Invention
In view of the above, embodiments of the present application provide a method and an apparatus for removing text-like noise in natural images, and a storage medium.
According to a first aspect of the present application, there is provided a method for removing text-like noise from a natural image, including:
detecting the regions containing text elements in the image to be repaired with an image semantic segmentation network, and taking the segmentation result as the mask of the region to be repaired;
repairing the text-element regions in the image to be repaired with an image inpainting model, using the mask of the region to be repaired; the inpainting model is the generator of a generative adversarial network.
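The two steps above amount to: predict a binary text mask, run the generator on the image, and keep the original pixels everywhere outside the mask. A minimal numpy sketch of the final composition step (the function name and array layout are illustrative assumptions, not from the patent):

```python
import numpy as np

def composite_repair(image, generated, mask):
    """Blend generator output into the original image.

    image, generated: float arrays in [0, 1] with shape (H, W, C).
    mask: float array in {0, 1} with shape (H, W, 1); 1 marks text
    pixels to repair. Outside the mask the original pixels are kept.
    """
    return mask * generated + (1.0 - mask) * image
```

Only the masked text region receives synthesized content, so untouched parts of the image are preserved exactly.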
As one implementation, the detecting of text-element regions with the image semantic segmentation network and the use of the segmentation result as the mask of the region to be repaired further include:
after detecting the regions containing text elements in the image to be repaired with the image semantic segmentation network, determining whether the user chooses a manual-interaction mode for the repair; if so, receiving the user's corrections to the region to be repaired through delete, modify, and add operations; otherwise, taking the segmentation result directly as the mask of the region to be repaired.
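The delete/modify/add corrections described above can be modeled as edits on a binary mask. A hedged sketch (the `edit_mask` helper and the rectangular selection format are assumptions for illustration; the patent does not specify the selection shape):

```python
import numpy as np

def edit_mask(mask, op, region):
    """Apply one interactive correction to a binary repair mask.

    op: "add" marks a rectangle for repair, "delete" unmarks it
    (a "modify" can be expressed as a delete followed by an add).
    region: (r0, r1, c0, c1), a hypothetical rectangular selection.
    """
    r0, r1, c0, c1 = region
    out = mask.copy()
    if op == "add":
        out[r0:r1, c0:c1] = 1
    elif op == "delete":
        out[r0:r1, c0:c1] = 0
    else:
        raise ValueError(f"unknown op: {op}")
    return out
```

Each user gesture produces a new mask, so the automatic prediction is only a starting point that interaction can refine.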
As one implementation, the image semantic segmentation network uses the U-shaped skip-connection structure of the U-Net segmentation network, and an atrous spatial pyramid pooling (ASPP) module is added on top of U-Net to extract and fuse multi-scale context features.
As an implementation, the method further comprises:
adding an attention mechanism to enhance the feature characterization capability of the image semantic segmentation network;
the attention mechanism uses a channel attention module to assign weights to each channel, and uses a spatial attention module to assign spatial feature weights.
As an implementation, the method further comprises:
the channel attention module global-pools the feature map of each channel to obtain global information, learns a weight for each channel with two fully-connected layers, and multiplies the weights with the initial features;
the spatial attention module compresses the channel count of the feature map with a 1×1 convolution; normalizes the spatial features to 4 different scales with adaptive pooling; concatenates and regularizes the 4 pooled scales and feeds them into the two fully-connected layers to learn local weights for the spatial features; reshapes the learned weight parameters back to the scale of the compressed features; restores the spatial scale to that of the channel-attention features with a 1×1 convolution and multiplies the two; and finally adds the resulting spatial features to the original features to obtain the final attention features.
As an implementation, the method further comprises:
the image inpainting model is the generator G of a trained Pixel2Pixel generative adversarial network; the Pixel2Pixel model adopts a U-Net segmentation network as the generator G.
According to a second aspect of the present application, there is provided an apparatus for removing text-like noise in natural images, comprising:
a detection and mask generation unit, configured to detect the regions containing text elements in the image to be repaired with an image semantic segmentation network and to take the segmentation result as the mask of the region to be repaired;
an image inpainting unit, configured to repair the text-element regions in the image to be repaired with an image inpainting model, using the mask of the region to be repaired; the inpainting model is the generator of a generative adversarial network.
As an implementation, the apparatus further comprises:
a manual interaction unit, configured to determine, after the detection and mask generation unit detects the text-element regions in the image to be repaired, whether the user chooses a manual-interaction mode for the repair; if so, to receive the user's corrections to the region to be repaired through delete, modify, and add operations; otherwise, to notify the detection and mask generation unit to take the segmentation result as the mask of the region to be repaired.
As one implementation, the image semantic segmentation network in the detection and mask generation unit uses the U-shaped skip-connection structure of the U-Net segmentation network, and an atrous spatial pyramid pooling (ASPP) module is added on top of U-Net to extract and fuse multi-scale context features.
As an implementation, the detection and mask generating unit is further configured to:
adding an attention mechanism to enhance the feature characterization capability of the image semantic segmentation network;
the attention mechanism uses a channel attention module to assign weights to each channel, and uses a spatial attention module to assign spatial feature weights.
As an implementation, the detection and mask generating unit is further configured to:
the channel attention module global-pools the feature map of each channel to obtain global information, learns a weight for each channel with two fully-connected layers, and multiplies the weights with the initial features;
the spatial attention module compresses the channel count of the feature map with a 1×1 convolution; normalizes the spatial features to 4 different scales with adaptive pooling; concatenates and regularizes the 4 pooled scales and feeds them into the two fully-connected layers to learn local weights for the spatial features; reshapes the learned weight parameters back to the scale of the compressed features; restores the spatial scale to that of the channel-attention features with a 1×1 convolution and multiplies the two; and finally adds the resulting spatial features to the original features to obtain the final attention features.
As an implementation manner, the image restoration unit is further configured to:
the image inpainting model is the generator G of a trained Pixel2Pixel generative adversarial network; the Pixel2Pixel model adopts a U-Net segmentation network as the generator G.
According to a third aspect of the present application, there is provided a storage medium storing an executable program which, when executed by a processor, performs the steps of the above method for removing text-like noise from natural images.
With the method, device, and storage medium for removing text-like noise from natural images provided above, the regions containing text elements in the image to be repaired are detected with an image semantic segmentation network, the segmentation result is taken as the mask of the region to be repaired, and the masked text regions are repaired with an image inpainting model that is the generator of a generative adversarial network. The scheme can quickly and automatically detect common text-element regions in the image to be repaired and remove the text-like noise automatically, or let the user correct the regions to be repaired through manual interaction. Because inpainting is based on a generative adversarial network, the restored image is more natural and realistic.
Drawings
FIG. 1 is a schematic flow chart of a method for removing text-like noise in a natural image according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of the improved semantic segmentation model according to an embodiment of the present application;
FIG. 3 is a flowchart of a specific example of the method for removing text-like noise in a natural image according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of the attention module according to an embodiment of the present application;
FIG. 5 is a schematic diagram of the Pixel2Pixel model training framework according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an apparatus for removing text-like noise in a natural image according to an embodiment of the present application.
Detailed Description
The following explains the essence of the technical solution of the embodiments of the present application in detail with reference to examples.
With the rise of deep learning, deep convolutional neural networks can readily detect text in document or natural-scene images and locate the text regions. Mainstream deep-learning text detection falls into two families: methods based on object detection and methods based on semantic segmentation. Compared with object-detection algorithms, which regress rectangular boxes, semantic segmentation recognizes at the pixel level, localizes more accurately, imposes no strict requirement on text orientation, and follows the text-region contour more closely. The mainstream semantic segmentation architecture is the encoder-decoder, e.g. the FCN, U-Net, and DeepLab families of segmentation models.
Deep-learning image inpainting based on generative adversarial networks (GANs) can learn rich semantic information from large-scale datasets and then fill in the missing content of an image end-to-end, so the restored image is more natural and realistic and the repair quality is better.
This application combines recent semantic segmentation and image inpainting techniques: text regions in the natural image are obtained by semantic segmentation, and the image is then repaired with a generative adversarial network, combined with a manual-interaction mechanism. For different application scenarios, the repair can be driven either by automatic selection of text regions or by manual interaction; the result is convenient to use, light on labor, and the restored image is natural and realistic.
Fig. 1 is a schematic flow chart of the method for removing text-like noise in a natural image according to an embodiment of the present application. As shown in Fig. 1, the method includes the following steps:
Step 101: detect the regions containing text elements in the image to be repaired with an image semantic segmentation network, and take the segmentation result as the mask of the region to be repaired.
In this embodiment, after the text-element regions are detected with the image semantic segmentation network, the system determines whether the user chooses a manual-interaction mode for the repair; if so, it receives the user's corrections to the region to be repaired through delete, modify, and add operations; otherwise, it takes the segmentation result directly as the mask of the region to be repaired.
In this embodiment, the image semantic segmentation network uses the U-shaped skip-connection structure of the U-Net segmentation network, and an atrous spatial pyramid pooling (ASPP) module is added on top of U-Net to extract and fuse multi-scale context features.
The improved semantic segmentation model of this embodiment is shown in FIG. 2; the overall U-Net structure resembles a large letter "U". The network first downsamples; it then upsamples by deconvolution, fusing in the corresponding earlier encoder features through skip connections; and then upsamples again. Repeating this process yields the output attention image.
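The downsample/upsample-and-fuse pattern described above can be traced at the level of array shapes. A minimal sketch of one U-Net level using average pooling and nearest-neighbor upsampling (a real U-Net uses learned convolutions and deconvolutions; this only illustrates the skip-connection data flow):

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling: (H, W) -> (H/2, W/2)."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbor upsampling: (H, W) -> (2H, 2W)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def unet_level(x):
    """One encoder-decoder level: downsample, upsample back, and fuse
    the decoder feature with the skipped encoder feature (stacked here
    along a channel axis, standing in for U-Net's concatenation)."""
    down = avg_pool2(x)       # encoder: halve spatial resolution
    up = upsample2(down)      # decoder: restore spatial resolution
    return np.stack([x, up])  # skip connection: (2, H, W)
```

The skip branch carries the full-resolution encoder feature to the decoder unchanged, which is what lets U-Net recover fine detail lost in downsampling.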
In this embodiment, atrous spatial pyramid pooling (ASPP) samples the given input in parallel with atrous (dilated) convolutions at different sampling rates, which is equivalent to capturing the context of the image at multiple scales.
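The key primitive in ASPP is the atrous (dilated) convolution: kernel taps are spaced `rate` samples apart, so the same kernel covers a wider context at higher rates. A 1-D sketch of the sampling pattern (ASPP itself applies several 2-D rates in parallel and fuses the results; this only illustrates the mechanism):

```python
import numpy as np

def dilated_conv1d(x, kernel, rate):
    """Valid-padding 1-D atrous convolution: taps are `rate` apart,
    enlarging the receptive field without extra parameters."""
    k = len(kernel)
    span = (k - 1) * rate + 1          # receptive field of one output
    n = len(x) - span + 1
    return np.array([sum(kernel[j] * x[i + j * rate] for j in range(k))
                     for i in range(n)])
```

At rate 1 this reduces to an ordinary convolution; at rate 2 a 3-tap kernel already sees a span of 5 samples.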
In the embodiment of the application, an attention mechanism is added to enhance the feature characterization capability of the image semantic segmentation network; the attention mechanism uses a channel attention module to assign weights to each channel, and uses a spatial attention module to assign spatial feature weights.
In this embodiment, the channel attention module global-pools the feature map of each channel to obtain global information, learns a weight for each channel with two fully-connected layers, and multiplies the weights with the initial features;
the spatial attention module compresses the channel count of the feature map with a 1×1 convolution; normalizes the spatial features to 4 different scales with adaptive pooling; concatenates and regularizes the 4 pooled scales and feeds them into the two fully-connected layers to learn local weights for the spatial features; reshapes the learned weight parameters back to the scale of the compressed features; restores the spatial scale to that of the channel-attention features with a 1×1 convolution and multiplies the two; and finally adds the resulting spatial features to the original features to obtain the final attention features.
Step 102: repair the text-element regions in the image to be repaired with an image inpainting model, using the mask of the region to be repaired; the inpainting model is the generator of a generative adversarial network.
In this embodiment, the image inpainting model (image inpainting module) is the generator G of a trained Pixel2Pixel generative adversarial network, with a U-Net segmentation network adopted as the generator G.
After the repair-region mask is generated, the selected region is repaired with the image inpainting module, which uses the generator G of the trained Pixel2Pixel model to synthesize a realistic natural image. Pixel2Pixel is a generative adversarial network trained on paired images and consists mainly of a generator G and a discriminator D. To improve image detail and retain information at different scales, a U-Net model is adopted as the generator G.
The embodiments of the present application will be described in further detail below with reference to specific examples.
In this embodiment, a natural image is taken as an example; it should be noted that other pictures or images, such as screenshots of text, can use the same technical means.
Fig. 3 is a flowchart illustrating a specific example of a method for removing text noise in a natural image according to an embodiment of the present application, including the following specific steps:
first, a user loads an image to be restored. And automatically detecting the region containing the character elements in the natural image through a character element detection module. The character detection module detects a character area by adopting a trained image semantic segmentation network, and takes a segmentation recognition result as a mask of the area to be repaired. The semantic segmentation network model refers to a U-shaped jump layer connection network structure of a classical segmentation network U-Net. Aiming at the character characteristics, an ASPP module is added on the basis of an original U-Net to extract and fuse multi-scale contextual characteristics, and further, the characteristic characterization capability of a new attention mechanism enhanced network is provided, and the overall structure of the model is shown in FIG. 2.
In particular, the attention mechanism enhances both channel and spatial features. It first applies a channel attention module, whose main function is to assign a weight to each channel, and then a spatial attention module to assign spatial feature weights. The channel attention module first global-pools the feature map of each channel to obtain global information, then learns a weight for each channel with two fully-connected (fc) layers, and multiplies the weights with the initial features. On this basis, the spatial attention module first compresses the channel count of the newly obtained feature map with a 1×1 convolution to reduce computation; then normalizes the spatial features to 4 different scales, [1×1, 8×8, 16×16, 32×32], with adaptive pooling to capture global or local statistics of the feature maps; after concatenating and regularizing the 4 pooled scales, feeds them into two fully-connected (fc) layers to learn local weights for the spatial features; reshapes the learned weight parameters back to the scale of the previously compressed features; then restores the spatial scale to that of the channel-attention features with a 1×1 convolution and multiplies the two; and finally adds the resulting spatial features to the original features to obtain the final attention features. The structure of the attention module is shown in FIG. 4.
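The channel branch described above is essentially a squeeze-and-excitation gate: global-pool each channel, pass the pooled vector through two fc layers, and rescale the channels. A minimal numpy sketch (the ReLU/sigmoid activations and weight shapes are plausible assumptions; the patent does not specify them):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feat, w1, b1, w2, b2):
    """feat: (C, H, W). Global-pool each channel, learn per-channel
    weights with two fully-connected layers, and rescale the channels."""
    c = feat.shape[0]
    pooled = feat.reshape(c, -1).mean(axis=1)      # global pooling: (C,)
    hidden = np.maximum(0.0, w1 @ pooled + b1)     # fc layer 1 + ReLU
    weights = sigmoid(w2 @ hidden + b2)            # fc layer 2 -> (0, 1)
    return feat * weights[:, None, None]           # multiply with features
```

The spatial branch follows the same gate-and-multiply pattern, but over pooled spatial positions instead of channels.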
Next, the system determines whether the user chooses to correct the region to be repaired predicted by the U-Net through manual interaction. If so, the user can correct the region through delete, modify, and add operations before the final mask of the region to be repaired is generated. If not, the mask is generated directly from the predicted text region.
After the repair-region mask is generated, the selected region is repaired with the image inpainting module, which uses the generator G of the trained Pixel2Pixel model to synthesize a realistic natural image. Pixel2Pixel is a generative adversarial network trained on paired images and consists mainly of a generator G and a discriminator D. To improve image detail and retain information at different scales, a U-Net model is adopted as the generator G. The Pixel2Pixel training framework is shown in FIG. 5.
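During training, the Pixel2Pixel generator is typically driven by two terms: an adversarial loss that rewards fooling the discriminator D, plus an L1 term that pulls the output toward the paired ground truth. A hedged sketch of that combined objective (the λ = 100 weighting follows the original pix2pix formulation, not a value stated in this patent):

```python
import numpy as np

def pix2pix_generator_loss(d_fake, fake, target, lam=100.0):
    """Combined generator objective: BCE against the 'real' label on the
    discriminator's outputs for generated images, plus lambda * L1."""
    eps = 1e-12                                  # numerical guard for log
    adv = -np.mean(np.log(d_fake + eps))         # fool D: push d_fake -> 1
    l1 = np.mean(np.abs(fake - target))          # stay close to ground truth
    return adv + lam * l1
```

The L1 term keeps the repaired region aligned with the paired clean image, while the adversarial term supplies the natural-looking texture that L1 alone tends to blur.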
Finally, the repaired natural image is saved; once all images have been processed, the system exits.
Fig. 6 is a schematic structural diagram of an apparatus for removing text-like noise in a natural image according to an embodiment of the present application. As shown in Fig. 6, the apparatus includes:
a detection and mask generation unit 61, configured to detect the regions containing text elements in the image to be repaired with an image semantic segmentation network and to take the segmentation result as the mask of the region to be repaired;
an image inpainting unit 62, configured to repair the text-element regions in the image to be repaired with an image inpainting model, using the mask of the region to be repaired; the inpainting model is the generator of a generative adversarial network.
The device further comprises:
a manual interaction unit 63, configured to determine, after the detection and mask generation unit 61 detects the text-element regions in the image to be repaired, whether the user chooses a manual-interaction mode for the repair; if so, to receive the user's corrections to the region to be repaired through delete, modify, and add operations; otherwise, to notify the detection and mask generation unit 61 to take the segmentation result as the mask of the region to be repaired.
The image semantic segmentation network in the detection and mask generation unit 61 uses the U-shaped skip-connection structure of the U-Net segmentation network, and an atrous spatial pyramid pooling (ASPP) module is added on top of U-Net to extract and fuse multi-scale context features.
The detection and mask generating unit 61 is further configured to:
adding an attention mechanism to enhance the feature characterization capability of the image semantic segmentation network;
the attention mechanism uses a channel attention module to assign weights to each channel, and uses a spatial attention module to assign spatial feature weights.
The detection and mask generating unit 61 is further configured to:
the channel attention module global-pools the feature map of each channel to obtain global information, learns a weight for each channel with two fully-connected layers, and multiplies the weights with the initial features;
the spatial attention module compresses the channel count of the feature map with a 1×1 convolution; normalizes the spatial features to 4 different scales with adaptive pooling; concatenates and regularizes the 4 pooled scales and feeds them into the two fully-connected layers to learn local weights for the spatial features; reshapes the learned weight parameters back to the scale of the compressed features; restores the spatial scale to that of the channel-attention features with a 1×1 convolution and multiplies the two; and finally adds the resulting spatial features to the original features to obtain the final attention features.
The image restoration unit 62 is further configured to:
the image inpainting model is the generator G of a trained Pixel2Pixel generative adversarial network; the Pixel2Pixel model adopts a U-Net segmentation network as the generator G.
In an exemplary embodiment, the processing units of the apparatus for removing text-like noise in natural images according to an embodiment of the present application may be implemented by one or more central processing units (CPUs), graphics processing units (GPUs), baseband processors (BPs), application-specific integrated circuits (ASICs), digital signal processors (DSPs), programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers (MCUs), microprocessors, or other electronic components.
In the embodiment of the present disclosure, the specific manner in which each processing unit in the apparatus for removing text noise in natural images shown in fig. 6 performs operations has been described in detail in the embodiment related to the method, and will not be described in detail here.
The embodiment of the application also describes a storage medium, on which an executable program is stored, and the executable program realizes the steps of the method for removing the character noise in the natural image.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in various embodiments of the present invention, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention. The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are only illustrative; for example, the division of the units is only a logical functional division, and there may be other divisions in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, or each unit may serve as a separate unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
The above is only an embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of changes or substitutions within the technical scope disclosed by the present invention, and all such changes or substitutions shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (13)

1. A method for removing text noise in a natural image, characterized by comprising the following steps:
detecting a region containing text elements in an image to be repaired according to an image semantic segmentation network, and taking the segmented and recognized region as a mask of a region to be repaired;
repairing the region containing text elements in the image to be repaired by using the mask of the region to be repaired according to an image restoration model; wherein the image restoration model is a generator of a generative adversarial network.
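The two-stage pipeline of claim 1 can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: `segment` and `inpaint` are placeholders standing in for the trained semantic segmentation network and the GAN generator, and the `threshold` parameter is an assumption.

```python
import numpy as np

def remove_text_noise(image, segment, inpaint, threshold=0.5):
    """Two-stage pipeline: segment text pixels, then inpaint them.

    image:   (H, W, 3) array.
    segment: callable returning a per-pixel text probability map (H, W).
    inpaint: callable returning a repaired image of the same shape as `image`.
    """
    # Stage 1: per-pixel text probability from the segmentation network,
    # thresholded into a binary mask of the region to be repaired.
    prob = segment(image)                       # (H, W) in [0, 1]
    mask = (prob > threshold).astype(np.uint8)  # 1 where text noise is detected

    # Stage 2: the generator repairs the image; only the masked region is
    # replaced, unmasked pixels are copied through from the original.
    repaired = inpaint(image, mask)
    out = image * (1 - mask[..., None]) + repaired * mask[..., None]
    return out, mask
```

In the claimed method the mask may additionally be corrected interactively by the user (claim 2) before the restoration stage runs.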
2. The method according to claim 1, wherein the detecting a region containing text elements in the image to be repaired according to the image semantic segmentation network and taking the segmentation recognition result as the mask of the region to be repaired further comprises:
after detecting the region containing text elements in the image to be repaired according to the image semantic segmentation network, determining whether a user selects a manual interaction mode to repair the image to be repaired; if so, receiving corrections made by the user to the region to be repaired through deletion, modification, and addition operations; otherwise, automatically using the segmentation recognition result as the mask of the region to be repaired.
3. The method according to claim 1, wherein the image semantic segmentation network is a U-shaped skip-connection network structure of a U-Net segmentation network; and an atrous spatial pyramid pooling (ASPP) network is added on the basis of the U-Net to extract and fuse multi-scale context features.
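The atrous (dilated) convolutions that an ASPP module runs in parallel can be illustrated with the following toy NumPy sketch, which applies one kernel at several dilation rates and stacks the results. A real ASPP fuses learned per-branch filters inside the network; the function names and the default rate set are illustrative assumptions.

```python
import numpy as np

def atrous_conv3x3(x, kernel, rate):
    """3x3 dilated ('atrous') convolution on a 2-D feature map.

    Zero-padding by `rate` keeps the output the same size as the input;
    the dilation rate controls the gap between sampled positions, so a
    larger rate widens the receptive field without extra parameters.
    """
    h, w = x.shape
    xp = np.pad(x, rate)
    out = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            # 3x3 samples spaced `rate` apart, centered on pixel (i, j)
            patch = xp[i:i + 2 * rate + 1:rate, j:j + 2 * rate + 1:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

def aspp(x, kernel, rates=(1, 6, 12, 18)):
    """Toy ASPP: the same map convolved at several dilation rates, with
    the resulting multi-scale features stacked for later fusion."""
    return np.stack([atrous_conv3x3(x, kernel, r) for r in rates])
```

With an identity kernel every rate reproduces the input, which makes the receptive-field bookkeeping easy to check.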
4. The method of claim 3, further comprising:
adding an attention mechanism to enhance the feature characterization capability of the image semantic segmentation network;
the attention mechanism uses a channel attention module to assign a weight to each channel, and uses a spatial attention module to assign spatial feature weights.
5. The method of claim 4, further comprising:
the channel attention module performs global pooling on the feature map of each channel to obtain global information; the weight of each channel is learned by two fully-connected layers and multiplied with the initial features;
the spatial attention module compresses the number of channels of the feature map by a 1×1 convolution; the spatial features are pooled to 4 different scales by adaptive pooling; the pooled features of the 4 scales are concatenated and input into two fully-connected layers to learn different local weights of the spatial features; the learned weight parameters are reshaped to the scale of the compressed features; the spatial weights are restored to the spatial size of the channel attention features by a 1×1 convolution and multiplied with the channel attention features; and the obtained spatial features are added to the original features to obtain the final attention features.
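The channel attention step of claim 5 (global pooling, two fully-connected layers, per-channel reweighting) can be sketched as follows. This is a minimal NumPy illustration with hypothetical parameter names, not the trained module; the ReLU/sigmoid pairing is an assumption in line with common channel-attention designs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feats, w1, b1, w2, b2):
    """Channel attention as described in the claim.

    feats: (C, H, W) feature maps.
    w1, b1 / w2, b2: parameters of the two fully-connected layers.
    """
    squeeze = feats.mean(axis=(1, 2))            # (C,) global average pooling
    hidden = np.maximum(0.0, w1 @ squeeze + b1)  # first FC layer + ReLU
    weights = sigmoid(w2 @ hidden + b2)          # second FC layer -> weights in (0, 1)
    return feats * weights[:, None, None]        # reweight each channel
```

The spatial attention module of the claim follows the same squeeze-and-reweight idea, but over 4 adaptively pooled spatial scales instead of over channels.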
6. The method of claim 1, further comprising:
the image restoration model is a generator G for generating a confrontation network model for a trained Pixel2Pixel, and the image restoration model adopts a U-Net segmentation network model as the generator G.
7. An apparatus for removing text noise in a natural image, the apparatus comprising:
a detection and mask generation unit, configured to detect a region containing text elements in an image to be repaired according to an image semantic segmentation network and take the segmented and recognized region as a mask of a region to be repaired;
an image restoration unit, configured to repair the region containing text elements in the image to be repaired by using the mask of the region to be repaired according to an image restoration model; wherein the image restoration model is a generator of a generative adversarial network.
8. The apparatus of claim 7, further comprising:
a manual interaction unit, configured to determine, after the detection and mask generation unit detects the region containing text elements in the image to be repaired as the region to be repaired according to the image semantic segmentation network, whether a user selects a manual interaction mode to repair the image to be repaired; if so, receive corrections made by the user to the region to be repaired through deletion, modification, and addition operations; otherwise, notify the detection and mask generation unit to automatically use the segmentation recognition result as the mask of the region to be repaired.
9. The apparatus according to claim 7, wherein the image semantic segmentation network in the detection and mask generation unit is a U-shaped skip-connection network structure of a U-Net segmentation network; and an atrous spatial pyramid pooling (ASPP) network is added on the basis of the U-Net to extract and fuse multi-scale context features.
10. The apparatus of claim 9, wherein the detection and mask generation unit is further configured to:
adding an attention mechanism to enhance the feature characterization capability of the image semantic segmentation network;
the attention mechanism uses a channel attention module to assign a weight to each channel, and uses a spatial attention module to assign spatial feature weights.
11. The apparatus of claim 10, wherein the detection and mask generation unit is further configured to:
the channel attention module performs global pooling on the feature map of each channel to obtain global information; the weight of each channel is learned by two fully-connected layers and multiplied with the initial features;
the spatial attention module compresses the number of channels of the feature map by a 1×1 convolution; the spatial features are pooled to 4 different scales by adaptive pooling; the pooled features of the 4 scales are concatenated and input into two fully-connected layers to learn different local weights of the spatial features; the learned weight parameters are reshaped to the scale of the compressed features; the spatial weights are restored to the spatial size of the channel attention features by a 1×1 convolution and multiplied with the channel attention features; and the obtained spatial features are added to the original features to obtain the final attention features.
12. The apparatus of claim 7, wherein the image inpainting unit is further configured to:
the image restoration model is a generator G of a trained Pixel2Pixel generative adversarial network model; the Pixel2Pixel generative adversarial network model adopts a U-Net segmentation network model as the generator G.
13. A storage medium having stored thereon an executable program which, when executed by a processor, performs the steps of the method of removing text noise in natural images according to any one of claims 1 to 6.
CN202110172477.8A 2021-02-08 2021-02-08 Method and device for removing text noise in natural image and storage medium Active CN112801911B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110172477.8A CN112801911B (en) 2021-02-08 2021-02-08 Method and device for removing text noise in natural image and storage medium

Publications (2)

Publication Number Publication Date
CN112801911A (en) 2021-05-14
CN112801911B CN112801911B (en) 2024-03-26

Family

ID=75814802


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114627389A (en) * 2022-03-23 2022-06-14 中国科学院空天信息创新研究院 Raft culture area extraction method based on multi-temporal optical remote sensing image
WO2023122955A1 (en) * 2021-12-28 2023-07-06 华为技术有限公司 Image processing method and apparatus, and storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
CN107609560A (en) * 2017-09-27 2018-01-19 北京小米移动软件有限公司 Character recognition method and device
CN108805840A (en) * 2018-06-11 2018-11-13 Oppo(重庆)智能科技有限公司 Method, apparatus, terminal and the computer readable storage medium of image denoising
CN109359550A (en) * 2018-09-20 2019-02-19 大连民族大学 Language of the Manchus document seal Abstraction and minimizing technology based on depth learning technology
CN109583449A (en) * 2018-10-29 2019-04-05 深圳市华尊科技股份有限公司 Character identifying method and Related product
CN110287960A (en) * 2019-07-02 2019-09-27 中国科学院信息工程研究所 The detection recognition method of curve text in natural scene image
WO2019238560A1 (en) * 2018-06-12 2019-12-19 Tomtom Global Content B.V. Generative adversarial networks for image segmentation
CN110738207A (en) * 2019-09-10 2020-01-31 西南交通大学 character detection method for fusing character area edge information in character image
CN110956579A (en) * 2019-11-27 2020-04-03 中山大学 Text image rewriting method based on semantic segmentation graph generation
CN111080723A (en) * 2019-12-17 2020-04-28 易诚高科(大连)科技有限公司 Image element segmentation method based on Unet network
CN111160352A (en) * 2019-12-27 2020-05-15 创新奇智(北京)科技有限公司 Workpiece metal surface character recognition method and system based on image segmentation
CN111199550A (en) * 2020-04-09 2020-05-26 腾讯科技(深圳)有限公司 Training method, segmentation method, device and storage medium of image segmentation network
WO2020219915A1 (en) * 2019-04-24 2020-10-29 University Of Virginia Patent Foundation Denoising magnetic resonance images using unsupervised deep convolutional neural networks


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XINYI WU: "Semantic Prior Based Generative Adversarial Network for Video Super-Resolution", 2019 IEEE 16TH INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING (ISBI 2019), 11 July 2019 (2019-07-11) *
艾合麦提江・麦提托合提; 艾斯卡尔・艾木都拉; 阿布都萨拉木・达吾提: "A Survey of Scene Text Detection and Recognition Based on Deep Learning" (基于深度学习的场景文字检测与识别综述), 电视技术, no. 14 *
陈锟; 乔沁; 宋志坚: "Applications of Generative Adversarial Networks in Medical Image Processing" (生成对抗网络在医学图像处理中的应用), 生命科学仪器, no. 1, 25 October 2018 (2018-10-25) *



Similar Documents

Publication Publication Date Title
US11250548B2 (en) Digital image completion using deep learning
CN110163198B (en) Table identification reconstruction method and device and storage medium
CN108401112B (en) Image processing method, device, terminal and storage medium
CN113379775A (en) Generating a colorized image based on interactive color edges using a colorized neural network
CN108710893B (en) Digital image camera source model classification method based on feature fusion
US20220262009A1 (en) Generating refined alpha mattes utilizing guidance masks and a progressive refinement network
CN112801911A (en) Method and device for removing Chinese character noise in natural image and storage medium
CN114049280A (en) Image erasing and repairing method and device, equipment, medium and product thereof
CN113160062A (en) Infrared image target detection method, device, equipment and storage medium
CN116645592B (en) Crack detection method based on image processing and storage medium
CN110781980A (en) Training method of target detection model, target detection method and device
CN111461211B (en) Feature extraction method for lightweight target detection and corresponding detection method
CN113378812A (en) Digital dial plate identification method based on Mask R-CNN and CRNN
KR102430743B1 (en) Apparatus and method for developing object analysis model based on data augmentation
CN115967823A (en) Video cover generation method and device, electronic equipment and readable medium
CN116167910B (en) Text editing method, text editing device, computer equipment and computer readable storage medium
CN115115552B (en) Image correction model training method, image correction device and computer equipment
CN110674721A (en) Method for automatically detecting test paper layout formula
CN112822393B (en) Image processing method and device and electronic equipment
US11869125B2 (en) Generating composite images with objects from different times
CN113111804A (en) Face detection method and device, electronic equipment and storage medium
GB2567723A (en) Digital image completion using deep learning
US20230055204A1 (en) Generating colorized digital images utilizing a re-colorization neural network with local hints
US20230132180A1 (en) Upsampling and refining segmentation masks
CN115700728A (en) Image processing method and image processing apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant