CN116703768A - Training method, device, medium and equipment for blind spot denoising network model - Google Patents

Training method, device, medium and equipment for blind spot denoising network model

Info

Publication number
CN116703768A
CN116703768A
Authority
CN
China
Prior art keywords
training
network model
feature
feature fusion
mask
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310666771.3A
Other languages
Chinese (zh)
Inventor
张旦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Qigan Electronic Information Technology Co ltd
Original Assignee
Shanghai Qigan Electronic Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Qigan Electronic Information Technology Co ltd filed Critical Shanghai Qigan Electronic Information Technology Co ltd
Priority to CN202310666771.3A priority Critical patent/CN116703768A/en
Publication of CN116703768A publication Critical patent/CN116703768A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a training method, device, medium, and equipment for a blind spot denoising network model, providing a self-supervised deep-learning blind spot denoising network model that can effectively remove large-area noise from an image. The method comprises the following steps: optimizing a feature fusion network model, wherein the optimized feature fusion network model comprises at least two feature extraction channels, the convolution layers in different feature extraction channels differ in mask type or mask size, and a fusion layer completes feature fusion from the extraction results of the different channels; taking the noisy images in a public data set as the training data set; and inputting the training data set into the optimized feature fusion network model for self-supervised model training until a set number of iterations is reached or the loss value of the loss function falls below a set threshold, then outputting the blind spot denoising network model.

Description

Training method, device, medium and equipment for blind spot denoising network model
Technical Field
The invention relates to the technical field of image processing, and in particular to a training method, device, medium, and equipment for a blind spot denoising network model.
Background
Image denoising is an unavoidable and important step in image processing, and the denoising effect strongly influences the subsequent processing stages. Traditional image denoising algorithms are slow and lack robustness. With the development of deep learning, deep-learning image denoising algorithms have made great progress. Supervised image denoising requires noise-clean image pairs, which are very difficult to collect in practice, so many self-supervised training methods that do not require clean pictures have been developed, such as blind spot network (Blind Spot Network, BSN) denoising. However, a necessary condition for BSN denoising is that the noise is assumed to be spatially independent, whereas real noise tends to be spatially continuous. Therefore, to break the spatial correlation of the noise, pixel sub-sampling is applied to the picture before training, and during training the BSN applies a center mask to the convolution kernel to achieve the blind spot effect. When denoising real noisy pictures, the keys to improving the blind spot denoising effect are breaking the spatial correlation of the noise to generate blind spots, while preserving as much of the original pixel detail of the picture as possible. In existing blind spot denoising models, whether the mask is applied to the input image or inside the network, the mask is point-shaped. Blind spot denoising recovers a blind spot from the spatial relationship between its pixel and the surrounding pixels; it can therefore denoise only if the point is spatially correlated with the surrounding points, and the surrounding points are pixels of the picture itself rather than noise.
When noise occupies a large area of the picture, masking only the current pixel during feature extraction and then recovering it from the surrounding information is likely to yield a pixel that is still a noise point, so large-area noise in the image is difficult to remove.
Therefore, a training scheme for an image denoising network model is needed that can effectively remove large-area noise from an image.
Disclosure of Invention
The embodiments of the invention provide a training method, device, medium, and equipment for a blind spot denoising network model, used to effectively remove large-area noise from an image.
In a first aspect, the invention provides a training method for a blind spot denoising network model, which may include the following steps: optimizing a feature fusion network model, wherein the optimized feature fusion network model comprises at least two feature extraction channels, the convolution layers in different feature extraction channels differ in mask type or mask size, and a fusion layer completes feature fusion from the extraction results of the different channels; taking the noisy images in a public data set as the training data set; and inputting the training data set into the optimized feature fusion network model for self-supervised model training until a set number of iterations is reached or the loss value of the loss function falls below a set threshold, then outputting the blind spot denoising network model.
The training method of the blind spot denoising network model provided by the invention has the following beneficial effects: the image denoising network model can be trained with noisy images alone, without collecting noise-clean image pairs in advance; and by optimizing the feature fusion network model and designing convolution kernels with a variety of mask shapes for feature extraction, the optimized model combines global and local features, so that large-area noise that is otherwise hard to remove can be effectively removed from the image.
In one possible implementation, inputting the training data set into the optimized feature fusion network model for self-supervised model training includes: after feature pre-extraction of the noisy image, performing feature extraction through each feature extraction channel separately; inputting the extraction results of the different feature extraction channels into a CDCL containing DCLs for convolution; then first concatenating and fusing the results of the channels with the same mask size, thereby fusing the features extracted by convolution kernels with different mask types; after several DCLs, concatenating the outputs of all channels to complete the feature fusion across convolution kernels with different mask sizes; and finally obtaining the output through several convolutions for channel transformation and feature fusion. This implementation can remove large-area noise that is hard to remove from the image by combining multiple masks in the self-supervised denoising process.
In another possible implementation, the mask type is one or more of: cross, square ring (回-shaped), row, column, diagonal, and anti-diagonal.
In another possible embodiment, the loss function satisfies: Loss = |I_out - I_N|_1,
where Loss is the L1 loss function, I_out is the output of the blind spot denoising network model, and I_N is the noisy image.
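As an illustration, the self-supervised L1 loss above can be sketched in plain Python; the function name and the flat-list representation of the images are ours, not the patent's:

```python
def l1_loss(output, noisy):
    """Mean absolute difference |I_out - I_N|_1 between the model output
    and the noisy input, both given as flat lists of pixel values."""
    assert len(output) == len(noisy)
    return sum(abs(o - n) for o, n in zip(output, noisy)) / len(output)
```

Because the target is the noisy image itself, no clean reference image is needed during training.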
In other possible embodiments, inputting the training data set into the optimized feature fusion network model for self-supervised model training includes: randomly cropping the noisy image into several sub-images, randomly rotating them and flipping them horizontally or vertically, and inputting the sub-images into the optimized feature fusion network model for self-supervised model training.
In a second aspect, an embodiment of the invention further provides a training device for a blind spot denoising network model, which includes modules/units for performing the method of any one of the possible embodiments of the first aspect. These modules/units may be implemented in hardware, or by hardware executing corresponding software.
In a third aspect, embodiments of the invention further provide a computer-readable storage medium containing a program which, when executed on an electronic device, causes the electronic device to perform the method of any one of the possible implementations of the first aspect.
In a fourth aspect, an embodiment of the invention further provides a computer program product which, when run on an electronic device, causes the electronic device to perform the method of any one of the possible embodiments of the first aspect.
For the advantageous effects of the second to fourth aspects above, see the description of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the invention, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present invention;
fig. 2 is a schematic flow chart of a training method of a blind spot denoising network model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a blind spot denoising network model according to an embodiment of the present invention;
fig. 4 is a schematic diagram of CDCL and DCL in a blind spot denoising network model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of various mask shapes according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a training device for a blind spot denoising network model according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Image denoising aims to recover a clean signal from noisy observations and is one of the important tasks in image processing and low-level computer vision. Recently, with the rapid development of neural networks, learning-based supervised denoising models have achieved satisfactory performance. However, such models depend heavily on noise-clean image pairs. In practical applications these pairs are complex and expensive to collect; in dynamic scenes, medical imaging, and similar settings, real-world constraints may make suitable pairs impossible to obtain at all, so supervised image denoising is hard to adapt to some scenarios or struggles to reach the desired effect. Self-supervised image denoising methods have therefore been proposed in the prior art; compared with supervised methods they have practical value because they do not need noise-clean image pairs as references, and most current self-supervised methods can train a denoising model using only noisy images. However, in existing self-supervised denoising methods, when noise occupies a large area of the picture, masking only the current pixel during feature extraction and then restoring it from the surrounding information is likely to yield a pixel that is still a noise point, so large-area noise is difficult to remove and the denoising effect is insufficient.
To overcome the insufficient denoising of existing image denoising network models, the invention provides a training method for a blind spot denoising network model. It can train the image denoising network model with noisy images alone, without collecting noise-clean image pairs in advance, and it extracts features by optimizing the feature fusion network model and designing convolution kernels with a variety of mask shapes, so that the optimized model combines global and local features and can effectively remove large-area noise that is otherwise hard to remove from the image.
Some terms in the embodiments of the present invention are explained below to facilitate understanding by those skilled in the art.
1. Convolutional neural networks are a class of feedforward neural networks that contain convolution computations and have a deep structure, and are among the representative algorithms of deep learning. A convolutional neural network has feature-learning capability and can perform translation-invariant classification of input information according to its hierarchical structure. Convolutional neural networks grew out of biological neuroscience research and were originally proposed to process data with a grid-like structure; for example, an image can be regarded as a two-dimensional grid of pixels. The general structure of a convolutional neural network comprises a data input layer, convolution layers, activation (excitation) layers, pooling layers, fully connected layers, and a data output layer.
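The generic layer stack just described can be sketched, for example, in PyTorch; the channel counts and layer sizes below are arbitrary illustrations, not the patent's architecture:

```python
import torch
import torch.nn as nn

# input -> convolution -> excitation (activation) -> pooling -> fully connected -> output
net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # convolution layer
    nn.ReLU(),                                   # excitation (activation) layer
    nn.MaxPool2d(2),                             # pooling layer
    nn.Flatten(),
    nn.Linear(16 * 16 * 16, 10),                 # fully connected output layer
)

out = net(torch.randn(1, 3, 32, 32))  # a single 3-channel 32x32 input
```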
Embodiments of the invention relate to artificial intelligence (AI) and machine learning (ML) techniques, and are designed based on deep learning networks and machine learning within artificial intelligence.
With the research and progress of artificial intelligence technology, AI has been deployed in many fields, such as smart homes, intelligent customer service, virtual assistants, smart speakers, smart marketing, unmanned and autonomous driving, robotics, and smart healthcare; it is expected that with further technological development, AI will be applied in still more fields and will be of ever greater value.
2. Machine learning is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.
In the description of embodiments of the invention, the terminology used in the embodiments below is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in the specification of the invention and the appended claims, the singular forms "a," "an," and "the" are intended to include expressions such as "one or more," unless the context clearly indicates otherwise. It should also be understood that in the following embodiments of the invention, "at least one" and "one or more" mean one, two, or more than two. The term "and/or" describes an association relationship between associated objects and means that three relationships are possible; for example, A and/or B may represent: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the invention. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise. The term "coupled" includes both direct and indirect connections, unless stated otherwise. The terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated.
In embodiments of the invention, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" is not necessarily to be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
The training method of the blind spot denoising network model provided by the invention can be applied to an application scene shown in fig. 1, wherein the application scene comprises a server 100 and a terminal device 200.
In a possible implementation, the server 100 is configured to optimize a feature fusion network model, wherein the optimized feature fusion network model includes at least two feature extraction channels, the convolution layers in different feature extraction channels differ in mask type or mask size, and a fusion layer completes feature fusion from the extraction results of the different channels; to take noisy images as the training data set; and to input the training data set into the optimized feature fusion network model for self-supervised model training until the set number of iterations is reached or the loss value of the loss function falls below a set threshold, then output the blind spot denoising network model. The terminal device 200 acquires the blind spot denoising network model from the server 100 and uses it for image denoising.
The server 100 and the terminal device 200 may be connected through a wireless network. The terminal device 200 may be a device with an image sensor, such as a smartphone, a tablet computer, or a medical imaging device. The server 100 may be a single server, a server cluster formed by several servers, or a cloud computing center.
Based on the application scenario shown in fig. 1, an embodiment of the invention provides the training method flow of a blind spot denoising network model shown in fig. 2. The flow may be executed by a server and comprises the following steps:
S201, optimizing a feature fusion network model, wherein the optimized feature fusion network model comprises at least two feature extraction channels, the convolution layers in different feature extraction channels differ in mask type or mask size, and a fusion layer completes feature fusion from the extraction results of the different channels.
S202, taking the noisy images in a public data set as the training data set.
Illustratively, the invention may use the real-image denoising public data sets SIDD-Medium and DND. SIDD-Medium contains 320 pairs of real noisy and clean images; this embodiment can use its noisy sRGB images as the training set, with the corresponding SIDD Validation and Benchmark sets as the validation and test sets, respectively. The DND data set contains 50 real noisy images, so DND images are typically used only as a test set; however, because the invention can train on noisy pictures, DND is used here as both the training set and the test set for the blind spot denoising network model.
S203, inputting the training data set into the optimized feature fusion network model for self-supervised model training until the set number of iterations is reached or the loss value of the loss function falls below the set threshold, then outputting the blind spot denoising network model.
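The stopping criterion of S203 — a fixed iteration budget or the loss dropping below a threshold — can be sketched as follows; `step_fn`, the function names, and the return convention are illustrative assumptions, not the patent's code:

```python
def train_until_converged(step_fn, max_iters, loss_threshold):
    """Run step_fn (one training iteration that returns its loss value)
    until max_iters iterations have run or the loss drops below
    loss_threshold. Returns (iterations_run, final_loss)."""
    loss = float("inf")
    for i in range(1, max_iters + 1):
        loss = step_fn()
        if loss < loss_threshold:
            return i, loss     # early stop: loss fell below the threshold
    return max_iters, loss     # stop: iteration budget exhausted
```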
In this step, in one possible embodiment, inputting the training data set into the optimized feature fusion network model for self-supervised model training includes: after feature pre-extraction of the noisy image, performing feature extraction through each feature extraction channel separately; inputting the extraction results of the different feature extraction channels into a CDCL containing DCLs for convolution; then first concatenating and fusing the results of the channels with the same mask size, thereby fusing the features extracted by convolution kernels with different mask types; after several DCLs, concatenating the outputs of all channels to complete the feature fusion across convolution kernels with different mask sizes; and finally obtaining the output through several convolutions for channel transformation and feature fusion.
Fig. 3 illustrates the framework of an optimized feature fusion network model, named the MM-BSN model framework. Fig. 3 shows the case in which the masks are a point mask and a cross mask, each in two sizes, so the optimized feature fusion network model has four feature extraction channels, one per mask. After simple feature pre-extraction, the noisy image passes through multiple convolution layers whose convolution kernels carry the different masks; the results are then fed into a Concatenation-based Dilated Convolution Layer (CDCL) containing a small number of Dilated Convolution Layers (DCL) (set to 2 in fig. 3), and the features extracted by the kernels with different mask types are fused according to mask size. After several further DCLs (set to 7 in fig. 3), all mask outputs are concatenated, completing the feature fusion across kernels with different mask sizes, and the final output is obtained through multiple 1×1 convolutions for channel transformation and feature fusion. It should be appreciated that if more mask types or mask sizes are used, additional feature extraction channels can be added to the model framework of fig. 3.
Fig. 4 illustrates the specific composition of the DCL and the CDCL. As can be seen from fig. 4, the DCL includes a 1×1 convolution, a 3×3 convolution, and a summer. The CDCL includes two branches: one contains two 1×1 convolutions and one DCL, the other a single 1×1 convolution. The outputs of the two branches are spliced along the channel dimension by a concatenation operator, and the final result is output after one further 1×1 convolution layer.
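Based on this description of fig. 4, the DCL and CDCL could be sketched in PyTorch roughly as follows; the channel count, the dilation rate, and the exact placement of the residual sum are our assumptions, since fig. 4 itself is not reproduced here:

```python
import torch
import torch.nn as nn

class DCL(nn.Module):
    """Dilated Convolution Layer sketch: a 1x1 convolution, a dilated 3x3
    convolution, and a summer adding the input back (assumed residual)."""
    def __init__(self, ch, dilation=2):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, kernel_size=1)
        self.conv3 = nn.Conv2d(ch, ch, kernel_size=3,
                               padding=dilation, dilation=dilation)

    def forward(self, x):
        return x + self.conv3(self.conv1(x))

class CDCL(nn.Module):
    """Concatenation-based Dilated Convolution Layer sketch: one branch with
    two 1x1 convolutions around a DCL, another with a single 1x1
    convolution; the branches are concatenated along the channel dimension
    and fused by a final 1x1 convolution."""
    def __init__(self, ch, dilation=2):
        super().__init__()
        self.branch_a = nn.Sequential(
            nn.Conv2d(ch, ch, kernel_size=1),
            DCL(ch, dilation),
            nn.Conv2d(ch, ch, kernel_size=1),
        )
        self.branch_b = nn.Conv2d(ch, ch, kernel_size=1)
        self.fuse = nn.Conv2d(2 * ch, ch, kernel_size=1)

    def forward(self, x):
        a, b = self.branch_a(x), self.branch_b(x)
        return self.fuse(torch.cat([a, b], dim=1))
```

Both modules preserve the spatial size and channel count, so they can be stacked freely inside the feature extraction channels.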
In addition, the invention also proposes the cross, square-ring (回-shaped), row, column, diagonal, anti-diagonal, and X-shaped masks of fig. 5. By way of example, fig. 5 shows the shapes of the various masks for a convolution kernel size of 5×5, where gray represents a value of 1 and white a value of 0. Figs. 5(a) and 5(f) show the point mask and the square-ring mask respectively; note that the point mask can be understood as a special case of the square-ring mask. Figs. 5(b) and 5(g) show masks whose middle row and middle column, respectively, are 0. Figs. 5(c) and 5(h) show an inverse cross mask whose middle cross is 0, and a cross mask that is 0 everywhere outside the central cross, with the center point also 0. Figs. 5(d) and 5(i) show masks that are 0 on the 45° diagonal and on the 135° diagonal, respectively. Figs. 5(e) and 5(j) show a mask that is 0 along the X directions, and a mask that is 0 everywhere outside the X, with the center point also 0.
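For illustration, a few of the 0/1 mask shapes of fig. 5 can be generated like this; only three representative kinds are sketched, and the function and its names are ours, not the patent's:

```python
def make_mask(size, kind):
    """Build a size x size convolution-kernel mask where 1 keeps a weight
    (gray in fig. 5) and 0 blinds it (white). Supported kinds here:
    'point' (only the centre is 0), 'cross' (centre row and column are 0),
    'diag' (the 45-degree main diagonal is 0)."""
    c = size // 2
    mask = [[1] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if kind == "point" and (i, j) == (c, c):
                mask[i][j] = 0
            elif kind == "cross" and (i == c or j == c):
                mask[i][j] = 0
            elif kind == "diag" and i == j:
                mask[i][j] = 0
    return mask
```

Multiplying a kernel element-wise by such a mask before the convolution is what produces the blind region.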
In one possible embodiment, when training the model the batch size is set to 8 and the number of iterations to 20 epochs. The optimizer is Adam with an initial learning rate of 0.0001, decayed by a factor of 0.1 every 8 epochs. This embodiment randomly crops the picture into 128×128 sub-images (patches), applies random rotations in multiples of 90° and horizontal or vertical flips, and inputs the patches for model training. The models may be trained with Python 3.8.0 and PyTorch 1.12.0 on Nvidia Tesla T4 GPUs.
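The augmentation pipeline just described — random 128×128 crops, rotations in multiples of 90°, and horizontal or vertical flips — might look like this in plain Python over a 2-D list of pixels; this is a sketch, and a real pipeline would operate on tensors:

```python
import random

def augment(img, patch=128):
    """Randomly crop a patch x patch sub-image from img (an H x W 2-D list),
    rotate it by a random multiple of 90 degrees, and flip it horizontally
    or vertically, each with probability 0.5."""
    h, w = len(img), len(img[0])
    top = random.randint(0, h - patch)
    left = random.randint(0, w - patch)
    sub = [row[left:left + patch] for row in img[top:top + patch]]
    for _ in range(random.randint(0, 3)):            # 0, 90, 180 or 270 degrees
        sub = [list(r) for r in zip(*sub[::-1])]     # rotate 90 degrees clockwise
    if random.random() < 0.5:
        sub = [row[::-1] for row in sub]             # horizontal flip
    if random.random() < 0.5:
        sub = sub[::-1]                              # vertical flip
    return sub
```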
In addition, the trained network can be deployed to the cloud or to edge devices, with the sRGB image captured at the front end used as the network input and the model output passed directly to subsequent image processing equipment. The invention is also applicable to other basic image processing tasks, such as image reconstruction.
In some embodiments of the invention, a training device for a blind spot denoising network model is disclosed, as shown in fig. 6. The device is configured to implement the methods described in the foregoing training method embodiments and includes an optimization unit 601 and a training unit 602. The optimization unit 601 is configured to optimize a feature fusion network model, wherein the optimized feature fusion network model comprises at least two feature extraction channels, the convolution layers in different feature extraction channels differ in mask type or mask size, and a fusion layer completes feature fusion from the extraction results of the different channels. The training unit 602 is configured to take the noisy images in a public data set as the training data set, input them into the optimized feature fusion network model for self-supervised model training until the set number of iterations is reached or the loss value of the loss function falls below the set threshold, and output the blind spot denoising network model. For the details of each step, refer to the functional descriptions of the corresponding functional modules in the foregoing method embodiments, which are not repeated here.
In other embodiments of the present invention, an electronic device is disclosed. The electronic device may be the server 100 above or the terminal device 200 above. As shown in fig. 7, the electronic device may include: one or more processors 701; a memory 702; a display 703; one or more applications (not shown); and one or more programs 704, where the above components may be connected by one or more communication buses 705. The one or more programs 704 are stored in the memory 702 and configured to be executed by the one or more processors 701, and include instructions for performing the steps in fig. 2 and the corresponding embodiments.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that the above division of functional modules is illustrated merely for convenience and brevity of description; in practical applications, the above functions may be allocated to different functional modules as needed, i.e. the internal structure of the apparatus may be divided into different functional modules to implement all or part of the functions described above. For the specific working processes of the systems, devices and units described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
The functional units in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a readable storage medium. Based on this understanding, the part of the technical solution of the embodiments of the present invention that in essence contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium and comprising several instructions for causing an electronic device or processor to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes: flash memory, removable hard disk, read-only memory, random access memory, magnetic disk, optical disk, and the like.
The foregoing is merely a specific implementation of the embodiment of the present invention, but the protection scope of the embodiment of the present invention is not limited to this, and any changes or substitutions within the technical scope disclosed in the embodiment of the present invention should be covered in the protection scope of the embodiment of the present invention. Therefore, the protection scope of the embodiments of the present invention shall be subject to the protection scope of the claims.
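The L1 loss recited in claim 4 below, Loss = ‖I_out − I_N‖₁, can be computed directly. This minimal sketch uses NumPy and treats the norm as the sum of absolute pixel differences (an assumption; averaging over pixels is also common in practice).

```python
import numpy as np

def l1_loss(i_out, i_n):
    # ||I_out - I_N||_1: sum of absolute differences between the
    # network output and the noisy input image.
    return float(np.abs(np.asarray(i_out) - np.asarray(i_n)).sum())
```

Because the target is the noisy image itself rather than a clean ground truth, minimizing this loss constitutes the self-supervised training objective of the blind spot network.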

Claims (12)

1. A method for training a blind spot denoising network model, the method comprising:
optimizing a feature fusion network model, wherein the optimized feature fusion network model comprises at least two feature extraction channels, the convolution layers in different feature extraction channels differ in mask type or mask size, and feature fusion is completed by a fusion layer according to the extraction results of the different feature extraction channels;
taking the noisy image in the public data set as a training data set;
and inputting the training data set into the optimized feature fusion network model for self-supervised model training until a set number of iterations is reached or a loss value of the loss function is smaller than a set threshold, and outputting the blind spot denoising network model.
2. The training method of claim 1, wherein inputting the training data set into the optimized feature fusion network model for self-supervised model training comprises:
performing feature pre-extraction on the noisy image, and then performing feature extraction through each feature extraction channel respectively;
inputting the extraction results of the different feature extraction channels into a CDCL comprising DCLs for convolution; first concatenating and fusing the extraction results of the feature extraction channels having the same mask size, so as to fuse the features extracted by convolution kernels of different mask types; and, after a plurality of DCLs, concatenating the outputs of all channels together, so as to complete the feature fusion of convolution kernels of different mask sizes;
and finally obtaining the final output through channel transformation and feature fusion by a plurality of convolutions.
3. The training method of claim 2, wherein the mask type is at least one of cross, square-ring, row, column, diagonal, and anti-diagonal.
4. A training method according to any one of claims 1 to 3, characterized in that the loss function satisfies:
Loss = ‖I_out − I_N‖₁
wherein Loss is L1 Loss function, I out I is the output result of the blind spot denoising network model N Is a noisy image.
5. A training method according to any one of claims 1 to 3, wherein inputting the mask image training set into the optimized feature fusion network model for self-supervised model training comprises:
randomly cropping the noisy image into a plurality of sub-images, randomly rotating the sub-images and flipping them horizontally or vertically, and inputting the sub-images into the optimized feature fusion network model for self-supervised model training.
6. A training device for a blind spot denoising network model, comprising:
an optimizing unit, configured to optimize a feature fusion network model, wherein the optimized feature fusion network model comprises at least two feature extraction channels, the convolution layers in different feature extraction channels differ in mask type or mask size, and feature fusion is completed by a fusion layer according to the extraction results of the different feature extraction channels;
and a training unit, configured to take the noisy images in a public data set as a training data set, input the training data set into the optimized feature fusion network model for self-supervised model training until a set number of iterations is reached or a loss value of the loss function is smaller than a set threshold, and output the blind spot denoising network model.
7. The training device of claim 6, wherein, when inputting the training data set into the optimized feature fusion network model for self-supervised model training, the training unit is specifically configured to:
perform feature pre-extraction on the noisy image, and then perform feature extraction through each feature extraction channel respectively;
input the extraction results of the different feature extraction channels into a CDCL comprising DCLs for convolution; first concatenate and fuse the extraction results of the feature extraction channels having the same mask size, so as to fuse the features extracted by convolution kernels of different mask types; and, after a plurality of DCLs, concatenate the outputs of all channels together, so as to complete the feature fusion of convolution kernels of different mask sizes;
and finally obtain the final output through channel transformation and feature fusion by a plurality of convolutions.
8. The training device of claim 7, wherein the mask type is at least one of cross, square-ring, row, column, diagonal, and anti-diagonal.
9. Training device according to any of the claims 6 to 8, characterized in that the loss function satisfies:
Loss = ‖I_out − I_N‖₁
wherein I_out is the output of the blind spot denoising network model, and I_N is the noisy image.
10. A training device according to any one of claims 6 to 8, wherein, when inputting the mask image training set into the optimized feature fusion network model for self-supervised model training, the training unit is specifically configured to:
randomly crop the noisy image into a plurality of sub-images, randomly rotate the sub-images and flip them horizontally or vertically, and input the sub-images into the optimized feature fusion network model for self-supervised model training.
11. A computer readable storage medium having a program stored therein, characterized in that the program, when executed by a processor, implements the method of any one of claims 1 to 5.
12. An electronic device comprising a memory and a processor, the memory having stored thereon a program executable on the processor, which when executed by the processor, causes the electronic device to implement the method of any of claims 1 to 5.
CN202310666771.3A 2023-06-06 2023-06-06 Training method, device, medium and equipment for blind spot denoising network model Pending CN116703768A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310666771.3A CN116703768A (en) 2023-06-06 2023-06-06 Training method, device, medium and equipment for blind spot denoising network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310666771.3A CN116703768A (en) 2023-06-06 2023-06-06 Training method, device, medium and equipment for blind spot denoising network model

Publications (1)

Publication Number Publication Date
CN116703768A true CN116703768A (en) 2023-09-05

Family

ID=87832086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310666771.3A Pending CN116703768A (en) 2023-06-06 2023-06-06 Training method, device, medium and equipment for blind spot denoising network model

Country Status (1)

Country Link
CN (1) CN116703768A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117710240A (en) * 2023-12-15 2024-03-15 山东财经大学 Self-supervision image denoising method, system, device and readable storage medium
CN117710240B (en) * 2023-12-15 2024-05-24 山东财经大学 Self-supervision image denoising method, system, device and readable storage medium

Similar Documents

Publication Publication Date Title
Pan et al. Physics-based generative adversarial models for image restoration and beyond
Zhang et al. Pyramid channel-based feature attention network for image dehazing
US10943145B2 (en) Image processing methods and apparatus, and electronic devices
Liu et al. Learning affinity via spatial propagation networks
Fakhry et al. Residual deconvolutional networks for brain electron microscopy image segmentation
CN109858487B (en) Weak supervision semantic segmentation method based on watershed algorithm and image category label
CN111539290B (en) Video motion recognition method and device, electronic equipment and storage medium
CN110189260B (en) Image noise reduction method based on multi-scale parallel gated neural network
Sun et al. Multiscale generative adversarial network for real‐world super‐resolution
CN106709877A (en) Image deblurring method based on multi-parameter regular optimization model
Kim et al. Deeply aggregated alternating minimization for image restoration
CN111832592A (en) RGBD significance detection method and related device
CN110728636A (en) Monte Carlo rendering image denoising model, method and device based on generative confrontation network
Jiao et al. Guided-Pix2Pix: End-to-end inference and refinement network for image dehazing
CN116703768A (en) Training method, device, medium and equipment for blind spot denoising network model
CN112861718A (en) Lightweight feature fusion crowd counting method and system
CN115393191A (en) Method, device and equipment for reconstructing super-resolution of lightweight remote sensing image
Yang et al. Low‐light image enhancement based on Retinex decomposition and adaptive gamma correction
Niu et al. Deep robust image deblurring via blur distilling and information comparison in latent space
Yu et al. FS-GAN: Fuzzy Self-guided structure retention generative adversarial network for medical image enhancement
Hua et al. Dynamic scene deblurring with continuous cross-layer attention transmission
CN108961268B (en) Saliency map calculation method and related device
Tang et al. Structure-embedded ghosting artifact suppression network for high dynamic range image reconstruction
Ma et al. Depth estimation from single image using CNN-residual network
CN113096032A (en) Non-uniform blur removing method based on image area division

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination