CN111489365A - Neural network training method, image processing method and device - Google Patents
- Publication number
- CN111489365A
- Authority
- CN
- China
- Prior art keywords
- image
- semantic segmentation
- network
- information
- pixel point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/70—Denoising; Smoothing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biomedical Technology (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Image Analysis (AREA)
Abstract
The disclosure provides a training method for a neural network, an image processing method and an image processing device, wherein the training method comprises the following steps: performing semantic segmentation processing on a first noise image of a target image by using a student network to obtain a first semantic segmentation image; performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image; updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image and the credibility information; and updating the parameter values of the teacher network based on the updated parameter values of the student network. According to the embodiments of the disclosure, the first semantic segmentation image, the second semantic segmentation image and the credibility information are used to control the student network and the teacher network so that they learn specific features of the target image, thereby avoiding negative transfer of the student network and the teacher network during transfer learning.
Description
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a training method for a neural network, an image processing method, and an image processing apparatus.
Background
Image segmentation refers to the task of assigning a semantic label to each pixel of a given image. In supervised or semi-supervised training of a semantic segmentation model, a large number of sample images are first labeled pixel by pixel, and the model is then trained on the labeled samples. However, labeling a large number of sample images pixel by pixel consumes a large amount of time and cost. To address this, sample data sets are constructed from simulated synthetic images. However, because there is a certain difference between synthetic images and real images, a semantic segmentation network trained on synthetic images suffers a significant performance reduction when it performs semantic segmentation processing on real images.
Disclosure of Invention
The embodiment of the disclosure at least provides a training method of a neural network, an image processing method and an image processing device.
In a first aspect, an embodiment of the present disclosure provides a training method for a neural network, including: performing semantic segmentation processing on a first noise image of a target image by using a student network to obtain a first semantic segmentation image; performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image; updating parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, and the credibility information; updating the parameter values of the teacher network based on the updated parameter values of the student network.
The first semantic segmentation image, the second semantic segmentation image and the credibility information are used to make the student network and the teacher network generate consistent prediction results for the same target image under different perturbations. As a result, the student network learns specific features of the target image while transferring to the target image, that is, it performs transfer learning in a specific direction. Because the parameter values of the teacher network are updated according to the parameter values of the student network, the teacher network also performs transfer learning in that direction, and the problem of negative transfer is avoided.
In a possible embodiment, the method further comprises: semantic segmentation processing is carried out on the style migration image of the source image by utilizing a student network to obtain a third semantic segmentation image, wherein the style migration image of the source image is an image obtained by migrating the style of the source image to a target domain where the target image is located; the updating the parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, and the credibility information includes: updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the annotation information of the source image.
In this way, the style migration image of the source image is subjected to semantic segmentation processing by using the student network to obtain a third semantic segmentation image, and then the parameter value updating process of the student network is supervised based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the annotation information of the source image, so that the semantic segmentation precision of the student network and the teacher network can be further improved.
In one possible embodiment, the updating the parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, the credibility information, the third semantically segmented image, and annotation information of the source image comprises: determining a consistency loss based on the first semantically segmented image, the second semantically segmented image and the credibility information; determining a weight of the consistency loss based on a current iteration number; determining semantic segmentation losses based on the third semantically segmented image and annotation information of the source image; updating parameter values for the student network based on the consistency loss, the weights, and the semantic segmentation loss.
In this way, the weight of the consistency loss is determined from the current iteration number, and the adjustment of the parameter values of the student network is supervised based on the consistency loss, its weight, and the semantic segmentation loss. The relative influence of the consistency loss and the semantic segmentation loss on the parameter values of the student network and the teacher network is thus adjusted dynamically as the number of iterations increases, so that specific features of the target image are learned while the semantic segmentation precision of the student network and the teacher network is maintained.
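The iteration-dependent weighting described above can be sketched as follows. The text only states that the weight depends on the current iteration number; the sigmoid-shaped ramp-up, the weighted-sum combination, and the names `consistency_weight`, `total_loss`, `ramp_up_iters` and `max_weight` are all assumptions made for illustration.

```python
import math


def consistency_weight(iteration, ramp_up_iters, max_weight=1.0):
    """Weight of the consistency loss at a given iteration.

    Assumption: a sigmoid-shaped ramp-up that grows from a small value
    to max_weight over ramp_up_iters iterations.
    """
    if ramp_up_iters <= 0:
        return max_weight
    # Clamp training progress to the range [0, 1].
    progress = min(max(iteration / ramp_up_iters, 0.0), 1.0)
    return max_weight * math.exp(-5.0 * (1.0 - progress) ** 2)


def total_loss(seg_loss, cons_loss, iteration, ramp_up_iters, max_weight=1.0):
    """Assumed combination: segmentation loss plus weighted consistency loss."""
    weight = consistency_weight(iteration, ramp_up_iters, max_weight)
    return seg_loss + weight * cons_loss
```

With a schedule of this shape, the supervised semantic segmentation loss dominates early training, and the consistency loss gains influence as the iteration count grows.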
In one possible embodiment, performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image includes: performing semantic segmentation processing on the plurality of second noise images of the target image by using a teacher network to obtain a plurality of intermediate semantic segmentation images; and generating the second semantic segmentation image based on the plurality of intermediate semantic segmentation images.
Therefore, the teacher network performs semantic segmentation processing on each of the plurality of second noise images to obtain a plurality of intermediate semantic segmentation images, and the second semantic segmentation image is generated based on them. More uncertainty information can thus be extracted from the second noise images, so that the credibility information of each pixel point in the second semantic segmentation image is more pronounced, which in turn improves the efficiency of optimizing the student network parameter values.
In one possible embodiment, the generating the second semantically segmented image based on the plurality of intermediate semantically segmented images comprises: calculating a pixel value mean value of pixel points at corresponding positions in the multiple intermediate semantic segmentation images in sequence; and determining the average value of the pixel points at any corresponding position as the pixel value of the pixel point at the corresponding position in the second semantic segmentation image.
Therefore, more uncertainty information can be extracted by averaging the pixel values of the pixel points at corresponding positions in the intermediate semantic segmentation images.
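A minimal sketch of this position-wise averaging is given below. It assumes each intermediate semantic segmentation image is a nested list of shape height x width x classes; the representation and the function name `mean_segmentation` are hypothetical.

```python
def mean_segmentation(intermediate_maps):
    """Average equally shaped H x W x C maps position by position."""
    count = len(intermediate_maps)
    if count == 0:
        raise ValueError("need at least one intermediate map")
    height = len(intermediate_maps[0])
    width = len(intermediate_maps[0][0])
    channels = len(intermediate_maps[0][0][0])
    return [
        [
            [
                # Mean over all intermediate maps at position (y, x), class c.
                sum(m[y][x][c] for m in intermediate_maps) / count
                for c in range(channels)
            ]
            for x in range(width)
        ]
        for y in range(height)
    ]
```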
In a possible embodiment, the determining, based on the second semantically segmented image, credibility information of each pixel point in the second semantically segmented image includes: determining the information entropy of each pixel point in the second semantic segmentation image based on the pixel value of each pixel point in the second semantic segmentation image; and determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold.
In this way, the information entropy of each pixel point in the second semantic segmentation image is extracted through the pixel value of each pixel point in the second semantic segmentation image, and then the credibility information of each pixel point in the second semantic segmentation image is determined based on the information entropy.
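As an illustration of the entropy step, the sketch below assumes that the pixel value of each pixel point is a vector of per-class probabilities (for example a softmax output) and uses Shannon entropy with the natural logarithm. Both choices, and the names `pixel_entropy` and `entropy_map`, are assumptions rather than details given in the text.

```python
import math


def pixel_entropy(probs):
    """Shannon entropy (natural log) of one pixel's class probabilities."""
    # Zero-probability classes contribute nothing and are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_map(seg_map):
    """Apply pixel_entropy to every pixel of an H x W x C nested list."""
    return [[pixel_entropy(pixel) for pixel in row] for row in seg_map]
```

Under these assumptions, a pixel whose probability mass sits on a single class has entropy 0, and a pixel that is uniform over K classes has entropy ln K.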
In a possible embodiment, the determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold includes: comparing the information entropy of each pixel point in the second semantic segmentation image with the information entropy threshold; and determining the credibility information of each pixel point in the second semantic segmentation image based on the comparison result, wherein if the absolute value of the information entropy of any pixel point in the second semantic segmentation image is larger than the information entropy threshold, the credibility information corresponding to that pixel point is set to a preset value indicating that the pixel value of that pixel point is credible, the preset value being larger than 0.
Therefore, the consistency loss between the first semantic segmentation image and the second semantic segmentation image takes into account only the credible pixel points in the second semantic segmentation image. When the parameter values of the student network are updated based on this consistency loss, the results of the semantic segmentation processing that the student network and the teacher network perform on target images with different perturbations tend to become consistent. The parameter values of the teacher network are then updated based on the updated parameter values of the student network, so that the parameter values of the two networks remain consistent and both networks can learn the specific features of the target image.
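The thresholding and the credibility-masked consistency loss can be sketched as below. The comparison direction is taken literally from the wording above (absolute entropy larger than the threshold gives the preset value). Assigning 0 to all other pixel points, using a squared difference as the per-pixel discrepancy, and the names `credibility_mask` and `masked_consistency_loss` are assumptions.

```python
def credibility_mask(entropies, threshold, preset=1.0):
    """Per-pixel credibility values from an H x W entropy map.

    Mirrors the text: |entropy| > threshold -> preset (> 0).
    Assumption: every other pixel point gets 0.0.
    """
    return [
        [preset if abs(h) > threshold else 0.0 for h in row]
        for row in entropies
    ]


def masked_consistency_loss(student_map, teacher_map, mask):
    """Mask-weighted mean squared difference between two H x W x C maps."""
    total, weight = 0.0, 0.0
    for s_row, t_row, m_row in zip(student_map, teacher_map, mask):
        for s_px, t_px, m in zip(s_row, t_row, m_row):
            # Pixels with credibility 0 do not contribute to the loss.
            total += m * sum((s - t) ** 2 for s, t in zip(s_px, t_px))
            weight += m
    return total / weight if weight > 0 else 0.0
```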
In one possible embodiment, the information entropy threshold is generated by: determining the information entropy threshold based on the semantic segmentation type of the teacher network.
In one possible embodiment, updating the parameter values of the teacher network based on the updated parameter values of the student network comprises: performing exponential moving average processing on parameter values of parameters in the student network to obtain target parameter values; and replacing the parameter value of the corresponding parameter in the teacher network by using the target parameter value.
Therefore, the parameter values of the teacher network are exponential moving averages of the parameter values of the student network, so that the teacher network and the student network can converge faster and the training efficiency of the neural network is improved.
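One common form of such an exponential moving average update is sketched below. Parameters are represented as a dictionary mapping names to floats, and the decay value is arbitrary; the representation, the default decay, and the function name `ema_update` are assumptions for illustration.

```python
def ema_update(teacher_params, student_params, decay=0.99):
    """Return new teacher parameters as an exponential moving average.

    Each target value is decay * old teacher value
    + (1 - decay) * updated student value.
    """
    return {
        name: decay * teacher_params[name] + (1.0 - decay) * student_params[name]
        for name in student_params
    }
```

The returned target values then replace the corresponding teacher parameters, for example `teacher = ema_update(teacher, student)` after each student update.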
In a second aspect, an embodiment of the present disclosure further provides a training apparatus for a neural network, including: a first processing module, configured to perform semantic segmentation processing on a first noise image of a target image by using a student network to obtain a first semantic segmentation image; a second processing module, configured to perform semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image, and to determine credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image; a first updating module, configured to update parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, and the credibility information; and a second updating module, configured to update the parameter values of the teacher network based on the updated parameter values of the student network.
In a possible embodiment, the apparatus further comprises: the third processing module is used for performing semantic segmentation processing on the style migration image of the source image by using a student network to obtain a third semantic segmentation image, wherein the style migration image of the source image is an image obtained by migrating the style of the source image to a target domain where the target image is located; the first updating module, when updating the parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, and the reliability information, is configured to: updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the annotation information of the source image.
In a possible embodiment, the first updating module, when updating the parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, the credibility information, the third semantically segmented image, and the annotation information of the source image, is configured to: determining a consistency loss based on the first semantically segmented image, the second semantically segmented image and the credibility information; determining a weight of the consistency loss based on a current iteration number; determining semantic segmentation losses based on the third semantically segmented image and annotation information of the source image; updating parameter values for the student network based on the consistency loss, the weights, and the semantic segmentation loss.
In one possible embodiment, the second processing module, when performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image, is configured to: performing semantic segmentation processing on the plurality of second noise images of the target image by using a teacher network to obtain a plurality of intermediate semantic segmentation images; and generating the second semantic segmentation image based on the plurality of intermediate semantic segmentation images.
In one possible embodiment, the second processing module, when generating the second semantically segmented image based on the plurality of intermediate semantically segmented images, is configured to: calculating a pixel value mean value of pixel points at corresponding positions in the multiple intermediate semantic segmentation images in sequence; and determining the average value of the pixel points at any corresponding position as the pixel value of the pixel point at the corresponding position in the second semantic segmentation image.
In a possible embodiment, the second processing module, when determining, based on the second semantically segmented image, reliability information of each pixel point in the second semantically segmented image, is configured to: determining the information entropy of each pixel point in the second semantic segmentation image based on the pixel value of each pixel point in the second semantic segmentation image; and determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold.
In a possible embodiment, the second processing module, when determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold, is configured to: compare the information entropy of each pixel point in the second semantic segmentation image with the information entropy threshold; and determine the credibility information of each pixel point in the second semantic segmentation image based on the comparison result, wherein if the absolute value of the information entropy of any pixel point in the second semantic segmentation image is larger than the information entropy threshold, the credibility information corresponding to that pixel point is set to a preset value indicating that the pixel value of that pixel point is credible, the preset value being larger than 0.
In a possible embodiment, the second processing module is further configured to generate the information entropy threshold by: determining the information entropy threshold based on the semantic segmentation type of the teacher network.
In one possible embodiment, the second updating module, when updating the parameter values of the teacher network based on the updated parameter values of the student network, is configured to: performing exponential moving average processing on parameter values of parameters in the student network to obtain target parameter values; and replacing the parameter value of the corresponding parameter in the teacher network by using the target parameter value.
In a third aspect, an embodiment of the present disclosure further provides an image processing method, including: acquiring an image to be processed; and performing semantic segmentation processing on the image to be processed by using the neural network trained by the neural network training method based on any one of the first aspect to obtain a semantic segmentation result of the image to be processed.
In a fourth aspect, an embodiment of the present disclosure further provides an image processing apparatus, including: the acquisition module is used for acquiring an image to be processed; and the processing module is used for performing semantic segmentation processing on the image to be processed by utilizing the neural network trained by the neural network training method based on any one of the first aspect to obtain a semantic segmentation result of the image to be processed.
In a fifth aspect, an embodiment of the present disclosure further provides an intelligent driving control method, including: acquiring an image captured by a driving device during driving; detecting a target object in the image by using a neural network trained by the neural network training method of any one of the first aspect; and controlling the driving device based on the detected target object.
In a sixth aspect, an embodiment of the present disclosure further provides an intelligent driving control device, including: a data acquisition module, configured to acquire an image captured by a driving device during driving; a detection module, configured to detect a target object in the image by using a neural network trained by the neural network training method of any one of the first aspect; and a control module, configured to control the driving device based on the detected target object.
In a seventh aspect, an embodiment of the present disclosure further provides an electronic device including a processor and a memory, where the memory stores machine-readable instructions executable by the processor, and the processor is configured to execute the machine-readable instructions stored in the memory; when executed by the processor, the machine-readable instructions perform the steps in the first aspect, or any one of the possible implementations of the first aspect, or the steps in the third aspect, or the steps in the fifth aspect.
In an eighth aspect, alternative implementations of the present disclosure further provide a computer-readable storage medium having a computer program stored thereon, where the computer program is executed to perform the steps in the first aspect, or any one of the possible implementations of the first aspect, or to perform the steps in the possible implementations of the third aspect, or to perform the steps in the possible implementations of the fifth aspect.
Drawings
In order to illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings required by the embodiments are briefly described below. The drawings are incorporated in and form a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain its technical solutions. It should be understood that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art can derive other related drawings from them without inventive effort.
Fig. 1 shows a flowchart of a training method of a neural network provided by an embodiment of the present disclosure;
Fig. 2 shows a flowchart of a specific method for determining credibility information of each pixel point in a second semantically segmented image provided by an embodiment of the present disclosure;
Fig. 3 shows a flowchart of another training method of a neural network provided by an embodiment of the present disclosure;
Fig. 4 shows a schematic diagram of a specific example of a training method of a neural network provided by an embodiment of the present disclosure;
Fig. 5 shows a flowchart of an image processing method provided by an embodiment of the present disclosure;
Fig. 6 shows a flowchart of an intelligent driving control method provided by an embodiment of the present disclosure;
Fig. 7 shows a schematic diagram of a training apparatus for a neural network provided by an embodiment of the present disclosure;
Fig. 8 shows a schematic diagram of an image processing apparatus provided by an embodiment of the present disclosure;
Fig. 9 shows a schematic diagram of an intelligent driving control device provided by an embodiment of the present disclosure;
Fig. 10 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
Research shows that, before a neural network is trained, a great deal of time and cost usually has to be spent labeling sample images to form a labeled data set. To reduce the labeling time and cost, neural networks are in many cases trained on computer-simulated synthetic images. However, because a certain domain difference exists between synthetic images and real images, the performance of a neural network trained on synthetic images degrades when it executes an image processing task on real images. To address this, current approaches mostly add supervision signals to an adversarial framework; for example, on the basis of a generative adversarial network, transfer learning is performed on the neural network using supervision signals such as depth, style, class constraints and decision boundaries. However, the features learned by the neural network during transfer learning have great uncertainty, which may cause the problem of negative transfer.
Based on the research, the method and the device for training the neural network monitor the student network to perform transfer learning by controlling the teacher network and the student network to generate consistent prediction results on unmarked target images under different disturbances, and update the teacher network based on parameter values of the student network, so that the teacher network and the student network can learn specific technical characteristics in the target images in the transfer learning process, and the problem of negative transfer is avoided.
The above-mentioned drawbacks are the results of the inventor after practical and careful study, and therefore, the discovery process of the above-mentioned problems and the solutions proposed by the present disclosure to the above-mentioned problems should be the contribution of the inventor in the process of the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
To facilitate understanding of the present embodiment, first, a training method for a neural network disclosed in the embodiments of the present disclosure is described in detail, where an execution subject of the training method for a neural network provided in the embodiments of the present disclosure is generally a computer device with certain computing power, and the computer device includes, for example: a terminal device or server or other processing device; in some possible implementations, the training method of the neural network may be implemented by a processor calling computer-readable instructions stored in a memory.
The following describes a training method of a neural network provided in an embodiment of the present disclosure.
In the embodiment of the present disclosure, before updating the parameter values of the Student Network (Student Network) and the Teacher Network (Teacher Network) based on S101 to S104, the parameter values of the Student Network and the Teacher Network may be initialized first.
For example, the teacher network and the student network may be initialized using a pre-trained semantic segmentation network.
Here, the pre-trained semantic segmentation network is, for example, a neural network trained on source images. In the embodiments of the present disclosure, the processes of S101 to S104 use the target image to control the pre-trained semantic segmentation network to perform transfer learning from the source domain to the target domain, so that, after the transfer learning, the performance of the semantic segmentation network is not degraded when it performs semantic segmentation processing on images of the target domain.
Images of the source domain include, for example, synthetic images; images of the target domain include, for example, real images.
After parameter values of the student network and the teacher network are initialized, multiple rounds of iteration are carried out on the student network and the teacher network based on S101-S104, and the teacher network or the student network after the multiple rounds of iteration is determined as a trained neural network. Here, the process of S101 to S104 is performed once, and is a process of performing one iteration of the student network and the teacher network.
Referring to fig. 1, a flowchart of a training method of a neural network provided in an embodiment of the present disclosure is shown, where the method includes:
S101: performing semantic segmentation processing on the first noise image of the target image by using a student network to obtain a first semantic segmentation image.
In a specific implementation, the first noise image may be obtained, for example, by injecting random noise into the target image.
The random noise is, for example, any one of Gaussian noise, white noise and the like, and may be chosen according to actual needs.
Random noise is injected into the target image to generate the first noise image, and the student network then performs semantic segmentation processing on the first noise image. In doing so, the student network produces a semantic segmentation result for each pixel point in the first noise image; the first semantic segmentation image is then formed from these per-pixel results and has the same size as the first noise image.
The pixel value of any pixel point a' in the first semantic segmentation image is the semantic segmentation result of the corresponding pixel point a in the first noise image.
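As an illustrative sketch of the noise injection described above, the following Python snippet adds zero-mean Gaussian noise to a small grayscale image stored as nested lists. The function name inject_gaussian_noise, the standard deviation sigma, the clamping of pixel values to [0, 1] and the use of seeds are assumptions made for this example; the disclosure does not specify them.

```python
import random


def inject_gaussian_noise(image, sigma=0.1, seed=None):
    """Return a noisy copy of `image` (rows of floats in [0, 1])."""
    rng = random.Random(seed)
    noisy = []
    for row in image:
        # Add zero-mean Gaussian noise to every pixel, then clamp to [0, 1].
        noisy.append([min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row])
    return noisy


# Hypothetical 2 x 2 target image.
target = [[0.2, 0.5], [0.7, 0.9]]

# Different seeds give differently perturbed versions of the same target image.
first_noise_image = inject_gaussian_noise(target, sigma=0.05, seed=1)
second_noise_image = inject_gaussian_noise(target, sigma=0.05, seed=2)
```

Both noise images keep the size of the target image, which is consistent with the statement that the semantic segmentation image has the same size as the noise image it is computed from.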
The training method of the neural network provided by the embodiment of the disclosure further includes:
S102: performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; and determining the credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image.
In a specific implementation, S102 and S101 need not be executed in any particular order; they may be executed synchronously or asynchronously.
The second noise image is generated in a similar manner to the first noise image, for example by injecting random noise into the target image. Different noise images of the target image are injected with different noise.
In one possible embodiment, there is one second noise image; in this case, the teacher network is used to perform semantic segmentation processing on the second noise image, so as to obtain a semantic segmentation result of each pixel point in the second noise image, and then the second semantic segmentation image is formed based on the semantic segmentation result of each pixel point in the second noise image.
In another possible embodiment, there are a plurality of second noise images; in this case, performing semantic segmentation processing on the multiple second noise images of the target image by using a teacher network to obtain an intermediate semantic segmentation image corresponding to each second noise image in the multiple second noise images; and then, based on the multiple intermediate semantic segmentation images, generating a second semantic segmentation image.
Here, for example, the pixel values of the pixel points at corresponding positions in the plurality of intermediate semantic segmentation images may be averaged in sequence, and the mean value at any position is taken as the pixel value of the pixel point at that position in the second semantic segmentation image.
For example, suppose the size of the target image is h × w and the N second noise images of the target image are A1, A2, ..., AN. Performing semantic segmentation processing on the second noise images by using the teacher network yields the intermediate semantic segmentation image of the i-th second noise image, which is expressed by formula (1), where x_t represents the target image; h represents the height of the target image; w represents the width of the target image; and C represents the number of semantic segmentation classes of the teacher network.
In this way, random noise is injected into the target image multiple times to generate multiple second noise images, and the second semantic segmentation image is obtained from the intermediate semantic segmentation images corresponding to those second noise images. More of the uncertainty information in the second noise images can thus be extracted, so that the credibility information of each pixel point in the second semantic segmentation image is more pronounced, which improves the efficiency of optimizing the parameter values of the student network.
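The position-by-position averaging of the intermediate semantic segmentation images can be sketched as follows. This is a minimal example that assumes each intermediate image is stored as an h × w × C nested list of per-class scores; the function name mean_segmentation and the sample values are hypothetical.

```python
def mean_segmentation(intermediate_images):
    """Average N intermediate maps of shape h x w x C position by position."""
    n = len(intermediate_images)
    h = len(intermediate_images[0])
    w = len(intermediate_images[0][0])
    c = len(intermediate_images[0][0][0])
    return [
        [
            [sum(img[i][j][k] for img in intermediate_images) / n for k in range(c)]
            for j in range(w)
        ]
        for i in range(h)
    ]


# Two hypothetical teacher outputs for a 1 x 2 image with C = 2 classes.
a1 = [[[0.9, 0.1], [0.4, 0.6]]]
a2 = [[[0.7, 0.3], [0.6, 0.4]]]

# Approximately [[[0.8, 0.2], [0.5, 0.5]]], up to floating-point rounding.
second_seg = mean_segmentation([a1, a2])
```

In this example the second pixel point receives conflicting predictions under the two perturbations, and its averaged scores become flat, which is the kind of uncertainty the credibility information is meant to capture.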
After obtaining the second semantic segmentation image, referring to fig. 2, the embodiment of the present disclosure further provides a specific method for determining reliability information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image, including:
S201: determining the information entropy of each pixel point in the second semantic segmentation image based on the pixel value of each pixel point in the second semantic segmentation image.
Here, the information entropy of any pixel point satisfies, for example, the following formula (2):
S202: determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold.
Here, the information entropy threshold may be determined, for example, based on the number of semantic segmentation classes of the teacher network.
The information entropy threshold H satisfies, for example, the following formula (3):
where a, b and c are all hyper-parameters; K_max = log C; C represents the number of semantic segmentation classes of the teacher network; t represents the current iteration round number; and t_max represents the maximum number of iteration rounds.
For example, the information entropy of each pixel point in the second semantic segmentation image may be compared with the predetermined information entropy threshold, and the credibility information of each pixel point in the second semantic segmentation image is then determined based on the comparison result.
If the absolute value of the information entropy of any pixel point in the second semantic segmentation image is larger than the information entropy threshold, the credibility information corresponding to that pixel point is set to a preset value indicating that the pixel value of that pixel point is credible, where the preset value is greater than 0.
In a specific implementation, as can be known from the above formula (2), the value of the information entropy is a negative number; for a certain pixel point in the second semantic segmentation image, the smaller the value of the information entropy of the pixel point is, the higher the credibility of the pixel point is represented, that is, the higher the credibility of the classification of the pixel point in the corresponding target image represented by the pixel value of the pixel point in the second semantic segmentation image is. When consistency loss between the first semantic segmentation image and the second semantic segmentation image is determined, considering pixel points with higher reliability in the second semantic segmentation image, and increasing influence of the pixel points with higher reliability on the consistency loss; and for the pixel points with lower credibility in the second semantic segmentation image, the influence of the pixel points on consistency loss can be reduced, and even the influence of the pixel points on consistency loss is eliminated.
Further, for example, the preset value indicating that a pixel value is credible may be set to 1, and the preset value indicating that a pixel value is not credible may be set to 0.
As another example, the preset value indicating that a pixel value is credible may be set to 1, the preset value indicating that a pixel value is not credible may be set to 0.5, and so on.
The specific values can be set according to actual needs.
Further, for example, the reliability information of each pixel point in the second semantic segmentation image satisfies the following formula (4):
where H represents the information entropy threshold, and I(·) represents a 0-1 function that takes the value 1 when the condition in its argument is satisfied and the value 0 otherwise.
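The entropy-based credibility map can be illustrated with the short sketch below. Formulas (2) and (4) are not reproduced in this text, so the sign convention and the comparison direction used here are assumptions: the sketch uses the conventional non-negative Shannon entropy and treats low-entropy (confident) pixel points as credible. The names pixel_entropy and credibility_map, the small constant eps, and the threshold value are hypothetical and should be checked against formulas (2) to (4) before use.

```python
import math


def pixel_entropy(probs, eps=1e-12):
    """Shannon entropy -sum(p * log p) of one pixel point's class probabilities."""
    return -sum(p * math.log(p + eps) for p in probs)


def credibility_map(seg, threshold):
    """Return 1 where the entropy is below `threshold` (assumed credible), else 0."""
    return [[1 if pixel_entropy(px) < threshold else 0 for px in row] for row in seg]


# Hypothetical 1 x 2 second semantic segmentation image with C = 2 classes.
seg = [[[0.98, 0.02], [0.5, 0.5]]]

# Assumed threshold: half of the maximum entropy log C.
mask = credibility_map(seg, threshold=0.5 * math.log(2))  # [[1, 0]]
```

The confident first pixel point is marked credible and the ambiguous second pixel point is not. Replacing the value 0 with 0.5 would correspond to the alternative preset values mentioned above.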
Following S101 and S102 above, the training method of the neural network provided by the embodiment of the present disclosure further includes:
S103: updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, and the credibility information.
S104: updating the parameter values of the teacher network based on the updated parameter values of the student network.
In a particular implementation, for example, a loss of consistency between the first semantically segmented image and the second semantically segmented image may be determined based on the first semantically segmented image, the second semantically segmented image, and the credibility information, and then parameter values of a student network may be updated based on the loss of consistency.
In a specific implementation, as can be seen from formula (3) above, H is a time-dependent function. The consistency loss can be, for example, the mean square error between the first semantic segmentation image output by the student network and the second semantic segmentation image output by the teacher network. The consistency loss L_con satisfies, for example, the following formula (5):
where f_S represents the student network; f_T represents the teacher network; x_t1 represents the first noise image; x_t2 represents the second noise image; and σ represents an activation function, for example a softmax activation function.
When updating the parameter values of the student network on the basis of the loss of consistency, for example, the parameter values of the student network are adjusted in a direction to reduce the loss of consistency.
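A credibility-weighted mean square error of this kind might be computed as in the sketch below. Formula (5) is not reproduced in this text, so the normalization by the sum of the credibility values, the function name consistency_loss and the sample values are assumptions made for illustration only.

```python
def consistency_loss(student_seg, teacher_seg, credibility):
    """Credibility-weighted squared error between two h x w x C maps."""
    total, weight_sum = 0.0, 0.0
    for s_row, t_row, m_row in zip(student_seg, teacher_seg, credibility):
        for s_px, t_px, m in zip(s_row, t_row, m_row):
            if m:
                # Only pixel points with non-zero credibility contribute.
                total += m * sum((s - t) ** 2 for s, t in zip(s_px, t_px))
                weight_sum += m
    # Assumed normalization: divide by the total credibility weight.
    return total / weight_sum if weight_sum else 0.0


# Hypothetical softmax outputs for a 1 x 2 image with two classes.
student = [[[0.8, 0.2], [0.3, 0.7]]]
teacher = [[[0.9, 0.1], [0.5, 0.5]]]
credibility = [[1, 0]]

loss = consistency_loss(student, teacher, credibility)  # about 0.02
```

Because the second pixel point has credibility 0, its disagreement between the two networks does not influence the loss, and therefore does not influence the direction in which the student parameters are adjusted.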
When updating the parameter values of the teacher network based on the updated parameter values of the student network, for example, exponential moving average processing may be performed on the parameter values of the parameters in the student network to obtain target parameter values; and replacing the parameter value of the corresponding parameter in the teacher network by using the target parameter value.
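The exponential moving average update can be sketched as follows, assuming the parameters are held in dictionaries keyed by parameter name. The function name ema_update and the decay value are hypothetical; the disclosure does not state a decay coefficient.

```python
def ema_update(teacher_params, student_params, decay=0.99):
    """Blend each teacher parameter toward the updated student parameter."""
    return {
        name: decay * teacher_params[name] + (1.0 - decay) * student_params[name]
        for name in teacher_params
    }


# Hypothetical single-parameter networks.
teacher = {"w": 1.0}
student = {"w": 0.0}

new_teacher = ema_update(teacher, student, decay=0.9)  # {"w": 0.9} up to rounding
```

Applying this update after every student update makes the teacher parameters an exponentially weighted average of the student parameters over past iterations, so the teacher changes more smoothly than the student.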
In specific implementation, based on the formula (4) and the formula (5), it can be known that when the semantic segmentation result represented by any pixel point in the second semantic segmentation image is trusted, the value of the reliability information corresponding to the pixel point is 1; when the semantic segmentation result represented by any pixel point in the second semantic segmentation image is not credible, the credibility information corresponding to the pixel point is 0, and then consistency loss is determined based on the credible pixel points of the semantic segmentation result in the second semantic segmentation image, and further the consistency loss of the generated first semantic segmentation image and the second semantic segmentation image only considers the credible pixel points in the second semantic segmentation image, so that when the parameter value of the student network is updated based on the consistency loss, the result of semantic segmentation processing performed on target images added with different disturbances by the student network and the teacher network can be ensured to tend to be consistent. And then updating the parameter values of the teacher network based on the updated parameter values of the student network, so that the parameter values of the teacher network and the parameter values of the student network can keep consistent change direction, and the teacher network and the student network can learn the specific characteristics of the target image.
In the embodiment of the disclosure, the first noise image and the second noise image are both obtained by applying different perturbations to the target image. The student network performs semantic segmentation processing on the first noise image to obtain a first semantic segmentation image; the teacher network performs semantic segmentation processing on the second noise image to obtain a second semantic segmentation image; credibility information of each pixel point in the second semantic segmentation image is determined based on the second semantic segmentation image; the parameter values of the student network are updated based on the first semantic segmentation image, the second semantic segmentation image and the credibility information; and the parameter values of the teacher network are updated based on the updated parameter values of the student network. In this process, the first semantic segmentation image, the second semantic segmentation image and the credibility information are used to make the student network and the teacher network produce consistent prediction results for differently perturbed versions of the same target image. The student network can therefore learn the specific features of the target image during transfer learning based on the target image, that is, it performs transfer learning in a specific direction. Because the parameter values of the teacher network are updated from those of the student network, the teacher network also performs transfer learning in that direction, and the problem of negative transfer is avoided.
Referring to fig. 3, an embodiment of the present disclosure further provides another training method for a neural network, including:
S301: performing semantic segmentation processing on the first noise image of the target image by using a student network to obtain a first semantic segmentation image.
S302: performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; and determining the credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image.
The specific implementation process of S301 to S302 is similar to that of S101 to S102, and is not described herein again.
S303: performing semantic segmentation processing on the style migration image of the source image by using the student network to obtain a third semantic segmentation image, wherein the style migration image of the source image is an image obtained by migrating the style of the source image to the target domain where the target image is located.
In a specific implementation, S303 and S301 to S302 above need not be executed in any particular order; they may be executed synchronously or asynchronously.
Specifically, the style migration image of the source image may be obtained, for example, in the following manner:
performing style migration processing on the source image by using a pre-trained style migration network to obtain the style migration image corresponding to the source image, where the style migration network is trained using the source image and the target image.
In one embodiment, the style migration network is, for example, a generative adversarial network (GAN), such as CycleGAN. The generative adversarial network can integrate the semantic information of the source domain carried in the source image with the semantic information of the target domain carried in the target image, so that the source image is converted into a style migration image containing some of the features of the target image; the student network then performs semantic segmentation processing on the style migration image.
In addition, the style migration image may also be generated by using a style migration network with another architecture, for example a neural network with an architecture such as VGG or GoogLeNet, which may be selected according to actual needs.
In connection with the above S302 and S303, the training method of the neural network provided by the embodiment of the present disclosure further includes:
S304: updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the annotation information of the source image.
In a particular embodiment, the parameter values of the student network may be updated, for example, in the following manner: generating consistency loss of the first semantic segmentation image and the second semantic segmentation image based on the first semantic segmentation image, the second semantic segmentation image and the credibility information; generating semantic segmentation loss based on the third semantic segmentation image and the annotation information of the source image; parameters of the student network are updated based on the consistency loss and the semantic segmentation loss.
For example, the semantic segmentation loss L_seg is a cross-entropy loss used for optimization on source images from the source domain, and satisfies the following formula (6):
where H represents the height of the style migration image; w represents the width of the style migration image; c represents the number of channels; y_s represents the annotation information of the source image; f_S(·) represents the student network; and the remaining two symbols of the formula represent the third semantic segmentation image and the source image, respectively.
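A per-pixel cross-entropy loss of this kind can be sketched as follows. Formula (6) is not reproduced in this text, so averaging over all pixel points, the use of integer class labels for the annotation information, the small constant eps and the function name segmentation_loss are assumptions made for illustration.

```python
import math


def segmentation_loss(pred_probs, labels, eps=1e-12):
    """Mean per-pixel cross-entropy between class probabilities and integer labels."""
    total, count = 0.0, 0
    for p_row, y_row in zip(pred_probs, labels):
        for probs, y in zip(p_row, y_row):
            # Penalize a low predicted probability for the annotated class.
            total += -math.log(probs[y] + eps)
            count += 1
    return total / count


# Hypothetical third semantic segmentation image (1 x 2 pixel points, two classes)
# and the annotation of the corresponding source image.
third_seg = [[[0.5, 0.5], [0.25, 0.75]]]
annotation = [[0, 1]]

seg_loss = segmentation_loss(third_seg, annotation)
```

The loss falls as the student network assigns higher probability to the annotated class of each pixel point of the style migration image.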
When updating the parameter values of the student network based on the semantic segmentation loss and the consistency loss, for example, a weight of the consistency loss may be determined according to the current iteration number, and then the parameter values of the student network may be updated according to the consistency loss, the weight of the consistency loss, and the semantic segmentation loss.
The total loss of the student network is determined from the semantic segmentation loss and the consistency loss. The total loss L_total satisfies, for example, the following formula (7):
L_total = L_seg + λ_con · L_con    (7)
where L_seg represents the semantic segmentation loss; L_con represents the consistency loss; and λ_con represents the weight of the consistency loss. This weight is, for example, a dynamic weight set as a function that increases with the number of iterations, which balances the semantic segmentation loss against the consistency loss: the semantic segmentation loss dominates in the early stage of training, and the influence of the consistency loss gradually increases in the later stage, so that the convergence of the parameter values of the neural network is controlled stably.
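Formula (7) with an iteration-dependent weight can be sketched as follows. The disclosure only states that the weight increases with the number of iterations; the linear ramp, the max_weight parameter and the function names used here are assumptions chosen for simplicity.

```python
def consistency_weight(iteration, max_iterations, max_weight=1.0):
    """Assumed linear ramp from 0 to `max_weight` over the training run."""
    progress = min(1.0, max(0.0, iteration / max_iterations))
    return max_weight * progress


def total_loss(seg_loss, con_loss, iteration, max_iterations):
    """L_total = L_seg + lambda_con * L_con, as in formula (7)."""
    return seg_loss + consistency_weight(iteration, max_iterations) * con_loss


# Hypothetical loss values at the start, middle and end of 100 iterations.
early = total_loss(0.5, 0.2, iteration=0, max_iterations=100)    # 0.5
middle = total_loss(0.5, 0.2, iteration=50, max_iterations=100)  # about 0.6
late = total_loss(0.5, 0.2, iteration=100, max_iterations=100)   # about 0.7
```

At iteration 0 only the semantic segmentation loss contributes, and the consistency loss gains influence as training proceeds, which matches the balancing behavior described above.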
With reference to the foregoing S304, the training method for a neural network provided in the embodiment of the present disclosure further includes:
S305: updating the parameter values of the teacher network based on the updated parameter values of the student network.
Here, the specific implementation process of S305 is similar to that of S104 described above, and is not described herein again.
According to the method and the device, the style migration image of the source image is subjected to semantic segmentation processing by utilizing the student network to obtain a third semantic segmentation image, and then the parameter value updating process of the student network is supervised based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the annotation information of the source image, so that the semantic segmentation precision of the student network and the teacher network can be further improved.
Referring to fig. 4, an embodiment of the present disclosure further provides a specific example of a training method for a neural network, including:
step 1: a source image xsInputting the image data to a style migration network to obtain a source image xsStyle migration image of
Step 2: migrating styles from image to imageAnd inputting the image into a student network to obtain a third semantic segmentation image.
And step 3: based on source image xsMarking information y ofsAnd a third semantic segmentation image to obtain a semantic segmentation loss Lseg。
And 4, step 4: is a target image xtRandom noise is injected to generate a first noise image, and the first noise image is input to a student network to obtain a first semantic segmentation image.
And 5: is a target image xtInjecting random noise to generate N second noise images, and inputting the N second noise images to a teacher network to obtain a plurality of intermediate semantic segmentation images. And sequentially solving the pixel value mean value of pixel points at corresponding positions in the plurality of intermediate semantic segmentation images to obtain a second semantic segmentation image.
And 7: and (3) calculating the information entropy of each pixel point in the second semantic segmentation image according to the formula (2).
And 8: and (4) calculating according to a formula (4) to calculate the reliability, so as to obtain the reliability information of each pixel point in the second semantic segmentation image.
Step 9, obtaining a consistency loss L of the first semantic segmentation image and the second semantic segmentation image according to the first semantic segmentation image, the second semantic segmentation image and the credibility informationcon。
Step 10 calculating Total loss L according to equation (7)total。
Step 12: and carrying out exponential moving average processing on the updated parameter values of the student network, and updating the parameter values of the teacher network based on the result of the exponential moving average processing.
Through the process, one round of iteration of the student network and the teacher network is realized.
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
Referring to fig. 5, an embodiment of the present disclosure further provides an image processing method, including:
S501: acquiring an image to be processed;
S502: performing semantic segmentation processing on the image to be processed by using a neural network trained by the training method of the neural network according to any embodiment of the disclosure, to obtain a semantic segmentation result of the image to be processed.
In the embodiment of the disclosure, the semantic segmentation processing of the image to be processed is performed with a neural network trained by the neural network training method provided by the embodiment of the disclosure. Such a neural network has higher semantic segmentation precision on the image to be processed, so the obtained semantic segmentation result is more accurate.
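As a small illustration of how a semantic segmentation result might be read out of the trained network's output, the sketch below converts per-pixel class scores into a label map by taking the highest-scoring class for each pixel point. The disclosure does not state how the result is derived from the network output, so the argmax readout, the function name label_map and the sample scores are assumptions.

```python
def label_map(scores):
    """Pick the index of the highest-scoring class for every pixel point."""
    return [
        [max(range(len(px)), key=px.__getitem__) for px in row]
        for row in scores
    ]


# Hypothetical network output for a 1 x 2 image with three classes.
scores = [[[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]]

labels = label_map(scores)  # [[1, 0]]
```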
Referring to fig. 6, an embodiment of the present disclosure further provides an intelligent driving control method, including:
S601: acquiring an image captured by a driving device during driving;
S602: detecting a target object in the image by using a neural network trained by the training method of the neural network according to any embodiment of the disclosure;
S603: controlling the driving device based on the detected target object.
In a specific implementation, the driving device is, for example, but not limited to, any one of the following: an autonomous vehicle, a vehicle equipped with an Advanced Driving Assistance System (ADAS), a robot, or the like.
Controlling the driving device includes, for example, controlling the driving device to accelerate, decelerate, steer or brake, or playing voice prompt information to prompt the driver to control the driving device to accelerate, decelerate, steer or brake.
The intelligent driving control method of the embodiment of the disclosure is realized by utilizing the neural network trained by the neural network training method provided by the embodiment of the disclosure, and when the neural network trained by the neural network training method performs semantic segmentation processing on the image obtained in the driving process, a more accurate semantic segmentation processing result can be obtained, thereby ensuring higher safety in the driving control process.
Based on the same inventive concept, the embodiment of the present disclosure further provides a training apparatus for a neural network corresponding to the training method for the neural network, and since the principle of the apparatus in the embodiment of the present disclosure for solving the problem is similar to the training method for the neural network described above in the embodiment of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not repeated.
Referring to fig. 7, a schematic diagram of a training apparatus for a neural network provided in an embodiment of the present disclosure is shown, where the apparatus includes: a first processing module 71, a second processing module 72, a first updating module 73, and a second updating module 74; wherein,
the first processing module 71 is configured to perform semantic segmentation processing on a first noise image of the target image by using a student network to obtain a first semantic segmentation image;
a second processing module 72, configured to perform semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image;
a first updating module 73, configured to update parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, and the reliability information;
a second updating module 74 for updating the parameter values of the teacher network based on the updated parameter values of the student network.
In a possible embodiment, the apparatus further comprises: the third processing module 75 is configured to perform semantic segmentation processing on the style migration image of the source image by using a student network to obtain a third semantic segmentation image, where the style migration image of the source image is an image obtained by migrating the style of the source image to a target domain where the target image is located;
the first updating module 73, when updating the parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, and the reliability information, is configured to:
updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the annotation information of the source image.
In a possible embodiment, the first updating module 73, when updating the parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, the credibility information, the third semantically segmented image, and the annotation information of the source image, is configured to:
determining a consistency loss based on the first semantically segmented image, the second semantically segmented image and the credibility information; determining a weight of the consistency loss based on a current iteration number;
determining semantic segmentation losses based on the third semantically segmented image and annotation information of the source image;
updating parameter values for the student network based on the consistency loss, the weights, and the semantic segmentation loss.
In one possible embodiment, the second processing module 72, when performing semantic segmentation processing on the second noise image of the target image by using the teacher network to obtain the second semantic segmentation image, is configured to:
perform semantic segmentation processing on a plurality of second noise images of the target image by using the teacher network to obtain a plurality of intermediate semantic segmentation images;
and generate the second semantic segmentation image based on the plurality of intermediate semantic segmentation images.
In one possible embodiment, the second processing module 72, when generating the second semantic segmentation image based on the plurality of intermediate semantic segmentation images, is configured to:
calculate, for each position in turn, the mean of the pixel values of the pixel points at that position across the plurality of intermediate semantic segmentation images;
and take the mean computed for each position as the pixel value of the pixel point at that position in the second semantic segmentation image.
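The per-position averaging described above can be illustrated with a short sketch. It assumes, purely for illustration, that each intermediate semantic segmentation image is a nested list indexed as `[row][column][class]` holding per-class scores; the function name `average_segmentations` is hypothetical.

```python
def average_segmentations(intermediate_maps):
    """Average several teacher outputs position by position.

    intermediate_maps: list of maps, each indexed [row][col][class].
    Returns one map of the same shape holding the element-wise mean.
    """
    if not intermediate_maps:
        raise ValueError("at least one intermediate segmentation map is required")
    count = len(intermediate_maps)
    first = intermediate_maps[0]
    return [
        [
            [
                sum(m[r][c][k] for m in intermediate_maps) / count
                for k in range(len(first[r][c]))
            ]
            for c in range(len(first[r]))
        ]
        for r in range(len(first))
    ]
```

For example, averaging `[[[1.0, 0.0]]]` and `[[[0.0, 1.0]]]` yields a single pixel with equal scores for both classes, which reflects that the teacher disagreed with itself under different noise.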
In one possible embodiment, the second processing module 72, when determining, based on the second semantic segmentation image, the credibility information of each pixel point in the second semantic segmentation image, is configured to:
determine the information entropy of each pixel point in the second semantic segmentation image based on the pixel value of each pixel point in the second semantic segmentation image;
and determine the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold.
In one possible embodiment, the second processing module 72, when determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and the predetermined information entropy threshold, is configured to:
compare the information entropy of each pixel point in the second semantic segmentation image with the information entropy threshold;
determine the credibility information of each pixel point in the second semantic segmentation image based on the comparison result;
if the absolute value of the information entropy of any pixel point in the second semantic segmentation image is larger than the information entropy threshold, set the credibility information corresponding to that pixel point to a preset value indicating that the pixel value of that pixel point is credible, wherein the preset value is larger than 0.
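A minimal sketch of the entropy computation and the threshold comparison follows. It assumes the pixel value of each pixel point is a list of per-class probabilities and uses Shannon entropy with the natural logarithm; the value 0 for pixels that fail the comparison is also an assumption, as are all names. The text above words the condition as the entropy magnitude being larger than the threshold, whereas many uncertainty-aware schemes treat low-entropy pixels as the credible ones, so the direction of the comparison is left as an explicit argument rather than fixed by the sketch.

```python
import math


def pixel_entropy(probs):
    """Shannon entropy (natural log) of one pixel's class probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def credibility_map(seg_map, threshold, credible_if_above, preset=1.0):
    """Build a per-pixel credibility map from a segmentation map.

    seg_map: indexed [row][col][class], per-class probabilities.
    threshold: the information entropy threshold.
    credible_if_above: True marks pixels whose |entropy| exceeds the
        threshold as credible; False marks the remaining pixels instead.
    preset: value (> 0) stored for credible pixels; others get 0.0.
    """
    if preset <= 0:
        raise ValueError("preset must be larger than 0")
    result = []
    for row in seg_map:
        out_row = []
        for probs in row:
            above = abs(pixel_entropy(probs)) > threshold
            credible = above if credible_if_above else not above
            out_row.append(preset if credible else 0.0)
        result.append(out_row)
    return result
```

The resulting map can then serve as the per-pixel weight in the consistency loss, so that pixels marked 0 contribute nothing to the student update.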
In a possible embodiment, the second processing module 72 is further configured to generate the information entropy threshold in the following manner:
determine the information entropy threshold based on the semantic segmentation types of the teacher network.
In one possible embodiment, the second updating module 74, when updating the parameter values of the teacher network based on the updated parameter values of the student network, is configured to:
perform exponential moving average processing on the parameter values of the parameters in the student network to obtain target parameter values;
and replace the parameter values of the corresponding parameters in the teacher network with the target parameter values.
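The exponential moving average update can be sketched as below. The sketch assumes the running average is held in the teacher's own parameters, so that each step blends the teacher's previous value with the student's updated value; the decay value `0.99`, the dictionary representation of parameters, and the name `ema_update` are illustrative assumptions rather than details given in the disclosure.

```python
def ema_update(teacher_params, student_params, decay=0.99):
    """Return new teacher parameters as an exponential moving average.

    teacher_params / student_params: dicts mapping parameter name to a float.
    decay: fraction of the previous teacher value that is retained.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    updated = {}
    for name, student_value in student_params.items():
        previous = teacher_params[name]
        # Target value: blend of the old average and the student's new value.
        updated[name] = decay * previous + (1.0 - decay) * student_value
    return updated
```

Because the teacher changes slowly, its predictions tend to be smoother over training than the student's, which is what makes them usable as consistency targets.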
In a possible embodiment, the apparatus further comprises: a first generating module 76, configured to generate the style migration image in the following manner:
perform style migration processing on the source image by using a pre-trained style migration network to obtain the style migration image corresponding to the source image; the style migration network is trained by using the source image and the target image.
In a possible embodiment, the apparatus further comprises: an initialization module 77, configured to initialize the teacher network and the student network by using a pre-trained semantic segmentation network.
In a possible embodiment, the apparatus further comprises: a second generating module 78, configured to generate the first noise image and the second noise image in the following manner:
inject random noise into the target image to obtain the first noise image and the second noise image, where different noise images correspond to different noise.
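Generating two differently perturbed copies of the target image can be sketched as follows. The disclosure only requires that random noise be injected and that the noise differ between images; the choice of additive Gaussian noise, the `sigma` value, the single-channel nested-list image layout, and the function names are assumptions made for illustration.

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Return a copy of image ([row][col] floats) with additive Gaussian noise."""
    return [[pixel + rng.gauss(0.0, sigma) for pixel in row] for row in image]


def make_noise_pair(image, sigma=0.1, seed=None):
    """Produce the first and second noise images from one target image.

    Both copies draw from the same generator in sequence, so they
    receive independent noise samples and therefore differ.
    """
    rng = random.Random(seed)
    first = add_gaussian_noise(image, sigma, rng)
    second = add_gaussian_noise(image, sigma, rng)
    return first, second
```

In such a setup the first copy would be fed to the student network and the second to the teacher network; calling `add_gaussian_noise` repeatedly would give the plurality of second noise images used for averaging.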
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
Referring to fig. 8, an embodiment of the present disclosure further provides an image processing apparatus, including:
an obtaining module 81, configured to obtain an image to be processed;
the processing module 82 is configured to perform semantic segmentation on the image to be processed by using the neural network trained by the neural network training method according to any embodiment of the present disclosure, so as to obtain a semantic segmentation result of the image to be processed.
Referring to fig. 9, an embodiment of the present disclosure further provides an intelligent driving control device, including:
a data acquisition module 91, configured to acquire an image captured by a driving device while driving;
a detection module 92, configured to detect a target object in the image by using a neural network trained by the neural network training method according to any embodiment of the present disclosure;
and a control module 93, configured to control the driving device based on the detected target object.
An embodiment of the present disclosure further provides an electronic device 10, as shown in fig. 10, which is a schematic structural diagram of the electronic device 10 provided in the embodiment of the present disclosure, and includes:
a processor 11 and a memory 12; the memory 12 stores machine-readable instructions executable by the processor 11, and when the electronic device runs, the machine-readable instructions are executed by the processor 11 to perform the following steps:
performing semantic segmentation processing on a first noise image of a target image by using a student network to obtain a first semantic segmentation image; performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image; updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, and the credibility information; updating the parameter values of the teacher network based on the updated parameter values of the student network;
or to perform the following steps: acquiring an image to be processed; performing semantic segmentation processing on the image to be processed by using a neural network trained by the neural network training method according to any embodiment of the present disclosure, to obtain a semantic segmentation result of the image to be processed;
or to perform the following steps: acquiring an image captured by a driving device while driving; detecting a target object in the image by using a neural network trained by the neural network training method according to any embodiment of the present disclosure; and controlling the driving device based on the detected target object.
The specific execution process of the instruction may refer to the steps of the neural network training method or the image processing steps described in the embodiments of the present disclosure, and details are not repeated here.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the neural network training method described in the above method embodiments, or performs the steps of the image processing method described in the above method embodiments, or performs the steps of the intelligent driving control method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The computer program product of the neural network training method and the image processing method provided in the embodiments of the present disclosure includes a computer readable storage medium storing a program code, where instructions included in the program code may be used to execute the neural network training method, the image processing method, or the intelligent driving control method described in the embodiments of the above methods, and specific reference may be made to the embodiments of the above methods, which are not described herein again.
The embodiments of the present disclosure also provide a computer program which, when executed by a processor, implements any one of the methods of the foregoing embodiments. The computer program product may be embodied in hardware, software, or a combination thereof. In one alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, it is embodied in a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above embodiments are merely specific embodiments of the present disclosure, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that, within the technical scope disclosed herein, anyone skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure and shall be construed as being included in its protection scope. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.
Claims (16)
1. A method of training a neural network, comprising:
performing semantic segmentation processing on a first noise image of a target image by using a student network to obtain a first semantic segmentation image;
performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image;
updating parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, and the credibility information;
updating the parameter values of the teacher network based on the updated parameter values of the student network.
2. Training method according to claim 1, characterized in that the method further comprises:
semantic segmentation processing is carried out on the style migration image of the source image by utilizing a student network to obtain a third semantic segmentation image, wherein the style migration image of the source image is an image obtained by migrating the style of the source image to a target domain where the target image is located;
the updating the parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, and the credibility information includes:
updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the annotation information of the source image.
3. The training method of claim 2, wherein the updating the parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, the credibility information, the third semantically segmented image, and annotation information of the source image comprises:
determining a consistency loss based on the first semantically segmented image, the second semantically segmented image and the credibility information; determining a weight of the consistency loss based on a current iteration number;
determining semantic segmentation losses based on the third semantically segmented image and annotation information of the source image;
updating parameter values for the student network based on the consistency loss, the weights, and the semantic segmentation loss.
4. A training method as claimed in any one of claims 1 to 3, wherein performing semantic segmentation processing on the second noise image of the target image using a teacher network to obtain a second semantic segmented image comprises:
performing semantic segmentation processing on the plurality of second noise images of the target image by using a teacher network to obtain a plurality of intermediate semantic segmentation images;
and generating the second semantic segmentation image based on the plurality of intermediate semantic segmentation images.
5. The training method of claim 4, wherein the generating the second semantically segmented image based on the plurality of intermediate semantically segmented images comprises:
calculating a pixel value mean value of pixel points at corresponding positions in the multiple intermediate semantic segmentation images in sequence;
and determining the average value of the pixel points at any corresponding position as the pixel value of the pixel point at the corresponding position in the second semantic segmentation image.
6. The training method according to any one of claims 1 to 5, wherein the determining, based on the second semantic segmentation image, credibility information of each pixel point in the second semantic segmentation image comprises:
determining the information entropy of each pixel point in the second semantic segmentation image based on the pixel value of each pixel point in the second semantic segmentation image;
and determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold.
7. The training method according to claim 6, wherein the determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold comprises:
comparing the information entropy of each pixel point in the second semantic segmentation image with the information entropy threshold;
determining the credibility information of each pixel point in the second semantic segmentation image based on the comparison result;
if the absolute value of the information entropy of any pixel point in the second semantic segmentation image is larger than the information entropy threshold, setting the credibility information corresponding to that pixel point to a preset value indicating that the pixel value of that pixel point is credible, wherein the preset value is larger than 0.
8. Training method according to claim 6 or 7, characterized in that the information entropy threshold is generated in the following way:
and determining the information entropy threshold value based on the semantic segmentation type of the teacher network.
9. A training method as claimed in any one of claims 1 to 8, wherein updating the parameter values of the teacher network based on the updated parameter values of the student network comprises:
performing exponential moving average processing on parameter values of parameters in the student network to obtain target parameter values;
and replacing the parameter value of the corresponding parameter in the teacher network by using the target parameter value.
10. An image processing method, comprising:
acquiring an image to be processed;
performing semantic segmentation processing on the image to be processed by using the neural network trained by the neural network training method according to any one of claims 1 to 9 to obtain a semantic segmentation result of the image to be processed.
11. An intelligent travel control method, characterized by comprising:
acquiring an image captured by a driving device while driving;
detecting a target object in the image by using a neural network trained by the neural network training method according to any one of claims 1 to 9;
controlling the driving device based on the detected target object.
12. An apparatus for training a neural network, comprising:
the first processing module is used for performing semantic segmentation processing on a first noise image of a target image by using a student network to obtain a first semantic segmentation image;
the second processing module is used for performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image;
a first updating module, configured to update parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, and the credibility information;
and the second updating module is used for updating the parameter values of the teacher network based on the updated parameter values of the student network.
13. An image processing apparatus characterized by comprising:
the acquisition module is used for acquiring an image to be processed;
a processing module, configured to perform semantic segmentation processing on the image to be processed by using the neural network trained by the neural network training method according to any one of claims 1 to 9, so as to obtain a semantic segmentation result of the image to be processed.
14. An intelligent travel control device, comprising:
a data acquisition module, configured to acquire an image captured by a driving device while driving;
a detection module, configured to detect a target object in the image by using a neural network trained by the neural network training method according to any one of claims 1 to 9;
a control module, configured to control the driving device based on the detected target object.
15. An electronic device, comprising: a processor and a memory storing machine-readable instructions executable by the processor, wherein the processor is configured to execute the machine-readable instructions stored in the memory, and the machine-readable instructions, when executed by the processor, cause the processor to perform the steps of the method of any one of claims 1 to 11.
16. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by an electronic device, causes the electronic device to perform the steps of the method according to any one of claims 1 to 11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010278429.2A CN111489365B (en) | 2020-04-10 | 2020-04-10 | Training method of neural network, image processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010278429.2A CN111489365B (en) | 2020-04-10 | 2020-04-10 | Training method of neural network, image processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111489365A (en) | 2020-08-04 |
CN111489365B (en) | 2023-12-22 |
Family ID: 71794812
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202010278429.2A | Active | CN111489365B (en) | 2020-04-10 | 2020-04-10 | Training method of neural network, image processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111489365B (en) |
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1969297A (en) * | 2001-06-15 | 2007-05-23 | 索尼公司 | Image processing apparatus and method and image pickup apparatus |
US20170301085A1 (en) * | 2014-09-11 | 2017-10-19 | B.G. Negev Technologies And Applications Ltd. (Ben Gurion University | Interactive segmentation |
CN106127810A (en) * | 2016-06-24 | 2016-11-16 | 惠州紫旭科技有限公司 | The recording and broadcasting system image tracking method of a kind of video macro block angle point light stream and device |
CN106709918A (en) * | 2017-01-20 | 2017-05-24 | 成都信息工程大学 | Method for segmenting images of multi-element student t distribution mixed model based on spatial smoothing |
US20190147582A1 (en) * | 2017-11-15 | 2019-05-16 | Toyota Research Institute, Inc. | Adversarial learning of photorealistic post-processing of simulation with privileged information |
US20190392573A1 (en) * | 2018-06-22 | 2019-12-26 | Cnh Industrial Canada, Ltd. | Measuring crop residue from imagery using a machine-learned semantic segmentation model |
CN110458844A (en) * | 2019-07-22 | 2019-11-15 | 大连理工大学 | A kind of semantic segmentation method of low illumination scene |
CN110414526A (en) * | 2019-07-31 | 2019-11-05 | 达闼科技(北京)有限公司 | Training method, training device, server and the storage medium of semantic segmentation network |
CN110827963A (en) * | 2019-11-06 | 2020-02-21 | 杭州迪英加科技有限公司 | Semantic segmentation method for pathological image and electronic equipment |
Non-Patent Citations (3)
Title |
---|
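LEQUAN YU et al.: "Uncertainty-aware Self-ensembling Model for Semi-supervised 3D Left Atrium Segmentation", pages 1 - 9 * |
华敏杰: "Overview of image semantic segmentation algorithms based on deep learning", pages 130 * |
郑宝玉 et al.: "Weakly supervised image semantic segmentation based on deep convolutional neural networks", pages 5 - 16 * |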
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111967597A (en) * | 2020-08-18 | 2020-11-20 | 上海商汤临港智能科技有限公司 | Neural network training and image classification method, device, storage medium and equipment |
WO2022041307A1 (en) * | 2020-08-31 | 2022-03-03 | 温州医科大学 | Method and system for constructing semi-supervised image segmentation framework |
CN112070163A (en) * | 2020-09-09 | 2020-12-11 | 北京字节跳动网络技术有限公司 | Image segmentation model training and image segmentation method, device and equipment |
CN112070163B (en) * | 2020-09-09 | 2023-11-24 | 抖音视界有限公司 | Image segmentation model training and image segmentation method, device and equipment |
CN112419326A (en) * | 2020-12-02 | 2021-02-26 | 腾讯科技(深圳)有限公司 | Image segmentation data processing method, device, equipment and storage medium |
CN112419326B (en) * | 2020-12-02 | 2023-05-23 | 腾讯科技(深圳)有限公司 | Image segmentation data processing method, device, equipment and storage medium |
WO2022134338A1 (en) * | 2020-12-23 | 2022-06-30 | 平安科技(深圳)有限公司 | Domain adaptation method and apparatus, electronic device, and storage medium |
WO2023019444A1 (en) * | 2021-08-17 | 2023-02-23 | 华为技术有限公司 | Optimization method and apparatus for semantic segmentation model |
CN114399640A (en) * | 2022-03-24 | 2022-04-26 | 之江实验室 | Road segmentation method and device for uncertain region discovery and model improvement |
CN114708436A (en) * | 2022-06-02 | 2022-07-05 | 深圳比特微电子科技有限公司 | Training method of semantic segmentation model, semantic segmentation method, semantic segmentation device and semantic segmentation medium |
CN114842457A (en) * | 2022-06-29 | 2022-08-02 | 小米汽车科技有限公司 | Model training and feature extraction method, device, electronic equipment and medium |
WO2024187413A1 (en) * | 2023-03-15 | 2024-09-19 | 华为技术有限公司 | Model training method and communication apparatus |
Also Published As
Publication number | Publication date |
---|---|
CN111489365B (en) | 2023-12-22 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||