CN112200889A - Sample image generation method, sample image processing method, intelligent driving control method and device - Google Patents


Info

Publication number: CN112200889A
Application number: CN202011197925.1A
Authority: CN (China)
Prior art keywords: image, semantic segmentation, semantic, information, target
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 周千寓; 程光亮; 石建萍; 马利庄
Assignee (current and original): Shanghai Sensetime Intelligent Technology Co., Ltd.
Application filed by Shanghai Sensetime Intelligent Technology Co., Ltd.
Priority: CN202011197925.1A
Publication: CN112200889A (pending)


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/60Editing figures and text; Combining figures or text
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection


Abstract

The present disclosure provides a sample image generation method, an image processing method, and an intelligent driving control method and device, including: acquiring a source domain image, first annotation information of the source domain image, and a target domain image; performing semantic fusion on the source domain image and the target domain image based on a spatial prior distribution matrix representing distribution characteristics of various objects in each image of the source domain and a first semantic segmentation image of the target domain image to obtain a fused image; performing semantic fusion on the first semantic segmentation image and the first annotation information of the source domain image to obtain second annotation information corresponding to the fused image; and generating a sample image according to the fused image and the second annotation information corresponding to the fused image.

Description

Sample image generation method, sample image processing method, intelligent driving control method and device
Technical Field
The disclosure relates to the technical field of computers, in particular to a method and a device for sample image generation, image processing and intelligent driving control.
Background
Generally, in supervised training of a neural network, sample data with annotation information needs to be obtained in advance, and the neural network is then trained with this sample data. However, because the neural network requires a large amount of sample data during training and annotating sample data is time-consuming, unsupervised domain adaptation methods have been proposed for training neural networks on such sample data.
When a neural network is trained with an unsupervised domain adaptation method, a labeled sample image and an unlabeled sample image are generally fused, a new sample image is generated based on the fused image and its corresponding label, and the neural network is trained on it, so that the network can learn the characteristics of both the labeled and the unlabeled sample images.
Currently, in the process of fusing a labeled sample image and an unlabeled sample image, alignment fusion is generally performed, which may leave the objects in the fused image unreasonably distributed. For example, if the sample image in the source domain includes a telegraph pole, an automobile, and so on, then after the labeled sample image is fused with the unlabeled sample image, the telegraph pole may appear in the center of the road and the automobile may appear in the sky. The annotation of such sample data is then unreasonable, which affects the training accuracy of the neural network.
Disclosure of Invention
The embodiment of the disclosure at least provides a sample image generation method, an image processing method and an intelligent driving control method and device.
In a first aspect, an embodiment of the present disclosure provides a sample image generation method, including:
acquiring a source domain image, first annotation information of the source domain image and a target domain image;
performing semantic fusion on the source domain image and the target domain image based on a spatial prior distribution matrix representing distribution characteristics of various objects in each image of the source domain and a first semantic segmentation image of the target domain image to obtain a fused image;
performing semantic fusion on the first semantic segmentation image and the first annotation information of the source domain image to obtain second annotation information corresponding to the fused image;
and generating a sample image according to the fused image and the second annotation information corresponding to the fused image.
According to the above method, various objects in the target domain image can be obtained based on the first semantic segmentation image, and these objects are then fused into the source domain image under the guidance of a predefined spatial prior distribution matrix that represents the distribution of objects in images, yielding a fused image. Guided by the spatial prior distribution matrix, the positions of the objects in the fused image are distributed more reasonably, so the generated sample data is more reasonable. In turn, when the fused image is used to train a neural network, the interference that unreasonably distributed objects would cause to the network's accuracy is reduced, and the accuracy of the neural network is improved.
In a possible embodiment, the semantically fusing the source domain image and the target domain image based on a spatial prior distribution matrix representing distribution characteristics of a plurality of objects in each image of the source domain and a first semantic segmentation image of the target domain image to obtain a fused image includes:
determining a target mask image corresponding to a first semantic segmentation image based on the spatial prior distribution matrix and the first semantic segmentation image of the target domain image;
and performing semantic fusion on the source domain image and the target domain image based on the target mask image to obtain a fused image.
Therefore, when the source domain image and the target domain image are fused, the target mask image determined based on the space prior distribution matrix combines the distribution rule of each object in the image, so that the fused image obtained based on the target mask image can better accord with the real distribution of each object.
In one possible embodiment, the determining a target mask image corresponding to a first semantically segmented image based on the spatial prior distribution matrix and the first semantically segmented image of the target domain image includes:
multiplying the spatial prior distribution matrix with the first semantic segmentation image to obtain a semantic distribution map corresponding to the first semantic segmentation image; wherein the value of each pixel point in the semantic distribution map represents the probability that the pixel point belongs to the various objects;
for any pixel point, determining semantic information corresponding to the pixel point based on the probabilities that the pixel point belongs to the multiple objects;
setting the value of a pixel point of which the corresponding semantic information is the target semantic information as a first preset value, and setting the value of a pixel point of which the corresponding semantic information is not the target semantic information as a second preset value to obtain the target mask image.
Based on the above embodiment, the value of each pixel point in the first semantic segmentation image represents the probability that the pixel point belongs to multiple objects, the first semantic segmentation image is multiplied by the spatial prior distribution matrix, and when determining semantic information corresponding to each pixel point, the distribution rules of each object are combined, so that the semantic information corresponding to the pixel point is more accurate in determination.
In one possible embodiment, the determining a target mask image corresponding to a first semantically segmented image based on the spatial prior distribution matrix and the first semantically segmented image of the target domain image includes:
multiplying the spatial prior distribution matrix with the first semantic segmentation image to obtain a semantic distribution map corresponding to the first semantic segmentation image; wherein the value of each pixel point in the semantic distribution map represents the probability that the pixel point belongs to the various objects;
for any pixel point, determining semantic information corresponding to the pixel point based on the probabilities that the pixel point belongs to the multiple objects;
setting the value of a pixel point of which the corresponding semantic information is the target semantic information as a first preset value, setting the value of a pixel point of which the corresponding semantic information is not the target semantic information as a second preset value, and setting the value of a pixel point of which the corresponding semantic information is the associated semantic information of the target semantic information as the first preset value to obtain the target mask image.
In a possible implementation manner, the semantically fusing the source domain image and the target domain image based on the target mask image to obtain the fused image includes:
taking an image composed of the pixel points, in the source domain image, that correspond to the pixel points taking the second preset value in the target mask image as a first image to be fused corresponding to the source domain image; and taking an image composed of the pixel points, in the target domain image, that correspond to the pixel points taking the first preset value in the target mask image as a second image to be fused corresponding to the target domain image;
and fusing the first image to be fused and the second image to be fused to obtain the fused image.
In one possible embodiment, after generating the sample image, the method further comprises:
and training a semantic segmentation network by using the source domain image, the first labeling information and a plurality of sample images.
In one possible embodiment, the semantic segmentation network comprises a student network and a teacher network; when a sample image is generated, the first semantic segmentation image of the target domain image is obtained by performing semantic segmentation processing on the target domain image by the teacher network.
In one possible embodiment, the training a semantic segmentation network using the source domain image, the first annotation information, and a plurality of sample images includes:
updating parameter values of the student network by using the source domain image, the first annotation information, the fused image and the second annotation information;
updating the parameter values of the teacher network based on the updated parameter values of the student network.
In a possible embodiment, the method further comprises:
performing semantic segmentation processing on the noise image of the target domain image based on the teacher network to obtain a second semantic segmentation image; wherein the noise image of the target domain image is an image after noise is added to the target domain image;
determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image;
the updating the parameter value of the student network by using the source domain image, the first annotation information, the fused image and the second annotation information includes:
and updating the parameter values of the student network by using the source domain image, the first annotation information, the fusion image, the second annotation information and the credibility information of each pixel point in the second semantic segmentation image.
In a possible embodiment, the updating the parameter value of the student network by using the source domain image, the first labeled information, the fused image, the second labeled information, and the reliability information of each pixel point in the second semantic segmentation image includes:
performing semantic fusion on the credibility information of each pixel point in the second semantic segmentation image and the credibility information of each pixel point in the source domain image to obtain fused credibility information; and
performing semantic segmentation processing on the fused image by using the student network to obtain a third semantic segmentation image;
determining a consistency loss based on the third semantically segmented image, the second annotation information, and the fused confidence information; determining a weight of the consistency loss based on a current iteration number;
performing semantic segmentation processing on the source domain image by using the student network to obtain a fourth semantic segmentation image; determining semantic segmentation loss based on the fourth semantic segmentation image and the first annotation information;
updating parameter values for the student network based on the consistency loss, the weights, and the semantic segmentation loss.
In the above embodiment, the weight of the consistency loss is determined according to the current iteration number, and the adjustment of the student network's parameter values is supervised based on the consistency loss, the determined weight, and the semantic segmentation loss. As the number of iterations of the student network and the teacher network increases, the influences of the consistency loss and the semantic segmentation loss on their parameter values are dynamically adjusted, so that the specific features in the target domain image are learned while the semantic segmentation accuracy of the student network and the teacher network is preserved.
In one possible embodiment, performing semantic segmentation processing on the noise image of the target domain image based on the teacher network to obtain a second semantic segmentation image includes:
performing semantic segmentation on a plurality of noise images of the target domain image based on the teacher network to obtain a plurality of intermediate semantic segmentation images;
and generating the second semantic segmentation image based on the plurality of intermediate semantic segmentation images.
In this way, the teacher network performs semantic segmentation processing on the multiple noise images respectively to obtain multiple intermediate semantic segmentation images, and the second semantic segmentation image is generated from these intermediate images. More of the uncertainty information in the noise images can thus be extracted, the credibility information of each pixel point in the second semantic segmentation image obtained from the noise images is more discriminative, and the optimization efficiency of the student network's parameter values is improved.
In one possible embodiment, the generating the second semantically segmented image based on the plurality of intermediate semantically segmented images comprises:
calculating a pixel value mean value of pixel points at corresponding positions in the multiple intermediate semantic segmentation images in sequence;
and determining the average value of the pixel points at any corresponding position as the pixel value of the pixel point at the corresponding position in the second semantic segmentation image.
In a possible embodiment, the determining, based on the second semantically segmented image, reliability information of each pixel point in the second semantically segmented image includes:
determining the information entropy of each pixel point in the second semantic segmentation image based on the pixel value of each pixel point in the second semantic segmentation image;
comparing the information entropy of each pixel point in the second semantic segmentation image with the information entropy threshold;
determining the credibility information of each pixel point in the second semantic segmentation image based on the comparison result;
if the absolute value of the information entropy of any pixel point in the second semantic segmentation image is greater than the information entropy threshold, setting the credibility information corresponding to that pixel point to a preset value indicating that the pixel value of that pixel point is credible, the preset value being greater than 0.
In one possible embodiment, the information entropy threshold is generated by:
and determining the information entropy threshold value based on the semantic segmentation type of the teacher network.
In one possible embodiment, the updating the parameter values of the teacher network based on the updated parameter values of the student network includes:
performing exponential moving average processing on parameter values of parameters in the student network to obtain target parameter values;
and replacing the parameter value of the corresponding parameter in the teacher network by using the target parameter value.
In a second aspect, an embodiment of the present disclosure provides an image processing method, including:
acquiring an image to be processed;
and performing semantic segmentation processing on the image to be processed by using a semantic segmentation network trained on the sample image obtained based on the first aspect or any one of the possible implementation manners of the first aspect to obtain a semantic segmentation result of the image to be processed.
In a third aspect, an embodiment of the present disclosure provides an intelligent driving control method, including:
acquiring an image acquired by a driving device in the driving process;
detecting a target object in the image by using a semantic segmentation network trained on sample images obtained based on the first aspect or any one of the possible embodiments of the first aspect;
controlling the driving device based on the detected target object.
In a fourth aspect, an embodiment of the present disclosure provides a sample image generation apparatus, including:
a first acquisition module, configured to acquire a source domain image, first annotation information of the source domain image, and a target domain image;
the fusion module is used for performing semantic fusion on the source domain image and the target domain image based on a space prior distribution matrix representing the distribution characteristics of various objects in each image of the source domain and a first semantic segmentation image of the target domain image to obtain a fusion image;
the fusion module is further configured to perform semantic fusion on the first semantic segmentation image and the first annotation information of the source domain image to obtain second annotation information corresponding to the fusion image;
and a generating module, configured to generate a sample image according to the fused image and the second annotation information corresponding to the fused image.
In a fifth aspect, an embodiment of the present disclosure provides an image processing apparatus, including:
the second acquisition module is used for acquiring an image to be processed;
and the segmentation module is configured to perform semantic segmentation processing on the image to be processed by using a semantic segmentation network trained on the sample image obtained based on the first aspect or any one of the possible implementation manners of the first aspect, so as to obtain a semantic segmentation result of the image to be processed.
In a sixth aspect, an embodiment of the present disclosure provides an intelligent driving control device, including:
a third acquisition module, configured to acquire images collected by a driving device during driving;
a detection module, configured to detect a target object in the image by using a semantic segmentation network trained on sample images obtained based on the first aspect or any one of the possible implementations of the first aspect;
and a control module, configured to control the driving device based on the detected target object.
In a seventh aspect, an embodiment of the present disclosure further provides a computer device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when a computer device is running, the machine-readable instructions when executed by the processor performing the steps of the first aspect, or any one of the possible implementations of the first aspect, or the second aspect, or the third aspect.
In an eighth aspect, this disclosure also provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program performs the steps in the first aspect, or any one of the possible implementations of the first aspect, or performs the steps in the second aspect, or performs the steps in the third aspect.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required in the embodiments are briefly described below. The drawings here are incorporated in and form a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be appreciated that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art can derive other related drawings from them without inventive effort.
FIG. 1 illustrates a fused image schematic provided by an embodiment of the present disclosure;
FIG. 2 shows a flow chart of a sample image generation method provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a method for semantic fusion of a source domain image and a target domain image to obtain a fused image according to an embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a semantic segmentation network training method provided by an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating an overall architecture of a sample image generation method provided by an embodiment of the present disclosure;
FIG. 6 shows a flow chart of an image processing method provided by an embodiment of the present disclosure;
fig. 7 is a flowchart illustrating an intelligent driving control method according to an embodiment of the present disclosure;
fig. 8 shows an architecture diagram of a sample image generation apparatus provided by an embodiment of the present disclosure;
fig. 9 is a schematic diagram illustrating an architecture of an image processing apparatus provided in an embodiment of the present disclosure;
fig. 10 is a schematic diagram illustrating an architecture of an intelligent driving control device provided in an embodiment of the present disclosure;
FIG. 11 is a schematic structural diagram of a computer device provided by an embodiment of the present disclosure;
FIG. 12 is a schematic diagram of another computer device provided by the embodiments of the present disclosure;
fig. 13 shows a schematic structural diagram of another computer device provided in the embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
For example, in the related art, when a labeled sample image A and an unlabeled sample image B are fused, alignment fusion is generally performed. Specifically, a segmentation map corresponding to sample image B may first be predicted and the number of object classes contained in sample image B determined; then several classes of objects are selected from the segmentation map, and a mask image is generated based on the region images corresponding to the selected objects in sample image B, in which pixel points belonging to the selected classes take the value 1 and the remaining pixel points take the value 0. The mask image is multiplied with sample image B to obtain a fused image corresponding to sample image B. The mask image is then inverted, that is, pixel values of 0 are adjusted to 1 and pixel values of 1 are adjusted to 0, and the inverted mask image is multiplied with sample image A to obtain a fused image corresponding to sample image A. Finally, the fused images corresponding to sample image A and sample image B are combined to obtain the fused image of sample image A and sample image B.
For example, reference may be made to fig. 1. However, this fusion method does not consider the semantic correlations between objects; for example, a car and a road are semantically correlated, so after fusion by this method the car may float in the air. The fused image finally obtained by this method may therefore be unreasonable, and training a neural network on such unreasonable fused images affects the accuracy of the neural network.
Based on this research, the present disclosure provides a sample image generation method. Various objects in a target domain image may be obtained based on a first semantic segmentation image, and these objects are then fused into a source domain image using a predefined spatial prior distribution matrix representing the distribution of objects in images, to obtain a fused image. Under the guidance of the spatial prior distribution matrix, the positions of the objects in the fused image are distributed more reasonably, so the generated sample data is more reasonable. In turn, when the fused image is used to train a neural network, the interference caused by unreasonably distributed objects is reduced, and the accuracy of the neural network is improved.
The drawbacks described above are the results of the inventors' practical and careful study; therefore, the discovery of the above problems and the solutions proposed below for them should both be regarded as the inventors' contribution to the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
To facilitate understanding of the present embodiment, first, a detailed description is given of a sample image generation method disclosed in an embodiment of the present disclosure, where an execution subject of the sample image generation method provided in the embodiment of the present disclosure is generally a computer device with certain computing power, and the computer device includes, for example: a terminal device, which may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle mounted device, a wearable device, or a server or other processing device. In some possible implementations, the sample image generation method may be implemented by a processor calling computer readable instructions stored in a memory.
Referring to fig. 2, a flowchart of a sample image generation method provided in an embodiment of the present disclosure is shown, where the method includes steps 201 to 204, where:
step 201, acquiring a source domain image, first annotation information of the source domain image, and a target domain image.
Step 202, performing semantic fusion on the source domain image and the target domain image based on a spatial prior distribution matrix representing the distribution characteristics of various objects in each image of the source domain and the first semantic segmentation image of the target domain image to obtain a fused image.
Step 203, performing semantic fusion on the first semantic segmentation image and the first annotation information of the source domain image to obtain second annotation information corresponding to the fused image.
Step 204, generating a sample image according to the fused image and the second annotation information corresponding to the fused image.
The following is a detailed description of the above steps.
For step 201,
The source domain image can be any image in the source domain; each pixel point in the source domain image has corresponding first annotation information, which is used to represent the object corresponding to the pixel point. The target domain image can be any image in the target domain; pixel points in the target domain image have no corresponding annotation information. The source domain and the target domain are different domains.
With respect to step 202,
In a possible implementation manner, the first semantic segmentation image of the target domain image may be obtained by performing semantic segmentation processing on the target domain image with a neural network. The value of each pixel point in the first semantic segmentation image represents the probability that the pixel point belongs to each predefined object, and the predefined objects are the semantic segmentation classes used in the method provided by the present disclosure. For example, if 20 classes of objects are predefined, the number of channels of the first semantic segmentation image is 20, different channels correspond to different objects, the value of each pixel point is a 20-dimensional vector, and each feature value in the vector represents the probability that the pixel point belongs to the corresponding class of object.
The dimensionality of the spatial prior distribution matrix is the same as the number of predefined objects, and each dimension corresponds to a different object; the value at each position within a dimension represents the probability that the corresponding object appears at that position. Illustratively, if the spatial prior distribution matrix is a 10-dimensional matrix, each dimension corresponds to a different object; if the object corresponding to the n-th dimension is an automobile, the value at each position in that dimension represents the probability that an automobile appears at that position.
The method provided by the present disclosure is mainly applied to autonomous vehicles, so the source domain images are all images acquired by autonomous vehicles. In such images, each object should have a typical position of appearance; for example, a road generally appears in the center of the image, and the sky generally appears in the upper part of the image.
When determining the spatial prior distribution matrix, the spatial prior distribution vectors that compose it may be determined first; the size of each vector should be consistent with the size of the source domain image. For example, if the source domain image is an M × N image, each spatial prior distribution vector should also be M × N dimensional.
Specifically, different spatial prior distribution vectors describe the distributions of different objects in the image, and each position of a spatial prior distribution vector maps one-to-one to a position of the source domain image. When determining a spatial prior distribution vector, for example one describing the distribution of object A in the image, suppose there are S source domain images in total; for each position of the vector, count the number of times object A appears at that position across the source domain images. If object A appears at that position in K images, the value at that position of the spatial prior distribution vector is K/S.
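As a minimal sketch of this counting procedure (assuming the labeled source masks are integer class maps; the function and variable names are illustrative, not from the original filing):
```python
import numpy as np

def spatial_prior_matrix(source_label_maps, num_classes):
    """Estimate the spatial prior distribution matrix from S labeled
    source-domain segmentation maps of shape (M, N) whose entries are
    class indices in [0, num_classes)."""
    S = len(source_label_maps)
    M, N = source_label_maps[0].shape
    prior = np.zeros((num_classes, M, N), dtype=np.float64)
    for label_map in source_label_maps:
        for c in range(num_classes):
            prior[c] += (label_map == c)  # count occurrences of class c at each position
    return prior / S  # each value is K / S, the empirical appearance frequency
```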
In a possible implementation, when performing semantic fusion on a source domain image and a target domain image based on a spatial prior distribution matrix and a first semantic segmentation image to obtain a fused image, reference may be made to the method shown in fig. 3, which includes the following steps:
step 301, determining a target mask image corresponding to a first semantic segmentation image based on the spatial prior distribution matrix and the first semantic segmentation image of the target domain image.
Step 302, performing semantic fusion on the source domain image and the target domain image based on the target mask image to obtain the fused image.
In step 301, specifically, when determining the target mask image corresponding to the first semantic segmentation image, the spatial prior distribution matrix may be multiplied with the first semantic segmentation image to obtain a semantic distribution map corresponding to the first semantic segmentation image, where the value of each pixel point in the semantic distribution map represents the probability that the pixel point belongs to the various objects. For any pixel point, the semantic information corresponding to the pixel point can be determined based on the probabilities that it belongs to the various objects, and the target mask image is then determined based on the semantic information of each pixel point.
Specifically, the dimension of the spatial prior distribution matrix is the same as the number of channels of the first semantic segmentation image, and when the spatial prior distribution matrix is multiplied by the first semantic segmentation image, the corresponding values of the corresponding positions may be subjected to dot multiplication.
Illustratively, if the dimensionality of the spatial prior distribution matrix is 3, the number of channels of the first semantic segmentation image is also 3, and the size of the spatial prior distribution matrix is M × N × 3. For a pixel point in row a and column b of the first semantic segmentation image, the value of the pixel point is [n1, n2, n3], indicating that the probability that the pixel point belongs to object 1 is n1, to object 2 is n2, and to object 3 is n3. The values [m1, m2, m3] of the different spatial prior distribution vectors at row a and column b of the spatial prior distribution matrix indicate that the frequency of object 1 at that position is m1, of object 2 is m2, and of object 3 is m3. The value at that position in the resulting semantic distribution map is then [m1 × n1, m2 × n2, m3 × n3].
For any pixel point, when determining semantic information corresponding to the pixel point based on the probability that the pixel point belongs to various objects, the object with the highest probability can be used as the semantic information corresponding to the pixel point.
Here, it should be noted that if the number of predefined objects is T, the number of distinct semantic classes finally determined over all pixel points of the final semantic distribution map may be only D, where D is less than or equal to T and both D and T are positive integers.
When the target mask image is determined based on the semantic information of each pixel, the value of the pixel of which the corresponding semantic information is the target semantic information can be set as a first preset value, and the value of the pixel of which the corresponding semantic information is not the target semantic information can be set as a second preset value, so that the target mask image is obtained. In practical applications, the first preset value may be 1, and the second preset value may be 0.
The target semantic information may be at least one selected randomly from the semantic information of each pixel point, or may be semantic information determined based on a selection instruction of a user after the selection instruction of the user is received.
In another possible implementation, when the target semantic information has corresponding associated semantic information, determining the target mask image additionally requires setting the value of pixel points whose corresponding semantic information is the associated semantic information to the first preset value, besides setting the value of pixel points whose corresponding semantic information is the target semantic information to the first preset value and the value of pixel points whose corresponding semantic information is not the target semantic information to the second preset value.
Here, the associated semantic information corresponding to the target semantic information is semantic information that needs to be presented at the same time, for example, if the target semantic information is a sign, the associated semantic information corresponding to the target semantic information is a pillar of the sign, and if the target semantic information is a cyclist, the associated semantic information corresponding to the target semantic information is a bicycle.
The region image of the object corresponding to the target semantic information is the image to be fused. If target semantic information that has corresponding associated semantic information were fused alone, semantically unreasonable results would occur, such as a suspended signboard; therefore, by combining the associated semantic information corresponding to the target semantic information, that is, the semantic correlations among objects, when generating the fused image, the generated fused image better conforms to real-world rules.
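A minimal sketch of this mask construction, assuming the prior matrix and the first semantic segmentation image are stored as (C, M, N) arrays and that the chosen target classes (plus any associated classes, as described above) are given as a set of indices; all names are illustrative:
```python
import numpy as np

def target_mask(prior, first_seg, target_classes, first_val=1.0, second_val=0.0):
    """prior: (C, M, N) spatial prior distribution matrix;
    first_seg: (C, M, N) per-class probabilities of the first semantic
    segmentation image; target_classes: indices of the target semantic
    information together with its associated semantic information."""
    semantic_map = prior * first_seg         # element-wise product: the semantic distribution map
    semantics = semantic_map.argmax(axis=0)  # per pixel, the object with the highest probability
    mask = np.where(np.isin(semantics, list(target_classes)), first_val, second_val)
    return mask.astype(np.float32)           # (M, N) target mask image
```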
In step 302, when semantic fusion is performed on the source domain image and the target domain image based on the target mask image, an image composed of the pixel points, in the source domain image, that correspond to the pixel points taking the second preset value in the target mask image may be used as the first image to be fused corresponding to the source domain image; an image composed of the pixel points, in the target domain image, that correspond to the pixel points taking the first preset value in the target mask image may be used as the second image to be fused corresponding to the target domain image; and the first image to be fused and the second image to be fused are fused to obtain the fused image.
In practical application, the first preset value may be 1 and the second preset value 0, and the target mask image, the source domain image, and the target domain image have the same size. The target mask image may be multiplied pixel-wise with the target domain image: the pixel points in the target domain image corresponding to first pixel points taking the value 1 in the target mask image are retained, while the pixel points corresponding to second pixel points taking the value 0 are covered, that is, their values after multiplication are 0. Similarly, for the source domain image, the target mask image may be inverted and the inverted mask multiplied with the source domain image: the pixel points in the source domain image corresponding to pixel points taking the value 1 in the inverted mask are retained, while the pixel points corresponding to pixel points taking the value 0 are covered, that is, their values after multiplication are 0.
Specifically, the calculation can be performed by the following formula:
X_M = M ⊙ X_T + (1 − M) ⊙ X_S    (1)
where X_M denotes the fused image, M denotes the mask matrix, X_S denotes the source domain image, X_T denotes the target domain image, and ⊙ denotes pixel-wise multiplication.
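A sketch of formula (1) in code, assuming RGB images of shape (M, N, 3) and the (M, N) mask from the previous sketch:
```python
def fuse_images(mask, target_img, source_img):
    """Formula (1): X_M = M * X_T + (1 - M) * X_S."""
    m = mask[..., None]  # broadcast the (M, N) mask over the colour channels
    return m * target_img + (1.0 - m) * source_img
```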
For step 203,
In a possible implementation manner, when performing semantic fusion on the first semantic segmentation image and the first annotation information of the source domain image, the semantic fusion may be performed on the first semantic segmentation image and the first annotation information of the source domain image based on the target mask image obtained in step 202, so as to obtain second annotation information corresponding to the fused image.
Specifically, the calculation can be performed by the following formula:
Y_M = M ⊙ Y_T + (1 − M) ⊙ Y_S    (2)
where Y_M denotes the second annotation information corresponding to the fused image, M denotes the mask matrix, Y_T denotes the first semantic segmentation image, Y_S denotes the first annotation information of the source domain image, and ⊙ denotes pixel-wise multiplication.
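Formula (2) reuses the same mask; a sketch, assuming the annotations are stored as per-class maps of shape (C, M, N) so the mask broadcasts over the leading class axis:
```python
def fuse_labels(mask, target_seg, source_labels):
    """Formula (2): Y_M = M * Y_T + (1 - M) * Y_S."""
    m = mask[None, ...]  # broadcast the (M, N) mask over the class axis
    return m * target_seg + (1.0 - m) * source_labels
```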
For step 204,
Here, the plurality of fused images and the second annotation information corresponding to the fused images constitute the sample images.
In one possible implementation, after generating the sample image, the semantic segmentation network may be trained using the source domain image, the first annotation information, and the plurality of sample images. Here, the first semantically-segmented image of the target domain image may be obtained by performing semantic segmentation processing on the target domain image by a teacher network.
In one embodiment, the semantic segmentation network may include a student network (Student Network) and a teacher network (Teacher Network), and the parameter values of the student network and the teacher network may be initialized before being updated.
For example, a student network may first be trained on a plurality of source domain images in the source domain to obtain a preliminarily trained student network; the network parameters of the teacher network are then determined based on the network parameters of the preliminarily trained student network to obtain the teacher network. This preliminary training is the process of initializing the parameter values of the student network and the teacher network.
Specifically, when training a semantic segmentation network by using the source domain image, the first annotation information, and a plurality of sample images, the method shown in fig. 4 may be referred to, which includes the following steps:
step 401, updating the parameter value of the student network by using the source domain image, the first annotation information, the fusion image, and the second annotation information.
And step 402, updating the parameter values of the teacher network based on the updated parameter values of the student network.
In a possible implementation manner, semantic segmentation processing may be performed on a noise image of the target domain image based on the teacher network to obtain a second semantic segmentation image, and the credibility information of each pixel point in the second semantic segmentation image is then determined based on it. In this case, the parameter values of the student network may be updated using the source domain image, the first annotation information, the fused image, the second annotation information, and the credibility information of each pixel point in the second semantic segmentation image.
Specifically, the noise image of the target domain image may be an image obtained after random noise is added to the target domain image; the random noise may be, for example, Gaussian noise or white noise, determined according to actual needs. The noise image of the target domain image has the same size as the target domain image.
In one possible embodiment, there may be a plurality of noise images of the target domain image. In this case, when the second semantic segmentation image is obtained, semantic segmentation processing may be performed on the plurality of noise images of the target domain image based on the teacher network to obtain a plurality of intermediate semantic segmentation images, and the second semantic segmentation image is then generated based on the plurality of intermediate semantic segmentation images.
Specifically, when the second semantic segmentation image is generated based on the plurality of intermediate semantic segmentation images, the mean of the pixel values of the pixel points at each corresponding position in the intermediate semantic segmentation images can be calculated in turn, and the mean at any corresponding position is determined as the pixel value of the pixel point at that position in the second semantic segmentation image.
For example, suppose the size of the target domain image x_t is h × w and there are N noise images of the target domain image, denoted A_1, A_2, …, A_N. Performing semantic segmentation processing on the noise images with the teacher network f_θ' yields the intermediate semantic segmentation image of the i-th noise image:
P_i = f_θ'(A_i) ∈ R^(h × w × C)
where x_t denotes the target domain image, h denotes the height of the target domain image, w denotes its width, and C denotes the number of semantic segmentation classes of the teacher network.
The second semantic segmentation image P̄ satisfies, for example, the following formula (3):
P̄ = (1/N) · Σ_{i=1}^{N} P_i    (3)
In this way, random noise is injected into the target domain image multiple times to generate multiple noise images, and the second semantic segmentation image is obtained from the intermediate semantic segmentation images corresponding to the respective noise images. More of the uncertainty information in the noise images can thus be extracted, the credibility information of each pixel point in the second semantic segmentation image obtained from the noise images is more discriminative, and the optimization efficiency of the student network's parameter values is improved.
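A minimal sketch of this averaging step, assuming the teacher is a callable returning an (h, w, C) probability map and that Gaussian noise is used (one of the options the text allows); the names and noise scale are illustrative:
```python
import numpy as np

def second_segmentation(teacher, x_t, num_noise=4, sigma=0.1, rng=None):
    """Average the teacher's predictions over N noised copies of the
    target domain image, per formula (3)."""
    rng = rng or np.random.default_rng()
    preds = [teacher(x_t + rng.normal(0.0, sigma, x_t.shape))  # P_i = f(A_i)
             for _ in range(num_noise)]
    return np.mean(preds, axis=0)  # formula (3): the mean of the P_i
```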
After the second semantic segmentation image is obtained, when the reliability information of each pixel point in the second semantic segmentation image is determined, the following steps may be included:
step 1, determining the information entropy of each pixel point in the second semantic segmentation image based on the pixel value of each pixel point in the second semantic segmentation image.
Here, the information entropy E^(h,w) of any pixel point satisfies, for example, the following formula (4):
E^(h,w) = Σ_{c=1}^{C} P̄^(h,w,c) · log P̄^(h,w,c)    (4)
Step 2, determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold value.
Here, the information entropy threshold may be determined based on, for example, a semantic division type of the teacher network.
The information entropy threshold H satisfies, for example, formula (5), in which a, b, and c are hyperparameters, K denotes the maximum information entropy log C, C denotes the number of semantic segmentation classes of the teacher network, t denotes the current iteration round, and t_max denotes the maximum number of iteration rounds. (The specific expressions of formula (5) and the illustrative threshold appear only as images in the original publication; the threshold is a schedule in t/t_max determined by a, b, c, and K.)
After the information entropy threshold is determined, the information entropy of each pixel point in the second semantic segmentation image may, for example, be compared with the predetermined information entropy threshold, and the credibility information of each pixel point in the second semantic segmentation image is then determined based on the comparison result.
If the absolute value of the information entropy of any pixel point in the second semantic segmentation image is greater than the information entropy threshold, the credibility information corresponding to that pixel point is set to a preset value indicating that the pixel value of that pixel point is credible, the preset value being greater than 0.
In a specific implementation, as can be seen from formula (4) above, the value of the information entropy is negative. For a given pixel point in the second semantic segmentation image, the smaller the value of its information entropy, the higher its credibility, that is, the more credible the classification, represented by that pixel's value in the second semantic segmentation image, of the corresponding pixel point in the target domain image. When the consistency loss is calculated, pixel points with higher credibility in the second semantic segmentation image are taken into account, increasing their influence on the loss; for pixel points with lower credibility, their influence on the consistency loss can be reduced or even eliminated.
Further, for example, the preset value indicating that a pixel value is credible may be set to 1 and the preset value indicating that it is not credible to 0; alternatively, the credible preset value may be 1 and the non-credible preset value 0.5, and so on, set according to actual needs.
Further, for example, the credibility information U^(h,w) of each pixel point in the second semantic segmentation image satisfies the following formula (6):
U^(h,w) = I(|E^(h,w)| > H)    (6)
where H denotes the information entropy threshold and I(·) denotes a 0-1 indicator function: I(·) takes 1 when |E^(h,w)| > H and takes 0 otherwise.
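A sketch of formulas (4) and (6) together, taking the threshold H as given (the schedule of formula (5) is published only as an image); the comparison direction follows the text above:
```python
import numpy as np

def credibility(p_bar, entropy_threshold):
    """p_bar: (h, w, C) second semantic segmentation image (averaged
    probabilities). Returns the (h, w) 0-1 credibility map U."""
    eps = 1e-12                                         # guard against log(0)
    ent = np.sum(p_bar * np.log(p_bar + eps), axis=-1)  # formula (4): sum of p*log(p), <= 0
    return (np.abs(ent) > entropy_threshold).astype(np.float32)  # formula (6)
```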
In step 401, specifically, updating the parameter values of the student network using the source domain image, the first annotation information, the fused image, the second annotation information, and the credibility information of each pixel point in the second semantic segmentation image may include the following steps:
step 1, performing semantic fusion on the credibility information of each pixel point in the second semantic segmentation image and the credibility information of each pixel point in the source domain image to obtain fused credibility information; and performing semantic segmentation processing on the fused image by using the student network to obtain a third semantic segmentation image.
Step 2, determining consistency loss based on the third semantic segmentation image, the second annotation information and the fusion credibility information; determining a weight of the loss of consistency based on a current number of iterations.
Step 3, semantic segmentation processing is carried out on the source domain image by utilizing the student network to obtain a fourth semantic segmentation image; and determining semantic segmentation loss based on the fourth semantic segmentation image and the first annotation information.
Step 4, updating the parameter values of the student network based on the consistency loss, the weight, and the semantic segmentation loss.
For the above step 1, when performing semantic fusion on the credibility information of each pixel point in the second semantic segmentation image and the credibility information of each pixel point in the source domain image, the fusion may be performed based on the target mask image calculated in step 202. In practical application, the credibility information of each pixel point in the source domain image is obtained from the first annotation information of the source domain image, so each pixel point in the source domain image can be regarded as credible.
Specifically, the method for performing semantic fusion on the credibility information of each pixel point in the second semantic segmentation image and the credibility information of each pixel point in the source domain image is the same as the semantic fusion method of the source domain image and the target domain image, and a description thereof will not be repeated.
For the above step 2, for example, when determining the consistency loss based on the third semantic segmentation image, the second annotation information, and the fused credibility information, reference may be made to the following formula (7):
L_con(f_θ', f_θ) = Σ_j [U_M ⊙ CE(f_θ(X_M), Y_M)]_j    (7)
where f_θ' denotes the teacher network, f_θ denotes the student network, L_con(f_θ', f_θ) denotes the consistency loss between the teacher and student networks, U_M denotes the fused credibility information, f_θ(X_M) denotes the third semantic segmentation image, Y_M denotes the second annotation information, CE(f_θ(X_M), Y_M) denotes the segmentation loss between the third semantic segmentation image and the second annotation information, and Σ_j denotes pixel-by-pixel summation.
In a specific implementation, since every pixel point in the source domain image is credible, after the credibility information of each pixel point in the second semantic segmentation image is semantically fused with that of the source domain image, calculating the consistency loss based on the fused credibility information considers only the loss values of the credible pixel points in the third semantic segmentation image (the fused credibility information corresponding to an incredible pixel point is 0). This prevents pixel points for which the teacher network's prediction accuracy is low from affecting the parameter adjustment of the student network, thereby avoiding the problem of error accumulation.
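A sketch of formula (7) in PyTorch follows; normalizing the weighted sum by the number of credible pixels is an added assumption for numerical stability, since formula (7) itself only specifies the pixel-by-pixel weighted summation:

```python
import torch
import torch.nn.functional as F

def consistency_loss(third_seg_logits: torch.Tensor, second_labels: torch.Tensor,
                     fused_credibility: torch.Tensor) -> torch.Tensor:
    """third_seg_logits: (N, C, H, W) student output on the fused image;
    second_labels: (N, H, W) class indices from the second annotation information;
    fused_credibility: (N, H, W) fused credibility (0 where not credible)."""
    ce = F.cross_entropy(third_seg_logits, second_labels, reduction="none")  # (N, H, W)
    # Weight each pixel's loss by its credibility; incredible pixels contribute 0.
    return (fused_credibility * ce).sum() / fused_credibility.sum().clamp_min(1.0)
```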
For the above step 3, for example, the semantic segmentation loss $L_{seg}$ may be the cross-entropy loss of the source domain image, which satisfies the following formula (8):

$$L_{seg} = -\frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\sum_{c=1}^{C} Y_S^{(h,w,c)} \log P_S^{(h,w,c)} \tag{8}$$

wherein $H$ represents the height of the source domain image; $W$ represents the width of the source domain image; $C$ represents the number of channels; $Y_S$ represents the first annotation information of the source domain image; $P_S = f_{\theta}(X_S)$ represents the fourth semantic segmentation image; $X_S$ represents the source domain image; and $f_{\theta}(\cdot)$ represents the student network.
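Formula (8) is the standard pixel-wise cross entropy, so with the first annotation information stored as per-pixel class indices it reduces to a single library call; this is a minimal sketch with illustrative names:

```python
import torch
import torch.nn.functional as F

def segmentation_loss(fourth_seg_logits: torch.Tensor,
                      src_labels: torch.Tensor) -> torch.Tensor:
    """fourth_seg_logits: (N, C, H, W) student output on the source domain image;
    src_labels: (N, H, W) class indices from the first annotation information."""
    return F.cross_entropy(fourth_seg_logits, src_labels)  # mean over all pixels
```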
For the step 4, when updating the parameter values of the student network based on the consistency loss, the weight, and the semantic segmentation loss, a total loss value in the training process may be calculated based on the consistency loss, the weight, and the semantic segmentation loss, and then the network parameter values of the student network may be updated based on the total loss value.
For example, the total loss value in the training process may be calculated according to the following formula:
$$L_{total} = L_{seg} + \lambda_{con} L_{con} \tag{9}$$
wherein $L_{con}$ denotes the consistency loss, $L_{seg}$ denotes the semantic segmentation loss, and $\lambda_{con}$ denotes the weight of the consistency loss. The weight may, for example, be a dynamic weight set as a rising function that increases with the number of iterations. Such a weight strikes a balance between the semantic segmentation loss and the consistency loss: it lets the semantic segmentation loss dominate in the early stage of training the neural network and gradually increases the dominance of the consistency loss in the later stage, so as to stably control the convergence of the parameter values of the neural network.
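The disclosure does not fix the exact rising function; a sigmoid-shaped ramp-up of the kind common in mean-teacher training is one plausible choice, sketched below with illustrative hyper-parameters:

```python
import math

def consistency_weight(step: int, ramp_up_steps: int,
                       max_weight: float = 1.0) -> float:
    """Rises from about 0 to max_weight as step approaches ramp_up_steps."""
    t = min(step / max(ramp_up_steps, 1), 1.0)
    return max_weight * math.exp(-5.0 * (1.0 - t) ** 2)

# Total loss per formula (9):
# loss_total = loss_seg + consistency_weight(step, 10000) * loss_con
```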
When updating the parameter values of the teacher network based on the updated parameter values of the student network, for example, exponential moving average processing may be performed on the parameter values of the parameters in the student network to obtain target parameter values; and replacing the parameter value of the corresponding parameter in the teacher network by using the target parameter value.
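A sketch of this exponential moving average update; the decay value is an assumption, since the disclosure does not specify it:

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
               decay: float = 0.999) -> None:
    """Replace each teacher parameter with an exponential moving average of the
    corresponding student parameter."""
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)
```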
Referring to fig. 5, the overall architecture of the training method for the semantic segmentation network involves fusion in three aspects: first, semantic fusion of the source domain image and the target domain image; second, semantic fusion of the first annotation information of the source domain image and the first semantic segmentation image; and third, fusion of the credibility information of the target domain image and the credibility information of the source domain image. Specifically, the method may include the following steps:
step 1, inputting a target domain image into a teacher network to obtain a first semantic segmentation image;
step 2, performing semantic fusion on the source domain image and the target domain image based on the first semantic segmentation image to obtain a fusion image;
step 3, the following steps can be synchronously executed:
step 31, adding noise to the target domain image, and inputting the noise image of the target domain image into the teacher network to obtain a second semantic segmentation image;
step 32, inputting the source domain image into the student network to obtain a fourth semantic segmentation image;
step 33, inputting the fused image into the student network to obtain a third semantic segmentation image;
step 4, calculating semantic segmentation loss between the fourth semantic segmentation image and the first annotation information based on a formula (8);
step 5, performing semantic fusion on the first annotation information and the first semantic segmentation image to obtain second annotation information;
step 6, determining information entropy corresponding to the second semantic segmentation image based on the second semantic segmentation image;
step 7, determining credibility information of the second semantic segmentation image based on the information entropy and the information entropy threshold of the second semantic segmentation image;
step 8, performing semantic fusion on the credibility information of the source domain image and the credibility information of the second semantic segmentation image to obtain fused credibility information;
step 9, calculating consistency loss among the third semantic segmentation image, the second annotation information and the fusion credibility information by using a formula (7);
and step 10, calculating the total loss in the training process by using a formula (9), adjusting the parameters of the student network based on the calculated total loss, and adjusting the parameters of the teacher network based on the adjusted parameters of the student network.
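Putting steps 1-10 together, one training iteration could look as follows. This sketch reuses consistency_weight and ema_update from the earlier sketches, inlines the mask, fusion, and credibility computations, treats the source-domain credibility as all-ones, and leaves data loading outside; all concrete names and shapes are assumptions about details the disclosure does not fix:

```python
import torch
import torch.nn.functional as F

def train_iteration(student, teacher, optimizer, src_img, src_labels, tgt_img,
                    prior, target_class, h_threshold, noise_std, step, ramp_up):
    """src_img, tgt_img: (1, 3, H, W) tensors; src_labels: (1, H, W) class
    indices; prior: (C, H, W) spatial prior distribution matrix."""
    with torch.no_grad():
        first_seg = teacher(tgt_img).softmax(dim=1)                       # step 1
        second_seg = teacher(tgt_img + noise_std *                        # step 31
                             torch.randn_like(tgt_img)).softmax(dim=1)

    # Step 2: target mask image and semantic fusion of the two domain images.
    mask = ((prior * first_seg).argmax(dim=1) == target_class).float()    # (1, H, W)
    fused_img = mask[:, None] * tgt_img + (1 - mask[:, None]) * src_img

    # Step 5: second annotation information by fusing labels through the mask.
    pseudo = first_seg.argmax(dim=1)                                      # (1, H, W)
    second_labels = torch.where(mask.bool(), pseudo, src_labels)

    # Steps 6-8: credibility of the second segmentation image, fused with the
    # all-credible source credibility through the same mask.
    entropy = (second_seg * second_seg.clamp_min(1e-12).log()).sum(dim=1)
    cred_tgt = (entropy.abs() > h_threshold).float()
    cred_fused = mask * cred_tgt + (1 - mask)                             # source = 1

    fourth_seg = student(src_img)                                         # step 32
    third_seg = student(fused_img)                                        # step 33

    loss_seg = F.cross_entropy(fourth_seg, src_labels)                    # step 4, (8)
    ce = F.cross_entropy(third_seg, second_labels, reduction="none")
    loss_con = (cred_fused * ce).sum() / cred_fused.sum().clamp_min(1.0)  # step 9, (7)

    loss = loss_seg + consistency_weight(step, ramp_up) * loss_con        # step 10, (9)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    ema_update(teacher, student)                                          # step 10
    return loss.item()
```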
Referring to fig. 6, an embodiment of the present disclosure further provides an image processing method, including:
step 601, acquiring an image to be processed;
step 602, performing semantic segmentation processing on the image to be processed by using a semantic segmentation network trained on a sample image obtained by a sample image generation method according to any embodiment of the present disclosure, so as to obtain a semantic segmentation result of the image to be processed.
In the above method, semantic segmentation of the image to be processed is performed by a neural network trained with sample images obtained by the sample image generation method provided by the embodiments of the present disclosure; because the network is trained on such sample images, the semantic segmentation result of the image to be processed is more accurate.
Referring to fig. 7, an embodiment of the present disclosure further provides an intelligent driving control method, including:
step 701, acquiring an image collected by a driving device in the driving process;
step 702, detecting a target object in the acquired image by using a semantic segmentation network trained on sample images obtained by the sample image generation method according to any embodiment of the present disclosure;
and step 703, controlling the driving device based on the detected target object.
In a specific implementation, the driving device is, for example, but not limited to, any one of the following: an autonomous vehicle, a vehicle equipped with an Advanced Driving Assistance System (ADAS), a robot, or the like.
Controlling the driving device includes, for example, controlling the driving device to accelerate, decelerate, steer, or brake, or playing voice prompt information to prompt the driver to control the driving device to accelerate, decelerate, steer, or brake.
It will be understood by those skilled in the art that, in the methods of the present disclosure, the order in which the steps are written does not imply a strict order of execution or impose any limitation on the implementation; the specific order of execution of the steps should be determined by their functions and possible internal logic.
Based on the same inventive concept, the embodiment of the present disclosure further provides a sample image generation device corresponding to the sample image generation method, and since the principle of solving the problem of the device in the embodiment of the present disclosure is similar to that of the sample image generation method in the embodiment of the present disclosure, the implementation of the device may refer to the implementation of the method, and repeated details are not repeated.
Referring to fig. 8, there is shown a schematic architecture diagram of a sample image generation apparatus provided in an embodiment of the present disclosure, the apparatus including: a first obtaining module 801, a fusion module 802, and a generating module 803; wherein:
a first obtaining module 801, configured to obtain a source domain image, first annotation information of the source domain image, and a target domain image;
a fusion module 802, configured to perform semantic fusion on the source domain image and the target domain image based on a spatial prior distribution matrix representing distribution characteristics of multiple objects in each image of the source domain and a first semantic segmentation image of the target domain image, so as to obtain a fusion image;
the fusion module 802 is further configured to perform semantic fusion on the first semantic segmentation image and the first annotation information of the source domain image to obtain second annotation information corresponding to the fusion image;
the generating module 803 is configured to generate a sample image according to the fused image and the second annotation information corresponding to the fused image.
In a possible implementation, the fusion module 802, when performing semantic fusion on the source domain image and the target domain image based on the spatial prior distribution matrix representing distribution characteristics of multiple objects in each image of the source domain and the first semantic segmentation image of the target domain image to obtain the fused image, is configured to:
determining a target mask image corresponding to a first semantic segmentation image based on the space prior distribution matrix and the first semantic segmentation image of the target domain image;
and performing semantic fusion on the source domain image and the target domain image based on the target mask image to obtain a fused image.
In one possible embodiment, the fusion module 802, when determining the target mask image corresponding to the first semantic segmentation image based on the spatial prior distribution matrix and the first semantic segmentation image of the target domain image, is configured to:
multiplying the spatial prior distribution matrix with the first semantic segmentation image to obtain a semantic distribution map corresponding to the first semantic segmentation image, wherein the value of each pixel point in the semantic distribution map represents the probability that the pixel point belongs to each of the multiple objects;
for any pixel point, determining semantic information corresponding to the pixel point based on the probabilities that the pixel point belongs to the multiple objects;
setting the value of a pixel point of which the corresponding semantic information is the target semantic information as a first preset value, and setting the value of a pixel point of which the corresponding semantic information is not the target semantic information as a second preset value to obtain the target mask image.
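A minimal sketch of this mask construction, assuming the spatial prior distribution matrix and the first semantic segmentation image are multiplied element-wise, the semantic information of a pixel is the object class with the highest probability, and the first and second preset values are 1 and 0 (all names and shapes are illustrative):

```python
import torch

def build_target_mask(prior: torch.Tensor, first_seg: torch.Tensor,
                      target_class: int) -> torch.Tensor:
    """prior, first_seg: (C, H, W) tensors; returns an (H, W) target mask image."""
    dist_map = prior * first_seg              # semantic distribution map
    semantics = dist_map.argmax(dim=0)        # per-pixel semantic information
    # First preset value (1) where the semantics equal the target semantic
    # information, second preset value (0) elsewhere.
    return (semantics == target_class).float()
```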
In one possible embodiment, the fusion module 802, when determining the target mask image corresponding to the first semantic segmentation image based on the spatial prior distribution matrix and the first semantic segmentation image of the target domain image, is configured to:
multiplying the spatial prior distribution matrix with the first semantic segmentation image to obtain a semantic distribution map corresponding to the first semantic segmentation image, wherein the value of each pixel point in the semantic distribution map represents the probability that the pixel point belongs to each of the multiple objects;
for any pixel point, determining semantic information corresponding to the pixel point based on the probabilities that the pixel point belongs to the multiple objects;
setting the value of a pixel point of which the corresponding semantic information is the target semantic information as a first preset value, setting the value of a pixel point of which the corresponding semantic information is not the target semantic information as a second preset value, and setting the value of a pixel point of which the corresponding semantic information is the associated semantic information of the target semantic information as the first preset value to obtain the target mask image.
In a possible implementation manner, the fusion module 802, when performing semantic fusion on the source domain image and the target domain image based on the target mask image to obtain the fusion image, is configured to:
taking an image composed of the pixel points, in the source domain image, that correspond to the pixel points whose value in the target mask image is the second preset value as a first image to be fused corresponding to the source domain image; and taking an image composed of the pixel points, in the target domain image, that correspond to the pixel points whose value in the target mask image is the first preset value as a second image to be fused corresponding to the target domain image;
and fusing the first image to be fused and the second image to be fused to obtain the fused image.
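Under the same assumptions (first preset value 1, second preset value 0), the two images to be fused reduce to a pixel-wise blend, and the same mask can be reused to fuse the first annotation information with the first semantic segmentation image; this is a sketch, not the only possible implementation:

```python
import torch

def fuse_with_mask(source_img: torch.Tensor, target_img: torch.Tensor,
                   mask: torch.Tensor) -> torch.Tensor:
    """source_img, target_img: (3, H, W); mask: (H, W) target mask image.
    Pixels where the mask is 1 are taken from the target domain image and
    pixels where it is 0 from the source domain image."""
    m = mask.unsqueeze(0)                     # broadcast over the channel axis
    return m * target_img + (1.0 - m) * source_img
```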
In a possible implementation, the apparatus further includes a training module 804 configured to: after generating the sample image, training a semantic segmentation network by using the source domain image, the first annotation information and the plurality of sample images.
In one possible embodiment, the semantic segmentation network comprises a student network and a teacher network; when a sample image is generated, the first semantic segmentation image of the target domain image is obtained by performing semantic segmentation processing on the target domain image by the teacher network.
In one possible embodiment, the training module 804, when training the semantic segmentation network by using the source domain image, the first annotation information, and the plurality of sample images, is configured to:
updating parameter values of the student network by using the source domain image, the first annotation information, the fused image and the second annotation information;
updating the parameter values of the teacher network based on the updated parameter values of the student network.
In a possible implementation, the training module 804 is further configured to:
performing semantic segmentation processing on the noise image of the target domain image based on the teacher network to obtain a second semantic segmentation image;
determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image;
the training module 804, when updating the parameter values of the student network by using the source domain image, the first annotation information, the fusion image, and the second annotation information, is configured to:
and updating the parameter values of the student network by using the source domain image, the first annotation information, the fusion image, the second annotation information and the credibility information of each pixel point in the second semantic segmentation image.
In a possible implementation manner, the training module 804, when updating the parameter value of the student network by using the source domain image, the first annotation information, the fusion image, the second annotation information, and the reliability information of each pixel point in the second semantic segmentation image, is configured to:
performing semantic fusion on the credibility information of each pixel point in the second semantic segmentation image and the credibility information of each pixel point in the source domain image to obtain fused credibility information; and
performing semantic segmentation processing on the fused image by using the student network to obtain a third semantic segmentation image;
determining a consistency loss based on the third semantically segmented image, the second annotation information, and the fused confidence information; determining a weight of the consistency loss based on a current iteration number;
performing semantic segmentation processing on the source domain image by using the student network to obtain a fourth semantic segmentation image; determining semantic segmentation loss based on the fourth semantic segmentation image and the first annotation information;
updating parameter values for the student network based on the consistency loss, the weights, and the semantic segmentation loss.
In a possible implementation manner, the training module 804, when performing semantic segmentation processing on the noise image of the target domain image based on the teacher network to obtain a second semantic segmentation image, is configured to:
performing semantic segmentation on a plurality of noise images of the target domain image based on the teacher network to obtain a plurality of intermediate semantic segmentation images; wherein the noise image of the target domain image is an image after noise is added to the target domain image;
and generating the second semantic segmentation image based on the plurality of intermediate semantic segmentation images.
In one possible embodiment, the training module 804, when generating the second semantically segmented image based on the plurality of intermediate semantically segmented images, is configured to:
calculating a pixel value mean value of pixel points at corresponding positions in the multiple intermediate semantic segmentation images in sequence;
and determining the average value of the pixel points at any corresponding position as the pixel value of the pixel point at the corresponding position in the second semantic segmentation image.
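A sketch of this averaging, with the number of noise images and the noise magnitude as illustrative choices:

```python
import torch

@torch.no_grad()
def average_teacher_predictions(teacher: torch.nn.Module, tgt_img: torch.Tensor,
                                k: int = 3, noise_std: float = 0.1) -> torch.Tensor:
    """Runs the teacher on k noised copies of the target domain image and returns
    the pixel-wise mean as the second semantic segmentation image."""
    preds = [teacher(tgt_img + noise_std * torch.randn_like(tgt_img)).softmax(dim=1)
             for _ in range(k)]
    return torch.stack(preds, dim=0).mean(dim=0)
```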
In a possible implementation manner, the training module 804, when determining, based on the second semantically segmented image, reliability information of each pixel point in the second semantically segmented image, is configured to:
determining the information entropy of each pixel point in the second semantic segmentation image based on the pixel value of each pixel point in the second semantic segmentation image;
comparing the information entropy of each pixel point in the second semantic segmentation image with the information entropy threshold;
determining the credibility information of each pixel point in the second semantic segmentation image based on the comparison result;
if the absolute value of the information entropy of any pixel point in the second semantic segmentation image is larger than the information entropy threshold, the credibility information corresponding to that pixel point is set to a preset value greater than 0, indicating that the pixel value of that pixel point is credible.
In a possible implementation, the training module 804 is further configured to generate the information entropy threshold by:
and determining the information entropy threshold value based on the semantic segmentation type of the teacher network.
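The disclosure does not fix how the threshold is derived from the segmentation categories; one plausible rule, given here purely as an assumption, scales the maximum attainable entropy log(C) by a fixed ratio:

```python
import math

def entropy_threshold(num_classes: int, ratio: float = 0.5) -> float:
    """Threshold as a fraction of the largest possible entropy log(C);
    both the form of the rule and the ratio are illustrative."""
    return ratio * math.log(num_classes)
```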
In one possible implementation, the training module 804, when updating the parameter values of the teacher network based on the updated parameter values of the student network, is configured to:
performing exponential moving average processing on parameter values of parameters in the student network to obtain target parameter values;
and replacing the parameter value of the corresponding parameter in the teacher network by using the target parameter value.
Based on the same inventive concept, an image processing apparatus corresponding to the image processing method is also provided in the embodiments of the present disclosure, and since the principle of the apparatus in the embodiments of the present disclosure for solving the problem is similar to the image processing method described above in the embodiments of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
Referring to fig. 9, there is shown a schematic architecture diagram of an image processing apparatus provided in an embodiment of the present disclosure, the apparatus including: a second obtaining module 901 and a segmentation module 902; wherein:
a second obtaining module 901, configured to obtain an image to be processed;
a segmentation module 902, configured to perform semantic segmentation on the image to be processed by using a semantic segmentation network trained on a sample image obtained by using the sample image generation method according to any embodiment of the present disclosure, so as to obtain a semantic segmentation result of the image to be processed.
Based on the same inventive concept, an intelligent driving control device corresponding to the intelligent driving control method is also provided in the embodiments of the present disclosure, and because the principle of solving the problem of the device in the embodiments of the present disclosure is similar to that of the intelligent driving control method in the embodiments of the present disclosure, the implementation of the device can refer to the implementation of the method, and repeated details are not repeated.
Referring to fig. 10, a schematic architecture diagram of an intelligent driving control device provided in an embodiment of the present disclosure is shown, the device including: a third obtaining module 1001, a detection module 1002, and a control module 1003; wherein:
a third obtaining module 1001, configured to acquire an image collected by a driving device in the driving process;
a detection module 1002, configured to detect a target object in the image by using a semantic segmentation network trained on sample images obtained by the sample image generation method according to any embodiment of the present disclosure;
a control module 1003, configured to control the driving device based on the detected target object.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
Based on the same technical concept, an embodiment of the present disclosure further provides a computer device. Referring to fig. 11, a schematic structural diagram of a computer device 1100 provided in an embodiment of the present disclosure includes a processor 1101, a memory 1102, and a bus 1103. The memory 1102 is used for storing execution instructions and includes an internal memory 11021 and an external storage 11022. The internal memory 11021 temporarily stores operation data in the processor 1101 and data exchanged with the external storage 11022, such as a hard disk; the processor 1101 exchanges data with the external storage 11022 through the internal memory 11021. When the computer device 1100 runs, the processor 1101 communicates with the memory 1102 through the bus 1103, causing the processor 1101 to execute the following instructions:
acquiring a source domain image, first annotation information of the source domain image and a target domain image;
performing semantic fusion on the source domain image and the target domain image based on a spatial prior distribution matrix representing distribution characteristics of various objects in each image of the source domain and a first semantic segmentation image of the target domain image to obtain a fused image;
performing semantic fusion on the first semantic segmentation image and the first annotation information of the source domain image to obtain second annotation information corresponding to the fused image;
and generating a sample image according to the fused image and the second marking information corresponding to the fused image.
Based on the same technical concept, an embodiment of the present disclosure further provides a computer device. Referring to fig. 12, a schematic structural diagram of a computer device 1200 provided in an embodiment of the present disclosure includes a processor 1201, a memory 1202, and a bus 1203. The memory 1202 is used for storing execution instructions and includes an internal memory 12021 and an external storage 12022. The internal memory 12021 temporarily stores operation data in the processor 1201 and data exchanged with the external storage 12022, such as a hard disk; the processor 1201 exchanges data with the external storage 12022 through the internal memory 12021. When the computer device 1200 runs, the processor 1201 communicates with the memory 1202 through the bus 1203, causing the processor 1201 to execute the following instructions:
acquiring an image to be processed;
and performing semantic segmentation processing on the image to be processed by utilizing a semantic segmentation network trained on the sample image obtained by the sample image generation method according to any embodiment of the disclosure to obtain a semantic segmentation result of the image to be processed.
Based on the same technical concept, an embodiment of the present disclosure further provides a computer device. Referring to fig. 13, a schematic structural diagram of a computer device 1300 provided in an embodiment of the present disclosure includes a processor 1301, a memory 1302, and a bus 1303. The memory 1302 is used for storing execution instructions and includes an internal memory 13021 and an external storage 13022. The internal memory 13021 temporarily stores operation data in the processor 1301 and data exchanged with the external storage 13022, such as a hard disk; the processor 1301 exchanges data with the external storage 13022 through the internal memory 13021. When the computer device 1300 runs, the processor 1301 communicates with the memory 1302 through the bus 1303, causing the processor 1301 to execute the following instructions:
acquiring an image acquired by a driving device in the driving process;
detecting a target object in the image by utilizing a semantic segmentation network trained on sample images obtained based on the sample image generation method of any embodiment of the present disclosure;
controlling the driving device based on the detected target object.
The disclosed embodiments also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, executes the steps of the sample image generation, image processing, and intelligent driving control method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The computer program product of the sample image generation, image processing, and intelligent driving control methods provided in the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code may be used to execute the steps of the sample image generation, image processing, and intelligent driving control methods described in the above method embodiments. For details, reference may be made to the above method embodiments, which are not described herein again.
The embodiments of the present disclosure also provide a computer program which, when executed by a processor, implements any one of the methods of the foregoing embodiments. The computer program product may be embodied in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that the above-mentioned embodiments are merely specific embodiments of the present disclosure, used to illustrate the technical solutions of the present disclosure rather than to limit them, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person familiar with the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions of some of their technical features within the technical scope of the present disclosure; such modifications, changes, or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present disclosure, and shall all be covered within the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (20)

1. A sample image generation method, comprising:
acquiring a source domain image, first annotation information of the source domain image and a target domain image;
performing semantic fusion on the source domain image and the target domain image based on a spatial prior distribution matrix representing distribution characteristics of various objects in each image of the source domain and a first semantic segmentation image of the target domain image to obtain a fused image;
performing semantic fusion on the first semantic segmentation image and the first annotation information of the source domain image to obtain second annotation information corresponding to the fused image;
and generating a sample image according to the fused image and the second marking information corresponding to the fused image.
2. The method according to claim 1, wherein the semantically fusing the source domain image and the target domain image based on a spatial prior distribution matrix representing distribution characteristics of a plurality of objects in each image of a source domain and a first semantic segmentation image of the target domain image to obtain a fused image comprises:
determining a target mask image corresponding to a first semantic segmentation image based on the space prior distribution matrix and the first semantic segmentation image of the target domain image;
and performing semantic fusion on the source domain image and the target domain image based on the target mask image to obtain a fused image.
3. The method according to claim 2, wherein the determining a target mask image corresponding to a first semantically segmented image based on the spatial prior distribution matrix and the first semantically segmented image of the target domain image comprises:
multiplying the spatial prior distribution matrix with the first semantic segmentation image to obtain a semantic distribution map corresponding to the first semantic segmentation image, wherein the value of each pixel point in the semantic distribution map represents the probability that the pixel point belongs to each of the multiple objects;
for any pixel point, determining semantic information corresponding to the pixel point based on the probabilities that the pixel point belongs to the multiple objects;
setting the value of a pixel point of which the corresponding semantic information is the target semantic information as a first preset value, and setting the value of a pixel point of which the corresponding semantic information is not the target semantic information as a second preset value to obtain the target mask image.
4. The method according to claim 2, wherein the determining a target mask image corresponding to a first semantically segmented image based on the spatial prior distribution matrix and the first semantically segmented image of the target domain image comprises:
multiplying the spatial prior distribution matrix with the first semantic segmentation image to obtain a semantic distribution map corresponding to the first semantic segmentation image, wherein the value of each pixel point in the semantic distribution map represents the probability that the pixel point belongs to each of the multiple objects;
for any pixel point, determining semantic information corresponding to the pixel point based on the probabilities that the pixel point belongs to the multiple objects;
setting the value of a pixel point of which the corresponding semantic information is the target semantic information as a first preset value, setting the value of a pixel point of which the corresponding semantic information is not the target semantic information as a second preset value, and setting the value of a pixel point of which the corresponding semantic information is the associated semantic information of the target semantic information as the first preset value to obtain the target mask image.
5. The method according to claim 3 or 4, wherein the semantically fusing the source domain image and the target domain image based on the target mask image to obtain the fused image comprises:
taking an image composed of the pixel points, in the source domain image, that correspond to the pixel points whose value in the target mask image is the second preset value as a first image to be fused corresponding to the source domain image; and taking an image composed of the pixel points, in the target domain image, that correspond to the pixel points whose value in the target mask image is the first preset value as a second image to be fused corresponding to the target domain image;
and fusing the first image to be fused and the second image to be fused to obtain the fused image.
6. The method of any of claims 1-5, wherein after generating the sample image, the method further comprises:
and training a semantic segmentation network by using the source domain image, the first labeling information and a plurality of sample images.
7. The method of claim 6, wherein the semantic segmentation network comprises a student network and a teacher network; when a sample image is generated, the first semantic segmentation image of the target domain image is obtained by performing semantic segmentation processing on the target domain image by the teacher network.
8. The method of claim 7, wherein training a semantic segmentation network using the source domain image, the first annotation information, and a plurality of sample images comprises:
updating parameter values of the student network by using the source domain image, the first annotation information, the fused image and the second annotation information;
updating the parameter values of the teacher network based on the updated parameter values of the student network.
9. The method of claim 8, further comprising:
performing semantic segmentation processing on the noise image of the target domain image based on the teacher network to obtain a second semantic segmentation image; wherein the noise image of the target domain image is an image after noise is added to the target domain image;
determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image;
the updating the parameter value of the student network by using the source domain image, the first annotation information, the fused image and the second annotation information includes:
and updating the parameter values of the student network by using the source domain image, the first annotation information, the fusion image, the second annotation information and the credibility information of each pixel point in the second semantic segmentation image.
10. The method of claim 9, wherein the updating the parameter values of the student network by using the source domain image, the first annotation information, the fused image, the second annotation information, and the reliability information of each pixel point in the second semantic segmentation image comprises:
performing semantic fusion on the credibility information of each pixel point in the second semantic segmentation image and the credibility information of each pixel point in the source domain image to obtain fused credibility information; and
performing semantic segmentation processing on the fused image by using the student network to obtain a third semantic segmentation image;
determining a consistency loss based on the third semantically segmented image, the second annotation information, and the fused confidence information; determining a weight of the consistency loss based on a current iteration number;
performing semantic segmentation processing on the source domain image by using the student network to obtain a fourth semantic segmentation image; determining semantic segmentation loss based on the fourth semantic segmentation image and the first annotation information;
updating parameter values for the student network based on the consistency loss, the weights, and the semantic segmentation loss.
11. The method of claim 9, wherein performing semantic segmentation processing on the noise image of the target domain image based on the teacher network to obtain a second semantic segmented image comprises:
performing semantic segmentation on a plurality of noise images of the target domain image based on the teacher network to obtain a plurality of intermediate semantic segmentation images;
and generating the second semantic segmentation image based on the plurality of intermediate semantic segmentation images.
12. The method of claim 11, wherein generating the second semantically segmented image based on the plurality of intermediate semantically segmented images comprises:
calculating a pixel value mean value of pixel points at corresponding positions in the multiple intermediate semantic segmentation images in sequence;
and determining the average value of the pixel points at any corresponding position as the pixel value of the pixel point at the corresponding position in the second semantic segmentation image.
13. The method of claim 9, wherein determining confidence information for each pixel in the second semantically segmented image based on the second semantically segmented image comprises:
determining the information entropy of each pixel point in the second semantic segmentation image based on the pixel value of each pixel point in the second semantic segmentation image;
comparing the information entropy of each pixel point in the second semantic segmentation image with the information entropy threshold;
determining the credibility information of each pixel point in the second semantic segmentation image based on the comparison result;
if the absolute value of the information entropy of any pixel point in the second semantic segmentation image is larger than the information entropy threshold, the credibility information corresponding to that pixel point is set to a preset value greater than 0, indicating that the pixel value of that pixel point is credible.
14. An image processing method, comprising:
acquiring an image to be processed;
performing semantic segmentation processing on the image to be processed by using a semantic segmentation network obtained based on the method of any one of claims 6 to 13 to obtain a semantic segmentation result of the image to be processed.
15. An intelligent driving control method, characterized by comprising:
acquiring an image collected by a driving device in the driving process;
detecting a target object in the image by using a semantic segmentation network obtained based on the method of any one of claims 6 to 13;
controlling the driving device based on the detected target object.
16. A sample image generation apparatus, comprising:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring a source domain image, first annotation information of the source domain image and a target domain image;
the fusion module is used for performing semantic fusion on the source domain image and the target domain image based on a space prior distribution matrix representing the distribution characteristics of various objects in each image of the source domain and a first semantic segmentation image of the target domain image to obtain a fusion image;
the fusion module is further configured to perform semantic fusion on the first semantic segmentation image and the first annotation information of the source domain image to obtain second annotation information corresponding to the fusion image;
and the generating module is used for generating a sample image according to the fused image and the second marking information corresponding to the fused image.
17. An image processing apparatus characterized by comprising:
the second acquisition module is used for acquiring an image to be processed;
a segmentation module, configured to perform semantic segmentation processing on the image to be processed by using the semantic segmentation network obtained by the method according to any one of claims 6 to 13, so as to obtain a semantic segmentation result of the image to be processed.
18. An intelligent driving control device, comprising:
a third obtaining module, configured to acquire an image collected by a driving device in the driving process;
a detection module, configured to detect a target object in the image by using a semantic segmentation network obtained based on the method of any one of claims 6 to 13;
a control module, configured to control the driving device based on the detected target object.
19. A computer device, comprising: a processor, a memory, and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the computer device runs, the machine-readable instructions, when executed by the processor, performing the steps of the method of any one of claims 1 to 15.
20. A computer-readable storage medium, having stored thereon a computer program which, when being executed by a processor, is adapted to carry out the steps of the method according to any one of claims 1 to 15.
CN202011197925.1A 2020-10-30 2020-10-30 Sample image generation method, sample image processing method, intelligent driving control method and device Pending CN112200889A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011197925.1A CN112200889A (en) 2020-10-30 2020-10-30 Sample image generation method, sample image processing method, intelligent driving control method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011197925.1A CN112200889A (en) 2020-10-30 2020-10-30 Sample image generation method, sample image processing method, intelligent driving control method and device

Publications (1)

Publication Number Publication Date
CN112200889A 2021-01-08

Family

ID=74010664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011197925.1A Pending CN112200889A (en) 2020-10-30 2020-10-30 Sample image generation method, sample image processing method, intelligent driving control method and device

Country Status (1)

Country Link
CN (1) CN112200889A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022134338A1 (en) * 2020-12-23 2022-06-30 平安科技(深圳)有限公司 Domain adaptation method and apparatus, electronic device, and storage medium
CN113706440A (en) * 2021-03-12 2021-11-26 腾讯科技(深圳)有限公司 Image processing method, image processing device, computer equipment and storage medium
CN113724203A (en) * 2021-08-03 2021-11-30 唯智医疗科技(佛山)有限公司 Segmentation method and device for target features in OCT (optical coherence tomography) image
CN113724203B (en) * 2021-08-03 2024-04-23 唯智医疗科技(佛山)有限公司 Model training method and device applied to target feature segmentation in OCT image
WO2023030182A1 (en) * 2021-08-30 2023-03-09 华为技术有限公司 Image generation method and apparatus
CN113836271A (en) * 2021-09-28 2021-12-24 北京有竹居网络技术有限公司 Method and product for natural language processing
CN113836271B (en) * 2021-09-28 2023-08-15 北京有竹居网络技术有限公司 Method and product for natural language processing
CN114998712A (en) * 2022-08-03 2022-09-02 阿里巴巴(中国)有限公司 Image recognition method, storage medium, and electronic device

Similar Documents

Publication Publication Date Title
CN112200889A (en) Sample image generation method, sample image processing method, intelligent driving control method and device
CN111489365A (en) Neural network training method, image processing method and device
CN111767405A (en) Training method, device and equipment of text classification model and storage medium
US11651214B2 (en) Multimodal data learning method and device
US20200327409A1 (en) Method and device for hierarchical learning of neural network, based on weakly supervised learning
CN106845430A (en) Pedestrian detection and tracking based on acceleration region convolutional neural networks
CN113947764B (en) Image processing method, device, equipment and storage medium
CN113361685B (en) Knowledge tracking method and system based on learner knowledge state evolution expression
CN112163643A (en) Sample generation method, neural network training method, data processing method and device
CN112381227B (en) Neural network generation method and device, electronic equipment and storage medium
CN112926655B (en) Image content understanding and visual question and answer VQA method, storage medium and terminal
CN111291187A (en) Emotion analysis method and device, electronic equipment and storage medium
CN111382870A (en) Method and device for training neural network
CN114091554A (en) Training set processing method and device
CN111179272B (en) Rapid semantic segmentation method for road scene
CN116097277A (en) Method and system for training neural network models using progressive knowledge distillation
CN115187772A (en) Training method, device and equipment of target detection network and target detection method, device and equipment
CN112926461A (en) Neural network training and driving control method and device
CN111523548A (en) Image semantic segmentation and intelligent driving control method and device
Tumu et al. Physics constrained motion prediction with uncertainty quantification
CN113919444A (en) Training method of target detection network, target detection method and device
CN113591892A (en) Training data processing method and device
CN116189284A (en) Human motion prediction method, device, equipment and storage medium
CN114648679A (en) Neural network training method, neural network training device, target detection method, target detection device, equipment and storage medium
CN114004357A (en) Neural network training method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination