CN111489365A - Neural network training method, image processing method and device

Info

Publication number: CN111489365A
Application number: CN202010278429.2A
Authority: CN
Inventors: 周千寓; 程光亮; 石建萍; 马利庄
Original assignee: Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Current assignee: Shanghai Sensetime Lingang Intelligent Technology Co Ltd
Priority date: 2020-04-10
Filing date: 2020-04-10
Publication date: 2020-08-04
Anticipated expiration: 2040-04-10
Also published as: CN111489365B

Abstract

The disclosure provides a training method of a neural network, an image processing method and an image processing device. The training method comprises the following steps: performing semantic segmentation processing on a first noise image of a target image by using a student network to obtain a first semantic segmentation image; performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image; updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image and the credibility information; and updating the parameter values of the teacher network based on the updated parameter values of the student network. In the embodiments of the disclosure, the first semantic segmentation image, the second semantic segmentation image and the credibility information are used to control the student network and the teacher network so that they learn specific features of the target image, thereby avoiding negative transfer of the student network and the teacher network during transfer learning.

Description

Neural network training method, image processing method and device

Technical Field

The present disclosure relates to the field of image processing technologies, and in particular, to a training method for a neural network, an image processing method, and an image processing apparatus.

Background

Image segmentation refers to the task of assigning a semantic label to each pixel of a given image. In supervised or semi-supervised training of a semantic segmentation model, a large number of sample images are first labeled pixel by pixel, and the semantic segmentation model is then trained on the labeled samples. However, labeling a large number of sample images pixel by pixel consumes a large amount of time and cost. To address this, sample data sets are constructed from simulated synthetic sample images. However, there is a certain difference between synthetic images and real images, and this difference causes a significant drop in performance when a semantic segmentation network trained on synthetic images performs semantic segmentation processing on real images.

Disclosure of Invention

The embodiment of the disclosure at least provides a training method of a neural network, an image processing method and an image processing device.

In a first aspect, an embodiment of the present disclosure provides a training method for a neural network, including: performing semantic segmentation processing on a first noise image of a target image by using a student network to obtain a first semantic segmentation image; performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image; updating parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, and the credibility information; updating the parameter values of the teacher network based on the updated parameter values of the student network.

The first semantic segmentation image, the second semantic segmentation image and the credibility information are used to control the student network and the teacher network so that they generate consistent prediction results for the same target image under different disturbances. As a result, the student network learns specific features of the target image while transferring to the target image, that is, the student network performs transfer learning in a specific direction. Because the parameter values of the teacher network are updated according to the parameter values of the student network, the teacher network also performs transfer learning in that direction, and the problem of negative transfer is avoided.

In a possible embodiment, the method further comprises: semantic segmentation processing is carried out on the style migration image of the source image by utilizing a student network to obtain a third semantic segmentation image, wherein the style migration image of the source image is an image obtained by migrating the style of the source image to a target domain where the target image is located; the updating the parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, and the credibility information includes: updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the annotation information of the source image.

In this way, the style migration image of the source image is subjected to semantic segmentation processing by using the student network to obtain a third semantic segmentation image, and then the parameter value updating process of the student network is supervised based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the annotation information of the source image, so that the semantic segmentation precision of the student network and the teacher network can be further improved.

In one possible embodiment, the updating the parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, the credibility information, the third semantically segmented image, and annotation information of the source image comprises: determining a consistency loss based on the first semantically segmented image, the second semantically segmented image and the credibility information; determining a weight of the consistency loss based on a current iteration number; determining semantic segmentation losses based on the third semantically segmented image and annotation information of the source image; updating parameter values for the student network based on the consistency loss, the weights, and the semantic segmentation loss.

In this way, the weight of the consistency loss is determined through the current iteration times, the adjusting process of the parameter values of the student network is supervised based on the consistency loss, the determined weight of the consistency loss and the semantic segmentation loss, and the influence of the consistency loss and the semantic segmentation loss on the parameter values of the student network and the teacher network is dynamically adjusted along with the increase of the iteration times of the student network and the teacher network, so that the specific features in the target image are learned on the premise of ensuring the semantic segmentation precision of the student network and the teacher network.
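The combination of the two losses described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the text only states that the weight depends on the current iteration number, so the sigmoid-shaped ramp-up used here and the names `consistency_weight`, `total_loss`, `ramp_iters` and `max_weight` are assumptions introduced for illustration.

```python
import math


def consistency_weight(iteration, ramp_iters, max_weight=1.0):
    """Weight of the consistency loss as a function of the iteration number.

    Assumed schedule: a sigmoid-shaped ramp-up that rises from about
    0.0067 * max_weight at iteration 0 to max_weight at ramp_iters.
    """
    if ramp_iters <= 0:
        return max_weight
    # Fraction of the ramp-up period completed, clipped to [0, 1].
    t = min(max(iteration / ramp_iters, 0.0), 1.0)
    return max_weight * math.exp(-5.0 * (1.0 - t) ** 2)


def total_loss(seg_loss, cons_loss, iteration, ramp_iters):
    """Semantic segmentation loss plus the weighted consistency loss."""
    return seg_loss + consistency_weight(iteration, ramp_iters) * cons_loss
```

With a schedule of this shape, the semantic segmentation loss on the labeled source images dominates early iterations, and the consistency loss gains influence as training proceeds.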

In one possible embodiment, performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image includes: performing semantic segmentation processing on the plurality of second noise images of the target image by using a teacher network to obtain a plurality of intermediate semantic segmentation images; and generating the second semantic segmentation image based on the plurality of intermediate semantic segmentation images.

In this way, the teacher network performs semantic segmentation processing on each of the plurality of second noise images to obtain a plurality of intermediate semantic segmentation images, and the second semantic segmentation image is generated based on these intermediate semantic segmentation images. More uncertainty information in the second noise images can thus be extracted, so that the credibility information of each pixel point in the second semantic segmentation image is more pronounced, which in turn improves the efficiency of optimizing the parameter values of the student network.

In one possible embodiment, the generating the second semantically segmented image based on the plurality of intermediate semantically segmented images comprises: calculating a pixel value mean value of pixel points at corresponding positions in the multiple intermediate semantic segmentation images in sequence; and determining the average value of the pixel points at any corresponding position as the pixel value of the pixel point at the corresponding position in the second semantic segmentation image.

In this way, more uncertainty information can be extracted by averaging the pixel values of the pixel points at corresponding positions in the intermediate semantic segmentation images.
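The position-wise averaging can be sketched as follows. The sketch assumes that each intermediate semantic segmentation image is stored as a NumPy array of shape (h, w, C) holding per-pixel class probabilities; the function name `mean_prediction` and the toy values are hypothetical.

```python
import numpy as np


def mean_prediction(intermediate):
    """Average N intermediate predictions of shape (h, w, C) position by position."""
    stacked = np.stack(intermediate, axis=0)  # shape (N, h, w, C)
    return stacked.mean(axis=0)               # shape (h, w, C)


# Two toy 1 x 2 predictions over C = 2 classes.
a = np.array([[[0.9, 0.1], [0.2, 0.8]]])
b = np.array([[[0.7, 0.3], [0.4, 0.6]]])
second = mean_prediction([a, b])  # averaged prediction, same shape as a and b
```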

In a possible embodiment, the determining, based on the second semantically segmented image, reliability information of each pixel point in the second semantically segmented image includes: determining the information entropy of each pixel point in the second semantic segmentation image based on the pixel value of each pixel point in the second semantic segmentation image; and determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold.

In this way, the information entropy of each pixel point in the second semantic segmentation image is extracted through the pixel value of each pixel point in the second semantic segmentation image, and then the credibility information of each pixel point in the second semantic segmentation image is determined based on the information entropy.
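A per-pixel information entropy can be sketched as follows. The sketch assumes that the pixel value of each pixel point is a probability distribution over C classes, stored in an array of shape (h, w, C), and that the entropy is the Shannon entropy with the natural logarithm; the name `pixel_entropy` and the `eps` guard are illustrative assumptions.

```python
import numpy as np


def pixel_entropy(probs, eps=1e-12):
    """Shannon entropy (natural log) of each pixel's class distribution.

    probs: array of shape (h, w, C) whose last axis sums to 1.
    Returns an array of shape (h, w).
    """
    # Clip to avoid log(0) for classes with zero probability.
    p = np.clip(probs, eps, 1.0)
    return -np.sum(p * np.log(p), axis=-1)
```

Under these assumptions, a pixel whose prediction is spread evenly over the C classes has the largest entropy, ln C, and a pixel predicted as a single class with probability 1 has entropy close to 0.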

In a possible embodiment, the determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold includes: comparing the information entropy of each pixel point in the second semantic segmentation image with the information entropy threshold; and determining the credibility information of each pixel point in the second semantic segmentation image based on the comparison result, wherein, if the absolute value of the information entropy of any pixel point in the second semantic segmentation image is larger than the information entropy threshold, the credibility information corresponding to that pixel point is set to a preset value indicating that the pixel value of that pixel point is credible, the preset value being larger than 0.

In this way, the consistency loss between the first semantic segmentation image and the second semantic segmentation image takes into account only the credible pixel points in the second semantic segmentation image. When the parameter values of the student network are updated based on this consistency loss, the results produced by the student network and the teacher network for the target image under different disturbances therefore tend to be consistent. The parameter values of the teacher network are then updated based on the updated parameter values of the student network, so that the parameter values of the teacher network stay consistent with those of the student network, and both networks can learn the specific features of the target image.
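The thresholding and the resulting consistency loss can be sketched as follows. The comparison direction follows the wording of the embodiment above. Several details are assumptions made for illustration rather than statements of the disclosed method: the credibility value 0 for the remaining pixel points, the squared-error form of the consistency loss, and the names `credibility_mask` and `masked_consistency_loss`.

```python
import numpy as np


def credibility_mask(entropy, threshold, preset=1.0):
    """Per-pixel credibility information.

    Pixels whose absolute entropy exceeds the threshold receive `preset`
    (comparison as worded in the text); all others receive 0 (assumption).
    """
    return np.where(np.abs(entropy) > threshold, preset, 0.0)


def masked_consistency_loss(student_probs, teacher_probs, mask):
    """Mean squared difference over the pixels marked credible (assumed loss form).

    student_probs, teacher_probs: arrays of shape (h, w, C).
    mask: array of shape (h, w); entries greater than 0 mark credible pixels.
    """
    sq_err = np.sum((student_probs - teacher_probs) ** 2, axis=-1)  # (h, w)
    credible = mask > 0
    if not credible.any():
        return 0.0  # no credible pixel contributes to the loss
    return float(sq_err[credible].mean())
```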

In one possible embodiment, the information entropy threshold is generated by: and determining the information entropy threshold value based on the semantic segmentation type of the teacher network.
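The text does not state how the threshold is derived from the semantic segmentation type. One purely hypothetical rule is sketched below: because the Shannon entropy of a distribution over C classes is at most ln C, the threshold is taken as a fixed fraction of that maximum. The name `entropy_threshold` and the parameter `fraction` are assumptions, not part of the disclosure.

```python
import math


def entropy_threshold(num_classes, fraction=0.5):
    """Hypothetical rule: a fixed fraction of the maximum entropy ln(C)."""
    if num_classes < 2:
        raise ValueError("at least two classes are required")
    return fraction * math.log(num_classes)
```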

In one possible embodiment, updating the parameter values of the teacher network based on the updated parameter values of the student network comprises: performing exponential moving average processing on parameter values of parameters in the student network to obtain target parameter values; and replacing the parameter value of the corresponding parameter in the teacher network by using the target parameter value.

Therefore, the parameter values of the teacher network are the exponential moving average values based on the parameter values of the student network, so that the teacher network and the student network can converge faster, and the training efficiency of the neural network is improved.
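An exponential moving average update of this kind can be sketched as follows. The sketch assumes that the parameters of each network are held in a dictionary keyed by parameter name; the smoothing coefficient `alpha`, its default value, and the function name `ema_update` are illustrative assumptions.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return new teacher parameters as an exponential moving average.

    Each target value is alpha * (old teacher value) + (1 - alpha) *
    (updated student value); it replaces the corresponding teacher parameter.
    Values may be floats or NumPy arrays of matching shape.
    """
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }


# Toy example with one scalar parameter.
teacher = {"w": 1.0}
student = {"w": 0.0}
teacher = ema_update(teacher, student, alpha=0.9)
```

A value of `alpha` close to 1 makes the teacher network change slowly, so that it tracks a smoothed history of the student network rather than its most recent update.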

In a second aspect, an embodiment of the present disclosure further provides a training apparatus for a neural network, including: the first processing module is used for performing semantic segmentation processing on a first noise image of a target image by using a student network to obtain a first semantic segmentation image; the second processing module is used for performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image; a first updating module, configured to update a parameter value of the student network based on the first semantic segmentation image, the second semantic segmentation image, and the reliability information; and the second updating module is used for updating the parameter values of the teacher network based on the updated parameter values of the student network.

In a possible embodiment, the apparatus further comprises: the third processing module is used for performing semantic segmentation processing on the style migration image of the source image by using a student network to obtain a third semantic segmentation image, wherein the style migration image of the source image is an image obtained by migrating the style of the source image to a target domain where the target image is located; the first updating module, when updating the parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, and the reliability information, is configured to: updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the annotation information of the source image.

In a possible embodiment, the first updating module, when updating the parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, the credibility information, the third semantically segmented image, and the annotation information of the source image, is configured to: determining a consistency loss based on the first semantically segmented image, the second semantically segmented image and the credibility information; determining a weight of the consistency loss based on a current iteration number; determining semantic segmentation losses based on the third semantically segmented image and annotation information of the source image; updating parameter values for the student network based on the consistency loss, the weights, and the semantic segmentation loss.

In one possible embodiment, the second processing module, when performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image, is configured to: performing semantic segmentation processing on the plurality of second noise images of the target image by using a teacher network to obtain a plurality of intermediate semantic segmentation images; and generating the second semantic segmentation image based on the plurality of intermediate semantic segmentation images.

In one possible embodiment, the second processing module, when generating the second semantically segmented image based on the plurality of intermediate semantically segmented images, is configured to: calculating a pixel value mean value of pixel points at corresponding positions in the multiple intermediate semantic segmentation images in sequence; and determining the average value of the pixel points at any corresponding position as the pixel value of the pixel point at the corresponding position in the second semantic segmentation image.

In a possible embodiment, the second processing module, when determining, based on the second semantically segmented image, reliability information of each pixel point in the second semantically segmented image, is configured to: determining the information entropy of each pixel point in the second semantic segmentation image based on the pixel value of each pixel point in the second semantic segmentation image; and determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold.

In a possible embodiment, the second processing module, when determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold, is configured to: comparing the information entropy of each pixel point in the second semantic segmentation image with the information entropy threshold; and determining the credibility information of each pixel point in the second semantic segmentation image based on the comparison result, wherein, if the absolute value of the information entropy of any pixel point in the second semantic segmentation image is larger than the information entropy threshold, the credibility information corresponding to that pixel point is set to a preset value indicating that the pixel value of that pixel point is credible, the preset value being larger than 0.

In a possible implementation, the second processing module is further configured to generate the information entropy threshold by: and determining the information entropy threshold value based on the semantic segmentation type of the teacher network.

In one possible embodiment, the second updating module, when updating the parameter values of the teacher network based on the updated parameter values of the student network, is configured to: performing exponential moving average processing on parameter values of parameters in the student network to obtain target parameter values; and replacing the parameter value of the corresponding parameter in the teacher network by using the target parameter value.

In a third aspect, an embodiment of the present disclosure further provides an image processing method, including: acquiring an image to be processed; and performing semantic segmentation processing on the image to be processed by using the neural network trained by the neural network training method based on any one of the first aspect to obtain a semantic segmentation result of the image to be processed.

In a fourth aspect, an embodiment of the present disclosure further provides an image processing apparatus, including: the acquisition module is used for acquiring an image to be processed; and the processing module is used for performing semantic segmentation processing on the image to be processed by utilizing the neural network trained by the neural network training method based on any one of the first aspect to obtain a semantic segmentation result of the image to be processed.

In a fifth aspect, an embodiment of the present disclosure further provides an intelligent driving control method, including: acquiring an image acquired by a driving device in the driving process; detecting a target object in the image by using a neural network trained by the neural network training method of any one of the first aspect; and controlling the driving device based on the detected target object.

In a sixth aspect, an embodiment of the present disclosure further provides an intelligent driving control device, including: a data acquisition module, configured to acquire an image acquired by a driving device in the driving process; a detection module, configured to detect a target object in the image by using a neural network trained by the neural network training method of any one of the first aspect; and a control module, configured to control the driving device based on the detected target object.

In a seventh aspect, an embodiment of the present disclosure further provides an electronic device including a processor and a memory, where the memory stores machine-readable instructions executable by the processor, and the processor is configured to execute the machine-readable instructions stored in the memory; when executed by the processor, the machine-readable instructions perform the steps in the first aspect, or in any one of the possible implementations of the first aspect, or the steps in the third aspect, or the steps in the fifth aspect.

In an eighth aspect, alternative implementations of the present disclosure further provide a computer-readable storage medium having a computer program stored thereon, where the computer program is executed to perform the steps in the first aspect, or any one of the possible implementations of the first aspect, or to perform the steps in the possible implementations of the third aspect, or to perform the steps in the possible implementations of the fifth aspect.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for use in the embodiments are briefly described below. The drawings are incorporated in and form a part of the specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those skilled in the art can derive other related drawings from them without inventive effort.

Fig. 1 shows a flowchart of a training method of a neural network provided by an embodiment of the present disclosure;

Fig. 2 shows a flowchart of a specific method for determining credibility information of each pixel point in a second semantic segmentation image provided by an embodiment of the present disclosure;

Fig. 3 shows a flowchart of another training method of a neural network provided by an embodiment of the present disclosure;

Fig. 4 shows a schematic diagram of a specific example of a training method of a neural network provided by an embodiment of the present disclosure;

Fig. 5 shows a flowchart of an image processing method provided by an embodiment of the present disclosure;

Fig. 6 shows a flowchart of an intelligent driving control method provided by an embodiment of the present disclosure;

Fig. 7 shows a schematic diagram of a training apparatus for a neural network provided by an embodiment of the present disclosure;

Fig. 8 shows a schematic diagram of an image processing apparatus provided by an embodiment of the present disclosure;

Fig. 9 shows a schematic diagram of an intelligent driving control device provided by an embodiment of the present disclosure;

Fig. 10 shows a schematic diagram of an electronic device provided by an embodiment of the present disclosure.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of the embodiments of the present disclosure, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure, presented in the figures, is not intended to limit the scope of the claimed disclosure, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.

Research has shown that, before a neural network is trained, a great deal of time and cost usually has to be spent labeling sample images to form a labeled data set. To reduce the time and cost of sample labeling, neural networks are in many cases trained on computer-simulated synthetic images. However, because a certain domain difference exists between synthetic images and real images, the performance of a neural network trained on synthetic images is reduced when it performs an image processing task on real images. To address this problem, current approaches generally add further supervision signals to an adversarial framework; for example, on the basis of a generative adversarial network, transfer learning is performed on the neural network using supervision signals such as depth, style, class constraints and decision boundaries. However, the features learned by the neural network during transfer learning have great uncertainty, which may cause the problem of negative transfer.

Based on the above research, the present disclosure provides a training method and apparatus for a neural network, which supervise the transfer learning of the student network by controlling the teacher network and the student network to generate consistent prediction results for unlabeled target images under different disturbances, and which update the teacher network based on the parameter values of the student network, so that the teacher network and the student network can learn specific features of the target image during transfer learning, and the problem of negative transfer is avoided.

The drawbacks described above were identified by the inventors through practice and careful study; therefore, both the discovery of these problems and the solutions proposed below should be regarded as contributions of the inventors to the present disclosure.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

To facilitate understanding of the present embodiment, first, a training method for a neural network disclosed in the embodiments of the present disclosure is described in detail, where an execution subject of the training method for a neural network provided in the embodiments of the present disclosure is generally a computer device with certain computing power, and the computer device includes, for example: a terminal device or server or other processing device; in some possible implementations, the training method of the neural network may be implemented by a processor calling computer-readable instructions stored in a memory.

The following describes a training method of a neural network provided in an embodiment of the present disclosure.

In the embodiment of the present disclosure, before updating the parameter values of the Student Network (Student Network) and the Teacher Network (Teacher Network) based on S101 to S104, the parameter values of the Student Network and the Teacher Network may be initialized first.

For example, the teacher network and the student network may be initialized using a pre-trained semantic segmentation network.

Here, the pre-trained semantic segmentation network is, for example, a neural network trained based on source images; in the embodiments of the present disclosure, the processes of S101 to S104 are based on the target image, and control the pre-trained semantic segmentation network to perform the migration learning from the source domain to the target domain, so that after the migration learning is performed by the semantic segmentation network, performance of the semantic segmentation network is not degraded when performing the semantic segmentation processing on the image of the target domain.

The images of the source domain include, for example, synthetic images; the images of the target domain include, for example, real images.

After parameter values of the student network and the teacher network are initialized, multiple rounds of iteration are carried out on the student network and the teacher network based on S101-S104, and the teacher network or the student network after the multiple rounds of iteration is determined as a trained neural network. Here, the process of S101 to S104 is performed once, and is a process of performing one iteration of the student network and the teacher network.

Referring to fig. 1, a flowchart of a training method of a neural network provided in an embodiment of the present disclosure is shown, where the method includes:

S101: Performing semantic segmentation processing on a first noise image of the target image by using a student network to obtain a first semantic segmentation image.

In a specific implementation, the first noise image may be obtained by injecting random noise into the target image, for example.

The random noise includes, for example, any one of Gaussian noise, white noise, and the like, and may be determined according to actual needs.

Random noise is injected into the target image to generate the first noise image, and the student network performs semantic segmentation processing on the first noise image. In doing so, the student network obtains a semantic segmentation result for each pixel point in the first noise image; the first semantic segmentation image is then formed from these per-pixel semantic segmentation results. The first semantic segmentation image has the same size as the first noise image.

The pixel value of any pixel point a' in the first semantic segmentation image is the semantic segmentation result of the pixel point a in the first noise image that corresponds to a'.

The training method of the neural network provided by the embodiment of the disclosure further includes:

s102: performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; and determining the credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image.

In a specific implementation, there is no required order between S101 and S102; they may be executed synchronously or asynchronously.

The second noise image is generated in a similar manner to the first noise image and may be obtained, for example, by injecting random noise into the target image. The noise injected differs between the different noise images of the target image.

In one possible embodiment, there is one second noise image; in this case, the teacher network is used to perform semantic segmentation processing on the second noise image, so as to obtain a semantic segmentation result of each pixel point in the second noise image, and then the second semantic segmentation image is formed based on the semantic segmentation result of each pixel point in the second noise image.

In another possible embodiment, there are a plurality of second noise images; in this case, the teacher network is used to perform semantic segmentation processing on the plurality of second noise images of the target image, to obtain an intermediate semantic segmentation image corresponding to each of the second noise images; the second semantic segmentation image is then generated based on the plurality of intermediate semantic segmentation images.

Here, for example, the pixel values of the pixel points at corresponding positions in the plurality of intermediate semantic segmentation images may be averaged in sequence, and the mean value obtained for any position may be determined as the pixel value of the pixel point at that position in the second semantic segmentation image.
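The position-wise averaging can be sketched as follows, assuming each intermediate semantic segmentation image is stored as an h × w × C nested list of per-class scores; the function name and the sample values are hypothetical.

```python
def average_segmentations(intermediate_images):
    """Average N intermediate segmentation images position by position.

    Each image is a nested list of shape h x w x C holding per-class scores.
    """
    n = len(intermediate_images)
    h = len(intermediate_images[0])
    w = len(intermediate_images[0][0])
    c = len(intermediate_images[0][0][0])
    return [[[sum(img[y][x][k] for img in intermediate_images) / n
              for k in range(c)]
             for x in range(w)]
            for y in range(h)]


# Two 1 x 2 intermediate images with C = 2 classes (illustrative values).
a1 = [[[0.75, 0.25], [0.5, 0.5]]]
a2 = [[[0.25, 0.75], [1.0, 0.0]]]
second_segmentation = average_segmentations([a1, a2])
```

Where the two illustrative teacher outputs disagree (the first pixel point), the average moves toward a uniform distribution, which is the kind of uncertainty the credibility information is later derived from.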

For example, if the size of the target image is h × w and the N second noise images of the target image are A1, A2, …, AN respectively, the teacher network is used to perform semantic segmentation processing on the second noise images to obtain an intermediate semantic segmentation image of the i-th second noise image

Expressed as:

where x_t represents the target image; h represents the height of the target image and w represents the width of the target image; and C represents the semantic segmentation type of the teacher network.

Second semantically segmented image

For example, the following formula (1) is satisfied:

In this way, random noise is injected into the target image multiple times to generate multiple second noise images, and the second semantic segmentation image is obtained from the intermediate semantic segmentation images corresponding to the respective second noise images. More of the uncertainty information in the second noise images can thereby be extracted, so that the credibility information of each pixel point in the second semantic segmentation image obtained from the second noise images is more pronounced, which improves the efficiency of optimizing the parameter values of the student network.

After obtaining the second semantic segmentation image, referring to fig. 2, the embodiment of the present disclosure further provides a specific method for determining reliability information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image, including:

S201: determining the information entropy of each pixel point in the second semantic segmentation image based on the pixel value of each pixel point in the second semantic segmentation image.

Here, the information entropy of any pixel point

For example, the following formula (2) is satisfied:

S202: determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold.

Here, the information entropy threshold may be determined based on, for example, the semantic segmentation type of the teacher network.

The information entropy threshold H satisfies, for example, the following formula (3):

where a, b and c are all hyper-parameters; k_max = log C; C represents the semantic segmentation type of the teacher network; t represents the current iteration round number; and t_max represents the maximum number of iteration rounds.

Illustratively, the information entropy threshold satisfies, for example:

For example, the information entropy of each pixel point in the second semantic segmentation image may be compared with the predetermined information entropy threshold, and the credibility information of each pixel point in the second semantic segmentation image is then determined based on the comparison result.

If the absolute value of the information entropy of any pixel point in the second semantic segmentation image is larger than the information entropy threshold, the credibility information corresponding to that pixel point is set to a preset value indicating that the pixel value of that pixel point is credible, where the preset value is larger than 0.

In a specific implementation, as can be seen from formula (2) above, the value of the information entropy is a negative number. For a given pixel point in the second semantic segmentation image, the smaller the value of its information entropy, the higher its credibility, that is, the more credible the classification, represented by its pixel value, of the corresponding pixel point in the target image. When the consistency loss between the first semantic segmentation image and the second semantic segmentation image is determined, the influence on the consistency loss of the pixel points with higher credibility in the second semantic segmentation image is increased, while the influence of the pixel points with lower credibility can be reduced or even eliminated.

Further, for example, the preset value indicating that the pixel value is credible may be set to 1, and the preset value indicating that the pixel value is not credible may be set to 0.

For another example, the preset value indicating that the pixel value is credible may be set to 1, the preset value indicating that the pixel value is not credible may be set to 0.5, and so on.

The specific setting can be carried out according to the actual needs.

Further, for example, the reliability information of each pixel point in the second semantic segmentation image satisfies the following formula (4):

where H represents the information entropy threshold, and I(·) represents a 0-1 function: I(·) takes 1 when the condition in formula (4) is satisfied, and takes 0 when it is not.
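A minimal sketch of the comparison described above is given below, assuming the per-pixel information entropy values have already been computed. The comparison direction is taken literally from the text (a pixel point is credible when the absolute value of its information entropy is larger than the threshold); the function name, the sample entropy values and the threshold are hypothetical.

```python
def credibility_map(entropy_map, threshold, credible=1.0, not_credible=0.0):
    """Assign a credibility value to each pixel point.

    Follows the rule as stated in the text: a pixel point is treated as
    credible when the absolute value of its information entropy is larger
    than the threshold. `credible` / `not_credible` are the preset values.
    """
    return [[credible if abs(e) > threshold else not_credible for e in row]
            for row in entropy_map]


# Illustrative per-pixel information entropy values and threshold.
entropy_map = [[-0.9, -0.1],
               [-0.4, -0.6]]
hard_mask = credibility_map(entropy_map, threshold=0.5)
soft_mask = credibility_map(entropy_map, threshold=0.5, not_credible=0.5)
```

With the preset values 1 and 0, pixel points that are not credible are removed from the consistency loss entirely; with 1 and 0.5, their influence is only reduced.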

Following S101 and S102 above, the training method of the neural network provided by the embodiment of the present disclosure further includes:

S103: updating parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, and the credibility information.

S104: updating the parameter values of the teacher network based on the updated parameter values of the student network.

In a particular implementation, for example, a loss of consistency between the first semantically segmented image and the second semantically segmented image may be determined based on the first semantically segmented image, the second semantically segmented image, and the credibility information, and then parameter values of a student network may be updated based on the loss of consistency.

In a specific implementation, as can be seen from formula (3) above, H is a time-dependent function. The consistency loss may be, for example, a mean square error between the first semantic segmentation image produced by the student network and the second semantic segmentation image produced by the teacher network; the consistency loss L_con satisfies, for example, the following formula (5):

where f_S represents the student network; f_T represents the teacher network; x_t1 represents the first noise image; x_t2 represents the second noise image; and σ denotes an activation function, for example a softmax activation function.

When the parameter values of the student network are updated based on the consistency loss, the parameter values of the student network are adjusted, for example, in the direction that reduces the consistency loss.
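One possible form of such a credibility-weighted consistency loss is sketched below. It assumes that both networks output per-pixel class scores, applies a softmax to each, and averages the weighted squared error over all pixel points; this normalisation and all names are assumptions of the sketch, and formula (5) may normalise differently.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(student_logits, teacher_logits, credibility):
    """Credibility-weighted mean squared error between softmax outputs.

    Inputs are h x w x C nested lists (logits) and an h x w credibility map.
    Normalising by the number of pixels is an assumption of this sketch.
    """
    total, count = 0.0, 0
    for s_row, t_row, r_row in zip(student_logits, teacher_logits, credibility):
        for s, t, r in zip(s_row, t_row, r_row):
            ps, pt = softmax(s), softmax(t)
            total += r * sum((a - b) ** 2 for a, b in zip(ps, pt))
            count += 1
    return total / count


# One row of two pixel points, C = 2 classes (illustrative values).
student = [[[2.0, 0.0], [0.0, 1.0]]]
teacher = [[[0.0, 2.0], [0.0, 1.0]]]
loss_all = consistency_loss(student, teacher, [[1.0, 1.0]])
loss_masked = consistency_loss(student, teacher, [[0.0, 1.0]])
```

In the second call the first pixel point, where the two networks disagree, has credibility 0 and therefore contributes nothing to the loss.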

When updating the parameter values of the teacher network based on the updated parameter values of the student network, for example, exponential moving average processing may be performed on the parameter values of the parameters in the student network to obtain target parameter values; and replacing the parameter value of the corresponding parameter in the teacher network by using the target parameter value.
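The exponential moving average update can be illustrated as follows. The smoothing coefficient `decay` and the dictionary representation of the parameters are assumptions of this sketch; the disclosure does not specify a value for the coefficient.

```python
def ema_update(teacher_params, student_params, decay=0.99):
    """Return new teacher parameters as an exponential moving average.

    `decay` is a hypothetical smoothing coefficient: each teacher value is
    replaced by decay * old teacher value + (1 - decay) * student value.
    """
    return {name: decay * teacher_params[name]
            + (1.0 - decay) * student_params[name]
            for name in teacher_params}


# Illustrative scalar parameters; real networks hold tensors per parameter.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 3.0, "b": 2.0}
teacher = ema_update(teacher, student, decay=0.5)
```

A coefficient close to 1 makes the teacher network change slowly, so that it acts as a smoothed history of the student network.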

In a specific implementation, it can be seen from formula (4) and formula (5) that when the semantic segmentation result represented by a pixel point in the second semantic segmentation image is credible, the credibility information corresponding to that pixel point is 1, and when it is not credible, the credibility information corresponding to that pixel point is 0. The consistency loss between the first semantic segmentation image and the second semantic segmentation image is therefore determined only from the pixel points of the second semantic segmentation image whose semantic segmentation results are credible. As a result, when the parameter values of the student network are updated based on the consistency loss, the semantic segmentation results produced by the student network and the teacher network for the target image under different disturbances tend to become consistent. The parameter values of the teacher network are then updated based on the updated parameter values of the student network, so that the parameter values of the two networks change in a consistent direction and both networks can learn the specific characteristics of the target image.

In the embodiment of the disclosure, the first noise image and the second noise image are both images obtained by applying different disturbances to the target image; performing semantic segmentation processing on the first noise image by using a student network to obtain a first semantic segmentation image, performing semantic segmentation processing on the second noise image by using a teacher network to obtain a second semantic segmentation image, determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image, updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image and the credibility information, and updating parameter values of the teacher network based on the updated parameter values of the student network; in the process, the first semantic segmentation image, the second semantic segmentation image and the credibility information are used for controlling the student network and the teacher network to predict the same target image after disturbance to generate a consistent prediction result, so that the student network can learn specific characteristics in the target image in the process of migrating based on the target image, namely, the student network performs migration learning towards a specific direction, and the parameter value of the teacher network is updated according to the parameter value of the student network, so that the teacher network performs migration learning towards the specific direction, and the problem of negative migration is avoided.

Referring to fig. 3, an embodiment of the present disclosure further provides another training method for a neural network, including:

S301: performing semantic segmentation processing on the first noise image of the target image by using a student network to obtain a first semantic segmentation image.

S302: performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; and determining the credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image.

The specific implementation process of S301 to S302 is similar to that of S101 to S102, and is not described herein again.

S303: and performing semantic segmentation processing on the style migration image of the source image by using a student network to obtain a third semantic segmentation image, wherein the style migration image of the source image is an image obtained by migrating the style of the source image to a target domain where the target image is located.

In a specific implementation, there is no required order between S303 and S301 to S302 above; they may be executed synchronously or asynchronously.

Specifically, the style migration image of the source image may be obtained, for example, in the following manner:

Style migration processing is performed on the source image by using a pre-trained style migration network, to obtain the style migration image corresponding to the source image; the style migration network is trained by using the source image and the target image.

In one embodiment, the style migration network is, for example, a generative adversarial network (GAN), such as CycleGAN. The generative adversarial network can integrate the semantic information of the source domain carried in the source image with the semantic information of the target domain carried in the target image, so that the source image is converted into a style migration image containing some of the features of the target image; semantic segmentation processing is then performed on the style migration image by using the student network.

In addition, the style migration image may also be generated by using a style migration network with another architecture, for example a neural network with an architecture such as VGG or GoogLeNet; the architecture may be selected according to actual needs.

In connection with the above S302 and S303, the training method of the neural network provided by the embodiment of the present disclosure further includes:

S304: updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the annotation information of the source image.

In a particular embodiment, the parameter values of the student network may be updated, for example, in the following manner: generating consistency loss of the first semantic segmentation image and the second semantic segmentation image based on the first semantic segmentation image, the second semantic segmentation image and the credibility information; generating semantic segmentation loss based on the third semantic segmentation image and the annotation information of the source image; parameters of the student network are updated based on the consistency loss and the semantic segmentation loss.

For example, the semantic segmentation loss L_seg is a cross-entropy loss used for optimization on the source image from the source domain, and satisfies, for example, the following formula (6):

where H represents the height of the style migration image; W represents the width of the style migration image; C represents the number of channels; y_s represents the annotation information of the source image; the other two terms of the formula represent the third semantic segmentation image and the source image, respectively; and f_S(·) represents the student network.
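As a hedged illustration of a per-pixel cross-entropy loss of this kind, the sketch below assumes that the third semantic segmentation image holds class probabilities and that the annotation information gives one class index per pixel point; averaging over H × W and the function name are assumptions.

```python
import math


def segmentation_loss(probabilities, labels):
    """Mean per-pixel cross-entropy.

    `probabilities` is an H x W x C nested list of class probabilities and
    `labels` an H x W list of integer class indices (the annotation).
    Averaging over H x W is an assumption of this sketch.
    """
    height, width = len(labels), len(labels[0])
    total = 0.0
    for y in range(height):
        for x in range(width):
            # Clamp to avoid log(0) for a zero predicted probability.
            p = max(probabilities[y][x][labels[y][x]], 1e-12)
            total -= math.log(p)
    return total / (height * width)


# One row of two pixel points, C = 2 classes (illustrative values).
probs = [[[0.5, 0.5], [1.0, 0.0]]]
labels = [[0, 0]]
loss = segmentation_loss(probs, labels)
```

The second pixel point is predicted with full confidence in the annotated class and contributes zero loss; the first contributes log 2.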

When updating the parameter values of the student network based on the semantic segmentation loss and the consistency loss, for example, a weight of the consistency loss may be determined according to the current iteration number, and then the parameter values of the student network may be updated according to the consistency loss, the weight of the consistency loss, and the semantic segmentation loss.

The total loss of the student network is determined from the semantic segmentation loss and the consistency loss; the total loss L_total satisfies, for example, the following formula (7):

L_total = L_seg + λ_con · L_con    (7)

where L_seg represents the semantic segmentation loss; L_con represents the consistency loss; and λ_con represents the weight of the consistency loss. λ_con is, for example, a dynamic weight set as a function that rises with the number of iterations, so as to balance the semantic segmentation loss against the consistency loss: the semantic segmentation loss dominates in the early stage of training of the neural network, and the consistency loss gradually gains weight in the later stage, so that the convergence of the parameter values of the neural network is controlled stably.
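A sketch of formula (7) with a rising weight is given below. The exponential ramp-up used for λ_con is only one common choice and is an assumption here; the text requires only that the weight rise with the number of iterations. The function names and `max_weight` are likewise hypothetical.

```python
import math


def consistency_weight(iteration, max_iterations, max_weight=1.0):
    """Hypothetical ramp-up: rises monotonically from near 0 to `max_weight`."""
    progress = min(max(iteration / max_iterations, 0.0), 1.0)
    return max_weight * math.exp(-5.0 * (1.0 - progress) ** 2)


def total_loss(seg_loss, con_loss, iteration, max_iterations):
    """Formula (7): L_total = L_seg + lambda_con * L_con."""
    return seg_loss + consistency_weight(iteration, max_iterations) * con_loss


# Same losses at the start and at the end of training (illustrative values).
early = total_loss(1.0, 2.0, iteration=0, max_iterations=100)
late = total_loss(1.0, 2.0, iteration=100, max_iterations=100)
```

At iteration 0 the consistency term is almost switched off, so the total loss is dominated by the semantic segmentation loss; by the final iteration the consistency loss enters at full weight.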

Following S304 above, the training method for a neural network provided in the embodiment of the present disclosure further includes:

S305: updating the parameter values of the teacher network based on the updated parameter values of the student network.

Here, the specific implementation process of S305 is similar to that of S104 described above, and is not described herein again.

According to the method and the device, the style migration image of the source image is subjected to semantic segmentation processing by utilizing the student network to obtain a third semantic segmentation image, and then the parameter value updating process of the student network is supervised based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the annotation information of the source image, so that the semantic segmentation precision of the student network and the teacher network can be further improved.

Referring to fig. 4, an embodiment of the present disclosure further provides a specific example of a training method for a neural network, including:

Step 1: input a source image x_s into the style migration network to obtain the style migration image of the source image x_s.

Step 2: input the style migration image into the student network to obtain a third semantic segmentation image.

Step 3: obtain the semantic segmentation loss L_seg based on the annotation information y_s of the source image x_s and the third semantic segmentation image.

Step 4: inject random noise into the target image x_t to generate a first noise image, and input the first noise image into the student network to obtain a first semantic segmentation image.

Step 5: inject random noise into the target image x_t to generate N second noise images, and input the N second noise images into the teacher network to obtain a plurality of intermediate semantic segmentation images.

Step 6: compute, in sequence, the mean of the pixel values of the pixel points at corresponding positions in the plurality of intermediate semantic segmentation images to obtain a second semantic segmentation image.

Step 7: calculate the information entropy of each pixel point in the second semantic segmentation image according to formula (2).

Step 8: calculate the credibility according to formula (4) to obtain the credibility information of each pixel point in the second semantic segmentation image.

Step 9: obtain the consistency loss L_con between the first semantic segmentation image and the second semantic segmentation image according to the first semantic segmentation image, the second semantic segmentation image and the credibility information.

Step 10: calculate the total loss L_total according to formula (7).

Step 11: update the parameter values of the student network according to the total loss L_total.

Step 12: perform exponential moving average processing on the updated parameter values of the student network, and update the parameter values of the teacher network based on the result of the exponential moving average processing.

Through the process, one round of iteration of the student network and the teacher network is realized.

It will be understood by those skilled in the art that, in the above method, the order in which the steps are written does not imply a strict order of execution or impose any limitation on the implementation; the specific order of execution of the steps should be determined by their functions and possible inherent logic.

Referring to fig. 5, an embodiment of the present disclosure further provides an image processing method, including:

S501: acquiring an image to be processed;

S502: performing semantic segmentation processing on the image to be processed by using a neural network trained by the training method of the neural network according to any embodiment of the disclosure, to obtain a semantic segmentation result of the image to be processed.

The image processing method of the embodiment of the present disclosure performs semantic segmentation processing on the image to be processed by using a neural network trained by the neural network training method provided by the embodiments of the present disclosure. A neural network trained in this way has higher semantic segmentation precision on the image to be processed, so the obtained semantic segmentation result of the image to be processed is more accurate.

Referring to fig. 6, an embodiment of the present disclosure further provides an intelligent driving control method, including:

S601: acquiring an image acquired by a driving device in the driving process;

S602: detecting a target object in the image by using a neural network trained by the training method of the neural network according to any embodiment of the disclosure;

S603: controlling the driving device based on the detected target object.

In a specific implementation, the driving device is, for example, but not limited to, any one of the following: an autonomous vehicle, a vehicle equipped with an Advanced Driving Assistance System (ADAS), a robot, or the like.

Controlling the driving device includes, for example, controlling the driving device to accelerate, decelerate, steer, brake and the like, or playing voice prompt information to prompt the driver to control the driving device to accelerate, decelerate, steer, brake and the like.

The intelligent driving control method of the embodiment of the disclosure is realized by utilizing the neural network trained by the neural network training method provided by the embodiment of the disclosure, and when the neural network trained by the neural network training method performs semantic segmentation processing on the image obtained in the driving process, a more accurate semantic segmentation processing result can be obtained, thereby ensuring higher safety in the driving control process.

Based on the same inventive concept, the embodiment of the present disclosure further provides a training apparatus for a neural network corresponding to the training method for the neural network, and since the principle of the apparatus in the embodiment of the present disclosure for solving the problem is similar to the training method for the neural network described above in the embodiment of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not repeated.

Referring to fig. 7, a schematic diagram of a training apparatus for a neural network provided in an embodiment of the present disclosure is shown, where the apparatus includes: a first processing module 71, a second processing module 72, a first updating module 73, and a second updating module 74; wherein,

the first processing module 71 is configured to perform semantic segmentation processing on a first noise image of the target image by using a student network to obtain a first semantic segmentation image;

a second processing module 72, configured to perform semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image;

a first updating module 73, configured to update parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, and the reliability information;

a second updating module 74 for updating the parameter values of the teacher network based on the updated parameter values of the student network.

In a possible embodiment, the apparatus further comprises: the third processing module 75 is configured to perform semantic segmentation processing on the style migration image of the source image by using a student network to obtain a third semantic segmentation image, where the style migration image of the source image is an image obtained by migrating the style of the source image to a target domain where the target image is located;

the first updating module 73, when updating the parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, and the reliability information, is configured to:

updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image and the annotation information of the source image.

In a possible embodiment, the first updating module 73, when updating the parameter values of the student network based on the first semantically segmented image, the second semantically segmented image, the credibility information, the third semantically segmented image, and the annotation information of the source image, is configured to:

determining a consistency loss based on the first semantically segmented image, the second semantically segmented image and the credibility information; determining a weight of the consistency loss based on a current iteration number;

determining semantic segmentation losses based on the third semantically segmented image and annotation information of the source image;

updating parameter values for the student network based on the consistency loss, the weights, and the semantic segmentation loss.

In one possible embodiment, the second processing module 72, when performing semantic segmentation processing on the second noise image of the target image by using a teacher network to obtain a second semantic segmentation image, is configured to:

performing semantic segmentation processing on the plurality of second noise images of the target image by using a teacher network to obtain a plurality of intermediate semantic segmentation images;

and generating the second semantic segmentation image based on the plurality of intermediate semantic segmentation images.

In one possible embodiment, the second processing module 72, when generating the second semantically segmented image based on the plurality of intermediate semantically segmented images, is configured to:

calculating a pixel value mean value of pixel points at corresponding positions in the multiple intermediate semantic segmentation images in sequence;

and determining the average value of the pixel points at any corresponding position as the pixel value of the pixel point at the corresponding position in the second semantic segmentation image.

In one possible embodiment, the second processing module 72, when determining, based on the second semantically segmented image, the reliability information of each pixel point in the second semantically segmented image, is configured to:

determining the information entropy of each pixel point in the second semantic segmentation image based on the pixel value of each pixel point in the second semantic segmentation image;

and determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold.

In one possible embodiment, the second processing module 72, when determining the reliability information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold, is configured to:

comparing the information entropy of each pixel point in the second semantic segmentation image with the information entropy threshold;

determining the credibility information of each pixel point in the second semantic segmentation image based on the comparison result;

In a possible implementation, the second processing module 72 is further configured to generate the information entropy threshold value by:

and determining the information entropy threshold value based on the semantic segmentation type of the teacher network.

In one possible embodiment, the second updating module 74, when updating the parameter values of the teacher network based on the updated parameter values of the student network, is configured to:

performing exponential moving average processing on parameter values of parameters in the student network to obtain target parameter values;

and replacing the parameter value of the corresponding parameter in the teacher network by using the target parameter value.

In a possible embodiment, the apparatus further comprises: a first generating module 76, configured to generate the style migration image in the following manner:

In a possible embodiment, the apparatus further comprises: an initialization module 77, configured to initialize the teacher network and the student network by using a pre-trained semantic segmentation network.

In a possible embodiment, the apparatus further comprises: a second generating module 78, configured to generate the first noise image and the second noise image in the following manner:

injecting random noise into the target image to obtain the first noise image and the second noise image; wherein, the noise corresponding to different noise images is different.

The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.

Referring to fig. 8, an embodiment of the present disclosure further provides an image processing apparatus, including:

an obtaining module 81, configured to obtain an image to be processed;

the processing module 82 is configured to perform semantic segmentation on the image to be processed by using the neural network trained by the neural network training method according to any embodiment of the present disclosure, so as to obtain a semantic segmentation result of the image to be processed.

Referring to fig. 9, an embodiment of the present disclosure further provides an intelligent driving control device, including:

the data acquisition module 91 is used for acquiring images acquired by the running device in the running process;

a detection module 92, configured to detect a target object in the image by using a neural network trained by a neural network training method according to any embodiment of the present disclosure;

and a control module 93 for controlling the running device based on the detected target object.

An embodiment of the present disclosure further provides an electronic device 10. Fig. 10 is a schematic structural diagram of the electronic device 10 provided in the embodiment of the present disclosure, which includes:

a processor 11 and a memory 12; the memory 12 stores machine-readable instructions executable by the processor 11, and when the electronic device runs, the processor 11 executes the machine-readable instructions to perform the following steps:

performing semantic segmentation processing on a first noise image of a target image by using a student network to obtain a first semantic segmentation image; performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image; updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, and the credibility information; and updating the parameter values of the teacher network based on the updated parameter values of the student network.
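The credibility step and its use in the student update can be sketched as follows. The claims state that the credibility information is determined from the information entropy of each pixel point and an information entropy threshold; treating pixels whose entropy is below the threshold as credible, and using a mean squared error over credible pixels as the consistency term, are assumptions made for this sketch, as are all function names.

```python
import math


def pixel_entropy(probs):
    """Shannon entropy of one pixel's class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def credibility_mask(teacher_probs, threshold):
    """1 where the teacher's entropy is below the threshold, else 0 (assumed rule)."""
    return [[1 if pixel_entropy(px) < threshold else 0 for px in row]
            for row in teacher_probs]


def consistency_loss(student_probs, teacher_probs, mask):
    """Mean squared error over credible pixels only (assumed loss form)."""
    total, count = 0.0, 0
    for s_row, t_row, m_row in zip(student_probs, teacher_probs, mask):
        for s_px, t_px, m in zip(s_row, t_row, m_row):
            if m:
                total += sum((s - t) ** 2 for s, t in zip(s_px, t_px))
                count += 1
    return total / count if count else 0.0


# One row of two pixels, two classes per pixel.
teacher = [[[0.9, 0.1], [0.5, 0.5]]]  # second semantic segmentation image
student = [[[0.8, 0.2], [0.1, 0.9]]]  # first semantic segmentation image
mask = credibility_mask(teacher, threshold=0.5)
loss = consistency_loss(student, teacher, mask)
```

In this toy input the second pixel of the teacher output is maximally uncertain, so it is masked out and the student is not pulled toward it; the resulting loss would then drive the update of the student's parameter values.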

Or implementing the following steps: acquiring an image to be processed; performing semantic segmentation processing on the image to be processed by using a neural network trained by the neural network training method according to any embodiment of the present disclosure, to obtain a semantic segmentation result of the image to be processed;

or implementing the following steps: acquiring an image captured by a driving device while driving; detecting a target object in the image by using a neural network trained by the neural network training method according to any embodiment of the present disclosure; and controlling the driving device based on the detected target object.

For the specific execution process of the instructions, reference may be made to the steps of the neural network training method, the image processing method, or the intelligent driving control method described in the embodiments of the present disclosure; details are not repeated here.

The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the neural network training method described in the above method embodiments, or performs the steps of the image processing method described in the above method embodiments, or performs the steps of the intelligent driving control method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.

The computer program product of the neural network training method and the image processing method provided in the embodiments of the present disclosure includes a computer readable storage medium storing a program code, where instructions included in the program code may be used to execute the neural network training method, the image processing method, or the intelligent driving control method described in the embodiments of the above methods, and specific reference may be made to the embodiments of the above methods, which are not described herein again.

The embodiments of the present disclosure also provide a computer program which, when executed by a processor, implements any one of the methods of the foregoing embodiments. The computer program product may be implemented in hardware, software, or a combination thereof. In one alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, the computer program product is embodied as a software product, such as a Software Development Kit (SDK).

It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing an electronic device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Finally, it should be noted that the above embodiments are merely specific implementations of the present disclosure, intended to illustrate rather than limit its technical solutions, and the protection scope of the present disclosure is not limited thereto. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope of the present disclosure, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure, and shall be construed as being included in the protection scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims

1. A method of training a neural network, comprising:

performing semantic segmentation processing on a first noise image of a target image by using a student network to obtain a first semantic segmentation image;

performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image; determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image;

updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, and the credibility information;

updating the parameter values of the teacher network based on the updated parameter values of the student network.

2. The training method according to claim 1, further comprising:

performing semantic segmentation processing on a style migration image of a source image by using the student network to obtain a third semantic segmentation image, wherein the style migration image of the source image is an image obtained by migrating the style of the source image to a target domain in which the target image is located;

wherein the updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, and the credibility information comprises:

3. The training method according to claim 2, wherein the updating parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, the credibility information, the third semantic segmentation image, and annotation information of the source image comprises:

4. The training method according to any one of claims 1 to 3, wherein the performing semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image comprises:

5. The training method according to claim 4, wherein the generating the second semantic segmentation image based on the plurality of intermediate semantic segmentation images comprises:

6. The training method according to any one of claims 1 to 5, wherein the determining credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image comprises:

7. The training method according to claim 6, wherein the determining the credibility information of each pixel point in the second semantic segmentation image based on the information entropy of each pixel point in the second semantic segmentation image and a predetermined information entropy threshold comprises:

8. The training method according to claim 6 or 7, wherein the information entropy threshold is generated in the following manner:

9. The training method according to any one of claims 1 to 8, wherein the updating the parameter values of the teacher network based on the updated parameter values of the student network comprises:

10. An image processing method, comprising:

acquiring an image to be processed;

performing semantic segmentation processing on the image to be processed by using the neural network trained by the neural network training method according to any one of claims 1 to 9 to obtain a semantic segmentation result of the image to be processed.

11. An intelligent driving control method, comprising:

acquiring an image captured by a driving device while driving;

detecting a target object in the image by using a neural network trained by the neural network training method according to any one of claims 1 to 9;

controlling the driving device based on the detected target object.

12. An apparatus for training a neural network, comprising:

a first processing module, configured to perform semantic segmentation processing on a first noise image of a target image by using a student network to obtain a first semantic segmentation image;

a second processing module, configured to perform semantic segmentation processing on a second noise image of the target image by using a teacher network to obtain a second semantic segmentation image, and to determine credibility information of each pixel point in the second semantic segmentation image based on the second semantic segmentation image;

a first updating module, configured to update parameter values of the student network based on the first semantic segmentation image, the second semantic segmentation image, and the credibility information;

a second updating module, configured to update the parameter values of the teacher network based on the updated parameter values of the student network.

13. An image processing apparatus, comprising:

an acquisition module, configured to acquire an image to be processed;

a processing module, configured to perform semantic segmentation processing on the image to be processed by using the neural network trained by the neural network training method according to any one of claims 1 to 9, so as to obtain a semantic segmentation result of the image to be processed.

14. An intelligent driving control device, comprising:

a data acquisition module, configured to acquire an image captured by a driving device while driving;

a detection module, configured to detect a target object in the image by using a neural network trained by the neural network training method according to any one of claims 1 to 9;

a control module, configured to control the driving device based on the detected target object.

15. An electronic device, comprising: a processor, and a memory storing machine-readable instructions executable by the processor, wherein the processor is configured to execute the machine-readable instructions stored in the memory, and the machine-readable instructions, when executed by the processor, cause the processor to perform the steps of the method according to any one of claims 1 to 11.

16. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by an electronic device, causes the electronic device to perform the steps of the method according to any one of claims 1 to 11.