CN113780101A - Obstacle avoidance model training method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113780101A
Authority
CN
China
Prior art keywords
model
obstacle avoidance
target
training
updated
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110963071.1A
Other languages
Chinese (zh)
Inventor
张宝丰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Kunpeng Jiangsu Technology Co Ltd
Original Assignee
Jingdong Kunpeng Jiangsu Technology Co Ltd

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The embodiments of the invention disclose a method and a device for training an obstacle avoidance model, an electronic device, and a storage medium. The method comprises the following steps: updating learnable configuration parameters corresponding to an obstacle avoidance model to be trained according to a target loss function corresponding to the obstacle avoidance model to be trained; updating target candidate values of the indexes of a backbone network of the obstacle avoidance model to be trained according to the learnable configuration parameters, to obtain an obstacle avoidance training model to be updated; inputting a training pseudo-image into the obstacle avoidance training model to be updated to obtain actual output data corresponding to the training pseudo-image; and, if the target loss function corresponding to the obstacle avoidance training model to be updated converges, taking the obstacle avoidance training model to be updated as the target obstacle avoidance model. This technical scheme addresses the problems in the prior art that the model structure must be adjusted manually through extensive effort, which is cumbersome and inefficient; it adjusts the model structure dynamically and makes determining the model structure parameters more convenient and more accurate.

Description

Obstacle avoidance model training method and device, electronic equipment and storage medium
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a training method of an obstacle avoidance model, an obstacle avoidance method, an obstacle avoidance device, electronic equipment and a storage medium.
Background
Currently, in an autonomous driving scenario, a three-dimensional point cloud may be obtained from a camera and a lidar, and a corresponding image may be generated from the point cloud. The image is then processed by a data processing model to obtain obstacle information, which is used to control the unmanned vehicle to avoid obstacles.
The data processing model used at present may be a model obtained by directly taking a trained model, or a model obtained by artificially determining a model structure according to experience or experimental data and then training.
In implementing the present invention, the inventor found that the approaches described above have the following problems:
when a model structure is determined manually, a large amount of experimental analysis is required to find the structure suited to the current scene, and the user's subjective judgment enters the result, so the model structure may not fit the current scene; furthermore, when the task or the scene changes, a large number of experiments must be repeated from scratch before the user can again determine the model structure from the test results, so the process is cumbersome and the model may still not match the scene.
In summary, the model structure currently has to be adjusted manually, which is cumbersome, and a model trained and used with a manually adjusted structure may not match the scene, which lowers the accuracy of the processing result.
Disclosure of Invention
The invention provides a method and a device for training an obstacle avoidance model, an electronic device, and a storage medium, which adapt the resulting target obstacle avoidance model to its application scene and thereby improve the accuracy of data processing.
In a first aspect, an embodiment of the present invention provides a method for training an obstacle avoidance model, where the method includes:
updating a learnable configuration parameter corresponding to the obstacle avoidance model to be trained according to a target loss function corresponding to the obstacle avoidance model to be trained; the target loss function is determined according to a penalty function corresponding to the obstacle avoidance model to be trained and a preset loss function;
updating target candidate values of various indexes of a backbone network of the obstacle avoidance model to be trained according to the learnable configuration parameters to obtain the obstacle avoidance training model to be updated; the backbone network comprises convolution layers, wherein each convolution layer comprises at least one convolution, and each convolution comprises at least one index;
inputting a training pseudo image into the obstacle avoidance training model to be updated to obtain actual output data corresponding to the training pseudo image;
and if the target loss function corresponding to the obstacle avoidance training model to be updated is converged, taking the obstacle avoidance training model to be updated as a target obstacle avoidance model.
In a second aspect, an embodiment of the present invention further provides a device for training an obstacle avoidance model, where the device includes:
the learnable configuration parameter determining module is used for updating the learnable configuration parameters corresponding to the obstacle avoidance model to be trained according to the target loss function corresponding to the obstacle avoidance model to be trained; the target loss function is determined according to a penalty function corresponding to the obstacle avoidance model to be trained and a preset loss function;
the obstacle avoidance training model to be updated determining module is used for updating target candidate values of various indexes of a backbone network of the obstacle avoidance model to be trained according to the learnable configuration parameters to obtain the obstacle avoidance training model to be updated; the backbone network comprises convolution layers, wherein each convolution layer comprises at least one convolution, and each convolution comprises at least one index;
the output data determining module is used for inputting the training pseudo-image into the obstacle avoidance training model to be updated to obtain actual output data corresponding to the training pseudo-image;
and the target obstacle avoidance model determining module is used for taking the to-be-updated obstacle avoidance training model as a target obstacle avoidance model if a target loss function corresponding to the to-be-updated obstacle avoidance training model is converged.
In a third aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
one or more processors;
a storage device for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors implement the method for training the obstacle avoidance model according to any one of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the method for training an obstacle avoidance model according to any one of the embodiments of the present invention.
In the technical scheme of the embodiment of the invention, target candidate values corresponding to the indexes of the backbone network of the obstacle avoidance model to be trained are determined according to the acquired learnable configuration parameters, and the model structure of the obstacle avoidance model to be trained is determined according to the target candidate values. The obstacle avoidance model to be trained with the determined model structure is taken as the obstacle avoidance training model to be updated, and the current training image is input into it to obtain actual output data. When the target loss function, determined according to the actual output data and the target penalty function of the obstacle avoidance training model to be updated, has converged, the obstacle avoidance training model to be updated is taken as the target obstacle avoidance model. This solves the problems in the prior art that manually adjusting the model structure is laborious and therefore inefficient, and that the resulting model structure may not match the scene. The model parameters are adjusted dynamically according to the learnable configuration parameters, so the obtained target obstacle avoidance model matches the specific application scene and the model is determined more accurately; in turn, pseudo-images are processed more accurately when the target obstacle avoidance model is used.
Drawings
In order to more clearly illustrate the technical solutions of the exemplary embodiments of the present invention, a brief description is given below of the drawings used in describing the embodiments. It should be clear that the described figures are only views of some of the embodiments of the invention to be described, not all, and that for a person skilled in the art, other figures can be derived from these figures without inventive effort.
Fig. 1 is a schematic flow chart of a method for training an obstacle avoidance model according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of a method for training an obstacle avoidance model according to a second embodiment of the present invention;
fig. 3 is a schematic flow chart of a method for training an obstacle avoidance model according to a third embodiment of the present invention;
fig. 4 is a schematic diagram of a framework for determining a target obstacle avoidance model according to a second embodiment of the present invention;
fig. 5 is a schematic flow chart of a method for training an obstacle avoidance model according to a fourth embodiment of the present invention;
fig. 6 is a schematic flow chart of a method for training an obstacle avoidance model according to a fifth embodiment of the present invention;
fig. 7 is a schematic structural diagram of a training device for an obstacle avoidance model according to a sixth embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to a seventh embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a schematic flow chart of a method for training an obstacle avoidance model according to an embodiment of the present invention. This embodiment is applicable to training an obstacle avoidance model to be trained. The method may be executed by a device for training an obstacle avoidance model; the device may be implemented in software and/or hardware, and the hardware may be an electronic device such as a mobile terminal or a PC terminal. The technical scheme may be executed by a server, by a terminal device, or by the two in cooperation.
Before the technical solution is introduced, an application scenario may be exemplarily described. The obstacle avoidance model training method provided by the embodiment of the invention can be applied to any model. For clearly describing the technical scheme of the embodiment of the invention, a corresponding target obstacle avoidance model in the field of unmanned driving can be taken as an example for description.
The unmanned vehicle can be equipped with a lidar and sense surrounding obstacles with it. The lidar scan is processed to obtain a three-dimensional point cloud, which can be converted into a picture-like "pseudo image" using the existing PointPillars algorithm. The pseudo-image can be processed by the trained target obstacle avoidance model to obtain an obstacle avoidance result, and the automatic driving algorithm can control the unmanned vehicle on the road according to that result.
A model usually includes convolutional layers, pooling layers, regularization layers, and the like. A user can set the parameters of each layer according to experience and then train the model with those settings to obtain the model that is finally used. Because the parameters of each layer are set from the user's experience, the trained model may not fit the scene well, so processing pseudo-images with the resulting obstacle avoidance model gives poor accuracy and poor results. That is, in existing model training, the structural parameters of each layer are fixed first and only the ordinary model parameters are then adjusted based on the loss function, so the resulting model may not fit the actual requirement.
It should be noted that the model training mode provided by the embodiment of the present invention may be applicable to model training in any scene, and it is only necessary to adjust training sample data in training samples.
As shown in fig. 1, the method includes:
S110, updating the learnable configuration parameters corresponding to the obstacle avoidance model to be trained according to the target loss function corresponding to the obstacle avoidance model to be trained.
The target loss function is determined according to a penalty function corresponding to the obstacle avoidance model to be trained and a preset loss function. To obtain a target obstacle avoidance model through training, a model can be selected in advance with each of its parameters set to a default value, and this model can be used as the obstacle avoidance model to be trained. During training, the obstacle avoidance model to be trained may therefore be either the model with its initial parameters or a model that has already been trained at least once and whose internal parameters still need to be updated. For example, suppose the model structure of a model needs to be determined. When the first sample is input into the obstacle avoidance model to be trained, the model produces a corresponding output result; if the target loss function corresponding to the model does not converge, an error value can be determined based on the target loss function, and the learnable configuration parameters can be determined according to the error value. Correspondingly, if the target loss function still has not converged after a sample is input into the obstacle avoidance training model to be updated, that model can be used as the obstacle avoidance model to be trained and the learnable configuration parameters are determined again. That is, the obstacle avoidance model to be trained may be a model with default parameters, or an obstacle avoidance training model to be updated obtained at a stage of training where the loss function has not yet converged.
S120, updating target candidate values of various indexes of a backbone network of the obstacle avoidance model to be trained according to the learnable configuration parameters to obtain the obstacle avoidance training model to be updated.
The backbone network can be understood as a network corresponding to a convolution layer in the obstacle avoidance model to be trained. In order to make the trained obstacle avoidance model most adaptive to the current scene, common parameters in the obstacle avoidance model to be trained and model structure parameters corresponding to each index of each convolution in the convolution layer can be trained. When the indexes corresponding to the convolution are different, the structures of the models are also different, namely the models are different in shape. The learnable configuration parameter is a parameter for determining each index in the convolution, that is, based on the learnable configuration parameter, a model structure parameter of the obstacle avoidance model to be trained can be determined. It is understood that the model structure parameters include the target candidate value corresponding to each index in each convolution. The determined model structure parameters may be used as target candidate values corresponding to the respective index items. When the values corresponding to the indexes are different, the structures of the obstacle avoidance models to be trained are different, and the models determined based on the learnable configuration parameters can be used as the obstacle avoidance training models to be updated.
It can be understood that, in the process of training the obstacle avoidance model to be trained, common parameters and model structure parameters in the obstacle avoidance model to be trained can be dynamically adjusted, so that the obtained target obstacle avoidance model is most suitable for the current scene. Meanwhile, the obtained target obstacle avoidance model is determined objectively, and model structure parameters in the obstacle avoidance model to be trained are not determined according to subjective factors of a user.
It should be noted that when training of the obstacle avoidance model to be trained begins, the learnable configuration parameters are determined randomly. In subsequent training, they are determined based on the loss value, which is obtained by applying the target loss function to the actual output of the model and the corresponding theoretical output.
In this embodiment, a corresponding configuration item, i.e., an index item, may be set in advance for each convolution in the backbone network. In order to make the finally obtained target obstacle avoidance model most suitable for the current scene, at least one index item may be set for each convolution, and at least one optional candidate value may be set for each index item. The index item can be at least one of step size, channel number, convolution kernel size and convolution kernel type; the candidate value may be a value that can be selected by each index, and it should be noted that candidate values corresponding to different indexes may be different, but the selectable candidate value corresponding to each index is the same for each convolution. Of course, in order to further improve the adaptation degree between the trained target obstacle avoidance model and the application scenario, for each convolution, the selectable candidate value corresponding to each index may also be different, and a developer may set the selectable candidate value corresponding to each index in each convolution according to actual requirements. For example, the selectable candidate values may be a value that may be selected for the index step size, a number that may be selected for the number of channels, a value that may be selected for the convolution kernel size, and a type that may be selected for the convolution kernel type.
It is understood that, according to the actual situation, a corresponding index and an optional candidate value corresponding to each index may be set for each convolution in the convolution layer.
It should be noted that, for each convolution, the indexes corresponding to the convolution may include the above four types, or may include any number of the four types. That is, the number of indices corresponding to each convolution may be the same or different.
Illustratively, suppose the convolutional layer includes five convolutions, each convolution has the above four indexes, and selectable candidate values are set for each index in each convolution, so that the model structure of the obstacle avoidance model can be adjusted dynamically during training to obtain the target obstacle avoidance model. Assuming that the indexes of the first convolution are step size, number of channels, convolution kernel size, and convolution kernel type, the selectable candidate values may be, for example, {1, 2} for the step size, {32, 48, 64, 96, 128} for the number of channels, {3 × 3, 5 × 5, 7 × 7} for the convolution kernel size, and {depthwise separable convolution, grouped convolution, ordinary convolution} for the convolution kernel type. For the second to fifth convolutions, the selectable candidate values of each index may be the same as for the first convolution.
On the basis of the above example, the target candidate value corresponding to each index in each convolution is determined according to the obtained learnable configuration parameters. For example, the target candidate values of the first convolution may be determined to be a step size of 2, 48 channels, a 3 × 3 convolution kernel, and depthwise separable convolution; the target candidate values of each index in the other convolutions may be determined at the same time. According to the target candidate values of each convolution, the model structure parameters of the obstacle avoidance model to be trained can be determined, that is, the shape of the obstacle avoidance model to be trained is fixed, and the result is used as the obstacle avoidance training model to be updated. Subsequent training is performed based on the obstacle avoidance training model to be updated.
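The example above can be written down as a small data structure, which also shows how quickly the number of possible structures grows and why manual trial and error is costly. This is a minimal sketch: the dictionary keys and function names are hypothetical, and it assumes every convolution chooses each index independently from the same candidate sets. The "grouped" entry is the grouped (packet) convolution type from the example.

```python
from math import prod

# Candidate values for each index, copied from the example in the text.
# The dictionary keys are hypothetical names chosen for this sketch.
SEARCH_SPACE = {
    "step_size": [1, 2],
    "channels": [32, 48, 64, 96, 128],
    "kernel_size": [3, 5, 7],
    "kernel_type": ["depthwise_separable", "grouped", "ordinary"],
}
NUM_CONVS = 5  # five convolutions, all sharing the same candidate sets


def structures_per_conv(space):
    """Number of distinct index combinations for a single convolution."""
    return prod(len(candidates) for candidates in space.values())


def total_structures(space, num_convs):
    """Number of distinct backbones when every convolution chooses independently."""
    return structures_per_conv(space) ** num_convs


print(structures_per_conv(SEARCH_SPACE))          # 90
print(total_structures(SEARCH_SPACE, NUM_CONVS))  # 5904900000
```

With 90 combinations per convolution and five convolutions, there are 90^5 candidate structures, which is the space the learnable configuration parameters are meant to navigate.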
S130, inputting the training pseudo image into the obstacle avoidance training model to be updated to obtain actual output data corresponding to the training pseudo image.
The training pseudo image is an image which participates in training to obtain a target obstacle avoidance model. The training pseudo-image is processed based on the obstacle avoidance training model to be updated, a corresponding result can be output, and the result output by the model can be used as actual output data. The image currently input into the model may be taken as a training pseudo-image.
It should be noted that, during model training, a training sample set may be obtained, where the training sample set includes a plurality of training data, and each training sample data may be a pseudo image and a corresponding theoretical output result. The theoretical output result may be obstacle information in the pseudo image.
In the present embodiment, the determination of the pseudo image may be: acquiring three-dimensional point cloud obtained based on laser radar processing; and processing the three-dimensional point cloud by adopting a preset algorithm to obtain a training pseudo image for training the obstacle avoidance model to be trained.
A lidar can be mounted on the unmanned vehicle and emit light beams. A three-dimensional point cloud can be obtained from the returned beam information. Processing the point cloud with a suitable method yields a picture-like pseudo image. An existing method can also be used to determine the obstacle information corresponding to the pseudo image, which may include the type, position, and so on of each obstacle.
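To make the idea of a picture-like pseudo image concrete, the sketch below rasterizes a point cloud into a two-dimensional bird's-eye-view grid that stores the maximum height seen in each cell. This is a deliberately simplified stand-in and not the algorithm used in the embodiment; the function name, grid ranges, and cell size are all assumptions made for illustration.

```python
def points_to_pseudo_image(points, x_range=(0.0, 4.0), y_range=(0.0, 4.0), cell=1.0):
    """Rasterize (x, y, z) points into a 2-D grid holding the max height per cell.

    Empty cells, and cells whose points all have non-positive height, stay at 0.0.
    """
    cols = int((x_range[1] - x_range[0]) / cell)
    rows = int((y_range[1] - y_range[0]) / cell)
    image = [[0.0] * cols for _ in range(rows)]
    for x, y, z in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue  # drop points that fall outside the grid
        col = int((x - x_range[0]) / cell)
        row = int((y - y_range[0]) / cell)
        image[row][col] = max(image[row][col], z)
    return image


# Four hypothetical lidar returns; the last one lies outside the grid.
cloud = [(0.5, 0.5, 1.2), (0.6, 0.4, 0.8), (3.2, 1.1, 2.0), (9.0, 9.0, 5.0)]
pseudo_image = points_to_pseudo_image(cloud)
for row in pseudo_image:
    print(row)
```

The resulting grid can be fed to a convolutional backbone in the same way as an ordinary single-channel image.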
In this embodiment, a plurality of training samples, that is, training pseudo images, which participate in the obstacle avoidance model training may be obtained in the above manner, and meanwhile, a result corresponding to the training pseudo image may also be determined, and the result is used as theoretical output data.
It should be noted that, in the process of obtaining the target obstacle avoidance model through training, the training pseudo images participating in the training each time are different. However, only one training pseudo image is input into the obstacle avoidance model to be updated at a time for subsequent processing.
S140, if the target loss function corresponding to the obstacle avoidance training model to be updated converges, taking the obstacle avoidance training model to be updated as the target obstacle avoidance model.
When the target candidate values corresponding to the indexes of each convolution differ, the shapes of the models differ, and the penalty functions corresponding to different models differ as well. The target loss function is determined based on the target penalty function and the preset loss function, so different obstacle avoidance training models to be updated have different target loss functions. If the target loss function converges, the trained model is usable, and the obstacle avoidance training model to be updated obtained at that moment can be used as the target obstacle avoidance model. The target obstacle avoidance model is the model that is deployed on an unmanned vehicle and processes pseudo-images to obtain obstacle information.
In this embodiment, a target penalty function of the obstacle avoidance training model to be updated may be determined, and the target loss function is obtained based on the target penalty function and the preset loss function. The actual output data and the theoretical output data can be processed based on the target loss function, if the convergence of the target loss function is detected, the obstacle avoidance model to be updated can be used as the target obstacle avoidance model, namely, the model parameters in the model and the target candidate value corresponding to each convolution are all matched with the current scene.
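The control flow of steps S110 to S140 can be illustrated on a toy problem. In the sketch below, a single scalar `w` stands in for everything that is trained, the "model" is the product `w * x`, the preset loss is a squared error, and the penalty is proportional to `|w|`. All of these choices are assumptions, as are summing the preset loss and the penalty to form the target loss and treating a small change in the loss between steps as convergence; this passage does not specify any of them.

```python
def target_loss(w, sample, lam=0.01):
    """Target loss = preset loss + penalty (the sum is an assumption of this sketch)."""
    x, expected = sample
    actual = w * x                      # stand-in for the model's forward pass (S130)
    preset = (actual - expected) ** 2   # preset loss: squared error (assumed)
    penalty = lam * abs(w)              # penalty term (assumed form)
    return preset + penalty


def train(samples, w=0.0, lr=0.05, tol=1e-4, max_steps=1000, eps=1e-6):
    """Repeat the update until the target loss changes by less than tol between steps."""
    prev = None
    loss = None
    for step in range(max_steps):
        sample = samples[step % len(samples)]   # one training sample per step
        loss = target_loss(w, sample)
        if prev is not None and abs(prev - loss) < tol:
            return w, loss, step                # S140: treat as converged
        # S110: update from the loss, here with a central finite-difference gradient
        grad = (target_loss(w + eps, sample) - target_loss(w - eps, sample)) / (2 * eps)
        w -= lr * grad
        prev = loss
    return w, loss, max_steps


SAMPLES = [(1.0, 2.0), (2.0, 4.0)]  # hypothetical (input, theoretical output) pairs
w, loss, steps = train(SAMPLES)
print(round(w, 2), steps)
```

In the scheme described by the text, the updated quantities would also include the learnable configuration parameters, and the model structure would be rebuilt from them before each forward pass.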
In the technical scheme of the embodiment of the invention, target candidate values corresponding to the indexes of the backbone network of the obstacle avoidance model to be trained are determined according to the acquired learnable configuration parameters, and the model structure of the obstacle avoidance model to be trained is determined according to the target candidate values. The obstacle avoidance model to be trained with the determined model structure is taken as the obstacle avoidance training model to be updated, and the current training image is input into it to obtain actual output data. When the target loss function, determined according to the actual output data and the target penalty function of the obstacle avoidance training model to be updated, has converged, the obstacle avoidance training model to be updated is taken as the target obstacle avoidance model. This solves the problems in the prior art that manually adjusting the model structure is laborious and therefore inefficient, and that the resulting model structure may not match the scene. The model parameters are adjusted dynamically according to the learnable configuration parameters, so the obtained target obstacle avoidance model matches the specific application scene and the model is determined more accurately; in turn, pseudo-images are processed more accurately when the target obstacle avoidance model is used.
Example two
Fig. 2 is a schematic flow chart of a training method for an obstacle avoidance model according to a second embodiment of the present invention. On the basis of the foregoing embodiment, this embodiment refines the steps of determining the target candidate values corresponding to the indexes of the backbone network of the obstacle avoidance model to be trained according to the obtained learnable configuration parameters, and of obtaining the obstacle avoidance training model to be updated based on the target candidate values. The specific implementation is described in detail below. Technical terms that are the same as or correspond to those in the above embodiment are not explained again here.
As shown in fig. 2, the method includes:
s210, processing the learnable configuration parameters based on a normalized exponential function to obtain a weight matrix corresponding to each index in a backbone network in the obstacle avoidance model to be trained.
The normalization index function may be a softmax function, and after the learnable configuration parameter is processed based on the normalization index function, the weight matrix corresponding to each index may be determined. The weight matrix, as the name implies, represents a weight value for each element value. The element at this time may be an optional candidate value corresponding to each index. Each value in the weight matrix may be a probability that each of the selectable candidate values is selected in each convolution. The number of rows of the weight matrix corresponds to the number of convolutions in the backbone network, and the number of columns corresponds to the number of selectable candidate values for each index. Of course, if the step size indicator is not set in a certain convolution, at this time, the weight values corresponding to the convolution step size indicators in the weight matrix may all be 0.
For example, assume that the convolutional layer includes five convolutions and each convolution has four indexes: the step size index has two selectable candidate values, the channel number index has five, the convolution kernel size index has three, and the convolution kernel type index has three. After the learnable configuration parameters are processed based on the normalized exponential function, four weight matrices are obtained: a 5 × 2 weight matrix corresponding to the step size, a 5 × 5 weight matrix corresponding to the number of channels, a 5 × 3 weight matrix corresponding to the convolution kernel size, and a 5 × 3 weight matrix corresponding to the convolution kernel type.
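The example above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation of the embodiment: the names `softmax`, `NUM_CANDIDATES` and `config_params`, the one-parameter-per-(convolution, candidate) layout, and the zero initialisation are all assumptions introduced here.

```python
import math

def softmax(row):
    """Normalized exponential function applied to one row of parameters."""
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

NUM_CONVS = 5  # five convolutions, as in the example above
# Number of selectable candidate values per index, as in the example above.
NUM_CANDIDATES = {"stride": 2, "channels": 5, "kernel_size": 3, "kernel_type": 3}

# Assumption: one learnable parameter per (convolution, candidate) pair,
# initialised to zero so that every candidate starts with the same weight.
config_params = {
    name: [[0.0] * n for _ in range(NUM_CONVS)]
    for name, n in NUM_CANDIDATES.items()
}

# One weight matrix per index: rows are convolutions, columns are candidates.
weight_matrices = {
    name: [softmax(row) for row in rows] for name, rows in config_params.items()
}

for name, matrix in weight_matrices.items():
    print(name, len(matrix), "x", len(matrix[0]))
```

Each row of every matrix sums to 1, which is what allows a row to be read as selection probabilities for the candidates of one convolution.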
S220, determining a target candidate value of at least one index corresponding to each convolution based on the weight matrix of each index.
Each index has its own weight matrix. Each row of the weight matrix holds the weights (probabilities) with which the selectable candidate values of that index are chosen in the corresponding convolution. According to the weight values of each row, the target candidate value corresponding to the index in each convolution can be determined from the selectable candidate values. It can be understood that the target candidate value is the selectable candidate value determined, according to the weight matrix of the index, to be most suitable for the current scene.
In this embodiment, for the weight matrix of each index, according to the weight value of each row of the weight matrix corresponding to the current index, a target weight value corresponding to each convolution in the convolution layer is determined; the weight value is used for representing the probability of each optional candidate value being selected; and determining the target candidate value according to the selectable candidate value corresponding to each target weight value.
To make clear how the target candidate value is determined, the processing of one weight matrix is taken as an example. As described above, the values in each row of the weight matrix may be the probabilities that the selectable candidate values are selected in the corresponding convolution. For example, suppose the weight matrix corresponds to the step size, the first row corresponds to the first convolution, and the second row corresponds to the second convolution. The weight values in the first row then represent the probabilities that the two selectable step size candidate values are selected in the first convolution. The largest weight value in each row may be taken as the target weight value, and the selectable candidate value corresponding to the target weight value may be used as the target candidate value.
In this embodiment, the determining, according to the weight value of each row of the weight matrix corresponding to the current indicator, a target weight value corresponding to each convolution in the convolution layer includes: determining the maximum weight value of each row in the current weight matrix, and determining the maximum weight value as the target weight value of the current index in each convolution; correspondingly, the determining the target candidate value according to the selectable candidate value corresponding to each target weight value includes: and determining selectable candidate values corresponding to the target weight values, and taking the selectable candidate values as target candidate values.
It should be noted that the weight value at each position of each row in the weight matrix sequentially corresponds to the selectable candidate values, for example, the selectable candidate values of the step size are 1 and 2, then the first weight value of each row corresponds to 1, and the second weight value corresponds to 2.
Continuing the above example, the weight matrix of the step size index is a 5 × 2 matrix. Each row corresponds to one convolution, and each column holds the weights corresponding to the same selectable candidate value. For example, the element in the first row and first column is the weight corresponding to the step size candidate value 1 in the first convolution, and the element in the first row and second column is the weight corresponding to the step size candidate value 2. Correspondingly, the element in the second row and first column is the weight corresponding to the step size candidate value 1 in the second convolution, and the element in the second row and second column is the weight corresponding to the step size candidate value 2. As a further example, suppose the weight matrix corresponds to the channel number index, there is only one convolution, and the preset channel number candidate values are {32, 48, 64, 96, 128}. If the channel number weight matrix obtained based on the normalized exponential function is {0.1, 0.2, 0.5, 0.1, 0.1}, the weight values correspond to the candidate values in order, that is, 0.1 corresponds to 32, 0.2 corresponds to 48, and so on. The maximum weight value is 0.5, so 0.5 is taken as the target weight value; since the channel number candidate value corresponding to 0.5 is 64, the channel number 64 is taken as the target candidate value. The target candidate values of the other indexes are determined in the same way.
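The row-wise selection described above amounts to taking the position of the largest weight in each row and reading off the candidate at the same position. A minimal sketch follows; the function name `select_targets` is hypothetical, and breaking ties in favour of the first maximum is an assumption, since the text does not say how ties are handled.

```python
def select_targets(weight_matrix, candidates):
    """Return, for each row (convolution), the candidate with the largest weight."""
    targets = []
    for row in weight_matrix:
        # Index of the maximum weight; the first maximum wins on ties (assumption).
        best = max(range(len(row)), key=row.__getitem__)
        targets.append(candidates[best])
    return targets

# Channel number example from the text: a single convolution, five candidates.
channel_candidates = [32, 48, 64, 96, 128]
channel_weights = [[0.1, 0.2, 0.5, 0.1, 0.1]]
print(select_targets(channel_weights, channel_candidates))  # [64]

# Step size example: two convolutions, candidates 1 and 2 (weights are made up).
print(select_targets([[0.3, 0.7], [0.9, 0.1]], [1, 2]))  # [2, 1]
```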
S230, updating the step size, channel number, convolution kernel size and convolution kernel type used in each convolution according to the target candidate values to obtain the obstacle avoidance training model to be updated.
It should be noted that different target candidate values yield different network structures in the backbone network, so the model structure of the obstacle avoidance model to be trained can be determined based on the target candidate values. This solves the problem in the prior art that the model structure is set by a user according to experience, so that the trained model is not adapted to the current scene and the processing effect is poor. The network structure in the backbone network is adjusted dynamically according to the actual application scene, and the obstacle avoidance model to be trained obtained by this adjustment is used as the obstacle avoidance training model to be updated.
Specifically, after each target candidate value is determined, the model structure of the obstacle avoidance model to be trained may be updated based on the target candidate value corresponding to each index in each convolution, so as to obtain the obstacle avoidance training model to be updated.
In this embodiment, the obstacle avoidance model to be trained whose target candidate values have been determined is used as the obstacle avoidance training model to be updated. The advantage is that the obstacle avoidance training model to be updated can be trained based on the training sample data and adjusted dynamically according to the training result, which improves how well the finally trained target obstacle avoidance model fits the current scene.
In this embodiment, the determining a model structure of the obstacle avoidance model to be trained based on each target candidate value to obtain the obstacle avoidance training model to be updated includes: and respectively determining the step length, the channel, the convolution kernel size and the value of the convolution kernel type in each convolution according to each target candidate value to obtain the obstacle avoidance training model to be updated.
After each target candidate value is determined according to the weight matrix of each index, specific values of step length, channel number, convolution kernel size and convolution kernel type in each convolution can be determined, and an obstacle avoidance training model to be updated is determined based on each specific value.
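One possible way to turn the per-index target candidate values into a concrete per-convolution structure is sketched below. The `ConvSpec` record, the `build_structure` helper, the kernel sizes and the kernel type names ("standard", "depthwise") are hypothetical; the text does not list the actual candidate values for kernel size or kernel type.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ConvSpec:
    """Concrete settings of one convolution in the backbone network."""
    stride: int
    channels: int
    kernel_size: int
    kernel_type: str

def build_structure(targets):
    """targets maps each index name to its target candidate values, one per convolution."""
    names = ("stride", "channels", "kernel_size", "kernel_type")
    # Regroup the per-index lists into one tuple of settings per convolution.
    return [ConvSpec(*values) for values in zip(*(targets[n] for n in names))]

# Hypothetical target candidate values for a backbone with two convolutions.
targets = {
    "stride": [1, 2],
    "channels": [32, 64],
    "kernel_size": [3, 5],
    "kernel_type": ["standard", "depthwise"],
}
structure = build_structure(targets)
for spec in structure:
    print(spec)
```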
S240, inputting the training pseudo image into the obstacle avoidance training model to be updated, and obtaining actual output data corresponding to the training pseudo image.
Specifically, the training pseudo image may be used as an input of the to-be-updated obstacle avoidance training model, and the to-be-updated obstacle avoidance model may output an output result corresponding to the training pseudo image, and use the output result as actual output data.
It should be noted that the training pseudo images corresponding to each obstacle avoidance training model to be updated are different, that is, each step of the embodiment may be repeatedly executed to train and obtain the target obstacle avoidance model.
S250, if the target loss function corresponding to the obstacle avoidance training model to be updated converges, taking the obstacle avoidance training model to be updated as the target obstacle avoidance model.
According to the technical scheme of this embodiment of the invention, the learnable configuration parameters are processed through a normalized exponential function to obtain a weight matrix corresponding to each index in the backbone network, and the model structure of the obstacle avoidance model to be trained is adjusted according to the weight matrices. A training sample is input into the obstacle avoidance training model to be updated to obtain actual output data, and the actual output data and the theoretical output data are processed through the target loss function. When the target loss function is determined to have converged, the obstacle avoidance training model to be updated is considered best adapted to the current scene and training is complete, which improves how well the obstacle avoidance model fits the scene and the accuracy with which it processes data.
Example three
Fig. 3 is a schematic flow chart of a training method for an obstacle avoidance model according to a third embodiment of the present invention. On the basis of the foregoing embodiments, the method further includes determining the penalty functions corresponding to the obstacle avoidance models to be selected, so as to determine the target penalty function corresponding to the obstacle avoidance training model to be updated, and further determining the target loss function based on the target penalty function and a preset loss function.
As shown in fig. 3, the method includes:
S310, randomly sampling selectable candidate values corresponding to various indexes in the backbone network to obtain at least one obstacle avoidance model to be selected.
It should be noted that the indexes used by each convolution in the backbone network may be preset, and at least one selectable candidate value may be set for each index. For each convolution, one selectable candidate value is drawn at random from the candidate values of each index, which yields one obstacle avoidance model to be selected. In order to improve the precision of the target obstacle avoidance model obtained by subsequent training, fully random sampling may be performed to obtain a large number of obstacle avoidance models to be selected with different structures. The selectable candidate values used by the convolutions differ from one obstacle avoidance model to be selected to another, that is, the model structures of the obstacle avoidance models to be selected are different.
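The random sampling in S310 can be illustrated as follows. This is a sketch under stated assumptions: the step size and channel number candidates come from the examples in the text, while the kernel sizes, kernel type names, the fixed seed, the sample count of 100 and the helper name `sample_model` are hypothetical.

```python
import random

# Selectable candidate values per index. Kernel sizes and kernel types
# are placeholders chosen for illustration only.
CANDIDATES = {
    "stride": [1, 2],
    "channels": [32, 48, 64, 96, 128],
    "kernel_size": [3, 5, 7],
    "kernel_type": ["standard", "depthwise", "dilated"],
}

def sample_model(num_convs, rng):
    """Draw one candidate value per index for every convolution, uniformly at random."""
    return tuple(
        tuple(rng.choice(CANDIDATES[name]) for name in CANDIDATES)
        for _ in range(num_convs)
    )

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# A set keeps only structurally distinct obstacle avoidance models to be selected.
candidate_models = {sample_model(5, rng) for _ in range(100)}
print(len(candidate_models), "distinct candidate structures")
```

Storing each sampled structure as a tuple makes it hashable, so it can later serve directly as a key when measurements for that structure are recorded.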
S320, determining the calculated amount and the model speed corresponding to each obstacle avoidance model to be selected, and determining a penalty function of the corresponding obstacle avoidance model to be selected according to the calculated amount and the model speed.
In this embodiment, each obstacle avoidance model to be selected may be deployed on GPU hardware, the inference speed of the model may be measured, and the calculated amount of the model may be obtained through statistics. That is to say, each obstacle avoidance model to be selected may be deployed on the unmanned vehicle and run, and its model speed and calculated amount may be determined. Corresponding penalty functions may be set for different model speeds and calculated amounts. The penalty function may include a calculated-amount sub-penalty function corresponding to the calculated amount and a speed sub-penalty function corresponding to the model speed. That is, the penalty function is the sum of two sub-penalty functions, each of which may be a linear function.
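A penalty function built as the sum of two linear sub-penalty functions might look like the sketch below. The slopes, the use of GFLOPs for the calculated amount and of latency in milliseconds for the model speed are assumptions made purely for illustration; the text specifies only that each sub-penalty function may be linear.

```python
def linear(slope, intercept=0.0):
    """Build a linear sub-penalty function x -> slope * x + intercept."""
    return lambda x: slope * x + intercept

# Hypothetical coefficients: the text does not give concrete values.
compute_sub_penalty = linear(slope=0.5)   # applied to the calculated amount (GFLOPs)
speed_sub_penalty = linear(slope=0.25)    # applied to the measured latency (ms)

def penalty(gflops, latency_ms):
    """Penalty of one candidate model: the sum of its two sub-penalties."""
    return compute_sub_penalty(gflops) + speed_sub_penalty(latency_ms)

print(penalty(4.0, 8.0))  # 0.5 * 4.0 + 0.25 * 8.0 = 4.0
```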
S330, aiming at each obstacle avoidance model to be selected, determining a record in a target query table based on the current obstacle avoidance model to be selected, the calculated amount corresponding to the current obstacle avoidance model to be selected, the model speed and the corresponding penalty function, so as to determine the target penalty function corresponding to the obstacle avoidance training model to be updated from the target query table.
The target lookup table may include at least one obstacle avoidance model to be selected, together with the calculated amount, model speed, and penalty function corresponding to each obstacle avoidance model to be selected. Because the penalty function comprises a calculated-amount sub-penalty function and a speed sub-penalty function, the record corresponding to each obstacle avoidance model to be selected in the target lookup table comprises both sub-penalty functions.
It should be further noted that, in order to reduce the data amount in the target lookup table, a mapping relationship between each obstacle avoidance model to be selected and the sub-penalty functions of its penalty function may be established. For example, all the sub-penalty functions that may correspond to the calculated amounts and model speeds may be determined in advance, and a correspondence between each obstacle avoidance model to be selected and its sub-penalty functions may be established. The advantage is as follows: different obstacle avoidance models to be selected may have the same calculated-amount sub-penalty function or speed sub-penalty function, so if the sub-penalty functions of each obstacle avoidance model to be selected were stored directly in the data table, repeated sub-penalty functions would be stored, that is, the data would be redundant. If the correspondence is established instead, the repeated sub-penalty functions can be eliminated, which reduces data redundancy and data volume. It should also be noted that, in order to quickly determine the penalty function corresponding to the obstacle avoidance training model to be updated later, the selectable candidate values corresponding to each index of each obstacle avoidance model to be selected may be recorded in the target lookup table.
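The lookup table with shared sub-penalty functions could be organised roughly as follows. Everything concrete here is hypothetical: the registry keys, the two example configurations, the measured values and the helper name `target_penalty` are made up to show the idea of storing each sub-penalty function once and referring to it by key.

```python
# Each distinct sub-penalty function is stored once and referenced by key.
SUB_PENALTIES = {
    "compute_a": lambda gflops: 0.5 * gflops,
    "speed_a": lambda ms: 0.25 * ms,
    "speed_b": lambda ms: 0.5 * ms,
}

# Records are keyed by the candidate values of each convolution, given as
# (stride, channels, kernel_size, kernel_type), so a model can be found
# directly from its target candidate values. All values are made up.
MODEL_A = ((1, 32, 3, "standard"),)
MODEL_B = ((2, 64, 5, "depthwise"),)
lookup_table = {
    MODEL_A: {"gflops": 2.0, "latency_ms": 4.0,
              "compute": "compute_a", "speed": "speed_a"},
    MODEL_B: {"gflops": 6.0, "latency_ms": 2.0,
              "compute": "compute_a", "speed": "speed_b"},
}

def target_penalty(config):
    """Look up the record for a configuration and evaluate its two sub-penalties."""
    record = lookup_table[config]
    compute = SUB_PENALTIES[record["compute"]](record["gflops"])
    speed = SUB_PENALTIES[record["speed"]](record["latency_ms"])
    return compute + speed

print(target_penalty(MODEL_A))  # 0.5 * 2.0 + 0.25 * 4.0 = 2.0
print(target_penalty(MODEL_B))  # 0.5 * 6.0 + 0.5 * 2.0 = 4.0
```

Both records refer to the same `compute_a` entry, which is the redundancy saving the paragraph above describes.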
S340, determining target candidate values corresponding to various indexes of a backbone network of the obstacle avoidance model to be trained according to the acquired learnable configuration parameters, and obtaining the obstacle avoidance training model to be updated based on the target candidate values.
After the learnable configuration parameters are obtained, they can be processed based on the normalized exponential function to obtain the target candidate values corresponding to the indexes in the backbone network of the obstacle avoidance model to be trained, that is, a configuration is selected, and a specific obstacle avoidance training model to be updated is then obtained according to the selected configuration.
For example, referring to fig. 4, the learnable configuration parameter is processed based on the normalized exponential function to obtain each target candidate value, i.e. the selected configuration. And obtaining a specific model, namely the obstacle avoidance training model to be updated according to the selected configuration.
S350, inputting the training pseudo image into the obstacle avoidance training model to be updated to obtain actual output data corresponding to the training pseudo image.
S360, determining a target obstacle avoidance model to be selected corresponding to the obstacle avoidance training model to be updated according to the target query table, and taking a penalty function of the target obstacle avoidance model to be selected as the target penalty function.
Specifically, after the obstacle avoidance training model to be updated is determined, the model structure of the model, that is, the target candidate value corresponding to each index in the backbone network of the model, can be determined. The corresponding obstacle avoidance model to be selected is determined from the target lookup table according to the target candidate value corresponding to each index and is used as the target obstacle avoidance model to be selected. The penalty function of the obstacle avoidance training model to be updated is determined according to the calculated-amount sub-penalty function and the speed sub-penalty function corresponding to the target obstacle avoidance model to be selected. The penalty function acquired at this time may be used as the target penalty function.
With continued reference to fig. 4, a computation penalty function and a model speed penalty function corresponding to the current configuration, that is, the obstacle avoidance training model to be updated, may be obtained from the computation amount lookup table and the model speed lookup table in the target lookup table. Based on a preset loss function and a penalty function corresponding to the obstacle avoidance model to be updated, a target loss function, namely total loss, can be obtained.
S370, obtaining the target loss function based on the target penalty function and a preset loss function.
In this embodiment, the target loss function is adjusted dynamically according to the penalty function of the obstacle avoidance training model to be updated and the preset loss function. It can be understood that different target candidate values correspond to different obstacle avoidance training models to be updated; accordingly, their calculated amounts and model speeds differ, and so do their penalty functions. The target loss function can therefore be determined according to the penalty function of the obstacle avoidance training model to be updated and the preset loss function. The preset loss function is determined according to the model and may remain unchanged during model training.
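As a rough illustration of how the total loss might be assembled, the sketch below adds the target penalty to a preset loss. Both choices are assumptions: the text does not state how the two terms are combined, nor which preset loss is used, so plain addition and a mean squared error stand in here.

```python
def preset_loss(actual, theoretical):
    """Stand-in preset loss: mean squared error (an assumption, not from the text)."""
    return sum((a - t) ** 2 for a, t in zip(actual, theoretical)) / len(actual)

def target_loss(actual, theoretical, target_penalty):
    """Total loss: preset loss plus the penalty of the current structure (assumed additive)."""
    return preset_loss(actual, theoretical) + target_penalty

# Made-up actual and theoretical output data for one training pseudo image.
print(target_loss([1.0, 3.0], [1.0, 1.0], target_penalty=0.5))  # 2.0 + 0.5 = 2.5
```

With this form, two structures that fit the data equally well are distinguished by their penalty, so the cheaper and faster structure yields the lower total loss.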
S380, correcting the target loss function based on the actual output data and the corresponding theoretical output data, and when convergence of the target loss function is detected, taking the obstacle avoidance training model to be updated as the target obstacle avoidance model.
Wherein the theoretical output data is a theoretical output result corresponding to the current training pseudo-image.
Specifically, the target loss function may be corrected according to actual output data and theoretical output data, and if it is detected that the target loss function is convergent, the obstacle avoidance training model to be updated obtained at this time is determined to be the obstacle avoidance model that can be finally used, and the obstacle avoidance training model to be updated may be used as the target obstacle avoidance model.
It can be understood that, after training is finished, the optimal configuration of the backbone network adapted to the current scenario can be obtained.
According to the technical scheme of this embodiment of the invention, before the obstacle avoidance training model to be updated is determined based on the learnable configuration parameters, all selectable candidate values of all indexes of each convolution can be sampled fully at random to obtain a plurality of obstacle avoidance models to be selected, and the penalty function of each obstacle avoidance model to be selected is determined. The target loss function of the obstacle avoidance training model to be updated is determined based on the penalty function corresponding to the obstacle avoidance training model to be updated and the preset loss function, and whether the obstacle avoidance training model to be updated is the target obstacle avoidance model is then determined based on the target loss function. This achieves adaptive adjustment of the selectable candidate values of the indexes in the backbone network and improves the accuracy of the target obstacle avoidance model.
Example four
Fig. 5 is a schematic flow chart of a training method of an obstacle avoidance model according to a fourth embodiment of the present invention. On the basis of the foregoing embodiments, if it is detected that the target loss function has not converged, the target obstacle avoidance model may be determined based on the technical solution provided in this embodiment, the specific implementation of which is described in detail below. Technical terms that are the same as or correspond to those in the above embodiments are not repeated here.
As shown in fig. 5, the method includes:
S410, updating the learnable configuration parameters corresponding to the obstacle avoidance model to be trained according to the target loss function corresponding to the obstacle avoidance model to be trained.
The target loss function is determined according to a penalty function corresponding to the obstacle avoidance model to be trained and a preset loss function.
S420, updating target candidate values of various indexes of a backbone network of the obstacle avoidance model to be trained according to the learnable configuration parameters to obtain the obstacle avoidance training model to be updated.
The backbone network comprises convolution layers, wherein each convolution layer comprises at least one convolution, and each convolution comprises at least one index.
S430, inputting the training pseudo image into the obstacle avoidance training model to be updated to obtain actual output data corresponding to the training pseudo image.
S440, if the target loss function has not converged, taking the obstacle avoidance training model to be updated as the obstacle avoidance model to be trained; processing the actual output data and the theoretical output data based on the target loss function to adjust the model parameters and the learnable configuration parameters of the obstacle avoidance model to be trained; updating the candidate values corresponding to the indexes in the backbone network of the obstacle avoidance model to be trained based on the learnable configuration parameters to obtain the obstacle avoidance training model to be updated; and repeating the step of determining the obstacle avoidance training model to be updated until the target loss function corresponding to the obstacle avoidance training model to be updated converges, thereby obtaining the target obstacle avoidance model.
The model parameters at this time may be common parameters in the model, and do not relate to parameters of the model shape.
It can be understood that when the target loss function has been determined and is found not to have converged, the actual output data and the theoretical output data can be processed based on the target loss function to obtain a processing result, and the common model parameters and the learnable configuration parameters in the obstacle avoidance training model to be updated can be adjusted according to the processing result; at the same time, the obstacle avoidance training model to be updated is used as the obstacle avoidance model to be trained. The updated learnable configuration parameters are processed based on the normalized exponential function to obtain new target candidate values, and the target candidate values are assigned to the obstacle avoidance model to be trained to obtain the obstacle avoidance training model to be updated. The training pseudo image is processed based on the obstacle avoidance training model to be updated, and whether the target loss function of the obstacle avoidance training model to be updated has converged is determined according to the processing result. If the target loss function still has not converged, the above steps are repeated until it converges, and the obstacle avoidance training model to be updated at the time of convergence is taken as the target obstacle avoidance model.
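The repeat-until-convergence control flow described above can be sketched as a small loop. This is a toy illustration only: the convergence test (two successive loss values differing by less than a tolerance), the helper name `train_until_converged`, and the scalar stand-in for a training step are assumptions, since the text does not define how convergence is detected. In a real setting, one step would update both the common model parameters and the learnable configuration parameters and then reselect the target candidate values.

```python
def train_until_converged(step, tol=1e-6, max_iters=10_000):
    """Call step() repeatedly until two successive loss values differ by less than tol."""
    previous = step()
    for iteration in range(1, max_iters):
        current = step()
        if abs(previous - current) < tol:
            return iteration, current
        previous = current
    raise RuntimeError("target loss did not converge")

# Toy stand-in for one training step: gradient descent on loss(w) = w ** 2.
state = {"w": 4.0}

def toy_step():
    loss = state["w"] ** 2
    state["w"] -= 0.25 * (2 * state["w"])  # gradient of w ** 2 is 2 * w; w halves each step
    return loss

iterations, final_loss = train_until_converged(toy_step)
print(iterations, final_loss)
```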
According to the technical scheme of this embodiment of the invention, when the target loss function is determined not to have converged, the actual output data and the theoretical output data can be processed based on the target loss function to obtain the corresponding loss value, and the common parameters and the learnable configuration parameters in the model are corrected according to the loss value. The learnable configuration parameters are processed again based on the normalized exponential function to obtain target candidate values, the obstacle avoidance training model to be updated is updated based on the target candidate values, and the updated model is run on the training samples. This is repeated until the target loss function converges and the target obstacle avoidance model is determined, which achieves the technical effect of automatically determining the model structure of the target obstacle avoidance model.
Example five
Fig. 6 is a schematic flow chart of a method for training an obstacle avoidance model according to a fifth embodiment of the present invention. On the basis of the foregoing embodiments, after the target obstacle avoidance model is obtained, it may be integrated on each unmanned vehicle, so that the pseudo image corresponding to the three-dimensional point cloud is processed based on the target obstacle avoidance model to achieve obstacle avoidance. Technical terms that are the same as or correspond to those in the above embodiments are not repeated here.
As shown in fig. 6, the method includes:
S510, updating the learnable configuration parameters corresponding to the obstacle avoidance model to be trained according to the target loss function corresponding to the obstacle avoidance model to be trained.
The target loss function is determined according to a penalty function corresponding to the obstacle avoidance model to be trained and a preset loss function.
S520, updating target candidate values of all indexes of a backbone network of the obstacle avoidance model to be trained according to the learnable configuration parameters to obtain the obstacle avoidance training model to be updated.
The backbone network comprises convolution layers, wherein the convolution layers comprise at least one convolution, and each convolution comprises at least one index.
S530, inputting the training pseudo image into the obstacle avoidance training model to be updated, and obtaining actual output data corresponding to the training pseudo image.
S540, if the target loss function corresponding to the obstacle avoidance training model to be updated converges, taking the obstacle avoidance training model to be updated as the target obstacle avoidance model. The target loss function is determined based on a preset loss function and a target penalty function, and the learnable configuration parameters are determined by processing, based on the target loss function, the actual output data and the theoretical output data corresponding to the current training pseudo image.
S550, receiving the target pseudo image.
On the basis of the technical scheme, after the target obstacle avoidance model is obtained through training, the target obstacle avoidance model can be integrated in the unmanned vehicle, so that three-dimensional point cloud is obtained through laser radar scanning, after a pseudo image corresponding to the three-dimensional point cloud is obtained based on a preset method, the pseudo image is processed based on the integrated target obstacle avoidance model, and the unmanned vehicle is controlled to achieve obstacle avoidance according to a processing result.
The target pseudo image is an image corresponding to the three-dimensional point cloud in the driving process of the unmanned vehicle.
S560, inputting the target pseudo image into a target obstacle avoidance model obtained through pre-training to obtain obstacle information corresponding to the target pseudo image.
The obstacle information may include the type of the obstacle, such as a pedestrian or a vehicle, and may also be the size of the obstacle, the orientation of the obstacle, and the like.
Specifically, after the target pseudo image is input into the target obstacle avoidance model, the target obstacle avoidance model may process the target pseudo image to obtain the obstacle information corresponding to the target pseudo image.
S570, controlling the target vehicle to run based on the obstacle information.
After the obstacle information is determined, the travel trajectory of the target vehicle may be controlled accordingly.
According to the technical scheme of the embodiment of the invention, after the target obstacle avoidance model which is most adaptive to the current scene is obtained through training, the three-dimensional point cloud image acquired by the laser radar of the unmanned vehicle can be processed based on the target obstacle avoidance model to obtain the obstacle information, so that the accuracy of determining the obstacle information is improved, and the control effect of the unmanned vehicle is further improved.
EXAMPLE six
Fig. 7 is a schematic structural diagram of a training device for an obstacle avoidance model according to a sixth embodiment of the present invention, and as shown in fig. 7, the training device includes: the method comprises a learnable configuration parameter determination module 610, an obstacle avoidance training model determination module 620 to be updated, an output data determination module 630 and a target obstacle avoidance model determination module 640.
The learnable configuration parameter determining module 610 is configured to update a learnable configuration parameter corresponding to an obstacle avoidance model to be trained according to a target loss function corresponding to the obstacle avoidance model to be trained; the target loss function is determined according to a penalty function corresponding to the obstacle avoidance model to be trained and a preset loss function; the to-be-updated obstacle avoidance training model determining module 620 is configured to update target candidate values of various indexes of a backbone network of the to-be-updated obstacle avoidance training model according to the learnable configuration parameter, so as to obtain the to-be-updated obstacle avoidance training model; the backbone network comprises convolution layers, wherein each convolution layer comprises at least one convolution, and each convolution comprises at least one index; an output data determining module 630, configured to input a training pseudo image into the obstacle avoidance training model to be updated, to obtain actual output data corresponding to the training pseudo image; and a target obstacle avoidance model determining module 640, configured to use the to-be-updated obstacle avoidance training model as a target obstacle avoidance model if a target loss function corresponding to the to-be-updated obstacle avoidance training model converges. On the basis of the above technical solution, the apparatus further includes:
a three-dimensional point cloud determining module, configured to acquire a three-dimensional point cloud obtained from laser radar processing;
and a pseudo image determining module, configured to process the three-dimensional point cloud with a preset algorithm to obtain a training pseudo image for training the obstacle avoidance model to be trained.
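The preset algorithm is not specified here. As an illustration only, the following minimal sketch assumes a bird's-eye-view projection in which each grid cell stores the maximum point height; the function name, the grid ranges, and the cell size are hypothetical and are not taken from this embodiment.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z) in metres


def point_cloud_to_pseudo_image(
    points: Sequence[Point],
    x_range: Tuple[float, float] = (0.0, 4.0),
    y_range: Tuple[float, float] = (0.0, 4.0),
    cell: float = 1.0,
) -> List[List[float]]:
    """Project a 3D point cloud onto a 2D grid (hypothetical preset algorithm).

    Each pixel holds the maximum z value of the points falling into that
    cell. Empty cells stay at 0.0, so heights below 0.0 are not represented
    in this simplified sketch.
    """
    cols = int((x_range[1] - x_range[0]) / cell)
    rows = int((y_range[1] - y_range[0]) / cell)
    image = [[0.0] * cols for _ in range(rows)]
    for x, y, z in points:
        # Skip points that fall outside the region covered by the grid.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        col = int((x - x_range[0]) / cell)
        row = int((y - y_range[0]) / cell)
        image[row][col] = max(image[row][col], z)
    return image


cloud = [(0.5, 0.5, 1.2), (0.7, 0.2, 0.4), (3.9, 2.1, 0.8), (9.0, 9.0, 5.0)]
pseudo_image = point_cloud_to_pseudo_image(cloud)
```

The resulting two-dimensional array can be fed to a convolutional backbone in the same way as an ordinary image, which is the reason for the pseudo image representation.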
On the basis of the above technical solution, before the to-be-updated obstacle avoidance training model determining module is used to train the obstacle avoidance model to be trained, the apparatus is further configured to: determine at least one index corresponding to each convolution in the backbone network of the obstacle avoidance model to be trained, and at least one selectable candidate value corresponding to each index; wherein the index includes at least one of a step size (stride), a number of channels, a convolution kernel size, and a convolution kernel type.
On the basis of the above technical solution, the to-be-updated obstacle avoidance training model determining module includes:
a weight matrix determining unit, configured to process the learnable configuration parameters with a normalized exponential (softmax) function to obtain a weight matrix for each index in the backbone network of the obstacle avoidance model to be trained; the number of weight matrices equals the number of indexes, the number of rows of each weight matrix corresponds to the number of convolutions in the backbone network, and the number of columns corresponds to the number of selectable candidate values of that index;
a target candidate value determining unit, configured to determine a target candidate value of at least one index corresponding to each convolution based on the weight matrix of each index;
and an obstacle avoidance training model determining unit, configured to determine the model structure of the obstacle avoidance model to be trained based on each target candidate value, so as to obtain the to-be-updated obstacle avoidance training model.
On the basis of the above technical solution, the target candidate value determining unit is further configured to: for the weight matrix of each index, determine a target weight value corresponding to each convolution in the convolution layer according to the weight values in each row of the weight matrix corresponding to the current index, where each weight value represents the probability that the corresponding selectable candidate value is selected; and determine the target candidate value according to the selectable candidate value corresponding to each target weight value.
On the basis of the above technical solution, the target candidate value determining unit is further configured to determine the maximum weight value of each row in the current weight matrix, and take that maximum weight value as the target weight value of the current index for the corresponding convolution; correspondingly, determining the target candidate value according to the selectable candidate value corresponding to each target weight value includes: determining the selectable candidate value corresponding to each target weight value, and taking that selectable candidate value as the target candidate value.
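The weight matrix and the row-wise maximum selection described above can be sketched as follows. This is a minimal illustration for a single index (convolution kernel size) with two convolutions; the candidate values, the parameter values in `alpha`, and the function names are assumptions chosen for the example.

```python
import math
from typing import List, Sequence, Tuple


def softmax(row: Sequence[float]) -> List[float]:
    """Normalized exponential function applied to one row of parameters."""
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def select_target_candidates(
    alpha: Sequence[Sequence[float]], candidates: Sequence[int]
) -> Tuple[List[List[float]], List[int]]:
    """Return the weight matrix of one index and the chosen candidate per convolution.

    alpha has one row per convolution and one column per selectable candidate
    value, matching the weight matrix layout described above.
    """
    weights = [softmax(row) for row in alpha]
    targets = []
    for row in weights:
        # The largest weight in the row is the target weight value; the
        # candidate in the same column becomes the target candidate value.
        best_col = max(range(len(row)), key=row.__getitem__)
        targets.append(candidates[best_col])
    return weights, targets


kernel_sizes = [1, 3, 5]                      # selectable candidate values (assumed)
alpha = [[0.1, 2.0, -1.0], [0.0, 0.0, 1.5]]  # learnable configuration parameters (assumed)
weights, targets = select_target_candidates(alpha, kernel_sizes)
```

In a full system there would be one such matrix per index (step size, number of channels, convolution kernel size, convolution kernel type), and the selected values together would fix the model structure.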
On the basis of the above technical solution, the to-be-updated obstacle avoidance training model determining module is further configured to:
update, according to the target candidate values, the candidate values currently used for the step size, number of channels, convolution kernel size, and convolution kernel type of each convolution, so as to obtain the to-be-updated obstacle avoidance training model.
On the basis of the above technical solution, the apparatus further includes:
a to-be-selected obstacle avoidance model determining module, configured to randomly sample the at least one selectable candidate value corresponding to each index of each convolution in the backbone network, so as to obtain at least one to-be-selected obstacle avoidance model, each with a different model structure;
a penalty function determining module, configured to determine the calculated amount and the model speed of each to-be-selected obstacle avoidance model, and to determine the penalty function of the corresponding to-be-selected obstacle avoidance model according to the calculated amount and the model speed; the penalty function includes a calculated-amount sub-penalty function corresponding to the calculated amount and a speed sub-penalty function corresponding to the model speed;
and a target query table determining module, configured to, for each to-be-selected obstacle avoidance model, determine one record in a target query table based on the current to-be-selected obstacle avoidance model, its calculated amount, its model speed, and its penalty function, so that the target penalty function corresponding to the to-be-updated obstacle avoidance training model can be determined from the target query table.
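The construction of the target query table can be sketched as below. The search space, the cost proxy in `calculated_amount`, the stand-in latency in `model_speed`, the weighting factors, and the choice to add the two sub-penalties are all assumptions made for illustration; the embodiment does not fix any of them, and a real system would measure speed on the target hardware.

```python
import random
from typing import Dict, Tuple

INDEX_NAMES = ("step_size", "channels", "kernel_size")  # assumed subset of indexes
SEARCH_SPACE = {"step_size": [1, 2], "channels": [16, 32, 64], "kernel_size": [1, 3, 5]}
NUM_CONVS = 3  # assumed number of convolutions in the backbone

Structure = Tuple[Tuple[int, int, int], ...]


def sample_structure(rng: random.Random) -> Structure:
    """Randomly pick one candidate value per index for every convolution."""
    return tuple(
        tuple(rng.choice(SEARCH_SPACE[name]) for name in INDEX_NAMES)
        for _ in range(NUM_CONVS)
    )


def calculated_amount(structure: Structure) -> float:
    """Rough cost proxy: kernel area times channels, reduced by the step size."""
    return sum(k * k * c / (s * s) for s, c, k in structure)


def model_speed(structure: Structure) -> float:
    """Stand-in latency in milliseconds derived from the cost proxy."""
    return 0.01 * calculated_amount(structure) + 1.0


def penalty(structure: Structure, w_amount: float = 1e-3, w_speed: float = 1e-1) -> float:
    """Sum of the calculated-amount sub-penalty and the speed sub-penalty."""
    return w_amount * calculated_amount(structure) + w_speed * model_speed(structure)


def build_query_table(num_samples: int, seed: int = 0) -> Dict[Structure, Dict[str, float]]:
    """One record per sampled structure: calculated amount, speed, penalty."""
    rng = random.Random(seed)
    table: Dict[Structure, Dict[str, float]] = {}
    for _ in range(num_samples):
        structure = sample_structure(rng)
        table[structure] = {
            "calculated_amount": calculated_amount(structure),
            "speed": model_speed(structure),
            "penalty": penalty(structure),
        }
    return table


query_table = build_query_table(num_samples=20)
```

During training, the structure selected from the learnable configuration parameters would serve as the key, and the stored penalty of the matching record would be used as the target penalty function value.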
On the basis of the above technical solution, the target obstacle avoidance model determining module is configured to: determine, according to the target query table, the target to-be-selected obstacle avoidance model corresponding to the to-be-updated obstacle avoidance training model, and take the penalty function of that model as the target penalty function; obtain the target loss function based on the target penalty function and the preset loss function; correct the target loss function based on the actual output data and the corresponding theoretical output data; and, when convergence of the target loss function is detected, take the to-be-updated obstacle avoidance training model as the target obstacle avoidance model.
On the basis of the above technical solution, the apparatus further includes: a training module configured to:
if the target loss function has not converged, take the to-be-updated obstacle avoidance training model as the obstacle avoidance model to be trained; process the actual output data and the theoretical output data based on the updated target loss function of the obstacle avoidance model to be trained, to obtain the model parameters and learnable configuration parameters used to adjust the obstacle avoidance model to be trained; update the candidate values corresponding to each index in the backbone network of the obstacle avoidance model to be trained based on the learnable configuration parameters, to obtain the to-be-updated obstacle avoidance training model; and repeat the step of determining the to-be-updated obstacle avoidance training model until its target loss function converges, thereby obtaining the target obstacle avoidance model.
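The repeat-until-convergence procedure can be illustrated with a deliberately small toy. The sketch below assumes a single index of a single convolution, fixed made-up values for the preset loss and penalty of each candidate, and an expected-value relaxation in which the target loss is the softmax-weighted sum over candidates; none of these choices is stated in the embodiment, and the model parameters themselves are not trained here.

```python
import math
from typing import List, Sequence, Tuple


def softmax(row: Sequence[float]) -> List[float]:
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def search_configuration(
    preset_loss: Sequence[float],
    penalty: Sequence[float],
    lr: float = 1.0,
    tol: float = 1e-4,
    max_steps: int = 100_000,
) -> Tuple[int, float]:
    """Update learnable configuration parameters until the target loss settles.

    Target loss per candidate = preset loss + penalty (assumed combination).
    The relaxed loss is L = sum_j p_j * cost_j with p = softmax(alpha), whose
    gradient with respect to alpha_i is p_i * (cost_i - L).
    """
    cost = [t + p for t, p in zip(preset_loss, penalty)]
    alpha = [0.0] * len(cost)
    previous = float("inf")
    loss = previous
    for _ in range(max_steps):
        probs = softmax(alpha)
        loss = sum(p * c for p, c in zip(probs, cost))
        if abs(previous - loss) < tol:  # convergence check on the target loss
            break
        # Not converged: gradient step on the configuration parameters, then repeat.
        alpha = [a - lr * p * (c - loss) for a, p, c in zip(alpha, probs, cost)]
        previous = loss
    best = max(range(len(alpha)), key=alpha.__getitem__)
    return best, loss


kernel_sizes = [1, 3, 5]
best_index, final_loss = search_configuration(
    preset_loss=[0.90, 0.50, 0.45],  # made-up accuracy term per candidate
    penalty=[0.01, 0.05, 0.20],      # made-up cost term per candidate
)
chosen_kernel = kernel_sizes[best_index]
```

In this toy the largest kernel has the lowest preset loss but the highest penalty, so the combined target loss favours the middle candidate, which shows how the penalty term steers the structure toward a cheaper model.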
On the basis of the above technical solution, the apparatus further includes: the obstacle avoidance module is used for receiving a target pseudo image; inputting the target pseudo image into a target obstacle avoidance model obtained through pre-training to obtain obstacle information corresponding to the target pseudo image; and controlling the target vehicle to run based on the obstacle information.
According to the technical solution of this embodiment of the invention, target candidate values corresponding to the indexes of the backbone network of the obstacle avoidance model to be trained are determined according to the acquired learnable configuration parameters, the model structure of the obstacle avoidance model to be trained is determined according to the target candidate values, and the obstacle avoidance model to be trained with the determined model structure is taken as the to-be-updated obstacle avoidance training model. The current training image is then input into the to-be-updated obstacle avoidance training model to obtain actual output data, and when the target loss function is determined to converge according to the actual output data and the target penalty function of the to-be-updated obstacle avoidance training model, the to-be-updated obstacle avoidance training model is taken as the target obstacle avoidance model. This solves the problems in the prior art that manually adjusting the model structure is time-consuming and laborious, and therefore inefficient, and that the resulting model structure may not match the scene. The model structure is adjusted dynamically according to the learnable configuration parameters, so the obtained target obstacle avoidance model matches the specific application scenario, the accuracy of model determination is improved, and the accuracy of image processing is improved when a pseudo image is processed based on the target obstacle avoidance model.
It should be noted that, the units and modules included in the apparatus are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the embodiment of the invention.
EXAMPLE seven
Fig. 8 is a schematic structural diagram of an electronic device according to a seventh embodiment of the present invention. FIG. 8 illustrates a block diagram of an exemplary electronic device 70 suitable for use in implementing embodiments of the present invention. The electronic device 70 shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 8, electronic device 70 is embodied in the form of a general purpose computing device. The components of the electronic device 70 may include, but are not limited to: one or more processors or processing units 701, a system memory 702, and a bus 703 that couples various system components including the system memory 702 and the processing unit 701.
Bus 703 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Electronic device 70 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 70 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 702 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)704 and/or cache memory 705. The electronic device 70 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, the storage system 706 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 8, and commonly referred to as a "hard drive"). Although not shown in FIG. 8, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 703 via one or more data media interfaces. Memory 702 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 708 having a set (at least one) of program modules 707 may be stored, for example, in memory 702. Such program modules 707 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 707 generally perform the functions and/or methodologies of the described embodiments of the invention.
The electronic device 70 may also communicate with one or more external devices 709 (e.g., keyboard, pointing device, display 710, etc.), with one or more devices that enable a user to interact with the electronic device 70, and/or with any devices (e.g., network card, modem, etc.) that enable the electronic device 70 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 711. Also, the electronic device 70 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 712. As shown, the network adapter 712 communicates with the other modules of the electronic device 70 over a bus 703. It should be appreciated that although not shown in FIG. 8, other hardware and/or software modules may be used in conjunction with electronic device 70, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 701 executes various functional applications and data processing by running a program stored in the system memory 702, for example, to implement the method for training the obstacle avoidance model provided in the embodiment of the present invention.
EXAMPLE eight
An eighth embodiment of the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform a method for training an obstacle avoidance model.
The method comprises the following steps:
updating a learnable configuration parameter corresponding to the obstacle avoidance model to be trained according to a target loss function corresponding to the obstacle avoidance training model to be trained; the target loss function is determined according to a penalty function corresponding to the obstacle avoidance model to be trained and a preset loss function;
updating target candidate values of various indexes of a backbone network of the obstacle avoidance model to be trained according to the learnable configuration parameters to obtain an obstacle avoidance training model to be updated; the backbone network comprises convolution layers, wherein each convolution layer comprises at least one convolution, and each convolution comprises at least one index;
inputting a training pseudo image into the obstacle avoidance training model to be updated to obtain actual output data corresponding to the training pseudo image;
and if the target loss function corresponding to the obstacle avoidance training model to be updated is converged, taking the obstacle avoidance training model to be updated as a target obstacle avoidance model.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for embodiments of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (14)

1. A method for training an obstacle avoidance model is characterized by comprising the following steps:
updating a learnable configuration parameter corresponding to the obstacle avoidance model to be trained according to a target loss function corresponding to the obstacle avoidance training model to be trained; the target loss function is determined according to a penalty function corresponding to the obstacle avoidance model to be trained and a preset loss function;
updating target candidate values of various indexes of a backbone network of the obstacle avoidance model to be trained according to the learnable configuration parameters to obtain an obstacle avoidance training model to be updated; the backbone network comprises convolution layers, wherein each convolution layer comprises at least one convolution, and each convolution comprises at least one index;
inputting a training pseudo image into the obstacle avoidance training model to be updated to obtain actual output data corresponding to the training pseudo image;
and if the target loss function corresponding to the obstacle avoidance training model to be updated is converged, taking the obstacle avoidance training model to be updated as a target obstacle avoidance model.
2. The method of claim 1, further comprising:
acquiring three-dimensional point cloud obtained based on laser radar processing;
and processing the three-dimensional point cloud by adopting a preset algorithm to obtain a training pseudo image for training the obstacle avoidance model to be trained.
3. The method of claim 1, wherein the updating the learnable configuration parameters corresponding to the obstacle avoidance model to be trained according to the target loss function corresponding to the obstacle avoidance model to be trained comprises:
determining a penalty function corresponding to the obstacle avoidance model to be trained according to candidate values used by various indexes in a backbone network in the obstacle avoidance model to be trained;
and determining the target loss function according to the penalty function and the preset loss function, processing an actual output result and a theoretical output result of the obstacle avoidance model to be trained on a previous training sample according to the target loss function to obtain the learnable configuration parameter, and updating target candidate values corresponding to all indexes in the obstacle avoidance model to be trained on the basis of the learnable configuration parameter to obtain the obstacle avoidance training model to be updated.
4. The method according to claim 1, wherein the updating target candidate values of various indexes of a backbone network of the obstacle avoidance model to be trained according to the learnable configuration parameters to obtain the obstacle avoidance training model to be updated comprises:
processing the learnable configuration parameters with a normalized exponential (softmax) function to obtain a weight matrix corresponding to each index in the backbone network of the obstacle avoidance model to be trained; the number of weight matrices equals the number of indexes, the number of rows of each weight matrix corresponds to the number of convolutions in the backbone network, and the number of columns corresponds to the number of selectable candidate values of that index;
determining a target candidate value of at least one index corresponding to each convolution based on the weight matrix of each index;
and determining the model structure of the obstacle avoidance model to be trained based on each target candidate value to obtain the obstacle avoidance training model to be updated.
5. The method of claim 4, wherein determining the target candidate value of the at least one indicator corresponding to each convolution based on the weight matrix of each indicator comprises:
for the weight matrix of each index, determining a target weight value corresponding to each convolution in the convolution layer according to the weight values in each row of the weight matrix corresponding to the current index; each weight value represents the probability that the corresponding selectable candidate value is selected;
and determining the target candidate value according to the selectable candidate value corresponding to each target weight value.
6. The method of claim 5, wherein determining a target weight value corresponding to each convolution in a convolutional layer according to a weight value of each row of a weight matrix corresponding to a current index comprises:
determining the maximum weight value of each row in the current weight matrix, and determining the maximum weight value as the target weight value of the current index in each convolution;
correspondingly, the determining the target candidate value according to the selectable candidate value corresponding to each target weight value includes:
and determining selectable candidate values corresponding to the target weight values, and taking the selectable candidate values as target candidate values.
7. The method as claimed in claim 4, wherein the determining the model structure of the obstacle avoidance model to be trained based on each target candidate value to obtain the obstacle avoidance training model to be updated includes:
updating the used target candidate value of each index in each convolution according to each target candidate value to obtain the obstacle avoidance training model to be updated;
wherein the indicators in the convolution include at least one of step size, number of channels, convolution kernel size, and convolution kernel type.
8. The method of claim 1, further comprising, before the updating the learnable configuration parameters corresponding to the obstacle avoidance model to be trained according to the target loss function corresponding to the obstacle avoidance training model to be trained, the following:
randomly sampling at least one optional candidate value corresponding to each convolution index in the backbone network to obtain at least one obstacle avoidance model to be selected with different model structures;
determining the calculated amount and the model speed of each obstacle avoidance model to be selected, and determining a penalty function of the corresponding obstacle avoidance model to be selected according to the calculated amount and the model speed; the penalty function comprises a calculated-amount sub-penalty function corresponding to the calculated amount and a speed sub-penalty function corresponding to the model speed;
and for each obstacle avoidance model to be selected, determining a record in a target query table based on the current obstacle avoidance model to be selected, the calculated amount corresponding to the current obstacle avoidance model to be selected, the model speed and a corresponding penalty function, so as to determine the target penalty function corresponding to the obstacle avoidance training model to be updated from the target query table.
9. The method according to claim 8, wherein taking the obstacle avoidance training model to be updated as the target obstacle avoidance model, when the target loss function is determined to converge based on the actual output data and the target penalty function of the obstacle avoidance training model to be updated, comprises:
determining a target obstacle avoidance model to be selected corresponding to the obstacle avoidance training model to be updated according to the target query table, and taking a penalty function of the target obstacle avoidance model to be selected as the target penalty function;
obtaining the target loss function based on the target penalty function and a preset loss function;
correcting the target loss function based on the actual output data and the corresponding theoretical output data;
and when the convergence of the target loss function is detected, taking the obstacle avoidance model to be updated as a target obstacle avoidance model.
10. The method of claim 1, further comprising:
if the target loss function has not converged, taking the obstacle avoidance training model to be updated as the obstacle avoidance model to be trained; processing the actual output data and the theoretical output data based on the updated target loss function of the obstacle avoidance model to be trained, to obtain the model parameters and learnable configuration parameters used to adjust the obstacle avoidance model to be trained; updating the candidate values corresponding to each index in the backbone network of the obstacle avoidance model to be trained based on the learnable configuration parameters, to obtain the obstacle avoidance training model to be updated; and repeating the step of determining the obstacle avoidance training model to be updated until its target loss function converges, thereby obtaining the target obstacle avoidance model.
11. The method of claim 1, after determining the target obstacle avoidance model, further comprising:
receiving a target pseudo image;
inputting the target pseudo image into a target obstacle avoidance model obtained through pre-training to obtain obstacle information corresponding to the target pseudo image;
and controlling the target vehicle to run based on the obstacle information.
12. A training device for an obstacle avoidance model, characterized by comprising:
the learnable configuration parameter determining module is used for updating the learnable configuration parameters corresponding to the obstacle avoidance model to be trained according to the target loss function corresponding to the obstacle avoidance model to be trained; the target loss function is determined according to a penalty function corresponding to the obstacle avoidance model to be trained and a preset loss function;
the to-be-updated obstacle avoidance training model determining module is used for updating target candidate values of various indexes of a backbone network of the obstacle avoidance model to be trained according to the learnable configuration parameters to obtain the obstacle avoidance training model to be updated; the backbone network comprises convolution layers, wherein each convolution layer comprises at least one convolution, and each convolution comprises at least one index;
the output data determining module is used for inputting the training pseudo-image into the obstacle avoidance training model to be updated to obtain actual output data corresponding to the training pseudo-image;
and the target obstacle avoidance model determining module is used for taking the to-be-updated obstacle avoidance training model as a target obstacle avoidance model if a target loss function corresponding to the to-be-updated obstacle avoidance training model is converged.
13. An electronic device, characterized in that the electronic device comprises:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of training an obstacle avoidance model according to any one of claims 1 to 11.
14. A storage medium containing computer executable instructions for performing a method of training an obstacle avoidance model as claimed in any one of claims 1 to 11 when executed by a computer processor.
CN202110963071.1A 2021-08-20 2021-08-20 Obstacle avoidance model training method and device, electronic equipment and storage medium Pending CN113780101A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110963071.1A CN113780101A (en) 2021-08-20 2021-08-20 Obstacle avoidance model training method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113780101A true CN113780101A (en) 2021-12-10

Family

ID=78838578

Country Status (1)

Country Link
CN (1) CN113780101A (en)

Similar Documents

Publication Publication Date Title
CN113762252B (en) Unmanned aerial vehicle intelligent following target determining method, unmanned aerial vehicle and remote controller
US11836596B2 (en) Neural networks with relational memory
US11741334B2 (en) Data-efficient reinforcement learning for continuous control tasks
CN110232411B (en) Model distillation implementation method, device, system, computer equipment and storage medium
JP7249384B2 (en) Method for generating route planning model, method for determining trajectory of means of transportation, apparatus for generating route planning model, means of transportation, electronic device, storage medium and computer program
US20210103815A1 (en) Domain adaptation for robotic control using self-supervised learning
CN112183166A (en) Method and device for determining training sample and electronic equipment
US20240029303A1 (en) Three-dimensional target detection method and apparatus
WO2023165361A1 (en) Data processing method and related device
CN112650300A (en) Unmanned aerial vehicle obstacle avoidance method and device
WO2022222647A1 (en) Method and apparatus for predicting vehicle intention, device, and storage medium
WO2020135601A1 (en) Image processing method and device, vehicle-mounted operation platform, electronic device and system
US20210383224A1 (en) Machine learning method and machine learning system involving data augmentation
CN113052253A (en) Hyper-parameter determination method, device, deep reinforcement learning framework, medium and equipment
CN112668596B (en) Three-dimensional object recognition method and device, recognition model training method and device
CN112734827A (en) Target detection method and device, electronic equipment and storage medium
CN113780101A (en) Obstacle avoidance model training method and device, electronic equipment and storage medium
CN116912187A (en) Image generation model training and image generation method, device, equipment and medium
CN116461507A (en) Vehicle driving decision method, device, equipment and storage medium
CN112541515A (en) Model training method, driving data processing method, device, medium and equipment
CN113269168B (en) Obstacle data processing method and device, electronic equipment and computer readable medium
CN112862840B (en) Image segmentation method, device, equipment and medium
CN114387197A (en) Binocular image processing method, device, equipment and storage medium
CN114386481A (en) Vehicle perception information fusion method, device, equipment and storage medium
US20230259467A1 (en) Direct memory access (dma) engine processing data transfer tasks in parallel

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination