CN113110526B - Model training method, unmanned equipment control method and device - Google Patents


Info

Publication number
CN113110526B
CN113110526B
Authority
CN
China
Prior art keywords
driving
model
training
scene
decision model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110657875.9A
Other languages
Chinese (zh)
Other versions
CN113110526A (en)
Inventor
刘思威
贾庆山
任冬淳
白钰
樊明宇
夏华夏
毛一年
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Tsinghua University
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University, Beijing Sankuai Online Technology Co Ltd filed Critical Tsinghua University
Priority to CN202110657875.9A priority Critical patent/CN113110526B/en
Publication of CN113110526A publication Critical patent/CN113110526A/en
Application granted granted Critical
Publication of CN113110526B publication Critical patent/CN113110526B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D 1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D 1/02 Control of position or course in two dimensions
    • G05D 1/021 Control of position or course in two dimensions specially adapted to land vehicles
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 60/00 Drive control systems specially adapted for autonomous road vehicles
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B 13/00 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B 13/02 Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion, electric
    • G05B 13/0265 Adaptive control systems, electric, the criterion being a learning criterion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/23 Clustering techniques
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W 50/00 Details of control systems for road vehicle drive control not related to the control of a particular sub-unit, e.g. process diagnostic or vehicle driver interfaces
    • B60W 2050/0001 Details of the control system
    • B60W 2050/0002 Automatic control, details of type of controller or control system architecture
    • B60W 2050/0003 In analogue systems, e.g. continuous systems


Abstract

The specification discloses a model training method and a method and device for controlling unmanned equipment. Specifically, for each training sample, a scene driving model determines the driving scene corresponding to the sample, and the decision model corresponding to that driving scene is trained on the sample, yielding an adjusted decision model for each scene. The actual driving scene corresponding to each training sample is then determined on the basis of the adjusted decision models, and the scene driving model is trained with those actual driving scenes until it is determined that a preset training condition is met. When the unmanned equipment is subsequently controlled, the trained scene driving model determines the driving scene the equipment is in, a decision model matched to that scene determines the equipment's control strategy, and the equipment is driven under that strategy, which improves its adaptability to different driving scenes.

Description

Model training method, unmanned equipment control method and device
Technical Field
The specification relates to the technical field of unmanned driving, and in particular to a model training method and a control method and device for unmanned equipment.
Background
Unmanned driving technology enables a vehicle to drive itself through a variety of real scenes with no driver present, or without the driver taking over. With continued progress in artificial intelligence, unmanned driving has developed rapidly, and unmanned vehicles are increasingly favored by users. At present, two ways of planning a control strategy are commonly used in unmanned driving. One is to dynamically plan a driving trajectory from the data collected by the unmanned vehicle, determine the vehicle's control strategy from the planned trajectory, and control the vehicle with that strategy. The other is to input the collected data into a pre-trained decision model and obtain the corresponding control strategy directly.
However, in real life the scenes that unmanned vehicles must handle are varied, for example mountain roads, off-road terrain, urban roads and rural roads. The problems and challenges differ from scene to scene, so the points that control-strategy planning must attend to differ as well.
Disclosure of Invention
The present specification provides a model training method and a method and apparatus for controlling an unmanned device, which partially solve the above problems in the prior art.
The technical scheme adopted by the specification is as follows:
the present specification provides a method of model training, comprising:
for each training sample, inputting historical sensing data serving as the training sample into a preset scene driving model to obtain a driving scene corresponding to the training sample;
inputting the historical sensing data into a decision model corresponding to the driving scene to obtain a first predictive control strategy corresponding to the training sample, and training the decision model corresponding to the driving scene according to the first predictive control strategy to obtain an adjusted decision model corresponding to the driving scene;
after each adjusted decision model is obtained, determining the matching degree between the training sample and each adjusted decision model aiming at each training sample, and determining the actual driving scene corresponding to the training sample according to the matching degree;
and training the scene driving model by taking minimization of the deviation between the driving scene and the actual driving scene as an optimization target, until it is determined that a preset training condition is met, wherein the scene driving model and each decision model are used for controlling the unmanned device.
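The four training steps above alternate between assigning samples to driving scenes and adjusting the per-scene decision models. As a rough illustration only, the alternation can be sketched with toy one-dimensional sensing data and nearest-centroid stand-ins for both the scene driving model and the decision models; the centroid update, the 5% stopping ratio and all names are assumptions, since the patent fixes no concrete model form:

```python
def assign_scene(sample, centroids):
    """Scene driving model stand-in: the scene whose centroid is closest."""
    return min(centroids, key=lambda k: abs(sample - centroids[k]))

def train(samples, centroids, rounds=20, tol=0.05):
    for _ in range(rounds):
        # 1) The scene driving model assigns a driving scene to each sample.
        scenes = [assign_scene(s, centroids) for s in samples]
        # 2) "Train" each scene's decision model on its assigned samples
        #    (here: move the scene centroid to the mean of those samples).
        for k in centroids:
            batch = [s for s, sc in zip(samples, scenes) if sc == k]
            if batch:
                centroids[k] = sum(batch) / len(batch)
        # 3) Re-determine each sample's actual scene with the adjusted models.
        actual = [assign_scene(s, centroids) for s in samples]
        # 4) Preset training condition: few samples changed scene this round.
        if sum(a != b for a, b in zip(scenes, actual)) / len(samples) < tol:
            break
    return centroids

trained = train([0.1, 0.2, 0.15, 5.0, 5.2, 4.9],
                {"urban": 0.0, "mountain": 6.0})
```

With real models, step 2 would be a gradient update of the decision model's parameters and step 4 the target-sample ratio test described later in the specification; the skeleton of alternating assignment and update is the same.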
Optionally, according to the first predictive control strategy, training a decision model corresponding to the driving scenario to obtain an adjusted decision model corresponding to the driving scenario specifically includes:
predicting a future driving track corresponding to the training sample according to the first prediction control strategy, and determining a first score corresponding to the future driving track;
and training a decision model corresponding to the driving scene by taking the maximized first score as an optimization target to obtain an adjusted decision model corresponding to the driving scene.
Optionally, determining a matching degree between the training sample and each adjusted decision model specifically includes:
inputting the historical sensing data into each adjusted decision model to obtain a second prediction control strategy corresponding to the training sample;
determining a second score corresponding to the second predictive control strategy;
and determining the matching degree between the training sample and the adjusted decision model according to the second score.
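The three sub-steps above (second predictive control strategy, second score, matching degree) can be sketched as follows; the toy lambda models and the score function passed in are illustrative assumptions, not the patent's models:

```python
def matching_degrees(sample, adjusted_models, score_fn):
    """Matching degree of one sample with each adjusted decision model,
    derived from the second score of that model's second predictive
    control strategy for the sample."""
    return {scene: score_fn(model(sample))
            for scene, model in adjusted_models.items()}

def actual_scene(sample, adjusted_models, score_fn):
    """The actual driving scene is the scene of the best-matching model."""
    degrees = matching_degrees(sample, adjusted_models, score_fn)
    return max(degrees, key=degrees.get)
```

For example, with stand-in models `{"urban": lambda s: s - 1.0, "highway": lambda s: s - 5.0}` and a score that penalizes large control outputs, a sample near 1.0 matches the "urban" model best.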
Optionally, determining that the preset training condition is met specifically includes:
determining target samples from the training samples in each round of model training, wherein, for each training sample, if the actual driving scene determined for the sample in the current round differs from the driving scene recognized when the sample is input into the scene driving model as adjusted in the previous round, the training sample is taken as a target sample;
and if the ratio of the target sample in each training sample is smaller than the set ratio, determining that the preset training condition is met.
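The preset training condition above reduces to a simple proportion test over the per-round scene assignments; a minimal sketch, where the 5% default for the set ratio is an arbitrary assumption:

```python
def training_done(previous_scenes, actual_scenes, set_ratio=0.05):
    """Target samples are those whose actual driving scene differs from
    the scene recognized by the previous round's scene driving model;
    training is done once their proportion drops below the set ratio."""
    targets = sum(p != a for p, a in zip(previous_scenes, actual_scenes))
    return targets / len(actual_scenes) < set_ratio
```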
Optionally, the method further comprises:
determining each adjusted decision model matched with the algorithm configuration as each decision model to be clustered;
clustering the decision models to be clustered according to model parameters contained in the decision models to be clustered to obtain clustering clusters;
aiming at each cluster, merging the driving scenes corresponding to the decision models to be clustered contained in the cluster to obtain a merged driving scene corresponding to the cluster;
and determining a decision model of the combined driving scene corresponding to the clustering cluster according to the decision model to be clustered contained in the clustering cluster.
Optionally, determining a decision model of a merged driving scene corresponding to the cluster according to the decision model to be clustered included in the cluster specifically includes:
aiming at each decision model to be clustered contained in the cluster, determining a weight coefficient corresponding to the decision model to be clustered according to the number of training samples belonging to the driving scene corresponding to the decision model to be clustered;
and generating a decision model for merging the driving scenes corresponding to the cluster according to the weight coefficient corresponding to each decision model to be clustered in the cluster, the model parameters contained in each decision model to be clustered in the cluster and the matched algorithm configuration contained in each decision model to be clustered in the cluster.
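The clustering-and-merging steps above might look like the following sketch, which reduces each model's parameters to a single scalar and uses a simple distance-threshold clustering; both simplifications, the threshold value, and all names are assumptions rather than the patent's concrete algorithm:

```python
def cluster_models(params, threshold=1.0):
    """Greedily cluster decision models (here one scalar parameter each):
    a model joins the first cluster whose seed parameter is within
    `threshold`; otherwise it starts a new cluster."""
    clusters = []
    for name in params:
        for c in clusters:
            if abs(params[name] - params[c[0]]) <= threshold:
                c.append(name)
                break
        else:
            clusters.append([name])
    return clusters

def merge_cluster(cluster, params, sample_counts):
    """Merged parameters: weighted average of the cluster's models, each
    weighted by the number of training samples belonging to its scene
    (the weight coefficient)."""
    total = sum(sample_counts[n] for n in cluster)
    return sum(params[n] * sample_counts[n] / total for n in cluster)
```

With parameters `{"mountain": 1.0, "rural": 1.2, "highway": 9.0}`, the mountain and rural models cluster together and merge into a single model for the combined scene, weighted toward whichever scene had more training samples.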
The present specification provides a control method of an unmanned device, including:
acquiring sensing data acquired by unmanned equipment;
inputting the sensing data into a pre-trained scene driving model to obtain a driving scene corresponding to the unmanned equipment;
inputting the sensing data into a decision model matched with the driving scene to obtain a control strategy corresponding to the unmanned equipment, wherein the scene driving model and the decision model are obtained by training through the model training method;
and controlling the unmanned equipment according to the control strategy.
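The four control steps above form one pipeline per control cycle: sensing data, recognized scene, matched decision model, control strategy, actuation. A minimal sketch with placeholder callables standing in for the trained models (all names are assumptions):

```python
def control_step(sensing_data, scene_model, decision_models, actuate):
    """One control cycle: recognize the driving scene, query the matched
    decision model for a control strategy, and act on it."""
    scene = scene_model(sensing_data)
    strategy = decision_models[scene](sensing_data)
    actuate(strategy)
    return scene, strategy

issued = []
scene, strategy = control_step(
    {"speed": 8.0},
    lambda d: "urban" if d["speed"] < 20 else "highway",
    {"urban": lambda d: {"throttle": 0.2, "brake": 0.0},
     "highway": lambda d: {"throttle": 0.6, "brake": 0.0}},
    issued.append,
)
```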
The present specification provides an apparatus for model training, comprising:
the driving scene determining module is used for inputting the historical sensing data serving as the training samples into a preset scene driving model aiming at each training sample to obtain a driving scene corresponding to the training sample;
the decision model training module is used for inputting the historical sensing data into a decision model corresponding to the driving scene to obtain a first prediction control strategy corresponding to the training sample, and training the decision model corresponding to the driving scene according to the first prediction control strategy to obtain an adjusted decision model corresponding to the driving scene;
the actual driving scene determining module is used for determining the matching degree between each training sample and each adjusted decision model after each adjusted decision model is obtained, and determining the actual driving scene corresponding to the training sample according to the matching degree;
and the scene driving model training module is used for training the scene driving model by taking minimization of the deviation between the driving scene and the actual driving scene as an optimization target until it is determined that a preset training condition is met, wherein the scene driving model and each decision model are used for controlling the unmanned device.
The present specification provides a control apparatus of an unmanned device, including:
the acquisition module is used for acquiring sensing data acquired by the unmanned equipment;
the driving scene determining module is used for inputting the sensing data into a pre-trained scene driving model to obtain a driving scene corresponding to the unmanned equipment;
the control strategy determination module is used for inputting the sensing data into a decision model matched with the driving scene to obtain a control strategy corresponding to the unmanned equipment, and the scene driving model and the decision model are obtained by training through the model training method;
and the control module is used for controlling the unmanned equipment according to the control strategy.
The present specification provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-described method of model training and method of controlling an unmanned device.
The present specification provides an unmanned device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the above method of model training and method of controlling an unmanned device when executing the program.
The technical scheme adopted by the specification can achieve the following beneficial effects:
In the model training method and the control method for unmanned equipment provided in this specification, for each training sample, the historical sensing data serving as the training sample is input into a preset scene driving model to obtain the driving scene corresponding to the training sample. The historical sensing data is then input into the decision model corresponding to that driving scene to obtain a first predictive control strategy corresponding to the training sample, and the decision model is trained according to the first predictive control strategy to obtain an adjusted decision model corresponding to the driving scene. Based on the adjusted decision models so obtained, the matching degree between each training sample and each adjusted decision model is determined, the actual driving scene corresponding to the training sample is determined from the matching degrees, and the scene driving model is trained by taking minimization of the deviation between the driving scene output by the scene driving model and the actual driving scene as an optimization target, until it is determined that a preset training condition is met. Subsequently, during driving, the trained scene driving model determines, from the sensing data collected by the unmanned equipment, the driving scene the equipment is in; the sensing data is input into the pre-trained decision model matched with that driving scene to obtain the corresponding control strategy; and the unmanned equipment is controlled according to the determined control strategy.
It can be seen from the above method that, in this specification, the decision model corresponding to each driving scene is trained on the basis of the driving scene results output by the scene driving model, yielding the adjusted decision models; the actual driving scene corresponding to each training sample is then determined on the basis of the adjusted decision models, and the scene driving model is trained with those actual driving scenes until the preset training condition is met. When the unmanned equipment is then controlled, the trained scene driving model determines the driving scene the equipment is in, a decision model matched with that scene determines the equipment's control strategy, and the equipment is controlled according to that strategy. The decision model used to control the unmanned equipment thereby suits the equipment's current environment as closely as possible, improving its adaptability to different driving scenes.
Drawings
The accompanying drawings, which are included to provide a further understanding of the specification and are incorporated in and constitute a part of it, illustrate embodiments of the specification and together with the description serve to explain the specification; they are not intended to limit it. In the drawings:
FIG. 1 is a schematic flow chart of a method of model training in the present specification;
FIG. 2 is a detailed flow chart of the model training in this specification;
FIG. 3 is a schematic flow chart of a method for controlling an unmanned device in this specification;
FIG. 4 is a schematic diagram of an apparatus for model training provided in this specification;
FIG. 5 is a schematic diagram of a control apparatus for an unmanned device provided in this specification;
FIG. 6 is a schematic diagram of the unmanned device corresponding to FIG. 1 or FIG. 3 provided in this specification.
Detailed Description
In order to make the objects, technical solutions and advantages of the present disclosure clearer, the technical solutions of the present disclosure will be described clearly and completely below with reference to specific embodiments and the accompanying drawings. It is to be understood that the described embodiments are only some of the embodiments of the present disclosure, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in this specification without creative effort fall within the protection scope of this specification.
In current unmanned driving practice, the main approach is to train a decision model for the unmanned device under a single algorithm configuration and use that model to determine the device's control strategy. However, a single decision model is usually relied on for all decisions, and it is often not applicable to every driving scene. In practical application, the control strategy the device derives from the model may therefore not suit the current driving scene, so the device adapts poorly to different scenes, which may create safety hazards while driving.
To solve this problem, the specification provides a model training method that trains a matched decision model for each driving scene, so that a decision in each driving scene can be made by the decision model trained for that scene, improving the adaptability of the unmanned device to different driving scenes.
The method for model training and the control scheme of the unmanned aerial vehicle provided in the present specification will be described in detail below with reference to embodiments.
Fig. 1 is a schematic flow chart of a model training method in this specification, which specifically includes the following steps:
step S100, aiming at each training sample, inputting historical sensing data serving as the training sample into a preset scene driving model to obtain a driving scene corresponding to the training sample.
In this specification, the historical sensing data as the training sample is the historical sensing data collected by a specific device (which may be a device dedicated to collecting sensing data when the driver drives, or may be an unmanned device) equipped with various sensors (such as a camera, a laser radar, a millimeter wave radar, and the like) during actual road driving, and the historical sensing data may include: specifying the environment in which the device is located, specifying the travel state of the device, specifying the control amount of the device, and the like. The environment where the designated device is located may include the type of the road where the designated device is located (e.g., mountain roads, urban and rural roads, expressways, urban main roads, etc.), the road conditions of the road where the designated device is located (e.g., wet and slippery road conditions, road surface water accumulation conditions, road surface leveling conditions, lane numbers, traffic flow sizes, etc.), obstacles around the road segment where the designated device is located (e.g., obstacle vehicles (including vehicle types, vehicle positions, etc.), pedestrians, etc.), and the like. The running state of the specified device may include a speed of the specified device, an acceleration of the specified device, an angular velocity of the specified device, and the like. The control quantity of the designated equipment can comprise the strength of an accelerator of the designated equipment, the strength of a brake of the designated equipment, the rotation angle of a steering wheel of the designated equipment and the like.
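The historical sensing data described above groups naturally into three parts: the environment of the designated device, its travel state, and its control quantity. One possible way to structure a training sample, with field names that are illustrative assumptions rather than the patent's terminology:

```python
from dataclasses import dataclass, field

@dataclass
class Environment:
    road_type: str            # e.g. "mountain", "urban", "expressway"
    road_condition: str       # e.g. "wet", "flooded", "level"
    obstacles: list = field(default_factory=list)  # vehicles, pedestrians, ...

@dataclass
class TravelState:
    speed: float              # m/s
    acceleration: float       # m/s^2
    angular_velocity: float   # rad/s

@dataclass
class ControlQuantity:
    throttle: float           # accelerator strength
    brake: float              # brake strength
    steering_angle: float     # steering-wheel rotation angle

@dataclass
class TrainingSample:
    environment: Environment
    state: TravelState
    control: ControlQuantity
```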
In specific implementation, after the unmanned equipment inputs the historical sensing data serving as the training sample into a preset scene driving model for each training sample, the scene driving model outputs a driving scene corresponding to the training sample, and then a decision model needing to be trained can be determined according to the driving scene.
It should be noted that the execution subject of the model training method above is the unmanned device; of course, a server providing service support for the unmanned device may also serve as the execution subject. For convenience of description, only the unmanned device is taken as the execution subject below. The unmanned device may be a vehicle, a robot, an automatic delivery device, or any other device capable of automatic driving. On this basis, an unmanned device configured with a model obtained by the model training method provided in this specification can be used to execute delivery tasks in the delivery field, for example business scenes such as express delivery, logistics and takeaway delivery using unmanned devices.
Each driving scene corresponds to a decision model, the decision model is formed by algorithm configuration and model parameters, and the decision models are at least partially different in model parameters. When the decision models corresponding to a plurality of preset driving scenes are determined, the decision models corresponding to the driving scenes can be set under each algorithm configuration, and then the decision models corresponding to all the driving scenes are obtained. The algorithm configuration may refer to a deep learning neural network, a trajectory planning algorithm, and the like with different operation logics.
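Since a decision model consists of an algorithm configuration plus model parameters, and one model is set per driving scene under each configuration, the full set of models can be pictured as a registry keyed by (configuration, scene) pairs. A minimal sketch, where the dict representation and all names are assumptions:

```python
def make_decision_models(algorithm_configs, driving_scenes, init_params):
    """One decision model per (algorithm configuration, driving scene) pair;
    models under the same configuration share logic but not parameters."""
    return {
        (cfg, scene): {"config": cfg, "params": dict(init_params)}
        for cfg in algorithm_configs
        for scene in driving_scenes
    }

models = make_decision_models(
    ["dnn_policy", "trajectory_planner"],
    ["urban", "mountain", "highway"],
    {"w": 0.0},
)
```

Copying `init_params` per model reflects the text's point that the models differ at least partially in their parameters even when the configuration is shared.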
Step S102, inputting the historical sensing data into a decision model corresponding to the driving scene to obtain a first predictive control strategy corresponding to the training sample, and training the decision model corresponding to the driving scene according to the first predictive control strategy to obtain an adjusted decision model corresponding to the driving scene.
In specific implementation, for each training sample, the unmanned device may input the historical sensing data of the training sample into the decision model matched with the driving scene determined for that sample, obtaining a first predictive control strategy corresponding to the training sample. Then, based on the sensing data in the training sample, the future travel track that would result from the designated device driving according to the first predictive control strategy is predicted, a first score corresponding to that future travel track is determined, and the decision model corresponding to the driving scene is trained with maximization of the first score as the optimization target, yielding an adjusted decision model for each driving scene.
The first predictive control strategy may include a speed control strategy for controlling the speed of the designated device and a steering-angle control strategy for controlling its steering angle. When it includes only the speed control strategy, it may contain the accelerator strength and the brake strength of the designated device. When it includes both strategies, it may contain the accelerator strength, the brake strength and the steering angle of the designated device.
After the unmanned equipment determines the first predictive control strategy corresponding to the training sample, the corresponding first score is further determined according to the first predictive control strategy, and the decision model is trained according to the first score.
Based on the historical sensing data contained in the training sample, the unmanned device may run a simulation test under the determined first predictive control strategy to obtain the future travel track the designated device would follow, and determine the first score for that track from the difference between the future travel track and the designated device's actual travel track: the higher the coincidence between the two, the higher the first score.
Of course, when predicting the future travel track, the unmanned device may also include in it the predicted sensing data corresponding to the designated device driving according to the first predictive control strategy; the first score is then determined from the predicted sensing data in the future travel track together with the historical sensing data in the training sample. The sensing data used to determine the first score may include the speed, acceleration and steering angle of the designated device. In general, the faster the device travels, the sooner it reaches its destination, so a higher speed can be set to yield a higher score. The steadier the change in acceleration, the more stable the driving and the higher the comfort, so steadier acceleration can be set to yield a higher score. Likewise, a smaller and steadier steering angle means more stable, more comfortable driving, so steadier steering can be set to yield a higher score.
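The scoring idea above (reward speed, reward stable acceleration and steering) could be expressed, for example, with a variance-based penalty over the predicted track; the weights and the variance measure are illustrative assumptions, not the patent's formula:

```python
def first_score(speeds, accelerations, steering_angles, weights=(1.0, 1.0, 1.0)):
    """Reward higher mean speed; penalize unstable acceleration and
    steering, measured here by their variance along the predicted track."""
    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)
    mean_speed = sum(speeds) / len(speeds)
    return (weights[0] * mean_speed
            - weights[1] * variance(accelerations)
            - weights[2] * variance(steering_angles))

smooth = first_score([5, 5, 5], [0.1, 0.1, 0.1], [0.0, 0.0, 0.0])
jerky = first_score([5, 5, 5], [2.0, -2.0, 2.0], [0.3, -0.3, 0.3])
```

At equal mean speed, the smooth track scores higher than the jerky one, which is exactly the preference the text describes.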
Further, since the decision models for different driving scenes emphasize different points when making decisions, for each driving scene a part of the sensing data dedicated to that scene can be selected, according to the scene's characteristics, for determining the corresponding first score.
For example, in the following driving scene, the distance between the designated device and the preceding vehicle cannot be too small or too large, and therefore, the first score corresponding to the future driving trajectory can be determined by predicting the change in the distance between the designated device and the preceding vehicle in the future driving trajectory when the designated device drives according to the first control strategy. In this case, the position data of the designated device and the position data of the preceding vehicle may be used as the sensing data dedicated to the determination of the first score in the scene of the following vehicle.
For another example, in a turning driving scene, the designated device needs to stay in its lane without overtaking or changing lanes. Therefore, the first score corresponding to the future driving trajectory may be determined from the predicted change in the distance between the vehicle body and the lane lines on both sides when the designated device drives according to the first predictive control strategy. In this case, the distance between the left side of the designated device and the left lane line and the distance between the right side of the designated device and the right lane line may serve as the sensing data dedicated to determining the first score in the turning driving scene.
It should be noted that the above are only two specific examples of determining the first score in two driving scenarios. In practical application, different sensing data may be selected for different driving scenarios according to actual requirements to determine the first score corresponding to the future driving trajectory, which is not specifically limited here.
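The scene-specific scoring in the two examples above can be sketched as per-scene scorer functions keyed by driving scene. The scene names, the target following gap, and the exact score forms below are illustrative assumptions.

```python
import numpy as np

def following_score(ego_positions, lead_positions, target_gap=20.0):
    """Following scene: penalize gaps to the preceding vehicle that
    drift too far from a target gap (neither too small nor too large).
    target_gap is an assumed value in meters."""
    gaps = np.linalg.norm(
        np.asarray(lead_positions, float) - np.asarray(ego_positions, float),
        axis=1)
    return -np.abs(gaps - target_gap).mean()

def turning_score(left_gaps, right_gaps):
    """Turning scene: penalize unstable distances between the vehicle
    body and the lane lines on both sides."""
    return -(np.std(left_gaps) + np.std(right_gaps))

# Dispatch table: each driving scene uses its dedicated sensing data.
SCENE_SCORERS = {"following": following_score, "turning": turning_score}
```

Adding a new driving scene then only requires registering a new scorer over whatever sensing data that scene emphasizes.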
Step S104: after each adjusted decision model is obtained, determine, for each training sample, the matching degree between the training sample and each adjusted decision model, and determine the actual driving scene corresponding to the training sample according to the matching degrees.
In a specific implementation, after obtaining each adjusted decision model, the unmanned device determines the matching degree between each training sample and each adjusted decision model. Specifically, the unmanned device may input the historical sensing data into each adjusted decision model to obtain the second predictive control strategies corresponding to the training sample, determine a second score corresponding to each second predictive control strategy, and determine the matching degree between the training sample and each adjusted decision model according to the second score. The second score corresponding to a second predictive control strategy may be determined in the same manner as the first score corresponding to the future driving trajectory, which is not elaborated here.
In this specification, when determining the matching degree between the training sample and each adjusted decision model according to the second score, the higher the second score, the higher the matching degree between the training sample and the adjusted decision model that produced the corresponding second predictive control strategy, that is, the more likely the training sample belongs to the driving scene corresponding to that adjusted decision model. In this case, the driving scene corresponding to the decision model with the highest matching degree may be taken as the actual driving scene corresponding to the training sample.
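With the matching degree taken to grow with the second score, the actual driving scene assignment reduces to an argmax over the per-scene second scores:

```python
def actual_driving_scene(second_scores):
    """Pick the driving scene whose adjusted decision model produced the
    highest second score for this training sample. `second_scores` maps
    driving scene -> second score of that scene's adjusted decision model."""
    return max(second_scores, key=second_scores.get)
```

For example, a sample scoring highest under the following-scene model is labeled as belonging to the following scene for the next round of scene driving model training.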
Of course, the matching degree between the training sample and each adjusted decision model may also be determined in other manners in this specification. For example, when a decision model is trained, the model update step size during training is determined according to the first score obtained for each training sample. The better a training sample fits a decision model, the smaller the update step size obtained during training tends to be. Therefore, the matching degree between a training sample and each adjusted decision model may be determined from the model update step size that the sample produced during the training of each decision model, where a smaller update step size corresponds to a higher matching degree. Other manners are not illustrated in detail here.
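The inverse relation between update step size and matching degree might be realized as below; the reciprocal mapping is one illustrative choice among many, since the specification only requires that a smaller step yield a higher matching degree.

```python
def matching_from_update_steps(update_steps, eps=1e-8):
    """Alternative matching degree: a smaller model update step size
    produced by a sample during a decision model's training implies a
    better fit, so map step size to its reciprocal (eps guards against
    division by zero). `update_steps` maps driving scene -> step size."""
    return {scene: 1.0 / (step + eps) for scene, step in update_steps.items()}
```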
Step S106: train the scene driving model with minimizing the deviation between the driving scene and the actual driving scene as the optimization target, until a preset training condition is met.
In this specification, when training the scene driving model, the unmanned device inputs, for each training sample, the historical sensing data serving as the training sample into the preset scene driving model to obtain the driving scene corresponding to the training sample, and then trains the scene driving model with minimizing the deviation between this driving scene and the actual driving scene as the optimization target. After all training samples have been used, the unmanned device judges whether the preset training condition is met; if not, it re-determines, for each training sample, the driving scene corresponding to the training sample through the adjusted scene driving model, and continues training the decision models and the scene driving model until the preset training condition is met.
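As an illustration of training with the minimized scene deviation as the optimization target, the sketch below treats the scene driving model as a softmax classifier trained with cross-entropy against the actual driving scene label. The linear model form, learning rate, and feature encoding are all assumptions; the specification does not fix an architecture.

```python
import numpy as np

def train_scene_model_step(W, b, x, scene_label, n_scenes, lr=0.1):
    """One gradient step shrinking the deviation between the predicted
    driving scene and the actual driving scene (cross-entropy loss on a
    softmax over scene logits). W, b are updated in place; returns the
    current loss so convergence can be monitored."""
    logits = W @ x + b
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    target = np.zeros(n_scenes)
    target[scene_label] = 1.0                   # one-hot actual driving scene
    grad = probs - target                       # d(cross-entropy)/d(logits)
    W -= lr * np.outer(grad, x)
    b -= lr * grad
    return -np.log(probs[scene_label])
```

Repeated steps on the same labeled sample drive the loss down, i.e., the predicted scene moves toward the actual scene.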
In this specification, the unmanned device may judge whether the preset training condition is met in the following manner. Specifically, from all training samples, the unmanned device selects as target training samples those whose actual driving scene determined in the current training round differs from the driving scene obtained by inputting the sample into the scene driving model adjusted in the previous round. It then determines the proportion of the target samples among all training samples and judges whether this proportion is smaller than a set ratio. If so, the preset training condition is met and model training is complete; otherwise, the next round of decision model training and scene driving model training continues.
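The stopping test reduces to counting label changes between two consecutive rounds; the 5% threshold below is an illustrative value for the set ratio.

```python
def training_converged(prev_scenes, curr_scenes, max_ratio=0.05):
    """Preset training condition: the proportion of target samples
    (samples whose actual driving scene changed between the previous and
    current rounds) must be smaller than the set ratio. max_ratio is an
    assumed threshold."""
    changed = sum(p != c for p, c in zip(prev_scenes, curr_scenes))
    return changed / len(curr_scenes) < max_ratio
```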
That is, the smaller the number of training samples assigned different actual driving scenes in two consecutive training rounds (i.e., the smaller the proportion of target samples among all training samples), the stronger the ability of the scene driving model to recognize the driving scene corresponding to a training sample. In other words, the driving scenes output by the scene driving model when classifying the training samples no longer change frequently between rounds, the classification logic it realizes tends to be stable, and its output driving scenes can be considered accurate. Correspondingly, the decrease in the number of training samples assigned different actual driving scenes across rounds also shows that the actual driving scene corresponding to each training sample can be accurately determined through the adjusted decision models using the scoring scheme described above, and thus that the control strategies determined by the adjusted decision models are more accurate.
Of course, the preset training condition may take other forms. For example, when the number of model training rounds reaches a set number, it may be determined that the preset training condition is met. For another example, after each round of training, the scene driving model and the decision model corresponding to each driving scene may be verified with verification samples, and the preset training condition is determined to be met once the verification passes. Other manners are not illustrated in detail here.
Further, in order to accelerate model training in this specification, during the first round of model training, for each training sample, the historical sensing data serving as the training sample may be directly input into at least some of the decision models to obtain the corresponding first predictive control strategies, and the corresponding first scores may be determined, so that those decision models are trained according to the first scores to obtain the adjusted decision models. Then, the actual driving scene corresponding to each training sample is determined based on the adjusted decision models, and the scene driving model is trained to obtain the adjusted scene driving model. A second round of model training is then performed.
In the second round of model training, for each training sample, the historical sensing data serving as the training sample is input into the adjusted scene driving model to obtain the driving scene corresponding to the training sample, and the decision model corresponding to that driving scene is trained to obtain a newly adjusted decision model. Then, based on the newly adjusted decision models, the actual driving scene corresponding to each training sample is re-determined so as to train the scene driving model and obtain a newly adjusted scene driving model. Afterwards, whether the preset training condition is met is judged; if not, the next round of training begins, until the preset training condition is determined to be met.
In order to further increase the speed of model training, in this specification all training samples in the training sample set may be divided in advance into a plurality of sub-training sample sets before model training. Then, during model training, the training samples belonging to the same sub-training sample set yield consistent driving scenes after being input into the scene driving model, so that the scene driving model and the decision model corresponding to each driving scene can be trained on a per-subset basis.
The overall flow of the model training method provided in this specification is described below with reference to fig. 2.
Step 200: obtain a training sample for training the decision models, and input the training sample into the preset scene driving model to obtain the driving scene corresponding to the training sample.
Step 202: input the training sample into the decision model corresponding to the driving scene to obtain the first predictive control strategy corresponding to the training sample, and train the decision model corresponding to the driving scene according to the first predictive control strategy to obtain an adjusted decision model.
Step 204: judge whether there are training samples not yet used for training the decision models; if so, return to step 200; otherwise, execute step 206.
Step 206: after each adjusted decision model is obtained, determine, for each training sample, the matching degree between the training sample and each adjusted decision model, and determine the actual driving scene corresponding to the training sample according to the matching degrees.
Step 208: train the scene driving model with minimizing the deviation between the driving scene corresponding to the training sample and the actual driving scene as the optimization target, so as to obtain the adjusted scene driving model.
Step 210: judge whether there are training samples not yet used for training the scene driving model; if so, return to step 206; otherwise, execute step 212.
Step 212: determine that this round of model training is completed, and judge whether the preset training condition is met; if so, end the process; otherwise, execute step 214.
Step 214: obtain a training sample for training the decision models, input the training sample into the adjusted scene driving model to obtain the driving scene corresponding to the training sample, and continue to execute step 202.
In this specification, in order to better cope with various driving scenes during model training, the scene driving model may be designed to classify a relatively large number of driving scene types. As a result, the driving scenes determined after training, and the decision models corresponding to them, may contain a certain degree of redundancy. Therefore, in practical application, after model training is completed, the decision models may be clustered, and the driving scenes belonging to the same cluster, together with their corresponding decision models, may be merged.
Specifically, the unmanned device determines the adjusted decision models with matching algorithm configurations as the decision models to be clustered. It then clusters the decision models to be clustered according to the model parameters contained in each of them to obtain the clusters. Next, for each cluster, it merges the driving scenes corresponding to the decision models to be clustered contained in the cluster to obtain the merged driving scene corresponding to the cluster. Finally, it determines the decision model for the merged driving scene corresponding to the cluster according to the decision models to be clustered contained in the cluster.
When determining the decision models to be clustered from all the adjusted decision models, the unmanned device may set the model parameters of all the adjusted decision models to the same parameter values and then, according to the operation logic each decision model realizes, regard the adjusted decision models with similar operation logic as the adjusted decision models with matching algorithm configurations, which serve as the decision models to be clustered.
Further, after determining the decision models to be clustered, the unmanned device clusters them according to the model parameters contained in each decision model to be clustered, obtaining the clusters.
In a specific implementation, when clustering the decision models to be clustered, the unmanned device determines the similarity between any two of them according to their model parameters, and clusters them according to the determined similarities to obtain the clusters.
The unmanned device may determine the similarity between two decision models to be clustered in various ways. For example, it may construct, from the model parameters of each decision model to be clustered, a vector representing that model, then determine the cosine distance between two such vectors and derive the similarity between the two models from it, where a smaller cosine distance between the two vectors means a higher similarity between the corresponding decision models.
For another example, after constructing a vector representing each decision model to be clustered from its model parameters, the unmanned device may subtract the vectors of two decision models to obtain a difference vector describing the difference between them, then determine the norm of the difference vector and derive the similarity between the two models from it, where a smaller norm of the difference vector means a higher similarity between the corresponding decision models. It should be noted that the above are only two specific examples of determining clusters; other clustering methods may also be used in this specification, which need not be enumerated.
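The two similarity measures just described can be sketched directly over parameter vectors; the mapping of the difference norm into (0, 1] is an illustrative choice, since the specification only requires that a smaller norm yield a higher similarity.

```python
import numpy as np

def cosine_similarity(p, q):
    """Similarity of two to-be-clustered decision models from the cosine
    of the angle between their parameter vectors (smaller cosine
    distance -> higher similarity)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return p @ q / (np.linalg.norm(p) * np.linalg.norm(q))

def norm_similarity(p, q):
    """Similarity from the norm of the parameter difference vector
    (smaller norm -> higher similarity), mapped here to (0, 1]."""
    diff = np.asarray(p, float) - np.asarray(q, float)
    return 1.0 / (1.0 + np.linalg.norm(diff))
```

Either measure can then feed a standard clustering routine (e.g., agglomerative clustering over the pairwise similarity matrix).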
After determining the clusters, the unmanned device merges, for each cluster, the driving scenes corresponding to the decision models to be clustered contained in the cluster to obtain the merged driving scene corresponding to the cluster, and finally determines the decision model for that merged driving scene according to the decision models to be clustered contained in the cluster.
In a specific implementation, for each decision model to be clustered contained in a cluster, the unmanned device determines a weight coefficient corresponding to that model according to the number of training samples belonging to the driving scene corresponding to it. It then generates the decision model for the merged driving scene corresponding to the cluster according to the weight coefficient of each decision model to be clustered in the cluster, the model parameters contained in each of them, and their matching algorithm configuration.
For example, suppose cluster 1 contains the decision models to be clustered A, B and C, where the model parameters of A are represented as [a1, a2, a3, …, an] with weight coefficient 0.2; the model parameters of B are represented as [b1, b2, b3, …, bn] with weight coefficient 0.3; the model parameters of C are represented as [c1, c2, c3, …, cn] with weight coefficient 0.5; and the algorithm configurations of the three models are the same. Then the model parameters of the decision model for the merged driving scene corresponding to the cluster may be represented as [0.2×a1+0.3×b1+0.5×c1, 0.2×a2+0.3×b2+0.5×c2, 0.2×a3+0.3×b3+0.5×c3, …, 0.2×an+0.3×bn+0.5×cn].
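The worked example above is a weighted sum over aligned parameter vectors, which one line of linear algebra reproduces:

```python
import numpy as np

def merge_decision_models(params, weights):
    """Weighted merge of same-configuration decision models in a cluster:
    each merged parameter is the weight-coefficient-weighted sum of the
    corresponding parameters, as in [0.2*a1+0.3*b1+0.5*c1, ...] above.

    params:  (n_models, n_params) array, one parameter vector per model
    weights: (n_models,) array of weight coefficients
    """
    params = np.asarray(params, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return weights @ params
```

With weights derived from per-scene training sample counts (normalized to sum to 1), this yields the merged scene's decision model parameters directly.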
The above is only one example of obtaining the decision model for a merged driving scene; in this specification it can be set according to actual requirements, so the manner of merging the decision models of the driving scenes is not specifically limited.
Further, in this specification, for each cluster, one decision model to be clustered may instead be selected directly from those in the cluster to serve as the decision model for the merged driving scene corresponding to the cluster.
For example, for each decision model to be clustered contained in a cluster, the unmanned device may determine the number of training samples belonging to the driving scene corresponding to that model, and select the one with the largest number of training samples as the decision model for the merged driving scene corresponding to the cluster. For another example, for each cluster, the unmanned device may randomly select one decision model to be clustered from those in the cluster as the decision model for the merged driving scene. Other manners are not enumerated here.
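The first selection rule (keep the model whose scene covers the most training samples) is again a simple argmax:

```python
def select_representative(sample_counts):
    """Pick, as the merged scene's decision model, the to-be-clustered
    model whose corresponding driving scene contains the most training
    samples. `sample_counts` maps model id -> training sample count."""
    return max(sample_counts, key=sample_counts.get)
```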
In the model training method provided in this specification, the decision models are first trained based on the initial driving scene classification given by the scene driving model, so that the adjusted decision models have a certain decision-making capability. Then, given that the adjusted decision models have this capability, they are used to determine the actual driving scene corresponding to each training sample (i.e., the driving scene corresponding to the most suitable decision model), and the actual driving scenes are in turn used as labels to train the scene driving model. After repeated iterative training in this manner, the trained scene driving model can be ensured to have a strong scene classification capability, and the trained decision models a strong decision-making capability.
Further, in practical application, the division of driving scenes is often complex, so if the actual driving scene corresponding to each training sample were determined manually, the accuracy would often be low. In the model training process of this specification, the actual driving scenes corresponding to the training samples do not need to be labeled manually; instead, the control strategies determined by the adjusted decision models are used to decide which driving scene's decision model each training sample suits, which effectively saves labor cost while accurately determining the actual driving scene corresponding to each training sample.
Fig. 3 is a schematic flow chart of a control method of an unmanned device in this specification, which specifically includes the following steps:
step 300, acquiring sensing data acquired by the unmanned equipment;
step 302, inputting the sensing data into a pre-trained scene driving model to obtain a driving scene corresponding to the unmanned equipment;
step 304, inputting the sensing data into a decision model matched with the driving scene to obtain a control strategy corresponding to the unmanned equipment;
and step 306, controlling the unmanned equipment according to the control strategy.
The scene driving model and the decision model are obtained by training through the model training method.
In a specific implementation, the control of the unmanned device may be decoupled into two parts in this specification: speed control and steering angle control. If the scene driving model and decision models trained in this specification are used only for the speed control of the unmanned device, the unmanned device inputs the sensing data into the scene driving model, and after obtaining the corresponding driving scene, further inputs the sensing data into the decision model matched with the driving scene to obtain the speed control strategy corresponding to the unmanned device. When the unmanned device drives according to the speed control strategy, its speed control is realized, where the speed control strategy may include the throttle strength and brake strength of the unmanned device. As for steering angle control, the steering angle control strategy may be determined by the unmanned device through a trajectory tracking algorithm (such as the pure-pursuit algorithm, the Stanley method, and the like).
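Since the trajectory tracking step names the pure-pursuit algorithm, a minimal sketch of its steering law may help. The wheelbase value and the vehicle-frame convention (x forward, y to the left) are illustrative assumptions; the standard geometric formula is delta = atan2(2·L·sin(alpha), l_d).

```python
import math

def pure_pursuit_steering(goal_x, goal_y, wheelbase=2.5):
    """Pure-pursuit steering angle toward a lookahead point (goal_x,
    goal_y) given in the vehicle frame. wheelbase (L) is an assumed
    vehicle parameter in meters."""
    l_d = math.hypot(goal_x, goal_y)       # lookahead distance
    alpha = math.atan2(goal_y, goal_x)     # heading error to the point
    return math.atan2(2.0 * wheelbase * math.sin(alpha), l_d)
```

A lookahead point straight ahead yields a zero steering angle, while a point to the left yields a positive (left) steering command, which is the behavior the steering angle control strategy above relies on.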
If the scene driving model and decision models trained in this specification are used to realize both speed control and steering angle control of the unmanned device, the unmanned device inputs the sensing data into the scene driving model, and after obtaining the corresponding driving scene, further inputs the sensing data into the decision model matched with the driving scene to obtain both the speed control strategy and the steering angle control strategy corresponding to the unmanned device. When the unmanned device drives according to the speed control strategy, its speed control is realized, where the speed control strategy may include the throttle strength and brake strength of the unmanned device. Meanwhile, when the unmanned device drives according to the steering angle control strategy, its steering angle control is realized, where the steering angle control strategy may include the turning angle of the vehicle's steering wheel.
Through the above steps, the unmanned device can train the decision model corresponding to each driving scene based on the driving scene results output by the scene driving model to obtain the adjusted decision models, then determine the actual driving scene corresponding to each training sample based on the adjusted decision models, and train the scene driving model with these actual driving scenes until the preset training condition is met. Afterwards, when the unmanned device is controlled, the trained scene driving model is used to determine the driving scene the unmanned device is in, and the decision model matched with that driving scene is used to determine the control strategy for controlling the unmanned device, so that the decision model used when controlling the unmanned device suits its current environment as far as possible, improving the adaptability of the unmanned device to different driving scenes.
The method for model training and the method for controlling an unmanned device provided above by one or more embodiments of this specification are based on the same idea, and this specification further provides a corresponding model training apparatus and a corresponding unmanned device control apparatus, as shown in fig. 4 and fig. 5.
Fig. 4 is a schematic diagram of a model training apparatus provided in this specification, which specifically includes:
the driving scene determining module 400 is configured to, for each training sample, input historical sensing data serving as the training sample into a preset scene driving model to obtain a driving scene corresponding to the training sample;
a decision model training module 401, configured to input the historical sensing data into a decision model corresponding to the driving scenario, to obtain a first predictive control strategy corresponding to the training sample, and train the decision model corresponding to the driving scenario according to the first predictive control strategy, to obtain an adjusted decision model corresponding to the driving scenario;
an actual driving scene determining module 402, configured to determine, for each training sample, a matching degree between the training sample and each adjusted decision model after obtaining each adjusted decision model, and determine, according to the matching degree, an actual driving scene corresponding to the training sample;
a scene driving model training module 403, configured to train the scene driving model with minimizing the deviation between the driving scene and the actual driving scene as the optimization target until it is determined that the preset training condition is met, where the scene driving model and each decision model are used for controlling the unmanned device.
Optionally, the decision model training module 401 is specifically configured to predict a future driving trajectory corresponding to the training sample according to the first predictive control strategy, and determine a first score corresponding to the future driving trajectory; and training a decision model corresponding to the driving scene by taking the maximized first score as an optimization target to obtain an adjusted decision model corresponding to the driving scene.
Optionally, the actual driving scene determining module 402 is specifically configured to, for each adjusted decision model, input the historical sensing data into the adjusted decision model to obtain a second predictive control strategy corresponding to the training sample; determine a second score corresponding to the second predictive control strategy; and determine the matching degree between the training sample and the adjusted decision model according to the second score.
Optionally, the scene driving model training module 403 is specifically configured to determine, for each round of model training, a target sample from each training sample, where, for each training sample, if it is determined that the actual driving scene determined by the training sample in the round of model training is different from the driving scene identified by inputting the training sample into the last round of adjusted scene driving model, the training sample is used as the target sample; and if the ratio of the target sample in each training sample is smaller than the set ratio, determining that the preset training condition is met.
Optionally, the apparatus further comprises:
a clustering module 404, configured to determine each adjusted decision model matched with the algorithm configuration as each decision model to be clustered; clustering the decision models to be clustered according to model parameters contained in the decision models to be clustered to obtain clustering clusters; aiming at each cluster, merging the driving scenes corresponding to the decision models to be clustered contained in the cluster to obtain a merged driving scene corresponding to the cluster; and determining a decision model of the combined driving scene corresponding to the clustering cluster according to the decision model to be clustered contained in the clustering cluster.
Optionally, the clustering module 404 is specifically configured to, for each decision model to be clustered included in the cluster, determine a weight coefficient corresponding to the decision model to be clustered according to the number of training samples belonging to the driving scene corresponding to the decision model to be clustered; and generating a decision model for merging the driving scenes corresponding to the cluster according to the weight coefficient corresponding to each decision model to be clustered in the cluster, the model parameters contained in each decision model to be clustered in the cluster and the matched algorithm configuration contained in each decision model to be clustered in the cluster.
Fig. 5 is a schematic diagram of a control device of an unmanned aerial vehicle provided in this specification, and specifically includes:
the acquisition module 500 is used for acquiring sensing data acquired by the unmanned equipment;
a scene determining module 501, configured to input the sensing data into a pre-trained scene driving model, so as to obtain a driving scene corresponding to the unmanned device;
a control strategy determining module 502, configured to input the sensing data into a decision model matched with the driving scene to obtain a control strategy corresponding to the unmanned device, where the scene driving model and the decision model are obtained by training through the model training method;
and a control module 503, configured to control the unmanned device according to the control policy.
The present specification also provides a computer-readable storage medium having stored thereon a computer program operable to execute the method of model training provided in fig. 1 above or the method of controlling an unmanned aerial device provided in fig. 3 above.
This specification also provides a schematic block diagram of the unmanned device shown in fig. 6. As shown in fig. 6, at the hardware level the unmanned device includes a processor, an internal bus, a network interface, a memory, and a non-volatile storage, and may of course also include hardware required for other services. The processor reads the corresponding computer program from the non-volatile storage into the memory and then runs it to implement the model training method provided in fig. 1 above or the unmanned device control method provided in fig. 3 above. Of course, besides the software implementation, this specification does not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the processing flow is not limited to logic units and may also be hardware or logic devices.
In the 1990s, an improvement to a technology could be clearly distinguished as either a hardware improvement (e.g., an improvement to a circuit structure such as a diode, transistor, or switch) or a software improvement (an improvement to a method flow). However, as technology has advanced, many of today's method flow improvements can be regarded as direct improvements to hardware circuit structures: designers almost always obtain the corresponding hardware circuit structure by programming the improved method flow into a hardware circuit. Thus it cannot be said that an improvement to a method flow cannot be realized by a hardware entity module. For example, a Programmable Logic Device (PLD), such as a Field Programmable Gate Array (FPGA), is an integrated circuit whose logic functions are determined by the user's programming of the device. A designer "integrates" a digital system onto a single PLD by programming it, without asking a chip manufacturer to design and fabricate an application-specific integrated circuit chip. Moreover, instead of manually making integrated circuit chips, such programming is nowadays mostly implemented with "logic compiler" software, which is similar to a software compiler used in program development, while the source code to be compiled must be written in a specific programming language called a Hardware Description Language (HDL). There is not just one HDL but many, such as ABEL (Advanced Boolean Expression Language), AHDL (Altera Hardware Description Language), Confluence, CUPL (Cornell University Programming Language), HDCal, JHDL (Java Hardware Description Language), Lava, Lola, MyHDL, PALASM, and RHDL (Ruby Hardware Description Language), among which VHDL (Very-High-Speed Integrated Circuit Hardware Description Language) and Verilog are currently the most commonly used.
It should also be clear to those skilled in the art that a hardware circuit implementing a logical method flow can easily be obtained merely by briefly programming the method flow into an integrated circuit with one of the above hardware description languages.
The controller may be implemented in any suitable manner. For example, the controller may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, or an embedded microcontroller. Examples of controllers include, but are not limited to, the following microcontrollers: ARC 625D, Atmel AT91SAM, Microchip PIC18F26K20, and Silicon Labs C8051F320; a memory controller may also be implemented as part of the control logic of a memory. Those skilled in the art also know that, in addition to implementing the controller purely as computer-readable program code, the method steps can be logically programmed so that the controller achieves the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers, and the like. Such a controller may therefore be regarded as a hardware component, and the means included in it for implementing various functions may also be regarded as structures within the hardware component. Indeed, the means for implementing various functions may even be regarded both as software modules implementing the method and as structures within the hardware component.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functions of the various elements may be implemented in the same one or more software and/or hardware implementations of the present description.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of volatile memory in a computer readable medium, Random Access Memory (RAM) and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims (11)

1. A method of model training, comprising:
inputting, for each training sample, historical sensing data serving as the training sample into a preset scene driving model to obtain a driving scene corresponding to the training sample;
inputting the historical sensing data into a decision model corresponding to the driving scene to obtain a first predictive control strategy corresponding to the training sample, and training the decision model corresponding to the driving scene according to the first predictive control strategy to obtain an adjusted decision model corresponding to the driving scene;
after each adjusted decision model is obtained, determining, for each training sample, the degree of matching between the training sample and each adjusted decision model, and determining the actual driving scene corresponding to the training sample according to the degree of matching;
and training the scene driving model by taking minimizing the deviation between the driving scene and the actual driving scene as an optimization target, until it is determined that a preset training condition is met, wherein the scene driving model and each decision model are used for controlling the unmanned equipment.
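The training procedure of claim 1 alternates between fitting per-scene decision models and re-labeling samples by their best-matching model, in the spirit of an expectation-maximization loop. Below is a minimal, hypothetical sketch: one-dimensional sensing data, linear decision models, and every helper name (`fit_slope`, `assign`, and so on) invented for illustration rather than taken from the patent.

```python
import random

random.seed(0)

# Toy data: two hypothetical driving scenes, 1-D sensing value x, scalar control y.
# Scene 0 behaves like y = +2x (for x >= 0), scene 1 like y = -2x (for x < 0).
xs = [random.gauss(0, 1) for _ in range(200)]
ys = [2 * x if x >= 0 else -2 * x for x in xs]

# One decision-model parameter (a slope) per scene; deliberately rough start.
w = [0.5, -0.5]

def fit_slope(pairs):
    """Least-squares slope through the origin for (x, y) pairs."""
    sxx = sum(x * x for x, _ in pairs)
    sxy = sum(x * y for x, y in pairs)
    return sxy / sxx if sxx else 0.0

# Initial scene labels from the untrained scene model: best-matching slope.
assign = [min((0, 1), key=lambda s: abs(w[s] * x - y)) for x, y in zip(xs, ys)]

for _ in range(20):
    # Steps 1-2: train each scene's decision model on its assigned samples.
    for s in (0, 1):
        pairs = [(x, y) for (x, y), a in zip(zip(xs, ys), assign) if a == s]
        if pairs:
            w[s] = fit_slope(pairs)
    # Step 3: matching degree = prediction error of each adjusted model;
    # the best-matching model defines the sample's "actual" driving scene.
    new_assign = [min((0, 1), key=lambda s: abs(w[s] * x - y))
                  for x, y in zip(xs, ys)]
    # Step 4: adopt the new labels; stop once assignments no longer change
    # (a stand-in for the patent's preset training condition).
    if new_assign == assign:
        break
    assign = new_assign

print(sorted(round(v, 1) for v in w))  # → [-2.0, 2.0]
```

Under these toy assumptions the loop recovers the two scene-specific control laws; the patent leaves the actual model families and matching metric open.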
2. The method according to claim 1, wherein training the decision model corresponding to the driving scenario according to the first predictive control strategy to obtain the adjusted decision model corresponding to the driving scenario specifically comprises:
predicting a future driving track corresponding to the training sample according to the first predictive control strategy, and determining a first score corresponding to the future driving track;
and training a decision model corresponding to the driving scene by taking the maximized first score as an optimization target to obtain an adjusted decision model corresponding to the driving scene.
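Claim 2 trains each decision model by rolling out the future driving track predicted under the current strategy and maximizing that track's score. A hedged sketch follows, with invented one-dimensional dynamics and an invented lane-keeping score (neither is prescribed by the patent), using finite-difference gradient ascent as one possible optimizer:

```python
def rollout(theta, x0, steps=10, dt=0.1):
    """Predict a future driving track under the control strategy u = theta * x
    (toy one-dimensional dynamics, assumed purely for illustration)."""
    track, x = [], x0
    for _ in range(steps):
        x = x + dt * theta * x
        track.append(x)
    return track

def score(track):
    """First score: higher when the predicted track stays near the lane
    center (position 0); any scoring rule could be substituted."""
    return -sum(x * x for x in track) / len(track)

def train_decision_model(theta, x0, lr=0.05, iters=200, eps=1e-4):
    """Adjust the decision-model parameter with maximizing the first score
    as the optimization target (finite-difference gradient ascent)."""
    for _ in range(iters):
        grad = (score(rollout(theta + eps, x0))
                - score(rollout(theta - eps, x0))) / (2 * eps)
        theta += lr * grad
    return theta

theta0 = 0.0
theta1 = train_decision_model(theta0, x0=1.0)
print(score(rollout(theta1, 1.0)) > score(rollout(theta0, 1.0)))  # → True
```

The adjusted parameter steers the toy vehicle toward the lane center, i.e. the score of its predicted track strictly improves over the initial strategy.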
3. The method of claim 2, wherein determining a degree of match between the training sample and each adjusted decision model comprises:
inputting the historical sensing data into each adjusted decision model to obtain a second predictive control strategy corresponding to the training sample;
determining a second score corresponding to the second predictive control strategy;
and determining the matching degree between the training sample and the adjusted decision model according to the second score.
4. The method of claim 1, wherein determining that a predetermined training condition is satisfied specifically comprises:
determining, for each round of model training, target samples from the training samples, wherein, for each training sample, if the actual driving scene determined for the training sample in the current round of model training differs from the driving scene obtained by inputting the training sample into the scene driving model adjusted in the previous round, the training sample is taken as a target sample;
and if the ratio of the target sample in each training sample is smaller than the set ratio, determining that the preset training condition is met.
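The stop test of claim 4 can be read as: a sample becomes a "target sample" when its scene label changed between rounds, and training halts once target samples are rare enough. A small sketch, with the function name and default ratio invented for illustration:

```python
def training_condition_met(prev_scenes, curr_scenes, set_ratio=0.01):
    """Claim-4-style stop test: a training sample is a 'target sample' when
    its actual driving scene in this round differs from the scene assigned by
    the previous round's scene driving model; training may stop once the
    ratio of target samples falls below the set ratio."""
    targets = sum(p != c for p, c in zip(prev_scenes, curr_scenes))
    return targets / len(curr_scenes) < set_ratio

print(training_condition_met([0, 1, 1, 0], [0, 1, 1, 1]))  # 1 of 4 changed → False
print(training_condition_met([0, 1, 1, 0], [0, 1, 1, 0]))  # 0 of 4 changed → True
```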
5. The method of claim 1, wherein the method further comprises:
determining each adjusted decision model matched with the algorithm configuration as each decision model to be clustered;
clustering the decision models to be clustered according to model parameters contained in the decision models to be clustered to obtain clustering clusters;
aiming at each cluster, merging the driving scenes corresponding to the decision models to be clustered contained in the cluster to obtain a merged driving scene corresponding to the cluster;
and determining a decision model of the combined driving scene corresponding to the clustering cluster according to the decision model to be clustered contained in the clustering cluster.
6. The method according to claim 5, wherein determining the decision model of the merged driving scenario corresponding to the cluster according to the decision model to be clustered included in the cluster specifically comprises:
aiming at each decision model to be clustered contained in the cluster, determining a weight coefficient corresponding to the decision model to be clustered according to the number of training samples belonging to the driving scene corresponding to the decision model to be clustered;
and generating a decision model for merging the driving scenes corresponding to the cluster according to the weight coefficient corresponding to each decision model to be clustered in the cluster, the model parameters contained in each decision model to be clustered in the cluster and the matched algorithm configuration contained in each decision model to be clustered in the cluster.
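Claims 5 and 6 cluster decision models that share an algorithm configuration by their parameters, merge the member scenes, and build the merged model from sample-count weight coefficients. The sketch below uses a greedy distance threshold for the clustering step and a weighted parameter average for the merge; both are illustrative choices, not taken from the patent, and each model is a hypothetical `(scene, parameter, n_samples)` tuple.

```python
def cluster_models(models, tol=0.1):
    """Greedy parameter-based clustering of decision models that share an
    algorithm configuration (a stand-in for whatever clustering rule the
    implementation actually uses)."""
    clusters = []
    for model in models:
        for cluster in clusters:
            if abs(cluster[0][1] - model[1]) <= tol:
                cluster.append(model)
                break
        else:
            clusters.append([model])
    return clusters

def merge_cluster(cluster):
    """Claim-6-style merge: each model's weight coefficient is proportional
    to the number of training samples belonging to its driving scene; the
    merged parameters are the weighted average and the merged driving scene
    is the union of the member scenes."""
    total = sum(n for _, _, n in cluster)
    merged_scene = "+".join(scene for scene, _, _ in cluster)
    merged_param = sum(p * n for _, p, n in cluster) / total
    return merged_scene, merged_param

models = [("rain", 1.00, 30), ("fog", 1.05, 10), ("sunny", 2.00, 60)]
print([merge_cluster(c) for c in cluster_models(models)])
# → [('rain+fog', 1.0125), ('sunny', 2.0)]
```

Merging near-identical per-scene models this way reduces the number of decision models the unmanned equipment must keep while preserving their behavior, which appears to be the point of claims 5-6.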
7. A control method of an unmanned aerial vehicle, characterized by comprising:
acquiring sensing data acquired by unmanned equipment;
inputting the sensing data into a pre-trained scene driving model to obtain a driving scene corresponding to the unmanned equipment;
inputting the sensing data into a decision model matched with the driving scene to obtain a control strategy corresponding to the unmanned equipment, wherein the scene driving model and the decision model are obtained by training through the model training method according to any one of claims 1 to 6;
and controlling the unmanned equipment according to the control strategy.
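At inference time, claim 7 reduces to a two-stage dispatch: classify the driving scene, then query the decision model matched to that scene. A minimal sketch with entirely hypothetical stand-ins for the trained models and the strategy format:

```python
def control_step(sensing, scene_model, decision_models):
    """Claim-7 control flow: classify the current driving scene from the
    sensing data, then ask the matching decision model for a control
    strategy. All names below are invented for illustration."""
    scene = scene_model(sensing)
    return decision_models[scene](sensing)

# Hypothetical trained models: a scene classifier plus per-scene policies.
scene_model = lambda s: "low_speed" if s["obstacle_density"] > 0.5 else "cruise"
decision_models = {
    "low_speed": lambda s: {"target_speed_mps": 2.0},
    "cruise": lambda s: {"target_speed_mps": 12.0},
}

print(control_step({"obstacle_density": 0.8}, scene_model, decision_models))
# → {'target_speed_mps': 2.0}
```

The unmanned equipment would then execute the returned strategy, closing the loop with fresh sensing data on the next step.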
8. An apparatus for model training, comprising:
the driving scene determining module is used for inputting the historical sensing data serving as the training samples into a preset scene driving model aiming at each training sample to obtain a driving scene corresponding to the training sample;
the decision model training module is used for inputting the historical sensing data into a decision model corresponding to the driving scene to obtain a first predictive control strategy corresponding to the training sample, and training the decision model corresponding to the driving scene according to the first predictive control strategy to obtain an adjusted decision model corresponding to the driving scene;
the actual driving scene determining module is used for, after each adjusted decision model is obtained, determining, for each training sample, the degree of matching between the training sample and each adjusted decision model, and determining the actual driving scene corresponding to the training sample according to the degree of matching;
and the scene driving model training module is used for training the scene driving model by taking minimizing the deviation between the driving scene and the actual driving scene as an optimization target until a preset training condition is determined to be met, wherein the scene driving model and each decision model are used for controlling the unmanned equipment.
9. A control apparatus of an unmanned aerial vehicle, characterized by comprising:
the acquisition module is used for acquiring sensing data acquired by the unmanned equipment;
the driving scene determining module is used for inputting the sensing data into a pre-trained scene driving model to obtain a driving scene corresponding to the unmanned equipment;
a control strategy determination module, configured to input the sensing data into a decision model matched with the driving scene to obtain a control strategy corresponding to the unmanned aerial vehicle, where the scene driving model and the decision model are obtained by training through the model training method according to any one of claims 1 to 6;
and the control module is used for controlling the unmanned equipment according to the control strategy.
10. A computer-readable storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the method of any of the preceding claims 1 to 6 or the method of claim 7.
11. An unmanned aerial vehicle comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the method of any one of claims 1 to 6 or the method of claim 7.
CN202110657875.9A 2021-06-15 2021-06-15 Model training method, unmanned equipment control method and device Active CN113110526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110657875.9A CN113110526B (en) 2021-06-15 2021-06-15 Model training method, unmanned equipment control method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110657875.9A CN113110526B (en) 2021-06-15 2021-06-15 Model training method, unmanned equipment control method and device

Publications (2)

Publication Number Publication Date
CN113110526A CN113110526A (en) 2021-07-13
CN113110526B true CN113110526B (en) 2021-09-24

Family

ID=76723515

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110657875.9A Active CN113110526B (en) 2021-06-15 2021-06-15 Model training method, unmanned equipment control method and device

Country Status (1)

Country Link
CN (1) CN113110526B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113325855B (en) * 2021-08-02 2021-11-30 北京三快在线科技有限公司 Model training method for predicting obstacle trajectory based on migration scene
CN114056351B (en) * 2021-11-26 2024-02-02 文远苏行(江苏)科技有限公司 Automatic driving method and device
CN116069043B (en) * 2023-03-24 2023-08-15 华南农业大学 Unmanned agricultural machinery operation speed autonomous decision-making method

Citations (7)

Publication number Priority date Publication date Assignee Title
CN109747659A (en) * 2018-11-26 2019-05-14 北京汽车集团有限公司 The control method and device of vehicle drive
CN110196593A (en) * 2019-05-16 2019-09-03 济南浪潮高新科技投资发展有限公司 A kind of more scene environments detections of automatic Pilot and decision system and method
CN110929431A (en) * 2020-02-03 2020-03-27 北京三快在线科技有限公司 Training method and device for vehicle driving decision model
CN110991095A (en) * 2020-03-05 2020-04-10 北京三快在线科技有限公司 Training method and device for vehicle driving decision model
CN111010545A (en) * 2019-12-20 2020-04-14 深圳市中天安驰有限责任公司 Vehicle driving decision method, system, terminal and storage medium
CN112356841A (en) * 2020-11-26 2021-02-12 中国人民解放军国防科技大学 Vehicle control method and device based on brain-computer interaction
CN112829747A (en) * 2021-02-23 2021-05-25 国汽(北京)智能网联汽车研究院有限公司 Driving behavior decision method and device and storage medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
EP3759700B1 (en) * 2018-02-27 2023-03-15 Nauto, Inc. Method for determining driving policy
CN113642633B (en) * 2018-06-11 2023-06-20 百度在线网络技术(北京)有限公司 Method, device, equipment and medium for classifying driving scene data

Patent Citations (7)

Publication number Priority date Publication date Assignee Title
CN109747659A (en) * 2018-11-26 2019-05-14 北京汽车集团有限公司 The control method and device of vehicle drive
CN110196593A (en) * 2019-05-16 2019-09-03 济南浪潮高新科技投资发展有限公司 A kind of more scene environments detections of automatic Pilot and decision system and method
CN111010545A (en) * 2019-12-20 2020-04-14 深圳市中天安驰有限责任公司 Vehicle driving decision method, system, terminal and storage medium
CN110929431A (en) * 2020-02-03 2020-03-27 北京三快在线科技有限公司 Training method and device for vehicle driving decision model
CN110991095A (en) * 2020-03-05 2020-04-10 北京三快在线科技有限公司 Training method and device for vehicle driving decision model
CN112356841A (en) * 2020-11-26 2021-02-12 中国人民解放军国防科技大学 Vehicle control method and device based on brain-computer interaction
CN112829747A (en) * 2021-02-23 2021-05-25 国汽(北京)智能网联汽车研究院有限公司 Driving behavior decision method and device and storage medium

Also Published As

Publication number Publication date
CN113110526A (en) 2021-07-13

Similar Documents

Publication Publication Date Title
US11480972B2 (en) Hybrid reinforcement learning for autonomous driving
CN113110526B (en) Model training method, unmanned equipment control method and device
CN111208838B (en) Control method and device of unmanned equipment
CN110929431B (en) Training method and device for vehicle driving decision model
CN112364997B (en) Method and device for predicting track of obstacle
JP2023510136A (en) Geolocation models for perception, prediction or planning
CN111238523B (en) Method and device for predicting motion trail
CN112306059B (en) Training method, control method and device for control model
CN111338360B (en) Method and device for planning vehicle driving state
CN113341941B (en) Control method and device of unmanned equipment
CN112947495B (en) Model training method, unmanned equipment control method and device
CN112649012A (en) Trajectory planning method, equipment, medium and unmanned equipment
CN112629550A (en) Method and device for predicting obstacle trajectory and training model
CN111522245A (en) Method and device for controlling unmanned equipment
CN111532285B (en) Vehicle control method and device
WO2023087157A1 (en) Intelligent driving method and vehicle applying same
CN112949756B (en) Method and device for model training and trajectory planning
CN113033527A (en) Scene recognition method and device, storage medium and unmanned equipment
CN111123957A (en) Method and device for planning track
CN114153207B (en) Control method and control device of unmanned equipment
CN114019981B (en) Track planning method and device for unmanned equipment
CN114167857B (en) Control method and device of unmanned equipment
CN114019971A (en) Unmanned equipment control method and device, storage medium and electronic equipment
CN114545940A (en) Unmanned equipment control method and device and electronic equipment
CN113848913A (en) Control method and control device of unmanned equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant