CN111353644B - Prediction model generation method of intelligent network cloud platform based on reinforcement learning - Google Patents
- Publication number
- CN111353644B CN111353644B CN202010122791.0A CN202010122791A CN111353644B CN 111353644 B CN111353644 B CN 111353644B CN 202010122791 A CN202010122791 A CN 202010122791A CN 111353644 B CN111353644 B CN 111353644B
- Authority
- CN
- China
- Prior art keywords
- model
- prediction
- network
- automobile
- cloud platform
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/04—Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- Marketing (AREA)
- Game Theory and Decision Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Development Economics (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Traffic Control Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a prediction model generation method for an intelligent networked cloud platform based on reinforcement learning, relating to the technical field of intelligent networked automobile cloud platform prediction. An automobile information prediction network model is generated from a plurality of acquired automobile information data and operator character sequences, combined with an RNN Controller network and a model parser. Because each operator character sequence is composed of multiple operators, the generated automobile information prediction network model contains diverse operators, has structural diversity, and can achieve a better prediction effect. A reinforcement learning approach is adopted in the automobile information prediction network model for the first time, transforming the identification of network structures into a prediction problem and yielding a general automatic prediction-model generation method that selects a good network structure from a large search space; the structure does not need to be designed manually, saving time and labor cost and improving efficiency. The idea of weight sharing is adopted, so search efficiency is improved by roughly 1000 times compared with a non-shared model.
Description
Technical Field
The invention relates to a prediction model generation method of an intelligent networking cloud platform based on reinforcement learning, and belongs to the technical field of intelligent networking automobile cloud platform prediction.
Background
The intelligent networked automobile cloud platform is connected with an entire automobile enterprise, an automobile owner (user) and terminal equipment (an intelligent networked automobile and the like), and records the operation record of the automobile owner (user) and the operation state of the terminal equipment (the intelligent networked automobile) in real time.
A review of current intelligent networked automobile cloud platform prediction models shows that present network-structure prediction models rely on manual, fine-grained feature selection and structural model design. Designing a novel model or algorithm requires an algorithm engineer with a solid theoretical foundation as well as strong engineering and innovation capability, and it takes the engineer a long time to produce an effective model. For example, common recommendation-model structures such as Logistic Regression (LR), Factorization Machine (FM), FFM, DNN, DeepFM, DCN, XDeepFM, and FiBiNET all consume a great deal of an algorithm engineer's time and labor, so efficiency is relatively low.
Disclosure of Invention
The invention provides a prediction model generation method for an intelligent networked cloud platform based on reinforcement learning; the model generation efficiency is high, and the generated model can greatly improve prediction precision.
In order to alleviate the above problems, the technical scheme adopted by the invention is as follows:
the invention provides a prediction model generation method of an intelligent networking cloud platform based on reinforcement learning, which comprises the following steps:
S1, acquiring a plurality of automobile information data from an intelligent networked automobile cloud platform;
S2, preprocessing the automobile information data to form an automobile information data set, and dividing the automobile information data set into a training data set and a test data set;
S3, selecting a plurality of types of network structure models, abstracting and summarizing a plurality of operators from the network structure models, and forming an operator data set;
S4, constructing a model generation architecture, wherein the model generation architecture comprises an RNN Controller network and a model parser;
S5, initializing the iteration number K = 0, and setting an iteration number threshold K_m;
S6, the RNN Controller network randomly generates S different operator character sequences according to the operator data set, each operator character sequence being composed of a plurality of operators randomly sampled from the operator data set;
S7, the model parser converts the S operator character sequences into S sub-models respectively; if K = 0, the parameters of each current sub-model are randomly initialized, and if K ≠ 0, the parameters of the S sub-models obtained in the previous training round are used to initialize the parameters of the current S sub-models;
S8, training each current sub-model on the training data set, saving the parameters of each sub-model, and evaluating each currently trained sub-model on the test data set to obtain S Rewards respectively, with K = K + 1;
S9, if K = K_m, selecting the sub-model with the best Reward among the S currently trained sub-models as the automobile information prediction network model to be output, completing the generation of the automobile information prediction network model; otherwise, updating the parameters of the RNN Controller network according to the current Rewards by a Policy Gradient reinforcement learning algorithm, and then jumping to step S6.
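The loop of steps S5 to S9 can be sketched as a single search routine. In this sketch all four callables (sequence sampler, parser, sub-model trainer, and controller update) are hypothetical placeholders standing in for the RNN Controller, the model parser, and the TensorFlow training code; only the control flow follows the patent:

```python
def search_loop(sample_sequences, parse_to_model, train_and_eval,
                update_controller, S=2, K_m=2000):
    """Sketch of steps S5-S9: alternately sample operator sequences,
    train the resulting sub-models, and update the controller from
    the rewards. All callables are placeholder stubs."""
    shared_params = [None] * S              # weight sharing across rounds (S7)
    for K in range(K_m):
        sequences = sample_sequences(S)                     # S6
        models = [parse_to_model(seq, shared_params[i])     # S7
                  for i, seq in enumerate(sequences)]
        rewards = []
        for i, m in enumerate(models):                      # S8
            reward, params = train_and_eval(m)
            shared_params[i] = params       # save parameters for next round
            rewards.append(reward)
        if K == K_m - 1:                                    # S9: K reached K_m
            best = max(range(S), key=lambda i: rewards[i])
            return models[best]             # best-reward sub-model is output
        update_controller(rewards)          # S9: policy-gradient step, loop to S6
```

A call would pass in the real sampler, parser, and trainer; the stub structure only illustrates where weight sharing and the termination test sit in the loop.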
The technical effect of the technical scheme is as follows: operators of the generated automobile information prediction network model are various, and the automobile information prediction network model has structural diversity and can achieve a better prediction effect; the method is characterized in that a reinforcement learning thought is initially adopted in the automobile information prediction network model, the recognition network structure is transformed into the prediction problem, a set of general automatic prediction model generation method is adapted, a good network structure thought is selected from a large amount of search spaces, the structure is not required to be designed manually, the time and the working cost are saved, and the efficiency is higher; and a weight sharing idea is adopted, the trained parameters in the previous round are used as the parameters of the current sub-model and then the sub-model is studied again, so that the searching efficiency is improved and is about 1000 times faster than that of a non-shared model.
Specifically, in step S1, the automobile information data is either vehicle information data or automobile-user personal information data; the vehicle information data comprises vehicle parameter data, vehicle real-time driving data, and current road environment data of the vehicle.
The technical effect of the technical scheme is as follows: when vehicle information data is input, the finally obtained automobile information prediction network model can be used to predict the driving behavior of an intelligent networked automobile; when automobile-user personal information data is input, the finally obtained automobile information prediction network model can be used for automobile price prediction.
More specifically, step S2 specifically comprises: arranging the automobile information data into a plurality of training examples in the form <X, Y>, the training examples together forming the training data set, where X is the information data feature and Y is the prediction target.
The technical effect of the technical scheme is as follows: the automobile information data is arranged into a form of < X, Y >, so that the finally obtained automobile information prediction network model has strong field universality and can be adapted to different prediction tasks, and the automobile information prediction network model is actually tested and applied in different application scenes (such as prediction of intelligent internet automobile driving operation behaviors, prediction of user click conditions, recommendation of automobiles with proper price according to user conditions and the like) and comprises a recommendation system in the current internet field.
Optionally, in step S3, there are ten types of network structure models: FM, PNN, PIN, HNN, DeepFM, FiBiNET, DCN, XDeepFM, AFM, and AutoInt.
The technical effect of the technical scheme is as follows: the ten types are the existing mainstream models, and operators extracted from the ten types can enable the automobile information prediction network model to have diversity and excellent prediction effect.
Optionally, the RNN Controller network is a two-layer LSTM network with a hidden layer size of 256; the model parser is a decoding algorithm that stacks the operator character sequences into a TensorFlow model structure.
Optionally, in step S5, K_m = 2000.
Optionally, S = 2.
The technical effect of the technical scheme is as follows: the finally generated automobile information prediction network model can be optimal, and the prediction effect is best.
Optionally, the automobile information prediction network model is saved in a checkpoint file format.
Optionally, the sub-model is a TensorFlow model.
The technical effect of the technical scheme is as follows: the test process of the model is simple, and the model can be conveniently deployed to an intelligent networking automobile cloud platform.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
FIG. 1 is a flowchart of a prediction model generation method of an intelligent networked cloud platform based on reinforcement learning in an embodiment;
FIG. 2 is a diagram of relationships between RNN Controller networks, model resolvers, and sub-models in an embodiment;
FIG. 3 is a schematic diagram of the RNN Controller network generating the operator character sequence according to the embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Referring to fig. 1, fig. 2 and fig. 3, the embodiment provides a prediction model generation method for an intelligent networking cloud platform based on reinforcement learning, including the following steps:
s1, obtaining a plurality of automobile information data from an intelligent networking automobile cloud platform.
In this embodiment, the automobile information data is vehicle information data, which includes vehicle parameter data, vehicle real-time driving data, and current road environment data of the vehicle.
The vehicle parameter data is the basic parameter content of the vehicle recorded in the specification after the vehicle leaves a factory, wherein the basic parameter content comprises engine parameters, power battery parameters and sensor parameter data; the sensor parameter data relates to laser radar parameter data, millimeter wave radar parameter data, ultrasonic radar parameter data, infrared night vision device parameter data, high definition camera parameter data, corner sensor parameter data and rotation speed sensor parameter data.
The vehicle real-time driving data comprises vehicle real-time steering data, vehicle real-time braking data, vehicle real-time driving data and vehicle real-time sensor detection data.
The current road environment data of the vehicle comprises traffic light indication information data and pedestrian position and flow information data.
S2, preprocessing the automobile information data, forming an automobile information data set, and dividing the automobile information data set into a training data set and a testing data set.
In this embodiment, the automobile information data is organized into a number of training instances in the form <X, Y>, and these instances together constitute the automobile information data set. X is the information data feature, namely <vehicle parameter data, vehicle real-time driving data, current road environment data>; Y is the prediction target, which in this embodiment is the next driving behavior of the intelligent networked automobile, including steering behavior, braking behavior, and driving behavior.
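The arrangement of raw records into <X, Y> training examples and the train/test split of step S2 can be sketched as follows; the field names and the 80/20 split ratio are illustrative assumptions, not values from the patent:

```python
def build_examples(records):
    """Arrange raw cloud-platform records into <X, Y> training examples
    (step S2). Field names are illustrative placeholders."""
    examples = []
    for r in records:
        # X = <vehicle parameter data, real-time driving data, road environment data>
        X = (r["vehicle_params"], r["realtime_driving"], r["road_env"])
        Y = r["next_behavior"]      # prediction target: steering / braking / driving
        examples.append((X, Y))
    return examples

def split(examples, train_ratio=0.8):
    """Divide the automobile information data set into training and test subsets."""
    cut = int(len(examples) * train_ratio)
    return examples[:cut], examples[cut:]
```

In practice the preprocessing would also normalize and encode each field before the tuple is formed; the sketch keeps only the <X, Y> arrangement itself.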
And S3, selecting a plurality of types of network structure models, abstracting and summarizing a plurality of operators from the network structure models, and forming an operator data set.
In this embodiment, there are ten types of network structure models: FM (factorization machine), PNN (product-based neural network), PIN (inner-product-based neural network), HNN (holographic product network), DeepFM (deep factorization machine), FiBiNET (feature-importance bilinear network), DCN (deep & cross network), XDeepFM (extreme deep factorization machine), AFM (attentional factorization machine), and AutoInt (automatic feature interaction via self-attention).
The operators obtained from the network structure model include six categories, and the specific correspondence is as follows:
the second-order combination operator-FM is FM/DeepFM;
a second-order combination operator HFM is PNN/PIN/HNN;
a second-order combined operator-Biliner-All is FiBiNET;
a second-order combined operator-Biliner-arch is FiBiNET;
a second-order combined operator-Biliner-Interaction is FiBiNET;
a second-order combined operator-CIN is XDeepFM/DCN;
DNN input part-first order: deep FM;
DNN input partial-second order PIN;
the DNN input part-first order + second order: PIN;
the DNN input part is a first-order and a second-order n-dimension, wherein the first order is from PIN, and the second-order n-dimension is obtained by performing sum polymerization on PIN feature combination dimension;
the DNN input part comprises first-order and second-order k dimensions, wherein the first order is from PIN, and the second-order k dimension is obtained by performing sum polymerization on PIN characteristic Embedding dimension;
AFM, attention-MLP on Field level;
attention-SENTET at Field level FiBiNET;
Attention-Multi-Head at Field level: autoInt;
attachment on Field level-DeepFM;
the number of hidden layers of DNN is DeepFM/XDeepFM;
DeepFM/XDeepFM;
Skip-Connection:XDeepFM/DCN。
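As an illustration of the first category, the second-order FM combination operator abstracted from FM/DeepFM computes the sum of pairwise element-wise products of field embeddings. A minimal NumPy sketch using the standard square-of-sum minus sum-of-squares identity (the function name and array layout are assumptions for illustration):

```python
import numpy as np

def fm_second_order(embeddings):
    """Second-order FM combination operator (abstracted from FM/DeepFM):
    sum over all field pairs (i < j) of v_i * v_j, element-wise,
    computed as 0.5 * ((sum_i v_i)^2 - sum_i v_i^2).
    embeddings: array of shape (num_fields, k)."""
    sum_sq = np.sum(embeddings, axis=0) ** 2      # (sum_i v_i)^2, shape (k,)
    sq_sum = np.sum(embeddings ** 2, axis=0)      # sum_i v_i^2,  shape (k,)
    return 0.5 * (sum_sq - sq_sum)                # shape (k,): second-order k-dimension
```

The returned vector is the "second-order k-dimension" form; summing it once more over k would give the scalar FM interaction term.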
S4, constructing a model generation architecture, wherein the model generation architecture comprises an RNN Controller network and a model parser.
In this embodiment, the RNN Controller network is a two-layer LSTM network with a hidden layer size of 256; the model parser is a decoding algorithm that stacks the operator character sequences into a TensorFlow model structure.
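A toy sketch of the controller's sampling behaviour, with the two-layer LSTM (hidden size 256 in the patent) replaced by a stub recurrent update; the class name, the state-update rule, and the output projection are illustrative assumptions, not the patent's implementation:

```python
import numpy as np

class ControllerSketch:
    """Stand-in for the RNN Controller: at each step it produces logits
    over the operator vocabulary, samples one operator, and feeds the
    choice back into its recurrent state (a stub, not a real LSTM)."""
    def __init__(self, vocab, hidden=256, seed=0):
        self.vocab = vocab
        self.hidden = hidden
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0, 0.1, (hidden, len(vocab)))  # output projection

    def sample_sequence(self, length):
        h = np.zeros(self.hidden)
        seq, log_probs = [], []
        for _ in range(length):
            logits = h @ self.W
            p = np.exp(logits - logits.max())
            p /= p.sum()                          # softmax over operator vocabulary
            idx = self.rng.choice(len(self.vocab), p=p)
            seq.append(self.vocab[idx])
            log_probs.append(np.log(p[idx]))      # kept for the policy-gradient step
            h = np.tanh(h + self.W[:, idx])       # stub state update
        return seq, log_probs
```

The stored log-probabilities are what the Policy Gradient update in step S9 needs; a real implementation would use an LSTM cell and trainable embeddings for the fed-back operator.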
S5, initializing the iteration number K = 0, and setting an iteration number threshold K_m.
In this embodiment, the value of K_m is 2000.
S6, the RNN Controller network randomly generates S = 2 different operator character sequences according to the operator data set, each operator character sequence being composed of a plurality of operators randomly sampled from the operator data set.
In this embodiment, each generation step of the RNN Controller network corresponds to the generation of one operator structure, and a specific recommendation model structure is obtained by combining and assembling the generated operator sequence, for example:
String sequence str1 = "feature combination operator: FM, input feature operator: original structure, attention operator: none, hidden layer size: 300, activation function: ReLU, number of layers: 3"; this is the structure of a standard DeepFM;
String sequence str2 = "feature combination operator: Bilinear-All, input feature operator: first-order + second-order concatenation, attention operator: SENET, hidden layer size: 300, activation function: ReLU, number of layers: 3"; this is the structure of FiBiNET-All.
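The model parser's first stage, decoding a string sequence like str1 or str2 into a structure description, can be sketched as below; stacking that description into an actual TensorFlow graph is omitted, and the dict-based representation is an assumption for illustration:

```python
def parse_operator_string(s):
    """Decode an operator character sequence (as in the str1/str2
    examples) into a structure description. The patent's model parser
    would then stack this description into a TensorFlow model."""
    structure = {}
    for part in s.split(","):
        key, _, value = part.partition(":")   # split on the first colon only
        structure[key.strip()] = value.strip()
    return structure
```

With str1 as input, the result maps "feature combination operator" to "FM", "hidden layer size" to "300", and so on, which a builder function could consume field by field.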
S7, the model parser converts the 2 operator character sequences into 2 sub-models (Child Models) respectively; if K = 0, the parameters of each current sub-model are randomly initialized, and if K ≠ 0, the parameters of the 2 sub-models obtained in the previous training round are used to initialize the parameters of the current 2 sub-models.
S8, training each current submodel through the training data set, storing parameters of each submodel, evaluating each submodel after current training according to the test data set, and respectively obtaining S rewards, wherein K = K +1.
In this embodiment, Reward refers to an evaluation index of recommendation performance (e.g., the accuracy (ACC) of driving-behavior prediction, the AUC of click prediction, etc.). If the generated structure performs well, the Controller is given a certain reward; if the generated structure performs poorly, the Controller is penalized.
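For the ACC case named above, the reward is simply test-set accuracy; a minimal sketch (the function name is an assumption, and the AUC variant would substitute a ranking metric):

```python
def reward_from_accuracy(y_true, y_pred):
    """Reward as prediction accuracy (ACC) on the test data set,
    as used for driving-behavior prediction in the patent."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```

A higher reward then pushes the Controller toward re-sampling similar structures, while a low reward acts as the penalty.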
In this embodiment, the Adam algorithm is used to train and update the parameters, and the resulting model is saved in the checkpoint file format.
S9, if K = 2000, the sub-model with the best Reward among the S currently trained sub-models is selected as the automobile information prediction network model to be output, completing the generation of the automobile information prediction network model; otherwise, a Policy Gradient reinforcement learning algorithm is adopted to update the parameters of the RNN Controller network according to the current Rewards, and the process jumps back to step S6.
In this embodiment, through repeated feedback from the child models, the RNN Controller network becomes better and better, the operator character sequences it generates improve, and the resulting sub-models improve accordingly.
In this embodiment, the parameters of the RNN Controller network are updated with the Policy Gradient reinforcement learning algorithm; the optimization objective is:

J(\theta_C) = E_{P(a_{1:T};\,\theta_C)}[R]

where J(\theta_C) is the specific optimization objective, a_{1:T} is the operator sequence generated step by step by the Controller, T is the length of the operator sequence, R is the Reward obtained by a sub-structure, and P is the probability distribution over the operator-structure sampling space; the goal is to maximize J(\theta_C).
In the parameter-solving process of the Controller, besides the Reward, the gradient of the objective must be computed; expanded over time steps it is:

\nabla_{\theta_C} J(\theta_C) = \sum_{t=1}^{T} E_{P(a_{1:T};\,\theta_C)}\left[\nabla_{\theta_C} \log P(a_t \mid a_{1:(t-1)};\,\theta_C) \cdot R\right]

In this embodiment, S = 2 sub-models are sampled to approximate the expectation, so the gradient estimate becomes:

\nabla_{\theta_C} J(\theta_C) \approx \frac{1}{S} \sum_{s=1}^{S} \sum_{t=1}^{T} \nabla_{\theta_C} \log P(a_t^{(s)} \mid a_{1:(t-1)}^{(s)};\,\theta_C) \cdot R_s
through multiple experiments, when S is observed to take different values, the fact that S is not larger is found to be better, S =2 enables a finally generated automobile information prediction network model to be optimal, and the prediction effect is best.
The general parameter update rule is:

\theta_C \leftarrow \theta_C + \alpha_C \nabla_{\theta_C} J(\theta_C)

where \alpha_C is the learning rate for updating the Controller and \theta_C are the Controller's parameters.
In this embodiment, in the calculation of P(a_t \mid a_{1:(t-1)}) at each time step, the cross entropy is computed from the operator generated by sampling and the sampling probability at each step.
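The Policy Gradient update above, averaged over the S sampled sub-models, can be sketched numerically as follows; the gradients of the log-probabilities are assumed to be precomputed (e.g., via the cross-entropy just described), and shapes are illustrative:

```python
import numpy as np

def policy_gradient_step(log_prob_grads, rewards, theta, alpha=0.01):
    """One Policy Gradient (REINFORCE-style) update of the Controller
    parameters theta: theta <- theta + alpha * (1/S) * sum_s R_s * g_s,
    where g_s = sum_t grad log P(a_t | a_{1:(t-1)}) for sub-model s
    (passed in precomputed)."""
    S = len(rewards)
    grad = sum(R * g for g, R in zip(log_prob_grads, rewards)) / S
    return theta + alpha * grad
```

A full implementation would also subtract a reward baseline to reduce variance, which the sketch omits.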
In step S7 of this embodiment, after the first round of iterative training, the parameters learned by the child models in the previous round are used to initialize the parameters of the current round. The advantages of such parameter sharing are as follows:
1. The parameters of each child model are initialized in a better region of the parameter space, so fewer learning steps are required and the effect is better;
2. Compared with the non-shared mode, the shared mode saves about 1000 times in learning cost, because:
non-sharing: controller steps C1=100000 steps, S =2,child requires steps C2=100 steps, and takes a total of time: C1S C2=10^5 ^ 2 ^ 100=2 ^ 10^7 steps;
the sharing mode comprises the following steps: controller steps C1=2000 steps, S =2,child requires steps C2=5 steps, and takes time in total:
C1S C2= 2S 10^ 3S 25 =2S 10^4 steps.
The efficiency is saved: 2 x 10^7/2 x 10^4 ^ 10^3.
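The cost accounting above can be checked directly with the stated figures:

```python
def total_steps(controller_steps, S, child_steps):
    """Total training cost = controller iterations x sub-models per
    iteration x training steps per sub-model (the cost model used above)."""
    return controller_steps * S * child_steps

non_shared = total_steps(100_000, 2, 100)   # 2 x 10^7 steps
shared = total_steps(2_000, 2, 5)           # 2 x 10^4 steps
speedup = non_shared // shared              # factor of 10^3
```

This is where the "about 1000 times faster" figure in the abstract comes from: weight sharing shrinks both the number of controller iterations and the steps each child needs.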
After the automobile information prediction network model generated in the embodiment is deployed to the intelligent internet automobile cloud platform, the automobile information prediction network model can be used for predicting automobile driving behaviors.
In this embodiment, the generated parameter structure of the automobile information prediction network model is a checkpoint file, and the checkpoint file is deployed to the intelligent networked automobile cloud platform as a RESTful service interface.
Example 2
Compared with Embodiment 1, the automobile information data in step S1 is the personal information data of the automobile user, including the user's age, sex, city, hobbies, and personality, and Y in step S2 is the predicted automobile price. After the automobile information prediction network model generated in this embodiment is deployed on the intelligent networked automobile cloud platform, it can be used for automobile price prediction; the deployment method is the same as in Embodiment 1.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (7)
1. A prediction model generation method of an intelligent networking cloud platform based on reinforcement learning is characterized by comprising the following steps:
S1, acquiring a plurality of automobile information data from an intelligent networked automobile cloud platform;
S2, preprocessing the automobile information data to form an automobile information data set, and dividing the automobile information data set into a training data set and a test data set;
S3, selecting a plurality of types of network structure models, abstracting and summarizing a plurality of operators from the network structure models, and forming an operator data set;
S4, constructing a model generation architecture, wherein the model generation architecture comprises an RNN Controller network and a model parser;
S5, initializing the iteration number K = 0, and setting an iteration number threshold K_m;
S6, the RNN Controller network randomly generating S different operator character sequences according to the operator data set, each operator character sequence being composed of a plurality of operators randomly sampled from the operator data set;
S7, the model parser converting the S operator character sequences into S sub-models respectively; if K = 0, randomly initializing the parameters of each current sub-model, and if K ≠ 0, initializing the parameters of the current S sub-models with the parameters of the S sub-models obtained in the previous training round;
S8, training each current sub-model on the training data set, saving the parameters of each sub-model, and evaluating each currently trained sub-model on the test data set to obtain S Rewards respectively, with K = K + 1, wherein Reward is an evaluation index of recommendation performance;
S9, if K = K_m, selecting the sub-model with the best Reward among the S currently trained sub-models as the automobile information prediction network model to be output, completing the generation of the automobile information prediction network model; otherwise, updating the RNN Controller network parameters according to the current Rewards by a Policy Gradient reinforcement learning algorithm, and then jumping to step S6;
in step S1, the automobile information data is vehicle information data; the vehicle information data comprises vehicle parameter data, vehicle real-time driving data, and current road environment data of the vehicle;
step S2 specifically comprises: arranging the automobile information data into a plurality of training examples in the form <X, Y>, the training examples forming the training data set, wherein X is the information data feature, namely <vehicle parameter data, vehicle real-time driving data, current road environment data>, and Y is the prediction target, the prediction target being the next driving behavior of the intelligent networked automobile, including steering behavior, braking behavior, and driving behavior.
2. The method according to claim 1, wherein in step S3, the network structure models are of ten types: FM, PNN, PIN, HNN, DeepFM, FiBiNET, DCN, XDeepFM, AFM, and AutoInt.
3. The reinforcement learning-based prediction model generation method for the intelligent internet cloud platform according to claim 1, wherein the RNN Controller network is a two-layer LSTM network, and the hidden layer size is 256; the model parser is used for stacking the operator character sequences into a model structure of TensorFlow.
4. The reinforcement learning-based intelligent networked cloud platform prediction model generation method according to claim 1, wherein in step S5, K_m = 2000.
5. the reinforcement learning-based intelligent networked cloud platform prediction model generation method according to claim 1, wherein S =2.
6. The intelligent internet cloud platform prediction model generation method based on reinforcement learning of claim 1, wherein the automobile information prediction network model is stored in a checkpoint file format.
7. The reinforcement learning-based intelligent networking cloud platform prediction model generation method according to claim 1, wherein the sub-model is a TensorFlow model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010122791.0A CN111353644B (en) | 2020-02-27 | 2020-02-27 | Prediction model generation method of intelligent network cloud platform based on reinforcement learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010122791.0A CN111353644B (en) | 2020-02-27 | 2020-02-27 | Prediction model generation method of intelligent network cloud platform based on reinforcement learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111353644A CN111353644A (en) | 2020-06-30 |
CN111353644B true CN111353644B (en) | 2023-04-07 |
Family
ID=71195959
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010122791.0A Active CN111353644B (en) | 2020-02-27 | 2020-02-27 | Prediction model generation method of intelligent network cloud platform based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111353644B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112926735A (en) * | 2021-01-29 | 2021-06-08 | 北京字节跳动网络技术有限公司 | Method, device, framework, medium and equipment for updating deep reinforcement learning model |
CN114742236A (en) * | 2022-04-24 | 2022-07-12 | 重庆长安汽车股份有限公司 | Environmental vehicle behavior prediction model training method and system |
CN117010447B (en) * | 2023-10-07 | 2024-01-23 | 成都理工大学 | End-to-end based microarchitecturable search method |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105956698A (en) * | 2016-04-27 | 2016-09-21 | 国网天津市电力公司 | Power load combination prediction method based on IOWGA operator and fresh prediction precision |
CN106529461A (en) * | 2016-11-07 | 2017-03-22 | 湖南源信光电科技有限公司 | Vehicle model identifying algorithm based on integral characteristic channel and SVM training device |
CN107862346A (en) * | 2017-12-01 | 2018-03-30 | 驭势科技(北京)有限公司 | A kind of method and apparatus for carrying out driving strategy model training |
CN109733415A (en) * | 2019-01-08 | 2019-05-10 | 同济大学 | A kind of automatic Pilot following-speed model that personalizes based on deeply study |
CN109871778A (en) * | 2019-01-23 | 2019-06-11 | 长安大学 | Lane based on transfer learning keeps control method |
CN110009108A (en) * | 2019-04-09 | 2019-07-12 | 沈阳航空航天大学 | A kind of completely new quantum transfinites learning machine |
CN110304075A (en) * | 2019-07-04 | 2019-10-08 | 清华大学 | Track of vehicle prediction technique based on Mix-state DBN and Gaussian process |
Non-Patent Citations (5)
Title |
---|
Zhou Mo; Jin Min. Short-term power load forecasting method combining multiple algorithms and models with online secondary learning. Journal of Computer Applications. 2017, (Issue 11), full text. *
Sun Yan; Lyu Shipin; Wang Xiukun; Tang Yiyuan. Construction of a Bayesian network structure model. Journal of Chinese Computer Systems. 2008, (Issue 05), full text. *
Sun Yushan, Bai Hong. A pseudospectral method for parabolic problems periodic in t. Journal of Natural Science of Heilongjiang University. 1990, (Issue 03), full text. *
Cui Jianshuang; Liu Xiaochan; Yang Meihua; Li Wenyan. An automatic optimization-algorithm selection framework based on meta-learning recommendation, with empirical analysis. Journal of Computer Applications. 2017, (Issue 04), full text. *
Hao Zhangang; Zhang Weixiong; Chen Zheng. Social network link prediction based on a supervised joint denoising model. Scientia Sinica Informationis. 2017, (Issue 11), full text. *
Also Published As
Publication number | Publication date |
---|---|
CN111353644A (en) | 2020-06-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111353644B (en) | Prediction model generation method of intelligent network cloud platform based on reinforcement learning | |
CN111061277B (en) | Unmanned vehicle global path planning method and device | |
US11842261B2 (en) | Deep reinforcement learning with fast updating recurrent neural networks and slow updating recurrent neural networks | |
WO2021103625A1 (en) | Short-term vehicle speed condition real-time prediction method based on interaction between vehicle ahead and current vehicle | |
CN112162555B (en) | Vehicle control method based on reinforcement learning control strategy in hybrid vehicle fleet | |
CN110520868B (en) | Method, program product and storage medium for distributed reinforcement learning | |
CN108594858B (en) | Unmanned aerial vehicle searching method and device for Markov moving target | |
CN110850861A (en) | Attention-based hierarchical lane change depth reinforcement learning | |
US20240095495A1 (en) | Attention neural networks with short-term memory units | |
CN114261400A (en) | Automatic driving decision-making method, device, equipment and storage medium | |
CN113264064B (en) | Automatic driving method for intersection scene and related equipment | |
CN111507499B (en) | Method, device and system for constructing model for prediction and testing method | |
CN116795720A (en) | Unmanned driving system credibility evaluation method and device based on scene | |
Han et al. | Ensemblefollower: A hybrid car-following framework based on reinforcement learning and hierarchical planning | |
Arbabi et al. | Planning for autonomous driving via interaction-aware probabilistic action policies | |
Mazumder et al. | Action permissibility in deep reinforcement learning and application to autonomous driving | |
Naing et al. | Dynamic car-following model calibration with deep reinforcement learning | |
CN117396389A (en) | Automatic driving instruction generation model optimization method, device, equipment and storage medium | |
Hjaltason | Predicting vehicle trajectories with inverse reinforcement learning | |
Radovic et al. | Agent forecasting at flexible horizons using ODE flows | |
Pak et al. | Carnet: A dynamic autoencoder for learning latent dynamics in autonomous driving tasks | |
CN114104005B (en) | Decision-making method, device and equipment of automatic driving equipment and readable storage medium | |
Schütt et al. | Exploring the Range of Possible Outcomes by means of Logical Scenario Analysis and Reduction for Testing Automated Driving Systems | |
Madni et al. | Digital Twin: Key Enabler and Complement to Model-Based Systems Engineering | |
CN112800670B (en) | Multi-target structure optimization method and device for driving cognitive model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||