CN111242280A - Deep reinforcement learning model combination method and device and computer equipment - Google Patents


Info

Publication number
CN111242280A
Authority
CN
China
Prior art keywords
reinforcement learning
depth
learning models
data
learning model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010009647.6A
Other languages
Chinese (zh)
Inventor
温建伟
王宇杰
袁潮
方璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Zhuohe Technology Co ltd
Original Assignee
Beijing Zhuohe Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhuohe Technology Co Ltd filed Critical Beijing Zhuohe Technology Co Ltd
Priority to CN202010009647.6A priority Critical patent/CN111242280A/en
Publication of CN111242280A publication Critical patent/CN111242280A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Abstract

The present text discloses a method and an apparatus for combining deep reinforcement learning models, and a computer device, and relates to deep reinforcement learning technology. Disclosed herein is a method of combining deep reinforcement learning models, comprising: determining weight information of each deep reinforcement learning model among a plurality of deep reinforcement learning models used in combination, and respectively transmitting data to be processed to the plurality of deep reinforcement learning models used in combination to obtain a plurality of output data; and performing a weighted-average calculation on the plurality of output data according to the weight information of the corresponding deep reinforcement learning models, the calculated result being the output result of using the plurality of deep reinforcement learning models in combination. According to this technical scheme, the output result of using multiple deep reinforcement learning models in combination is determined based on the weight information of the different deep reinforcement learning models, and the obtained output result is more accurate and efficient.

Description

Deep reinforcement learning model combination method and device and computer equipment
Technical Field
The present invention relates to a deep reinforcement learning technology, and in particular, to a method and an apparatus for combining deep reinforcement learning models, and a computer device.
Background
Deep reinforcement learning combines the perception capability of deep learning with the decision-making capability of reinforcement learning, and proceeds through learning and feedback between an agent and its environment. Deep reinforcement learning enables rapid accumulation of experience and dynamic planning for real-time conditions. For example, a game character is an agent, and deep reinforcement learning can determine how the game character takes a series of actions in the learning environment so as to obtain the maximum cumulative reward. The following concepts are involved: the state (s), i.e. the state the agent is currently in; the policy, i.e. how to act in the current state; the action (a), i.e. the action taken by the agent according to the policy; the reward (r), i.e. the reward obtained after the corresponding action is taken in the current state; and the model, i.e. the means by which the next state can be obtained given the current state and action. Q-Learning is a very popular deep reinforcement learning technique, in which the Q function Q(s, a) represents the total reward value that can be obtained after performing action a from state s under a specific policy.
In the related art, deep reinforcement learning algorithms can be combined, and model fusion is generally performed by simply averaging and summing multiple reinforcement learning models. However, when the state distributions of the models occupy widely separated regions of the feature space, the fused model cannot solve every sub-problem simultaneously, and may even be unable to handle any single sub-problem individually.
Disclosure of Invention
The application provides a combination method and device of a deep reinforcement learning model and computer equipment.
The application discloses a combination method of a deep reinforcement learning model, which comprises the following steps:
determining weight information of each deep reinforcement learning model among a plurality of deep reinforcement learning models used in combination, wherein the weight information of a deep reinforcement learning model characterizes the degree to which the output data of that model influences the output result of using the plurality of deep reinforcement learning models in combination;
respectively transmitting data to be processed to the plurality of deep reinforcement learning models used in combination to obtain a plurality of output data;
and performing a weighted-average calculation on the plurality of output data according to the weight information of the corresponding deep reinforcement learning models, wherein the calculated result is the output result of using the plurality of deep reinforcement learning models in combination.
Optionally, in the above method, the determining of weight information of each of a plurality of deep reinforcement learning models used in combination includes:
determining the similarity between the preset input data of each deep reinforcement learning model and the data to be processed;
respectively determining the weight information of each deep reinforcement learning model according to the similarity between the preset input data of that deep reinforcement learning model and the data to be processed;
wherein the weight information of a deep reinforcement learning model is positively correlated with the similarity between its preset input data and the data to be processed.
Optionally, in the method, the performing of the weighted-average calculation on the plurality of output data according to the weight information of the corresponding deep reinforcement learning models includes:
calculating a weighted sum of the Q function values output by the plurality of deep reinforcement learning models used in combination according to the weight information of each deep reinforcement learning model;
and calculating the weighted average of the Q function values according to the weighted sum of the Q function values and the number of deep reinforcement learning models used in combination.
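The two calculation steps above can be sketched as follows (a minimal illustration, not the patent's own code; the function name is hypothetical, and the weight coefficients are assumed to sum to the number n of models, as this text later specifies):

```python
def combine_q_values(q_values, weights):
    """Weighted average of per-model Q function values.

    q_values: one Q function value per deep RL model.
    weights:  weight coefficients alpha_i, assumed to sum to n.
    Returns sum(alpha_i * Q_i) / n.
    """
    assert len(q_values) == len(weights)
    n = len(q_values)
    # Step 1: weighted sum of the Q function values.
    weighted_sum = sum(a * q for a, q in zip(weights, q_values))
    # Step 2: divide by the number of models used in combination.
    return weighted_sum / n
```

With equal weights alpha_i = 1 this reduces to the plain averaging of the related art.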
Optionally, the method further includes:
and generating a classification model according to the training data of each deep reinforcement learning model used in combination, wherein the classification model is used for determining the similarity between the preset input data of the different deep reinforcement learning models and the same input data.
Optionally, in the above method, the classification model includes at least one of a classifier constructed based on a variational autoencoder and a classification model constructed based on a neural network.
The application also discloses a combination apparatus for deep reinforcement learning models, comprising:
a weight information determining module, configured to determine weight information of each deep reinforcement learning model among a plurality of deep reinforcement learning models used in combination, wherein the weight information of a deep reinforcement learning model characterizes the degree to which the output data of that model influences the output result of using the plurality of deep reinforcement learning models in combination;
a data transmission module, configured to respectively transmit the data to be processed to the plurality of deep reinforcement learning models used in combination to obtain a plurality of output data;
and a calculation module, configured to perform a weighted-average calculation on the plurality of output data according to the weight information of the corresponding deep reinforcement learning models, the calculated result being the output result of using the plurality of deep reinforcement learning models in combination.
Optionally, in the above apparatus, the weight information determining module includes:
a first weight information determining submodule, configured to determine the similarity between the preset input data of each deep reinforcement learning model and the data to be processed;
a second weight information determining submodule, configured to respectively determine the weight information of each deep reinforcement learning model according to the similarity between the preset input data of that deep reinforcement learning model and the data to be processed;
wherein the weight information of a deep reinforcement learning model is positively correlated with the similarity between its preset input data and the data to be processed.
Optionally, in the above apparatus, the calculation module includes:
a first calculation submodule, configured to calculate a weighted sum of the Q function values output by the plurality of deep reinforcement learning models used in combination according to the weight information of each deep reinforcement learning model;
and a second calculation submodule, configured to calculate the weighted average of the Q function values according to the weighted sum of the Q function values and the number of deep reinforcement learning models used in combination.
Optionally, the apparatus further comprises:
a classification module, configured to generate a classification model according to the training data of each deep reinforcement learning model used in combination, the classification model being used for determining the similarity between the preset input data of the different deep reinforcement learning models and the same input data;
wherein the first weight information determining submodule determines the similarity between the preset input data of each deep reinforcement learning model and the data to be processed through the classification module.
Optionally, in the above apparatus, the classification module includes at least one of a classifier constructed based on a variational autoencoder and a classification model constructed based on a neural network.
The application also discloses a computer device, comprising:
a processor;
and a memory storing processor-executable instructions;
wherein the processor is configured to:
execute instructions implementing the above method of combining deep reinforcement learning models.
The present application also discloses a computer readable storage medium having a computer program stored thereon, wherein the computer program when executed implements the steps of the method of combining deep reinforcement learning models as described above.
The technical scheme of the present application provides a scheme for combining deep reinforcement learning models, taking into account that different deep reinforcement learning models influence the output data to different degrees. Therefore, based on the weight information of the different deep reinforcement learning models, the output result of using a plurality of deep reinforcement learning models in combination is determined as a weighted average of the output results of the plurality of models. This realizes the fusion of deep reinforcement learning models, and the obtained output result is more accurate and efficient.
Drawings
Fig. 1 is a flowchart illustrating a method for combining deep reinforcement learning models according to an exemplary embodiment of the present application.
Fig. 2 is a schematic diagram illustrating a method for combining deep reinforcement learning models according to an exemplary embodiment of the present application.
FIG. 3 is a block diagram of a combination apparatus of a deep reinforcement learning model according to an exemplary embodiment of the present application.
FIG. 4 is a block diagram of a combination apparatus of a deep reinforcement learning model according to an exemplary embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be further described in detail with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments of the present application may be arbitrarily combined with each other without conflict.
Fig. 1 is a flowchart illustrating a method for combining deep reinforcement learning models according to this embodiment. As shown in fig. 1, the method includes the operations of:
Step S101: determining weight information of each deep reinforcement learning model among a plurality of deep reinforcement learning models used in combination, wherein the weight information of a deep reinforcement learning model characterizes the degree to which the output data of that model influences the output result of using the plurality of deep reinforcement learning models in combination;
Step S102: respectively transmitting the data to be processed to the plurality of deep reinforcement learning models used in combination to obtain a plurality of output data;
Step S103: performing a weighted-average calculation on the plurality of output data according to the weight information of the corresponding deep reinforcement learning models, the calculated result being the output result of using the plurality of deep reinforcement learning models in combination.
The weight information of the different deep reinforcement learning models indicates the degree to which the output data of each deep reinforcement learning model influences the target output result when the plurality of deep reinforcement learning models are used in combination. For example, the output data of each deep reinforcement learning model may be one factor in the final output result of the combination of the plurality of models.
It can be seen that, compared with the related-art approach of simply and directly averaging the output results of multiple deep reinforcement learning models, the technical scheme of the present application recognizes that, when a plurality of deep reinforcement learning models are used in combination, the output data of different models may influence the target output result to different degrees. Therefore, based on the weight information of the different deep reinforcement learning models, the result of using the plurality of models in combination is determined as a weighted average of their output data. The output result determined in this way is more accurate and efficient.
This embodiment further provides a method for combining deep reinforcement learning models, in which determining the weight information of each of a plurality of deep reinforcement learning models used in combination includes:
determining the similarity between the preset input data of each deep reinforcement learning model and the data to be processed;
respectively determining the weight information of each deep reinforcement learning model according to the similarity between the preset input data of that deep reinforcement learning model and the data to be processed;
wherein the weight information of a deep reinforcement learning model is positively correlated with the similarity between its preset input data and the data to be processed.
Herein, the positive correlation between the weight information of a deep reinforcement learning model and the similarity between its preset input data and the data to be processed means that the higher the similarity between the preset input data of the model and the data to be processed, the larger the weight information of that model; correspondingly, the lower the similarity, the smaller the weight information. The weight information of a deep reinforcement learning model may include a weight coefficient.
The similarity between the preset input data of a deep reinforcement learning model and the data to be processed may include the similarity between the feature information of the preset input data and the feature information of the data to be processed. For example, when deep reinforcement learning is used to determine how a game character takes a series of actions in the learning environment so as to obtain the maximum cumulative reward, the data to be processed may include the state S of the game character. The state S may be characterized by a set of feature information, and the similarity between the state S and the preset input data of a deep reinforcement learning model can be determined by comparing the feature information of the preset input data with the feature information contained in the state S. For example, consider two different deep reinforcement learning models: the feature information of the preset input data of the first model is entirely the same or substantially the same as the feature information contained in the state S, whereas only part of the feature information of the preset input data of the second model is the same or substantially the same as that contained in the state S. The similarity between the preset input data of the first model and the data to be processed is then higher than that of the second model.
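The text does not fix a concrete similarity measure for the feature information; one common choice, shown here purely as an illustrative assumption, is cosine similarity over numeric feature vectors:

```python
import math

def feature_similarity(preset_features, state_features):
    """Cosine similarity between a model's preset-input feature
    vector and the feature vector of the data to be processed
    (e.g. the state S). Returns a value in [-1, 1]; higher means
    the two feature sets are more alike."""
    dot = sum(p * s for p, s in zip(preset_features, state_features))
    norm_p = math.sqrt(sum(p * p for p in preset_features))
    norm_s = math.sqrt(sum(s * s for s in state_features))
    if norm_p == 0.0 or norm_s == 0.0:
        return 0.0  # no usable feature information
    return dot / (norm_p * norm_s)
```

Identical feature vectors score 1, orthogonal ones score 0, matching the first-model/second-model contrast above.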
As can be seen from the above description, the weight information determined in this embodiment from the similarity between a model's preset input data and the data to be processed indicates that the higher this similarity, the closer the model's output data is to the output data corresponding to the data to be processed, i.e. the greater the influence of that model's output data on the output result of using the plurality of models in combination. In this way, the output data of each deep reinforcement learning model can be adjusted based on its weight information, so that output data from models better matched to the data to be processed carries more weight in the final output result of the combination. The final output result thus obtained is closer to the actual output result.
This embodiment further provides a method for combining deep reinforcement learning models, in which performing the weighted-average calculation on the plurality of output data according to the weight information of the corresponding deep reinforcement learning models includes:
calculating a weighted sum of the Q function values output by the plurality of deep reinforcement learning models used in combination according to the weight information of each deep reinforcement learning model used in combination;
and calculating the weighted average of the Q function values according to the weighted sum of the Q function values and the number of deep reinforcement learning models used in combination.
The Q function value output by a deep reinforcement learning model may include a calculated value of the function Q(s, a) in the deep reinforcement learning application.
Suppose the Q function corresponding to each deep reinforcement learning model is Q_i, i.e. Q_i represents the Q function of the i-th deep reinforcement learning model. The weighted sum of the Q functions of the plurality of deep reinforcement learning models used in combination is computed as

∑_{i=1}^{n} α_i·Q_i

where α_i represents the weight coefficient of the i-th deep reinforcement learning model.
The weighted average of the Q function values may be calculated by the following formula:

Q = (1/n)·∑_{i=1}^{n} α_i·Q_i

where n is the number of deep reinforcement learning models used in combination.
In this embodiment, the sum of the weight coefficients of the n deep reinforcement learning models is equal to n, i.e.

∑_{i=1}^{n} α_i = n.
It can be seen that, in the technical solution of this embodiment, based on the weight information of the different deep reinforcement learning models, the output result of using a plurality of deep reinforcement learning models in combination is determined as a weighted average of the Q function values of the plurality of models. The output result determined in this way is more accurate and efficient.
The embodiment also provides a combination method of the deep reinforcement learning model, and the method further includes:
and generating a classification model according to the training data of each deep reinforcement learning model used in combination, wherein the classification model is used for determining the similarity between the preset input data of the different deep reinforcement learning models and the same input data.
In this context, historical input data in the training data of the different deep reinforcement learning models can be sampled for training.
The classification model analyzes the historical input data of the different deep reinforcement learning models to determine the similarity between the preset input data of the different models and the same input data, thereby realizing a classification operation over the different deep reinforcement learning models. That is, the classification model distinguishes the similarities between the preset input data of the different deep reinforcement learning models and the same input data.
In this embodiment, the classification model includes at least one of a classifier constructed based on a variational autoencoder and a classification model constructed based on a neural network.
When the classification model includes a classifier constructed based on a variational autoencoder, the training input data of the different deep reinforcement learning models can be sampled, and a variational-autoencoder-based discrimination network corresponding to each deep reinforcement learning model can be trained respectively. In this way, when new input data is received, the similarity between the current input data and the input data on which each deep reinforcement learning model was trained can be determined from the discrimination network corresponding to that model.
The following describes an implementation process of the above deep reinforcement learning model combination method, taking a practical application as an example.
This embodiment takes the most widespread Q-Learning application as an example to illustrate the combination process of deep reinforcement learning models. The Q function in Q-Learning, Q(s, a), indicates the total reward value that can be obtained after performing action a from state s under a specific policy. The principle of the process is shown in Fig. 2: the classification results (i.e. D_1, D_2, …, D_n in Fig. 2) comparing the current input data against the plurality of deep reinforcement learning models used in combination (i.e. Q_1, Q_2, …, Q_n in Fig. 2) are determined in real time, the classification results are converted into weight coefficients (i.e. α_1, α_2, …, α_n in Fig. 2), and the weighted-average function

Q = (1/n)·∑_{i=1}^{n} α_i·Q_i

is then determined, namely the Q function corresponding to the combined use of the multiple deep reinforcement learning models.
The combination process of the multiple deep reinforcement learning models comprises the following operations:
Step 1: collect training data for each deep reinforcement learning model respectively, and add the collected training data into a buffer;
herein, the training data collected for the different deep reinforcement learning models can be distinguished from one another;
the training data may be collected in various manners.
Assume that an action a_m is randomly selected based on the current state; a set of training data (s_m, a_m, s_m', r_m) is then obtained through the deep reinforcement learning model, where m denotes time m, m' denotes the time next to time m, s_m represents the state at time m, a_m represents the action at time m, s_m' represents the state at time m', and r_m represents the reward at time m. In the same way, the collected training data are added into the buffer.
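Step 1 can be sketched as follows (hypothetical environment interface; only the tuple layout (s_m, a_m, s_m', r_m) comes from the text):

```python
import random

def collect_transition(env, buffer):
    """Randomly select an action in the current state, step the
    environment, and append the transition (s_m, a_m, s_m', r_m)
    to the buffer, as in step 1."""
    s_m = env.state                       # state at time m
    a_m = random.choice(env.actions)      # randomly selected action
    s_next, r_m = env.step(a_m)           # next state s_m' and reward r_m
    buffer.append((s_m, a_m, s_next, r_m))
```

In practice one such buffer would be kept per deep reinforcement learning model, so the training data of the different models stay distinguishable.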
Step 2: for each deep reinforcement learning model to be combined, acquire new training data and randomly sample training data from the buffer, and use them respectively as positive-sample input data and negative-sample input data for training the discrimination network corresponding to each deep reinforcement learning model, so as to determine the similarity between the same input data and the training input data of the different deep reinforcement learning models; the discrimination networks corresponding to the deep reinforcement learning models together form a classification model.
Herein, the generated classification model may take various forms.
For example, a classification model may be constructed based on variational autoencoders. For each deep reinforcement learning model, two side-by-side variational autoencoders can be used to form a discrimination network: positive and negative samples are input respectively, and the outputs of the variational autoencoders are stacked and then passed through a multilayer perceptron to obtain the output result. The discrimination network learned in this way can determine the similarity between the training input data of the deep reinforcement learning model and the current input data. Combining the discrimination networks corresponding to each deep reinforcement learning model yields a classifier, which belongs to the classification model.
For another example, a classification model based on a neural network may be established; that is, the input data of the different deep reinforcement learning models are trained and learned with a target neural network so as to determine the similarity between the training input data of the different models and the current input data. The classification model learned by such training is likewise referred to herein as the classification model.
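As an illustrative stand-in for either construction above, the per-model discrimination idea can be sketched with a tiny logistic-regression discriminator trained on positive samples (a model's own training inputs) and negative samples (inputs from the other models). This deliberately swaps in a much simpler classifier for the variational-autoencoder discrimination network, purely for brevity; all names are hypothetical:

```python
import math

def train_discriminator(pos, neg, epochs=300, lr=0.5):
    """Train a logistic-regression discriminator whose output
    approximates the similarity between an input and this model's
    own training inputs (near 1 for positive-like inputs, near 0
    for negative-like inputs)."""
    dim = len(pos[0])
    w = [0.0] * dim
    b = 0.0
    data = [(x, 1.0) for x in pos] + [(x, 0.0) for x in neg]
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                                  # log-loss gradient
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return lambda x: 1.0 / (1.0 + math.exp(
        -(sum(wi * xi for wi, xi in zip(w, x)) + b)))
```

One such discriminator per deep reinforcement learning model, evaluated on the same input, yields the per-model scores D_1 … D_n of step 3.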
Step 3: when the data to be processed is acquired, send the data to be processed to the classification model to obtain a classification result D, and convert the classification result into the weight coefficients α of the deep reinforcement learning models.
Herein, the obtained classification result includes the classification result of each combined deep reinforcement learning model compared with the data to be processed. For example, the obtained classification result includes the classification result D_1 of the first deep reinforcement learning model, the classification result D_2 of the second deep reinforcement learning model, …, and the classification result D_n of the n-th deep reinforcement learning model. The classification result herein may include the similarity between the preset input data of a deep reinforcement learning model and the data to be processed.
The weight coefficients α into which the classification results are converted include the weight coefficient of each deep reinforcement learning model used in combination, for example the weight coefficient α_1 of the first deep reinforcement learning model, the weight coefficient α_2 of the second deep reinforcement learning model, …, and the weight coefficient α_n of the n-th deep reinforcement learning model.
In this embodiment, the classification result and the weight coefficient are positively correlated. That is, the closer the data to be processed is to the preset input data of a deep reinforcement learning model to be combined, i.e. the higher the similarity between them, the larger the converted weight coefficient; the more the data to be processed deviates from the preset input data of a model, i.e. the lower the similarity, the smaller the converted weight coefficient.
For example, the classification result and the weight coefficient may have a linear relationship α_i = μ·D_i, or an exponential relationship α_i ∝ exp(D_i).
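A sketch of this conversion using the exponential relationship, rescaled so that the coefficients sum to n as required elsewhere in this text (the function name is hypothetical):

```python
import math

def classification_to_weights(d_scores):
    """Convert classification results D_1..D_n into weight
    coefficients alpha_1..alpha_n, with alpha_i proportional
    to exp(D_i) and rescaled so that sum(alpha_i) == n."""
    n = len(d_scores)
    e = [math.exp(d) for d in d_scores]  # exponential relationship
    total = sum(e)
    return [n * ei / total for ei in e]  # rescale to sum to n
```

A higher classification result D_i thus yields a larger weight coefficient α_i, matching the positive correlation described above.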
Step 4: determine the Q function of the plurality of deep reinforcement learning models used in combination according to the weight coefficients of the different deep reinforcement learning models.
In step 4, the Q function may be determined according to the following Formula 1 or Formula 2:

Q = (1/n)·∑_{i=1}^{n} α_i·Q_i        (Formula 1)

Q = ∑_{i=1}^{n} (α_i/n)·Q_i        (Formula 2)

where n is the total number of deep reinforcement learning models used in combination, and α_i is the weight coefficient of the i-th deep reinforcement learning model;
in Formula 1, the sum of the weight coefficients α of the n deep reinforcement learning models used in combination is n;
in Formula 2, the value obtained by dividing the weight coefficient α of each deep reinforcement learning model by n is less than 1, and the sum of these values over the n models used in combination is equal to 1;
Q_i is the Q function (state-action value function) of the i-th deep reinforcement learning model.
Here, the Q function (state-action value function) of the deep reinforcement learning model may include the Q function used in the Soft Q-Learning method.
It can be seen that the output result calculated according to the Q function is a weighted average of the output results of the plurality of deep reinforcement learning models, that is, a final output result obtained by using the plurality of deep reinforcement learning models in combination.
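The weighted average described above can be sketched in a few lines, following formula 1 (Q = (α₁Q₁ + … + αₙQₙ)/n with the weight coefficients summing to n). The function name and the list-based interface are illustrative assumptions, not part of the patent.

```python
def combined_q(q_values, alphas):
    # q_values[i]: the Q function value output by the i-th deep
    # reinforcement learning model for the same state-action pair;
    # alphas[i]: its weight coefficient (sum of alphas assumed to be n)
    n = len(q_values)
    return sum(a * q for a, q in zip(alphas, q_values)) / n
```

With equal weights (αᵢ = 1 for all i), the result reduces to the plain average of the models' Q values, as expected for a weighted mean.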
In addition, the operation of step 3 may be performed each time input data is acquired: the acquired input data is sent to the classification model to obtain a classification result, and the classification result is converted into weight coefficients. That is, the present embodiment may determine the weight coefficients of the deep reinforcement learning models in real time for different input data, in order to calculate the weighted average of the output results of the plurality of deep reinforcement learning models. In this way, the output result obtained by using the plurality of deep reinforcement learning models in combination is more accurate for different input data.
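The real-time flow just described, classify each new input, convert the classification result to weight coefficients, then average the models' Q outputs, can be sketched end to end. Here `classify` and the entries of `models` are hypothetical callables standing in for the trained classification model and the deep reinforcement learning models; the normalization so weights sum to n follows formula 1.

```python
def combine_for_input(x, classify, models):
    sims = classify(x)                       # step 3: per-model classification result
    n = len(models)
    total = sum(sims)
    alphas = [n * s / total for s in sims]   # convert to weights summing to n
    qs = [m(x) for m in models]              # each model returns a list of Q values
    n_actions = len(qs[0])
    # step 4: weighted average of the Q values, action by action
    return [sum(a * q[j] for a, q in zip(alphas, qs)) / n
            for j in range(n_actions)]
```

Because the weights are recomputed per input, a model whose preset input data is closer to the current input dominates the combined Q values for that input, as the embodiment intends.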
Fig. 3 is a schematic structural diagram of a combination apparatus of a deep reinforcement learning model according to an exemplary embodiment. As shown in fig. 3, the apparatus includes at least a weight information determination module 31, a data transmission module 32, and a calculation module 33.
The weight information determining module 31 is configured to determine weight information of each of a plurality of deep reinforcement learning models used in combination, wherein the weight information of a deep reinforcement learning model is used to characterize the degree of influence of the output data of that model on the output result obtained by using the plurality of deep reinforcement learning models in combination;

the data transmission module 32 is configured to transmit the data to be processed to each of the plurality of deep reinforcement learning models used in combination to obtain a plurality of output data;

and the calculation module 33 is configured to perform a weighted average calculation on the plurality of output data according to the weight information of the corresponding deep reinforcement learning models, wherein the result of the calculation is the output result obtained by using the plurality of deep reinforcement learning models in combination.
This embodiment also provides a combination apparatus of deep reinforcement learning models, in which the weight information determining module includes:

a first weight information determining submodule configured to determine the similarity between the preset input data of each deep reinforcement learning model and the data to be processed;

a second weight information determining submodule configured to determine the weight information of each deep reinforcement learning model according to the similarity between the preset input data of that model and the data to be processed;

wherein the weight information of a deep reinforcement learning model is positively correlated with the similarity between the preset input data of that model and the data to be processed.
This embodiment also provides a combination apparatus of deep reinforcement learning models, in which the calculation module includes:

a first calculation submodule configured to calculate a weighted sum of the Q function values output by the plurality of deep reinforcement learning models used in combination, according to the weight information of each deep reinforcement learning model;

and a second calculation submodule configured to calculate the weighted average of the Q function values according to the weighted sum of the Q function values and the number of deep reinforcement learning models used in combination.

This embodiment further provides a combination apparatus of deep reinforcement learning models, the apparatus further including:

a classification module configured to generate a classification model according to the training data of each deep reinforcement learning model used in combination, wherein the classification model is used to determine the similarity between the preset input data of different deep reinforcement learning models and the same input data;

in this case, the first weight information determining submodule determines the similarity between the preset input data of each deep reinforcement learning model and the data to be processed through the classification module.

In the apparatus, the classification module includes at least one of a classifier constructed based on a variational autoencoder and a classification model constructed based on a neural network.
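As a toy stand-in for the classification module, the sketch below scores an input's similarity to each model's preset input data with a softmax over negative squared distance to a per-model prototype vector. This is only an interface illustration (one similarity per model, summing to 1); the patent's classifier may instead be built from a variational autoencoder or a neural network, and the prototype-distance formulation is an assumption.

```python
import math

def softmax_similarities(x, prototypes):
    # Squared Euclidean distance from input x to each model's prototype,
    # then softmax over the negated distances: closer prototypes get
    # higher similarity scores.
    d2 = [sum((xi - pi) ** 2 for xi, pi in zip(x, p)) for p in prototypes]
    raw = [math.exp(-v) for v in d2]
    total = sum(raw)
    return [r / total for r in raw]
```

The output plugs directly into the weight-conversion step: the similarity for each deep reinforcement learning model becomes the classification result Dᵢ that is converted into its weight coefficient.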
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
FIG. 4 is a block diagram of an apparatus 400 according to an exemplary embodiment. Referring to FIG. 4, the apparatus 400 includes a processor 401; the number of processors may be set to one or more as needed. The apparatus 400 also includes a memory 402 for storing instructions, such as an application program, executable by the processor 401; the number of memories may likewise be set to one or more as needed, and the memory may store one or more application programs. The processor 401 is configured to execute the instructions to perform the above-described method of combining deep reinforcement learning models.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 402 comprising instructions, executable by the processor 401 of the apparatus 400 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer-readable storage medium has instructions therein which, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a method of combining deep reinforcement learning models, comprising:

determining weight information of each deep reinforcement learning model in a plurality of deep reinforcement learning models used in combination, wherein the weight information of a deep reinforcement learning model is used to characterize the degree of influence of the output data of that model on the output result obtained by using the plurality of deep reinforcement learning models in combination;

respectively transmitting the data to be processed to the plurality of deep reinforcement learning models used in combination to obtain a plurality of output data;

and performing a weighted average calculation on the plurality of output data according to the weight information of the corresponding deep reinforcement learning models, wherein the calculated result is the output result obtained by using the plurality of deep reinforcement learning models in combination.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus (device), or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied in the medium. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, including, but not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer, and the like. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that an article or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such article or apparatus. Without further limitation, an element defined by the phrase "comprising a…" does not exclude the presence of additional like elements in the article or device comprising the element.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, it is intended that the present invention cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.

Claims (10)

1. A combination method of deep reinforcement learning models is characterized by comprising the following steps:
determining weight information of each deep reinforcement learning model in a plurality of deep reinforcement learning models used in combination, wherein the weight information of a deep reinforcement learning model is used to characterize the degree of influence of the output data of that model on the output result obtained by using the plurality of deep reinforcement learning models in combination;

respectively transmitting data to be processed to the plurality of deep reinforcement learning models used in combination to obtain a plurality of output data;

and performing a weighted average calculation on the plurality of output data according to the weight information of the corresponding deep reinforcement learning model, wherein the calculated result is the output result obtained by using the plurality of deep reinforcement learning models in combination.
2. The method according to claim 1, wherein the determining weight information of each of the plurality of deep reinforcement learning models used in combination comprises:
determining the similarity between preset input data and to-be-processed data of each deep reinforcement learning model;
respectively determining the weight information of each deep reinforcement learning model according to the similarity between preset input data of each deep reinforcement learning model and data to be processed;
the weight information of the deep reinforcement learning model is positively correlated with the similarity between the preset input data of the deep reinforcement learning model and the data to be processed.
3. The method according to claim 1 or 2, wherein the performing a weighted average calculation on the plurality of output data according to the weight information of the corresponding deep reinforcement learning model comprises:

calculating a weighted sum of the Q function values output by the plurality of deep reinforcement learning models used in combination according to the weight information of each deep reinforcement learning model;

and calculating the weighted average of the Q function values according to the weighted sum of the Q function values and the number of deep reinforcement learning models used in combination.
4. The method of claim 3, further comprising:
and generating a classification model according to the training data of each deep reinforcement learning model used in combination, wherein the classification model is used for determining the similarity between preset input data of different deep reinforcement learning models and the same input data.
5. An apparatus for combining deep reinforcement learning models, the apparatus comprising:
the weight information determining module is used for determining weight information of each deep reinforcement learning model in a plurality of deep reinforcement learning models used in combination, wherein the weight information of a deep reinforcement learning model is used to characterize the degree of influence of the output data of that model on the output result obtained by using the plurality of deep reinforcement learning models in combination;

the data transmission module is used for respectively transmitting the data to be processed to the plurality of deep reinforcement learning models used in combination to obtain a plurality of output data;

and the calculation module is used for performing a weighted average calculation on the plurality of output data according to the weight information of the corresponding deep reinforcement learning models, wherein the calculated result is the output result obtained by using the plurality of deep reinforcement learning models in combination.
6. The apparatus of claim 5, wherein the weight information determining module comprises:
the first weight information determining submodule is used for determining the similarity between preset input data and to-be-processed data of each deep reinforcement learning model;
the second weight information determining submodule is used for respectively determining the weight information of each deep reinforcement learning model according to the similarity between the preset input data of that deep reinforcement learning model and the data to be processed;
the weight information of the deep reinforcement learning model is positively correlated with the similarity between the preset input data of the deep reinforcement learning model and the data to be processed.
7. The apparatus of claim 5 or 6, wherein the computing module comprises:
the first calculation submodule is used for calculating a weighted sum of the Q function values output by the plurality of deep reinforcement learning models used in combination according to the weight information of each deep reinforcement learning model;

and the second calculation submodule is used for calculating the weighted average of the Q function values according to the weighted sum of the Q function values and the number of deep reinforcement learning models used in combination.
8. The apparatus of claim 7, further comprising:
the classification module is used for generating a classification model according to the training data of each deep reinforcement learning model used in combination, and the classification model is used for determining the similarity between preset input data of different deep reinforcement learning models and the same input data;
and the first weight information determining submodule determines the similarity between the preset input data of each deep reinforcement learning model and the data to be processed through the classification module.
9. A combination apparatus of deep reinforcement learning models, comprising:
a processor;
and a memory storing processor-executable instructions;
wherein the processor is configured to:
execute instructions to implement the method of combining deep reinforcement learning models according to any one of claims 1 to 4.
10. A computer-readable storage medium, on which a computer program is stored, which, when executed, carries out the steps of the method of combining deep reinforcement learning models according to any one of claims 1 to 4.
CN202010009647.6A 2020-01-06 2020-01-06 Deep reinforcement learning model combination method and device and computer equipment Pending CN111242280A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010009647.6A CN111242280A (en) 2020-01-06 2020-01-06 Deep reinforcement learning model combination method and device and computer equipment


Publications (1)

Publication Number Publication Date
CN111242280A true CN111242280A (en) 2020-06-05

Family

ID=70870828

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010009647.6A Pending CN111242280A (en) 2020-01-06 2020-01-06 Deep reinforcement learning model combination method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN111242280A (en)


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090324060A1 (en) * 2008-06-30 2009-12-31 Canon Kabushiki Kaisha Learning apparatus for pattern detector, learning method and computer-readable storage medium
CN104766080A (en) * 2015-05-06 2015-07-08 苏州搜客信息技术有限公司 Image multi-class feature recognizing and pushing method based on electronic commerce
CN106709588A (en) * 2015-11-13 2017-05-24 日本电气株式会社 Prediction model construction method and equipment and real-time prediction method and equipment
CN109196527A (en) * 2016-04-13 2019-01-11 谷歌有限责任公司 Breadth and depth machine learning model
CN109829478A (en) * 2018-12-29 2019-05-31 平安科技(深圳)有限公司 One kind being based on the problem of variation self-encoding encoder classification method and device
CN110052031A (en) * 2019-04-11 2019-07-26 网易(杭州)网络有限公司 The imitation method, apparatus and readable storage medium storing program for executing of player


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112370258A (en) * 2020-11-13 2021-02-19 北京三角洲机器人科技有限公司 Electric mobile device
CN112370258B (en) * 2020-11-13 2022-08-09 安徽金百合医疗器械有限公司 Electric mobile device

Similar Documents

Publication Publication Date Title
CN109816221B (en) Project risk decision method, apparatus, computer device and storage medium
KR102641116B1 (en) Method and device to recognize image and method and device to train recognition model based on data augmentation
JP6844301B2 (en) Methods and data processors to generate time series data sets for predictive analytics
KR20190050141A (en) Method and apparatus for generating fixed point type neural network
CN116635866A (en) Method and system for mining minority class data samples to train a neural network
CN110942142B (en) Neural network training and face detection method, device, equipment and storage medium
US20190378009A1 (en) Method and electronic device for classifying an input
CN110780938A (en) Computing task unloading method based on differential evolution in mobile cloud environment
CN111950810B (en) Multi-variable time sequence prediction method and equipment based on self-evolution pre-training
CN110909878A (en) Training method and device of neural network model for estimating resource usage share
CN111242280A (en) Deep reinforcement learning model combination method and device and computer equipment
CN112925924A (en) Multimedia file recommendation method and device, electronic equipment and storage medium
CN116432780A (en) Model increment learning method, device, equipment and storage medium
Cao et al. Lstm network based traffic flow prediction for cellular networks
WO2023052827A1 (en) Processing a sequence of data items
CN112667394B (en) Computer resource utilization rate optimization method
GB2622756A (en) Training agent neural networks through open-ended learning
CN114528992A (en) Block chain-based e-commerce business analysis model training method
CN113747500A (en) High-energy-efficiency low-delay task unloading method based on generation countermeasure network in mobile edge computing environment
CN114692888A (en) System parameter processing method, device, equipment and storage medium
CN117036037B (en) Suspicious transaction risk analysis method and suspicious transaction risk analysis device
CN117707795B (en) Graph-based model partitioning side collaborative reasoning method and system
CN111427935B (en) Predicting and displaying method for quantized transaction index, electronic equipment and medium
CN111178443B (en) Model parameter selection, image classification and information identification methods, devices and equipment
CN115470910A (en) Automatic parameter adjusting method based on Bayesian optimization and K-center sampling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20211103

Address after: 518000 409, Yuanhua complex building, 51 Liyuan Road, merchants street, Nanshan District, Shenzhen, Guangdong

Applicant after: Shenzhen zhuohe Technology Co.,Ltd.

Address before: 100083 no.2501-1, 25th floor, block D, Tsinghua Tongfang science and technology building, No.1 courtyard, Wangzhuang Road, Haidian District, Beijing

Applicant before: Beijing Zhuohe Technology Co.,Ltd.
