CN110263136A - Method and apparatus for pushing objects to a user based on a reinforcement learning model - Google Patents
- Publication number
- CN110263136A (application CN201910463434.8A)
- Authority
- CN
- China
- Prior art keywords
- push
- user
- round
- pushing module
- candidate object
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The embodiments of this specification provide a method and apparatus for pushing objects to a user based on a reinforcement learning model. The method comprises at most N consecutive rounds of push for a first user, where each round of push has a corresponding predetermined candidate object set; each round of push from the second round onward starts after the first user clicks on an object pushed in the previous round; and the candidate object set of each round from the second round onward consists of multiple subsets, one for each of the multiple candidate objects pushed in the previous round. The i-th round of push comprises the following steps: obtaining an i-th piece of state information; and inputting the i-th piece of state information into the reinforcement learning model to determine the respective identifiers of a predetermined number of push objects for the i-th round of push.
Description
Technical field
The embodiments of this specification relate to the field of machine learning, and more particularly to a method and apparatus for pushing objects to a user based on a reinforcement learning model.
Background
Traditional customer service is labor-intensive and time-consuming, so building an intelligent assistant that can automatically answer the questions users raise is extremely important. Recently, there has been growing interest in using machine learning to better construct such intelligent assistants. As a core function of a customer-service robot, user-question prediction aims to automatically predict the questions a user may wish to ask and to present candidate questions for the user to choose from, reducing the user's cognitive load. In essence, question prediction predicts, based on the user's historical behavior, the questions the user is likely to raise, helping the user solve problems, improving user satisfaction, and saving customer-service labor costs. Existing question-prediction techniques are usually single-round question recommendation based on supervised learning, which predicts the question directly. However, in complex scenarios where the user's intent is uncertain, the accuracy of such recommendations is generally low. Therefore, a scheme that pushes questions to users more effectively is needed.
Summary of the invention
The embodiments of this specification aim to provide a more effective scheme for pushing objects to a user, so as to remedy deficiencies in the prior art.
To achieve the above object, one aspect of this specification provides a method for pushing objects to a user based on a reinforcement learning model. The method comprises at most N consecutive rounds of push for a first user, where each round of push has a corresponding predetermined candidate object set; each round of push from the second round onward starts after the first user clicks on an object pushed in the previous round; and the candidate object set of each round from the second round onward consists of multiple subsets, one for each of the multiple candidate objects pushed in the previous round. The i-th round of push among the at most N rounds comprises the following steps:

obtaining an i-th piece of state information, the i-th piece of state information comprising static features and dynamic features, wherein the static features comprise features the first user already has before the method is carried out, and the dynamic features comprise the identifiers of the objects the first user has clicked in the preceding i-1 rounds of push; and

inputting the i-th piece of state information into the reinforcement learning model, so that the reinforcement learning model determines, from the candidate object set of the i-th round of push, the respective identifiers of a predetermined number of push objects for the i-th round.
In one embodiment, having the reinforcement learning model determine the respective identifiers of the predetermined number of push objects from the candidate object set of the i-th round comprises having the reinforcement learning model: compute, based on the i-th piece of state information and the object identifier of each candidate object in the candidate object set of the i-th round, the push probability of each candidate object of the i-th round; and determine, based on the push probabilities, the predetermined number of push objects for the i-th round.
In one embodiment, the first user clicks a first push object in the (i-1)-th round of push, and determining the predetermined number of push objects for the i-th round based on the push probabilities comprises: determining, among the candidate objects of the i-th round, the first candidate objects that belong to the subset of the first push object; and determining the predetermined number of push objects for the i-th round based on the push probabilities of the first candidate objects.
In one embodiment, the first user clicks a first push object in the (i-1)-th round of push, and having the reinforcement learning model determine the respective identifiers of the predetermined number of push objects from the candidate object set of the i-th round comprises having the reinforcement learning model determine them from a subset of that candidate object set, the subset comprising the sub-items of the first push object.
In one embodiment, the i-th round of push further comprises: after determining the push objects of the i-th round, pushing the push objects to the first user to obtain the first user's feedback.
In one embodiment, i ≠ N and the first user's feedback is that none of the push objects was clicked; in this case the method comprises i consecutive rounds of push for the first user, and further comprises optimizing the model by a policy-gradient algorithm based on multiple groups of data corresponding to the multiple push objects in the i rounds of push, wherein the group of data corresponding to a second push object in the j-th round of push comprises: the state information corresponding to the j-th round of push, the identifier of the second push object, and a return value corresponding to the second push object, where j is any natural number from 1 to i and the return value is obtained based on the first user's feedback on the second push object.
In one embodiment, i = N and the method comprises N consecutive rounds of push for the first user; the method further comprises: after obtaining the first user's feedback, optimizing the model by a policy-gradient algorithm based on multiple groups of data corresponding to the multiple push objects in the N rounds of push, wherein the group of data corresponding to a second push object in the j-th of the N rounds of push comprises: the state information corresponding to the j-th round of push, the identifier of the second push object, and a return value corresponding to the second push object, the return value being obtained based on the first user's feedback on the second push object.
In one embodiment, the push objects of the N-th round of push are query questions, and the return value is a positive value when the first user clicks the second push object and zero when the first user does not click the second push object.
In one embodiment, the return value takes a first value when j = N and the first user clicks the second push object, and a second value when j ≠ N and the first user clicks the second push object, the first value being greater than the second value.
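The policy-gradient optimization described in the embodiments above can be sketched as a plain REINFORCE update over the per-round (state, pushed-object identifier, return value) triples. This is a minimal sketch under an assumed linear-softmax parameterization of the policy; the specification does not fix the form of π(a|s, θ).

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_update(theta, episode, lr=0.01):
    """One policy-gradient (REINFORCE) update from one episode.

    episode: list of (state, action_index, return_value) triples, one per
    pushed object, as in the data groups described above. theta: weight
    matrix scoring each candidate from the state vector (an assumed,
    illustrative parameterization of pi(a | s, theta)).
    """
    grad = np.zeros_like(theta)
    for state, action, reward in episode:
        probs = softmax(theta @ state)        # pi(. | s, theta)
        dlog = -np.outer(probs, state)        # d log pi(a|s) / d theta ...
        dlog[action] += state                 # ... for a linear-softmax policy
        grad += reward * dlog
    return theta + lr * grad                  # gradient ascent on expected return
```

Consistent with the embodiments above, triples with a zero return value (unclicked objects) contribute nothing to the update, and a click in the final round can simply be assigned a larger return value than a click in an earlier round.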
Another aspect of this specification provides an apparatus for pushing objects to a user based on a reinforcement learning model. The apparatus comprises at most N push modules deployed consecutively for a first user, where each push module has a corresponding predetermined candidate object set; each push module from the second onward is deployed after the first user clicks on an object pushed by the previous push module; and the candidate object set of each push module from the second onward consists of multiple subsets, one for each of the multiple candidate objects of the previous push module. The i-th of the at most N push modules comprises the following units:

an acquiring unit configured to obtain an i-th piece of state information, the i-th piece of state information comprising static features and dynamic features, wherein the static features comprise features the first user already has before the apparatus is deployed, and the dynamic features comprise the identifiers of the objects the first user has clicked for the preceding i-1 push modules; and

a determination unit configured to input the i-th piece of state information into the reinforcement learning model, so that the reinforcement learning model determines, from the candidate object set of the i-th push module, the respective identifiers of a predetermined number of push objects for the i-th push module.
In one embodiment, the determination unit comprises, deployed within the reinforcement learning model: a computing subunit configured to compute, based on the i-th piece of state information and the object identifier of each candidate object in the candidate object set of the i-th push module, the push probability of each candidate object of the i-th push module; and a determining subunit configured to determine, based on the push probabilities, the predetermined number of push objects for the i-th push module.
In one embodiment, the first user clicks a first push object pushed by the (i-1)-th push module, and the determining subunit is further configured to determine, among the candidate objects of the i-th push module, the first candidate objects that belong to the subset of the first push object, and to determine the predetermined number of push objects for the i-th push module based on the push probabilities of the first candidate objects.
In one embodiment, the first user clicks a first push object pushed by the (i-1)-th push module, and the determination unit is further configured so that the reinforcement learning model determines the respective identifiers of the predetermined number of push objects for the i-th push module from a subset of the candidate object set of the i-th push module, the subset comprising the sub-items of the first push object.
In one embodiment, the i-th push module further comprises a push unit configured to, after the push objects of the i-th push module are determined, push them to the first user to obtain the first user's feedback.
In one embodiment, i ≠ N, the first user's feedback is that none of the push objects was clicked, and the apparatus comprises i consecutive push modules for the first user; the apparatus further comprises an optimization module configured to optimize the model by a policy-gradient algorithm based on multiple groups of data corresponding to the multiple push objects of the i push modules, wherein the group of data corresponding to a second push object of the j-th push module comprises: the state information corresponding to the j-th push module, the identifier of the second push object, and a return value corresponding to the second push object, where j is any natural number from 1 to i and the return value is obtained based on the first user's feedback on the second push object.
In one embodiment, i = N and the apparatus comprises N consecutive push modules for the first user; the apparatus further comprises an optimization module configured to, after the first user's feedback is obtained, optimize the model by a policy-gradient algorithm based on multiple groups of data corresponding to the multiple push objects of the N push modules, wherein the group of data corresponding to a second push object of the j-th push module comprises: the state information corresponding to the j-th push module, the identifier of the second push object, and a return value corresponding to the second push object, the return value being obtained based on the first user's feedback on the second push object.
Another aspect of this specification provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform any of the above methods.
Another aspect of this specification provides a computing device comprising a memory and a processor, the memory storing executable code; when the processor executes the executable code, any of the above methods is implemented.
The object-push scheme of the embodiments of this specification proposes a novel structured push process that guides the user step by step, models the state transitions of the entire multi-round push process with reinforcement learning, and takes the user's dynamic click information into account within the model, thereby improving prediction accuracy.
Brief description of the drawings
The embodiments of this specification can be made clearer by describing them with reference to the accompanying drawings:
Fig. 1 is a schematic diagram of a process of pushing objects to a user according to an embodiment of this specification;
Fig. 2 shows a method for pushing objects to a user based on a reinforcement learning model according to an embodiment of this specification;
Fig. 3 schematically illustrates the push objects shown to the user in each of three rounds of push;
Fig. 4 shows an apparatus 400 for pushing objects to a user based on a reinforcement learning model according to an embodiment of this specification.
Detailed description
The embodiments of this specification are described below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of a process of pushing objects to a user according to an embodiment of this specification. It shows a reinforcement learning model 11 (the agent) making three successive decisions for a user 12 (the environment) to perform three rounds of push. The reinforcement learning model is used, for example, in intelligent customer service to predict the question a user wants to ask. In the embodiments of this specification, the reinforcement learning model is designed by imitating the way a human, when facing a problem, narrows down to the final query through structured, layered thinking. That is, the question is predicted hierarchically over three decisions: first the major class containing the question, then the group under that major class, and finally the question under that group.
Specifically, in the first decision, an initial state s1 is input to the model 11 based on the current state of the user 12. The state s1 comprises static features (shown as the white box in the ellipse labeled s1) and dynamic features (not shown). The static features are the user's existing features, including the attribute features and historical behavior features the user had before this episode; the dynamic features are the questions the user has clicked within this episode, and since this is the first decision, the dynamic features are empty. After s1 is input, the model 11 computes, by a policy-gradient algorithm, the probability of each candidate major class for the first round of push corresponding to this decision, and determines the pushed major classes of this round based on those probabilities, for example (a11, a12, a13); these pushed major classes are the actions output by the model 11. These major classes can then be shown (pushed) to the user based on the model's output. For example, in Alipay's intelligent customer service, based on the output of the model 11, three major classes "Huabei", "Jiebei", and "Yu'ebao" are first shown to the user. After this showing, the user may give feedback, for example by clicking one of the major classes or by clicking nothing, and a return value (r11, r12, r13) corresponding to each action in this round of push can be obtained based on that feedback. After the user clicks a major class, for example "Huabei", the model 11 starts the second decision. Specifically, a second state s2 is input to the model 11 based on the user's current state. The second state s2 likewise comprises static features (the white box in the ellipse labeled s2) and dynamic features (the grey box in the ellipse labeled s2); the static features are identical to those of state s1, and the dynamic features comprise the identifier of the clicked major class "Huabei", for example a11. After state s2 is input into the model 11, the model 11 similarly outputs, based on s2, the three groups (a21, a22, a23) of the second round of push corresponding to the second decision, for example the next-level groups under the "Huabei" major class: "bill", "repayment", and "service fee". Likewise, after the second round of push, the user's feedback can be obtained; for example, the user clicks "repayment", return values (r21, r22, r23) corresponding to the groups are obtained based on that feedback, and the state s3 for the third decision is obtained accordingly. By inputting state s3 into the model 11, the three questions (a31, a32, a33) of the third round of push can be output, for example "Can Huabei be repaid in advance?", "What is the deduction order for automatic Huabei repayment?", and "How do I repay Huabei?", and return values (r31, r32, r33) corresponding to the questions of the third round are obtained based on the user's feedback on the third round of push. After the three rounds of push are performed for the user as described above, the model can be optimized based on the data of the three rounds, improving its prediction accuracy.
It will be appreciated that the above description of Fig. 1 is merely illustrative rather than limiting. For example, the push objects are not limited to user query questions; they may be other objects such as goods or film reviews, with the corresponding major classes and groups, the user's actions on the push objects, and the computation of the return values all changing accordingly. The model is not limited to three rounds of push to the user; the number of rounds can be set for the specific scenario. Nor is the model limited to reinforcement learning by a policy-gradient algorithm.
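The multi-round interaction just described — each round after the first starting only on a click, each new round drawing its candidates from the clicked object's sub-items, and the episode ending on a round with no click — can be sketched as a simple loop. The policy, taxonomy, and user callables below are illustrative stand-ins, not components defined by this specification.

```python
def run_episode(policy, taxonomy, get_state, get_click, max_rounds=3, top_k=3):
    """Simulate one episode (bout) of up to max_rounds pushes.

    policy(state, candidates) returns candidates ranked by push probability;
    taxonomy maps a clicked identifier to its sub-candidates (key None holds
    the round-1 major classes); get_click returns the clicked object or None.
    """
    clicked = []                         # dynamic features: ids clicked so far
    candidates = taxonomy[None]          # round 1: the major classes
    trajectory = []
    for _ in range(max_rounds):
        state = get_state(clicked)
        pushed = policy(state, candidates)[:top_k]
        click = get_click(pushed)        # user feedback for this round
        trajectory.append((state, pushed, click))
        if click is None:                # no click: the episode ends here
            break
        clicked.append(click)
        candidates = taxonomy.get(click, [])  # next round: sub-items of the click
    return trajectory
```

For a user who clicks the first shown item every round, a three-level taxonomy yields a three-round trajectory ending at a concrete question.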
The push process is described in detail below.
Fig. 2 shows a method for pushing objects to a user based on a reinforcement learning model according to an embodiment of this specification. The method comprises at most N consecutive rounds of push for a first user, where each round of push has a corresponding predetermined candidate object set; each round of push from the second round onward starts after the first user clicks on an object pushed in the previous round; and the candidate object set of each round from the second round onward consists of multiple subsets, one for each of the multiple candidate objects pushed in the previous round. The i-th round of the at most N rounds comprises the following steps:

Step S202: obtaining an i-th piece of state information, the i-th piece of state information comprising static features and dynamic features, wherein the static features comprise features the first user already has before the method is carried out, and the dynamic features comprise the identifiers of the objects the first user has clicked in the preceding i-1 rounds of push; and

Step S204: inputting the i-th piece of state information into the reinforcement learning model, so that the reinforcement learning model determines, from the candidate object set of the i-th round, the respective identifiers of a predetermined number of push objects for the i-th round.
The at most N rounds of push constitute one episode of reinforcement learning. As described above, where the push objects of the N-th round of push are, for example, the user's query questions, the push objects of rounds 1 to N-1 are correspondingly the groups, major classes, etc. containing the query questions. A corresponding candidate object set is predetermined for every round of push. For example, in the first round the predetermined candidate object set comprises the major classes; in Alipay's intelligent customer service, the candidate object set of the first round may include "Huabei", "Jiebei", "Yu'ebao", "Zhima Credit", "Ant Insurance", "Ant Forest", and so on. In the second round, the predetermined candidate object set comprises the groups, each group being a sub-item of one of the major classes; for example it includes the sub-items of "Huabei" ("Huabei bill", "Huabei repayment", "Huabei service fee", "activating Huabei"), the sub-items of "Jiebei" ("Jiebei interest", "Jiebei repayment", "Jiebei credit limit", "activating Jiebei"), and so on. In the third round, the predetermined candidate object set comprises the questions under each group, each question being a sub-item of one of the groups; for example it includes the sub-items of "Huabei repayment" (i.e. the query questions related to Huabei repayment), the sub-items of "Huabei bill" (i.e. the query questions related to the Huabei bill), and so on.
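The per-round candidate object sets enumerated above form a three-level containment hierarchy, which can be represented as a nested mapping. The entries below are a small, hypothetical subset of the Alipay examples, and the helper is an illustration rather than anything the specification prescribes.

```python
# Hypothetical three-level taxonomy: major class -> group -> query questions.
TAXONOMY = {
    "Huabei": {
        "Huabei bill": ["How is my Huabei bill calculated?"],
        "Huabei repayment": ["How do I repay Huabei?",
                             "Can Huabei be repaid in advance?"],
    },
    "Jiebei": {
        "Jiebei interest": ["How is Jiebei interest computed?"],
    },
}

def round_candidates(taxonomy, round_no):
    """Predetermined candidate set per round: 1 = major classes,
    2 = all groups, 3 = all query questions."""
    if round_no == 1:
        return sorted(taxonomy)
    if round_no == 2:
        return sorted(g for groups in taxonomy.values() for g in groups)
    return sorted(q for groups in taxonomy.values()
                  for qs in groups.values() for q in qs)
```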
It will be appreciated that the candidate object set of each round from the second onward is not limited to what is described above. For example, where the model uses a policy-gradient algorithm, the model computes a push probability for each candidate object based on the input state and determines the output push objects by ranking those push probabilities. After the first user has clicked a push object in the first round, the candidate object set of the second round may, to save computation time, be limited to the sub-items of the clicked push object. In that case, a dedicated identifier may indicate, in the second round for example, which first-round major class each group belongs to.
Each round of push from the second onward starts after the first user clicks on an object pushed in the previous round. For example, after the first round of push has presented the major classes as shown in Fig. 1 (e.g. "Huabei", "Jiebei", and "Yu'ebao"), if the user clicks one of them (e.g. "Huabei"), the episode proceeds to the second round of push; if the user clicks none of them, the episode ends, i.e. the episode comprises only one round of push.
Every round of the at most N rounds of push comprises the same process; the i-th round may comprise the following steps.

First, in step S202, an i-th piece of state information is obtained, the i-th piece of state information comprising static features and dynamic features, wherein the static features comprise features the first user already has before the method is carried out, and the dynamic features comprise the identifiers of the objects the first user has clicked in the preceding i-1 rounds of push.
The i-th piece of state information is the i-th state s_i input for the i-th model prediction of the episode. The state s_i takes, for example, the form of a feature vector with multiple elements. Elements in predetermined dimensions of s_i correspond to the user's static features, i.e. the features existing before the episode, such as the user's attribute features, profile features, and historical behavior features; thus the static features are identical across the states corresponding to the model predictions within one episode. Elements in other predetermined dimensions of s_i correspond to the user's dynamic features, namely the identifiers of the objects the first user has clicked in the rounds of the episode preceding this round. For example, referring to the description of Fig. 1 above: in the first round of push, since the first user has made no click yet, the dynamic features of the input state s1 can be expressed as, e.g., [0, 0]. In the second round, the first user has, for example, clicked "Huabei" after the first round, so the dynamic features of the input second state s2 include the identifier of "Huabei" and can be expressed as, e.g., [a11, 0]. In the third round, the first user has, for example, clicked "repayment" after the second round, so the dynamic features of the input third state s3 include the identifiers of "Huabei" and "repayment" clicked after the first and second rounds, and can be expressed as, e.g., [a11, a22].
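The state encoding described above — fixed static dimensions plus fixed dynamic slots holding clicked identifiers, e.g. [0, 0], then [a11, 0], then [a11, a22] — can be sketched as follows. The numeric encoding of identifiers is an assumption for illustration; the specification only requires that the clicked identifiers appear in the dynamic features.

```python
import numpy as np

def build_state(static_features, clicked_ids, id_to_index, max_rounds=3):
    """Concatenate static features with a zero-padded dynamic part.

    Slot j of the dynamic part holds the numeric index of the object
    clicked after round j+1, or 0 if no such click has happened yet,
    mirroring the [a11, 0] style encoding described above.
    """
    dynamic = [0.0] * (max_rounds - 1)
    for slot, obj_id in enumerate(clicked_ids[: max_rounds - 1]):
        dynamic[slot] = float(id_to_index[obj_id])
    return np.concatenate([np.asarray(static_features, dtype=float),
                           np.asarray(dynamic)])
```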
In step S204, the i-th piece of state information is input into the reinforcement learning model, so that the reinforcement learning model determines, from the candidate object set of the i-th round, the respective identifiers of the predetermined number of push objects for the i-th round.

The reinforcement learning model is, for example, a model based on a policy-gradient algorithm. In this case the model comprises a policy function π(a|s, θ) over states s and actions a, where θ is the model parameter of the reinforcement learning model and π(a|s, θ) is the probability of taking action a in state s. For example, after state s_i is input to the model, the model obtains, based on the policy function π(a|s, θ), the push probability of each candidate object (i.e. each action) a_ij of the i-th decision, and determines the predetermined number of push objects for this round based on those push probabilities.
For example, in the intelligent customer service scenario of Fig. 1, after state s1 is input to the model in the first round of push, the candidate objects of the model's first decision include, as described above, "Huabei", "Jiebei", "Yu'ebao", "Zhima Credit", "Ant Insurance", and "Ant Forest", identified by a11, a12, …, a16 respectively. The model computes the push probability π(a1j|s1, θ) for each candidate object, where j is 1, 2, …, 6, ranks the six probabilities, and determines the top-ranked predetermined number (e.g. 3) of candidate objects as the push objects. It will be appreciated that the predetermined number may be set identically for every decision, or set separately for each decision. For example, the predetermined number may be made proportional to the number of candidate objects of the decision, so that in the first round, where the candidate objects are fewer, fewer objects are pushed, while in the third round, where the candidate objects are more numerous, correspondingly more objects are pushed.
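The per-candidate probability computation and top-k selection just described can be sketched with one simple concrete policy. The bilinear score between the state and an assumed per-candidate embedding is an illustrative instance of π(a|s, θ), not the patent's fixed parameterization.

```python
import numpy as np

def push_top_k(theta, state, candidate_ids, embed, k=3):
    """Compute pi(a | s, theta) over the candidates and return the
    top-k by push probability, plus the full probability vector."""
    scores = np.array([state @ theta @ embed[c] for c in candidate_ids])
    scores -= scores.max()                      # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()
    order = np.argsort(-probs)[:k]              # rank by push probability
    return [candidate_ids[i] for i in order], probs
```

With a predetermined number k = 3 over six first-round candidates, this reproduces the ranking-and-truncation step described above.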
In general, after the first round of push as described above, for example the first user clicks "Huabei" among the first-round pushes, and the push objects the model predicts based on the new state s2 should each be a sub-item of "Huabei". To guard against model error, however, the model's output can be filtered and then re-ranked. For example, after the model computes the push probability of each candidate object of the second round, the candidate objects that are not sub-items of "Huabei" are filtered out, and the push probabilities of the remaining candidate objects are ranked to finally determine the push objects of the second round. In one embodiment, the candidate objects of the second round may instead be filtered before the model computes the push probabilities: the candidate objects that are not sub-items of "Huabei" are filtered out of the candidate object set so that only the sub-items of "Huabei" remain; the push probability of each candidate object is then computed over the filtered candidate object set (a subset of the original candidate object set), and the push objects of this round are determined by ranking those push probabilities.
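Both filtering variants above reduce to keeping only the candidates whose parent is the previously clicked object, whether applied after scoring (to the model's output) or before scoring (to the candidate set). The parent mapping is an illustrative stand-in for the subset structure of the candidate sets:

```python
def filter_candidates(candidate_ids, clicked_id, parent_of):
    """Keep only candidates that are sub-items of the object clicked
    in the previous round."""
    return [c for c in candidate_ids if parent_of[c] == clicked_id]
```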
It will be appreciated that the reinforcement learning model is not limited to a policy-gradient algorithm; other algorithms may be used, such as Q-learning or actor-critic algorithms, which are not detailed here one by one.
After the push objects of a round are determined as described above, the push objects are pushed to the first user so as to obtain the first user's feedback. For example, in an intelligent customer service scenario, the predetermined number of push objects of the round may be displayed in order to the first user in a window page. Fig. 3 schematically shows the push objects displayed to the user in each of three rounds of push. As shown in Fig. 3, in the first round of push, "Huabei", "Jiebei" and "Yu'ebao" are displayed in order to the user. After this display, the feedback of the first user can be obtained; that is, a return value corresponding to the first user's click behavior can be acquired based on that feedback, where the mouse cursor in Fig. 3 indicates the user's click on a push object. For example, it may be predefined that when the first user clicks a push object a1j, the return value r1j corresponding to that push object in the round is set to 0.1, and when the first user does not click push object a1j, the corresponding return value r1j is set to 0. That is, after the user clicks "Huabei" in the first round of push, r11 = 0.1, r12 = 0 and r13 = 0 are obtained. It can be appreciated that this setting is merely illustrative. For example, when the first user clicks a push object a1j, the return value r1j may be set based on the position of that object in the ranked sequence of the predetermined number of push objects, with a higher position yielding a larger return value. For example, assume that in the first round of push the model outputs three push objects arranged in the order a11, a12, a13; the return value r11 for the first user clicking push object a11 may then be set greater than the return value r12 for the first user clicking push object a12.
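The reward scheme just described can be sketched as below. The base value 0.1 is the example from the text; the `rank_bonus` parameter is a hypothetical way to realise the "higher rank, larger return" variant, not a prescribed setting.

```python
from typing import List, Optional

def click_reward(clicked_index: Optional[int], num_pushed: int,
                 base: float = 0.1, rank_bonus: float = 0.0) -> List[float]:
    """Return the per-object return values for one round of push.

    clicked_index: position of the clicked object in the displayed order,
                   or None if the user clicked nothing.
    rank_bonus:    extra reward per rank position, rewarding clicks on
                   higher-ranked objects more.
    """
    rewards = [0.0] * num_pushed          # unclicked objects get 0
    if clicked_index is not None:
        rewards[clicked_index] = base + rank_bonus * (num_pushed - 1 - clicked_index)
    return rewards

# User clicks the first of three pushed objects ("Huabei"): r11=0.1, r12=r13=0.
assert click_reward(0, 3) == [0.1, 0.0, 0.0]
# With a rank bonus, a click on the top slot is worth more than on a lower slot.
assert click_reward(0, 3, rank_bonus=0.05)[0] > click_reward(1, 3, rank_bonus=0.05)[1]
```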
After the first user clicks a push object of a round, the model enters the next round of push, and the return value of the next round can be obtained as described above. For example, if the first user clicks "Huabei" in the first round of push, then in the next round, according to the model's second prediction, the several top-ranked "groups" under the "Huabei" class are determined and displayed in the second-round push. As shown in Fig. 3, three groups predicted by the model are displayed to the user in the second round of push: Huabei bill, Huabei repayment, and Huabei service fee. After the user clicks one of these "groups" (e.g. "Huabei repayment"), the model makes a third prediction to determine three specific questions of the next layer under the group "Huabei repayment" for display in the third-round push. As shown in Fig. 3, three questions predicted by the model are displayed to the user in the third round of push: whether Huabei can be repaid early, the deduction order of Huabei automatic repayment, and how to repay Huabei.
For an N-round push, it may be set that when the user clicks a push object of the N-th round, the return value is larger than the return value corresponding to a click in any of the preceding N−1 rounds. For example, in the intelligent customer service scenario above, when the first user clicks a third-round push object a3j (i.e. a specific question), the return value corresponding to that push object in the round may be set to r3j = 1, whereas when the first user clicks a push object of the first or second round, the return value (r1j or r2j) may be set to 0.1. For any round of the three rounds of push, if the user does not click any push object of that round, the episode of the model ends, i.e. the model does not enter the next round of push; in other words, one episode of the model comprises at most N rounds of push.
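The episode structure above — at most N rounds, terminating as soon as the user clicks nothing, with a larger reward for a final-round click — can be sketched as follows. `push_round` and `get_click` are hypothetical stand-ins for the model's prediction and the user's feedback, not patent-defined interfaces.

```python
from typing import Callable, List, Optional, Tuple

def run_episode(push_round: Callable, get_click: Callable,
                n_rounds: int = 3) -> List[Tuple[tuple, str, float]]:
    """Collect (state, action, reward) transitions for one episode."""
    trajectory = []
    clicked: List[str] = []  # ids clicked so far (the dynamic features)
    for i in range(n_rounds):
        state = tuple(clicked)            # static features omitted for brevity
        pushed = push_round(i, clicked)   # ids pushed in round i
        choice: Optional[str] = get_click(i, pushed)  # clicked id, or None
        if choice is None:
            break                         # no click: the episode ends early
        reward = 1.0 if i == n_rounds - 1 else 0.1
        trajectory.append((state, choice, reward))
        clicked.append(choice)
    return trajectory

# Toy episode: the user clicks in round 1 and then stops.
traj = run_episode(lambda i, c: [f"obj{i}a", f"obj{i}b"],
                   lambda i, p: p[0] if i == 0 else None)
assert traj == [((), "obj0a", 0.1)]
```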
After one episode of the model ends, the model can be trained with the input/output data and the feedback data of that episode. For example, in one case the episode includes three rounds of push to the first user; that is, the first user has click actions in both the first-round and the second-round push, so that the third round is eventually entered. In this case, the model can be trained at least three times. Specifically, assume the first user clicks the push object identified as a11 after the first-round push, the push object identified as a22 after the second-round push, and the push object identified as a32 after the third-round push; then three groups of training data (s1, a11, r11), (s2, a22, r22) and (s3, a32, r32) are obtained, where, as described above, r11 = 0.1, r22 = 0.1 and r32 = 1. Based on each group of the three groups of training data, the model parameters can be updated according to the policy-gradient algorithm by the following formula (1):

θ ← θ + α · Q̂(s, a) · ∇θ log πθ(a | s)    (1)

where Q̂ denotes the expected value, i.e. the expected total return of taking action a in state s. For example, when training the model with (s1, a11, r11), Q̂(s1, a11) in formula (1) can be calculated by the following formula (2):

Q̂(s1, a11) = r11 + γ·r22 + γ²·r32    (2)

When training the model with (s2, a22, r22) and (s3, a32, r32), Q̂(s2, a22) and Q̂(s3, a32) can similarly be calculated based on the respective return values r22 and r32.
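A minimal numpy sketch of the policy-gradient update of formula (1), assuming a linear-softmax parameterisation of the push policy (the parameterisation is an illustrative assumption; the patent does not fix the model's architecture):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce_update(theta: np.ndarray, features: np.ndarray,
                     action: int, q_hat: float, alpha: float = 0.05) -> np.ndarray:
    """One step of theta += alpha * Q_hat * grad log pi(a|s).

    features: (num_candidates, dim) matrix, one row per candidate object.
    """
    pi = softmax(features @ theta)                  # push probabilities
    grad_log_pi = features[action] - pi @ features  # ∇θ log π(a|s) for softmax
    return theta + alpha * q_hat * grad_log_pi

rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4))          # three candidates, 4-dim features
theta = np.zeros(4)
before = softmax(feats @ theta)[1]
theta = reinforce_update(theta, feats, action=1, q_hat=1.0)
after = softmax(feats @ theta)[1]
# A positive return raises the clicked object's push probability.
assert after > before
```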
In this case, in addition to the three groups of training data above, training data can also be obtained based on the push objects of each round that the user did not click. For example, for the push object a12 in the first-round push, a group of training data (s1, a12, r12) can be obtained; in this case, since the user did not click this push object, r12 = 0 and, correspondingly, Q̂ is also 0.
In another case, for example, the first user does not click any push object in the second-round push; in this case the episode ends after the second-round push is executed. Specifically, assume the first user clicks the push object identified as a11 after the first-round push and clicks no push object after the second-round push; then one group of training data (s1, a11, r11) can be obtained, where r11 = 0.1. The model can likewise be trained by formula (1), where Q̂ in formula (1) can be obtained by formula (2), i.e. Q̂ = r11 = 0.1, since there are no subsequent return values. Similarly, from this episode, multiple groups of training data corresponding to the push objects not clicked in the first-round and second-round push can also be obtained for model training.
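Assembling the training data of an episode, including the zero-return tuples for pushed-but-unclicked objects, can be sketched as below; the tuple layout is an illustrative assumption.

```python
from typing import List, Optional, Tuple

Round = Tuple[str, List[str], Optional[str], float]  # (state, pushed, clicked, reward)

def build_training_data(rounds: List[Round]) -> List[Tuple[str, str, float]]:
    """One (state, object, return) tuple per pushed object; unclicked objects get 0."""
    data = []
    for state, pushed, clicked, reward in rounds:
        for obj in pushed:
            data.append((state, obj, reward if obj == clicked else 0.0))
    return data

rounds = [
    ("s1", ["a11", "a12", "a13"], "a11", 0.1),
    ("s2", ["a21", "a22", "a23"], None, 0.0),  # no click: the episode ended here
]
data = build_training_data(rounds)
assert ("s1", "a11", 0.1) in data
assert ("s1", "a12", 0.0) in data and ("s2", "a22", 0.0) in data
assert len(data) == 6
```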
Fig. 4 shows an apparatus 400 for pushing objects to a user based on a reinforcement learning model according to an embodiment of this specification. The apparatus includes at most N pushing modules 41 deployed successively for a first user, where each pushing module has a corresponding predetermined candidate object set; each pushing module from the second pushing module onward begins to be deployed after the first user clicks an object pushed by the previous pushing module, and the candidate object set of each pushing module from the second pushing module onward includes multiple subclasses of each of the multiple candidate objects of the previous pushing module. The i-th pushing module of the at most N pushing modules includes the following units:

an acquiring unit 411 configured to acquire i-th state information, the i-th state information including static features and dynamic features, wherein the static features include existing features of the first user before the apparatus is deployed, and the dynamic features include identifiers of the objects that the first user has clicked with respect to the preceding i−1 pushing modules; and

a determination unit 412 configured to input the i-th state information into the reinforcement learning model, so that the reinforcement learning model determines, from the candidate object set of the i-th pushing module, respective identifiers of a predetermined number of push objects of the i-th pushing module.
In one embodiment, the determination unit 412 includes, deployed in the reinforcement learning model: a computation subunit 4121 configured to compute the push probability of each candidate object of the i-th pushing module based on the i-th state information and the object identifier of each candidate object in the candidate object set of the i-th pushing module; and a determination subunit 4122 configured to determine, based on the push probabilities, the predetermined number of push objects of the i-th pushing module.
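The computation and determination subunits can be pictured with the following sketch: each candidate is scored from the state features and an object-id embedding, the scores are normalised into push probabilities, and the top-k candidates become the round's push objects. The embedding table and dot-product scorer are assumptions for illustration, not the patent's concrete network.

```python
from typing import Dict, List
import numpy as np

def determine_push_objects(state_vec: np.ndarray,
                           obj_embeddings: Dict[str, np.ndarray],
                           k: int = 3) -> List[str]:
    """Return the top-k object ids by push probability."""
    ids = list(obj_embeddings)
    scores = np.array([state_vec @ obj_embeddings[i] for i in ids])
    probs = np.exp(scores - scores.max())   # softmax push probabilities
    probs /= probs.sum()
    order = np.argsort(-probs)              # rank by descending probability
    return [ids[j] for j in order[:k]]

state = np.array([1.0, 0.0])
emb = {"huabei": np.array([2.0, 0.0]),
       "jiebei": np.array([1.0, 0.0]),
       "yuebao": np.array([0.0, 1.0])}
assert determine_push_objects(state, emb, k=2) == ["huabei", "jiebei"]
```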
In one embodiment, the first user clicks a first push object pushed by the (i−1)-th pushing module, wherein the determination subunit is further configured to determine, among the candidate objects of the i-th pushing module, first candidate objects belonging to the subclasses of the first push object, and to determine the predetermined number of push objects of the i-th pushing module based on the push probabilities of the first candidate objects.
In one embodiment, the first user clicks a first push object pushed by the (i−1)-th pushing module, wherein the determination unit is further configured to cause the reinforcement learning model to determine the respective identifiers of the predetermined number of push objects of the i-th pushing module from a subset of the candidate object set of the i-th pushing module, wherein the subset includes multiple subclasses of the first push object.
In one embodiment, the i-th pushing module 41 further includes a push unit 413 configured to, after the push objects of the i-th pushing module are determined, push the push objects to the first user so as to obtain the feedback of the first user.
In one embodiment, i ≠ N, and in the case where the feedback of the first user is that none of the push objects is clicked, the apparatus includes i successive pushing modules for the first user, and the apparatus further includes an optimization module 42 configured to optimize the model by a policy-gradient algorithm based on multiple groups of data corresponding to multiple push objects in the i pushing modules, wherein the group of data corresponding to a second push object of the j-th pushing module of the i pushing modules includes: state information corresponding to the j-th pushing module, the identifier of the second push object, and a return value corresponding to the second push object, wherein the return value is obtained based on the feedback of the first user on the second push object.
In one embodiment, i = N, the apparatus includes N successive pushing modules for the first user, and the apparatus further includes an optimization module 42 configured to, after the feedback of the first user is obtained, optimize the model by the policy-gradient algorithm based on multiple groups of data corresponding to multiple push objects in the N pushing modules, wherein the group of data corresponding to a second push object of the j-th pushing module of the N pushing modules includes: state information corresponding to the j-th pushing module, the identifier of the second push object, and a return value corresponding to the second push object, wherein the return value is obtained based on the feedback of the first user on the second push object.
Another aspect of this specification provides a computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed in a computer, the computer is caused to perform any of the methods described above.

Another aspect of this specification provides a computing device including a memory and a processor, wherein executable code is stored in the memory, and when the processor executes the executable code, any of the methods described above is implemented.
The object-pushing scheme according to the embodiments of this specification proposes a novel structured object-pushing process that guides the user step by step, models the state transitions of the entire multi-round push process through reinforcement learning, and dynamically takes the user's click information into account in the model, thereby improving prediction accuracy.

It is to be understood that descriptions such as "first" and "second" herein are merely for simplicity of illustration and for distinguishing similar concepts, and have no other limiting effect.
The embodiments in this specification are described in a progressive manner; the same or similar parts of the embodiments may be referred to mutually, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiment is substantially similar to the method embodiment, it is described relatively simply, and for relevant parts reference may be made to the description of the method embodiment.
Specific embodiments of this specification have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims can be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Those of ordinary skill in the art should further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Those of ordinary skill in the art may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The specific embodiments described above further explain in detail the objectives, technical solutions and beneficial effects of the present invention. It should be understood that the foregoing is merely specific embodiments of the present invention and is not intended to limit the protection scope of the present invention; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (20)
1. A method for pushing objects to a user based on a reinforcement learning model, the method comprising at most N successive rounds of push for a first user, wherein each round of push has a corresponding predetermined candidate object set, each round of push from the second round onward begins after the first user clicks an object pushed in the previous round of push, and the candidate object set of each round of push from the second round onward includes multiple subclasses of each of the multiple candidate objects of the previous round of push, wherein the i-th round of push of the at most N rounds of push comprises the following steps:
acquiring i-th state information, the i-th state information including static features and dynamic features, wherein the static features include existing features of the first user before the method is performed, and the dynamic features include identifiers of the objects that the first user has clicked in the preceding i−1 rounds of push; and
inputting the i-th state information into the reinforcement learning model, so that the reinforcement learning model determines, from the candidate object set of the i-th round of push, respective identifiers of a predetermined number of push objects of the i-th round of push.
2. The method according to claim 1, wherein causing the reinforcement learning model to determine, from the candidate object set of the i-th round of push, the respective identifiers of the predetermined number of push objects of the i-th round of push comprises causing the reinforcement learning model to: compute the push probability of each candidate object of the i-th round of push based on the i-th state information and the object identifier of each candidate object in the candidate object set of the i-th round of push, and determine the predetermined number of push objects of the i-th round of push based on the push probabilities.
3. The method according to claim 2, wherein the first user clicks a first push object in the (i−1)-th round of push, and wherein determining the predetermined number of push objects of the i-th round of push based on the push probabilities comprises: determining, among the candidate objects of the i-th round of push, first candidate objects belonging to the subclasses of the first push object, and determining the predetermined number of push objects of the i-th round of push based on the push probabilities of the first candidate objects.
4. The method according to claim 1, wherein the first user clicks a first push object in the (i−1)-th round of push, and wherein causing the reinforcement learning model to determine, from the candidate object set of the i-th round of push, the respective identifiers of the predetermined number of push objects of the i-th round of push comprises causing the reinforcement learning model to determine the respective identifiers of the predetermined number of push objects of the i-th round of push from a subset of the candidate object set of the i-th round of push, wherein the subset includes multiple subclasses of the first push object.
5. The method according to claim 2, wherein the i-th round of push further comprises: after the push objects of the i-th round of push are determined, pushing the push objects to the first user so as to obtain the feedback of the first user.
6. The method according to claim 5, wherein i ≠ N, and in the case where the feedback of the first user is that none of the push objects is clicked, the method comprises i successive rounds of push for the first user, and the method further comprises: optimizing the model by a policy-gradient algorithm based on multiple groups of data corresponding to multiple push objects in the i rounds of push, wherein the group of data corresponding to a second push object in the j-th round of push includes: state information corresponding to the j-th round of push, the identifier of the second push object, and a return value corresponding to the second push object, wherein j is any natural number from 1 to i, and the return value is obtained based on the feedback of the first user on the second push object.
7. The method according to claim 5, wherein i = N, the method comprises N successive rounds of push for the first user, and the method further comprises: after the feedback of the first user is obtained, optimizing the model by the policy-gradient algorithm based on multiple groups of data corresponding to multiple push objects in the N rounds of push, wherein the group of data corresponding to a second push object in the j-th round of push of the N rounds of push includes: state information corresponding to the j-th round of push, the identifier of the second push object, and a return value corresponding to the second push object, wherein the return value is obtained based on the feedback of the first user on the second push object.
8. The method according to claim 7, wherein the push objects corresponding to the N-th round of push are query questions, and the return value takes a positive value in the case where the first user clicks the second push object, and is zero in the case where the first user does not click the second push object.
9. The method according to claim 8, wherein the return value takes a first value in the case where j = N and the first user clicks the second push object, and takes a second value in the case where j ≠ N and the first user clicks the second push object, wherein the first value is greater than the second value.
10. An apparatus for pushing objects to a user based on a reinforcement learning model, the apparatus comprising at most N pushing modules deployed successively for a first user, wherein each pushing module has a corresponding predetermined candidate object set, each pushing module from the second pushing module onward begins to be deployed after the first user clicks an object pushed by the previous pushing module, and the candidate object set of each pushing module from the second pushing module onward includes multiple subclasses of each of the multiple candidate objects of the previous pushing module, wherein the i-th pushing module of the at most N pushing modules comprises the following units:
an acquiring unit configured to acquire i-th state information, the i-th state information including static features and dynamic features, wherein the static features include existing features of the first user before the apparatus is deployed, and the dynamic features include identifiers of the objects that the first user has clicked with respect to the preceding i−1 pushing modules; and
a determination unit configured to input the i-th state information into the reinforcement learning model, so that the reinforcement learning model determines, from the candidate object set of the i-th pushing module, respective identifiers of a predetermined number of push objects of the i-th pushing module.
11. The apparatus according to claim 10, wherein the determination unit includes, deployed in the reinforcement learning model: a computation subunit configured to compute the push probability of each candidate object of the i-th pushing module based on the i-th state information and the object identifier of each candidate object in the candidate object set of the i-th pushing module; and a determination subunit configured to determine, based on the push probabilities, the predetermined number of push objects of the i-th pushing module.
12. The apparatus according to claim 11, wherein the first user clicks a first push object pushed by the (i−1)-th pushing module, and wherein the determination subunit is further configured to determine, among the candidate objects of the i-th pushing module, first candidate objects belonging to the subclasses of the first push object, and to determine the predetermined number of push objects of the i-th pushing module based on the push probabilities of the first candidate objects.
13. The apparatus according to claim 10, wherein the first user clicks a first push object pushed by the (i−1)-th pushing module, and wherein the determination unit is further configured to cause the reinforcement learning model to determine the respective identifiers of the predetermined number of push objects of the i-th pushing module from a subset of the candidate object set of the i-th pushing module, wherein the subset includes multiple subclasses of the first push object.
14. The apparatus according to claim 11, wherein the i-th pushing module further includes a push unit configured to, after the push objects of the i-th pushing module are determined, push the push objects to the first user so as to obtain the feedback of the first user.
15. The apparatus according to claim 14, wherein i ≠ N, and in the case where the feedback of the first user is that none of the push objects is clicked, the apparatus includes i successive pushing modules for the first user, and the apparatus further includes an optimization module configured to optimize the model by a policy-gradient algorithm based on multiple groups of data corresponding to multiple push objects in the i pushing modules, wherein the group of data corresponding to a second push object of the j-th pushing module includes: state information corresponding to the j-th pushing module, the identifier of the second push object, and a return value corresponding to the second push object, wherein j is any natural number from 1 to i, and the return value is obtained based on the feedback of the first user on the second push object.
16. The apparatus according to claim 14, wherein i = N, the apparatus includes N successive pushing modules for the first user, and the apparatus further includes an optimization module configured to, after the feedback of the first user is obtained, optimize the model by the policy-gradient algorithm based on multiple groups of data corresponding to multiple push objects in the N pushing modules, wherein the group of data corresponding to a second push object of the j-th pushing module of the N pushing modules includes: state information corresponding to the j-th pushing module, the identifier of the second push object, and a return value corresponding to the second push object, wherein the return value is obtained based on the feedback of the first user on the second push object.
17. The apparatus according to claim 16, wherein the push objects corresponding to the N-th pushing module are query questions, and the return value takes a positive value in the case where the first user clicks the second push object, and is zero in the case where the first user does not click the second push object.
18. The apparatus according to claim 17, wherein the return value takes a first value in the case where j = N and the first user clicks the second push object, and takes a second value in the case where j ≠ N and the first user clicks the second push object, wherein the first value is greater than the second value.
19. A computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed in a computer, the computer is caused to perform the method of any one of claims 1-9.
20. A computing device comprising a memory and a processor, wherein executable code is stored in the memory, and when the processor executes the executable code, the method of any one of claims 1-9 is implemented.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910463434.8A CN110263136B (en) | 2019-05-30 | 2019-05-30 | Method and device for pushing object to user based on reinforcement learning model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110263136A true CN110263136A (en) | 2019-09-20 |
CN110263136B CN110263136B (en) | 2023-10-20 |
Family
ID=67916018
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910463434.8A Active CN110263136B (en) | 2019-05-30 | 2019-05-30 | Method and device for pushing object to user based on reinforcement learning model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110263136B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110704754A (en) * | 2019-10-18 | 2020-01-17 | 支付宝(杭州)信息技术有限公司 | Push model optimization method and device executed by user terminal |
CN110766086A (en) * | 2019-10-28 | 2020-02-07 | 支付宝(杭州)信息技术有限公司 | Method and device for fusing multiple classification models based on reinforcement learning model |
CN111046156A (en) * | 2019-11-29 | 2020-04-21 | 支付宝(杭州)信息技术有限公司 | Method and device for determining reward data and server |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016057001A1 (en) * | 2014-10-09 | 2016-04-14 | Cloudradigm Pte. Ltd. | A computer implemented method and system for automatically modelling a problem and orchestrating candidate algorithms to solve the problem |
CN108230058A (en) * | 2016-12-09 | 2018-06-29 | 阿里巴巴集团控股有限公司 | Products Show method and system |
CN108230057A (en) * | 2016-12-09 | 2018-06-29 | 阿里巴巴集团控股有限公司 | A kind of intelligent recommendation method and system |
CN109003143A (en) * | 2018-08-03 | 2018-12-14 | 阿里巴巴集团控股有限公司 | Recommend using deeply study the method and device of marketing |
US20190005948A1 (en) * | 2017-06-29 | 2019-01-03 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method and device for managing dialogue based on artificial intelligence |
CN109192300A (en) * | 2018-08-17 | 2019-01-11 | 百度在线网络技术(北京)有限公司 | Intelligent way of inquisition, system, computer equipment and storage medium |
CN109451038A (en) * | 2018-12-06 | 2019-03-08 | 北京达佳互联信息技术有限公司 | A kind of information-pushing method, device, server and computer readable storage medium |
Non-Patent Citations (1)
Title |
---|
李益群等: "基于标签的强化学习推荐算法研究与应用", 《计算机应用研究》 * |
Also Published As
Publication number | Publication date |
---|---|
CN110263136B (en) | 2023-10-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chau | Application of a PSO-based neural network in analysis of outcomes of construction claims | |
Fonseca et al. | Artificial neural networks for job shop simulation | |
Hong et al. | Problem solving by heterogeneous agents | |
CN107015983A (en) | Method and apparatus for providing knowledge information in intelligent question answering | |
Kuo et al. | Application of metaheuristics-based clustering algorithm to item assignment in a synchronized zone order picking system | |
CN109919685A (en) | Customer churn prediction method, apparatus, device and computer-readable storage medium | |
CN110263136A (en) | Method and apparatus for pushing objects to users based on a reinforcement learning model | |
CN109840628A (en) | Short-term multi-region speed prediction method and system | |
CN106909931A (en) | Feature generation method, device and electronic device for machine learning models | |
CN110263979A (en) | Method and device for predicting sample labels based on a reinforcement learning model | |
CN112328646B (en) | Multitask course recommendation method and device, computer equipment and storage medium | |
Jani et al. | A framework of software requirements quality analysis system using case-based reasoning and Neural Network | |
CN109902823A (en) | Model training method and device based on generative adversarial networks | |
Al-Hawamdeh et al. | Artificial intelligence applications as a modern trend to achieve organizational innovation in Jordanian commercial banks | |
Battiti et al. | Reactive search optimization: learning while optimizing | |
Kanda et al. | A meta-learning approach to select meta-heuristics for the traveling salesman problem using mlp-based label ranking | |
CN106897388A (en) | Method and device for predicting the popularity of microblog events | |
Relich et al. | Knowledge discovery in enterprise databases for forecasting new product success | |
Shiue et al. | Learning-based multi-pass adaptive scheduling for a dynamic manufacturing cell environment | |
Kwon et al. | Neural network modeling for a two-stage production process with versatile variables: Predictive analysis for above-average performance | |
Acosta et al. | Predicting city safety perception based on visual image content | |
Rivera Lazo et al. | Multi-attribute transformers for sequence prediction in business process management | |
CN109859087A (en) | Smart community soft-environment configuration system and method using a city super brain | |
CN109583749A (en) | Intelligent software development cost control method and system based on dynamic programming | |
Borissova et al. | Generalized approach to support business group decision-making by using of different strategies |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||