CN106997488A - Action knowledge extraction method combining a Markov decision process - Google Patents

Action knowledge extraction method combining a Markov decision process Download PDF

Info

Publication number
CN106997488A
CN106997488A CN201710173631.7A CN201710173631A
Authority
CN
China
Prior art keywords
state
action
attribute
strategy
value function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710173631.7A
Other languages
Chinese (zh)
Inventor
吕强
李兆荣
李欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yangzhou University
Original Assignee
Yangzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yangzhou University filed Critical Yangzhou University
Priority to CN201710173631.7A priority Critical patent/CN106997488A/en
Publication of CN106997488A publication Critical patent/CN106997488A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/243 Classification techniques relating to the number of classes
    • G06F 18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an action knowledge extraction method combining a Markov decision process, comprising: training a random forest model H; defining the action knowledge extraction problem AKE: for the random forest model H, splitting the attributes, defining attribute changes and actions, and on this basis defining the action knowledge extraction problem AKE; and solving the AKE optimization problem with a Markov decision process: for any input data, defining a Markov decision process MDP and a policy, updating the policy by policy iteration, and finally solving for an optimal policy. An action as defined by the action knowledge extraction of the invention can change multiple attribute values of a state, so in practical applications it provides accurate and feasible recommendations.

Description

Action knowledge extraction method combining a Markov decision process
Technical field
The invention belongs to the field of machine learning, and in particular relates to an action knowledge extraction method combining a Markov decision process.
Background technology
In machine learning, many models such as support vector machines, random forests and deep neural networks have been proposed and have achieved great success, but in many practical applications the actionability of these models is poor.
Reinforcement learning is a special class of machine learning that learns a decision policy by autonomously interacting with its environment, so as to maximize the long-term cumulative reward received by the policy. Reinforcement learning differs from other machine learning methods in that no training data is provided in advance; the data must be generated by interacting with the environment. In the field of management science, the knowledge extraction problem uses statistical methods to analyze user behavior and discover specific rules; in the field of machine learning, the knowledge extraction problem mainly relies on post-hoc analysis of a trained model.
The main drawback of these two classes of methods is that they extract knowledge from a model built on the entire data set, rather than extracting useful knowledge for an individual record. Consequently, in many applications the actionability of these models is poor: they modify only one attribute value of a state, which causes errors in practice and prevents them from giving accurate and feasible recommendations.
Summary of the invention
The technical problem solved by the invention is to provide an action knowledge extraction method combining a Markov decision process, so as to address the problems in the prior art that knowledge is extracted from a model built on the entire data set and that only one attribute value of a state is changed, which leads to large errors in the results. The invention realizes data-driven action knowledge extraction through the Markov decision process of reinforcement learning, and provides the ability to convert the predictions of a machine learning model into action knowledge.
The technical solution for achieving the object of the invention is as follows:
An action knowledge extraction method combining a Markov decision process comprises the following steps:
Step 1: train a random forest model H;
Step 2: define the action knowledge extraction problem AKE: for the random forest model H, split the attributes, define attribute changes and actions, and on this basis define the action knowledge extraction problem AKE;
Step 3: solve the AKE optimization problem with a Markov decision process: for any input data, define a Markov decision process MDP and a policy, update the policy by policy iteration, and finally solve for an optimal policy.
Compared with the prior art, the invention has the following notable advantages:
(1) The invention proposes a method combining the classical reinforcement learning technique of Markov decision processes, providing a new method for the field of action knowledge extraction.
(2) The action knowledge extraction technique proposed by the invention efficiently improves the accuracy of finding an optimal policy within finite time. The invention is based on a random forest model, one of the best existing classification models and one that is widely used in practical problems; preprocessing with the random forest model orders and categorizes the data, which shortens the time needed by the iterations of the subsequent Markov decision process to find the optimal policy.
(3) An action as defined by the action knowledge extraction of the invention can change multiple attribute values of a state, so in practical applications it provides accurate and feasible recommendations.
(4) Because every state in the Markov decision process is fully observed, the accuracy of finding the optimal policy by iteration is guaranteed. By combining the Markov decision process, the invention does not need to build a model from the entire data set; it can extract available action knowledge for an individual record, and it can obtain a better policy by autonomously learning about the environment through interaction with it.
The present invention is described in further detail below in conjunction with the accompanying drawings.
Brief description of the drawings
Fig. 1 is an overall flow chart of the method of the invention.
Embodiment
The action knowledge extraction method combining a Markov decision process of the invention combines machine learning and reinforcement learning and uses a Markov decision process to extract action knowledge. The specific steps are as follows:
Step 1: train a random forest model H:
Given a training data set, build a random forest model H. The training data set is defined as {X, Y}, where X is the set of input data vectors and Y is the set of output class labels. The random forest model H is built by random sampling and node splitting, and its prediction function is
p(y = c \mid \vec{x}) = \frac{\sum_{d=1}^{D} w_d \, I(o_d(\vec{x}) = c)}{\sum_{d=1}^{D} w_d}
where x is the input vector, x ∈ X, y ∈ Y, y is the prediction class output by the random forest model H for the input vector x, c is the desired target class, d indexes the d-th decision tree, D is the total number of decision trees in the random forest, w_d is the weight of the d-th decision tree, o_d(x) is the output of the d-th decision tree for input x, I(·) is the indicator function, and p(y = c | x) is the probability that the prediction output for the input data vector x is class c.
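As an illustrative sketch (not taken from the patent text), the weighted-vote prediction function above can be written in Python with scikit-learn; the toy data and the helper name p_weighted are assumptions.
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy training set {X, Y}; in practice X, Y come from the application data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))              # input data vectors
Y = (X[:, 0] + X[:, 1] > 0).astype(int)    # output class labels
H = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, Y)

def p_weighted(H, x, c, w=None):
    """Weighted vote p(y = c | x): sum of the weights w_d of the trees whose
    prediction o_d(x) equals c, divided by the total weight."""
    trees = H.estimators_
    w = np.ones(len(trees)) if w is None else np.asarray(w)   # w_d (uniform by default)
    votes = np.array([t.predict(x.reshape(1, -1))[0] for t in trees])
    return float(np.sum(w * (votes == c)) / np.sum(w))

print(p_weighted(H, X[0], c=1))   # probability that the prediction for X[0] is class 1
```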
Step 2: define the action knowledge extraction problem (AKE): for the random forest model H, split the attributes, define attribute changes and actions, and on this basis define the action knowledge extraction problem (AKE).
2.1 Split the attributes: given a random forest model H, each attribute x_i is divided into M intervals, as follows.
1) If attribute x_i is categorical and has n categories, then x_i is naturally divided into n intervals, and in this case M = n.
2) If attribute x_i is numerical and a branch node on a decision tree in the random forest model H tests x_i > b, then b is a split point of x_i. If attribute x_i has n split points over all decision trees, then x_i is divided into n + 1 intervals, and in this case M = n + 1.
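A minimal sketch of this splitting step, assuming the scikit-learn forest H from the previous sketch; the helper name split_intervals is an assumption, not part of the patent.
```python
import numpy as np

def split_intervals(H, i):
    """Collect the split points b of numerical attribute x_i over all trees of H
    and return the resulting n + 1 intervals as (low, high) pairs."""
    points = set()
    for tree in H.estimators_:
        t = tree.tree_
        for node in range(t.node_count):
            # Internal (non-leaf) node that tests attribute i.
            if t.children_left[node] != -1 and t.feature[node] == i:
                points.add(float(t.threshold[node]))
    b = [-np.inf] + sorted(points) + [np.inf]
    return list(zip(b[:-1], b[1:]))        # n split points give n + 1 intervals

intervals = split_intervals(H, i=0)
print(len(intervals), "intervals for attribute x_0")
```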
2.2 Define attribute changes: given a random forest model H, an attribute change τ is defined as a triple τ = (x_i, p, q), where p and q are two split intervals of attribute x_i.
An attribute change τ is executable on a given input vector x if and only if the i-th attribute x_i of that input vector lies in interval p; the attribute change τ moves attribute x_i of the input vector x from interval p to interval q.
2.3 Define actions:
An action a is defined as a set of attribute changes, i.e. a = {τ_1, ..., τ_|a|}; each action a has an immediate reward R(a).
Here |a| denotes the number of attribute changes in action a, and |a| ≥ 1, i.e. an action a contains at least one attribute change τ.
An action a is executable on an input vector x if and only if all of its attribute changes τ are executable on x.
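The attribute changes and actions defined above can be represented by the following minimal, illustrative data classes; the class names and the representative-value choice are assumptions, not part of the patent.
```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AttributeChange:
    """tau = (x_i, p, q): move attribute i from interval p to interval q."""
    i: int
    p: tuple   # source interval (low, high)
    q: tuple   # target interval (low, high)

    def executable(self, x):
        return self.p[0] <= x[self.i] < self.p[1]

    def apply(self, x):
        x = x.copy()
        # Pick a representative value inside q (assumes finite interval bounds).
        x[self.i] = (self.q[0] + self.q[1]) / 2.0
        return x

@dataclass(frozen=True)
class Action:
    """a = {tau_1, ..., tau_|a|} with an immediate reward R(a)."""
    changes: tuple   # tuple of AttributeChange
    reward: float

    def executable(self, x):
        # Executable iff every attribute change is executable on x.
        return all(t.executable(x) for t in self.changes)

    def apply(self, x):
        for t in self.changes:
            x = t.apply(x)
        return x
```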
2.4 The action knowledge extraction problem (AKE) is defined as:
\max_{A_s \subseteq A} F(A_s) = \sum_{a_i \in A_s} R(a_i)
subject to p(y = c \mid x^*) > z
where A is the set of executable actions, A_s is the optimal action sequence to be found, a_i is any action in A_s, R(a_i) is the immediate reward of action a_i, F(A_s) is the total reward obtained by executing the action sequence A_s, y is the prediction class output by the random forest model H for the input vector x, z is a constant threshold, and x* is the vector obtained from the initial input vector x after executing all actions in the optimal action sequence A_s.
The AKE problem is to find an action sequence that turns the input vector into a target vector with the desired prediction class, while ensuring that the sum of rewards of the action sequence is maximal. It is therefore an optimization problem, referred to as the AKE optimization problem. In the action definition of the AKE problem an action contains at least one attribute change, so it can change multiple attribute values of a state; in practical applications this provides accurate and feasible recommendations.
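As a small illustration (assuming the H, p_weighted and Action objects sketched above, and an arbitrary threshold z), the objective and constraint of the AKE problem can be evaluated as follows:
```python
import numpy as np

def ake_objective(actions, x0, H, c, z):
    """Return (F(A_s), constraint satisfied?) for an action sequence A_s applied
    to the initial input vector x0."""
    x_star, total = x0.copy(), 0.0
    for a in actions:
        if not a.executable(x_star):
            return -np.inf, False            # the sequence is not executable
        x_star = a.apply(x_star)
        total += a.reward                    # F(A_s) = sum of R(a_i)
    # subject to p(y = c | x*) > z
    return total, p_weighted(H, x_star, c) > z
```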
Step 3: solve the AKE optimization problem with a Markov decision process: for any input data, define a Markov decision process (MDP) and a policy, update the policy by policy iteration, and finally solve for an optimal policy.
3.1 Define the Markov decision process as Π_MDP = {S, A, T, R}.
The definition procedure is standard: S denotes the state space, with states denoted s; A denotes the action space, with actions denoted a; T: S × A × S → [0, 1] is the state transition function, giving the probability of moving to another state after executing an action in a given state; R: S × A → R is the reward function, giving the immediate reward provided by the environment when a state transition occurs. Starting from state s, an action a ∈ A(s) is taken, the reward R(s, a) fed back by the environment is received, and the process moves with probability T(s, a, s′) to the next state s′ ∈ S, where A(s) denotes the set of actions that can be taken in state s.
The Markov decision process is an iterative loop that runs until the termination condition is met; when it terminates it outputs the optimal policy sequence B.
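A toy, purely illustrative encoding of the tuple Π_MDP = {S, A, T, R} as Python dictionaries; the state and action names are made up for the example and are not part of the patent.
```python
# S: states; A[s]: actions available in s; T[s][a]: list of (next_state, probability);
# R[(s, a)]: immediate reward; gamma: discount factor.
S = ["s0", "s1", "s_goal"]
A = {"s0": ["a1", "a2"], "s1": ["a1"], "s_goal": []}
T = {
    "s0": {"a1": [("s1", 0.8), ("s0", 0.2)], "a2": [("s_goal", 0.5), ("s0", 0.5)]},
    "s1": {"a1": [("s_goal", 1.0)]},
    "s_goal": {},
}
R = {("s0", "a1"): 1.0, ("s0", "a2"): 0.2, ("s1", "a1"): 2.0}
gamma = 0.9
```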
3.2 Define the policy:
A policy π is a mapping from states to actions, π: S × A → [0, 1]. The goal is to find an optimal policy π* with the largest cumulative reward R^π:
\pi^* = \arg\max_{\pi} R^{\pi}, \qquad R^{\pi} = E_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right]
where R^π is the cumulative reward obtained by executing actions under policy π, γ^t is the t-th power of the discount factor γ, E_π[·] is the expectation under policy π, and r_t is the immediate reward of the action executed at time t.
3.3 Define the value function:
The reward function is an instantaneous evaluation of a state (or action), whereas the value function assesses the quality of a state from a long-term perspective. The state value function V(s) is used here.
Given a policy π, the state value function is defined as:
V^{\pi}(s) = E_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \mid s_{0} = s\right]
Based on the optimal policy π*, the optimal state value function V*(s) can be defined as:
V^{*}(s) = E_{\pi^{*}}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \mid s_{0} = s\right]
where s_0 denotes the initial state, s_0 = s means that state s is taken as the initial state, V^π(s) is the state value function under policy π with state s as the initial state, and V*(s) is the optimal state value function with state s as the initial state.
According to the Bellman optimality equation:
V^{*}(s) = \max_{a} E\left[r_{t+1} + \gamma V^{*}(s_{t+1}) \mid s_{t} = s, a_{t} = a\right] = \max_{a} \sum_{s'} T(s, a, s')\left[R(s, a) + \gamma V^{*}(s')\right]
where r_{t+1} is the immediate reward of the action executed at time t + 1, V*(s_{t+1}) is the optimal state value function of the state s_{t+1} at time t + 1, s′ is the state at the next time step, T(s, a, s′) is the state transition probability, γ is the discount factor, R(s, a) is the reward for taking action a in state s, and V*(s′) is the optimal state value function of the next state s′.
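An illustrative one-step Bellman optimality backup over the toy MDP dictionaries above (a sketch, not the patent's own procedure):
```python
def bellman_backup(V, s, A, T, R, gamma):
    """V*(s) = max_a sum_{s'} T(s, a, s') [ R(s, a) + gamma * V*(s') ]."""
    if not A[s]:
        return 0.0                       # terminal state: no action available
    return max(
        sum(p * (R[(s, a)] + gamma * V[s2]) for s2, p in T[s][a])
        for a in A[s]
    )

V = {s: 0.0 for s in S}
V["s0"] = bellman_backup(V, "s0", A, T, R, gamma)
```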
3.4 Solve for an optimal policy by policy iteration:
First, a policy π_t is initialized at random and the state value function v_t under this policy is computed; from these state values a new policy π_{t+1} is obtained, and the value function v_{t+1} of each state under the new policy is computed, repeating until convergence.
Computing the value of each state under a policy is called policy evaluation; obtaining a new policy from the state values is called policy improvement.
3.4.1 Policy evaluation:
According to the Bellman equation, the value function of a state is related to the value functions of its successor states; therefore, the value function v(s) of the current state is updated using the value function v(s′) of the successor state.
Policy evaluation traverses all states and updates the state value function according to the following formula:
V^{\pi_{t}}(s) = \sum_{a \in A} \pi(s, a)\left(R(s, a) + \gamma \sum_{s'} T(s, a, s') V^{\pi_{t+1}}(s')\right)
After the state value function has been updated, the policy π_t is added to the optimal policy sequence B.
Here V^{π_t}(s) is the state value function of state s under policy π_t, V^{π_{t+1}}(s′) is the state value function of state s′ under policy π_{t+1}, and π(s, a) denotes the policy evaluated at state s and action a.
3.4.2 Policy improvement:
A new policy that is better than the old one is obtained from the state value function. For a state s, the policy selects the action a that maximizes the current state value R(s, a) + γ Σ_{s′} T(s, a, s′) V^π(s′), i.e.:
\pi_{t+1}(s, a) = \begin{cases} 1, & a = \arg\max_{a}\left(R(s, a) + \gamma \sum_{s'} T(s, a, s') V^{\pi_{t+1}}(s')\right) \\ 0, & \text{otherwise} \end{cases}
where π_{t+1} denotes the policy at time t + 1.
3.4.3 Output the optimal policy sequence B according to the result of policy improvement: determine whether the state in the policy is the target state; if it is the target state, exit the policy iteration and output the optimal policy sequence B; if it is not the target state, perform policy evaluation again until state s is the target state, and then output the optimal policy B.
The condition for judging whether the target has been reached is:
p(y = c \mid x^*) > z
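For illustration only, the policy iteration loop described in steps 3.4.1 to 3.4.3 can be sketched over the toy MDP dictionaries above (function and variable names are assumptions); it alternates policy evaluation and policy improvement until the policy no longer changes.
```python
def policy_iteration(S, A, T, R, gamma, tol=1e-6):
    """Return a deterministic policy and its state value function."""
    pi = {s: (A[s][0] if A[s] else None) for s in S}   # arbitrary initial policy
    V = {s: 0.0 for s in S}
    while True:
        # Policy evaluation: sweep the states until the values converge under pi.
        while True:
            delta = 0.0
            for s in S:
                a = pi[s]
                if a is None:
                    continue
                v_new = sum(p * (R[(s, a)] + gamma * V[s2]) for s2, p in T[s][a])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < tol:
                break
        # Policy improvement: make the policy greedy with respect to V.
        stable = True
        for s in S:
            if not A[s]:
                continue
            best = max(A[s], key=lambda a: sum(p * (R[(s, a)] + gamma * V[s2])
                                               for s2, p in T[s][a]))
            if best != pi[s]:
                pi[s], stable = best, False
        if stable:
            return pi, V

pi_star, V_star = policy_iteration(S, A, T, R, gamma)
print(pi_star, V_star)
```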
The invention proposes a method combining the classical reinforcement learning technique of Markov decision processes, providing a new method for the field of action knowledge extraction. The invention is based on a random forest model, one of the best existing classification models and one that is widely used in practical problems. Preprocessing with the random forest model orders and categorizes the data, which shortens the time needed by the iterations of the subsequent Markov decision process to find the optimal policy; the proposed action knowledge extraction method therefore efficiently improves the accuracy of finding an optimal policy within finite time. An action as defined by the action knowledge extraction of the invention can change multiple attribute values of a state, so in practical applications it provides accurate and feasible recommendations. Because every state in the Markov decision process is fully observed, the accuracy of finding the optimal policy by iteration is guaranteed. By combining the Markov decision process, the invention does not need to build a model from the entire data set; it can extract available action knowledge for an individual record, and it can obtain a better policy by autonomously learning about the environment through interaction with it.

Claims (7)

1. An action knowledge extraction method combining a Markov decision process, characterized by comprising the following steps:
Step 1: train a random forest model H;
Step 2: define the action knowledge extraction problem AKE: for the random forest model H, split the attributes, define attribute changes and actions, and on this basis define the action knowledge extraction problem AKE;
Step 3: solve the AKE optimization problem with a Markov decision process: for any input data, define a Markov decision process MDP and a policy, update the policy by policy iteration, and finally solve for an optimal policy.
2. The action knowledge extraction method combining a Markov decision process according to claim 1, characterized in that training the random forest model H in step 1 specifically comprises:
giving a training data set and building a random forest model H; the training data set is defined as {X, Y}, where X is the set of input data vectors and Y is the set of output class labels; the random forest model H is built by random sampling and node splitting, and its prediction function is
p(y = c \mid \vec{x}) = \frac{\sum_{d=1}^{D} w_d \, I(o_d(\vec{x}) = c)}{\sum_{d=1}^{D} w_d}
where x is the input vector, x ∈ X, y ∈ Y, y is the prediction class output by the random forest model H for the input vector x, c is the desired target class, d indexes the d-th decision tree, D is the total number of decision trees in the random forest, w_d is the weight of the d-th decision tree, o_d(x) is the output of the d-th decision tree for input x, I(·) is the indicator function, and p(y = c | x) is the probability that the prediction output for the input data vector x is class c.
3. The action knowledge extraction method combining a Markov decision process according to claim 1, characterized in that defining the action knowledge extraction problem in step 2 specifically comprises the following steps:
2.1 split the attributes: given a random forest model H, each attribute x_i is divided into M intervals;
2.2 define attribute changes: given a random forest model H, an attribute change τ is defined as a triple τ = (x_i, p, q), where p and q are two split intervals of attribute x_i;
2.3 define actions:
an action a is defined as a set of attribute changes, i.e. a = {τ_1, ..., τ_|a|}; each action a has an immediate reward R(a);
where |a| denotes the number of attribute changes in action a, and |a| ≥ 1, i.e. an action a contains at least one attribute change τ;
2.4 the action knowledge extraction problem (AKE) is defined as:
\max_{A_s \subseteq A} F(A_s) = \sum_{a_i \in A_s} R(a_i)
subject to p(y = c \mid x^*) > z
where A is the set of executable actions, A_s is the optimal action sequence to be found, a_i is any action in A_s, R(a_i) is the immediate reward of action a_i, F(A_s) is the total reward obtained by executing the action sequence A_s, y is the prediction class output by the random forest model H for the input vector x, z is a constant threshold, and x* is the vector obtained from the initial input vector x after executing all actions in the optimal action sequence A_s.
4. The action knowledge extraction method combining a Markov decision process according to claim 3, characterized in that attribute x_i in step 2.1 is divided into M intervals as follows:
1) if attribute x_i is categorical and has n categories, then x_i is naturally divided into n intervals, and in this case M = n;
2) if attribute x_i is numerical and a branch node on a decision tree in the random forest model H tests x_i > b, then b is a split point of x_i; if attribute x_i has n split points over all decision trees, then x_i is divided into n + 1 intervals, and in this case M = n + 1.
5. The action knowledge extraction method combining a Markov decision process according to claim 1, characterized in that solving the AKE optimization problem with a Markov decision process in step 3 specifically comprises the following steps:
3.1 define the Markov decision process as Π_MDP = {S, A, T, R}:
S denotes the state space, with states denoted s; A denotes the action space, with actions denoted a; T: S × A × S → [0, 1] is the state transition function, giving the probability of moving to another state after executing an action in a given state; R: S × A → R is the reward function, giving the immediate reward provided by the environment when a state transition occurs; starting from state s, an action a ∈ A(s) is taken, the reward R(s, a) fed back by the environment is received, and the process moves with probability T(s, a, s′) to the next state s′ ∈ S, where A(s) denotes the set of actions that can be taken in state s;
3.2 define the policy:
a policy π is a mapping from states to actions, π: S × A → [0, 1], and the goal is to find an optimal policy π* with the largest cumulative reward R^π:
\pi^* = \arg\max_{\pi} R^{\pi}
R^{\pi} = E_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t}\right]
where R^π is the cumulative reward obtained by executing actions under policy π, γ^t is the t-th power of the discount factor γ, E_π[·] is the expectation under policy π, and r_t is the immediate reward of the action executed at time t;
3.3 define the value function:
given a policy π, the state value function is defined as:
V^{\pi}(s) = E_{\pi}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \mid s_{0} = s\right]
based on the optimal policy π*, the optimal state value function V*(s) can be defined as:
V^{*}(s) = E_{\pi^{*}}\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t} \mid s_{0} = s\right]
where s_0 denotes the initial state, s_0 = s means that state s is taken as the initial state, V^π(s) is the state value function under policy π with state s as the initial state, and V*(s) is the optimal state value function with state s as the initial state;
according to the Bellman optimality equation:
V^{*}(s) = \max_{a} E\left[r_{t+1} + \gamma V^{*}(s_{t+1}) \mid s_{t} = s, a_{t} = a\right] = \max_{a} \sum_{s'} T(s, a, s')\left[R(s, a) + \gamma V^{*}(s')\right]
where r_{t+1} is the immediate reward of the action executed at time t + 1, V*(s_{t+1}) is the optimal state value function of the state s_{t+1} at time t + 1, s′ is the state at the next time step, T(s, a, s′) is the state transition probability, γ is the discount factor, R(s, a) is the reward for taking action a in state s, and V*(s′) is the optimal state value function of the next state s′;
3.4 solve for an optimal policy by policy iteration:
first, a policy π_t is initialized at random and the state value function v_t under this policy is computed; from these state values a new policy π_{t+1} is obtained, and the value function v_{t+1} of each state under the new policy is computed, repeating until convergence.
6. The action knowledge extraction method combining a Markov decision process according to claim 5, characterized in that solving for an optimal policy by policy iteration in step 3.4 specifically comprises the following steps:
3.4.1 policy evaluation:
according to the Bellman equation, the value function of a state is related to the value functions of its successor states; therefore, the value function v(s) of the current state is updated using the value function v(s′) of the successor state;
policy evaluation traverses all states and updates the state value function according to the following formula:
V^{\pi_{t}}(s) = \sum_{a \in A} \pi(s, a)\left(R(s, a) + \gamma \sum_{s'} T(s, a, s') V^{\pi_{t+1}}(s')\right)
after the state value function has been updated, the policy π_t is added to the optimal policy sequence B;
where V^{π_t}(s) is the state value function of state s under policy π_t, V^{π_{t+1}}(s′) is the state value function of state s′ under policy π_{t+1}, and π(s, a) denotes the policy evaluated at state s and action a;
3.4.2 policy improvement:
a new policy that is better than the old one is obtained from the state value function; for a state s, the policy selects the action a that maximizes the current state value R(s, a) + γ Σ_{s′} T(s, a, s′) V^π(s′), i.e.:
\pi_{t+1}(s, a) = \begin{cases} 1, & a = \arg\max_{a}\left(R(s, a) + \gamma \sum_{s'} T(s, a, s') V^{\pi_{t+1}}(s')\right) \\ 0, & a \neq \arg\max_{a}\left(R(s, a) + \gamma \sum_{s'} T(s, a, s') V^{\pi_{t+1}}(s')\right) \end{cases}
where π_{t+1} denotes the policy at time t + 1;
3.4.3 output the optimal policy sequence B according to the result of policy improvement: determine whether the state in the policy is the target state; if it is the target state, exit the policy iteration and output the optimal policy sequence B; if it is not the target state, perform policy evaluation again until state s is the target state, and then output the optimal policy B.
7. The action knowledge extraction method combining a Markov decision process according to claim 6, characterized in that the condition in step 3.4.3 for judging whether the target has been reached is:
p(y = c \mid x^*) > z
CN201710173631.7A 2017-03-22 2017-03-22 Action knowledge extraction method combining a Markov decision process Pending CN106997488A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710173631.7A CN106997488A (en) 2017-03-22 2017-03-22 Action knowledge extraction method combining a Markov decision process

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710173631.7A CN106997488A (en) 2017-03-22 2017-03-22 Action knowledge extraction method combining a Markov decision process

Publications (1)

Publication Number Publication Date
CN106997488A true CN106997488A (en) 2017-08-01

Family

ID=59431600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710173631.7A Pending CN106997488A (en) Action knowledge extraction method combining a Markov decision process

Country Status (1)

Country Link
CN (1) CN106997488A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108376287A (en) * 2018-03-02 2018-08-07 复旦大学 Multi-valued attribute segmentation device and method based on CN-DBpedia
CN108510110A (en) * 2018-03-13 2018-09-07 浙江禹控科技有限公司 Knowledge-graph-based groundwater level trend analysis method
CN109741626A (en) * 2019-02-24 2019-05-10 苏州科技大学 Parking situation prediction method, scheduling method and system
CN110363015A (en) * 2019-07-10 2019-10-22 华东师范大学 Construction method of a Markov prefetching model based on user attribute classification
CN110378717A (en) * 2018-04-13 2019-10-25 北京京东尚科信息技术有限公司 Method and apparatus for outputting information
CN111294284A (en) * 2018-12-10 2020-06-16 华为技术有限公司 Traffic scheduling method and device
CN113112051A (en) * 2021-03-11 2021-07-13 同济大学 Production maintenance joint optimization method for serial production system based on reinforcement learning


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5806056A (en) * 1994-04-29 1998-09-08 International Business Machines Corporation Expert system and method employing hierarchical knowledge base, and interactive multimedia/hypermedia applications
CN101000624A (en) * 2007-01-10 2007-07-18 华为技术有限公司 Method, system and device for implementing data mining model conversion and application
CN102054002A (en) * 2009-10-28 2011-05-11 中国移动通信集团公司 Method and device for generating decision tree in data mining system
CN103034691A (en) * 2012-11-30 2013-04-10 南京航空航天大学 Method for getting expert system knowledge based on support vector machine
CN103258255A (en) * 2013-03-28 2013-08-21 国家电网公司 Knowledge discovery method applicable to power grid management system
CN103246991A (en) * 2013-05-28 2013-08-14 运筹信息科技(上海)有限公司 Data mining-based customer relationship management method and data mining-based customer relationship management system
CN105182988A (en) * 2015-09-11 2015-12-23 西北工业大学 Pilot operation behavior guiding method based on Markov decision-making process
CN105955921A (en) * 2016-04-18 2016-09-21 苏州大学 Robot hierarchical reinforcement learning initialization method based on automatic discovery of abstract action
CN106021377A (en) * 2016-05-11 2016-10-12 上海点荣金融信息服务有限责任公司 Information processing method and device implemented by computer
CN106156488A (en) * 2016-06-22 2016-11-23 南京邮电大学 Knowledge graph link prediction method based on Bayesian personalized ranking
CN106447463A (en) * 2016-10-21 2017-02-22 南京大学 Commodity recommendation method based on Markov decision-making process model

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LONGBING CAO: "Actionable knowledge discovery and delivery", Metasynthetic Computing and Engineering of Complex Systems *
QIANG YANG et al.: "Extracting Actionable Knowledge from Decision Trees", IEEE Transactions on Knowledge and Data Engineering *
ZHICHENG CUI et al.: "Optimal Action Extraction for Random Forests and Boosted Trees", Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining *
陈兴国 et al.: "Reinforcement Learning and Its Application in Computer Go", Acta Automatica Sinica *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108376287A (en) * 2018-03-02 2018-08-07 复旦大学 Multi-valued attribute segmentation device and method based on CN-DBpedia
CN108510110A (en) * 2018-03-13 2018-09-07 浙江禹控科技有限公司 Knowledge-graph-based groundwater level trend analysis method
CN110378717A (en) * 2018-04-13 2019-10-25 北京京东尚科信息技术有限公司 Method and apparatus for outputting information
CN110378717B (en) * 2018-04-13 2024-03-05 北京京东尚科信息技术有限公司 Method and device for outputting information
CN111294284A (en) * 2018-12-10 2020-06-16 华为技术有限公司 Traffic scheduling method and device
CN111294284B (en) * 2018-12-10 2022-04-26 华为技术有限公司 Traffic scheduling method and device
CN109741626A (en) * 2019-02-24 2019-05-10 苏州科技大学 Parking situation prediction method, scheduling method and system
CN109741626B (en) * 2019-02-24 2023-09-29 苏州科技大学 Parking situation prediction method, scheduling method and system for parking lot
CN110363015A (en) * 2019-07-10 2019-10-22 华东师范大学 Construction method of a Markov prefetching model based on user attribute classification
CN113112051A (en) * 2021-03-11 2021-07-13 同济大学 Production maintenance joint optimization method for serial production system based on reinforcement learning
CN113112051B (en) * 2021-03-11 2022-10-25 同济大学 Production maintenance joint optimization method for serial production system based on reinforcement learning

Similar Documents

Publication Publication Date Title
CN106997488A (en) Action knowledge extraction method combining a Markov decision process
Al-Shabandar et al. A deep gated recurrent neural network for petroleum production forecasting
Alzubaidi et al. A survey on deep learning tools dealing with data scarcity: definitions, challenges, solutions, tips, and applications
WO2023065545A1 (en) Risk prediction method and apparatus, and device and storage medium
CN110889556B (en) Enterprise operation risk characteristic data information extraction method and extraction system
CN109299396B (en) Convolutional neural network collaborative filtering recommendation method and system fusing attention model
Jin et al. Bayesian symbolic regression
CN104008203B (en) A user interest mining method incorporating ontology context
WO2019015631A1 (en) Method for generating combined features for machine learning samples and system
Pirani et al. A comparative analysis of ARIMA, GRU, LSTM and BiLSTM on financial time series forecasting
US11151480B1 (en) Hyperparameter tuning system results viewer
CN105893609A (en) Mobile APP recommendation method based on weighted mixing
CN104798043A (en) Data processing method and computer system
WO2018133596A1 (en) Continuous feature construction method based on nominal attribute
CN103324954A (en) Image classification method based on tree structure and system using same
CN107451230A (en) Question answering method and question answering system
CN113326852A (en) Model training method, device, equipment, storage medium and program product
Patidar et al. Handling missing value in decision tree algorithm
CN116861924A (en) Project risk early warning method and system based on artificial intelligence
CN107368895A (en) Action knowledge extraction method combining machine learning and automated planning
Li A study on the influence of non-intelligence factors on college students' English learning achievement based on the C4.5 decision tree algorithm
Prudêncio et al. A modal symbolic classifier for selecting time series models
Kim et al. Knowledge extraction and representation using quantum mechanics and intelligent models
CN110310012A (en) Data analysing method, device, equipment and computer readable storage medium
CN113326884A (en) Efficient learning method and device for large-scale abnormal graph node representation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20170801