WO2020186453A1 - Universal logical reasoning method and system for implementing agent based on wide learning algorithm - Google Patents

Universal logical reasoning method and system for implementing agent based on wide learning algorithm

Info

Publication number
WO2020186453A1
WO2020186453A1 (PCT/CN2019/078710)
Authority
WO
WIPO (PCT)
Prior art keywords
agent
logic
output
bias
new
Prior art date
Application number
PCT/CN2019/078710
Other languages
French (fr)
Chinese (zh)
Inventor
曾祥洪
李利鹏
吴明华
Original Assignee
北京汇真网络传媒科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京汇真网络传媒科技有限公司
Priority to PCT/CN2019/078710
Publication of WO2020186453A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06 Asset management; Financial planning or analysis

Definitions

  • NewFeature3 = GradientBoostingClassifier(input(fin[36], market[32], mac[11], news[17]), output[SapientState, Situation])
  • NewIdea is the new concept state of the agent, and Moveloss is the moving stop loss distance of the imitation object;
  • Module 23. Loop modules 11 to 22 until the cumulative evolution result of the step-by-step decision and reverse evaluation and the situation converge at the same time; output the new imitation object, the combination of conditions that forms it, and the causal relationship between the imitation object and the logic.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Artificial Intelligence (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

A universal logical reasoning method and system for implementing an agent based on a wide learning algorithm. The method comprises: acquiring the various types of environmental data corresponding to an object, each type comprising multidimensional data or indicators; constructing a logic reinforcement process and a reverse model by building a logic layer whose attributes correspond to the various types of data; performing dynamic self-assessment, environment assessment, and logic assessment of the object's situation in the environment, and fusing the three assessments into a new feature using a logistic regression algorithm; and constructing a new object from the new feature, then establishing and evaluating a causal relationship between the logic formed by the new object and an artificially set initial target. The method and system allow fully automated exploration of the causal relationships between a logic composed of conditional factors and a target, thereby implementing logical reasoning in a machine algorithm; the degree of automation is increased, dependencies between data are better explained, and the technical barriers to using AI are reduced.

Description

General logical reasoning method and system for implementing an agent based on a breadth learning algorithm
Technical field
The invention relates to the field of machine learning, and in particular to a method and system that simulates the way the human brain thinks, based on a breadth learning algorithm, so as to realize logical reasoning in an agent.
Background
Deep learning grew out of research on artificial neural networks; a multilayer perceptron with multiple hidden layers is one deep learning structure. Deep learning combines low-level features to form more abstract high-level attribute categories or features and thereby discovers distributed feature representations of the data. The main difference between breadth learning and deep learning is that deep learning uses a single multilayer perceptron, whereas breadth learning uses multiple multilayer perceptrons.
Deep learning is currently used mainly for image recognition, speech recognition, and natural language processing. The main role of the breadth learning established here is to build logical relationships between objects and to automatically explore the causal relationships between an object and its environment and between an object and the facts, thereby realizing logical reasoning in an agent. The significance of breadth learning is that it can automatically explore the causality of unknown data according to human logic, reducing the time and the cost people spend searching for the truth; it also changes all earlier recommendation algorithms from passive recommendation (recommendation by the machine algorithm alone) to active recommendation (recommendation with human intervention).
Deep machine learning methods divide into supervised and unsupervised learning, and the models built under the different frameworks differ greatly; for example, convolutional neural networks (CNNs) are a deep model for supervised learning, while deep belief networks (DBNs) are a model for unsupervised learning. The breadth learning established here combines supervised and unsupervised learning and contains both mechanisms. The imitation object is the supervised target of breadth learning, but whereas earlier deep learning needed a large number of supervised samples, breadth learning needs only one. It is also unsupervised, because its reverse model automatically explores all the environmental data that make up the imitation object, forms a new imitation object, and iteratively optimizes that object until the goal expected by the person is reached.
Deep learning methods often cannot explain the interdependence between data, because what they establish are functional relationships between data: as the depth increases, the functions acquire more and more parameters, until the relationships can no longer be written as mathematical expressions and therefore cannot be explained. Although the breadth learning established here also uses deep learning methods, it additionally builds logical relationships between functions and attributes, and causal relationships between logic and facts, so that the machine algorithm can understand the causal relationship between an object and its environment by observing their correlation, and the relationship between object and environment can be explained logically.
Deep learning has strong perception ability but lacks decision-making ability, while reinforcement learning has decision-making ability but is helpless on perception problems; deep reinforcement learning combines the two so that their strengths complement each other, offering a way to handle perception and decision problems in complex systems. The anthropomorphic decision model inside the breadth learning algorithm established here includes a logic reinforcement process and a reverse model. Like deep reinforcement learning, the anthropomorphic decision model inherits deep reinforcement learning methods: it includes logic reinforcement, goal reinforcement, and strategy reinforcement, which borrow the strengths of deep reinforcement learning in that they can both perceive the state of the environment and provide the corresponding strategy in different situations. The difference is that the anthropomorphic decision model also has a reverse model, which re-evaluates the reinforcement learning and performs further feature recombination based on that evaluation.
Disclosure of the invention
The core inventive point of the present invention is to establish, on the basis of breadth learning, a method that lets a machine algorithm (an agent) recognize causal relationships by observing the correlations between an object and its environment, i.e. a method that realizes logical reasoning; further inventive points are the method of generating the logic and the method of verifying whether a logic is good or bad. "Object" and "environment" are terms commonly used in this field and their ordinary meanings are not repeated here.
For example, the machine algorithm can fully automatically analyze business data (the securities market in the examples of the present invention) and provide recommendations together with the logic behind them. The present invention solves two kinds of problems. 1. By setting or modifying a target, people can let the machine algorithm automatically mine and generate the causal relationship between the target and the logic and recommend the result to the user; this makes it very convenient to explore the causality of unknown data according to one's own logic and greatly reduces the trial-and-error cost of searching for the truth. 2. People can use the machine algorithm (an agent, or a robot) to check whether their own logic fits reality: when the algorithm cannot recommend a result that meets the person's expectations, this indicates that the person's logic may not fit the facts, and the person can then intervene in the logic part of the algorithm so that the machine's recommendations gradually meet the user's expectations and goals. Specifically, the present invention proposes a universal logical reasoning method for implementing an agent based on a breadth learning algorithm, comprising:
Step 1. Acquire the various types of environmental data corresponding to the object, where each type of environmental data includes multidimensional data or indicators, and obtain the attributes of each type of environmental data through feature extraction;
Step 2. Establish a logic layer for the attributes corresponding to each type of environmental data, so as to construct the logic reinforcement process and the reverse model; perform dynamic self-assessment, environment assessment, and logic assessment of the object's situation in the environment, and fuse the three assessment results into a new feature using a logistic regression algorithm;
Step 3. Construct a new object from the new feature, establish a causal relationship between the logic formed by the new object and the artificially set initial target, and evaluate that causal relationship and logic; once the causal relationship and the logic that forms it have been confirmed from the evaluation results, output the new object that satisfies the causal relationship as the recommended result of the logical reasoning.
In this universal logical reasoning method for implementing an agent based on a breadth learning algorithm, step 1 includes:
Step 11. Acquire the artificially set initial target and the environmental data, the environmental data including financial indicators, market indicators, news indicators, and macro indicators; use random environmental data as screening conditions to select multiple stocks that satisfy the conditions, and combine these stocks into an alpha index that serves as the imitation object;
Step 12. Merge the environmental data through an activation function to obtain the emotion index; add the environmental data to the artificially set initial target to obtain the agent's primary concept state; divide the return of the imitation object by the agent's primary concept state through an activation function to obtain the trade-off logic; and divide the trade-off logic by the return of the imitation object through an activation function to obtain the agent's initial target;
Step 13. Through an activation function, divide the agent's initial target by the difference between the return of the imitation object and the emotion index to obtain the targeted state value; through an activation function, take the agent's initial target, the targeted state value, the agent's new concept state, and the unperceived states as input to obtain the untargeted state value; through an activation function, divide the moving stop-loss distance of the imitation object by the targeted state value to obtain the agent's new concept state; through an activation function, divide the untargeted state value by the targeted state value to obtain the cumulative evolution result of step-by-step decision and reverse evaluation; through the relu function, divide the agent's initial target by the trade-off logic to obtain the output logic; and through an activation function, divide the output logic by the difference between the artificially set initial target and the actual return to obtain the situation;
Step 14. Collect the cumulative evolution result of step-by-step decision and reverse evaluation, the situation, and the agent's initial target as the attributes.
In this universal logical reasoning method for implementing an agent based on a breadth learning algorithm, the emotion index is calculated as:
MergeAllData = pd.merge(fin[36], market[32], mac[11], news[17])
EmotionIndex = tanh(weight*MergeAllData + bias)
where EmotionIndex is the emotion index, weight is the weight, fin[36], market[32], mac[11], and news[17] are the financial, market, macro, and news data respectively, bias is the bias, tanh is the activation function, and pd.merge() is the merge function;
The agent's primary concept state is calculated as:
idea = relu(weight*(EmotionIndex.reshape() + prospective) + bias)
where idea is the agent's primary concept state, prospective is the expected return, and relu is the activation function;
The trade-off logic is calculated as:
ar = sigmoid(weight*(reward/idea) + bias)
where ar is the trade-off logic, reward is the return of the imitation object in step 11, and sigmoid is the activation function;
AgentTarget = relu(weight*(ar/reward) + bias)
where AgentTarget is the agent's initial target, weight is the weight, ar is the trade-off logic, reward is the return of the alpha, bias is the bias, and relu is the activation function;
The targeted state value is:
Targeted = tanh(weight*(AgentTarget/(reward - EmotionIndex)) + bias)
where Targeted is the targeted state value and AgentTarget is the agent's initial target;
The untargeted state value is:
UnTargeted = sigmoid(weight*([AgentTarget, Targeted, NewIdea, un-recognized_n]) + bias)
where UnTargeted is the untargeted state value and un-recognized_n denotes the unknown data made up of the n-dimensional unperceived states;
The agent's new concept state is obtained by:
NewIdea = relu(weight*(Moveloss/Targeted) + bias)
where NewIdea is the agent's new concept state and Moveloss is the moving stop-loss distance of the imitation object;
The cumulative evolution result of step-by-step decision and reverse evaluation is the ratio of the untargeted state value to the targeted state value:
SapientState = relu(weight*(UnTargeted/Targeted) + bias)
where SapientState is the cumulative evolution result of step-by-step decision and reverse evaluation, weight is the weight, UnTargeted is the untargeted state value, Targeted is the targeted state value, bias is the bias, and relu is the activation function;
Calculation of the output logic: divide the expected return by the trade-off logic through the relu function;
OutputLogic = relu(weight*(prospective/ar) + bias)
where OutputLogic is the output logic, prospective is the expected return, and ar is the trade-off logic;
The situation represents the highest level of understanding of the environment and is calculated as:
Situation = tanh(weight*(OutputLogic/(prospective - reward)) + bias)
where Situation represents the situation.
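To make the chain of definitions above concrete, the following is a minimal NumPy sketch that evaluates the indicators in order; the weights, biases, and input values are illustrative placeholders rather than values from the invention, and the way the bracketed list in the UnTargeted expression is reduced to a scalar is an assumption.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def relu(x):
        return np.maximum(0.0, x)

    # Placeholder scalars standing in for the merged environment data and the targets.
    weight, bias = 0.8, 0.05      # a single illustrative weight and bias per expression
    merged_all_data = 0.4         # stand-in for pd.merge(fin, market, mac, news)
    prospective = 0.05            # expected return (the artificially set 5% target)
    reward = 0.0063               # return of the imitation alpha
    move_loss = 0.02              # moving stop-loss distance of the imitation object
    unrecognized_n = 0.1          # stand-in for the n-dimensional unperceived states

    emotion_index = np.tanh(weight * merged_all_data + bias)
    idea = relu(weight * (emotion_index + prospective) + bias)
    ar = sigmoid(weight * (reward / idea) + bias)
    agent_target = relu(weight * (ar / reward) + bias)
    targeted = np.tanh(weight * (agent_target / (reward - emotion_index)) + bias)
    new_idea = relu(weight * (move_loss / targeted) + bias)
    # The list [AgentTarget, Targeted, NewIdea, un-recognized_n] is reduced by summation
    # here; how the list is actually combined is not specified in the text.
    untargeted = sigmoid(weight * (agent_target + targeted + new_idea + unrecognized_n) + bias)
    sapient_state = relu(weight * (untargeted / targeted) + bias)
    output_logic = relu(weight * (prospective / ar) + bias)
    situation = np.tanh(weight * (output_logic / (prospective - reward)) + bias)
    print(sapient_state, situation, agent_target)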
In this universal logical reasoning method for implementing an agent based on a breadth learning algorithm, step 2 includes:
Step 21. Using the XGBoost algorithm, with the cumulative evolution result of step-by-step decision and reverse evaluation, the untargeted state value, the targeted state value, and the agent's initial target as outputs and the environmental data as input, screen out features as the self-assessment features; using the LightGBM algorithm, with the situation and the output logic as outputs and the environmental data as input, screen out the environment-assessment features; using the GradientBoosting algorithm, with the cumulative evolution result of step-by-step decision and reverse evaluation and the situation as outputs and the environmental data as input, screen out the output-logic features; and using a logistic regression algorithm, with the agent's initial target as output and the self-assessment features, environment-assessment features, and output-logic features as inputs, screen out the model-fused features as the new features;
Step 22. Score every stock according to the new features using a random forest algorithm, take the alpha index formed by the stocks with the highest scores, and replace the imitation object of step 11 according to the update rule between the new alpha combination and the imitation-object alpha (a sketch of this scoring follows step 23 below);
Step 23. Repeat steps 11 to 22 until the cumulative evolution result of step-by-step decision and reverse evaluation and the situation converge at the same time; then output the new imitation object, the combination of conditions that forms it, and the causal relationship between the imitation object and the logic.
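As a minimal sketch of the scoring in step 22 and the loop condition of step 23, assuming scikit-learn's RandomForestRegressor, a placeholder feature matrix, and a simple top-20 selection; the update rule between the new alpha combination and the imitation-object alpha is not reproduced here.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    n_stocks = 3345
    X = rng.normal(size=(n_stocks, 3))   # per-stock fused new features (placeholder)
    y = rng.normal(size=n_stocks)        # per-stock target used for scoring (placeholder)

    # Step 22: score every stock with a random forest and keep the 20 highest-scoring
    # stocks as the new alpha combination.
    scores = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y).predict(X)
    new_alpha = np.argsort(scores)[-20:]

    # Step 23 (schematic): iterate until SapientState and Situation have both converged.
    def both_converged(sapient_history, situation_history, tol=1e-4):
        return (len(sapient_history) > 1 and len(situation_history) > 1
                and abs(sapient_history[-1] - sapient_history[-2]) < tol
                and abs(situation_history[-1] - situation_history[-2]) < tol)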
In this universal logical reasoning method for implementing an agent based on a breadth learning algorithm, the process of screening the self-assessment features includes:
NewFeature1 = XGBClassifier(input(fin[36], market[32], mac[11], news[17]), output[SapientState, UnTargeted, Targeted, AgentTarget])
where NewFeature1 is the self-assessment feature, XGBClassifier is the classification function, and SapientState, UnTargeted, Targeted, and AgentTarget are, respectively, the cumulative evolution result of step-by-step decision and reverse evaluation, the untargeted state value, the targeted state value, and the agent's initial target;
The process of screening the environment-assessment features includes:
NewFeature2 = LightGBMClassifier(input(fin[36], market[32], mac[11], news[17]), output[Situation, OutputLogic])
where NewFeature2 is the environment-assessment feature, LightGBMClassifier is the classification function, Situation is the situation, and OutputLogic is the output logic;
The process of screening the output-logic features includes:
NewFeature3 = GradientBoostingClassifier(input(fin[36], market[32], mac[11], news[17]), output[SapientState, Situation])
where NewFeature3 is the output-logic feature and GradientBoostingClassifier is the classification function;
The new features are screened out from the environment-assessment features, the output-logic features, and the self-assessment features:
CombinedFeature = LRClassifier(input[NewFeature1, NewFeature2, NewFeature3], output[AgentTarget])
where CombinedFeature is the new feature and LRClassifier is the logistic regression function.
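Read as ordinary supervised models, the screening above can be sketched with scikit-learn, XGBoost, and LightGBM as follows; the random data, the discretization of SapientState, Situation, and AgentTarget into binary labels, and the use of a single label per booster are simplifying assumptions, not part of the invention.
    import numpy as np
    from xgboost import XGBClassifier
    from lightgbm import LGBMClassifier
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 500
    env = rng.normal(size=(n, 36 + 32 + 11 + 17))      # fin[36] + market[32] + mac[11] + news[17]
    sapient_state = (rng.random(n) > 0.5).astype(int)  # discretized SapientState (assumed label)
    situation = (rng.random(n) > 0.5).astype(int)      # discretized Situation (assumed label)
    agent_target = (rng.random(n) > 0.5).astype(int)   # discretized AgentTarget (assumed label)

    # Self-assessment, environment-assessment and output-logic features: each booster maps
    # the raw environment data to one of the agent's internal indicators.
    new_feature_1 = XGBClassifier(n_estimators=50).fit(env, sapient_state).predict_proba(env)[:, 1]
    new_feature_2 = LGBMClassifier(n_estimators=50).fit(env, situation).predict_proba(env)[:, 1]
    new_feature_3 = GradientBoostingClassifier(n_estimators=50).fit(env, situation).predict_proba(env)[:, 1]

    # Logistic-regression fusion: the three screened features predict the agent's initial target.
    stacked = np.column_stack([new_feature_1, new_feature_2, new_feature_3])
    combined_feature = LogisticRegression().fit(stacked, agent_target).predict_proba(stacked)[:, 1]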
The present invention also proposes a universal logical reasoning system for implementing an agent based on a breadth learning algorithm, comprising:
Module 1, which acquires the various types of environmental data corresponding to the object, where each type of environmental data includes multidimensional data or indicators, and obtains the attributes of each type of environmental data through feature extraction;
Module 2, which establishes a logic layer for the attributes corresponding to each type of environmental data, so as to construct the logic reinforcement process and the reverse model, performs dynamic self-assessment, environment assessment, and logic assessment of the object's situation in the environment, and fuses the three assessment results into a new feature using a logistic regression algorithm;
Module 3, which constructs a new object from the new feature, establishes a causal relationship between the logic formed by the new object and the artificially set initial target, and evaluates that causal relationship and logic; once the causal relationship and the logic that forms it have been confirmed from the evaluation results, the new object that satisfies the causal relationship is output as the recommended result of the logical reasoning.
In this logical reasoning system for implementing an agent based on a breadth learning algorithm, module 1 includes:
Module 11, which acquires the artificially set initial target and the environmental data, the environmental data including financial indicators, market indicators, news indicators, and macro indicators, uses random environmental data as screening conditions to select multiple stocks that satisfy the conditions, and combines these stocks into an alpha index that serves as the imitation object;
Module 12, which merges the environmental data through an activation function to obtain the emotion index, adds the environmental data to the artificially set initial target to obtain the agent's primary concept state, divides the return of the imitation object by the agent's primary concept state through an activation function to obtain the trade-off logic, and divides the trade-off logic by the return of the imitation object through an activation function to obtain the agent's initial target;
Module 13, which, through activation functions, divides the agent's initial target by the difference between the return of the imitation object and the emotion index to obtain the targeted state value; takes the agent's initial target, the targeted state value, the agent's new concept state, and the unperceived states as input to obtain the untargeted state value; divides the moving stop-loss distance of the imitation object by the targeted state value to obtain the agent's new concept state; divides the untargeted state value by the targeted state value to obtain the cumulative evolution result of step-by-step decision and reverse evaluation; divides the agent's initial target by the trade-off logic through the relu function to obtain the output logic; and divides the output logic by the difference between the artificially set initial target and the actual return to obtain the situation;
Module 14, which collects the cumulative evolution result of step-by-step decision and reverse evaluation, the situation, and the agent's initial target as the attributes.
In this logical reasoning system for implementing an agent based on a breadth learning algorithm, the emotion index is calculated as:
MergeAllData = pd.merge(fin[36], market[32], mac[11], news[17])
EmotionIndex = tanh(weight*MergeAllData + bias)
where EmotionIndex is the emotion index, weight is the weight, fin[36], market[32], mac[11], and news[17] are the financial, market, macro, and news data respectively, bias is the bias, tanh is the activation function, and pd.merge() is the merge function.
The agent's primary concept state is calculated as:
idea = relu(weight*(EmotionIndex.reshape() + prospective) + bias)
where idea is the agent's primary concept state, prospective is the expected return, and relu is the activation function;
The trade-off logic is calculated as:
ar = sigmoid(weight*(reward/idea) + bias)
where ar is the trade-off logic, reward is the return of the imitation object in module 11, and sigmoid is the activation function;
AgentTarget = relu(weight*(ar/reward) + bias)
where AgentTarget is the agent's initial target, weight is the weight, ar is the trade-off logic, reward is the return of the alpha, bias is the bias, and relu is the activation function;
The targeted state value is:
Targeted = tanh(weight*(AgentTarget/(reward - EmotionIndex)) + bias)
where Targeted is the targeted state value and AgentTarget is the agent's initial target;
The untargeted state value is:
UnTargeted = sigmoid(weight*([AgentTarget, Targeted, NewIdea, un-recognized_n]) + bias)
where UnTargeted is the untargeted state value and un-recognized_n indicates that there are n kinds of unperceived data; this data is not part of the initial environmental data but lies outside it, and un-recognized_n > 1 proves that at least one kind of unperceived data has not entered the environmental data;
The agent's new concept state is obtained by:
NewIdea = relu(weight*(Moveloss/Targeted) + bias)
where NewIdea is the agent's new concept state and Moveloss is the moving stop-loss distance of the imitation object;
The cumulative evolution result of step-by-step decision and reverse evaluation is the ratio of the untargeted state value to the targeted state value:
SapientState = relu(weight*(UnTargeted/Targeted) + bias)
where SapientState is the cumulative evolution result of step-by-step decision and reverse evaluation, weight is the weight, UnTargeted is the untargeted state value, Targeted is the targeted state value, bias is the bias, and relu is the activation function;
Calculation of the output logic: divide the expected return by the trade-off logic through the relu function;
OutputLogic = relu(weight*(prospective/ar) + bias)
where OutputLogic is the output logic, prospective is the expected return, and ar is the trade-off logic;
The situation represents the highest level of understanding of the environment and is calculated as:
Situation = tanh(weight*(OutputLogic/(prospective - reward)) + bias)
where Situation represents the situation.
In this logical reasoning system for implementing an agent based on a breadth learning algorithm, module 2 includes:
Module 21, which, using the XGBoost algorithm with the cumulative evolution result of step-by-step decision and reverse evaluation, the untargeted state value, the targeted state value, and the agent's initial target as outputs and the environmental data as input, screens out features as the self-assessment features; using the LightGBM algorithm with the situation and the output logic as outputs and the environmental data as input, screens out the environment-assessment features; using the GradientBoosting algorithm with the cumulative evolution result of step-by-step decision and reverse evaluation and the situation as outputs and the environmental data as input, screens out the output-logic features; and using a logistic regression algorithm with the agent's initial target as output and the self-assessment features, environment-assessment features, and output-logic features as inputs, screens out the model-fused features as the new features;
Module 22, which scores every stock according to the new features using a random forest algorithm, takes the alpha index formed by the stocks with the highest scores, and replaces the imitation object of module 11 according to the update rule between the new alpha combination and the imitation-object alpha;
Module 23, which repeats modules 11 to 22 until the cumulative evolution result of step-by-step decision and reverse evaluation and the situation converge at the same time, then outputs the new imitation object, the combination of conditions that forms it, and the causal relationship between the imitation object and the logic.
In this logical reasoning system for implementing an agent based on a breadth learning algorithm, the process of screening the self-assessment features includes:
NewFeature1 = XGBClassifier(input(fin[36], market[32], mac[11], news[17]), output[SapientState, UnTargeted, Targeted, AgentTarget])
where NewFeature1 is the self-assessment feature, XGBClassifier is the classification function, and SapientState, UnTargeted, Targeted, and AgentTarget are, respectively, the cumulative evolution result of step-by-step decision and reverse evaluation, the untargeted state value, the targeted state value, and the agent's initial target;
The process of screening the environment-assessment features includes:
NewFeature2 = LightGBMClassifier(input(fin[36], market[32], mac[11], news[17]), output[Situation, OutputLogic])
where NewFeature2 is the environment-assessment feature, LightGBMClassifier is the classification function, Situation is the situation, and OutputLogic is the output logic;
The process of screening the output-logic features includes:
NewFeature3 = GradientBoostingClassifier(input(fin[36], market[32], mac[11], news[17]), output[SapientState, Situation])
where NewFeature3 is the output-logic feature and GradientBoostingClassifier is the classification function;
The new features are screened out from the environment-assessment features, the output-logic features, and the self-assessment features:
CombinedFeature = LRClassifier(input[NewFeature1, NewFeature2, NewFeature3], output[AgentTarget])
where CombinedFeature is the new feature and LRClassifier is the logistic regression function.
The technical effects of the present invention include:
1: A higher degree of automation
Feature engineering used to be done by hand. The reverse model in breadth learning removes the manual feature-engineering step: the machine algorithm updates and iterates automatically, performing feature engineering and continuously optimizing the features, which greatly speeds up modeling and model verification. In addition, since the present invention is applied to the securities market, the method also speeds up the discovery of market opportunities.
2: Better explanation of the dependencies between data
In the present invention, the logic reinforcement process establishes the logical relationships between data and functions, and the reverse model establishes the causal relationships between data and logic. These two relationships imitate the way people think, for example the cognition of the self, the cognition of the environment, and even the evaluation of self and environment; this way of simulating human thinking greatly helps people explore the interdependencies between things.
3: A lower barrier to using AI
Engineers doing AI development today have to write a great deal of code, much of it repetitive, and without experience in setting hyperparameters the model can easily overfit. With the breadth learning algorithm proposed by the present invention, one only has to give the machine algorithm an imitation object and set the desired target; the rest can be handed over to the agent. The invention therefore not only simplifies the complicated process and steps of manual data mining but also lowers the barrier to using AI, so that non-professionals can quickly design models that meet their own needs.
4: A higher degree of computational parallelism
Comparing Figure 1 and Figure 2 shows that the breadth learning network can be computed in parallel: it can run different neural network computations on different environmental data at the same time, and then, through the reverse model, keep iterating and optimizing in a targeted way to mine the relationships between the new features and the target. This network structure supports parallel computation, which greatly speeds up the calculation and reduces the number of iterations.
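As a hedged illustration of this parallelism (not the invention's actual implementation), the models for the four kinds of environmental data could be trained concurrently, for example with Python's concurrent.futures; the data, labels, and model choice below are placeholders.
    import numpy as np
    from concurrent.futures import ProcessPoolExecutor
    from sklearn.ensemble import GradientBoostingClassifier

    rng = np.random.default_rng(0)
    blocks = {name: rng.normal(size=(200, dim))              # placeholder samples per data type
              for name, dim in [("fin", 36), ("market", 32), ("mac", 11), ("news", 17)]}
    labels = (rng.random(200) > 0.5).astype(int)              # placeholder discretized target

    def fit_block(item):
        name, X, y = item
        return name, GradientBoostingClassifier(n_estimators=20).fit(X, y)

    if __name__ == "__main__":
        items = [(name, X, labels) for name, X in blocks.items()]
        with ProcessPoolExecutor() as pool:
            models = dict(pool.map(fit_block, items))         # one model per data type, fitted in parallel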
Brief description of the drawings
Figure 1 is a flow chart of the breadth learning used in the prior art;
Figure 2 is a flow chart of the breadth learning used in the present invention;
Figure 3 is a diagram of the alpha combination in an embodiment of the present invention;
Figure 4 is a comparison of results for an embodiment of the present invention;
Figure 5 shows how the cumulative evolution result of step-by-step decision and reverse evaluation changes with the number of iterations in an embodiment of the present invention;
Figure 6 shows how the situation changes with the number of iterations in an embodiment of the present invention;
Figure 7 shows the return as the number of iterations increases in an embodiment of the present invention;
Figure 8 is a working flow chart of the breadth learning of the present invention;
Figure 9 is a diagram of the relationship between data and functions;
Figure 10 is a flow chart of the logic reinforcement of the present invention;
Figure 11 is a diagram of the functional connection between variables x and y in the prior art;
Figure 12 is a diagram of the deep learning connection between variables x and y in the prior art;
Figure 13 is a diagram of the functional connection between variables x and y in the present invention;
Figure 14 is a diagram of the deep learning connection between variables x and y in the present invention;
Figure 15 is a flow chart of the reverse model of the present invention;
Figure 16 is an overall flow chart of the present invention;
Figure 17 is a network structure diagram of the reverse model of the present invention;
Figure 18 is a calculation flow chart of the reverse model of the present invention;
Figure 19 is a data flow diagram of how the new features form a new alpha combination in the present invention;
Figure 20 shows how the value of the agent's new concept state changes in another embodiment of the present invention;
Figure 21 shows how the return, the situation, and the cumulative evolution result of step-by-step decision and reverse evaluation change in another embodiment of the present invention;
Figure 22 shows how the maximum drawdown changes in another embodiment of the present invention.
Best mode for carrying out the invention
To make the above features and effects of the present invention clearer and easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
The implementation of the framework of the present invention proceeds as follows. First, the logic is generated and the logic reinforcement process is run: multiple multilayer neural networks are built from data of different dimensions (a breadth learning network composed of several deep learning networks), and the logical relationship between data and functions is established by connecting the outputs of n functions as the inputs of a logic layer, i.e. the relationship between the data on different dimensions and the logic (the present invention uses a classification algorithm to establish this relationship between data and logic; besides classification, the logical relationship could also be composed from other algorithms). Next, the quality of the logic is verified. The reverse model first performs feature engineering automatically to form new features; the new features are then used to generate a new imitation object, which is compared with the old imitation object. The purpose of the comparison is to check the gap between the result generated by the machine algorithm and the imitation object: the smaller the gap, the better the imitation. Because the artificially set initial target is included, the gap to check is the gap after adding that initial target, i.e. the gap between the result generated by the machine algorithm and the artificial initial target minus the imitation object. A shrinking gap shows that the machine algorithm is effective; a growing gap means either that the target is unsuitable or that the environmental data is insufficient, and this is mainly reflected in the indicator called the agent's new concept state. The imitation object generated from the new features is the result under the new logic, and finally it is verified whether there is a causal relationship between the result under this logic and the target. If there is, the logic fits reality; it is retained and iteration continues to generate new features, so that the machine algorithm keeps approaching the target the person expects through self-optimization. If there is not, iteration also continues to generate new logic and verify it further; if after many iterations there really is no causal relationship between the logic and the imitation object, the agent's new concept state is output, which indicates where the problem lies. In other words, the criterion for judging whether a logic is good is whether a causal relationship exists; in the present invention the causal relationship is whether there is a positive correlation between the logic and the result.
The detailed and complete process of the present invention is as follows.
First, a multi-level, multi-dimensional environment is constructed. In the present invention the multi-level data divides the securities-market data into four categories: macro indicators, financial indicators, market indicators, and news indicators, see Table 1 below:
Table 1 (reproduced as an image in the original publication; it lists the four categories of indicators named above).
Each level of data is further divided into data of multiple dimensions. The macro indicators have 11 dimensions with 10 years of data, denoted mac[11*120]; the financial indicators have 36 dimensions, four quarters a year for 10 years and 3345 stocks, denoted fin[36*40*3345]; the market indicators have 32 dimensions over 10 years and 3345 stocks, denoted market[32*2250*3345]; and the news indicators have 17 dimensions with 5 years of data, sampled every 3 days, for 3345 stocks, denoted news[17*600*3345]. See Table 2:
Table 2 (reproduced as an image in the original publication; it lists the dimensions and shapes of the four data blocks described above).
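As a hedged illustration of these shapes, the following sketch allocates placeholder arrays with NumPy; the sampling frequency of each axis and the exact layout are assumptions based on the counts in the text.
    import numpy as np

    N_STOCKS = 3345
    mac = np.zeros((11, 120))                # 11 macro indicators over 10 years (assumed monthly points)
    fin = np.zeros((36, 40, N_STOCKS))       # 36 financial indicators, 40 quarters, per stock
    market = np.zeros((32, 2250, N_STOCKS))  # 32 market indicators, ~2250 trading days, per stock
    news = np.zeros((17, 600, N_STOCKS))     # 17 news indicators, one point every 3 days over 5 years, per stock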
An object that the agent can imitate, an alpha combination, together with an artificially set initial target, is then constructed from this environment; the imitation object may be the target itself or an approximation of the desired target.
In this embodiment the imitation object is a randomly selected alpha combination, see Figure 3. It consists of 20 stocks, with a return of 0.0063 and a Sharpe ratio of 0.000089.
The artificially set initial target is then set to a return of 5% and a Sharpe ratio of 3, and the agent learns by itself and iterates, 10,000 iterations in total. At 3,000, 5,000, and 10,000 iterations the new alpha combinations recommended by the agent are output, denoted alpha-3, alpha-5, and alpha-10, made into indices, and compared on the same axes with the imitation-object alpha combination; the result, shown in Figure 4, indicates that the return of the combination rises gradually as the number of iterations increases. From the output data, the return and Sharpe ratio of these four combinations are also calculated, see Table 3 below:
Table 3 (reproduced as an image in the original publication; it gives the return and Sharpe ratio of the imitation alpha and of alpha-3, alpha-5, and alpha-10).
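As a minimal sketch of how the return and Sharpe ratio of such an alpha combination can be computed from daily portfolio returns, assuming NumPy, roughly 250 trading days, and a zero risk-free rate; the series below is a random placeholder, not the data behind Table 3.
    import numpy as np

    rng = np.random.default_rng(0)
    daily_returns = rng.normal(0.0003, 0.01, 250)   # placeholder daily returns of the 20-stock alpha

    total_return = np.prod(1 + daily_returns) - 1   # cumulative return over the period
    sharpe = daily_returns.mean() / daily_returns.std() * np.sqrt(250)  # annualized Sharpe ratio
    print(total_return, sharpe)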
It can be seen from Table 3 that as the number of iterations increases, both the return and the Sharpe ratio of the alpha rise steadily; although the artificial initial target is not reached, the machine algorithm is indeed moving step by step towards the target that was set. How does the agent learn and improve itself? The present invention first uses a deep learning neural network to establish the relationship between data and functions, see Figure 9. The general expression of this network is
y_hat = σ(weight*x + bias)
where y_hat is the predicted value, weight is the weight, x is the input, bias is the bias, and σ is the activation function (also called the excitation function). Three activation functions are used in the present invention, as follows:
Sigmoid function: f(x) = 1/(1 + e^(-x))
Tanh function: f(x) = (e^x - e^(-x))/(e^x + e^(-x))
Relu function: f(x) = max(0, x)
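As an illustrative NumPy rendering of this general expression (the input, weights, and bias below are placeholders, and tanh stands in for σ):
    import numpy as np

    x = np.array([0.2, -0.1, 0.4])        # input vector (placeholder)
    weight = np.array([0.5, 0.3, -0.2])   # weights (placeholder)
    bias = 0.1
    y_hat = np.tanh(weight @ x + bias)    # sigma is tanh here; sigmoid or relu could be substituted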
Breadth learning then builds its own logic and evaluation system. The process of building the logic is called the logic reinforcement process; the evaluation system is called here the anthropomorphic decision model. Logic reinforcement includes goal reinforcement and strategy reinforcement, and the anthropomorphic decision model has four parts: self-assessment, environment assessment, logic assessment, and the reverse model. Consider first the logic reinforcement process, shown in Figure 10. Put simply, logic reinforcement is the process of abstracting logic: forming new logic requires building a logic layer. A logic layer is built from functions, and the relationship between the variables and the independent variable is then explored through that layer, because when people explore the relationships between events they look for logical relationships rather than functional ones. To establish the agent's logical thinking, the present invention therefore first constructs its logic layer. How? In the prior art, shown in Figures 11 and 12, the variable x and the independent variable y are connected through a function or, with deep learning, through a neural network; the scheme of the present invention is shown in Figures 13 and 14. The difference is that the present invention builds a logic layer in between. The logic layer can be seen as a fusion of several functions: its inputs are the outputs of the individual functions, and its output corresponds to the independent variable y. Here y is the imitation object, but the imitation object is not just the return of the alpha; it also includes the artificially set initial target that enters the y value through the trade-off logic, i.e. the initial target value defined for the machine algorithm (the agent). For the logic functions of the logic layer, activation functions are used, because the value after an activation function lies either between -1 and 1 or between 0 and 1, so the activated value can be treated as the logic function of a given attribute. In the present invention two logic layers are built: the goal-reinforcement logic layer and the strategy-reinforcement logic layer. After 3,000, 5,000, and 10,000 iterations, the specific parameters are shown in Table 4 below:
Table 4 (reproduced as an image in the original publication)
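As a rough illustration of the logic-layer idea described above, the sketch below (all names and values are illustrative, not taken from the filing) passes the outputs of several ordinary functions of the environment data through an activation function, so that each activated value can be read as the logic value of an attribute:

import numpy as np

def logic_layer(function_outputs, weight=1.0, bias=0.0):
    # Fuse the outputs of the underlying functions; tanh keeps each
    # logic value in (-1, 1) so it can be read as an attribute's logic score
    return np.tanh(weight * np.asarray(function_outputs) + bias)

# Toy per-attribute functions of the raw environment data
x = np.array([0.2, -1.3, 0.7])
function_outputs = [x.sum(), x.mean(), x.max()]

logic_values = logic_layer(function_outputs)   # these feed the mapping to y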
Analyzing Table 4, we find that the two core indicators extracted from goal reinforcement and strategy reinforcement, namely the cumulative evolution result of step-by-step decision-making and reverse evaluation, and the situation, also improve gradually. Goal reinforcement and strategy reinforcement are each a process, and different indicators are produced in these processes: the output logic and the situation in Table 4 are produced by the strategy reinforcement process, while the agent's initial goal, the trade-off logic, the state value with a clear goal, the state value with an unclear goal and the cumulative evolution result of step-by-step decision-making and reverse evaluation are produced by the goal reinforcement process. We also output the scores of the cumulative evolution result of step-by-step decision-making and reverse evaluation and of the situation, shown in Figures 5 and 6. Although the situation declines at the beginning, it gradually stabilizes later, and when these values flatten out the alpha return also reaches its high point. Extracting these three values gives Table 5a below:
Table 5a (reproduced as an image in the original publication)
The cumulative evolution result of step-by-step decision-making and reverse evaluation represents the highest level of the machine algorithm's knowledge of itself, and the situation represents the highest level of the machine algorithm's knowledge of the environment. When these two values have both converged after continuous iteration, the return rate of the stock portfolio selected by the machine algorithm also reaches its highest point; see Figure 7. The figure shows more intuitively that, after 3,000, 5,000 and 10,000 iterations, the cumulative evolution result of step-by-step decision-making and reverse evaluation and the situation are positively correlated with the alpha return rate. We also output, after each iteration, the correlation coefficient matrix between the new alpha's return and the cumulative evolution result of step-by-step decision-making and reverse evaluation and the situation, shown in Table 5b below:
Table 5b (reproduced as an image in the original publication)
From the table we can confirm that the return rate is positively correlated with the cumulative evolution result of step-by-step decision-making and reverse evaluation and with the situation. (As a rule of thumb for the strength of a correlation coefficient, after taking the absolute value: 0-0.09 indicates no correlation, 0.1-0.3 weak correlation, 0.3-0.5 moderate correlation, and 0.5-1.0 strong correlation.)
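A correlation matrix of this kind can be computed, for example, with pandas; the column names and values below are purely illustrative:

import pandas as pd

# Hypothetical per-iteration records: new alpha return, SapientState, Situation
df = pd.DataFrame({
    "alpha_return":  [0.012, 0.018, 0.025, 0.031],
    "sapient_state": [0.40, 0.46, 0.52, 0.55],
    "situation":     [0.33, 0.39, 0.47, 0.50],
})

corr = df.corr()          # Pearson correlation coefficient matrix
print(corr.abs() >= 0.5)  # flag the strong correlations per the rule of thumb above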
At this point we have established a method by which a machine algorithm (an agent) recognizes causality by observing the correlation between an object and the environment (in this case alpha is the object, and to judge the relationship between them there must also be an "I"; the cumulative evolution result of step-by-step decision-making and reverse evaluation is the embodiment of this "I"). The method is a framework built with the breadth learning algorithm by simulating the brain's thinking process; it allows the agent (machine algorithm) to recognize causality through observation and thereby perform logical reasoning. Logical reasoning here means: if such a causal relationship exists, the logic by which the machine algorithm selects stocks is feasible; if it does not exist, the algorithm can infer that either the goal is unreasonable or the data are incomplete. In addition, this method explains the interdependence (causality) between data better than earlier deep learning methods, which cannot explain it. Deep learning cannot explain the interdependence between data because it only establishes functional relationships between them; as the depth increases, the parameters between the functions become so numerous that the relationships can no longer be expressed by a mathematical formula and therefore cannot be explained. Although the breadth learning we have built also uses deep learning methods, it additionally establishes logical relationships between functions and attributes, and causal relationships between logic and facts, so that the "agent" can understand causality by observing the correlation between the object and the environment. In fact, the more important role of this method is not merely to understand the causal relationship between object and fact, but to explore that causal relationship by modifying the logical relationships involved. For example, consider Table 6 below:
Table 6 (reproduced as an image in the original publication)
This table shows that, after continuous iteration, the initial alpha's stock-selection conditions have changed. Originally, two conditions were taken from each of the four data categories as stock-selection conditions; after 3,000, 5,000 and 10,000 iterations, the selection conditions within each data category have changed, as shown in Table 7 below:
Table 7 (reproduced as an image in the original publication)
The reason for the change is that the reverse model of the present invention can recombine features automatically.
Furthermore, if a stock-selection condition is changed, for example if a user considers the price-to-earnings ratio and the price-to-book ratio to be similar indicators and wants to replace the price-to-book ratio with total liabilities, the machine algorithm can automatically re-test, following the logic of the human intervention, whether a causal relationship exists under that logic. Because a person can intervene in the intermediate selection conditions, this is equivalent to the person merely modifying the logical relationships mined by the machine algorithm in order to test the causal relationship the person expects. In other words, a person only needs to tell the machine algorithm his or her logic, and the algorithm can automatically explore whether a causal relationship exists under that logic (the causal relationship here is the positive correlation shown in Figure 4). If such a causal relationship exists, the logic given by the person is appropriate; if not, the logic may be inappropriate, and the machine algorithm gives one of two prompts: either the causal relationship does not exist, i.e. the logic does not match the facts, or the goal initially set by the person is inappropriate. The prompted indicator is the agent's new concept state (an indicator within goal reinforcement). If this indicator grows larger and larger, it means the human-given logic may have no causal relationship in this batch of data and new environment data need to be introduced; if the agent's new concept state becomes smaller and smaller, it means the initially set goal is inappropriate. In other words, we have given a set of criteria for judging whether a logic is good, namely whether a causal relationship exists. In this case, whether the logic is good is judged by whether a positive correlation appears between the return and the cumulative evolution result of step-by-step decision-making and reverse evaluation and the situation: if there is a positive correlation, the logic given by the person is credible; otherwise one must check whether the goal is appropriate and whether the environment data are sufficient. To sum up, the significance of breadth learning is that it can automatically explore the causality of unknown data according to human logic, reducing the time and cost of exploring the truth; at the same time it changes all previous recommendation algorithms from passive recommendation (recommendation by the machine algorithm) to active recommendation (recommendation with human intervention).
To further illustrate the application of the breadth learning algorithm to logical reasoning, let us change the goal and see how the machine algorithm performs logical reasoning through causality. This time we artificially set a target return of 5% with a maximum drawdown of 5%, and we still start the iteration with the original alpha as the imitation object. After 10,000 iterations, we first look at how the value of the agent's new concept state changes. As shown in Figure 20, the value of the agent's new concept state fluctuates widely back and forth instead of settling into a narrow band, indicating that the pressure value is large and the goal is difficult to achieve. According to our tests, the value of the agent's new concept state is most effective when it fluctuates around 0.5: a value below 0.5 tending towards 0 usually indicates that the goal is designed unreasonably, while a value above 0.5 tending towards 1 indicates that the imitation object is given unreasonably, i.e. the environment data are insufficient. In addition, when the average fluctuation range of the agent's new concept state is within 0.2 the goal converges easily; beyond a fluctuation range of 0.3 it is generally difficult to approach the goal, and neither the situation nor the cumulative evolution result of step-by-step decision-making and reverse evaluation converges easily. The fluctuation range here is 0.2995. Looking next at the return rate, shown in Figure 21, it not only fails to converge but fluctuates considerably, and the situation and the cumulative evolution result of step-by-step decision-making and reverse evaluation are not proportional to the return. Figure 22 further shows that the alpha's maximum drawdown keeps fluctuating within a wide range and even shows signs of increasing. From all these signs we judge that the artificially set goal cannot be reached, at least not by alpha within the period we selected. The logical reasoning of the machine algorithm (agent) is as follows: first, judge whether a causal relationship exists; from the data above there is none, and the absence of a causal relationship means the agent cannot combine these conditional factors into an imitation object close to the human initial goal. Second, look at the value range of the agent's new concept state: a value below 0.5 tending towards 0 means the goal is difficult to achieve.
In summary, the indicator measuring the logic and the indicator evaluating the portfolio return rise and fall together (there is a positive correlation), which shows that there is a causal relationship between the logic as cause and the actual result as effect; the agent's new concept state is the inference model for deriving this causal relationship, and what it explores is the causal relationship between the conditional factors and the target variable.
The framework of breadth learning:
The framework comprises logic reinforcement and the anthropomorphic decision model. Logic reinforcement includes goal reinforcement and strategy reinforcement; the anthropomorphic decision model includes four parts: self-assessment, environment assessment, logic assessment and the reverse model. Figure 8 illustrates the overall workflow of the breadth learning framework:
First, the anthropomorphic decision model makes a judgment according to the situation of the environment, that is, it outputs a set of logic and observes the feedback (reward) given by the environment. If the feedback meets the human initial goal, the result is output; if the feedback falls short of the human initial goal, the output logic feeds the reward back to the anthropomorphic decision model in reverse, the decision model produces a new set of output logic based on the previous output logic and the environment feedback, and the new output logic is then verified in the environment again (strategy). If it meets expectations, the result is output; if not, the above process is repeated. If expectations can never be met, the breadth learning algorithm produces the agent's new concept state, which indicates either that the environment data cannot satisfy the goal or that the target expectation does not fit the actual environment.
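A highly simplified Python sketch of this loop is given below; every name (environment, decision_model and their methods) is a placeholder for the components described above, not an implementation from the filing:

def breadth_learning_loop(environment, decision_model, human_goal, max_iter=10_000):
    situation = environment.observe()
    for _ in range(max_iter):
        output_logic = decision_model.decide(situation)   # output a set of logic
        reward = environment.evaluate(output_logic)       # feedback from the environment
        if reward >= human_goal:                          # feedback meets the human initial goal
            return output_logic
        # otherwise feed the reward back and produce a new set of output logic
        decision_model.reverse_update(output_logic, reward)
        situation = environment.observe()
    # expectations were never met: surface the agent's new concept state as a prompt
    return decision_model.new_concept_state()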
The advantages of the breadth learning framework are the following six points.
1. Breadth learning achieves small-sample generalization:
In the present invention the agent is given only one alpha portfolio for imitation learning at the start; the agent can then disassemble the indicators that form the imitated alpha portfolio and, by iterating continuously towards the goal, generate multiple new alpha portfolios. It is as if a child is first given a Lego set shaped as a small car, and later disassembles the car and reassembles it into a new car or a toy of another shape. The alpha portfolio in the present invention corresponds to the toy car, and the modules that make up the car are the environment data that make up the alpha. The framework based on the breadth learning algorithm can achieve small-sample generalization, whereas previous deep learning needs big data to generalize.
2. The data-flow path of current deep learning is designed in advance, and the data must travel the entire path; in effect, people instill knowledge into the agent instead of letting the agent explore by itself. The data-flow path of the breadth learning used in the present invention is not designed manually; the machine itself automatically finds and "walks" the optimal path. In other words, breadth learning truly realizes the agent's self-exploration: it can evaluate the quality of a logic by exploring causality, or modify the logic to reach the desired result. This makes it very convenient for people to explore unknown causal relationships according to their own logic and greatly reduces the trial-and-error cost of exploring the truth.
3. Current deep learning optimizes the best parameters through iteration to perform image recognition, speech recognition or natural language understanding. Our breadth learning instead establishes the logical relationship between the data and the attributes of the object, then establishes the causal relationship between the object and the environment, evaluates the surrounding environment, explores the relationship between logic and causality through continuous iteration, and finally achieves the expected goal.
4. In past computer technology we explored the relationship between variables and independent variables, and the independent variable is usually an external, objective datum; but when people make decisions they often do not decide on the basis of one or a few objective data, but according to their own needs. In the present invention we simulate an agent by simulating the human way of thinking and let this agent have its own goal, which is not an external objective datum. That is, we do not use the artificially set goal (in the present invention the human initial goal is the return rate and the Sharpe ratio) as the agent's goal, nor do we test the agent's logic with artificially set logic. Although we set a human initial goal, the goal produced by the machine algorithm in the present invention is generated by the agent itself, and the agent can automatically explore ways of reaching the goal in accordance with human expectations. Therefore, in the present invention we must first establish the agent's goal, which we call the agent's initial goal, and then establish the goal reinforcement process that forms the cumulative evolution result of step-by-step decision-making and reverse evaluation, among other quantities.
5. The reverse model in the framework. As shown in Figure 15, the role of the reverse model is automated feature engineering, divided into four parts: feature recombination for self-assessment, feature recombination for environment assessment, feature recombination for output-logic assessment, and feature fusion. The reverse model ultimately forms a series of new features, and these features produce new logic. Because the logic has changed, the same program can output different tensor-flow graphs (a tensor-flow graph is the path traveled by the data), and different tensor-flow graphs represent different logics. In the present invention this means that, after passing through the reverse model, the previous stock-selection logic has changed and a new stock-selection strategy is produced; this strategy is logic generated by the agent itself rather than stock-selection strategies given by people under different conditions. Technically, different tensor-flow graphs represent different paths, different paths represent different selection criteria, and different selection criteria represent different selection logics; in the present invention the different tensor graphs are different stock-selection logics, which realizes the process of intelligent logic.
6. Traditional model fusion still looks for the relationship between x and y, whereas the model fusion of the reverse model not only looks for the relationship between x and y but, more importantly, looks for a causal relationship. The causality referred to in the present invention is a positive correlation: the cause is the various logics derived by the machine algorithm, and the effect is our target return. In other words, breadth learning explores the causal relationship between logic and result. If such a causal correspondence exists, the logic is retained; if not, it is discarded, new logic formed from other features is mined through further iteration, and the comparison with the result is made again, looping until the logic and the result become positively correlated. In the present invention, the machine algorithm's knowledge of all external environment data is finally distilled into one value called the situation; the situation is the highest level of the machine algorithm's knowledge of the environment, and its convergence means that knowledge of the environment has reached a high level. Likewise, we distill the agent's knowledge of itself into one value called the cumulative evolution result of step-by-step decision-making and reverse evaluation, which is the highest level of the machine algorithm's knowledge of the "self"; its convergence means that knowledge of the "self" has reached a certain level. When the cumulative evolution result of step-by-step decision-making and reverse evaluation and the situation converge at the same time, the machine has a clear understanding of both itself and the environment, and in that case it outputs its stock selection, which is the set of stocks the agent has chosen according to its own logic. We pick the top 20 to form the new alpha portfolio.
Note:
The human initial goal in the anthropomorphic decision model is not the agent's goal but the goal a person expects the agent to reach; the agent's own goal is what we call the agent's initial goal in the present invention. The agent's initial goal carries the agent's own subjectivity, because when people make trade-offs they are often subjective and do not choose entirely according to the objective world. So what we actually construct here is an agent goal formed from the emotion index and the imitation object, and this is the agent's initial goal. In addition, the trade-off logic in the anthropomorphic decision model is not fixed: after the agent recombines features, the input to the emotion index becomes the new feature combination produced by model fusion.
Algorithm description of the technical solution of the present invention.
First, a multi-level, multi-dimensional environment is constructed, and through this environment an object the agent can imitate is constructed: the alpha portfolio. The imitated alpha is a portfolio given randomly by a person, together with the generated alpha return graph (see Figure 3). At the beginning the agent sees only an alpha return curve, and it now needs to recombine an alpha portfolio by itself, so it will create new "building-block" combinations based on the environment data and its "self" logic. The environment data include macro indicators, financial indicators, market indicators and news indicators. For example, we randomly pick a few screening conditions: stock price below 15 yuan, price-to-earnings ratio below 20, industries including finance and e-commerce, and a public-opinion index consistently above 100 for the last three months. The top 20 stocks are finally selected to form a portfolio (the imitation object); as the alpha portfolio shown in Figure 3, its return over the past three months is 0.0063 and its Sharpe ratio is 0.000089.
Then a goal is set artificially as the human initial goal, for example a return of 5% and a Sharpe ratio of 3. All environment data are provided to the program, and alpha is provided to the program as the object of imitation learning. As shown in Figure 16, the program then runs according to the following flow, which consists of two major steps and one large loop.
The first step implements the logic reinforcement process and the second step implements the reverse evaluation process. The new features screened by the reverse model are then fed back into the logic reinforcement of the first step as the input of the emotion index, replacing the previous external feature data; at the same time the new features are used to screen stocks to form a new alpha portfolio, which is compared with the old alpha, and the first and second steps are repeated.
Logic reinforcement process:
First read in the external data (including financial data, market data, news data and macro data), then progressively generate the logic layers of the corresponding attributes by adding hidden-layer functions and activation functions.
Calculation of the emotion index: the external environment data are treated as the function of the emotion index.
MergeAllData = pd.merge(fin[36], market[32], mac[11], news[17])
EmotionIndex = tanh(weight * MergeAllData + bias)
EmotionIndex is the emotion index, weight is the weight, MergeAllData is all the merged data, fin[36], market[32], mac[11] and news[17] are the financial data, market data, macro data and news data respectively, bias is the bias, tanh is the activation function, and pd.merge() is the merge function.
The agent's primary concept state: the agent's primary concept state represents an expectation above reality, so it is obtained by adding the environment data and the goal; the agent's primary concept state is an expectation grounded in reality.
idea = relu(weight * (EmotionIndex.reshape() + prospective) + bias)
idea is the agent's primary concept state, EmotionIndex is the emotion index, reshape() is the dimensionality-reduction function, prospective is the expected return, weight is the weight, bias is the bias, and relu is the activation function.
Trade-off logic: a value of 1 means discard and 0 means keep. Dividing the return by the agent's primary concept state represents the distance between reality and the ideal.
ar = sigmoid(weight * (reward / idea) + bias)
ar is the trade-off logic, weight is the weight, reward is the alpha return rate, idea is the agent's primary concept state, bias is the bias, and sigmoid is the activation function.
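A minimal NumPy sketch of these three quantities is given below, assuming the merged environment data are already available as a single vector (the merge itself would use pandas as above); the mean is used here as a simple stand-in for the dimensionality reduction done by reshape(), and all numbers are toy values:

import numpy as np

def tanh(x): return np.tanh(x)
def relu(x): return np.maximum(0.0, x)
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))

merged = np.array([0.3, -0.8, 1.2, 0.05])   # stand-in for MergeAllData
weight, bias = 0.5, 0.01
prospective = 0.05                           # expected return (5%)
reward = 0.0063                              # return of the imitated alpha

emotion_index = tanh(weight * merged + bias)                        # EmotionIndex
idea = relu(weight * (emotion_index.mean() + prospective) + bias)   # primary concept state
ar = sigmoid(weight * (reward / idea) + bias)                       # trade-off logic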
Calculating the four functions of the goal reinforcement layer:
Calculation of the agent's initial goal: through the relu function, 20 stocks are selected from the four categories of external data according to the screening conditions to form a portfolio, and the portfolio's return rate and Sharpe ratio are calculated; the output value of the trade-off logic is then divided by the alpha return rate to obtain the agent's initial goal:
AgentTarget = relu(weight * (ar / reward) + bias)
AgentTarget is the agent's initial goal, weight is the weight, ar is the trade-off logic, reward is the alpha return rate, bias is the bias, and relu is the activation function.
Calculation of the state value with a clear goal: the state value with a clear goal can be understood as the distance between the agent's goal in the machine algorithm and reality. If the distance is too large, the state value with a clear goal is invalid; only when the agent's goal is relatively close to reality can it be counted as a valid state value with a clear goal. After the tanh activation function, one only needs to judge whether the state value with a clear goal is greater than 0: a value greater than 0 represents a valid state value with a clear goal, and a value less than 0 an invalid one. An invalid state value with a clear goal is not meaningless; state values with an unclear goal can form common sense, but this case does not involve the common-sense part, and the value of the state value with a clear goal is only recorded for the later calculation of the cumulative evolution result of step-by-step decision-making and reverse evaluation.
Targeted = tanh(weight * (AgentTarget / (reward - EmotionIndex)) + bias)
Targeted is the state value with a clear goal, weight is the weight, AgentTarget is the agent's initial goal, reward is the alpha return rate, EmotionIndex is the emotion index, bias is the bias, and tanh is the activation function.
Calculation of the state value with an unclear goal: the state value with an unclear goal comes from three parts: partly from valid conversions of the state value with a clear goal, partly from invalid conversions of the state value with a clear goal, and to a larger extent from the unperceived state. People usually try consciously to experience things they have never experienced, but afterwards, if nothing serious happens, the experience is more or less converted into a part of the unconscious. Consciousness is similar to the clear-goal state defined in this case, and the unconscious is similar to the state value with an unclear goal; what the unconscious ultimately forms is common sense, or one may say that what the unconscious stores is common sense. Human logical reasoning comes from the common sense formed unconsciously by the brain plus conscious judgment, that is, reasoning is made on the basis of common sense. In this case the state value with an unclear goal and the state value with a clear goal simulate human consciousness and unconsciousness, and logical reasoning is performed on that basis.
UnTargeted = sigmoid(weight * [AgentTarget, Targeted, NewIdea, un-recognized_n] + bias)
UnTargeted is the state value with an unclear goal, weight is the weight, AgentTarget is the agent's initial goal, Targeted is the state value with a clear goal, NewIdea is the agent's new concept state, bias is the bias, sigmoid is the activation function, and un-recognized_n represents unknown data composed of the n-dimensional unperceived state.
The agent's new concept state: the goal of the agent's new concept state is to test counterfactuals (a counterfactual is a re-characterization that negates facts that have already occurred). The smaller the value of the agent's new concept state, the closer the imitation object is to the agent's initial goal; conversely, the larger the value, the further apart they are. The agent's new concept state can be regarded as an inference model that explores the causal relationship between the conditional factors and the target variable. Its value becoming larger or smaller implies two possibilities: first, the goal is hard to reach, i.e. the goal setting is not necessarily appropriate (the first counterfactual), in which case the value of the agent's new concept state tends towards 0; second, the imitation object is badly set (the second counterfactual), in which case the value tends towards positive 1. If the value of the agent's new concept state keeps growing, it means that unknown data have not been introduced into the environment data, and the introduction of new external environment data should be considered. In our tests, when the value of NewIdea exceeds 0.9 the value of the situation hardly converges any more; in that case we set un-recognized_n = un-recognized_n + 1, prompting the need to introduce new external data into the environment data.
NewIdea = relu(weight * (Moveloss / Targeted) + bias)
NewIdea is the agent's new concept state, weight is the weight, Moveloss is the trailing stop-loss distance of the alpha portfolio (a trailing stop-loss follows the latest price with a stop set a fixed number of points away; the trailing stop-loss distance is the difference between the average trailing stops of the long and short sides), Targeted is the state value with a clear goal, relu is the activation function, and bias is the bias.
moveloss = tf.reduce_mean(tf.reduce_sum(tf.square(moveloss_sell - moveloss_buy)))
tf.reduce_mean is the mean function, tf.reduce_sum is the sum function, tf.square is the squaring function, moveloss_sell is the cumulative stop-loss level of sell orders up to today, and moveloss_buy is the cumulative stop-loss level of buy orders up to today. They are calculated as follows:
If the day's opening price is greater than or equal to the closing price:
moveloss_buy = close_yesterday - firstloss
If the day's opening price is less than the closing price:
moveloss_buy = close_today - (close_today - close_yesterday) * 0.618 + firstloss
where close_today is today's closing price, close_yesterday is yesterday's closing price, and firstloss is the initial stop loss, which we set to 0.0012.
If the day's opening price is less than or equal to the closing price:
moveloss_sell = close_yesterday + firstloss
If the day's opening price is greater than the closing price:
moveloss_sell = close_today + (close_today - close_yesterday) * 0.618 + firstloss
where close_today is today's closing price, close_yesterday is yesterday's closing price, and firstloss is the initial stop loss, which we set to 0.0012.
Note: close_today and close_yesterday are the closing prices of the alpha portfolio index; see the appendix for the calculation of the alpha portfolio index.
Moveloss represents the size of the stop-loss distance between the long and short sides: the larger the distance, the smaller the pressure, and the smaller the distance, the greater the pressure. Here, too, we are simulating the human situation: under pressure people often come up with all kinds of ingenious ideas. In this case we are dealing with the securities market, so we treat the pressure on a certain portfolio approximately as a person's pressure, just as someone whose stocks fall will also feel pressure of varying magnitude. For a more general agent, the new concept state algorithm can be designed in a targeted way; in principle it is designed on the idea that pressure produces thought, similar to "wisdom born of urgency", that is, the corresponding pressure value can be taken as the value of the agent's new concept state.
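A minimal sketch of the trailing stop-loss computation described above, using NumPy in place of the TensorFlow reductions (variable names follow the text; the price series are toy values):

import numpy as np

FIRST_LOSS = 0.0012  # initial stop loss

def moveloss_buy(open_today, close_today, close_yesterday):
    # Trailing stop for buy orders, per the rules above
    if open_today >= close_today:
        return close_yesterday - FIRST_LOSS
    return close_today - (close_today - close_yesterday) * 0.618 + FIRST_LOSS

def moveloss_sell(open_today, close_today, close_yesterday):
    # Trailing stop for sell orders, per the rules above
    if open_today <= close_today:
        return close_yesterday + FIRST_LOSS
    return close_today + (close_today - close_yesterday) * 0.618 + FIRST_LOSS

def moveloss(opens, closes):
    # Moving stop-loss distance: mean of the summed squared differences
    # between the sell-side and buy-side stops, mirroring the formula above
    buys = np.array([moveloss_buy(o, c, cy)
                     for o, c, cy in zip(opens[1:], closes[1:], closes[:-1])])
    sells = np.array([moveloss_sell(o, c, cy)
                      for o, c, cy in zip(opens[1:], closes[1:], closes[:-1])])
    return np.mean(np.sum(np.square(sells - buys)))

opens = np.array([10.0, 10.2, 9.9, 10.5])    # toy index opening prices
closes = np.array([10.1, 10.0, 10.3, 10.4])  # toy index closing prices
distance = moveloss(opens, closes)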
Calculation of the cumulative evolution result of step-by-step decision-making and reverse evaluation: this value represents the intelligence of the agent. It is the ratio of the state value with an unclear goal to the state value with a clear goal; the closer the state value with a clear goal is to the state value with an unclear goal, the more intelligent the cumulative evolution result of step-by-step decision-making and reverse evaluation.
SapientState = relu(weight * (UnTargeted / Targeted) + bias)
SapientState is the cumulative evolution result of step-by-step decision-making and reverse evaluation, weight is the weight, UnTargeted is the state value with an unclear goal, Targeted is the state value with a clear goal, bias is the bias, and relu is the activation function;
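Continuing the earlier sketch, the goal-reinforcement quantities can be strung together as follows. Everything here is a toy illustration: the values are carried over from the previous sketches, un_recognized is a placeholder for the unperceived-state data, and the unspecified combination of the four inputs to UnTargeted is approximated by a simple sum:

import numpy as np

def relu(x): return np.maximum(0.0, x)
def tanh(x): return np.tanh(x)
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))

weight, bias = 0.5, 0.01
reward, emotion_index, ar = 0.0063, 0.09, 0.51   # alpha return, reduced EmotionIndex, trade-off logic
moveloss_value = 0.004                            # from the stop-loss sketch above
un_recognized = 0.0                               # unperceived-state placeholder

agent_target = relu(weight * (ar / reward) + bias)                           # AgentTarget
targeted = tanh(weight * (agent_target / (reward - emotion_index)) + bias)   # Targeted
new_idea = relu(weight * (moveloss_value / targeted) + bias)                 # NewIdea
untargeted = sigmoid(weight * np.sum([agent_target, targeted,
                                      new_idea, un_recognized]) + bias)      # UnTargeted
sapient_state = relu(weight * (untargeted / targeted) + bias)                # SapientState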
Calculating the two functions of strategy reinforcement:
Calculation of the output logic: through the relu function, divide the agent's initial goal by the trade-off logic;
OutputLogic = relu(weight * (prospective / ar) + bias)
OutputLogic is the output logic, prospective is the expected return, weight is the weight, ar is the trade-off logic, bias is the bias, and relu is the activation function;
Calculation of the situation: the situation represents the highest level of knowledge of the environment.
Situation = tanh(weight * (OutputLogic / (prospective - reward)) + bias)
Situation is the situation, weight is the weight, OutputLogic is the output logic, prospective is the expected return, reward is the alpha return rate, bias is the bias, and tanh is the activation function.
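A continuation of the same toy sketch for the two strategy-reinforcement quantities (values carried over from above; purely illustrative):

import numpy as np

def relu(x): return np.maximum(0.0, x)
def tanh(x): return np.tanh(x)

weight, bias = 0.5, 0.01
prospective, reward, ar = 0.05, 0.0063, 0.51   # expected return, alpha return, trade-off logic

output_logic = relu(weight * (prospective / ar) + bias)                     # OutputLogic
situation = tanh(weight * (output_logic / (prospective - reward)) + bias)   # Situation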
In this case the strategy in the securities market is simply buying and selling stocks, so the strategy here is the price on the day a stock is bought or sold; in this case the calculation uniformly uses the closing price (close) on the day of buying or selling.
The reverse model:
The role of the reverse model is automated feature engineering.
As shown in Figures 17 and 18, the reverse model is divided into four parts: feature extraction for self-assessment, feature extraction for environment assessment, feature extraction for output-logic assessment, and feature recombination.
In the first step, feature extraction for the agent's self-assessment, the XGBoost algorithm is used with the four values of goal reinforcement as the output and the values of the external environment as the input; the features screened out are the self-assessment features:
NewFeature1 = XGBClassifier(input(fin[36], market[32], mac[11], news[17]), output[SapientState, UnTargeted, Targeted, AgentTarget])
NewFeature1 is the self-assessment feature set, XGBClassifier belongs to the XGBoost algorithm and is a classification function, fin[36], market[32], mac[11] and news[17] are the financial data (indicators), market data, macro data and news data respectively, and SapientState, UnTargeted, Targeted and AgentTarget are the cumulative evolution result of step-by-step decision-making and reverse evaluation, the state value with an unclear goal, the state value with a clear goal, and the agent's initial goal. The expression above treats the four self-assessment values as four criteria, and the screening selects from the original data the features that meet these four criteria.
In the second step, feature extraction for environment assessment, the LightGBM algorithm is used with the two values of strategy reinforcement as the output and the values of the external environment as the input, screening out the features that conform to the environment assessment logic:
NewFeature2 = LightGBMClassifier(input(fin[36], market[32], mac[11], news[17]), output[Situation, OutputLogic])
NewFeature2 is the feature set conforming to the environment assessment logic layer, LightGBMClassifier is a classification function, fin[36], market[32], mac[11] and news[17] are the financial data, market data, macro data and news data respectively, Situation is the situation, and OutputLogic is the output logic.
In the third step, feature extraction for output-logic assessment, the GradientBoosting algorithm is used with the two values from goal reinforcement and strategy reinforcement as the output and the values of the external environment as the input, screening out the output-logic features:
NewFeature3 = GradientBoostingClassifier(input(fin[36], market[32], mac[11], news[17]), output[SapientState, Situation])
NewFeature3 is the output-logic feature set, GradientBoostingClassifier is a classification function, fin[36], market[32], mac[11] and news[17] are the financial data, market data, macro data and news data respectively, Situation is the situation, and SapientState is the cumulative evolution result of step-by-step decision-making and reverse evaluation.
The fourth step, feature recombination:
A logistic regression algorithm is used with the agent's initial goal value as the output and the feature values screened in the first, second and third steps as the input, screening out the new feature set CombinedFeature:
CombinedFeature = LRClassifier(input[NewFeature1, NewFeature2, NewFeature3], output[AgentTarget])
LRClassifier is the logistic regression function, NewFeature1, NewFeature2 and NewFeature3 are the new feature groups selected in the first three steps, and AgentTarget is the agent's initial goal value calculated previously.
The stacking method can also be used for model fusion:
StackingClassifier = (classifiers=[LightGBM, xgboost, GradientBoosting], meta_classifier=lr), where meta_classifier=lr means a logistic regression algorithm is used to fuse the features selected by the LightGBM, xgboost and GradientBoosting algorithms.
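A hedged sketch of this reverse-model pipeline is given below using xgboost, lightgbm, scikit-learn and mlxtend. It is an approximation, not the filing's implementation: the multi-value goal-reinforcement and strategy-reinforcement outputs are replaced by binarized single labels, the data are random placeholders, and feature screening is done with SelectFromModel:

import numpy as np
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel
from mlxtend.classifier import StackingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 96))                   # stand-in for fin[36]+market[32]+mac[11]+news[17]
y_self = (rng.random(200) > 0.5).astype(int)     # binarized goal-reinforcement label (assumption)
y_env = (rng.random(200) > 0.5).astype(int)      # binarized strategy-reinforcement label (assumption)
y_target = (rng.random(200) > 0.5).astype(int)   # binarized AgentTarget label (assumption)

# Steps 1-3: screen features against the self, environment and output-logic criteria
f1 = SelectFromModel(XGBClassifier(n_estimators=50)).fit(X, y_self)
f2 = SelectFromModel(LGBMClassifier(n_estimators=50)).fit(X, y_env)
f3 = SelectFromModel(GradientBoostingClassifier(n_estimators=50)).fit(X, y_self)

new_features = np.hstack([f.transform(X) for f in (f1, f2, f3)])

# Step 4: fuse the screened features with logistic regression as the meta-learner
stack = StackingClassifier(
    classifiers=[XGBClassifier(n_estimators=50),
                 LGBMClassifier(n_estimators=50),
                 GradientBoostingClassifier(n_estimators=50)],
    meta_classifier=LogisticRegression())
stack.fit(new_features, y_target)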
Five: how the new features form a new alpha portfolio.
First see the data flow in Figure 19; the process has three steps:
The first step uses the random forest algorithm to score each stock and then picks the top n stocks with the highest scores; in this case n = 20.
The steps are as follows: we take the new features CombinedFeature selected after 10,000 iterations, shown in Table 8 below:
Table 8 (reproduced as an image in the original publication)
First a target variable GB is designed. Following the idea of value investing, the market index's rise or fall is first removed from each stock's rise or fall, and the 5% return we artificially set at the beginning is then added.
When the market falls:
countday := count(date > 20170407, 0);
change := (close - ref(close, countday)) / ref(close, countday);
GB := change + 0.045 + 0.05;
When the market rises:
countday := count(date > 20170407, 0);
change := (close - ref(close, countday)) / ref(close, countday);
if (change > 0 and change < 0.045)
then GB := 0.045 - change + 0.05;
else GB := change - 0.045 + 0.05;
where countday is the number of days in the statistics period, change is the remaining rise or fall after removing the market's rise or fall, count() is a function that counts the days meeting the condition, ref() is a function that extracts the closing price from countday days ago to now, and if/then/else is a conditional construct. 0.045 is the rise or fall of the Shanghai Composite Index from 20170407 to 20170614, 0.05 is the expected return, and GB is the target variable. After the calculation, we sort the stocks by GB from largest to smallest, as in the following table (only the top 20 are listed):
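The same target-variable construction can be sketched in Python, assuming a closing-price series per stock over the window and the benchmark rise/fall of 0.045 (names and toy prices are illustrative):

import pandas as pd

MARKET_CHANGE = 0.045    # rise/fall of the Shanghai Composite over the window
EXPECTED_RETURN = 0.05   # the artificially set 5% return

def target_gb(close: pd.Series, market_rising: bool) -> float:
    # change: the stock's rise/fall over the window, before removing the market move
    change = (close.iloc[-1] - close.iloc[0]) / close.iloc[0]
    if not market_rising:
        return change + MARKET_CHANGE + EXPECTED_RETURN
    if 0 < change < MARKET_CHANGE:
        return MARKET_CHANGE - change + EXPECTED_RETURN
    return change - MARKET_CHANGE + EXPECTED_RETURN

prices = pd.Series([10.0, 10.4, 10.9, 11.2])   # toy closing prices within the window
gb = target_gb(prices, market_rising=True)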
Table 9 (reproduced as an image in the original publication)
The random forest algorithm is then used to give each stock a score, and the top n stocks are selected to form the new alpha portfolio.
score = model_selection.cross_val_score(RandomForestClassifier, input[CombinedFeature], output[GB])
score is the score of each stock, model_selection.cross_val_score is the function that computes the score, RandomForestClassifier is the random forest algorithm, and GB is the target variable. The calculation gives Table 10 below:
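A runnable approximation of this scoring step with scikit-learn is sketched below. The data are placeholders, GB is binarized into a high/low label, and "per-stock score" is interpreted here as the predicted probability of the high-GB class; these are assumptions, not details from the filing:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))              # CombinedFeature rows, one per stock
y = (rng.random(300) > 0.5).astype(int)     # GB binarized into high/low (assumption)

rf = RandomForestClassifier(n_estimators=200, random_state=0)

# Cross-validated quality of the feature set as a predictor of GB
cv_scores = cross_val_score(rf, X, y, cv=5)

# Per-stock score: probability of belonging to the high-GB class (one interpretation)
rf.fit(X, y)
stock_scores = rf.predict_proba(X)[:, 1]
top20 = np.argsort(stock_scores)[::-1][:20]  # indices of the top-20 stocks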
Table 10 (reproduced as an image in the original publication)
In this embodiment we select the top 20 stocks as the constituents of the new alpha portfolio.
The second step: weight allocation.
The function BL_asset_allocation is used to calculate the weight of each stock.
weight = BL_asset_allocation(df, 0.05, p, q, optim_setting)
weight is the weight, BL_asset_allocation is the function that calculates the weights, df is the array data of the stocks, 0.05 is the previously set expected return, p is the matrix data of the n stocks, q is the return rate of each of the n stocks, and optim_setting is the risk measure, which we set here to 3, i.e. the Sharpe ratio of 3 we set artificially at the beginning. The calculation gives Table 11 below.
Table 11 (reproduced as an image in the original publication)
The third step: converting the new alpha portfolio into an alpha index:
Today's index = (today's total market value / base-period total market value) × 100
The total market value is the sum, over all stocks, of the day's closing price multiplied by the number of shares outstanding and by the weight; see the formula below.
The base-period total market value is the sum, over all stocks, of the closing price on the day of purchase multiplied by the number of shares outstanding and by the weight; see the formula below.
Formula: total market value = Σ (i = 1..n) close_i × circulation_i × weight_i
where close is the closing price, circulation is the number of shares outstanding, and weight is the weight; in this case n = 20.
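A small sketch of the index computation under the formula above (the price, circulation and weight arrays are toy values):

import numpy as np

close_today = np.array([11.2, 25.4, 8.7])   # today's closing prices
close_base = np.array([10.0, 24.0, 9.1])    # closing prices on the day of purchase
circulation = np.array([1e8, 5e7, 2e8])     # shares outstanding
weights = np.array([0.4, 0.35, 0.25])       # portfolio weights

market_value_today = np.sum(close_today * circulation * weights)
market_value_base = np.sum(close_base * circulation * weights)

alpha_index = market_value_today / market_value_base * 100   # today's index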
Update rule between the new alpha portfolio and the imitated alpha:
If the new alpha's return rate is higher than that of the original imitated alpha, the new alpha replaces the original and becomes the new imitation object; if the new alpha's return rate is lower than or equal to that of the original imitated alpha, the original alpha remains the imitation object.
The anthropomorphic decision model can be applied very widely: as long as a goal is set and an imitation object is provided, it can automatically learn the imitation object, approach the given goal step by step, and remind the user of the directions in which the goal still needs to be pursued. At present we apply it only in the field of financial investment; our next step is to extend it to other fields.
The following are system embodiments corresponding to the method embodiments above; this embodiment can be implemented in cooperation with the embodiments above. The relevant technical details mentioned in the embodiments above remain valid in this embodiment and, to reduce repetition, are not repeated here; correspondingly, the relevant technical details mentioned in this embodiment can also be applied to the embodiments above.
The present invention also proposes a universal logical reasoning system for implementing an agent based on the wide learning algorithm, which includes:
模块1、获取对象对应的各类环境数据,其中每类环境数据包括多维数据或指标,通过特征提取得到每类环境数据的属性; Module 1. Obtain various types of environmental data corresponding to the object, where each type of environmental data includes multi-dimensional data or indicators, and obtain the attributes of each type of environmental data through feature extraction;
Module 2: Establish a logical layer for the corresponding attributes of each type of environment data so as to construct a logic-reinforcement process and a reverse model; perform dynamic self-assessment, environment assessment and logic assessment of the object's situation in the environment, and fuse the three assessment results into new features using a logistic regression algorithm;
Module 3: Construct a new object from the new features; establish a causal relationship between the logic formed by the new object and the manually set initial goal; evaluate that causal relationship and logic; and, once the causal relationship and the logic forming it have been confirmed from the evaluation result, recommend and output the new object that satisfies the causal relationship as the logical reasoning result.
In this logical reasoning system for implementing an agent based on the wide learning algorithm, module 1 includes:
Module 11: Obtain the manually set initial goal and the environment data, the environment data including financial indicators, market indicators, news indicators and macro indicators; use randomly chosen environment data as screening conditions, screen out the stocks that satisfy those conditions, and assemble them into an alpha index that serves as the imitation object (an illustrative screening sketch is given after module 14);
Module 12: Merge the environment data through an incentive function to obtain the emotion index; add the environment data to the manually set initial goal to obtain the agent's primary concept state; divide the imitation object's return rate by the agent's primary concept state through an incentive function to obtain the trade-off logic; and divide the trade-off logic by the imitation object's return rate through an incentive function to obtain the agent's initial target;
Module 13: Divide the agent's initial target by the difference between the imitation object's return rate and the emotion index through an incentive function to obtain the targeted state value; feed the agent's initial target, the targeted state value, the agent's new concept state and the unperceived state into an incentive function to obtain the untargeted state value; divide the imitation object's moving stop-loss distance by the targeted state value through an incentive function to obtain the agent's new concept state; divide the untargeted state value by the targeted state value through an incentive function as the cumulative evolution result of step-by-step decision and reverse evaluation; divide the agent's initial target by the trade-off logic through the relu function to obtain the output logic; and divide the output logic by the difference between the manually set initial goal and the actual return through an incentive function to obtain the situation;
Module 14: Collect the cumulative evolution result of step-by-step decision and reverse evaluation, the situation and the agent's initial target as the attributes.
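As referenced in module 11 above, the screening step can be sketched in a few lines of pandas; the median threshold, the random choice of indicator columns and the selection of 20 stocks are illustrative assumptions, not values taken from the patent.

```python
import numpy as np
import pandas as pd

def screen_stocks(stock_pool: pd.DataFrame, n_conditions: int = 3, top_k: int = 20, seed: int = 0):
    """Pick random indicator columns as screening conditions and keep the stocks
    whose values exceed the cross-sectional median for every picked indicator;
    the surviving stocks would form the alpha index used as the imitation object."""
    rng = np.random.default_rng(seed)
    conditions = list(rng.choice(stock_pool.columns.to_numpy(), size=n_conditions, replace=False))
    mask = np.ones(len(stock_pool), dtype=bool)
    for col in conditions:
        mask &= (stock_pool[col] > stock_pool[col].median()).to_numpy()
    return stock_pool[mask].head(top_k), conditions
```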
In this logical reasoning system for implementing an agent based on the wide learning algorithm, the emotion index is calculated as:
MergeAllData=pd.merge(fin[36],market[32],mac[11],news[17])
EmotionIndex=tanh(weight*MergeAllData+bias)
EmotionIndex is the emotion index, weight represents the weight, fin[36], market[32], mac[11] and news[17] are the financial data, market data, macro data and news data respectively, bias is the bias, tanh is the incentive function, and pd.merge() is the merge function.
该智能体的初级概念状态的计算:Calculation of the primary concept state of the agent:
idea=relu(weight*(EmotionIndex.reshape()+prospective)+bias)
idea代表智能体的初级概念状态,prospective是预期收益,relu是激励函数;Idea represents the primary concept state of the agent, prospect is the expected return, and relu is the incentive function;
该取舍逻辑的计算:Calculation of the trade-off logic:
ar=sigmoid(weight*(reward/idea)+bias)
ar为取舍逻辑,reward是模块11中该模仿对象的收益率,sigmoid是激励函数;ar is the logic of selection, reward is the rate of return of the imitation object in module 11, and sigmoid is the incentive function;
AgentTarget=relu(weight*(ar/reward)+bias)
AgentTarget是智能体的初始目标,weight代表权重,ar是取舍逻辑,reward是alpha的收益率,bias是偏置,relu是激励函数;AgentTarget is the initial target of the agent, weight represents the weight, ar is the logic of selection, reward is the rate of return of alpha, bias is the bias, and relu is the incentive function;
该目标明确的状态值的值为:The value of the targeted state value is:
Targeted=tanh(weight*(AgentTarget/(reward-EmotionIndex))+bias)
Targeted是目标明确的状态值的值,AgentTarget是智能体的初始目标;Targeted is the value of the target state value, and AgentTarget is the initial target of the agent;
该目标未明确的状态值的值为:The value of the unclear status value of the target is:
UnTargeted=sigmoid(weight*([AgentTarget,Targeted,NewIdea,un-recognized_n])+bias)
UnTargeted是目标未明确的状态值的值,un-recognized n代表由n维未感知态构成的未知数据; UnTargeted is the value of the unrecognized state value of the target, and un-recognized n represents the unknown data composed of n-dimensional unrecognized state;
该智能体的新概念状态通过下式得到:The new concept state of the agent is obtained by the following formula:
NewIdea=relu(weight*(Moveloss/Targeted)+bias)
NewIdea是该智能体的新概念状态,Moveloss是该模仿对象的移动止损距离;NewIdea is the new concept state of the agent, and Moveloss is the moving stop loss distance of the imitation object;
分步决策和反向评估的累计进化结果就是目标未明确的状态值与目标明确的状态值的比;The cumulative evolution result of step-by-step decision-making and reverse evaluation is the ratio of the state value with unclear goals to the state value with clear goals;
SapientState=relu(weight*(UnTargeted/Targeted)+bias)
SapientState是分步决策和反向评估的累计进化结果,weight代表权重,UnTargeted是目标未明确的状态值,Targeted是目标明确的状态值,bias是偏置,relu是激励函数;SapientState is the cumulative evolution result of step-by-step decision-making and reverse evaluation, weight represents weight, UnTargeted is the state value with unclear target, Targeted is the state value with clear target, bias is bias, and relu is the incentive function;
输出逻辑的计算:通过relu函数用预期收益除以取舍逻辑Calculation of the output logic: divide the expected return by the trade-off logic through the relu function
OutputLogic=relu(weight*(prospective/ar)+bias)
OutputLogic是输出逻辑,prospective是预期收益,ar是取舍逻辑;OutputLogic is the output logic, prospective is the expected return, and ar is the logic of choice;
情况代表了对环境认识的最高层次,情况的计算:The situation represents the highest level of understanding of the environment, the calculation of the situation:
Situation=tanh(weight*(OutputLogic/(prospective-reward))+bias)
Situation代表情况。Situation represents the situation.
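Read together, the formulas above form a single computation chain. The numpy sketch below strings them together in order; it only illustrates how the published formulas compose, with scalar weights and biases, a summed UnTargeted input list and non-zero denominators as simplifying assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0.0)

def agent_states(merged_env, prospective, reward, moveloss, unrecognized,
                 weight=1.0, bias=0.0):
    """One pass through the EmotionIndex -> Situation chain of formulas above.
    merged_env is the merged environment data, reward the imitation object's
    return rate, moveloss its moving stop-loss distance, and unrecognized the
    n-dimensional unperceived state."""
    emotion = np.tanh(weight * np.mean(merged_env) + bias)             # EmotionIndex
    idea = relu(weight * (emotion + prospective) + bias)               # primary concept state
    ar = sigmoid(weight * (reward / idea) + bias)                      # trade-off logic
    agent_target = relu(weight * (ar / reward) + bias)                 # agent's initial target
    targeted = np.tanh(weight * (agent_target / (reward - emotion)) + bias)
    new_idea = relu(weight * (moveloss / targeted) + bias)             # new concept state
    untargeted = sigmoid(weight * (agent_target + targeted + new_idea + np.sum(unrecognized)) + bias)
    sapient = relu(weight * (untargeted / targeted) + bias)            # cumulative evolution result
    output_logic = relu(weight * (prospective / ar) + bias)
    situation = np.tanh(weight * (output_logic / (prospective - reward)) + bias)
    return {"EmotionIndex": emotion, "idea": idea, "ar": ar, "AgentTarget": agent_target,
            "Targeted": targeted, "NewIdea": new_idea, "UnTargeted": untargeted,
            "SapientState": sapient, "OutputLogic": output_logic, "Situation": situation}
```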
In this logical reasoning system for implementing an agent based on the wide learning algorithm, module 2 includes:
Module 21: With the XGBoost algorithm, take the cumulative evolution result of step-by-step decision and reverse evaluation, the untargeted state value, the targeted state value and the agent's initial target as output and the environment data as input, and take the features screened out as the self-assessment features; with the LightGBM algorithm, take the situation and the output logic as output and the environment data as input to screen out the environment-assessment features; with the GradientBoosting algorithm, take the cumulative evolution result of step-by-step decision and reverse evaluation and the situation as output and the environment data as input to screen out the output-logic features; and with a logistic regression algorithm, take the agent's initial target as output and the self-assessment features, the environment-assessment features and the output-logic features as input, and take the model-fused features screened out as the new features;
Module 22: Score every stock according to the new features and a random forest algorithm, take the alpha index formed by the top-scoring stocks, and replace the imitation object of module 11 according to the update rule between the new alpha combination and the imitated alpha;
Module 23: Loop modules 11 to 22 until the cumulative evolution result of step-by-step decision and reverse evaluation and the situation converge simultaneously; output the new imitation object, its condition combination, and the causal relationship between the imitation object and the logic.
In this logical reasoning system for implementing an agent based on the wide learning algorithm, the process of screening the self-assessment features includes:
NewFeature1=XGBClassifier(input(fin[36],market[32],mac[11],news[17]),output[SapientState,UnTargeted,Targeted,AgentTarget])
NewFeature1 is the self-assessment feature, XGBClassifier is the classification function, and SapientState, UnTargeted, Targeted and AgentTarget are, respectively, the cumulative evolution result of step-by-step decision and reverse evaluation, the untargeted state value, the targeted state value and the agent's initial target;
筛选该环境评估特征的过程包括:The process of screening this environmental assessment feature includes:
NewFeature2=LightGBMClassifier(input(fin[36],market[32],mac[11],news[17]),output[Situation,OutputLogic])
NewFeature2为该环境评估特征,LightGBMClassifier为分类函数,Situation是情况,OutputLogic是输出逻辑;NewFeature2 is the environmental evaluation feature, LightGBMClassifier is the classification function, Situation is the situation, and OutputLogic is the output logic;
筛选该输出逻辑特征的过程包括:The process of screening the output logical characteristics includes:
NewFeature3=GradientBoostingClassifier(input(fin[36],market[32],mac[11],news[17]),output[SapientState,Situation])
NewFeature3为该输出逻辑特征,GradientBoostingClassifier为分类函数;NewFeature3 is the output logical feature, and GradientBoostingClassifier is the classification function;
根据该环境评估特征、该输出逻辑特征和该自我评估特征,筛选出该新特征:According to the environmental assessment feature, the output logic feature, and the self-assessment feature, the new feature is selected:
CombinedFeature=LRClassifier(input[NewFeature1,NewFeature2,NewFeature3],output[AgentTarget])
其中CombinedFeature为该新特征,LRClassifier为逻辑回归函数。Among them, CombinedFeature is the new feature, and LRClassifier is the logistic regression function.
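The screening formulas above can be sketched with common Python libraries; the version below is a simplified single-label variant in which the multi-output targets are binarised and training and scoring reuse the same data, with xgboost, lightgbm and scikit-learn standing in for XGBClassifier, LightGBMClassifier, GradientBoostingClassifier and LRClassifier. The random forest scoring of module 22 is included at the end.

```python
import numpy as np
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression

def combined_feature_and_scores(X_env, sapient_state, situation, agent_target):
    """Simplified sketch of modules 21-22: each base learner maps the environment
    data to one binarised internal state, its predicted probability becomes a new
    feature, a logistic regression fuses the three features against the agent
    target, and a random forest turns the fused features into per-stock scores."""
    y_self = (sapient_state > np.median(sapient_state)).astype(int)   # self-assessment label
    y_env = (situation > np.median(situation)).astype(int)            # environment-assessment label
    y_target = (agent_target > np.median(agent_target)).astype(int)   # agent-target label

    f1 = XGBClassifier(n_estimators=100).fit(X_env, y_self).predict_proba(X_env)[:, 1]
    f2 = LGBMClassifier(n_estimators=100).fit(X_env, y_env).predict_proba(X_env)[:, 1]
    f3 = GradientBoostingClassifier().fit(X_env, y_self).predict_proba(X_env)[:, 1]

    meta = np.column_stack([f1, f2, f3])                              # NewFeature1..NewFeature3
    fusion = LogisticRegression().fit(meta, y_target)                 # CombinedFeature model
    scores = RandomForestClassifier(n_estimators=200).fit(meta, y_target).predict_proba(meta)[:, 1]
    return fusion, scores                                             # scores rank the stocks (module 22)
```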
Industrial Applicability
The present invention relates to a universal logical reasoning method and system for implementing an agent based on the wide learning algorithm, comprising: acquiring various types of environment data corresponding to an object, each type of environment data including multi-dimensional data or indicators; building a logical layer for the corresponding attributes of each type of data to construct a logic-reinforcement process and a reverse model; performing dynamic self-assessment, environment assessment and logic assessment of the object's situation in the environment, and fusing the three assessments into new features with a logistic regression algorithm; and constructing a new object from the new features, then establishing and evaluating the causal relationship between the logic formed by the new object and the manually set initial goal. The invention can fully automatically explore the causal relationship between the goal and the logic formed by combinations of condition factors, thereby realizing logical reasoning by a machine algorithm; it raises the degree of automation, better explains the dependencies among data, and lowers the technical threshold for using AI.

Claims (10)

  1. 一种基于广度学习算法实现智能体的通用逻辑推理方法,其特征在于,包括:A general logical reasoning method for an agent based on a breadth learning algorithm is characterized in that it includes:
    步骤1、获取对象对应的各类环境数据,其中每类环境数据包括多维数据或指标,通过特征提取得到每类环境数据的属性;Step 1. Obtain various types of environmental data corresponding to the object, where each type of environmental data includes multi-dimensional data or indicators, and obtain the attributes of each type of environmental data through feature extraction;
    步骤2、建立各类环境数据对应属性的逻辑层,以构建逻辑强化过程和反向模型,对该对象在环境中的情况进行动态的自我评估、环境评估和逻辑评估,并使用逻辑回归算法将三者的评估结果进行特征融合,得到新特征;Step 2. Establish a logical layer corresponding to the attributes of various environmental data to build a logical strengthening process and a reverse model, conduct dynamic self-assessment, environmental assessment and logical assessment of the object’s situation in the environment, and use logistic regression algorithms to The evaluation results of the three are combined with features to obtain new features;
    Step 3: Construct a new object from the new features, then evaluate the causal relationship and logic between the logic formed by the new object and the manually set initial goal; confirm the causal relationship and the logic forming it from the evaluation result, and then recommend and output the new object that satisfies the causal relationship as the logical reasoning result.
  2. 如权利要求1所述的基于广度学习算法实现智能体的通用逻辑推理方法,其特征在于,该步骤1包括:The universal logical reasoning method for an agent based on a breadth learning algorithm according to claim 1, wherein the step 1 includes:
    步骤11、获取人为初始目标和环境数据,该环境数据包括财务指标、行情指标、新闻指标和宏观指标,以随机的环境数据作为筛选条件,筛选得到满足该筛选条件的多个股票,集合该多个股票形成alpha指数作为模仿对象;Step 11. Obtain man-made initial goals and environmental data. The environmental data includes financial indicators, market indicators, news indicators, and macro indicators. Random environmental data is used as a screening condition to select multiple stocks that meet the screening conditions, and aggregate the multiple stocks. Each stock forms an alpha index as an imitation object;
    步骤12、通过激励函数合并该环境数据,得到情绪指标,使用该情绪指标与该人为初始目标做加法得到智能体的初级概念状态,通过激励函数用模仿对象的收益率除以智能体的初级概念状态,得到取舍逻辑,通过激励函数用取舍逻辑除以模仿对象的收益率,得到智能体的初始目标;Step 12. Combine the environmental data through the incentive function to obtain the emotion index, add the emotion index and the person as the initial goal to get the primary concept state of the agent, and divide the return rate of the imitation object by the primary concept of the agent through the incentive function State, get the logic of selection, and divide the logic of selection by the incentive function by the rate of return of the imitation object to get the initial goal of the agent;
    步骤13、通过激励函数用智能体的初始目标除以模仿对象的收益率与情绪指标的差得到目标明确的状态值,通过激励函数把智能体的初始目标,目标明确的状态值,智能体的新概念状态和未感知态作为输入得到目标未明确的状态值,通过激励函数用模仿对象的移动止损距离除以目标明确的状态值得到智能体的新概念状态,通过激励函数用目标未明确的状态值除以目标明确的状态值作为分步决策和反向评估的累计进化结果,通过relu函数用智能体的初始目标除以该取舍逻辑得到输出逻辑,通过激励函数用输出逻辑除以人为初始目标 与实际收益的差,得到情况;Step 13. Use the incentive function to divide the agent’s initial goal by the difference between the return rate of the imitation object and the emotional index to obtain a clear state value. Through the incentive function, the agent’s initial goal, the clear state value, and the agent’s The new concept state and the unperceived state are used as input to get the unclear state value of the target, and the new concept state of the agent is obtained by the incentive function by dividing the moving stop loss distance of the imitation object by the clear state value, and the target is unclear through the incentive function Divide the state value of the target by the state value of the goal as the cumulative evolution result of step-by-step decision-making and reverse evaluation. Through the relu function, the agent’s initial goal is divided by the selection logic to obtain the output logic, and the incentive function is used to divide the output logic by the artificial The difference between the initial target and the actual income, get the situation;
    步骤14、集合该分步决策和反向评估的累计进化结果、该情况和该智能体的初始目标作为该属性。Step 14. Collect the cumulative evolution result of the step-by-step decision and reverse evaluation, the situation and the agent's initial goal as the attribute.
  3. 如权利要求2所述的基于广度学习算法实现智能体的通用逻辑推理方法,其特征在于,该情绪指标的计算:The universal logical reasoning method for an agent based on a breadth learning algorithm as claimed in claim 2, wherein the calculation of the emotional index is:
    MergeAllData=pd.merge(fin[36],market[32],mac[11],news[17])
    EmotionIndex=tanh(weight*MergeAllData+bias)
    EmotionIndex是情绪指标,weight代表权重,fin[36],market[32],mac[11],news[17]分别是财务数据、行情数据、宏观数据、新闻数据,bias是偏置,tanh是激励函数,pd.merge()是合并函数;EmotionIndex is sentiment index, weight represents weight, fin[36], market[32], mac[11], news[17] are financial data, market data, macro data, news data, bias is bias, tanh is incentive Function, pd.merge() is the merge function;
    该智能体的初级概念状态的计算:Calculation of the primary concept state of the agent:
    idea=relu(weight*(EmotionIndex.reshape()+prospective)+bias)
    idea代表智能体的初级概念状态,prospective是预期收益,reshape()是降维函数,relu是激励函数;Idea represents the primary concept state of the agent, prospect is the expected return, reshape() is the dimensionality reduction function, and relu is the incentive function;
    该取舍逻辑的计算:Calculation of the trade-off logic:
    ar=sigmoid(weight*(reward/idea)+bias)
    ar为取舍逻辑,reward是步骤11中该模仿对象的收益率,sigmoid是激励函数;ar is the logic of selection, reward is the rate of return of the imitation object in step 11, and sigmoid is the incentive function;
    AgentTarget=relu(weight*(ar/reward)+bias)
    AgentTarget是智能体的初始目标,weight代表权重,reward是alpha的收益率,relu是激励函数;AgentTarget is the initial target of the agent, weight represents the weight, reward is the rate of return of alpha, and relu is the incentive function;
    该目标明确的状态值的值为:The value of the targeted state value is:
    Targeted=tanh(weight*(AgentTarget/(reward-EmotionIndex))+bias)
    Targeted是目标明确的状态值的值,AgentTarget是智能体的初始目标;Targeted is the value of the target state value, and AgentTarget is the initial target of the agent;
    该目标未明确的状态值的值为:The value of the unclear status value of the target is:
    UnTargeted=sigmoid(weight*([AgentTarget,Targeted,NewIdea,un-recognized_n])+bias)
    UnTargeted是目标未明确的状态值的值,un-recognized n代表由n维未感知态构成的未知数据; UnTargeted is the value of the unrecognized state value of the target, and un-recognized n represents the unknown data composed of n-dimensional unrecognized state;
    该智能体的新概念状态通过下式得到:The new concept state of the agent is obtained by the following formula:
    NewIdea=relu(weight*(Moveloss/Targeted)+bias)
    NewIdea是该智能体的新概念状态,Moveloss是该模仿对象的移动止损距离;NewIdea is the new concept state of the agent, and Moveloss is the moving stop loss distance of the imitation object;
    分步决策和反向评估的累计进化结果就是目标未明确的状态值与目标明确的状态值的比;The cumulative evolution result of step-by-step decision-making and reverse evaluation is the ratio of the state value with unclear goals to the state value with clear goals;
    SapientState=relu(weight*(UnTargeted/Targeted)+bias)
    SapientState是分步决策和反向评估的累计进化结果,weight代表权重,UnTargeted是目标未明确的状态值,Targeted是目标明确的状态值,bias是偏置,relu是激励函数;SapientState is the cumulative evolution result of step-by-step decision-making and reverse evaluation, weight represents weight, UnTargeted is the state value with unclear target, Targeted is the state value with clear target, bias is bias, and relu is the incentive function;
    输出逻辑的计算:通过relu函数用预期收益除以取舍逻辑;Calculation of the output logic: divide the expected return by the trade-off logic through the relu function;
    OutputLogic=relu(weight*(prospective/ar)+bias)
    OutputLogic是输出逻辑,prospective是预期收益,ar是取舍逻辑;OutputLogic is the output logic, prospective is the expected return, and ar is the logic of choice;
    情况代表了对环境认识的最高层次,情况的计算:The situation represents the highest level of understanding of the environment, the calculation of the situation:
    Situation=tanh(weight*(OutputLogic/(prospective-reward))+bias)
    Situation代表情况。Situation represents the situation.
  4. 如权利要求2或3所述的基于广度学习算法实现智能体的通用逻辑推理方法,其特征在于,该步骤2包括:The universal logical reasoning method for an agent based on a breadth learning algorithm according to claim 2 or 3, characterized in that step 2 includes:
    步骤21、使用XGBoost算法,该分步决策和反向评估的累计进化结果、该目标未明确的状态值、该目标明确的状态值和该智能体的初始目标作为输出,该环境数据作为输入,筛选出的特征作为自我评估特征;使用LightGBM算法,该情况和该输出逻辑作为输出,该环境数据作为输入,筛选出环境评估特征;用GradientBoosting算法,分步决策和反向评估的累计进化结果和情况为输出,该环境数据作为输入,筛选出输出逻辑特征;用逻辑回归算法把该智能体的初始目标作为输出,用该自我评估特征、该环境评估特征和该输出逻辑特征值作为输入,筛选出模型融合的特征作为新特征;Step 21. Using the XGBoost algorithm, the cumulative evolution result of the step-by-step decision and reverse evaluation, the unclear state value of the goal, the clear state value of the goal, and the agent’s initial goal are used as output, and the environmental data is used as input. The selected features are used as self-assessment features; the LightGBM algorithm is used, the situation and the output logic are used as output, and the environmental data is used as input to filter out the environmental assessment features; the GradientBoosting algorithm is used to determine the cumulative evolution results of stepwise decision-making and reverse evaluation. When the situation is output, the environmental data is used as input to filter out the output logical characteristics; the logistic regression algorithm is used to take the agent’s initial goal as output, and the self-assessment characteristics, environmental evaluation characteristics and output logical characteristics are used as input to filter Feature merged with the model as a new feature;
    步骤22、根据该新特征和随机森林算法为每一个股票进行打分,取分值排名最高的多个股票形成的alpha指数,根据新alpha组合与模仿对象alpha的更新规则替代该步骤11中该模仿对象;Step 22: Score each stock according to the new feature and the random forest algorithm, take the alpha index formed by the multiple stocks with the highest score value, and replace the imitation in step 11 according to the update rule of the new alpha combination and the imitation object alpha Object
    步骤23、循环该步骤11到步骤22,直到该分步决策和反向评估的累计进化结果和该情况同时收敛,输出该新模仿对象,并输出该新模仿对象的条件组合以及模仿对象与逻辑之间的因果关系。Step 23. Loop the steps 11 to 22 until the cumulative evolution result of the step-by-step decision and reverse evaluation converges with the situation at the same time, output the new imitation object, and output the condition combination of the new imitation object and the imitation object and logic The causal relationship between.
  5. 如权利要求4所述的基于广度学习算法实现智能体的通用逻辑推理方法,其特征在于,筛选该自我评估特征的过程包括:The universal logical reasoning method for an agent based on a breadth learning algorithm as claimed in claim 4, wherein the process of screening the self-assessment features comprises:
    NewFeature1=XGBClassifier(input(fin[36],market[32],mac[11],news[17]),output[SapientState,UnTargeted,Targeted,AgentTarget])
    NewFeature1为该自我评估特征,XGBClassifier为分类函数,SapientState,UnTargeted,Targeted,AgentTarget分别是分步决策和反向评估的累计进化结果,目标未明确的状态值,目标明确的状态值,智能体的初始目标;NewFeature1 is the self-assessment feature, XGBClassifier is the classification function, SapientState, UnTargeted, Targeted, and AgentTarget are the cumulative evolutionary results of step-by-step decision-making and reverse evaluation, respectively. The state value with unclear target, the state value with clear target, the initial agent aims;
    筛选该环境评估特征的过程包括:The process of screening this environmental assessment feature includes:
    NewFeature2=LightGBMClassifier(input(fin[36],market[32],mac[11],news[17]),output[Situation,OutputLogic])
    NewFeature2为该环境评估特征,LightGBMClassifier为分类函数,Situation是情况,OutputLogic是输出逻辑;NewFeature2 is the environmental evaluation feature, LightGBMClassifier is the classification function, Situation is the situation, and OutputLogic is the output logic;
    筛选该输出逻辑特征的过程包括:The process of screening the output logical characteristics includes:
    NewFeature3=GradientBoostingClassifier(input(fin[36],market[32],mac[11],news[17]),output[SapientState,Situation])
    NewFeature3为该输出逻辑特征,GradientBoostingClassifier为分类函数;NewFeature3 is the output logical feature, and GradientBoostingClassifier is the classification function;
    根据该环境评估特征、该输出逻辑特征和该自我评估特征,筛选出该新特征:According to the environmental assessment feature, the output logic feature, and the self-assessment feature, the new feature is selected:
    CombinedFeature=LRClassifier(input[NewFeature1,NewFeature2,NewFeature3],output[AgentTarget])
    其中CombinedFeature为该新特征,LRClassifier为逻辑回归函数。Among them, CombinedFeature is the new feature, and LRClassifier is the logistic regression function.
  6. 一种基于广度学习算法实现智能体的通用逻辑推理方法推理系统,其特征在于,包括:A general logic reasoning method reasoning system based on a breadth learning algorithm to realize an agent is characterized in that it includes:
    模块1、获取对象对应的各类环境数据,其中每类环境数据包括多维数据或指标,通过特征提取得到每类环境数据的属性;Module 1. Obtain various types of environmental data corresponding to the object, where each type of environmental data includes multi-dimensional data or indicators, and obtain the attributes of each type of environmental data through feature extraction;
    模块2、建立各类环境数据对应属性的逻辑层,以构建逻辑强化过程和反向模型,对该对象在环境中的情况进行动态的自我评估、环境评估和逻辑评估,并使用逻辑回归算法将三者的评估结果进行特征融合,得到新特征;Module 2. Establish a logical layer corresponding to the attributes of various environmental data to construct a logical strengthening process and a reverse model, conduct dynamic self-assessment, environmental assessment, and logical assessment of the object’s situation in the environment, and use logistic regression algorithms to The evaluation results of the three are combined with features to obtain new features;
    模块3、根据该新特征构建新对象,然后对该新对象形成的逻辑与人为初始目标建立因果关系,并对该因果关系和逻辑进行评估,根据评估结果确认因 果关系以及形成该因果关系的逻辑后,将符合该因果关系的新对象作为逻辑推理结果进行推荐输出。Module 3. Construct a new object according to the new feature, then establish a causal relationship between the logic formed by the new object and the human initial goal, and evaluate the causal relationship and logic, and confirm the causal relationship and the logic of forming the causal relationship according to the evaluation results Then, the new object that meets the causal relationship is used as the result of logical reasoning for recommendation output.
  7. 如权利要求6所述的基于广度学习算法实现智能体的逻辑推理系统,其特征在于,该模块1包括:The logical reasoning system of an agent based on a breadth learning algorithm according to claim 6, wherein the module 1 includes:
    模块11、获取人为初始目标和环境数据,该环境数据包括财务指标、行情指标、新闻指标和宏观指标,以随机的环境数据作为筛选条件,筛选得到满足该筛选条件的多个股票,集合该多个股票形成alpha指数作为模仿对象;Module 11. Obtain man-made initial goals and environmental data. The environmental data includes financial indicators, market indicators, news indicators, and macro indicators. Random environmental data is used as a screening condition to screen multiple stocks that meet the screening conditions, and collect the multiple stocks. Each stock forms an alpha index as an imitation object;
    模块12、通过激励函数合并该环境数据,得到情绪指标,使用该环境数据与该人为初始目标做加法得到智能体的初级概念状态,通过激励函数用模仿对象的收益率除以智能体的初级概念状态,得到取舍逻辑,通过激励函数用取舍逻辑除以模仿对象的收益率,得到智能体的初始目标;Module 12. Combine the environmental data through the incentive function to obtain the emotional index, use the environmental data and the person as the initial goal to add the primary concept state of the agent, and divide the return rate of the imitation object by the primary concept of the agent through the incentive function State, get the logic of selection, and divide the logic of selection by the incentive function by the rate of return of the imitation object to get the initial goal of the agent;
    模块13、通过激励函数用智能体的初始目标除以模仿对象的收益率与情绪指标的差得到目标明确的状态值,通过激励函数把智能体的初始目标,目标明确的状态值,智能体的新概念状态和未感知态作为输入得到目标未明确的状态值,通过激励函数用模仿对象的移动止损距离除以目标明确的状态值得到智能体的新概念状态,通过激励函数用目标未明确的状态值除以目标明确的状态值的作为分步决策和反向评估的累计进化结果,通过relu函数用智能体的初始目标除以该取舍逻辑得到输出逻辑,通过激励函数用输出逻辑除以人为初始目标与实际收益的差,得到情况;Module 13. Through the incentive function, the agent’s initial goal is divided by the difference between the return rate of the imitation object and the emotional index to obtain a clear state value. Through the incentive function, the agent’s initial goal, the clear state value, and the agent’s The new concept state and the unperceived state are used as input to get the unclear state value of the target, and the new concept state of the agent is obtained by the incentive function by dividing the moving stop loss distance of the imitation object by the clear state value, and the target is unclear through the incentive function The state value divided by the state value with a clear goal is the cumulative evolution result of step-by-step decision-making and reverse evaluation. Through the relu function, the agent’s initial goal is divided by the selection logic to obtain the output logic, and the output logic is divided by the incentive function The difference between the artificial initial goal and the actual income, get the situation;
    模块14、集合该分步决策和反向评估的累计进化结果、该情况和该智能体的初始目标作为该属性。Module 14. Collect the cumulative evolution result of the step-by-step decision and reverse evaluation, the situation and the agent's initial goal as the attribute.
  8. 如权利要求7所述的基于广度学习算法实现智能体的逻辑推理系统,其特征在于,该情绪指标的计算:The logical reasoning system based on a breadth learning algorithm to realize an agent according to claim 7, wherein the calculation of the emotional index:
    MergeAllData=pd.merge(fin[36],market[32],mac[11],news[17])
    EmotionIndex=tanh(weight*MergeAllData+bias)
    EmotionIndex是情绪指标,weight代表权重,fin[36],market[32],mac[11],news[17]分别是财务数据、行情数据、宏观数据、新闻数据,bias是偏置,tanh是激励函数,pd.merge()是合并函数;EmotionIndex is sentiment index, weight represents weight, fin[36], market[32], mac[11], news[17] are financial data, market data, macro data, news data, bias is bias, tanh is incentive Function, pd.merge() is the merge function;
    该智能体的初级概念状态的计算:Calculation of the primary concept state of the agent:
    idea=relu(weight*(EmotionIndex.reshape()+prospective)+bias)
    idea代表智能体的初级概念状态,prospective是预期收益,relu是激励函数;Idea represents the primary concept state of the agent, prospect is the expected return, and relu is the incentive function;
    该取舍逻辑的计算:Calculation of the trade-off logic:
    ar=sigmoid(weight*(reward/idea)+bias)
    ar为取舍逻辑,reward是模块11中该模仿对象的收益率,sigmoid是激励函数;ar is the logic of selection, reward is the rate of return of the imitation object in module 11, and sigmoid is the incentive function;
    AgentTarget=relu(weight*(ar/reward)+bias)
    AgentTarget是智能体的初始目标,weight代表权重,ar是取舍逻辑,reward是alpha的收益率,bias是偏置,relu是激励函数;AgentTarget is the initial target of the agent, weight represents the weight, ar is the logic of selection, reward is the rate of return of alpha, bias is the bias, and relu is the incentive function;
    该目标明确的状态值的值为:The value of the targeted state value is:
    Targeted=tanh(weight*(AgentTarget/(reward-EmotionIndex))+bias)
    Targeted是目标明确的状态值的值,AgentTarget是智能体的初始目标;Targeted is the value of the target state value, and AgentTarget is the initial target of the agent;
    该目标未明确的状态值的值为:The value of the unclear status value of the target is:
    UnTargeted=sigmoid(weight*([AgentTarget,Targeted,NewIdea,un-recognized_n])+bias)
    UnTargeted是目标未明确的状态值的值,un-recognized n代表由n维未感知态构成的未知数据; UnTargeted is the value of the unrecognized state value of the target, and un-recognized n represents the unknown data composed of n-dimensional unrecognized state;
    该智能体的新概念状态通过下式得到:The new concept state of the agent is obtained by the following formula:
    NewIdea=relu(weight*(Moveloss/Targeted)+bias)
    NewIdea是该智能体的新概念状态,Moveloss是该模仿对象的移动止损距离;NewIdea is the new concept state of the agent, and Moveloss is the moving stop loss distance of the imitation object;
    分步决策和反向评估的累计进化结果就是目标未明确的状态值与目标明确的状态值的比;The cumulative evolution result of step-by-step decision-making and reverse evaluation is the ratio of the state value with unclear goals to the state value with clear goals;
    SapientState=relu(weight*(UnTargeted/Targeted)+bias)
    SapientState是分步决策和反向评估的累计进化结果,weight代表权重,UnTargeted是目标未明确的状态值,Targeted是目标明确的状态值,bias是偏置,relu是激励函数;SapientState is the cumulative evolution result of step-by-step decision-making and reverse evaluation, weight represents weight, UnTargeted is the state value with unclear target, Targeted is the state value with clear target, bias is bias, and relu is the incentive function;
    输出逻辑的计算:通过relu函数用预期收益除以取舍逻辑Calculation of the output logic: divide the expected return by the trade-off logic through the relu function
    OutputLogic=relu(weight*(prospective/ar)+bias)
    OutputLogic是输出逻辑,prospective是预期收益,ar是取舍逻辑;OutputLogic is the output logic, prospective is the expected return, and ar is the logic of choice;
    情况代表了对环境认识的最高层次,情况的计算:The situation represents the highest level of understanding of the environment, the calculation of the situation:
    Situation=tanh(weight*(OutputLogic/(prospective-reward))+bias)
    Situation代表情况。Situation represents the situation.
  9. 如权利要求7或8所述的基于广度学习算法实现智能体的逻辑推理系统,其特征在于,该模块2包括:The logical reasoning system of an agent based on a breadth learning algorithm according to claim 7 or 8, characterized in that the module 2 includes:
    模块21、使用XGBoost算法,该分步决策和反向评估的累计进化结果、该目标未明确的状态值、该目标明确的状态值和该智能体的初始目标作为输出,该环境数据作为输入,筛选出的特征作为自我评估特征;使用LightGBM算法,该情况和该输出逻辑作为输出,该环境数据作为输入,筛选出环境评估特征;用GradientBoosting算法,分步决策和反向评估的累计进化结果和情况为输出,该环境数据作为输入,筛选出输出逻辑特征;用逻辑回归算法把该智能体的初始目标作为输出,用该自我评估特征、该环境评估特征和该输出逻辑特征值作为输入,筛选出模型融合的特征作为新特征;Module 21. Using the XGBoost algorithm, the cumulative evolution result of the step-by-step decision and reverse evaluation, the unclear state value of the goal, the clear state value of the goal, and the agent’s initial goal as output, and the environmental data as input, The selected features are used as self-assessment features; the LightGBM algorithm is used, the situation and the output logic are used as output, and the environmental data is used as input to filter out the environmental assessment features; the GradientBoosting algorithm is used to determine the cumulative evolution results of stepwise decision-making and reverse evaluation. When the situation is output, the environmental data is used as input to filter out the output logical characteristics; the logistic regression algorithm is used to take the agent’s initial goal as output, and the self-assessment characteristics, environmental evaluation characteristics and output logical characteristics are used as input to filter Feature merged with the model as a new feature;
    模块22、根据该新特征和随机森林算法为每一个股票进行打分,取分值排名最高的多个股票形成的alpha指数,根据新alpha组合与模仿对象alpha的更新规则替代该模块11中该模仿对象;Module 22. Score each stock according to the new feature and random forest algorithm, take the alpha index formed by the multiple stocks with the highest score value, and replace the imitation in module 11 according to the update rule of the new alpha combination and the imitation object alpha Object
    模块23、循环该模块11到模块22,直到该分步决策和反向评估的累计进化结果和该情况同时收敛,输出该新模仿对象,并输出该新模仿对象的条件组合以及模仿对象与逻辑之间的因果关系。Module 23. Loop the modules 11 to 22 until the cumulative evolution result of the step-by-step decision and reverse evaluation converges with the situation at the same time, output the new imitation object, and output the condition combination of the new imitation object and the imitation object and logic The causal relationship between.
  10. 如权利要求9所述的基于广度学习算法实现智能体的逻辑推理系统,其特征在于,筛选该自我评估特征的过程包括:The logical reasoning system based on a breadth learning algorithm to realize an agent according to claim 9, wherein the process of screening the self-assessment features comprises:
    NewFeature1=XGBClassifier(input(fin[36],market[32],mac[11],news[17]),output[SapientState,UnTargeted,Targeted,AgentTarget])
    NewFeature1为该自我评估特征,XGBClassifier为分类函数,SapientState,UnTargeted,Targeted,AgentTarget分别是分步决策和反向评估的累计进化结果,目标未明确的状态值,目标明确的状态值,智能体的初始目标;NewFeature1 is the self-assessment feature, XGBClassifier is the classification function, SapientState, UnTargeted, Targeted, and AgentTarget are the cumulative evolutionary results of step-by-step decision-making and reverse evaluation, respectively. The state value with unclear target, the state value with clear target, the initial agent aims;
    筛选该环境评估特征的过程包括:The process of screening this environmental assessment feature includes:
    NewFeature2=LightGBMClassifier(input(fin[36],market[32],mac[11],news[17]),output[Situation,OutputLogic])
    NewFeature2为该环境评估特征,LightGBMClassifier为分类函数,Situation是情况,OutputLogic是输出逻辑;NewFeature2 is the environmental evaluation feature, LightGBMClassifier is the classification function, Situation is the situation, and OutputLogic is the output logic;
    筛选该输出逻辑特征的过程包括:The process of screening the output logical characteristics includes:
    NewFeature3=GradientBoostingClassifier(input(fin[36],market[32],mac[11],news[17]),output[SapientState,Situation])
    NewFeature3为该输出逻辑特征,GradientBoostingClassifier为分类函数;NewFeature3 is the output logical feature, and GradientBoostingClassifier is the classification function;
    根据该环境评估特征、该输出逻辑特征和该自我评估特征,筛选出该新特征:According to the environmental assessment feature, the output logic feature, and the self-assessment feature, the new feature is selected:
    CombinedFeature=LRClassifier(input[NewFeature1,NewFeature2,NewFeature3],output[AgentTarget])
    其中CombinedFeature为该新特征,LRClassifier为逻辑回归函数。Among them, CombinedFeature is the new feature, and LRClassifier is the logistic regression function.
PCT/CN2019/078710 2019-03-19 2019-03-19 Universal logical reasoning method and system for implementing agent based on wide learning algorithm WO2020186453A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/078710 WO2020186453A1 (en) 2019-03-19 2019-03-19 Universal logical reasoning method and system for implementing agent based on wide learning algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/078710 WO2020186453A1 (en) 2019-03-19 2019-03-19 Universal logical reasoning method and system for implementing agent based on wide learning algorithm

Publications (1)

Publication Number Publication Date
WO2020186453A1 true WO2020186453A1 (en) 2020-09-24

Family

ID=72518932

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/078710 WO2020186453A1 (en) 2019-03-19 2019-03-19 Universal logical reasoning method and system for implementing agent based on wide learning algorithm

Country Status (1)

Country Link
WO (1) WO2020186453A1 (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001325582A (en) * 2000-05-17 2001-11-22 Chugoku Electric Power Co Inc:The Learning and predicting device for time-series data
CN103985055A (en) * 2014-05-30 2014-08-13 西安交通大学 Stock market investment decision-making method based on network analysis and multi-model fusion
US20160217366A1 (en) * 2015-01-23 2016-07-28 Jianjun Li Portfolio Optimization Using Neural Networks
CN109325861A (en) * 2018-08-31 2019-02-12 平安科技(深圳)有限公司 Using target stock selection method, device and the storage medium of experience replay mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI, XIANG: "Multi-factor Quantitative Stock Option Planning Based on XGBoost Algorithm", ECONOMICS AND MANAGEMENT SCIENCES, CHINESE MASTER’S THESES FULL-TEXT DATABASE, no. 01, 15 January 2018 (2018-01-15), ISSN: 1674-0246, DOI: 20191118133140A *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113311035A (en) * 2021-05-17 2021-08-27 北京工业大学 Effluent total phosphorus prediction method based on width learning network
CN113311035B (en) * 2021-05-17 2022-05-03 北京工业大学 Effluent total phosphorus prediction method based on width learning network

Similar Documents

Publication Publication Date Title
Santoso et al. A genetic programming approach to binary classification problem
Natesan Ramamurthy et al. Model agnostic multilevel explanations
US8515884B2 (en) Neuro type-2 fuzzy based method for decision making
Dostál The use of soft computing for optimization in business, economics, and finance
Sotnik The SOSIEL platform: Knowledge-based, cognitive, and multi-agent
Ansari et al. Parameter tuning of MLP, RBF, and ANFIS models using genetic algorithm in modeling and classification applications
WO2020186453A1 (en) Universal logical reasoning method and system for implementing agent based on wide learning algorithm
Situngkir Emerging the emergence sociology: The philosophical framework of agent-based social studies
Huang et al. Fuzzy c-means clustering based deep patch learning with improved interpretability for classification problems
Shan et al. An integrated knowledge-based system for urban planning decision support
Haryono et al. Stock price forecasting in Indonesia stock exchange using deep learning: A comparative study
Jain et al. Practical applications of computational intelligence techniques
Hatzilygeroudis et al. Fuzzy and neuro-symbolic approaches to assessment of bank loan applicants
Ladas et al. Augmented neural networks for modelling consumer indebtness
Zhu et al. From numeric to granular models: A quest for error and performance analysis
Rasmani et al. Subsethood-based fuzzy rule models and their application to student performance classification
Testov et al. Soft modeling and expert systems in modern science: development trends
Ruz et al. Random vector functional link with naive bayes for classification problems of mixed data
Ma et al. Study on Predicting University Student Performance Based on Course Correlation
Aziz A review on artificial neural networks and its’ applicability
Zhang et al. Intelligent Information Processing with Matlab
Hossain et al. Hybrid neural network for efficient training
Lin et al. Credit risk assessment using BP neural network with Dempster-Shafer theory
Tian et al. The coupling degree prediction between financial innovation process and innovation environment based on GM (1, 1)-BPNN
Tomé et al. Fuzzy Boolean Nets–a nature inspired model for learning and reasoning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19919625

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 04/02/2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19919625

Country of ref document: EP

Kind code of ref document: A1