WO2020186453A1 - Universal logical reasoning method and system for implementing agent based on wide learning algorithm - Google Patents

Universal logical reasoning method and system for implementing agent based on wide learning algorithm

Info

Publication number
WO2020186453A1
WO2020186453A1 (PCT/CN2019/078710)
Authority
WO
WIPO (PCT)
Prior art keywords
agent
logic
output
bias
new
Prior art date
Application number
PCT/CN2019/078710
Other languages
French (fr)
Chinese (zh)
Inventor
曾祥洪
李利鹏
吴明华
Original Assignee
北京汇真网络传媒科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京汇真网络传媒科技有限公司
Priority to PCT/CN2019/078710
Publication of WO2020186453A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/06 Asset management; Financial planning or analysis

Definitions

  • NewFeature3 = GradientBoostingClassifier(input(fin[36], market[32], mac[11], news[17]), output[SapientState, Situation])
  • NewIdea is the new concept state of the agent, and Moveloss is the moving stop loss distance of the imitation object;
  • Module 23. Loop modules 11 to 22 until the cumulative evolution result of the step-by-step decision and reverse evaluation and the situation converge at the same time; output the new imitation object, the combination of conditions that forms it, and the causal relationship between the imitation object and the logic.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • General Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Artificial Intelligence (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Human Resources & Organizations (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Economics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

A universal logical reasoning method and system for implementing an agent based on a wide learning algorithm. The method comprises: acquiring the various types of environmental data corresponding to an object, each type comprising multidimensional data or indicators; constructing a logic reinforcement process and a reverse model by building a logic layer whose attributes correspond to the various types of data; performing dynamic self-assessment, environment assessment, and logic assessment of the object's situation in the environment, and fusing the three assessments into a new feature using a logistic regression algorithm; and constructing a new object from the new feature, then establishing and evaluating a causal relationship between the logic formed by the new object and an artificially set initial target. The method and system allow fully automated exploration of the causal relationships between a logic composed of conditional factors and a target, thereby implementing logical reasoning in a machine algorithm; the degree of automation is increased, dependencies between data are better explained, and the technical barriers to using AI are reduced.

Description

General logical reasoning method and system for implementing an agent based on a breadth learning algorithm
Technical field
The invention relates to the field of machine learning, and in particular to a method and system that simulates the way the human brain thinks, based on a breadth learning algorithm, so as to realize logical reasoning in an agent.
Background
Deep learning grew out of research on artificial neural networks; a multilayer perceptron with multiple hidden layers is one deep learning structure. Deep learning combines low-level features to form more abstract high-level attribute categories or features and thereby discovers distributed feature representations of the data. The main difference between breadth learning and deep learning is that deep learning uses a single multilayer perceptron, whereas breadth learning uses multiple multilayer perceptrons.
Deep learning is currently used mainly for image recognition, speech recognition, and natural language processing. The main role of the breadth learning established here is to build logical relationships between objects and to automatically explore the causal relationships between an object and its environment and between an object and the facts, thereby realizing logical reasoning in an agent. The significance of breadth learning is that it can automatically explore the causality of unknown data according to human logic, reducing the time and the cost people spend searching for the truth; it also changes all earlier recommendation algorithms from passive recommendation (recommendation by the machine algorithm alone) to active recommendation (recommendation with human intervention).
Deep machine learning methods divide into supervised and unsupervised learning, and the models built under the different frameworks differ greatly; for example, convolutional neural networks (CNNs) are a deep model for supervised learning, while deep belief networks (DBNs) are a model for unsupervised learning. The breadth learning established here combines supervised and unsupervised learning and contains both mechanisms. The imitation object is the supervised target of breadth learning, but whereas earlier deep learning needed a large number of supervised samples, breadth learning needs only one. It is also unsupervised, because its reverse model automatically explores all the environmental data that make up the imitation object, forms a new imitation object, and iteratively optimizes that object until the goal expected by the person is reached.
Deep learning methods often cannot explain the interdependence between data, because what they establish are functional relationships between data: as the depth increases, the functions acquire more and more parameters, until the relationships can no longer be written as mathematical expressions and therefore cannot be explained. Although the breadth learning established here also uses deep learning methods, it additionally builds logical relationships between functions and attributes, and causal relationships between logic and facts, so that the machine algorithm can understand the causal relationship between an object and its environment by observing their correlation, and the relationship between object and environment can be explained logically.
Deep learning has strong perception ability but lacks decision-making ability, while reinforcement learning has decision-making ability but is helpless on perception problems; deep reinforcement learning combines the two so that their strengths complement each other, offering a way to handle perception and decision problems in complex systems. The anthropomorphic decision model inside the breadth learning algorithm established here includes a logic reinforcement process and a reverse model. Like deep reinforcement learning, the anthropomorphic decision model inherits deep reinforcement learning methods: it includes logic reinforcement, goal reinforcement, and strategy reinforcement, which borrow the strengths of deep reinforcement learning in that they can both perceive the state of the environment and provide the corresponding strategy in different situations. The difference is that the anthropomorphic decision model also has a reverse model, which re-evaluates the reinforcement learning and performs further feature recombination based on that evaluation.
Disclosure of the invention
The core inventive point of the present invention is to establish, on the basis of breadth learning, a method that lets a machine algorithm (an agent) recognize causal relationships by observing the correlations between an object and its environment, i.e. a method that realizes logical reasoning; further inventive points are the method of generating the logic and the method of verifying whether a logic is good or bad. "Object" and "environment" are terms commonly used in this field and their ordinary meanings are not repeated here.
For example, the machine algorithm can fully automatically analyze business data (the securities market in the examples of the present invention) and provide recommendations together with the logic behind them. The present invention solves two kinds of problems. 1. By setting or modifying a target, people can let the machine algorithm automatically mine and generate the causal relationship between the target and the logic and recommend the result to the user; this makes it very convenient to explore the causality of unknown data according to one's own logic and greatly reduces the trial-and-error cost of searching for the truth. 2. People can use the machine algorithm (an agent, or a robot) to check whether their own logic fits reality: when the algorithm cannot recommend a result that meets the person's expectations, this indicates that the person's logic may not fit the facts, and the person can then intervene in the logic part of the algorithm so that the machine's recommendations gradually meet the user's expectations and goals. Specifically, the present invention proposes a universal logical reasoning method for implementing an agent based on a breadth learning algorithm, comprising:
Step 1. Acquire the various types of environmental data corresponding to the object, where each type of environmental data includes multidimensional data or indicators, and obtain the attributes of each type of environmental data through feature extraction;
Step 2. Establish a logic layer for the attributes corresponding to each type of environmental data, so as to construct the logic reinforcement process and the reverse model; perform dynamic self-assessment, environment assessment, and logic assessment of the object's situation in the environment, and fuse the three assessment results into a new feature using a logistic regression algorithm;
Step 3. Construct a new object from the new feature, establish a causal relationship between the logic formed by the new object and the artificially set initial target, and evaluate that causal relationship and logic; once the causal relationship and the logic that forms it have been confirmed from the evaluation results, output the new object that satisfies the causal relationship as the recommended result of the logical reasoning.
In this universal logical reasoning method for implementing an agent based on a breadth learning algorithm, step 1 includes:
Step 11. Acquire the artificially set initial target and the environmental data, the environmental data including financial indicators, market indicators, news indicators, and macro indicators; use random environmental data as screening conditions to select multiple stocks that satisfy the conditions, and combine these stocks into an alpha index that serves as the imitation object;
Step 12. Merge the environmental data through an activation function to obtain the emotion index; add the environmental data to the artificially set initial target to obtain the agent's primary concept state; divide the return of the imitation object by the agent's primary concept state through an activation function to obtain the trade-off logic; and divide the trade-off logic by the return of the imitation object through an activation function to obtain the agent's initial target;
Step 13. Through an activation function, divide the agent's initial target by the difference between the return of the imitation object and the emotion index to obtain the targeted state value; through an activation function, take the agent's initial target, the targeted state value, the agent's new concept state, and the unperceived states as input to obtain the untargeted state value; through an activation function, divide the moving stop-loss distance of the imitation object by the targeted state value to obtain the agent's new concept state; through an activation function, divide the untargeted state value by the targeted state value to obtain the cumulative evolution result of step-by-step decision and reverse evaluation; through the relu function, divide the agent's initial target by the trade-off logic to obtain the output logic; and through an activation function, divide the output logic by the difference between the artificially set initial target and the actual return to obtain the situation;
Step 14. Collect the cumulative evolution result of step-by-step decision and reverse evaluation, the situation, and the agent's initial target as the attributes.
In this universal logical reasoning method for implementing an agent based on a breadth learning algorithm, the emotion index is calculated as:
MergeAllData = pd.merge(fin[36], market[32], mac[11], news[17])
EmotionIndex = tanh(weight*MergeAllData + bias)
where EmotionIndex is the emotion index, weight is the weight, fin[36], market[32], mac[11], and news[17] are the financial, market, macro, and news data respectively, bias is the bias, tanh is the activation function, and pd.merge() is the merge function;
The agent's primary concept state is calculated as:
idea = relu(weight*(EmotionIndex.reshape() + prospective) + bias)
where idea is the agent's primary concept state, prospective is the expected return, and relu is the activation function;
The trade-off logic is calculated as:
ar = sigmoid(weight*(reward/idea) + bias)
where ar is the trade-off logic, reward is the return of the imitation object in step 11, and sigmoid is the activation function;
AgentTarget = relu(weight*(ar/reward) + bias)
where AgentTarget is the agent's initial target, weight is the weight, ar is the trade-off logic, reward is the return of the alpha, bias is the bias, and relu is the activation function;
The targeted state value is:
Targeted = tanh(weight*(AgentTarget/(reward - EmotionIndex)) + bias)
where Targeted is the targeted state value and AgentTarget is the agent's initial target;
The untargeted state value is:
UnTargeted = sigmoid(weight*([AgentTarget, Targeted, NewIdea, un-recognized_n]) + bias)
where UnTargeted is the untargeted state value and un-recognized_n denotes the unknown data made up of the n-dimensional unperceived states;
The agent's new concept state is obtained by:
NewIdea = relu(weight*(Moveloss/Targeted) + bias)
where NewIdea is the agent's new concept state and Moveloss is the moving stop-loss distance of the imitation object;
The cumulative evolution result of step-by-step decision and reverse evaluation is the ratio of the untargeted state value to the targeted state value:
SapientState = relu(weight*(UnTargeted/Targeted) + bias)
where SapientState is the cumulative evolution result of step-by-step decision and reverse evaluation, weight is the weight, UnTargeted is the untargeted state value, Targeted is the targeted state value, bias is the bias, and relu is the activation function;
Calculation of the output logic: divide the expected return by the trade-off logic through the relu function;
OutputLogic = relu(weight*(prospective/ar) + bias)
where OutputLogic is the output logic, prospective is the expected return, and ar is the trade-off logic;
The situation represents the highest level of understanding of the environment and is calculated as:
Situation = tanh(weight*(OutputLogic/(prospective - reward)) + bias)
where Situation represents the situation.
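To make the chain of definitions above concrete, the following is a minimal NumPy sketch that evaluates the indicators in order; the weights, biases, and input values are illustrative placeholders rather than values from the invention, and the way the bracketed list in the UnTargeted expression is reduced to a scalar is an assumption.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def relu(x):
        return np.maximum(0.0, x)

    # Placeholder scalars standing in for the merged environment data and the targets.
    weight, bias = 0.8, 0.05      # a single illustrative weight and bias per expression
    merged_all_data = 0.4         # stand-in for pd.merge(fin, market, mac, news)
    prospective = 0.05            # expected return (the artificially set 5% target)
    reward = 0.0063               # return of the imitation alpha
    move_loss = 0.02              # moving stop-loss distance of the imitation object
    unrecognized_n = 0.1          # stand-in for the n-dimensional unperceived states

    emotion_index = np.tanh(weight * merged_all_data + bias)
    idea = relu(weight * (emotion_index + prospective) + bias)
    ar = sigmoid(weight * (reward / idea) + bias)
    agent_target = relu(weight * (ar / reward) + bias)
    targeted = np.tanh(weight * (agent_target / (reward - emotion_index)) + bias)
    new_idea = relu(weight * (move_loss / targeted) + bias)
    # The list [AgentTarget, Targeted, NewIdea, un-recognized_n] is reduced by summation
    # here; how the list is actually combined is not specified in the text.
    untargeted = sigmoid(weight * (agent_target + targeted + new_idea + unrecognized_n) + bias)
    sapient_state = relu(weight * (untargeted / targeted) + bias)
    output_logic = relu(weight * (prospective / ar) + bias)
    situation = np.tanh(weight * (output_logic / (prospective - reward)) + bias)
    print(sapient_state, situation, agent_target)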
In this universal logical reasoning method for implementing an agent based on a breadth learning algorithm, step 2 includes:
Step 21. Using the XGBoost algorithm, with the cumulative evolution result of step-by-step decision and reverse evaluation, the untargeted state value, the targeted state value, and the agent's initial target as outputs and the environmental data as input, screen out features as the self-assessment features; using the LightGBM algorithm, with the situation and the output logic as outputs and the environmental data as input, screen out the environment-assessment features; using the GradientBoosting algorithm, with the cumulative evolution result of step-by-step decision and reverse evaluation and the situation as outputs and the environmental data as input, screen out the output-logic features; and using a logistic regression algorithm, with the agent's initial target as output and the self-assessment features, environment-assessment features, and output-logic features as inputs, screen out the model-fused features as the new features;
Step 22. Score every stock according to the new features using a random forest algorithm, take the alpha index formed by the stocks with the highest scores, and replace the imitation object of step 11 according to the update rule between the new alpha combination and the imitation-object alpha (a sketch of this scoring follows step 23 below);
Step 23. Repeat steps 11 to 22 until the cumulative evolution result of step-by-step decision and reverse evaluation and the situation converge at the same time; then output the new imitation object, the combination of conditions that forms it, and the causal relationship between the imitation object and the logic.
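As a minimal sketch of the scoring in step 22 and the loop condition of step 23, assuming scikit-learn's RandomForestRegressor, a placeholder feature matrix, and a simple top-20 selection; the update rule between the new alpha combination and the imitation-object alpha is not reproduced here.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    n_stocks = 3345
    X = rng.normal(size=(n_stocks, 3))   # per-stock fused new features (placeholder)
    y = rng.normal(size=n_stocks)        # per-stock target used for scoring (placeholder)

    # Step 22: score every stock with a random forest and keep the 20 highest-scoring
    # stocks as the new alpha combination.
    scores = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y).predict(X)
    new_alpha = np.argsort(scores)[-20:]

    # Step 23 (schematic): iterate until SapientState and Situation have both converged.
    def both_converged(sapient_history, situation_history, tol=1e-4):
        return (len(sapient_history) > 1 and len(situation_history) > 1
                and abs(sapient_history[-1] - sapient_history[-2]) < tol
                and abs(situation_history[-1] - situation_history[-2]) < tol)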
In this universal logical reasoning method for implementing an agent based on a breadth learning algorithm, the process of screening the self-assessment features includes:
NewFeature1 = XGBClassifier(input(fin[36], market[32], mac[11], news[17]), output[SapientState, UnTargeted, Targeted, AgentTarget])
where NewFeature1 is the self-assessment feature, XGBClassifier is the classification function, and SapientState, UnTargeted, Targeted, and AgentTarget are, respectively, the cumulative evolution result of step-by-step decision and reverse evaluation, the untargeted state value, the targeted state value, and the agent's initial target;
The process of screening the environment-assessment features includes:
NewFeature2 = LightGBMClassifier(input(fin[36], market[32], mac[11], news[17]), output[Situation, OutputLogic])
where NewFeature2 is the environment-assessment feature, LightGBMClassifier is the classification function, Situation is the situation, and OutputLogic is the output logic;
The process of screening the output-logic features includes:
NewFeature3 = GradientBoostingClassifier(input(fin[36], market[32], mac[11], news[17]), output[SapientState, Situation])
where NewFeature3 is the output-logic feature and GradientBoostingClassifier is the classification function;
The new features are screened out from the environment-assessment features, the output-logic features, and the self-assessment features:
CombinedFeature = LRClassifier(input[NewFeature1, NewFeature2, NewFeature3], output[AgentTarget])
where CombinedFeature is the new feature and LRClassifier is the logistic regression function.
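Read as ordinary supervised models, the screening above can be sketched with scikit-learn, XGBoost, and LightGBM as follows; the random data, the discretization of SapientState, Situation, and AgentTarget into binary labels, and the use of a single label per booster are simplifying assumptions, not part of the invention.
    import numpy as np
    from xgboost import XGBClassifier
    from lightgbm import LGBMClassifier
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 500
    env = rng.normal(size=(n, 36 + 32 + 11 + 17))      # fin[36] + market[32] + mac[11] + news[17]
    sapient_state = (rng.random(n) > 0.5).astype(int)  # discretized SapientState (assumed label)
    situation = (rng.random(n) > 0.5).astype(int)      # discretized Situation (assumed label)
    agent_target = (rng.random(n) > 0.5).astype(int)   # discretized AgentTarget (assumed label)

    # Self-assessment, environment-assessment and output-logic features: each booster maps
    # the raw environment data to one of the agent's internal indicators.
    new_feature_1 = XGBClassifier(n_estimators=50).fit(env, sapient_state).predict_proba(env)[:, 1]
    new_feature_2 = LGBMClassifier(n_estimators=50).fit(env, situation).predict_proba(env)[:, 1]
    new_feature_3 = GradientBoostingClassifier(n_estimators=50).fit(env, situation).predict_proba(env)[:, 1]

    # Logistic-regression fusion: the three screened features predict the agent's initial target.
    stacked = np.column_stack([new_feature_1, new_feature_2, new_feature_3])
    combined_feature = LogisticRegression().fit(stacked, agent_target).predict_proba(stacked)[:, 1]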
The present invention also proposes a universal logical reasoning system for implementing an agent based on a breadth learning algorithm, comprising:
Module 1, which acquires the various types of environmental data corresponding to the object, where each type of environmental data includes multidimensional data or indicators, and obtains the attributes of each type of environmental data through feature extraction;
Module 2, which establishes a logic layer for the attributes corresponding to each type of environmental data, so as to construct the logic reinforcement process and the reverse model, performs dynamic self-assessment, environment assessment, and logic assessment of the object's situation in the environment, and fuses the three assessment results into a new feature using a logistic regression algorithm;
Module 3, which constructs a new object from the new feature, establishes a causal relationship between the logic formed by the new object and the artificially set initial target, and evaluates that causal relationship and logic; once the causal relationship and the logic that forms it have been confirmed from the evaluation results, the new object that satisfies the causal relationship is output as the recommended result of the logical reasoning.
In this logical reasoning system for implementing an agent based on a breadth learning algorithm, module 1 includes:
Module 11, which acquires the artificially set initial target and the environmental data, the environmental data including financial indicators, market indicators, news indicators, and macro indicators, uses random environmental data as screening conditions to select multiple stocks that satisfy the conditions, and combines these stocks into an alpha index that serves as the imitation object;
Module 12, which merges the environmental data through an activation function to obtain the emotion index, adds the environmental data to the artificially set initial target to obtain the agent's primary concept state, divides the return of the imitation object by the agent's primary concept state through an activation function to obtain the trade-off logic, and divides the trade-off logic by the return of the imitation object through an activation function to obtain the agent's initial target;
Module 13, which, through activation functions, divides the agent's initial target by the difference between the return of the imitation object and the emotion index to obtain the targeted state value; takes the agent's initial target, the targeted state value, the agent's new concept state, and the unperceived states as input to obtain the untargeted state value; divides the moving stop-loss distance of the imitation object by the targeted state value to obtain the agent's new concept state; divides the untargeted state value by the targeted state value to obtain the cumulative evolution result of step-by-step decision and reverse evaluation; divides the agent's initial target by the trade-off logic through the relu function to obtain the output logic; and divides the output logic by the difference between the artificially set initial target and the actual return to obtain the situation;
Module 14, which collects the cumulative evolution result of step-by-step decision and reverse evaluation, the situation, and the agent's initial target as the attributes.
In this logical reasoning system for implementing an agent based on a breadth learning algorithm, the emotion index is calculated as:
MergeAllData = pd.merge(fin[36], market[32], mac[11], news[17])
EmotionIndex = tanh(weight*MergeAllData + bias)
where EmotionIndex is the emotion index, weight is the weight, fin[36], market[32], mac[11], and news[17] are the financial, market, macro, and news data respectively, bias is the bias, tanh is the activation function, and pd.merge() is the merge function.
The agent's primary concept state is calculated as:
idea = relu(weight*(EmotionIndex.reshape() + prospective) + bias)
where idea is the agent's primary concept state, prospective is the expected return, and relu is the activation function;
The trade-off logic is calculated as:
ar = sigmoid(weight*(reward/idea) + bias)
where ar is the trade-off logic, reward is the return of the imitation object in module 11, and sigmoid is the activation function;
AgentTarget = relu(weight*(ar/reward) + bias)
where AgentTarget is the agent's initial target, weight is the weight, ar is the trade-off logic, reward is the return of the alpha, bias is the bias, and relu is the activation function;
The targeted state value is:
Targeted = tanh(weight*(AgentTarget/(reward - EmotionIndex)) + bias)
where Targeted is the targeted state value and AgentTarget is the agent's initial target;
The untargeted state value is:
UnTargeted = sigmoid(weight*([AgentTarget, Targeted, NewIdea, un-recognized_n]) + bias)
where UnTargeted is the untargeted state value and un-recognized_n indicates that there are n kinds of unperceived data; this data is not part of the initial environmental data but lies outside it, and un-recognized_n > 1 proves that at least one kind of unperceived data has not entered the environmental data;
The agent's new concept state is obtained by:
NewIdea = relu(weight*(Moveloss/Targeted) + bias)
where NewIdea is the agent's new concept state and Moveloss is the moving stop-loss distance of the imitation object;
The cumulative evolution result of step-by-step decision and reverse evaluation is the ratio of the untargeted state value to the targeted state value:
SapientState = relu(weight*(UnTargeted/Targeted) + bias)
where SapientState is the cumulative evolution result of step-by-step decision and reverse evaluation, weight is the weight, UnTargeted is the untargeted state value, Targeted is the targeted state value, bias is the bias, and relu is the activation function;
Calculation of the output logic: divide the expected return by the trade-off logic through the relu function;
OutputLogic = relu(weight*(prospective/ar) + bias)
where OutputLogic is the output logic, prospective is the expected return, and ar is the trade-off logic;
The situation represents the highest level of understanding of the environment and is calculated as:
Situation = tanh(weight*(OutputLogic/(prospective - reward)) + bias)
where Situation represents the situation.
In this logical reasoning system for implementing an agent based on a breadth learning algorithm, module 2 includes:
Module 21, which, using the XGBoost algorithm with the cumulative evolution result of step-by-step decision and reverse evaluation, the untargeted state value, the targeted state value, and the agent's initial target as outputs and the environmental data as input, screens out features as the self-assessment features; using the LightGBM algorithm with the situation and the output logic as outputs and the environmental data as input, screens out the environment-assessment features; using the GradientBoosting algorithm with the cumulative evolution result of step-by-step decision and reverse evaluation and the situation as outputs and the environmental data as input, screens out the output-logic features; and using a logistic regression algorithm with the agent's initial target as output and the self-assessment features, environment-assessment features, and output-logic features as inputs, screens out the model-fused features as the new features;
Module 22, which scores every stock according to the new features using a random forest algorithm, takes the alpha index formed by the stocks with the highest scores, and replaces the imitation object of module 11 according to the update rule between the new alpha combination and the imitation-object alpha;
Module 23, which repeats modules 11 to 22 until the cumulative evolution result of step-by-step decision and reverse evaluation and the situation converge at the same time, then outputs the new imitation object, the combination of conditions that forms it, and the causal relationship between the imitation object and the logic.
In this logical reasoning system for implementing an agent based on a breadth learning algorithm, the process of screening the self-assessment features includes:
NewFeature1 = XGBClassifier(input(fin[36], market[32], mac[11], news[17]), output[SapientState, UnTargeted, Targeted, AgentTarget])
where NewFeature1 is the self-assessment feature, XGBClassifier is the classification function, and SapientState, UnTargeted, Targeted, and AgentTarget are, respectively, the cumulative evolution result of step-by-step decision and reverse evaluation, the untargeted state value, the targeted state value, and the agent's initial target;
The process of screening the environment-assessment features includes:
NewFeature2 = LightGBMClassifier(input(fin[36], market[32], mac[11], news[17]), output[Situation, OutputLogic])
where NewFeature2 is the environment-assessment feature, LightGBMClassifier is the classification function, Situation is the situation, and OutputLogic is the output logic;
The process of screening the output-logic features includes:
NewFeature3 = GradientBoostingClassifier(input(fin[36], market[32], mac[11], news[17]), output[SapientState, Situation])
where NewFeature3 is the output-logic feature and GradientBoostingClassifier is the classification function;
The new features are screened out from the environment-assessment features, the output-logic features, and the self-assessment features:
CombinedFeature = LRClassifier(input[NewFeature1, NewFeature2, NewFeature3], output[AgentTarget])
where CombinedFeature is the new feature and LRClassifier is the logistic regression function.
The technical effects of the present invention include:
1: A higher degree of automation
Feature engineering used to be done by hand. The reverse model in breadth learning removes the manual feature-engineering step: the machine algorithm updates and iterates automatically, performing feature engineering and continuously optimizing the features, which greatly speeds up modeling and model verification. In addition, since the present invention is applied to the securities market, the method also speeds up the discovery of market opportunities.
2: Better explanation of the dependencies between data
In the present invention, the logic reinforcement process establishes the logical relationships between data and functions, and the reverse model establishes the causal relationships between data and logic. These two relationships imitate the way people think, for example the cognition of the self, the cognition of the environment, and even the evaluation of self and environment; this way of simulating human thinking greatly helps people explore the interdependencies between things.
3: A lower barrier to using AI
Engineers doing AI development today have to write a great deal of code, much of it repetitive, and without experience in setting hyperparameters the model can easily overfit. With the breadth learning algorithm proposed by the present invention, one only has to give the machine algorithm an imitation object and set the desired target; the rest can be handed over to the agent. The invention therefore not only simplifies the complicated process and steps of manual data mining but also lowers the barrier to using AI, so that non-professionals can quickly design models that meet their own needs.
4: A higher degree of computational parallelism
Comparing Figure 1 and Figure 2 shows that the breadth learning network can be computed in parallel: it can run different neural network computations on different environmental data at the same time, and then, through the reverse model, keep iterating and optimizing in a targeted way to mine the relationships between the new features and the target. This network structure supports parallel computation, which greatly speeds up the calculation and reduces the number of iterations.
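As a hedged illustration of this parallelism (not the invention's actual implementation), the models for the four kinds of environmental data could be trained concurrently, for example with Python's concurrent.futures; the data, labels, and model choice below are placeholders.
    import numpy as np
    from concurrent.futures import ProcessPoolExecutor
    from sklearn.ensemble import GradientBoostingClassifier

    rng = np.random.default_rng(0)
    blocks = {name: rng.normal(size=(200, dim))              # placeholder samples per data type
              for name, dim in [("fin", 36), ("market", 32), ("mac", 11), ("news", 17)]}
    labels = (rng.random(200) > 0.5).astype(int)              # placeholder discretized target

    def fit_block(item):
        name, X, y = item
        return name, GradientBoostingClassifier(n_estimators=20).fit(X, y)

    if __name__ == "__main__":
        items = [(name, X, labels) for name, X in blocks.items()]
        with ProcessPoolExecutor() as pool:
            models = dict(pool.map(fit_block, items))         # one model per data type, fitted in parallel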
Brief description of the drawings
Figure 1 is a flow chart of the breadth learning used in the prior art;
Figure 2 is a flow chart of the breadth learning used in the present invention;
Figure 3 is a diagram of the alpha combination in an embodiment of the present invention;
Figure 4 is a comparison of results for an embodiment of the present invention;
Figure 5 shows how the cumulative evolution result of step-by-step decision and reverse evaluation changes with the number of iterations in an embodiment of the present invention;
Figure 6 shows how the situation changes with the number of iterations in an embodiment of the present invention;
Figure 7 shows the return as the number of iterations increases in an embodiment of the present invention;
Figure 8 is a working flow chart of the breadth learning of the present invention;
Figure 9 is a diagram of the relationship between data and functions;
Figure 10 is a flow chart of the logic reinforcement of the present invention;
Figure 11 is a diagram of the functional connection between variables x and y in the prior art;
Figure 12 is a diagram of the deep learning connection between variables x and y in the prior art;
Figure 13 is a diagram of the functional connection between variables x and y in the present invention;
Figure 14 is a diagram of the deep learning connection between variables x and y in the present invention;
Figure 15 is a flow chart of the reverse model of the present invention;
Figure 16 is an overall flow chart of the present invention;
Figure 17 is a network structure diagram of the reverse model of the present invention;
Figure 18 is a calculation flow chart of the reverse model of the present invention;
Figure 19 is a data flow diagram of how the new features form a new alpha combination in the present invention;
Figure 20 shows how the value of the agent's new concept state changes in another embodiment of the present invention;
Figure 21 shows how the return, the situation, and the cumulative evolution result of step-by-step decision and reverse evaluation change in another embodiment of the present invention;
Figure 22 shows how the maximum drawdown changes in another embodiment of the present invention.
Best mode for carrying out the invention
To make the above features and effects of the present invention clearer and easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
The implementation of the framework of the present invention proceeds as follows. First, the logic is generated and the logic reinforcement process is run: multiple multilayer neural networks are built from data of different dimensions (a breadth learning network composed of several deep learning networks), and the logical relationship between data and functions is established by connecting the outputs of n functions as the inputs of a logic layer, i.e. the relationship between the data on different dimensions and the logic (the present invention uses a classification algorithm to establish this relationship between data and logic; besides classification, the logical relationship could also be composed from other algorithms). Next, the quality of the logic is verified. The reverse model first performs feature engineering automatically to form new features; the new features are then used to generate a new imitation object, which is compared with the old imitation object. The purpose of the comparison is to check the gap between the result generated by the machine algorithm and the imitation object: the smaller the gap, the better the imitation. Because the artificially set initial target is included, the gap to check is the gap after adding that initial target, i.e. the gap between the result generated by the machine algorithm and the artificial initial target minus the imitation object. A shrinking gap shows that the machine algorithm is effective; a growing gap means either that the target is unsuitable or that the environmental data is insufficient, and this is mainly reflected in the indicator called the agent's new concept state. The imitation object generated from the new features is the result under the new logic, and finally it is verified whether there is a causal relationship between the result under this logic and the target. If there is, the logic fits reality; it is retained and iteration continues to generate new features, so that the machine algorithm keeps approaching the target the person expects through self-optimization. If there is not, iteration also continues to generate new logic and verify it further; if after many iterations there really is no causal relationship between the logic and the imitation object, the agent's new concept state is output, which indicates where the problem lies. In other words, the criterion for judging whether a logic is good is whether a causal relationship exists; in the present invention the causal relationship is whether there is a positive correlation between the logic and the result.
The detailed and complete process of the present invention is as follows.
First, a multi-level, multi-dimensional environment is constructed. In the present invention the multi-level data divides the securities-market data into four categories: macro indicators, financial indicators, market indicators, and news indicators, see Table 1 below:
Table 1 (reproduced as an image in the original publication; it lists the four categories of indicators named above).
Each level of data is further divided into data of multiple dimensions. The macro indicators have 11 dimensions with 10 years of data, denoted mac[11*120]; the financial indicators have 36 dimensions, four quarters a year for 10 years and 3345 stocks, denoted fin[36*40*3345]; the market indicators have 32 dimensions over 10 years and 3345 stocks, denoted market[32*2250*3345]; and the news indicators have 17 dimensions with 5 years of data, sampled every 3 days, for 3345 stocks, denoted news[17*600*3345]. See Table 2:
Table 2 (reproduced as an image in the original publication; it lists the dimensions and shapes of the four data blocks described above).
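As a hedged illustration of these shapes, the following sketch allocates placeholder arrays with NumPy; the sampling frequency of each axis and the exact layout are assumptions based on the counts in the text.
    import numpy as np

    N_STOCKS = 3345
    mac = np.zeros((11, 120))                # 11 macro indicators over 10 years (assumed monthly points)
    fin = np.zeros((36, 40, N_STOCKS))       # 36 financial indicators, 40 quarters, per stock
    market = np.zeros((32, 2250, N_STOCKS))  # 32 market indicators, ~2250 trading days, per stock
    news = np.zeros((17, 600, N_STOCKS))     # 17 news indicators, one point every 3 days over 5 years, per stock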
An object that the agent can imitate, an alpha combination, together with an artificially set initial target, is then constructed from this environment; the imitation object may be the target itself or an approximation of the desired target.
In this embodiment the imitation object is a randomly selected alpha combination, see Figure 3. It consists of 20 stocks, with a return of 0.0063 and a Sharpe ratio of 0.000089.
The artificially set initial target is then set to a return of 5% and a Sharpe ratio of 3, and the agent learns by itself and iterates, 10,000 iterations in total. At 3,000, 5,000, and 10,000 iterations the new alpha combinations recommended by the agent are output, denoted alpha-3, alpha-5, and alpha-10, made into indices, and compared on the same axes with the imitation-object alpha combination; the result, shown in Figure 4, indicates that the return of the combination rises gradually as the number of iterations increases. From the output data, the return and Sharpe ratio of these four combinations are also calculated, see Table 3 below:
Table 3 (reproduced as an image in the original publication; it gives the return and Sharpe ratio of the imitation alpha and of alpha-3, alpha-5, and alpha-10).
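As a minimal sketch of how the return and Sharpe ratio of such an alpha combination can be computed from daily portfolio returns, assuming NumPy, roughly 250 trading days, and a zero risk-free rate; the series below is a random placeholder, not the data behind Table 3.
    import numpy as np

    rng = np.random.default_rng(0)
    daily_returns = rng.normal(0.0003, 0.01, 250)   # placeholder daily returns of the 20-stock alpha

    total_return = np.prod(1 + daily_returns) - 1   # cumulative return over the period
    sharpe = daily_returns.mean() / daily_returns.std() * np.sqrt(250)  # annualized Sharpe ratio
    print(total_return, sharpe)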
It can be seen from Table 3 that as the number of iterations increases, both the return and the Sharpe ratio of the alpha rise steadily; although the artificial initial target is not reached, the machine algorithm is indeed moving step by step towards the target that was set. How does the agent learn and improve itself? The present invention first uses a deep learning neural network to establish the relationship between data and functions, see Figure 9. The general expression of this network is
y_hat = σ(weight*x + bias)
where y_hat is the predicted value, weight is the weight, x is the input, bias is the bias, and σ is the activation function (also called the excitation function). Three activation functions are used in the present invention, as follows:
Sigmoid function: f(x) = 1/(1 + e^(-x))
Tanh function: f(x) = (e^x - e^(-x))/(e^x + e^(-x))
Relu function: f(x) = max(0, x)
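As an illustrative NumPy rendering of this general expression (the input, weights, and bias below are placeholders, and tanh stands in for σ):
    import numpy as np

    x = np.array([0.2, -0.1, 0.4])        # input vector (placeholder)
    weight = np.array([0.5, 0.3, -0.2])   # weights (placeholder)
    bias = 0.1
    y_hat = np.tanh(weight @ x + bias)    # sigma is tanh here; sigmoid or relu could be substituted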
Breadth learning then builds its own logic and evaluation system. The process of building the logic is called the logic reinforcement process; the evaluation system is called here the anthropomorphic decision model. Logic reinforcement includes goal reinforcement and strategy reinforcement, and the anthropomorphic decision model has four parts: self-assessment, environment assessment, logic assessment, and the reverse model. Consider first the logic reinforcement process, shown in Figure 10. Put simply, logic reinforcement is the process of abstracting logic: forming new logic requires building a logic layer. A logic layer is built from functions, and the relationship between the variables and the independent variable is then explored through that layer, because when people explore the relationships between events they look for logical relationships rather than functional ones. To establish the agent's logical thinking, the present invention therefore first constructs its logic layer. How? In the prior art, shown in Figures 11 and 12, the variable x and the independent variable y are connected through a function or, with deep learning, through a neural network; the scheme of the present invention is shown in Figures 13 and 14. The difference is that the present invention builds a logic layer in between. The logic layer can be seen as a fusion of several functions: its inputs are the outputs of the individual functions, and its output corresponds to the independent variable y. Here y is the imitation object, but the imitation object is not just the return of the alpha; it also includes the artificially set initial target that enters the y value through the trade-off logic, i.e. the initial target value defined for the machine algorithm (the agent). For the logic functions of the logic layer, activation functions are used, because the value after an activation function lies either between -1 and 1 or between 0 and 1, so the activated value can be treated as the logic function of a given attribute. In the present invention two logic layers are built: the goal-reinforcement logic layer and the strategy-reinforcement logic layer. After 3,000, 5,000, and 10,000 iterations, the specific parameters are shown in Table 4 below:
Table 4 (reproduced as an image in the original publication)
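As a rough illustration of the logic-layer idea described above, the sketch below (all names and values are illustrative, not taken from the filing) passes the outputs of several ordinary functions of the environment data through an activation function, so that each activated value can be read as the logic value of an attribute:

import numpy as np

def logic_layer(function_outputs, weight=1.0, bias=0.0):
    # Fuse the outputs of the underlying functions; tanh keeps each
    # logic value in (-1, 1) so it can be read as an attribute's logic score
    return np.tanh(weight * np.asarray(function_outputs) + bias)

# Toy per-attribute functions of the raw environment data
x = np.array([0.2, -1.3, 0.7])
function_outputs = [x.sum(), x.mean(), x.max()]

logic_values = logic_layer(function_outputs)   # these feed the mapping to y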
Analyzing Table 4, we find that the two core indicators extracted from goal reinforcement and strategy reinforcement, namely the cumulative evolution result of step-by-step decision-making and reverse evaluation, and the situation, also improve gradually. Goal reinforcement and strategy reinforcement are each a process, and different indicators are produced in these processes: the output logic and the situation in Table 4 are produced by the strategy reinforcement process, while the agent's initial goal, the trade-off logic, the state value with a clear goal, the state value with an unclear goal and the cumulative evolution result of step-by-step decision-making and reverse evaluation are produced by the goal reinforcement process. We also output the scores of the cumulative evolution result of step-by-step decision-making and reverse evaluation and of the situation, shown in Figures 5 and 6. Although the situation declines at the beginning, it gradually stabilizes later, and when these values flatten out the alpha return also reaches its high point. Extracting these three values gives Table 5a below:
Table 5a (reproduced as an image in the original publication)
The cumulative evolution result of step-by-step decision-making and reverse evaluation represents the highest level of the machine algorithm's knowledge of itself, and the situation represents the highest level of the machine algorithm's knowledge of the environment. When these two values have both converged after continuous iteration, the return rate of the stock portfolio selected by the machine algorithm also reaches its highest point; see Figure 7. The figure shows more intuitively that, after 3,000, 5,000 and 10,000 iterations, the cumulative evolution result of step-by-step decision-making and reverse evaluation and the situation are positively correlated with the alpha return rate. We also output, after each iteration, the correlation coefficient matrix between the new alpha's return and the cumulative evolution result of step-by-step decision-making and reverse evaluation and the situation, shown in Table 5b below:
Table 5b (reproduced as an image in the original publication)
From the table we can confirm that the return rate is positively correlated with the cumulative evolution result of step-by-step decision-making and reverse evaluation and with the situation. (As a rule of thumb for the strength of a correlation coefficient, after taking the absolute value: 0-0.09 indicates no correlation, 0.1-0.3 weak correlation, 0.3-0.5 moderate correlation, and 0.5-1.0 strong correlation.)
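A correlation matrix of this kind can be computed, for example, with pandas; the column names and values below are purely illustrative:

import pandas as pd

# Hypothetical per-iteration records: new alpha return, SapientState, Situation
df = pd.DataFrame({
    "alpha_return":  [0.012, 0.018, 0.025, 0.031],
    "sapient_state": [0.40, 0.46, 0.52, 0.55],
    "situation":     [0.33, 0.39, 0.47, 0.50],
})

corr = df.corr()          # Pearson correlation coefficient matrix
print(corr.abs() >= 0.5)  # flag the strong correlations per the rule of thumb above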
At this point we have established a method by which a machine algorithm (an agent) recognizes causality by observing the correlation between an object and the environment (in this case alpha is the object, and to judge the relationship between them there must also be an "I"; the cumulative evolution result of step-by-step decision-making and reverse evaluation is the embodiment of this "I"). The method is a framework built with the breadth learning algorithm by simulating the brain's thinking process; it allows the agent (machine algorithm) to recognize causality through observation and thereby perform logical reasoning. Logical reasoning here means: if such a causal relationship exists, the logic by which the machine algorithm selects stocks is feasible; if it does not exist, the algorithm can infer that either the goal is unreasonable or the data are incomplete. In addition, this method explains the interdependence (causality) between data better than earlier deep learning methods, which cannot explain it. Deep learning cannot explain the interdependence between data because it only establishes functional relationships between them; as the depth increases, the parameters between the functions become so numerous that the relationships can no longer be expressed by a mathematical formula and therefore cannot be explained. Although the breadth learning we have built also uses deep learning methods, it additionally establishes logical relationships between functions and attributes, and causal relationships between logic and facts, so that the "agent" can understand causality by observing the correlation between the object and the environment. In fact, the more important role of this method is not merely to understand the causal relationship between object and fact, but to explore that causal relationship by modifying the logical relationships involved. For example, consider Table 6 below:
Table 6 (reproduced as an image in the original publication)
This table shows that, after continuous iteration, the initial alpha's stock-selection conditions have changed. Originally, two conditions were taken from each of the four data categories as stock-selection conditions; after 3,000, 5,000 and 10,000 iterations, the selection conditions within each data category have changed, as shown in Table 7 below:
Table 7 (reproduced as an image in the original publication)
The reason for the change is that the reverse model of the present invention can recombine features automatically.
Furthermore, if a stock-selection condition is changed, for example if a user considers the price-to-earnings ratio and the price-to-book ratio to be similar indicators and wants to replace the price-to-book ratio with total liabilities, the machine algorithm can automatically re-test, following the logic of the human intervention, whether a causal relationship exists under that logic. Because a person can intervene in the intermediate selection conditions, this is equivalent to the person merely modifying the logical relationships mined by the machine algorithm in order to test the causal relationship the person expects. In other words, a person only needs to tell the machine algorithm his or her logic, and the algorithm can automatically explore whether a causal relationship exists under that logic (the causal relationship here is the positive correlation shown in Figure 4). If such a causal relationship exists, the logic given by the person is appropriate; if not, the logic may be inappropriate, and the machine algorithm gives one of two prompts: either the causal relationship does not exist, i.e. the logic does not match the facts, or the goal initially set by the person is inappropriate. The prompted indicator is the agent's new concept state (an indicator within goal reinforcement). If this indicator grows larger and larger, it means the human-given logic may have no causal relationship in this batch of data and new environment data need to be introduced; if the agent's new concept state becomes smaller and smaller, it means the initially set goal is inappropriate. In other words, we have given a set of criteria for judging whether a logic is good, namely whether a causal relationship exists. In this case, whether the logic is good is judged by whether a positive correlation appears between the return and the cumulative evolution result of step-by-step decision-making and reverse evaluation and the situation: if there is a positive correlation, the logic given by the person is credible; otherwise one must check whether the goal is appropriate and whether the environment data are sufficient. To sum up, the significance of breadth learning is that it can automatically explore the causality of unknown data according to human logic, reducing the time and cost of exploring the truth; at the same time it changes all previous recommendation algorithms from passive recommendation (recommendation by the machine algorithm) to active recommendation (recommendation with human intervention).
To further illustrate the application of the breadth learning algorithm to logical reasoning, let us change the goal and see how the machine algorithm performs logical reasoning through causality. This time we artificially set a target return of 5% with a maximum drawdown of 5%, and we still start the iteration with the original alpha as the imitation object. After 10,000 iterations, we first look at how the value of the agent's new concept state changes. As shown in Figure 20, the value of the agent's new concept state fluctuates widely back and forth instead of settling into a narrow band, indicating that the pressure value is large and the goal is difficult to achieve. According to our tests, the value of the agent's new concept state is most effective when it fluctuates around 0.5: a value below 0.5 tending towards 0 usually indicates that the goal is designed unreasonably, while a value above 0.5 tending towards 1 indicates that the imitation object is given unreasonably, i.e. the environment data are insufficient. In addition, when the average fluctuation range of the agent's new concept state is within 0.2 the goal converges easily; beyond a fluctuation range of 0.3 it is generally difficult to approach the goal, and neither the situation nor the cumulative evolution result of step-by-step decision-making and reverse evaluation converges easily. The fluctuation range here is 0.2995. Looking next at the return rate, shown in Figure 21, it not only fails to converge but fluctuates considerably, and the situation and the cumulative evolution result of step-by-step decision-making and reverse evaluation are not proportional to the return. Figure 22 further shows that the alpha's maximum drawdown keeps fluctuating within a wide range and even shows signs of increasing. From all these signs we judge that the artificially set goal cannot be reached, at least not by alpha within the period we selected. The logical reasoning of the machine algorithm (agent) is as follows: first, judge whether a causal relationship exists; from the data above there is none, and the absence of a causal relationship means the agent cannot combine these conditional factors into an imitation object close to the human initial goal. Second, look at the value range of the agent's new concept state: a value below 0.5 tending towards 0 means the goal is difficult to achieve.
In summary, the indicator measuring the logic and the indicator evaluating the portfolio return rise and fall together (there is a positive correlation), which shows that there is a causal relationship between the logic as cause and the actual result as effect; the agent's new concept state is the inference model for deriving this causal relationship, and what it explores is the causal relationship between the conditional factors and the target variable.
The framework of breadth learning:
The framework comprises logic reinforcement and the anthropomorphic decision model. Logic reinforcement includes goal reinforcement and strategy reinforcement; the anthropomorphic decision model includes four parts: self-assessment, environment assessment, logic assessment and the reverse model. Figure 8 illustrates the overall workflow of the breadth learning framework:
First, the anthropomorphic decision model makes a judgment according to the situation of the environment, that is, it outputs a set of logic and observes the feedback (reward) given by the environment. If the feedback meets the human initial goal, the result is output; if the feedback falls short of the human initial goal, the output logic feeds the reward back to the anthropomorphic decision model in reverse, the decision model produces a new set of output logic based on the previous output logic and the environment feedback, and the new output logic is then verified in the environment again (strategy). If it meets expectations, the result is output; if not, the above process is repeated. If expectations can never be met, the breadth learning algorithm produces the agent's new concept state, which indicates either that the environment data cannot satisfy the goal or that the target expectation does not fit the actual environment.
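A highly simplified Python sketch of this loop is given below; every name (environment, decision_model and their methods) is a placeholder for the components described above, not an implementation from the filing:

def breadth_learning_loop(environment, decision_model, human_goal, max_iter=10_000):
    situation = environment.observe()
    for _ in range(max_iter):
        output_logic = decision_model.decide(situation)   # output a set of logic
        reward = environment.evaluate(output_logic)       # feedback from the environment
        if reward >= human_goal:                          # feedback meets the human initial goal
            return output_logic
        # otherwise feed the reward back and produce a new set of output logic
        decision_model.reverse_update(output_logic, reward)
        situation = environment.observe()
    # expectations were never met: surface the agent's new concept state as a prompt
    return decision_model.new_concept_state()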
The advantages of the breadth learning framework are the following six points.
1. Breadth learning achieves small-sample generalization:
In the present invention the agent is given only one alpha portfolio for imitation learning at the start; the agent can then disassemble the indicators that form the imitated alpha portfolio and, by iterating continuously towards the goal, generate multiple new alpha portfolios. It is as if a child is first given a Lego set shaped as a small car, and later disassembles the car and reassembles it into a new car or a toy of another shape. The alpha portfolio in the present invention corresponds to the toy car, and the modules that make up the car are the environment data that make up the alpha. The framework based on the breadth learning algorithm can achieve small-sample generalization, whereas previous deep learning needs big data to generalize.
2. The data-flow path of current deep learning is designed in advance, and the data must travel the entire path; in effect, people instill knowledge into the agent instead of letting the agent explore by itself. The data-flow path of the breadth learning used in the present invention is not designed manually; the machine itself automatically finds and "walks" the optimal path. In other words, breadth learning truly realizes the agent's self-exploration: it can evaluate the quality of a logic by exploring causality, or modify the logic to reach the desired result. This makes it very convenient for people to explore unknown causal relationships according to their own logic and greatly reduces the trial-and-error cost of exploring the truth.
3. Current deep learning optimizes the best parameters through iteration to perform image recognition, speech recognition or natural language understanding. Our breadth learning instead establishes the logical relationship between the data and the attributes of the object, then establishes the causal relationship between the object and the environment, evaluates the surrounding environment, explores the relationship between logic and causality through continuous iteration, and finally achieves the expected goal.
4. In past computer technology we explored the relationship between variables and independent variables, and the independent variable is usually an external, objective datum; but when people make decisions they often do not decide on the basis of one or a few objective data, but according to their own needs. In the present invention we simulate an agent by simulating the human way of thinking and let this agent have its own goal, which is not an external objective datum. That is, we do not use the artificially set goal (in the present invention the human initial goal is the return rate and the Sharpe ratio) as the agent's goal, nor do we test the agent's logic with artificially set logic. Although we set a human initial goal, the goal produced by the machine algorithm in the present invention is generated by the agent itself, and the agent can automatically explore ways of reaching the goal in accordance with human expectations. Therefore, in the present invention we must first establish the agent's goal, which we call the agent's initial goal, and then establish the goal reinforcement process that forms the cumulative evolution result of step-by-step decision-making and reverse evaluation, among other quantities.
5. The reverse model in the framework. As shown in Figure 15, the role of the reverse model is automated feature engineering, divided into four parts: feature recombination for self-assessment, feature recombination for environment assessment, feature recombination for output-logic assessment, and feature fusion. The reverse model ultimately forms a series of new features, and these features produce new logic. Because the logic has changed, the same program can output different tensor-flow graphs (a tensor-flow graph is the path traveled by the data), and different tensor-flow graphs represent different logics. In the present invention this means that, after passing through the reverse model, the previous stock-selection logic has changed and a new stock-selection strategy is produced; this strategy is logic generated by the agent itself rather than stock-selection strategies given by people under different conditions. Technically, different tensor-flow graphs represent different paths, different paths represent different selection criteria, and different selection criteria represent different selection logics; in the present invention the different tensor graphs are different stock-selection logics, which realizes the process of intelligent logic.
6. Traditional model fusion still looks for the relationship between x and y, whereas the model fusion of the reverse model not only looks for the relationship between x and y but, more importantly, looks for a causal relationship. The causality referred to in the present invention is a positive correlation: the cause is the various logics derived by the machine algorithm, and the effect is our target return. In other words, breadth learning explores the causal relationship between logic and result. If such a causal correspondence exists, the logic is retained; if not, it is discarded, new logic formed from other features is mined through further iteration, and the comparison with the result is made again, looping until the logic and the result become positively correlated. In the present invention, the machine algorithm's knowledge of all external environment data is finally distilled into one value called the situation; the situation is the highest level of the machine algorithm's knowledge of the environment, and its convergence means that knowledge of the environment has reached a high level. Likewise, we distill the agent's knowledge of itself into one value called the cumulative evolution result of step-by-step decision-making and reverse evaluation, which is the highest level of the machine algorithm's knowledge of the "self"; its convergence means that knowledge of the "self" has reached a certain level. When the cumulative evolution result of step-by-step decision-making and reverse evaluation and the situation converge at the same time, the machine has a clear understanding of both itself and the environment, and in that case it outputs its stock selection, which is the set of stocks the agent has chosen according to its own logic. We pick the top 20 to form the new alpha portfolio.
Note:
The human initial goal in the anthropomorphic decision model is not the agent's goal but the goal a person expects the agent to reach; the agent's own goal is what we call the agent's initial goal in the present invention. The agent's initial goal carries the agent's own subjectivity, because when people make trade-offs they are often subjective and do not choose entirely according to the objective world. So what we actually construct here is an agent goal formed from the emotion index and the imitation object, and this is the agent's initial goal. In addition, the trade-off logic in the anthropomorphic decision model is not fixed: after the agent recombines features, the input to the emotion index becomes the new feature combination produced by model fusion.
Algorithm description of the technical solution of the present invention.
First, a multi-level, multi-dimensional environment is constructed, and through this environment an object the agent can imitate is constructed: the alpha portfolio. The imitated alpha is a portfolio given randomly by a person, together with the generated alpha return graph (see Figure 3). At the beginning the agent sees only an alpha return curve, and it now needs to recombine an alpha portfolio by itself, so it will create new "building-block" combinations based on the environment data and its "self" logic. The environment data include macro indicators, financial indicators, market indicators and news indicators. For example, we randomly pick a few screening conditions: stock price below 15 yuan, price-to-earnings ratio below 20, industries including finance and e-commerce, and a public-opinion index consistently above 100 for the last three months. The top 20 stocks are finally selected to form a portfolio (the imitation object); as the alpha portfolio shown in Figure 3, its return over the past three months is 0.0063 and its Sharpe ratio is 0.000089.
Then a goal is set artificially as the human initial goal, for example a return of 5% and a Sharpe ratio of 3. All environment data are provided to the program, and alpha is provided to the program as the object of imitation learning. As shown in Figure 16, the program then runs according to the following flow, which consists of two major steps and one large loop.
The first step implements the logic reinforcement process and the second step implements the reverse evaluation process. The new features screened by the reverse model are then fed back into the logic reinforcement of the first step as the input of the emotion index, replacing the previous external feature data; at the same time the new features are used to screen stocks to form a new alpha portfolio, which is compared with the old alpha, and the first and second steps are repeated.
Logic reinforcement process:
First read in the external data (including financial data, market data, news data and macro data), then progressively generate the logic layers of the corresponding attributes by adding hidden-layer functions and activation functions.
Calculation of the emotion index: the external environment data are treated as the function of the emotion index.
MergeAllData = pd.merge(fin[36], market[32], mac[11], news[17])
EmotionIndex = tanh(weight * MergeAllData + bias)
EmotionIndex is the emotion index, weight is the weight, MergeAllData is all the merged data, fin[36], market[32], mac[11] and news[17] are the financial data, market data, macro data and news data respectively, bias is the bias, tanh is the activation function, and pd.merge() is the merge function.
The agent's primary concept state: the agent's primary concept state represents an expectation above reality, so it is obtained by adding the environment data and the goal; the agent's primary concept state is an expectation grounded in reality.
idea = relu(weight * (EmotionIndex.reshape() + prospective) + bias)
idea is the agent's primary concept state, EmotionIndex is the emotion index, reshape() is the dimensionality-reduction function, prospective is the expected return, weight is the weight, bias is the bias, and relu is the activation function.
Trade-off logic: a value of 1 means discard and 0 means keep. Dividing the return by the agent's primary concept state represents the distance between reality and the ideal.
ar = sigmoid(weight * (reward / idea) + bias)
ar is the trade-off logic, weight is the weight, reward is the alpha return rate, idea is the agent's primary concept state, bias is the bias, and sigmoid is the activation function.
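A minimal NumPy sketch of these three quantities is given below, assuming the merged environment data are already available as a single vector (the merge itself would use pandas as above); the mean is used here as a simple stand-in for the dimensionality reduction done by reshape(), and all numbers are toy values:

import numpy as np

def tanh(x): return np.tanh(x)
def relu(x): return np.maximum(0.0, x)
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))

merged = np.array([0.3, -0.8, 1.2, 0.05])   # stand-in for MergeAllData
weight, bias = 0.5, 0.01
prospective = 0.05                           # expected return (5%)
reward = 0.0063                              # return of the imitated alpha

emotion_index = tanh(weight * merged + bias)                        # EmotionIndex
idea = relu(weight * (emotion_index.mean() + prospective) + bias)   # primary concept state
ar = sigmoid(weight * (reward / idea) + bias)                       # trade-off logic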
Calculating the four functions of the goal reinforcement layer:
Calculation of the agent's initial goal: through the relu function, 20 stocks are selected from the four categories of external data according to the screening conditions to form a portfolio, and the portfolio's return rate and Sharpe ratio are calculated; the output value of the trade-off logic is then divided by the alpha return rate to obtain the agent's initial goal:
AgentTarget = relu(weight * (ar / reward) + bias)
AgentTarget is the agent's initial goal, weight is the weight, ar is the trade-off logic, reward is the alpha return rate, bias is the bias, and relu is the activation function.
Calculation of the state value with a clear goal: the state value with a clear goal can be understood as the distance between the agent's goal in the machine algorithm and reality. If the distance is too large, the state value with a clear goal is invalid; only when the agent's goal is relatively close to reality can it be counted as a valid state value with a clear goal. After the tanh activation function, one only needs to judge whether the state value with a clear goal is greater than 0: a value greater than 0 represents a valid state value with a clear goal, and a value less than 0 an invalid one. An invalid state value with a clear goal is not meaningless; state values with an unclear goal can form common sense, but this case does not involve the common-sense part, and the value of the state value with a clear goal is only recorded for the later calculation of the cumulative evolution result of step-by-step decision-making and reverse evaluation.
Targeted = tanh(weight * (AgentTarget / (reward - EmotionIndex)) + bias)
Targeted is the state value with a clear goal, weight is the weight, AgentTarget is the agent's initial goal, reward is the alpha return rate, EmotionIndex is the emotion index, bias is the bias, and tanh is the activation function.
Calculation of the state value with an unclear goal: the state value with an unclear goal comes from three parts: partly from valid conversions of the state value with a clear goal, partly from invalid conversions of the state value with a clear goal, and to a larger extent from the unperceived state. People usually try consciously to experience things they have never experienced, but afterwards, if nothing serious happens, the experience is more or less converted into a part of the unconscious. Consciousness is similar to the clear-goal state defined in this case, and the unconscious is similar to the state value with an unclear goal; what the unconscious ultimately forms is common sense, or one may say that what the unconscious stores is common sense. Human logical reasoning comes from the common sense formed unconsciously by the brain plus conscious judgment, that is, reasoning is made on the basis of common sense. In this case the state value with an unclear goal and the state value with a clear goal simulate human consciousness and unconsciousness, and logical reasoning is performed on that basis.
UnTargeted = sigmoid(weight * [AgentTarget, Targeted, NewIdea, un-recognized_n] + bias)
UnTargeted is the state value with an unclear goal, weight is the weight, AgentTarget is the agent's initial goal, Targeted is the state value with a clear goal, NewIdea is the agent's new concept state, bias is the bias, sigmoid is the activation function, and un-recognized_n represents unknown data composed of the n-dimensional unperceived state.
The agent's new concept state: the goal of the agent's new concept state is to test counterfactuals (a counterfactual is a re-characterization that negates facts that have already occurred). The smaller the value of the agent's new concept state, the closer the imitation object is to the agent's initial goal; conversely, the larger the value, the further apart they are. The agent's new concept state can be regarded as an inference model that explores the causal relationship between the conditional factors and the target variable. Its value becoming larger or smaller implies two possibilities: first, the goal is hard to reach, i.e. the goal setting is not necessarily appropriate (the first counterfactual), in which case the value of the agent's new concept state tends towards 0; second, the imitation object is badly set (the second counterfactual), in which case the value tends towards positive 1. If the value of the agent's new concept state keeps growing, it means that unknown data have not been introduced into the environment data, and the introduction of new external environment data should be considered. In our tests, when the value of NewIdea exceeds 0.9 the value of the situation hardly converges any more; in that case we set un-recognized_n = un-recognized_n + 1, prompting the need to introduce new external data into the environment data.
NewIdea = relu(weight * (Moveloss / Targeted) + bias)
NewIdea is the agent's new concept state, weight is the weight, Moveloss is the trailing stop-loss distance of the alpha portfolio (a trailing stop-loss follows the latest price with a stop set a fixed number of points away; the trailing stop-loss distance is the difference between the average trailing stops of the long and short sides), Targeted is the state value with a clear goal, relu is the activation function, and bias is the bias.
moveloss = tf.reduce_mean(tf.reduce_sum(tf.square(moveloss_sell - moveloss_buy)))
tf.reduce_mean is the mean function, tf.reduce_sum is the sum function, tf.square is the squaring function, moveloss_sell is the cumulative stop-loss level of sell orders up to today, and moveloss_buy is the cumulative stop-loss level of buy orders up to today. They are calculated as follows:
If the day's opening price is greater than or equal to the closing price:
moveloss_buy = close_yesterday - firstloss
If the day's opening price is less than the closing price:
moveloss_buy = close_today - (close_today - close_yesterday) * 0.618 + firstloss
where close_today is today's closing price, close_yesterday is yesterday's closing price, and firstloss is the initial stop loss, which we set to 0.0012.
If the day's opening price is less than or equal to the closing price:
moveloss_sell = close_yesterday + firstloss
If the day's opening price is greater than the closing price:
moveloss_sell = close_today + (close_today - close_yesterday) * 0.618 + firstloss
where close_today is today's closing price, close_yesterday is yesterday's closing price, and firstloss is the initial stop loss, which we set to 0.0012.
Note: close_today and close_yesterday are the closing prices of the alpha portfolio index; see the appendix for the calculation of the alpha portfolio index.
Moveloss represents the size of the stop-loss distance between the long and short sides: the larger the distance, the smaller the pressure, and the smaller the distance, the greater the pressure. Here, too, we are simulating the human situation: under pressure people often come up with all kinds of ingenious ideas. In this case we are dealing with the securities market, so we treat the pressure on a certain portfolio approximately as a person's pressure, just as someone whose stocks fall will also feel pressure of varying magnitude. For a more general agent, the new concept state algorithm can be designed in a targeted way; in principle it is designed on the idea that pressure produces thought, similar to "wisdom born of urgency", that is, the corresponding pressure value can be taken as the value of the agent's new concept state.
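A minimal sketch of the trailing stop-loss computation described above, using NumPy in place of the TensorFlow reductions (variable names follow the text; the price series are toy values):

import numpy as np

FIRST_LOSS = 0.0012  # initial stop loss

def moveloss_buy(open_today, close_today, close_yesterday):
    # Trailing stop for buy orders, per the rules above
    if open_today >= close_today:
        return close_yesterday - FIRST_LOSS
    return close_today - (close_today - close_yesterday) * 0.618 + FIRST_LOSS

def moveloss_sell(open_today, close_today, close_yesterday):
    # Trailing stop for sell orders, per the rules above
    if open_today <= close_today:
        return close_yesterday + FIRST_LOSS
    return close_today + (close_today - close_yesterday) * 0.618 + FIRST_LOSS

def moveloss(opens, closes):
    # Moving stop-loss distance: mean of the summed squared differences
    # between the sell-side and buy-side stops, mirroring the formula above
    buys = np.array([moveloss_buy(o, c, cy)
                     for o, c, cy in zip(opens[1:], closes[1:], closes[:-1])])
    sells = np.array([moveloss_sell(o, c, cy)
                      for o, c, cy in zip(opens[1:], closes[1:], closes[:-1])])
    return np.mean(np.sum(np.square(sells - buys)))

opens = np.array([10.0, 10.2, 9.9, 10.5])    # toy index opening prices
closes = np.array([10.1, 10.0, 10.3, 10.4])  # toy index closing prices
distance = moveloss(opens, closes)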
Calculation of the cumulative evolution result of step-by-step decision-making and reverse evaluation: this value represents the intelligence of the agent. It is the ratio of the state value with an unclear goal to the state value with a clear goal; the closer the state value with a clear goal is to the state value with an unclear goal, the more intelligent the cumulative evolution result of step-by-step decision-making and reverse evaluation.
SapientState = relu(weight * (UnTargeted / Targeted) + bias)
SapientState is the cumulative evolution result of step-by-step decision-making and reverse evaluation, weight is the weight, UnTargeted is the state value with an unclear goal, Targeted is the state value with a clear goal, bias is the bias, and relu is the activation function;
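Continuing the earlier sketch, the goal-reinforcement quantities can be strung together as follows. Everything here is a toy illustration: the values are carried over from the previous sketches, un_recognized is a placeholder for the unperceived-state data, and the unspecified combination of the four inputs to UnTargeted is approximated by a simple sum:

import numpy as np

def relu(x): return np.maximum(0.0, x)
def tanh(x): return np.tanh(x)
def sigmoid(x): return 1.0 / (1.0 + np.exp(-x))

weight, bias = 0.5, 0.01
reward, emotion_index, ar = 0.0063, 0.09, 0.51   # alpha return, reduced EmotionIndex, trade-off logic
moveloss_value = 0.004                            # from the stop-loss sketch above
un_recognized = 0.0                               # unperceived-state placeholder

agent_target = relu(weight * (ar / reward) + bias)                           # AgentTarget
targeted = tanh(weight * (agent_target / (reward - emotion_index)) + bias)   # Targeted
new_idea = relu(weight * (moveloss_value / targeted) + bias)                 # NewIdea
untargeted = sigmoid(weight * np.sum([agent_target, targeted,
                                      new_idea, un_recognized]) + bias)      # UnTargeted
sapient_state = relu(weight * (untargeted / targeted) + bias)                # SapientState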
Calculating the two functions of strategy reinforcement:
Calculation of the output logic: through the relu function, divide the agent's initial goal by the trade-off logic;
OutputLogic = relu(weight * (prospective / ar) + bias)
OutputLogic is the output logic, prospective is the expected return, weight is the weight, ar is the trade-off logic, bias is the bias, and relu is the activation function;
Calculation of the situation: the situation represents the highest level of knowledge of the environment.
Situation = tanh(weight * (OutputLogic / (prospective - reward)) + bias)
Situation is the situation, weight is the weight, OutputLogic is the output logic, prospective is the expected return, reward is the alpha return rate, bias is the bias, and tanh is the activation function.
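A continuation of the same toy sketch for the two strategy-reinforcement quantities (values carried over from above; purely illustrative):

import numpy as np

def relu(x): return np.maximum(0.0, x)
def tanh(x): return np.tanh(x)

weight, bias = 0.5, 0.01
prospective, reward, ar = 0.05, 0.0063, 0.51   # expected return, alpha return, trade-off logic

output_logic = relu(weight * (prospective / ar) + bias)                     # OutputLogic
situation = tanh(weight * (output_logic / (prospective - reward)) + bias)   # Situation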
In this case the strategy in the securities market is simply buying and selling stocks, so the strategy here is the price on the day a stock is bought or sold; in this case the calculation uniformly uses the closing price (close) on the day of buying or selling.
The reverse model:
The role of the reverse model is automated feature engineering.
As shown in Figures 17 and 18, the reverse model is divided into four parts: feature extraction for self-assessment, feature extraction for environment assessment, feature extraction for output-logic assessment, and feature recombination.
In the first step, feature extraction for the agent's self-assessment, the XGBoost algorithm is used with the four values of goal reinforcement as the output and the values of the external environment as the input; the features screened out are the self-assessment features:
NewFeature1 = XGBClassifier(input(fin[36], market[32], mac[11], news[17]), output[SapientState, UnTargeted, Targeted, AgentTarget])
NewFeature1 is the self-assessment feature set, XGBClassifier belongs to the XGBoost algorithm and is a classification function, fin[36], market[32], mac[11] and news[17] are the financial data (indicators), market data, macro data and news data respectively, and SapientState, UnTargeted, Targeted and AgentTarget are the cumulative evolution result of step-by-step decision-making and reverse evaluation, the state value with an unclear goal, the state value with a clear goal, and the agent's initial goal. The expression above treats the four self-assessment values as four criteria, and the screening selects from the original data the features that meet these four criteria.
In the second step, feature extraction for environment assessment, the LightGBM algorithm is used with the two values of strategy reinforcement as the output and the values of the external environment as the input, screening out the features that conform to the environment assessment logic:
NewFeature2 = LightGBMClassifier(input(fin[36], market[32], mac[11], news[17]), output[Situation, OutputLogic])
NewFeature2 is the feature set conforming to the environment assessment logic layer, LightGBMClassifier is a classification function, fin[36], market[32], mac[11] and news[17] are the financial data, market data, macro data and news data respectively, Situation is the situation, and OutputLogic is the output logic.
In the third step, feature extraction for output-logic assessment, the GradientBoosting algorithm is used with the two values from goal reinforcement and strategy reinforcement as the output and the values of the external environment as the input, screening out the output-logic features:
NewFeature3 = GradientBoostingClassifier(input(fin[36], market[32], mac[11], news[17]), output[SapientState, Situation])
NewFeature3 is the output-logic feature set, GradientBoostingClassifier is a classification function, fin[36], market[32], mac[11] and news[17] are the financial data, market data, macro data and news data respectively, Situation is the situation, and SapientState is the cumulative evolution result of step-by-step decision-making and reverse evaluation.
The fourth step, feature recombination:
A logistic regression algorithm is used with the agent's initial goal value as the output and the feature values screened in the first, second and third steps as the input, screening out the new feature set CombinedFeature:
CombinedFeature = LRClassifier(input[NewFeature1, NewFeature2, NewFeature3], output[AgentTarget])
LRClassifier is the logistic regression function, NewFeature1, NewFeature2 and NewFeature3 are the new feature groups selected in the first three steps, and AgentTarget is the agent's initial goal value calculated previously.
The stacking method can also be used for model fusion:
StackingClassifier = (classifiers=[LightGBM, xgboost, GradientBoosting], meta_classifier=lr), where meta_classifier=lr means a logistic regression algorithm is used to fuse the features selected by the LightGBM, xgboost and GradientBoosting algorithms.
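A hedged sketch of this reverse-model pipeline is given below using xgboost, lightgbm, scikit-learn and mlxtend. It is an approximation, not the filing's implementation: the multi-value goal-reinforcement and strategy-reinforcement outputs are replaced by binarized single labels, the data are random placeholders, and feature screening is done with SelectFromModel:

import numpy as np
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectFromModel
from mlxtend.classifier import StackingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 96))                   # stand-in for fin[36]+market[32]+mac[11]+news[17]
y_self = (rng.random(200) > 0.5).astype(int)     # binarized goal-reinforcement label (assumption)
y_env = (rng.random(200) > 0.5).astype(int)      # binarized strategy-reinforcement label (assumption)
y_target = (rng.random(200) > 0.5).astype(int)   # binarized AgentTarget label (assumption)

# Steps 1-3: screen features against the self, environment and output-logic criteria
f1 = SelectFromModel(XGBClassifier(n_estimators=50)).fit(X, y_self)
f2 = SelectFromModel(LGBMClassifier(n_estimators=50)).fit(X, y_env)
f3 = SelectFromModel(GradientBoostingClassifier(n_estimators=50)).fit(X, y_self)

new_features = np.hstack([f.transform(X) for f in (f1, f2, f3)])

# Step 4: fuse the screened features with logistic regression as the meta-learner
stack = StackingClassifier(
    classifiers=[XGBClassifier(n_estimators=50),
                 LGBMClassifier(n_estimators=50),
                 GradientBoostingClassifier(n_estimators=50)],
    meta_classifier=LogisticRegression())
stack.fit(new_features, y_target)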
Five: how the new features form a new alpha portfolio.
First see the data flow in Figure 19; the process has three steps:
The first step uses the random forest algorithm to score each stock and then picks the top n stocks with the highest scores; in this case n = 20.
The steps are as follows: we take the new features CombinedFeature selected after 10,000 iterations, shown in Table 8 below:
Table 8 (reproduced as an image in the original publication)
First a target variable GB is designed. Following the idea of value investing, the market index's rise or fall is first removed from each stock's rise or fall, and the 5% return we artificially set at the beginning is then added.
When the market falls:
countday := count(date > 20170407, 0);
change := (close - ref(close, countday)) / ref(close, countday);
GB := change + 0.045 + 0.05;
When the market rises:
countday := count(date > 20170407, 0);
change := (close - ref(close, countday)) / ref(close, countday);
if (change > 0 and change < 0.045)
then GB := 0.045 - change + 0.05;
else GB := change - 0.045 + 0.05;
where countday is the number of days in the statistics period, change is the remaining rise or fall after removing the market's rise or fall, count() is a function that counts the days meeting the condition, ref() is a function that extracts the closing price from countday days ago to now, and if/then/else is a conditional construct. 0.045 is the rise or fall of the Shanghai Composite Index from 20170407 to 20170614, 0.05 is the expected return, and GB is the target variable. After the calculation, we sort the stocks by GB from largest to smallest, as in the following table (only the top 20 are listed):
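The same target-variable construction can be sketched in Python, assuming a closing-price series per stock over the window and the benchmark rise/fall of 0.045 (names and toy prices are illustrative):

import pandas as pd

MARKET_CHANGE = 0.045    # rise/fall of the Shanghai Composite over the window
EXPECTED_RETURN = 0.05   # the artificially set 5% return

def target_gb(close: pd.Series, market_rising: bool) -> float:
    # change: the stock's rise/fall over the window, before removing the market move
    change = (close.iloc[-1] - close.iloc[0]) / close.iloc[0]
    if not market_rising:
        return change + MARKET_CHANGE + EXPECTED_RETURN
    if 0 < change < MARKET_CHANGE:
        return MARKET_CHANGE - change + EXPECTED_RETURN
    return change - MARKET_CHANGE + EXPECTED_RETURN

prices = pd.Series([10.0, 10.4, 10.9, 11.2])   # toy closing prices within the window
gb = target_gb(prices, market_rising=True)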
Table 9 (reproduced as an image in the original publication)
The random forest algorithm is then used to give each stock a score, and the top n stocks are selected to form the new alpha portfolio.
score = model_selection.cross_val_score(RandomForestClassifier, input[CombinedFeature], output[GB])
score is the score of each stock, model_selection.cross_val_score is the function that computes the score, RandomForestClassifier is the random forest algorithm, and GB is the target variable. The calculation gives Table 10 below:
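A runnable approximation of this scoring step with scikit-learn is sketched below. The data are placeholders, GB is binarized into a high/low label, and "per-stock score" is interpreted here as the predicted probability of the high-GB class; these are assumptions, not details from the filing:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 12))              # CombinedFeature rows, one per stock
y = (rng.random(300) > 0.5).astype(int)     # GB binarized into high/low (assumption)

rf = RandomForestClassifier(n_estimators=200, random_state=0)

# Cross-validated quality of the feature set as a predictor of GB
cv_scores = cross_val_score(rf, X, y, cv=5)

# Per-stock score: probability of belonging to the high-GB class (one interpretation)
rf.fit(X, y)
stock_scores = rf.predict_proba(X)[:, 1]
top20 = np.argsort(stock_scores)[::-1][:20]  # indices of the top-20 stocks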
Table 10 (reproduced as an image in the original publication)
In this embodiment we select the top 20 stocks as the constituents of the new alpha portfolio.
The second step: weight allocation.
The function BL_asset_allocation is used to calculate the weight of each stock.
weight = BL_asset_allocation(df, 0.05, p, q, optim_setting)
weight is the weight, BL_asset_allocation is the function that calculates the weights, df is the array data of the stocks, 0.05 is the previously set expected return, p is the matrix data of the n stocks, q is the return rate of each of the n stocks, and optim_setting is the risk measure, which we set here to 3, i.e. the Sharpe ratio of 3 we set artificially at the beginning. The calculation gives Table 11 below.
Table 11 (reproduced as an image in the original publication)
The third step: converting the new alpha portfolio into an alpha index:
Today's index = (today's total market value / base-period total market value) × 100
The total market value is the sum, over all stocks, of the day's closing price multiplied by the number of shares outstanding and by the weight; see the formula below.
The base-period total market value is the sum, over all stocks, of the closing price on the day of purchase multiplied by the number of shares outstanding and by the weight; see the formula below.
Formula: total market value = Σ (i = 1..n) close_i × circulation_i × weight_i
where close is the closing price, circulation is the number of shares outstanding, and weight is the weight; in this case n = 20.
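A small sketch of the index computation under the formula above (the price, circulation and weight arrays are toy values):

import numpy as np

close_today = np.array([11.2, 25.4, 8.7])   # today's closing prices
close_base = np.array([10.0, 24.0, 9.1])    # closing prices on the day of purchase
circulation = np.array([1e8, 5e7, 2e8])     # shares outstanding
weights = np.array([0.4, 0.35, 0.25])       # portfolio weights

market_value_today = np.sum(close_today * circulation * weights)
market_value_base = np.sum(close_base * circulation * weights)

alpha_index = market_value_today / market_value_base * 100   # today's index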
Update rule between the new alpha portfolio and the imitated alpha:
If the new alpha's return rate is higher than that of the original imitated alpha, the new alpha replaces the original and becomes the new imitation object; if the new alpha's return rate is lower than or equal to that of the original imitated alpha, the original alpha remains the imitation object.
The anthropomorphic decision model can be applied very widely: as long as a goal is set and an imitation object is provided, it can automatically learn the imitation object, approach the given goal step by step, and remind the user of the directions in which the goal still needs to be pursued. At present we apply it only in the field of financial investment; our next step is to extend it to other fields.
The following are system embodiments corresponding to the method embodiments above; this embodiment can be implemented in cooperation with the embodiments above. The relevant technical details mentioned in the embodiments above remain valid in this embodiment and, to reduce repetition, are not repeated here; correspondingly, the relevant technical details mentioned in this embodiment can also be applied to the embodiments above.
The present invention also proposes a universal logical reasoning system for implementing an agent based on the wide learning algorithm, which includes:
模块1、获取对象对应的各类环境数据,其中每类环境数据包括多维数据或指标,通过特征提取得到每类环境数据的属性; Module 1. Obtain various types of environmental data corresponding to the object, where each type of environmental data includes multi-dimensional data or indicators, and obtain the attributes of each type of environmental data through feature extraction;
Module 2: Establish a logical layer for the corresponding attributes of each type of environment data so as to construct a logic-reinforcement process and a reverse model; perform dynamic self-assessment, environment assessment and logic assessment of the object's situation in the environment, and fuse the three assessment results into new features using a logistic regression algorithm;
Module 3: Construct a new object from the new features; establish a causal relationship between the logic formed by the new object and the manually set initial goal; evaluate that causal relationship and logic; and, once the causal relationship and the logic forming it have been confirmed from the evaluation result, recommend and output the new object that satisfies the causal relationship as the logical reasoning result.
In this logical reasoning system for implementing an agent based on the wide learning algorithm, module 1 includes:
Module 11: Obtain the manually set initial goal and the environment data, the environment data including financial indicators, market indicators, news indicators and macro indicators; use randomly chosen environment data as screening conditions, screen out the stocks that satisfy those conditions, and assemble them into an alpha index that serves as the imitation object (an illustrative screening sketch is given after module 14);
Module 12: Merge the environment data through an incentive function to obtain the emotion index; add the environment data to the manually set initial goal to obtain the agent's primary concept state; divide the imitation object's return rate by the agent's primary concept state through an incentive function to obtain the trade-off logic; and divide the trade-off logic by the imitation object's return rate through an incentive function to obtain the agent's initial target;
Module 13: Divide the agent's initial target by the difference between the imitation object's return rate and the emotion index through an incentive function to obtain the targeted state value; feed the agent's initial target, the targeted state value, the agent's new concept state and the unperceived state into an incentive function to obtain the untargeted state value; divide the imitation object's moving stop-loss distance by the targeted state value through an incentive function to obtain the agent's new concept state; divide the untargeted state value by the targeted state value through an incentive function as the cumulative evolution result of step-by-step decision and reverse evaluation; divide the agent's initial target by the trade-off logic through the relu function to obtain the output logic; and divide the output logic by the difference between the manually set initial goal and the actual return through an incentive function to obtain the situation;
Module 14: Collect the cumulative evolution result of step-by-step decision and reverse evaluation, the situation and the agent's initial target as the attributes.
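As referenced in module 11 above, the screening step can be sketched in a few lines of pandas; the median threshold, the random choice of indicator columns and the selection of 20 stocks are illustrative assumptions, not values taken from the patent.

```python
import numpy as np
import pandas as pd

def screen_stocks(stock_pool: pd.DataFrame, n_conditions: int = 3, top_k: int = 20, seed: int = 0):
    """Pick random indicator columns as screening conditions and keep the stocks
    whose values exceed the cross-sectional median for every picked indicator;
    the surviving stocks would form the alpha index used as the imitation object."""
    rng = np.random.default_rng(seed)
    conditions = list(rng.choice(stock_pool.columns.to_numpy(), size=n_conditions, replace=False))
    mask = np.ones(len(stock_pool), dtype=bool)
    for col in conditions:
        mask &= (stock_pool[col] > stock_pool[col].median()).to_numpy()
    return stock_pool[mask].head(top_k), conditions
```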
In this logical reasoning system for implementing an agent based on the wide learning algorithm, the emotion index is calculated as:
MergeAllData=pd.merge(fin[36],market[32],mac[11],news[17])
EmotionIndex=tanh(weight*MergeAllData+bias)
EmotionIndex is the emotion index, weight represents the weight, fin[36], market[32], mac[11] and news[17] are the financial data, market data, macro data and news data respectively, bias is the bias, tanh is the incentive function, and pd.merge() is the merge function.
该智能体的初级概念状态的计算:Calculation of the primary concept state of the agent:
idea=relu(weight*(EmotionIndex.reshape()+prospective)+bias)
idea代表智能体的初级概念状态,prospective是预期收益,relu是激励函数;Idea represents the primary concept state of the agent, prospect is the expected return, and relu is the incentive function;
该取舍逻辑的计算:Calculation of the trade-off logic:
ar=sigmoid(weight*(reward/idea)+bias)
ar为取舍逻辑,reward是模块11中该模仿对象的收益率,sigmoid是激励函数;ar is the logic of selection, reward is the rate of return of the imitation object in module 11, and sigmoid is the incentive function;
AgentTarget=relu(weight*(ar/reward)+bias)
AgentTarget是智能体的初始目标,weight代表权重,ar是取舍逻辑,reward是alpha的收益率,bias是偏置,relu是激励函数;AgentTarget is the initial target of the agent, weight represents the weight, ar is the logic of selection, reward is the rate of return of alpha, bias is the bias, and relu is the incentive function;
该目标明确的状态值的值为:The value of the targeted state value is:
Targeted=tanh(weight*(AgentTarget/(reward-EmotionIndex))+bias)
Targeted是目标明确的状态值的值,AgentTarget是智能体的初始目标;Targeted is the value of the target state value, and AgentTarget is the initial target of the agent;
该目标未明确的状态值的值为:The value of the unclear status value of the target is:
UnTargeted=sigmoid(weight*([AgentTarget,Targeted,NewIdea,un-recognized_n])+bias)
UnTargeted是目标未明确的状态值的值,un-recognized n代表由n维未感知态构成的未知数据; UnTargeted is the value of the unrecognized state value of the target, and un-recognized n represents the unknown data composed of n-dimensional unrecognized state;
该智能体的新概念状态通过下式得到:The new concept state of the agent is obtained by the following formula:
NewIdea=relu(weight*(Moveloss/Targeted)+bias)
NewIdea是该智能体的新概念状态,Moveloss是该模仿对象的移动止损距离;NewIdea is the new concept state of the agent, and Moveloss is the moving stop loss distance of the imitation object;
分步决策和反向评估的累计进化结果就是目标未明确的状态值与目标明确的状态值的比;The cumulative evolution result of step-by-step decision-making and reverse evaluation is the ratio of the state value with unclear goals to the state value with clear goals;
SapientState=relu(weight*(UnTargeted/Targeted)+bias)
SapientState是分步决策和反向评估的累计进化结果,weight代表权重,UnTargeted是目标未明确的状态值,Targeted是目标明确的状态值,bias是偏置,relu是激励函数;SapientState is the cumulative evolution result of step-by-step decision-making and reverse evaluation, weight represents weight, UnTargeted is the state value with unclear target, Targeted is the state value with clear target, bias is bias, and relu is the incentive function;
输出逻辑的计算:通过relu函数用预期收益除以取舍逻辑Calculation of the output logic: divide the expected return by the trade-off logic through the relu function
OutputLogic=relu(weight*(prospective/ar)+bias)
OutputLogic是输出逻辑,prospective是预期收益,ar是取舍逻辑;OutputLogic is the output logic, prospective is the expected return, and ar is the logic of choice;
情况代表了对环境认识的最高层次,情况的计算:The situation represents the highest level of understanding of the environment, the calculation of the situation:
Situation=tanh(weight*(OutputLogic/(prospective-reward))+bias)
Situation代表情况。Situation represents the situation.
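Read together, the formulas above form a single computation chain. The numpy sketch below strings them together in order; it only illustrates how the published formulas compose, with scalar weights and biases, a summed UnTargeted input list and non-zero denominators as simplifying assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    return np.maximum(x, 0.0)

def agent_states(merged_env, prospective, reward, moveloss, unrecognized,
                 weight=1.0, bias=0.0):
    """One pass through the EmotionIndex -> Situation chain of formulas above.
    merged_env is the merged environment data, reward the imitation object's
    return rate, moveloss its moving stop-loss distance, and unrecognized the
    n-dimensional unperceived state."""
    emotion = np.tanh(weight * np.mean(merged_env) + bias)             # EmotionIndex
    idea = relu(weight * (emotion + prospective) + bias)               # primary concept state
    ar = sigmoid(weight * (reward / idea) + bias)                      # trade-off logic
    agent_target = relu(weight * (ar / reward) + bias)                 # agent's initial target
    targeted = np.tanh(weight * (agent_target / (reward - emotion)) + bias)
    new_idea = relu(weight * (moveloss / targeted) + bias)             # new concept state
    untargeted = sigmoid(weight * (agent_target + targeted + new_idea + np.sum(unrecognized)) + bias)
    sapient = relu(weight * (untargeted / targeted) + bias)            # cumulative evolution result
    output_logic = relu(weight * (prospective / ar) + bias)
    situation = np.tanh(weight * (output_logic / (prospective - reward)) + bias)
    return {"EmotionIndex": emotion, "idea": idea, "ar": ar, "AgentTarget": agent_target,
            "Targeted": targeted, "NewIdea": new_idea, "UnTargeted": untargeted,
            "SapientState": sapient, "OutputLogic": output_logic, "Situation": situation}
```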
In this logical reasoning system for implementing an agent based on the wide learning algorithm, module 2 includes:
Module 21: With the XGBoost algorithm, take the cumulative evolution result of step-by-step decision and reverse evaluation, the untargeted state value, the targeted state value and the agent's initial target as output and the environment data as input, and take the features screened out as the self-assessment features; with the LightGBM algorithm, take the situation and the output logic as output and the environment data as input to screen out the environment-assessment features; with the GradientBoosting algorithm, take the cumulative evolution result of step-by-step decision and reverse evaluation and the situation as output and the environment data as input to screen out the output-logic features; and with a logistic regression algorithm, take the agent's initial target as output and the self-assessment features, the environment-assessment features and the output-logic features as input, and take the model-fused features screened out as the new features;
Module 22: Score every stock according to the new features and a random forest algorithm, take the alpha index formed by the top-scoring stocks, and replace the imitation object of module 11 according to the update rule between the new alpha combination and the imitated alpha;
Module 23: Loop modules 11 to 22 until the cumulative evolution result of step-by-step decision and reverse evaluation and the situation converge simultaneously; output the new imitation object, its condition combination, and the causal relationship between the imitation object and the logic.
In this logical reasoning system for implementing an agent based on the wide learning algorithm, the process of screening the self-assessment features includes:
NewFeature1=XGBClassifier(input(fin[36],market[32],mac[11],news[17]),output[SapientState,UnTargeted,Targeted,AgentTarget])
NewFeature1 is the self-assessment feature, XGBClassifier is the classification function, and SapientState, UnTargeted, Targeted and AgentTarget are, respectively, the cumulative evolution result of step-by-step decision and reverse evaluation, the untargeted state value, the targeted state value and the agent's initial target;
筛选该环境评估特征的过程包括:The process of screening this environmental assessment feature includes:
NewFeature2=LightGBMClassifier(input(fin[36],market[32],mac[11],news[17]),output[Situation,OutputLogic])
NewFeature2为该环境评估特征,LightGBMClassifier为分类函数,Situation是情况,OutputLogic是输出逻辑;NewFeature2 is the environmental evaluation feature, LightGBMClassifier is the classification function, Situation is the situation, and OutputLogic is the output logic;
筛选该输出逻辑特征的过程包括:The process of screening the output logical characteristics includes:
NewFeature3=GradientBoostingClassifier(input(fin[36],market[32],mac[11],news[17]),output[SapientState,Situation])
NewFeature3为该输出逻辑特征,GradientBoostingClassifier为分类函数;NewFeature3 is the output logical feature, and GradientBoostingClassifier is the classification function;
根据该环境评估特征、该输出逻辑特征和该自我评估特征,筛选出该新特征:According to the environmental assessment feature, the output logic feature, and the self-assessment feature, the new feature is selected:
CombinedFeature=LRClassifier(input[NewFeature1,NewFeature2,NewFeature3],output[AgentTarget])
其中CombinedFeature为该新特征,LRClassifier为逻辑回归函数。Among them, CombinedFeature is the new feature, and LRClassifier is the logistic regression function.
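The screening formulas above can be sketched with common Python libraries; the version below is a simplified single-label variant in which the multi-output targets are binarised and training and scoring reuse the same data, with xgboost, lightgbm and scikit-learn standing in for XGBClassifier, LightGBMClassifier, GradientBoostingClassifier and LRClassifier. The random forest scoring of module 22 is included at the end.

```python
import numpy as np
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression

def combined_feature_and_scores(X_env, sapient_state, situation, agent_target):
    """Simplified sketch of modules 21-22: each base learner maps the environment
    data to one binarised internal state, its predicted probability becomes a new
    feature, a logistic regression fuses the three features against the agent
    target, and a random forest turns the fused features into per-stock scores."""
    y_self = (sapient_state > np.median(sapient_state)).astype(int)   # self-assessment label
    y_env = (situation > np.median(situation)).astype(int)            # environment-assessment label
    y_target = (agent_target > np.median(agent_target)).astype(int)   # agent-target label

    f1 = XGBClassifier(n_estimators=100).fit(X_env, y_self).predict_proba(X_env)[:, 1]
    f2 = LGBMClassifier(n_estimators=100).fit(X_env, y_env).predict_proba(X_env)[:, 1]
    f3 = GradientBoostingClassifier().fit(X_env, y_self).predict_proba(X_env)[:, 1]

    meta = np.column_stack([f1, f2, f3])                              # NewFeature1..NewFeature3
    fusion = LogisticRegression().fit(meta, y_target)                 # CombinedFeature model
    scores = RandomForestClassifier(n_estimators=200).fit(meta, y_target).predict_proba(meta)[:, 1]
    return fusion, scores                                             # scores rank the stocks (module 22)
```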
Industrial Applicability
The present invention relates to a universal logical reasoning method and system for implementing an agent based on the wide learning algorithm, comprising: acquiring various types of environment data corresponding to an object, each type of environment data including multi-dimensional data or indicators; building a logical layer for the corresponding attributes of each type of data to construct a logic-reinforcement process and a reverse model; performing dynamic self-assessment, environment assessment and logic assessment of the object's situation in the environment, and fusing the three assessments into new features with a logistic regression algorithm; and constructing a new object from the new features, then establishing and evaluating the causal relationship between the logic formed by the new object and the manually set initial goal. The invention can fully automatically explore the causal relationship between the goal and the logic formed by combinations of condition factors, thereby realizing logical reasoning by a machine algorithm; it raises the degree of automation, better explains the dependencies among data, and lowers the technical threshold for using AI.

Claims (10)

  1. 一种基于广度学习算法实现智能体的通用逻辑推理方法,其特征在于,包括:A general logical reasoning method for an agent based on a breadth learning algorithm is characterized in that it includes:
    步骤1、获取对象对应的各类环境数据,其中每类环境数据包括多维数据或指标,通过特征提取得到每类环境数据的属性;Step 1. Obtain various types of environmental data corresponding to the object, where each type of environmental data includes multi-dimensional data or indicators, and obtain the attributes of each type of environmental data through feature extraction;
    步骤2、建立各类环境数据对应属性的逻辑层,以构建逻辑强化过程和反向模型,对该对象在环境中的情况进行动态的自我评估、环境评估和逻辑评估,并使用逻辑回归算法将三者的评估结果进行特征融合,得到新特征;Step 2. Establish a logical layer corresponding to the attributes of various environmental data to build a logical strengthening process and a reverse model, conduct dynamic self-assessment, environmental assessment and logical assessment of the object’s situation in the environment, and use logistic regression algorithms to The evaluation results of the three are combined with features to obtain new features;
    Step 3: Construct a new object from the new features, then evaluate the causal relationship and logic between the logic formed by the new object and the manually set initial goal; confirm the causal relationship and the logic forming it from the evaluation result, and then recommend and output the new object that satisfies the causal relationship as the logical reasoning result.
  2. 如权利要求1所述的基于广度学习算法实现智能体的通用逻辑推理方法,其特征在于,该步骤1包括:The universal logical reasoning method for an agent based on a breadth learning algorithm according to claim 1, wherein the step 1 includes:
    步骤11、获取人为初始目标和环境数据,该环境数据包括财务指标、行情指标、新闻指标和宏观指标,以随机的环境数据作为筛选条件,筛选得到满足该筛选条件的多个股票,集合该多个股票形成alpha指数作为模仿对象;Step 11. Obtain man-made initial goals and environmental data. The environmental data includes financial indicators, market indicators, news indicators, and macro indicators. Random environmental data is used as a screening condition to select multiple stocks that meet the screening conditions, and aggregate the multiple stocks. Each stock forms an alpha index as an imitation object;
    步骤12、通过激励函数合并该环境数据,得到情绪指标,使用该情绪指标与该人为初始目标做加法得到智能体的初级概念状态,通过激励函数用模仿对象的收益率除以智能体的初级概念状态,得到取舍逻辑,通过激励函数用取舍逻辑除以模仿对象的收益率,得到智能体的初始目标;Step 12. Combine the environmental data through the incentive function to obtain the emotion index, add the emotion index and the person as the initial goal to get the primary concept state of the agent, and divide the return rate of the imitation object by the primary concept of the agent through the incentive function State, get the logic of selection, and divide the logic of selection by the incentive function by the rate of return of the imitation object to get the initial goal of the agent;
    步骤13、通过激励函数用智能体的初始目标除以模仿对象的收益率与情绪指标的差得到目标明确的状态值,通过激励函数把智能体的初始目标,目标明确的状态值,智能体的新概念状态和未感知态作为输入得到目标未明确的状态值,通过激励函数用模仿对象的移动止损距离除以目标明确的状态值得到智能体的新概念状态,通过激励函数用目标未明确的状态值除以目标明确的状态值作为分步决策和反向评估的累计进化结果,通过relu函数用智能体的初始目标除以该取舍逻辑得到输出逻辑,通过激励函数用输出逻辑除以人为初始目标 与实际收益的差,得到情况;Step 13. Use the incentive function to divide the agent’s initial goal by the difference between the return rate of the imitation object and the emotional index to obtain a clear state value. Through the incentive function, the agent’s initial goal, the clear state value, and the agent’s The new concept state and the unperceived state are used as input to get the unclear state value of the target, and the new concept state of the agent is obtained by the incentive function by dividing the moving stop loss distance of the imitation object by the clear state value, and the target is unclear through the incentive function Divide the state value of the target by the state value of the goal as the cumulative evolution result of step-by-step decision-making and reverse evaluation. Through the relu function, the agent’s initial goal is divided by the selection logic to obtain the output logic, and the incentive function is used to divide the output logic by the artificial The difference between the initial target and the actual income, get the situation;
    步骤14、集合该分步决策和反向评估的累计进化结果、该情况和该智能体的初始目标作为该属性。Step 14. Collect the cumulative evolution result of the step-by-step decision and reverse evaluation, the situation and the agent's initial goal as the attribute.
  3. 如权利要求2所述的基于广度学习算法实现智能体的通用逻辑推理方法,其特征在于,该情绪指标的计算:The universal logical reasoning method for an agent based on a breadth learning algorithm as claimed in claim 2, wherein the calculation of the emotional index is:
    MergeAllData=pd.merge(fin[36],market[32],mac[11],news[17])
    EmotionIndex=tanh(weight*MergeAllData+bias)
    EmotionIndex是情绪指标,weight代表权重,fin[36],market[32],mac[11],news[17]分别是财务数据、行情数据、宏观数据、新闻数据,bias是偏置,tanh是激励函数,pd.merge()是合并函数;EmotionIndex is sentiment index, weight represents weight, fin[36], market[32], mac[11], news[17] are financial data, market data, macro data, news data, bias is bias, tanh is incentive Function, pd.merge() is the merge function;
    该智能体的初级概念状态的计算:Calculation of the primary concept state of the agent:
    idea=relu(weight*(EmotionIndex.reshape()+prospective)+bias)
    idea代表智能体的初级概念状态,prospective是预期收益,reshape()是降维函数,relu是激励函数;Idea represents the primary concept state of the agent, prospect is the expected return, reshape() is the dimensionality reduction function, and relu is the incentive function;
    该取舍逻辑的计算:Calculation of the trade-off logic:
    ar=sigmoid(weight*(reward/idea)+bias)
    ar为取舍逻辑,reward是步骤11中该模仿对象的收益率,sigmoid是激励函数;ar is the logic of selection, reward is the rate of return of the imitation object in step 11, and sigmoid is the incentive function;
    AgentTarget=relu(weight*(ar/reward)+bias)
    AgentTarget是智能体的初始目标,weight代表权重,reward是alpha的收益率,relu是激励函数;AgentTarget is the initial target of the agent, weight represents the weight, reward is the rate of return of alpha, and relu is the incentive function;
    该目标明确的状态值的值为:The value of the targeted state value is:
    Targeted=tanh(weight*(AgentTarget/(reward-EmotionIndex))+bias)
    Targeted是目标明确的状态值的值,AgentTarget是智能体的初始目标;Targeted is the value of the target state value, and AgentTarget is the initial target of the agent;
    该目标未明确的状态值的值为:The value of the unclear status value of the target is:
    UnTargeted=sigmoid(weight*([AgentTarget,Targeted,NewIdea,un-recognized_n])+bias)
    UnTargeted是目标未明确的状态值的值,un-recognized n代表由n维未感知态构成的未知数据; UnTargeted is the value of the unrecognized state value of the target, and un-recognized n represents the unknown data composed of n-dimensional unrecognized state;
    该智能体的新概念状态通过下式得到:The new concept state of the agent is obtained by the following formula:
    NewIdea=relu(weight*(Moveloss/Targeted)+bias)
    NewIdea是该智能体的新概念状态,Moveloss是该模仿对象的移动止损距离;NewIdea is the new concept state of the agent, and Moveloss is the moving stop loss distance of the imitation object;
    分步决策和反向评估的累计进化结果就是目标未明确的状态值与目标明确的状态值的比;The cumulative evolution result of step-by-step decision-making and reverse evaluation is the ratio of the state value with unclear goals to the state value with clear goals;
    SapientState=relu(weight*(UnTargeted/Targeted)+bias)
    SapientState是分步决策和反向评估的累计进化结果,weight代表权重,UnTargeted是目标未明确的状态值,Targeted是目标明确的状态值,bias是偏置,relu是激励函数;SapientState is the cumulative evolution result of step-by-step decision-making and reverse evaluation, weight represents weight, UnTargeted is the state value with unclear target, Targeted is the state value with clear target, bias is bias, and relu is the incentive function;
    输出逻辑的计算:通过relu函数用预期收益除以取舍逻辑;Calculation of the output logic: divide the expected return by the trade-off logic through the relu function;
    OutputLogic=relu(weight*(prospective/ar)+bias)
    OutputLogic是输出逻辑,prospective是预期收益,ar是取舍逻辑;OutputLogic is the output logic, prospective is the expected return, and ar is the logic of choice;
    情况代表了对环境认识的最高层次,情况的计算:The situation represents the highest level of understanding of the environment, the calculation of the situation:
    Situation=tanh(weight*(OutputLogic/(prospective-reward))+bias)
    Situation代表情况。Situation represents the situation.
  4. 如权利要求2或3所述的基于广度学习算法实现智能体的通用逻辑推理方法,其特征在于,该步骤2包括:The universal logical reasoning method for an agent based on a breadth learning algorithm according to claim 2 or 3, characterized in that step 2 includes:
    步骤21、使用XGBoost算法,该分步决策和反向评估的累计进化结果、该目标未明确的状态值、该目标明确的状态值和该智能体的初始目标作为输出,该环境数据作为输入,筛选出的特征作为自我评估特征;使用LightGBM算法,该情况和该输出逻辑作为输出,该环境数据作为输入,筛选出环境评估特征;用GradientBoosting算法,分步决策和反向评估的累计进化结果和情况为输出,该环境数据作为输入,筛选出输出逻辑特征;用逻辑回归算法把该智能体的初始目标作为输出,用该自我评估特征、该环境评估特征和该输出逻辑特征值作为输入,筛选出模型融合的特征作为新特征;Step 21. Using the XGBoost algorithm, the cumulative evolution result of the step-by-step decision and reverse evaluation, the unclear state value of the goal, the clear state value of the goal, and the agent’s initial goal are used as output, and the environmental data is used as input. The selected features are used as self-assessment features; the LightGBM algorithm is used, the situation and the output logic are used as output, and the environmental data is used as input to filter out the environmental assessment features; the GradientBoosting algorithm is used to determine the cumulative evolution results of stepwise decision-making and reverse evaluation. When the situation is output, the environmental data is used as input to filter out the output logical characteristics; the logistic regression algorithm is used to take the agent’s initial goal as output, and the self-assessment characteristics, environmental evaluation characteristics and output logical characteristics are used as input to filter Feature merged with the model as a new feature;
    步骤22、根据该新特征和随机森林算法为每一个股票进行打分,取分值排名最高的多个股票形成的alpha指数,根据新alpha组合与模仿对象alpha的更新规则替代该步骤11中该模仿对象;Step 22: Score each stock according to the new feature and the random forest algorithm, take the alpha index formed by the multiple stocks with the highest score value, and replace the imitation in step 11 according to the update rule of the new alpha combination and the imitation object alpha Object
    步骤23、循环该步骤11到步骤22,直到该分步决策和反向评估的累计进化结果和该情况同时收敛,输出该新模仿对象,并输出该新模仿对象的条件组合以及模仿对象与逻辑之间的因果关系。Step 23. Loop the steps 11 to 22 until the cumulative evolution result of the step-by-step decision and reverse evaluation converges with the situation at the same time, output the new imitation object, and output the condition combination of the new imitation object and the imitation object and logic The causal relationship between.
  5. 如权利要求4所述的基于广度学习算法实现智能体的通用逻辑推理方法,其特征在于,筛选该自我评估特征的过程包括:The universal logical reasoning method for an agent based on a breadth learning algorithm as claimed in claim 4, wherein the process of screening the self-assessment features comprises:
    NewFeature1=XGBClassifier(input(fin[36],market[32],mac[11],news[17]),output[SapientState,UnTargeted,Targeted,AgentTarget])
    NewFeature1为该自我评估特征,XGBClassifier为分类函数,SapientState,UnTargeted,Targeted,AgentTarget分别是分步决策和反向评估的累计进化结果,目标未明确的状态值,目标明确的状态值,智能体的初始目标;NewFeature1 is the self-assessment feature, XGBClassifier is the classification function, SapientState, UnTargeted, Targeted, and AgentTarget are the cumulative evolutionary results of step-by-step decision-making and reverse evaluation, respectively. The state value with unclear target, the state value with clear target, the initial agent aims;
    筛选该环境评估特征的过程包括:The process of screening this environmental assessment feature includes:
    NewFeature2=LightGBMClassifier(input(fin[36],market[32],mac[11],news[17]),output[Situation,OutputLogic])
    NewFeature2为该环境评估特征,LightGBMClassifier为分类函数,Situation是情况,OutputLogic是输出逻辑;NewFeature2 is the environmental evaluation feature, LightGBMClassifier is the classification function, Situation is the situation, and OutputLogic is the output logic;
    筛选该输出逻辑特征的过程包括:The process of screening the output logical characteristics includes:
    NewFeature3=GradientBoostingClassifier(input(fin[36],market[32],mac[11],news[17]),output[SapientState,Situation])
    NewFeature3为该输出逻辑特征,GradientBoostingClassifier为分类函数;NewFeature3 is the output logical feature, and GradientBoostingClassifier is the classification function;
    根据该环境评估特征、该输出逻辑特征和该自我评估特征,筛选出该新特征:According to the environmental assessment feature, the output logic feature, and the self-assessment feature, the new feature is selected:
    CombinedFeature=LRClassifier(input[NewFeature1,NewFeature2,NewFeature3],output[AgentTarget])
    其中CombinedFeature为该新特征,LRClassifier为逻辑回归函数。Among them, CombinedFeature is the new feature, and LRClassifier is the logistic regression function.
  6. 一种基于广度学习算法实现智能体的通用逻辑推理方法推理系统,其特征在于,包括:A general logic reasoning method reasoning system based on a breadth learning algorithm to realize an agent is characterized in that it includes:
    模块1、获取对象对应的各类环境数据,其中每类环境数据包括多维数据或指标,通过特征提取得到每类环境数据的属性;Module 1. Obtain various types of environmental data corresponding to the object, where each type of environmental data includes multi-dimensional data or indicators, and obtain the attributes of each type of environmental data through feature extraction;
    模块2、建立各类环境数据对应属性的逻辑层,以构建逻辑强化过程和反向模型,对该对象在环境中的情况进行动态的自我评估、环境评估和逻辑评估,并使用逻辑回归算法将三者的评估结果进行特征融合,得到新特征;Module 2. Establish a logical layer corresponding to the attributes of various environmental data to construct a logical strengthening process and a reverse model, conduct dynamic self-assessment, environmental assessment, and logical assessment of the object’s situation in the environment, and use logistic regression algorithms to The evaluation results of the three are combined with features to obtain new features;
    模块3、根据该新特征构建新对象,然后对该新对象形成的逻辑与人为初始目标建立因果关系,并对该因果关系和逻辑进行评估,根据评估结果确认因 果关系以及形成该因果关系的逻辑后,将符合该因果关系的新对象作为逻辑推理结果进行推荐输出。Module 3. Construct a new object according to the new feature, then establish a causal relationship between the logic formed by the new object and the human initial goal, and evaluate the causal relationship and logic, and confirm the causal relationship and the logic of forming the causal relationship according to the evaluation results Then, the new object that meets the causal relationship is used as the result of logical reasoning for recommendation output.
  7. 如权利要求6所述的基于广度学习算法实现智能体的逻辑推理系统,其特征在于,该模块1包括:The logical reasoning system of an agent based on a breadth learning algorithm according to claim 6, wherein the module 1 includes:
    模块11、获取人为初始目标和环境数据,该环境数据包括财务指标、行情指标、新闻指标和宏观指标,以随机的环境数据作为筛选条件,筛选得到满足该筛选条件的多个股票,集合该多个股票形成alpha指数作为模仿对象;Module 11. Obtain man-made initial goals and environmental data. The environmental data includes financial indicators, market indicators, news indicators, and macro indicators. Random environmental data is used as a screening condition to screen multiple stocks that meet the screening conditions, and collect the multiple stocks. Each stock forms an alpha index as an imitation object;
    模块12、通过激励函数合并该环境数据,得到情绪指标,使用该环境数据与该人为初始目标做加法得到智能体的初级概念状态,通过激励函数用模仿对象的收益率除以智能体的初级概念状态,得到取舍逻辑,通过激励函数用取舍逻辑除以模仿对象的收益率,得到智能体的初始目标;Module 12. Combine the environmental data through the incentive function to obtain the emotional index, use the environmental data and the person as the initial goal to add the primary concept state of the agent, and divide the return rate of the imitation object by the primary concept of the agent through the incentive function State, get the logic of selection, and divide the logic of selection by the incentive function by the rate of return of the imitation object to get the initial goal of the agent;
    模块13、通过激励函数用智能体的初始目标除以模仿对象的收益率与情绪指标的差得到目标明确的状态值,通过激励函数把智能体的初始目标,目标明确的状态值,智能体的新概念状态和未感知态作为输入得到目标未明确的状态值,通过激励函数用模仿对象的移动止损距离除以目标明确的状态值得到智能体的新概念状态,通过激励函数用目标未明确的状态值除以目标明确的状态值的作为分步决策和反向评估的累计进化结果,通过relu函数用智能体的初始目标除以该取舍逻辑得到输出逻辑,通过激励函数用输出逻辑除以人为初始目标与实际收益的差,得到情况;Module 13. Through the incentive function, the agent’s initial goal is divided by the difference between the return rate of the imitation object and the emotional index to obtain a clear state value. Through the incentive function, the agent’s initial goal, the clear state value, and the agent’s The new concept state and the unperceived state are used as input to get the unclear state value of the target, and the new concept state of the agent is obtained by the incentive function by dividing the moving stop loss distance of the imitation object by the clear state value, and the target is unclear through the incentive function The state value divided by the state value with a clear goal is the cumulative evolution result of step-by-step decision-making and reverse evaluation. Through the relu function, the agent’s initial goal is divided by the selection logic to obtain the output logic, and the output logic is divided by the incentive function The difference between the artificial initial goal and the actual income, get the situation;
    模块14、集合该分步决策和反向评估的累计进化结果、该情况和该智能体的初始目标作为该属性。Module 14. Collect the cumulative evolution result of the step-by-step decision and reverse evaluation, the situation and the agent's initial goal as the attribute.
  8. 如权利要求7所述的基于广度学习算法实现智能体的逻辑推理系统,其特征在于,该情绪指标的计算:The logical reasoning system based on a breadth learning algorithm to realize an agent according to claim 7, wherein the calculation of the emotional index:
    MergeAllData=pd.merge(fin[36],market[32],mac[11],news[17])
    EmotionIndex=tanh(weight*MergeAllData+bias)
    EmotionIndex是情绪指标,weight代表权重,fin[36],market[32],mac[11],news[17]分别是财务数据、行情数据、宏观数据、新闻数据,bias是偏置,tanh是激励函数,pd.merge()是合并函数;EmotionIndex is sentiment index, weight represents weight, fin[36], market[32], mac[11], news[17] are financial data, market data, macro data, news data, bias is bias, tanh is incentive Function, pd.merge() is the merge function;
    该智能体的初级概念状态的计算:Calculation of the primary concept state of the agent:
    idea=relu(weight*(EmotionIndex.reshape()+prospective)+bias)
    idea代表智能体的初级概念状态,prospective是预期收益,relu是激励函数;Idea represents the primary concept state of the agent, prospect is the expected return, and relu is the incentive function;
    该取舍逻辑的计算:Calculation of the trade-off logic:
    ar=sigmoid(weight*(reward/idea)+bias)
    ar为取舍逻辑,reward是模块11中该模仿对象的收益率,sigmoid是激励函数;ar is the logic of selection, reward is the rate of return of the imitation object in module 11, and sigmoid is the incentive function;
    AgentTarget=relu(weight*(ar/reward)+bias)
    AgentTarget是智能体的初始目标,weight代表权重,ar是取舍逻辑,reward是alpha的收益率,bias是偏置,relu是激励函数;AgentTarget is the initial target of the agent, weight represents the weight, ar is the logic of selection, reward is the rate of return of alpha, bias is the bias, and relu is the incentive function;
    该目标明确的状态值的值为:The value of the targeted state value is:
    Targeted=tanh(weight*(AgentTarget/(reward-EmotionIndex))+bias)
    Targeted是目标明确的状态值的值,AgentTarget是智能体的初始目标;Targeted is the value of the target state value, and AgentTarget is the initial target of the agent;
    该目标未明确的状态值的值为:The value of the unclear status value of the target is:
    UnTargeted=sigmoid(weight*([AgentTarget,Targeted,NewIdea,un-recognized_n])+bias)
    UnTargeted是目标未明确的状态值的值,un-recognized n代表由n维未感知态构成的未知数据; UnTargeted is the value of the unrecognized state value of the target, and un-recognized n represents the unknown data composed of n-dimensional unrecognized state;
    该智能体的新概念状态通过下式得到:The new concept state of the agent is obtained by the following formula:
    NewIdea=relu(weight*(Moveloss/Targeted)+bias)
    NewIdea是该智能体的新概念状态,Moveloss是该模仿对象的移动止损距离;NewIdea is the new concept state of the agent, and Moveloss is the moving stop loss distance of the imitation object;
    分步决策和反向评估的累计进化结果就是目标未明确的状态值与目标明确的状态值的比;The cumulative evolution result of step-by-step decision-making and reverse evaluation is the ratio of the state value with unclear goals to the state value with clear goals;
    SapientState=relu(weight*(UnTargeted/Targeted)+bias)
    SapientState是分步决策和反向评估的累计进化结果,weight代表权重,UnTargeted是目标未明确的状态值,Targeted是目标明确的状态值,bias是偏置,relu是激励函数;SapientState is the cumulative evolution result of step-by-step decision-making and reverse evaluation, weight represents weight, UnTargeted is the state value with unclear target, Targeted is the state value with clear target, bias is bias, and relu is the incentive function;
    输出逻辑的计算:通过relu函数用预期收益除以取舍逻辑Calculation of the output logic: divide the expected return by the trade-off logic through the relu function
    OutputLogic=relu(weight*(prospective/ar)+bias)
    OutputLogic是输出逻辑,prospective是预期收益,ar是取舍逻辑;OutputLogic is the output logic, prospective is the expected return, and ar is the logic of choice;
    情况代表了对环境认识的最高层次,情况的计算:The situation represents the highest level of understanding of the environment, the calculation of the situation:
    Situation=tanh(weight*(OutputLogic/(prospective-reward))+bias)
    Situation代表情况。Situation represents the situation.
  9. 如权利要求7或8所述的基于广度学习算法实现智能体的逻辑推理系统,其特征在于,该模块2包括:The logical reasoning system of an agent based on a breadth learning algorithm according to claim 7 or 8, characterized in that the module 2 includes:
    模块21、使用XGBoost算法,该分步决策和反向评估的累计进化结果、该目标未明确的状态值、该目标明确的状态值和该智能体的初始目标作为输出,该环境数据作为输入,筛选出的特征作为自我评估特征;使用LightGBM算法,该情况和该输出逻辑作为输出,该环境数据作为输入,筛选出环境评估特征;用GradientBoosting算法,分步决策和反向评估的累计进化结果和情况为输出,该环境数据作为输入,筛选出输出逻辑特征;用逻辑回归算法把该智能体的初始目标作为输出,用该自我评估特征、该环境评估特征和该输出逻辑特征值作为输入,筛选出模型融合的特征作为新特征;Module 21. Using the XGBoost algorithm, the cumulative evolution result of the step-by-step decision and reverse evaluation, the unclear state value of the goal, the clear state value of the goal, and the agent’s initial goal as output, and the environmental data as input, The selected features are used as self-assessment features; the LightGBM algorithm is used, the situation and the output logic are used as output, and the environmental data is used as input to filter out the environmental assessment features; the GradientBoosting algorithm is used to determine the cumulative evolution results of stepwise decision-making and reverse evaluation. When the situation is output, the environmental data is used as input to filter out the output logical characteristics; the logistic regression algorithm is used to take the agent’s initial goal as output, and the self-assessment characteristics, environmental evaluation characteristics and output logical characteristics are used as input to filter Feature merged with the model as a new feature;
    模块22、根据该新特征和随机森林算法为每一个股票进行打分,取分值排名最高的多个股票形成的alpha指数,根据新alpha组合与模仿对象alpha的更新规则替代该模块11中该模仿对象;Module 22. Score each stock according to the new feature and random forest algorithm, take the alpha index formed by the multiple stocks with the highest score value, and replace the imitation in module 11 according to the update rule of the new alpha combination and the imitation object alpha Object
    模块23、循环该模块11到模块22,直到该分步决策和反向评估的累计进化结果和该情况同时收敛,输出该新模仿对象,并输出该新模仿对象的条件组合以及模仿对象与逻辑之间的因果关系。Module 23. Loop the modules 11 to 22 until the cumulative evolution result of the step-by-step decision and reverse evaluation converges with the situation at the same time, output the new imitation object, and output the condition combination of the new imitation object and the imitation object and logic The causal relationship between.
  10. 如权利要求9所述的基于广度学习算法实现智能体的逻辑推理系统,其特征在于,筛选该自我评估特征的过程包括:The logical reasoning system based on a breadth learning algorithm to realize an agent according to claim 9, wherein the process of screening the self-assessment features comprises:
    NewFeature1=XGBClassifier(input(fin[36],market[32],mac[11],news[17]),output[SapientState,UnTargeted,Targeted,AgentTarget])
    NewFeature1为该自我评估特征,XGBClassifier为分类函数,SapientState,UnTargeted,Targeted,AgentTarget分别是分步决策和反向评估的累计进化结果,目标未明确的状态值,目标明确的状态值,智能体的初始目标;NewFeature1 is the self-assessment feature, XGBClassifier is the classification function, SapientState, UnTargeted, Targeted, and AgentTarget are the cumulative evolutionary results of step-by-step decision-making and reverse evaluation, respectively. The state value with unclear target, the state value with clear target, the initial agent aims;
    筛选该环境评估特征的过程包括:The process of screening this environmental assessment feature includes:
    NewFeature2=LightGBMClassifier(input(fin[36],market[32],mac[11],news[17]),output[Situation,OutputLogic])
    NewFeature2为该环境评估特征,LightGBMClassifier为分类函数,Situation是情况,OutputLogic是输出逻辑;NewFeature2 is the environmental evaluation feature, LightGBMClassifier is the classification function, Situation is the situation, and OutputLogic is the output logic;
    筛选该输出逻辑特征的过程包括:The process of screening the output logical characteristics includes:
    NewFeature3=GradientBoostingClassifier(input(fin[36],market[32],mac[11],news[17]),output[SapientState,Situation])
    NewFeature3为该输出逻辑特征,GradientBoostingClassifier为分类函数;NewFeature3 is the output logical feature, and GradientBoostingClassifier is the classification function;
    根据该环境评估特征、该输出逻辑特征和该自我评估特征,筛选出该新特征:According to the environmental assessment feature, the output logic feature, and the self-assessment feature, the new feature is selected:
    CombinedFeature=LRClassifier(input[NewFeature1,NewFeature2,NewFeature3],output[AgentTarget])
    其中CombinedFeature为该新特征,LRClassifier为逻辑回归函数。Among them, CombinedFeature is the new feature, and LRClassifier is the logistic regression function.
PCT/CN2019/078710 2019-03-19 2019-03-19 Universal logical reasoning method and system for implementing agent based on wide learning algorithm WO2020186453A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/078710 WO2020186453A1 (en) 2019-03-19 2019-03-19 Universal logical reasoning method and system for implementing agent based on wide learning algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/078710 WO2020186453A1 (en) 2019-03-19 2019-03-19 Universal logical reasoning method and system for implementing agent based on wide learning algorithm

Publications (1)

Publication Number Publication Date
WO2020186453A1 true WO2020186453A1 (en) 2020-09-24

Family

ID=72518932

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/078710 WO2020186453A1 (en) 2019-03-19 2019-03-19 Universal logical reasoning method and system for implementing agent based on wide learning algorithm

Country Status (1)

Country Link
WO (1) WO2020186453A1 (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001325582A (en) * 2000-05-17 2001-11-22 Chugoku Electric Power Co Inc:The Learning and predicting device for time-series data
CN103985055A (en) * 2014-05-30 2014-08-13 西安交通大学 Stock market investment decision-making method based on network analysis and multi-model fusion
US20160217366A1 (en) * 2015-01-23 2016-07-28 Jianjun Li Portfolio Optimization Using Neural Networks
CN109325861A (en) * 2018-08-31 2019-02-12 平安科技(深圳)有限公司 Using target stock selection method, device and the storage medium of experience replay mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI, XIANG: "Multi-factor Quantitative Stock Option Planning Based on XGBoost Algorithm", ECONOMICS AND MANAGEMENT SCIENCES, CHINESE MASTER’S THESES FULL-TEXT DATABASE, no. 01, 15 January 2018 (2018-01-15), ISSN: 1674-0246, DOI: 20191118133140A *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113311035A (en) * 2021-05-17 2021-08-27 北京工业大学 Effluent total phosphorus prediction method based on width learning network
CN113311035B (en) * 2021-05-17 2022-05-03 北京工业大学 Effluent total phosphorus prediction method based on width learning network

Similar Documents

Publication Publication Date Title
Santoso et al. A genetic programming approach to binary classification problem
Natesan Ramamurthy et al. Model agnostic multilevel explanations
US8515884B2 (en) Neuro type-2 fuzzy based method for decision making
Dostál The use of soft computing for optimization in business, economics, and finance
Sotnik The SOSIEL platform: Knowledge-based, cognitive, and multi-agent
Ansari et al. Parameter tuning of MLP, RBF, and ANFIS models using genetic algorithm in modeling and classification applications
WO2020186453A1 (en) Universal logical reasoning method and system for implementing agent based on wide learning algorithm
Situngkir Emerging the emergence sociology: The philosophical framework of agent-based social studies
Huang et al. Fuzzy c-means clustering based deep patch learning with improved interpretability for classification problems
Shan et al. An integrated knowledge-based system for urban planning decision support
Haryono et al. Stock price forecasting in Indonesia stock exchange using deep learning: A comparative study
Jain et al. Practical applications of computational intelligence techniques
Hatzilygeroudis et al. Fuzzy and neuro-symbolic approaches to assessment of bank loan applicants
Ladas et al. Augmented neural networks for modelling consumer indebtness
Zhu et al. From numeric to granular models: A quest for error and performance analysis
Rasmani et al. Subsethood-based fuzzy rule models and their application to student performance classification
Testov et al. Soft modeling and expert systems in modern science: development trends
Ruz et al. Random vector functional link with naive bayes for classification problems of mixed data
Ma et al. Study on Predicting University Student Performance Based on Course Correlation
Aziz A review on artificial neural networks and its’ applicability
Zhang et al. Intelligent Information Processing with Matlab
Hossain et al. Hybrid neural network for efficient training
Lin et al. Credit risk assessment using BP neural network with Dempster-Shafer theory
Tian et al. The coupling degree prediction between financial innovation process and innovation environment based on GM (1, 1)-BPNN
Tomé et al. Fuzzy Boolean Nets–a nature inspired model for learning and reasoning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19919625

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 04/02/2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19919625

Country of ref document: EP

Kind code of ref document: A1