CN115239189A

CN115239189A - Policy generation method, apparatus, device, medium, and program product

Info

Publication number: CN115239189A
Application number: CN202210973948.XA
Authority: CN
Inventors: 洪欢江; 王永隆; 张彦东; 徐丽
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2022-08-15
Filing date: 2022-08-15
Publication date: 2022-10-25

Abstract

The present disclosure provides a policy generation method and relates to the field of computer technologies. The policy generation method includes: acquiring at least one asset held by a target object and the expenditure of the target object; generating a plurality of expected total assets of the target object at a target time node, wherein each expected total asset is generated through a random process based on the expenditure and the at least one asset, an asset holding ratio is configured for the target object when each expected total asset is generated, the asset holding ratio includes a holding share of the target object in the at least one asset, and different asset holding ratios are configured when different expected total assets are generated; and determining a first target policy using a dynamic programming equation based on the expected total assets and the expenditure, the first target policy including the asset holding ratio of the target object when the expected total assets and the expenditure meet a first control objective. The present disclosure also provides a policy generation apparatus, device, medium, and program product.

Description

Policy generation method, apparatus, device, medium, and program product

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a policy generation method, apparatus, electronic device, storage medium, and program product.

Background

Banking efficiency refers to the comparative relationship between input and output or cost and revenue in banking activities. The bank efficiency is the centralized embodiment of the competitiveness of the banking industry, and the improvement of the efficiency of the banking industry is the basis of preventing financial risks and promoting the sustainable development of the banking industry.

To improve bank efficiency, a bank's resource allocation can be optimized. However, the market environment is complex and changeable and the economic situation is full of uncertainty, so this optimization process usually incurs a high cost and has low efficiency.

Disclosure of Invention

In view of the above, the present disclosure provides a policy generation method, apparatus, device, medium, and program product.

According to a first aspect of the present disclosure, there is provided a policy generation method, including:

acquiring at least one asset held by a target object and the expense of the target object;

generating a plurality of expected total assets of the target object on a target time node, wherein each expected total asset is generated through a random process based on the expenditure and at least one item of the assets, when each expected total asset is generated, an asset holding ratio is configured for the target object, the asset holding ratio comprises a holding share of at least one item of the assets of the target object, and when different expected total assets are generated, different asset holding ratios are configured;

determining a first target policy using a dynamic programming equation based on the expected total assets and the expenditure, the first target policy comprising: the asset holding ratio of the target object when the expected total assets and the expenditure meet a first control objective.

According to an embodiment of the present disclosure, the at least one asset includes a risky asset and a risk-free asset, and the configuring an asset holding ratio for the target object when generating each of the expected total assets includes: configuring the target object's holding share of the risky asset and holding share of the risk-free asset;

generating each expected total asset of the total assets of the target object on a target time node, comprising:

acquiring deposit of a depositor;

determining the total assets of the target object on at least one preset time node according to the holding share of the target object in the risky asset, the holding share of the target object in the risk-free asset, and the deposit;

constructing the random process by using a random differential equation;

and generating the expected total assets through the random process according to the determined at least one total asset.

According to an embodiment of the present disclosure, the payout includes an interest payout and an operation payout, and the constructing the stochastic process using stochastic differential equations includes:

obtaining the interest rate of the deposit, the risk-free interest rate, and the expected rate of change and the expected intensity of change of the risky asset;

determining a random differential equation of the risk-free asset according to the risk-free interest rate to obtain a first random differential equation;

determining a random differential equation of the risky asset according to the expected rate of change, the expected intensity of change, and a random distribution function to obtain a second random differential equation;

determining a random differential equation of the interest expenditure according to the deposit and the interest rate of the deposit to obtain a third random differential equation;

determining a random differential equation for a total asset based on the first random differential equation, the second random differential equation, the third random differential equation, and the operational expenditure to construct the stochastic process.

According to an embodiment of the present disclosure, the expenditure includes an operation expenditure, and the determining a first target policy using a dynamic programming equation based on the expected total assets and the expenditure includes:

determining a first control function based on the expected total assets and the operational expenditure;

maximizing the first control function such that the expected total assets and the operational expenditure meet the first control objective;

and solving the dynamic programming equation by taking the maximized first control function as a cost function so as to determine the first target strategy.

According to an embodiment of the present disclosure, after solving the dynamic programming equation, further comprising:

verifying the first target policy;

discretizing the at least one asset and the expenditure when the verification fails;

determining a second target policy by using a Bellman equation according to the result of the discretization, wherein the second target policy includes: the asset holding ratio of the target object when the discretized at least one asset and expenditure meet a second control objective.

According to an embodiment of the present disclosure, the at least one asset includes a risky asset and a risk-free asset, and the discretizing the at least one asset and the expenditure includes:

determining a probability trend of the risky asset from one time period to the next time period to obtain a first probability trend;

determining a discrete vector of the risky asset according to the first probability trend to obtain a first discrete vector;

and determining the discrete vector of the risk-free asset according to a first preset variation coefficient to obtain a second discrete vector.

According to an embodiment of the present disclosure, the expenditure includes an interest expenditure and an operation expenditure, and the determining the second target policy using the Bellman equation according to the result of the discretization includes:

determining an expected value of the total asset at the target time node through a second random process according to the first discrete vector, the second discrete vector, the interest expenditure and the operation expenditure after discretization;

determining a second control function according to the expected value of the total assets at the target time node and the operation expenditure;

maximizing the second control function such that the discretized at least one asset and the operation expenditure meet the second control objective;

and solving the Bellman equation by taking the maximized second control function as a cost function to obtain the second target strategy.

A second aspect of the present disclosure provides a policy generation apparatus, including:

an acquisition module, configured to acquire at least one asset held by a target object and the expenditure of the target object;

a first processing module, configured to generate a plurality of expected total assets of the target object on a target time node, where each expected total asset is generated through a random process based on the expenditure and at least one of the assets, and when each expected total asset is generated, an asset holding proportion is configured for the target object, where the asset holding proportion includes a held share of at least one of the assets by the target object, and when different expected total assets are generated, different asset holding proportions are configured;

a second processing module, configured to determine a first target policy using a dynamic programming equation based on the expected total assets and the expenditure, the first target policy comprising: the asset holding ratio of the target object when the expected total assets and the expenditure meet a first control objective.

A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the policy generation method described above.

The fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the policy generation method described above.

The fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the policy generation method described above.

One or more of the above-described embodiments may have the following advantages or benefits:

With the policy generation method of the embodiments of the present disclosure, each of the plurality of expected total assets can be generated through a random process according to the assets held by the target object and its expenditure. An optimal solution is then sought through the dynamic programming equation based on the plurality of expected total assets and the expenditure, so as to obtain the optimal holding ratio of the target object in the assets under the condition of maximizing revenue, and thereby obtain the first target policy. That is, the policy generation method of the embodiments of the present disclosure can generate the first target policy by means of random control: even when the future is highly uncertain, sufficient data (for example, a plurality of expected total assets) can be generated by means of random control, and an optimal asset holding ratio can be given on that basis. This improves the efficiency of asset allocation optimization, reduces the optimization cost, and helps the bank plan its asset allocation and thereby improve bank efficiency.

Drawings

The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:

FIG. 1 schematically illustrates an application scenario diagram of a policy generation method, apparatus, electronic device, storage medium and program product according to embodiments of the disclosure;

FIG. 2 schematically shows one of the flow diagrams of a policy generation method according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates a flow chart for generating expected total assets according to an embodiment of the disclosure;

FIG. 4 schematically illustrates a flowchart of constructing a random process according to an embodiment of the present disclosure;

FIG. 5 schematically illustrates a flow chart for determining a first target policy according to an embodiment of the present disclosure;

FIG. 6 schematically illustrates a second flow chart of a policy generation method according to an embodiment of the present disclosure;

FIG. 7 schematically shows a flow diagram of a discretization process in accordance with embodiments of the present disclosure;

FIG. 8 schematically illustrates a flow chart for determining a second target policy according to an embodiment of the present disclosure;

FIG. 9 schematically shows a block diagram of a structure of a policy generation apparatus according to an embodiment of the present disclosure;

FIG. 10 schematically shows a block diagram of an electronic device adapted to implement a policy generation method according to an embodiment of the present disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.

Where a convention analogous to "at least one of A, B, and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).

It should be noted that the policy generation method, apparatus, electronic device, storage medium, and program product provided by the embodiments of the present disclosure relate to the field of computer technology. The policy generation method, apparatus, electronic device, storage medium, and program product provided by the embodiments of the present disclosure may be applied to the financial field or any field other than the financial field, for example, the policy generation method, apparatus, electronic device, storage medium, and program product provided by the embodiments of the present disclosure may be applied to an asset automatic configuration service in the financial field. The present disclosure does not limit the application fields of the policy generation method, apparatus, electronic device, storage medium, and program product.

In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the personal information of the users involved all comply with relevant laws and regulations, necessary confidentiality measures are taken, and public order and good customs are not violated.

The embodiments of the present disclosure provide a policy generation method, including: acquiring at least one asset held by a target object and the expenditure of the target object; generating a plurality of expected total assets of the target object at a target time node, wherein each expected total asset is generated through a random process based on the expenditure and the at least one asset, an asset holding ratio is configured for the target object when each expected total asset is generated, the asset holding ratio includes a holding share of the target object in the at least one asset, and different asset holding ratios are configured when different expected total assets are generated; and determining a first target policy using a dynamic programming equation based on the expected total assets and the expenditure, the first target policy including the asset holding ratio of the target object when the expected total assets and the expenditure meet a first control objective.

In the embodiments of the present disclosure, a first control function may be constructed from the plurality of expected total assets and the expenditure, and the dynamic programming equation is then solved according to the first control function and the first control objective to find the optimal asset holding ratio and thereby obtain the first target policy. For example, the first control objective may include maximizing the revenue generated by the expected total assets and the expenditure, where the revenue generated by the expenditure may include implicit revenue generated by the operation expenditure, such as employee loyalty and reputation. In this case, the first target policy includes the asset holding ratio that the target object should adopt when the expected total assets and the expenditure generate the maximum revenue. Optionally, when determining the asset holding ratio that the target object should adopt, the holding share of the target object in the risky assets may be determined first; once that share is determined, the financial institution can derive the allocation of funds to the other assets (e.g., risk-free assets) and to the expenditure, so as to obtain the first target policy.

Fig. 1 schematically illustrates an application scenario diagram of a policy generation method, apparatus, electronic device, storage medium, and program product according to embodiments of the present disclosure.

As shown in FIG. 1, the application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.

A user may use the terminal devices 101, 102, 103 to interact with the server 105 over the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software (by way of example only).

The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like.

The server 105 may be a server providing various services, such as a background management server (by way of example only) that supports websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and otherwise process received data such as user requests, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.

It should be noted that the policy generation method provided by the embodiments of the present disclosure may generally be executed by the server 105. Accordingly, the policy generation apparatus provided by the embodiments of the present disclosure may generally be disposed in the server 105. The policy generation method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the policy generation apparatus provided by the embodiments of the present disclosure may also be disposed in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

The embodiments of the present disclosure provide a policy generation method based on random control theory. Random control may refer to the following: when targeted control of an object is required but the properties of the controlled object are entirely unknown, the only available approach is to apply control by trial. Random control is the most primitive control method, is also called heuristic control, and is the basis of all other control methods.

The following describes in detail a strategy generation method based on the stochastic control theory according to an embodiment of the present disclosure with reference to fig. 2 to 8 based on the scenario described in fig. 1.

Fig. 2 schematically shows one of flowcharts of a policy generation method according to an embodiment of the present disclosure, and as shown in fig. 2, the policy generation method of this embodiment includes steps S210 to S230.

It should be noted that, although the steps in FIG. 2 are shown in the sequence indicated by the arrows, the steps are not necessarily executed in that sequence. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the figures may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times and in different orders, and may be performed in turn or in alternation with other steps or with at least some of the sub-steps or stages of other steps.

At step S210, at least one asset held by the target object and a payout of the target object are acquired.

In embodiments of the present disclosure, assets and expenses held by a target object may refer to assets and expenses held by the target object at a certain moment. The target object may include a financial institution, such as a bank, and the policy generation method of the embodiment of the disclosure may be used to automatically generate an asset allocation policy according to the asset holding condition and the expenditure condition of the financial institution.

Illustratively, the at least one asset held by the target object may include a plurality of assets classified according to risk classes, such as risk assets and risk-free assets, and the like. The expenditure can be divided into various types according to the actual expenditure content, such as interest expenditure paid to the depositor and operation expenditure.

In step S220, a plurality of expected total assets of the target object on the target time node are generated, wherein each expected total asset is generated through a random process based on the expenditure and the at least one asset, when each expected total asset is generated, an asset holding ratio is configured for the target object, the asset holding ratio includes a holding share of the target object for the at least one asset, and when different expected total assets are generated, different asset holding ratios are configured.

In the embodiments of the present disclosure, the total assets may be derived from the assets and the expenditure together, e.g., total assets = assets - expenditure. The expected total assets are generated through a random process. For example, the random process can be constructed from a random differential equation, which gives the random change in the total assets; the total assets at the next moment can then be obtained from the total assets at a given moment and their random change, and so on until the total assets at the target time node, and hence the expected total assets, are obtained. An asset holding ratio is configured for the target object while each expected total asset is generated. For example, while the expected total asset X1 is generated, the holding share of the target object in asset S1 is configured as S11 and its holding share in asset S2 is configured as S21; while the expected total asset X2 is generated, the holding share of the target object in asset S1 is configured as S12 and its holding share in asset S2 is configured as S22.

Optionally, in the above embodiment, the configured holding shares may specifically refer to holding shares of risk assets by the target object.
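The generation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: it assumes a single risky asset following geometric Brownian motion, a single risk-free asset, expenditure charged at a constant rate, and a holding ratio `y` meaning the fraction of total assets placed in the risky asset. The function name `expected_total_assets` and all parameter values are hypothetical.

```python
import math
import random


def expected_total_assets(x0, y, mu, sigma, r_b, expense_rate,
                          horizon=1.0, steps=50, paths=2000, seed=0):
    """Monte Carlo estimate of expected total assets at the target time node.

    Assumptions (not from the source): one risky asset with drift mu and
    volatility sigma, one risk-free asset with rate r_b, expenditure paid at
    a constant rate, and y = fraction of total assets in the risky asset.
    """
    rng = random.Random(seed)
    dt = horizon / steps
    total = 0.0
    for _ in range(paths):
        x = x0
        for _ in range(steps):
            dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment
            risky_gain = y * x * (mu * dt + sigma * dw)
            safe_gain = (1.0 - y) * x * r_b * dt
            x += risky_gain + safe_gain - expense_rate * dt
        total += x
    return total / paths


# A different holding ratio is configured for each expected total asset.
ratios = [0.0, 0.25, 0.5, 0.75, 1.0]
expected = {y: expected_total_assets(100.0, y, mu=0.08, sigma=0.2,
                                     r_b=0.02, expense_rate=1.0)
            for y in ratios}
```

Each entry of `expected` pairs one candidate holding ratio with one estimated expected total asset, which is the kind of data the later optimization step consumes.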

At step S230, a first target policy is determined using a dynamic programming equation (the Hamilton-Jacobi-Bellman equation, HJB equation) based on the expected total assets and the expenditure, the first target policy comprising: the asset holding ratio of the target object when the expected total assets and the expenditure meet the first control objective.

With the policy generation method of the embodiments of the present disclosure, each of the plurality of expected total assets can be generated through a random process according to the assets held by the target object and its expenditure. An optimal solution is then sought through the dynamic programming equation based on the plurality of expected total assets and the expenditure, so as to obtain the optimal holding ratio of the target object in the assets under the condition of maximizing revenue, and thereby obtain the first target policy. That is, the policy generation method of the embodiments of the present disclosure can generate the first target policy by means of random control: even when the future is highly uncertain, sufficient data (for example, a plurality of expected total assets) can be generated by means of random control, and an optimal asset holding ratio can be given on that basis. This improves the efficiency of asset allocation optimization, reduces the optimization cost, and helps the bank plan its asset allocation and thereby improve bank efficiency.
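As a point of reference for what solving a dynamic programming equation can yield, the sketch below uses a well-known textbook special case rather than the control function of this disclosure. In Merton's portfolio problem with one risky asset, one risk-free asset, and constant relative risk aversion `gamma`, the HJB equation has a closed-form optimal risky-asset fraction, (mu - r_b) / (gamma * sigma^2). The function name and the numbers are hypothetical, and expenditure is ignored.

```python
def merton_risky_fraction(mu, r_b, sigma, gamma):
    """Optimal constant fraction of wealth in the risky asset (Merton, CRRA utility)."""
    if sigma <= 0 or gamma <= 0:
        raise ValueError("sigma and gamma must be positive")
    return (mu - r_b) / (gamma * sigma ** 2)


# Hypothetical inputs: 8% risky drift, 2% risk-free rate, 20% volatility.
risky_share = merton_risky_fraction(mu=0.08, r_b=0.02, sigma=0.2, gamma=3.0)
risk_free_share = 1.0 - risky_share
```

With these inputs the risky fraction is one half. A control objective that also accounts for interest and operation expenditure would generally not admit such a simple closed form.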

The following describes a policy generation method according to an embodiment of the present disclosure with reference to FIG. 2 to FIG. 8, taking a bank as an example of the target object.

In some embodiments, the at least one asset includes a risky asset, such as a fund or wealth-management product, and a risk-free asset, such as a treasury bond. When each expected total asset is generated, configuring an asset holding ratio for the target object includes: configuring the target object's holding share of the risky asset and holding share of the risk-free asset. FIG. 3 schematically shows a flowchart of generating expected total assets according to an embodiment of the present disclosure. As shown in FIG. 3, step S220 includes steps S221 to S224.

In step S221, a deposit of the depositor is acquired.

In the embodiment of the present disclosure, deposits of all depositors in the bank may be acquired.

In step S222, the total assets of the target object on at least one preset time node are determined according to the holding share of the target object in the risky asset, the holding share of the target object in the risk-free asset, and the deposit.

In the embodiment of the present disclosure, taking a preset time node t as an example, the target object holds the j-th risky asset at the preset time node t with a corresponding holding share [symbols: equation images missing]; the risk-free asset held by the target object at the preset time node t is $B_t$, with holding share $\beta_t$; and the deposit of the depositors at the preset time node t is $D_t$. The total assets $X_t$ of the target object at the preset time node t can be determined by formula (1) [equation image missing], wherein $\beta_t B_t$ is the value of the risk-free asset held by the target object at the preset time node t, the product of the j-th risky asset and its holding share is the value of the j-th risky asset held by the target object, and the sum of these products over the m risky assets is the value of the m risky assets held by the target object at the preset time node t.
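The two value terms described above can be sketched as follows. This is only an illustration: the function name `holdings_value` and the numbers are hypothetical, and the sketch covers just the risk-free value and the summed risky value, leaving aside how the deposit enters the total.

```python
def holdings_value(beta_t, b_t, risky_shares, risky_prices):
    """Value of the risk-free holding plus the m risky holdings at time node t."""
    if len(risky_shares) != len(risky_prices):
        raise ValueError("one holding share is needed per risky asset")
    risk_free_value = beta_t * b_t
    risky_value = sum(a * s for a, s in zip(risky_shares, risky_prices))
    return risk_free_value + risky_value


# Hypothetical position: 10 units of a risk-free asset and two risky assets.
value_t = holdings_value(beta_t=10.0, b_t=100.0,
                         risky_shares=[2.0, 5.0], risky_prices=[50.0, 20.0])
```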

In step S223, a random process is constructed using random differential equations.

In the embodiment of the disclosure, random differential equations may be respectively constructed for the risk-free assets, the risk assets and the expenses, the variation of the risk-free assets, the risk assets and the expenses may be determined through the random differential equations, and then the random differential equations of the total assets may be determined according to the random differential equations to construct a random process.

In some embodiments, the expenditure includes an interest expenditure and an operation expenditure. FIG. 4 schematically illustrates a flowchart of constructing a random process according to an embodiment of the present disclosure. As shown in FIG. 4, step S223 includes steps S2231 to S2235.

At step S2231, the interest rate of the deposit, the risk-free interest rate, and the expected rate of change and the expected intensity of change of the risky asset are obtained.

In the embodiment of the present disclosure, the interest rate of the deposit is $r_D$ and the risk-free interest rate is $r_B$. At the preset time node t, the expected rate of change of the j-th risky asset is $\mu_j$, and the j-th risky asset has one or more expected intensities of change, indexed by i [symbol: equation image missing]. The expected rate of change $\mu_j$ and the expected intensities of change can be determined from historical data of the risky asset: for example, the expected intensity of change can be estimated from the variance of the historical data, and the expected rate of change $\mu_j$ can be estimated from the beginning data and the end data of the historical data.
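One possible estimator along these lines is sketched below. It is an assumption, not the disclosed procedure: the intensity of change is taken from the sample variance of log returns, and the rate of change from the first and last observations only. The function name, the price history, and the monthly sampling interval are hypothetical.

```python
import math
import statistics


def estimate_rate_and_intensity(prices, dt):
    """Estimate (rate, intensity) per unit time from prices sampled every dt."""
    if len(prices) < 3:
        raise ValueError("need at least three prices")
    log_returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    # Intensity of change: from the variance of the historical log returns.
    intensity = math.sqrt(statistics.variance(log_returns) / dt)
    # Rate of change: from the first and last observations only.
    span = dt * (len(prices) - 1)
    rate = math.log(prices[-1] / prices[0]) / span
    return rate, intensity


history = [100.0, 102.0, 101.0, 104.0, 103.0, 106.0]  # hypothetical monthly prices
mu_hat, sigma_hat = estimate_rate_and_intensity(history, dt=1.0 / 12.0)
```

Note that the rate computed here is the growth rate of the log price; under a geometric Brownian motion assumption the drift parameter would differ from it by half the squared intensity.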

At step S2232, a random differential equation of the risk-free asset is determined according to the risk-free interest rate to obtain a first random differential equation. The first random differential equation expresses the change in the risk-free asset at the risk-free interest rate.

In the embodiment of the present disclosure, the random differential equation of the risk-free asset $B_t$ is as follows:

$dB_t = r_B B_t \, dt$ (2)

at step S2233, a random differential equation for the non-risky asset is determined based on the expected rate of change, the expected strength of change, and the random distribution function to obtain a second random differential equation. Through a second random differential equation, the amount of change in risk interest rate can be expressed.

In the disclosed embodiment, at preset time node t, the random differential equation of the i-th risky asset is as follows:

where I is the number of expected change intensities, and each expected change intensity has a corresponding random distribution function at preset time node t.
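Equation (3) itself is not reproduced in this text. As an illustration only, the sketch below assumes a geometric form, dS = S (mu dt + sum_i sigma_i dW_i), with one independent Gaussian increment per expected change intensity, and simulates one path with the Euler-Maruyama scheme. The geometric form, the function name, and all parameter names are assumptions.

```python
import math
import random

def simulate_risky_asset(s0, mu, sigmas, horizon, steps, rng):
    """Euler-Maruyama path of dS = S * (mu dt + sum_i sigma_i dW_i).

    The geometric form is an assumption, not taken from the source.
    """
    dt = horizon / steps
    path = [s0]
    for _ in range(steps):
        # One independent Gaussian increment per change intensity.
        shock = sum(s * rng.gauss(0.0, math.sqrt(dt)) for s in sigmas)
        path.append(path[-1] * (1.0 + mu * dt + shock))
    return path

# Example: one year, 250 steps, two change intensities, fixed seed.
path = simulate_risky_asset(100.0, 0.08, [0.15, 0.05], 1.0, 250, random.Random(42))
```

Passing the random generator in explicitly keeps each path reproducible, which is convenient when many paths are compared across holding ratios.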

At step S2234, a stochastic differential equation for interest expenditure is determined based on the deposit and the interest rate of the deposit to obtain a third stochastic differential equation.

In the disclosed embodiment, the random differential equation of the interest expenditure $C_t$ is as follows:

$$dC_t = r_D D_t \, dt \qquad (4)$$
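Equations (2) and (4) contain no random term, so they can be integrated directly. Assuming constant rates $r_B$ and $r_D$, and writing $B_0$ and $C_0$ for the initial values (notation introduced here for illustration), this gives:

```latex
B_t = B_0 \, e^{r_B t}, \qquad
C_t = C_0 + r_D \int_0^t D_s \, ds .
```

If the deposit is constant, $D_s = D$, the second expression reduces to $C_t = C_0 + r_D D \, t$.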

At step S2235, a random differential equation for the total asset is determined based on the first random differential equation, the second random differential equation, the third random differential equation, and the operation expenditure data to construct the random process.

In the disclosed embodiment, the random differential equation of the total asset $X_t$ is as follows:

where the differential $dO_t$ of the operation expenditure $O_t$ can be obtained directly from the operation expenditure data at two adjacent time points.

In step S224, expected total assets are generated by a random process according to the determined at least one total asset.

For example, the expected total assets are determined by equations (1) and (5) above, where y represents an asset holding ratio and each expected total asset is the total asset of the target object at the target time node T when the target object adopts the asset holding ratio y. Specifically, the asset holding ratio y may assign a holding share to each risky asset at preset time node t, for example, a holding share α1 to the j-th risky asset and a holding share α2 to the (j+1)-th risky asset. α1 and α2 may be the same or different and may be determined according to actual needs, which is not limited herein.
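Generating one expected total asset per holding ratio can be sketched with a small Monte Carlo simulation. The model below is a deliberate simplification and an assumption: one risky asset, one risk-free asset, a constant expenditure rate, and an Euler step for the total asset. It is not equations (1) and (5) of the disclosure, and every name is hypothetical.

```python
import math
import random

def expected_total_asset(x0, share_risky, mu, sigma, r_b, expense_rate,
                         horizon, steps, n_paths, seed=0):
    """Monte Carlo mean of the total asset at the horizon for one holding ratio."""
    rng = random.Random(seed)
    dt = horizon / steps
    total = 0.0
    for _ in range(n_paths):
        x = x0
        for _ in range(steps):
            dw = rng.gauss(0.0, math.sqrt(dt))
            # Gains on the risky and risk-free parts of the total asset.
            risky_gain = share_risky * x * (mu * dt + sigma * dw)
            safe_gain = (1.0 - share_risky) * x * r_b * dt
            x += risky_gain + safe_gain - expense_rate * dt
        total += x
    return total / n_paths

# One expected total asset per candidate holding ratio.
candidates = [0.0, 0.25, 0.5, 0.75, 1.0]
expected = {y: expected_total_asset(100.0, y, 0.08, 0.2, 0.03, 2.0, 1.0, 50, 200)
            for y in candidates}
```

Each entry of `expected` corresponds to a different asset holding ratio, which mirrors the requirement that different expected total assets be generated under different ratios.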

In some specific embodiments, the expenditure includes an operational expenditure, fig. 5 schematically illustrates a flowchart of determining the first target policy according to an embodiment of the present disclosure, and as shown in fig. 5, step S230 includes steps S231 to S233.

In step S231, a first control function is determined based on the expected total assets and operational expenditures.

In the disclosed embodiments, the bank's revenues are primarily implicit revenues generated by operational expenditures (e.g., reputation and employee loyalty, etc.) and explicit revenues directly reflected on the assets (e.g., asset revenues). The losses of the bank are mainly capital consumption (such as bad accounts and investment loss, etc., which are reflected in the risky assets) and operational expenses.

In the disclosed embodiment, to determine the optimal asset holding ratio, the first control objective may be configured as maximizing the implicit and explicit revenue. Accordingly, the following first control function J(t, x) may be constructed based on the expected total assets and the operation expenditure:

where the expectation is taken conditional on $X_0 = x$, $e^{-rt}$ is the discount factor for continuous compounding, and $X_0$ is the initial total asset.

In step S232, the first control function is maximized so that the expected total assets and the operation expenditure meet the first control objective, that is, so that the revenue generated by the expected total assets and the operation expenditure is maximized. This yields a value function v(t, x):

$$v(t,x) = \sup_{y \in Y} J(t,x) \qquad (6)$$

where Y is the set of all asset holding ratios.

In step S233, the dynamic programming equation is solved with the maximized first control function as the value function to determine the first target strategy.

In the disclosed embodiment, the dynamic programming equation is as follows:

$$v(T,x) = g(x) \qquad (8)$$

where equation (7) contains a partial differential operator and the operator $L^y$, the Laplace operator; $f^y$ is $f(O_t)$ under the asset holding ratio y; and $v(T,x) = g(x)$ is the terminal condition.

Solving equations (7) and (8) together gives the optimal asset holding ratio y(t, x), that is, the asset holding ratio to adopt at preset time node t when the total asset $X_t = x$, so that the explicit revenue of the assets and the implicit revenue of the operation expenditure are maximized.
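Equation (7) is not reproduced in this text. For orientation, a finite-horizon Hamilton-Jacobi-Bellman equation in its textbook form, written with the symbols defined above, reads as follows. That equation (7) has this form is an assumption, and any discounting is taken to be absorbed into $f^y$.

```latex
\frac{\partial v}{\partial t}(t,x)
  + \sup_{y \in Y} \left\{ L^{y} v(t,x) + f^{y} \right\} = 0,
\qquad v(T,x) = g(x).
```

Under this form, the optimal asset holding ratio y(t, x) is the value of y that attains the supremum at each pair (t, x).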

In some embodiments, in the ideal case, the optimal asset holding ratio can be obtained directly from the analytical solution of the dynamic programming equation. In some cases, however, the dynamic programming equation does not necessarily have a solution, or the resulting solution is not applicable. The following provides a scheme that discretizes the problem and derives the optimal asset holding ratio.

Fig. 6 schematically illustrates a second flowchart of a policy generation method according to an embodiment of the present disclosure, and as shown in fig. 6, in some specific embodiments, after step S233, step S240 to step S260 are further included.

In step S240, the first target policy is verified.

In the disclosed embodiment, the optimal asset holding ratio y(t, x) may be verified. For example, after the optimal asset holding ratio y(t, x) is determined, it may be substituted back into the random process to check whether it satisfies the HJB (Hamilton-Jacobi-Bellman) condition. If so, the verification passes; otherwise, the verification fails. The HJB condition can be verified in a conventional manner, so the details are not described herein.

In step S250, when the verification fails, discretization processing is performed on the assets and the expenses.

In the embodiment of the present disclosure, the preset time period may be divided into N segments, and the assets are then discretized according to the N segments.

Fig. 7 schematically illustrates a flowchart of the discretization process according to an embodiment of the present disclosure. As shown in fig. 7, in some specific embodiments, the at least one asset comprises a risky asset and a risk-free asset, and step S250 comprises steps S251 to S253.

Optionally, the holding shares of the assets may be discretized first. For example, discretizing the holding shares gives, for the assets $H_k$ in the k-th segment, a discrete holding-share variable $h_k$. The discrete variable $h_k$ includes the discrete holding-share variable $b_k$ of the risk-free asset $B_k$ and the discrete holding-share variables of the first risky asset, the second risky asset, and so on up to the m-th risky asset, where $b_k$ and each risky-asset holding-share variable take values of the form z%, z being a positive integer. The discrete variable $h_k$ has the following form:

The value of the asset is then discretized.
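The discretized holding shares can be enumerated as a finite grid. The sketch below assumes that the shares are multiples of a fixed percentage step and that they sum to 100%; both the sum constraint and the names (`holding_share_grid`, `step_percent`) are assumptions made for illustration.

```python
from itertools import product

def holding_share_grid(n_assets, step_percent):
    """All share vectors whose entries are multiples of step_percent and sum to 100%."""
    if step_percent <= 0 or 100 % step_percent != 0:
        raise ValueError("step_percent must be a positive divisor of 100")
    levels = range(0, 101, step_percent)
    grid = []
    # Choose the first n_assets - 1 shares freely; the last share takes the rest.
    for combo in product(levels, repeat=n_assets - 1):
        rest = 100 - sum(combo)
        if rest >= 0:
            grid.append(tuple(c / 100 for c in combo) + (rest / 100,))
    return grid

# Example: one risk-free and one risky asset in steps of 25%.
grid = holding_share_grid(2, 25)
```

The grid grows quickly with the number of assets and with finer steps, so a coarse step is a practical starting point.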

At step S251, a probabilistic trend of the risky asset from one time period to the next time period is determined to obtain a first probabilistic trend.

In embodiments of the present disclosure, for a risky asset, it may determine a probabilistic trend using a binary tree from one time period to the next.

For example, at time period k+1, the risky asset rises by a factor of u with probability p and falls by a factor of d with probability q.

From this, the probability trend of the risky asset from one time period to the next time period can be derived.

where u, d, p, and q can be obtained by solving the following equations simultaneously:

$$p + q = 1 \qquad (12)$$

where the expectation term is the expectation of the risky asset's value in time period k+1 given its value in time period k, and the deviation term is the deviation of the risky asset's value in time period k+1 given its value in time period k.
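Only equation (12) survives in this text, so the remaining equations of the system are unknown. One common way to close such a system is the Cox-Ross-Rubinstein parameterization, which adds the condition u * d = 1. The sketch below uses it as an assumption; the resulting up-probability matches the one-period expectation exactly and the deviation to first order in the step length.

```python
import math

def binomial_parameters(mu, sigma, dt):
    """Up/down factors and probabilities under the assumed condition u * d = 1."""
    if sigma <= 0.0 or dt <= 0.0:
        raise ValueError("sigma and dt must be positive")
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    # p is chosen so that p*u + q*d equals the expected one-period growth.
    p = (math.exp(mu * dt) - d) / (u - d)
    if not 0.0 <= p <= 1.0:
        raise ValueError("dt is too large for these mu and sigma")
    return u, d, p, 1.0 - p

u, d, p, q = binomial_parameters(0.05, 0.2, 0.01)
```

A shorter time period keeps p inside [0, 1]; the function raises an error otherwise rather than returning an invalid probability.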

At step S252, at least one discrete vector of the risky asset is determined according to the first probability trend to obtain a first discrete vector, that is, the discrete values of the risky asset in successive time periods.
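Given the factors u and d, the possible values of a risky asset in each time period form a recombining binary tree. A minimal sketch, with hypothetical names, is:

```python
def risky_asset_lattice(s0, u, d, n_periods):
    """Values reachable in each period: s0 * u**j * d**(k - j) for j up moves out of k."""
    return [[s0 * u ** j * d ** (k - j) for j in range(k + 1)]
            for k in range(n_periods + 1)]

# Example: start at 100, double or halve in each period, two periods.
lattice = risky_asset_lattice(100.0, 2.0, 0.5, 2)
```

Row k of the lattice holds k + 1 values, ordered from the all-down outcome to the all-up outcome.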

In step S253, at least one discrete vector of the risk-free asset is determined according to the first preset variation coefficient to obtain a second discrete vector, that is, the risk-free asset values $B_k$, $B_{k+1}$, and so on.

In the disclosed embodiment, the risk-free asset $B_{k+1}$ may be obtained from the risk-free asset $B_k$ by the following formula:

where the multiplier applied to $B_k$ is a first preset variation coefficient, which can be obtained from the risk-free interest rate $r_B$.
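The formula for the risk-free asset is not reproduced in this text. The sketch below assumes the first preset variation coefficient is exp(r_B * dt), which is the exact one-period growth implied by equation (2); this choice and the function name are assumptions.

```python
import math

def risk_free_vector(b0, r_b, dt, n_periods):
    """Discrete values B_0, B_1, ..., B_N of the risk-free asset."""
    # Assumed coefficient: one-period growth implied by dB = r_B * B dt.
    coefficient = math.exp(r_b * dt)
    values = [b0]
    for _ in range(n_periods):
        values.append(values[-1] * coefficient)
    return values
```

A simple-interest coefficient 1 + r_B * dt would be an equally plausible reading and differs only at second order in dt.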

In some embodiments, the interest expenditure $C_k$ and the operation expenditure $O_k$ of the k-th time period may also be discretized. For example, the interest expenditure in the next time period may be determined from the interest expenditure in the previous time period and a second preset variation coefficient, so as to obtain a discrete vector of the interest expenditure.

In step S260, a second target strategy is determined using the Bellman equation according to the result of the discretization process, the second target strategy including: the asset holding ratio of the target object when the discretized at least one asset and the expenditure meet a second control objective.

In the disclosed embodiment, the Bellman equation was introduced by Richard Bellman. The equation expresses the value of a decision problem at a particular time in terms of the reward from an initial choice plus the value of the remaining decision problem that results from that choice. The dynamic optimization problem is thereby broken into simpler sub-problems that obey the principle of optimality proposed by Bellman.

Fig. 8 schematically illustrates a flowchart of determining a second target policy according to an embodiment of the present disclosure, and as shown in fig. 8, in some specific embodiments, the expenditure includes an interest expenditure and an operation expenditure, and step S260 includes steps S261 to S264.

In step S261, an expected value of the total asset at the target time node, that is, the total asset at the time period of the target time node is determined through a second random process according to the first discrete vector, the second discrete vector, the interest expenditure and the operation expenditure after the discretization process.

In the disclosed embodiment, the second random process is as follows:

$$X_{k+1} = X_k + \Delta X_k \qquad (17)$$

where $\Delta B_k$ can be obtained from the discrete variables of the risk-free asset in two adjacent time periods. Correspondingly, the change in the risky assets, $\Delta C_k$, and $\Delta O_k$ can also be obtained from the discrete variables of two adjacent time periods, so the details are omitted here.
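The recursion in equation (17) can be sketched as a simple roll-forward. The expression for the increment is not reproduced in this text, so the composition used below (gains on the risky and risk-free holdings minus interest and operation expenditure) is an assumption, as are all names.

```python
import math

def roll_forward(x0, share_risky, risky_factors, r_b, r_d, deposit, op_expenses, dt):
    """Iterate X_{k+1} = X_k + dX_k over consecutive time periods.

    risky_factors[k] is the up or down factor realised in period k and
    op_expenses[k] the operation expenditure of period k (hypothetical inputs).
    """
    path = [x0]
    for factor, op in zip(risky_factors, op_expenses):
        x = path[-1]
        gain_risky = share_risky * x * (factor - 1.0)
        gain_safe = (1.0 - share_risky) * x * (math.exp(r_b * dt) - 1.0)
        interest = r_d * deposit * dt
        # Assumed composition of dX_k: asset gains minus both expenditures.
        path.append(x + gain_risky + gain_safe - interest - op)
    return path

# Example: half the total asset in the risky asset, one up move then one down move.
path = roll_forward(100.0, 0.5, [1.1, 0.9], 0.0, 0.02, 50.0, [1.0, 1.0], 1.0)
```

In the example the total asset moves from 100 to 103 after the up move and then to 95.85 after the down move.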

In step S262, a second control function is determined based on the expected value of the total assets at the target time node and the operational expenditure.

In the disclosed embodiment, the second control function $J^{y'}(x)$ is as follows:

where the total-asset term represents the total asset of the target object at the target time node, that is, in the N-th time period, when the target object adopts the asset holding ratio y'.

In step S263, the second control function is maximized such that the discretized at least one asset and operational expenditure meets a second control objective.

In an embodiment of the present disclosure, maximizing the second control function yields a value function V(x):

$$V(x) = \max_{y \in Y} J^{y}(x) \qquad (20)$$

In step S264, the Bellman equation is solved with the maximized second control function as the value function to obtain the second target strategy.

In the disclosed embodiment, the Bellman equation is as follows:

Solving the Bellman equation gives the optimal asset holding ratio.

Optionally, the Bellman equation may be solved with a value iteration algorithm, a Q-learning algorithm, a policy iteration algorithm, or the like, which may be determined according to actual needs and is not limited herein.
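The Bellman equation of the disclosure is not reproduced in this text. The sketch below illustrates the general technique with plain backward recursion on a binary tree, assuming the recursion V_k(x) = max over y of { f(O) + discount * E[V_{k+1}(X_{k+1})] } with terminal value V_N(x) = g(x). The dynamics, the reward functions `g` and `f`, and all names are assumptions; it is neither Q-learning nor the specific algorithm of the disclosure.

```python
import math

def solve_bellman(x, k, n_periods, shares, u, d, p, safe_growth, op_expense,
                  discount=1.0, g=lambda x: x, f=lambda o: 0.0):
    """Return (value, best_share) for total asset x in period k.

    g is the terminal reward and f the per-period reward from the
    operation expenditure; both are hypothetical placeholders.
    """
    if k == n_periods:
        return g(x), None
    best_value, best_share = -math.inf, None
    for y in shares:  # y: share of the total asset held in the risky asset
        # Total asset after an up move and after a down move.
        x_up = x * (y * u + (1.0 - y) * safe_growth) - op_expense
        x_down = x * (y * d + (1.0 - y) * safe_growth) - op_expense
        v_up, _ = solve_bellman(x_up, k + 1, n_periods, shares, u, d, p,
                                safe_growth, op_expense, discount, g, f)
        v_down, _ = solve_bellman(x_down, k + 1, n_periods, shares, u, d, p,
                                  safe_growth, op_expense, discount, g, f)
        value = f(op_expense) + discount * (p * v_up + (1.0 - p) * v_down)
        if value > best_value:
            best_value, best_share = value, y
    return best_value, best_share

value, share = solve_bellman(100.0, 0, 3, [0.0, 0.5, 1.0], 1.2, 0.9, 0.5, 1.02, 1.0)
```

With the linear terminal reward used here, the recursion picks the all-risky share because the expected risky growth (1.05) exceeds the risk-free growth (1.02). The recursion visits every branch, so its cost grows exponentially with the number of periods; a practical solver would store values on a grid of states instead.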

Based on the strategy generation method, the disclosure also provides a strategy generation device. The apparatus will be described in detail below with reference to fig. 9.

Fig. 9 schematically shows a block diagram of a policy generation apparatus according to an embodiment of the present disclosure.

As shown in fig. 9, the policy generating apparatus 900 of this embodiment includes an obtaining module 910, a first processing module 920, and a second processing module 930.

The obtaining module 910 is configured to obtain at least one asset held by the target object and the expenditure of the target object. In an embodiment, the obtaining module 910 may be configured to perform the step S210 described above, which is not described herein again.

The first processing module 920 is configured to generate a plurality of expected total assets of the target object on the target time node, wherein each expected total asset is generated through a random process based on the expenditure and the at least one asset, when each expected total asset is generated, an asset holding ratio is configured for the target object, the asset holding ratio includes a held share of the at least one asset by the target object, and when different expected total assets are generated, different asset holding ratios are configured. In an embodiment, the first processing module 920 may be configured to perform the step S220 described above, which is not described herein again.

The second processing module 930 is configured to determine a first target strategy using the dynamic programming equation according to the expected total assets and the expenditure, the first target strategy comprising: the asset holding ratio of the target object when the expected total assets and the expenditure meet the first control objective. In an embodiment, the second processing module 930 may be configured to perform the step S230 described above, which is not described herein again.

In the embodiment of the disclosure, a first control function may be constructed from the plurality of expected total assets and the expenditure, and the dynamic programming equation is then solved according to the first control function and the first control objective to find the optimal asset holding ratio and thereby obtain the first target strategy. For example, the first control objective may include maximizing the revenue generated by the expected total assets and the expenditure, where the revenue generated by the expenditure may include implicit revenue generated by the operation expenditure, such as employee loyalty and reputation. In this case, the first target strategy includes the asset holding ratio that the target object should adopt when the expected total assets and the expenditure generate the maximum revenue. Optionally, when determining the asset holding ratio that the target object should adopt, the holding ratio of the target object for the risky assets may be determined first; the financial institution can then obtain the allocation of funds to each asset (e.g., the risk-free assets) and to the expenditure, so as to obtain the first target strategy.

With the policy generation apparatus according to the embodiment of the present disclosure, each of the plurality of expected total assets can be generated through a random process according to the assets held by the target object and its expenditure. An optimal solution is then sought through the dynamic programming equation based on the plurality of expected total assets and the expenditure, so as to obtain the optimal asset holding ratio of the target object under maximized revenue and thereby the first target strategy. That is, the policy generation method of the embodiment of the present disclosure can generate the first target strategy by means of stochastic control. Even when the future is highly uncertain, stochastic control can generate sufficient data (for example, a plurality of expected total assets) and give an optimal asset holding ratio on that basis, which improves the efficiency of asset allocation optimization, reduces optimization cost, helps a bank plan its asset allocation, and in turn improves bank efficiency.

According to an embodiment of the present disclosure, any multiple modules of the obtaining module 910, the first processing module 920 and the second processing module 930 may be combined into one module to be implemented, or any one of the modules may be split into multiple modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the obtaining module 910, the first processing module 920 and the second processing module 930 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented by hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or implemented by any one of three implementations of software, hardware and firmware, or any suitable combination of any of them. Alternatively, at least one of the obtaining module 910, the first processing module 920 and the second processing module 930 may be at least partially implemented as a computer program module, which when executed may perform a corresponding function.

In some embodiments, the at least one asset includes a risky asset and a risk-free asset, and configuring an asset holding ratio for the target object when generating each expected total asset comprises: configuring the target object's holding share of the risky assets and holding share of the risk-free assets.

The first processing module is specifically configured to perform the following steps:

a deposit of the depositor is obtained.

And determining the total assets of the target object on at least one preset time node according to the share of the target object to the risky assets, the share of the target object to the non-risky assets and the deposit.

A stochastic process is constructed using stochastic differential equations.

And generating the expected total assets through a random process according to the determined at least one total asset.

In some embodiments, the first processing module is specifically configured to perform the following steps:

interest rates for deposits, risk-free interest rates, and expected rates and expected strengths of change for risky assets are obtained.

And determining a random differential equation of the risk-free asset according to the risk-free interest rate to obtain a first random differential equation.

And determining a random differential equation of the risky asset according to the expected change rate, the expected change intensity and the random distribution function to obtain a second random differential equation.

And determining a random differential equation of interest expenditure according to the deposit and the interest rate of the deposit to obtain a third random differential equation.

And determining the random differential equation of the total assets according to the first random differential equation, the second random differential equation, the third random differential equation and the operation expenditure so as to construct a random process.

In some embodiments, the second processing module is specifically configured to perform the following steps:

a first control function is determined based on the expected total assets and operational expenditures.

The first control function is maximized such that the expected total assets and operational expenses meet the first control objective.

And solving a dynamic programming equation by taking the maximized first control function as a cost function so as to determine a first target strategy.

In some embodiments, the policy generation apparatus further includes a third processing module, and the third processing module is configured to perform the following steps after solving the dynamic programming equation:

the first target policy is verified.

When the verification fails, discretizing at least one asset and payout.

Determining a second target strategy by using the Bellman equation according to the discretization processing result, wherein the second target strategy comprises: the asset holding ratio of the target object when the discretized at least one asset and the expenditure meet a second control objective.

In some embodiments, the third processing module is specifically configured to perform the following steps:

a probabilistic trend of the at-risk asset from one time period to a next time period is determined to derive a first probabilistic trend.

And determining a discrete vector of the risk assets according to the first probability trend to obtain a first discrete vector.

And determining a discrete vector of the risk-free asset according to the first preset variation coefficient to obtain a second discrete vector.

In some embodiments, the expenditure includes interest expenditure and operational expenditure, and the third processing module is specifically configured to perform the following steps:

and determining an expected value of the total asset at the target time node through a second random process according to the first discrete vector, the second discrete vector, the interest expenditure and the operation expenditure which are subjected to discretization treatment.

And determining a second control function according to the expected value of the total assets at the target time node and the operation expenditure.

The second control function is maximized such that the discretized at least one asset and operational expenditure meet a second control objective.

And solving the Bellman equation by taking the maximized second control function as a value function to obtain a second target strategy.

As shown in fig. 10, an electronic device 1000 according to an embodiment of the present disclosure includes a processor 1001 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage section 1008 into a Random Access Memory (RAM) 1003. Processor 1001 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 1001 may also include onboard memory for caching purposes. The processor 1001 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.

In the RAM 1003, various programs and data necessary for the operation of the electronic apparatus 1000 are stored. The processor 1001, ROM 1002, and RAM 1003 are connected to each other by a bus 1004. The processor 1001 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 1002 and/or the RAM 1003. Note that the programs may also be stored in one or more memories other than the ROM 1002 and the RAM 1003. The processor 1001 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.

Electronic device 1000 may also include an input/output (I/O) interface 1005, the input/output (I/O) interface 1005 also being connected to bus 1004, according to an embodiment of the present disclosure. The electronic device 1000 may also include one or more of the following components connected to the I/O interface 1005: an input section 1006 including a keyboard, a mouse, and the like; an output section 1007 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 1008 including a hard disk and the like; and a communication section 1009 including a network interface card such as a LAN card, a modem, or the like. The communication section 1009 performs communication processing via a network such as the internet. The drive 1010 is also connected to the I/O interface 1005 as necessary. A removable medium 1011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 1010 as necessary, so that a computer program read out therefrom is installed into the storage section 1008 as necessary.

The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement a policy generation method according to an embodiment of the present disclosure.

According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 1002 and/or the RAM 1003 described above and/or one or more memories other than the ROM 1002 and the RAM 1003.

Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated by the flow chart. When the computer program product runs in a computer system, the program code is used for causing the computer system to realize the strategy generation method provided by the embodiment of the disclosure.

The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 1001. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.

In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, and the like. In another embodiment, the computer program may also be transmitted in the form of a signal on a network medium, distributed, downloaded and installed via the communication part 1009, and/or installed from the removable medium 1011. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.

In such an embodiment, the computer program may be downloaded and installed from a network through the communication part 1009 and/or installed from the removable medium 1011. The computer program performs the above-described functions defined in the system of the embodiment of the present disclosure when executed by the processor 1001. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.

In accordance with embodiments of the present disclosure, program code for executing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages, and in particular, these computer programs may be implemented using high level procedural and/or object oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, C, and the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

It will be appreciated by a person skilled in the art that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly recited in the disclosure. In particular, such combinations and/or integrations may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or integrations are within the scope of the present disclosure.

The embodiments of the present disclosure have been described above. However, these embodiments are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be advantageously combined. The scope of the disclosure is defined by the appended claims and their equivalents. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to fall within the scope of the present disclosure.

Claims

1. A policy generation method, comprising:

acquiring at least one asset held by a target object and an expenditure of the target object;

2. The policy generation method according to claim 1, wherein the at least one asset comprises a risky asset and a non-risky asset, and wherein configuring the asset holding ratio for the target object when generating each of the expected total assets comprises: configuring the target object's holding share of the risky asset and holding share of the non-risky asset;

acquiring a deposit of a depositor;

constructing the stochastic process by using a stochastic differential equation;

3. The policy generation method according to claim 2, wherein the expenditure includes an interest expenditure and an operational expenditure, and constructing the stochastic process by using the stochastic differential equation includes:

acquiring an interest rate of the deposit, a risk-free interest rate, and an expected rate of change and an expected intensity of change of the risky asset;

determining a stochastic differential equation for the risky asset based on the expected rate of change, the expected intensity of change, and a random distribution function, to obtain a second stochastic differential equation;

determining a stochastic differential equation for the total assets based on the first stochastic differential equation, the second stochastic differential equation, the third stochastic differential equation, and the operational expenditure, to construct the stochastic process.
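
By way of illustration only, the stochastic process recited in claim 3 might be simulated as in the following minimal sketch. The claims do not give the explicit form of the equations, so the sketch assumes a total-asset equation dX = [X(r_f + pi(mu - r_f)) - r_d D - c] dt + pi sigma X dW, where pi is the holding share of the risky asset, mu and sigma are its expected rate and intensity of change, r_f is the risk-free rate, r_d D is the interest expenditure on a deposit D held constant, and c is the operational expenditure rate. The function name, the parameter names, and the Euler-Maruyama scheme are all hypothetical choices made for this example, not part of the disclosure.

```python
import numpy as np


def simulate_total_assets(x0, deposit, pi, mu, sigma, r_f, r_d, op_cost,
                          horizon=1.0, n_steps=252, n_paths=1000, seed=0):
    """Euler-Maruyama simulation of an ASSUMED total-asset SDE.

    Assumed dynamics (not taken from the claims):
        dX = [X*(r_f + pi*(mu - r_f)) - r_d*deposit - op_cost] dt
             + pi*sigma*X dW
    Returns the simulated total assets at the target time node,
    one value per path.
    """
    rng = np.random.default_rng(seed)
    dt = horizon / n_steps
    x = np.full(n_paths, float(x0))
    for _ in range(n_steps):
        # Brownian increments: normal with mean 0 and variance dt
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        # Portfolio return minus interest and operational expenditure
        drift = x * (r_f + pi * (mu - r_f)) - r_d * deposit - op_cost
        x = x + drift * dt + pi * sigma * x * dw
    return x
```

Calling the function once per candidate holding ratio, for example `simulate_total_assets(100.0, 80.0, pi, 0.08, 0.2, 0.02, 0.01, 1.0)` for several values of `pi`, would yield one set of expected total assets per ratio, which is the kind of input the dynamic programming step of claim 4 operates on.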

4. The policy generation method according to claim 1, wherein the expenditure comprises an operational expenditure, and determining the first target policy using the dynamic programming equation based on the expected total assets and the expenditure comprises:

5. The policy generation method according to claim 4, further comprising, after solving the dynamic programming equation:

verifying the first target policy;

determining a second target policy by using a Bellman equation according to the result of the discretization processing, wherein the second target policy comprises: the asset holding ratio of the target object when the discretized at least one asset and the discretized expenditure meet a second control objective.
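
By way of illustration only, the Bellman-equation step of claim 5 might look like the following minimal sketch of backward induction on a discretized grid of total assets. The claims do not specify the second control objective or the discretization scheme, so the sketch assumes a log-utility objective on terminal total assets, a two-state (up/down) return for the risky asset in each period, and a fixed expenditure per period. The function name `bellman_policy` and all parameter names are hypothetical.

```python
import numpy as np


def bellman_policy(grid, ratios, n_periods, r_f, up, down, p_up, cost):
    """Backward induction over a discretized total-asset grid.

    ASSUMPTIONS (not taken from the claims): terminal objective is
    log(total assets); the risky asset returns `up` with probability
    `p_up` and `down` otherwise; `cost` is a fixed per-period expenditure.
    `grid` must be increasing and strictly positive.
    Returns (policy, value): policy[t, i] is the best risky-asset share
    at period t for grid point i; value is the period-0 value function.
    """
    grid = np.asarray(grid, dtype=float)
    value = np.log(grid)  # assumed terminal objective
    policy = np.zeros((n_periods, grid.size))
    for t in range(n_periods - 1, -1, -1):
        best = np.full(grid.size, -np.inf)
        for pi in ratios:
            # Next-period total assets for each state of the risky asset
            nxt_up = grid * (1 + r_f + pi * (up - r_f)) - cost
            nxt_dn = grid * (1 + r_f + pi * (down - r_f)) - cost
            # Interpolate the continuation value (clamped at grid ends)
            q = (p_up * np.interp(nxt_up, grid, value)
                 + (1 - p_up) * np.interp(nxt_dn, grid, value))
            better = q > best
            best = np.where(better, q, best)
            policy[t] = np.where(better, pi, policy[t])
        value = best  # Bellman update: best achievable expected value
    return policy, value
```

Because `np.interp` clamps values that fall outside the grid, the policy near the grid edges is unreliable in this sketch; a practical implementation would widen the grid or treat the boundaries explicitly. The resulting policy could also serve as a cross-check on the first target policy, in the spirit of the verification step of claim 5.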

6. The policy generation method according to claim 5, wherein the at least one asset comprises a risky asset and a non-risky asset, and discretizing the at least one asset and the expenditure comprises:

7. The policy generation method according to claim 6, wherein the expenditure includes an interest expenditure and an operational expenditure, and determining the second target policy by using the Bellman equation according to the result of the discretization processing includes:

8. A policy generation apparatus, comprising:

a first processing module, configured to generate a plurality of expected total assets of the total assets of the target object at a target time node, wherein each expected total asset is generated by a stochastic process based on the expenditure and the at least one asset; when each expected total asset is generated, an asset holding ratio is configured for the target object, the asset holding ratio comprising the target object's holding share of the at least one asset; and different asset holding ratios are configured when different expected total assets are generated;

9. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the policy generation method according to any one of claims 1-7.

10. A computer-readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the policy generation method according to any one of claims 1 to 7.

11. A computer program product, characterized in that it comprises a computer program which, when executed by a processor, implements the policy generation method according to any one of claims 1 to 7.