CN116700972A - Resource scheduling model training method, scheduling method and device of financial system - Google Patents


Info

Publication number
CN116700972A
Authority
CN
China
Prior art keywords
resource scheduling
resource
model
initial
occupation data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310688077.1A
Other languages
Chinese (zh)
Inventor
曾俊杰
张彬
唐琳娜
方安
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310688077.1A priority Critical patent/CN116700972A/en
Publication of CN116700972A publication Critical patent/CN116700972A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 - Allocation of resources, e.g. of the central processing unit [CPU], to service a request
    • G06F 9/5027 - Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 - Multiprogramming arrangements
    • G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 - Task transfer initiation or dispatching
    • G06F 9/4843 - Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/088 - Non-supervised learning, e.g. competitive learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/092 - Reinforcement learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 40/00 - Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to the technical field of artificial intelligence, and in particular to a resource scheduling model training method, a scheduling method and a scheduling device for a financial system. The training method comprises: acquiring initial resource occupation data in a financial system and an initial resource scheduling evaluation model; obtaining, through the initial resource scheduling evaluation model, the resource scheduling evaluation values corresponding to the initial resource occupation data executing various resource scheduling actions, and obtaining the target resource scheduling action with the largest resource scheduling evaluation value; obtaining, according to a resource scheduling loss model, the resource loss value of executing the target resource scheduling action on the initial resource occupation data, and updating the initial resource scheduling evaluation model according to the resource loss value to obtain a target resource scheduling model; repeating the iteration until the target resource scheduling model meets a preset convergence condition; and taking the converged target resource scheduling model as the trained resource scheduling model. The method can improve the resource utilization rate of the distributed financial service system.

Description

Resource scheduling model training method, scheduling method and device of financial system
Technical Field
The present application relates to the field of artificial intelligence technology, and in particular, to a resource scheduling model training method, scheduling method, apparatus, computer device, storage medium and computer program product for a financial system.
Background
In order to obtain user habits, user information and transaction information, a financial service system collects a large amount of financial service data. In the services that collect this data and perform batch calculation on it, a resource management system schedules and allocates computing resources.
A resource management system in the related art allocates tasks according to one of the following scheduling policies: first-in-first-out scheduling, container scheduling, or fairness scheduling. After the resource management system receives a user request from the financial service system, it obtains the task through one of these three policies and allocates the system resources that the task applies for. When system resources are allocated under these three policies, the resource utilization rate of the distributed financial service system is low, which negatively affects the operation and performance of the system and reduces the running efficiency of financial service data calculation tasks.
Therefore, how to optimize the scheduling of distributed resources in the financial service system and improve the resource utilization rate of the distributed financial service system, so as to improve the running efficiency of financial service data calculation tasks, is a technical problem to be solved.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a resource scheduling model training method, a scheduling method and an apparatus for a financial system that can improve resource utilization.
In a first aspect, the present application provides a method for training a resource scheduling model of a financial system, the method comprising:
acquiring initial resource occupation data of each financial node in a financial system and a pre-constructed initial resource scheduling evaluation model associated with the financial system;
obtaining, through the initial resource scheduling evaluation model, the resource scheduling evaluation values respectively corresponding to the initial resource occupation data executing various resource scheduling actions, and obtaining the target resource scheduling action with the largest resource scheduling evaluation value;
acquiring a resource loss value of the initial resource occupation data for executing the target resource scheduling action according to a resource scheduling loss model associated with the financial system, and updating the initial resource scheduling evaluation model according to the resource loss value to obtain a target resource scheduling model;
acquiring the target resource occupation data of each financial node after the target resource scheduling action is executed on the initial resource occupation data, taking the target resource occupation data as new initial resource occupation data, taking the target resource scheduling model as a new initial resource scheduling evaluation model, and returning to the step of obtaining, through the initial resource scheduling evaluation model, the resource scheduling evaluation values respectively corresponding to the various resource scheduling actions, until the target resource scheduling model meets a preset convergence condition;
and taking the target resource scheduling model as a resource scheduling model trained by the financial system.
In one embodiment, obtaining, through the initial resource scheduling evaluation model, the resource scheduling evaluation values corresponding to the initial resource occupation data executing the multiple resource scheduling actions, and obtaining the target resource scheduling action with the largest resource scheduling evaluation value, includes:
acquiring a plurality of resource scheduling actions corresponding to the preset initial resource occupation data;
inputting each resource scheduling action in a plurality of resource scheduling actions and the initial resource occupation data into the initial resource scheduling evaluation model to obtain resource scheduling evaluation values respectively corresponding to each resource scheduling action executed by the initial resource occupation data;
and taking, from among the resource scheduling evaluation values, the resource scheduling action with the largest resource scheduling evaluation value as the target resource scheduling action.
In one embodiment, the updating the initial resource scheduling evaluation model according to the resource loss value to obtain a target resource scheduling model includes:
acquiring a first resource scheduling evaluation value corresponding to the target resource scheduling action, and the current resource occupation data of each financial node after the target resource scheduling action is executed on the initial resource occupation data;
determining resource scheduling evaluation values respectively corresponding to the plurality of resource scheduling actions of the current resource occupation data according to the initial resource scheduling evaluation model, and taking the maximum resource scheduling evaluation value corresponding to the current resource occupation data as a second resource scheduling evaluation value;
updating the first resource scheduling evaluation value according to the sum of the resource loss value and the difference between the second resource scheduling evaluation value and the first resource scheduling evaluation value, to obtain an updated first resource scheduling evaluation value;
and updating the initial resource scheduling evaluation model according to the updated first resource scheduling evaluation value to obtain the target resource scheduling model.
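The update described in this embodiment resembles a temporal-difference update: the first evaluation value is moved toward the resource loss value plus the second (maximum next-state) evaluation value. A hedged sketch follows; the learning rate, discount factor, and sign convention for the loss value are illustrative assumptions, not values specified by the text.

```python
# One update of the first resource scheduling evaluation value, using the
# resource loss value and the second (maximum next-state) evaluation value.
# `alpha` (learning rate) and `gamma` (discount) are illustrative only.

def update_first_value(q_first, q_second_max, loss_value,
                       alpha=0.1, gamma=0.9):
    """Move q_first toward (loss_value + gamma * q_second_max)."""
    target = loss_value + gamma * q_second_max   # loss plus discounted best next value
    return q_first + alpha * (target - q_first)  # step by the difference

updated = update_first_value(q_first=1.0, q_second_max=2.0, loss_value=-0.5)
print(round(updated, 3))   # 1.0 + 0.1 * ((-0.5 + 1.8) - 1.0) = 1.03
```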
In one embodiment, the step of using the target resource scheduling model as a new initial resource scheduling evaluation model, and returning to execute the step of obtaining the resource scheduling evaluation values respectively corresponding to the multiple resource scheduling actions corresponding to the initial resource occupation data through the initial resource scheduling evaluation model until the target resource scheduling model meets a preset convergence condition includes:
taking the target resource scheduling model as a new initial resource scheduling evaluation model, and detecting whether a system state corresponding to the target resource occupation data reaches a preset system ending state or not; the system end state is a system state corresponding to the target resource occupation data updated by the preset times;
if the system state does not reach the system ending state, returning to execute the step of acquiring resource scheduling evaluation values respectively corresponding to various resource scheduling actions corresponding to the initial resource occupation data through the initial resource scheduling evaluation model until the system state reaches the system ending state;
if the system state reaches the system ending state, judging whether the target resource scheduling model meets a preset convergence condition or not;
if the target resource scheduling model does not meet the preset convergence condition, resetting the target resource occupation data to the initial resource occupation data corresponding to the system initial state, and returning to the step of obtaining the resource scheduling evaluation values respectively corresponding to the various resource scheduling actions through the initial resource scheduling evaluation model, until the target resource scheduling model meets the preset convergence condition; the system initial state is the system state corresponding to the initial resource occupation data before the initial resource scheduling evaluation model is updated for the first time.
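The episode logic of this embodiment can be sketched as follows; `train_step` and `has_converged` are hypothetical stand-ins for the per-step model update and the preset convergence condition, and the step and episode counts are illustrative.

```python
# Run episodes of a fixed number of state updates (the "system end state"),
# checking convergence after each episode and resetting the occupation data
# to the system initial state when not yet converged.

def train(initial_occupation, train_step, has_converged,
          steps_per_episode=5, max_episodes=100):
    episodes = 0
    while episodes < max_episodes:
        occupation = list(initial_occupation)   # reset to the system initial state
        for _ in range(steps_per_episode):      # run until the system end state
            occupation = train_step(occupation)
        episodes += 1
        if has_converged():                     # preset convergence condition
            break
    return episodes

# Toy run: the stand-in convergence check succeeds on the third episode.
counter = {"calls": 0}

def toy_step(occ):
    return occ

def toy_converged():
    counter["calls"] += 1
    return counter["calls"] >= 3

n_episodes = train([0.5, 0.5], toy_step, toy_converged)
print(n_episodes)   # 3
```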
In one embodiment, before the obtaining the initial resource occupation data of each financial node in the financial system and the pre-constructed initial resource scheduling evaluation model associated with the financial system, the method further includes:
acquiring historical resource scheduling data of the financial system in a preset operation period, wherein the historical resource scheduling data comprises historical resource occupation data, historical resource scheduling actions corresponding to the historical resource occupation data and historical resource loss values of the historical resource occupation data for executing the historical resource scheduling actions;
and constructing the initial resource scheduling evaluation model and the resource scheduling loss model according to the historical resource occupation data, the historical resource scheduling actions and the historical resource loss values.
In one embodiment, the constructing the initial resource scheduling evaluation model and the resource scheduling loss model according to the historical resource occupation data, the historical resource scheduling action and the historical resource loss value includes:
taking the historical resource occupation data and the corresponding historical resource scheduling actions as independent variables and the historical resource loss values of executing those actions as the dependent variable, and fitting with a deep neural network algorithm to obtain the resource scheduling loss model;
according to the historical resource occupation data and the historical resource scheduling action corresponding to the historical resource occupation data, determining a historical resource scheduling evaluation value of the historical resource scheduling action executed by the historical resource occupation data;
and taking the historical resource occupation data and the corresponding historical resource scheduling actions as independent variables and the historical resource scheduling evaluation values as the dependent variable, and fitting with a deep neural network algorithm to obtain the initial resource scheduling evaluation model.
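As an illustrative sketch of this fitting step, the snippet below trains a small one-hidden-layer network by plain gradient descent to map (occupation data, action) features to a loss value. The synthetic data, architecture and hyperparameters are assumptions for demonstration only, not the deep neural network configuration of the application.

```python
# Fit a tiny one-hidden-layer tanh network: independent variables are
# 2 occupation features plus 1 action code; dependent variable is a
# synthetic historical resource loss value.
import numpy as np

rng = np.random.default_rng(0)

X = rng.uniform(0.0, 1.0, size=(256, 3))                    # (occupation, action)
y = (0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.2 * X[:, 2]).reshape(-1, 1)  # loss values

W1 = rng.normal(0.0, 0.5, size=(3, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.5, size=(16, 1)); b2 = np.zeros(1)

lr = 0.1
for _ in range(500):                          # full-batch gradient descent
    h = np.tanh(X @ W1 + b1)                  # hidden layer
    pred = h @ W2 + b2                        # predicted loss value
    err = pred - y
    mse = float(np.mean(err ** 2))
    # Backpropagate the mean-squared error.
    g_pred = 2.0 * err / len(X)
    g_W2 = h.T @ g_pred; g_b2 = g_pred.sum(axis=0)
    g_h = g_pred @ W2.T * (1.0 - h ** 2)
    g_W1 = X.T @ g_h; g_b1 = g_h.sum(axis=0)
    W2 -= lr * g_W2; b2 -= lr * g_b2
    W1 -= lr * g_W1; b1 -= lr * g_b1

print(round(mse, 4))   # fit error shrinks toward zero
```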
In a second aspect, the present application provides a resource scheduling method for a financial system. The method comprises the following steps:
acquiring real-time resource occupation data of each financial node in a financial system and a resource scheduling model trained by the financial system; the resource scheduling model is trained by the method of any one of claims 1 to 6;
obtaining, through the resource scheduling model, the real-time resource scheduling evaluation values respectively corresponding to the real-time resource occupation data executing various real-time resource scheduling actions, and obtaining the real-time resource scheduling action with the largest real-time resource scheduling evaluation value;
and scheduling the resources of the financial system according to the real-time resource scheduling action with the maximum real-time resource scheduling evaluation value.
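A minimal sketch of this serving-time scheduling flow, assuming the trained evaluation model from the first aspect is available; `toy_model` and `toy_execute` are hypothetical placeholders for the converged model and the action executor.

```python
# Score every real-time scheduling action for the current occupation data
# with the trained model, then execute the highest-scoring action.

def schedule(realtime_occupation, actions, trained_model, execute):
    """Pick and execute the action with the largest real-time evaluation value."""
    best = max(actions, key=lambda a: trained_model(realtime_occupation, a))
    return execute(realtime_occupation, best)

# Toy model: prefers moving load from the busiest node to the idlest node.
def toy_model(occ, action):
    src, dst = action
    return occ[src] - occ[dst]

# Toy executor: migrate 0.1 units of load from src to dst.
def toy_execute(occ, action):
    src, dst = action
    occ = list(occ)
    occ[src] -= 0.1
    occ[dst] += 0.1
    return occ

new_occ = schedule([0.8, 0.3, 0.1], [(0, 1), (0, 2), (1, 2)],
                   toy_model, toy_execute)
print([round(v, 1) for v in new_occ])   # [0.7, 0.3, 0.2]
```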
In a third aspect, the present application further provides a training device for a resource scheduling model of a financial system. The device comprises:
the data model acquisition module is used for acquiring initial resource occupation data of each financial node in the financial system and a pre-constructed initial resource scheduling evaluation model associated with the financial system;
the scheduling action acquisition module is used for obtaining, through the initial resource scheduling evaluation model, the resource scheduling evaluation values respectively corresponding to the initial resource occupation data executing the multiple resource scheduling actions, and obtaining the target resource scheduling action with the largest resource scheduling evaluation value;
the model updating module is used for acquiring, according to a pre-constructed resource scheduling loss model associated with the financial system, the resource loss value of executing the target resource scheduling action on the initial resource occupation data, and updating the initial resource scheduling evaluation model according to the resource loss value to obtain a target resource scheduling model;
the cyclic convergence module is used for acquiring target resource occupation data of each financial node after the initial resource occupation data execute the target resource scheduling action, taking the target resource occupation data as new initial resource occupation data, taking the target resource scheduling model as a new initial resource scheduling evaluation model, and returning to execute the step of acquiring resource scheduling evaluation values respectively corresponding to various resource scheduling actions corresponding to the initial resource occupation data through the initial resource scheduling evaluation model until the target resource scheduling model meets preset convergence conditions;
and the training completion module is used for taking the target resource scheduling model as a resource scheduling model for completing the training of the financial system.
In a fourth aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of the method described in the first or second aspect when the computer program is executed.
In a fifth aspect, the present application also provides a computer-readable storage medium. The computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method described in the first or second aspect.
In a sixth aspect, the application also provides a computer program product. The computer program product comprising a computer program which, when executed by a processor, implements the steps of the method described in the first or second aspect.
The resource scheduling model training method, the scheduling method, the apparatus, the computer device, the storage medium and the computer program product of the financial system work as follows: acquiring initial resource occupation data of each financial node in the financial system and a pre-constructed initial resource scheduling evaluation model associated with the financial system; obtaining, through the initial resource scheduling evaluation model, the resource scheduling evaluation values respectively corresponding to the initial resource occupation data executing various resource scheduling actions, and obtaining the target resource scheduling action with the largest resource scheduling evaluation value; acquiring, according to a resource scheduling loss model associated with the financial system, the resource loss value of executing the target resource scheduling action on the initial resource occupation data, and updating the initial resource scheduling evaluation model according to the resource loss value to obtain a target resource scheduling model; then acquiring the target resource occupation data of each financial node after the target resource scheduling action is executed, taking the target resource occupation data as new initial resource occupation data and the target resource scheduling model as a new initial resource scheduling evaluation model, and returning to the evaluation-value acquisition step until the target resource scheduling model meets a preset convergence condition; and finally taking the converged target resource scheduling model as the trained resource scheduling model of the financial system.
In this scheme, an initial resource scheduling model is iteratively updated and trained to obtain a converged resource scheduling model. During training, multiple resource scheduling evaluation values are obtained through the current initial resource scheduling evaluation model and the initial resource occupation data; the highest of these evaluation values is then identified, and the action corresponding to it is taken as the next resource scheduling action. The action with the highest evaluation value is the optimal scheduling action, so executing it reduces the loss of computing resources. The highest evaluation value is then updated by means of reinforcement learning and the resource loss function, which in turn updates the initial resource scheduling model; this cycle of updating the highest evaluation value and the resource scheduling model continues until convergence. The trained resource scheduling model can fully characterize the optimal resource scheduling action of the financial system under different resource occupation states, thereby improving the resource utilization rate of the distributed financial service system and the running efficiency of financial service data calculation tasks.
Drawings
FIG. 1 is an application environment diagram of a resource scheduling model training method of a financial system in one embodiment;
FIG. 2 is a flow chart of a method for training a resource scheduling model of a financial system in one embodiment;
FIG. 3 is a flow chart of a method for training a resource scheduling model of a financial system in one embodiment;
FIG. 4 is a flow chart of a method for training a resource scheduling model of a financial system in one embodiment;
FIG. 5 is a flow diagram of constructing an initial resource scheduling evaluation model and a resource scheduling loss model in one embodiment;
FIG. 6 is a flow chart of a method for scheduling resources of a financial system according to one embodiment;
FIG. 7 is a flow diagram of a method for building a resource scheduling model of a financial system, in one embodiment;
FIG. 8 is a flow chart of data sampling of a financial system in one embodiment;
FIG. 9 is a flow diagram of constructing an initial resource scheduling evaluation model and a resource scheduling loss model in one embodiment;
FIG. 10 is a flow diagram of a method for training a resource scheduling model of a financial system in one embodiment;
FIG. 11 is a schematic diagram of a resource scheduling model training method of a financial system in one embodiment;
FIG. 12 is a block diagram of a resource scheduling model training apparatus of a financial system in one embodiment;
FIG. 13 is a block diagram of a resource scheduler of a financial system in one embodiment;
fig. 14 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The training method of the resource scheduling model of the financial system or the scheduling method of the resource scheduling model of the financial system provided by the embodiment of the application can be applied to the application environment as shown in figure 1. The terminal 102 communicates with the server 104 through a network, and receives and transmits resource occupation data, resource scheduling actions and resource scheduling evaluation values of each financial node in the financial system. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server. The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.
In one embodiment, as shown in fig. 2, a method for training a resource scheduling model of a financial system is provided, and the method is applied to the server 104 in fig. 1 for illustration, and includes the following steps:
s202, initial resource occupation data of all financial nodes in a financial system and a pre-constructed initial resource scheduling evaluation model associated with the financial system are acquired.
The financial system is a distributed system comprising a plurality of financial nodes; each node can perform independent operations and independently allocate computing resources, so the distributed system as a whole can allocate resources flexibly. The initial resource occupation data is the initial value set when the simulated system starts running; resource occupation data refers to the resource occupancy of each node in the financial system, so the initial resource occupation data is the resource occupation data corresponding to the plurality of financial nodes. It will be appreciated that different resource occupation data correspond to different current system states of the financial system: one set of resource occupation data corresponds to one current system state.
The initial resource scheduling evaluation model is a pre-constructed function model associated with the financial system, composed of resource occupation data, resource scheduling actions and resource scheduling evaluation values. A resource scheduling action is any of the scheduling actions that can be executed in the current system state; executing it transitions the system from the current state to another state. By determining a resource scheduling action and the resource occupation data of the current state, that action can be executed in the current state with a corresponding evaluation value, after which the system jumps to the next state. Multiple resource scheduling actions therefore exist in the current state, each with its own resource scheduling evaluation value.
Optionally, as an embodiment, the server obtains preset initial resource occupation data of each node in the financial system stored in the database, and obtains a preset initial resource scheduling evaluation function.
S204, obtaining, through the initial resource scheduling evaluation model, the resource scheduling evaluation values respectively corresponding to the initial resource occupation data executing the multiple resource scheduling actions, and obtaining the target resource scheduling action with the largest resource scheduling evaluation value.
When the initial resource occupation data is acquired, an initial financial system state is obtained. Multiple resource scheduling actions exist for the initial resource occupation data, and each of them can change the initial financial system state and switch to another financial system state. For each resource scheduling action, a resource scheduling evaluation value can be computed from the initial resource occupation data: the initial resource occupation data and one resource scheduling action are input into the initial resource scheduling evaluation model, which outputs the evaluation value corresponding to that action. Proceeding in this way, all the resource scheduling evaluation values can be obtained through the initial resource scheduling evaluation model.
In order to screen out the needed resource scheduling action, the server calculates the evaluation value corresponding to each scheduling action and selects the resource scheduling action with the largest evaluation value as the action that participates in the subsequent steps.
Selecting the resource scheduling action with the largest evaluation value is only one implementation; alternatively, the resource scheduling action with the smallest evaluation value, or the action whose evaluation value is closest to the average of the evaluation values, may be selected.
Selecting the resource scheduling action corresponding to a single evaluation value is likewise only one embodiment; alternatively, the resource scheduling actions corresponding to two or more evaluation values may be selected.
S206, obtaining, according to a resource scheduling loss model associated with the financial system, a resource loss value incurred when the initial resource occupation data executes the target resource scheduling action, and updating the initial resource scheduling evaluation model according to the resource loss value to obtain a target resource scheduling model.
The resource scheduling loss model associated with the financial system is constructed in advance and represents the resource loss value corresponding to each resource scheduling action in the current system state, so that the computing resources consumed by executing each action can be obtained directly. The model relates resource occupation data, resource scheduling actions and resource loss values, so the corresponding resource loss value can be obtained from the resource occupation data and the resource scheduling action alone.
The resource loss value can serve as a parameter for updating the initial resource scheduling evaluation model: the corresponding value in the model is updated by the resource loss value, and the model is iteratively updated with successive resource loss values, yielding an updated target resource scheduling model. When the target resource scheduling model reaches a preset effect or an ideal value, the updating is finished and the final resource scheduling model is obtained.
Optionally, as an embodiment, after the resource loss value is obtained, it may be used as one of the training parameters for iterative training according to the Q-learning algorithm in reinforcement learning; it may be used as one of the training parameters for iterative training according to the SARSA (State-Action-Reward-State-Action) algorithm; or it may be used for training according to another reinforcement learning algorithm such as the deep Q network DQN (Deep Q Network) or the deep deterministic policy gradient DDPG (Deep Deterministic Policy Gradient). The present application is not particularly limited in this respect.
S208, acquiring the target resource occupation data of each financial node after the initial resource occupation data executes the target resource scheduling action, taking the target resource occupation data as new initial resource occupation data and the target resource scheduling model as a new initial resource scheduling evaluation model, and returning to the step of obtaining, through the initial resource scheduling evaluation model, the resource scheduling evaluation values respectively corresponding to the plurality of resource scheduling actions, until the target resource scheduling model meets a preset convergence condition.
After the resource scheduling action is executed on the initial resource occupation data, the state of the financial system changes, which simulates the running process of the system. During running, the system allocates resources to each node in the financial system according to the task requests it receives; the state of the financial system thus changes continuously, the initial resource occupation data is updated along with these state changes, and the resource scheduling action corresponding to the maximum resource scheduling evaluation value is matched. Therefore, after each change of the initial resource occupation data, the resource scheduling loss value corresponding to the executed action can be folded into the latest resource scheduling evaluation model by way of reinforcement learning.
Model training obtains a final stable model through iterative training, so a convergence condition is needed to judge whether the model has converged, and this condition can take various forms.
Optionally, as an embodiment, the preset convergence condition may be a threshold: when the average difference between the updated resource scheduling evaluation values and those of the previous model is smaller than the threshold, the updated resource scheduling model no longer fluctuates excessively.
Optionally, as an embodiment, the convergence condition may be a preset number of iterative updates: when training reaches that number, the model is deemed converged.
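A minimal sketch combining the two convergence tests above; the function name and the default values of `max_iters` and `tol` are assumptions, not specified by the text:

```python
def has_converged(prev_q, curr_q, iteration, max_iters=1000, tol=1e-3):
    """Convergence test combining both options described above: either the
    average absolute change in evaluation values falls below a threshold, or
    a preset number of iterative updates has been reached."""
    avg_delta = sum(abs(a - b) for a, b in zip(prev_q, curr_q)) / len(curr_q)
    return avg_delta < tol or iteration >= max_iters
```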
And S210, taking the target resource scheduling model as a resource scheduling model for which the financial system training is completed.
The resource scheduling model after convergence can be used for obtaining an accurate resource scheduling evaluation value, so that the resource scheduling model can be directly used.
In the above method for training a resource scheduling model of a financial system, an initial resource scheduling model is updated and trained to obtain a converged resource scheduling model. During training, a plurality of resource scheduling evaluation values are obtained through the existing initial resource scheduling evaluation model and the initial resource occupation data. The highest of these evaluation values is then identified, and the action corresponding to it is taken as the next resource scheduling action; the action with the highest evaluation value is the optimal scheduling action, so the loss of computing resources can be reduced. The highest evaluation value is then updated by way of reinforcement learning with a resource loss function, thereby updating the initial resource scheduling model; this updating is repeated until convergence. The trained resource scheduling model can fully represent the optimal resource scheduling action of the financial system under different resource occupation states, improving the resource utilization of the distributed financial service system and the running efficiency of financial service data calculation tasks.
In one embodiment, the obtaining, through the initial resource scheduling evaluation model, the resource scheduling evaluation values corresponding to the plurality of resource scheduling actions executable on the initial resource occupation data, and obtaining the target resource scheduling action with the largest resource scheduling evaluation value, includes:
acquiring a plurality of preset resource scheduling actions corresponding to the initial resource occupation data;
inputting each of the plurality of resource scheduling actions together with the initial resource occupation data into the initial resource scheduling evaluation model, to obtain the resource scheduling evaluation value corresponding to each resource scheduling action executed on the initial resource occupation data;
and taking, among the resource scheduling evaluation values, the resource scheduling action with the largest resource scheduling evaluation value as the target resource scheduling action.
It should be appreciated that the plurality of resource scheduling actions are preset actions that can be executed on the initial resource occupation data. Inputting a resource scheduling action and the initial resource occupation data into the initial resource scheduling evaluation model yields a resource scheduling evaluation value, which scores the effect of executing that action in the system state corresponding to the initial resource occupation data.
It should be understood that the larger the resource scheduling evaluation value, the better the corresponding resource scheduling action; therefore the action corresponding to the largest evaluation value is the optimal scheduling action.
In this embodiment, through the above steps, the optimal resource scheduling action can be obtained based on the maximum resource scheduling evaluation value, optimizing the execution logic of the model and the operation process of the model system.
In one embodiment, as shown in fig. 3, the initial resource scheduling evaluation model is updated according to the resource loss value to obtain a target resource scheduling model, which includes the following steps:
S302, obtaining a first resource scheduling evaluation value corresponding to the target resource scheduling action, and the current resource occupation data of each financial node after the initial resource occupation data executes the target resource scheduling action.
S304, according to the initial resource scheduling evaluation model, determining resource scheduling evaluation values corresponding to various resource scheduling actions of the current resource occupation data, and taking the maximum resource scheduling evaluation value corresponding to the current resource occupation data as a second resource scheduling evaluation value.
After one update of the initial resource occupation data during the simulated operation of the financial system, the system state changes due to the execution of the target resource scheduling action and becomes the current resource occupation data. At this point, the resource scheduling evaluation values of the plurality of resource scheduling actions are calculated from the current resource occupation data and the initial resource scheduling evaluation model, and the largest of them is taken as the second resource scheduling evaluation value.
Thus, the first resource scheduling evaluation value corresponds to the initial resource occupation data and the second to the current resource occupation data; in other words, the first is the evaluation value before the financial system state update, and the second is the evaluation value after it.
S306, updating the first resource scheduling evaluation value according to the sum of the resource loss value and the difference between the second resource scheduling evaluation value and the first resource scheduling evaluation value, to obtain an updated first resource scheduling evaluation value.
Optionally, as an embodiment, the first resource scheduling evaluation value may be updated by a calculation over the first resource scheduling evaluation value, the second resource scheduling evaluation value, and the resource loss value incurred before the financial system state update.
Optionally, as an embodiment, the average of the first and second resource scheduling evaluation values may be calculated, the resource loss value incurred before the state update subtracted, and the result used to update the first resource scheduling evaluation value.
And S308, updating the initial resource scheduling evaluation model according to the updated first resource scheduling evaluation value to obtain a target resource scheduling model.
Updating only the first resource scheduling evaluation value cannot directly change the initial resource scheduling evaluation model. At this time, the resource scheduling evaluation model needs to be updated according to the updated first resource scheduling evaluation value, so that the hidden feature of the updated first resource scheduling evaluation value is incorporated into the initial resource scheduling evaluation model.
In this embodiment, through the above steps, the first resource scheduling evaluation value, the second resource scheduling evaluation value and the resource loss value obtained through calculation can be used as training parameters to update the first resource scheduling evaluation value, and further update the initial resource scheduling evaluation model, so that training accuracy is improved, logic of a training process is increased, and an accurate updating result is finally obtained.
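Under the Q-learning interpretation mentioned earlier, the update of the first resource scheduling evaluation value in S306 can be sketched as follows; the learning rate `alpha` and discount factor `gamma` are assumed hyperparameters not fixed by the text:

```python
def update_evaluation(q1, q2, loss, alpha=0.1, gamma=0.9):
    """One Q-learning style update of the first resource scheduling
    evaluation value (step S306): move q1 toward (loss + gamma * q2),
    i.e. by the sum of the loss value and the (discounted) difference
    between the second and first evaluation values."""
    return q1 + alpha * (loss + gamma * q2 - q1)

q1 = 0.5      # first evaluation value (before the state update)
q2 = 0.8      # second (maximum) evaluation value in the new state
loss = -0.2   # resource loss incurred by the target scheduling action
updated = update_evaluation(q1, q2, loss)
```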
In one embodiment, as shown in fig. 4, taking the target resource scheduling model as a new initial resource scheduling evaluation model, and returning to the step of obtaining, through the initial resource scheduling evaluation model, the resource scheduling evaluation values respectively corresponding to the plurality of resource scheduling actions, until the target resource scheduling model meets a preset convergence condition, includes:
S402, taking the target resource scheduling model as a new initial resource scheduling evaluation model, and detecting whether the system state corresponding to the target resource occupation data has reached a preset system end state; the system end state is the system state corresponding to the target resource occupation data after a preset number of updates;
S404, if the system state has not reached the system end state, returning to the step of obtaining, through the initial resource scheduling evaluation model, the resource scheduling evaluation values respectively corresponding to the plurality of resource scheduling actions, until the system state reaches the system end state;
In artificial intelligence, a training process iterates many times before the convergence condition is finally reached, so a control method is needed to determine whether the convergence condition has been reached and hence whether training is complete.
Similarly, the financial system continuously receives task requests during operation, executes resource scheduling actions, and thereby keeps changing its system state. In addition, at a certain time or under a certain preset condition, the running financial system always reaches its system end state. That is, the financial system runs from the system initial state until the system end state is reached, completing one life cycle of the financial system.
This life cycle can therefore serve as a loop condition for training the resource scheduling model: the financial system is set to run from the system initial state, and one training iteration is completed when, after continuous running and resource scheduling, the system state reaches the system end state.
S406, if the system state has reached the system end state, judging whether the target resource scheduling model meets the preset convergence condition;
S408, if the target resource scheduling model does not meet the preset convergence condition, resetting the target resource occupation data to the initial resource occupation data corresponding to the system initial state, and returning to the step of obtaining, through the initial resource scheduling evaluation model, the resource scheduling evaluation values respectively corresponding to the plurality of resource scheduling actions, until the target resource scheduling model meets the preset convergence condition; the system initial state is the system state corresponding to the initial resource occupation data before the initial resource scheduling evaluation model is updated for the first time.
It should be understood that after one life cycle of the financial system is completed, the trained resource scheduling model has not necessarily reached convergence. A judgment is therefore still needed on whether the current resource scheduling model has converged; if not, the system initial state of the financial system is re-simulated, training runs again from the system initial state until the system end state is reached, and convergence is judged once more.
Optionally, as an embodiment, the system initial state is a state preset by the user, and the system end state is reached after a preset number of iterative updates. Alternatively, as an embodiment, the system initial state is a state preset by the user, and iteration continues until the resource occupation data corresponding to the current system state is the same as that corresponding to the preset system end state, at which point the system end state is reached. There are various ways of determining the system end state and the system initial state, and the present application is not particularly limited.
Alternatively, as an embodiment, when the resource scheduling evaluation model is trained to a certain state between the initial state and the end state of the system, a determination process of whether the preset convergence condition is satisfied may be added, so that it is not necessary to wait until the end state of the system is completed and then determine whether to converge.
In this embodiment, through the above steps, one iteration loop condition is determined by the system initial and end states, and another by the convergence condition, so that the number of iterations is determined flexibly, ensuring both the training accuracy and the training efficiency of the resource scheduling evaluation model.
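The two-level iteration of steps S402-S408 can be sketched as follows; `step_fn`, `update_fn` and `converged_fn` are assumed callbacks standing in for the simulated system transition, the model update, and the convergence test:

```python
def train(initial_state, step_fn, update_fn, converged_fn, episode_len=100):
    """Two-level iteration of steps S402-S408: the inner loop runs one life
    cycle of the simulated financial system (a preset number of state
    updates), the outer loop restarts from the system initial state until
    the model converges. Returns the number of life cycles run."""
    episodes = 0
    while True:
        state = initial_state            # reset to the system initial state
        for _ in range(episode_len):     # run until the system end state
            state = step_fn(state)       # execute the target scheduling action
            update_fn(state)             # fold the loss value into the model
        episodes += 1
        if converged_fn():               # convergence checked at the end state
            return episodes

# Illustrative use: "convergence" after 15 model updates, 5 updates per cycle.
calls = {"updates": 0}
n = train(0,
          step_fn=lambda s: s + 1,
          update_fn=lambda s: calls.update(updates=calls["updates"] + 1),
          converged_fn=lambda: calls["updates"] >= 15,
          episode_len=5)
```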
In one embodiment, before the obtaining the initial resource occupation data of each financial node in the financial system and the pre-constructed initial resource scheduling evaluation model associated with the financial system, the method further includes:
acquiring historical resource scheduling data of the financial system in a preset operation period, wherein the historical resource scheduling data comprises historical resource occupation data, historical resource scheduling actions corresponding to the historical resource occupation data and historical resource loss values of the historical resource occupation data for executing the historical resource scheduling actions;
and constructing the initial resource scheduling evaluation model and the resource scheduling loss model according to the historical resource occupation data, the historical resource scheduling action and the historical resource loss value.
It should be appreciated that the historical data should be obtained as training data prior to training the model, and that the initial resource scheduling evaluation model and the resource scheduling loss model should likewise be built from the historical data before the model is updated.
The resource scheduling evaluation model takes the resource occupation data, the resource scheduling action and the resource scheduling evaluation value as three parameters to participate in the calculation of the model, so that the historical resource occupation data and the resource scheduling action corresponding to the historical resource occupation data are required to be acquired, and the resource scheduling evaluation value corresponding to the historical resource occupation data is taken as basic data for constructing the resource scheduling evaluation model.
Similarly, the resource scheduling loss model takes the resource occupation data, the resource scheduling action and the resource loss value as three parameters to participate in the calculation of the model, so that the historical resource occupation data and the resource scheduling action corresponding to the historical resource occupation data are required to be acquired, and the resource scheduling loss value corresponding to the historical resource occupation data is taken as basic data for constructing the resource scheduling loss model.
In this embodiment, through the above steps, an initial resource scheduling evaluation model and a resource scheduling loss model can be constructed by acquiring historical data, so as to achieve the effect of acquiring a necessary model before model training.
In one embodiment, as shown in fig. 5, the constructing an initial resource scheduling evaluation model and a resource scheduling loss model according to the historical resource occupation data, the historical resource scheduling actions and the historical resource loss values includes:
S502, taking the historical resource occupation data and the corresponding historical resource scheduling actions as independent variables, taking the historical resource loss values of executing those actions as dependent variables, and fitting with a deep neural network algorithm to obtain the resource scheduling loss model;
S504, determining, from the historical resource occupation data and the corresponding historical resource scheduling actions, the historical resource scheduling evaluation value of executing each historical resource scheduling action;
S506, taking the historical resource occupation data and the corresponding historical resource scheduling actions as independent variables, taking the historical resource scheduling evaluation values as dependent variables, and fitting with a deep neural network algorithm to obtain the initial resource scheduling evaluation model.
Fitting with a deep neural network is a supervised regression: the value to be predicted is regressed from the historical data, that is, a predicted value can be fitted from the recorded samples, so that for independent variable values not recorded in the history, the corresponding dependent variable can still be predicted. Therefore, the historical resource occupation data and historical resource scheduling actions can be used as independent variables and the historical resource loss values as dependent variables, and the resource scheduling loss model is obtained by fitting.
Similarly, historical resource occupation data and historical resource scheduling actions can be used as independent variables, historical resource scheduling evaluation values are used as dependent variables, and an initial resource scheduling evaluation model is obtained through fitting.
Optionally, as an embodiment, fitting training is performed with a deep neural network DNN (Deep Neural Networks) to obtain the corresponding initial resource scheduling evaluation model and resource scheduling loss model; or, as an embodiment, a linear regression algorithm is used for the fitting. The present application does not particularly limit how the fitting is performed.
Optionally, as an embodiment, the historical resource scheduling evaluation value may be determined by a preset algorithm from the historical resource occupation data and the corresponding historical resource scheduling action; or, as an embodiment, it may be determined by a preset algorithm from the historical resource occupation data and the corresponding historical resource loss value.
In this embodiment, through the above steps, a model for predicting a resource scheduling loss value and predicting a resource scheduling evaluation value can be constructed according to a fitted algorithm and historical data, so as to achieve the effect of improving the prediction accuracy of the necessary basic model.
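As a concrete illustration of S502, the sketch below fits a loss model to toy historical (occupation, action, loss) triples by ordinary least squares, the linear-regression alternative mentioned above, in place of a full DNN; all data and names are illustrative assumptions:

```python
import numpy as np

# Toy history: each row is (occupation feature, action id); y is the
# recorded resource loss. Stands in for the historical data of S502.
X = np.array([[0.2, 0], [0.8, 0], [0.2, 1], [0.8, 1]], dtype=float)
y = np.array([0.1, 0.5, 0.3, 0.7])     # historical resource loss values

# Fit y ~ X @ w + b by ordinary least squares (bias folded into w).
A = np.hstack([X, np.ones((len(X), 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

def loss_model(occupation, action):
    """Predicted resource loss for (resource occupation data, action)."""
    return float(np.array([occupation, action, 1.0]) @ w)
```

The evaluation model of S506 would be fitted the same way, with historical evaluation values as the dependent variable.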
In one embodiment, as shown in fig. 6, a resource scheduling method of a financial system is provided, and the method is applied to the server 104 in fig. 1 for illustration, and includes the following steps:
S602, acquiring real-time resource occupation data of each financial node in a financial system and a resource scheduling model trained by the financial system; the resource scheduling model is obtained by training the method according to any one of the resource scheduling model training methods of the financial system.
During the actual operation of the financial system, real-time resource occupation data is acquired to determine the real-time system state of the financial system; the resource occupation data corresponding to this state is input into the trained resource scheduling model, and the real-time resource scheduling evaluation values corresponding to the resource occupation data are obtained from the model.
S604, obtaining, through the resource scheduling model, the real-time resource scheduling evaluation values respectively corresponding to the real-time resource scheduling actions executable on the real-time resource occupation data, and obtaining the real-time resource scheduling action with the largest real-time resource scheduling evaluation value.
Because the resource scheduling model is trained, accurate real-time resource scheduling evaluation values can be obtained, and thus the next resource scheduling action to execute can be determined, completing the resource scheduling. The resource scheduling action is usually executed after the server receives a financial system task.
S606, scheduling the resources of the financial system according to the real-time resource scheduling action with the maximum real-time resource scheduling evaluation value.
It should be understood that when the resource scheduling model is used, an optimal resource scheduling mode in the current system state needs to be found, so that a resource scheduling action with the largest resource scheduling evaluation value is selected as a resource scheduling method, and optimal resource scheduling is made.
Optionally, as an embodiment, the distributed system is run with the optimal resource scheduling model imported; the current system state is submitted to the scheduling model, the model evaluates all executable scheduling modes in that state through the evaluation function, and the system makes the corresponding resource scheduling with reference to the magnitude of each evaluation value.
In the resource scheduling method of the financial system, the real-time resource scheduling action corresponding to the maximum resource scheduling evaluation value is determined to perform resource scheduling by acquiring the real-time resource occupation data, and the optimal resource scheduling action can be obtained according to the resource scheduling model, so that the effects of optimizing the scheduling of the distributed resources in the financial service system and improving the utilization rate of the resources of the distributed financial service system are achieved.
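The real-time selection of steps S602-S606 reduces to an argmax over the trained evaluation function; `q_fn` below is an assumed callable standing in for the trained resource scheduling model:

```python
def schedule(state, actions, q_fn):
    """Real-time scheduling (S602-S606): feed the real-time resource
    occupation data to the trained evaluation function q_fn and pick the
    action with the largest real-time evaluation value."""
    return max(actions, key=lambda a: q_fn(state, a))

# Illustrative use: a stand-in model that prefers the least-loaded node.
node_load = {0: 0.9, 1: 0.3, 2: 0.6}
chosen = schedule(node_load, [0, 1, 2], q_fn=lambda s, a: -s[a])
```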
Optionally, as an embodiment, the present embodiment makes an optimal policy for system resource scheduling through a reinforcement Learning algorithm Q-Learning.
The system first aggregates the resource occupancy of each distributed entity, i.e., each node in the system, and uses it as the state s. The distributed system receives a task and may distribute it to different nodes; these constitute the different actions a. A Q table is then constructed, where Q(s, a) measures the distributed system resource pressure caused by each action in each state, with the amount of CPU resources consumed used as the value of executing action a in the current state s.
A deep neural network algorithm DNN is then used to fit the Q(s, a) table, treated as a discrete function of two variables, yielding a Q(s, a) function whose output equals that of the table for the same inputs s and a, thereby compressing the space required by the system.
Finally, by learning from the historical data, the Q function is iterated continuously until it converges. Through the converged Q function, the system can determine which action a to perform in state s so that the resource occupation of the distributed system is smaller, finally allowing the system to adapt to periodic change and obtain a better task scheduling scheme.
As shown in fig. 7, a flow chart of a distributed resource scheduling model module based on reinforcement learning is shown. The module comprises a system data acquisition module, an evaluation function and system consumption function module, a Q-learning reinforcement learning module and an optimal resource scheduling model module. The overall idea is that after historical data is collected through the system data collection module of S702, an evaluation function and a system consumption function are built through the step of S704, and then the reinforcement learning module of S706 is used on the basis of possessing the evaluation function and the system consumption function to obtain the final trained evaluation function which is used as an optimal resource scheduling model.
Wherein, the system data acquisition module: and collecting historical data of the distributed system, wherein the data comprises a complete system pressure change and a complete resource scheduling process in an analysis period.
And (3) constructing an evaluation function and a system consumption function module: the evaluation function Q (s, a) and the system consumption function R (s, a) are important components of Q-learning. The evaluation function Q (s, a) is used to represent the evaluation of the effect obtained by the system performing some manner of resource scheduling in the current state. The system consumption function R (s, a) represents the system resources consumed by the scheduling after the system performs some manner of resource scheduling in the current state.
Q-learning reinforcement learning module: the collected complete historical data is formatted, an initial evaluation function Q(s, a) and a system consumption function R(s, a) are constructed, the parameters are initialized, and the operation of the system is then simulated to schedule resources. The system refers to the evaluation function Q(s, a) and takes the highest evaluation as the scheduling criterion. After a scheduling action is executed, the system consumes corresponding resources, the size of which is obtained from the system consumption function, and the system then updates its evaluation of the scheduling just executed, i.e., the evaluation function. Because the evaluation function is in fact a very large two-dimensional table, and each complete cycle only updates the evaluation values of the states experienced in that iteration, a DNN neural network is used to fit and replace this discrete two-variable function, both to compress the system space and to allow real-time updating. Adjusting the fit of the neural network is equivalent to updating the evaluation function Q(s, a); when Q(s, a) converges, the system obtains a high-quality evaluation function to use as the scheduling strategy of the distributed system.
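The training loop this module describes can be sketched end-to-end with a tabular Q in place of the DNN fit; the toy transition and consumption functions below are illustrative assumptions, not the patent's actual system model:

```python
import random

# Minimal tabular Q-learning loop mirroring the module description: a dict Q
# stands in for the DNN-fitted evaluation function Q(s, a).
random.seed(0)
STATES, ACTIONS = range(3), range(2)
Q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def R(s, a):
    # toy system consumption function: action 1 is always cheaper
    return -1.0 if a == 0 else -0.2

def step(s, a):
    # toy deterministic state transition of the simulated system
    return (s + a + 1) % len(STATES)

alpha, gamma = 0.5, 0.9
for episode in range(200):           # repeated complete cycles of the system
    s = 0
    for _ in range(10):
        # mostly take the highest-evaluated scheduling action, with some
        # exploration so every action's consumption is eventually learned
        if random.random() < 0.1:
            a = random.choice(list(ACTIONS))
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2 = step(s, a)
        best_next = max(Q[(s2, x)] for x in ACTIONS)
        # update the evaluation of the scheduling just executed
        Q[(s, a)] += alpha * (R(s, a) + gamma * best_next - Q[(s, a)])
        s = s2

best = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
```

After convergence, the learned policy picks the cheaper action in every state, which is the role the converged Q(s, a) plays as the scheduling strategy.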
Optimal resource scheduling model generation module: after self-training by the reinforcement learning module, the distributed system can import the DNN-fitted evaluation function as a scheduling model that participates in system scheduling; whenever the distributed system encounters a resource scheduling problem, it automatically consults this scheduling model to make a decision.
Alternatively, as an embodiment, as shown in fig. 8, fig. 8 shows the format of the system data after sampling and arrangement, in which the system data acquisition module arranges the historical data of the system. T is the complete run cycle of the system. s_i is a matrix consisting of all nodes inside the distributed system at time i. a_i is the action of allocating the resource to a certain node at time i. r_i is the system resource consumption caused by executing action a_i in state s_i.
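The arranged records described above can be sketched as simple (s_i, a_i, r_i) tuples over a run cycle; the class and field names below are illustrative assumptions, not taken from the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Transition:
    """One sampled step of the distributed system's history."""
    state: tuple        # s_i: snapshot of all node resource occupations at time i
    action: int         # a_i: the node the resource was allocated to at time i
    consumption: float  # r_i: resource consumption caused by a_i in state s_i

def collect_cycle(samples: List[Tuple[tuple, int, float]]) -> List[Transition]:
    """Arrange raw (state, action, consumption) samples into transitions."""
    return [Transition(s, a, r) for s, a, r in samples]

# Two toy samples standing in for one run cycle of the system.
history = collect_cycle([((0.2, 0.7), 1, 50.0), ((0.3, 0.6), 0, -1.0)])
```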
Alternatively, as an embodiment, as shown in fig. 9, fig. 9 shows the construction of the evaluation function and the system consumption function.
S902: with s and a from the system data acquisition module, taking s as the current state and a as the action jumping to the next state, the evaluation function Q(s, a) is constructed to evaluate the value of the system executing scheduling action a in state s. The system consumption function R(s, a) is constructed with r from the system data acquisition module as the dependent variable and s and a as the independent variables, and represents the system consumption incurred by executing resource scheduling operation a in system state s.
S904: DNNs are used to fit the functions Q(s, a) and R(s, a), yielding approximate functions that replace each of them. Because the evaluation function and the system consumption function are two-dimensional tables indexed jointly by s and a, fitting them with DNNs compresses the system space; the resulting approximations serve as Q(s, a) and R(s, a) in the subsequent steps of the algorithm.
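A minimal NumPy sketch of the fitting step: a two-layer network is trained to approximate a small Q table. The table size, one-hot encoding, layer width, and learning rate are all illustrative assumptions; a real DNN framework would be used in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy discrete Q table over 4 states x 3 actions that the network will fit.
q_table = rng.uniform(0.0, 50.0, size=(4, 3))

# Training pairs: one-hot (state, action) input -> normalized tabular Q value.
X = np.array([np.concatenate([np.eye(4)[s], np.eye(3)[a]])
              for s in range(4) for a in range(3)])
y = q_table.reshape(-1, 1) / 50.0

# Minimal two-layer network (a stand-in for the DNN in the patent).
W1 = rng.normal(scale=0.1, size=(7, 16)); b1 = np.zeros(16)
W2 = rng.normal(scale=0.1, size=(16, 1)); b2 = np.zeros(1)

def forward(x):
    h = np.maximum(0.0, x @ W1 + b1)   # ReLU hidden layer
    return h, h @ W2 + b2

losses, lr = [], 0.1
for _ in range(3000):
    h, pred = forward(X)
    err = pred - y
    losses.append(float(np.mean(err ** 2)))
    # Backpropagation of the mean-squared-error loss.
    g_pred = 2 * err / len(X)
    gW2 = h.T @ g_pred; gb2 = g_pred.sum(0)
    g_h = (g_pred @ W2.T) * (h > 0)
    gW1 = X.T @ g_h; gb1 = g_h.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2
```

Adjusting the fitted weights plays the role of updating the discrete table, which is the compression argument the patent makes for using a DNN.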
Alternatively, as one embodiment, as shown in FIG. 10, FIG. 10 is a Q-Learning reinforcement Learning training flow chart.
S1002: the historical information of the system is collected, and an initial evaluation function Q(s, a) and system consumption function R(s, a) are constructed in preparation for training the algorithm.
S1004, setting an initial system state and simulating the operation of the system.
S1006: the system operation is simulated and the current task is obtained. When the distributed system needs to schedule the current task, it accesses the evaluation function Q(s, a) and takes the action a that yields the highest evaluation Q in the current state as the current scheduling scheme.
S1008: after the system makes a schedule according to the evaluation function Q(s, a), the evaluation corresponding to the schedule just executed is updated. The update rule is as follows:
Q(s, a) = Q(s, a) + α[R(s, a) + γ·max_a' Q(s', a') - Q(s, a)]
where α is the learning rate and γ is the decay rate, both adjusted manually;
s is the system state before resource scheduling, and s' is the system state after resource scheduling;
a is an optional scheduling action in state s, and a' is an optional scheduling action in state s';
max_a' Q(s', a') is the schedule with the highest evaluation by the evaluation function Q among the optional schedules a' in state s'.
The evaluation of the resource schedule that the system has just performed can be updated according to the update rule of the above equation.
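The update rule above can be sketched as a small helper; the table layout (a dict of dicts keyed by state and action) and the default values for unseen states are assumptions, not from the patent.

```python
def q_update(Q, s, a, r, s_next, alpha=0.8, gamma=0.9):
    """Apply Q(s,a) <- Q(s,a) + alpha*(R(s,a) + gamma*max_a' Q(s',a') - Q(s,a))."""
    # Highest evaluation among the optional schedules a' in state s'.
    best_next = max(Q[s_next].values()) if Q.get(s_next) else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])
    return Q[s][a]
```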
Assuming that the state jump diagram is as shown in fig. 11, 6 states can be obtained from fig. 11: S_0, S_1, S_2, S_3, S_4, S_5. The jumping action a_x denotes the current state jumping to state S_x. R_x denotes the consumption value obtained by transitioning from the current state to state S_x via a_x.
Taking state S_3 as an example, it may jump to any of the 6 currently known states and obtain the corresponding instant consumption value (the instant consumption value is marked on the arrow; it is -1 if S_3 cannot jump to that state). From S_3, the actions a_0, a_1, a_2, a_3, a_4, a_5 jump to S_0, S_1, S_2, S_3, S_4, S_5 respectively, and the instant consumption values obtained are: R_0 = -1, R_1 = 50, R_2 = 50, R_3 = -1, R_4 = 50, R_5 = -1. Integrating all instant consumption values R_x yields the matrix shown below.
The initialization evaluation function Q is shown in the following matrix:
Assume the system starts in state S_0, walks the route S_0 → S_4 → S_5, and ends at S_5; performing one iteration yields a new evaluation function.
The method comprises the following steps:
step 1: s is S 0 To S 4
Q(s,a)=Q(s,a)+α[R(s,a)+γmax a' Q(s',a')-Q(s,a)]
Q(s 0 ,a 4 )=Q(s 0 ,a 4 )+α[R(s 0 ,a 4 )+γQ(s 4 ,a 5 )-Q(s 0 ,a 4 )]
Q(s 0 ,a 4 )=0+α[50+γ0-0]
Q(s 0 ,a 4 )=α50
Step 2: s is S 4 To S 5
Q(s,a)=Q(s,a)+α[R(s,a)+γmax a' Q(s',a')-Q(s,a)]
Q(s 4 ,a 5 )=Q(s 4 ,a 5 )+α[R(s 4 ,a 5 )]
The subsequent items are omitted as they are already the last step.
Q(s 4 ,a 5 )=0+α[50]
Q(s 4 ,a 5 )=α50
After the end, the Q function is updated to obtain the contents shown in the following matrix.
The values in the evaluation function Q keep changing over the cyclic iterations; finally, by searching Q, the most valuable action a_x to execute next in each state S_x is obtained.
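The two steps of the worked example can be replayed numerically. The concrete values α = 0.8 and γ = 0.9 are illustrative choices (the patent leaves both manually tunable), so each updated entry comes out as 50α = 40.

```python
# Replaying the worked example: states S0..S5, route S0 -> S4 -> S5.
# R values come from the example; only R(s0, a4) = 50 and R(s4, a5) = 50
# are needed on this route.
alpha, gamma = 0.8, 0.9
Q = {(s, a): 0.0 for s in range(6) for a in range(6)}  # initial Q is all zeros
R = {(0, 4): 50.0, (4, 5): 50.0}

# Step 1: S0 -> S4 (the best next action from S4 is a5, currently valued 0).
Q[(0, 4)] += alpha * (R[(0, 4)] + gamma * Q[(4, 5)] - Q[(0, 4)])

# Step 2: S4 -> S5 (terminal step, so the future term is omitted).
Q[(4, 5)] += alpha * R[(4, 5)]
```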
S1010: since the evaluation function Q now carries new evaluation values for the schedules just executed, the system feeds these new evaluation values into the DNN for training, so that the changes are incorporated and an updated evaluation function Q is obtained.
The system then checks whether the system state has reached the end state. If not, it returns to S1006 and continues running until the end state is reached.
Next, whether the evaluation function Q has converged is checked. If it has not converged, the process returns to S1004, restarts the system, and enters the next training iteration.
When the system state is the end state and the evaluation function Q has converged, the final resource scheduling model is obtained.
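The training flow of fig. 10 can be sketched as a nested loop; every callable here is a hypothetical stand-in for one of the patent's modules, not a prescribed API.

```python
def train(step, initial_state, is_end, converged, q_update, select_action,
          max_epochs=100):
    """Outer training loop of fig. 10 (a sketch under assumed interfaces).

    step(s, a) -> (s_next, r): simulate executing schedule a and return the
    new state plus the consumption obtained from R(s, a).
    """
    for _ in range(max_epochs):
        s = initial_state                # S1004: reset the system state
        while not is_end(s):             # run the episode to the end state
            a = select_action(s)         # S1006: highest-evaluated schedule
            s_next, r = step(s, a)
            q_update(s, a, r, s_next)    # S1008/S1010: update the evaluation
            s = s_next
        if converged():                  # convergence check on Q
            return True
    return False
```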
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in these flowcharts may comprise multiple sub-steps or stages, which need not be performed at the same time but may be performed at different times, and need not be performed sequentially but may be performed in turn or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides a resource scheduling model training device of the financial system for realizing the resource scheduling model training method of the financial system. The implementation scheme of the solution provided by the device is similar to the implementation scheme described in the above method, so the specific limitation in the embodiments of the resource scheduling model training device for one or more financial systems provided below may refer to the limitation of the resource scheduling model training method for a financial system, which is not described herein.
In one embodiment, as shown in fig. 12, there is provided a resource scheduling model training apparatus 1200 of a financial system, comprising: a data model acquisition module 1201, a scheduling action acquisition module 1202, a model update module 1203, a loop convergence module 1204 and a training completion module 1205, wherein:
the data model obtaining module 1201 is configured to obtain initial resource occupation data of each financial node in a financial system, and a pre-constructed initial resource scheduling evaluation model associated with the financial system.
The scheduling action obtaining module 1202 is configured to obtain, through the initial resource scheduling evaluation model, resource scheduling evaluation values corresponding to the initial resource occupation data for executing multiple resource scheduling actions respectively, and obtain a target resource scheduling action with a maximum resource scheduling evaluation value.
The model updating module 1203 is configured to obtain, according to a pre-constructed resource scheduling loss model associated with the financial system, a resource loss value of the initial resource occupation data for executing the target resource scheduling action, and update the initial resource scheduling evaluation model according to the resource loss value, so as to obtain a target resource scheduling model.
The cyclic convergence module 1204 is configured to obtain target resource occupation data of each financial node after the target resource occupation data performs the target resource scheduling action, take the target resource occupation data as new initial resource occupation data, take the target resource scheduling model as new initial resource scheduling evaluation model, and return to perform the step of obtaining resource scheduling evaluation values respectively corresponding to multiple resource scheduling actions corresponding to the initial resource occupation data through the initial resource scheduling evaluation model until the target resource scheduling model meets a preset convergence condition.
And a training completion module 1205, configured to use the target resource scheduling model as a resource scheduling model after the financial system training is completed.
Further, in an embodiment, the scheduling action obtaining module 1202 is further configured to obtain a plurality of resource scheduling actions corresponding to the preset initial resource occupation data;
Inputting each resource scheduling action in a plurality of resource scheduling actions and the initial resource occupation data into the initial resource scheduling evaluation model to obtain resource scheduling evaluation values respectively corresponding to each resource scheduling action executed by the initial resource occupation data;
and taking, among the resource scheduling evaluation values, the resource scheduling action with the largest resource scheduling evaluation value as the target resource scheduling action.
Further, in an embodiment, the model updating module 1203 is further configured to obtain a first resource scheduling evaluation value corresponding to the target resource scheduling action, and current resource occupation data of each financial node after the initial resource occupation data performs the target resource scheduling action;
determining resource scheduling evaluation values respectively corresponding to the plurality of resource scheduling actions of the current resource occupation data according to the initial resource scheduling evaluation model, and taking the maximum resource scheduling evaluation value corresponding to the current resource occupation data as a second resource scheduling evaluation value;
updating the first resource scheduling evaluation value according to the difference value between the second resource scheduling evaluation value and the first resource scheduling evaluation value and the sum of the difference value and the resource loss value to obtain an updated first resource scheduling evaluation value;
And updating the initial resource scheduling evaluation model according to the updated first resource scheduling evaluation value to obtain the target resource scheduling model.
Further, in an embodiment, the cyclic convergence module 1204 is further configured to use the target resource scheduling model as a new initial resource scheduling evaluation model, and detect whether a system state corresponding to the target resource occupation data reaches a preset system end state; the system end state is a system state corresponding to the target resource occupation data updated by the preset times;
if the system state does not reach the system ending state, returning to execute the step of acquiring resource scheduling evaluation values respectively corresponding to various resource scheduling actions corresponding to the initial resource occupation data through the initial resource scheduling evaluation model until the system state reaches the system ending state;
if the system state reaches the system ending state, judging whether the target resource scheduling model meets a preset convergence condition or not;
if the target resource scheduling model does not meet the preset convergence condition, resetting the target resource occupation data to initial resource occupation data corresponding to a system initial state, and returning to execute the step of acquiring resource scheduling evaluation values respectively corresponding to various resource scheduling actions corresponding to the initial resource occupation data through the initial resource scheduling evaluation model until the target resource scheduling model meets the preset convergence condition; the initial state of the system is a system state corresponding to initial resource occupation data before the initial resource scheduling evaluation model is updated for the first time.
Further, in one embodiment, a historical data processing module is provided, configured to obtain historical resource scheduling data of the financial system in a preset operation period, where the historical resource scheduling data includes historical resource occupation data, a historical resource scheduling action corresponding to the historical resource occupation data, and a historical resource loss value of the historical resource occupation data for executing the historical resource scheduling action;
and constructing the initial resource scheduling evaluation model and the resource scheduling loss model according to the historical resource occupation data, the historical resource scheduling action and the historical resource loss value.
Further, in one embodiment, the historical data processing module is further configured to use the historical resource occupation data and a historical resource scheduling action corresponding to the historical resource occupation data as independent variables, and the historical resource occupation data execute a historical resource loss value of the historical resource scheduling action as a dependent variable, and fit the dependent variable by using a deep neural network algorithm to obtain the resource scheduling loss model;
according to the historical resource occupation data and the historical resource scheduling action corresponding to the historical resource occupation data, determining a historical resource scheduling evaluation value of the historical resource scheduling action executed by the historical resource occupation data;
And taking the historical resource occupation data and the historical resource scheduling action corresponding to the historical resource occupation data as independent variables, taking the historical resource scheduling evaluation value as dependent variable, and fitting according to a deep neural network algorithm to obtain the initial resource scheduling evaluation model.
Based on the same inventive concept, the embodiment of the application also provides a resource scheduling device of the financial system for realizing the resource scheduling method of the financial system. The implementation of the solution provided by the apparatus is similar to the implementation described in the above method, so the specific limitation in the embodiments of the resource scheduling apparatus for one or more financial systems provided below may refer to the limitation of the resource scheduling method for a financial system hereinabove, and will not be repeated herein.
In one embodiment, as shown in fig. 13, there is provided a resource scheduling apparatus 1300 of a financial system, comprising: a data model acquisition module 1301, a scheduling action acquisition module 1302, and a resource scheduling module 1303, wherein:
the data model obtaining module 1301 is configured to obtain real-time resource occupation data of each financial node in a financial system, and a resource scheduling model that is trained by the financial system; the resource scheduling model is obtained by training the method according to any one of the training methods of the resource scheduling model of the financial system.
The scheduling action obtaining module 1302 is configured to obtain, through the resource scheduling model, real-time resource scheduling evaluation values corresponding to the real-time resource scheduling actions respectively executed by the real-time resource occupation data, and obtain a real-time resource scheduling action with a maximum real-time resource scheduling evaluation value.
And the resource scheduling module 1303 is configured to schedule resources of the financial system according to the real-time resource scheduling action with the maximum real-time resource scheduling evaluation value.
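The inference path of the device reduces to an argmax over the candidate actions; `model`, `occupancy`, and `actions` are hypothetical names standing in for the trained evaluation function and its inputs.

```python
def schedule(model, occupancy, actions):
    """Pick the resource scheduling action with the highest evaluation.

    model(occupancy, action) -> evaluation value; a stand-in for the
    trained evaluation function Q(s, a).
    """
    evaluations = {a: model(occupancy, a) for a in actions}
    return max(evaluations, key=evaluations.get)
```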
The above-mentioned resource scheduling model training apparatus of the financial system or each module in the resource scheduling apparatus of the financial system may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 14. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing the resource occupation data, the resource scheduling actions and the resource scheduling evaluation values of all financial nodes in the financial system. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a resource scheduling model training method of a financial system or a resource scheduling method of a financial system.
It will be appreciated by those skilled in the art that the structure shown in fig. 14 is merely a block diagram of a portion of the structure associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements are applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing examples illustrate only a few embodiments of the application and are described in detail herein without thereby limiting the scope of the application. It should be noted that it will be apparent to those skilled in the art that several variations and modifications can be made without departing from the spirit of the application, which are all within the scope of the application. Accordingly, the scope of the application should be assessed as that of the appended claims.

Claims (12)

1. A method for training a resource scheduling model of a financial system, the method comprising:
acquiring initial resource occupation data of each financial node in a financial system and a pre-constructed initial resource scheduling evaluation model associated with the financial system;
obtaining, through the initial resource scheduling evaluation model, resource scheduling evaluation values respectively corresponding to the initial resource occupation data executing a plurality of resource scheduling actions, and obtaining the target resource scheduling action with the maximum resource scheduling evaluation value;
Acquiring a resource loss value of the initial resource occupation data for executing the target resource scheduling action according to a resource scheduling loss model associated with the financial system, and updating the initial resource scheduling evaluation model according to the resource loss value to obtain a target resource scheduling model;
acquiring target resource occupation data of each financial node after the initial resource occupation data execute the target resource scheduling action, taking the target resource occupation data as new initial resource occupation data, taking the target resource scheduling model as a new initial resource scheduling evaluation model, and returning to execute the step of acquiring resource scheduling evaluation values respectively corresponding to various resource scheduling actions corresponding to the initial resource occupation data through the initial resource scheduling evaluation model until the target resource scheduling model meets the preset convergence condition;
and taking the target resource scheduling model as a resource scheduling model trained by the financial system.
2. The method according to claim 1, wherein the obtaining, by the initial resource scheduling evaluation model, the resource scheduling evaluation value corresponding to each of the plurality of resource scheduling actions performed by the initial resource occupation data, and obtaining the target resource scheduling action with the largest resource scheduling evaluation value, includes:
Acquiring a plurality of resource scheduling actions corresponding to the preset initial resource occupation data;
inputting each resource scheduling action in a plurality of resource scheduling actions and the initial resource occupation data into the initial resource scheduling evaluation model to obtain resource scheduling evaluation values respectively corresponding to each resource scheduling action executed by the initial resource occupation data;
and taking, among the resource scheduling evaluation values, the resource scheduling action with the largest resource scheduling evaluation value as the target resource scheduling action.
3. The method according to claim 2, wherein updating the initial resource scheduling evaluation model according to the resource loss value to obtain a target resource scheduling model comprises:
acquiring a first resource scheduling evaluation value corresponding to the target resource scheduling action and current resource occupation data of each financial node after the initial resource occupation data execute the target resource scheduling action;
determining resource scheduling evaluation values respectively corresponding to the plurality of resource scheduling actions of the current resource occupation data according to the initial resource scheduling evaluation model, and taking the maximum resource scheduling evaluation value corresponding to the current resource occupation data as a second resource scheduling evaluation value;
Updating the first resource scheduling evaluation value according to the difference value between the second resource scheduling evaluation value and the first resource scheduling evaluation value and the sum of the difference value and the resource loss value to obtain an updated first resource scheduling evaluation value;
and updating the initial resource scheduling evaluation model according to the updated first resource scheduling evaluation value to obtain the target resource scheduling model.
4. The method according to claim 1, wherein the step of returning the target resource scheduling model as a new initial resource scheduling evaluation model to execute the resource scheduling evaluation value corresponding to each of the plurality of resource scheduling actions corresponding to the initial resource occupation data through the initial resource scheduling evaluation model until the target resource scheduling model meets a preset convergence condition includes:
taking the target resource scheduling model as a new initial resource scheduling evaluation model, and detecting whether a system state corresponding to the target resource occupation data reaches a preset system ending state or not; the system end state is a system state corresponding to the target resource occupation data updated by the preset times;
If the system state does not reach the system ending state, returning to execute the step of acquiring resource scheduling evaluation values respectively corresponding to various resource scheduling actions corresponding to the initial resource occupation data through the initial resource scheduling evaluation model until the system state reaches the system ending state;
if the system state reaches the system ending state, judging whether the target resource scheduling model meets a preset convergence condition or not;
if the target resource scheduling model does not meet the preset convergence condition, resetting the target resource occupation data to initial resource occupation data corresponding to a system initial state, and returning to execute the step of acquiring resource scheduling evaluation values respectively corresponding to various resource scheduling actions corresponding to the initial resource occupation data through the initial resource scheduling evaluation model until the target resource scheduling model meets the preset convergence condition; the initial state of the system is a system state corresponding to initial resource occupation data before the initial resource scheduling evaluation model is updated for the first time.
5. The method of claim 1, further comprising, prior to the obtaining initial resource occupancy data for each financial node in a financial system and the pre-constructed initial resource scheduling assessment model associated with the financial system:
Acquiring historical resource scheduling data of the financial system in a preset operation period, wherein the historical resource scheduling data comprises historical resource occupation data, historical resource scheduling actions corresponding to the historical resource occupation data and historical resource loss values of the historical resource occupation data for executing the historical resource scheduling actions;
and constructing the initial resource scheduling evaluation model and the resource scheduling loss model according to the historical resource occupation data, the historical resource scheduling action and the historical resource loss value.
6. The method of claim 5, wherein the constructing the initial resource scheduling evaluation model and the resource scheduling loss model according to the historical resource occupation data, the historical resource scheduling actions and the historical resource loss values comprises:
taking the historical resource occupation data and the historical resource scheduling actions corresponding thereto as independent variables and the historical resource loss values incurred by executing the historical resource scheduling actions as the dependent variable, and fitting a deep neural network to obtain the resource scheduling loss model;
determining, according to the historical resource occupation data and the historical resource scheduling actions corresponding thereto, historical resource scheduling evaluation values of executing the historical resource scheduling actions on the historical resource occupation data;
and taking the historical resource occupation data and the historical resource scheduling actions corresponding thereto as independent variables and the historical resource scheduling evaluation values as the dependent variable, and fitting a deep neural network to obtain the initial resource scheduling evaluation model.
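Claim 6 describes the same fitting pattern twice: supervised regression from (occupation data, action) pairs to a scalar target, once for the loss model and once for the evaluation model. The sketch below shows that pattern with a single hidden layer trained by plain gradient descent; it is a toy stand-in under stated assumptions, not the claimed deep network — the feature layout ([cpu share, memory share, normalised action id]) and the synthetic target are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical historical records: features are [cpu share, memory share,
# normalised action id]; the target is the resource loss of that action.
X = rng.uniform(0.0, 1.0, (200, 3))
y = (0.6 * X[:, 0] + 0.3 * X[:, 1] + 0.1 * X[:, 2]).reshape(-1, 1)

# One-hidden-layer network fitted by full-batch gradient descent.
W1 = rng.normal(0.0, 0.5, (3, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.5, (16, 1)); b2 = np.zeros(1)

def forward(x):
    h = np.tanh(x @ W1 + b1)
    return h, h @ W2 + b2

lr = 0.05
for _ in range(4000):
    h, pred = forward(X)
    err = pred - y                                   # gradient of squared error
    gW2 = h.T @ err / len(X); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1.0 - h ** 2)               # back-prop through tanh
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, pred = forward(X)
mse = float(np.mean((pred - y) ** 2))  # fit quality of the learned loss model
```

Fitting the initial evaluation model would reuse exactly this loop with the historical evaluation values in place of `y`.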
7. A method for scheduling resources in a financial system, the method comprising:
acquiring real-time resource occupation data of each financial node in a financial system and a resource scheduling model trained by the financial system; the resource scheduling model is trained by the method of any one of claims 1 to 6;
acquiring, through the resource scheduling model, real-time resource scheduling evaluation values respectively corresponding to a plurality of real-time resource scheduling actions executed on the real-time resource occupation data, and obtaining the real-time resource scheduling action with the largest real-time resource scheduling evaluation value;
and scheduling the resources of the financial system according to the real-time resource scheduling action with the maximum real-time resource scheduling evaluation value.
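At inference time, claim 7 reduces to one forward pass of the trained model over the candidate actions followed by an argmax. A minimal sketch, assuming a trained model that can be represented here by a fixed linear scorer (the weights and the occupation vector are hypothetical placeholders for the trained deep model and live telemetry):

```python
import numpy as np

# Hypothetical trained evaluation model: one column of scores per
# candidate real-time scheduling action.
weights = np.array([[0.2, -0.5, 0.9],
                    [0.7,  0.1, 0.3]])

def evaluate(occupation):
    """Real-time evaluation values for each candidate scheduling action."""
    return occupation @ weights

realtime_occupation = np.array([0.8, 0.4])  # e.g. CPU and memory utilisation
scores = evaluate(realtime_occupation)
best_action = int(np.argmax(scores))        # action with the largest value
```

The scheduler then executes only `best_action`; the other candidates are evaluated but never applied to the financial system.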
8. A resource scheduling model training apparatus for a financial system, the apparatus comprising:
the data model acquisition module is used for acquiring initial resource occupation data of each financial node in the financial system and a pre-constructed initial resource scheduling evaluation model associated with the financial system;
the scheduling action acquisition module is used for acquiring, through the initial resource scheduling evaluation model, resource scheduling evaluation values respectively corresponding to a plurality of resource scheduling actions executed on the initial resource occupation data, and obtaining the target resource scheduling action with the largest resource scheduling evaluation value;
the model updating module is used for acquiring the resource loss value of the initial resource occupation data for executing the target resource scheduling action according to a resource scheduling loss model associated with the financial system, which is constructed in advance, and updating the initial resource scheduling evaluation model according to the resource loss value to obtain a target resource scheduling model;
the cyclic convergence module is used for acquiring target resource occupation data of each financial node after the initial resource occupation data execute the target resource scheduling action, taking the target resource occupation data as new initial resource occupation data, taking the target resource scheduling model as a new initial resource scheduling evaluation model, and returning to execute the step of acquiring resource scheduling evaluation values respectively corresponding to various resource scheduling actions corresponding to the initial resource occupation data through the initial resource scheduling evaluation model until the target resource scheduling model meets preset convergence conditions;
And the training completion module is used for taking the target resource scheduling model as a resource scheduling model for completing the training of the financial system.
9. A resource scheduling apparatus for a financial system, the apparatus comprising:
the data model acquisition module is used for acquiring real-time resource occupation data of each financial node in the financial system and a resource scheduling model trained by the financial system; the resource scheduling model is trained by the method of any one of claims 1 to 6;
the scheduling action acquisition module is used for acquiring, through the resource scheduling model, real-time resource scheduling evaluation values respectively corresponding to a plurality of real-time resource scheduling actions executed on the real-time resource occupation data, and obtaining the real-time resource scheduling action with the largest real-time resource scheduling evaluation value;
and the resource scheduling module is used for scheduling the resources of the financial system according to the real-time resource scheduling action with the maximum real-time resource scheduling evaluation value.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
11. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
12. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
CN202310688077.1A 2023-06-12 2023-06-12 Resource scheduling model training method, scheduling method and device of financial system Pending CN116700972A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310688077.1A CN116700972A (en) 2023-06-12 2023-06-12 Resource scheduling model training method, scheduling method and device of financial system


Publications (1)

Publication Number Publication Date
CN116700972A true CN116700972A (en) 2023-09-05

Family

ID=87828795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310688077.1A Pending CN116700972A (en) 2023-06-12 2023-06-12 Resource scheduling model training method, scheduling method and device of financial system

Country Status (1)

Country Link
CN (1) CN116700972A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination