CN117437063A

CN117437063A - Financial risk prediction method and system

Info

Publication number: CN117437063A
Application number: CN202311695962.9A
Authority: CN
Inventors: 涂畅; 周兴; 陈西琳
Original assignee: Bank Of Communications Co ltd Hunan Branch
Current assignee: Bank Of Communications Co ltd Hunan Branch
Priority date: 2023-12-11
Filing date: 2023-12-11
Publication date: 2024-01-23

Abstract

The invention discloses a financial risk prediction method and a financial risk prediction system, which belong to the technical field of machine learning. By introducing machine learning to supervise financial risks, the invention addresses the time, labor, and cost burdens of manual supervision.

Description

Financial risk prediction method and system

Technical Field

The invention belongs to the technical field of machine learning, and particularly relates to a financial risk prediction method and a financial risk prediction system.

Background

Traditional financial supervision has mostly relied on manual, offline investigation. With the development of the Internet, many companies now publish their basic data online and upload some important data to financial supervisory institutions, so that supervisory personnel can complete an investigation by analyzing this data, which is convenient for both supervisors and enterprises. However, with rapid economic development, more and more enterprises are entering the market, so the financial supervision workload has increased sharply, and manual supervision is costly and inefficient. Financial supervisory personnel urgently need relief from this working pressure, and enterprise personnel also wish to learn the risk status of their enterprise quickly.

To address the problems of manual supervision, the prior art often uses machine learning to predict financial risks and thereby achieve supervision. Existing machine learning approaches usually optimize the neural network with optimization algorithms such as gradient descent, particle swarm optimization, or genetic algorithms. These approaches can identify data automatically, but because of shortcomings in the algorithms, the optimization process tends to become trapped in local optima, so the neural network parameters are poorly optimized.

Disclosure of Invention

The invention provides a financial risk prediction method and a financial risk prediction system, which are used to solve the problems in the prior art that the machine learning process easily becomes trapped in local optima and optimizes parameters poorly.

In a first aspect, the present invention provides a financial risk prediction method, including:

crawling historical enterprise feature data from a designated data source, obtaining manually input label data corresponding to the historical enterprise feature data, constructing training data from the historical enterprise feature data and the corresponding label data, and dividing the training data into a training set and a test set;

performing data cleaning and data preprocessing on the training set and the testing set to obtain the preprocessed training set and the preprocessed testing set;

Constructing a financial risk prediction model, optimizing parameters of the financial risk prediction model by adopting a preprocessed training set and a multi-group optimization algorithm, and testing the financial risk prediction model by adopting a preprocessed testing set to obtain a trained financial risk prediction model;

and acquiring real-time enterprise feature data corresponding to the financial risk to be predicted, and identifying the real-time enterprise feature data by adopting the trained financial risk prediction model to obtain a financial risk prediction result.

Further, performing data cleaning and data preprocessing on the training set and the testing set to obtain a preprocessed training set and a preprocessed testing set, including:

acquiring historical enterprise characteristic data with missing values or abnormal values in a training set to obtain first data to be cleaned;

judging whether the number of the first data to be cleaned exceeds a preset number threshold; if so, replacing each missing value or abnormal value with the mean or median of the data of the same type to obtain first cleaning data; otherwise, directly removing the first data to be cleaned to obtain the first cleaning data;

acquiring historical enterprise characteristic data with missing values or abnormal values in the test set to obtain second data to be cleaned;

judging whether the number of the second data to be cleaned exceeds the preset number threshold; if so, replacing each missing value or abnormal value with the mean or median of the data of the same type to obtain second cleaning data; otherwise, directly removing the second data to be cleaned to obtain the second cleaning data;

and normalizing the first cleaning data and the second cleaning data to obtain the preprocessed training set and the preprocessed test set.
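The threshold rule in the cleaning steps above can be illustrated with a short sketch. It is only an illustration under stated assumptions: rows are plain lists of numbers, "data of the same type" is read as "the same column", the replacement statistic is selectable between mean and median, and the names `clean_rows`, `is_bad`, and `count_threshold` are hypothetical rather than taken from the patent.

```python
from statistics import mean, median
from typing import Callable, List, Optional

Row = List[Optional[float]]


def clean_rows(rows: List[Row],
               is_bad: Callable[[Optional[float]], bool],
               count_threshold: int,
               fill: str = "mean") -> List[Row]:
    """Impute when many rows are dirty, otherwise drop the dirty rows.

    is_bad flags a missing or abnormal value (hypothetical predicate).
    """
    # Rows that contain at least one missing or abnormal value.
    dirty = {i for i, row in enumerate(rows) if any(is_bad(v) for v in row)}

    # Few dirty rows: remove them outright.
    if len(dirty) <= count_threshold:
        return [list(row) for i, row in enumerate(rows) if i not in dirty]

    # Many dirty rows: replace bad values with a per-column statistic
    # computed from the good values only (raises if a column has none).
    agg = mean if fill == "mean" else median
    fill_values = []
    for c in range(len(rows[0])):
        good = [row[c] for row in rows if not is_bad(row[c])]
        fill_values.append(agg(good))
    return [[fill_values[c] if is_bad(v) else v for c, v in enumerate(row)]
            for row in rows]
```

The same function would be called once on the training set and once on the test set, matching the first and second cleaning steps.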

Further, constructing a financial risk prediction model, and performing parameter optimization on the financial risk prediction model by adopting a preprocessed training set and a multi-group optimization algorithm, wherein the method comprises the following steps:

constructing a neural network model, and taking the constructed neural network model as a financial risk prediction model;

initializing a population scale P, a minimum parallel training group number Q1 and a maximum parallel training group number Q2 corresponding to the financial risk prediction model, wherein the population scale P is an integer multiple of Q2!;

initializing model parameters of a financial risk prediction model based on the population scale P to obtain a training population comprising P training individuals, wherein each training individual comprises all model parameters of the financial risk prediction model;

initializing the current training group number $Q^t$ to Q1 based on the minimum parallel training group number Q1; obtaining the fitness value corresponding to each training individual by adopting the preprocessed training set;

sorting the training individuals in descending order of fitness value; based on the current training group number $Q^t$, taking the first $Q^t$ training individuals as the local leaders of the $Q^t$ training groups, taking the training individual with the largest fitness value as the global leader, and evenly distributing the remaining training individuals among the $Q^t$ training groups;

aiming at each training group, carrying out local search on each training individual in the training group by adopting a local leader guiding strategy, a first self-adaptive optimizing strategy and a greedy optimizing strategy to obtain a training group after local search;

for each training group after the local search, carrying out a global search on each training individual in the training group by adopting a probability optimization strategy, a global leader guiding strategy, a second self-adaptive optimization strategy and a greedy optimization strategy to obtain the training group after the global search;

based on the training groups after global search, reselecting a global leader and a local leader of each training group;

judging whether the local leader of each training group has remained unchanged over L1 training iterations; if so, updating the local leader by adopting a probability disturbance strategy and a mutation strategy and then entering the global leader judging step; otherwise, entering the global leader judging step directly;

judging whether the global leader has remained unchanged over L2 training iterations; if so, entering the step of judging the current training group number; otherwise, entering the step of judging the iteration ending condition;

judging whether the current training group number is greater than the maximum parallel training group number Q2; if so, keeping the local leader of each training group unchanged, randomly and evenly redistributing the other training individuals to the training groups, and entering the step of judging the iteration ending condition; otherwise, incrementing the current training group number $Q^t$ by one, taking the first $Q^t$ training individuals as the local leaders of the $Q^t$ training groups, taking the training individual with the largest fitness value as the global leader, evenly distributing the remaining training individuals among the $Q^t$ training groups, and entering the step of judging the iteration ending condition;

judging whether the iteration ending condition is met, if yes, outputting a global leader as a final model parameter of the financial risk prediction model to obtain the financial risk prediction model after parameter optimization, otherwise, returning to the step of local search.
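The leader selection and grouping step in the sequence above can be sketched as follows. This is a minimal illustration, not the patent's implementation: individuals are represented by their indices, "evenly distributed" is assumed to mean round-robin assignment, each local leader is assumed to be the first member of its own group, and `assign_groups` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def assign_groups(fitness: Sequence[float],
                  num_groups: int) -> Tuple[int, List[int], List[List[int]]]:
    """Return (global_leader, local_leaders, groups) as index lists."""
    # Sort individual indices by fitness, largest first.
    order = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)

    # The first num_groups individuals become the local leaders;
    # the best of them is also the global leader.
    local_leaders = order[:num_groups]
    global_leader = order[0]

    # Each group starts with its leader; the rest are dealt out round-robin
    # (an assumed reading of "evenly distributed").
    groups = [[leader] for leader in local_leaders]
    for k, idx in enumerate(order[num_groups:]):
        groups[k % num_groups].append(idx)
    return global_leader, local_leaders, groups
```

When the group number is later incremented, calling the same function with the new group count reproduces the regrouping described in the steps.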

Further, initializing model parameters of a financial risk prediction model based on the population scale P to obtain a training population including P training individuals, including:

the d-th dimension model parameters of the i-th training individual are generated as follows:

$$X_{id} = X_{\min,d} + R(0,1)\cdot\left(X_{\max,d} - X_{\min,d}\right)$$

where $X_{id}$ represents the d-th dimension model parameter of the i-th training individual, $d = 1, 2, \ldots, D$, and D represents the total number of model parameters; $R(0,1)$ represents a random number uniformly distributed on $[0,1]$; $X_{\max,d}$ represents the maximum value of the d-th dimension model parameter; and $X_{\min,d}$ represents the minimum value of the d-th dimension model parameter;

and repeatedly generating model parameters of the P training individuals to obtain a training population.
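The initialization rule above translates directly into code. The following is a minimal sketch, assuming each training individual is a flat list of D parameters and that the per-dimension bounds are supplied as two lists; `init_population` is a hypothetical name. Note that Python's `random.Random.random()` draws from the half-open interval [0, 1), which differs from the closed interval in the text only at the endpoint.

```python
import random
from typing import List, Optional, Sequence


def init_population(P: int,
                    x_min: Sequence[float],
                    x_max: Sequence[float],
                    rng: Optional[random.Random] = None) -> List[List[float]]:
    """Generate P individuals with X_id = X_min,d + R(0,1) * (X_max,d - X_min,d)."""
    if rng is None:
        rng = random.Random()
    D = len(x_min)
    return [
        [x_min[d] + rng.random() * (x_max[d] - x_min[d]) for d in range(D)]
        for _ in range(P)
    ]
```

Passing a seeded `random.Random` makes the generated population reproducible, which is convenient when comparing optimization runs.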

Further, the training set after preprocessing is adopted to obtain the fitness value corresponding to each training individual, which comprises the following steps:

applying the i-th training individual to the financial risk prediction model, and taking N pieces of historical enterprise characteristic data from the preprocessed training set as input of the financial risk prediction model, to obtain the actual output $y_{i,nm}$ of the financial risk prediction model;

taking the label data corresponding to the historical enterprise feature data as the expected output, and obtaining the fitness value of the i-th training individual from the actual output $y_{i,nm}$ and the expected output as follows:

[Formula not reproduced in the source text]

where $f_i$ represents the fitness value of the i-th training individual, $i = 1, 2, \ldots, P$; $y_{i,nm}$ represents the output of the m-th output neuron of the financial risk prediction model when the i-th training individual is applied to the model and the n-th piece of historical enterprise characteristic data is used as its input; $n = 1, 2, \ldots, N$; $m = 1, 2, \ldots, M$, where M represents the total number of output neurons of the financial risk prediction model, i.e. the total number of risk categories; the expected output is the one corresponding to the n-th piece of historical enterprise feature data; and $\alpha$ represents a constant term greater than 0 and less than 1.

Further, for each training group, performing local search on each training individual in the training group by adopting a local leader guiding strategy, a first self-adaptive optimizing strategy and a greedy optimizing strategy to obtain a training group after local search, including:

for each training individual in each training group, randomly generating a new individual for the training individual, judging whether the fitness value of the new individual is larger than that of the original training individual, if so, replacing the corresponding original training individual by the new individual to obtain the training individual after initial search, otherwise, retaining the original training individual to obtain the training individual after initial search;

aiming at the training individuals after initial search, randomly generating a first random number between [0,1] for the training individuals after initial search, judging whether the first random number is larger than the local update probability, if so, updating the training individuals after initial search by adopting a local leader guiding strategy, a first self-adaptive optimizing strategy and a greedy optimizing strategy, otherwise, not updating the training individuals after initial search;

Traversing all training individuals to obtain a training group after local search;

wherein updating the trained individuals after the initial search with the local leader guidance strategy and the greedy optimization strategy comprises:

the first updated value is obtained as follows:

[Formula not reproduced in the source text]

where the symbols denote, in order: the d-th dimension model parameter of the j-th training individual in the q-th training group during the t-th training iteration, and its corresponding first updated value, with $q = 1, 2, \ldots, Q^t$; the d-th dimension model parameter of the local leader corresponding to the q-th training group; the d-th dimension model parameter of a random individual other than the current training individual; and the first adaptive optimization step size, whose expression is likewise not reproduced. $t_{\max}$ represents the maximum number of training iterations, and $\lambda$ represents a random number in $(-0.1, 0.1)$;

judging whether the fitness value of the first updated value is increased, if so, accepting the update, otherwise rejecting the update.
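The control flow of the local search (random re-sampling, probability gating, greedy acceptance) can be sketched independently of the update formula, which is not reproduced in this text. In the sketch below the update rule is therefore passed in as a callable; `toward_leader` is a hypothetical stand-in chosen only so the example runs, and `local_search`, `propose`, and `p_local` are likewise illustrative names, not the patent's.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]


def toward_leader(x: Vector, leader: Vector, group: Sequence[Vector],
                  rng: random.Random) -> Vector:
    """Hypothetical stand-in rule: move a random fraction toward the leader."""
    step = rng.random()
    return [xi + step * (li - xi) for xi, li in zip(x, leader)]


def local_search(group: Sequence[Vector], leader: Vector,
                 fitness: Callable[[Vector], float],
                 propose: Callable[..., Vector],
                 x_min: Sequence[float], x_max: Sequence[float],
                 p_local: float, rng: random.Random) -> List[Vector]:
    """One local-search pass over a training group (larger fitness is better)."""
    result = []
    for x in group:
        # Initial search: draw a random individual, keep it only if better.
        cand = [lo + rng.random() * (hi - lo) for lo, hi in zip(x_min, x_max)]
        if fitness(cand) > fitness(x):
            x = cand
        # Gate: update only when the random number exceeds p_local.
        if rng.random() > p_local:
            new = propose(x, leader, group, rng)
            # Greedy acceptance: keep the update only if fitness increases.
            if fitness(new) > fitness(x):
                x = new
        result.append(x)
    return result
```

Because every replacement is accepted only when fitness increases, an individual's fitness never decreases across a pass, whatever update rule is plugged in.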

Further, for each training group after the local search, performing a global search on each training individual in the training group by adopting a probability optimization strategy, a global leader guiding strategy, a second self-adaptive optimization strategy and a greedy optimization strategy to obtain the training group after the global search, including:

the global update probability is obtained as follows:

[Formula not reproduced in the source text]

where $pr_i$ represents the global update probability corresponding to the i-th training individual, $f_i$ represents the fitness value of the i-th training individual, and $f_{\max}$ represents the fitness value of the global leader;

aiming at the training individuals after the local search, randomly generating a second random number between [0,1] for the training individuals after the local search, judging whether the second random number is smaller than the global updating probability, if so, updating the training individuals after the local search by adopting a global leader guiding strategy, a second self-adaptive optimizing strategy and a greedy optimizing strategy, otherwise, not updating the training individuals after the local search;

traversing all training individuals to obtain a training group after global searching;

the method for updating the training individuals after the local search by adopting the global leader guiding strategy, the second self-adaptive optimizing strategy and the greedy optimizing strategy comprises the following steps:

the second updated value is obtained as follows:

[Formula not reproduced in the source text]

where the symbols denote, in order: the d-th dimension model parameter of the i-th training individual and its corresponding second updated value; the d-th dimension model parameter of the global leader; and the d-th dimension model parameter of another random individual. $R(0,1)$ represents a random number in $(0,1)$, and $R(-1,1)$ represents a random number in $(-1,1)$. $Step_2^t$ represents the second adaptive optimization step size: when the fitness value $f_i^t$ of the i-th training individual is above the average fitness value of all training individuals, $Step_2^t$ takes its preset maximum value; otherwise it takes the value given by an expression that is not reproduced here. The remaining symbols denote the preset minimum value corresponding to $Step_2^t$, the average fitness value $f_{ave}$ of all training individuals, and the minimum fitness value $f_{\min}$ of all training individuals;

judging whether the fitness value of the second updating value is increased, if so, accepting the updating, otherwise rejecting the updating.

Further, updating the local leader by using a probability disturbance strategy and a mutation strategy comprises:

aiming at the training individuals after global search, randomly generating a third random number between [0,1] for the training individuals after global search, judging whether the third random number is smaller than the global updating probability, if so, carrying out first disturbance updating on the training individuals, otherwise, carrying out second disturbance updating on the training individuals;

traversing all the training individuals after global searching to obtain the training individuals after disturbance updating;

for the training individuals after disturbance updating, obtaining their mutation update values, and adopting the greedy optimization strategy to decide whether to keep the mutation update values;

wherein the first disturbance update is:

[Formula not reproduced in the source text]

where the defined symbol represents the updated value of the d-th dimension model parameter of the i-th training individual after the global search;

the second disturbance update is:

[Formula not reproduced in the source text]

where the symbols denote, in order: the d-th dimension model parameter of the i-th training individual after the global search; the d-th dimension model parameter of the global leader after the global search; and the d-th dimension model parameter of a random individual other than the i-th training individual after the global search. $R_1$ represents a fourth random number uniformly distributed on $[0,1]$, and $R_2$ represents a fifth random number uniformly distributed on $[0,1]$;

the mutation update value of a training individual is obtained as follows:

[Formula not reproduced in the source text]

where $\beta_1$ represents a scaling factor that decreases linearly from 2 to 0 as the number of training iterations increases; $\pi$ represents the circle constant; and $\beta_2$ represents a random number uniformly distributed on $[0,1]$.

Further, the testing set after preprocessing is adopted to test the financial risk prediction model so as to obtain a trained financial risk prediction model, and the testing set comprises the following steps:

taking the preprocessed test set as the input samples of the financial risk prediction model, and acquiring the accuracy, precision and recall rate of the financial risk prediction model;

when the accuracy, the precision and the recall rate of the financial risk prediction model meet preset requirements, the financial risk prediction model after parameter optimization is used as a financial risk prediction model after training is completed;

When any one of the accuracy, the precision and the recall rate of the financial risk prediction model does not meet the preset requirements, the financial risk prediction model is retrained until the accuracy, the precision and the recall rate of the financial risk prediction model meet the preset requirements, and the financial risk prediction model after parameter optimization is used as the financial risk prediction model after training is completed.
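The acceptance test above relies on three standard classification metrics. The sketch below computes them for the two-label case (risk / no risk) and checks them against thresholds; it is illustrative only, the threshold values are not given in the text, and `binary_metrics` and `meets_requirements` are hypothetical names.

```python
from typing import Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Tuple[float, float, float]:
    """Return (accuracy, precision, recall) for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when no positives are predicted/present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


def meets_requirements(metrics: Sequence[float],
                       thresholds: Sequence[float]) -> bool:
    """True only if every metric reaches its preset requirement."""
    return all(m >= t for m, t in zip(metrics, thresholds))
```

If `meets_requirements` returns false for any of the three metrics, the method as described retrains the model and repeats the check.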

In a second aspect, the invention provides a financial risk prediction system, comprising a data acquisition module, a data cleaning module, a model training module and a risk prediction module;

the data acquisition module is used for crawling historical enterprise feature data from a designated data source, obtaining manually input label data corresponding to the historical enterprise feature data, constructing training data from the historical enterprise feature data and the corresponding label data, and dividing the training data into a training set and a test set;

the data cleaning module is used for cleaning data and preprocessing the data of the training set and the testing set so as to obtain the preprocessed training set and the preprocessed testing set;

the model training module is used for constructing a financial risk prediction model, carrying out parameter optimization on the financial risk prediction model by adopting a preprocessed training set and a multi-group optimization algorithm, and testing the financial risk prediction model by adopting a preprocessed testing set so as to obtain a trained financial risk prediction model;

the risk prediction module is used for collecting real-time enterprise feature data corresponding to the financial risk to be predicted, and identifying the real-time enterprise feature data by adopting the trained financial risk prediction model so as to obtain a financial risk prediction result.

According to the financial risk prediction method and system, the financial risk is monitored by introducing machine learning, so that the problems of time and labor waste and high cost existing in manual supervision can be solved, and the multi-group optimization algorithm is provided for solving the problems existing in the existing machine learning algorithm, so that the convergence speed and the convergence precision can be effectively improved, the global searching capability can be effectively improved, the better parameter optimization effect is realized, and the defects existing in the existing optimization algorithm are overcome.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

Fig. 1 is a flowchart of a financial risk prediction method provided by the present invention.

Fig. 2 is a schematic structural diagram of a financial risk prediction system according to the present invention.

In the drawings: 1, data acquisition module; 2, data cleaning module; 3, model training module; 4, risk prediction module.

Specific embodiments of the present invention have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the accompanying claims.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

As shown in fig. 1, the present embodiment provides a financial risk prediction method, including:

S1, crawling historical enterprise feature data from a designated data source, obtaining manually input label data corresponding to the historical enterprise feature data, constructing training data from the historical enterprise feature data and the corresponding label data, and dividing the training data into a training set and a test set.

The historical enterprise feature data may be data stored in a database in advance or data published on a website, and may be crawled in real time or periodically by a web crawler to obtain the enterprise features. If only early warning is needed, the manually input label data corresponding to the historical enterprise feature data may take two values, financial risk or no financial risk, and may be obtained from historical manual supervision data. More label values can be defined according to the actual needs of management staff.

Optionally, the historical enterprise characteristic data may include: the city where the enterprise is located, the enterprise type, the registered capital, whether the enterprise is on the list of abnormal business operations, the number of reported legal disputes involving the enterprise, the number of times historical enterprise characteristic data is reported, the number of labor contract disputes, the number of records as a person subject to enforcement, the number of judgment documents, the number of court announcements, the research and development investment proportion, the research and development income proportion, the number of business types, the number of business counterparties that are negative enterprises, the number of business counterparties that are associated enterprises, the shareholding proportion of the largest shareholder, the number of equity changes, the number of subsidiaries, the number of times the enterprise is reported on a government platform, the return on total assets, the return on net assets, the current ratio, the cash flow to liabilities ratio, the capital preservation and appreciation rate, the operating profit growth rate, the ratio of net capital to net assets, and the like. It should be noted that these enterprise features are merely examples for this embodiment, and other features may be adopted.

S2, data cleaning and data preprocessing are conducted on the training set and the testing set, so that the training set after preprocessing and the testing set after preprocessing are obtained.

The data cleaning may include cleaning abnormal data and handling missing values, to keep abnormal data from affecting the training process. The preprocessing may include converting character data to numbers and normalization, which removes differences in dimension (units and scale) between features and reduces the training workload, thereby speeding up the training process.
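The two preprocessing operations mentioned above can be illustrated briefly. The text does not say which encoding or normalization is used, so this sketch assumes integer label encoding for character data and min-max scaling to [0, 1]; the function names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def encode_categories(values: Sequence[Hashable]) -> Tuple[List[int], Dict[Hashable, int]]:
    """Map each distinct value to an integer code in order of first appearance."""
    mapping: Dict[Hashable, int] = {}
    codes = []
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
        codes.append(mapping[v])
    return codes, mapping


def min_max_fit(column: Sequence[float]) -> Tuple[float, float]:
    """Learn the scaling range from historical (training) data."""
    return min(column), max(column)


def min_max_apply(column: Sequence[float], lo: float, hi: float) -> List[float]:
    """Scale values with a previously learned range; constant columns map to 0."""
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]
```

Separating `min_max_fit` from `min_max_apply` lets the range and the category mapping learned on historical data be reused unchanged on the test set and on real-time data, so that all inputs are processed consistently.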

And S3, constructing a financial risk prediction model, carrying out parameter optimization on the financial risk prediction model by adopting a preprocessed training set and a multi-group optimization algorithm, and testing the financial risk prediction model by adopting a preprocessed testing set so as to obtain the trained financial risk prediction model.

Optionally, a neural network model may be used to construct the financial risk prediction model; for example, a convolutional neural network or a BP (Back Propagation) neural network may be used. In this embodiment, the BP neural network is preferred. It should be noted that these neural networks are merely examples for this embodiment, and other neural networks may also be used to construct the financial risk prediction model. Different neural networks may require different input forms, so the preprocessed training set and the preprocessed test set need to be converted into the input form of the financial risk prediction model.

S4, acquiring real-time enterprise feature data corresponding to the financial risk to be predicted, and identifying the real-time enterprise feature data by adopting the trained financial risk prediction model to obtain a financial risk prediction result.

The processing of the real-time enterprise feature data should be consistent with the historical enterprise feature data, so that the trained financial risk prediction model can correctly perform the data identification process, and thereby automatic supervision of financial risks is realized.

According to the financial risk prediction method, the financial risk is monitored by introducing machine learning, the problems of time and labor waste and high cost existing in manual supervision can be solved, and aiming at the problems existing in the existing machine learning algorithm, the multi-group optimization algorithm is provided, so that the convergence speed and the convergence precision can be effectively improved, the global searching capability can be effectively increased, a better parameter optimization effect is achieved, and the defects existing in the existing optimization algorithm are overcome.

In this embodiment, performing data cleaning and data preprocessing on the training set and the test set to obtain a preprocessed training set and a preprocessed test set, including:

And acquiring historical enterprise characteristic data with missing values or abnormal values in the training set, and obtaining first data to be cleaned.

Judging whether the number of the first data to be cleaned exceeds a preset number threshold; if so, replacing each missing value or abnormal value with the mean or median of the data of the same type to obtain the first cleaning data; otherwise, directly removing the first data to be cleaned to obtain the first cleaning data.

And acquiring historical enterprise characteristic data with missing values or abnormal values in the test set, and obtaining second data to be cleaned.

Judging whether the number of the second data to be cleaned exceeds the preset number threshold; if so, replacing each missing value or abnormal value with the mean or median of the data of the same type to obtain the second cleaning data; otherwise, directly removing the second data to be cleaned to obtain the second cleaning data.

It should be noted that, the above data cleaning and normalization processing are only preferred modes of the present embodiment, and other data preprocessing techniques may be used to process the training data, so that the data better meets the training requirements.

In this embodiment, constructing a financial risk prediction model, and performing parameter optimization on the financial risk prediction model by using a training set after preprocessing and a multi-group optimization algorithm, including:

and constructing a neural network model, and taking the constructed neural network model as a financial risk prediction model.

Optionally, this embodiment preferably constructs a BP neural network and uses it as the financial risk prediction model; this is not the only option, and other neural networks may also be used to construct the model.

Initializing a population scale P, a minimum parallel training group number Q1 and a maximum parallel training group number Q2 corresponding to the financial risk prediction model, wherein the population scale P is an integer multiple of Q2!.

In the multi-group optimization algorithm provided by this embodiment, the local leader of each region guides the updates of its training group, so that multiple regions of the solution space can be explored. In addition, the training groups are regrouped during the training process, which enlarges the explored region of the solution space and improves the search capability of the algorithm.

And initializing model parameters of the financial risk prediction model based on the population scale P to obtain a training population comprising P training individuals, wherein each training individual comprises all model parameters of the financial risk prediction model.

The current training group number $Q^t$ is initialized to Q1 based on the minimum parallel training group number Q1. The fitness value corresponding to each training individual is obtained by adopting the preprocessed training set.

The training individuals are sorted in descending order of fitness value. Based on the current training group number $Q^t$, the first $Q^t$ training individuals are taken as the local leaders of the $Q^t$ training groups, the training individual with the largest fitness value is taken as the global leader, and the remaining training individuals are evenly distributed among the $Q^t$ training groups.

And aiming at each training group, carrying out local search on each training individual in the training group by adopting a local leader guiding strategy, a first self-adaptive optimizing strategy and a greedy optimizing strategy to obtain the training group after the local search.

By performing a local search on each training individual in the training group, the local optimal solution of the region where the training group is located can be found using the information of the local leader and of the other individuals in the training group, which realizes exploration of that region.

For each training group after local search, a global search is performed on each training individual in the training group using a probability optimization strategy, a global leader guiding strategy, a second adaptive optimization strategy and a greedy optimization strategy, to obtain the training group after global search.

By performing a global search on each training individual in the training group, the global optimal solution of the whole population can be sought using the information of the global leader and of the other individuals, so that better solutions in nearby regions can be found and the convergence speed of the algorithm is improved.

And re-selecting the global leader and the local leader of each training group based on the training groups after the global search.

Judging whether the local leader of each training group has remained unchanged over L1 training iterations; if so, updating the local leader using a probability disturbance strategy and a mutation strategy and then entering the judging step of the global leader; otherwise, directly entering the judging step of the global leader.

The probability disturbance strategy and the mutation strategy are adopted to update the local leader, so that the exploration probability of other areas can be increased, and the global exploration capacity of the algorithm is improved.

Judging whether the global leader has remained unchanged over L2 training iterations; if so, entering the judging step of the current training group number; otherwise, entering the judging step of the iteration ending condition.
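The two "unchanged" checks amount to counting consecutive iterations without a change of leader. The following is a minimal sketch, assuming that a leader is considered unchanged when its fitness value is identical to that of the previous iteration; the class name `StagnationCounter` and this comparison rule are illustrative assumptions, and an implementation could equally compare the leader's parameters or index.

```python
class StagnationCounter:
    """Track how many consecutive iterations a leader has stayed the same."""

    def __init__(self, limit):
        self.limit = limit      # L1 for a local leader, L2 for the global leader
        self.last = None        # leader fitness seen in the previous iteration
        self.unchanged = 0      # consecutive iterations without a change

    def update(self, leader_fitness):
        # Assumption: equal fitness means the leader did not change.
        if self.last is not None and leader_fitness == self.last:
            self.unchanged += 1
        else:
            self.unchanged = 0
        self.last = leader_fitness
        return self.unchanged >= self.limit
```

One counter per training group would drive the local leader check, and a single additional counter would drive the global leader check.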

Judging whether the current training group number is greater than the maximum parallel training group number Q2; if so, keeping the local leader of each training group unchanged, randomly and evenly redistributing the other training individuals among the training groups, and entering the judging step of the iteration ending condition; otherwise, incrementing the current training group number Q^t by one, taking the first Q^t training individuals as the local leaders of the Q^t training groups, taking the training individual with the largest fitness value as the global leader, evenly distributing the remaining training individuals among the Q^t training groups, and entering the judging step of the iteration ending condition.

Regrouping the regions in this way allows more of the solution space to be searched, so that a global optimum search is achieved while local search is preserved, which addresses the prior-art problems of easily falling into local optima and poor training results.

Optionally, determining whether the iteration end condition is satisfied may include: judging whether the current training times are larger than the maximum training times or whether the current fitness value is larger than a preset threshold value, if so, satisfying the iteration ending condition, otherwise, not satisfying the iteration ending condition.

In this embodiment, initializing model parameters of a financial risk prediction model based on the population scale P to obtain a training population including P training individuals includes:

$$X_{id} = X_{\min,d} + R(0,1)\,(X_{\max,d} - X_{\min,d})$$

where $X_{id}$ represents the d-th dimension model parameter of the i-th training individual, d = 1, 2, …, D, D represents the total number of model parameters, R(0,1) represents a random number uniformly distributed on [0,1], $X_{\max,d}$ represents the maximum value of the d-th dimension model parameter, and $X_{\min,d}$ represents the minimum value of the d-th dimension model parameter.
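The initialization formula translates directly into code. The sketch below uses only the Python standard library; the function name `init_population` and the list-of-lists representation are illustrative assumptions. Note that `random.Random.random()` draws from [0, 1), which differs from the closed interval in the text only at the upper endpoint.

```python
import random

def init_population(pop_size, lower, upper, rng=None):
    """Create pop_size individuals, each a list of D model parameters.

    lower[d] and upper[d] are the minimum and maximum of the d-th parameter.
    """
    rng = rng or random.Random()
    return [
        # X_id = X_min,d + R(0,1) * (X_max,d - X_min,d)
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(pop_size)
    ]
```

For a BP neural network, each individual would hold the flattened weights and biases, so D equals the total number of trainable parameters.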

Optionally, in order to increase the convergence rate of the algorithm, a chaotic mapping strategy may be further used to initialize the training population, so that the distribution of the training individuals in the solution space is more uniform, thereby reducing the training time.

In this embodiment, the acquiring, by using the training set after preprocessing, the fitness value corresponding to each training individual includes:

applying the i-th training individual to the financial risk prediction model, and taking N pieces of historical enterprise characteristic data from the preprocessed training set as input of the financial risk prediction model, to obtain the actual output $y_{i,nm}$ of the financial risk prediction model.

In the fitness formula, $f_i$ represents the fitness value of the i-th training individual, i = 1, 2, …, P. $y_{i,nm}$ represents the output of the m-th output neuron of the financial risk prediction model when the i-th training individual is applied to the model and the n-th historical enterprise characteristic data is used as its input, with n = 1, 2, …, N and m = 1, 2, …, M, where M represents the total number of output neurons of the financial risk prediction model, i.e. the total number of risk categories. The formula also involves the desired output corresponding to the n-th historical enterprise characteristic data. α represents a constant greater than 0 and less than 1, preferably 0.0001 in this embodiment.

Alternatively, the error function value of the financial risk prediction model may be obtained, 1 may be added to it, and the reciprocal of the sum may be used as the fitness value.
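The reciprocal form of the fitness can be sketched as follows. The text does not fix a particular error function, so the mean squared error used here is an assumption, as are the function names.

```python
def mean_squared_error(actual, desired):
    """Assumed error function: mean squared difference over all samples and outputs.

    actual[n][m] is the model output of neuron m for sample n;
    desired[n][m] is the corresponding desired output.
    """
    total = 0.0
    count = 0
    for out_row, target_row in zip(actual, desired):
        for y, target in zip(out_row, target_row):
            total += (y - target) ** 2
            count += 1
    return total / count

def fitness_from_error(error):
    """Map a non-negative error value to a fitness value in (0, 1]."""
    return 1.0 / (1.0 + error)
```

Adding 1 before taking the reciprocal keeps the fitness finite when the error is zero and makes a smaller error correspond to a larger fitness.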

In this embodiment, for each training set, a local leader guiding policy, a first adaptive optimization policy, and a greedy optimization policy are adopted to perform local search on each training individual in the training set, so as to obtain a training set after local search, including:

A new individual is randomly generated for each training individual in each training group, and it is judged whether the fitness value of the new individual is larger than that of the original training individual; if so, the new individual replaces the corresponding original training individual, otherwise the original training individual is retained, which yields the training individual after initial search. Performing such random initial searches on the training individuals gives more opportunities to explore the solution space, and fine optimization of the region can then build on them.
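The initial search is a greedy comparison between each individual and a freshly sampled candidate. The sketch below assumes that the candidate is drawn uniformly within the parameter limits, which the text does not state; `initial_search` and `fitness_fn` are hypothetical names.

```python
import random

def initial_search(group, fitness_fn, lower, upper, rng=None):
    """Replace each individual by a random candidate only if the candidate is fitter."""
    rng = rng or random.Random()
    result = []
    for individual in group:
        # Assumption: the new individual is sampled uniformly within the limits.
        candidate = [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        # Greedy rule: keep the candidate only when its fitness is strictly larger.
        if fitness_fn(candidate) > fitness_fn(individual):
            result.append(candidate)
        else:
            result.append(individual)
    return result
```

Because a replacement only happens on improvement, the fitness of every individual is non-decreasing across this step.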

Optionally, to improve convergence accuracy in the later stage of the algorithm, the initial search step may be skipped once the number of training iterations exceeds 2/3 of the maximum number of training iterations, which strengthens local fine optimization.

And randomly generating a first random number between 0 and 1 for the training individuals after initial search, judging whether the first random number is larger than the local updating probability, if so, updating the training individuals after initial search by adopting a local leader guiding strategy, a first self-adaptive optimizing strategy and a greedy optimizing strategy, otherwise, not updating the training individuals after initial search.

And traversing all training individuals to obtain a training group after local search.

the first updated value is obtained as follows:

In the formula, the quantities are: the d-th dimension model parameter of the j-th training individual in the q-th training group during the t-th training process, together with its corresponding first updated value, where q = 1, 2, …, Q^t; the d-th dimension model parameter of the local leader corresponding to the q-th training group; the d-th dimension model parameter of a random individual other than the j-th training individual; the first adaptive optimization step size; $t_{\max}$, the maximum number of training iterations; and λ, a random number in (-0.1, 0.1).

By setting the first adaptive optimization step size and the first updated value, a training individual searches with a larger step size in the early stage of the algorithm and with a smaller step size in the later stage, which avoids falling into a local optimum through excessive greediness and improves the flexibility, randomness and ergodicity of the algorithm.

In this embodiment, for each training set after local search, a probability optimization strategy, a global leader guiding strategy, a second adaptive optimization strategy and a greedy optimization strategy are adopted to perform global search on each training individual in the training set, so as to obtain a training set after global search, including:

the global update probability is obtained as follows:

In the formula, $Pr_i$ represents the global update probability corresponding to the i-th training individual, $f_i$ represents the fitness value of the i-th training individual, and $f_{\max}$ represents the fitness value of the global leader.

And randomly generating a second random number between 0 and 1 for the training individuals after the local search, judging whether the second random number is smaller than the global updating probability, if so, updating the training individuals after the local search by adopting a global leader guiding strategy, a second self-adaptive optimizing strategy and a greedy optimizing strategy, otherwise, not updating the training individuals after the local search.

And traversing all training individuals to obtain a training group after global searching.
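The control flow of the global search, a probability gate followed by a greedy acceptance test, can be sketched independently of the update formula itself. In the sketch below `propose` is a hypothetical stand-in for the computation of the second updated value, and `update_prob` is assumed to have been computed beforehand as the global update probability of the individual.

```python
import random

def gated_greedy_update(individual, update_prob, propose, fitness_fn, rng=None):
    """Attempt an update with probability update_prob and keep it only if fitter."""
    rng = rng or random.Random()
    # Gate: update only when the random number is smaller than the probability.
    if rng.random() >= update_prob:
        return individual
    candidate = propose(individual)  # hypothetical stand-in for the second updated value
    # Greedy optimization strategy: accept the candidate only on improvement.
    if fitness_fn(candidate) > fitness_fn(individual):
        return candidate
    return individual
```

Applying this function to every training individual in a group yields the group after global search.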

the second updated value is obtained as follows:

In the formula, the quantities are: the d-th dimension model parameter of the i-th training individual, together with its corresponding second updated value; the d-th dimension model parameter of the global leader; a random individual other than the i-th training individual; R(0,1), a random number in (0,1); R(-1,1), a random number in (-1,1); and the second adaptive optimization step size $Step_2^t$. When the fitness value $f_i^t$ of the i-th training individual is greater than the average fitness value of all training individuals, $Step_2^t$ takes its preset maximum value; otherwise $Step_2^t$ takes the value given by the corresponding expression, in which the preset minimum value of $Step_2^t$, $f_{ave}$ and $f_{\min}$ appear, where $f_{ave}$ represents the average fitness value of all training individuals and $f_{\min}$ represents the minimum fitness value of all training individuals.

By introducing the second self-adaptive optimization step length and the second updating value, search defects caused by fixed step length can be avoided, uncertainty of random step length is reduced, global search is carried out, and meanwhile certain local search capacity is maintained, so that the search precision of an algorithm is ensured.

In this embodiment, updating the local leader using the probability perturbation strategy and the mutation strategy includes:

and randomly generating a third random number between 0 and 1 for the training individuals after global search, judging whether the third random number is smaller than the global updating probability, if so, carrying out first disturbance updating on the training individuals, otherwise, carrying out second disturbance updating on the training individuals.

And traversing all the training individuals after global searching to obtain the training individuals after disturbance updating.

For the training individuals after disturbance updating, variation update values of the training individuals are obtained, and a greedy optimization strategy is used to decide whether each variation update value is retained: the variation update value is retained only if the fitness value increases.

Wherein the first perturbation is updated as:

In the formula, the symbol defined is the updated value of the d-th dimension model parameter of the i-th training individual after the global search.

The second perturbation is updated as:

In the formula, the quantities are: the d-th dimension model parameter of the i-th training individual after global search; $R_1$, a fourth random number uniformly distributed on [0,1]; $R_2$, a fifth random number uniformly distributed on [0,1]; the d-th dimension model parameter of the global leader after global search; and the d-th dimension model parameter of a random individual other than the i-th training individual after global search.

Through disturbance updating, the searching direction of the training group can be changed, so that the training individuals can be transferred to other areas for searching, and traversal of a solution space is realized.

The variation update value of the training individual is obtained by a formula in which $\beta_1$ represents a scaling factor that decreases linearly from 2 to 0 as the number of training iterations increases, π represents the circle constant, and $\beta_2$ represents a random number uniformly distributed on [0,1].
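The linearly decreasing scaling factor can be written as a one-line schedule. The sketch assumes that the iteration counter t runs from 0 to t_max inclusive; the function name `beta1` is illustrative, and only the scaling factor is shown, not the variation update itself.

```python
def beta1(t, t_max):
    """Scaling factor that falls linearly from 2 (at t = 0) to 0 (at t = t_max)."""
    return 2.0 * (1.0 - t / t_max)
```

Early iterations therefore allow large mutation jumps, while late iterations shrink them towards zero.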

Although the algorithm can perform global search, that search moves in small-range transitions; a mutation update is therefore introduced to make position jumps and escape local optima.

Optionally, since the model parameters have upper and lower limits, out-of-range processing should be applied after a training individual changes so that its model parameters are brought back within those limits. For example, a model parameter that exceeds a limit may be pulled back to a random position within the limits or to the nearest position within the limits.
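Both out-of-range treatments mentioned above are short enough to sketch side by side. The function names are illustrative assumptions; `lower` and `upper` hold the per-dimension limits.

```python
import random

def clip_to_bounds(individual, lower, upper):
    """Pull each out-of-range parameter back to the nearest limit."""
    return [min(max(x, lo), hi) for x, lo, hi in zip(individual, lower, upper)]

def resample_out_of_bounds(individual, lower, upper, rng=None):
    """Replace each out-of-range parameter by a random position within the limits."""
    rng = rng or random.Random()
    return [
        x if lo <= x <= hi else lo + rng.random() * (hi - lo)
        for x, lo, hi in zip(individual, lower, upper)
    ]
```

Clipping preserves the direction of the move but can pile individuals up on the boundary, whereas resampling keeps them spread out at the cost of discarding the move in that dimension.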

In this embodiment, the testing of the financial risk prediction model by using the test set after preprocessing to obtain the trained financial risk prediction model includes:

The preprocessed test set is used as the input samples of the financial risk prediction model to obtain the accuracy, precision and recall rate of the financial risk prediction model.

And when the accuracy, the precision and the recall rate of the financial risk prediction model all meet the preset requirements, taking the financial risk prediction model after parameter optimization as a financial risk prediction model after training.
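The acceptance test can be sketched as follows. The text does not say how precision and recall are aggregated over several risk categories, so the sketch computes them for one chosen category (`positive`); that choice, the function names and the threshold values used in any call are assumptions.

```python
def classification_metrics(y_true, y_pred, positive):
    """Return (accuracy, precision, recall), with `positive` as the class of interest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

def meets_requirements(metrics, thresholds):
    """True when every metric reaches its preset requirement (thresholds are hypothetical)."""
    return all(m >= t for m, t in zip(metrics, thresholds))
```

A model would be accepted as trained only when `meets_requirements` holds for all three metrics on the preprocessed test set.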

Example 2

As shown in fig. 2, the present invention provides a financial risk prediction system, which includes a data acquisition module 1, a data cleaning module 2, a model training module 3, and a risk prediction module 4.

The data acquisition module 1 is configured to crawl historical enterprise feature data from a specified data source and acquire manually input tag data corresponding to the historical enterprise feature data, construct training data from the historical enterprise feature data and the corresponding tag data, and divide the training data into a training set and a testing set.

The data cleaning module 2 is configured to perform data cleaning and data preprocessing on the training set and the testing set, so as to obtain the training set after preprocessing and the testing set after preprocessing.

The model training module 3 is configured to construct a financial risk prediction model, perform parameter optimization on the financial risk prediction model by using a preprocessed training set and a multi-group optimization algorithm, and test the financial risk prediction model by using a preprocessed testing set to obtain a trained financial risk prediction model.

The risk prediction module 4 is configured to collect real-time enterprise feature data corresponding to the financial risk to be predicted, and identify the real-time enterprise feature data using the trained financial risk prediction model, so as to obtain a financial risk prediction result.

The principle and beneficial effects of the financial risk prediction system provided in this embodiment are similar to those of the financial risk prediction method described above and are not repeated here.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments and methods may be implemented by a program instructing the related hardware; the program may be stored in a computer readable storage medium and, when executed, performs the corresponding method steps; the storage medium may be a ROM/RAM, a magnetic disk, an optical disk, or the like.

The foregoing embodiments are described to illustrate the general principles of the invention and are not intended to limit the scope of the invention to the particular embodiments; any modifications, equivalents, improvements, etc. that fall within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims

1. A financial risk prediction method, comprising:

2. The financial risk prediction method according to claim 1, wherein performing data cleansing and data preprocessing on the training set and the test set to obtain the training set after preprocessing and the test set after preprocessing, comprises:

3. The financial risk prediction method according to claim 1 or 2, wherein constructing a financial risk prediction model, and performing parameter optimization on the financial risk prediction model by using a training set after preprocessing and a multi-group optimization algorithm, comprises:

judging whether the current training group number is greater than the maximum parallel training group number Q2; if so, keeping the local leader of each training group unchanged, randomly and evenly redistributing the other training individuals among the training groups, and entering the judging step of the iteration ending condition; otherwise, incrementing the current training group number Q^t by one, taking the first Q^t training individuals as the local leaders of the Q^t training groups, taking the training individual with the largest fitness value as the global leader, evenly distributing the remaining training individuals among the Q^t training groups, and entering the judging step of the iteration ending condition;

4. A financial risk prediction method according to claim 3, wherein initializing model parameters of a financial risk prediction model based on the population size P to obtain a training population comprising P training individuals comprises:

$$X_{id} = X_{\min,d} + R(0,1)\,(X_{\max,d} - X_{\min,d})$$

5. The financial risk prediction method according to claim 4, wherein the acquiring the fitness value corresponding to each training individual using the preprocessed training set includes:

6. The financial risk prediction method according to claim 5, wherein for each training set, performing a local search on each training individual in the training set using a local leader guidance strategy, a first adaptive optimization strategy, and a greedy optimization strategy to obtain a training set after the local search, comprising:

the first updated value is obtained as follows:

in the formula, the quantities are: the d-th dimension model parameter of the j-th training individual in the q-th training group during the t-th training process, together with its corresponding first updated value, where q = 1, 2, …, Q^t; the d-th dimension model parameter of the local leader corresponding to the q-th training group; the d-th dimension model parameter of a random individual other than the j-th training individual; the first adaptive optimization step size; $t_{\max}$, the maximum number of training iterations; and λ, a random number in (-0.1, 0.1);

7. The financial risk prediction method according to claim 6, wherein, for each training set after the local search, performing a global search on each training individual in the training set by using a probability optimization strategy, a global leader guidance strategy, a second adaptive optimization strategy, and a greedy optimization strategy to obtain a training set after the global search, comprising:

The global update probability is obtained as follows:

the second updated value is obtained as follows:

in the formula, the quantities are: the d-th dimension model parameter of the i-th training individual, together with its corresponding second updated value; the d-th dimension model parameter of the global leader; a random individual other than the i-th training individual; R(0,1), a random number in (0,1); R(-1,1), a random number in (-1,1); and the second adaptive optimization step size; when the fitness value $f_i^t$ of the i-th training individual is greater than the average fitness value of all training individuals, the second adaptive optimization step size takes its preset maximum value; otherwise it takes the value given by the corresponding expression, in which its preset minimum value, $f_{ave}$ and $f_{\min}$ appear, where $f_{ave}$ represents the average fitness value of all training individuals and $f_{\min}$ represents the minimum fitness value of all training individuals;

8. The financial risk prediction method according to claim 7, wherein updating the local leader with the probabilistic perturbation strategy and the mutation strategy comprises:

Wherein the first perturbation is updated as:

the second perturbation is updated as:

9. The financial risk prediction method according to claim 8, wherein testing the financial risk prediction model using the pre-processed test set to obtain a trained financial risk prediction model comprises:

10. A financial risk prediction system, characterized by comprising a data acquisition module, a data cleaning module, a model training module and a risk prediction module;