CN111882416A - Training method and related device of risk prediction model - Google Patents

Publication number
CN111882416A
Authority
CN
China
Prior art keywords
financial data
initial
data set
pieces
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010720354.9A
Other languages
Chinese (zh)
Inventor
李招
张彬杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weikun Shanghai Technology Service Co Ltd
Original Assignee
Weikun Shanghai Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Weikun Shanghai Technology Service Co Ltd
Priority to CN202010720354.9A
Publication of CN111882416A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The application relates to the field of block storage systems and artificial intelligence, and discloses a training method and a related device for a risk prediction model. The method comprises: acquiring a first financial data set, wherein the first financial data set comprises M pieces of first financial data corresponding to a plurality of first fields; vectorizing, for the first financial data set, the plurality of pieces of first financial data associated with each of the plurality of first fields to obtain a plurality of first vectors; determining the correlation between every two first vectors in the plurality of first vectors by using a preset feature selection algorithm; determining a second financial data set from the first financial data set according to the correlation between every two first vectors; and training a risk prediction model using the second financial data set. Implementing the embodiments of the application shortens the training period of the risk prediction model and reduces the training complexity.

Description

Training method and related device of risk prediction model
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and a related apparatus for training a risk prediction model.
Background
With the rapid development of emerging technologies, various industries have begun to use deep learning, neural networks, and similar techniques for risk prediction. For example, the risk of an enterprise defaulting is predicted by a risk prediction model. Generally, the risk prediction model must be trained before it can be used to predict the default risk of an enterprise. In the prior art, a financial data set is often used directly to train the risk prediction model. Because the financial data set is large, the training period of the risk prediction model is long and the training complexity is high.
Disclosure of Invention
The embodiments of the application provide a training method and a related device for a risk prediction model. Implementing the embodiments of the application shortens the training period of the risk prediction model and reduces the training complexity.
The first aspect of the present application provides a method for training a risk prediction model, including:
acquiring a first financial data set, wherein the first financial data set comprises M pieces of first financial data corresponding to a plurality of first fields, the plurality of first fields comprise a first field A and a first field B, the first field A is associated with X pieces of first financial data, the first field B is associated with Y pieces of first financial data, and M = X + Y, wherein M, X and Y are integers greater than 1;
vectorizing, for the first financial data set, the plurality of pieces of first financial data associated with each of the plurality of first fields to obtain a plurality of first vectors;
determining the correlation between every two first vectors in the plurality of first vectors by adopting a preset feature selection algorithm;
determining a second financial data set from the first financial data set according to the correlation between each two first vectors;
training a risk prediction model using the second financial data set.
A second aspect of the present application provides a training apparatus for a risk prediction model, including:
a processing module configured to: obtain a first financial data set, where the first financial data set includes M pieces of first financial data corresponding to a plurality of first fields, the plurality of first fields includes a first field A and a first field B, the first field A is associated with X pieces of first financial data, the first field B is associated with Y pieces of first financial data, M = X + Y, and M, X, and Y are integers greater than 1; vectorize, for the first financial data set, the plurality of pieces of first financial data associated with each of the plurality of first fields to obtain a plurality of first vectors; determine the correlation between every two first vectors in the plurality of first vectors by using a preset feature selection algorithm; determine a second financial data set from the first financial data set according to the correlation between every two first vectors; and train a risk prediction model using the second financial data set.
A third aspect of the application provides an electronic device comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory, are configured to be executed by the processor, and comprise instructions for performing the steps of any one of the above methods for training a risk prediction model.
A fourth aspect of the application provides a computer-readable storage medium storing a computer program, wherein the computer program is executed by a processor to implement any one of the above methods for training a risk prediction model.
It can be seen that, in the above technical solution, the correlation between the financial data is mined, the second financial data set is determined from the first financial data set according to that correlation, and the risk prediction model is trained using the second financial data set. This reduces the data used for training the risk prediction model, shortens the training period, and reduces the training complexity.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Wherein:
FIG. 1 is a schematic diagram of a training system for a risk prediction model according to an embodiment of the present disclosure;
FIG. 2 is a schematic flowchart of a method for training a risk prediction model according to an embodiment of the present disclosure;
FIG. 3 is a schematic flowchart of a training method for a risk prediction model according to an embodiment of the present disclosure;
FIG. 4 is a schematic flowchart of a training method for a risk prediction model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a risk prediction model training apparatus according to an embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an electronic device in a hardware operating environment according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Details are given below.
The terms "first" and "second" in the description and claims of the present application and the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Referring to FIG. 1, FIG. 1 is a schematic diagram of a training system for a risk prediction model provided in an embodiment of the present application. The training system 100 of the risk prediction model includes a training device 110 of the risk prediction model, which is used to process and store the first financial data set. The training system 100 may be a single integrated device or multiple devices; for convenience of description, the training system 100 is referred to herein as an electronic device. The electronic device may include various handheld devices, vehicle-mounted devices, wearable devices, and computing devices having wireless communication capability, or other processing devices connected to a wireless modem, as well as various forms of User Equipment (UE), Mobile Stations (MS), terminal devices, and the like.
With reference to FIG. 1, an embodiment of the present application provides a method for training a risk prediction model, which is described in detail below.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a training method for a risk prediction model according to an embodiment of the present application. The method can be applied to an electronic device and, as shown in FIG. 2, includes:
201. Acquiring a first financial data set, wherein the first financial data set comprises M pieces of first financial data corresponding to a plurality of first fields, the plurality of first fields comprise a first field A and a first field B, the first field A is associated with X pieces of first financial data, the first field B is associated with Y pieces of first financial data, and M = X + Y, wherein M, X and Y are integers greater than 1.
The first fields may include, for example, fields relating to the basic information of listed and bond-issuing enterprises, financial reports, audit opinions, credit ratings, negative events, shareholders and equity, penalties imposed by the securities regulator, and the like. Specifically, the first fields may include, for example, the percentage increase in net profit within 3 years, the rise in credit rating within 3 years, the number of negative events within 3 years, the average net profit over 3 years, and the like, without being limited thereto.
The first financial data may include, for example, the percentage increase in net profit within 3 years, the rise in credit rating within 3 years, the number of negative events within 3 years, the average net profit over 3 years, and the like, which are not limited herein.
For example, Table 1 shows a first financial data set provided in an embodiment of the present application.
TABLE 1
[Table 1 appears as an image (BDA0002600214730000041) in the original publication and is not reproduced here.]
In Table 1, one first field is the rise in credit rating within 3 years, one is the number of negative events within 3 years, and one is the average net profit over 3 years. The first financial data corresponding to the rise in credit rating within 3 years includes 15%, 11%, and so on. The first financial data corresponding to the number of negative events within 3 years includes 8, 3, and so on. The first financial data corresponding to the average net profit over 3 years includes 9000, 11000, and so on.
X may or may not be equal to Y, which is not limited herein. Further, the first field A and the first field B are two different fields of the plurality of first fields.
202. Vectorizing, for the first financial data set, the plurality of pieces of first financial data associated with each of the plurality of first fields to obtain a plurality of first vectors.
With reference to Table 1, each first field has a corresponding first vector: one for the rise in credit rating within 3 years, one for the number of negative events within 3 years, and one for the average net profit over 3 years.
[The three vectors appear as images (BDA0002600214730000042, BDA0002600214730000043, BDA0002600214730000051) in the original publication and are not reproduced here.]
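As a rough illustration of step 202, the sketch below groups records by field so that each field yields one vector. The record layout, the field names, and the helper `vectorize_by_field` are assumptions made for this example and do not come from the application; the numbers are the ones quoted in the text for Table 1.

```python
from collections import defaultdict

def vectorize_by_field(records):
    """Group (field, value) records into one list of floats per field."""
    vectors = defaultdict(list)
    for field, value in records:
        vectors[field].append(float(value))
    return dict(vectors)

# Values quoted in the text for Table 1; percentages are written as fractions.
# The field names are hypothetical labels chosen for this sketch.
records = [
    ("credit_rating_rise_3y", 0.15),
    ("negative_events_3y", 8),
    ("avg_net_profit_3y", 9000),
    ("credit_rating_rise_3y", 0.11),
    ("negative_events_3y", 3),
    ("avg_net_profit_3y", 11000),
]

first_vectors = vectorize_by_field(records)
print(first_vectors["negative_events_3y"])  # [8.0, 3.0]
```

Keeping one list per field means each field can later be compared with every other field as a whole.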
203. Determining the correlation between every two first vectors in the plurality of first vectors by using a preset feature selection algorithm.
The preset feature selection algorithm may be, for example, a feature selection algorithm based on the Pearson correlation coefficient.
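A minimal sketch of step 203 follows, assuming the Pearson correlation coefficient is the chosen measure and that all vectors have the same length (the text allows X and Y to differ, a case this sketch does not handle). The function names and the sample vectors are hypothetical.

```python
import math
from itertools import combinations

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_u * var_v)

def pairwise_correlations(vectors):
    """Return {(field_a, field_b): r} for every unordered pair of fields."""
    return {
        (a, b): pearson(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }

# Hypothetical vectors, invented for illustration only.
vectors = {
    "field_a": [1.0, 2.0, 3.0, 4.0],
    "field_b": [2.0, 4.0, 6.0, 8.0],  # exactly 2 * field_a
    "field_c": [4.0, 1.0, 3.0, 2.0],
}
correlations = pairwise_correlations(vectors)
```

Here `field_a` and `field_b` are perfectly correlated, so one of them carries no extra information for training.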
204. Determining a second financial data set from the first financial data set according to the correlation between each two first vectors.
Optionally, in a possible implementation, the correlation between every two first vectors includes a correlation between a second vector and a third vector, where the second vector is any one of the plurality of first vectors and the third vector is any one of the plurality of first vectors other than the second vector. Determining the second financial data set from the first financial data set according to the correlation between every two first vectors then includes:
if the correlation between the second vector and the third vector is higher than a preset correlation, retaining the plurality of pieces of first financial data corresponding to the second vector and deleting the plurality of pieces of first financial data corresponding to the third vector to obtain the second financial data set; or deleting the plurality of pieces of first financial data corresponding to the second vector and retaining the plurality of pieces of first financial data corresponding to the third vector to obtain the second financial data set.
The preset correlation may be set by an administrator or may be configured in the electronic device.
In addition, if the correlation between the second vector and the third vector is lower than the preset correlation, the plurality of pieces of first financial data corresponding to the second vector and the plurality of pieces of first financial data corresponding to the third vector are both retained to obtain the second financial data set.
It can be seen that, in this technical solution, the data used for training the risk prediction model is reduced based on correlation, which shortens the training period of the risk prediction model and reduces the training complexity.
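The selection rule of step 204 could be sketched as follows. This is only one possible reading: the text lets either vector of a highly correlated pair be retained, and this sketch arbitrarily keeps the one encountered first. It compares the raw correlation value with the preset correlation, as the text does, and treats a value equal to the preset correlation as "not higher". The function name, field names, and correlation values are hypothetical.

```python
def select_fields(fields, correlations, preset_correlation):
    """Keep a field unless it is too correlated with a field already kept."""
    kept = []
    for field in fields:
        redundant = any(
            correlations[tuple(sorted((field, other)))] > preset_correlation
            for other in kept
        )
        if not redundant:
            kept.append(field)
    return kept

# Hypothetical correlation values, keyed by alphabetically sorted field pairs.
correlations = {
    ("field_a", "field_b"): 0.97,
    ("field_a", "field_c"): -0.40,
    ("field_b", "field_c"): -0.35,
}
second_fields = select_fields(["field_a", "field_b", "field_c"], correlations, 0.9)
print(second_fields)  # ['field_a', 'field_c']
```

The data associated with the fields that remain would then make up the second financial data set.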
205. Training a risk prediction model using the second financial data set.
It can be seen that, in the above technical solution, the correlation between the financial data is mined, the second financial data set is determined from the first financial data set according to that correlation, and the risk prediction model is trained using the second financial data set. This reduces the data used for training the risk prediction model, shortens the training period, and reduces the training complexity.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of a training method for a risk prediction model according to an embodiment of the present application. The method may be applied to an electronic device. As shown in FIG. 3, acquiring the first financial data set includes:
301. Acquiring an initial financial data set from at least one blockchain, wherein the initial financial data set comprises N pieces of initial financial data corresponding to a plurality of initial fields, the plurality of initial fields comprise an initial field A and an initial field B, the initial field A is associated with S pieces of initial financial data, the initial field B is associated with T pieces of initial financial data, N = S + T, and N, S and T are integers greater than 1.
A blockchain is a chained data structure that connects data blocks in chronological order, and is a distributed ledger that is cryptographically guaranteed to be tamper-proof and unforgeable. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
Further, the properties of a blockchain include openness, consensus, decentralization, trustlessness, transparency, anonymity of both parties, tamper resistance, traceability, and the like. Openness means that anyone can participate in the blockchain network, every device can act as a node, and every node is allowed to obtain a complete copy of the database. Consensus means that the nodes jointly maintain the whole blockchain through competitive computation based on a consensus mechanism, so that when any node fails, the remaining nodes can still work normally. Decentralization and trustlessness mean that the blockchain is an end-to-end network formed jointly by many nodes, with no centralized device or management authority. Data exchanged between nodes is verified by digital signatures, so no mutual trust is needed; as long as the exchange follows the rules set by the system, nodes cannot deceive one another. Transparency and anonymity mean that the operating rules of the blockchain are public and all data is public, so every transaction is visible to all nodes. Because the nodes do not need to trust one another, they do not need to disclose their identities, and every participating node is anonymous. Tamper resistance and traceability mean that modifications made to the database by a single node, or even by several nodes, cannot affect the databases of the other nodes unless more than 51% of the nodes in the whole network are controlled and modified at the same time, which is almost impossible. In the blockchain, each transaction is cryptographically linked to its two adjacent blocks, so any transaction record can be traced.
Specifically, a blockchain uses a chained block data structure to verify and store data, uses a distributed node consensus algorithm to generate and update data, uses cryptography to secure data transmission and access, and uses smart contracts composed of automated script code to program and manipulate data, forming a new distributed infrastructure and computing paradigm. The tamper resistance of blockchain technology therefore fundamentally changes the centralized way of establishing credit and effectively improves the immutability and security of data. Because a smart contract writes all of its terms into a program that is executed automatically on the blockchain, once the conditions that trigger the smart contract are met, the blockchain enforces the contract according to its content without obstruction from any external party. This guarantees the validity and enforceability of the contract, greatly reduces cost, and improves efficiency. Every node on the blockchain holds the same ledger, which ensures that the recording process of the ledger is open and transparent. Blockchain technology enables peer-to-peer, open, and transparent direct interaction, making efficient, large-scale information exchange without a centralized intermediary a reality.
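The cryptographic chaining of blocks described above can be illustrated with a toy hash chain. This is only a sketch under simplifying assumptions: it uses SHA-256 and a single in-memory list, and it omits consensus, digital signatures, and networking entirely. The block layout and the function names are hypothetical and are not taken from the application.

```python
import hashlib
import json

def block_hash(index, data, previous_hash):
    """SHA-256 digest over a block's contents and its predecessor's hash."""
    payload = json.dumps([index, data, previous_hash], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def build_chain(records):
    """Link records in order; each block stores the hash of the one before it."""
    chain, previous_hash = [], "0" * 64
    for index, data in enumerate(records):
        digest = block_hash(index, data, previous_hash)
        chain.append({"index": index, "data": data,
                      "previous_hash": previous_hash, "hash": digest})
        previous_hash = digest
    return chain

def is_valid(chain):
    """Recompute every hash and check each link to the preceding block."""
    previous_hash = "0" * 64
    for block in chain:
        if block["previous_hash"] != previous_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], previous_hash):
            return False
        previous_hash = block["hash"]
    return True

chain = build_chain([{"negative_events_3y": 8}, {"negative_events_3y": 3}])
print(is_valid(chain))  # True
chain[0]["data"]["negative_events_3y"] = 0  # tamper with an earlier record
print(is_valid(chain))  # False
```

Changing any stored record changes its recomputed hash, so the mismatch with the stored hash exposes the modification.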
The initial fields may include, for example, fields relating to the basic information of listed and bond-issuing enterprises, financial reports, audit opinions, credit ratings, negative events, shareholders and equity, penalties imposed by the securities regulator, and the like. Specifically, the initial fields may include, for example, the percentage increase in net profit within 3 years, the rise in credit rating within 3 years, the number of negative events within 3 years, the average net profit over 3 years, and the like, without being limited thereto.
The initial financial data may include, for example, the percentage increase in net profit within 3 years, the rise in credit rating within 3 years, the number of negative events within 3 years, the average net profit over 3 years, and the like, which are not limited herein.
S may be equal to T or not, which is not limited herein. Further, the initial field a and the initial field B are two different fields of the plurality of initial fields.
302. Determining sparsity of the initial set of financial data.
Optionally, in a possible implementation, determining the sparsity of the initial financial data set includes: constructing a matrix from the initial financial data set, wherein a column of elements in the matrix corresponds to the plurality of pieces of initial financial data associated with one of the plurality of initial fields; determining the number of sparse elements in each column of the matrix, wherein a sparse element is an element whose corresponding initial financial data is zero; and determining the sparsity of the matrix according to the number of sparse elements in each column of the matrix.
The matrix has n rows and m columns, where n is the number of initial fields and m is the number of pieces of initial financial data associated with an initial field K, the initial field K being the initial field with the most associated pieces of initial financial data among the plurality of initial fields.
For example, Table 2 shows an initial financial data set provided in an embodiment of the present application.
TABLE 2
[Table 2 appears as an image (BDA0002600214730000071) in the original publication and is not reproduced here.]
In Table 2, one initial field is the rise in credit rating within 3 years, one is the number of negative events within 3 years, and one is the average net profit over 3 years. The initial financial data corresponding to the rise in credit rating within 3 years includes 15%, 0%, 11%, and so on. The initial financial data corresponding to the number of negative events within 3 years includes 8, 0, 5, and so on. The initial financial data corresponding to the average net profit over 3 years includes 9000, 11000, 15000, and so on. Further, the matrix may be
    [ 15%   8    9000 ]
    [  0%   0   11000 ]
    [ 11%   5   15000 ]
It can be seen that the matrix has 3 rows and 3 columns. The first column is the initial financial data associated with the field "rise in credit rating within 3 years", the second column is the initial financial data associated with the field "number of negative events within 3 years", and the third column is the initial financial data associated with the field "average net profit over 3 years". The first column contains 1 sparse element, the second column contains 1, and the third column contains 0. Summing the per-column counts, the sparsity of this matrix is 1 + 1 + 0 = 2.
It can be seen that, in this technical solution, determining the sparsity prepares for subsequently acquiring the first financial data set.
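The sparsity calculation of step 302 can be sketched as below. The text only says that the sparsity is determined "according to" the per-column counts; summing them is an assumption that matches the worked example. The function name is hypothetical, and the rows follow the values quoted for Table 2, with percentages written as fractions.

```python
def sparsity(matrix):
    """Count zero entries column by column and return (per-column counts, total)."""
    zeros_per_column = [
        sum(1 for value in column if value == 0)
        for column in zip(*matrix)
    ]
    return zeros_per_column, sum(zeros_per_column)

# Columns: rise in credit rating, number of negative events, average net profit.
matrix = [
    [0.15, 8, 9000],
    [0.00, 0, 11000],
    [0.11, 5, 15000],
]
zeros_per_column, total = sparsity(matrix)
print(zeros_per_column, total)  # [1, 1, 0] 2
```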
303. If the sparsity is less than a threshold, determining, for the initial financial data set, whether the plurality of pieces of initial financial data associated with at least one of the plurality of initial fields do not satisfy a preset distribution.
If so, step 304 is performed; if not, step 305 is performed.
The preset distribution may be, for example, a Gaussian distribution.
304. Deleting, from the initial financial data set, the plurality of pieces of initial financial data associated with the at least one initial field to obtain a remaining initial financial data set; and determining the remaining initial financial data set as the first financial data set.
305. Determining the initial financial data set as the first financial data set.
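Steps 303 to 305 might be sketched as follows. The text does not say how conformity to the preset distribution is tested, so the skewness and excess-kurtosis heuristic used here, together with its cut-off values, is purely an assumption standing in for a proper normality test. The function names and the sample data are hypothetical.

```python
import math

def looks_gaussian(values, max_skew=1.0, max_excess_kurtosis=2.0):
    """Crude normality heuristic based on sample skewness and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        return False  # a constant column is treated as non-Gaussian here
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return abs(skew) <= max_skew and abs(excess_kurtosis) <= max_excess_kurtosis

def first_data_set(initial, sparsity, threshold):
    """Steps 303-305: drop the fields whose data fail the distribution check."""
    if sparsity >= threshold:
        raise NotImplementedError("the text does not describe this branch")
    # If every field passes, the initial set is returned unchanged (step 305).
    return {field: values for field, values in initial.items() if looks_gaussian(values)}

# Hypothetical data: one symmetric field, one dominated by a single outlier.
initial = {
    "symmetric": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
    "skewed": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 100.0],
}
first = first_data_set(initial, sparsity=6, threshold=10)
print(sorted(first))  # ['symmetric']
```

In practice a statistical test chosen for the data at hand would replace the heuristic; the control flow around it is the part that mirrors the steps above.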
It can be seen that, in the above technical solution, the first financial data set is determined by determining the sparsity of the initial financial data set and, when the sparsity is less than the threshold, determining whether the pieces of initial financial data associated with at least one of the initial fields do not satisfy the preset distribution. This also provides more reliable training data for the subsequent training of the risk prediction model.
Referring to FIG. 4, FIG. 4 is a schematic flowchart of a training method for a risk prediction model according to an embodiment of the present application. The method may be applied to an electronic device. The second financial data set includes a plurality of pieces of second financial data associated with each of a plurality of second fields. As shown in FIG. 4, training the risk prediction model using the second financial data set includes:
401. Vectorizing, for the second financial data set, the plurality of pieces of second financial data associated with each second field to obtain a plurality of fourth vectors.
402. Obtaining a vector corresponding to a preset field.
The vector corresponding to the preset field may be a negative vector or a positive vector. Further, when the initial financial data set meets a first preset strategy, a vector corresponding to the preset field is a negative vector; and when the initial financial data set meets a second preset strategy, the vector corresponding to the preset field is a positive vector.
It is understood that the first preset policy and the second preset policy may be set by an administrator or may be configured in the electronic device. The preset field may be set by an administrator or may be configured in the electronic device.
403. Determining the distance between each fourth vector in the plurality of fourth vectors and the vector corresponding to the preset field.
404. Determining a third financial data set from the second financial data set based on the distance.
Optionally, the distance includes a distance between a fifth vector and the vector corresponding to the preset field, where the fifth vector is any one of the plurality of fourth vectors. Determining the third financial data set from the second financial data set according to the distance then includes:
if the distance between the fifth vector and the vector corresponding to the preset field is higher than a preset distance, retaining the plurality of pieces of second financial data corresponding to the fifth vector to obtain the third financial data set;
and if the distance between the fifth vector and the vector corresponding to the preset field is lower than the preset distance, deleting the plurality of pieces of second financial data corresponding to the fifth vector to obtain the third financial data set.
The preset distance may be set by an administrator or may be configured in the electronic device.
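The distance-based selection of steps 403 and 404 can be sketched as below. The text does not name a distance measure, so Euclidean distance is an assumption, as is dropping a field whose distance exactly equals the preset distance (a case the text leaves open). The function names and the sample vectors are hypothetical.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def select_by_distance(fourth_vectors, preset_vector, preset_distance):
    """Keep the fields whose vector lies farther than preset_distance from preset_vector."""
    return {
        field: vector
        for field, vector in fourth_vectors.items()
        if euclidean(vector, preset_vector) > preset_distance
    }

# Hypothetical vectors, invented for illustration only.
fourth_vectors = {
    "field_a": [3.0, 4.0],
    "field_c": [0.5, 0.5],
}
third = select_by_distance(fourth_vectors, preset_vector=[0.0, 0.0], preset_distance=1.0)
print(sorted(third))  # ['field_a']
```

The data associated with the fields that remain would form the third financial data set used in step 405.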
It can be seen that, in the above technical solution, the third financial data set is determined based on the distance, which further reduces the data used for training the risk prediction model, shortens the training period, and reduces the training complexity.
405. Training the risk prediction model using the third financial dataset.
It can be seen that, in this technical solution, determining the third financial data set from the second financial data set further reduces the data used for training the risk prediction model, which shortens the training period and reduces the training complexity.
Referring to fig. 5, fig. 5 is a schematic diagram of a training apparatus for a risk prediction model according to an embodiment of the present application. As shown in fig. 5, a training apparatus 500 for a risk prediction model provided in an embodiment of the present application may include:
a processing module 501, configured to obtain a first financial data set, where the first financial data set includes M pieces of first financial data corresponding to a plurality of first fields, where the plurality of first fields include a first field a and a first field B, the first field a is associated with X pieces of first financial data, the first field B is associated with Y pieces of first financial data, and M is X + Y, where M, X, and Y are integers greater than 1; vectorizing, for the first financial data set, the plurality of pieces of first financial data associated with each of the plurality of first fields to obtain a plurality of first vectors; determining the correlation between every two first vectors in the plurality of first vectors by adopting a preset feature selection algorithm; determining a second financial data set from the first financial data set according to the correlation between each two first vectors; training a risk prediction model using the second financial data set.
Optionally, when acquiring the first financial data set, the processing module 501 is configured to: acquire an initial financial data set from at least one blockchain, where the initial financial data set includes N pieces of initial financial data corresponding to a plurality of initial fields, the plurality of initial fields include an initial field A and an initial field B, the initial field A is associated with S pieces of initial financial data, the initial field B is associated with T pieces of initial financial data, and N is S + T, where N, S, and T are integers greater than 1; determine a sparsity of the initial financial data set; if the sparsity is less than a threshold, determine, for the initial financial data set, whether the plurality of pieces of initial financial data associated with at least one initial field of the plurality of initial fields do not satisfy a preset distribution; if so, delete, from the initial financial data set, the plurality of pieces of initial financial data associated with the at least one initial field to obtain a remaining initial financial data set, and determine the remaining initial financial data set as the first financial data set; if not, determine the initial financial data set as the first financial data set.
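The branching described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the function name select_first_dataset, the predicate has_spread, and the sample fields are hypothetical, and the preset distribution is not specified in this passage, so a trivial stand-in check is used. The passage also does not say what happens when the sparsity is not less than the threshold; the sketch simply returns the initial set unchanged in that case.

```python
from typing import Callable, Dict, List

# A data set maps a field name to the pieces of data associated with that field.
Dataset = Dict[str, List[float]]


def select_first_dataset(
    initial: Dataset,
    sparsity: float,
    threshold: float,
    satisfies_preset: Callable[[List[float]], bool],
) -> Dataset:
    """Derive the first financial data set from the initial financial data set."""
    if sparsity >= threshold:
        # Assumption: this branch is not described in the text, so the
        # initial set is passed through unchanged.
        return dict(initial)
    # Fields whose data do not satisfy the preset distribution.
    failing = {name for name, values in initial.items() if not satisfies_preset(values)}
    # Deleting those fields leaves the remaining initial set; if no field
    # fails, the result is simply the initial set.
    return {name: values for name, values in initial.items() if name not in failing}


def has_spread(values: List[float]) -> bool:
    """Hypothetical stand-in for the preset distribution check:
    the field must take more than one distinct value."""
    return len(set(values)) > 1


# Hypothetical sample data, not taken from the application.
initial = {"income": [1200.0, 800.0, 950.0], "branch_code": [7.0, 7.0, 7.0]}
first = select_first_dataset(initial, sparsity=0.1, threshold=0.5, satisfies_preset=has_spread)
print(sorted(first))  # ['income']
```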
Optionally, when determining the sparsity of the initial financial data set, the processing module 501 is configured to: construct a matrix according to the initial financial data set, where one column of elements in the matrix corresponds to the plurality of pieces of initial financial data associated with one of the plurality of initial fields; determine the number of sparse elements in each column of elements in the matrix, where the initial data corresponding to a sparse element is zero; and determine the sparsity corresponding to the matrix according to the number of sparse elements in each column of elements in the matrix.
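A minimal Python sketch of this sparsity computation is given below. The passage states that the sparsity is determined from the per-column counts of zero elements but does not give the exact formula here; the sketch assumes the ratio of zero elements to all elements. The function names and the sample matrix are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # row-major; each column holds the data of one field


def column_zero_counts(matrix: Matrix) -> List[int]:
    """Count the sparse (zero-valued) elements in each column."""
    if not matrix:
        return []
    n_cols = len(matrix[0])
    return [sum(1 for row in matrix if row[c] == 0) for c in range(n_cols)]


def matrix_sparsity(matrix: Matrix) -> float:
    """Assumed definition: total zero elements divided by total elements."""
    if not matrix or not matrix[0]:
        return 0.0
    total = len(matrix) * len(matrix[0])
    return sum(column_zero_counts(matrix)) / total


# Hypothetical sample: three fields (columns), four records (rows).
matrix = [
    [1200.0, 0.0, 3.0],
    [0.0, 0.0, 5.0],
    [800.0, 2.0, 0.0],
    [950.0, 0.0, 4.0],
]
print(column_zero_counts(matrix))  # [1, 3, 1]
print(matrix_sparsity(matrix))     # 5 zero elements out of 12
```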
Optionally, the correlation between every two first vectors includes a correlation between a second vector and a third vector, where the second vector is any one of the plurality of first vectors and the third vector is any one of the plurality of first vectors other than the second vector. When determining a second financial data set from the first financial data set according to the correlation between every two first vectors, the processing module 501 is configured to: if the correlation between the second vector and the third vector is higher than a preset correlation, retain the plurality of pieces of first financial data corresponding to the second vector and delete the plurality of pieces of first financial data corresponding to the third vector to obtain the second financial data set; or delete the plurality of pieces of first financial data corresponding to the second vector and retain the plurality of pieces of first financial data corresponding to the third vector to obtain the second financial data set.
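The pairwise pruning described above might look like the following Python sketch. The preset feature selection algorithm is not named in this passage; the Pearson correlation coefficient is used here purely as an assumption. The function names, the threshold value, and the sample fields are hypothetical. Of the two alternatives in the text, the sketch retains the first vector of a highly correlated pair and deletes the second.

```python
import math
from typing import Dict, List


def pearson(u: List[float], v: List[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return cov / (norm_u * norm_v)


def prune_correlated(fields: Dict[str, List[float]], preset_correlation: float) -> Dict[str, List[float]]:
    """Keep a field only if its correlation with every already-kept field
    does not exceed the preset correlation."""
    kept: List[str] = []
    for name, vector in fields.items():
        if all(pearson(vector, fields[k]) <= preset_correlation for k in kept):
            kept.append(name)
    return {name: fields[name] for name in kept}


# Hypothetical sample: annual_income is an exact multiple of monthly_income.
first_set = {
    "monthly_income": [1.0, 2.0, 3.0, 4.0],
    "annual_income": [12.0, 24.0, 36.0, 48.0],
    "late_payments": [3.0, 0.0, 2.0, 1.0],
}
second_set = prune_correlated(first_set, preset_correlation=0.9)
print(list(second_set))  # ['monthly_income', 'late_payments']
```

Dropping one field of each highly correlated pair removes redundant inputs, which is how the second financial data set ends up smaller than the first.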
Optionally, the second financial data set includes a plurality of pieces of second financial data associated with each of a plurality of second fields, and when training the risk prediction model using the second financial data set, the processing module 501 is configured to: vectorize, for the second financial data set, the plurality of pieces of second financial data associated with each of the plurality of second fields to obtain a plurality of fourth vectors; obtain a vector corresponding to a preset field; determine a distance between each fourth vector of the plurality of fourth vectors and the vector corresponding to the preset field; determine a third financial data set from the second financial data set according to the distance; and train the risk prediction model using the third financial data set.
Optionally, the distance includes a distance between a fifth vector and the vector corresponding to the preset field, where the fifth vector is any one of the plurality of fourth vectors, and when determining a third financial data set from the second financial data set according to the distance, the processing module 501 is configured to: if the distance between the fifth vector and the vector corresponding to the preset field is higher than a preset distance, retain the plurality of pieces of second financial data corresponding to the fifth vector to obtain the third financial data set;
and if the distance between the fifth vector and the vector corresponding to the preset field is lower than the preset distance, delete the plurality of pieces of second financial data corresponding to the fifth vector to obtain the third financial data set.
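The distance-based selection can be illustrated with the short Python sketch below. The distance metric is not specified in this passage; Euclidean distance is assumed. The text does not say how a distance exactly equal to the preset distance is handled, so the sketch deletes the field in that case, which is also an assumption. All names and sample values are hypothetical.

```python
import math
from typing import Dict, List


def euclidean(u: List[float], v: List[float]) -> float:
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def filter_by_distance(
    fields: Dict[str, List[float]],
    preset_vector: List[float],
    preset_distance: float,
) -> Dict[str, List[float]]:
    """Retain fields whose vector lies farther from the preset-field vector
    than the preset distance; delete the others, as the text describes."""
    return {
        name: vector
        for name, vector in fields.items()
        if euclidean(vector, preset_vector) > preset_distance
    }


# Hypothetical sample: distances to the preset vector are 5.0 and 1.0.
second_set = {"field_a": [3.0, 4.0, 0.0], "field_b": [1.0, 0.0, 0.0]}
third_set = filter_by_distance(second_set, preset_vector=[0.0, 0.0, 0.0], preset_distance=2.0)
print(list(third_set))  # ['field_a']
```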
Referring to fig. 6, fig. 6 is a schematic structural diagram of an electronic device in a hardware operating environment according to an embodiment of the present application.
An embodiment of the present application provides an electronic device comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs comprise instructions for performing the steps of any one of the training methods for the risk prediction model. As shown in fig. 6, the electronic device in the hardware operating environment according to an embodiment of the present application may include:
a processor 601, such as a CPU.
A memory 602, which may optionally be a high-speed RAM or a stable memory such as disk storage.
A communication interface 603 for implementing connection communication between the processor 601 and the memory 602.
Those skilled in the art will appreciate that the configuration of the electronic device shown in fig. 6 is not intended to be limiting and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
As shown in fig. 6, the memory 602 may include an operating system, a network communication module, and one or more programs. An operating system is a program that manages and controls the server hardware and software resources, supporting the execution of one or more programs. The network communication module is used for communication among the components in the memory 602 and with other hardware and software in the electronic device.
In the electronic device shown in fig. 6, the processor 601 is configured to execute one or more programs in the memory 602, and implement the following steps: acquiring a first financial data set, wherein the first financial data set comprises M pieces of first financial data corresponding to a plurality of first fields, the plurality of first fields comprise a first field A and a first field B, the first field A is associated with X pieces of first financial data, the first field B is associated with Y pieces of first financial data, and M is X + Y, wherein M, X and Y are integers greater than 1; vectorizing, for the first financial data set, the plurality of pieces of first financial data associated with each of the plurality of first fields to obtain a plurality of first vectors; determining the correlation between every two first vectors in the plurality of first vectors by adopting a preset feature selection algorithm; determining a second financial data set from the first financial data set according to the correlation between each two first vectors; training a risk prediction model using the second financial data set.
For specific implementation of the electronic device related to the present application, reference may be made to various embodiments of the risk prediction model training method, which are not described herein again.
The present application further provides a computer readable storage medium for storing a computer program, the stored computer program being executable by the processor to perform the steps of: acquiring a first financial data set, wherein the first financial data set comprises M pieces of first financial data corresponding to a plurality of first fields, the plurality of first fields comprise a first field A and a first field B, the first field A is associated with X pieces of first financial data, the first field B is associated with Y pieces of first financial data, and M is X + Y, wherein M, X and Y are integers greater than 1; vectorizing, for the first financial data set, the plurality of pieces of first financial data associated with each of the plurality of first fields to obtain a plurality of first vectors; determining the correlation between every two first vectors in the plurality of first vectors by adopting a preset feature selection algorithm; determining a second financial data set from the first financial data set according to the correlation between each two first vectors; training a risk prediction model using the second financial data set.
For specific implementation of the computer-readable storage medium related to the present application, reference may be made to the embodiments of the risk prediction model training method, which are not described herein again.
The computer readable storage medium may be non-volatile or volatile.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art should understand that the present application is not limited by the order of acts described, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that the acts and modules involved are not necessarily required for this application.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A method for training a risk prediction model, comprising:
acquiring a first financial data set, wherein the first financial data set comprises M pieces of first financial data corresponding to a plurality of first fields, the plurality of first fields comprise a first field A and a first field B, the first field A is associated with X pieces of first financial data, the first field B is associated with Y pieces of first financial data, and M is X + Y, wherein M, X and Y are integers greater than 1;
vectorizing, for the first financial data set, the plurality of pieces of first financial data associated with each of the plurality of first fields to obtain a plurality of first vectors;
determining the correlation between every two first vectors in the plurality of first vectors by adopting a preset feature selection algorithm;
determining a second financial data set from the first financial data set according to the correlation between each two first vectors;
training a risk prediction model using the second financial data set.
2. The method of claim 1, wherein said obtaining a first set of financial data comprises:
acquiring an initial financial data set from at least one blockchain, wherein the initial financial data set comprises N pieces of initial financial data corresponding to a plurality of initial fields, the initial fields comprise initial fields A and initial fields B, the initial fields A are associated with S pieces of initial financial data, the initial fields B are associated with T pieces of initial financial data, N is S + T, and N, S and T are integers greater than 1;
determining a sparsity of the initial set of financial data;
if the sparsity is less than a threshold, determining whether a plurality of initial financial data associated with at least one initial field in the plurality of initial fields do not satisfy a preset distribution for the initial financial data set;
if so, deleting the plurality of pieces of initial financial data associated with the at least one initial field aiming at the initial financial data set to obtain a remaining initial financial data set, and determining the remaining initial financial data set as the first financial data set;
if not, determining the initial financial data set as the first financial data set.
3. The method of claim 2, wherein said determining sparsity of said initial set of financial data comprises:
constructing a matrix from the initial set of financial data, a column of elements in the matrix corresponding to a plurality of pieces of initial financial data associated with one of the plurality of initial fields;
determining the number of sparse elements of each column of elements in the matrix, wherein initial data corresponding to the sparse elements are zero;
and determining the sparsity corresponding to the matrix according to the number of sparse elements of each column of elements in the matrix.
4. The method of any one of claims 1-3, wherein the correlation between every two first vectors comprises a correlation between a second vector and a third vector, the second vector being any one of the plurality of first vectors and the third vector being any one of the plurality of first vectors other than the second vector, and the determining a second financial data set from the first financial data set according to the correlation between every two first vectors comprises:
if the correlation between the second vector and the third vector is higher than a preset correlation, retaining the plurality of pieces of first financial data corresponding to the second vector and deleting the plurality of pieces of first financial data corresponding to the third vector to obtain the second financial data set; or deleting the plurality of pieces of first financial data corresponding to the second vector and retaining the plurality of pieces of first financial data corresponding to the third vector to obtain the second financial data set.
5. The method of claim 1, wherein training a risk prediction model using the second set of financial data, the second set of financial data including a plurality of pieces of second financial data associated with each of a plurality of second fields, comprises:
vectorizing, for the second financial data set, the plurality of pieces of second financial data associated with each second field to obtain a plurality of fourth vectors;
obtaining a vector corresponding to a preset field;
determining a distance between each fourth vector in the plurality of fourth vectors and the vector corresponding to the preset field;
determining a third financial data set from the second financial data set based on the distance;
training the risk prediction model using the third financial dataset.
6. The method of claim 5, wherein the distance comprises a distance between a fifth vector and the vector corresponding to the preset field, the fifth vector being any one of the plurality of fourth vectors, and the determining a third financial data set from the second financial data set according to the distance comprises:
if the distance between the fifth vector and the vector corresponding to the preset field is higher than a preset distance, retaining the plurality of pieces of second financial data corresponding to the fifth vector to obtain the third financial data set;
and if the distance between the fifth vector and the vector corresponding to the preset field is lower than the preset distance, deleting the plurality of pieces of second financial data corresponding to the fifth vector to obtain the third financial data set.
7. A training device for a risk prediction model, comprising:
a processing module, configured to: obtain a first financial data set, where the first financial data set includes M pieces of first financial data corresponding to a plurality of first fields, the plurality of first fields include a first field A and a first field B, the first field A is associated with X pieces of first financial data, the first field B is associated with Y pieces of first financial data, M is X + Y, and M, X, and Y are integers greater than 1; vectorize, for the first financial data set, the plurality of pieces of first financial data associated with each of the plurality of first fields to obtain a plurality of first vectors; determine the correlation between every two first vectors in the plurality of first vectors by using a preset feature selection algorithm; determine a second financial data set from the first financial data set according to the correlation between every two first vectors; and train a risk prediction model using the second financial data set.
8. The apparatus of claim 7, wherein the processing module, in acquiring the first financial data set, is configured to perform:
acquiring an initial financial data set from at least one blockchain, wherein the initial financial data set comprises N pieces of initial financial data corresponding to a plurality of initial fields, the plurality of initial fields comprise an initial field A and an initial field B, the initial field A is associated with S pieces of initial financial data, the initial field B is associated with T pieces of initial financial data, N is S + T, and N, S and T are integers greater than 1;
determining a sparsity of the initial set of financial data;
if the sparsity is less than a threshold, determining whether a plurality of initial financial data associated with at least one initial field in the plurality of initial fields do not satisfy a preset distribution for the initial financial data set;
if so, deleting the plurality of pieces of initial financial data associated with the at least one initial field from the initial financial data set to obtain a remaining initial financial data set, and determining the remaining initial financial data set as the first financial data set;
if not, determining the initial financial data set as the first financial data set.
9. An electronic device comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, and the programs comprise instructions for performing the steps of the method of any one of claims 1-6.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is executed by a processor to implement the method of any one of claims 1-6.
CN202010720354.9A 2020-07-24 2020-07-24 Training method and related device of risk prediction model Pending CN111882416A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010720354.9A CN111882416A (en) 2020-07-24 2020-07-24 Training method and related device of risk prediction model

Publications (1)

Publication Number Publication Date
CN111882416A (en) 2020-11-03

Family

ID=73200232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010720354.9A Pending CN111882416A (en) 2020-07-24 2020-07-24 Training method and related device of risk prediction model

Country Status (1)

Country Link
CN (1) CN111882416A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116503147A (en) * 2023-06-29 2023-07-28 北京裕芃科技有限公司 Financial risk prediction method based on deep learning neural network

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014111540A1 (en) * 2013-01-21 2014-07-24 Ides Technologies Sa System and method for characterizing financial messages
CN105389471A (en) * 2015-11-19 2016-03-09 电子科技大学 Method for reducing training set of machine learning
CN108572947A (en) * 2017-03-13 2018-09-25 腾讯科技(深圳)有限公司 A kind of data fusion method and device
CN108897834A (en) * 2018-06-22 2018-11-27 招商信诺人寿保险有限公司 Data processing and method for digging
CN109783617A (en) * 2018-12-11 2019-05-21 平安科技(深圳)有限公司 For replying model training method, device, equipment and the storage medium of problem
CN110008983A (en) * 2019-01-17 2019-07-12 西安交通大学 A kind of net flow assorted method of the adaptive model based on distributed fuzzy support vector machine
CN110263024A (en) * 2019-05-20 2019-09-20 平安普惠企业管理有限公司 Data processing method, terminal device and computer storage medium
CN110442516A (en) * 2019-07-12 2019-11-12 上海陆家嘴国际金融资产交易市场股份有限公司 Information processing method, equipment and computer readable storage medium
CN110679114A (en) * 2017-05-24 2020-01-10 国际商业机器公司 Method for estimating deletability of data object
CN110750640A (en) * 2019-09-17 2020-02-04 平安科技(深圳)有限公司 Text data classification method and device based on neural network model and storage medium
CN110941598A (en) * 2019-12-02 2020-03-31 北京锐安科技有限公司 Data deduplication method, device, terminal and storage medium
CN111046969A (en) * 2019-12-23 2020-04-21 Oppo(重庆)智能科技有限公司 Data screening method and device, storage medium and electronic equipment
CN111090813A (en) * 2019-12-20 2020-05-01 腾讯科技(深圳)有限公司 Content processing method and device and computer readable storage medium
CN111177765A (en) * 2020-01-06 2020-05-19 广州知弘科技有限公司 Financial big data processing method, storage medium and system
CN111275062A (en) * 2018-12-04 2020-06-12 北京嘀嘀无限科技发展有限公司 Model training method, device, server and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination