CN117522143A

CN117522143A - Method, device, equipment and storage medium for determining risk level

Info

Publication number: CN117522143A
Application number: CN202311617017.7A
Authority: CN
Inventors: 何治宇
Original assignee: Agricultural Bank of China
Current assignee: Agricultural Bank of China
Priority date: 2023-11-29
Filing date: 2023-11-29
Publication date: 2024-02-06

Abstract

The embodiment of the invention discloses a method, a device, equipment and a storage medium for determining a risk level. The method comprises the following steps: acquiring transaction information of a target user, wherein the transaction information comprises a plurality of pieces of key information; dividing the plurality of pieces of key information into characteristic information of a plurality of tags, wherein the tags comprise at least one of an identity tag, a credit tag, a communication tag, a fund tag and an associated tag; and inputting the characteristic information of the plurality of tags into a pre-trained random forest model, and outputting the target risk level of the target user. By dividing the key information into tag-level characteristic information and inputting that characteristic information into the pre-trained random forest model, the method provided by the embodiment of the invention can improve the accuracy of determining the risk level of the target user, thereby improving the safety of financial data.

Description

Method, device, equipment and storage medium for determining risk level

Technical Field

The embodiment of the invention relates to the technical field of data processing, in particular to a method, a device, equipment and a storage medium for determining a risk level.

Background

The rapid development of internet finance provides safe and reliable services for legitimate customers, but it also makes it more convenient for criminals to carry out illegal activities. Banks are an important link in the circulation of funds; they can grade the risk of users and dispose of user accounts in a timely manner so as to reduce customer losses. With the spread of information systems, banks, government agencies and other enterprises have accumulated vast amounts of user attribute and behavior information, which makes it possible to analyze user behavior and mine the connections between users.

Disclosure of Invention

The embodiment of the invention provides a method, a device, equipment and a storage medium for determining risk levels, which can improve the accuracy of determining the risk levels of users, thereby improving the safety of financial data.

In a first aspect, an embodiment of the present invention provides a method for determining a risk level, including:

acquiring transaction information of a target user; wherein the transaction information comprises a plurality of key information;

dividing the plurality of key information into characteristic information of a plurality of labels; wherein the tag comprises at least one of an identity tag, a credit tag, a communication tag, a fund tag and an associated tag;

and inputting the characteristic information of the multiple labels into a pre-trained random forest model, and outputting the target risk level of the target user.

In a second aspect, an embodiment of the present invention further provides a device for determining a risk level, including:

the transaction information acquisition module is used for acquiring the transaction information of the target user; wherein the transaction information comprises a plurality of key information;

the tag characteristic information dividing module is used for dividing the plurality of key information into characteristic information of a plurality of tags; wherein the tag comprises at least one of an identity tag, a credit tag, a communication tag, a fund tag and an associated tag;

and the target risk level output module is used for inputting the characteristic information of the plurality of labels into a pre-trained random forest model and outputting the target risk level of the target user.

In a third aspect, an embodiment of the present invention further provides an electronic device, including:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method for determining a risk level according to the embodiments of the present invention.

In a fourth aspect, an embodiment of the present invention further provides a computer readable storage medium storing computer instructions, where the computer instructions are configured to cause a processor to execute the method for determining a risk level according to the embodiment of the present invention.

The embodiment of the invention discloses a method, a device, equipment and a storage medium for determining a risk level. Transaction information of a target user is acquired, wherein the transaction information comprises a plurality of pieces of key information; the plurality of pieces of key information is divided into characteristic information of a plurality of tags, wherein the tags comprise at least one of an identity tag, a credit tag, a communication tag, a fund tag and an associated tag; and the characteristic information of the plurality of tags is input into a pre-trained random forest model, which outputs the target risk level of the target user. By dividing the key information into tag-level characteristic information and inputting that characteristic information into the pre-trained random forest model, the method can improve the accuracy of determining the risk level of the target user, thereby improving the safety of financial data.

Drawings

FIG. 1 is a flow chart of a method for determining a risk level in accordance with a first embodiment of the present invention;

FIG. 2 is an exemplary diagram of predicting a target risk level using a random forest model in accordance with a first embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a risk level determining apparatus according to a second embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an electronic device in a third embodiment of the present invention.

Detailed Description

The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.

Example 1

FIG. 1 is a flowchart of a method for determining a risk level according to the first embodiment of the present invention. The method is applicable to determining the risk level of a user and may be performed by a risk level determining device. The device may be implemented in software and/or hardware and, optionally, by an electronic device such as a mobile terminal, a PC or a server. The method specifically comprises the following steps:

S110, acquiring transaction information of a target user.

Wherein the transaction information comprises a plurality of key information. The key information may include: user basic information, user credit information, user communication information, user funds information, and user-associated information.

S120, dividing the plurality of key information into characteristic information of a plurality of labels.

Wherein the tag includes at least one of an identity tag, a credit tag, a communication tag, a funds tag, and an associated tag.

Specifically, the plurality of pieces of key information may be divided into the characteristic information of the plurality of tags as follows: preprocessing the transaction information by cleaning, structuring and standardizing it; carrying out semantic analysis on the preprocessed transaction information to obtain the semantic features of each piece of key information; and dividing the plurality of pieces of key information into the characteristic information of the plurality of tags based on the semantic features.
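The three preprocessing operations named above can be sketched as follows. This is only an illustrative sketch under assumptions: the patent does not define what cleaning, structuring and standardizing do concretely, so the function names, the snake_case field convention and the numeric-string rule below are all hypothetical.

```python
import re


def clean(record: dict) -> dict:
    """Cleaning (assumed meaning): drop empty fields, trim string values."""
    cleaned = {}
    for key, value in record.items():
        if value is None:
            continue
        if isinstance(value, str):
            value = value.strip()
            if not value:
                continue
        cleaned[key] = value
    return cleaned


def structure(record: dict) -> dict:
    """Structuring (assumed meaning): normalise field names to snake_case."""
    return {
        re.sub(r"[^0-9a-z]+", "_", key.strip().lower()).strip("_"): value
        for key, value in record.items()
    }


def standardize(record: dict) -> dict:
    """Standardizing (assumed meaning): turn strings like '12,000.50' into floats."""
    out = {}
    for key, value in record.items():
        if isinstance(value, str) and re.fullmatch(r"-?\d[\d,]*(\.\d+)?", value):
            value = float(value.replace(",", ""))
        out[key] = value
    return out


def preprocess(record: dict) -> dict:
    """Apply the three operations in the order given in the text."""
    return standardize(structure(clean(record)))
```

For instance, a raw record with a padded `" Loan Amount "` field holding `" 12,000.50 "` would come out as a `loan_amount` field holding the float `12000.5`.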

The information belonging to the identity tag may include: user identification, user name, contact details and the like. The information belonging to the credit tag may include: the loan amount, the number of loans and the number of loan applications of the user at each financial institution (such as banks and internet financial platforms). The information belonging to the communication tag may include: information about the communication device, search records for various categories of information on the internet, differences in search frequency, consultation records and the like. The information belonging to the fund tag may include: consumption records, consumption trends, account flows under the user's name, transaction times and the like. The information belonging to the associated tag may include: contact person information, risk information of common contacts, associated organizations, associated legal persons and the like.
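A minimal sketch of the division step is given below. The patent groups key information by semantic features; this sketch swaps in a plain keyword lookup on field names as a stand-in, and the tag names, keyword lists and field names are assumptions made for illustration only.

```python
# Hypothetical keyword lists per tag. The patent derives the grouping from
# semantic analysis, which this substring lookup only approximates.
TAG_KEYWORDS = {
    "identity": ("user_id", "user_name", "contact_details"),
    "credit": ("loan",),
    "communication": ("device", "search", "consultation"),
    "funds": ("consumption", "account_flow", "transaction_time"),
    "associated": ("associated", "common_contact", "contact_person"),
}


def divide_by_tag(key_info: dict) -> dict:
    """Group key-information fields under the first tag whose keyword matches.

    Fields that match no keyword are left out of the result.
    """
    grouped = {tag: {} for tag in TAG_KEYWORDS}
    for field, value in key_info.items():
        for tag, keywords in TAG_KEYWORDS.items():
            if any(word in field for word in keywords):
                grouped[tag][field] = value
                break
    return grouped
```

Each tag's sub-dictionary can then serve as the characteristic information passed to that tag's decision tree.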

S130, inputting the characteristic information of the multiple labels into a pre-trained random forest model, and outputting the target risk level of the target user.

The random forest model comprises a plurality of decision trees, each corresponding to one of the tags, and a prediction module. The risk levels may include five levels: very low risk, lower risk, medium risk, medium-high risk and high risk.

Specifically, the characteristic information of the plurality of tags may be input into the pre-trained random forest model to output the target risk level of the target user as follows: inputting the characteristic information of each tag into the corresponding decision tree, and outputting an initial risk level result for each tag; and inputting the initial risk level results of the tags into the prediction module for processing, and outputting the target risk level of the target user.

The initial risk level result of a tag comprises the probability of each risk level for that tag, namely the probabilities of the five risk levels. FIG. 2 is an exemplary diagram of predicting a target risk level using the random forest model. As shown in FIG. 2, the random forest model comprises five decision trees corresponding respectively to the identity tag, the credit tag, the communication tag, the fund tag and the associated tag. The characteristic information of each tag is input into its corresponding decision tree, which outputs the initial risk level result of that tag, namely the probabilities of the five risk levels for the identity tag, the credit tag, the communication tag, the fund tag and the associated tag respectively.
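To make the per-tag step concrete, the sketch below walks one hand-built decision tree and returns the five risk-level probabilities stored at the leaf it reaches. It is an assumption-laden illustration: the `Node` structure, the `loan_count` and `loan_amount` features, the thresholds and the leaf probabilities are all invented here, and a trained tree would learn them from data.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    probs: Optional[Sequence[float]] = None  # set on leaves only
    feature: Optional[str] = None            # feature tested at this node
    threshold: float = 0.0
    left: Optional["Node"] = None            # taken when value <= threshold
    right: Optional["Node"] = None           # taken when value > threshold


def predict_proba(node: Node, features: dict) -> Sequence[float]:
    """Descend from the root to a leaf and return its five probabilities.

    Missing features default to 0.0, which is an arbitrary choice here.
    """
    while node.probs is None:
        value = features.get(node.feature, 0.0)
        node = node.left if value <= node.threshold else node.right
    return node.probs


# Hypothetical decision tree for the credit tag. Probabilities are ordered
# from the lowest risk level to the highest.
credit_tree = Node(
    feature="loan_count",
    threshold=3,
    left=Node(probs=[0.6, 0.2, 0.1, 0.05, 0.05]),
    right=Node(
        feature="loan_amount",
        threshold=100_000,
        left=Node(probs=[0.1, 0.2, 0.4, 0.2, 0.1]),
        right=Node(probs=[0.05, 0.05, 0.2, 0.3, 0.4]),
    ),
)
```

Calling `predict_proba(credit_tree, credit_features)` yields the initial risk level result for the credit tag; the other four tags would each have a tree of their own.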

Specifically, the prediction module may process the initial risk level results of the tags as follows: for each risk level, fusing the probabilities of that risk level across the tags in the initial risk level results to obtain the target probability of the risk level; and determining the risk level with the highest target probability as the target risk level of the target user.

The probabilities of a risk level across the tags may be fused by weighted summation, which yields the target probability of each risk level.

For example, for the very low risk level, let the probability of that level be a1 for the identity tag, a2 for the credit tag, a3 for the communication tag, a4 for the fund tag and a5 for the associated tag; the weighted sum of a1, a2, a3, a4 and a5 is the target probability of the very low risk level. If the target probability of the medium risk level is the highest among the five risk levels, the target user is determined to be medium risk.
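The weighted summation and the selection of the most probable level can be sketched as below. The level names, the tag names and the equal weights in the usage note are assumptions for illustration; the patent does not state how the weights are chosen.

```python
# Illustrative level names, ordered from lowest to highest risk.
RISK_LEVELS = ("very low", "lower", "medium", "medium-high", "high")


def fuse(initial_results: dict, weights: dict) -> list:
    """Weighted sum, per risk level, of the probabilities from every tag."""
    fused = [0.0] * len(RISK_LEVELS)
    for tag, probs in initial_results.items():
        for k, p in enumerate(probs):
            fused[k] += weights[tag] * p
    return fused


def target_risk_level(initial_results: dict, weights: dict) -> str:
    """Return the risk level whose fused (target) probability is highest."""
    fused = fuse(initial_results, weights)
    best = max(range(len(fused)), key=fused.__getitem__)
    return RISK_LEVELS[best]


# Hypothetical initial risk level results, one probability vector per tag.
initial_results = {
    "identity": [0.10, 0.20, 0.40, 0.20, 0.10],
    "credit": [0.05, 0.15, 0.50, 0.20, 0.10],
    "communication": [0.20, 0.30, 0.30, 0.10, 0.10],
    "funds": [0.10, 0.10, 0.40, 0.30, 0.10],
    "associated": [0.30, 0.30, 0.20, 0.10, 0.10],
}
equal_weights = {tag: 0.2 for tag in initial_results}
```

With these made-up numbers and equal weights, the fused probability of the medium level is 0.2 × (0.4 + 0.5 + 0.3 + 0.4 + 0.2) = 0.36, the largest of the five, so `target_risk_level(initial_results, equal_weights)` selects the medium level.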

Optionally, the random forest model may be trained as follows: acquiring transaction information samples from a plurality of data sources; fusing the transaction information samples from the plurality of data sources, and preprocessing the fused transaction information samples by cleaning, structuring and standardizing them in sequence; dividing the preprocessed transaction information samples into characteristic information samples of the plurality of tags; and training the random forest model based on the characteristic information samples.

The plurality of data sources may be different data sources such as financial institutions and non-financial institutions. The transaction information samples from the plurality of data sources may be fused by analyzing, mining and fusing them so that each data source can acquire global knowledge from its local data, which achieves a win-win result while addressing cooperation and privacy protection among the data sources. The preprocessed transaction information samples are divided into the characteristic information samples of the plurality of tags in a manner similar to the division of the plurality of pieces of key information described above, which is not repeated here.
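As a rough illustration of fusing samples from several data sources, the sketch below merges per-source records into one record per user. It assumes every sample carries a hypothetical `user_id` key, and it performs a plain merge only; it does not implement the privacy-preserving cooperation mentioned above, whose mechanism the patent does not specify.

```python
from collections import defaultdict


def fuse_sources(sources: dict) -> dict:
    """Merge per-source sample lists into one record per user.

    `sources` maps a source name to a list of dicts, each with a
    hypothetical 'user_id' key. Field names are prefixed with the source
    name so values from different institutions never overwrite each other.
    """
    merged = defaultdict(dict)
    for source_name, samples in sources.items():
        for sample in samples:
            user_id = sample["user_id"]
            for field, value in sample.items():
                if field != "user_id":
                    merged[user_id][f"{source_name}.{field}"] = value
    return dict(merged)
```

The merged records would then go through the cleaning, structuring and standardizing steps before being divided by tag.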

Specifically, the random forest model may be trained based on the characteristic information samples as follows: inputting the characteristic information samples of each tag into the corresponding decision tree, and outputting an initial predicted risk level result for each tag; inputting the initial predicted risk level results of the tags into the prediction module for processing, and outputting a target predicted risk level; and training the random forest model based on the target predicted risk level and the real risk level of the transaction information sample.

The initial predicted risk level result comprises the probability of each risk level for the corresponding tag. The structure of the random forest model is shown in FIG. 2. The prediction module processes the initial predicted risk level results in the same way as it processes the initial risk level results described above, which is not repeated here. Training the random forest model based on the target predicted risk level and the real risk level of the transaction information sample may comprise: adjusting parameters in the random forest model based on the target predicted risk level and the real risk level until the accuracy of the random forest model meets the requirement.
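The "adjust parameters until the accuracy meets the requirement" loop can be sketched as follows. The patent does not say which parameters are adjusted or how; this sketch assumes, purely for illustration, that the adjusted parameters are the fusion weights of the prediction module and that candidates are drawn by seeded random search. The function names, the `required` threshold and `max_rounds` are hypothetical.

```python
import random


def predict(sample_probs: dict, weights: dict) -> int:
    """Weighted-sum fusion followed by argmax; returns a risk-level index."""
    n_levels = len(next(iter(sample_probs.values())))
    fused = [
        sum(weights[tag] * probs[k] for tag, probs in sample_probs.items())
        for k in range(n_levels)
    ]
    return max(range(n_levels), key=fused.__getitem__)


def accuracy(samples: list, labels: list, weights: dict) -> float:
    """Share of samples whose predicted level equals the real level."""
    hits = sum(predict(s, weights) == y for s, y in zip(samples, labels))
    return hits / len(samples)


def tune_weights(samples, labels, tags, required=0.9, max_rounds=200, seed=0):
    """Random search over normalised weights; stops once accuracy >= required."""
    rng = random.Random(seed)
    best = {tag: 1.0 / len(tags) for tag in tags}
    best_acc = accuracy(samples, labels, best)
    for _ in range(max_rounds):
        if best_acc >= required:
            break
        raw = {tag: rng.random() + 1e-9 for tag in tags}  # avoid a zero total
        total = sum(raw.values())
        candidate = {tag: value / total for tag, value in raw.items()}
        acc = accuracy(samples, labels, candidate)
        if acc > best_acc:
            best, best_acc = candidate, acc
    return best, best_acc
```

Here each sample is the set of per-tag probability vectors produced by the decision trees, and each label is the index of the real risk level. Other readings of the training step, such as refitting the trees themselves, are equally compatible with the text.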

According to the technical scheme of this embodiment, transaction information of a target user is acquired, wherein the transaction information comprises a plurality of pieces of key information; the plurality of pieces of key information is divided into characteristic information of a plurality of tags, wherein the tags comprise at least one of an identity tag, a credit tag, a communication tag, a fund tag and an associated tag; and the characteristic information of the plurality of tags is input into a pre-trained random forest model, which outputs the target risk level of the target user. By dividing the key information into tag-level characteristic information and inputting that characteristic information into the pre-trained random forest model, the method can improve the accuracy of determining the risk level of the target user, thereby improving the safety of financial data.

Example 2

Fig. 3 is a schematic structural diagram of a risk level determining apparatus according to a second embodiment of the present invention, where, as shown in fig. 3, the apparatus includes:

a transaction information acquisition module 310, configured to acquire transaction information of a target user; wherein, the transaction information comprises a plurality of key information;

the tag feature information dividing module 320 is configured to divide the plurality of key information into feature information of a plurality of tags; the label comprises at least one of an identity label, a credit label, a communication label, a fund label and an associated label;

the target risk level output module 330 is configured to input the feature information of the plurality of labels into a pre-trained random forest model, and output a target risk level of the target user.

Optionally, the random forest model includes a plurality of decision trees, each corresponding to one of the tags, and a prediction module.

Optionally, the target risk level output module 330 is further configured to:

respectively inputting the characteristic information of the various labels into corresponding decision trees, and outputting initial risk level results corresponding to the labels respectively; the initial risk level result comprises probabilities of all risk levels corresponding to the labels;

and inputting initial risk level results corresponding to the labels respectively into a prediction module for processing, and outputting the target risk level of the target user.

Optionally, the target risk level output module 330 is further configured to:

for each risk level, fusing the probability of the risk level of each label in the initial risk level result to obtain the target probability of the risk level;

and determining the risk level with the highest target probability as the target risk level of the target user.

Optionally, the tag feature information dividing module 320 is further configured to:

the transaction information is preprocessed as follows: cleaning, structuring and standardizing;

carrying out semantic analysis on the preprocessed transaction information to obtain semantic features of each key information;

the plurality of key information is divided into feature information of a plurality of tags based on semantic features.

Optionally, the apparatus further comprises a random forest model training module configured to:

acquiring transaction information samples in a plurality of data sources;

fusing transaction information samples in a plurality of data sources, and preprocessing the fused transaction information samples in sequence as follows: cleaning, structuring and standardizing;

dividing the preprocessed transaction information sample into characteristic information samples of various labels;

training the random forest model based on the characteristic information samples.

Optionally, the random forest model training module is further configured to:

respectively inputting the characteristic information samples of the labels into corresponding decision trees, and outputting initial prediction risk level results corresponding to the labels respectively; the initial prediction risk result comprises the probability of each risk level corresponding to the label;

inputting initial prediction risk level results corresponding to the labels respectively into a prediction module for processing, and outputting target prediction risk levels;

training the random forest model based on the target predicted risk level and the real risk level of the transaction information sample.

The device can execute the method provided by all the embodiments of the invention, and has the corresponding functional modules and beneficial effects of executing the method. Technical details not described in detail in this embodiment can be found in the methods provided in all the foregoing embodiments of the invention.
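The three-module structure of the apparatus can be pictured with the following sketch, in which each module is a plain callable wired into one object. The class name, field names and callable signatures are assumptions made for illustration; the reference numerals in the comments are those of FIG. 3.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RiskLevelDevice:
    acquire: Callable[[str], dict]       # transaction information acquisition module 310
    divide: Callable[[dict], dict]       # tag feature information dividing module 320
    output_level: Callable[[dict], str]  # target risk level output module 330

    def determine(self, user_id: str) -> str:
        """Run the modules in the order S110, S120, S130."""
        key_info = self.acquire(user_id)
        tag_features = self.divide(key_info)
        return self.output_level(tag_features)
```

Keeping the modules as injected callables makes it possible to swap in a different data source or model without changing the flow.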

Example 3

FIG. 4 shows a schematic structural diagram of an electronic device 10 that may be used to implement an embodiment of the invention. The electronic device is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit implementations of the invention described and/or claimed herein.

As shown in FIG. 4, the electronic device 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12 and a Random Access Memory (RAM) 13. The memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the ROM 12 or the computer program loaded from the storage unit 18 into the RAM 13. The RAM 13 may also store various programs and data required for the operation of the electronic device 10. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.

Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.

The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the various methods and processes described above, such as the method of determining the risk level.

In some embodiments, the method of determining the risk level may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the above-described method of determining a risk level may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the method of determining the risk level in any other suitable way (e.g. by means of firmware).

Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.

A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.

The computing system may include clients and servers. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical hosts and VPS service are overcome.

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, which is not limited herein.

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

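1. A method for determining a risk level, comprising:

acquiring transaction information of a target user; wherein the transaction information comprises a plurality of key information;

dividing the plurality of key information into characteristic information of a plurality of tags; wherein the tag comprises at least one of an identity tag, a credit tag, a communication tag, a fund tag and an associated tag;

and inputting the characteristic information of the plurality of tags into a pre-trained random forest model, and outputting the target risk level of the target user.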

2. The method of claim 1, wherein the random forest model comprises a plurality of decision trees, each corresponding to one of the tags, and a prediction module.

3. The method of claim 2, wherein inputting the characteristic information of the plurality of tags into a pre-trained random forest model, outputting a target risk level of the target user, comprises:

respectively inputting the characteristic information of the plurality of labels into corresponding decision trees, and outputting initial risk level results respectively corresponding to the labels; wherein the initial risk level result comprises probabilities of risk levels corresponding to the labels;

and inputting initial risk level results corresponding to the labels respectively into the prediction module for processing, and outputting the target risk level of the target user.

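4. The method according to claim 3, wherein inputting the initial risk level results corresponding to the tags into the prediction module for processing comprises:

for each risk level, fusing the probability of the risk level of each tag in the initial risk level result to obtain the target probability of the risk level;

and determining the risk level with the highest target probability as the target risk level of the target user.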

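5. The method of claim 1, wherein dividing the transaction information into characteristic information of a plurality of tags comprises:

preprocessing the transaction information as follows: cleaning, structuring and standardizing;

carrying out semantic analysis on the preprocessed transaction information to obtain semantic features of each key information;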

and dividing the plurality of key information into characteristic information of a plurality of labels based on the semantic features.

6. The method according to claim 2, wherein the training mode of the random forest model is:

acquiring transaction information samples in a plurality of data sources;

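fusing transaction information samples in the plurality of data sources, and preprocessing the fused transaction information samples in sequence as follows: cleaning, structuring and standardizing;

dividing the preprocessed transaction information sample into characteristic information samples of a plurality of tags;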

training the random forest model based on the characteristic information sample.

7. The method of claim 6, wherein training the random forest model based on the feature information samples comprises:

respectively inputting the characteristic information samples of the labels into corresponding decision trees, and outputting initial prediction risk level results corresponding to the labels respectively; wherein the initial predicted risk result comprises probabilities of risk levels corresponding to the labels;

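inputting initial prediction risk level results corresponding to the labels respectively into the prediction module for processing, and outputting target prediction risk levels;

training the random forest model based on the target predicted risk level and the real risk level of the transaction information sample.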

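8. A risk level determining apparatus, comprising:

the transaction information acquisition module is used for acquiring the transaction information of the target user; wherein the transaction information comprises a plurality of key information;

the tag characteristic information dividing module is used for dividing the plurality of key information into characteristic information of a plurality of tags; wherein the tag comprises at least one of an identity tag, a credit tag, a communication tag, a fund tag and an associated tag;

and the target risk level output module is used for inputting the characteristic information of the plurality of tags into a pre-trained random forest model and outputting the target risk level of the target user.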

9. An electronic device, the electronic device comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the method of determining a risk level of any one of claims 1-7.

10. A computer readable storage medium storing computer instructions for causing a processor to perform the method of determining a risk level according to any one of claims 1-7.