CN112507628B - Risk prediction method and device based on deep bidirectional language model and electronic equipment


Info

Publication number
CN112507628B
Authority
CN
China
Prior art keywords
word
user
risk
model
risk prediction
Prior art date
Legal status
Active
Application number
CN202110148727.4A
Other languages
Chinese (zh)
Other versions
CN112507628A
Inventor
王骞
沈赟
Current Assignee
Beijing Qilu Information Technology Co Ltd
Original Assignee
Beijing Qilu Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Qilu Information Technology Co Ltd filed Critical Beijing Qilu Information Technology Co Ltd
Priority to CN202110148727.4A
Publication of CN112507628A
Application granted
Publication of CN112507628B
Status: Active

Classifications

    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/23213 Non-hierarchical clustering techniques using statistics or function optimisation, e.g. modelling of probability density functions, with fixed number of clusters, e.g. K-means clustering
    • G06F40/205 Natural language analysis; Parsing
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G06Q10/0635 Risk analysis of enterprise or organisation activities

Abstract

The invention provides a risk prediction method and device based on a deep bidirectional language model, and an electronic device. The method comprises the following steps: acquiring position text information of historical users, and extracting address text information of each historical user at at least one specific time point; pre-training a deep bidirectional language model using a Bert model based on the self-attention mechanism, for semantic vector conversion; splicing the address text information using the deep bidirectional language model, and performing word-vector and sentence-vector conversion to generate user address feature data; constructing a risk prediction model, and training the risk prediction model with a training data set; and calculating a risk assessment value of the current user using the risk prediction model to predict the current user's risk. By adding a Sigmoid layer to the deep bidirectional language model to obtain the risk prediction model, the method can mine effective feature data, identify risky users more accurately, and further improve model precision.

Description

Risk prediction method and device based on deep bidirectional language model and electronic equipment
Technical Field
The invention relates to the field of computer information processing, in particular to a risk prediction method and device based on a deep bidirectional language model and electronic equipment.
Background
Risk control means that a risk manager takes various measures and methods to eliminate or reduce the possibility of a risk event occurring, or to reduce the losses caused when a risk event does occur. Risk control is widely applied in the financial industry, for example to control the risk of company transactions, merchant transactions, or personal transactions.
With the rapid development of machine learning technology, risk prediction in the related art is realized by training a machine learning model. Specifically, during training, reducing the classification loss of the model is generally taken as the training target, a risk prediction model whose classification loss meets a set requirement is finally obtained, and the trained risk prediction model is then used to predict risk level information. However, the accuracy of the risk prediction models provided by the related art still needs improvement. In fact, the main objective of financial risk prediction is to distinguish good customers from bad customers and to predict each customer's risk condition, so as to reduce credit risk and maximize profit. In addition, there is still much room for improvement in screening out high-risk users, extracting anti-risk features, and raising model prediction accuracy.
Therefore, it is necessary to provide a risk prediction method with higher accuracy.
Disclosure of Invention
In order to improve model prediction precision, accurately evaluate a user's risk condition, and further improve the feature data extraction method, the invention provides a risk prediction method based on a deep bidirectional language model, which comprises the following steps: acquiring position text information of historical users, and extracting address text information of each historical user at at least one specific time point; pre-training a deep bidirectional language model using a Bert model based on the self-attention mechanism, for semantic vector conversion; splicing the address text information using the deep bidirectional language model, and performing word-vector and sentence-vector conversion to generate user address feature data; establishing a training data set and a test data set, wherein the training data set comprises the user address feature data and anti-risk performance data of the historical users; constructing a risk prediction model, and training the risk prediction model with the training data set; and calculating a risk assessment value of the current user using the risk prediction model to predict the current user's risk.
Preferably, extracting the address text information of the historical user at at least one specific time point comprises: extracting address text information of the historical user at application, registration, and login, wherein the address text information comprises longitude and latitude information and detailed geographic information.
Preferably, the deep bidirectional language model comprises the following structural layers: the first layer is an input layer, and text sentences to be predicted are input into the deep bidirectional language model; the second layer is a word vector construction layer, and each word is mapped to a low-dimensional vector; the third layer is a Bi-LSTM network layer, and correlation characteristics are extracted from the word vector layer by using the Bi-LSTM based on each word vector and sentence vector; the fourth layer is a self-attention mechanism layer, weight vectors corresponding to all words are generated, and word-level features in each iteration are combined into sentence-level features through multiplication of the weight vectors, so that user address feature data are obtained; the fifth layer is an output layer, and the user address characteristic data is used for user risk classification.
Preferably, generating the user address feature data comprises: using Transformer bidirectional encoder representations, pre-training deep bidirectional representations by jointly conditioning on the context in every layer, to obtain the word vector of each word, the degree of correlation between each word and the other words in the text sentence, and the weight of each word; and adjusting parameters according to the correlations between different words and the weight of each word in the text sentence, and obtaining the word vector of each word again to generate the user address feature data.
Preferably, the vector of each word comprises a token vector, a segment vector, and a position vector.
Preferably, constructing the risk prediction model comprises: using the Bert model, adding a Sigmoid layer to the deep bidirectional language model as an additional output layer to obtain the risk prediction model.
Preferably, the method further comprises: setting a pre-training task for pre-training the deep bidirectional representations, wherein the pre-training task comprises a plurality of tasks, including a word prediction task and a next-text-sentence prediction task.
Preferably, the pre-training task comprises: randomly masking a certain number of words and predicting the masked words with a cloze-style (fill-in-the-blank) mechanism; when the training data are generated, 80% of the time the selected word is replaced with the mask token; 10% of the time it is replaced with a random word token; and 10% of the time the original word is kept unchanged.
Preferably, the pre-training task further comprises: pre-training a binary classification task as the next-text-sentence prediction task, and adding it to the word prediction task for multi-task learning; in 50% of the sample sentence pairs, one sentence of the pair is replaced with a random sentence to serve as a negative sample for establishing the training data set.
Preferably, the anti-risk performance data comprises a probability of overdue and/or a probability of default.
Preferably, an unsupervised clustering algorithm is used for carrying out clustering analysis on the position text information, the extracted address text information and the user address feature data of the historical user; and determining risk corresponding relations among different user addresses based on the clustering analysis result so as to mark risk labels of all users for establishing a training data set.
In addition, the invention also provides a risk prediction device based on the deep bidirectional language model, which comprises: an acquisition module for acquiring position text information of historical users and extracting address text information of each historical user at at least one specific time point; a processing module for pre-training a deep bidirectional language model using a Bert model based on the self-attention mechanism, for semantic vector conversion; a data generation module for splicing the address text information using the deep bidirectional language model and performing word-vector and sentence-vector conversion to generate user address feature data; an establishing module for establishing a training data set and a test data set, wherein the training data set comprises the user address feature data and anti-risk performance data of the historical users; a model construction module for constructing a risk prediction model, which is trained using the training data set; and a prediction module for calculating a risk assessment value of the current user using the risk prediction model to predict the current user's risk.
Preferably, the system further comprises an extraction module, wherein the extraction module is used for extracting address text information of the historical user during application, registration and login, and the address text information comprises longitude and latitude information and detailed geographic information.
Preferably, the deep bidirectional language model comprises the following structural layers: the first layer is an input layer, and text sentences to be predicted are input into the deep bidirectional language model; the second layer is a word vector construction layer, and each word is mapped to a low-dimensional vector; the third layer is a Bi-LSTM network layer, and correlation characteristics are extracted from the word vector layer by using the Bi-LSTM based on each word vector and sentence vector; the fourth layer is a self-attention mechanism layer, weight vectors corresponding to all words are generated, and word-level features in each iteration are combined into sentence-level features through multiplication of the weight vectors, so that user address feature data are obtained; the fifth layer is an output layer, and the user address characteristic data is used for user risk classification.
Preferably, generating the user address feature data comprises: using Transformer bidirectional encoder representations, pre-training deep bidirectional representations by jointly conditioning on the context in every layer, to obtain the word vector of each word, the degree of correlation between each word and the other words in the text sentence, and the weight of each word; and adjusting parameters according to the correlations between different words and the weight of each word in the text sentence, and obtaining the word vector of each word again to generate the user address feature data.
Preferably, the vector of each word comprises a token vector, a segment vector, and a position vector.
Preferably, constructing the risk prediction model comprises: using the Bert model, adding a Sigmoid layer to the deep bidirectional language model as an additional output layer to obtain the risk prediction model.
Preferably, the system further comprises a setting module, wherein the setting module is used for setting a pre-training task for pre-training the deep bidirectional representation, and the pre-training task is a plurality of tasks including a word prediction task and a next text sentence prediction task.
Preferably, the pre-training task comprises: randomly masking a certain number of words and predicting the masked words with a cloze-style (fill-in-the-blank) mechanism; when the training data are generated, 80% of the time the selected word is replaced with the mask token; 10% of the time it is replaced with a random word token; and 10% of the time the original word is kept unchanged.
Preferably, the pre-training task further comprises: pre-training a binary classification task as the next-text-sentence prediction task, and adding it to the word prediction task for multi-task learning; in 50% of the sample sentence pairs, one sentence of the pair is replaced with a random sentence to serve as a negative sample for establishing the training data set.
Preferably, the anti-risk performance data comprises a probability of overdue and/or a probability of default.
Preferably, an unsupervised clustering algorithm is used for carrying out clustering analysis on the position text information, the extracted address text information and the user address feature data of the historical user; and determining risk corresponding relations among different user addresses based on the clustering analysis result so as to mark risk labels of all users for establishing a training data set.
In addition, the present invention also provides an electronic device, wherein the electronic device includes: a processor; and a memory storing computer executable instructions that, when executed, cause the processor to perform the deep bi-directional language model based risk prediction method of the present invention.
Furthermore, the present invention also provides a computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs which, when executed by a processor, implement the risk prediction method based on a deep bidirectional language model according to the present invention.
Advantageous effects
Compared with the prior art, the invention uses the Bert model with a bidirectional Transformer network structure of stronger semantic capability to pre-train a large corpus, which yields a more general deep bidirectional language model, improves the model's language comprehension, and improves model precision. The risk prediction model is obtained by adding a Sigmoid layer to the deep bidirectional language model, and effective feature data can be mined from position (or geographic) text information, so that the feature data extraction method is further optimized, risky users are identified more accurately, overfitting is prevented, and model precision is further improved.
Drawings
In order to make the technical problems solved by the present invention, the technical means adopted and the technical effects obtained more clear, the following will describe in detail the embodiments of the present invention with reference to the accompanying drawings. It should be noted, however, that the drawings described below are only illustrations of exemplary embodiments of the invention, from which other embodiments can be derived by those skilled in the art without inventive faculty.
Fig. 1 is a flowchart of an example of a risk prediction method based on a deep bidirectional language model according to embodiment 1 of the present invention.
Fig. 2 is a flowchart of another example of the risk prediction method based on the deep bidirectional language model according to embodiment 1 of the present invention.
Fig. 3 is a flowchart of another example of the risk prediction method based on the deep bidirectional language model according to embodiment 1 of the present invention.
Fig. 4 is a schematic diagram of an example of the risk prediction apparatus based on the deep bidirectional language model according to embodiment 2 of the present invention.
Fig. 5 is a schematic diagram of another example of the risk prediction apparatus based on the deep bidirectional language model according to embodiment 2 of the present invention.
Fig. 6 is a schematic diagram of still another example of the risk prediction apparatus based on the deep bidirectional language model according to embodiment 2 of the present invention.
Fig. 7 is a block diagram of an exemplary embodiment of an electronic device according to the present invention.
Fig. 8 is a block diagram of an exemplary embodiment of a computer-readable medium according to the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described more fully with reference to the accompanying drawings. The exemplary embodiments, however, may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art. The same reference numerals denote the same or similar elements, components, or parts in the drawings, and thus their repetitive description will be omitted.
Features, structures, characteristics or other details described in a particular embodiment do not preclude the fact that the features, structures, characteristics or other details may be combined in a suitable manner in one or more other embodiments in accordance with the technical idea of the invention.
In describing particular embodiments, the present invention has been described with reference to features, structures, characteristics or other details that are within the purview of one skilled in the art to provide a thorough understanding of the embodiments. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific features, structures, characteristics, or other details.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various elements, components, or sections, these terms should not be construed as limiting. These phrases are used to distinguish one from another. For example, a first device may also be referred to as a second device without departing from the spirit of the present invention.
The term "and/or" includes any and all combinations of one or more of the associated listed items.
In order to improve model prediction precision, accurately evaluate the user risk condition, and further improve the feature data extraction method, the invention provides a risk prediction method based on a deep bidirectional language model. In addition, a risk prediction model is obtained by adding a Sigmoid layer to the deep bidirectional language model; effective feature data mining can be carried out on position (or geographic) text information, risky users can be identified more accurately, overfitting can be prevented, and model precision can be further improved. The specific risk prediction process is described in detail below.
Example 1
Hereinafter, an embodiment of the risk prediction method based on the deep bi-directional language model of the present invention will be described with reference to fig. 1 to 3.
FIG. 1 is a flowchart of a risk prediction method based on a deep bi-directional language model according to the present invention. As shown in fig. 1, the risk prediction method includes the following steps.
Step S101, obtaining position text information of the historical user, and extracting address text information of the historical user at least one specific time point.
Step S102, pre-training a deep bidirectional language model by using a Bert model based on a self-attention mechanism for semantic vector conversion.
And step S103, using the deep bidirectional language model to splice the address text information, and performing word vector and sentence vector conversion to generate user address characteristic data.
Step S104, establishing a training data set and a testing data set, wherein the training data set comprises user address characteristic data and risk resistance performance data of historical users.
Step S105, a risk prediction model is constructed, and the risk prediction model is trained by using the training data set.
And step S106, calculating a risk assessment value of the current user by using the risk prediction model so as to predict the risk of the current user.
First, in step S101, location text information of a history user is acquired, and address text information of the history user at least one specific time point is extracted.
For example, the user characteristic information and the position text information of the historical user are obtained through a third-party database or from APP use data of a certain financial product.
Specifically, the specific time point includes a request node, a registration node, a login node, a transaction node, a default node, a return node, and the like of the financial product.
Note that, in this example, the financial product is a financial service product, a money management product, a financing product, or the like. However, the present invention is not limited thereto, and the above description is only by way of example and is not to be construed as limiting the present invention.
Preferably, address text information of the historical user during application, registration and login is extracted, wherein the address text information comprises longitude and latitude information and detailed geographic information.
For example, address text information "street number, longitude and latitude of sunny district in beijing city" of history user a "at the time of registration" is extracted. For another example, address text information "houbei province city country, longitude, latitude, and degree" when the historical user a registered at date. For another example, address text information "cell building" of Changping district of Beijing city of the historical user A at the resource return node is extracted.
It should be noted that the above description is only given as a preferred example, and the present invention is not limited thereto. In other examples, the address text information of more than three nodes may be extracted, or the address text information of all nodes may be extracted, and the like.
Next, in step S102, a deep bi-directional language model is pre-trained using the Bert model for semantic vector conversion based on the self-attention mechanism.
Specifically, a language model based on the self-attention mechanism and BERT (Bidirectional Encoder Representations from Transformers) is constructed.
In this example, a deep bi-directional language model specifically includes the following structural layers: the first layer is an input layer, and text sentences to be predicted are input into the deep bidirectional language model; the second layer is a word vector construction layer, and each word is mapped to a low-dimensional vector; the third layer is a Bi-LSTM network layer, and correlation characteristics are extracted from the word vector layer by using the Bi-LSTM based on each word vector and sentence vector; the fourth layer is a self-attention mechanism layer, weight vectors corresponding to all words are generated, and word-level features in each iteration are combined into sentence-level features through multiplication of the weight vectors, so that user address feature data are obtained; the fifth layer is an output layer, and the user address characteristic data is used for user risk classification.
Specifically, the deep bidirectional language model mainly uses bidirectional Transformer encoding layers, abandoning the recurrent structure of RNNs and relying entirely on the attention mechanism. The model extracts text features with the Transformer's Encoder, which consists of a self-attention mechanism and a feed-forward neural network. Its core, the self-attention mechanism, can determine the relationship between each word and every other word in the current text sentence without any distance limitation, so that the left and right context of each word is fully mined and a bidirectional representation of the word is obtained.
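As an illustration only (not the patent's implementation; the dimensions and parameter names below are assumptions), the following minimal sketch shows scaled dot-product self-attention, in which each word's output is a context-weighted mixture of every word's value vector, with no distance limitation:

```python
import numpy as np

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over one text sentence.

    x: (seq_len, d_model) word vectors; w_q, w_k, w_v: (d_model, d_k)
    projection matrices (hypothetical parameters for illustration).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v              # queries, keys, values
    scores = q @ k.T / np.sqrt(k.shape[-1])          # word-to-word relevance
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sentence
    return weights @ v                               # left and right context mixed

rng = np.random.default_rng(0)
d_model, d_k, seq_len = 16, 8, 5
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)        # (5, 8)
```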
In this example, the deep bidirectional language model uses a piece of address text as modeling feature data, wherein the piece of address text includes address text information of at least three specific time points of the extracted user.
As shown in fig. 2, the method further includes a step S201 of setting a pre-training task for pre-training the depth bi-directional representation.
In step S201, a pre-training task for pre-training the deep bi-directional representation is set for pre-training the deep bi-directional language model.
In this example, the pre-training task is a plurality of tasks including a word prediction task and a next text sentence prediction task.
Specifically, a certain number of words are randomly masked, and the masked words are predicted with a cloze-style (fill-in-the-blank) mechanism, wherein the certain number is 10%-30%, preferably 16%, and more preferably 14% of the total number of words in the current text sentence; when the training data are generated, 80% of the time the selected word is replaced with the mask token; 10% of the time it is replaced with a random word token; and 10% of the time the original word is kept unchanged.
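A minimal sketch of the 80%/10%/10% replacement rule above (illustrative only; the whitespace tokenization and vocabulary are assumptions, not the patent's implementation):

```python
import random

def mask_tokens(tokens, vocab, mask_rate=0.15, mask_token="[MASK]"):
    """Select ~mask_rate of the words as cloze targets, then apply the
    80%/10%/10% replacement rule described in the text."""
    out, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if random.random() >= mask_rate:
            continue
        targets[i] = tok                       # word the model must recover
        r = random.random()
        if r < 0.8:
            out[i] = mask_token                # 80%: replace with the mask token
        elif r < 0.9:
            out[i] = random.choice(vocab)      # 10%: replace with a random word
        # remaining 10%: keep the original word unchanged
    return out, targets

sentence = "beijing chaoyang district a street a number".split()
print(mask_tokens(sentence, vocab=sentence))
```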
Further, a binary classification task is pre-trained as the next-text-sentence prediction task and added to the word prediction task for multi-task learning.
Further, sample sentence pairs are built from the extracted address text information of all historical users; in 50% of the pairs, one sentence is replaced with a random sentence to serve as a negative sample for the pre-training data set, and deep bidirectional representations are pre-trained with Transformer bidirectional encoder representations by jointly conditioning on the context in every layer. In this way, pre-training a large corpus with the Bert model and its bidirectional Transformer network structure of stronger semantic capability yields a more general deep bidirectional language model, improves the model's language comprehension, and improves model precision.
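The construction of the sentence-pair samples can be sketched as follows (a simplified illustration under the assumption that the spliced address texts form an ordered list of sentences):

```python
import random

def build_nsp_pairs(address_sentences):
    """Next-text-sentence prediction samples: roughly 50% of the pairs keep
    the true next sentence (label 1); the rest replace it with a random
    sentence to form a negative sample (label 0)."""
    pairs = []
    for i in range(len(address_sentences) - 1):
        first, second = address_sentences[i], address_sentences[i + 1]
        if random.random() < 0.5:
            second = random.choice(address_sentences)   # random sentence
            pairs.append((first, second, 0))            # negative sample
        else:
            pairs.append((first, second, 1))            # true next sentence
    return pairs
```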
It should be noted that the above description is only given by way of example, and the present invention is not limited thereto.
Next, in step S103, the address text information is spliced by using the deep bidirectional language model, and word vector and sentence vector conversion is performed to generate user address feature data.
In this example, the address text information of a historical user at application, registration, and login is extracted, each piece of address text information is tokenized, and the three pieces are spliced and combined into an address text sentence of a specific length.
Specifically, a pre-trained deep bidirectional language model is used to compute deep bidirectional representations of each word of the extracted user address text information and of the combined address text sentence, yielding the word vector of each word, the degree of correlation between each word and the other words in the text sentence (i.e., the combined address text sentence), and the weight of each word.
Further, parameters are adjusted according to the correlations between different words and the weight of each word in the current text sentence, and the word vector of each word is obtained again to generate the user address feature data, wherein the vector of each word comprises a token vector, a segment vector, and a position vector; that is, each piece of generated user address feature data comprises token vectors, segment vectors, and position vectors.
It should be noted that the above description is only given by way of example, and the present invention is not limited thereto.
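As an illustration of how the three vectors combine (a minimal sketch; the embedding dimensions, vocabulary size, and use of torch.nn.Embedding are assumptions, not the patent's implementation):

```python
import torch
import torch.nn as nn

class AddressInputEmbedding(nn.Module):
    """Bert-style input: token vector + segment vector + position vector,
    summed element-wise at each position of the spliced address sentence."""
    def __init__(self, vocab_size=21128, max_len=128, n_segments=3, d_model=768):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.seg = nn.Embedding(n_segments, d_model)  # one id per source address
        self.pos = nn.Embedding(max_len, d_model)

    def forward(self, token_ids, segment_ids):
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        return self.tok(token_ids) + self.seg(segment_ids) + self.pos(positions)

emb = AddressInputEmbedding()
token_ids = torch.randint(0, 21128, (1, 10))
segment_ids = torch.zeros(1, 10, dtype=torch.long)
print(emb(token_ids, segment_ids).shape)  # torch.Size([1, 10, 768])
```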
Next, in step S104, a training data set and a test data set are established, the training data set including user address feature data and anti-risk performance data of the historical users.
Specifically, for the training data set, good and bad samples are defined with labels 0 and 1, where 1 represents a sample whose user default probability (and/or overdue probability) is greater than or equal to a specific threshold, and 0 represents a sample whose user default probability (and/or overdue probability) is less than that threshold. Typically, the calculated risk assessment value (in this example, the overdue probability) is a number between 0 and 1 that represents the user's risk condition. The closer the user's risk assessment value is to 1, the weaker the user's resistance to risk (i.e., the riskier fund recovery is); the closer it is to 0, the stronger the user's resistance to risk (i.e., the better fund recovery is).
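A minimal sketch of this labelling rule (the threshold value and record layout are assumptions for illustration):

```python
def label_samples(users, threshold=0.5):
    """Label 1 = bad sample (default/overdue probability >= threshold),
    label 0 = good sample."""
    return [(u["address_features"], 1 if u["overdue_prob"] >= threshold else 0)
            for u in users]

training_set = label_samples([
    {"address_features": [0.2, 0.7], "overdue_prob": 0.81},  # bad  -> 1
    {"address_features": [0.9, 0.1], "overdue_prob": 0.05},  # good -> 0
])
print(training_set)
```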
As shown in fig. 3, the method further includes a step S301 of performing cluster analysis on the location text information of the historical user, the extracted address text information, and the user address feature data.
In step S301, clustering analysis is performed on the location text information of the historical user, the extracted address text information, and the user address feature data.
Preferably, the clustering analysis is performed using a K-means clustering algorithm.
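A minimal clustering sketch under assumed data shapes (the vector dimension and number of clusters are illustrative, not values from the patent):

```python
import numpy as np
from sklearn.cluster import KMeans

# One row of user address feature data per historical user (assumed shape).
address_vectors = np.random.default_rng(0).random((1000, 768))

kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(address_vectors)

# Users in the same cluster share an address-risk correspondence; the cluster
# label of each user can then be joined with anti-risk performance data.
cluster_of_user = kmeans.labels_
```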
Further, based on the cluster analysis result, determining the risk correspondence between different user addresses, and labeling the risk label of each user, that is, giving a risk coefficient value to each user.
Specifically, the risk correspondence is formed by address text information of the user and risk resistance of the user.
In another example, risk correspondences of two or more users are determined based on a calculation of vector similarity of address text information of different users.
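The vector-similarity variant can be sketched as cosine similarity between two users' address vectors (an illustrative choice of similarity measure; the patent does not fix one):

```python
import numpy as np

def address_similarity(vec_a, vec_b):
    """Cosine similarity between two users' address feature vectors; a high
    value suggests the users share a risk correspondence."""
    denom = np.linalg.norm(vec_a) * np.linalg.norm(vec_b) + 1e-12
    return float(np.dot(vec_a, vec_b) / denom)
```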
Further, the anti-risk value of a user can be determined according to the risk correspondence and the user's address text information. Preferably, the training data set further comprises the risk-resistance values of the historical users. Alternatively, the user address feature data and the anti-risk performance data of the historical users marked with risk labels are used to establish the training data set.
In this example, the anti-risk performance data includes a probability of overdue and/or a probability of breach.
Specifically, the input features are user address feature data, a risk resistance value and user feature data, and the output features are risk assessment values.
It should be noted that, for the input feature, in other examples, only the user address feature data may be included, or social text data, performance data of professional categories, and the like may also be included. The foregoing is described by way of preferred examples only and is not to be construed as limiting the invention.
Further, the method also comprises establishing a test data set and setting evaluation parameters and corresponding thresholds, wherein the evaluation parameters comprise an AUC value and a KS value. The test data are used for model parameter tuning and for verifying the model effect.
Preferably, in case that the evaluation parameter is equal to or less than the respective threshold value, the model parameter adjustment is performed until the evaluation parameter is greater than the respective threshold value.
It should be noted that the above description is only given by way of example, and the present invention is not limited thereto.
Next, in step S105, a risk prediction model is constructed, which is trained using the training data set.
In this example, constructing the risk prediction model comprises: using the Bert model, adding a Sigmoid layer to the deep bidirectional language model as an additional output layer to obtain the risk prediction model.
Specifically, a Sigmoid function is added at the outermost layer to form the additional output layer, which outputs a risk assessment value representing the probability that a user will become overdue or default, the risk assessment value being a number between 0 and 1.
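A minimal sketch of such a model head (assuming the HuggingFace transformers implementation of Bert and the "bert-base-chinese" checkpoint; neither is specified by the patent):

```python
import torch
import torch.nn as nn
from transformers import BertModel

class RiskPredictor(nn.Module):
    """Pre-trained deep bidirectional language model with a Sigmoid layer
    added as an additional output layer; outputs a risk value in (0, 1)."""
    def __init__(self, pretrained="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(pretrained)
        self.head = nn.Linear(self.bert.config.hidden_size, 1)

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]      # [CLS] summary of the sentence
        return torch.sigmoid(self.head(cls))  # risk assessment value
```

Training such a head would typically minimize a binary cross-entropy loss (e.g. nn.BCELoss) against the 0/1 labels defined in step S104.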
Further, the risk prediction model is trained using the training data set established in step S104.
Preferably, the method further comprises setting an evaluation parameter, wherein the evaluation parameter comprises an AUC value and a KS value, and the evaluation parameter is used for adjusting the model parameter and verifying the effect of the model.
Specifically, the established test data set is used, and in the model training process, the evaluation parameters are calculated, model effect verification is performed, and model parameter adjustment is performed to obtain a more optimized risk prediction model.
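The two evaluation parameters can be computed as follows (a sketch using scikit-learn; KS is taken as the maximum gap between the cumulative true-positive and false-positive rates):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def auc_ks(y_true, y_score):
    """AUC and KS evaluation parameters for a set of risk assessment values."""
    auc = roc_auc_score(y_true, y_score)
    fpr, tpr, _ = roc_curve(y_true, y_score)
    ks = float(np.max(tpr - fpr))              # Kolmogorov-Smirnov statistic
    return auc, ks

y_true = [0, 0, 1, 1, 0, 1]
y_score = [0.10, 0.40, 0.80, 0.70, 0.30, 0.55]
print(auc_ks(y_true, y_score))
```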
In the model effect verification, the AUC value on the training data set was 0.717, the AUC value on the test data set was 0.687, the KS value on the training data set was 0.341, and the KS value on the test data set was 0.287. Thus, the risk prediction model obtained by adding a Sigmoid layer to the deep bidirectional language model can mine effective feature data from position (or geographic) text information, so that the feature data extraction method is further optimized, risky users are identified more accurately, overfitting is prevented, and model precision is further improved.
Next, in step S106, a risk assessment value of the current user is calculated using the risk prediction model to perform risk prediction on the current user.
In this example, the location text information of the current user is obtained, and the address text information of the current user is extracted to generate the user address feature data of the current user, where the user address feature data includes a word vector, a segment vector, and a location vector.
It should be noted that since the extraction method of the address text information of the current user is the same as the extraction method of the address text information of the historical user in step S101, the description thereof is omitted. In addition, since the generation method of the user address feature data of the current user is the same as the user address feature data of the historical user in step S103, a description thereof is omitted.
Further, the generated user address characteristic data, the risk coefficient value and the user characteristic data are input into the risk prediction model to calculate the risk assessment value of the current user.
In this example, a preset risk threshold value is further included, and the risk threshold value is used for judging a risk user and a non-risk user, wherein a user with the calculated risk assessment value being greater than or equal to the risk threshold value is judged as a risk user, and a user with the calculated risk assessment value being less than the risk threshold value is judged as a non-risk user.
Preferably, the method further comprises setting a plurality of risk level thresholds, wherein the risk level thresholds are used for judging the risk condition of the user and subdividing the risk condition of the user into a plurality of sections.
Specifically, the calculated risk assessment value of the user is compared with each risk threshold value, and the risk section to which the user belongs is judged, so that the risk condition of the user is judged more accurately.
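A minimal sketch of the threshold comparison (the threshold values are assumptions for illustration):

```python
def risk_section(score, thresholds=(0.2, 0.5, 0.8)):
    """Map a risk assessment value in [0, 1] to a risk section; section 0 is
    the lowest-risk section and section len(thresholds) the highest."""
    for section, t in enumerate(thresholds):
        if score < t:
            return section
    return len(thresholds)

print(risk_section(0.65))  # -> 2
```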
It should be noted that the above description is only given as a preferred example, and the present invention is not limited thereto.
Those skilled in the art will appreciate that all or part of the steps to implement the above-described embodiments are implemented as programs (computer programs) executed by a computer data processing apparatus. When the computer program is executed, the method provided by the invention can be realized. Furthermore, the computer program may be stored in a computer readable storage medium, which may be a readable storage medium such as a magnetic disk, an optical disk, a ROM, a RAM, or a storage array composed of a plurality of storage media, such as a magnetic disk or a magnetic tape storage array. The storage medium is not limited to centralized storage, but may be distributed storage, such as cloud storage based on cloud computing.
Compared with the prior art, the invention uses the Bert model with a bidirectional Transformer network structure of stronger semantic capability to pre-train a large corpus, which yields a more general deep bidirectional language model, improves the model's language comprehension, and improves model precision. The risk prediction model is obtained by adding a Sigmoid layer to the deep bidirectional language model, and effective feature data can be mined from position (or geographic) text information, so that the feature data extraction method is further optimized, risky users are identified more accurately, overfitting is prevented, and model precision is further improved.
Example 2
Embodiments of the apparatus of the present invention are described below, which may be used to perform method embodiments of the present invention. The details described in the device embodiments of the invention should be regarded as complementary to the above-described method embodiments; reference is made to the above-described method embodiments for details not disclosed in the apparatus embodiments of the invention.
Referring to fig. 4, 5 and 6, the present invention further provides a risk prediction apparatus 400 based on a deep bi-directional language model, including: an obtaining module 401, configured to obtain location text information of a historical user, and extract address text information of the historical user at least one specific time point; a processing module 402, pre-training a deep bi-directional language model using a Bert model for semantic vector conversion based on a self-attention mechanism; a data generating module 403, configured to perform splicing processing on the address text information by using the deep bidirectional language model, and perform word vector and sentence vector conversion to generate user address feature data; an establishing module 404, configured to establish a training data set and a testing data set, where the training data set includes user address feature data and risk-resistance performance data of a historical user; a model construction module 405 for constructing a risk prediction model, which is trained using the training data set; and the prediction module 406 is used for calculating a risk assessment value of the current user by using the risk prediction model so as to predict the risk of the current user.
As shown in fig. 5, the system further includes an extracting module 501, where the extracting module 501 is configured to extract address text information of the historical user during application, registration, and login, where the address text information includes latitude and longitude information and detailed geographic information.
Preferably, the deep bidirectional language model comprises the following structural layers: the first layer is an input layer, and text sentences to be predicted are input into the deep bidirectional language model; the second layer is a word vector construction layer, and each word is mapped to a low-dimensional vector; the third layer is a Bi-LSTM network layer, and correlation characteristics are extracted from the word vector layer by using the Bi-LSTM based on each word vector and sentence vector; the fourth layer is a self-attention mechanism layer, weight vectors corresponding to all words are generated, and word-level features in each iteration are combined into sentence-level features through multiplication of the weight vectors, so that user address feature data are obtained; the fifth layer is an output layer, and the user address characteristic data is used for user risk classification.
Preferably, the method further comprises: using Transformer bidirectional encoder representations, pre-training deep bidirectional representations by jointly conditioning on the context in every layer, to obtain the word vector of each word, the degree of correlation between each word and the other words in the text sentence, and the weight of each word; and adjusting parameters according to the correlations between different words and the weight of each word in the text sentence, and obtaining the word vector of each word again to generate the user address feature data.
Preferably, the vector of each word comprises a token vector, a segment vector, and a position vector.
Preferably, constructing the risk prediction model comprises: using the Bert model, adding a Sigmoid layer to the deep bidirectional language model as an additional output layer to obtain the risk prediction model.
As shown in fig. 6, the apparatus further includes a setting module 601, where the setting module 601 is configured to set a pre-training task for pre-training the deep bidirectional representation, where the pre-training task is a plurality of tasks, and includes a word prediction task and a next text sentence prediction task.
Preferably, the pre-training task comprises: randomly masking a certain number of words and predicting the masked words with a cloze-style (fill-in-the-blank) mechanism; when the training data are generated, 80% of the time the selected word is replaced with the mask token; 10% of the time it is replaced with a random word token; and 10% of the time the original word is kept unchanged.
Preferably, the pre-training task further comprises: pre-training a binary classification task as the next-text-sentence prediction task, and adding it to the word prediction task for multi-task learning; in 50% of the sample sentence pairs, one sentence of the pair is replaced with a random sentence to serve as a negative sample for establishing the training data set.
Preferably, the anti-risk performance data comprises a probability of overdue and/or a probability of default.
Preferably, an unsupervised clustering algorithm is used for carrying out clustering analysis on the position text information, the extracted address text information and the user address feature data of the historical user; and determining risk corresponding relations among different user addresses based on the clustering analysis result so as to mark risk labels of all users for establishing a training data set.
In embodiment 2, the same portions as those in embodiment 1 are not described.
Those skilled in the art will appreciate that the modules in the above-described embodiments of the apparatus may be distributed as described in the apparatus, and may be correspondingly modified and distributed in one or more apparatuses other than the above-described embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Compared with the prior art, the invention uses the Bert model with a bidirectional Transformer network structure of stronger semantic capability to pre-train a large corpus, which yields a more general deep bidirectional language model, improves the model's language comprehension, and improves model precision. The risk prediction model is obtained by adding a Sigmoid layer to the deep bidirectional language model, and effective feature data can be mined from position (or geographic) text information, so that the feature data extraction method is further optimized, risky users are identified more accurately, overfitting is prevented, and model precision is further improved.
Example 3
In the following, embodiments of the electronic device of the present invention are described, which may be regarded as specific physical implementations for the above-described embodiments of the method and apparatus of the present invention. Details described in the embodiments of the electronic device of the invention should be considered supplementary to the embodiments of the method or apparatus described above; for details which are not disclosed in embodiments of the electronic device of the invention, reference may be made to the above-described embodiments of the method or the apparatus.
Fig. 7 is a block diagram of an exemplary embodiment of an electronic device according to the present invention. An electronic apparatus 200 according to this embodiment of the present invention is described below with reference to fig. 7. The electronic device 200 shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 7, the electronic device 200 is embodied in the form of a general purpose computing device. The components of the electronic device 200 may include, but are not limited to: at least one processing unit 210, at least one memory unit 220, a bus 230 connecting different system components (including the memory unit 220 and the processing unit 210), a display unit 240, and the like.
Wherein the storage unit stores program code executable by the processing unit 210 to cause the processing unit 210 to perform steps according to various exemplary embodiments of the present invention described in the processing method section of the electronic device described above in this specification. For example, the processing unit 210 may perform the steps as shown in fig. 1.
The memory unit 220 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM) 2201 and/or a cache memory unit 2202, and may further include a read only memory unit (ROM) 2203.
The storage unit 220 may also include a program/utility 2204 having a set (at least one) of program modules 2205, such program modules 2205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 230 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 200 may also communicate with one or more external devices 300 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 200, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 200 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 250. Also, the electronic device 200 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 260. The network adapter 260 may communicate with other modules of the electronic device 200 via the bus 230. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 200, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments of the present invention described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiment of the present invention can be embodied in the form of a software product, which can be stored in a computer-readable storage medium (which can be a CD-ROM, a usb disk, a removable hard disk, etc.) or on a network, and includes several instructions to make a computing device (which can be a personal computer, a server, or a network device, etc.) execute the above-mentioned method according to the present invention. The computer program, when executed by a data processing apparatus, enables the computer readable medium to carry out the above-described methods of the invention.
As shown in fig. 8, the computer program may be stored on one or more computer readable media. The computer readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out the operations of the present invention may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the C programming language. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the latter case, the remote computing device may be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computing device (for example, through the Internet using an Internet service provider).
In summary, the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functionality of some or all of the components in embodiments in accordance with the invention may be implemented in practice using a general purpose data processing device such as a microprocessor or a Digital Signal Processor (DSP). The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
While the foregoing embodiments have described the objects, aspects and advantages of the present invention in further detail, it should be understood that the present invention is not inherently tied to any particular computer, virtual machine or electronic device, and that various general-purpose machines may be used to implement it. The invention is not to be considered as limited to the specific embodiments described; rather, it is intended to cover all modifications, changes and equivalents that come within the spirit and scope of the invention.

Claims (9)

1. A risk prediction method based on a deep bidirectional language model is characterized by comprising the following steps:
acquiring position text information of a historical user, and extracting address text information of the historical user at no fewer than three specific time points, wherein the specific time points comprise one or more of a request node, a registration node, a login node, a transaction node, a default node and a repayment node of a financial product;
pre-training a deep bidirectional language model by using a BERT model based on a self-attention mechanism, for semantic vector conversion;
splicing the address text information by using the deep bidirectional language model and merging it into an address text sentence of a specific length, performing deep bidirectional representation by using the pre-trained deep bidirectional language model to obtain the word vector of each word, the correlation between each word and the other words in the merged address text sentence, and the weight of each word, then adjusting parameters according to the correlations between different words and the weight of each word in the current text sentence and obtaining the word vector of each word again, so as to generate user address characteristic data;
establishing a training data set and a testing data set, wherein the training data set comprises user address characteristic data and anti-risk performance data of historical users;
adding a Sigmoid layer to the deep bidirectional language model as an additional output layer by using the BERT model to obtain a risk prediction model, and training the risk prediction model with the training data set;
inputting the generated user address characteristic data, risk coefficient values and user characteristic data into the trained risk prediction model, and calculating a risk evaluation value of the current user so as to perform risk prediction for the current user.
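[Editorial note, for illustration only: the following is a minimal, non-authoritative sketch of how the pipeline recited in claim 1 could be realized, assuming PyTorch and the HuggingFace transformers library with a Chinese BERT checkpoint. The class name AddressRiskModel, the checkpoint name, and the sample addresses are hypothetical and are not part of the patent.]

```python
# Hypothetical sketch of claim 1: pre-trained BERT encoder plus a Sigmoid
# output layer that maps concatenated address texts to a risk evaluation value.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class AddressRiskModel(nn.Module):  # illustrative name, not from the patent
    def __init__(self, checkpoint="bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(checkpoint)  # deep bidirectional LM
        self.head = nn.Linear(self.bert.config.hidden_size, 1)  # added output layer

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]      # [CLS] sentence-level feature
        return torch.sigmoid(self.head(pooled))  # risk evaluation value in (0, 1)

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
# Addresses captured at three time points, spliced into one address sentence.
addresses = ["address at request node", "address at registration node",
             "address at login node"]
sentence = tokenizer.sep_token.join(addresses)
batch = tokenizer(sentence, truncation=True, max_length=128,
                  padding="max_length", return_tensors="pt")

model = AddressRiskModel()
risk = model(batch["input_ids"], batch["attention_mask"])
print(float(risk))  # compare against a risk threshold for the current user
```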
2. The risk prediction method of claim 1, wherein extracting the address text information of the historical user at at least one specific time point comprises:
extracting address text information of the historical user at the time of application, registration and login, wherein the address text information comprises longitude and latitude information and detailed geographic information.
3. The risk prediction method according to claim 1 or 2, wherein
the deep bidirectional language model comprises the following structural layers: the first layer is an input layer, into which the text sentence to be predicted is input; the second layer is a word vector construction layer, which maps each word to a low-dimensional vector; the third layer is a Bi-LSTM network layer, which uses the Bi-LSTM to extract correlation features from each word vector and the sentence vector; the fourth layer is a self-attention mechanism layer, which generates the weight vector corresponding to each word and combines the word-level features of each iteration into sentence-level features through multiplication by the weight vectors, thereby obtaining the user address characteristic data; and the fifth layer is an output layer, in which the user address characteristic data is used for user risk classification.
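[Editorial note: a hedged PyTorch sketch of the five-layer structure described in claim 3 (input, word vector construction, Bi-LSTM, self-attention, output) might look as follows; all dimensions, vocabulary size and names are illustrative assumptions.]

```python
# Illustrative five-layer structure: input -> word vectors -> Bi-LSTM ->
# self-attention -> sigmoid output, as described in claim 3.
import torch
import torch.nn as nn

class BiLstmSelfAttention(nn.Module):
    def __init__(self, vocab_size=21128, embed_dim=128, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)             # layer 2
        self.bilstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                              bidirectional=True)                    # layer 3
        self.attn = nn.Linear(2 * hidden, 1)                         # layer 4
        self.out = nn.Linear(2 * hidden, 1)                          # layer 5

    def forward(self, token_ids):                     # layer 1: input ids
        h, _ = self.bilstm(self.embed(token_ids))     # (B, T, 2H) word features
        w = torch.softmax(self.attn(h), dim=1)        # per-word weight vector
        sentence = (w * h).sum(dim=1)                 # sentence-level feature
        return torch.sigmoid(self.out(sentence))      # user risk classification

model = BiLstmSelfAttention()
demo = torch.randint(0, 21128, (2, 16))  # two dummy address sentences
print(model(demo).shape)                 # torch.Size([2, 1])
```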
4. The risk prediction method of claim 1, wherein generating the user address characteristic data comprises:
using a Transformer-based bidirectional encoder for representation, and pre-training deep bidirectional representations by jointly conditioning on the context in every layer, so as to obtain the word vector of each word, the correlation between each word and the other words in the text sentence, and the weight of each word;
adjusting parameters according to the correlations between different words and the weight of each word in the text sentence, and obtaining the word vector of each word again to generate the user address characteristic data.
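[Editorial note: one possible reading of this step, sketched with the HuggingFace transformers API (an assumed tooling choice; the patent names no library): the pre-trained encoder returns one contextual vector per word, and its attention tensors supply the word-to-word correlations and per-word weights.]

```python
# Hedged sketch: per-word vectors plus word-to-word correlations (attention
# weights) from a pre-trained Transformer bidirectional encoder.
import torch
from transformers import BertModel, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-chinese")
enc = BertModel.from_pretrained("bert-base-chinese", output_attentions=True)

batch = tok("sample address sentence", return_tensors="pt")
with torch.no_grad():
    out = enc(**batch)

word_vectors = out.last_hidden_state[0]   # one contextual vector per word
attn = out.attentions[-1][0].mean(dim=0)  # last layer, averaged over heads:
                                          # attn[i, j] = weight of word j for word i
word_weights = attn.sum(dim=0)            # how strongly each word is attended to
```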
5. The risk prediction method of claim 4, wherein the word vector of each word is composed of a token vector, a segment vector, and a position vector.
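[Editorial note: this mirrors BERT's input representation, where the vector fed to the encoder is the element-wise sum of three embeddings. A minimal sketch under that assumption, with all sizes illustrative:]

```python
# Illustrative composition of a word vector from token, segment, and
# position embeddings, as in claim 5.
import torch
import torch.nn as nn

vocab, max_len, dim = 21128, 512, 768
token_emb = nn.Embedding(vocab, dim)
segment_emb = nn.Embedding(2, dim)        # sentence A vs. sentence B
position_emb = nn.Embedding(max_len, dim)

ids = torch.tensor([[101, 3300, 102]])           # dummy token ids
seg = torch.zeros_like(ids)                      # all tokens from segment A
pos = torch.arange(ids.size(1)).unsqueeze(0)     # positions 0..T-1
word_vec = token_emb(ids) + segment_emb(seg) + position_emb(pos)
print(word_vec.shape)  # torch.Size([1, 3, 768])
```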
6. The risk prediction method of claim 4, further comprising:
setting pre-training tasks for pre-training the deep bidirectional representation, wherein there are a plurality of pre-training tasks, comprising a word prediction task and a next-sentence prediction task.
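[Editorial note: these two tasks correspond to BERT's masked word prediction and next-sentence prediction objectives. A hedged sketch using HuggingFace's BertForPreTraining, which jointly optimizes both; the checkpoint and sample sentences are assumptions.]

```python
# Illustrative joint pre-training on the two tasks of claim 6:
# masked word prediction plus next-sentence prediction.
import torch
from transformers import BertForPreTraining, BertTokenizer

tok = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForPreTraining.from_pretrained("bert-base-chinese")

batch = tok("address sentence one", "address sentence two", return_tensors="pt")
labels = batch["input_ids"].clone()            # targets for masked word prediction
batch["input_ids"][0, 1] = tok.mask_token_id   # mask one word for the model to recover
nsp_label = torch.tensor([0])                  # 0: sentence B really follows A

out = model(**batch, labels=labels, next_sentence_label=nsp_label)
print(float(out.loss))  # joint loss over both pre-training tasks
```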
7. A risk prediction device based on a deep bidirectional language model, comprising:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring the position text information of a historical user and extracting the address text information of the historical user at least three specific time points, and the specific time points comprise one or more of a request node, a registration node, a login node, a transaction node, a default node and a return node of a financial product;
a processing module, configured to pre-train a deep bidirectional language model based on a self-attention mechanism by using a BERT model, for semantic vector conversion;
a data generation module, configured to splice the address text information by using the deep bidirectional language model and merge it into an address text sentence of a specific length, perform deep bidirectional representation by using the pre-trained deep bidirectional language model to obtain the word vector of each word, the correlation between each word and the other words in the merged address text sentence, and the weight of each word, and adjust parameters according to the correlations between different words and the weight of each word in the current text sentence, then obtain the word vector of each word again to generate user address characteristic data;
the system comprises an establishing module, a testing module and a processing module, wherein the establishing module is used for establishing a training data set and a testing data set, and the training data set comprises user address characteristic data and anti-risk performance data of historical users;
a model building module, configured to add a Sigmoid layer to the deep bidirectional language model as an additional output layer by using the BERT model to obtain a risk prediction model, and to train the risk prediction model with the training data set;
a prediction module, configured to input the generated user address characteristic data, risk coefficient values and user characteristic data into the risk prediction model and calculate a risk evaluation value of the current user, so as to perform risk prediction for the current user.
8. An electronic device, wherein the electronic device comprises:
a processor; and
a memory storing computer-executable instructions that, when executed, cause the processor to perform the risk prediction method based on a deep bidirectional language model according to any one of claims 1-6.
9. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the risk prediction method based on a deep bidirectional language model according to any one of claims 1-6.
CN202110148727.4A 2021-02-03 2021-02-03 Risk prediction method and device based on deep bidirectional language model and electronic equipment Active CN112507628B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110148727.4A CN112507628B (en) 2021-02-03 2021-02-03 Risk prediction method and device based on deep bidirectional language model and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110148727.4A CN112507628B (en) 2021-02-03 2021-02-03 Risk prediction method and device based on deep bidirectional language model and electronic equipment

Publications (2)

Publication Number Publication Date
CN112507628A CN112507628A (en) 2021-03-16
CN112507628B CN112507628B (en) 2021-07-02

Family

ID=74952466

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110148727.4A Active CN112507628B (en) 2021-02-03 2021-02-03 Risk prediction method and device based on deep bidirectional language model and electronic equipment

Country Status (1)

Country Link
CN (1) CN112507628B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112907360A (en) * 2021-03-25 2021-06-04 深圳前海微众银行股份有限公司 Risk assessment method, apparatus, storage medium, and program product
CN113435582B (en) * 2021-06-30 2023-05-30 平安科技(深圳)有限公司 Text processing method and related equipment based on sentence vector pre-training model
CN113449816A (en) * 2021-07-20 2021-09-28 恒安嘉新(北京)科技股份公司 Website classification model training method, website classification method, device, equipment and medium
CN113961669A (en) * 2021-10-26 2022-01-21 杭州中软安人网络通信股份有限公司 Training method of pre-training language model, storage medium and server
CN116415339A (en) * 2023-06-06 2023-07-11 四川名人居门窗有限公司 Door and window risk prediction method, system and equipment based on structural stress tracking

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021000362A1 (en) * 2019-07-04 2021-01-07 浙江大学 Deep neural network model-based address information feature extraction method
CN111191275A (en) * 2019-11-28 2020-05-22 深圳云安宝科技有限公司 Sensitive data identification method, system and device
CN112231478B (en) * 2020-10-22 2022-06-24 电子科技大学 Aspect-level emotion classification method based on BERT and multi-layer attention mechanism
CN111966730A (en) * 2020-10-23 2020-11-20 北京淇瑀信息科技有限公司 Risk prediction method and device based on permanent premises and electronic equipment
CN112307208A (en) * 2020-11-05 2021-02-02 Oppo广东移动通信有限公司 Long text classification method, terminal and computer storage medium

Also Published As

Publication number Publication date
CN112507628A (en) 2021-03-16

Similar Documents

Publication Publication Date Title
CN112507628B (en) Risk prediction method and device based on deep bidirectional language model and electronic equipment
CN108536679B (en) Named entity recognition method, device, equipment and computer readable storage medium
CN112270547A (en) Financial risk assessment method and device based on feature construction and electronic equipment
CN112348660A (en) Method and device for generating risk warning information and electronic equipment
US20220300546A1 (en) Event extraction method, device and storage medium
CN113722493B (en) Text classification data processing method, apparatus and storage medium
CN112270546A (en) Risk prediction method and device based on stacking algorithm and electronic equipment
CN113849162B (en) Code generation method combining model driving and deep neural network
CN111199474A (en) Risk prediction method and device based on network diagram data of two parties and electronic equipment
CN112348662B (en) Risk assessment method and device based on user occupation prediction and electronic equipment
CN111210336A (en) User risk model generation method and device and electronic equipment
CN111966730A (en) Risk prediction method and device based on permanent premises and electronic equipment
CN110874536A (en) Corpus quality evaluation model generation method and bilingual sentence pair inter-translation quality evaluation method
CN116383399A (en) Event public opinion risk prediction method and system
Yue et al. DARE: disentanglement-augmented rationale extraction
Wei et al. Sentiment classification of tourism reviews based on visual and textual multifeature fusion
US20220358594A1 (en) Counterfactual e-net learning for contextual enhanced earnings call analysis
CN112508690A (en) Risk assessment method and device based on joint distribution adaptation and electronic equipment
CN112488865A (en) Financial risk prediction method and device based on financial time nodes and electronic equipment
CN110782128A (en) User occupation label generation method and device and electronic equipment
CN113836308B (en) Network big data long text multi-label classification method, system, device and medium
TW202324202A (en) Extracting explanations from attention-based models
CN113761875B (en) Event extraction method and device, electronic equipment and storage medium
CN112528674B (en) Text processing method, training device, training equipment and training equipment for model and storage medium
CN113849634A (en) Method for improving interpretability of depth model recommendation scheme

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant