CN111966730A - Risk prediction method and device based on permanent premises and electronic equipment - Google Patents
- Publication number
- CN111966730A (application CN202011144202.5A)
- Authority
- CN
- China
- Prior art keywords
- user
- risk
- information
- risk prediction
- frequent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9537—Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
- G06Q40/04—Trading; Exchange, e.g. stocks, commodities, derivatives or currency exchange
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
Abstract
The invention provides a risk prediction method and device based on permanent premises, and electronic equipment. The method comprises: acquiring location text information of historical users and extracting the historical users' permanent-premises information; performing feature extraction on the user permanent-premises information using a Word2vec model to extract multi-dimensional word vectors; generating a sentence vector of a specific dimension using an Embedding method, thereby generating user permanent-premises feature data; establishing a training data set based on the generated user permanent-premises feature data; constructing a risk prediction model and training it with the training data set; calculating a risk assessment value of the current user using the risk prediction model; and predicting the risk of the current user according to the calculated risk assessment value. The method extracts risk-resistance features from users' permanent-premises information and improves the prediction accuracy of the model.
Description
Technical Field
The invention relates to the field of computer information processing, and in particular to a risk prediction method and device based on permanent premises, and electronic equipment.
Background
Risk control means that a risk manager takes measures to eliminate or reduce the likelihood that a risk event occurs, or to reduce the losses caused when a risk event does occur. Risk control is widely applied in the financial industry, for example to company transactions, merchant transactions, or personal transactions.
In the prior art, the main purpose of financial risk assessment is to distinguish good customers from bad customers and to assess the risk condition of users, so as to reduce credit risk and maximize profit. At present, only a qualitative judgment can be made on a client's occupation, coverage is low, and the client's occupation information is used only to a limited extent; as a result, timely early warning of risks is not possible and differentiated client operation cannot be carried out. In addition, there is still considerable room for improvement in screening high-risk users, extracting risk-resistance features, and model prediction accuracy.
Therefore, it is necessary to provide a risk prediction method with higher accuracy.
Disclosure of Invention
In order to improve model prediction accuracy and accurately evaluate the risk condition of a user, the invention provides a risk prediction method based on permanent premises, comprising: acquiring location text information of historical users and extracting the historical users' permanent-premises information; performing feature extraction on the user permanent-premises information using a Word2vec model to extract multi-dimensional word vectors; generating a sentence vector of a specific dimension using an Embedding method, thereby generating user permanent-premises feature data; establishing a training data set based on the generated user permanent-premises feature data; constructing a risk prediction model and training it with the training data set; calculating a risk assessment value of the current user using the risk prediction model; and predicting the risk of the current user according to the calculated risk assessment value.
Preferably, acquiring the location text information of the historical users and extracting their permanent-premises information comprises: collecting the geographical locations of the historical users, performing cluster analysis, forming GPS information through a mapping relation, and extracting the historical users' permanent-premises information from the GPS information.
Preferably, the user permanent-premises information includes home address information, company address information, and the city class of the permanent premises.
Preferably, establishing the training data set comprises: determining the risk-resistance grade of each user according to the extracted multi-dimensional word vectors and the user permanent-premises feature data, and marking each user with a risk label; and establishing the training data set based on the marked user data.
Preferably, the training data set comprises user permanent-premises feature data, user feature data, and risk-resistance performance data, the risk-resistance performance data comprising an overdue probability and/or a default probability.
Preferably, an unsupervised clustering algorithm is used to perform cluster analysis on the historical user permanent-premises information, the extracted multi-dimensional word vectors, and the user permanent-premises feature data; and risk correspondences among the permanent premises of different users are determined based on the cluster analysis result so as to mark the risk labels of the users, the risk correspondences including risk equivalence.
Preferably, the specific dimension is 200 to 1000.
Preferably, the method further comprises: constructing the risk prediction model using an XGBoost algorithm; and presetting a risk threshold used to distinguish risk users from non-risk users, wherein a user whose calculated risk assessment value is greater than the risk threshold is judged to be a risk user, and a user whose calculated risk assessment value is less than or equal to the risk threshold is judged to be a non-risk user.
In addition, the invention also provides a risk prediction device based on permanent premises, comprising: an acquisition module for acquiring location text information of historical users and extracting the historical users' permanent-premises information; an extraction module for performing feature extraction on the user permanent-premises information using a Word2vec model to extract multi-dimensional word vectors; a generating module for generating a sentence vector of a specific dimension using an Embedding method, thereby generating user permanent-premises feature data; an establishing module for establishing a training data set based on the generated user permanent-premises feature data; a construction module for constructing a risk prediction model, which is trained using the training data set; a calculation module for calculating a risk assessment value of the current user using the risk prediction model; and a prediction module for predicting the risk of the current user according to the calculated risk assessment value.
Preferably, the device further comprises an analysis module for collecting the geographical locations of the historical users, performing cluster analysis, forming GPS information through a mapping relation, and extracting the historical users' permanent-premises information from the GPS information.
Preferably, the user permanent-premises information includes home address information, company address information, and the city class of the permanent premises.
Preferably, the device further comprises a processing module, which determines the risk-resistance grade of each user according to the extracted multi-dimensional word vectors and the user permanent-premises feature data, marks each user with a risk label, and establishes the training data set based on the marked user data.
Preferably, the training data set comprises user permanent-premises feature data, user feature data, and risk-resistance performance data, the risk-resistance performance data comprising an overdue probability and/or a default probability.
Preferably, an unsupervised clustering algorithm is used to perform cluster analysis on the historical user permanent-premises information, the extracted multi-dimensional word vectors, and the user permanent-premises feature data; and risk correspondences among the permanent premises of different users are determined based on the cluster analysis result so as to mark the risk labels of the users, the risk correspondences including risk equivalence.
Preferably, the specific dimension is 200 to 1000.
Preferably, the following is further included: the risk prediction model is constructed using an XGBoost algorithm; and a risk threshold is preset to distinguish risk users from non-risk users, wherein a user whose calculated risk assessment value is greater than the risk threshold is judged to be a risk user, and a user whose calculated risk assessment value is less than or equal to the risk threshold is judged to be a non-risk user.
In addition, the present invention also provides an electronic device, wherein the electronic device includes: a processor; and a memory storing computer executable instructions that, when executed, cause the processor to perform the permanent premises based risk prediction method of the present invention.
Furthermore, the present invention also provides a computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs which, when executed by a processor, implement the permanent premises based risk prediction method of the present invention.
Advantageous effects
Compared with the prior art, the risk prediction method of the invention extracts risk-resistance features by obtaining users' permanent-premises information, generates user permanent-premises feature data using the Word2vec model, marks users with risk labels using that feature data, and predicts users' risk condition with a risk prediction model based on that feature data. It thereby extracts risk-resistance capability from users' permanent-premises information and improves the prediction accuracy of the model.
Drawings
To make the technical problems solved by the present invention, the technical means adopted, and the technical effects obtained clearer, embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted, however, that the drawings described below illustrate only exemplary embodiments of the invention, from which those skilled in the art can derive other embodiments without inventive effort.
Fig. 1 is a flowchart of an example of a permanent premises based risk prediction method according to embodiment 1 of the present invention.
Fig. 2 is a flowchart of another example of the permanent premises based risk prediction method of embodiment 1 of the present invention.
Fig. 3 is a flowchart of still another example of the permanent premises based risk prediction method of embodiment 1 of the present invention.
Fig. 4 is a schematic diagram of an example of a permanent premises based risk prediction apparatus of embodiment 2 of the present invention.
Fig. 5 is a schematic diagram of another example of a permanent premises based risk prediction apparatus of embodiment 2 of the present invention.
Fig. 6 is a schematic diagram of still another example of a permanent premises based risk prediction apparatus of embodiment 2 of the present invention.
Fig. 7 is a block diagram of an exemplary embodiment of an electronic device according to the present invention.
Fig. 8 is a block diagram of an exemplary embodiment of a computer-readable medium according to the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described more fully with reference to the accompanying drawings. The exemplary embodiments, however, may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these exemplary embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art. The same reference numerals denote the same or similar elements, components, or parts in the drawings, and thus their repetitive description will be omitted.
Features, structures, characteristics, or other details described in a particular embodiment may also be combined in a suitable manner in one or more other embodiments in accordance with the technical idea of the invention.
In describing particular embodiments, features, structures, characteristics, or other details are set out to give those skilled in the art a thorough understanding of the embodiments. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific features, structures, characteristics, or other details.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various elements, components, or sections, these terms should not be construed as limiting. These phrases are used to distinguish one from another. For example, a first device may also be referred to as a second device without departing from the spirit of the present invention.
The term "and/or" includes any and all combinations of one or more of the associated listed items.
In order to improve model prediction accuracy and accurately evaluate the risk condition of a user, the invention provides a risk prediction method based on permanent premises. The method extracts risk-resistance features from users' permanent-premises information and generates user permanent-premises feature data. Using this feature data, the risk condition of a user can be predicted more accurately, which improves the prediction accuracy of the model. The specific prediction process is described in detail below.
Embodiment 1
Hereinafter, an embodiment of the permanent premises based risk prediction method of the present invention will be described with reference to Figs. 1 to 3.
Fig. 1 is a flowchart of a permanent premises based risk prediction method of the present invention. As shown in Fig. 1, the risk prediction method includes the following steps.
Step S101: acquire the location text information of the historical users, and extract the historical users' permanent-premises information.
Step S102: perform feature extraction on the user permanent-premises information using a Word2vec model to extract multi-dimensional word vectors.
Step S103: generate a sentence vector of a specific dimension using an Embedding method, thereby generating the user permanent-premises feature data.
Step S104: establish a training data set based on the generated user permanent-premises feature data.
Step S105: construct a risk prediction model, and train the risk prediction model using the training data set.
Step S106: calculate the risk assessment value of the current user using the risk prediction model.
Step S107: predict the risk of the current user according to the calculated risk assessment value.
First, in step S101, the location text information of the historical users is acquired, and the historical users' permanent-premises information is extracted.
In this example, the geographical locations of the historical users are collected and cluster analysis is performed; GPS information is formed through a mapping relationship, and the historical users' permanent-premises information is extracted from the GPS information.
Preferably, cluster analysis is performed on the collected geographical location information using a K-means clustering model.
It should be noted that the above is only a preferred example, and the present invention is not limited thereto. In other examples, algorithms such as a Gaussian mixture clustering model or a density-based clustering model may also be used.
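The clustering in step S101 can be illustrated with a minimal sketch. It assumes that the collected geographical locations are (latitude, longitude) pairs, that plain Euclidean distance on degrees is adequate over a small area, and that the centroid of the largest cluster is taken as the user's permanent premises. The function names, the sample coordinates, and the "largest cluster" rule are all hypothetical and are not taken from the source.

```python
import math


def assign(points, centroids):
    """Assign every point to its nearest centroid (Euclidean distance)."""
    clusters = [[] for _ in centroids]
    for point in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(point, centroids[i]))
        clusters[nearest].append(point)
    return clusters


def kmeans(points, initial_centroids, max_iterations=20):
    """Lloyd's K-means with caller-supplied initial centroids."""
    centroids = list(initial_centroids)
    for _ in range(max_iterations):
        clusters = assign(points, centroids)
        # Move each centroid to the mean of its members; keep it if empty.
        updated = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroid
            for members, centroid in zip(clusters, centroids)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def frequent_location(points, initial_centroids):
    """Return the centroid of the most populated cluster (assumed rule)."""
    centroids, clusters = kmeans(points, initial_centroids)
    largest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
    return centroids[largest]


# Hypothetical GPS fixes: four near one place, two near another.
fixes = [(39.900, 116.400), (39.901, 116.401), (39.899, 116.399),
         (39.900, 116.402), (39.910, 116.460), (39.911, 116.461)]
print(frequent_location(fixes, [fixes[0], fixes[4]]))
```

Mapping the resulting coordinates back to an address (the "mapping relationship" in the text) would need a reverse-geocoding step that is outside this sketch.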
Specifically, the user permanent-premises information includes home address information, company address information, and the city class of the permanent premises.
Next, in step S102, feature extraction is performed on the user permanent-premises information using a Word2vec model to extract multi-dimensional word vectors.
In this example, the extracted historical user permanent-premises information is processed (data mining) to convert unstructured text data into structured word-vector data.
Specifically, using the Word2vec model and the historical user permanent-premises information, each word is mapped through neural-network training to a continuous (high-dimensional) vector, so that processing of the text content is reduced to vector operations in a K-dimensional vector space.
Specifically, risk-resistance features are extracted from the user permanent-premises information to obtain the multi-dimensional word vectors.
For example, the permanent-premises information of historical user A is "Beijing Guomao Business Center". Feature extraction is performed on this information by province, street, address, and the like; for example, one-hot mapping is used to map it into a 500-dimensional word vector [0, 1, 0, 2, ..., 1].
It should be noted that the Word2vec model is in fact a simplified neural network. The word vectors output by Word2vec can be used for, for example, clustering, finding synonyms, and part-of-speech analysis. In other examples, an additive combination algorithm may also be applied to the output word vectors. The foregoing is illustrative only and is not to be construed as limiting the invention.
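The one-hot style mapping mentioned in the example can be sketched as follows. This is not Word2vec itself: it is a simple token-count vector over a vocabulary built from the addresses, used here only to show how address text becomes a fixed-length numeric vector. The whitespace tokenization, the English address strings, and the function names are assumptions for illustration; real Chinese address text would first need a word segmenter.

```python
def build_vocabulary(addresses):
    """Give every distinct token an index, in order of first appearance."""
    vocabulary = {}
    for address in addresses:
        for token in address.lower().split():
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


def count_vector(address, vocabulary):
    """Map an address to a vector of token counts (unknown tokens ignored)."""
    vector = [0] * len(vocabulary)
    for token in address.lower().split():
        index = vocabulary.get(token)
        if index is not None:
            vector[index] += 1
    return vector


# Hypothetical address strings.
corpus = [
    "Beijing Chaoyang Guomao Business Center",
    "Shanghai Pudong Lujiazui Finance Center",
]
vocabulary = build_vocabulary(corpus)
print(count_vector("Beijing Guomao Business Center", vocabulary))
# prints [1, 0, 1, 1, 1, 0, 0, 0, 0]
```

With a vocabulary of 500 tokens the same function would produce the 500-dimensional vector described in the example.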
Next, in step S103, a sentence vector of a specific dimension is generated using the Embedding method to generate the user permanent-premises feature data.
In this example, the extracted multi-dimensional word vectors are pre-trained using the Embedding method and then combined by weighted averaging to obtain the Embedding vector (i.e., an embedded vector or a weight matrix) of the whole sentence. This yields a sentence vector of a specific dimension, which serves as the user permanent-premises feature data.
Specifically, the specific dimension is 200 to 1000.
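The weighted-average step can be written out as a short sketch. It assumes that each word of the address already has a vector of the same length and a non-negative weight; the source does not say how the weights are chosen, so the weights here are hypothetical. Two dimensions are used instead of 200 to 1000 only to keep the example readable.

```python
def sentence_vector(word_vectors, weights):
    """Weighted average of equally sized word vectors."""
    if not word_vectors or len(word_vectors) != len(weights):
        raise ValueError("need one weight per word vector")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    dimension = len(word_vectors[0])
    return [
        sum(weight * vector[d] for vector, weight in zip(word_vectors, weights)) / total
        for d in range(dimension)
    ]


# Two hypothetical word vectors; the first word is weighted three times as much.
print(sentence_vector([[1.0, 0.0], [0.0, 1.0]], [3, 1]))
# prints [0.75, 0.25]
```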
As shown in Fig. 2, the method further includes a step S201 of performing cluster analysis.
In step S201, an unsupervised clustering algorithm is used to cluster the historical user permanent-premises information, the extracted multi-dimensional word vectors, and the user permanent-premises feature data, thereby classifying the different user permanent premises.
It should be noted that the unsupervised clustering algorithm includes algorithms such as a K-means clustering model, a Gaussian mixture clustering model, or a density-based clustering model.
Further, based on the cluster analysis result, risk correspondences among the permanent premises of different users are determined, the risk correspondences including risk equivalence. The risk-resistance capability of a user can thus be predicted from the user's permanent premises.
In another example, the risk correspondence between two or more user permanent premises is determined by calculating vector similarity.
For example, the permanent premises of user A is the Shanghai Lujiazui Finance Center, and the permanent premises of user B is the Beijing Guomao Business Center. According to the city class and the cluster analysis result, the risk correspondence between the Shanghai Lujiazui Finance Center and the Beijing Guomao Business Center is risk equivalence. As another example, the permanent premises of user C is the Yunnan Kunming Global Finance Center. The risk correspondence between the Yunnan Kunming Global Finance Center and the Beijing Guomao Business Center is not equivalence, and the risk resistance of user C is greater than that of user A.
It should be noted that the above description is only given by way of example, and the present invention is not limited thereto.
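The vector-similarity variant can be illustrated with cosine similarity. The source does not name a similarity measure or a cut-off, so both the choice of cosine similarity and the `min_similarity` value of 0.9 are assumptions, as are the sample vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally sized vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def risk_equivalent(premises_a, premises_b, min_similarity=0.9):
    """Treat two premises as risk-equivalent when their vectors are close.

    The 0.9 cut-off is a hypothetical value, not one given in the source.
    """
    return cosine_similarity(premises_a, premises_b) >= min_similarity


# Hypothetical permanent-premises feature vectors.
print(risk_equivalent([1.0, 0.9], [0.9, 1.0]))  # similar directions
print(risk_equivalent([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```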
Next, in step S104, a training data set is established based on the generated user permanent-premises feature data.
As shown in Fig. 3, the method further includes a step S301 of marking users with risk labels.
In this example, the risk-resistance grade of each user is determined according to the extracted multi-dimensional word vectors and the user permanent-premises feature data, and each user is marked accordingly, that is, each user is given a risk label.
Further, a training data set is established based on the marked user data.
Specifically, the training data set is established from the risk-labeled user data and the user permanent-premises feature data. The training data set comprises user permanent-premises feature data, user feature data, and risk-resistance performance data, the risk-resistance performance data comprising an overdue probability and/or a default probability.
More specifically, the input features are the user's permanent-premises feature data and user feature data, and the output feature is the risk-resistance performance data.
It should be noted that, in other examples, the input features may include only the user permanent-premises feature data, or may also include social text data, occupation-category performance data, and the like. The foregoing is a preferred example only and is not to be construed as limiting the invention.
More specifically, good and bad samples are defined for the training data set with labels 0 and 1, where 1 denotes a sample whose default probability (and/or overdue probability) is greater than or equal to a specific threshold, and 0 denotes a sample whose default probability (and/or overdue probability) is less than the specific threshold. Typically, the calculated risk assessment value (in this example, the overdue probability) is a number between 0 and 1 that represents the user's risk condition. The closer the risk assessment value is to 1, the weaker the user's risk resistance (i.e., the riskier the fund recovery); the closer it is to 0, the stronger the user's risk resistance (i.e., the better the fund recovery).
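The labeling rule above can be expressed in a few lines. The threshold value of 0.5 and the sample probabilities are hypothetical; the source only states that a specific threshold separates the two labels, with the boundary value itself labeled 1.

```python
def label_samples(overdue_probabilities, threshold=0.5):
    """Label 1 when the probability is >= threshold, otherwise 0."""
    return [1 if p >= threshold else 0 for p in overdue_probabilities]


# Hypothetical overdue probabilities for three historical users.
print(label_samples([0.1, 0.5, 0.9]))
# prints [0, 1, 1]
```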
Next, in step S105, a risk prediction model is constructed, which is trained using the training data set.
Specifically, the risk prediction model is constructed, for example, using the XGBoost algorithm. However, without being limited thereto, in other examples a TextCNN algorithm, a random forest algorithm, a logistic regression algorithm, or the like, or a combination of two or more of these algorithms, may be used. The specific algorithm may be chosen based on the sample data and/or business requirements.
Further, the risk prediction model is trained using the training dataset.
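As a dependency-free illustration of the training step, the sketch below fits a logistic regression, one of the alternative algorithms named above, rather than XGBoost. The function names, learning rate, epoch count and toy data are assumptions for illustration; a real deployment would more likely rely on an established library.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(w: Sequence[float], x: Sequence[float]) -> float:
    """Risk assessment value in (0, 1); the last weight is the bias."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_logistic(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,      # hypothetical learning rate
    epochs: int = 2000,   # hypothetical number of passes
) -> List[float]:
    """Fit weights (bias last) by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            err = predict_risk(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


# Toy, linearly separable data: label 1 marks a bad sample.
X = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]
y = [0, 0, 1, 1]
weights = train_logistic(X, y)
high = predict_risk(weights, [0.95, 0.9])
low = predict_risk(weights, [0.05, 0.05])
```

The output of `predict_risk` lies between 0 and 1, so it can play the role of the risk assessment value described in the text.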
Next, in step S106, a risk assessment value of the current user is calculated using the risk prediction model.
Specifically, the user permanent premises information of the current user is obtained and converted into user permanent premises feature data using the Word2vec model. In this example, user feature data of the current user is also obtained.
Further, the generated user permanent premises feature data and the user feature data are input into the risk prediction model to calculate the risk assessment value of the current user.
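One way to picture how the model input is assembled is sketched below. The lookup table `WORD_VECTORS` stands in for a trained Word2vec model, and mean pooling is only one assumed way of turning word vectors into a sentence vector; the tokens, the 3-dimensional vectors and the function names are hypothetical (the specification prefers 200 to 1000 dimensions).

```python
from typing import Dict, List, Sequence

# Hypothetical word vectors; a trained Word2vec model would supply these.
WORD_VECTORS: Dict[str, List[float]] = {
    "beijing": [0.9, 0.1, 0.0],
    "haidian": [0.7, 0.3, 0.2],
    "tier1": [1.0, 0.0, 0.5],
}


def sentence_vector(tokens: Sequence[str], dim: int = 3) -> List[float]:
    """Mean-pool the vectors of known tokens; unknown tokens are skipped."""
    known = [WORD_VECTORS[t] for t in tokens if t in WORD_VECTORS]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def model_input(premises_tokens: Sequence[str],
                user_features: Sequence[float]) -> List[float]:
    """Concatenate the premises sentence vector with the user feature data."""
    return sentence_vector(premises_tokens) + list(user_features)


features = model_input(["beijing", "haidian", "unknown"], [35.0, 1.0])
```

The resulting list has a fixed length (vector dimension plus the number of user features), which is what a tabular model such as a gradient-boosted tree expects.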
Next, in step S107, risk prediction is performed for the current user based on the calculated risk assessment value.
In this example, a risk threshold is preset and used to distinguish risk users from non-risk users: a user whose calculated risk assessment value is greater than the risk threshold is judged to be a risk user, and a user whose calculated risk assessment value is less than or equal to the risk threshold is judged to be a non-risk user.
Preferably, the method further comprises setting a plurality of risk level thresholds, which are used to judge the user's risk condition and subdivide it into a plurality of sections.
Specifically, the calculated risk assessment value is compared with each risk level threshold to determine the risk section to which the user belongs, so that the user's risk condition is judged more accurately.
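The threshold comparison can be sketched in a few lines of Python. The function names and the threshold values are assumptions for illustration; the only behavior taken from the text is that a value equal to a threshold falls on the lower-risk side.

```python
from bisect import bisect_left
from typing import Sequence


def is_risk_user(risk_value: float, risk_threshold: float) -> bool:
    """Strictly greater than the threshold means risk user."""
    return risk_value > risk_threshold


def risk_section(risk_value: float, level_thresholds: Sequence[float]) -> int:
    """Index of the risk section (0 = lowest risk).

    level_thresholds must be sorted in ascending order. bisect_left counts
    the thresholds strictly below the value, so a value equal to a
    threshold stays in the lower section.
    """
    return bisect_left(level_thresholds, risk_value)


thresholds = [0.3, 0.6]  # hypothetical values giving three sections
sections = [risk_section(v, thresholds) for v in (0.1, 0.3, 0.31, 0.9)]
```

With a single threshold, `risk_section` reduces to the binary case: section 0 corresponds to a non-risk user and section 1 to a risk user.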
It should be noted that the above description is only a preferred example, and is not to be construed as limiting the present invention.
Those skilled in the art will appreciate that all or part of the steps of the above-described embodiments may be implemented as programs (computer programs) executed by a computer data processing apparatus. When the computer program is executed, the method provided by the invention is carried out. Furthermore, the computer program may be stored in a computer readable storage medium, which may be a magnetic disk, an optical disk, a ROM, a RAM, or a storage array composed of a plurality of storage media, such as a magnetic disk array or a magnetic tape storage array. The storage medium is not limited to centralized storage and may also be distributed storage, such as cloud storage based on cloud computing.
Compared with the prior art, the risk prediction method obtains the user's permanent premises information to extract risk-resistance features, generates user permanent premises feature data using the Word2vec model, marks users with risk labels using that feature data, and predicts the user's risk condition with a risk prediction model based on that feature data. Risk-resistance capability is thereby extracted from the user permanent premises information, and the prediction accuracy of the model is improved.
Example 2
Embodiments of the apparatus of the present invention are described below, which may be used to perform method embodiments of the present invention. The details described in the device embodiments of the invention should be regarded as complementary to the above-described method embodiments; reference is made to the above-described method embodiments for details not disclosed in the apparatus embodiments of the invention.
Referring to figs. 4, 5 and 6, the present invention also provides a permanent premises based risk prediction apparatus 400, including: an acquisition module 401, configured to acquire location text information of historical users and extract the permanent premises information of the historical users; an extraction module 402, configured to perform feature extraction on the user permanent premises information using a Word2vec model to extract multi-dimensional word vectors; a generating module 403, configured to generate a sentence vector of a specific dimension using an Embedding method, so as to generate user permanent premises feature data; an establishing module 404, configured to establish a training data set based on the generated user permanent premises feature data; a construction module 405, configured to construct a risk prediction model and train it using the training data set; a calculating module 406, configured to calculate a risk assessment value of the current user using the risk prediction model; and a prediction module 407, configured to perform risk prediction on the current user according to the calculated risk assessment value.
As shown in fig. 5, the apparatus further includes an analysis module 501, configured to collect the geographic locations of the historical users, perform cluster analysis, form GPS information through a mapping relationship, and extract the permanent premises information of the historical users from the GPS information.
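The role of the analysis module can be pictured with the sketch below. The specification does not name the cluster analysis it uses, so simple grid binning of coordinates is swapped in here as a stand-in; the function name, the cell size and the sample coordinates are all assumptions.

```python
import math
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


def frequent_location(points: Iterable[Point], cell_deg: float = 0.01) -> Point:
    """Return the centroid of the most visited grid cell.

    Grid binning is a simplified substitute for the cluster analysis
    mentioned in the text; cell_deg is a hypothetical cell size.
    """
    cells: Dict[Tuple[int, int], List[Point]] = {}
    for lat, lon in points:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells.setdefault(key, []).append((lat, lon))
    if not cells:
        raise ValueError("at least one location is required")
    busiest = max(cells.values(), key=len)
    lat = sum(p[0] for p in busiest) / len(busiest)
    lon = sum(p[1] for p in busiest) / len(busiest)
    return lat, lon


# Three nearby fixes and one outlier, invented for the example.
fixes = [
    (39.9841, 116.3075),
    (39.9843, 116.3078),
    (39.9846, 116.3071),
    (31.2304, 121.4737),
]
home_lat, home_lon = frequent_location(fixes)
```

The returned coordinate could then be mapped to an address or city class by a separate lookup, which this sketch does not attempt.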
In this example, the user permanent premises information includes home address information, company address information, and the city class of the permanent premises.
As shown in fig. 6, the apparatus further includes a processing module 601, which determines the risk resistance level of each user according to the extracted multi-dimensional word vectors and the user permanent premises feature data, and marks each user with a risk label; a training data set is established based on the labeled user data.
Preferably, the training data set comprises user permanent premises feature data, user feature data and anti-risk performance data, the anti-risk performance data comprising an overdue probability and/or a default probability.
Preferably, an unsupervised clustering algorithm is used to perform cluster analysis on the historical users' permanent premises information, the extracted multi-dimensional word vectors and the user permanent premises feature data; based on the cluster analysis result, risk correspondences among the permanent premises of different users are determined so as to mark the users with risk labels, where the correspondences include risk equivalence.
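A minimal sketch of cluster-based risk marking follows. The text does not name the unsupervised algorithm, so plain k-means is assumed here, and the rule that a whole cluster takes label 1 when at least half of its members were overdue is likewise an assumption used to illustrate risk equivalence; all names and data are hypothetical.

```python
from typing import Dict, List, Sequence


def _dist2(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Sequence[float]], k: int, iters: int = 20) -> List[int]:
    """Plain k-means; the first k points serve as initial centers."""
    centers = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: _dist2(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def cluster_risk_labels(assign: Sequence[int], overdue_flags: Sequence[int],
                        threshold: float = 0.5) -> List[int]:
    """Give every member of a cluster the same label (risk equivalence)."""
    labels: Dict[int, int] = {}
    for c in set(assign):
        flags = [f for a, f in zip(assign, overdue_flags) if a == c]
        labels[c] = 1 if sum(flags) / len(flags) >= threshold else 0
    return [labels[a] for a in assign]


# Toy premises vectors: two near the origin, two near (5, 5).
vectors = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
overdue = [0, 1, 0, 0]
assignment = kmeans(vectors, k=2)
risk_labels = cluster_risk_labels(assignment, overdue)
```

In this toy run the fourth user has no overdue record but shares a cluster with the second user, so both receive the same risk label.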
Preferably, the specific dimension is 200 to 1000 dimensions.
Preferably, the method further comprises: constructing the risk prediction model using the XGBoost algorithm; and presetting a risk threshold used to distinguish risk users from non-risk users, wherein a user whose calculated risk assessment value is greater than the risk threshold is judged to be a risk user, and a user whose calculated risk assessment value is less than or equal to the risk threshold is judged to be a non-risk user.
Portions of embodiment 2 that are the same as in embodiment 1 are not described again.
Those skilled in the art will appreciate that the modules in the above-described embodiments of the apparatus may be distributed as described in the apparatus, and may be correspondingly modified and distributed in one or more apparatuses other than the above-described embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Compared with the prior art, the risk prediction device obtains the user's permanent premises information to extract risk-resistance features, generates user permanent premises feature data using the Word2vec model, marks users with risk labels using that feature data, and predicts the user's risk condition with a risk prediction model based on that feature data. Risk-resistance capability is thereby extracted from the user permanent premises information, and the prediction accuracy of the model is improved.
Example 3
In the following, embodiments of the electronic device of the present invention are described, which may be regarded as specific physical implementations for the above-described embodiments of the method and apparatus of the present invention. Details described in the embodiments of the electronic device of the invention should be considered supplementary to the embodiments of the method or apparatus described above; for details which are not disclosed in embodiments of the electronic device of the invention, reference may be made to the above-described embodiments of the method or the apparatus.
Fig. 7 is a block diagram of an exemplary embodiment of an electronic device according to the present invention. An electronic apparatus 200 according to this embodiment of the present invention is described below with reference to fig. 7. The electronic device 200 shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 7, the electronic device 200 is embodied in the form of a general purpose computing device. The components of the electronic device 200 may include, but are not limited to: at least one processing unit 210, at least one storage unit 220, a bus 230 connecting different system components (including the storage unit 220 and the processing unit 210), a display unit 240, and the like.
The storage unit 220 stores program code that can be executed by the processing unit 210, so that the processing unit 210 performs the steps of the various exemplary embodiments of the present invention described in the method section of this specification. For example, the processing unit 210 may perform the steps as shown in fig. 1.
The storage unit 220 may include readable media in the form of volatile storage units, such as a random access memory unit (RAM) 2201 and/or a cache memory unit 2202, and may further include a read only memory unit (ROM) 2203.
The storage unit 220 may also include a program/utility 2204 having a set (at least one) of program modules 2205, such program modules 2205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 200 may also communicate with one or more external devices 300 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 200, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 200 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 250. Also, the electronic device 200 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 260. The network adapter 260 may communicate with other modules of the electronic device 200 via the bus 230. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 200, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments of the present invention described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present invention can be embodied in the form of a software product, which can be stored in a computer-readable storage medium (which can be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and which includes several instructions that cause a computing device (which can be a personal computer, a server, a network device, etc.) to execute the above-mentioned method according to the present invention. When the computer program is executed by a data processing apparatus, the above-described method of the invention is carried out.
As shown in fig. 8, the computer program may be stored on one or more computer readable media. The computer readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
In summary, the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functionality of some or all of the components in embodiments in accordance with the invention may be implemented in practice using a general purpose data processing device such as a microprocessor or a Digital Signal Processor (DSP). The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
While the foregoing embodiments have described the objects, aspects and advantages of the present invention in further detail, it should be understood that the present invention is not inherently related to any particular computer, virtual machine or electronic device, and various general-purpose machines may be used to implement the present invention. The invention is not to be considered as limited to the specific embodiments described; all modifications, changes and equivalents that come within the spirit and scope of the invention are intended to be covered.
Claims (10)
1. A permanent premises based risk prediction method, comprising:
acquiring location text information of historical users, and extracting permanent premises information of the historical users;
performing feature extraction on the user permanent premises information using a Word2vec model to extract multi-dimensional word vectors;
generating a sentence vector of a specific dimension using an Embedding method, so as to generate user permanent premises feature data;
establishing a training data set based on the generated user permanent premises feature data;
constructing a risk prediction model, and training the risk prediction model using the training data set;
calculating a risk assessment value of a current user using the risk prediction model; and
performing risk prediction on the current user according to the calculated risk assessment value.
2. The risk prediction method according to claim 1, wherein the acquiring location text information of historical users and extracting permanent premises information of the historical users comprises:
collecting geographic locations of the historical users, performing cluster analysis, forming GPS information through a mapping relationship, and extracting the permanent premises information of the historical users from the GPS information.
3. The risk prediction method according to claim 1 or 2, wherein the user permanent premises information includes home address information, company address information, and a city class of the permanent premises.
4. The risk prediction method of claim 3, wherein the establishing a training data set comprises:
determining a risk resistance level of each user according to the extracted multi-dimensional word vectors and the user permanent premises feature data, and marking each user with a risk label; and
establishing the training data set based on the labeled user data.
5. The risk prediction method of claim 4, wherein the training data set includes user permanent premises feature data, user feature data, and anti-risk performance data, the anti-risk performance data including an overdue probability and/or a default probability.
6. The risk prediction method according to claim 1 or 2, further comprising:
performing cluster analysis on the historical users' permanent premises information, the extracted multi-dimensional word vectors and the user permanent premises feature data using an unsupervised clustering algorithm; and
determining, based on the cluster analysis result, risk correspondences among the permanent premises of different users so as to mark the users with risk labels, wherein the correspondences include risk equivalence.
7. The risk prediction method of claim 1, wherein the specific dimension is 200-1000 dimensions.
8. A permanent premises based risk prediction device, comprising:
an acquisition module, configured to acquire location text information of historical users and extract permanent premises information of the historical users;
an extraction module, configured to perform feature extraction on the user permanent premises information using a Word2vec model to extract multi-dimensional word vectors;
a generating module, configured to generate a sentence vector of a specific dimension using an Embedding method, so as to generate user permanent premises feature data;
an establishing module, configured to establish a training data set based on the generated user permanent premises feature data;
a construction module, configured to construct a risk prediction model and train it using the training data set;
a calculation module, configured to calculate a risk assessment value of a current user using the risk prediction model; and
a prediction module, configured to perform risk prediction on the current user according to the calculated risk assessment value.
9. An electronic device, comprising:
a processor; and
a memory storing computer-executable instructions that, when executed, cause the processor to perform the permanent premises based risk prediction method of any of claims 1-7.
10. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the permanent premises based risk prediction method of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011144202.5A CN111966730A (en) | 2020-10-23 | 2020-10-23 | Risk prediction method and device based on permanent premises and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111966730A (en) | 2020-11-20 |
Family
ID=73387206
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011144202.5A Pending CN111966730A (en) | 2020-10-23 | 2020-10-23 | Risk prediction method and device based on permanent premises and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111966730A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109492103A (en) * | 2018-11-09 | 2019-03-19 | 北京三快在线科技有限公司 | Label information acquisition methods, device, electronic equipment and computer-readable medium |
CN110335115A (en) * | 2019-07-01 | 2019-10-15 | 阿里巴巴集团控股有限公司 | A kind of service order processing method and processing device |
CN111178687A (en) * | 2019-12-11 | 2020-05-19 | 北京淇瑀信息科技有限公司 | Financial risk classification method and device and electronic equipment |
CN111191677A (en) * | 2019-12-11 | 2020-05-22 | 北京淇瑀信息科技有限公司 | User characteristic data generation method and device and electronic equipment |
CN111310462A (en) * | 2020-02-07 | 2020-06-19 | 北京三快在线科技有限公司 | User attribute determination method, device, equipment and storage medium |
CN111711618A (en) * | 2020-06-02 | 2020-09-25 | 支付宝(杭州)信息技术有限公司 | Risk address identification method, device, equipment and storage medium |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112507628A (en) * | 2021-02-03 | 2021-03-16 | 北京淇瑀信息科技有限公司 | Risk prediction method and device based on deep bidirectional language model and electronic equipment |
CN112907360A (en) * | 2021-03-25 | 2021-06-04 | 深圳前海微众银行股份有限公司 | Risk assessment method, apparatus, storage medium, and program product |
CN112907360B (en) * | 2021-03-25 | 2024-06-07 | 深圳前海微众银行股份有限公司 | Risk assessment method, apparatus, storage medium, and program product |
CN113570113A (en) * | 2021-07-02 | 2021-10-29 | 上海淇玥信息技术有限公司 | Equipment loss prediction method and device and electronic equipment |
CN113570113B (en) * | 2021-07-02 | 2024-05-21 | 上海淇玥信息技术有限公司 | Equipment disconnection prediction method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||