WO2016206557A1 - Risk identification method and apparatus - Google Patents

Risk identification method and apparatus

Info

Publication number
WO2016206557A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
user
geographic location
feature value
location
Prior art date
Application number
PCT/CN2016/085935
Other languages
English (en)
French (fr)
Inventor
彭际群
何慧梅
王峰伟
吴东杏
何帝君
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2016206557A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/034 Test or assess a computer or a system

Definitions

  • The present application relates to the field of computer networks and information technologies, and in particular to a risk identification method and apparatus.
  • IP: Internet Protocol
  • WiFi: Wireless Fidelity
  • Determining the stability of a user's geographic location is important for business risk identification.
  • Users with lower geographic location stability are more likely to perform risky operations than users with higher geographic location stability. Therefore, when the system performs risk identification automatically, the stability of the user's geographic location should be taken as an important indicator.
  • The stability of the user's geographic location is generally determined by checking whether the user's historical geographic location information is consistent with the current geographic location information.
  • In actual implementation the accuracy of this method is low, which in turn lowers the accuracy and feasibility of the server's risk identification based on geographic location stability.
  • The embodiments of the present application provide a risk identification method and apparatus to solve the problem that the server's risk identification based on geographic location stability has low accuracy and low feasibility.
  • the embodiment of the present application provides a risk identification method, including:
  • determining, according to the location stability contribution coefficient of the user to be identified under each geographic location feature, a location stability index of the user to be identified, where the location stability index is used to measure the stability of the resident location of the user to be identified;
  • the risk identification of the user to be identified is performed based on the determined location stability index of the user to be identified.
  • the location stability index of the to-be-identified user is determined according to the location stability contribution coefficient of the user to be identified in each geographic location feature, and specifically includes:
  • The machine classification model is a classification model obtained by training in advance, and is used to predict a user's location stability index from the user's location stability contribution coefficients under different geographic location features.
  • the server trains the machine classification model according to the following steps:
  • the server acquires the feature values of each of a plurality of sample users under a plurality of preset geographic location features; the plurality of sample users includes a plurality of security-type sample users and a plurality of risk-type sample users;
  • the location stability contribution coefficient under each geographic location feature is an input value of the machine classification model
  • the location stability index corresponding to the sample user type of the sample user is an output value of the machine classification model.
  • the server determines a location stability contribution coefficient corresponding to any feature value interval of the geographic location feature according to the following steps:
  • the server determines the location stability contribution coefficient WOE corresponding to any of the feature value intervals according to the following formula: $\mathrm{WOE} = \ln(P_1 / P_0)$, that is, the natural logarithm of the ratio between the two ratios,
  • where either P1 represents the first ratio and P0 represents the second ratio,
  • or P1 represents the third ratio and P0 represents the fourth ratio.
  • the server determines each feature value interval of any one of the geographic location features according to the following steps:
  • Each feature value under the geographical feature is taken as a feature value interval
  • For a pair of adjacent feature value intervals, the chi-square value represents the difference between the proportions of the different types of sample users whose feature values fall in one interval and the proportions of the different types of sample users whose feature values fall in the other interval.
  • Before the server trains the machine classification model, the method further includes:
  • screening the geographic location features used to train the machine classification model, including:
  • selecting, from the geographic location features, one geographic location feature to serve as a geographic location feature for training the machine classification model.
  • Selecting one geographic location feature from the geographic location features includes:
  • determining a contribution value IV for each geographic location feature, where, for any one of the geographic location features, P1k represents the ratio of the number of security-type sample users having a feature value in the k-th feature value interval to the total number of security-type sample users among the acquired plurality of sample users, P0k represents the ratio of the number of risk-type sample users having a feature value in the k-th feature value interval to the total number of risk-type sample users among the acquired plurality of sample users, WOE(k) represents the location stability contribution coefficient corresponding to the k-th feature value interval, and q is the number of feature value intervals of the geographic location feature;
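
The IV formula itself did not survive extraction. Assuming it takes the standard information-value form, IV = Σ_{k=1..q} (P1k − P0k) × WOE(k) with WOE(k) = ln(P1k / P0k), which is consistent with the symbols defined above, a minimal sketch could look like this. The function name and the sample counts are hypothetical.

```python
import math

def information_value(secure_counts, risk_counts):
    """Assumed form: IV = sum_k (P1k - P0k) * WOE(k), WOE(k) = ln(P1k / P0k).

    secure_counts[k] / risk_counts[k] are the numbers of security-type and
    risk-type sample users in the k-th feature value interval. Every count
    is assumed to be non-zero so that the logarithm is defined.
    """
    total_secure = sum(secure_counts)
    total_risk = sum(risk_counts)
    iv = 0.0
    for secure, risk in zip(secure_counts, risk_counts):
        p1k = secure / total_secure  # share of all security-type users
        p0k = risk / total_risk      # share of all risk-type users
        iv += (p1k - p0k) * math.log(p1k / p0k)
    return iv
```

Under this assumed form, a feature whose intervals separate the two user types more sharply receives a larger IV, and a feature with identical distributions for both types receives 0.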
  • the embodiment of the present application provides a risk identification apparatus, including:
  • a first determining module, configured to determine, according to the feature value of the user to be identified under at least one geographic location feature and the location stability contribution coefficient corresponding to each feature value interval of each geographic location feature, the location stability contribution coefficient of the user to be identified under each geographic location feature;
  • a second determining module, configured to determine a location stability index of the user to be identified according to the location stability contribution coefficient of the user to be identified under each geographic location feature, where the location stability index is used to measure the stability of the resident location of the user to be identified;
  • the identification module is configured to perform risk identification on the to-be-identified user based on the location stability index of the to-be-identified user determined by the second determining module.
  • The embodiments of the present application can fuse the contributions of the various geographic location features to the stability of the user's location, based on the location stability contribution coefficient under each feature, and divide the feature values of each geographic location feature into feature value intervals.
  • Each feature value interval corresponds to one location stability contribution coefficient, which reduces computational complexity (a separate coefficient is not needed for every feature value) while preserving the accuracy of location stability identification. The embodiments can therefore improve the accuracy with which the stability of a user's geographic location is identified, and are highly feasible.
  • FIG. 1 is a flowchart of a risk identification method according to Embodiment 1 of the present application.
  • FIG. 2 is a flowchart of a risk identification method according to Embodiment 2 of the present application.
  • FIG. 3 is a flowchart of a risk identification method according to Embodiment 3 of the present application.
  • FIG. 4 is a flowchart of a risk identification method according to Embodiment 4 of the present application.
  • FIG. 5 is a flowchart of a risk identification method according to Embodiment 5 of the present application.
  • FIG. 6(a) is a schematic diagram of the location stability index distributions of risk users and secure users.
  • FIG. 6(b) shows the location stability index distribution curves of users of different credit levels.
  • FIG. 7 is a schematic structural diagram of a risk identification apparatus according to an embodiment of the present application.
  • In the embodiments of the present application, the server determines, according to the feature value of the user to be identified under at least one geographic location feature and the location stability contribution coefficient corresponding to each feature value interval of each geographic location feature, the location stability contribution coefficient of the user to be identified under each geographic location feature; determines a location stability index of the user to be identified according to those coefficients; and performs risk identification on the user to be identified based on the location stability index.
  • The embodiments of the present application can fuse the contributions of the various geographic location features to the stability of the user's location, based on the location stability contribution coefficient under each feature, and divide the feature values of each geographic location feature into feature value intervals.
  • Each feature value interval corresponds to one location stability contribution coefficient, which reduces computational complexity (a separate coefficient is not needed for every feature value) while preserving the accuracy of location stability identification. The embodiments can therefore improve the accuracy with which the stability of a user's geographic location is identified, and are highly feasible.
  • As shown in FIG. 1, the risk identification method provided by Embodiment 1 of the present application includes:
  • S101 The server determines, according to the feature value of the user to be identified under at least one geographic location feature and the location stability contribution coefficient corresponding to each feature value interval of each geographic location feature, the location stability contribution coefficient of the user to be identified under each geographic location feature.
  • The server may collect the feature values of the user to be identified under each of a plurality of preset geographic location features (or under a plurality of geographic location features selected from the preset features; see the description of Embodiment 4).
  • Each of the geographic location features reflects the stability of the user's resident location.
  • The geographic location features in the embodiments of the present application may be statistics reflecting the user's location, such as the monthly average number of different resident cities, the proportion of cities stayed in for more than 12 months, the number of cities appeared in during the last 2 years, and the probability of residing in the current resident city during the last 2 years.
  • The server determines the location stability contribution coefficient of the user to be identified under each geographic location feature based on the coefficients corresponding to the feature value intervals of that feature. For example, the number of cities appeared in during the last 2 years is divided into four intervals: 0-3 cities, 4-7 cities, 8-12 cities, and more than 12 cities. The feature value intervals can be divided manually, or automatically by the server based on certain principles; see the description of Embodiment 3 below.
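
The interval lookup in this step can be sketched as follows, using the four city-count intervals from the example. The coefficient values and all names are hypothetical and serve only to illustrate the mapping from a feature value to the coefficient of its interval.

```python
import bisect

# Inclusive upper bounds of the first three intervals (0-3, 4-7, 8-12 cities);
# any larger value falls into the last interval (more than 12 cities).
UPPER_BOUNDS = [3, 7, 12]

# Hypothetical location stability contribution coefficients, one per interval.
COEFFICIENTS = [0.8, 0.2, -0.3, -1.1]

def contribution_coefficient(city_count):
    """Return the coefficient of the interval that contains city_count."""
    interval_index = bisect.bisect_left(UPPER_BOUNDS, city_count)
    return COEFFICIENTS[interval_index]
```

Because one coefficient is stored per interval rather than per feature value, the lookup table stays small regardless of how many distinct feature values occur.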
  • S102 Determine a location stability index of the user to be identified according to the location stability contribution coefficient of the user to be identified under each geographic location feature, where the location stability index is used to measure the stability of the resident location of the user to be identified.
  • The location stability index of the user to be identified may be determined according to the location stability contribution coefficient of the user under each geographic location feature and the weight of each geographic location feature. For example, the coefficient under each geographic location feature is multiplied by the corresponding weight, the products are added, and the resulting sum is taken as the location stability index of the user to be identified.
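
A minimal sketch of this weighted-sum variant follows. The function name, the coefficients, and the weights are hypothetical; the source does not say how the weights are chosen.

```python
def weighted_stability_index(coefficients, weights):
    """Sum of each feature's contribution coefficient times its weight."""
    if len(coefficients) != len(weights):
        raise ValueError("one weight is required per geographic location feature")
    return sum(c * w for c, w in zip(coefficients, weights))
```

For example, `weighted_stability_index([1.0, -0.5], [0.5, 0.5])` combines two features with equal weight.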
  • Alternatively, the location stability contribution coefficients of the user to be identified under the geographic location features may be input into a machine classification model, and the output value of the model is used as the location stability index of the user to be identified. The machine classification model is a classification model obtained in advance by training on historical data, and is used to predict a user's location stability index from the user's location stability contribution coefficients under different geographic location features.
  • The trained machine classification model takes the location stability contribution coefficients corresponding to the plurality of geographic location features as input values and the location stability index as its output value; the location stability index reflects the stability of the resident location of the user to be identified.
  • S103 Perform risk identification on the to-be-identified user based on the determined location stability index of the to-be-identified user.
  • The value of the location stability index reflects the stability of the resident location of the user to be identified. For example, if the index takes values in [0, 1], the closer it is to 1, the more stable the resident location of the user to be identified.
  • Risk identification may be based on the location stability index of the user to be identified. For example, if the index is greater than a set threshold, the user to be identified is considered a secure user; otherwise the user is considered a risk user. In actual implementation, information other than location, such as the user's daily credit record, may also be combined to determine comprehensively whether the user to be identified is a risk user.
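
The threshold rule described here can be sketched as below. The default threshold of 0.5 is an assumption for illustration; the source only speaks of "a set threshold".

```python
def identify_risk(stability_index, threshold=0.5):
    """Label a user 'secure' if the index exceeds the threshold, else 'risk'."""
    return "secure" if stability_index > threshold else "risk"
```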
  • a flowchart of a risk identification method provided in Embodiment 2 of the present application includes the following steps:
  • S201 The server acquires the feature values of each of a plurality of sample users under a plurality of preset geographic location features, where the plurality of sample users includes a plurality of security-type sample users and a plurality of risk-type sample users.
  • In a specific implementation, the server may collect network interaction information of sample users (users whose sample user type is known, for example known secure users or risk users; the type can be confirmed from evaluation information, complaint information, and the like about the sample user) and extract user location information from it, such as the geographic locations where the user works, studies, lives, and spends leisure time. From the extracted location information the server determines a plurality of geographic location features, each of which reflects the stability of the user's resident geographic location, such as the monthly average number of different resident cities, the proportion of cities stayed in for more than 12 months, the number of cities appeared in during the last 2 years, and the probability of residing in the current resident city during the last 2 years.
  • In a specific implementation, the feature values under each geographic location feature are divided into multiple feature value intervals (for example, the number of cities appeared in during the last 2 years is divided into four intervals: 0-3 cities, 4-7 cities, 8-12 cities, and more than 12 cities). Each feature value interval corresponds to one location stability contribution coefficient, and different feature value intervals under one geographic location feature correspond to different coefficients.
  • The location stability contribution coefficient represents the difference between the distribution of security-type and risk-type sample users within the feature value interval and the overall distribution of security-type and risk-type sample users. That is, the more the first ratio (between the numbers of security-type and risk-type sample users in a feature value interval) exceeds the second ratio (between the total numbers of security-type and risk-type sample users), the more strongly that interval indicates location stability.
  • The difference may be measured by the natural logarithm of the ratio between the first ratio and the second ratio; see the description of Embodiment 3 for details.
  • S203 Train the machine classification model according to the location stability contribution coefficient of each of the plurality of sample users under each geographic location feature and the sample user type of each sample user, where the location stability contribution coefficients of any one sample user under the geographic location features are input values of the machine classification model, and the location stability index corresponding to the sample user type of that sample user is the output value of the machine classification model.
  • This step is a process of training the machine classification model.
  • the machine classification model is a logistic regression model, and this step is a process of determining the logistic regression coefficients in the logistic regression model.
  • The machine classification model takes the user's location stability contribution coefficients under the geographic location features as input values and outputs the user's location stability index, which measures the stability of the user's location.
  • The machine classification model needs to be trained on the information of multiple sample users. Generally, the larger the number of sample users, the more accurate the trained model.
  • Training the logistic regression model in this step amounts to determining each logistic regression coefficient.
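
As an illustration of what "determining each logistic regression coefficient" can mean in practice, here is a minimal sketch that fits the coefficients by batch gradient descent. The optimizer, the learning rate, the epoch count, and the label encoding (1 for security-type, 0 for risk-type sample users) are assumptions; the source does not specify how the coefficients are fitted.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(samples, labels, learning_rate=0.1, epochs=2000):
    """Fit coefficients beta[0..n] by batch gradient descent.

    samples: one list of contribution coefficients per sample user.
    labels:  1 for a security-type sample user, 0 for a risk-type one (assumed).
    """
    beta = [0.0] * (len(samples[0]) + 1)  # beta[0] is the intercept
    for _ in range(epochs):
        gradient = [0.0] * len(beta)
        for x, y in zip(samples, labels):
            inputs = [1.0] + list(x)  # constant input for the intercept
            error = sigmoid(sum(b * f for b, f in zip(beta, inputs))) - y
            for i, f in enumerate(inputs):
                gradient[i] += error * f
        beta = [b - learning_rate * g / len(samples)
                for b, g in zip(beta, gradient)]
    return beta

def predict_index(beta, x):
    """Location stability index in (0, 1) for one user's coefficients."""
    inputs = [1.0] + list(x)
    return sigmoid(sum(b * f for b, f in zip(beta, inputs)))
```

With this label encoding, an output close to 1 corresponds to a user who resembles the security-type samples.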
  • S204 For any user to be identified, determine the location stability contribution coefficient of the user under each geographic location feature according to the user's feature value under each geographic location feature and the location stability contribution coefficient corresponding to each feature value interval of each geographic location feature.
  • When the trained machine classification model is applied to identify a user's location stability, first determine, for each geographic location feature, the feature value interval to which the feature value of the user to be identified belongs, and use the location stability contribution coefficient corresponding to that interval as the user's location stability contribution coefficient under that feature.
  • S205 Input the location stability contribution coefficients of the user to be identified under the geographic location features into the machine classification model, and use the output value of the model as the location stability index of the user to be identified, where the location stability index is used to measure the stability of the resident location of the user to be identified.
  • S206 Perform risk identification on the to-be-identified user based on the determined location stability index of the to-be-identified user.
  • As described above, the feature values of each geographic location feature need to be divided into different feature value intervals, each corresponding to one location stability contribution coefficient; a specific implementation therefore involves dividing the feature value intervals.
  • The principle of feature value interval division is to place the feature values of users with high location stability (security-type sample users) and those of users with low location stability (risk-type sample users) in different feature value intervals as far as possible.
  • The feature value intervals may be divided manually based on experience, or automatically by the server.
  • The following Embodiment 3 gives a specific way for the server to divide the intervals automatically, and describes a specific way to determine the location stability contribution coefficient corresponding to each feature value interval.
  • a flowchart of a risk identification method provided in Embodiment 3 of the present application includes the following steps:
  • S301 The server acquires the feature values of each of a plurality of sample users under a plurality of preset geographic location features, where the plurality of sample users includes a plurality of security-type sample users and a plurality of risk-type sample users.
  • S302 Determine each feature value interval of each geographic location feature according to the following steps:
  • Each feature value under the geographic location feature is taken as a feature value interval (here, the feature values of a geographic location feature may be the distinct values obtained by collecting the sample users' feature values under that feature and removing repeats);
  • In this step, each feature value is first regarded as a feature value interval, and then the pair of adjacent feature value intervals with the smallest chi-square value is merged. The pair with the smallest chi-square value is the pair in which the different types of sample users are distributed most similarly, so merging it does not violate the principle of distributing different types of sample users into different feature value intervals (that is, the principle that the distribution of sample user types should differ as much as possible between intervals).
  • The chi-square value may be determined from the following quantities:
  • Aij represents the number of sample users of the j-th type whose feature values fall in the i-th feature value interval of a pair of adjacent feature value intervals; Eij represents the expected value of that number; and N is the total number of sample users whose feature values fall in the pair of adjacent feature value intervals.
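
The merging procedure can be sketched as follows. The chi-square formula did not survive extraction, so the sketch assumes the standard Pearson statistic, the sum over i and j of (Aij − Eij)² / Eij with Eij = (row total i × column total j) / N. The stopping rule (a target number of intervals) and all names are likewise assumptions.

```python
def chi_square(interval_a, interval_b):
    """Pearson chi-square for two adjacent intervals.

    Each interval is a (secure_count, risk_count) pair; at least one user
    is assumed to fall in the pair of intervals.
    """
    n = sum(interval_a) + sum(interval_b)
    column_totals = [interval_a[j] + interval_b[j] for j in range(2)]
    chi2 = 0.0
    for row in (interval_a, interval_b):
        row_total = sum(row)
        for j in range(2):
            expected = row_total * column_totals[j] / n  # E_ij
            if expected > 0:
                chi2 += (row[j] - expected) ** 2 / expected
    return chi2

def merge_intervals(values, counts, max_intervals):
    """Start with one interval per distinct value, then repeatedly merge
    the adjacent pair with the smallest chi-square value."""
    intervals = [([v], tuple(c)) for v, c in zip(values, counts)]
    while len(intervals) > max_intervals:
        scores = [chi_square(intervals[i][1], intervals[i + 1][1])
                  for i in range(len(intervals) - 1)]
        i = scores.index(min(scores))
        merged_values = intervals[i][0] + intervals[i + 1][0]
        merged_counts = tuple(a + b for a, b in
                              zip(intervals[i][1], intervals[i + 1][1]))
        intervals[i:i + 2] = [(merged_values, merged_counts)]
    return intervals
```

A chi-square threshold could be used as the stopping rule instead of a fixed interval count; the merge step itself would stay the same.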
  • S303 Determine a position stability contribution coefficient for each feature value interval of each geographical location feature determined by S302.
  • the position stability contribution coefficient can be determined according to one of the following ways:
  • Manner 1: according to the feature values of each of the plurality of sample users, determine a first ratio between the number of security-type sample users and the number of risk-type sample users having feature values in any one feature value interval, and a second ratio between the total number of security-type sample users and the total number of risk-type sample users among the plurality of sample users; then determine the location stability contribution coefficient corresponding to that feature value interval according to the ratio between the first ratio and the second ratio;
  • Manner 2: according to the feature values of each of the plurality of sample users, determine a third ratio between the number of security-type sample users having feature values in any one feature value interval and the total number of security-type sample users among the plurality of sample users, and a fourth ratio between the number of risk-type sample users having feature values in that interval and the total number of risk-type sample users among the plurality of sample users; then determine the location stability contribution coefficient corresponding to that feature value interval according to the ratio between the third ratio and the fourth ratio.
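  • The location stability contribution coefficient WOE corresponding to any of the feature value intervals may be determined according to the following formula: $\mathrm{WOE} = \ln(P_1 / P_0)$, that is, the natural logarithm of the ratio between the two ratios,
  • where, in Manner 1, P1 represents the first ratio and P0 represents the second ratio;
  • and, in Manner 2, P1 represents the third ratio and P0 represents the fourth ratio.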
  • After the feature values under each geographic location feature are discretized into feature value intervals, the location stability contribution coefficient WOE of each interval is determined in the above manner in order to quantify how differently the intervals of different geographic location features contribute to location stability. In this way, not only the feature value intervals of the same geographic location feature but also those of different geographic location features can be compared directly and quantitatively.
  • For example, the geographic location feature "number of cities appeared in" is discretized into four feature value intervals (0-3 cities, 4-7 cities, 8-12 cities, and more than 12 cities), a WOE value is calculated for each interval, and these WOE values are comparable with the WOE values of the intervals of other geographic location features.
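
The two manners can be sketched side by side, assuming the coefficient is the natural logarithm of the ratio between the two ratios as stated for the first and second ratio. The function and parameter names are hypothetical, and all counts are assumed to be non-zero.

```python
import math

def woe_manner_1(secure_in_interval, risk_in_interval, total_secure, total_risk):
    """ln(first ratio / second ratio)."""
    first_ratio = secure_in_interval / risk_in_interval
    second_ratio = total_secure / total_risk
    return math.log(first_ratio / second_ratio)

def woe_manner_2(secure_in_interval, risk_in_interval, total_secure, total_risk):
    """ln(third ratio / fourth ratio)."""
    third_ratio = secure_in_interval / total_secure
    fourth_ratio = risk_in_interval / total_risk
    return math.log(third_ratio / fourth_ratio)
```

The two manners are algebraically equivalent, since (a / b) / (A / B) equals (a / A) / (b / B), so they give the same coefficient up to floating-point rounding.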
  • any simple modification to the first mode and the second mode is within the protection scope of the embodiment of the present application.
  • the position stability contribution coefficient WOE is determined.
  • P1 represents the first product
  • P0 represents the second product.
  • the location stability contribution coefficient of the sample user is determined.
  • S305 Train a machine classification model according to the location stability contribution coefficient of each of the plurality of sample users under each geographic location feature and the sample user type of each sample user, where the location stability contribution coefficients of any one sample user under the geographic location features are input values of the machine classification model, and the location stability index corresponding to the sample user type of that sample user is the output value of the machine classification model; the location stability index is used to measure the stability of the location.
  • The machine classification model adopted in this embodiment may be a logistic regression model, with the following symbols:
  • Index represents the location stability index;
  • β_i is the logistic regression coefficient (that is, the coefficient to be trained in S305);
  • f_i is the feature value under the i-th geographic location feature;
  • f_0 = 1;
  • n is the number of geographic location features.
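
The model formula itself did not survive extraction. Assuming the standard logistic regression form, which matches the symbols listed here (including the constant input f_0 = 1 for the intercept β_0), it can be written as:

```latex
\mathrm{Index} = \frac{1}{1 + \exp\!\left(-\sum_{i=0}^{n} \beta_i f_i\right)}
```

Under this assumed form the index always lies strictly between 0 and 1, consistent with the value range [0, 1] used as an example in Embodiment 1.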
  • The embodiment of the present application uses Principal Component Analysis (PCA) to linearly transform the original geographic location features, that is, to perform dimensionality reduction, so that highly correlated geographic location features do not jointly take part in the location stability analysis.
  • The logistic regression model after PCA processing uses the following symbols:
  • β′_i is a logistic regression coefficient;
  • f′_i is the i-th feature obtained by linearly transforming the geographic location features;
  • m is the number of features after the linear transformation;
  • w_k is the coefficient of f_k in the linear transformation;
  • f_k is the k-th geographic location feature;
  • n is the number of geographic location features;
  • m ≤ n.
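
The linear transformation step can be sketched as below: each transformed feature is a weighted sum of the n original features. The weights would come from the PCA; the values and the function name used here are hypothetical, and the PCA fitting itself is not shown.

```python
def linear_transform(features, weight_rows):
    """Map n original features to m transformed features.

    weight_rows[i][k] is the coefficient w_k applied to f_k when computing
    the i-th transformed feature, so len(weight_rows) == m and each row
    has n entries.
    """
    return [sum(w * f for w, f in zip(row, features)) for row in weight_rows]
```

For instance, `linear_transform([1.0, 2.0, 3.0], [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]])` reduces three features to two; the transformed features then replace the original ones as inputs to the logistic regression model.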
  • S306 For any user to be identified, determine the location stability contribution coefficient of the user under each geographic location feature according to the user's feature value under each geographic location feature and the location stability contribution coefficient corresponding to each feature value interval of each geographic location feature.
  • S307 Input the location stability contribution coefficients of the user to be identified under the geographic location features into the trained machine classification model, and use the output value of the model as the location stability index of the user to be identified, where the location stability index is used to measure the stability of the resident location of the user to be identified.
  • S308 Perform risk identification on the to-be-identified user based on the determined location stability index of the to-be-identified user.
  • the step of performing geographic location feature screening is further given.
  • a flowchart of a risk identification method provided in Embodiment 4 of the present application includes the following steps:
  • S401 The server acquires feature values of each sample user among a plurality of sample users, where the plurality of sample users include multiple sample types of security types and sample users of multiple risk types. .
  • S403: Determine, according to the correlation coefficients between different geographic location features, each pair of geographic location features whose correlation coefficient is greater than a set threshold.
  • the correlation coefficient between different geographic location features can be determined according to a formula whose quantities are: the number of sample users; $X_i$, the feature value of the i-th sample user under one geographic location feature X, and the mean of the feature values under X over all sample users; $Y_i$, the feature value of the i-th sample user under another geographic location feature Y, and the mean of the feature values under Y over all sample users.
  • the threshold of the correlation coefficient can be taken as 0.6.
  • if the correlation coefficient between two geographic location features is greater than 0.6, one of them needs to be screened out.
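The correlation check can be sketched as follows. The source formula did not survive extraction, so this sketch assumes the Pearson correlation coefficient, which is consistent with the per-feature means mentioned in the definitions. The function names, the dictionary layout, and the choice to return 0.0 for a constant feature are illustrative assumptions.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equally long lists of feature values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant feature is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def correlated_pairs(features, threshold=0.6):
    """Return feature-name pairs whose correlation exceeds the threshold.

    features: dict mapping a feature name to its list of values,
    one value per sample user, in the same user order for every feature.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(features), 2)
        if pearson(features[a], features[b]) > threshold
    ]
```

The comparison follows the text literally (coefficient greater than the threshold); whether strongly negative correlations should also be flagged is not stated in the source.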
  • the screening may be performed directly based on the position stability contribution coefficient WOE. For example, for each pair of geographic location features whose correlation coefficient is greater than the set threshold, the sum of the position stability contribution coefficients WOE corresponding to the feature value intervals of each geographic location feature is determined, and the geographic location feature with the smaller sum is filtered out.
  • the geographic location feature screening can also be performed based on the following steps:
  • determine a contribution value IV for each geographic location feature according to $IV = \sum_{k=1}^{q} (P_{1k} - P_{0k}) \times WOE(k)$, wherein, for any one geographic location feature, $P_{1k}$ represents the ratio of the number of security-type sample users having a feature value in the k-th feature value interval to the total number of security-type sample users among the acquired plurality of sample users
  • $P_{0k}$ represents the ratio of the number of risk-type sample users having a feature value in the k-th feature value interval to the total number of risk-type sample users among the acquired plurality of sample users
  • WOE(k) represents the position stability contribution coefficient corresponding to the k-th feature value interval
  • q is the number of feature value intervals of the geographic location feature
  • the position stability contribution reflected by the WOE value alone may not be objective (for example, when the total number of sample users in a feature value interval is itself small, the ratio of the number of security-type sample users to the number of risk-type sample users may be large, yet this is not sufficient to conclude that the interval contributes strongly to position stability).
  • therefore, the WOE value is multiplied by the difference between the probabilities with which security-type and risk-type sample users respectively appear in the feature value interval.
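The contribution value IV can be sketched as follows. This is an illustration under stated assumptions: WOE(k) is taken as ln(P1k / P0k), every interval is assumed to contain at least one sample user of each type (otherwise the logarithm is undefined), and the function name and input layout are hypothetical.

```python
import math


def woe_and_iv(safe_counts, risk_counts):
    """Compute per-interval WOE values and the feature's IV.

    safe_counts[k] / risk_counts[k]: number of security-type / risk-type
    sample users whose feature value falls in the k-th interval.
    """
    if len(safe_counts) != len(risk_counts):
        raise ValueError("both lists need one entry per interval")
    if min(safe_counts) <= 0 or min(risk_counts) <= 0:
        raise ValueError("every interval needs users of both types")
    total_safe = sum(safe_counts)
    total_risk = sum(risk_counts)
    woes = []
    iv = 0.0
    for safe, risk in zip(safe_counts, risk_counts):
        p1 = safe / total_safe  # share of all security-type users in interval k
        p0 = risk / total_risk  # share of all risk-type users in interval k
        woe = math.log(p1 / p0)
        woes.append(woe)
        iv += (p1 - p0) * woe  # weight WOE by the probability difference
    return woes, iv
```

Because (P1k - P0k) and WOE(k) always share the same sign, each term is non-negative, so a sparsely populated interval with an extreme WOE contributes little to IV, which is the correction the text motivates.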
  • S405: Train a machine classification model according to the position stability contribution coefficient of each of the plurality of sample users under each geographic location feature and the sample user type of each sample user;
  • the position stability contribution coefficient of any sample user under each geographic location feature is an input value of the machine classification model, and the position stability index corresponding to the sample user type of that sample user is the output value of the machine classification model;
  • the position stability index is used to measure the stability of the position.
  • S406: For any user to be identified, determine the position stability contribution coefficient of that user under each geographic location feature according to the feature value of the user under each geographic location feature and the position stability contribution coefficient corresponding to each feature value interval of each geographic location feature.
  • S407: Input the position stability contribution coefficient of the user to be identified under each geographic location feature into the machine classification model, and determine the output value of the machine classification model as the position stability index of the user to be identified; the position stability index is used to measure the stability of the resident location of the user to be identified.
  • S408: Perform risk identification on the user to be identified based on the determined position stability index of the user to be identified.
  • a flowchart of a risk identification method provided in Embodiment 5 of the present application includes:
  • S501: The server acquires, for each of a plurality of sample users, the feature values under a plurality of preset geographic location features; the plurality of sample users include multiple sample users of the security type and multiple sample users of the risk type.
  • the stability features of resident cities may include: the monthly average number of different resident cities (the number of all resident cities within the statistical period divided by the number of months in the statistical period), the mean of the monthly resident city probability (the mean of the user's resident probability over all resident cities), the variance of the monthly resident city probability (the variance of the user's resident probability over all resident cities), etc.;
  • the city distribution features for different residence frequencies may include: the number of all cities in which the user has resided, the proportion of cities resided in for 1 to 3 months, the proportion of cities resided in for 4 to 6 months, the proportion of cities resided in for 7 to 12 months, the proportion of cities resided in for 13 to 24 months, the total number of months for which the user's resident location was recorded, etc.;
  • the stability features of the current resident city may include: the user's current resident probability in the current resident city, the number of months in which the current resident city was a resident city, and, over the months in which the current resident city was a resident city, the mean and the variance of the user's resident probability in the current resident city, and the like.
  • the above geographic location features all relate to the resident city, where the resident city is the city selected as the one in which the user stayed longest within a set time period, such as a certain month.
  • the probability of the user residing in each city may be determined according to the number of days the user resides in each city and the number of cities in which the user may reside, and the city with the highest resident probability is selected as the resident city.
  • the resident probability corresponding to any city can be calculated from the following quantities:
  • E represents the expected number of days of residing in the city during a set time period (such as a certain month)
  • e1 represents the expected number of days of residing in the i-th unobserved city (unobserved cities are cities in which the user may reside but which were not counted)
  • e2 represents the expected number of days of residing in the j-th observed resident city
  • CNT is the number of days the user resides in the city
  • L is the length of the set time period, such as 30 days
  • M is the total number of cities in which the user may reside, such as M = 12 (taking the 99th percentile of the total number of cities in which users may reside)
  • N is the total number of cities in which the user resided during the set time period
  • CNTj is the number of days the user resides in the j-th city.
  • S502: For each geographic location feature, perform the following: take each feature value under the geographic location feature as one feature value interval; determine the chi-square value of each pair of adjacent feature value intervals, and merge the pair of adjacent feature value intervals corresponding to the minimum chi-square value; repeat this step until the number of feature value intervals under the geographic location feature reaches a preset number of intervals.
  • the chi-square value is determined according to the following formula: $\chi^2 = \sum_{i=1}^{2} \sum_{j} \frac{(A_{ij} - E_{ij})^2}{E_{ij}}$, where j ranges over the sample user types
  • $A_{ij}$ represents the number of sample users of the j-th type having a feature value in the i-th feature value interval of a pair of adjacent feature value intervals; $E_{ij}$ represents the expected value of the number of sample users of the j-th type having a feature value in the i-th feature value interval of the pair; and N is the total number of sample users having a feature value in the pair of adjacent feature value intervals.
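The merging procedure of S502 can be sketched as follows. This is a minimal illustration, assuming two sample user types (security and risk) and the usual contingency-table expectation E_ij = (row total of interval i) × (column total of type j) / N; the function names and the data layout are hypothetical.

```python
def chi_square(a, b):
    """Chi-square value of two adjacent intervals.

    a, b: (safe_count, risk_count) tuples for the two intervals.
    """
    n = sum(a) + sum(b)
    col_totals = [a[0] + b[0], a[1] + b[1]]
    chi = 0.0
    for row in (a, b):
        row_total = sum(row)
        for j in range(2):
            expected = row_total * col_totals[j] / n
            if expected > 0:  # skip cells whose expectation is zero
                chi += (row[j] - expected) ** 2 / expected
    return chi


def chi_merge(values, counts, max_intervals):
    """Merge adjacent intervals until at most max_intervals remain.

    values: sorted distinct feature values, each starting as its own interval.
    counts: matching (safe_count, risk_count) tuples.
    Returns a list of (values_in_interval, (safe_count, risk_count)).
    """
    intervals = [([v], tuple(c)) for v, c in zip(values, counts)]
    while len(intervals) > max_intervals:
        chis = [
            chi_square(intervals[i][1], intervals[i + 1][1])
            for i in range(len(intervals) - 1)
        ]
        i = chis.index(min(chis))  # most similar adjacent pair
        left, right = intervals[i], intervals[i + 1]
        merged = (
            left[0] + right[0],
            (left[1][0] + right[1][0], left[1][1] + right[1][1]),
        )
        intervals[i:i + 2] = [merged]
    return intervals
```

Merging the pair with the smallest chi-square value joins the intervals whose type distributions are most alike, so intervals dominated by different user types stay separate for as long as possible.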
  • S503: Determine a position stability contribution coefficient for each feature value interval of each geographic location feature determined in S502.
  • S504: Determine, according to the correlation coefficients between different geographic location features, each pair of geographic location features whose correlation coefficient is greater than a set threshold.
  • the correlation coefficient between different geographic location features can be determined according to a formula whose quantities are: the number of sample users; $X_i$, the feature value of the i-th sample user under one geographic location feature X, and the mean of the feature values under X over all sample users; $Y_i$, the feature value of the i-th sample user under another geographic location feature Y, and the mean of the feature values under Y over all sample users.
  • the threshold of the correlation coefficient can be taken as 0.6.
  • if the correlation coefficient between two geographic location features is greater than 0.6, one of them needs to be screened out.
  • determine a contribution value IV for each geographic location feature according to $IV = \sum_{k=1}^{q} (P_{1k} - P_{0k}) \times WOE(k)$, wherein, for any one geographic location feature, $P_{1k}$ represents the ratio of the number of security-type sample users having a feature value in the k-th feature value interval to the total number of security-type sample users among the acquired plurality of sample users
  • $P_{0k}$ represents the ratio of the number of risk-type sample users having a feature value in the k-th feature value interval to the total number of risk-type sample users among the acquired plurality of sample users
  • WOE(k) represents the position stability contribution coefficient corresponding to the k-th feature value interval
  • q is the number of feature value intervals of the geographic location feature
  • for the geographic location features in the pair, the one with the smallest contribution value IV is determined, and the geographic location feature with the smallest IV is determined as the geographic location feature screened out from the pair.
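The pairwise screening can be sketched as follows. The source wording leaves some room for interpretation about which feature of a correlated pair is retained; this sketch assumes the feature with the smaller IV is the one discarded, in line with the description of filtering out the feature with the smaller value. The function name, the input layout, and the tie-breaking rule are hypothetical.

```python
def screen_features(iv_by_feature, correlated_pairs):
    """Drop one feature from every highly correlated pair.

    iv_by_feature: dict mapping a feature name to its contribution value IV.
    correlated_pairs: iterable of (name_a, name_b) pairs whose correlation
    coefficient exceeded the threshold.
    Returns the sorted names of the features kept for training.
    """
    kept = set(iv_by_feature)
    for a, b in correlated_pairs:
        if a in kept and b in kept:
            # Assumption: discard the feature with the smaller IV; ties drop `a`.
            drop = a if iv_by_feature[a] <= iv_by_feature[b] else b
            kept.discard(drop)
    return sorted(kept)
```

For instance, with hypothetical IVs `{"a": 0.3, "b": 0.1, "c": 0.2}` and one correlated pair `("a", "b")`, the sketch keeps `a` and `c`.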
  • the monthly average number of different resident cities, the mean of the monthly resident city probability, the monthly resident city probability, the number of all cities in which the user has resided, the proportion of cities resided in for 1 to 3 months, the proportion of cities resided in for 4 to 6 months, the proportion of cities resided in for 13 to 24 months, the total number of months for which the user's resident location was recorded, the user's current resident probability in the current resident city, the number of months in which the current resident city was a resident city, and the variance of the user's resident probability in the current resident city over the months in which it was a resident city.
  • S507: Determine the logistic regression coefficients in the logistic regression model according to the position stability contribution coefficient of each of the plurality of sample users under each of the selected geographic location features and the sample user type of each sample user.
  • the position stability contribution coefficient of each sample user under each geographic location feature is an input value of the logistic regression model, and the position stability index corresponding to the sample user type of the sample user is the output value of the logistic regression model.
  • the logistic regression model obtained by the PCA method uses the following quantities:
  • $\theta'_i$ is a logistic regression coefficient
  • $f'_i$ is the i-th feature after linearly transforming the various geographic location features
  • m is the number of features after the linear transformation
  • $w_k$ is the coefficient of $f_k$ in the linear transformation
  • $f_k$ is the k-th geographic location feature
  • n is the number of geographic location features
  • m ≤ n.
  • S508: For any user to be identified, determine the position stability contribution coefficient of that user under each geographic location feature according to the feature value of the user under each geographic location feature and the position stability contribution coefficient corresponding to each feature value interval of each geographic location feature.
  • S509: Input the position stability contribution coefficient of the user to be identified under each geographic location feature into the logistic regression model, and use the output value of the logistic regression model as the position stability index of the user to be identified; the position stability index is used to measure the stability of the resident location of the user to be identified.
  • S510: Perform risk identification on the user to be identified based on the determined position stability index of the user to be identified.
  • the position stability contribution coefficient of the user to be identified under each geographic location feature is input into the trained logistic regression model, and the output value of the logistic regression model is the position stability index of the user to be identified; the value of the position stability index characterizes the position stability of the user to be identified.
  • when performing risk identification on the user to be identified, the position stability index of the user may be taken into account. For example, if the position stability index is greater than a set threshold, the user to be identified is considered to be a secure user; otherwise the user is considered to be a risk user.
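The threshold decision can be sketched as follows. The threshold value 0.5 and the returned labels are illustrative assumptions; the source only states that a set threshold is used.

```python
def identify_risk(position_stability_index, threshold=0.5):
    """Classify a user from the position stability index.

    The default threshold of 0.5 is a placeholder; in practice it would be
    chosen from the index distributions of known secure and risk users.
    """
    return "secure" if position_stability_index > threshold else "risk"
```

An index exactly equal to the threshold falls into the risk branch, since the text requires the index to be greater than the threshold for a secure classification.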
  • curve A shows the position stability index distribution of the risk user
  • curve B shows the position stability index distribution of the safety user
  • the abscissa is the position stability index
  • the ordinate is the distribution density.
  • the embodiments of the present application also provide a risk identification device corresponding to the risk identification method. Since the principle by which the device solves the problem is similar to that of the risk identification method in the embodiments of the present application, the implementation of the device can refer to the implementation of the method, and repeated details are not described again.
  • FIG. 7 is a schematic structural diagram of a risk identification apparatus provided by an embodiment of the present application, including:
  • the first determining module 71 is configured to determine the position stability contribution coefficient of the user to be identified under each geographic location feature according to the feature value of the user to be identified under at least one geographic location feature and the position stability contribution coefficient corresponding to each feature value interval of each geographic location feature;
  • a second determining module 72, configured to determine the position stability index of the user to be identified according to the position stability contribution coefficient of the user to be identified under each geographic location feature, where the position stability index is used to measure the stability of the location where the user to be identified resides;
  • the identification module 73 is configured to perform risk identification on the user to be identified based on the position stability index of the user to be identified determined by the second determining module 72.
  • the second determining module 72 is specifically configured to:
  • input the position stability contribution coefficient of the user to be identified under each geographic location feature into a machine classification model, and determine the output value of the machine classification model as the position stability index of the user to be identified; the machine classification model is a classification model obtained in advance through training, and is used to predict a user's position stability index based on the user's position stability contribution coefficients under different geographic location features.
  • the device further includes:
  • the model training module 74, configured to, before the second determining module 72 inputs the position stability contribution coefficient of the user to be identified under each geographic location feature into the machine classification model: acquire, for each of a plurality of sample users, the feature values under a plurality of preset geographic location features, the plurality of sample users including multiple security-type sample users and multiple risk-type sample users; for each geographic location feature, determine the position stability contribution coefficient of each sample user under that feature according to the feature value interval to which the sample user's feature value under that feature belongs and the position stability contribution coefficient corresponding to each feature value interval of that feature; and train the machine classification model according to the position stability contribution coefficient of each of the plurality of sample users under each geographic location feature and the sample user type of each sample user;
  • the position stability contribution coefficient of any sample user under each geographic location feature is an input value of the machine classification model, and the position stability index corresponding to the sample user type of that sample user is the output value of the machine classification model.
  • the model training module 74 is specifically configured to determine a position stability contribution coefficient corresponding to any feature value interval of the geographic location feature according to the following steps:
  • the model training module 74 is specifically configured to determine the position stability contribution coefficient WOE corresponding to any one of the feature value intervals according to the following formula: WOE = ln(P1/P0)
  • P1 represents the first ratio and P0 represents the second ratio; or
  • P1 represents the third ratio and P0 represents the fourth ratio
  • the model training module 74 is specifically configured to determine the feature value intervals of any one of the geographic location features according to the following steps:
  • each feature value under the geographic location feature is taken as one feature value interval
  • the model training module 74 is specifically configured to determine the chi-square value according to the following formula: $\chi^2 = \sum_{i=1}^{2} \sum_{j} \frac{(A_{ij} - E_{ij})^2}{E_{ij}}$, where j ranges over the sample user types
  • $A_{ij}$ represents the number of sample users of the j-th type having a feature value in the i-th feature value interval of a pair of adjacent feature value intervals; $E_{ij}$ represents the expected value of the number of sample users of the j-th type having a feature value in the i-th feature value interval of the pair; and N is the total number of sample users having a feature value in the pair of adjacent feature value intervals.
  • the model training module 74 is specifically configured to: before training the machine classification model, select, from the preset plurality of geographic location features, the geographic location features used for training the machine classification model according to the correlation coefficients between different geographic location features and the position stability contribution coefficients corresponding to the feature value intervals of each geographic location feature.
  • the model training module 74 is specifically configured to: determine, according to the correlation coefficients between different geographic location features, each pair of geographic location features whose correlation coefficient is greater than a set threshold; and, for each pair of geographic location features whose correlation coefficient is greater than the set threshold, screen out one geographic location feature from the pair, according to the position stability contribution coefficients corresponding to the feature value intervals of each geographic location feature in the pair, to serve as a geographic location feature for training the machine classification model.
  • the model training module 74 is specifically configured to:
  • determine a contribution value IV for each geographic location feature according to $IV = \sum_{k=1}^{q} (P_{1k} - P_{0k}) \times WOE(k)$, wherein, for any one geographic location feature, $P_{1k}$ represents the ratio of the number of security-type sample users having a feature value in the k-th feature value interval to the total number of security-type sample users among the acquired plurality of sample users, $P_{0k}$ represents the ratio of the number of risk-type sample users having a feature value in the k-th feature value interval to the total number of risk-type sample users among the acquired plurality of sample users, WOE(k) represents the position stability contribution coefficient corresponding to the k-th feature value interval, and q is the number of feature value intervals of the geographic location feature; and, for the geographic location features in the pair, determine the one with the smallest contribution value IV, and determine the geographic location feature with the smallest IV as the geographic location feature screened out from the pair.
  • the machine classification model is: $Index = \frac{1}{1 + e^{-\sum_{i=0}^{n} \theta_i f_i}}$
  • Index represents the position stability index
  • $\theta_i$ is the logistic regression coefficient
  • $f_i$ is the feature value under the i-th geographic location feature
  • $f_0 = 1$
  • n is the number of geographic location features.
  • with PCA-transformed features, the machine classification model uses the following quantities:
  • $\theta'_i$ is a logistic regression coefficient
  • $f'_i$ is the i-th feature after linearly transforming the various geographic location features
  • m is the number of features after the linear transformation
  • $w_k$ is the coefficient of $f_k$ in the linear transformation
  • $f_k$ is the k-th geographic location feature
  • n is the number of geographic location features
  • m ≤ n.
  • embodiments of the present application can be provided as a method, system, or computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment in combination of software and hardware.
  • the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction device, the instruction device implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • these computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Telephonic Communication Services (AREA)

Abstract

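A risk identification method and device, belonging to the field of computer networks and information technology, for solving the problem that risk identification based on the stability of geographic location has low accuracy and feasibility. The risk identification method provided by the embodiments includes: a server determines, according to the feature value of a user to be identified under at least one geographic location feature and the position stability contribution coefficient corresponding to each feature value interval of each geographic location feature, the position stability contribution coefficient of the user to be identified under each geographic location feature (S101); determines the position stability index of the user to be identified according to the position stability contribution coefficient of the user under each geographic location feature, the position stability index being used to measure the stability of the location where the user to be identified resides (S102); and performs risk identification on the user to be identified based on the determined position stability index (S103).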

Description

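A risk identification method and device
This application claims priority to Chinese patent application No. 201510354187.X, filed on June 24, 2015 and entitled "Risk identification method and device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer networks and information technology, and in particular to a risk identification method and device.
Background
When using the Internet, users leave behind a great deal of geographic location information, such as delivery addresses, the Internet Protocol (IP) address used when going online, Wireless Fidelity (WiFi) information, and navigation location information. By analyzing a user's geographic location information over a certain period of time, the stability of the user's geographic location can be determined.
Determining the stability of geographic location plays an important role in business risk identification. In general, users with lower geographic location stability are more likely to perform risky operations than users with higher geographic location stability. Therefore, when a system performs risk identification automatically, the stability of the user's geographic location should be used as an important indicator.
At present, the stability of a user's geographic location is generally determined by comparing whether the user's historical geographic location information is consistent with the current geographic location information. However, because a user's geographic location is usually neither unique nor fixed, this approach has low accuracy in practice, which in turn leads to low accuracy and feasibility when the server performs risk identification based on geographic location stability.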
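Summary of the Invention
The embodiments of the present application provide a risk identification method and device, for solving the problem that the accuracy and feasibility of server-side risk identification based on geographic location stability are low.
An embodiment of the present application provides a risk identification method, including:
a server determines, according to the feature value of a user to be identified under at least one geographic location feature and the position stability contribution coefficient corresponding to each feature value interval of each geographic location feature, the position stability contribution coefficient of the user to be identified under each geographic location feature;
determining the position stability index of the user to be identified according to the position stability contribution coefficient of the user to be identified under each geographic location feature, the position stability index being used to measure the stability of the location where the user to be identified resides;
performing risk identification on the user to be identified based on the determined position stability index of the user to be identified.
Optionally, determining the position stability index of the user to be identified according to the position stability contribution coefficient of the user to be identified under each geographic location feature specifically includes:
inputting the position stability contribution coefficient of the user to be identified under each geographic location feature into a machine classification model, and determining the output value of the machine classification model as the position stability index of the user to be identified; the machine classification model is a classification model obtained in advance through training, and is used to predict a user's position stability index according to the user's position stability contribution coefficients under different geographic location features.
Optionally, the server trains the machine classification model according to the following steps:
the server acquires, for each of a plurality of sample users, the feature values under a plurality of preset geographic location features; the plurality of sample users include multiple sample users of the security type and multiple sample users of the risk type;
for each geographic location feature, determining the position stability contribution coefficient of each sample user under that geographic location feature according to the feature value interval to which the sample user's feature value under that geographic location feature belongs and the position stability contribution coefficient corresponding to each feature value interval of that geographic location feature;
training the machine classification model according to the position stability contribution coefficient of each of the plurality of sample users under each geographic location feature and the sample user type of each sample user; wherein the position stability contribution coefficient of any sample user under each geographic location feature is an input value of the machine classification model, and the position stability index corresponding to the sample user type of that sample user is the output value of the machine classification model.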
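Optionally, for each geographic location feature, the server determines the position stability contribution coefficient corresponding to any feature value interval of that geographic location feature according to the following steps:
according to the feature value of each of the plurality of sample users under that geographic location feature, determining a first ratio between the number of security-type sample users and the number of risk-type sample users having a feature value in that feature value interval, and a second ratio between the total number of security-type sample users and the total number of risk-type sample users among the plurality of sample users; and determining the position stability contribution coefficient corresponding to that feature value interval according to the ratio between the first ratio and the second ratio; or,
according to the feature value of each of the plurality of sample users under that geographic location feature, determining a third ratio between the number of security-type sample users having a feature value in that feature value interval and the total number of security-type sample users among the plurality of sample users, and a fourth ratio between the number of risk-type sample users having a feature value in that feature value interval and the total number of risk-type sample users among the plurality of sample users; and determining the position stability contribution coefficient corresponding to that feature value interval according to the ratio between the third ratio and the fourth ratio.
Optionally, the server determines the position stability contribution coefficient WOE corresponding to that feature value interval according to the following formula:
WOE=ln(P1/P0);
where P1 represents the first ratio and P0 represents the second ratio; or, P1 represents the third ratio and P0 represents the fourth ratio.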
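Optionally, the server determines the feature value intervals of any geographic location feature according to the following steps:
taking each feature value under that geographic location feature as one feature value interval;
determining the chi-square value of each current pair of adjacent feature value intervals, and merging the pair of adjacent feature value intervals corresponding to the smallest chi-square value; repeating this step until the number of feature value intervals under that geographic location feature reaches a preset number of intervals;
wherein the chi-square value characterizes, for a pair of adjacent feature value intervals, the difference between the proportions of sample users of different types having a feature value in one of the intervals and the proportions of sample users of different types having a feature value in the other interval.
Optionally, before the server trains the machine classification model, the method further includes:
selecting, from the preset plurality of geographic location features, the geographic location features used for training the machine classification model according to the correlation coefficients between different geographic location features and the position stability contribution coefficients corresponding to the feature value intervals of each geographic location feature.
Optionally, selecting, from the preset plurality of geographic location features, the geographic location features used for training the machine classification model according to the correlation coefficients between different geographic location features and the position stability contribution coefficients corresponding to the feature value intervals of each geographic location feature includes:
determining, according to the correlation coefficients between different geographic location features, each pair of geographic location features whose correlation coefficient is greater than a set threshold;
for each pair of geographic location features whose correlation coefficient is greater than the set threshold, screening out one geographic location feature from the pair, according to the position stability contribution coefficients corresponding to the feature value intervals of each geographic location feature in the pair, to serve as a geographic location feature for training the machine classification model.
For each pair of geographic location features whose correlation coefficient is greater than the set threshold, screening out one geographic location feature from the pair according to the position stability contribution coefficients corresponding to the feature value intervals of each geographic location feature in the pair includes:
according to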
Figure PCTCN2016085935-appb-000001
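determining the contribution value IV of each geographic location feature; wherein, for any geographic location feature, P1k represents the ratio of the number of security-type sample users having a feature value in the k-th feature value interval to the total number of security-type sample users among the acquired plurality of sample users, P0k represents the ratio of the number of risk-type sample users having a feature value in the k-th feature value interval to the total number of risk-type sample users among the acquired plurality of sample users, WOE(k) represents the position stability contribution coefficient corresponding to the k-th feature value interval, and q is the number of feature value intervals of that geographic location feature;
for the geographic location features in the pair, determining the geographic location feature with the smallest contribution value IV, and determining the geographic location feature with the smallest IV as the geographic location feature screened out from the pair.
An embodiment of the present application provides a risk identification device, including:
a first determining module, configured to determine, according to the feature value of a user to be identified under at least one geographic location feature and the position stability contribution coefficient corresponding to each feature value interval of each geographic location feature, the position stability contribution coefficient of the user to be identified under each geographic location feature;
a second determining module, configured to determine the position stability index of the user to be identified according to the position stability contribution coefficient of the user to be identified under each geographic location feature, the position stability index being used to measure the stability of the location where the user to be identified resides;
an identification module, configured to perform risk identification on the user to be identified based on the position stability index of the user to be identified determined by the second determining module.
The embodiments of the present application can fuse the contributions of various geographic location features to a user's position stability on the basis of the position stability contribution coefficient of each geographic location feature. In addition, each geographic location feature is divided into feature value intervals, and each feature value interval corresponds to one position stability contribution coefficient, which both reduces computational complexity (there is no need for every single feature value to have its own position stability contribution coefficient) and preserves the accuracy of position stability identification. The approach of the embodiments can therefore improve the accuracy of identifying the stability of a user's geographic location, and is highly feasible.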
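Brief Description of the Drawings
FIG. 1 is a flowchart of the risk identification method provided in Embodiment 1 of the present application;
FIG. 2 is a flowchart of the risk identification method provided in Embodiment 2 of the present application;
FIG. 3 is a flowchart of the risk identification method provided in Embodiment 3 of the present application;
FIG. 4 is a flowchart of the risk identification method provided in Embodiment 4 of the present application;
FIG. 5 is a flowchart of the risk identification method provided in Embodiment 5 of the present application;
FIG. 6(a) is a schematic diagram of the position stability index distributions of risk users and secure users;
FIG. 6(b) shows the position stability index distribution curves of users with different credit levels;
FIG. 7 is a schematic structural diagram of the risk identification device provided by an embodiment of the present application.
Detailed Description
In the embodiments of the present application, the server determines the position stability contribution coefficient of a user to be identified under each geographic location feature according to the feature value of the user under at least one geographic location feature and the position stability contribution coefficient corresponding to each feature value interval of each geographic location feature; determines the position stability index of the user according to the user's position stability contribution coefficient under each geographic location feature; and performs risk identification on the user based on the determined position stability index. The embodiments of the present application can fuse the contributions of various geographic location features to a user's position stability on the basis of the position stability contribution coefficient of each geographic location feature. In addition, each geographic location feature is divided into feature value intervals, and each feature value interval corresponds to one position stability contribution coefficient, which both reduces computational complexity (there is no need for every single feature value to have its own position stability contribution coefficient) and preserves the accuracy of position stability identification. The approach of the embodiments can therefore improve the accuracy of identifying the stability of a user's geographic location, and is highly feasible.
The embodiments of the present application are described in further detail below with reference to the accompanying drawings.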
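Embodiment 1
As shown in FIG. 1, the flowchart of the risk identification method provided in Embodiment 1 of the present application includes:
S101: The server determines, according to the feature value of a user to be identified under at least one geographic location feature and the position stability contribution coefficient corresponding to each feature value interval of each geographic location feature, the position stability contribution coefficient of the user to be identified under each geographic location feature.
In a specific implementation, for any user to be identified, the server can collect the user's feature value under each of a plurality of preset geographic location features (or a plurality of geographic location features selected from the preset ones; see the description of Embodiment 4). Here, each geographic location feature reflects the stability of the location where the user resides. Preferably, the geographic location features in the embodiments of the present application may be statistics that reflect where the user resides, for example the monthly average number of different resident cities, the proportion of cities resided in for more than 12 months, the number of all cities appearing in the last 2 years, and the resident probability of the current resident city in the last two years. The server then determines the position stability contribution coefficient of the user to be identified under each geographic location feature based on the position stability contribution coefficient corresponding to each feature value interval of each geographic location feature. For example, the number of all cities appearing in the last 2 years is divided into 4 intervals: 0 to 3 cities, 4 to 7 cities, 8 to 12 cities, and more than 12 cities. The feature value intervals can be divided manually, or automatically by the server according to certain principles; see the description of Embodiment 3 below.
S102: Determine the position stability index of the user to be identified according to the position stability contribution coefficient of the user to be identified under each geographic location feature, the position stability index being used to measure the stability of the location where the user to be identified resides.
In a specific implementation, the position stability index of the user to be identified can be determined according to the user's position stability contribution coefficient under each geographic location feature and the weight of each geographic location feature. For example, the position stability contribution coefficient under each geographic location feature is multiplied by its corresponding weight, the products are added together, and the resulting sum is determined as the position stability index of the user to be identified.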
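The weighted-sum variant of S102 can be sketched as follows. The feature names, contribution coefficients, and weights are purely illustrative assumptions; the source does not say how the weights are chosen.

```python
def weighted_stability_index(contributions, weights):
    """Sum each feature's contribution coefficient times its weight.

    contributions: dict mapping a feature name to the user's position
    stability contribution coefficient under that feature.
    weights: dict mapping the same feature names to their weights.
    """
    if contributions.keys() != weights.keys():
        raise ValueError("contributions and weights must cover the same features")
    return sum(contributions[name] * weights[name] for name in contributions)
```

For example, hypothetical contributions `{"city_count": 0.5, "resident_prob": -0.2}` with weights `{"city_count": 2.0, "resident_prob": 1.0}` give an index of about 0.8.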
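Optionally, in a specific implementation, the position stability contribution coefficient of the user to be identified under each geographic location feature can be input into a machine classification model, and the output value of the machine classification model is used as the position stability index of the user to be identified. The machine classification model is a classification model obtained in advance through training (based on historical data), and is used to predict a user's position stability index according to the user's position stability contribution coefficients under different geographic location features.
In a specific implementation, the trained machine classification model takes the position stability contribution coefficients corresponding to the various geographic location features as input values and the position stability index as the output value; this position stability index reflects the stability of the location where the user to be identified resides. For the training of the machine classification model, see the description of Embodiment 2 below.
S103: Perform risk identification on the user to be identified based on the determined position stability index of the user to be identified.
In a specific implementation, the value of the position stability index reflects the stability of the location where the user to be identified resides. For example, if the position stability index takes values in [0, 1], then the closer the index is to 1, the more stable the user's resident location. When performing risk identification on the user to be identified, the user's position stability index can be taken into account; for example, if the position stability index is greater than a set threshold, the user to be identified is considered a secure user, and otherwise a risk user. In practice, other information besides location can also be combined to judge comprehensively whether the user to be identified is a risk user, such as the user's everyday credit record.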
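Embodiment 2
As shown in FIG. 2, the flowchart of the risk identification method provided in Embodiment 2 of the present application includes the following steps:
S201: The server acquires, for each of a plurality of sample users, the feature values under a plurality of preset geographic location features; the plurality of sample users include multiple sample users of the security type and multiple sample users of the risk type.
In a specific implementation, the server can collect the network interaction information of sample users (users whose sample user type is known, for example known secure users or risk users; the sample user type can be confirmed based on evaluation information, complaint information and the like about the sample user), and extract user location information from it, such as the geographic locations where the user works, studies, lives and spends leisure time. Based on the extracted user location information, a plurality of geographic location features are determined. Here each geographic location feature reflects the stability of the geographic location where the user resides, for example the monthly average number of different resident cities, the proportion of cities resided in for more than 12 months, the number of all cities appearing in the last 2 years, and the resident probability of the current resident city in the last two years.
S202: For each geographic location feature, determine the position stability contribution coefficient of each sample user under that geographic location feature according to the feature value interval to which the sample user's feature value under that geographic location feature belongs and the position stability contribution coefficient corresponding to each feature value interval of that geographic location feature; wherein the position stability contribution coefficient corresponding to each feature value interval characterizes the difference between the ratio of the number of security-type sample users to the number of risk-type sample users having a feature value in that interval, and the ratio of the total number of security-type sample users to the total number of risk-type sample users among the acquired plurality of sample users.
In the embodiments of the present application, the feature values under each geographic location feature are divided into multiple feature value intervals (for example, the number of all cities appearing in the last 2 years is divided into 4 intervals: 0 to 3 cities, 4 to 7 cities, 8 to 12 cities, and more than 12 cities). Each feature value interval corresponds to one position stability contribution coefficient, and different feature value intervals under one geographic location feature correspond to different position stability contribution coefficients. The position stability contribution coefficient characterizes the difference between the distribution of security-type and risk-type sample users in that feature value interval and the overall distribution of security-type and risk-type sample users. That is, the larger the first ratio (between the numbers of security-type and risk-type sample users in a feature value interval) is relative to the second ratio (between the overall numbers of security-type and risk-type sample users), the greater the contribution of that feature value interval to position stability, and hence the larger its position stability contribution coefficient. Specifically, the position stability contribution coefficient of any feature value interval can be measured by the ratio between the first ratio and the second ratio, or by the natural logarithm of that ratio; see the description of Embodiment 3 for details.
S203: Train the machine classification model according to the position stability contribution coefficient of each of the plurality of sample users under each geographic location feature and the sample user type of each sample user; wherein the position stability contribution coefficient of any sample user under each geographic location feature is an input value of the machine classification model, and the position stability index corresponding to the sample user type of that sample user is the output value of the machine classification model.
This step is the process of training the machine classification model. For example, if the machine classification model is a logistic regression model, this step is the process of determining the logistic regression coefficients in the logistic regression model. The machine classification model takes the user's position stability contribution coefficient under each geographic location feature as input values, and its output value is the user's position stability index; the position stability index here is used to measure the stability of the location.
In this step, the machine classification model needs to be trained on the information of multiple sample users; in general, the more sample users there are, the more accurate the trained machine classification model. As one kind of machine classification model, the logistic regression model is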
Figure PCTCN2016085935-appb-000002
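where Index represents the position stability index, θi is a logistic regression coefficient, fi is the feature value under the i-th geographic location feature, f0=1, and n is the number of geographic location features. The process of training the logistic regression model in this step is the process of determining each logistic regression coefficient.
S204: For any user to be identified, determine the position stability contribution coefficient of that user under each geographic location feature according to the feature value of the user under each geographic location feature and the position stability contribution coefficient corresponding to each feature value interval of each geographic location feature.
When the trained machine classification model is applied to identify a user's position stability, the feature value interval to which the user's feature value under each geographic location feature belongs is determined first, and the position stability contribution coefficient corresponding to that interval is used as the user's position stability contribution coefficient under that geographic location feature.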
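The interval lookup described for S204 can be sketched as follows. The interval boundaries mirror the example of 0 to 3, 4 to 7, 8 to 12, and more than 12 cities; the coefficient values in the usage note and the function name are hypothetical.

```python
from bisect import bisect_left


def lookup_contribution(value, upper_bounds, coefficients):
    """Return the contribution coefficient of the interval containing value.

    upper_bounds: ascending inclusive upper bounds of all intervals except
    the last one, which is unbounded above.
    coefficients: one contribution coefficient per interval, so it has
    exactly one more entry than upper_bounds.
    """
    if len(coefficients) != len(upper_bounds) + 1:
        raise ValueError("need one coefficient per interval")
    # bisect_left returns the index of the first bound >= value, which is
    # the interval index because the bounds are inclusive.
    return coefficients[bisect_left(upper_bounds, value)]
```

With `upper_bounds = [3, 7, 12]` and hypothetical coefficients `[0.9, 0.3, -0.4, -1.1]`, a user seen in 5 cities falls in the second interval and gets 0.3, while a user seen in 13 cities falls in the last interval and gets -1.1.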
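S205: Input the location stability contribution coefficients of the user to be identified under the geographic location features into the machine classification model, and take the model's output value as that user's location stability index, which measures the stability of the user's residence location.
S206: Perform risk identification on the user to be identified based on the determined location stability index.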
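Embodiment 2 above explained that the feature values of each geographic location feature need to be divided into different feature value intervals, each corresponding to one location stability contribution coefficient. A specific implementation therefore has to decide how the intervals are divided. The guiding principle is to place the feature values of users with high location stability (safe-type sample users) and those of users with low location stability (risk-type sample users) in different intervals as far as possible. The intervals can be divided manually based on experience, or automatically by the server. Embodiment 3 below gives a specific way for the server to divide them automatically, together with a specific way to determine the location stability contribution coefficient of each interval.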
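Embodiment 3
FIG. 3 is a flowchart of the risk identification method provided in Embodiment 3 of this application. The method includes the following steps:
S301: The server obtains, for each of a plurality of sample users, the feature values under a plurality of preset geographic location features; the plurality of sample users include a plurality of safe-type sample users and a plurality of risk-type sample users.
S302: Determine the feature value intervals of each geographic location feature according to the following steps:
Take each feature value under the geographic location feature as one feature value interval (here, the feature values of a geographic location feature may be the distinct values collected from the sample users' feature values under that feature);
Determine the chi-square value of each pair of currently adjacent feature value intervals, and merge the pair of adjacent intervals with the smallest chi-square value; repeat this step until the number of feature value intervals under the feature reaches a preset number. For a pair of adjacent intervals, the chi-square value characterizes the difference between the proportions of sample users of different types whose feature values fall in one interval and the corresponding proportions for the other interval.
The basic idea of this implementation is as follows: each feature value first forms its own interval, and then, among the currently adjacent intervals, the pair with the smallest chi-square value is merged. That pair is the one whose distributions of sample user types are closest, so merging it does not violate the principle of keeping different types of sample users in different intervals as far as possible (that is, of making the type distributions in different intervals as different as possible).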
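In a specific implementation, the chi-square value can be determined according to the following formula:
$$\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(A_{ij} - E_{ij})^2}{E_{ij}}$$
where
$$E_{ij} = \frac{\left(\sum_{j'=1}^{2} A_{ij'}\right)\left(\sum_{i'=1}^{2} A_{i'j}\right)}{N}$$
Aij denotes, for a pair of adjacent feature value intervals, the number of sample users of the j-th type whose feature values fall in the i-th interval; Eij denotes the expected value of that number; and N is the total number of sample users whose feature values fall in the pair of adjacent intervals.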
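The merging procedure of S302 can be sketched as below, under the assumption that the expected count of a cell is the usual row total times column total divided by N. The function names and the sample counts are illustrative, and at least one sample user is assumed in every pair of intervals.

```python
def chi_square(a, b):
    """Chi-square value of two adjacent intervals.

    a, b: (safe_count, risk_count) for each of the two intervals.
    """
    n = sum(a) + sum(b)
    col_totals = [a[0] + b[0], a[1] + b[1]]
    chi2 = 0.0
    for row in (a, b):
        row_total = sum(row)
        for j in range(2):
            expected = row_total * col_totals[j] / n
            if expected > 0:  # skip empty cells to avoid dividing by zero
                chi2 += (row[j] - expected) ** 2 / expected
    return chi2

def chi_merge(values, counts, max_intervals):
    """Merge adjacent intervals until only `max_intervals` remain.

    values: sorted distinct feature values.
    counts: (safe_count, risk_count) observed for each value.
    Returns a list of (values_in_interval, [safe_count, risk_count]).
    """
    intervals = [([v], list(c)) for v, c in zip(values, counts)]
    while len(intervals) > max_intervals:
        scores = [chi_square(intervals[i][1], intervals[i + 1][1])
                  for i in range(len(intervals) - 1)]
        i = scores.index(min(scores))  # most similar adjacent pair
        merged_values = intervals[i][0] + intervals[i + 1][0]
        merged_counts = [x + y for x, y in zip(intervals[i][1], intervals[i + 1][1])]
        intervals[i:i + 2] = [(merged_values, merged_counts)]
    return intervals

# Illustrative data: values 1 and 2 are dominated by safe users, 3 and 4 by risk users.
result = chi_merge([1, 2, 3, 4], [(90, 10), (88, 12), (40, 60), (35, 65)], 2)
```

In this toy data the two safe-dominated values end up in one interval and the two risk-dominated values in the other, which matches the stated goal of keeping users of different types in different intervals.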
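S303: Determine one location stability contribution coefficient for each feature value interval of each geographic location feature determined in S302.
Specifically, the location stability contribution coefficient can be determined in one of the following ways:
Method 1: According to each sample user's feature value under the geographic location feature, determine a first ratio between the number of safe-type sample users and the number of risk-type sample users whose feature values fall in the given interval, and a second ratio between the total number of safe-type sample users and the total number of risk-type sample users among the plurality of sample users; determine the location stability contribution coefficient of the interval according to the ratio of the first ratio to the second ratio.
Method 2: According to each sample user's feature value under the geographic location feature, determine a third ratio between the number of safe-type sample users whose feature values fall in the given interval and the total number of safe-type sample users among the plurality of sample users, and a fourth ratio between the number of risk-type sample users whose feature values fall in the given interval and the total number of risk-type sample users among the plurality of sample users; determine the location stability contribution coefficient of the interval according to the ratio of the third ratio to the fourth ratio.
Specifically, the location stability contribution coefficient WOE of the interval can be determined by the following formula:
WOE=ln(P1/P0);
where P1 denotes the first ratio and P0 the second ratio; or P1 denotes the third ratio and P0 the fourth ratio.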
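As a small worked sketch of the WOE formula, the function below uses the third and fourth ratios of Method 2. The function name and the counts in the example are hypothetical, and both counts in the interval are assumed to be non-zero, since the logarithm is undefined otherwise.

```python
import math

def woe(safe_in_bin, risk_in_bin, safe_total, risk_total):
    """WOE = ln(P1 / P0) with P1, P0 taken as the third and fourth ratios."""
    p1 = safe_in_bin / safe_total  # share of all safe users that fall in this interval
    p0 = risk_in_bin / risk_total  # share of all risk users that fall in this interval
    return math.log(p1 / p0)

# Illustrative counts: 80 of 100 safe users and 20 of 100 risk users in the interval.
example = woe(80, 20, 100, 100)
```

An interval where safe users are over-represented gets a positive WOE, and an interval that mirrors the overall distribution gets a WOE of 0.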
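In a specific implementation, after the feature values under each geographic location feature have been discretized into feature value intervals, the location stability contribution coefficient WOE of each interval is determined as described above in order to quantify how much the intervals of different features differ in their contribution to location stability. As a result, not only the intervals of the same feature but also the intervals of different features can be compared directly on a quantitative basis. For example, after the feature "number of all cities appearing" is discretized into four intervals (0-3 cities, 4-7 cities, 8-12 cities, more than 12 cities), one WOE value is computed for each interval, and these WOE values are comparable with the WOE values of the intervals of other geographic location features.
In a specific implementation, any simple variant of Method 1 and Method 2 falls within the scope of protection of the embodiments of this application. For example, one may determine a first product of the number of safe-type sample users whose feature values fall in the given interval and the total number of risk-type sample users among the plurality of sample users, and a second product of the number of risk-type sample users whose feature values fall in the given interval and the total number of safe-type sample users among the plurality of sample users, and then determine the location stability contribution coefficient WOE from the ratio of the first product to the second product. In that case, P1 in the above formula denotes the first product and P0 the second product.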
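It may help to note that Method 1, Method 2 and the product variant all give the same value of P1/P0. The short derivation below uses symbols introduced only for this illustration: $s$ and $r$ for the numbers of safe-type and risk-type sample users in the interval, and $S$ and $R$ for the corresponding totals.

```latex
\begin{aligned}
\text{Method 1:}\quad & \frac{P_1}{P_0} = \frac{s/r}{S/R} = \frac{s\,R}{r\,S} \\
\text{Method 2:}\quad & \frac{P_1}{P_0} = \frac{s/S}{r/R} = \frac{s\,R}{r\,S} \\
\text{Product variant:}\quad & \frac{P_1}{P_0} = \frac{s \cdot R}{r \cdot S} \\
\text{hence}\quad & WOE = \ln\frac{s\,R}{r\,S} \ \text{in all three cases.}
\end{aligned}
```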
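S304: For each geographic location feature, determine each sample user's location stability contribution coefficient under that feature according to the feature value interval to which the sample user's feature value belongs and the location stability contribution coefficient corresponding to each feature value interval of that feature.
In this step, for each geographic location feature, the sample user's location stability contribution coefficient is determined from the correspondence between feature value intervals and location stability contribution coefficients established in S303 and from the interval to which the sample user's feature value belongs.
S305: Train the machine classification model according to each sample user's location stability contribution coefficient under each geographic location feature and each sample user's type. The location stability contribution coefficients of a sample user under the geographic location features are the input values of the machine classification model, and the location stability index corresponding to that sample user's type is its output value; the location stability index measures the stability of the location.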
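The machine classification model used in this embodiment may be a logistic regression model, namely:
$$Index = \frac{1}{1 + e^{-\sum_{i=0}^{n} \theta_i f_i}}$$
where Index denotes the location stability index, θi are the logistic regression coefficients (the coefficients to be trained in S305), fi is the feature value under the i-th geographic location feature, f0=1, and n is the number of geographic location features.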
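In a specific implementation, different geographic location features may be correlated with each other, and such correlation can make the model parameter values inconsistent with the business understanding. For example, suppose a logistic regression model is used to predict a user's income from two features, age and education. From a business point of view, income tends to increase with age and also with education. The trained logistic regression coefficient for age may nevertheless be negative. The reason for this inconsistency is that age and education are correlated to some degree, and education correlates with income more strongly than age does; education suppresses age, so a negative logistic regression coefficient appears. For this reason, the embodiments of this application use principal component analysis (PCA) to apply a linear transformation to the original geographic location features, i.e. to reduce their dimensionality, so that strongly correlated geographic location features do not take part in the location stability analysis at the same time.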
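Specifically, the logistic regression model after PCA processing is:
Figure PCTCN2016085935-appb-000006
where θ′i are the logistic regression coefficients,
Figure PCTCN2016085935-appb-000007
f′i is the i-th feature obtained by linearly transforming the geographic location features, m is the number of features after the linear transformation, wk is the coefficient of fk in the linear transformation, fk is the feature value under the k-th geographic location feature, n is the number of geographic location features, and m<n.
After the above PCA processing, the original logistic regression coefficient of each geographic location feature can be recovered to make the result easier to interpret from a business point of view: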
Figure PCTCN2016085935-appb-000008
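One way to read the recovery step is sketched below. The sketch assumes that each transformed feature is a weighted sum of the original features, that the model keeps an intercept term, and that the weights come from a PCA fitted elsewhere; the per-component weight matrix `w` and the function name are assumptions for illustration, and any centering applied by PCA is ignored because it would only shift the intercept.

```python
def recover_coefficients(theta_prime, w):
    """Map coefficients on transformed features back to the original features.

    theta_prime: [theta'_0, theta'_1, ..., theta'_m], theta'_0 being the intercept.
    w:           m rows of n weights; w[i][k] is the weight of original feature k
                 in transformed feature i (assumed to come from PCA).
    Returns [theta_0, theta_1, ..., theta_n] for the original features.
    """
    n = len(w[0])
    theta = [theta_prime[0]]  # the intercept is unchanged
    for k in range(n):
        theta.append(sum(theta_prime[i + 1] * w[i][k] for i in range(len(w))))
    return theta

# Illustrative numbers: 3 original features reduced to 2 transformed features.
w = [[0.6, 0.8, 0.0],
     [0.0, 0.0, 1.0]]
theta_prime = [0.5, 2.0, -1.0]
theta = recover_coefficients(theta_prime, w)
```

Because the transformation is linear, the linear predictor computed from the original features with the recovered coefficients equals the one computed from the transformed features.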
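S306: For any user to be identified, determine that user's location stability contribution coefficient under each geographic location feature according to the user's feature value under each geographic location feature and the location stability contribution coefficients corresponding to the feature value intervals of each feature.
S307: Input the location stability contribution coefficients of the user to be identified under the geographic location features into the trained machine classification model, and take the model's output value as that user's location stability index, which measures the stability of the user's residence location.
S308: Perform risk identification on the user to be identified based on the determined location stability index.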
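Embodiment 4
Embodiment 4 further provides a step of screening the geographic location features before the model is trained.
FIG. 4 is a flowchart of the risk identification method provided in Embodiment 4 of this application. The method includes the following steps:
S401: The server obtains, for each of a plurality of sample users, the feature values under a plurality of preset geographic location features; the plurality of sample users include a plurality of safe-type sample users and a plurality of risk-type sample users.
S402: For each geographic location feature, determine each sample user's location stability contribution coefficient under that feature according to the feature value interval to which the sample user's feature value belongs and the location stability contribution coefficient corresponding to each feature value interval of that feature. The location stability contribution coefficient of a feature value interval characterizes the difference between (a) the ratio of the number of safe-type sample users to the number of risk-type sample users whose feature values fall in that interval and (b) the ratio of the total number of safe-type sample users to the total number of risk-type sample users among the obtained sample users.
S403: According to the correlation coefficients between different geographic location features, determine each pair of geographic location features whose correlation coefficient is greater than a set threshold.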
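Specifically, the correlation coefficient between different geographic location features can be determined by the following formula:
$$r = \frac{\sum_{i=1}^{\lambda} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{\lambda} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{\lambda} (Y_i - \bar{Y})^2}}$$
where λ is the number of sample users, Xi is the feature value of the i-th sample user under a geographic location feature X, $\bar{X}$ is the mean of all sample users' feature values under X, Yi is the feature value of the i-th sample user under another geographic location feature Y, and $\bar{Y}$ is the mean of all sample users' feature values under Y.
For example, the set threshold for the correlation coefficient may be 0.6; when the correlation coefficient between two geographic location features is greater than 0.6, one of the two has to be screened out.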
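A minimal sketch of S403 follows, assuming the standard Pearson correlation coefficient and the example threshold of 0.6. The feature names and values are made up, and every feature is assumed to vary across users, since a constant feature would make the denominator zero.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equally long value lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlated_pairs(features, threshold=0.6):
    """Return the pairs of feature names whose correlation exceeds the threshold.

    features: dict mapping a feature name to its values, one per sample user.
    """
    return [(a, b) for a, b in combinations(features, 2)
            if pearson(features[a], features[b]) > threshold]

# Illustrative feature values for four sample users.
features = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 9], "c": [4, 1, 3, 2]}
pairs = correlated_pairs(features)
```

Only the pair ("a", "b") is flagged here; one of its two features would then be screened out in the next step.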
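S404: For each pair of geographic location features whose correlation coefficient is greater than the set threshold, select one feature of the pair for training the machine classification model, according to the location stability contribution coefficients corresponding to the feature value intervals of each feature in the pair.
In this step, for each pair of strongly correlated geographic location features, the feature that contributes less to location stability is screened out. In a specific implementation, the screening can be based directly on the location stability contribution coefficient WOE: for example, for each pair of features whose correlation coefficient is greater than the set threshold, the sum of the WOE values of the feature value intervals is determined for each feature, and the feature with the smaller sum is screened out. Preferably, the features can also be screened according to the following steps: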
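Specifically, determine the contribution value IV of each geographic location feature according to
$$IV = \sum_{k=1}^{q} (P_{1k} - P_{0k}) \cdot WOE(k)$$
where, for any geographic location feature, P1k denotes the ratio of the number of safe-type sample users whose feature values fall in the k-th feature value interval to the total number of safe-type sample users among the obtained sample users, P0k denotes the ratio of the number of risk-type sample users whose feature values fall in the k-th feature value interval to the total number of risk-type sample users among the obtained sample users, WOE(k) denotes the location stability contribution coefficient of the k-th interval, and q is the number of feature value intervals of the feature;
For the two features in the pair, determine the one with the smallest contribution value IV, and take that feature as the one screened out of the pair, so that the other feature is kept for training.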
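In the above implementation, it is taken into account that when the total number of sample users in a feature value interval is small, the WOE value may not reflect the interval's contribution to location stability objectively (for example, when an interval contains few sample users, the ratio of safe-type to risk-type sample users may be large, yet the interval cannot simply be regarded as contributing strongly to location stability). Therefore, when the IV value is determined, the WOE value is multiplied by the difference between the probabilities with which safe-type and risk-type sample users respectively appear in that interval.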
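The IV-based screening can be sketched as follows, assuming IV is the sum over intervals of (P1k - P0k) times WOE(k) and that the feature with the smaller IV in a correlated pair is the one dropped. The feature names and counts are hypothetical, and all interval counts are assumed to be non-zero.

```python
import math

def information_value(bins, safe_total, risk_total):
    """IV of one feature.

    bins: list of (safe_count, risk_count), one entry per feature value interval.
    """
    iv = 0.0
    for safe, risk in bins:
        p1 = safe / safe_total  # share of safe users in this interval
        p0 = risk / risk_total  # share of risk users in this interval
        iv += (p1 - p0) * math.log(p1 / p0)  # (P1k - P0k) * WOE(k)
    return iv

def feature_to_drop(pair, bins_by_feature, safe_total, risk_total):
    """Of two correlated features, return the one with the smaller IV."""
    return min(pair, key=lambda name: information_value(
        bins_by_feature[name], safe_total, risk_total))

# Illustrative counts for 253 safe and 147 risk sample users.
bins_by_feature = {
    "city_count": [(178, 22), (75, 125)],       # separates the two types well
    "months_observed": [(130, 70), (123, 77)],  # barely separates them
}
dropped = feature_to_drop(("city_count", "months_observed"),
                          bins_by_feature, 253, 147)
```

Each term of the sum is non-negative, so a feature whose intervals mirror the overall distribution has an IV close to 0 and is the natural candidate to drop.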
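S405: Train the machine classification model according to each sample user's location stability contribution coefficient under each selected geographic location feature and each sample user's type. The location stability contribution coefficients of a sample user under the geographic location features are the input values of the machine classification model, and the location stability index corresponding to that sample user's type is its output value; the location stability index measures the stability of the location.
S406: For any user to be identified, determine that user's location stability contribution coefficient under each geographic location feature according to the user's feature value under each geographic location feature and the location stability contribution coefficients corresponding to the feature value intervals of each feature.
S407: Input the location stability contribution coefficients of the user to be identified under the geographic location features into the machine classification model, and take the model's output value as that user's location stability index, which measures the stability of the user's residence location.
S408: Perform risk identification on the user to be identified based on the determined location stability index.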
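Embodiment 5
Building on Embodiments 2 to 4 above, the idea of this application is further explained below through a specific embodiment.
FIG. 5 is a flowchart of the risk identification method provided in Embodiment 5 of this application. The method includes:
S501: The server obtains, for each of a plurality of sample users, the feature values under a plurality of preset geographic location features; the plurality of sample users include a plurality of safe-type sample users and a plurality of risk-type sample users.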
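For example, three broad categories of geographic location features can be obtained: resident-city stability features, features describing the distribution of cities by how often they appear, and stability features of the current resident city. The resident-city stability features may include: the monthly average number of distinct resident cities (the number of all resident cities within the statistical period, e.g. 2 years, divided by the number of months in that period), the mean of the monthly resident-city probability (the mean of the user's residence probabilities over all resident cities), the variance of the monthly resident-city probability (the variance of the user's residence probabilities over all resident cities), and so on. The city distribution features may include: the number of all cities the user has resided in, the proportion of cities resided in for 1-3 months, the proportion of cities resided in for 4-6 months, the proportion of cities resided in for 7-12 months, the proportion of cities resided in for 13-24 months, the total number of months for which the user's residence location was recorded, and so on. The stability features of the current resident city may include: the user's current residence probability in the current resident city, the number of months in which the current resident city was the resident city, and the mean and the variance of the user's residence probability in the current resident city over those months, and so on.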
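All of the above geographic location features involve the resident city, which is the city in which the user stayed longest during a set period, for example a given month. In a specific implementation, the user's residence probability in each city can be determined from the number of days the user stayed in each city and the number of all cities in which the user might reside, and the city with the largest residence probability is selected as the resident city. For example, the residence probability of a city can be calculated as:
Figure PCTCN2016085935-appb-000013
where E denotes the expected number of days of residence in that city during the set period (e.g. a given month), e1 denotes the expected number of days of residence in the i-th non-appearing city (a city in which the user might reside but for which no residence was recorded), e2 denotes the expected number of days of residence in the j-th city of residence, CNT is the number of days the user stayed in that city, L is the length of the set period, e.g. 30 days, M is the total number of cities in which the user might reside, e.g. M=12 (the 99th percentile of the total number of cities in which users might reside), N is the total number of cities the user resided in during the set period, and CNTj is the number of days the user stayed in the j-th city.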
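The probability formula itself is not reproduced in this text, so the sketch below does not implement it. As a simplified stand-in, it picks the resident city as the city with the most recorded days in the period and reports the plain share of days spent there; this share is an assumption for illustration and is not the residence probability defined by the source.

```python
from collections import Counter

def resident_city(daily_cities):
    """Return (city, share_of_days) for the city with the most recorded days.

    daily_cities: one city name per recorded day in the set period.
    The share is a naive stand-in, not the source's residence probability.
    """
    counts = Counter(daily_cities)
    city, days = counts.most_common(1)[0]
    return city, days / len(daily_cities)

# Illustrative month: 20 days in Hangzhou, 8 in Shanghai, 2 in Beijing.
month = ["Hangzhou"] * 20 + ["Shanghai"] * 8 + ["Beijing"] * 2
city, share = resident_city(month)
```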
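S502: For each geographic location feature, perform the following: take each feature value under the feature as one feature value interval; determine the chi-square value of each pair of currently adjacent feature value intervals, and merge the pair of adjacent intervals with the smallest chi-square value; repeat this step until the number of feature value intervals under the feature reaches a preset number.
Specifically, the chi-square value is determined according to the following formula:
$$\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(A_{ij} - E_{ij})^2}{E_{ij}}$$
where
$$E_{ij} = \frac{\left(\sum_{j'=1}^{2} A_{ij'}\right)\left(\sum_{i'=1}^{2} A_{i'j}\right)}{N}$$
Aij denotes, for a pair of adjacent feature value intervals, the number of sample users of the j-th type whose feature values fall in the i-th interval; Eij denotes the expected value of that number; and N is the total number of sample users whose feature values fall in the pair of adjacent intervals.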
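S503: Determine one location stability contribution coefficient for each feature value interval of each geographic location feature determined in S502.
For the specific way of determining the location stability contribution coefficient WOE, see the description of S303 in Embodiment 3 above; it is not repeated here.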
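S504: According to the correlation coefficients between different geographic location features, determine each pair of geographic location features whose correlation coefficient is greater than a set threshold.
Specifically, the correlation coefficient between different geographic location features can be determined by the following formula:
$$r = \frac{\sum_{i=1}^{\lambda} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{\lambda} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{\lambda} (Y_i - \bar{Y})^2}}$$
where λ is the number of sample users, Xi is the feature value of the i-th sample user under a geographic location feature X, $\bar{X}$ is the mean of all sample users' feature values under X, Yi is the feature value of the i-th sample user under another geographic location feature Y, and $\bar{Y}$ is the mean of all sample users' feature values under Y.
For example, the set threshold for the correlation coefficient may be 0.6; when the correlation coefficient between two geographic location features is greater than 0.6, one of the two has to be screened out.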
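S505: For each pair of geographic location features whose correlation coefficient is greater than the set threshold, select one feature of the pair to be used in determining the logistic regression coefficients of the logistic regression model, according to the location stability contribution coefficients corresponding to the feature value intervals of each feature in the pair.
Specifically, determine the contribution value IV of each geographic location feature according to
$$IV = \sum_{k=1}^{q} (P_{1k} - P_{0k}) \cdot WOE(k)$$
where, for any geographic location feature, P1k denotes the ratio of the number of safe-type sample users whose feature values fall in the k-th feature value interval to the total number of safe-type sample users among the obtained sample users, P0k denotes the ratio of the number of risk-type sample users whose feature values fall in the k-th feature value interval to the total number of risk-type sample users among the obtained sample users, WOE(k) denotes the location stability contribution coefficient of the k-th interval, and q is the number of feature value intervals of the feature. For the two features in the pair, determine the one with the smallest contribution value IV, and take that feature as the one screened out of the pair, so that the other feature is kept.
For example, after the above screening, 11 geographic location features are selected: the monthly average number of distinct resident cities, the mean of the monthly resident-city probability, the variance of the monthly resident-city probability, the number of all cities the user has resided in, the proportion of cities resided in for 1-3 months, the proportion of cities resided in for 4-6 months, the proportion of cities resided in for 13-24 months, the total number of months for which the user's residence location was recorded, the user's current residence probability in the current resident city, the number of months in which the current resident city was the resident city, and the variance of the user's residence probability in the current resident city over the months in which it was the resident city.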
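S506: For each selected geographic location feature, determine each sample user's location stability contribution coefficient under that feature according to the feature value interval to which the sample user's feature value belongs and the location stability contribution coefficient corresponding to each feature value interval of that feature.
In this step, for each selected geographic location feature, the sample user's location stability contribution coefficient is determined from the correspondence between feature value intervals and location stability contribution coefficients established in S503 and from the interval to which the sample user's feature value belongs.
S507: Determine the logistic regression coefficients of the logistic regression model according to each sample user's location stability contribution coefficient under each selected geographic location feature and each sample user's type. The location stability contribution coefficients of a sample user under the geographic location features are the input values of the logistic regression model, and the location stability index corresponding to that sample user's type is its output value.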
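Specifically, the logistic regression model obtained with PCA processing is:
Figure PCTCN2016085935-appb-000020
where θ′i are the logistic regression coefficients,
Figure PCTCN2016085935-appb-000021
f′i is the i-th feature obtained by linearly transforming the geographic location features, m is the number of features after the linear transformation, wk is the coefficient of fk in the linear transformation, fk is the feature value under the k-th geographic location feature, n is the number of geographic location features, and m<n.
After the above PCA processing, the original logistic regression coefficient of each geographic location feature can be recovered to make the result easier to interpret from a business point of view: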
Figure PCTCN2016085935-appb-000022
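S508: For any user to be identified, determine that user's location stability contribution coefficient under each geographic location feature according to the user's feature value under each geographic location feature and the location stability contribution coefficients corresponding to the feature value intervals of each feature.
S509: Input the location stability contribution coefficients of the user to be identified under the geographic location features into the logistic regression model, and take the model's output value as that user's location stability index, which measures the stability of the user's residence location.
S510: Perform risk identification on the user to be identified based on the determined location stability index.
The location stability contribution coefficients of the user to be identified are input into the trained logistic regression model, and the model's output value is the user's location stability index, which characterizes the stability of the user's location. The index can be taken into account when performing risk identification on the user: for example, if the location stability index is greater than a set threshold, the user is regarded as a safe user, and otherwise as a risk user.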
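The threshold rule in this paragraph reduces to a single comparison. The sketch below assumes a threshold of 0.5 purely for illustration; the source does not state a value.

```python
def classify(index, threshold=0.5):
    """Label a user from the location stability index.

    threshold=0.5 is an assumed value; the source only speaks of a set threshold.
    """
    return "safe" if index > threshold else "risk"
```

In practice the threshold would be tuned on labeled sample users, and the result could be combined with other signals such as the credit record mentioned earlier.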
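In FIG. 6(a), curve A is the distribution of the location stability index for risk users and curve B is the distribution for safe users; the horizontal axis is the location stability index and the vertical axis is the distribution density (indicating the number of users). The figure shows that the higher a user's location stability index, the lower the probability that the user is a risk user. FIG. 6(b) shows the distribution curves of the location stability index for users with high credit scores (curve C), medium credit scores (curve D) and low credit scores (curve E); it shows that the higher a user's location stability index, the better the user's credit tends to be.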
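Based on the same inventive concept, the embodiments of this application also provide a risk identification device corresponding to the risk identification method. Because the device solves the problem on a principle similar to that of the method, its implementation can refer to the implementation of the method, and repeated details are omitted.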
实施例六
如图7所示,为本申请实施例提供的风险识别装置结构示意图,包括:
第一确定模块71,用于根据待识别用户在至少一种地理位置特征下的特征值,以及每种地理位置特征的各个特征值区间分别对应的位置稳定性贡献系数,确定该待识别用户在每种地理位置特征下的位置稳定性贡献系数;
第二确定模块72,用于根据所述待识别用户在每种地理位置特征下的位置稳定性贡献系数,确定所述待识别用户的位置稳定性指数,所述位置稳定性指数用于衡量所述待识别用户驻留位置的稳定性;
识别模块73,用于基于第二确定模块72确定的所述待识别用户的位置稳定性指数,对该待识别用户进行风险识别。
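Optionally, the second determining module 72 is specifically configured to:
input the location stability contribution coefficients of the user to be identified under the geographic location features into the machine classification model, and determine the model's output value as that user's location stability index; the machine classification model is a classification model obtained in advance through training and is used to predict a user's location stability index from the user's location stability contribution coefficients under different geographic location features.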
可选地,所述装置还包括:
模型训练模块74,用于在所述第二确定模块72将待识别用户在每种地理位置特征下的位置稳定性贡献系数输入机器分类模型之前,获取多个样本用户中,每个样本用户在预设的多种地理位置特征下的特征值;所述多个样本用户包括多个安全类型的样本用户和多个风险类型的样本用户;针对每种地理位置特征,根据每个样本用户在该种地理位置特征下的特征值所属的特征值区间,以及该种地理位置特征的每个特征值区间对应的位置稳定性贡献系数,确定每个样本用户在该种地理位置特征下的位置稳定性贡献系数;根据所述多个样本用户中每个样本用户在每种地理位置特征下的位置稳定性贡献系数,以及每个样本用户的样本用户类型,训练出所述机器分类模型;其中,任一样本用户在每种地理位置特征下的位置稳定性贡献系数为所述机器分类模型的输入值,该样本用户的样本用户类型对应的位置稳定性指数为所述机器分类模型的输出值。
可选地,针对每种地理位置特征,所述模型训练模块74具体用于根据以下步骤确定该种地理位置特征的任一特征值区间对应的位置稳定性贡献系数:
根据所述多个样本用户中每个样本用户在该种地理位置特征下的特征值,确定具有该任一特征值区间下的特征值的、安全类型的样本用户数目与风险类型的样本用户数目之间的第一比值,以及所述多个样本用户中安全类型的样本用户总数与风险类型的样本用户总数之间的第二比值;根据所述第一比值和第二比值之间的比值,确定所述任一特征值区间对应的位置稳定性贡献系数;或者,
根据所述多个样本用户中每个样本用户在该种地理位置特征下的特征值,确定具有该任一特征值区间下的特征值的、安全类型的样本用户数目与所述多个样本用户中安全类型的样本用户总数之间的第三比值,以及,具有该任一特征值区间下的特征值的、风险类型的样本用户数目与所述多个样本用户中风险类型的样本用户总数之间的第四比值;根据所述第三比值和第四比值之间的比值,确定所述任一特征值区间对应的位置稳定性贡献系数。
可选地,所述模型训练模块74具体用于根据以下公式确定所述任一特征值区间对应的位置稳定性贡献系数WOE:
WOE=ln(P1/P0);
其中,P1表示所述第一比值,P0表示所述第二比值;或者,P1表示所述第三比值,P0表示所述第四比值。
可选地,所述模型训练模块74具体用于根据以下步骤确定任一种地理位置特征的各个特征值区间:
将该种地理位置特征下的每个特征值作为一个特征值区间;
确定当前每一对相邻的特征值区间的卡方值,将确定的最小的卡方值所对应的一对相邻的特征值区间进行合并;重复该步骤,直到该种地理位置特征下的特征值区间数目达到预设区间数目;其中,所述卡方值用于表征针对一对相邻的特征值区间,具有其中一个特征值区间下的特征值的、不同类型的样本用户数目占比,与具有另一个特征值区间下的特征值的、不同类型的样本用户数目占比之间的差异。
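Optionally, the model training module 74 is specifically configured to determine the chi-square value according to the following formula:
$$\chi^2 = \sum_{i=1}^{2} \sum_{j=1}^{2} \frac{(A_{ij} - E_{ij})^2}{E_{ij}}$$
where
$$E_{ij} = \frac{\left(\sum_{j'=1}^{2} A_{ij'}\right)\left(\sum_{i'=1}^{2} A_{i'j}\right)}{N}$$
Aij denotes, for a pair of adjacent feature value intervals, the number of sample users of the j-th type whose feature values fall in the i-th interval; Eij denotes the expected value of that number; and N is the total number of sample users whose feature values fall in the pair of adjacent intervals.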
可选地,所述模型训练模块74具体用于,在训练机器分类模型之前,根据不同种地理位置特征之间的相关系数,以及每种地理位置特征的各个特征值区间分别对应的位置稳定性贡献系数,从所述预设的多种地理位置特征中筛选出用于训练机器分类模型的地理位置特征。
可选地,所述模型训练模块74具体用于,根据不同种地理位置特征之间的相关系数,确定相关系数大于设定阈值的各对地理位置特征;针对每一对相关系数大于设定阈值的地理位置特征,根据该对地理位置特征中,每种地理位置特征的各个特征值区间分别对应的位置稳定性贡献系数,从该对地理位置特征中筛选出一种地理位置特征用于作为训练机器分类模型的地理位置特征。
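Optionally, the model training module 74 is specifically configured to:
determine the contribution value IV of each geographic location feature according to
$$IV = \sum_{k=1}^{q} (P_{1k} - P_{0k}) \cdot WOE(k)$$
where, for any geographic location feature, P1k denotes the ratio of the number of safe-type sample users whose feature values fall in the k-th feature value interval to the total number of safe-type sample users among the obtained sample users, P0k denotes the ratio of the number of risk-type sample users whose feature values fall in the k-th feature value interval to the total number of risk-type sample users among the obtained sample users, WOE(k) denotes the location stability contribution coefficient of the k-th interval, and q is the number of feature value intervals of the feature; and, for the two features in the pair, determine the one with the smallest contribution value IV and take that feature as the one screened out of the pair.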
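Optionally, the machine classification model is:
$$Index = \frac{1}{1 + e^{-\sum_{i=0}^{n} \theta_i f_i}}$$
where Index denotes the location stability index, θi are the logistic regression coefficients, fi is the feature value under the i-th geographic location feature, f0=1, and n is the number of geographic location features.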
可选地,所述机器分类模型为:
Figure PCTCN2016085935-appb-000027
其中,θ′i为逻辑回归系数,
Figure PCTCN2016085935-appb-000028
f′i为对各种地理位置特征进行线性变换后的第i种特征,m为进行线性变换后的特征种数,wk为进行线性变换时fk的系数,fk为在第k种地理位置特征下的特征值,n为地理位置特征的种数,且m<n。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请实施例的方法、装置(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
尽管已描述了本申请的优选实施例,但本领域内的技术人员一旦得知了基本创造性概念,则可对这些实施例作出另外的变更和修改。所以,所附权利要求意欲解释为包括优选实施例以及落入本申请范围的所有变更和修改。
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的精神和范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。

Claims (18)

  1. 一种风险识别方法,其特征在于,该方法包括:
    服务器根据待识别用户在至少一种地理位置特征下的特征值,以及每种地理位置特征的各个特征值区间分别对应的位置稳定性贡献系数,确定该待识别用户在每种地理位置特征下的位置稳定性贡献系数;
    根据所述待识别用户在每种地理位置特征下的位置稳定性贡献系数,确定所述待识别用户的位置稳定性指数,所述位置稳定性指数用于衡量所述待识别用户驻留位置的稳定性;
    基于确定的所述待识别用户的位置稳定性指数,对该待识别用户进行风险识别。
  2. 如权利要求1所述的方法,其特征在于,服务器根据所述待识别用户在每种地理位置特征下的位置稳定性贡献系数,确定所述待识别用户的位置稳定性指数,包括:
    将所述待识别用户在每种地理位置特征下的位置稳定性贡献系数输入机器分类模型,将所述机器分类模型的输出值确定为该待识别用户的位置稳定性指数;所述机器分类模型是预先通过训练得到的分类模型,用于根据用户在不同种地理位置特征下的位置稳定性贡献系数,预测该用户的位置稳定性指数。
  3. 如权利要求2所述的方法,其特征在于,所述服务器根据以下步骤训练出所述机器分类模型:
    所述服务器获取多个样本用户中,每个样本用户在预设的多种地理位置特征下的特征值;所述多个样本用户包括多个安全类型的样本用户和多个风险类型的样本用户;
    针对每种地理位置特征,根据每个样本用户在该种地理位置特征下的特征值所属的特征值区间,以及该种地理位置特征的每个特征值区间对应的位置稳定性贡献系数,确定每个样本用户在该种地理位置特征下的位置稳定性贡献系数;
    根据所述多个样本用户中每个样本用户在每种地理位置特征下的位置稳定性贡献系数,以及每个样本用户的样本用户类型,训练出所述机器分类模型;其中,任一样本用户在每种地理位置特征下的位置稳定性贡献系数为所述机器分类模型的输入值,该样本用户的样本用户类型对应的位置稳定性指数为所述机器分类模型的输出值。
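  4. The method according to claim 3, wherein, for each geographic location feature, the server determines the location stability contribution coefficient corresponding to any feature value interval of the feature according to the following steps:
    according to each sample user's feature value under the geographic location feature, determining a first ratio between the number of safe-type sample users and the number of risk-type sample users whose feature values fall in the interval, and a second ratio between the total number of safe-type sample users and the total number of risk-type sample users among the plurality of sample users; and determining the location stability contribution coefficient of the interval according to the ratio of the first ratio to the second ratio; or
    according to each sample user's feature value under the geographic location feature, determining a third ratio between the number of safe-type sample users whose feature values fall in the interval and the total number of safe-type sample users among the plurality of sample users, and a fourth ratio between the number of risk-type sample users whose feature values fall in the interval and the total number of risk-type sample users among the plurality of sample users; and determining the location stability contribution coefficient of the interval according to the ratio of the third ratio to the fourth ratio.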
  5. 如权利要求4所述的方法,其特征在于,所述服务器根据以下公式确定所述任一特征值区间对应的位置稳定性贡献系数WOE:
    WOE=ln(P1/P0);
    其中,P1表示所述第一比值,P0表示所述第二比值;或者,P1表示所述第三比值,P0表示所述第四比值。
  6. 如权利要求3~5任一所述的方法,其特征在于,所述服务器根据以下步骤确定任一种地理位置特征的各个特征值区间:
    将该种地理位置特征下的每个特征值作为一个特征值区间;
    确定当前每一对相邻的特征值区间的卡方值,将确定的最小的卡方值所对应的一对相邻的特征值区间进行合并;重复该步骤,直到该种地理位置特征下的特征值区间数目达到预设区间数目;
    其中,所述卡方值用于表征针对一对相邻的特征值区间,具有其中一个特征值区间下的特征值的、不同类型的样本用户数目占比,与具有另一个特征值区间下的特征值的、不同类型的样本用户数目占比之间的差异。
  7. 如权利要求3所述的方法,其特征在于,所述服务器训练所述机器分类模型之前,还包括:
    根据不同种地理位置特征之间的相关系数,以及每种地理位置特征的各个特征值区间分别对应的位置稳定性贡献系数,从所述预设的多种地理位置特征中筛选出用于训练所述机器分类模型的地理位置特征。
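  8. The method according to claim 7, wherein selecting, from the plurality of preset geographic location features, the geographic location features used to train the machine classification model according to the correlation coefficients between different geographic location features and the location stability contribution coefficients corresponding to the feature value intervals of each geographic location feature comprises:
    according to the correlation coefficients between different geographic location features, determining each pair of geographic location features whose correlation coefficient is greater than a set threshold;
    for each pair of geographic location features whose correlation coefficient is greater than the set threshold, selecting one feature of the pair as a geographic location feature for training the machine classification model, according to the location stability contribution coefficients corresponding to the feature value intervals of each feature in the pair.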
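  9. The method according to claim 8, wherein, for each pair of geographic location features whose correlation coefficient is greater than the set threshold, screening one geographic location feature out of the pair according to the location stability contribution coefficients corresponding to the feature value intervals of each feature in the pair comprises:
    determining the contribution value IV of each geographic location feature according to
    $$IV = \sum_{k=1}^{q} (P_{1k} - P_{0k}) \cdot WOE(k)$$
    where, for any geographic location feature, P1k denotes the ratio of the number of safe-type sample users whose feature values fall in the k-th feature value interval to the total number of safe-type sample users among the obtained sample users, P0k denotes the ratio of the number of risk-type sample users whose feature values fall in the k-th feature value interval to the total number of risk-type sample users among the obtained sample users, WOE(k) denotes the location stability contribution coefficient of the k-th interval, and q is the number of feature value intervals of the feature;
    for the two features in the pair, determining the one with the smallest contribution value IV, and determining that feature as the geographic location feature screened out of the pair.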
  10. 一种风险识别装置,其特征在于,该装置包括:
    第一确定模块,用于根据待识别用户在至少一种地理位置特征下的特征值,以及每种地理位置特征的各个特征值区间分别对应的位置稳定性贡献系数,确定该待识别用户在每种地理位置特征下的位置稳定性贡献系数;
    第二确定模块,用于根据所述待识别用户在每种地理位置特征下的位置稳定性贡献系数,确定所述待识别用户的位置稳定性指数,所述位置稳定性指数用于衡量所述待识别用户驻留位置的稳定性;
    识别模块,用于基于所述第二确定模块确定的所述待识别用户的位置稳定性指数,对该待识别用户进行风险识别。
  11. 如权利要求10所述的装置,其特征在于,所述第二确定模块具体用于:
    将所述待识别用户在每种地理位置特征下的位置稳定性贡献系数输入机器分类模型,将所述机器分类模型的输出值确定为该待识别用户的位置稳定性指数;所述机器分类模型是预先通过训练得到的分类模型,用于根据用户在不同种地理位置特征下的位置稳定性贡献系数,预测该用户的位置稳定性指数。
  12. 如权利要求11所述的装置,其特征在于,所述装置还包括:
    模型训练模块,用于获取多个样本用户中,每个样本用户在预设的多种地理位置特征下的特征值;所述多个样本用户包括多个安全类型的样本用户和多个风险类型的样本用户;针对每种地理位置特征,根据每个样本用户在该种地理位置特征下的特征值所属的特征值区间,以及该种地理位置特征的每个特征值区间对应的位置稳定性贡献系数,确定每个样本用户在该种地理位置特征下的位置稳定性贡献系数;根据所述多个样本用户中每个样本用户在每种地理位置特征下的位置稳定性贡献系数,以及每个样本用户的样本用户类型,训练出所述机器分类模型;其中,任一样本用户在每种地理位置特征下的位置稳定性贡献系数为所述机器分类模型的输入值,该样本用户的样本用户类型对应的位置稳定性指数为所述机器分类模型的输出值。
  13. 如权利要求12所述的装置,其特征在于,针对每种地理位置特征,所述模型训练模块具体用于根据以下步骤确定该种地理位置特征的任一特征值区间对应的位置稳定性贡献系数:
    根据所述多个样本用户中每个样本用户在该种地理位置特征下的特征值,确定具有该任一特征值区间下的特征值的、安全类型的样本用户数目与风险类型的样本用户数目之间的第一比值,以及所述多个样本用户中安全类型的样本用户总数与风险类型的样本用户总数之间的第二比值;根据所述第一比值和第二比值之间的比值,确定所述任一特征值区间对应的位置稳定性贡献系数;或者,
    根据所述多个样本用户中每个样本用户在该种地理位置特征下的特征值,确定具有该任一特征值区间下的特征值的、安全类型的样本用户数目与所述多个样本用户中安全类型的样本用户总数之间的第三比值,以及,具有该任一特征值区间下的特征值的、风险类型的样本用户数目与所述多个样本用户中风险类型的样本用户总数之间的第四比值;根据所述第三比值和第四比值之间的比值,确定所述任一特征值区间对应的位置稳定性贡献系数。
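  14. The device according to claim 13, wherein the model training module is specifically configured to determine the location stability contribution coefficient WOE of the interval according to the following formula:
    WOE=ln(P1/P0);
    where P1 denotes the first ratio and P0 the second ratio; or P1 denotes the third ratio and P0 the fourth ratio.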
  15. 如权利要求12~14任一所述的装置,其特征在于,所述模型训练模块具体用于根据以下步骤确定任一种地理位置特征的各个特征值区间:
    将该种地理位置特征下的每个特征值作为一个特征值区间;
    确定当前每一对相邻的特征值区间的卡方值,将确定的最小的卡方值所对应的一对相邻的特征值区间进行合并;重复该步骤,直到该种地理位置特征下的特征值区间数目达到预设区间数目;其中,所述卡方值用于表征针对一对相邻的特征值区间,具有其中一个特征值区间下的特征值的、不同类型的样本用户数目占比,与具有另一个特征值区间下的特征值的、不同类型的样本用户数目占比之间的差异。
  16. 如权利要求12所述的装置,其特征在于,所述模型训练模块具体用于,在训练所述机器分类模型之前,根据不同种地理位置特征之间的相关系数,以及每种地理位置特征的各个特征值区间分别对应的位置稳定性贡献系数,从所述预设的多种地理位置特征中筛选出用于训练所述机器分类模型的地理位置特征。
  17. 如权利要求16所述的装置,其特征在于,所述模型训练模块具体用于,根据不同种地理位置特征之间的相关系数,确定相关系数大于设定阈值的各对地理位置特征;针对每一对相关系数大于设定阈值的地理位置特征,根据该对地理位置特征中,每种地理位置特征的各个特征值区间分别对应的位置稳定性贡献系数,从该对地理位置特征中筛选出一种地理位置特征用于作为训练所述机器分类模型的地理位置特征。
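  18. The device according to claim 17, wherein the model training module is specifically configured to:
    determine the contribution value IV of each geographic location feature according to
    $$IV = \sum_{k=1}^{q} (P_{1k} - P_{0k}) \cdot WOE(k)$$
    where, for any geographic location feature, P1k denotes the ratio of the number of safe-type sample users whose feature values fall in the k-th feature value interval to the total number of safe-type sample users among the obtained sample users, P0k denotes the ratio of the number of risk-type sample users whose feature values fall in the k-th feature value interval to the total number of risk-type sample users among the obtained sample users, WOE(k) denotes the location stability contribution coefficient of the k-th interval, and q is the number of feature value intervals of the feature; and, for the two features in the pair, determine the one with the smallest contribution value IV and determine that feature as the geographic location feature screened out of the pair.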
PCT/CN2016/085935 2015-06-24 2016-06-16 一种风险识别方法及装置 WO2016206557A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510354187.XA CN106295351B (zh) 2015-06-24 2015-06-24 一种风险识别方法及装置
CN201510354187.X 2015-06-24

Publications (1)

Publication Number Publication Date
WO2016206557A1 true WO2016206557A1 (zh) 2016-12-29

Family

ID=57584723

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/085935 WO2016206557A1 (zh) 2015-06-24 2016-06-16 一种风险识别方法及装置

Country Status (2)

Country Link
CN (1) CN106295351B (zh)
WO (1) WO2016206557A1 (zh)

Also Published As

Publication number Publication date
CN106295351B (zh) 2019-03-19
CN106295351A (zh) 2017-01-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16813673

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16813673

Country of ref document: EP

Kind code of ref document: A1