WO2017202006A1 - Data processing method and device, and computer storage medium - Google Patents

Data processing method and device, and computer storage medium

Info

Publication number
WO2017202006A1
Authority
WO
WIPO (PCT)
Prior art keywords
account
variable
data
feature
behavior
Prior art date
Application number
PCT/CN2016/109729
Other languages
French (fr)
Chinese (zh)
Inventor
陈玲
陈谦
陈培炫
Original Assignee
腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2017202006A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00: Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/02: Banking, e.g. interest calculation or account maintenance

Definitions

  • the present invention relates to the field of data processing, and in particular to a data processing method and apparatus, and a computer storage medium.
  • Data processing can be applied to many services. Taking the existing personal credit information service as an example, the data processing process is as follows:
  • The basic database of credit information includes credit information, public records, and inquiry records. Credit information includes credit card records, bank loan records, personal asset records, and other credit loan records. Public records include personal housing provident fund records, personal pension insurance records, and the like. Inquiry records include personal addresses, contact information, and the like.
  • In the prior art, the bank's credit information is used as the primary basis, and an individual's credit rating is obtained by means of a sample survey. However, because the bank's credit information is updated slowly, it cannot reflect the individual's true credit level in time, so the resulting credit level is inaccurate. In addition, data obtained by a sample survey cannot fully reflect the true credit level of the bank account, which again makes the resulting credit level, and hence the data, inaccurate.
  • the embodiment of the invention provides a data processing method and device, and a computer storage medium, to solve at least the technical problem that the credit level of the account cannot be accurately obtained and the data is inaccurate.
  • A data processing method is provided, including: collecting behavior data of a first account, where the behavior data includes Internet-based online behavior data and offline behavior data; acquiring a first feature variable of the first account according to the behavior data, wherein the first feature variable is used to indicate a behavior characteristic of the first account; inputting the first feature variable into a data analysis model, wherein the data analysis model is configured to output a first value according to the first feature variable, and the first value is used to indicate a probability value that the behavior of the first account does not satisfy a preset condition; and recording the first value output by the data analysis model.
  • A data processing apparatus is provided, including: a collecting unit, configured to collect behavior data of a first account, where the behavior data includes Internet-based online behavior data and offline behavior data; an obtaining unit, configured to acquire, according to the behavior data, a first feature variable of the first account, where the first feature variable is used to represent a behavior characteristic of the first account; an input unit, configured to input the first feature variable into a data analysis model, wherein the data analysis model is configured to output a first value according to the first feature variable, and the first value is used to indicate a probability value that the behavior of the first account does not satisfy a preset condition; and a recording unit, configured to record the first value output by the data analysis model.
  • The collecting unit, the obtaining unit, the input unit, and the recording unit may be implemented by a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA).
  • the embodiment of the invention further provides a computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions are configured to execute the data processing method described above.
  • In the embodiments of the present invention, the first feature variable represents the behavior characteristic of the first account and is obtained from the Internet-based behavior data of the first account. Inputting the first feature variable into the data analysis model yields the probability value that the behavior of the first account does not satisfy the preset condition. Because the behavior data of the first account in the social application covers the behavior of the first account relatively widely, the behavior data input into the data analysis model can fully reflect the behavior of the first account, so the resulting probability value is more accurate. This solves the technical problem that the credit level of the account cannot be accurately obtained.
  • FIG. 1 is a schematic diagram of a network architecture in accordance with an embodiment of the present invention.
  • FIG. 2 is a flow chart of a data processing method according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a model architecture in accordance with an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention.
  • FIG. 5 is a hardware configuration diagram of a server according to an embodiment of the present invention.
  • An embodiment of a method that can be performed by an apparatus embodiment of the present application is provided. Note that the steps illustrated in the flowcharts of the accompanying drawings may be executed in a computer system, for example as a set of computer-executable instructions, and although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order from the one described herein.
  • a data processing method is provided.
  • the data processing method can be applied to a hardware environment formed by the terminal 102 and the server 104 as shown in FIG. 1.
  • the terminal 102 is connected to the server 104 through a network.
  • the network includes but is not limited to: a mobile communication network, a wide area network, a metropolitan area network, or a local area network.
  • The terminal 102 may be a mobile phone terminal, a PC terminal, a notebook terminal, or a tablet terminal.
  • The server 104 collects behavior data of the plurality of terminals 102. This includes behavior data produced by actions that the terminal 102 performs over the Internet (for example, chatting in an instant messaging application, watching videos, or playing games) and behavior data produced by actions that combine the Internet with offline activity (for example, storing motion data in the cloud through a wearable device during exercise).
  • The server 104 analyzes the feature variables of one or more terminals 102 according to the collected behavior data, and then obtains, according to those feature variables, the probability that the behavior of a certain terminal satisfies a preset condition (for example, the credit rating of that terminal). Further, when the credit of a certain terminal 102 is obtained through the feature variables of a plurality of terminals 102, the plurality of terminals have an association relationship (such as a friend relationship) with that terminal 102.
  • Because the behavior data of the first account is based on a social application and is not limited to bank data as in the prior art, the collected behavior data covers a wider range and can reflect, from multiple aspects, the probability that the behavior of the first account meets the preset condition. This improves the accuracy of the obtained probability value and solves the technical problem that the prior art cannot accurately obtain the credit level of the account.
  • FIG. 2 is a flowchart of a data processing method according to an embodiment of the present invention.
  • the data processing method provided by the embodiment of the present invention is specifically described below with reference to FIG. 2 .
  • the data processing method mainly includes the following steps:
  • Step S202: Collect behavior data of the first account, where the behavior data includes Internet-based online behavior data and offline behavior data.
  • Step S204: Acquire a first feature variable of the first account according to the behavior data, where the first feature variable is used to represent a behavior feature of the first account.
  • Step S206: Input the first feature variable into the data analysis model, wherein the data analysis model is configured to output the first value according to the first feature variable, and the first value is used to indicate a probability value that the behavior of the first account does not satisfy the preset condition.
  • Step S208: Record the first value output by the data analysis model.
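  • Steps S202 to S208 can be sketched as a small pipeline. The following is a minimal illustration only: the record schema (`BehaviorRecord`), the in-memory `ScoreStore`, the per-category totals used as features, and the placeholder `toy_model` are all assumptions introduced for the example and are not specified by the embodiment.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BehaviorRecord:
    """One collected behavior event (hypothetical schema)."""
    account_id: str
    source: str      # "online" or "offline"
    category: str    # e.g. "chat", "payment", "sports"
    value: float


@dataclass
class ScoreStore:
    """In-memory stand-in for wherever the first value is recorded."""
    values: Dict[str, float] = field(default_factory=dict)

    def record(self, account_id: str, first_value: float) -> None:
        self.values[account_id] = first_value


def collect(records: List[BehaviorRecord], account_id: str) -> List[BehaviorRecord]:
    # S202: keep the online and offline records of the first account.
    return [r for r in records if r.account_id == account_id]


def extract_features(records: List[BehaviorRecord]) -> Dict[str, float]:
    # S204: aggregate raw records into per-category totals (assumed feature form).
    features: Dict[str, float] = {}
    for r in records:
        features[r.category] = features.get(r.category, 0.0) + r.value
    return features


def toy_model(features: Dict[str, float]) -> float:
    # Placeholder model, not from the source: more payments, lower probability.
    return 1.0 / (1.0 + features.get("payment", 0.0))


def process(records: List[BehaviorRecord], account_id: str,
            model: Callable[[Dict[str, float]], float],
            store: ScoreStore) -> float:
    features = extract_features(collect(records, account_id))
    first_value = model(features)            # S206: model outputs the first value
    if not 0.0 <= first_value <= 1.0:
        raise ValueError("the first value must be a probability")
    store.record(account_id, first_value)    # S208: record the first value
    return first_value
```

  • Passing the model in as a callable keeps the pipeline independent of how the data analysis model is actually trained.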
  • The first feature variable represents the behavior characteristic of the first account and is obtained from the Internet-based behavior data of the first account. Inputting the first feature variable into the data analysis model yields the probability value that the behavior of the first account does not satisfy the preset condition. Because the behavior data of the first account in the social application covers the behavior of the first account relatively widely, the behavior data input into the data analysis model can fully reflect the behavior of the first account, so the probability value obtained by the analysis is more accurate.
  • the behavior data includes online behavior data and offline behavior data of the first account based on the Internet.
  • The virtual space behavior data on the Internet includes, but is not limited to:
  • User's basic demographic attribute information such as name, age, gender, region, education, occupation, etc.
  • Virtual value-added service data such as virtual account role dressing, game item purchase, film and television membership service, cloud storage space value-added service, music flow package, etc.
  • Economic behavior data such as payment, wealth management, shopping, stocks, funds, P2P, finance, etc.
  • The online data can be collected through the instant messaging application, game client, app download platform, financial platform, shopping software, and the like on the user's mobile phone, tablet, or PC, either as information filled in by the user or as information actively reported by the application.
  • Offline scene data includes, but is not limited to:
  • O2O (online to offline) services, such as housekeeping services, urban services, beauty care, etc.
  • Wearable device data, such as medical health, sports, etc.
  • LBS (location-based service) data
  • Travel data such as ticket ordering, hotel reservations, etc.
  • The behavior data includes actions in various online and offline scenes and covers almost every aspect of life. Therefore, the probability value obtained from the behavior data more accurately reflects the true probability value of the account.
  • When the behavior data changes, it is immediately fed back to the server or immediately obtained by the server. The behavior data is therefore updated quickly, and the probability value obtained from the promptly updated behavior data can reflect in time the probability that the behavior of the first account does not satisfy the preset condition.
  • The probability value of not satisfying the preset condition may be the probability of default, for example the probability of behavior that does not comply with contractual provisions.
  • For example, the chat behavior of user A's account in the instant messaging application may be collected.
  • Extracting first feature variables separately from the behavior data of each category yields first feature variables of different categories.
  • Examples are a first feature variable of the instant messaging class, a first feature variable of the video class, and a first feature variable of the download class. All of the first feature variables of the different classes may be input into the data analysis model to output the first value, or only some of them may be input.
  • Friends of user A tend to be similar to user A, so the behavior data of user A's friends can also reflect the probability that the behavior of user A does not satisfy the preset condition. Therefore, when the first feature variable is input into the data analysis model, feature variables associated with user A's friends can be input at the same time.
  • Inputting the first feature variable into the data analysis model includes: acquiring a second feature variable, wherein the second feature variable is used to represent behavior characteristics of a plurality of second accounts having an association relationship with the first account; and inputting the first feature variable and the second feature variable into the data analysis model, wherein the data analysis model is further configured to output the first value according to the first feature variable and the second feature variable.
  • the method of obtaining the second feature variable is the same as the method of acquiring the first feature variable, which will be described in detail later.
  • The first account and the plurality of second accounts that have the association relationship are friends; that is, the plurality of second accounts are friends of the first account.
  • Both the online behavior and the offline behavior in the above example can be mapped to the behavior of an application account through a certain correspondence.
  • For example, the second account registers for a navigation service and an instant messaging application with the same mobile phone number. When the behavior data of the second account is acquired, both its behavior data in the navigation service and its behavior data in the instant messaging application are collected.
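  • The mapping of behavior from several services onto one application account can be sketched as a simple join on the shared identifier. The sketch below assumes the mobile phone number is the correspondence key, as in the example above. The function name, the event tuples, and the phone numbers are hypothetical.

```python
from typing import Dict, List, Tuple


def map_to_account(
    phone_to_account: Dict[str, str],
    service_events: Dict[str, List[Tuple[str, str]]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Group events from several services under one account.

    service_events maps a service name to (phone_number, event) pairs.
    Events whose phone number has no known account are skipped.
    """
    merged: Dict[str, List[Tuple[str, str]]] = {}
    for service, events in service_events.items():
        for phone, event in events:
            account = phone_to_account.get(phone)
            if account is None:
                continue
            merged.setdefault(account, []).append((service, event))
    return merged
```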
  • Inputting the first feature variable and the second feature variable into the data analysis model includes: acquiring the intimacy between each of the plurality of second accounts and the first account, wherein the intimacy is generated according to the interaction behavior between each second account and the first account; and obtaining a third feature variable according to the intimacy and the second feature variable by using the following formula:
  • $\chi' = f((\omega_1, \omega_2, \ldots, \omega_i, \ldots, \omega_n), (\chi_1, \chi_2, \ldots, \chi_i, \ldots, \chi_n))$,
  • where $\chi'$ represents the third feature variable
  • $i$ represents the i-th second account
  • $\omega_i$ is the intimacy between the i-th second account and the first account
  • $\chi_i$ is the second feature variable of the i-th second account
  • $f$ is a weighted average, weighted by intimacy, of the second feature variables of the top $n$ second accounts ranked from high to low intimacy
  • The first feature variable and the third feature variable are then input into the data analysis model.
  • The second feature variables of the second accounts are processed so that they better reflect the behavior characteristics of the first account. Therefore, when the third feature variable is acquired, each second feature variable is multiplied by its corresponding weight value, and a weighted average is then taken.
  • the weight value indicates the intimacy of the first account and the second account. The closer the first account is to the second account, the greater the weight value; conversely, the smaller the weight value.
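  • The intimacy-weighted average can be sketched as follows. This is a minimal illustration that assumes each second feature variable is a numeric vector of the same length and that the weights are normalized by their sum. The function name is hypothetical.

```python
from typing import List, Sequence


def third_feature_variable(
    intimacy: Sequence[float],
    second_features: Sequence[Sequence[float]],
    n: int,
) -> List[float]:
    """Weighted average of the feature vectors of the n most intimate accounts."""
    if len(intimacy) != len(second_features):
        raise ValueError("one intimacy value is required per second account")
    # Rank second accounts from high to low intimacy and keep the top n.
    ranked = sorted(zip(intimacy, second_features),
                    key=lambda pair: pair[0], reverse=True)[:n]
    total = sum(weight for weight, _ in ranked)
    if total <= 0:
        raise ValueError("total intimacy of the selected accounts must be positive")
    dim = len(ranked[0][1])
    # Each component is the intimacy-weighted mean over the selected accounts.
    return [sum(weight * feats[k] for weight, feats in ranked) / total
            for k in range(dim)]
```

  • With intimacy values 3, 1, and 2 and n = 2, only the accounts with intimacy 3 and 2 contribute, with weights 3/5 and 2/5.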
  • The intimacy can be measured by the interaction between the first account and the second account. For example, the more chats there are between the first account and the second account, the more intimate the relationship. Likewise, the higher the degree of overlap between the first account and the second account, the more intimate the relationship between the two accounts. The intimacy and the degree of overlap can be obtained by means of a trained model.
  • Interactive behaviors include interactions in the circle of friends, payment interactions (such as red envelopes), and sports interactions (such as walking 10,000 steps).
  • Intimacy can be reflected by the above information interaction, including the number of times the information is sent and received, the number of days, etc., as well as the ratio of sending and receiving information, the number of times of information interaction every day.
  • This information includes text information, video information, and voice information.
  • Intimacy can also be derived from actions such as commenting, liking, marking a friend as a special friend, giving a gift, or blocking.
  • For example, with n = 10, the third feature variable is a weighted average, weighted by intimacy, of the second feature variables of the top 10 friends ranked by intimacy.
  • the general characteristics of a group can reflect the characteristics of a certain user in this group. Therefore, the probability value of the behavior that does not satisfy the preset condition can be obtained according to the characteristics of a group, and the credit degree of the user can be more accurately reflected.
  • The second feature variables of the plurality of second accounts are acquired.
  • The top n second accounts are selected according to their intimacy with the first account, and the third feature variable is generated according to the intimacy and the second feature variables.
  • The abnormal data may be data that is obviously beyond a certain range. For example, a normal person's age does not exceed one hundred, so if the collected data shows an age beyond that limit, it is abnormal data and is deleted. Singular points are values that fall inside the valid range but deviate strongly from the rest of the data: if the collected ages include 0 and 49, both lie within the range of 0 to 100, but because most of the other data lies between 18 and 45, 0 and 49 are singular points with a large fluctuation.
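  • The two cleaning rules can be sketched as follows. The range check follows the description directly. The singular-point rule is an assumption made for this sketch (distance from the median measured in multiples of the median absolute deviation), since the text does not say how the fluctuation is measured.

```python
import statistics
from typing import List, Sequence


def drop_abnormal(values: Sequence[float], low: float, high: float) -> List[float]:
    """Remove values that lie outside the valid range [low, high]."""
    return [v for v in values if low <= v <= high]


def drop_singular_points(values: Sequence[float], k: float = 2.0) -> List[float]:
    """Remove in-range values that deviate strongly from the bulk of the data.

    Assumed rule: drop a value when its distance from the median exceeds
    k times the median absolute deviation (MAD).
    """
    if not values:
        return []
    center = statistics.median(values)
    mad = statistics.median(abs(v - center) for v in values)
    if mad == 0:
        return list(values)
    return [v for v in values if abs(v - center) <= k * mad]
```

  • Applied to ages that mostly lie between 18 and 45, the range check removes an entry such as 150, and the second pass removes 0 and 49.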
  • Behavior data is divided into multiple dimensions based on data sources and business characteristics. For example, data is classified as basic information, social interaction, financial management, and so on, and then integrated into the database.
  • the above basic attributes, social interaction behavior, purchase behavior, taxi behavior and friend attributes can all reflect the behavior characteristics of the first account.
  • Acquiring the first feature variable of the first account according to the behavior data includes: acquiring an information gain of each feature in the behavior data, where the information gain indicates the amount of information included in the behavior data; determining whether the information gain is within a preset value range; if the information gain is within the preset value range, constructing a derived variable according to the behavior data, wherein the derived variable is merged or split behavior data; if the information gain is outside the preset value range, deleting the feature corresponding to that information gain and constructing the derived variable according to the remaining features; and using the derived variable as the first feature variable.
  • Deleting the feature corresponding to an information gain outside the preset value range and then constructing the derived variable according to the remaining features includes: deleting the feature corresponding to the information gain outside the preset value range; obtaining the correlation coefficients of the remaining features; combining the features whose correlation coefficient is greater than or equal to a preset coefficient into one merged feature; and using the merged feature as the derived variable.
  • The collected behavior data, including the number of text chats, the number of voice calls, the payment amount, and so on, are all features of the behavior data. For example, given 9 text chats, 10 voice calls, and a payment amount of 100, the numbers are called feature values.
  • The information gain reflects the amount of information a feature carries. If the amount of information is less than a threshold, the feature can be deleted. For example, the features of each type are sorted by information gain, and the features whose information gain is less than the threshold are deleted. The correlation of the remaining features is then examined. If some features are strongly correlated, they are combined to obtain the first feature variable. If a feature is weakly correlated and highly significant, it can be refined into multiple features. For example, the number of chats can be split into the number of evening chats, daytime chats, weekend chats, weekday chats, chat days, and so on. Conversely, evening chats and daytime chats can be combined into chats.
  • The behavior data can be flexibly split and merged to construct the first feature variable. When splitting and merging, the same or different methods can be used for different features (for example, some features use principal component analysis and other features use clustering), which increases the flexibility of constructing the first feature variable.
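  • The filter-then-merge procedure can be sketched as follows. The sketch assumes discrete feature values, a known label per sample (for example, defaulted or not), Pearson correlation as the correlation coefficient, and summation as the merge operation (as in merging evening and daytime chats into chats). None of these choices is fixed by the description, and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a label sequence."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> float:
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 when either side is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_and_merge(features: Dict[str, List[float]],
                     labels: Sequence[Hashable],
                     min_gain: float,
                     min_corr: float) -> Dict[str, List[float]]:
    # Delete features whose information gain is below the threshold.
    kept = {name: col for name, col in features.items()
            if information_gain(col, labels) >= min_gain}
    merged: Dict[str, List[float]] = {}
    used = set()
    names = list(kept)
    for i, a in enumerate(names):
        if a in used:
            continue
        group = [a]
        # Collect the remaining features strongly correlated with feature a.
        for b in names[i + 1:]:
            if b not in used and pearson(kept[a], kept[b]) >= min_corr:
                group.append(b)
                used.add(b)
        used.add(a)
        # Merge the group by summing the columns sample by sample.
        merged["+".join(group)] = [sum(vals)
                                   for vals in zip(*(kept[g] for g in group))]
    return merged
```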
  • When the data analysis model is established, a sub-model may be generated for each category into which the behavior data is classified, and each sub-model outputs a first sub-value. Processing these first sub-values yields the first value output by the data analysis model. Further, when a sub-model is established, it may be trained on the sample data of its category. Alternatively, each category may be further divided, a low-level model established for each division, the low-level models combined into the sub-model, and the sub-models then combined into the data analysis model.
  • Before the first feature variable and the second feature variable are input into the data analysis model, the method further includes: dividing the behavior data into a plurality of categories; establishing a sub-model for each of the plurality of categories, wherein each sub-model is configured to output a first sub-value according to the first feature variable and/or the second feature variable, and the first sub-value represents, for the category corresponding to the sub-model, the probability value that the behavior of the first account does not satisfy the preset condition; and constructing the plurality of sub-models corresponding to the plurality of categories into the data analysis model.
  • Establishing a sub-model for each of the plurality of categories includes: establishing a sub-model for each category using the same or different training models; or using the same or different training models to establish a low-level model for each subcategory under each category, and constructing the low-level models corresponding to the subcategories under each category into the sub-model for that category.
  • The training models used to build the sub-models for the categories can be the same or different. For example, among 10 categories, 5 categories may use a decision tree to train their sub-models, and the other 5 may use a neural network.
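  • The idea that each category has its own sub-model, possibly of a different type, can be sketched as follows. The two factories are deliberately tiny stand-ins (a single-split rule in place of a decision tree, a single logistic unit in place of a neural network), and their parameters are supplied by hand instead of being trained. All names are hypothetical. The sketch stops at the first sub-values and does not combine them into the first value.

```python
import math
from typing import Callable, Dict

SubModel = Callable[[Dict[str, float]], float]


def make_stump(feature: str, threshold: float,
               p_below: float, p_above: float) -> SubModel:
    """Single-split rule, standing in for a decision tree sub-model."""
    def predict(x: Dict[str, float]) -> float:
        return p_below if x.get(feature, 0.0) < threshold else p_above
    return predict


def make_logistic(weights: Dict[str, float], bias: float) -> SubModel:
    """Single logistic unit, standing in for a neural network sub-model."""
    def predict(x: Dict[str, float]) -> float:
        z = bias + sum(w * x.get(name, 0.0) for name, w in weights.items())
        return 1.0 / (1.0 + math.exp(-z))
    return predict


def first_sub_values(
    sub_models: Dict[str, SubModel],
    features_by_category: Dict[str, Dict[str, float]],
) -> Dict[str, float]:
    """Run each category's sub-model on that category's features."""
    return {category: model(features_by_category.get(category, {}))
            for category, model in sub_models.items()}
```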
  • the constructing the plurality of sub-models corresponding to the plurality of categories into the data analysis model comprises: constructing the plurality of sub-models into the data analysis model in the following manner:
  • P is the first value
  • i is the i-th sub-model of the plurality of sub-models
  • n is the number of the plurality of sub-models
  • $P_i'$ is the first sub-value output by the i-th sub-model
  • $P_0$ is a constant
  • dividing the behavior data into a plurality of categories includes: dividing the behavior data into a plurality of categories according to the type of the service included in the behavior data; or dividing the data including the target object in the behavior data into one category, and not in the behavior data The data including the target object is divided into another category.
  • The three division methods (by service, by group, and by layer) may each be used alone to construct the sub-models, or any two or all three may be combined. For example, sub-models are first established according to whether the target object is included, and the data under each sub-model is then further divided according to the business type.
  • Division by service mainly follows the data categories described above, such as basic information, value-added services, social interactions, and economic behaviors. Grouping is mainly based on business characteristics. For example, in economic activities, users with credit cards and users without credit cards behave quite differently in payment, shopping, wealth management, and so on, so they can be divided into two groups for which models are built separately.
  • Layering applies mainly at the level of the overall model architecture. For example, the sub-model layer can be further divided into multiple dimension layers, and the machine learning algorithm adopted at each layer can be quite different.
  • the data processing method of this embodiment is mainly divided into four parts, including data acquisition, data processing, feature mining and model construction.
  • Data collection: this includes collecting online data and offline scene data.
  • Online data includes data on games, finance, apps, shopping, social, and education, such as game titles, purchase amounts, and more.
  • Offline scene data includes data such as life, navigation, travel, check-in, medical, and sports. For example, medical records, booking hotels, tourist locations, etc.
  • Data processing: this includes cleaning, integration, and standardization. Cleaning includes deduplication, deletion of singular points, removal of abnormal data, and information supplementation. Integration includes grouping data of the same category together. Standardization includes normalization of data types and normalization of storage data structures.
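  • Deduplication, type standardization, and integration by category can be sketched in one pass. The record layout (dictionaries with `account`, `category`, and `value` keys) is an assumption for the example. Records that cannot be standardized are simply dropped here, whereas the description also mentions information supplementation, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

RawRecord = Dict[str, object]


def clean_and_integrate(raw: List[RawRecord]) -> Dict[str, List[Tuple[str, float]]]:
    """Standardize types, drop duplicates, and group records by category."""
    seen = set()
    by_category: Dict[str, List[Tuple[str, float]]] = defaultdict(list)
    for rec in raw:
        try:
            # Standardization: one type and one spelling per field.
            account = str(rec["account"]).strip()
            category = str(rec["category"]).strip().lower()
            value = float(rec["value"])
        except (KeyError, TypeError, ValueError):
            continue  # incomplete or malformed record
        # Deduplication: identical standardized records are kept once.
        # Real data would likely include a timestamp in this key.
        key = (account, category, value)
        if key in seen:
            continue
        seen.add(key)
        # Integration: data of the same category goes into the same bucket.
        by_category[category].append((account, value))
    return dict(by_category)
```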
  • Feature mining: the processed data is mined, for example using graph calculation and text mining methods.
  • The mined features include data on basic user information, social interaction, personality traits, hobbies, emotional orientation, life circle, physical health, and financial management.
  • Model construction: the mined features are classified and a model is built for each category, for example for the social interaction class, the hobby class, the health class, and the personality class. Each model can be obtained using a different learning and training method.
  • the characteristics of the social interaction class can also be subdivided into chat features, phone features and video features. After building the submodel, the total model is obtained. The first feature variable and the third feature variable are then input into the sub-model to obtain a first value of the total model output.
  • The first feature variable includes the features a1, a2, and a3, and the second feature variable includes the features b1, b2, and b3.
  • The features a1, a2, a3 and the features b1, b2, b3 form three pairs of corresponding features.
  • feature a1 represents the payment amount of the first account
  • feature b1 represents the payment amount of the second account
  • feature a2 represents the game type of the first account
  • feature b2 represents the game type of the second account
  • feature a3 represents the number of exercise sessions of the first account
  • feature b3 represents the number of exercise sessions of the second account
  • To improve the readability of the first value, the first value is converted into a value that reflects the credit level of the first account.
  • The first value represents the probability that the first account defaults, and after it is converted to the third value, the third value indicates the credit level of the first account. That is, after recording the first value output by the data analysis model, the method further includes converting the first value to the third value S by using the following method:
  • S is used to indicate the degree to which the behavior of the first account satisfies the preset condition
  • b represents a reference value
  • p represents a first value
  • st represents a step size
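  • A conversion with these ingredients can be sketched as follows. The sketch assumes the log-odds scorecard form S = b + st · log2((1 − p) / p), which is common in credit scoring but is an assumption here and not necessarily the formula of this embodiment. The function name and the example values of b and st are hypothetical.

```python
import math


def to_score(p: float, b: float, st: float) -> float:
    """Map a default probability p to a score (assumed log-odds form).

    b is the score assigned at p = 0.5, and st is the number of points
    gained each time the odds of not defaulting double.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return b + st * math.log2((1.0 - p) / p)
```

  • Under this assumed form, a lower default probability gives a higher score, and p = 0.5 maps exactly to the reference value b.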
  • The features used comprehensively cover the user's online and offline behavior characteristics. They include not only basic user information, social interaction, financial activities, hobbies, and life circles, but also go deeper into the user's personality traits, emotional inclination, and so on, and are therefore better able to characterize stable features of the user's mental outlook and personality.
  • A multi-layered machine learning algorithm is adopted, which increases the complexity and predictive ability of the algorithm and improves the accuracy of the user's credit level.
  • The method according to the above embodiment can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • The technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied in the form of a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and including a number of instructions for causing a terminal device (which may be a cell phone, a computer, a server, a network device, or the like) to perform the methods described in the various embodiments of the present invention.
  • A data processing apparatus for implementing the data processing method is also provided.
  • The data processing apparatus is mainly used to perform the data processing method provided above. The data processing apparatus provided by the embodiment of the present invention is described in detail below:
  • The data processing apparatus mainly includes a collecting unit 10, an obtaining unit 20, an input unit 30, and a recording unit 40.
  • the collecting unit 10 is configured to collect behavior data of the first account, and the behavior data includes online behavior data and offline behavior data based on the Internet.
  • the obtaining unit 20 is configured to obtain a first feature variable of the first account according to the behavior data, where the first feature variable is used to represent a behavior feature of the first account.
  • the input unit 30 is configured to input the first feature variable into the data analysis model, wherein the data analysis model is configured to output a first value according to the first feature variable, and the first value is used to indicate the probability value that the behavior of the first account does not satisfy the preset condition.
  • the recording unit 40 is configured to record the first value output by the data analysis model.
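The division of labor among the four units can be sketched as follows. This is a minimal illustration rather than the apparatus itself: the class name `DataProcessingApparatus`, the event dictionary layout, the count-based feature extraction, and the stand-in logistic `toy_model` are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Features = Dict[str, float]


@dataclass
class DataProcessingApparatus:
    """Hypothetical wiring of the collecting, obtaining, input and recording units."""

    model: Callable[[Features], float]  # data analysis model: features -> probability
    records: Dict[str, float] = field(default_factory=dict)

    def collect(self, account: str, events: List[dict]) -> List[dict]:
        # Collecting unit 10: keep only the events that belong to this account.
        return [e for e in events if e["account"] == account]

    def obtain(self, behavior: List[dict]) -> Features:
        # Obtaining unit 20: derive feature variables (here, a count per event kind).
        features: Features = {}
        for e in behavior:
            features[e["kind"]] = features.get(e["kind"], 0.0) + 1.0
        return features

    def run(self, account: str, events: List[dict]) -> float:
        features = self.obtain(self.collect(account, events))
        p = self.model(features)   # input unit 30: feed the features to the model
        self.records[account] = p  # recording unit 40: store the first value
        return p


def toy_model(features: Features) -> float:
    # Stand-in model (assumption): more recorded activity means a lower probability.
    activity = sum(features.values())
    return 1.0 / (1.0 + math.exp(activity - 2.0))
```

A real data analysis model would be trained on labeled accounts; the logistic stand-in only makes the data flow between the units visible.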
  • the first characteristic variable is used to represent the behavior characteristic of the first account, which is obtained from the Internet-based behavior data of the first account; after the first feature variable is input into the data analysis model, the probability value that the behavior of the first account does not satisfy the preset condition is obtained. Since the behavior data of the first account in the social application covers the behavior of the first account relatively widely, the behavior data input into the data analysis model can fully reflect the behavior of the first account, so that the resulting probability value that the behavior of the first account does not satisfy the preset condition is more accurate.
  • the behavior data includes online behavior data and offline behavior data of the first account based on the Internet.
  • Behavioral data includes actions in a variety of online and offline scenarios, including behavioral data for all aspects of life. Therefore, the probability values obtained from these behavioral data also more accurately reflect the true probability value of the account.
  • when the behavior data changes, the change is immediately fed back to the server or immediately obtained by the server. The behavior data is therefore updated quickly, and the probability value obtained from the promptly updated behavior data reflects the current probability that the behavior of the first account does not satisfy the preset condition. The probability value that the preset condition is not satisfied may be a probability of default, such as a failure to act in accordance with a contract.
  • for example, the chat behavior of user A's account in an instant messaging application may be collected.
  • extracting first feature variables separately from each kind of behavior data yields first feature variables of different categories.
  • for example, the first feature variables may belong to the instant messaging class, the video class, and the download class. All of the first feature variables of the different classes may be input into the data analysis model to output the first value; it is also possible to input only some of the first feature variables of the different classes into the data analysis model.
  • a friend of user A is similar to user A to some extent, so the behavior data of user A's friends can also reflect the probability that the behavior of user A does not satisfy the preset condition. Therefore, when the first feature variable is input into the data analysis model, the feature variables associated with user A's friends can be input at the same time.
  • the input unit includes: a first acquiring subunit, configured to acquire a second feature variable, wherein the second feature variable is used to represent behavior characteristics of a plurality of second accounts having an association relationship with the first account; and an input subunit, configured to input the first feature variable and the second feature variable into the data analysis model, wherein the data analysis model is further configured to output the first value according to the first feature variable and the second feature variable.
  • the method of obtaining the second feature variable is the same as the method of acquiring the first feature variable, which will be described in detail later.
  • the first account and the plurality of second accounts having the association relationship are friends; that is, the plurality of second accounts are friends of the first account.
  • Both the online behavior and the offline behavior in the above example can be mapped to the behavior of an application account through a certain correspondence.
  • for example, the second account registers for a navigation service and an instant messaging application by using a mobile phone number; when the behavior data of the second account is acquired, both its behavior data in the navigation service and its behavior data in the instant messaging application are collected.
  • the input subunit includes: a first obtaining module, configured to acquire the intimacy between each of the plurality of second accounts and the first account, wherein the intimacy is generated according to the interaction behavior between each second account and the first account;
  • the third feature variable is obtained from the intimacy and the second characteristic variable by using the following formula:
  • $\theta' = f\big((\alpha_1, \alpha_2, \ldots, \alpha_i, \ldots, \alpha_n), (\theta_1, \theta_2, \ldots, \theta_i, \ldots, \theta_n)\big)$,
  • where $\theta'$ represents the third characteristic variable
  • $i$ denotes the i-th second account
  • $\alpha_i$ is the intimacy between the i-th second account and the first account
  • $\theta_i$ is the second characteristic variable of the i-th second account
  • $f$ denotes the intimacy-weighted average of the second characteristic variables of the top n second accounts ranked by intimacy from high to low
  • the input module is configured to input the first feature variable and the third feature variable into the data analysis model.
  • the second feature variable of each second account is processed so that it better reflects the behavior characteristics of the first account. Therefore, when the third feature variable is acquired, each second characteristic variable is multiplied by its corresponding weight value, and the weighted average is then taken.
  • the weight value indicates the intimacy of the first account and the second account. The closer the first account is to the second account, the greater the weight value; conversely, the smaller the weight value.
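The weighting described above can be sketched as follows: rank the second accounts by intimacy, keep the top n, and average their feature vectors with the intimacy values as weights. The function name `third_feature_variable` and the normalization by the sum of the selected weights are assumptions; the source states only that a weighted average is taken.

```python
from typing import List, Sequence


def third_feature_variable(
    intimacies: Sequence[float],
    second_features: Sequence[Sequence[float]],
    n: int,
) -> List[float]:
    """Intimacy-weighted average of the feature vectors of the n closest second accounts."""
    if len(intimacies) != len(second_features):
        raise ValueError("one intimacy value is required per second account")
    # Rank second accounts by intimacy, highest first, and keep the top n.
    ranked = sorted(
        zip(intimacies, second_features), key=lambda pair: pair[0], reverse=True
    )[:n]
    total = sum(weight for weight, _ in ranked)
    if total <= 0:
        raise ValueError("the selected intimacy values must sum to a positive number")
    dim = len(ranked[0][1])
    # Weighted average, computed component by component.
    return [sum(weight * vec[k] for weight, vec in ranked) / total for k in range(dim)]
```

With intimacies 3.0, 1.0, and 0.5 and n = 2, the least intimate account is ignored and the closest account contributes three quarters of the result.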
  • the intimacy can be measured by the interaction between the first account and the second account. For example, the more chats between the first account and the second account, the more intimate the relationship. The higher the degree of overlap between the first account and the second account, the more intimate the relationship between the two accounts. The intimacy and coincidence can be obtained by means of a training model.
  • Interactive behaviors include interactions in the circle of friends, payment interactions (such as red envelopes), and sports interactions (such as walking 10,000 steps).
  • Intimacy can be reflected by the above information interaction, including the number of times the information is sent and received, the number of days, etc., as well as the ratio of sending and receiving information, the number of times of information interaction every day.
  • This information includes text information, video information, and voice information.
  • Intimacy can also be derived from actions such as commenting, liking, marking a friend as a special friend, giving a gift, or blocking (blacklisting) a friend.
  • for example, with n = 10, the third characteristic variable is the intimacy-weighted average of the second characteristic variables of the top 10 friends arranged in order of intimacy.
  • the general characteristics of a group can reflect the characteristics of an individual user in that group. Therefore, obtaining the probability value that the behavior does not satisfy the preset condition according to the characteristics of a group reflects the user's credit more accurately. It should be noted that, when the second feature variables of the plurality of second accounts are acquired, the top n second accounts are first selected by ranking the intimacy between each second account and the first account, and the third feature variable is then generated from the intimacy and the second feature variables of those accounts.
  • the abnormal data may be data that is obviously beyond a reasonable range. For example, a person's age usually does not exceed one hundred; if the collected data shows an age beyond that limit, the abnormal data is deleted. If the collected ages include 0 and 49, both lie within the range of 0 to 100; however, if most of the other data lies between 18 and 45, then 0 and 49 are singular points with a large fluctuation.
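A minimal sketch of the two checks described above: a hard range filter for abnormal values, followed by a test for singular points among the values that remain. The interquartile-range fence and the factor `k = 1.5` are assumptions chosen for illustration; the source does not say how singular points are detected or what is done with them, so this sketch only reports them.

```python
import statistics
from typing import List, Tuple


def clean_values(
    values: List[float], low: float = 0.0, high: float = 100.0, k: float = 1.5
) -> Tuple[List[float], List[float]]:
    """Return (kept, singular) after discarding values outside [low, high].

    Values outside the hard range are dropped as abnormal data. Among the
    remaining values, those outside the IQR fences (an assumed criterion)
    are reported separately as singular points.
    """
    in_range = [v for v in values if low <= v <= high]
    # statistics.quantiles needs at least two data points.
    q1, _, q3 = statistics.quantiles(in_range, n=4)
    spread = q3 - q1
    lower_fence, upper_fence = q1 - k * spread, q3 + k * spread
    kept = [v for v in in_range if lower_fence <= v <= upper_fence]
    singular = [v for v in in_range if not lower_fence <= v <= upper_fence]
    return kept, singular
```

Which values count as singular depends entirely on the chosen criterion and on the distribution of the collected data, so the threshold would need tuning for each feature.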
  • behavioral data is divided into multiple dimensions based on data sources and business characteristics. For example, basic information, social interaction, financial management, etc., are classified and integrated into the database.
  • the above basic attributes, social interaction behavior, purchase behavior, taxi behavior and friend attributes can all reflect the behavior characteristics of the first account.
  • the acquiring unit includes: an acquiring subunit, configured to acquire an information gain of a feature in the behavior data, where the information gain is used to represent the amount of information included in the behavior data; a judging subunit, configured to determine whether the information gain is within a preset value range; a constructing subunit, configured to construct a derivative variable according to the behavior data when the information gain is within the preset value range, wherein the derivative variable is merged or decomposed behavior data; a deleting subunit, configured to, when the information gain is outside the preset value range, delete the feature corresponding to that information gain and then construct the derivative variable according to the remaining features; and a determining subunit, configured to use the derivative variable as the first feature variable.
  • the deleting subunit includes: a second acquiring module, configured to acquire correlation coefficients of the remaining features after the feature corresponding to the information gain outside the preset value range is deleted; a merging module, configured to merge the features whose correlation coefficient is greater than or equal to a preset coefficient into one merged feature; and a determining module, configured to use the merged feature as the derivative variable.
  • the collected behavior data, including the number of text chats, the number of voice calls, the payment amount, etc., are all features of the behavior data. For example, for 9 text chats, 10 voice calls, and a payment amount of 100, the numbers are called feature values.
  • the information gain reflects the amount of information carried by a feature. If the amount of information is less than a threshold, the feature can be deleted. For example, the features of each type are sorted by information gain, and the features whose information gain is less than the threshold are deleted. Then the correlation among the remaining features is examined: features that are strongly correlated with one another are combined to obtain the first feature variable. If a feature is weakly correlated with the others and highly significant, it can be refined into multiple features, for example by splitting the number of chats into evening chats, daytime chats, weekend chats, and weekday chats. Conversely, night chats and day chats can be combined into chats.
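The filter-then-merge procedure above can be sketched with the standard library alone. Several things here are assumptions made for illustration: that a known outcome label (for example, defaulted or not) is available to compute the information gain against, that features are discrete, that merging means element-wise summation, and the threshold values `min_gain` and `min_corr`. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[int]) -> float:
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Entropy of the labels minus their entropy conditioned on a discrete feature."""
    total = len(labels)
    groups: Dict[int, List[int]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def build_derived_variables(
    features: Dict[str, List[int]],
    labels: Sequence[int],
    min_gain: float = 0.1,
    min_corr: float = 0.9,
) -> Dict[str, List[int]]:
    """Drop low-gain features, then merge strongly correlated ones by summing them."""
    kept = {
        name: column
        for name, column in features.items()
        if information_gain(column, labels) >= min_gain
    }
    derived: Dict[str, List[int]] = {}
    used = set()
    names = list(kept)
    for i, a in enumerate(names):
        if a in used:
            continue
        merged, merged_name = list(kept[a]), a
        for b in names[i + 1:]:
            if b not in used and pearson(kept[a], kept[b]) >= min_corr:
                merged = [m + v for m, v in zip(merged, kept[b])]
                merged_name += "+" + b
                used.add(b)
        derived[merged_name] = merged
    return derived
```

In this sketch, night and day chat counts that move together collapse into a single combined chat feature, mirroring the merge of night chats and day chats described above, while a feature that carries no information about the label is removed first.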
  • the behavior data can be flexibly split and merged to construct the first feature variable, and when splitting and merging, the same or different methods can be used for different features (for example, principal component analysis for some features and clustering for others), which increases the flexibility of constructing the first feature variable.
  • the apparatus further includes: a dividing unit, configured to divide the behavior data into a plurality of categories before the first feature variable and the second feature variable are input into the data analysis model; a first establishing unit, configured to establish a sub-model for each of the plurality of categories, wherein each sub-model is configured to output a first sub-value according to the first characteristic variable and/or the second characteristic variable, and the first sub-value is used to indicate the probability value that, in the category corresponding to the sub-model, the behavior of the first account does not satisfy the preset condition; and a second establishing unit, configured to construct the plurality of sub-models corresponding to the plurality of categories into the data analysis model.
  • the first establishing unit includes: a first establishing subunit, configured to establish a sub-model for each category by using the same or different training models; or a second establishing subunit, configured to establish a low-level model for each subcategory under each category by using the same or different training models, and to construct the low-level models corresponding to the subcategories under each category into a sub-model.
  • the training models used to build the sub-models of the categories can be the same or different. For example, among 10 categories, 5 use a decision tree training model and the other 5 use a neural network training model.
  • the second establishing unit is further configured to construct multiple sub-models into a data analysis model in the following manner:
  • $P_{\text{total}}$ represents the first value
  • $i$ denotes the i-th sub-model of the plurality of sub-models
  • $n$ is the number of the plurality of sub-models
  • $P_i'$ is the first sub-value output by the i-th sub-model
  • $P_0$ is a constant.
  • the dividing unit includes: a first dividing subunit, configured to divide the behavior data into multiple categories according to the service types included in the behavior data; or a second dividing subunit, configured to divide the data in the behavior data that includes a target object into one category, and the data in the behavior data that does not include the target object into another category.
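The two ways of dividing behavior data can be sketched as follows. The record layout (a `service` key and an `objects` list) and the function names are hypothetical; the source specifies only the two division criteria.

```python
from collections import defaultdict
from typing import Dict, List


def divide_by_service(records: List[dict]) -> Dict[str, List[dict]]:
    """First dividing subunit: one category per service type."""
    categories: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        categories[record["service"]].append(record)
    return dict(categories)


def divide_by_target(records: List[dict], target: str) -> Dict[str, List[dict]]:
    """Second dividing subunit: records that involve the target object versus the rest."""
    result: Dict[str, List[dict]] = {"with_target": [], "without_target": []}
    for record in records:
        key = "with_target" if target in record.get("objects", ()) else "without_target"
        result[key].append(record)
    return result
```

Each resulting category can then be given its own sub-model, as described for the first establishing unit.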
  • the apparatus further includes: a converting unit, configured to convert the first value into the third value S by using the following method after recording the first value output by the data analysis model:
  • S is used to indicate the degree to which the behavior of the first account satisfies the preset condition
  • b represents a reference value
  • p represents a first value
  • st represents a step size
  • the features used comprehensively cover the user's online and offline behavior, including not only basic user information, social interaction, financial activities, hobbies, and life circles, but also deeper traits such as personality characteristics and emotional inclination, which better characterize the stable features of the user's mental outlook and personality.
  • a multi-layered machine learning algorithm is adopted, which increases the complexity and predictive ability of the algorithm while improving the accuracy of the user's credit evaluation.
  • a server for implementing the above data processing method is further provided.
  • the server mainly includes a processor 501, a data interface 503, a memory 505, and a network interface 507, where:
  • the data interface 503 transmits the behavior data acquired by the third party tool to the processor 501 mainly by means of data transmission.
  • the memory 505 is mainly used to store behavior data and data analysis models.
  • the network interface 507 is mainly used for network communication with the server, and obtains behavior data provided by the terminal from other servers.
  • the processor 501 is mainly configured to perform the following operations:
  • collecting behavior data of the first account, where the behavior data includes Internet-based online behavior data and offline behavior data; acquiring the first feature variable of the first account according to the behavior data, where the first feature variable is used to represent a behavior characteristic of the first account; inputting the first feature variable into a data analysis model, wherein the data analysis model is used to output a first value according to the first feature variable, and the first value is used to indicate a probability value that the behavior of the first account does not satisfy a preset condition; and recording the first value output by the data analysis model.
  • the processor 501 is further configured to: acquire a second feature variable, where the second feature variable is used to represent behavior characteristics of a plurality of second accounts that have an association relationship with the first account; and input the first feature variable and the second feature variable into the data analysis model, wherein the data analysis model is further configured to output the first value according to the first feature variable and the second feature variable.
  • the processor 501 is further configured to acquire the intimacy between each of the plurality of second accounts and the first account, where the intimacy is generated according to the interaction behavior between each second account and the first account, and to obtain a third characteristic variable according to the intimacy and the second characteristic variable by using the following formula:
  • $\theta' = f\big((\alpha_1, \alpha_2, \ldots, \alpha_i, \ldots, \alpha_n), (\theta_1, \theta_2, \ldots, \theta_i, \ldots, \theta_n)\big)$,
  • where $\theta'$ denotes the third characteristic variable
  • $i$ denotes the i-th second account
  • $\alpha_i$ is the intimacy between the i-th second account and the first account
  • $\theta_i$ is the second characteristic variable of the i-th second account
  • $f$ denotes the intimacy-weighted average of the second characteristic variables of the top n second accounts ranked by intimacy from high to low
  • the first feature variable and the third feature variable are input into the data analysis model.
  • Embodiments of the present invention also provide a storage medium.
  • the above storage medium may be used to store program codes of the data processing method of the embodiment of the present invention.
  • the foregoing storage medium may be located on at least one of a plurality of network devices in a network such as a mobile communication network, a wide area network, a metropolitan area network, or a local area network.
  • the storage medium is arranged to store program code for performing the following steps:
  • S1: collect behavior data of the first account, where the behavior data includes Internet-based online behavior data and offline behavior data.
  • the first feature variable is input into a data analysis model, wherein the data analysis model is configured to output a first value according to the first feature variable, and the first value is used to indicate the probability value that the behavior of the first account does not satisfy the preset condition.
  • the storage medium may include, but is not limited to, a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a mobile hard disk, and a magnetic memory.
  • the processor performs, according to the program code stored in the storage medium: acquiring the second feature variable, where the second feature variable is used to represent behavior characteristics of a plurality of second accounts having an association relationship with the first account; and inputting the first feature variable and the second feature variable into the data analysis model, wherein the data analysis model is further configured to output the first value according to the first feature variable and the second feature variable.
  • the processor performs, according to the program code stored in the storage medium: acquiring the intimacy between each of the plurality of second accounts and the first account, wherein the intimacy is generated according to the interaction behavior between each second account and the first account; and obtaining the third feature variable according to the intimacy and the second feature variable by using the following formula:
  • $\theta' = f\big((\alpha_1, \alpha_2, \ldots, \alpha_i, \ldots, \alpha_n), (\theta_1, \theta_2, \ldots, \theta_i, \ldots, \theta_n)\big)$,
  • where $\theta'$ represents the third characteristic variable
  • $i$ denotes the i-th second account
  • $\alpha_i$ is the intimacy between the i-th second account and the first account
  • $\theta_i$ is the second characteristic variable of the i-th second account
  • $f$ denotes the intimacy-weighted average of the second characteristic variables of the top n second accounts ranked by intimacy from high to low
  • the first feature variable and the third feature variable are input into the data analysis model.
  • the embodiment of the invention further provides a computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions are configured to execute the data processing method described above.
  • the integrated unit in the above embodiment if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in the above-described computer readable storage medium.
  • the technical solution of the present invention, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium.
  • a number of instructions are included to cause one or more computer devices (which may be a personal computer, server or network device, etc.) to perform all or part of the steps of the methods described in various embodiments of the present invention.
  • the disclosed client may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division into units is only a division by logical function.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, unit, or module, and may be electrical or in another form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • the first feature variable is used to represent the behavior characteristic of the first account, which is obtained from the Internet-based behavior data of the first account; after the first feature variable is input into the data analysis model, the probability value that the behavior of the first account does not satisfy the preset condition can be obtained. Since the behavior data of the first account in the social application covers the behavior of the first account relatively widely, the behavior data input into the data analysis model can fully reflect the behavior of the first account, so that the resulting probability value that the behavior of the first account does not satisfy the preset condition is more accurate, thereby solving the technical problem that the credit level of the account cannot be accurately obtained.

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A data processing method and device, and a computer storage medium. The method comprises: collecting behavior data of a first account, the behavior data comprising Internet-based online behavior data and offline behavior data (S202); obtaining a first characteristic variable of the first account according to the behavior data (S204), wherein the first characteristic variable is used for representing behavior characteristics of the first account; inputting the first characteristic variable into a data analysis model (S206), wherein the data analysis model is used for outputting a first numerical value according to the first characteristic variable, and the first numerical value is used for representing the probability that a behavior of the first account does not satisfy a preset condition; and recording the first numerical value outputted by the data analysis model (S208).

Description

Data processing method and device, and computer storage medium
Technical field
The present invention relates to the field of data processing, and in particular to a data processing method and apparatus, and a computer storage medium.
Background
Data processing can serve a variety of services. Taking the existing personal credit reporting service as an example, the data processing procedure is described as follows:
An individual's credit level is established by collecting data from banks. In general, the credit level is established from the data in a basic credit reporting database. The basic credit reporting database includes credit information, public records, and inquiry records. Credit information includes credit card records, bank loan records, personal asset records, and other credit loan records; public records include personal housing provident fund records, personal pension insurance records, etc.; and inquiry records include personal addresses, contact information, etc. When an individual's credit level is established, the bank's credit information is the main basis, and the credit level is obtained by means of a sample survey. However, because the bank's credit information is updated slowly, it cannot reflect the individual's true creditworthiness in time, so the obtained credit level is inaccurate. In addition, the data obtained through the sample surveys of the prior art cannot fully reflect the true creditworthiness of a bank account, which also makes the final credit level inaccurate and thus the data inaccurate.
No effective solution to the above problems has yet been proposed.
Summary of the invention
Embodiments of the present invention provide a data processing method and apparatus, and a computer storage medium, to at least solve the technical problem that data is inaccurate because the credit level of an account cannot be accurately obtained.
According to one aspect of the embodiments of the present invention, a data processing method is provided, including: collecting behavior data of a first account, the behavior data including Internet-based online behavior data and offline behavior data; acquiring a first feature variable of the first account according to the behavior data, wherein the first feature variable is used to represent a behavior characteristic of the first account; inputting the first feature variable into a data analysis model, wherein the data analysis model is configured to output a first value according to the first feature variable, and the first value is used to indicate a probability value that the behavior of the first account does not satisfy a preset condition; and recording the first value output by the data analysis model.
According to another aspect of the embodiments of the present invention, a data processing apparatus is further provided, including: a collecting unit, configured to collect behavior data of a first account, the behavior data including Internet-based online behavior data and offline behavior data; an obtaining unit, configured to acquire a first feature variable of the first account according to the behavior data, wherein the first feature variable is used to represent a behavior characteristic of the first account; an input unit, configured to input the first feature variable into a data analysis model, wherein the data analysis model is configured to output a first value according to the first feature variable, and the first value is used to indicate a probability value that the behavior of the first account does not satisfy a preset condition; and a recording unit, configured to record the first value output by the data analysis model.
When performing processing, the collecting unit, the obtaining unit, the input unit, and the recording unit may be implemented by a central processing unit (CPU), a digital signal processor (DSP), or a field-programmable gate array (FPGA).
An embodiment of the present invention further provides a computer storage medium storing computer-executable instructions, the computer-executable instructions being configured to execute the data processing method described above.
In the embodiments of the present invention, the first feature variable is used to characterize the behavior of the first account, and the behavior characteristic of the first account is obtained from the Internet-based behavior data of the first account; once the first feature variable is input into the data analysis model, the probability value that the behavior of the first account does not satisfy the preset condition is obtained. Since the behavior data of the first account in social applications covers the behavior of the first account relatively widely, the behavior data input into the data analysis model can fully reflect the behavior of the first account, so that the resulting probability value that the behavior of the first account does not satisfy the preset condition is more accurate, thereby solving the technical problem that the credit level of an account cannot be accurately obtained.
Brief Description of the Drawings
The drawings described here provide a further understanding of the present invention and constitute a part of this application. The illustrative embodiments of the present invention and their description explain the present invention and do not unduly limit it. In the drawings:
FIG. 1 is a schematic diagram of a network architecture according to an embodiment of the present invention;
FIG. 2 is a flowchart of a data processing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a model architecture according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 5 is a hardware structure diagram of a server according to an embodiment of the present invention.
Detailed Description
To help those skilled in the art better understand the solutions of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", and so on in the specification, claims, and drawings of the present invention distinguish similar objects and do not necessarily describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments described here can be implemented in orders other than those illustrated or described. In addition, the terms "include" and "have", and any variations of them, are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
Embodiment 1
According to an embodiment of the present invention, a method embodiment is provided that may be performed by the apparatus embodiment of this application. It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in a different order.
According to an embodiment of the present invention, a data processing method is provided.
In one implementation of this embodiment, the data processing method may be applied in a hardware environment formed by a terminal 102 and a server 104, as shown in FIG. 1. The terminal 102 is connected to the server 104 through a network, which includes but is not limited to a mobile communication network, a wide area network, a metropolitan area network, or a local area network. The terminal 102 may be a mobile phone, a PC, a notebook computer, or a tablet computer.
The hardware environment shown in FIG. 1 works mainly as follows:
The server 104 collects behavior data of multiple terminals 102. This includes behavior data produced when a terminal 102 performs actions over the Internet (for example, chatting in an instant messaging application, watching videos, or playing games) and behavior data produced by actions that combine the Internet with offline activity (for example, exercise data stored in the cloud through a wearable device). From the collected behavior data, the server 104 derives feature variables for one or more terminals 102 and then uses those feature variables to obtain the probability that the behavior of a given terminal satisfies a preset condition (for example, the credit rating of that terminal). Further, when the credit rating of one terminal 102 is obtained from the feature variables of multiple terminals 102, those terminals have an association relationship (such as a friend relationship) with that terminal 102.
Because the behavior data of the first account in social applications is used, rather than only the bank data relied on in the prior art, the collected behavior data covers a wider range and reflects, from multiple aspects, the probability that the behavior of the first account satisfies the preset condition. This improves the accuracy of the obtained probability value and solves the technical problem that the prior art cannot accurately obtain the credit level of an account.
FIG. 2 is a flowchart of a data processing method according to an embodiment of the present invention. The data processing method provided by this embodiment is described in detail below with reference to FIG. 2. As shown in FIG. 2, the method mainly includes the following steps:
Step S202: collect behavior data of a first account, where the behavior data includes Internet-based online data and offline data.
Step S204: acquire a first feature variable of the first account according to the behavior data, where the first feature variable represents a behavior feature of the first account.
Step S206: input the first feature variable into a data analysis model, where the data analysis model is configured to output a first value according to the first feature variable, and the first value represents the probability that the behavior of the first account does not satisfy a preset condition.
Step S208: record the first value output by the data analysis model.
The first feature variable characterizes the behavior feature of the first account, and that behavior feature is obtained from the Internet-based behavior data of the first account. Once the first feature variable is input into the data analysis model, the probability that the behavior of the first account does not satisfy the preset condition is obtained. Because the behavior data of the first account in social applications covers the behavior of the first account broadly, the behavior data input into the data analysis model reflects that behavior comprehensively, so the resulting probability value is more accurate.
Specifically, the behavior data includes the Internet-based online behavior data and the offline behavior data of the first account.
Online virtual-space behavior data on the Internet includes, but is not limited to:
1) the user's basic demographic attributes, such as name, age, gender, region, education, and occupation;
2) virtual value-added service data, such as dressing up a virtual account avatar, purchasing game items, video membership services, value-added cloud storage services, and music data packages;
3) social interaction behavior data, such as chat, email, voice calls, microblog and personal-space posts, Douban reviews, Zhihu questions and answers, and reading of official-account articles;
4) economic behavior data, such as payment, wealth management, shopping, stocks, funds, P2P, and finance;
5) entertainment and leisure behavior data, such as video on demand, music playback, karaoke, and news reading;
6) educational behavior data, such as online reading, open-course study, professional exam practice, skills training, and use of translation software;
7) other Internet and mobile application behavior data, such as app downloads and searches.
Online data may be obtained through instant messaging applications, game clients, app download platforms, wealth management platforms, shopping software, and the like on the user's mobile phone, tablet, or PC, either by collecting information that the user fills in or through active reporting by the application.
Offline associated-scenario data includes, but is not limited to:
1) O2O (online to offline) life service information, such as housekeeping services, city services, and beauty and health care;
2) wearable device data, such as medical and health data and exercise data;
3) LBS (location-based service) geographic location data, such as navigation, check-ins, and chauffeured car services;
4) travel data, such as ticket orders and hotel reservations.
As can be seen, the behavior data includes actions in a wide variety of online and offline scenarios and covers almost every aspect of daily life, so the probability value obtained from it reflects the true probability for the account more accurately. In addition, when behavior data changes, the change is fed back to the server, or fetched by the server, immediately. The behavior data is therefore updated quickly, and the probability value obtained from it can reflect the current probability that the behavior of the first account does not satisfy the preset condition. The probability of not satisfying the preset condition may be a default probability, for example the probability of conduct that breaches the terms of a contract.
For example, to obtain the credit rating of user A from user A's behavior data, the collected data may include user A's account chat behavior in an instant messaging application, video-watching behavior in a video application, application download behavior, and so on. First feature variables are extracted from each kind of behavior data, giving first feature variables of different categories, such as an instant-messaging first feature variable, a video first feature variable, and a download first feature variable. All of these may be input into the data analysis model to output the first value; alternatively, only some of them may be input.
User A's friends usually resemble user A in some respects, so the behavior data of user A's friends can also reflect the probability that user A's behavior does not satisfy the preset condition. Therefore, when the first feature variable is input into the data analysis model, feature variables associated with user A's friends may be input at the same time.
That is, inputting the first feature variable into the data analysis model includes: acquiring a second feature variable, where the second feature variable represents behavior features of multiple second accounts that have an association relationship with the first account; and inputting the first feature variable and the second feature variable into the data analysis model, where the data analysis model is further configured to output the first value according to the first feature variable and the second feature variable.
The second feature variable is acquired in the same way as the first feature variable, as detailed later. The first account and the multiple second accounts that have the association relationship are friends; that is, the second accounts are friends of the first account. The online and offline behaviors in the examples above can all be mapped, through a correspondence, to the behavior of an application account. For example, if a second account has used its mobile phone number to register for both a navigation service and an instant messaging application, then acquiring the behavior data of the second account involves collecting its behavior data in the navigation service and in the instant messaging application.
Further, inputting the first feature variable and the second feature variable into the data analysis model includes: acquiring the intimacy between each of the multiple second accounts and the first account, where the intimacy is generated from the interaction behavior between each second account and the first account; acquiring a third feature variable from the intimacy and the second feature variables using the following formula:
$\upsilon' = f\big((\alpha_1, \alpha_2, \ldots, \alpha_i, \ldots, \alpha_n), (\upsilon_1, \upsilon_2, \ldots, \upsilon_i, \ldots, \upsilon_n)\big)$,
where $\upsilon'$ denotes the third feature variable, $i$ indexes the i-th second account, $\alpha_i$ is the intimacy between the i-th second account and the first account, $\upsilon_i$ is the second feature variable of the i-th second account, and $f$ denotes the intimacy-weighted average of the second feature variables of the first $n$ second accounts when the second accounts are ranked from highest to lowest intimacy; and inputting the first feature variable and the third feature variable into the data analysis model.
In this embodiment, the second feature variables of the second accounts are processed so that they better reflect the behavior feature of the first account. Each second feature variable is multiplied by its corresponding weight, and a weighted average is then taken. The weight represents the intimacy between the first account and the second account: the closer the two accounts, the larger the weight, and the more distant, the smaller the weight. Intimacy can be measured by the interaction between the first account and the second account. For example, the more the two accounts chat with each other, the closer the relationship, and the more the communities of the two accounts overlap, the closer the relationship. Intimacy and overlap may be obtained by training a model. Interaction behavior includes friend-circle interactions, payment interactions (such as sending red envelopes), and exercise interactions (such as liking a friend's 10,000 steps). Intimacy can be reflected in these information exchanges, including the number of times and the number of days on which messages are sent and received, the ratio of messages sent to messages received, and the number of exchanges per day. The messages include text, video, and voice messages.
Intimacy can also be derived from actions such as commenting, liking, marking a friend as a special friend, sending gifts, or blocking.
For example, the third feature variable may be
Figure PCTCN2016109729-appb-000001
that is, the intimacy-weighted average of the second feature variables of the 10 friends ranked closest to the first account.
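The weighted average described above can be illustrated with a short sketch. It assumes that the average is normalized by the sum of the intimacy scores of the selected friends, which the text does not state explicitly; the function name `third_feature_variable` and its parameters are illustrative and are not part of the embodiment.

```python
from typing import Sequence


def third_feature_variable(intimacy: Sequence[float],
                           friend_features: Sequence[float],
                           n: int = 10) -> float:
    """Intimacy-weighted average of the second feature variables
    of the n friends closest to the first account."""
    if len(intimacy) != len(friend_features):
        raise ValueError("one intimacy score is needed per friend feature")
    # Rank friends from most to least intimate and keep the top n.
    ranked = sorted(zip(intimacy, friend_features),
                    key=lambda pair: pair[0], reverse=True)[:n]
    total_weight = sum(alpha for alpha, _ in ranked)
    if total_weight == 0:
        raise ValueError("intimacy weights sum to zero")
    # Assumed normalization: divide by the sum of the weights used.
    return sum(alpha * v for alpha, v in ranked) / total_weight
```

With intimacy scores `[3, 1, 2]`, friend features `[10, 100, 20]`, and `n=2`, only the two closest friends contribute, so the least intimate friend's value of 100 does not affect the result.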
The general characteristics of a group can reflect the characteristics of an individual user in that group. The probability that behavior does not satisfy the preset condition can therefore be obtained from the characteristics of the group, which reflects the user's credit rating more accurately. It should be noted that, when acquiring the second feature variables of multiple second accounts, the n second accounts with the highest intimacy are first selected from the friends of the first account, and the third feature variable is then generated from the intimacy and the second feature variables.
Because the collected behavior data covers a wide range, the formats of the data obtained also differ. After the behavior data is obtained, abnormal data is therefore deleted, duplicate data is removed, data with large fluctuations is filtered out, and missing data is filled in. Abnormal data may be data that clearly falls outside a plausible range. For example, a person's age rarely reaches 100, so a collected record showing an age of 100 is treated as abnormal and deleted. As another example, if the collected ages include 0 and 49, both values lie within the range of 0 to 100, but because most of the other values fall between 18 and 45, 0 and 49 are outliers with large fluctuation.
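A minimal sketch of these cleaning steps follows. The range bounds, the median-absolute-deviation rule for spotting high-fluctuation values, and the choice to fill gaps with the median are all assumptions made for illustration; the embodiment does not specify how each step is carried out.

```python
from statistics import median


def drop_duplicates(rows):
    """Remove repeated rows (hashable tuples) while keeping first-seen order."""
    return list(dict.fromkeys(rows))


def clean_feature(values, low, high, k=3.0):
    """Clean one numeric feature column; None marks a missing value."""
    # 1) Abnormal data: values outside [low, high] are discarded (set to None).
    in_range = [v if v is not None and low <= v <= high else None
                for v in values]
    present = [v for v in in_range if v is not None]
    if not present:
        return in_range
    # 2) High-fluctuation data: discard points far from the median,
    #    measured in units of the median absolute deviation (assumed rule).
    med = median(present)
    mad = median(abs(v - med) for v in present)
    kept = [None if v is None or (mad > 0 and abs(v - med) > k * mad) else v
            for v in in_range]
    # 3) Missing data: fill every gap with the median of the surviving values.
    fill = median(v for v in kept if v is not None)
    return [fill if v is None else v for v in kept]
```

For the column `[25, 30, None, 28, 150, 27, 90, 26]` with bounds 0 and 100, the value 150 is out of range, 90 is far from the median, and the original gap is missing, so all three positions are filled with the median of the remaining values.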
After this basic processing, the behavior data is divided into multiple dimensions according to data source and service characteristics, such as basic information, social interaction, and finance and wealth management, and is then classified, consolidated, and written to a database. Data types and data structures may be agreed on before writing to the database. For example, numeric values are stored as int and region names are stored as strings. Other forms are also possible and are not listed one by one.
Because the stored behavior data is large in volume and much of it is correlated, the data needs to be screened to obtain features with strong significance for input into the data analysis model. For example:
a) among basic attributes, civil servants have relatively stable jobs and reliable sources of income, which can reflect the user's financial capacity and willingness to repay;
b) in social interaction, users who often fail to reply to messages promptly may be lax, which reflects a tendency to procrastinate;
c) frequent purchases of value-added services and frequent online shopping can reflect the user's financial capacity;
d) in economic behavior, purchases of stocks, funds, and P2P products can reflect the user's risk tolerance and financial capacity;
e) hailing a chauffeured car but frequently canceling the order, or having a low star rating, can reflect the user's reputation;
f) if the friends with whom the user frequently interacts are well-mannered, keep their commitments, and have strong financial capacity, this reflects on the user to some extent.
The basic attributes, social interaction behavior, purchasing behavior, ride-hailing behavior, and friend attributes above can all reflect the behavior feature of the first account.
In one implementation of this embodiment, acquiring the first feature variable of the first account according to the behavior data includes: acquiring the information gain of each feature in the behavior data, where the information gain represents the amount of information contained in the behavior data; determining whether the information gain is within a preset value range; if the information gain is within the preset value range, constructing a derived variable from the behavior data, where the derived variable is merged or split behavior data; if the information gain is outside the preset value range, deleting the feature whose information gain is outside the range and then constructing the derived variable from the remaining features; and using the derived variable as the first feature variable.
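The information-gain screening can be sketched as follows. The sketch assumes discrete feature values, binary labels (1 for a defaulting sample, 0 otherwise), and a single lower threshold standing in for the preset value range; none of these details is fixed by the text, and the function names are illustrative.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_features(columns, labels, threshold):
    """Keep only the features whose information gain reaches the threshold."""
    return {name: col for name, col in columns.items()
            if information_gain(col, labels) >= threshold}
```

A feature that separates defaulting from non-defaulting samples perfectly has a gain equal to the label entropy, while a feature that is independent of the labels has a gain of zero and is dropped.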
In one implementation of this embodiment, deleting the feature whose information gain is outside the preset value range and then constructing the derived variable from the remaining features includes: after the deletion, acquiring the correlation coefficients of the remaining features; merging features whose correlation coefficient is greater than or equal to a preset coefficient into one merged feature; and using the merged feature as the derived variable.
Features and feature values together make up the behavior data. For example, the number of text chats, the number of voice calls, and the payment amount in the collected behavior data are all features, while in "9 text chats, 10 voice calls, and a payment amount of 100" the numbers are feature values. Information gain reflects the amount of information a feature carries, and a feature whose information is below a threshold can be deleted. For example, the features in each category are ranked by information gain, and those below the threshold are deleted. The correlation among the remaining features is then examined, and strongly correlated features are merged to obtain the first feature variable. If a feature is weakly correlated with the others but highly significant, it can be refined into several features. For example, the number of chats can be split into the number of evening chats, daytime chats, weekend chats, and weekday chats. Conversely, the number of evening chats and the number of daytime chats can be merged into the number of chats.
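The merging of strongly correlated features can be sketched as follows. The sketch assumes the Pearson correlation coefficient and merges features by summing them, which suits count features such as daytime and evening chats; the text leaves the merging method open, and the names used here are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def merge_correlated(columns, min_corr):
    """Greedily fold each feature into the first kept feature it correlates
    with at min_corr or above; merging is done by element-wise summation."""
    merged = {}
    for name, col in columns.items():
        for kept_name, kept_col in list(merged.items()):
            if pearson(kept_col, col) >= min_corr:
                del merged[kept_name]
                merged[kept_name + "+" + name] = [
                    a + b for a, b in zip(kept_col, col)]
                break
        else:
            # No sufficiently correlated feature found: keep it as is.
            merged[name] = list(col)
    return merged
```

Applied to evening and daytime chat counts that rise and fall together, the two columns collapse into one total chat count, while an unrelated payment column is left untouched.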
In this embodiment, behavior data can be split and merged flexibly to construct the first feature variable. When splitting and merging, the same or different methods may be applied to different features (for example, principal component analysis for some features and clustering for others), which adds flexibility in constructing the first feature variable.
In one implementation of this embodiment, when the data analysis model is built, one sub-model may be generated for each of the categories into which the behavior data was divided at collection time. Each sub-model outputs a first sub-value, and processing these first sub-values yields the first value output by the data analysis model. Further, a sub-model may be obtained by training on the sample data of its category. Alternatively, each category may be divided further, with a low-level model built for each subdivision; multiple low-level models then form a sub-model, and the sub-models in turn form the data analysis model.
In one implementation of this embodiment, before the first feature variable and the second feature variable are input into the data analysis model, the method further includes: dividing the behavior data into multiple categories; building one sub-model for each of the multiple categories, where each sub-model is configured to output a first sub-value according to the first feature variable and/or the second feature variable, and the first sub-value represents the probability that, under the category corresponding to the sub-model, the behavior of the first account does not satisfy the preset condition; and constructing the data analysis model from the multiple sub-models corresponding to the multiple categories.
In one implementation of this embodiment, building one sub-model for each of the multiple categories includes: building one sub-model for each category using the same or different training models; or building a low-level model for each subcategory under each category using the same or different training models, and constructing the sub-model of a category from the low-level models of its subcategories.
The training models used to build the sub-models of different categories may be the same or different. For example, of 10 categories, 5 may use a decision tree training model and the other 5 may use a neural network to train their sub-models.
In one implementation of this embodiment, constructing the data analysis model from the multiple sub-models corresponding to the multiple categories includes constructing it in the following manner:
Figure PCTCN2016109729-appb-000002
where $P$ denotes the first value, $i$ indexes the i-th of the multiple sub-models, $n$ is the number of sub-models,
Figure PCTCN2016109729-appb-000003
is the coefficient of the i-th sub-model, $P_i'$ is the first sub-value output by the i-th sub-model, and $P_0$ is a constant term.
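The combining formula itself appears only as a figure and is not reproduced in this text. Purely for illustration, the sketch below assumes a logistic combination of the sub-model outputs, with one coefficient per sub-model and a constant term; this functional form, the function name, and the parameter names are assumptions and should not be read as the formula of the embodiment.

```python
from math import exp


def combine_submodels(sub_values, coefficients, constant):
    """Combine first sub-values into one probability.

    Assumed form: a logistic function applied to the constant term plus
    the coefficient-weighted sum of the sub-model outputs.
    """
    if len(sub_values) != len(coefficients):
        raise ValueError("one coefficient is needed per sub-model")
    z = constant + sum(b * p for b, p in zip(coefficients, sub_values))
    return 1.0 / (1.0 + exp(-z))
```

Under this assumed form the result always lies strictly between 0 and 1, and with positive coefficients a higher sub-value from any sub-model raises the combined probability.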
Further, dividing the behavior data into multiple categories includes: dividing the behavior data into multiple categories according to the service types it contains; or placing the data in the behavior data that includes a target object into one category and the data that does not include the target object into another.
Division by level, division by service type, and division by whether a target object is included: any one of these three methods may be used alone to build sub-models, or any two or all three may be combined. For example, sub-models may first be built according to whether the target object is included, and the low-level models beneath each sub-model may then be divided by service type.
Division by service mainly follows the data categories described earlier, such as basic information, value-added services, social interaction, and economic behavior. Division by group is mainly based on service characteristics: in economic behavior, for example, users with and without credit cards behave quite differently in payment, shopping, and wealth management, so they can be divided into two groups with a model built for each. Division by layer concerns the levels of the overall model architecture: below the sub-model layer, for example, a sub-model can itself be divided into multiple dimension layers, and the machine learning algorithm used at each layer may be entirely different.
When generating the sub-models, the detailed procedure is as follows:
1) Obtain good and bad samples and divide them into a training set and a test set. Good samples are behavior data recorded when an agreement was kept; bad samples are behavior data recorded when a default occurred.
2) According to the service characteristics of each sub-model, extract multi-dimensional features of the user and of the user's friends, and train multi-layer sub-models using machine learning algorithms such as regression, classification, and segmentation. Taking the social interaction sub-model as an example, the steps are as follows:
1. Extract features of the user and the user's friends in at least the following dimensions: text chat, voice messages, video calls, picture posts, comments and likes, and question-and-answer interaction;
2. Train the dimension-layer models of the social interaction sub-model using machine learning algorithms such as LR (logistic regression), decision trees, neural networks, and GBDT, and output credit probability values;
3. Train the social interaction sub-model using the algorithms described in step 2, and output a credit probability value (the first sub-value).
3) Take the credit probability values output by the sub-models as input values and, using the formula
Figure PCTCN2016109729-appb-000004
train the overall model and output the predicted probability value (the first value).
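The two-level training in steps 2) and 3) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method claimed in the source: the overall model is assumed to be a logistic regression over the sub-model outputs (the actual formula survives only as a figure), and the names `sigmoid`, `train_total_model`, `predict_total`, and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_total_model(sub_probs, labels, lr=0.5, epochs=2000):
    """Fit P = sigmoid(p0 + sum_i w[i] * P_i') by batch gradient ascent
    on the log-likelihood. The logistic form is an assumption of this sketch.

    sub_probs: one row per sample, holding the sub-model outputs P_i'.
    labels:    1 for a bad sample (default), 0 for a good sample.
    """
    n = len(sub_probs[0])
    m = len(labels)
    w, p0 = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_p0 = [0.0] * n, 0.0
        for row, y in zip(sub_probs, labels):
            err = y - sigmoid(p0 + sum(wi * xi for wi, xi in zip(w, row)))
            grad_p0 += err
            for i, xi in enumerate(row):
                grad_w[i] += err * xi
        p0 += lr * grad_p0 / m
        w = [wi + lr * g / m for wi, g in zip(w, grad_w)]
    return w, p0

def predict_total(w, p0, row):
    """Predicted probability (the first value) for one sample."""
    return sigmoid(p0 + sum(wi * xi for wi, xi in zip(w, row)))

# Hypothetical outputs of two sub-models (e.g. social interaction, economic behavior).
train_x = [[0.9, 0.8], [0.7, 0.9], [0.8, 0.6], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
train_y = [1, 1, 1, 0, 0, 0]

w, p0 = train_total_model(train_x, train_y)
risky = predict_total(w, p0, [0.85, 0.75])
safe = predict_total(w, p0, [0.15, 0.20])
```

In practice the samples would first be split into training and test sets as in step 1), and the overall model would be evaluated on sub-model outputs it was not trained on.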
This embodiment is described below with reference to FIG. 3.
The data processing method of this embodiment consists of four main parts: data collection, data processing, feature mining, and model construction.
(1) Data collection. Both online data and offline scenario data are collected. Online data covers games, finance, applications, shopping, social networking, education, and the like, for example game names and purchase amounts. Offline scenario data covers daily life, navigation, travel, check-ins, medical care, sports, and the like, for example medical records, hotel bookings, and travel destinations.
(2) Data processing. This includes cleaning, integration, and normalization. Cleaning includes deduplication, deleting singular points, removing abnormal data, and filling in missing information. Integration includes grouping data of the same kind into one category. Normalization includes normalizing data types and normalizing the data structures used for storage.
(3) Feature mining. The processed data is mined, for example using graph computation and text mining. The mined features cover the user's basic information, social interaction, personality traits, hobbies, emotional tendencies, social circles, physical health, financial management, and other aspects.
(4) Model construction. The mined features are classified, for example into social interaction, hobbies, health, and personality, and one model is built for each category. Each model may be obtained with a different training method. For the social interaction model, the features can be further subdivided into chat features, voice features, video features, and so on. Once the sub-models have been built, the overall model is obtained. The first feature variable and the third feature variable are then input into the sub-models to obtain the first value output by the overall model.
For example, if the first feature variable comprises features a1, a2, and a3, then features b1, b2, and b3 of the second accounts are also obtained as the third feature variable and input into the sub-model as follows: y = f(a1*b1) + f(a2*b2) + f(a3*b3). Features a1, a2, a3 and features b1, b2, b3 form three pairs of corresponding features. For instance, a1 is the payment amount of the first account and b1 the payment amount of the second account; a2 is the game type of the first account and b2 the game type of the second account; a3 is the exercise count of the first account and b3 the exercise count of the second account.
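The pairing y = f(a1*b1) + f(a2*b2) + f(a3*b3) can be written as a short sketch. The source does not specify f, so it is left as a parameter here and defaults to the identity purely for illustration; the function name `paired_feature_score` and the sample values are hypothetical, and all features are assumed to be already encoded as numbers.

```python
def paired_feature_score(first_features, third_features, f=lambda x: x):
    """Compute y = sum_k f(a_k * b_k) over corresponding feature pairs.

    first_features: features a_k of the first account.
    third_features: corresponding features b_k derived from the second accounts.
    f:              transformation applied to each product (unspecified in the
                    source; identity is an assumption of this sketch).
    """
    if len(first_features) != len(third_features):
        raise ValueError("feature lists must pair up one to one")
    return sum(f(a * b) for a, b in zip(first_features, third_features))

# Hypothetical normalized values: payment amount, game type code, exercise count.
a = [0.5, 1.0, 0.2]
b = [0.4, 1.0, 0.5]
y = paired_feature_score(a, b)
```

Passing a different `f`, for example `math.tanh`, changes how strongly each product contributes without changing the pairing itself.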
In one implementation of this embodiment, to make the first value easier to read, the first value is converted into a form that reflects the credit level of the first account. The first value represents the probability that the first account defaults; after conversion to the third value, it can represent the credit level of the first account. That is, after the first value output by the data analysis model is recorded, the method further includes converting the first value into a third value S as follows:
Figure PCTCN2016109729-appb-000005
where S represents the degree to which the behavior of the first account satisfies the preset condition, b is a reference value, p is the first value, and st is the step size.
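As an illustration of how a probability can be turned into a readable score from a reference value and a step size, the sketch below uses a log-odds scaling that is common in credit scorecards. This form is an assumption: the source formula survives only as a figure and may differ. The function name `to_credit_score` and the default values 600 and 50 are hypothetical.

```python
import math

def to_credit_score(p, base=600.0, step=50.0):
    """Convert a default probability p into a score S.

    Assumed form (not confirmed by the source): S = b + st * log2((1 - p) / p),
    so S equals the reference value b at p = 0.5 and rises by one step st
    each time the odds of keeping the agreement double.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return base + step * math.log2((1.0 - p) / p)

low_risk = to_credit_score(0.2)
neutral = to_credit_score(0.5)
high_risk = to_credit_score(0.8)
```

Under this assumed form a lower default probability always yields a higher score, which matches the description of S as the degree to which the account's behavior satisfies the preset condition.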
When obtaining the credit level, this embodiment uses features that comprehensively cover the user's online and offline behavior. They include not only the user's basic information, social interaction, financial activity, hobbies, and social circles, but also stable characteristics mined in depth, such as personality traits and emotional tendencies, that better portray the user's state of mind and character. In addition, multi-layer, diversified machine learning algorithms are used, which preserves interpretability while increasing the sophistication and predictive power of the algorithm, thereby improving the accuracy of the evaluation of the user's credit level.
It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
From the description of the above implementations, those skilled in the art can clearly understand that the method of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, may be embodied as a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
Embodiment 2
According to an embodiment of the present invention, a data processing apparatus for implementing the above data processing method is also provided. The apparatus is mainly used to perform the data processing method described above and is introduced in detail below:
FIG. 4 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention. As shown in FIG. 4, the apparatus mainly includes a collecting unit 10, an obtaining unit 20, an input unit 30, and a recording unit 40.
The collecting unit 10 is configured to collect behavior data of a first account, the behavior data including Internet-based online behavior data and offline behavior data.
The obtaining unit 20 is configured to obtain a first feature variable of the first account from the behavior data, where the first feature variable represents behavior features of the first account.
The input unit 30 is configured to input the first feature variable into a data analysis model, where the data analysis model outputs a first value according to the first feature variable, and the first value represents the probability that the behavior of the first account does not satisfy a preset condition.
The recording unit 40 is configured to record the first value output by the data analysis model.
The first feature variable characterizes the behavior of the first account and is derived from the Internet-based behavior data of the first account. Once the first feature variable is input into the data analysis model, the probability that the behavior of the first account does not satisfy the preset condition is obtained. Because the behavior data of the first account in social applications covers a broad range of the account's behavior, the data fed to the data analysis model reflects that behavior comprehensively, which makes the resulting probability more accurate.
Specifically, the behavior data includes the Internet-based online behavior data and offline behavior data of the first account.
The behavior data includes actions in a variety of online and offline scenarios and covers behavior in almost every aspect of life, so the probability obtained from it reflects the true probability for the account more accurately. In addition, when the behavior data changes, the change is reported to the server, or fetched by the server, immediately. The behavior data is therefore updated quickly, and the probability obtained from this up-to-date data reflects the probability that the current behavior of the first account does not satisfy the preset condition. The probability of not satisfying the preset condition may be a probability of default, for example of failing to comply with the terms of a contract.
For example, to obtain the credit level of user A from user A's behavior data, the collected data may include the chat behavior of user A's account in an instant messaging application, video viewing behavior in a video application, application download behavior, and so on. First feature variables are extracted separately from these kinds of behavior data, yielding first feature variables of different categories, such as an instant-messaging category, a video category, and a download category. All of these first feature variables may be input into the data analysis model to output the first value, or only some of them may be input.
In general, user A's friends resemble user A, so their behavior data can also reflect the probability that user A's behavior does not satisfy the preset condition. Therefore, when the first feature variable is input into the data analysis model, feature variables associated with user A's friends may be input at the same time.
That is, the input unit includes: a first obtaining subunit, configured to obtain a second feature variable, where the second feature variable represents behavior features of a plurality of second accounts associated with the first account; and an input subunit, configured to input the first feature variable and the second feature variable into the data analysis model, where the data analysis model is further configured to output the first value according to the first feature variable and the second feature variable.
The second feature variable is obtained in the same way as the first feature variable, as detailed later. The first account and the associated second accounts are in a friend relationship; that is, the second accounts are friends of the first account. Both the online and the offline behavior in the examples above can be mapped, through a correspondence, to the behavior of an application account. For example, if a second account has registered for a navigation service and an instant messaging application with a mobile phone number, then obtaining the behavior data of the second account means collecting its behavior data in the navigation service and in the instant messaging application.
Further, the input subunit includes: a first obtaining module, configured to obtain the intimacy between each of the plurality of second accounts and the first account, where the intimacy is generated from the interaction between each second account and the first account, and to obtain a third feature variable from the intimacy and the second feature variable using the following formula:
υ' = f((α_1, α_2, ..., α_i, ..., α_n), (υ_1, υ_2, ..., υ_i, ..., υ_n)),
where υ' denotes the third feature variable, i denotes the i-th second account, α_i is the intimacy between the i-th second account and the first account, υ_i is the second feature variable of the i-th second account, and f denotes the intimacy-weighted average of the second feature variables of the first n second accounts when ranked from highest to lowest intimacy; and an input module, configured to input the first feature variable and the third feature variable into the data analysis model.
In this embodiment, the second feature variables of the second accounts are processed so that they better reflect the behavior features of the first account. To this end, each second feature variable is multiplied by a corresponding weight and a weighted average is taken. The weight represents the intimacy between the first account and the second account: the closer the two accounts, the larger the weight, and vice versa. Intimacy can be measured by the interaction between the first account and the second account; for example, the more the two accounts chat, the closer the relationship. Likewise, the greater the community overlap between the two accounts, the closer the relationship. Intimacy and overlap can be obtained by training a model. Interaction includes friend-circle interaction, payment interaction (such as sending red envelopes), exercise interaction (such as liking a friend's 10,000-step walk), and so on. Intimacy can be reflected in message exchange, including the number of messages sent and received, the number of days on which messages were exchanged, the ratio of sent to received messages, and the number of exchanges per day. The messages include text, video, and voice messages. Intimacy can also be derived from behavior such as commenting, liking, marking a friend as a special friend, sending gifts, or blocking.
For example, the third feature variable
Figure PCTCN2016109729-appb-000006
that is, the third feature variable is the intimacy-weighted average of the second feature variables of the 10 closest friends, ranked from closest to most distant.
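The intimacy-weighted average over the closest friends can be sketched as below. The normalization by the sum of the weights is an assumption (the exact expression survives only as a figure), and the name `third_feature_variable` and the sample values are hypothetical.

```python
def third_feature_variable(intimacies, second_features, n=10):
    """Intimacy-weighted average of the second feature variables of the
    n second accounts with the highest intimacy.

    intimacies:      alpha_i, intimacy between the i-th second account and the first account.
    second_features: v_i, the second feature variable of the i-th second account.
    Dividing by the sum of the selected weights is an assumption of this sketch.
    """
    ranked = sorted(zip(intimacies, second_features),
                    key=lambda pair: pair[0], reverse=True)[:n]
    total_weight = sum(alpha for alpha, _ in ranked)
    if total_weight == 0:
        raise ValueError("intimacy weights sum to zero")
    return sum(alpha * v for alpha, v in ranked) / total_weight

# Three friends; only the two closest (intimacy 0.9 and 0.5) are used when n=2.
value = third_feature_variable([0.9, 0.1, 0.5], [100.0, 10.0, 50.0], n=2)
```

Selecting the top n accounts before averaging keeps distant contacts from diluting the signal contributed by close friends.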
The general characteristics of a group can reflect the characteristics of an individual user in that group. The probability that behavior does not satisfy the preset condition can therefore be obtained from the characteristics of the group, which reflects the user's credit level more accurately. It should be noted that, when obtaining the second feature variables of the second accounts, the n second accounts with the highest intimacy are first selected from among the friends of the first account, and the third feature variable is then generated from the intimacy and the second feature variables.
Because the collected behavior data covers a wide range, the formats of the data also differ. Therefore, after the behavior data is obtained, abnormal data is deleted, duplicate data is removed, data with large fluctuations is filtered out, and missing data is filled in. Abnormal data may be data that clearly falls outside a certain range. For example, a person's age does not normally reach one hundred, so if the collected data shows an age of 100, that abnormal data is deleted. If the collected ages include 0 and 49, both lie within the range 0 to 100; however, because most of the other data lies between 18 and 45, 0 and 49 are singular points with large fluctuations.
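The four cleaning steps can be sketched on the age example. This is an illustration only: the record layout, the function name `clean_records`, the fixed bounds, and the choice of the median for filling gaps are all assumptions; a real pipeline would derive the typical band from the data rather than hard-code it.

```python
from statistics import median

def clean_records(records, max_valid_age=99, typical_band=(18, 45)):
    """Clean (account_id, age) records; age is None when missing.

    Steps: 1) drop exact duplicates, 2) drop abnormal values outside the
    plausible range, 3) drop singular points outside the typical band,
    4) fill missing ages with the median of the remaining known ages.
    """
    seen, deduped = set(), []
    for rec in records:                      # 1) deduplicate, keeping first occurrence
        if rec not in seen:
            seen.add(rec)
            deduped.append(rec)

    in_range = [(acc, age) for acc, age in deduped          # 2) abnormal data
                if age is None or 0 <= age <= max_valid_age]

    low, high = typical_band
    kept = [(acc, age) for acc, age in in_range             # 3) singular points
            if age is None or low <= age <= high]

    known = [age for _, age in kept if age is not None]     # 4) fill missing values
    fill = median(known) if known else None
    return [(acc, fill if age is None else age) for acc, age in kept]

raw = [("u1", 25), ("u1", 25), ("u2", 100), ("u3", 0),
       ("u4", 49), ("u5", None), ("u6", 35), ("u7", 30)]
cleaned = clean_records(raw)
```

Here the duplicate `u1` record, the abnormal age 100, and the singular points 0 and 49 are removed, and the missing age of `u5` is filled from the remaining values.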
After this basic processing, the behavior data is divided into multiple dimensions according to data source and service characteristics, such as basic information, social interaction, and financial management, and is then classified, integrated, and written to a database. The data types and data structures can be agreed on before writing to the database; for example, numeric values are of type int and region names are strings. Other conventions are also possible and are not enumerated here.
Because the volume of stored behavior data is large and much of it is correlated, the data needs to be screened to obtain highly significant features for input into the data analysis model.
a) Among basic attributes, civil servants tend to have stable jobs and reliable income, which can reflect the user's financial capacity and willingness to repay;
b) In social interaction, users who often fail to reply to messages promptly may be rather lax, which reflects a tendency to procrastinate;
c) Frequent purchases of value-added services and frequent online shopping can reflect the user's financial capacity;
d) In economic behavior, purchases of stocks, funds, and P2P products can reflect the user's risk tolerance and financial capacity;
e) Booking ride-hailing cars but frequently cancelling orders, or receiving low star ratings, can reflect the user's trustworthiness;
f) If the friends a user regularly interacts with are of good character, keep their agreements, and have strong financial capacity, this reflects to some extent on the user.
The basic attributes, social interaction behavior, purchase behavior, ride-hailing behavior, and friend attributes above can all reflect the behavior features of the first account.
In one implementation of this embodiment, the obtaining unit includes: an obtaining subunit, configured to obtain the information gain of each feature in the behavior data, where the information gain represents the amount of information the behavior data contains; a judging subunit, configured to judge whether the information gain falls within a preset value range; a constructing subunit, configured to construct derived variables from the behavior data when the information gain falls within the preset value range, where a derived variable is merged or split behavior data; a deleting subunit, configured to, when the information gain falls outside the preset value range, delete the features whose information gain is outside the range and then construct derived variables from the remaining features; and a determining subunit, configured to use the derived variables as the first feature variable.
In one implementation of this embodiment, the deleting subunit includes: a second obtaining module, configured to obtain correlation coefficients of the remaining features after the features whose information gain is outside the preset value range have been deleted; a merging module, configured to merge features whose correlation coefficient is greater than or equal to a preset coefficient into one merged feature; and a determining module, configured to use the merged feature as a derived variable.
Features and feature values make up the behavior data. For example, the number of text chats, the number of voice calls, and the payment amount in the collected behavior data are all features, while in "9 text chats, 10 voice calls, and a payment amount of 100" the numbers are feature values. Information gain reflects the amount of information a feature carries. If that amount is below a threshold, the feature can be deleted; for example, the features in each category are ranked by information gain and those below the threshold are deleted. The correlation among the remaining features is then examined, and strongly correlated features are merged to obtain the first feature variable. If a feature is only weakly correlated with others but highly significant, it can be refined into several features; for example, the number of chats can be split into the numbers of evening chats, daytime chats, weekend chats, and weekday chats. Conversely, the numbers of evening chats and daytime chats can be merged into the number of chats.
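The screening described above (drop low-information features, then merge strongly correlated ones) can be sketched with the standard definitions of information gain and the Pearson correlation coefficient. The thresholds 0.1 and 0.9, the feature names, the toy data, and merging by summation are assumptions made for illustration; the source does not fix any of them.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """H(labels) minus the expected entropy of labels given a discrete feature."""
    groups = {}
    for v, y in zip(feature_values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

def pearson(xs, ys):
    """Pearson correlation coefficient; inputs must not be constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

labels = [1, 1, 0, 0]                       # 1 = default, 0 = agreement kept
features = {
    "late_reply": [1, 1, 0, 0],             # tracks the label exactly
    "noise": [1, 0, 1, 0],                  # carries no information about the label
}
# Step 1: keep only features whose information gain reaches the threshold.
kept = {name: vals for name, vals in features.items()
        if information_gain(vals, labels) >= 0.1}

# Step 2: merge strongly correlated features into one derived variable.
night_chats, day_chats = [1, 2, 3, 4], [2, 4, 6, 8]
merged_chats = None
if pearson(night_chats, day_chats) >= 0.9:
    merged_chats = [n + d for n, d in zip(night_chats, day_chats)]
```

Continuous features would need to be binned before `information_gain` is applied, since the function treats each distinct value as its own group.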
In this embodiment, the behavior data can be flexibly split and merged to construct the first feature variable. When splitting and merging, the same or different methods can be applied to different features (for example, principal component analysis for some features and clustering for others), which increases the flexibility of constructing the first feature variable.
In one implementation of this embodiment, the apparatus further includes: a dividing unit, configured to divide the behavior data into a plurality of categories before the first feature variable and the second feature variable are input into the data analysis model; a first establishing unit, configured to establish one sub-model for each of the plurality of categories, where each sub-model outputs a first sub-value according to the first feature variable and/or the second feature variable, and the first sub-value represents the probability that, in the category corresponding to the sub-model, the behavior of the first account does not satisfy the preset condition; and a second establishing unit, configured to construct the sub-models corresponding to the plurality of categories into the data analysis model.
In one implementation of this embodiment, the first establishing unit includes: a first establishing subunit, configured to establish one sub-model for each category using the same or different training models; or a second establishing subunit, configured to establish a low-level model for each subcategory under each category using the same or different training models, and to construct the low-level models corresponding to the subcategories under each category into a sub-model.
The training models used to build the sub-models for the categories may be the same or different. For example, of 10 categories, 5 may have their sub-models trained with decision trees and the other 5 with neural networks.
In one implementation of this embodiment, the second establishing unit is further configured to construct the plurality of sub-models into the data analysis model in the following manner:
Figure PCTCN2016109729-appb-000007
where P denotes the first value, i denotes the i-th sub-model among the plurality of sub-models, n is the number of sub-models,
Figure PCTCN2016109729-appb-000008
is the coefficient of the i-th sub-model, P_i' is the first sub-value output by the i-th sub-model, and P_0 is a constant.
In one implementation of this embodiment, the dividing unit includes: a first dividing subunit, configured to divide the behavior data into a plurality of categories according to the service types the behavior data covers; or a second dividing subunit, configured to place the data that involves a target object in one category and the data that does not involve the target object in another.
In one implementation of this embodiment, the apparatus further includes: a converting unit, configured to, after the first value output by the data analysis model is recorded, convert the first value into a third value S as follows:
Figure PCTCN2016109729-appb-000009
where S represents the degree to which the behavior of the first account satisfies the preset condition, b is a reference value, p is the first value, and st is the step size.
When obtaining the credit level, this embodiment uses features that comprehensively cover the user's online and offline behavior. They include not only the user's basic information, social interaction, financial activity, hobbies, and social circles, but also stable characteristics mined in depth, such as personality traits and emotional tendencies, that better portray the user's state of mind and character. In addition, multi-layer, diversified machine learning algorithms are used, which preserves interpretability while increasing the sophistication and predictive power of the algorithm, thereby improving the accuracy of the evaluation of the user's credit level.
Embodiment 3
According to an embodiment of the present invention, a server for implementing the above data processing method is further provided. As shown in FIG. 5, the server mainly includes a processor 501, a data interface 503, a memory 505, and a network interface 507, where:
The data interface 503 mainly transfers behavior data acquired by third-party tools to the processor 501 by means of data transmission.
The memory 505 is mainly used to store the behavior data and the data analysis model.
The network interface 507 is mainly used for network communication with other servers, from which it obtains behavior data provided by terminals.
The processor 501 is mainly configured to perform the following operations:
collecting behavior data of a first account, the behavior data including Internet-based online behavior data and offline behavior data; acquiring a first feature variable of the first account according to the behavior data, where the first feature variable represents behavior characteristics of the first account; inputting the first feature variable into a data analysis model, where the data analysis model outputs a first value according to the first feature variable, and the first value represents the probability that the behavior of the first account does not satisfy a preset condition; and recording the first value output by the data analysis model.
The processor 501 is further configured to: acquire a second feature variable, where the second feature variable represents behavior characteristics of a plurality of second accounts associated with the first account; and input the first feature variable and the second feature variable into the data analysis model, where the data analysis model further outputs the first value according to the first feature variable and the second feature variable.
The processor 501 is further configured to: acquire the intimacy between each of the plurality of second accounts and the first account, where the intimacy is generated from the interaction behavior between each second account and the first account; and obtain a third feature variable from the intimacy and the second feature variable using the following formula:
$$\upsilon' = f\big((\alpha_1, \alpha_2, \ldots, \alpha_i, \ldots, \alpha_n),\ (\upsilon_1, \upsilon_2, \ldots, \upsilon_i, \ldots, \upsilon_n)\big),$$
where $\upsilon'$ denotes the third feature variable, i indexes the i-th second account, $\alpha_i$ is the intimacy between the i-th second account and the first account, $\upsilon_i$ is the second feature variable of the i-th second account, and f denotes the intimacy-weighted average of the second feature variables of the first n second accounts when the second accounts are ranked by intimacy from high to low; and input the first feature variable and the third feature variable into the data analysis model.
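As a rough illustration of the function f described above, the following Python sketch ranks the second accounts by intimacy, keeps the top n, and averages their feature values using intimacy as the weight. Normalizing by the sum of the retained intimacies, treating each feature as a single scalar, and returning 0.0 when all weights are zero are assumptions; the source states only that f is a weighted average.

```python
def third_feature(intimacies, features, n):
    """Intimacy-weighted average of the second feature variables of the
    n second accounts with the highest intimacy.

    intimacies[i] is alpha_i and features[i] is upsilon_i for the i-th
    second account. Normalizing by the sum of weights is an assumption.
    """
    if len(intimacies) != len(features):
        raise ValueError("intimacies and features must have the same length")
    # Rank second accounts by intimacy, high to low, and keep the first n.
    ranked = sorted(zip(intimacies, features), key=lambda p: p[0], reverse=True)[:n]
    total = sum(a for a, _ in ranked)
    if total == 0:
        return 0.0  # assumed fallback when no account carries any weight
    return sum(a * v for a, v in ranked) / total
```

For example, with intimacies `[0.9, 0.1, 0.5]`, features `[1.0, 0.0, 0.0]`, and `n = 2`, the account with intimacy 0.1 is dropped and the result is 0.9 / (0.9 + 0.5).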
For specific examples of this embodiment, refer to the examples described in Embodiment 1 and Embodiment 2; they are not repeated here.
Embodiment 4
An embodiment of the present invention further provides a storage medium. In this embodiment, the storage medium may be used to store program code of the data processing method of the embodiments of the present invention.
In this embodiment, the storage medium may be located on at least one of a plurality of network devices in a mobile communication network, a wide area network, a metropolitan area network, or a local area network.
In this embodiment, the storage medium is configured to store program code for performing the following steps:
S1: collecting behavior data of a first account, the behavior data including Internet-based online behavior data and offline behavior data.
S2: acquiring a first feature variable of the first account according to the behavior data, where the first feature variable represents behavior characteristics of the first account.
S3: inputting the first feature variable into a data analysis model, where the data analysis model outputs a first value according to the first feature variable, and the first value represents the probability that the behavior of the first account does not satisfy a preset condition.
S4: recording the first value output by the data analysis model.
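Steps S1 to S4 can be sketched end to end as follows. This is only an illustrative outline: the `channel` field, the two count-based features, and the logistic function standing in for the data analysis model are all assumptions, since the source does not fix the feature set or the model family at this point.

```python
import math


def extract_features(behavior):
    """S2: derive a numeric feature vector from raw behavior records.

    The two features (online and offline record counts) and the
    'channel' field are hypothetical, chosen only for illustration.
    """
    online = sum(1 for r in behavior if r["channel"] == "online")
    offline = sum(1 for r in behavior if r["channel"] == "offline")
    return [float(online), float(offline)]


def model(features, weights, bias):
    """S3: stand-in data analysis model (a logistic function, assumed).

    Returns a value in (0, 1), read as the probability that the
    account's behavior does not satisfy the preset condition.
    """
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def process_account(account_id, behavior, weights, bias, store):
    """Run S2-S4 for one account; 'behavior' is the data collected in S1."""
    features = extract_features(behavior)   # S2
    p = model(features, weights, bias)      # S3
    store[account_id] = p                   # S4: record the first value
    return p
```

Here `store` is any dict-like object; a real system would presumably persist the value in the memory or database that the embodiment describes.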
In an implementation of this embodiment, the storage medium may include, but is not limited to, any medium capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
In an implementation of this embodiment, the processor, according to the program code stored in the storage medium, performs the following: acquiring a second feature variable, where the second feature variable represents behavior characteristics of a plurality of second accounts associated with the first account; and inputting the first feature variable and the second feature variable into the data analysis model, where the data analysis model further outputs the first value according to the first feature variable and the second feature variable.
In an implementation of this embodiment, the processor, according to the program code stored in the storage medium, performs the following: acquiring the intimacy between each of the plurality of second accounts and the first account, where the intimacy is generated from the interaction behavior between each second account and the first account; and obtaining a third feature variable from the intimacy and the second feature variable using the following formula:
$$\upsilon' = f\big((\alpha_1, \alpha_2, \ldots, \alpha_i, \ldots, \alpha_n),\ (\upsilon_1, \upsilon_2, \ldots, \upsilon_i, \ldots, \upsilon_n)\big),$$
where $\upsilon'$ denotes the third feature variable, i indexes the i-th second account, $\alpha_i$ is the intimacy between the i-th second account and the first account, $\upsilon_i$ is the second feature variable of the i-th second account, and f denotes the intimacy-weighted average of the second feature variables of the first n second accounts when the second accounts are ranked by intimacy from high to low; and inputting the first feature variable and the third feature variable into the data analysis model.
For specific examples, refer to the examples described in Embodiment 1 and Embodiment 2; they are not repeated here.
An embodiment of the present invention further provides a computer storage medium storing computer-executable instructions configured to perform the data processing method described above.
The serial numbers of the above embodiments are for description only and do not indicate the relative merits of the embodiments.
If the integrated unit in the above embodiments is implemented as a software functional unit and sold or used as a standalone product, it may be stored in the computer-readable storage medium described above. On this understanding, the technical solution of the present invention, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied as a software product. The computer software product is stored in a storage medium and includes instructions that cause one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention.
Each of the above embodiments is described with its own emphasis. For parts not detailed in one embodiment, refer to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed client may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function; in an actual implementation other divisions are possible: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware or as a software functional unit.
The above describes only preferred implementations of the present invention. It should be noted that a person of ordinary skill in the art may make further improvements and refinements without departing from the principles of the present invention, and such improvements and refinements shall also be regarded as falling within the scope of protection of the present invention.
Industrial Applicability
In the embodiments of the present invention, a first feature variable characterizes the behavior of the first account. It is derived from the first account's Internet-based behavior data, and once it is input into the data analysis model, the model yields the probability that the first account's behavior does not satisfy the preset condition. Because the first account's behavior data in social applications covers a broad range of its behavior, the behavior data input into the data analysis model reflects that behavior comprehensively. The resulting probability is therefore more accurate, which solves the technical problem that the credit level of an account cannot be obtained accurately.

Claims (21)

  1. A data processing method, comprising:
    collecting behavior data of a first account, the behavior data including Internet-based online behavior data and offline behavior data;
    acquiring a first feature variable of the first account according to the behavior data, wherein the first feature variable represents behavior characteristics of the first account;
    inputting the first feature variable into a data analysis model, wherein the data analysis model outputs a first value according to the first feature variable, and the first value represents the probability that the behavior of the first account does not satisfy a preset condition; and
    recording the first value output by the data analysis model.
  2. The method according to claim 1, wherein inputting the first feature variable into the data analysis model comprises:
    acquiring a second feature variable, wherein the second feature variable represents behavior characteristics of a plurality of second accounts associated with the first account; and
    inputting the first feature variable and the second feature variable into the data analysis model, wherein the data analysis model further outputs the first value according to the first feature variable and the second feature variable.
  3. The method according to claim 2, wherein inputting the first feature variable and the second feature variable into the data analysis model comprises:
    acquiring the intimacy between each of the plurality of second accounts and the first account, wherein the intimacy is generated from the interaction behavior between each second account and the first account;
    obtaining a third feature variable from the intimacy and the second feature variable using the following formula:
    $$\upsilon' = f\big((\alpha_1, \alpha_2, \ldots, \alpha_i, \ldots, \alpha_n),\ (\upsilon_1, \upsilon_2, \ldots, \upsilon_i, \ldots, \upsilon_n)\big),$$
    where $\upsilon'$ denotes the third feature variable, i indexes the i-th second account, $\alpha_i$ is the intimacy between the i-th second account and the first account, $\upsilon_i$ is the second feature variable of the i-th second account, and f denotes the intimacy-weighted average of the second feature variables of the first n second accounts when the second accounts are ranked by intimacy from high to low; and
    inputting the first feature variable and the third feature variable into the data analysis model.
  4. The method according to claim 1, wherein acquiring the first feature variable of the first account according to the behavior data comprises:
    obtaining the information gain of each feature in the behavior data, the information gain representing the amount of information contained in the behavior data;
    determining whether the information gain falls within a preset value range;
    if the information gain falls within the preset value range, constructing a derived variable from the behavior data, wherein the derived variable is the behavior data after merging or splitting;
    if the information gain falls outside the preset value range, deleting the feature whose information gain falls outside the preset value range, and then constructing the derived variable from the remaining features; and
    using the derived variable as the first feature variable.
  5. The method according to claim 4, wherein deleting the feature whose information gain falls outside the preset value range and then constructing the derived variable from the remaining features comprises:
    after deleting the feature whose information gain falls outside the preset value range, obtaining correlation coefficients of the remaining features;
    merging the features whose correlation coefficient is greater than or equal to a preset coefficient into one merged feature; and
    using the merged feature as the derived variable.
  6. The method according to claim 2, wherein before the first feature variable and the second feature variable are input into the data analysis model, the method further comprises:
    dividing the behavior data into a plurality of categories;
    establishing a sub-model for each of the plurality of categories, wherein each sub-model outputs a first sub-value according to the first feature variable and/or the second feature variable, and the first sub-value represents the probability that, under the category corresponding to that sub-model, the behavior of the first account does not satisfy the preset condition; and
    constructing the data analysis model from the plurality of sub-models corresponding to the plurality of categories.
  7. The method according to claim 6, wherein establishing a sub-model for each of the plurality of categories comprises:
    establishing a sub-model for each category using the same or different training models; or
    establishing a low-level model for each sub-category under each category using the same or different training models, and constructing the sub-model from the low-level models corresponding to the plurality of sub-categories under each category.
  8. The method according to claim 6, wherein constructing the data analysis model from the plurality of sub-models corresponding to the plurality of categories comprises:
    constructing the data analysis model from the plurality of sub-models in the following manner:
    [Formula image: PCTCN2016109729-appb-100001]
    where P denotes the first value, i indexes the i-th of the plurality of sub-models, n is the number of sub-models, [symbol image: PCTCN2016109729-appb-100002] is the coefficient of the i-th sub-model, $P_i'$ is the first sub-value output by the i-th sub-model, and $P_0$ is a constant.
  9. The method according to claim 6, wherein dividing the behavior data into a plurality of categories comprises:
    dividing the behavior data into a plurality of categories according to the service types included in the behavior data; or
    placing the behavior data that includes a target object in one category and the behavior data that does not include the target object in another category.
  10. The method according to claim 1, wherein after the first value output by the data analysis model is recorded, the method further comprises:
    converting the first value into a third value S using the following formula:
    [Formula image: PCTCN2016109729-appb-100003]
    where S indicates the degree to which the behavior of the first account satisfies the preset condition, b is the base value, p is the first value, and st is the step size.
  11. A data processing apparatus, comprising:
    a collecting unit, configured to collect behavior data of a first account, the behavior data including Internet-based online behavior data and offline behavior data;
    an acquiring unit, configured to acquire a first feature variable of the first account according to the behavior data, wherein the first feature variable represents behavior characteristics of the first account;
    an input unit, configured to input the first feature variable into a data analysis model, wherein the data analysis model outputs a first value according to the first feature variable, and the first value represents the probability that the behavior of the first account does not satisfy a preset condition; and
    a recording unit, configured to record the first value output by the data analysis model.
  12. The apparatus according to claim 11, wherein the input unit comprises:
    a first acquiring subunit, configured to acquire a second feature variable, wherein the second feature variable represents behavior characteristics of a plurality of second accounts associated with the first account; and
    an input subunit, configured to input the first feature variable and the second feature variable into the data analysis model, wherein the data analysis model further outputs the first value according to the first feature variable and the second feature variable.
  13. The apparatus according to claim 12, wherein the input subunit comprises:
    a first acquiring module, configured to acquire the intimacy between each of the plurality of second accounts and the first account, wherein the intimacy is generated from the interaction behavior between each second account and the first account;
    a calculation module, configured to obtain a third feature variable from the intimacy and the second feature variable using the following formula:
    $$\upsilon' = f\big((\alpha_1, \alpha_2, \ldots, \alpha_i, \ldots, \alpha_n),\ (\upsilon_1, \upsilon_2, \ldots, \upsilon_i, \ldots, \upsilon_n)\big),$$
    where $\upsilon'$ denotes the third feature variable, i indexes the i-th second account, $\alpha_i$ is the intimacy between the i-th second account and the first account, $\upsilon_i$ is the second feature variable of the i-th second account, and f denotes the intimacy-weighted average of the second feature variables of the first n second accounts when the second accounts are ranked by intimacy from high to low; and
    an input module, configured to input the first feature variable and the third feature variable into the data analysis model.
  14. The apparatus according to claim 11, wherein the acquiring unit comprises:
    an acquiring subunit, configured to obtain the information gain of each feature in the behavior data, the information gain representing the amount of information contained in the behavior data;
    a determining subunit, configured to determine whether the information gain falls within a preset value range;
    a constructing subunit, configured to construct a derived variable from the behavior data when the information gain falls within the preset value range, wherein the derived variable is the behavior data after merging or splitting;
    a deleting subunit, configured to delete, when the information gain falls outside the preset value range, the feature whose information gain falls outside the preset value range, and then construct the derived variable from the remaining features; and
    a designating subunit, configured to use the derived variable as the first feature variable.
  15. The apparatus according to claim 14, wherein the deleting subunit comprises:
    a second acquiring module, configured to obtain correlation coefficients of the remaining features after the feature whose information gain falls outside the preset value range is deleted;
    a merging module, configured to merge the features whose correlation coefficient is greater than or equal to a preset coefficient into one merged feature; and
    a designating module, configured to use the merged feature as the derived variable.
  16. The apparatus according to claim 12, wherein the apparatus further comprises:
    a dividing unit, configured to divide the behavior data into a plurality of categories before the first feature variable and the second feature variable are input into the data analysis model;
    a first establishing unit, configured to establish a sub-model for each of the plurality of categories, wherein each sub-model outputs a first sub-value according to the first feature variable and/or the second feature variable, and the first sub-value represents the probability that, under the category corresponding to that sub-model, the behavior of the first account does not satisfy the preset condition; and
    a second establishing unit, configured to construct the data analysis model from the plurality of sub-models corresponding to the plurality of categories.
  17. The apparatus according to claim 16, wherein the first establishing unit comprises:
    a first establishing subunit, configured to establish a sub-model for each category using the same or different training models; or
    a second establishing subunit, configured to establish a low-level model for each sub-category under each category using the same or different training models, and to construct the sub-model from the low-level models corresponding to the plurality of sub-categories under each category.
  18. The apparatus according to claim 16, wherein the second establishing unit is further configured to construct the data analysis model from the plurality of sub-models in the following manner:
    [Formula image: PCTCN2016109729-appb-100004]
    where P denotes the first value, i indexes the i-th of the plurality of sub-models, n is the number of sub-models, [symbol image: PCTCN2016109729-appb-100005] is the coefficient of the i-th sub-model, $P_i'$ is the first sub-value output by the i-th sub-model, and $P_0$ is a constant.
  19. The apparatus according to claim 16, wherein the dividing unit comprises:
    a first dividing subunit, configured to divide the behavior data into a plurality of categories according to the service types included in the behavior data; or
    a second dividing subunit, configured to place the behavior data that includes a target object in one category and the behavior data that does not include the target object in another category.
  20. The apparatus according to claim 11, wherein the apparatus further comprises:
    a converting unit, configured to convert the first value into a third value S, after the first value output by the data analysis model has been recorded, using the following formula:
    [Formula image: PCTCN2016109729-appb-100006]
    where S indicates the degree to which the behavior of the first account satisfies the preset condition, b is the base value, p is the first value, and st is the step size.
  21. A computer storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the data processing method according to claim 1.
PCT/CN2016/109729 2016-05-25 2016-12-13 Data processing method and device, and computer storage medium WO2017202006A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610354926.X 2016-05-25
CN201610354926.XA CN106056444A (en) 2016-05-25 2016-05-25 Data processing method and device

Publications (1)

Publication Number Publication Date
WO2017202006A1 true WO2017202006A1 (en) 2017-11-30

Family

ID=57174694

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/109729 WO2017202006A1 (en) 2016-05-25 2016-12-13 Data processing method and device, and computer storage medium

Country Status (2)

Country Link
CN (1) CN106056444A (en)
WO (1) WO2017202006A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110874645A (en) * 2019-11-14 2020-03-10 北京首汽智行科技有限公司 Data reduction method
CN111178934A (en) * 2019-11-29 2020-05-19 北京深演智能科技股份有限公司 Method and device for acquiring target object
CN111539532A (en) * 2020-04-01 2020-08-14 深圳市魔数智擎人工智能有限公司 Model construction-oriented automatic feature derivation method
CN111598159A (en) * 2020-05-14 2020-08-28 清华大学 Training method, device, equipment and storage medium of machine learning model
CN111652259A (en) * 2019-04-16 2020-09-11 上海铼锶信息技术有限公司 Method and system for cleaning data
CN112883689A (en) * 2020-11-27 2021-06-01 苏宁消费金融有限公司 Processing method of credit investigation second generation credit report finger derivative variable
US11436430B2 (en) 2017-02-13 2022-09-06 Tencent Technology (Shenzhen) Company Limited Feature information extraction method, apparatus, server cluster, and storage medium

Families Citing this family (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106056444A (en) * 2016-05-25 2016-10-26 腾讯科技(深圳)有限公司 Data processing method and device
CN108066990B (en) * 2016-11-18 2021-01-22 腾讯科技(深圳)有限公司 Method, device and server for selecting user from candidate user list
CN106775268B (en) * 2016-11-28 2020-10-30 浙江翼信科技有限公司 Message display method and device
CN108322317B (en) * 2017-01-16 2022-07-29 腾讯科技(深圳)有限公司 Account identification association method and server
CN108280757B (en) * 2017-02-13 2021-08-17 腾讯科技(深圳)有限公司 User credit evaluation method and device
CN106683680B (en) * 2017-03-10 2022-03-25 百度在线网络技术(北京)有限公司 Speaker recognition method and device, computer equipment and computer readable medium
CN108734565B (en) * 2017-04-14 2020-11-17 腾讯科技(深圳)有限公司 Credit investigation point real-time adjustment processing method and device and processing server
CN108805689A (en) * 2017-04-26 2018-11-13 腾讯科技(深圳)有限公司 A kind of loan risk evaluation control method and device
CN108510071B (en) * 2017-05-10 2020-01-10 腾讯科技(深圳)有限公司 Data feature extraction method and device and computer readable storage medium
CN107871286A (en) * 2017-07-20 2018-04-03 上海前隆信息科技有限公司 User is with contacting human world cohesion decision method/system, storage medium and equipment
CN109427010B (en) * 2017-08-31 2022-05-27 腾讯科技(深圳)有限公司 Communication fee overdraft quota allocation method, device, storage medium and computer equipment
CN107730283A (en) * 2017-11-03 2018-02-23 中国银行股份有限公司 A kind of reference method and device of medium-sized and small enterprises
CN109829593B (en) * 2017-11-23 2023-05-16 广州腾讯科技有限公司 Credit determining method and device for target object, storage medium and electronic device
CN109871514B (en) * 2017-12-05 2022-11-04 财付通支付科技有限公司 Data processing method, device and storage medium
CN108280759A (en) * 2018-01-17 2018-07-13 深圳市和讯华谷信息技术有限公司 Air control model optimization method, terminal and computer readable storage medium
CN109191185A (en) * 2018-08-15 2019-01-11 深圳市和讯华谷信息技术有限公司 A kind of visitor's heap sort method and system
TWI709923B (en) * 2018-10-03 2020-11-11 臺灣土地銀行股份有限公司 Behavioral model credit assessment system
CN109657793B (en) * 2018-12-26 2020-09-22 广州小狗机器人技术有限公司 Model training method and device, storage medium and electronic equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101493913A (en) * 2008-01-23 2009-07-29 阿里巴巴集团控股有限公司 Method and system for assessing user credit in internet
KR20140047863A (en) * 2012-10-15 2014-04-23 주식회사 우리은행 Method of estimating credit statis, server performing the same and system performing the same
CN104866969A (en) * 2015-05-25 2015-08-26 百度在线网络技术(北京)有限公司 Personal credit data processing method and device
CN105225149A (en) * 2015-09-07 2016-01-06 腾讯科技(深圳)有限公司 A kind of reference scoring defining method and device
CN105243566A (en) * 2015-10-28 2016-01-13 联动优势科技有限公司 Method and apparatus for evaluating credit of users through different mobile phone number information from operators
CN106056444A (en) * 2016-05-25 2016-10-26 腾讯科技(深圳)有限公司 Data processing method and device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102880691B (en) * 2012-09-19 2015-08-19 北京航空航天大学深圳研究院 A kind of mixing commending system based on user's cohesion and method
CN105389714B (en) * 2015-10-23 2022-07-05 北京慧辰资道资讯股份有限公司 Method for identifying user characteristics from behavior data
CN105302911B (en) * 2015-11-10 2018-12-21 珠海多玩信息技术有限公司 A kind of data screening engine method for building up and data screening engine

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11436430B2 (en) 2017-02-13 2022-09-06 Tencent Technology (Shenzhen) Company Limited Feature information extraction method, apparatus, server cluster, and storage medium
CN111652259A (en) * 2019-04-16 2020-09-11 上海铼锶信息技术有限公司 Method and system for cleaning data
CN111652259B (en) * 2019-04-16 2024-03-08 上海铼锶信息技术有限公司 Method and system for cleaning data
CN110874645A (en) * 2019-11-14 2020-03-10 北京首汽智行科技有限公司 Data reduction method
CN111178934A (en) * 2019-11-29 2020-05-19 北京深演智能科技股份有限公司 Method and device for acquiring target object
CN111178934B (en) * 2019-11-29 2024-03-08 北京深演智能科技股份有限公司 Method and device for acquiring target object
CN111539532A (en) * 2020-04-01 2020-08-14 深圳市魔数智擎人工智能有限公司 Model construction-oriented automatic feature derivation method
CN111598159A (en) * 2020-05-14 2020-08-28 清华大学 Training method, device, equipment and storage medium of machine learning model
CN111598159B (en) * 2020-05-14 2024-04-26 清华大学 Training method, device, equipment and storage medium of machine learning model
CN112883689A (en) * 2020-11-27 2021-06-01 苏宁消费金融有限公司 Processing method of credit investigation second generation credit report finger derivative variable

Also Published As

Publication number Publication date
CN106056444A (en) 2016-10-26

Legal Events

Date Code Title Description
NENP Non-entry into the national phase (Ref country code: DE)
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 16902990; Country of ref document: EP; Kind code of ref document: A1)
122 EP: PCT application non-entry in European phase (Ref document number: 16902990; Country of ref document: EP; Kind code of ref document: A1)