Disclosure of Invention
An object of the present invention is to provide a behavior data evaluation method and apparatus that automatically identify, from user behavior data, the reason why a user interrupts a transaction.
According to a first aspect of the present invention, there is provided a behavior data evaluation method, the method comprising:
extracting behavior characteristic data of a user for an application program according to the operation of the user on the application program;
inputting the behavior characteristic data into a pre-established data analysis model for matching, and obtaining matching result information;
and generating behavior data evaluation information of the user according to the matching result information.
Further, the method according to the first aspect of the present invention further comprises:
determining sample feature data;
calculating the contribution degree of the sample characteristic data according to a preset contribution degree algorithm;
screening out, according to the contribution degree of each piece of sample feature data, the sample feature data that satisfy a contribution degree condition, to serve as model-input variables (the variables entered into the model);
training on the model-input variables to construct the data analysis model.
Further, the method according to the first aspect of the present invention further comprises:
determining whether the amount of specified feature data in the sample feature data satisfies a preset quantity condition;
if so, performing derivation processing on the sample feature data when the corresponding behavior meaning can be determined from the sample feature data;
if not, performing data integration processing on the sample feature data according to the service type, and re-executing the step of determining whether the amount of data in the sample feature data satisfies the preset quantity condition.
Further, the method according to the first aspect of the present invention further comprises:
when the amount of data in the sample feature data satisfies the preset quantity condition but the corresponding behavior meaning cannot be determined from the sample feature data, performing vector product transformation processing on a plurality of pieces of sample feature data in the same service scenario, so as to determine the behavior meaning corresponding to the sample feature data.
Further, the method according to the first aspect of the present invention further comprises:
the preset contribution degree algorithm is as follows:
information entropy:
$$\mathrm{Entropy}(S) = -p_{+}\log p_{+} - p_{-}\log p_{-}$$
sample feature gain:
$$\mathrm{Gain} = \mathrm{Entropy}(S) - p_{1}\,\mathrm{Entropy}(1) - p_{2}\,\mathrm{Entropy}(2)$$
where $S$ is the sample set, $p_{+}$ is the probability of a high-security user, $p_{-}$ is the probability of a low-security user, $\mathrm{Entropy}(S)$ is the information entropy of the sample set, $p_{1}$ is the ratio of the number of users in the sample set for whom a specified behavior feature occurs to the total number of users in the sample set, $\mathrm{Entropy}(1)$ is the information entropy of the group in which the specified behavior feature occurs, $p_{2}$ is the ratio of the number of users in the sample set for whom the specified behavior feature does not occur to the total number of users in the sample set, and $\mathrm{Entropy}(2)$ is the information entropy of the group in which the specified behavior feature does not occur.
Further, according to the method of the first aspect of the present invention, screening out the sample feature data that satisfy a contribution degree condition according to the contribution degree of each piece of sample feature data, to serve as model-input variables, includes:
sorting the pieces of sample feature data in descending order of contribution degree;
and selecting a preset number of the top-ranked pieces of sample feature data as the model-input variables.
Further, the method according to the first aspect of the present invention further comprises:
if a continuous variable among the model-input variables has a missing value, imputing the missing value of the continuous variable;
dividing the model-input variables, after missing-value imputation, into a training set and a test set according to a preset ratio, and evaluating the model trained on the training set data using the test set data.
According to a second aspect of the present invention, there is provided a behavior data evaluation device comprising:
the data extraction module is used for extracting behavior characteristic data of a user for the application program according to the operation of the user on the application program;
the data matching module is used for inputting the behavior characteristic data into a pre-established data analysis model for matching and obtaining matching result information;
and the data evaluation module is used for generating behavior data evaluation information of the user according to the matching result information.
According to a third aspect of the present invention there is also provided a storage device storing computer program instructions for execution in accordance with the method of the first or second aspect of the present invention.
According to a fourth aspect of the present invention there is also provided a computing device comprising: a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein the computer program instructions, when executed by the processor, trigger the computing device to perform the method of the first or second aspect of the invention.
According to the behavior data evaluation method and device of the present invention, the behavior feature data of a user for an application is input into the data analysis model for matching, so as to obtain the corresponding behavior data evaluation information. The behavior of the entire user base can thus be analyzed on the basis of massive data, and by quickly evaluating users' sense of transaction security, the key reasons affecting it can be located quickly and accurately. Because the data analysis model is based on machine learning, hidden information can be mined from the behavior features of users with high accuracy.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
In one exemplary configuration of the invention, the terminal and the devices of the service network each include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Fig. 1 is a flow chart of a behavior data evaluation method according to a first embodiment of the present invention. As shown in Fig. 1, the method is used in a behavior data evaluation device, which may be a server, a computer, or the like, and the method includes:
Step S101, extracting behavior feature data of a user for an application according to the user's operations on the application.
Specifically, the application (APP) may be any of various applications such as social or shopping applications, and in particular an application with high security requirements, such as a payment application having a payment function. The user's operations on the application include click-triggered actions on any function of the application, such as opening the application, clicking the personal center module, clicking to exit the personal center module, clicking the bill button, exiting bill browsing, and configuring settings. The behavior feature data may include the feature data of any such operation, and may also include data such as interaction logs with the server.
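As a minimal sketch of step S101, assuming that operations are logged as simple event records, the following Python snippet turns one user's operation log into count and Boolean features. The event names, the record layout, and the function name extract_behavior_features are hypothetical illustrations and are not specified by the embodiment.

```python
from collections import Counter

def extract_behavior_features(events, vocabulary):
    """Turn one user's operation log into count and Boolean features.

    events: list of dicts with an "event" key (assumed log layout).
    vocabulary: the operation names to turn into features (assumed).
    """
    counts = Counter(e["event"] for e in events)
    features = {}
    for name in vocabulary:
        features[f"{name}_count"] = counts.get(name, 0)     # how often it happened
        features[f"{name}_flag"] = counts.get(name, 0) > 0  # whether it happened at all
    return features

# Hypothetical log for a single user.
log = [
    {"user": "u1", "event": "open_app"},
    {"user": "u1", "event": "click_bill"},
    {"user": "u1", "event": "click_bill"},
    {"user": "u1", "event": "exit_bill"},
]
vocab = ["open_app", "click_bill", "open_settings"]
features = extract_behavior_features(log, vocab)
print(features)
```

Operations outside the vocabulary (here exit_bill) are simply ignored, which keeps the feature vector the same length for every user.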
Step S102, inputting the behavior characteristic data into a pre-established data analysis model for matching, and obtaining matching result information.
Specifically, the data analysis model may be trained by machine learning; for example, it may be built by GBDT (Gradient Boosting Decision Tree) classification modeling. Depending on the application scenario, the security assessment may be performed on the behavior features of a single user, or on the behavior feature data of all or some users. The behavior feature data used for the assessment may be a data set corresponding to all operations performed by the user on the application within a preset condition range, for example, the data set corresponding to all operations in each session or in the current session, or the data set corresponding to all operations performed while using the application within a preset time period. When a user's sense of security needs to be evaluated in an application scenario, the behavior feature data of the user to be evaluated is input into the pre-established data analysis model and matched, by feature matching, against the model-input variables (the variables entered into the model) of the data analysis model, and the corresponding matching result information is output.
Step S103, generating behavior data evaluation information of the user according to the matching result information.
The behavior data evaluation information reflects the sense of transaction security of the user when using the application. It allows the key reasons affecting the user's sense of transaction security to be located quickly, so that the application or related services can be optimized accordingly, and it provides decision support for personalized recommendation of security products.
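Steps S102 and S103 can be sketched as follows, under the assumption that the matching result is a score between 0 and 1 indicating a low sense of security. The function evaluate_user, the threshold of 0.5, and the stand-in toy_model with its weights are all hypothetical; the embodiment does not specify the format of the matching result or of the evaluation information.

```python
def evaluate_user(features, model, threshold=0.5):
    """S102: score the features with the model; S103: build an evaluation record."""
    score = model(features)  # assumed: higher means a LOWER sense of security
    return {
        "low_security_score": round(score, 3),
        "assessment": "low" if score >= threshold else "high",
    }

def toy_model(features):
    """Stand-in for the trained data analysis model (hypothetical weights)."""
    weights = {"face_off": 0.6, "logout_clicks": 0.1}
    raw = sum(w * float(features.get(name, 0)) for name, w in weights.items())
    return min(1.0, raw)

report = evaluate_user({"face_off": True, "logout_clicks": 2}, toy_model)
print(report)
```

Passing the model in as a callable keeps the evaluation step independent of how the model was trained.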
Fig. 2 is a flow chart of a behavior data evaluation method according to a second embodiment of the present invention. As shown in Fig. 2, the method is used in a behavior data evaluation device, and the method includes:
Step S201, determining sample feature data;
Fig. 3 is a flow chart of the behavior data evaluation method according to the second embodiment of the present invention. As shown in Fig. 3, step S201 may include the following steps S2011 to S2016:
Step S2011, obtaining sample feature data;
Based on the account information of each user, the user's operations on the application, such as the behavior track, are traversed through the client to obtain sample behavior features as sampling data. The service scenario to which a sampled behavior feature belongs can be determined by accessing the user setting information stored on the server. The user setting information includes the setting options of specific functions in the application, such as payment code setting options, and the service attribute of the sampled feature is described by the service scenario to which the user setting information belongs. The result attribute of the sampled feature is described by exposure, clicks, and interaction logs with the server. The sample feature data of the user is then optimized in combination with the service scenario, the service attribute, and the result attribute, the optimization including:
Step S2012, determining whether the amount of specified feature data in the sample feature data satisfies a preset quantity condition;
To prevent isolated, sporadic behaviors in the collected samples from being treated as features, it is necessary to determine whether each sample behavior feature satisfies a preset quantity condition. The condition may be set according to actual application requirements; for example, a specified sample feature may be required to account for one thousandth of the behavior features of all users. If the sample behavior features satisfy the preset quantity condition, step S2013 is executed; if not, step S2015 is executed.
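The quantity check of step S2012 can be sketched as below. This is only an illustration: it interprets the "one thousandth" example as the share of users for whom a feature was observed, which is an assumption, and the function name split_by_frequency and the feature names are hypothetical.

```python
def split_by_frequency(user_features, min_ratio=0.001):
    """Separate features that occur often enough from sporadic ones.

    user_features maps a user id to the set of behavior features observed
    for that user; min_ratio=0.001 mirrors the one-thousandth example.
    """
    total_users = len(user_features)
    occurrences = {}
    for feats in user_features.values():
        for name in feats:
            occurrences[name] = occurrences.get(name, 0) + 1
    frequent = {n for n, c in occurrences.items() if c / total_users >= min_ratio}
    sporadic = set(occurrences) - frequent
    return frequent, sporadic

# Hypothetical population of 2000 users.
users = {f"u{i}": set() for i in range(2000)}
for i in range(40):
    users[f"u{i}"].add("face_off")   # 40 / 2000 = 2%, above the threshold
users["u0"].add("rare_toggle")       # 1 / 2000 = 0.05%, below the threshold
frequent, sporadic = split_by_frequency(users)
print(frequent, sporadic)
```

Features in the sporadic set would be the candidates for the data integration of step S2015, while the frequent ones proceed to step S2013.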
Step S2013, determining whether the corresponding behavior meaning can be determined from the sample feature data;
The behavior meaning, that is, the business meaning, characterizes specific business content; for example, the business meaning may be clicking to log out of an account. If the corresponding behavior meaning can be determined from the sample feature data, step S2014 is executed; if it cannot, step S2016 is executed.
Step S2014, performing derivation processing on the sample feature data.
Specifically, normalization can speed up convergence of the training network, and derived variables are generated from the sample feature data through Boolean-value transformation and scalable windows, so that the corresponding behavior features can be analyzed.
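The three kinds of derivation mentioned above might look as follows in Python. This is a sketch under stated assumptions: min-max scaling is used as one possible normalization, the scalable window is interpreted as counting events over look-back windows of different lengths, and all function and variable names are hypothetical.

```python
def min_max_normalize(values):
    """Scale values into [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def to_boolean(count):
    """Boolean-value transformation: did the behavior occur at all?"""
    return count > 0

def window_count(event_days, now, window_days):
    """Number of events in the last window_days days (days as integers)."""
    return sum(1 for d in event_days if 0 <= now - d < window_days)

# Hypothetical days on which one user clicked "log out".
logout_days = [1, 5, 26, 29, 30]
derived = {
    "logout_any": to_boolean(len(logout_days)),
    "logout_7d": window_count(logout_days, now=30, window_days=7),
    "logout_30d": window_count(logout_days, now=30, window_days=30),
}
scaled = min_max_normalize([2, 4, 6])
print(derived, scaled)
```

Varying window_days yields a family of derived variables from a single raw behavior, which is one way a fixed set of operations can grow into a larger set of candidate variables.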
Step S2015, performing data integration processing on the sample feature data according to the service attribute, and returning to step S2012.
Step S2016, performing vector product transformation processing on a plurality of pieces of sample feature data in the same service scenario, so as to determine the behavior meaning corresponding to the sample feature data, and returning to step S2013.
If the corresponding behavior meaning cannot be determined from the sample feature data, a plurality of behavior features in the same service scenario are acquired, a transformation is applied to the vector product of these behavior features to construct a new variable, and step S2013 is executed again until the sample feature data can express the corresponding business meaning.
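The embodiment does not define the vector product transformation in detail. One possible reading, shown here purely as an assumption, is an element-wise product of per-user feature columns from the same service scenario, which yields an interaction variable that is set only when all the combined behaviors occur. The function name product_feature and the example columns are hypothetical.

```python
def product_feature(columns):
    """Element-wise product of several per-user feature columns."""
    length = len(columns[0])
    if any(len(col) != length for col in columns):
        raise ValueError("all feature columns must cover the same users")
    combined = [1] * length
    for col in columns:
        combined = [a * int(b) for a, b in zip(combined, col)]
    return combined

# Hypothetical Boolean columns for four users in one payment-settings scenario.
opened_payment_settings = [1, 1, 0, 1]
toggled_unlabeled_option = [1, 0, 0, 1]
new_variable = product_feature([opened_payment_settings, toggled_unlabeled_option])
print(new_variable)
```

Under this reading, a behavior whose meaning is unclear on its own acquires a business meaning from the scenario it is combined with.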
Step S202, calculating the contribution degree of each sample characteristic data in the sample characteristic data according to a preset contribution degree algorithm;
Specifically, the preset contribution degree algorithm may be an IG (Information Gain) algorithm. The IG algorithm calculates, for each behavior feature, the information gain it provides for distinguishing users' sense of transaction security, and the contribution degree of each piece of sample feature data is determined from this information gain. The formulas are as follows:
Information entropy:
$$\mathrm{Entropy}(S) = -p_{+}\log p_{+} - p_{-}\log p_{-}$$
Behavior feature gain:
$$\mathrm{Gain} = \mathrm{Entropy}(S) - p_{1}\,\mathrm{Entropy}(1) - p_{2}\,\mathrm{Entropy}(2)$$
where $S$ is the sample set, $p_{+}$ is the probability of a high-security user, $p_{-}$ is the probability of a low-security user, $\mathrm{Entropy}(S)$ is the information entropy of the sample set, $p_{1}$ is the ratio of the number of users in the sample set for whom a specified behavior feature occurs to the total number of users in the sample set, $\mathrm{Entropy}(1)$ is the information entropy of the group in which the specified behavior feature occurs, $p_{2}$ is the ratio of the number of users in the sample set for whom the specified behavior feature does not occur to the total number of users in the sample set, and $\mathrm{Entropy}(2)$ is the information entropy of the group in which the specified behavior feature does not occur.
Taking the user's face verification switch (face_off) as an example, let $p_{1}^{+}$ be the probability of a high-security user in the face_off group, $p_{2}^{+}$ be the probability of a high-security user in the not_face_off group, and $p_{1}^{-}$ and $p_{2}^{-}$ be the corresponding probabilities of a low-security user. The information entropies of the two groups, with and without the behavior feature, are respectively:
$$\mathrm{Entropy}(\text{face\_off}) = -p_{1}^{+}\log p_{1}^{+} - p_{1}^{-}\log p_{1}^{-}$$
$$\mathrm{Entropy}(\text{not\_face\_off}) = -p_{2}^{+}\log p_{2}^{+} - p_{2}^{-}\log p_{2}^{-}$$
Accordingly, the contribution of the behavior of turning off the face verification switch to distinguishing users' sense of transaction security is calculated as:
$$\mathrm{Gain}(\text{face\_off}) = \mathrm{Entropy}(S) - p(\text{face\_off})\,\mathrm{Entropy}(\text{face\_off}) - p(\text{not\_face\_off})\,\mathrm{Entropy}(\text{not\_face\_off})$$
where $\mathrm{Entropy}(S)$ is the information entropy of the sample set, $p(\text{face\_off})$ is the ratio of the number of users in the sample set who turned off the face verification switch to the total number of users in the sample set, $\mathrm{Entropy}(\text{face\_off})$ is the information entropy of the group that turned off the switch, $p(\text{not\_face\_off})$ is the ratio of the number of users in the sample set who did not turn off the switch to the total number of users in the sample set, and $\mathrm{Entropy}(\text{not\_face\_off})$ is the information entropy of the group that did not turn off the switch.
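The information gain calculation for a single Boolean behavior feature can be sketched in Python as follows. The base-2 logarithm is an assumption (the embodiment does not state the base), and the function names and the small user samples are hypothetical.

```python
import math

def binary_entropy(p_pos):
    """Entropy of a two-class group, given the share of high-security users."""
    if p_pos in (0.0, 1.0):
        return 0.0  # a pure group carries no uncertainty
    p_neg = 1.0 - p_pos
    return -p_pos * math.log2(p_pos) - p_neg * math.log2(p_neg)

def group_entropy(labels):
    """Entropy of a list of 0/1 labels (1 = high-security user)."""
    return binary_entropy(sum(labels) / len(labels)) if labels else 0.0

def information_gain(has_feature, is_high_security):
    """Entropy(S) - p1 * Entropy(1) - p2 * Entropy(2) for one Boolean feature."""
    n = len(has_feature)
    with_f = [y for f, y in zip(has_feature, is_high_security) if f]
    without_f = [y for f, y in zip(has_feature, is_high_security) if not f]
    p1, p2 = len(with_f) / n, len(without_f) / n
    return (group_entropy(list(is_high_security))
            - p1 * group_entropy(with_f)
            - p2 * group_entropy(without_f))

# Hypothetical samples: face_off perfectly separates the two user groups ...
gain_separating = information_gain([1, 1, 0, 0], [0, 0, 1, 1])
# ... whereas here it carries no information about the sense of security.
gain_uninformative = information_gain([1, 0, 1, 0], [1, 1, 0, 0])
print(gain_separating, gain_uninformative)
```

A feature that splits the users into pure groups recovers the full entropy of the sample set as gain, while a feature that leaves both groups as mixed as the whole set has zero gain.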
Assuming {a1, a2, …, an} is the set of all behavior features determined in step S201, Gain(i), for each feature index i up to n, is the information gain of each individual behavior variable with respect to the whole sample set, and this information gain represents the contribution degree of each piece of sample feature data.
Step S203, screening out, according to the contribution degree of each piece of sample feature data, the sample feature data that satisfy the contribution degree condition, to serve as model-input variables;
By ranking the sample feature data by contribution degree, the model-input variables that are most informative for the security assessment are determined. Specifically, the sample feature data may be sorted in descending order of contribution degree, and a preset number of the top-ranked pieces of sample feature data, which can be regarded as significant, are selected as the model-input variables. The preset number may be determined according to the actual situation.
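The descending sort and top-ranked selection of step S203 reduce to a few lines of Python. The feature names and contribution values below are made up for illustration, and select_model_inputs is a hypothetical name.

```python
def select_model_inputs(contributions, k):
    """Return the k feature names with the highest contribution degree."""
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]

# Hypothetical information-gain values per behavior feature.
gains = {
    "face_off": 0.12,
    "logout_click": 0.03,
    "bill_view": 0.07,
    "open_app": 0.001,
}
selected = select_model_inputs(gains, k=2)
print(selected)
```

Because Python's sort is stable, features with equal contribution keep their original relative order, so the selection is deterministic for a given input.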
Step S204, training on the model-input variables to construct the data analysis model.
In the embodiment of the invention, the data analysis model may be constructed in various ways, for example by GBDT binary classification modeling. The model-input variables may include a certain number of black samples (for example, the number of black samples may be set to 7% of the response rate when a questionnaire is issued). Black samples are data collected by surveying users with a low sense of security, in which the user has explicitly indicated, by telephone or other means, the behavior associated with the loss of the sense of security. For example, if the preset number of model-input variables is 55, they may be set to include 7 user demographic variables, 30 behavioral Boolean variables, and 16 derived variables. If a continuous variable among the model-input variables has missing values, they may be imputed by mean filling: the variable most strongly correlated with the variable containing the missing values is found and used to divide the data into several groups, the mean of each group is calculated, and each missing value is filled with the mean of its group, which improves the distribution of the data to some extent. To improve the performance of the algorithm, the sample feature data may further be binned. The model-input variables, after missing-value imputation, are divided into a training set and a test set according to a preset ratio, which may be 7:3. Weak classifiers $T(x; \theta_m)$ are generated over multiple iterations, each weak classifier being trained on the residual left by the weak classifiers of the previous rounds, and the training model can be described as follows:
$$F_M(x) = \sum_{m=1}^{M} T(x; \theta_m)$$
Loss function of the weak classifier:
$$\hat{\theta}_m = \arg\min_{\theta_m} \sum_{i=1}^{N} L\bigl(y_i,\ F_{m-1}(x_i) + T(x_i; \theta_m)\bigr)$$
The loss function is reduced along the gradient direction: each iteration fits the negative gradient of the loss function under the current model, so that every round of training reduces the loss and the model can converge toward the global optimum as quickly as possible. The model trained on the training set data is then tested and evaluated on the test set data, so as to verify the accuracy of the security assessment.
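The preprocessing and training steps of S204 can be illustrated with a small, dependency-free Python sketch. It is not the embodiment's implementation: it assumes the grouping variable for imputation is already known, uses a squared-error loss (whose negative gradient is the residual) where the embodiment leaves L unspecified, uses single-split decision stumps as the weak classifiers T(x; θ_m), and starts from F_0 = 0. All names and data are hypothetical.

```python
import random

def impute_group_mean(values, groups):
    """Fill None entries with the mean of the observed values in the same group."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        if v is not None:
            sums[g] = sums.get(g, 0.0) + v
            counts[g] = counts.get(g, 0) + 1
    overall = sum(sums.values()) / sum(counts.values())  # fallback for empty groups
    means = {g: sums[g] / counts[g] for g in sums}
    return [means.get(g, overall) if v is None else v for v, g in zip(values, groups)]

def split_rows(rows, train_ratio=0.7, seed=0):
    """Shuffle reproducibly, then cut at the preset ratio (7:3 by default)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def fit_stump(xs, residuals):
    """Weak learner T(x; theta): one threshold split with least squared error."""
    mean = sum(residuals) / len(residuals)
    best = (float("inf"), 0, float("inf"), mean, mean)  # constant fallback
    for j in range(len(xs[0])):
        for t in sorted({x[j] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[j] <= t]
            right = [r for x, r in zip(xs, residuals) if x[j] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if err < best[0]:
                best = (err, j, t, lm, rm)
    return best[1:]

def stump_value(stump, x):
    j, t, left_value, right_value = stump
    return left_value if x[j] <= t else right_value

def fit_boosting(xs, ys, rounds=5):
    """F_m = F_{m-1} + T(x; theta_m), each stump fitted to the current residuals."""
    stumps, preds = [], [0.0] * len(ys)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of squared loss
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + stump_value(stump, x) for p, x in zip(preds, xs)]
    return stumps

def predict(stumps, x):
    return sum(stump_value(s, x) for s in stumps)

def accuracy(stumps, rows):
    hits = sum((predict(stumps, x) >= 0.5) == bool(y) for x, y in rows)
    return hits / len(rows)

# Hypothetical data: x = (face_off flag, logout clicks), y = 1 marks a black sample.
clicks = impute_group_mean([3.0, None, 5.0, 10.0, None], ["a", "a", "a", "b", "b"])
rows = [((i % 2, i % 5), 1 if i % 5 >= 3 else 0) for i in range(20)]
train, test = split_rows(rows)
model = fit_boosting([x for x, _ in train], [y for _, y in train])
train_accuracy = accuracy(model, train)
test_accuracy = accuracy(model, test)
print(clicks, len(train), len(test), train_accuracy, test_accuracy)
```

In practice a library implementation of gradient boosting with deeper trees and a classification loss would normally be preferred; the stumps here only make the residual-fitting loop of the formulas visible.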
According to the invention, the sense of security of the entire user base can be evaluated comprehensively and quickly through the data analysis model, and individual accounts can each be scored, which refines the granularity of the evaluation. At the same time, the key reasons affecting users' sense of transaction security can be located quickly, so that the application or related services can be optimized accordingly, and decision support is provided for personalized recommendation of security products.
Fig. 4 is a schematic structural diagram of a behavioral data evaluation apparatus according to a third embodiment of the present invention, as shown in fig. 4, including: a data extraction module 41, a data matching module 42 and a data evaluation module 43.
A data extraction module 41, configured to extract behavior feature data of a user for an application according to an operation of the user for the application;
A data matching module 42, configured to input the behavior feature data into a pre-established data analysis model for matching, and obtain matching result information;
A data evaluation module 43, configured to generate behavior data evaluation information of the user according to the matching result information.
The behavior data evaluation device in the third embodiment of the present invention is an implementation device of the behavior data evaluation method shown in fig. 1, and specifically, reference may be made to the embodiment of fig. 1, which is not described herein again.
Embodiments of the present invention also provide a storage device storing computer program instructions for execution in accordance with the methods of the present invention shown in fig. 1-3.
The embodiment of the invention also provides a computing device, which comprises: a memory for storing computer program instructions and a processor for executing the computer program instructions, wherein the computer program instructions, when executed by the processor, trigger the computing device to perform the method of the invention as shown in fig. 1 to 3.
Furthermore, some embodiments of the present invention provide a computer readable medium having stored thereon computer program instructions executable by a processor to implement the methods and/or aspects of the various embodiments of the present invention described above.
It should be noted that the present invention may be implemented in software and/or a combination of software and hardware, e.g., using Application Specific Integrated Circuits (ASIC), a general purpose computer or any other similar hardware device. In some embodiments, the software program of the present invention may be executed by a processor to implement the above steps or functions. Likewise, the software programs of the present invention (including associated data structures) may be stored on a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. In addition, some steps or functions of the present invention may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is evident that the word "comprising" does not exclude other elements or steps, and that the singular does not exclude a plurality. A plurality of units or means recited in the apparatus claims can also be implemented by means of one unit or means in software or hardware. The terms first, second, etc. are used to denote a name, but not any particular order.