CN114064440A - Training method of credibility analysis model, credibility analysis method and related device - Google Patents


Info

Publication number
CN114064440A
Authority
CN
China
Prior art keywords
behavior
data set
feature
training
analysis model
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210051516.3A
Other languages
Chinese (zh)
Inventor
刘洋
陈爱明
蔡忠伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hundsun Technologies Inc
Original Assignee
Hundsun Technologies Inc
Application filed by Hundsun Technologies Inc
Priority to CN202210051516.3A
Publication of CN114064440A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438 Monitoring of user actions
    • G06F11/3452 Performance evaluation by statistical analysis
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/10 Protecting distributed programs or content, e.g. vending or licensing of copyrighted material; Digital rights management [DRM]
    • G06F21/12 Protecting executable software

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Multimedia (AREA)
  • Technology Law (AREA)
  • Computer Security & Cryptography (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a training method for a credibility analysis model, a credibility analysis method, and related devices. The training method includes: acquiring a behavior feature data set for each of a plurality of users, where each behavior feature data set includes a plurality of behavior features and a feature value for each behavior feature; determining a confidence interval for each behavior feature from all feature values of that behavior feature; and training the model parameters of an initial credibility analysis model according to the confidence intervals of the behavior features and the behavior feature data sets, to obtain a trained credibility analysis model. Compared with the prior art, the method provides reliable training data for subsequently obtaining an accurate credibility analysis model, and training the model with the confidence interval of each behavior feature improves the credibility and accuracy of the model.

Description

Training method of credibility analysis model, credibility analysis method and related device
Technical Field
The invention relates to the technical field of network security, and in particular to a training method for a credibility analysis model, a credibility analysis method, and related devices.
Background
As financial software technology is applied and developed ever more deeply across industries, it brings convenience but also new software application security challenges.
At present, abnormal and malicious behavior by users of financial software systems is poorly identified, which makes credibility evaluation of user behavior necessary. Existing user credibility analysis methods are single and fixed and cannot measure the correlations between user behavior features. As a result, abnormal user behavior is not sufficiently mined, and the evaluation results are neither accurate nor reliable enough.
Disclosure of Invention
An objective of the present invention is to provide a training method for a credibility analysis model, a credibility analysis method, and related devices, so as to solve the above technical problems.
In a first aspect, the present invention provides a training method for a credibility analysis model, including: acquiring a behavior feature data set for each of a plurality of users, the behavior feature data set including a plurality of behavior features and a feature value for each behavior feature; determining a confidence interval for each behavior feature according to all feature values of that behavior feature; and training the model parameters of an initial credibility analysis model according to the confidence intervals of the behavior features and the behavior feature data sets, to obtain a trained credibility analysis model.
In a second aspect, the present invention provides a credibility analysis method, including: acquiring a behavior log data set of a user to be analyzed within a preset time period; determining, from the behavior log data set, a behavior feature data set of the user to be analyzed, the behavior feature data set including a plurality of behavior features and a feature value for each behavior feature; and inputting the behavior feature data set into a trained credibility analysis model, which outputs the behavior credibility of the user to be analyzed. The credibility analysis model is trained according to the behavior feature data sets of a plurality of users and the confidence intervals of the behavior features.
In a third aspect, the present invention provides a training apparatus for a credibility analysis model, including: an obtaining module, configured to obtain behavior feature data sets of a plurality of users, each behavior feature data set including a plurality of behavior features and a feature value for each behavior feature; a determining module, configured to determine a confidence interval for each behavior feature according to all feature values of that behavior feature; and a training module, configured to train the model parameters of an initial credibility analysis model according to the confidence intervals of the behavior features and the behavior feature data sets, to obtain a trained credibility analysis model.
In a fourth aspect, the present invention provides a credibility analysis apparatus, including: an acquisition module, configured to acquire a behavior log data set of a user to be analyzed within a preset time period; and an analysis module, configured to determine, from the behavior log data set, a behavior feature data set of the user to be analyzed, the behavior feature data set including a plurality of behavior features and a feature value for each behavior feature, and further configured to input the behavior feature data set into a trained credibility analysis model and output a credibility analysis result for the user to be analyzed. The credibility analysis model is trained according to the behavior feature data sets of a plurality of users and the confidence intervals of the behavior features.
In a fifth aspect, the present invention provides an electronic device including a processor and a memory, the memory storing a computer program executable by the processor, and the processor executing the computer program to implement the method of the first aspect or the method of the second aspect.
In a sixth aspect, the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method of the first aspect or the method of the second aspect.
The invention provides a training method for a credibility analysis model, a credibility analysis method, and related devices. The training method includes: acquiring a behavior feature data set for each of a plurality of users, each data set including a plurality of behavior features and a feature value for each behavior feature; determining a confidence interval for each behavior feature according to all feature values of that behavior feature; and training the model parameters of an initial credibility analysis model according to the confidence intervals and the behavior feature data sets, to obtain a trained credibility analysis model. Compared with the prior art, the confidence intervals of the behavior features are determined from the behavior feature data sets of many users, so the credibility of user behavior is measured through multiple behavior features, which gives the result a persuasive basis. At the same time, this provides reliable training data for subsequently obtaining an accurate credibility analysis model, and training the model with the confidence interval of each behavior feature improves the credibility and accuracy of the model.
Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below. The following drawings show only some embodiments of the present invention and should not be regarded as limiting its scope; a person skilled in the art may derive other related drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application environment;
FIG. 2 is a schematic block diagram of an electronic device 200 according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a training method for a credibility analysis model according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of one implementation of step S302 according to an embodiment of the present invention;
FIG. 5 is a schematic flowchart of a credibility analysis method according to an embodiment of the present invention;
FIG. 6 is a functional block diagram of a training apparatus 600 for a credibility analysis model according to an embodiment of the present invention;
FIG. 7 is a functional block diagram of a credibility analysis apparatus 700 according to an embodiment of the present invention.
Reference numerals: 102 - terminal; 104 - service device; 200 - electronic device; 201 - memory; 202 - processor; 203 - communication interface; 600 - training apparatus for a credibility analysis model; 610 - obtaining module; 620 - determining module; 630 - training module; 700 - credibility analysis apparatus; 710 - acquisition module; 720 - analysis module.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. The described embodiments are some, but not all, of the embodiments of the present invention. The components of the embodiments, as generally described and illustrated in the drawings, may be arranged and designed in a wide variety of configurations.
Thus, the following detailed description of the embodiments shown in the drawings is not intended to limit the claimed scope of the invention, but merely represents selected embodiments of the invention. All other embodiments derived by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present invention.
It should be noted that like reference numerals and letters denote like items in the following drawings; once an item is defined in one drawing, it need not be further defined or explained in subsequent drawings.
In the description of the present invention, it should be noted that terms such as "upper", "lower", "inside", and "outside", if they indicate an orientation or positional relationship, refer to the orientation shown in the drawings or the orientation in which the product of the invention is customarily placed in use. They are used only for convenience and simplicity of description, do not indicate or imply that the referenced device or element must have, or be constructed and operated in, a specific orientation, and therefore should not be construed as limiting the present invention.
Furthermore, the terms "first", "second", and the like, if used, serve only to distinguish descriptions and are not to be construed as indicating or implying relative importance.
It should be noted that the embodiments of the present invention and the features therein may be combined with each other provided they do not conflict.
Referring to FIG. 1, FIG. 1 is a schematic diagram of an application environment that includes a terminal 102 and a service device 104, which are communicatively connected.
The service device 104 may provide various services to the terminal 102, including but not limited to financial services. After authenticating on the terminal 102 (for example, a smartphone, a tablet, or a desktop computer) on which service software is installed, a user can perform a series of operations in the software, including but not limited to browsing, clicking, jumping between pages, forwarding, and downloading. The service device 104 can obtain the session information generated during these operations, such as paths, actions, and times, and perform user behavior analysis on it. This enables dynamic, real-time monitoring of illegal and abnormal behavior by legitimate users while they use the service software, and improves the credibility of system user behavior.
The terminal 102 may be, but is not limited to, a smartphone, a tablet computer, or a desktop computer; the service device 104 may be an independent physical server, or a server cluster or distributed system composed of multiple physical servers.
It should be noted that, in the above embodiments of the present invention, the consent of the user of the terminal 102 is obtained explicitly before the user behavior feature data generated during user operations is provided to the service device 104 and before the service device 104 collects training samples.
It should be noted that, when the service device 104 acquires a user's behavior feature data from the database according to the user information and collects training samples, it publicly discloses to the relevant users how the data is collected, used, and stored, and obtains their authorization. The collected training samples do not include personal information unrelated to the credibility analysis service provided by this embodiment.
The training method for a credibility analysis model provided by the embodiments of the present invention is described below, taking financial software as an example.
As is well known, as financial software technology is applied and developed ever more deeply across industries, it brings convenience but also security challenges in software use, so users' security requirements for financial software systems keep rising.
The financial software may be provided by various financial institutions, including but not limited to banks, securities firms, insurers, trust companies, and fund companies.
At present, security threats to a financial software system come mainly from two sources. On one hand, attacks originate outside the system; these can be effectively controlled by means such as trusted computing, network isolation, and firewalls. On the other hand, abnormal and malicious behavior by the system's own users cannot be prevented, because the system controls sensitive operations and data only through a single permission mechanism. A legitimate user can exploit their own permissions to perform sensitive operations, which can easily compromise the security of the software system without being stopped in time.
To address these problems, the related art provides technical solutions for measuring and evaluating the user behavior of financial software systems. Conventional evaluation methods mainly collect key indicators of user behavior, compare them against a metric index table, and label software user behavior using a signature method. For example, user behavior log data is preprocessed and statistically analyzed: preprocessing includes deleting abnormal values and invalid clicks, sorting by specific attributes, and segmenting sessions; user behavior indicators, such as the number of operations per function, the time intervals between function operations, and the session sequence length, are then counted and displayed in charts.
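The conventional preprocessing described above can be illustrated with a short Python sketch of session segmentation and the resulting statistics. The `LogEvent` record, its field names, and the 30-minute inactivity gap used to split sessions are assumptions chosen for illustration, not details taken from the related art.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List


@dataclass
class LogEvent:
    user_id: str
    function: str        # hypothetical field: the function/page that was operated
    timestamp: datetime


def split_sessions(events: List[LogEvent],
                   gap: timedelta = timedelta(minutes=30)) -> List[List[LogEvent]]:
    """Segment one user's events into sessions, starting a new session
    whenever the idle time since the previous event exceeds `gap` (assumed)."""
    sessions: List[List[LogEvent]] = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if sessions and event.timestamp - sessions[-1][-1].timestamp <= gap:
            sessions[-1].append(event)
        else:
            sessions.append([event])
    return sessions


def session_statistics(events: List[LogEvent],
                       gap: timedelta = timedelta(minutes=30)) -> Dict[str, object]:
    """Compute the kinds of indicators named in the text for one user."""
    sessions = split_sessions(events, gap)
    return {
        # number of operations per function
        "function_operation_counts": dict(Counter(e.function for e in events)),
        # seconds between consecutive operations inside the same session
        "function_intervals_seconds": [
            (b.timestamp - a.timestamp).total_seconds()
            for s in sessions for a, b in zip(s, s[1:])
        ],
        # number of operations in each session
        "session_sequence_lengths": [len(s) for s in sessions],
    }
```

With events at 09:00, 09:05, and 10:00, the 55-minute pause exceeds the assumed gap, so the events form two sessions of lengths 2 and 1.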
However, such evaluation methods are single and fixed and cannot measure the correlations between user behavior features. As a result, abnormal user behavior is not sufficiently mined, and the evaluation results are neither accurate nor reliable enough.
To overcome these defects in the related art, embodiments of the present invention provide a credibility analysis model for analyzing user behavior, which can be used to perform credibility analysis on that behavior.
Referring to FIG. 2, FIG. 2 is a schematic block diagram of an electronic device 200 according to an embodiment of the present invention. The electronic device 200 may be a device that trains a neural network model to obtain the credibility analysis model provided by the embodiments of the present invention, and/or a device that runs the trained credibility analysis model to implement the credibility analysis method provided by the embodiments, such as a mobile phone, a personal computer (PC), a tablet computer, or a server.
The electronic device 200 includes a memory 201, a processor 202, and a communication interface 203, which are electrically connected to one another directly or indirectly to transmit or exchange data. For example, these components may be electrically connected through one or more communication buses or signal lines.
The memory 201 may store software programs and modules, such as the program instructions/modules of the training apparatus 600 for the credibility analysis model or of the credibility analysis apparatus 700 provided in the embodiments of the present invention. These may be stored in the memory 201 as software or firmware, or built into the operating system (OS) of the electronic device 200. The processor 202 executes the software programs and modules stored in the memory 201 to perform various functional applications and data processing. The communication interface 203 may be used to exchange signaling or data with other node devices.
The memory 201 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and the like.
The processor 202 may be an integrated circuit chip with signal processing capabilities. The processor 202 may be a general-purpose processor, such as a central processing unit (CPU) or a network processor (NP); it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
It will be appreciated that the configuration shown in fig. 2 is merely illustrative and that electronic device 200 may include more or fewer components than shown in fig. 2 or may have a different configuration than shown in fig. 2. The components shown in fig. 2 may be implemented in hardware, software, or a combination thereof.
The training method for the credibility analysis model according to an embodiment of the present invention is described in detail below, taking the electronic device 200 shown in FIG. 2 as the execution subject. The embodiments described below and their features may be combined with each other provided they do not conflict.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of a training method for a credibility analysis model according to an embodiment of the present invention.
S301: acquiring a behavior feature data set for each of a plurality of users; each behavior feature data set includes a plurality of behavior features and a feature value for each behavior feature.
S302: determining a confidence interval for each behavior feature according to all feature values of that behavior feature.
S303: training the model parameters of an initial credibility analysis model according to the confidence intervals of the behavior features and the behavior feature data sets, to obtain a trained credibility analysis model.
In the training method provided by this embodiment, the behavior feature data sets of a plurality of users are first obtained; the confidence interval of each behavior feature is then determined from the feature values of that feature across the data sets; finally, the model is trained on the determined confidence intervals and the behavior feature data sets to obtain the trained credibility analysis model.
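The three steps can be outlined as a small Python pipeline, assuming each user's behavior feature data set is a mapping from feature name to numeric value. This section does not specify how the confidence intervals are computed or what the initial credibility analysis model is, so both are passed in as functions; `train_pipeline`, `interval_fn`, and `train_fn` are hypothetical names.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

Interval = Tuple[float, float]
FeatureDataSet = Dict[str, float]      # behavior feature name -> feature value
Model = TypeVar("Model")


def collect_feature_values(datasets: Sequence[FeatureDataSet]) -> Dict[str, List[float]]:
    """Gather, for every behavior feature, the feature values of all users."""
    values: Dict[str, List[float]] = {}
    for dataset in datasets:
        for feature, value in dataset.items():
            values.setdefault(feature, []).append(value)
    return values


def train_pipeline(
    datasets: Sequence[FeatureDataSet],
    interval_fn: Callable[[List[float]], Interval],
    train_fn: Callable[[Dict[str, Interval], Sequence[FeatureDataSet]], Model],
) -> Model:
    # S301: `datasets` holds one behavior feature data set per user.
    # S302: one confidence interval per behavior feature, from all users' values.
    intervals = {
        feature: interval_fn(values)
        for feature, values in collect_feature_values(datasets).items()
    }
    # S303: fit the initial model's parameters from the intervals and data sets.
    return train_fn(intervals, datasets)
```

Keeping the interval rule and the trainer as parameters lets the S302 statistic and the S303 model be swapped without changing the data flow.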
The above exemplary steps S301 to S303 are described and explained in detail below.
In step S301, behavior feature data sets corresponding to the plurality of users are obtained.
In this embodiment of the present invention, a behavior feature data set includes a plurality of behavior features and a feature value for each behavior feature. The behavior features are indicators used to measure the credibility of user behavior, and in this embodiment they may be:
1. Number of system logins: the numbers of valid and invalid logins made by the user through the login page of the software system;
2. Number of distinct IP addresses: the distinct IP addresses the user uses to access the system, and the total number of accesses from each IP address;
3. Number of distinct browser types: the distinct browser types the user uses to access the system, and the total number of accesses from each browser type;
4. Number of distinct operating system types: the distinct operating systems the user uses to access the system, and the total number of accesses from each operating system;
5. Number of logins outside working hours: the numbers of valid and invalid logins made by the user through the login page outside the hours specified for working days;
6. Number of distinct IP addresses outside working hours: the distinct IP addresses the user uses to access the system outside the hours specified for working days, and the total number of accesses from each IP address;
7. Number of distinct browser types outside working hours: the distinct browsers the user uses to access the system outside the hours specified for working days, and the total number of accesses from each browser type;
8. Number of distinct operating system types outside working hours: the distinct operating systems the user uses to access the system outside the hours specified for working days, and the total number of accesses from each operating system;
9. Number of system accesses outside working hours: the total number of times the user accesses the system outside the hours specified for working days;
10. Number of unidentified browsers: the total number of times the user accesses system functions from a client whose source cannot be identified;
11. Number of password entry errors: the total number of times the user enters an incorrect password when logging in to the system;
12. Number of illegal requests: the total number of times the user requests paths outside the scope of their own permissions;
13. Number of illegal jumps: the total number of times the user's request path does not follow a normal sequence of operation steps;
14. Number of requests containing special characters: the total number of times the user sends a request through an interface form that contains special characters;
15. Number of sensitive file downloads: the total number of times the user downloads files designated as sensitive in the system, within the scope of valid system permissions;
16. Number of sensitive information deletions: the total number of times the user deletes sensitive information within the scope of valid system permissions;
17. Number of sensitive information modifications: the total number of times the user modifies sensitive information within the scope of valid system permissions;
18. Number of sensitive information queries: the total number of times the user browses or views sensitive information within the scope of valid system permissions.
These behavior features allow the credibility of user behavior to be analyzed from multiple angles, which helps ensure the accuracy and credibility of the evaluation results.
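One possible in-code representation of the 18 behavior features is an enumeration whose member values follow the numbering of the list above. The identifiers are hypothetical, and a single count per feature is assumed for simplicity, even though items 1 and 5 distinguish valid from invalid logins and items 2 to 4 and 6 to 8 also record per-type access totals.

```python
from enum import Enum
from typing import Dict


class BehaviorFeature(Enum):
    """Hypothetical identifiers; values match the numbered list above."""
    LOGIN_COUNT = 1
    DISTINCT_IP_COUNT = 2
    DISTINCT_BROWSER_COUNT = 3
    DISTINCT_OS_COUNT = 4
    OFF_HOURS_LOGIN_COUNT = 5
    OFF_HOURS_DISTINCT_IP_COUNT = 6
    OFF_HOURS_DISTINCT_BROWSER_COUNT = 7
    OFF_HOURS_DISTINCT_OS_COUNT = 8
    OFF_HOURS_ACCESS_COUNT = 9
    UNIDENTIFIED_BROWSER_COUNT = 10
    PASSWORD_ERROR_COUNT = 11
    ILLEGAL_REQUEST_COUNT = 12
    ILLEGAL_JUMP_COUNT = 13
    SPECIAL_CHARACTER_REQUEST_COUNT = 14
    SENSITIVE_FILE_DOWNLOAD_COUNT = 15
    SENSITIVE_INFO_DELETE_COUNT = 16
    SENSITIVE_INFO_MODIFY_COUNT = 17
    SENSITIVE_INFO_QUERY_COUNT = 18


def empty_feature_dataset() -> Dict[BehaviorFeature, int]:
    """A behavior feature data set with every feature value initialized to 0."""
    return {feature: 0 for feature in BehaviorFeature}
```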
For example, the behavior feature data of 5 users over the past week, counted back from the current time, are collected and quantified; the resulting behavior feature data sets of the 5 users are shown in Table 1.
The behavior features may be obtained by, but not limited to, principal component analysis, or may be predefined based on test verification; this is not limited herein.
TABLE 1
(Table 1 image: behavior feature data sets of the 5 users)
In a possible implementation, the step S301 may be implemented as follows:
a1, obtaining the behavior log data set corresponding to each of a plurality of users in the preset time period.
In this embodiment, the preset time period may be defined as needed, for example one week or one month; this is not limited herein.
In this embodiment of the present invention, the behavior log data set includes at least the session information, such as paths, actions, and times, generated as the user performs a series of operations. To collect the feature values of the behavior features produced during operation, sensitive and non-sensitive operations must first be defined in the software system; tracking points (data embedding points) for collecting behavior log data can then be configured on the corresponding operations through JavaScript, yielding the behavior log data set.
Thus, in one possible implementation, the behavior log dataset may be obtained by:
a1-1, detecting whether user operation information has been uploaded at the preset data tracking points.
a1-2, if so, acquiring behavior logs according to the user operation information, and combining all acquired behavior logs into a behavior log data set.
It can be understood that the preset data tracking points may be tracking points preset in the program code of the application client, and each tracking point in this embodiment works independently. In practice, tracking points can be placed, according to business needs, at the function points of the key operation flows of the various services the client provides, so that each user operation and the session information it generates (path, action, time, and so on) is recorded at the tracking point. This allows the required data to be obtained quickly and accurately and saves processing time.
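Steps a1-1 and a1-2 can be sketched as follows, modeling each preset tracking point as an object that buffers the operation information uploaded by the client-side JavaScript. The class names, fields, and log record layout are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OperationInfo:
    user_id: str
    path: str         # request path or page
    action: str       # e.g. "click", "download"
    timestamp: float  # epoch seconds


@dataclass
class TrackingPoint:
    point_id: str
    sensitive: bool   # whether the operation was predefined as sensitive
    pending: List[OperationInfo] = field(default_factory=list)

    def upload(self, info: OperationInfo) -> None:
        """Stands in for the client-side tracking code reporting an operation."""
        self.pending.append(info)


def collect_behavior_logs(points: List[TrackingPoint]) -> List[Dict[str, object]]:
    log_dataset: List[Dict[str, object]] = []
    for point in points:
        if not point.pending:          # a1-1: nothing uploaded at this point
            continue
        for info in point.pending:     # a1-2: one behavior log per upload
            log_dataset.append({
                "point": point.point_id,
                "sensitive": point.sensitive,
                "user": info.user_id,
                "path": info.path,
                "action": info.action,
                "time": info.timestamp,
            })
        point.pending.clear()          # each upload is consumed once
    return log_dataset
```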
It should be noted that the settings at the data site all provide the relevant user with the instructions for collection, use and storage in a public manner, and obtain user authorization.
a2, according to a plurality of predefined behavior characteristics, determining characteristic values of the behavior characteristics from behavior log data sets corresponding to a plurality of users, and combining the behavior characteristics and the characteristic values corresponding to the behavior characteristics into a characteristic data set.
It can be understood that the behavior log data set contains at least the information of the plurality of behavior features. The behavior log data set of each user can therefore be quantified per user according to the predefined behavior features. In other words, the logs are analyzed to count the feature value of each behavior feature in the operation records within the preset time period.
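The quantification in step a2 can be sketched as a per-user count over a time window. This assumes, purely for illustration, that every behavior feature is a count of one action type (for example a hypothetical `"login_count"` feature counting `"login"` actions) and that logs are `(user_id, action, timestamp)` tuples. The patent does not fix either the feature definitions or the log format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

LogRecord = Tuple[str, str, float]  # hypothetical (user_id, action, timestamp)

def build_feature_datasets(
    logs: Iterable[LogRecord],
    feature_actions: Dict[str, str],
    start: float,
    end: float,
) -> Dict[str, Dict[str, int]]:
    """Return {user_id: {feature_name: count}} for actions with start <= timestamp < end.

    feature_actions maps each predefined feature name to the action it counts
    (an illustrative assumption, not the patent's feature definitions).
    """
    action_to_feature = {action: name for name, action in feature_actions.items()}
    # Every user seen in the logs gets a row that starts at zero for all features.
    users: Dict[str, Dict[str, int]] = defaultdict(lambda: {name: 0 for name in feature_actions})
    for user_id, action, ts in logs:
        row = users[user_id]
        if start <= ts < end and action in action_to_feature:
            row[action_to_feature[action]] += 1
    return dict(users)
```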
It should be noted that the servers storing the behavior feature data or behavior log data are located in compliance with the laws of the country or region where the data-related behaviors occur. These behaviors include, but are not limited to, authorization, generation, use and storage.
In step S302, a confidence interval corresponding to each of the plurality of behavior features is determined based on all feature values corresponding to each of the plurality of behavior features.
In this embodiment, the confidence interval of each behavior feature is the trusted range of that feature's values. When a feature value lies inside the confidence interval, the behavior feature, or the user behavior it reflects, can be considered trusted. Measuring the trusted range of each behavior feature with a confidence interval gives the credibility analysis result a degree of persuasiveness and a theoretical basis.
In a possible implementation, step S302 may be implemented as follows; please refer to fig. 4, which is a schematic flowchart of an implementation of step S302 according to an embodiment of the present invention:
S302-1, for a first behavior feature, extracting the feature values of the first behavior feature from the plurality of behavior feature data sets, and determining the statistical index values of the first behavior feature based on the extracted feature values;
wherein the first behavior feature is any one of a plurality of behavior features.
In the embodiment of the present invention, all feature values of each behavior feature may be obtained from the behavior feature data sets of all users, and the statistical index values are the mean M and the standard error se. For example, if the behavior feature data sets of 1000 users are obtained, the behavior feature "number of times of logging in to the system" has 1000 feature values. Its mean is the sum of these 1000 feature values divided by 1000, and its standard error is then computed from that mean and the 1000 feature values.
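The statistical index values of one behavior feature can be computed as below. The text does not say exactly how se is derived. This sketch assumes the usual standard error of the mean, se = s / sqrt(n), where s is the sample standard deviation (n - 1 in the denominator), as returned by Python's `statistics.stdev`.

```python
import math
import statistics
from typing import Sequence, Tuple

def mean_and_standard_error(values: Sequence[float]) -> Tuple[float, float]:
    """Return (M, se) for one behavior feature across all users.

    se = s / sqrt(n) with s the sample standard deviation; this choice of
    definition is an assumption, not stated in the patent text.
    """
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, se
```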
S302-2, determining a confidence interval of the first behavior feature according to the statistical index value and a preset confidence threshold.
In the embodiment of the present invention, the confidence threshold can be set according to actual requirements, for example Y = 95.5%. The confidence interval [a, b] of each behavior feature is then derived backwards from the mean M and standard error se of that feature using the Gaussian function, as follows:
$$Y = y\left(a = M - 2 \times se,\ b = M + 2 \times se\right) \times 100\% \approx 95.5\%$$
It should be understood that behavior features whose data lie inside the confidence interval are trusted indicators, and those outside it are not. The quantified data set therefore makes it easy for subsequent training to learn the characteristics of trusted user behavior. Labeling trusted and untrusted behavior features with the Gaussian function also ensures the accuracy and credibility of the credibility analysis model obtained by subsequent training.
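The 95.5% figure comes from the Gaussian distribution: a normal variable lies within k standard units of its mean with probability erf(k / sqrt(2)), which is about 0.9545 for k = 2. The sketch below checks this with `math.erf` and builds the interval [M - k·se, M + k·se]. The helper names are illustrative, and k is exposed as a parameter only as an assumption about how other thresholds might be handled.

```python
import math
from typing import Tuple

def gaussian_coverage(k: float) -> float:
    """Probability that a normal variable lies within k standard units of its mean."""
    return math.erf(k / math.sqrt(2.0))

def confidence_interval(mean: float, se: float, k: float = 2.0) -> Tuple[float, float]:
    """Return [a, b] = [M - k*se, M + k*se]; k = 2 corresponds to roughly 95.45%."""
    return mean - k * se, mean + k * se
```

For a 95% threshold one would use k ≈ 1.96 instead of 2.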
For example, take the behavior feature data sets of the 5 users in Table 1. For behavior feature 1, "number of times of logging in to the system", the 5 feature values in the first column of Table 1 give a mean of 50.6 and a standard error of 25.7286, so the confidence interval is [-0.8572, 102.0572]. The first column of Table 1 shows that User3's value for this feature is 133, which lies outside [-0.8572, 102.0572], so the user behavior of User3 is determined to be untrusted.
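The arithmetic of this example can be checked directly. Table 1 is not reproduced here, so the sketch uses only the mean and standard error quoted in the text and User3's value of 133.

```python
# Figures quoted in the example for "number of times of logging in to the system".
mean, se = 50.6, 25.7286

a, b = mean - 2 * se, mean + 2 * se  # [M - 2*se, M + 2*se]
print(f"[{a:.4f}, {b:.4f}]")  # prints "[-0.8572, 102.0572]"

user3_logins = 133
print("User3 trusted on this feature:", a <= user3_logins <= b)  # prints "... False"
```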
S302-3, traversing the plurality of behavior features to obtain confidence intervals corresponding to the plurality of behavior features.
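Steps S302-1 to S302-3 can be combined into one pass over all behavior features. The labeling helper is a hypothetical rule inferred from the User3 example: a user is labeled trusted (1) only if every feature value falls inside its interval. The text does not state the exact rule used to build training labels. The sketch also assumes that every user's data set contains the same features and that se is the standard error of the mean.

```python
import math
import statistics
from typing import Dict, Tuple

Interval = Tuple[float, float]
FeatureDatasets = Dict[str, Dict[str, float]]  # {user_id: {feature_name: value}}

def feature_intervals(datasets: FeatureDatasets, k: float = 2.0) -> Dict[str, Interval]:
    """S302-1 to S302-3: compute [M - k*se, M + k*se] for every behavior feature."""
    features = next(iter(datasets.values())).keys()  # assumes all users share the same features
    intervals = {}
    for feature in features:  # traverse the behavior features
        values = [row[feature] for row in datasets.values()]
        mean = statistics.mean(values)
        se = statistics.stdev(values) / math.sqrt(len(values))
        intervals[feature] = (mean - k * se, mean + k * se)
    return intervals

def label_users(datasets: FeatureDatasets, intervals: Dict[str, Interval]) -> Dict[str, int]:
    """Hypothetical rule: 1 (trusted) if every feature value lies inside its interval."""
    return {
        user: int(all(lo <= row[f] <= hi for f, (lo, hi) in intervals.items()))
        for user, row in datasets.items()
    }
```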
In step S303, the model parameters of the initial credibility analysis model are trained according to the confidence intervals of the plurality of behavior features and the behavior feature data sets, yielding the trained credibility analysis model.
In the embodiment of the present invention, the initial credibility analysis model may be, but is not limited to, an enhanced gradient boosting (XGBT) model. XGBT is an ensemble learning algorithm whose base learners are decision trees (DT). Because it learns sequentially, with each tree building on the trees before it, it can learn the correlations among the base learners and effectively evaluate the contribution of each behavior feature.
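XGBT itself is not reproduced here. To illustrate the sequential, tree-based ensemble idea with the standard library only, the sketch below is plain gradient boosting on log loss with depth-1 trees (stumps). It omits the regularization and second-order leaf weights of XGBoost-style libraries. All names and hyperparameters (`n_rounds`, `learning_rate`) are assumptions chosen for the example.

```python
import math
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, float, float]  # (feature index, threshold, left value, right value)

def sigmoid(z: float) -> float:
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # numerically stable branch for negative z
    return ez / (1.0 + ez)

def fit_stump(X: Sequence[Sequence[float]], r: Sequence[float]) -> Stump:
    """Fit one depth-1 regression tree to the residuals r by least squares."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not right:  # threshold at the maximum: no split
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # every feature is constant: use a single leaf
        mean_r = sum(r) / len(r)
        return (0, math.inf, mean_r, mean_r)
    return best[1:]

def stump_value(stump: Stump, row: Sequence[float]) -> float:
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv

def train_boosting(X, y, n_rounds: int = 20, learning_rate: float = 0.3):
    """Each round fits a stump to the current residuals, so later trees correct earlier ones."""
    p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    base = math.log(p / (1 - p))  # start from the log-odds of the trusted rate
    F = [base] * len(X)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]  # negative log-loss gradient
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        F = [fi + learning_rate * stump_value(stump, row) for fi, row in zip(F, X)]
    return base, learning_rate, stumps

def predict_trusted_proba(model, row: Sequence[float]) -> float:
    base, lr, stumps = model
    return sigmoid(base + lr * sum(stump_value(s, row) for s in stumps))
```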
In an alternative embodiment, step S303 can be implemented as follows:
b1, obtaining a training data set and a test data set according to the behavior feature data set and the confidence intervals corresponding to the behavior features;
b2, constructing a plurality of initial credibility analysis models, and training the initial credibility analysis models based on the training data set to obtain a plurality of credibility analysis models to be tested;
b3, testing the credibility analysis models to be tested based on the test data set to obtain model evaluation index values corresponding to the credibility analysis models to be tested, and determining the trained credibility analysis models based on the model evaluation index values.
In this embodiment, the behavior feature data may be split into a training set and a test set at a user-defined ratio, for example 3:1.
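A 3:1 split can be sketched with a seeded shuffle. This is a stdlib-only helper written for illustration (not scikit-learn's `train_test_split`), and the fixed seed is an assumption made for reproducibility.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def split_train_test(samples: Sequence[T], train_ratio: float = 0.75, seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle and split samples; train_ratio = 0.75 gives a 3:1 training/test split."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]
```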
The model evaluation index values may include, but are not limited to, Accuracy (formula (1)), Precision (formula (2)) and Recall (formula (3)). These indices are used to compare the quality of the candidate models. Precision and recall constrain each other: raising precision tends to lower recall and vice versa, so a balance between the two must be found.
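$$\text{Accuracy} = \frac{TT + TU}{TT + TU + FT + FU} \qquad (1)$$
$$\text{Precision} = \frac{TT}{TT + FT} \qquad (2)$$
$$\text{Recall} = \frac{TT}{TT + FU} \qquad (3)$$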
Here TT, TU, FT and FU in formulas (1), (2) and (3) have the meanings given in Table 2, where "1" denotes trusted and "0" denotes untrusted:
TABLE 2
| | Predicted 1 (Trusted) | Predicted 0 (Untrusted) |
| Actual 1 | TT (True Trusted: actually trusted, predicted trusted) | FU (False Untrusted: actually trusted, predicted untrusted) |
| Actual 0 | FT (False Trusted: actually untrusted, predicted trusted) | TU (True Untrusted: actually untrusted, predicted untrusted) |
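The Table 2 counts and the three evaluation indices can be computed as below. The sketch assumes the standard definitions with trusted (1) as the positive class and labels restricted to 0/1. The zero-division fallback of 0.0 is a choice made for the example.

```python
from typing import Dict, Sequence

def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, int]:
    """Count TT, FU, FT, TU as in Table 2 (1 = trusted, 0 = untrusted; labels assumed 0/1)."""
    counts = {"TT": 0, "FU": 0, "FT": 0, "TU": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TT"] += 1
        elif a == 1 and p == 0:
            counts["FU"] += 1
        elif a == 0 and p == 1:
            counts["FT"] += 1
        else:
            counts["TU"] += 1
    return counts

def evaluation_indices(counts: Dict[str, int]) -> Dict[str, float]:
    """Accuracy, precision and recall with trusted as the positive class."""
    tt, fu, ft, tu = counts["TT"], counts["FU"], counts["FT"], counts["TU"]
    return {
        "accuracy": (tt + tu) / (tt + tu + ft + fu),
        "precision": tt / (tt + ft) if tt + ft else 0.0,
        "recall": tt / (tt + fu) if tt + fu else 0.0,
    }
```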
During training, parameters of the credibility analysis model such as the number of trees, the tree depth and the number of training rounds are adjusted, and the model evaluation index values of the resulting models are compared to select the optimal credibility analysis model for the system.
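The parameter comparison can be sketched as a small grid search. Training and testing are hidden behind a caller-supplied `evaluate` function (hypothetical: it should train one candidate on the training set and return its evaluation indices on the test set). Ranking by a single index, accuracy by default, is an assumption. The text only says the indices are compared.

```python
import itertools
from typing import Callable, Dict, Iterable, Optional, Tuple

Params = Dict[str, int]
Metrics = Dict[str, float]

def select_best(
    evaluate: Callable[[Params], Metrics],
    grid: Dict[str, Iterable[int]],
    key: str = "accuracy",
) -> Tuple[Optional[Params], Optional[Metrics]]:
    """Evaluate one candidate model per parameter combination and keep the best by `key`."""
    names = list(grid)
    best_params: Optional[Params] = None
    best_metrics: Optional[Metrics] = None
    for combo in itertools.product(*(list(grid[name]) for name in names)):
        params = dict(zip(names, combo))
        metrics = evaluate(params)  # hypothetical: train on the training set, score on the test set
        if best_metrics is None or metrics[key] > best_metrics[key]:
            best_params, best_metrics = params, metrics
    return best_params, best_metrics
```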
The following example describes the credibility analysis method provided by the embodiment of the invention, in which the credibility analysis model trained by the above training method is applied to the credibility analysis of user behavior.
Referring to fig. 5, fig. 5 is a schematic flowchart of a credibility analysis method according to an embodiment of the present invention; the method may include:
S501, obtaining a behavior log data set of a user to be analyzed within a preset time period.
It can be understood that the behavior log data set can be obtained as described in steps a1-1 and a1-2 above; details are not repeated here.
S502, determining the behavior feature data set of the user to be analyzed according to the behavior log data set; the behavior feature data set includes a plurality of behavior features and the feature value of each behavior feature.
S503, inputting the behavior feature data set into the trained credibility analysis model, and outputting the behavior credibility of the user to be analyzed.
The credibility analysis model is trained according to the behavior feature data sets of the users and the corresponding confidence intervals of the behavior features.
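Steps S501 to S503 can be wired together as below. The trained model is represented by any callable that maps a feature dict to a credibility score. The log tuple format and the count-based features are the same illustrative assumptions as in the earlier sketches, and the function name `analyze_user` is hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

LogRecord = Tuple[str, str, float]  # hypothetical (user_id, action, timestamp)

def analyze_user(
    user_id: str,
    logs: Iterable[LogRecord],
    feature_actions: Dict[str, str],
    window: Tuple[float, float],
    model: Callable[[Dict[str, int]], float],
) -> float:
    """S501-S503: keep the user's logs in the window, count feature values, and score them."""
    start, end = window
    features = {name: 0 for name in feature_actions}  # S502: behavior feature data set
    for uid, action, ts in logs:  # S501: logs of this user within the preset time period
        if uid != user_id or not (start <= ts < end):
            continue
        for name, counted_action in feature_actions.items():
            if action == counted_action:
                features[name] += 1
    return model(features)  # S503: behavior credibility from the trained model
```

In a test, a simple threshold rule can stand in for the trained model. In practice `model` would wrap whatever classifier step S303 produced.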
In the credibility analysis method provided by the embodiment of the invention, the behavior log data set of the user to be analyzed is obtained first, and the corresponding behavior feature data set is derived from it. The data in the behavior feature data set is then input into the pre-trained credibility analysis model, which outputs the behavior credibility of the user to be analyzed. Analyzing user behavior with a pre-trained credibility analysis model improves both the analysis speed and the accuracy of the result.
It should be noted that the credibility analysis model may be trained in advance by a model training device and then loaded into the device that executes the credibility analysis method. Alternatively, the device executing the credibility analysis method may itself have a model training function: it first obtains the credibility analysis model through the training method above and then performs the credibility analysis.
Referring to fig. 6, based on the same inventive concept as the model training method provided in the embodiment of the present invention, fig. 6 is a functional block diagram of a training apparatus 600 for a credibility analysis model provided in the embodiment of the present invention; the training apparatus 600 includes:
an obtaining module 610, configured to obtain behavior feature data sets corresponding to multiple users; the behavior feature dataset includes a plurality of behavior features and a feature value of each behavior feature.
The determining module 620 is configured to determine, according to all feature values corresponding to the plurality of behavior features, confidence intervals corresponding to the plurality of behavior features.
The training module 630 is configured to train the model parameters of the initial credibility analysis model according to the confidence intervals of the plurality of behavior features and the behavior feature data sets, so as to obtain a trained credibility analysis model.
It is to be appreciated that the obtaining module 610, the determining module 620, and the training module 630 may cooperatively perform the various steps of fig. 3 to achieve a corresponding technical effect.
In an optional embodiment, the determining module 620 is specifically configured to: for a first behavior feature, extract the feature values of the first behavior feature from the plurality of behavior feature data sets and determine the statistical index values of the first behavior feature based on the extracted feature values, the first behavior feature being any one of the plurality of behavior features; determine the confidence interval of the first behavior feature according to the statistical index values and a preset confidence threshold; and traverse the plurality of behavior features to obtain the confidence interval of each.
In an optional embodiment, the obtaining module 610 is specifically configured to obtain the behavior log data set of each of a plurality of users within a preset time period; to determine, according to a plurality of predefined behavior features, the feature values of the behavior features from the behavior log data sets of the plurality of users; and to combine the behavior features and their feature values into a behavior feature data set.
In an optional embodiment, the obtaining module 610 is specifically configured to detect whether user operation information has been uploaded through a preset data embedding point, and if so, to acquire behavior logs according to the user operation information and combine all acquired behavior logs into a behavior log data set.
In an optional embodiment, the training module 630 is specifically configured to obtain a training data set and a test data set according to confidence intervals corresponding to the behavior feature data set and the plurality of behavior features respectively; constructing a plurality of initial credibility analysis models, and training the initial credibility analysis models based on a training data set to obtain a plurality of credibility analysis models to be tested; and testing the plurality of credibility analysis models to be tested based on the test data set to obtain model evaluation index values corresponding to the plurality of credibility analysis models to be tested, and determining the trained credibility analysis model based on the model evaluation index values.
Referring to fig. 7, based on the same inventive concept as the credibility analysis method provided in the embodiment of the present invention, fig. 7 is a schematic structural diagram of a credibility analysis apparatus 700 provided in the embodiment of the present invention; the credibility analysis apparatus 700 includes:
The obtaining module 710 is configured to obtain a behavior log data set of a user to be analyzed within a preset time period.
The analysis module 720 is configured to determine the behavior feature data set of the user to be analyzed according to the behavior log data set, the behavior feature data set including a plurality of behavior features and the feature value of each behavior feature, and to input the behavior feature data set into the trained credibility analysis model and output the credibility analysis result of the user to be analyzed.
It is understood that the obtaining module 710 and the analyzing module 720 can cooperatively perform the steps in fig. 5 to achieve the corresponding technical effect.
In an alternative embodiment, the obtaining module 710 may be further configured to perform the step a1-1 and the step a1-2 to achieve corresponding technical effects.
Embodiments of the present invention further provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for training a credibility analysis model and/or the credibility analysis method according to any one of the foregoing embodiments. The computer-readable storage medium may be, but is not limited to, any medium that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a PROM, an EPROM, an EEPROM, or a magnetic or optical disk.
It should be understood that the disclosed apparatus and method may be embodied in other forms. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention. It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

Claims (10)

1. A method for training a credibility analysis model, the method comprising:
acquiring behavior feature data sets corresponding to a plurality of users respectively; the behavior feature data set comprises a plurality of behavior features and feature values of each behavior feature;
determining confidence intervals corresponding to the behavior features according to all feature values corresponding to the behavior features;
and training model parameters of an initial credibility analysis model according to the confidence intervals corresponding to the behavior features and the behavior feature data sets to obtain the trained credibility analysis model.
2. The training method according to claim 1, wherein determining the confidence interval corresponding to each of the plurality of behavior features according to all feature values corresponding to each of the plurality of behavior features comprises:
for a first behavior feature, extracting a feature value of the first behavior feature from a plurality of behavior feature data sets, and determining a statistical index value corresponding to the first behavior feature based on the obtained feature value of the first behavior feature; the first behavior feature is any one of the plurality of behavior features;
determining the confidence interval of the first behavior feature according to the statistical index value and a preset confidence threshold;
and traversing the plurality of behavior features to obtain confidence intervals corresponding to the plurality of behavior features.
3. The training method of claim 1, wherein obtaining behavior feature data sets corresponding to a plurality of users comprises:
acquiring behavior log data sets corresponding to the plurality of users in a preset time period;
according to the predefined behavior features, determining feature values of the behavior features from the behavior log data sets corresponding to the users respectively, and combining the behavior features and their corresponding feature values into the behavior feature data set.
4. The training method according to claim 3, wherein obtaining a behavior log data set corresponding to each of the plurality of users within a preset time period comprises:
detecting whether user operation information is uploaded through preset data embedding points;
if yes, acquiring a behavior log according to the user operation information, and forming the behavior log data set by all the acquired behavior logs.
5. The training method of claim 1, wherein training model parameters of an initial credibility analysis model according to the confidence intervals corresponding to the behavior features and the behavior feature data set to obtain a trained credibility analysis model comprises:
obtaining a training data set and a test data set according to the behavior feature data set and the confidence intervals corresponding to the behavior features respectively;
constructing a plurality of initial credibility analysis models, and training the initial credibility analysis models based on the training data set to obtain a plurality of credibility analysis models to be tested;
and testing the plurality of credibility analysis models to be tested based on the test data set to obtain model evaluation index values corresponding to the plurality of credibility analysis models to be tested, and determining the trained credibility analysis model based on the model evaluation index values.
6. A method for credibility analysis, the method comprising:
acquiring a behavior log data set of a user to be analyzed in a preset time period;
determining a behavior feature data set corresponding to the user to be analyzed according to the behavior log data set; the behavior feature data set comprises a plurality of behavior features and feature values of each behavior feature;
inputting the behavior feature data set into a trained credibility analysis model, and outputting the behavior credibility of the user to be analyzed;
the credibility analysis model is trained according to the behavior feature data sets of a plurality of users and the confidence intervals corresponding to the behavior features.
7. A device for training a credibility analysis model, comprising:
an obtaining module, configured to obtain behavior feature data sets corresponding to a plurality of users; the behavior feature data set comprises a plurality of behavior features and feature values of each behavior feature;
a determining module, configured to determine, according to all feature values corresponding to the plurality of behavior features, confidence intervals corresponding to the plurality of behavior features;
and a training module, configured to train the model parameters of an initial credibility analysis model according to the confidence intervals corresponding to the behavior features and the behavior feature data set to obtain the trained credibility analysis model.
8. An apparatus for credibility analysis, comprising:
an acquisition module, configured to acquire a behavior log data set of a user to be analyzed in a preset time period;
an analysis module, configured to determine a behavior feature data set corresponding to the user to be analyzed according to the behavior log data set, the behavior feature data set comprising a plurality of behavior features and feature values of each behavior feature, and to input the behavior feature data set into a trained credibility analysis model and output a credibility analysis result corresponding to the user to be analyzed;
wherein the credibility analysis model is trained according to the behavior feature data sets of a plurality of users and the confidence intervals corresponding to the behavior features.
9. An electronic device comprising a processor and a memory, the memory storing a computer program executable by the processor, the processor being operable to execute the computer program to implement the method of any one of claims 1 to 5 or to implement the method of claim 6.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 5 or carries out the method of claim 6.
CN202210051516.3A 2022-01-18 2022-01-18 Training method of credibility analysis model, credibility analysis method and related device Pending CN114064440A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210051516.3A CN114064440A (en) 2022-01-18 2022-01-18 Training method of credibility analysis model, credibility analysis method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210051516.3A CN114064440A (en) 2022-01-18 2022-01-18 Training method of credibility analysis model, credibility analysis method and related device

Publications (1)

Publication Number Publication Date
CN114064440A true CN114064440A (en) 2022-02-18

Family

ID=80231181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210051516.3A Pending CN114064440A (en) 2022-01-18 2022-01-18 Training method of credibility analysis model, credibility analysis method and related device

Country Status (1)

Country Link
CN (1) CN114064440A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109002810A (en) * 2018-08-01 2018-12-14 西南交通大学 Model evaluation method, Radar Signal Recognition method and corresponding intrument
CN110557447A (en) * 2019-08-26 2019-12-10 腾讯科技(武汉)有限公司 user behavior identification method and device, storage medium and server
CN110889716A (en) * 2019-09-29 2020-03-17 清华大学 Method and device for identifying potential registered user
CN111260419A (en) * 2020-02-20 2020-06-09 世纪龙信息网络有限责任公司 Method and device for acquiring user attribute, computer equipment and storage medium
CN111311338A (en) * 2020-03-30 2020-06-19 网易(杭州)网络有限公司 User value prediction method and user value prediction model training method
CN111405562A (en) * 2020-03-11 2020-07-10 中国科学院信息工程研究所 Mobile malicious user identification method and system based on communication behavior rules
CN111652280A (en) * 2020-04-30 2020-09-11 中国平安财产保险股份有限公司 Behavior-based target object data analysis method and device and storage medium
CN111949867A (en) * 2020-08-10 2020-11-17 中国平安人寿保险股份有限公司 Cross-APP user behavior analysis model training method, analysis method and related equipment
CN112733045A (en) * 2021-04-06 2021-04-30 北京轻松筹信息技术有限公司 User behavior analysis method and device and electronic equipment
CN113255815A (en) * 2021-06-10 2021-08-13 平安科技(深圳)有限公司 User behavior abnormity analysis method, device, equipment and storage medium
CN113902037A (en) * 2021-11-08 2022-01-07 中国联合网络通信集团有限公司 Abnormal bank account identification method, system, electronic device and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
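Liu Yang (刘洋): "Research on a Software User Behavior Credibility Analysis Model Based on Ensemble Learning" (基于集成学习的软件用户行为可信度分析模型研究), China Master's Theses Full-text Database, Information Science and Technology Series *
Shi Lingling (史玲玲): "Confidence Level: More Certain" (置信度-更确信), in 爱上统计学 EXCEL *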

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220218