WO2021258992A1 - Big data-based user behavior monitoring method, apparatus, device and medium - Google Patents


Info

Publication number
WO2021258992A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
monitored
identification information
probability
risk
Application number
PCT/CN2021/096700
Other languages
English (en)
French (fr)
Inventor
许超俊
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2021258992A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3438 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment monitoring of user actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/258 Data format conversion from or to a database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06F18/24155 Bayesian classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database

Definitions

  • This application relates to the field of information technology, and in particular to a big data-based user behavior monitoring method, apparatus, device, and medium.
  • An enterprise's production business systems and back-end management systems generate large volumes of business data and business operation data. Ensuring the reliability, validity, availability, and accuracy of these data is key to the enterprise's comprehensive informatization and digital operation.
  • The production business systems and back-end management systems assign a role to each user to restrict that user's access to and use of data.
  • The inventor realized that, during maintenance of the back-end database, the need to cross-access various data tables usually requires granting additional operation permissions to users who already have assigned roles, for example table-level authorization on a specific database. Because the roles of the back-end database cannot be subdivided further, the prior art relies mainly on security personnel, or on computer analysis that exhaustively checks users, to investigate illegal user behavior, which is inefficient and time-consuming.
  • The embodiments of the present application provide a big data-based user behavior monitoring method, apparatus, device, and medium to solve the user behavior monitoring problem caused by the ever-increasing number of data tables and the cross-access requirements of the back-end database.
  • A big data-based user behavior monitoring method includes:
  • for the user to be monitored, using the identification information as the target variable and the historical behavior record of the user to be monitored as the input parameter, performing prediction with a naive Bayes algorithm to obtain the user probability corresponding to the user to be monitored;
  • determining, according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, whether the user to be monitored is at risk.
  • Acquiring several users and their corresponding historical behavior records and identification information within the first preset time period includes steps S201 to S204, described in detail below.
  • Determining whether the user to be monitored is at risk according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information includes:
  • if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is greater than or equal to a first preset threshold, determining that the current behavior of the user to be monitored is at risk;
  • if that deviation is less than the first preset threshold, determining that the current behavior of the user to be monitored is not at risk.
  • The method further includes:
  • using the identification information as the target variable and the historical behavior records in the third preset time period as the input parameter, performing prediction with the naive Bayes algorithm to obtain the second general probability of the user to be monitored;
  • determining, according to the first general probability and the second general probability of the user to be monitored, whether the user to be monitored is at risk.
  • Determining whether the user to be monitored is at risk according to the first general probability and the second general probability of the user to be monitored includes:
  • if the deviation of the second general probability of the user to be monitored from the first general probability is less than a second preset threshold, determining that the current behavior of the user to be monitored is not at risk.
  • The method further includes:
  • Determining, according to the user probability corresponding to the user to be monitored, the execution probability corresponding to the identification information, and the first and second general probabilities of the user to be monitored, whether the operation of the user to be monitored is at risk includes:
  • if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is greater than or equal to the first preset threshold, and the deviation of the second general probability of the user to be monitored from the first general probability is greater than or equal to the second preset threshold, determining that the current behavior of the user to be monitored is at risk;
  • if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is less than the first preset threshold, and/or the deviation of the second general probability of the user to be monitored from the first general probability is less than the second preset threshold, determining that the current behavior of the user to be monitored is not at risk.
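The combined decision rule above (risk only when both deviations reach their thresholds) can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the container `RiskInputs` and the function `combined_risk` are hypothetical names, and measuring "deviation" as an absolute difference is an assumption, since the text does not state a sign convention.

```python
from dataclasses import dataclass


@dataclass
class RiskInputs:
    """Hypothetical container for the four probabilities used by the combined rule."""
    user_prob: float        # user probability of the user to be monitored
    execution_prob: float   # execution probability of its identification information
    first_general: float    # first general probability (second preset time period)
    second_general: float   # second general probability (third preset time period)


def combined_risk(x: RiskInputs, first_threshold: float, second_threshold: float) -> bool:
    """At risk only when BOTH deviations reach their thresholds (assumed absolute deviations)."""
    outside_role = abs(x.user_prob - x.execution_prob) >= first_threshold
    new_combination = abs(x.second_general - x.first_general) >= second_threshold
    # If either deviation stays below its threshold, the behavior is treated as not at risk.
    return outside_role and new_combination
```

Requiring both conditions trades sensitivity for fewer false alarms: a user whose operations differ from the role but have been stable for months, or whose behavior changed but still matches the role, is not flagged.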
  • A big data-based user behavior monitoring apparatus includes:
  • a parameter acquisition module, configured to acquire several users and their corresponding historical behavior records and identification information within the first preset time period;
  • a training module, configured to use the identification information as the target variable and the historical behavior records of the several users as input parameters, and to train with the naive Bayes algorithm to obtain the execution probability corresponding to each piece of identification information;
  • a prediction module, configured to, for the user to be monitored, use the identification information as the target variable and the historical behavior record of the user to be monitored as the input parameter, and to predict with the naive Bayes algorithm to obtain the user probability corresponding to the user to be monitored;
  • a probability acquisition module, configured to acquire the identification information of the user to be monitored and the execution probability corresponding to that identification information; and
  • a risk monitoring module, configured to determine, according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, whether the user to be monitored is at risk.
  • A computer device includes a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when executing the computer-readable instructions, the processor implements the following steps:
  • for the user to be monitored, using the identification information as the target variable and the historical behavior record of the user to be monitored as the input parameter, performing prediction with a naive Bayes algorithm to obtain the user probability corresponding to the user to be monitored;
  • determining, according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, whether the user to be monitored is at risk.
  • One or more readable storage media store computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • for the user to be monitored, using the identification information as the target variable and the historical behavior record of the user to be monitored as the input parameter, performing prediction with a naive Bayes algorithm to obtain the user probability corresponding to the user to be monitored;
  • determining, according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, whether the user to be monitored is at risk.
  • The embodiments of this application analyze user behavior in different dimensions. Several users and their corresponding historical behavior records and identification information within a first preset time period are acquired; with the identification information as the target variable and the historical behavior records of the several users as input parameters, the naive Bayes algorithm is trained to obtain the execution probability corresponding to each piece of identification information; for the user to be monitored, with the identification information as the target variable and that user's historical behavior record as the input parameter, the naive Bayes algorithm predicts the user probability corresponding to the user to be monitored; the identification information of the user to be monitored and its execution probability are obtained; and whether the user to be monitored is at risk is determined from the user probability and the execution probability.
  • This helps discover abnormal user behavior, reduces or replaces the time spent manually checking database users for risk, improves investigation effectiveness, and effectively solves the user behavior monitoring problem caused by the ever-increasing number of data tables and the cross-access requirements of the back-end database.
  • FIG. 1 is a flowchart of a big data-based user behavior monitoring method in an embodiment of the present application;
  • FIG. 2 is a flowchart of step S101 in the big data-based user behavior monitoring method in an embodiment of the present application;
  • FIG. 3 is a flowchart of step S105 in the big data-based user behavior monitoring method in an embodiment of the present application;
  • FIG. 4 is another flowchart of the big data-based user behavior monitoring method in an embodiment of the present application;
  • FIG. 5 is a flowchart of step S110 in the big data-based user behavior monitoring method in an embodiment of the present application;
  • FIG. 6 is another flowchart of the big data-based user behavior monitoring method in an embodiment of the present application;
  • FIG. 7 is a flowchart of step S111 in the big data-based user behavior monitoring method in an embodiment of the present application;
  • FIG. 8 is a functional block diagram of a big data-based user behavior monitoring apparatus in an embodiment of the present application;
  • FIG. 9 is a schematic diagram of a computer device in an embodiment of the present application.
  • The big data-based user behavior monitoring method provided by the embodiments of this application aims to solve the user behavior monitoring problem caused by the increasing number of data tables and the cross-access requirements of the back-end database, so as to discover abnormal user behavior and reduce or replace the time spent manually investigating database user risks.
  • It addresses the poor risk-investigation effectiveness of existing technologies, optimizes the implementation path, shortens investigation time, intelligently and proactively discovers potential risks, and reduces manual intervention. It also helps, in light of the actual regulatory requirements that management departments place on data, provide a basis for formulating corresponding data management systems and policies.
  • The big data-based user behavior monitoring method includes:
  • Step S101: obtain several users and their corresponding historical behavior records and identification information within a first preset time period.
  • The first preset time period can be set as required.
  • The historical behavior record corresponding to each user in the first preset time period is acquired.
  • The historical behavior record is the user's dimensionless (standardized) operation record in the first preset time period;
  • the identification information is the user role corresponding to the user, which serves as the user's operation credential.
  • Step S101 includes:
  • Step S201: acquire historical operation data of several users in the first preset time period.
  • Historical operation data is the collection of a user's operations within a past preset time range. It can be obtained by collecting traffic data from the switch in real time, or from the slow query logs supported by the database and from the database log files.
  • The historical operation data may also be stored in a node of a blockchain, in which case step S201 can obtain it from the blockchain nodes.
  • Step S202: convert the historical operation data into SQL data, and perform regular-expression (regex) cleaning and parsing on the SQL data to obtain the operation data and identification information corresponding to each user.
  • The historical operation data is restored to SQL-format data so that it is easier to process.
  • Regex cleaning and parsing of the SQL data includes, but is not limited to, removing special symbols, punctuation, English letters, and digits from the text, removing line breaks, and converting multiple spaces into a single space.
  • Operation data is the record of the user querying, inserting, updating, creating, or deleting data tables and their fields.
  • Identification information is the user role corresponding to the user and is the user's operation credential.
  • One piece of identification information corresponds to one set of permission configurations. This embodiment divides the operation permissions on different databases and tables by business line and job function.
  • Each piece of identification information corresponds to one set of operation permissions and can be assigned to one or more users, so that every user holding that identification information has the corresponding set of operation permissions.
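Step S202 does not spell out the cleaning rules or the parser. The following minimal Python sketch assumes that cleaning collapses line breaks and runs of spaces and strips backtick quoting, and that the operation type and target table can be read from the leading SQL verb. The helpers `clean_sql` and `parse_operation` and the regex patterns are hypothetical, not the applicant's.

```python
import re
from typing import Optional, Tuple

_WHITESPACE = re.compile(r"\s+")  # line breaks, tabs and runs of spaces

# Assumed patterns: the leading verb decides the operation; the first table name is captured.
_PATTERNS = {
    "select": re.compile(r"^select\b.*?\bfrom\s+([\w.]+)", re.IGNORECASE),
    "insert": re.compile(r"^insert\s+into\s+([\w.]+)", re.IGNORECASE),
    "update": re.compile(r"^update\s+([\w.]+)", re.IGNORECASE),
    "delete": re.compile(r"^delete\s+from\s+([\w.]+)", re.IGNORECASE),
    "create": re.compile(r"^create\s+table\s+([\w.]+)", re.IGNORECASE),
}


def clean_sql(raw: str) -> str:
    """Drop backtick quoting, collapse whitespace to single spaces, trim a trailing ';'."""
    text = raw.replace("`", "")
    return _WHITESPACE.sub(" ", text).strip().rstrip(";").strip()


def parse_operation(raw: str) -> Optional[Tuple[str, str]]:
    """Return (operation, table) for a recognised statement, else None."""
    sql = clean_sql(raw)
    for op, pattern in _PATTERNS.items():
        match = pattern.match(sql)
        if match:
            return op, match.group(1).lower()
    return None
```

For example, `parse_operation("SELECT id,\n   name\nFROM  crm.customer WHERE id = 1;")` yields `("select", "crm.customer")`. The sketch deliberately keeps letters and digits, because table names are needed to build the per-table dummy variables in step S203.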
  • Step S203: traverse each user, aggregate the operation data corresponding to the user by a preset time period, and standardize the aggregated operation data to obtain historical behavior records.
  • The operation data obtained in step S202 is scattered and large in volume.
  • Therefore, the operation data corresponding to each user is aggregated by a preset time period.
  • Specifically, the operation records can be converted into dummy variables, and the operation data of the same user can then be aggregated by summing those dummy variables.
  • For each operation record, the dummy variable of the data table that was operated on is 1 and the dummy variables of all other data tables are 0. All of the user's operation data is traversed and grouped by a preset time period, such as by day, week, or month, so that the user's repeated operations are aggregated and the number of times the user performed each operation within the preset time period is counted.
  • For example, the operation data of user X includes:
  • After aggregation, the behavior record is:
  • Once aggregated, these operation data serve as the original indicator data; because different evaluation indicators are used, they have different dimensions and orders of magnitude.
  • This embodiment therefore standardizes the aggregated behavior records, converting them into dimensionless indicator evaluation values, to obtain the historical behavior records used as the training parameters of the naive Bayes algorithm.
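The dummy-variable aggregation and standardization of step S203 can be sketched as below. This is an illustration under stated assumptions: the record format `(user, day, operation_key)` and the names `period_key`, `aggregate`, and `standardize` are hypothetical, and z-score standardization per operation column is assumed because the text only says the records are "standardized".

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev
from typing import Dict, Iterable, Tuple

# Hypothetical record: (user, day, operation_key), e.g. ("x", date(2021, 3, 1), "select:crm.customer").
Record = Tuple[str, date, str]
Row = Dict[str, int]


def period_key(day: date, granularity: str = "month") -> str:
    """Bucket a day into the preset aggregation period: day, ISO week, or month."""
    if granularity == "day":
        return day.isoformat()
    if granularity == "week":
        year, week, _ = day.isocalendar()
        return f"{year}-W{week:02d}"
    return f"{day.year}-{day.month:02d}"


def aggregate(records: Iterable[Record], granularity: str = "month") -> Dict[Tuple[str, str], Row]:
    """Sum each record's dummy variable (1 for the table operated on) per user and period."""
    counts: Dict[Tuple[str, str], Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for user, day, op in records:
        counts[(user, period_key(day, granularity))][op] += 1
    return {key: dict(row) for key, row in counts.items()}


def standardize(rows: Dict[Tuple[str, str], Row]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Z-score every operation column across all (user, period) rows (assumed method)."""
    ops = sorted({op for row in rows.values() for op in row})
    out: Dict[Tuple[str, str], Dict[str, float]] = {key: {} for key in rows}
    for op in ops:
        column = [rows[key].get(op, 0) for key in rows]  # absent operation counts as 0
        mu, sigma = mean(column), pstdev(column)
        for key in rows:
            out[key][op] = 0.0 if sigma == 0 else (rows[key].get(op, 0) - mu) / sigma
    return out
```

Two `select:a` records from user x in March 2021 aggregate to `{"select:a": 2}` under the key `("x", "2021-03")`. Min-max scaling would be an equally plausible reading of "standardize".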
  • Step S204: traverse each user and label the user and the user's historical behavior records with identification information.
  • After extracting the identification information from the database's identification record table, such as the role table, this embodiment labels each user to form a sample set.
  • Step S102: using the identification information as the target variable and the historical behavior records of the several users as input parameters, train with the naive Bayes algorithm to obtain the execution probability corresponding to each piece of identification information.
  • Here, identification information means role information: the user's operation credential, that is, a set of permission configurations.
  • Each piece of identification information corresponds to a set of operation permissions.
  • The embodiment of the present application obtains the execution probability of each piece of identification information through the naive Bayes algorithm, that is, the probability that a user holding the identification information performs that set of operation permissions.
  • The naive Bayes algorithm is a classification method based on Bayes' theorem and the assumption that features are conditionally independent.
  • The users and their corresponding historical behavior records form the training data set, and the identification information is the target variable.
  • Under the conditional independence assumption, the joint probability distribution of the historical behavior records and the identification information is learned.
  • Specifically, the prior probability of identification information y_i is the ratio of the number of users holding y_i to the total number of users: P(y_i) = N(y_i) / N, where N(·) counts users and N is the total number of users.
  • The conditional probability of each operation x_j under each piece of identification information is then calculated: the ratio of the number of users who hold y_i and performed operation x_j to the number of users holding y_i is taken as the probability that a user holding y_i performs x_j, that is, P(x_j | y_i) = N(y_i, x_j) / N(y_i).
  • The set of operation permissions is executable by users holding the corresponding identification information y_i; when such a user performs an operation outside that set, the operation may be considered illegal.
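The counting rules for the prior P(y_i) and the conditional P(x_j | y_i) can be sketched directly. This minimal version assumes each training sample is a user's role plus the set of operations that user performed (so each operation is counted at most once per user, matching "the number of users ... performing the designated operation"); the `Sample` type and the `train` function are hypothetical names.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical sample: (identification information y_i, set of operations x_j the user performed).
Sample = Tuple[str, FrozenSet[str]]


def train(samples: List[Sample]) -> Tuple[Dict[str, float], Dict[str, Dict[str, float]]]:
    """Estimate P(y_i) = N(y_i)/N and P(x_j | y_i) = N(y_i, x_j)/N(y_i) by counting users."""
    n = len(samples)
    role_counts = Counter(role for role, _ in samples)
    priors = {role: count / n for role, count in role_counts.items()}
    conditionals: Dict[str, Dict[str, float]] = {}
    for role, count in role_counts.items():
        # Each operation appears at most once per user because ops is a set.
        op_counts = Counter(op for r, ops in samples if r == role for op in ops)
        conditionals[role] = {op: k / count for op, k in op_counts.items()}
    return priors, conditionals
```

With three "analyst" users and one "dba" user, the prior is 0.75 for analyst and 0.25 for dba. An operation never seen for a role has no entry, which is why the prediction step usually adds smoothing.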
  • Step S103: for the user to be monitored, using the identification information as the target variable and the historical behavior record of the user to be monitored as the input parameter, predict with the naive Bayes algorithm to obtain the user probability corresponding to the user to be monitored.
  • The user probability is the user's behavior probability: the probability that a combination of operations performed by a user corresponds to a specific role.
  • For a user's operation combination, the prior probability of each piece of identification information and the conditional probability of each operation in the combination under that identification information are calculated; then the posterior probability of each piece of identification information given the operation combination is calculated. The largest posterior probability is selected as the user probability, and its identification information is taken as the identification information corresponding to that operation combination.
  • A user may have been granted several operation permissions, including those corresponding to the user's identification information and some additionally configured ones.
  • The naive Bayes algorithm is used to calculate the user probability of the user's operation combination.
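The prediction in step S103 (posterior per role, then the maximum) can be sketched as follows. This is a minimal sketch under stated assumptions: the sample format and the names `posteriors` and `user_probability` are hypothetical, only the operations in the observed combination contribute likelihood terms (as the text describes), and Laplace smoothing with `alpha` is an added assumption so that an operation unseen for a role does not force its posterior to zero.

```python
import math
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical sample: (identification information y_i, set of operations the user performed).
Sample = Tuple[str, FrozenSet[str]]


def posteriors(samples: List[Sample], observed: FrozenSet[str], alpha: float = 1.0) -> Dict[str, float]:
    """P(y_i | operation combination) for every role seen in training, normalized to sum to 1."""
    n = len(samples)
    role_counts = Counter(role for role, _ in samples)
    log_scores: Dict[str, float] = {}
    for role, count in role_counts.items():
        op_counts = Counter(op for r, ops in samples if r == role for op in ops)
        score = math.log(count / n)  # log prior P(y_i)
        for op in observed:
            # Laplace-smoothed P(x_j | y_i) (assumption, not stated in the source).
            score += math.log((op_counts[op] + alpha) / (count + 2 * alpha))
        log_scores[role] = score
    top = max(log_scores.values())
    weights = {r: math.exp(s - top) for r, s in log_scores.items()}  # shift by max for stability
    total = sum(weights.values())
    return {r: w / total for r, w in weights.items()}


def user_probability(samples: List[Sample], observed: FrozenSet[str]) -> Tuple[str, float]:
    """Largest posterior (the user probability) and the identification information it belongs to."""
    post = posteriors(samples, observed)
    role = max(post, key=post.get)
    return role, post[role]
```

Working in log space avoids underflow when an operation combination contains many operations. The normalization over roles is what turns the products into posterior probabilities that can be compared with an execution probability.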
  • Step S104: obtain the identification information of the user to be monitored and the execution probability corresponding to that identification information.
  • The execution probability of the identification information of the user to be monitored is looked up in the correspondence among operation permissions, identification information, and execution probabilities obtained in step S102.
  • Step S105: determine whether the user to be monitored is at risk according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information.
  • The user probability of the user to be monitored indicates the probability that a set of operation behaviors performed by that user belongs to a specific piece of identification information;
  • the execution probability of the identification information indicates the probability that a set of operation permissions belongs to that identification information.
  • Step S105 further includes:
  • Step S301: compare the user probability corresponding to the user to be monitored with the execution probability corresponding to the identification information.
  • Step S302: if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is greater than or equal to a first preset threshold, determine that the current behavior of the user to be monitored is at risk.
  • Step S303: if that deviation is less than the first preset threshold, determine that the current behavior of the user to be monitored is not at risk.
  • The user probability of the user to be monitored and the execution probability of its identification information should be equal or close. If they deviate too much, the user's current combination of behaviors is inconsistent with the operation permissions of that identification information, and there is an operational risk.
  • The first preset threshold is preferably 3 times the standard deviation.
  • Specifically, this embodiment collects the users that hold the same identification information together with their user probabilities and, assuming that the user probabilities follow a normal distribution, calculates the standard deviation of the user probabilities relative to the execution probability; 3 times this standard deviation is used as the first preset threshold.
  • If the user probability of the user to be monitored deviates from the execution probability of its identification information by more than 3 times the standard deviation, the user probability is treated as an outlier and the current behavior of the user to be monitored is determined to be at risk. If the deviation is within 3 times the standard deviation, the current behavior is considered to be within a reasonable range of authority and is determined not to be at risk. This leaves room for configuring additional permissions for each user: even if the user to be monitored performs operations beyond the permissions of its identification information, such as additional, user-authorized operations on scattered table fields in the database, those operations are still considered safe.
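The 3-sigma rule of steps S301 to S303 can be sketched as follows. Two assumptions are made and should be treated as such. First, "the standard deviation between the user probabilities and the execution probability" is read as the root-mean-square deviation of peers' user probabilities around the execution probability. Second, the threshold is computed from the monitored user's peers rather than from a set that includes the user. The names `first_threshold` and `is_at_risk` are hypothetical.

```python
import math
from typing import Iterable


def first_threshold(peer_user_probs: Iterable[float], execution_prob: float, k: float = 3.0) -> float:
    """k times the RMS deviation of peers' user probabilities around the execution probability."""
    devs = [p - execution_prob for p in peer_user_probs]
    if not devs:
        raise ValueError("need at least one peer holding the same identification information")
    return k * math.sqrt(sum(d * d for d in devs) / len(devs))


def is_at_risk(user_prob: float, execution_prob: float, threshold: float) -> bool:
    """Steps S302/S303: at risk when the deviation reaches the first preset threshold."""
    return abs(user_prob - execution_prob) >= threshold
```

Excluding the monitored user from the reference set matters in small roles. If the user is included, its own deviation d inflates the RMS to at least |d| divided by the square root of the group size. With fewer than 9 users in the role, the user could then never reach 3 times that RMS.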
  • The big data-based user behavior monitoring method further includes:
  • Step S106: acquire the historical behavior record of the user to be monitored in a second preset time period.
  • The second preset time period is shorter than the first preset time period.
  • For example, the first preset time period is 3 months and the second preset time period is 2 months.
  • Step S107: using the identification information as the target variable and the historical behavior record in the second preset time period as the input parameter, predict with the naive Bayes algorithm to obtain the first general probability of the user to be monitored.
  • The first general probability is also a behavior probability of the user: the probability that an operation combination of the user to be monitored within the second preset time period corresponds to the identification information to which the user belongs.
  • Step S108: acquire the historical behavior record of the user to be monitored in a third preset time period.
  • For step S108, refer to the description of step S101 above; the details are not repeated here.
  • The third preset time period is shorter than the second preset time period. For example, if the second preset time period is 2 months, the third preset time period is 1 month.
  • Step S109: using the identification information as the target variable and the historical behavior record in the third preset time period as the input parameter, predict with the naive Bayes algorithm to obtain the second general probability of the user to be monitored.
  • The second general probability is also a behavior probability of the user: the probability that an operation combination of the user to be monitored within the third preset time period corresponds to the identification information to which the user belongs.
  • Step S110: determine, according to the first general probability and the second general probability of the user to be monitored, whether the user to be monitored is at risk.
  • The first and second general probabilities both indicate the probability that a set of operation behaviors performed by the user to be monitored within a preset time corresponds to the identification information to which the user belongs.
  • With the first general probability as the reference, the deviation of the second general probability from it shows whether the operation combination of the user to be monitored in the third preset time period had already occurred in the second preset time period or is newly occurring in the third preset time period.
  • Step S110 includes:
  • Step S501: compare the first general probability and the second general probability of the user to be monitored.
  • Step S502: if the deviation of the second general probability of the user to be monitored from the first general probability is greater than or equal to a second preset threshold, determine that the current behavior of the user to be monitored is at risk.
  • Step S503: if that deviation is less than the second preset threshold, determine that the current behavior of the user to be monitored is not at risk.
  • The first and second general probabilities should be equal or close. If the second general probability deviates too much from the first, the behavior of the user to be monitored in the third preset time period differs substantially from that in the second preset time period; that is, a new combination of operations has appeared in the third preset time period, which may carry operational risk.
  • The second preset threshold is preferably 3 times the standard deviation: assuming the probabilities follow a normal distribution, the standard deviation of the historical second general probabilities of the user to be monitored relative to the first general probability is calculated, and 3 times this standard deviation is used as the second preset threshold.
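Steps S106 to S110 can be sketched in two parts. The first part selects the records of a time window (for example, 2 months for the first general probability and 1 month for the second) so that each window can be fed to the naive Bayes prediction. The second part applies the 3-sigma drift rule. Several things here are assumptions: months are approximated as 30 days, the window ends at a chosen reference date, the threshold is read as the RMS of the user's historical (first, second) general-probability differences, and the names `in_window`, `second_threshold`, and `drift_at_risk` are hypothetical.

```python
import math
from datetime import date, timedelta
from typing import List, Sequence, Tuple

Record = Tuple[str, date, str]  # hypothetical (user, day, operation_key)


def in_window(records: Sequence[Record], end: date, days: int) -> List[Record]:
    """Records with end - days < day <= end; months approximated as 30 days (assumption)."""
    start = end - timedelta(days=days)
    return [r for r in records if start < r[1] <= end]


def second_threshold(history: Sequence[Tuple[float, float]], k: float = 3.0) -> float:
    """history holds past (first_general, second_general) pairs of the user to be monitored."""
    diffs = [second - first for first, second in history]
    if not diffs:
        raise ValueError("need at least one historical pair")
    return k * math.sqrt(sum(d * d for d in diffs) / len(diffs))


def drift_at_risk(first_general: float, second_general: float, threshold: float) -> bool:
    """Steps S502/S503: at risk when the second general probability drifts by at least the threshold."""
    return abs(second_general - first_general) >= threshold
```

For example, `in_window(records, today, 60)` and `in_window(records, today, 30)` would supply the inputs for the first and second general probabilities respectively.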
  • The execution probability of the identification information, the user probability of the user, and the first and second general probabilities can also be combined to monitor risky user behavior.
  • The big-data-based user behavior monitoring method further includes:
  • Step S111: determine whether the operations of the user to be monitored are at risk according to the user probability corresponding to the user to be monitored, the execution probability corresponding to the identification information, and the first and second general probabilities of the user to be monitored.
  • The user probability of the user to be monitored indicates the probability that a set of operations performed by the user belongs to a specific piece of identification information;
  • the execution probability of the identification information indicates the probability that a set of operation permissions belongs to that specific identification;
  • the first general probability of the user to be monitored represents the probability that a set of operations performed by the user within the second preset time period corresponds to the identification information to which the user belongs;
  • the second general probability of the user to be monitored represents the probability that a set of operations performed by the user within the third preset time period corresponds to the identification information to which the user belongs.
  • This embodiment determines whether the user to be monitored is at risk by judging whether the user's set of operations falls within the set of operation permissions corresponding to its identification information, and whether the user's operation combination in the third preset time period had already occurred in the second preset time period.
  • Step S111 includes:
  • Step S701: compare the user probability corresponding to the user to be monitored with the execution probability corresponding to the identification information, and compare the first general probability with the second general probability of the user to be monitored.
  • Step S702: if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is greater than or equal to the first preset threshold, and the deviation of the second general probability of the user to be monitored from the first general probability is greater than or equal to the second preset threshold, determine that the current behavior of the user to be monitored is at risk.
  • Step S703: if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is less than the first preset threshold, and/or the deviation of the second general probability of the user to be monitored from the first general probability is less than the second preset threshold, determine that the current behavior of the user to be monitored is not at risk.
  • The first and second preset thresholds are preferably three times the standard deviation; the comparison principles are described in the foregoing embodiments and are not repeated here.
  • By learning from historical user behavior data, this embodiment performs routine processing on users and their identification information, which makes abnormal user behavior easier to discover, reduces or replaces the time spent manually investigating database user risks, and solves the problem that existing techniques are ineffective at risk investigation.
  • It also optimizes the implementation path, shortens investigation time, intelligently and proactively discovers potential risks, and reduces manual intervention; it further helps align with management departments' regulatory requirements for data and provides a basis for formulating corresponding data management systems and policies.
  • In one embodiment, a big-data-based user behavior monitoring device is provided, which corresponds one-to-one with the big-data-based user behavior monitoring method of the above embodiment.
  • The big-data-based user behavior monitoring device includes a first parameter acquisition module 81, a training module 82, a first prediction module 83, a probability acquisition module 84, and a first risk monitoring module 85.
  • The functional modules are described in detail as follows:
  • the first parameter acquisition module 81 is configured to acquire several users and their corresponding historical behavior records and identification information within a first preset time period;
  • the training module 82 is configured to use the identification information as a target variable, use the historical behavior records of the several users as input parameters, and use the naive Bayes algorithm for training to obtain the execution probability corresponding to each identification information;
  • the first prediction module 83 is configured to, for the user to be monitored, take the identification information as the target variable and the historical behavior record of the user to be monitored as the input parameter, and perform prediction with the naive Bayes algorithm to obtain the user probability corresponding to the user to be monitored;
  • the probability acquisition module 84 is configured to acquire the identification information of the user to be monitored and the execution probability corresponding to the identification information;
  • the first risk monitoring module 85 is configured to determine whether the user to be monitored is at risk according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information.
  • the first parameter acquisition module 81 includes:
  • a data acquisition unit configured to acquire historical operation data of several users within the first preset time period;
  • a preprocessing unit configured to convert the historical operation data into SQL data and to perform regular-expression cleaning and parsing on the SQL data to obtain the operation data and identification information corresponding to each user;
  • an aggregation unit configured to traverse each user, aggregate the operation data corresponding to the user by a preset time cycle, and standardize the aggregated operation data to obtain historical behavior records;
  • a marking unit configured to traverse each user and label the user and its historical behavior records with identification information.
  • the first risk monitoring module 85 includes:
  • a first comparison unit configured to compare the user probability corresponding to the user to be monitored with the execution probability corresponding to the identification information;
  • a first risk determination unit configured to determine that the current behavior of the user to be monitored is at risk if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is greater than or equal to a first preset threshold;
  • a second risk determination unit configured to determine that the current behavior of the user to be monitored is not at risk if that deviation is less than the first preset threshold.
  • the device further includes:
  • a second parameter acquisition module configured to acquire the historical behavior record of the user to be monitored within the second preset time period;
  • a second prediction module configured to take the identification information as the target variable and the historical behavior record within the second preset time period as the input parameter, and to perform prediction with the naive Bayes algorithm to obtain the first general probability of the user to be monitored;
  • a third parameter acquisition module configured to acquire the historical behavior record of the user to be monitored within the third preset time period;
  • a third prediction module configured to take the identification information as the target variable and the historical behavior record within the third preset time period as the input parameter, and to perform prediction with the naive Bayes algorithm to obtain the second general probability of the user to be monitored;
  • a second risk monitoring module configured to determine whether the user to be monitored is at risk according to the first general probability and the second general probability of the user to be monitored.
  • the second risk monitoring module includes:
  • a second comparison unit configured to compare the first general probability and the second general probability of the user to be monitored
  • a third risk determination unit configured to determine that the current behavior of the user to be monitored is at risk if the deviation of the second general probability of the user to be monitored from the first general probability is greater than or equal to a second preset threshold;
  • a fourth risk determination unit configured to determine that the current behavior of the user to be monitored is not at risk if that deviation is less than the second preset threshold.
  • the device further includes:
  • the third risk monitoring module is configured to determine whether the operations of the user to be monitored are at risk based on the user probability corresponding to the user to be monitored, the execution probability corresponding to the identification information, and the first and second general probabilities of the user to be monitored.
  • the third risk monitoring module includes:
  • the third comparison unit is configured to compare the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, and compare the first general probability and the second general probability of the user to be monitored;
  • the fifth risk determination unit is configured to determine that the current behavior of the user to be monitored is at risk if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is greater than or equal to the first preset threshold and the deviation of the second general probability of the user to be monitored from the first general probability is greater than or equal to the second preset threshold;
  • the sixth risk determination unit is configured to determine that the current behavior of the user to be monitored is not at risk if the deviation of the user probability from the execution probability is less than the first preset threshold and/or the deviation of the second general probability from the first general probability is less than the second preset threshold.
  • The modules of the above big-data-based user behavior monitoring device may be implemented wholly or partly in software, hardware, or a combination thereof.
  • The modules may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 9.
  • The computer device includes a processor, a memory, a network interface, and a database connected through a system bus.
  • the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a readable storage medium and an internal memory.
  • the readable storage medium stores an operating system, computer readable instructions, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer readable instructions in the readable storage medium.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection. When the computer-readable instructions are executed by the processor, a method for monitoring user behavior based on big data is realized.
  • the readable storage medium provided in this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
  • In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor; when the processor executes the computer-readable instructions, the following steps are implemented:
  • the identification information is used as a target variable, and the historical behavior record of the user to be monitored is used as an input parameter, and a naive Bayes algorithm is used for prediction to obtain the user probability corresponding to the user to be monitored;
  • according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, it is determined whether the user to be monitored is at risk.
  • one or more readable storage media storing computer readable instructions are provided.
  • When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps:
  • the identification information is used as a target variable, and the historical behavior record of the user to be monitored is used as an input parameter, and a naive Bayes algorithm is used for prediction to obtain the user probability corresponding to the user to be monitored;
  • according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, it is determined whether the user to be monitored is at risk.
  • a person of ordinary skill in the art can understand that all or part of the processes in the methods of the above-mentioned embodiments can be implemented by instructing relevant hardware through computer-readable instructions.
  • The computer-readable instructions can be stored in a non-volatile or volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments.
  • any reference to memory, storage, database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory.
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
  • Blockchain is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • Blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods, each of which contains a batch of network transaction information used to verify the validity of that information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

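A big-data-based user behavior monitoring method, comprising: acquiring several users within a first preset time period and their historical behavior records and identification information (S101); taking the identification information as the target variable and the historical behavior records of the several users as input parameters, and performing training with a naive Bayes algorithm to obtain an execution probability for each piece of identification information (S102); for a user to be monitored, taking identification information as the target variable and the historical behavior record of the user to be monitored as the input parameter, and performing prediction with the naive Bayes algorithm to obtain a user probability of the user to be monitored (S103); acquiring the identification information of the user to be monitored and its corresponding execution probability (S104); and determining, according to the user probability of the user to be monitored and the execution probability of the identification information, whether the user to be monitored is at risk (S105). The method solves the user behavior monitoring problem that arises in back-end databases from ever-increasing data tables and cross-access requirements. The method also relates to blockchain and artificial intelligence technologies.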

Description

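Big-Data-Based User Behavior Monitoring Method, Device, Equipment, and Medium
This application claims priority to Chinese patent application No. 202010589176.0, entitled "Big-Data-Based User Behavior Monitoring Method, Device, Equipment, and Medium", filed with the Chinese Patent Office on June 24, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of information technology, and in particular to a big-data-based user behavior monitoring method, device, equipment, and medium.
Background
With the rapid development of enterprise informatization, many enterprises have built multiple application systems to meet their own business and development needs. An enterprise's production business systems and back-office management systems generate large volumes of business data and operating data, and ensuring the reliability, validity, availability, and accuracy of these data is key to comprehensive informatization and digital operation. Production business systems and back-office management systems restrict users' access to and use of data by assigning roles to users. The inventor realized that, during maintenance of back-end databases, because data tables need to be accessed across one another, users who already have roles usually must be granted additional operation permissions, such as individual table-level grants in a database. Because back-end database roles cannot be subdivided further, the prior art mainly relies on security personnel or computers to analyze records one by one and to investigate illegal user behavior exhaustively, which is inefficient and time-consuming.
Therefore, finding a method that solves the user behavior monitoring problem arising in back-end databases from ever-increasing data tables and cross-access requirements has become an urgent technical problem for those skilled in the art.
Summary
Embodiments of this application provide a big-data-based user behavior monitoring method, device, equipment, and medium to solve the user behavior monitoring problem that arises in back-end databases from ever-increasing data tables and cross-access requirements.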
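A big-data-based user behavior monitoring method, comprising:
acquiring several users within a first preset time period and their corresponding historical behavior records and identification information;
taking the identification information as the target variable and the historical behavior records of the several users as input parameters, and performing training with a naive Bayes algorithm to obtain an execution probability corresponding to each piece of identification information;
for a user to be monitored, taking identification information as the target variable and the historical behavior record of the user to be monitored as the input parameter, and performing prediction with the naive Bayes algorithm to obtain a user probability corresponding to the user to be monitored;
acquiring the identification information of the user to be monitored and the execution probability corresponding to the identification information;
determining, according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, whether the user to be monitored is at risk.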
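Optionally, acquiring several users within the first preset time period and their corresponding historical behavior records and identification information includes:
acquiring historical operation data of several users within the first preset time period;
converting the historical operation data into SQL data, and performing regular-expression cleaning and parsing on the SQL data to obtain the operation data and identification information corresponding to each user;
traversing each user, aggregating the operation data corresponding to the user by a preset time cycle, and standardizing the aggregated operation data to obtain historical behavior records;
traversing each user, and labeling the user and its historical behavior records with identification information.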
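Optionally, determining, according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, whether the user to be monitored is at risk includes:
comparing the user probability corresponding to the user to be monitored with the execution probability corresponding to the identification information;
if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is greater than or equal to a first preset threshold, determining that the current behavior of the user to be monitored is at risk;
if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is less than the first preset threshold, determining that the current behavior of the user to be monitored is not at risk.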
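Optionally, the method further includes:
acquiring the historical behavior record of the user to be monitored within a second preset time period;
taking identification information as the target variable and the historical behavior record within the second preset time period as the input parameter, and performing prediction with the naive Bayes algorithm to obtain a first general probability of the user to be monitored;
acquiring the historical behavior record of the user to be monitored within a third preset time period;
taking identification information as the target variable and the historical behavior record within the third preset time period as the input parameter, and performing prediction with the naive Bayes algorithm to obtain a second general probability of the user to be monitored;
determining, according to the first general probability and the second general probability of the user to be monitored, whether the user to be monitored is at risk.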
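Optionally, determining, according to the first general probability and the second general probability of the user to be monitored, whether the user to be monitored is at risk includes:
comparing the first general probability and the second general probability of the user to be monitored;
if the deviation of the second general probability of the user to be monitored from the first general probability is greater than or equal to a second preset threshold, determining that the current behavior of the user to be monitored is at risk;
if the deviation of the second general probability of the user to be monitored from the first general probability is less than the second preset threshold, determining that the current behavior of the user to be monitored is not at risk.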
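Optionally, the method further includes:
determining, according to the user probability corresponding to the user to be monitored, the execution probability corresponding to the identification information, and the first and second general probabilities of the user to be monitored, whether the operations of the user to be monitored are at risk.
Optionally, determining, according to the user probability corresponding to the user to be monitored, the execution probability corresponding to the identification information, and the first and second general probabilities of the user to be monitored, whether the operations of the user to be monitored are at risk includes:
comparing the user probability corresponding to the user to be monitored with the execution probability corresponding to the identification information, and comparing the first general probability with the second general probability of the user to be monitored;
if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is greater than or equal to the first preset threshold, and the deviation of the second general probability of the user to be monitored from the first general probability is greater than or equal to the second preset threshold, determining that the current behavior of the user to be monitored is at risk;
if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is less than the first preset threshold, and/or the deviation of the second general probability of the user to be monitored from the first general probability is less than the second preset threshold, determining that the current behavior of the user to be monitored is not at risk.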
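A big-data-based user behavior monitoring device, the device comprising:
a parameter acquisition module configured to acquire several users within a first preset time period and their corresponding historical behavior records and identification information;
a training module configured to take the identification information as the target variable and the historical behavior records of the several users as input parameters, and to perform training with a naive Bayes algorithm to obtain an execution probability corresponding to each piece of identification information;
a prediction module configured to, for a user to be monitored, take identification information as the target variable and the historical behavior record of the user to be monitored as the input parameter, and to perform prediction with the naive Bayes algorithm to obtain a user probability corresponding to the user to be monitored;
a probability acquisition module configured to acquire the identification information of the user to be monitored and the execution probability corresponding to the identification information;
a risk monitoring module configured to determine, according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, whether the user to be monitored is at risk.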
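A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
acquiring several users within a first preset time period and their corresponding historical behavior records and identification information;
taking the identification information as the target variable and the historical behavior records of the several users as input parameters, and performing training with a naive Bayes algorithm to obtain an execution probability corresponding to each piece of identification information;
for a user to be monitored, taking identification information as the target variable and the historical behavior record of the user to be monitored as the input parameter, and performing prediction with the naive Bayes algorithm to obtain a user probability corresponding to the user to be monitored;
acquiring the identification information of the user to be monitored and the execution probability corresponding to the identification information;
determining, according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, whether the user to be monitored is at risk.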
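One or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
acquiring several users within a first preset time period and their corresponding historical behavior records and identification information;
taking the identification information as the target variable and the historical behavior records of the several users as input parameters, and performing training with a naive Bayes algorithm to obtain an execution probability corresponding to each piece of identification information;
for a user to be monitored, taking identification information as the target variable and the historical behavior record of the user to be monitored as the input parameter, and performing prediction with the naive Bayes algorithm to obtain a user probability corresponding to the user to be monitored;
acquiring the identification information of the user to be monitored and the execution probability corresponding to the identification information;
determining, according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, whether the user to be monitored is at risk.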
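The embodiments of this application analyze user behavior along different dimensions: several users within a first preset time period and their corresponding historical behavior records and identification information are acquired; taking the identification information as the target variable and the historical behavior records of the several users as input parameters, training is performed with the naive Bayes algorithm to obtain the execution probability corresponding to each piece of identification information; for a user to be monitored, taking identification information as the target variable and the user's historical behavior record as the input parameter, prediction is performed with the naive Bayes algorithm to obtain the user probability corresponding to the user to be monitored; the identification information of the user to be monitored and its corresponding execution probability are acquired; and whether the user to be monitored is at risk is determined according to the user probability and the execution probability. This helps discover abnormal user behavior, reduces or replaces the time spent manually investigating database user risks, improves investigation effectiveness, and effectively solves the user behavior monitoring problem that arises in back-end databases from ever-increasing data tables and cross-access requirements.
Details of one or more embodiments of this application are set forth in the accompanying drawings and the description below; other features and advantages of this application will become apparent from the description, the drawings, and the claims.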
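Brief Description of the Drawings
To describe the technical solutions of the embodiments of this application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a flowchart of the big-data-based user behavior monitoring method in an embodiment of this application;
FIG. 2 is a flowchart of step S101 of the big-data-based user behavior monitoring method in an embodiment of this application;
FIG. 3 is a flowchart of step S105 of the big-data-based user behavior monitoring method in an embodiment of this application;
FIG. 4 is another flowchart of the big-data-based user behavior monitoring method in an embodiment of this application;
FIG. 5 is a flowchart of step S110 of the big-data-based user behavior monitoring method in an embodiment of this application;
FIG. 6 is another flowchart of the big-data-based user behavior monitoring method in an embodiment of this application;
FIG. 7 is a flowchart of step S111 of the big-data-based user behavior monitoring method in an embodiment of this application;
FIG. 8 is a schematic block diagram of the big-data-based user behavior monitoring device in an embodiment of this application;
FIG. 9 is a schematic diagram of the computer device in an embodiment of this application.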
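Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
The big-data-based user behavior monitoring method provided by the embodiments of this application aims to solve the user behavior monitoring problem that arises in back-end databases from ever-increasing data tables and cross-access requirements, so as to discover abnormal user behavior, reduce or replace the time spent manually investigating database user risks, overcome the low risk-investigation effectiveness of the prior art, optimize the implementation path, shorten investigation time, intelligently and proactively discover potential risks, and reduce manual intervention; it also helps align with management departments' regulatory requirements for data and provides a basis for formulating corresponding data management systems and policies.
The big-data-based user behavior monitoring method provided by this embodiment is described in detail below. As shown in FIG. 1, the method includes: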
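In step S101, several users within a first preset time period and their corresponding historical behavior records and identification information are acquired.
Here, the first preset time period can be set as needed. This embodiment acquires, for each user, the historical behavior records within the first preset time period. A historical behavior record is a dimensionless record of the user's operations within the first preset time period, and the identification information is the user's role, which serves as the user's operation credential.
Optionally, as shown in FIG. 2, step S101 includes:
In step S201, historical operation data of several users within the first preset time period are acquired.
Here, historical operation data is the set of user operations within a past preset time range. It may be obtained by collecting traffic data from switches in real time, or collected from the slow-query logs and log files that the database supports. Preferably, to further ensure the privacy and security of the historical operation data, the data may also be stored in a node of a blockchain, in which case step S201 may retrieve it from that blockchain node.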
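In step S202, the historical operation data are converted into SQL data, and regular-expression cleaning and parsing is performed on the SQL data to obtain the operation data and identification information corresponding to each user.
The historical operation data are restored to SQL-format data to facilitate processing. The SQL data are then cleaned and parsed with regular expressions, including but not limited to removing special symbols, punctuation, English text, and digits from the text, removing line breaks, and collapsing multiple spaces into one. This yields the operation data and identification information corresponding to each user. The operation data are the user's records of querying, inserting, updating, creating, or deleting data tables and their fields. The identification information is the user's role, which serves as the user's operation credential; one piece of identification information corresponds to one set of permission configurations. This embodiment divides operation permissions on different databases and tables by business line and job function; for example, creating tables, dropping tables, inserting, updating, and querying form one set of operation permissions, while inserting, updating, and querying form another. One piece of identification information corresponds to one set of operation permissions and can be granted to one or more users, so that every user holding that identification information obtains the corresponding set of operation permissions.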
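The cleaning and parsing in step S202 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes each log entry is already a raw SQL string, keeps only the whitespace normalization from the cleaning rules listed above (stripping letters and digits would also strip the table names being extracted), and uses hypothetical regular expressions that recognize only the six operation types mentioned. The names `clean_sql`, `parse_sql`, and `PATTERNS` are illustrative.

```python
import re

# Hypothetical patterns mapping an SQL verb to the table it touches.
PATTERNS = [
    ("select", re.compile(r"^select\b.*?\bfrom\s+([\w.]+)")),
    ("insert", re.compile(r"^insert\s+into\s+([\w.]+)")),
    ("update", re.compile(r"^update\s+([\w.]+)")),
    ("delete", re.compile(r"^delete\s+from\s+([\w.]+)")),
    ("create", re.compile(r"^create\s+table\s+([\w.]+)")),
    ("drop", re.compile(r"^drop\s+table\s+([\w.]+)")),
]


def clean_sql(sql: str) -> str:
    """Turn line breaks and runs of spaces into one space, then lower-case."""
    return re.sub(r"\s+", " ", sql).strip().lower()


def parse_sql(sql: str):
    """Return (operation, table) for a statement, or None if unrecognized."""
    text = clean_sql(sql)
    for operation, pattern in PATTERNS:
        match = pattern.match(text)
        if match:
            return operation, match.group(1)
    return None


print(parse_sql("SELECT a,\n   b FROM  sales.orders WHERE id = 1"))
```

A real parser would also need to handle joins, subqueries, and field-level access, which a single leading-verb regex cannot capture.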
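In step S203, each user is traversed, the operation data corresponding to the user are aggregated by a preset time cycle, and the aggregated operation data are standardized to obtain historical behavior records.
The operation data obtained in step S202 are scattered and voluminous. For each user, this embodiment aggregates the user's operation data by a preset time cycle. Optionally, the operation records can be converted into dummy variables, and the same user's operation data aggregated based on statistics over those dummy variables. Specifically, when the user operates on a data table at a given moment, the dummy variable for that table is set to 1 and the dummy variables for the other, untouched tables are set to 0. All of the user's operation data are traversed and then tallied by the preset time cycle, for example by day, week, or month, so that the user's repeated operations are aggregated; counting and accumulating the user's operations within each preset time cycle yields the behavior records.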
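For example, suppose the operation data of user X include:
January 1: query data table A at 10:00 AM; query data table A at 10:35 AM;
January 2: query data table A at 9:00 AM; modify data table C at 2:00 PM...
If aggregated by day, the aggregated behavior records are:
January 1: query data table A twice;
January 2: query data table A once; modify data table C once.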
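The daily aggregation in the example above can be sketched with one counter per day keyed by (operation, table). The record layout, the function name `aggregate_by_day`, and the year 2020 attached to user X's dates are assumptions made for illustration.

```python
from collections import Counter, defaultdict
from datetime import datetime


def aggregate_by_day(records):
    """Aggregate one user's operation records by day.

    records: iterable of (datetime, operation, table) tuples.
    Returns {date: Counter({(operation, table): count})}. Each record acts as
    a one-hot dummy variable for its (operation, table) pair; summing those
    dummies within a day gives that day's operation counts.
    """
    daily = defaultdict(Counter)
    for ts, operation, table in records:
        daily[ts.date()][(operation, table)] += 1
    return dict(daily)


# User X from the example above (year assumed for illustration).
user_x = [
    (datetime(2020, 1, 1, 10, 0), "query", "A"),
    (datetime(2020, 1, 1, 10, 35), "query", "A"),
    (datetime(2020, 1, 2, 9, 0), "query", "A"),
    (datetime(2020, 1, 2, 14, 0), "modify", "C"),
]
daily = aggregate_by_day(user_x)
for day, counts in sorted(daily.items()):
    print(day, dict(counts))
```

Switching to weekly or monthly cycles only changes the key computed from `ts`.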
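After aggregation, these operation data serve as raw indicator data; they use different evaluation indicators and have different dimensions and orders of magnitude. To ensure the reliability of the naive Bayes algorithm's output, this embodiment standardizes the aggregated behavior records, converting them into dimensionless indicator values; the resulting historical behavior records serve as training parameters for the naive Bayes algorithm.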
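The patent does not name a particular standardization method; z-score standardization is one common choice and is assumed in the sketch below. Each row is one behavior record (for example, one user-day) and each column one (operation, table) count; `zscore_columns` is a hypothetical helper.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each column of equal-length numeric rows to zero mean and
    unit population standard deviation. Constant columns become all zeros
    so that no division by zero occurs."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    return [
        [(v - mu) / sd if sd > 0 else 0.0 for v, (mu, sd) in zip(row, stats)]
        for row in rows
    ]


# Columns: (query A, modify C) counts for user X on January 1 and 2.
print(zscore_columns([[2, 0], [1, 1]]))
```

Min-max scaling would serve the same "dimensionless" purpose; which one suits the downstream naive Bayes model depends on how the features are discretized.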
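In step S204, each user is traversed, and the user and its historical behavior records are labeled with identification information.
After the identification information is extracted from the database's identification record table, such as the role table, this embodiment further labels each user to form a sample set.
In step S102, the identification information is taken as the target variable and the historical behavior records of the several users as input parameters, and training is performed with the naive Bayes algorithm to obtain the execution probability corresponding to each piece of identification information.
As mentioned above, identification information is role information: the user's operation credential, i.e., a set of permission configurations. Each piece of identification information corresponds to a set of operation permissions. The embodiments of this application obtain the execution probability of each piece of identification information through the naive Bayes algorithm. The execution probability is the probability that a set of operation permissions corresponds to the identification information it belongs to, i.e., viewed overall, the probability that users holding that identification information execute that set of operation permissions.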
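The naive Bayes algorithm is a classification method based on Bayes' theorem and the assumption that features are conditionally independent. This embodiment uses the users and their historical behavior records as the training data set and the identification information as the target variable, and learns the joint probability distribution between historical behavior records and identification information under the conditional-independence assumption. Specifically, each piece of identification information Y is treated as a class, and the prior probability $P(Y=y_j)$ of each piece of identification information is estimated first; it can be computed as the ratio of the number of users holding the same identification information $y_j$ to the total number of users. For each identification, the conditional probability of each operation X occurring under that identification is then computed:

$$P(X = x_i \mid Y = y_j)$$

Specifically, it can be computed as the ratio of the number of users who hold the specified identification information $y_j$ and performed the specified operation $x_i$ to the total number of users holding $y_j$; this ratio is the conditional probability that a user holding $y_j$ performs $x_i$. Then, for an operation combination T comprising several specified operations $x_i$ (for example, when operations $x_1$, $x_2$, $x_3$ form a combination, $T = (x_1, x_2, x_3)$), the posterior probability that T belongs to each piece of identification information $y_j$, given that T occurs, is computed:

$$P(Y = y_j \mid T) = \frac{P(Y = y_j)\,P(T \mid Y = y_j)}{P(T)}$$

where, by the law of total probability,

$$P(T) = P(Y = y_1)\,P(T \mid Y = y_1) + P(Y = y_2)\,P(T \mid Y = y_2) + \cdots$$

and $P(T \mid Y = y_j)$ is obtained as the product of the conditional probabilities of the specified operations $x_i$ contained in T under identification $y_j$:

$$P(T \mid Y = y_j) = \prod_{x_i \in T} P(X = x_i \mid Y = y_j)$$

The maximum is selected from the posterior probabilities $P(Y = y_j \mid T)$, giving the identification information $y_j$ and operation combination T that correspond to the maximum posterior. This maximum posterior is taken as the execution probability of $y_j$, and T is taken as the set of operation permissions corresponding to $y_j$. That set of operation permissions may be executed by users holding $y_j$; when such a user executes operations outside the set, those operations may be regarded as illegal.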
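The training in step S102 can be sketched as follows, estimating every probability as a ratio of user counts as described above. Two points are interpretations rather than statements of the patent: an operation combination T is modeled as the set of operations a user performed, and the maximum posterior is taken, for each role, over the combinations actually observed among that role's users. The optional `alpha` (Laplace smoothing) is an addition not found in the source; with the default `alpha=0.0` the estimates equal the plain ratios. Role and operation names are hypothetical.

```python
from math import prod


def train(samples, alpha=0.0):
    """Estimate priors and per-operation conditionals from labeled users.

    samples: list of (role, set_of_operations), one entry per user.
    priors[role] = users with role / all users
    cond[role][op] = users with role who performed op / users with role
    alpha > 0 applies Laplace smoothing (an assumption, not in the source).
    """
    n = len(samples)
    all_ops = set().union(*(ops for _, ops in samples))
    priors, cond = {}, {}
    for role in {r for r, _ in samples}:
        members = [ops for r, ops in samples if r == role]
        priors[role] = len(members) / n
        cond[role] = {
            op: (sum(op in ops for ops in members) + alpha) / (len(members) + 2 * alpha)
            for op in all_ops
        }
    return priors, cond


def posterior(priors, cond, combo):
    """P(Y=role | T=combo) for every role, using the naive independence product."""
    joint = {r: priors[r] * prod(cond[r].get(op, 0.0) for op in combo) for r in priors}
    total = sum(joint.values())  # P(T) by the law of total probability
    return {r: (j / total if total else 0.0) for r, j in joint.items()}


def execution_probabilities(samples, priors, cond):
    """For each role: (max posterior over its users' combinations, that combination)."""
    best = {}
    for role, combo in samples:
        p = posterior(priors, cond, combo)[role]
        if role not in best or p > best[role][0]:
            best[role] = (p, frozenset(combo))
    return best


# Hypothetical training data: (role, operations performed in the period).
samples = [
    ("analyst", {"select:A"}),
    ("analyst", {"select:A", "select:B"}),
    ("dba", {"select:A", "create:T", "drop:T"}),
    ("dba", {"create:T", "drop:T"}),
]
priors, cond = train(samples)
execution = execution_probabilities(samples, priors, cond)
print(execution)
```

Without smoothing, any operation a role's users never performed drives that role's posterior to zero, which is what makes out-of-role operations stand out; a production system would likely want a small `alpha` to avoid brittle all-zero cases.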
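In step S103, for the user to be monitored, identification information is taken as the target variable and the historical behavior record of the user to be monitored as the input parameter, and prediction is performed with the naive Bayes algorithm to obtain the user probability corresponding to the user to be monitored.
Here, unlike the execution probability, which is the behavior probability of the role denoted by the identification information, the user probability is the behavior probability of a user: the probability that an operation combination performed by the user corresponds to a specific role. For an operation combination of a user, the prior probability of each piece of identification information is computed, together with the conditional probability of each operation in the combination under that identification information; the posterior probability that the combination belongs to each piece of identification information is then computed. The largest posterior is selected as the user probability, and the identification information corresponding to it is taken as the identification information for that operation combination of the user.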
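Step S103 applies the same posterior computation to a single user's operation combination and keeps the arg-max. The sketch below hard-codes a small hypothetical model (priors and conditionals of the kind step S102 produces) so that it runs on its own; `user_probability`, `PRIORS`, and `COND` are illustrative names.

```python
from math import prod

# Hypothetical trained model: P(Y=role) and P(X=op | Y=role).
PRIORS = {"analyst": 0.5, "dba": 0.5}
COND = {
    "analyst": {"select:A": 1.0, "select:B": 0.5},
    "dba": {"select:A": 0.5, "create:T": 1.0, "drop:T": 1.0},
}


def user_probability(priors, cond, combo):
    """Return (max posterior, role) for one user's operation combination.

    An operation never seen for a role gets conditional probability 0, so a
    role whose permissions do not cover the combination gets posterior 0.
    """
    joint = {r: priors[r] * prod(cond[r].get(op, 0.0) for op in combo) for r in priors}
    total = sum(joint.values())
    if total == 0:
        return 0.0, None  # no role explains this combination at all
    role = max(joint, key=joint.get)
    return joint[role] / total, role


# An analyst who also dropped a table now looks like a DBA.
print(user_probability(PRIORS, COND, {"select:A", "drop:T"}))  # (1.0, 'dba')
```

Returning the role alongside the probability makes it easy to check, in step S105, whether the predicted role matches the user's assigned identification information.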
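A given user has been granted certain operation permissions, including those corresponding to the user's identification information and some additionally configured permissions; this embodiment computes the user probability of the user's operation combinations using the naive Bayes algorithm.
In step S104, the identification information of the user to be monitored and the execution probability corresponding to the identification information are acquired.
Based on the identification information of the user to be monitored, its execution probability is looked up from the correspondence among operation permissions, identification information, and execution probabilities obtained in step S102.
In step S105, whether the user to be monitored is at risk is determined according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information.
Here, the user probability of the user to be monitored indicates the probability that a set of operations performed by the user belongs to a specific piece of identification information, and the execution probability of the identification information indicates the probability that a set of operation permissions belongs to that specific identification. By comparing the two, this embodiment can judge whether the user's set of operations falls within the set of operation permissions corresponding to the user's identification information, and thus whether the user is at risk.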
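Optionally, as shown in FIG. 3, step S105 further includes:
In step S301, the user probability corresponding to the user to be monitored is compared with the execution probability corresponding to the identification information.
In step S302, if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is greater than or equal to a first preset threshold, it is determined that the current behavior of the user to be monitored is at risk.
In step S303, if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is less than the first preset threshold, it is determined that the current behavior of the user to be monitored is not at risk.
Here, if the operation combination performed by the user to be monitored lies within the operation permissions corresponding to the user's identification information, the user probability and the execution probability should be identical or nearly identical. If the deviation between them is too large, the user's current combination of behaviors is inconsistent with the operation permissions of its identification information and carries operational risk.
Optionally, the first preset threshold is preferably three times the standard deviation. This embodiment groups the users belonging to the same identification information together with their user probabilities and, assuming that the user probabilities follow a normal distribution, computes the standard deviation between the user probabilities and the execution probability, using three times this standard deviation as the first preset threshold. When the user probability of the user to be monitored deviates from the execution probability of its identification information by more than three standard deviations, the user probability is regarded as an outlier and the current behavior of the user is determined to be at risk. If the deviation is within three standard deviations, the current behavior is regarded as within a reasonable permission range and is determined not to be at risk. This leaves each user room for additionally configured permissions: even if the user executes some permissions beyond those of its identification information, such as additional user-granted permissions on scattered table fields in the database, the behavior is still considered safe. This helps proactively discover potential violations or risky behavior in data operation and use, and can also compensate for gaps or oversights by technical staff who are not fully familiar with the business applications in sensitive data access, data security protection, and compliant permission granting, providing operational security monitoring while leaving some redundancy.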
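The first-threshold check (steps S301 to S303 with the 3-sigma rule) might look like this. Interpreting "the standard deviation between the user probabilities and the execution probability" as the root-mean-square deviation of same-role users' probabilities from the role's execution probability is an assumption, as are the helper names and the sample values.

```python
from math import sqrt


def three_sigma_threshold(peer_probs, exec_prob, k=3.0):
    """k times the RMS deviation of same-role users' user probabilities
    from the role's execution probability (the first preset threshold)."""
    if not peer_probs:
        raise ValueError("need at least one same-role user probability")
    sigma = sqrt(sum((p - exec_prob) ** 2 for p in peer_probs) / len(peer_probs))
    return k * sigma


def is_risky(user_prob, exec_prob, threshold):
    """S302/S303: risky when the deviation reaches the threshold."""
    return abs(user_prob - exec_prob) >= threshold


# Hypothetical user probabilities of users holding the same role.
peers = [0.90, 0.95, 1.00, 0.95]
exec_prob = 0.95
t1 = three_sigma_threshold(peers, exec_prob)
print(round(t1, 4), is_risky(0.60, exec_prob, t1), is_risky(0.90, exec_prob, t1))
```

If every peer sits exactly on the execution probability, sigma is 0 and the `>=` rule flags every user; in practice a small floor on the threshold would probably be needed (an assumption, not part of the source).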
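Optionally, as another preferred example of this application, on the basis of the above embodiment and as shown in FIG. 4, the big-data-based user behavior monitoring method further includes:
In step S106, the historical behavior record of the user to be monitored within a second preset time period is acquired.
Here, the second preset time period is shorter than the first preset time period; for example, if the first preset time period is 3 months, the second preset time period is 2 months. For details of step S106, see the description of step S101 above, which is not repeated here.
In step S107, identification information is taken as the target variable and the historical behavior record within the second preset time period as the input parameter, and prediction is performed with the naive Bayes algorithm to obtain the first general probability of the user to be monitored.
Here, the first general probability is also a behavior probability of the user, indicating the probability that an operation combination of the user to be monitored within the second preset time period corresponds to the identification information to which the user belongs.
In step S108, the historical behavior record of the user to be monitored within a third preset time period is acquired. For details of step S108, see the description of step S101 above, which is not repeated here.
Here, the third preset time period is shorter than the second preset time period; for example, if the second preset time period is 2 months, the third preset time period is 1 month.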
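One way to obtain the records for the second and third preset time periods is to take trailing windows that end on the same day. That both windows end on the same day and that a month is approximated as 30 days are assumptions; the source only gives the lengths (3, 2, and 1 months) as examples. `window` is a hypothetical helper.

```python
from datetime import date, timedelta


def window(records, end, days):
    """Keep (day, operation) records with end - days < day <= end."""
    start = end - timedelta(days=days)
    return [(day, op) for day, op in records if start < day <= end]


# Hypothetical records of the user to be monitored.
records = [
    (date(2020, 4, 15), "select:A"),
    (date(2020, 5, 20), "select:B"),
    (date(2020, 6, 25), "drop:T"),
]
end = date(2020, 6, 30)
second_period = window(records, end, 60)  # roughly 2 months
third_period = window(records, end, 30)   # roughly 1 month
print(second_period, third_period)
```

Each window's records would then go through the same aggregation and prediction as step S101 and step S103 to give the first and second general probabilities.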
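In step S109, identification information is taken as the target variable and the historical behavior record within the third preset time period as the input parameter, and prediction is performed with the naive Bayes algorithm to obtain the second general probability of the user to be monitored.
Here, the second general probability is also a behavior probability of the user, indicating the probability that an operation combination of the user to be monitored within the third preset time period corresponds to the identification information to which the user belongs.
In step S110, whether the user to be monitored is at risk is determined according to the first general probability and the second general probability of the user to be monitored.
Here, both general probabilities indicate the probability that a set of operations performed by the user to be monitored within a preset time corresponds to the identification information to which the user belongs. Taking the first general probability as a reference, the embodiments of this application determine, based on how far the second general probability deviates from it, whether the user's operation combination in the third preset time period had already occurred earlier, in the second preset time period, or had not occurred before and newly appeared in the third preset time period. As shown in FIG. 5, step S110 includes:
In step S501, the first general probability and the second general probability of the user to be monitored are compared.
In step S502, if the deviation of the second general probability of the user to be monitored from the first general probability is greater than or equal to a second preset threshold, it is determined that the current behavior of the user to be monitored is at risk.
In step S503, if the deviation of the second general probability of the user to be monitored from the first general probability is less than the second preset threshold, it is determined that the current behavior of the user to be monitored is not at risk.
Here, if the operation combinations performed by the user to be monitored within the second and third preset time periods are the same or similar, the first and second general probabilities should be identical or nearly identical. If the second general probability deviates too far from the first, the user's historical behavior within the third preset time period deviates substantially from that within the second preset time period, indicating an operation combination that newly occurred in the third preset time period and may carry operational risk. Optionally, the second preset threshold is preferably three times the standard deviation: assuming the user probabilities follow a normal distribution, the standard deviation between the user's historical second general probabilities and the first general probability is computed, and three times this standard deviation is used as the second preset threshold.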
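The drift check in steps S501 to S503 can be sketched in the same 3-sigma style, this time using the monitored user's own history rather than same-role peers. Reading "the standard deviation between the historical second general probabilities and the first general probability" as a root-mean-square deviation is an assumption, and the function names and values are hypothetical.

```python
from math import sqrt


def drift_threshold(history, first_general, k=3.0):
    """Second preset threshold: k times the RMS deviation of the user's
    historical second general probabilities from the first general probability."""
    if not history:
        raise ValueError("need at least one historical value")
    return k * sqrt(sum((p - first_general) ** 2 for p in history) / len(history))


def drift_is_risky(first_general, second_general, threshold):
    """S502/S503: risky when the second general probability has moved by at
    least the threshold relative to the first."""
    return abs(second_general - first_general) >= threshold


first_general = 0.80
history = [0.80, 0.82, 0.78, 0.80]  # hypothetical earlier values
t2 = drift_threshold(history, first_general)
print(round(t2, 4), drift_is_risky(first_general, 0.50, t2))
```

A large drop in the second general probability is the signal that the recent operation combination is new for this user.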
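Optionally, as another preferred example of this application, the execution probability of the identification information, the user probability of the user, and the first and second general probabilities can also be combined to monitor risky user behavior. As shown in FIG. 6, the big-data-based user behavior monitoring method further includes:
In step S111, whether the operations of the user to be monitored are at risk is determined according to the user probability corresponding to the user to be monitored, the execution probability corresponding to the identification information, and the first and second general probabilities of the user to be monitored.
As mentioned above, the user probability of the user to be monitored indicates the probability that a set of operations performed by the user belongs to a specific piece of identification information, and the execution probability of the identification information indicates the probability that a set of operation permissions belongs to that specific identification. The first general probability of the user to be monitored indicates the probability that a set of operations performed by the user within the second preset time period corresponds to the identification information to which the user belongs, and the second general probability indicates the same for the third preset time period. This embodiment determines whether the user to be monitored is at risk by judging whether the user's set of operations falls within the set of operation permissions corresponding to its identification information, and whether the user's operation combination in the third preset time period had already occurred in the second preset time period. As shown in FIG. 7, step S111 includes:
In step S701, the user probability corresponding to the user to be monitored is compared with the execution probability corresponding to the identification information, and the first general probability is compared with the second general probability of the user to be monitored.
In step S702, if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is greater than or equal to the first preset threshold, and the deviation of the second general probability of the user to be monitored from the first general probability is greater than or equal to the second preset threshold, it is determined that the current behavior of the user to be monitored is at risk.
In step S703, if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is less than the first preset threshold, and/or the deviation of the second general probability of the user to be monitored from the first general probability is less than the second preset threshold, it is determined that the current behavior of the user to be monitored is not at risk.
Optionally, the first and second preset thresholds are preferably three times the standard deviation; the comparison principles are described in the embodiments above and are not repeated here.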
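The combined rule in steps S701 to S703 reduces to a logical AND of the two threshold tests: a user is flagged only when both signals fire, and either one alone is not enough. The function name and the threshold values in the usage line are illustrative assumptions; in practice the thresholds would come from the 3-sigma estimates described above.

```python
def combined_risk(user_prob, exec_prob, first_general, second_general, t1, t2):
    """Steps S701-S703: flag the user only when BOTH checks fire.

    out_of_role: user probability deviates from the role's execution
                 probability by at least t1 (first preset threshold).
    drifted:     second general probability deviates from the first by at
                 least t2 (second preset threshold).
    If either check stays below its threshold, the behavior is not at risk.
    """
    out_of_role = abs(user_prob - exec_prob) >= t1
    drifted = abs(second_general - first_general) >= t2
    return out_of_role and drifted


print(combined_risk(0.60, 0.95, 0.80, 0.50, t1=0.11, t2=0.04))  # True
```

Requiring both conditions trades some sensitivity for fewer false alarms: long-standing extra grants (out of role but stable) and benign shifts within the role's permissions are both tolerated.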
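In summary, by learning from historical user behavior data, this embodiment performs routine processing on users and their identification information, which makes abnormal user behavior easier to discover, reduces or replaces the time spent manually investigating database user risks, overcomes the low risk-investigation effectiveness of the prior art, optimizes the implementation path, shortens investigation time, intelligently and proactively discovers potential risks, and reduces manual intervention; it also helps align with management departments' regulatory requirements for data and provides a basis for formulating corresponding data management systems and policies.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic and should not constitute any limitation on the implementation of the embodiments of this application.
In one embodiment, a big-data-based user behavior monitoring device is provided, which corresponds one-to-one with the big-data-based user behavior monitoring method in the above embodiment. As shown in FIG. 8, the device includes a first parameter acquisition module 81, a training module 82, a first prediction module 83, a probability acquisition module 84, and a first risk monitoring module 85. The functional modules are described in detail as follows: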
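First parameter acquisition module 81, configured to acquire several users within a first preset time period and their corresponding historical behavior records and identification information;
Training module 82, configured to take the identification information as the target variable and the historical behavior records of the several users as input parameters, and to perform training with a naive Bayes algorithm to obtain an execution probability corresponding to each piece of identification information;
First prediction module 83, configured to, for a user to be monitored, take identification information as the target variable and the historical behavior record of the user to be monitored as the input parameter, and to perform prediction with the naive Bayes algorithm to obtain a user probability corresponding to the user to be monitored;
Probability acquisition module 84, configured to acquire the identification information of the user to be monitored and the execution probability corresponding to the identification information;
First risk monitoring module 85, configured to determine, according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, whether the user to be monitored is at risk.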
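Optionally, the first parameter acquisition module 81 includes:
a data acquisition unit, configured to acquire historical operation data of several users within the first preset time period;
a preprocessing unit, configured to convert the historical operation data into SQL data and to perform regular-expression cleaning and parsing on the SQL data to obtain the operation data and identification information corresponding to each user;
an aggregation unit, configured to traverse each user, aggregate the operation data corresponding to the user by a preset time cycle, and standardize the aggregated operation data to obtain historical behavior records;
a marking unit, configured to traverse each user and label the user and its historical behavior records with identification information.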
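Optionally, the first risk monitoring module 85 includes:
a first comparison unit, configured to compare the user probability corresponding to the user to be monitored with the execution probability corresponding to the identification information;
a first risk determination unit, configured to determine that the current behavior of the user to be monitored is at risk if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is greater than or equal to a first preset threshold;
a second risk determination unit, configured to determine that the current behavior of the user to be monitored is not at risk if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is less than the first preset threshold.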
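Optionally, the device further includes:
a second parameter acquisition module, configured to acquire the historical behavior record of the user to be monitored within a second preset time period;
a second prediction module, configured to take identification information as the target variable and the historical behavior record within the second preset time period as the input parameter, and to perform prediction with the naive Bayes algorithm to obtain the first general probability of the user to be monitored;
a third parameter acquisition module, configured to acquire the historical behavior record of the user to be monitored within a third preset time period;
a third prediction module, configured to take identification information as the target variable and the historical behavior record within the third preset time period as the input parameter, and to perform prediction with the naive Bayes algorithm to obtain the second general probability of the user to be monitored;
a second risk monitoring module, configured to determine, according to the first general probability and the second general probability of the user to be monitored, whether the user to be monitored is at risk.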
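Optionally, the second risk monitoring module includes:
a second comparison unit, configured to compare the first general probability and the second general probability of the user to be monitored;
a third risk determination unit, configured to determine that the current behavior of the user to be monitored is at risk if the deviation of the second general probability of the user to be monitored from the first general probability is greater than or equal to a second preset threshold;
a fourth risk determination unit, configured to determine that the current behavior of the user to be monitored is not at risk if the deviation of the second general probability of the user to be monitored from the first general probability is less than the second preset threshold.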
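Optionally, the device further includes:
a third risk monitoring module, configured to determine, according to the user probability corresponding to the user to be monitored, the execution probability corresponding to the identification information, and the first and second general probabilities of the user to be monitored, whether the operations of the user to be monitored are at risk.
Optionally, the third risk monitoring module includes:
a third comparison unit, configured to compare the user probability corresponding to the user to be monitored with the execution probability corresponding to the identification information, and to compare the first general probability with the second general probability of the user to be monitored;
a fifth risk determination unit, configured to determine that the current behavior of the user to be monitored is at risk if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is greater than or equal to the first preset threshold, and the deviation of the second general probability of the user to be monitored from the first general probability is greater than or equal to the second preset threshold;
a sixth risk determination unit, configured to determine that the current behavior of the user to be monitored is not at risk if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is less than the first preset threshold, and/or the deviation of the second general probability of the user to be monitored from the first general probability is less than the second preset threshold.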
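For specific limitations on the big-data-based user behavior monitoring device, see the limitations on the big-data-based user behavior monitoring method above, which are not repeated here. Each module in the above device may be implemented wholly or partly in software, hardware, or a combination thereof. The modules may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a readable storage medium and an internal memory; the readable storage medium stores an operating system, computer-readable instructions, and a database, and the internal memory provides an environment for running the operating system and the computer-readable instructions in the readable storage medium. The network interface of the computer device communicates with external terminals through a network connection. When executed by the processor, the computer-readable instructions implement a big-data-based user behavior monitoring method. The readable storage media provided in this embodiment include non-volatile readable storage media and volatile readable storage media.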
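In one embodiment, a computer device is provided, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
acquiring several users within a first preset time period and their corresponding historical behavior records and identification information;
taking the identification information as the target variable and the historical behavior records of the several users as input parameters, and performing training with a naive Bayes algorithm to obtain an execution probability corresponding to each piece of identification information;
for a user to be monitored, taking identification information as the target variable and the historical behavior record of the user to be monitored as the input parameter, and performing prediction with the naive Bayes algorithm to obtain a user probability corresponding to the user to be monitored;
acquiring the identification information of the user to be monitored and the execution probability corresponding to the identification information;
determining, according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, whether the user to be monitored is at risk.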
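In one embodiment, one or more readable storage media storing computer-readable instructions are provided; when executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the following steps:
acquiring several users within a first preset time period and their corresponding historical behavior records and identification information;
taking the identification information as the target variable and the historical behavior records of the several users as input parameters, and performing training with a naive Bayes algorithm to obtain an execution probability corresponding to each piece of identification information;
for a user to be monitored, taking identification information as the target variable and the historical behavior record of the user to be monitored as the input parameter, and performing prediction with the naive Bayes algorithm to obtain a user probability corresponding to the user to be monitored;
acquiring the identification information of the user to be monitored and the execution probability corresponding to the identification information;
determining, according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, whether the user to be monitored is at risk.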
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过 计算机可读指令来指令相关的硬件来完成,所述的计算机可读指令可存储于一非易失性计算机可读取存储介质或者易失性计算机可读取存储介质中,该计算机可读指令在执行时,可包括如上述各方法的实施例的流程。其中,本申请所提供的各实施例中所使用的对存储器、存储、数据库或其它介质的任何引用,均可包括非易失性和/或易失性存储器。非易失性存储器可包括只读存储器(ROM)、可编程ROM(PROM)、电可编程ROM(EPROM)、电可擦除可编程ROM(EEPROM)或闪存。易失性存储器可包括随机存取存储器(RAM)或者外部高速缓冲存储器。作为说明而非局限,RAM以多种形式可得,诸如静态RAM(SRAM)、动态RAM(DRAM)、同步DRAM(SDRAM)、双数据率SDRAM(DDRSDRAM)、增强型SDRAM(ESDRAM)、同步链路(Synchlink)DRAM(SLDRAM)、存储器总线(Rambus)直接RAM(RDRAM)、直接存储器总线动态RAM(DRDRAM)、以及存储器总线动态RAM(RDRAM)等。
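Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is only an example. In practical applications, these functions may be allocated to different functional units or modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.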
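It should be noted that the blockchain referred to in this application is a new application model of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked by cryptographic methods. Each block contains information about a batch of network transactions, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.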
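The chaining property described above can be illustrated with a minimal Python sketch: each block records the hash of its predecessor, so tampering with any earlier block becomes detectable. The sketch shows only the hash-linking idea, with no distribution, peer-to-peer transport, or consensus. The `Block` layout, the genesis placeholder, and the choice of SHA-256 over a JSON payload are assumptions, not details taken from this application.

```python
import hashlib
import json
from dataclasses import dataclass

GENESIS_PREV_HASH = "0" * 64  # placeholder predecessor hash for the first block


@dataclass
class Block:
    index: int
    records: list          # e.g. a batch of monitoring results (hypothetical payload)
    prev_hash: str
    block_hash: str = ""

    def compute_hash(self) -> str:
        # Hash the block contents together with the predecessor's hash, which links the chain.
        payload = json.dumps(
            {"index": self.index, "records": self.records, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain: list, records: list) -> Block:
    prev_hash = chain[-1].block_hash if chain else GENESIS_PREV_HASH
    block = Block(index=len(chain), records=records, prev_hash=prev_hash)
    block.block_hash = block.compute_hash()
    chain.append(block)
    return block


def verify(chain: list) -> bool:
    """Recompute every hash and check each link; any edited block breaks verification."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1].block_hash if i > 0 else GENESIS_PREV_HASH
        if block.prev_hash != expected_prev or block.block_hash != block.compute_hash():
            return False
    return True
```

Editing the records of any stored block changes its recomputed hash, so `verify` returns `False`. A real blockchain additionally replicates the chain across nodes and relies on a consensus mechanism to agree on which blocks are appended.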
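The above embodiments are intended only to illustrate, not to limit, the technical solutions of this application. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand the following: the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the protection scope of this application.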

Claims (20)

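  1. A big-data-based user behavior monitoring method, comprising:
    obtaining a number of users within a first preset time period, together with their corresponding historical behavior records and identification information;
    performing training with the naive Bayes algorithm, using the identification information as the target variable and the historical behavior records of the users as input parameters, to obtain an execution probability corresponding to each piece of identification information;
    for a user to be monitored, performing prediction with the naive Bayes algorithm, using the identification information as the target variable and the historical behavior records of the user to be monitored as input parameters, to obtain a user probability corresponding to the user to be monitored;
    obtaining the identification information of the user to be monitored and the execution probability corresponding to that identification information;
    determining, according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, whether the user to be monitored is risky.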
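  2. The big-data-based user behavior monitoring method according to claim 1, wherein obtaining a number of users within a first preset time period, together with their corresponding historical behavior records and identification information, comprises:
    obtaining historical operation data of a number of users within the first preset time period;
    converting the historical operation data into SQL data, and performing regular-expression-based cleaning and parsing on the SQL data to obtain the operation data and identification information corresponding to each user;
    traversing each user, aggregating the operation data corresponding to the user by a preset time period, and standardizing the aggregated operation data to obtain historical behavior records;
    traversing each user, and labeling the user and the user's historical behavior records with identification information.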
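  3. The big-data-based user behavior monitoring method according to claim 2, wherein determining, according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, whether the user to be monitored is risky comprises:
    comparing the user probability corresponding to the user to be monitored with the execution probability corresponding to the identification information;
    if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is greater than or equal to a first preset threshold, determining that the current behavior of the user to be monitored is risky;
    if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is less than the first preset threshold, determining that the current behavior of the user to be monitored is not risky.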
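  4. The big-data-based user behavior monitoring method according to any one of claims 1 to 3, further comprising:
    obtaining historical behavior records of the user to be monitored within a second preset time period;
    performing prediction with the naive Bayes algorithm, using the identification information as the target variable and the historical behavior records within the second preset time period as input parameters, to obtain a first general probability of the user to be monitored;
    obtaining historical behavior records of the user to be monitored within a third preset time period;
    performing prediction with the naive Bayes algorithm, using the identification information as the target variable and the historical behavior records within the third preset time period as input parameters, to obtain a second general probability of the user to be monitored;
    determining, according to the first general probability and the second general probability of the user to be monitored, whether the user to be monitored is risky.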
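  5. The big-data-based user behavior monitoring method according to claim 4, wherein determining, according to the first general probability and the second general probability of the user to be monitored, whether the user to be monitored is risky comprises:
    comparing the first general probability and the second general probability of the user to be monitored;
    if the deviation of the second general probability of the user to be monitored from the first general probability is greater than or equal to a second preset threshold, determining that the current behavior of the user to be monitored is risky;
    if the deviation of the second general probability of the user to be monitored from the first general probability is less than the second preset threshold, determining that the current behavior of the user to be monitored is not risky.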
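  6. The big-data-based user behavior monitoring method according to claim 4, further comprising:
    determining, according to the user probability corresponding to the user to be monitored, the execution probability corresponding to the identification information, and the first general probability and the second general probability of the user to be monitored, whether the operation of the user to be monitored is risky.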
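  7. The big-data-based user behavior monitoring method according to claim 6, wherein determining, according to the user probability corresponding to the user to be monitored, the execution probability corresponding to the identification information, and the first general probability and the second general probability of the user to be monitored, whether the operation of the user to be monitored is risky comprises:
    comparing the user probability corresponding to the user to be monitored with the execution probability corresponding to the identification information, and comparing the first general probability and the second general probability of the user to be monitored;
    if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is greater than or equal to a first preset threshold, and the deviation of the second general probability of the user to be monitored from the first general probability is greater than or equal to a second preset threshold, determining that the current behavior of the user to be monitored is risky;
    if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is less than the first preset threshold, and/or the deviation of the second general probability of the user to be monitored from the first general probability is less than the second preset threshold, determining that the current behavior of the user to be monitored is not risky.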
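  8. A big-data-based user behavior monitoring apparatus, comprising:
    a parameter obtaining module, configured to obtain a number of users within a first preset time period, together with their corresponding historical behavior records and identification information;
    a training module, configured to perform training with the naive Bayes algorithm, using the identification information as the target variable and the historical behavior records of the users as input parameters, to obtain an execution probability corresponding to each piece of identification information;
    a prediction module, configured to, for a user to be monitored, perform prediction with the naive Bayes algorithm, using the identification information as the target variable and the historical behavior records of the user to be monitored as input parameters, to obtain a user probability corresponding to the user to be monitored;
    a probability obtaining module, configured to obtain the identification information of the user to be monitored and the execution probability corresponding to that identification information;
    a risk monitoring module, configured to determine, according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, whether the user to be monitored is risky.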
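  9. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    obtaining a number of users within a first preset time period, together with their corresponding historical behavior records and identification information;
    performing training with the naive Bayes algorithm, using the identification information as the target variable and the historical behavior records of the users as input parameters, to obtain an execution probability corresponding to each piece of identification information;
    for a user to be monitored, performing prediction with the naive Bayes algorithm, using the identification information as the target variable and the historical behavior records of the user to be monitored as input parameters, to obtain a user probability corresponding to the user to be monitored;
    obtaining the identification information of the user to be monitored and the execution probability corresponding to that identification information;
    determining, according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, whether the user to be monitored is risky.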
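  10. The computer device according to claim 9, wherein obtaining a number of users within a first preset time period, together with their corresponding historical behavior records and identification information, comprises:
    obtaining historical operation data of a number of users within the first preset time period;
    converting the historical operation data into SQL data, and performing regular-expression-based cleaning and parsing on the SQL data to obtain the operation data and identification information corresponding to each user;
    traversing each user, aggregating the operation data corresponding to the user by a preset time period, and standardizing the aggregated operation data to obtain historical behavior records;
    traversing each user, and labeling the user and the user's historical behavior records with identification information.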
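  11. The computer device according to claim 10, wherein determining, according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, whether the user to be monitored is risky comprises:
    comparing the user probability corresponding to the user to be monitored with the execution probability corresponding to the identification information;
    if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is greater than or equal to a first preset threshold, determining that the current behavior of the user to be monitored is risky;
    if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is less than the first preset threshold, determining that the current behavior of the user to be monitored is not risky.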
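  12. The computer device according to any one of claims 9 to 11, wherein the processor, when executing the computer-readable instructions, further implements the following steps:
    obtaining historical behavior records of the user to be monitored within a second preset time period;
    performing prediction with the naive Bayes algorithm, using the identification information as the target variable and the historical behavior records within the second preset time period as input parameters, to obtain a first general probability of the user to be monitored;
    obtaining historical behavior records of the user to be monitored within a third preset time period;
    performing prediction with the naive Bayes algorithm, using the identification information as the target variable and the historical behavior records within the third preset time period as input parameters, to obtain a second general probability of the user to be monitored;
    determining, according to the first general probability and the second general probability of the user to be monitored, whether the user to be monitored is risky.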
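  13. The computer device according to claim 12, wherein determining, according to the first general probability and the second general probability of the user to be monitored, whether the user to be monitored is risky comprises:
    comparing the first general probability and the second general probability of the user to be monitored;
    if the deviation of the second general probability of the user to be monitored from the first general probability is greater than or equal to a second preset threshold, determining that the current behavior of the user to be monitored is risky;
    if the deviation of the second general probability of the user to be monitored from the first general probability is less than the second preset threshold, determining that the current behavior of the user to be monitored is not risky.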
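  14. The computer device according to claim 12, wherein the processor, when executing the computer-readable instructions, further implements the following step:
    determining, according to the user probability corresponding to the user to be monitored, the execution probability corresponding to the identification information, and the first general probability and the second general probability of the user to be monitored, whether the operation of the user to be monitored is risky.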
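  15. One or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining a number of users within a first preset time period, together with their corresponding historical behavior records and identification information;
    performing training with the naive Bayes algorithm, using the identification information as the target variable and the historical behavior records of the users as input parameters, to obtain an execution probability corresponding to each piece of identification information;
    for a user to be monitored, performing prediction with the naive Bayes algorithm, using the identification information as the target variable and the historical behavior records of the user to be monitored as input parameters, to obtain a user probability corresponding to the user to be monitored;
    obtaining the identification information of the user to be monitored and the execution probability corresponding to that identification information;
    determining, according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, whether the user to be monitored is risky.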
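  16. The one or more readable storage media according to claim 15, wherein obtaining a number of users within a first preset time period, together with their corresponding historical behavior records and identification information, comprises:
    obtaining historical operation data of a number of users within the first preset time period;
    converting the historical operation data into SQL data, and performing regular-expression-based cleaning and parsing on the SQL data to obtain the operation data and identification information corresponding to each user;
    traversing each user, aggregating the operation data corresponding to the user by a preset time period, and standardizing the aggregated operation data to obtain historical behavior records;
    traversing each user, and labeling the user and the user's historical behavior records with identification information.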
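  17. The one or more readable storage media according to claim 16, wherein determining, according to the user probability corresponding to the user to be monitored and the execution probability corresponding to the identification information, whether the user to be monitored is risky comprises:
    comparing the user probability corresponding to the user to be monitored with the execution probability corresponding to the identification information;
    if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is greater than or equal to a first preset threshold, determining that the current behavior of the user to be monitored is risky;
    if the deviation of the user probability corresponding to the user to be monitored from the execution probability corresponding to the identification information is less than the first preset threshold, determining that the current behavior of the user to be monitored is not risky.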
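  18. The one or more readable storage media according to any one of claims 15 to 17, wherein the computer-readable instructions, when executed by one or more processors, further cause the one or more processors to perform the following steps:
    obtaining historical behavior records of the user to be monitored within a second preset time period;
    performing prediction with the naive Bayes algorithm, using the identification information as the target variable and the historical behavior records within the second preset time period as input parameters, to obtain a first general probability of the user to be monitored;
    obtaining historical behavior records of the user to be monitored within a third preset time period;
    performing prediction with the naive Bayes algorithm, using the identification information as the target variable and the historical behavior records within the third preset time period as input parameters, to obtain a second general probability of the user to be monitored;
    determining, according to the first general probability and the second general probability of the user to be monitored, whether the user to be monitored is risky.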
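  19. The one or more readable storage media according to claim 18, wherein determining, according to the first general probability and the second general probability of the user to be monitored, whether the user to be monitored is risky comprises:
    comparing the first general probability and the second general probability of the user to be monitored;
    if the deviation of the second general probability of the user to be monitored from the first general probability is greater than or equal to a second preset threshold, determining that the current behavior of the user to be monitored is risky;
    if the deviation of the second general probability of the user to be monitored from the first general probability is less than the second preset threshold, determining that the current behavior of the user to be monitored is not risky.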
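  20. The one or more readable storage media according to claim 18, wherein the computer-readable instructions, when executed by one or more processors, further cause the one or more processors to perform the following step:
    determining, according to the user probability corresponding to the user to be monitored, the execution probability corresponding to the identification information, and the first general probability and the second general probability of the user to be monitored, whether the operation of the user to be monitored is risky.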
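The preprocessing recited in the dependent claims on obtaining historical behavior records can be sketched in Python as below. The steps are parse, aggregate per preset period, standardize, and label. The sketch is hypothetical throughout:

- raw operation logs use an invented one-line text format;
- "regularized cleaning" is read as regular-expression-based cleaning and parsing, which is an interpretation;
- the conversion to SQL data is skipped;
- the preset time period is one day;
- the user's role serves as the identification information;
- standardization is a column-wise z-score.

```python
import re
import statistics
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical log format; the claims do not specify one.
LOG_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+user=(?P<user>\w+)\s+"
    r"role=(?P<role>\w+)\s+op=(?P<op>[A-Z]+)\b"
)
OPS = ("SELECT", "UPDATE", "DELETE", "EXPORT")  # hypothetical feature columns


def parse(lines):
    """Regex cleaning and parsing: lines that do not match the pattern are dropped."""
    for line in lines:
        m = LOG_PATTERN.match(line.strip())
        if m:
            day = datetime.fromisoformat(m.group("ts")).date()
            yield m.group("user"), m.group("role"), day, m.group("op")


def aggregate(lines):
    """Count each operation type per user per day (the assumed preset time period)."""
    counts = defaultdict(Counter)
    roles = {}
    for user, role, day, op in parse(lines):
        counts[(user, day)][op] += 1
        roles[user] = role
    keys = sorted(counts)
    # Operation types outside OPS are counted but not used as features.
    rows = [[counts[k][op] for op in OPS] for k in keys]
    return keys, rows, roles


def standardize(rows):
    """Column-wise z-score; a constant column becomes all zeros."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [[(v - m) / s if s else 0.0 for v, m, s in zip(row, means, stds)] for row in rows]


def build_records(lines):
    """Return (user, day) keys, standardized behavior records X, and identification labels y."""
    keys, rows, roles = aggregate(lines)
    X = standardize(rows)
    y = [roles[user] for user, _day in keys]  # label each record with identification info
    return keys, X, y
```

The resulting `(X, y)` pairs are the kind of labeled inputs that the naive Bayes training step of claim 1 takes, with the identification information as the target variable.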
PCT/CN2021/096700 2020-06-24 2021-05-28 Big-data-based user behavior monitoring method, apparatus, device and medium WO2021258992A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010589176.0 2020-06-24
CN202010589176.0A CN111737101B (zh) 2020-06-24 2020-06-24 Big-data-based user behavior monitoring method, apparatus, device and medium

Publications (1)

Publication Number Publication Date
WO2021258992A1 (zh) 2021-12-30

Family

ID=72650972

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/096700 WO2021258992A1 (zh) 2020-06-24 2021-05-28 Big-data-based user behavior monitoring method, apparatus, device and medium

Country Status (2)

Country Link
CN (1) CN111737101B (zh)
WO (1) WO2021258992A1 (zh)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
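CN111737101B (zh) * 2020-06-24 2022-05-03 平安科技(深圳)有限公司 Big-data-based user behavior monitoring method, apparatus, device and medium
CN112214387B (zh) * 2020-10-13 2023-11-24 中国银行股份有限公司 Knowledge-graph-based user operation behavior prediction method and apparatus
CN112185575B (zh) * 2020-10-14 2024-01-16 北京嘉和美康信息技术有限公司 Method and apparatus for determining medical data to be compared
CN112686702A (zh) * 2020-12-31 2021-04-20 平安消费金融有限公司 Bonus-hunter ("wool party") identification method, apparatus, computer device and storage medium
CN112800107B (zh) * 2021-01-18 2023-02-03 湖北宸威玺链信息技术有限公司 Data source security authentication method, system, apparatus and medium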

Citations (4)

Publication number Priority date Publication date Assignee Title
CN105357216A (zh) * 2015-11-30 2016-02-24 上海斐讯数据通信技术有限公司 Secure access method and system
EP3139287A1 (en) * 2014-05-22 2017-03-08 Huawei Technologies Co., Ltd. User behavior recognition method, user equipment, and behavior recognition server
CN107220557A (zh) * 2017-05-02 2017-09-29 广东电网有限责任公司信息中心 Method and system for detecting user behavior of unauthorized access to sensitive data
CN111737101A (zh) * 2020-06-24 2020-10-02 平安科技(深圳)有限公司 Big-data-based user behavior monitoring method, apparatus, device and medium

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
CN105590055B (zh) * 2014-10-23 2020-10-20 创新先进技术有限公司 Method and apparatus for identifying trusted user behavior in a network interaction system
CN106910078A (zh) * 2015-12-22 2017-06-30 阿里巴巴集团控股有限公司 Risk identification method and apparatus
TWI801334B (zh) * 2017-01-24 2023-05-11 香港商阿里巴巴集團服務有限公司 Risk identification method and apparatus
CN107566358B (zh) * 2017-08-25 2020-10-30 腾讯科技(深圳)有限公司 Risk early-warning prompt method, apparatus, medium and device
CN109086816A (zh) * 2018-07-24 2018-12-25 重庆富民银行股份有限公司 User behavior analysis system based on a Bayesian classification algorithm


Cited By (10)

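Publication number Priority date Publication date Assignee Title
CN114997720A (zh) * 2022-06-30 2022-09-02 建信金融科技有限责任公司 Software R&D project risk monitoring method, apparatus, device and storage medium
CN115549313A (zh) * 2022-11-09 2022-12-30 国网江苏省电力有限公司徐州供电分公司 Artificial-intelligence-based electricity consumption monitoring method and system
CN115549313B (zh) * 2022-11-09 2024-03-08 国网江苏省电力有限公司徐州供电分公司 Artificial-intelligence-based electricity consumption monitoring method and system
CN115827414A (zh) * 2023-02-15 2023-03-21 天津戎行集团有限公司 Network user behavior monitoring and analysis method based on open-source data
CN116523712A (zh) * 2023-07-04 2023-08-01 浙江海亮科技有限公司 Check-in reminder method, check-in system, server, medium and program product
CN116523712B (zh) * 2023-07-04 2023-11-24 浙江海亮科技有限公司 Check-in reminder method, check-in system, server, medium and program product
CN117130016A (zh) * 2023-10-26 2023-11-28 深圳市麦微智能电子有限公司 BeiDou-satellite-based personal safety monitoring system, method, apparatus and medium
CN117130016B (zh) * 2023-10-26 2024-02-06 深圳市麦微智能电子有限公司 BeiDou-satellite-based personal safety monitoring system, method, apparatus and medium
CN117473475A (zh) * 2023-11-01 2024-01-30 北京宝联之星科技股份有限公司 Trusted-computing-based big data security protection method, system and medium
CN117473475B (zh) * 2023-11-01 2024-04-09 北京宝联之星科技股份有限公司 Trusted-computing-based big data security protection method, system and medium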

Also Published As

Publication number Publication date
CN111737101A (zh) 2020-10-02
CN111737101B (zh) 2022-05-03


Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 21829847; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: PCT application non-entry in European phase
    Ref document number: 21829847; Country of ref document: EP; Kind code of ref document: A1