Disclosure of Invention
To achieve the above objects and other advantages and in accordance with the purpose of the invention, a first object of the present invention is to provide an artificial intelligence and big data based user behavior prediction method, comprising the steps of:
performing event-tracking (buried-point) collection on enterprise user behavior data to obtain target behaviors;
extracting operation data from the target behaviors;
classifying the operation data through an artificial intelligence model to obtain behavior features;
predicting the risk of the user behavior from the behavior features.
Further, performing event-tracking collection on the enterprise user behavior data to obtain the target behaviors comprises describing the user behavior with an event model, wherein the event model comprises an event entity and a user entity.
Further, performing event-tracking collection on the enterprise user behavior data to obtain the target behaviors comprises collecting the user behavior data through event tracking in a client software development kit;
collecting the user behavior data through event tracking in the client software development kit comprises the following steps:
for each client platform, adopting the software development kit corresponding to that platform;
submitting the collected data to a server-side application programming interface;
writing, by the server-side application programming interface, the received data into a log file;
reading the log file in real time through a log collection module, and processing and publishing the data.
Further, performing event-tracking collection on the enterprise user behavior data to obtain the target behaviors further comprises collecting the user behavior data through event tracking in a third-party terminal software development kit;
collecting the user behavior data through event tracking in the third-party terminal software development kit comprises the following steps:
adding the software development kit dependency information to the configuration file of the third-party terminal application;
setting the server address of the application programming interface to which the software development kit reports;
when the software development kit is initialized, enabling full event tracking through the initialization method provided by the software development kit, so that user behavior is collected automatically;
submitting the collected data to a server-side application programming interface;
writing, by the server-side application programming interface, the received data into a log file;
reading the log file in real time through a log collection module, and processing and publishing the data.
Further, extracting the operation data from the target behaviors comprises the following step:
extracting the time, the user ID, the device ID, the activity name, and the activity attribute from the log file.
Further, classifying the operation data through the artificial intelligence model to obtain the behavior features comprises the following steps:
dividing the log file according to the user ID;
clustering adjacent behaviors of the same user ID according to their similarity, and outputting a clustered set;
calculating, for the clustered set, the frequency with which the user uses the device, the activity frequency of the behavior, the time-point frequency, and the behavior-object access frequency.
Further, predicting the risk of the user behavior from the behavior features comprises the following steps:
calculating the risk probability of the user behavior from the frequency with which the user uses the device, the activity frequency of the behavior, the time-point frequency, and the behavior-object access frequency;
if the calculated risk probability of the user behavior reaches a threshold value, issuing an early warning;
if the calculated risk probability of the user behavior does not reach the threshold value, storing the user behavior.
A second object of the present invention is to provide an electronic device including: a memory having program code stored thereon; and a processor coupled to the memory, wherein the program code, when executed by the processor, implements the artificial intelligence and big data based user behavior prediction method.
A third object of the present invention is to provide a computer-readable storage medium having stored thereon program instructions which, when executed, implement the artificial intelligence and big data based user behavior prediction method.
A fourth object of the present invention is to provide a computer program product comprising a computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the artificial intelligence and big data based user behavior prediction method.
Compared with the prior art, the invention has the beneficial effects that:
The invention shifts enterprise security protection from a policy-driven approach to a user-oriented approach, analyzes user behavior in depth through precisely defined attributes, and determines the threat probability of a user through an artificial intelligence model. It has high detection accuracy and can effectively prevent leakage of an enterprise's core data assets.
The foregoing is only an overview of the technical solution of the present invention. To make the technical means of the invention easier to understand, preferred embodiments are described below with reference to the accompanying drawings. Specific embodiments of the present invention are given in detail in the following examples and the accompanying drawings.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and detailed description, wherein it is to be understood that, on the premise of no conflict, the following embodiments or technical features may be arbitrarily combined to form new embodiments.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements serve only to facilitate the description of the present invention and have no specific meaning in themselves. Thus, "module", "component", and "unit" may be used interchangeably.
Example 1
The user behavior prediction method based on artificial intelligence and big data, as shown in fig. 1, comprises the following steps:
Event-tracking (buried-point) collection is performed on the enterprise user behavior data to obtain target behaviors, such as user login, sending and receiving mail, and web browsing.
To describe user behavior unambiguously, the user behavior is described using an event model that includes an event entity and a user entity. The event model records that a user did a particular thing at a certain point in time, in a certain place, and in a certain way. The event entity combined with the user entity is sufficient to describe the user behavior clearly.
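As a non-limiting illustration, the event model may be sketched as two small record types combined into one behavior record. The class names, field names, and sample values below are assumptions made for the example and are not prescribed by this embodiment.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass(frozen=True)
class UserEntity:
    """Who performed the behavior (hypothetical fields)."""
    user_id: str
    device_id: str


@dataclass(frozen=True)
class EventEntity:
    """What happened, when, and with which attributes (hypothetical fields)."""
    time: datetime
    activity_name: str
    activity_attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class BehaviorRecord:
    """One user behavior: an event entity combined with a user entity."""
    user: UserEntity
    event: EventEntity

    def describe(self) -> str:
        return (
            f"user {self.user.user_id} on device {self.user.device_id} "
            f"performed '{self.event.activity_name}' at {self.event.time.isoformat()}"
        )


record = BehaviorRecord(
    user=UserEntity(user_id="u001", device_id="d42"),
    event=EventEntity(
        time=datetime(2024, 1, 5, 9, 30, tzinfo=timezone.utc),
        activity_name="login",
        activity_attributes={"url": "https://example.com/login"},
    ),
)
print(record.describe())
```

Keeping the user entity separate from the event entity lets the same user be referenced by many events without repeating user-level information.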
To collect behavior data from clients as a data source, performing event-tracking collection on the enterprise user behavior data to obtain the target behaviors includes collecting the user behavior data through event tracking in a client software development kit.
As shown in fig. 2, collecting the user behavior data through event tracking in the client software development kit includes the following steps:
For each client platform, the software development kit corresponding to that platform is adopted;
The collected data is submitted to a server-side application programming interface; for example, the collected data is submitted to the server-side application programming interface as JSON data.
The server-side application programming interface writes the received data into a log file; in this embodiment, a high-performance HTTP and reverse-proxy web server is used to receive the data, so as to achieve high reliability and high scalability.
The Source component of the log collection system receives the data from the data generator and passes it, in the event format of the log collection system, to one or more Channel components. The Source component supports several ways of receiving data, such as Avro, Thrift, and Twitter 1%.
The Channel component of the log collection system caches the events received from the Source component until they are consumed, and acts as a bridge between the Source component and the Sink component. The Channel component is fully transactional, which ensures the consistency of the data during sending and receiving, and it can be linked to any number of Source and Sink components. Supported types include the JDBC channel, the file system channel, and the memory channel.
The Sink component of the log collection system takes the data from the Channel component and delivers it to the destination, storing it in centralized storage such as HBase or HDFS. The destination may be another Sink component, or it may be HDFS or HBase.
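The Source, Channel, and Sink roles described above can be illustrated with a minimal in-memory sketch. It is not the log collection system itself: the class and function names are hypothetical, a Python list stands in for the centralized storage, and the transactional guarantees of a real Channel component are deliberately omitted.

```python
import json
from collections import deque
from typing import Deque, Dict, Iterable, List


class MemoryChannel:
    """Buffers events until a sink consumes them (in-memory stand-in)."""

    def __init__(self) -> None:
        self._events: Deque[Dict] = deque()

    def put(self, event: Dict) -> None:
        self._events.append(event)

    def take_batch(self, max_events: int) -> List[Dict]:
        batch = []
        while self._events and len(batch) < max_events:
            batch.append(self._events.popleft())
        return batch

    def __len__(self) -> int:
        return len(self._events)


def run_source(lines: Iterable[str], channels: List[MemoryChannel]) -> None:
    """Wrap each raw log line as an event and pass it to every channel."""
    for line in lines:
        event = {"headers": {}, "body": line.rstrip("\n")}
        for channel in channels:
            channel.put(event)


def run_sink(channel: MemoryChannel, store: List[str], batch_size: int = 2) -> None:
    """Drain the channel in batches and append the event bodies to the store."""
    while len(channel):
        for event in channel.take_batch(batch_size):
            store.append(event["body"])


raw_lines = [
    json.dumps({"user_id": "u001", "activity_name": "login"}) + "\n",
    json.dumps({"user_id": "u002", "activity_name": "send_mail"}) + "\n",
    json.dumps({"user_id": "u001", "activity_name": "browse"}) + "\n",
]
channel = MemoryChannel()
central_store: List[str] = []  # stand-in for centralized storage such as HDFS or HBase
run_source(raw_lines, [channel])
run_sink(channel, central_store)
print(len(central_store), len(channel))
```

Because the channel decouples the two sides, the source can keep accepting data while the sink drains at its own pace, which is the bridging role described above.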
To collect behavior data from third-party terminals as a data source, performing event-tracking collection on the enterprise user behavior data to obtain the target behaviors further includes collecting the user behavior data through event tracking in a third-party terminal software development kit.
Collecting the user behavior data through event tracking in the third-party terminal software development kit includes the following steps:
The software development kit dependency information is added to the configuration file of the third-party terminal application;
The server address of the application programming interface to which the software development kit reports is set;
When the software development kit is initialized, full event tracking is enabled through the initialization method provided by the software development kit, so that user behavior is collected automatically;
The collected data is submitted to a server-side application programming interface;
The server-side application programming interface writes the received data into a log file;
The log file is read in real time through a log collection module, and the data is processed and published.
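The last three steps above (submit, write to a log file, read in real time) may be sketched as follows, purely by way of example. The function names, the one-JSON-object-per-line log layout, and the byte-offset bookkeeping are assumptions; a real deployment would receive the payload over HTTP and use a dedicated log collection module.

```python
import json
import os
import tempfile
from typing import Dict, List, Tuple


def handle_report(payload: str, log_path: str) -> None:
    """Server-side step: parse the JSON payload and append it as one log line."""
    event = json.loads(payload)  # raises ValueError if the payload is malformed
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(event, sort_keys=True) + "\n")


def read_new_lines(log_path: str, offset: int) -> Tuple[List[Dict], int]:
    """Collector step: return events appended since `offset` and the new offset."""
    events = []
    with open(log_path, "rb") as log_file:
        log_file.seek(offset)
        while True:
            line = log_file.readline()
            if not line.endswith(b"\n"):  # end of file, or a line still being written
                break
            events.append(json.loads(line.decode("utf-8")))
            offset = log_file.tell()
    return events, offset


log_path = os.path.join(tempfile.mkdtemp(), "behavior.log")
handle_report('{"user_id": "u001", "activity_name": "login"}', log_path)
first_batch, offset = read_new_lines(log_path, 0)
handle_report('{"user_id": "u001", "activity_name": "logout"}', log_path)
second_batch, offset = read_new_lines(log_path, offset)
print(first_batch, second_batch)
```

Tracking the offset means each poll returns only the lines written since the previous poll, which approximates real-time reading without re-processing earlier events.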
Operation data is extracted from the target behaviors. To clean and analyze the user behaviors without missing important information, extracting the operation data from the target behaviors includes the following step:
The time, user ID, device ID, activity name (e.g., a user logging in to a website or logging out of a website), and activity attribute (e.g., URL information in web browsing) are extracted from the log file.
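By way of illustration only, the extraction step may be sketched as below, assuming each log line is a JSON object. The key names (`time`, `user_id`, `device_id`, `activity_name`, `activity_attribute`) and the rule of discarding lines that lack a required field are assumptions for the example.

```python
import json
from typing import Dict, NamedTuple, Optional


class OperationData(NamedTuple):
    time: str
    user_id: str
    device_id: str
    activity_name: str
    activity_attribute: Dict


REQUIRED_KEYS = ("time", "user_id", "device_id", "activity_name")


def extract_operation_data(line: str) -> Optional[OperationData]:
    """Return the five fields from one JSON log line, or None if the line is unusable."""
    try:
        raw = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(raw, dict) or any(key not in raw for key in REQUIRED_KEYS):
        return None
    return OperationData(
        time=raw["time"],
        user_id=raw["user_id"],
        device_id=raw["device_id"],
        activity_name=raw["activity_name"],
        activity_attribute=raw.get("activity_attribute", {}),
    )


log_lines = [
    '{"time": "2024-01-05T09:30:00Z", "user_id": "u001", "device_id": "d42", '
    '"activity_name": "web_browse", "activity_attribute": {"url": "https://example.com/docs"}}',
    "not json",
    '{"time": "2024-01-05T09:31:00Z", "user_id": "u002"}',
]
records = [r for r in map(extract_operation_data, log_lines) if r is not None]
print(records)
```

Malformed or incomplete lines are dropped here; an implementation that must not lose information could instead route them to a separate file for inspection.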
The operation data is classified through an artificial intelligence model to obtain behavior features;
The risk of the user behavior is predicted from the behavior features.
As shown in fig. 3, classifying the operation data through the artificial intelligence model to obtain the behavior features includes the following steps:
The log file is divided according to user ID, so that the behavior data of each user is stored separately, which shifts enterprise security protection from a policy-driven approach to a user-oriented approach.
Adjacent behaviors of the same user ID are clustered according to their similarity, and a clustered set is output; clustering the user behaviors speeds up subsequent lookups of the user behavior data. In this embodiment, a density-based clustering algorithm is adopted, which includes the following steps:
A circle is drawn with each activity-name data point xi as its center and eps as its radius;
The points contained within the circle are counted; if the number of points in the circle exceeds a density threshold MinPts, the center of the circle is marked as a core point; if the number of points in the eps neighborhood of a data point is smaller than the density threshold but the point falls within the neighborhood of a core point, the point is marked as a boundary point; points that are neither core points nor boundary points are noise points.
All points in the eps neighborhood of a core point xi are directly density-reachable from xi; if xj is directly density-reachable from xi, xk is directly density-reachable from xj, …, and xn is directly density-reachable from xk, then xn is density-reachable from xi.
If there is a point xk from which both xi and xj are density-reachable, then xi and xj are said to be density-connected. Density-connected points are joined together to form a cluster.
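The clustering steps above correspond to the widely used DBSCAN procedure, and a compact sketch is given here under stated assumptions: each behavior has already been mapped to a numeric feature vector (how activity names are embedded is not specified in this embodiment), Euclidean distance is used, and a point is treated as a core point when its eps neighborhood, counting the point itself, holds at least `min_pts` points, which is the common convention. The function name and sample coordinates are hypothetical.

```python
from math import dist
from typing import List, Sequence

NOISE = -1


def dbscan(points: Sequence[Sequence[float]], eps: float, min_pts: int) -> List[int]:
    """Return one cluster label per point; NOISE (-1) marks noise points."""
    n = len(points)
    # eps neighborhood of every point (each point counts as its own neighbor).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(neighbors[i]) >= min_pts for i in range(n)]
    labels = [NOISE] * n
    cluster_id = 0
    for i in range(n):
        if not is_core[i] or labels[i] != NOISE:
            continue
        # Grow a cluster from an unlabeled core point by following density-reachability.
        labels[i] = cluster_id
        frontier = [i]
        while frontier:
            current = frontier.pop()
            for j in neighbors[current]:
                if labels[j] == NOISE:
                    labels[j] = cluster_id  # core or boundary point joins the cluster
                    if is_core[j]:
                        frontier.append(j)  # only core points extend the cluster
        cluster_id += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The isolated point at (50, 50) has no neighbors other than itself, so it is neither a core point nor a boundary point and keeps the noise label.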
For the clustered set, the frequency with which the user uses the device, the activity frequency of the behavior, the time-point frequency, and the behavior-object access frequency are calculated. For example, when a user ID logs in on a device it has never used, the frequency with which that user ID uses the current device ID is close to 0; when a user ID logs in on a frequently used device, the frequency is close to 1. The activity frequency of the behavior, the time-point frequency, and the behavior-object access frequency are defined analogously.
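One possible reading of these four frequencies is a relative frequency over a user's historical events, sketched below. The sketch is an assumption rather than the definition used by this embodiment: it computes each frequency as the share of the user's past events that match the current event on one field, and the field names (`device_id`, `activity_name`, `hour`, `object`) and sample data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Any, Dict, List


def split_by_user(events: List[Dict]) -> Dict[str, List[Dict]]:
    """Divide the events by user ID so each user's behavior is handled separately."""
    per_user = defaultdict(list)
    for event in events:
        per_user[event["user_id"]].append(event)
    return dict(per_user)


def relative_frequency(history: List[Dict], key: str, value: Any) -> float:
    """Share of historical events whose `key` equals `value` (0.0 with no history)."""
    if not history:
        return 0.0
    return Counter(e[key] for e in history)[value] / len(history)


def frequency_features(history: List[Dict], current: Dict) -> Dict[str, float]:
    return {
        "device_freq": relative_frequency(history, "device_id", current["device_id"]),
        "activity_freq": relative_frequency(history, "activity_name", current["activity_name"]),
        "hour_freq": relative_frequency(history, "hour", current["hour"]),
        "object_freq": relative_frequency(history, "object", current["object"]),
    }


events = [
    {"user_id": "u001", "device_id": "d1", "activity_name": "login", "hour": 9, "object": "/home"},
    {"user_id": "u001", "device_id": "d1", "activity_name": "browse", "hour": 9, "object": "/home"},
    {"user_id": "u001", "device_id": "d1", "activity_name": "login", "hour": 10, "object": "/home"},
    {"user_id": "u001", "device_id": "d1", "activity_name": "browse", "hour": 9, "object": "/hr"},
    {"user_id": "u002", "device_id": "d7", "activity_name": "login", "hour": 14, "object": "/home"},
]
history = split_by_user(events)["u001"]
current = {"user_id": "u001", "device_id": "d9", "activity_name": "login", "hour": 3, "object": "/finance"}
features = frequency_features(history, current)
print(features)
```

In this sample the device `d9` never appears in the history of `u001`, so the device frequency is 0, matching the unused-device case in the example above.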
Predicting the risk of the user behavior from the behavior features includes the following steps:
The risk probability of the user behavior is calculated from the frequency with which the user uses the device, the activity frequency of the behavior, the time-point frequency, and the behavior-object access frequency. For example, the extracted time, user ID, device ID, activity name, and activity attribute, together with the frequency with which the user uses the device, the activity frequency of the behavior, the time-point frequency, and the behavior-object access frequency, are used as inputs to a trained neural network model, which outputs the risk probability of the user behavior.
If the calculated risk probability of the user behavior reaches a threshold value, an early warning is issued;
If the calculated risk probability of the user behavior does not reach the threshold value, the user behavior is stored as a historical user behavior pattern.
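The thresholding logic above may be illustrated with the following sketch. A single logistic unit with hand-picked weights stands in for the trained neural network model, whose architecture and parameters this embodiment does not specify; the weights, bias, threshold value, and feature names are all hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical parameters standing in for a trained model.
WEIGHTS = {"device_freq": 2.0, "activity_freq": 1.0, "hour_freq": 1.5, "object_freq": 1.5}
BIAS = -3.0
RISK_THRESHOLD = 0.5


def risk_probability(features: Dict[str, float]) -> float:
    """Logistic score in (0, 1): rarer behavior (lower frequency) gives higher risk."""
    z = BIAS + sum(w * (1.0 - features[name]) for name, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


def handle_behavior(features: Dict[str, float], history: List[Dict], alerts: List[Dict]) -> float:
    """Issue an early warning at or above the threshold; otherwise store the behavior."""
    probability = risk_probability(features)
    entry = {"features": features, "risk": probability}
    if probability >= RISK_THRESHOLD:
        alerts.append(entry)
    else:
        history.append(entry)
    return probability


history: List[Dict] = []
alerts: List[Dict] = []
unusual = {"device_freq": 0.0, "activity_freq": 0.0, "hour_freq": 0.0, "object_freq": 0.0}
usual = {"device_freq": 1.0, "activity_freq": 1.0, "hour_freq": 1.0, "object_freq": 1.0}
p_unusual = handle_behavior(unusual, history, alerts)
p_usual = handle_behavior(usual, history, alerts)
print(round(p_unusual, 3), round(p_usual, 3), len(alerts), len(history))
```

Behaviors below the threshold are appended to the history, so later frequency calculations can take them into account as part of the user's normal pattern.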
Example 2
An electronic device 200, as shown in FIG. 4, includes, but is not limited to: a memory 201 having program code stored thereon; and a processor 202 coupled to the memory, wherein the program code, when executed by the processor, implements the artificial intelligence and big data based user behavior prediction method.
Example 3
A computer readable storage medium having stored thereon program instructions that when executed implement an artificial intelligence and big data based user behavior prediction method, as shown in fig. 5.
Example 4
A computer program product comprising computer programs/instructions which when executed by a processor implement a user behavior prediction method based on artificial intelligence and big data.
In this specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments may be understood by reference to one another. In particular, because the device embodiments are substantially similar to the method embodiments, they are described more briefly, and the relevant parts of the method embodiments apply to them.
The foregoing description is illustrative of embodiments of the present disclosure and is not to be construed as limiting one or more embodiments of the present disclosure. Various modifications and alterations to one or more embodiments of this description will be apparent to those skilled in the art. Any modifications, equivalent substitutions, improvements, or the like, which are within the spirit and principles of one or more embodiments of the present disclosure, are intended to be included within the scope of the claims of one or more embodiments of the present disclosure.