CN114218569A - Data analysis method, device, equipment, medium and product - Google Patents

Publication number
CN114218569A
Authority
CN
China
Prior art keywords
data
risk
data analysis
model
time window
Prior art date
Legal status
Pending
Application number
CN202111552118.1A
Other languages
Chinese (zh)
Inventor
袁晟
廖敏飞
吴孟晴
梁伟韬
Current Assignee
China Construction Bank Corp
Original Assignee
China Construction Bank Corp
Priority date
Filing date
Publication date
Application filed by China Construction Bank Corp
Priority to CN202111552118.1A
Publication of CN114218569A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/554 Detecting local intrusion or implementing counter-measures involving event detection and direct action
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security

Abstract

The invention discloses a data analysis method, apparatus, device, medium and product, and relates to the technical field of big data analysis. The method comprises: determining the application scenario corresponding to each trigger operation received in a current time window; inputting the data to be evaluated corresponding to each trigger operation into a pre-created data analysis model corresponding to a second application scenario to obtain a corresponding intermediate state result; and inputting each intermediate state result into a pre-created risk scoring model to obtain a corresponding risk view. By analyzing the data to be evaluated with the data analysis model of the second application scenario and then feeding each intermediate state result into the risk scoring model, the embodiment of the invention addresses the problems of a single data analysis model, a single usage scenario and low accuracy in user behavior risk evaluation, and improves the accuracy of identifying abnormal behaviors.

Description

Data analysis method, device, equipment, medium and product
Technical Field
The embodiment of the invention relates to the field of big data, in particular to a data analysis method, a device, equipment, a medium and a product.
Background
User behavior risk detection is an important link in the risk control of Internet systems. As the number of users grows, such systems serve millions of online users every day, and the scale and complexity of the collected data keep increasing. Risk assessment of user behavior therefore becomes harder, and malicious behaviors hidden in large volumes of normal network traffic, such as logins by someone other than the account holder, execution of high-risk commands, fake accounts and zombie accounts, become harder to discover. In the prior art, system application logs are usually checked manually by keyword search or rule matching, and other data sources are then correlated by manual tracing. This approach requires manual keyword queries and manual screening of operation logs, so the workload is heavy and the approach suffers from high cost, low efficiency and low accuracy. Machine learning is regarded as an important means of automatically analyzing massive malicious behaviors, but the false alarm rate of existing machine learning models is too high in some scenarios.
Disclosure of Invention
In view of this, the present invention provides a data analysis method, apparatus, device, medium, and product, which improve the accuracy of identifying abnormal behavior.
In a first aspect, an embodiment of the present invention provides a data analysis method, where the method includes:
determining the application scenario corresponding to each trigger operation received in a current time window;
inputting the data to be evaluated corresponding to each trigger operation into a pre-created data analysis model corresponding to the second application scenario to obtain a corresponding intermediate state result;
and inputting each intermediate state result into a pre-created risk scoring model to obtain a corresponding risk view.
In a second aspect, an embodiment of the present invention further provides a data analysis apparatus, where the apparatus includes:
a first determining module, configured to determine the application scenario corresponding to each trigger operation received in the current time window;
an output module, configured to input the data to be evaluated corresponding to each trigger operation into a pre-created data analysis model corresponding to the second application scenario to obtain a corresponding intermediate state result;
and a second determining module, configured to input each intermediate state result into a pre-created risk scoring model to obtain a corresponding risk view.
In a third aspect, an embodiment of the present invention further provides a data analysis device, where the data analysis device includes: a memory, and one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data analysis method of any one of the embodiments described above.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the data analysis method according to any one of the above embodiments.
In a fifth aspect, an embodiment of the present invention further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the data analysis method according to any of the above embodiments.
According to the embodiment of the invention, the application scenario corresponding to each trigger operation received in the current time window is determined, the data to be evaluated corresponding to each trigger operation is input into the pre-created data analysis model corresponding to the second application scenario to obtain a corresponding intermediate state result, and each intermediate state result is then input into the pre-created risk scoring model to obtain a corresponding risk view. This addresses the problems of a single abnormal behavior detection model, a single usage scenario and low accuracy in user behavior risk evaluation, and improves the accuracy of identifying abnormal behaviors. Compared with the prior art, the data analysis method adds model iteration, reduces cost, and improves the efficiency and accuracy of identifying abnormal behaviors.
Drawings
FIG. 1 is a flow chart of a data analysis method provided by an embodiment of the invention;
FIG. 2 is a flow chart of another data analysis method provided by the embodiment of the invention;
FIG. 3 is a flow chart of another data analysis method provided by embodiments of the present invention;
FIG. 4 is a schematic flow chart illustrating another data analysis method according to an embodiment of the present invention;
FIG. 5 is a schematic illustration of a main flow chart of a data analysis method provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of a model iteration updating process according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a sample library construction and supervised learning model training process according to an embodiment of the present invention;
FIG. 8 is a block diagram of a data analysis apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a hardware structure of a data analysis device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
In the prior art, the Isolation Forest (iForest) is a common unsupervised learning model for attack detection in network security and for financial transaction fraud detection. It is a fast anomaly detection method based on ensemble learning, has linear time complexity and high accuracy, and works by finding the points that are easy to isolate. Suppose a random hyperplane is used to cut the data space: one cut produces two subspaces, each subspace is then cut again by a random hyperplane, and the process repeats until each subspace contains only one data point. Data points that are isolated easily, that is, after few cuts, can be regarded as anomalous. In the analysis of abnormal user behavior, however, a single detection method is not suitable for complex scenarios; moreover, accuracy can drop after the data is updated, and without model iteration the identification performance degrades.
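The isolation idea described above can be illustrated with a minimal sketch. It is not the patent's implementation: it assumes axis-parallel random cuts (the form used by the standard Isolation Forest, and a special case of a random hyperplane), and the names isolation_depth and mean_depth, the synthetic data and the tree count are all hypothetical.

```python
import random

def isolation_depth(point, data, rng, depth=0, max_depth=50):
    """Count the random cuts needed until `point` is alone in its subspace."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    dim = rng.randrange(len(point))            # pick a random feature
    lo = min(row[dim] for row in data)
    hi = max(row[dim] for row in data)
    if lo == hi:                               # no cut possible on this feature
        return depth
    cut = rng.uniform(lo, hi)                  # random axis-parallel cut
    # Keep only the side of the cut that still contains `point`.
    same_side = [row for row in data if (row[dim] < cut) == (point[dim] < cut)]
    return isolation_depth(point, same_side, rng, depth + 1, max_depth)

def mean_depth(point, data, n_trees=200, seed=0):
    """Average isolation depth over many random trees (lower = more anomalous)."""
    rng = random.Random(seed)
    return sum(isolation_depth(point, data, rng) for _ in range(n_trees)) / n_trees

# Synthetic data (assumption): 100 "normal" points near the origin, one far outlier.
gen = random.Random(42)
normal = [(gen.gauss(0, 1), gen.gauss(0, 1)) for _ in range(100)]
outlier = (8.0, 8.0)
data = normal + [outlier]

outlier_depth = mean_depth(outlier, data)
normal_depth = sum(mean_depth(p, data) for p in normal[:10]) / 10
```

In this sketch the outlier is separated after far fewer cuts on average than a typical point in the dense cluster, which is the property the paragraph relies on.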
In view of this, the embodiment of the present invention provides a data analysis method, which solves the problems of single abnormal behavior detection model, single usage scenario, and low accuracy rate in user behavior risk assessment, and improves the identification accuracy rate of abnormal behavior.
In the technical solution of the present invention, the acquisition, storage, use and processing of data comply with the relevant provisions of national laws and regulations. In an embodiment, fig. 1 is a flowchart of a data analysis method provided in an embodiment of the present invention, and this embodiment is applicable to the case where security analysis is automatically performed on user behavior. The present embodiment may be performed by a data analysis device, which may be a computer. As shown in fig. 1, the present embodiment may include the following steps:
S110: Determine the application scenario corresponding to each trigger operation received in the current time window.
The application scenarios include a first application scenario obtained by coarse-grained division and a second application scenario obtained by fine-grained division. The second application scenario is contained in the first application scenario and is obtained by subdividing it.
Before the trigger operations of a user are analyzed, time is divided according to a preset duration to obtain a plurality of time windows. The current time window is the time period for which data analysis is currently required. A trigger operation can be understood as one or more operations performed by the user within the current time window. For example, the trigger operation may be an operation on the login page, a menu selection, or a page operation during a transaction; the present embodiment is not limited thereto.
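As a minimal sketch of the time division described above, the following assigns each trigger operation to a fixed-length window. The 300-second window length, the field names ts and action, and the sample operations are assumptions for illustration only; the patent does not specify them.

```python
def window_index(timestamp_s, window_length_s):
    """Index of the fixed-length time window that contains the timestamp."""
    return int(timestamp_s // window_length_s)

def group_by_window(operations, window_length_s):
    """Group trigger operations by the time window they fall into."""
    windows = {}
    for op in operations:
        idx = window_index(op["ts"], window_length_s)
        windows.setdefault(idx, []).append(op)
    return windows

# Hypothetical trigger operations with timestamps in seconds.
ops = [
    {"ts": 10, "action": "open_login_page"},
    {"ts": 290, "action": "select_menu"},
    {"ts": 310, "action": "submit_transfer"},
]
windows = group_by_window(ops, window_length_s=300)
```

With a 300-second window, the first two operations share window 0 and the third falls into window 1.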
In this embodiment, after the trigger operations of a user are received in the current time window, the application scenario corresponding to each of them is determined. Application scenarios may be divided at coarse granularity, at fine granularity, or both. When the division is coarse-grained, the resulting application scenario covers a wide range; when it is fine-grained, the resulting application scenario covers a narrower range. In practice, a wide-range application scenario may be divided into a plurality of narrower application scenarios. For example, a wide-range application scenario may be the bank transaction flow, and the narrower application scenarios may be login, transfer and remittance. As another example, the wide-range scenario may be the login page, and the narrower scenarios may be the different login modes, such as login with an SMS verification code, login with a password, or login with a slider; the present embodiment is not limited thereto.
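The coarse and fine division can be represented as a two-level lookup. The sketch below is an assumption about one possible representation; the scenario names, the action names and the function determine_scenario are hypothetical and only mirror the login example in the text.

```python
# Second (fine-grained) scenario -> first (coarse-grained) scenario that contains it.
SECOND_TO_FIRST = {
    "login_sms_code": "login",
    "login_password": "login",
    "login_slider": "login",
}

# Hypothetical mapping from a trigger operation to its second scenario.
ACTION_TO_SECOND = {
    "submit_sms_code": "login_sms_code",
    "submit_password": "login_password",
    "drag_slider": "login_slider",
}

def determine_scenario(action):
    """Return (first_scenario, second_scenario) for an action, or None if unknown."""
    second = ACTION_TO_SECOND.get(action)
    if second is None:
        return None
    return SECOND_TO_FIRST[second], second

scenario = determine_scenario("submit_sms_code")
```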
S120: Input the data to be evaluated corresponding to each trigger operation into a pre-created data analysis model corresponding to the second application scenario to obtain a corresponding intermediate state result.
In this embodiment, the data to be evaluated can be understood as data obtained by applying a series of operations, such as preprocessing, to the raw data and the risk scoring result of the previous time window. The raw data may be application log data, user behavior data, or information about the network environment; the present embodiment is not limited thereto. The risk scoring result of the previous time window can be understood as the risk scoring result obtained in the previous time period of data analysis; for example, it may be a relatively high score or a relatively low score; the present embodiment is not limited thereto.
In this embodiment, the data analysis model can be understood as a pre-created model used to analyze the data to be evaluated corresponding to each trigger operation. Each second application scenario has a corresponding data analysis model. For example, the second application scenarios and the data analysis models may be in one-to-one correspondence, or a second application scenario may correspond to a plurality of data analysis models (a one-to-many relationship). The data analysis model may be an unsupervised learning model, a rule analysis model, or a supervised learning model; the present embodiment is not limited thereto.
In this embodiment, the intermediate state result can be understood as a preliminary judgment obtained by analyzing the output of the data analysis model. For example, the intermediate state result may be a preliminary judgment of a normal state or a preliminary judgment of an abnormal state.
In this embodiment, after determining the application scenario corresponding to each received trigger operation within the current time window, the data to be evaluated corresponding to each trigger operation may be input into a pre-created data analysis model corresponding to the second application scenario to obtain a corresponding intermediate state result.
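A minimal sketch of this step, assuming a registry that maps each second application scenario to one or more model functions. The two toy models, the feature names and the registry contents are hypothetical; real data analysis models would be trained or configured elsewhere.

```python
def region_rule_model(features):
    """Toy rule analysis model (assumption): flag a sudden change of region."""
    return "abnormal" if features.get("region_changed") else "normal"

def failed_attempts_model(features):
    """Toy threshold model (assumption): flag many failed attempts."""
    return "abnormal" if features.get("failed_attempts", 0) > 3 else "normal"

# One second scenario may map to one model or to several (one-to-many).
MODELS_BY_SCENARIO = {
    "login_password": [region_rule_model, failed_attempts_model],
    "transfer": [region_rule_model],
}

def intermediate_results(second_scenario, features):
    """Run every model registered for the scenario; return (model_name, state) pairs."""
    return [(m.__name__, m(features)) for m in MODELS_BY_SCENARIO[second_scenario]]

results = intermediate_results(
    "login_password", {"region_changed": True, "failed_attempts": 1}
)
```

Each pair in results is one intermediate state result, a preliminary normal or abnormal judgment from a single model.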
S130: Input each intermediate state result into a pre-created risk scoring model to obtain a corresponding risk view.
The risk view can be understood as a view of the user's risk behavior, in which risk levels are assigned according to the risk scoring model. In practice, the risk level of the user influences the risk scoring result of the next time window. For example, the risk level of the user in the current time period may be divided into a first level, a second level and a third level, where the first level is the highest and the second and third levels decrease in sequence. The higher the risk level of the user in the current time period, the higher the risk that the user's behavior is abnormal and, correspondingly, the greater the influence on the next time window.
Specifically, the risk scoring model may dynamically set the risk score of the intermediate state result of each data analysis model according to the content of that intermediate state result, calculate the risk scores of the different data analysis models within the time period, and make a comprehensive judgment.
In this embodiment, after the intermediate state result of each data analysis model is obtained, each intermediate state result is input into the pre-created risk scoring model to obtain the corresponding risk view. Note that the intermediate state results are produced by the data analysis models: a plurality of data analysis models yields a plurality of intermediate state results, and all of these are fed into one risk scoring model. The score of each data analysis model can then be obtained, and a comprehensive judgment over these scores yields the corresponding risk view.
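One way to picture the comprehensive judgment is a weighted vote over the intermediate state results. The sketch below is an assumption, not the patent's scoring formula: the weights, the thresholds 0.7 and 0.3 and the function risk_view are hypothetical, and level 1 is taken as the highest risk level as in the text.

```python
def risk_view(results, weights, high=0.7, medium=0.3):
    """Combine (model_name, state) pairs into a risk score and a risk level.

    score = weight of the models reporting "abnormal" / total weight.
    Level 1 is the highest risk, level 3 the lowest (thresholds are assumptions).
    """
    total = sum(weights[name] for name, _ in results)
    abnormal = sum(weights[name] for name, state in results if state == "abnormal")
    score = abnormal / total
    if score >= high:
        level = 1
    elif score >= medium:
        level = 2
    else:
        level = 3
    return {"score": score, "level": level}

# Hypothetical intermediate state results and per-model weights.
results = [("rule_model", "abnormal"), ("cluster_model", "normal")]
weights = {"rule_model": 3, "cluster_model": 2}
view = risk_view(results, weights)
```

The resulting score could also be carried forward as the previous-window risk scoring result for the next time window.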
According to the embodiment of the invention, the application scenario corresponding to each trigger operation received in the current time window is determined, the data to be evaluated corresponding to each trigger operation is input into the pre-created data analysis model corresponding to the second application scenario to obtain a corresponding intermediate state result, and each intermediate state result is then input into the pre-created risk scoring model to obtain a corresponding risk view. This addresses the problems of a single abnormal behavior detection model, a single usage scenario and low accuracy in user behavior risk evaluation, and improves the accuracy of identifying abnormal behaviors. Compared with the prior art, the data analysis method reduces the manpower and material resources required for data analysis and detection, reduces cost, and improves the efficiency and accuracy of identifying abnormal behaviors.
In an embodiment, fig. 2 is a flowchart of another data analysis method provided in an embodiment of the present invention, and the embodiment is further detailed based on the above embodiments. As shown in fig. 2, the data analysis method in this embodiment may specifically include the following steps:
S210: Determine the application scenario corresponding to each trigger operation received in the current time window.
S220: Acquire the raw data in the current time window and the risk scoring result of the previous time window.
The raw data may include one of the following: application log data, user behavior data, device fingerprints, and network environment information. Application log data can be understood as data of the log type, usually with a name ending in log, which records the user's behavioral traces. The application log data may be data generated by the user within one day or over some other period of time; the present embodiment is not limited thereto.
In this embodiment, the user behavior data can be understood as the operations performed by the user within the current time window. For example, the user behavior data may be data about the user clicking on a gambling website or on a fraud website; the present embodiment is not limited thereto.
In this embodiment, the device fingerprint may be the fingerprint recorded when the user uses a smartphone, the fingerprint recorded when the user makes queries on a computer, or the fingerprint recorded when the user makes queries on an iPad; the present embodiment is not limited thereto. The network environment information is information about the network environment being accessed, which may be a secure network environment or a dangerous one, such as when browsing a gambling website or a fraud website.
It is understood that there are many forms of raw data, and for example, raw data may be text data; the data can also be image data, audio data or a mixture of several kinds of data; the present embodiment is not limited thereto.
In this embodiment, the raw data in the current time window and the risk scoring result of the previous time window are acquired. The risk scoring result of the previous time window can be understood as the risk scoring result obtained for the previous time window from the application scenario, the data analysis model and the risk scoring model. Using the risk scoring result of the previous time window as one of the inputs makes the data analysis result more accurate and improves efficiency.
S230: Preprocess the raw data and the risk scoring result to obtain intermediate data in a target data format.
Data preprocessing can be understood as the normalization of attribute values. The intermediate data is the data in the target data format obtained after preprocessing. The target data format depends on the configuration parameters of the current application scenario and is not limited here. Preprocessing the raw data and the risk scoring result into intermediate data in the target data format facilitates the subsequent data analysis and improves its efficiency.
In this embodiment, after the raw data in the current time window and the risk scoring result of the previous time window are acquired, both are preprocessed to obtain intermediate data in the target data format. The raw data may be preprocessed by parsing, cleaning and standardizing it into the unified data format required for feature extraction, which yields the intermediate data in the target data format.
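The parsing, cleaning and standardizing steps can be sketched as follows. This is a minimal illustration under stated assumptions: the record fields user and amount, the min-max normalization and the way the previous risk score is attached are hypothetical choices, not the target data format of the patent.

```python
def min_max_normalize(values):
    """Scale numeric attribute values into the range [0, 1]."""
    if not values:
        return []
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def preprocess(raw_records, previous_risk_score):
    """Clean raw records, normalize one attribute, attach the previous risk score."""
    # Cleaning: drop records whose numeric attribute is missing.
    cleaned = [r for r in raw_records if r.get("amount") is not None]
    # Parsing + standardizing: convert to float, then min-max normalize.
    amounts = min_max_normalize([float(r["amount"]) for r in cleaned])
    return [
        {
            "user": r["user"].strip().lower(),
            "amount": a,
            "prev_risk": previous_risk_score,
        }
        for r, a in zip(cleaned, amounts)
    ]

# Hypothetical raw records for one time window.
raw = [
    {"user": " Alice ", "amount": "100"},
    {"user": "BOB", "amount": "300"},
    {"user": "carol", "amount": None},
]
intermediate = preprocess(raw, previous_risk_score=0.4)
```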
S240: Perform a feature construction operation on the intermediate data to obtain the corresponding data to be evaluated.
The feature construction operation can be understood as analyzing the data derived from the raw data and the risk scoring result and extracting features from it.
In this embodiment, each application scenario has its own raw data and risk scoring result, and the data of each application scenario must be analyzed and its features extracted to obtain the corresponding data to be evaluated.
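As a hedged illustration of feature construction, the sketch below aggregates intermediate records into per-user features for one time window. The chosen features (operation count, number of distinct regions, previous risk score) and all field names are assumptions; the patent does not enumerate specific features.

```python
from collections import defaultdict

def build_features(intermediate, prev_risk):
    """Aggregate intermediate records into one feature dict per user."""
    per_user = defaultdict(lambda: {"op_count": 0, "regions": set()})
    for rec in intermediate:
        acc = per_user[rec["user"]]
        acc["op_count"] += 1
        acc["regions"].add(rec["region"])
    return {
        user: {
            "op_count": acc["op_count"],
            "distinct_regions": len(acc["regions"]),
            # Risk score of the previous time window, 0.0 if the user is new.
            "prev_risk": prev_risk.get(user, 0.0),
        }
        for user, acc in per_user.items()
    }

# Hypothetical intermediate records for one time window.
records = [
    {"user": "alice", "region": "Beijing"},
    {"user": "alice", "region": "Shanghai"},
    {"user": "bob", "region": "Beijing"},
]
features = build_features(records, prev_risk={"alice": 0.6})
```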
S250: Determine the corresponding data analysis model according to the application scenario and the data characteristics of the raw data corresponding to the trigger operation.
In this embodiment, the corresponding data analysis model is determined according to the application scenario and the data characteristics of the raw data corresponding to the trigger operation.
Note that different application scenarios select different data analysis models, and each application scenario is free to choose its own. The data analysis model can be chosen freely according to the specific application scenario and the data characteristics of the raw data; one application scenario may need a plurality of data analysis models for model training, while another may need only one.
In this embodiment, the data analysis model is selected from among the unsupervised learning models, rule analysis models and supervised learning models according to the application scenario and the data characteristics of the raw data corresponding to the trigger operation, and different application scenarios correspond to different data analysis models. Illustratively, if there are n application scenarios in total, n models need to be selected from the data analysis models. How many of them are taken from the unsupervised learning models, the rule analysis models and the supervised learning models is not fixed; the selected models may be unsupervised learning models, rule analysis models or supervised learning models, as judged from the specific scenario.
In this embodiment, S250 and the group S220, S230 and S240 need not be executed in a particular order. After the application scenario corresponding to each trigger operation received in the current time window is determined, S220, S230 and S240 may be executed first and S250 afterwards; S250 may be executed first and S220, S230 and S240 afterwards; or S220, S230 and S240 may be executed at the same time as S250. The present embodiment is not limited herein.
S260: Input the data to be evaluated corresponding to each trigger operation into a pre-created data analysis model corresponding to the second application scenario to obtain a corresponding intermediate state result.
S270: Input each intermediate state result into a pre-created risk scoring model to obtain a corresponding risk view.
S280: Label the corresponding raw data according to the risk scoring result output by the risk scoring model to obtain corresponding first type label data.
The first type label data can be understood as the raw data after labeling, marked as either normal data or abnormal data. For example, if the raw data is labeled normal, the corresponding first type label data is normal data; conversely, if it is labeled abnormal, the corresponding first type label data is abnormal data.
In this embodiment, the corresponding raw data is labeled according to the risk scoring result output by the risk scoring model to obtain the corresponding first type label data, which is then stored in a pre-created user behavior sample library.
S290: Store the first type label data in the pre-created user behavior sample library.
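Steps S280 and S290 can be sketched as labeling each raw record from its risk score and appending it to a sample library. The 0.5 threshold, the in-memory list used as the library and the record layout are assumptions made only for this illustration.

```python
def label_record(raw_record, risk_score, threshold=0.5):
    """Label a raw record as normal or abnormal from its risk score (first type label)."""
    label = "abnormal" if risk_score >= threshold else "normal"
    return {"data": raw_record, "label": label, "label_type": "first"}

def store(sample_library, labeled_record):
    """Append a labeled record to the user behavior sample library."""
    sample_library.append(labeled_record)
    return sample_library

# Hypothetical in-memory sample library; a real system would likely use a database.
sample_library = []
store(sample_library, label_record({"user": "alice", "action": "login"}, 0.8))
store(sample_library, label_record({"user": "bob", "action": "login"}, 0.1))
```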
In an embodiment, a creating method of a pre-created user behavior sample library includes:
automatically performing label processing on the original data through an unsupervised learning model and a rule analysis model to obtain corresponding second type label data;
and creating a corresponding user behavior sample library according to the second type label data.
Note that the second type label data is obtained by automatically labeling the raw data with an unsupervised learning model and a rule analysis model, whereas the first type label data is obtained by labeling the corresponding raw data according to the risk scoring result output by the risk scoring model. The second type label data is therefore acquired before the first type label data.
The second type label data can be understood as the label data obtained after the raw data is automatically labeled by the unsupervised learning model and the rule analysis model. An unsupervised learning model can be understood as a model that is trained directly on samples without label classification; it is trained on all of the raw data.
In the present embodiment, the rule analysis model can be understood as a set of rules derived from routine experience. Illustratively, when a user logs in to the bank transaction flow with a mobile phone and the region, the login time or the login mode changes suddenly, the rule analysis model automatically labels the raw data and marks the login operation as abnormal data.
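The sudden-change rules in this example can be sketched as a comparison of a login event against the user's usual profile. The profile fields, the three-hour tolerance and the function rule_label are hypothetical; the patent only states that sudden changes of region, login time or login mode are labeled abnormal.

```python
def rule_label(event, profile, hour_tolerance=3):
    """Label a login event by comparing it with the user's usual profile.

    Returns (label, reasons); any triggered rule makes the event abnormal.
    """
    reasons = []
    if event["region"] != profile["usual_region"]:
        reasons.append("region")
    # Circular hour difference, so 23:00 and 01:00 are two hours apart.
    diff = abs(event["hour"] - profile["usual_hour"]) % 24
    if min(diff, 24 - diff) > hour_tolerance:
        reasons.append("time")
    if event["login_mode"] != profile["usual_login_mode"]:
        reasons.append("login_mode")
    return ("abnormal" if reasons else "normal"), reasons

# Hypothetical user profile and login events.
profile = {"usual_region": "Beijing", "usual_hour": 10, "usual_login_mode": "password"}
suspicious = rule_label(
    {"region": "Shanghai", "hour": 3, "login_mode": "password"}, profile
)
routine = rule_label(
    {"region": "Beijing", "hour": 11, "login_mode": "password"}, profile
)
```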
In this embodiment, the original data is automatically subjected to label processing through the unsupervised learning model and the rule analysis model to obtain corresponding second-type label data, so as to create a corresponding user behavior sample library according to the second-type label data.
In one embodiment, the data analysis method further includes:
acquiring first type label data and/or second type label data in a user behavior sample library;
the supervised learning model is trained using the first type of label data and/or the second type of label data.
A supervised learning model can be understood as a model whose training samples are first classified by labels, for example into normal data and abnormal data, and which is then trained on those labeled samples. In other words, the supervised learning model is trained in a targeted manner.
In the implementation, after the first type tag data and/or the second type tag data in the user behavior sample library are obtained, the supervised learning model is trained by using the first type tag data and/or the second type tag data.
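Illustratively, training on labeled samples may be sketched with a nearest-centroid classifier. The classifier is a deliberately simple stand-in chosen so that the example has no external dependencies; it is not the supervised learning model of the embodiment, and the function names `train_centroids` and `predict` as well as the two-dimensional feature vectors are assumptions.

```python
from collections import defaultdict
from math import dist


def train_centroids(samples):
    """samples: iterable of (feature_vector, label). Returns {label: centroid}."""
    sums = {}
    counts = defaultdict(int)
    for features, label in samples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        # Accumulate the per-label sum of each feature.
        sums[label] = [s + x for s, x in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest to the feature vector."""
    return min(centroids, key=lambda label: dist(centroids[label], features))
```

Here the labeled samples play the role of the first type and/or second type of label data taken from the user behavior sample library.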
According to the technical solution of this embodiment of the invention, on the basis of the foregoing embodiments, the raw data in the current time window and the risk scoring result of the previous time window are obtained, and data preprocessing is performed on both to obtain intermediate data in the target data format. The risk scoring result of the previous time window thus also serves as one of the input parameters of the risk assessment model, which further improves the efficiency and accuracy of identifying abnormal behaviors. In addition, label processing is performed on the corresponding raw data according to the risk scoring result output by the risk scoring model to obtain the corresponding first type of label data, and the first type of label data is stored in the pre-created user behavior sample library. Because the relevant data is labeled automatically, sample libraries such as a malicious behavior library, a normal behavior library, and a suspicious behavior library can be constructed, which makes it possible to train a supervised learning model.
In an embodiment, fig. 3 is a flowchart of another data analysis method provided in an embodiment of the present invention, and on the basis of the foregoing embodiments, the present embodiment further refines an application scenario corresponding to each trigger operation received in a current time window, inputs to-be-evaluated data corresponding to each trigger operation into a pre-created data analysis model corresponding to a second application scenario, obtains a corresponding intermediate state result, and inputs each intermediate state result into a pre-created risk scoring model, so as to obtain a corresponding risk view. As shown in fig. 3, the data analysis method in this embodiment may specifically include the following steps:
S310, determining a first application scenario corresponding to each trigger operation received in the current time window.
In this embodiment, a first application scenario corresponding to each trigger operation received in the current time window may be determined. Wherein the first application scenario may be a transaction, a transfer, a remittance, and the like.
S320, determining a second application scenario of each trigger operation in the first application scenario; wherein the second application scenario is contained within the first application scenario.
The second application scenario can be understood as a sub-scenario within the first application scenario. For example, when the first application scenario is a transaction, the second application scenario may be login, menu click, transfer, and the like; when the first application scenario is login, the second application scenario may be login by short message verification code, login by password input, or login by slider verification. This embodiment is not limited thereto.
In this embodiment, after determining the first application scenario corresponding to each trigger operation received in the current time window, a second application scenario of each trigger operation in the first application scenario may be determined.
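Illustratively, the two-level determination of S310 and S320 may be sketched as two table lookups. This is a minimal sketch under stated assumptions: the trigger operation names and both mapping tables are hypothetical and serve only to show that the second application scenario is resolved inside the first.

```python
# Hypothetical mapping from a trigger operation to its first (coarse) scenario.
FIRST_SCENARIO = {
    "submit_password": "transaction",
    "open_menu": "transaction",
    "confirm_transfer": "transaction",
}

# Hypothetical mapping, per first scenario, from a trigger operation
# to its second (fine-grained) scenario.
SECOND_SCENARIO = {
    "transaction": {
        "submit_password": "login",
        "open_menu": "menu_click",
        "confirm_transfer": "transfer",
    },
}


def resolve_scenarios(operation: str):
    """Return (first_scenario, second_scenario) for one trigger operation."""
    first = FIRST_SCENARIO[operation]           # corresponds to S310
    second = SECOND_SCENARIO[first][operation]  # corresponds to S320
    return first, second
```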
S330, inputting the data to be evaluated corresponding to each trigger operation into a pre-created data analysis model corresponding to the second application scenario.
In this embodiment, after the second application scenario is determined, the data to be evaluated corresponding to each trigger operation may be input to the pre-created data analysis model corresponding to the second application scenario, so that the data to be evaluated can be processed accordingly.
S340, clustering and screening the data to be evaluated through the data analysis model to obtain a corresponding intermediate state result.
Clustering can be understood as a major class of unsupervised learning in which a data set is divided into different classes or clusters according to a certain criterion, for example a distance criterion, so that the similarity of data objects within the same cluster is as large as possible and the difference between data objects in different clusters is also as large as possible. After clustering, data of the same class are gathered together as far as possible, and data of different classes are separated as far as possible. Clustering methods can be categorized as partition-based, hierarchy-based, density-based, grid-based, model-based, and so on; common models include K-Means, DBSCAN, and GMM.
In this embodiment, after the data to be evaluated corresponding to each trigger operation is input to the data analysis model corresponding to the second application scenario, the data to be evaluated may be clustered and screened through the data analysis model to obtain a corresponding intermediate state result.
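Illustratively, clustering followed by screening may be sketched with a small K-Means implementation. This is a minimal sketch under stated assumptions: the initial centroids are supplied by the caller so that the result is deterministic, and "screening" is interpreted here as flagging the points that fall into unusually small clusters, which is an assumption and not a definition taken from the embodiment. The names `kmeans`, `screen_small_clusters`, and `max_share` are hypothetical.

```python
from math import dist


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd iterations starting from caller-supplied centroids."""
    centroids = [list(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Move every centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


def screen_small_clusters(assignment, max_share=0.2):
    """Return indices of points in clusters holding at most max_share of the data."""
    total = len(assignment)
    sizes = {k: assignment.count(k) for k in set(assignment)}
    return [i for i, a in enumerate(assignment) if sizes[a] / total <= max_share]
```

The cluster index of each point, together with the list of screened indices, would then serve as the intermediate state result passed on to the risk scoring model.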
S350, inputting the intermediate state result output by the data analysis model corresponding to each second application scenario into a pre-created risk scoring model to obtain the risk score corresponding to each data analysis model.
In this embodiment, the intermediate state result output by the data analysis model corresponding to each second application scenario is input into the pre-created risk scoring model, so as to obtain the risk score corresponding to each data analysis model. Illustratively, if the second application scenarios include scenario 1, scenario 2, and scenario 3, and the corresponding data analysis models are model 1, model 2, and model 3, respectively, the intermediate state result output by each data analysis model is obtained and then input into the pre-created risk scoring model, yielding the risk score of model 1, the risk score of model 2, and the risk score of model 3.
S360, determining a risk scoring result of the current time window according to the risk score corresponding to each data analysis model, the predetermined abnormal score, and the risk scoring result of the previous time window.
In this embodiment, the risk scoring result of the current time window is determined according to the risk scoring corresponding to each data analysis model, the predetermined abnormal scoring, and the risk scoring result of the previous time window.
In this embodiment, the manner of determining the risk score result of the current time window according to the risk score corresponding to each data analysis model, the predetermined abnormal score, and the risk score result of the previous time window may be that the risk score corresponding to the data analysis model, the predetermined abnormal score, and the risk score result of the previous time window are weighted and averaged to obtain the risk score result of the current time window.
In an embodiment, determining the risk score result of the current time window according to the risk score corresponding to each data analysis model, the predetermined abnormal score and the risk score result of the previous time window includes:
and carrying out weighted average on the risk score corresponding to the data analysis model, the predetermined abnormal score and the risk score result of the previous time window to obtain the risk score result of the current time window.
In this embodiment, the risk score result of the current time window may be obtained by performing a weighted average on the risk score corresponding to the data analysis model, the predetermined abnormal score, and the risk score result of the previous time window. It can be understood that the risk scoring result of the previous time window affects the risk scoring result of the current time window, and if the risk scoring result of the previous time window is relatively low, the impact on the risk scoring result of the current time window is relatively low; conversely, if the risk score result of the previous time window is relatively high, the impact on the risk score result of the current time window is relatively high.
In this embodiment, the higher the risk score of the current time window, the higher the risk present; conversely, a lower risk score for the current time window indicates a lower risk. Illustratively, the following formula may be used: $\mathrm{Score}_n = (1-\alpha) \cdot \mathrm{Score}_{n-1} + \mathrm{Score}_{category} + \mathrm{Score}_{anomaly}$, where $\mathrm{Score}_n$ represents the risk scoring result of the current time window, $\mathrm{Score}_{n-1}$ represents the risk scoring result of the previous time window, $\mathrm{Score}_{category}$ represents the risk score corresponding to each data analysis model, $\mathrm{Score}_{anomaly}$ represents the predetermined abnormal score, and $\alpha$ is a decay factor. Each parameter can be set dynamically.
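Illustratively, the score update may be sketched as a single function. This is a minimal sketch under stated assumptions: the function name `update_risk_score` and the default decay factor are hypothetical, and summing the per-model risk scores to form the category term is an assumption, since the embodiment does not state how the scores of several data analysis models are combined.

```python
def update_risk_score(previous_score, category_scores, anomaly_score, alpha=0.3):
    """Score_n = (1 - alpha) * Score_{n-1} + Score_category + Score_anomaly."""
    # Assumption: the category term is the sum of the per-model risk scores.
    score_category = sum(category_scores)
    return (1 - alpha) * previous_score + score_category + anomaly_score
```

For example, with a previous score of 40, per-model scores of 30 and 10, a predetermined abnormal score of 5, and a decay factor of 0.5, the function returns 65.0.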
In one embodiment, the data analysis model includes at least one of the following types: unsupervised learning models, rule analysis models, and supervised learning models.
S370, obtaining a corresponding risk view according to the risk scoring result of the current time window.
In this embodiment, after obtaining the risk scoring result of the current time window according to the risk scoring corresponding to each data analysis model, the predetermined abnormal score, and the risk scoring result of the previous time window, the corresponding risk view may be obtained according to the risk scoring result of the current time window.
In the embodiment of the invention, a first application scene corresponding to each trigger operation received in a current time window is determined; determining a second application scenario of each trigger operation in the first application scenario; inputting the data to be evaluated corresponding to each trigger operation into a pre-established data analysis model corresponding to a second application scene; and clustering and screening the data to be evaluated through a data analysis model to obtain a corresponding intermediate state result. Inputting the intermediate state result output by the data analysis model corresponding to each second application scene into a pre-established risk score model to obtain a risk score corresponding to each data analysis model; determining a risk scoring result of the current time window according to the risk scoring corresponding to each data analysis model, the predetermined abnormal scoring and the risk scoring result of the previous time window; and obtaining a corresponding risk view according to the risk scoring result of the current time window. 
In the embodiment of the invention, a first application scene corresponding to each trigger operation received in a current time window is determined; determining a second application scenario of each trigger operation in the first application scenario; the data to be evaluated corresponding to each trigger operation is input into a pre-established data analysis model corresponding to a second application scene, so that the problems of low accuracy and low efficiency caused by using a single scene and a single data analysis model in user behavior risk evaluation are solved; and determining the risk scoring result of the current time window according to the risk scoring corresponding to each data analysis model, the predetermined abnormal scoring and the risk scoring result of the previous time window, so that the identification efficiency and accuracy of the abnormal behavior are further improved.
In an embodiment, fig. 4 is a schematic flow chart of another data analysis method according to an embodiment of the present invention. This embodiment is based on the above-mentioned embodiments, and is a preferred embodiment, which describes the process of the data analysis method, and the method includes the following steps:
S410, dividing application scenarios.
Drawing on the idea of ensemble learning, the analysis application scenario is split, and a data analysis model and analysis rules are selected for each application scenario; for example, a transaction flow is divided into login, menu click, transfer, and the like.
S420, collecting raw data.
A time window is set, and raw data such as system application log data, user behavior data, device fingerprints, and network environment information are collected.
S430, preprocessing data.
Preprocessing operations such as parsing, cleaning, and standardization are performed on the raw data, so that the raw data is unified into a data format accepted by the feature extraction module.
S440, constructing features.
The preprocessed data is converted into a feature matrix, and operations such as normalization and dimension reduction are performed on the matrix for use by the data analysis model.
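Illustratively, the normalization applied to the feature matrix may be sketched as column-wise min-max scaling. This is a minimal sketch under stated assumptions: the embodiment does not specify which normalization is used, so min-max scaling and the function name `min_max_normalize` are assumptions, and dimension reduction is omitted.

```python
def min_max_normalize(matrix):
    """Scale every column of a row-major feature matrix into [0, 1]."""
    columns = list(zip(*matrix))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        # A constant column carries no information and is mapped to 0.0.
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs)]
        for row in matrix
    ]
```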
S450, training the data analysis model.
Machine learning models and rule analysis models are trained using the feature matrix; the models include common clustering models, tree models (such as isolation forest, iForest), and the like, and are used to cluster and screen the data.
S430, S440, and S450 may be collectively referred to as the data analysis model.
S460, scoring by the risk scoring model.
The intermediate state results are aggregated and pushed to the risk scoring model. The risk scoring model can dynamically set the risk scores of different analysis results according to the scenario, and calculates the risk score of the user within the time window.
S470, comprehensive study and judgment.
Drawing on the idea of ensemble learning, and considering that user behavior is correlated over time, the risk score corresponding to the data analysis model, the predetermined abnormal score, and the risk scoring result of the previous time window are weighted and averaged to obtain the risk scoring result of the current time window. That is, the risk score is also affected by the user's risk score in the previous time window.
The results of multiple models are combined, based on the idea of ensemble learning, to obtain a comprehensive analysis result, from which the final score is calculated.
The calculation may be as follows: $\mathrm{Score}_n = (1-\alpha) \cdot \mathrm{Score}_{n-1} + \mathrm{Score}_{category} + \mathrm{Score}_{anomaly}$, where $\mathrm{Score}_n$ represents the risk scoring result of the current time window, $\mathrm{Score}_{n-1}$ represents the risk scoring result of the previous time window, $\mathrm{Score}_{category}$ represents the risk score corresponding to each data analysis model, $\mathrm{Score}_{anomaly}$ represents the predetermined abnormal score, and $\alpha$ is a decay factor. Each parameter can be set dynamically.
S480, risk view.
In the user's risk behavior view, risk levels may be ranked according to the risk scores.
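Illustratively, a risk view ranked by risk score may be sketched as follows. This is a minimal sketch under stated assumptions: the level names, the thresholds of 80 and 50, and the function name `build_risk_view` are hypothetical, since the embodiment only states that risk levels are ranked according to the risk scores.

```python
def build_risk_view(scores, thresholds=((80, "high"), (50, "medium"))):
    """scores: {user_id: risk_score}. Returns (user, score, level) by descending score."""

    def level(score):
        # Thresholds are checked from the highest bound downwards.
        for bound, name in thresholds:
            if score >= bound:
                return name
        return "low"

    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(user, score, level(score)) for user, score in ranked]
```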
S460, S470, and S480 may be collectively referred to as the risk scoring model.
An iterative update operation is performed between the data analysis model and the risk assessment model.
S490, behavior sample library.
After the corresponding risk view is obtained, data label processing is performed on the corresponding raw data according to the output risk scoring result, the processed data is stored in the user behavior sample library, and the supervised learning model is trained on that library.
Fig. 5 is a schematic main flow chart of a data analysis method according to an embodiment of the present invention. The main process is a process of analyzing the risk level of the user under a specific time window. The process comprises the steps of constructing a data analysis model and a risk scoring model. As shown in fig. 5, the steps of the main flow chart of the data analysis method are as follows:
a1, dividing application scenarios: one application scenario is divided into a plurality of smaller application scenarios.
a2, collecting raw data.
Raw data such as system application log data, user behavior data, device fingerprints, and network environment information are collected.
a3, under different application scenarios, performing model selection and model training in the data analysis model by parsing the raw data and extracting features from it. The data analysis model includes: supervised learning models, unsupervised learning models, and rule analysis models.
a4, obtaining an intermediate state result through the data analysis model, and performing model scoring and comprehensive study and judgment in the risk assessment model to obtain a risk view.
Fig. 6 is a schematic diagram of a model iterative update flow provided in an embodiment of the present invention. In the iterative update process, the user's result for the previous time window is taken as one of the inputs for the next time window, and the models and rules are optimized by comparing the analysis score calculated by the risk scoring model with the results of manual tracing of the user. As shown in fig. 6, the model iterative update flow is as follows:
S610, inputting the Tn analysis result into the corresponding data analysis model.
The Tn analysis result represents the risk scoring result of the current time window, which can be obtained by executing steps S420 to S480 of the data analysis flow.
S620, inputting the Tn+1 analysis result into the corresponding data analysis model.
The Tn+1 analysis result represents the risk scoring result of the previous time window. It should be noted that the risk scoring result of the previous time window may be a risk scoring result within the previous time window obtained according to the application scenario, the data analysis model, and the risk analysis model.
In an embodiment, the risk scoring result of the current time window obtained in S610 and the risk scoring result of the previous time window obtained in S620 may be used as input data, so that the result of data analysis may be more accurate, and accuracy and efficiency are improved.
S630, inputting into the risk assessment model the optimization result obtained by analyzing the Tn analysis result and the Tn+1 analysis result through the data analysis model.
The data analysis model is a pre-created model used to analyze the data to be evaluated corresponding to each trigger operation, and each application scenario has its own corresponding data analysis model.
S640, analyzing the optimization result through the risk assessment model to obtain the risk scoring result of the current time window, and returning to S610 for iterative updating.
The risk scoring model can dynamically set the risk score of each intermediate state result according to its content, calculate the risk scores of the different data analysis models within the time period, and perform comprehensive study and judgment.
It should be noted that the order in which the results of S610 and S620 are input to the data analysis model is not limited: the result of S610 may be input first and then the result of S620; the result of S620 may be input first and then the result of S610; or both may be input at the same time. This embodiment is not limited thereto. In this embodiment, the description takes as an example the case where the results of S610 and S620 are both input to the data analysis model.
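Illustratively, the way each window's result feeds the next window may be sketched as a loop. This is a minimal sketch under stated assumptions: it covers only the carry-over of the risk scoring result between time windows and not the retraining of the models and rules, and the function name `run_windows`, the initial score, and the input layout are hypothetical.

```python
def run_windows(window_scores, alpha=0.3, initial=0.0):
    """window_scores: list of (category_score, anomaly_score), one pair per time window.

    Returns the risk scoring result of every window, where each result is
    computed from the result of the window before it.
    """
    history = []
    previous = initial
    for category, anomaly in window_scores:
        current = (1 - alpha) * previous + category + anomaly
        history.append(current)
        previous = current  # the current result becomes the next window's input
    return history
```

With a decay factor of 0.5, a single risky window scored 10 followed by two quiet windows gives 10.0, 5.0, and 2.5, which shows how an earlier risk score fades rather than disappearing at once.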
Fig. 7 is a schematic diagram of a sample library construction and supervised learning model training process according to an embodiment of the present invention. Supervised learning needs a large amount of labeled data. In the early stage, data can be labeled automatically through unsupervised learning and rule analysis, so that a user behavior sample library is accumulated; this library can then be used to train supervised learning models. According to the risk scoring result, the related raw data is associated and automatically labeled, and a standard user behavior sample library is constructed. Data from the user behavior sample library is used to train supervised learning models such as XGBoost and LSTM. The sample library construction and supervised learning model training process is shown in fig. 7.
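Illustratively, labeling raw data according to the risk scoring result and storing it in the sample library may be sketched as follows. This is a minimal sketch under stated assumptions: the three labels follow the malicious, suspicious, and normal behavior libraries mentioned in the description, while the thresholds of 80 and 50, the dictionary-based library, and the function names are hypothetical.

```python
def label_by_score(score, suspicious_at=50, malicious_at=80):
    """Map a risk scoring result to a sample label (thresholds are assumptions)."""
    if score >= malicious_at:
        return "malicious"
    if score >= suspicious_at:
        return "suspicious"
    return "normal"


def add_to_sample_library(library, raw_records, score):
    """Store the raw records of one time window under the label implied by its score."""
    label = label_by_score(score)
    library.setdefault(label, []).extend(raw_records)
    return label
```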
For example, to facilitate understanding of the data analysis method, the flow is described below by taking as an example a scenario in which a customer logs in to mobile banking to complete a transfer transaction:
a1, application scenario division
The coarse-grained application scenario is subdivided into login, page click, and transfer transaction scenarios.
a2, raw data collection
The collection SDK collects the user's device information and behavior information, as well as some page information and transaction-related information.
a3, data analysis model
For the login scenario, an isolation forest model may be selected for training; for page clicks, a clustering model may be used for training; for the transaction scenario, rules and maps may be used to output results.
a4, risk scoring model
The output of each model is assigned a score in the risk scoring model; for example, for the clustering model, the abnormal score of cluster 1 is set to 30 and the abnormal score of cluster 2 is set to 60. The comprehensive study and judgment parameters are set dynamically according to the output results of the models, and the final risk view is obtained.
a5, model iterative optimization
The model is retrained according to the risk view and the risk scoring result of the previous time window.
a6, constructing a sample library and training a supervised learning model
The collected data is labeled automatically according to the risk view, thereby constructing a data sample library for the transfer transaction scenario. The sample library is used to train a supervised learning model, such as an LSTM model, which works well in scenarios such as analyzing whether the user has clicked on gambling or fraud websites and analyzing the user's access sequence. In this way, more sub-scenarios can be analyzed.
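Illustratively, the per-output scores set in step a4 of the above example may be sketched as a lookup table. This is a minimal sketch under stated assumptions: the values 30 and 60 for cluster 1 and cluster 2 come from the example, while the table layout, the key names, the default score of 0, and the function name `score_intermediate_result` are hypothetical.

```python
# Scores per scenario and per model output; 30 and 60 follow the example above.
SCENARIO_SCORE_TABLE = {
    "page_click": {"cluster_1": 30, "cluster_2": 60},
}


def score_intermediate_result(scenario, result, table=SCENARIO_SCORE_TABLE, default=0):
    """Return the configured risk score for one intermediate state result."""
    # Unknown scenarios or outputs fall back to the default score (assumption).
    return table.get(scenario, {}).get(result, default)
```

Because the table is passed as a parameter, the scores can be replaced at run time, which is one simple way to set them dynamically.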
In an embodiment, fig. 8 is a block diagram of a data analysis apparatus according to an embodiment of the present invention. The apparatus is suitable for analyzing data and may be implemented in hardware and/or software. The data analysis apparatus can be configured in a server to implement the data analysis method in the embodiments of the invention. As shown in fig. 8, the apparatus includes: a first determining module 810, an output module 820, and a second determining module 830.
The first determining module is configured to determine an application scenario corresponding to each trigger operation received in a current time window; wherein the application scenario includes a first application scenario obtained by coarse-grained division and a second application scenario obtained by fine-grained division, the second application scenario is contained in the first application scenario, and the second application scenario is obtained by dividing the first application scenario.
The output module is configured to input the data to be evaluated corresponding to each trigger operation into a pre-created data analysis model corresponding to the second application scenario to obtain a corresponding intermediate state result.
The second determining module is configured to input each intermediate state result into a pre-created risk scoring model to obtain a corresponding risk view.
According to the embodiment of the invention, the application scene corresponding to each trigger operation received in the current time window is determined through the first determining module, the output module inputs the data to be evaluated corresponding to each trigger operation into the pre-established data analysis model corresponding to the second application scene to obtain the corresponding intermediate state result, and then the second determining module inputs each intermediate state result into the pre-established risk scoring model to obtain the corresponding risk view. According to the embodiment of the invention, the data to be evaluated corresponding to each trigger operation is input into the pre-established data analysis model corresponding to the second application scene to obtain the corresponding intermediate state result, and each intermediate state result is input into the pre-established risk scoring model to obtain the corresponding risk view, so that the problems of single abnormal behavior detection model, single use scene and low accuracy rate in user behavior risk evaluation are solved, and the identification accuracy rate of abnormal behaviors is improved. Compared with the prior art, the adopted data analysis method reduces the manpower and material resources required to be input for data analysis and detection, reduces the cost, and improves the identification efficiency and accuracy of abnormal behaviors.
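Illustratively, the cooperation of the three modules may be sketched as one class. This is a minimal sketch under stated assumptions: the class name `DataAnalysisApparatus`, the callables passed to it, and the shape of the returned risk view are hypothetical and only show the order in which the modules hand data to one another.

```python
class DataAnalysisApparatus:
    """Hypothetical wiring of the three modules described for fig. 8."""

    def __init__(self, scenario_of, analysis_models, risk_scoring_model):
        self.scenario_of = scenario_of                # role of the first determining module 810
        self.analysis_models = analysis_models        # models used by the output module 820
        self.risk_scoring_model = risk_scoring_model  # role of the second determining module 830

    def run(self, operations):
        """operations: list of (trigger_operation, data_to_be_evaluated)."""
        intermediate_results = []
        for operation, data in operations:
            # Determine the first and second application scenarios.
            _first, second = self.scenario_of(operation)
            # Pick the data analysis model of the second scenario and apply it.
            model = self.analysis_models[second]
            intermediate_results.append((second, model(data)))
        # Turn all intermediate state results into a risk view.
        return self.risk_scoring_model(intermediate_results)
```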
In an embodiment, the data analysis apparatus further includes:
a scoring result obtaining module, configured to, after the application scenario corresponding to each trigger operation received in the current time window is determined, obtain raw data in the current time window and a risk scoring result of a previous time window before the to-be-evaluated data corresponding to each trigger operation is input to a pre-created data analysis model corresponding to the second application scenario;
the intermediate data acquisition module is used for carrying out data preprocessing on the original data and the risk scoring result to obtain intermediate data in a target data format;
and the evaluation data acquisition module is used for performing feature construction operation on the intermediate data to obtain corresponding data to be evaluated.
In an embodiment, the data analysis apparatus further includes:
and the analysis model determining module is used for determining a corresponding data analysis model according to the application scene and the data characteristics of the original data corresponding to the trigger operation before the data to be evaluated corresponding to each trigger operation is input to a pre-created data analysis model corresponding to the second application scene.
In one embodiment, the raw data includes one of: application log data, user behavior data, device fingerprints, and network environment information.
In an embodiment, the first determining module 810 includes:
the first scene determining unit is used for determining a first application scene corresponding to each trigger operation received in the current time window;
a second scenario determination unit configured to determine a second application scenario of each trigger operation in the first application scenario; wherein the second application scenario is contained within the first application scenario.
In one embodiment, the output module 820 includes:
the data to be evaluated input unit is used for inputting the data to be evaluated corresponding to each trigger operation into a pre-established data analysis model corresponding to the second application scene;
and the intermediate state result obtaining unit is used for clustering and screening the data to be evaluated through the data analysis model to obtain a corresponding intermediate state result.
In an embodiment, the second determining module 830 includes:
a risk score obtaining unit, configured to input the intermediate state result output by the data analysis model corresponding to each second application scenario into a risk score model created in advance, so as to obtain a risk score corresponding to each data analysis model;
a scoring result obtaining unit, configured to determine a risk scoring result of the current time window according to the risk scoring corresponding to each data analysis model, a predetermined abnormal score, and a risk scoring result of the previous time window;
and the risk view obtaining unit is used for obtaining a corresponding risk view according to the risk scoring result of the current time window.
In an embodiment, the scoring result obtaining unit is further configured to perform weighted average on the risk score corresponding to the data analysis model, the predetermined abnormal score, and the risk scoring result of the previous time window to obtain the risk scoring result of the current time window.
In an embodiment, the data analysis apparatus further includes:
a first tag data obtaining module, configured to, after the intermediate state results are input into a pre-created risk scoring model and a corresponding risk view is obtained, perform tag processing on corresponding original data according to a risk scoring result output by the risk scoring model, and obtain corresponding first type tag data;
and the sample database storage module is used for storing the first type label data into a pre-created user behavior sample database.
In an embodiment, the creating method of the pre-created user behavior sample library includes:
automatically performing label processing on the original data through an unsupervised learning model and a rule analysis model to obtain corresponding second type label data;
and creating a corresponding user behavior sample library according to the second type label data.
In an embodiment, the data analysis apparatus further includes:
the tag data acquisition module is used for acquiring first type tag data and/or second type tag data in the user behavior sample library;
and the model training module is used for training a supervised learning model by utilizing the first type label data and/or the second type label data.
In one embodiment, the data analysis model includes at least one of the following types: unsupervised learning models, rule analysis models, and supervised learning models.
The data analysis device can execute the data analysis method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the data analysis method.
In an embodiment, fig. 9 is a schematic hardware structure diagram of a data analysis device according to an embodiment of the present invention. The device in the embodiment of the invention is explained by taking a computer as an example. As shown in fig. 9, the data analysis apparatus provided in the embodiment of the present invention includes: a processor 910, a memory 920, an input device 930, and an output device 940. The number of the processors 910 in the data analysis apparatus may be one or more, one processor 910 is taken as an example in fig. 9, the processor 910, the memory 920, the input device 930, and the output device 940 in the data analysis apparatus may be connected by a bus or in other manners, and the processor 910, the memory 920, the input device 930, and the output device 940 in fig. 9 are taken as an example of being connected by a bus.
The memory 920 in the data analysis apparatus is used as a computer-readable storage medium for storing one or more programs, which may be software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the embodiments of the present invention or the provided data analysis method (for example, the modules in the data analysis device shown in fig. 8, including the first determining module 810, the output module 820, and the second determining module 830). The processor 910 executes various functional applications and data processing of the cloud server by running software programs, instructions and modules stored in the memory 920, that is, the data analysis method in the foregoing method embodiment is implemented.
The memory 920 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the device, and the like. Further, the memory 920 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 920 may further include memory located remotely from the processor 910, which may be connected to devices over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 930 may be used to receive numeric or character information input by a user and to generate key signal inputs related to user settings and function control of the terminal device. The output device 940 may include a display device such as a display screen.
When the one or more programs included in the above data analysis device are executed by the one or more processors 910, the programs perform the following operations: determining an application scenario corresponding to each trigger operation received in a current time window; inputting the data to be evaluated corresponding to each trigger operation into a pre-created data analysis model corresponding to a second application scenario to obtain a corresponding intermediate state result; and inputting each intermediate state result into a pre-created risk scoring model to obtain a corresponding risk view.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a data analysis method provided in an embodiment of the present invention, where the method includes: determining an application scene corresponding to each trigger operation received in a current time window; inputting the data to be evaluated corresponding to each trigger operation into a pre-established data analysis model corresponding to a second application scene to obtain a corresponding intermediate state result; and inputting each intermediate state result into a pre-created risk scoring model to obtain a corresponding risk view.
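The three operations above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation described in this document: the scenario labels ("account", "login", and so on), the two toy models, the `MODELS` registry, and the averaging used as a risk scoring model are all hypothetical stand-ins chosen only to make the data flow concrete.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Scene = Tuple[str, str]  # (first/coarse scenario, second/fine scenario)


@dataclass
class TriggerOperation:
    coarse_scene: str      # first application scenario, e.g. "account" (hypothetical)
    fine_scene: str        # second application scenario inside it, e.g. "login"
    features: List[float]  # data to be evaluated for this operation


def rule_model(features: List[float]) -> float:
    """Toy rule analysis model: flag the operation if the first feature exceeds 5."""
    return 1.0 if features[0] > 5 else 0.0


def mean_model(features: List[float]) -> float:
    """Toy stand-in model: scaled mean of the features, capped at 1.0."""
    return min(1.0, sum(features) / (10 * len(features)))


# Hypothetical registry: one pre-created data analysis model per second scenario.
MODELS: Dict[Scene, Callable[[List[float]], float]] = {
    ("account", "login"): rule_model,
    ("payment", "transfer"): mean_model,
}


def risk_scoring_model(intermediate: Dict[Scene, List[float]]) -> Dict[Scene, float]:
    """Toy risk scoring model: average the intermediate results of each scenario."""
    return {scene: sum(vals) / len(vals) for scene, vals in intermediate.items()}


def analyze_window(operations: List[TriggerOperation]) -> Dict[Scene, float]:
    intermediate: Dict[Scene, List[float]] = defaultdict(list)
    for op in operations:
        scene = (op.coarse_scene, op.fine_scene)        # operation 1: determine the scenario
        model = MODELS[scene]                           # model tied to the second scenario
        intermediate[scene].append(model(op.features))  # operation 2: intermediate state result
    return risk_scoring_model(intermediate)             # operation 3: risk view


# Trigger operations received in one (hypothetical) time window.
window = [
    TriggerOperation("account", "login", [7.0]),
    TriggerOperation("account", "login", [1.0]),
    TriggerOperation("payment", "transfer", [2.0, 4.0]),
]
risk_view = analyze_window(window)
```

Keying the registry on the (coarse, fine) pair keeps the fine-grained scenario nested inside its coarse-grained parent, which is one simple way to mirror the containment relation between the two scenario levels.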
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash Memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, or the like, as well as conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
In an embodiment, an embodiment of the present invention further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the data analysis method according to any of the above embodiments.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments illustrated herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (21)

1. A method of data analysis, comprising:
determining an application scenario corresponding to each trigger operation received in a current time window; wherein the application scenario includes: a first application scenario obtained by coarse-grained division and a second application scenario obtained by fine-grained division, the second application scenario being contained in the first application scenario and being obtained by subdividing the first application scenario;
inputting the data to be evaluated corresponding to each trigger operation into a pre-established data analysis model corresponding to the second application scene to obtain a corresponding intermediate state result;
and inputting each intermediate state result into a pre-created risk scoring model to obtain a corresponding risk view.
2. The method according to claim 1, wherein, after the determining the application scenario corresponding to each trigger operation received in the current time window and before the inputting the data to be evaluated corresponding to each trigger operation into the pre-created data analysis model corresponding to the second application scenario, the method further comprises:
acquiring the original data in the current time window and the risk scoring result of the previous time window;
performing data preprocessing on the original data and the risk scoring result to obtain intermediate data in a target data format;
and performing feature construction operation on the intermediate data to obtain corresponding data to be evaluated.
3. The method according to claim 1, wherein before the inputting the data to be evaluated corresponding to each of the trigger operations into the pre-created data analysis model corresponding to the second application scenario, the method further comprises:
and determining a corresponding data analysis model according to the application scene and the data characteristics of the original data corresponding to the trigger operation.
4. The method of claim 2 or 3, wherein the raw data comprises one of: application log data, user behavior data, device fingerprints, and network environment information.
5. The method of claim 1, wherein the determining the application scenario corresponding to each trigger operation received in the current time window comprises:
determining a first application scene corresponding to each trigger operation received in a current time window;
determining a second application scenario of each trigger operation in the first application scenario; wherein the second application scenario is contained within the first application scenario.
6. The method according to claim 1, wherein the inputting the data to be evaluated corresponding to each of the trigger operations into a pre-created data analysis model corresponding to the second application scenario to obtain a corresponding intermediate state result comprises:
inputting the data to be evaluated corresponding to each trigger operation into a pre-established data analysis model corresponding to the second application scene;
and clustering and screening the data to be evaluated through the data analysis model to obtain a corresponding intermediate state result.
7. The method of claim 5 or 6, wherein the inputting each intermediate state result into a pre-created risk scoring model to obtain a corresponding risk view comprises:
inputting the intermediate state result output by the data analysis model corresponding to each second application scene into a pre-established risk score model to obtain a risk score corresponding to each data analysis model;
determining a risk scoring result of the current time window according to the risk scoring corresponding to each data analysis model, the predetermined abnormal scoring and the risk scoring result of the previous time window;
and obtaining a corresponding risk view according to the risk scoring result of the current time window.
8. The method of claim 7, wherein the determining the risk scoring result of the current time window according to the risk score corresponding to each data analysis model, the predetermined abnormal score, and the risk scoring result of the previous time window comprises:
performing a weighted average of the risk score corresponding to each data analysis model, the predetermined abnormal score, and the risk scoring result of the previous time window to obtain the risk scoring result of the current time window.
9. The method of claim 1, wherein, after the inputting each intermediate state result into a pre-created risk scoring model to obtain a corresponding risk view, the method further comprises:
performing label processing on the corresponding original data according to a risk scoring result output by the risk scoring model to obtain corresponding first type label data;
and storing the first type label data into a pre-created user behavior sample library.
10. The method of claim 9, wherein the pre-created user behavior sample library is created in a manner that includes:
automatically performing label processing on the original data through an unsupervised learning model and a rule analysis model to obtain corresponding second type label data;
and creating a corresponding user behavior sample library according to the second type label data.
11. The method of claim 9 or 10, further comprising:
acquiring first type label data and/or second type label data in the user behavior sample library;
training a supervised learning model using the first type of label data and/or the second type of label data.
12. The method of claim 8, wherein the data analysis model includes at least one of the following types: unsupervised learning models, rule analysis models, and supervised learning models.
13. A data analysis apparatus, comprising:
the first determining module is used for determining an application scenario corresponding to each trigger operation received in the current time window; wherein the application scenario includes: a first application scenario obtained by coarse-grained division and a second application scenario obtained by fine-grained division, the second application scenario being contained in the first application scenario and being obtained by subdividing the first application scenario;
the output module is used for inputting the data to be evaluated corresponding to each trigger operation into a pre-established data analysis model corresponding to the second application scene to obtain a corresponding intermediate state result;
and the second determining module is used for inputting each intermediate state result into a pre-created risk scoring model to obtain a corresponding risk view.
14. The apparatus of claim 13, further comprising:
a scoring result obtaining module, configured to obtain the original data in the current time window and the risk scoring result of the previous time window, after the application scenario corresponding to each trigger operation received in the current time window is determined and before the data to be evaluated corresponding to each trigger operation is input into the pre-created data analysis model corresponding to the second application scenario;
the intermediate data acquisition module is used for carrying out data preprocessing on the original data and the risk scoring result to obtain intermediate data in a target data format;
and the evaluation data acquisition module is used for performing feature construction operation on the intermediate data to obtain corresponding data to be evaluated.
15. The apparatus of claim 13, wherein the first determining module comprises:
the first scene determining unit is used for determining a first application scene corresponding to each trigger operation received in the current time window;
a second scenario determination unit configured to determine a second application scenario of each trigger operation in the first application scenario; wherein the second application scenario is contained within the first application scenario.
16. The apparatus of claim 13, wherein the output module comprises:
the data to be evaluated input unit is used for inputting the data to be evaluated corresponding to each trigger operation into a pre-established data analysis model corresponding to the second application scene;
and the intermediate state result obtaining unit is used for clustering and screening the data to be evaluated through the data analysis model to obtain a corresponding intermediate state result.
17. The apparatus of claim 13, wherein the second determining module comprises:
a risk score obtaining unit, configured to input the intermediate state result output by the data analysis model corresponding to each second application scenario into a risk score model created in advance, so as to obtain a risk score corresponding to each data analysis model;
a scoring result obtaining unit, configured to determine a risk scoring result of the current time window according to the risk scoring corresponding to each data analysis model, a predetermined abnormal score, and a risk scoring result of the previous time window;
and the risk view obtaining unit is used for obtaining a corresponding risk view according to the risk scoring result of the current time window.
18. The apparatus of claim 13, further comprising:
a first tag data obtaining module, configured to, after the intermediate state results are input into a pre-created risk scoring model and a corresponding risk view is obtained, perform tag processing on corresponding original data according to a risk scoring result output by the risk scoring model, and obtain corresponding first type tag data;
and the sample database storage module is used for storing the first type label data into a pre-created user behavior sample database.
19. A data analysis apparatus, characterized in that the apparatus comprises: a memory, and one or more processors;
the memory is configured to store one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data analysis method as claimed in any one of claims 1-12.
20. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out a data analysis method according to any one of claims 1 to 12.
21. A computer program product, characterized in that it comprises a computer program which, when being executed by a processor, implements a data analysis method according to any one of claims 1-12.
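As an illustration of the weighted average in claim 8 and the label processing in claim 9, the following Python sketch carries a risk scoring result from one time window to the next and labels the corresponding original data. It rests on assumptions not stated in the claims: the function names, the equal default weights, the 0.6 labeling threshold, the "risky"/"normal" labels, and the sample records are all hypothetical.

```python
from typing import List, Optional, Sequence


def window_risk_score(
    model_scores: Sequence[float],
    anomaly_score: float,
    previous_score: float,
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted average of per-model risk scores, the predetermined abnormal
    score, and the previous window's risk scoring result."""
    values = [*model_scores, anomaly_score, previous_score]
    if weights is None:
        weights = [1.0] * len(values)  # assumption: equal weights by default
    if len(weights) != len(values):
        raise ValueError("one weight is needed per score")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * v for w, v in zip(weights, values)) / total


def label_for(score: float, threshold: float = 0.6) -> str:
    """Hypothetical label processing: threshold the window's risk scoring result."""
    return "risky" if score >= threshold else "normal"


# Stand-in for the user behavior sample library (a plain list here).
sample_library: List[dict] = []
previous = 0.0  # assumption: no risk history before the first window

for raw, model_scores, anomaly in [
    ({"user": "u1", "op": "login"}, [0.5, 0.3], 0.8),
    ({"user": "u1", "op": "transfer"}, [0.9, 0.9], 0.9),
]:
    current = window_risk_score(model_scores, anomaly, previous)
    sample_library.append({**raw, "label": label_for(current)})  # first type label data
    previous = current  # feeds the next time window
```

Feeding the previous window's result back into the average smooths the score over time, so a single anomalous window moves the result less than a sustained run of anomalous windows would.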
CN202111552118.1A 2021-12-17 2021-12-17 Data analysis method, device, equipment, medium and product Pending CN114218569A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111552118.1A CN114218569A (en) 2021-12-17 2021-12-17 Data analysis method, device, equipment, medium and product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111552118.1A CN114218569A (en) 2021-12-17 2021-12-17 Data analysis method, device, equipment, medium and product

Publications (1)

Publication Number Publication Date
CN114218569A true CN114218569A (en) 2022-03-22

Family

ID=80703754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111552118.1A Pending CN114218569A (en) 2021-12-17 2021-12-17 Data analysis method, device, equipment, medium and product

Country Status (1)

Country Link
CN (1) CN114218569A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117596078A (en) * 2024-01-18 2024-02-23 成都思维世纪科技有限责任公司 Model-driven user risk behavior discriminating method based on rule engine implementation
CN117596078B (en) * 2024-01-18 2024-04-02 成都思维世纪科技有限责任公司 Model-driven user risk behavior discriminating method based on rule engine implementation

Similar Documents

Publication Publication Date Title
CN108881194B (en) Method and device for detecting abnormal behaviors of users in enterprise
CN110020422B (en) Feature word determining method and device and server
US10033694B2 (en) Method and device for recognizing an IP address of a specified category, a defense method and system
CN107862022B (en) Culture resource recommendation system
CN112235327A (en) Abnormal log detection method, device, equipment and computer readable storage medium
CN112311803B (en) Rule base updating method and device, electronic equipment and readable storage medium
CN102934110A (en) Research mission identification
CN116996325B (en) Network security detection method and system based on cloud computing
CN106294406B (en) Method and equipment for processing application access data
CN113704328A (en) User behavior big data mining method and system based on artificial intelligence
CN113282920B (en) Log abnormality detection method, device, computer equipment and storage medium
CN113194064B (en) Webshell detection method and device based on graph convolution neural network
CN114218569A (en) Data analysis method, device, equipment, medium and product
CN113015171A (en) System with network public opinion monitoring and analyzing functions
CN112667875A (en) Data acquisition method, data analysis method, data acquisition device, data analysis device, equipment and storage medium
CN116756688A (en) Public opinion risk discovery method based on multi-mode fusion algorithm
CN113746780A (en) Abnormal host detection method, device, medium and equipment based on host image
CN115296892A (en) Data information service system
CN110674288A (en) User portrait method applied to network security field
CN115051859A (en) Information analysis method, information analysis device, electronic apparatus, and medium
CN114860903A (en) Event extraction, classification and fusion method oriented to network security field
CN114358024A (en) Log analysis method, apparatus, device, medium, and program product
CN115964478A (en) Network attack detection method, model training method and device, equipment and medium
CN114528908A (en) Network request data classification model training method, classification method and storage medium
CN113360313A (en) Behavior analysis method based on massive system logs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination