CN111611519B - Method and device for detecting personal abnormal behaviors - Google Patents

Info

Publication number
CN111611519B
Authority
CN
China
Prior art keywords
user
detected
model
sequence
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010465761.XA
Other languages
Chinese (zh)
Other versions
CN111611519A (en
Inventor
汲丽
钱沁莹
魏国富
葛胜利
钟丹阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information and Data Security Solutions Co Ltd
Original Assignee
Information and Data Security Solutions Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information and Data Security Solutions Co Ltd
Priority to CN202010465761.XA
Publication of CN111611519A
Application granted
Publication of CN111611519B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/958 Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0609 Buyer or seller confidence or verification

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Databases & Information Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method and a device for detecting personal abnormal behaviors, wherein the method comprises the following steps: 1) acquiring a stationary sequence corresponding to the original data of each user to be detected; 2) for stationary sequences whose data length is smaller than a first preset threshold, determining a corresponding fixed baseline value using a simple averaging algorithm; 3) for stationary sequences whose data length is not smaller than the first preset threshold, training a target ARIMA model and predicting a value for the user to be detected using the target ARIMA model; 4) predicting a value for the user to be detected using an exponential-smoothing baseline prediction model; 5) selecting the better of the target ARIMA model and the exponential-smoothing baseline prediction model as the baseline model, and identifying abnormal users among the users to be detected using the baseline model. By applying the embodiments of the invention, more abnormal behaviors can be detected.

Description

Method and device for detecting personal abnormal behaviors
Technical Field
The invention relates to the technical field of network security, in particular to a method for detecting personal abnormal behaviors.
Background
In an environment where the internet industry is developing at high speed, data is generated at an explosive rate. On a website, large numbers of users click through its pages every day, and these clicks are generally recorded by server software such as Apache and stored in a data source such as a text file or a database. More and more enterprises now pay attention to website analysis: they improve the construction of their websites according to the analysis results and identify users who abnormally disrupt the order of the website, which benefits both website maintenance and website development. In recent years, UEBA (User and Entity Behavior Analytics) has been widely used as an emerging methodology in fields such as search recommendation, business risk control, network security, and internal violation management. Its predecessor, UBA (User Behavior Analytics), is mainly used for search recommendation in the e-commerce field and is already very common: by analyzing users' purchasing, clicking, favoriting, and other behaviors, it labels users and builds user profiles, predicts their future purchasing behavior, and pushes commodities of interest to them. In addition, UBA has been used for anti-fraud for many years, with mature methods that are integrated with business services and work well in practice.
The prior-art invention patent application No. 201810661474.9 discloses an abnormal behavior detection method and system based on frequent behavior sequence patterns. The method comprises: generating a set of frequent behavior sequence patterns from collected data; deleting all sub-behavior sequence patterns in the set, thereby compressing it; constructing a set of abnormal behavior detection models based on the compressed set of frequent behavior sequence patterns, and dynamically updating the model set; and detecting a newly generated set of behavior sequence patterns using the dynamically updated model set, and outputting the result. For abnormal access behavior in enterprise Web application systems, the method reduces the computational complexity of constructing the detection model and of prediction, reduces the model's false alarm rate, and improves the accuracy of detection.
However, this prior art can only detect anomalies in frequent behavior sequences and cannot effectively detect infrequent behaviors. It therefore has the technical problem that its detection coverage is narrow and it detects few abnormal behaviors.
Disclosure of Invention
The technical problem to be solved by the invention is how to detect more abnormal behaviors.
The invention solves the technical problems by the following technical means:
The embodiment of the invention provides a method for detecting personal abnormal behaviors, which comprises the following steps:
1) acquiring a stationary sequence of the original data corresponding to a user to be detected, wherein the original data comprises: equipment attribute information, risk control data, and service data of the user;
2) for stationary sequences whose data length is smaller than a first preset threshold, determining a corresponding fixed baseline value using a simple averaging algorithm;
3) for stationary sequences whose data length is not smaller than the first preset threshold, training a target ARIMA model and predicting a value for the user to be detected using the target ARIMA model;
4) predicting a value for the user to be detected using an exponential-smoothing baseline prediction model;
5) selecting the better of the target ARIMA model and the exponential-smoothing baseline prediction model as the baseline model, and identifying abnormal users among the users to be detected using the baseline model.
By applying the embodiments of the invention, a stationary sequence is generated from the original data of every user to be detected, a target ARIMA model is trained, and the better baseline model is selected by comparison with the exponential-smoothing baseline prediction model. Combined with the simple averaging algorithm for short sequences, the baseline covers all users to be detected, so the detection range is wider and more abnormal behaviors can be detected.
Optionally, step 1) includes:
acquiring original data corresponding to a user to be detected;
denoising the original data to obtain denoised original data;
sorting the original data of the user to be detected, according to the user's operation sessions, by the scene order of each operation scene within a session, to obtain a scene sequence of the user to be detected, and converting the scene sequence into a stationary sequence.
Optionally, before step 3), the method further comprises:
calculating the scene missing rate of the scene sequence, and marking the scene sequence as an abnormal sequence when the scene missing rate falls outside a set confidence interval, wherein the confidence interval is calculated from the scene missing rates of all users to be detected.
Optionally, the step 3) includes:
acquiring an autocorrelation plot and a partial autocorrelation plot of each stationary sequence, and determining a current optimal level and a current optimal order from the curve with the highest accuracy in the autocorrelation plot and the curve with the highest accuracy in the partial autocorrelation plot; establishing a current ARIMA model according to the optimal level and the optimal order; and iterating the current ARIMA model repeatedly to obtain a target ARIMA model, and predicting a value for the user to be detected using the target ARIMA model.
Optionally, in the step 5), the step of selecting the optimal model from the target ARIMA model and the exponential smoothing method predicted baseline model as the baseline model includes:
and calculating the mean square error of the target ARIMA model and the mean square error of the exponential smoothing method prediction baseline model, and taking the model with lower mean square error as the baseline model.
Optionally, the step of identifying an abnormal user in the users to be detected using the baseline model in the step 5) includes:
acquiring a baseline value corresponding to the baseline model;
calculating the ratio between the actual value and the baseline value of each user to be detected,
and under the condition that the ratio is larger than a second preset threshold value, judging the user to be detected as an abnormal user.
Optionally, the method further comprises:
for a user whose order placement rate or order return rate exceeds a third preset threshold, sending a verification code to the user for verification; or,
calculating a weighted probability value for the current operation session from the probability of each scene in each user's historical operation sessions and the scene categories contained in the user's current operation session, and sending a verification code to the user for verification when the weighted probability value exceeds a fourth preset threshold.
The embodiment of the invention provides a device for detecting personal abnormal behaviors, which comprises:
an acquisition module, configured to acquire a stationary sequence of the original data corresponding to a user to be detected, wherein the original data comprises: equipment attribute information, risk control data, and service data of the user;
a determining module, configured to determine, for stationary sequences whose data length is smaller than a first preset threshold, a corresponding fixed baseline value using a simple averaging algorithm;
a training module, configured to train a target ARIMA model for stationary sequences whose data length is not smaller than the first preset threshold, and to predict a value for the user to be detected using the target ARIMA model;
a first identification module, configured to predict a value for the user to be detected using an exponential-smoothing baseline prediction model;
a second identification module, configured to select the better of the target ARIMA model and the exponential-smoothing baseline prediction model as the baseline model, and to identify abnormal users among the users to be detected using the baseline model.
Optionally, the acquiring module is configured to:
acquiring original data corresponding to a user to be detected;
denoising the original data to obtain denoised original data;
sorting the original data of the user to be detected, according to the user's operation sessions, by the scene order of each operation scene within a session, to obtain a scene sequence of the user to be detected, and converting the scene sequence into a stationary sequence.
Optionally, the apparatus further includes:
a computing module, configured to compute the scene missing rate of the scene sequence, and to mark the scene sequence as an abnormal sequence when the scene missing rate falls outside a set confidence interval, wherein the confidence interval is computed from the scene missing rates of all users to be detected.
The invention has the advantages that:
By applying the embodiments of the invention, a stationary sequence is generated from the original data of every user to be detected, a target ARIMA model is trained, and the better baseline model is selected by comparison with the exponential-smoothing baseline prediction model. Combined with the simple averaging algorithm for short sequences, the baseline covers all users to be detected, so the detection range is wider and more abnormal behaviors can be detected.
Drawings
FIG. 1 is a schematic flow chart of a method for detecting personal abnormal behaviors according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a method for detecting personal abnormal behaviors according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a device for detecting personal abnormal behaviors according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions in the embodiments of the present invention will be clearly and completely described in the following in conjunction with the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
Fig. 1 is a schematic flow chart of a method for detecting abnormal behavior of a person according to an embodiment of the present invention, and fig. 2 is a schematic diagram of a method for detecting abnormal behavior of a person according to an embodiment of the present invention, where, as shown in fig. 1 and fig. 2, the method includes:
S101: acquiring a stationary sequence of the original data corresponding to a user to be detected, wherein the original data comprises: equipment attribute information, risk control data, and business data of the user.
Illustratively, data is first extracted from the platform's business system, the platform's text logs, or other relevant data sources. Noise and abnormal data are then removed, such as data unrelated to users' access behavior, test data, and access data from platforms other than the platform to be monitored, so that only the click data generated when users access the platform to be monitored is retained. This user click data is the original data. The original data covers the following three aspects:
First, the device attribute information is mainly used for identifying whether the device is legal or compliant. For example, when a user operates an e-commerce APP, tracking points are embedded at important checkpoints in the operation flow; each time a user triggers a preset point, the user's scene information and the device attribute information are generated and recorded as one piece of data. Fields are separated by commas, users are separated by line feeds, and the file is stored in CSV format. Typically, the fields of the device attribute information include: device ID (device_id), device model (product_names), scene information, MAC address (mac_address), APP name (label), version number (version code), APP size (apksize), first installation time (first time), battery health (health), charge status (plug), current power status (power), power standard (scale), power status (status), voltage (voltage), battery structure (technology), screen resolution (density), screen physical size (physical), screen resolution (resolution), memory size (media), current CPU number (cpu number), CPU frequency (keyboard), CPU architecture (processor), CPU total number (cpu_architecture), CPU attribute 1 (cpu_im 2), CPU attribute 2 (hardware), camera attribute 1 (largestsize), camera attribute 2 (support_formats), security module attribute 1 (blacklisthit), security module attribute 2 (cydiaasubstrate), root rights (root), sandbox (sadbox), simulator (simulator), static (statical), sound card information-maximum available volume (maxvolumeachieveability), sound card information-maximum available volume (maxvolumealarm), sound card information- (maxvolumedtmf), sound card information-music volume (maxvolumemulic), sound card information-maximum notification volume (maxvolulumuteidentification), sound card information-maximum alarm volume (maxvolumerin), sound card information-maximum system volume (maxvolumesystm), sound card information-maximum call volume (maxvolumevicc), sound card information-ring tone mode (ringermode), Bluetooth history connection number (hasPermission), Bluetooth information-whether visible (isvering), Bluetooth information-whether available or not (isEnable), Bluetooth information-whether function support (isFeaturESup), Bluetooth information-whether Le2MPhy support (isLe 2 MPhySupported), Bluetooth information-whether LeCodePhy support (isLeCodPhySupported), Bluetooth information-whether advertisement extension (isLeExtendVertisingSupported), Bluetooth information-whether periodic advertisement (isLePerioaddAdvertisingSupported), Bluetooth information-whether hybrid advertisement (isMultipleAdvertisation Supported), Bluetooth information-whether offload filter is supported (isOffloadedFiltertSupportSupported), Bluetooth information-whether scanning offload batch (isOffloadScangSupported), application number (applist_count), system application number (sypplist_count), security module attribute (sensor_count), SIM card information (sim_count), security module attribute (sim_information), International Mobile Subscriber Identity (IMSI), and International Mobile Equipment Identity (IMEI).
The security module attribute 1 and the security module attribute 2 are information of the mobile phone system.
Second, the risk control data comprises all request information and personal information of users. Each operation of the e-commerce APP by a user is one piece of data, with "src_user" as the primary key; fields are separated by commas, users are separated by line feeds, and the file is stored in CSV format. The fields of the risk control data include: user name (src_user), timestamp (eval_time), browser_client_id (browser_client_id), service link (business_session_session), mobile phone number (cellphone_no), cookie_id (cookie_id), time channel (ch_event_channel), event type (ch_event_type), system (ch_system), IP address (ipadder), city where IP is located (ip_city), province where IP is located (ip_program), digital identification frame (operation), user agent (user), hit rule number (count), login channel (log_channel), APP program version information (cellphone_no), openid (Openid), rule group name (agend_name), hit rule number (count), event number (event_id), validation rule group flag (g), interface error message (message), whether or not device is a simulator (page_device), whether or not (page_device is a virtual machine), whether or not the network state of the device is a virtual machine (virtual machine), and whether or not the network state of the user is authenticated (page_device)
Third, the business data comprises all order, order return, order detail, and other information of the user. Each order operation by a user is one piece of data, with "src_user" as the primary key; fields are separated by commas, users are separated by line feeds, and the file is stored in CSV format. The fields of the business data include: user name (src_user), timestamp (eval_time stamp), order number (order_id), telephone number (cellphone_no), order scenario (ch_buffer_scenerio), system (ch_system), IP address (ipadder), IP city (ip_city), IP province (ipaddress), digital identification frame (opening), user agent (user), commodity set (good_set), coupon name (), order channel (event_channel), order channel_channel (order_channel), order commodity amount (order_sample), order machine number (order_cellphone_no), order number (order_no), number of commodities (order_ qty), type of orders (order_type), address of commodities (order_address), restaurant (time_buffer), commodity name (time stamp) and time stamp (time stamp), time stamp authentication system (time stamp, time stamp request (time stamp) and time stamp
Then, the scene information of each user to be detected is extracted. The scene information in the original data is as follows: registration (001), registration verification-code acquisition (002), login (003), login verification-code acquisition (004), logout (005), order commodity (006), order submission verification (007), payment verification (008), refund verification (009), receipt completion (010), sign-in verification (011), coupon-code acquisition verification (012), coupon-code use verification (013).
In the collected raw data, the scene information is scattered. The records can be associated by device ID and the scene information aggregated; the scenes are then ordered according to the scene sequence to which each operation belongs. For example, one normally completed order by a user counts as one operation session, and the operation scenes within the same session can be sorted chronologically; a customer's order sequence might be 003-006-007-008-010, and this sequence enters feature engineering as a scene sequence. In this step, secondary information is extracted from the original data and added to feature engineering as a "personalized feature value", which enriches the information in the features and further improves the prediction accuracy of the model.
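The grouping and ordering described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `(device_id, timestamp, scene_code)` tuple layout and the rule that scene 010 (receipt completion) closes a session are assumptions made only for the example.

```python
from collections import defaultdict

def build_scene_sequences(events, session_end="010"):
    """Group events by device ID, order them by time, and split them into sessions.

    events: iterable of (device_id, timestamp, scene_code) tuples (hypothetical layout).
    session_end: scene code assumed to close an operation session.
    """
    by_device = defaultdict(list)
    for device_id, timestamp, scene in events:
        by_device[device_id].append((timestamp, scene))

    sequences = defaultdict(list)
    for device_id, rows in by_device.items():
        current = []
        for _, scene in sorted(rows):  # chronological order within one device
            current.append(scene)
            if scene == session_end:
                sequences[device_id].append("-".join(current))
                current = []
        if current:  # unfinished session, kept as its own sequence
            sequences[device_id].append("-".join(current))
    return dict(sequences)

events = [
    ("dev1", 5, "010"), ("dev1", 1, "003"), ("dev1", 3, "007"),
    ("dev1", 2, "006"), ("dev1", 4, "008"), ("dev2", 1, "003"),
]
sequences = build_scene_sequences(events)
```

Here `sequences["dev1"]` holds the single completed order session from the example in the text, while `dev2` only has an unfinished session.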
A histogram may then be used to visualize the data, and the autocorrelation plot and partial autocorrelation plot of each scene sequence are obtained. The stationarity of each scene sequence is judged from these two plots using an existing algorithm: if both plots are tailing after visualization, that is, they show a decaying trend but do not reach 0, the scene sequence is non-stationary, and the non-stationary scene sequence data is differenced to obtain the corresponding stationary sequence. It will be appreciated that a sequence that is already stationary does not require differencing.
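Differencing itself is a simple operation, sketched below. The sketch assumes the scene sequence has already been encoded as a list of numbers; how scene codes are mapped to numeric values is not specified in the text, and the function name `difference` is hypothetical.

```python
def difference(series, order=1):
    """Apply first differences `order` times: y[t] = x[t] - x[t-1]."""
    out = list(series)
    for _ in range(order):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out

trend = [2, 4, 6, 8, 10]       # a linear trend, hence non-stationary
diff1 = difference(trend)      # the trend is removed after one difference
```

Each round of differencing shortens the series by one element, which matters for the length threshold applied in the next step.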
Further, when scene sequences are extracted, some data is always missing because of hardware, privacy, and similar constraints, so the scene missing rate can be calculated from the number of missing scenes in a scene sequence. For example, in the sequence 003-006-007-008-010, if "008" is missing, the scene value at that position is null and the scene missing rate of the sequence is 20%. The scene missing rates of all users to be detected are then collected and a confidence interval for the missing rate is calculated; for example, the confidence interval can be determined using the quartile method. When the missing rate falls outside the confidence interval, the information loss is considered severe, the sequence is marked 0, and a risk exists; within the interval the loss is tolerable and the sequence is marked 1. If many scenes are missing, the scene sequence may be marked as an abnormal sequence.
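A minimal sketch of the missing-rate calculation and a quartile-based interval follows. The text names the quartile method but not its exact form, so the Tukey-style fence `[Q1 - 1.5*IQR, Q3 + 1.5*IQR]`, the use of `None` for a missing scene, and all function names are assumptions.

```python
import statistics

def scene_missing_rate(sequence):
    """Fraction of positions in a scene sequence whose value is None (missing)."""
    return sum(1 for scene in sequence if scene is None) / len(sequence)

def quartile_interval(rates, k=1.5):
    """Tukey-style interval [Q1 - k*IQR, Q3 + k*IQR]; k = 1.5 is an assumed convention."""
    q1, _, q3 = statistics.quantiles(rates, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def mark_sequences(rates):
    """Return 1 for a tolerable missing rate and 0 for one outside the interval."""
    low, high = quartile_interval(rates)
    return [1 if low <= rate <= high else 0 for rate in rates]

rate = scene_missing_rate(["003", "006", "007", None, "010"])
marks = mark_sequences([0.0, 0.0, 0.2, 0.2, 0.2, 0.0, 0.8])
```

In this toy population only the user with a missing rate of 0.8 falls outside the interval and is marked 0.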
S102: for stationary sequences whose data length is smaller than a first preset threshold, determining a corresponding fixed baseline value using a simple averaging algorithm.
Sequences that do not meet the data length requirement are handled by the simple-average baseline prediction method. A sequence that is too short impairs overall stationarity, but such sequences are not discarded here, in order to preserve data integrity. Instead, a simple average, such as an arithmetic, geometric, or weighted average, is computed over the actual scene values at each earlier time point of an ID's stationary sequence to obtain the fixed baseline value. Considering the overall availability of the data, the threshold is set to the length that covers 90% of the total data, namely 7.
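The three averaging options can be sketched with the standard library as below. The threshold of 7 comes from the text; the function names and the default weights (later observations weigh more) are assumptions for illustration only.

```python
import statistics

LENGTH_THRESHOLD = 7  # first preset threshold given in the text

def needs_simple_average(series):
    """True when the sequence is too short for ARIMA training."""
    return len(series) < LENGTH_THRESHOLD

def simple_average_baseline(values, method="arithmetic", weights=None):
    """Fixed baseline value for a short sequence."""
    if method == "arithmetic":
        return statistics.fmean(values)
    if method == "geometric":
        # Only defined for positive values; a differenced series may violate this.
        return statistics.geometric_mean(values)
    if method == "weighted":
        if weights is None:
            weights = range(1, len(values) + 1)  # assumed: newer points weigh more
        return sum(w * v for w, v in zip(weights, values)) / sum(weights)
    raise ValueError(f"unknown method: {method}")

baseline = simple_average_baseline([2, 4, 6])
```

Which of the three averages is used in practice is left open by the text, so the `method` switch simply exposes all of them.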
S103: for stationary sequences whose data length is not smaller than the first preset threshold, training a target ARIMA (Autoregressive Integrated Moving Average) model, and predicting a value for the user to be detected using the target ARIMA model.
An autocorrelation plot and a partial autocorrelation plot are acquired for each stationary sequence. The autocorrelation function compares an ordered sequence of random variables with itself, reflecting the correlation between values of the same sequence at different times. The partial autocorrelation function calculates the strict correlation between two variables, that is, the degree of correlation between them after the interference of the intermediate variables has been removed. The required parameters are: dataset (the time series data); p_values (the value range of p, a list or array); d_values (the value range of d, a list or array); q_values (the value range of q, a list or array); and train_pro (the proportion of training data).
The autocorrelation function and the partial autocorrelation function can then be calculated and plotted with tools such as SPSS or MATLAB, and the values of p and q can be judged from the plots. The ordinate of each plot is the correlation coefficient and the abscissa is the order (lag); a certain periodic relationship between the order and the correlation coefficient can be seen. The values of p and q are taken as the minimum number of cycles. If these values of p and q fail the model test in the subsequent calculation, they are readjusted and the calculation is repeated. The current ARIMA model is iterated in this way to obtain the target ARIMA model, which is used to predict the value for the user to be detected.
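For readers without SPSS or MATLAB, the values shown in the two plots can be computed directly. The sketch below uses the standard sample autocorrelation estimator and the Durbin-Levinson recursion for the partial autocorrelation; these are textbook formulas chosen for illustration, and the text does not say which estimators the tools or the inventors used.

```python
def acf(series, max_lag):
    """Sample autocorrelation r_0..r_max_lag of a numeric series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]

def pacf(series, max_lag):
    """Partial autocorrelation via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    result = [1.0]
    prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result

series = [1, 2, 3, 4, 5]
r = acf(series, 2)
p = pacf(series, 2)
```

The lag-k partial autocorrelation removes the influence of lags 1 to k-1, which is the "interference of intermediate variables" mentioned above.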
S104: predicting a value for the user to be detected using an exponential-smoothing baseline prediction model.
The baseline prediction function for the time series data is the exponential-smoothing baseline prediction model. The required parameters are: timelines (the time series data), a bold-bias function, p_max (the maximum of p), d_max (the maximum of d), q_max (the maximum of q), and train_pro (the proportion of training samples).
The exponential-smoothing baseline prediction model is then used to predict the value for the user to be detected.
It should be emphasized that the exponential-smoothing baseline prediction model is an existing model.
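As an illustration of what such an existing model does, the following is a minimal sketch of simple (single) exponential smoothing. The text does not state which variant is used or how the smoothing factor is chosen, so the function name and `alpha = 0.5` are assumptions.

```python
def exponential_smoothing(series, alpha=0.5):
    """Simple exponential smoothing.

    Returns (one_step, next_forecast), where one_step[t] is the forecast for
    series[t] made before observing it, and next_forecast is the prediction
    for the first unobserved point.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    one_step = []
    for x in series:
        one_step.append(level)                   # forecast made before seeing x
        level = alpha * x + (1 - alpha) * level  # update the smoothed level
    return one_step, level

one_step, next_forecast = exponential_smoothing([10, 12, 14], alpha=0.5)
```

The one-step forecasts are what a mean-square-error comparison against another model would be computed on.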
S105: selecting the optimal model from the target ARIMA model and the exponential smoothing prediction baseline model as the baseline model, and identifying abnormal users among the users to be detected by using the baseline model.
The mean square error of the exponential smoothing prediction baseline model is obtained and compared with the mean square error of the target ARIMA model; the model with the smaller mean square error is taken as the baseline model.
Finally, abnormal users are detected by using the baseline model.
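The selection rule in S105 can be sketched as follows, assuming that each candidate model has already produced predictions for the same held-out observations. The function names and the dictionary layout are hypothetical and serve only to illustrate the minimum mean square error criterion.

```python
from typing import Dict, Sequence, Tuple


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean square error between observed values and model predictions."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def select_baseline_model(
    actual: Sequence[float], candidates: Dict[str, Sequence[float]]
) -> Tuple[str, float]:
    """Return the name and error of the candidate with the smallest MSE."""
    errors = {name: mean_squared_error(actual, pred) for name, pred in candidates.items()}
    best = min(errors, key=errors.get)
    return best, errors[best]
```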
By applying the embodiment of the invention, a corresponding stationary sequence is generated from the original data of every user to be detected, a target ARIMA model is trained, and the optimal baseline model is then selected by comparing it with the exponential smoothing prediction baseline model. Combined with the simple average algorithm, the baseline covers all users to be detected, so the detection range is wider and more abnormal behaviors can be detected.
Starting from the user's access habits, the embodiment of the invention discovers anomalies in the user's personal behavior: it compares the currently input behavior data with the user's previous regular behavior data and reports anomalies in time.
Moreover, insider threats are currently the largest category of network attack. Internal abnormal events are mostly low-probability events, yet their detection must be very accurate. Detection has long relied on known rules, with the rules and experience of countless experts built into the detection engine. The UEBA technology used in the prior art was developed over a long period on the basis of expert experience; its scope is limited and its rule thresholds are determined by known rules, so its recall rate is low and its accuracy still needs to be improved.
In the embodiment of the invention, the mean square error is taken as the evaluation criterion, and the better of the ARIMA model with optimal parameters and the moving average model is taken as the baseline. The ratio of the actual value to the baseline value is taken as the deviation value, and a dynamic behavior baseline is established to find internal users who deviate from their individual normal pattern. The risk level of a user is judged from the accumulated risk value, and anomalies are found more accurately by moving from a focus on data content alone to the context of the content, behavior analysis, and the like.
Example 2
Example 2 of the present invention adds the following steps on the basis of example 1:
for a user whose order placement rate or order return rate exceeds a third preset threshold, sending a verification code to the user for verification; or,
calculating a weighted probability value of the current operation session according to the occurrence probability of each scene in the historical operation sessions of each user and the scene categories included in the current operation session of the user, and sending a verification code to the user for verification when the weighted probability value exceeds a fourth preset threshold.
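A rough sketch of the second branch is given below. The patent states only that a weighted probability value is computed from historical scene probabilities and compared with a fourth preset threshold; it does not give the weighting scheme. The per-session frequency estimate, the equal default weights, and all function names here are therefore assumptions.

```python
from typing import Dict, Optional, Sequence


def scene_probabilities(history: Sequence[Sequence[str]]) -> Dict[str, float]:
    """Fraction of historical sessions in which each scene occurs."""
    if not history:
        raise ValueError("history must contain at least one session")
    counts: Dict[str, int] = {}
    for session in history:
        for scene in set(session):
            counts[scene] = counts.get(scene, 0) + 1
    return {scene: c / len(history) for scene, c in counts.items()}


def weighted_session_probability(
    current_scenes: Sequence[str],
    probabilities: Dict[str, float],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted mean of historical probabilities over the current scene categories.

    Scenes never seen in the history contribute probability 0.0 (an assumption).
    """
    categories = sorted(set(current_scenes))
    if not categories:
        raise ValueError("current session contains no scenes")
    w = {c: (weights.get(c, 1.0) if weights else 1.0) for c in categories}
    total = sum(w.values())
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w[c] * probabilities.get(c, 0.0) for c in categories) / total


def needs_verification(weighted_value: float, fourth_threshold: float) -> bool:
    """Comparison direction follows the text: verify when the value exceeds the threshold."""
    return weighted_value > fourth_threshold
```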
The embodiment of the invention combines the service data to create a 'whitening mechanism': users who cannot be classified accurately are verified immediately, which reduces the number of user complaints.
Example 3
Fig. 3 is a schematic structural diagram of a device for detecting abnormal behavior of a person according to an embodiment of the present invention, where the device includes:
an obtaining module 301, configured to obtain a stationary sequence corresponding to original data of a user to be detected, where the original data includes: equipment attribute information, wind control data and service data of a user;
a determining module 302, configured to determine, for a stationary sequence in which a data length in the stationary sequence is smaller than a first preset threshold, a corresponding baseline fixed value by using a simple average algorithm;
the training module 303 is configured to train a target ARIMA model for a stationary sequence in which a data length in the stationary sequence is not less than a first preset threshold, and predict a predicted value of a user to be detected using the target ARIMA model;
a first identifying module 304, configured to predict a predicted value of the user to be detected by using an exponential smoothing prediction baseline model;
a second identifying module 305, configured to select an optimal model from the target ARIMA model and the exponential smoothing method prediction baseline model as a baseline model, and identify abnormal users in the users to be detected using the baseline model.
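The division of work between the determining module 302 and the training module 303 can be sketched as a simple routing step on sequence length. The function names and the dictionary keyed by user identifier are hypothetical; only the rule itself (simple average below the first preset threshold, model training otherwise) comes from the description.

```python
from typing import Dict, List, Sequence, Tuple


def simple_average_baseline(sequence: Sequence[float]) -> float:
    """Fixed baseline value for a short sequence: its arithmetic mean."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(sequence) / len(sequence)


def route_sequences(
    sequences: Dict[str, List[float]], first_threshold: int
) -> Tuple[Dict[str, float], Dict[str, List[float]]]:
    """Split sequences into fixed baselines (short) and sequences to be modeled (long)."""
    fixed: Dict[str, float] = {}
    to_model: Dict[str, List[float]] = {}
    for user, seq in sequences.items():
        if len(seq) < first_threshold:
            fixed[user] = simple_average_baseline(seq)
        else:
            # Long enough for ARIMA / exponential smoothing training.
            to_model[user] = seq
    return fixed, to_model
```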
In a specific implementation of the embodiment of the present invention, the obtaining module 301 is configured to:
acquiring original data corresponding to a user to be detected;
denoising the original data to obtain denoised original data;
according to the operation session of the user to be detected, sorting the original data of the user to be detected by the scene order corresponding to each operation scene in the operation session to obtain a scene sequence of the user to be detected, and converting the scene sequence into a stationary sequence.
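The sorting and conversion steps of the obtaining module might look like the following sketch. The scene names in SCENE_ORDER, the event layout, and the use of differencing to obtain a stationary sequence are all assumptions: this passage does not say how the conversion is done, and differencing is simply the transformation conventionally paired with ARIMA.

```python
from typing import Dict, List, Sequence

# Hypothetical scene order within one operation session.
SCENE_ORDER: Dict[str, int] = {"login": 0, "browse": 1, "order": 2, "pay": 3}


def build_scene_sequence(events: Sequence[Dict[str, object]]) -> List[float]:
    """Sort denoised events by scene order and return their numeric values."""
    ordered = sorted(events, key=lambda e: SCENE_ORDER[str(e["scene"])])
    return [float(e["value"]) for e in ordered]


def difference(sequence: Sequence[float], d: int = 1) -> List[float]:
    """Apply d rounds of first-order differencing."""
    result = list(sequence)
    for _ in range(d):
        result = [b - a for a, b in zip(result, result[1:])]
    return result
```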
In a specific implementation manner of the embodiment of the present invention, the apparatus further includes:
the computing module is used for computing the scene deletion rate in the scene sequence, and marking the scene sequence as an abnormal sequence under the condition that the scene deletion rate exceeds a set confidence interval range, wherein the confidence interval range is computed according to the scene deletion rates of all users to be detected.
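One possible reading of this check is sketched below. It assumes that the scene deletion rate is the share of expected scenes absent from a user's scene sequence, and that the confidence interval is the mean of all users' rates plus or minus z population standard deviations with z = 1.96. Neither definition is given in the text, and all names are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def scene_deletion_rate(scenes: Sequence[str], expected: Sequence[str]) -> float:
    """Share of expected scenes that do not appear in the user's scene sequence."""
    if not expected:
        raise ValueError("expected scene list must not be empty")
    present = set(scenes)
    missing = [s for s in expected if s not in present]
    return len(missing) / len(expected)


def flag_abnormal_sequences(
    user_scenes: Dict[str, Sequence[str]], expected: Sequence[str], z: float = 1.96
) -> List[str]:
    """Users whose deletion rate falls outside mean +/- z * std over all users."""
    rates = {u: scene_deletion_rate(s, expected) for u, s in user_scenes.items()}
    mean = statistics.mean(rates.values())
    std = statistics.pstdev(rates.values())
    low, high = mean - z * std, mean + z * std
    return [u for u, r in rates.items() if r < low or r > high]
```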
In a specific implementation manner of the embodiment of the present invention, the training module 303 is configured to:
acquiring an autocorrelation coefficient diagram and a partial autocorrelation diagram of each stationary sequence, and determining a current optimal level and a current optimal order according to the curve with the highest accuracy in the autocorrelation coefficient diagram and the curve with the highest accuracy in the partial autocorrelation diagram; establishing a current ARIMA model according to the optimal level and the optimal order; repeatedly iterating the current ARIMA model to obtain a target ARIMA model, and predicting the predicted value of the user to be detected by using the target ARIMA model.
In a specific implementation of the embodiment of the present invention, the second identifying module 305 is configured to:
and calculating the mean square error of the target ARIMA model and the mean square error of the exponential smoothing method prediction baseline model, and taking the model with lower mean square error as the baseline model.
In a specific implementation of the embodiment of the present invention, the second identifying module 305 is configured to:
acquiring a baseline value corresponding to the baseline model;
calculating the ratio between the actual value and the baseline value of each user to be detected,
and under the condition that the ratio is larger than a second preset threshold value, judging the user to be detected as an abnormal user.
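The ratio test performed by the second identifying module 305 can be sketched as follows. The per-user dictionaries and the treatment of a zero baseline (an infinite ratio when the actual value is positive) are assumptions added for illustration.

```python
from typing import Dict, List


def deviation_ratio(actual: float, baseline: float) -> float:
    """Ratio of the actual value to the baseline value."""
    if baseline == 0:
        # Assumed convention: any positive activity against a zero baseline
        # counts as an unbounded deviation.
        return float("inf") if actual > 0 else 0.0
    return actual / baseline


def flag_abnormal_users(
    actual: Dict[str, float], baseline: Dict[str, float], second_threshold: float
) -> List[str]:
    """Users whose actual/baseline ratio is larger than the second preset threshold."""
    return [
        user
        for user, value in actual.items()
        if deviation_ratio(value, baseline[user]) > second_threshold
    ]
```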
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A method for detecting abnormal behavior of a person, the method comprising:
1) Acquiring a stable sequence of original data corresponding to a user to be detected, wherein the original data comprises: equipment attribute information, wind control data and service data of a user;
2) Determining a corresponding baseline fixed value by using a simple average algorithm according to a stable sequence, wherein the data length of the stable sequence is smaller than a first preset threshold value;
3) Training a target ARIMA model aiming at a stable sequence with the data length not smaller than a first preset threshold value in the stable sequence, and predicting a predicted value of a user to be detected by using the target ARIMA model;
4) Predicting a predicted value of the user to be detected by using an exponential smoothing method prediction baseline model;
5) Selecting an optimal model from the target ARIMA model and the exponential smoothing method prediction baseline model as a baseline model, and identifying abnormal users in the users to be detected by using the baseline model;
the step 1) includes:
acquiring original data corresponding to a user to be detected;
denoising the original data to obtain denoised original data;
according to the operation session of the user to be detected, the original data of the user to be detected is subjected to sorting processing according to the scene sequence corresponding to each operation scene in the operation session, so that a scene sequence of the user to be detected is obtained, and the scene sequence is converted into a stable sequence.
2. A method of detecting personal abnormal behavior according to claim 1, wherein prior to step 3), the method further comprises:
and calculating the scene deletion rate in the scene sequence, and marking the scene sequence as an abnormal sequence under the condition that the scene deletion rate exceeds a set confidence interval range, wherein the confidence interval range is calculated according to the scene deletion rates of all users to be detected.
3. The method for detecting abnormal behavior of a person according to claim 1, wherein said step 3) comprises:
acquiring an autocorrelation coefficient diagram and a partial autocorrelation diagram of each stable sequence, and determining a current optimal level and a current optimal order according to a curve with highest accuracy in the autocorrelation coefficient diagram and a curve with highest accuracy in the partial autocorrelation diagram; establishing a current ARIMA model according to the optimal hierarchy and the optimal order; repeatedly iterating the current ARIMA model to obtain a target ARIMA model, and predicting the predicted value of the user to be detected by using the target ARIMA model.
4. The method for detecting abnormal behavior of a person according to claim 1, wherein the step of selecting an optimal model from the target ARIMA model and the exponential smoothing method predicted baseline model in the step 5) as the baseline model comprises:
and calculating the mean square error of the target ARIMA model and the mean square error of the exponential smoothing method prediction baseline model, and taking the model with lower mean square error as the baseline model.
5. The method for detecting abnormal behavior of a person according to claim 1, wherein the step of identifying abnormal users among the users to be detected using the baseline model in the step 5) comprises:
acquiring a baseline value corresponding to the baseline model;
calculating the ratio between the actual value and the baseline value of each user to be detected,
and under the condition that the ratio is larger than a second preset threshold value, judging the user to be detected as an abnormal user.
6. The method for detecting abnormal behavior of a person according to claim 1, further comprising:
for a user whose order placement rate or order return rate exceeds a third preset threshold, sending a verification code to the user for verification, or,
and calculating a weighted probability value of the current operation session according to the occurrence probability of each scene in the historical operation session of each user and the scene category included in the current operation session of the user, and sending a verification code to the user for verification when the weighted probability value exceeds a fourth preset threshold value.
7. A personal abnormal behavior detection apparatus, the apparatus comprising:
the device comprises an acquisition module, a detection module and a processing module, wherein the acquisition module is used for acquiring a stable sequence of original data corresponding to a user to be detected, and the original data comprises: equipment attribute information, wind control data and service data of a user;
the determining module is used for determining a corresponding baseline fixed value by utilizing a simple average algorithm according to the stable sequence, wherein the data length of the stable sequence is smaller than a first preset threshold value;
the training module is used for training a target ARIMA model aiming at a stable sequence with the data length not smaller than a first preset threshold value in the stable sequence, and predicting a predicted value of a user to be detected by using the target ARIMA model;
the first identification module is used for predicting a predicted value of the user to be detected by using an exponential smoothing method prediction baseline model;
the second identification module is used for selecting an optimal model from the target ARIMA model and the exponential smoothing method prediction baseline model as a baseline model, and identifying abnormal users in the users to be detected by using the baseline model;
the acquisition module is used for:
acquiring original data corresponding to a user to be detected;
denoising the original data to obtain denoised original data;
according to the operation session of the user to be detected, the original data of the user to be detected is subjected to sorting processing according to the scene sequence corresponding to each operation scene in the operation session, so that a scene sequence of the user to be detected is obtained, and the scene sequence is converted into a stable sequence.
8. The apparatus for detecting personal abnormal behavior according to claim 7, wherein said apparatus further comprises:
the computing module is used for computing the scene deletion rate in the scene sequence, and marking the scene sequence as an abnormal sequence under the condition that the scene deletion rate exceeds a set confidence interval range, wherein the confidence interval range is computed according to the scene deletion rates of all users to be detected.
CN202010465761.XA 2020-05-28 2020-05-28 Method and device for detecting personal abnormal behaviors Active CN111611519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010465761.XA CN111611519B (en) 2020-05-28 2020-05-28 Method and device for detecting personal abnormal behaviors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010465761.XA CN111611519B (en) 2020-05-28 2020-05-28 Method and device for detecting personal abnormal behaviors

Publications (2)

Publication Number Publication Date
CN111611519A CN111611519A (en) 2020-09-01
CN111611519B true CN111611519B (en) 2023-07-11

Family

ID=72199734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010465761.XA Active CN111611519B (en) 2020-05-28 2020-05-28 Method and device for detecting personal abnormal behaviors

Country Status (1)

Country Link
CN (1) CN111611519B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112822046B (en) * 2021-01-04 2022-04-01 新华三大数据技术有限公司 Flow prediction method and device
CN112907622A (en) * 2021-01-20 2021-06-04 厦门市七星通联科技有限公司 Method, device, equipment and storage medium for identifying track of target object in video
CN112966732B (en) * 2021-03-02 2022-11-18 东华大学 Multi-factor interactive behavior anomaly detection method with periodic attribute

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103489039A (en) * 2013-09-12 2014-01-01 重庆大学 Expressway traffic flow fusing and forecasting method with online self-tuning and optimizing function
CN104994539A (en) * 2015-06-30 2015-10-21 电子科技大学 Wireless sensor network traffic abnormality detection method based on ARIMA model
CN105550772A (en) * 2015-12-09 2016-05-04 中国电力科学研究院 Online historical data tendency analysis method
CN108921355A (en) * 2018-07-03 2018-11-30 国家计算机网络与信息安全管理中心 A kind of alarm threshold setting method and device based on time series predicting model
CN109376924A (en) * 2018-10-18 2019-02-22 广东电网有限责任公司 A kind of method, apparatus, equipment and the readable storage medium storing program for executing of material requirements prediction
CN109410036A (en) * 2018-10-09 2019-03-01 北京芯盾时代科技有限公司 A kind of fraud detection model training method and device and fraud detection method and device
CN109587713A (en) * 2018-12-05 2019-04-05 广州数锐智能科技有限公司 A kind of network index prediction technique, device and storage medium based on ARIMA model
CN109978597A (en) * 2019-01-22 2019-07-05 广东工业大学 A kind of Sales Volume of Commodity prediction technique under festivals or holidays effect
CN111126656A (en) * 2019-11-10 2020-05-08 国网浙江省电力有限公司温州供电公司 Electric energy meter fault quantity prediction method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110119100A1 (en) * 2009-10-20 2011-05-19 Jan Matthias Ruhl Method and System for Displaying Anomalies in Time Series Data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
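ARIMA-SVR-based outlier detection for hydrological time series (基于ARIMA-SVR的水文时间序列异常值检测); Sun Jianshu et al.; Computer & Digital Engineering (《计算机与数字工程》); 2018-02-20 (No. 02); full text *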

Also Published As

Publication number Publication date
CN111611519A (en) 2020-09-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant