CN111611519A - Method and device for detecting personal abnormal behaviors - Google Patents


Publication number
CN111611519A
Authority
CN
China
Prior art keywords
user
detected
model
sequence
scene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010465761.XA
Other languages
Chinese (zh)
Other versions
CN111611519B (en
Inventor
汲丽
钱沁莹
魏国富
葛胜利
钟丹阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information and Data Security Solutions Co Ltd
Original Assignee
Information and Data Security Solutions Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information and Data Security Solutions Co Ltd filed Critical Information and Data Security Solutions Co Ltd
Priority to CN202010465761.XA priority Critical patent/CN111611519B/en
Publication of CN111611519A publication Critical patent/CN111611519A/en
Application granted granted Critical
Publication of CN111611519B publication Critical patent/CN111611519B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/958Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00Commerce
    • G06Q30/06Buying, selling or leasing transactions
    • G06Q30/0601Electronic shopping [e-shopping]
    • G06Q30/0609Buyer or seller confidence or verification

Abstract

The invention provides a method and a device for detecting abnormal personal behaviors. The method comprises the following steps: 1) acquiring a stationary sequence corresponding to the original data of each user to be detected; 2) for each stationary sequence whose data length is smaller than a first preset threshold, determining a corresponding fixed baseline value using a simple average algorithm; 3) for each stationary sequence whose data length is not less than the first preset threshold, training a target ARIMA model and using it to predict a value for the user to be detected; 4) predicting a value for the user to be detected using an exponential-smoothing baseline prediction model; 5) selecting the better of the target ARIMA model and the exponential-smoothing baseline prediction model as the baseline model, and using the baseline model to identify abnormal users among the users to be detected. By applying the embodiments of the invention, more abnormal behaviors can be detected.

Description

Method and device for detecting personal abnormal behaviors
Technical Field
The invention relates to the technical field of network security, in particular to a method for detecting personal abnormal behaviors.
Background
Against the backdrop of the rapid growth of the internet industry, data is being generated at an explosive rate. On a website, large numbers of users click through its pages every day, and these clicks are generally recorded by server software such as Apache and stored in data sources such as text files or databases. More and more enterprises pay attention to website analysis: based on the analysis results they improve the site and identify people who abnormally disrupt its normal order, so as to better maintain and develop the website. In recent years, UEBA (User and Entity Behavior Analytics) has been widely applied as an emerging methodology in search recommendation, traffic risk control, network security, and internal violation management. Its predecessor, UBA (User Behavior Analytics), is very commonly used for search recommendation in the e-commerce field: by analyzing user behaviors such as purchasing, clicking, and favoriting, it labels users and builds user profiles, predicts their future purchasing behavior, and pushes goods they are likely to be interested in. In addition, UBA has been used against fraud for many years, where mature methods that incorporate business knowledge exist and work well in deployment.
In the prior art, the invention patent application No. 201810661474.9 discloses an abnormal behavior detection method and system based on frequent behavior sequence patterns. The method comprises: generating a set of frequent behavior sequence patterns from the collected data; deleting all sub-behavior sequence patterns from the set, thereby compressing it; constructing a set of abnormal behavior detection models based on the compressed set of frequent behavior sequence patterns, and dynamically updating the set of detection models; and using the dynamically updated set of detection models to examine newly generated sets of behavior sequence patterns and output a result. For abnormal access behavior in enterprise Web application systems, that method can reduce the computational complexity of constructing the detection model and of prediction; it also lowers the false alarm rate of the model and improves the accuracy of detecting abnormal access behavior in such systems.
However, the prior art can only detect abnormal behaviors that involve frequent behavior sequences and cannot effectively detect other abnormal behaviors. It therefore suffers from narrow detection coverage and detects relatively few abnormal behaviors.
Disclosure of Invention
The technical problem to be solved by the invention is how to detect more abnormal behaviors.
The invention solves the technical problems through the following technical means:
An embodiment of the invention provides a method for detecting abnormal personal behaviors, which comprises the following steps:
1) acquiring a stationary sequence of the original data corresponding to a user to be detected, wherein the original data comprises: the device attribute information, the risk-control data, and the service data of the user;
2) for each stationary sequence whose data length is smaller than a first preset threshold, determining a corresponding fixed baseline value using a simple average algorithm;
3) for each stationary sequence whose data length is not less than the first preset threshold, training a target ARIMA model, and using the target ARIMA model to predict a value for the user to be detected;
4) predicting a value for the user to be detected using an exponential-smoothing baseline prediction model;
5) selecting the better of the target ARIMA model and the exponential-smoothing baseline prediction model as the baseline model, and using the baseline model to identify abnormal users among the users to be detected.
By applying the embodiment of the invention, corresponding stationary sequences are generated from the original data of all users to be detected, the target ARIMA model is trained, and the better baseline model is then selected between it and the exponential-smoothing baseline prediction model. Combined with the simple average algorithm for short sequences, the baseline covers all users to be detected, so the detection range is wider and more abnormal behaviors can be detected.
Optionally, step 1) comprises:
acquiring the original data corresponding to a user to be detected;
denoising the original data to obtain denoised original data;
for each operation session of the user to be detected, ordering the user's original data according to the scene order of the operation scenes in that session to obtain the scene sequence of the user to be detected, and converting the scene sequence into a stationary sequence.
Optionally, before step 3), the method further includes:
calculating the scene missing rate of the scene sequence, and marking the scene sequence as an abnormal sequence when the scene missing rate falls outside a set confidence interval, wherein the confidence interval is calculated from the scene missing rates of all users to be detected.
Optionally, step 3) includes:
acquiring the autocorrelation plot and the partial autocorrelation plot of each stationary sequence, and determining a current optimal level and a current optimal order from the curve with the highest accuracy in each plot; establishing a current ARIMA model according to the optimal level and the optimal order; and iterating on the current ARIMA model repeatedly to obtain the target ARIMA model, which is then used to predict a value for the user to be detected.
Optionally, the step in step 5) of selecting the better of the target ARIMA model and the exponential-smoothing baseline prediction model as the baseline model includes:
calculating the mean square error of the target ARIMA model and the mean square error of the exponential-smoothing baseline prediction model, and taking the model with the lower mean square error as the baseline model.
Optionally, the step in step 5) of identifying abnormal users among the users to be detected using the baseline model includes:
obtaining the baseline value corresponding to the baseline model;
calculating, for each user to be detected, the ratio of the actual value to the baseline value; and
when the ratio is larger than a second preset threshold, judging the user to be detected to be an abnormal user.
Optionally, the method further includes:
for any user whose order-placing rate or order-return rate exceeds a third preset threshold, sending a verification code to the user for verification; or,
calculating a weighted probability value of the current operation session according to the occurrence probability of each scene in each user's historical operation sessions and the trunk scene category in the user's current operation session, and sending a verification code to the user for verification when the weighted probability value exceeds a fourth preset threshold.
An embodiment of the invention further provides a device for detecting abnormal personal behaviors, the device comprising:
an obtaining module, configured to obtain a stationary sequence of the original data corresponding to a user to be detected, where the original data includes: the device attribute information, the risk-control data, and the service data of the user;
a determining module, configured to determine, for each stationary sequence whose data length is smaller than a first preset threshold, a corresponding fixed baseline value using a simple average algorithm;
a training module, configured to train a target ARIMA model for each stationary sequence whose data length is not less than the first preset threshold, and to use the target ARIMA model to predict a value for the user to be detected;
a first identification module, configured to predict a value for the user to be detected using an exponential-smoothing baseline prediction model;
and a second identification module, configured to select the better of the target ARIMA model and the exponential-smoothing baseline prediction model as the baseline model, and to use the baseline model to identify abnormal users among the users to be detected.
Optionally, the obtaining module is configured to:
acquire the original data corresponding to a user to be detected;
denoise the original data to obtain denoised original data;
for each operation session of the user to be detected, order the user's original data according to the scene order of the operation scenes in that session to obtain the scene sequence of the user to be detected, and convert the scene sequence into a stationary sequence.
Optionally, the device further comprises:
a calculation module, configured to calculate the scene missing rate of each scene sequence and to mark the scene sequence as an abnormal sequence when the scene missing rate falls outside a set confidence interval, wherein the confidence interval is calculated from the scene missing rates of all users to be detected.
The invention has the advantages that:
by applying the embodiment of the invention, corresponding stationary sequences are generated from the original data of all users to be detected, the target ARIMA model is trained, and the better baseline model is then selected between it and the exponential-smoothing baseline prediction model. Combined with the simple average algorithm for short sequences, the baseline covers all users to be detected, so the detection range is wider and more abnormal behaviors can be detected.
Drawings
Fig. 1 is a schematic flow chart of a method for detecting abnormal personal behaviors according to an embodiment of the present invention;
Fig. 2 is a schematic diagram illustrating the principle of a method for detecting abnormal personal behaviors according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a device for detecting abnormal personal behaviors according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
Fig. 1 is a schematic flow chart of a method for detecting abnormal personal behaviors according to an embodiment of the present invention, and Fig. 2 is a schematic diagram illustrating the principle of that method. As shown in Fig. 1 and Fig. 2, the method includes:
S101: Acquiring a stationary sequence of the original data corresponding to a user to be detected, wherein the original data comprises: the device attribute information, the risk-control data, and the service data of the user.
Illustratively, data is first extracted from the platform's service system, the platform's text logs, and other related data sources. Noisy and abnormal data is then removed, such as data irrelevant to user access behavior, test data, or access data from platforms other than the platform to be monitored, so that only the user click data generated when users access the platform to be monitored is retained. This user click data is the original data. The original data covers the following three aspects:
First: device attribute information, which is mainly used to identify whether a device is legitimate and compliant. For example, when a user operates an e-commerce APP, tracking points are embedded at important steps in the operation flow. Each time a user triggers a preset point, a record is generated containing the scene information for the user's current scene and the device attribute information. Fields are separated by commas, users are separated by line breaks, and the file is stored in csv format. In general, the fields of the device attribute information include: device ID (deviced_ID), device model number (product_names), scene information, Mac address (Mac_addresses), APP name (label), version number (versioning), APP size (appsize), first installation time (firstinstaltime), battery health (health), state of charge (gained), current state of charge (power), standard of charge (scale), state of charge (status), voltage (voltage), battery configuration (technology), screen resolution (reliability), screen physical size (physical), screen resolution (resolution), memory size (media), current cpu number (cpu), cpu frequency (bootemips), cpu architecture (processor), cpu total number (cpu_area), cpu attribute 1 (cpu_attribute), cpu_core 2 (cpu_ID), and cpu_core_ID), and cpu_ID (cpu_ID), and its configuration (processor), and its configuration (management), and its configuration (cpu_management) and its configuration (cpu_ID) and (hardware), Camera Attribute 1 (largestsize), Camera Attribute 2 (support_formats), Security Module Attribute 1 (blacklisthit), Security Module Attribute 2 (cydiaresult), root authority (root), sandbox (sandbox), simulator (simulator), static (static), maximum available sound volume (maxvolulaceavailability), maximum sound volume (maxvolulaceralearm), sound card information (maxvolumeddtmf), sound card information (maxvolurememusic), maximum notification sound volume (maxvolumentnotification), sound card information (maxvoluuse), maximum alarm sound volume (maxvolumering), sound card information (maxvolumesystem), sound card information (maxvoluvevolvacearability), sound card information (maxvolulacesound), sound card information (ringing), bluetooth history connection number (hasconnect), bluetooth information (bluetooth-visible or not), bluetooth information (discourse) or not, whether bluetooth information (bluetooth-available) is obtained, bluetooth information (bluetooth-supported function (2 bluetooth) or not, bluetooth information (bluetooth-supported by bluetooth), bluetooth information (2 bluetooth-supported by bluetooth function) or not (bluetooth information (bluetooth-supported by bluetooth) Bluetooth information-whether or not advertisement extension (isLeExtendedVertingSupported), bluetooth information-whether or not regular advertisement (isLePeriodicAdVertingSupported) is supported, bluetooth information-whether or not hybrid advertisement (isMultipleAdvermentSupported) is supported, bluetooth information-whether or not offload filtering (isOffloaddFilterSupported) is supported, bluetooth information-whether or not scan offload batch processing (isOffloaddScanBatchIngSupported) is supported, application number (APPLIST_COUNT), System application number (Sypplist_COUNT), Security Module Attribute (sensor_COUNT), sim card information (sim_mes), International Mobile Subscriber Identity (IMSI), International Mobile Equipment Identity (IMEI).
The security module attribute 1 and the security module attribute 2 are both self-contained information in the mobile phone system.
Second: risk-control data, which includes all request information and personal information of the user. Each operation a user performs in the e-commerce APP is one record, with src_user as the primary key; fields are separated by commas, users are separated by line breaks, and the file is stored in csv format. The fields of the risk-control data include: user name (src_user), timestamp (event_timestamp), browser allocation ID (browser_client_ID), business link (business_hierarchy), cell phone number (cellphone_no), cookie_ID (cookie_ID), time channel (ch_event_channel), event type (ch_event_type), system (ch_system), IP address (ipaddr), IP city (ipip_city), IP province (ipip_service), digital identity recognition frame (openid), user agent (user agent), hit rule number (count), login channel (log_channel), APP program version information (APP_version), openrule group name (openname), hit rule number (count), event number (event_event), rule group flag (flag), whether the message is valid device (event_device), whether the message is an error simulation device (event_device) or not, network status (network_type), authentication mode (login_way), login channel (login_channel).
Third: business data, which includes all of the user's order information, such as orders, returned orders, and order details. Each operation a user performs on an order is one record, with src_user as the primary key; fields are separated by commas, users are separated by line breaks, and the file is stored in csv format. The fields of the business data include: user name (src_user), timestamp (eval_timestamp), order number (order_id), telephone number (cellphone_no), order scene (ch_distribution_hierarchy), system (ch_system), IP address (ipaddr), city where IP is located (ipip_city), province where IP is located (ipip_hierarchy), digital identity identification frame (openid), user agent (user agent), commodity set (goods_set), coupon name (), order channel (event_channel), order channel (order_channel), order commodity amount (order_amount), receiver machine number (order_celphone_no), order number (order_no), order quantity (order_qty), order type (order_pe), receiver address (type_driver), restaurant name (restaurant_system_name), service status evaluation system (event_hierarchy_ring), service evaluation system (event_order_ring), login time (login_timestamp), authentication mode (login_way), order time (order_timestamp), request timestamp (timestamp), SSOID (ssoid).
Then, the scene information of each user to be detected is extracted. The scene information in the original data comprises: registration (001); registration, obtaining a verification code (002); login (003); login, obtaining a verification code (004); logout (005); placing an order for goods (006); order submission, verification (007); payment, verification (008); refund, verification (009); receipt completion (010); sign-in, verification (011); obtaining a benefit code, verification (012); and use of the benefit code, verification (013).
In the collected raw data, the scene information is scattered. Records that share the same device ID can be associated and their scene information gathered together, and the scenes are then ordered according to the scene sequence of the scene corresponding to each operation. For example, after a user normally completes one order placement, the operation scenes within the same operation session can be arranged into a sequence in chronological order, such as the sequence for one order placement by the client (003-. In this step, the second-degree information in the raw data is extracted and added to the feature engineering as an "individual feature value", which enriches the information carried by the features and improves the prediction accuracy of the model.
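The association and ordering step described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patent's implementation: the record layout (tuples of device ID, timestamp, scene code) and the function name `build_scene_sequences` are hypothetical, and splitting a device's events into separate operation sessions is left out.

```python
from collections import defaultdict


def build_scene_sequences(records):
    """Group click records by device ID and order each group's scenes by time.

    `records` is an iterable of (device_id, timestamp, scene_code) tuples.
    This layout is assumed for illustration; the source does not define it.
    """
    by_device = defaultdict(list)
    for device_id, timestamp, scene_code in records:
        by_device[device_id].append((timestamp, scene_code))
    # Sort each device's events chronologically and keep only the scene codes.
    return {
        device_id: [scene for _, scene in sorted(events)]
        for device_id, events in by_device.items()
    }


# Hypothetical records; scene codes follow the 001-013 list in the text.
records = [
    ("dev-a", 3, "007"),
    ("dev-b", 1, "003"),
    ("dev-a", 1, "003"),
    ("dev-a", 2, "006"),
]
sequences = build_scene_sequences(records)
```

Here the scattered records for `dev-a` are gathered and arranged in time order as login, order placement, order submission.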
A histogram can then be used to visualize the data, and the autocorrelation plot and partial autocorrelation plot of each scene sequence are obtained. The stationarity of each scene sequence is then judged from these two plots using an existing algorithm. If both plots tail off, that is, they show a decaying trend but do not reach 0 after visualization, the scene sequence is non-stationary, and differencing can be applied to the non-stationary scene sequence data to obtain the corresponding stationary sequence. A scene sequence that is already stationary needs no differencing.
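Differencing, the operation used above to turn a non-stationary sequence into a stationary one, can be illustrated with a short sketch. The function name `difference` and the example values are assumptions made for illustration; the source does not say how many rounds of differencing it applies.

```python
def difference(series, order=1):
    """Apply first differencing `order` times: y[t] = x[t] - x[t-1]."""
    result = list(series)
    for _ in range(order):
        result = [b - a for a, b in zip(result, result[1:])]
    return result


trend = [2, 4, 6, 8, 10]         # a linear trend, so the mean is not constant
stationary = difference(trend)   # one round of differencing removes the trend
```

Each round of differencing shortens the sequence by one value, which matters for the data-length threshold applied in the later steps.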
Further, when scene sequences are extracted, some data is always missing because of hardware, privacy, and similar constraints. The scene missing rate can therefore be calculated from the number of missing scenes in the scene sequence, for example, in the 003-. The scene missing rates of all users to be detected are then collected and a confidence interval for the missing rate is calculated; for example, the confidence interval can be determined using the quartile method. A sequence whose missing rate falls outside the confidence interval is considered to have severe information loss, is marked as 0, and is treated as risky; one whose missing rate falls within the interval is considered tolerable and is marked as 1. If many scenes are missing, the scene sequence can be marked as an abnormal sequence.
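A minimal sketch of the missing-rate check might look like the following. Several details are assumptions, since the source does not spell them out: the missing rate is taken as the share of expected scenes absent from the observed sequence, the quartile method is interpreted as the common interval [Q1 - 1.5·IQR, Q3 + 1.5·IQR], and all function names are hypothetical.

```python
import statistics


def scene_missing_rate(observed, expected):
    """Fraction of expected scenes that do not appear in the observed sequence."""
    missing = [scene for scene in expected if scene not in observed]
    return len(missing) / len(expected)


def quartile_interval(rates, k=1.5):
    """Interval [Q1 - k*IQR, Q3 + k*IQR]; the factor k = 1.5 is an assumption."""
    q1, _, q3 = statistics.quantiles(rates, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def mark_sequence(rate, interval):
    """Return 1 (tolerable) inside the interval and 0 (at risk) outside it."""
    low, high = interval
    return 1 if low <= rate <= high else 0


# Hypothetical missing rates for six users to be detected.
rates = [0.0, 0.1, 0.1, 0.2, 0.2, 0.9]
interval = quartile_interval(rates)
marks = [mark_sequence(rate, interval) for rate in rates]
example_rate = scene_missing_rate(["003", "006", "008"], ["003", "006", "007", "008"])
```

With these made-up rates only the last user, whose sequence is missing 90% of its scenes, falls outside the interval and is marked 0.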
S102: For each stationary sequence whose data length is smaller than the first preset threshold, determining a corresponding fixed baseline value using a simple average algorithm.
Sequences that do not meet the data-length requirement are handled with the simple-average baseline prediction method. Such sequences are too short and would degrade the smoothing as a whole, but to preserve data completeness they are not discarded. Instead, a simple average, such as the arithmetic mean, the geometric mean, or a weighted mean, is computed over the actual scene values at each earlier moment of that ID's stationary sequence to obtain a fixed baseline value. The threshold is set to 7, chosen with regard to the length that covers 90% of all the data.
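The short-sequence branch can be sketched as below. The threshold of 7 comes from the text; the function names, the `method` argument, and the example weights are assumptions made for illustration.

```python
import statistics

MIN_LENGTH = 7  # first preset threshold quoted in the text


def needs_simple_average(sequence):
    """True when the sequence is too short for ARIMA or exponential smoothing."""
    return len(sequence) < MIN_LENGTH


def fixed_baseline(sequence, method="arithmetic", weights=None):
    """Fixed baseline value of a short stationary sequence by simple averaging."""
    if method == "arithmetic":
        return statistics.fmean(sequence)
    if method == "geometric":
        # The geometric mean is only defined for positive values.
        return statistics.geometric_mean(sequence)
    if method == "weighted":
        return sum(w * x for w, x in zip(weights, sequence)) / sum(weights)
    raise ValueError(f"unknown averaging method: {method}")


short_sequence = [2, 4, 6]
baseline = fixed_baseline(short_sequence) if needs_simple_average(short_sequence) else None
```

A weighted mean lets recent moments count for more than older ones, which may suit behavior that drifts over time; the source does not say which of the three averages it prefers.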
S103: For each stationary sequence whose data length is not less than the first preset threshold, a target ARIMA (Autoregressive Integrated Moving Average) model is trained, and the target ARIMA model is used to predict a value for the user to be detected.
The autocorrelation plot and the partial autocorrelation plot of each stationary sequence are acquired. The autocorrelation function compares an ordered sequence of random variables with itself and reflects the correlation between values of the same sequence at different times. The partial autocorrelation function measures the strict correlation between two variables, that is, the degree of correlation between them after the influence of the intermediate variables has been removed. The required parameters are: dataset, the time-series data; p_values, the value range of p, of list or array type; d_values, the value range of d, of list or array type; q_values, the value range of q, of list or array type; and train_pro, the proportion of training data.
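To make the two functions concrete, the sketch below computes the sample autocorrelation directly and derives the partial autocorrelation from it with the Durbin-Levinson recursion. This is a standard textbook construction offered as an illustration; the source does not state how its plots are computed, and the function names are hypothetical.

```python
def autocorrelation(series, max_lag):
    """Sample autocorrelation coefficients r[0..max_lag] of a sequence."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / variance
        for lag in range(max_lag + 1)
    ]


def partial_autocorrelation(acf):
    """Partial autocorrelations for lags 1..len(acf)-1 (Durbin-Levinson)."""
    pacf = []
    prev = []  # coefficients of the AR model fitted at the previous lag
    for k in range(1, len(acf)):
        num = acf[k] - sum(prev[j] * acf[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * acf[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


acf = autocorrelation([1, 2, 3, 4, 5], max_lag=1)
# Theoretical autocorrelations of an AR(1) process with coefficient 0.5.
pacf = partial_autocorrelation([1.0, 0.5, 0.25, 0.125])
```

For the AR(1) example the partial autocorrelation is non-zero only at lag 1, which is the kind of cut-off commonly read off such plots when choosing the autoregressive order p.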
The autocorrelation function and the partial autocorrelation function can then be calculated and plotted with tools such as SPSS or MATLAB, and the plots are used to judge which values p and q should take. The vertical axis of each plot is the correlation coefficient and the horizontal axis is the order; a certain periodic relationship between the order and the correlation coefficient can be seen, and p and q are taken as the minimum number of cycles. If the chosen values of p and q are later found not to pass the model test, they are readjusted and the calculation is repeated. The current ARIMA model is iterated in this way to obtain the target ARIMA model, which is then used to predict a value for the user to be detected.
S104: Predicting a value for the user to be detected using an exponential-smoothing baseline prediction model.
The baseline prediction function for time-series data is the exponential-smoothing baseline prediction model. Its required parameters are: the time-series data; thld, the deviation function; p_max, the maximum of p; d_max, the maximum of d; q_max, the maximum of q; and train_pro, the proportion of training samples.
The exponential-smoothing baseline prediction model is then used to predict a value for the user to be detected.
It should be emphasized that the exponential-smoothing baseline prediction model is an existing model.
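As an illustration of what an exponential-smoothing baseline computes, the sketch below implements simple (single) exponential smoothing. The source does not say which variant it uses, so the choice of the single-parameter form, the smoothing factor `alpha = 0.5`, and the function name are all assumptions.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """Simple exponential smoothing.

    The level starts at series[0] and is updated as
    level = alpha * value + (1 - alpha) * level.
    Returns the one-step-ahead forecasts for series[1:] and the forecast
    for the next, not yet observed, value.
    """
    level = series[0]
    fitted = []
    for value in series[1:]:
        fitted.append(level)  # forecast made before seeing `value`
        level = alpha * value + (1 - alpha) * level
    return fitted, level


fitted, next_value = exponential_smoothing_forecast([10, 20, 30], alpha=0.5)
```

The in-sample forecasts in `fitted` are the values that would be compared with the actual values when the model's mean square error is computed.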
S105: Selecting the better of the target ARIMA model and the exponential-smoothing baseline prediction model as the baseline model, and using the baseline model to identify abnormal users among the users to be detected.
The mean square error of the exponential-smoothing baseline prediction model is obtained and compared with the mean square error of the target ARIMA model; the model with the smaller mean square error is taken as the baseline model.
Finally, abnormal users are detected using the baseline model.
By applying the embodiment of the invention, corresponding stationary sequences are generated from the original data of all users to be detected, the target ARIMA model is trained, and the better baseline model is then selected between it and the exponential-smoothing baseline prediction model. Combined with the simple average algorithm for short sequences, the baseline covers all users to be detected, so the detection range is wider and more abnormal behaviors can be detected.
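The comparison by mean square error can be sketched as follows. The prediction values are made up for illustration and do not come from any fitted model; the function and key names are hypothetical.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def select_baseline(actual, candidates):
    """Return (name, mse) of the candidate whose predictions have the lowest MSE.

    `candidates` maps a model name to that model's predictions for `actual`.
    """
    scores = {
        name: mean_squared_error(actual, predictions)
        for name, predictions in candidates.items()
    }
    best = min(scores, key=scores.get)
    return best, scores[best]


actual = [10, 12, 14]
candidates = {
    "arima": [10, 13, 14],            # illustrative predictions only
    "exp_smoothing": [9, 11, 16],     # illustrative predictions only
}
best_name, best_mse = select_baseline(actual, candidates)
```

In this made-up case the ARIMA predictions have the smaller error, so that model would serve as the baseline.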
The embodiment of the invention starts from the access habit of the user, finds the abnormality by the personal behavior of the user, compares the currently input behavior data with the previous regular behavior data, and finds the abnormal output in time.
Moreover, internal threats have been the most prevalent type of network attack. The internal abnormal event is mostly a small-probability event and is required to be very accurate, the detection is carried out by relying on known rules for a long time, and the rules and experiences of numerous experts are built in a detection engine. The UEBA technology used in the prior art depends on expert experience for a long time for exploration, the related range is limited, the rule threshold is determined by known rules, so that the recall rate is lower and the accuracy rate is required to be improved.
In the embodiment of the invention, the mean square error is taken as an evaluation standard, and the optimal solution of the ARIMA model and the moving average model with optimal parameters is taken as a baseline; and the actual value/baseline value is used as a deviation value, a dynamic behavior baseline is established to discover the deviation of the internal user from the personal normal mode, the risk level of the user is judged according to the value accumulated by the risk, and more accurate abnormality is discovered from focusing on the data content to the context relationship, behavior analysis and the like.
Example 2
Example 2 of the invention adds the following steps to Example 1:
sending a verification code to the user for verification when the user's order placing rate or order return rate exceeds a third preset threshold; or,
calculating a weighted probability value of the current operation session from the occurrence probability of each scene in each user's historical operation sessions and the main ("trunk") scene category in the user's current operation session, and sending a verification code to the user for verification when the weighted probability value exceeds a fourth preset threshold.
The embodiment of the invention combines service data to establish a "whitewashing mechanism" that immediately verifies users who cannot be classified accurately, thereby reducing the number of user complaints.
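The second trigger can be illustrated with a rough Python sketch. The description does not define how the weighted probability value is computed, so the weighting used here (each scene's historical frequency, averaged over the scenes of the current session) is purely an assumption, as are all function names. Only the direction of the comparison, sending a verification code when the value exceeds the threshold, is taken from the text.

```python
from collections import Counter


def scene_probabilities(historical_sessions):
    """Occurrence probability of each scene across a user's historical sessions."""
    counts = Counter(scene for session in historical_sessions for scene in session)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no historical scenes available")
    return {scene: count / total for scene, count in counts.items()}


def weighted_session_probability(current_session, probabilities):
    """Assumed weighting: mean historical probability of the scenes in the session.

    Scenes never seen in the history contribute a probability of 0.
    """
    if not current_session:
        raise ValueError("current session must not be empty")
    total = sum(probabilities.get(scene, 0.0) for scene in current_session)
    return total / len(current_session)


def needs_verification(current_session, historical_sessions, threshold):
    """True when the weighted value exceeds the (fourth) preset threshold."""
    probabilities = scene_probabilities(historical_sessions)
    return weighted_session_probability(current_session, probabilities) > threshold
```

A real implementation would need the weighting and threshold that the service actually uses; this sketch only shows the shape of the computation.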
Example 3
Fig. 3 is a schematic structural diagram of an apparatus for detecting abnormal behavior of a person according to an embodiment of the present invention, where the apparatus includes:
an obtaining module 301, configured to obtain a stationary sequence of raw data corresponding to a user to be detected, where the raw data includes: the device attribute information, the risk control data and the service data of the user;
a determining module 302, configured to determine, by using a simple average algorithm, a corresponding baseline fixed value for each stationary sequence whose data length is smaller than a first preset threshold;
a training module 303, configured to train a target ARIMA model for each stationary sequence whose data length is not less than the first preset threshold, and to predict a predicted value of the user to be detected by using the target ARIMA model;
a first identification module 304, configured to predict a predicted value of the user to be detected by using an exponential smoothing baseline prediction model;
a second identification module 305, configured to select an optimal model from the target ARIMA model and the exponential smoothing baseline prediction model as a baseline model, and to identify abnormal users among the users to be detected by using the baseline model.
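The division of work between the determining module and the training module can be sketched as a simple routing step. In this Python sketch, `min_length` stands for the first preset threshold, and the function and variable names are hypothetical.

```python
def simple_average_baseline(sequence):
    """Baseline fixed value for a short sequence: the arithmetic mean."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(sequence) / len(sequence)


def route_sequences(sequences, min_length):
    """Split users by sequence length.

    Users whose sequence is shorter than min_length get a fixed baseline
    from the simple average; the rest are returned for model training.
    """
    fixed_baselines = {}
    needs_model = []
    for user, sequence in sequences.items():
        if len(sequence) < min_length:
            fixed_baselines[user] = simple_average_baseline(sequence)
        else:
            needs_model.append(user)
    return fixed_baselines, needs_model
```

Routing in this way is what lets the baseline cover every user, including those with too little history for ARIMA training.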
In a specific implementation manner of the embodiment of the present invention, the obtaining module 301 is configured to:
acquiring original data corresponding to a user to be detected;
denoising the original data to obtain denoised original data;
sorting the original data of the user to be detected, according to the user's operation session, in the scene order corresponding to each operation scene in the session, so as to obtain the scene sequence of the user to be detected, and converting the scene sequence into a stationary sequence.
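The lines above do not say how a scene sequence is converted into a stationary sequence. A common approach, and the one implied by the d parameter of ARIMA, is differencing; the short Python sketch below assumes that approach and a numeric sequence, and its function name is hypothetical.

```python
def difference(series, order=1):
    """Apply first differences `order` times: y[t] = x[t] - x[t-1]."""
    if order < 0:
        raise ValueError("order must be non-negative")
    result = list(series)
    for _ in range(order):
        result = [later - earlier for earlier, later in zip(result, result[1:])]
    return result
```

Each round of differencing shortens the sequence by one element, so `difference([1, 4, 9, 16])` gives `[3, 5, 7]` and a second round gives `[2, 2]`.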
In a specific implementation manner of the embodiment of the present invention, the apparatus further includes:
a calculation module, configured to calculate the scene missing rate of each scene sequence and to mark a scene sequence as an abnormal sequence when its scene missing rate falls outside a set confidence interval, where the confidence interval is calculated from the scene missing rates of all users to be detected.
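A minimal Python sketch of this check follows. The definition of the missing rate (share of expected scenes absent from the sequence), the normal-approximation interval with `z = 1.96`, and all names are assumptions for illustration; the description only states that the interval is derived from the missing rates of all users.

```python
import statistics


def scene_missing_rate(scene_sequence, expected_scenes):
    """Share of expected scenes that never appear in the user's scene sequence."""
    expected = set(expected_scenes)
    if not expected:
        raise ValueError("expected_scenes must not be empty")
    return len(expected - set(scene_sequence)) / len(expected)


def confidence_interval(rates, z=1.96):
    """Assumed interval: mean +/- z * population standard deviation."""
    mean = statistics.mean(rates)
    spread = z * statistics.pstdev(rates)
    return mean - spread, mean + spread


def mark_abnormal_sequences(user_sequences, expected_scenes):
    """Return the users whose missing rate falls outside the interval."""
    rates = {
        user: scene_missing_rate(sequence, expected_scenes)
        for user, sequence in user_sequences.items()
    }
    low, high = confidence_interval(list(rates.values()))
    return {user for user, rate in rates.items() if rate < low or rate > high}
```

With nine users who visit every expected scene and one user who visits only the first, the tenth user's missing rate of 0.75 lies above the upper bound and that sequence is marked abnormal.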
In a specific implementation manner of the embodiment of the present invention, the training module 303 is configured to:
acquiring an autocorrelation coefficient chart and a partial autocorrelation chart of each stationary sequence, and determining a current optimal level and a current optimal order according to a curve with highest accuracy in the autocorrelation coefficient chart and a curve with highest accuracy in the partial autocorrelation chart; establishing a current ARIMA model according to the optimal hierarchy and the optimal order; and repeatedly iterating the current ARIMA model to obtain a target ARIMA model, and predicting the predicted value of the user to be detected by using the target ARIMA model.
In a specific implementation manner of the embodiment of the present invention, the second identifying module 305 is configured to:
calculating the mean square error of the target ARIMA model and the mean square error of the exponential smoothing baseline prediction model, and taking the model with the lower mean square error as the baseline model.
In a specific implementation manner of the embodiment of the present invention, the second identifying module 305 is configured to:
obtaining a baseline value corresponding to the baseline model;
calculating the ratio of the actual value to the baseline value for each user to be detected; and
judging that the user to be detected is an abnormal user when the ratio is larger than a second preset threshold.
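The ratio test performed by the second identification module can be sketched as follows. The function name and the handling of a zero baseline are assumptions; `ratio_threshold` stands for the second preset threshold.

```python
def flag_abnormal_users(actual_values, baseline_values, ratio_threshold):
    """Return {user: ratio} for users whose actual/baseline ratio exceeds the threshold.

    Assumption: users with a zero baseline are skipped because the ratio is
    undefined; a real system would need an explicit policy for that case.
    """
    flagged = {}
    for user, actual in actual_values.items():
        baseline = baseline_values[user]
        if baseline == 0:
            continue
        ratio = actual / baseline
        if ratio > ratio_threshold:
            flagged[user] = ratio
    return flagged
```

For instance, with baselines of 10 for two users, actual values of 30 and 11, and a threshold of 2.0, only the first user is flagged, with a ratio of 3.0.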
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for detecting abnormal behavior of an individual, the method comprising:
1) acquiring a stationary sequence of original data corresponding to a user to be detected, wherein the original data comprises: the device attribute information, the risk control data and the service data of the user;
2) determining a corresponding baseline fixed value by using a simple average algorithm aiming at a stationary sequence with the data length smaller than a first preset threshold value in the stationary sequence;
3) training a target ARIMA model aiming at a stationary sequence with the data length not less than a first preset threshold value in the stationary sequence, and predicting a predicted value of a user to be detected by using the target ARIMA model;
4) predicting a predicted value of the user to be detected by using an exponential smoothing baseline prediction model;
5) selecting an optimal model from the target ARIMA model and the exponential smoothing baseline prediction model as a baseline model, and identifying abnormal users among the users to be detected by using the baseline model.
2. The method for detecting the abnormal behavior of the person as claimed in claim 1, wherein the step 1) comprises:
acquiring original data corresponding to a user to be detected;
denoising the original data to obtain denoised original data;
sorting the original data of the user to be detected, according to the user's operation session, in the scene order corresponding to each operation scene in the session, so as to obtain the scene sequence of the user to be detected, and converting the scene sequence into a stationary sequence.
3. The method for detecting abnormal personal behavior according to claim 1, wherein before step 3), the method further comprises:
and calculating scene missing rate in the scene sequence, and marking the scene sequence as an abnormal sequence under the condition that the scene missing rate exceeds a set confidence interval range, wherein the confidence interval range is calculated according to the scene missing rate of all users to be detected.
4. The method for detecting abnormal personal behaviors as claimed in claim 1, wherein the step 3) comprises:
acquiring an autocorrelation coefficient chart and a partial autocorrelation chart of each stationary sequence, and determining a current optimal level and a current optimal order according to a curve with highest accuracy in the autocorrelation coefficient chart and a curve with highest accuracy in the partial autocorrelation chart; establishing a current ARIMA model according to the optimal hierarchy and the optimal order; and repeatedly iterating the current ARIMA model to obtain a target ARIMA model, and predicting the predicted value of the user to be detected by using the target ARIMA model.
5. The method as claimed in claim 1, wherein the step of selecting the optimal model from the target ARIMA model and the exponential smoothing prediction baseline model in the step 5) as the baseline model comprises:
and calculating the mean square error of the target ARIMA model and the mean square error of the baseline model predicted by an exponential smoothing method, and taking the model with lower mean square error as the baseline model.
6. The method for detecting abnormal personal behaviors as claimed in claim 1, wherein the step of identifying abnormal users among the users to be detected by using the baseline model in the step 5) comprises:
obtaining a baseline value corresponding to the baseline model;
calculating the ratio of the actual value to the baseline value of each user to be detected,
and under the condition that the ratio is larger than a second preset threshold value, judging that the user to be detected is an abnormal user.
7. The method of claim 1, wherein the method further comprises:
sending a verification code to the user for verification aiming at the user with the order placing rate or the order returning rate exceeding a third preset threshold value, or,
calculating a weighted probability value of the current operation session from the occurrence probability of each scene in each user's historical operation sessions and the main ("trunk") scene category in the user's current operation session, and sending a verification code to the user for verification when the weighted probability value exceeds a fourth preset threshold.
8. An apparatus for detecting abnormal behavior of a person, the apparatus comprising:
an obtaining module, configured to obtain a stationary sequence of raw data corresponding to a user to be detected, where the raw data includes: the device attribute information, the risk control data and the service data of the user;
the determining module is used for determining a corresponding baseline fixed value by using a simple average algorithm aiming at a stationary sequence of which the data length is smaller than a first preset threshold value in the stationary sequence;
the training module is used for training a target ARIMA model aiming at a stationary sequence with the data length not less than a first preset threshold value in the stationary sequence and predicting a predicted value of a user to be detected by using the target ARIMA model;
the first identification module is used for predicting the predicted value of the user to be detected by using an exponential smoothing method to predict the baseline model;
and the second identification module is used for selecting an optimal model from the target ARIMA model and the exponential smoothing method prediction baseline model as a baseline model and identifying abnormal users in the users to be detected by using the baseline model.
9. The apparatus for detecting abnormal personal behavior according to claim 8, wherein the obtaining module is configured to:
acquiring original data corresponding to a user to be detected;
denoising the original data to obtain denoised original data;
sorting the original data of the user to be detected, according to the user's operation session, in the scene order corresponding to each operation scene in the session, so as to obtain the scene sequence of the user to be detected, and converting the scene sequence into a stationary sequence.
10. The apparatus according to claim 8, wherein the apparatus further comprises:
and the calculation module is used for calculating scene missing rates in the scene sequences, and marking the scene sequences as abnormal sequences under the condition that the scene missing rates exceed a set confidence interval range, wherein the confidence interval range is calculated according to the scene missing rates of all users to be detected.
CN202010465761.XA 2020-05-28 2020-05-28 Method and device for detecting personal abnormal behaviors Active CN111611519B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010465761.XA CN111611519B (en) 2020-05-28 2020-05-28 Method and device for detecting personal abnormal behaviors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010465761.XA CN111611519B (en) 2020-05-28 2020-05-28 Method and device for detecting personal abnormal behaviors

Publications (2)

Publication Number Publication Date
CN111611519A true CN111611519A (en) 2020-09-01
CN111611519B CN111611519B (en) 2023-07-11

Family

ID=72199734

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010465761.XA Active CN111611519B (en) 2020-05-28 2020-05-28 Method and device for detecting personal abnormal behaviors

Country Status (1)

Country Link
CN (1) CN111611519B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112822046A (en) * 2021-01-04 2021-05-18 新华三大数据技术有限公司 Flow prediction method and device
CN112907622A (en) * 2021-01-20 2021-06-04 厦门市七星通联科技有限公司 Method, device, equipment and storage medium for identifying track of target object in video
CN112966732A (en) * 2021-03-02 2021-06-15 东华大学 Multi-factor interactive behavior anomaly detection method with periodic attribute

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110119100A1 (en) * 2009-10-20 2011-05-19 Jan Matthias Ruhl Method and System for Displaying Anomalies in Time Series Data
CN103489039A (en) * 2013-09-12 2014-01-01 重庆大学 Expressway traffic flow fusing and forecasting method with online self-tuning and optimizing function
CN104994539A (en) * 2015-06-30 2015-10-21 电子科技大学 Wireless sensor network traffic abnormality detection method based on ARIMA model
CN105550772A (en) * 2015-12-09 2016-05-04 中国电力科学研究院 Online historical data tendency analysis method
CN108921355A (en) * 2018-07-03 2018-11-30 国家计算机网络与信息安全管理中心 A kind of alarm threshold setting method and device based on time series predicting model
CN109376924A (en) * 2018-10-18 2019-02-22 广东电网有限责任公司 A kind of method, apparatus, equipment and the readable storage medium storing program for executing of material requirements prediction
CN109410036A (en) * 2018-10-09 2019-03-01 北京芯盾时代科技有限公司 A kind of fraud detection model training method and device and fraud detection method and device
CN109587713A (en) * 2018-12-05 2019-04-05 广州数锐智能科技有限公司 A kind of network index prediction technique, device and storage medium based on ARIMA model
CN109978597A (en) * 2019-01-22 2019-07-05 广东工业大学 A kind of Sales Volume of Commodity prediction technique under festivals or holidays effect
CN111126656A (en) * 2019-11-10 2020-05-08 国网浙江省电力有限公司温州供电公司 Electric energy meter fault quantity prediction method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孙建树等: "基于ARIMA-SVR的水文时间序列异常值检测", 《计算机与数字工程》 *

Also Published As

Publication number Publication date
CN111611519B (en) 2023-07-11

Similar Documents

Publication Publication Date Title
US11783028B2 (en) Systems and methods for detecting resources responsible for events
CN111614690B (en) Abnormal behavior detection method and device
CN113347205B (en) Method and device for detecting service access request
US7815106B1 (en) Multidimensional transaction fraud detection system and method
CN111611519B (en) Method and device for detecting personal abnormal behaviors
CN110223146B (en) System and method for monitoring whole process of electricity purchasing service of customer
CN112733045B (en) User behavior analysis method and device and electronic equipment
CN113572752B (en) Abnormal flow detection method and device, electronic equipment and storage medium
CN112003846B (en) Credit threshold training method, IP address detection method and related device
CN111612085B (en) Method and device for detecting abnormal points in peer-to-peer group
CN108804501B (en) Method and device for detecting effective information
CN109711984B (en) Pre-loan risk monitoring method and device based on collection urging
CN112347457A (en) Abnormal account detection method and device, computer equipment and storage medium
CN116318974A (en) Site risk identification method and device, computer readable medium and electronic equipment
CN113362069A (en) Dynamic adjustment method, device and equipment of wind control model and readable storage medium
CN110990810B (en) User operation data processing method, device, equipment and storage medium
CN114422168A (en) Malicious machine traffic identification method and system
CN112801788A (en) Internet stock right financing platform monitoring system and monitoring method
CN114189585A (en) Crank call abnormity detection method and device and computing equipment
CN111800409A (en) Interface attack detection method and device
CN117217830B (en) Advertisement bill monitoring and identifying method, system and readable storage medium
CN111447082B (en) Determination method and device of associated account and determination method of associated data object
CN117544343A (en) Risk behavior identification method and device, storage medium and computer equipment
CN114186268A (en) Session monitoring method and device
CN114611108A (en) Data processing method and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant