CN113591994A - Terminal behavior prediction method based on automatic labeling - Google Patents


Info

Publication number
CN113591994A
Authority
CN
China
Prior art keywords
behavior
terminal
behavior pattern
pattern
length
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110884609.XA
Other languages
Chinese (zh)
Other versions
CN113591994B (en
Inventor
张宁波
严雅洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN202110884609.XA
Publication of CN113591994A
Application granted
Publication of CN113591994B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Biophysics (AREA)
  • Fuzzy Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a terminal behavior prediction method based on automatic labeling, comprising five steps: data preprocessing, frequent behavior pattern mining, behavior pattern clustering, behavior recognition, and behavior prediction.

Description

Terminal behavior prediction method based on automatic labeling
Technical Field
The invention relates to the technical field of networks, in particular to a terminal behavior prediction method based on automatic labeling.
Background
In recent years, Internet of Things (IoT) technology has developed rapidly and has greatly improved daily life. The number of intelligent terminal devices has increased markedly, and an intelligent Internet of Everything is becoming the inevitable trend of future IoT development. In an LTE-A network, call detail records (CDR) in the core network store, in real time, the call, short message, and data service information of human-to-human (H2H) communication, including the User Equipment Identity (UE ID), the base station location, the direction and communication type (SMS/call) of each communication, and the data traffic. Hidden predictable information can be extracted from the CDR data to predict the future behavior of a terminal, so that a network operator can prepare a coping strategy in advance and improve its service efficiency. Similarly, in a 5G network, the core network stores event log (EDR) data of IoT terminals in real time. The EDR data includes information such as the UE ID, the terminal operation sequence, the operation execution time, the operation duration, and the physical resources occupied. From this data, the access behavior of an IoT terminal can be predicted.
Existing terminal behavior prediction models obtain a terminal behavior sequence by manually labeling the terminal operation sequence with behaviors; the resulting sequence is then used to build the prediction model. This labeling process requires human intervention, consumes a large amount of time and labor, and limits practical application.
The modeling process of a conventional terminal behavior prediction model includes the following steps.
Step 1: Preprocessing the EDR data of the terminal: abnormal feature data is processed to obtain EDR data that can be labeled with behaviors.
Step 2: Manual behavior labeling: a group of consecutive operation events corresponds to one behavior of the terminal. Researchers label the behaviors manually, converting the terminal operation sequence into the corresponding terminal behavior sequence used for terminal behavior prediction.
Step 3: Behavior prediction: the behavior of the terminal at the next moment is predicted by a prediction model, based on the labeled historical terminal behavior data and the current terminal behavior.
Existing terminal behavior prediction models require data at the operation-event level to be manually labeled at the behavior level. The prediction model therefore depends on human intervention, which hinders making terminal behavior prediction intelligent and limits practical application. In addition, when the volume of terminal data is very large, the workload and time cost of labeling and verifying terminal behaviors increase significantly.
Disclosure of Invention
The invention aims to provide a terminal behavior prediction method based on automatic labeling that achieves a high-accuracy terminal behavior prediction model, labels behaviors automatically without human intervention, reduces time and labor costs, and further improves the intelligence and practicability of the terminal behavior prediction model.
To achieve this purpose, the invention provides the following technical solution:
The invention provides a terminal behavior prediction method based on automatic labeling, comprising the following steps:
S1, data preprocessing: acquiring current behavior data of a terminal, numbering the terminal operation sequence, screening out infrequent operation events from the terminal operation sequence data, and renumbering the processed operation data;
S2, frequent behavior pattern mining: performing frequent behavior pattern mining on the operation data processed in step S1, iterating so that the behavior pattern sequence satisfies the minimum description length principle, and stopping the iteration when no new behavior pattern is mined;
S3, behavior pattern clustering: clustering the frequent behavior patterns mined in step S2 to obtain the cluster centers and the class to which each behavior pattern belongs;
S4, behavior recognition: performing behavior recognition on the clustering result using a hidden Markov model (HMM), and labeling to obtain the current behavior and the historical behavior of the terminal;
S5, behavior prediction: inputting the current terminal behavior into a trained prediction model to obtain the predicted terminal behavior at the next moment, wherein the prediction model is a neural-network-based prediction model trained on training samples that include the historical behavior of the terminal.
Further, in step S1, the current behavior data of the terminal includes the EDR data of the terminal and log information, and the current behavior of the terminal is obtained by automatic labeling based on the EDR data of the terminal.
Further, the EDR data of the terminal comprises at least one of the following: UE ID, terminal operation sequence, operation execution time, operation duration, and occupied physical resource information.
Further, step S2 specifically includes:
S201, searching for non-repeated general behavior patterns of length L using a sliding window: the initial iteration number is set to 1 and the sliding window size is set to L; behavior patterns of length L are searched, repeated behavior patterns are merged, and the merged behavior patterns are taken as the initial general behavior patterns;
S202, judging whether a behavior pattern of length L+1 is a variant of a general behavior pattern of length L or a new general behavior pattern: the behavior pattern of length L+1 is compared with the general behavior pattern of length L, the similarity of the two behavior patterns being measured by the edit distance; if the similarity is greater than a given threshold, the behavior pattern of length L+1 is considered a variant of the general behavior pattern of length L; otherwise, it is regarded as a new general behavior pattern of length L+1; each general behavior pattern and its corresponding variants are stored in a dictionary;
S203, deciding by the minimum description length principle whether a general behavior pattern needs pruning: mined general behavior patterns that do not conform to the minimum description length principle are pruned together with their variants, and iteration stops when no further general behavior pattern can be found.
Further, the behavior pattern clustering method in step S3 is as follows: cluster centers are initially selected at random and are then updated iteratively according to the edit distance until convergence.
Further, the behavior recognition in step S4 corresponds to the decoding problem of the HMM, which is solved using the Viterbi algorithm.
Further, the terminal operation sequence is used as the observation sequence and the clustered terminal behavior patterns are used as the hidden states, and the parameters required by the Viterbi algorithm are calculated accordingly, namely the observation probability matrix, the initial state probability matrix, and the state transition probability matrix.
Further, the initial state probability matrix is calculated as follows: for each class, the total number of occurrences of all behavior patterns in that class is divided by the total number of occurrences of all behavior patterns in all classes.
Further, the state transition probability matrix is calculated as follows: during behavior pattern extraction, the start position and end position of each behavior pattern in the operation sequence data are marked and recorded; for each behavior pattern in one class, the recorded start and end subscripts are compared with the start and end subscripts of each behavior pattern in the other classes; if the subscripts have no inclusion relationship, the number of transitions is increased by 1; the number of transitions of each class is then divided by the total number of transitions to obtain the transition probability from each class to every other class.
Further, the observation probability matrix is calculated by dividing, in each class, the total number of occurrences of each operation by the total number of occurrences of all operations.
Compared with the prior art, the invention has the following beneficial effects:
The terminal behavior prediction method based on automatic labeling comprises five steps: data preprocessing, frequent behavior pattern mining, behavior pattern clustering, behavior recognition, and behavior prediction. It combines the behavior recognition model with the behavior prediction model and labels the operation data sequence automatically without manual intervention, which solves the problem that terminal behavior prediction models in current IoT scenarios cannot label behaviors automatically. The method achieves high accuracy in both behavior recognition and behavior prediction and saves the large amount of time and labor otherwise required for behavior labeling. It thereby integrates terminal behavior recognition and behavior prediction in the IoT environment and makes the terminal behavior prediction model more intelligent.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them.
Fig. 1 is a flowchart of a terminal behavior prediction method based on automatic annotation according to an embodiment of the present invention.
Fig. 2 is a flowchart of frequent behavior pattern mining according to an embodiment of the present invention.
Detailed Description
For a better understanding of the present solution, the method of the present invention is described in detail below with reference to the accompanying drawings.
The terminal behavior prediction method based on automatic labeling provided by the invention combines a behavior recognition model with a behavior prediction model and mainly comprises five steps: data preprocessing, frequent behavior pattern mining, behavior pattern clustering, behavior recognition, and behavior prediction, as shown in Fig. 1. Specifically:
Step 1: Data preprocessing: the terminal operation sequence is first numbered, infrequent operation events are then screened out of it, and the remaining operation data is renumbered. Assuming the terminal operation data contains X operation events, the events are first numbered randomly from 0 to X-1. With the frequency threshold set to f, the f·X operation events whose occurrence frequency exceeds the threshold are retained, the (1-f)·X operation events whose frequency is below the threshold are eliminated, and the retained f·X operation events are renumbered.
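The preprocessing step can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that f is the fraction of distinct operation events to retain (one reading of the text), and it renumbers the retained events in sorted order rather than randomly. The names `preprocess` and `keep_fraction` are hypothetical.

```python
from collections import Counter

def preprocess(ops, keep_fraction):
    """Keep the most frequent operation events and renumber them from 0.

    Assumption: keep_fraction plays the role of f, i.e. the share of
    distinct events that survives the frequency screening.
    """
    counts = Counter(ops)
    n_keep = max(1, round(keep_fraction * len(counts)))
    # most_common() orders events by descending occurrence count
    kept = [event for event, _ in counts.most_common(n_keep)]
    # Renumber the surviving events 0..n_keep-1 (sorted order, an assumption)
    new_id = {event: i for i, event in enumerate(sorted(kept))}
    # Drop infrequent events and translate the rest to their new numbers
    return [new_id[e] for e in ops if e in new_id], new_id

ops = [3, 5, 6, 7, 6, 7, 9, 6, 7, 3]
filtered, mapping = preprocess(ops, keep_fraction=0.6)
print(filtered)  # events 5 and 9 occur once each and are removed
```

In this example five distinct events occur, so three are kept; the two events that occur only once are dropped before renumbering.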
Step 2: Frequent behavior pattern mining: frequent behavior pattern mining is performed on the operation data processed in step 1; the flow is shown in Fig. 2.
The first step uses a sliding window to find the non-repeated general behavior patterns of length L. Specifically, the initial iteration number is 1 and the sliding window size is set to L (here the initial value of L is 2). Behavior patterns of length L are searched, repeated behavior patterns are merged, and the result is taken as the initial set of general behavior patterns. Suppose the operation data sequence is [3,5,6,7,6,7, …] with length N. With the initial window size L = 2, sliding-window extraction yields N-L+1 behavior patterns, namely [3,5], [5,6], [6,7], [7,6], [6,7], …; the repeated behavior pattern [6,7] is merged so that only one copy is kept.
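The sliding-window extraction and merging of repeats described above can be sketched as follows, using the first six events of the example sequence. The function name `sliding_window_patterns` is hypothetical; only the window size L and the N-L+1 count come from the text.

```python
def sliding_window_patterns(seq, L=2):
    """Return all length-L windows and the de-duplicated general patterns."""
    # A sequence of length N yields N - L + 1 windows
    windows = [tuple(seq[i:i + L]) for i in range(len(seq) - L + 1)]
    # dict.fromkeys keeps the first occurrence of each pattern, in order
    unique = list(dict.fromkeys(windows))
    return windows, unique

seq = [3, 5, 6, 7, 6, 7]
windows, unique = sliding_window_patterns(seq, L=2)
print(len(windows))  # N - L + 1 = 5
print(unique)        # the repeated (6, 7) appears only once
```

Patterns are stored as tuples so that they can later serve as dictionary keys.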
The second step, a merging process, judges whether a behavior pattern of length L+1 is a variant of a general behavior pattern of length L or a new general behavior pattern. The behavior pattern of length L+1 is compared with the general behavior pattern of length L, and the similarity of the two behavior patterns is measured by the edit distance. If the similarity is greater than a given threshold, the behavior pattern of length L+1 is considered a variant of the general behavior pattern of length L; otherwise, it is considered a new general behavior pattern of length L+1. A general behavior pattern may have many variants. To simplify comparison and speed up queries, each general behavior pattern and its corresponding variants are stored in a dictionary. For example, suppose the similarity threshold is 0.6, the extracted general behavior pattern of length L (L = 2) is [6,7], and the mined behavior pattern of length L+1 (that is, 3) is [5,6,7]. Because the similarity between the two is greater than the threshold, [5,6,7] is considered a variant of the general behavior pattern [6,7], and the general behavior pattern and its variant are stored in the dictionary.
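The variant test can be illustrated with a short sketch. The text does not say how an edit distance is turned into a similarity score; the normalization used here, 1 - distance / max(length), is an assumption, chosen because it gives 1 - 1/3 ≈ 0.67 for [6,7] versus [5,6,7], which is consistent with the 0.6 threshold in the example. The function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (insert, delete, replace)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # replace or match
        prev = curr
    return prev[-1]

def similarity(a, b):
    # Assumed normalization: 1.0 means identical, 0.0 means nothing in common
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))

def assign(candidate, general_patterns, threshold=0.6):
    """Store candidate as a variant of its closest general pattern, or as new."""
    best = max(general_patterns, key=lambda g: similarity(candidate, g), default=None)
    if best is not None and similarity(candidate, best) > threshold:
        general_patterns[best].append(candidate)
        return best
    general_patterns[candidate] = []
    return candidate

general_patterns = {(6, 7): []}            # general pattern -> list of variants
owner = assign((5, 6, 7), general_patterns)
newcomer = assign((1, 2, 3), general_patterns)
print(general_patterns)
```

Keying the dictionary by the general pattern keeps the variant lookup to a single hash access, which matches the stated motivation of improving query efficiency.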
The third step is the pruning operation, performed at the end of each iteration, in which the minimum description length principle decides whether a general behavior pattern needs pruning. Specifically, mined general behavior patterns that do not conform to the minimum description length principle are cut off together with their variants, so that frequent behavior patterns are found to the greatest possible extent. Pruning greatly reduces the redundancy of the behavior patterns. Iteration stops when no further general behavior pattern is found. Assume that step 2 yields M general terminal behavior patterns; these behavior patterns and their variants are stored in a dictionary.
Step 3: Behavior pattern clustering: the frequent behavior patterns mined in step 2 are clustered, and the clustering result gives the class (that is, the behavior) to which each behavior pattern belongs.
The behavior patterns are first preprocessed. A terminal behavior pattern obtained from frequent pattern mining is composed of operation events, whereas in the clustering algorithm a pattern consists of states. A state thus corresponds to an event in the pattern, but it may also carry additional information such as the operation duration, the physical resources occupied, and the type of operation event. All consecutive states corresponding to the same operation are merged into one extended state. For example, if an operation is triggered several times in succession and no other operation event interrupts the run, the repeated operation events are merged into one operation event with a longer duration, and the duration (the number of repeated triggers) is recorded as a state attribute. After this processing, the operation event sequence is converted into an extended state sequence. The representation of the behavior pattern becomes simpler and more compact, and it becomes easier to compare whether two behavior patterns are similar, which reduces the computational complexity.
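Merging consecutive repeats into extended states amounts to run-length encoding. The sketch below assumes that the only attribute recorded is the repeat count; other attributes mentioned in the text, such as occupied resources, are omitted. The function name `to_extended_states` is hypothetical.

```python
from itertools import groupby

def to_extended_states(pattern):
    """Collapse runs of the same operation into (operation, repeat_count)."""
    # groupby groups only adjacent equal items, so an interrupted run
    # of the same operation produces separate extended states
    return [(op, len(list(run))) for op, run in groupby(pattern)]

extended = to_extended_states([6, 6, 6, 7, 5, 5, 6])
print(extended)
```

Note that the final 6 is not merged with the leading run of 6s, because another operation event interrupts the sequence.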
The K-means clustering method is used here as an example of behavior pattern clustering, but the invention is not limited to it. To calculate the similarity between two behaviors, the distance between two extended state sequences must be defined. The operation sequence and the extended state sequence are category sequences rather than numerical sequences: the numerical value of an element represents a category, not a position in space. An ordinary scalar distance measure therefore cannot be used to measure the similarity between two behavior sequences, and the edit distance is adopted instead. The edit distance is mainly used to compare the similarity of two character strings. It is the minimum number of edit operations required to change one string into another; the larger the distance, the more the strings differ. The permitted edit operations are replacing one character with another, inserting a character, and deleting a character. By its definition, the edit distance is suitable for comparing extended state sequences (category sequences). To cluster the behavior patterns mined in step 2, cluster centers are initially selected at random and are then updated iteratively according to the edit distance until convergence. After clustering, the cluster centers and the class of each behavior pattern are obtained. Suppose M terminal behavior patterns are mined in step 2 and they belong to 5 classes of terminal behavior. Clustering then yields 5 cluster centers and the class of each behavior pattern (class numbers 1 to 5), for example [2,3,1,5,5,4, …], where the number at each position is the class of the behavior pattern at that position.
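A clustering loop driven by the edit distance can be sketched as follows. The text does not specify how a center is updated for category sequences, where a mean is undefined; this sketch substitutes a medoid update (the member with the smallest total distance to the other members), which makes it a K-medoids-style variant and an assumption rather than the patent's method. The function `cluster` and its parameters are hypothetical.

```python
import random

def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]

def cluster(patterns, k, seed=0, max_iter=100):
    """Cluster sequences by edit distance with randomly chosen initial centers."""
    rng = random.Random(seed)
    centers = rng.sample(patterns, k)
    for _ in range(max_iter):
        # Assignment step: each pattern joins its nearest center
        labels = [min(range(k), key=lambda c: edit_distance(p, centers[c]))
                  for p in patterns]
        # Update step (assumed): the new center is the medoid of each class
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(patterns, labels) if lab == c]
            if not members:
                new_centers.append(centers[c])  # keep the center of an empty class
                continue
            new_centers.append(min(
                members, key=lambda m: sum(edit_distance(m, o) for o in members)))
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, labels

patterns = [(1, 2), (1, 2, 3), (1, 3), (7, 8, 9), (7, 8), (8, 9)]
centers, labels = cluster(patterns, k=2)
print(centers, labels)
```

Because the initial centers are random, the result depends on the seed; in practice several restarts would typically be compared.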
Step 4: Behavior recognition: behavior recognition of the terminal is performed using a hidden Markov model (HMM).
Behavior recognition corresponds to the decoding problem of the HMM, which is solved using the Viterbi algorithm. The Viterbi algorithm uses dynamic programming to find the path with the highest probability (the optimal path), where a path corresponds to a hidden state sequence of the HMM. In the HMM, the terminal operation sequence is regarded as the observation sequence, and the clustering result is regarded as the hidden states.
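A compact Viterbi decoder is sketched below for illustration. The matrices `pi`, `A`, and `B` stand for the initial state, state transition, and observation probability matrices; the numerical values are made up for the example and do not come from the patent. Plain probabilities are multiplied here for readability, whereas a practical implementation would usually work with log probabilities to avoid underflow on long sequences.

```python
def viterbi(obs, pi, A, B):
    """Return the most probable hidden state sequence for the observations."""
    n_states = len(pi)
    # delta[s]: probability of the best path that ends in state s
    delta = [pi[s] * B[s][obs[0]] for s in range(n_states)]
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for o in obs[1:]:
        step, new_delta = [], []
        for s in range(n_states):
            prev = max(range(n_states), key=lambda r: delta[r] * A[r][s])
            step.append(prev)
            new_delta.append(delta[prev] * A[prev][s] * B[s][o])
        back.append(step)
        delta = new_delta
    # Trace the optimal path backwards from the best final state
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for step in reversed(back):
        state = step[state]
        path.append(state)
    return path[::-1]

pi = [0.6, 0.4]                          # illustrative initial probabilities
A = [[0.7, 0.3], [0.4, 0.6]]             # illustrative transition matrix
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # illustrative observation matrix
path = viterbi([0, 1, 2], pi, A, B)
print(path)
```

In the setting of the method, `obs` would hold renumbered operation events and each hidden state index would identify one behavior class.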
When the terminal historical data is labeled with terminal behaviors, the length range of the behavior sequences does not need to be known in advance during frequent behavior pattern mining: behavior patterns of different lengths are mined over the configured sequence length range, and the process iterates so that the behavior pattern sequence satisfies the minimum description length principle. The behavior patterns are pruned to remove redundancy, and iteration stops when no new behavior pattern is mined.
Clustering gives the class of each behavior pattern, from which the observation probability matrix, the initial state probability matrix, and the state transition probability matrix are calculated.
Initial state probability calculation: after clustering, every mined terminal behavior pattern is assigned to a class, so each class of behavior contains several behavior patterns. For each class (each cluster), the initial state probability is defined as the number of behavior patterns in that class divided by the total number of behavior patterns in all classes.
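The initial state probabilities follow directly from the cluster labels. The sketch below uses 0-based class numbers instead of the 1 to 5 numbering in the text; the function name `initial_probabilities` is hypothetical.

```python
from collections import Counter

def initial_probabilities(labels, n_classes):
    """Share of behavior patterns that fall into each class."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[c] / total for c in range(n_classes)]

# Class of each mined behavior pattern, e.g. taken from the clustering result
labels = [1, 2, 0, 4, 4, 3]
pi = initial_probabilities(labels, n_classes=5)
print(pi)
```

The resulting vector sums to 1 and can be passed to a Viterbi decoder as the initial state distribution.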
The calculation of the transition probabilities is more involved. During behavior pattern extraction, the start position and end position (subscripts) of each behavior pattern in the operation data are marked and recorded, and the class of each behavior pattern is known from the clustering result. For each behavior pattern in one class, the recorded start and end subscripts are compared with the start and end subscripts of each behavior pattern in the other classes. If the subscripts have no inclusion relationship, a state transition is considered to exist, and the number of transitions is increased by 1. The number of transitions of each class is then divided by the total number of transitions to obtain the transition probability from each class to every other class. Suppose the clustering in step 3 yields 5 classes of behavior, A, B, C, D, and E. The transition probability from each class to every other class, and from each class to itself, is calculated. For class A, the transition probabilities A->A, A->B, A->C, A->D, and A->E need to be calculated.
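One possible reading of the transition counting is sketched below. Several details are assumptions, because the text leaves them open: each occurrence is represented as a hypothetical (start, end, class) triple, a transition is counted from an occurrence to every later-starting occurrence whose span it does not contain, and each row of counts is normalized by its own total.

```python
def contains(outer, inner):
    """True if the span inner lies entirely within the span outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

def transition_matrix(occurrences, n_classes):
    """Estimate class-to-class transition probabilities from pattern spans."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for s1, e1, c1 in occurrences:
        for s2, e2, c2 in occurrences:
            if s2 <= s1:
                continue  # assumed direction: only later-starting occurrences
            if contains((s1, e1), (s2, e2)):
                continue  # inclusion relationship, so no transition is counted
            counts[c1][c2] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([n / total if total else 0.0 for n in row])
    return matrix

# (start index, end index, class) of each pattern occurrence; made-up data
occurrences = [(0, 1, 0), (2, 4, 1), (3, 4, 1), (5, 6, 0)]
A = transition_matrix(occurrences, n_classes=2)
print(A)
```

In this example the span (3, 4) lies inside (2, 4), so that pair contributes no transition, while transitions from a class to itself are counted in the same way as transitions to other classes.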
Observation probability calculation: the terminal operation sequence is taken as the observation sequence. The number of occurrences of each operation event in the terminal operation data is counted first; assuming the terminal operation data contains X operation events, the occurrences of each of the X operation events are counted. The observation probability is then calculated by dividing the number of occurrences of each operation by the total number of occurrences of all operations.
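The observation probabilities can be sketched as a per-class frequency table. Counting within each class follows the wording of the summary ("in each class"); representing the data as a list of patterns plus a parallel list of class labels is an assumption, and the function name `observation_matrix` is hypothetical.

```python
from collections import Counter

def observation_matrix(patterns, labels, n_classes, n_ops):
    """B[c][o]: share of operation o among all operations seen in class c."""
    counts = [Counter() for _ in range(n_classes)]
    for pattern, label in zip(patterns, labels):
        counts[label].update(pattern)  # count every operation in the pattern
    matrix = []
    for class_counts in counts:
        total = sum(class_counts.values())
        matrix.append([class_counts[o] / total if total else 0.0
                       for o in range(n_ops)])
    return matrix

patterns = [(0, 1), (1, 2), (2, 2, 1)]   # renumbered operation events
labels = [0, 0, 1]                        # class of each pattern
B = observation_matrix(patterns, labels, n_classes=2, n_ops=3)
print(B)
```

Each row of the result sums to 1 as long as the class contains at least one operation.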
Step 5: Behavior prediction: after frequent behavior pattern mining, behavior pattern clustering, and behavior recognition have been performed on the operation sequence data, the operation data is labeled automatically with the corresponding terminal behaviors, with no manual labeling or checking. This yields the current behavior and the historical behavior of the terminal; the historical behavior is obtained by behavior labeling of the terminal's historical data.
A neural-network-based behavior prediction model of the terminal is built on the labeled terminal behavior data. A neural-network-based prediction model can model time series data effectively and gives prediction results with high accuracy. A long short-term memory (LSTM) network is taken as an example, but the invention is not limited to this method. A prediction model based on an LSTM network can efficiently model time series data with long-term dependencies. Terminal behavior data is a long time series whose values change over time and depend on the long-term past, and predicting terminal behavior depends on previous behavior, so the LSTM network is well suited to predictive modeling of the labeled terminal behavior data. With this method, the overall accuracy of automatic terminal behavior labeling reaches 89.3%, and the top-2 accuracy of terminal behavior prediction reaches 92.37%.
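The top-2 accuracy quoted above counts a prediction as correct when the true next behavior is among the two highest-scoring classes. The sketch below shows how such a metric can be computed from per-class scores; the scores and targets are made-up values for illustration, not results from the patent, and the function name `top_k_accuracy` is hypothetical.

```python
def top_k_accuracy(scores, targets, k=2):
    """Fraction of samples whose true class is among the k best-scored classes."""
    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k classes with the highest predicted score
        top = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in top
    return hits / len(targets)

# One row of class scores per prediction, e.g. the output of a trained model
scores = [[0.1, 0.6, 0.3],
          [0.5, 0.2, 0.3],
          [0.2, 0.3, 0.5]]
targets = [2, 1, 2]  # true next behavior for each prediction
accuracy = top_k_accuracy(scores, targets, k=2)
print(accuracy)
```

With k = 1 the same function gives the ordinary accuracy, which is never higher than the top-2 value.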
The above examples are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may be modified, or some of their technical features may be replaced by equivalents, and that such modifications or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A terminal behavior prediction method based on automatic labeling, characterized by comprising the following steps:
S1, data preprocessing: acquiring current behavior data of a terminal, numbering the terminal operation sequence, screening out infrequent operation events from the terminal operation sequence data, and renumbering the processed operation data;
S2, frequent behavior pattern mining: performing frequent behavior pattern mining on the operation data processed in step S1, iterating so that the behavior pattern sequence satisfies the minimum description length principle, and stopping the iteration when no new behavior pattern is mined;
S3, behavior pattern clustering: clustering the frequent behavior patterns mined in step S2 to obtain the cluster centers and the class to which each behavior pattern belongs;
S4, behavior recognition: performing behavior recognition on the clustering result using an HMM, and labeling to obtain the current behavior and the historical behavior of the terminal;
S5, behavior prediction: inputting the current terminal behavior into a trained prediction model to obtain the predicted terminal behavior at the next moment, wherein the prediction model is a neural-network-based prediction model trained on training samples that include the historical behavior of the terminal.
2. The method for predicting terminal behavior based on automatic annotation of claim 1, wherein the terminal current behavior data in step S1 includes terminal EDR data and log information, and the terminal current behavior is obtained by automatic annotation according to the terminal EDR data.
3. The method for predicting behavior of terminal based on automatic labeling according to claim 2, wherein EDR data of terminal at least comprises one of the following information: UEID, terminal operation sequence, operation execution time, operation duration and occupied physical resource information.
4. The method for predicting terminal behavior based on automatic labeling according to claim 1, wherein step S2 specifically comprises:
S201, searching for non-repeated general behavior patterns of length L using a sliding window: setting the initial iteration number to 1, setting the size of the sliding window to L, searching for behavior patterns of length L, merging repeated behavior patterns, and taking the merged behavior patterns as the initial general behavior patterns;
S202, judging whether a behavior pattern of length L+1 is a variant of a general behavior pattern of length L or a new general behavior pattern: comparing the behavior pattern of length L+1 with the general behavior pattern of length L, wherein the similarity of the two behavior patterns is measured by the edit distance; if the similarity is greater than a given threshold, the behavior pattern of length L+1 is considered a variant of the general behavior pattern of length L; otherwise, it is regarded as a new general behavior pattern of length L+1; the general behavior patterns and their corresponding variants are stored in a dictionary;
S203, determining by the minimum description length principle whether a general behavior pattern needs pruning: pruning the mined general behavior patterns that do not conform to the minimum description length principle, together with their variants, and stopping iteration when no further general behavior pattern can be found.
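Steps S201 and S202 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function names, the representation of patterns as tuples of operation names, and the normalisation of the edit distance into a similarity score between 0 and 1 are all assumptions made for illustration. The minimum description length pruning of step S203 is not shown.

```python
from collections import Counter


def edit_distance(a, b):
    """Levenshtein distance between two sequences of operations."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Assumed normalisation: 1.0 for identical sequences, 0.0 for fully different ones."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def windows(ops, length):
    """S201: slide a window of the given length and merge repeated patterns by counting them."""
    return Counter(tuple(ops[i:i + length]) for i in range(len(ops) - length + 1))


def assign_variants(general, candidates, threshold):
    """S202: store each longer candidate as a variant of its most similar
    general pattern, or report it as a new general pattern."""
    variants = {g: [] for g in general}  # dictionary of general pattern -> variants
    new_general = []
    for cand in candidates:
        best = max(general, key=lambda g: similarity(g, cand), default=None)
        if best is not None and similarity(best, cand) > threshold:
            variants[best].append(cand)
        else:
            new_general.append(cand)
    return variants, new_general
```

For example, `windows(list("abcabcabd"), 3)` yields the merged length-3 patterns with their occurrence counts, and `assign_variants` can then be called with the keys of `windows(ops, 4)` as candidates.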
5. The method for predicting terminal behavior based on automatic labeling according to claim 1, wherein the behavior pattern clustering in step S3 comprises: randomly selecting initial cluster centers, and iteratively updating the cluster centers according to the edit distance until convergence.
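One way to read the clustering of claim 5 is as a k-medoids style procedure, sketched below in Python. This is an assumption: the claim does not name the algorithm, the number of clusters `k`, or the update rule. Here each center is updated to the cluster member with the smallest total edit distance to the other members, and the patterns are assumed to be unique and hashable (strings or tuples).

```python
import random


def edit_distance(a, b):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def assign(patterns, centers):
    """Assign every pattern to its nearest center by edit distance."""
    clusters = {c: [] for c in centers}
    for p in patterns:
        nearest = min(centers, key=lambda c: edit_distance(p, c))
        clusters[nearest].append(p)
    return clusters


def medoid(members):
    """Member with the smallest total edit distance to all other members."""
    return min(members, key=lambda m: sum(edit_distance(m, o) for o in members))


def cluster_patterns(patterns, k, seed=0, max_iter=100):
    """Random initial centers, then iterate assignment and center update until stable."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    centers = rng.sample(patterns, k)
    for _ in range(max_iter):
        clusters = assign(patterns, centers)
        new_centers = [medoid(members) for members in clusters.values()]
        if set(new_centers) == set(centers):
            break  # converged: centers no longer change
        centers = new_centers
    return assign(patterns, centers)
```

The returned dictionary maps each cluster center to the patterns in its category, which matches the two outputs named in step S3.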
6. The method for predicting terminal behavior based on automatic labeling according to claim 1, wherein the behavior recognition in step S4 corresponds to the decoding problem of the HMM, and the decoding problem is solved using the Viterbi algorithm.
7. The method for predicting terminal behavior based on automatic labeling according to claim 6, wherein the terminal operation sequence is used as the observation sequence and the clustered terminal behavior patterns are used as the hidden states, so as to calculate the parameters required by the Viterbi algorithm, including the observation probability matrix, the initial state probability matrix, and the state transition probability matrix.
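The decoding step of claims 6 and 7 can be sketched with a textbook Viterbi implementation in Python. The dictionary-based parameter layout and the function name are assumptions made for illustration, and a production version would usually work in log space to avoid underflow on long operation sequences.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path and its probability.

    observations: sequence of terminal operations (observation sequence)
    states:       behavior classes obtained from clustering (hidden states)
    start_p:      state -> initial probability
    trans_p:      state -> {next state -> transition probability}
    emit_p:       state -> {operation -> observation probability}
    """
    # best[t][s]: probability of the best path that ends in state s at step t
    best = [{s: start_p[s] * emit_p[s].get(observations[0], 0.0) for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] * trans_p[r][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s].get(obs, 0.0)
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the most likely final state
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]
```

The returned path assigns one behavior class to each operation in the sequence, which is the labeling that step S4 relies on.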
8. The method for predicting terminal behavior based on automatic labeling according to claim 7, wherein the initial state probability of a class is calculated as the total number of occurrences of all behavior patterns in that class divided by the total number of occurrences of all behavior patterns in all classes.
9. The method for predicting terminal behavior based on automatic labeling according to claim 7, wherein the state transition probability matrix is calculated as follows: during behavior pattern extraction, recording the start position and the end position of each behavior pattern in the operation sequence data; for each behavior pattern in a class, comparing its recorded start and end indices with the start and end indices of each behavior pattern in the other classes, and, if the indices do not have an inclusion relationship, adding 1 to the transition count; and then dividing the transition count of each class by the total transition count to obtain the transition probability from each class to each other class.
10. The method for predicting terminal behavior based on automatic labeling according to claim 7, wherein the observation probability matrix is calculated by dividing, for each class, the number of occurrences of each operation by the total number of occurrences of all operations in that class.
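The count ratios of claims 8 and 10 can be sketched in Python as follows. The data layout is hypothetical: `clusters` maps a class label to its behavior patterns (tuples of operations), and `counts` maps each pattern to its number of occurrences. Weighting each operation by the occurrence count of the pattern that contains it is also an assumption, since the claims do not state how operation occurrences are tallied. The transition counting of claim 9 is not shown.

```python
from collections import Counter


def initial_probabilities(clusters, counts):
    """Claim 8: occurrences of a class's patterns over occurrences of all patterns."""
    class_totals = {c: sum(counts[p] for p in pats) for c, pats in clusters.items()}
    grand_total = sum(class_totals.values())
    return {c: total / grand_total for c, total in class_totals.items()}


def observation_probabilities(clusters, counts):
    """Claim 10: per class, occurrences of each operation over all operation occurrences."""
    emit = {}
    for c, pats in clusters.items():
        op_counts = Counter()
        for p in pats:
            for op in p:
                # Assumption: an operation occurs once per occurrence of its pattern
                op_counts[op] += counts[p]
        total = sum(op_counts.values())
        emit[c] = {op: n / total for op, n in op_counts.items()}
    return emit
```

Both functions return dictionaries keyed by class, in the same shape a dictionary-based Viterbi decoder would take for its initial and observation probabilities.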
CN202110884609.XA 2021-08-03 2021-08-03 Terminal behavior prediction method based on automatic labeling Active CN113591994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110884609.XA CN113591994B (en) 2021-08-03 2021-08-03 Terminal behavior prediction method based on automatic labeling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110884609.XA CN113591994B (en) 2021-08-03 2021-08-03 Terminal behavior prediction method based on automatic labeling

Publications (2)

Publication Number Publication Date
CN113591994A true CN113591994A (en) 2021-11-02
CN113591994B CN113591994B (en) 2023-06-06

Family

ID=78254497

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110884609.XA Active CN113591994B (en) 2021-08-03 2021-08-03 Terminal behavior prediction method based on automatic labeling

Country Status (1)

Country Link
CN (1) CN113591994B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101334845A (en) * 2007-06-27 2008-12-31 中国科学院自动化研究所 Video frequency behaviors recognition method based on track sequence analysis and rule induction
US20160189183A1 (en) * 2014-12-31 2016-06-30 Flytxt BV System and method for automatic discovery, annotation and visualization of customer segments and migration characteristics
US20180107529A1 (en) * 2016-10-13 2018-04-19 Nec Laboratories America, Inc. Structural event detection from log messages
US20180150547A1 (en) * 2016-11-30 2018-05-31 Business Objects Software Ltd. Time series analysis using a clustering based symbolic representation
CN110018670A (en) * 2019-03-28 2019-07-16 浙江大学 A kind of industrial process unusual service condition prediction technique excavated based on dynamic association rules

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
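Shi Dianxi; Li Han; Yang Ruosong; Mo Xiao; Wei Jing: "Mining of users' daily frequent behavior patterns", Journal of National University of Defense Technology, no. 01 *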

Also Published As

Publication number Publication date
CN113591994B (en) 2023-06-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant