CN108664548B - Network access behavior characteristic group dynamic mining method and system under degradation condition - Google Patents

Info

Publication number
CN108664548B
CN108664548B CN201810255630.1A CN201810255630A CN108664548B CN 108664548 B CN108664548 B CN 108664548B CN 201810255630 A CN201810255630 A CN 201810255630A CN 108664548 B CN108664548 B CN 108664548B
Authority
CN
China
Prior art keywords
data
matrix
maximum
user
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810255630.1A
Other languages
Chinese (zh)
Other versions
CN108664548A (en)
Inventor
廖名学
张思含
肖庆都
王蕊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Software of CAS
Priority to CN201810255630.1A
Publication of CN108664548A
Application granted
Publication of CN108664548B
Legal status: Active
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a method and system for dynamically mining network access behavior feature groups under a degradation condition. For statistical data on individuals' access to webpages, and under continuous change in which individuals and access relations disappear, the invention establishes an intelligent model that quickly and efficiently searches for maximal bicliques in the dynamically changing data, where each maximal biclique represents a group whose common network access behavior features are maximized. The method provides an input interface through which the user inputs effective frequency statistical data on each individual's access to each webpage; converts the frequency statistical data into matrix form; performs a one-time scanning search on the matrix to find all maximal bicliques and stores them in memory; provides an interface through which the user inputs point or edge deletion data for the matrix; normalizes the deletion data input by the user into edge deletion data; and finally performs an iterative search process for each piece of deletion data and outputs all maximal bicliques obtained by the last iteration, that is, all groups with maximized common network access behavior features.

Description

Network access behavior characteristic group dynamic mining method and system under degradation condition
Technical Field
The invention belongs to the field of data mining, and particularly relates to a dynamic mining method and system for network access behavior feature groups under a degradation condition.
Background
At present, relationship graphs are widely applied in scientific fields such as social networks, genetic biology, and cognitive radio. In many big-data fields there is a need to search for groups or targets with maximized common features. These groups or targets and their features are usually expressed abstractly as graphs, and groups or targets with maximized common features correspond to special graph structures, including the maximal clique, maximal biclique, quasi-biclique, maximum edge biclique, maximum balanced biclique, and frequent itemset.
The invention mainly addresses online network access relations and searches for groups with maximized common access relations. A group with maximized common access relations is essentially a maximal biclique. It has been proven that the maximal biclique search problem is equivalent to the maximal frequent closed itemset search problem, so in recent years maximal biclique search techniques have developed rapidly in the database and relationship-graph fields. The main algorithms include the DCI-CLOSED algorithm, the D-Miner algorithm, and the LCM-MBC algorithm. The DCI-CLOSED algorithm focuses on enumerating maximal bicliques from a large bipartite graph. The Bimax and D-Miner algorithms generate all biclusters that represent gene expression data. The DataPeeler algorithm efficiently mines, one by one, the closed frequent itemsets corresponding to maximal bicliques from a three-dimensional data set. The LCM-MBC algorithm searches for maximal bicliques in a large symmetric undirected graph. The CubeMiner-MBC algorithm enumerates 3D maximal bicliques from a 3D symmetric matrix by exploiting the symmetry of the graph. The EMBS algorithm uses a dynamic threshold to search for maximal bicliques with constrained features, can output all maximal bicliques when no constraint is applied, and is slightly more efficient than the LCM-MBC algorithm. All of the above algorithms search for maximal bicliques on the assumption that the input data remain static. However, in many application scenarios the input data change when the external environment changes, including the addition or deletion of edges or vertices of the graph.
For scenarios in which the input data change dynamically, maximal bicliques are currently searched for mainly with sliding-window methods. The main algorithms include the Max-FISM algorithm, the VSW algorithm, and the MWFIM algorithm. The Max-FISM algorithm mines frequent itemsets in a sliding window over a continuous data stream. The VSW algorithm continuously mines frequent patterns over a sliding window of variable size. The MWFIM algorithm prunes weighted infrequent patterns from a transactional database and uses a prefix tree in descending order. The TKC-DS algorithm efficiently mines the top-K closed itemsets in a data stream. Although these methods can search for maximal bicliques in dynamically changing data, sliding-window methods are inherently limited by the window size, and their results tend to be approximate rather than exact.
Dynamic change of the input data covers two cases, data degradation and data enhancement: data degradation means that points or edges in the input data disappear; data enhancement means that points or edges are added to the input data. The two types of dynamic change require completely different search techniques, and no accurate and efficient solution exists for either at present. For the data degradation case, the invention provides a dynamic mining method that accurately and efficiently searches for network access behavior feature groups.
Disclosure of Invention
The problem solved by the invention is as follows: for statistical data on individuals' access to webpages, and under continuous change in which individuals (points) and access relations (edges) disappear, an intelligent model is established that quickly and efficiently searches for maximal bicliques in the changed data and determines all groups with maximized common access features, so that a user can accurately and quickly lock onto, track, or monitor a target group.
The technical scheme adopted by the invention is as follows:
A dynamic mining method for network access behavior feature groups under a degradation condition comprises: providing an input interface through which the user inputs effective frequency statistical data on individuals' access to each type of webpage; converting the frequency statistical data into a 0/1 matrix; executing a one-time scanning search algorithm on the matrix to obtain all maximal bicliques in the matrix and storing them in memory; providing an interface through which the user inputs point or edge deletion data for the matrix; normalizing the deletion data input by the user into edge deletion data; and executing a maximal biclique iterative search process for each piece of deletion data and outputting all maximal bicliques obtained by the last iteration.
In the above method, the user inputs effective frequency statistical data on individuals' access to each type of webpage through the input interface. An individual is an internet user. The effective frequency is the total number of times the individual has accessed webpages of a given type divided by the time, in days, from the individual's access to that type of webpage to the current time. The effective frequency is finally normalized to 0 or 1, where 0 indicates insufficient frequency and 1 indicates sufficient frequency.
In the above method, converting the frequency statistical data into a 0/1 matrix means that the frequency statistical data input by the user are processed and expressed as a matrix in which each row represents an individual, each column represents a type of webpage, and each element represents the individual's access frequency for the corresponding type of webpage.
In the above method, executing a one-time scanning search algorithm on the matrix to obtain all maximal bicliques and storing them in memory means that the EMBS algorithm is executed on the converted matrix to find all maximal bicliques, where each maximal biclique represents a largest set of users who share the same set of accessed webpage types.
In the above method, an interface is provided through which the user inputs point or edge deletion data for the matrix: point deletion data specify which individuals represented in the matrix are deleted, and edge deletion data specify which individuals' access frequency for which webpage types changes from 1 to 0.
In the above method, normalizing the deletion data input by the user into edge deletion data means that each individual deleted by the user is converted into the deletion of several edges. For example, deleting an individual is equivalent to deleting all access frequency data for that individual. In the end, every point or edge deletion input by the user is expressed as the deletion of a set of edges.
In the above method, executing a maximal biclique iterative search process for each piece of deletion data and outputting all maximal bicliques obtained by the last iteration means that the iterative search starts from the maximal bicliques obtained by the first search. For each deleted edge, every maximal biclique obtained by the previous search is examined; if a maximal biclique contains the deleted edge, it is decomposed and judged. Decomposition splits the maximal biclique into several bicliques according to the deleted edge; judgment checks whether each resulting biclique is still a maximal biclique, and those that are maximal are stored. Each time a deleted edge is processed, a new set of maximal bicliques is obtained, and the next deleted edge is processed on the basis of that newly obtained set.
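The effective frequency computation described above can be sketched as follows. This is a minimal illustration only: the function names, the use of the first access date as the start of the interval, the one-day minimum, and the threshold value are all assumptions, since the text leaves the 0/1 decision to the user.

```python
from datetime import date

def effective_frequency(total_visits: int, first_visit: date, today: date) -> float:
    """Total visits divided by the elapsed time in days.

    Assumption: the interval starts at the first access and is at least one day.
    """
    days = max((today - first_visit).days, 1)
    return total_visits / days

def binarize(freq: float, threshold: float) -> int:
    """Return 1 for sufficient frequency, 0 for insufficient (hypothetical threshold)."""
    return 1 if freq >= threshold else 0

# 30 visits over 10 days gives an effective frequency of 3 visits per day.
freq = effective_frequency(30, date(2018, 3, 1), date(2018, 3, 11))
flag = binarize(freq, threshold=1.0)
```

A different threshold per webpage type would fit the same shape; the text does not say which policy is used.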
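As a rough sketch of the data-matrix conversion, the binarized flags can be arranged with one row per individual and one column per webpage type. The input layout (a dictionary keyed by individual and page type), the sample names, and the choice to treat missing pairs as 0 are assumptions made for illustration.

```python
def build_matrix(flags):
    """flags maps (individual, page_type) -> 0 or 1; missing pairs count as 0."""
    rows = sorted({individual for individual, _ in flags})
    cols = sorted({page for _, page in flags})
    matrix = [[flags.get((r, c), 0) for c in cols] for r in rows]
    return rows, cols, matrix

# Hypothetical input: two users and two page types.
flags = {
    ("u1", "news"): 1, ("u1", "video"): 0,
    ("u2", "news"): 1, ("u2", "video"): 1,
}
rows, cols, matrix = build_matrix(flags)
```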
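To make the notion of a maximal biclique concrete, the sketch below enumerates them by brute force on the Table 1 example from the detailed description. It is not the EMBS algorithm, whose internals are not given in this text; it simply closes every subset of rows, which is exponential and only suitable for tiny matrices. Bicliques with an empty side are skipped, which is an assumption.

```python
from itertools import combinations

# Table 1 as adjacency sets: individual -> page types with frequency 1.
TABLE1 = {
    "a": {1, 3, 4},
    "b": {0, 2, 3, 4},
    "c": {1, 3, 4},
    "d": {0, 1, 2},
    "e": {0, 1, 2},
}

def maximal_bicliques(adj):
    """Return all (rows, cols) pairs that cannot be extended by a row or column."""
    found = set()
    rows = sorted(adj)
    for k in range(1, len(rows) + 1):
        for subset in combinations(rows, k):
            # Page types shared by every individual in the subset.
            common = set.intersection(*(adj[r] for r in subset))
            if not common:
                continue
            # Every individual that visits all of those page types.
            closed_rows = frozenset(r for r in rows if common <= adj[r])
            found.add((closed_rows, frozenset(common)))
    return found

bicliques = maximal_bicliques(TABLE1)
```

The result includes the groups quoted in the worked example, such as individuals a and c sharing page types 1, 3 and 4.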
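The normalization step can be sketched as below, using the Table 1 data from the detailed description. The tuple encoding of deletions ("point" or "edge") and the function name are hypothetical; the sketch also drops duplicate edges and edges that are already 0, which the text does not spell out.

```python
TABLE1 = {
    "a": {1, 3, 4},
    "b": {0, 2, 3, 4},
    "c": {1, 3, 4},
    "d": {0, 1, 2},
    "e": {0, 1, 2},
}

def normalize_deletions(adj, deletions):
    """Turn ("point", row) and ("edge", row, col) requests into a list of edges."""
    edges = []
    for request in deletions:
        if request[0] == "point":
            row = request[1]
            # Deleting an individual deletes every edge of that individual.
            edges.extend((row, col) for col in sorted(adj[row]))
        else:
            _, row, col = request
            if col in adj[row]:  # ignore edges that are already absent
                edges.append((row, col))
    # Remove duplicates while keeping the original order.
    return list(dict.fromkeys(edges))

E = normalize_deletions(TABLE1, [("point", "a"), ("edge", "a", 1)])
```

Deleting point a therefore yields the three edges a-1, a-3 and a-4, matching the example in the detailed description.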
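One decomposition-and-judgment step can be sketched as follows. The maximality test used here (no outside row covers all columns, and no outside column is shared by all rows) is an assumption about what "judging" involves, as is the rule that a biclique with an empty side is discarded. The adjacency data is Table 1 with edge a-1 already removed.

```python
def is_maximal(adj, rows, cols):
    """Check that (rows, cols) cannot be extended in the current matrix."""
    if not rows or not cols:
        return False
    covering_rows = {r for r in adj if cols <= adj[r]}
    shared_cols = set.intersection(*(adj[r] for r in rows))
    return covering_rows == set(rows) and shared_cols == set(cols)

def decompose_and_judge(adj, biclique, edge):
    """Split a biclique containing the deleted edge and keep the maximal parts.

    adj must already reflect the deletion of `edge`.
    """
    rows, cols = biclique
    a, b = edge
    candidates = [(rows - {a}, cols), (rows, cols - {b})]  # G - a and G - b
    return [(frozenset(r), frozenset(c)) for r, c in candidates
            if is_maximal(adj, r, c)]

# Table 1 after edge a-1 has been deleted.
adj = {"a": {3, 4}, "b": {0, 2, 3, 4}, "c": {1, 3, 4},
       "d": {0, 1, 2}, "e": {0, 1, 2}}
G = (frozenset("ac"), frozenset({1, 3, 4}))
kept = decompose_and_judge(adj, G, ("a", 1))
```

Here only G - a survives: G - b is the pair ({a, c}, {3, 4}), which individual b can still join, so it is not maximal.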
Compared with the prior art, the invention has the following beneficial effects:
Under continuous change in which individuals (points) and access relations (edges) disappear, the method quickly and efficiently finds all maximal bicliques in the changed data and determines all groups with maximized common access features, so that a user can accurately and quickly lock onto, track, or monitor a target group. Whereas the prior art cannot search for a specific group both quickly and accurately, the invention provides an iterative search method in which, for dynamically changing data, only the changed data need to be searched rather than the whole data set, so that the specific group can be found quickly and accurately.
Drawings
FIG. 1 is a flow chart of the method implementation of the present invention.
Detailed Description
Embodiments of the present invention are further described below in conjunction with the accompanying drawing and the disclosure above.
As shown in FIG. 1, a prototype system has been developed according to the method of the present invention. It comprises a user data input interface, a data-matrix conversion module, an EMBS search module, an input interface for point or edge deletion data, a normalized edge processing module, and an iterative search module. The user inputs effective frequency statistical data on individuals' access to each type of webpage through the data input interface; the data-matrix conversion module converts the effective frequency statistical data input by the user into a 0/1 matrix; the EMBS search module scans the matrix once according to the EMBS search method to find and store all maximal bicliques in the matrix; the user inputs the points or edges to be deleted from the matrix through the input interface for point or edge deletion data; the normalized edge processing module converts all point or edge deletions input by the user into the deletion of several edges and records them; the iterative search module processes each deleted edge in sequence, carrying out each search on the basis of the result for the previous edge.
As shown in FIG. 1, the specific operation process of the method is as follows.
(1) The user inputs effective frequency statistical data on individuals' access to each type of webpage through the input interface. An individual is an internet user. The effective frequency is the total number of times the individual has accessed a given type of webpage divided by the time, in days, from the individual's access to that type of webpage to the current time, and the user determines whether the effective frequency is 0 or 1.
(2) Through the data-matrix conversion module, the system converts the effective frequency statistical data input by the user into a 0/1 matrix; that is, the data are processed and expressed as a matrix M in which each row represents an individual, each column represents a type of webpage, and each element represents the individual's access frequency for the corresponding type of webpage. An example matrix is shown in Table 1, which contains the effective frequency data of five individuals (internet users) a, b, c, d, and e accessing five types of webpages 0, 1, 2, 3, and 4.
TABLE 1
     0  1  2  3  4
a    0  1  0  1  1
b    1  0  1  1  1
c    0  1  0  1  1
d    1  1  1  0  0
e    1  1  1  0  0
(3) Through the EMBS search module, the system scans the matrix M once according to the publicly available EMBS search method to find all maximal bicliques in M and stores them in the set B. For example, searching the matrix of Table 1 with EMBS yields maximal bicliques (that is, largest sets of users sharing the same visited webpage types) including { (a, c) - (1, 3, 4), (a, b, c) - (3, 4), (a, c, d, e) - (1) }.
(4) Through the input interface for point or edge deletion data, the user inputs the points or edges to be deleted from M. For example, for Table 1, the user may delete point a or delete edge a-1.
(5) Through the normalized edge processing module, the system converts all point or edge deletions input by the user into the deletion of several edges and records them in E. For Table 1, when the user deletes point a, the system automatically converts the deletion into the deletion of all edges incident to point a, that is, the simultaneous deletion of the three edges a-1, a-3, and a-4.
(6) The system executes the search through the iterative search module. Specifically, the iterative search proceeds as follows.
(6.1) Set B' to empty.
(6.2) Take one edge e out of E.
(6.3) Take one maximal biclique G out of B.
(6.4) If G does not contain e, put G into B'. If G contains e = {a, b}, decompose G into a left subgraph G1 and a right subgraph G2, where G1 = G - a and G2 = G - b; if G1 is a maximal biclique, put G1 into B', and if G2 is a maximal biclique, put G2 into B'. If G is the last maximal biclique in B, put the maximal bicliques in B' into B, that is, B ← B', and return to (6.2); otherwise, return directly to (6.3).
(7) Output the set B'.
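Steps (3) to (6) can be sketched end to end on Table 1 as follows. Several parts are assumptions rather than the patented implementation: a brute-force closure stands in for the EMBS search, bicliques with an empty side are ignored, and B' is reset for every deleted edge so that results from earlier edges are not carried over twice. The result is compared against a full recomputation on the reduced matrix.

```python
from itertools import combinations

TABLE1 = {"a": {1, 3, 4}, "b": {0, 2, 3, 4}, "c": {1, 3, 4},
          "d": {0, 1, 2}, "e": {0, 1, 2}}

def all_maximal_bicliques(adj):
    """Brute-force stand-in for the initial EMBS scan (tiny inputs only)."""
    found, rows = set(), sorted(adj)
    for k in range(1, len(rows) + 1):
        for subset in combinations(rows, k):
            common = set.intersection(*(adj[r] for r in subset))
            if common:
                closed = frozenset(r for r in rows if common <= adj[r])
                found.add((closed, frozenset(common)))
    return found

def is_maximal(adj, rows, cols):
    """True if no further row or column can be added to (rows, cols)."""
    if not rows or not cols:
        return False
    covering = {r for r in adj if cols <= adj[r]}
    shared = set.intersection(*(adj[r] for r in rows))
    return covering == set(rows) and shared == set(cols)

def iterative_search(adj, B, E):
    """Update the biclique set B for each deleted edge in E."""
    adj = {r: set(cs) for r, cs in adj.items()}  # work on a copy
    for a, b in E:                               # (6.2) next deleted edge
        adj[a].discard(b)
        B_new = set()                            # (6.1) B' starts empty
        for rows, cols in B:                     # (6.3) next biclique G
            if a not in rows or b not in cols:   # (6.4) G does not contain e
                B_new.add((rows, cols))
                continue
            for r, c in ((rows - {a}, cols), (rows, cols - {b})):
                if is_maximal(adj, r, c):        # keep G - a / G - b if maximal
                    B_new.add((r, c))
        B = B_new                                # B <- B'
    return B

B0 = all_maximal_bicliques(TABLE1)
E = [("a", 1), ("a", 3), ("a", 4)]               # deleting point a
result = iterative_search(TABLE1, B0, E)

# Reference: recompute from scratch on the matrix with point a removed.
reduced = {r: (set() if r == "a" else set(cs)) for r, cs in TABLE1.items()}
expected = all_maximal_bicliques(reduced)
```

Each deleted edge touches only the bicliques that contain it, which is where the saving over a full rescan comes from.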
The efficiency of repeated search with the EMBS algorithm was compared with that of the iterative search method of the invention. Table 2 shows the search times for matrices of different sizes and densities. The results show that the method is highly efficient while remaining accurate, and that its search time is far shorter than that of the repeated search method.
TABLE 2
Matrix size   EMBS method (ms)   Iterative method (ms)   Number of maximal bicliques   Matrix density
10*10         10                 9                       36                            0.48
12*12         40                 10                      63                            0.44
16*16         85                 12                      190                           0.45
20*20         204                15                      355                           0.46
24*24         280                24                      1465                          0.49
32*32         3450               180                     7595                          0.5
40*40         38317              220                     17041                         0.47
48*48         181397             246                     41872                         0.46
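The speedup implied by Table 2 can be read off by dividing the two timing columns. The short sketch below does only that arithmetic on the figures reported in the table; the variable names are illustrative.

```python
# (matrix side, EMBS repeated search in ms, iterative search in ms) from Table 2.
TABLE2 = [
    (10, 10, 9), (12, 40, 10), (16, 85, 12), (20, 204, 15),
    (24, 280, 24), (32, 3450, 180), (40, 38317, 220), (48, 181397, 246),
]

# Ratio of repeated-search time to iterative-search time for each matrix size.
speedups = {side: embs_ms / iter_ms for side, embs_ms, iter_ms in TABLE2}

for side, ratio in speedups.items():
    print(f"{side}*{side}: {ratio:.1f}x")
```

The ratio grows from roughly 1 for the 10*10 matrix to several hundred for the 48*48 matrix.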
For statistical data on individuals' access to webpages, and under continuous change in which individuals (points) and access relations (edges) disappear, the method accurately and efficiently finds the maximal bicliques in the changed data and determines all groups with maximized common access features.
Matters not described in detail in the present invention belong to the common general knowledge of a person skilled in the art.
The above examples are provided only to describe the present invention and are not intended to limit its scope. The scope of the invention is defined by the appended claims. Equivalent substitutions and modifications made without departing from the spirit and principles of the invention are intended to fall within its scope.

Claims (2)

1. A dynamic mining method for network access behavior feature groups under a degradation condition, characterized in that the method comprises the following steps:
(1) providing an input interface through which the user inputs effective frequency statistical data on individuals' access to each webpage;
(2) converting the frequency statistical data into a 0/1 frequency matrix, performing a one-time scanning search on the frequency matrix to find all maximal bicliques in it, and storing all the maximal bicliques in a set B;
(3) providing an interface through which the user inputs point or edge deletion data for the frequency matrix, and normalizing the deletion data input by the user into edge deletion data;
(4) finally, executing an iterative search process for each piece of deletion data, in which a decomposition-judgment method is used to judge whether each decomposed subgraph is a maximal biclique to be stored in the set B, and outputting all maximal bicliques obtained by the last iteration, namely the groups with maximized common network access behavior features;
in step (4), the iterative search process for each piece of deletion data is as follows: for each deleted edge e = {a, b}, where a represents a row of the matrix, b represents a column of the matrix, and e represents the element at row a and column b, each maximal biclique G in the set B is judged in turn; if G does not contain e, the set B is unchanged; otherwise G is taken out of B and is decomposed and judged by the decomposition-judgment method, the maximal bicliques obtained are added to the set B, and the set B is thereby updated; this is iterated until all deleted edges have been processed;
the decomposition-judgment method comprises: for a maximal biclique G containing the deleted edge e = {a, b}, decomposing G into two subgraphs, a left subgraph G1 and a right subgraph G2, where G1 = G - a and G2 = G - b, and then judging G1 and G2: if G1 is a maximal biclique, G1 is added to the set B, and if G2 is a maximal biclique, G2 is added to the set B.
2. A system for implementing the dynamic mining method for network access behavior feature groups under a degradation condition according to claim 1, characterized in that the system comprises a user data input interface module, a data-matrix conversion module, an EMBS search module, an input interface for point or edge deletion data, a normalized edge processing module, and an iterative search module; the user inputs effective frequency statistical data on individuals' access to each type of webpage through the data input interface; the data-matrix conversion module converts the effective frequency statistical data input by the user into a 0/1 matrix; the EMBS search module scans the 0/1 matrix once according to the EMBS search method to find and store all maximal bicliques in the matrix; the user inputs the points or edges to be deleted from the matrix through the input interface for point or edge deletion data; the normalized edge processing module converts all point or edge deletions input by the user into the deletion of several edges and records them; the iterative search module processes each deleted edge in sequence, carrying out each search on the basis of the result for the previous edge.
CN201810255630.1A 2018-03-27 2018-03-27 Network access behavior characteristic group dynamic mining method and system under degradation condition Active CN108664548B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810255630.1A CN108664548B (en) 2018-03-27 2018-03-27 Network access behavior characteristic group dynamic mining method and system under degradation condition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810255630.1A CN108664548B (en) 2018-03-27 2018-03-27 Network access behavior characteristic group dynamic mining method and system under degradation condition

Publications (2)

Publication Number Publication Date
CN108664548A (en) 2018-10-16
CN108664548B (en) 2021-08-03

Family

ID=63782548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810255630.1A Active CN108664548B (en) 2018-03-27 2018-03-27 Network access behavior characteristic group dynamic mining method and system under degradation condition

Country Status (1)

Country Link
CN (1) CN108664548B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110175172B (en) * 2019-05-22 2021-08-31 深圳大学 Extremely-large binary cluster parallel enumeration method based on sparse bipartite graph
CN111666519A (en) * 2020-05-13 2020-09-15 中国科学院软件研究所 Dynamic mining method and system for network access behavior feature group under enhanced condition

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102148706A (en) * 2011-01-26 2011-08-10 西安电子科技大学 Evolution mode mining method in dynamic complex network
CN102708327A (en) * 2012-06-12 2012-10-03 天津大学 Network community discovery method based on spectrum optimization
WO2017189020A1 (en) * 2016-04-29 2017-11-02 Umbel Corporation Systems and methods of using a bitmap index to determine bicliques
CN107579844A (en) * 2017-08-18 2018-01-12 北京航空航天大学 It is a kind of that failure method for digging is dynamically associated based on service path and frequency matrix

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US8209742B2 (en) * 2009-01-07 2012-06-26 Hewlett-Packard Development Company, L.P. Computer-implemented method for obtaining a minimum biclique cover in a bipartite dataset

Also Published As

Publication number Publication date
CN108664548A (en) 2018-10-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant