CN107341239B - Cluster data analysis method and device - Google Patents

Info

Publication number
CN107341239B
Authority
CN
China
Prior art keywords
time
data
abnormal data
classification result
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710541642.6A
Other languages
Chinese (zh)
Other versions
CN107341239A (en
Inventor
程良伦
傅应龙
王卓薇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN201710541642.6A priority Critical patent/CN107341239B/en
Publication of CN107341239A publication Critical patent/CN107341239A/en
Application granted granted Critical
Publication of CN107341239B publication Critical patent/CN107341239B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2358 Change logging, detection, and notification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a cluster data analysis method and device. The method comprises: selecting mobile cluster object data corresponding to time points separated by a preset time interval within a preset time period; establishing an abnormal data dynamic table; classifying the mobile cluster object data of each time point together with the abnormal data points in the abnormal data dynamic table to obtain an initial classification result, and storing the unclassified mobile cluster object data in the abnormal data dynamic table as abnormal data points; and, starting from the first time point, analyzing the change between the initial classification result of each time point and that of the previous time point, and labeling the initial classification result of each time point according to the change to obtain the classification result. Because the abnormal data dynamic table stores unclassified data, useful data is not lost, and because the abnormal data is also included in classification, the data analysis is more accurate.

Description

Cluster data analysis method and device
Technical Field
The present application relates to the field of mobile data analysis in big data, and in particular to a cluster data analysis method and device.
Background
With the spread of big data technology, big data applications have become common in daily life. One important application is that data vendors analyze big data to push highly targeted content, such as advertisements and messages, to the most suitable recipients. Meanwhile, the growth of mobile data, that is, data containing the motion and position information of objects, makes it possible to market products to those objects in a more targeted way. Mobile data can also be used to study traffic congestion prediction and animal migration. However, when mining the patterns of moving objects from mobile data, the object data contains diverse types and the requirement for real-time analysis is high, which makes pattern mining of mobile data challenging.
Pattern mining of mobile data is commonly applied to traffic management, logistics distribution, and crowd detection, among others. These applications require analysis of cluster changes, and in particular of the nature of those changes: whether a cluster corresponding to a group of cars has simply disappeared or its members have migrated to other clusters, and whether a newly appearing cluster reflects new vehicles, the appearance of a new target cluster, or a change in the preferences of existing customers.
The study of cluster changes therefore analyzes how cluster data changes over a period of time. The original data is first divided into classes so that it can be studied cluster by cluster, and the changes of the clusters across time points are then judged from the differences between the clusters. This is also the general method currently used for analyzing cluster data.
However, the current analysis method yields results close to the real situation only when applied to a small amount of data. As the amount of data increases, the pattern analysis results deviate substantially from the real situation and do not meet expectations.
Therefore, how to reduce the large error of the cluster data analysis method is a problem of keen interest to those skilled in the art.
Disclosure of Invention
The invention aims to provide a cluster data analysis method and a cluster data analysis device.
In order to solve the above technical problem, the present application provides a cluster data analysis method, including:
selecting moving cluster object data corresponding to time points which are separated by preset time intervals in a preset time period;
establishing an abnormal data dynamic table;
classifying the moving cluster object data of each time point and abnormal data points in the abnormal data dynamic table to obtain an initial classification result, and storing the moving cluster object data which is not classified as the abnormal data points in the abnormal data dynamic table;
and analyzing the initial classification result of each time point and the change of the initial classification result of the time point before the time point from the first time point, and identifying the change condition of the initial classification result of each time according to the change condition to obtain the classification result.
Optionally, the method further includes:
determining the relation between the classes of each time point according to the classification result, and constructing a mobile cluster pattern tree;
and determining related mobile cluster frequent information according to the mobile cluster pattern tree.
Optionally, the identifier of the change condition specifically includes:
retention, merging, separation, expansion, contraction, disappearance and appearance.
Optionally, the creating an abnormal data dynamic table includes:
establishing the abnormal data dynamic table;
setting relevant processing parameters; wherein the processing parameters include a dynamic change time and an update time.
Optionally, the step of taking the moving cluster object data that is not classified in the classification as the abnormal data point and storing the abnormal data point into an abnormal data dynamic table further includes:
judging whether the existence time of the abnormal data points exceeds the updating time or not according to the processing parameters;
and if so, updating the abnormal data point.
The present application further provides a cluster data analysis device, the device includes:
the data selecting module is used for selecting the mobile cluster object data corresponding to the time points which are separated by the preset time interval in the preset time period;
the table building module is used for building an abnormal data dynamic table;
the initial classification module is used for classifying the moving cluster object data of each time point and abnormal data points in the abnormal data dynamic table to obtain an initial classification result, and storing the moving cluster object data which is not classified as the abnormal data points into the abnormal data dynamic table;
and the change identification module is used for analyzing the initial classification result of each time point and the change of the initial classification result of the time point before the time point from the first time point, and identifying the change condition of the initial classification result of each time according to the change condition to obtain the classification result.
Optionally, the device further includes:
the tree building module is used for determining the relation between the classes of each time point according to the classification result and building a mobile cluster mode tree;
and the mining module is used for determining the related mobile cluster frequent information according to the mobile cluster pattern tree.
Optionally, the table building module includes:
a table building unit for building the abnormal data dynamic table;
a parameter setting unit for setting relevant processing parameters; wherein the processing parameters include a dynamic change time and an update time.
Optionally, the initial classification module further includes: an update unit, wherein the update unit comprises:
the time judging subunit is used for judging whether the existence time of the abnormal data points exceeds the updating time or not according to the processing parameters;
and the updating subunit is used for updating the abnormal data point when the existence time of the abnormal data point exceeds the updating time.
In existing cluster data analysis methods, all unclassified data is discarded during classification. For data spanning a time period, however, the abnormal data left unclassified at the current moment can benefit the classification result at the next moment. Discarding it therefore leads to larger errors in the analysis result, and the situation described does not meet expectations.
Therefore, the cluster data analysis method provided by the application comprises the steps of selecting moving cluster object data corresponding to time points which are separated by preset time intervals in a preset time period; establishing an abnormal data dynamic table; classifying the moving cluster object data of each time point and abnormal data points in the abnormal data dynamic table to obtain an initial classification result, and storing the moving cluster object data which is not classified as the abnormal data points in the abnormal data dynamic table; and analyzing the initial classification result of each time point and the change of the initial classification result of the time point before the time point from the first time point, and identifying the change condition of the initial classification result of each time according to the change condition to obtain the classification result.
The abnormal data dynamic table capable of storing the unclassified data is established to store the abnormal data, so that the loss of useful data is avoided, and meanwhile, the abnormal data is also included in classification, so that the accuracy of the data analysis process is higher. The application also provides a cluster data analysis device, which has the beneficial effects, and the details are not repeated herein.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a flowchart of a cluster data analysis method provided in an embodiment of the present application;
FIG. 2 is a detailed flow chart of data analysis provided by an embodiment of the present application;
FIG. 3 is a partial flow diagram of a classification process provided by an embodiment of the present application;
FIG. 4 is a flow chart of an analysis mode provided by an embodiment of the present application;
FIG. 5 is a diagram of building a pattern tree according to an embodiment of the present application;
FIG. 6 is a flowchart of creating a dynamic table according to an embodiment of the present application;
FIG. 7 is a flowchart of updating a dynamic table according to an embodiment of the present application;
FIG. 8 is a block diagram of a cluster data analysis apparatus according to an embodiment of the present application;
FIG. 9 is a block diagram of a construction pattern tree provided by an embodiment of the present application;
FIG. 10 is a block diagram of a table building module provided in an embodiment of the present application.
Detailed Description
The core of the application is to provide a cluster data analysis method, and by establishing an abnormal data dynamic table, storing abnormal data and updating stored data, the method avoids larger analysis result errors caused by losing useful data, and improves the accuracy of the analysis method.
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of a cluster data analysis method according to an embodiment of the present disclosure.
The embodiment may include:
S100, selecting moving cluster object data corresponding to time points which are separated by preset time intervals in a preset time period;
S200, establishing an abnormal data dynamic table;
It should be noted that step S100 and step S200 do not depend on each other, so there is no required order of execution: step S200 may be executed before step S100, or both steps may be executed at the same time, which is not limited herein.
The predetermined time period in step S100 is the time period to be analyzed, and depends on the actual situation under study. For example, to study the cluster data of vehicles on a certain road segment from 5 to 7 pm, a time period that contains this interval should be selected. The interval itself need not be selected exactly: because changing data is studied, changes must also be observed for the data at the beginning and end of the interval, so an appropriate margin is added at both ends to allow comprehensive analysis of the data within the interval.
The predetermined time interval is the spacing between consecutive sampling points within the time period, and can be chosen according to the timing characteristics of the situation analyzed. An important parameter of the sampling is the number of sampling points in the time period: a large amount of data must be analyzed and each additional point increases the amount of data to process, so an appropriate number of sampling points is needed to obtain an accurate result. For example, when studying the cluster data of vehicles on a certain road from 5 to 7 in the evening, it is common knowledge that traffic flow is heavy and vehicle speed is low at that time, so the number of sampling points can be reduced appropriately. When studying the cluster data of vehicles on a certain road from 5 to 7 in the morning, traffic flow is light, vehicle speed is high, and the vehicles on the road change quickly, so the number of sampling points can be increased appropriately.
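The selection of time points can be sketched roughly as follows. This is only an illustration: the function name sample_time_points, the margin parameter (standing in for the extra time reserved at both ends of the period), and the example date are assumptions that do not appear in the application.

```python
from datetime import datetime, timedelta

def sample_time_points(start, end, interval, margin=timedelta(0)):
    """Return time points from (start - margin) to (end + margin), spaced by interval."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    points = []
    current = start - margin
    last = end + margin
    while current <= last:
        points.append(current)
        current += interval
    return points

# Evening period 17:00-19:00 on an arbitrary date, sampled every 30 minutes,
# with a 30-minute margin reserved at each end.
points = sample_time_points(
    datetime(2017, 7, 5, 17, 0),
    datetime(2017, 7, 5, 19, 0),
    interval=timedelta(minutes=30),
    margin=timedelta(minutes=30),
)
```

A shorter interval gives more sampling points and more data to classify, which is the trade-off discussed above.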
After the time points are determined, the mobile cluster object data corresponding to each time point is selected. The movement information O of a moving object at a given time point is represented as:
O=(oid,p(x,y),t)
wherein oid is a data type identifier, p(x,y) is the longitude and latitude of the moving object at time point t, x is the longitude, y is the latitude, and t is the time of the time point.
Define Ω(t), with O ∈ Ω(t), as the set of moving object data at time point t, called the moving object location coordinate set.
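One possible in-memory representation of O and Ω(t) is sketched below. The class name MovingObject, the helper build_omega, the use of oid as a per-object string identifier, the integer time index, and the sample coordinates are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple

class MovingObject(NamedTuple):
    """One record O = (oid, p(x, y), t)."""
    oid: str    # identifier (assumed here to identify the moving object)
    x: float    # longitude
    y: float    # latitude
    t: int      # index of the sampled time point

def build_omega(records):
    """Group records by time point, so that omega[t] plays the role of the set Ω(t)."""
    omega = defaultdict(list)
    for rec in records:
        omega[rec.t].append(rec)
    return dict(omega)

records = [
    MovingObject("car-1", 113.26, 23.13, 0),
    MovingObject("car-2", 113.27, 23.14, 0),
    MovingObject("car-1", 113.28, 23.13, 1),
]
omega = build_omega(records)
```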
The abnormal data dynamic table established in step S200 is a data table created for the analysis that supports storing, modifying, and deleting data. The dynamic table created in this embodiment is named F-list.
S300, classifying the moving cluster object data of each time point and abnormal data points in the abnormal data dynamic table to obtain an initial classification result, and storing the moving cluster object data which is not classified as the abnormal data points in the abnormal data dynamic table;
It should be noted that the moving cluster object data may be classified using a method such as DBSCAN, KNN, or K-means; the method can be selected according to the performance and accuracy requirements of the data analysis, which is not limited in this embodiment.
Unclassified data may appear during classification, and this data needs to be stored as abnormal data in the abnormal data dynamic table. Likewise, the classification is applied to all data, that is, both the data of the time point being classified and the data in the abnormal data dynamic table.
In this way, establishing an abnormal data dynamic table that stores unclassified data avoids the loss of useful data, and including the abnormal data in classification makes the data analysis more accurate.
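Step S300 can be sketched as follows, taking DBSCAN as the classification method because it naturally leaves some points unclassified (noise). The small pure-Python DBSCAN is written only for illustration, and the function names, the eps and min_pts values, and the rule that a stored abnormal point leaves the table once it is classified are assumptions rather than details given in the application.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN. points: dict oid -> (x, y). Returns (clusters, noise)."""
    ids = list(points)

    def neighbors(a):
        return [b for b in ids if math.dist(points[a], points[b]) <= eps]

    labels = {}      # oid -> cluster index, or -1 for noise
    cluster = -1
    for p in ids:
        if p in labels:
            continue
        nb = neighbors(p)
        if len(nb) < min_pts:
            labels[p] = -1               # provisionally noise
            continue
        cluster += 1
        labels[p] = cluster
        queue = [q for q in nb if q != p]
        while queue:
            q = queue.pop()
            if labels.get(q) == -1:
                labels[q] = cluster      # former noise becomes a border point
            if q in labels:
                continue
            labels[q] = cluster
            q_nb = neighbors(q)
            if len(q_nb) >= min_pts:     # q is a core point: keep expanding
                queue.extend(q_nb)
    clusters = {}
    for oid, lab in labels.items():
        if lab >= 0:
            clusters.setdefault(lab, set()).add(oid)
    noise = {oid for oid, lab in labels.items() if lab == -1}
    return list(clusters.values()), noise

def classify_time_point(current, f_list, eps=1.0, min_pts=3):
    """Classify the current points together with the stored abnormal points.

    Returns (clusters, new_f_list); points left unclassified become the abnormal points.
    """
    pool = {**f_list, **current}
    clusters, noise = dbscan(pool, eps, min_pts)
    return clusters, {oid: pool[oid] for oid in noise}

t0 = {"a": (0.0, 0.0), "b": (0.5, 0.0), "c": (0.0, 0.5), "d": (5.0, 5.0), "e": (5.5, 5.0)}
clusters0, f_list0 = classify_time_point(t0, {})
t1 = {"a": (0.0, 0.0), "b": (0.5, 0.0), "c": (0.0, 0.5), "f": (5.0, 5.5)}
clusters1, f_list1 = classify_time_point(t1, f_list0)
```

In this toy run, d and e cannot be classified at the first time point and are stored; at the second time point they form a class together with the new point f, which would not happen if they had been discarded.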
S400, starting from the first time point, analyzing the change of the initial classification result of each time point and the initial classification result of the time point before the time point, and identifying the change condition of the initial classification result of each time according to the change condition to obtain a classification result.
The initial classification result obtained above is a separate classification result for each time point. Because the evolution pattern of the cluster data objects is to be analyzed, the classification results of the time points must be related to one another. Therefore, the initial classification result of each time point is compared with that of the previous time point, and from the relation between the two the classes are linked and the change condition is identified.
In this embodiment, the change between two adjacent time points is judged using Jaccard similarity, and the change is assigned to the corresponding change condition category and labeled. Jaccard similarity involves a confidence threshold: the change in the initial cluster classification results of adjacent time points is judged from the proportion of data at the later time point that is similar to the data at the previous time point. The similarity proportion needs to be determined empirically and is not limited herein.
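For reference, the standard Jaccard similarity of two sets is the size of their intersection divided by the size of their union. The sketch below applies it to two clusters represented as sets of object identifiers; that representation and the sample identifiers are assumptions, and the exact ratio used in the embodiment may differ from this textbook definition.

```python
def jaccard(a, b):
    """Standard Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of ids."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0   # convention chosen here for two empty sets
    return len(a & b) / len(union)

prev_cluster = {"car-1", "car-2", "car-3", "car-4"}
next_cluster = {"car-2", "car-3", "car-4", "car-5"}
similarity = jaccard(prev_cluster, next_cluster)   # 3 shared ids out of 5 distinct ids
```

A cluster at the later time point would then be linked to a cluster at the previous time point when the similarity reaches the empirically chosen threshold.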
The categories of change conditions are generally determined by the specific data being analyzed. The data usually corresponds to a concrete practical problem, from which the changes in the data and their categories can be roughly determined. For example, in a simple analysis the data generally exhibits merging, separation, disappearance, and appearance, and the change conditions can be divided into these categories, which is not limited herein.
In this embodiment, the practical problem chosen is the analysis of road traffic conditions, so the following seven change condition categories are selected: survives (retention), merged (merging), splits (separation), expands (expansion), shrinks (contraction), disappears (disappearance), and appears (appearance).
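One way the seven labels could be assigned from the clusters of two adjacent time points is sketched below. The decision rules, the overlap threshold match, and the size tolerance tol are hypothetical choices made for illustration; the application leaves the concrete thresholds to experience.

```python
def label_changes(prev, curr, match=0.5, tol=0.2):
    """prev, curr: dicts mapping cluster name -> set of object ids.

    Returns (labels for the clusters in curr, names of prev clusters that disappeared).
    """
    labels = {}
    accounted = set()   # previous clusters that continue in some form
    # A previous cluster whose members dominate two or more new clusters has split.
    for p_name, p in prev.items():
        children = [n for n, c in curr.items() if c and len(p & c) / len(c) >= match]
        if len(children) >= 2:
            accounted.add(p_name)
            for n in children:
                labels[n] = "splits"
    for n, c in curr.items():
        # Parents: previous clusters that sent at least `match` of their members into c.
        parents = [k for k, p in prev.items() if p and len(p & c) / len(p) >= match]
        accounted.update(parents)
        if n in labels:
            continue
        if len(parents) >= 2:
            labels[n] = "merged"
        elif len(parents) == 1:
            size_prev = len(prev[parents[0]])
            if len(c) > size_prev * (1 + tol):
                labels[n] = "expands"
            elif len(c) < size_prev * (1 - tol):
                labels[n] = "shrinks"
            else:
                labels[n] = "survives"
        else:
            # No parent: either a small remnant of one previous cluster, or a new cluster.
            origin = [k for k, p in prev.items() if c and len(p & c) / len(c) >= match]
            accounted.update(origin)
            labels[n] = "shrinks" if origin else "appears"
    disappeared = [k for k in prev if k not in accounted]
    return labels, disappeared

prev = {"C1": {1, 2, 3}, "C2": {4, 5, 6}, "C3": {7, 8, 9},
        "C4": {10, 11, 12}, "C5": {20, 21, 22}}
curr = {"C1'": {1, 2, 3, 4, 5, 6}, "C3'": {7, 8, 9, 13, 14},
        "C4'": {10, 11, 12}, "C6": {30, 31, 32}}
labels, disappeared = label_changes(prev, curr)
```

With these toy sets, C1' is labeled merged, C3' expands, C4' survives, C6 appears, and C5 is reported as having disappeared.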
Referring to fig. 2, fig. 2 is a specific flowchart of data analysis provided in the embodiment of the present application.
Wherein the predetermined time period is denoted by T, the predetermined time interval is denoted by Δt, and the initial time point is denoted by t.
Referring to fig. 3, fig. 3 is a partial flowchart of a classification process according to an embodiment of the present disclosure.
The classification process is partly illustrated as follows. Space does not permit a complete classification flowchart, so a partial flowchart is shown as an example; the complete flowchart can be obtained by straightforward extension of the partial one and is not described in full herein.
The number of time points in the time period is set to 6 and the time interval is Δt. Classification analysis is carried out on the 6 time points starting from t, namely t, t+Δt, t+2Δt, t+3Δt, t+4Δt, and t+5Δt.
At t, the classes obtained are denoted C1, C2, C3, and C4. All 4 classes are labeled appears (appearance), and the points that cannot be classified are stored in the abnormal data dynamic table F-list.
At t+Δt, classification shows that C1 and C2 of the previous time point have merged into one class C1', which is therefore labeled merged (merging). C3' contains more members than C3 and is labeled expands (expansion). C4 remains unchanged and is labeled survives (retention). The points that cannot be classified at this time are again stored in the abnormal data dynamic table F-list.
At t+2Δt, C3' and C4 merge into one large class C3'', so C3'' is labeled merged. C1' combines with some data from the abnormal data dynamic table to form C1''; this is labeled expands rather than merged. The points that cannot be classified are again stored in the abnormal data dynamic table F-list.
At t+3Δt, because the abnormal data dynamic table became full at the previous time point t+2Δt, the table is updated before the points that cannot be classified at this time are stored in it. C1''' and C5 result from the break-up of C1'' of the previous time point, so both C1''' and C5 are labeled splits (separation). C3''' is smaller than C3'' of the previous time point and is labeled shrinks (contraction).
At t+4Δt, the class descended from C1''' remains unchanged and is labeled survives; the class descended from C3''' is smaller than at the previous time point and is labeled shrinks (contraction); the data of C5 has disappeared completely, so C5 is labeled disappears (disappearance). The points that cannot be classified at this time are again stored in the abnormal data dynamic table F-list.
At t+5Δt, the classes descended from C1 and C3 show no change from the previous time point and are labeled survives (retention).
Referring to fig. 4 and 5, fig. 4 is a flowchart of an analysis schema provided in an embodiment of the present application, and fig. 5 is a schema tree graph provided in an embodiment of the present application.
Based on the above embodiment, this embodiment may further include:
S500, determining the relation between the classes of each time point according to the classification result, and constructing a mobile cluster pattern tree;
S600, determining the related mobile cluster frequent information according to the mobile cluster pattern tree.
The mobile cluster pattern tree is constructed from the change condition labels of each time point. Starting from the first empty node under the root, the classes of C1 at each time point are inserted in sequence to construct the first branch, and the change condition is marked on the branch. A second empty node is then inserted and a second branch is constructed from it; from the classification result and the change conditions it is known that C2 merges into C1 at the second time point, so this change is marked in the tree. The remaining branches are constructed in the same way to form the complete pattern tree.
Then, in light of the actual situation, a suitable information mining method is selected to determine the frequent information of the related mobile clusters and obtain the frequently occurring associated movement patterns.
For example, for an actual road section, the period from 5 to 7 pm on an overpass is selected. If analysis of the pattern tree shows that merged and expands occur frequently, the vehicle conditions of that period can be characterized accordingly, which is valuable guidance for traffic regulation.
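A very simple reading of steps S500 and S600 is sketched below: each class gets a branch under the root holding its sequence of change labels, and "frequent information" is taken to mean labels whose count reaches a minimum. The nested-dictionary tree, the function names, the min_count threshold, and the shortened histories (loosely following the first three time points of the walkthrough above) are illustrative assumptions; the application does not specify the tree layout or the mining method.

```python
from collections import Counter

def build_pattern_tree(histories):
    """histories: dict branch name -> list of (time index, change label).

    Returns a nested dict: root -> branch -> (time index, label) -> ...
    """
    tree = {"root": {}}
    for branch, steps in histories.items():
        node = tree["root"].setdefault(branch, {})
        for step in steps:
            node = node.setdefault(step, {})
    return tree

def frequent_changes(histories, min_count=2):
    """Count every change label across all branches and keep the frequent ones."""
    counts = Counter(label for steps in histories.values() for _, label in steps)
    return {label: n for label, n in counts.items() if n >= min_count}

histories = {
    "C1": [(0, "appears"), (1, "merged"), (2, "expands")],
    "C2": [(0, "appears"), (1, "merged")],
    "C3": [(0, "appears"), (1, "expands"), (2, "merged")],
    "C4": [(0, "appears"), (1, "survives"), (2, "merged")],
}
tree = build_pattern_tree(histories)
frequent = frequent_changes(histories)
```

Here merged and expands come out as frequent labels (apart from the initial appears), which is the kind of finding the overpass example refers to.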
Referring to fig. 6, fig. 6 is a flowchart for establishing a dynamic table according to an embodiment of the present application.
Based on the foregoing embodiment, the creating of the abnormal data dynamic table in this embodiment may include:
S210, establishing the abnormal data dynamic table;
S220, setting relevant processing parameters; wherein the processing parameters include a dynamic change time and an update time.
After the relevant processing parameters are set, the abnormal data dynamic table is expressed as:
F-list(τ,θ)
where τ = T/n, n = 1, 2, 3, …, denotes the period of time for which an abnormal data point should be stored, and θ = τ/n, n = 1, 2, 3, …, denotes the sub-period of existence of the abnormal data points that are selected for updating.
These parameters can be set according to the data and the actual situation. Their values affect both the amount of data scanned in subsequent classification and the accuracy of the result: if the values are too large, too much data is held at once, which increases the load of the classification scan and slows processing; if they are too small, useful data may be cleared too early and the error of the subsequent analysis grows. The values are therefore not limited herein.
In this embodiment, τ is set to 3, that is, the dynamic table is full once it holds data from 3 time points, which triggers an update; θ is set to 2, that is, the update deletes the data stored at the first two time points.
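The behaviour of F-list(τ, θ) in this embodiment can be sketched as a small class. Here τ and θ are treated as counts of time points (3 and 2), matching the embodiment rather than the duration form τ = T/n; the class name, method names, and the choice to run the update just before the next store are assumptions.

```python
class FList:
    """Sketch of the abnormal data dynamic table F-list(tau, theta).

    tau:   number of time points the table holds before an update is triggered.
    theta: number of oldest time points removed by an update.
    """

    def __init__(self, tau=3, theta=2):
        if not 0 < theta <= tau:
            raise ValueError("expected 0 < theta <= tau")
        self.tau = tau
        self.theta = theta
        self.slots = []   # list of (time index, {oid: position}), oldest first

    def store(self, t, abnormal_points):
        """Store the unclassified points of time point t, updating first if full."""
        if len(self.slots) >= self.tau:
            del self.slots[:self.theta]   # update: drop the theta oldest time points
        self.slots.append((t, dict(abnormal_points)))

    def points(self):
        """All abnormal points currently held, to be classified with the next time point."""
        merged = {}
        for _, pts in self.slots:
            merged.update(pts)
        return merged

f_list = FList(tau=3, theta=2)
for t in range(4):
    f_list.store(t, {f"p{t}": (float(t), 0.0)})
held_times = [t for t, _ in f_list.slots]
```

After the fourth store the table holds only the points of time indices 2 and 3, because it was full after index 2 and the two oldest entries were removed.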
Referring to fig. 7, fig. 7 is a flowchart of updating a dynamic table according to an embodiment of the present application.
Based on the foregoing embodiment, this embodiment may further include:
S321, judging whether the existence time of the abnormal data points exceeds the updating time or not according to the processing parameters;
and S322, if yes, updating the abnormal data point.
Corresponding to the above embodiment, a judgment is made during processing: when the existence time of the abnormal data points is found to exceed the update time, that is, the τ value, the data stored at the first two time points is updated.
The data is updated by specifying an update time and performing the update operation once that time has elapsed. This prevents redundant data from accumulating in the abnormal data dynamic table, which would increase the amount of data scanned during classification and the load on the machine. The update operation may delete the timed-out data completely or delete part of it after comparison; alternatively, instead of being deleted, the timed-out data may be moved to another table for later use.
In the present embodiment, the deletion operation is selected for the data that has timed out, in order to reduce the amount of data that needs to be scanned each time, while reducing the load on the machine.
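Steps S321 and S322 can be sketched per data point as follows, covering both the deletion chosen in this embodiment and the alternative of moving timed-out data to another table. The entry layout (a dict with oid and stored_at), the function name, and the optional archive list are assumptions for illustration.

```python
def update_abnormal_points(entries, now, update_time, archive=None):
    """Return the entries that stay in the abnormal data dynamic table.

    entries: list of dicts with keys 'oid' and 'stored_at' (time index when stored).
    archive: optional list; timed-out entries are appended to it instead of being dropped.
    """
    kept = []
    for entry in entries:
        timed_out = now - entry["stored_at"] > update_time   # S321: existence time check
        if not timed_out:
            kept.append(entry)
        elif archive is not None:                            # S322, alternative: keep elsewhere
            archive.append(entry)
        # otherwise the timed-out entry is simply deleted (the choice in this embodiment)
    return kept

entries = [
    {"oid": "a", "stored_at": 0},
    {"oid": "b", "stored_at": 2},
    {"oid": "c", "stored_at": 3},
]
archive = []
kept = update_abnormal_points(entries, now=3, update_time=2, archive=archive)
```

Passing no archive reproduces plain deletion, which keeps the amount of data scanned at each time point small.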
The embodiment of the application provides a cluster data analysis method, and the abnormal data occurring in the classification process is stored by establishing an abnormal data dynamic table, so that the condition of losing useful data is avoided, and the accuracy of the analysis method is improved.
In the following, the cluster data analysis device provided in the embodiment of the present application is introduced, and the cluster data analysis device described below and the cluster data analysis method described above may be referred to correspondingly.
Referring to fig. 8, fig. 8 is a block diagram of a cluster data analysis apparatus according to an embodiment of the present disclosure.
The present embodiment provides a cluster data analysis device, which may include:
a data selecting module 100, configured to select moving cluster object data corresponding to time points spaced by a predetermined time interval within a predetermined time period;
a table building module 200, configured to build an abnormal data dynamic table;
an initial classification module 300, configured to classify the moving cluster object data at each time point and the abnormal data points in the abnormal data dynamic table to obtain an initial classification result, and to store the unclassified moving cluster object data as abnormal data points in the abnormal data dynamic table;
a change identification module 400, configured to analyze, starting from the first time point, the change between the initial classification result at each time point and the initial classification result at the preceding time point, and to identify the change condition of the initial classification result at each time point according to that change, to obtain a classification result.
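The labelling performed by the change identification module can be illustrated with a short sketch. The seven labels are the ones named in the claims (retention, merging, separation, expansion, contraction, disappearance and appearance); everything else is an assumption made for illustration: clusters are represented as sets of object identifiers, two clusters are treated as related when they share at least one member, and the size comparison used for expansion and contraction is a simplification. The function name label_changes is hypothetical.

```python
def label_changes(prev, curr):
    """Label clusters at the current time point against the previous one.

    prev and curr map a cluster name to the set of object ids it contains.
    Returns (labels for current clusters, names of previous clusters that vanished).
    The overlap-based rules below are illustrative assumptions.
    """
    labels = {}
    for name, members in curr.items():
        # Previous clusters that share at least one object with this cluster.
        parents = [p for p, pm in prev.items() if pm & members]
        if not parents:
            labels[name] = "appearance"
        elif len(parents) > 1:
            labels[name] = "merging"
        else:
            parent_members = prev[parents[0]]
            # Current clusters that draw members from the same previous cluster.
            siblings = [c for c, cm in curr.items() if cm & parent_members]
            if len(siblings) > 1:
                labels[name] = "separation"
            elif len(members) > len(parent_members):
                labels[name] = "expansion"
            elif len(members) < len(parent_members):
                labels[name] = "contraction"
            else:
                labels[name] = "retention"
    # Previous clusters with no overlapping successor have disappeared.
    disappeared = [p for p, pm in prev.items()
                   if not any(pm & cm for cm in curr.values())]
    return labels, disappeared
```

In this sketch merging takes priority over separation when a cluster qualifies for both; a real implementation would follow the criteria given in the method embodiments.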
Referring to fig. 9, fig. 9 is a block diagram of constructing a pattern tree according to an embodiment of the present application.
Based on the above embodiment, this embodiment may further include:
a tree building module 500, configured to determine a relationship between classes at each time point according to the classification result, and build a mobile cluster pattern tree;
and a mining module 600, configured to determine relevant mobile cluster frequent information according to the mobile cluster pattern tree.
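As a rough illustration of how the tree building and mining modules might cooperate, the sketch below links clusters at successive time points and then extracts long-lived chains of clusters. It rests on several assumptions that do not come from the text: clusters are sets of object identifiers, a link is drawn whenever two clusters at adjacent time points share a member, and persistence over at least min_length time points stands in for "frequent information". Because a merge gives a cluster several parents, the linked structure here is strictly a layered graph rather than a tree. All function names are hypothetical.

```python
from collections import defaultdict


def build_links(snapshots):
    """Link each cluster to the clusters at the next time point that share members with it."""
    children = defaultdict(list)
    for t in range(len(snapshots) - 1):
        for name, members in snapshots[t].items():
            for nxt, nxt_members in snapshots[t + 1].items():
                if members & nxt_members:
                    children[(t, name)].append((t + 1, nxt))
    return children


def lineages(children, node):
    """Enumerate every chain from node down to a node that has no successor."""
    kids = children.get(node, [])
    if not kids:
        return [[node]]
    return [[node] + tail for kid in kids for tail in lineages(children, kid)]


def persistent_lineages(snapshots, min_length):
    """Return chains of linked clusters that span at least min_length time points."""
    children = build_links(snapshots)
    has_parent = {kid for kids in children.values() for kid in kids}
    # Roots are clusters with no predecessor, at any time point.
    roots = [(t, name) for t, snap in enumerate(snapshots)
             for name in snap if (t, name) not in has_parent]
    return [path for root in roots
            for path in lineages(children, root) if len(path) >= min_length]
```

Each node is a (time point index, cluster name) pair, so a returned chain can be read directly as the history of one group of moving objects.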
Referring to fig. 10, fig. 10 is a block diagram of a table building module according to an embodiment of the present disclosure.
Based on the above embodiments, the table building module 200 may include:
a table building unit 210, configured to build the abnormal data dynamic table;
a parameter setting unit 220, configured to set relevant processing parameters, wherein the processing parameters include a dynamic change time and an update time.
Based on the above embodiment, this embodiment may further include: an update unit, wherein the update unit may include:
a time judging subunit, configured to judge, according to the processing parameters, whether the existence time of the abnormal data point exceeds the update time;
and an updating subunit, configured to update the abnormal data point when the existence time of the abnormal data point exceeds the update time.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be understood by reference to one another. Because the device disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is brief; for the relevant details, refer to the description of the method.
Those of skill in the art will further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above provides a detailed description of a cluster data analysis method and apparatus provided by the present application. The principles and embodiments of the present application are explained herein using specific examples, which are provided only to help understand the method and the core idea of the present application. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.

Claims (7)

1. A method for cluster data analysis, the method comprising:
selecting moving cluster object data corresponding to time points which are separated by preset time intervals in a preset time period;
establishing an abnormal data dynamic table;
classifying the moving cluster object data of each time point and abnormal data points in the abnormal data dynamic table to obtain an initial classification result, and storing the moving cluster object data which is not classified as the abnormal data points in the abnormal data dynamic table;
starting from the first time point, analyzing the change between the initial classification result of each time point and the initial classification result of the preceding time point, and identifying the change condition of the initial classification result of each time point according to that change, to obtain a classification result;
further comprising:
determining the relation between the classes of each time point according to the classification result, and constructing a mobile cluster pattern tree;
and determining related mobile cluster frequent information according to the mobile cluster pattern tree.
2. The method according to claim 1, wherein the identification of the change condition specifically includes:
retention, merging, separation, expansion, contraction, disappearance and appearance.
3. The method of claim 2, wherein establishing the abnormal data dynamic table comprises:
establishing the abnormal data dynamic table;
setting relevant processing parameters; wherein the processing parameters include a dynamic change time and an update time.
4. The method of claim 3, wherein storing the unclassified moving cluster object data as the abnormal data points in the abnormal data dynamic table further comprises:
judging whether the existence time of the abnormal data points exceeds the updating time or not according to the processing parameters;
and if so, updating the abnormal data point.
5. A cluster data analysis apparatus, the apparatus comprising:
a data selecting module, configured to select moving cluster object data corresponding to time points separated by a preset time interval within a preset time period;
a table building module, configured to build an abnormal data dynamic table;
an initial classification module, configured to classify the moving cluster object data of each time point and the abnormal data points in the abnormal data dynamic table to obtain an initial classification result, and to store the unclassified moving cluster object data as abnormal data points in the abnormal data dynamic table;
a change identification module, configured to analyze, starting from the first time point, the change between the initial classification result of each time point and the initial classification result of the preceding time point, and to identify the change condition of the initial classification result of each time point according to that change, to obtain a classification result;
further comprising:
a tree building module, configured to determine the relation between the classes of each time point according to the classification result and to construct a mobile cluster pattern tree;
and a mining module, configured to determine related mobile cluster frequent information according to the mobile cluster pattern tree.
6. The apparatus of claim 5, wherein the table building module comprises:
a table building unit, configured to build the abnormal data dynamic table;
a parameter setting unit, configured to set relevant processing parameters, wherein the processing parameters include a dynamic change time and an update time.
7. The apparatus of claim 6, wherein the initial classification module further comprises: an update unit, wherein the update unit comprises:
a time judging subunit, configured to judge, according to the processing parameters, whether the existence time of the abnormal data points exceeds the update time;
and an updating subunit, configured to update the abnormal data point when the existence time of the abnormal data point exceeds the update time.
CN201710541642.6A 2017-07-05 2017-07-05 Cluster data analysis method and device Active CN107341239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710541642.6A CN107341239B (en) 2017-07-05 2017-07-05 Cluster data analysis method and device

Publications (2)

Publication Number Publication Date
CN107341239A CN107341239A (en) 2017-11-10
CN107341239B true CN107341239B (en) 2020-08-07

Family

ID=60217957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710541642.6A Active CN107341239B (en) 2017-07-05 2017-07-05 Cluster data analysis method and device

Country Status (1)

Country Link
CN (1) CN107341239B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109002261B (en) * 2018-07-11 2022-03-22 佛山市云端容灾信息技术有限公司 Method and device for analyzing big data of difference block, storage medium and server

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101908065A (en) * 2010-07-27 2010-12-08 浙江大学 On-line attribute abnormal point detecting method for supporting dynamic update
CN104487991A (en) * 2011-12-30 2015-04-01 施耐德电气(美国)公司 Energy management with correspondence based data auditing signoff
CN106101102A (en) * 2016-06-15 2016-11-09 华东师范大学 A kind of exception flow of network detection method based on PAM clustering algorithm
CN106203519A (en) * 2016-07-17 2016-12-07 合肥赑歌数据科技有限公司 Fault pre-alarming algorithm based on taxonomic clustering
CN106657065A (en) * 2016-12-23 2017-05-10 陕西理工学院 Network abnormality detection method based on data mining

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5774331B2 (en) * 2011-03-03 2015-09-09 株式会社日立国際電気 Substrate processing system, management apparatus, data analysis method, and data analysis program

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
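Discovering Frequent Mobility Patterns on Moving Object Data;Ticiana L;《2014MobiGIS》;20151227;full text *
Efficient Clustering_based Outlier Detection Algorithm for Dynamic Data Stream;Manzoor;《2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery》;20081020;full text *
Research Based on Abnormal Data Mining Algorithms (基于异常数据挖掘算法的研究);王传玉;《China Master's Theses Full-text Database, Information Science and Technology》;20170415;full text *
Research and Application of Abnormal Data Mining Algorithms (异常数据挖掘算法研究与应用);孟静;《China Master's Theses Full-text Database, Information Science and Technology》;20161231;full text *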

Also Published As

Publication number Publication date
CN107341239A (en) 2017-11-10

Similar Documents

Publication Publication Date Title
CN109754594B (en) Road condition information acquisition method and equipment, storage medium and terminal thereof
CN103714185B (en) Subject event updating method base and urban multi-source time-space information parallel updating method
CN106933956B (en) Data mining method and device
WO2022142042A1 (en) Abnormal data detection method and apparatus, computer device and storage medium
CN107543553B (en) Interest point updating method and device
WO2022227303A1 (en) Information processing method and apparatus, computer device, and storage medium
KR101925506B1 (en) Method and apparatus for predicting the spread of an infectious disease
CN110019349A (en) Sentence method for early warning, device, equipment and computer readable storage medium
CN111177544A (en) Operation system and method based on user behavior data and user portrait data
CN110335399B (en) Bluetooth access control method, computer terminal and computer readable storage medium
US11250031B2 (en) Method of predicting a traffic behaviour in a road system
CN112214617B (en) Digital file management method and system based on block chain technology
CN107341239B (en) Cluster data analysis method and device
US11670163B2 (en) Method of predicting a traffic behaviour in a road system
CN114579657A (en) Vehicle-road cooperation-based v2x edge cloud control method and system
CN110781064A (en) Method and device for dynamically embedding point to acquire client user behavior data
JPWO2018150550A1 (en) Learning data management apparatus and learning data management method
WO2022237213A1 (en) Resident travel chain model construction method and resident travel chain acquisition method
CN103942131A (en) Method and device for monitoring whether bottom layer interfaces change or not
CN112307151A (en) Navigation data processing method and device
US20180032962A1 (en) Method, apparatus, and system for pushing information
CN111475336A (en) Backup data analysis method and device based on file information and computer equipment
JP6880962B2 (en) Program analyzer, program analysis method and analysis program
CN112966947B (en) Intelligent tourist attraction management method and system based on Internet of things
CN113393011B (en) Method, device, computer equipment and medium for predicting speed limit information

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant