CN113626670B - Object clustering method and device based on time-space relationship and electronic equipment

Publication number
CN113626670B
Authority
CN
China
Prior art keywords
data
time window
clustering
spatio
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110788970.2A
Other languages
Chinese (zh)
Other versions
CN113626670A (en)
Inventor
张美玲
陈新宇
刘洋
宋广日
谢梦燕
周瑞
赵勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gelingshentong Information Technology Co ltd
Original Assignee
Beijing Gelingshentong Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gelingshentong Information Technology Co ltd filed Critical Beijing Gelingshentong Information Technology Co ltd
Priority to CN202110788970.2A priority Critical patent/CN113626670B/en
Publication of CN113626670A publication Critical patent/CN113626670A/en
Application granted granted Critical
Publication of CN113626670B publication Critical patent/CN113626670B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide an object clustering method and device based on a spatio-temporal relationship, and electronic equipment. Acquired original data is first preprocessed to obtain spatio-temporal data, and at least two time windows are then allocated to the spatio-temporal data. For each time window, an initial clustering result of each object is obtained according to the spatio-temporal information in the time window. When the initial clustering results of an object in different time windows differ, the initial clustering results of the object are screened to obtain a target clustering result of the object. Finally, the target clustering result of each object is output. By setting sliding time windows, the application can find relatively complete spatio-temporal information of an object within a certain time window, which ensures the integrity of the data. When the clustering results of the same object differ across several time windows, the clustering results are screened to remove deviations caused by incomplete data in some time windows, which improves the accuracy of the clustering result.

Description

Object clustering method and device based on time-space relationship and electronic equipment
Technical Field
The present application relates to the field of data processing, and in particular, to a method and an apparatus for clustering objects based on a spatiotemporal relationship, and an electronic device.
Background
Clustering is the process of dividing a collection of physical or abstract objects into a plurality of classes or clusters of similar objects. A cluster generated by clustering is a set of data objects; data objects in the same cluster should be as similar as possible, and data objects in different clusters should be as dissimilar as possible. With the development of science and technology, clustering algorithms are widely applied in many fields, such as medical and health care, social network platforms, markets, and online shopping platforms.
At present, when object clustering is carried out, particularly with the k-means algorithm, a data set is selected, clustered directly, and the clustering result is output. The output clustering result is easily affected by the selected k value, by outliers, and the like. Once the k value is selected inaccurately, the clustering result deviates, so the accuracy of current clustering algorithms is low.
Disclosure of Invention
The embodiment of the application provides an object clustering method and device based on a spatiotemporal relationship and electronic equipment, and is used for solving the problem of low accuracy of the conventional clustering algorithm.
According to a first aspect of embodiments of the present application, there is provided an object clustering method based on spatiotemporal relationships, the method including:
acquiring original data, preprocessing the original data, and acquiring processed spatio-temporal data, wherein the spatio-temporal data comprises object data and spatio-temporal information of each object;
allocating at least two time windows to the spatio-temporal data according to preset window attributes, wherein the adjacent time windows are partially overlapped, each time window comprises at least one object, and each object comprises at least one piece of spatio-temporal information;
for each time window, obtaining an initial clustering result of each object according to the spatio-temporal information in the time window;
for each object, when a plurality of initial clustering results of the object in different time windows are different, screening the initial clustering results to obtain a target clustering result of the object;
and outputting the target clustering result of each object.
According to a second aspect of the embodiments of the present application, there is provided an object clustering apparatus based on spatiotemporal relationships, the apparatus including:
the data processing module is used for acquiring original data, preprocessing the original data and acquiring processed spatiotemporal data, wherein the spatiotemporal data comprises object data and spatiotemporal information of each object;
the time window distribution module is used for distributing at least two time windows for the spatio-temporal data according to preset window attributes, wherein the adjacent time windows are partially overlapped, each time window comprises at least one object, and each object comprises at least one piece of spatio-temporal information;
the clustering module is used for acquiring initial clustering results of all objects according to the spatio-temporal information in each time window;
the result screening module is used for screening a plurality of initial clustering results of each object in different time windows to obtain a target clustering result of the object;
and the output module is used for outputting the target clustering result of each object.
According to a third aspect of embodiments of the present application, there is provided an electronic apparatus, comprising: a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate via the bus when the electronic device is running, and the machine-readable instructions, when executed by the processor, perform the spatiotemporal relationship-based object clustering method provided by the first aspect.
According to a fourth aspect of embodiments of the present application, a storage medium is provided, where the storage medium stores a computer program, and the computer program is executed by a processor to perform the spatio-temporal relationship-based object clustering method provided in the first aspect.
The embodiments of the present application provide an object clustering method and device based on a spatio-temporal relationship, and electronic equipment. The method comprises the following steps: preprocessing the acquired original data to obtain spatio-temporal data, wherein the spatio-temporal data comprises object data and spatio-temporal information of each object; allocating time windows to the spatio-temporal data such that each time window includes at least one object and each object includes at least one piece of spatio-temporal information; for each time window, obtaining an initial clustering result of each object according to the spatio-temporal information in the time window; when a plurality of initial clustering results of an object in different time windows are different, screening the initial clustering results of the object to obtain a target clustering result of the object; and finally, outputting the target clustering result of each object. By setting at least two time windows, the application first clusters the objects within each time window to obtain clustering results, and then applies an error correction mechanism: when the clustering results of the same object differ across several time windows, the clustering results are screened to remove deviations caused by incomplete data in some time windows, which improves the accuracy of the clustering result.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flowchart of an object clustering method based on spatiotemporal relationships according to an embodiment of the present application;
FIG. 2 is a schematic diagram of time window allocation provided by an embodiment of the present application;
FIG. 3 is a flowchart illustrating sub-steps of step S13 according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating sub-steps of step S14 according to an embodiment of the present application;
FIG. 5 is a functional block diagram of an object clustering device based on spatiotemporal relationships according to an embodiment of the present application;
FIG. 6 is a schematic view of an electronic device provided in an embodiment of the present application.
Detailed Description
In the process of realizing the application, the inventors found that in an actual scene the clustering effect and clustering results of the data are positively correlated with the sampling rate of the data. However, a high sampling rate significantly increases the data scale, so that the computation scale grows exponentially, the dependence on and demand for computing resources rise significantly, and the cost of practical application rises further. To solve this problem, incremental computation on the data is required. Incremental computation refers to computing on new data without recomputing historical data, while continuously adjusting the final clustering result.
However, existing clustering algorithms, especially the k-means algorithm, cannot realize incremental computation on data. When the k-means algorithm is used for object clustering, a data set is selected, clustered directly, and the clustering result is output. The output clustering result is easily affected by the selected k value, by outliers, and the like. Once the k value is selected inaccurately, the clustering result deviates, so the accuracy of current clustering algorithms is low.
In view of the foregoing problems, the embodiments of the present application provide an object clustering method and device based on a spatio-temporal relationship, and electronic equipment. The method includes: preprocessing the acquired original data to obtain spatio-temporal data, wherein the spatio-temporal data comprises object data and spatio-temporal information of each object; allocating time windows to the spatio-temporal data such that each time window includes at least one object and each object includes at least one piece of spatio-temporal information; for each time window, obtaining an initial clustering result of each object according to the spatio-temporal information in the time window; when a plurality of initial clustering results of an object in different time windows are different, screening the initial clustering results of the object to obtain a target clustering result of the object; and finally, outputting the target clustering result of each object. By setting sliding time windows, the application can find relatively complete spatio-temporal information of an object within a certain time window, which ensures the integrity of the data. An error correction mechanism is then applied: when the clustering results of the same object differ across several time windows, the clustering results are screened to remove deviations caused by incomplete data in some time windows, which improves the accuracy of the clustering result.
In order to make the technical solutions and advantages of the embodiments of the present application clearer, exemplary embodiments of the present application are described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application and are not an exhaustive list of all embodiments. It should be noted that the embodiments and the features of the embodiments in the present application may be combined with each other where no conflict arises.
Referring to fig. 1, fig. 1 is a flowchart of an object clustering method based on spatiotemporal relationship according to an embodiment of the present application. In this embodiment, the method includes:
Step S11, acquiring original data, and preprocessing the original data to obtain processed spatio-temporal data.
The spatio-temporal data comprises object data and spatio-temporal information of each object.
Step S12, allocating at least two time windows to the spatio-temporal data according to preset window attributes.
Adjacent time windows partially overlap, each time window comprises at least one object, and each object comprises at least one piece of spatio-temporal information.
Step S13, for each time window, obtaining an initial clustering result of each object according to the spatio-temporal information in the time window.
Step S14, for each object, when a plurality of initial clustering results of the object in different time windows are different, screening the initial clustering results to obtain a target clustering result of the object.
Step S15, outputting the target clustering result of each object.
It should be noted that, in this embodiment, the object may be a person, a vehicle, or an article. As long as it can be identified by a relevant algorithm, the object may be anything that moves, such as cattle and sheep on a pasture or vehicles on a road.
In the above steps, at least two time windows are set and the objects in each time window are clustered. If the window size is set reasonably, relatively complete spatio-temporal information of an object can be found within a certain time window, which ensures the integrity of the data.
Optionally, in this embodiment, step S11 of acquiring original data and preprocessing the original data to obtain processed spatio-temporal data specifically includes:
performing data cleaning and data denormalization processing on the original data, wherein the data cleaning comprises at least one of data filtering, format conversion, normalization, and data sampling, and the denormalization processing comprises attribute expansion of the data and/or addition of custom rules to the data.
In this embodiment, after the original data is obtained from each data source, the data needs to be cleaned, including operations such as filtering of unreasonable data, data format conversion, filtering of repeated data, and high-density data sampling.
Specifically, when data is filtered, data that does not conform to a predefined standard format may be discarded, for example, if the data is required to have coordinate information in the predefined format, but the actually acquired data does not have the coordinate information, the data may be directly discarded; for another example, in the embodiment, real-time data needs to be collected, and if the difference between the timestamp of the obtained data and the current time of the system is too large, the data is also filtered.
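The two filtering rules described above can be sketched as follows. This is only an illustrative sketch: the `RawRecord` type, the `filter_records` function, and the `max_age_seconds` threshold are hypothetical names and values that do not appear in the application.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class RawRecord:
    """Hypothetical raw record; None models missing coordinate information."""
    object_id: str
    timestamp: float            # seconds since epoch
    x: Optional[float] = None
    y: Optional[float] = None


def filter_records(records: Iterable[RawRecord], now: float,
                   max_age_seconds: float = 300.0) -> List[RawRecord]:
    """Keep records that carry coordinates and whose timestamp is close to `now`.

    `max_age_seconds` is an assumed threshold; the application only says the
    difference must not be "too large".
    """
    kept = []
    for record in records:
        if record.x is None or record.y is None:
            continue  # no coordinate information: discard
        if abs(now - record.timestamp) > max_age_seconds:
            continue  # timestamp too far from the current system time: discard
        kept.append(record)
    return kept
```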
The data format conversion refers to converting the format of the acquired original data into a format required by the embodiment of the present application, for example, a time information format or a spatial information format.
After format conversion, the data also needs to be normalized. In the normalization processing, the time information of each object is normalized in chronological order. For the spatial information of each object, an origin is first set and a rectangular coordinate system is established, and the position of each piece of spatial information is determined in that coordinate system.
When multiple records of an object are acquired at the same coordinate within a short time, these records are not duplicate data, but they still need to be sampled so that an excessive amount of collected data does not make the clustering result inaccurate.
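One possible way to sample such dense records is to keep, for each object and coordinate, at most one record per minimum time gap. The sketch below assumes this rule; the `downsample` function and the `min_gap` parameter are hypothetical, and the application does not specify the sampling strategy.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, float, float, float]  # (object_id, t, x, y)


def downsample(records: List[Record], min_gap: float = 60.0) -> List[Record]:
    """Drop records of the same object at the same coordinate that arrive
    less than `min_gap` time units after the last kept one (assumed rule)."""
    last_kept: Dict[Tuple[str, float, float], float] = {}
    out: List[Record] = []
    for rec in sorted(records, key=lambda r: r[1]):  # process in time order
        object_id, t, x, y = rec
        key = (object_id, x, y)
        if key in last_kept and t - last_kept[key] < min_gap:
            continue  # too close in time to the previous kept record
        last_kept[key] = t
        out.append(rec)
    return out
```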
Optionally, after the original data is cleaned, in order to accelerate the subsequent processing speed and simplify the execution flow, denormalization processing needs to be performed on the data. Specifically, the denormalization processing includes perfecting extended attribute information of the data and adding custom rules to the data.
The original data generally uses normalized information, whereas object clustering also needs other attribute information related to the object, such as the classification of associated objects, grouping information, and the location to which the object belongs. Because this information is relatively fixed, the dimension table information associated with the objects is loaded in advance, by broadcasting, onto the processing nodes of the data stream and is then joined (JOIN) with the streaming records generated in real time. This realizes the denormalization processing that extends the attributes of the data.
In addition to object attributes, when the objects are applied in different scenes or certain scenes have special requirements, the clustering parameters can also be set by manual intervention. These custom rules are likewise treated as a dimension table, joined with the stream records, and carried with the original data for subsequent processing.
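The join between a pre-loaded dimension table and the streaming records can be illustrated with a plain in-memory lookup. This is a simplified sketch rather than a stream-processing implementation: the `enrich` function, the `object_id` key, and the dictionary-based dimension table are assumptions made for illustration.

```python
from typing import Dict, Iterable, Iterator


def enrich(stream: Iterable[dict], dimension: Dict[str, dict]) -> Iterator[dict]:
    """Attach relatively fixed attributes (or custom rules) to each record.

    `dimension` stands in for the broadcast dimension table, keyed by the
    hypothetical field `object_id`. Fields already present in the stream
    record take precedence over dimension attributes.
    """
    for record in stream:
        attrs = dimension.get(record["object_id"], {})
        yield {**attrs, **record}
```

Records whose object has no entry in the dimension table pass through unchanged in this sketch.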
After preprocessing the data, space-time data is obtained, and the space-time data comprises object data and space-time information of each object. Time windows are then allocated to the spatio-temporal data according to preset window attributes.
Optionally, in this embodiment, a time window may be assigned to the spatiotemporal data according to the generation time of the data and custom window attributes. The window attributes include a window size, a window sliding step size, and the window sliding step size is smaller than the window size, so that there is an overlapping portion between two time windows, and thus, spatiotemporal information of an object may appear in a plurality of time windows.
For example, if the window size of the time window is 1 hour and the window sliding step is 10 minutes, time window 1 can be 0:00-1:00, time window 2 can be 0:10-1:10, and so on. After time windows are allocated for the spatio-temporal data, each time window includes at least one object and at least one piece of spatio-temporal information corresponding to each object.
For example, referring to FIG. 2, FIG. 2 is a schematic diagram of time window allocation provided in the embodiment of the present application. In this embodiment, id_1, id_2, and id_3 respectively represent three objects; the spatio-temporal information corresponding to id_1 is (1)-(9), the spatio-temporal information corresponding to id_2 is (1) -r, and the spatio-temporal information corresponding to id_3 is (1)-(6). In FIG. 2, the window sliding step of the time window is 1 minute and the window size is 4 minutes, and two time windows are shown in FIG. 2, which are 12.
In this example, the time window 12 includes three objects, id_1, id_2, and id_3, in which id_1 in time window 12. Time window 12 also includes three objects, id_1, id_2, and id_3, where id_1 is (3)-(9) in time window 12.
Thus, after assigning time windows to the spatiotemporal data, each time window includes at least one object and at least one piece of spatiotemporal information corresponding to each object.
It should be noted that the window sliding step length and the window size in this embodiment may be customized by the user according to the use requirement.
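The allocation of overlapping time windows can be sketched as follows, assuming that window i covers the half-open interval [start + i × step, start + i × step + size). The `assign_windows` function, the `start` parameter, and the integer window indices are hypothetical; the application does not fix how windows are indexed or whether interval ends are inclusive.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (t, x, y)


def assign_windows(points_by_object: Dict[str, List[Point]],
                   size: float, step: float,
                   start: float = 0.0) -> Dict[int, Dict[str, List[Point]]]:
    """Assign every point to all sliding windows that contain its time."""
    if not 0 < step < size:
        raise ValueError("the sliding step must be positive and smaller than the window size")
    windows: Dict[int, Dict[str, List[Point]]] = defaultdict(lambda: defaultdict(list))
    for object_id, points in points_by_object.items():
        for point in points:
            offset = point[0] - start
            if offset < 0:
                continue  # before the first window
            last = int(offset // step)                        # latest window starting at or before t
            first = max(0, int((offset - size) // step) + 1)  # earliest window still covering t
            for index in range(first, last + 1):
                windows[index][object_id].append(point)
    return {index: dict(objects) for index, objects in windows.items()}
```

With a window size of 60 minutes and a step of 10 minutes, a point at minute 15 falls into windows 0 (0-60) and 1 (10-70), which is how the same spatio-temporal information comes to appear in several windows.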
After time windows are distributed to the spatio-temporal data, the initial clustering result of each object can be obtained according to the spatio-temporal information in each time window. Specifically, referring to fig. 3, fig. 3 is a flowchart illustrating a sub-step of step S13 according to an embodiment of the present disclosure. In the present embodiment, step S13 includes the following substeps:
step S131, a first number of objects are selected from the spatio-temporal information within the time window, and a second number of spatio-temporal information is selected from each selected object as a clustering center.
Step S132, calculating Euclidean distances between other space-time information in the time window and the clustering center.
Step S133, for each object, calculating the sum of squared Euclidean distances from each piece of spatio-temporal information of the object to each cluster center of the time window.
Step S134, obtaining a clustering result of each object in the current iteration according to the minimum value of the sums of squared Euclidean distances.
Step S135, determining whether an iteration condition is satisfied.
If not, returning to the step of selecting a first number of objects from the spatio-temporal information in the time window and selecting a second number of spatio-temporal information from each selected object as a clustering center; if yes, ending iteration and obtaining an initial clustering result of each object.
In the above steps, each time window includes at least one object, and each object includes at least one piece of spatio-temporal information. For example, if m objects are included within the time window $X$, then $X = \{x_1, x_2, \ldots, x_m\}$. Each object $x$ comprises several pieces of spatio-temporal information, so $x = \{V_1, V_2, \ldots, V_m\}$. Each piece of spatio-temporal information consists of time information and spatial information, namely $V_j = \{T_j, Px_j, Py_j\}$.
When the clustering center in each time window is selected, k objects are selected from a plurality of objects in the time window, then k1 pieces of spatiotemporal information are selected from each object of the k objects, and the k1 pieces of spatiotemporal information are used as the clustering center of the time window.
Since the number of pieces of spatio-temporal information of some objects in the time window may be less than k1, all of the spatio-temporal information of such an object in the time window is extracted. If the number of pieces of spatio-temporal information of the object in the time window is greater than k1, time-stratified sampling is adopted so that the extracted spatio-temporal information is as complete as possible.
For example, a time window includes two objects, A and B, where object A has 10 pieces of spatio-temporal information in the time window and object B has 2 pieces of spatio-temporal information in the time window. If 2 objects are selected and 3 pieces of spatio-temporal information are to be extracted from each object, i.e. k=2 and k1=3, then 5 pieces of spatio-temporal information are finally extracted as cluster centers, of which 3 come from object A and 2 come from object B.
Therefore, since the amount of spatio-temporal information of each object in the time window differs, the number of extracted cluster centers falls within the range (k, k × k1).
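The selection of cluster centers can be sketched as below. The sketch assumes that time-stratified sampling means splitting an object's time-ordered points into k1 consecutive strata and drawing one point from each; the application does not define the sampling in this detail, and the names `stratified_sample` and `select_centers` are hypothetical.

```python
import random
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float, float]  # (t, x, y)


def stratified_sample(points: List[Point], k1: int, rng: random.Random) -> List[Point]:
    """Return all points if there are at most k1, else one point per time stratum."""
    if len(points) <= k1:
        return list(points)
    ordered = sorted(points, key=lambda p: p[0])
    n = len(ordered)
    picks = []
    for stratum in range(k1):
        low = stratum * n // k1
        high = (stratum + 1) * n // k1  # exclusive; non-empty because n > k1
        picks.append(ordered[rng.randrange(low, high)])
    return picks


def select_centers(window: Dict[str, List[Point]], k: int, k1: int,
                   rng: Optional[random.Random] = None) -> List[Point]:
    """Pick k objects from the window, then up to k1 points from each as centers."""
    rng = rng or random.Random()
    chosen = rng.sample(sorted(window), min(k, len(window)))
    return [p for object_id in chosen for p in stratified_sample(window[object_id], k1, rng)]
```

For the example above (object A with 10 pieces, object B with 2 pieces, k=2, k1=3), this yields 5 centers: 3 from A and 2 from B.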
Optionally, in this embodiment, the k value may be customized by a user, and if the user is not customized, the k value may be obtained by calculation according to the number of objects in the time window and the average cluster size.
Specifically, the k value can be calculated according to the formula
Figure BDA0003160125610000091
where n is the number of objects in the time window, and C is the average cluster size of the configured clustering scene.
After the k objects are selected, the median of the numbers of pieces of spatio-temporal information of the k objects can be calculated, and the calculated median is taken as k1.
Optionally, after the cluster centers are extracted, the euclidean distances from other spatio-temporal information that is not a cluster center to the respective cluster centers need to be calculated.
In this embodiment, the euclidean distance between other spatio-temporal information and the cluster center within the time window may be calculated according to the following formula:
Figure BDA0003160125610000092
wherein, distance is Euclidean distance, t1 is the time of the first spatiotemporal information, t2 is the time of the second spatiotemporal information, tR1 is the adjusting parameter of t1, tR2 is the adjusting parameter of t2, px1 is the abscissa of the first spatiotemporal information, px2 is the abscissa of the second spatiotemporal information, pxR1 is the adjusting parameter of px1, pxR2 is the adjusting parameter of px2, py1 is the ordinate of the first spatiotemporal information, py2 is the ordinate of the second spatiotemporal information, pyR1 is the adjusting parameter of py1, and pyR2 is the adjusting parameter of py 2.
In this embodiment, the default value of tR1, tR2, pxR1, pxR2, pyR1, and pyR2 is 1, and the user can customize them according to the usage scenario. If the user wants the Euclidean distance to be smaller, that is, wants a certain time period, place, or object to cluster more strongly, the above parameters are turned down; otherwise, they are turned up.
For example, if 2 objects, namely a and B, are included in the time window, object a includes 3 pieces of spatio-temporal information, namely A1, A2, and A3, and object B includes 5 pieces of spatio-temporal information, namely B1, B2, B3, B4, and B5, where A1 and B3 are cluster centers, then the euclidean distances from A2, A3, B1, B2, B4, and B5 to the cluster centers A1 and B3 need to be calculated according to the euclidean distance calculation formula.
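Because the distance formula itself is only available as an image in this text, the following sketch assumes one plausible form: each time or coordinate value is multiplied by its adjusting parameter before the difference is squared. This form is an assumption, not the formula of the application; it is chosen because it reduces to the ordinary Euclidean distance when all parameters are 1 and shrinks distances when the parameters are turned down. The names `Scale` and `st_distance` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float, float]  # (t, px, py)


@dataclass(frozen=True)
class Scale:
    """Adjusting parameters for one point (tR, pxR, pyR); all default to 1."""
    t: float = 1.0
    px: float = 1.0
    py: float = 1.0


def st_distance(a: Point, b: Point, ra: Scale = Scale(), rb: Scale = Scale()) -> float:
    """Scaled Euclidean distance between two pieces of spatio-temporal information.

    ASSUMED form: sqrt((t1*tR1 - t2*tR2)^2 + (px1*pxR1 - px2*pxR2)^2
                       + (py1*pyR1 - py2*pyR2)^2).
    """
    (t1, px1, py1), (t2, px2, py2) = a, b
    return math.sqrt((t1 * ra.t - t2 * rb.t) ** 2
                     + (px1 * ra.px - px2 * rb.px) ** 2
                     + (py1 * ra.py - py2 * rb.py) ** 2)
```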
After the Euclidean distances of other spatio-temporal information to the respective cluster centers are calculated, for each object, the squared sum of the Euclidean distances of each spatio-temporal information of the object to each cluster center of the time window is calculated.
Continuing with the above example, for object A it is necessary to calculate the sum of squared Euclidean distances from each piece of spatio-temporal information A1, A2, A3 of object A to the cluster center A1, and the sum of squared Euclidean distances from each piece of spatio-temporal information A1, A2, A3 of object A to the cluster center B3.
The sum of squared Euclidean distances from A1, A2, and A3 to the cluster center A1 is $dis_{sum1} = dis(A1,A1)^2 + dis(A2,A1)^2 + dis(A3,A1)^2$, and the sum of squared Euclidean distances from A1, A2, and A3 to the cluster center B3 is $dis_{sum2} = dis(A1,B3)^2 + dis(A2,B3)^2 + dis(A3,B3)^2$, where $dis(A2,A1)$ represents the Euclidean distance between the spatio-temporal information A2 and the spatio-temporal information A1.
For each object, after the two sums of squared Euclidean distances are calculated, the clustering result of the object in the current iteration is obtained from the minimum of the sums. For example, for object A, if $dis_{sum1}$ is less than $dis_{sum2}$, object A is clustered to A, i.e. object A is classified as itself; if $dis_{sum1}$ is greater than $dis_{sum2}$, object A is clustered to B, i.e. object A and object B belong to one class.
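The per-object assignment can be sketched as follows, using the plain Euclidean distance (all adjusting parameters left at their default of 1). The function name `assign_object` and the use of center labels such as "A1" and "B3" as dictionary keys are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (t, px, py)


def assign_object(points: List[Point],
                  centers: Dict[str, Point]) -> Tuple[str, Dict[str, float]]:
    """Return the label of the center with the smallest sum of squared
    distances to all points of one object, together with every sum.

    math.dist requires Python 3.8 or later.
    """
    sums = {label: sum(math.dist(p, center) ** 2 for p in points)
            for label, center in centers.items()}
    best = min(sums, key=sums.get)
    return best, sums
```

For instance, with object A at (0,0,0), (1,0,0), (2,0,0) and centers A1 = (0,0,0) and B3 = (10,0,0), the sums are 5 and 245, so object A stays with center A1.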
And then judging whether an iteration condition is met, wherein the iteration condition can be that the iteration times reach a preset number, or the object clustering result of the previous round is completely consistent with the object clustering result of the current round, and the iteration can be finished when any iteration condition is met.
If the iteration condition is not satisfied, the process returns to step S131, and the clustering center is reselected for iterative operation.
After the iterative operation is finished, the initial clustering result of each object can be obtained.
Because the spatio-temporal information of each object included in each time window is different, the clustering results obtained for the same object in each time window may be different, and at this time, the clustering results need to be filtered. For example, if the spatio-temporal information of a certain object is mainly concentrated in the time window a, but a small amount of spatio-temporal information exists in the time window B and the time window C, in this case, the clustering results calculated by the time windows a, B, and C may be different, and in this case, it is necessary to screen out the clustering results of the useful time windows (time window a) to eliminate the influence of partial outliers on the clustering results.
Optionally, referring to fig. 4, which is a flowchart of the sub-steps of step S14 according to an embodiment of the present disclosure, step S14 in this embodiment includes:
Step S141, determining whether the plurality of initial clustering results of the object in different time windows are the same;
Step S142, if not, determining whether the degree of overlap of the different time windows reaches a preset value;
Step S143, if the preset value is reached, screening the plurality of initial clustering results of the object to obtain a target clustering result of the object.
In the above steps, if the degree of overlap of two time windows reaches a preset value (for example, 75%) and the clustering results for the same object in the two time windows differ, the clustering results of the object need to be screened and the target clustering result selected from among them. This eliminates deviations caused by incomplete data in some time windows, so that the best clustering result is output and accuracy is improved.
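The check in steps S141 to S143 can be sketched as below. The text does not define how the degree of overlap is measured, so this sketch assumes it is the overlapping duration divided by the length of the shorter window; the `(start, end)` window representation and the names `overlap_degree` and `needs_screening` are likewise hypothetical.

```python
from typing import Dict, Tuple

Window = Tuple[float, float]  # (start, end); a hypothetical representation


def overlap_degree(w1: Window, w2: Window) -> float:
    """Overlap duration divided by the shorter window length (assumed definition)."""
    overlap = min(w1[1], w2[1]) - max(w1[0], w2[0])
    shorter = min(w1[1] - w1[0], w2[1] - w2[0])
    return max(0.0, overlap) / shorter if shorter > 0 else 0.0


def needs_screening(results: Dict[str, str], windows: Dict[str, Window],
                    preset: float = 0.75) -> bool:
    """True if two windows disagree on the object's cluster and overlap enough."""
    names = list(results)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if (results[a] != results[b]
                    and overlap_degree(windows[a], windows[b]) >= preset):
                return True
    return False


# Two 60-minute windows that share 45 minutes (75% overlap) but disagree.
windows = {"X": (0, 60), "Y": (15, 75)}
results = {"X": "cluster_1", "Y": "cluster_2"}
flag = needs_screening(results, windows)
```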
Optionally, in this embodiment, screening the plurality of initial clustering results of the object in step S143 to obtain the target clustering result of the object includes:
calculating the balance parameter of the object in each time window according to the formula
Figure BDA0003160125610000111
where R is the balance parameter, distance_sum is the minimum sum of squared Euclidean distances of the object in each time window, and c is the total amount of spatio-temporal information of the object in each time window; and
selecting the clustering result corresponding to the time window with the smallest balance parameter as the target clustering result of the object.
For example, if the clustering results of object A in time windows X, Y, and Z all differ, the balance parameters of object A in time windows X, Y, and Z are calculated separately. Here distance_sum is the minimum sum of squared Euclidean distances of the object in the given time window: when calculating the balance parameter of object A in time window X, distance_sum is the minimum sum of squared Euclidean distances of object A in time window X. Continuing the earlier example, if the sums of squared Euclidean distances of object A within time window X are $dis_{sum1}$ and $dis_{sum2}$, then distance_sum is $dis_{sum1}$ if $dis_{sum1}$ is less than $dis_{sum2}$, and $dis_{sum2}$ otherwise.
The balance parameters of object A in each time window are then calculated, and the target clustering result of the object is determined by comparing them. For example, if the balance parameters of object A in time windows X, Y, and Z are R1, R2, and R3, respectively, and the smallest is R2, the target clustering result of object A is the clustering result corresponding to time window Y.
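The selection step can be sketched as follows. Because the formula for R appears only as an image and is not reproduced in the text, this sketch takes the balance parameters as precomputed inputs instead of deriving them; the function name `select_target_result` and the numeric values are illustrative assumptions.

```python
from typing import Dict, Tuple


def select_target_result(results_by_window: Dict[str, str],
                         balance_by_window: Dict[str, float]) -> Tuple[str, str]:
    """Pick the clustering result of the window with the smallest balance parameter."""
    best_window = min(balance_by_window, key=balance_by_window.get)
    return best_window, results_by_window[best_window]


# Object A has a different clustering result in each of the windows X, Y, Z.
results = {"X": "cluster_1", "Y": "cluster_2", "Z": "cluster_3"}
balance = {"X": 0.8, "Y": 0.3, "Z": 0.5}  # stand-ins for R1, R2, R3
window, target = select_target_result(results, balance)
```

Here R2 is the smallest, so the result from time window Y is kept, matching the example in the text.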
In the present application, time windows are allocated to the data so that the complete spatio-temporal information of an object can be found within some window, which guarantees data integrity. An error correction mechanism then screens the clustering results of the same object in different time windows, eliminating deviations caused by incomplete data in other windows, so that the best clustering result is output and an effect approximating incremental computation is achieved.
Optionally, in this embodiment, after the target clustering results of the objects are output in step S15, an association network of the objects may also be formed from those results. That is, each object maintains one piece of star data (a structure borrowed from the star schema of data warehouses, in which all dimension tables are linked directly to the fact table). In other words, all other objects that have a close relationship with a given object are linked directly to that object, and the weight of the edge between two objects represents how close their relationship is. Optionally, after step S15, the method further comprises:
calculating affinity scores between the objects according to the target clustering results of the objects; constructing a spatio-temporal information matrix for each object that has an affinity score, and calculating the matrix similarity of the spatio-temporal information matrices; and correcting the affinity scores according to the matrix similarity, and outputting the corrected affinity scores.
Specifically, in the above steps, taking one day as an example, the clustering results of the objects for a single day (hour 0 to hour 24) are first collated. Let the object be x, let the list of objects associated with object x be $x_j = \{x_1, x_2, \ldots, x_m\}$, and let the number of times each listed object appears with object $x_i$ be $n_j = \{n_1, n_2, \ldots, n_m\}$.
The affinity score between each object is then calculated according to the following formula:
Figure BDA0003160125610000121
where $A_j$ is the affinity score, $n_j$ is the number of times the listed object appears with object $x_i$, n is the number of times each object is clustered into the same cluster, $c_{jn}$ is the number of visit records of object j in each time window, and $distance\_sum_{jn}$ is the Euclidean distance between object j and object i within each window.
After the affinity scores are calculated, a spatio-temporal information matrix is generated for each object that has an affinity score, with the time information as the horizontal index and the spatial information as the vertical index. The spatio-temporal information matrix may be constructed according to the following condition:
Figure BDA0003160125610000131
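The construction condition is given only as an image and is not reproduced in the text, so the following is a hypothetical illustration of the matrix layout and not the patented rule. It assumes a binary matrix whose rows are indexed by spatial location and whose columns are indexed by time slot, with an entry set to 1 when the object was observed at that place and time. The name `build_matrix` and the camera identifiers are invented for the example.

```python
from typing import Hashable, List, Sequence, Tuple


def build_matrix(records: Sequence[Tuple[Hashable, Hashable]],
                 locations: Sequence[Hashable],
                 time_slots: Sequence[Hashable]) -> List[List[int]]:
    """Rows follow `locations` (space), columns follow `time_slots` (time).

    An entry is 1 if the object has a record at that location and time slot.
    This presence/absence rule is an assumption made for illustration.
    """
    row = {loc: r for r, loc in enumerate(locations)}
    col = {t: c for c, t in enumerate(time_slots)}
    matrix = [[0] * len(time_slots) for _ in locations]
    for t, loc in records:
        matrix[row[loc]][col[t]] = 1
    return matrix


# Each record is (time slot, location); the values are illustrative only.
records = [(8, "cam_1"), (9, "cam_2")]
m = build_matrix(records, locations=["cam_1", "cam_2"], time_slots=[8, 9, 10])
```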
The matrix similarity between the spatio-temporal information matrices is then calculated according to the following formula:
Figure BDA0003160125610000132
where $S_{ij}$ is the matrix similarity, $M_i$ is the spatio-temporal matrix of object i, $M_j$ is the spatio-temporal matrix of object j, and $M_{ij}$ is the product of the spatio-temporal matrices of object i and object j.
The matrix similarity measure is then normalized to between 0 and 1:
Figure BDA0003160125610000133
where $S_{ij}'$ is the normalized matrix similarity, $A_j$ is the affinity score, $S_{ij}$ is the matrix similarity, k is a constant in the range 0 to 1, m is the number of times object i and object j are clustered into the same cluster in a single day, and $S_j$ is the minimum of the sums of squared Euclidean distances of object j and object i in different time windows.
The affinity score is then corrected according to the normalized matrix similarity to obtain the final affinity $A_j'$. The correction formula is:
$A_j' = A_j \cdot S_{ij}'$
where $A_j'$ is the corrected affinity score, $S_{ij}'$ is the normalized matrix similarity, and $A_j$ is the affinity score.
Through the above steps, the affinity between objects can be calculated, and an association network between the objects can be constructed to represent their association relationships.
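The correction step and the resulting star-shaped association data can be sketched as follows. The sketch assumes the affinity scores and normalized similarities have already been computed, since their formulas are not reproduced in the text. The nested-dictionary layout and the names `correct_affinity` and `build_star` are assumptions made for illustration.

```python
from typing import Dict


def correct_affinity(affinity: float, normalized_similarity: float) -> float:
    """Corrected score: affinity multiplied by the normalized matrix similarity."""
    return affinity * normalized_similarity


def build_star(center: str,
               affinity_by_neighbor: Dict[str, float],
               similarity_by_neighbor: Dict[str, float]) -> Dict[str, Dict[str, float]]:
    """Link every associated object directly to `center`, weighted by corrected score."""
    return {
        center: {
            neighbor: correct_affinity(score, similarity_by_neighbor[neighbor])
            for neighbor, score in affinity_by_neighbor.items()
        }
    }


# Illustrative numbers only: two objects associated with object x.
star = build_star("x", {"x1": 2.0, "x2": 4.0}, {"x1": 0.5, "x2": 0.25})
```

Each object would keep one such structure, so the whole association network is a collection of stars whose edge weights are the corrected affinity scores.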
In summary, the embodiments of the present application provide an object clustering method and device based on spatio-temporal relationships, and an electronic device. The method includes: preprocessing acquired original data to obtain spatio-temporal data, where the spatio-temporal data includes object data and the spatio-temporal information of each object; allocating time windows to the spatio-temporal data such that each time window includes at least one object and each object includes at least one piece of spatio-temporal information; for each time window, obtaining an initial clustering result of each object according to the spatio-temporal information in the time window; when the initial clustering results of an object in different time windows differ, screening the initial clustering results to obtain a target clustering result of the object; and finally, outputting the target clustering result of each object. By setting sliding time windows, the present application can find relatively complete spatio-temporal information of an object within some time window, which guarantees data integrity. An error correction mechanism then screens the clustering results when the same object has different clustering results in different time windows, removing the deviations that incomplete data causes in some time windows and improving the accuracy of the clustering result.
The embodiment of the present application further provides an object clustering device based on spatio-temporal relationships. Fig. 5 is a functional block diagram of the object clustering device 110 based on spatio-temporal relationships according to the embodiment of the present application. In this embodiment, the device includes:
the data processing module 1101 is configured to obtain raw data, pre-process the raw data, and obtain processed spatio-temporal data, where the spatio-temporal data includes object data and spatio-temporal information of each object;
a time window allocation module 1102, configured to allocate at least two time windows to the spatio-temporal data according to a preset window attribute, where adjacent time windows are partially overlapped, each time window includes at least one object, and each object includes at least one piece of spatio-temporal information;
a clustering module 1103, configured to, for each time window, obtain an initial clustering result of each object according to spatio-temporal information in the time window;
a result screening module 1104, configured to, for each object, screen a plurality of initial clustering results of the object in different time windows when the initial clustering results are different, to obtain a target clustering result of the object;
an output module 1105, configured to output a target clustering result of each object.
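As one illustration of what the time window allocation module 1102 might do, the sketch below assigns records to partially overlapping sliding windows. It assumes the preset window attributes are a window length and a step smaller than that length, which the text does not state; the record layout and the name `allocate_windows` are also hypothetical.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, float]  # (object id, timestamp); spatial fields omitted for brevity


def allocate_windows(records: List[Record], start: float, length: float,
                     step: float) -> Dict[Tuple[float, float], List[Record]]:
    """Assign each record to every window [s, s + length) that contains its timestamp."""
    if not 0 < step < length:
        raise ValueError("step must be positive and smaller than length so windows overlap")
    windows: Dict[Tuple[float, float], List[Record]] = {}
    if not records:
        return windows
    last = max(t for _, t in records)
    w_start = start
    while w_start <= last:
        w_end = w_start + length
        members = [r for r in records if w_start <= r[1] < w_end]
        if members:  # keep only windows that contain at least one object
            windows[(w_start, w_end)] = members
        w_start += step
    return windows


# 30-minute windows advanced by 15 minutes, so adjacent windows share half their span.
records = [("A", 5), ("B", 20), ("A", 35)]
windows = allocate_windows(records, start=0, length=30, step=15)
```

Because the step is shorter than the window, a record such as `("B", 20)` falls into two windows, which is what later makes it possible to compare an object's clustering results across windows.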
Fig. 6 is a schematic view of an electronic device 10 provided in the embodiment of the present application. In this embodiment, the electronic device 10 includes a processor 11, a memory 12, and a bus 13. The memory 12 stores machine-readable instructions executable by the processor 11. When the electronic device 10 runs, the processor 11 communicates with the memory 12 through the bus 13, and the machine-readable instructions, when executed by the processor 11, perform the object clustering method based on spatio-temporal relationships according to the embodiment of the present application.
The embodiment of the present application further provides a storage medium, where a computer program is stored on the storage medium, and when the computer program is executed by a processor, the method for clustering objects based on spatiotemporal relationships, provided by the above embodiments, is executed.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including the preferred embodiment and all changes and modifications that fall within the scope of the present application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (7)

1. An object clustering method based on spatiotemporal relationship, the method comprising:
acquiring original data, preprocessing the original data, and acquiring processed spatio-temporal data, wherein the spatio-temporal data comprises object data and spatio-temporal information of each object;
allocating at least two time windows to the spatio-temporal data according to preset window attributes, wherein the adjacent time windows are partially overlapped, each time window comprises at least one object, and each object comprises at least one piece of spatio-temporal information;
aiming at each time window, obtaining an initial clustering result of each object according to the time-space information in the time window;
for each object, when a plurality of initial clustering results of the object in different time windows are different, screening the initial clustering results to obtain a target clustering result of the object;
outputting a target clustering result of each object;
for each object, when a plurality of initial clustering results of the object in different time windows are different, screening the initial clustering results to obtain a target clustering result of the object, including:
judging whether a plurality of initial clustering results of the object in different time windows are the same or not;
if not, judging whether the overlapping degrees of the different time windows reach preset values or not;
if the preset value is reached, screening a plurality of initial clustering results of the object to obtain a target clustering result of the object;
screening a plurality of initial clustering results of the object to obtain a target clustering result of the object, wherein the method comprises the following steps:
according to the formula
Figure FDA0003918481520000011
Calculating balance parameters of the object in each time window, wherein R is the balance parameter, distance _ sum is the minimum value of the Euclidean distance square sum of the objects in each time window, and c is the total amount of the spatiotemporal information of the objects in each time window;
and selecting the clustering result corresponding to the time window with the minimum balance parameter as the target clustering result of the object.
2. The method of claim 1, wherein for each time window, obtaining an initial clustering result of each object according to the spatio-temporal information in the time window comprises:
selecting a first number of objects from the spatiotemporal information within the time window, and selecting a second number of spatiotemporal information from each selected object as a clustering center;
calculating Euclidean distances between other spatiotemporal information in the time window and the clustering center;
for each object, calculating the Euclidean distance square sum of each piece of spatiotemporal information of the object to each cluster center of the time window;
obtaining a clustering result of each object in the current iteration operation according to the minimum value of the Euclidean distance square sums;
judging whether an iteration condition is met;
if not, returning to the step of selecting a first number of objects from the spatio-temporal information in the time window and selecting a second number of spatio-temporal information from each selected object as a clustering center;
if so, ending the iteration and obtaining the initial clustering result of each object.
3. The method of claim 2, wherein calculating Euclidean distances between other spatiotemporal information within the time window and the cluster center comprises:
calculating Euclidean distances between other spatiotemporal information in the time window and the cluster center according to the following formula:
Figure FDA0003918481520000021
wherein distance is the Euclidean distance, t1 is the time of the first spatiotemporal information, t2 is the time of the second spatiotemporal information, tR1 is the adjusting parameter of t1, tR2 is the adjusting parameter of t2, px1 is the abscissa of the first spatiotemporal information, px2 is the abscissa of the second spatiotemporal information, pxR1 is the adjusting parameter of px1, pxR2 is the adjusting parameter of px2, py1 is the ordinate of the first spatiotemporal information, py2 is the ordinate of the second spatiotemporal information, pyR1 is the adjusting parameter of py1, and pyR2 is the adjusting parameter of py2.
4. The method of claim 1, wherein after outputting the target clustering results for the respective objects, the method further comprises:
calculating affinity scores among the objects according to the target clustering results of the objects;
constructing a spatiotemporal information matrix of an object with an affinity score, and calculating the matrix similarity of the spatiotemporal information matrix;
and correcting the intimacy fraction according to the matrix similarity, and outputting the corrected intimacy fraction.
5. The method of claim 1, wherein the obtaining raw data and pre-processing the raw data to obtain processed spatio-temporal data comprises:
and performing data cleaning and data denormalization processing on the original data, wherein the data cleaning comprises at least one operation of data filtering, format conversion, normalization and data sampling, and the denormalization processing comprises attribute expansion of the data and/or custom rule addition of the data.
6. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating over the bus when the electronic device is operating, the machine-readable instructions when executed by the processor performing the method of any of claims 1-5.
7. A storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, performs the method according to any one of claims 1-5.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110788970.2A CN113626670B (en) 2021-07-13 2021-07-13 Object clustering method and device based on time-space relationship and electronic equipment

Publications (2)

Publication Number Publication Date
CN113626670A CN113626670A (en) 2021-11-09
CN113626670B true CN113626670B (en) 2023-01-24

Family

ID=78379641

Country Status (1)

Country Link
CN (1) CN113626670B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109065168A (en) * 2018-08-29 2018-12-21 昆明理工大学 A method of disease risks assessment is carried out based on space-time class statistic
CN110750730A (en) * 2019-09-10 2020-02-04 合肥工业大学 Group detection method and system based on space-time constraint
CN112148942A (en) * 2019-06-27 2020-12-29 北京达佳互联信息技术有限公司 Business index data classification method and device based on data clustering

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190163806A1 (en) * 2017-11-28 2019-05-30 Agt International Gmbh Method of correlating time-series data with event data and system thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant