CN116415160A - Data processing method, device, equipment and medium - Google Patents

Data processing method, device, equipment and medium

Info

Publication number
CN116415160A
Authority
CN
China
Prior art keywords
terminal
time points
track data
time point
positions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111619456.2A
Other languages
Chinese (zh)
Inventor
林玥
张淯易
高雪松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Hisense Electronic Technology Services Co ltd
Original Assignee
Qingdao Hisense Electronic Technology Services Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Hisense Electronic Technology Services Co ltd filed Critical Qingdao Hisense Electronic Technology Services Co ltd
Priority to CN202111619456.2A priority Critical patent/CN116415160A/en
Publication of CN116415160A publication Critical patent/CN116415160A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The application provides a data processing method, device, equipment and medium that can improve clustering flexibility. The data processing method comprises: receiving track data reported by a plurality of terminals, wherein the track data reported by a first terminal comprises a plurality of time points and the positions corresponding to those time points, and the first terminal is any one of the plurality of terminals; determining important positions of the first terminal, wherein an important position is a position that occurs more frequently in the track data reported by the first terminal than in the track data reported by a second terminal, and the second terminal is any of the plurality of terminals other than the first terminal; determining a target number for a first time point according to the important positions of the plurality of terminals and the time points corresponding to those important positions, wherein the target number is the total number of important positions corresponding to the first time point; and adjusting the cluster number of the first time point based on the ratio of the target number of the first time point to a first number, wherein the first number is the sum of the target numbers of the plurality of time points.

Description

Data processing method, device, equipment and medium
Technical Field
The present disclosure relates to the field of information processing, and in particular, to a data processing method, apparatus, device, and medium.
Background
With the development of wireless communication and positioning technologies, location-based services (LBS) have emerged. An LBS uses positioning technology to determine the spatial location of a user, who can then obtain location-related resources and information through wireless communication. LBS greatly facilitates people's lives. Typically, the location-related resources and information are obtained based on the user's trajectory data, which a computing node device provides to the LBS service provider. However, the user's trajectory data may expose private information such as the user's interests, social relationships, physical health, home address and workplace. It is therefore necessary to process the user's trajectory data to prevent the user's private information from being leaked.
To avoid the leakage of user privacy that would result from directly providing each user's trajectory data to the LBS service provider, the computing node device may receive the position data reported by a plurality of user devices and cluster all the position data at each time point to obtain trajectory data. The clustered trajectory data may represent the locations where most users are gathered at each time point. The number of clusters used by the computing node device has an important influence on the availability and effectiveness of the clustering result. This number generally depends on manual configuration, so clustering flexibility is poor, and the availability and effectiveness of the trajectory data provided to the LBS service provider may also be poor.
Disclosure of Invention
The application provides a data processing method, a device, equipment and a medium, which can improve the clustering flexibility and the availability and the effectiveness of a clustering result.
In a first aspect, an embodiment of the present application provides a data processing method, which may be applied to a computing node device, where the method includes:
receiving track data reported by a plurality of terminals, wherein the track data reported by a first terminal comprises a plurality of time points and positions corresponding to the time points, and the first terminal is any one of the plurality of terminals;
determining an important position of the first terminal, wherein the important position represents a position where the occurrence frequency of track data reported by the first terminal is greater than that of track data reported by a second terminal, and the second terminal is any terminal except the first terminal of the plurality of terminals;
determining a target number of first time points according to the important positions of the plurality of terminals and the time points corresponding to the important positions, wherein the target number represents the total number of the important positions corresponding to the first time points;
and adjusting the cluster number of the first time point based on the ratio of the target number of the first time point to a first number, wherein the first number is the sum of the target numbers of the time points.
In a possible implementation manner, the data processing method provided in the embodiment of the application may further include:
clustering positions corresponding to the first time point based on the adjusted cluster number of the first time point;
and sending target data to a server, wherein the target data comprises the plurality of time points and clustering results of the time points.
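The clustering step above can be illustrated with a minimal Python sketch. The application does not name a clustering algorithm, so the use of plain k-means (Lloyd's algorithm) here is an assumption, as are the function name `kmeans`, the 2-D coordinate representation of positions, and the deterministic initialization from the first k distinct points.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iterations: int = 20) -> List[Point]:
    """Plain Lloyd's k-means on 2-D points; returns the final centroids.

    ASSUMPTION: k-means is used only for illustration. Initial centroids are
    the first k distinct points, so the result is deterministic.
    """
    centroids = list(dict.fromkeys(points))[:k]
    for _ in range(iterations):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        groups: List[List[Point]] = [[] for _ in centroids]
        for x, y in points:
            nearest = min(
                range(len(centroids)),
                key=lambda c: (x - centroids[c][0]) ** 2 + (y - centroids[c][1]) ** 2,
            )
            groups[nearest].append((x, y))
        # Move each centroid to the mean of its group; keep it if the group is empty.
        centroids = [
            (sum(p[0] for p in g) / len(g), sum(p[1] for p in g) / len(g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


# Positions reported by four terminals at one time point (hypothetical values),
# clustered with an adjusted cluster number of 2.
positions_at_t = [(0.0, 0.0), (10.0, 10.0), (0.0, 2.0), (10.0, 12.0)]
centroids = kmeans(positions_at_t, k=2)
```

Under these assumptions, the target data sent to the server would pair each time point with the centroids computed for it, rather than with the individual positions of the terminals.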
In a possible implementation manner, in the data processing method provided in the embodiment of the present application, the determining the important location of the first terminal includes:
determining an important evaluation parameter of a first position based on the occurrence frequency of the first position in the track data of the first terminal, the total number of the track data reported by the plurality of terminals and the total number of the track data containing the first position, wherein the first position is any position in the track data reported by the first terminal;
selecting a second number of second positions from all positions of the track data of the first terminal, wherein an important evaluation parameter of any one of the second positions is larger than that of other positions, and the other positions are any one position except the second number of second positions in all positions of the first terminal;
Wherein the second location is an important location of the first terminal.
In a possible implementation manner, in the data processing method provided in the embodiment of the present application, the first location is any one location in the track data reported by the first terminal; or the first position is a position with the frequency of occurrence in the track data of the first terminal being greater than or equal to a preset frequency threshold.
In a possible implementation manner, in the data processing method provided in the embodiment of the present application, the adjusting the number of clusters at the first time point based on a ratio of the target number at the first time point to the first number includes:
if the ratio of the target number of the first time points to the first number is zero, taking the preset maximum cluster number as the cluster number of the first time points;
and if the ratio of the target number of the first time points to the first number is not zero, the cluster number of the first time points is reduced on the basis of the maximum cluster number.
In a possible implementation manner, in the data processing method provided in the embodiment of the present application, the reducing the number of clusters at the first time point based on the maximum number of clusters includes:
determining the adjusted cluster number of the first time point by using a first formula, wherein the first formula is as follows:
[Formula image BDA0003437400220000031 is not reproduced in this text.]
wherein i denotes the first time point, clus(i) is the cluster number of the first time point, m is the preset maximum cluster number, num(i) is the target number of the first time point, and num_t is the first number.
In a possible implementation manner, in the data processing method provided in the embodiment of the present application, the reducing the number of clusters at the first time point based on the maximum number of clusters includes:
determining the adjusted cluster number of the first time point based on a second formula, wherein the second formula is as follows:
[Formula image BDA0003437400220000032 is not reproduced in this text.]
wherein i denotes the first time point, clus(i) is the cluster number of the first time point, m is the preset maximum cluster number, num(i) is the target number of the first time point, num_t is the first number, and Lap is Laplacian noise.
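The two formulas are available only as images and are not reproduced in this text, so the following Python sketch does not claim to implement them. It only illustrates the behaviour the text describes: a zero ratio keeps the preset maximum m, a non-zero ratio reduces the cluster number from m, and Laplacian noise may optionally be added. The linear scaling m * (1 - num(i) / num_t), the floor, the clamping to the range 1 to m, and all names (`adjust_cluster_count`, `noise_scale`, `rng`) are assumptions.

```python
import math
import random
from typing import Optional


def adjust_cluster_count(
    num_i: int,
    num_t: int,
    m: int,
    noise_scale: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> int:
    """Return a cluster number between 1 and m for one time point.

    ASSUMED rule (not the source formulas): scale m by (1 - num_i / num_t),
    optionally add Laplace(0, noise_scale) noise, take the floor, then clamp.
    """
    if num_t == 0 or num_i == 0:
        return m  # ratio is zero (or undefined): use the preset maximum
    value = m * (1.0 - num_i / num_t)
    if noise_scale is not None:
        rng = rng or random.Random()
        # The difference of two i.i.d. exponential variables is Laplace-distributed.
        value += rng.expovariate(1.0 / noise_scale) - rng.expovariate(1.0 / noise_scale)
    return max(1, min(m, math.floor(value)))


no_important = adjust_cluster_count(num_i=0, num_t=10, m=8)
half_important = adjust_cluster_count(num_i=5, num_t=10, m=8)
all_important = adjust_cluster_count(num_i=10, num_t=10, m=8)
```

With this assumed rule, a time point that accounts for a larger share of all important positions receives fewer clusters, which matches the direction of adjustment described in the text.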
In a second aspect, embodiments of the present application provide a data processing apparatus, which may include a communication module and a processing module;
the communication module is used for: receiving track data reported by a plurality of terminals, wherein the track data reported by a first terminal comprises a plurality of time points and positions corresponding to the time points, and the first terminal is any one of the plurality of terminals;
The processing module is used for:
determining an important position of the first terminal, wherein the important position represents a position where the occurrence frequency of track data reported by the first terminal is greater than that of track data reported by a second terminal, and the second terminal is any terminal except the first terminal of the plurality of terminals;
determining a target number of first time points according to the important positions of the plurality of terminals and the time points corresponding to the important positions, wherein the target number represents the total number of the important positions corresponding to the first time points;
and adjusting the cluster number of the first time point based on the ratio of the target number of the first time point to a first number, wherein the first number is the sum of the target numbers of the time points.
In a third aspect, an embodiment of the present invention provides a computing node device, comprising a processor and a memory, wherein the memory stores program code that, when executed by the processor, causes the processor to perform the steps of any of the methods of the first aspect described above.
In a fourth aspect, an embodiment of the present invention provides a computer readable storage medium comprising program code for causing an electronic device to perform the steps of any one of the methods of the first aspect, when said program code is run on the electronic device.
In the data processing method provided by the embodiments of the present application, an important position of the first terminal is a position that occurs with high frequency in the track data of the first terminal and with low frequency in the track data of other terminals. An important position of the first terminal is therefore, with high probability, a privacy position of the user to which the first terminal belongs. The computing node device may determine the total number of important positions corresponding to any time point and record that total as the target number of the time point. The larger the target number of a time point, the more likely the positions at that time point are privacy positions of the users to which the terminals belong. The smaller the target number, the less likely the positions at that time point are privacy positions of those users. The computing node device adjusts the cluster number of each time point according to its target number, so that time points with a large target number and time points with a small target number receive different cluster numbers. For time points with a large target number, the computing node device can reduce the cluster number, which lowers the risk of exposing users' privacy positions while preserving the availability of the clustering result. For time points with a small target number, the cluster number is larger, which still avoids exposing users' privacy positions, preserves the availability of the clustering result, and improves its effectiveness.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic illustration of an application scenario shown according to an exemplary embodiment;
FIG. 2 is a schematic flow chart diagram illustrating a data processing method according to an exemplary embodiment;
FIG. 3 is a schematic diagram of a clustering process shown according to an exemplary embodiment;
FIG. 4 is a schematic flow chart diagram illustrating another data processing method according to an exemplary embodiment;
FIG. 5 is a schematic block diagram of a data processing apparatus according to an exemplary embodiment;
FIG. 6 is a schematic diagram of a computing node device, according to an example embodiment.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail below with reference to the accompanying drawings, wherein it is apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Some words appearing hereinafter are explained:
1. In the embodiments of the application, the term "and/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may represent: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
2. The terms first, second and the like in the description and in the claims of the present application and in the above-described figures, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that embodiments of the present application described herein may be implemented in sequences other than those illustrated or otherwise described herein.
The application scenario described in the embodiments of the present application is for more clearly describing the technical solution of the embodiments of the present application, and does not constitute a limitation on the technical solution provided in the embodiments of the present application, and as a person of ordinary skill in the art can know that, with the appearance of a new application scenario, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems. In the description of the present application, unless otherwise indicated, the meaning of "a plurality" is two or more.
With the development of Internet of Everything technology, the number of intelligent terminals connected to the network keeps increasing. Each networked intelligent terminal generates more and more data, so the data generated in the physical world has reached a massive scale. If all the data generated by networked intelligent terminals were transmitted to the cloud (a cloud platform or cloud center) for computation and analysis, the energy consumption and network delay would be huge. Edge computing is an emerging computing architecture that fuses core capabilities such as networking, computing, storage and applications and supports the rapid development of the Internet of Everything. Edge computing generally refers to migrating some cloud application services to the edge of the network to which the intelligent terminal belongs and providing edge intelligent services to the intelligent terminal at the network edge, so that the intelligent services need not be executed in the cloud.
As can be seen, edge computing can reduce network latency and power consumption. For LBS service providers, combining edge computing with LBS can improve the real-time performance, accuracy, robustness and security of location services (e.g., indoor positioning) and reduce data transmission and computing resource overhead.
Edge computing is typically performed by a computing node device. As shown in FIG. 1, each computing node device may receive track data reported by a plurality of terminals. After clustering the track data reported by the terminals, each computing node device sends the clustering result to a cloud server of the LBS service provider. An existing computing node device clusters the time points reported by the intelligent terminals and the positions corresponding to those time points, and sends the clustering result to the LBS service provider. When an existing computing node device clusters data, the cluster number is usually preset manually based on experience.
However, the positions of the terminals differ from one time point to another. Using the same cluster number for every time point reduces the accuracy of the clustering result, impairs its availability and effectiveness, and affects the LBS service provider's use of the result.
Therefore, the present application provides a data processing method, device and equipment that can improve clustering flexibility and the availability and effectiveness of the clustering result.
The data processing method provided by the application may be executed by a data processing apparatus or a computing node device, and the following description will take the computing node device to execute the data processing method as an example. The computing node device in the embodiments of the present application may include, but is not limited to, an electronic device having a computing capability and a storage capability, such as a gateway device, a switch, a base station, and the like, which is not excessively limited in the embodiments of the present application.
In this embodiment, referring again to FIG. 1, in an LBS-oriented scenario a terminal may collect location data, perform simple anonymization on the data, and send the data to a nearby computing node device. The computing node device can serve as an edge computing node and receive data reported by a plurality of terminals for aggregation. To hide the private information of the terminals, the computing node device typically performs generalization processing, such as data clustering, on the data reported by the plurality of terminals, and then sends the clustering result to the cloud server. The cloud server may be an electronic device having computing and processing capabilities. Through statistical analysis, data mining, machine learning and similar techniques, the cloud server can provide the terminal with real-time, secure and accurate indoor positioning services. In the embodiments of the application, three types of entities, namely at least one terminal, at least one computing node device and at least one cloud server, can form a terminal-edge-cloud distributed system.
Fig. 2 shows a schematic flow chart of a data processing method according to an exemplary embodiment. The data processing method provided by the embodiment of the application can be executed by any computing node device. The data processing method provided by the embodiment of the application may include, but is not limited to, the following steps:
in step S201, the computing node device receives track data reported by a plurality of terminals, where the track data reported by a first terminal includes a plurality of time points and positions corresponding to the time points, and the first terminal is any one of the plurality of terminals.
In the embodiments of the application, the computing node device may receive data from a plurality of terminals. In an LBS scenario, a first terminal may report track data to the computing node device. The first terminal may refer to any one terminal, or to each terminal, of the plurality of terminals. For convenience of description, the function and working principle of one of the terminals are described here; the other terminals work in the same way, and the description is not repeated. The terminal described by way of example is called the first terminal; the name is used only for distinction and does not impose any special limitation on the terminal.
The track data reported by the first terminal comprises a plurality of time points and the positions corresponding to those time points. Based on the track data, the position corresponding to any time point of the first terminal can be determined, and the time point corresponding to any position of the first terminal can also be determined. In some scenarios, the position corresponding to a time point represents the location of the first terminal as acquired by the first terminal at that time point. In one possible design, the track data uploaded by the first terminal may already have been anonymized by the first terminal. The embodiments of the application do not specifically limit how the first terminal anonymizes data.
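As a rough illustration of the track data just described, the following Python sketch models one terminal's report as an ordered list of (time point, position) pairs and shows the two lookups mentioned in the text. The names `TrackData`, `position_at` and `time_points_of`, and the use of string labels for positions, are assumptions made for illustration; the application does not prescribe a data format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TrackData:
    """Track data reported by one terminal (hypothetical layout)."""

    terminal_id: int
    # Each sample is (time_point, position); positions are opaque labels here.
    samples: List[Tuple[int, str]] = field(default_factory=list)

    def position_at(self, time_point: int) -> Optional[str]:
        """Return the position recorded at the given time point, if any."""
        for t, pos in self.samples:
            if t == time_point:
                return pos
        return None

    def time_points_of(self, position: str) -> List[int]:
        """Return every time point at which the given position was recorded."""
        return [t for t, pos in self.samples if pos == position]


track = TrackData(terminal_id=1, samples=[(1, "home"), (2, "office"), (3, "home")])
office_time = track.time_points_of("office")
home_times = track.time_points_of("home")
```

A position can map to several time points, which is why the reverse lookup returns a list; this repetition is what the frequency-based importance evaluation in the following steps relies on.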
Step S202, a computing node device determines an important position of the first terminal, wherein the important position represents a position where occurrence frequency of track data reported by the first terminal is greater than that of track data reported by a second terminal, and the second terminal is any terminal except the first terminal.
In the embodiments of the application, the track data reported by the first terminal may consist of a series of positions ordered by time point. Typically, different positions differ in importance to the first terminal. For example, some positions occur infrequently in the track data, perhaps only once, such as a highway service area. Other positions occur more frequently in the track data; these are likely to be points of interest of the user to which the first terminal belongs, or the user's home, workplace and the like. It can be seen that the computing node device can determine how important a position is to the user to which the first terminal belongs from the frequency with which the position occurs.
However, for public places such as supermarkets and parks, the frequency of occurrence is high in the track data reported by the first terminal and is also high in the track data reported by other terminals, so such places cannot represent the private information of the user of the first terminal.
Based on this, the computing node device in the embodiment of the present application determines, as the important position (or privacy position) of the first terminal, a position satisfying the following condition:
the frequency of the position in the track data reported by the first terminal is greater than the frequency of the position in the track data reported by any other terminal. It can be seen that the important location of the first terminal may characterize the location that is frequently visited by the user to which the first terminal belongs, and is a location that is rarely visited by the users to which the other terminals belong.
Assume that the set formed by the track data reported by the plurality of terminals and received by the computing node device is denoted D, and that the track data reported by terminal k is denoted L(k), where k is the identifier of the terminal, k ranges from 1 to N, and N is the total number of track data reported by the plurality of terminals. The track data L(k) includes the position r(i) at each time point, where i ranges from 1 to g, g is the total number of time points in the track data, and r(i) is the position corresponding to time point i.
In a possible implementation manner, the computing node device may determine the important evaluation parameter s_r (j) of the first location r (j) based on the frequency q (j) of occurrence of the first location r (j) in the track data L (k) of the first terminal, the total number N of track data reported by the plurality of terminals, and the total number p_r (j) of track data including the first location r (j), where the first location r (j) is any one location in the track data L (k) reported by the first terminal, that is, j may take 1 to g.
The computing node device may determine the important evaluation parameter s_r (j) for the first location r (j) using the following formula:
[Formula image BDA0003437400220000091 is not reproduced in this text.]
where q(j) may be determined from the number of times the first position r(j) appears in the track data L(k) of the first terminal and the number of all positions included in the track data L(k) of the first terminal, that is, g. A larger q(j) indicates that the first position r(j) occurs more frequently in the track data L(k) of the first terminal.
The total number P_r(j) of track data containing the first position r(j) reflects how many of the track data reported by the plurality of terminals include the first position r(j). The larger P_r(j) is, the smaller the P_r(j)-dependent term of the formula is; a smaller term reflects that more track data include the first position, that is, the first position appears more frequently for other terminals. The smaller P_r(j) is, the larger that term is; a larger term reflects that fewer track data include the first position, that is, the first position appears less frequently for other terminals.
It should be noted that the computing node device may also use variants of the above formula to determine the important evaluation parameter of the first position. The log(x) function in the above formula may be replaced by another function f(x) that preserves the following property: the larger P_r(j) is, the smaller the resulting term is; and the smaller P_r(j) is, the larger the resulting term is. The embodiments of the present application are not limited in this respect.
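Because the formula image is not reproduced in this text, the following Python sketch uses an assumed TF-IDF-style form, s_r(j) = q(j) * log(N / P_r(j)), chosen only because it is consistent with the quantities and the monotonic behaviour described above. It is not taken from the source. The sketch further assumes one track per terminal, q(j) computed as occurrence count divided by g, and the hypothetical names `importance_scores` and `tracks`.

```python
import math
from collections import Counter
from typing import Dict, List


def importance_scores(tracks: Dict[int, List[str]], terminal_id: int) -> Dict[str, float]:
    """Score each position of one terminal with an ASSUMED TF-IDF-style rule.

    q(j)   = occurrences of the position in the terminal's track / track length g
    P_r(j) = number of tracks (one per terminal here) containing the position
    score  = q(j) * log(N / P_r(j))   # assumed form, not taken from the source
    """
    track = tracks[terminal_id]
    n_tracks = len(tracks)  # N
    g = len(track)
    scores: Dict[str, float] = {}
    for position, count in Counter(track).items():
        q = count / g
        # The terminal's own track contains the position, so p is at least 1.
        p = sum(1 for other in tracks.values() if position in other)
        scores[position] = q * math.log(n_tracks / p)
    return scores


# Hypothetical tracks: "park" is visited by everyone, homes and offices are not.
tracks = {
    1: ["home_1", "park", "home_1", "office_1"],
    2: ["home_2", "park", "home_2", "park"],
    3: ["home_3", "park", "office_3", "office_3"],
}
scores = importance_scores(tracks, terminal_id=1)
```

In this example the public position "park" scores zero for terminal 1 because every track contains it, while "home_1" scores highest because it is frequent for terminal 1 and absent from the other tracks.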
In a possible implementation manner, the computing node device may determine whether the frequency q (j) of the occurrence of the first location r (j) in the track data L (k) of the first terminal is greater than a preset frequency threshold before determining the important evaluation parameter s_r (j) of the first location based on the frequency q (j) of the occurrence of the first location r (j) in the track data L (k) of the first terminal, the total number N of track data reported by the plurality of terminals, and the total number p_r (j) of track data including the first location r (j).
The computing node device may select a preset number W1 of second positions from all positions in the track data of the first terminal, wherein the important evaluation parameter of any one of the W1 second positions is greater than the important evaluation parameter of any other position, the other positions being all positions of the first terminal other than the W1 second positions.
For example, the computing node device may determine the important evaluation parameter s_r(j) for each position in the track data of the first terminal, where j ranges from 1 to g. The computing node device may sort the important evaluation parameters of all positions in the track data of the first terminal in descending order and select the first W1 positions. The positions selected by the computing node device are the important positions of the first terminal.
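The selection just described amounts to a top-W1 query over the important evaluation parameters. A minimal Python sketch, assuming the scores are already available as a dictionary (the names `select_important_positions` and `example_scores` and the score values are hypothetical):

```python
import heapq
from typing import Dict, List


def select_important_positions(scores: Dict[str, float], w1: int) -> List[str]:
    """Return the w1 positions with the largest scores, highest first."""
    return heapq.nlargest(w1, scores, key=scores.get)


# Hypothetical important evaluation parameters for one terminal's positions.
example_scores = {"home": 0.55, "office": 0.27, "park": 0.0, "gym": 0.10}
important = select_important_positions(example_scores, w1=2)
```

`heapq.nlargest` avoids sorting the full list when W1 is small relative to the number of positions; sorting in descending order and slicing the first W1 entries gives the same result.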
In a possible implementation, the preset number W1 is preconfigured in the computing node device or is pre-stored in the computing node device. The computing node device may determine the corresponding preset number based on the actual application scenario.
In step S202, the computing node device may determine the important positions of every terminal in this way, thereby obtaining the important positions of the plurality of terminals.
In step S203, the computing node device determines the target number of a first time point according to the important positions of the plurality of terminals and the time points corresponding to those important positions, where the target number characterizes the total number of important positions corresponding to the first time point.
In this embodiment of the present application, suppose that the position of terminal 1 at time point a is an important position of terminal 1, and the position of terminal 2 at time point a is also an important position of terminal 2. The number of important positions corresponding to time point a is then 2, and this number can be recorded as the target number of time point a. The computing node device can thus determine a target number for each time point i: the total number of important positions, among the important positions of the plurality of terminals, that correspond to time point i.
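The counting of target numbers can be illustrated as follows. The sketch assumes that every terminal reports one position per time point, so all tracks have the same length g, and it indexes time points from 0 rather than from 1. The function name `target_numbers` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set


def target_numbers(tracks: Dict[str, List[str]],
                   important: Dict[str, Set[str]]) -> List[int]:
    """Return num(i) for every time point i (0-based).

    num(i) counts the terminals whose position at time point i is one of
    that terminal's own important positions.
    """
    g = len(next(iter(tracks.values())))  # assumed identical for all terminals
    counts: Counter = Counter()
    for terminal, positions in tracks.items():
        for i, position in enumerate(positions):
            if position in important.get(terminal, set()):
                counts[i] += 1
    return [counts[i] for i in range(g)]


# Hypothetical tracks of three terminals over three time points.
tracks = {
    "t1": ["home", "road", "office"],
    "t2": ["flat", "road", "gym"],
    "t3": ["cafe", "road", "cafe"],
}
important = {"t1": {"home", "office"}, "t2": {"flat"}, "t3": set()}
num = target_numbers(tracks, important)
print(num)
```

In this sample, two terminals are at an important position at the first time point, none at the second, and one at the third.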
In step S204, the computing node device adjusts the cluster number of the first time point based on the ratio of the target number of the first time point to a first number, where the first number is the sum of the target numbers of the plurality of time points.
The first time point in the embodiment of the present application may be denoted as time point i, where i ranges over 1 to g. The target number of the first time point i is denoted num(i). The sum of the target numbers of the plurality of time points is denoted as the first number num_t, where

$$\mathrm{num\_t} = \sum_{i=1}^{g} \mathrm{num}(i).$$

The ratio of the target number num(i) of time point i to the first number num_t is then H(i), where

$$H(i) = \frac{\mathrm{num}(i)}{\mathrm{num\_t}}.$$
The larger H(i) is, that is, the larger the share of the first number that is accounted for by the important positions corresponding to time point i, the more of the positions that the terminals occupy at time point i are important positions. Such positions are more likely to be privacy positions of the users to whom the terminals belong. To protect these privacy positions, when clustering the track data reported by the plurality of terminals, the computing node device can raise the degree of generalization of the positions corresponding to time point i by reducing the cluster number corresponding to time point i.
The smaller H(i) is, that is, the smaller that share, the fewer of the positions that the terminals occupy at time point i are important positions. Such positions are less likely to be privacy positions of the users to whom the terminals belong. When clustering the track data reported by the plurality of terminals, the computing node device can lower the degree of generalization of the positions corresponding to time point i by increasing the cluster number corresponding to time point i.
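A small worked example may make the ratio concrete. The sketch below computes H(i) as the target number of each time point divided by the sum of all target numbers. The function name `ratios` and the sample target numbers are hypothetical, and returning 0 for every time point when the sum is zero is an assumption, since the application does not discuss that case.

```python
from typing import List


def ratios(num: List[int]) -> List[float]:
    """Return H(i) = num(i) / num_t for every time point i."""
    num_t = sum(num)  # the "first number"
    if num_t == 0:
        # Assumption: with no important positions at all, every ratio is 0.
        return [0.0] * len(num)
    return [n / num_t for n in num]


# Hypothetical target numbers for four time points.
h = ratios([2, 0, 1, 1])
print(h)
```

Here the first time point accounts for half of all important positions, so its positions would be generalized the most.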
In a possible implementation manner, if the ratio of the target number num (i) of the time points i to the first number num_t is zero, the computing node device may use the preset maximum cluster number m as the cluster number clus (i) of the time points i.
For example, if the target number num (i) of the time point i is zero, the ratio H (i) of the target number to the first number num_t of the time point i is zero. In this case, the computing node device may take the preset maximum cluster number m as the cluster number clus (i) of the time point i.
If the ratio H (i) of the target number num (i) to the first number num_t of the time point i is not zero, the cluster number clus (i) of the time point i is reduced on the basis of the maximum cluster number m. For example, if the target number num (i) of the time point i is not zero, the ratio H (i) of the target number num (i) to the first number num_t of the time point i is not zero. In this case, the computing node device may decrease the cluster number clus (i) of the time point i on the basis of the preset maximum cluster number m.
As can be seen from the above description, the larger the target number of the time point i, the smaller the cluster number of the time point i. From this, it can be seen that the target number of the time points i and the cluster number of the time points in the embodiment of the present application have a negative correlation.
In a possible implementation, the computing node device may determine the cluster number clus(i) of the time point i using the formula given in Figure BDA0003437400220000121 and Figure BDA0003437400220000122, where clus(i) is the cluster number of the time point i, m is the preset maximum cluster number, num(i) is the target number of the time point i, and num_t is the first number.
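Because the formula itself is only available as an image, the following Python sketch does not reproduce it. It uses a hypothetical stand-in, clus(i) = m - ceil((m - 1) * num(i) / num_t), chosen only because it satisfies the properties stated in the text: the result is m when the ratio is zero, it is strictly smaller than m when the ratio is non-zero (for m greater than 1), it never drops below 1, and it does not increase as num(i) grows. The function name `cluster_number` is also an assumption.

```python
def cluster_number(num_i: int, num_t: int, m: int) -> int:
    """Hypothetical stand-in for the cluster-number formula of the application."""
    if m < 1:
        raise ValueError("the maximum cluster number m must be at least 1")
    if num_t == 0 or num_i == 0:
        # Ratio H(i) is zero: use the preset maximum cluster number.
        return m
    # Integer ceiling of (m - 1) * num_i / num_t, avoiding float rounding.
    reduction = ((m - 1) * num_i + num_t - 1) // num_t
    return m - reduction


# Hypothetical target numbers for four time points, with m = 5.
num = [2, 0, 1, 1]
clus = [cluster_number(n, sum(num), m=5) for n in num]
print(clus)
```

Any other monotonically non-increasing mapping from the ratio to the range 1..m would illustrate the same negative correlation.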
In a possible implementation, the computing node device determines the cluster number clus(i) of the time point i based on the Laplace mechanism of differential privacy. For any function f: D → R^d, the Laplace mechanism achieves differential privacy protection by adding noise drawn from a Laplace distribution to the output of f. If an algorithm M satisfies ε-differential privacy through this mechanism, its output is M(D) = f(D) + Lap(Δf/ε), where Lap(Δf/ε) is the Laplace-distributed noise added to the output. The magnitude of the noise is determined by the global sensitivity Δf and the privacy budget ε. This way of achieving differential privacy by adding noise is referred to as the Laplace mechanism.
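The Laplace mechanism described above can be sketched with the Python standard library alone. The sketch samples Laplace noise as the scaled difference of two independent unit-rate exponential variables, which is one standard way to obtain a Laplace(0, b) sample. The function names, the explicit random generator argument and the example values are assumptions made for illustration; they are not taken from the application.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from the Laplace distribution with mean 0 and the given scale."""
    # The difference of two independent Exp(1) variables follows Laplace(0, 1).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def laplace_mechanism(true_value: float, sensitivity: float,
                      epsilon: float, rng: random.Random) -> float:
    """Return f(D) + Lap(sensitivity / epsilon) for a scalar query result f(D)."""
    if sensitivity < 0 or epsilon <= 0:
        raise ValueError("sensitivity must be >= 0 and epsilon must be > 0")
    return true_value + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # fixed seed only to make the illustration repeatable
noisy = laplace_mechanism(true_value=3.0, sensitivity=1.0, epsilon=0.5, rng=rng)
print(noisy)
```

A smaller privacy budget ε gives a larger noise scale Δf/ε and therefore stronger perturbation of the output.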
Based on the above description, the computing node device may determine the cluster number clus(i) of the time point i using the formula given in Figure BDA0003437400220000123 and Figure BDA0003437400220000124, where clus(i) is the cluster number of the time point i, m is the preset maximum cluster number, num(i) is the target number of the time point i, num_t is the first number, and Lap is Laplacian noise.
It should be noted that, in the embodiment of the present application, the negative correlation between the target number num(i) of the time point i and the cluster number clus(i) includes, but is not limited to, the numerical relationship between num(i) and clus(i) defined by the above formulas. It may also be expressed by a variant of the above formulas or by a similar numerical relationship, and the embodiment of the present application places no particular limitation on this.
In step S205, the computing node device clusters the positions corresponding to the first time point based on the adjusted cluster number of the first time point.
The computing node device may cluster the positions at time point i in the track data of the plurality of terminals based on the cluster number of time point i, where i ranges over 1 to g. Assume that the number n of the plurality of terminals is 6, and the total number g of time points in the track data reported by each terminal is 4. As shown in fig. 3, the number of clusters at time point 1 is 2, the number of clusters at time point 2 is 3, the number of clusters at time point 3 is 1, and the number of clusters at time point 4 is 3. When the computing node device clusters the positions of all terminals at any time point, the original point closest to the center of a cluster can be used as the representative of the positions in that cluster. The clustering results at time point 1 are Loc1-1 and Loc1-2. The clustering results at time point 2 are Loc2-1 and Loc2-2. The clustering result at time point 3 is Loc3-1. The clustering results at time point 4 are Loc4-1, Loc4-2 and Loc4-3.
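The clustering of one time point, with the original point nearest to each cluster center chosen as representative, can be sketched as follows. The application does not name a clustering algorithm, so plain k-means on planar coordinates is used here purely as an assumption, with the first k points as a deterministic starting guess. The function name `cluster_representatives` and the sample coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def cluster_representatives(points: List[Point], k: int,
                            iterations: int = 20) -> List[Point]:
    """Cluster points with k-means and return one original point per cluster."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centers = list(points[:k])  # assumption: deterministic initial centers
    groups: List[List[Point]] = []
    for _ in range(iterations):
        # Assign every point to its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Move each center to the centroid of its group (keep it if the group is empty).
        new_centers = [
            (sum(x for x, _ in grp) / len(grp), sum(y for _, y in grp) / len(grp))
            if grp else centers[c]
            for c, grp in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    # Representative of a cluster: the original point closest to its centroid.
    return [min(grp, key=lambda p: math.dist(p, centers[c]))
            for c, grp in enumerate(groups) if grp]


# Hypothetical positions of six terminals at one time point, cluster number 2.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
reps = cluster_representatives(points, k=2)
print(reps)
```

Using an original point rather than the computed centroid means that every published position is a position that some terminal actually reported.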
In step S206, the computing node device sends target track data to the server, where the target track data includes the multiple time points and clustering results of the time points.
The computing node device sends target data to a server, such as a server of an LBS service provider, where the target data may include the plurality of time points and the clustering result of each time point. In one possible design, the computing node device may also form target track data based on the clustering results of the plurality of time points, as shown in fig. 3, where each piece of target track data may include each time point and one clustering result corresponding to that time point. For example, the target track data line_1 may include the clustering result Loc1-1 corresponding to time point 1, the clustering result Loc2-2 corresponding to time point 2, the clustering result Loc3-1 corresponding to time point 3, and the clustering result Loc4-1 corresponding to time point 4. It can be seen that the computing node device may generate a plurality of pieces of target track data based on the time points and the clustering results corresponding to the time points.
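One way to assemble target track data from the per-time-point clustering results is sketched below. The application does not state how clustering results are combined into tracks, so the rule used here is an assumption: each terminal's position at each time point is replaced by the label of the cluster it fell into, and identical generalized tracks are merged. The function name `build_target_tracks`, the terminal identifiers and the cluster assignments are hypothetical.

```python
from typing import Dict, List, Tuple


def build_target_tracks(labels_per_time: List[Dict[str, str]]) -> List[Tuple[str, ...]]:
    """Return the distinct generalized tracks, one cluster label per time point.

    labels_per_time[i] maps each terminal to the label of its cluster at time point i.
    """
    terminals = sorted(labels_per_time[0])
    tracks = {tuple(labels[t] for labels in labels_per_time) for t in terminals}
    return sorted(tracks)


# Hypothetical cluster assignments of three terminals over four time points.
labels_per_time = [
    {"t1": "Loc1-1", "t2": "Loc1-1", "t3": "Loc1-2"},
    {"t1": "Loc2-2", "t2": "Loc2-2", "t3": "Loc2-1"},
    {"t1": "Loc3-1", "t2": "Loc3-1", "t3": "Loc3-1"},
    {"t1": "Loc4-1", "t2": "Loc4-1", "t3": "Loc4-3"},
]
target_tracks = build_target_tracks(labels_per_time)
print(target_tracks)
```

In this sample, terminals t1 and t2 collapse into the same generalized track, so only two target tracks would be sent to the server.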
In the data processing method provided by the embodiment of the application, the computing node device can adjust the cluster number of each time point according to the target number of that time point. The larger the target number of a time point, the more likely it is that the positions of the terminals at that time point are privacy positions; the cluster number of the time point is therefore reduced, which prevents privacy positions from being exposed while keeping the clustering result usable. The smaller the target number of a time point, the less likely it is that the positions of the terminals at that time point are privacy positions; the cluster number of the time point is therefore increased, which improves the usefulness of the clustering result.
FIG. 4 illustrates a data processing method according to an exemplary embodiment, which may include the steps of:
Step S401, receiving track data reported by a plurality of terminals, where the track data reported by a first terminal includes a plurality of time points and positions corresponding to the time points, and the first terminal is any one of the plurality of terminals.
Step S402, determining the occurrence times of a first position in the track data of the first terminal, wherein the first position is any position in the track data of the first terminal.
In step S403, the computing node device determines whether the number of times corresponding to the first position is greater than or equal to a preset number-of-times threshold; if so, step S404 is executed next, and if not, the important evaluation parameter of the first position is not calculated in step S404.
In this embodiment of the present application, if the number of times corresponding to the first location is smaller than the preset number of times threshold, the computing node device may record the important evaluation parameter of the first location as 0.
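The threshold check of steps S402 and S403 can be sketched as follows: count how often each position occurs in a terminal's track, pass the positions that reach the threshold on for scoring, and record 0 for the rest. The function name `split_by_count_threshold`, the position labels and the threshold value are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def split_by_count_threshold(track: List[str],
                             threshold: int) -> Tuple[List[str], Dict[str, float]]:
    """Split the positions of one track by their number of occurrences.

    Returns the positions whose count is >= threshold (to be scored in step S404)
    and a mapping that records 0 for every position below the threshold.
    """
    counts = Counter(track)
    to_score = sorted(p for p, c in counts.items() if c >= threshold)
    zero_scored = {p: 0.0 for p, c in counts.items() if c < threshold}
    return to_score, zero_scored


# Hypothetical track of one terminal.
track = ["home", "road", "office", "lane", "home", "office", "home"]
to_score, zero_scored = split_by_count_threshold(track, threshold=2)
print(to_score)
print(zero_scored)
```

Positions visited only once are thus excluded before any important evaluation parameter is computed.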
Step S404, determining an important evaluation parameter of the first position.
The computing node device may determine an important evaluation parameter s_r (j) of the first location r (j) based on a frequency q (j) of occurrence of the first location r (j) in the track data L (k) of the first terminal, a total number N of track data reported by the plurality of terminals, and a total number p_r (j) of track data including the first location r (j), where the first location r (j) is any one location in the track data L (k) reported by the first terminal, i.e., j may take 1 to g.
The computing node device may determine the important evaluation parameter s_r (j) for the first location r (j) using the following formula:
Figure BDA0003437400220000151
where r(j) denotes the first position, and q(j) is the frequency with which the first position r(j) occurs in the track data L(k) of the first terminal; its value may be the ratio of the number of occurrences of r(j) in L(k) to the total number of positions contained in L(k). A larger q(j) indicates that the first position r(j) occurs more frequently in the track data L(k) of the first terminal.
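The formula is only available as an image, so the Python sketch below is an assumption rather than a reproduction. Based on the quantities named in the text (q(j), N, p_r(j)) and the reference to a log(x) function, it assumes a TF-IDF style score, s_r(j) = q(j) * log(N / p_r(j)), which grows with the frequency of a position in one terminal's track and shrinks as more tracks contain that position. The function name `importance_scores` and the sample tracks are hypothetical.

```python
import math
from typing import Dict, List


def importance_scores(all_tracks: List[List[str]], k: int) -> Dict[str, float]:
    """Assumed TF-IDF style score for every position in track k.

    q: share of track k occupied by the position.
    p: number of tracks (out of N) that contain the position.
    """
    track = all_tracks[k]
    n = len(all_tracks)
    scores: Dict[str, float] = {}
    for position in set(track):
        q = track.count(position) / len(track)
        p = sum(1 for other in all_tracks if position in other)  # >= 1, track k counts
        scores[position] = q * math.log(n / p)
    return scores


# Hypothetical tracks reported by three terminals.
all_tracks = [
    ["home", "road", "home", "office"],
    ["road", "gym", "road", "gym"],
    ["cafe", "road", "cafe", "park"],
]
scores = importance_scores(all_tracks, k=0)
print(scores)
```

In this sample, "road" appears in every track and therefore scores 0, while "home", which is frequent for the first terminal and absent elsewhere, scores highest.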
Step S405, selecting a preset number of important positions from the track data of the first terminal, where the important evaluation parameter of each important position is greater than the important evaluation parameters of the unselected positions.
In the embodiment of the application, the computing node device may rank all positions for which an important evaluation parameter has been determined, in descending order of the important evaluation parameter, and select the first preset number of positions as the important positions.
Step S406, determining the target quantity of each time point according to the important positions of the terminals and the time points corresponding to the important positions, wherein the target quantity represents the total quantity of the important positions corresponding to the time points.
Step S407, determining the number of clusters corresponding to each time point according to the target number of each time point.
The computing node device may determine the cluster number clus(i) of the time point i using the formula given in Figure BDA0003437400220000152, where clus(i) is the cluster number of the time point i, m is the preset maximum cluster number, num(i) is the target number of the time point i, num_t is the sum of the target numbers of the time points, that is,

$$\mathrm{num\_t} = \sum_{i=1}^{g} \mathrm{num}(i),$$

and Lap is Laplacian noise.
Step S408, clustering the positions of all the terminals at the time points according to the number of the clustering clusters at the time points.
For each time point, when the computing node device clusters the positions of the time points of all the terminals according to the number of clusters, the original point closest to the center of gravity of the cluster can be selected as a representative of the position in the cluster.
Step S409, generating target track data based on the clustering result of each time point, and sending the target track data to the server.
On the basis of the foregoing embodiments, fig. 5 is a schematic structural diagram of a data processing apparatus according to some embodiments of the present application, where the apparatus includes:
the communication module 501 is configured to receive track data reported by a plurality of terminals, where the track data reported by a first terminal includes a plurality of time points and positions corresponding to the time points, and the first terminal is any one of the plurality of terminals.
And the processing module 502 is configured to determine an important position of the first terminal, where the important position represents a position where occurrence frequency of track data reported by the first terminal is greater than occurrence frequency of track data reported by a second terminal, and the second terminal is any terminal of the plurality of terminals except the first terminal.
The processing module 502 is further configured to determine, according to the important positions of the plurality of terminals and the time points corresponding to the important positions, a target number of first time points, where the target number characterizes a total number of important positions corresponding to the first time points.
The processing module 502 is further configured to adjust a cluster number of the first time point based on a ratio of a target number of the first time point to a first number, where the first number is a sum of target numbers of the plurality of time points.
Further, the processing module 502 is further configured to cluster the positions corresponding to the first time point based on the adjusted cluster number of the first time point.
The processing module 502 is further configured to send target data to a server, where the target data includes the multiple time points and clustering results of the time points.
Further, when the processing module 502 determines the important location of the first terminal, it is specifically configured to:
determining an important evaluation parameter of a first position based on the occurrence frequency of the first position in the track data of the first terminal, the total number of the track data reported by the plurality of terminals and the total number of the track data containing the first position, wherein the first position is any position in the track data reported by the first terminal;
and selecting a second number of second positions from all positions of the track data of the first terminal, wherein an important evaluation parameter of any one of the second number of second positions is larger than that of other positions, and the other positions are any one position except the second number of second positions in all positions of the first terminal; wherein the second location is an important location of the first terminal.
Further, the first position is any position in the track data uploaded by the first terminal; or the first position is a position with the frequency of occurrence in the track data of the first terminal being greater than or equal to a preset frequency threshold.
Further, the processing module 502 is specifically configured to, when adjusting the number of clusters at the first time point based on a ratio of the target number to the first number at the first time point:
if the ratio of the target number of the first time points to the first number is zero, taking the preset maximum cluster number as the cluster number of the first time points;
and if the ratio of the target number of the first time points to the first number is not zero, the cluster number of the first time points is reduced on the basis of the maximum cluster number.
Further, when the processing module 502 adjusts the cluster number of the first time point based on the maximum cluster number, the processing module is specifically configured to: determining the cluster number of the first time point after the adjustment by adopting a first formula, wherein the first formula is as follows:
Figure BDA0003437400220000171
wherein i represents the first time point, clus(i) is the cluster number of the first time point, m is the preset maximum cluster number, num(i) is the target number of the first time point, and num_t is the first number.
Or, when the processing module 502 adjusts the cluster number of the first time point based on the maximum cluster number, the processing module is specifically configured to: determining the cluster number of the first time point after the adjustment based on a second formula, wherein the second formula is as follows:
Figure BDA0003437400220000172
wherein i represents the first time point, clus(i) is the cluster number of the first time point, m is the preset maximum cluster number, num(i) is the target number of the first time point, num_t is the first number, and Lap is Laplacian noise.
Fig. 6 is a schematic structural diagram of a computing node device according to some embodiments of the present application, where, based on the foregoing embodiments, the present application further provides a computing node device, including a processor 601, a communication interface 602, a memory 603, and a communication bus 604, where the processor 601, the communication interface 602, and the memory 603 complete communication with each other through the communication bus 604;
the memory 603 has stored therein a computer program which, when executed by the processor 601, causes the processor 601 to perform the steps of:
receiving track data reported by a plurality of terminals, wherein the track data reported by a first terminal comprises a plurality of time points and positions corresponding to the time points, and the first terminal is any one of the plurality of terminals;
determining an important position of the first terminal, wherein the important position represents a position where the occurrence frequency of track data reported by the first terminal is greater than that of track data reported by a second terminal, and the second terminal is any terminal except the first terminal of the plurality of terminals;
Determining a target number of first time points according to the important positions of the plurality of terminals and the time points corresponding to the important positions, wherein the target number represents the total number of the important positions corresponding to the first time points;
and adjusting the cluster number of the first time point based on the ratio of the target number of the first time point to a first number, wherein the first number is the sum of the target numbers of the time points.
The communication bus of the computing node device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is represented in the figure by a single bold line, but this does not mean that there is only one bus or only one type of bus.
The communication interface 602 is used for communication between the computing node devices described above and other devices.
The Memory may include random access Memory (Random Access Memory, RAM) or may include Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a central processing unit, a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc.
For the concepts, explanations, detailed descriptions and other steps of the technical solutions provided in the present application that relate to the computing node device, refer to the descriptions of these contents in the foregoing methods or other embodiments; they are not repeated herein.
On the basis of the above embodiments, the present application also provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
receiving track data reported by a plurality of terminals, wherein the track data reported by a first terminal comprises a plurality of time points and positions corresponding to the time points, and the first terminal is any one of the plurality of terminals;
determining an important position of the first terminal, wherein the important position represents a position where the occurrence frequency of track data reported by the first terminal is greater than that of track data reported by a second terminal, and the second terminal is any terminal except the first terminal of the plurality of terminals;
Determining a target number of first time points according to the important positions of the plurality of terminals and the time points corresponding to the important positions, wherein the target number represents the total number of the important positions corresponding to the first time points;
and adjusting the cluster number of the first time point based on the ratio of the target number of the first time point to a first number, wherein the first number is the sum of the target numbers of the time points.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (10)

1. A data processing method, applied to a computing node device, the method comprising:
receiving track data reported by a plurality of terminals, wherein the track data reported by a first terminal comprises a plurality of time points and positions corresponding to the time points, and the first terminal is any one of the plurality of terminals;
determining an important position of the first terminal, wherein the important position represents a position where the occurrence frequency of track data reported by the first terminal is greater than that of track data reported by a second terminal, and the second terminal is any terminal except the first terminal of the plurality of terminals;
determining a target number of first time points according to the important positions of the plurality of terminals and the time points corresponding to the important positions, wherein the target number represents the total number of the important positions corresponding to the first time points;
and adjusting the cluster number of the first time point based on the ratio of the target number of the first time point to a first number, wherein the first number is the sum of the target numbers of the time points.
2. The method of claim 1, wherein the method further comprises:
Clustering positions corresponding to the first time point based on the adjusted cluster number of the first time point;
and sending target data to a server, wherein the target data comprises the plurality of time points and clustering results of the time points.
3. The method of claim 1, wherein the determining the location of importance of the first terminal comprises:
determining an important evaluation parameter of a first position based on the occurrence frequency of the first position in the track data of the first terminal, the total number of the track data reported by the plurality of terminals and the total number of the track data containing the first position, wherein the first position is any position in the track data reported by the first terminal;
selecting a second number of second positions from all positions of the track data of the first terminal, wherein an important evaluation parameter of any one of the second positions is larger than that of other positions, and the other positions are any one position except the second number of second positions in all positions of the first terminal;
Wherein the second location is an important location of the first terminal.
4. The method of claim 3, wherein the first location is any one location in the track data uploaded by the first terminal; or the first position is a position with the frequency of occurrence in the track data of the first terminal being greater than or equal to a preset frequency threshold.
5. The method of claim 1, wherein the adjusting the number of clusters at the first time point based on the ratio of the target number to the first number at the first time point comprises:
if the ratio of the target number of the first time points to the first number is zero, taking the preset maximum cluster number as the cluster number of the first time points;
and if the ratio of the target number of the first time points to the first number is not zero, the cluster number of the first time points is reduced on the basis of the maximum cluster number.
6. The method of claim 5, wherein the reducing the number of clusters at the first point in time based on the maximum number of clusters comprises:
determining the cluster number of the first time point after the adjustment by adopting a first formula, wherein the first formula is as follows:
Figure FDA0003437400210000021
wherein i represents the first time point, clus(i) is the cluster number of the first time point, m is the preset maximum cluster number, num(i) is the target number of the first time point, and num_t is the first number.
7. The method of claim 5, wherein the reducing the number of clusters at the first point in time based on the maximum number of clusters comprises:
determining the cluster number of the first time point after the adjustment based on a second formula, wherein the second formula is as follows:
Figure FDA0003437400210000022
wherein i represents the first time point, clus(i) is the cluster number of the first time point, m is the preset maximum cluster number, num(i) is the target number of the first time point, num_t is the first number, and Lap is Laplacian noise.
8. A data processing apparatus, the apparatus comprising a communication module and a processing module;
the communication module is configured to: receive track data reported by a plurality of terminals, wherein the track data reported by a first terminal comprises a plurality of time points and positions corresponding to the time points, and the first terminal is any one of the plurality of terminals;
the processing module is configured to:
determine an important position of the first terminal, wherein the important position is a position whose occurrence frequency in the track data reported by the first terminal is greater than its occurrence frequency in the track data reported by a second terminal, and the second terminal is any of the plurality of terminals other than the first terminal;
determine a target number of a first time point according to the important positions of the plurality of terminals and the time points corresponding to the important positions, wherein the target number represents the total number of important positions corresponding to the first time point;
and adjust the cluster number of the first time point based on the ratio of the target number of the first time point to a first number, wherein the first number is the sum of the target numbers of the plurality of time points.
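The processing steps of claim 8 up to the ratio can be sketched as follows. This is an illustrative reading, not the patented implementation: it assumes track data is a mapping from a terminal id to (time point, position) pairs, and it interprets "greater than that of a second terminal" as greater than the count in every other terminal. The function names and the sample data are hypothetical.

```python
from collections import Counter

def important_positions(tracks):
    """tracks: {terminal_id: [(time_point, position), ...]} (assumed layout)."""
    counts = {t: Counter(p for _, p in tr) for t, tr in tracks.items()}
    return {
        t: {
            p for p, c in own.items()
            # Assumed reading: strictly more frequent than in every other terminal.
            if all(c > other[p] for u, other in counts.items() if u != t)
        }
        for t, own in counts.items()
    }

def target_numbers(tracks, important):
    """Count, per time point, the records that fall on an important position."""
    num = Counter()
    for t, tr in tracks.items():
        for time_point, p in tr:
            if p in important[t]:
                num[time_point] += 1
    return num

tracks = {
    "A": [(8, "home_a"), (9, "mall"), (20, "home_a")],
    "B": [(8, "home_b"), (9, "mall"), (20, "home_b")],
}
important = important_positions(tracks)
num = target_numbers(tracks, important)       # num(i) for each time point i
num_t = sum(num.values())                     # the first number
ratios = {i: (num[i] / num_t if num_t else 0.0) for i in (8, 9, 20)}
```

In this sample, "mall" is visited equally often by both terminals and is therefore not important to either, so time point 9 has a ratio of zero and would keep the maximum cluster number.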
9. A computing node device, comprising a processor and a memory, wherein the memory stores program code that, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1 to 7.
10. A computer readable and writable storage medium on which computer instructions are stored, wherein the computer instructions, when executed by a processor, implement the steps of the method of any one of claims 1 to 7.
CN202111619456.2A 2021-12-27 2021-12-27 Data processing method, device, equipment and medium Pending CN116415160A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111619456.2A CN116415160A (en) 2021-12-27 2021-12-27 Data processing method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN116415160A true CN116415160A (en) 2023-07-11

Family

ID=87051368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111619456.2A Pending CN116415160A (en) 2021-12-27 2021-12-27 Data processing method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116415160A (en)

Similar Documents

Publication Publication Date Title
CN110209820B (en) User identification detection method, device and storage medium
CN111477341A (en) Epidemic situation monitoring method and device, electronic equipment and storage medium
CN110955903B (en) Privacy resource authority control method, device and equipment based on intelligent graph calculation
CN111294730B (en) Method and device for processing network problem complaint information
CN111309614A (en) A/B test method and device and electronic equipment
CN108702334B (en) Method and system for distributed testing of network configuration for zero tariffs
AU2023202016A1 (en) Method for identifying a device using attributes and location signatures from the device
CN108182282A (en) Address authenticity verification methods, device and electronic equipment
CN111800807A (en) Method and device for alarming number of base station users
CN113449986A (en) Service distribution method, device, server and storage medium
CN113961780A (en) Resident cell acquisition method and device, electronic equipment and storage medium
CN111179136A (en) Dynamic control method and device and electronic equipment
CN112286930A (en) Method, device, storage medium and electronic equipment for resource sharing of redis business side
CN110020166B (en) Data analysis method and related equipment
CN116415160A (en) Data processing method, device, equipment and medium
CN109146122A (en) A kind of probability forecasting method, device, electronic equipment and computer storage medium
CN113076451B (en) Abnormal behavior identification and risk model library establishment method and device and electronic equipment
WO2019162954A1 (en) Method and determining unit for identifying optimal location(s)
US11284293B2 (en) Location-based telecommunication prioritization
CN109597743A (en) Page circle choosing method, click volume statistical method and relevant device
CN112118592A (en) Region generation method and device, electronic equipment and storage medium
CN113657635A (en) Method for predicting communication user loss and electronic equipment
CN112508466A (en) Position identification method and device, computer readable storage medium and electronic equipment
CN116963274B (en) Bluetooth AOA (automated optical inspection) based indoor positioning method and system
CN117056663B (en) Data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication