WO2018161729A1 - User trajectory recovery method and apparatus - Google Patents
- Publication number
- WO2018161729A1 (PCT/CN2018/073856)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- base station
- user
- point data
- trajectory
- track
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/02—Services making use of location information
- H04W4/029—Location-based management or tracking services
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W4/00—Services specially adapted for wireless communication networks; Facilities therefor
- H04W4/02—Services making use of location information
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W64/00—Locating users or terminals or network equipment for network management purposes, e.g. mobility management
Definitions
- the present invention relates to the field of communications, and in particular, to a user trajectory recovery method and apparatus.
- the operator obtains a large amount of user communication data every day, such as A-interface (A port) signaling data, call detail record (CDR) data, and measurement report (MR) data, which include connection information between users and their serving base stations. Since the location of each base station is known, the operator in the prior art can use the location of the base station as the approximate location of the user at the time a communication behavior occurs, according to the user communication data, and form discrete track points, as shown in FIG. 1. If all the track points of a user over a certain period of time are collected, a continuous user track can be formed. These user tracks can provide operators with additional data value. For example, the operator can analyze the user's place of residence and place of work according to the user track, and thereby provide services and marketing activities to the user.
- CDR: call detail records
- MR: measurement reports
- the prior art provides a scheme for user trajectory recovery, which recovers the user track by periodically extracting user track points. Although this can reduce the noise in the user track points to some extent, the denoising accuracy is low. Therefore, how to improve the denoising accuracy during user track recovery is an urgent problem to be solved.
- the application provides a user track recovery method and device to at least improve the denoising accuracy of the user track recovery.
- a user trajectory recovery method comprising: acquiring an original track point data sequence of a user whose track is to be recovered, wherein each original track point data in the original track point data sequence includes the identifier of the base station corresponding to the original track point and an acquisition time point; determining, based on a mapping model and the original track point data sequence and with the overall mapping cost minimized, the identifier of the base station where the user is located at each regular time point, to obtain a denoised track point data sequence of the user, wherein each denoised track point data in the denoised track point data sequence includes the identifier of the base station corresponding to the denoised track point data and a regular time point; a regular time point is a time point at a fixed time interval; the mapping conditions of the mapping model include: one original track point data is mapped to the identifier of one base station at one regular time point, and multiple original track point data are mapped to the identifier of at most one base station at the same regular time point; recovering the trajectory of the user to be restored according to the
- the frequently switched base station of the user can be mapped to a unique base station at a regular time point. Therefore, based on the scheme, the situation in which the user switches back and forth between different base stations is removed, so that the data quality and accuracy after denoising during the track recovery can be greatly improved.
- recovering the track of the user according to the denoised track point data sequence comprises: determining a base-station-granularity denoised track point data sequence of the user based on the denoised track point data sequence and a pre-trained user-base station model, wherein the parameters of the user-base station model include the transition probabilities between N base stations, where N is the number of distinct base station identifiers included in the original track point data sequence; and recovering the track of the user according to the base-station-granularity denoised track point data sequence.
- the missing data in the denoising track point data sequence can be padded to form a continuous user track, so that each regular time point contains the user's track point, which further improves the accuracy of the user track recovery.
- recovering the track of the user according to the denoised track point data sequence comprises: determining a base-station-granularity denoised track point data sequence of the user based on the denoised track point data sequence and a pre-trained user-base station model, wherein the parameters of the user-base station model include the transition probabilities between N base stations, where N is the number of distinct base station identifiers included in the original track point data sequence; determining a geographic-grid-granularity denoised track point data sequence of the user based on the base-station-granularity denoised track point data sequence and a pre-trained base station-geographic grid model, wherein the parameters of the base station-geographic grid model include the transition probabilities between M geographic grids and the output probability of each geographic grid to the N base stations, where M is a positive integer; and recovering the track of the user according to the geographic-grid-granularity denoised track point data sequence.
- the missing data in the denoised track point data sequence can be filled in to form a continuous user track, so that every regular time point contains a track point of the user; in addition, the precision of the recovered user track can be controlled through the precision of the geographic grid, which further improves the accuracy of user track recovery.
- each original track point data in the original track point data sequence further includes the longitude and latitude of the base station corresponding to the original track point, so that the geographic-grid-granularity denoised track point data sequence can be obtained according to the longitude and latitude of the base station, and the track of the user can then be recovered according to the geographic-grid-granularity denoised track point data sequence.
- the latitude and longitude of the base station corresponding to the base station identifier may be obtained according to the identifier of the base station in the original trajectory point data, which is not specifically limited in this application.
- determining a denoised track point data sequence of the base station granularity of the user based on the denoised track point data sequence and the pre-trained user-base station model including: according to the denoised track point data sequence
- determining the geographic-grid-granularity denoised track point data sequence of the user based on the base-station-granularity track point data sequence and the pre-trained base station-geographic grid model includes: according to the regular time point and the base station identifier included in each base-station-granularity denoised track point data in the base-station-granularity denoised track point data sequence, the transition probabilities between the M geographic grids, and the output probability of each geographic grid to the N base stations
- the method further includes: acquiring a plurality of first training data for training the user-base station model, wherein each of the first training data includes the identifier of a base station and an acquisition time point; determining, according to the first training data, the number of transitions from any one of the N base stations to any one of the N base stations; and determining the transition probabilities between the N base stations from these transition counts according to a third preset formula, where the third preset formula includes: $P(n_1, n_2) = \dfrac{C(n_1, n_2)}{\sum_{n=1}^{N} C(n_1, n)}$, where $C(n_1, n_2)$ represents the number of handovers from the $n_1$-th base station to the $n_2$-th base station, $\sum_{n=1}^{N} C(n_1, n)$ represents the total number of transitions from the $n_1$-th base station to the N base stations, and $P(n_1, n_2)$ represents the transition probability from the $n_1$-th base station to the $n_2$-th base station.
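The counting step described above (transition count divided by the total number of transitions leaving the same base station) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `transition_probabilities` and the toy training sequence are hypothetical, and the sketch assumes the training data is already sorted by acquisition time and reduced to base station identifiers.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    """Estimate P(n1 -> n2) as count(n1 -> n2) / total transitions leaving n1."""
    # Count consecutive (source, destination) pairs in time order.
    counts = Counter(zip(sequence, sequence[1:]))
    totals = defaultdict(int)
    for (src, _dst), c in counts.items():
        totals[src] += c
    return {(src, dst): c / totals[src] for (src, dst), c in counts.items()}

# Hypothetical training data: base station identifiers sorted by acquisition time.
train = ["A", "A", "B", "A", "C", "C", "A", "B"]
probs = transition_probabilities(train)
print(probs[("A", "B")])  # two of the four transitions leaving A go to B
```

Self-transitions (for example A to A) are counted like any other transition here; whether the source counts them is not stated, so treat that as an assumption.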
- the user-base station model can be trained to obtain the parameters of the user-base
- the method further includes: acquiring a plurality of second training data for training the base station-geographic grid model, wherein each of the second training data includes the identifier of a base station, an acquisition time point, and the longitude and latitude of the user; determining, according to the longitude and latitude of the user in each second training data, the distance between any one of the M geographic grids and any one of the M geographic grids; determining, from these distances in combination with a preset rule, the transition probability between any one of the M geographic grids and any one of the M geographic grids; determining, according to each second training data, the number of outputs from any one of the M geographic grids to any one of the N base stations; and determining, from these output counts and based on a fourth preset formula, the output probability of any one of the M geographic grids to any of
- the user-base station model in this application is a Markov model.
- the base station-geographic grid model in this application is a hidden Markov model.
- a user trajectory recovery device having the function of implementing the above method.
- This function can be implemented in hardware, or by hardware executing the corresponding software.
- the hardware or software includes one or more modules corresponding to the functions described above.
- a third aspect provides a user trajectory recovery apparatus, including a processor and a communication interface; the processor and the communication interface are connected by a bus, and the processor is configured to implement the user track recovery method of the first aspect or any possible implementation thereof.
- the user track recovery device further includes a memory for storing computer program instructions; the processor is connected to the memory through the bus, and the processor executes the program instructions stored in the memory to cause the user trajectory recovery device to perform the user trajectory recovery method described in the first aspect or any possible implementation thereof.
- a computer readable storage medium for storing computer program instructions used by the user trajectory recovery device, which, when executed on a computer, cause the computer to perform the user track recovery method of the first aspect.
- a computer program product comprising instructions which, when run on a computer, cause the computer to perform the user trajectory recovery method of any of the above first aspects.
- FIG. 2 is a schematic structural diagram of a user track recovery apparatus according to an embodiment of the present application.
- FIG. 3 is a schematic diagram of a sequence of original track point data provided by an embodiment of the present application.
- FIG. 4 is a schematic flowchart of a method for restoring a user track according to an embodiment of the present application
- FIG. 5 is a schematic diagram of a two-dimensional matrix provided by an embodiment of the present application.
- FIG. 6 is a schematic diagram showing mapping of original track points in a two-dimensional matrix according to an embodiment of the present application.
- FIG. 7 is a schematic diagram of a distribution of a base station corresponding to a denoising track point according to an embodiment of the present application at a regular time point;
- FIG. 8 is a schematic diagram 1 of a Markov model prediction according to an embodiment of the present application.
- FIG. 9 is a schematic diagram 2 of a Markov model prediction according to an embodiment of the present application.
- FIG. 10 is a schematic diagram 3 of a Markov model prediction according to an embodiment of the present application.
- FIG. 11 is a user track of a base station granularity according to an embodiment of the present application.
- FIG. 12 is a schematic diagram of output distribution of a geographic grid to a base station according to an embodiment of the present application.
- FIG. 13 is a schematic diagram of path connection of a hidden Markov model according to an embodiment of the present application.
- FIG. 14 is a schematic structural diagram of hardware of a user track recovery apparatus according to an embodiment of the present disclosure.
- the A port (A interface) refers to the interface between the base station controller (BSC) and the mobile switching center (MSC).
- the A port signaling data refers to data recorded on the A port, such as terminal power-on, power-off, periodic location update, calling and called events, short message transmission and reception, and cell handover.
- CDR data records the detailed service data of the user when performing voice, short message and Internet access services, including communication number, communication duration and base station connection information.
- MR data records information such as the user identifier reported by the user to the operator when a communication behavior occurs, the recording time of the MR, the identity of the connected base station, and the signal strength of the connected base station.
- the OTT data refers to the data of content service providers (such as WeChat or map applications) obtained from the operator's data pipeline.
- the OTT data may include the user identifier, the recording time, and the user's precise longitude and latitude information.
- the user's precise longitude and latitude information is usually represented by a geographic grid identifier; a geographic grid is a cell of the grid array obtained by dividing the map into a grid, usually 50 m x 50 m.
- the Markov model is mainly determined by the transition probabilities of its states.
- in this application, the state refers to a base station,
- and the transition probability of the state refers to the transition probability between the base stations where the user is located.
- at any moment the user can be located at only one base station, and at the next moment the user transfers to another base station according to the transition probabilities between the current base station and the other base stations.
- the transition probabilities between base stations are learned from the original track point data sequence in the user's MR data, or from the base-station-granularity denoised track point data sequences of recovered tracks.
- the sequence composed of the track point data on the complete path with the highest transition probability, among the complete paths obtained after filling, is determined as the base-station-granularity denoised track point data sequence of the user whose track is being recovered.
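One way to pick the complete path with the highest transition probability is dynamic programming over the missing regular time points. The sketch below is only an illustration under stated assumptions: `fill_gap`, its arguments, and the toy transition table are hypothetical names and values, missing transitions are treated as probability 0, and the source does not say which search procedure it uses.

```python
import math

def fill_gap(trans, states, start, end, missing):
    """Most probable list of `missing` base stations between `start` and `end`.

    trans maps (src, dst) to a transition probability; absent pairs count as 0.
    Returns None when no path with non-zero probability exists.
    """
    def logp(a, b):
        p = trans.get((a, b), 0.0)
        return math.log(p) if p > 0 else -math.inf

    # best[s] = (log-probability, filled stations so far) for partial paths ending in s
    best = {start: (0.0, [])}
    for _ in range(missing):
        nxt = {}
        for s in states:
            score, path = max(
                ((lp + logp(prev, s), filled + [s]) for prev, (lp, filled) in best.items()),
                key=lambda c: c[0],
            )
            if score > -math.inf:
                nxt[s] = (score, path)
        if not nxt:
            return None
        best = nxt
    score, path = max(
        ((lp + logp(prev, end), filled) for prev, (lp, filled) in best.items()),
        key=lambda c: c[0],
    )
    return path if score > -math.inf else None

# Hypothetical transition probabilities between three base stations.
trans = {("A", "A"): 0.6, ("A", "B"): 0.4, ("B", "B"): 0.5,
         ("B", "C"): 0.5, ("C", "C"): 1.0}
filled = fill_gap(trans, ["A", "B", "C"], start="A", end="C", missing=2)
print(filled)  # stations assigned to the two missing regular time points
```

Working in log-probabilities avoids underflow when many regular time points are missing in a row.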
- a hidden Markov model (HMM) is used to describe a Markov process in which the state is invisible but the observed values are visible; it is mainly determined by the transition probabilities of the states and the output probabilities of the states to the observed values.
- in this application, the state refers to a geographic grid,
- and the observed value refers to a base station.
- the transition probability of the state refers to the transition probability between the geographic grids where the user is located;
- the output probability of the state to the observed value refers to the output probability of the geographic grid to the base station.
- the output probability of the geographic grid to the base station refers specifically to the probability that the user, when located in the current geographic grid, is connected to each of the surrounding base stations.
- the parameters of the hidden Markov model are the transition probability between the geographic grid and the geographic grid and the output probability of each geographic grid to the base station.
- at any moment the user can be in only one geographic grid, and each geographic grid can output only one of its connectable base stations; at the next moment, the user transfers to another geographic grid according to the transition probabilities between the current geographic grid and the other geographic grids, and the new geographic grid outputs a base station according to its output probabilities.
- the transition probabilities between the geographic grids where the user is located and the output probabilities of the geographic grids to the base stations are obtained by training on the original track point data sequence in the user's MR data together with the longitude and latitude of the user in the OTT data; alternatively, they are obtained by training on the geographic-grid-granularity denoised track point data sequences of recovered tracks.
- when the denoised track point data sequence is input into the pre-trained hidden Markov model for geographic grid prediction, the geographic grid sequence that the user actually passed through can be inferred from the externally observed base station sequence, thereby obtaining the geographic-grid-granularity denoised track point data sequence.
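Inferring the hidden geographic grid sequence from an observed base station sequence is commonly done with the Viterbi algorithm. The source does not name the decoding algorithm, so the following is a sketch under that assumption; the function name, the uniform prior over grids, and the two-grid toy parameters are all hypothetical.

```python
def viterbi(obs, grids, trans, emit):
    """Most likely geographic grid sequence for an observed base station sequence."""
    # Uniform prior over grids: an assumption, the source does not specify one.
    score = {g: emit[g].get(obs[0], 0.0) / len(grids) for g in grids}
    back = []
    for o in obs[1:]:
        prev_score, score, pointers = score, {}, {}
        for g in grids:
            # Best predecessor grid for reaching g at this time point.
            best_prev = max(grids, key=lambda q: prev_score[q] * trans[q].get(g, 0.0))
            score[g] = prev_score[best_prev] * trans[best_prev].get(g, 0.0) * emit[g].get(o, 0.0)
            pointers[g] = best_prev
        back.append(pointers)
    # Trace back from the best final grid.
    path = [max(grids, key=lambda g: score[g])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical model: two grids, two base stations.
grids = ["g1", "g2"]
trans = {"g1": {"g1": 0.7, "g2": 0.3}, "g2": {"g1": 0.3, "g2": 0.7}}
emit = {"g1": {"A": 0.9, "B": 0.1}, "g2": {"A": 0.2, "B": 0.8}}
path = viterbi(["A", "A", "B", "B"], grids, trans, emit)
print(path)
```

For long sequences the products should be replaced by sums of log-probabilities to avoid underflow; plain products keep this sketch short.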
- each MR data includes the user identifier reported by user A to the operator when a communication behavior occurs, the recording time of the MR, and the identity of the connected base station, as shown in Table 1.
- the embodiment of the present application provides a method and a device for user track recovery, which can improve the denoising accuracy of the user track recovery.
- the specific solution is as follows:
- FIG. 2 is a schematic structural diagram of a user track recovery apparatus according to an embodiment of the present application.
- the user trajectory recovery device 20 includes an acquisition module 201, a denoising module 202, a trajectory recovery module 203, and a storage module 204.
- the acquisition module 201 of the user trajectory recovery device acquires the user's original track point data sequence from the operator side; the original track point data sequence contains noise (frequent switching between base stations) and is discontinuous in time (some time periods contain a lot of data, while other time periods contain none).
- the denoising module 202 is configured to remove the noise in the original track point data sequence to obtain a denoised track point data sequence with regular time point information (e.g., one track point every 10 minutes). However, the denoised track point data sequence is still discontinuous, and the data at many regular time points is missing.
- the track recovery module 203 is configured to fill in the missing data in the denoised track point data sequence to form a continuous user track, so that every regular time point contains a track point of the user; these are called the user's final track points.
- the user-base station model in the trajectory recovery module 203 can recover track points at base station granularity, and the base station-geographic grid model can recover track points at geographic grid granularity; the recovered final track points are stored in the user trajectory database to support subsequent services and marketing activities. Specifically:
- the obtaining module 201 is configured to obtain, from the operator's raw data, the original track point data sequence of the user whose track is to be recovered, where each original track point data in the data sequence includes the identifier of the base station corresponding to the original track point and the acquisition time point.
- the obtaining module 201 may acquire an original track point data sequence of user A according to the plurality of MR data reported by user A, where each original track point data in the data sequence includes the identity of the base station corresponding to the original track point and the acquisition time point.
- each original track point data in the data sequence may further include the longitude and latitude of the base station corresponding to the original track point; alternatively, the obtaining module 201 may obtain the longitude and latitude of the base station according to the base station identifier in the original track point data. As shown in FIG. 3, the original track point data, sorted by time, are:
- the first original track point data $x_1$ = [base station A, 10:03:37, 121.46472, 31.08572];
- the second original track point data $x_2$ = [base station B, 10:14:25, 121.46253, 31.08744];
- the third original track point data $x_3$ = [base station A, 10:19:25, 121.46472, 31.08572];
- the fourth original track point data $x_4$ = [base station C, 10:39:10, 121.46752, 31.08572]; and so on, which are not enumerated here.
- because the original track point data sequence contains noise, it needs to be denoised by the denoising module 202.
- the denoising module 202 is configured to remove the noise in the original track point data sequence by using an optimization technique to obtain a denoised track point data sequence, wherein the original track point data sequence has the lowest space-time cost after denoising.
- the specific denoising scheme will be described in the following embodiments, and details are not described herein again.
- the denoising module 202 can greatly improve the data quality and accuracy after denoising.
- the denoised track point data sequence obtained after denoising by the denoising module 202 has regular time point information (such as one track point every 10 minutes), but it is still discontinuous: the data at many regular time points is missing. Track recovery by the track recovery module 203 is therefore required.
- the trajectory recovery module 203 is configured to restore the user trajectory according to the denoised trajectory point data sequence to obtain a final trajectory point data sequence.
- the trajectory recovery module 203 in the embodiment of the present application may include two trajectory recovery models. One is the user-base station model, whose input is the denoised track point data sequence and whose output is the base-station-granularity denoised track point data sequence. The other is the base station-geographic grid model, whose input is the base-station-granularity denoised track point data sequence and whose output is the geographic-grid-granularity denoised track point data sequence, which can be used as the final track point data sequence.
- the track recovery module 203 can fill in the missing data in the denoised track point data sequence to form a continuous user track, so that every regular time point contains a track point of the user, which greatly improves the accuracy of user track recovery.
- alternatively, the trajectory recovery module 203 provided by the embodiment of the present application may include only the user-base station model described above, in which case the base-station-granularity denoised track point data sequence output by the user-base station model is used as the final track point data sequence.
- the embodiment of the present application does not specifically limit this.
- the user-base station model in the embodiment of the present application may specifically be the Markov model described above.
- the base station-geographic grid model in the embodiment of the present application may specifically be the hidden Markov model described above.
- the user track recovery device 20 provided by the embodiment of the present application further includes a storage module 204.
- the storage module 204 is configured to store the denoised track point data sequence of the base station granularity and the de-noise track point data sequence of the geographic grid granularity in the user track database of the storage module 204.
- the base station granularity denoised track point data sequence can be used to train the user-base station model; the geographic grid granularity denoised track point data sequence can be used to train the base station-geographic grid model.
- the specific model training scheme will be described in the following embodiments, and details are not described herein again.
- the final track point data sequence is also used to support subsequent service and marketing activities.
- a possible implementation method for user track recovery includes the following steps S401-S403:
- S401: the user track recovery device acquires an original track point data sequence of the user whose track is to be recovered.
- the obtaining module 201 in FIG. 2 is used to support the user trajectory recovery device in performing step S401 in the embodiment of the present application.
- for a related description, refer to the description of the obtaining module 201 above; details are not described herein again.
- S402: the user trajectory recovery device determines, according to the mapping model and the original track point data sequence, the identifier of the base station where the user is located at each regular time point, and obtains the user's denoised track point data sequence.
- each denoised track point data in the denoised track point data sequence includes the identifier of the base station corresponding to the denoised track point data and a regular time point.
- a regular time point is a time point at a fixed time interval.
- the mapping model is defined as follows: one original track point data is mapped to the identifier of one base station at one regular time point, and multiple original track point data are mapped to at most one base station identifier at the same regular time point.
- because each denoised track point data consists of two elements, the identifier of the base station corresponding to the denoised track point data and the regular time point, mapping an original track point data to the identifier of a base station at a regular time point means that an original track point data can correspond to only one denoised track point data. This is explained here once and is not repeated below.
- before step S402, the method further includes: the user trajectory recovery device creates a two-dimensional matrix according to the identifiers of the base stations in the original track point data sequence acquired in step S401 and the regular time points.
- the matrix element $y_{n,p}$ in the n-th row and p-th column of the two-dimensional matrix is represented by (the identifier of the n-th base station, the p-th regular time point), where n and p are both positive integers.
- for example, if the number of different base station identifiers in the data sequence is 8 and the regular time points are 9:30:00, 10:00:00, 10:30:00, 11:00:00, 11:30:00, 12:00:00, 12:30:00, 01:00:00, 01:30:00 and 02:00:00, 10 time points in total, the two-dimensional matrix created according to the identifiers of the base stations corresponding to the original track points in the data sequence and the regular time points may be as shown in FIG. 5.
- the matrix element $y_{n,p}$ in the n-th row and p-th column of the two-dimensional matrix is represented by (the n-th base station identifier, the p-th regular time point).
- the two-dimensional matrix is presented as a two-dimensional grid, in which each cell represents one matrix element of the two-dimensional matrix. This is explained here once and is not repeated below.
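The construction of the two-dimensional matrix can be illustrated with a few lines of code. This is a sketch only: `build_matrix` and the toy inputs are hypothetical, rows are ordered by first appearance of each base station identifier (an assumption, the source does not specify a row order), and time points are kept as plain strings for brevity.

```python
def build_matrix(original_station_ids, time_points):
    """Row n is the n-th distinct base station, column p is the p-th regular time point."""
    # dict.fromkeys keeps the first occurrence of each identifier, in order.
    stations = list(dict.fromkeys(original_station_ids))
    return [[(station, tp) for tp in time_points] for station in stations]

# Hypothetical input: base station identifiers of the original track points, in time order.
matrix = build_matrix(["A", "B", "A", "C"], ["10:00:00", "10:30:00"])
# y_{n,p} with 1-based indices as in the text corresponds to matrix[n - 1][p - 1].
y_2_1 = matrix[1][0]
print(y_2_1)  # ('B', '10:00:00')
```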
- the regular time points in the embodiment of the present application are time points at a fixed time interval. They may be preset, or may be determined according to the acquisition time points in the collected original track point data sequence. For example, the preset time period is 9:00:00-14:00:00 and the preset time interval is 30 minutes, and the preset time period is divided according to the preset time interval to obtain the regular time points as shown in FIG.
- the configuration manner of the regular time point is not specifically limited in the embodiment of the present application.
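Dividing a preset time period by a preset interval can be sketched as below. The function name and the arbitrary date are hypothetical. The sketch excludes the start of the period and includes its end, which is an assumption chosen so that the 9:00:00-14:00:00 period with a 30-minute interval yields the 10 time points from 9:30:00 onward listed in the example; the source does not state how the boundaries are handled.

```python
from datetime import datetime, timedelta

def regular_time_points(start, end, interval_minutes):
    """Time points at a fixed interval in (start, end]; excluding `start` is an assumption."""
    step = timedelta(minutes=interval_minutes)
    points, t = [], start + step
    while t <= end:
        points.append(t)
        t += step
    return points

day = datetime(2018, 1, 1)  # arbitrary date, only the time of day matters here
points = regular_time_points(day.replace(hour=9), day.replace(hour=14), 30)
labels = [p.strftime("%H:%M:%S") for p in points]
print(len(labels), labels[0], labels[-1])
```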
- the description above takes as an example the case where the identifier list of all base stations in the original track point data sequence is the vertical axis of the two-dimensional matrix and the regular time point list is the horizontal axis. Alternatively, the identifier list of all base stations in the original track point data sequence may be the horizontal axis of the two-dimensional matrix and the regular time point list the vertical axis; this is not specifically limited in this embodiment of the present application.
- that the user trajectory recovery device determines, based on the mapping model and the original track point data sequence, the identifier of the base station where the user is located at each regular time point and obtains the user's denoised track point data sequence may specifically be:
- the user trajectory recovery device determines, based on the mapping model and the original track point data sequence and with the overall mapping cost minimized, the matrix element corresponding to each original track point data when the original track point data are mapped into the two-dimensional matrix, and thereby obtains the user's denoised track point data sequence.
- the above-mentioned mapping model may be specifically defined as: one original track point data is mapped onto one matrix element, and each column of the two-dimensional matrix has at most one matrix element mapped to the original track point data.
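- $\min_{F}\ \mathrm{WORK}(X, Y, F) = \sum_{t=1}^{T} \sum_{n=1}^{N} \sum_{p=1}^{P} f_{t,n,p}\, d_{t,n,p}$
- N is the number of distinct base station identifiers included in the original track point data sequence; P is the number of regular time points; T is the number of original track point data; $\mathrm{WORK}(X, Y, F)$ is the overall cost paid when $f_{t,n,p}$ is the mapping weight; $\min$ indicates that $\mathrm{WORK}(X, Y, F)$ is minimized; $\sum_{k=1}^{K}$ indicates that the variable k is summed from 1 to K; and $d_{t,n,p}$ represents the cost of mapping $x_t$ to $y_{n,p}$.
- Constraints include: $f_{t,n,p} \in \{0,1\}$, $a_{n,p} \in \{0,1,\ldots,T\}$, $b_{n,p} \in \{0,1\}$; $\sum_{n=1}^{N}\sum_{p=1}^{P} f_{t,n,p} = 1$; $\sum_{t=1}^{T} f_{t,n,p} = a_{n,p}$; $b_{n,p} \le a_{n,p}$; $M \cdot b_{n,p} \ge a_{n,p}$; $\sum_{n=1}^{N} b_{n,p} \le 1$.
- M is a maximum number (a sufficiently large constant).
- $a_{n,p}$ represents the number of original track point data mapped to $y_{n,p}$, and $b_{n,p}$ indicates whether any original track point data is mapped to $y_{n,p}$.
- $f_{t,n,p} \in \{0,1\}$ constrains the weight with which $x_t$ maps to $y_{n,p}$ to be 0 or 1, and $\sum_{n=1}^{N}\sum_{p=1}^{P} f_{t,n,p} = 1$ constrains each original track point data to be mapped onto exactly one matrix element.
- $a_{n,p} \in \{0,1,\ldots,T\}$ constrains the number of original track point data mapped to $y_{n,p}$ to be an integer between 0 and T.
- $\sum_{t=1}^{T} f_{t,n,p} = a_{n,p}$ constrains the number of $x_t$ mapped to $y_{n,p}$ with weight 1 to equal the number of original track point data mapped to $y_{n,p}$; that is, a weight of 1 for mapping $x_t$ to $y_{n,p}$ indicates that $x_t$ is mapped to $y_{n,p}$.
- $b_{n,p} \le a_{n,p}$, $M \cdot b_{n,p} \ge a_{n,p}$ and $\sum_{n=1}^{N} b_{n,p} \le 1$ together constrain each column of the two-dimensional matrix to have at most one matrix element to which original track point data is mapped.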
- The original track point data shown in FIG. 3 is mapped into the two-dimensional matrix shown in FIG. 5, and the result can be as shown in FIG. 6. In the mapping cost,
- $w_s$ is the space cost parameter and $w_t$ is the time cost parameter; they adjust the weights of space and time, respectively,
- and default to 1. The cost combines the time distance and the spatial distance between $x_t$ and $y_{n,p}$.
- The time distance and the spatial distance between $x_t$ and $y_{n,p}$ are calculated as shown in formula (4) and formula (5), respectively:
- The longitude and latitude of the base station required in formula (5) may be part of the original track point data in the original track point data sequence acquired in step S401, or may be determined from the base station identifier in the original track point data acquired in step S401.
- For example, the user track recovery device pre-stores the correspondence between the identifier of a base station and the longitude and latitude of that base station and, after acquiring the identifier, determines the longitude and latitude from that correspondence.
- The manner of obtaining the longitude and latitude of the base station is not specifically limited in the embodiment of the present application.
- Because this calculation method uses the time and space cost as the optimization target, the time and space cost of the original track point data sequence after denoising is minimized.
- The time distance and the spatial distance between $x_t$ and $y_{n,p}$ can also be calculated in other ways; the calculation method is not specifically limited in the embodiment of the present application.
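The mapping model described above can be sketched as a small exhaustive search. This is a minimal illustration under stated assumptions, not the solver used in the application: the cost is assumed to be the weighted sum `w_s * spatial + w_t * temporal`, the spatial distance is assumed to be the haversine distance between base stations, and the names `haversine_m` and `denoise` are hypothetical. The search enumerates, for every regular time point, which single base station (or none) may receive points, so it only suits tiny inputs.

```python
import itertools
import math


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000.0 * math.asin(math.sqrt(h))


def denoise(points, stations, regular_times, w_s=1.0, w_t=1.0):
    """Map each original point onto one (station, regular time) matrix element.

    points: list of (timestamp_seconds, station_id)
    stations: dict station_id -> (lat, lon)
    regular_times: list of regular time points in seconds
    Returns dict regular_time -> station_id for the cheapest mapping found.
    """
    ids = sorted(stations)

    def cost(point, station_id, regular_time):
        t, s = point
        # Assumed cost: weighted sum of spatial distance and time distance.
        return (w_s * haversine_m(stations[s], stations[station_id])
                + w_t * abs(t - regular_time))

    best_total, best_cells = math.inf, []
    # Each column (regular time point) hosts at most one station; None = empty.
    for choice in itertools.product([None] + ids, repeat=len(regular_times)):
        cells = [(n, p) for n, p in zip(choice, regular_times) if n is not None]
        if not cells:
            continue
        # Given the allowed cells, every point independently takes its cheapest one.
        total = sum(min(cost(pt, n, p) for n, p in cells) for pt in points)
        if total < best_total:
            best_total, best_cells = total, cells

    # Keep only the cells that at least one point is actually mapped to.
    used = {min(best_cells, key=lambda c: cost(pt, c[0], c[1])) for pt in points}
    return {p: n for n, p in used}
```

The enumeration grows as (N+1)^P, so a real deployment would need an integer-programming or dynamic-programming formulation; the sketch only makes the "at most one base station per regular time point" constraint concrete.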
- Step S402 above is the denoising scheme provided by the embodiment of the present application.
- The denoising module 202 of FIG. 2 supports the user trajectory recovery apparatus in performing this step. FIG. 3 shows that the collection time points of user A's original track point data are irregular, that is, the interval between the collection time points of two adjacent original track point data is not fixed. As shown in FIG. 6, after denoising with the scheme provided by the embodiment of the present application, the time points of user A's denoised track point data (that is, the matrix elements) are regular, that is, the interval between the time points of two adjacent denoised track point data is fixed. As shown in FIG.
- user A frequently switches between base stations during the time period of the track to be restored.
- After denoising with the method provided by the embodiment of the present application, the base stations between which user A frequently switches
- are mapped to a unique base station at each regular time point.
- Because user A's switching back and forth between different base stations is removed, the denoising scheme provided by the embodiment of the present application can greatly improve the quality and accuracy of the data after denoising.
- Because the embodiment of the present application adopts an optimization technique in denoising, the time and space cost of the original track point data sequence after denoising is minimized.
- The user track recovery device restores the trajectory of the user whose trajectory is to be restored according to the denoised track point data sequence.
- The trajectory may be obtained from the denoised track point data sequence by using an existing track recovery method, or
- may be recovered by the trajectory recovery module 203 shown in FIG. 2; this is not specifically limited in this embodiment of the present application.
- The specific solution for recovering the trajectory by using the trajectory recovery module 203 shown in FIG. 2 is described in the following embodiments and is not detailed here.
- With the user trajectory recovery method, the identifier of the base station where the user is located at each regular time point can be determined based on the mapping model and the original track point data sequence, which yields the user's
- denoised track point data sequence. The regular time points are time points at a fixed time interval, and the constraints of the mapping model include: one original track point data is mapped to the identifier of one base station at one regular time point, and at the same regular time point multiple original track point data are mapped to the identifier of at most one base station. That is, the base stations between which the user frequently switches can be mapped to a unique base station at each regular time point.
- The embodiment of the present application further provides a method for recovering the user trajectory from the denoised track point data, namely using the trajectory recovery module 203 shown in FIG. 2 to restore the trajectory.
- The data missing at regular time points in the user's track is filled in to form a continuous user track, which further improves the accuracy of user trajectory recovery.
- The following takes as an example the case where the user-base station model is a Markov model
- and the base station-geographic grid model is a hidden Markov model.
- The training processes for the Markov model and the hidden Markov model are as follows.
- The transition probabilities between the N base stations can be expressed by an N*N matrix.
- The specific training process can be as follows:
- Formula (6) is $\omega(n_1, n_2) = \alpha(n_1, n_2) / \sum_{n=1}^{N} \alpha(n_1, n)$, where $\alpha(n_1, n_2)$ represents the number of times the training data transfers from the n1-th base station to the n2-th base station, $\sum_{n=1}^{N} \alpha(n_1, n)$ represents the total number of times the training data transfers from the n1-th base station to the N base stations, and $\omega(n_1, n_2)$ represents the transition probability of the training data from the n1-th base station to the n2-th base station.
- For example, with three base stations A, B, and C, the parameters that need to be determined when training the Markov model are the transition probabilities between the three base stations.
- The transition probabilities between the three base stations A, B, and C can be expressed by a 3*3 matrix.
- Suppose transfers between the three base stations A, B, and C occur 600 times in total, as shown in Table 3:
- For each initial state, the counts over the target states are normalized based on formula (6), which completes the training and yields the transition probabilities between base stations shown in Table 4:
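The count-then-normalize training step of formula (6) can be sketched as follows. The function name `train_transition_probabilities` and the toy sequence are hypothetical; the sketch assumes the training data is a single time-ordered list of base station identifiers and that staying on the same base station counts as a transfer to itself.

```python
from collections import Counter, defaultdict


def train_transition_probabilities(station_sequence):
    """Estimate omega(n1, n2) = alpha(n1, n2) / sum_n alpha(n1, n)."""
    counts = defaultdict(Counter)
    # alpha(n1, n2): number of observed transfers from n1 to n2.
    for src, dst in zip(station_sequence, station_sequence[1:]):
        counts[src][dst] += 1
    # Normalize each row so the probabilities out of one station sum to 1.
    return {
        src: {dst: c / sum(row.values()) for dst, c in row.items()}
        for src, row in counts.items()
    }


probs = train_transition_probabilities(["A", "A", "B", "A", "C", "A"])
```

Pairs never seen in the training data are simply absent from the result, which is equivalent to a transition probability of 0 in this sketch.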
- The training data used to train the Markov model may be the original track point data sequences in the user's MR data, or may be base-station-granularity denoised track point data sequences obtained after trajectory recovery, such as
- those obtained by the trajectory recovery module 203 shown in FIG. 2; this is not specifically limited in this embodiment of the present application. Training the Markov model with base-station-granularity denoised track point data sequences makes the training result more accurate.
- The base-station-granularity denoised track point data sequence is stored in the user trajectory database of the storage module 204 shown in FIG. 2 for subsequent updating of the Markov model, which makes the training result, and in turn later trajectory recovery, more accurate.
- HMM Hidden Markov Model
- Each geographic grid has an output probability for each of the N base stations.
- The parameters that need to be determined when training the hidden Markov model are the transition probabilities between the M geographic grids and the output probability of each geographic grid to the N base stations.
- The transition probabilities between the M geographic grids can be expressed by an M*M matrix; the output probabilities of each geographic grid to the N base stations can be expressed by an M*N matrix.
- The specific training process is as follows:
- The distance between any one of the M geographic grids and any one of the M geographic grids may be determined according to the longitude and latitude of the user, as represented by geographic grids, included in the training data.
- The preset rule may be that the transition probability between two geographic grids follows a Gaussian distribution over the distance between them.
- The output probability of any one of the M geographic grids to any one of the N base stations may be calculated based on formula (7): $\omega(m, n_3) = \alpha(m, n_3) / \sum_{n=1}^{N} \alpha(m, n)$.
- $\alpha(m, n_3)$ represents the number of outputs of the m-th geographic grid to the n3-th base station in the training data, and $\sum_{n=1}^{N} \alpha(m, n)$ represents the total number of outputs of the m-th geographic grid to the N base stations.
- $\omega(m, n_3)$ represents the output probability of the m-th geographic grid to the n3-th base station in the training data.
- For example, with three geographic grids I, II, and III and two base stations A and B, the parameters that need to be determined when training the hidden Markov model are the transition probabilities between the three geographic grids and the output probabilities of each geographic grid to the two base stations.
- The transition probabilities between the three geographic grids I, II, and III can be expressed by a 3*3 matrix; the output probabilities of each geographic grid to the two base stations A and B can be expressed by a 3*2 matrix.
- The transition probability between geographic grids can be specified by a rule, for example that the transition probability between two geographic grids
- follows a Gaussian distribution over the distance between them. Suppose the distances between the geographic grids (in meters) are as shown in Table 5:
- The Gaussian distribution is defined by formula (8) and is determined by two parameters (μ, σ):
- Because the transition probabilities between geographic grids in Table 6 are not normalized, for each initial state the values over the target states are normalized, which yields the transition probabilities between geographic grids shown in Table 7.
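The rule-based transition probabilities can be sketched as below: evaluate a Gaussian density at each grid-to-grid distance, then normalize each row. The distances, the values `mu=0.0` and `sigma=100.0`, and the function names are illustrative assumptions; they are not the values behind Tables 5 to 7.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a Gaussian distribution with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def grid_transition_probabilities(distances, mu=0.0, sigma=100.0):
    """distances[i][j] is the distance in metres between grid i and grid j."""
    result = []
    for row in distances:
        # Unnormalized weight for each target grid.
        weights = [gaussian_pdf(d, mu, sigma) for d in row]
        total = sum(weights)
        # Normalize over the target grids for this initial grid.
        result.append([w / total for w in weights])
    return result


# Hypothetical distances between three grids I, II, III.
probs = grid_transition_probabilities([[0, 50, 100], [50, 0, 50], [100, 50, 0]])
```

With `mu = 0`, nearer grids receive higher transition probability, which matches the intuition that a user is more likely to stay put or move to an adjacent grid between two regular time points.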
- For each geographic grid, the output counts over the base stations are normalized based on formula (7), which completes the training and yields the output probabilities of the geographic grids to the base stations shown in Table 9:
- The training data used to train the hidden Markov model may be the original track point data sequences in the user's MR data together with the OTT data corresponding to the MR data, or may be geographic-grid-granularity
- denoised track point data obtained after trajectory recovery, such as the geographic-grid-granularity denoised track point data sequence obtained by the trajectory recovery module 203 shown in FIG. 2; this is not specifically limited in this embodiment of the present application.
- The original track point data in the MR data includes the identifier of the user, the identifier of the base station to which the user is connected when the current communication behavior occurs, and the collection time point of the MR data;
- the OTT data includes the identifier of the user, the collection time point, and the user's longitude and latitude.
- The MR data and the OTT data are associated through the user's identifier and the collection time point.
- The geographic-grid-granularity denoised track point data sequence may be stored in the user trajectory database of the storage module 204 shown in FIG. 2 for subsequent updating of the hidden Markov model, which makes the training result, and in turn subsequent trajectory recovery, more accurate.
- The trained Markov model and hidden Markov model may be stored in the trajectory recovery module 203 shown in FIG. 2. After the denoising module 202 outputs the denoised track point data sequence, the trajectory recovery module 203 can restore the user trajectory based on the trained Markov model and hidden Markov model. Restoring the user trajectory is the process of filling in the data missing at the regular time points to form a continuous user track, as described below.
- The input of the Markov model is the denoised track point data sequence output by the denoising module 202.
- The output of the Markov model is a base-station-granularity denoised track point data sequence.
- The trajectory recovery process of the Markov model includes: acquiring the base station identifiers and regular time points contained in the denoised track point data of the sequence output by the denoising module 202; determining, from those regular time points, the regular time points of the track point data missing from the sequence; according to the parameters of the Markov model, that is, the transition probabilities between the N base stations, combined with formula (9), determining, for each case in which the base station where the user is located at a missing regular time point is any one of the N base stations, the transition probability of the first complete path formed by the base stations corresponding to the identifiers in the missing track point data and
- the base stations corresponding to the identifiers in the denoised track point data; determining the base stations on the first complete path with the highest transition probability as the base stations where the user is located at the different regular time points; and
- determining, from the base stations where the user is located at the different regular time points, the base-station-granularity denoised track point data sequence of the user.
- Transition probability of the first complete path = the product of the transition probabilities between the base stations on the first complete path (9)
- Suppose that the transition probabilities between base stations in the Markov model are as shown in Table 4, and that the distribution over the regular time points of the base stations corresponding to the denoised track points obtained by the denoising module 202
- is as shown in FIG. 7: there is one denoised track point data at time 0,
- whose base station identifier is base station A, and the track point data is missing at time 1 and time 2, at time 3
- Track recovery can then be performed in the following manner to obtain a base-station-granularity denoised track point data sequence:
- From time 1, state transition is performed again from each possible state.
- The result is shown in FIG. 9; for details, refer to the state transition process at time 0, which is not repeated here.
- The base stations on the path with the highest transition probability can be determined as the base stations where user A is located at the different regular time points. For example, Table 10 shows that the transition probability from base station A to base station C is largest, 0.064, when the transfer path is base station A-base station A-base station B-base station C, so it can be determined that at time 1 user A's
- base station is base station A
- and that the base station corresponding to time 2 is base station B
- the user track of the base station granularity after the trajectory is restored is as shown in FIG.
- Because the base-station-granularity denoised track point data sequence recovered by this trajectory recovery method has the highest state transition probability, the accuracy of user trajectory recovery can be greatly improved.
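The gap-filling step can be sketched as an exhaustive search over candidate base stations for the missing regular time points, scoring each complete path by the product in formula (9). The transition matrix below is hypothetical and is not Table 4; `fill_missing_stations` is an illustrative name, and the sketch assumes the first and last time points are observed.

```python
import itertools


def fill_missing_stations(observed, transition):
    """observed: station ids per regular time point, None where data is missing.

    transition[a][b] is the transition probability from station a to station b.
    Returns (most probable complete path, its transition probability).
    """
    stations = sorted(transition)
    missing = [i for i, s in enumerate(observed) if s is None]
    best_prob, best_path = -1.0, None
    # Try every assignment of a base station to each missing time point.
    for candidate in itertools.product(stations, repeat=len(missing)):
        path = list(observed)
        for i, s in zip(missing, candidate):
            path[i] = s
        # Formula (9): product of the transition probabilities along the path.
        prob = 1.0
        for a, b in zip(path, path[1:]):
            prob *= transition[a].get(b, 0.0)
        if prob > best_prob:
            best_prob, best_path = prob, path
    return best_path, best_prob


# Hypothetical transition probabilities between base stations A, B, C.
transition = {
    "A": {"A": 0.4, "B": 0.4, "C": 0.2},
    "B": {"A": 0.2, "B": 0.3, "C": 0.5},
    "C": {"A": 0.4, "B": 0.3, "C": 0.3},
}
path, prob = fill_missing_stations(["A", None, None, "C"], transition)
```

For long gaps the enumeration becomes expensive; a dynamic-programming pass over the time points would give the same maximum with far fewer evaluations.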
- The input of the hidden Markov model is the base-station-granularity denoised track point data sequence output by the Markov model.
- The output of the hidden Markov model is a geographic-grid-granularity denoised track point data sequence.
- The trajectory recovery process of the hidden Markov model includes: obtaining the base station identifier and the regular time point in each denoised track point data of the base-station-granularity sequence output by the Markov model; according to the identifier of each base station, the regular time points, the transition probabilities between the M geographic grids in the hidden Markov model, and the output probability of each geographic grid to the N base stations, combined with formula (10), determining, for each regular time point,
- the transition probability of each second complete path connecting the geographic grids that can output the base station corresponding to that regular time point; determining the geographic grids on the second complete path with the highest transition probability as the geographic grids where the user is located at the different regular time points; and
- determining, from those geographic grids, the geographic-grid-granularity denoised track point data sequence of the user.
- Formula (10) is $P = Y_{1,1} \cdot X_{1,2} \cdot Y_{2,2} \cdots X_{r,r+1} \cdot Y_{r+1,r+1} \cdots$, where $Y_{r+1,r+1}$ represents the output probability of the geographic grid at the (r+1)-th regular time point on the second complete path to the base station at the (r+1)-th regular time point,
- and $X_{r,r+1}$ represents the transition probability between the geographic grid at the r-th regular time point and the geographic grid at the (r+1)-th regular time point on the second complete path.
- Suppose the transition probabilities between geographic grids in the hidden Markov model are as shown in Table 7
- and the output probabilities of the geographic grids to the base stations are as shown in Table 9.
- Suppose the output distribution of the geographic grids to the base stations is as shown in FIG. 12: base station A is output at time 0 and time 1, and base station B is output at time 2 and time 3. The geographic grids where user A is located at the different times can then be predicted in the following manner.
- The transition probability on path 1 is:
- Similarly, the transition probabilities on all possible paths can be calculated; the path with the highest transition probability among them is then selected, and the geographic grids on that path are determined as the geographic grids where user A is located at the different times.
- The geographic grid at time 1 is geographic grid III,
- the geographic grid at time 2 is a geographic grid
- and the geographic grid at time 3 is geographic grid I, which in turn yields the geographic-grid-granularity denoised track point data sequence.
- Because the geographic-grid-granularity denoised track point data sequence recovered by this trajectory recovery method has the highest state transition probability, and the geographic grid granularity is finer than the base station granularity, the accuracy of user trajectory recovery can be greatly improved.
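The search for the second complete path with the highest probability under formula (10) can be sketched with a Viterbi-style dynamic program, a standard technique for hidden Markov models that the application does not name explicitly. All probabilities below are hypothetical and are not the values of Table 7 or Table 9; `most_likely_grids` is an illustrative name. As in formula (10), no initial-state distribution is applied.

```python
def most_likely_grids(station_seq, grids, transition, emission):
    """Return (grid path, probability) maximizing Y11 * X12 * Y22 * ...

    transition[g][h]: transition probability from grid g to grid h.
    emission[g][s]: output probability of grid g to base station s.
    """
    # best[g] = (probability, path) of the best path ending in grid g.
    best = {g: (emission[g].get(station_seq[0], 0.0), [g]) for g in grids}
    for obs in station_seq[1:]:
        nxt = {}
        for g in grids:
            # Pick the best predecessor grid for g.
            prob, path = max(
                ((best[h][0] * transition[h][g], best[h][1]) for h in grids),
                key=lambda item: item[0],
            )
            nxt[g] = (prob * emission[g].get(obs, 0.0), path + [g])
        best = nxt
    prob, path = max(best.values(), key=lambda item: item[0])
    return path, prob


grids = ["I", "II", "III"]
transition = {
    "I": {"I": 0.6, "II": 0.3, "III": 0.1},
    "II": {"I": 0.2, "II": 0.6, "III": 0.2},
    "III": {"I": 0.1, "II": 0.3, "III": 0.6},
}
emission = {
    "I": {"A": 0.9, "B": 0.1},
    "II": {"A": 0.5, "B": 0.5},
    "III": {"A": 0.1, "B": 0.9},
}
path, prob = most_likely_grids(["A", "A", "B", "B"], grids, transition, emission)
```

The dynamic program visits each (time point, grid) pair once, so it scales as the number of time points times M squared rather than enumerating all M^R paths.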
- The above embodiments mainly describe the user trajectory recovery method provided in the embodiment of the present application in conjunction with the user trajectory recovery device shown in FIG. 2.
- To implement the above functions, the user track recovery device includes hardware structures and/or software modules corresponding to the execution of the respective functions.
- In combination with the elements and algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is implemented in hardware or by computer software driving hardware depends on the specific application and design constraints of the solution. A person skilled in the art can use different methods to implement the described functions for each particular application, but such implementation should not be considered to be beyond the scope of the present application.
- In the hardware structure diagram, the user trajectory recovery apparatus 1400 includes a processor 1401, a communication bus 1402, and a communication interface 1404.
- The processor 1401 may be a general-purpose processor, such as a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP; the processor 1401 may also be a microprocessor (MCU), an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof.
- The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination of the above.
- The communication bus 1402 can include a path for communicating information between the components described above.
- The communication interface 1404 uses any transceiver-like device to communicate with other devices or communication networks, and may include an Ethernet interface, a radio access network (RAN) interface, a wireless local area network (WLAN) interface, and the like.
- The user track recovery apparatus 1400 may further include a memory 1403.
- The memory 1403 may include volatile memory, such as random-access memory (RAM); it may also include non-volatile memory, such as flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 1403 may also include a combination of the above types of memory.
- the memory 1403 is used to store program codes.
- the processor 1401 is configured to execute the program code stored in the memory 1403 to implement the user trajectory recovery method described in FIG.
- processor 1401 may include one or more CPUs, such as CPU0 and CPU1 in FIG.
- the CPU can be a single core or multiple cores.
- the user trajectory recovery device 1400 may further include an output device 1405 and an input device 1406.
- Output device 1405 is in communication with processor 1401 and can display information in a variety of ways.
- The output device 1405 can be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, a projector, or the like.
- Input device 1406 is in communication with processor 1401 and can accept user input in a variety of ways.
- input device 1406 can be a mouse, keyboard, touch screen device, or sensing device, and the like.
- the user track recovery device 1400 described above can be a general purpose computer device or a dedicated computer device.
- The user trajectory recovery device 1400 can be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or a device having a structure similar to that in FIG.
- the embodiment of the present application does not limit the type of the user trajectory recovery device 1400.
- The user trajectory recovery device provided by the embodiment of the present application can be used to perform the foregoing user trajectory recovery method. For the technical effects it can obtain, refer to the foregoing method embodiments; they are not repeated here.
- The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
- When implemented using a software program, they may be implemented in whole or in part in the form of a computer program product.
- The computer program product includes one or more computer instructions.
- When the computer program instructions are loaded and executed on a computer, the processes or functions described in accordance with embodiments of the present application are generated in whole or in part.
- The computer can be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
- The computer instructions can be stored in a computer readable storage medium or transferred from one computer readable storage medium to another; for example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave).
- The computer readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or data center, that integrates one or more available media.
- The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a solid state disk (SSD)), or the like.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
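The present application discloses a user trajectory recovery method and apparatus, to at least improve denoising accuracy during user trajectory recovery. The method includes: acquiring an original track point data sequence of a user whose trajectory is to be recovered; based on a mapping model and the original track point data sequence, with the overall mapping cost minimized, determining the identifier of the base station where the user is located at each regular time point to obtain a denoised track point data sequence of the user, where the regular time points are time points at a fixed time interval, and the constraints of the mapping model include: one original track point data is mapped to the identifier of one base station at one regular time point, and at the same regular time point multiple original track point data are mapped to the identifier of at most one base station; and recovering the trajectory of the user according to the denoised track point data sequence.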
Description
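This application claims priority to Chinese Patent Application No. 201710132289.6, filed with the Chinese Patent Office on March 7, 2017 and entitled "User Trajectory Recovery Method and Apparatus", which is incorporated herein by reference in its entirety.
The present invention relates to the field of communications, and in particular, to a user trajectory recovery method and apparatus.
Operators obtain a large amount of user communication data every day, such as A-interface signaling data, calling detailed records (CDR) data, and measurement reports (MR) data, which contain connection information between users and their serving base stations. Because base station locations are known, in the prior art an operator can, based on the user communication data, take the base station location as the approximate location of the user when the current communication behavior occurs, forming discrete track points, as shown in FIG. 1. If all the track points of one user within a certain time period are collected, a continuous user trajectory can be formed. These user trajectories can provide additional data value to the operator; for example, the operator can infer the user's place of residence and workplace from the user trajectory and then offer services and marketing activities to the user.
However, if within the same time period the user frequently switches among several connected base stations, noise is formed in the user's track points, which makes it difficult to locate the user and in turn affects recovery of the user trajectory within that time period.
To address this problem, the prior art provides a user trajectory recovery solution that recovers the user trajectory by periodically sampling the user's track points. Although this reduces the noise in the track points to some extent, the denoising accuracy is low. Therefore, how to improve denoising accuracy during user trajectory recovery is a problem that urgently needs to be solved.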
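Summary of the Invention
The present application provides a user trajectory recovery method and apparatus, to at least improve denoising accuracy during user trajectory recovery.
To achieve the above objective, the present application provides the following technical solutions:
According to a first aspect, a user trajectory recovery method is provided. The method includes: acquiring an original track point data sequence of a user whose trajectory is to be recovered, where each original track point data in the sequence includes the identifier of the base station corresponding to the original track point and a collection time point; based on a mapping model and the original track point data sequence, with the overall mapping cost minimized, determining the identifier of the base station where the user is located at each regular time point to obtain a denoised track point data sequence of the user, where each denoised track point data in the sequence includes the identifier of the corresponding base station and a regular time point, the regular time points are time points at a fixed time interval, and the constraints of the mapping model include: one original track point data is mapped to the identifier of one base station at one regular time point, and at the same regular time point multiple original track point data are mapped to the identifier of at most one base station; and recovering the trajectory of the user according to the denoised track point data sequence. Because the regular time points are at a fixed interval and the mapping model carries these two constraints, the base stations between which the user frequently switches can be mapped to a unique base station at each regular time point. This solution therefore removes the user's switching back and forth between different base stations, which can greatly improve the quality and accuracy of the denoised data during trajectory recovery.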
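In a possible design, recovering the trajectory of the user according to the denoised track point data sequence includes: determining a base-station-granularity denoised track point data sequence of the user based on the denoised track point data sequence and a pre-trained user-base station model, where the parameters of the user-base station model include the transition probabilities between N base stations, and N is the number of distinct base station identifiers included in the original track point data sequence; and recovering the trajectory of the user according to the base-station-granularity denoised track point data sequence. With this solution, the data missing from the denoised track point data sequence can be filled in to form a continuous user trajectory, so that every regular time point contains a track point of the user, which further improves the accuracy of user trajectory recovery.
In a possible design, recovering the trajectory of the user according to the denoised track point data sequence includes: determining a base-station-granularity denoised track point data sequence of the user based on the denoised track point data sequence and a pre-trained user-base station model, where the parameters of the user-base station model include the transition probabilities between N base stations, and N is the number of distinct base station identifiers included in the original track point data sequence; determining a geographic-grid-granularity denoised track point data sequence of the user based on the base-station-granularity track point data sequence and a pre-trained base station-geographic grid model, where the parameters of the base station-geographic grid model include the transition probabilities between M geographic grids and the output probability of each geographic grid to the N base stations, and M is a positive integer; and recovering the trajectory of the user according to the geographic-grid-granularity denoised track point data sequence. With this solution, the data missing from the denoised track point data sequence can be filled in to form a continuous user trajectory, so that every regular time point contains a track point of the user, and the accuracy of user trajectory recovery can be controlled through the precision of the geographic grid, which further improves the accuracy of user trajectory recovery.
In a possible design, each original track point data in the original track point data sequence further includes the longitude and latitude of the base station corresponding to the original track point, so that the geographic-grid-granularity denoised track point data sequence can be obtained from the longitude and latitude of the base station and the trajectory of the user can then be recovered from that sequence. Alternatively, the longitude and latitude of the base station may be obtained from the base station identifier in the original track point data; this is not specifically limited in the present application.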
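In a possible design, determining the base-station-granularity denoised track point data sequence of the user based on the denoised track point data sequence and the pre-trained user-base station model includes: determining, from the regular time points contained in the denoised track point data of the denoised track point data sequence, the regular time points of the track point data missing from the sequence; according to the transition probabilities between the N base stations and a first preset formula, determining, for each case in which the base station where the user is located at a missing regular time point is any one of the N base stations, the transition probability of the first complete path formed by the base stations corresponding to the identifiers in the missing track point data and the base stations corresponding to the identifiers in the denoised track point data, where the first preset formula includes: transition probability of the first complete path = the product of the transition probabilities between the base stations on the first complete path; determining the base stations on the first complete path with the highest transition probability as the base stations where the user is located at the different regular time points; and determining, from those base stations, the base-station-granularity denoised track point data sequence of the user. Because the base-station-granularity denoised track point data sequence recovered by this method has the highest state transition probability, the accuracy of user trajectory recovery can be greatly improved.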
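In a possible design, determining the geographic-grid-granularity denoised track point data sequence of the user based on the base-station-granularity track point data sequence and the pre-trained base station-geographic grid model includes: according to the regular time point and base station identifier contained in each base-station-granularity denoised track point data of the base-station-granularity denoised track point data sequence, the transition probabilities between the M geographic grids, and the output probability of each geographic grid to the N base stations, determining, based on a second preset formula and for each regular time point, the transition probability of each second complete path connecting the geographic grids that can output the base station corresponding to that regular time point, where the second preset formula includes: $P = Y_{1,1} \cdot X_{1,2} \cdot Y_{2,2} \cdots X_{r,r+1} \cdot Y_{r+1,r+1} \cdots$, in which $Y_{r+1,r+1}$ represents the output probability of the geographic grid at the (r+1)-th regular time point on the second complete path to the base station at the (r+1)-th regular time point, and $X_{r,r+1}$ represents the transition probability between the geographic grid at the r-th regular time point and the geographic grid at the (r+1)-th regular time point on the second complete path; determining the geographic grids on the second complete path with the highest transition probability as the geographic grids where the user is located at the different regular time points; and determining, from those geographic grids, the geographic-grid-granularity denoised track point data sequence of the user. Because the geographic-grid-granularity denoised track point data sequence recovered by this method has the highest state transition probability, and the geographic grid granularity is finer than the base station granularity, the accuracy of user trajectory recovery can be greatly improved.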
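In a possible design, the method further includes: acquiring multiple first training data for training the user-base station model, where each of the first training data includes a base station identifier and a collection time point; determining, from each first training data, the number of transfers from any one of the N base stations to any one of the N base stations; and determining, from these numbers of transfers and based on a third preset formula, the transition probabilities between the N base stations, where the third preset formula includes: $\omega(n_1, n_2) = \alpha(n_1, n_2) / \sum_{n=1}^{N} \alpha(n_1, n)$, in which $\alpha(n_1, n_2)$ represents the number of transfers from the n1-th base station to the n2-th base station, $\sum_{n=1}^{N} \alpha(n_1, n)$ represents the total number of transfers from the n1-th base station to the N base stations, and $\omega(n_1, n_2)$ represents the transition probability from the n1-th base station to the n2-th base station. With this solution, the user-base station model can be trained to obtain its parameters: the transition probabilities between the N base stations.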
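In a possible design, the method further includes: acquiring multiple second training data for training the base station-geographic grid model, where each of the second training data includes a base station identifier, a collection time point, and the longitude and latitude of the user; determining, from the longitude and latitude of the user in each second training data, the distance between any one of the M geographic grids and any one of the M geographic grids; determining, from these distances and a preset rule, the transition probability between any one of the M geographic grids and any one of the M geographic grids; determining, from each second training data, the number of outputs of any one of the M geographic grids to any one of the N base stations; and determining, from these numbers of outputs and based on a fourth preset formula, the output probability of any one of the M geographic grids to any one of the N base stations, where the fourth preset formula includes: $\omega(m, n_3) = \alpha(m, n_3) / \sum_{n=1}^{N} \alpha(m, n)$, in which $\alpha(m, n_3)$ represents the number of outputs of the m-th geographic grid to the n3-th base station, $\sum_{n=1}^{N} \alpha(m, n)$ represents the total number of outputs of the m-th geographic grid to the N base stations, and $\omega(m, n_3)$ represents the output probability of the m-th geographic grid to the n3-th base station. With this solution, the base station-geographic grid model can be trained to obtain its parameters: the transition probabilities between the M geographic grids and the output probability of each geographic grid to the N base stations.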
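In a possible design, the user-base station model in the present application is a Markov model.
In a possible design, the base station-geographic grid model in the present application is a hidden Markov model.
According to a second aspect, a user trajectory recovery apparatus is provided, which has the function of implementing the above method. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function.
According to a third aspect, a user trajectory recovery apparatus is provided, including a processor and a communication interface. The processor is connected to the communication interface through a bus, and the processor is configured to implement the user trajectory recovery method according to the first aspect or any of its possible implementations.
In a possible design, the user trajectory recovery apparatus further includes a memory configured to store computer program instructions. The processor is connected to the memory through the bus, and the processor executes the program instructions stored in the memory, so that the user trajectory recovery apparatus performs the user trajectory recovery method according to the first aspect or any of its possible implementations.
According to a fourth aspect, a computer readable storage medium is provided for storing the computer program instructions used by the above user trajectory recovery apparatus; when the instructions run on a computer, the computer can perform the user trajectory recovery method of any item of the first aspect.
According to a fifth aspect, a computer program product containing instructions is provided; when it runs on a computer, the computer can perform the user trajectory recovery method of any item of the first aspect.
For the technical effects of any design of the second to fifth aspects, refer to the technical effects of the corresponding designs of the first aspect; they are not repeated here.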
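FIG. 1 is a schematic diagram of discrete track points in the prior art;
FIG. 2 is a schematic structural diagram of a user trajectory recovery apparatus according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an original track point data sequence according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of a user trajectory recovery method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a two-dimensional matrix according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the mapping of original track points into the two-dimensional matrix according to an embodiment of the present application;
FIG. 7 is a schematic diagram of the distribution over regular time points of the base stations corresponding to denoised track points according to an embodiment of the present application;
FIG. 8 is a first schematic diagram of Markov model prediction according to an embodiment of the present application;
FIG. 9 is a second schematic diagram of Markov model prediction according to an embodiment of the present application;
FIG. 10 is a third schematic diagram of Markov model prediction according to an embodiment of the present application;
FIG. 11 shows a base-station-granularity user trajectory according to an embodiment of the present application;
FIG. 12 is a schematic diagram of the output distribution of geographic grids to base stations according to an embodiment of the present application;
FIG. 13 is a schematic diagram of path connections of the hidden Markov model according to an embodiment of the present application;
FIG. 14 is a schematic diagram of the hardware structure of a user trajectory recovery apparatus according to an embodiment of the present application.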
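For ease of description, definitions or explanations of some key terms used in the following embodiments of the present application are given first:
A-interface signaling data: the A interface is the interface between the base station controller (BSC) and the mobile switching center (MSC). A-interface signaling data is the data recorded on the A interface, such as terminal power-on, power-off, periodic location update, calling and called events, short message sending and receiving, and cell handover.
CDR data: detailed service data recorded when the user uses voice, short message, and Internet services, including the communication number, communication duration, and base station connection information.
MR data: information reported by the user to the operator when communication behavior occurs, such as the user identifier, the recording time of the MR, the identifier of the connected base station, and the signal strength of the connected base station.
Over-the-top (OTT) data: OTT data is data of content service providers (such as WeChat and map services) obtained from the operator's data pipe; for example, OTT data may include a user identifier, a recording time, and the user's precise longitude and latitude information. The precise longitude and latitude are usually represented by a geographic grid identifier. A geographic grid is one cell of the grid array obtained by dividing the map into a grid, usually 50*50 meters in size.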
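Markov Model (MM): a Markov model is mainly determined by the transition probabilities between states. In the embodiment of the present application, a state is a base station, and the transition probability between states is the probability that the user's location transfers from one base station to another. At each discrete moment the user can be located at only one base station, and at the next moment the user transfers to another base station according to the transition probabilities between that base station and the other base stations. The transition probabilities between base stations are learned by training on the original track point data sequences in the user's MR data, or on base-station-granularity denoised track point data sequences obtained after trajectory recovery. When a denoised track point data sequence is input into the pre-trained Markov model for trajectory filling, the sequence formed by the track point data on the filled complete path with the highest transition probability is determined as the base-station-granularity denoised track point data sequence of the user. The training of the Markov model and the trajectory recovery performed by inputting a denoised track point data sequence into the pre-trained Markov model are described in detail in the following embodiments and are not detailed here.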
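Hidden Markov model (HMM): A hidden Markov model describes a Markov process whose states are not visible but whose observations are. It is determined mainly by the state transition probabilities and the probabilities with which states output observations. In the embodiments of this application, a state is a geographic grid and an observation is a base station. A state transition probability is the probability that the user's location moves from one geographic grid to another. An output probability is the probability with which a geographic grid outputs a base station, that is, the probability that a user located in the current geographic grid connects to each of the surrounding base stations it could connect to. The parameters of the hidden Markov model are therefore the transition probabilities between geographic grids and each geographic grid's output probabilities for the base stations. At each discrete time the user can be in only one geographic grid, and each geographic grid can output only one of the base stations it may connect to. At the next time, the user moves to another geographic grid according to the transition probabilities, and the new grid outputs another base station according to its output probabilities. These probabilities are learned either from the raw trajectory point data sequences in users' MR data together with the user longitude and latitude in OTT data, or from geographic-grid-granularity denoised trajectory point data sequences obtained after trajectory recovery. When a denoised trajectory point data sequence is input into the pre-trained hidden Markov model for geographic grid prediction, the sequence of geographic grids traversed internally can be inferred from the externally observed base station sequence, which yields the geographic-grid-granularity denoised trajectory point data sequence. Both the training of the hidden Markov model and its use for geographic grid prediction are described in detail in the embodiments below.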
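The technical solutions in the embodiments of this application are described below with reference to the accompanying drawings. In the description of this application, unless otherwise stated, "multiple" means two or more.
First, an application scenario of the embodiments of this application is given:
Assume that during 10:00-18:00 on August 24, 2016, the operator receives multiple pieces of MR data reported by user A. Each piece of MR data includes the user identifier reported to the operator when a communication behavior of user A occurs, the MR recording time, the identifier of the connected base station, and so on, as shown in Table 1.
Table 1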
MR data sequence number | User identifier | MR recording time | Base station identifier | ...... |
1 | User A | 2016-8-24 10:03:37 | Base station A | ...... |
2 | User A | 2016-8-24 10:14:25 | Base station B | ...... |
3 | User A | 2016-8-24 10:19:25 | Base station A | ...... |
4 | User A | 2016-8-24 10:39:10 | Base station C | ...... |
5 | User A | 2016-8-24 10:44:15 | Base station B | ...... |
6 | User A | 2016-8-24 10:56:03 | Base station C | ...... |
7 | User A | 2016-8-24 11:07:09 | Base station A | ...... |
8 | User A | 2016-8-24 11:15:16 | Base station A | ...... |
...... | ...... | ...... | ...... | ...... |
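As Table 1 shows, during 10:00-18:00 on August 24, 2016, user A frequently switches among several connected base stations: from base station A to base station B, from base station B back to base station A, from base station A to base station C, from base station C to base station B, and so on. This frequent switching introduces noise into user A's trajectory points, which makes it hard to locate user A when recovering the trajectory for this period and therefore degrades the recovered trajectory. To address this, the prior art provides a user trajectory recovery solution that recovers user A's trajectory by periodically sampling the trajectory points. For example, sampling every other piece of MR data gives the result shown in Table 2:
Table 2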
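MR data sequence number | User identifier | MR recording time | Identifier of the connected base station |
1 | User A | 2016-8-24 10:03:37 | Base station A |
3 | User A | 2016-8-24 10:19:25 | Base station A |
5 | User A | 2016-8-24 10:44:15 | Base station B |
7 | User A | 2016-8-24 11:07:09 | Base station A |
...... | ...... | ...... | ...... |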
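As Table 2 shows, this reduces the noise in user A's trajectory points to some extent. For example, no switch occurs between the first trajectory point (corresponding to MR data 1) and the second trajectory point (corresponding to MR data 3); both are connected to base station A. However, the denoising accuracy of this solution is low. For example, starting from the third trajectory point (corresponding to MR data 5), user A still switches from base station A to base station B and then from base station B back to base station A, and so on. How to improve the denoising accuracy in user trajectory recovery is therefore a problem that urgently needs to be solved.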
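The limitation of periodic sampling can be checked with a few lines of Python. The sketch below replays the base stations of Table 1 and keeps every other record, as in Table 2; the helper name `count_switches` is introduced here purely for illustration.

```python
# Base stations from Table 1, in time order (MR data 1 to 8).
stations = ["A", "B", "A", "C", "B", "C", "A", "A"]

def count_switches(seq):
    """Count adjacent pairs whose base stations differ."""
    return sum(a != b for a, b in zip(seq, seq[1:]))

# Prior-art approach: keep every other MR record (MR data 1, 3, 5, 7).
sampled = stations[::2]

print(sampled)                   # ['A', 'A', 'B', 'A'], the rows of Table 2
print(count_switches(stations))  # 6 switches in the raw data
print(count_switches(sampled))   # 2 switches remain after sampling
```

Sampling lowers the number of switches but does not remove the back-and-forth between base stations A and B.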
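To solve this problem, the embodiments of this application provide a user trajectory recovery method and apparatus that can improve the denoising accuracy during user trajectory recovery. The specific solution is as follows:
FIG. 2 is a schematic structural diagram of the user trajectory recovery apparatus provided in an embodiment of this application. The user trajectory recovery apparatus 20 includes an obtaining module 201, a denoising module 202, a trajectory recovery module 203, and a storage module 204.
The obtaining module 201 obtains the user's raw trajectory point data sequence from the operator side. The raw sequence contains noise (frequent base station switching) and is discontinuous in time (some periods have a lot of data, others have none). The denoising module 202 removes the noise from the raw trajectory point data sequence to obtain a denoised trajectory point data sequence with regular time point information (for example, one trajectory point every 10 minutes). The denoised sequence is still discontinuous, however: data is missing at many regular time points. The trajectory recovery module 203 fills in the missing data to form a continuous user trajectory in which every regular time point has a trajectory point of the user, called a final trajectory point. In the trajectory recovery module 203, the user-base station model recovers trajectory points at base station granularity, and the base station-geographic grid model recovers trajectory points at grid granularity. The recovered final trajectory points are stored in a user trajectory database to support subsequent services and marketing activities. Specifically: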
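The obtaining module 201 is configured to obtain, from the operator's raw data, the raw trajectory point data sequence of the user whose trajectory is to be recovered. Each piece of raw trajectory point data in the sequence includes the identifier of the base station corresponding to that raw trajectory point and a collection time point.
For example, in the foregoing application scenario, the obtaining module 201 may obtain user A's raw trajectory point data sequence from the MR data reported by user A in Table 1, where each piece of raw trajectory point data in the sequence includes the identifier of the corresponding base station and a collection time point. Optionally, each piece of raw trajectory point data may further include the longitude and latitude of the corresponding base station, or the obtaining module 201 may obtain the longitude and latitude from the base station identifier in the raw trajectory point data. As shown in FIG. 3, the raw trajectory point data, sorted by time, is: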
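Raw trajectory point data 1: $x_1$ = [base station A, 10:03:37, 121.46472, 31.08572];
Raw trajectory point data 2: $x_2$ = [base station B, 10:14:25, 121.46253, 31.08744];
Raw trajectory point data 3: $x_3$ = [base station A, 10:19:25, 121.46472, 31.08572];
Raw trajectory point data 4: $x_4$ = [base station C, 10:39:10, 121.46752, 31.08572]; and so on, not listed one by one here.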
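From the base station identifiers in the time-sorted trajectory point data in FIG. 3, or from the description of the application scenario above, it can be seen that user A frequently switches base stations during the period whose trajectory is to be recovered, which introduces noise into user A's trajectory points. To keep this noise from affecting the recovery of user A's trajectory for this period, the data is denoised by the denoising module 202.
The denoising module 202 is configured to remove the noise from the raw trajectory point data sequence by using an optimization technique, obtaining a denoised trajectory point data sequence such that the spatiotemporal cost of denoising the raw sequence is minimal. The specific denoising solution is described in the embodiments below. The denoising module 202 improves the quality and accuracy of the denoised data.
Optionally, although the denoised trajectory point data sequence produced by the denoising module 202 has regular time point information (for example, one trajectory point every 10 minutes), it is still discontinuous, with data missing at many regular time points. To further improve the accuracy of user trajectory recovery, the sequence is passed to the trajectory recovery module 203.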
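The trajectory recovery module 203 is configured to recover the user trajectory from the denoised trajectory point data sequence to obtain the final trajectory point data sequence. As shown in FIG. 2, the trajectory recovery module 203 may include two trajectory recovery models. One is the user-base station model, whose input is the denoised trajectory point data sequence and whose output is the base-station-granularity denoised trajectory point data sequence. The other is the base station-geographic grid model, whose input is the base-station-granularity denoised trajectory point data sequence and whose output is the geographic-grid-granularity denoised trajectory point data sequence, which can serve as the final trajectory point data sequence. The specific trajectory recovery solution is described in the embodiments below. The trajectory recovery module 203 fills in the missing data in the denoised sequence to form a continuous user trajectory in which every regular time point has a trajectory point, which improves the accuracy of user trajectory recovery.
It should be noted that the trajectory recovery module 203 may also include only the user-base station model, in which case the base-station-granularity denoised trajectory point data sequence output by that model serves as the final trajectory point data sequence. This is not specifically limited in the embodiments of this application.
The user-base station model may specifically be the Markov model described above, and the base station-geographic grid model may specifically be the hidden Markov model described above.
Optionally, because training the user-base station model on base-station-granularity denoised trajectory point data sequences, and training the base station-geographic grid model on geographic-grid-granularity denoised trajectory point data sequences, makes the training results more accurate, the user trajectory recovery apparatus 20 further includes the storage module 204.
The storage module 204 is configured to store the base-station-granularity and geographic-grid-granularity denoised trajectory point data sequences in its user trajectory database. The base-station-granularity sequences can be used to train the user-base station model, and the geographic-grid-granularity sequences can be used to train the base station-geographic grid model. The specific model training solution is described in the embodiments below. In addition, the final trajectory point data sequence is used to support subsequent services and marketing activities.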
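The user trajectory recovery method provided in the embodiments of this application is described below with reference to the user trajectory recovery apparatus shown in FIG. 2. FIG. 4 shows a possible implementation of the method, including the following steps S401-S403:
S401. The user trajectory recovery apparatus obtains the raw trajectory point data sequence of the user whose trajectory is to be recovered.
The obtaining module 201 in FIG. 2 supports the apparatus in performing step S401. For details, see the description of the obtaining module 201.
S402. Based on a mapping model and the raw trajectory point data sequence, the user trajectory recovery apparatus determines, with the overall mapping cost minimized, the identifier of the base station at which the user is located at each regular time point, obtaining the user's denoised trajectory point data sequence.
Each piece of denoised trajectory point data in the denoised sequence includes the identifier of the corresponding base station and a regular time point. Regular time points are time points at a fixed time interval. The constraints of the mapping model include: each piece of raw trajectory point data is mapped to one base station identifier at one regular time point, and at any one regular time point, the raw trajectory point data mapped to it is mapped to at most one base station identifier.
Because each piece of denoised trajectory point data consists of two elements, the identifier of the corresponding base station and a regular time point, mapping a piece of raw trajectory point data to one base station identifier at one regular time point means that a piece of raw trajectory point data can correspond to only one piece of denoised trajectory point data. This applies throughout and is not repeated below.
In a possible implementation, before step S402, the method further includes: the user trajectory recovery apparatus creates a two-dimensional matrix from the base station identifiers in the raw trajectory point data sequence obtained in step S401 and the regular time points.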
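The matrix element in row n, column p of the two-dimensional matrix, $y_{n,p}$, is represented by (identifier of the n-th base station, p-th regular time point), where n and p are positive integers.
For example, assume that the data sequence is as shown in FIG. 3, that it contains 8 distinct base station identifiers, and that there are 10 regular time points in total: 9:30:00, 10:00:00, 10:30:00, 11:00:00, 11:30:00, 12:00:00, 12:30:00, 13:00:00, 13:30:00, and 14:00:00. The two-dimensional matrix created from the base station identifier of each raw trajectory point in the data sequence and the regular time points may then be as shown in FIG. 5. The matrix element in row n, column p, $y_{n,p}$, is represented by (identifier of the n-th base station, p-th regular time point). For example, if the first base station identifier is base station A and the first regular time point is 9:30:00, then $y_{1,1}$ = (base station A, 9:30:00). Note that, for ease of presentation, the two-dimensional matrix is drawn as a two-dimensional grid in which each cell represents one matrix element. This applies throughout and is not repeated below.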
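The regular time points in the embodiments of this application are time points at a fixed time interval. They may be preset, or they may be determined from the collection time points in the collected raw trajectory point data sequence. For example, if the preset period is 9:00:00-14:00:00 and the preset interval is 30 minutes, dividing the period by the interval gives the regular time points shown in FIG. 5. The embodiments of this application do not limit how the regular time points are configured.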
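As a minimal sketch of how such regular time points might be generated, the Python function below splits a preset period by a preset interval. It assumes that each regular time point is the end of one interval-long slice, which yields ten points starting at 9:30:00 for the period and interval of this example; the function name `regular_time_points` and the choice of slice ends are illustrative assumptions, not part of the application.

```python
from datetime import datetime, timedelta

def regular_time_points(start, end, interval):
    """Return the end of each `interval`-long slice of [start, end]."""
    points = []
    current = start + interval
    while current <= end:
        points.append(current)
        current += interval
    return points

points = regular_time_points(
    datetime(2016, 8, 24, 9, 0, 0),    # start of the preset period
    datetime(2016, 8, 24, 14, 0, 0),   # end of the preset period
    timedelta(minutes=30),             # preset interval
)
print(len(points))                      # 10
print(points[0].strftime("%H:%M:%S"))   # 09:30:00
print(points[-1].strftime("%H:%M:%S"))  # 14:00:00
```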
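Note that in the two-dimensional matrix shown in FIG. 5, the list of all base station identifiers in the raw trajectory point data sequence forms the vertical axis and the list of regular time points forms the horizontal axis, as an example. The list of base station identifiers may equally form the horizontal axis and the list of regular time points the vertical axis. This is not specifically limited in the embodiments of this application.
Determining, based on the mapping model and the raw trajectory point data sequence and with the overall mapping cost minimized, the identifier of the base station at which the user is located at each regular time point, to obtain the user's denoised trajectory point data sequence, may specifically be: the user trajectory recovery apparatus determines, based on the mapping model and the raw trajectory point data sequence, the matrix element to which each piece of raw trajectory point data corresponds when every piece is mapped into the two-dimensional matrix with the overall mapping cost minimized, thereby obtaining the user's denoised trajectory point data sequence.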
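Because the matrix element in row n, column p of the two-dimensional matrix, $y_{n,p}$, is represented by (identifier of the n-th base station, p-th regular time point), when each piece of raw trajectory point data is mapped into the two-dimensional matrix, the constraints of the mapping model become: each piece of raw trajectory point data is mapped to one matrix element, and each column of the two-dimensional matrix has at most one matrix element that raw trajectory point data is mapped to.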
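Specifically, in a possible implementation, determining the matrix element to which each piece of raw trajectory point data corresponds may include: determining, according to a preset optimization objective function and constraints, the weight with which each piece of raw trajectory point data is mapped to each matrix element when every piece is mapped into the two-dimensional matrix with the overall mapping cost minimized; and, for raw trajectory point data $x_t$, determining that the matrix element $y_{n,p}$ for which $f_{t,n,p}=1$ is the matrix element corresponding to $x_t$, where $f_{t,n,p}$ is the weight with which $x_t$ is mapped to $y_{n,p}$. The preset optimization objective function is shown in formula (1):
$$\mathrm{WORK}(X,Y,F)=\sum_{t=1}^{T}\sum_{n=1}^{N}\sum_{p=1}^{P} f_{t,n,p}\,d_{t,n,p} \qquad \text{Formula (1)}$$
According to formula (1), the $f_{t,n,p}$ at which the overall mapping cost is minimal is given by formula (2):
$$F=\arg\min_{F}\ \mathrm{WORK}(X,Y,F) \qquad \text{Formula (2)}$$
where N is the number of distinct base station identifiers in the raw trajectory point data sequence; P is the number of regular time points; T is the number of pieces of raw trajectory point data; WORK(X,Y,F) is the total cost of mapping with the weights $f_{t,n,p}$; $\arg\min_{F}$ selects the F that minimizes WORK(X,Y,F); $\sum_{k=1}^{K}$ denotes summation over the variable k from 1 to K; and $d_{t,n,p}$ is the cost of mapping $x_t$ to $y_{n,p}$.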
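where M is a very large number; $a_{n,p}$ is the number of pieces of raw trajectory point data that may be mapped to $y_{n,p}$; $b_{n,p}$ indicates whether $y_{n,p}$ has a mapping relationship with raw trajectory point data ($b_{n,p}=1$ if it does, $b_{n,p}=0$ if it does not); and $\forall n,p$ means for any n and p.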
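The constraint $f_{t,n,p}\in\{0,1\}$ restricts the weight with which $x_t$ is mapped to $y_{n,p}$ to 0 or 1: $f_{t,n,p}=1$ if $x_t$ is mapped to $y_{n,p}$, and $f_{t,n,p}=0$ otherwise.
The constraint $a_{n,p}\in\{0,1,\ldots,T\}$ restricts the number of pieces of raw trajectory point data that may be mapped to $y_{n,p}$ to an integer between 0 and T.
The constraint $b_{n,p}\in\{0,1\}$ restricts the indicator of whether $y_{n,p}$ has a mapping relationship with raw trajectory point data: $b_{n,p}=1$ if it does, and $b_{n,p}=0$ if it does not.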
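Under these constraints, solving formula (1) and formula (2) gives the weight with which each piece of raw trajectory point data is mapped to each matrix element when the overall mapping cost is minimized. Then, for raw trajectory point data $x_t$, the matrix element $y_{n,p}$ for which $f_{t,n,p}=1$ is determined as the matrix element corresponding to $x_t$.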
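The two mapping conditions stated above (each raw trajectory point maps to exactly one matrix element, and each column has at most one mapped element) can be written as a big-M integer program. The block below is one formulation that is consistent with the symbol definitions given here; it is an assumption offered for illustration and is not necessarily the exact constraint set of the application.

```latex
\begin{aligned}
\min_{F}\quad & \sum_{t=1}^{T}\sum_{n=1}^{N}\sum_{p=1}^{P} f_{t,n,p}\,d_{t,n,p} \\
\text{s.t.}\quad
 & \sum_{n=1}^{N}\sum_{p=1}^{P} f_{t,n,p} = 1, && \forall t
   && \text{(each raw point maps to one element)} \\
 & a_{n,p} = \sum_{t=1}^{T} f_{t,n,p}, && \forall n,p
   && \text{(points mapped to } y_{n,p}\text{)} \\
 & a_{n,p} \le M\, b_{n,p}, && \forall n,p
   && \text{(} b_{n,p}=1 \text{ whenever } y_{n,p} \text{ is used)} \\
 & \sum_{n=1}^{N} b_{n,p} \le 1, && \forall p
   && \text{(at most one used element per column)} \\
 & f_{t,n,p}\in\{0,1\},\quad b_{n,p}\in\{0,1\}
\end{aligned}
```

In this form any constant $M \ge T$ is large enough, because at most T points can be mapped to a single matrix element.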
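For example, mapping the raw trajectory point data shown in FIG. 3 into the two-dimensional matrix shown in FIG. 5 in this way may give the result shown in FIG. 6, where:
raw trajectory point data $x_1$ corresponds to matrix element $y_{6,1}$ = (base station F, 9:30:00);
raw trajectory point data $x_2$ corresponds to matrix element $y_{4,2}$ = (base station D, 10:00:00);
raw trajectory point data $x_3$ corresponds to matrix element $y_{4,2}$ = (base station D, 10:00:00);
raw trajectory point data $x_4$ corresponds to matrix element $y_{2,3}$ = (base station B, 10:30:00); and so on, not listed one by one here.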
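Optionally, one example way of computing the cost $d_{t,n,p}$ of mapping $x_t$ to $y_{n,p}$ is shown in formula (3):
$$d_{t,n,p}=w_t\cdot\delta^{(t)}(x_t,y_{n,p})+w_s\cdot\delta^{(s)}(x_t,y_{n,p}) \qquad \text{Formula (3)}$$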
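where $w_s$ is the spatial cost parameter and $w_t$ is the temporal cost parameter, used to adjust the weights of space and time respectively, both 1 by default; $\delta^{(t)}(x_t,y_{n,p})$ is the temporal distance; and $\delta^{(s)}(x_t,y_{n,p})$ is the spatial distance.
$\delta^{(t)}(x_t,y_{n,p})$ and $\delta^{(s)}(x_t,y_{n,p})$ are computed as shown in formula (4) and formula (5) respectively: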
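$$\delta^{(t)}(x_t,y_{n,p})=\left|x_t[t]-y_{n,p}[p]\right| \qquad \text{Formula (4)}$$
$$\delta^{(s)}(x_t,y_{n,p})=\left|x_t[lon]-y_{n,p}[lon]\right|+\left|x_t[lat]-y_{n,p}[lat]\right| \qquad \text{Formula (5)}$$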
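where [t] and [p] both denote time points, [lon] denotes longitude, and [lat] denotes latitude.
Note that the base station longitude and latitude required in formula (5) may be part of the raw trajectory point data obtained in step S401, or may be obtained from the base station identifiers in that data. For example, the user trajectory recovery apparatus may pre-store the correspondence between base station identifiers and base station longitude and latitude, and look up the longitude and latitude once the base station identifier is obtained. The embodiments of this application do not limit how the base station longitude and latitude are obtained.
Because this calculation uses temporal and spatial cost as the optimization objective, the temporal and spatial cost of denoising the raw trajectory point data sequence is minimized.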
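The cost of formulas (3) to (5) and the minimum-cost mapping can be illustrated on a toy instance. The Python sketch below is an exhaustive search, practical only for very small inputs, and is not presented as the solver used in the application: it fixes one base station per regular time point, lets every raw point take its cheapest matrix element, and keeps the cheapest overall choice. The function names, the toy coordinates (planar units rather than real longitude and latitude), the time unit (minutes after 10:00), and the weights are all assumptions made for the illustration.

```python
from itertools import product

def mapping_cost(point, station, slot_time, coords, w_t=1.0, w_s=1.0):
    """Formula (3): w_t * time distance (4) + w_s * Manhattan distance (5)."""
    point_station, point_time = point
    lon_a, lat_a = coords[point_station]
    lon_b, lat_b = coords[station]
    time_dist = abs(point_time - slot_time)
    space_dist = abs(lon_a - lon_b) + abs(lat_a - lat_b)
    return w_t * time_dist + w_s * space_dist

def denoise(points, slots, coords, w_t=1.0, w_s=1.0):
    """Try every station-per-slot choice and keep the cheapest total mapping."""
    stations = sorted(coords)
    best_total, best_mapping = float("inf"), None
    for choice in product(stations, repeat=len(slots)):
        total, mapping = 0.0, []
        for point in points:
            # With one station fixed per slot (column), each raw point
            # independently takes its cheapest (station, slot) element.
            cost, idx = min(
                (mapping_cost(point, choice[i], slots[i], coords, w_t, w_s), i)
                for i in range(len(slots))
            )
            total += cost
            mapping.append((choice[idx], slots[idx]))
        if total < best_total:
            best_total, best_mapping = total, mapping
    return best_total, best_mapping

# Hypothetical toy data: (station, minutes after 10:00) and made-up coordinates.
coords = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (3.0, 0.0)}
points = [("A", 3), ("B", 14), ("A", 19), ("C", 39)]
slots = [0, 30, 60]

total, mapping = denoise(points, slots, coords, w_t=1.0, w_s=10.0)
print(total)    # 55.0
print(mapping)  # [('A', 0), ('A', 0), ('A', 0), ('C', 30)]
```

In this toy run the A, B, A switching collapses onto base station A at the first regular time point, which is the denoising effect described above. Because time and space are in different units, the weights $w_t$ and $w_s$ decide how far a point may move in time to avoid a change of base station.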
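Step S402 above is the denoising solution provided in the embodiments of this application; the denoising module 202 in FIG. 2 supports the user trajectory recovery apparatus in performing it. As FIG. 3 shows, the collection time points of user A's raw trajectory point data are irregular: the interval between the collection time points of adjacent raw trajectory point data is not fixed. As FIG. 6 shows, after denoising, the time points of user A's denoised trajectory point data (that is, the matrix elements) are regular: the interval between the time points of adjacent denoised trajectory point data is fixed. FIG. 3 also shows that user A frequently switches base stations during the period whose trajectory is to be recovered, whereas FIG. 6 shows that after denoising, the base stations user A switches among are mapped to a single base station at each regular time point. Because the back-and-forth switching between base stations is removed, the denoising solution improves the quality and accuracy of the denoised data. And because an optimization technique is used, the cost of denoising the raw trajectory point data sequence is minimized.
S403. The user trajectory recovery apparatus recovers the trajectory of the user from the denoised trajectory point data sequence.
After obtaining the denoised trajectory point data sequence, the apparatus may obtain the user's trajectory from it by using an existing trajectory recovery approach, or by using the trajectory recovery module 203 shown in FIG. 2. This is not specifically limited in the embodiments of this application. The specific solution that uses the trajectory recovery module 203 is described in the embodiments below.
With the user trajectory recovery method provided in the embodiments of this application, the identifier of the base station at which the user is located at each regular time point can be determined based on the mapping model and the raw trajectory point data sequence with the overall mapping cost minimized, yielding the user's denoised trajectory point data sequence. Regular time points are time points at a fixed time interval, and the constraints of the mapping model include: each piece of raw trajectory point data is mapped to one base station identifier at one regular time point, and at any one regular time point the raw trajectory point data mapped to it is mapped to at most one base station identifier. In other words, the base stations the user frequently switches among are mapped to a single base station at each regular time point. This removes the back-and-forth switching between base stations and so improves the quality and accuracy of the denoised data used in trajectory recovery. And because an optimization technique is used for denoising, the cost of denoising the raw trajectory point data sequence is minimized.
As FIG. 6 shows, after denoising there are many gaps in the time dimension. For example, no raw trajectory point data is mapped to the regular time points 11:00:00, 11:30:00, 12:00:00, 12:30:00, and so on. If an existing trajectory recovery approach were used to obtain the user's trajectory, the accuracy would still be limited. The embodiments of this application therefore also provide a way to recover the user trajectory from the denoised trajectory point data by using the trajectory recovery module 203 shown in FIG. 2, which fills in the data missing at regular time points to form a continuous user trajectory and further improves the accuracy of user trajectory recovery.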
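The following describes a specific implementation of recovering the user's trajectory by using the trajectory recovery module 203 shown in FIG. 2. The description takes as an example the case in which the user-base station model is a Markov model and the base station-geographic grid model is a hidden Markov model. First, the training processes of the Markov model and the hidden Markov model are given.
Markov model (MM) training process:
Assume there are N base stations, corresponding to N states of the Markov model. The parameters to be determined in training are the transition probabilities among these N base stations, which can be expressed as an N×N matrix. The training process may be as follows:
After multiple pieces of training data are sorted by the time points they contain, the number of transitions from any one of the N base stations to any one of the N base stations can be determined from the base station identifiers in the training data. The transition probabilities among the N base stations can then be computed using formula (6):
$$\omega(n_1,n_2)=\frac{\alpha(n_1,n_2)}{\sum_{n_2=1}^{N}\alpha(n_1,n_2)} \qquad \text{Formula (6)}$$
where α(n1,n2) is the number of transitions from the n1-th base station to the n2-th base station in the training data, $\sum_{n_2=1}^{N}\alpha(n_1,n_2)$ is the total number of transitions from the n1-th base station to the N base stations in the training data, and ω(n1,n2) is the transition probability from the n1-th base station to the n2-th base station.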
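For example, assume there are three base stations A, B, and C (that is, N=3), corresponding to three states of the Markov model. The parameters to be determined in training are the transition probabilities among base stations A, B, and C, which can be expressed as a 3×3 matrix. Assume that in the training data a total of 600 transitions occur among base stations A, B, and C, with the transition counts shown in Table 3:
Table 3
Then, for each start state, normalizing over the target states using formula (6) completes the training and gives the transition probabilities between base stations shown in Table 4:
Table 4
Note that the training data used to train the Markov model may be the raw trajectory point data sequences in users' MR data, or base-station-granularity denoised trajectory point data sequences obtained after trajectory recovery, for example those produced by the trajectory recovery module 203 shown in FIG. 2. This is not specifically limited in the embodiments of this application. Because training the Markov model on base-station-granularity denoised trajectory point data sequences makes the training result more accurate, such sequences, once obtained, can be stored in the user trajectory database of the storage module 204 shown in FIG. 2 and used to keep updating the Markov model. This makes the training result more accurate and in turn improves the accuracy of later trajectory recovery.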
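The counting and normalization of formula (6) can be sketched in a few lines of Python. The sketch below uses the eight records of Table 1 as a stand-in training set, since the counts of Table 3 are not reproduced here; the function name `train_transition_probabilities` is an illustrative assumption.

```python
from collections import Counter, defaultdict

def train_transition_probabilities(records):
    """records: iterable of (time, station). Returns {src: {dst: probability}}."""
    # Sort by time, then keep only the station sequence.
    ordered = [station for _, station in sorted(records)]
    counts = defaultdict(Counter)
    for src, dst in zip(ordered, ordered[1:]):
        counts[src][dst] += 1              # alpha(n1, n2)
    probabilities = {}
    for src, row in counts.items():
        total = sum(row.values())          # sum over n2 of alpha(n1, n2)
        probabilities[src] = {dst: n / total for dst, n in row.items()}
    return probabilities

# Records taken from Table 1 (time of day, base station).
records = [
    ("10:03:37", "A"), ("10:14:25", "B"), ("10:19:25", "A"), ("10:39:10", "C"),
    ("10:44:15", "B"), ("10:56:03", "C"), ("11:07:09", "A"), ("11:15:16", "A"),
]
omega = train_transition_probabilities(records)
print(omega["B"])  # {'A': 0.5, 'C': 0.5}
```

Transitions that never occur in the training data are simply absent from the result, which corresponds to a probability of 0.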
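Hidden Markov model (HMM) training process:
Assume there are M geographic grids, corresponding to M states of the hidden Markov model, and N base stations, for which each geographic grid has output probabilities. The parameters to be determined in training are the transition probabilities among the M geographic grids and each geographic grid's output probabilities for the N base stations. The transition probabilities among the M geographic grids can be expressed as an M×M matrix, and the output probabilities of the geographic grids for the N base stations can be expressed as an M×N matrix. The training process is as follows:
After multiple pieces of training data are sorted by the time points they contain, the distance between any one of the M geographic grids and any one of the M geographic grids can be determined from the user longitude and latitude, represented by geographic grids, in the training data. The transition probability between any two of the M geographic grids is then determined according to a preset rule. For example, the preset rule may be that the transition probability between two geographic grids follows a Gaussian distribution in the distance between them.
After multiple pieces of training data are sorted by the time points they contain, the number of times any one of the M geographic grids outputs any one of the N base stations can be determined from the grid-represented user longitude and latitude and the base station identifiers in the training data. The output probability of any one of the M geographic grids for any one of the N base stations can then be computed using formula (7):
$$\omega(m,n_3)=\frac{\alpha(m,n_3)}{\sum_{n_3=1}^{N}\alpha(m,n_3)} \qquad \text{Formula (7)}$$
where α(m,n3) is the number of times the m-th geographic grid outputs the n3-th base station, $\sum_{n_3=1}^{N}\alpha(m,n_3)$ is the total number of times the m-th geographic grid outputs the N base stations, and ω(m,n3) is the output probability of the m-th geographic grid for the n3-th base station.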
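For example, assume there are three geographic grids I, II, and III (that is, M=3), corresponding to three states of the hidden Markov model, and two base stations A and B (that is, N=2), for which each geographic grid has output probabilities. The parameters to be determined in training are the transition probabilities among geographic grids I, II, and III and each geographic grid's output probabilities for base stations A and B. The transition probabilities among the three geographic grids can be expressed as a 3×3 matrix, and the output probabilities of the grids for the two base stations can be expressed as a 3×2 matrix. In the hidden Markov model, the transition probabilities between geographic grids can be specified by a rule, for example that the transition probability between two geographic grids follows a Gaussian distribution in the distance between them. Assume that the distances between geographic grids (in meters) are as shown in Table 5:
Table 5
The Gaussian distribution is defined in formula (8) and is determined by two parameters (μ, σ):
$$f(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right) \qquad \text{Formula (8)}$$
Assume that the mean μ=0 and the standard deviation σ=50. Substituting each distance in Table 5 into formula (8) gives the transition probabilities between geographic grids shown in Table 6:
Table 6
Because the transition probabilities in Table 6 are not normalized, for each start state the values over the target states are normalized, which gives the transition probabilities between geographic grids shown in Table 7.
Table 7
Assume that in the training data the numbers of times geographic grids I, II, and III output base stations A and B are as shown in Table 8:
Table 8
Then, for each geographic grid, normalizing over the base stations it outputs using formula (7) completes the training and gives the output probabilities of geographic grids for base stations shown in Table 9:
Table 9
Note that the training data used to train the hidden Markov model may be the raw trajectory point data sequences in users' MR data together with the OTT data corresponding to that MR data, or geographic-grid-granularity denoised trajectory point data obtained after trajectory recovery, for example the sequences produced by the trajectory recovery module 203 shown in FIG. 2. This is not specifically limited in the embodiments of this application. The raw trajectory point data in MR data includes the user identifier, the identifier of the base station to which the user is connected when the current communication behavior occurs, and the collection time point of the MR data. OTT data includes the user identifier, the collection time point, and the user's longitude and latitude. MR data and OTT data are associated through the user identifier and the collection time point. Because training the hidden Markov model on geographic-grid-granularity denoised trajectory point data sequences makes the training result more accurate, such sequences, once obtained, can be stored in the user trajectory database of the storage module 204 shown in FIG. 2 and used to keep updating the hidden Markov model. This makes the training result more accurate and in turn improves the accuracy of later trajectory recovery.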
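The two training steps above (Gaussian-weighted transitions followed by row normalization, and output counts normalized by formula (7)) can be sketched as follows. The distances and counts in the sketch are invented for illustration, because the values of Tables 5 and 8 are not reproduced here, and the function names are likewise assumptions.

```python
import math

def gaussian(x, mu=0.0, sigma=50.0):
    """Formula (8): Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def normalize_rows(table):
    """Scale every row of a {row: {col: value}} table so that it sums to 1."""
    return {
        row: {col: value / sum(cols.values()) for col, value in cols.items()}
        for row, cols in table.items()
    }

def transitions_from_distances(distances, mu=0.0, sigma=50.0):
    """Apply formula (8) to each grid-to-grid distance, then normalize per start grid."""
    raw = {
        src: {dst: gaussian(d, mu, sigma) for dst, d in row.items()}
        for src, row in distances.items()
    }
    return normalize_rows(raw)

def outputs_from_counts(counts):
    """Formula (7): per-grid output counts divided by the grid's total count."""
    return normalize_rows(counts)

# Hypothetical inputs: distances in meters and output counts are made up.
distances = {
    "I":   {"I": 0, "II": 50, "III": 100},
    "II":  {"I": 50, "II": 0, "III": 50},
    "III": {"I": 100, "II": 50, "III": 0},
}
counts = {"I": {"A": 30, "B": 10}, "II": {"A": 20, "B": 20}, "III": {"A": 5, "B": 45}}

transition = transitions_from_distances(distances)
output = outputs_from_counts(counts)
print(output["I"])  # {'A': 0.75, 'B': 0.25}
```

With μ=0, nearer grids receive larger transition probabilities, and σ controls how quickly the probability falls off with distance.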
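After the Markov model and the hidden Markov model are trained as described above, they can be stored in the trajectory recovery module 203 shown in FIG. 2. Then, after the denoising module 202 outputs a denoised trajectory point data sequence, the trajectory recovery module 203 can recover the user trajectory based on the trained models. Recovering the user trajectory is the process of filling in the data missing at the regular time points to form a continuous user trajectory, as described below.
First, the trajectory recovery process based on the Markov model (MM) is given:
The input of the Markov model is the denoised trajectory point data sequence output by the denoising module 202.
The output of the Markov model is the base-station-granularity denoised trajectory point data sequence.
The trajectory recovery process of the Markov model includes: obtaining the base station identifiers and regular time points contained in the denoised trajectory point data in the sequence output by the denoising module 202; determining, from the regular time points, the regular time points of the trajectory point data missing from the denoised sequence; determining, from the parameters of the Markov model (the transition probabilities among the N base stations) and formula (9), for each case in which the base station at which the user is located at a missing regular time point is any one of the N base stations, the transition probability of the first complete path formed by the base stations at the missing time points and the base stations identified in the denoised trajectory point data; determining the base stations on the first complete path with the highest transition probability as the base stations at which the user is located at the different regular time points; and then determining, from those base stations, the user's base-station-granularity denoised trajectory point data sequence.
Transition probability of the first complete path = product of the transition probabilities between successive base stations on the first complete path    Formula (9)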
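For example, assume that the transition probabilities between base stations in the Markov model are as shown in Table 4, and that the distribution over regular time points of the base stations corresponding to the denoised trajectory points output by the denoising module 202 is as shown in FIG. 7: at time 0 there is a piece of denoised trajectory point data whose base station identifier is base station A; at time 1 and time 2 the trajectory point data is missing; and at time 3 there is a piece of denoised trajectory point data whose base station identifier is base station C. The trajectory can then be recovered as follows to obtain the base-station-granularity denoised trajectory point data sequence:
First, according to Table 4, starting from time 0, a state transition is performed from every possible state. For example, in Table 4, when the start state is base station A, the transition probability from base station A to base station A is 0.8, from base station A to base station B is 0.2, and from base station A to base station C is 0. The result of these transitions is shown in FIG. 8: from time 0 there are two branches, one moving to base station A at time 1 and one moving to base station B at time 1, with transition probabilities 0.8 and 0.2 respectively.
Next, according to Table 4, starting from time 1, a state transition is again performed from every possible state. The result is shown in FIG. 9. For details, see the transition process at time 0.
Then, according to Table 4, starting from time 2, a state transition is again performed from every possible state, this time moving to base station C. The result is shown in FIG. 10. For details, see the transition process at time 0.
Finally, for each path from base station A to base station C shown in FIG. 10, all transition probabilities on the path are multiplied, which gives the list of transition probabilities shown in Table 10:
Table 10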
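Transition path | Transition probability from base station A to base station C |
Base station A - Base station A - Base station A - Base station C | 0.8×0.8×0=0 |
Base station A - Base station A - Base station B - Base station C | 0.8×0.2×0.4=0.064 |
Base station A - Base station B - Base station A - Base station C | 0.2×0.12×0=0 |
Base station A - Base station B - Base station B - Base station C | 0.2×0.48×0.4=0.0384 |
Base station A - Base station B - Base station C - Base station C | 0.2×0.4×0.6=0.048 |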
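The base stations on the path with the highest transition probability can then be determined as the base stations at which user A is located at the different regular time points. For example, Table 10 shows that the path base station A - base station A - base station B - base station C has the highest transition probability from base station A to base station C, namely 0.064. User A's base station is therefore base station A at time 1 and base station B at time 2. The recovered base-station-granularity user trajectory is shown in FIG. 11.
Because the base-station-granularity denoised trajectory point data sequence recovered in this way has the highest state transition probability, the accuracy of user trajectory recovery is improved.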
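The gap filling of formula (9) can be sketched in Python. Instead of listing every path as in Table 10, the sketch keeps only the best partial path ending at each base station at each time, which gives the same maximum; this dynamic-programming shortcut is a choice made here, not something stated in the application. The transition probabilities of rows A and B and the value C to C are those used in Table 10, while the values for C to A and C to B are placeholders, because they cannot be read from the text and do not affect this example.

```python
def fill_gaps(observed, stations, trans):
    """Return (probability, path) of the most probable path through the gaps.

    observed: list of station ids, with None where the trajectory point is missing.
    """
    def candidates(obs):
        return stations if obs is None else [obs]

    # best[s] = (probability, path) of the best partial path ending at station s.
    best = {s: (1.0, [s]) for s in candidates(observed[0])}
    for obs in observed[1:]:
        step = {}
        for s in candidates(obs):
            prob, path = max(
                ((p * trans[prev][s], pth) for prev, (p, pth) in best.items()),
                key=lambda item: item[0],
            )
            step[s] = (prob, path + [s])
        best = step
    return max(best.values(), key=lambda item: item[0])

trans = {
    "A": {"A": 0.8, "B": 0.2, "C": 0.0},
    "B": {"A": 0.12, "B": 0.48, "C": 0.4},
    "C": {"A": 0.2, "B": 0.2, "C": 0.6},  # C->A and C->B are placeholder values
}
# Time 0 is base station A, times 1 and 2 are missing, time 3 is base station C.
prob, path = fill_gaps(["A", None, None, "C"], ["A", "B", "C"], trans)
print(path)            # ['A', 'A', 'B', 'C']
print(round(prob, 4))  # 0.064
```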
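Next, the trajectory recovery process based on the hidden Markov model (HMM) is given:
The input of the hidden Markov model is the base-station-granularity denoised trajectory point data sequence output by the Markov model.
The output of the hidden Markov model is the geographic-grid-granularity denoised trajectory point data sequence.
The trajectory recovery process of the hidden Markov model includes: obtaining the base station identifiers and regular time points contained in the denoised trajectory point data in the base-station-granularity sequence output by the Markov model; determining, from each base station identifier, the regular time points, the transition probabilities among the M geographic grids and each geographic grid's output probabilities for the N base stations in the hidden Markov model, and formula (10), the transition probability of each second complete path that connects, at each regular time point, the geographic grids that can output the base station corresponding to that regular time point; determining the geographic grids on the second complete path with the highest transition probability as the geographic grids in which the user is located at the different regular time points; and then determining, from those geographic grids, the user's geographic-grid-granularity denoised trajectory point data sequence.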
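$$P=Y_{1,1}\cdot X_{1,2}\cdot Y_{2,2}\cdots X_{r,r+1}\cdot Y_{r+1,r+1}\cdots \qquad \text{Formula (10)}$$
where $Y_{r+1,r+1}$ is the output probability with which the geographic grid at the (r+1)-th regular time point on the second complete path outputs the base station at the (r+1)-th regular time point, and $X_{r,r+1}$ is the transition probability from the geographic grid at the r-th regular time point on the second complete path to the geographic grid at the (r+1)-th regular time point.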
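For example, assume that the transition probabilities between geographic grids in the hidden Markov model are as shown in Table 7 and the output probabilities of geographic grids for base stations are as shown in Table 9. Assume also that the output distribution of geographic grids over base stations is as shown in FIG. 12: base station A is output at time 0 and time 1, and base station B is output at time 2 and time 3. The identifiers of the geographic grids in which user A is located at the different regular time points can then be predicted as follows, which gives the geographic-grid-granularity denoised trajectory point data sequence:
First, according to Table 9, starting from time 0, all possible states (that is, geographic grids) that can output these observations are listed, and all possible state transition paths are connected, as shown in FIG. 13. Any of these paths may output the observation sequence in FIG. 12.
Next, the transition probability of each complete path is computed. Taking path 1 in FIG. 13 as an example, formula (10) gives the transition probability of path 1 as: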
$$P=Y_{I,A}\cdot X_{I,II}\cdot Y_{II,A}\cdot X_{II,III}\cdot Y_{III,B}\cdot X_{III,III}\cdot Y_{III,B}$$
$$=0.23\times0.31\times0.22\times0.28\times0.19\times0.39\times0.19\approx 6.2\times10^{-5}$$
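The arithmetic of this worked example can be checked directly. The short Python snippet below multiplies the seven factors quoted for path 1; only the variable names are introduced here.

```python
import math

# Factors of path 1, in the order Y, X, Y, X, Y, X, Y of formula (10).
factors = [0.23, 0.31, 0.22, 0.28, 0.19, 0.39, 0.19]
p = math.prod(factors)
print(f"{p:.2e}")  # 6.18e-05, which rounds to the 6.2e-5 quoted above
```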
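Similarly, the transition probabilities of all possible paths can be computed with formula (10), and the path with the highest transition probability can be determined. The identifiers of the geographic grids on that path are determined as the identifiers of the geographic grids in which user A is located at the different regular time points. For example, if the path with the highest probability is geographic grid III - geographic grid III - geographic grid II - geographic grid I, then user A is in geographic grid III at time 0, geographic grid III at time 1, geographic grid II at time 2, and geographic grid I at time 3, which gives the geographic-grid-granularity user trajectory.
Because the geographic-grid-granularity denoised trajectory point data sequence recovered in this way has the highest state transition probability, and because geographic grid granularity is finer than base station granularity, the accuracy of user trajectory recovery is improved.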
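Selecting the most probable grid path under formula (10) can be sketched as follows. The sketch shows both the exhaustive enumeration described above and a Viterbi-style dynamic program that reaches the same maximum without listing every path; the Viterbi variant is a standard technique substituted here for illustration, not a step named in the application. All probabilities in the example are invented, because the values of Tables 7 and 9 are not reproduced here.

```python
from itertools import product

def path_probability(path, observations, trans, output):
    """Formula (10): Y_{1,1} * X_{1,2} * Y_{2,2} * ... along one grid path."""
    prob = output[path[0]][observations[0]]
    for r in range(1, len(path)):
        prob *= trans[path[r - 1]][path[r]] * output[path[r]][observations[r]]
    return prob

def brute_force(observations, grids, trans, output):
    """Enumerate every grid path, as in FIG. 13, and keep the most probable one."""
    best = max(
        product(grids, repeat=len(observations)),
        key=lambda path: path_probability(path, observations, trans, output),
    )
    return path_probability(best, observations, trans, output), list(best)

def viterbi(observations, grids, trans, output):
    """Same maximum, keeping only the best partial path ending in each grid."""
    best = {g: (output[g][observations[0]], [g]) for g in grids}
    for obs in observations[1:]:
        step = {}
        for g in grids:
            prob, path = max(
                ((p * trans[prev][g], pth) for prev, (p, pth) in best.items()),
                key=lambda item: item[0],
            )
            step[g] = (prob * output[g][obs], path + [g])
        best = step
    return max(best.values(), key=lambda item: item[0])

# Hypothetical parameters for three grids and two base stations.
grids = ["I", "II", "III"]
trans = {
    "I":   {"I": 0.5, "II": 0.3, "III": 0.2},
    "II":  {"I": 0.25, "II": 0.5, "III": 0.25},
    "III": {"I": 0.2, "II": 0.3, "III": 0.5},
}
output = {"I": {"A": 0.9, "B": 0.1}, "II": {"A": 0.5, "B": 0.5}, "III": {"A": 0.1, "B": 0.9}}
observations = ["A", "A", "B", "B"]  # base stations at times 0 to 3, as in FIG. 12

prob, path = viterbi(observations, grids, trans, output)
check_prob, check_path = brute_force(observations, grids, trans, output)
print(path, prob)
print(abs(prob - check_prob) < 1e-12)  # the two approaches agree on the maximum
```

Enumeration grows as M to the power of the number of regular time points, whereas the dynamic program grows linearly in the number of time points, which matters once there are more than a handful of grids.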
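The foregoing embodiments describe the user trajectory recovery method mainly with reference to the user trajectory recovery apparatus shown in FIG. 2. It can be understood that, to implement the foregoing functions, the apparatus includes corresponding hardware structures and/or software modules for performing each function. A person skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented by hardware or by a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such an implementation should not be considered beyond the scope of this application.
FIG. 14 is a schematic diagram of the hardware structure of a user trajectory recovery apparatus 1400 provided in an embodiment of this application. The user trajectory recovery apparatus 1400 includes a processor 1401, a communications bus 1402, and a communications interface 1404.
The processor 1401 may be a general-purpose processor, such as a central processing unit (CPU), a network processor (NP), or a combination of a CPU and an NP. The processor 1401 may also be a microprocessor (MCU), an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof.
The communications bus 1402 may include a path for transferring information between the foregoing components.
The communications interface 1404 uses any transceiver-type apparatus to communicate with other devices or communication networks, and may include an Ethernet interface, a radio access network (RAN) interface, a wireless local area network (WLAN) interface, and so on.
Optionally, as shown in FIG. 14, the user trajectory recovery apparatus 1400 may further include a memory 1403. The memory 1403 may include volatile memory, such as random-access memory (RAM); it may include non-volatile memory, such as flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); and it may include a combination of these kinds of memory.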
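The memory 1403 is configured to store program code. The processor 1401 is configured to execute the program code stored in the memory 1403 to implement the user trajectory recovery method shown in FIG. 4.
In a specific implementation, the processor 1401 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 14. Each CPU may be single-core or multi-core.
In a specific implementation, in an embodiment, the user trajectory recovery apparatus 1400 may further include an output device 1405 and an input device 1406. The output device 1405 communicates with the processor 1401 and can display information in multiple ways. For example, the output device 1405 may be a liquid crystal display (LCD), a light emitting diode (LED) display device, a cathode ray tube (CRT) display device, or a projector. The input device 1406 communicates with the processor 1401 and can accept user input in multiple ways. For example, the input device 1406 may be a mouse, a keyboard, a touchscreen device, or a sensing device.
The user trajectory recovery apparatus 1400 may be a general-purpose computer device or a special-purpose computer device. In a specific implementation, it may be a desktop computer, a portable computer, a network server, a personal digital assistant (PDA), a mobile phone, a tablet computer, a wireless terminal device, an embedded device, or a device with a structure similar to that in FIG. 14. The embodiments of this application do not limit the type of the user trajectory recovery apparatus 1400.
Because the user trajectory recovery apparatus provided in the embodiments of this application can perform the foregoing user trajectory recovery method, for the technical effects it can achieve, refer to the foregoing method embodiments. Details are not repeated here.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When a software program is used, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (for example, coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (for example, infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).
Although this application is described herein with reference to the embodiments, in the course of implementing the claimed application, a person skilled in the art can understand and effect other variations of the disclosed embodiments by studying the drawings, the disclosure, and the appended claims. In the claims, the word "comprising" does not exclude other components or steps, and "a" or "an" does not exclude a plurality. A single processor or other unit may fulfil several functions recited in the claims. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that these measures cannot be combined to good effect.
Although this application is described with reference to specific features and embodiments, it is evident that various modifications and combinations may be made without departing from the spirit and scope of this application. Accordingly, the specification and drawings are merely illustrative of this application as defined by the appended claims, and are considered to cover any and all modifications, variations, combinations, or equivalents within the scope of this application. A person skilled in the art can make various changes and variations to this application without departing from its scope. If these modifications and variations fall within the scope of the claims of this application and their equivalent technologies, this application is intended to include them.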
Claims (16)
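- A user trajectory recovery method, comprising: obtaining a raw trajectory point data sequence of a user whose trajectory is to be recovered, wherein each piece of raw trajectory point data in the raw trajectory point data sequence comprises an identifier of the base station corresponding to the raw trajectory point and a collection time point; determining, based on a mapping model and the raw trajectory point data sequence and with an overall mapping cost minimized, the identifier of the base station at which the user is located at each regular time point, to obtain a denoised trajectory point data sequence of the user, wherein each piece of denoised trajectory point data in the denoised trajectory point data sequence comprises an identifier of the base station corresponding to the denoised trajectory point data and a regular time point; the regular time points are time points at a fixed time interval; and constraints of the mapping model comprise: one piece of raw trajectory point data is mapped to one base station identifier at one regular time point, and at a same regular time point, multiple pieces of raw trajectory point data are mapped to at most one base station identifier; and recovering, according to the denoised trajectory point data sequence, the trajectory of the user whose trajectory is to be recovered.
- The method according to claim 1, wherein the recovering, according to the denoised trajectory point data sequence, the trajectory of the user whose trajectory is to be recovered comprises: determining, based on the denoised trajectory point data sequence and a pre-trained user-base station model, a base-station-granularity denoised trajectory point data sequence of the user, wherein parameters of the user-base station model comprise transition probabilities among N base stations, and N is the number of distinct base station identifiers comprised in the raw trajectory point data sequence; and recovering, according to the base-station-granularity denoised trajectory point data sequence, the trajectory of the user whose trajectory is to be recovered.
- The method according to claim 1, wherein the recovering, according to the denoised trajectory point data sequence, the trajectory of the user whose trajectory is to be recovered comprises: determining, based on the denoised trajectory point data sequence and a pre-trained user-base station model, a base-station-granularity denoised trajectory point data sequence of the user, wherein parameters of the user-base station model comprise transition probabilities among N base stations, and N is the number of distinct base station identifiers comprised in the raw trajectory point data sequence; determining, based on the base-station-granularity trajectory point data sequence and a pre-trained base station-geographic grid model, a geographic-grid-granularity denoised trajectory point data sequence of the user, wherein parameters of the base station-geographic grid model comprise transition probabilities among M geographic grids and output probabilities of each geographic grid for the N base stations, and M is a positive integer; and recovering, according to the geographic-grid-granularity denoised trajectory point data sequence, the trajectory of the user whose trajectory is to be recovered.
- The method according to claim 2 or 3, wherein the determining, based on the denoised trajectory point data sequence and the pre-trained user-base station model, the base-station-granularity denoised trajectory point data sequence of the user comprises: determining, according to the regular time points comprised in the denoised trajectory point data in the denoised trajectory point data sequence, the regular time points comprised in the trajectory point data missing from the denoised trajectory point data sequence; determining, according to the transition probabilities among the N base stations and a first preset formula, for each case in which the base station at which the user is located at a regular time point comprised in the missing trajectory point data is any one of the N base stations, a transition probability of a first complete path formed by the base stations corresponding to the base station identifiers comprised in the missing trajectory point data and the base stations corresponding to the base station identifiers comprised in the denoised trajectory point data, wherein the first preset formula comprises: transition probability of the first complete path = product of the transition probabilities between the base stations on the first complete path; determining the base stations on the first complete path with the highest transition probability as the base stations at which the user whose trajectory is to be recovered is located at different regular time points; and determining, according to the base stations at which the user is located at the different regular time points, the base-station-granularity denoised trajectory point data sequence of the user whose trajectory is to be recovered.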
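- The method according to any one of claims 2 to 4, further comprising: obtaining multiple pieces of first training data for training the user-base station model, wherein each piece of first training data comprises a base station identifier and a collection time point; determining, according to each piece of first training data, the number of transitions from any one of the N base stations to any one of the N base stations; and determining, according to these numbers of transitions and a third preset formula, the transition probabilities among the N base stations, wherein the third preset formula comprises: $\omega(n_1,n_2)=\alpha(n_1,n_2)\big/\sum_{n_2=1}^{N}\alpha(n_1,n_2)$, where α(n1,n2) is the number of transitions from the n1-th base station to the n2-th base station, $\sum_{n_2=1}^{N}\alpha(n_1,n_2)$ is the total number of transitions from the n1-th base station to the N base stations, and ω(n1,n2) is the transition probability from the n1-th base station to the n2-th base station.
- The method according to any one of claims 2 to 5, wherein the user-base station model is a Markov model.
- The method according to any one of claims 3 to 6, wherein the base station-geographic grid model is a hidden Markov model.
- A user trajectory recovery apparatus, comprising an obtaining module, a denoising module, and a trajectory recovery module, wherein the obtaining module is configured to obtain a raw trajectory point data sequence of a user whose trajectory is to be recovered, wherein each piece of raw trajectory point data in the raw trajectory point data sequence comprises an identifier of the base station corresponding to the raw trajectory point and a collection time point; the denoising module is configured to determine, based on a mapping model and the raw trajectory point data sequence and with an overall mapping cost minimized, the identifier of the base station at which the user is located at each regular time point, to obtain a denoised trajectory point data sequence of the user, wherein each piece of denoised trajectory point data in the denoised trajectory point data sequence comprises an identifier of the base station corresponding to the denoised trajectory point data and a regular time point; the regular time points are time points at a fixed time interval; and constraints of the mapping model comprise: one piece of raw trajectory point data is mapped to one base station identifier at one regular time point, and at a same regular time point, multiple pieces of raw trajectory point data are mapped to at most one base station identifier; and the trajectory recovery module is configured to recover, according to the denoised trajectory point data sequence, the trajectory of the user whose trajectory is to be recovered.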
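- The apparatus according to claim 8, wherein the trajectory recovery module is specifically configured to: determine, based on the denoised trajectory point data sequence and a pre-trained user-base station model, a base-station-granularity denoised trajectory point data sequence of the user, wherein parameters of the user-base station model comprise transition probabilities among N base stations, and N is the number of distinct base station identifiers comprised in the raw trajectory point data sequence; and recover, according to the base-station-granularity denoised trajectory point data sequence, the trajectory of the user whose trajectory is to be recovered.
- The apparatus according to claim 8, wherein the trajectory recovery module is specifically configured to: determine, based on the denoised trajectory point data sequence and a pre-trained user-base station model, a base-station-granularity denoised trajectory point data sequence of the user, wherein parameters of the user-base station model comprise transition probabilities among N base stations, and N is the number of distinct base station identifiers comprised in the raw trajectory point data sequence; determine, based on the base-station-granularity trajectory point data sequence and a pre-trained base station-geographic grid model, a geographic-grid-granularity denoised trajectory point data sequence of the user, wherein parameters of the base station-geographic grid model comprise transition probabilities among M geographic grids and output probabilities of each geographic grid for the N base stations, and M is a positive integer; and recover, according to the geographic-grid-granularity denoised trajectory point data sequence, the trajectory of the user whose trajectory is to be recovered.
- The apparatus according to claim 9 or 10, wherein the trajectory recovery module is specifically configured to: determine, according to the regular time points comprised in the denoised trajectory point data in the denoised trajectory point data sequence, the regular time points comprised in the trajectory point data missing from the denoised trajectory point data sequence; determine, according to the transition probabilities among the N base stations and a first preset formula, for each case in which the base station at which the user is located at a regular time point comprised in the missing trajectory point data is any one of the N base stations, a transition probability of a first complete path formed by the base stations corresponding to the base station identifiers comprised in the missing trajectory point data and the base stations corresponding to the base station identifiers comprised in the denoised trajectory point data, wherein the first preset formula comprises: transition probability of the first complete path = product of the transition probabilities between the base stations on the first complete path; determine the base stations on the first complete path with the highest transition probability as the base stations at which the user whose trajectory is to be recovered is located at different regular time points; and determine, according to the base stations at which the user is located at the different regular time points, the base-station-granularity denoised trajectory point data sequence of the user whose trajectory is to be recovered.
- The apparatus according to any one of claims 9 to 11, wherein the trajectory recovery module is further specifically configured to: obtain multiple pieces of first training data for training the user-base station model, wherein each piece of first training data comprises a base station identifier and a collection time point; determine, according to each piece of first training data, the number of transitions from any one of the N base stations to any one of the N base stations; and determine, according to these numbers of transitions and a third preset formula, the transition probabilities among the N base stations, wherein the third preset formula comprises: $\omega(n_1,n_2)=\alpha(n_1,n_2)\big/\sum_{n_2=1}^{N}\alpha(n_1,n_2)$, where α(n1,n2) is the number of transitions from the n1-th base station to the n2-th base station, $\sum_{n_2=1}^{N}\alpha(n_1,n_2)$ is the total number of transitions from the n1-th base station to the N base stations, and ω(n1,n2) is the transition probability from the n1-th base station to the n2-th base station.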
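- The apparatus according to any one of claims 9 to 12, wherein the user-base station model is a Markov model.
- The apparatus according to any one of claims 10 to 13, wherein the base station-geographic grid model is a hidden Markov model.
- A user trajectory recovery apparatus, comprising a processor and a communications interface, wherein the processor is connected to the communications interface through a bus; the communications interface is configured to communicate with the outside; and the processor is configured to perform the user trajectory recovery method according to any one of claims 1 to 7.
- The apparatus according to claim 15, further comprising a memory, wherein the memory is configured to store computer program instructions and is connected to the processor through the bus; and the processor is specifically configured to: when the user trajectory recovery apparatus runs, execute the computer program instructions stored in the memory, so that the user trajectory recovery apparatus performs the user trajectory recovery method according to any one of claims 1 to 7.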
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18764868.8A EP3592000B1 (en) | 2017-03-07 | 2018-01-23 | User path recovery method and device |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710132289.6A CN108574933B (zh) | 2017-03-07 | 2017-03-07 | 用户轨迹恢复方法及装置 |
CN201710132289.6 | 2017-03-07 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018161729A1 true WO2018161729A1 (zh) | 2018-09-13 |
Family
ID=63447322
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/073856 WO2018161729A1 (zh) | 2017-03-07 | 2018-01-23 | 用户轨迹恢复方法及装置 |
Country Status (3)
Country | Link |
---|---|
EP (1) | EP3592000B1 (zh) |
CN (1) | CN108574933B (zh) |
WO (1) | WO2018161729A1 (zh) |
Also Published As
Publication number | Publication date |
---|---|
EP3592000A1 (en) | 2020-01-08 |
CN108574933A (zh) | 2018-09-25 |
CN108574933B (zh) | 2020-11-27 |
EP3592000B1 (en) | 2021-03-31 |
EP3592000A4 (en) | 2020-01-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 18764868 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
ENP | Entry into the national phase |
Ref document number: 2018764868 Country of ref document: EP Effective date: 20191001 |