CN106327236B - Method and device for determining action track of user - Google Patents

Method and device for determining action track of user

Info

Publication number
CN106327236B
Authority
CN
China
Prior art keywords
sequence
segment
frequent
place
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510406207.3A
Other languages
Chinese (zh)
Other versions
CN106327236A (en)
Inventor
李辉
邓珂
李彦华
崔江涛
王蒙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Xidian University
Original Assignee
Huawei Technologies Co Ltd
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd and Xidian University
Priority to CN201510406207.3A
Publication of CN106327236A
Application granted
Publication of CN106327236B
Legal status: Active

Abstract

The embodiment of the invention discloses a method and a device for determining a user action track, relates to the technical field of communication, and is used for improving the accuracy of the determined user action track so as to improve the efficiency of navigation and place recommendation for the user. The method comprises the following steps: determining R place sequences according to a track sequence formed by N position data of a user; determining M target sequence segments according to the R place sequences; and taking a route formed by connecting the places in any one of the M target sequence segments in chronological order as the action track of the user. The technical scheme provided by the embodiment of the invention can be used in the process of determining the action track of the user.

Description

Method and device for determining action track of user
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a method and an apparatus for determining a user action trajectory.
Background
The action track of the user has important potential value in a plurality of fields; for example, after the action track of the user is known, navigation can be performed for the user, more accurate location recommendation can be realized for the user, and advertisement site selection can be optimized for the merchant. With the popularization of terminal devices (such as smart phones, tablet computers and the like) with an automatic positioning function, the position data of the user can be recorded at any time, and how to determine the action track of the user according to the position data of the user has important significance.
At present, once the position data of a user at different time points are known, the place closest to the position represented by each position data is found, and the route formed by connecting these places in chronological order is taken as the action track of the user.
The action track determined by this method is not necessarily the one that best matches the position data, so its accuracy is low, and navigation and place recommendation that rely on it are correspondingly inefficient.
Disclosure of Invention
The embodiment of the invention provides a method and a device for determining a user action track, which are used for improving the accuracy of the determined user action track and further improving the efficiency of navigation and place recommendation for the user.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
in a first aspect, a method for determining a user action track is provided, which includes:
determining R place sequences according to a track sequence formed by N position data of a user; wherein a place sequence is composed of N places, and the nth place of the N places is one place in the place set corresponding to the nth position data of the N position data; the place set corresponding to one position data is a fuzzy set of all places within a preset range centered on the position data; 1 ≤ n ≤ N, R ≥ 1, and n, N and R are integers;
determining M target sequence segments according to the R place sequences; wherein, when the frequency of a sequence segment in a plurality of place sequences among the R place sequences is greater than a preset frequency and the sum of the probabilities of the plurality of place sequences is greater than a preset probability, the sequence segment is a target sequence segment; the frequency of a sequence segment in a place sequence is the number of times the sequence segment appears in the place sequence; the probability of a place sequence is the product of the probabilities of all places in the place sequence, and the probability of a place is the probability of the place within the place set to which it belongs; M ≥ 1 and M is an integer;
and taking a route formed by connecting the places in any one of the M target sequence segments in series according to the time sequence as the action track of the user.
With reference to the first aspect, in a first possible implementation manner, the determining M target sequence segments according to the R place sequences includes:
generating a frequent segment candidate set of length x+1 according to the frequent segments of length x, wherein when x is 0, the frequent segment candidate set of length 1 is the set of all distinct places that make up the R place sequences; x ≥ 0 and x is an integer;
scanning each of the R place sequences for each sequence segment in the frequent segment candidate set of length x+1, to obtain the frequency of each sequence segment in the frequent segment candidate set of length x+1 in each of the R place sequences;
when the frequency of a sequence segment in the frequent segment candidate set of length x+1 in a plurality of place sequences is greater than a preset frequency and the sum of the probabilities of the plurality of place sequences is greater than a preset probability, determining that the sequence segment is a frequent segment of length x+1;
determining M frequent segments of all the frequent segments as M target sequence segments.
With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner, the determining that M frequent segments of all the frequent segments are M target sequence segments includes:
determining M closed frequent segments among all the frequent segments as the M target sequence segments; wherein a closed frequent segment among all the frequent segments is a frequent segment that is not a sub-segment of any other frequent segment among all the frequent segments.
With reference to the first possible implementation manner or the second possible implementation manner of the first aspect, in a third possible implementation manner, in a process of scanning one sequence segment in the frequent segment candidate set with the length of x +1 in one place sequence, the frequency of the sequence segment in the place sequence is recorded by an automaton.
With reference to the first aspect and any one of the first possible implementation manner to the third possible implementation manner of the first aspect, in a fourth possible implementation manner, the track sequence is a track sequence of the user within a preset time period.
With reference to the first aspect and any one of the first possible implementation manner to the fourth possible implementation manner of the first aspect, in a fifth possible implementation manner, the method further includes:
clustering target sequence segments of a plurality of users to obtain k clusters, wherein k ≥ 1 and k is an integer;
representing each user as a user vector with k dimensions, wherein one dimension corresponds to one cluster, and the value of a dimension is the number of target sequence segments of the user that fall in the cluster corresponding to the dimension;
and establishing a Gaussian mixture model, and fitting parameters of the Gaussian mixture model according to the user vectors of the plurality of users, wherein the Gaussian mixture model is composed of a plurality of Gaussian models, one Gaussian model corresponds to one user group, and the user groups corresponding to the plurality of Gaussian models together comprise the plurality of users.
In a second aspect, an apparatus for determining a user action track is provided, which includes:
a first determining unit, configured to determine R place sequences according to a track sequence formed by N position data of a user; wherein a place sequence is composed of N places, and the nth place of the N places is one place in the place set corresponding to the nth position data of the N position data; the place set corresponding to one position data is a fuzzy set of all places within a preset range centered on the position data; 1 ≤ n ≤ N, R ≥ 1, and n, N and R are integers;
a second determining unit, configured to determine M target sequence segments according to the R place sequences; wherein, when the frequency of a sequence segment in a plurality of place sequences among the R place sequences is greater than a preset frequency and the sum of the probabilities of the plurality of place sequences is greater than a preset probability, the sequence segment is a target sequence segment; the frequency of a sequence segment in a place sequence is the number of times the sequence segment appears in the place sequence; the probability of a place sequence is the product of the probabilities of all places in the place sequence, and the probability of a place is the probability of the place within the place set to which it belongs; M ≥ 1 and M is an integer;
and the execution unit is used for taking a route formed by connecting the places in any one of the M target sequence segments in series according to the time sequence as the action track of the user.
With reference to the second aspect, in a first possible implementation manner, the second determining unit includes:
a generating unit, configured to generate a frequent segment candidate set of length x+1 according to the frequent segments of length x, wherein when x is 0, the frequent segment candidate set of length 1 is the set of all distinct places that make up the R place sequences; x ≥ 0 and x is an integer;
a scanning unit, configured to scan each of the R place sequences for each sequence segment in the frequent segment candidate set of length x+1, to obtain the frequency of each sequence segment in the frequent segment candidate set of length x+1 in each of the R place sequences;
a first determining subunit, configured to determine that a sequence segment in the frequent segment candidate set of length x+1 is a frequent segment of length x+1 when the frequency of the sequence segment in a plurality of place sequences is greater than a preset frequency and the sum of the probabilities of the plurality of place sequences is greater than a preset probability;
and the second determining subunit is configured to determine that M frequent segments of all the frequent segments are M target sequence segments.
With reference to the first possible implementation manner of the second aspect, in a second possible implementation manner, the second determining subunit is specifically configured to:
determining M closed frequent segments among all the frequent segments as the M target sequence segments; wherein a closed frequent segment among all the frequent segments is a frequent segment that is not a sub-segment of any other frequent segment among all the frequent segments.
With reference to the first possible implementation manner or the second possible implementation manner of the second aspect, in a third possible implementation manner, the scanning unit records, by an automaton, the frequency of a sequence segment in a place sequence during a process of scanning the sequence segment in the frequent segment candidate set with the length of x +1 in the place sequence.
With reference to the second aspect and any one of the first possible implementation manner to the third possible implementation manner of the second aspect, in a fourth possible implementation manner, the track sequence is a track sequence of the user within a preset time period.
With reference to the second aspect and any one of the first possible implementation manner to the fourth possible implementation manner of the second aspect, in a fifth possible implementation manner, the apparatus further includes:
a clustering unit, configured to cluster the target sequence segments of a plurality of users to obtain k clusters, wherein k ≥ 1 and k is an integer;
a representing unit, configured to represent each user as a user vector with k dimensions, wherein one dimension corresponds to one cluster, and the value of a dimension is the number of target sequence segments of the user that fall in the cluster corresponding to the dimension;
and a calculation unit, configured to establish a Gaussian mixture model and fit parameters of the Gaussian mixture model according to the user vectors of the plurality of users, wherein the Gaussian mixture model is composed of a plurality of Gaussian models, one Gaussian model corresponds to one user group, and the user groups corresponding to the plurality of Gaussian models together comprise the plurality of users.
The method and the device provided by the embodiment of the invention can determine the place sequence corresponding to the user according to the action track formed by the historical position data of the user, and when the frequency of one sequence segment in a plurality of place sequences is higher and the sum of the probabilities of the plurality of place sequences is higher, the probability that the user passes through the places in the sequence segment is higher. If the values of the preset frequency and the preset probability are reasonably set, compared with the prior art, after the target sequence segment is determined according to the method provided by the embodiment of the invention, the action track of the user determined according to the target sequence segment is more accurate, and the efficiency of navigation and location recommendation for the user is higher according to the action track.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart illustrating a method for determining a trajectory of a user action according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a track sequence provided by an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an apparatus for determining a user action trajectory according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of another apparatus for determining a trajectory of a user action according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of another apparatus for determining a user action trajectory according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The execution subject of the embodiment of the present invention may be a server, a workstation, a general PC (Personal Computer), and the like, the server may specifically be an HP DL series server, and the workstation may specifically be a Z820 series workstation. The position data of the user in the embodiment of the invention refers to historical position data of the user, and the position data of the user can be longitude and latitude values of the position where the user is located at a certain moment. The position data of the user may also be represented in other ways, and the embodiment of the present invention is not limited to this specifically.
An embodiment of the present invention provides a method for determining a user action trajectory, as shown in fig. 1, including:
101. Determining R place sequences according to a track sequence formed by N position data of a user; wherein a place sequence is composed of N places, and the nth place of the N places is one place in the place set corresponding to the nth position data of the N position data; the place set corresponding to one position data is a fuzzy set of all places within a preset range centered on the position data; 1 ≤ n ≤ N, R ≥ 1, and n, N and R are integers.
The place set corresponding to one position data may contain one or more places, where the position data is any one of the N position data.
The places in the place set may be, for example, squares, bars, upscale shopping malls, Italian restaurants, McDonald's restaurants, libraries, bus stops, parking lots, museums and the like, or other places, which is not limited in this embodiment of the present invention.
Specifically, when the numbers of places in the N place sets corresponding to the N position data are $r_1, r_2, \ldots, r_N$ respectively, the number of place sequences determined from the track sequence composed of the N position data is R, where $R = r_1 \cdot r_2 \cdot \ldots \cdot r_N$.
Illustratively, as shown in FIG. 2, a track sequence is composed of 4 position data of a user, denoted $N_1$, $N_2$, $N_3$ and $N_4$. The trajectory actually represented by the track sequence may be curved; it is drawn as a straight line in FIG. 2 for simplicity. Each dotted circle centered on $N_1$, $N_2$, $N_3$ or $N_4$ represents the preset range centered on that position data. The place set corresponding to $N_1$ is a fuzzy set composed of places A, B and C; the place set corresponding to $N_2$ is a fuzzy set composed of places A, B and C; the place set corresponding to $N_3$ is a fuzzy set composed of places A and D; and the place set corresponding to $N_4$ is a fuzzy set composed of place B.
From this track sequence, 18 place sequences can be determined, namely: [ A, A, A, B ]; [ A, A, D, B ]; [ A, B, A, B ]; [ A, B, D, B ]; [ A, C, A, B ]; [ A, C, D, B ]; [ B, A, A, B ]; [ B, A, D, B ]; [ B, B, A, B ]; [ B, B, D, B ]; [ B, C, A, B ]; [ B, C, D, B ]; [ C, A, A, B ]; [ C, A, D, B ]; [ C, B, A, B ]; [ C, B, D, B ]; [ C, C, A, B ]; [ C, C, D, B ].
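The enumeration in step 101 amounts to a Cartesian product of the place sets. The following is a minimal sketch of that reading; the function name `enumerate_place_sequences` and the list layout are illustrative assumptions, not part of the described method.

```python
from itertools import product

# Place sets for the four position data N1..N4 of the FIG. 2 example.
place_sets = [
    ["A", "B", "C"],  # N1
    ["A", "B", "C"],  # N2
    ["A", "D"],       # N3
    ["B"],            # N4
]

def enumerate_place_sequences(place_sets):
    """Return every place sequence: one place chosen from each place set, in order."""
    return [list(seq) for seq in product(*place_sets)]

place_sequences = enumerate_place_sequences(place_sets)
print(len(place_sequences))   # R = 3 * 3 * 2 * 1
print(place_sequences[3])     # the 4th place sequence
```

Because `itertools.product` varies the last position fastest, the sequences come out in the same order as the list above, so the 4th and 15th place sequences referred to later can be looked up by index.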
102. Determining M target sequence segments according to the R place sequences; wherein, when the frequency of a sequence segment in a plurality of place sequences among the R place sequences is greater than a preset frequency and the sum of the probabilities of the plurality of place sequences is greater than a preset probability, the sequence segment is a target sequence segment; the frequency of a sequence segment in a place sequence is the number of times the sequence segment appears in the place sequence; the probability of a place sequence is the product of the probabilities of all places in the place sequence, and the probability of a place is the probability of the place within the place set to which it belongs; M ≥ 1 and M is an integer.
A target sequence segment may be composed of one place or a plurality of places.
Specifically, when a place sequence is [ C, B, A, B ], the sequence segments [ AB ], [ CA ], [ BB ], [ CB ] and the like are all sub-segments of the place sequence; that is, a sub-segment of a place sequence is composed of at least one place in the place sequence, arranged in the order in which those places appear in the place sequence.
Based on the example illustrated in FIG. 2, the sequence segment [ A ] appears 3 times in the 1st place sequence, so the frequency of [ A ] in the 1st place sequence is 3. The sequence segment [ AB ] appears 2 times in the 3rd place sequence, so the frequency of [ AB ] in the 3rd place sequence is 2. It should be noted that when the place sequence is [ A, C, B, A, E, B, D, B ], the frequency of the sequence segment [ AB ] in the place sequence is also 2; that is, when the frequency of a sequence segment is determined, the places in the sequence segment need not appear consecutively in the place sequence, as long as their order in the sequence segment is consistent with their order of appearance in the place sequence.
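The text defines frequency only through examples. One counting rule that reproduces all three of them is a greedy left-to-right scan that counts non-overlapping, in-order matches; the sketch below assumes that rule, and the function name `segment_frequency` is hypothetical.

```python
def segment_frequency(segment, place_sequence):
    """Count non-overlapping, in-order (not necessarily contiguous) matches."""
    if not segment:
        return 0
    count = 0
    pos = 0  # index of the segment place we are currently waiting for
    for place in place_sequence:
        if place == segment[pos]:
            pos += 1
            if pos == len(segment):  # whole segment matched once
                count += 1
                pos = 0              # start looking for the next occurrence
    return count

print(segment_frequency(["A"], ["A", "A", "A", "B"]))
print(segment_frequency(["A", "B"], ["A", "B", "A", "B"]))
print(segment_frequency(["A", "B"], ["A", "C", "B", "A", "E", "B", "D", "B"]))
```

Counting every possible in-order pairing instead would give 3 for [ AB ] in [ A, B, A, B ], which does not match the stated value of 2, so the non-overlapping rule seems the more plausible reading.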
Specifically, in a place set corresponding to one piece of location data, the probability of each place is the probability that the user goes to the place from the location represented by the location data. Since the closer a place is to the location data, the greater the probability that the user will go from the location represented by the location data to the place, the probability of each place in the set of places is inversely proportional to the distance between the place and the location represented by the location data.
In a location set corresponding to one location data, the probability of one location may be calculated by Rayleigh (Rayleigh) distribution, and specifically may be:
Figure BDA0000757511380000081
where d is the distance between the position represented by the position data and the place, f(d) is the probability of the place in the place set, and σ generally takes the value 1.
Illustratively, based on the example shown in FIG. 2, the place set corresponding to $N_2$ is a fuzzy set consisting of places A, B and C, and the probability of place B in this place set is $f(d_2)$, where, as shown in FIG. 2, $d_2$ is the distance between $N_2$ and place B.
Taking the 4th place sequence as an example, the probability of the 4th place sequence is $P = f(d_1) \cdot f(d_2) \cdot f(d_3) \cdot f(d_4)$, where $d_1$ is the distance between $N_1$ and A, $d_2$ is the distance between $N_2$ and B, $d_3$ is the distance between $N_3$ and D, and $d_4$ is the distance between $N_4$ and B.
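The product rule for a place sequence can be sketched as follows. The per-place probabilities are hypothetical numbers chosen only for illustration; in the method they would come from f(d), whose exact form is given in the equation figure and is not assumed here. The function name `sequence_probability` is likewise an assumption.

```python
from math import prod

def sequence_probability(place_sequence, place_probabilities):
    """Multiply, position by position, the probability of the chosen place."""
    return prod(
        probs[place] for place, probs in zip(place_sequence, place_probabilities)
    )

# Hypothetical probabilities of each place within the place sets of N1..N4.
place_probabilities = [
    {"A": 0.5, "B": 0.3, "C": 0.2},  # N1
    {"A": 0.2, "B": 0.6, "C": 0.2},  # N2
    {"A": 0.4, "D": 0.6},            # N3
    {"B": 1.0},                      # N4
]

# 4th place sequence of the FIG. 2 example: [A, B, D, B]
p = sequence_probability(["A", "B", "D", "B"], place_probabilities)
print(p)
```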
In addition, when the preset range of one position data is the area within D meters of the position represented by the position data, the probability of a place in the place set corresponding to the position data can also be calculated by:
Figure BDA0000757511380000091
where d is the distance between the place and the position represented by the position data.
Illustratively, taking the 4th place sequence as an example, the probability of the 4th place sequence is
Figure BDA0000757511380000092
Figure BDA0000757511380000093
103. And taking a route formed by connecting the places in any one of the M target sequence segments in series according to the time sequence as the action track of the user.
Generally, the daily action trajectory of the user has a certain regularity (for example, office workers generally have a fixed action trajectory during working days). The method provided by the embodiment of the invention can determine the place sequence corresponding to the user according to the action track formed by the historical position data of the user, and when the frequency of one sequence segment in a plurality of place sequences is higher and the sum of the probabilities of the plurality of place sequences is higher, the probability that the user passes through the place in the sequence segment is higher. If the values of the preset frequency and the preset probability are reasonably set, compared with the prior art, after the target sequence segment is determined according to the method provided by the embodiment of the invention, the action track of the user determined according to the target sequence segment is more accurate, and the efficiency of navigation and location recommendation for the user is higher according to the action track.
Optionally, the step 102 may be implemented as follows:
1021. Generating a frequent segment candidate set of length x+1 according to the frequent segments of length x, wherein when x is 0, the frequent segment candidate set of length 1 is the set of all distinct places that make up the R place sequences; x ≥ 0 and x is an integer.
1022. Scanning each of the R place sequences for each sequence segment in the frequent segment candidate set of length x+1, to obtain the frequency of each sequence segment in the frequent segment candidate set of length x+1 in each of the R place sequences.
1023. And when the frequency of a sequence fragment in the frequent fragment candidate set with the length of x +1 in a plurality of place sequences is greater than a preset frequency and the sum of the probabilities of the place sequences is greater than a preset probability, determining that the sequence fragment is a frequent fragment with the length of x + 1.
1024. Determining M frequent segments of all the frequent segments as M target sequence segments.
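Steps 1021 to 1023 can be read as a level-wise loop. The sketch below is one possible interpretation, not the patented implementation: a candidate is treated as frequent when the summed probability of the place sequences in which its frequency exceeds `min_freq` is greater than `min_prob`. All function names, the greedy counting rule, the insertion-based candidate generation and the sample probabilities are assumptions.

```python
def segment_frequency(segment, sequence):
    """Greedy count of non-overlapping, in-order matches (assumed counting rule)."""
    count, pos = 0, 0
    for place in sequence:
        if place == segment[pos]:
            pos += 1
            if pos == len(segment):
                count, pos = count + 1, 0
    return count

def generate_candidates(frequent_x, frequent_1):
    """Insert each length-1 frequent segment at every position of each length-x one."""
    candidates = set()
    for seg in frequent_x:
        for (place,) in frequent_1:
            for i in range(len(seg) + 1):
                candidates.add(seg[:i] + (place,) + seg[i:])
    return candidates

def mine_frequent_segments(sequences, probabilities, min_freq, min_prob):
    # Step 1021 with x = 0: every distinct place is a length-1 candidate.
    candidates = {(place,) for seq in sequences for place in seq}
    frequent_1, all_frequent = None, []
    while candidates:
        frequent = set()
        for cand in candidates:
            # Steps 1022/1023: sum the probabilities of the place sequences
            # in which the candidate's frequency exceeds the preset frequency.
            support = sum(
                prob for seq, prob in zip(sequences, probabilities)
                if segment_frequency(cand, seq) > min_freq
            )
            if support > min_prob:
                frequent.add(cand)
        if frequent_1 is None:
            frequent_1 = frequent
        all_frequent.extend(sorted(frequent))
        candidates = generate_candidates(frequent, frequent_1)  # next level
    return all_frequent

sequences = [("A", "B", "A", "B"), ("A", "B", "D", "B"), ("C", "B", "A", "B")]
probabilities = [0.5, 0.3, 0.2]  # hypothetical place-sequence probabilities
result = mine_frequent_segments(sequences, probabilities, min_freq=1, min_prob=0.4)
print(result)
```

With non-negative thresholds the loop stops on its own, because a segment longer than every place sequence has frequency 0 and can never be frequent.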
Illustratively, based on the example described in FIG. 2, the frequent segment candidate set of length 1 is the set of all distinct places constituting the above 18 place sequences, i.e., the set consisting of [ A ], [ B ], [ C ] and [ D ].
Specifically, an Apriori algorithm may be used to generate the frequent segment candidate set of length x+1 from the frequent segments of length x. Alternatively, the frequent segment candidate set of length x+1 can be obtained by inserting each frequent segment of length 1, in turn, at every position of each frequent segment of length x; for simplicity of description, the embodiments of the present invention are described below taking this case as an example.
Illustratively, when the frequent segments of length 1 are [ A ] and [ B ], then the frequent segment candidate set of length 2 is a set consisting of sequence segments [ AA ], [ AB ], [ BA ] and [ BB ].
Illustratively, when the frequent fragments with the length of 2 are [ AB ] and [ AC ], and the frequent fragments with the length of 1 are [ A ], [ D ], the sequence fragments obtained by inserting [ A ] into [ AB ] are [ AAB ] and [ ABA ]; the sequence fragments obtained by inserting the [ D ] into the [ AB ] are [ DAB ], [ ADB ] and [ ABD ]; similarly, the [ A ] is inserted into the [ AC ], the [ D ] is inserted into the [ AC ] to obtain other sequence fragments, and the frequent fragment candidate set with the length of 3 is a set consisting of [ AAB ], [ ABA ], [ DAB ], [ ADB ], [ ABD ], [ AAC ], [ ACA ], [ DAC ], [ ADC ] and [ ACD ].
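The insertion-based candidate generation in the two examples above can be sketched as follows. Segments are represented as tuples of place names, and the function name `insert_candidates` is an illustrative assumption.

```python
def insert_candidates(frequent_x, frequent_1):
    """Insert every length-1 frequent segment at every position of every
    length-x frequent segment; duplicates collapse because a set is used."""
    candidates = set()
    for segment in frequent_x:
        for (place,) in frequent_1:
            for i in range(len(segment) + 1):
                candidates.add(segment[:i] + (place,) + segment[i:])
    return candidates

length_2 = insert_candidates({("A",), ("B",)}, {("A",), ("B",)})
length_3 = insert_candidates({("A", "B"), ("A", "C")}, {("A",), ("D",)})
print(sorted("".join(c) for c in length_2))
print(sorted("".join(c) for c in length_3))
```

Using a set is a design choice that removes duplicates such as [ AAB ], which arises twice when [ A ] is inserted before and after the first place of [ AB ].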
Optionally, in the process of scanning a sequence segment in the frequent segment candidate set with the length of x +1 in one of the place sequences, the frequency of the sequence segment in the place sequence is recorded by an automaton.
In this case, an automaton is first set up for each sequence segment in the frequent segment candidate set. Taking the 15th place sequence above, [ C, B, A, B ], as an example, when the sequence segments [ CB ] and [ CA ] of length 2 are scanned, [ CB ] corresponds to a first automaton and [ CA ] corresponds to a second automaton. When the first place C is scanned, both automata enter the initial state C. When the second place B is scanned, the first automaton enters its second state B; since B is the final state of the sequence segment [ CB ], the frequency of [ CB ] is increased by 1. When the third place A is scanned, the second automaton enters its second state A; since A is the final state of the sequence segment [ CA ], the frequency of [ CA ] is increased by 1. The two automata then continue scanning; when a scanned place is C, the first automaton and the second automaton re-enter the initial state C, and the above process is repeated until the last place of the place sequence has been scanned, at which point the frequency of each sequence segment in the place sequence is determined.
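A minimal sketch of the automaton idea follows: one small state machine per candidate, all fed by a single pass over the place sequence. The class name `SegmentAutomaton`, the function `scan` and the integer encoding of states are assumptions made for illustration.

```python
class SegmentAutomaton:
    """Tracks progress through one candidate segment during a scan."""

    def __init__(self, segment):
        self.segment = tuple(segment)
        self.state = 0       # number of segment places matched so far
        self.frequency = 0

    def feed(self, place):
        if place == self.segment[self.state]:
            self.state += 1
            if self.state == len(self.segment):  # final state reached
                self.frequency += 1
                self.state = 0                   # wait for the first place again

def scan(place_sequence, candidates):
    automata = [SegmentAutomaton(c) for c in candidates]
    for place in place_sequence:      # single pass over the place sequence
        for automaton in automata:
            automaton.feed(place)
    return {a.segment: a.frequency for a in automata}

# 15th place sequence of the FIG. 2 example with candidates [CB] and [CA].
result = scan(["C", "B", "A", "B"], [("C", "B"), ("C", "A")])
print(result)
```

The benefit of this arrangement is that each place sequence is read once regardless of how many candidates are being counted.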
Optionally, the step 1024 may be implemented as follows: determining M closed frequent segments among all the frequent segments as the M target sequence segments; wherein a closed frequent segment among all the frequent segments is a frequent segment that is not a sub-segment of any other frequent segment among all the frequent segments. It should be noted that a frequent segment is not regarded as a sub-segment of itself.
In general, a large number of target sequence segments can be determined according to one track sequence of a user, and the number of target sequence segments can be reduced by the alternative method.
Exemplarily, suppose there are 6 frequent segments in total: the frequent segments of length 1 are [ A ], [ B ] and [ C ]; the frequent segments of length 2 are [ AB ] and [ AC ]; and the frequent segment of length 3 is [ ABC ]. Then [ A ], [ B ], [ C ], [ AB ] and [ AC ] are all sub-segments of [ ABC ], so [ ABC ] is the only closed frequent segment and can be determined to be a target sequence segment.
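Filtering for closed frequent segments, using the definition given here (a frequent segment that is not a sub-segment of any other frequent segment), can be sketched as below. The helper names `is_sub_segment` and `closed_frequent_segments` are hypothetical.

```python
def is_sub_segment(short, long):
    """True if the places of `short` occur in `long` in the same order,
    not necessarily contiguously."""
    remaining = iter(long)
    # `place in remaining` advances the iterator, so order is enforced.
    return all(place in remaining for place in short)

def closed_frequent_segments(frequent):
    """Keep only segments that are not a sub-segment of any other segment."""
    return [
        s for s in frequent
        if not any(s != t and is_sub_segment(s, t) for t in frequent)
    ]

frequent = [("A",), ("B",), ("C",), ("A", "B"), ("A", "C"), ("A", "B", "C")]
closed = closed_frequent_segments(frequent)
print(closed)
```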
The above embodiments describe a method for determining a target sequence segment of a user by using one track sequence of one user, and when there are multiple track sequences, the target sequence segment of the user can be determined according to the multiple track sequences by using the above method.
Optionally, the track sequence is a track sequence of the user within a preset time period.
Specifically, the preset time period may be 1 day, a morning or an afternoon, or another preset time period, which is not specifically limited in the embodiment of the present invention.
In a specific implementation, since the user may perform different activities in different time periods, in order to improve the accuracy of the determined action trajectory of the user, the target sequence segments of the user may be determined by using the daily trajectory sequence of the user in a working day or the target sequence segments of the user may be determined by using the daily trajectory sequence of the user in a resting day. When the action track of the user is determined by using the determined target sequence segment under the condition, different recommendation contents can be provided for the user according to the action track of the user on a working day and a rest day, and the recommendation efficiency is improved.
Optionally, the method may further include the following steps:
(1) Clustering the target sequence segments of a plurality of users to obtain k clusters, where k ≥ 1 and k is an integer.
(2) Representing each user as a user vector with k dimensions, where each dimension corresponds to one cluster, and the value in a dimension is the number of that user's target sequence segments in the cluster corresponding to the dimension.
(3) Establishing a Gaussian mixture model and fitting its parameters according to the user vectors of the plurality of users, where the Gaussian mixture model is composed of a plurality of Gaussian models, each Gaussian model corresponds to one user group, and the user groups corresponding to the Gaussian models are made up of the plurality of users.
In this alternative method, the target sequence segment of the user may be a target sequence segment determined according to one track sequence, or may be a target sequence segment determined according to a plurality of track sequences.
For example, the target sequence segments of the plurality of users may be clustered by an algorithm such as K-means or K-medoids, or by another algorithm; this is not limited in the embodiment of the present invention.
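Once every target sequence segment has a cluster label, step (2) reduces to counting. The sketch below assumes the clustering has already been done and that labels are integers from 0 to k-1; the function name and the sample labels are hypothetical.

```python
from collections import Counter


def user_vector(segment_clusters, k):
    """Build a k-dimensional vector: entry c counts the user's segments in cluster c."""
    counts = Counter(segment_clusters)
    return [counts.get(c, 0) for c in range(k)]


# Hypothetical cluster labels already assigned to each user's target segments.
labels_per_user = {
    "user_1": [0, 0, 2],
    "user_2": [1, 2, 2, 2],
}
k = 3
vectors = {user: user_vector(labels, k) for user, labels in labels_per_user.items()}
print(vectors)  # {'user_1': [2, 0, 1], 'user_2': [0, 1, 3]}
```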
Suppose that the plurality of users can be divided into J user groups, denoted $C_1, \ldots, C_J$. The GMM (Gaussian Mixture Model) is then composed of J Gaussian models, each corresponding to one user group. A Gaussian model can be constructed from the user vectors in its corresponding user group. Assuming that the I users have I user vectors (I is generally large), the probability density function of the GMM, formed as a linear combination of the J Gaussian models, is:
$$P(U_i \mid \Theta) = \sum_{j=1}^{J} P(C_j)\, P(U_i \mid C_j)$$
where $1 \le j \le J$, $1 \le i \le I$, and $i$, $I$, $j$, and $J$ are integers.
Here, $P(C_j)$ is the probability that the $j$th user group occurs among the $I$ users, $P(U_i \mid C_j)$ is the probability that the $i$th user appears in the $j$th user group, and $P(U_i \mid \Theta)$ is the probability that the $i$th user appears when the parameter is $\Theta$.
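To make the weighted sum of Gaussian models concrete, the sketch below evaluates the mixture density for one user vector. It is only an illustration: the function names are hypothetical, and each Gaussian is assumed to have a diagonal covariance (one variance per dimension), a simplification the text does not state.

```python
import math


def gaussian_density(u, mean, var):
    """Density of a Gaussian with diagonal covariance (a simplifying assumption)."""
    log_p = 0.0
    for x, m, v in zip(u, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return math.exp(log_p)


def mixture_density(u, weights, means, variances):
    """P(U_i | Theta) as the sum over j of P(C_j) * P(U_i | C_j)."""
    return sum(
        w * gaussian_density(u, m, v)
        for w, m, v in zip(weights, means, variances)
    )


# Two identical standard Gaussians with equal weights, evaluated at the origin.
value = mixture_density([0, 0], [0.5, 0.5], [[0, 0], [0, 0]], [[1, 1], [1, 1]])
```

Here `weights` plays the role of the group probabilities and must sum to 1.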
$\Theta$ comprises $2J$ parameters, $\mu_1, \ldots, \mu_J$ and $\Sigma_1, \ldots, \Sigma_J$, where $\mu_j$ and $\Sigma_j$ are the parameters corresponding to the $j$th user group. Hereinafter, $\mu_j^{(g)}$ denotes the $\mu_j$ of the $j$th user group obtained in the $g$th round of calculation ($g \ge 0$, $g$ an integer), and $\Sigma_j^{(g)}$ denotes the $\Sigma_j$ of the $j$th user group obtained in the $g$th round. The process of fitting the parameters of the GMM (that is, of calculating $\mu_1, \ldots, \mu_J$ and $\Sigma_1, \ldots, \Sigma_J$) includes:
(1) Using the $P(U_i \mid C_j)$ and $P(C_j)$ calculated from $\mu_j^{(g)}$ and $\Sigma_j^{(g)}$, calculate $\mu_j^{(g+1)}$; then, using $\mu_j^{(g+1)}$, calculate $\Sigma_j^{(g+1)}$ according to
Figure BDA0000757511380000131
and calculate $P(U_i \mid C_j)$ and $P(C_j)$ from $\mu_j^{(g+1)}$ and $\Sigma_j^{(g+1)}$. The $P(C_j \mid U_i)$ used when calculating $\mu_j^{(g+1)}$ can itself be calculated from the $P(U_i \mid C_j)$ and $P(C_j)$ obtained from $\mu_j^{(g)}$ and $\Sigma_j^{(g)}$.
(2) Set $g = g + 1$.
Steps (1) and (2) are repeated until $P(U_i \mid C_j)$ converges; the $\mu_j$ and $\Sigma_j$ at convergence are determined as the parameters corresponding to the $j$th user group.
When $g = 0$, $\mu_j$ and $\Sigma_j$ are initialized:
Figure BDA0000757511380000135
From $\mu_j^{(0)}$ and $\Sigma_j^{(0)}$, $P(U_i \mid C_j)$ and $P(C_j)$ can be calculated, where $U_i$ denotes the user vector of the $i$th user.
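For orientation, the textbook expectation-maximization (EM) updates for a Gaussian mixture have the form below. This is the standard formulation, given as an assumption about what each round computes; the formulas of this document survive only as equation images and may differ in notation or detail. Here $P(U_i \mid C_j)$ is taken to be the Gaussian density with mean $\mu_j^{(g)}$ and covariance $\Sigma_j^{(g)}$.

```latex
\begin{aligned}
P(C_j \mid U_i) &= \frac{P(C_j)\, P(U_i \mid C_j)}{\sum_{l=1}^{J} P(C_l)\, P(U_i \mid C_l)} \\[4pt]
\mu_j^{(g+1)} &= \frac{\sum_{i=1}^{I} P(C_j \mid U_i)\, U_i}{\sum_{i=1}^{I} P(C_j \mid U_i)} \\[4pt]
\Sigma_j^{(g+1)} &= \frac{\sum_{i=1}^{I} P(C_j \mid U_i)\,\bigl(U_i - \mu_j^{(g+1)}\bigr)\bigl(U_i - \mu_j^{(g+1)}\bigr)^{\mathsf{T}}}{\sum_{i=1}^{I} P(C_j \mid U_i)} \\[4pt]
P(C_j) &= \frac{1}{I} \sum_{i=1}^{I} P(C_j \mid U_i)
\end{aligned}
```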
The probability that each user belongs to each user group is then determined from the user's vector and the parameters of each user group; the user group for which this probability is largest is determined as the group to which the user belongs. In this way, the user groups to which the plurality of users belong can be determined.
In this optional method, the large number of target sequence segments of the plurality of users are clustered to simplify the user vectors, a GMM is established from the user vectors, and its parameters are fitted, so that the probability of a user belonging to each user group can be determined from the user's vector, and the group with the largest probability is determined as the user group to which the user belongs. With this optional method, the same recommendations can be made to users in the same user group, which improves recommendation efficiency.
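The fitting and assignment steps can be sketched in plain Python as below. This is a simplified illustration rather than the method as claimed: it assumes diagonal covariances, caller-supplied initial means, unit initial variances, and a fixed number of rounds instead of a convergence test. All function names and the sample user vectors are hypothetical, and no guard is included for numerical underflow.

```python
import math


def density(u, mean, var):
    """Gaussian density with diagonal covariance (simplifying assumption)."""
    return math.exp(sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(u, mean, var)
    ))


def fit_gmm(vectors, initial_means, rounds=50, min_var=1e-3):
    """Run EM for a fixed number of rounds; return (weights, means, variances)."""
    means = [list(m) for m in initial_means]
    dims, groups = len(vectors[0]), len(means)
    weights = [1.0 / groups] * groups
    variances = [[1.0] * dims for _ in range(groups)]
    for _ in range(rounds):
        # E-step: P(C_j | U_i) from the current P(C_j) and P(U_i | C_j).
        resp = []
        for u in vectors:
            joint = [w * density(u, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = sum(joint)
            resp.append([p / total for p in joint])
        # M-step: re-estimate P(C_j), then mu_j, then the variances of group j.
        for j in range(groups):
            n_j = sum(r[j] for r in resp)
            weights[j] = n_j / len(vectors)
            means[j] = [sum(r[j] * u[d] for r, u in zip(resp, vectors)) / n_j
                        for d in range(dims)]
            variances[j] = [
                max(min_var,
                    sum(r[j] * (u[d] - means[j][d]) ** 2
                        for r, u in zip(resp, vectors)) / n_j)
                for d in range(dims)
            ]
    return weights, means, variances


def assign_group(u, weights, means, variances):
    """Index of the user group with the largest P(C_j) * P(U_i | C_j)."""
    scores = [w * density(u, m, v) for w, m, v in zip(weights, means, variances)]
    return scores.index(max(scores))


user_vectors = [[5, 0], [6, 1], [5, 1], [0, 5], [1, 6], [0, 6]]
weights, means, variances = fit_gmm(user_vectors, [[4, 0], [0, 4]])
assigned = [assign_group(u, weights, means, variances) for u in user_vectors]
```

Because the denominator of $P(C_j \mid U_i)$ is the same for every group, comparing $P(C_j)\,P(U_i \mid C_j)$ is enough to pick the most probable group.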
An embodiment of the present invention provides an apparatus 30 for determining a user action track, configured to execute the method for determining a user action track shown in fig. 1, as shown in fig. 3, where the apparatus 30 includes:
a first determination unit 301, configured to determine R place sequences according to a track sequence formed by N position data of a user; wherein one place sequence consists of N places, and the nth place of the N places is one place in the place set corresponding to the nth position data of the N position data; the place set corresponding to one position data is: a fuzzy set of all places within a preset range centered on the position data; 1 ≤ n ≤ N, R ≥ 1, and n, N, and R are integers;
a second determining unit 302, configured to determine M target sequence segments according to the R place sequences; wherein, when the frequency of a sequence segment in a plurality of place sequences among the R place sequences is greater than a preset frequency and the sum of the probabilities of the plurality of place sequences is greater than a preset probability, the sequence segment is a target sequence segment; the frequency of a sequence segment in a place sequence is the number of times the sequence segment appears in the place sequence; the probability of a place sequence is the product of the probabilities of all places in the place sequence, and the probability of a place is the probability of the place within the place set to which it belongs; M ≥ 1 and M is an integer;
an executing unit 303, configured to use a route formed by serially connecting places in any one of the M target sequence segments according to a time sequence as an action track of the user.
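The work of the first and second determining units can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the sample place sets are hypothetical, every combination of candidate places is enumerated (so R is the product of the place-set sizes), and the frequency of a segment is counted as the number of contiguous occurrences, which the text does not specify.

```python
from itertools import product
from math import prod


def place_sequences(place_sets):
    """Enumerate every place sequence together with its probability.

    place_sets[n] maps each candidate place for the n-th position datum to
    its probability within that place set.
    """
    sequences = []
    for combo in product(*(ps.items() for ps in place_sets)):
        places = tuple(place for place, _ in combo)
        probability = prod(p for _, p in combo)
        sequences.append((places, probability))
    return sequences


def frequency(segment, sequence):
    """Count contiguous occurrences of `segment` in `sequence` (an assumption)."""
    m = len(segment)
    return sum(tuple(sequence[i:i + m]) == tuple(segment)
               for i in range(len(sequence) - m + 1))


def is_target(segment, sequences, preset_frequency, preset_probability):
    """Apply both thresholds: per-sequence frequency, then summed probability."""
    supporting = [p for places, p in sequences
                  if frequency(segment, places) > preset_frequency]
    return sum(supporting) > preset_probability


place_sets = [{"A": 0.6, "X": 0.4}, {"B": 1.0}, {"A": 0.5, "C": 0.5}]
sequences = place_sequences(place_sets)  # 2 * 1 * 2 = 4 place sequences
```

With a preset frequency of 0 and a preset probability of 0.5, the segment ("A", "B") qualifies because the sequences containing it have a combined probability of about 0.6, while ("X", "B") does not because its supporting sequences total only about 0.4.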
Optionally, as shown in fig. 4, the second determining unit 302 may include:
a generating unit 3021, configured to generate a frequent segment candidate set of length x+1 according to the frequent segments of length x, where, when x = 0, the frequent segment candidate set of length 1 is: the set of all distinct places that make up the R place sequences; x ≥ 0 and x is an integer;
a scanning unit 3022, configured to scan each of the R place sequences for each sequence segment in the frequent segment candidate set of length x+1, to obtain the frequency of each such sequence segment in each of the R place sequences;
a first determining subunit 3023, configured to determine that a sequence segment in the frequent segment candidate set of length x+1 is a frequent segment of length x+1 when the frequency of the sequence segment in a plurality of place sequences is greater than a preset frequency and the sum of the probabilities of the plurality of place sequences is greater than a preset probability;
a second determining subunit 3024, configured to determine that M frequent segments of all the frequent segments are M target sequence segments.
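The cooperation of the generating, scanning, and determining subunits resembles a level-wise search, sketched below. The candidate-generation rule used here (append one place to each frequent segment of length x) and the contiguous counting of occurrences are assumptions for illustration; the function names and sample data are hypothetical, and the preset frequency is assumed to be non-negative.

```python
def frequency(segment, sequence):
    """Count contiguous occurrences of `segment` in `sequence` (an assumption)."""
    m = len(segment)
    return sum(sequence[i:i + m] == segment
               for i in range(len(sequence) - m + 1))


def mine_frequent_segments(sequences, preset_frequency, preset_probability):
    """Level-wise search over (places_tuple, probability) pairs."""
    places = sorted({p for seq, _ in sequences for p in seq})
    max_len = max(len(seq) for seq, _ in sequences)
    candidates = [(p,) for p in places]  # candidate set of length 1 (x = 0)
    all_frequent = []
    for _ in range(max_len):
        frequent = []
        for cand in candidates:
            # Sum the probabilities of the sequences in which cand is frequent.
            support = sum(prob for seq, prob in sequences
                          if frequency(cand, seq) > preset_frequency)
            if support > preset_probability:
                frequent.append(cand)
        if not frequent:
            break
        all_frequent.extend(frequent)
        # Assumed rule: extend each frequent segment of length x by one place.
        candidates = [seg + (p,) for seg in frequent for p in places]
    return all_frequent


sequences = [
    (("A", "B", "C"), 0.5),
    (("A", "B", "D"), 0.3),
    (("E", "B", "C"), 0.2),
]
result = mine_frequent_segments(sequences, 0, 0.6)
print(result)  # [('A',), ('B',), ('C',), ('A', 'B'), ('B', 'C')]
```

Under contiguous counting, a segment can only be frequent if its prefix one place shorter is also frequent, so extending frequent segments does not miss any longer frequent segment.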
Optionally, the second determining subunit 3024 is specifically configured to:
determining M closed frequent segments among all the frequent segments as the M target sequence segments; wherein a closed frequent segment among all the frequent segments is: a frequent segment that is not a sub-segment of any other frequent segment among all the frequent segments.
Optionally, while scanning a place sequence for a sequence segment in the frequent segment candidate set of length x+1, the scanning unit 3022 records the frequency of the sequence segment in the place sequence by means of an automaton.
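One way to record a frequency with an automaton in a single pass is a Knuth-Morris-Pratt style matcher, sketched below. The kind of automaton actually used is not specified in this passage, so this choice is an assumption, as are the function names; the segment must be non-empty, and overlapping occurrences are counted.

```python
def build_failure(segment):
    """KMP failure table: longest proper prefix that is also a suffix."""
    failure = [0] * len(segment)
    k = 0
    for i in range(1, len(segment)):
        while k and segment[i] != segment[k]:
            k = failure[k - 1]
        if segment[i] == segment[k]:
            k += 1
        failure[i] = k
    return failure


def count_with_automaton(segment, sequence):
    """Scan `sequence` once; the state is the number of places matched so far."""
    failure = build_failure(segment)
    state = count = 0
    for place in sequence:
        while state and place != segment[state]:
            state = failure[state - 1]
        if place == segment[state]:
            state += 1
        if state == len(segment):  # accepting state: one more occurrence
            count += 1
            state = failure[state - 1]
    return count


print(count_with_automaton("AB", "ABCABAB"))  # 3
```

Each place in the sequence is read once, so the scan stays linear in the length of the place sequence regardless of how often the segment occurs.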
Optionally, the track sequence is a track sequence of the user within a preset time period.
Optionally, as shown in fig. 4, the apparatus 30 may further include:
a clustering unit 304, configured to cluster target sequence segments of a plurality of users to obtain k clusters, where k ≥ 1 and k is an integer;
a representing unit 305, configured to represent each user as a user vector having k dimensions, where a dimension corresponds to a cluster, and a value in a dimension is the number of target sequence segments of the user in the cluster corresponding to the dimension;
the calculating unit 306 is configured to establish a gaussian mixture model, and fit parameters of the gaussian mixture model according to the user vectors of the users, where the gaussian mixture model is formed by multiple gaussian models, one gaussian model corresponds to one user group, and multiple user groups corresponding to the gaussian models are formed by the users.
Generally, the daily action trajectory of the user has a certain regularity (for example, office workers generally have a fixed action trajectory during working days). The device provided by the embodiment of the invention can determine the place sequence corresponding to the user according to the action track formed by the historical position data of the user, and when the frequency of one sequence segment in a plurality of place sequences is higher and the sum of the probabilities of the plurality of place sequences is higher, the probability that the user passes through the place in the sequence segment is higher. If the values of the preset frequency and the preset probability are reasonably set, compared with the prior art, after the target sequence segment is determined according to the method provided by the embodiment of the invention, the action track of the user determined according to the target sequence segment is more accurate, and the efficiency of navigation and location recommendation for the user is higher according to the action track.
In terms of hardware implementation, each unit in the apparatus 30 may be embedded in a processor of the apparatus 30 or independent from the processor of the apparatus 30 in a hardware form, or may be stored in a memory of the apparatus 30 in a software form, so that the processor may invoke and execute operations corresponding to the above units, where the processor may be a Central Processing Unit (CPU), a microprocessor, a single chip, or the like.
As shown in fig. 5, another apparatus 50 for determining a user action track according to an embodiment of the present invention is provided, for performing the method for determining a user action track shown in fig. 1, where the apparatus 50 includes: a memory 501, a processor 502 and a bus system 503.
The memory 501 and the processor 502 are coupled together by a bus system 503, wherein the memory 501 may comprise a random access memory, and may further comprise a non-volatile memory, such as at least one disk memory. The bus system 503 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus system 503 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
The memory 501 is used to store a set of codes for controlling the processor 502 to perform the following actions:
determining R place sequences according to a track sequence formed by N position data of a user; wherein one place sequence consists of N places, and the nth place of the N places is one place in the place set corresponding to the nth position data of the N position data; the place set corresponding to one position data is: a fuzzy set of all places within a preset range centered on the position data; 1 ≤ n ≤ N, R ≥ 1, and n, N, and R are integers;
determining M target sequence segments according to the R place sequences; wherein, when the frequency of a sequence segment in a plurality of place sequences among the R place sequences is greater than a preset frequency and the sum of the probabilities of the plurality of place sequences is greater than a preset probability, the sequence segment is a target sequence segment; the frequency of a sequence segment in a place sequence is the number of times the sequence segment appears in the place sequence; the probability of a place sequence is the product of the probabilities of all places in the place sequence, and the probability of a place is the probability of the place within the place set to which it belongs; M ≥ 1 and M is an integer;
and taking a route formed by connecting the places in any one of the M target sequence segments in series according to the time sequence as the action track of the user.
Optionally, the processor 502 is specifically configured to:
generating a frequent segment candidate set of length x+1 according to the frequent segments of length x, where, when x = 0, the frequent segment candidate set of length 1 is: the set of all distinct places that make up the R place sequences; x ≥ 0 and x is an integer;
scanning each of the R place sequences for each sequence segment in the frequent segment candidate set of length x+1, to obtain the frequency of each such sequence segment in each of the R place sequences;
when the frequency of a sequence segment in the frequent segment candidate set of length x+1 in a plurality of place sequences is greater than a preset frequency and the sum of the probabilities of the plurality of place sequences is greater than a preset probability, determining that the sequence segment is a frequent segment of length x+1;
determining M frequent segments of all the frequent segments as M target sequence segments.
Optionally, the processor 502 is specifically configured to:
determining M closed frequent segments among all the frequent segments as the M target sequence segments; wherein a closed frequent segment among all the frequent segments is: a frequent segment that is not a sub-segment of any other frequent segment among all the frequent segments.
Optionally, while scanning a place sequence for a sequence segment in the frequent segment candidate set of length x+1, the processor 502 records the frequency of the sequence segment in the place sequence by means of an automaton.
Optionally, the track sequence is a track sequence of the user within a preset time period.
Optionally, the processor 502 is further configured to:
clustering target sequence segments of a plurality of users to obtain k clusters, where k ≥ 1 and k is an integer;
representing each user as a user vector with k dimensions, wherein one dimension corresponds to one cluster, and the value of one dimension is the number of target sequence segments of the users in the cluster corresponding to the dimension;
and establishing a Gaussian mixture model, and fitting parameters of the Gaussian mixture model according to the user vectors of the users, wherein the Gaussian mixture model is composed of a plurality of Gaussian models, one Gaussian model corresponds to one user group, and a plurality of user groups corresponding to the Gaussian models are composed of the users.
Generally, the daily action trajectory of the user has a certain regularity (for example, office workers generally have a fixed action trajectory during working days). The device provided by the embodiment of the invention can determine the place sequence corresponding to the user according to the action track formed by the historical position data of the user, and when the frequency of one sequence segment in a plurality of place sequences is higher and the sum of the probabilities of the plurality of place sequences is higher, the probability that the user passes through the place in the sequence segment is higher. If the values of the preset frequency and the preset probability are reasonably set, compared with the prior art, after the target sequence segment is determined according to the method provided by the embodiment of the invention, the action track of the user determined according to the target sequence segment is more accurate, and the efficiency of navigation and location recommendation for the user is higher according to the action track.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted, or not implemented.
The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional modules in the embodiments of the present invention may be integrated into one processing module, or two or more modules may be integrated into one module. The integrated module can be realized in a hardware form, and can also be realized in a form of hardware and a software functional module.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute some steps of the methods according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (12)

1. A method for determining a trajectory of a user's actions, comprising:
determining R place sequences according to a track sequence formed by N position data of a user; wherein one place sequence consists of N places, and the nth place of the N places is one place in the place set corresponding to the nth position data of the N position data; the place set corresponding to one position data is: a fuzzy set of all places within a preset range centered on the position data; 1 ≤ n ≤ N, R ≥ 1, and n, N, and R are integers;
determining M target sequence segments according to the R place sequences; wherein, when the frequency of a sequence segment in a plurality of place sequences among the R place sequences is greater than a preset frequency and the sum of the probabilities of the plurality of place sequences is greater than a preset probability, the sequence segment is a target sequence segment; the frequency of a sequence segment in a place sequence is the number of times the sequence segment appears in the place sequence; the probability of a place sequence is the product of the probabilities of all places in the place sequence, and the probability of a place is the probability of the place within the place set to which it belongs; M ≥ 1 and M is an integer;
and taking a route formed by connecting the places in any one of the M target sequence segments in series according to the time sequence as the action track of the user.
2. The method of claim 1, wherein said determining M target sequence segments according to the R place sequences comprises:
generating a frequent segment candidate set of length x+1 according to the frequent segments of length x, where, when x = 0, the frequent segment candidate set of length 1 is: the set of all distinct places that make up the R place sequences; x ≥ 0 and x is an integer;
scanning each of the R place sequences for each sequence segment in the frequent segment candidate set of length x+1, to obtain the frequency of each such sequence segment in each of the R place sequences;
when the frequency of a sequence segment in the frequent segment candidate set of length x+1 in a plurality of place sequences is greater than a preset frequency and the sum of the probabilities of the plurality of place sequences is greater than a preset probability, determining that the sequence segment is a frequent segment of length x+1;
determining M frequent segments of all the frequent segments as M target sequence segments.
3. The method of claim 2, wherein the determining that M frequent segments of the total of frequent segments are M target sequence segments comprises:
determining M closed frequent segments among all the frequent segments as the M target sequence segments; wherein a closed frequent segment among all the frequent segments is: a frequent segment that is not a sub-segment of any other frequent segment among all the frequent segments.
4. The method of claim 3, wherein, while a place sequence is scanned for a sequence segment in the frequent segment candidate set of length x+1, the frequency of the sequence segment in the place sequence is recorded by an automaton.
5. The method of claim 1, wherein the sequence of tracks is a sequence of tracks of the user within a preset time period.
6. The method according to any one of claims 1-5, further comprising:
clustering target sequence segments of a plurality of users to obtain k clusters, where k ≥ 1 and k is an integer;
representing each user as a user vector with k dimensions, wherein one dimension corresponds to one cluster, and the value of one dimension is the number of target sequence segments of the users in the cluster corresponding to the dimension;
and establishing a Gaussian mixture model, and fitting parameters of the Gaussian mixture model according to the user vectors of the users, wherein the Gaussian mixture model is composed of a plurality of Gaussian models, one Gaussian model corresponds to one user group, and a plurality of user groups corresponding to the Gaussian models are composed of the users.
7. An apparatus for determining a trajectory of a user's actions, comprising:
a first determination unit, configured to determine R place sequences according to a track sequence formed by N position data of a user; wherein one place sequence consists of N places, and the nth place of the N places is one place in the place set corresponding to the nth position data of the N position data; the place set corresponding to one position data is: a fuzzy set of all places within a preset range centered on the position data; 1 ≤ n ≤ N, R ≥ 1, and n, N, and R are integers;
a second determining unit, configured to determine M target sequence segments according to the R place sequences; wherein, when the frequency of a sequence segment in a plurality of place sequences among the R place sequences is greater than a preset frequency and the sum of the probabilities of the plurality of place sequences is greater than a preset probability, the sequence segment is a target sequence segment; the frequency of a sequence segment in a place sequence is the number of times the sequence segment appears in the place sequence; the probability of a place sequence is the product of the probabilities of all places in the place sequence, and the probability of a place is the probability of the place within the place set to which it belongs; M ≥ 1 and M is an integer;
and the execution unit is used for taking a route formed by connecting the places in any one of the M target sequence segments in series according to the time sequence as the action track of the user.
8. The apparatus according to claim 7, wherein the second determining unit comprises:
a generating unit, configured to generate a frequent segment candidate set of length x+1 according to the frequent segments of length x, where, when x = 0, the frequent segment candidate set of length 1 is: the set of all distinct places that make up the R place sequences; x ≥ 0 and x is an integer;
a scanning unit, configured to scan each of the R place sequences for each sequence segment in the frequent segment candidate set of length x+1, to obtain the frequency of each such sequence segment in each of the R place sequences;
a first determining subunit, configured to determine that a sequence segment in the frequent segment candidate set of length x+1 is a frequent segment of length x+1 when the frequency of the sequence segment in a plurality of place sequences is greater than a preset frequency and the sum of the probabilities of the plurality of place sequences is greater than a preset probability;
and the second determining subunit is configured to determine that M frequent segments of all the frequent segments are M target sequence segments.
9. The apparatus according to claim 8, wherein the second determining subunit is specifically configured to:
determining M closed frequent segments among all the frequent segments as the M target sequence segments; wherein a closed frequent segment among all the frequent segments is: a frequent segment that is not a sub-segment of any other frequent segment among all the frequent segments.
10. The apparatus of claim 9, wherein, while scanning a place sequence for a sequence segment in the frequent segment candidate set of length x+1, the scanning unit records the frequency of the sequence segment in the place sequence by an automaton.
11. The apparatus of claim 7, wherein the sequence of tracks is a sequence of tracks of the user within a preset time period.
12. The apparatus according to any one of claims 7-11, further comprising:
a clustering unit, configured to cluster the target sequence segments of a plurality of users to obtain k clusters, where k ≥ 1 and k is an integer;
the representing unit is used for representing each user as a user vector with k dimensions, one dimension corresponds to one cluster, and the value on one dimension is the number of target sequence segments of the users in the cluster corresponding to the dimension;
and the calculation unit is used for establishing a Gaussian mixture model and fitting parameters of the Gaussian mixture model according to the user vectors of the users, wherein the Gaussian mixture model is composed of a plurality of Gaussian models, one Gaussian model corresponds to one user group, and a plurality of user groups corresponding to the Gaussian models are composed of the users.
CN201510406207.3A 2015-07-10 2015-07-10 Method and device for determining action track of user Active CN106327236B (en)

Publications (2)

Publication Number Publication Date
CN106327236A CN106327236A (en) 2017-01-11
CN106327236B true CN106327236B (en) 2020-04-14

Family

ID=57725345

Country Status (1)

Country Link
CN (1) CN106327236B (en)

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant