CN106327236B

CN106327236B - Method and device for determining action track of user

Info

Publication number: CN106327236B
Application number: CN201510406207.3A
Authority: CN
Inventors: 李辉; 邓珂; 李彦华; 崔江涛; 王蒙
Original assignee: Huawei Technologies Co Ltd; Xidian University
Current assignee: Huawei Technologies Co Ltd; Xidian University
Priority date: 2015-07-10
Filing date: 2015-07-10
Publication date: 2020-04-14
Anticipated expiration: 2035-07-10
Also published as: CN106327236A

Abstract

The embodiment of the invention discloses a method and a device for determining a user action track, relates to the technical field of communication, and is used for improving the accuracy of the determined user action track so as to improve the efficiency of navigation and place recommendation for a user. The method comprises the following steps: determining R place sequences according to a track sequence formed by N position data of a user; determining M target sequence segments according to the R place sequences; and taking a route formed by connecting the places in any one of the M target sequence segments in series in time order as the action track of the user. The technical scheme provided by the embodiment of the invention can be used in the process of determining the action track of the user.

Description

Method and device for determining action track of user

Technical Field

The present invention relates to the field of communications technologies, and in particular, to a method and an apparatus for determining a user action trajectory.

Background

The action track of the user has important potential value in a plurality of fields; for example, after the action track of the user is known, navigation can be performed for the user, more accurate location recommendation can be realized for the user, and advertisement site selection can be optimized for the merchant. With the popularization of terminal devices (such as smart phones, tablet computers and the like) with an automatic positioning function, the position data of the user can be recorded at any time, and how to determine the action track of the user according to the position data of the user has important significance.

At present, after position data of a user at different time points are known, places closest to positions represented by the position data corresponding to the different time points are respectively found out, and a route formed by connecting the places in series according to time sequence is used as an action track of the user.

The action track determined by the above method is not necessarily the one that best matches the position data. Its accuracy is therefore low, and navigation and place recommendation based on it are inefficient.

Disclosure of Invention

The embodiment of the invention provides a method and a device for determining a user action track, which are used for improving the accuracy of the determined user action track and further improving the efficiency of navigation and place recommendation for the user.

In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:

in a first aspect, a method for determining a user action track is provided, which includes:

determining R place sequences according to a track sequence formed by N position data of a user; wherein one place sequence is composed of N places, and the nth place of the N places is one place in the place set corresponding to the nth position data of the N position data; the place set corresponding to one position data is: a fuzzy set of all places within a preset range centered on the position data; 1 ≤ n ≤ N, R ≥ 1, and n, N and R are integers;

determining M target sequence segments according to the R place sequences; wherein, when the frequency of a sequence segment in a plurality of place sequences among the R place sequences is greater than a preset frequency and the sum of the probabilities of the plurality of place sequences is greater than a preset probability, the sequence segment is a target sequence segment; the frequency of a sequence segment in a place sequence refers to the number of times that the sequence segment appears in the place sequence; the probability of a place sequence is the product of the probabilities of all places in the place sequence, and the probability of a place is the probability of the place in the place set to which the place belongs; M ≥ 1 and M is an integer;

and taking a route formed by connecting the places in any one of the M target sequence segments in series according to the time sequence as the action track of the user.

With reference to the first aspect, in a first possible implementation manner, the determining M target sequence segments according to the R place sequences includes:

generating a frequent segment candidate set with a length of x + 1 according to the frequent segments with a length of x, wherein when x = 0, the frequent segment candidate set with a length of 1 is: the set of all distinct places that make up the R place sequences; x ≥ 0 and x is an integer;

scanning, in each of the R place sequences, each sequence segment in the frequent segment candidate set with a length of x + 1, to obtain the frequency of each such sequence segment in each of the R place sequences;

when the frequency of a sequence segment in the frequent segment candidate set with a length of x + 1 in a plurality of place sequences is greater than the preset frequency and the sum of the probabilities of the plurality of place sequences is greater than the preset probability, determining that the sequence segment is a frequent segment with a length of x + 1;

determining M frequent segments among all the frequent segments as the M target sequence segments.

With reference to the first possible implementation manner of the first aspect, in a second possible implementation manner, the determining that M frequent segments of all the frequent segments are M target sequence segments includes:

determining M closed frequent segments among all the frequent segments as the M target sequence segments; wherein a closed frequent segment among all the frequent segments is: a frequent segment that is not a sub-segment of any other frequent segment.

With reference to the first possible implementation manner or the second possible implementation manner of the first aspect, in a third possible implementation manner, in a process of scanning one sequence segment in the frequent segment candidate set with the length of x +1 in one place sequence, the frequency of the sequence segment in the place sequence is recorded by an automaton.

With reference to the first aspect and any one of the first possible implementation manner to the third possible implementation manner of the first aspect, in a fourth possible implementation manner, the track sequence is a track sequence of the user within a preset time period.

With reference to the first aspect and any one of the first possible implementation manner to the fourth possible implementation manner of the first aspect, in a fifth possible implementation manner, the method further includes:

clustering target sequence segments of a plurality of users to obtain k clustering clusters, wherein k is more than or equal to 1 and is an integer;

representing each user as a user vector with k dimensions, wherein one dimension corresponds to one cluster, and the value of one dimension is the number of target sequence segments of the users in the cluster corresponding to the dimension;

and establishing a Gaussian mixture model, and fitting parameters of the Gaussian mixture model according to the user vectors of the users, wherein the Gaussian mixture model is composed of a plurality of Gaussian models, one Gaussian model corresponds to one user group, and a plurality of user groups corresponding to the Gaussian models are composed of the users.

In a second aspect, an apparatus for determining a user action track is provided, which includes:

a first determining unit, configured to determine R place sequences according to a track sequence formed by N position data of a user; wherein one place sequence is composed of N places, and the nth place of the N places is one place in the place set corresponding to the nth position data of the N position data; the place set corresponding to one position data is: a fuzzy set of all places within a preset range centered on the position data; 1 ≤ n ≤ N, R ≥ 1, and n, N and R are integers;

a second determining unit, configured to determine M target sequence segments according to the R place sequences; wherein, when the frequency of a sequence segment in a plurality of place sequences among the R place sequences is greater than a preset frequency and the sum of the probabilities of the plurality of place sequences is greater than a preset probability, the sequence segment is a target sequence segment; the frequency of a sequence segment in a place sequence refers to the number of times that the sequence segment appears in the place sequence; the probability of a place sequence is the product of the probabilities of all places in the place sequence, and the probability of a place is the probability of the place in the place set to which the place belongs; M ≥ 1 and M is an integer;

and the execution unit is used for taking a route formed by connecting the places in any one of the M target sequence segments in series according to the time sequence as the action track of the user.

With reference to the second aspect, in a first possible implementation manner, the second determining unit includes:

a generating unit, configured to generate a frequent segment candidate set with a length of x + 1 according to the frequent segments with a length of x, wherein when x = 0, the frequent segment candidate set with a length of 1 is: the set of all distinct places that make up the R place sequences; x ≥ 0 and x is an integer;

a scanning unit, configured to scan, in each of the R place sequences, each sequence segment in the frequent segment candidate set with a length of x + 1, to obtain the frequency of each such sequence segment in each of the R place sequences;

a first determining subunit, configured to determine that a sequence segment in the frequent segment candidate set with a length of x + 1 is a frequent segment with a length of x + 1 when the frequency of the sequence segment in a plurality of place sequences is greater than the preset frequency and the sum of the probabilities of the plurality of place sequences is greater than the preset probability;

and a second determining subunit, configured to determine M frequent segments among all the frequent segments as the M target sequence segments.

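With reference to the first possible implementation manner of the second aspect, in a second possible implementation manner, the second determining subunit is specifically configured to:

determine M closed frequent segments among all the frequent segments as the M target sequence segments; wherein a closed frequent segment among all the frequent segments is: a frequent segment that is not a sub-segment of any other frequent segment.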

With reference to the first possible implementation manner or the second possible implementation manner of the second aspect, in a third possible implementation manner, the scanning unit records, by an automaton, the frequency of a sequence segment in a place sequence during a process of scanning the sequence segment in the frequent segment candidate set with the length of x +1 in the place sequence.

With reference to the second aspect and any one of the first possible implementation manner to the third possible implementation manner of the second aspect, in a fourth possible implementation manner, the track sequence is a track sequence of the user within a preset time period.

With reference to the second aspect and any one of the first possible implementation manner to the fourth possible implementation manner of the second aspect, in a fifth possible implementation manner, the apparatus further includes:

the clustering unit is used for clustering the target sequence segments of the users to obtain k clustering clusters, wherein k is greater than or equal to 1 and is an integer;

the representing unit is used for representing each user as a user vector with k dimensions, one dimension corresponds to one cluster, and the value on one dimension is the number of target sequence segments of the users in the cluster corresponding to the dimension;

and the calculation unit is used for establishing a Gaussian mixture model and fitting parameters of the Gaussian mixture model according to the user vectors of the users, wherein the Gaussian mixture model is composed of a plurality of Gaussian models, one Gaussian model corresponds to one user group, and a plurality of user groups corresponding to the Gaussian models are composed of the users.

The method and the device provided by the embodiment of the invention can determine the place sequence corresponding to the user according to the action track formed by the historical position data of the user, and when the frequency of one sequence segment in a plurality of place sequences is higher and the sum of the probabilities of the plurality of place sequences is higher, the probability that the user passes through the places in the sequence segment is higher. If the values of the preset frequency and the preset probability are reasonably set, compared with the prior art, after the target sequence segment is determined according to the method provided by the embodiment of the invention, the action track of the user determined according to the target sequence segment is more accurate, and the efficiency of navigation and location recommendation for the user is higher according to the action track.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

FIG. 1 is a flowchart illustrating a method for determining a trajectory of a user action according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a track sequence provided by an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of an apparatus for determining a user action trajectory according to an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of another apparatus for determining a trajectory of a user action according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of another apparatus for determining a user action trajectory according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The execution subject of the embodiment of the present invention may be a server, a workstation, a general PC (Personal Computer), and the like, the server may specifically be an HP DL series server, and the workstation may specifically be a Z820 series workstation. The position data of the user in the embodiment of the invention refers to historical position data of the user, and the position data of the user can be longitude and latitude values of the position where the user is located at a certain moment. The position data of the user may also be represented in other ways, and the embodiment of the present invention is not limited to this specifically.

An embodiment of the present invention provides a method for determining a user action trajectory, as shown in FIG. 1, including:

101. Determining R place sequences according to a track sequence formed by N position data of a user; wherein one place sequence is composed of N places, and the nth place of the N places is one place in the place set corresponding to the nth position data of the N position data; the place set corresponding to one position data is: a fuzzy set of all places within a preset range centered on the position data; 1 ≤ n ≤ N, R ≥ 1, and n, N and R are integers.

The place set corresponding to one piece of position data may contain one or more places, where the piece of position data is any one of the N pieces of position data.

The places in the place set may specifically be squares, bars, upscale shopping malls, Italian restaurants, McDonald's outlets, libraries, bus stops, parking lots, museums, and the like, and may also be other places, which is not limited in this embodiment of the present invention.

Specifically, when the numbers of places in the N place sets corresponding to the N position data are $r_1, r_2, \ldots, r_N$ respectively, the number of place sequences determined from the track sequence composed of the N position data is R, where $R = r_1 \cdot r_2 \cdot \ldots \cdot r_N$.

Illustratively, as shown in FIG. 2, a track sequence is composed of 4 position data of a user, denoted $N_1$, $N_2$, $N_3$ and $N_4$. The trajectory actually represented by the track sequence may be curved; it is drawn as a straight line in FIG. 2 for simplicity. Each dotted circle centered on $N_1$, $N_2$, $N_3$ or $N_4$ represents the preset range centered on that position data. The place set corresponding to $N_1$ is a fuzzy set composed of places A, B and C; the place set corresponding to $N_2$ is a fuzzy set composed of places A, B and C; the place set corresponding to $N_3$ is a fuzzy set composed of places A and D; and the place set corresponding to $N_4$ is a fuzzy set composed of place B.

From this track sequence, 18 place sequences can be determined: [A, A, A, B]; [A, A, D, B]; [A, B, A, B]; [A, B, D, B]; [A, C, A, B]; [A, C, D, B]; [B, A, A, B]; [B, A, D, B]; [B, B, A, B]; [B, B, D, B]; [B, C, A, B]; [B, C, D, B]; [C, A, A, B]; [C, A, D, B]; [C, B, A, B]; [C, B, D, B]; [C, C, A, B]; [C, C, D, B].
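The enumeration above is the Cartesian product of the four place sets. A minimal Python sketch, assuming the place sets of the FIG. 2 example (the variable names are illustrative and not part of the original text):

```python
from itertools import product

# Place sets for the four position data N1..N4 in the FIG. 2 example.
place_sets = [
    ["A", "B", "C"],  # N1
    ["A", "B", "C"],  # N2
    ["A", "D"],       # N3
    ["B"],            # N4
]

# Each place sequence picks one place from each place set, in time order.
place_sequences = list(product(*place_sets))

print(len(place_sequences))   # 3 * 3 * 2 * 1 = 18
print(place_sequences[3])     # 4th place sequence: ('A', 'B', 'D', 'B')
```

Because `product` varies the last set fastest, the sequences come out in the same order as the list above.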

102. Determining M target sequence segments according to the R place sequences; wherein, when the frequency of a sequence segment in a plurality of place sequences among the R place sequences is greater than a preset frequency and the sum of the probabilities of the plurality of place sequences is greater than a preset probability, the sequence segment is a target sequence segment; the frequency of a sequence segment in a place sequence refers to the number of times that the sequence segment appears in the place sequence; the probability of a place sequence is the product of the probabilities of all places in the place sequence, and the probability of a place is the probability of the place in the place set to which the place belongs; M ≥ 1 and M is an integer.

The target sequence fragment may be composed of one site or a plurality of sites.

Specifically, when a place sequence is [C, B, A, B], the sequence segments [AB], [CA], [BB], [CB] and the like are all sub-segments of the place sequence; that is, a sub-segment of a place sequence is composed of at least one place in the place sequence, arranged in the order in which the places appear in the place sequence.

Based on the example illustrated in FIG. 2, the sequence segment [A] appears 3 times in the 1st place sequence, so the frequency of [A] in the 1st place sequence is 3. The sequence segment [AB] appears 2 times in the 3rd place sequence, so its frequency in the 3rd place sequence is 2. It should be noted that, when the place sequence is [A, C, B, A, E, B, D, B], the frequency of [AB] in the place sequence is also 2; that is, when the frequency of a sequence segment is determined, the places in the sequence segment do not need to appear consecutively in the place sequence, as long as their order in the sequence segment is consistent with their order of appearance in the place sequence.
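The counting rule in these examples can be sketched as a single left-to-right pass that restarts after each complete match. This is one reading of the examples, not a definitive implementation; the function name `segment_frequency` is hypothetical.

```python
def segment_frequency(segment, place_sequence):
    """Count in-order, not necessarily contiguous, non-overlapping matches."""
    if not segment:
        return 0
    count = 0
    state = 0  # index of the next place of `segment` we are waiting for
    for place in place_sequence:
        if place == segment[state]:
            state += 1
            if state == len(segment):  # the whole segment has been matched
                count += 1
                state = 0              # restart for the next occurrence
    return count

print(segment_frequency(["A"], ["A", "A", "A", "B"]))        # 3
print(segment_frequency(["A", "B"], ["A", "B", "A", "B"]))   # 2
print(segment_frequency(["A", "B"], list("ACBAEBDB")))       # 2
```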

Specifically, in a place set corresponding to one piece of location data, the probability of each place is the probability that the user goes to the place from the location represented by the location data. Since the closer a place is to the location data, the greater the probability that the user will go from the location represented by the location data to the place, the probability of each place in the set of places is inversely proportional to the distance between the place and the location represented by the location data.

In a location set corresponding to one location data, the probability of one location may be calculated by Rayleigh (Rayleigh) distribution, and specifically may be:

where d is the distance between the place and the position represented by the position data, f(d) is the probability of the place in the place set, and σ generally takes the value 1.

Illustratively, based on the example shown in FIG. 2, the place set corresponding to $N_2$ is a fuzzy set consisting of places A, B and C, and the probability of place B in this place set is $f(d_2)$, where, as shown in FIG. 2, $d_2$ is the distance between $N_2$ and place B.

Taking the 4th place sequence as an example, the probability of the 4th place sequence is $P = f(d_1) \cdot f(d_2) \cdot f(d_3) \cdot f(d_4)$, where $d_1$ is the distance between $N_1$ and A, $d_2$ is the distance between $N_2$ and B, $d_3$ is the distance between $N_3$ and D, and $d_4$ is the distance between $N_4$ and B.
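As a rough illustration of how a place sequence probability could be computed, the sketch below uses the standard Rayleigh density $f(d) = \frac{d}{\sigma^2} e^{-d^2 / (2\sigma^2)}$. This form is an assumption, since the exact expression is not reproduced in this text, and the distances are hypothetical values chosen only for the example.

```python
import math

def rayleigh(d, sigma=1.0):
    """Standard Rayleigh density at distance d (assumed form of f(d))."""
    return (d / sigma**2) * math.exp(-d**2 / (2 * sigma**2))

def sequence_probability(distances, sigma=1.0):
    """Product of the per-place probabilities along one place sequence."""
    return math.prod(rayleigh(d, sigma) for d in distances)

# Hypothetical distances d1..d4 for the 4th place sequence [A, B, D, B].
distances = [0.8, 1.2, 0.5, 1.0]
p = sequence_probability(distances)
print(p)
```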

In addition, when the preset range of one position data is the range within D meters around the position data, the probability of a place in the place set corresponding to the position data can alternatively be calculated by:

where d is the distance between the location and the location represented by the location data.

Illustratively, taking the 4 th venue sequence as an example, the probability of the 4 th venue sequence is

103. And taking a route formed by connecting the places in any one of the M target sequence segments in series according to the time sequence as the action track of the user.

Generally, the daily action trajectory of the user has a certain regularity (for example, office workers generally have a fixed action trajectory during working days). The method provided by the embodiment of the invention can determine the place sequence corresponding to the user according to the action track formed by the historical position data of the user, and when the frequency of one sequence segment in a plurality of place sequences is higher and the sum of the probabilities of the plurality of place sequences is higher, the probability that the user passes through the place in the sequence segment is higher. If the values of the preset frequency and the preset probability are reasonably set, compared with the prior art, after the target sequence segment is determined according to the method provided by the embodiment of the invention, the action track of the user determined according to the target sequence segment is more accurate, and the efficiency of navigation and location recommendation for the user is higher according to the action track.

Optionally, the step 102 may be implemented as follows:

1021. Generating a frequent segment candidate set with a length of x + 1 according to the frequent segments with a length of x, wherein when x = 0, the frequent segment candidate set with a length of 1 is: the set of all distinct places that make up the R place sequences; x ≥ 0 and x is an integer.

1022. Scanning, in each of the R place sequences, each sequence segment in the frequent segment candidate set with a length of x + 1, to obtain the frequency of each such sequence segment in each of the R place sequences.

1023. When the frequency of a sequence segment in the frequent segment candidate set with a length of x + 1 in a plurality of place sequences is greater than the preset frequency and the sum of the probabilities of the plurality of place sequences is greater than the preset probability, determining that the sequence segment is a frequent segment with a length of x + 1.

1024. Determining M frequent segments among all the frequent segments as the M target sequence segments.
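The test in step 1023 can be sketched as follows. This is one possible reading, assumed here for illustration: the place sequences in which the segment's frequency exceeds the preset frequency are collected, and the sum of their probabilities is compared with the preset probability. The names `is_frequent`, `min_freq` and `min_prob`, and the probabilities in the example, are hypothetical.

```python
def segment_frequency(segment, place_sequence):
    """Count in-order, non-overlapping matches of `segment` (see step 102)."""
    count, state = 0, 0
    for place in place_sequence:
        if place == segment[state]:
            state += 1
            if state == len(segment):
                count += 1
                state = 0
    return count

def is_frequent(segment, place_sequences, probabilities, min_freq, min_prob):
    """Apply the two thresholds of step 1023 to one candidate segment."""
    supporting_prob = sum(
        prob
        for seq, prob in zip(place_sequences, probabilities)
        if segment_frequency(segment, seq) > min_freq
    )
    return supporting_prob > min_prob

# Three place sequences with hypothetical probabilities.
seqs = [("A", "B", "A", "B"), ("A", "B", "D", "B"), ("C", "C", "D", "B")]
probs = [0.5, 0.3, 0.2]
print(is_frequent(("A", "B"), seqs, probs, min_freq=0, min_prob=0.6))  # True
print(is_frequent(("A", "B"), seqs, probs, min_freq=1, min_prob=0.6))  # False
```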

Illustratively, based on the example described in FIG. 2, the frequent segment candidate set with a length of 1 is the set of all places constituting the above 18 place sequences, i.e., the set consisting of [A], [B], [C] and [D].

Specifically, an Apriori algorithm may be used to generate the frequent segment candidate set with a length of x + 1 from the frequent segments with a length of x. Alternatively, the frequent segment candidate set with a length of x + 1 can be obtained as follows: each frequent segment with a length of 1 is inserted, in turn, at each position of each frequent segment with a length of x. For simplicity of description, the embodiments of the present invention are described hereinafter taking this case as an example.

Illustratively, when the frequent segments of length 1 are [ A ] and [ B ], then the frequent segment candidate set of length 2 is a set consisting of sequence segments [ AA ], [ AB ], [ BA ] and [ BB ].

Illustratively, when the frequent fragments with the length of 2 are [ AB ] and [ AC ], and the frequent fragments with the length of 1 are [ A ], [ D ], the sequence fragments obtained by inserting [ A ] into [ AB ] are [ AAB ] and [ ABA ]; the sequence fragments obtained by inserting the [ D ] into the [ AB ] are [ DAB ], [ ADB ] and [ ABD ]; similarly, the [ A ] is inserted into the [ AC ], the [ D ] is inserted into the [ AC ] to obtain other sequence fragments, and the frequent fragment candidate set with the length of 3 is a set consisting of [ AAB ], [ ABA ], [ DAB ], [ ADB ], [ ABD ], [ AAC ], [ ACA ], [ DAC ], [ ADC ] and [ ACD ].
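The insertion-based candidate generation in the two examples above can be sketched as follows; the function name `insert_candidates` and the tuple representation of segments are assumptions made for illustration.

```python
def insert_candidates(frequent_x, frequent_1):
    """Insert each length-1 frequent segment at every position of each
    length-x frequent segment; duplicates collapse in the set."""
    candidates = set()
    for segment in frequent_x:
        for (place,) in frequent_1:
            for pos in range(len(segment) + 1):
                candidates.add(segment[:pos] + (place,) + segment[pos:])
    return candidates

frequent_2 = [("A", "B"), ("A", "C")]
frequent_1 = [("A",), ("D",)]
candidates_3 = insert_candidates(frequent_2, frequent_1)
print(sorted("".join(c) for c in candidates_3))
# ['AAB', 'AAC', 'ABA', 'ABD', 'ACA', 'ACD', 'ADB', 'ADC', 'DAB', 'DAC']
```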

Optionally, in the process of scanning a sequence segment in the frequent segment candidate set with the length of x +1 in one of the place sequences, the frequency of the sequence segment in the place sequence is recorded by an automaton.

In this case, an automaton is first set for each sequence segment in the frequent segment candidate set. Taking the 15th place sequence [C, B, A, B] above as an example, when the length-2 sequence segments [CB] and [CA] are scanned, [CB] corresponds to the first automaton and [CA] corresponds to the second automaton. When the first place C is scanned, the two automatons enter the initial state C at the same time. When the second place B is scanned, the first automaton enters its second state B; since B is the final state of the sequence segment [CB], the frequency of [CB] is increased by 1. When the third place A is scanned, the second automaton enters its second state A; since A is the final state of the sequence segment [CA], the frequency of [CA] is increased by 1. The two automatons then continue scanning; whenever a scanned place is C, they re-enter the initial state C and the above process is repeated. Once the last place of the place sequence has been scanned, the frequency of each sequence segment in the place sequence is determined.

Optionally, step 1024 may be implemented as follows: determining M closed frequent segments among all the frequent segments as the M target sequence segments, where a closed frequent segment is a frequent segment that is not a sub-segment of any other frequent segment. It should be noted that a frequent segment is not regarded as a sub-segment of itself.

In general, a large number of target sequence segments can be determined according to one track sequence of a user, and the number of target sequence segments can be reduced by the alternative method.

Illustratively, suppose there are 6 frequent segments in total: the frequent segments with a length of 1 are [A], [B] and [C]; those with a length of 2 are [AB] and [AC]; and the one with a length of 3 is [ABC]. Then [A], [B], [C], [AB] and [AC] are all sub-segments of [ABC], and [ABC] is the only closed frequent segment, so the closed frequent segment [ABC] can be determined to be a target sequence segment.
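A minimal sketch of the closed-segment filter, assuming segments are written as strings of single-letter places (an illustrative simplification) and using the hypothetical helper names `is_subsegment` and `closed_segments`:

```python
def is_subsegment(short, long):
    """True if `short` occurs in `long` in order, not necessarily contiguously."""
    remaining = iter(long)
    # `place in remaining` advances the iterator, so order is enforced.
    return all(place in remaining for place in short)

def closed_segments(frequent):
    """Keep frequent segments that are not a sub-segment of any other one."""
    return [
        seg for seg in frequent
        if not any(seg != other and is_subsegment(seg, other) for other in frequent)
    ]

frequent = ["A", "B", "C", "AB", "AC", "ABC"]
print(closed_segments(frequent))  # ['ABC']
```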

The above embodiments describe a method for determining a target sequence segment of a user by using one track sequence of one user, and when there are multiple track sequences, the target sequence segment of the user can be determined according to the multiple track sequences by using the above method.

Optionally, the track sequence is a track sequence of the user within a preset time period.

Specifically, the preset time period may be 1 day, a morning or an afternoon, or may be another preset time period, which is not specifically limited in the embodiment of the present invention.

In a specific implementation, since the user may perform different activities in different time periods, in order to improve the accuracy of the determined action track of the user, the target sequence segments of the user may be determined from the user's daily track sequences on working days, or from the user's daily track sequences on rest days. When the action track of the user is determined from target sequence segments obtained in this way, different recommendation contents can be provided to the user according to the user's action track on working days and on rest days, which improves recommendation efficiency.

Optionally, the method may further include the following steps:

(1) clustering target sequence segments of a plurality of users to obtain k cluster clusters, wherein k is more than or equal to 1 and is an integer.

(2) And representing each user as a user vector with k dimensions, wherein one dimension corresponds to one cluster, and the value of one dimension is the number of target sequence segments of the users in the cluster corresponding to the dimension.

(3) And establishing a Gaussian mixture model, and fitting parameters of the Gaussian mixture model according to the user vectors of the users, wherein the Gaussian mixture model is composed of a plurality of Gaussian models, one Gaussian model corresponds to one user group, and a plurality of user groups corresponding to the Gaussian models are composed of the users.
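Step (2) can be sketched as a simple count per cluster. The mapping `segment_to_cluster` stands in for the output of step (1) and is hypothetical, as are the function name and the example segments.

```python
from collections import Counter

def user_vector(user_segments, segment_to_cluster, k):
    """Return a k-dimensional vector whose j-th entry is the number of the
    user's target sequence segments that fall in cluster j."""
    counts = Counter(segment_to_cluster[seg] for seg in user_segments)
    return [counts.get(cluster, 0) for cluster in range(k)]

# Hypothetical clustering of four target sequence segments into k = 3 clusters.
segment_to_cluster = {"AB": 0, "ABC": 0, "DE": 1, "FG": 2}
vec = user_vector(["AB", "ABC", "FG"], segment_to_cluster, k=3)
print(vec)  # [2, 0, 1]
```

The resulting vectors are the inputs to the Gaussian mixture model of step (3).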

In this alternative method, the target sequence segment of the user may be a target sequence segment determined according to one track sequence, or may be a target sequence segment determined according to a plurality of track sequences.

For example, the target sequence segments of the plurality of users may be clustered by algorithms such as K-means or K-medoids, or by other algorithms, which is not limited in this embodiment of the present invention.

Suppose that the plurality of users can be divided into J user groups, denoted C_1, ..., C_J. The GMM (Gaussian Mixture Model) is then composed of J Gaussian models, and one Gaussian model corresponds to one user group. A Gaussian model can be constructed from the user vectors in the user group corresponding to that Gaussian model. Assuming that there are I user vectors for I users (I is generally large), the probability density function of the GMM formed by linear addition of the J Gaussian models is:

P(U_i | Θ) = Σ_{j=1}^{J} P(C_j) · P(U_i | C_j)

where 1 ≤ j ≤ J, 1 ≤ i ≤ I, and i, I, j and J are integers.

Here, P(C_j) is the probability that the jth user group occurs among the I users, P(U_i | C_j) is the probability that the ith user occurs in the jth user group, and P(U_i | Θ) is the probability that the ith user occurs when the parameter is Θ.
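A density formed by linearly adding J Gaussian models, each weighted by P(C_j), can be evaluated as in the sketch below. For brevity it assumes diagonal covariances (one variance per dimension), which the embodiment does not require; the names `gaussian_pdf` and `mixture_density` are hypothetical.

```python
import math

def gaussian_pdf(u, mean, var):
    """P(U_i | C_j) for a Gaussian with diagonal covariance (assumption)."""
    log_p = 0.0
    for x, m, v in zip(u, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
    return math.exp(log_p)

def mixture_density(u, weights, means, variances):
    """P(U_i | Theta): sum over groups of P(C_j) * P(U_i | C_j)."""
    return sum(w * gaussian_pdf(u, m, v)
               for w, m, v in zip(weights, means, variances))
```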

Θ comprises 2J parameters, namely μ_1, ..., μ_J and Σ_1, ..., Σ_J, where μ_j and Σ_j are the parameters corresponding to the jth user group. Hereinafter, μ_j^(g) denotes the μ_j of the jth user group obtained in the gth round of calculation (g ≥ 0 and g is an integer), and Σ_j^(g) denotes the Σ_j of the jth user group obtained in the gth round of calculation. The process of fitting the parameters of the GMM (i.e., the process of calculating μ_1, ..., μ_J and Σ_1, ..., Σ_J) includes:

(1) Using the P(U_i | C_j) and P(C_j) calculated from μ_j^(g) and Σ_j^(g), calculating μ_j^(g+1); using μ_j^(g+1), calculating Σ_j^(g+1); and then calculating P(U_i | C_j) and P(C_j) from μ_j^(g+1) and Σ_j^(g+1). The P(C_j | U_i) used when calculating μ_j^(g+1) can be calculated from the P(U_i | C_j) and P(C_j) obtained from μ_j^(g) and Σ_j^(g).

(2) Let g = g + 1.

Steps (1) and (2) are repeated until P(U_i | C_j) converges, and the μ_j and Σ_j at which P(U_i | C_j) converges are determined as the parameters corresponding to the jth user group.
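For reference, the standard expectation-maximization (EM) updates for a Gaussian mixture are consistent with the quantities named in steps (1) and (2). The block below is a sketch under that assumption and is not necessarily the exact set of formulas used in this embodiment; the posterior P(C_j | U_i) on the right-hand sides is the one computed from the round-g parameters.

```latex
\begin{aligned}
P(C_j \mid U_i) &= \frac{P(C_j)\,P(U_i \mid C_j)}{\sum_{l=1}^{J} P(C_l)\,P(U_i \mid C_l)} \\
\mu_j^{(g+1)} &= \frac{\sum_{i=1}^{I} P(C_j \mid U_i)\,U_i}{\sum_{i=1}^{I} P(C_j \mid U_i)} \\
\Sigma_j^{(g+1)} &= \frac{\sum_{i=1}^{I} P(C_j \mid U_i)\,\bigl(U_i-\mu_j^{(g+1)}\bigr)\bigl(U_i-\mu_j^{(g+1)}\bigr)^{\mathsf T}}{\sum_{i=1}^{I} P(C_j \mid U_i)} \\
P(C_j) &= \frac{1}{I}\sum_{i=1}^{I} P(C_j \mid U_i) \\
P(U_i \mid C_j) &= \mathcal{N}\!\bigl(U_i;\ \mu_j^{(g+1)},\ \Sigma_j^{(g+1)}\bigr)
\end{aligned}
```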

When g = 0, μ_j and Σ_j are initialized to μ_j^(0) and Σ_j^(0).

According to μ_j^(0) and Σ_j^(0), P(U_i | C_j) and P(C_j) can be calculated, where U_i refers to the user vector of the ith user.
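The iterative fitting can be sketched as a small EM loop. Several choices here are assumptions made only for illustration: diagonal covariances, a uniform initial P(C_j), caller-supplied initial means and variances, a log-likelihood change below `tol` as the convergence test, and a variance floor `min_var`. The names `normal_pdf` and `fit_gmm` are hypothetical, and the sketch has no safeguards against numerical underflow.

```python
import math

def normal_pdf(u, mean, var):
    # Gaussian density with diagonal covariance (assumption made for brevity).
    return math.exp(sum(-0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
                        for x, m, v in zip(u, mean, var)))

def fit_gmm(vectors, init_means, init_vars, tol=1e-6, max_rounds=200, min_var=1e-3):
    means = [list(m) for m in init_means]
    variances = [list(v) for v in init_vars]
    J, I, dim = len(means), len(vectors), len(vectors[0])
    weights = [1.0 / J] * J  # P(C_j); uniform start is an assumption
    prev_ll = None
    for _ in range(max_rounds):
        # P(C_j | U_i) from the current parameters, by Bayes' rule
        resp, ll = [], 0.0
        for u in vectors:
            joint = [weights[j] * normal_pdf(u, means[j], variances[j])
                     for j in range(J)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([p / total for p in joint])
        if prev_ll is not None and abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
        # next-round mean, variance and P(C_j) for every group
        for j in range(J):
            n_j = sum(r[j] for r in resp)
            means[j] = [sum(r[j] * u[d] for r, u in zip(resp, vectors)) / n_j
                        for d in range(dim)]
            variances[j] = [max(min_var,
                                sum(r[j] * (u[d] - means[j][d]) ** 2
                                    for r, u in zip(resp, vectors)) / n_j)
                            for d in range(dim)]
            weights[j] = n_j / I
    return weights, means, variances
```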

The probability that each user belongs to each user group is determined according to the user vector of the user and the parameters corresponding to each user group; the user group for which this probability is largest is determined as the user group to which the user belongs. In this way, the user groups to which the plurality of users belong can be determined.
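The assignment rule can be sketched as follows, assuming the per-group likelihoods P(U_i | C_j) and priors P(C_j) have already been computed from the fitted parameters. The function name `assign_user_group` is hypothetical.

```python
def assign_user_group(likelihoods, priors):
    """Return (index of the most probable group, list of P(C_j | U_i)).

    likelihoods: [P(U_i | C_j) for each group j]
    priors:      [P(C_j) for each group j]
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    posteriors = [x / total for x in joint]
    best = max(range(len(posteriors)), key=lambda j: posteriors[j])
    return best, posteriors
```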

In this optional method, the large number of target sequence segments corresponding to the plurality of users are clustered to simplify the user vectors, a GMM is established according to the user vectors of the plurality of users, and the parameters of the GMM are fitted, so that the probability that a user belongs to each user group can be determined from the user vector of the user, and the user group with the largest probability is determined as the user group to which the user belongs. With this optional method, the same recommendation can be made to users in the same user group, which improves recommendation efficiency.

An embodiment of the present invention provides an apparatus 30 for determining a user action track, configured to execute the method for determining a user action track shown in fig. 1, as shown in fig. 3, where the apparatus 30 includes:

a first determining unit 301, configured to determine R place sequences according to a track sequence made up of N position data of a user; wherein a place sequence is composed of N places, and the nth place of the N places is one place in the place set corresponding to the nth position data of the N position data; the place set corresponding to one position data is a fuzzy set of all places within a preset range centered on the position data; 1 ≤ n ≤ N, R ≥ 1, and n, N and R are integers;

a second determining unit 302, configured to determine M target sequence segments according to the R place sequences; wherein, when the frequency of a sequence segment in a plurality of place sequences among the R place sequences is greater than a preset frequency and the sum of the probabilities of the plurality of place sequences is greater than a preset probability, the sequence segment is a target sequence segment; the frequency of a sequence segment in a place sequence refers to the number of times the sequence segment appears in the place sequence; the probability of a place sequence is the product of the probabilities of all places in the place sequence, and the probability of a place is the probability of the place in the place set to which the place belongs; M ≥ 1 and M is an integer;

an executing unit 303, configured to use a route formed by serially connecting places in any one of the M target sequence segments according to a time sequence as an action track of the user.
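The criterion applied by the second determining unit can be sketched as follows. This reflects one reading of the condition: the place sequences in which the segment occurs more often than the preset frequency are collected, and the sum of their probabilities is compared with the preset probability. The assumption that a segment occurs as a contiguous run of places, and the names `count_occurrences` and `is_target_segment`, are illustrative only.

```python
from math import prod

def count_occurrences(segment, places):
    """Number of (possibly overlapping) contiguous occurrences of segment."""
    n, m = len(places), len(segment)
    return sum(1 for s in range(n - m + 1)
               if tuple(places[s:s + m]) == tuple(segment))

def is_target_segment(segment, place_sequences, min_freq, min_prob):
    """place_sequences: list of [(place, probability), ...] lists."""
    support = 0.0
    for seq in place_sequences:
        places = [place for place, _ in seq]
        if count_occurrences(segment, places) > min_freq:
            # probability of a place sequence = product of its place probabilities
            support += prod(p for _, p in seq)
    return support > min_prob
```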

Optionally, as shown in fig. 4, the second determining unit 302 may include:

a generating unit 3021, configured to generate a frequent segment candidate set of length x+1 according to the frequent segments of length x, where, when x = 0, the frequent segment candidate set of length 1 is the set of all distinct places that make up the R place sequences; x ≥ 0 and x is an integer;

a scanning unit 3022, configured to scan each of the R place sequences for each sequence segment in the frequent segment candidate set of length x+1, to obtain the frequency of each sequence segment in the frequent segment candidate set of length x+1 in each of the R place sequences;

a first determining subunit 3023, configured to determine that a sequence segment in the frequent segment candidate set of length x+1 is a frequent segment of length x+1 when the frequency of the sequence segment in a plurality of place sequences is greater than the preset frequency and the sum of the probabilities of the plurality of place sequences is greater than the preset probability;

a second determining subunit 3024, configured to determine M frequent segments among all the frequent segments as the M target sequence segments.
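Candidate generation of this kind resembles the join step of Apriori-style sequence mining. The sketch below uses that join rule (two frequent segments of length x are merged when the tail of one equals the head of the other) as an assumption; the text above does not state how the generating unit builds the candidates, and the name `generate_candidates` is hypothetical.

```python
def generate_candidates(frequent, place_sequences):
    """Return the candidate set of length x+1.

    frequent:        set of frequent segments of length x (tuples); empty for x = 0
    place_sequences: list of place lists, used only for the x = 0 case
    """
    if not frequent:
        # x = 0: every distinct place is a length-1 candidate
        return {(place,) for seq in place_sequences for place in seq}
    # x >= 1: join a and b when a without its first place equals
    # b without its last place (assumed Apriori-style join)
    return {a + (b[-1],) for a in frequent for b in frequent if a[1:] == b[:-1]}
```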

Optionally, the second determining subunit 3024 is specifically configured to:

Optionally, in the process of scanning a place sequence for a sequence segment in the frequent segment candidate set of length x+1, the scanning unit 3022 records the frequency of the sequence segment in the place sequence by means of an automaton.
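One way to count occurrences with an automaton in a single pass is the Knuth-Morris-Pratt matching automaton, sketched below. The choice of KMP, the treatment of a segment as a contiguous run of places, and the names `build_failure` and `count_with_automaton` are assumptions; the text does not specify which automaton is used.

```python
def build_failure(segment):
    """KMP failure table: fail[i] is the length of the longest proper
    prefix of segment[:i + 1] that is also a suffix of it."""
    fail = [0] * len(segment)
    k = 0
    for i in range(1, len(segment)):
        while k and segment[i] != segment[k]:
            k = fail[k - 1]
        if segment[i] == segment[k]:
            k += 1
        fail[i] = k
    return fail

def count_with_automaton(segment, sequence):
    """Count (possibly overlapping) occurrences of segment in sequence."""
    if not segment:
        return 0
    fail = build_failure(segment)
    state = count = 0  # state = number of segment places matched so far
    for place in sequence:
        while state and place != segment[state]:
            state = fail[state - 1]
        if place == segment[state]:
            state += 1
        if state == len(segment):
            count += 1
            state = fail[state - 1]
    return count
```

Because the automaton state is carried across the sequence, each place is read once and no position is rescanned.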

Optionally, as shown in fig. 4, the apparatus 30 may further include:

a clustering unit 304, configured to cluster the target sequence segments of a plurality of users to obtain k clusters, where k ≥ 1 and k is an integer;

a representing unit 305, configured to represent each user as a user vector with k dimensions, where each dimension corresponds to one cluster, and the value of a dimension is the number of target sequence segments of the user in the cluster corresponding to the dimension;

a calculating unit 306, configured to establish a Gaussian mixture model and fit the parameters of the Gaussian mixture model according to the user vectors of the plurality of users, where the Gaussian mixture model is composed of a plurality of Gaussian models, one Gaussian model corresponds to one user group, and the user groups corresponding to the plurality of Gaussian models are made up of the plurality of users.

Generally, the daily action track of a user has a certain regularity (for example, office workers generally follow a fixed action track on working days). The apparatus provided by this embodiment of the present invention can determine the place sequences corresponding to the user according to the track formed by the historical position data of the user; when the frequency of a sequence segment in a plurality of place sequences is high and the sum of the probabilities of the plurality of place sequences is high, the probability that the user passes through the places in the sequence segment is high. If the preset frequency and the preset probability are set reasonably, the action track of the user determined from the target sequence segments obtained in this way is more accurate than in the prior art, and navigation and place recommendation for the user based on this action track are more efficient.

In terms of hardware implementation, each unit in the apparatus 30 may be embedded in, or independent of, a processor of the apparatus 30 in hardware form, or may be stored in a memory of the apparatus 30 in software form, so that the processor can invoke and execute the operations corresponding to the above units. The processor may be a Central Processing Unit (CPU), a microprocessor, a single-chip microcomputer, or the like.

As shown in fig. 5, another apparatus 50 for determining a user action track according to an embodiment of the present invention is provided, for performing the method for determining a user action track shown in fig. 1, where the apparatus 50 includes: a memory 501, a processor 502 and a bus system 503.

The memory 501 and the processor 502 are coupled together by a bus system 503, wherein the memory 501 may comprise a random access memory, and may further comprise a non-volatile memory, such as at least one disk memory. The bus system 503 may be an ISA bus, a PCI bus, an EISA bus, or the like. The bus system 503 may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.

The memory 501 is used to store a set of codes for controlling the processor 502 to perform the following actions:

Optionally, the processor 502 is specifically configured to:

Optionally, in the process of scanning a place sequence for a sequence segment in the frequent segment candidate set of length x+1, the processor 502 records the frequency of the sequence segment in the place sequence by means of an automaton.

Optionally, the processor 502 is further configured to:

In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into modules is merely a logical division, and other divisions are possible in actual implementation: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not implemented.

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, or two or more modules may be integrated into one module. The integrated module can be implemented in the form of hardware, or in the form of hardware plus software functional modules.

The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method for determining a trajectory of a user's actions, comprising:

2. The method of claim 1, wherein said determining M target sequence segments according to said R place sequences comprises:

3. The method of claim 2, wherein the determining that M frequent segments of the total of frequent segments are M target sequence segments comprises:

4. The method of claim 3, wherein, in the process of scanning a place sequence for a sequence segment in the frequent segment candidate set of length x+1, the frequency of the sequence segment in the place sequence is recorded by an automaton.

5. The method of claim 1, wherein the sequence of tracks is a sequence of tracks of the user within a preset time period.

6. The method according to any one of claims 1-5, further comprising:

7. An apparatus for determining a trajectory of a user's actions, comprising:

8. The apparatus according to claim 7, wherein the second determining unit comprises:

9. The apparatus according to claim 8, wherein the second determining subunit is specifically configured to:

10. The apparatus of claim 9, wherein, in the process of scanning a place sequence for a sequence segment in the frequent segment candidate set of length x+1, the scanning unit records the frequency of the sequence segment in the place sequence by an automaton.

11. The apparatus of claim 7, wherein the sequence of tracks is a sequence of tracks of the user within a preset time period.

12. The apparatus according to any one of claims 7-11, further comprising: