CN108647189B

CN108647189B - Method and device for identifying user crowd attributes

Info

Publication number: CN108647189B
Application number: CN201810461773.8A
Authority: CN
Inventors: 陈肖雅; 胡晓伟; 柳正兵
Original assignee: Zhejiang Banzhi Technology Co ltd
Current assignee: Zhejiang Banzhi Technology Co ltd
Priority date: 2018-05-15
Filing date: 2018-05-15
Publication date: 2022-03-15
Anticipated expiration: 2038-05-15
Also published as: CN108647189A

Abstract

The invention provides a method and a device for identifying user crowd attributes, wherein the method comprises the following steps: collecting longitude and latitude data sets of the marked crowd; matching the longitude and latitude data set of the crowd with an AOI database to obtain an AOI distribution data set; according to the AOI distribution data set, determining probability distribution from the marked crowd to the scene and probability distribution from the scene to the AOI; collecting longitude and latitude data sets of a user; matching the longitude and latitude data set of the user with an AOI database to obtain an AOI distribution data set of the user; further determining the probability that the AOI distribution data sets respectively belong to corresponding scenes under the marked crowd; further determining scene probability distribution of the user; and judging the probability value of the user belonging to the labeled crowd according to the scene probability distribution of the user and the probability distribution from the labeled crowd to the scene. Through a big data algorithm, inference for identifying each link in the user population attribute is more scientific and reasonable, and accuracy of the user population attribute conclusion is improved.

Description

Method and device for identifying user crowd attributes

Technical Field

The invention relates to the technical field of crowd positioning, in particular to a method and a device for identifying user crowd attributes.

Background

In offline location-based crowd identification, a user's position data is collected and the places the user habitually visits are analyzed to judge whether the user belongs to a designated crowd. In real life, different crowds tend to visit different places, while members of the same crowd visit similar places. For example, besides the places related to daily life and work, the decoration family frequently visits places such as building material markets and decoration companies. If the frequency with which a user visits such places matches that of the decoration family, the user can be judged to belong to the decoration family.

In the prior art, user crowd attributes are generally judged by manually defined rules. When the probability distribution of the standard attributes is determined, such schemes lack scientific standards and an objective basis for judgment, and are strongly subjective.

Disclosure of Invention

To solve the above problems, the invention provides a method for identifying user crowd attributes; by introducing a big data algorithm, the conclusions reached when judging user crowd attributes become more scientific and reasonable.

In order to achieve the above object, the present invention provides a method for identifying attributes of a user population, comprising: collecting longitude and latitude data sets of the marked crowd; matching the longitude and latitude data set of the crowd with the AOI database to obtain an AOI distribution data set of the marked crowd; determining probability distribution of the marked crowd to a scene and probability distribution of the scene to the AOI by adopting LDA algorithm according to the AOI distribution data set of the marked crowd.

The beneficial effects of the above technical scheme are: by sampling the marked crowd, the scene probability distribution of the standard attribute is obtained, and the accuracy of the corresponding distribution probability of the crowd and the scene is improved.

Further, the method also comprises: collecting longitude and latitude data sets of a user;

matching the longitude and latitude data set of the user with the AOI database to obtain an AOI distribution data set of the user;

assuming that the user belongs to the marked crowd, and determining, by a Bayesian formula and according to the scene distribution dataset of the marked crowd and the probability distribution from the scene to the AOI, the probability that the user is in each corresponding scene of the marked crowd given a known AOI;

assume that an AOI A₂ in the user's AOI distribution dataset belongs to several scenes S₁, S₂ of the marked crowd; the probability that A₂ belongs to scene category S₁ is calculated as follows:

wherein A₂ is the information of one AOI in the user's AOI distribution dataset, S₁ is one scene category under the marked crowd, and S₂ is another scene category under the marked crowd;

the probability that A₂ in the user's AOI distribution dataset belongs to the other scene category S₂ under the marked crowd is calculated as follows:

wherein A₂ is the information of one AOI in the user's AOI distribution dataset, S₁ is one scene category under the marked crowd, and S₂ is another scene category under the marked crowd;

and calculating the probability distribution of the user over the corresponding scenes according to the probability distribution of the user's AOIs and the probability, given each known AOI, of each corresponding scene of the marked crowd.
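As a sketch of what the Bayesian formula yields under the definitions above, assuming exactly two candidate scenes and reading P(S_i) as the marked crowd's scene probability and P(A₂ | S_i) as the scene-to-AOI probability (this mapping is our reading of the text, not a formula it states), the two posteriors can be written as:

```latex
P(S_1 \mid A_2) = \frac{P(S_1)\,P(A_2 \mid S_1)}{P(S_1)\,P(A_2 \mid S_1) + P(S_2)\,P(A_2 \mid S_2)},
\qquad
P(S_2 \mid A_2) = \frac{P(S_2)\,P(A_2 \mid S_2)}{P(S_1)\,P(A_2 \mid S_1) + P(S_2)\,P(A_2 \mid S_2)}
```

Here P(S₁ | A₂) corresponds to the notation P(S₁/A₂) used elsewhere in this document, and the two posteriors sum to 1.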

Assume that the specific addresses visited under scene S₁ in the user's scene probability distribution are A₁ and A₂; the probability that the user is in scene S₁ is calculated as follows:

P(S₁)＝P(S₁,A₁)+P(S₁,A₂)

＝P(A₁)P(S₁/A₁)+P(A₂)P(S₁/A₂)，

wherein S₁ is one scene category under the marked crowd, A₂ is the information of one AOI in the user's AOI distribution dataset, and A₁ is the information of another AOI in the user's AOI distribution dataset.
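The formula for P(S₁) is a weighted sum over the AOIs the user visited. A minimal worked sketch follows; all numbers and the helper name `scene_probability` are hypothetical and chosen only for illustration.

```python
# Hypothetical share of the user's sampled positions falling in each AOI, P(A).
p_aoi = {"A1": 0.6, "A2": 0.4}

# Hypothetical posteriors P(S1/A): probability that a visit to the AOI
# is actually a visit to scene S1.
p_s1_given_aoi = {"A1": 1.0, "A2": 0.25}


def scene_probability(p_aoi, p_scene_given_aoi):
    """Sum P(A) * P(S/A) over every AOI the user visited."""
    return sum(p_aoi[a] * p_scene_given_aoi.get(a, 0.0) for a in p_aoi)


# P(S1) = P(A1)P(S1/A1) + P(A2)P(S1/A2) = 0.6 * 1.0 + 0.4 * 0.25
p_s1 = scene_probability(p_aoi, p_s1_given_aoi)
print(round(p_s1, 3))
```

An AOI with no entry in the posterior table is treated as contributing nothing to the scene, which is a design choice of this sketch.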

The beneficial effects of the further technical scheme are as follows: by sampling the users to be evaluated, the scene probability distribution of the users is obtained, and the accuracy of the corresponding distribution probability of the users and the scene is improved.

Further, the method also comprises: judging, by a maximum likelihood formula, the probability value that the user belongs to the standard crowd according to the scene probability distribution of the user and the probability distribution from the marked crowd to the scene.

The beneficial effects of the further technical scheme are as follows: a data analysis model is established; the collected position data of the user is analyzed to obtain the user's scene probability distribution; that distribution is matched against and evaluated with the scene probability distribution of the standard attribute; and, through a big data algorithm, the inference at each step of identifying user crowd attributes becomes more scientific and reasonable, which improves the accuracy of the conclusion about the user's crowd attributes.

Further, the calculation formula for judging the probability value of the user belonging to the standard population is as follows:

wherein θ is the probability distribution over scenes of the standard crowd, α′ is the number of times the user visits each scene, and P(θ/α′) is the probability value that the user belongs to the standard crowd.
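The formula itself does not appear in this text. As one illustrative way to score such a match (an assumption, not the formula of the invention), the multinomial log-likelihood of the user's visit counts α′ under a crowd's scene distribution θ can be compared across candidate crowds. The function name and all numbers below are hypothetical.

```python
import math


def multinomial_log_likelihood(theta, counts):
    """Log-probability of the observed scene visit counts under distribution theta."""
    n = sum(counts)
    # log of the multinomial coefficient n! / (c1! * c2! * ...)
    log_coef = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    log_prob = 0.0
    for p, c in zip(theta, counts):
        if c == 0:
            continue  # a scene never visited contributes nothing
        if p == 0.0:
            return float("-inf")  # user visited a scene the crowd never visits
        log_prob += c * math.log(p)
    return log_coef + log_prob


# Hypothetical scene distributions of two crowds over three scenes.
theta_decoration = [0.5, 0.3, 0.2]
theta_white_collar = [0.1, 0.2, 0.7]

# Hypothetical visit counts (alpha') of one user over the same three scenes.
user_counts = [6, 3, 1]

score_decoration = multinomial_log_likelihood(theta_decoration, user_counts)
score_white_collar = multinomial_log_likelihood(theta_white_collar, user_counts)
print(score_decoration > score_white_collar)
```

A higher score means the user's visits are better explained by that crowd's scene distribution; turning scores into a membership probability would need a further normalization step that this sketch leaves out.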

The beneficial effects of the further technical scheme are as follows: by assuming that the user belongs to a certain crowd attribute, the probability of the user's observed scene distribution is obtained, which completes the matching of the user's actual scene probability distribution against the scene probability distribution of the standard crowd through a big data algorithm and improves the accuracy of evaluating the user's crowd attributes.

Furthermore, the invention also provides a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the method according to any of the preceding claims.

In addition, the invention also provides a device for identifying user crowd attributes, which comprises: an acquisition module for collecting longitude and latitude data sets of the marked crowd; and a processing module for matching the longitude and latitude data set of the crowd with the AOI database to obtain an AOI distribution data set of the marked crowd; the processing module is further used for determining, according to the AOI distribution data set of the marked crowd, a probability density function of the probability distribution from the marked crowd to the scene and the probability distribution from the scene to the AOI.

Furthermore, the acquisition module is also used for acquiring a longitude and latitude data set of the user;

the processing module is further configured to match the longitude and latitude data set of the user with the AOI database to obtain an AOI distribution data set of the user;

the processing module is further configured to assume that the user belongs to the tagged crowd, and determine, according to the scene distribution dataset of the tagged crowd and the probability distribution from the scene to the AOI, a probability of a corresponding scene of the user under the condition that the AOI is known to belong to the tagged crowd by using a bayesian formula;

wherein A₂ is the information of one AOI in the user's AOI distribution dataset, S₁ is one scene category under the marked crowd, and S₂ is another scene category under the marked crowd;

the processing module is further configured to calculate probability distribution of each corresponding scene of the user according to the probability distribution of the user AOI and probabilities of certain corresponding scenes belonging to the tagged crowd respectively under the known AOI;

P(S₁)＝P(S₁,A₁)+P(S₁,A₂)

＝P(A₁)P(S₁/A₁)+P(A₂)P(S₁/A₂)，

Further, the processing module is further configured to determine, according to the scene probability distribution of the user and the probability distribution from the labeled crowd to the scene, a probability value that the user belongs to the standard crowd by using a maximum likelihood formula.

Further, the formula for calculating the probability value of the user belonging to the standard population by the processing module is as follows:

Drawings

Fig. 1 is a flowchart illustrating a method for identifying a user demographic property according to an embodiment of the present invention.

Fig. 2 is a second flowchart illustrating a method for identifying a user demographic property according to an embodiment of the present invention.

Fig. 3 is a third flowchart illustrating a method for identifying a user demographic property according to an embodiment of the present invention.

Fig. 4 is a fourth flowchart illustrating a method for identifying a user demographic property according to an embodiment of the present invention.

Fig. 5 is a fifth flowchart illustrating a method for identifying a user demographic property according to an embodiment of the present invention.

Fig. 6 is a sixth flowchart illustrating a method for identifying a user demographic property according to an embodiment of the present invention.

Fig. 7 is a seventh flowchart illustrating a method for identifying a user demographic property according to an embodiment of the present invention.

Fig. 8 is an eighth flowchart illustrating a method for identifying a user demographic property according to an embodiment of the present invention.

Fig. 9 is a ninth flowchart illustrating a method for identifying a user demographic property according to an embodiment of the present invention.

Fig. 10 is a tenth flowchart illustrating a method for identifying a user demographic property according to an embodiment of the present invention.

Fig. 11 is an eleventh flowchart illustrating a method for identifying a user demographic property according to an embodiment of the present invention.

Fig. 12 is a schematic structural diagram of an apparatus for identifying attributes of a user population according to an embodiment of the present invention.

Detailed Description

The principles and features of this invention are described below in conjunction with the following drawings, which are set forth by way of illustration only and are not intended to limit the scope of the invention.

Refer to FIGS. 1-12: FIGS. 1-11 are the first through eleventh flowcharts of the method for identifying user demographic attributes according to an embodiment of the present invention, and FIG. 12 is a schematic structural diagram of the apparatus for identifying attributes of a user population according to an embodiment of the present invention.

1. Offline location population identification overview

In offline location-based crowd identification, the position data of a user is generally collected and the places the user habitually visits are analyzed to judge whether the user belongs to a designated crowd.

Note the underlying assumption: different crowds habitually visit different places, while the places visited by members of the same crowd are relatively similar. For example, besides the places related to daily life and work, the decoration family visits places such as building material markets and decoration companies with a certain frequency. If the frequency with which a user visits such places matches that of the decoration family, the user can be judged to belong to the decoration family.

The crowd attribute judgment of the user can be roughly divided into the following steps:

(1) determining a scene distribution of daily visits by a population

A collection of places with common functional characteristics, such as building material markets, cafes, bars, office buildings, and residential communities, can be called a scene. The scenes that the users of a crowd visit in daily life usually follow a certain probability distribution, so for the crowd to be identified, the probability distribution over its scenes must be determined first. For example: in the daily visit records of decoration-family users, how frequently do they appear in a building material market, and how frequently in each other scene?

In existing practice, the correspondence between crowd and scene is established by manually defined rules. This approach has no data support; how strong the correlation is, and whether some scene has been left out, remain open to doubt.

(2) Obtaining address information corresponding to scenes

Each scene actually contains a collection of specific addresses. For example, the building material market scene may include the Hengda Building Material Market and the Xiaoshan Building Material Market. These specific addresses can be represented as POIs (points of information, with latitude and longitude information). A POI is a single point representing a specific address; it carries no information about the horizontal extent of that address.

To learn which scenes a user has visited, one must first know which POIs belong to each scene. In existing practice, the correspondence between scene and POI is also usually defined manually.

(3) Obtaining user daily position information list by GPS (Global Positioning System) Positioning

The location of the user is sampled periodically by GPS, for example, 1 time every 10 minutes, to obtain the location information list of the user in a certain period of time. Wherein the position information is represented in the form of latitude and longitude.

(4) Judging address distribution visited by user according to position information of user

After the user's daily longitude and latitude information is obtained, determining the specific addresses the user visited requires matching that information against the longitude and latitude information of standard addresses. The POI of each address has longitude and latitude information, but a POI represents only a point, whereas an address is usually a region. The point where the user is located must therefore be matched against the region of the standard address: if the user's coordinates fall within the region of an address, the user is at that address. The information describing the region of an address is called the AOI (address area information) of that address. Every address POI has a corresponding AOI, but the AOIs of different POIs may overlap, and different POIs may even correspond to exactly the same AOI. For example, the xx cafe and the xx restaurant may be on upper and lower floors of the same building; their AOIs coincide because an AOI carries only horizontal information and no vertical information. There are also POIs whose actual AOIs do not overlap but which, because the standard AOI database is not fine-grained enough, fall within the same AOI. For example, a residential community may contain roadside stores, snack bars, barbershops, residential buildings, and lawns, yet all of these POIs correspond to a single AOI, namely the extent of the community.

When the longitude and latitude coordinates of the user are identified to be within a certain AOI, the user can be judged to be in the address corresponding to the POI. However, in the case where the AOI corresponds to a plurality of addresses, it is not known in which address the user is actually located.
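The matching of a sampled position against AOI regions described above can be sketched as a point-in-polygon test. The sketch assumes each AOI is stored as a simple polygon of (longitude, latitude) vertices and treats coordinates as planar, which is only reasonable for small regions; the AOI names and coordinates are hypothetical, and the text does not specify how the AOI database represents regions.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# Hypothetical AOI database: AOI name -> boundary polygon.
aoi_db = {
    "Building Material Market": [
        (120.10, 30.20), (120.12, 30.20), (120.12, 30.22), (120.10, 30.22),
    ],
    "Residential Community": [
        (120.20, 30.30), (120.23, 30.30), (120.23, 30.33), (120.20, 30.33),
    ],
}


def match_aois(lon, lat, db):
    """Return the names of all AOIs containing the point (AOIs may overlap)."""
    return [name for name, poly in db.items() if point_in_polygon(lon, lat, poly)]


hits = match_aois(120.11, 30.21, aoi_db)
print(hits)
```

Returning a list rather than a single name reflects the point made above that AOIs can overlap, so one position may match several of them.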

(5) Inferring probability distribution of user daily visit scenarios

After the location information of the user is matched with the address, an address distribution list of the user in a certain period of time can be obtained. According to the preset corresponding relation between the scenes and the addresses, the probability distribution of the daily visited scenes of the user can be obtained.
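A minimal sketch of this conventional step, assuming a preset address-to-scene table; the address names, scene names, and the helper `scene_distribution` are all hypothetical.

```python
from collections import Counter

# Hypothetical preset correspondence between addresses and scenes.
address_to_scene = {
    "Office Tower Y": "office building",
    "Market X": "building material market",
    "Cafe Z": "cafe",
}

# Hypothetical address list of one user over a period, one entry per sample.
visits = (
    ["Office Tower Y"] * 5
    + ["Market X"] * 3
    + ["Cafe Z"] * 2
)


def scene_distribution(visits, mapping):
    """Relative frequency of each scene among the visits that can be mapped."""
    counts = Counter(mapping[a] for a in visits if a in mapping)
    total = sum(counts.values())
    return {scene: c / total for scene, c in counts.items()} if total else {}


dist = scene_distribution(visits, address_to_scene)
print(dist)
```

Addresses missing from the preset table are silently dropped here, which illustrates the weakness noted in this section: the result is only as good as the manually defined correspondence.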

(6) Judging whether the user belongs to the designated group

The scene distribution obtained for the single user is matched against the scene distribution of the crowd, according to the correspondence between crowd and scene defined in advance, to judge whether the user belongs to the crowd.

However, the scene distribution of a single user is never exactly the same as that of the crowd; deciding how similar the two must be to count as a match requires a metric in the big-data sense.

In summary, in the prior art, if it is desired to identify the crowd attributes of the users by the offline position data of the users without intervention of a big data algorithm, the following problems exist.

(1) The corresponding relation between the crowd and the scene address cannot be scientifically and comprehensively established.

(2) After the position information of the user is matched with the specific AOI, under the condition that one AOI corresponds to a plurality of POIs, the POIs corresponding to the positions cannot be accurately judged, and the scene where the POIs are located cannot be identified.

(3) After the probability distribution of the scene where the user is located is obtained, when whether the user belongs to a certain crowd or not is identified, scientific standards are lacked to judge that the scene distribution of the user has the homogeneity with the scene distribution of the crowd.

2. LDA overview

It should be noted that LDA is a topic model and was first applied to modeling document generation. A topic model regards an article as having multiple topics, each topic corresponding to different words. An article is constructed by first selecting a topic with a certain probability and then selecting a word under that topic with a certain probability, which generates the first word of the article. Repeating this process generates the whole article. It is of course assumed here that the order of the words does not matter.
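The generation process just described can be sketched in a few lines. The topics, words, and probabilities below are hypothetical, and the sketch fixes the topic distribution of the article instead of drawing it from a prior.

```python
import random

# Hypothetical topic distribution of one article.
topic_probs = {"sports": 0.7, "finance": 0.3}

# Hypothetical word distribution under each topic.
word_probs = {
    "sports": {"match": 0.5, "team": 0.4, "market": 0.1},
    "finance": {"market": 0.6, "stock": 0.4},
}


def generate_article(topic_probs, word_probs, length, rng):
    """Pick a topic, then a word under that topic, once per word position."""
    topics = list(topic_probs)
    topic_weights = [topic_probs[t] for t in topics]
    words = []
    for _ in range(length):
        topic = rng.choices(topics, weights=topic_weights)[0]
        vocab = list(word_probs[topic])
        word_weights = [word_probs[topic][w] for w in vocab]
        words.append(rng.choices(vocab, weights=word_weights)[0])
    return words


article = generate_article(topic_probs, word_probs, 20, random.Random(42))
print(article)
```

Because each word is drawn independently, shuffling the output changes nothing about its probability, which is the no-ordering assumption stated above.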

In practice, the problem to be solved is usually the reverse of the document generation process. For example, it may be necessary to cluster a batch of articles, to calculate the similarity between two articles, or to classify an article into a known class. Given the words that make up these articles, the LDA model uses the observable word distribution to find the hidden topics behind the words, calculates the probability distribution of words under each topic, and also obtains the probability distribution of topics for each article. Similarity analysis between articles can then be carried out on the article-topic matrix, which avoids the matrix sparsity problem that arises when similarity is computed directly on the article-word frequency matrix.

The essence of the LDA model is to use the topic to reduce the dimension of the words, so that some synonyms or words with the same meaning and different expressions can be classified as the same topic.

The idea and logic of LDA dimension reduction can also be applied to other situations, such as dimension reduction of various disordered commodity names with different expression modes, so that the commodity names are classified into the same standard category system. On the basis, a collaborative filtering algorithm can be adopted to perform personalized recommendation on commodities for the user.

The invention adopts LDA algorithm to reduce the dimension of the AOI of the user, and classifies the AOI of the user into a scene category which has a closer relationship with the attributes of the crowd.

3. Objects of the invention

In the task of identifying the crowd attributes of the users according to the collected user position information, inference of all links is more scientific and reasonable by introducing a big data algorithm.

For a certain marked specific crowd, adopting an LDA algorithm, directly establishing a corresponding relation of crowd-scene-AOI, and calculating probability distribution coefficients from the crowd to the scene and from the scene to the AOI.

For the case where one AOI corresponds to several scenes, the invention uses a Bayesian probability formula to calculate, for each specific crowd, the probability that the AOI belongs to each scene;

when the probability distribution of the user to the scene is subjected to homogeneity matching with the probability distribution of the crowd scene, the probability that the user belongs to the crowd is calculated by adopting a maximum likelihood formula.

The following describes the specific embodiments of the present invention in detail.

Firstly, establishing a corresponding crowd-scene-AOI relation system for a crowd to be identified by adopting an LDA algorithm

1. The LDA algorithm (Latent Dirichlet Allocation, a topic model used here as a dimension reduction method) is applied to establish the relationship between crowd and AOI (area of interest, address area information) as follows:

In offline location-based crowd identification, an important step is establishing the association between a crowd and the AOIs of positions. The crowd-AOI relation is established through intermediate steps: the crowd-scene relation, the scene-address (POI) relation, and the address-AOI relation. The correspondence is shown in fig. 2. By contrast, the LDA model can establish the crowd-scene-AOI association directly. Analyzing the relationship among the three shows that, for a specific crowd or a single user, the position data set represents a probability distribution over scenes, and each scene represents a probability distribution over AOIs. That is, the generation of the position data set of a crowd or a user can be seen as selecting scenes with certain probabilities, each of which in turn selects AOIs with certain probabilities.

When the LDA model is applied, the position data set of a crowd or a user over a specific period corresponds to an article; the scene distribution of the data set corresponds to the topic distribution of the article; and the AOI distribution of a scene corresponds to the word distribution of a topic.

The relationship among crowd, scene, and AOI is shown in fig. 3. Crowds and scenes are in a many-to-many relationship, as are scenes and AOIs. G denotes the set of crowds; with m crowds, G = (G₁, G₂, …, G_m). S denotes the set of scenes; with k scenes, S = (S₁, S₂, …, S_k). A denotes the set of AOIs; with n AOIs, A = (A₁, A₂, …, A_n). θ_mk denotes the crowd-to-scene probability distribution, an m × k matrix; for example, θ₁₂ is the probability that crowd 1 selects scene 2. β_kn denotes the scene-to-AOI probability distribution, a k × n matrix; for example, β₁₂ is the probability that a person in scene 1 selects AOI 2.
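Under the generative reading above (a crowd selects a scene, and the scene selects an AOI), the probability that a crowd ends up in a given AOI is the matrix product of θ and β. This is a consequence of the model as described, not a formula stated in the text; the matrix values and the helper name `crowd_to_aoi` are hypothetical.

```python
# Hypothetical theta: m = 2 crowds, k = 2 scenes; theta[g][s] = P(scene s / crowd g).
theta = [
    [0.7, 0.3],
    [0.2, 0.8],
]

# Hypothetical beta: k = 2 scenes, n = 3 AOIs; beta[s][a] = P(AOI a / scene s).
beta = [
    [0.5, 0.5, 0.0],
    [0.0, 0.4, 0.6],
]


def crowd_to_aoi(theta, beta):
    """P(AOI a / crowd g) = sum over scenes s of theta[g][s] * beta[s][a]."""
    m, k, n = len(theta), len(beta), len(beta[0])
    return [
        [sum(theta[g][s] * beta[s][a] for s in range(k)) for a in range(n)]
        for g in range(m)
    ]


p = crowd_to_aoi(theta, beta)
for row in p:
    print([round(v, 2) for v in row])
```

Each row of the result is again a probability distribution, because every row of θ and every row of β sums to 1.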

2. crowd-scene-AOI probability map model

A probabilistic graphical model of crowd-scene-AOI is shown in fig. 4. As before, A denotes the AOI set, S the scene set, β the scene-to-AOI probability distribution, and θ the crowd-to-scene probability distribution. α is the parameter of the probability distribution of θ. According to the Bayesian school of thought, the crowd-to-scene probabilities θ_mk are not fixed values; rather, the values of these probabilities themselves obey a certain probability distribution, i.e. θ_mk takes each possible value with a certain probability.

The dependency between these probabilities can be derived from the probabilistic model graph:

(1) the probability distribution of θ is determined by the parameter α, i.e., the probability density of θ is a function of the argument α; the following is a probability distribution function for θ, representing the probability of θ at each possible value.

wherein θ is the crowd-to-scene probability distribution, α is the parameter of the probability distribution of θ, and P(θ/α) is the probability of θ at each possible value.

(2) the final frequency distribution of S (scene) is determined by the crowd-to-scene probability distribution θ: a specific crowd selects each scene with the corresponding probability;

(3) the final frequency distribution of A (AOI) is determined jointly by the frequency distribution of S (scene) and the scene-to-AOI probability distribution parameter β;

that is, for a certain crowd/person, if the frequency distribution of the AOI sampling at the daily position of the certain crowd/person is to be generated, the value of the parameter α needs to be determined first, so that the probability distribution of the crowd/scene is determined, and the frequency percentage of the crowd/user daily scene is known. If the value of the parameter β is also determined, the frequency distribution of the AOI samples is also determined.
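Item (1) above refers to a probability distribution function for θ. In the standard LDA model that function is a Dirichlet density; assuming the same choice here (the text does not name the distribution), for one crowd with k scenes and parameter vector α = (α₁, …, α_k) it reads:

```latex
P(\theta \mid \alpha)
  = \frac{\Gamma\!\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)}
    \prod_{i=1}^{k} \theta_i^{\alpha_i - 1},
\qquad \theta_i \ge 0,\quad \sum_{i=1}^{k} \theta_i = 1
```

Here P(θ | α) corresponds to the notation P(θ/α) used in this document, and θ_i is the probability that the crowd selects scene i.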

In the task of establishing the crowd-scene-AOI relationship, the known conditions are: (1) the marked crowd; (2) the AOI frequency distribution of that crowd, obtained by positioning. The variables to be calculated are: (1) the scenes, and their names, that lie between crowd and AOI; (2) the crowd-to-scene probability distribution θ; (3) the scene-to-AOI probability distribution β.

The method comprises the following specific operations:

1. crowd marking

Define the crowd to be identified and label its members to obtain the marked crowd. For example, to identify the decoration family, find a batch of users known to belong to it and apply the crowd label.

Marked crowd: a group of users with known crowd attributes is manually labeled and is called the marked crowd under that attribute. For example, a group of users known to be white-collar workers is labeled "white-collar family"; the users carrying that label form a marked crowd.

Here, the acquisition module may perform S11, collecting the longitude and latitude data sets of the marked crowd, and S21, collecting the longitude and latitude data sets of the user.

2. Obtaining an AOI dataset

For the marked crowd to be identified, the longitude and latitude of each user are sampled by GPS (Global Positioning System) once every 10 minutes within a specified period and matched to AOI data using the established AOI database. This yields the AOI data set of the marked crowd, in which each AOI can be represented by its name.

Here, the processing module may perform S12, matching the longitude and latitude data set of the marked crowd with the AOI database to obtain the AOI distribution data set of the marked crowd, and S22, matching the longitude and latitude data set of the user with the AOI database to obtain the AOI distribution data set of the user.

3. Calculating model latent variable S and parameters alpha and beta through algorithm

For the marked crowd, LDA is used to solve for the probability distribution from the marked crowd to the scene and the probability distribution from the scene to the AOI.

Specifically, the method comprises the following steps:

Input:

(1) the user AOI data sets of the marked crowds, with the data set of each marked crowd forming one row;

(2) the number of scenes k, which the user may specify from experience or determine as the optimal value through repeated experiments;

(3) initial values for the parameters α and β, which may be any values within their domains;

Output:

(1) the value of the parameter α and the probability function of the scene probability distribution θ of each marked crowd; the θ value with the highest probability may be taken as the probability distribution θ from the marked crowd to the scene;

(2) the scene number assigned to each AOI and the AOI probability distribution β under each scene, together with the top-N AOIs of each scene ranked by frequency from high to low.

Here, S13 may be accomplished by the processing module determining a probability distribution of the tagged population to a scene and a probability distribution of the scene to the AOI by employing an LDA algorithm based on the AOI distribution dataset of the tagged population.
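To make the LDA step concrete, here is a minimal sketch that treats each row of AOI names as a document and each scene as a topic. It uses collapsed Gibbs sampling with fixed smoothing hyperparameters, which is one common way to fit LDA and is not necessarily the estimation procedure intended here (the text speaks of estimating α, which this sketch does not do). The function name, the hyperparameter `eta`, the default values, and the sample AOI names are assumptions for illustration.

```python
import random


def lda_gibbs(docs, k, alpha=0.1, eta=0.01, iters=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (alpha and eta are held fixed).

    docs: one list of AOI names per row of the marked crowd's data set.
    k:    number of scenes.
    Returns (theta, beta, vocab): theta[m][s] estimates P(scene s | row m),
    beta[s][n] estimates P(AOI vocab[n] | scene s).
    """
    rng = random.Random(seed)
    vocab = sorted({aoi for doc in docs for aoi in doc})
    index = {aoi: i for i, aoi in enumerate(vocab)}
    v = len(vocab)

    row_scene = [[0] * k for _ in docs]      # visits in row m assigned to scene s
    scene_aoi = [[0] * v for _ in range(k)]  # visits to AOI n assigned to scene s
    scene_total = [0] * k                    # all visits assigned to scene s
    assign = []                              # current scene of every visit

    # Random initial assignment of a scene to every AOI visit.
    for m, doc in enumerate(docs):
        row = []
        for aoi in doc:
            s = rng.randrange(k)
            row.append(s)
            row_scene[m][s] += 1
            scene_aoi[s][index[aoi]] += 1
            scene_total[s] += 1
        assign.append(row)

    for _ in range(iters):
        for m, doc in enumerate(docs):
            for i, aoi in enumerate(doc):
                n, s = index[aoi], assign[m][i]
                # Remove the visit from the counts, then resample its scene.
                row_scene[m][s] -= 1
                scene_aoi[s][n] -= 1
                scene_total[s] -= 1
                weights = [
                    (row_scene[m][t] + alpha)
                    * (scene_aoi[t][n] + eta) / (scene_total[t] + v * eta)
                    for t in range(k)
                ]
                s = rng.choices(range(k), weights=weights)[0]
                assign[m][i] = s
                row_scene[m][s] += 1
                scene_aoi[s][n] += 1
                scene_total[s] += 1

    theta = [
        [(row_scene[m][t] + alpha) / (len(doc) + k * alpha) for t in range(k)]
        for m, doc in enumerate(docs)
    ]
    beta = [
        [(scene_aoi[t][n] + eta) / (scene_total[t] + v * eta) for n in range(v)]
        for t in range(k)
    ]
    return theta, beta, vocab


# Hypothetical AOI rows for three members of a marked crowd.
docs = [
    ["Furniture Mall", "Tile Market", "Furniture Mall", "Star Avenue"],
    ["Tile Market", "Home Appliance Market", "Furniture Mall"],
    ["Star Avenue", "Clothing Mall", "Star Avenue", "Clothing Mall"],
]
theta, beta, vocab = lda_gibbs(docs, k=2)
```

Sorting each row of `beta` in descending order gives the ranked AOI list per scene that the output description mentions.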

Secondly, because one AOI may belong to several scenes, the invention uses the Bayesian probability formula to calculate the probability that a given AOI belongs to each scene

After the LDA algorithm has established, for a marked crowd (such as the decoration family), the probability distributions from the marked crowd to the scenes and from the scenes to the AOIs, it remains to determine whether a given user belongs to the tagged crowd. The user's daily AOI frequency distribution can be obtained from the user's GPS positions. The scene distribution corresponding to the user's AOIs can then be matched against the scene distribution of the tagged crowd to determine whether the user belongs to it.

However, one AOI may correspond to multiple scenes, so when a user is in a certain AOI it cannot be determined directly which scene the user is actually in. It is therefore necessary to calculate, for each marked crowd, the probability that a given AOI belongs to each scene. For example, for the decoration family in FIG. 5, suppose the user is in the AOI A₂ (Star Avenue); the user may actually be in S₁ (e.g., the clothing mall) or in S₂ (e.g., the home appliance market). That is, the probabilities that A₂ belongs to S₁ and to S₂ must be calculated under the constraint of the known θ_mk and β_kn. The Bayesian probability formula can be used for this problem.

The probability that a decoration-family user in A₂ is actually in S₁ is calculated as follows:

P(S₁/A₂) = P(A₂/S₁)P(S₁) / [P(A₂/S₁)P(S₁) + P(A₂/S₂)P(S₂)]；

wherein A₂ is the information of one AOI in the user's AOI distribution data set, S₁ is one scene category under the marked crowd, and S₂ is another scene category under the marked crowd.

The probability that a decoration-family user in A₂ is actually in S₂ is calculated as follows:

P(S₂/A₂) = P(A₂/S₂)P(S₂) / [P(A₂/S₁)P(S₁) + P(A₂/S₂)P(S₂)]；

wherein A₂ is the information of one AOI in the user's AOI distribution data set, S₁ is one scene category under the marked crowd, and S₂ is another scene category under the marked crowd.
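The Bayes calculation for an AOI shared by several scenes can be sketched as follows. The sketch assumes that P(S) is read from the marked crowd's θ and P(A/S) from β, as the constraint on θ_mk and β_kn suggests; the dictionary layout, the function name, and every number are made up for illustration.

```python
def scene_posterior(aoi, theta, beta):
    """Return {scene: P(scene / aoi)} by Bayes' formula.

    theta: {scene: P(scene)} for the marked crowd.
    beta:  {scene: {aoi: P(aoi / scene)}}.
    """
    # Numerator of Bayes' formula for every scene: P(aoi / scene) * P(scene).
    joint = {s: beta[s].get(aoi, 0.0) * p for s, p in theta.items()}
    total = sum(joint.values())  # denominator: overall probability of the AOI
    if total == 0:
        raise ValueError(f"AOI {aoi!r} has zero probability under every scene")
    return {s: j / total for s, j in joint.items()}


# Hypothetical values: A1 occurs only under S1, A2 under S1 and S2,
# A3 under S2 and S3.
theta = {"S1": 0.5, "S2": 0.3, "S3": 0.2}
beta = {
    "S1": {"A1": 0.6, "A2": 0.4},
    "S2": {"A2": 0.5, "A3": 0.5},
    "S3": {"A3": 1.0},
}
post = scene_posterior("A2", theta, beta)
```

With these numbers the numerators are 0.4 × 0.5 = 0.20 for S₁ and 0.5 × 0.3 = 0.15 for S₂, so P(S₁/A₂) = 0.20 / 0.35 and P(S₂/A₂) = 0.15 / 0.35.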

As shown in FIG. 6 and FIG. 7, when determining whether the user belongs to the decoration family, the user's AOI probability distribution can be converted into a scene probability distribution. To do so, it can be assumed that the user belongs to the decoration family and obeys the same θ_mk and β_kn distributions as the decoration family. The scene probability distribution corresponding to each AOI can then be calculated with the Bayesian formula, and finally the scene probability distribution of the user is obtained.

Here, the processing module may perform S24: calculating the probability distribution of the user over the corresponding scenes from the user's AOI probability distribution and, for each known AOI, the probability that it belongs to each corresponding scene under the tagged crowd.

Suppose the AOIs collected for the user are A₁, A₂, A₃, with frequencies P(A₁), P(A₂), P(A₃) respectively, where A₁ corresponds to scene S₁, A₂ corresponds to scenes S₁ and S₂, and A₃ corresponds to scenes S₂ and S₃. Let the probabilities of S₁, S₂, S₃ be P(S₁), P(S₂), P(S₃) respectively.

From these, the probability distribution of the scenes visited daily by the user to be identified is obtained.

When judging whether user x belongs to the decoration family, the AOI probability distribution of user x can be converted into a scene probability distribution. To do so, it can be assumed that user x belongs to the decoration family and obeys the same θ_mk and β_kn distributions as the decoration family. The scene probability distribution corresponding to each AOI can then be calculated with the Bayesian formula, and finally the scene probability distribution of user x is obtained.

S23 may be accomplished by the processing module to determine the probability of the user belonging to a corresponding scene under the tagged group of people by using a bayesian formula based on the scene distribution dataset of the tagged group of people and the probability distribution of the scene to the AOI.

Suppose the AOIs collected for user x are A₁, A₂, A₃, with frequencies P(A₁), P(A₂), P(A₃) respectively, where A₁ corresponds to scene S₁, A₂ corresponds to scenes S₁ and S₂, and A₃ corresponds to scenes S₂ and S₃. Let the probabilities of S₁, S₂, S₃ be P(S₁), P(S₂), P(S₃) respectively.

P(S₁)＝P(A₁)P(S₁/A₁)+P(A₂)P(S₁/A₂)；

P(S₂)＝P(A₂)P(S₂/A₂)+P(A₃)P(S₂/A₃)；

P(S₃) is obtained in the same way.

This gives the probability distribution of the scenes visited daily by the user to be identified.
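The conversion from AOI frequencies to a scene distribution can be sketched as a weighted sum of the per-AOI Bayes posteriors, P(S) = Σ P(A)P(S/A). As in the example in the text, A₁ maps to S₁, A₂ to S₁ and S₂, and A₃ to S₂ and S₃; the numeric values, the dictionary layout, and the function name are assumptions for illustration.

```python
def scene_distribution(aoi_freq, theta, beta):
    """Return {scene: P(scene)} for a user from the user's AOI frequencies.

    aoi_freq: {aoi: P(aoi)} observed for the user.
    theta:    {scene: P(scene)} of the marked crowd.
    beta:     {scene: {aoi: P(aoi / scene)}} of the marked crowd.
    """
    scenes = {s: 0.0 for s in theta}
    for aoi, p_aoi in aoi_freq.items():
        # Bayes posterior P(scene / aoi) under the marked crowd's theta, beta.
        joint = {s: beta[s].get(aoi, 0.0) * theta[s] for s in theta}
        total = sum(joint.values())
        if total == 0:
            continue  # AOI never observed for this marked crowd; skip it
        for s, j in joint.items():
            # Law of total probability: add P(aoi) * P(scene / aoi).
            scenes[s] += p_aoi * j / total
    return scenes


# Hypothetical marked-crowd parameters and user AOI frequencies.
theta = {"S1": 0.5, "S2": 0.3, "S3": 0.2}
beta = {
    "S1": {"A1": 0.6, "A2": 0.4},
    "S2": {"A2": 0.5, "A3": 0.5},
    "S3": {"A3": 1.0},
}
aoi_freq = {"A1": 0.5, "A2": 0.3, "A3": 0.2}
user_scenes = scene_distribution(aoi_freq, theta, beta)
```

Because only A₃ corresponds to S₃ in this setup, P(S₃) reduces to P(A₃)P(S₃/A₃). AOIs unknown to the marked crowd are skipped, so the result sums to 1 only when every user AOI is covered.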

Thirdly, based on the AOI visit distribution data of a newly added user, the maximum likelihood method is used to judge the probability that the user belongs to a given labeled crowd

The probability distribution from the marked crowd to the scenes has been obtained through the LDA algorithm, and the scene probability distribution of the user has been derived through the Bayesian probability formula; the scenes in the user's distribution and in the marked crowd's distribution are named consistently.

It now has to be determined whether the user belongs to the labeled crowd G, which requires matching the user's scene probability distribution against that of the labeled crowd G. To obtain the probability that the user matches the labeled crowd G, assume that the user belongs to G and ask how probable the user's observed scene distribution then is; this probability can be obtained with the maximum likelihood function of the user's scene distribution.

S31 may be accomplished by the processing module, wherein the maximum likelihood formula is used to determine the probability value that the user belongs to the tagged crowd according to the scene probability distribution of the user and the probability distribution from the tagged crowd to the scene.

Let the probabilities that the tagged crowd G visits scenes S₁, S₂, S₃ … S_k be θ₁, θ₂, θ₃ … θ_k respectively, and let the numbers of data points collected from the user in these scenes be α'₁, α'₂, α'₃ … α'_k. The maximum likelihood function for the probability of observing the user's scene frequency distribution is as follows:

wherein θ is the probability distribution of the labeled crowd over the scenes, α' is the number of data points collected from the user in each scene, and P(θ/α') is the probability that the labeled crowd actually produces the same scene distribution as the user.

The above formula represents the probability that the tagged crowd G actually produces the same scene distribution as the user, and the resulting probability value can be used to represent and evaluate the degree of match between the user and the tagged crowd G.
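The likelihood formula itself is not reproduced in this text. One plausible reading, offered only as an assumption, is a multinomial-style product P(θ/α') ∝ θ₁^α'₁ · θ₂^α'₂ · … · θ_k^α'_k. The sketch below evaluates the logarithm of that product to avoid numerical underflow; the function name, the omission of any normalizing coefficient, and the example numbers are all illustrative choices.

```python
import math


def log_likelihood(theta, counts):
    """Log of prod_k theta_k ** counts_k (assumed multinomial-style form).

    theta:  {scene: P(scene)} of the marked crowd G.
    counts: {scene: number of user data points in that scene}.
    """
    total = 0.0
    for scene, n in counts.items():
        if n == 0:
            continue  # a scene the user never visited contributes nothing
        p = theta.get(scene, 0.0)
        if p == 0.0:
            return float("-inf")  # user visited a scene G never visits
        total += n * math.log(p)
    return total


# Hypothetical scene distributions of two marked crowds and one user's counts.
theta_g = {"S1": 0.5, "S2": 0.3, "S3": 0.2}
theta_other = {"S1": 0.1, "S2": 0.2, "S3": 0.7}
user_counts = {"S1": 6, "S2": 3, "S3": 1}

ll_g = log_likelihood(theta_g, user_counts)
ll_other = log_likelihood(theta_other, user_counts)
```

Under this assumed form, a higher value means the user's visits are better explained by that crowd's scene distribution, so `ll_g > ll_other` would point toward crowd G. Comparing users with different numbers of data points would need some normalization, which the text does not specify.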

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A method for identifying attributes of a population of users, comprising:

collecting longitude and latitude data sets of the marked crowd;

matching the longitude and latitude data set of the crowd with an AOI database to obtain an AOI distribution data set of the marked crowd;

determining probability distribution of the marked crowd to a scene and probability distribution of the scene to the AOI by adopting an LDA algorithm according to the AOI distribution data set of the marked crowd;

input: the AOI distribution data set of the marked crowd, wherein each data set of the marked crowd occupies one row;

the number of scenes is k;

initial values are set for the parameters α and β, wherein the initial values of α and β lie within their domains of definition;

output:

the value of the parameter α and the probability function of the scene probability distribution θ corresponding to each marked crowd, wherein the θ value with the maximum probability value is adopted as the probability distribution θ from the marked crowd to the scene;

the scene number assigned to each AOI and the AOI probability distribution β under each scene, wherein the top-N AOIs of each scene are arranged from high to low by frequency.

2. The method of identifying demographic attributes of a user of claim 1, further comprising: collecting longitude and latitude data sets of a user;

assuming that A₂ in the user's AOI distribution data set belongs to a plurality of scenes S₁, S₂ under the marked crowd, the probability that A₂ belongs to the scene category S₁ is calculated as follows:

P(S₁/A₂) = P(A₂/S₁)P(S₁) / [P(A₂/S₁)P(S₁) + P(A₂/S₂)P(S₂)]，

wherein P(S₁/A₂) is the probability that the information of one AOI in the user's AOI distribution data set belongs to one scene category under the marked crowd, P(A₂/S₁) is the probability that the information of one AOI in the user's AOI distribution data set occurs under one scene category of the marked crowd, P(A₂/S₂) is the probability that the information of one AOI in the user's AOI distribution data set occurs under another scene category of the marked crowd, P(S₁) is the probability of one scene category under the marked crowd, P(S₂) is the probability of another scene category under the marked crowd, A₂ is the information of one AOI in the user's AOI distribution data set, S₁ is one scene category under the marked crowd, and S₂ is another scene category under the marked crowd;

P(S₂/A₂) = P(A₂/S₂)P(S₂) / [P(A₂/S₁)P(S₁) + P(A₂/S₂)P(S₂)]，

wherein P(S₂/A₂) is the probability that the information of one AOI in the user's AOI distribution data set belongs to another scene category under the marked crowd, P(A₂/S₂) is the probability that the information of one AOI in the user's AOI distribution data set occurs under another scene category of the marked crowd, P(S₁) is the probability of one scene category under the marked crowd, P(S₂) is the probability of another scene category under the marked crowd, A₂ is the information of one AOI in the user's AOI distribution data set, S₁ is one scene category under the marked crowd, and S₂ is another scene category under the marked crowd;

calculating the probability distribution of each corresponding scene of the user according to the probability distribution of the AOI of the user and the probability of a certain corresponding scene which belongs to the marked crowd respectively under the known AOI;

P(S₁) = P(S₁,A₁) + P(S₁,A₂)

= P(A₁)P(S₁/A₁) + P(A₂)P(S₁/A₂)，

wherein P(S₁) is the probability of one scene category under the marked crowd; P(S₁,A₁) is the probability that the information of another AOI in the user's AOI distribution data set and one scene category under the marked crowd occur simultaneously; P(S₁,A₂) is the probability that the information of one AOI in the user's AOI distribution data set and one scene category under the marked crowd occur simultaneously; P(S₁/A₁) is the probability that the information of another AOI in the user's AOI distribution data set belongs to one scene category under the marked crowd; P(S₁/A₂) is the probability that the information of one AOI in the user's AOI distribution data set belongs to one scene category under the marked crowd; P(A₁) is the probability of the information of another AOI in the user's AOI distribution data set; P(A₂) is the probability of the information of one AOI in the user's AOI distribution data set; S₁ is one scene category under the marked crowd; A₂ is the information of one AOI in the user's AOI distribution data set; and A₁ is the information of another AOI in the user's AOI distribution data set.

3. The method of identifying demographic attributes of a user of claim 2, further comprising:

and judging the probability value of the user belonging to the standard crowd by adopting a maximum likelihood formula according to the scene probability distribution of the user and the probability distribution from the marked crowd to the scene.

4. The method of claim 3, wherein the formula for determining the probability value of the user belonging to the standard population is:

，

wherein θ is the probability distribution of the marked crowd over the scenes,

α' is the number of data points collected from the user in each scene, and

P(θ/α') is the probability that the marked crowd actually produces the same scene distribution as the user.

5. A computer-readable storage medium characterized by: comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 4.

6. An apparatus for identifying attributes of a population of users, comprising:

the acquisition module is used for acquiring longitude and latitude data sets of the marked crowd;

the processing module is used for matching the longitude and latitude data set of the crowd with an AOI database to obtain an AOI distribution data set of the marked crowd;

a processing module for determining a probability distribution of the tagged population to a scene and a probability distribution of the scene to the AOI by employing an LDA algorithm according to the AOI distribution dataset of the tagged population;

the number of scenes is k;

output:

7. The apparatus for identifying attributes of a user population according to claim 6,

the acquisition module is also used for acquiring a longitude and latitude data set of the user;

P(S₁/A₂) = P(A₂/S₁)P(S₁) / [P(A₂/S₁)P(S₁) + P(A₂/S₂)P(S₂)]，

wherein P(S₁) is the probability of one scene category under the marked crowd, P(S₂) is the probability of another scene category under the marked crowd, A₂ is the information of one AOI in the user's AOI distribution data set, S₁ is one scene category under the marked crowd, and S₂ is another scene category under the marked crowd;

P(S₂/A₂) = P(A₂/S₂)P(S₂) / [P(A₂/S₁)P(S₁) + P(A₂/S₂)P(S₂)]，

wherein P(S₁) is the probability of one scene category under the marked crowd;

P(S₁) = P(S₁,A₁) + P(S₁,A₂)

= P(A₁)P(S₁/A₁) + P(A₂)P(S₁/A₂)，

wherein P(S₁) is the probability of one scene category under the marked crowd; P(S₁,A₁) is the probability that the information of another AOI in the user's AOI distribution data set and one scene category under the marked crowd occur simultaneously; P(S₁,A₂) is the probability that the information of one AOI in the user's AOI distribution data set and one scene category under the marked crowd occur simultaneously; P(S₁/A₁) is the probability that the information of another AOI in the user's AOI distribution data set belongs to one scene category under the marked crowd; P(S₁/A₂) is the probability that the information of one AOI in the user's AOI distribution data set belongs to one scene category under the marked crowd; P(A₁) is the probability of the information of another AOI in the user's AOI distribution data set; P(A₂) is the probability of the information of one AOI in the user's AOI distribution data set; S₁ is one scene category under the marked crowd; A₂ is the information of one AOI in the user's AOI distribution data set; and A₁ is the information of another AOI in the user's AOI distribution data set.

8. The apparatus for identifying attributes of a user population according to claim 7,

and the processing module is further used for judging the probability value of the user belonging to the standard crowd by adopting a maximum likelihood formula according to the scene probability distribution of the user and the probability distribution from the marked crowd to the scene.

9. The apparatus of claim 8, wherein the formula for the processing module to determine the probability value of the user belonging to the standard population is:

，

wherein θ is the probability distribution of the standard crowd over the scenes,

α' is the number of times the user visits each scene, and

P(θ/α') is the probability value of the user belonging to the standard crowd.