CN112148822B - Fine granularity attribute weighting method and system - Google Patents

Info

Publication number
CN112148822B
Authority
CN
China
Prior art keywords
attribute
fine
matrix
value
granularity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010889448.9A
Other languages
Chinese (zh)
Other versions
CN112148822A (en
Inventor
龚芳
蒋良孝
王欣
郭星锋
王典洪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Geosciences
Original Assignee
China University of Geosciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Geosciences
Priority to CN202010889448.9A
Publication of CN112148822A
Application granted
Publication of CN112148822B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/29Geographical information databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24147Distances to closest patterns, e.g. nearest neighbour classification

Abstract

The invention provides a fine-grained attribute weighting method and system. The method first subdivides attribute weights at attribute-value granularity and class-label granularity, and then sets initial values, based on prior-knowledge statistics, for the fine-grained attribute weights corresponding to different attribute values and different class labels. The initial value matrix of the fine-grained attribute weights is used as the initial state matrix of a random walk with restart; a transition matrix is computed from the initial state matrix, and the current state matrix is incrementally updated from the initial state matrix and the transition matrix until the walk terminates, yielding the optimal value matrix of the fine-grained attribute weights. The scheme incrementally updates the fine-grained attribute weight state matrix without considering the inductive bias of the k-nearest neighbor algorithm, and balances performance against running time. When searching for the point of interest a user most likely visited, it also reduces the prediction bias that the k-nearest neighbor algorithm incurs when the attribute independence assumption of nominal-attribute distance metrics is violated.

Description

Fine granularity attribute weighting method and system
Technical Field
The invention relates to the field of geographic information engineering, in particular to a fine granularity attribute weighting method and system.
Background
As GPS devices become more widespread, the user trajectory data they generate is becoming increasingly rich. These data record the spatio-temporal information of a user's location (i.e., latitude, longitude, and timestamp). Notably, raw user trajectory data contains only location information and lacks the semantic information that reveals user behavior (e.g., the purpose and intent of a trip). Traditionally, user trajectories have been semantically enhanced by having users fill out questionnaires manually. This approach has two drawbacks: 1) the questionnaire is filled out at a different time from when the trajectory data was generated, so users may not remember what they did and the data is incomplete; 2) users may come from different cities or countries, so collecting questionnaires from all users is almost impossible. Existing approaches to user trajectory semantic enhancement instead mine the semantics of a trajectory from point-of-interest check-in data together with the raw trajectory data.
The objective of user trajectory semantic enhancement is to find, among a set of candidate points of interest, the point of interest the user most likely visited, and thereby infer the semantic information of each of the user's stay points. The k-nearest neighbor algorithm, one of the ten classical algorithms in machine learning and data mining, provides an interpretable method for reasoning under uncertainty. To find the points of interest the user most likely visited, the k-nearest neighbor algorithm computes the similarity between the user's stay point and every point of interest in the user's possible activity area, and returns the k points of interest the user may have visited. The semantic information of the user at the stay point is then inferred from these k points of interest.
A distance metric measures the similarity between two samples and is a core component of the k-nearest neighbor algorithm. Improving the distance metric is key to improving the text classification performance of the k-nearest neighbor algorithm. The value difference metric and the inverted specific-class distance metric are two of the best distance metrics for nominal attributes. What they have in common is that they convert the distance computation for nominal attributes into a similarity computation over conditional probabilities. However, when computing these conditional probabilities they introduce the attribute independence assumption, i.e., that the attributes are independent of one another with no dependencies among them. Clearly, the performance of nominal-attribute distance metrics built on this assumption degrades on datasets with strong attribute dependencies.
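As a minimal sketch of how such a metric turns a nominal-attribute distance into conditional probabilities, the following Python estimates P(c | value) by counting and sums the absolute probability differences attribute by attribute, which is the usual unweighted form of the value difference metric. The function names and the toy data are illustrative assumptions and are not taken from the patent.

```python
from collections import Counter, defaultdict

def fit_conditional_probs(rows, labels):
    """Estimate P(c | attribute i has value v) by simple counting."""
    pair_counts = defaultdict(Counter)   # (i, v) -> Counter of class labels
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            pair_counts[(i, v)][c] += 1
    probs = {}
    for key, counter in pair_counts.items():
        total = sum(counter.values())
        probs[key] = {c: n / total for c, n in counter.items()}
    return probs

def vdm_distance(x, y, probs, classes):
    """Unweighted value difference metric between two nominal samples."""
    dist = 0.0
    for i, (vx, vy) in enumerate(zip(x, y)):
        px = probs.get((i, vx), {})
        py = probs.get((i, vy), {})
        # Each attribute contributes on its own: this is where the
        # attribute independence assumption enters.
        dist += sum(abs(px.get(c, 0.0) - py.get(c, 0.0)) for c in classes)
    return dist

# Toy check-in style data (hypothetical values).
rows = [("morning", "mall"), ("morning", "street"),
        ("evening", "mall"), ("evening", "mall")]
labels = ["dining", "dining", "shopping", "entertainment"]
probs = fit_conditional_probs(rows, labels)
classes = sorted(set(labels))
d_same = vdm_distance(rows[0], rows[0], probs, classes)
d_diff = vdm_distance(rows[0], rows[2], probs, classes)
```

Because every attribute is scored separately and the scores are simply added, two strongly dependent attributes are effectively counted twice, which is the weakness the attribute weighting discussed next is meant to address.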
Attribute weighting offers a viable way to reduce the prediction bias that the k-nearest neighbor algorithm incurs in classification when the attribute independence assumption of these nominal-attribute distance metrics is violated. Attribute weighting assigns different weights to different attributes to reflect their different contributions to and effects on the algorithm. However, generalized attribute weighting assigns a single weight to each attribute; that is, different attribute values of the same attribute share the same weight, and different class labels share the same weight for a given attribute. In fact, different attribute values of the same attribute contribute to and affect the algorithm differently, and the same attribute should also carry different weights for different class labels.
Disclosure of Invention
To solve the above problems, the invention provides a fine-grained attribute weighting method and system in which attribute weights are subdivided at attribute-value granularity and class-label granularity to obtain finer-grained attribute weights.
The fine-grained attribute weighting method provided by the embodiment of the invention mainly comprises the following steps:
S101: acquiring a user check-in data set covering all points of interest in the corresponding active area; the user check-in data set comprises a plurality of check-in records from a plurality of users in the active area; each check-in record comprises a plurality of attribute values, and each attribute value corresponds to one class label;
S102: subdividing each attribute value according to its physical meaning; the attribute weight corresponding to each subdivided attribute value is subdivided at attribute-value granularity and class-label granularity;
S103: setting, according to prior-knowledge statistics of the user check-in data set, an initial value for the fine-grained attribute weight of each fine-grained attribute value under each class label, to obtain an initial value matrix;
S104: taking the initial value matrix as the initial state matrix of a random walk with restart, and computing the optimal value matrix of the fine-grained attribute weights with the random walk algorithm.
Further, in step S103, setting an initial value for the fine-grained attribute weight of each fine-grained attribute value under each class label according to prior-knowledge statistics of the user check-in data set, to obtain an initial value matrix, specifically comprises the following steps:
S201: according to prior-knowledge statistics of the data set, finding the class labels and attribute values that have no dependency, or that show no dependency in the limited user check-in data, and assigning a weight of zero to the corresponding fine-grained attribute weights;
S202: assigning zero to all fine-grained attribute weights corresponding to missing attribute values; a check-in record has a missing attribute value if, for example, the latitude and longitude or the time at which the user arrived at a point of interest was not recorded;
S203: calculating the class membership probabilities for the remaining attribute values and class labels, and setting each as the initial value of the corresponding fine-grained attribute weight.
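Steps S201 to S203 can be sketched as follows, assuming the class membership probabilities have already been estimated and that the pairs judged to have no dependency and the keys standing for missing values are supplied as sets. How a pair is judged to have no dependency is not specified in the text, so `independent_pairs` is a hypothetical input, as are all names and numbers here.

```python
def build_initial_matrix(class_labels, fine_values, class_prob,
                         independent_pairs, missing_values):
    """Return a t x U list of lists of initial fine-grained attribute weights.

    class_prob[(c, v)] : estimated P(c | v), assumed precomputed
    independent_pairs  : {(c, v)} pairs judged to have no dependency (S201)
    missing_values     : {v} column keys that stand for a missing value (S202)
    """
    matrix = []
    for c in class_labels:
        row = []
        for v in fine_values:
            if v in missing_values or (c, v) in independent_pairs:
                row.append(0.0)                          # S201 / S202: weight zero
            else:
                row.append(class_prob.get((c, v), 0.0))  # S203: P(c | v)
        matrix.append(row)
    return matrix

class_labels = ["dining", "shopping"]
fine_values = ["time=morning", "time=afternoon", "time=?"]
class_prob = {
    ("dining", "time=morning"): 0.7, ("shopping", "time=morning"): 0.3,
    ("dining", "time=afternoon"): 0.4, ("shopping", "time=afternoon"): 0.6,
    ("dining", "time=?"): 0.5, ("shopping", "time=?"): 0.5,
}
q1 = build_initial_matrix(
    class_labels, fine_values, class_prob,
    independent_pairs={("shopping", "time=morning")},
    missing_values={"time=?"},
)
```

Rows follow the class labels and columns follow the fine-grained attribute values, matching the layout of the initial value matrix described in the text.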
Further, in step S203, the class membership probability $P(c_k \mid a_{il})$ is calculated as shown in formula (1):
In the above formula, $P(c_k \mid a_{il})$ denotes the class membership probability of the $k$-th class label $c_k$ given the $l$-th fine-grained attribute value of the $i$-th attribute value, and is set as the initial value of the fine-grained attribute weight of $a_{il}$ for class label $c_k$; $l = 1, 2, \ldots, S$, where $S$ is the total number of fine-grained attribute values into which the $i$-th attribute value is subdivided; $j$ indexes the $j$-th check-in record in the user check-in data set $D$; $a_i(j)$ is the value of the $j$-th check-in record on the $i$-th attribute value; $c(j)$ is the class label of the point of interest in the $j$-th check-in record; $n$ is the total number of check-in records in $D$; and $\delta(x, y)$ is a binary function that equals 1 when $x = y$ and 0 otherwise.
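Formula (1) itself does not appear in this text. Using only the symbols defined above, a frequency estimate of the following form would be consistent with the description. It is a reconstruction under that assumption, and the original formula may differ, for example by including a smoothing term.

```latex
P(c_k \mid a_{il})
  = \frac{\sum_{j=1}^{n} \delta\big(a_i(j),\, a_{il}\big)\,\delta\big(c(j),\, c_k\big)}
         {\sum_{j=1}^{n} \delta\big(a_i(j),\, a_{il}\big)},
\qquad
\delta(x, y) =
  \begin{cases}
    1, & x = y \\
    0, & \text{otherwise}
  \end{cases}
```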
Further, in step S104, the initial value matrix is used as the initial state matrix $Q_1$ of the random walk with restart, and the transition matrix $B$ is computed from the initial state matrix. Using $Q_1$ and $B$, together with the restart probability $p$ and the random-walk probability $1-p$, the current state matrix $Q_{2n-1}$ is incrementally updated, where $2n-1$ denotes the current step number. The random walk with restart stops when the largest element-wise difference $\varepsilon$ between two consecutive state matrices is smaller than a threshold $\theta$, and the current state matrix is taken as the optimal value matrix $Q$ of the fine-grained attribute weights. The restart probability $p$, the random-walk probability $1-p$, and the threshold $\theta$ are hyperparameters set manually in advance.
Further, the initial state matrix is shown in formula (2):
In the above formula, $w_{k,u}$ denotes the fine-grained attribute weight of the $u$-th attribute value for the $k$-th class label; $u = 1, 2, \ldots, U$, where $U$ is the total number of values over all attributes; $k = 1, 2, \ldots, t$, where $t$ is the total number of class labels;
The transition matrix $B$ is obtained by multiplying the initial state matrix by its transpose, as shown in formula (3):
In the above formula, the order in which the initial state matrix and its transpose are multiplied is determined by the total number $t$ of class labels and the total number $U$ of values over all attributes;
The current state matrix $Q_{2n-1}$ is incrementally updated as shown in formula (4):
The optimal value matrix $Q$ of the fine-grained attribute weights is obtained when the difference $\varepsilon$ between two consecutive fine-grained attribute weight matrices is smaller than the threshold $\theta$, as shown in formula (5):
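Formulas (2) to (5) are not reproduced in this text. The block below is a hedged sketch of what they could look like, written in the standard form of a random walk with restart and using only the symbols defined above. The superscript $(s)$ for the state after $s$ updates is an assumption (the text indexes states as $Q_{2n-1}$), and the choice of which product defines $B$ is left open, as in the text.

```latex
\begin{align*}
Q_1 &=
  \begin{pmatrix}
    w_{1,1} & \cdots & w_{1,U} \\
    \vdots  & \ddots & \vdots  \\
    w_{t,1} & \cdots & w_{t,U}
  \end{pmatrix} \in \mathbb{R}^{t \times U} \\
B &= Q_1 Q_1^{\top} \in \mathbb{R}^{t \times t}
  \quad \text{or} \quad
  B = Q_1^{\top} Q_1 \in \mathbb{R}^{U \times U} \\
Q^{(s+1)} &= (1 - p)\, B\, Q^{(s)} + p\, Q_1
  \qquad \text{(shown for } B = Q_1 Q_1^{\top}\text{)} \\
\varepsilon &= \max_{k,u} \left| Q^{(s+1)}_{k,u} - Q^{(s)}_{k,u} \right| < \theta
  \;\Longrightarrow\; Q = Q^{(s+1)}
\end{align*}
```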
Further, a fine-grained attribute weighting system comprises the following modules:
The user check-in data set acquisition module is used for acquiring a user check-in data set covering all points of interest in the corresponding active area; the user check-in data set comprises a plurality of check-in records from a plurality of users in the active area; each check-in record comprises a plurality of attribute values, and each attribute value corresponds to one class label;
The attribute value subdivision module is used for subdividing each attribute value according to its physical meaning; the attribute weight corresponding to each subdivided attribute value is subdivided at attribute-value granularity and class-label granularity;
The fine-grained attribute weight setting module is used for setting, according to prior-knowledge statistics of the user check-in data set, an initial value for the fine-grained attribute weight of each fine-grained attribute value under each class label, to obtain an initial value matrix;
The optimizing module is used for taking the initial value matrix as the initial state matrix of a random walk with restart, and computing the optimal value matrix of the fine-grained attribute weights with the random walk algorithm.
Further, the fine-grained attribute weight setting module, which sets an initial value for the fine-grained attribute weight of each fine-grained attribute value under each class label according to prior-knowledge statistics of the user check-in data set to obtain an initial value matrix, specifically comprises the following units:
The first unit is used for finding, according to prior-knowledge statistics of the data set, the class labels and attribute values that have no dependency, or that show no dependency in the limited user check-in data, and assigning a weight of zero to the corresponding fine-grained attribute weights;
The second unit is used for assigning zero to all fine-grained attribute weights corresponding to missing attribute values; a check-in record has a missing attribute value if, for example, the latitude and longitude or the time at which the user arrived at a point of interest was not recorded;
The third unit is used for calculating the class membership probabilities for the remaining attribute values and class labels, and setting each as the initial value of the corresponding fine-grained attribute weight.
Further, in the third unit, the class membership probability $P(c_k \mid a_{il})$ is calculated as shown in formula (6):
In the above formula, $P(c_k \mid a_{il})$ denotes the class membership probability of the $k$-th class label $c_k$ given the $l$-th fine-grained attribute value of the $i$-th attribute value, and is set as the initial value of the fine-grained attribute weight of $a_{il}$ for class label $c_k$; $l = 1, 2, \ldots, S$, where $S$ is the total number of fine-grained attribute values into which the $i$-th attribute value is subdivided; $j$ indexes the $j$-th check-in record in the user check-in data set $D$; $a_i(j)$ is the value of the $j$-th check-in record on the $i$-th attribute value; $c(j)$ is the class label of the point of interest in the $j$-th check-in record; $n$ is the total number of check-in records in $D$; and $\delta(x, y)$ is a binary function that equals 1 when $x = y$ and 0 otherwise.
Further, in the optimizing module, the initial value matrix is used as the initial state matrix $Q_1$ of the random walk with restart, and the transition matrix $B$ is computed from the initial state matrix. Using $Q_1$ and $B$, together with the restart probability $p$ and the random-walk probability $1-p$, the current state matrix $Q_{2n-1}$ is incrementally updated, where $2n-1$ denotes the current step number. The random walk with restart stops when the largest element-wise difference $\varepsilon$ between two consecutive state matrices is smaller than a threshold $\theta$, and the current state matrix is taken as the optimal value matrix $Q$ of the fine-grained attribute weights. The restart probability $p$, the random-walk probability $1-p$, and the threshold $\theta$ are hyperparameters set manually in advance.
Further, the initial state matrix is shown in formula (7):
In the above formula, $w_{k,u}$ denotes the fine-grained attribute weight of the $u$-th attribute value for the $k$-th class label; $u = 1, 2, \ldots, U$, where $U$ is the total number of values over all attributes; $k = 1, 2, \ldots, t$, where $t$ is the total number of class labels;
The transition matrix $B$ is obtained by multiplying the initial state matrix by its transpose, as shown in formula (8):
In the above formula, the order in which the initial state matrix and its transpose are multiplied is determined by the total number $t$ of class labels and the total number $U$ of values over all attributes;
The current state matrix $Q_{2n-1}$ is incrementally updated as shown in formula (9):
The optimal value matrix $Q$ of the fine-grained attribute weights is obtained when the difference $\varepsilon$ between two consecutive fine-grained attribute weight matrices is smaller than the threshold $\theta$, as shown in formula (10):
The technical scheme provided by the invention has the following beneficial effects:
(1) The scheme is independent of how the nominal-attribute distance is specifically computed, so it can be carried over to improve any other distance metric that computes conditional probabilities under the attribute independence assumption;
(2) The fine-grained attribute weight state matrix is incrementally updated without considering the inductive bias of the k-nearest neighbor algorithm, which balances performance against running time;
(3) When searching for the point of interest a user most likely visited, the scheme further reduces the prediction bias that the k-nearest neighbor algorithm incurs when the attribute independence assumption of nominal-attribute distance metrics is violated.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a flow chart of a fine granularity attribute weighting method in an embodiment of the present invention;
FIG. 2 is a schematic flow chart of calculating an optimal value matrix by adopting a random walk algorithm in the embodiment of the invention;
FIG. 3 is a schematic diagram of a module connection of a fine-grained attribute weighting system according to an embodiment of the invention.
Detailed Description
For a clearer understanding of technical features, objects and effects of the present invention, a detailed description of embodiments of the present invention will be made with reference to the accompanying drawings.
The embodiment of the invention provides a fine-grained attribute weighting method and system. First, the attributes describing user check-in data in the active area are subdivided; then the optimal value matrix of the fine-grained attribute weights used in the k-nearest neighbor algorithm is obtained from the user check-in data set; finally, the improved k-nearest neighbor algorithm is used to label the points of interest that users in the active area most likely visited. The improved k-nearest neighbor algorithm serves as a general-purpose model that can label the most likely visited points of interest for all kinds of users in the active area.
Referring to fig. 1, fig. 1 is a flowchart of a fine-grained attribute weighting method according to an embodiment of the invention, which specifically includes the following steps:
S101: acquiring a user check-in data set of the corresponding active area; the user check-in data set comprises a plurality of check-in records from a plurality of users in the active area; each check-in record comprises a plurality of attribute values, and each attribute value corresponds to one class label;
Assume that the set of check-in data for all points of interest of users in the active area (e.g., Wuhan, Changsha, or Hubei Province) is $D$ (check-in data can be obtained from websites dedicated to collecting it, e.g., http://sites.google.com/site/yangdingqi/home/foursquare-dataset). Any check-in record can be represented by a vector $(A_1, A_2, \ldots, A_i, \ldots, A_m)$, where $m$ is the total number of attributes describing a check-in record and $A_i$ is the $i$-th attribute (e.g., if the $i$-th attribute is time, the day is divided into 24 time periods, and the time period containing the most popular opening time of the point of interest in the record is the record's value on the $i$-th attribute). The class label of the point of interest in a check-in record (e.g., dining, shopping, or entertainment) is denoted $c_k$, where $k = 1, 2, \ldots, t$ and $t$ is the total number of class labels over all points of interest;
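The vector representation of a check-in record can be sketched as below. The `CheckIn` class, the `time_period` helper, and the second attribute (a district name) are hypothetical and only illustrate one way to hold $(A_1, \ldots, A_m)$ together with its class label $c_k$.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CheckIn:
    attributes: tuple   # (A_1, ..., A_m), nominal values
    class_label: str    # c_k, e.g. "dining"

def time_period(hour_of_day: float) -> int:
    """Map an hour of the day in [0, 24) to one of 24 one-hour periods (0..23)."""
    if not 0 <= hour_of_day < 24:
        raise ValueError("hour_of_day must be in [0, 24)")
    return int(hour_of_day)

# m = 2 attributes: the time period of the most popular opening time
# and a hypothetical district name.
record = CheckIn(attributes=(time_period(12.5), "district-7"),
                 class_label="dining")
```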
S102: subdividing each attribute value according to its physical meaning; the attribute weight corresponding to each subdivided attribute value is subdivided at attribute-value granularity and class-label granularity, yielding a plurality of fine-grained attribute weights for each attribute value. Tables (a) and (b) show the generalized attribute weight representation and the fine-grained attribute weight representation, respectively. The attribute value $a_i$ is subdivided into two fine-grained attribute values, $a_{i1}$ and $a_{i2}$ (e.g., if attribute value $a_i$ is time, it may be subdivided into two fine-grained attribute values, morning and afternoon);
The attribute weight $\omega_i$ (the generalized weight corresponding to the $i$-th attribute value $a_i$, $i = 1, 2, \ldots, m$) is subdivided at attribute-value granularity and class-label granularity into fine-grained attribute weights, one for each class label $c_k$ and each fine-grained attribute value $a_{il}$ ($l = 1, 2, \ldots, S$), where $S$ is the total number of fine-grained attribute values into which the $i$-th attribute value is subdivided. Table (a) illustrates generalized attribute weights: different rows correspond to different class labels $c_k$ and different columns correspond to different attributes $A_i$. Generalized attribute weighting assigns one weight to each attribute, so different class labels share the same weight for the same attribute.
Table (b) illustrates the fine-grained attribute weights subdivided at attribute-value granularity and class-label granularity: different rows correspond to different class labels $c_k$, and different columns correspond to the fine-grained attribute values $a_{il}$ of the different attributes, where $a_{il}$ denotes the $l$-th fine-grained attribute value of the $i$-th attribute. Fine-grained attribute weighting assigns different weights to different fine-grained attribute values, and different weights to different class labels for the same fine-grained attribute value.
(a) Generalized attribute weight representation
(b) Fine-grained attribute weight representation
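The difference between the two representations can be sketched in a few lines. The morning/afternoon split at noon, the class labels, and the placeholder weight 1.0 are assumptions for illustration only.

```python
def subdivide_time(period: int) -> str:
    """Hypothetical split of a 24-period time attribute into two fine-grained values."""
    return "time=morning" if period < 12 else "time=afternoon"

class_labels = ["dining", "shopping", "entertainment"]   # t = 3
attributes = ["time"]                                     # m = 1
fine_values = ["time=morning", "time=afternoon"]          # S = 2 for this attribute

# Table (a): one weight per attribute, shared by every class label and value.
generalized = {a: 1.0 for a in attributes}

# Table (b): one weight per (class label, fine-grained attribute value) pair.
fine_grained = {(c, v): 1.0 for c in class_labels for v in fine_values}
```

With one attribute split into two fine-grained values and three class labels, the single generalized weight becomes six fine-grained weights.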
S103: setting, according to prior-knowledge statistics of the user check-in data set, an initial value for the fine-grained attribute weight of each fine-grained attribute value under each class label, to obtain an initial value matrix; the initial value matrix has one row per class label and one column per attribute value over all attributes; each entry is the fine-grained attribute weight for the class label of its row and the attribute value of its column;
S104: taking the initial value matrix as the initial state matrix of a random walk with restart, and computing the optimal value matrix of the fine-grained attribute weights with the random walk algorithm;
The distance metric in the k-nearest neighbor algorithm is then improved using the target fine-grained attribute weights in the optimal value matrix, and the improved algorithm is used to label the points of interest that users in the active area most likely visited, thereby achieving user trajectory semantic enhancement.
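The labeling step can be sketched as a k-nearest neighbor vote with a pluggable distance function. The text does not state how the fine-grained weights enter the distance, so the sketch leaves the metric as a parameter and uses a plain overlap count as a placeholder. All names and data are hypothetical.

```python
from collections import Counter

def knn_label(query, samples, labels, distance, k=3):
    """Return the majority class label among the k samples nearest to query."""
    ranked = sorted(range(len(samples)), key=lambda j: distance(query, samples[j]))
    votes = Counter(labels[j] for j in ranked[:k])
    return votes.most_common(1)[0][0]

def overlap_distance(x, y):
    """Placeholder metric: number of attributes on which x and y differ."""
    return sum(1 for a, b in zip(x, y) if a != b)

samples = [("morning", "mall"), ("morning", "mall"),
           ("evening", "street"), ("evening", "mall")]
labels = ["dining", "dining", "entertainment", "shopping"]
predicted = knn_label(("morning", "mall"), samples, labels, overlap_distance, k=3)
```

A weighted nominal-attribute metric built from the optimal value matrix would be passed in place of `overlap_distance`.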
In step S103, setting an initial value for the fine-grained attribute weight of each fine-grained attribute value under each class label according to prior-knowledge statistics of the user check-in data set, to obtain an initial value matrix, specifically comprises the following steps:
S201: according to prior-knowledge statistics of the data set, finding the class labels and attribute values that have no dependency, or that show no dependency in the limited user check-in data, and assigning a weight of zero to the corresponding fine-grained attribute weights;
S202: assigning zero to all fine-grained attribute weights corresponding to missing attribute values; a check-in record has a missing attribute value if, for example, the latitude and longitude or the time at which the user arrived at a point of interest was not recorded;
S203: calculating the class membership probabilities for the remaining attribute values and class labels, and setting each as the initial value of the corresponding fine-grained attribute weight.
In step S203, the class membership probability $P(c_k \mid a_{il})$ is calculated as follows:
In the above formula, $P(c_k \mid a_{il})$ denotes the class membership probability of the $k$-th class label $c_k$ given the $l$-th fine-grained attribute value of the $i$-th attribute value, and is set as the initial value of the fine-grained attribute weight of $a_{il}$ for class label $c_k$; $l = 1, 2, \ldots, S$, where $S$ is the total number of fine-grained attribute values into which the $i$-th attribute value is subdivided; $j$ indexes the $j$-th check-in record in the user check-in data set $D$; $a_i(j)$ is the value of the $j$-th check-in record on the $i$-th attribute value; $c(j)$ is the class label of the point of interest in the $j$-th check-in record; $n$ is the total number of check-in records in $D$; and $\delta(x, y)$ is a binary function: if $a_i(j) = a_{il}$, then $\delta(a_i(j), a_{il}) = 1$; otherwise, $\delta(a_i(j), a_{il}) = 0$.
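The formula itself is not reproduced in this text. As a sketch under the assumption that it is a plain frequency estimate built from $\delta$, the class membership probability can be computed by counting. Returning zero for a value that never occurs is a further assumption, and the function names are hypothetical.

```python
def delta(x, y):
    """Binary function: 1 if x equals y, otherwise 0."""
    return 1 if x == y else 0

def class_membership_prob(values, labels, fine_value, class_label):
    """Frequency estimate of P(class_label | fine_value) over n check-in records.

    values[j] is the fine-grained value of record j on the attribute of interest;
    labels[j] is the class label c(j) of the point of interest in record j.
    """
    matches = sum(delta(v, fine_value) for v in values)
    if matches == 0:
        return 0.0   # assumption: a value that never occurs gets probability zero
    joint = sum(delta(v, fine_value) * delta(c, class_label)
                for v, c in zip(values, labels))
    return joint / matches

values = ["morning", "morning", "afternoon", "morning"]
labels = ["dining", "shopping", "dining", "dining"]
p_dining_morning = class_membership_prob(values, labels, "morning", "dining")
```

In this toy data three records have the value "morning" and two of them are labeled "dining", so the estimate is two thirds.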
Referring to fig. 2, fig. 2 is a schematic flow chart of calculating the optimal value matrix with the random walk algorithm in an embodiment of the invention. In step S104, the initial value matrix is used as the initial state matrix $Q_1$ of the random walk with restart, and the transition matrix $B$ is computed from the initial state matrix. Using $Q_1$ and $B$, together with the restart probability $p$ and the random-walk probability $1-p$, the current state matrix $Q_{2n-1}$ is incrementally updated, where $2n-1$ denotes the current step number. The random walk with restart stops when the largest element-wise difference $\varepsilon$ between two consecutive state matrices is smaller than a threshold $\theta$, and the current state matrix is taken as the optimal value matrix $Q$ of the fine-grained attribute weights. The restart probability $p$, the random-walk probability $1-p$, and the threshold $\theta$ are hyperparameters set manually in advance.
The initial state matrix is:
In the above formula, $w_{k,u}$ denotes the fine-grained attribute weight of the $u$-th attribute value for the $k$-th class label; $u = 1, 2, \ldots, U$, where $U$ is the total number of values over all attributes (e.g., if a user check-in record contains 3 attributes with 1, 2, and 3 attribute values respectively, then $U = 1 + 2 + 3 = 6$); $k = 1, 2, \ldots, t$, where $t$ is the total number of class labels;
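The column layout in the example above can be sketched as follows. The value names and the number of class labels are hypothetical. Only the count $U = 1 + 2 + 3 = 6$ comes from the text.

```python
# Three attributes with 1, 2 and 3 values respectively (names are placeholders).
values_per_attribute = [["a11"], ["a21", "a22"], ["a31", "a32", "a33"]]

# Flatten all attribute values into one column order, u = 1..U.
columns = [v for values in values_per_attribute for v in values]
U = len(columns)
column_index = {v: u for u, v in enumerate(columns, start=1)}

t = 4                                   # hypothetical number of class labels
Q1 = [[0.0] * U for _ in range(t)]      # t rows, U columns, to be filled by step S103
```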
The transition matrix $B$ is obtained by multiplying the initial state matrix by its transpose, as follows:
In the above formula, the order in which the initial state matrix and its transpose are multiplied is determined by the total number $t$ of class labels and the total number $U$ of values over all attributes;
The current state matrix $Q_{2n-1}$ is incrementally updated as follows:
The optimal value matrix $Q$ of the fine-grained attribute weights is obtained when the difference $\varepsilon$ between two consecutive state matrices is smaller than the threshold $\theta$, as follows:
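The update and stopping rule can be sketched in pure Python as below. Since the formulas are not reproduced in this text, the sketch makes several assumptions that should be read as such: it uses the standard random-walk-with-restart update, it takes $B = Q_1 Q_1^{\top}$, and it row-normalizes $B$ so that the iteration is guaranteed to converge. The normalization, the `max_steps` guard, and all function names are not from the patent.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def row_normalize(m):
    """Scale each row to sum to 1; all-zero rows are left unchanged."""
    out = []
    for row in m:
        s = sum(row)
        out.append([x / s for x in row] if s else list(row))
    return out

def random_walk_with_restart(q1, p=0.2, theta=1e-6, max_steps=1000):
    """Iterate Q <- (1 - p) * B * Q + p * Q1 until the largest change is below theta."""
    # Assumption: B = Q1 * Q1^T (t x t), row-normalized so the update contracts.
    b = row_normalize(matmul(q1, transpose(q1)))
    q = [row[:] for row in q1]
    for step in range(1, max_steps + 1):
        walked = matmul(b, q)
        nxt = [[(1 - p) * w + p * r for w, r in zip(wrow, rrow)]
               for wrow, rrow in zip(walked, q1)]
        eps = max(abs(x - y) for nrow, qrow in zip(nxt, q) for x, y in zip(nrow, qrow))
        q = nxt
        if eps < theta:
            return q, step
    return q, max_steps

# Hypothetical 2 x 3 initial state matrix; the last column models a missing value.
q1 = [[0.7, 0.4, 0.0], [0.3, 0.6, 0.0]]
q_opt, steps = random_walk_with_restart(q1)
```

Here `p`, `theta`, and `max_steps` play the role of the manually preset hyperparameters. A column that starts at zero for every class label stays at zero under this update.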
Referring to fig. 3, fig. 3 is a schematic diagram of the module connections of a fine-grained attribute weighting system according to an embodiment of the invention, which includes a user check-in data set acquisition module 11, an attribute value subdivision module 12, a fine-grained attribute weight setting module 13, and an optimizing module 14, wherein:
The user check-in data set acquisition module 11 is configured to acquire a user check-in data set of the corresponding active area; the user check-in data set comprises a plurality of check-in records from a plurality of users in the active area; each check-in record comprises a plurality of attribute values, and each attribute value corresponds to one class label;
The attribute value subdivision module 12 is configured to subdivide each attribute value according to its physical meaning; the attribute weight corresponding to each subdivided attribute value is subdivided at attribute-value granularity and class-label granularity;
The fine-grained attribute weight setting module 13 is configured to set, according to prior-knowledge statistics of the user check-in data set, an initial value for the fine-grained attribute weight of each fine-grained attribute value under each class label, to obtain an initial value matrix;
The optimizing module 14 is configured to take the initial value matrix as the initial state matrix of a random walk with restart, and to compute the optimal value matrix of the fine-grained attribute weights with the random walk algorithm.
In the embodiment of the invention, the fine-grained attribute weight setting module 13 sets initial values for the fine-grained attribute weights of each fine-grained attribute value with respect to the different class labels, according to prior-knowledge statistics of the user check-in data set, to obtain an initial value matrix; the module specifically comprises the following units:
The first unit is configured to find, according to prior-knowledge statistics of the data set, the class labels and attribute values that have no dependency relationship, or that show no dependency relationship under the limited user check-in data, and to assign a weight of zero to the fine-grained attribute weights corresponding to those class labels and attribute values;
The second unit is configured to assign zero to all fine-grained attribute weights corresponding to missing attribute values; if the longitude and latitude, or the time at which the user arrived at a certain point of interest, is not recorded, the user's check-in data at that point of interest has a missing attribute value;
The third unit is configured to calculate the class membership probabilities for the remaining attribute values and class labels, and to set them as the initial values of the corresponding fine-grained attribute weights.
In the embodiment of the present invention, in the third unit, the calculation formula of the class membership probability P(c_k | a_il) is shown as formula (6):
In the above formula, P(c_k | a_il) represents the class membership probability of the k-th class label c_k given the l-th fine-grained attribute value a_il of the i-th attribute, and it is set as the initial value of the fine-grained attribute weight of a_il with respect to class label c_k; l = 1, 2, …, S, where S is the total number of fine-grained attribute values into which the i-th attribute is subdivided; j denotes the j-th check-in data in the user check-in data set D, a_i(j) represents the value of the j-th check-in data on the i-th attribute, and c(j) represents the class label to which the point of interest in the j-th check-in data belongs; n represents the total number of check-in data in the user check-in data set D; δ(x, y) is a binary function, i.e. δ(a_i(j), a_il) = 1 if a_i(j) = a_il, and δ(a_i(j), a_il) = 0 otherwise.
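The symbol definitions above can be illustrated with a minimal Python sketch. Because the image of formula (6) is not reproduced in this text, the sketch assumes the plain relative-frequency estimate (the count of check-ins with value a_il and label c_k divided by the count of check-ins with value a_il), with no smoothing; the patent's actual formula may differ. The function name, the time-of-day values, and the category labels are hypothetical.

```python
from collections import Counter


def class_membership_probabilities(values, labels):
    """Estimate P(c_k | a_il) for one attribute by relative frequency.

    values[j] is the fine-grained value of the attribute in check-in j
    (None marks a missing value); labels[j] is that check-in's class label.
    Assumption: unsmoothed counting, which may differ from formula (6).
    """
    pair_counts = Counter()   # (value, label) -> number of co-occurrences
    value_counts = Counter()  # value -> number of check-ins with that value
    for value, label in zip(values, labels):
        if value is None:     # missing values contribute no counts
            continue
        pair_counts[(value, label)] += 1
        value_counts[value] += 1

    classes = sorted(set(labels))
    weights = {}
    for value in sorted(value_counts):
        for label in classes:
            # Pairs that never co-occur get probability 0.
            weights[(label, value)] = pair_counts[(value, label)] / value_counts[value]
    return weights


# Hypothetical check-ins: time-of-day attribute and point-of-interest category.
example_values = ["morning", "morning", "evening", None, "evening", "morning"]
example_labels = ["cafe", "cafe", "bar", "bar", "bar", "bar"]
example_weights = class_membership_probabilities(example_values, example_labels)
```

Looking a pair up with `example_weights.get((label, value), 0.0)` returns zero for a missing attribute value, which mirrors the zero weight that the second unit assigns to missing values.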
In the embodiment of the present invention, in the optimizing module 14, the initial value matrix is used as the initial state matrix Q_1 of the random walk with restart, and the transition matrix B is calculated from the initial state matrix; based on the initial state matrix Q_1 and the transition matrix B, the current state matrix Q_{2n-1} is updated incrementally using the restart probability p and the random walk probability 1-p, where 2n-1 denotes the current number of steps; the random walk with restart stops when the difference ε in the largest element between two consecutive state matrices is smaller than the threshold θ, and the current state matrix is taken as the optimal value matrix Q of the fine-grained attribute weights; the restart probability p, the random walk probability 1-p, and the threshold θ of the random walk with restart algorithm are all hyperparameters and are preset manually.
In the embodiment of the present invention, the initial state matrix is shown in formula (7):
In the above formula, w_{k,u} represents the fine-grained attribute weight of the u-th attribute value with respect to the k-th class label; u = 1, 2, …, U, where U is the total number of values of all attributes; k = 1, 2, …, t, where t is the total number of class labels;
The transition matrix B is obtained by multiplying the initial state matrix by its transpose; the specific formula is shown in formula (8):
In the above formula, the order in which the initial state matrix and its transpose are multiplied is determined by the total number t of class labels and the total number U of values of all attributes;
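As a hedged illustration of why the multiplication order matters, assume that Q_1 is laid out as a t × U matrix, with rows indexed by class label k and columns indexed by attribute value u. This layout is an assumption, since the images of formulas (7) and (8) are not reproduced in this text. The two possible products then have different sizes:

```latex
Q_1 = \bigl[\, w_{k,u} \,\bigr] \in \mathbb{R}^{t \times U}, \qquad
Q_1 Q_1^{\top} \in \mathbb{R}^{t \times t}, \qquad
Q_1^{\top} Q_1 \in \mathbb{R}^{U \times U}.
```

Under this layout, a t × t transition matrix can only multiply the state matrix from the left, and a U × U transition matrix can only multiply it from the right. The text states only that the choice depends on t and U; it does not give the rule itself.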
The current state matrix Q_{2n-1} is updated incrementally; the specific update formula is shown in formula (9):
The optimal value matrix Q of the fine-grained attribute weights is obtained when the difference ε between two consecutive fine-grained attribute weight matrices is smaller than the threshold θ, specifically as shown in formula (10):
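The iteration described above can be sketched in dependency-free Python. The images of formulas (8) to (10) are not reproduced in this text, so the sketch substitutes the textbook random walk with restart update, Q_next = (1 - p) · B · Q + p · Q_1, and should not be read as the patent's exact formula. Three further assumptions are made: Q_1 is laid out as a t × U list of rows, B = Q_1 · Q_1^T is column-normalised so that the iteration converges, and ε is taken as the largest element-wise absolute change between consecutive state matrices. All function names and default parameter values are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def random_walk_with_restart(q1, p=0.2, theta=1e-8, max_steps=1000):
    """Iterate Q <- (1 - p) * B * Q + p * Q1 until the change is below theta."""
    # Transition matrix from the initial state matrix and its transpose.
    # Assumption: Q1 is t x U, so Q1 * Q1^T is t x t and multiplies Q from the left.
    b = matmul(q1, transpose(q1))
    # Assumption: normalise each column of B to sum to 1 (all-zero columns are left as is).
    col_sums = [sum(col) or 1.0 for col in zip(*b)]
    b = [[v / s for v, s in zip(row, col_sums)] for row in b]

    q = [row[:] for row in q1]
    steps = 0
    for steps in range(1, max_steps + 1):
        walked = matmul(b, q)
        # Mix the walk step (probability 1 - p) with a restart to Q1 (probability p).
        q_next = [[(1.0 - p) * w + p * r for w, r in zip(w_row, r_row)]
                  for w_row, r_row in zip(walked, q1)]
        # Largest element-wise absolute change between consecutive state matrices.
        eps = max(abs(n - o)
                  for n_row, o_row in zip(q_next, q)
                  for n, o in zip(n_row, o_row))
        q = q_next
        if eps < theta:
            break
    return q, steps


# Hypothetical 2 x 3 initial state matrix (2 class labels, 3 attribute values).
initial = [[2 / 3, 0.0, 0.5], [1 / 3, 1.0, 0.5]]
optimal, steps_taken = random_walk_with_restart(initial, p=0.2, theta=1e-8)
```

With p = 1 the walk restarts at every step, so the function returns Q_1 unchanged after a single iteration; smaller values of p let the transition matrix reshape the weights more strongly before convergence.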
The technical scheme provided by the invention has the following beneficial effects:
(1) The attribute weights are subdivided at both attribute-value granularity and class-label granularity, yielding finer-grained attribute weights;
(2) The technical scheme does not depend on how the nominal attribute distance is specifically calculated, so it can be ported to improve any other distance metric that is computed from conditional probabilities under the attribute independence assumption;
(3) Incrementally updating the fine-grained attribute weight state matrix, without having to account for the inductive bias of the k-nearest neighbor algorithm, is a scheme that balances performance and time efficiency;
(4) The prediction bias of the k-nearest neighbor algorithm when searching for the user's most likely point of interest, which arises because the nominal attribute distance metric violates the attribute independence assumption, is further reduced.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.

Claims (6)

1. A fine-grained attribute weighting method, characterized in that the method comprises the following steps:
S101: acquiring a user check-in data set for all points of interest of a corresponding active area; the user check-in data set comprises a plurality of check-in data of a plurality of users in the active area; each piece of check-in data comprises a plurality of attribute values, and each attribute value corresponds to one class label;
S102: subdividing each attribute value according to its physical meaning; the attribute weights corresponding to the subdivided attribute values are subdivided at both attribute-value granularity and class-label granularity;
S103: setting initial values for the fine-grained attribute weights of each fine-grained attribute value with respect to the different class labels, according to prior-knowledge statistics of the user check-in data set, to obtain an initial value matrix;
S104: taking the initial value matrix as the initial state matrix of a random walk with restart, and calculating an optimal value matrix of the fine-grained attribute weights using the random walk algorithm;
in step S104, the initial value matrix is used as the initial state matrix Q_1 of the random walk with restart, and the transition matrix B is calculated from the initial state matrix; based on the initial state matrix Q_1 and the transition matrix B, the current state matrix Q_{2n-1} is updated incrementally using the restart probability p and the random walk probability 1-p, where 2n-1 denotes the current number of steps; the random walk with restart stops when the difference ε in the largest element between two consecutive state matrices is smaller than the threshold θ, and the current state matrix is taken as the optimal value matrix Q of the fine-grained attribute weights; the restart probability p, the random walk probability 1-p, and the threshold θ of the random walk with restart algorithm are all hyperparameters and are preset manually;
The initial state matrix is shown in formula (2):
In the above formula, w_{k,u} represents the fine-grained attribute weight of the u-th attribute value with respect to the k-th class label; u = 1, 2, …, U, where U is the total number of values of all attributes; k = 1, 2, …, t, where t is the total number of class labels;
the transition matrix B is obtained by multiplying the initial state matrix by its transpose, specifically as shown in formula (3):
In the above formula, the order in which the initial state matrix and its transpose are multiplied is determined by the total number t of class labels and the total number U of values of all attributes;
the current state matrix Q_{2n-1} is updated incrementally; the specific update formula is shown in formula (4):
the optimal value matrix Q of the fine-grained attribute weights is obtained when the difference ε between two consecutive fine-grained attribute weight matrices is smaller than the threshold θ, specifically as shown in formula (5):
2. A fine-grained attribute weighting method according to claim 1, wherein: in step S103, initial values are set for the fine-grained attribute weights of each fine-grained attribute value with respect to the different class labels, according to prior-knowledge statistics of the user check-in data set, to obtain an initial value matrix; the step specifically comprises:
S201: finding, according to prior-knowledge statistics of the data set, the class labels and attribute values that have no dependency relationship, or that show no dependency relationship under the limited user check-in data, and assigning a weight of zero to the fine-grained attribute weights corresponding to those class labels and attribute values;
S202: assigning zero to all fine-grained attribute weights corresponding to missing attribute values; if the longitude and latitude, or the time at which the user arrived at a certain point of interest, is not recorded, the user's check-in data at that point of interest has a missing attribute value;
S203: calculating the class membership probabilities for the remaining attribute values and class labels, and setting them as the initial values of the corresponding fine-grained attribute weights.
3. A fine-grained attribute weighting method according to claim 2, wherein: in step S203, the class membership probability P(c_k | a_il) is calculated as shown in formula (1):
In the above formula, P(c_k | a_il) represents the class membership probability of the k-th class label c_k given the l-th fine-grained attribute value a_il of the i-th attribute, and it is set as the initial value of the fine-grained attribute weight of a_il with respect to class label c_k; l = 1, 2, …, S, where S is the total number of fine-grained attribute values into which the i-th attribute is subdivided; j denotes the j-th check-in data in the user check-in data set D, a_i(j) represents the value of the j-th check-in data on the i-th attribute, and c(j) represents the class label to which the point of interest in the j-th check-in data belongs; n represents the total number of check-in data in the user check-in data set D; δ(x, y) is a binary function that equals 1 when its two arguments are equal and 0 otherwise.
4. A fine-grained attribute weighting system, characterized in that the system comprises the following modules:
a user check-in data set acquisition module, configured to acquire a user check-in data set corresponding to the active area; the user check-in data set comprises a plurality of check-in data of a plurality of users at all points of interest in the active area; each piece of check-in data comprises a plurality of attribute values, and each attribute value corresponds to one class label;
an attribute value subdivision module, configured to subdivide each attribute value according to its physical meaning; the attribute weights corresponding to the subdivided attribute values are subdivided at both attribute-value granularity and class-label granularity;
a fine-grained attribute weight setting module, configured to set initial values for the fine-grained attribute weights of each fine-grained attribute value with respect to the different class labels, according to prior-knowledge statistics of the user check-in data set, to obtain an initial value matrix;
an optimizing module, configured to take the initial value matrix as the initial state matrix of a random walk with restart, and to calculate an optimal value matrix of the fine-grained attribute weights using the random walk algorithm;
in the optimizing module, the initial value matrix is used as the initial state matrix Q_1 of the random walk with restart, and the transition matrix B is calculated from the initial state matrix; based on the initial state matrix Q_1 and the transition matrix B, the current state matrix Q_{2n-1} is updated incrementally using the restart probability p and the random walk probability 1-p, where 2n-1 denotes the current number of steps; the random walk with restart stops when the difference ε in the largest element between two consecutive state matrices is smaller than the threshold θ, and the current state matrix is taken as the optimal value matrix Q of the fine-grained attribute weights; the restart probability p, the random walk probability 1-p, and the threshold θ of the random walk with restart algorithm are all hyperparameters and are preset manually;
The initial state matrix is shown in formula (7):
In the above formula, w_{k,u} represents the fine-grained attribute weight of the u-th attribute value with respect to the k-th class label; u = 1, 2, …, U, where U is the total number of values of all attributes; k = 1, 2, …, t, where t is the total number of class labels;
the transition matrix B is obtained by multiplying the initial state matrix by its transpose, specifically as shown in formula (8):
In the above formula, the order in which the initial state matrix and its transpose are multiplied is determined by the total number t of class labels and the total number U of values of all attributes;
the current state matrix Q_{2n-1} is updated incrementally; the specific update formula is shown in formula (9):
the optimal value matrix Q of the fine-grained attribute weights is obtained when the difference ε between two consecutive fine-grained attribute weight matrices is smaller than the threshold θ, specifically as shown in formula (10):
5. A fine-grained attribute weighting system according to claim 4, wherein: the fine-grained attribute weight setting module sets initial values for the fine-grained attribute weights of each fine-grained attribute value with respect to the different class labels, according to prior-knowledge statistics of the user check-in data set, to obtain an initial value matrix; the module specifically comprises the following units:
a first unit, configured to find, according to prior-knowledge statistics of the data set, the class labels and attribute values that have no dependency relationship, or that show no dependency relationship under the limited user check-in data, and to assign a weight of zero to the fine-grained attribute weights corresponding to those class labels and attribute values;
a second unit, configured to assign zero to all fine-grained attribute weights corresponding to missing attribute values; if the longitude and latitude, or the time at which the user arrived at a certain point of interest, is not recorded, the user's check-in data at that point of interest has a missing attribute value;
a third unit, configured to calculate the class membership probabilities for the remaining attribute values and class labels, and to set them as the initial values of the corresponding fine-grained attribute weights.
6. A fine-grained attribute weighting system according to claim 5, wherein: in the third unit, the class membership probability P(c_k | a_il) is calculated as shown in formula (6):
In the above formula, P(c_k | a_il) represents the class membership probability of the k-th class label c_k given the l-th fine-grained attribute value a_il of the i-th attribute, and it is set as the initial value of the fine-grained attribute weight of a_il with respect to class label c_k; l = 1, 2, …, S, where S is the total number of fine-grained attribute values into which the i-th attribute is subdivided; j denotes the j-th check-in data in the user check-in data set D, a_i(j) represents the value of the j-th check-in data on the i-th attribute, and c(j) represents the class label to which the point of interest in the j-th check-in data belongs; n represents the total number of check-in data in the user check-in data set D; δ(x, y) is a binary function that equals 1 when its two arguments are equal and 0 otherwise.
CN202010889448.9A 2020-08-28 2020-08-28 Fine granularity attribute weighting method and system Active CN112148822B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010889448.9A CN112148822B (en) 2020-08-28 2020-08-28 Fine granularity attribute weighting method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010889448.9A CN112148822B (en) 2020-08-28 2020-08-28 Fine granularity attribute weighting method and system

Publications (2)

Publication Number Publication Date
CN112148822A CN112148822A (en) 2020-12-29
CN112148822B true CN112148822B (en) 2024-04-19

Family

ID=73889571

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010889448.9A Active CN112148822B (en) 2020-08-28 2020-08-28 Fine granularity attribute weighting method and system

Country Status (1)

Country Link
CN (1) CN112148822B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809233A (en) * 2015-05-12 2015-07-29 中国地质大学(武汉) Attribute weighting method based on information gain ratios and text classification methods
WO2017041541A1 (en) * 2015-09-08 2017-03-16 北京邮电大学 Method for pushing recommendation information, and server and storage medium
CN108629023A (en) * 2018-05-09 2018-10-09 北京京东金融科技控股有限公司 Data digging method, device and computer readable storage medium
CN109492166A (en) * 2018-08-06 2019-03-19 北京理工大学 Continuous point of interest recommended method based on time interval mode of registering
CN109669939A (en) * 2018-11-02 2019-04-23 建湖云飞数据科技有限公司 Object information processing method
CN109934306A (en) * 2019-04-04 2019-06-25 西南石油大学 Multi-tag attribute value division methods and device based on random walk

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
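Class-specific attribute value weighting for Naive Bayes; Huan Zhang et al.; Information Sciences; 2019-08-28; pp. 260-274 *
Yu Liangjun (余良俊). Research on attribute-weighted Bayesian network classification algorithms and their applications. Wanfang Database. pp. 1-120. *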

Also Published As

Publication number Publication date
CN112148822A (en) 2020-12-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant