CN106600052B - User attribute and social network detection system based on space-time trajectory - Google Patents

Publication number
CN106600052B
Authority
CN
China
Prior art keywords
user
social network
users
trajectory
social
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611139349.9A
Other languages
Chinese (zh)
Other versions
CN106600052A (en)
Inventor
王平辉
孙飞扬
王迪
管晓宏
陶敬
张岩
曹鹏飞
贾鹏
胡小雨
曹宇
兰林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN201611139349.9A
Publication of CN106600052A
Application granted
Publication of CN106600052B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2431 Multiple classes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Engineering & Computer Science (AREA)
  • Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Operations Research (AREA)
  • General Business, Economics & Management (AREA)
  • Tourism & Hospitality (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Development Economics (AREA)
  • Marketing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a user attribute and social network detection system based on spatio-temporal trajectories. It predicts the real identity attributes of a user by analyzing the user's behavior patterns and comprises four subsystems: data processing, social network modeling, feature extraction, and classification prediction. The system analyzes spatio-temporal trajectory data; builds a social network model of the users from that data with a novel method based on pointwise mutual information (PMI); automatically extracts implicit user features with a novel non-negative tensor factorization (NTF) algorithm; and predicts user attributes from those implicit features using several classifiers. The invention can be used to check the authenticity of user attributes and to detect users' social networks, and the predicted attributes and social networks can support targeted information delivery, friend recommendation, and similar applications.

Description

User attribute and social network detection system based on space-time trajectory
Technical Field
The invention belongs to the technical field of data mining, and particularly relates to a user attribute and social network detection system based on a space-time trajectory.
Background
With the development and spread of internet technology, the number of internet users keeps growing. Because the internet is virtual, the information a user provides online does not necessarily reflect the user's real attributes; to improve internet security, the authenticity of user identities needs to be verified. In addition, understanding users' social relationships is important for maintaining public safety, counter-terrorism, and similar tasks.
The development of mobile communication technology and the rapid spread of smart mobile devices (such as smartphones and tablet computers) have tied mobile devices ever more closely to their users, and many mobile devices and apps record user activity. Inferring user attributes and social networks from user behavior has therefore attracted considerable research attention.
Here we focus on the user's geographic location information recorded by mobile devices. For example, many users post messages on social platforms such as WeChat and microblogs, use shopping or group-buying apps, or use map and navigation functions on their mobile devices; to keep these functions available at all times, most users leave GPS, Wi-Fi, or 4G connectivity switched on for long periods. Third-party app developers and network operators can obtain users' usage records and determine when and where each record was generated. For example, if a user posts a microblog message from a mobile phone, the app can obtain the current geographic position from 4G base station information and the phone's built-in GPS; a network operator can locate a user through multiple base stations. Combining each user's time and place records into a sequence yields the user's spatio-temporal trajectory, which reflects the user's movement patterns.
Methods exist that infer user attributes by analyzing spatio-temporal trajectories, but they all rely on semantic information about geographic locations. For example, if a microblog user posts messages from several different places, a conventional method must know what kind of place each one is (e.g., a mall, a company, a restaurant, or an amusement park) in order to determine the user's attributes (e.g., gender and occupation). Such semantic information is not always available; different floors of a high-rise building, for example, may serve different functions. This greatly limits the effectiveness of conventional methods. Moreover, because different users visit the same place for different purposes, inferring user attributes from the spatio-temporal trajectory alone inevitably hits a bottleneck, and new features are needed to get past it.
A social network is a graph built from users' friend relationships, in which each node represents a user and each edge represents a friend relationship between two users. Methods that infer social networks from spatio-temporal trajectories generally fall into two types: one estimates the likelihood of a social relationship between users from the similarity of their trajectories; the other assumes that the more often two people appear at the same place at the same time (a "co-occurrence"), the more likely they are to have a social relationship. Existing methods of this kind usually also depend on semantic information about geographic locations and, besides sharing the limitation above, cannot handle accidental co-occurrences well.
In addition, studies show that friends in a social network exhibit homophily: a pair of friends is likely to share one or more attributes. Predicting user attributes from the social network has therefore also become a research focus. Combining spatio-temporal trajectory data with social network information can markedly improve the accuracy of attribute inference, but social network information is hard to obtain in practice because of privacy concerns.
Disclosure of Invention
To overcome the above shortcomings of the prior art, the present invention provides a user attribute and social network detection system based on spatio-temporal trajectories. One advantage over conventional methods is that the input trajectories need no geographic location data with detailed semantic information, so the system applies to many different types of data sets. Another advantage is that accidental co-occurrences can be effectively identified when the users' social network model is built. A further advantage is that the users' social network is inferred directly from the spatio-temporal trajectory data and used as supplementary information to improve the accuracy of attribute inference, which overcomes the data acquisition problem.
To achieve this purpose, the invention adopts the following technical solution:
The user attribute and social network detection system based on spatio-temporal trajectories comprises:
A data processing subsystem, which preprocesses the input data by converting the spatio-temporal trajectories of all users into a third-order tensor that is convenient for subsequent operations.
Specifically, each original spatio-temporal trajectory record comprises a user identifier, a geographic location identifier, and a time identifier. The data processing subsystem creates an all-zero third-order tensor whose number of rows equals the number of user identifiers, whose number of columns equals the number of geographic location identifiers, and whose number of tubes equals the number of time period identifiers; that is, each row of the tensor represents a user, each column a place, and each tube a time period.
A social network modeling subsystem, which builds a social network model of the users by analyzing their spatio-temporal trajectory data and stores it as an adjacency matrix.
Specifically, the social network modeling subsystem analyzes users' co-occurrences using pointwise mutual information (PMI) to distinguish accidental co-occurrences from those caused by social relationships, and ranks the familiarity between users by the number and credibility of their co-occurrences to build the social network model. The adjacency matrix has as many rows and columns as there are users; each element indicates whether the users of its row and column have a social relationship, so the relationship between user u_i and user u_j is recorded in row i, column j.
A feature extraction subsystem, which reduces the dimensionality of the users' spatio-temporal trajectories and extracts valuable features from the trajectory data in a form suitable for existing classification algorithms.
Specifically, the invention provides a non-negative tensor factorization (NTF) algorithm that factorizes the spatio-temporal trajectory tensor under a constraint derived from the social network information, yielding three second-order matrices that hold the implicit features of each user, each geographic location, and each time period, respectively. The user implicit feature matrix is the one of primary interest: it captures the features of each user and is used to train the classifiers and make predictions. The feature dimension can be set as needed to balance efficiency and accuracy.
The invention further comprises:
A classification prediction subsystem, which trains several classifiers on the implicit features of users with known attributes and then predicts the attributes of target users from their implicit features.
Specifically, several classifiers can be used to predict the user attributes, and their outputs are combined into a final judgment. The invention currently uses three classifiers (SVM, logistic regression, and linear regression), which are simple to implement, efficient to run, and accurate.
Compared with the prior art, the invention has the following beneficial effects:
1. It removes the dependence of existing trajectory-based user attribute prediction techniques on geographic location semantics.
The spatio-temporal trajectory information required by the invention needs no geographic location features; places can be denoted by simple labels (such as "place 1"). This greatly widens the applicability of the invention, while the added social network information markedly improves prediction accuracy over the prior art.
2. It infers the users' social network information from the spatio-temporal trajectories, avoiding the data acquisition problem.
The invention extracts the users' social network automatically without relying on an additional data source, thereby avoiding the most troublesome data acquisition problem in practical applications.
3. It markedly improves prediction capability by incorporating social network information.
The invention combines social network data with spatio-temporal trajectory data; compared with prediction techniques that use the trajectory alone, prediction accuracy is markedly improved.
4. It can handle classification and prediction on big data.
When the volume of spatio-temporal trajectory data is very large, the number of features may exceed the number of training samples, so the prior art often overfits and its prediction capability suffers badly. The invention provides a non-negative tensor factorization algorithm that reduces the dimensionality of the spatio-temporal trajectories to a configurable number of features, which addresses this problem.
Drawings
FIG. 1 is a block diagram of the system of the present invention.
FIG. 2 is a flow diagram of a data processing subsystem of the present invention.
FIG. 3 is a flow chart of a social network modeling subsystem of the present invention.
FIG. 4 is a flow chart of the feature extraction subsystem of the present invention.
FIG. 5 is a flow chart of the classification prediction subsystem of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, embodiments of the present invention will be described in detail below with reference to the accompanying drawings and embodiments.
As shown in FIG. 1, the system consists of four subsystems: a data processing subsystem, a social network modeling subsystem, a feature extraction subsystem, and a classification prediction subsystem. The input to the system is users' spatio-temporal trajectory data, in which each record has three parts: a user identifier, a geographic location identifier, and a time identifier. That is, each record states that a certain user visited a certain place at a certain time.
Note that the attributes of some users in the input data are unknown (these are called target users) and need to be predicted; trajectory data from some users with known attributes is therefore also required. Unlike the prior art, the invention does not need semantic information about the geographic locations, which greatly reduces the difficulty of acquiring the data.
First, the spatio-temporal trajectory data is fed to the data processing subsystem, which converts the trajectories of all users into a third-order tensor that is convenient for subsequent operations.
At the same time, the spatio-temporal trajectory data is fed to the social network modeling subsystem, which analyzes it and processes co-occurrences with the PMI model, effectively distinguishing accidental co-occurrences (low credibility) from those caused by social relationships (high credibility). It ranks the familiarity between users by the number and credibility of their co-occurrences, builds the social network model, and stores it as an adjacency matrix.
Next, the spatio-temporal trajectory tensor and the social network matrix are passed to the feature extraction subsystem, which uses a novel non-negative tensor factorization (NTF) algorithm to obtain the implicit features of each user. The algorithm factorizes the trajectory tensor under a constraint derived from the social network information and yields three second-order matrices that hold the implicit features of each user, each geographic location, and each time period, respectively. The user implicit feature matrix is the one of primary interest: it captures the features of each user and is used to train the classifiers and make predictions. The feature dimension can be set as needed to balance efficiency and accuracy.
Finally, the user implicit feature matrix is passed to the classification prediction subsystem, which trains several classifiers on the implicit features of users with known attributes and then predicts the attributes of the target users from their implicit features. The invention currently uses three classifiers (SVM, logistic regression, and linear regression), which are simple to implement, efficient to run, and accurate.
The subsystems are described in detail below:
1. Data processing subsystem
This subsystem preprocesses the input data, converting the spatio-temporal trajectories of all users into a third-order tensor that is convenient for subsequent operations.
Specifically, as shown in FIG. 2, the data processing subsystem processes the spatio-temporal trajectory data as follows:
Each original spatio-temporal trajectory record comprises a user identifier, a geographic location identifier, and a time identifier. First, the users' spatio-temporal trajectories are obtained. Next, an all-zero third-order tensor is created whose number of rows equals the number of user identifiers, whose number of columns equals the number of geographic location identifiers, and whose number of tubes equals the number of time period identifiers, so that each row represents a user, each column a place, and each tube a time period. Each user's trajectory is then filled into the tensor, each element holding the number of times the user appeared at a given place in a given time period. The result is a tensor that reflects the users' spatio-temporal trajectories and distinguishes the different influence of different time periods and geographic locations on attribute prediction.
Note that the tensor is very large even at typical data volumes. With ten thousand users, one thousand distinct geographic locations, and one hundred time periods, the tensor has 10000 × 1000 × 100 = 10^9 elements, and each user has 1000 × 100 = 10^5 features. So many features cause the classifier to overfit and degrade prediction, so the tensor must be processed further.
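The tensor construction described in this subsection can be sketched as follows. This is a minimal illustration rather than the patented implementation; the function name `build_trajectory_tensor`, the tuple layout of the records, and the sample data are hypothetical.

```python
import numpy as np

def build_trajectory_tensor(records):
    """Build a user x location x time-period count tensor.

    `records` is an iterable of (user_id, location_id, period_id) tuples
    (a hypothetical input layout chosen for this sketch).
    """
    users = sorted({r[0] for r in records})
    places = sorted({r[1] for r in records})
    periods = sorted({r[2] for r in records})
    u_idx = {u: i for i, u in enumerate(users)}
    p_idx = {p: i for i, p in enumerate(places)}
    t_idx = {t: i for i, t in enumerate(periods)}

    # Start from an all-zero third-order tensor, then count occurrences.
    X = np.zeros((len(users), len(places), len(periods)))
    for u, p, t in records:
        X[u_idx[u], p_idx[p], t_idx[t]] += 1
    return X, users, places, periods

records = [
    ("alice", "place_1", "morning"),
    ("alice", "place_1", "morning"),
    ("alice", "place_2", "evening"),
    ("bob", "place_2", "evening"),
]
X, users, places, periods = build_trajectory_tensor(records)
```

Because the locations are plain labels, no semantic information about them is needed. For data at the scale mentioned above, a sparse representation would likely be preferable to a dense array.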
2. Social network modeling subsystem
This subsystem builds a social network model of the users by analyzing their spatio-temporal trajectory data and stores it as an adjacency matrix. Analyzing users' co-occurrences with PMI effectively distinguishes accidental co-occurrences (low credibility) from those caused by social relationships (high credibility). The familiarity between users is ranked by the number and credibility of their co-occurrences to build the social network model.
Specifically, as shown in fig. 3, the social network modeling subsystem analyzes the spatiotemporal trajectory data as follows:
First, the number of times user u appears at location v is defined as f_v(u), and the number of co-occurrences of user u_i and user u_j at location v is defined as f_v(u_i, u_j). The probability that an appearance at location v belongs to user u is then
$$p_v(u) = \frac{f_v(u)}{\sum_{u' \in U} f_v(u')}$$
where U is the set of all users. Similarly, the probability that a co-occurrence at location v belongs to user u_i and user u_j is
$$p_v(u_i, u_j) = \frac{f_v(u_i, u_j)}{\sum_{u', u'' \in U} f_v(u', u'')}$$
For user u_i and user u_j, their PMI value at location v is calculated:
$$\mathrm{PMI}_v(u_i, u_j) = \log \frac{p_v(u_i, u_j)}{p_v(u_i)\, p_v(u_j)}$$
PMI indicates whether a co-occurrence of user u_i and user u_j at location v is more likely a chance event or a social behavior. A small PMI means that user u_i and user u_j each visit location v frequently but are rarely there together, so a co-occurrence may be merely accidental; a large PMI means that they each visit location v rarely but are often there together, so a co-occurrence is likely social. That is, the larger the PMI, the more likely user u_i and user u_j have a social relationship.
PMI thus reflects the credibility of each co-occurrence, but the number of co-occurrences also reflects how familiar two users are: if two users co-occur far more often than others do, they may have a social relationship even when their PMI is low. Multiplying the PMI by the number of co-occurrences therefore reflects the familiarity between users.
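In summary, the familiarity between user u_i and user u_j is defined as:
$$w(u_i, u_j) = \sum_{v \in V} f_v(u_i, u_j)\, \mathrm{PMI}_v(u_i, u_j)$$
where V is the set of all geographic location identifiers and PMI_v(u_i, u_j) is the PMI value of the two users at location v.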
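The PMI and familiarity computation can be sketched as follows, starting from a user × location × time-period count tensor. Several points are assumptions made for this sketch, not details given in the text: two users are taken to co-occur when both appear at the same location in the same time period (counted once per period), the co-occurrence probability is normalized over unordered user pairs, and the function name `familiarity_matrix` is hypothetical.

```python
import numpy as np

def familiarity_matrix(X):
    """Sum over locations of (co-occurrence count) x PMI for each user pair."""
    m, n, _ = X.shape
    present = X > 0
    W = np.zeros((m, m))
    for v in range(n):
        f_u = X[:, v, :].sum(axis=1)           # f_v(u): appearances at v
        P = present[:, v, :].astype(float)     # presence per time period
        f_pair = P @ P.T                       # f_v(u_i, u_j): shared periods
        np.fill_diagonal(f_pair, 0.0)
        total_u = f_u.sum()
        total_pair = f_pair.sum() / 2.0        # each unordered pair counted once
        if total_u == 0 or total_pair == 0:
            continue                           # no co-occurrence at this place
        p_u = f_u / total_u
        p_pair = f_pair / total_pair
        mask = f_pair > 0                      # PMI only where pairs co-occur
        pmi = np.zeros((m, m))
        pmi[mask] = np.log(p_pair[mask] / np.outer(p_u, p_u)[mask])
        W += f_pair * pmi                      # weight PMI by co-occurrence count
    return W

# Users 0 and 1 visit rarely but always together; user 2 is there all the time.
X = np.zeros((3, 1, 4))
X[0, 0, :2] = 1
X[1, 0, :2] = 1
X[2, 0, :] = 1
W = familiarity_matrix(X)
```

In this toy input every pair co-occurs twice, yet the pair (0, 1) scores higher than the pairs involving user 2, because user 2's frequent presence makes those co-occurrences less informative.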
After traversing the spatio-temporal trajectory data, the familiarity between any two users is available. For each user u, the other users are sorted by their familiarity with u, and the top n are taken to have a social relationship with u. The value of n can be chosen according to system requirements.
An all-zero second-order matrix
$$A \in \mathbb{R}^{m \times m}$$
is created, where m is the number of users. The social relationships between users are filled into the matrix, each element indicating whether the users of its row and column have a social relationship; for example, the relationship between user u_i and user u_j is recorded in row i, column j. The result is an adjacency matrix that reflects the social relationships between users.
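The top-n selection and the adjacency matrix can be sketched as follows. The function name `top_n_adjacency` is hypothetical, and two choices are assumptions of this sketch rather than statements from the text: only positive familiarity scores produce an edge, and the matrix is symmetrized so that a relationship selected by either user appears in both directions.

```python
import numpy as np

def top_n_adjacency(W, n_friends):
    """Turn a familiarity matrix W into a 0/1 adjacency matrix."""
    m = W.shape[0]
    A = np.zeros((m, m), dtype=int)
    for i in range(m):
        scores = W[i].astype(float).copy()
        scores[i] = -np.inf                      # a user is not their own friend
        top = np.argsort(-scores)[:n_friends]    # most familiar users first
        for j in top:
            if scores[j] > 0:                    # assumption: skip non-positive scores
                A[i, j] = 1
    return np.maximum(A, A.T)                    # assumption: symmetric relationships

W = np.array([[0.0, 3.0, 1.0],
              [3.0, 0.0, 2.0],
              [1.0, 2.0, 0.0]])
A = top_n_adjacency(W, n_friends=1)
```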
3. Feature extraction subsystem
This subsystem reduces the dimensionality of the users' spatio-temporal trajectories and extracts valuable features from the trajectory data in a form suitable for existing classification algorithms.
Specifically, the invention applies a novel non-negative tensor factorization (NTF) algorithm. Its inputs are the spatio-temporal trajectory tensor from the data processing subsystem and the social network matrix from the social network modeling subsystem. The algorithm obtains the implicit features of each user under a constraint derived from the social network information and yields three second-order matrices that hold the implicit features of each user, each geographic location, and each time period, respectively. The user implicit feature matrix is the one of primary interest: it captures the features of each user and is used to train the classifiers and make predictions. The feature dimension can be set as needed to balance efficiency and accuracy.
The non-negative tensor factorization (NTF) algorithm is described in detail below:
The input is a tensor
$$X \in \mathbb{R}_{+}^{m \times n \times h}$$
where m is the number of users, n is the number of geographic locations, and h is the number of time periods. The NTF algorithm factorizes the tensor X into three low-dimensional matrices; in practice, this is posed as the following optimization problem:
$$\min_{U, V, T \ge 0} \left\| X - [\![U, V, T]\!] \right\|_F^2, \qquad [\![U, V, T]\!] = \sum_{j=1}^{r} u_{:j} \circ v_{:j} \circ t_{:j} \tag{1}$$
where
$$U \in \mathbb{R}_{+}^{m \times r}, \quad V \in \mathbb{R}_{+}^{n \times r}, \quad T \in \mathbb{R}_{+}^{h \times r}$$
are the non-negative matrices to be learned; r is the dimension of the implicit features; [[·]] denotes a sum of rank-1 tensors; ∘ denotes the vector outer product; and u_{:j}, v_{:j}, t_{:j} are the j-th columns of U, V, and T, respectively. U, V, and T are the implicit feature representations of the users, geographic locations, and time periods, respectively.
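The sum of rank-1 tensors built from the columns of U, V, and T can be sketched in a few lines. The function name `cp_reconstruct` is hypothetical; the example checks the compact `einsum` form against an explicit loop over outer products.

```python
import numpy as np

def cp_reconstruct(U, V, T):
    """Sum over j of the outer product of the j-th columns of U, V, and T."""
    return np.einsum("ir,jr,kr->ijk", U, V, T)

rng = np.random.default_rng(0)
U, V, T = rng.random((4, 2)), rng.random((3, 2)), rng.random((5, 2))

# Explicit version: add one rank-1 tensor per implicit feature dimension.
explicit = np.zeros((4, 3, 5))
for j in range(2):
    explicit += np.multiply.outer(np.multiply.outer(U[:, j], V[:, j]), T[:, j])

X_hat = cp_reconstruct(U, V, T)
```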
The NTF algorithm is extensible: prior knowledge can be added as needed to improve the accuracy of the implicit features. The invention adds social network information as prior knowledge.
The input social network matrix is
$$A \in \mathbb{R}^{m \times m}$$
with A(i, j) = 1 if user u_i and user u_j have a social relationship and A(i, j) = 0 otherwise. Since users with a social relationship are more likely to share attributes, the following loss function is minimized alongside the optimization problem above:
$$\min_{U \ge 0} \; \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} A(i, j) \left\| u_{i:} - u_{j:} \right\|^2 \tag{2}$$
where u_{i:} denotes the i-th row of U, i.e., the implicit features of user u_i. If users u_i and u_j have a social relationship but different implicit features, the loss function incurs a penalty of ||u_{i:} − u_{j:}||^2.
A diagonal matrix
$$D \in \mathbb{R}^{m \times m}$$
is constructed so that
$$D(i, i) = \sum_{j=1}^{m} A(i, j)$$
Taking L = D − A, equation (2) can be rewritten after a series of derivations as:
$$\min_{U \ge 0} \; \mathrm{tr}\left( U^{\top} L U \right)$$
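The step from the pairwise loss to the trace form can be spelled out as follows. This is a sketch under two assumptions that the text does not state explicitly: A is symmetric, and the pairwise loss carries the conventional factor 1/2. Here u_i stands for the implicit feature vector of user u_i (the i-th row of U), and D and L = D − A are as defined above.

```latex
\begin{aligned}
\frac{1}{2}\sum_{i,j} A(i,j)\,\lVert u_i - u_j \rVert^2
&= \frac{1}{2}\sum_{i,j} A(i,j)\left(\lVert u_i \rVert^2 + \lVert u_j \rVert^2 - 2\,u_i^{\top} u_j\right) \\
&= \sum_{i} D(i,i)\,\lVert u_i \rVert^2 - \sum_{i,j} A(i,j)\,u_i^{\top} u_j \\
&= \mathrm{tr}\left(U^{\top} D U\right) - \mathrm{tr}\left(U^{\top} A U\right)
 = \mathrm{tr}\left(U^{\top} L U\right).
\end{aligned}
```

The second line uses the symmetry of A, so that the two squared-norm terms each contribute half of the degree-weighted sum.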
Combining the two formulas gives the objective function of the NTF algorithm:
$$\min_{U, V, T \ge 0} \left\| X - [\![U, V, T]\!] \right\|_F^2 + \alpha\, \mathrm{tr}\left( U^{\top} L U \right) + \gamma \left( \|U\|_F^2 + \|V\|_F^2 + \|T\|_F^2 \right) \tag{3}$$
where the first term carries the spatio-temporal trajectory information, the second term carries the social network information, and the third term is a regularization term that prevents overfitting; the parameters α and γ are adjustable.
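The three terms of the objective can be evaluated as in the sketch below. The exact form is an assumption of this sketch: a squared Frobenius reconstruction error, a graph-Laplacian trace term weighted by α, and squared Frobenius regularization of all three factors weighted by γ. The function name `ntf_objective` is hypothetical.

```python
import numpy as np

def ntf_objective(X, U, V, T, A, alpha, gamma):
    """Reconstruction error + social network term + regularization term."""
    recon = np.einsum("ir,jr,kr->ijk", U, V, T)
    L = np.diag(A.sum(axis=1)) - A                 # graph Laplacian L = D - A
    fit = np.sum((X - recon) ** 2)                 # trajectory information
    social = alpha * np.trace(U.T @ L @ U)         # social network information
    reg = gamma * (np.sum(U ** 2) + np.sum(V ** 2) + np.sum(T ** 2))
    return fit + social + reg

# Two users who are friends but have different one-dimensional features.
U = np.array([[1.0], [0.0]])
V = np.array([[1.0]])
T = np.array([[1.0]])
X = np.einsum("ir,jr,kr->ijk", U, V, T)            # exact fit, so fit == 0
A = np.array([[0.0, 1.0], [1.0, 0.0]])
value = ntf_objective(X, U, V, T, A, alpha=1.0, gamma=0.5)
```

In this toy case the reconstruction error is zero, the social term equals the squared feature difference between the two friends (1.0), and the regularization term contributes 0.5 × 3 = 1.5.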
The optimization problem is solved using Lagrange multipliers and the Karush-Kuhn-Tucker (KKT) conditions, which finally yields the following multiplicative update rules:
$$U \leftarrow U * \frac{X_{(1)} (T \odot V) + \alpha A U}{U \left( T^{\top} T * V^{\top} V \right) + \alpha D U + \gamma U} \tag{4}$$
$$V \leftarrow V * \frac{X_{(2)} (T \odot U)}{V \left( T^{\top} T * U^{\top} U \right) + \gamma V} \tag{5}$$
$$T \leftarrow T * \frac{X_{(3)} (V \odot U)}{T \left( V^{\top} V * U^{\top} U \right) + \gamma T} \tag{6}$$
where X_(1), X_(2), X_(3) are the mode-1, mode-2, and mode-3 unfoldings of the tensor X; ⊙ denotes the Khatri-Rao product; * denotes the Hadamard product; and the fraction bar denotes element-wise division. The initial values of U, V, and T are generated randomly but must be non-negative; because multiplicative update rules are used, the resulting U, V, and T are then guaranteed to be non-negative. The time complexity of the whole method is O(mnhr), so it can run on an ordinary computer, and the NTF algorithm extends easily to parallel processing on a distributed system. Overall, the NTF algorithm is an efficient and accurate non-negative tensor factorization algorithm.
In summary, as shown in FIG. 4, the feature extraction subsystem proceeds as follows:
The users' spatio-temporal trajectory tensor and social network information are taken as input, and the matrices U, V, and T are updated iteratively using formulas (4), (5), and (6) until the user implicit feature matrix U is obtained.
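The iterative procedure can be sketched as follows. The update rules used here are an assumption: they are the standard multiplicative updates obtained from a squared-error objective with a Laplacian term α·tr(UᵀLU) and a γ-weighted Frobenius regularizer, in which each factor is multiplied element-wise by the ratio of the positive to the negative part of its gradient. The function names, default parameters, fixed iteration count, and the small constant `eps` that guards against division by zero are all choices of this sketch.

```python
import numpy as np

def khatri_rao(B, C):
    """Column-wise Kronecker product of B (I x r) and C (J x r) -> (I*J x r)."""
    return np.einsum("ir,jr->ijr", B, C).reshape(-1, B.shape[1])

def ntf_social(X, A, r, alpha=0.1, gamma=0.01, n_iter=200, eps=1e-9, seed=0):
    """Non-negative factorization of X with a social network constraint on U."""
    rng = np.random.default_rng(seed)
    m, n, h = X.shape
    # Random non-negative initial values keep every later iterate non-negative.
    U, V, T = rng.random((m, r)), rng.random((n, r)), rng.random((h, r))
    D = np.diag(A.sum(axis=1))
    # Unfoldings in NumPy's row-major order; khatri_rao below uses the same order.
    X1 = X.reshape(m, n * h)
    X2 = X.transpose(1, 0, 2).reshape(n, m * h)
    X3 = X.transpose(2, 0, 1).reshape(h, m * n)
    for _ in range(n_iter):
        U *= (X1 @ khatri_rao(V, T) + alpha * A @ U) / (
            U @ ((V.T @ V) * (T.T @ T)) + alpha * D @ U + gamma * U + eps)
        V *= (X2 @ khatri_rao(U, T)) / (
            V @ ((U.T @ U) * (T.T @ T)) + gamma * V + eps)
        T *= (X3 @ khatri_rao(U, V)) / (
            T @ ((U.T @ U) * (V.T @ V)) + gamma * T + eps)
    return U, V, T

# Toy data: a rank-2 non-negative tensor and one friendship between users 0 and 1.
rng = np.random.default_rng(1)
X = np.einsum("ir,jr,kr->ijk", rng.random((6, 2)), rng.random((5, 2)), rng.random((4, 2)))
A = np.zeros((6, 6))
A[0, 1] = A[1, 0] = 1.0
U, V, T = ntf_social(X, A, r=2)
```

Each row of the returned U is the implicit feature vector of one user and can be passed to a classifier. The column order of the unfoldings follows NumPy's row-major layout, which differs from some textbook conventions but is consistent within the sketch.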
4. Classification prediction subsystem
This subsystem trains several classifiers on the implicit features of users with known attributes and then predicts the attributes of target users from their implicit features.
The invention uses three classifiers: SVM, logistic regression, and linear regression. The existing scikit-learn toolkit provides a large number of classifier algorithms, and some functions of the classification prediction subsystem can be implemented with them. scikit-learn is a Python-based scientific computing library offering several alternative classification algorithms; the classification prediction subsystem uses its SVM classifier (sklearn.svm), logistic regression (sklearn.linear_model.LogisticRegression), and linear regression (sklearn.linear_model.LinearRegression).
As shown in FIG. 5, the user implicit feature matrix U from the feature extraction subsystem is used: the users whose attributes are known serve as the training set for the classifiers, which then predict the attributes of the remaining users. Because an individual classifier may misjudge, the classification prediction subsystem predicts each user with all three classifiers at once, and a result predicted by the majority of the classifiers is taken as the final judgment.
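The majority vote over the three classifiers can be sketched with scikit-learn as follows. The sketch assumes a binary attribute encoded as 0/1, default hyperparameters, and a 0.5 threshold for turning the linear regression output into a class label; none of these choices is specified in the text. The function name `majority_vote_predict` and the toy feature values are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression, LinearRegression

def majority_vote_predict(U_train, y_train, U_target):
    """Train three models on known users and vote on the target users."""
    svm = SVC().fit(U_train, y_train)
    logreg = LogisticRegression().fit(U_train, y_train)
    linreg = LinearRegression().fit(U_train, y_train)
    votes = np.vstack([
        svm.predict(U_target),
        logreg.predict(U_target),
        # Assumption: threshold the regression output at 0.5 to get a label.
        (linreg.predict(U_target) >= 0.5).astype(int),
    ])
    # A label predicted by at least two of the three models wins.
    return (votes.sum(axis=0) >= 2).astype(int)

# Rows are implicit feature vectors (toy values standing in for rows of U).
U_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [1.0, 1.0], [0.9, 1.1], [1.1, 0.9]])
y_train = np.array([0, 0, 0, 1, 1, 1])
U_target = np.array([[0.05, 0.05], [1.0, 1.0]])
pred = majority_vote_predict(U_train, y_train, U_target)
```

For attributes with more than two classes, the vote would need a per-class count instead of the simple sum used here.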
In summary, the user attribute and social network detection system based on spatio-temporal trajectories provided by the invention predicts a user's real identity attributes by analyzing the user's behavior patterns. The invention can be used to check the authenticity of user attributes and can also support targeted promotion based on the predicted attributes.

Claims (10)

1. A user attribute and social network detection system based on space-time trajectory, characterized in that it comprises:
a data processing subsystem, which preprocesses the input data, including processing the space-time trajectories of all users into a third-order tensor form that is convenient for subsequent operations;
a social network modeling subsystem, which establishes a social network model of the users by analyzing their space-time trajectory data and stores the model in the form of an adjacency matrix;
a feature extraction subsystem, which reduces the dimensionality of the user space-time trajectories and extracts valuable features from the user space-time trajectory data, so that the extracted features are suitable for existing classification algorithms;
and a classification prediction subsystem, which trains several classifiers on the implicit features of users whose attributes are known and predicts the attributes of a target user from that user's implicit features.
2. The system for detecting user attributes and social networks based on spatio-temporal trajectories according to claim 1, wherein the data processing subsystem processes the spatio-temporal trajectories of all users into a third-order tensor form that is convenient for subsequent operations; each required original spatio-temporal trajectory record includes a user identifier, a geographic location identifier and a time identifier; the data processing subsystem establishes a third-order tensor whose elements are all zero, in which the number of rows is the number of user identifiers, the number of columns is the number of geographic location identifiers, and the number of tubes is the number of time period identifiers, i.e. each row of the third-order tensor represents a user, each column represents a location, and each tube represents a time period.
3. The system for detecting user attributes and social networks based on spatio-temporal trajectories as claimed in claim 1, wherein the social network modeling subsystem uses pointwise mutual information (PMI) to analyze the co-occurrence behaviors of users, so as to distinguish accidental co-occurrences from co-occurrences caused by social relationships, and ranks the familiarity among users according to the number and credibility of their co-occurrences, thereby establishing the social network model of the users.
4. The spatio-temporal trajectory-based user attribute and social network detection system as claimed in claim 3, wherein the pointwise mutual information (PMI) is used to reflect whether the co-occurrence of user u_i and user u_j at location v is accidental or social: the larger the PMI, the more likely it is that user u_i and user u_j have a social relationship; the PMI value of user u_i and user u_j at location v is calculated as follows:
Figure FDA0001177593580000021
where p_v(u_i) is the probability that an occurrence event at location v belongs to user u_i, and p_v(u_j) is the probability that an occurrence event at location v belongs to user u_j,
Figure FDA0001177593580000022
Figure FDA0001177593580000023
f_v(u_i) is the number of occurrences of user u_i at location v, f_v(u_j) is the number of occurrences of user u_j at location v, U is the set of all users, and p_v(u_i, u_j) is the probability that a co-occurrence event at location v belongs to user u_i and user u_j,
Figure FDA0001177593580000024
f_v(u_i, u_j) is the number of co-occurrences of user u_i and user u_j at location v.
5. The spatio-temporal trajectory-based user attribute and social network detection system of claim 4, wherein the social network modeling subsystem defines the overall familiarity between user u_i and user u_j as:
Figure FDA0001177593580000025
where V is the set of all geographic location identifiers; the social network of the users is obtained by ranking the familiarity among the users.
6. The spatio-temporal trajectory-based user attribute and social network detection system as claimed in claim 5, wherein a second-order matrix with all zero elements is established
Figure FDA0001177593580000026
where m is the number of users; the social relationships among the users are filled into the matrix to obtain an adjacency matrix reflecting them; the numbers of rows and columns of the adjacency matrix both equal the number of users, each element indicates whether the users of its row and column have a social relationship, and the social relationship between user u_i and user u_j is recorded in row i, column j of the matrix.
7. The spatio-temporal trajectory-based user attribute and social network detection system according to claim 1, wherein the feature extraction subsystem applies a non-negative tensor factorization (NTF) algorithm to extract valuable features; the NTF algorithm decomposes the spatio-temporal trajectory tensor under constraints derived from the social network information, yielding three second-order matrices that represent the implicit features of each user, each geographic location and each time period, respectively.
8. The spatio-temporal trajectory-based user attribute and social network detection system of claim 7, wherein the non-negative tensor factorization (NTF) algorithm comprises: inputting a tensor
Figure FDA0001177593580000031
where m is the number of users, n is the number of geographic locations, and h is the number of time periods; inputting a social network matrix
Figure FDA0001177593580000032
where A(i, j) = 1 if user u_i and user u_j have a social relationship and A(i, j) = 0 otherwise; the NTF algorithm solves the following optimization problem:
Figure FDA0001177593580000033
the above optimization problem is also the objective function, where O_TRA represents the decomposition of the space-time trajectory information and O_U represents the constraint imposed by the social network,
Figure FDA0001177593580000034
U, V, T are the implicit feature representations of the users, geographic locations and time periods, respectively,
Figure FDA0001177593580000035
are the non-negative matrices to be learned, and r is the dimension of the implicit features;
Figure FDA00011775935800000310
represents the vector outer product; u_:j, v_:j, t_:j are the j-th columns of the matrices U, V, T, respectively; L = D - A,
Figure FDA0001177593580000036
and α and γ are tuning parameters.
9. The spatio-temporal trajectory-based user attribute and social network detection system according to claim 8, wherein the multiplicative update rules for the objective function are:
Figure FDA0001177593580000037
Figure FDA0001177593580000038
Figure FDA0001177593580000039
where X_(1), X_(2), X_(3) are the mode-1, mode-2 and mode-3 unfoldings of the tensor X, ⊙ denotes the Khatri-Rao product, and element-wise multiplication in the formulas is the Hadamard product; the initial values of the U, V and T matrices are generated randomly and must be non-negative, and iterating the updates finally yields the implicit feature matrix U of the users.
10. The spatio-temporal trajectory-based user attribute and social network detection system as claimed in claim 1, wherein the classification prediction subsystem predicts user attributes using a plurality of classifiers and determines the final user attribute by combining their predictions.
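The data flow of claims 2 to 6, from trajectory records to tensor to PMI to adjacency matrix, can be sketched as below. This is an illustration only, and several points are assumptions because the claimed formulas are given as figures: tensor entries are taken to be visit counts, a co-occurrence is two users at the same location in the same time period, PMI is assumed to take the standard form log(p_v(u_i, u_j) / (p_v(u_i) p_v(u_j))) with each probability a normalized count, familiarity is assumed to be the sum of PMI over locations, and a relationship is assumed to be recorded when familiarity exceeds a threshold. All function names and the sample records are hypothetical.

```python
import math
from itertools import combinations


def build_tensor(records, users, places, periods):
    """Return tensor[i][j][k]: visits of user i to place j in period k."""
    ui = {u: i for i, u in enumerate(users)}
    pi = {p: j for j, p in enumerate(places)}
    ti = {t: k for k, t in enumerate(periods)}
    # Start from an all-zero tensor, then count each trajectory record.
    tensor = [[[0] * len(periods) for _ in places] for _ in users]
    for user, place, period in records:
        tensor[ui[user]][pi[place]][ti[period]] += 1
    return tensor


def pmi_at_place(tensor, j):
    """PMI of every co-occurring user pair at place j, as {(a, b): pmi}."""
    m = len(tensor)
    # f_v(u): number of occurrences of each user at the place.
    f = [sum(tensor[a][j]) for a in range(m)]
    # f_v(u_a, u_b): periods in which both users are present (assumption).
    co = {}
    for a, b in combinations(range(m), 2):
        n = sum(1 for x, y in zip(tensor[a][j], tensor[b][j]) if x and y)
        if n:
            co[(a, b)] = n
    total_f, total_co = sum(f), sum(co.values())
    return {
        (a, b): math.log((n / total_co) / ((f[a] / total_f) * (f[b] / total_f)))
        for (a, b), n in co.items()
    }


def social_adjacency(tensor, threshold=0.0):
    """Sum PMI over places and threshold it into a 0/1 adjacency matrix."""
    m, n_places = len(tensor), len(tensor[0])
    familiarity = {}
    for j in range(n_places):
        for pair, value in pmi_at_place(tensor, j).items():
            familiarity[pair] = familiarity.get(pair, 0.0) + value
    A = [[0] * m for _ in range(m)]
    for (a, b), value in familiarity.items():
        if value > threshold:
            A[a][b] = A[b][a] = 1
    return A


records = [
    ("alice", "cafe", "mon_am"), ("bob", "cafe", "mon_am"),
    ("alice", "cafe", "mon_pm"), ("bob", "cafe", "mon_pm"),
    ("carol", "cafe", "tue_am"), ("carol", "gym", "mon_am"),
]
tensor = build_tensor(records, ["alice", "bob", "carol"],
                      ["cafe", "gym"], ["mon_am", "mon_pm", "tue_am"])
print(social_adjacency(tensor))
# [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

In this example alice and bob share two time periods at the cafe and are linked, while carol never co-occurs with anyone and stays isolated.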
CN201611139349.9A 2016-12-12 2016-12-12 User attribute and social network detection system based on space-time trajectory Active CN106600052B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611139349.9A CN106600052B (en) 2016-12-12 2016-12-12 User attribute and social network detection system based on space-time trajectory

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611139349.9A CN106600052B (en) 2016-12-12 2016-12-12 User attribute and social network detection system based on space-time trajectory

Publications (2)

Publication Number Publication Date
CN106600052A (en) 2017-04-26
CN106600052B (en) 2020-04-10

Family

ID=58599378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611139349.9A Active CN106600052B (en) 2016-12-12 2016-12-12 User attribute and social network detection system based on space-time trajectory

Country Status (1)

Country Link
CN (1) CN106600052B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145977B (en) * 2017-04-28 2020-07-31 电子科技大学 Method for carrying out structured attribute inference on online social network user
CN108268762B (en) * 2018-01-17 2021-04-30 同济大学 Mobile social network user identity identification method based on behavior modeling
CN108650614B (en) * 2018-03-19 2020-07-28 复旦大学 Mobile user position prediction method and device for automatically deducing social relationship
CN109325635B (en) * 2018-10-25 2022-02-15 电子科技大学中山学院 Position prediction method based on automatic completion
CN109657703B (en) * 2018-11-26 2023-04-07 浙江大学城市学院 Crowd classification method based on space-time data trajectory characteristics
CN110020883A (en) * 2018-12-12 2019-07-16 阿里巴巴集团控股有限公司 The method and device that unknown scoring in a kind of pair of rating matrix is predicted
CN109783629A (en) * 2019-01-16 2019-05-21 福州大学 A kind of micro-blog event rumour detection method of amalgamation of global event relation information
CN110688726B (en) * 2019-08-19 2023-01-10 华南师范大学 Spatial orientation self-adaptive city expansion simulation method, system and storage medium
CN110569447B (en) * 2019-09-12 2022-03-15 腾讯音乐娱乐科技(深圳)有限公司 Network resource recommendation method and device and storage medium
CN110955804B (en) * 2019-12-03 2024-03-22 南京大学 Adaboost method for user space-time data behavior detection
CN111309960B (en) * 2020-02-26 2024-03-26 腾讯科技(深圳)有限公司 Song list recommendation method and device
CN111382278B (en) * 2020-03-04 2023-08-08 华中师范大学 Social network construction method and system based on space-time track
CN112000898B (en) * 2020-07-14 2024-07-16 浙江大华技术股份有限公司 Data generation method, electronic device and storage medium
CN113469807B (en) * 2021-08-31 2022-03-01 阿里云计算有限公司 Credit risk determination and data processing method, apparatus, medium, and program product
CN117171452A (en) * 2022-05-12 2023-12-05 中国人民解放军国防科技大学 Method for determining social behavior relationship among space-time co-occurrence area, non-public place and user

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102117325A (en) * 2011-02-24 2011-07-06 清华大学 Method for predicting dynamic social network user behaviors
CN106204298A (en) * 2016-07-15 2016-12-07 长江大学 Temporary social network under a kind of big data environment determines method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8838688B2 (en) * 2011-05-31 2014-09-16 International Business Machines Corporation Inferring user interests using social network correlation and attribute correlation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Predicting Various Types of User Attributes in Twitter by Using Personalized PageRank; Kazuya Uesato et al.; 2015 IEEE International Conference on Big Data (Big Data); 2015-11-01; full text *

Also Published As

Publication number Publication date
CN106600052A (en) 2017-04-26

Similar Documents

Publication Publication Date Title
CN106600052B (en) User attribute and social network detection system based on space-time trajectory
Feng et al. Deepmove: Predicting human mobility with attentional recurrent networks
Okawa et al. Deep mixture point processes: Spatio-temporal event prediction with rich contextual information
Tian et al. Spatial‐temporal attention wavenet: A deep learning framework for traffic prediction considering spatial‐temporal dependencies
Chen et al. Privacy preserving point-of-interest recommendation using decentralized matrix factorization
CN111914569B (en) Fusion map-based prediction method and device, electronic equipment and storage medium
Yang et al. Spatio-temporal check-in time prediction with recurrent neural network based survival analysis
Jiang et al. Transfer urban human mobility via poi embedding over multiple cities
Wang et al. CasSeqGCN: Combining network structure and temporal sequence to predict information cascades
Li et al. Location inference for non-geotagged tweets in user timelines
Al-Molegi et al. Move, attend and predict: An attention-based neural model for people’s movement prediction
Feng et al. Predicting human mobility with semantic motivation via multi-task attentional recurrent networks
Yang et al. Recurrent spatio-temporal point process for check-in time prediction
CN116310318B (en) Interactive image segmentation method, device, computer equipment and storage medium
CN114138968B (en) Network hotspot mining method, device, equipment and storage medium
CN106600053B (en) User attribute prediction system based on space-time trajectory and social network
Prathap et al. Geospatial crime analysis to determine crime density using Kernel density estimation for the Indian context
Ahmadi et al. Inductive and transductive link prediction for criminal network analysis
Blanco-Justicia et al. Generation of synthetic trajectory microdata from language models
Liu et al. Behaviornet: A fine-grained behavior-aware network for dynamic link prediction
Chen et al. Next location prediction with a graph convolutional network based on a seq2seq framework
CN116720009A (en) Social robot detection method, device, equipment and storage medium
Yang et al. Deep Learning‐Based Destination Prediction Scheme by Trajectory Prediction Framework
CN114116692B (en) Mask and bidirectional model-based missing POI track completion method
CN116029760A (en) Message pushing method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant