CN115759350A - Population mobility prediction method and device for data sparse area - Google Patents

Publication number
CN115759350A
Authority
CN
China
Prior art keywords
causal
region
knowledge
data
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211313718.7A
Other languages
Chinese (zh)
Other versions
CN115759350B (en)
Inventor
李勇
冯涛
金德鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202211313718.7A priority Critical patent/CN115759350B/en
Publication of CN115759350A publication Critical patent/CN115759350A/en
Application granted granted Critical
Publication of CN115759350B publication Critical patent/CN115759350B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a population mobility prediction method and device for a data-sparse area. The method comprises: acquiring regional causal knowledge from data of a source region by using a reinforcement-learning-based causal discovery model; obtaining a causally enhanced variational autoencoder based on the regional causal knowledge and an initial variational autoencoder, and using it to recover the missing features of a target region to obtain a latent causal embedding vector, the latent causal embedding vector being a representation vector corresponding to the observed features and the missing features; and transferring the knowledge of a prediction model to the target region by a transfer learning algorithm based on knowledge distillation between the source region and the target region, and predicting the population mobility of the target region based on that knowledge and the embedding vector. The method effectively resolves the prediction difficulty caused by sparse data and improves the efficiency and accuracy of population mobility prediction for data-sparse areas.

Description

Population mobility prediction method and device for data sparse area
Technical Field
The invention relates to the technical field of big data processing, in particular to a population mobility prediction method and device for a data sparse area. In addition, an electronic device and a processor-readable storage medium are also related.
Background
Population mobility reflects the urban structure of a city and the distribution of demand for its facilities. Accurate population mobility prediction helps people understand and plan urban structure and facility demand distribution in advance, which reduces travel costs and improves urban efficiency. For developing cities (i.e., data-sparse areas), population mobility prediction plays a crucial role, because a good urban structure and facility demand distribution are of great significance to their structural layout and future development.
At present, the prior art generally models and predicts population mobility with simple physical laws, which have limited modeling capability and cannot express complex mobility patterns. With the rapid development of machine learning and deep learning, complex models such as decision-tree-based models and graph neural networks have shown a strong capability for predicting population mobility. However, each of these methods must model a city and fit the model parameters with a large amount of data, so their application in developing cities is limited. They do not help predict population mobility in developing cities, because insufficient data collection leaves some key features unobserved, which results in poor prediction efficiency and accuracy for data-sparse areas. Therefore, how to design a population mobility prediction scheme for data-sparse areas that improves prediction efficiency and accuracy has become an urgent problem to be solved.
Disclosure of Invention
Therefore, the invention provides a population mobility prediction method and device for a data-sparse area, to overcome the poor prediction efficiency and accuracy that result from the strong limitations of prior-art population mobility prediction schemes for data-sparse areas.
In a first aspect, the present invention provides a population flow prediction method for a data sparse area, including:
acquiring regional causal knowledge from data of a source region by using a reinforcement-learning-based causal discovery model;
obtaining a causally enhanced variational autoencoder based on the regional causal knowledge and an initial variational autoencoder; recovering the missing features of a target region by using the causally enhanced variational autoencoder to obtain a latent causal embedding vector; wherein the latent causal embedding vector is a representation vector corresponding to the observed features and the missing features;
transferring knowledge of a prediction model to the target region by a transfer learning algorithm based on knowledge distillation between the source region and the target region, and performing population mobility prediction for the target region based on the knowledge of the prediction model and the latent causal embedding vector; wherein the prediction model is a population mobility prediction model constructed in advance from data of the source region.
Further, acquiring the regional causal knowledge from the data of the source region by using the reinforcement-learning-based causal discovery model specifically includes:
acquiring the data of the source region; determining, based on a reinforcement-learning regional causal knowledge construction strategy, a causal discovery model based on region attribute feature ordering; analyzing the data of the source region with the causal discovery model to obtain a region attribute feature order that meets preset conditions; and pruning by a Bayesian test to obtain a feature causal graph containing the relationships among region attribute features; wherein the feature causal graph is used to represent the regional causal knowledge.
Further, recovering the missing features of the target region by using the causally enhanced variational autoencoder to obtain the latent causal embedding vector specifically includes:
taking the feature causal graph as a feature recovery path, and learning the missing information of the region attribute features based on the initial variational autoencoder and the feature recovery path to obtain the latent causal embedding vector.
Further, obtaining the causally enhanced variational autoencoder based on the regional causal knowledge and the initial variational autoencoder specifically includes:
explicitly modeling the unobserved missing features of the target region as auxiliary latent variables, and adding the association relationships among features to the initial variational autoencoder by using the causal paths corresponding to the regional causal knowledge, so as to construct an untrained causally enhanced variational autoencoder; and training it by back-propagation to obtain the causally enhanced variational autoencoder.
Further, performing population mobility prediction for the target region based on the knowledge of the prediction model and the latent causal embedding vector specifically includes:
acquiring the features of an origin region and the features of a destination region in the target region based on the knowledge of the prediction model and the latent causal embedding vector, and adding the distance information between the origin and the destination to predict the flow of people between the origin and the destination in the target region.
Further, the regional causal knowledge is causal mapping relationship information between human features and region attribute features in the source region.
Further, the target region is a data sparse region.
In a second aspect, the present invention further provides a population flow prediction device for a data sparse area, including:
a regional causal knowledge acquisition unit, configured to acquire regional causal knowledge from data of a source region by using a reinforcement-learning-based causal discovery model;
a causal embedding vector acquisition unit, configured to obtain a causally enhanced variational autoencoder based on the regional causal knowledge and an initial variational autoencoder, and to recover the missing features of a target region by using the causally enhanced variational autoencoder to obtain a latent causal embedding vector; wherein the latent causal embedding vector is a representation vector corresponding to the observed features and the missing features;
a population mobility prediction unit, configured to transfer knowledge of a prediction model to the target region by a transfer learning algorithm based on knowledge distillation between the source region and the target region, and to perform population mobility prediction for the target region based on the knowledge of the prediction model and the latent causal embedding vector; wherein the prediction model is a population mobility prediction model constructed in advance from data of the source region.
Further, the regional causal knowledge acquisition unit is specifically configured to:
acquire the data of the source region; determine, based on a reinforcement-learning regional causal knowledge construction strategy, a causal discovery model based on region attribute feature ordering; analyze the data of the source region with the causal discovery model to obtain a region attribute feature order that meets preset conditions; and prune by a Bayesian test to obtain a feature causal graph containing the relationships among region attribute features; wherein the feature causal graph is used to represent the regional causal knowledge.
Further, the causal embedding vector acquisition unit is specifically configured to:
take the feature causal graph as a feature recovery path, and learn the missing information of the region attribute features based on the initial variational autoencoder and the feature recovery path to obtain the latent causal embedding vector.
Further, the causal embedding vector acquisition unit is specifically further configured to:
explicitly model the unobserved missing features of the target region as auxiliary latent variables, and add the association relationships among features to the initial variational autoencoder by using the causal paths corresponding to the regional causal knowledge, so as to construct an untrained causally enhanced variational autoencoder; and train it by back-propagation to obtain the causally enhanced variational autoencoder.
Further, the population mobility prediction unit is specifically configured to:
acquire the features of an origin region and the features of a destination region in the target region based on the knowledge of the prediction model and the latent causal embedding vector, and add the distance information between the origin and the destination to predict the flow of people between the origin and the destination in the target region.
Further, the regional causal knowledge is causal mapping relationship information between human features and region attribute features in the source region.
Further, the target region is a data sparse region.
In a third aspect, the present invention also provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the computer program implementing the steps of the method for predicting population mobility in a data sparse area as described in any one of the above.
In a fourth aspect, the present invention further provides a processor-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the method for predicting population mobility in a data sparse area as described in any one of the above.
According to the population mobility prediction method for a data-sparse area provided by the invention, regional causal knowledge is acquired from the data of the source region by a reinforcement-learning-based causal discovery model, and a causally enhanced variational autoencoder is obtained based on the regional causal knowledge and an initial variational autoencoder; the missing features of the target region are then recovered by the causally enhanced variational autoencoder to obtain a latent causal embedding vector; the knowledge of a prediction model is then transferred to the target region by a transfer learning algorithm based on knowledge distillation between the source region and the target region, and the population mobility of the target region is predicted based on the knowledge of the prediction model and the latent causal embedding vector. This effectively resolves the prediction difficulty caused by sparse data and improves the efficiency and accuracy of population mobility prediction for data-sparse areas.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of a population mobility prediction method for a data sparse area according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a population flow prediction principle of a data sparse area based on causal enhancement provided by an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a feature recovery model provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a process for predicting a target region based on knowledge distillation provided by an embodiment of the invention;
FIG. 5 is a specific diagram of a population mobility prediction method for a data sparse area according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a population flow prediction device for a data sparse area according to an embodiment of the present invention;
FIG. 7 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments of the present invention, belong to the protection scope of the present invention.
The invention provides a population mobility prediction method for a data-sparse area: a novel population mobility prediction model, based on a causally enhanced variational autoencoder, that transfers three types of urban knowledge from a source region (i.e., a data-rich city) to a target region (i.e., a developing city). Specifically, a reinforcement-learning-based causal discovery model searches for and discovers regional causal knowledge (i.e., urban causal knowledge) in the data-rich city, and this knowledge is combined with a VAE-based recovery model to obtain a feature recovery model (CEVAE) for recovering the corresponding missing features. The feature recovery model is then transferred to the developing city to obtain latent representations of the missing features. Finally, the knowledge of the prediction model is transferred to the developing city by a transfer learning method based on knowledge distillation between the data-rich city and the developing city, in order to predict population mobility.
The following describes an embodiment of the population mobility prediction method for a data-sparse area in detail. FIG. 1 is a schematic flow chart of the method according to an embodiment of the present invention; the specific process includes the following steps:
Step 101: acquire regional causal knowledge from the data of the source region by using a reinforcement-learning-based causal discovery model. The regional causal knowledge is causal mapping relationship information between human features and region attribute features in the source region. For example, if the human feature is the identity attribute of a company white-collar worker or a business person, a causal mapping relationship exists between that human feature and region attribute features such as coffee shops or fast-food restaurants. The causal mapping relationship information is extracted from the data of the source region. The source region may be a data-rich developed city.
In the embodiment of the invention, the data of the source region are first acquired. A causal discovery model based on region attribute feature ordering is then determined based on a reinforcement-learning regional causal knowledge construction strategy, the data of the source region are analyzed with the causal discovery model to obtain a region attribute feature order that meets preset conditions, and a feature causal graph containing the relationships among region attribute features is obtained by Bayesian test pruning. The feature causal graph is used to represent the regional causal knowledge (i.e., urban causal knowledge). The causal discovery model is a deep learning model for discovering causal relationships among features from observed feature data.
In a specific implementation, the construction of urban causal knowledge can be modeled as a causal discovery problem based on region attribute ordering, yielding a corresponding causal discovery model, and reinforcement learning (RL) is used to solve the problem. To incorporate RL into the causal discovery problem based on region feature (i.e., region attribute) ordering, the ordering search problem is formulated as a multi-step Markov decision process (MDP). Each MDP can be described formally as a 4-tuple $(S, A, P, R)$, where $S$ and $A$ denote the state space and the action space, respectively. $P: S \times A \to \mathbb{R}$ denotes the state transition probability; that is, $P(s_{t+1} \mid s_t, a_t)$ is the conditional probability distribution of the next state $s_{t+1}$ given the current state $s_t$ and action $a_t$. Finally, $R: S \times A \to \mathbb{R}$ is the reward function; $R(s, a)$ is the reward received for performing action $a \in A$ in state $s \in S$.
The components of the MDP are modeled as follows.
State: in the prior art, potential causal relationships are captured by directly using the observed region attributes $F$ as states. The present invention instead uses an encoder to embed each attribute's data $F_i$ into a state $s_i$, which facilitates the ordering search. The state space is therefore given by
Figure BDA0003908119780000081
and the self-attention-based encoder of the Transformer architecture is used as the encoder.
Action: the action space $A = \{F_i \mid i = 1, 2, \ldots, |F|\}$ consists of all region attributes. At each decision step, one region attribute is selected as the action, and the order in which attributes are selected is the search order. The size of the action space, $|F|$, equals the number of region attributes, which is much smaller than the $2^{|F| \times |F|}$ search space of other causal discovery methods, so the efficiency of the ordering search is greatly improved.
State transition: in this problem, the state transition is deterministic and depends on the action selected at the current decision step; that is, if $a_t = F_j$ is selected at decision step $t$, the next state is $s_{t+1} = s_j = \mathrm{encoder}(F_j)$.
Reward: an episodic reward is used in this problem. The episodic reward
Figure BDA0003908119780000082
is obtained only once the complete ordering $\Pi$ of the region attributes has been obtained, and it describes how well the corresponding causal structure matches the observed region attributes.
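The MDP components described above can be illustrated with a small Python sketch. It is a toy environment under stated assumptions, not the patented implementation: `OrderingEnv`, `encode`, and `score_ordering` are hypothetical names, the rule that each attribute is selected once per episode is an assumption implied by the ordering search, and the reward function is left to the caller because its formula is not reproduced in the text.

```python
from dataclasses import dataclass, field


@dataclass
class OrderingEnv:
    """Toy MDP for attribute ordering: one region attribute is chosen per step."""
    attributes: list                                  # action space A: all region attributes
    chosen: list = field(default_factory=list)        # ordering built so far

    def available_actions(self):
        # Assumption: an attribute can be selected only once per episode.
        return [a for a in self.attributes if a not in self.chosen]

    def step(self, action, encode, score_ordering):
        """Deterministic transition: the next state is the encoding of the chosen attribute."""
        if action not in self.available_actions():
            raise ValueError(f"attribute {action!r} is not available")
        self.chosen.append(action)
        next_state = encode(action)                   # s_{t+1} = encoder(F_j)
        done = len(self.chosen) == len(self.attributes)
        # Episodic reward: zero until the complete ordering has been produced.
        reward = score_ordering(self.chosen) if done else 0.0
        return next_state, reward, done
```

In use, a policy would call `step` once per decision step until `done` is true; the list `env.chosen` is then the candidate ordering that the episodic reward scores.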
Based on the MDP described above, causal discovery based on region attribute ordering is described by the policy function π: Ω → A. Specifically, π(a | s) denotes the probability of selecting action a in the current state s. A decoder based on a long short-term memory (LSTM) network maps the state to the action, i.e., $a_t = \mathrm{decoder}(s_t)$. Within the RL framework, policy gradients are used to train the ordering-based causal discovery policy model to obtain the optimal ordering of region attributes. The optimal ordering corresponds to a fully connected directed acyclic graph (DAG), from which the final feature causal graph is obtained by Bayesian test pruning:
Figure BDA0003908119780000083
Through the above process, urban causal knowledge, represented by a feature causal graph containing the relationships among region features, is finally obtained for use in the subsequent steps.
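The step from an attribute ordering to a pruned causal graph can be sketched as follows. This is only an illustration: the function names are hypothetical, and `keep_edge` is a caller-supplied stand-in for the Bayesian test, whose details are not given in the text.

```python
def full_dag_from_order(order):
    """Edges (parent, child) of the fully connected DAG implied by an ordering."""
    return [(order[i], order[j])
            for i in range(len(order))
            for j in range(i + 1, len(order))]


def prune(edges, keep_edge):
    """Keep only the edges accepted by keep_edge(parent, child).

    keep_edge is a placeholder for the statistical test used for pruning.
    """
    return [(p, c) for (p, c) in edges if keep_edge(p, c)]


def parents_of(edges, node):
    """Parent nodes of a node in the pruned graph."""
    return [p for (p, c) in edges if c == node]
```

Because every edge points from an earlier attribute to a later one in the ordering, any subset of these edges is acyclic, which is why pruning can be applied edge by edge.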
Step 102: obtain a causally enhanced variational autoencoder based on the regional causal knowledge and an initial variational autoencoder; recover the missing features of the target region by using the causally enhanced variational autoencoder to obtain a latent causal embedding vector; wherein the latent causal embedding vector is a representation vector corresponding to the observed features and the missing features. The target region is a data-sparse region. The initial variational autoencoder is a recovery model based on a VAE (variational autoencoder), and the causally enhanced variational autoencoder is the feature recovery model, i.e., the CEVAE in FIG. 2, which includes an encoder and a decoder.
In the embodiment of the invention, the unobserved missing features of the target region can be explicitly modeled as auxiliary latent variables, and the association relationships among features are added to the initial variational autoencoder by using the causal paths corresponding to the regional causal knowledge, so as to construct an untrained causally enhanced variational autoencoder, which is then trained by back-propagation to obtain the causally enhanced variational autoencoder. Further, the feature causal graph is used as a feature recovery path, and the missing information of the region attribute features is learned based on the initial variational autoencoder and the feature recovery path to obtain the latent causal embedding vector; the latent causal embedding vector is an embedding of the missing features recovered from the observed features.
Specifically, the CEVAE-based feature recovery process uses the feature causal graph obtained above,
Figure BDA0003908119780000091
as a feature recovery path for learning the missing information of the region features, in order to solve the data sparsity problem of developing cities. As shown in FIG. 3, the structure of CEVAE is implemented on the basis of a VAE, and the missing features are modeled as latent variables in both the encoder and the decoder to support learning the latent causal embedding vector Z. The latent causal embedding vector Z (or $\mu_Z$, $\sigma_Z$) obtained in FIG. 3 is passed to the subsequent prediction process; it carries the information of the missing features.
The causally enhanced variational autoencoder comprises a causal-estimation-enhanced encoder. As in a conventional VAE, the encoder learns a conditional distribution q(Z | X) that models the dependency of the hidden variable Z (i.e., the latent causal embedding vector) on the partially observed features X. The present invention also takes the information of the missing features into account, in order to learn representative embedding vectors and overcome the obstacle of sparse data in developing cities. As shown in FIG. 3, the missing features are modeled as a latent variable Y, on which Z also causally depends. With the missing features modeled explicitly, the distribution of the latent causal embedding vector Z depends on both the observed features X and the missing features Y. Specifically, the encoder contains $N_Y$ variable estimators, one for the distribution of each missing feature; the causal variables of each estimator are the parent nodes of the feature in the feature causal graph
Figure BDA0003908119780000101
Each estimator is a multi-layer perceptron (MLP) with independent parameters that takes the corresponding causal variables as input and outputs the estimated distribution. All variables are treated as normally distributed, e.g.
Figure BDA0003908119780000102
The estimator can be described as follows.
Figure BDA0003908119780000103
Here, | denotes the concatenation of variables;
Figure BDA0003908119780000104
denotes the parent nodes of node i in
Figure BDA0003908119780000109
and j indexes a parent node of i;
Figure BDA0003908119780000105
denotes the Gaussian distribution, with $\mu_i$ its mean and $\sigma_i$ its variance. The causal variables of a particular missing feature may themselves be unobserved; in that case, they are estimated by their respective estimators. Basic features such as population are assumed to be readily available, so the root nodes of the feature causal graph
Figure BDA0003908119780000106
are never missing, and these starting nodes of the paths allow the subsequent nodes to be inferred.
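The estimation pass described above (one estimator per missing feature, fed with its parent values and evaluated along the causal order) might look like the following pure-Python sketch. All names are hypothetical, the MLP is a randomly initialised one-hidden-layer network rather than a trained model, and the network outputs a standard deviation through an exponential, which is an implementation assumption.

```python
import math
import random


def make_mlp(n_in, n_hidden, rng):
    """Randomly initialised one-hidden-layer MLP returning (mean, std) of a Gaussian."""
    w1 = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(2)]

    def forward(x):
        h = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in w1]
        mu, log_sigma = (sum(w * v for w, v in zip(row, h)) for row in w2)
        return mu, math.exp(log_sigma)        # exp keeps the std positive

    return forward


def estimate_missing(observed, parents, order, estimators, rng):
    """Fill in missing features along the causal order by sampling each estimator."""
    values = dict(observed)
    for node in order:                        # root nodes first
        if node in values:
            continue                          # observed features are used as-is
        x = [values[p] for p in parents[node]]    # concatenated parent values
        mu, sigma = estimators[node](x)
        values[node] = rng.gauss(mu, sigma)   # sample from the estimated Gaussian
    return values
```

Processing nodes in causal order guarantees that a missing parent has already been estimated before any of its children is visited.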
Given the observed features and the estimators, the missing features can be estimated by sampling from the estimated distributions, and complete information is obtained by combining the observed features with these estimates. The distribution of the latent variable Z (i.e., the latent causal embedding vector) is then estimated from the complete information, e.g., P(Z | Y, X). This estimator is also an MLP. The process can be expressed as follows.
Figure BDA0003908119780000107
Here, | denotes the concatenation of variables, X denotes the observed features,
Figure BDA0003908119780000108
denotes the estimated missing features, and $\mu_Z$ and $\sigma_Z$ denote the mean and variance of the hidden variable Z, respectively.
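As a rough illustration of this step, the sketch below maps the concatenation of the observed features and the estimated missing features to the mean and standard deviation of Z and draws one sample. It is a simplification under stated assumptions: a single linear layer stands in for the MLP, the `weights` dictionary and the log-variance parameterisation are hypothetical, and sampling uses the standard reparameterisation form z = mu + sigma * eps.

```python
import math
import random


def encode_z(x, y_hat, weights, rng):
    """Map the concatenation of X and the estimated Y to (mu_Z, sigma_Z) and sample Z."""
    inp = list(x) + list(y_hat)                       # concatenation of variables
    mu = [sum(w * v for w, v in zip(row, inp)) for row in weights["mu"]]
    log_var = [sum(w * v for w, v in zip(row, inp)) for row in weights["log_var"]]
    sigma = [math.exp(0.5 * lv) for lv in log_var]    # std from log-variance
    z = [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
    return mu, sigma, z
```

Predicting the log-variance rather than the variance is a common way to keep the standard deviation positive without constraining the network output.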
In the causal-estimation-enhanced decoder, the complete information, including observed and unobserved features, is recovered (reconstructed) from the hidden variable Z (i.e., the latent variable Z), e.g., p(X | Z) and p(Y | X, Z). The feature causal graph
Figure BDA0003908119780000111
is used as the feature recovery path of the decoder. The feature causal graph
Figure BDA0003908119780000112
and the region attribute ordering Π determine the order in which the decoder generates the region features; that is, the decoder generates all region features from the root nodes to the leaf nodes, in the order given by the feature causal graph. Each region feature
Figure BDA0003908119780000113
is determined from the latent variable Z and the feature's parent nodes, because the latent variable Z is expected to contain the complete profile of the embedded object.
The reconstruction of each region feature distribution can be summarized by the following equations.
Figure BDA0003908119780000114
Figure BDA0003908119780000115
Here,
Figure BDA0003908119780000116
denotes the feature to be recovered;
Figure BDA0003908119780000117
denotes the corresponding Gaussian distribution, and $\mu_i$ and $\sigma_i$ denote the mean and variance of the reconstructed region feature distribution.
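The decoding order described above can be sketched in a few lines of Python. The names `decode` and `heads` are hypothetical; each head is assumed to be a callable that returns a (mean, std) pair, and feeding the parent means (rather than samples) into the children is a simplifying assumption.

```python
def decode(z, order, parents, heads):
    """Reconstruct every region feature from Z and its already-decoded parents."""
    recon = {}
    for node in order:                                    # root nodes first, leaves last
        parent_means = [recon[p][0] for p in parents[node]]
        inp = list(z) + parent_means                      # Z concatenated with parent values
        recon[node] = heads[node](inp)                    # (mean, std) of this feature
    return recon
```

If `order` were not consistent with the causal graph, a child would be visited before one of its parents and the lookup `recon[p]` would fail, which makes the ordering requirement explicit.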
In the training of the causally enhanced variational autoencoder, CEVAE explicitly models the unobserved region features Y as auxiliary latent variables and uses the causal paths to add the dependencies among features to a conventional (vanilla) VAE, constructing a more effective causally enhanced variational autoencoder. As with a VAE, the overall optimization objective is the evidence lower bound (ELBO):
Figure BDA0003908119780000118
where q and p denote the conditional distributions of the encoder and the decoder in FIG. 3, respectively.
To impose physical constraints on the auxiliary latent variable Y, two additional terms are added in ELBO, as follows:
Figure BDA0003908119780000121
wherein Y is * An observation tag representing a missing feature; logq (Y = Y) * | X) represents the likelihood of Y given the observed data X; log q (Y = Y) * | X, Z) given the observed data X and the resulting likelihood of Y in Z. This ensures that the estimate of Y and the reconstruction learn stably to avoid accumulating errors and noise when calculating sequentially on the causal path. The optimization goal of (a) is to maximize ELBO. Therefore, the final loss function for optimizing the causal enhancement variational automatic encoder (i.e. the causal enhancement based variational automatic encoder) is as follows:
Figure BDA0003908119780000122
where β is the weight of the assist penalty.
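The structure of such a loss can be sketched as follows: a negated ELBO plus the two auxiliary log-likelihood terms weighted by β. This is only an illustration under stated assumptions, not the patent's exact formula: the prior on Z is assumed to be a standard normal, the ELBO is assumed to be a reconstruction log-likelihood minus a KL term, and all function and argument names are hypothetical.

```python
import numpy as np

def kl_to_standard_normal(mu_z, sigma_z):
    """KL( N(mu_z, sigma_z^2) || N(0, I) ), summed over the latent dimensions."""
    return 0.5 * np.sum(mu_z ** 2 + sigma_z ** 2 - 1.0 - 2.0 * np.log(sigma_z), axis=-1)

def cevae_style_loss(recon_log_lik, mu_z, sigma_z, aux_log_lik_x, aux_log_lik_xz, beta):
    """Mean over samples of -(ELBO) - beta * (log q(Y=Y*|X) + log q(Y=Y*|X,Z))."""
    elbo = recon_log_lik - kl_to_standard_normal(mu_z, sigma_z)  # assumed ELBO form
    aux = aux_log_lik_x + aux_log_lik_xz                         # auxiliary terms on Y
    return float(np.mean(-elbo - beta * aux))

# Two hypothetical samples whose posterior equals the prior, so the KL term is zero.
loss = cevae_style_loss(
    recon_log_lik=np.array([-1.0, -3.0]),
    mu_z=np.zeros((2, 4)),
    sigma_z=np.ones((2, 4)),
    aux_log_lik_x=np.array([-0.5, -0.5]),
    aux_log_lik_xz=np.array([-1.0, -1.0]),
    beta=2.0,
)
```

Minimizing this quantity corresponds to maximizing the ELBO while also pushing up the likelihood of the observed labels Y*, with β controlling how strongly the auxiliary terms count.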
For the training algorithm of CEVAE, 90% of the region features in the source region are randomly sampled to construct the training data set, and the remaining 10% are used as a validation set for tuning the hyper-parameters and for selecting the early-stopping point to avoid overfitting. Each region serves as a sample, with the set of features observed in the target city, X, as the input and the complete set of features
Figure BDA0003908119780000123
as the reconstructed output. The causal enhancement-based variational auto-encoder is trained by back-propagation, with Adam chosen as the optimizer.
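The data split and early-stopping rule described above can be sketched as follows. The helper names, the seed and the patience value are assumptions made for illustration; the text only fixes the 90%/10% proportion and the use of the validation set for early stopping.

```python
import numpy as np

def split_regions(n_regions, train_fraction=0.9, seed=0):
    """Randomly split region indices into training and validation index arrays."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_regions)
    n_train = int(round(train_fraction * n_regions))
    return perm[:n_train], perm[n_train:]

def early_stopping_epoch(val_losses, patience=3):
    """Return the best epoch seen before `patience` epochs pass without improvement."""
    best_loss, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss has stopped improving
    return best_epoch

train_idx, val_idx = split_regions(100)
stop_at = early_stopping_epoch([5.0, 4.0, 3.0, 3.5, 3.2, 3.1, 2.0], patience=3)
```

In this example training halts after three epochs without improvement, so the later drop to 2.0 is never observed; a larger patience trades training time for a lower chance of stopping too early.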
Step 103: migrating knowledge of a prediction model to the target region based on a transfer learning algorithm using knowledge distillation between the source region and the target region, and performing population flow prediction of the target region based on the knowledge of the prediction model and the potential causal embedding vector; wherein the prediction model is a population flow prediction model previously constructed based on data of the source region. Knowledge distillation is a method of refining the knowledge in a cumbersome model (i.e., a prediction model trained on data-rich cities) and compressing it into a target model (i.e., a population flow prediction model for developing cities), so that it can be deployed for actual population flow prediction in developing cities where data is sparse.
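One common way to realize knowledge distillation for a regression target such as flow is to train the target (student) model to match the source (teacher) model's predictions, while also fitting whatever true flows are observed in the target region. The sketch below assumes that formulation; the weighting `alpha`, the mask of observed flows and all names are hypothetical and are not taken from the patent.

```python
import numpy as np

def distillation_loss(student_pred, teacher_pred, observed_flow, observed_mask, alpha=0.5):
    """alpha * MSE(student, teacher) + (1 - alpha) * MSE(student, labels on observed pairs)."""
    soft = np.mean((student_pred - teacher_pred) ** 2)
    if observed_mask.any():
        diff = student_pred[observed_mask] - observed_flow[observed_mask]
        hard = np.mean(diff ** 2)
    else:
        hard = 0.0  # no labelled pairs in the sparse target region
    return float(alpha * soft + (1.0 - alpha) * hard)

student = np.array([1.0, 2.0, 3.0])            # hypothetical student predictions
teacher = np.array([1.0, 2.0, 5.0])            # hypothetical teacher predictions
flows = np.array([0.0, 2.0, 0.0])              # true flow, known only where the mask is True
mask = np.array([False, True, False])
loss = distillation_loss(student, teacher, flows, mask, alpha=0.5)
```

The teacher term supplies a training signal for every region pair, which is what compensates for the small number of labelled pairs in the data sparse target region.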
In the embodiment of the invention, the features of the starting point region and the features of the end point region in the target region can be obtained based on the knowledge of the prediction model and the potential causal embedding vector, and the distance information between the starting point and the end point is added to perform population flow prediction of the target region, that is, to predict the flow of people between the starting point and the end point in the target region.
Specifically, in the knowledge distillation-based start-to-end flow prediction process, the population flow prediction model for the developing city predicts the flow between start and end points based on the observed features and the learned embedding vectors. To highlight the effectiveness of the causal knowledge modeling method, a start-to-end people flow prediction method in the standard link prediction form may be selected, which combines the features of the origin region and the features of the destination region (i.e., the features of the start region and of the end region) and adds distance information to predict the flow between the pair of regions. Fig. 4 shows the prediction process of the population flow prediction model. In Fig. 4, Region Features denotes the region features, covering Region A, Region B, Region C, Region D and Region E; Urban Topology denotes the network topology of the city or of the region; GAT Origin corresponds to the origin region (i.e., the start point) and GAT Destination to the destination region (i.e., the end point); Original Embedding denotes the origin region embedding vector and Destination Embedding denotes the destination region embedding vector; Distance Matrix denotes the distance matrix, i.e., a matrix containing the pairwise distances between a set of points; Destination Regions and Original Regions denote the destination regions and the origin regions; Fetch by Index denotes obtaining the corresponding data by index, on which prediction, iteration and interaction are then performed; and Concatenate denotes the concatenation operation.
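The "Fetch by Index" and "Concatenate" steps named above can be sketched in a few lines of NumPy. The embedding size, the random example data and the linear prediction head are assumptions for illustration only; the actual prediction head of the model is not specified in this passage.

```python
import numpy as np

def build_pair_inputs(origin_emb, dest_emb, distance, origins, dests):
    """Fetch embeddings by region index and concatenate them with the pair distance."""
    o = origin_emb[origins]                       # (n_pairs, d) origin embeddings
    d = dest_emb[dests]                           # (n_pairs, d) destination embeddings
    dist = distance[origins, dests][:, None]      # (n_pairs, 1) origin-to-destination distance
    return np.concatenate([o, d, dist], axis=1)   # (n_pairs, 2 * d + 1)

rng = np.random.default_rng(0)
origin_emb = rng.normal(size=(5, 4))              # five regions, hypothetical 4-dim embeddings
dest_emb = rng.normal(size=(5, 4))
distance = np.abs(rng.normal(size=(5, 5)))        # hypothetical distance matrix
origins = np.array([0, 1, 4])
dests = np.array([2, 3, 0])

pair_inputs = build_pair_inputs(origin_emb, dest_emb, distance, origins, dests)
head_weights = rng.normal(size=pair_inputs.shape[1])   # hypothetical linear head
predicted_flow = pair_inputs @ head_weights
```

Each row of `pair_inputs` describes one origin-destination pair, which is the standard link prediction layout: the model scores a pair of nodes rather than a single node.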
The specific process is shown in Fig. 4. The region features shown include the observed region features and the Z (or μ_z, σ_z) learned by the CEVAE model. A graph neural network is then constructed over the urban space, with regions as nodes and edges connecting adjacent regions. Because adjacent regions are similar to a certain degree owing to spatial continuity, the graph neural network can make full use of the similarity between regions, so that labels are propagated across the urban space. A GAT (graph attention network) is used to extract the spatial features of the regions on the graph of the urban space, and the start-to-end flow is predicted from the origin and destination features. In addition, as shown in Fig. 4, distance is also a key factor affecting population movement, so distance is taken into account in the prediction, and the MSE (mean squared error) is used as the loss for gradient descent. To address the sparsity of data in developing cities, the sparsity of the data of the target city (i.e., the target region) is compensated for by knowledge distillation when the prediction model is migrated.
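To make the graph attention step concrete, the following is a minimal single-head graph attention layer in NumPy, together with a mean squared error function. It follows the usual GAT formulation (attention scores from a LeakyReLU of a learned vector applied to pairs of projected node features, normalized by a softmax over each node's neighbours); the layer sizes, the self-loop, the omission of an output nonlinearity and the random parameters are assumptions, and the patent does not specify these details.

```python
import numpy as np

def gat_layer(h, adj, W, a, slope=0.2):
    """Single-head graph attention over adjacent regions (self-loops added)."""
    z = h @ W                                     # (N, F') projected region features
    f_out = z.shape[1]
    src = z @ a[:f_out]                           # contribution of the attending node
    dst = z @ a[f_out:]                           # contribution of the neighbour
    e = src[:, None] + dst[None, :]               # a^T [z_i || z_j] for all pairs
    e = np.where(e > 0, e, slope * e)             # LeakyReLU
    mask = (adj + np.eye(len(adj))) > 0           # adjacent regions plus the region itself
    e = np.where(mask, e, -np.inf)                # non-neighbours get zero attention
    e = e - e.max(axis=1, keepdims=True)          # numerical stability for the softmax
    att = np.exp(e)
    att = att / att.sum(axis=1, keepdims=True)
    return att @ z, att

def mse(pred, target):
    """Mean squared error between predicted and true flows."""
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)       # four regions in a row
h = rng.normal(size=(4, 5))                       # hypothetical region features
W = rng.normal(size=(5, 3))
a = rng.normal(size=6)
out, att = gat_layer(h, adj, W, a)
```

Because the attention weights are restricted to adjacent regions, each region's representation is a weighted average of its spatial neighbours, which is how the similarity between nearby regions is exploited.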
To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a specific embodiment of the present invention is described below with reference to Fig. 5. The required data is collected in a data-rich source region, for example by questionnaire surveys, web crawlers and the like, and the collected data is then cleaned. Prediction for the target city is then assisted in two ways. On the one hand, a feature causal graph is learned from the collected data by reinforcement learning-based causal discovery; based on this feature causal graph, CEVAE produces embeddings in which the missing features are recovered from the observed features, and these embeddings are migrated to the target city. On the other hand, a prediction model that predicts OD flow (from a starting point to an end point) from features is trained in the source region, and the model is migrated to the target city by knowledge distillation. Finally, OD-flow prediction (i.e., population flow prediction from a starting point to an end point) is performed for the target city based on the obtained embeddings and the migrated prediction model.
According to the population flow prediction method for the data sparse region, the corresponding region causal knowledge is obtained from the data of the source region through a causal discovery model based on reinforcement learning, and a causal enhancement-based variational auto-encoder is obtained based on the region causal knowledge and an initial variational auto-encoder; the missing features of the target region are then recovered by using the causal enhancement-based variational auto-encoder to obtain a potential causal embedding vector; the knowledge of a prediction model is then migrated to the target region based on a transfer learning algorithm using knowledge distillation between the source region and the target region, and population flow prediction of the target region is performed based on the knowledge of the prediction model and the potential causal embedding vector. The prediction difficulty caused by sparse data is thereby effectively resolved, and the efficiency and accuracy of population flow prediction for the data sparse region are improved.
Corresponding to the population flow prediction method for the data sparse area, the invention also provides a population flow prediction device for the data sparse area. Since the embodiment of the device is similar to the above method embodiment, it is described only briefly; for relevant details, refer to the description of the method embodiment. The embodiment of the population flow prediction device for the data sparse area described below is merely illustrative. Fig. 6 is a schematic structural diagram of a population flow prediction device for a data sparse area according to an embodiment of the present invention.
The invention relates to a population mobility prediction device for a data sparse area, which specifically comprises the following parts:
a region causal knowledge acquisition unit 601, configured to acquire corresponding region causal knowledge from data of a source region by using a causal discovery model based on reinforcement learning;
a causal embedded vector obtaining unit 602, configured to obtain a causal enhancement-based variation automatic encoder based on the area causal knowledge and an initial variation automatic encoder; restoring the missing features of the target area by using the variation automatic encoder based on causal enhancement to obtain a potential causal embedding vector; wherein the potential causal embedding vector is a characterization vector corresponding to an observed feature and the missing feature;
a population flow prediction unit 603 configured to migrate the knowledge of the prediction model to the target area based on a migration learning algorithm of knowledge distillation between the source area and the target area, and perform population flow prediction of the target area based on the knowledge of the prediction model and the potential causal embedding vector; wherein the prediction model is a population flow prediction model previously constructed based on data of the source region.
Further, the area cause and effect knowledge acquisition unit is specifically configured to:
acquiring data of the source region; establishing a strategy based on the region causal knowledge of reinforcement learning to determine a causal discovery model based on region attribute feature sequencing, analyzing data of the source region by using the causal discovery model to obtain a region attribute feature sequence meeting preset conditions, and pruning by Bayesian test to obtain a feature causal graph containing the relationship between region attribute features; wherein the characteristic cause and effect graph is used to represent the regional cause and effect knowledge.
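The pruning step can be illustrated as follows. Given a feature ordering, every earlier feature is a candidate parent of every later feature, and a candidate edge is kept only if removing that parent makes a linear fit worse under the Bayesian information criterion (BIC). BIC is used here as a stand-in: the text does not specify which Bayesian test is applied, and the linear model, function names and example data are likewise assumptions.

```python
import numpy as np

def design(data, cols):
    """Design matrix: an intercept column followed by the selected feature columns."""
    ones = np.ones((data.shape[0], 1))
    return np.hstack([ones, data[:, np.array(cols, dtype=int)]])

def bic(y, X):
    """BIC of a least-squares fit of y on the columns of X."""
    n = len(y)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    return n * np.log(rss / n + 1e-12) + X.shape[1] * np.log(n)

def prune_ordered_graph(data, order):
    """Keep edge parent -> child only if dropping the parent increases the BIC."""
    edges = []
    for pos, child in enumerate(order):
        preds = list(order[:pos])                 # all earlier features are candidates
        full_bic = bic(data[:, child], design(data, preds))
        for parent in preds:
            rest = [q for q in preds if q != parent]
            if bic(data[:, child], design(data, rest)) > full_bic:
                edges.append((parent, child))
    return edges

# Deterministic toy data: feature 2 depends on feature 1 only.
x0 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
x1 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
noise = 0.1 * np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
x2 = 2.0 * x1 + noise
data = np.column_stack([x0, x1, x2])
edges = prune_ordered_graph(data, order=[0, 1, 2])
```

Searching over orderings and pruning afterwards keeps the result acyclic by construction, since an edge can only point from an earlier feature to a later one.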
Further, the causal embedding vector obtaining unit is specifically configured to:
using the characteristic causal graph as a characteristic restoration path, and learning missing information containing region attribute characteristics based on the initial variational automatic encoder and the characteristic restoration path to obtain the potential causal embedded vector; the potential causal embedding vector is an embedding vector of missing features based on observed feature recovery.
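Using the causal graph as a recovery path amounts to visiting features in an order where every parent is handled before its children, and filling in each missing feature from its parents. The sketch below shows only that ordering logic; the per-feature predictor functions stand in for the learned model and are hypothetical, as are all names and values.

```python
from collections import deque

def recovery_order(n_features, edges):
    """Topological order of the causal graph (Kahn's algorithm)."""
    indegree = [0] * n_features
    children = [[] for _ in range(n_features)]
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = deque(i for i in range(n_features) if indegree[i] == 0)
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    if len(order) != n_features:
        raise ValueError("the causal graph contains a cycle")
    return order

def recover_missing(n_features, edges, observed, predictors):
    """Fill missing features from their parents, following the causal order."""
    parents = {i: [] for i in range(n_features)}
    for parent, child in edges:
        parents[child].append(parent)
    values = dict(observed)
    for feature in recovery_order(n_features, edges):
        if feature not in values:
            parent_values = [values[p] for p in parents[feature]]
            values[feature] = predictors[feature](parent_values)
    return values

edges = [(0, 1), (1, 2), (0, 2)]                  # hypothetical feature causal graph
predictors = {1: lambda ps: 2.0 * ps[0],          # hypothetical stand-ins for learned models
              2: lambda ps: sum(ps)}
recovered = recover_missing(3, edges, observed={0: 2.0}, predictors=predictors)
```

Because feature 2 depends on feature 1, which is itself recovered, any error in feature 1 carries into feature 2; this is the sequential error accumulation that the auxiliary terms in the loss are meant to limit.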
Further, the causal embedding vector obtaining unit is specifically further configured to:
explicitly modeling the unobserved missing features of the target region as auxiliary latent variables, and adding the association relationship between the features to the initial variation automatic encoder by using a causal path corresponding to the region causal knowledge so as to construct and obtain an initial variation automatic encoder based on causal enhancement; and training the initial causal enhancement-based variation automatic encoder through back propagation to obtain the causal enhancement-based variation automatic encoder.
Further, the population flow prediction unit is specifically configured to:
acquiring the features of a starting point region and the features of an end point region in the target region based on the knowledge of the prediction model and the potential causal embedding vector, and adding distance information between the starting point and the end point to predict the flow of people between the starting point and the end point in the target region.
Further, the area causal knowledge is causal mapping relation information between the human features and the area attribute features in the source area.
Further, the target region is a data sparse region.
According to the population flow prediction device for the data sparse region, the corresponding causal knowledge of the region is obtained from the data of the source region through a causal discovery model based on reinforcement learning, and a variation automatic encoder based on causal reinforcement is obtained based on the causal knowledge of the region and an initial variation automatic encoder; then, restoring the missing features of the target area by using the variation automatic encoder based on causal enhancement to obtain a potential causal embedded vector; then migrating the knowledge of a prediction model to the target area based on a migration learning algorithm of knowledge distillation between the source area and the target area, and predicting the population flow of the target area based on the knowledge of the prediction model and the potential causal embedding vector; the prediction predicament caused by sparse data is effectively solved, and the population flow prediction efficiency and accuracy for the data sparse area are improved.
Corresponding to the population flow prediction method for a data sparse area provided above, the invention also provides an electronic device. Since the embodiment of the electronic device is similar to the above method embodiment, it is described only briefly; for relevant details, refer to the description of the method embodiment. The electronic device described below is merely illustrative. Fig. 7 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. The electronic device may include: a processor 701, a memory 702, a communication bus 703 (i.e. the device bus mentioned above), and a lookup engine 705, wherein the processor 701 and the memory 702 communicate with each other through the communication bus 703 and communicate with the outside through a communication interface 704. The processor 701 may invoke logic instructions in the memory 702 to perform a population flow prediction method for a data sparse area, the method comprising: acquiring corresponding region causal knowledge from data of a source region by using a causal discovery model based on reinforcement learning; obtaining a variation automatic encoder based on causal enhancement based on the area causal knowledge and an initial variation automatic encoder; restoring the missing features of the target region by using the variation automatic encoder based on causal enhancement to obtain a potential causal embedded vector; wherein the potential causal embedding vector is a characterization vector corresponding to the observed feature and the missing feature; migrating knowledge of a predictive model to the target region based on a transfer learning algorithm of knowledge distillation between the source region and the target region, and performing population flow prediction of the target region based on the knowledge of the predictive model and the potential causal embedding vector; wherein the prediction model is a population flow prediction model previously constructed based on data of the source region.
Furthermore, the logic instructions in the memory 702 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a memory chip, a USB flash drive, a portable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
In another aspect, embodiments of the present invention further provide a computer program product, where the computer program product includes a computer program stored on a processor-readable storage medium, where the computer program includes program instructions, and when the program instructions are executed by a computer, the computer can execute the population flow prediction method for a data sparse area provided by the above-mentioned method embodiments. The method comprises the following steps: acquiring corresponding region causal knowledge from data of a source region by using a causal discovery model based on reinforcement learning; obtaining a variation automatic encoder based on causal enhancement based on the regional causal knowledge and an initial variation automatic encoder; restoring the missing features of the target region by using the variation automatic encoder based on causal enhancement to obtain a potential causal embedded vector; wherein the potential causal embedding vector is a characterization vector corresponding to the observed feature and the missing feature; migrating knowledge of a predictive model to the target region based on a migration learning algorithm of knowledge distillation between the source region and the target region, and performing population flow prediction of the target region based on the knowledge of the predictive model and the potential causal embedding vector; wherein the prediction model is a population flow prediction model previously constructed based on data of the source region.
In still another aspect, the present invention further provides a processor-readable storage medium, where a computer program is stored on the processor-readable storage medium, and when executed by a processor, the computer program is implemented to perform the method for predicting population mobility in a data sparse area provided in the foregoing embodiments. The method comprises the following steps: acquiring corresponding region causal knowledge from data of a source region by using a causal discovery model based on reinforcement learning; obtaining a variation automatic encoder based on causal enhancement based on the area causal knowledge and an initial variation automatic encoder; restoring the missing features of the target area by using the variation automatic encoder based on causal enhancement to obtain a potential causal embedding vector; wherein the potential causal embedding vector is a characterization vector corresponding to the observed feature and the missing feature; migrating knowledge of a predictive model to the target region based on a migration learning algorithm of knowledge distillation between the source region and the target region, and performing population flow prediction of the target region based on the knowledge of the predictive model and the potential causal embedding vector; wherein the prediction model is a population flow prediction model previously constructed based on data of the source region.
The processor-readable storage medium can be any available medium or data storage device that can be accessed by a processor, including, but not limited to, magnetic memory (e.g., floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc.), optical memory (e.g., CDs, DVDs, BDs, HVDs, etc.), and semiconductor memory (e.g., ROM, EPROM, EEPROM, non-volatile memory (NAND flash), solid state disks (SSDs), etc.).
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, and not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A population mobility prediction method for a data sparse area is characterized by comprising the following steps:
acquiring corresponding region causal knowledge from data of a source region by using a causal discovery model based on reinforcement learning;
obtaining a variation automatic encoder based on causal enhancement based on the regional causal knowledge and an initial variation automatic encoder; restoring the missing features of the target area by using the variation automatic encoder based on causal enhancement to obtain a potential causal embedding vector; wherein the potential causal embedding vector is a characterization vector corresponding to the observed feature and the missing feature;
migrating knowledge of a predictive model to the target region based on a transfer learning algorithm of knowledge distillation between the source region and the target region, and performing population flow prediction of the target region based on the knowledge of the predictive model and the potential causal embedding vector; wherein the prediction model is a population flow prediction model previously constructed based on data of the source region.
2. The method for predicting population flow in a data sparse region as claimed in claim 1, wherein the obtaining of corresponding regional causal knowledge from data of a source region by using a causal discovery model based on reinforcement learning specifically comprises:
acquiring data of the source region; establishing a strategy based on the region causal knowledge of reinforcement learning to determine a causal discovery model based on region attribute feature sequencing, analyzing data of the source region by using the causal discovery model to obtain a region attribute feature sequence meeting preset conditions, and pruning by Bayesian test to obtain a feature causal graph containing the relationship between region attribute features; wherein the characteristic causal graph is to represent the regional causal knowledge.
3. The method for predicting population mobility in a data sparse area according to claim 2, wherein the causal enhancement-based variational automatic encoder is used for recovering missing features of a target area to obtain a potential causal embedding vector, and specifically comprises:
and taking the characteristic causal graph as a characteristic restoration path, and learning missing information containing region attribute characteristics based on the initial variational automatic encoder and the characteristic restoration path to obtain the potential causal embedded vector.
4. The method for predicting population flow in a data sparse area according to claim 1, wherein a causal enhancement-based variation automatic encoder is obtained based on the area causal knowledge and an initial variation automatic encoder, and specifically comprises:
explicitly modeling the unobserved missing features of the target region as auxiliary latent variables, and adding the association relationship between the features to the initial variation automatic encoder by using a causal path corresponding to the region causal knowledge so as to construct and obtain an initial variation automatic encoder based on causal enhancement; and training the initial causal enhancement-based variation automatic encoder through back propagation to obtain the causal enhancement-based variation automatic encoder.
5. The method for predicting population mobility in a data sparse area according to claim 1, wherein the predicting population mobility in the target area based on the knowledge of the prediction model and the potential causal embedding vector comprises:
acquiring the features of a starting point region and the features of an end point region in the target region based on the knowledge of the prediction model and the potential causal embedding vector, and adding distance information between the starting point and the end point to predict the flow of people between the starting point and the end point in the target region.
6. The method for predicting population flow in a data sparse area as claimed in claim 1, wherein the area causal knowledge is causal mapping relationship information between human features and area attribute features in a source area.
7. The method of predicting population flow in a data sparse region as recited in claim 1, wherein the target region is a data sparse region.
8. A population flow prediction apparatus for a data sparse area, comprising:
the area cause and effect knowledge acquisition unit is used for acquiring corresponding area cause and effect knowledge from data of the source area by using a cause and effect discovery model based on reinforcement learning;
a causal embedded vector acquisition unit, configured to obtain a causal enhancement-based variation automatic encoder based on the area causal knowledge and an initial variation automatic encoder; restoring the missing features of the target area by using the variation automatic encoder based on causal enhancement to obtain a potential causal embedding vector; wherein the potential causal embedding vector is a characterization vector corresponding to the observed feature and the missing feature;
the population flow prediction unit is used for migrating the knowledge of a prediction model to the target area based on a migration learning algorithm of knowledge distillation between the source area and the target area, and performing population flow prediction of the target area based on the knowledge of the prediction model and the potential causal embedding vector; wherein the prediction model is a population flow prediction model previously constructed based on data of the source region.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program performs the steps of the method for population flow prediction in data sparse areas as claimed in any one of claims 1 to 7.
10. A processor-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for population flow prediction in areas with sparse data according to any one of claims 1 to 7.
CN202211313718.7A 2022-10-25 2022-10-25 Population flow prediction method and device for data sparse region Active CN115759350B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211313718.7A CN115759350B (en) 2022-10-25 2022-10-25 Population flow prediction method and device for data sparse region


Publications (2)

Publication Number Publication Date
CN115759350A true CN115759350A (en) 2023-03-07
CN115759350B CN115759350B (en) 2024-07-12

Family

ID=85353167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211313718.7A Active CN115759350B (en) 2022-10-25 2022-10-25 Population flow prediction method and device for data sparse region

Country Status (1)

Country Link
CN (1) CN115759350B (en)


Lee et al. TESTAM: a time-enhanced spatio-temporal attention model with mixture of experts
CN115761519B (en) Index prediction method, apparatus, device, storage medium, and program product
US20220156526A1 (en) Systems and methods for automated detection of building footprints
WO2022022059A1 (en) Context aware anomaly detection
CN117010480A (en) Model training method, device, equipment, storage medium and program product
CN116205376B (en) Behavior prediction method, training method and device of behavior prediction model
CN117390197B (en) City model region representation generation method and device, electronic equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant