CN111598114B - Method for determining hidden state sequence and method for determining function type of block

Info

Publication number: CN111598114B
Application number: CN201910127322.5A
Authority: CN
Inventors: 李勇; 夏彤; 金德鹏; 孙福宁
Original assignee: Tsinghua University; Tencent Technology Shenzhen Co Ltd; Tencent Dadi Tongtu Beijing Technology Co Ltd
Current assignee: Tsinghua University; Tencent Technology Shenzhen Co Ltd; Tencent Dadi Tongtu Beijing Technology Co Ltd
Priority date: 2019-02-20
Filing date: 2019-02-20
Publication date: 2023-07-25
Anticipated expiration: 2039-02-20
Also published as: CN111598114A

Abstract

The application relates to a method for determining a hidden state sequence. The method comprises: obtaining an observation sequence corresponding to a target block; determining, based on the observation sequence, the initial state probability and the state transition probability corresponding to the target block in a hidden Markov model, and the Gaussian distribution mean and Gaussian distribution variance jointly corresponding to the candidate blocks related to the hidden Markov model, the local probabilities that the target block is in each hidden state of the hidden Markov model in each time slice covered by the observation sequence, together with the back pointer corresponding to each local probability; determining the hidden state of the target block in the last time slice based on the maximum of the local probabilities that the target block is in each hidden state in the last time slice covered by the observation sequence; and performing optimal path backtracking based on the hidden state of the target block in the last time slice and the back pointers to obtain a hidden state sequence, from which the state transitions of the block can be determined.

Description

Method for determining hidden state sequence and method for determining function type of block

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for determining a hidden state sequence, a method and an apparatus for determining a function type of a block, a computer readable storage medium, and a computer device.

Background

With the development of computer technology, people increasingly model based on observation data corresponding to blocks in cities (such as behavior data of population activities in blocks), so as to evaluate population flow characteristics of blocks.

Traditionally, co-occurrence relationships among time, place, and population activity are modeled by means of representation learning (e.g., cross-modal representation learning); for example, the activity "eating" typically occurs at restaurant-type places around noon or in the evening. As shown in FIG. 1, once the model has learned the co-occurrence relationship among time, place, and population activity, any one of the three can be used to infer the likely values of the other two; for example, the model can be queried with different times to recover how population activity changes across places. However, the conventional method does not support determining the state transitions of a block, and is therefore limited.

Disclosure of Invention

Based on this, it is necessary to provide a method and a device for determining a hidden state sequence, a method and a device for determining a function type of a block, a computer readable storage medium, and a computer device, aiming at the technical problem that the determination of the state transition condition of the block is not supported in the conventional technology.

A method of determining a sequence of hidden states, comprising:

obtaining an observation sequence corresponding to a target block;

determining, based on the observation sequence, the initial state probability corresponding to the target block in a hidden Markov model, the state transition probability corresponding to the target block, the Gaussian distribution mean jointly corresponding to the candidate blocks related to the hidden Markov model, and the Gaussian distribution variance jointly corresponding to the candidate blocks, the local probabilities that the target block is in each hidden state of the hidden Markov model in each time slice covered by the observation sequence, and determining the back pointers respectively corresponding to the local probabilities;

determining the hidden state of the target block in the last time slice based on the maximum local probability of the local probabilities of the target block in each hidden state in the last time slice covered by the observation sequence;

and performing optimal path backtracking based on the hidden state of the target block in the last time slice and the back pointers to obtain a hidden state sequence.

A method for determining the function type of a block comprises the following steps:

obtaining observation sequences corresponding to candidate blocks related to a hidden Markov model respectively;

determining, based on the initial state probabilities corresponding to the candidate blocks in the hidden Markov model, the state transition probabilities corresponding to the candidate blocks, the Gaussian distribution mean corresponding to the candidate blocks, and the Gaussian distribution variance corresponding to the candidate blocks, the local probabilities that each candidate block is in each hidden state of the hidden Markov model in each time slice covered by its observation sequence, and determining, based on the local probabilities, the back pointers corresponding to the local probabilities;

determining the hidden state of each candidate block in the last time slice based on the maximum local probability of the local probabilities of each hidden state of each candidate block in the last time slice covered by the observation sequence;

performing optimal path backtracking based on the hidden state of each candidate block in the last time slice and the back pointers to obtain the hidden state sequence corresponding to each candidate block;

clustering is carried out based on hidden state sequences corresponding to the candidate blocks respectively, and the function type of the candidate blocks is determined from the candidate function types based on clustering results.

A device for determining a sequence of hidden states, comprising:

the first observation sequence acquisition module is used for acquiring an observation sequence corresponding to the target block;

a first intermediate parameter determining module, configured to determine, based on the observation sequence, the initial state probability corresponding to the target block in a hidden Markov model, the state transition probability corresponding to the target block, the Gaussian distribution mean jointly corresponding to the candidate blocks related to the hidden Markov model, and the Gaussian distribution variance jointly corresponding to the candidate blocks, the local probabilities that the target block is in each hidden state of the hidden Markov model in each time slice covered by the observation sequence, and to determine the back pointers respectively corresponding to the local probabilities;

a first end hidden state determining module, configured to determine the hidden state of the target block in the last time slice based on the maximum of the local probabilities that the target block is in each hidden state in the last time slice covered by the observation sequence;

and the first hidden state sequence determining module is used for carrying out optimal path backtracking based on the hidden state of the target block in the last time slice and each back pointer to obtain a hidden state sequence.

A functional type determining apparatus of a neighborhood, comprising:

the second observation sequence acquisition module is used for acquiring observation sequences corresponding to candidate blocks related to the hidden Markov model respectively;

a second intermediate parameter determining module, configured to determine, based on the initial state probability corresponding to each candidate block in the hidden Markov model, the state transition probability corresponding to each candidate block, the Gaussian distribution mean corresponding to the candidate blocks, and the Gaussian distribution variance corresponding to the candidate blocks, the local probabilities that each candidate block is in each hidden state of the hidden Markov model in each time slice covered by its observation sequence, and to determine, based on the local probabilities, the back pointer corresponding to each local probability;

a second end hidden state determining module, configured to determine the hidden state of each candidate block in the last time slice based on the maximum of the local probabilities that the candidate block is in each hidden state in the last time slice covered by its observation sequence;

the second hidden state sequence determining module is used for carrying out optimal path backtracking based on the hidden state of each candidate block in the last time slice and each back pointer to obtain hidden state sequences corresponding to each candidate block;

the function type determining module is used for clustering based on hidden state sequences corresponding to the candidate blocks respectively, and determining the function type of the candidate block from the candidate function types based on a clustering result.

A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method as described above.

A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the method as described above.

Based on the scheme, the local probability that the target block is in each hidden state of the hidden Markov model in each time slice covered by the observation sequence is determined based on the observation sequence corresponding to the target block, the initial state probability corresponding to the target block in the hidden Markov model, the state transition probability corresponding to the target block, the Gaussian distribution mean value commonly corresponding to each candidate block related to the hidden Markov model and the Gaussian distribution variance commonly corresponding to each candidate block, and then the corresponding hidden state sequence is obtained. Therefore, the hidden state sequence corresponding to the neighborhood is obtained through the hidden Markov model, the state transition condition of the neighborhood can be determined, and the limitation in the traditional mode is broken.

Drawings

FIG. 1 is a modeling result based on characterization learning in the prior art;

FIG. 2 is a diagram of an application environment for determination of a hidden state sequence in one embodiment;

FIG. 3 is a flow diagram of determination of a hidden state sequence in one embodiment;

FIG. 4 is a schematic diagram of blocks within a Beijing city central urban area determined from a road network in one embodiment;

FIG. 5 is a diagram illustrating a mapping relationship between hidden states and active behavior features in one embodiment;

FIG. 6 is a schematic diagram of an observation sequence corresponding to a target block in one embodiment;

FIG. 7 is a flow chart of a training method of a hidden Markov model according to one embodiment;

FIG. 8 is a flow chart of a method for determining a function type of a block in one embodiment;

FIG. 9 is a schematic diagram of a hidden state corresponding to a block in an actual test in one embodiment;

FIG. 10 is a schematic diagram of a functional distribution of blocks in a city under actual testing in one embodiment;

FIG. 11 is a diagram of predicted results of population flow behavior in an actual test, according to one embodiment;

FIG. 12 is a block diagram of a determination device of a hidden state sequence in one embodiment;

FIG. 13 is a block diagram of a functional type determination device of a neighborhood in one embodiment;

FIG. 14 is a block diagram of a computer device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

In this document, when numerical ranges are described, the term "or more" is understood to include the stated number; for example, "two or more" means equal to or greater than two.

The method for determining the hidden state sequence provided by the embodiments of the present application can be applied to an application environment as shown in fig. 2. The application environment may involve a terminal 210 and a server 220, and the terminal 210 and the server 220 may be connected through a network.

Specifically, the terminal 210 acquires an observation sequence corresponding to the target block and transmits the observation sequence to the server 220. The server 220 receives an observation sequence corresponding to a target block; further, based on the observation sequence, an initial state probability corresponding to the target neighborhood in the hidden Markov model, a state transition probability corresponding to the target neighborhood, a Gaussian distribution mean value commonly corresponding to each candidate neighborhood related to the hidden Markov model, and a Gaussian distribution variance commonly corresponding to each candidate neighborhood, determining local probabilities that the target neighborhood is in each hidden state of the hidden Markov model within each time slice covered by the observation sequence, and determining back pointers respectively corresponding to each local probabilities; determining the hidden state of the target block in the last time slice based on the maximum local probability of the local probabilities of the target block in each hidden state in the last time slice covered by the observation sequence; and then, carrying out optimal path backtracking based on the hidden state of the target block in the last time slice and each back pointer to obtain a hidden state sequence.

In other embodiments, the terminal 210 may also independently complete a series of steps from obtaining the observation sequence corresponding to the target block to obtaining the hidden state sequence, without participation of the server 220.

The terminal 210 may include at least one of a mobile phone, a tablet computer, a notebook computer, a desktop computer, a personal digital assistant, a wearable device, and the like, but is not limited thereto. Server 220 may be implemented as a stand-alone server or as a cluster of servers.

In one embodiment, as shown in FIG. 3, a method of determining a sequence of hidden states is provided. The method is described as applied to a computer device such as the terminal 210 or the server 220 of fig. 2 described above. The method may include the following steps S302 to S308.

S302, obtaining an observation sequence corresponding to the target block.

A block is a polygonal geographic area enclosed by the geographic boundaries of streets. In particular, the geographic boundaries of streets may be extracted from the road network of a city, and the blocks within the city are then determined from the extracted boundaries. As shown in FIG. 4, 665 blocks within the central urban area of Beijing are determined from the secondary road network of Beijing. It can be understood that the road network is a natural division of the basic geographic units of human activity in a city: the blocks determined from it tend to have a single dominant function, and people living in the same block have similar life patterns.

Accordingly, the target block is the block whose hidden state sequence is to be determined, through the hidden Markov model, from the observation sequence corresponding to that block. In this embodiment, a hidden Markov model may be trained in advance on the observation sequences corresponding to two or more blocks; these blocks are the candidate blocks related to the hidden Markov model, and the target block may be selected from among them. For example, if a hidden Markov model is trained on the observation sequences corresponding to the 665 blocks in the central urban area of Beijing shown in FIG. 4, those 665 blocks are the candidate blocks related to the hidden Markov model, and the target block can be selected from the 665 candidate blocks.

The observation sequence corresponding to the target block may include population activity data for the target block over two or more time slices. The population activity data may relate to activity behavior features corresponding to the population activity behavior of the block, such as population flow numbers and access frequencies to predetermined types of points of interest (POIs). Specifically, the population flow numbers may include the number of people moving in, the number of people staying, and the number of people moving out. The predetermined types of points of interest may include at least one of the following 9 types: restaurant, company, institution, shopping, service, attraction, entertainment, education, and residence. For example, they may be the 4 types restaurant, education, attraction, and residence, or all 9 types.

Assuming that the target block is the r-th candidate block related to the hidden Markov model, the observation sequence corresponding to the target block can be expressed as $O_r = \{O_{r,1}, O_{r,2}, O_{r,3}, \ldots, O_{r,N}\}$, where $N$ represents the total number of time slices covered by the observation sequence corresponding to the r-th candidate block, and $O_{r,n}$ represents the population activity data of the r-th candidate block in the n-th time slice, $n = 1, 2, 3, \ldots, N$. For example, April 2018 has 30 days; dividing each day into 24 time slices at 1-hour intervals yields 720 time slices. If the population activity data of the r-th block within these 720 time slices form the observation sequence, then $N$ equals 720 and the observation sequence can be expressed as $O_r = \{O_{r,1}, O_{r,2}, O_{r,3}, \ldots, O_{r,720}\}$.

Further, $O_{r,n} = \{O_{r,n,1}, O_{r,n,2}, O_{r,n,3}, \ldots, O_{r,n,M}\}$, where $M$ represents the total number of activity behavior features covered by the population activity data in the observation sequence corresponding to the r-th block, and $O_{r,n,m}$ represents the population activity data corresponding to the m-th activity behavior feature of the r-th candidate block within the n-th time slice, $m = 1, 2, 3, \ldots, M$. For example, if the activity behavior features involved in the population activity data of the observation sequence corresponding to the r-th block are the number of people moving in, the number of people staying, the number of people moving out, and the access frequencies to points of interest of the types restaurant, education, attraction, and residence, then 7 activity behavior features are involved in total and $M$ equals 7. For another example, if the features are the number of people moving in, the number of people staying, the number of people moving out, and the access frequencies to the 9 types of points of interest (restaurant, company, institution, shopping, service, attraction, entertainment, education, and residence), then 12 activity behavior features are involved in total and $M$ equals 12.
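The layout of an observation sequence described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the feature names in `FEATURES`, the helper `time_slice_index`, and the helper `record` are hypothetical and do not appear in the original text; only the April 2018 example with 1-hour slices (N = 720) and the 7-feature case (M = 7) are taken from the description.

```python
from datetime import datetime

# Hypothetical names for the M = 7 activity behavior features of the example.
FEATURES = [
    "move_in", "stay", "move_out",
    "poi_restaurant", "poi_education", "poi_attraction", "poi_residence",
]

START = datetime(2018, 4, 1)   # first instant covered by the observation sequence
END = datetime(2018, 5, 1)     # first instant after the covered period
SLICE_SECONDS = 3600           # 1-hour time slices, as in the example


def time_slice_index(ts, start=START, slice_seconds=SLICE_SECONDS):
    """Return the 1-based time slice number n that contains timestamp ts."""
    return int((ts - start).total_seconds() // slice_seconds) + 1


N = int((END - START).total_seconds() // SLICE_SECONDS)  # 30 days * 24 slices
M = len(FEATURES)

# O_r stored as N rows (time slices) of M values (features); O_r[n-1][m-1] = O_{r,n,m}.
O_r = [[0.0] * M for _ in range(N)]


def record(ts, feature, value=1.0):
    """Accumulate one observed quantity into the slice and feature it belongs to."""
    n = time_slice_index(ts)
    m = FEATURES.index(feature)
    O_r[n - 1][m] += value


# One restaurant visit at 12:15 on April 1 falls into time slice n = 13.
record(datetime(2018, 4, 1, 12, 15), "poi_restaurant")
```

Storing the sequence as N rows of M values keeps the index order of $O_{r,n,m}$, so the n-th row is exactly the observation $O_{r,n}$ used by the decoding steps.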

S304, based on the observation sequence, the initial state probability corresponding to the target block in the hidden Markov model, the state transition probability corresponding to the target block, the Gaussian distribution mean value commonly corresponding to each candidate block related to the hidden Markov model and the Gaussian distribution variance commonly corresponding to each candidate block, determining the local probability that the target block is in each hidden state of the hidden Markov model in each time slice covered by the observation sequence, and determining the back pointers respectively corresponding to the local probabilities.

In this embodiment, the model parameters of the hidden Markov model may include the initial state probability $\pi_r$ corresponding to each candidate block related to the hidden Markov model, the state transition probability $A_r$ corresponding to each candidate block, the Gaussian distribution mean $\mu$ jointly corresponding to the candidate blocks, and the Gaussian distribution variance $\sigma$ jointly corresponding to the candidate blocks. That is, the model parameters of the hidden Markov model can be expressed as $\theta = \{\pi_r, A_r, \mu, \sigma\}$, $r = 1, 2, 3, \ldots, R$, where $R$ is the total number of candidate blocks related to the hidden Markov model.

For example, if a hidden Markov model is trained in advance on the observation sequences corresponding to 3 blocks, these 3 blocks are the candidate blocks related to the hidden Markov model, and the model parameters of the hidden Markov model may include: the initial state probability $\pi_1$ and the state transition probability $A_1$ corresponding to the 1st candidate block; the initial state probability $\pi_2$ and the state transition probability $A_2$ corresponding to the 2nd candidate block; the initial state probability $\pi_3$ and the state transition probability $A_3$ corresponding to the 3rd candidate block; the Gaussian distribution mean $\mu$ jointly corresponding to the 3 candidate blocks; and the Gaussian distribution variance $\sigma$ jointly corresponding to the 3 candidate blocks.

Assuming that the r-th candidate block related to the hidden Markov model is determined as the target block, the initial state probability corresponding to the target block is the initial state probability $\pi_r$ corresponding to the r-th candidate block. It may include the probabilities that the r-th candidate block is in each hidden state of the hidden Markov model within the 1st time slice covered by the observation sequence. Specifically, $\pi_r$ can be represented by a $1 \times K$ matrix, i.e., $\pi_r = [\pi_{r,1}\ \pi_{r,2}\ \pi_{r,3}\ \cdots\ \pi_{r,K}]$, where $K$ represents the total number of hidden states of the hidden Markov model and $\pi_{r,k}$ represents the probability that the r-th candidate block is in the k-th hidden state of the hidden Markov model within the 1st time slice, $k = 1, 2, 3, \ldots, K$.

The state transition probability corresponding to the target block is the state transition probability $A_r$ corresponding to the r-th candidate block. It may include the probabilities that the r-th candidate block transitions between the hidden states of the hidden Markov model. Specifically, $A_r$ can be represented by a $K \times K$ matrix as follows:

$$A_r = \begin{bmatrix} A_{r,1,1} & A_{r,1,2} & \cdots & A_{r,1,K} \\ A_{r,2,1} & A_{r,2,2} & \cdots & A_{r,2,K} \\ \vdots & \vdots & \ddots & \vdots \\ A_{r,K,1} & A_{r,K,2} & \cdots & A_{r,K,K} \end{bmatrix}$$

where $A_{r,j,k}$ represents the probability that the r-th candidate block transitions from the j-th hidden state to the k-th hidden state, $j = 1, 2, 3, \ldots, K$, $k = 1, 2, 3, \ldots, K$, and $K$ represents the total number of hidden states of the hidden Markov model.
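Because the entries of $\pi_r$ are probabilities over the K hidden states in the 1st time slice, and row j of $A_r$ holds the probabilities of leaving the j-th hidden state, $\pi_r$ and every row of $A_r$ should each sum to 1. A minimal sanity check along these lines might look as follows; the function names `is_distribution` and `check_block_parameters` and the numeric values are illustrative assumptions, not part of the original method.

```python
import math


def is_distribution(p, tol=1e-9):
    """True if p is non-negative and sums to 1 within the tolerance."""
    return all(x >= 0.0 for x in p) and math.isclose(sum(p), 1.0, abs_tol=tol)


def check_block_parameters(pi_r, A_r):
    """Validate the per-block parameters pi_r (1 x K) and A_r (K x K); return K."""
    K = len(pi_r)
    if not is_distribution(pi_r):
        raise ValueError("pi_r must be a probability distribution over K states")
    if len(A_r) != K or any(len(row) != K for row in A_r):
        raise ValueError("A_r must be a K x K matrix")
    for j, row in enumerate(A_r, start=1):
        if not is_distribution(row):
            raise ValueError(f"row {j} of A_r must sum to 1")
    return K


# Made-up parameters for one candidate block with K = 3 hidden states.
pi_r = [0.5, 0.3, 0.2]
A_r = [
    [0.8, 0.1, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
]
K = check_block_parameters(pi_r, A_r)
```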

The Gaussian distribution mean $\mu$ jointly corresponding to the candidate blocks related to the hidden Markov model may include: for each hidden state of the hidden Markov model, the mean of the Gaussian distribution obeyed by the probability of generating each activity behavior feature involved in the population activity data of the observation sequence, under the condition that a candidate block is in that hidden state. Specifically, $\mu$ may be represented by a $K \times M$ matrix as follows:

$$\mu = \begin{bmatrix} \mu_{1,1} & \mu_{1,2} & \cdots & \mu_{1,M} \\ \mu_{2,1} & \mu_{2,2} & \cdots & \mu_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{K,1} & \mu_{K,2} & \cdots & \mu_{K,M} \end{bmatrix}$$

where $\mu_{k,m}$ represents the mean of the Gaussian distribution obeyed by the probability of generating the m-th activity behavior feature under the condition that a candidate block is in the k-th hidden state of the hidden Markov model, $k = 1, 2, 3, \ldots, K$, $m = 1, 2, 3, \ldots, M$, $K$ is the total number of hidden states of the hidden Markov model, and $M$ is the total number of activity behavior features involved in the population activity data of the observation sequence.

The Gaussian distribution variance $\sigma$ jointly corresponding to the candidate blocks related to the hidden Markov model may include: for each hidden state of the hidden Markov model, the variance of the Gaussian distribution obeyed by the probability of generating each activity behavior feature involved in the population activity data of the observation sequence, under the condition that a candidate block is in that hidden state. Like $\mu$, $\sigma$ can be represented by a $K \times M$ matrix as follows:

$$\sigma = \begin{bmatrix} \sigma_{1,1} & \sigma_{1,2} & \cdots & \sigma_{1,M} \\ \sigma_{2,1} & \sigma_{2,2} & \cdots & \sigma_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{K,1} & \sigma_{K,2} & \cdots & \sigma_{K,M} \end{bmatrix}$$

where $\sigma_{k,m}$ represents the variance of the Gaussian distribution obeyed by the probability of generating the m-th activity behavior feature under the condition that a candidate block is in the k-th hidden state of the hidden Markov model, $k = 1, 2, 3, \ldots, K$, $m = 1, 2, 3, \ldots, M$, $K$ is the total number of hidden states of the hidden Markov model, and $M$ is the total number of activity behavior features involved in the population activity data of the observation sequence.
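To make the role of $\mu$ and $\sigma$ concrete, the sketch below evaluates an emission value for one observation $O_{r,n}$ in one hidden state. It rests on an assumption that the text above does not state explicitly: the M features are treated as independent given the hidden state, so the emission value is the product of M one-dimensional Gaussian densities, with row k of $\mu$ supplying the means and row k of $\sigma$ supplying the variances. The function names are illustrative.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def emission_probability(obs, mu_k, sigma_k):
    """Emission value of observation obs (M features) in one hidden state.

    mu_k and sigma_k are row k of the K x M matrices mu and sigma.
    ASSUMPTION: features are independent given the state, so the per-feature
    densities are multiplied together.
    """
    p = 1.0
    for x, mean, var in zip(obs, mu_k, sigma_k):
        p *= gaussian_pdf(x, mean, var)
    return p


# Two features, observed exactly at the state's means, with unit variances.
value = emission_probability([0.0, 0.0], [0.0, 0.0], [1.0, 1.0])
```

Strictly speaking the result is a density rather than a probability, so it can exceed 1 for small variances; this does not affect which state sequence the decoding selects, since only comparisons between paths matter.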

The hidden state is a parameter that can be used to characterize the population characteristics of a block. In particular, a hidden state may characterize the population density, population flow, and population activity type of a block. Population density and population flow may be represented by the population flow numbers of the block, such as the number of people moving in, the number of people moving out, and the number of people staying; the population activity type may be represented by the access frequencies to predetermined types of points of interest. The hidden states of the hidden Markov model may be preset based on actual requirements. For example, the hidden Markov model may be given the 100 hidden states (State_1 to State_100) shown in FIG. 5, each hidden state having 12 feature parameters that respectively represent: the number of people moving in (Arriving), the number of people moving out (Leaving), the number of people staying (Staying), and the access frequencies to points of interest of the types restaurant (Restaurant), company (Company), institution (Agency), shopping (Shopping), service (Service), attraction (Attraction), entertainment (Entertainment), education (Education), and residence (Residence).

For a hidden Markov model, when the observation sequence and the model parameters are known, the hidden state sequence corresponding to the observation sequence can be determined by decoding with the Viterbi algorithm.

Decoding is a recursive computation. For each time slice covered by the observation sequence corresponding to the target block, the local probability that the target block is in a target hidden state in that time slice is determined from three quantities: the emission probability of the population activity data of that time slice, given that the target block is in the target hidden state of the hidden Markov model in that time slice; the state transition probability corresponding to the target block in the hidden Markov model; and the local probabilities that the target block is in each hidden state of the hidden Markov model in the immediately preceding time slice. By taking each hidden state of the hidden Markov model in turn as the target hidden state, the local probabilities that the target block is in each hidden state in each time slice covered by the observation sequence can be determined.

Specifically, the local probability that the target block is in the k-th hidden state of the hidden Markov model in the n-th time slice refers to the maximum of the probabilities of all state transition paths that put the target block in the k-th hidden state in the n-th time slice, and may be denoted $\delta_n(k)$.

Further, the local probability $\delta_n(k)$ that the target block is in the k-th hidden state of the hidden Markov model within the n-th time slice can be calculated by the following formula:

$$\delta_n(k) = \max_{1 \le j \le K} \left[ \delta_{n-1}(j) A_{r,j,k} \right] \cdot b_k(O_{r,n})$$

where $\delta_{n-1}(j)$ represents the local probability that the target block is in the j-th hidden state of the hidden Markov model in the (n-1)-th time slice; $A_{r,j,k}$ represents the probability of transitioning from the j-th hidden state to the k-th hidden state; $\max_{1 \le j \le K} [\delta_{n-1}(j) A_{r,j,k}]$ represents the maximum of $\delta_{n-1}(1) A_{r,1,k}$, $\delta_{n-1}(2) A_{r,2,k}$, $\delta_{n-1}(3) A_{r,3,k}$, ..., $\delta_{n-1}(K) A_{r,K,k}$; $b_k(O_{r,n})$ represents the emission probability of the population activity data $O_{r,n}$ of the n-th time slice in the observation sequence corresponding to the target block, under the condition that the target block is in the k-th hidden state in the n-th time slice; and $K$ represents the total number of hidden states of the hidden Markov model.
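A single step of this recursion can be illustrated with a small worked example. The sketch below assumes the emission values for the current time slice have already been computed and are passed in as a list; the function name `viterbi_step` and all numbers are made up for illustration.

```python
def viterbi_step(delta_prev, A_r, emission):
    """Compute delta_n(k) for every state k from delta_{n-1}.

    delta_prev[j] : local probability of state j+1 in the previous time slice
    A_r[j][k]     : probability of moving from state j+1 to state k+1
    emission[k]   : emission value of the current observation in state k+1
    """
    K = len(delta_prev)
    delta_n = []
    for k in range(K):
        # Best way to arrive in state k from any state j of the previous slice.
        best = max(delta_prev[j] * A_r[j][k] for j in range(K))
        delta_n.append(best * emission[k])
    return delta_n


# Worked example with K = 2 hidden states.
delta_prev = [0.3, 0.2]
A_r = [[0.7, 0.3],
       [0.4, 0.6]]
emission = [0.5, 0.1]
delta_n = viterbi_step(delta_prev, A_r, emission)
# State 1: max(0.3*0.7, 0.2*0.4) * 0.5 = 0.21 * 0.5 = 0.105
# State 2: max(0.3*0.3, 0.2*0.6) * 0.1 = 0.12 * 0.1 = 0.012
```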

It will be appreciated that the 1st time slice has no preceding time slice, so the local probability $\delta_1(k)$ that the target block is in the k-th hidden state of the hidden Markov model within the 1st time slice can be initialized by the following formula:

$$\delta_1(k) = \pi_{r,k} \cdot b_k(O_{r,1})$$

where $\pi_{r,k}$ represents the probability that the target block is in the k-th hidden state of the hidden Markov model within the 1st time slice, and $b_k(O_{r,1})$ represents the emission probability of the population activity data $O_{r,1}$ of the 1st time slice in the observation sequence corresponding to the target block, under the condition that the target block is in the k-th hidden state in the 1st time slice.

After $\delta_1(k)$ is obtained by initialization, $\delta_2(k)$, $\delta_3(k)$, ..., $\delta_N(k)$ can be obtained recursively through the recursion formula for $\delta_n(k)$ given above, where $N$ represents the total number of time slices covered by the observation sequence corresponding to the target block.

After the local probabilities that the target block is in each hidden state of the hidden Markov model in each time slice covered by the observation sequence have been determined, the corresponding back pointers can be determined based on them. Local probabilities and back pointers are in one-to-one correspondence. The back pointer corresponding to the local probability $\delta_n(k)$ of the k-th hidden state of the hidden Markov model within the n-th time slice refers to the hidden state at the (n-1)-th time slice on the most probable of all state transition paths that put the target block in the k-th hidden state in the n-th time slice; it may be denoted $\psi_n(k)$.

Specifically, $\psi_n(k)$ may be calculated by the following formula:

$$\psi_n(k) = \arg\max_{1 \le j \le K} \left[ \delta_{n-1}(j) A_{r,j,k} \right]$$

where the argmax operator returns the index $j$ for which the bracketed expression (i.e., $\delta_{n-1}(j) A_{r,j,k}$) is largest. The parameters $\delta_{n-1}(j)$, $A_{r,j,k}$, and $K$ are as defined above and are not described again here.

It should be noted that the back pointer $\psi_1(k)$ corresponding to the local probability $\delta_1(k)$ of the kth hidden state in the 1st time slice has no practical significance, so $\psi_1(k)$ can be set to 0, i.e. $\psi_1(1)$, $\psi_1(2)$, $\psi_1(3)$, …, $\psi_1(K)$ can all be set to 0.

The local probabilities of the hidden states of the hidden markov model and the back pointers corresponding to the local probabilities within the time slices covered by the observation sequence may be represented by a matrix D, where K represents the total number of hidden states of the hidden markov model and N represents the total number of time slices covered by the observation sequence corresponding to the target block.
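
As a non-limiting illustration, the initialization, the recursion and the back pointers described above can be sketched in Python as follows. The function name `viterbi_tables`, the argument names and the toy numbers are assumptions made for this sketch and do not come from the application; the emission probabilities are passed in as a precomputed table `emis[n][k]` instead of being evaluated from the Gaussian parameters. Indices are 0-based, so the zeros in the first row of `psi` are placeholders only.

```python
def viterbi_tables(pi, A, emis):
    """Return (delta, psi) for one block.

    pi[k]      : initial probability of hidden state k
    A[j][k]    : transition probability from hidden state j to hidden state k
    emis[n][k] : emission probability of the n-th observation in hidden state k
    """
    N, K = len(emis), len(pi)
    delta = [[0.0] * K for _ in range(N)]
    psi = [[0] * K for _ in range(N)]  # row 0 stays 0: no predecessor exists

    # Initialization: delta_1(k) = pi_k * emission in the first time slice.
    for k in range(K):
        delta[0][k] = pi[k] * emis[0][k]

    # Recursion: delta_n(k) = max_j [delta_{n-1}(j) * A[j][k]] * emission.
    for n in range(1, N):
        for k in range(K):
            scores = [delta[n - 1][j] * A[j][k] for j in range(K)]
            best_j = max(range(K), key=scores.__getitem__)
            psi[n][k] = best_j  # back pointer: best predecessor state
            delta[n][k] = scores[best_j] * emis[n][k]
    return delta, psi


# Toy example with K = 2 hidden states and N = 3 time slices (made-up values).
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
emis = [[0.1, 0.6], [0.4, 0.3], [0.5, 0.1]]
delta, psi = viterbi_tables(pi, A, emis)
```

For long observation sequences the products become very small, so a practical variant would typically work with sums of logarithms instead of products.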

S306, determining the hidden state of the target block in the last time slice based on the maximum local probability of the local probabilities of the target block in each hidden state in the last time slice covered by the observation sequence.

The local probabilities of the hidden states in the last time slice covered by the observation sequence corresponding to the target block are the local probabilities of the hidden states in the Nth time slice, namely $\delta_N(1)$, $\delta_N(2)$, $\delta_N(3)$, …, $\delta_N(K)$.

The maximum local probability is the local probability with the largest value among the local probabilities of all hidden states in the last time slice covered by the observation sequence corresponding to the target block. Suppose the largest value among $\delta_N(1)$, $\delta_N(2)$, $\delta_N(3)$, …, $\delta_N(K)$ is $\delta_N(3)$; then $\delta_N(3)$ is the maximum local probability.

In this embodiment, the hidden state corresponding to the maximum local probability may be determined as the hidden state of the target block in the last time slice. For example, if $\delta_N(3)$ is determined to be the maximum local probability among $\delta_N(1)$, $\delta_N(2)$, $\delta_N(3)$, …, $\delta_N(K)$, the 3rd hidden state is the hidden state of the target block in the last time slice. For another example, if $\delta_N(K)$ is determined to be the maximum local probability, the Kth hidden state is the hidden state of the target block in the last time slice.

S308, performing optimal path backtracking based on the hidden state of the target block in the last time slice and each back pointer to obtain a hidden state sequence.

Specifically, in the process of performing optimal path backtracking, the hidden state in the nth time slice of the hidden state sequence, written here as $s^{*}_{r,n}$, can be determined based on the formula $s^{*}_{r,n} = \psi_{n+1}(s^{*}_{r,n+1})$, wherein $s^{*}_{r,n+1}$ represents the hidden state in the (n+1)th time slice of the hidden state sequence corresponding to the target block.

It can be appreciated that, after the hidden state $s^{*}_{r,N}$ in the last time slice (i.e. the Nth time slice) is determined, the hidden state in the (N-1)th time slice is determined through $s^{*}_{r,N-1} = \psi_{N}(s^{*}_{r,N})$, then the hidden state in the (N-2)th time slice through $s^{*}_{r,N-2} = \psi_{N-1}(s^{*}_{r,N-1})$, and so on, until finally the hidden state in the 1st time slice is determined through $s^{*}_{r,1} = \psi_{2}(s^{*}_{r,2})$. Thus, the hidden state sequence $\{s^{*}_{r,1}, s^{*}_{r,2}, s^{*}_{r,3}, \ldots, s^{*}_{r,N}\}$ corresponding to the target block is obtained.
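
As a non-limiting illustration, steps S306 and S308 can be sketched in Python as follows. The function name `backtrack` and the table values are assumptions made for this sketch; `delta` holds the local probabilities and `psi` the back pointers, both indexed as `[time slice][hidden state]` with 0-based indices.

```python
def backtrack(delta, psi):
    """Recover the most probable hidden state sequence (0-based state indices)."""
    N = len(delta)
    last = delta[N - 1]
    path = [0] * N
    # S306: the hidden state in the last time slice has the largest local probability.
    path[N - 1] = max(range(len(last)), key=last.__getitem__)
    # S308: follow the back pointers, state_n = psi_{n+1}(state_{n+1}).
    for n in range(N - 2, -1, -1):
        path[n] = psi[n + 1][path[n + 1]]
    return path


# Made-up tables for K = 2 hidden states and N = 3 time slices.
delta = [[0.06, 0.24], [0.0384, 0.0432], [0.01344, 0.002592]]
psi = [[0, 0], [1, 1], [0, 1]]
states = backtrack(delta, psi)
```

With these made-up tables the sketch yields the sequence second state, first state, first state (indices 1, 0, 0).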

In addition, the sequence of hidden states corresponding to the target neighborhood may be used to characterize the demographic activity characteristics of the target neighborhood in different time slices.

According to the above method for determining the hidden state sequence, the local probability that the target block is in each hidden state of the hidden Markov model in each time slice covered by the observation sequence is determined based on the observation sequence corresponding to the target block, the initial state probability and the state transition probability corresponding to the target block in the hidden Markov model, and the Gaussian distribution mean value and the Gaussian distribution variance commonly corresponding to the candidate blocks related to the hidden Markov model. The hidden state of the target block in the last time slice is then determined from the maximum local probability, and optimal path backtracking yields the hidden state sequence. The hidden state sequence corresponding to the block is thus obtained through the hidden Markov model, so that the state transitions of the block can be determined, which overcomes the limitation of the conventional approach.

It should be noted that conventional schemes that model the co-occurrence (symbiotic) relationship among time, place and human activities by means of representation learning (such as Cross-Modal Representation Learning) not only fail to capture the state transitions of blocks, but also cannot distinguish the states of different blocks and cannot support parallel processing of data.

In this regard, the model parameters of the hidden Markov model in the present application include the initial state probability corresponding to each candidate block related to the hidden Markov model, the state transition probability corresponding to each candidate block, the Gaussian distribution mean value commonly corresponding to the candidate blocks, and the Gaussian distribution variance commonly corresponding to the candidate blocks. Since each candidate block has its own initial state probability and state transition probability, the differences in state transition between candidate blocks that arise from the different function types to which they belong can be represented, that is, the states of different blocks are distinguished. Moreover, the process of obtaining the hidden state sequence of a candidate block from its observation sequence can be carried out in parallel across the candidate blocks related to the hidden Markov model, which effectively reduces the time cost.

In another possible scheme, a single hidden Markov model shared by all blocks may be learned from the observation sequences corresponding to the blocks, the model parameters of which include an initial state probability, a state transition probability and an observation probability that are all common to the blocks. However, this scheme likewise cannot reflect the differences in state transition between blocks that arise from the different function types to which they belong.

Alternatively, a separate hidden Markov model may be learned for each block from the observation sequence corresponding to that block. However, learning one hidden Markov model per block faces the problem of sparse training data, which leads to insufficient model learning; in addition, the hidden Markov models are independent of one another, so the association between blocks cannot be established either.

However, in the present application, the plurality of blocks correspond to the same hidden markov model, and the model parameters of the hidden markov model include each initial state probability corresponding to each block to which the hidden markov model relates, a state transition probability corresponding to each block, a gaussian distribution mean value corresponding to each block in common, and a gaussian distribution variance corresponding to each block in common. On the one hand, each block has the corresponding initial state probability and state transition probability, so that the difference in state transition between blocks due to the difference of the function types of the blocks can be reflected; on the other hand, the observation sequences corresponding to the blocks are used for learning a hidden Markov model together, rather than the hidden Markov models corresponding to the blocks, so that the problems that the model learning is insufficient and the association between the blocks cannot be established due to sparse training data are effectively solved.

In one embodiment, the step of obtaining the observation sequence corresponding to the target block, that is, step S302, may include the following steps: acquiring an original observation sequence corresponding to a target block; the original observation sequence comprises original population activity data of a target block in more than two time slices, and activity behavior characteristics related to each original population activity data comprise population flow numbers and access frequencies aiming at points of interest of a preset type; and carrying out maximum normalization on population flowing number in each piece of original population activity data and TF-IDF parameters corresponding to access frequency aiming at the interest points of the preset type in each piece of original population activity data to obtain an observation sequence corresponding to the target block.

Assuming that the target block is the rth candidate block, the original observation sequence corresponding to the rth candidate block can be expressed as $X_r = \{X_{r,1}, X_{r,2}, X_{r,3}, \ldots, X_{r,N}\}$, where N represents the total number of time slices covered by the observation sequence corresponding to the rth candidate block. $X_{r,n}$ represents the original population activity data of the rth candidate block in the nth time slice, r = 1, 2, 3, …, R, n = 1, 2, 3, …, N, and R represents the total number of candidate blocks related to the hidden Markov model.

In addition, $X_{r,n} = \{X_{r,n,1}, X_{r,n,2}, X_{r,n,3}, \ldots, X_{r,n,M}\}$, where M is the total number of activity behavior features involved in the original population activity data. $X_{r,n,m}$ represents the mth activity behavior feature involved in the original population activity data of the rth candidate block in the nth time slice, m = 1, 2, 3, …, M.

Assume that the mth activity behavior feature $X_{r,n,m}$ in the original population activity data of the rth candidate block in the nth time slice is a population flow number (e.g. the number of people moving in, the number of people staying or the number of people moving out). Maximum-value normalization can then be performed on $X_{r,n,m}$ by the following formula to obtain the normalized result $O_{r,n,m}$:

$$O_{r,n,m} = \frac{X_{r,n,m}}{\max_{1 \le n' \le N} X_{r,n',m}}$$

Wherein $\max_{1 \le n' \le N} X_{r,n',m}$ represents the maximum value among $X_{r,1,m}$, $X_{r,2,m}$, $X_{r,3,m}$, …, $X_{r,N,m}$.
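
As a non-limiting illustration, the maximum-value normalization of one feature over the N time slices of a block can be sketched in Python as follows. The function name `max_normalize`, the sample counts and the handling of an all-zero series are assumptions made for this sketch.

```python
def max_normalize(values):
    """Divide every entry of one feature series by the largest entry."""
    peak = max(values)
    if peak == 0:
        # Assumption: an all-zero series is returned as zeros to avoid division by zero.
        return [0.0 for _ in values]
    return [v / peak for v in values]


# Hypothetical move-in counts of one block over N = 4 time slices.
move_in = [120, 300, 60, 240]
normalized = max_normalize(move_in)
```

After normalization the largest value of the series equals 1 and every other value lies between 0 and 1, which puts features of different magnitudes on a comparable scale.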

Assume that the mth activity behavior feature $X_{r,n,m}$ in the original population activity data of the rth candidate block in the nth time slice is an access frequency (e.g. the frequency of access to points of interest of the restaurant, company, institution, shopping, service, attraction, entertainment, education or residence type). The TF-IDF parameter $Y_{r,n,m}$ corresponding to $X_{r,n,m}$ can first be calculated by the following formula:

Where F represents the total number of predetermined point-of-interest types for which access frequencies are recorded. For example, if each piece of original population activity data in the original observation sequence corresponding to the rth candidate block involves access frequencies for 9 types of points of interest (restaurant, company, organization, shopping, service, attraction, entertainment, education and residence), then F is equal to 9. For another example, if each piece of original population activity data in the original observation sequence corresponding to the rth candidate block involves access frequencies for 4 types of points of interest (restaurant, education, attraction and residence), then F is equal to 4.

Further, maximum-value normalization is performed on the TF-IDF parameter $Y_{r,n,m}$ by the following formula to obtain the normalized result $O_{r,n,m}$:

$$O_{r,n,m} = \frac{Y_{r,n,m}}{\max_{1 \le n' \le N} Y_{r,n',m}}$$

Wherein $\max_{1 \le n' \le N} Y_{r,n',m}$ represents the maximum value among $Y_{r,1,m}$, $Y_{r,2,m}$, $Y_{r,3,m}$, …, $Y_{r,N,m}$.

It should be noted that, for the original observation sequence $X_r = \{X_{r,1}, X_{r,2}, X_{r,3}, \ldots, X_{r,N}\}$ corresponding to the rth candidate block, $X_{r,n} = \{X_{r,n,1}, X_{r,n,2}, X_{r,n,3}, \ldots, X_{r,n,M}\}$, r = 1, 2, 3, …, R, n = 1, 2, 3, …, N. Assume that $\{X_{r,n,1}, X_{r,n,2}, X_{r,n,3}\}$ are population flow numbers and that $\{X_{r,n,4}, X_{r,n,5}, X_{r,n,6}, \ldots, X_{r,n,M}\}$ are access frequencies for predetermined types of points of interest. Then, maximum-value normalization is performed on $X_{r,n,1}$, $X_{r,n,2}$ and $X_{r,n,3}$ respectively to obtain the normalized results $O_{r,n,1}$, $O_{r,n,2}$ and $O_{r,n,3}$. The TF-IDF parameters $Y_{r,n,4}$, $Y_{r,n,5}$, $Y_{r,n,6}$, …, $Y_{r,n,M}$ corresponding to $X_{r,n,4}$, $X_{r,n,5}$, $X_{r,n,6}$, …, $X_{r,n,M}$ are calculated, and maximum-value normalization is then performed on each of them to obtain the normalized results $O_{r,n,4}$, $O_{r,n,5}$, $O_{r,n,6}$, …, $O_{r,n,M}$. Thus, the population activity data $O_{r,n} = \{O_{r,n,1}, O_{r,n,2}, O_{r,n,3}, \ldots, O_{r,n,M}\}$ of the rth candidate block in the nth time slice is obtained, and in turn the observation sequence $O_r = \{O_{r,1}, O_{r,2}, O_{r,3}, \ldots, O_{r,N}\}$ corresponding to the rth candidate block.

In addition, in a practical example, maximum-value normalization is performed on the population flow numbers (the number of people moving in, the number of people staying and the number of people moving out) involved in the population activity data in the original observation sequence corresponding to the Qinghai garden block in Beijing, the corresponding TF-IDF parameters are calculated for the access frequencies of the predetermined types of points of interest, and maximum-value normalization is performed on the calculated TF-IDF parameters, so that the observation sequence corresponding to the Qinghai garden block shown in fig. 6 is obtained.

In one embodiment, determining the local probability that the target block is in any given hidden state of the hidden Markov model in any given time slice covered by the observation sequence may include the following steps: determining, based on the population activity data in that time slice of the observation sequence and on the Gaussian distribution mean value and the Gaussian distribution variance commonly corresponding to the candidate blocks related to the hidden Markov model, the emission probability that the target block generates the population activity data in that time slice under the condition that it is in that hidden state in that time slice; and determining the local probability that the target block is in that hidden state in that time slice based on the local probabilities that the target block is in each hidden state of the hidden Markov model in the immediately preceding time slice, the state transition probability corresponding to the target block in the hidden Markov model, and the emission probability.

It should be noted that, the local probability that the target block is in the hidden state in the first time slice covered by the observation sequence is determined based on the probability corresponding to the hidden state in the initial state probability corresponding to the target block and the emission probability of population activity data in the first time slice generated by the target block under the condition that the target block is in the hidden state in the first time slice.

In one embodiment, the step of determining the emission probability that the target block generates the population activity data in the time slice of the observation sequence under the condition that it is in the hidden state in the time slice, based on the population activity data in the time slice of the observation sequence and on the Gaussian distribution mean value and the Gaussian distribution variance commonly corresponding to the candidate blocks related to the hidden Markov model, may include the following step: determining the emission probability based on (i) the variances of the Gaussian distributions obeyed by the probabilities of generating, in the hidden state, each activity behavior feature involved in the population activity data of the observation sequence, (ii) the means of those Gaussian distributions, and (iii) the population activity data of the target block in the time slice of the observation sequence.

Specifically, the emission probability $p(O_{r,n} \mid s_{r,n} = k)$ that the target block generates the population activity data in the nth time slice of the observation sequence, under the condition that it is in the kth hidden state in the nth time slice, may be calculated by the following formula:

$$p(O_{r,n} \mid s_{r,n} = k) = \prod_{m=1}^{M} \frac{1}{\sqrt{2\pi\,\sigma_{k,m}}} \exp\left(-\frac{(o_{r,n,m} - \mu_{k,m})^{2}}{2\,\sigma_{k,m}}\right)$$

Wherein $o_{r,n,m}$ represents the mth activity behavior feature involved in the population activity data of the target block in the nth time slice; $\mu_{k,m}$ represents the mean of the Gaussian distribution obeyed by the probability of generating the mth activity behavior feature in the kth hidden state; $\sigma_{k,m}$ represents the variance of the Gaussian distribution obeyed by the probability of generating the mth activity behavior feature in the kth hidden state; M represents the total number of activity behavior features involved in each piece of population activity data in the observation sequence.
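
As a non-limiting illustration, the emission probability for one hidden state can be sketched in Python as a product of per-feature Gaussian densities. The function name `emission_probability`, the argument names and the sample numbers are assumptions made for this sketch; `var_k` holds variances (not standard deviations), in line with the definition of the variance parameter above, and treating the M features as independent is likewise an assumption of the sketch.

```python
import math


def emission_probability(obs, mean_k, var_k):
    """Product of per-feature Gaussian densities for one hidden state k.

    obs[m]    : m-th activity behavior feature in the time slice
    mean_k[m] : Gaussian mean of feature m in hidden state k
    var_k[m]  : Gaussian variance of feature m in hidden state k
    """
    prob = 1.0
    for o, mu, var in zip(obs, mean_k, var_k):
        density = math.exp(-((o - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        prob *= density
    return prob


# Made-up example with M = 2 features, evaluated exactly at the means.
p = emission_probability(obs=[0.5, 0.2], mean_k=[0.5, 0.2], var_k=[0.01, 0.04])
```

Because the factors are probability densities, the result can exceed 1 when the variances are small; this does not affect comparisons between hidden states.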

In one embodiment, after the step of determining the hidden state of the target block in the last time slice, the method may further include the following steps: and predicting the hidden state of the target block in the next time slice of the last time slice based on the hidden state of the target block in the last time slice covered by the observation sequence and the state transition probability corresponding to the target block in the hidden Markov model.

It can be appreciated that the model parameters of the hidden Markov model include the state transition probability corresponding to the target block; that is, given the hidden Markov model, the state transition probability corresponding to the target block is determined. As described above, the state transition probability corresponding to the target block includes the probabilities that the target block transitions between the hidden states of the hidden Markov model. Accordingly, after the hidden state of the target block in the last time slice covered by its observation sequence (written here as $s^{*}_{r,N}$) is determined, the maximum transition probability among the state transitions starting from $s^{*}_{r,N}$ can be determined based on the state transition probability corresponding to the target block, and the hidden state corresponding to that maximum transition probability is taken as the hidden state of the target block in the time slice following the last time slice covered by the observation sequence.

For example, suppose that the state transition probability $A_r$ corresponding to the target block is expressed by the following matrix:

$$A_r = \begin{bmatrix} A_{r,1,1} & A_{r,1,2} & \cdots & A_{r,1,K} \\ A_{r,2,1} & A_{r,2,2} & \cdots & A_{r,2,K} \\ \vdots & \vdots & \ddots & \vdots \\ A_{r,K,1} & A_{r,K,2} & \cdots & A_{r,K,K} \end{bmatrix}$$

and that the hidden state of the target block in the last time slice covered by its observation sequence is determined to be the 3rd hidden state. Then the probability with the largest value among $A_{r,3,1}$, $A_{r,3,2}$, $A_{r,3,3}$, …, $A_{r,3,K}$ is the maximum transition probability for state transitions starting from that hidden state. Further, assuming that $A_{r,3,2}$ is the largest among $A_{r,3,1}$, $A_{r,3,2}$, $A_{r,3,3}$, …, $A_{r,3,K}$, the 2nd hidden state is taken as the hidden state of the target block in the time slice following the last time slice covered by its observation sequence.
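
As a non-limiting illustration, this prediction amounts to selecting the largest entry in one row of the transition matrix, which can be sketched in Python as follows. The function name `predict_next_state` and the matrix values are assumptions made for this sketch; state indices are 0-based, so index 2 is the 3rd hidden state.

```python
def predict_next_state(A_r, last_state):
    """Return the 0-based index of the most probable successor of `last_state`."""
    row = A_r[last_state]  # transition probabilities out of the last hidden state
    return max(range(len(row)), key=row.__getitem__)


# Made-up 3 x 3 transition matrix of one block; every row sums to 1.
A_r = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.7, 0.2],
]
next_state = predict_next_state(A_r, last_state=2)  # start from the 3rd hidden state
```

With these made-up values the largest entry of the third row is the second one, so the sketch predicts the 2nd hidden state, mirroring the example above.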

In one embodiment, after the step of predicting the hidden state of the target block in the time slice next to the last time slice, the method may further include the following steps: and predicting population activity data of the target block in the time slice next to the last time slice based on the hidden state of the target block in the time slice next to the last time slice and the Gaussian distribution mean value in the hidden Markov model.

It will be appreciated that the model parameters of the hidden Markov model include the Gaussian distribution mean value commonly corresponding to the candidate blocks related to the hidden Markov model; that is, given the hidden Markov model, this Gaussian distribution mean value is determined. As described above, the Gaussian distribution mean value may include, for each hidden state of the hidden Markov model, the means of the Gaussian distributions obeyed by the probabilities of generating each activity behavior feature involved in the population activity data in the observation sequence under the condition that the target block is in that hidden state. Accordingly, after the hidden state of the target block in the time slice following the last time slice is determined, the means of the Gaussian distributions obeyed by the probabilities of generating, in that hidden state, each activity behavior feature involved in the population activity data in the observation sequence corresponding to the target block can be taken as the population activity data of the target block in the time slice following the last time slice.

For example, suppose that the Gaussian distribution mean value μ commonly corresponding to the candidate blocks related to the hidden Markov model is expressed by the following matrix:

$$\mu = \begin{bmatrix} \mu_{1,1} & \mu_{1,2} & \cdots & \mu_{1,M} \\ \mu_{2,1} & \mu_{2,2} & \cdots & \mu_{2,M} \\ \vdots & \vdots & \ddots & \vdots \\ \mu_{K,1} & \mu_{K,2} & \cdots & \mu_{K,M} \end{bmatrix}$$

and that the hidden state of the target block in the time slice following the last time slice covered by its observation sequence is determined to be the 3rd hidden state. Then $\{\mu_{3,1}, \mu_{3,2}, \mu_{3,3}, \ldots, \mu_{3,M}\}$ can be taken as the population activity data of the target block in the time slice following the last time slice, where M is the total number of activity behavior features involved in the population activity data in the observation sequence corresponding to the candidate block.
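
As a non-limiting illustration, taking the row of means of the predicted hidden state as the predicted population activity data can be sketched in Python as follows. The function name `predict_next_observation` and the matrix values are assumptions made for this sketch; state indices are 0-based.

```python
def predict_next_observation(mu, next_state):
    """Return a copy of the mean feature vector of the predicted hidden state."""
    return list(mu[next_state])


# Made-up K x M matrix of Gaussian means with K = 3 hidden states and M = 2 features.
mu = [
    [0.10, 0.80],
    [0.45, 0.30],
    [0.90, 0.05],
]
predicted = predict_next_observation(mu, next_state=2)  # 3rd hidden state
```

The mean is used because it is the most probable value of each feature under a Gaussian distribution.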

In one embodiment, as shown in fig. 7, the training method of the hidden markov model may include the following steps S702 to S710.

S702, obtaining observation sequences corresponding to the candidate blocks respectively.

S704, in the current iteration, determining the current intermediate state probability and the current intermediate state transition probability respectively corresponding to each candidate block, based on the observation sequence respectively corresponding to each candidate block, the last determined initial state probability and state transition probability respectively corresponding to each candidate block, and the last determined Gaussian distribution mean value and Gaussian distribution variance commonly corresponding to the candidate blocks.

S706, determining the current initial state probability of each candidate block based on each current intermediate state probability, and determining the current state transition probability of each candidate block based on each current intermediate state transition probability.

S708, based on the current intermediate state probabilities, determining a current Gaussian distribution mean value and a current Gaussian distribution variance which are corresponding to the candidate blocks.

S710, when the iteration termination condition is met, obtaining the hidden Markov model based on the last determined initial state probability and state transition probability respectively corresponding to each candidate block, and the last determined Gaussian distribution mean value and Gaussian distribution variance commonly corresponding to the candidate blocks.

For the hidden Markov model, when the observation sequences are known, the model parameters may be learned by the Baum-Welch algorithm to determine the hidden Markov model.

The process of training the hidden markov model (i.e., the process of learning model parameters of the hidden markov model) with the observation sequence known is an iterative calculation process. Specifically, in each iteration round, first, for each candidate block, based on an observation sequence corresponding to the candidate block, an initial state probability and a state transition probability corresponding to the candidate block determined last time, and a gaussian distribution mean value and a gaussian distribution variance corresponding to each candidate block related to a hidden markov model, a current intermediate state probability and a current intermediate state transition probability corresponding to the candidate block are determined, so that each current intermediate state probability corresponding to each candidate block and each current intermediate state transition probability corresponding to each candidate block are determined.

Wherein, the current intermediate state probability corresponding to the rth candidate block can be expressed as $\gamma_r$, which can be represented as an N × K matrix as follows:

$$\gamma_r = \begin{bmatrix} \gamma_{r,1,1} & \gamma_{r,1,2} & \cdots & \gamma_{r,1,K} \\ \gamma_{r,2,1} & \gamma_{r,2,2} & \cdots & \gamma_{r,2,K} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{r,N,1} & \gamma_{r,N,2} & \cdots & \gamma_{r,N,K} \end{bmatrix}$$

Wherein $\gamma_{r,n,k}$ represents the probability that the rth candidate block is in the kth hidden state in the nth time slice, n = 1, 2, 3, …, N, k = 1, 2, 3, …, K, N is the total number of time slices covered by the observation sequence corresponding to the rth candidate block, and K is the total number of hidden states of the hidden Markov model.

The current intermediate state transition probability corresponding to the rth candidate block can be expressed as $\xi_r$, which can be represented as a P × K matrix, where P = K(N-1), with rows indexed by the pair (n, j) and columns indexed by k, i.e.:

$$\xi_r = \begin{bmatrix} \xi_{r,2,1,1} & \xi_{r,2,1,2} & \cdots & \xi_{r,2,1,K} \\ \vdots & \vdots & \ddots & \vdots \\ \xi_{r,2,K,1} & \xi_{r,2,K,2} & \cdots & \xi_{r,2,K,K} \\ \vdots & \vdots & \ddots & \vdots \\ \xi_{r,N,K,1} & \xi_{r,N,K,2} & \cdots & \xi_{r,N,K,K} \end{bmatrix}$$

Wherein $\xi_{r,n,j,k}$ represents the probability that the rth candidate block is in the jth hidden state in the (n-1)th time slice and in the kth hidden state in the nth time slice, n = 2, 3, 4, …, N, k = 1, 2, 3, …, K, j = 1, 2, 3, …, K, N is the total number of time slices covered by the observation sequence corresponding to the rth candidate block, and K is the total number of hidden states of the hidden Markov model.

In addition, the probability $\gamma_{r,n,k}$ that the rth candidate block is in the kth hidden state in the nth time slice can be calculated by the following formula:

$$\gamma_{r,n,k} = \frac{\alpha_{r,n,k}\, \beta_{r,n,k}}{\sum_{j=1}^{K} \alpha_{r,N,j}}$$

Wherein $\alpha_{r,n,k}$ represents the forward probability corresponding to the rth candidate block being in the kth hidden state in the nth time slice, i.e. the probability that the rth candidate block is in the kth hidden state in the nth time slice and generates, in the nth time slice and each time slice before it, the corresponding population activity data in its observation sequence (namely $\{O_{r,1}, O_{r,2}, O_{r,3}, \ldots, O_{r,n}\}$); $\beta_{r,n,k}$ represents the backward probability corresponding to the rth candidate block being in the kth hidden state in the nth time slice, i.e. the probability that, under the condition that the rth candidate block is in the kth hidden state in the nth time slice, it generates in each time slice after the nth time slice the corresponding population activity data in its observation sequence (namely $\{O_{r,n+1}, O_{r,n+2}, O_{r,n+3}, \ldots, O_{r,N}\}$), N being the total number of time slices covered by the observation sequence corresponding to the rth candidate block; $\alpha_{r,N,j}$ represents the forward probability corresponding to the rth candidate block being in the jth hidden state in the Nth time slice, i.e. the probability that the rth candidate block is in the jth hidden state in the Nth time slice and generates, in the Nth time slice and each time slice before it, the corresponding population activity data in its observation sequence (namely $\{O_{r,1}, O_{r,2}, O_{r,3}, \ldots, O_{r,N}\}$).

The probability $\xi_{r,n,j,k}$ that the rth candidate block is in the jth hidden state in the (n-1)th time slice and in the kth hidden state in the nth time slice can be calculated by the following formula:

$$\xi_{r,n,j,k} = \frac{\alpha_{r,n-1,j}\, A^{(t)}_{r,j,k}\, p(O_{r,n} \mid s_{r,n} = k)\, \beta_{r,n,k}}{\sum_{j'=1}^{K} \alpha_{r,N,j'}}$$

Wherein $\alpha_{r,n-1,j}$ represents the forward probability corresponding to the rth candidate block being in the jth hidden state in the (n-1)th time slice, i.e. the probability that the rth candidate block is in the jth hidden state in the (n-1)th time slice and generates, in the (n-1)th time slice and each time slice before it, the corresponding population activity data in its observation sequence (namely $\{O_{r,1}, O_{r,2}, O_{r,3}, \ldots, O_{r,n-1}\}$); $A^{(t)}_{r,j,k}$ represents the last determined probability that the rth candidate block transitions from the jth hidden state to the kth hidden state; $p(O_{r,n} \mid s_{r,n} = k)$ represents the emission probability that the rth candidate block, under the condition that it is in the kth hidden state in the nth time slice, generates the population activity data in the nth time slice of its observation sequence (namely $\{O_{r,n,1}, O_{r,n,2}, O_{r,n,3}, \ldots, O_{r,n,M}\}$, M being the total number of activity behavior features involved in the population activity data in the observation sequence); $\beta_{r,n,k}$ is the backward probability corresponding to the rth candidate block being in the kth hidden state in the nth time slice, and $\alpha_{r,N,j'}$ is the forward probability corresponding to the rth candidate block being in the j'th hidden state in the Nth time slice, whose definitions may be the same as in the foregoing and will not be repeated here.
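
As a non-limiting illustration, the two intermediate quantities can be sketched in Python as follows, taking the forward and backward probabilities as given inputs. The function name `posteriors` and all numbers are assumptions made for this sketch; the toy values of `alpha` and `beta` were worked out by hand from an initial distribution of [0.5, 0.5] together with the `A` and `emis` tables shown, and indices are 0-based.

```python
def posteriors(alpha, beta, A, emis):
    """Return (gamma, xi) for one block from forward/backward probabilities.

    alpha[n][k], beta[n][k] : forward and backward probabilities
    A[j][k]                 : last determined transition probabilities
    emis[n][k]              : emission probability of observation n in hidden state k
    xi[n][j][k] is defined for n >= 1; xi[0] is left empty.
    """
    N, K = len(alpha), len(alpha[0])
    # Normalizer: sum of the forward probabilities in the last time slice.
    likelihood = sum(alpha[N - 1])

    gamma = [[alpha[n][k] * beta[n][k] / likelihood for k in range(K)]
             for n in range(N)]

    xi = [[]]
    for n in range(1, N):
        xi.append([[alpha[n - 1][j] * A[j][k] * emis[n][k] * beta[n][k] / likelihood
                    for k in range(K)] for j in range(K)])
    return gamma, xi


# Toy values for K = 2 hidden states and N = 2 time slices.
A = [[0.9, 0.1], [0.2, 0.8]]
emis = [[0.5, 0.1], [0.4, 0.3]]
alpha = [[0.25, 0.05], [0.094, 0.0195]]
beta = [[0.39, 0.32], [1.0, 1.0]]
gamma, xi = posteriors(alpha, beta, A, emis)
```

When the inputs are consistent, each row of `gamma` sums to 1, and summing `xi[n][j][k]` over j gives `gamma[n][k]`.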

In addition, the emission probability $p(O_{r,n} \mid s_{r,n} = k)$ that the rth candidate block generates the population activity data in the nth time slice of its observation sequence, under the condition that it is in the kth hidden state in the nth time slice, can be calculated by the following formula:

$$p(O_{r,n} \mid s_{r,n} = k) = \prod_{m=1}^{M} \frac{1}{\sqrt{2\pi\,\sigma_{k,m}}} \exp\left(-\frac{(o_{r,n,m} - \mu_{k,m})^{2}}{2\,\sigma_{k,m}}\right)$$

The definitions of the parameters $\sigma_{k,m}$, $\mu_{k,m}$, $o_{r,n,m}$ and M herein may be the same as in the foregoing and will not be repeated here.

In addition, the forward probability $\alpha_{r,n,k}$ corresponding to the rth candidate block being in the kth hidden state in the nth time slice (n = 2, 3, …, N) can be calculated by the following formula:

$$\alpha_{r,n,k} = \left[\sum_{j=1}^{K} \alpha_{r,n-1,j}\, A^{(t)}_{r,j,k}\right] p(O_{r,n} \mid s_{r,n} = k)$$

Wherein $\alpha_{r,n-1,j}$ represents the forward probability corresponding to the rth candidate block being in the jth hidden state in the (n-1)th time slice, i.e. the probability that the rth candidate block is in the jth hidden state in the (n-1)th time slice and generates, in the (n-1)th time slice and each time slice before it, the corresponding population activity data in its observation sequence (namely $\{O_{r,1}, O_{r,2}, O_{r,3}, \ldots, O_{r,n-1}\}$). Here, $A^{(t)}_{r,j,k}$ is the last determined probability that the rth candidate block transitions from the jth hidden state to the kth hidden state, $p(O_{r,n} \mid s_{r,n} = k)$ is the emission probability in the nth time slice, and K is the total number of hidden states; their definitions may be the same as in the foregoing and are not repeated here.

And, the forward probability corresponding to the rth candidate block being in the kth hidden state in the 1st time slice is $\alpha_{r,1,k} = \pi^{(t)}_{r,k}\, p(O_{r,1} \mid s_{r,1} = k)$. Wherein $\pi^{(t)}_{r,k}$ represents the last determined probability that the rth candidate block is in the kth hidden state in the 1st time slice; $p(O_{r,1} \mid s_{r,1} = k)$ represents the emission probability that the rth candidate block, under the condition that it is in the kth hidden state in the 1st time slice, generates the population activity data in the 1st time slice of its observation sequence (namely $\{O_{r,1,1}, O_{r,1,2}, O_{r,1,3}, \ldots, O_{r,1,M}\}$).
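
As a non-limiting illustration, the forward probabilities of one block can be sketched in Python as follows. The function name `forward`, the argument names and the toy numbers are assumptions made for this sketch; the emission probabilities are supplied as a precomputed table `emis[n][k]`, and indices are 0-based.

```python
def forward(pi, A, emis):
    """Forward probabilities alpha[n][k] for one block."""
    N, K = len(emis), len(pi)
    alpha = [[0.0] * K for _ in range(N)]
    # First time slice: initial probability times emission probability.
    for k in range(K):
        alpha[0][k] = pi[k] * emis[0][k]
    # Later time slices: sum over all predecessor states, then apply the emission.
    for n in range(1, N):
        for k in range(K):
            total = sum(alpha[n - 1][j] * A[j][k] for j in range(K))
            alpha[n][k] = total * emis[n][k]
    return alpha


# Toy example with K = 2 hidden states and N = 2 time slices (made-up values).
pi = [0.5, 0.5]
A = [[0.9, 0.1], [0.2, 0.8]]
emis = [[0.5, 0.1], [0.4, 0.3]]
alpha = forward(pi, A, emis)
likelihood = sum(alpha[-1])  # probability of the whole observation sequence
```

Summing the last row of `alpha` gives the probability of the whole observation sequence under the current parameters, which is the normalizer used for the intermediate probabilities.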

In addition, the backward probability corresponding to the nth candidate neighborhood under the condition of being in the kth hidden state in the nth time slice can be calculated by the following formula

Wherein, the liquid crystal display device comprises a liquid crystal display device,the backward probability corresponding to the condition that the (r) th candidate block is in the (k) th hidden state in the (n+1) th time slice is represented, namely, each time after the (n+1) th time slice is represented by the condition that the (r) th candidate block is in the (k) th hidden state in the (n+1) th time sliceThe patch generates corresponding population activity data (i.e., { O }, respectively) in the observation sequence corresponding to the r candidate block _r,n+2 ,O _r,n+3 ,O _r,n+4 ,...,O _r,N -N is the total number of time slices covered by the observation sequence corresponding to the r-th candidate neighborhood; />Representing that the (th) candidate block generates population activity data (i.e., { O) _r,n+1,1 ,O _r,n+1,2 ,O _r,n+1,3 ,...,O _r,n+1,M M is the total number of active behavior features involved in demographic activity data in the observation sequence corresponding to the r-th candidate neighborhood); p(s) _r,n |s _r,n+1 ) Indicating that the (r) th candidate block is in the hidden state s at the (n) th time slice _r,n And is in the hidden state s at the n+1th time slice _r,n+1 Is a probability of (2).

In addition, the backward probability of the r-th candidate block under the condition that it is in the k-th hidden state in the last time slice (namely, the N-th time slice) covered by its observation sequence is $\beta_{r,N}(k) = 1$.
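The forward and backward recursions described above can be sketched for a single block as follows. This is a minimal illustration, not the implementation of the application: the function names (`emission_prob`, `forward`, `backward`) are hypothetical, the emission probability is assumed to be a product of independent per-feature Gaussian densities, and no numerical scaling is applied, so long sequences would underflow in practice.

```python
import math

def emission_prob(obs, mean, var):
    """Density of one time slice's M features under independent Gaussians (assumed form)."""
    p = 1.0
    for o, mu, v in zip(obs, mean, var):
        p *= math.exp(-((o - mu) ** 2) / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)
    return p

def forward(obs_seq, pi, trans, means, variances):
    """alpha[n][k]: joint probability of slices 0..n and state k at slice n."""
    K = len(pi)
    b = [[emission_prob(o, means[k], variances[k]) for k in range(K)] for o in obs_seq]
    alpha = [[pi[k] * b[0][k] for k in range(K)]]  # first time slice
    for n in range(1, len(obs_seq)):
        prev = alpha[n - 1]
        alpha.append([b[n][k] * sum(prev[j] * trans[j][k] for j in range(K))
                      for k in range(K)])
    return alpha

def backward(obs_seq, trans, means, variances):
    """beta[n][k]: probability of the slices after n, given state k at slice n."""
    N, K = len(obs_seq), len(trans)
    beta = [[1.0] * K for _ in range(N)]  # last time slice is initialised to 1
    for n in range(N - 2, -1, -1):
        b_next = [emission_prob(obs_seq[n + 1], means[j], variances[j]) for j in range(K)]
        for k in range(K):
            beta[n][k] = sum(trans[k][j] * b_next[j] * beta[n + 1][j] for j in range(K))
    return beta
```

A useful sanity check is that the sum over k of `alpha[n][k] * beta[n][k]` is the same for every time slice n, since it equals the likelihood of the whole observation sequence.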

In each iteration, the current intermediate state probability and the current intermediate state transition probability corresponding to each candidate block are calculated based on the last determined model parameters. However, when the model parameters are determined for the first time in the first iteration, there are no previously determined model parameters, so the model parameters of the hidden Markov model may be initialized to obtain initial values of the model parameters (the initial values may be expressed as $\theta^{(0)} = \{\pi_r^{(0)}, A_r^{(0)}, \mu^{(0)}, \sigma^{(0)}\}$). Accordingly, when the model parameters are determined for the first time in the first iteration, the current intermediate state probability and the current intermediate state transition probability respectively corresponding to each candidate block are calculated based on the initial values of the model parameters.

In the current iteration, after the current intermediate state probability and the current intermediate state transition probability respectively corresponding to each candidate block related to the hidden Markov model are determined, for each candidate block, the current initial state probability of the candidate block is determined based on the current intermediate state probability corresponding to the candidate block, and the current state transition probability of the candidate block is determined based on the current intermediate state transition probability corresponding to the candidate block. In this way, the current initial state probability and the current state transition probability respectively corresponding to each candidate block are determined.

Specifically, the current initial state probability $\pi_r^{(t+1)}$ corresponding to the r-th candidate block may include the current probability that the r-th candidate block is in each hidden state of the hidden Markov model in the 1st time slice, namely $\pi_{r,k}^{(t+1)}$, $k = 1, 2, 3, \ldots, K$, where K is the total number of hidden states of the hidden Markov model.

The current probability $\pi_{r,k}^{(t+1)}$ that the r-th candidate block is in the k-th hidden state of the hidden Markov model in the 1st time slice can be calculated by the following formula: $\pi_{r,k}^{(t+1)} = \gamma_{r,1}(k)$, where $\gamma_{r,1}(k)$ represents the probability, in the current intermediate state probability corresponding to the r-th candidate block, that the r-th candidate block is in the k-th hidden state in the 1st time slice.

The current state transition probability $A_r^{(t+1)}$ corresponding to the r-th candidate block may include the current probability that the r-th candidate block transitions between the hidden states of the hidden Markov model, namely $A_{r,j,k}^{(t+1)}$, $j = 1, 2, 3, \ldots, K$, $k = 1, 2, 3, \ldots, K$, where K is the total number of hidden states of the hidden Markov model.

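The current probability $A_{r,j,k}^{(t+1)}$ that the r-th candidate block transitions from the j-th hidden state to the k-th hidden state can be calculated by the following formula:

$$A_{r,j,k}^{(t+1)} = \frac{\sum_{n=1}^{N-1} \xi_{r,n}(j,k)}{\sum_{n=1}^{N-1} \gamma_{r,n}(j)}$$

where $\xi_{r,n}(j,k)$ represents the probability, in the current intermediate state transition probability corresponding to the r-th candidate block, that the block is in the j-th hidden state in the n-th time slice and in the k-th hidden state in the (n+1)-th time slice, and $\gamma_{r,n}(j)$ represents the probability, in the current intermediate state probability corresponding to the r-th candidate block, that the block is in the j-th hidden state in the n-th time slice. These parameters and N are defined as in the foregoing, and the definitions are not repeated here.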
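The per-block update of the initial state probability and the state transition probability can be sketched as follows. This is an illustrative sketch under assumptions: the function name and the nested-list layout of `gamma` (intermediate state probabilities, indexed `[n][k]`) and `xi` (intermediate state transition probabilities, indexed `[n][j][k]`) are hypothetical, and each transition row is normalised by the sum of its own `xi` entries, which in standard forward-backward estimation coincides with summing the state probabilities over the same time slices.

```python
def update_initial_and_transition(gamma, xi):
    """Return (pi, A) for one block from its intermediate probabilities.

    gamma[n][k]: probability of state k in time slice n.
    xi[n][j][k]: probability of state j in slice n and state k in slice n + 1.
    """
    K = len(gamma[0])
    pi = list(gamma[0])  # initial state probability = state probability in the 1st slice
    A = []
    for j in range(K):
        row = [sum(x[j][k] for x in xi) for k in range(K)]
        total = sum(row)
        if total == 0.0:
            # State j was never occupied: fall back to a uniform row (assumption).
            A.append([1.0 / K] * K)
        else:
            A.append([v / total for v in row])
    return pi, A
```

Because the function only reads the statistics of one block, it can be applied to all blocks independently, for example with a process pool, which is the property the description relies on for parallel execution.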

In the current iteration, after the current initial state probability and the current state transition probability of each candidate block are determined, the current Gaussian distribution mean commonly corresponding to the candidate blocks and the current Gaussian distribution variance commonly corresponding to the candidate blocks can be further determined based on the current intermediate state probabilities respectively corresponding to the candidate blocks.

Specifically, the current gaussian distribution mean value that each candidate block corresponds to in common may be determined in common based on the current intermediate state probability that each candidate block corresponds to, respectively, and the observation sequence that each candidate block corresponds to, respectively. In addition, the current gaussian distribution variance that each candidate block corresponds to in common may be determined in common based on the current intermediate state probability that each candidate block corresponds to, the observation sequence that each candidate block corresponds to, and the current gaussian distribution mean.

After the current model parameters of the hidden Markov model (i.e., the current initial state probability and the current state transition probability respectively corresponding to each candidate block, and the Gaussian distribution mean and Gaussian distribution variance commonly corresponding to the candidate blocks) are determined in the current iteration, it can be determined whether the iteration termination condition is satisfied. The iteration termination condition is a condition for judging whether the current model parameters have converged. It may be preset based on actual requirements; for example, the iteration termination condition may include, but is not limited to, the number of iterations completed up to the current iteration being greater than or equal to a predetermined threshold.

If the iteration termination condition is met, the last determined model parameters (the initial state probability and the state transition probability corresponding to each candidate neighborhood determined in the last time respectively, the Gaussian distribution mean value corresponding to each candidate neighborhood jointly, and the Gaussian distribution variance corresponding to each candidate neighborhood jointly) are the final model parameters of the hidden Markov model. If the iteration termination condition is not met, the next iteration can be executed, and model training is continued.

It can be appreciated that the trained hidden markov model can be used to determine each hidden state sequence corresponding to each candidate neighborhood to which it relates.

According to the model training method, for each block, the current intermediate state probability and the current intermediate state transition probability corresponding to the block are determined based on the observation sequence corresponding to the block, the last determined initial state probability and state transition probability corresponding to the block, and the Gaussian distribution mean and Gaussian distribution variance commonly corresponding to the blocks related to the hidden Markov model. The current initial state probability of each block is then determined based on the current intermediate state probability corresponding to that block, and the current state transition probability of each block is determined based on the current intermediate state transition probability corresponding to that block. Therefore, in the iterative calculation process, the operations for determining the current initial state probability and the current state transition probability of the individual blocks can be performed in parallel, which effectively reduces the time complexity and makes the method applicable to large-scale data, that is, the method supports learning the dynamics of places over long periods and at fine granularity.

In addition, the conventional technology includes schemes that model the co-occurrence relationship among time, place and human activity by means of representation learning (such as Cross-Modal Representation Learning). Besides having difficulty supporting parallel processing of data, such schemes can neither determine how the states of the blocks transition nor distinguish the states of different blocks.

However, in the present application, the hidden markov model is trained to evaluate population flow characteristics of the blocks, and model parameters of the trained hidden markov model include state transition probabilities corresponding to the respective blocks, so that transition situations of states of the respective blocks can be determined, and states of different blocks can be distinguished.

In another embodiment, a single hidden Markov model whose model parameters include an initial state probability, a state transition probability and an observation probability may be learned jointly from the observation sequences respectively corresponding to the blocks. However, this approach cannot reflect the differences in state transition between blocks that arise from the different function types to which the blocks belong.

In contrast, in the model training method provided by the application, a single hidden Markov model is still learned jointly from the observation sequences corresponding to the respective blocks, but the model parameters of the hidden Markov model include the initial state probability and the state transition probability respectively corresponding to each block related to the hidden Markov model, together with the Gaussian distribution mean and the Gaussian distribution variance commonly corresponding to the blocks. On the one hand, each block has its own initial state probability and state transition probability, so that the differences in state transition between blocks arising from their different function types can be reflected. On the other hand, the observation sequences corresponding to the blocks are used to learn one hidden Markov model jointly, rather than a separate hidden Markov model for each block, which effectively avoids the problems that model learning is insufficient because the training data is sparse and that no association can be established between the blocks.

In one embodiment, in the current iteration, the step of determining the current intermediate state probability and the current intermediate state transition probability corresponding to each candidate block respectively, based on the observation sequence corresponding to each candidate block respectively, the initial state probability and the state transition probability corresponding to each candidate block determined last time, the gaussian distribution mean value corresponding to each candidate block in common, and the gaussian distribution variance corresponding to each candidate block in common, that is, step S302 may include the following steps: in the current iteration, determining current target sequence fragments corresponding to each candidate neighborhood respectively; the current target sequence segment corresponding to the candidate block is a sequence segment which is not used as a target sequence segment in the current iteration in each sequence segment contained in the observation sequence corresponding to the candidate block; and determining the current intermediate state probability and the current intermediate state transition probability which are respectively corresponding to each candidate block based on each current target sequence segment, the initial state probability which is respectively corresponding to each candidate block and is determined last time, the state transition probability which is respectively corresponding to each candidate block, the Gaussian distribution mean value which is commonly corresponding to each candidate block and the Gaussian distribution variance which is commonly corresponding to each candidate block.

Accordingly, after the step of determining the current gaussian distribution mean value and the current gaussian distribution variance, which are commonly corresponding to each block, based on the current intermediate state probabilities corresponding to each candidate block, respectively, that is, after the step S306, the method may further include the following steps: and returning to the step of determining the current target sequence segments corresponding to the candidate blocks respectively, and judging whether the iteration termination condition is met or not until all sequence segments contained in the observation sequences corresponding to the candidate blocks are used as target sequence segments in the current iteration.

In this embodiment, for each candidate block, the observation sequence corresponding to the candidate block may be split into two or more sequence segments. Accordingly, in each iteration, for each candidate block, the current intermediate state probability and the current intermediate state transition probability corresponding to the candidate block may be calculated once for each sequence segment contained in the observation sequence corresponding to the candidate block. It can be understood that each time the current intermediate state probabilities and the current intermediate state transition probabilities corresponding to the candidate blocks related to the hidden Markov model are calculated, the current model parameters of the hidden Markov model (i.e., the current initial state probability and the current state transition probability corresponding to each candidate block, and the current Gaussian distribution mean and the current Gaussian distribution variance commonly corresponding to the candidate blocks related to the hidden Markov model) are determined once.

In combination with the foregoing example, the observation sequences corresponding to the candidate blocks related to the hidden Markov model each include population activity data in 720 time slices, and the observation sequence corresponding to each candidate block may be split into 15 sequence segments of 48 time slices each. For example, the observation sequence $O_r = \{O_{r,1}, O_{r,2}, O_{r,3}, \ldots, O_{r,720}\}$ corresponding to the r-th candidate block is split into 15 sequence segments: the 1st sequence segment is $\{O_{r,1}, O_{r,2}, O_{r,3}, \ldots, O_{r,48}\}$, the 2nd sequence segment is $\{O_{r,49}, O_{r,50}, O_{r,51}, \ldots, O_{r,96}\}$, and so on, and the 15th sequence segment is $\{O_{r,673}, O_{r,674}, O_{r,675}, \ldots, O_{r,720}\}$.
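The splitting in this example can be sketched in a few lines. The function name `split_into_segments` is hypothetical, and the sketch assumes the sequence length is a multiple of the segment length, as in the 720 / 48 example (a shorter final segment would otherwise be produced).

```python
def split_into_segments(observation_sequence, segment_length=48):
    """Split one block's observation sequence into consecutive segments."""
    return [observation_sequence[i:i + segment_length]
            for i in range(0, len(observation_sequence), segment_length)]

# Time slice indices 1..720 stand in for the per-slice activity data.
segments = split_into_segments(list(range(1, 721)))
```

With these inputs, `segments` holds 15 segments; the 2nd one starts at time slice 49 and the 15th one covers time slices 673 to 720.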

Accordingly, the processing in the (t+1)-th round of iteration may be as follows. First, for each candidate block, the 1st sequence segment in the observation sequence corresponding to the candidate block (for example, the 1st sequence segment of the r-th candidate block may be $\{O_{r,1}, O_{r,2}, O_{r,3}, \ldots, O_{r,48}\}$) is determined as the current target sequence segment corresponding to the candidate block. The current intermediate state probability and the current intermediate state transition probability corresponding to the candidate block are determined based on this 1st sequence segment and the last determined model parameters (namely, the initial state probability and state transition probability corresponding to the candidate block obtained in the t-th round of iteration, and the Gaussian distribution mean and Gaussian distribution variance commonly corresponding to the candidate blocks related to the hidden Markov model). The current initial state probability of the candidate block is then determined based on the current intermediate state probability corresponding to the candidate block, and the current state transition probability of the candidate block is determined based on the current intermediate state transition probability corresponding to the candidate block. Finally, the current Gaussian distribution mean and the current Gaussian distribution variance commonly corresponding to the candidate blocks are determined based on the current intermediate state probabilities respectively corresponding to the candidate blocks related to the hidden Markov model.

Next, for each candidate block, the 2nd sequence segment in the observation sequence corresponding to the candidate block (for example, the 2nd sequence segment of the r-th candidate block may be $\{O_{r,49}, O_{r,50}, O_{r,51}, \ldots, O_{r,96}\}$) is determined as the current target sequence segment corresponding to the candidate block. The current intermediate state probability and the current intermediate state transition probability corresponding to the candidate block are determined based on this 2nd sequence segment and the last determined model parameters (namely, the initial state probability and state transition probability corresponding to the candidate block obtained in the (t+1)-th round of iteration based on the 1st sequence segment, and the Gaussian distribution mean and Gaussian distribution variance commonly corresponding to the candidate blocks related to the hidden Markov model). The current initial state probability of the candidate block is then determined based on the current intermediate state probability corresponding to the candidate block, and the current state transition probability of the candidate block is determined based on the current intermediate state transition probability corresponding to the candidate block. Finally, the current Gaussian distribution mean and the current Gaussian distribution variance commonly corresponding to the candidate blocks are determined based on the current intermediate state probabilities respectively corresponding to the candidate blocks related to the hidden Markov model.

By analogy, the processing continues until, for each candidate block, the 15th sequence segment in the observation sequence corresponding to the candidate block has been determined as the current target sequence segment and similar steps have been executed based on it, that is, the current initial state probability and the current state transition probability of each candidate block have been determined from the corresponding current intermediate state probability and current intermediate state transition probability, and the current Gaussian distribution mean and the current Gaussian distribution variance commonly corresponding to the candidate blocks have been determined based on the current intermediate state probabilities respectively corresponding to the candidate blocks related to the hidden Markov model.

So far, the t+1st round of iteration is completed, and whether the iteration termination condition is met can be judged. If so, obtaining a trained hidden Markov model based on the last determined model parameters (namely, initial state probability and state transition probability respectively corresponding to each candidate block and Gaussian distribution mean value and Gaussian distribution variance jointly corresponding to each candidate block obtained based on 15 th sequence segments corresponding to each candidate block in the t+1 th iteration); if not, the next iteration (i.e., the t+2 iteration) is executed, and the processing procedure in the t+2 iteration is similar to the processing procedure in the t+1 iteration, which is not repeated here.
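The control flow of the segment-wise iteration described above can be sketched as follows. This is only a skeleton under stated assumptions: `run_training`, `e_step` and `m_step` are hypothetical names, the two callbacks stand in for the calculation of the intermediate probabilities and for the parameter update respectively, and the termination condition is simplified to a fixed number of rounds.

```python
def run_training(segments_per_block, params, e_step, m_step, max_rounds):
    """Segment-wise iterative training (control flow only).

    segments_per_block[r][i]: the i-th sequence segment of block r.
    e_step(segment, params, r): intermediate probabilities of block r.
    m_step(stats, batch, params): updated model parameters.
    """
    n_segments = len(segments_per_block[0])
    for _ in range(max_rounds):  # terminate once the round threshold is reached
        for i in range(n_segments):
            # The i-th segment of every block is the current target segment.
            batch = [segments[i] for segments in segments_per_block]
            # Per-block statistics are independent and could run in parallel.
            stats = [e_step(seg, params, r) for r, seg in enumerate(batch)]
            # Parameters are re-determined once per segment, not once per round.
            params = m_step(stats, batch, params)
    return params
```

With 15 segments per block, the model parameters are therefore updated 15 times in every round, and each update uses the parameters produced by the previous segment.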

In one embodiment, for each candidate block, the current intermediate state probability corresponding to the candidate block includes the current probability that the candidate block is in each hidden state in each target time slice among the time slices covered by the corresponding observation sequence. In addition, the current Gaussian distribution mean $\mu^{(t+1)}$ includes, for each hidden state of the hidden Markov model, the current mean of the Gaussian distribution obeyed by the probability of generating each activity behavior feature involved in the population activity data in the observation sequence under the condition of that hidden state, namely $\mu_{k,m}^{(t+1)}$, $k = 1, 2, 3, \ldots, K$, $m = 1, 2, 3, \ldots, M$.

Accordingly, the manner of determining the current mean of the Gaussian distribution obeyed by the probability of generating any activity behavior feature involved in the population activity data in the observation sequence under the condition of any hidden state of the hidden Markov model may comprise the following step: determining the current mean of the Gaussian distribution obeyed by the probability of generating the activity behavior feature under the condition of the hidden state, based on the current probability that each candidate block is in the hidden state in each target time slice and on the activity behavior feature involved in the population activity data of each candidate block in each target time slice.

Specifically, the current mean $\mu_{k,m}^{(t+1)}$ of the Gaussian distribution obeyed by the probability of generating the m-th activity behavior feature under the condition of the k-th hidden state can be calculated by the following formula:

$$\mu_{k,m}^{(t+1)} = \frac{\sum_{r=1}^{R} \sum_{n=1}^{N1} \gamma_{r,n}(k)\, O_{r,n,m}}{\sum_{r=1}^{R} \sum_{n=1}^{N1} \gamma_{r,n}(k)}$$

where $\gamma_{r,n}(k)$ represents the current probability that the r-th candidate block is in the k-th hidden state in the n-th target time slice; $O_{r,n,m}$ represents the m-th activity behavior feature involved in the population activity data of the r-th candidate block in the n-th target time slice; R represents the total number of candidate blocks related to the hidden Markov model; and N1 represents the total number of target time slices covered by the observation sequence corresponding to the r-th candidate block.

If the observation sequence corresponding to each candidate block is split into two or more sequence segments, then in each iteration, for each candidate block, the current intermediate state probability and the current intermediate state transition probability corresponding to the candidate block are calculated once for each sequence segment contained in the observation sequence corresponding to the candidate block. In this case, when the current mean $\mu_{k,m}^{(t+1)}$ of the Gaussian distribution obeyed by the probability of generating the m-th activity behavior feature under the condition of the k-th hidden state is calculated, the target time slices covered by the observation sequence corresponding to the r-th candidate block are the time slices covered by the current target sequence segment, that is, N1 may be equal to the total number of time slices covered by the current target sequence segment. Taking the foregoing example in which an observation sequence containing population activity data in 720 time slices is split into 15 sequence segments of 48 time slices each, N1 may be equal to 48.

In addition, if the observation sequence corresponding to each candidate block is not split into two or more sequence segments, then in each iteration, for each candidate block, the current intermediate state probability and the current intermediate state transition probability corresponding to the candidate block are calculated once based on the complete observation sequence corresponding to the candidate block. In this case, when the current mean $\mu_{k,m}^{(t+1)}$ of the Gaussian distribution obeyed by the probability of generating the m-th activity behavior feature under the condition of the k-th hidden state is calculated, the target time slices covered by the observation sequence corresponding to the r-th candidate block are all the time slices covered by the complete observation sequence, that is, N1 may be equal to the total number of time slices covered by the complete observation sequence corresponding to the r-th candidate block. Taking an observation sequence containing population activity data in 720 time slices as an example, N1 may be equal to 720.

In one embodiment, the current Gaussian distribution variance $\sigma^{(t+1)}$ includes, for each hidden state of the hidden Markov model, the current variance of the Gaussian distribution obeyed by the probability of generating each activity behavior feature involved in the population activity data in the observation sequence under the condition of that hidden state, namely $\sigma_{k,m}^{(t+1)}$, $k = 1, 2, 3, \ldots, K$, $m = 1, 2, 3, \ldots, M$.

Accordingly, the manner of determining the current variance of the Gaussian distribution obeyed by the probability of generating any activity behavior feature involved in the population activity data in the observation sequence under the condition of any hidden state of the hidden Markov model may comprise the following step: determining the current variance of the Gaussian distribution obeyed by the probability of generating the activity behavior feature under the condition of the hidden state, based on the current probability that each candidate block is in the hidden state in each target time slice, the activity behavior feature involved in the population activity data of each candidate block in each target time slice, and the current mean of the Gaussian distribution obeyed by the probability of generating the activity behavior feature under the condition of the hidden state.

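Specifically, the current variance $\sigma_{k,m}^{(t+1)}$ of the Gaussian distribution obeyed by the probability of generating the m-th activity behavior feature under the condition of the k-th hidden state can be calculated by the following formula:

$$\sigma_{k,m}^{(t+1)} = \frac{\sum_{r=1}^{R} \sum_{n=1}^{N1} \gamma_{r,n}(k) \left(O_{r,n,m} - \mu_{k,m}^{(t+1)}\right)^2}{\sum_{r=1}^{R} \sum_{n=1}^{N1} \gamma_{r,n}(k)}$$

where $\gamma_{r,n}(k)$ represents the current probability that the r-th candidate block is in the k-th hidden state in the n-th target time slice, and the parameters $O_{r,n,m}$, $\mu_{k,m}^{(t+1)}$, R and N1 are defined as in the foregoing, which is not repeated here.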
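The pooled update of the Gaussian mean and variance shared by all blocks can be sketched as follows. The function name and the nested-list layout are assumptions made for illustration: `gammas[r][n][k]` holds the current probability that block r is in hidden state k in target time slice n, and `observations[r][n][m]` holds the m-th activity behavior feature of block r in that time slice.

```python
def update_shared_gaussian(gammas, observations):
    """Return (means, variances), each indexed [k][m], pooled over all blocks."""
    K = len(gammas[0][0])
    M = len(observations[0][0])
    # Flatten (block, time slice) pairs, since the statistics are shared.
    pairs = [(g, o)
             for g_block, o_block in zip(gammas, observations)
             for g, o in zip(g_block, o_block)]
    means, variances = [], []
    for k in range(K):
        weight = sum(g[k] for g, _ in pairs)
        mu_k = [sum(g[k] * o[m] for g, o in pairs) / weight for m in range(M)]
        # The variance uses the mean that was just updated, as described above.
        var_k = [sum(g[k] * (o[m] - mu_k[m]) ** 2 for g, o in pairs) / weight
                 for m in range(M)]
        means.append(mu_k)
        variances.append(var_k)
    return means, variances
```

For example, with two blocks, one hidden state, all state probabilities equal to 1, and single-feature observations 1, 3 (first block) and 5, 7 (second block), the pooled mean is 4 and the pooled variance is 5. A hidden state whose total weight is zero would need separate handling, which the sketch omits.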

In one embodiment, as shown in fig. 8, a method for determining a function type of a block is provided. The method is described as applied to a computer device such as the terminal 210 or the server 220 of fig. 2 described above. The method may include the following steps S802 to S810.

S802, obtaining observation sequences corresponding to candidate blocks related to the hidden Markov model.

S804, based on initial state probabilities corresponding to the candidate blocks in the hidden Markov model, state transition probabilities corresponding to the candidate blocks, gaussian distribution mean values corresponding to the candidate blocks, and Gaussian distribution variances corresponding to the candidate blocks, local probabilities of the candidate blocks in hidden states of the hidden Markov model in time slices covered by the observation sequence are determined, and back pointers corresponding to the local probabilities are determined based on the local probabilities.

S806, determining the hidden state of each candidate block in the last time slice based on the maximum local probability of the local probabilities of each candidate block in each hidden state in the last time slice covered by the observation sequence.

S808, performing optimal path backtracking based on the hidden state of each candidate block in the last time slice and each back pointer to obtain hidden state sequences corresponding to each candidate block.

S810, clustering is carried out based on hidden state sequences corresponding to the candidate blocks respectively, and the function type of each candidate block is determined from the candidate function types based on a clustering result.
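Steps S804 to S808 correspond to Viterbi-style decoding, which can be sketched for one block as follows. This is a minimal sketch, not the implementation of the application: the function name `viterbi` is hypothetical, the emission probabilities are assumed to have been evaluated beforehand, and all quantities are passed as logarithms (a choice made here to avoid numerical underflow, which the description does not specify).

```python
import math

def viterbi(log_pi, log_trans, log_emis):
    """Most likely hidden state sequence for one block.

    log_pi[k]: log initial state probability.
    log_trans[j][k]: log probability of moving from state j to state k.
    log_emis[n][k]: log emission probability of time slice n under state k.
    """
    N, K = len(log_emis), len(log_pi)
    delta = [[0.0] * K for _ in range(N)]  # local (log) probabilities
    back = [[0] * K for _ in range(N)]     # back pointers
    for k in range(K):
        delta[0][k] = log_pi[k] + log_emis[0][k]
    for n in range(1, N):
        for k in range(K):
            best_j = max(range(K), key=lambda j: delta[n - 1][j] + log_trans[j][k])
            back[n][k] = best_j
            delta[n][k] = delta[n - 1][best_j] + log_trans[best_j][k] + log_emis[n][k]
    # Hidden state in the last time slice: the one with the largest local probability.
    state = max(range(K), key=lambda k: delta[N - 1][k])
    path = [state]
    for n in range(N - 1, 0, -1):  # optimal path backtracking
        state = back[n][state]
        path.append(state)
    path.reverse()
    return path
```

Running the function once per candidate block, with that block's own initial state and state transition probabilities and the shared Gaussian parameters, yields the hidden state sequences that step S810 then clusters.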

The function type can be used for representing functions of the neighborhood. Candidate function types may be preset based on actual needs, such as setting each candidate function type to be tourist attractions, residential areas, comprehensive areas, business areas, schools, composite areas, companies, and others, respectively.

In this embodiment, by the method for determining a hidden state sequence provided in any embodiment of the present application, a hidden state sequence corresponding to each candidate block related to the hidden markov model is determined, and further clustering is performed based on the hidden state sequences corresponding to each candidate block, and the function type to which each candidate block belongs is determined from the candidate function types based on the clustering result.

Specifically, the hidden state sequences respectively corresponding to the candidate blocks are determined, namely $S_r = \{s_{r,1}, s_{r,2}, s_{r,3}, \ldots, s_{r,N}\}$, $r = 1, 2, 3, \ldots, R$, where R represents the total number of candidate blocks related to the hidden Markov model. Further, the sequence distance between every two hidden state sequences is determined, that is, the sequence distances between the hidden state sequence corresponding to the 1st candidate block and each of the hidden state sequences corresponding to the 2nd, 3rd, …, and R-th candidate blocks, the sequence distances between the hidden state sequence corresponding to the 2nd candidate block and each of the hidden state sequences corresponding to the 3rd, 4th, …, and R-th candidate blocks, and so on. The candidate blocks are then clustered by a K-means clustering algorithm based on the sequence distances between every two hidden state sequences, so as to determine a plurality of clusters, the clusters corresponding respectively to the candidate function types. For each candidate block, the cluster to which the candidate block belongs can be determined, and the function type to which the candidate block belongs is determined accordingly.

Furthermore, the sequence distance between two hidden state sequences may be determined as follows: the state distance between the hidden states corresponding to the same time slice in the two hidden state sequences is calculated as a Euclidean distance, and the sequence distance between the two hidden state sequences is then determined based on the state distances in the individual time slices. In particular, the ratio of the sum of the state distances over all time slices to the total number of time slices can be taken as the sequence distance between the two hidden state sequences.

Take the hidden state sequence $S_1 = \{s_{1,1}, s_{1,2}, s_{1,3}, \ldots, s_{1,N}\}$ corresponding to the 1st candidate block and the hidden state sequence $S_2 = \{s_{2,1}, s_{2,2}, s_{2,3}, \ldots, s_{2,N}\}$ corresponding to the 2nd candidate block as an example. By calculating the Euclidean distance, the state distance d1 between $s_{1,1}$ and $s_{2,1}$, the state distance d2 between $s_{1,2}$ and $s_{2,2}$, the state distance d3 between $s_{1,3}$ and $s_{2,3}$, …, and the state distance dN between $s_{1,N}$ and $s_{2,N}$ are determined, and the sequence distance between $S_1$ and $S_2$ is then determined based on d1, d2, d3, …, and dN. Specifically, the sequence distance between $S_1$ and $S_2$ can be $(d1 + d2 + d3 + \cdots + dN)/N$.
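The sequence distance can be sketched as follows. The description does not state which vector represents a hidden state when the state distance is computed; the sketch assumes, purely for illustration, that each hidden state is represented by a numeric vector such as its Gaussian mean, supplied in the hypothetical mapping `state_vectors`. The function names are likewise hypothetical.

```python
import math

def sequence_distance(seq_a, seq_b, state_vectors):
    """Mean per-time-slice Euclidean distance between two hidden state sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("hidden state sequences must cover the same time slices")
    total = sum(math.dist(state_vectors[a], state_vectors[b])
                for a, b in zip(seq_a, seq_b))
    return total / len(seq_a)

def pairwise_distances(sequences, state_vectors):
    """Symmetric matrix of sequence distances between all pairs of blocks."""
    R = len(sequences)
    return [[sequence_distance(sequences[i], sequences[j], state_vectors)
             for j in range(R)] for i in range(R)]
```

For instance, with state vectors `{0: [0.0, 0.0], 1: [3.0, 4.0]}`, the sequences `[0, 0, 1, 1]` and `[0, 1, 1, 0]` have state distances 0, 5, 0 and 5, giving a sequence distance of 2.5. The resulting distance matrix is what a clustering step would consume.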

It should be noted that determining the function type of a candidate block can provide a reference for urban planning and urban infrastructure construction, and can directly guide the introduction of new points of interest and the selection of shop sites.

It should be understood that, under reasonable conditions, although the steps in the flowcharts referred to in the foregoing embodiments are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, and the order of execution of the sub-steps or stages is not necessarily sequential, but may be performed in rotation or alternately with at least a portion of the sub-steps or stages of other steps or other steps.

In combination with practical tests, the technical scheme provided by the application was tested on population activity data of 665 central blocks and 2,000,000 users in Beijing in April 2018. The population activity data within this month is divided into a training set and a test set: the population activity data of the first three weeks forms the training set, which is used as the existing observation sequences to learn the model parameters of the hidden Markov model, and the test set is used to verify the performance of the trained hidden Markov model.

First, in the practical tests, 100 hidden states, as shown in fig. 5, were learned on the 665 central blocks of Beijing. A series of specific examples and detailed explanations is given below to demonstrate the ability of the technical solution of the present application to discover hidden states and reveal the dynamics of blocks in the city.

Fig. 9 shows the mean values corresponding to the hidden states that frequently occur in the hidden state sequences of the blocks, and shows the transitions of the hidden states on workdays and non-workdays respectively. It will be appreciated that the ordinary weekends in April and the 3-day Qingming holiday of April 5 (Thursday), April 6 (Friday) and April 7 (Saturday) are non-workdays, while the ordinary workdays and April 8 (Sunday) are workdays.

First, the discovered hidden states are discussed. Each hidden state shown in fig. 9 carries two kinds of semantics: (1) Population density and population flow; for example, hidden state 32 indicates high population density and high population flow, hidden state 21 indicates low population density and high population flow, and hidden state 17 indicates low population density and low population flow. (2) The access frequency for different types of points of interest; for example, hidden state 31 indicates that the most frequently accessed points of interest are of the education type, and hidden state 21 indicates that the most frequently accessed points of interest are of the scenic spot type. As shown in fig. 9 (a), the block is in hidden state 79 during the daytime because Tsinghua University occupies most of its area; similarly, the block containing Peking University exhibits hidden state 99, and the block containing Tiananmen exhibits hidden state 81.

Next, the dynamics represented by the state transition process are discussed. It is apparent from fig. 9 that the dynamics of the blocks in a city are periodic, since the states in the same period on different dates are generally the same. It is worth noting that the dynamics of some blocks, such as the one shown in fig. 9 (f), differ greatly between workdays and non-workdays, while those of other blocks, such as the one shown in fig. 9 (c), are very similar on workdays and non-workdays.

Taking the dynamics of the block of Tsinghua University as an example, as shown in fig. 9 (a), there are fewer people at night than in the daytime, because the mean values of hidden state 70 and hidden state 31 are smaller than that of hidden state 79. Furthermore, on workdays a sudden crowd movement occurs, as hidden state 32 appears at 8:00-9:00 and 17:00-19:00. On workdays, the transitions from hidden state 70 to hidden state 32 and from hidden state 32 to hidden state 79 reveal the dynamic feature that only students live in the area at night, while more teachers enter the campus in the morning, making the population denser than at night. In contrast to fig. 9 (a) and (b), fig. 9 (c) shows a block with a higher population density in which the most frequently accessed points of interest are scenic spots on both workdays and non-workdays, because Tiananmen is one of the most famous tourist attractions in China.

In addition, the practical tests further evaluated the performance of the technical solution provided by the present application in determining urban function types. Fig. 10 shows the clustering result of the blocks and the geographical distribution of the corresponding areas. On this data set, the technical solution of the present application obtains 8 function types, namely tourist attractions, residential areas, comprehensive areas, business areas, schools, compound areas, companies, and others. The functions of some blocks (including the blocks shown in fig. 9) were verified on the map by manual labeling, which shows that the technical solution provided by the present application can effectively determine urban function types. Further, the result was compared with a state-of-the-art function type determination method, namely an LDA (Latent Dirichlet Allocation) model using points of interest and mobility. The result of the practical test is similar to that of the LDA model, with a normalized mutual information (NMI) of 0.25 (NMI ranges from 0 to 1). In summary, blocks with more states in common and similar state transition processes are more likely to have the same function, which demonstrates that the technical solution of the present application can infer the distribution of functional areas across the whole city.
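For reference, the agreement between two labelings of the same blocks can be scored with normalized mutual information as in the Python sketch below. It is a minimal sketch, not the evaluation code used in the tests: the text does not say which normalization was applied, so the arithmetic mean of the two label entropies is an assumption, and `normalized_mutual_information` is a hypothetical name.

```python
import math
from collections import Counter


def normalized_mutual_information(labels_a, labels_b):
    """NMI between two labelings of the same items.

    ASSUMPTION: mutual information is normalized by the arithmetic mean
    of the two label entropies.
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("labelings must be non-empty and of equal length")
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    # Mutual information from the joint and marginal label frequencies.
    mutual_info = sum((n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
                      for (a, b), n_ab in count_ab.items())
    entropy_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * math.log(c / n) for c in count_b.values())

    if entropy_a == 0.0 and entropy_b == 0.0:
        return 1.0  # both labelings put every item in a single group
    return mutual_info / ((entropy_a + entropy_b) / 2.0)
```

The score depends only on how the items are grouped, not on the label values, so function types produced by two different methods can be compared without first matching their names.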

Meanwhile, the practical tests evaluated the performance of the technical solution in predicting population flow behavior. The prediction results are shown in fig. 11; fig. 11 (a) illustrates the difference between the predicted and actual numbers of residents in the block of Tsinghua University from April 22 to April 30, 2018.
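The prediction step recited in the claims (predicting the hidden state of the next time slice from the last hidden state and the state transition probability, then predicting population activity data from the Gaussian distribution mean) can be sketched as follows. This is a hedged sketch: choosing the most probable next state and using that state's mean vector directly as the predicted data are assumptions, and `predict_next` is a hypothetical name.

```python
def predict_next(last_state, transition, means):
    """Predict the next hidden state and the population activity data.

    last_state : hidden state index in the last observed time slice
    transition : transition[i][j] = probability of moving from state i to j
    means      : means[j] = Gaussian mean vector of the features in state j
    """
    row = transition[last_state]
    # ASSUMPTION: take the most probable successor state.
    next_state = max(range(len(row)), key=lambda j: row[j])
    # ASSUMPTION: use the mean of that state as the predicted features.
    return next_state, list(means[next_state])
```

For example, with `transition = [[0.2, 0.8], [0.6, 0.4]]` and `means = [[0.1, 0.3], [0.7, 0.2]]`, `predict_next(0, transition, means)` returns `(1, [0.7, 0.2])`.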

To further demonstrate the advantages of the technical solution of the present application, it was compared with an ordinary hidden Markov model in the practical tests. The comparison of the indices is shown in fig. 11 (b), in which the average RMSE (Root Mean Square Error) of the population flow prediction is 0.195 and the Top-3 accuracy in predicting the most frequently accessed point of interest is 41.4%, showing that the technical solution of the present application is clearly superior to the ordinary hidden Markov model. In summary, the technical solution of the present application can be effectively applied to predicting the flow of people in a city block and to predicting frequently accessed points of interest.
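The two indices mentioned above can be computed as in the following Python sketch. The RMSE is the standard definition; the Top-3 accuracy is defined here, as an assumption, as the share of test time slices in which the actually most accessed point-of-interest type is among the three types with the highest predicted access frequency. The function names and the dictionary layout are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between two equally long numeric sequences."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


def top_k_accuracy(predicted_scores, true_types, k=3):
    """Share of time slices whose true top type is among the k best predictions.

    predicted_scores : list of dicts mapping POI type -> predicted frequency
    true_types       : list of the actually most accessed POI type per slice
    """
    if not true_types or len(predicted_scores) != len(true_types):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = 0
    for scores, truth in zip(predicted_scores, true_types):
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        if truth in top_k:
            hits += 1
    return hits / len(true_types)
```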

In one embodiment, as shown in fig. 12, a determination device 1200 of a hidden state sequence is provided. The apparatus may include the following modules 1202 to 1208.

The first observation sequence obtaining module 1202 is configured to obtain an observation sequence corresponding to a target block.

The first intermediate parameter determining module 1204 is configured to determine, based on the observation sequence, the initial state probability corresponding to the target block in the hidden Markov model, the state transition probability corresponding to the target block, the Gaussian distribution mean corresponding to each candidate block related to the hidden Markov model, and the Gaussian distribution variance corresponding to each candidate block, the local probabilities that the target block is in each hidden state of the hidden Markov model in each time slice covered by the observation sequence, and to determine the back pointers respectively corresponding to the local probabilities.

The first end hidden state determining module 1206 is configured to determine, based on a maximum local probability of local probabilities of each hidden state of the target block within a last time slice covered by the observation sequence, a hidden state in which the target block is located within the last time slice.

The first hidden state sequence determining module 1208 is configured to perform optimal path backtracking based on the hidden state of the target block in the last time slice and each back pointer, so as to obtain a hidden state sequence.
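The cooperation of modules 1204 to 1208 corresponds to Viterbi decoding with Gaussian emissions, which can be sketched in Python as follows. This is an illustrative sketch only: it works in log space for numerical stability and treats the features as independent Gaussians per hidden state, both of which are assumptions, and the names `viterbi`, `log_gaussian`, and `safe_log` are hypothetical.

```python
import math


def safe_log(p):
    """Natural log that maps zero probability to negative infinity."""
    return math.log(p) if p > 0.0 else float("-inf")


def log_gaussian(x, mean, var):
    """Log emission probability, one independent Gaussian per feature."""
    return sum(-0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def viterbi(observations, initial, transition, means, variances):
    """Return the most probable hidden state sequence for one block."""
    n_states = len(initial)
    # Local (log) probabilities for the first time slice.
    local = [[safe_log(initial[k]) + log_gaussian(observations[0], means[k], variances[k])
              for k in range(n_states)]]
    back = [[0] * n_states]
    for t in range(1, len(observations)):
        row, pointers = [], []
        for k in range(n_states):
            # Back pointer: best predecessor state for state k at slice t.
            prev = max(range(n_states),
                       key=lambda j: local[t - 1][j] + safe_log(transition[j][k]))
            row.append(local[t - 1][prev] + safe_log(transition[prev][k])
                       + log_gaussian(observations[t], means[k], variances[k]))
            pointers.append(prev)
        local.append(row)
        back.append(pointers)
    # Hidden state in the last time slice: maximum local probability.
    state = max(range(n_states), key=lambda k: local[-1][k])
    path = [state]
    # Optimal path backtracking along the back pointers.
    for t in range(len(observations) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    path.reverse()
    return path
```

Here `initial` and `transition` would be the parameters of the target block, while `means` and `variances` would be the Gaussian parameters of the hidden Markov model, indexed as `[state][feature]`.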

In one embodiment, as shown in fig. 13, a block function type determining apparatus 1300 is provided. The apparatus may include the following modules 1302 to 1310.

The second observation sequence obtaining module 1302 is configured to obtain observation sequences respectively corresponding to the candidate blocks related to the hidden Markov model.

The second intermediate parameter determining module 1304 is configured to determine, based on the initial state probabilities respectively corresponding to the candidate blocks in the hidden Markov model, the state transition probabilities respectively corresponding to the candidate blocks, the Gaussian distribution means corresponding to the candidate blocks, and the Gaussian distribution variances corresponding to the candidate blocks, the local probabilities that each candidate block is in each hidden state of the hidden Markov model in each time slice covered by the observation sequence, and to determine, based on the local probabilities, the back pointers respectively corresponding to the local probabilities.

The second end hidden state determining module 1306 is configured to determine, based on the maximum local probability among the local probabilities that each candidate block is in each hidden state in the last time slice covered by the observation sequence, the hidden state in which each candidate block is located in the last time slice.

The second hidden state sequence determining module 1308 is configured to perform optimal path backtracking based on the hidden state of each candidate block in the last time slice and the back pointers, so as to obtain the hidden state sequence corresponding to each candidate block.

The function type determining module 1310 is configured to perform clustering based on the hidden state sequences corresponding to the candidate blocks, and to determine, from the candidate function types, the function type to which each candidate block belongs based on the clustering result.
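The clustering performed by module 1310 can be illustrated with the Python sketch below, which groups blocks from a precomputed matrix of sequence distances. The clustering algorithm is not named in this passage, so k-medoids is used here purely as an assumption, because it works directly on pairwise distances; the deterministic initialization and the name `k_medoids` are likewise hypothetical.

```python
def k_medoids(distance, k, max_iter=100):
    """Cluster items given a symmetric pairwise distance matrix.

    distance : list of lists, distance[i][j] between items i and j
    k        : number of clusters
    Returns (medoids, labels), where labels[i] is the cluster of item i.
    """
    n = len(distance)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of items")
    # ASSUMPTION: deterministic start with the first k items as medoids.
    medoids = list(range(k))

    def assign(current):
        # Each item joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: distance[i][current[c]])
                for i in range(n)]

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep an empty cluster's medoid
                continue
            # New medoid: member with the smallest total distance to the others.
            new_medoids.append(min(members,
                                   key=lambda m: sum(distance[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)
```

Mapping each resulting cluster to one of the candidate function types would still require additional information, such as manually labeled blocks, which this sketch does not cover.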

It should be noted that, for the specific limitations of the technical features in the hidden state sequence determining apparatus 1200, reference may be made to the limitations of the hidden state sequence determining method above, and for the specific limitations of the technical features in the block function type determining apparatus 1300, reference may be made to the limitations of the block function type determining method above; these are not repeated here. Each of the modules in the above apparatuses may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, a processor in the computer device in the form of hardware, or may be stored in a memory in the computer device in the form of software, so that the processor can invoke them and perform the operations corresponding to the modules.

In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the above-described method of determining a sequence of hidden states and/or method of determining a functional type of a block.

A diagram of the internal structure of a computer device in one embodiment is shown in fig. 14. The computer device may specifically be the server 220 of fig. 2. As shown in fig. 14, the computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor is configured to provide computing and control capabilities. The memory includes a non-volatile storage medium, which stores an operating system and a computer program, and an internal memory, which provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the above-described method of determining a hidden state sequence and/or method of determining a function type of a block.

It will be appreciated by those skilled in the art that the structure shown in fig. 14 is merely a block diagram of a portion of the structure associated with the present application and is not intended to limit the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or may have a different arrangement of components.

In one embodiment, taking the hidden state sequence determination apparatus 1200 provided in the present application as an example, the apparatus may be implemented as a computer program, and the computer program may run on a computer device as shown in fig. 14. The memory of the computer device may store various program modules that make up the hidden state sequence determination apparatus 1200, such as the first observation sequence acquisition module 1202, the first intermediate parameter determination module 1204, the first end hidden state determination module 1206, and the first hidden state sequence determination module 1208, etc., as shown in fig. 12. The computer program of each program module causes the processor to execute the steps in the method for determining the hidden state sequence of each embodiment of the present application described in the present specification.

For example, the computer apparatus shown in fig. 14 may perform step S302 by the first observation sequence acquiring module 1202 in the hidden state sequence determining apparatus 1200 shown in fig. 12, step S304 by the first intermediate parameter determining module 1204, and so on.

Those skilled in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware, where the program may be stored in a non-volatile computer readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the various embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include Read Only Memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM), among others.

Accordingly, in one embodiment, a computer readable storage medium is provided, storing a computer program which, when executed by a processor, causes the processor to perform the above method for determining a sequence of hidden states and/or method for determining a functional type of a block.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered to be within the scope of this description.

The above embodiments represent only a few implementations of the present application. They are described in specific detail, but are not to be construed as limiting the scope of the present application. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims

1. A method of determining a sequence of hidden states, comprising:

obtaining an observation sequence corresponding to a target block, wherein the observation sequence comprises population activity data of the target block in more than two time slices, the population activity data relate to activity behavior characteristics of population activity behaviors corresponding to the block, and the activity behavior characteristics at least comprise population flowing numbers and access frequencies aiming at points of interest of a preset type;

Determining local probabilities that the target block is in each hidden state of the hidden Markov model in each time slice covered by the observation sequence based on the observation sequence, initial state probability corresponding to the target block in the hidden Markov model, state transition probability corresponding to the target block, gaussian distribution mean value commonly corresponding to each candidate block related to the hidden Markov model and Gaussian distribution variance commonly corresponding to each candidate block, and determining back pointers respectively corresponding to each local probability, wherein the hidden states are used for representing population density, population flow and population activity types of the blocks;

2. The method according to claim 1, wherein the obtaining the observation sequence corresponding to the target block includes:

Acquiring an original observation sequence corresponding to the target block; the original observation sequence comprises original population activity data of the target block in more than two time slices, and activity behavior characteristics related to each original population activity data comprise population flow numbers and access frequencies for points of interest of a preset type;

and carrying out maximum normalization on population flowing number in each piece of original population activity data and TF-IDF parameters corresponding to access frequency of interest points of a preset type in each piece of original population activity data to obtain an observation sequence corresponding to the target block.

3. The method of claim 1, wherein the manner of determining the local probability that the target block is in any hidden state of the hidden Markov model in any time slice covered by the observation sequence comprises:

determining the emission probability of population activity data in the time slice in the observation sequence under the condition that the target block is in the hidden state in the time slice based on population activity data in the time slice in the observation sequence, a Gaussian distribution mean value commonly corresponding to each candidate block related to the hidden Markov model and a Gaussian distribution variance commonly corresponding to each candidate block;

determining the local probability that the target block is in the hidden state in the time slice based on the local probabilities that the target block is in the respective hidden states of the hidden Markov model in the previous time slice adjacent to the time slice, the state transition probability corresponding to the target block in the hidden Markov model, and the emission probability;

the local probability that the target block is in the hidden state in the first time slice covered by the observation sequence is determined based on the probability corresponding to the hidden state in the initial state probability corresponding to the target block and the emission probability of population activity data in the first time slice generated by the target block under the condition that the target block is in the hidden state in the first time slice.

4. The method of claim 3, wherein the gaussian distribution mean comprises a mean of gaussian distributions to which probabilities of respective activity behavior features involved in population activity data in the observation sequence are respectively generated under the condition of respective hidden states of the hidden markov model, and the gaussian distribution variance comprises a variance of gaussian distributions to which probabilities of respective activity behavior features involved in population activity data in the observation sequence are respectively generated under the condition of respective hidden states of the hidden markov model;

the determining, based on population activity data in the time slice in the observation sequence, the Gaussian distribution mean commonly corresponding to each candidate block related to the hidden Markov model, and the Gaussian distribution variance commonly corresponding to each candidate block, of the emission probability with which the target block generates the population activity data in the time slice in the observation sequence under the condition that the target block is in the hidden state in the time slice comprises:

determining the emission probability of the target block in the hidden state of the hidden Markov model in the time slice based on the variances of the Gaussian distributions obeyed by the probabilities of generating the respective activity behavior features involved in the population activity data of the observation sequence under the condition of the hidden state, the means of the Gaussian distributions obeyed by those probabilities, and the population activity data of the target block in the time slice in the observation sequence.

5. The method of claim 1, wherein the training mode of the hidden markov model comprises:

Obtaining observation sequences corresponding to the candidate blocks respectively;

in the current iteration, determining the current intermediate state probability and the current intermediate state transition probability corresponding to each candidate block respectively based on the observation sequence corresponding to each candidate block respectively, the initial state probability and the state transition probability corresponding to each candidate block which are determined last time respectively, the Gaussian distribution mean value corresponding to each candidate block jointly and the Gaussian distribution variance corresponding to each candidate block jointly;

determining a current initial state probability of each candidate block based on each current intermediate state probability, and determining a current state transition probability of each candidate block based on each current intermediate state transition probability;

based on the current intermediate state probability, determining a current Gaussian distribution mean value and a current Gaussian distribution variance which are corresponding to the candidate blocks together;

and when the iteration termination condition is met, obtaining the hidden Markov model based on the initial state probability and the state transition probability which are respectively corresponding to the candidate blocks and are determined last time, and the Gaussian distribution mean value and the Gaussian distribution variance which are commonly corresponding to the candidate blocks.

6. The method of claim 5, wherein the current intermediate state probabilities corresponding to the candidate blocks include the current probabilities that the candidate blocks are each in the hidden state within each target time slice of each time slice covered by the corresponding observation sequence; the current Gaussian distribution mean value comprises the current Gaussian distribution mean value obeyed by the probability of each activity behavior feature related to population activity data in the observation sequence under the condition of being in each hidden state of the hidden Markov model;

wherein a manner of determining the current mean of the Gaussian distribution obeyed by the probability of generating any activity behavior feature involved in the population activity data in the observation sequence under the condition of any hidden state of the hidden Markov model comprises:

and determining the current average value of Gaussian distribution obeyed by the probability of generating the activity characteristic under the condition of being in the hidden state based on the current probability of being in the hidden state in each target time slice and the activity characteristic related to the population activity data of each candidate block in each target time slice.

7. The method of claim 6, wherein the current Gaussian distribution variance comprises the current variances of the Gaussian distributions obeyed by the probabilities of generating the respective activity behavior features involved in the population activity data in the observation sequence under the conditions of the respective hidden states of the hidden Markov model;

wherein a manner of determining the current variance of the Gaussian distribution obeyed by the probability of generating any activity behavior feature involved in the population activity data in the observation sequence under the condition of any hidden state of the hidden Markov model comprises:

determining a current variance of a gaussian distribution obeyed by the probability of generating the active behavior feature under the condition of the hidden state based on a current probability of each candidate block being in the hidden state in each target time slice, the active behavior feature related to population activity data of each candidate block in each target time slice, and a current average of the gaussian distribution obeyed by the probability of generating the active behavior feature under the condition of the hidden state.

8. The method according to any one of claims 5 to 7, wherein, in the current iteration, the determining of the current intermediate state probability and the current intermediate state transition probability respectively corresponding to each candidate block based on the observation sequence respectively corresponding to each candidate block, the initial state probability and the state transition probability respectively corresponding to each candidate block determined last time, the Gaussian distribution mean jointly corresponding to the candidate blocks, and the Gaussian distribution variance jointly corresponding to the candidate blocks comprises:

In the current iteration, determining the current target sequence segments corresponding to the candidate blocks respectively; the current target sequence segment corresponding to the candidate block is a sequence segment which is not used as a target sequence segment in the current iteration in each sequence segment contained in the observation sequence corresponding to the candidate block;

determining a current intermediate state probability and a current intermediate state transition probability corresponding to each candidate block respectively based on each current target sequence segment, the initial state probability corresponding to each candidate block determined last time, the state transition probability corresponding to each candidate block respectively, a Gaussian distribution mean value corresponding to each candidate block jointly, and the Gaussian distribution variance corresponding to each candidate block jointly;

after the determining, based on each current intermediate state probability, of the current Gaussian distribution mean and the current Gaussian distribution variance jointly corresponding to the candidate blocks, the method further comprises:

and returning to the step of determining the current target sequence segments corresponding to the candidate blocks respectively, and judging whether an iteration termination condition is met or not until each sequence segment contained in the observation sequence corresponding to the candidate blocks is used as a target sequence segment in the current iteration.

9. The method of claim 1, further comprising, after determining the hidden state in which the target block is located in the last time slice:

and predicting the hidden state of the target block in the next time slice of the last time slice based on the hidden state of the target block in the last time slice covered by the observation sequence and the state transition probability corresponding to the target block in the hidden Markov model.

10. The method of claim 9, further comprising, after the predicting of the hidden state in which the target block is located in the time slice next to the last time slice:

and predicting population activity data of the target block in the time slice next to the last time slice based on the hidden state of the target block in the time slice next to the last time slice and the Gaussian distribution mean value in the hidden Markov model.

11. A method for determining the function type of a block, comprising:

obtaining observation sequences respectively corresponding to candidate blocks related to a hidden Markov model, wherein the observation sequences comprise population activity data of the candidate blocks in more than two time slices, the population activity data relate to activity behavior characteristics of population activity behaviors corresponding to the blocks, and the activity behavior characteristics at least comprise population flow numbers and access frequencies aiming at points of interest of a preset type;

Based on initial state probabilities corresponding to the candidate blocks in the hidden Markov model, state transition probabilities corresponding to the candidate blocks, gaussian distribution mean values corresponding to the candidate blocks and Gaussian distribution variances corresponding to the candidate blocks, respectively, local probabilities that the candidate blocks are in hidden states of the hidden Markov model in time slices covered by the observation sequence are respectively determined, and back pointers corresponding to the local probabilities are determined based on the local probabilities, wherein the hidden states are used for representing population density, population flow and population activity types of the blocks;

12. A device for determining a sequence of hidden states, comprising:

a first observation sequence acquisition module, configured to acquire an observation sequence corresponding to a target block, where the observation sequence includes population activity data of the target block in more than two time slices, where the population activity data relates to activity behavior features corresponding to population activity behaviors of the block, where the activity behavior features include at least population flow numbers and access frequencies for points of interest of a predetermined type;

a first intermediate parameter determining module, configured to determine local probabilities that the target block is in each hidden state of the hidden markov model within each time slice covered by the observation sequence, and determine back pointers corresponding to each of the local probabilities, where the hidden states are used to characterize population density, population flow, and population activity types of the block, based on the observation sequence, an initial state probability corresponding to the target block in the hidden markov model, a state transition probability corresponding to the target block, a gaussian distribution mean value corresponding to each candidate block related to the hidden markov model, and a gaussian distribution variance corresponding to each candidate block;

13. A device for determining the function type of a block, comprising:

a second observation sequence acquisition module, configured to acquire observation sequences corresponding to candidate blocks related to a hidden markov model, where the observation sequences include population activity data of the candidate blocks in more than two time slices, the population activity data relates to activity behavior features corresponding to population activity behaviors of the blocks, and the activity behavior features include at least population flow numbers and access frequencies for points of interest of a predetermined type;

a second intermediate parameter determining module, configured to determine, based on an initial state probability corresponding to each candidate block in the hidden markov model, a state transition probability corresponding to each candidate block, a gaussian distribution mean value corresponding to each candidate block, and a gaussian distribution variance corresponding to each candidate block, respectively, a local probability that each candidate block is in each hidden state of the hidden markov model in each time slice covered by the observation sequence, and determine, based on each local probability, a back pointer corresponding to each local probability, respectively, the hidden states being used to characterize population density, population flow, and population activity type of a block;

14. A computer readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the method of any one of claims 1 to 11.

15. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any of claims 1 to 11.