CN115545758A

CN115545758A - Method and system for self-adaptive incremental site selection of urban service facilities

Info

Publication number: CN115545758A
Application number: CN202211175414.9A
Authority: CN
Inventors: 王璞; 孙靓亚; 赵雷
Original assignee: Suzhou University
Current assignee: Suzhou University
Priority date: 2022-09-26
Filing date: 2022-09-26
Publication date: 2022-12-30

Abstract

The invention relates to the field of facility site selection and discloses a method and a system for self-adaptive incremental site selection of urban service facilities. The method comprises the following steps: obtaining urban road network structure data and user behavior data to construct an address association graph; constructing the spatial association relation of address locations by using a graph neural network, and extracting local hidden features and global hidden features of the address locations from the address association graph according to the spatial association relation; constructing the time dependency relation of address locations by using a long short-term memory (LSTM) network, and predicting the missing popularity distribution according to the local hidden features, the global hidden features and the time dependency relation; and obtaining a popularity prediction result by combining the local hidden features, the global hidden features and the missing popularity distribution, and selecting sites according to that result. The system comprises an address association graph construction module, a feature extraction module, a missing popularity prediction module and a site selection module. The method fully mines data such as user activity and social behavior and realizes self-adaptive site selection that combines the time domain and the space domain, with accurate and scientific results, high efficiency and good effect.

Description

Method and system for self-adaptive incremental site selection of urban service facilities

Technical Field

The invention relates to the technical field of facility site selection, in particular to a method and a system for self-adaptive incremental site selection of urban service facilities.

Background

In the construction and development process of a smart city or a digital city, address selection of service facilities (such as shopping malls, hospitals, convenience stores, automobile charging piles and the like) in the city is required according to the requirements of the overall planning layout of the city, the working and living requirements of residents and the like, so that the candidate address (or addresses) suitable for building the service facility is determined in a certain urban area. The site selection planning of the urban service facilities is a very complex work, and the research of the site selection model has important theoretical value, economic benefit and social significance. The good site selection planning can bring convenience to the work and life of residents, reduce the production or commercial cost, improve the service efficiency and competitiveness of corresponding facilities, perfect the whole urban service system, improve the comprehensive service capacity of cities, relieve the traffic pressure, the environmental pressure and the like of the cities and play a positive role in all aspects of the cities.

At present, most cities have developed to a certain stage, and corresponding service facilities exist in different regions of the cities. For example, as cities continue to grow, the size of cities and the number of residents continue to increase, and chain stores need to evaluate candidate addresses and then continue to open different branch stores in order to expand the business size. Incremental siting studies of service facilities have become particularly important. In order to build new facilities on a proper candidate address (or a plurality of candidate addresses) continuously to perfect the whole city service system and improve the comprehensive service capability of the city, the site selection work not only considers the influence of city residents on the candidate addresses on a time domain and a space domain, but also comprehensively analyzes the influence effect of the existing service facilities on the candidate addresses, the influence of the newly built facilities on the existing facilities and the like. Whether a candidate address is suitable for building a facility of a certain category can be quantitatively expressed as the total number of potential users or customers after the address is built into a service facility in a future period of time from the viewpoint of a mathematical model of expected revenue, and is generally referred to as the influence or Popularity (Location Popularity) of the address. Rationally, the higher the popularity of a candidate address over a period of time, the more suitable it is to build this type of service and vice versa. Therefore, a good facility site planning not only needs to consider the planning, layout, utilization and the like of land space in the whole urban area, but also especially pays attention to the use cost of residents as main bodies of the city, and simultaneously considers the future development condition of the city.

Traditionally, city planners or service providers have relied on labor-intensive questionnaires to provide information support for their siting decisions. In recent years, the site selection decision is also made by using a part of city big data, such as a macroscopic traffic flow of a road network, a general flow trend of residents, and the like. In the selection of the calculation model, the extracted multidimensional correlation features are mostly processed by adopting a linear model or a regression model, and in recent times, some algorithms adopt independent models such as a recurrent neural network or a Gaussian random field.

However, a city is an organic whole consisting of land space, building facilities and resident users, and the planning and construction of facilities ultimately serve the residents of the region. Existing site selection methods ignore the fact that city residents are a dynamic served group, and they lack a real understanding and correct application of the daily behavior patterns of city residents. In a facility site selection task that considers both the city and its residents, urban big data is generally divided into two categories: urban space-time big data and resident location-based social big data. Urban space-time big data describes the macro structure and variation trend of a city and mainly comprises data sets such as the road network and traffic flow. Resident location-based social big data is centered on individual residents and comprises data sets such as their daily behavior trajectories, social network structures and Point of Interest (POI) check-in records.

The existing address selection method comprises the following steps:

(1) For the case where relevant service facilities already exist in a city, a unified solution framework covering three different types of requirements has been proposed, in which the location of a newly built facility must satisfy requirements such as being closest to the largest customer base, reducing the average distance between customers and their closest facility, or reducing the maximum distance between customers and their closest facility; the distance measure is the geographic Euclidean distance between a candidate address and a customer (or customer set).

(2) In order to utilize more characteristics of candidate addresses and users, the concept of address popularity is applied to the site selection problem, and a corresponding computing model, Geo-Spotting, is proposed. The model extracts multidimensional characteristics of the candidate addresses, such as low-dimensional characteristics (other facilities within a certain range) and user movement patterns (total quantity and flow direction trend), learns the weights of all the characteristics through a regularized linear regression model, and finally generates a popularity ranking of the candidate addresses (rather than actual specific values) together with a recommendation list.

(3) Site selection is carried out by mining the k most influential addresses in the road network according to whether vehicle trajectories pass through the candidate addresses (that is, ensuring that the set of k addresses is passed by the maximum number of trajectories).

(4) The problem of planning and site selection of public bicycle parking points is solved by using a semi-supervised feature selection method, and three key factors are considered: different functions, people movement activity modes and demographic characteristics of the road network region are extracted, a series of distinguishing characteristics are extracted from the road network region and are subjected to correlation analysis, the characteristics are simply fused, the public bicycle travel demands of regional users are finally obtained, and reference is provided for parking point location.

(5) Using users' historical restaurant review information, the problem of recommending the next optimal candidate location for a newly built restaurant is posed; the geographic location and the competitive factors of existing restaurants are considered together, and these characteristics are integrated into a regression model to predict the final popularity of a candidate address.

(6) Semantic tracks are used to describe the daily activities of users in cities, and a computational model based on a recurrent neural network is proposed to predict the next position. The model extracts the spatial features and the associated text information features of the user track and learns the change condition of the popularity of the candidate address according to the time features of the track, so that the accuracy of recommending the candidate address is improved.

(7) The Geo-Spotting model is expanded, a network graph structure is introduced, a semi-supervised computing model is provided, and under the condition that partial service facilities already exist in a road network, the popularity of the candidate address in a future period is predicted, so that a newly opened shop position is selected for a chain shop. The model carries out detailed classification and expansion on position characteristics and user sign-in behavior characteristics, analyzes the influence of existing facility positions in a city on candidate addresses, considers the popularity in a time sequence relation, adopts a Gaussian Random Field (Gaussian Random Field) to model the time sequence characteristics, and integrates the popularity in unit time periods such as week and month into a calculation model based on the Gaussian Random Field and an address preference network structure in a unified manner, thereby capturing and predicting the popularity in unit time periods such as week and month and the like, and further being used for the final address selection problem.

However, the above methods also have some drawbacks, including:

(1) The prior art does not consider the site selection problem of service facilities from the microscopic perspective of urban residents, does not deeply mine data such as space-time behavior tracks of users, interest point facility check-in, social relationship network activities and the like, and does not fully extract geographic spatial features and social relationship association features in the social activities of the users, so that the services and facilities really lacked and needed by the resident users cannot be effectively analyzed and known.

(2) The continued development of the "red envelope economy" in commercial service facilities has made the reciprocal recommendation effect between social friends increasingly apparent. For example, customers can obtain a certain number of consumption red envelopes or coupons after spending at an online or offline shop, and after they share the red envelopes with friends through a smartphone application (APP), the friends enjoy the corresponding discounts when they in turn make purchases. In current facility site selection research, the effectiveness of candidate addresses is mainly evaluated according to their space-time characteristics, and the social relations and social influence of urban resident users are not comprehensively considered from multiple angles such as social influence relations and active time periods, so that the final site selection result is inaccurate or uneconomical.

(3) Owing to the continuous construction and development of cities, some service facilities of the same kind, such as chain stores of one brand, already exist in a city in most cases. For the incremental site selection problem, current research lacks systematic analysis of the influence of other existing similar facilities around a candidate address. Although the influence of existing facilities has been considered in the space domain or the time domain separately, the association dependency and popularity of a candidate address over the combined time-space domain are not evaluated as a whole, and a modeling method for sparse data is lacking, so that the popularity of candidate addresses is predicted with low efficiency and poor effect.

Disclosure of Invention

Therefore, the technical problem to be solved by the invention is to overcome the defects in the prior art and provide a method and a system for adaptive incremental site selection of urban service facilities, which can fully mine data such as user activeness and social behaviors, realize adaptive site selection combined with a time-space domain, and have accurate and scientific results, high efficiency and good effect.

In order to solve the technical problem, the invention provides a method for self-adaptive incremental site selection of urban service facilities, which comprises the following steps:

S1: acquiring input data comprising the urban road network structure and user behaviors, and constructing an address association graph according to the input data;

S2: constructing the spatial association relation of address locations by using a graph neural network, and extracting local hidden features and global hidden features of the address locations from the address association graph according to the spatial association relation;

S3: constructing the time dependency relation of address locations by using a long short-term memory (LSTM) network;

S4: predicting the missing popularity distribution according to the local hidden features, the global hidden features and the time dependency relation;

S5: obtaining a final popularity prediction result by combining the local hidden features, the global hidden features and the missing popularity distribution, and carrying out site selection of the target facility according to the final popularity prediction result.

Preferably, the address association graph is constructed according to the input data, specifically:

S1-1: obtaining the set of all address locations L = L_l ∪ L_u according to the urban road network structure, where L_l denotes the set of address locations at which a service facility already exists and L_u denotes the set of candidate address locations without a service facility;

s1-2: dividing city regions according to the set L, user behavior tracks in the city road network structure and administrative divisions of cities to obtain a city region set meeting the minimum road network distance interval

S1-3: extracting features from the urban area set

and from the user behavior, including the features of the address locations, to obtain a three-dimensional tensor

S1-4: taking all address locations in the set L as graph nodes and the spatial adjacency relations between the address locations as graph edges, constructing the address association graph

where the node set Λ = L is the set of all address locations, and A is the adjacency matrix of the node set Λ.
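As a non-limiting illustration of how such a graph might be assembled, the following minimal sketch builds the node set and a 0/1 adjacency matrix A from address coordinates. The function name `build_address_graph`, the planar coordinates and the distance threshold `radius` are assumptions made for this example; the specification does not state how spatial adjacency is decided.

```python
import math

def build_address_graph(coords, radius):
    """Return (nodes, A) for an address association graph.

    coords: dict mapping an address id to planar (x, y) coordinates
            (hypothetical input format).
    radius: two addresses are treated as adjacent when their straight-line
            distance is at most this value (an assumed adjacency rule).
    """
    nodes = sorted(coords)
    n = len(nodes)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[nodes[i]], coords[nodes[j]]) <= radius:
                A[i][j] = A[j][i] = 1  # undirected edge
    return nodes, A

# L_l: addresses with an existing facility; L_u: candidate addresses.
existing = {"l1": (0.0, 0.0), "l2": (1.0, 0.0)}
candidates = {"u1": (0.0, 1.5), "u2": (5.0, 5.0)}

# The node set is the union L = L_l ∪ L_u.
nodes, A = build_address_graph({**existing, **candidates}, radius=2.0)
```

In this sketch the isolated candidate `u2` has no neighbors; the semantic neighbors described later in the text would be one way to connect such nodes.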

Preferably, the city region set satisfying the minimum road network distance interval is obtained by dividing the city region according to the set L, the user behavior track in the city road network structure and the administrative division of the city

The method specifically comprises the following steps:

s1-2-1: dividing the whole road network space region by taking a district of an administrative district of a city as a unit to obtain an initial divided region;

S1-2-2: with the address locations in the set L as centers and the roads of the road network as boundaries, clustering all user behavior trajectories in the region around the corresponding address locations by using a clustering algorithm to obtain |L| clusters, where |L| denotes the number of elements in the set L;

S1-2-3: taking the road with the minimum road network distance interval between the outermost travel trajectories of all adjacent clusters as the dividing line of the corresponding region, and dividing the initial divided region by using the dividing line to obtain the urban area set

where r_i is the i-th area obtained by division.
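The clustering of step S1-2-2 can be sketched as follows, purely as an illustration. Each trajectory is assigned to the address location it passes closest to, which yields one cluster per address. Straight-line distance stands in for the road network distance, and the names `trajectory_to_address_distance` and `cluster_trajectories` are hypothetical; the dividing-line search of step S1-2-3 is not shown.

```python
import math

def trajectory_to_address_distance(trajectory, address):
    """Shortest distance from any point of the trajectory to the address.

    Euclidean distance is used here as a stand-in for the road network
    distance, which would require the actual road graph.
    """
    return min(math.dist(point, address) for point in trajectory)

def cluster_trajectories(trajectories, addresses):
    """Assign every trajectory to its nearest address: |L| clusters."""
    clusters = {name: [] for name in addresses}
    for tid, traj in trajectories.items():
        nearest = min(
            addresses,
            key=lambda a: trajectory_to_address_distance(traj, addresses[a]),
        )
        clusters[nearest].append(tid)
    return clusters

addresses = {"l1": (0.0, 0.0), "l2": (10.0, 0.0)}
trajectories = {
    "tau1": [(1.0, 1.0), (2.0, 1.0)],
    "tau2": [(9.0, 1.0), (8.0, 2.0)],
    "tau3": [(4.0, 0.0), (3.0, 0.0)],
}
clusters = cluster_trajectories(trajectories, addresses)
```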

Preferably, the urban area set is extracted

The method comprises the following specific steps:

s1-3-1: extracting road network distance distribution characteristics from the track to the address position according to the space-time track characteristics in the user behaviors:

S1-3-1-1: the address location within region r_i is denoted l_i, the set of user travel trajectories in the urban road network is denoted Γ, and the trajectory inflow set of region r_i over time period t is represented as

the trajectory outflow set is represented as

the number of inflow trajectories of region r_i within time period t is represented as

and the number of outflow trajectories of region r_i within time period t is represented as

S1-3-1-2: for each trajectory in the set

computing its road network spatial distance to l_i to obtain the set of all distance results

and for each trajectory in the set

computing its road network spatial distance to l_i to obtain the set of all distance results

S1-3-1-3: using the Gaussian distribution

to fit the distribution of the distance data in

and using the Gaussian distribution

to fit the distribution of the distance data in

and taking

and

as the road network distance distribution features from the trajectories to the address location;
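By way of illustration only, fitting a Gaussian distribution to a set of trajectory-to-address distances can be reduced to estimating a mean and a variance. The sketch below uses the maximum-likelihood estimates (sample mean and population variance); the fitting procedure actually intended by the specification is not stated, and the distance values are invented for the example.

```python
import statistics

def fit_gaussian(distances):
    """Return (mean, variance) of a Gaussian fitted to the distances.

    Maximum-likelihood estimates are assumed: the sample mean and the
    population variance.
    """
    mu = statistics.fmean(distances)
    var = statistics.pvariance(distances, mu)
    return mu, var

# Hypothetical road network distances of inflow and outflow trajectories.
inflow_distances = [1.0, 2.0, 3.0, 4.0]
outflow_distances = [2.0, 2.0, 4.0, 4.0]

mu_in, var_in = fit_gaussian(inflow_distances)
mu_out, var_out = fit_gaussian(outflow_distances)

# The four numbers act as the distance distribution features of l_i.
distance_features = [mu_in, var_in, mu_out, var_out]
```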

S1-3-2: extracting the social influence features within the region according to the number of social friend relationships of the trajectory users in the user behavior and the recommendation effectiveness of celebrity "big V" accounts:

S1-3-2-1: for region r_i and time period t, the set of users who generated the travel trajectory set

is represented as

and the set of users who generated the travel trajectory set

is represented as

if user u_i and user u_j follow each other on the social media network, user u_i and user u_j are identified as friends;

for

the corresponding user social recommendation feature

is computed as:

for

the corresponding user social recommendation feature

is computed as:

where

denotes the set of friends of user u within time period t;

S1-3-2-2: taking as a celebrity big V any user u whose number of followers on the social media network, |F^f(u)|, exceeds a preset threshold ε_f, where F^f(u) denotes the set of all other users on the social media network who follow user u;

calculating the celebrity influence recommendation feature of all celebrity big Vs in region r_i as V(r_i) = {|F^f(u_1)|, ..., |F^f(u_i)|, ..., |F^f(u_m)|}, where |F^f(u_i)| denotes the number of users following celebrity big V u_i on the social media network, and m denotes the total number of celebrity big Vs in region r_i;
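The friend relation and the celebrity feature V(r_i) described in step S1-3-2 can be sketched as below, for illustration only. The input format (a dict mapping each user to the set of users they follow) and the helper names `followers_of`, `are_friends` and `celebrity_feature` are assumptions for this example.

```python
def followers_of(follows):
    """Invert a 'who follows whom' mapping into follower sets F^f(u)."""
    result = {u: set() for u in follows}
    for u, targets in follows.items():
        for v in targets:
            result.setdefault(v, set()).add(u)
    return result

def are_friends(follows, u_i, u_j):
    """Two users are friends when they follow each other."""
    return u_j in follows.get(u_i, set()) and u_i in follows.get(u_j, set())

def celebrity_feature(followers, region_users, eps_f):
    """V(r_i): follower counts of the region's users that exceed eps_f."""
    counts = [len(followers.get(u, set())) for u in region_users]
    return [c for c in counts if c > eps_f]

# Hypothetical follow relations: user -> users they follow.
follows = {"a": {"b", "c"}, "b": {"a", "c"}, "c": set(), "d": {"c"}}
followers = followers_of(follows)

friends_ab = are_friends(follows, "a", "b")
friends_ac = are_friends(follows, "a", "c")
V_r = celebrity_feature(followers, ["a", "b", "c", "d"], eps_f=2)
```

Here only user `c`, with three followers, exceeds the threshold, so V(r_i) contains a single follower count.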

S1-3-3: obtaining the similar facility competition feature

where

denotes the popularity, within time period t, of address location l_j of a similar service facility in region r_i, and n is the number of address locations of similar service facilities in region r_i;

S1-3-4: combining all the features of the urban area set and of the address locations in the user behavior to obtain, for time period t, the multi-dimensional feature vector of address location l_i in urban area r_i

as:

where

and

are respectively the mean and variance of the Gaussian distribution

and likewise

and

are respectively the mean and variance of the Gaussian distribution

S1-3-5: combining the multi-dimensional feature vectors of all address locations in the set L over the different time periods t_1, …, t_i, …, t_h of the whole time period set T to obtain the three-dimensional tensor

as:

denotes, at time period t = t_i, the multi-dimensional feature vector

and t_1, …, t_i, …, t_h are the time periods, where h is the number of time periods into which the whole time period set T is divided.

Preferably, the adjacency matrix A includes geographic neighbors, corresponding to local spatial correlations between address locations, and semantic neighbors, corresponding to global spatial correlations between address locations.

Preferably, the local hidden feature

is calculated as:

is the local hidden feature of address location l_i within time period t, K denotes the number of attention heads,

denotes the concatenation of the K groups of features in multi-head attention, σ is a non-linear activation function,

is the local hidden feature of address location l_j in time period t-1, N_l(l_i) denotes the geographic neighbors of l_i,

is the weight parameter matrix of the k-th group of features, which is shared among all address locations and learnable;

is the approximate weight coefficient between l_i and l_j for the k-th group of local hidden features,

calculated as:

where W_l^k is

the weight parameter matrix of the k-th group of local features, which is shared by all edges and learnable, and an is the concrete realization of the attention mechanism;

the global hidden feature

is calculated as:

where

is the global hidden feature of address location l_i within time period t,

is the global hidden feature of address location l_j in time period t-1, N_g(l_i) denotes the semantic neighbors of l_i on the address association graph, and W_s^k is the weight parameter matrix of the k-th group of features, which is shared by all edges and learnable;

denotes the approximate weight coefficient between l_i and l_j for the k-th group of global hidden features,

calculated as:

where

is

the weight parameter matrix of the k-th group of global features, which is shared by all edges and learnable.
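The multi-head neighbor aggregation described above can be illustrated with the following minimal sketch, which applies equally to geographic neighbors N_l(l_i) and semantic neighbors N_g(l_i). It is not the formula of the specification, which is not reproduced here. The scoring function (a dot product of transformed features) and the choice of tanh for σ are assumptions, and all names are hypothetical.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(h_prev, i, neighbors, W):
    """One head: weight the neighbors of node i, then aggregate W h_j."""
    z_i = matvec(W, h_prev[i])
    z_js = [matvec(W, h_prev[j]) for j in neighbors]
    # Assumed attention score: dot product of the transformed features.
    alpha = softmax([sum(a * b for a, b in zip(z_i, z_j)) for z_j in z_js])
    agg = [sum(a * z_j[d] for a, z_j in zip(alpha, z_js))
           for d in range(len(z_i))]
    return [math.tanh(v) for v in agg]  # sigma: tanh, chosen for illustration

def multi_head(h_prev, i, neighbors, heads):
    """Concatenate the outputs of the K heads (one weight matrix each)."""
    out = []
    for W in heads:
        out.extend(attention_head(h_prev, i, neighbors, W))
    return out

# Hidden features at time period t-1 for three address locations.
h_prev = {"l1": [1.0, 0.0], "l2": [0.0, 1.0], "l3": [1.0, 1.0]}
heads = [[[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.5, -0.5]]]  # K = 2
h_l1 = multi_head(h_prev, "l1", ["l2", "l3"], heads)
```

With K = 2 heads of output size 2, the hidden feature of `l1` at time period t has 4 components.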

Preferably, constructing the time dependency relation of address locations by using the long short-term memory (LSTM) network specifically comprises:

the input feature sequence vector of address location l_i within the time period set T is

where

and || denotes the vector concatenation operation;

the time dependency relation between the state of address location l_i at time period t-1

and its state at time period t

is:

where

denotes the forget gate of the LSTM network,

is the input gate,

is the cell state,

denotes the cell update,

denotes the output gate,

is the output, and σ is the activation function; W_f, W_z, W_C, W_o, b_f, b_z, b_C and b_o are parameters to be learned,

denotes the vector concatenation operation, and tanh is an activation function.
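For illustration, one step of a standard LSTM cell using the gate and parameter names listed above (W_f, W_z, W_C, W_o and the matching biases) may be sketched as follows. The standard LSTM update equations are assumed here; the equations of the specification are not reproduced in the text, so this is not asserted to be their exact form.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def affine(W, b, x):
    """Compute W x + b for a matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + bias
            for row, bias in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; p holds W_f, W_z, W_C, W_o, b_f, b_z, b_C, b_o."""
    v = h_prev + x_t  # vector concatenation [h_{t-1}, x_t]
    f = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], v)]          # forget gate
    z = [sigmoid(a) for a in affine(p["W_z"], p["b_z"], v)]          # input gate
    c_tilde = [math.tanh(a) for a in affine(p["W_C"], p["b_C"], v)]  # candidate
    c_t = [fi * ci + zi * cti
           for fi, ci, zi, cti in zip(f, c_prev, z, c_tilde)]        # cell update
    o = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], v)]          # output gate
    h_t = [oi * math.tanh(ci) for oi, ci in zip(o, c_t)]             # output
    return h_t, c_t

# Zero parameters make every gate equal to 0.5, which is easy to check.
hidden, inputs = 2, 3
zeros_W = [[0.0] * (hidden + inputs) for _ in range(hidden)]
params = {name: zeros_W for name in ("W_f", "W_z", "W_C", "W_o")}
params.update({name: [0.0] * hidden for name in ("b_f", "b_z", "b_C", "b_o")})

h_t, c_t = lstm_step([1.0, 2.0, 3.0], [0.0, 0.0], [1.0, -1.0], params)
```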

Preferably, predicting the missing popularity distribution according to the local hidden features, the global hidden features and the time dependency relation specifically comprises:

S4-1: obtaining the probability distribution vector of the scalar popularity value of address location l_i

S4-2: using a multi-head graph attention network to propagate the local popularity and the global popularity of the existing address locations to the address locations lacking popularity, aggregating the popularity distributions of the existing facilities at neighboring address locations, and calculating, within time period t, the local spatial popularity prediction distribution of address location l_i ∈ L_u

as:

and calculating, within time period t, the global spatial popularity prediction distribution of address location l_i ∈ L_u

as:

S4-3: obtaining the state of address location l_i output by the LSTM network in time period t-1

and calculating the time-domain popularity prediction distribution of address location l_i within time period t

as:

where W_td is the weight parameter matrix that is shared by all LSTM output states and learnable, and Softmax() denotes the softmax function;

S4-4: fusing

and

to obtain the missing popularity distribution

as:

where, for a given popularity distribution

denotes the calculation of its entropy, in which

denotes an element of the vector

and P denotes the number of dimensions of the vector
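The roles of the softmax function and of entropy in steps S4-3 and S4-4 can be illustrated with the sketch below. The time-domain distribution is computed as a softmax over W_td applied to the LSTM output state, as the text describes. The fusion rule is an assumption for this example only: each distribution is weighted by exp(-entropy), so that more confident (lower-entropy) predictions count more. The fusion formula of the specification is not reproduced in the text and may differ.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    """Shannon entropy -sum_j p_j log p_j over the P dimensions of p."""
    return -sum(v * math.log(v) for v in p if v > 0)

def time_domain_distribution(W_td, h_prev):
    """Softmax over W_td applied to the LSTM state of period t-1."""
    scores = [sum(w * v for w, v in zip(row, h_prev)) for row in W_td]
    return softmax(scores)

def fuse(distributions):
    """Assumed fusion rule: weights proportional to exp(-entropy)."""
    weights = [math.exp(-entropy(p)) for p in distributions]
    total = sum(weights)
    dims = len(distributions[0])
    return [sum(w * p[d] for w, p in zip(weights, distributions)) / total
            for d in range(dims)]

local_pred = [0.9, 0.1]    # hypothetical local spatial prediction
global_pred = [0.5, 0.5]   # hypothetical global spatial prediction
time_pred = time_domain_distribution([[2.0, 0.0], [0.0, 2.0]], [1.0, 0.0])
missing_popularity = fuse([local_pred, global_pred, time_pred])
```

Because the fused vector is a convex combination of probability distributions, it is itself a probability distribution.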

Preferably, combining the local hidden features, the global hidden features and the missing popularity distribution to obtain the final popularity prediction result specifically comprises:

S5-1: concatenating the local hidden feature

the global hidden feature

and the missing popularity distribution

to obtain the final hidden feature vector representation of address location l_i within time period t

S5-2: inputting the final hidden feature vector representation of address location l_i

into the LSTM network to obtain the final predicted popularity distribution of l_i within time period t

where l_i ∈ L_u.

The invention also provides a system for self-adaptive incremental site selection of urban service facilities, which comprises an address association graph construction module, a feature extraction module, a missing popularity prediction module and a site selection module, wherein:

the address association graph construction module acquires input data comprising the urban road network structure and user behaviors, constructs an address association graph according to the input data, and transmits the address association graph to the feature extraction module;

the feature extraction module uses a graph neural network to construct the spatial association relation of address locations, extracts local hidden features and global hidden features of the address locations from the address association graph according to the spatial association relation, and transmits the local hidden features and the global hidden features to the missing popularity prediction module and the site selection module;

the missing popularity prediction module uses a long short-term memory (LSTM) network to construct the time dependency relation of address locations, predicts the missing popularity distribution according to the local hidden features, the global hidden features and the time dependency relation, and transmits the missing popularity distribution to the site selection module;

the site selection module combines the local hidden features, the global hidden features and the missing popularity distribution to obtain a final popularity prediction result, and carries out site selection of the target facility according to the final popularity prediction result.

Compared with the prior art, the technical scheme of the invention has the following advantages:

According to the invention, urban areas are divided using urban road network structure data and an address association graph is constructed; the spatial association relation and the time dependency relation of address locations are captured simultaneously by a graph neural network and a recurrent neural network; local and global features of the address locations are extracted and fused with the predicted missing popularity distribution of the address locations to perform the final popularity prediction. Data such as user activity and social behavior are fully mined, and the prediction result obtained by combining these data with the time-space domain is accurate and scientific. Self-adaptive site selection carried out on the basis of this prediction result has high efficiency and good effect, and can provide data support and reference for the incremental site selection of various service facilities.

Drawings

In order that the present disclosure may be more readily and clearly understood, reference is now made to the following detailed description of the present disclosure taken in conjunction with the accompanying drawings, in which:

FIG. 1 is a schematic flow diagram of the present invention;

FIG. 2 is a schematic diagram of the urban area set with minimum road network distance intervals obtained by area division in the present invention;

FIG. 3 is a schematic diagram of the address association graph in the present invention.

Detailed Description

The present invention is further described below in conjunction with the following figures and specific examples so that those skilled in the art may better understand the present invention and practice it, but the examples are not intended to limit the present invention.

In the description of the present invention, it is to be understood that the meaning of "a plurality" is two or more unless specifically limited otherwise. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Referring to fig. 1, the invention discloses a method for self-adaptive incremental site selection of urban service facilities, which comprises the following steps:

S1: acquiring input data comprising the urban road network structure and user behaviors, and constructing an address association graph according to the input data; the urban road network structure comprises the road network structure and the address locations (including existing facility address locations and candidate address locations), and the user behaviors comprise the space-time travel trajectories, social relationship networks and point-of-interest check-ins of resident users.

S1-1: in the urban area corresponding to the incremental site selection problem, all address locations obtained according to the urban road network structure are represented as the set L = L_l ∪ L_u, where L_l denotes the set of address locations at which a service facility already exists and L_u denotes the set of candidate address locations without a service facility; L_l and L_u satisfy L_l ≠ L_u and L_l ≪ L_u. In order to represent the features of all address locations in a normalized form and to apply graph neural network techniques, all input data need to be preprocessed; the preprocessing comprises two stages: dividing the city into areas and extracting the features, including those of the address locations.

S1-2: Urban area division mainly targets the spatio-temporal big data of the city. The city is divided into regions according to the set L and multi-dimensional feature data in the urban road network structure, such as user behavior trajectories and the administrative divisions and streets of the city, to obtain an urban region set that satisfies the minimum road-network distance interval:

S1-2-1: Divide the whole road network space by taking the second-level or third-level administrative regions of the city as units to obtain the initial divided regions;

S1-2-2: With each address location in the set L as a center and the road network roads as boundaries, use a clustering algorithm to gather all user trajectories in a region around the corresponding address location, obtaining |L| clusters, where |L| denotes the number of elements in the set L;

S1-2-3: Take the road with the minimum (balanced) road-network distance interval between the outermost travel trajectories of each pair of adjacent clusters as the dividing line of the corresponding regions, and divide the initial divided regions along these dividing lines to obtain the urban region set, in which r_i is the i-th region obtained by the division. Each sub-region of the urban region set contains one and only one of the address locations, and the sub-regions need not be regular or equal in size.

In this embodiment, the dividing lines are obtained with the k-Medoids clustering algorithm, and the address location in region r_i is denoted l_i. First, with each address location l_i as a center and the shortest road-network distance d_s(l_i, τ_j) between a user travel trajectory τ_j and l_i as the clustering criterion, all trajectories within a region are clustered around their corresponding address location, forming |L| complete clusters. On this basis, following the minimum (balanced) interval idea of the Support Vector Machine (SVM) algorithm, the road with the minimum (balanced) road-network distance interval between the outermost travel trajectories of adjacent clusters is taken as the dividing line of the corresponding regions. Repeating this division process yields the set of divided urban regions shown in FIG. 2.
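The clustering step of this embodiment, in which each trajectory is attached to one address location, can be illustrated with a minimal sketch. It assumes straight-line distance as a stand-in for the road-network distance d_s and keeps the address locations fixed as cluster centers; the function names and sample coordinates are hypothetical, and the SVM-style selection of dividing roads is not covered.

```python
import math

def nearest_address(trajectory, addresses):
    """Return the index of the address location closest to a trajectory.

    The distance is the smallest straight-line distance from any trajectory
    point to the address, an assumed stand-in for the road-network distance d_s.
    """
    def dist(addr):
        return min(math.dist(point, addr) for point in trajectory)
    return min(range(len(addresses)), key=lambda i: dist(addresses[i]))

def cluster_trajectories(trajectories, addresses):
    """Group trajectory indices into |L| clusters, one per address location."""
    clusters = {i: [] for i in range(len(addresses))}
    for j, trajectory in enumerate(trajectories):
        clusters[nearest_address(trajectory, addresses)].append(j)
    return clusters

# Hypothetical data: two address locations and three short trajectories.
addresses = [(0.0, 0.0), (10.0, 0.0)]
trajectories = [
    [(1.0, 1.0), (2.0, 0.5)],
    [(9.0, 1.0), (8.0, 2.0)],
    [(4.0, 0.0), (3.0, 1.0)],
]
clusters = cluster_trajectories(trajectories, addresses)
```

With real data, the distance function would be replaced by a shortest-path query on the road network.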

S1-3: Extract features from the urban region set and from the address-location-related features in the user behaviors to obtain a three-dimensional tensor spanning time, space, and features.

The extraction of address-location-related features mainly targets the location-based social big data of residents. Taking a natural time period (such as 1 hour, 1 day, or 1 month) as the unit and combining the results of the urban area division, the multi-dimensional context features of each region and its address location are extracted and represented uniformly. Specifically, features are extracted from representative aspects such as users' spatio-temporal trajectories, users' social influence, and competition from other facilities of the same type.

S1-3-1: Within the region of an address location, three sub-features most relevant to users' daily behavior trajectories are considered: the inflow count of regional trajectories (Inflow), the outflow count of regional trajectories (Outflow), and the distribution of road-network distances from the trajectories to the unique address location in the region. The road-network distance distribution features from trajectories to address locations are extracted from the spatio-temporal trajectory features in the user behaviors as follows:

S1-3-1-1: Denote the address location within region r_i as l_i and the set of user travel trajectories in the urban road network as Γ; the trajectories in Γ that flow into region r_i within time period t form the inflow trajectory set, and those that flow out of it form the outflow trajectory set.

S1-3-1-2: For each trajectory τ_j in the inflow trajectory set, compute the round-trip road-network distance d_m(l_i, τ_j) between τ_j and l_i to obtain the set of all inflow distance results. Likewise, for each trajectory τ_j in the outflow trajectory set, compute the round-trip road-network distance d_m(l_i, τ_j) between τ_j and l_i to obtain the set of all outflow distance results.

S1-3-1-3: Fit one Gaussian distribution to the distance data in the inflow distance result set and another Gaussian distribution to the distance data in the outflow distance result set, yielding the two fitted distance distributions.
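The Gaussian fitting of S1-3-1-3 can be sketched as below. The sketch assumes a maximum-likelihood fit (sample mean and population variance); the distance values are hypothetical, and the text does not state which estimator the embodiment uses.

```python
from statistics import fmean, pvariance

def fit_gaussian(distances):
    """Maximum-likelihood Gaussian fit: returns (mean, variance)."""
    mu = fmean(distances)
    return mu, pvariance(distances, mu)

# Hypothetical round-trip road-network distances d_m in meters.
inflow_distances = [1200.0, 1500.0, 1800.0]
outflow_distances = [900.0, 1100.0]

mu_in, var_in = fit_gaussian(inflow_distances)
mu_out, var_out = fit_gaussian(outflow_distances)
```

The two (mean, variance) pairs are the quantities that later enter the multi-dimensional feature vector of the address location.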

S1-3-2: Extract the social influence features within the region from the user behaviors, based on the number of social friend relationships among the users who generated the trajectories (the word-of-mouth recommendation influence of the social relationship network) and on the recommendation effect of celebrity "big V" accounts (the recommendation effect of celebrities or internet influencers, that is, users followed by many people in the social relationship network):

S1-3-2-1: For the inflow and outflow travel trajectory sets of region r_i within time period t, denote the sets of users who generated those trajectories as the corresponding inflow and outflow user sets. If user u_i and user u_j follow each other on a social media network (such as Weibo or Douban), they are regarded as friends. A user social recommendation feature is then computed for each of the two user sets from the set of friends of each user u within time period t. Note that a friend of u must also be a user who appears (generates a travel trajectory) in the same time period t.
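The friend relation used in S1-3-2-1 (mutual following, restricted to users active in the same time period) can be sketched as follows. The input structure `follows` and the user identifiers are hypothetical, and the sketch derives only the per-user friend sets, not the social recommendation feature itself.

```python
def friends_in_slice(follows, active_users):
    """Map each active user to the mutual followers who are also active.

    follows[u] is the set of accounts that user u follows (hypothetical input);
    active_users are the users who generated a trajectory in time period t.
    """
    friends = {}
    for u in active_users:
        friends[u] = {
            v for v in follows.get(u, set())
            if v in active_users and u in follows.get(v, set())
        }
    return friends

follows = {"a": {"b", "c"}, "b": {"a"}, "c": {"d"}, "d": {"a"}}
active = {"a", "b", "c"}
friends = friends_in_slice(follows, active)
```

Here "a" and "b" follow each other and are both active, so they are friends; "c" follows only an inactive user and has no friends in this time period.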

S1-3-2-2: A user u whose number of followers on the social media network, |F^f(u)|, reaches or exceeds a preset threshold ε_f, that is, |F^f(u)| ≥ ε_f, is treated as a celebrity big V, where F^f(u) denotes the set of all other users who follow user u on the social media network; in this embodiment ε_f is 1100.

Assuming that the recommendation influence of a celebrity big V is valid in all time periods, the celebrity influence recommendation feature of all big V users in region r_i is V(r_i) = {|F^f(u_1)|, ..., |F^f(u_i)|, ..., |F^f(u_m)|}, where |F^f(u_i)| is the number of users who follow big V u_i on the social media network and m is the total number of big V users in region r_i;
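A minimal sketch of the celebrity influence feature V(r_i) follows. It assumes follower counts are already available per user; the dictionary layout and user identifiers are hypothetical, while the threshold 1100 is the value stated for this embodiment.

```python
EPSILON_F = 1100  # follower threshold used in this embodiment

def celebrity_feature(follower_counts, region_users, threshold=EPSILON_F):
    """Return the follower counts |F^f(u)| of the big V users in a region."""
    return [
        follower_counts[u]
        for u in region_users
        if follower_counts.get(u, 0) >= threshold
    ]

# Hypothetical follower counts for three users seen in region r_i.
follower_counts = {"u1": 5000, "u2": 300, "u3": 1100}
v_feature = celebrity_feature(follower_counts, ["u1", "u2", "u3"])
```

Because the number of big V users differs between regions, the resulting vectors have different lengths and need alignment before they are stacked.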

S1-3-3: Obtain the competition feature of same-type facilities by considering the influence of other existing facilities of the same type in the region on the popularity prediction of the address location.

S1-3-4: Combine the urban region set with all the address-location-related features in the user behaviors to obtain the multi-dimensional feature vector of address location l_i in urban region r_i within time period t. In this vector, the first pair of Gaussian terms are the mean and variance of the Gaussian distribution fitted to the inflow distance data, and the second pair are the mean and variance of the Gaussian distribution fitted to the outflow distance data.

Over the whole time period set T and the urban region set, the multi-dimensional feature vectors are assembled into the three-dimensional feature tensor. Feature vectors of the same kind whose lengths differ can be aligned by zero padding. For example, the lengths of features such as V(r_i) and M^t(r_i) may differ between regions, and they must all be zero-padded to the length of the longest vector for subsequent processing.

In the tensor, each slice is the multi-dimensional feature vector at time period t = t_i. Here t_1, ..., t_i, ..., t_h are consecutive time periods of equal length, and h is the number of time periods into which the whole time period set T is divided, that is, t_1 + ... + t_i + ... + t_h = T, or T = {t_1, ..., t_i, ..., t_h}.
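The zero-padding alignment and the assembly of a time × space × feature tensor can be sketched as below. The nested-list layout, the function names, and the sample values are assumptions for illustration; the sketch also assumes every time period lists the same regions in the same order.

```python
def zero_pad(vectors):
    """Right-pad every vector with zeros to the length of the longest one."""
    width = max(len(v) for v in vectors)
    return [list(v) + [0.0] * (width - len(v)) for v in vectors]

def build_tensor(features):
    """features[t][i] is the feature vector of region i in time period t.

    Returns a nested list indexed as tensor[time][region][feature].
    """
    flat = [vec for period in features for vec in period]
    padded = zero_pad(flat)
    n_regions = len(features[0])
    return [padded[t * n_regions:(t + 1) * n_regions]
            for t in range(len(features))]

features = [
    [[1.0, 2.0, 3.0], [4.0]],        # time period t_1, regions r_1 and r_2
    [[5.0, 6.0], [7.0, 8.0, 9.0]],   # time period t_2
]
tensor = build_tensor(features)
```

In practice the padded tensor would usually be converted to a numeric array type before being fed to a neural network.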

S1-4: Take all address locations in the set L as graph nodes and the spatial adjacency relations between address locations as graph edges to construct the address association graph (LCG), in which the node set Λ = L is the set of all address locations, the attribute feature tensor holds the features of the node set Λ, and A is the adjacency matrix of the node set Λ. The address association graph is an undirected graph (network) structure made up of the address locations, the regional address features, and the connection relations between the address locations.

The adjacency relations in the adjacency matrix A comprise geographic neighbors and semantic neighbors. Geographic neighbors correspond to local spatial correlation between address locations, and semantic neighbors correspond to global spatial correlation between address locations. Specifically:

If two regions r_i and r_j are adjacent or close to each other in the geographic space of the road network, the corresponding address location nodes l_i and l_j in the address association graph are also adjacent. In addition, considering the planning integrity of the second-level or third-level administrative divisions of the city, if the road-network distance between two address locations l_i and l_j does not exceed a threshold ε_g, that is, d_s(l_i, l_j) ≤ ε_g (ε_g is 5000 meters in this embodiment), they are also regarded as geographic neighbors.

Address location facilities that are geographically far apart are defined as semantic neighbors, and are therefore also adjacent, if the regions in which they are located are functionally similar. Functional similarity is determined by measuring the similarity of the inflow and outflow patterns of user trajectories in the two regions. Specifically, the semantic similarity between regions is computed with the Pearson correlation coefficient, using for each region r_i the sequence (vector) of inflow counts and the sequence (vector) of outflow counts of trajectories over all time periods. The semantic similarity Si(r_i, r_j) of regions r_i and r_j is then defined as a weighted combination of the correlation of their inflow count sequences and the correlation of their outflow count sequences, where PC_s is the Pearson correlation coefficient and α_s is a non-negative weight parameter, 0 ≤ α_s ≤ 1, that determines the respective weights of the inflow and outflow count sequences. Given a threshold ε_s (0.45 in this embodiment), regions r_i and r_j are semantic neighbors, that is, nodes l_i and l_j are linked in the address association graph, if and only if Si(r_i, r_j) ≥ ε_s.
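The semantic-neighbor test can be sketched as follows. The sketch assumes the weighted combination takes the convex form α_s · PC_s(inflow) + (1 − α_s) · PC_s(outflow), which is consistent with the stated range of α_s but is not confirmed by the text; the sample sequences and the default α_s = 0.5 are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant sequence has no defined correlation; treat it as 0 here.
    return cov / (sx * sy) if sx and sy else 0.0

def semantic_similarity(in_i, out_i, in_j, out_j, alpha=0.5):
    """Assumed convex combination of inflow and outflow correlations."""
    return alpha * pearson(in_i, in_j) + (1 - alpha) * pearson(out_i, out_j)

EPSILON_S = 0.45  # threshold used in this embodiment

# Hypothetical inflow/outflow counts of two regions over four time periods.
sim = semantic_similarity([1, 2, 3, 4], [4, 3, 2, 1], [2, 4, 6, 8], [8, 6, 4, 2])
is_semantic_neighbor = sim >= EPSILON_S
```

The two sample regions have proportional flow patterns, so both correlations are close to 1 and the pair passes the threshold.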

Combining the notions of geographic neighbors and semantic neighbors, the adjacency matrix A of the address association graph marks nodes l_i and l_j as adjacent when they are geographic neighbors or semantic neighbors.

The constructed address association graph is shown in FIG. 3, in which the feature matrix attached to a node, for example that of region r_1, covers all time periods T; the two dimensions of the feature matrix are the time periods and the feature vector, respectively.
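Building the adjacency matrix from the two neighbor types can be sketched as below. The sketch assumes a binary, symmetric matrix with a zero diagonal, which the text does not confirm; the input layouts and sample values are hypothetical, while the thresholds 5000 meters and 0.45 are the values stated for this embodiment.

```python
def build_adjacency(n, geo_distance, similarity, region_adjacent,
                    eps_g=5000.0, eps_s=0.45):
    """Return a symmetric 0/1 adjacency matrix over n address locations.

    geo_distance[i][j]: road-network distance d_s in meters (hypothetical input)
    similarity[i][j]:   semantic similarity of regions r_i and r_j
    region_adjacent:    set of frozenset({i, j}) pairs whose regions touch
    """
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            geographic = (frozenset((i, j)) in region_adjacent
                          or geo_distance[i][j] <= eps_g)
            semantic = similarity[i][j] >= eps_s
            if geographic or semantic:
                A[i][j] = A[j][i] = 1
    return A

# Hypothetical inputs for three address locations.
geo = [[0, 3000, 20000], [3000, 0, 9000], [20000, 9000, 0]]
sim = [[1.0, 0.1, 0.8], [0.1, 1.0, 0.1], [0.8, 0.1, 1.0]]
A = build_adjacency(3, geo, sim, region_adjacent=set())
```

Locations 0 and 1 are linked as geographic neighbors, locations 0 and 2 as semantic neighbors, and locations 1 and 2 are not linked.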

S2: Construct the spatial association of address locations using graph neural network technology, and extract the local hidden features and global hidden features of the address locations from the address association graph according to that spatial association. Graph neural networks are an active area of deep learning and an important tool for computing on and learning from graph models. Introducing a graph structure into the planning and site selection of urban facilities means representing address locations as graph nodes and the spatial association between address location facilities as graph edges. Related techniques such as graph convolutional networks and recurrent neural networks can then be used to model the incremental site selection problem of service facilities and to mine the nonlinear spatial association and temporal dependency between candidate address locations, so that the popularity of candidate address locations in different future time periods is predicted efficiently and accurately, providing data support for incremental site selection.

In the spatial domain, service facilities of the same type at two adjacent address locations usually correlate with and influence each other: if the facility at one address location attracts more visiting users, the visit volume of other nearby facilities of the same type is very likely to be affected, and this influence is complex and not necessarily linear. Beyond local spatial correlation, there is generally also global spatial correlation between same-type facilities located far apart; for example, same-type facilities in functionally similar areas may have similar visit volume (popularity) distributions. To this end, graph neural network technology is introduced: a context graph convolution module (CxtConv) models the complex spatial correlations between address location facilities, and the address association graph is used to capture both the local and the global spatial correlations between address locations. CxtConv is implemented on a Multi-head Graph ATtention (MGAT) network over the spatial structure, which uses an attention mechanism to determine the nonlinear local and global association weight coefficients between different neighboring nodes; the two cases are referred to as local multi-head graph attention and global multi-head graph attention, respectively.

In the local multi-head graph attention implementation of the context graph convolution module, the local hidden feature of address location l_i within time period t is computed from the local hidden features of its neighbors l_j in time period t-1, where: K denotes the number of attention heads; the concatenation operator joins the K groups of features in multi-head attention; σ is a nonlinear activation function, which in this embodiment is the Parametric ReLU (PReLU) function; N_l(l_i) denotes the geographic neighbors of l_i, that is, the one-hop (1-Hop) neighbors of l_i in the address association graph; and each neighbor contributes through the k-th group association weight coefficient between the local hidden features of l_i and l_j.

The association weight coefficient is computed using W_l^k, a learnable weight parameter matrix for the k-th group of local features that is shared by all edges of the address association graph, and an, a concrete implementation of the attention mechanism; in this embodiment, an is the dot-product operation.
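The local multi-head graph attention described above can be illustrated with a minimal pure-Python sketch. It assumes the common graph-attention form, in which dot-product scores between projected features are normalized with a softmax over the neighbors and the K head outputs are concatenated; the softmax normalization, the fixed PReLU slope, and all names and sample values are assumptions rather than details given in the text.

```python
import math

def prelu(x, slope=0.25):
    """Parametric ReLU with a fixed (normally learned) negative slope."""
    return x if x >= 0 else slope * x

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def attention_head(h, i, neighbors, W):
    """One head: dot-product scores, softmax over neighbors, weighted sum."""
    proj = {j: matvec(W, h[j]) for j in set(neighbors) | {i}}
    scores = [sum(a * b for a, b in zip(proj[i], proj[j])) for j in neighbors]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(proj[i])
    agg = [sum(w * proj[j][d] for w, j in zip(weights, neighbors))
           for d in range(dim)]
    return [prelu(a) for a in agg]

def multi_head(h, i, neighbors, heads):
    """Concatenate the outputs of K heads, one weight matrix per head."""
    out = []
    for W in heads:
        out.extend(attention_head(h, i, neighbors, W))
    return out

# Hypothetical hidden features from time period t-1 for three nodes.
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
heads = [[[1.0, 0.0], [0.0, 1.0]], [[2.0, 0.0], [0.0, 2.0]]]  # K = 2
local_feature = multi_head(h, 0, [1, 2], heads)
```

The global variant would differ only in using the semantic neighbors and its own weight matrices.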

Similarly, in the global multi-head graph attention implementation of the context graph convolution module, the global hidden feature of address location l_i within time period t is computed from the global hidden features of its neighbors l_j in time period t-1, where N_g(l_i) denotes the semantic neighbors of l_i in the address association graph and W_s^k is a learnable weight parameter matrix for the k-th group of features that is shared by all edges. Each neighbor contributes through the k-th group association weight coefficient between the global hidden features of l_i and l_j, which is computed using a learnable weight parameter matrix for the k-th group of global features that is shared by all edges of the address association graph.

S3: Construct the temporal dependency of address locations using a long short-term memory (LSTM) network, a type of recurrent neural network.

In the time domain, the popularity of an address location in one time period depends on its popularity in the preceding time periods. An LSTM model is therefore used to model the dynamic temporal dependency of address location facilities and to mine their dependency across periodic time spans such as days, weeks, or months.

The input feature sequence vector of address location l_i over the time period set T is formed from its per-period feature vectors, where ∥ denotes the vector concatenation operation;

According to the definition of the LSTM, the temporal dependency between the state of address location l_i at time period t-1 and its state at time period t in the long short-term memory model is expressed through the following quantities: the forget gate of the LSTM network, the input gate, the cell state, the cell update, the output gate, and the output. W_f, W_z, W_C, W_o, b_f, b_z, b_C, and b_o are parameters to be learned; the bracket operator denotes vector concatenation; σ is the activation function, which in this embodiment is the Sigmoid function; tanh is the hyperbolic tangent activation function; and the element-wise operator is the Hadamard product, that is, the matrix formed by multiplying the elements at corresponding positions of two matrices of the same order.
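For reference, the standard LSTM recurrences can be written with the parameter names listed above. This is the textbook formulation, given as an assumption about the form the embodiment uses: the symbols x_t (input feature vector at time period t), h_t (output), f_t, z_t, o_t (forget, input, and output gates), and the candidate and updated cell values are notation introduced here, and which of the two cell quantities the text calls the cell state and which the cell update is not confirmed.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \\
z_t &= \sigma\left(W_z\,[h_{t-1}, x_t] + b_z\right) \\
\tilde{C}_t &= \tanh\left(W_C\,[h_{t-1}, x_t] + b_C\right) \\
C_t &= f_t \odot C_{t-1} + z_t \odot \tilde{C}_t \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \\
h_t &= o_t \odot \tanh\left(C_t\right)
\end{aligned}
```

Here [h_{t-1}, x_t] is the vector concatenation and ⊙ is the Hadamard product.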

S4: Predict the missing popularity distribution from the local hidden features, the global hidden features, and the temporal dependency. Using the historical check-in data (that is, the popularity) of the address location facilities already present in the road network as an important input feature improves the prediction of address location popularity over a future period. However, because the data are sparse, few address location facilities have historical check-in records. The method therefore predicts the missing popularity of address locations in the spatio-temporal domain, through the following sub-steps:

S4-1: Predict, in the spatial domain, the popularity distribution of the address locations with missing popularity in the historical and future phases. To fully retain popularity information for use in the next stage, the prediction module does not use scalar popularity values but represents popularity as a probability distribution vector of popularity values; that is, for address location l_i, the probability distribution vector of its scalar popularity value is used in place of the scalar value itself;

S4-2: In the spatial domain, based on the similarity of the features of urban regions and address locations, a multi-head graph attention network propagates the local and global popularity of existing address locations to the address locations whose popularity is missing. To this end, a spatial-domain prediction convolution module (PrdConv) is introduced to aggregate the popularity distributions of the existing neighboring address location facilities and to compute, within time period t, the local spatial popularity prediction distribution of address location l_i ∈ L_u and, similarly, its global spatial popularity prediction distribution;

S4-3: Predict, in the time domain, the popularity distribution of the address locations with missing popularity in the historical and future phases, again in the form of a probability distribution vector of popularity values. To this end, the state of the previous time period output by the LSTM recurrent neural network is reused, and the popularity of the address location within the current time period is predicted with the normalized exponential function (Softmax). Given the state output by the LSTM network for address location l_i in time period t-1, the time-domain popularity prediction distribution of l_i within time period t is computed by applying Softmax() to that state weighted by W_td, where W_td is a learnable weight parameter matrix shared by all LSTM output states and Softmax() denotes the Softmax function;
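The time-domain prediction of S4-3 can be sketched as a Softmax over a linear map of the previous LSTM state. The matrix values, the state vector, and the function names are hypothetical; the sketch assumes the prediction is Softmax(W_td · state), with no bias term.

```python
import math

def softmax(values):
    """Numerically stable normalized exponential function."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def temporal_popularity(W_td, state_prev):
    """Map the LSTM state of period t-1 to a popularity distribution."""
    logits = [sum(w * s for w, s in zip(row, state_prev)) for row in W_td]
    return softmax(logits)

# Hypothetical 3-bin popularity distribution from a 2-dimensional state.
W_td = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dist = temporal_popularity(W_td, [0.5, -0.5])
```

The output has one entry per popularity bin and sums to 1, so it can be compared directly with the spatial predictions.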

S4-4: Fuse the three predicted popularity distributions, namely the local spatial, global spatial, and time-domain predictions, using an entropy-based mechanism to obtain the missing popularity distribution. The mechanism relies on the entropy of a given popularity distribution, computed over the components of the distribution vector, where p denotes the number of dimensions of the vector.
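One way an entropy-based fusion could work is sketched below. The entropy is the standard Shannon entropy of a probability vector; the inverse-entropy weighting is an assumed choice made for illustration (more confident, lower-entropy predictions get more weight) and is not confirmed as the fusion rule of this embodiment. All sample distributions are hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy of a probability vector (natural logarithm)."""
    return -sum(p * math.log(p) for p in dist if p > 0)

def fuse(distributions, eps=1e-9):
    """Weight each distribution by inverse entropy, then renormalize.

    Inverse-entropy weighting is an assumed choice, not the original formula.
    """
    weights = [1.0 / (entropy(d) + eps) for d in distributions]
    total = sum(weights)
    p = len(distributions[0])
    return [sum(w * d[k] for w, d in zip(weights, distributions)) / total
            for k in range(p)]

local_dist = [0.7, 0.2, 0.1]       # local spatial prediction
global_dist = [0.4, 0.3, 0.3]      # global spatial prediction
temporal_dist = [0.9, 0.05, 0.05]  # time-domain prediction
fused = fuse([local_dist, global_dist, temporal_dist])
```

Because the weights are normalized, the fused vector is a convex combination of the inputs and remains a probability distribution.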

S5-1: Concatenate the local hidden feature, the global hidden feature, and the missing popularity distribution of address location l_i to form its final hidden feature vector, where ∥ denotes the concatenation operation;

S5-2: Input the final hidden feature vector of address location l_i into the long short-term memory network to obtain the final predicted popularity distribution of l_i within time period t, where l_i ∈ L_u.

S5-3: A Multi-Layer Perceptron (MLP) scales the final predicted popularity distribution vector to obtain the scalar future popularity prediction of the candidate address location, which is the final output of the site selection model. Using these scalar predictions, the candidate address locations can be sorted along the time dimension or the space dimension to generate multiple ordered lists of candidate address locations based on spatio-temporal popularity, which serve the task of querying and selecting candidate address locations for the incremental construction of service facilities. For example, according to the ranked list, the candidate address locations with Top-5 popularity over the next half year can be selected for building a commercial service facility; or the month-by-month change in popularity over the next year (such as the off season and peak season of the tourism industry) can be used to decide whether the facility at a candidate address location should temporarily suspend operation.
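The ranking step at the end of S5-3 can be sketched as a simple sort over predicted scalar popularity values. The candidate identifiers and scores are hypothetical, and the sketch covers a single time period only.

```python
def top_k_candidates(predictions, k=5):
    """Return the k candidate ids with the highest predicted popularity."""
    ranked = sorted(predictions.items(), key=lambda item: item[1], reverse=True)
    return [candidate for candidate, _ in ranked[:k]]

# Hypothetical scalar popularity predictions for four candidate locations.
predictions = {"l_a": 0.62, "l_b": 0.91, "l_c": 0.15, "l_d": 0.77}
best = top_k_candidates(predictions, k=2)
```

Sorting along the time dimension would apply the same function to the predictions of one candidate across several time periods.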

The invention also discloses a system for self-adaptive incremental site selection of the urban service facilities, which comprises an address association diagram building module, a feature extraction module, a missing popularity prediction module and a site selection module.

The address association graph construction module acquires input data comprising the urban road network structure (the road network and the address locations) and user behaviors (spatio-temporal travel trajectories, social relationship networks, and point-of-interest check-ins), constructs the address association graph from the input data, and passes it to the feature extraction module. The feature extraction module uses a graph neural network to construct the spatial association of address locations, extracts the local and global hidden features of the address locations from the address association graph according to that association, and passes them to the missing popularity prediction module and the site selection module. The missing popularity prediction module uses a long short-term memory network to construct the temporal dependency of address locations, predicts the missing popularity distribution from the local hidden features, the global hidden features, and the temporal dependency, and passes it to the site selection module. The site selection module combines the local hidden features, the global hidden features, and the missing popularity distribution to obtain the final popularity prediction, and selects the site of the target facility accordingly.

From a data-driven, data-model perspective, the invention comprehensively considers and evaluates influence factors such as the road network structure, traffic flow, crowd mobility, residents' point-of-interest visits, and social media networks. It focuses on how users' spatio-temporal dynamic behavior patterns and social relationship networks affect candidate address locations, and also takes into account factors such as the type of service facility to be built. This enriches, under a unified view, the multi-dimensional features used to evaluate the popularity of candidate address locations and maximizes the value extracted from the data. For the data processing model, graph (network) technology is introduced and a graph convolutional network is combined with a recurrent neural network, so that the temporal and spatial dependencies of candidate address locations are captured simultaneously in the time domain and the spatial domain, which supports the study of the facility site selection problem and improves the processing efficiency and accuracy of site selection.

The method divides the city into regions using urban road network structure data, constructs the address association graph, uses a graph neural network and a recurrent neural network to capture the spatial association and the temporal dependency of address locations simultaneously, extracts the local and global features of the address locations, and fuses them with the predicted missing popularity distribution to produce the final popularity prediction. By mining data such as user activity and social behavior and combining the time and spatial domains, the method obtains accurate prediction results and then performs self-adaptive site selection based on them. It is an efficient, semi-supervised deep spatio-temporal method for incremental site selection of service facilities, and it can provide data support and reference for the incremental site selection of many kinds of service facilities.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It should be understood that the above embodiments are given only for clarity of illustration and do not limit the implementations. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all implementations here. Obvious variations or modifications may be made without departing from the spirit or scope of the invention.

Claims

1. A method for self-adaptive incremental site selection of urban service facilities, characterized by comprising the following steps:

S3: constructing the temporal dependency of the address locations using a long short-term memory network;

2. The method for self-adaptive incremental site selection of urban service facilities according to claim 1, wherein constructing an address association graph according to the input data specifically comprises:

S1-1: obtaining the set of all address locations L = L_l ∪ L_u according to the urban road network structure, where L_l denotes the set of address locations where a service facility already exists and L_u denotes the set of candidate address locations without a service facility;

S1-3: extracting the urban area set

3. The method for self-adaptive incremental site selection of urban service facilities according to claim 2, wherein dividing the city into regions according to the set L, the user behavior trajectories in the urban road network structure, and the administrative divisions of the city, to obtain an urban region set satisfying the minimum road-network distance interval, specifically comprises:

S1-2-1: dividing the whole road network space by taking the administrative regions of the city as units to obtain the initial divided regions;

S1-2-2: with each address location in the set L as a center and the road network roads as boundaries, using a clustering algorithm to gather all user behavior trajectories in a region around the corresponding address location, obtaining |L| clusters, where |L| denotes the number of elements in the set L;

S1-2-3: taking the road with the minimum road-network distance interval between the outermost travel trajectories of each pair of adjacent clusters as the dividing line of the corresponding regions, and dividing the initial divided regions along these dividing lines to obtain the urban region set, where r_i is the i-th region obtained by the division.

4. The method for self-adaptive incremental site selection of urban service facilities according to claim 3, wherein extracting features from the urban region set and from the address-location-related features in the user behaviors to obtain a three-dimensional tensor specifically comprises:

S1-3-1-1: denoting the address location within region r_i as l_i and the set of user travel trajectories in the urban road network as Γ, denoting the trajectories flowing into region r_i within time period t as the inflow trajectory set and those flowing out as the outflow trajectory set, and recording the number of inflow trajectories of region r_i within time period t;

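S1-3-1-2: computing the round-trip road-network distance between each trajectory in the inflow trajectory set and l_i to obtain the inflow distance result set, and computing the round-trip road-network distance between each trajectory in the outflow trajectory set and l_i to obtain the outflow distance result set;

S1-3-1-3: fitting one Gaussian distribution to the distance data in the inflow distance result set and another Gaussian distribution to the distance data in the outflow distance result set, yielding the two fitted distance distributions;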

s1-3-2-1: will be the region r _i Travel trajectory set over time period t

The corresponding generation user set is represented as

Will be the region r _i Travel trajectory set within time period t

The corresponding generation user set is represented as

If user u _i And user u _j Paying attention to each other on the social media network, the user u is put into use _i And user u _j Identifying as a friend;

computing, for

the corresponding user social recommendation feature

as:

and computing, for

the corresponding user social recommendation feature

as:

wherein

denotes the set of friends of user u within time slice t;

calculating the celebrity influence recommendation feature of all celebrity (big-V) accounts in region r_i as V(r_i) = {|F^f(u_1)|, ..., |F^f(u_i)|, ..., |F^f(u_m)|}, wherein |F^f(u_i)| denotes the number of users following celebrity u_i on the social media network, and m denotes the total number of celebrity accounts in region r_i;
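The friend definition (mutual following) and the celebrity influence feature V(r_i) can be sketched as follows. This is only an illustration: the follow graph is represented as a hypothetical dictionary mapping each user to the set of accounts that user follows, and the function names `mutual_friends` and `celebrity_influence` are not taken from the source.

```python
def mutual_friends(follows, user):
    """Return the users who follow `user` and are followed back by `user`."""
    return {
        other
        for other in follows.get(user, set())
        if user in follows.get(other, set())
    }


def celebrity_influence(follows, celebrities):
    """Return the follower counts |F^f(u)| for each celebrity of a region."""
    counts = []
    for celebrity in celebrities:
        followers = {u for u, followed in follows.items() if celebrity in followed}
        counts.append(len(followers))
    return counts


# Hypothetical follow graph: user -> accounts that the user follows.
follows = {
    "alice": {"bob", "vip"},
    "bob": {"alice", "vip"},
    "carol": {"vip"},
    "vip": set(),
}
friends_of_alice = mutual_friends(follows, "alice")
influence = celebrity_influence(follows, ["vip"])
```

Here `influence` plays the role of V(r_i) for a region with a single celebrity account (m = 1).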

S1-3-3: obtaining the competition feature of homogeneous facilities

wherein

S1-3-4: combining the urban region set with all the features containing address positions in the user behavior to obtain the multi-dimensional feature vector of address location l_i in urban region r_i during time period t

as:

and

are respectively, for the Gaussian distribution

the mean and the variance,

and

are respectively, for the Gaussian distribution

the mean and the variance;
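Fitting a Gaussian distribution to the travel distance data reduces to estimating a mean and a variance, which then enter the feature vector. The sketch below uses the maximum likelihood estimates from the Python standard library; the function name `fit_gaussian` and the sample distances are hypothetical, and the source does not state which estimator is used.

```python
import statistics


def fit_gaussian(distances):
    """Return the (mean, variance) of a Gaussian fitted to travel distances.

    ASSUMPTION: maximum likelihood fit, i.e. the population variance.
    """
    mean = statistics.fmean(distances)
    variance = statistics.pvariance(distances, mu=mean)
    return mean, variance


# Hypothetical travel distances of inflow trajectories, in kilometres.
inflow_distances = [1.0, 2.0, 3.0, 4.0]
mu_in, var_in = fit_gaussian(inflow_distances)
```

The same function would be applied separately to the outflow distances to obtain the second pair of parameters.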

is:

wherein

denotes the multi-dimensional feature vector at time period t = t_i

5. The method for self-adaptive incremental site selection of urban service facilities according to any one of claims 2 to 4, wherein the adjacency relations in the adjacency matrix A comprise geographic neighbors and semantic neighbors; the geographic neighbors correspond to the local spatial correlation between address positions, and the semantic neighbors correspond to the global spatial correlation between address positions.
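One possible way to build the two kinds of adjacency is sketched below. The claim does not state how geographic or semantic neighbors are determined, so both rules are assumptions: a distance threshold `max_dist` for geographic neighbors, and a cosine similarity threshold `min_sim` on feature vectors for semantic neighbors among positions that are not already geographic neighbors. All names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def build_adjacency(coords, features, max_dist, min_sim):
    """Return (geographic, semantic) 0/1 adjacency matrices.

    ASSUMPTION: nearby positions are geographic neighbors; distant positions
    with similar feature vectors are semantic neighbors.
    """
    n = len(coords)
    geo = [[0] * n for _ in range(n)]
    sem = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if math.dist(coords[i], coords[j]) <= max_dist:
                geo[i][j] = 1
            elif cosine(features[i], features[j]) >= min_sim:
                sem[i][j] = 1
    return geo, sem


coords = [(0.0, 0.0), (1.0, 0.0), (50.0, 0.0)]
features = [(1.0, 0.0), (0.0, 1.0), (2.0, 0.1)]
geo, sem = build_adjacency(coords, features, max_dist=2.0, min_sim=0.9)
```

In this toy example positions 0 and 1 are geographic neighbors, while positions 0 and 2 are far apart but have similar features and therefore become semantic neighbors.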

6. The method for self-adaptive incremental site selection of urban service facilities according to claim 4, wherein the local hidden feature

is calculated as:

wherein

is the local hidden feature of address location l_i in time period t, K denotes the number of attention heads,

denotes the concatenation of the K groups of features in multi-head attention, σ is a nonlinear activation function,

is calculated as:

is

the global hidden feature

is calculated as:

wherein

is the global hidden feature of address location l_i in time period t,

is the global hidden feature of address location l_j in time period t-1, N_g(l_i) is the set of semantic neighbors of l_i on the address association graph,

is a learnable weight parameter matrix for the k-th group of features, shared by all edges;

is calculated as:

is

7. The method for self-adaptive incremental site selection of urban service facilities according to claim 6, wherein constructing the time dependency relationship of the address positions by using the long short-term memory (LSTM) network specifically comprises:

wherein

wherein || denotes the concatenation of vectors;

the time dependency between the state of address location l_i at time period t-1

and its state at time period t

is as follows:

denotes the forget gate of the long short-term memory network,

is the input gate,

is the cell state,

denotes the cell update,

denotes the output gate,

denotes the vector concatenation operation, and tanh is an activation function.
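The gates named in this claim match those of the standard long short-term memory cell. Because the formulas are not reproduced in this text, the sketch below uses the textbook LSTM equations as an assumption rather than the exact equations of the claim. The parameter names (`W_f`, `b_f`, and so on) and the function `lstm_step` are hypothetical, and the input `x` is assumed to stand for the concatenated local and global hidden features.

```python
import math


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def affine(weights, vector, bias):
    """Compute W @ vector + bias using plain lists."""
    return [
        sum(w * v for w, v in zip(row, vector)) + b
        for row, b in zip(weights, bias)
    ]


def lstm_step(x, h_prev, c_prev, params):
    """One textbook LSTM step: returns the new hidden state and cell state."""
    z = h_prev + x  # concatenation of previous hidden state and input
    forget = [sigmoid(v) for v in affine(params["W_f"], z, params["b_f"])]
    inp = [sigmoid(v) for v in affine(params["W_i"], z, params["b_i"])]
    cand = [math.tanh(v) for v in affine(params["W_c"], z, params["b_c"])]
    out = [sigmoid(v) for v in affine(params["W_o"], z, params["b_o"])]
    # Cell update: keep part of the old state, add part of the candidate.
    c = [f * cp + i * g for f, cp, i, g in zip(forget, c_prev, inp, cand)]
    h = [o * math.tanh(cv) for o, cv in zip(out, c)]
    return h, c


# Hidden size 1, input size 2, all parameters zero so every gate equals 0.5.
hidden, inputs = 1, 2
params = {n: [[0.0] * (hidden + inputs)] for n in ("W_f", "W_i", "W_c", "W_o")}
params.update({n: [0.0] * hidden for n in ("b_f", "b_i", "b_c", "b_o")})
h, c = lstm_step([1.0, -1.0], [0.0], [2.0], params)
```

With zero parameters the forget gate halves the previous cell state and the candidate contributes nothing, which makes the data flow through the gates easy to trace by hand.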

8. The method for self-adaptive incremental site selection of urban service facilities according to claim 7, wherein predicting the missing popularity distribution according to the local hidden features, the global hidden features and the time dependency relationship specifically comprises:

S4-2: using a multi-head graph attention network to transfer the local popularity and the global popularity of existing address positions to the address positions lacking popularity, aggregating the popularity distributions of the facilities at existing neighboring address positions, and calculating the local spatial popularity prediction distribution of address location l_i ∈ L_u in time period t

as:

is:

S4-3: obtaining the state output by the long short-term memory network for address location l_i in time period t-1

as:

wherein W_td is a learnable weight parameter matrix shared by all output states of the long short-term memory network, and Softmax() denotes the softmax function;
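A minimal sketch of this step, assuming that the distribution is obtained by applying the softmax function to the product of W_td and the LSTM output state (the exact formula is not reproduced in this text). The function names and the example values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_popularity(w_td, h_prev):
    """ASSUMPTION: Softmax(W_td @ h), with h the LSTM output state."""
    scores = [sum(w * h for w, h in zip(row, h_prev)) for row in w_td]
    return softmax(scores)


# Hypothetical 3-level popularity distribution from a 2-dimensional state.
w_td = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dist = temporal_popularity(w_td, [1.0, 0.0])
```

The result is a probability vector, one entry per popularity level, so it can be fused directly with the spatial distributions.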

S4-4: fusing

and

to obtain the missing popularity distribution

as:

wherein, for a given popularity distribution

this denotes the calculation of its entropy, wherein

denotes, for the vector

its j-th dimension, and p denotes, for the vector

its number of dimensions.
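The entropy of a popularity distribution over its p dimensions can be computed as sketched below. The fusion formula itself is not reproduced in this text, so the `fuse` function is purely an assumed illustration: it weights each distribution by exp(-entropy), so that the more certain (lower entropy) prediction contributes more. The natural logarithm and all function names are likewise assumptions.

```python
import math


def entropy(distribution):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in distribution if p > 0.0)


def fuse(spatial, temporal):
    """ASSUMPTION: entropy-weighted convex combination of two distributions."""
    w_s = math.exp(-entropy(spatial))
    w_t = math.exp(-entropy(temporal))
    total = w_s + w_t
    return [(w_s * s + w_t * t) / total for s, t in zip(spatial, temporal)]


spatial = [0.9, 0.1]   # confident prediction, low entropy
temporal = [0.5, 0.5]  # uninformative prediction, maximum entropy
fused = fuse(spatial, temporal)
```

Because the weights are normalised, the fused vector remains a valid probability distribution and lies closer to the lower-entropy input.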

9. The method for self-adaptive incremental site selection of urban service facilities according to claim 8, wherein combining the local hidden features, the global hidden features and the missing popularity distribution to obtain the final popularity prediction result specifically comprises:

S5-1: concatenating the local hidden feature

the global hidden feature

and the missing popularity distribution

S5-2: representing address location l_i by the final hidden feature vector

10. A system for self-adaptive incremental site selection of urban service facilities, characterized by comprising an address association graph construction module, a feature extraction module, a missing popularity prediction module and an addressing module,

wherein the missing popularity prediction module uses a long short-term memory network to construct the time dependency relationship of the address positions, predicts the missing popularity distribution according to the local hidden features, the global hidden features and the time dependency relationship, and transmits the missing popularity distribution to the addressing module;