CN111611497B

CN111611497B - Method, system, equipment and storage medium for measuring urban commute characteristics

Info

Publication number: CN111611497B
Application number: CN202010331726.9A
Authority: CN
Inventors: 卢有云; 陈仕奇; 肖重阳; 李松杰; 姜先浩; 黄正忠
Original assignee: Guangdong Shengtengdixin Technology Co ltd
Current assignee: Guangdong Shengtengdixin Technology Co ltd
Priority date: 2020-04-24
Filing date: 2020-04-24
Publication date: 2024-01-19
Anticipated expiration: 2040-04-24
Also published as: CN111611497A

Abstract

The invention discloses a method, a system, equipment and a storage medium for measuring urban commute characteristics. The method comprises the steps of determining urban commute influence factors according to social media data, then carrying out space-time similarity calculation on the urban commute influence factors to obtain commute position distribution data, calculating excess commute rate and commute capacity utilization rate according to the commute position distribution data, and measuring urban commute characteristics according to the excess commute rate and the commute capacity utilization rate. By extracting time, space and attribute information related to the urban commute in the social media data and calculating the space-time similarity of the time, space and attribute information, the social media data with complex types and multiple sources are organically unified to represent the space-time characteristics of the urban commute in a normalized mode, automatic data processing is achieved, and the efficiency and accuracy of urban commute characteristic measurement are improved.

Description

Method, system, equipment and storage medium for measuring urban commute characteristics

Technical Field

The present invention relates to the field of data processing, and in particular, to a method, a system, an apparatus, and a storage medium for measuring urban commute features.

Background

With the continuing rise of China's urbanization level, urban spatial structure is being rapidly reconstructed. Under the influence of the market economy and national policies, the spatial structure of employment and residence within cities has changed profoundly, and the separation between workplaces and residences has intensified. This job-housing separation significantly increases commuting distance and commuting time, causing a series of social problems such as traffic congestion and environmental pollution. Therefore, the job-housing distribution and commute characteristics of urban residents need to be measured efficiently in order to address these problems.

Disclosure of Invention

The present invention aims to solve at least one of the technical problems existing in the prior art. To this end, the invention provides a method for measuring urban commute features that extracts the time, space and attribute information related to urban commuting from social media data and gives this information a normalized space-time expression, so that social media data of complex types and multiple sources are organically unified to express the space-time features of urban commuting in a normalized manner, automatic data processing is realized, and the efficiency and accuracy of urban commute feature measurement are improved.

In a first aspect, one embodiment of the present invention provides an urban commute feature measurement method, comprising the following steps:

determining city commute influencing factors according to the social media data;

carrying out space-time similarity calculation on the urban commute influence factors to obtain commute position distribution data;

calculating excess commute rate and commute capacity utilization rate according to the commute position distribution data;

and measuring the urban commute characteristics according to the excess commute rate and the commute capacity utilization rate.

Further, the determining the city commute impact factor according to the social media data includes:

calculating a correlation degree value between the social media data and the commute influence indexes through a sequence correlation algorithm, wherein each commute influence index corresponds to one of the correlation degree values;

selecting a commute influence index with the association degree value larger than a preset association threshold, and taking social media data corresponding to the commute influence index as urban commute influence factors.

Further, the calculating, by a sequence association algorithm, a degree of association value between the social media data and a commute impact index includes:

taking the commute influence index as a front-piece (antecedent) sequence and the social media data as a back-piece (consequent) sequence, calculating the confidence between the front-piece sequence and the back-piece sequence through the sequence association algorithm, and taking the confidence as the association degree value between the social media data and the commute influence index.

Further, the commute influence index comprises one or more of: an employment index, a mixed land use index, a traffic mode index, an index of the share of commuting in all trips, an index of the matching degree between worker skills and space, and a personal characteristic index.

Further, the calculating the space-time similarity of the urban commute influence factors to obtain commute position distribution data includes:

constructing a geographic space-time information word stock according to the urban commute influence factors;

constructing space-time vectors according to the geographic space-time information word stock and calculating space-time similarity between the space-time vectors;

and selecting space-time vectors with space-time similarity larger than a preset similarity threshold value for clustering calculation to obtain the commute position distribution data.

Further, clustering calculation is performed on the start information and the end information of the space-time vector through a clustering algorithm to obtain commute position distribution data, wherein the commute position distribution data comprises: start position group data and end position group data.

Further, the excess commute rate is expressed as:

$$E = \frac{ARC - MRC}{ARC} \times 100\%$$

the commute capacity usage rate is expressed as:

$$C_u = \frac{ARC - MRC}{MaxRC - MRC} \times 100\%$$

wherein $E$ represents the excess commute rate, $ARC$ represents the actual commute distance, $MRC$ represents the minimum commute distance, $MaxRC$ represents the maximum commute distance, and $C_u$ represents the commute capacity usage rate.

The embodiment of the invention has at least the following beneficial effects: the efficiency and the accuracy of urban commute feature measurement are improved.

In a second aspect, one embodiment of the present invention provides: a city commute feature measure system, comprising:

an urban commute influence factor determination unit, configured to determine urban commute influence factors according to the social media data;

a space-time similarity calculation unit, configured to carry out space-time similarity calculation on the urban commute influence factors to obtain commute position distribution data;

a first calculation unit, configured to calculate the excess commute rate and the commute capacity usage rate according to the commute position distribution data;

an urban commute feature measurement unit, configured to measure urban commute features according to the excess commute rate and the commute capacity usage rate.

In a third aspect, one embodiment of the invention provides: an urban commute feature measurement device, comprising:

at least one processor, and,

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the urban commute feature measurement method according to any embodiment of the first aspect.

In a fourth aspect, one embodiment of the present invention provides: a computer readable storage medium storing computer executable instructions for causing a computer to perform the urban commute feature measurement method according to any embodiment of the first aspect.

The embodiment of the invention has the beneficial effects that:

according to the embodiment of the invention, urban commute influence factors are determined according to social media data, then space-time similarity calculation is carried out on the urban commute influence factors to obtain commute position distribution data, excess commute rate and commute capacity utilization rate are calculated according to the commute position distribution data, and urban commute characteristics are measured according to the excess commute rate and the commute capacity utilization rate. By extracting time, space and attribute information related to the urban commute in the social media data and calculating the space-time similarity of the time, space and attribute information, the social media data with complex types and multiple sources are organically unified to represent the space-time characteristics of the urban commute in a normalized mode, automatic data processing is achieved, and the efficiency and accuracy of urban commute characteristic measurement are improved.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort. In the drawings:

FIG. 1 is a flow chart of an embodiment of a method for measuring urban commute features according to the present invention;

FIG. 2 is a schematic diagram of a sequence association algorithm of an embodiment of the urban commute feature measurement method according to the present invention;

FIG. 3 is a schematic flow chart of step S20 in FIG. 1;

FIG. 4 is a schematic diagram of an embodiment of a city commute feature measurement system in accordance with an embodiment of the present invention.

Detailed Description

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will explain the specific embodiments of the present invention with reference to the accompanying drawings. It is evident that the drawings in the following description are only examples of the invention, from which other drawings and other embodiments can be obtained by a person skilled in the art without inventive effort.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.

The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.

Traditional research on urban commute feature measurement is mainly based on questionnaires and census data, which are costly and of limited timeliness. With the development of technologies such as the mobile internet and big data, social media data reflecting individual behavior, such as online check-ins, user comments and GPS positioning, have emerged. Such data are convenient to collect, low in cost and highly timely, and can replace questionnaire survey data for collecting urban trip information and analyzing the characteristics of rail transit commuting behavior. The embodiment of the invention extracts the time, space and attribute information related to urban commuting from social media data and gives this information a normalized space-time expression, so that social media data of complex types and multiple sources are organically unified to represent the space-time characteristics of urban commuting, automatic data processing is realized, and the efficiency and accuracy of urban commute feature measurement are improved.

The process of city commute feature measurement is described below in connection with examples.

An embodiment of the present invention provides a method for measuring urban commute features, and fig. 1 is a schematic flow chart of the method for measuring urban commute features provided in the embodiment of the present invention, as shown in fig. 1, the method for measuring urban commute features includes:

S10: determining urban commute influence factors according to the social media data.

In an exemplary implementation, the urban commute influence factors are determined from the social media data using a sequence association algorithm, and step S10 includes:

calculating a correlation degree value between the social media data and the commute influence indexes by adopting a sequence correlation algorithm, wherein each commute influence index corresponds to one of the correlation degree values;

selecting a commute influence index with the association degree value larger than a preset association threshold value, and taking the corresponding social media data as urban commute influence factors.

In one embodiment, a sequence association algorithm is adopted to calculate the association degree value between the social media data and the commute influence indexes. First, the commute influence indexes are constructed from aspects such as urban form, factors related to the degree of infrastructure land development, factors related to traffic mode, factors related to worker skills and factors related to personal characteristics. For example, the commute influence indexes comprise: an employment index, a mixed land use index, a traffic mode index, an index of the share of commuting in all trips, an index of the matching degree between worker skills and space, and a personal characteristic index.

As shown in table 1 below.

TABLE 1

In some embodiments, the employment index includes, for example, factors such as unemployed, awaiting employment, full-time and part-time; the mixed land use index includes, for example, the degree of mixed use of land for office buildings, residential buildings and the like; the traffic mode index includes, for example, factors such as self-driving, public transportation, subway, cycling and walking; the index of the share of commuting in all trips includes, for example, the share of residents' trips that are commutes or the share of travel time spent commuting; the index of the matching degree between worker skills and space includes, for example, whether the workers in a science and technology park all have high-technology skills and whether the residents of an industrial park all have technician skills; and the personal characteristic index includes, for example, age, sex and occupation. The urban form, the factors related to the degree of infrastructure land development, the factors related to traffic mode and the factors related to worker skills can be obtained from public statistical yearbooks, while the personal characteristic index can be obtained by analyzing social media data. For example, text such as "driving", "squeezing onto the subway" and "bus" is extracted from the social media data and analyzed to derive the personal characteristic index, for example that residents' commute times tend toward a nine-to-five pattern.

The sequence association algorithm is used for mining the association degree among the multi-event sequences. In one embodiment, the above-mentioned commute impact index is characterized by using time, space and attribute information in the social media data, specifically, the commute impact index is used as a front piece sequence, the social media data is used as a back piece sequence, and then a sequence association algorithm is used to calculate the confidence between the front piece sequence and the back piece sequence, as the association degree between the social media data and the commute impact index. The method aims at measuring the association degree between different commute influence indexes and data related to commute in social media data and judging which commute influence indexes have the largest influence on urban commute.

In one embodiment, the back-piece sequence may be obtained, for example, by parsing the time, space and attribute information from the social media data and ordering it by timestamp.

Table 2 below illustrates a piece of social media data.

TABLE 2

As shown in table 2, according to the geographic location, the events including information such as time, space, and attribute can be obtained by integrating the content of the social media data in the region.
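As a minimal sketch of how posts could be turned into time-stamped events carrying time, space and attribute information, the following Python fragment may help. The record fields (`time`, `lon`, `lat`, `text`), the keyword list and the sample posts are all hypothetical and are not taken from Table 2.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    time: datetime   # time information (post timestamp)
    lon: float       # space information (longitude)
    lat: float       # space information (latitude)
    attribute: str   # attribute information (matched commute keyword)


# Hypothetical commute-related keywords; a real word list would be larger.
KEYWORDS = ("subway", "bus", "driving", "traffic jam")


def to_events(records):
    """Turn raw posts into events ordered by timestamp, one per matched keyword."""
    events = []
    for rec in records:
        ts = datetime.fromisoformat(rec["time"])
        text = rec["text"].lower()
        for kw in KEYWORDS:
            if kw in text:
                events.append(Event(ts, rec["lon"], rec["lat"], kw))
    return sorted(events, key=lambda e: e.time)


# Hypothetical sample posts, deliberately given out of time order.
records = [
    {"time": "2020-04-24T18:05:00", "lon": 113.32, "lat": 23.12,
     "text": "Stuck in a traffic jam again"},
    {"time": "2020-04-24T08:10:00", "lon": 113.26, "lat": 23.13,
     "text": "Squeezing onto the subway"},
]
events = to_events(records)
```

Sorting by timestamp gives the ordered event sequence that the sequence association algorithm operates on.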

The specific implementation of the sequence association algorithm is as follows.

First, the global count, denoted GC, is defined. A set of sequences is given as $S = \{S_1, S_2, \ldots, S_i, \ldots, S_n\}$, where $S_1, \ldots, S_i, \ldots, S_{n-1}$ all represent front-piece sequences, i.e. the commute influence indexes, and $S_n$ represents the back-piece sequence, i.e. the social media data.

A sequence $S_i$ ($i \neq n$) contains a plurality of events, denoted $A_i$ ($A_i \in S_i$). The total number of occurrences of event $A_i$ is called its global count. An event $A_i$ refers to a specific value of an index; for example, if sequence $S_1$ represents the traffic mode index, an event in it may be a self-driving commute event.

Then the local count, denoted LC, is defined. It represents the number of occurrences of a front-piece event within the neighborhood of the back-piece event of interest, wherein the back-piece event is the result of the front-piece event and the neighborhood of the back-piece event does not include the time after the back-piece event occurs. In multi-event sequence association mining, the time range within which associated events influence one another is defined as the influence domain. Different events within the influence domain are regarded as temporally adjacent, and this adjacency can be represented by a time window: events occurring in the same time window are regarded as adjacent to one another, that is, as lying in one neighborhood.

On this basis, the confidence between the front-piece sequence and the back-piece sequence is constructed. In one embodiment, the conditional probability between the front-piece sequence and the back-piece sequence is used and, taking the distribution characteristics of the back-piece sequence into account, the confidence of the association degree between the sequences is expressed as follows:

wherein $Conf(A_i \to C)$ represents the confidence between the back-piece event of interest and an event of the front-piece sequence; $C$ represents the back-piece event of interest; $A_i$ represents an event in the front-piece sequence; $LC(A_i \to C)$ represents the local count of event $A_i$ with event $C$ as the back-piece constraint; $LC(C)$ represents the total number of occurrences of front-piece sequence events within the neighborhood of the back-piece event of interest $C$; and $GC(A_i)$ represents the global count, i.e. the total number of occurrences of event $A_i$.

The confidence coefficient comprehensively considers the distribution characteristics of the front part sequence and the back part sequence, and measures the association degree value between the front part sequence and the back part sequence, namely the association degree value of the commute influence index and the time, space and attribute information in the social media data, wherein the association degree value is in the range of [0,1].

FIG. 2 is a schematic diagram of the sequence association algorithm. There are two sequences, a front-piece sequence $S_1$ and a back-piece sequence $S_2$. The front-piece sequence $S_1$ includes multiple events $A_1$, $A_2$, $A_3$, and an event of interest $C$ is selected in the back-piece sequence $S_2$; the dotted frame represents the temporal neighborhood of event $C$. With the back-piece event of interest $C$ as the constraint, the local count of front-piece event $A_1$ is $LC(A_1 \to C) = 4$; similarly, the local count of front-piece event $A_2$ is $LC(A_2 \to C) = 3$. The global count of front-piece event $A_1$ is $GC(A_1) = 8$.
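The counting step can be sketched in Python as follows. This is only an illustration under stated assumptions: the neighborhood of a back-piece event occurring at time t_c is assumed to be the window [t_c - window, t_c], the event times are hypothetical (chosen so that the counts match the values quoted for FIG. 2, not read from the figure), and the function names are illustrative. Only the global and local counts are computed; how they are combined into the confidence is not shown here.

```python
def global_count(front_events, label):
    """GC: total number of occurrences of a front-piece event label."""
    return sum(1 for _, lab in front_events if lab == label)


def local_count(front_events, back_times, label, window):
    """LC: occurrences of `label` inside the neighborhood of any event C.

    The neighborhood of a back-piece event at time t_c is assumed to be
    [t_c - window, t_c], so it excludes the time after C occurs.
    """
    return sum(
        1
        for t, lab in front_events
        if lab == label and any(t_c - window <= t <= t_c for t_c in back_times)
    )


# Hypothetical time-stamped front-piece events as (time, label) pairs.
front = [(t, "A1") for t in (1, 2, 6, 9, 10, 14, 18, 22)]
front += [(t, "A2") for t in (3, 8, 9, 30)]
c_times = [3, 10]  # hypothetical occurrence times of the back-piece event C
window = 2         # assumed length of the time window

gc_a1 = global_count(front, "A1")                   # 8
lc_a1 = local_count(front, c_times, "A1", window)   # 4
lc_a2 = local_count(front, c_times, "A2", window)   # 3
```

With these inputs the neighborhoods of C are [1, 3] and [8, 10], so four of the eight A1 events and three of the four A2 events fall inside a neighborhood.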

In one embodiment, a commute impact index with a relevance value greater than a preset relevance threshold is selected, and the corresponding social media data is used as a city commute impact factor. The urban commute influence factors belong to a part of social media data, namely the social media data with the largest influence on the commute influence index is obtained through screening in the process and is used as the urban commute influence factors.

For example, to ensure the availability and data volume of the indexes, the preset association threshold is set empirically to 0.5, that is, commute influence indexes with an association degree value greater than or equal to 0.5 are retained. Taking the mixed land use index above as an example, in a region with a high degree of mixed land use, social media data posted during the morning and evening peak hours contain information such as "going to work", "punching in", "traffic jam" and "ordering". The confidence calculated by the sequence association algorithm above is 0.89, so the social media data corresponding to this commute influence index are retained and used as urban commute influence factors.
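The screening step can be illustrated with a short Python sketch. The function name, the dictionary layout and the employment value of 0.31 are hypothetical; the threshold of 0.5 and the value 0.89 for the mixed land use index come from the example above.

```python
def select_impact_factors(association, data_by_index, threshold=0.5):
    """Keep the social media data of every index whose association
    degree value is greater than or equal to the threshold."""
    return {
        name: data_by_index[name]
        for name, value in association.items()
        if value >= threshold
    }


# Association degree values per commute influence index.
association = {"mixed_land_use": 0.89, "employment": 0.31}  # 0.31 is made up

# Hypothetical social media texts grouped by the index they relate to.
data_by_index = {
    "mixed_land_use": ["going to work", "traffic jam"],
    "employment": ["looking for a part-time job"],
}

factors = select_impact_factors(association, data_by_index)
```

Only the data attached to the mixed land use index survive the screening and would be passed on as urban commute influence factors.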

It is to be understood that the foregoing illustrations are exemplary implementations and are not to be construed as limiting the embodiments of the present application.

S20: carrying out space-time similarity calculation on the urban commute influence factors to obtain commute position distribution data.

In one embodiment, first, space-time vectors of city commute influencing factors in the social media data are constructed, space-time similarity between the space-time vectors is calculated, and space-time normative expression of the city commute influencing factors in the social media data is achieved, so that commute position distribution data are obtained.

In one embodiment, FIG. 3 is a schematic flow chart of step S20. Carrying out space-time similarity calculation on the urban commute influence factors to obtain the commute position distribution data specifically comprises the following steps.

S201: constructing a geographic space-time information word stock according to the urban commute influence factors.

For example, space-time information is separated from the text of the social media data to obtain a plurality of geographic space-time information word stocks; audio and video data can first be converted into text and then processed. Because social media data also cover other human activities such as leisure, entertainment and family life, multiple word stocks can be constructed as needed, for example a work-commute geographic space-time information word stock, a weekend-leisure geographic space-time information word stock, and so on.

S202: constructing space-time vectors according to the geographic space-time information word stock and calculating the space-time similarity between the space-time vectors.

For example, a space-time vector is constructed from the m-th geographic space-time information word stock using the following formula:

$$V_{d,m} = \{(d_{m,1}, w_{m,1}), (d_{m,2}, w_{m,2}), \ldots, (d_{m,i}, w_{m,i}), \ldots, (d_{m,n}, w_{m,n})\}$$

wherein $V_{d,m}$ represents the space-time vector, and $d_{m,i}$ and $w_{m,i}$ respectively represent the location of the $i$-th data item in the $m$-th data set and its corresponding weight; for example, the weight may be the number of times the data item appears.
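A minimal Python sketch of building such a vector is given below, assuming that the weight is the occurrence count as suggested above. The location keys (`cell_07`, `cell_12`) are hypothetical grid-cell identifiers introduced only for illustration.

```python
from collections import Counter


def build_spacetime_vector(observed_locations):
    """Return the vector as a list of (location, weight) pairs,
    with the weight taken to be the number of occurrences."""
    counts = Counter(observed_locations)
    return sorted(counts.items())


# Hypothetical locations matched by one word stock.
observations = ["cell_12", "cell_07", "cell_12"]
vector = build_spacetime_vector(observations)
```

The pairs are sorted by location only to make the output deterministic; the order carries no meaning.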

Then, according to whether the space-time vector contains time or longitude and latitude information, the space-time vector is divided into two parts, and the space-time similarity is calculated and expressed as:

wherein $similarity(V_{d,m}, V_{d,k})$ represents the space-time similarity between space-time vector $V_{d,m}$ and space-time vector $V_{d,k}$; the greater the value, the closer the time and space information that the two vectors represent. Therefore, by matching against space-time vectors that carry time or longitude and latitude information, space-time vector information with high space-time similarity can be obtained.
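For illustration only, the sketch below compares two weighted vectors of (location, weight) pairs with cosine similarity. Cosine similarity is an assumed stand-in chosen because it also yields larger values for closer vectors; it is not presented as the similarity formula of this embodiment, and the function name and sample vectors are hypothetical.

```python
import math


def cosine_similarity(v_m, v_k):
    """Cosine similarity of two vectors given as (location, weight) pairs."""
    w_m, w_k = dict(v_m), dict(v_k)
    dot = sum(w * w_k.get(loc, 0) for loc, w in w_m.items())
    norm_m = math.sqrt(sum(w * w for w in w_m.values()))
    norm_k = math.sqrt(sum(w * w for w in w_k.values()))
    if norm_m == 0 or norm_k == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_m * norm_k)


# Hypothetical vectors over grid-cell locations.
v_commute = [("cell_07", 1), ("cell_12", 2)]
v_same = [("cell_07", 1), ("cell_12", 2)]
v_other = [("cell_30", 4)]

sim_same = cosine_similarity(v_commute, v_same)     # close to 1
sim_other = cosine_similarity(v_commute, v_other)   # 0.0, no shared location
```

Vectors whose similarity exceeds a chosen threshold would then be kept for the clustering step.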

For example, a geographic space-time word stock related to the off-work commute is constructed, containing words such as "off work", "order", "punch card", "return home" and "squeeze onto the subway". A space-time vector related to the off-work commute is then generated according to the formula above, a space-time vector for the corresponding time in the urban area is generated, and finally the similarity between the two space-time vectors is calculated based on the similarity formula, thereby realizing the normalized space-time expression of the urban commute influence factors in the social media data.

S203: selecting space-time vectors whose space-time similarity is larger than a preset similarity threshold for clustering calculation to obtain commute position distribution data.

In one embodiment, the space-time similarity is compared with the preset similarity threshold to obtain the space-time vectors whose space-time similarity is larger than the threshold. The start information and the end information of these space-time vectors are then each clustered with the k-Means clustering algorithm to obtain the commute position distribution data, thereby realizing automatic mining of urban commute spatial characteristics. The commute position distribution data comprise start position group data and end position group data.

Let $T = \{T_k\}$ represent a set of geographic space-time word stocks and $N = |T|$ the total number of entries in the data set, where $1 \le k \le N$. Each $T_k = \{(x_i, y_i, t_i)\}$ is a time-stamped ordered sequence representing a set of space-time tuples in the data set, where $x_i$ and $y_i$ respectively represent the longitude and latitude coordinate values and $t_i$ represents the time value, satisfying $i = 0, 1, \ldots, n$ and $t_0 < t_1 < \cdots < t_n$.

On the basis of the above definition, $O = (x_0, y_0, t_0)$ is defined as the start information space-time tuple in the data set and $D = (x_n, y_n, t_n)$ as the end information space-time tuple in the data set. The start information space-time tuples and the end information space-time tuples are clustered with the k-Means algorithm to obtain the distribution of the start position group data and the end position group data.
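The clustering of start and end tuples can be sketched with a small, dependency-free k-Means (Lloyd's iteration). Several points are assumptions made for the sketch: the first tuple of each ordered sequence is taken as the start and the last as the end, only the coordinates are clustered (the time value is dropped), the initial centroids are simply the first k points, and the trajectories are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-Means; returns (centroids, labels)."""
    centroids = list(points[:k])  # deterministic initialisation for the sketch
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every point.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: mean of the members of each cluster.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


# Hypothetical ordered sequences of (x, y, t) tuples, one per person.
trajectories = [
    [(0.0, 0.0, 8), (5.0, 5.0, 9)],
    [(0.2, 0.1, 8), (5.1, 4.9, 9)],
    [(10.0, 0.0, 8), (5.0, 5.2, 9)],
    [(10.1, 0.2, 8), (9.0, 9.0, 9)],
]
origins = [seq[0][:2] for seq in trajectories]        # start positions
destinations = [seq[-1][:2] for seq in trajectories]  # end positions

o_centroids, o_labels = kmeans(origins, k=2)
d_centroids, d_labels = kmeans(destinations, k=2)
```

In practice the number of clusters and the initialisation would need tuning; a library implementation with several random restarts is usually preferable to this fixed initialisation.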

In one embodiment, the start position group data and the end position group data are defined as position distribution data of a living place and position distribution data of a working place, respectively, and the number of data sets is the total number of people involved in analysis.

S30: calculating the excess commute rate and the commute capacity usage rate according to the commute position distribution data. The commute capacity usage rate reflects the proportion of the worst-case commute, that under the worst job-housing separation, that the city's commuting actually uses, and the excess commute rate reflects the proportion of the city's commuting that is excess.

In one embodiment, the excess commute rate is expressed as:

$$E = \frac{ARC - MRC}{ARC} \times 100\%$$

The commute capacity usage rate is expressed as:

$$C_u = \frac{ARC - MRC}{MaxRC - MRC} \times 100\%$$

wherein $E$ represents the excess commute rate, $ARC$ represents the actual commute distance, $MRC$ represents the minimum commute distance, $MaxRC$ represents the maximum commute distance, and $C_u$ represents the commute capacity usage rate.
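A short numerical sketch of the two indicators follows, assuming the conventional definitions E = (ARC - MRC) / ARC and C_u = (ARC - MRC) / (MaxRC - MRC), returned here as fractions rather than percentages. The function names and the distances are hypothetical.

```python
def excess_commute_rate(arc, mrc):
    """E: share of the actual commute that exceeds the minimum commute."""
    if arc <= 0 or mrc > arc:
        raise ValueError("expected 0 < ARC and MRC <= ARC")
    return (arc - mrc) / arc


def commute_capacity_usage(arc, mrc, max_rc):
    """C_u: share of the range between MRC and MaxRC that is actually used."""
    if not mrc <= arc <= max_rc or max_rc == mrc:
        raise ValueError("expected MRC <= ARC <= MaxRC and MaxRC > MRC")
    return (arc - mrc) / (max_rc - mrc)


# Hypothetical average commute distances in kilometres.
arc, mrc, max_rc = 8.0, 5.0, 17.0
e = excess_commute_rate(arc, mrc)                 # 3 / 8
c_u = commute_capacity_usage(arc, mrc, max_rc)    # 3 / 12
```

Here 37.5% of the actual commute is excess, while the city uses 25% of the available range between its best and worst possible commuting patterns.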

The actual commute distance, the minimum commute distance and the maximum commute distance are respectively expressed as:

The minimum commute distance MRC and the maximum commute distance MaxRC are solved by linear programming: a standard linear programming method is used to estimate MRC and MaxRC, and the estimation is equivalent to solving the following linear programming problem:

wherein $O_h$ represents the total number of people living at residence $h$, $D_j$ represents the total number of people working at workplace $j$, $t_{hj}$ represents the total number of commuters travelling from residence $h$ to workplace $j$, and $d_{hj}$ represents the straight-line distance between residence $h$ and workplace $j$.
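The sketch below illustrates the idea on a toy instance, assuming the conventional transportation-problem form: the average of d_hj weighted by t_hj is minimised (for MRC) or maximised (for MaxRC) subject to the flows out of each residence summing to O_h and the flows into each workplace summing to D_j. To stay dependency-free it treats every person as one unit, so the problem reduces to an assignment problem, and it enumerates all assignments by brute force instead of calling a linear programming solver. This is only workable for a handful of people; the coordinates and function names are hypothetical.

```python
import math
from itertools import permutations


def average_commute(homes, jobs, assignment):
    """Mean straight-line distance when person h works at jobs[assignment[h]]."""
    total = sum(math.dist(homes[h], jobs[j]) for h, j in enumerate(assignment))
    return total / len(homes)


def commute_bounds(homes, jobs):
    """Return (MRC, MaxRC) by enumerating every one-to-one assignment.

    Brute-force stand-in for the linear programme; cost grows factorially.
    """
    costs = [
        average_commute(homes, jobs, perm)
        for perm in permutations(range(len(jobs)))
    ]
    return min(costs), max(costs)


# Hypothetical coordinates: three residents and three jobs at the same sites.
homes = [(0.0, 0.0), (0.0, 3.0), (4.0, 0.0)]
jobs = [(0.0, 0.0), (0.0, 3.0), (4.0, 0.0)]
observed = (1, 0, 2)  # hypothetical actual workplace of each resident

mrc, max_rc = commute_bounds(homes, jobs)
arc = average_commute(homes, jobs, observed)
```

For realistic numbers of residences and workplaces, a linear programming or min-cost-flow solver would replace the enumeration; the objective and constraints stay the same.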

S40: city commute features are measured based on excess commute rate and commute capacity usage.

In one embodiment, the excess commute rate and the commute capacity usage rate obtained above are used to measure and compare commute efficiency and job-housing balance between cities. For example, the commute capacity usage rate reflects the proportion of the worst-case commute, that under the worst job-housing separation, that the city's commuting actually uses; a smaller value means higher commute efficiency and a better job-housing balance. Meanwhile, the excess commute rate reflects the proportion of urban commuting that is excess and measures the degree to which the average commute cost exceeds the theoretical minimum average commute cost. A higher excess commute rate means lower urban commute efficiency and also indicates, to a certain degree, job-housing imbalance, that is, how far actual commuting departs from the minimum commute distance that the existing distribution of workplaces and residences could provide. Therefore, by calculating these indexes for different cities or regions, commute efficiency and job-housing balance can be compared across cities or regions.

According to the urban commute feature measurement method, urban commute influence factors are determined according to social media data, space-time similarity calculation is conducted on the urban commute influence factors to obtain commute position distribution data, excess commute rate and commute capacity utilization rate are calculated according to the commute position distribution data, and finally urban commute features are measured according to the excess commute rate and the commute capacity utilization rate. By extracting time, space and attribute information related to the urban commute in the social media data and carrying out space-time standardization expression on the time, space and attribute information, the social media data with complex types and multiple sources are organically unified to represent the space-time characteristics of the urban commute in a standardized way, automatic data processing is realized, and the efficiency and accuracy of urban commute characteristic measurement are improved.

According to the embodiment of the application, the functional modules of the urban commute feature measurement system can be divided according to the embodiment of the method, for example, each functional module can be divided corresponding to each function, and two or more functions can be integrated into one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present application, the division of the modules is schematic, which is merely a logic function division, and other division manners may be implemented in actual implementation.

As shown in fig. 4, an exemplary structure of the urban commute feature measurement system according to the foregoing embodiment includes:

an urban commute influence factor determination unit 100: for determining urban commute influence factors according to the social media data;

a space-time similarity calculation unit 200: for calculating the space-time similarity of the urban commute influence factors to obtain commute position distribution data;

a first calculation unit 300: for calculating the excess commute rate and the commute capacity usage rate from the commute position distribution data;

an urban commute feature measurement unit 400: for measuring urban commute features based on the excess commute rate and the commute capacity usage rate.

The specific details of the unit modules in the urban commute feature measurement system have been described in detail in the urban commute feature measurement method of the above embodiment and are therefore not repeated here.
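The way the four units hand data to one another can be sketched as a simple pipeline. This is a minimal sketch of the logical division only: the class name, field names and placeholder stages are hypothetical, and the stages below merely record the call order rather than implement the calculations of units 100 to 400.

```python
from dataclasses import dataclass
from typing import Any, Callable, List

Stage = Callable[[Any], Any]


@dataclass
class CommuteMeasurementSystem:
    """Hypothetical wiring of the four units; each field stands in for one unit."""
    determine_factors: Stage     # unit 100: social media data -> influence factors
    compute_distribution: Stage  # unit 200: influence factors -> position distribution data
    compute_rates: Stage         # unit 300: distribution data -> the two rates
    measure_features: Stage      # unit 400: the two rates -> commute features

    def run(self, social_media_data: Any) -> Any:
        # Each unit consumes the output of the previous one, in order.
        factors = self.determine_factors(social_media_data)
        distribution = self.compute_distribution(factors)
        rates = self.compute_rates(distribution)
        return self.measure_features(rates)


calls: List[str] = []


def placeholder_stage(name: str) -> Stage:
    """Build a stand-in stage that logs its name and passes the value through unchanged."""
    def stage(value: Any) -> Any:
        calls.append(name)
        return value
    return stage


system = CommuteMeasurementSystem(
    determine_factors=placeholder_stage("unit_100"),
    compute_distribution=placeholder_stage("unit_200"),
    compute_rates=placeholder_stage("unit_300"),
    measure_features=placeholder_stage("unit_400"),
)
result = system.run({"posts": []})
```

Passing each unit in as a callable keeps the division purely logical, so two or more functions could be merged into one processing module, or implemented in hardware, without changing the order of the data flow.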

In addition, the invention also provides an urban commute feature measurement device, which comprises:

at least one processor, and,

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the urban commute feature measurement method according to any of the above embodiments.

In addition, the present invention also provides a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the urban commute feature measurement method according to the above embodiment.

The computer readable medium may include computer storage media and communication media without loss of generality. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices. Of course, those skilled in the art will recognize that the computer storage medium is not limited to the one described above.

It should be noted that the order of the foregoing embodiments of the present application is for description only and does not indicate the relative merits of the embodiments. The foregoing describes specific embodiments of this specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

In this specification, the embodiments are described in a progressive manner; identical and similar parts of the embodiments may be referred to across embodiments, and each embodiment focuses on its differences from the others. In particular, the system, device and storage medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the corresponding description of the method embodiments.

The above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention and are intended to be included within the scope of the appended claims and description.

Claims

1. An urban commute feature measurement method, characterized by comprising the following steps:

taking a commute influence index as an antecedent (front piece) sequence and social media data as a consequent (back piece) sequence, calculating the confidence between the antecedent sequence and the consequent sequence through a sequence association algorithm, and taking the confidence as an association degree value between the social media data and the commute influence index, wherein each commute influence index corresponds to one association degree value; selecting the commute influence indices whose association degree value is larger than a preset association threshold, and taking the social media data corresponding to those commute influence indices as urban commute influence factors; the confidence is calculated by the following formula:

wherein Conf(A_i → C) represents the confidence between the antecedent sequence and the consequent sequence, that is, the conditional probability of the consequent sequence given the antecedent sequence; A_i represents an event in the antecedent sequence; C represents an event of the consequent sequence; LC(A_i → C) represents the local count constrained by the events of the consequent sequence; LC(C) represents the total number of occurrences of events of the antecedent sequence within the neighborhood of the events of the consequent sequence; and GC(A_i) represents the global count of occurrences of the event in the antecedent sequence;

selecting space-time vectors with space-time similarity greater than a preset similarity threshold value for clustering calculation to obtain commute position distribution data;

2. The urban commute feature measurement method of claim 1, wherein the commute influence indices comprise one or more of: an employment index, a mixed land use index, a traffic mode index, an index of the share of commuting in total travel, an index of the matching degree between employee skills and space, and a personal characteristic index.

3. The urban commute feature measurement method of claim 1, wherein the start-point information and the end-point information of the space-time vectors are clustered separately by a clustering algorithm to obtain the commute position distribution data, the commute position distribution data comprising: start position group data and end position group data.

4. The urban commute feature measurement method of claim 1, wherein,

the excess commute rate is expressed as:

the commute capacity usage rate is expressed as:

5. An urban commute feature measurement system, comprising:

an urban commute influence factor determination unit: configured to take a commute influence index as an antecedent (front piece) sequence and social media data as a consequent (back piece) sequence, calculate the confidence between the antecedent sequence and the consequent sequence through a sequence association algorithm, and take the confidence as an association degree value between the social media data and the commute influence index, wherein each commute influence index corresponds to one association degree value; and to select the commute influence indices whose association degree value is larger than a preset association threshold and take the social media data corresponding to those commute influence indices as urban commute influence factors; the confidence is calculated by the following formula:

a space-time similarity calculation unit: configured to construct a geographic space-time information word stock according to the urban commute influence factors, construct space-time vectors according to the geographic space-time information word stock, and calculate the space-time similarity between the space-time vectors;

6. An urban commute feature measurement device, comprising:

at least one processor, and,

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the urban commute feature measurement method of any of claims 1 to 4.

7. A computer readable storage medium storing computer executable instructions for causing a computer to perform the urban commute feature measurement method according to any one of claims 1 to 4.