CN110909765B

CN110909765B - Pedestrian behavior pattern classification method for big track data

Info

Publication number: CN110909765B
Application number: CN201911017614.XA
Authority: CN
Inventors: 马小雯; 刘钊岐; 赵政康; 郑焕波; 舒元昊
Original assignee: CETHIK Group Ltd
Current assignee: CETHIK Group Ltd
Priority date: 2019-10-24
Filing date: 2019-10-24
Publication date: 2023-06-20
Anticipated expiration: 2039-10-24
Also published as: CN110909765A

Abstract

The invention discloses a pedestrian behavior pattern classification method for track big data, comprising the following steps: constructing electric vehicle track data and pedestrian track data; counting the active period of pedestrian activity from the pedestrian track data; obtaining the division of urban functional area types in the designated area; identifying the resident points of pedestrians from the pedestrian track data and deriving each pedestrian's main behavior intention in combination with the urban functional area types; detecting overspeed behaviors from the electric vehicle track data and counting them to obtain each pedestrian's overspeed behavior habit; and, based on the active period, the main behavior intention and the overspeed behavior habit, classifying pedestrian behavior patterns with a combined clustering algorithm that couples a TGSOM network model with the K-means algorithm. The invention takes into account both the relation between pedestrian behavior and urban functional area types and the special behavior habits of pedestrians, so that the resulting behavior pattern division better matches real conditions while balancing classification quality and efficiency.

Description

Pedestrian behavior pattern classification method for big track data

Technical Field

The application belongs to the field of big data analysis and mining, and particularly relates to a pedestrian behavior pattern classification method for track big data.

Background

In recent years, RFID radio-frequency networks and MAC acquisition systems have been widely deployed, and more and more spatio-temporal data relating to people has been collected and aggregated, forming massive, high-precision individual mobility data. Such data generally contains attribute information such as sampling position, time and motion speed. It represents the change in geospatial position of a moving object over a period of time and describes the object's historical movement behavior. It is called "spatio-temporal track big data", or "track big data" for short.

The motion rules and behavior characteristics of moving objects are hidden in track big data. Mining the behavior patterns in this data with the large-scale processing capability of distributed systems can provide information services and decision support for urban public safety, medical services, traffic management, business and finance, and other fields. For example, in public safety, individual behavior patterns that differ significantly from group behavior patterns deserve preventive attention, because subjects with such patterns may be potential sources of instability; in medical services, studying population flow patterns can reveal their impact on disease transmission; in traffic management, the likely future distribution of trips in a city can be predicted from the different travel modes of urban residents, and the traffic structure and layout adjusted accordingly.

In recent years, track pattern mining has become a research hotspot in the field of data mining. At present, research on track patterns mainly focuses on anchor points (nodes in a track where the moving object stays for a longer time), travel ranges, track shapes, OD (Origin-Destination) flows, time and the like. Existing track classification methods use features such as "travel period", "travel distance", "track turn angle" and "speed switch point".

For example, the patent document with application number CN201811417972.5 discloses a crowd classification method based on spatio-temporal data track characteristics, which comprises the following steps: s1) cleaning the spatio-temporal data; s2) extracting pedestrian tracks; s3) compressing the pedestrian tracks; s4) classifying the pedestrian tracks. That patent relies on existing common features: it neither captures the spatial connection between pedestrian behavior and urban functional areas nor incorporates special behavior habits of pedestrians (such as speeding while riding) into the track pattern, so its classification results have limited applicability.

Disclosure of Invention

The aim of the application is to provide a pedestrian behavior pattern classification method for track big data, which considers the connection between pedestrian behaviors and urban functional area types, and also considers the special behavior habits of pedestrians, so that the pedestrian behavior pattern classification is more practical, and the classification effect and efficiency are both considered.

In order to achieve the above purpose, the technical scheme adopted by the application is as follows:

A pedestrian behavior pattern classification method for track big data, comprising the following steps:

reading electric vehicle driving data and pedestrian moving data in a designated area, preprocessing the data, and constructing electric vehicle track data and pedestrian track data according to the preprocessed data;

counting the active period of pedestrian activity according to the pedestrian track data;

acquiring POI data of a designated area, and obtaining urban functional area type division in the designated area according to the POI data;

selecting resident points of pedestrians according to the pedestrian track data, dividing the resident points by combining with city function region types to obtain city function region types to which the resident points belong, and generating main behavior intents of the pedestrians according to the city function region types to which the resident points belong;

calculating overspeed behaviors according to the electric vehicle track data, and counting the times of overspeed behaviors of each electric vehicle to obtain overspeed behavior habits of pedestrians corresponding to the electric vehicles;

based on the active period of the pedestrian activity, the main behavior intention of the pedestrian and the overspeed behavior habit of the pedestrian, classifying the pedestrian behavior mode by adopting a combined clustering algorithm combining a TGSOM network model and a K-means algorithm to obtain a pedestrian behavior mode classification result.

Preferably, the electric vehicle driving data includes a plurality of electric vehicle records, and each electric vehicle record includes: unique identification of the electric vehicle, longitude and latitude coordinates, record generation time and record ending time;

the pedestrian movement data comprises a plurality of pedestrian records, and the field of each pedestrian record comprises: the pedestrian unique identification, longitude and latitude coordinates, record generation time and record end time.

Preferably, the preprocessing of the data includes:

data cleaning: removing electric vehicle records and pedestrian records which contain messy codes or cannot be analyzed, and removing electric vehicle records and pedestrian records with longitude and latitude coordinates not in a designated area;

data completion: if either the record generation time or the record end time is missing from an electric vehicle record, the missing field is assigned the value of the other field in the same record, on the principle that the two fields of one record are treated as equal; if both the record generation time and the record end time are missing from the current electric vehicle record, they are assigned the record generation time and record end time of the electric vehicle record whose longitude and latitude coordinates are closest to those of the current record;

data correction: taking three adjacent electric vehicle records and computing the speed for each pair of records (three computations in total); if two of the three computed speeds exceed the maximum speed of the electric vehicle, removing the record shared by those two computations; and advancing through the records in sequence in this way until data correction is complete.

Preferably, the constructing electric vehicle track data and pedestrian track data according to the preprocessed data includes:

screening electric vehicle records aiming at the same electric vehicle according to the unique electric vehicle identification, arranging all electric vehicle records of the electric vehicle according to the sequence of record generation time, and combining to obtain single vehicle track data of the electric vehicle, wherein the single vehicle track data of all electric vehicles form the electric vehicle track data;

and screening pedestrian records aiming at the same pedestrian according to the unique pedestrian identifier, arranging all the pedestrian records of the pedestrian according to the sequence of record generation time, and combining to obtain single track data of the pedestrian, wherein the single track data of all the pedestrians form the pedestrian track data.

Preferably, the statistics of the activity period of the pedestrian activity according to the pedestrian track data includes:

dividing the time of day into a plurality of time periods at time intervals T;

and counting the number of pedestrian records in each time period of each pedestrian per day according to the single-person track data, and selecting the time period with the largest number of pedestrian records in one day as the active time period of the pedestrian activity in the same day.

Preferably, the obtaining the POI data of the specified area, and obtaining the city function area type division in the specified area according to the POI data includes:

acquiring POI data in a designated area, wherein the POI data comprises attributes including names, longitudes, latitudes, primary classification, secondary classification and tertiary classification;

gridding the designated area according to a preset grid specification, and numbering each grid obtained after gridding;

dividing the POI data according to the regional scope of each grid and the longitude and latitude of the POI data to obtain POI data distributed in each grid;

and using an LDA topic model, with the attribute value of the secondary classification of the POI data as the input field of the model, to identify the type of urban functional area to which each grid in the designated area belongs, thereby obtaining the division of urban functional area types in the designated area.

Preferably, the selecting the resident point of the pedestrian according to the pedestrian track data, and dividing the resident point by combining with the city function region type to obtain the city function region type to which the resident point belongs, and generating the main behavior intention of the pedestrian according to the city function region type to which the resident point belongs includes:

acquiring the longitude and latitude coordinates of the four bounds of the minimum bounding rectangle of the designated area, and taking the average of those coordinates as the center point of the designated area;

calculating to obtain the residence time of the same longitude and latitude coordinates according to the record generation time and the record ending time in the pedestrian record;

selecting longitude and latitude coordinates with longest and second longest accumulated residence time in a designated time as a pair of resident points;

dividing according to the types of the urban functional areas in the designated areas to obtain the types of the urban functional areas to which the resident points belong;

setting the resident point with the longest accumulated residence time as the starting point of the pedestrian's main behavior and the resident point with the second longest accumulated residence time as its end point, and obtaining the urban functional area types of the starting point and the end point to generate the purpose of the pedestrian's main behavior; obtaining the direction from the starting point to the end point to generate the direction of the pedestrian's main behavior, the direction being either toward or away from the center point;

a primary behavior intent of the pedestrian is generated, the primary behavior intent including a purpose of the primary behavior of the pedestrian and a direction of the primary behavior of the pedestrian.

Preferably, the calculating the overspeed behavior according to the electric vehicle track data, counting the number of overspeed behaviors of each electric vehicle to obtain overspeed behavior habits of pedestrians corresponding to the electric vehicles, includes:

acquiring two adjacent electric vehicle records in single vehicle track data, calculating a running distance according to longitude and latitude coordinates of the two electric vehicle records, calculating a running time according to a record generation time of the former electric vehicle record and a record ending time of the latter electric vehicle record, and calculating a running speed according to the running distance and the running time;

acquiring speed limit information of a road section corresponding to the driving distance, judging whether overspeed behavior exists according to the speed limit information and the driving speed, and accumulating overspeed behavior times of the electric vehicle if overspeed behavior exists;

sequentially acquiring two adjacent electric vehicle records, continuously judging overspeed behaviors of the electric vehicle, and taking the accumulated overspeed behavior times as overspeed behavior habits of the electric vehicle;

and taking the overspeed behavior habit of the electric vehicle as the overspeed behavior habit of the pedestrian according to the mapping relation between the electric vehicle and the pedestrian.

Preferably, the classifying the pedestrian behavior mode by adopting a combined clustering algorithm combining a TGSOM network model and a K-means algorithm based on the active period of the pedestrian action, the main behavior intention of the pedestrian and the overspeed behavior habit of the pedestrian, to obtain a classification result of the pedestrian behavior mode, comprises:

normalizing the activity period of the pedestrian action, the main behavior intention of the pedestrian and the overspeed behavior habit of the pedestrian, and forming a feature vector from the normalized data as the behavior feature of the pedestrian;

taking the behavior characteristics of all pedestrians as an input vector set of the TGSOM network model to obtain a group of weight vectors which are output by the TGSOM network model and comprise a plurality of categories;

and clustering all pedestrians by taking the set of weight vectors as an initial clustering center of a K-means algorithm to obtain a behavior pattern classification result aiming at each pedestrian.
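The normalization and the K-means stage seeded with given initial centers can be sketched as follows. This is only an illustrative sketch under stated assumptions: the three features are assumed to be already encoded as numbers (for example a time-slot index, an integer code for the intention, and an overspeed count), min-max scaling is one possible normalization, and the TGSOM network itself is not implemented here; `init_centers` simply stands in for the weight vectors it would output. The function names are hypothetical.

```python
def min_max_normalize(rows):
    """Scale every feature column to [0, 1]; constant columns become 0.0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_centers, max_iter=100):
    """Plain K-means whose initial centers are supplied by the caller
    (assumed here to be the weight vectors produced by a TGSOM model)."""
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Seeding K-means with centers from a preceding model removes the need to choose the number of clusters and the starting points by hand, which is the division of labor the combined algorithm describes.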

The pedestrian behavior pattern classification method for the track big data provided by the application preprocesses the original acquired data, avoids the influence of drift data on classification results, and ensures the fluency of subsequent data processing; the method has the advantages that the active period of pedestrian activity, the main behavior intention of the pedestrian and the overspeed behavior habit of the pedestrian are taken as the pedestrian characteristics during classification, the connection between the pedestrian behavior and the urban functional area type is considered, and meanwhile, the special behavior habit of the pedestrian is considered, so that the pedestrian behavior pattern classification is more fit and practical; and the pedestrian behavior mode is classified by adopting a combined clustering algorithm combining a TGSOM network model and a K-means algorithm, so that the classifying effect and the classifying efficiency are considered.

Drawings

Fig. 1 is a flowchart of a pedestrian behavior pattern classification method for track big data in the present application.

Detailed Description

The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.

In one embodiment, a pedestrian behavior pattern classification method for track big data is provided, which classifies the pedestrian behavior patterns in an area based on the pedestrian data collected in that area within a specified time, and is of practical value in public safety, medical services, traffic management and similar fields.

It should be noted that pedestrians in the present embodiment include both people on foot and people traveling by vehicle, so that the data analysis is more comprehensive and complete.

As shown in fig. 1, the pedestrian behavior pattern classification method for track big data of the present embodiment includes:

Step 1: Read the electric vehicle driving data and the pedestrian movement data in the designated area, preprocess the data, and construct electric vehicle track data and pedestrian track data from the preprocessed data.

As follows from the definition of pedestrian in the present embodiment, the pedestrian movement data in step 1 includes pedestrians riding electric vehicles; that is, the pedestrian movement data contains records that correspond to the electric vehicle driving data. To make this portion of the data easy to distinguish and use, the correspondence between pedestrians and electric vehicles is established in advance. For ease of data acquisition, in this embodiment the electric vehicle driving data is collected by an RFID system and the pedestrian movement data by a Wi-Fi probe system, and the collected data is stored in an HBase database for later use.

When the data is read, the raw RFID data and Wi-Fi probe data can be read from the HBase database using PySpark, the tool provided by Apache Spark. Because behavior pattern analysis is performed over a specified time, this embodiment separates data acquisition from data reading and uses the database as an intermediate store, so that the data meeting the requirements can be read when needed.

In the present embodiment the designated area is generally a county-level region, and the designated time is set to one week, considering that pedestrian behavior generally follows a weekly cycle. Both the designated area and the designated time may be set as needed.

When analyzing track big data, identity, time and place must be identified, so in one embodiment the electric vehicle driving data includes a number of electric vehicle records, and the fields of each electric vehicle record include, but are not limited to: the unique identification of the electric vehicle (e.g. "epc"), longitude and latitude coordinates, record generation time (e.g. "ct"), and record end time.

The pedestrian movement data contains a number of pedestrian records, and the fields of each pedestrian record include, but are not limited to: the unique identification of the pedestrian (e.g. "mac"), longitude and latitude coordinates, record generation time, and record end time. The longitude and latitude coordinates in each record also correspond to a monitoring point.

To avoid the influence of drift data on the classification result, and the influence of garbled or unparseable data on subsequent processing, the data must be preprocessed. A general filtering algorithm, such as amplitude-limited average filtering or debounce filtering, can be used for the preprocessing.

In one embodiment, to retain as much valid data as possible and make the data complete, the preprocessing of the data includes:

Data cleaning: remove electric vehicle records and pedestrian records that contain garbled characters or cannot be parsed, and remove electric vehicle records and pedestrian records whose longitude and latitude coordinates are not in the designated area.

Data completion: if either the record generation time or the record end time is missing from an electric vehicle record, the missing field is assigned the value of the other field in the same record, on the principle that the two fields of one record are treated as equal; if both the record generation time and the record end time are missing from the current electric vehicle record, they are assigned the record generation time and record end time of the electric vehicle record whose longitude and latitude coordinates are closest to those of the current record.

Data correction: take three adjacent electric vehicle records and compute the speed for each pair of records (three computations in total); if two of the three computed speeds exceed the maximum speed of the electric vehicle, remove the record shared by those two computations. Data correction is completed by advancing through the records in sequence, that is, by selecting three adjacent electric vehicle records at a time and applying this check until all electric vehicle records have been selected.
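The data correction rule can be illustrated with the sketch below. It rests on several assumptions that are not in the source: records are dictionaries with hypothetical keys `lon`, `lat` and `ct` (generation time in seconds), distance is an equirectangular approximation that is adequate for short hops, `MAX_SPEED_MPS` is an arbitrary illustrative bound, and a window in which none, one or all three of the speeds exceed the bound is left unchanged because no single shared record can be identified.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6371000.0
MAX_SPEED_MPS = 15.0  # hypothetical upper bound on electric vehicle speed


def approx_distance_m(a, b):
    """Equirectangular approximation of the distance between two records."""
    mean_lat = math.radians((a["lat"] + b["lat"]) / 2)
    dx = math.radians(b["lon"] - a["lon"]) * math.cos(mean_lat)
    dy = math.radians(b["lat"] - a["lat"])
    return EARTH_RADIUS_M * math.hypot(dx, dy)


def speed_mps(a, b):
    dist = approx_distance_m(a, b)
    dt = abs(b["ct"] - a["ct"])
    if dt == 0:
        # Same timestamp: no movement is fine, any movement is impossible.
        return 0.0 if dist == 0 else float("inf")
    return dist / dt


def remove_drift(records, max_speed=MAX_SPEED_MPS):
    """Slide a three-record window; drop the record shared by two over-limit pairs."""
    recs = sorted(records, key=lambda r: r["ct"])
    i = 0
    while i + 2 < len(recs):
        window = recs[i:i + 3]
        fast_pairs = [
            pair for pair in combinations(range(3), 2)
            if speed_mps(window[pair[0]], window[pair[1]]) > max_speed
        ]
        if len(fast_pairs) == 2:
            # Any two of the three pairs share exactly one record.
            shared = (set(fast_pairs[0]) & set(fast_pairs[1])).pop()
            del recs[i + shared]  # re-check the same position afterwards
        else:
            i += 1
    return recs
```

For example, a record that jumps tens of kilometers away and back within seconds produces two over-limit pairs that both involve it, so only that record is removed and its neighbors are kept.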

Constructing electric vehicle track data and pedestrian track data from the preprocessed data comprises:

Screening out the electric vehicle records of the same electric vehicle according to the unique identification of the electric vehicle, arranging all electric vehicle records of that vehicle in order of record generation time, and combining them to obtain the single-vehicle track data of the electric vehicle; the single-vehicle track data of all electric vehicles form the electric vehicle track data.
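Track construction amounts to grouping records by identifier and sorting each group by generation time. A minimal sketch follows, assuming records are dictionaries that carry the identifier field named in this embodiment ("epc" for vehicles, "mac" for pedestrians) and a sortable "ct" value; the function name is hypothetical.

```python
from collections import defaultdict


def build_tracks(records, id_field):
    """Group records by identifier and order each group by generation time."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec[id_field]].append(rec)
    return {
        key: sorted(group, key=lambda r: r["ct"])
        for key, group in grouped.items()
    }
```

Calling `build_tracks(vehicle_records, "epc")` would give the single-vehicle tracks, and the same function with `"mac"` would give the single-person tracks.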

For convenience of description, this embodiment considers only a one-to-one correspondence between pedestrians and electric vehicles, that is, a fixed mapping between each electric vehicle and a pedestrian, which can be obtained from the unique identification of the electric vehicle and the unique identification of the pedestrian. If the relation between pedestrians and electric vehicles may be one-to-many, many-to-one or many-to-many, the mapping between the electric vehicle records and the pedestrian records acquired at a given moment is established from the relation between the electric vehicle and the pedestrian observed at the monitoring point.

After data completion and data correction have been applied to the electric vehicle track data, the pedestrian records in the pedestrian track data that correspond to the modified electric vehicle records must be modified in step.

Step 2: Count the active period of pedestrian activity from the pedestrian track data.

In this embodiment, statistics are computed per day (24 hours, i.e. 00:00-24:00) with a time interval T of 30 minutes; the statistical unit and the time interval may be adjusted as needed in other embodiments.

For the statistics, the time of day is divided into a number of time periods of length T. The number of pedestrian records of each pedestrian in each time period of each day is counted from the single-person track data, and the time period with the largest number of pedestrian records in a day is selected as the active period of that pedestrian's activity on that day.
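The counting described above can be sketched as follows. The sketch assumes that the "ct" field of each pedestrian record has already been parsed into a `datetime`, and that ties between equally busy periods are resolved in favor of the earliest one; neither detail is specified in the source, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def active_periods(track, interval_min=30):
    """Return {date: 'HH:MM' start of the busiest period} for one pedestrian."""
    per_day = defaultdict(Counter)
    for rec in track:
        ts = rec["ct"]
        slot = (ts.hour * 60 + ts.minute) // interval_min
        per_day[ts.date()][slot] += 1

    result = {}
    for day, counts in per_day.items():
        # Highest count wins; among equal counts the earliest slot is chosen.
        slot = max(counts, key=lambda s: (counts[s], -s))
        start = slot * interval_min
        result[day] = f"{start // 60:02d}:{start % 60:02d}"
    return result
```

With T = 30 minutes a day has 48 slots, so a slot index of 16 corresponds to the period starting at 08:00.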

Step 3: Acquire the POI data of the designated area, and obtain the division of urban functional area types in the designated area from the POI data.

Each acquired POI (Point of Interest) record contains attributes such as name, longitude, latitude, primary classification (e.g. catering services), secondary classification (e.g. Chinese restaurant) and tertiary classification (e.g. hot pot restaurant).

a. Grid the designated area according to a preset grid specification (for example, 1 km × 1 km), and number each grid obtained.

b. Divide the POI data according to the extent of each grid and the longitude and latitude of the POI data to obtain the POI data distributed in each grid. Because each POI has a longitude and latitude, a POI whose coordinates fall within the extent of a grid can be assigned to that grid, which completes the division of the POI data.

c. Use an LDA topic model, with the attribute value of the secondary classification of the POI data as the input field of the model, to identify the type of urban functional area to which each grid in the designated area belongs, thereby obtaining the division of urban functional area types in the designated area.
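Sub-steps a and b can be sketched as below. The sketch assumes a simple conversion from kilometers to degrees (about 111.32 km per degree of latitude, with longitude scaled by the cosine of the reference latitude), uses a `(row, col)` pair as the grid number, and reads POIs as dictionaries with hypothetical keys `lon`, `lat` and `secondary`. These are illustrative choices rather than details from the source.

```python
import math
from collections import defaultdict

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def make_grid_indexer(min_lon, min_lat, cell_km=1.0):
    """Return a function mapping (lon, lat) to a (row, col) grid number."""
    dlat = cell_km / KM_PER_DEG_LAT
    dlon = cell_km / (KM_PER_DEG_LAT * math.cos(math.radians(min_lat)))

    def index(lon, lat):
        return (int((lat - min_lat) // dlat), int((lon - min_lon) // dlon))

    return index


def assign_pois(pois, indexer):
    """Collect the secondary classification of every POI under its grid number."""
    cells = defaultdict(list)
    for poi in pois:
        cells[indexer(poi["lon"], poi["lat"])].append(poi["secondary"])
    return dict(cells)
```

Each grid then plays the role of a document whose words are the secondary classifications of the POIs inside it, which is the form a topic model expects as input.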

In this embodiment, a latent semantic analysis approach is adopted, and an LDA (Latent Dirichlet Allocation) model is selected to identify the type of urban functional area to which each grid in the study area belongs.

The LDA topic model is a document topic generation model. It is essentially a three-layer Bayesian probability model with a three-layer structure of words, topics and documents. The model is built on the following assumption: each word of an article is obtained by "selecting a topic with a certain probability and then selecting a word from that topic with a certain probability". In this embodiment, "region-function-POI" is mapped to "document-topic-word": a region is considered to select its function distribution with a certain probability, and a function to select its POI distribution with a certain probability, so that the hidden functional configuration of a region can be inferred by studying its POIs.

To give the urban functional area division a clearer orientation, this embodiment selects the attribute value of the secondary classification of the POI data as the input field.

In addition, some words, called stop words, must be removed when building the LDA topic model. In this embodiment, words that are insufficiently distinctive and give no clear indication for identifying urban functional areas, such as common place names and house number information, are removed, as are words that appear only once, which reduces the sparsity of the word frequency matrix.

After the unnecessary words are removed, this embodiment uses TF-IDF (term frequency-inverse document frequency) to construct the word frequency matrix. The idea is that a word or phrase that appears frequently in one article but rarely in other articles has good category-distinguishing capability and is well suited to classification.
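The TF-IDF weighting can be sketched in a few lines. The variant used here (term frequency as a share of the document length, inverse document frequency as the natural logarithm of N/df) is one common definition chosen as an assumption; the source does not state which variant it uses. Each "document" is the list of secondary classifications found in one grid, and the grid identifiers are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: {doc_id: [word, ...]} -> {doc_id: {word: weight}}."""
    n_docs = len(docs)
    doc_freq = Counter()
    for words in docs.values():
        doc_freq.update(set(words))  # count each word once per document

    weights = {}
    for doc_id, words in docs.items():
        term_counts = Counter(words)
        total = len(words)
        weights[doc_id] = {
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in term_counts.items()
        }
    return weights
```

Under this variant a classification that occurs in every grid receives weight 0, which matches the intuition that it does not help distinguish one functional area from another.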

In addition, the number of clusters, that is, the number of functional area types into which the area is divided in this embodiment, has a large influence on the final classification of pedestrian behavior patterns, so it must be tested and determined according to the actual situation. In this embodiment the number of clusters is 5 to 7, a range that yields good classification results and is generally applicable.

Step 4: Select the resident points of pedestrians from the pedestrian track data, classify the resident points in combination with the urban functional area types to obtain the urban functional area type to which each resident point belongs, and generate the main behavior intention of each pedestrian from the urban functional area types of the resident points.

The specific steps are as follows:

a. Obtain the longitude and latitude coordinates of the four bounds of the minimum bounding rectangle of the designated area, and take the average of those coordinates as the center point of the designated area.

b. Calculate the residence time at each longitude and latitude coordinate from the record generation time and record end time in the pedestrian records.

c. Select the longitude and latitude coordinates with the longest and second longest accumulated residence time within the designated time as a pair of resident points.

d. Obtain the urban functional area type to which each resident point belongs according to the division of urban functional area types in the designated area.

e. Set the resident point with the longest accumulated residence time as the starting point of the pedestrian's main behavior and the resident point with the second longest accumulated residence time as its end point, and obtain the urban functional area types of the starting point and the end point to generate the purpose of the pedestrian's main behavior; obtain the direction from the starting point to the end point to generate the direction of the pedestrian's main behavior, the direction being either toward or away from the center point.

f. Generate the main behavior intention of the pedestrian, which includes the purpose of the pedestrian's main behavior and the direction of the pedestrian's main behavior.
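Steps a to f can be sketched as follows. Several details are assumptions made for illustration only: records carry hypothetical keys `lon`, `lat`, `ct` (generation time) and `et` (end time) in seconds, `zone_of` is a caller-supplied lookup from a coordinate to its functional area type, and "toward the center" is decided by comparing squared distances in degrees, which is a rough simplification.

```python
from collections import defaultdict


def center_of_bounds(west, east, south, north):
    """Center point as the average of the four bounds of the bounding rectangle."""
    return ((west + east) / 2, (south + north) / 2)


def resident_points(track):
    """Return the coordinates with the longest and second longest total dwell time."""
    dwell = defaultdict(float)
    for rec in track:
        dwell[(rec["lon"], rec["lat"])] += rec["et"] - rec["ct"]
    if len(dwell) < 2:
        raise ValueError("need at least two distinct coordinates")
    ranked = sorted(dwell, key=dwell.get, reverse=True)
    return ranked[0], ranked[1]


def main_intent(track, zone_of, center):
    """Combine the purpose (zone types) and direction of the main behavior."""
    start, end = resident_points(track)

    def dist2(point):
        return (point[0] - center[0]) ** 2 + (point[1] - center[1]) ** 2

    direction = "toward_center" if dist2(end) < dist2(start) else "away_from_center"
    return {"purpose": (zone_of(start), zone_of(end)), "direction": direction}
```

A pedestrian who spends most of the week at a residential location on the outskirts and the second most at a commercial location near the center would therefore get the intention ("residential", "commercial") with direction "toward_center".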

Step 5: and calculating overspeed behaviors according to the electric vehicle track data, and counting the times of overspeed behaviors of each electric vehicle to obtain overspeed behavior habits of pedestrians corresponding to the electric vehicles.

In one embodiment, the overspeed behavior determination method is provided as follows:

a. acquiring two adjacent electric vehicle records in the single vehicle track data, calculating the running distance according to longitude and latitude coordinates of the two electric vehicle records, calculating the running time according to the record generation time of the former electric vehicle record and the record ending time of the latter electric vehicle record, and calculating the running speed according to the running distance and the running time.

b. Obtain the speed limit information of the road section corresponding to the travel distance, and judge from the speed limit information and the travel speed whether overspeed behavior exists; if it does, increment the overspeed count of the electric vehicle.

c. Acquire each subsequent pair of adjacent records of the electric vehicle in turn and continue the overspeed judgment, taking the accumulated number of overspeed behaviors as the overspeed behavior habit of the electric vehicle.

d. Take the overspeed behavior habit of the electric vehicle as the overspeed behavior habit of the pedestrian according to the mapping relation between electric vehicles and pedestrians.

Here, the former electric vehicle record is the one of the two records with the earlier record generation time, and the latter electric vehicle record is the one with the later record generation time (closer to the current time).
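The overspeed judgment in steps a to c can be sketched as follows. This is only an illustration under stated assumptions: the haversine great-circle distance stands in for the travel distance (the text does not name a distance formula), the record field names `lon`, `lat`, `start` and `end` are hypothetical, times are in seconds, and `speed_limit_kmh` is a hypothetical callable that returns the limit for the road section between two records.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def count_overspeed(records, speed_limit_kmh):
    """Count overspeed events over adjacent record pairs of one vehicle.

    records         -- list of dicts sorted by generation time, with assumed
                       keys lon, lat, start (generation time), end (ending time)
    speed_limit_kmh -- hypothetical callable (former, latter) -> limit in km/h
    """
    count = 0
    for former, latter in zip(records, records[1:]):
        distance_m = haversine_m(former["lon"], former["lat"], latter["lon"], latter["lat"])
        # Travel time as stated in step a: from the former record's generation
        # time to the latter record's ending time.
        elapsed_s = latter["end"] - former["start"]
        if elapsed_s <= 0:
            continue  # assumption: skip pairs with unusable timestamps
        speed_kmh = distance_m / elapsed_s * 3.6
        if speed_kmh > speed_limit_kmh(former, latter):
            count += 1
    return count


records = [
    {"lon": 120.0, "lat": 30.000, "start": 0, "end": 2},
    {"lon": 120.0, "lat": 30.001, "start": 8, "end": 10},
    {"lon": 120.0, "lat": 30.002, "start": 66, "end": 68},
]
# Assumed flat 25 km/h limit on every road section, for illustration only.
overspeed_count = count_overspeed(records, lambda former, latter: 25.0)
```

Step d then amounts to a lookup from the vehicle identifier to the pedestrian, so the count computed per vehicle is carried over unchanged to its rider.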

In order to further improve the convenience of accessing the data in the system, in another embodiment, before the overspeed behavior judgment, the method further includes the following steps:

(1) Each electric vehicle record is packaged in a specific format, and a Kafka producer program sends it as a message to a Kafka topic on the Kafka broker for temporary storage.

(2) A consumer program is written with Spark to consume the Kafka topic that holds the RFID data.

(3) Redis is used to store the former electric vehicle record required for the overspeed judgment, with the electric vehicle identifier (epc) as the key. That is, during the overspeed judgment, the former electric vehicle record is the data stored in Redis and the latter electric vehicle record is the data read from Kafka.

After the overspeed judgment is finished, the data judged to be overspeed behavior is stored in another Kafka topic, so that other systems can consume it or run statistical analysis on it.
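The data flow of this embodiment can be sketched without any external service. In the Python sketch below, a plain dict stands in for Redis (keyed by epc), the input iterable stands in for the records consumed from Kafka, and a list stands in for the second Kafka topic; none of this uses the real Kafka, Spark or Redis APIs. The function name `process_stream`, the record fields and the `is_overspeed` predicate are hypothetical.

```python
def process_stream(messages, is_overspeed):
    """Pair each incoming record with the previous record of the same vehicle.

    messages     -- iterable of record dicts with an "epc" key (stand-in for
                    the records consumed from the Kafka topic)
    is_overspeed -- hypothetical predicate (former, latter) -> bool
    """
    last_by_epc = {}      # stand-in for Redis: epc -> former record
    overspeed_topic = []  # stand-in for the Kafka topic holding overspeed data

    for record in messages:
        epc = record["epc"]
        former = last_by_epc.get(epc)
        # The first record of a vehicle has no former record to compare with.
        if former is not None and is_overspeed(former, record):
            overspeed_topic.append((former, record))
        # The current record becomes the former record for the next message.
        last_by_epc[epc] = record

    return overspeed_topic, last_by_epc


# Toy one-dimensional records: x in meters, t in seconds (illustrative only).
messages = [
    {"epc": "A", "x": 0.0, "t": 0},
    {"epc": "B", "x": 0.0, "t": 0},
    {"epc": "A", "x": 100.0, "t": 10},  # 10 m/s
    {"epc": "B", "x": 30.0, "t": 10},   # 3 m/s
]


def too_fast(former, latter):
    # Assumed limit of 7 m/s, chosen only to make the example concrete.
    return (latter["x"] - former["x"]) / (latter["t"] - former["t"]) > 7.0


overspeed_topic, last_by_epc = process_stream(messages, too_fast)
```

Keeping only the latest record per vehicle means the store grows with the number of vehicles rather than with the number of messages.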

Step 6: Based on the active period of pedestrian activity, the main behavior intention of the pedestrian and the overspeed behavior habit of the pedestrian, classify the pedestrian behavior patterns with a combined clustering algorithm that combines the TGSOM network model and the K-means algorithm, obtaining the pedestrian behavior pattern classification result.

The traditional SOM network model is an unsupervised competitive learning network that maps input patterns of any dimension onto a one-dimensional or two-dimensional grid at the output layer while preserving the topological structure, so it is often used for dimensionality reduction, clustering and data visualization. An SOM consists of an input layer and a competition layer. Inspired by how neurons in the human brain respond to external stimuli, it simulates the excitation, coordination, inhibition and competition among biological neurons, adjusting the weights in the direction that favors the winning neuron. That is, with the winning neuron as the center, excitatory lateral feedback is applied to nearby neurons and inhibitory lateral feedback to distant neurons, so that near neighbors excite each other and distant neighbors inhibit each other. The winning neuron then represents the class of the input pattern.

However, the number of neurons in the competition layer must be fixed in advance when a conventional SOM is used. To remedy this drawback, a Tree-structured Growing Self-Organizing Maps (TGSOM) model is used. It builds the network structure dynamically and introduces a growth threshold GT, so that new nodes can be generated at any suitable location as needed.

The K-means algorithm is a typical clustering algorithm. Its basic idea is to partition the data into classes by iteration so that differences within a class are as small as possible and differences between classes are as large as possible. K-means is efficient, but its results are unstable and depend heavily on the initial cluster centers. The self-organizing map model, for its part, has a network structure that limits its convergence speed. The two are therefore combined into the TGSOMK combined clustering algorithm, which offers both the self-organization of the former and the efficiency of the latter.

The specific steps for classifying pedestrian behavior patterns using TGSOMK are as follows:

a. Normalize the active period of pedestrian activity, the main behavior intention of the pedestrian and the overspeed behavior habit of the pedestrian, and form a feature vector from the three normalized values as the behavior feature of the pedestrian.

The normalization used in this embodiment is max-min normalization to the range 0 to 1.

Step a is further described by taking the feature vector of the pedestrian behavior in one day as an example:

For the active period of pedestrian activity, a day is divided into 48 periods at intervals of 30 minutes. If the most active period is 10:00-10:30 am, i.e. the 21st period, the value 21 is normalized with maximum 48 and minimum 1:

value1 = (21 - 1)/(48 - 1) ≈ 0.426;

For the main behavior intention of the pedestrian, each combination of behavior purpose (e.g. "from residential to literature" or "from financial office to residential") and behavior direction (approaching or moving away from the center point) is treated as one category. For example, "from residential to literature and near city center" is numbered type 1, "from residential to literature and away from city center" is numbered type 2, and so on. Assuming 50 types in total, the normalized value for "from residential to literature and near city center" is:

value2 = (1 - 1)/(50 - 1) = 0;

For the overspeed behavior habit of the pedestrian, the number of overspeed behaviors of each pedestrian during the day is counted. Denoting the overspeed count of a given pedestrian as n, the maximum overspeed count among all pedestrians as max and the minimum as min, the overspeed count is normalized as:

value3 = (n - min)/(max - min);

Finally, the feature vector vector = (value1, value2, value3) is obtained.
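The worked example above can be reproduced with a short Python sketch. The function names and the handling of the degenerate case max = min (returning 0) are assumptions; the text does not say what to do when every pedestrian has the same overspeed count.

```python
def min_max(value, lo, hi):
    """Max-min normalization of value to the range 0 to 1."""
    if hi == lo:
        return 0.0  # assumption: degenerate range maps to 0
    return (value - lo) / (hi - lo)


def behavior_feature(active_slot, intent_type, overspeed_n,
                     overspeed_min, overspeed_max,
                     n_slots=48, n_intent_types=50):
    """Build (value1, value2, value3) for one pedestrian and one day.

    active_slot -- 1-based index of the most active 30-minute period
    intent_type -- 1-based index of the main behavior intention category
    overspeed_n -- this pedestrian's overspeed count for the day
    """
    value1 = min_max(active_slot, 1, n_slots)
    value2 = min_max(intent_type, 1, n_intent_types)
    value3 = min_max(overspeed_n, overspeed_min, overspeed_max)
    return (value1, value2, value3)


# Period 21 (10:00-10:30), intention type 1, and an illustrative overspeed
# count of 3 where the day's counts range from 0 to 12 (made-up numbers).
feature = behavior_feature(21, 1, 3, overspeed_min=0, overspeed_max=12)
```

Because the intention category is a nominal label rather than a quantity, its normalized value depends on the order in which the categories are numbered.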

b. Take the behavior features of all pedestrians as the input vector set V of the TGSOM network model, and obtain the set of weight vectors output by the TGSOM network model, which covers a plurality of categories.

b1. Initialize the weight vector of the single root node of the competition layer, and set the minimum number of iterations N. Because K-means is applied afterwards, N can be set smaller than for a conventional SOM, e.g. 300.

b2. Calculate the growth threshold GT:

GT = [D × (1 - SF)²] / [1 + 1/n(t)]

where D is the dimension of the input vector, equal to 3 in this embodiment; SF is the spreading factor: a smaller SF yields low-level clustering and a larger SF yields high-level clustering, and it is suggested to choose SF in (0, 0.4) at the start, e.g. SF = 0.3 in this embodiment; n(t) is the total number of network nodes at the t-th training iteration.

b3. Randomly select a vector v from the input vector set V as the input of the TGSOM network model.

b4. Search for the winning neuron among the current nodes: calculate the distance E_i between each weight vector w_i and the input vector v, and take the output neuron whose weight vector has the smallest distance as the BMU (Best Matching Unit) of the current input.

b5. When E_i ≤ GT, go to b6 to adjust the BMU neighborhood; when E_i > GT, go to b7 to add a node.

b6. Calculate the neighborhood radius of the BMU: the neighborhood radius is large at first and gradually decreases as the number of iterations increases, and the degree of adjustment becomes smaller and smaller, so that the neighborhood converges toward the cluster center;

Adjust the weights: for all output neurons located within the BMU neighborhood, update their weight vectors:

w_j(k+1) = w_j(k) + lr(k) × (v_k - w_j(k))

where w_j denotes the weight vectors w_i other than that of the BMU, i.e. the "neighborhood neurons" of the BMU, and the learning rate lr decreases continuously as the iteration number k increases.

b7. Generate a new child node of the BMU and set

w_child = v

where w_child is the weight vector of the new child node.

b8. Repeat steps b3 to b7 until the learning rate lr has decayed to 0 or N iterations have been performed, obtaining the set of weight vectors output by the TGSOM network, which covers a plurality of categories.
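Steps b1 to b8 can be sketched in Python as below. This is a simplified illustration, not the TGSOM model as specified: the text does not give the decay schedules or the exact neighborhood definition, so the sketch assumes a learning rate and radius that decay linearly to 0 over N iterations, defines the neighborhood as all non-BMU nodes whose weights lie within the radius of the BMU's weight (Euclidean distance), and initializes the root from a random sample. All function and parameter names are hypothetical.

```python
import math
import random


def growth_threshold(dim, sf, n_nodes):
    """GT = [D x (1 - SF)^2] / [1 + 1/n(t)] from step b2."""
    return dim * (1 - sf) ** 2 / (1 + 1 / n_nodes)


def train_tgsom(vectors, sf=0.3, n_iter=300, lr0=0.5, radius0=0.5, seed=0):
    """Return (weights, parent): node weight vectors and the tree structure."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # b1: a single root node (assumption: initialized from a random sample).
    weights = [list(rng.choice(vectors))]
    parent = [None]  # parent[i] is the index of node i's parent in the tree

    for k in range(n_iter):  # b8: stop after N iterations, when lr reaches 0
        lr = lr0 * (1 - k / n_iter)          # assumed linear decay
        radius = radius0 * (1 - k / n_iter)  # assumed linear decay
        gt = growth_threshold(dim, sf, len(weights))          # b2
        v = rng.choice(vectors)                               # b3
        dists = [math.dist(w, v) for w in weights]            # b4
        bmu = min(range(len(weights)), key=dists.__getitem__)

        if dists[bmu] <= gt:
            # b6: move the BMU's neighborhood neurons toward the input.
            for j, w in enumerate(weights):
                if j != bmu and math.dist(w, weights[bmu]) <= radius:
                    weights[j] = [wj + lr * (vi - wj) for wj, vi in zip(w, v)]
        else:
            # b7: grow a child of the BMU with w_child = v.
            weights.append(list(v))
            parent.append(bmu)

    return weights, parent


# Two well-separated toy groups in the normalized 3-D feature space.
vectors = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.1), (0.0, 0.1, 0.0),
    (1.0, 1.0, 1.0), (0.9, 1.0, 0.9), (1.0, 0.9, 1.0),
]
weights, parent = train_tgsom(vectors)
```

On this toy input the network grows one child, because only inputs from the group not covered by the root lie farther than GT from every existing node. The number of nodes therefore follows from the data and SF rather than being fixed in advance.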

c. Cluster all pedestrians with the K-means algorithm, using the set of weight vectors as the initial cluster centers, to obtain the behavior pattern classification result for each pedestrian.

c1. For each sample v (i.e. input vector v), calculate the Euclidean distance to each cluster center vector and select the nearest center as the category to which the sample initially belongs.

c2. After all samples have been initially classified, adjust each cluster center: each cluster center is set to the central position of all samples in its category, i.e. each dimension of the new cluster center vector is set to the mean of the corresponding dimension over all samples belonging to it.

c3. Repeat steps c1 and c2 until the categories of the samples no longer change, at which point the classification is finished.
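Steps c1 to c3 correspond to standard K-means with externally supplied initial centers, which can be sketched as follows. The cap `max_rounds` and the choice to leave an empty cluster's center unchanged are assumptions added for safety; in the method described here, `initial_centers` would be the weight vectors output by the TGSOM network, while the example uses made-up values.

```python
import math


def kmeans_from_centers(samples, initial_centers, max_rounds=100):
    """K-means seeded with given centers; returns (labels, centers)."""
    centers = [list(c) for c in initial_centers]
    labels = None
    for _ in range(max_rounds):
        # c1: assign every sample to its nearest center (Euclidean distance).
        new_labels = [
            min(range(len(centers)), key=lambda c: math.dist(s, centers[c]))
            for s in samples
        ]
        # c3: stop once no sample changes category.
        if new_labels == labels:
            break
        labels = new_labels
        # c2: move each center to the per-dimension mean of its members.
        for c in range(len(centers)):
            members = [s for s, label in zip(samples, labels) if label == c]
            if members:  # assumption: an empty cluster keeps its old center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


samples = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.1), (0.0, 0.1, 0.0),
    (1.0, 1.0, 1.0), (0.9, 1.0, 0.9), (1.0, 0.9, 1.0),
]
# Illustrative initial centers standing in for TGSOM weight vectors.
labels, centers = kmeans_from_centers(samples, [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)])
```

Seeding K-means this way fixes both the number of clusters and their starting positions, which addresses the sensitivity to initial centers noted earlier.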

After the final classification is finished, pedestrians with the same or similar features are grouped together into a plurality of clusters. The behavior category of each cluster is judged from the features of that cluster and is taken as the behavior pattern of every pedestrian in the cluster, which yields the behavior pattern classification result for each pedestrian.

It should be noted that the behavior category of each cluster can be judged from the cluster's features by a classification algorithm, with the behavior category as the label and the cluster features as the input; it can also be judged manually, by inferring the category from the features of each cluster. The resulting behavior patterns are, for example, out-of-city workers with an overspeed habit, in-city idlers without an overspeed habit, unstable individuals with an overspeed habit, and so on, so that public safety deployment, traffic management and the like can be adjusted according to the behavior patterns of pedestrians.

In the pedestrian behavior pattern classification method for track big data described above, the originally collected data is preprocessed, which prevents drift data from affecting the classification result and keeps subsequent data processing smooth. The active period of pedestrian activity, the main behavior intention of the pedestrian and the overspeed behavior habit of the pedestrian are used as the pedestrian features for classification, so that both the relation between pedestrian behavior and urban functional area type and the particular behavior habits of pedestrians are taken into account, and the resulting behavior pattern division fits actual conditions more closely. The pedestrian behavior patterns are classified with a combined clustering algorithm that combines the TGSOM network model and the K-means algorithm, which balances classification quality and efficiency.

The technical features of the above-described embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this description.

The above examples merely represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the invention. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application is to be determined by the claims appended hereto.

Claims

1. The pedestrian behavior pattern classification method for the track big data is characterized by comprising the following steps of:

based on the active period of pedestrian activity, the main behavior intention of the pedestrian and the overspeed behavior habit of the pedestrian, classifying the pedestrian behavior mode by adopting a combined clustering algorithm combining a TGSOM network model and a K-means algorithm to obtain a pedestrian behavior mode classification result, wherein the method comprises the following steps:

2. The method for classifying a pedestrian behavior pattern based on big trajectory data as claimed in claim 1, wherein the electric vehicle traveling data includes a plurality of electric vehicle records, and each electric vehicle record includes: unique identification of the electric vehicle, longitude and latitude coordinates, record generation time and record ending time;

3. The method for classifying a pedestrian behavior pattern based on trajectory big data as set forth in claim 2, wherein the preprocessing of the data includes:

4. The method for classifying a pedestrian behavior pattern based on big trajectory data according to claim 2, wherein the constructing electric vehicle trajectory data and pedestrian trajectory data from the preprocessed data includes:

5. The method for classifying a pattern of pedestrian behavior based on trajectory profile of claim 4, wherein said counting an active period of pedestrian activity based on the pedestrian trajectory profile comprises:

dividing the time of day into a plurality of time periods at time intervals T;

6. The classification method of pedestrian behavior patterns for large track data according to claim 1, wherein the acquiring POI data of the specified area and obtaining the city function area type division in the specified area according to the POI data comprises:

7. The method for classifying pedestrian behavior patterns facing big track data according to claim 2, wherein the steps of selecting the resident point of the pedestrian according to the track data of the pedestrian, dividing the resident point into city function area types according to the city function area types, and generating the main behavior intention of the pedestrian according to the city function area types to which the resident point belongs, comprise:

8. The classification method of pedestrian behavior patterns facing big track data according to claim 2, wherein calculating overspeed behavior according to track data of electric vehicles, counting the number of overspeed behavior of each electric vehicle to obtain overspeed behavior habit of pedestrian corresponding to electric vehicle, comprises: