CN106682051B - Method for finding out crowd movement behaviors - Google Patents


Info

Publication number
CN106682051B
CN106682051B
Authority
CN
China
Prior art keywords
distance
line segment
sequence
normalized
representative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510982408.8A
Other languages
Chinese (zh)
Other versions
CN106682051A (en)
Inventor
王恩慈
吴泰廷
高崎钧
王昭智
郭奕宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial Technology Research Institute ITRI
Original Assignee
Industrial Technology Research Institute ITRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US14/936,674 external-priority patent/US10417648B2/en
Application filed by Industrial Technology Research Institute ITRI filed Critical Industrial Technology Research Institute ITRI
Publication of CN106682051A publication Critical patent/CN106682051A/en
Application granted granted Critical
Publication of CN106682051B publication Critical patent/CN106682051B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9537Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Abstract

The invention discloses a method for finding out crowd movement behaviors, which comprises the following steps: collecting a plurality of location data regarding a plurality of user devices; detecting a plurality of conventional patterns in the position data to generate a plurality of representative sequences, wherein each representative sequence comprises at least one line segment between a starting position point and an ending position point; and classifying the representative sequences into a plurality of sets according to a plurality of sequence distances among the representative sequences so as to find the moving behaviors of the crowd.

Description

Method for finding out crowd movement behaviors
Technical Field
The present invention relates to a method for finding out the movement behavior of a crowd by collecting location data about a user device.
Background
For many companies and organizations, such as chain convenience stores, mass transit companies, and local governments, knowing how people move within a city or among multiple cities can be important information. For these organizations, a number of significant decisions rely on information about where people are and where they move to and from, such as setting new bus routes, building new transit stations, opening new storefronts, and building urban public facilities. Therefore, how to effectively find out information related to crowd movement is one of the issues the industry seeks to address.
Disclosure of Invention
The invention relates to a method for finding out the moving behavior of a crowd and a non-transitory computer readable medium for executing the method.
According to an embodiment of the present invention, a method for finding a movement behavior of a crowd is provided, the method includes: collecting a plurality of location data regarding a plurality of user devices; detecting a plurality of conventional patterns in the position data to generate a plurality of representative sequences, wherein each representative sequence comprises at least one line segment between a starting position point and an ending position point; and classifying the representative sequences into a plurality of sets according to a plurality of sequence distances among the representative sequences so as to find the moving behaviors of the crowd.
In order to better understand the above and other aspects of the present invention, the following embodiments are described in detail with reference to the accompanying drawings:
drawings
Fig. 1 is a schematic diagram illustrating an example of a payment process using a smart card.
FIG. 2 is a flowchart illustrating a method for identifying crowd movement according to an embodiment of the invention.
FIG. 3 is a diagram of an example payment record associated with location data and time retrieved from a plurality of user devices.
FIG. 4 is a flow chart illustrating collecting location data about a user device according to one embodiment of the invention.
Fig. 5A and 5B are schematic diagrams illustrating an example of marking a payment location with a closest reference location point according to an embodiment of the invention.
FIG. 6 is a diagram illustrating a consolidated and simplified payment record according to one embodiment of the invention.
FIG. 7 is a flow chart illustrating detecting a pattern in position data to generate a representative sequence according to one embodiment of the invention.
FIG. 8 is a schematic diagram illustrating aggregation of conventional patterns according to an embodiment of the present invention.
FIG. 9 is a flow chart illustrating a process of calculating a sequence distance between representative sequences according to one embodiment of the invention.
FIG. 10 is a flowchart illustrating a process of calculating a line distance between a first line segment and a second line segment according to an embodiment of the invention.
FIG. 11 is a schematic diagram illustrating a distance between two segments according to an embodiment of the invention.
FIG. 12 is a flowchart illustrating a process of calculating a parallel distance between a first line segment and a second line segment according to an embodiment of the invention.
FIG. 13 is a schematic diagram illustrating a distance between two segments according to an embodiment of the invention.
FIG. 14 is a flowchart illustrating the calculation of the normalized angular distance, the normalized vertical distance, and the normalized parallel distance according to one embodiment of the present invention.
FIGS. 15A-15D are schematic diagrams illustrating various scenarios considered for the maximum of the vertical distance domain according to an embodiment of the invention.
FIG. 16 is a flowchart illustrating a process of determining a sequence distance between a first sequence and a second sequence according to an embodiment of the invention.
FIGS. 17A-17C are diagrams illustrating multiple mapping combinations between two representative sequences according to one embodiment of the invention.
FIG. 18 illustrates an example of an invalid mapping between two representative sequences.
FIG. 19 is a flow chart illustrating a process of calculating the mapping distance of a mapping combination according to an embodiment of the invention.
FIG. 20 is a flow chart illustrating the classification of representative sequences into sets according to one embodiment of the invention.
FIG. 21 is a flowchart illustrating the steps of finding the movement behavior of a crowd and finding a typical sequence according to an embodiment of the invention.
FIG. 22 is a flowchart illustrating finding a typical sequence of a target date type according to one embodiment of the present invention.
Detailed Description
In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.
Many people in modern life use smart cards to ride mass transit such as buses, trains, and short-haul transport, and smart cards may also be used as electronic wallets to purchase items or pay fees. For example, a smart card may be loaded with funds in advance and then used at vending machines, when entering and leaving parking lots, or when entering and leaving train stations. Fig. 1 is a schematic diagram of an example of a payment activity performed with a smart card; in this case, the smart card is a contactless smart card. Whenever the smart card is used, a Payment Log or transit log is generated, and since the vending machine or station gate has static geographic information, location data about the smart card usage of a plurality of users can be collected. The service provider issuing the smart cards may gather these payment records to obtain information about where and how the crowd moves.
FIG. 2 is a flowchart illustrating a method for finding out crowd movement behaviors according to an embodiment of the present invention, which includes the following steps. In step S100, location data about a plurality of user devices is collected. In step S200, conventional patterns (Frequent Patterns) in the location data are detected (Mining) to generate a plurality of Representative Sequences. In step S300, the representative sequences are classified (Clustered) into sets according to the sequence distances between the representative sequences, so as to find out the movement behaviors of the crowd. Each representative sequence includes at least one line segment between a start position point and an end position point. The method may be implemented, for example, by a software program, which may be stored on an optical disc and may include a plurality of instructions associated with a computer processor; the instructions can be loaded by the computer processor to perform the method for finding the movement behavior of a crowd as described above. The details of each step are as follows.
Step S100: collect location data about a plurality of user devices. The user device may be a smart card, an electronic payment card, or a mobile device with payment capabilities, and relevant location data may be collected when the user device is used for a payment activity. For example, when a smart card is used for a payment activity at a payment terminal, a report may be uploaded to a central server; the report may include an identification (ID) of the smart card, the payment amount, the date and time, and the location of the payment terminal. The method for finding the movement behavior of the crowd is not limited to collecting location data during payment activities; location data may also be collected, for example, when the user device enters a station, when money is deposited into the user device, or when the user device is authenticated to enter a building. For ease of understanding, the following description uses the collection of location data at the time of a payment activity as an example, with payment records representing the collected location data.
FIG. 3 is a diagram of example payment records associated with location data and time retrieved from a plurality of user devices. The payment records may be stored at a central server of the payment service provider. In this example, each payment record includes the fields: uid, departure location, arrival location, time, payment amount, and transaction type. The uid represents the ID of the user device; the same uid corresponds to the same user device and thus possibly to the same user, so that information about where a person has been can be obtained. By collecting location data about a plurality of user devices, the service provider can learn what common movement trajectories most people have. The transaction type may be purchase, transit, deposit, or another type. For the transit type, the payment activity may occur at the destination stop, and the departure location and arrival location fields respectively record the relevant transit information, such as the geographic coordinates of the departure and arrival stops. For transaction types other than transit, the departure location field may be used to record the location of the payment activity, while the arrival location field may record a null value.
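As an aid to reading FIG. 3, the following is a minimal illustrative sketch (in Python) of one way such a payment record could be modeled; the class name, field names, and example values are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple

@dataclass
class PaymentRecord:
    """Hypothetical layout mirroring the fields described for FIG. 3."""
    uid: str                                # identifier of the user device
    departure: Tuple[float, float]          # geographic coordinates of the departure location
    arrival: Optional[Tuple[float, float]]  # None for non-transit transactions
    time: datetime                          # date and time of the payment activity
    amount: float                           # payment amount
    txn_type: str                           # e.g. "purchase", "transit", "deposit"

# Example: a transit payment recorded when leaving the arrival station.
rec = PaymentRecord(uid="uid_604",
                    departure=(25.0478, 121.5170),
                    arrival=(25.0330, 121.5654),
                    time=datetime(2015, 5, 4, 8, 15),
                    amount=20.0,
                    txn_type="transit")
print(rec.uid, rec.txn_type)
```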
In the above example, the recorded coordinates may be precise location data, and the number of distinct precise location coordinates in the payment records may be large, possibly because a large number of stores use the payment service. For finding the crowd movement behaviors of interest, such precision may not be needed, so neighboring locations can be treated as one Semantic Region, and a Reference Location Point can be selected to represent each semantic region. Step S100 may include steps S110 and S120, as shown in FIG. 4, which is a flowchart illustrating collecting location data about user devices according to an embodiment of the present invention.
Step S110: select a plurality of reference location points. Examples of reference location points may include schools, chain convenience stores, and landmark locations determined according to the residential population. Step S120: replace each location point in the location data with the reference location point closest to its geographic location. For each payment record, the original precise location point may be replaced with the closest one of the reference location points. FIGS. 5A and 5B are schematic diagrams illustrating an example of labeling payment locations with the closest reference location points according to an embodiment of the invention. FIG. 5A shows three preselected reference location points Ref_a, Ref_b, and Ref_c, represented by three differently shaded triangles, and the original payment locations, represented by hollow circles. Next, each payment location is replaced with the reference location point whose geographic location is closest; as shown in FIG. 5B, each payment location is shaded to match its corresponding reference location point. After the payment locations are labeled with the closest reference location points, the geographic coordinates in the original payment records may be replaced with these reference location points. Steps S110 and S120 are optional; that is, even if steps S110 and S120 are not executed and the original payment locations remain in the payment records, the subsequent steps S200 and S300 can still be executed on the original payment locations.
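A minimal sketch of step S120 is shown below, under the assumption that locations are plane coordinates and that plain Euclidean distance is an acceptable stand-in for geographic distance; the names nearest_reference and refs, and the coordinates, are illustrative.

```python
from math import hypot
from typing import Dict, Tuple

Point = Tuple[float, float]

def nearest_reference(point: Point, refs: Dict[str, Point]) -> str:
    """Step S120: return the name of the reference location point closest to `point`.
    Plain Euclidean distance on the raw coordinates is used here for simplicity."""
    return min(refs, key=lambda name: hypot(point[0] - refs[name][0],
                                            point[1] - refs[name][1]))

# Step S110: three preselected reference location points (coordinates are illustrative).
refs = {"Ref_a": (0.0, 0.0), "Ref_b": (5.0, 0.0), "Ref_c": (0.0, 5.0)}
print(nearest_reference((1.0, 0.5), refs))   # -> Ref_a
```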
Step S200: detect conventional patterns in the location data to generate a plurality of representative sequences. After the data collection and pre-processing stages described above, the payment records may be converted into payment sequences, each payment sequence including a series of items representing the entire payment trajectory of a particular user over a particular time period. The items in a payment sequence may be reference location points as shown in FIGS. 5A and 5B; an exemplary payment sequence may be {uid_677: Ref_h, Ref_c, Ref_c}. From the plurality of payment sequences, a Sequential Pattern Mining algorithm, such as PrefixSpan or Generalized Sequential Pattern (GSP), may be used to find the conventional patterns in the payment sequences. After sequential pattern mining, the representative sequences for a specific time period and their corresponding Support Counts can be obtained; the support count represents the number of occurrences and can be calculated within the mining algorithm. Each representative sequence includes at least one line segment between a start position point and an end position point, which may be precise locations or reference location points. For example, a payment record of the transit type can be considered as a line segment between the departure location and the arrival location. One example of a representative sequence is <Ref_a, Ref_d, Ref_e>, which includes two line segments: one from Ref_a to Ref_d and the other from Ref_d to Ref_e.
In step S200, the payment records may be sorted according to uid, date, and time; for example, transactions corresponding to the same user device may be grouped together, and the transactions of each user device may be sorted in chronological order. Further, each location point may be labeled with its closest reference location point, as shown in FIG. 5B. FIG. 6 is a diagram illustrating consolidated and simplified payment records according to one embodiment of the invention. In this example, a transaction sequence <Ref_a, Ref_d, Ref_e> may be formed for uid 604, and another transaction sequence <Ref_h, Ref_c, Ref_c> may be formed for uid 677; a sequential pattern mining algorithm may then be applied to the plurality of transaction sequences to find the conventional patterns.
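A minimal sketch of the grouping and sorting described above, assuming the location points have already been replaced by reference location points; the record layout (uid, time, label) is an assumption for illustration.

```python
from collections import defaultdict

def build_transaction_sequences(records):
    """Group labeled payment records by uid and sort each group chronologically,
    yielding one transaction sequence of reference location points per user device.
    `records` is an iterable of (uid, time, reference_label) tuples."""
    grouped = defaultdict(list)
    for uid, time, label in records:
        grouped[uid].append((time, label))
    return {uid: [label for _, label in sorted(items)]
            for uid, items in grouped.items()}

records = [
    ("uid_604", "07:55", "Ref_a"), ("uid_604", "08:20", "Ref_d"), ("uid_604", "08:40", "Ref_e"),
    ("uid_677", "08:00", "Ref_h"), ("uid_677", "08:30", "Ref_c"), ("uid_677", "09:00", "Ref_c"),
]
print(build_transaction_sequences(records))
# {'uid_604': ['Ref_a', 'Ref_d', 'Ref_e'], 'uid_677': ['Ref_h', 'Ref_c', 'Ref_c']}
```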
FIG. 7 is a flowchart illustrating detecting conventional patterns in the location data to generate representative sequences according to one embodiment of the invention. Step S200 may include steps S210, S220, and S230, which are performed after pattern mining. In step S210, any conventional pattern having only a single location point is removed. In step S220, identical adjacent location points are removed from each conventional pattern. Conventional patterns representing staying in place, such as <Ref_a> and <Ref_a, Ref_a>, are excluded because the method aims at finding the movement behavior of the crowd. In addition, for a conventional pattern including at least two identical adjacent location points, such as <Ref_h, Ref_c, Ref_c>, the repeated adjacent location points are removed. The representative sequences thus obtained do not include identical adjacent location points, and each line segment in each representative sequence represents a direction of crowd movement.
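A minimal sketch of steps S210 and S220, assuming each conventional pattern is given as a list of location labels.

```python
def clean_patterns(patterns):
    """Steps S210 and S220: collapse identical adjacent location points and drop any
    pattern that then has fewer than two location points (i.e., staying in place)."""
    cleaned = []
    for pattern in patterns:
        collapsed = [p for i, p in enumerate(pattern) if i == 0 or p != pattern[i - 1]]
        if len(collapsed) >= 2:
            cleaned.append(collapsed)
    return cleaned

print(clean_patterns([["Ref_a"], ["Ref_a", "Ref_a"], ["Ref_h", "Ref_c", "Ref_c"]]))
# [['Ref_h', 'Ref_c']]
```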
In step S230, the conventional patterns over several days are aggregated to produce the representative sequences. The collected payment records may be sorted by time and by date, since the trajectories of crowd movement may differ across time segments. For example, the three periods 6:30 am to 9:30 am, 10:30 am to 1:30 pm, and 4:30 pm to 7:30 pm may correspond to different activities of the crowd. If the statistical analysis of a specific time segment needs to accumulate data over multiple days, the conventional patterns of multiple days corresponding to that time segment can be aggregated. FIG. 8 is a schematic diagram illustrating aggregation of conventional patterns according to an embodiment of the present invention. In this example, the representative sequences in the 6:30 am to 9:30 am period of all working days in May are aggregated, and the numbers in the table represent the support of the representative sequences. As shown in FIG. 8, the support of the sequence <Ref_d, Ref_e> accumulated over the working days is 5750. The number 23/23 is the Occurrence Rate, indicating that the sequence <Ref_d, Ref_e> occurs on 23 of the 23 working days of May. By aggregating statistics over multiple days, the movement trends of most people can be found more accurately.
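A minimal sketch of the per-day aggregation of step S230, assuming the conventional patterns of each day (for one time segment) are given together with their support counts; the occurrence rate is returned as the pair (days on which the pattern occurs, total days), matching the 23/23 notation of FIG. 8.

```python
from collections import defaultdict

def aggregate_patterns(daily_patterns):
    """Step S230: aggregate the conventional patterns of one time segment over several
    days.  `daily_patterns` maps a date to {pattern: support_count}; the result gives,
    per pattern, the accumulated support and the occurrence rate (days seen, total days)."""
    support = defaultdict(int)
    days_seen = defaultdict(int)
    for patterns in daily_patterns.values():
        for pattern, count in patterns.items():
            support[pattern] += count
            days_seen[pattern] += 1
    total_days = len(daily_patterns)
    return {p: {"support": support[p], "occurrence_rate": (days_seen[p], total_days)}
            for p in support}

daily = {
    "2015-05-04": {("Ref_d", "Ref_e"): 250},
    "2015-05-05": {("Ref_d", "Ref_e"): 300, ("Ref_a", "Ref_d"): 120},
}
print(aggregate_patterns(daily))
# ('Ref_d', 'Ref_e'): support 550, occurrence_rate (2, 2); ('Ref_a', 'Ref_d'): support 120, (1, 2)
```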
In step S300, the representative sequences are classified into sets according to the sequence distances between the representative sequences, so as to find out the movement behaviors of the crowd. After steps S100 and S200 are performed, representative sequences describing the movement behavior of the crowd have been found; similar representative sequences can then be classified into sets to find the movement behaviors of the crowd over larger areas. This is beneficial because most crowd movement activities of interest concern movement between two large areas rather than movement between two particular buildings. In step S300, the sequence distance between two representative sequences is calculated to determine their similarity; for example, a shorter sequence distance represents a higher degree of similarity (e.g., geographically closer) between the two representative sequences. The representative sequences can then be classified into sets according to the calculated sequence distances.
An example of calculating the sequence distance is described below. A representative sequence may be considered a sequence of line segments. In the illustrated example, the representative sequences include a first sequence Seq_a and a second sequence Seq_b. The first sequence Seq_a includes a first line segment L1 between a first start position point L1_s and a first end position point L1_e. The second sequence Seq_b includes a second line segment L2 between a second start position point L2_s and a second end position point L2_e. The sequence distance between the first sequence Seq_a and the second sequence Seq_b is determined according to the line segment distance between the first line segment L1 and the second line segment L2. The first line segment L1 has a direction (from the first start position point L1_s to the first end position point L1_e), and the second line segment L2 also has a direction (from the second start position point L2_s to the second end position point L2_e). Therefore, in the following description, the vectors $\vec{L_1}$ and $\vec{L_2}$ are used to represent the first line segment L1 and the second line segment L2, respectively.
FIG. 9 is a flowchart illustrating a process of calculating the sequence distance between representative sequences according to one embodiment of the invention. The first sequence Seq_a and the second sequence Seq_b form a sequence pair among the representative sequences, and for each sequence pair, step S300 (classifying the representative sequences into sets) may further include steps S310 and S320. In step S310, the line segment distance between the first line segment L1 and the second line segment L2 is calculated; the line segment distance represents the degree of closeness or similarity of the two line segments. In step S320, the sequence distance between the first sequence Seq_a and the second sequence Seq_b is determined according to the line segment distance between the first line segment L1 and the second line segment L2; in other words, the similarity between two representative sequences is determined according to the similarity between the line segments of the two representative sequences.
FIG. 10 is a flowchart illustrating a process of calculating the line segment distance between the first line segment and the second line segment according to an embodiment of the invention. Step S310 may include the following steps. In step S311, the angular distance $d_\theta$ (Angle Distance), the vertical distance $d_\perp$ (Perpendicular Distance), and the parallel distance $d_\parallel$ (Parallel Distance) between the first line segment L1 and the second line segment L2 are calculated. In step S312, the normalized angular distance $Nd_\theta$, the normalized vertical distance $Nd_\perp$, and the normalized parallel distance $Nd_\parallel$ are calculated from the angular distance $d_\theta$, the vertical distance $d_\perp$, and the parallel distance $d_\parallel$, where the normalized angular distance $Nd_\theta$, the normalized vertical distance $Nd_\perp$, and the normalized parallel distance $Nd_\parallel$ are within the same value range. In step S313, the line segment distance between the first line segment L1 and the second line segment L2 is determined according to a weighted sum of the normalized angular distance $Nd_\theta$, the normalized vertical distance $Nd_\perp$, and the normalized parallel distance $Nd_\parallel$. An example of calculating the line segment distance between two line segments is described below.
FIG. 11 is a schematic diagram illustrating the distance between two line segments according to an embodiment of the invention. The line segment distance is determined by three components: the angular distance $d_\theta$, the vertical distance $d_\perp$, and the parallel distance $d_\parallel$. The angular distance $d_\theta$ is related to the included angle $\theta$ ($0 \le \theta \le 180^\circ$) between $\vec{L_1}$ and $\vec{L_2}$. For example, the included angle $\theta$ can be calculated according to the formula $\cos\theta = \frac{\vec{L_1}\cdot\vec{L_2}}{\lVert\vec{L_1}\rVert\,\lVert\vec{L_2}\rVert}$, where $\vec{L_1}\cdot\vec{L_2}$ is the inner product (dot product) of the two vectors, and $\lVert\vec{L_1}\rVert$ and $\lVert\vec{L_2}\rVert$ are the lengths of the two vectors. The angular distance $d_\theta$ can be calculated according to the following equation (1):

$$d_\theta=\begin{cases}\min(\lVert\vec{L_1}\rVert,\lVert\vec{L_2}\rVert)\times\sin\theta, & 0\le\theta\le 90^\circ\\ \min(\lVert\vec{L_1}\rVert,\lVert\vec{L_2}\rVert), & 90^\circ<\theta\le 180^\circ\end{cases}\qquad(1)$$

The angular distance $d_\theta$ represents the similarity of the two vectors in pointing direction: the smaller the included angle $\theta$, the smaller the angular distance $d_\theta$. When the included angle $\theta$ is larger than $90^\circ$, the two vectors point in substantially opposite directions, and the angular distance $d_\theta$ is set to the maximum possible value of the angular distance domain to indicate that the two vectors are not similar in direction.
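A minimal sketch of the angular distance of equation (1); the piecewise form is inferred from the stated properties (the distance grows with the included angle and its maximum is the length of the shorter segment), so it should be read as an illustration rather than the exact published formula. A segment is represented as a pair of endpoints.

```python
import math

def angle_distance(l1, l2):
    """Angular distance d_theta of equation (1): the length of the shorter segment
    scaled by sin(theta) when the two segments point in roughly the same direction,
    and the full length of the shorter segment once theta exceeds 90 degrees.
    A segment is a pair of endpoints ((x_s, y_s), (x_e, y_e))."""
    v1 = (l1[1][0] - l1[0][0], l1[1][1] - l1[0][1])
    v2 = (l2[1][0] - l2[0][0], l2[1][1] - l2[0][1])
    len1, len2 = math.hypot(*v1), math.hypot(*v2)
    cos_theta = (v1[0] * v2[0] + v1[1] * v2[1]) / (len1 * len2)
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))   # clamp against rounding error
    shorter = min(len1, len2)
    return shorter * math.sin(theta) if theta <= math.pi / 2 else shorter

# Two nearly parallel segments give a small angular distance.
print(angle_distance(((0, 0), (10, 0)), ((2, 3), (7, 4))))   # ~ 1.0
```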
FIG. 12 is a flowchart illustrating a process of calculating the parallel distance between the first line segment and the second line segment according to an embodiment of the invention. Step S311 includes the following steps. In step S331, the second start position point L2_s is projected onto the extension line of the first line segment L1 to obtain a third start projection point L3_s. In step S332, the second end position point L2_e is projected onto the extension line of the first line segment L1 to obtain a third end projection point L3_e. In step S333, the third start projection point L3_s and the third end projection point L3_e are connected to generate a third line segment L3; the resulting third start projection point L3_s, third end projection point L3_e, and third line segment L3 are shown in FIG. 11. In step S334, the intersection of the first line segment L1 and the third line segment L3 is subtracted from the union of the first line segment L1 and the third line segment L3 to determine the parallel distance $d_\parallel$. The parallel distance $d_\parallel$ can be calculated according to the following equation (2):

$$d_\parallel = (L1\cup L3) - (L1\cap L3)\qquad(2)$$

Since the third line segment L3 is formed by projecting the second line segment L2 onto the first line segment L1, the third line segment L3 is collinear with the first line segment L1. In the example shown in FIG. 11, the union of the first line segment L1 and the third line segment L3 is the length from the first start position point L1_s to the first end position point L1_e, and the intersection of the first line segment L1 and the third line segment L3 is the length from the third start projection point L3_s to the third end projection point L3_e. In this example, the second line segment L2 is projected onto the first line segment L1; in other embodiments, the first line segment L1 may instead be projected onto the (extended) second line segment L2 to obtain the parallel distance $d_\parallel$ (the calculated values may differ). The parallel distance $d_\parallel$ represents the degree of similarity between the equivalent parallel lengths of the two line segments.
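A minimal sketch of the parallel distance of equation (2), computed by projecting L2 onto the line through L1 and taking the union minus the intersection of the two collinear spans; for projections that do not overlap L1, the union is read here as the covering span, which is an assumption not spelled out in the text.

```python
import math

def parallel_distance(l1, l2):
    """Parallel distance d_par of equation (2): project L2 onto the line through L1
    (steps S331-S333) and return |union| - |intersection| of the two collinear spans
    (step S334)."""
    (x1, y1), (x2, y2) = l1
    dx, dy = x2 - x1, y2 - y1
    seg_len2 = dx * dx + dy * dy

    def param(p):
        # Scalar position of the projection of p along L1 (0 at L1_s, 1 at L1_e).
        return ((p[0] - x1) * dx + (p[1] - y1) * dy) / seg_len2

    b1, b2 = sorted((param(l2[0]), param(l2[1])))        # span of L3 on the line of L1
    union = max(1.0, b2) - min(0.0, b1)                  # L1 itself occupies [0, 1]
    intersection = max(0.0, min(1.0, b2) - max(0.0, b1))
    return (union - intersection) * math.sqrt(seg_len2)  # convert back to length units

# FIG. 11-style example: the projection of L2 lies inside L1.
print(parallel_distance(((0, 0), (10, 0)), ((2, 3), (7, 4))))   # 10 - 5 = 5
```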
The vertical distance $d_\perp$ can be calculated according to the following equation (3):

$$d_\perp = \frac{l_{\perp s}^{2} + l_{\perp e}^{2}}{l_{\perp s} + l_{\perp e}}\qquad(3)$$

where $l_{\perp s}$ is the Euclidean distance between the second start position point L2_s and the third start projection point L3_s, and $l_{\perp e}$ is the Euclidean distance between the second end position point L2_e and the third end projection point L3_e. Equation (3) is the contraharmonic mean of $l_{\perp s}$ and $l_{\perp e}$.
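A minimal sketch of the vertical distance of equation (3), i.e., the contraharmonic mean of the two endpoint-to-line distances.

```python
import math

def perpendicular_distance(l1, l2):
    """Vertical (perpendicular) distance d_perp of equation (3): the contraharmonic
    mean (a^2 + b^2) / (a + b) of the Euclidean distances from the endpoints of L2
    to their projections on the line through L1."""
    (x1, y1), (x2, y2) = l1
    dx, dy = x2 - x1, y2 - y1
    len1 = math.hypot(dx, dy)

    def dist_to_line(p):
        # Unsigned distance from point p to the infinite line through L1.
        return abs((p[0] - x1) * dy - (p[1] - y1) * dx) / len1

    a, b = dist_to_line(l2[0]), dist_to_line(l2[1])
    return 0.0 if a + b == 0 else (a * a + b * b) / (a + b)

# Endpoint distances 3 and 4 give (9 + 16) / 7 = 25/7.
print(perpendicular_distance(((0, 0), (10, 0)), ((2, 3), (7, 4))))   # ~ 3.571
```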
FIG. 13 is a schematic diagram illustrating the distance between two line segments according to an embodiment of the invention. The angular distance $d_\theta$, the parallel distance $d_\parallel$, and the vertical distance $d_\perp$ can likewise be calculated according to equations (1), (2), and (3), respectively. In this example, the included angle $\theta$ is greater than $90^\circ$, so the angular distance $d_\theta$ is equal to $\min(\lVert\vec{L_1}\rVert,\lVert\vec{L_2}\rVert)$. As for the parallel distance $d_\parallel$, the union of the first line segment L1 and the third line segment L3 is the length from the first start position point L1_s to the third end projection point L3_e, and the intersection of the first line segment L1 and the third line segment L3 is the length from the third start projection point L3_s to the first end position point L1_e.
As mentioned above, the three components are considered simultaneously when calculating the line segment distance. However, since the value ranges of these three components may be very different, it is not easy to obtain a meaningful combination directly from them. In the method of the present invention, the normalized angular distance $Nd_\theta$, the normalized parallel distance $Nd_\parallel$, and the normalized vertical distance $Nd_\perp$ are calculated in step S312, where the normalized angular distance $Nd_\theta$, the normalized parallel distance $Nd_\parallel$, and the normalized vertical distance $Nd_\perp$ are within the same value range, e.g., $[0,1]$, which represents the range from 0 to 1 inclusive. Since the values of the three normalized components are in the same value range, a linear combination of the three normalized components is meaningful for calculating the line segment distance between two line segments. In one embodiment, the line segment distance is a weighted sum of the normalized angular distance $Nd_\theta$, the normalized parallel distance $Nd_\parallel$, and the normalized vertical distance $Nd_\perp$, and may be calculated according to the following equation (4):

$$\text{line segment distance} = w_1\times Nd_\theta + w_2\times Nd_\parallel + w_3\times Nd_\perp,\quad\text{where } w_1+w_2+w_3=1\qquad(4)$$

For example, $w_1$, $w_2$, and $w_3$ may all be equal to $\tfrac{1}{3}$ to obtain the average of the normalized angular distance $Nd_\theta$, the normalized parallel distance $Nd_\parallel$, and the normalized vertical distance $Nd_\perp$.
FIG. 14 is a flowchart illustrating the calculation of the normalized angular distance, the normalized vertical distance, and the normalized parallel distance according to one embodiment of the present invention. Step S312 may include the following steps. In step S341, the angular distance $d_\theta$ is divided by the maximum value of the angular distance domain to obtain the normalized angular distance $Nd_\theta$. In step S342, the vertical distance $d_\perp$ is divided by the maximum value of the vertical distance domain to obtain the normalized vertical distance $Nd_\perp$. In step S343, the parallel distance $d_\parallel$ is divided by the maximum value of the parallel distance domain to obtain the normalized parallel distance $Nd_\parallel$. Since each of the three normalized distances is generated by dividing by the maximum value of its respective distance domain, the values of all three normalized distance components fall within the range $[0,1]$.

As shown in equation (1), the maximum value of the angular distance domain is the length of the shorter one of the first line segment L1 and the second line segment L2. As shown in equation (2), the maximum value of the parallel distance domain is the union of the first line segment L1 and the third line segment L3. The maximum value of the vertical distance domain is not easily seen directly from equation (3); the related calculation is explained below.
FIGS. 15A-15D are schematic diagrams illustrating various scenarios considered for the maximum of the vertical distance domain according to an embodiment of the invention. According to equation (3) and the geometric relationship between the first line segment L1 and the second line segment L2, the maximum value of the vertical distance domain occurs when $\vec{L_2}$ is perpendicular to $\vec{L_1}$. Thus, in one embodiment, the second line segment L2 may be rotated about the second start position point L2_s or about the second end position point L2_e until it is perpendicular to the first line segment L1, and the maximum value of the vertical distance domain is the vertical distance between the first line segment L1 and the rotated second line segment. FIGS. 15A to 15D show the four possible rotation cases. The maximum value of the vertical distance domain is the largest vertical distance among the four possible cases and can be calculated according to the following equation (5):

$$\max(d_\perp)=\max_{k=1,\dots,4}\ \frac{l_{\perp s,k}^{2}+l_{\perp e,k}^{2}}{l_{\perp s,k}+l_{\perp e,k}}\qquad(5)$$

where $l_{\perp s,k}$ and $l_{\perp e,k}$ are the vertical distances from the endpoints of the rotated second line segment to the first line segment L1 in the $k$-th rotation case; in each case, one of these distances equals the distance from the pivot point to the line of L1, and the other differs from it by $l_2$, where $l_2$ represents the length of the second line segment L2.
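A minimal sketch of the normalization of steps S341-S343 and the weighted sum of equation (4), reusing the angle_distance, parallel_distance, and perpendicular_distance sketches above; the maximum of the vertical distance domain follows the four-rotation-case reading of equation (5) and is therefore an inference.

```python
import math

def _union_of_projection(l1, l2):
    # Length of the union of L1 and L3 (the projection of L2 onto the line through L1),
    # read as the covering span: the maximum of the parallel-distance domain.
    (x1, y1), (x2, y2) = l1
    dx, dy = x2 - x1, y2 - y1
    seg2 = dx * dx + dy * dy
    t = lambda p: ((p[0] - x1) * dx + (p[1] - y1) * dy) / seg2
    b1, b2 = sorted((t(l2[0]), t(l2[1])))
    return (max(1.0, b2) - min(0.0, b1)) * math.sqrt(seg2)

def _max_perpendicular(l1, l2):
    # Maximum of the vertical-distance domain over the four rotation cases of
    # FIGS. 15A-15D: rotate L2 about L2_s or L2_e, toward or away from L1, until it is
    # perpendicular to L1, and keep the largest resulting d_perp of equation (3).
    (x1, y1), (x2, y2) = l1
    dx, dy = x2 - x1, y2 - y1
    len1 = math.hypot(dx, dy)
    dist = lambda p: abs((p[0] - x1) * dy - (p[1] - y1) * dx) / len1
    a, b = dist(l2[0]), dist(l2[1])          # distances of L2_s and L2_e to the line of L1
    l2_len = math.dist(*l2)
    cm = lambda u, v: 0.0 if u + v == 0 else (u * u + v * v) / (u + v)
    return max(cm(a, a + l2_len), cm(a, abs(a - l2_len)),
               cm(b, b + l2_len), cm(b, abs(b - l2_len)))

def segment_distance(l1, l2, w=(1/3, 1/3, 1/3)):
    """Equation (4): weighted sum of the three components, each divided by the maximum
    value of its domain (steps S341-S343).  Reuses angle_distance, parallel_distance
    and perpendicular_distance from the sketches above."""
    max_theta = min(math.dist(*l1), math.dist(*l2))        # max of the angular domain
    max_par = _union_of_projection(l1, l2)                 # max of the parallel domain
    max_perp = _max_perpendicular(l1, l2)                  # max of the vertical domain
    nd_theta = angle_distance(l1, l2) / max_theta if max_theta else 0.0
    nd_par = parallel_distance(l1, l2) / max_par if max_par else 0.0
    nd_perp = perpendicular_distance(l1, l2) / max_perp if max_perp else 0.0
    return w[0] * nd_theta + w[1] * nd_par + w[2] * nd_perp

print(segment_distance(((0, 0), (10, 0)), ((2, 3), (7, 4))))   # a value in [0, 1]
```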
The line segment distance between two line segments can be obtained according to the above calculation procedure, and the sequence distance between two representative sequences can then be determined from the line segment distances between the line segments of the two representative sequences. FIG. 16 is a flowchart illustrating a process of determining the sequence distance between the first sequence and the second sequence according to an embodiment of the invention. Step S320 includes the following steps. In step S321, a plurality of mapping combinations are generated between the first sequence Seq_a and the second sequence Seq_b according to the at least one line segment of the first sequence Seq_a and the at least one line segment of the second sequence Seq_b. In step S322, the mapping distance of each mapping combination is calculated. In step S323, the minimum mapping distance among the mapping combinations is taken as the sequence distance between the first sequence Seq_a and the second sequence Seq_b.
The first sequence Seq_a may be a sequence in which a plurality of line segments are arranged in chronological order; for example, it may include two line segments LineSega1 and LineSega2, where the movement trajectory represented by line segment LineSega1 is earlier than the movement trajectory represented by line segment LineSega2. FIGS. 17A-17C are diagrams illustrating multiple mapping combinations between two representative sequences according to one embodiment of the invention. In this example, the second sequence Seq_b also includes two line segments LineSegb1 and LineSegb2 arranged in chronological order.
In FIG. 17A, line segment LineSegb1 maps to a null line segment φ, line segment LineSegb2 maps to line segment LineSega1, and line segment LineSega2 maps to a null line segment φ. It is noted that the chronological order within each representative sequence is maintained. FIGS. 17B and 17C show different mapping combinations, in which the chronological order of each representative sequence is also maintained. FIG. 18 shows an example of an invalid mapping between two representative sequences: it violates the chronological order because line segment LineSega2 (mapped to line segment LineSegb1) occurs later than line segment LineSega1 (mapped to line segment LineSegb2), whereas line segment LineSegb1 occurs earlier than line segment LineSegb2. For each valid mapping combination (as shown in FIGS. 17A-17C), a mapping distance may be calculated. The sequence distance between the first sequence Seq_a and the second sequence Seq_b may be the minimum mapping distance among the mapping combinations.
FIG. 19 is a flowchart illustrating a process of calculating the mapping distance of a mapping combination according to an embodiment of the invention. Step S322 includes the following steps. In step S351, a plurality of mapping pairs are formed between the at least one line segment in the first sequence Seq_a and the at least one line segment in the second sequence Seq_b according to chronological order. In step S352, the line segment distance of each mapping pair is calculated. In step S353, the average of the line segment distances of the mapping pairs is calculated to obtain the mapping distance.
Referring to FIG. 17A, the mapping combination in this example includes three mapping pairs: {φ, LineSegb1}, {LineSega1, LineSegb2}, and {LineSega2, φ}. The line segment distance of each mapping pair may be calculated according to the line segment distance calculation described above (including the three normalized distance components, steps S311 to S313, and equations (1) to (5)), and the line segment distance between a real line segment and a null line segment φ may be defined as 1 (the maximum possible value of the line segment distance domain). The mapping distance of the mapping combination shown in FIG. 17A may be the average of the line segment distances of the three mapping pairs; for example, the mapping distance of this mapping combination is equal to

$$\frac{1 + Nd(\text{LineSega1},\text{LineSegb2}) + 1}{3}$$

where $Nd(\cdot,\cdot)$ represents the line segment distance between two line segments. Similarly, there are two mapping pairs in FIG. 17B, and the mapping distance may be the average of the line segment distances of those two mapping pairs.
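A minimal sketch of steps S321-S323 and S351-S353: all order-preserving mapping combinations between two short sequences of segments are enumerated, each is scored by the average line segment distance of its mapping pairs (with 1 for any pair involving the null segment φ), and the smallest average is returned. The brute-force enumeration is only intended for the short sequences of this example; seg_dist may be any segment distance function, e.g., the segment_distance sketch above, and the toy example uses a placeholder.

```python
def sequence_distance(seq_a, seq_b, seg_dist, null_cost=1.0):
    """Steps S321-S323: enumerate every order-preserving mapping combination between
    the segments of two representative sequences (a segment may also map to the null
    segment, at a fixed cost), score each combination by the average segment distance
    of its mapping pairs (steps S351-S353), and return the minimum mapping distance."""
    def alignments(i, j):
        # Yield mapping combinations as lists of (index_in_a, index_in_b) pairs,
        # where None stands for the null line segment; chronological order is kept.
        if i == len(seq_a) and j == len(seq_b):
            yield []
            return
        if i < len(seq_a):
            for rest in alignments(i + 1, j):
                yield [(i, None)] + rest          # seq_a[i] maps to the null segment
        if j < len(seq_b):
            for rest in alignments(i, j + 1):
                yield [(None, j)] + rest          # seq_b[j] maps to the null segment
        if i < len(seq_a) and j < len(seq_b):
            for rest in alignments(i + 1, j + 1):
                yield [(i, j)] + rest             # seq_a[i] maps to seq_b[j]

    best = float("inf")
    for mapping in alignments(0, 0):              # brute force; fine for short sequences
        if not mapping:
            continue
        costs = [null_cost if a is None or b is None else seg_dist(seq_a[a], seq_b[b])
                 for a, b in mapping]
        best = min(best, sum(costs) / len(costs)) # mapping distance = average pair distance
    return best

# Toy example with a placeholder segment-distance function (0 for equal labels, 0.5 otherwise).
toy_dist = lambda s1, s2: 0.0 if s1 == s2 else 0.5
print(sequence_distance(["LineSega1", "LineSega2"], ["LineSega1", "LineSegb2"], toy_dist))  # 0.25
```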
In the manner of calculation described above, the sequence distance between two representative sequences can be calculated. FIG. 20 is a flow chart illustrating the classification of representative sequences into sets according to one embodiment of the invention. Step S300 includes the following steps. Step S360, each representative sequence is taken as a set. In step S370, a set distance between each set pair is calculated, and the set pair is formed by two sets. In step S380, the first set and the second set with the minimum set distance are found. In step S390, if the minimum set distance is smaller than the distance threshold, the first set and the second set are merged.
In this embodiment, an agglomerative hierarchical clustering method may be applied. In the initial state, each representative sequence is regarded as a set. Then, the set distance between each set pair (formed by two sets, which in the initial state are two representative sequences) is calculated; the set distance can be calculated according to the method of steps S351 to S353, since in the initial state this is equivalent to calculating the sequence distance between two representative sequences. The two sets with the minimum set distance are found and merged into one larger set if the minimum set distance is less than a distance threshold, e.g., 0.3. The process may then return to step S370 to repeatedly merge sets. After merging, some sets will contain a plurality of representative sequences. For a set pair in which a set has a plurality of representative sequences, the average of the sequence distances of all representative sequence pairs (All-Pairs) in the set pair, each pair being formed by one representative sequence from each of the two sets, can be used as the set distance of the set pair. For example, if the set G1 has two representative sequences and the set G2 has three representative sequences, the set distance between the set G1 and the set G2 may be the average of the sequence distances of the 2 × 3 = 6 representative sequence pairs.
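A minimal sketch of the agglomerative clustering of steps S360-S390, with the all-pairs average as the set distance; seq_dist may be any sequence distance function, and the toy example uses plain numbers in place of sequences purely to show the merge behavior.

```python
def cluster_sequences(sequences, seq_dist, threshold=0.3):
    """Agglomerative hierarchical clustering of steps S360-S390: start with one set
    per representative sequence, repeatedly merge the two sets whose set distance
    (all-pairs average of sequence distances) is smallest, and stop once the smallest
    set distance is no longer below the distance threshold."""
    clusters = [[s] for s in sequences]                      # step S360

    def set_distance(g1, g2):                                # step S370 (all-pairs average)
        return sum(seq_dist(a, b) for a in g1 for b in g2) / (len(g1) * len(g2))

    while len(clusters) > 1:
        best = None                                          # step S380: closest pair of sets
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = set_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] >= threshold:                             # step S390 condition fails
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]              # merge the two sets
        del clusters[j]
    return clusters

# Toy example: numbers stand in for representative sequences, |a - b| / 10 for their distance.
print(cluster_sequences([0.0, 0.5, 5.0, 5.3], lambda a, b: abs(a - b) / 10.0))
# [[0.0, 0.5], [5.0, 5.3]]
```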
In one embodiment, a method of finding a Typical Sequence is provided. FIG. 21 is a flowchart illustrating the steps of finding the movement behavior of a crowd and finding a typical sequence according to an embodiment of the invention. Compared with the flowchart of FIG. 2, FIG. 21 further includes steps S410 and S420. In step S410, dates are classified into a plurality of date types. For example, dates may be categorized as working days and holidays, and working days may be further categorized as the last working day before a single-day holiday, the last working day before a holiday of at least two days, the first working day after a single-day holiday, and so forth. Similarly, holidays can be further categorized as single-day holidays, the first day of a holiday of at least two days, the last day of a holiday of at least two days, and the like. In step S420, a typical sequence of a target date type is found according to the occurrence rates of the representative sequences in the target date type. As shown in FIG. 8, after aggregating data over several days, the occurrence rate of a representative sequence for a specific date type can be obtained, and from the occurrence rate, a typical sequence for the specific date type can be found. For example, on the last day of a holiday of at least two days, a typical sequence that may be found is a movement trajectory between two train stations.
FIG. 22 is a flowchart illustrating finding a typical sequence of a target date type according to one embodiment of the present invention. Step S420 includes the following steps. In step S421, a first occurrence rate of a test representative sequence on days belonging to the target date type is calculated. In step S422, a second occurrence rate of the test representative sequence on days not belonging to the target date type is calculated. In step S423, a statistical entropy (Entropy) is calculated based on the first occurrence rate and the second occurrence rate. In step S424, if the first occurrence rate is greater than a probability threshold and the statistical entropy is less than an entropy threshold, the test representative sequence is determined to be a typical sequence.
The occurrence rate in step S421 can be obtained after step S230 is executed (i.e., after aggregating the data of several days as shown in FIG. 8). The calculation performed in steps S421 to S424 is described by way of example below. The target date type in this example is the first day of a holiday of at least two days, denoted class H; class (all-H) denotes the holidays not belonging to class H. Table 1 below lists two representative sequences and their corresponding occurrences.
Table 1

Sequence | Days occurring in class H (of 56 days) | Days occurring in class (all-H) (of 128 days)
R1       | 41                                     | 2
R2       | 28                                     | 50
The frequency of occurrence represents the number of days on which the sequence occurs; for example, sequence R1 occurs on 41 of the 56 days belonging to class H and on 2 of the 128 days belonging to class (all-H). In this example, the probability threshold Pth and the entropy threshold Sth of step S424 are equal to 0.2 and 0.6, respectively. The statistical entropy of step S423 may be calculated according to the following equation (6):

$$S = -\sum_i p_i \ln(p_i)\qquad(6)$$

where $p_i$ is the probability of the sequence occurring in class $i$ (the occurrence rates normalized over the classes). According to equation (6), the statistical entropy S1 of the sequence R1 is approximately 0.1: with $p_H = \frac{41/56}{41/56 + 2/128} \approx 0.98$ and $p_{(all\text{-}H)} \approx 0.02$, $S1 = -(0.98\ln 0.98 + 0.02\ln 0.02) \approx 0.1$.
A larger entropy (greater randomness) indicates that the probability distribution is closer to the uniform distribution, while a smaller entropy indicates that the probability distribution is biased toward one end. In the above example, if the probability distribution is biased toward class H, then the sequence can be regarded as a typical sequence on class H dates. In step S424, since the first occurrence rate (41/56) is greater than the probability threshold Pth and the statistical entropy S1 = 0.1 is less than the entropy threshold Sth, the sequence R1 can be determined to be a typical sequence on class H dates. Similarly, the statistical entropy S2 of the sequence R2 can be calculated according to equation (6) as S2 = 0.69; since the statistical entropy S2 is greater than the entropy threshold Sth, the sequence R2 is not a typical sequence on class H dates. As shown in Table 1, the first occurrence rate (28/56) of sequence R2 on class H days is similar to its second occurrence rate (50/128) on class (all-H) days, meaning that sequence R2 does not occur noticeably more often on either type of day, and thus sequence R2 is not a typical sequence. After a plurality of typical sequences of a date type are found, these typical sequences can be further classified into sets; the classification method can be as described in steps S360, S370, S380, and S390.
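A minimal sketch of steps S421-S424 using the values of Table 1; normalizing the two occurrence rates into probabilities and using the natural logarithm in equation (6) are inferences made so that the entropies reproduce the S1 ≈ 0.1 and S2 ≈ 0.69 values given above.

```python
import math

def is_typical(occurrences_target, days_target, occurrences_rest, days_rest,
               p_threshold=0.2, s_threshold=0.6):
    """Steps S421-S424: a test representative sequence is typical for the target date
    type when its first occurrence rate exceeds the probability threshold and the
    entropy of equation (6), computed over the normalized occurrence rates of the two
    classes, is below the entropy threshold."""
    rate_target = occurrences_target / days_target                # first occurrence rate (S421)
    rate_rest = occurrences_rest / days_rest                      # second occurrence rate (S422)
    total = rate_target + rate_rest
    probs = [rate_target / total, rate_rest / total]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)       # equation (6), natural log (S423)
    return rate_target > p_threshold and entropy < s_threshold    # step S424

# The two sequences of Table 1:
print(is_typical(41, 56, 2, 128))    # R1 -> True  (entropy ~ 0.1)
print(is_typical(28, 56, 50, 128))   # R2 -> False (entropy ~ 0.69)
```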
According to the method for finding out crowd movement behaviors of the embodiments of the present invention, crowd movement trajectories can be found by collecting location data acquired from user devices, where the user devices are, for example, smart cards. Based on the obtained crowd movement trajectories, the payment service provider that issues the smart cards can estimate the number of cardholders in a specific geographic area, design corresponding marketing and advertising plans, determine locations for opening new storefronts, and so on. Further, since the embodiments of the present invention can find typical sequences of specific date types, the payment service provider can plan and organize activities for different date types according to the typical sequences of those date types.
While the invention has been described with reference to the preferred embodiments, it is not intended to be limited thereto. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention is subject to the protection scope of the claims.

Claims (13)

1. A method of finding a movement behavior of a population, comprising:
collecting a plurality of location data regarding a plurality of user devices;
detecting a plurality of conventional patterns in the position data to generate a plurality of representative sequences, wherein the representative sequences comprise a first sequence and a second sequence, each representative sequence comprises at least one line segment, and the at least one line segment is between a starting position point and an ending position point; and
generating a plurality of mapping combinations between the first sequence and the second sequence according to the at least one line segment in the first sequence and the at least one line segment in the second sequence;
calculating the mapping distance of each mapping combination; and
taking the minimum mapping distance in each mapping combination as the sequence distance between the first sequence and the second sequence;
according to a plurality of sequence distances among the representative sequences, classifying the representative sequences into a plurality of sets so as to find out the movement behaviors of the crowd.
2. The method of claim 1, wherein:
the first sequence comprising a first line segment between a first start location point and a first end location point; and
the second sequence comprises a second line segment between a second starting position point and a second ending position point;
wherein the first sequence and the second sequence form pairs of the representative sequences, and for each pair of the representative sequences, the step of classifying the representative sequences into the sets further comprises:
calculating the line segment distance between the first line segment and the second line segment.
3. The method of claim 2, wherein the step of calculating the line segment distance between the first line segment and the second line segment further comprises:
calculating the angular distance, the vertical distance and the parallel distance between the first line segment and the second line segment;
calculating a normalized angular distance, a normalized vertical distance, and a normalized parallel distance according to the angular distance, the vertical distance, and the parallel distance, wherein the values of the normalized angular distance, the normalized vertical distance, and the normalized parallel distance are within the same range of values; and
determining the line segment distance between the first line segment and the second line segment according to the weighted sum of the normalized angular distance, the normalized vertical distance, and the normalized parallel distance.
4. The method of claim 3, wherein the step of calculating the parallel distance between the first line segment and the second line segment further comprises:
projecting the second initial position point on an extension line of the first line segment to obtain a third initial projection point;
projecting the second ending position point on the extension line of the first line segment to obtain a third end projection point;
connecting the third initial projection point and the third end projection point to generate a third line segment; and
subtracting the intersection of the first line segment and the third line segment from the union of the first line segment and the third line segment to determine the parallel distance.
5. The method of claim 4, wherein the step of calculating the normalized angular distance, the normalized vertical distance, and the normalized parallel distance further comprises:
dividing the angular distance by the maximum value of the angular distance domain to obtain the normalized angular distance;
dividing the vertical distance by the maximum value of the vertical distance domain to obtain the normalized vertical distance; and
dividing the parallel distance by the maximum value of the parallel distance domain to obtain the normalized parallel distance.
6. The method of claim 5, wherein a maximum of the angular distance domain is a length of a shorter one of the first line segment and the second line segment, a maximum of the parallel distance domain is the union of the first line segment and the third line segment, and a maximum of the vertical distance domain is a vertical distance between the first line segment and a rotated line segment obtained by rotating the second line segment around the second start position point or around the second end position point until being perpendicular to the first line segment.
7. The method of claim 1, wherein the step of calculating the mapping distance of each mapping combination further comprises:
forming a plurality of mapping pairs between the at least one line segment in the first sequence and the at least one line segment in the second sequence according to a time sequence;
calculating the line segment distance of each mapping pair; and
calculating the average value of the line segment distances of the mapping pairs to obtain the mapping distance.
8. The method of claim 1, wherein the location data is collected when the user devices are used for payment activities.
9. The method of claim 1, wherein the step of collecting the location data about the plurality of user devices further comprises:
selecting a plurality of reference location points; and
replacing each position point in the position data with the one of the reference position points closest to the geographical position of that position point.
10. The method of claim 1, wherein the step of detecting the conventional patterns in the position data to generate the representative sequences further comprises:
removing the conventional pattern having only a single location point from the conventional patterns;
removing identical adjacent location points in each of the conventional patterns; and
aggregating the conventional patterns over several days to generate the representative sequences.
11. The method of claim 1, wherein the step of classifying the representative sequences into the sets further comprises:
taking each representative sequence as a set;
selecting two sets from the sets to form a set pair, and calculating a set distance between the set pairs;
finding a first set and a second set with minimum set distances; and
if the minimum set distance is less than a distance threshold, the first set and the second set are merged.
12. The method of claim 1, further comprising:
dividing dates into a plurality of date types; and
finding out the typical sequence of the target date type according to the occurrence rate of the representative sequences in the target date type.
13. The method of claim 12, wherein the step of finding the representative sequence of the target date type further comprises:
calculating a first occurrence rate of the test representative sequence on days belonging to the target date type;
calculating a second occurrence rate of the test representative sequence on days not belonging to the target date type;
calculating statistical entropy according to the first occurrence rate and the second occurrence rate; and
if the first occurrence rate is greater than a probability threshold and the statistical entropy is less than an entropy threshold, determining the test representative sequence as the typical sequence.
CN201510982408.8A 2015-11-09 2015-12-24 Method for finding out crowd movement behaviors Active CN106682051B (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US14/936,674 US10417648B2 (en) 2015-11-09 2015-11-09 System and computer readable medium for finding crowd movements
US14/936,674 2015-11-09
TW104142157 2015-12-15
TW104142157A TWI622888B (en) 2015-11-09 2015-12-15 Method for finding crowd movements and non-transitory computer readable medium execute the same

Publications (2)

Publication Number Publication Date
CN106682051A CN106682051A (en) 2017-05-17
CN106682051B true CN106682051B (en) 2020-05-29

Family

ID=58865128

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510982408.8A Active CN106682051B (en) 2015-11-09 2015-12-24 Method for finding out crowd movement behaviors

Country Status (1)

Country Link
CN (1) CN106682051B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI797916B (en) * 2021-12-27 2023-04-01 博晶醫電股份有限公司 Human body detection method, human body detection device, and computer readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8275649B2 (en) * 2009-09-18 2012-09-25 Microsoft Corporation Mining life pattern based on location history
US8612134B2 (en) * 2010-02-23 2013-12-17 Microsoft Corporation Mining correlation between locations using location history
US20150227934A1 (en) * 2014-02-11 2015-08-13 Mastercard International Incorporated Method and system for determining and assessing geolocation proximity

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102067631A (en) * 2008-06-27 2011-05-18 雅虎公司 System and method for determination and display of personalized distance
TW201336474A (en) * 2011-12-07 2013-09-16 通路實業集團國際公司 Behavior tracking and modification system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Trajectory Clustering: A Partition-and-Group Framework";Jae-Gil Lee等;《SIGMOD’07》;20070614;第2页 *
Jae-Gil Lee等,."Trajectory Clustering: A Partition-and-Group Framework".《SIGMOD’07》.2007, *

Also Published As

Publication number Publication date
CN106682051A (en) 2017-05-17

Similar Documents

Publication Publication Date Title
TWI622888B (en) Method for finding crowd movements and non-transitory computer readable medium execute the same
Chen et al. Dynamic cluster-based over-demand prediction in bike sharing systems
Zheng et al. Detecting collective anomalies from multiple spatio-temporal datasets across different domains
Biagioni et al. Easytracker: automatic transit tracking, mapping, and arrival time prediction using smartphones
US8880097B1 (en) Systems and methods for statistically associating mobile devices to households
US8977496B2 (en) System and method for estimating origins and destinations from identified end-point time-location stamps
CN105701123B (en) The recognition methods of man-vehicle interface and device
ES2819859T3 (en) System and method for detecting, tracking and counting human objects of interest using a counting system and data capture device
Yu et al. iVizTRANS: Interactive visual learning for home and work place detection from massive public transportation data
Jafari Kang et al. A procedure for public transit OD matrix generation using smart card transaction data
CN107610282A (en) A kind of bus passenger flow statistical system
JPWO2015033453A1 (en) Fee refund system and method
Li et al. Predicting home and work locations using public transport smart card data by spectral analysis
CN110059849A (en) Determining method of path and device
WO2014124279A1 (en) Customer experience management for an organization
CN106682051B (en) Method for finding out crowd movement behaviors
Wang et al. Spatio-temporal anomaly detection in traffic data
CN110348983A (en) Transaction Information management method and device, electronic equipment and non-transient storage media
CN110858955B (en) Crowd classification method and crowd classification device
Azcoitia et al. Computing the relative value of spatio-temporal data in data marketplaces
CN110517063B (en) Store consumer time determining method, store consumer time determining device and store consumer time determining server
CN110278524B (en) User position determining method, graph model generating method, device and server
Kawasaki et al. Choice set generation from GPS data set for grocery shopping location choice modelling in canton Zurich: Comparison with the Swiss Microcensus 2005
CN111861139A (en) Merchant recommendation method and device and computer equipment
Lan et al. Inferring alighting bus stops from smart card data combined with cellular signaling data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant