CN111970685A - One-person multi-card identification method in big data environment - Google Patents
One-person multi-card identification method in big data environment
- Publication number
- CN111970685A (application number CN202011142356.0A)
- Authority
- CN
- China
- Prior art keywords
- time
- space
- sequences
- card
- mobile communication
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W8/00—Network data management
- H04W8/18—Processing of user or subscriber data, e.g. subscribed services, user preferences or user profiles; Transfer of user or subscriber data
- H04W8/183—Processing at user equipment or user record carrier
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/907—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/909—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W8/00—Network data management
- H04W8/22—Processing or transfer of terminal data, e.g. status or physical capabilities
- H04W8/24—Transfer of terminal data
- H04W8/245—Transfer of terminal data from a network towards a terminal
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Theoretical Computer Science (AREA)
- Library & Information Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
The invention provides a one-person multi-card identification method for a big data environment. The invention makes full use of the existing mass of communication records between handheld mobile communication devices and fixed sensors, and designs a comparison algorithm and a consistency check standard. It can organize target data automatically, conveniently, and at low cost and identify the multiple card numbers that belong to one person. It can also extract the actual individuals in space from a large number of communication records, reducing the influence of one-person multi-card holdings on overall statistics. By fusing the space-time trajectories of the multiple card numbers, it obtains each individual's travel space-time sequence more accurately, thereby providing a more reliable data basis for other spatial big data analyses.
Description
Technical Field
The invention relates to a method for identifying cases in which one person holds multiple cards, based on mass mobile communication data, and belongs to the technical field of big data analysis.
Background
In recent years, with the development of information technology, the amount of data has grown explosively and data sources have multiplied. Data recorded by information sensors such as mobile phones, WIFI, and the Internet of Things have become the most important data source in big data analysis, and the relatively complete individual travel records they contain provide good data support for big data analysis, especially traffic big data analysis.
Using sensor communication data for research on urban socioeconomic questions allows the research to be refined to the individuals moving in the city, but the sample distribution still deviates considerably from the actual distribution, and this bias seriously affects the analysis results. In particular, the one-person multi-card phenomenon causes great difficulty for research that uses mobile communication big data. One-person multi-card means that the same person has registered multiple card numbers with the same or different mobile communication operators. At present, the popularity of dual-SIM dual-standby mobile phones across the high-end, mid-range, and low-end markets shows how common it is for one person to hold two or more cards. If the phenomenon is left unprocessed, a large number of redundant space-time trajectories are generated when the space-time movement of urban crowds is analyzed, which biases the analysis results. In October 2019, the three operators in China (China Mobile, China Unicom, and China Telecom) had 943.59 million, 322.119 million, and 332.53 million active users respectively, for a total of 1.5982 billion mobile phone users, whereas the national population at the end of 2018 was only 1.39538 billion. The number of mobile phone card numbers per person thus reaches 1.145, and both same-network and cross-network multi-card holdings have reached a considerable level.
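The per-person figure quoted above follows from simple arithmetic on the stated subscriber and population counts. As a worked check, with every quantity expressed in units of $10^8$ (hundred million):

```latex
\begin{align*}
9.4359 + 3.22119 + 3.3253 &= 15.98239 \approx 15.982 \\
\frac{15.982}{13.9538} &\approx 1.145
\end{align*}
```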
However, the one-person multi-card phenomenon is difficult to identify. Each card number communicates independently with fixed sensors such as base stations and WIFI, so even within the same mobile communication network the space-time trajectories formed by the communication records of two cards may differ. Between card numbers on different networks, the positions of the fixed sensors also differ, which makes identification harder still, and a criterion for the determination is lacking. Therefore, an algorithm is needed in the data preprocessing stage to derive a set of indexes for judging the one-person multi-card phenomenon and to identify the multiple mobile communication card numbers held by the same person, so that the original data can be simplified and merged. Based on the merged individual data, weighted interpolation over the multiple data sources further improves the accuracy of each individual's travel space-time trajectory. On this basis, a mutual mapping table between the individual identifier PID (Personal IDentity) and the operator card numbers is updated and maintained, so that sample processing in subsequent statistical analyses counts only the number of PIDs, reducing the influence of one-person multi-card holdings on overall statistics.
Disclosure of Invention
The purpose of the invention is to represent the spatial movement of an individual by the individual travel space-time trajectory formed from the communication records between a mobile communication device and fixed sensors; to sample and compare the travel space-time trajectories of different card numbers over the same period and judge their similarity; to construct on that basis a consistency index for judging one-person multi-card holdings; and to identify whether the travel trajectories recorded by different mobile communication cards belong to the same person. For an individual with multiple cards, the communication records of the multiple cards are used to refine the individual's travel space-time trajectory, and the card number with the most complete records is selected from among them. On this basis, the mutual mapping table between the individual identifier PID and the operator card numbers is updated and maintained, which facilitates sample processing in subsequent statistical analyses, that is, only the number of PIDs is counted, and reduces the influence of one-person multi-card holdings on overall statistics.
In order to achieve the above object, the technical solution of the present invention is to provide a method for constructing a one-person multi-card recognition algorithm and a consistency check index thereof in a big data environment, comprising the following steps:
step 1, reading anonymous encrypted mobile terminal sensor data obtained from a sensor operator, wherein the data are continuous in time and space and different mobile terminals have different one-way irreversibly encrypted card numbers; extracting the communication signaling records triggered by each card number within a specified time period to form the travel space-time communication record data set recorded for the individual through that card number; sorting the data set by time and projecting its spatial positions onto a map; and, on the basis of data registration, expanding the individual's recorded sample points at equal time intervals by a space-time interpolation method to obtain a travel space-time trajectory identified by each mobile communication card number;
step 2, in order to eliminate the influence of time of day on the judgment of individual travel similarity, dividing the 24 hours of each day into N time periods; for each time period, extracting a large number of equal-interval individual travel space-time trajectories from the database; splitting each pair of travel space-time trajectories into 4 vectors according to their spatial XY coordinates; calculating the correlation coefficient and the standard difference between pairs of vectors by the Pearson product-moment method and constructing an index for comparing the consistency of the vectors; calculating the consistency index of randomly selected travel trajectories at fixed time intervals within a fixed time period, together with the difference in Pearson product-moment values between time periods; and taking the result as the consistency check standard for judging whether two trajectories are consistent, that is, the basis for judging that two trajectories belong to the same individual;
step 3, selecting a mobile communication card number from the database as the object to be matched and acquiring its travel space-time sequence; traversing the mobile communication record database and selecting the travel space-time sequences of other individuals to match against it; randomly selecting sequence segments of random duration from the space-time sequences, calculating the correlation of the two space-time sequences in spatial position, and performing a consistency check on it to judge whether the two space-time sequences are held by the same person;
step 4, traversing the whole database to obtain all other card numbers judged to be held by the same person as the initially selected mobile communication card number, marking these card numbers as held by the same person, and assigning them an individual number PID; matching each card number against the travel space-time sequences of all other individuals by traversing the database and judging whether they are held by the same person; and, if one card number appears to be held by several persons at once, that is, its similarity with two card numbers that do not match each other passes the check in both cases, attributing the card number to the mobile communication card with the greater similarity;
step 5, after the database has been traversed and travel space-time sequence matching has been performed for all mobile communication card numbers, so that it is known whether each card number is held by one person together with other card numbers, querying, for each group of one-person multi-card numbers, the communication records between the multiple card numbers and the fixed sensors; replacing the card number EPID (EncryPtion international mobile subscriber IDentity, the anonymous one-way encrypted globally unique mobile terminal identification code) with the individual identifier PID; interleaving the multiple travel space-time sequences in time order into a single mobile communication record set; storing the new record set in the database as the communication records between the handheld mobile communication card and the fixed sensors; and jumping to step 1.4 to perform space-time interpolation on it to obtain a more precise individual travel space-time sequence.
Preferably, the step 1 comprises:
step 1.1, the system reads the individual encrypted mobile terminal sensor data obtained anonymously from a sensor operator, wherein the data are continuous in time and space and comprise: the unique number EPID of the mobile communication card number used for communication between the individual and the fixed sensor, the communication action type TYPE, the communication action occurrence time TIME, the region REGIONCODE of the fixed sensor where the communication action occurs, and the specific fixed sensor number SENSORID;
step 1.2, each piece of anonymous encrypted mobile terminal sensor data is one signaling record; each signaling record is decrypted, the fields EPID, TYPE, TIME, REGIONCODE, SENSORID, and so on are read from it, the longitude and latitude coordinates of the record are looked up from the fixed sensor number in the record, and the coordinates are converted into a geographic XY coordinate system;
step 1.3, querying all communication records of a mobile communication card within the specified time period by its unique number EPID, and constructing a preliminary individual travel space-time trajectory from the communication records between a single mobile communication card number and the fixed sensors;
step 1.4, performing spatial interpolation at equal time intervals on each individual travel space-time trajectory by a space-time weighted interpolation method to obtain equal-interval individual travel space-time sequences, wherein each space-time sequence comprises the spatial XY coordinates of the individual at each fixed time node; deleting the original communication records between the mobile communication card and the fixed sensors, so that the interpolated individual travel space-time sequence alone represents the individual's movement in space and time; for an individual travel space-time trajectory with a single data source, the weight of every node is the same, and for an individual travel space-time trajectory with multiple data sources, the weight of each node is determined by the record density per unit time of the original data source of each single mobile communication card number:
in the formula, W denotes the weight of a communication node from mobile communication card number i, D is the mobile communication record density, T is the fixed time period, and N is the number of mobile communication records within that period. Finally, the travel space-time sequence of the individual at equal time intervals Th within the specified time period T is obtained; the sequence comprises T/Th + 1 nodes, each containing time and XY coordinate information.
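The weight formula itself does not survive in this text, so the following is only a rough sketch under stated assumptions. The definitions suggest a record density D = N / T for each card number; normalising the densities so that the weights sum to 1 is an assumption of this example, not something the text confirms. Plain linear interpolation stands in for the space-time weighted interpolation, and the function names `density_weights` and `resample` are hypothetical.

```python
from bisect import bisect_left


def density_weights(record_counts, period):
    """Weights per card number from record density D = N / T.

    Normalising the densities so the weights sum to 1 is an assumption.
    """
    densities = [n / period for n in record_counts]
    total = sum(densities)
    return [d / total for d in densities]


def resample(track, start, end, step):
    """Interpolate a time-sorted track [(t, x, y), ...] onto equal intervals.

    Linear interpolation is a stand-in for the space-time weighted method.
    Nodes outside the recorded span take the nearest recorded position.
    """
    times = [t for t, _, _ in track]
    nodes = []
    t = start
    while t <= end:
        k = bisect_left(times, t)
        if k == 0:
            x, y = track[0][1], track[0][2]
        elif k == len(track):
            x, y = track[-1][1], track[-1][2]
        else:
            t0, x0, y0 = track[k - 1]
            t1, x1, y1 = track[k]
            a = (t - t0) / (t1 - t0)
            x, y = x0 + a * (x1 - x0), y0 + a * (y1 - y0)
        nodes.append((t, x, y))
        t += step
    return nodes


# Two raw records 240 s apart, resampled every 120 s: T/Th + 1 = 3 nodes.
track = [(0, 0.0, 0.0), (240, 240.0, 120.0)]
sequence = resample(track, start=0, end=240, step=120)

# Two card numbers with 30 and 10 records in the same 7200 s period.
weights = density_weights([30, 10], period=7200)
```

With these inputs the denser card number receives three times the weight of the sparser one, which matches the stated intent that nodes from a more frequently recorded card count for more.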
Preferably, the step 2 includes:
step 2.1, constructing a space-time sequence consistency comparison index matrix M from the large number of individual travel space-time sequences obtained in step 1.4, wherein M is a matrix of order 2 × n × m × 3: the 2 distinguishes the two cases in which the two EPIDs are on the same network or on different networks; n is the number of time periods into which the 24 hours of a day are divided, so that n equals 12 if each time period is 2 hours; m is the number of possible sampling-node counts in each time period, so that, taking a space-time sequence node interval of 2 minutes and a time period length of 2 hours as an example, the number of sampled nodes ranges from 2 to 60 and m equals 59; and the 3 denotes the three consistency criteria for each sampling-node count in each time period, representing 95%, 99%, and 99.9% confidence respectively, which constrain the similarity of the space-time sequences at three levels;
step 2.2, traversing the M matrix and, for each element M(i, j), extracting in pairs from the large number of individual travel space-time sequences the segments that lie in time period i and contain j nodes, wherein the spatial positions of the two space-time sequence segments extracted each time form 4 row vectors X1, Y1, X2, Y2 and the total number of samples is N pairs, and calculating the consistency between the two space-time sequence segments:
in the formula, the consistency index is that of the 4 vectors formed by the two corresponding travel space-time sequence segments (X1, Y1) and (X2, Y2); the Pearson product-moment values between the travel space-time sequence segments represent the similarity between the two segments; and the standard differences between the X values and between the Y values at corresponding time points of the two segments represent the numerical difference between them;
step 2.3, computing the average of the consistency index over the N pairs of space-time sequence samples, which is the random consistency index for node count j in time period i. When two travel sequences coincide completely, the standard differences between any segments of the two are both 0 and the Pearson product-moment values are both 1, so the consistency index is 0; the average obtained from step 2.2 expresses the average degree of consistency between sequences when the two space-time sequences are drawn completely at random. Three levels of comparison standard for the consistency index, expressed as consistency index ratios, represent the thresholds that consistency must reach for significance at different P values when space-time sequences are compared: when the consistency index of two sequences is less than or equal to the first, second, or third threshold, the probability that the two sequences are inconsistent is less than or equal to 5%, 1%, and 0.1% respectively, meaning that the probability that the two sequences pass the consistency test is greater than 95%, 99%, and 99.9% respectively, that is, the consistency is significant at the 95%, 99%, and 99.9% levels respectively;
step 2.4, for each element of the consistency index matrix M, searching the mass data for samples of the corresponding time period and corresponding sampling quantity and repeating steps 2.2 and 2.3 to calculate the consistency index comparison values at the 3 significance levels; the resulting M matrix is the standard against which the consistency of travel space-time sequences is subsequently compared.
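The ingredients of the consistency index described in steps 2.2 and 2.3 can be sketched as follows. The formula that combines them is not available in this text, so the sketch stops at the components: the Pearson product-moment value and the standard difference for the X vectors and for the Y vectors. Reading "standard difference" as a root-mean-square difference is an assumption, as is the convention for constant vectors, and all function names are hypothetical.

```python
import math


def pearson(u, v):
    """Pearson product-moment correlation of two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        # Undefined for a constant vector (a stationary individual).
        # Assumed convention: identical vectors count as fully correlated.
        return 1.0 if list(u) == list(v) else 0.0
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / (su * sv)


def standard_difference(u, v):
    """Root-mean-square difference at corresponding time points (assumed)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))


def components(seg1, seg2):
    """Split two segments [(x, y), ...] into X1, Y1, X2, Y2 and compare."""
    x1, y1 = zip(*seg1)
    x2, y2 = zip(*seg2)
    return {
        "r_x": pearson(x1, x2),
        "r_y": pearson(y1, y2),
        "s_x": standard_difference(x1, x2),
        "s_y": standard_difference(y1, y2),
    }


# Two coinciding segments: correlations near 1, standard differences 0.
segment = [(0, 0), (1, 2), (2, 1)]
same = components(segment, segment)
```

For coinciding segments the correlations come out at 1 (up to rounding) and the standard differences at 0, which is the case the text says yields a consistency index of 0.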
Preferably, the step 3 comprises:
step 3.1, randomly selecting an EPID from the database as the card number to be matched, acquiring its travel space-time sequence C1 within the specified time period, setting the holder number PID to P1, randomly selecting a time-continuous sequence segment from C1, and acquiring the time period t of the segment and the number n of nodes in it;
step 3.2, traversing the database to obtain a space-time sequence Ci, judging whether the card number of the traversed space-time sequence is on the same network as the target card number, and obtaining the segment of the traversed sequence within time period t; because the travel space-time sequences of all individuals have equal time intervals, this segment also contains n nodes;
step 3.3, for the three confidence intervals bounded by the consistency comparison indexes at the 95%, 99%, and 99.9% levels, setting the sample-count thresholds for judging that the two travel space-time sequences C1 and Ci are travel trajectories of the same person to N1, N2, and N3 respectively, and denoting the numbers of samples whose consistency comparison results fall in the three confidence intervals as S1, S2, and S3 respectively;
step 3.3, splitting the spatial XY coordinates of the two sequence segments taken from C1 and Ci into 4 vectors, calculating the consistency between the two segments as in step 2.2, and checking it against the consistency comparison indexes in the M matrix:
step 3.3.1, if the calculated consistency index is greater than the consistency comparison index of the time period at the 95% level, jumping to step 3.4;
step 3.3.2, if the consistency index is less than or equal to the comparison index at the 95% level and greater than the comparison index at the 99% level, jumping to step 3.5;
step 3.3.3, if the consistency index is less than or equal to the comparison index at the 99% level and greater than the comparison index at the 99.9% level, jumping to step 3.6;
step 3.3.4, if the consistency index is less than or equal to the comparison index at the 99.9% level, jumping to step 3.7.
step 3.4, abandoning this travel space-time sequence and traversing to the next sequence;
step 3.5, adding 1 to the sample count S1 of the corresponding interval; if S1 is greater than or equal to N1, judging that the two travel space-time sequences represent the travel trajectory of the same person; otherwise jumping to step 3.8;
step 3.6, adding 1 to the sample count S2 of the corresponding interval; if S2 is greater than or equal to N2, judging that the two travel space-time sequences represent the travel trajectory of the same person; otherwise jumping to step 3.8;
step 3.7, adding 1 to the sample count S3 of the corresponding interval; if S3 is greater than or equal to N3, judging that the two travel space-time sequences represent the travel trajectory of the same person; otherwise jumping to step 3.8;
step 3.8, continuing to extract segments at random from the two travel space-time sequences, that is, randomly selecting a time-continuous sequence segment from C1 and acquiring the segment of Ci in the same time period, the two segments having the same number of nodes;
step 3.9, for two travel space-time sequences C1 and Ci judged to be held by the same person, associating the mobile communication card number EPID of Ci with the holder number P1 of C1, indicating that the space-time movement trajectory recorded by that card number belongs to the individual whose PID is P1, and storing the correspondence between the EPID and the PID in a relation data table TR (Table of Relation).
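The counting rule of steps 3.3 to 3.8 can be sketched as a small loop over successive segment comparisons. This is an illustration under assumptions: the three thresholds are taken to be the 95%, 99%, and 99.9% comparison values in decreasing order, the source does not say what happens if sampling stops before any count reaches its threshold (the sketch then reports no match), and the function name `same_holder` is hypothetical.

```python
def same_holder(index_values, thresholds, required):
    """Apply the three-band counting rule to successive segment comparisons.

    index_values: consistency index of each random segment pair, in order.
    thresholds:   (c95, c99, c999), decreasing; smaller index = more alike.
    required:     (N1, N2, N3) sample counts needed in each band.
    """
    c95, c99, c999 = thresholds
    counts = [0, 0, 0]  # S1, S2, S3
    for value in index_values:
        if value > c95:
            return False  # step 3.4: abandon this sequence
        band = 0 if value > c99 else 1 if value > c999 else 2
        counts[band] += 1  # steps 3.5 to 3.7
        if counts[band] >= required[band]:
            return True
    return False  # assumption: sampling budget exhausted means no match


thresholds = (0.5, 0.1, 0.01)
required = (3, 2, 1)
matched = same_holder([0.4, 0.05, 0.04], thresholds, required)
rejected = same_holder([0.4, 0.9], thresholds, required)
```

In the first call the second and third samples both fall in the 99% band, so S2 reaches N2 = 2 and the pair is accepted; in the second call a single sample above the 95% threshold ends the comparison.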
Preferably, the step 4 comprises:
step 4.1, traversing the EPIDs of all mobile communication card numbers in the database, repeating step 3 to compare each with the travel space-time sequence C1 and calculate the consistency between the space-time sequences, identifying all card numbers that can be judged to be held by individual P1, and storing the relations between these card numbers and the individual number in the EPID-PID relation data table TR;
step 4.2, taking each EPID in the database in turn as the matching object, searching, traversing, and comparing the whole database, mining the matching relations among all mobile communication card numbers, and storing them in the data table TR;
step 4.3, traversing the data table TR and looking for cases in which the same travel space-time sequence is held by multiple persons, that is, the travel space-time sequence Cj of one EPID belongs to several different PIDs, which means that Cj meets the consistency requirement when compared with several travel space-time sequences (such as Ck, Cl, Cm) belonging to different PIDs; in such a case:
step 4.3.1, repeating steps 3.3 to 3.9 with Cj as the object to be matched, traversing the travel space-time sequences (Ck, Cl, Cm, and so on) recorded as sharing a PID with Cj, across the different PIDs, sampling and calculating the consistency of the space-time sequences again, and rechecking the consistency against the consistency comparison matrix M;
step 4.3.2, after the comparison and check have been repeated, if any of the travel space-time sequences Ck, Cl, Cm, and so on no longer meets the consistency requirement with Cj, deleting the membership relation between the PID of its holder and Cj from the data table TR;
step 4.3.3, repeating steps 4.3.1 and 4.3.2 until only one travel space-time sequence consistent with Cj remains or the iteration limit NC is reached; if the iteration limit NC is reached and more than one travel space-time sequence belonging to different PIDs (such as Cp, Cq, Cr) still passes the consistency check with Cj, calculating for each of Cp, Cq, and Cr the cumulative sum of the consistency index values of its successive sampling comparisons with Cj, selecting the sequence with the smallest sum, keeping its relation of belonging to the same PID as Cj, and deleting the membership relations between Cj and the other PIDs;
step 4.4, repeating step 4.3 until no EPID in the data table TR belongs to more than one PID.
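The final tie-break of step 4.3.3 (keeping the PID whose sequence has the smallest cumulative consistency index with Cj) can be sketched as below. The in-memory shapes are assumptions made for illustration: `tr` maps each EPID to its candidate PIDs and `cumulative_index` maps an (EPID, PID) pair to the summed index values. The iterative rechecking of steps 4.3.1 and 4.3.2 is not modelled here.

```python
def resolve_conflicts(tr, cumulative_index):
    """Reduce each EPID to a single PID by smallest cumulative index.

    tr:               dict EPID -> set of candidate PIDs (hypothetical shape).
    cumulative_index: dict (EPID, PID) -> sum of consistency index values
                      over all sampling comparisons; smaller = more alike.
    """
    resolved = {}
    for epid, pids in tr.items():
        # Sorting first makes ties deterministic; a missing sum ranks last.
        resolved[epid] = min(
            sorted(pids),
            key=lambda pid: cumulative_index.get((epid, pid), float("inf")),
        )
    return resolved


tr = {"e1": {"P1", "P2"}, "e2": {"P3"}}
cumulative_index = {("e1", "P1"): 0.9, ("e1", "P2"): 0.4}
resolved = resolve_conflicts(tr, cumulative_index)
```

Here card `e1` stays with `P2`, whose cumulative index is smaller, and the unambiguous card `e2` keeps its only PID.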
Preferably, the step 5 comprises:
step 5.1, traversing the TR data table, querying each PID that has multiple mobile communication card numbers, and recording the EPIDs of all mobile communication cards belonging to that PID;
step 5.2, for the EPIDs obtained, querying the database for the communication records between each EPID and the fixed sensors and sorting the records by time to form multiple sets of travel space-time trajectory data;
step 5.3, determining the weights of the multiple sets of travel space-time trajectory data from the different record densities of the different mobile communication card numbers, interleaving the data in time order to construct a new space-time trajectory, and applying the space-time weighted interpolation method of step 1.4 to obtain an equal-interval individual travel space-time sequence based on the communication records between the multiple card numbers and the fixed sensors; because this travel space-time sequence draws on the communication records of several card numbers, it has a higher node density, which allows the space-time interpolation algorithm to calculate the individual's position at each fixed time node more accurately;
step 5.4, recalculating the travel space-time sequences of all PIDs that have multiple card numbers and storing the results in the database, providing a data basis for other subsequent analyses based on mobile communication big data.
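The interleaving in steps 5.2 and 5.3 amounts to merging several time-sorted record lists into one and relabelling them with the shared PID. A minimal sketch, assuming each track is a list of (time, x, y) tuples already sorted by time; the function name `merge_records` and the tuple layout are hypothetical, and the density weighting and interpolation that follow in step 5.3 are left out.

```python
import heapq


def merge_records(tracks_by_epid, pid):
    """Interleave time-sorted tracks of several EPIDs under one PID.

    tracks_by_epid: dict EPID -> [(time, x, y), ...], each sorted by time.
    Returns one time-ordered list of (pid, time, x, y) records.
    """
    merged = heapq.merge(*tracks_by_epid.values(), key=lambda rec: rec[0])
    return [(pid, t, x, y) for t, x, y in merged]


tracks = {
    "e1": [(0, 1.0, 1.0), (20, 3.0, 3.0)],
    "e2": [(10, 2.0, 2.0)],
}
merged = merge_records(tracks, pid="P1")
```

The merged list has a denser set of nodes than either card number alone, which is the property the text relies on for more accurate interpolation.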
Based on the communication records between handheld mobile devices and fixed sensors, the invention extracts the travel space-time trajectory of each mobile communication card number within a specified time period and interpolates it into an equal-interval individual travel space-time sequence by a space-time weighted interpolation method. Through random sampling of a large sample, random consistency indexes between two travel space-time sequences under different conditions are constructed, forming a comparison matrix of space-time sequence consistency that serves as the check standard. By traversing and comparing the degree of consistency between the travel space-time sequences of pairs of mobile communication card numbers, the invention judges whether the card numbers belong to the same individual. Once one-person multi-card holdings have been identified, the multiple travel space-time trajectories under the multiple card numbers are merged and interpolated to obtain a more accurate individual travel space-time sequence, which is stored in the database to provide a basis for other data analyses.
The advantages of the invention are as follows: it makes full use of the existing mass of communication records between handheld mobile communication devices and fixed sensors and designs a comparison algorithm and a consistency check standard; it can organize target data automatically, conveniently, and at low cost, identify the multiple card numbers belonging to one person, and extract the actual individuals in space from a large number of communication records; and, by fusing the space-time trajectories of the multiple card numbers, it obtains each individual's travel space-time sequence more accurately, thereby providing a more reliable data basis for other spatial big data analyses.
Drawings
FIG. 1 is a diagram of a one-person multi-card recognition method in a big data environment according to the present invention.
Detailed Description
In order to make the invention more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings:
step 1, reading anonymous encrypted mobile terminal sensor data obtained from a sensor operator, wherein the data are continuous in time and space and different mobile terminal card numbers correspond to different numbers; extracting the communication signaling records triggered by each card number within a specified time period to form the travel space-time communication record data set recorded for the individual through that card number; sorting the data set by time; and expanding the individual's recorded sample points at equal time intervals by a space-time interpolation method to obtain a travel space-time trajectory identified by each mobile communication card number.
The anonymous encrypted mobile terminal sensor data are the encrypted time-series position information of anonymous mobile phone users, obtained by an operator in real time from the mobile communication network, the fixed broadband network, wireless WIFI, location-service APPs, and the like, and then desensitized and encrypted. The content comprises EPID, TYPE, TIME, REGIONCODE, and SENSORID, as follows:
The EPID is an anonymous one-way encrypted globally unique mobile terminal identification code. Each mobile terminal user is encrypted one-way and irreversibly, so that each user is uniquely identified without exposing the private user number. The encrypted EPID of each mobile terminal user must remain unique, that is, the EPID of each mobile phone user stays unchanged at all times and never coincides with that of another mobile phone user.
TYPE is the type of communication action that the current record concerns, such as Internet access, calls (calling and called), short message sending and receiving, GPS positioning, sensor cell handover, sensor switching, and power on and off.
TIME is the time at which the communication action of the current record occurs, expressed in milliseconds.
REGIONCODE and SENSORID are the encrypted position information of the sensor at which the communication action of the current record occurs: REGIONCODE represents the area where the sensor is located, and SENSORID is the number of the specific sensor.
Step 1.1, the system reads the anonymous encrypted mobile terminal sensor data of individuals obtained from a sensor operator, wherein the data are continuous in time and space and comprise: the unique number EPID of the mobile communication card number used for communication between the individual and the fixed sensors, the communication action type TYPE, the communication action occurrence time TIME, the area REGIONCODE of the fixed sensor where the communication action occurs, and the specific number SENSORID of the fixed sensor;
step 1.2, one piece of anonymous encrypted mobile terminal sensor data is one signaling record; each signaling record is decrypted, the fields EPID, TYPE, TIME, REGIONCODE and SENSORID in the record are read, the longitude and latitude coordinates are queried according to the fixed sensor number in the record, and the coordinates are converted into a geographic space XY coordinate system;
step 1.3, querying all communication records of the mobile communication card within the specified time period according to its unique number EPID, and constructing a preliminary individual trip space-time trajectory formed by the communication records between a single mobile communication card number and the fixed sensors;
in this example, the individual travel space-time trajectory extracted after decryption is shown in table 1.
TABLE 1 Individual travel space-time trajectories
EPID | REGIONCODE | SENSORID | TIME | X | Y |
…… | …… | …… | …… | …… | …… |
2045 | 2421 | 2134 | 2019-07-02 12:43:56 | 3821.451 | 4248.431 |
2045 | 2421 | 4543 | 2019-07-02 13:31:43 | 4734.123 | 2343.065 |
2045 | 2421 | 7864 | 2019-07-02 13:42:31 | 5238.195 | 6548.231 |
2045 | 2421 | 8562 | 2019-07-02 13:12:19 | 3436.568 | 6536.323 |
2045 | 2421 | 4563 | 2019-07-02 14:52:38 | 6944.031 | 6703.564 |
2045 | 2421 | 4322 | 2019-07-02 14:23:46 | 2390.699 | 6550.913 |
2045 | 2421 | 5643 | 2019-07-02 14:36:29 | 8438.617 | 7539.314 |
2045 | 2421 | 7652 | 2019-07-02 15:21:23 | 8769.642 | 7404.457 |
2045 | 2421 | 9645 | 2019-07-02 15:54:25 | 9134.123 | 3250.443 |
2045 | 2421 | 1424 | 2019-07-02 15:21:21 | 3269.245 | 4439.341 |
2045 | 2421 | 6423 | 2019-07-02 15:43:43 | 5419.432 | 4390.543 |
2045 | 2421 | 3563 | 2019-07-02 15:33:45 | 8653.534 | 2563.321 |
…… | …… | …… | …… | …… | …… |
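Steps 1.2 and 1.3 can be sketched as follows. This is a minimal illustration, not the patented implementation: the lookup table `SENSOR_XY`, the function name `build_trajectories` and the record layout (one dict per signaling record) are assumptions, and decryption is taken to have already happened. The sample values come from Table 1.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical lookup from (REGIONCODE, SENSORID) to planar XY coordinates.
SENSOR_XY = {
    ("2421", "2134"): (3821.451, 4248.431),
    ("2421", "4543"): (4734.123, 2343.065),
    ("2421", "8562"): (3436.568, 6536.323),
}

def build_trajectories(records, start, end):
    """Group signaling records by EPID and sort each group by TIME."""
    tracks = defaultdict(list)
    for rec in records:
        t = datetime.strptime(rec["TIME"], "%Y-%m-%d %H:%M:%S")
        if not start <= t <= end:
            continue  # outside the specified time period
        xy = SENSOR_XY.get((rec["REGIONCODE"], rec["SENSORID"]))
        if xy is None:
            continue  # unknown sensor: dropped in this sketch
        tracks[rec["EPID"]].append((t, xy[0], xy[1]))
    for points in tracks.values():
        points.sort(key=lambda p: p[0])  # arrange by time
    return dict(tracks)

records = [
    {"EPID": "2045", "REGIONCODE": "2421", "SENSORID": "4543", "TIME": "2019-07-02 13:31:43"},
    {"EPID": "2045", "REGIONCODE": "2421", "SENSORID": "2134", "TIME": "2019-07-02 12:43:56"},
    {"EPID": "2045", "REGIONCODE": "2421", "SENSORID": "8562", "TIME": "2019-07-02 13:12:19"},
]
tracks = build_trajectories(records, datetime(2019, 7, 2), datetime(2019, 7, 3))
```

The explicit sort matters because records do not necessarily arrive in time order.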
Step 1.4, performing spatial interpolation at equal time intervals on each individual trip space-time trajectory by a space-time weighted interpolation method to obtain individual trip space-time sequences with equal time intervals, each sequence containing the spatial XY coordinates of the individual at every fixed time node; deleting the original communication records between the mobile communication card and the fixed sensors, so that the movement of the individual in space-time is represented entirely by the interpolated trip space-time sequence; and finally obtaining the trip space-time sequence of the individual at equal time interval Th within the specified time period T, wherein the sequence contains T/Th + 1 nodes and each node contains time and XY coordinate information.
In this example, the interpolated individual travel spatiotemporal sequences are shown in table 2.
TABLE 2 interpolated individual travel spatio-temporal sequences
EPID | TIME | X | Y |
…… | …… | …… | …… |
2045 | 2019-07-02 08:00:00 | 4232.453 | 6582.123 |
2045 | 2019-07-02 08:00:30 | 4236.542 | 6590.654 |
2045 | 2019-07-02 08:01:00 | 4230.123 | 6599.452 |
2045 | 2019-07-02 08:01:30 | 4224.453 | 6592.764 |
2045 | 2019-07-02 08:02:00 | 4218.764 | 6583.665 |
2045 | 2019-07-02 08:02:30 | 4218.699 | 6572.913 |
2045 | 2019-07-02 08:03:00 | 4210.642 | 6570.643 |
2045 | 2019-07-02 08:03:30 | 4206.754 | 6567.124 |
2045 | 2019-07-02 08:04:00 | 4193.386 | 6565.164 |
2045 | 2019-07-02 08:04:30 | 4194.824 | 6574.325 |
2045 | 2019-07-02 08:05:00 | 4206.623 | 6572.653 |
2045 | 2019-07-02 08:05:30 | 4207.114 | 6588.332 |
…… | …… | …… | …… |
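The equal-interval resampling of step 1.4 can be illustrated with a short sketch. The text does not spell out the space-time weighted interpolation method, so plain linear interpolation between neighbouring records is swapped in here as an assumption; the function name `resample` and its parameters are likewise hypothetical. The sketch does reproduce the node count T/Th + 1.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

def resample(points, start, period_s, step_s):
    """Return period_s // step_s + 1 nodes (time, x, y) at equal intervals.

    points: time-sorted list of (datetime, x, y).
    Linear interpolation is an assumed stand-in for the weighted method;
    outside the observed range the nearest observed position is held.
    """
    times = [p[0] for p in points]
    nodes = []
    for k in range(period_s // step_s + 1):
        t = start + timedelta(seconds=k * step_s)
        i = bisect_right(times, t)
        if i == 0:
            x, y = points[0][1], points[0][2]
        elif i == len(points):
            x, y = points[-1][1], points[-1][2]
        else:
            t0, x0, y0 = points[i - 1]
            t1, x1, y1 = points[i]
            w = (t - t0).total_seconds() / (t1 - t0).total_seconds()
            x = x0 + w * (x1 - x0)
            y = y0 + w * (y1 - y0)
        nodes.append((t, x, y))
    return nodes

# Toy trajectory: two records one minute apart, resampled every 30 s.
points = [
    (datetime(2019, 7, 2, 8, 0, 0), 0.0, 0.0),
    (datetime(2019, 7, 2, 8, 1, 0), 60.0, 120.0),
]
nodes = resample(points, datetime(2019, 7, 2, 8, 0, 0), 60, 30)
```

With T = 60 s and Th = 30 s the result has 60/30 + 1 = 3 nodes, and the middle node lies halfway between the two records.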
Step 2, in order to eliminate the influence of time of day on the individual travel similarity judgment, dividing the 24 hours of each day into n time periods; for each time period, intercepting a large number of individual travel space-time trajectories with equal time intervals from the database; splitting each pair of travel space-time trajectories into 4 vectors according to their spatial XY coordinates; calculating the correlation coefficient between the vectors by the Pearson product-moment method, together with the standard difference between them; constructing an index for comparing the consistency of the vectors; calculating the consistency index of randomly selected travel trajectories at fixed time intervals within a fixed time period, and the difference of the Pearson product moments between different time periods; and using the result as the consistency check standard for judging whether two trajectories are consistent, that is, as the basis for judging that two trajectories belong to the same individual;
step 2.1, constructing a space-time sequence consistency comparison index matrix M from the large number of individual trip space-time sequences obtained in step 1.4, wherein M is a matrix of order 2 × n × m × 3, representing that, for the two cases in which two EPIDs are on the same network or on different networks, the 24 hours of a day are divided into n time periods, each time period has m node sample counts, and each sample count has consistency indexes at 3 levels that constrain the similarity of the space-time sequences at different levels;
step 2.2, traversing the matrix M; for M(i, j), extracting pairs of records from the large number of individual trip space-time sequences whose time period is i and whose number of nodes is j, wherein the spatial positions of the two space-time sequence segments extracted each time form 4 row vectors X1, Y1, X2 and Y2, the total number of samples is N pairs, and the consistency index r between the two space-time sequence segments is calculated from their Pearson product-moment values and standard differences;
Here r is the consistency index of the 4 vectors formed by the two corresponding travel space-time sequence segments (X1, Y1) and (X2, Y2); the Pearson product-moment values between the travel space-time sequence segments represent the similarity between the two segments; and the standard differences between the X values and between the Y values at corresponding time points represent the numerical difference between the two vectors;
in this example, the XY coordinates of the two travel spatio-temporal sequences are shown in table 3:
TABLE 3 XY coordinates of two spatio-temporal sequences
TIME | X1 | Y1 | X2 | Y2 |
…… | …… | …… | …… | …… |
2019-07-02 08:00:00 | 4232.453 | 6582.123 | 5462.424 | 2234.542 |
2019-07-02 08:00:30 | 4236.542 | 6590.654 | 5458.335 | 2200.418 |
2019-07-02 08:01:00 | 4230.123 | 6599.452 | 5464.754 | 2244.408 |
2019-07-02 08:01:30 | 4224.453 | 6592.764 | 5476.094 | 2277.848 |
2019-07-02 08:02:00 | 4218.764 | 6583.665 | 5498.845 | 2323.343 |
2019-07-02 08:02:30 | 4218.699 | 6572.913 | 5498.592 | 2334.095 |
2019-07-02 08:03:00 | 4210.642 | 6570.643 | 5514.704 | 2340.905 |
2019-07-02 08:03:30 | 4206.754 | 6567.124 | 5514.704 | 2326.829 |
2019-07-02 08:04:00 | 4193.386 | 6565.164 | 5461.232 | 2318.989 |
2019-07-02 08:04:30 | 4194.824 | 6574.325 | 5468.422 | 2318.989 |
2019-07-02 08:05:00 | 4206.623 | 6572.653 | 5421.226 | 2320.661 |
2019-07-02 08:05:30 | 4207.114 | 6588.332 | 5422.208 | 2320.661 |
…… | …… | …… | …… | …… |
The calculated r value between the two sequences was 13061;
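The two ingredients named in step 2.2 can be computed as in the sketch below. It deliberately stops short of r itself, because the exact combination of the Pearson values and the standard differences is not recoverable from this text. Treating the "standard difference" as the root-mean-square difference between corresponding points is an assumption, as are the function names.

```python
import math

def pearson(a, b):
    """Pearson product-moment correlation of two equal-length vectors.

    Raises ZeroDivisionError if either vector is constant.
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def standard_difference(a, b):
    """Root-mean-square difference between corresponding points (assumed form)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# First four X1 and X2 values from Table 3.
x1 = [4232.453, 4236.542, 4230.123, 4224.453]
x2 = [5462.424, 5458.335, 5464.754, 5476.094]
rho_x = pearson(x1, x2)
sigma_x = standard_difference(x1, x2)
```

For two identical vectors the functions return a correlation of 1 and a difference of 0, matching the limiting case that the description associates with r = 0.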
step 2.3, computing the average value of r over the N space-time sequence samples, that is, the random consistency index for time period i and node number j; when two trip sequences completely coincide, the standard differences between any pair of their segments are both 0 and the Pearson product-moment values are both 1, so r is 0, whereas the average obtained from step 2.2 expresses the average degree of consistency between sequences when two space-time sequences are drawn completely at random; comparison standards at 3 levels are set as ratios of this random consistency index, representing the thresholds the consistency must reach for significance at different P values in the comparison of space-time sequences, that is, when the consistency index r of two sequences is less than or equal to the first, second or third threshold, the probability that the two sequences are inconsistent is less than or equal to 5%, 1% and 0.1% respectively, which means that the probability of the two sequences passing the consistency test is greater than 95%, 99% and 99.9% respectively, i.e. the consistency is significant at the 95%, 99% and 99.9% levels respectively;
step 2.4, for each element of the consistency index matrix M, searching the mass data for samples of the corresponding time period and sample count, repeating steps 2.2 and 2.3 to calculate the consistency index comparison values at the 3 significance levels, and obtaining the matrix M, which is the standard for subsequent consistency comparison of trip space-time sequences;
in this example, the M matrix between 10 am and 12 noon for the same-network case is shown in table 4:
TABLE 4 M matrix between 10 am and 12 noon for the same-network case
Number of nodes | 95% | 99% | 99.9% |
1 | 7.4 | 1.5 | 0.5 |
…… | …… | …… | …… |
100 | 461.7 | 341.2 | 253.3 |
101 | 482.9 | 343.8 | 254.9 |
102 | 502.2 | 352.3 | 264.5 |
103 | 515.6 | 355.7 | 277.9 |
104 | 532.1 | 362.9 | 289.5 |
105 | 546.5 | 374.8 | 303.1 |
…… | …… | …… | …… |
241 | 2542.6 | 2057.7 | 1573.2 |
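One way to hold the comparison matrix M in code is a mapping keyed by network case, time period and node count, each entry carrying the three thresholds. The sketch below is illustrative only: the dict layout, the function name `strictest_level` and the period index 5 for 10 am to 12 noon (which assumes two-hour periods counted from midnight) are assumptions. The threshold values are copied from Table 4.

```python
LEVELS = ("95%", "99%", "99.9%")

# (same_network, period index, node count) -> thresholds at the three levels.
# Period index 5 for 10 am to 12 noon is an assumed numbering.
M = {
    (True, 5, 100): (461.7, 341.2, 253.3),
    (True, 5, 101): (482.9, 343.8, 254.9),
}

def strictest_level(r, same_network, period, nodes):
    """Return the strictest significance level that r satisfies, or None."""
    passed = None
    for level, limit in zip(LEVELS, M[(same_network, period, nodes)]):
        if r <= limit:  # a smaller r means a more consistent pair
            passed = level
    return passed
```

For example, with 100 nodes a consistency index of 300.0 lies below the 95% and 99% thresholds but above the 99.9% threshold, so `strictest_level(300.0, True, 5, 100)` returns `"99%"`.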
Step 3, selecting a mobile communication card number from the database as the object to be matched and acquiring its trip space-time sequence; traversing the mobile communication record database and selecting the trip space-time sequences of other individuals to match against it; randomly selecting sequence segments of random duration from the space-time sequences, calculating the correlation of the two space-time sequences in spatial position, carrying out the consistency check on it, and judging whether the two space-time sequences are held by the same person;
step 3.1, randomly selecting an EPID from the database as the card number to be matched, acquiring its travel space-time sequence C1 in the specified time period, setting the holder number PID to P1, randomly selecting a time-continuous sequence segment from C1, and acquiring the time period t of the segment and the number n of nodes in the segment;
in this example, the number of the card to be matched is 2454;
step 3.2, traversing the database, judging whether each traversed card number is on the same network as the target card number, and acquiring the sequence segment Ci of the traversed card number in the time period t; because the travel space-time sequences of all individuals have equal time intervals, the number of nodes in the segment is also n;
in this example, the traversed card number is 2142, which is on a different network from the card number of C1; the spatiotemporal information of C1 and C2 is shown in Table 5:
TABLE 5 spatiotemporal information of C1 and C2 for the case of heterogeneous networks
TIME | X1 | Y1 | X2 | Y2 |
…… | …… | …… | …… | …… |
2019-07-02 08:00:00 | 4232.453 | 6582.123 | 4227.453 | 6588.123 |
2019-07-02 08:00:30 | 4236.542 | 6590.654 | 4234.542 | 6595.654 |
2019-07-02 08:01:00 | 4230.123 | 6599.452 | 4227.123 | 6607.452 |
2019-07-02 08:01:30 | 4224.453 | 6592.764 | 4228.453 | 6590.764 |
2019-07-02 08:02:00 | 4218.764 | 6583.665 | 4227.764 | 6575.665 |
2019-07-02 08:02:30 | 4218.699 | 6572.913 | 4208.699 | 6574.913 |
2019-07-02 08:03:00 | 4210.642 | 6570.643 | 4218.642 | 6580.643 |
2019-07-02 08:03:30 | 4206.754 | 6567.124 | 4215.754 | 6562.124 |
2019-07-02 08:04:00 | 4193.386 | 6565.164 | 4201.386 | 6573.164 |
2019-07-02 08:04:30 | 4194.824 | 6574.325 | 4200.824 | 6564.325 |
2019-07-02 08:05:00 | 4206.623 | 6572.653 | 4215.623 | 6570.653 |
2019-07-02 08:05:30 | 4207.114 | 6588.332 | 4201.114 | 6594.332 |
…… | …… | …… | …… | …… |
Step 3.3, taking the 95%, 99% and 99.9% consistency thresholds as criteria, setting the numbers of samples required to judge that the two travel space-time sequences C1 and Ci are the travel track of the same person to N1, N2 and N3 respectively, and denoting the numbers of samples whose consistency comparison results fall within the three confidence intervals as S1, S2 and S3 respectively;
step 3.3, dividing the spatial XY coordinates of the two sequence segments intercepted from C1 and Ci into 4 vectors, calculating the consistency between the two segments as in step 2.2, and checking it against the consistency comparison indexes in the M matrix;
step 3.3.1, if the calculated consistency index is greater than the 95% consistency comparison index for the time period, jumping to step 3.4;
step 3.3.2, if the consistency index is less than or equal to the 95% comparison index and greater than the 99% comparison index, jumping to step 3.5;
step 3.3.3, if the consistency index is less than or equal to the 99% comparison index and greater than the 99.9% comparison index, jumping to step 3.6;
step 3.4, abandoning the travel space-time sequence and traversing to the next sequence;
step 3.5, adding 1 to the sample count S1 of the corresponding interval; if S1 is greater than or equal to N1, judging that the two travel space-time sequences represent the travel track of the same person; otherwise, jumping to step 3.8;
step 3.6, adding 1 to the sample count S2 of the corresponding interval; if S2 is greater than or equal to N2, judging that the two travel space-time sequences represent the travel track of the same person; otherwise, jumping to step 3.8;
step 3.7, adding 1 to the sample count S3 of the corresponding interval; if S3 is greater than or equal to N3, judging that the two travel space-time sequences represent the travel track of the same person; otherwise, jumping to step 3.8;
step 3.8, continuing to randomly extract segments from the two travel space-time sequences, that is, randomly selecting a time-continuous sequence segment from C1 and acquiring the sequence segment of the same time period from Ci, the two segments having the same number of nodes;
step 3.9, for two trip space-time sequences C1 and Ci judged to be held by the same person, associating the mobile communication card number EPID of Ci with the holder number P1 of C1, indicating that the space-time motion trajectory recorded by that card number belongs to the individual whose PID is P1, and storing the correspondence between the EPID and the PID in a data table TR (Table of Relationship);
in this example, the consistency index value between the C1 segment and the C2 segment is 19.54, which is below the 99.9% consistency threshold for different networks, the 8 am to 10 am time period and space-time sequence segments of 12 nodes, so the two segments are considered consistent; with N1 = 100, N2 = 50 and N3 = 30, all 30 sampling tests on segments extracted from the travel space-time sequences with EPIDs 2454 and 2142 pass the 99.9% consistency test, so the card numbers with EPIDs 2454 and 2142 are judged to belong to the same individual;
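The sampling loop of steps 3.3 to 3.8 can be sketched as below. The sketch assumes that the three bands are bounded by the 95%, 99% and 99.9% comparison indexes, and that an index at or below the 99.9% index feeds the S3 counter of step 3.7. The function names and the threshold triple `(60.0, 40.0, 25.0)` are hypothetical; the text does not give the thresholds for 12-node segments.

```python
def band(r, t95, t99, t999):
    """Map a consistency index to its band, or None if the check fails."""
    if r > t95:
        return None  # step 3.4: abandon this sequence
    if r > t99:
        return 0     # step 3.5: counts towards S1
    if r > t999:
        return 1     # step 3.6: counts towards S2
    return 2         # step 3.7: counts towards S3

def same_holder(samples, required):
    """samples: iterable of (r, (t95, t99, t999)), one per sampled segment pair.
    required: (N1, N2, N3). Returns True once any band reaches its quota."""
    counts = [0, 0, 0]  # S1, S2, S3
    for r, thresholds in samples:
        b = band(r, *thresholds)
        if b is None:
            return False
        counts[b] += 1
        if counts[b] >= required[b]:
            return True
    return False  # ran out of samples before any quota was met

thr = (60.0, 40.0, 25.0)   # hypothetical thresholds
required = (100, 50, 30)   # N1, N2, N3 from the example
matched = same_holder([(19.54, thr)] * 30, required)
```

With these assumed thresholds, thirty consecutive samples at 19.54 all land in the strictest band and reach N3 = 30, so `matched` is `True`, which mirrors the outcome described in the example.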
Step 4, traversing the whole database to obtain all other card numbers judged to be held by the same person as the initially selected mobile communication card number, marking these card numbers as held by the same person and assigning them an individual number PID; then taking each card number in turn, matching it against the travel space-time sequences of all other individuals by traversing the database, and judging whether they are held by the same person; if one card number is found to be held by several persons at the same time, that is, its similarity to two mutually unmatched card numbers passes the check in both cases, attributing the card number to the mobile communication card with the greater similarity;
step 4.1, traversing the EPIDs of all mobile communication card numbers in the database, repeating step 3 to compare each with the travel space-time sequence C1 and calculate the consistency between the space-time sequences, identifying all card numbers that can be judged to be held by individual P1, and storing the relationships between these card numbers and the individual number in the EPID-PID relational data table TR;
step 4.2, taking each EPID in the database in turn as the matching object, searching, traversing and comparing the whole database, mining the matching relations among all mobile communication card numbers, and storing them in the data table TR;
in this example, EPID-PID relationship data Table TR is shown in Table 6:
TABLE 6 EPID-PID relationship data Table TR
PID | EPID |
…… | …… |
0323 | 2454 |
0323 | 2142 |
0323 | 4143 |
0324 | 3422 |
0325 | 1454 |
0326 | 4212 |
0327 | 4526 |
0328 | 3422 |
0329 | 7633 |
0329 | 1434 |
0330 | 4132 |
0331 | 2142 |
0331 | 5314 |
…… | …… |
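A table such as Table 6 can be scanned for card numbers recorded under more than one PID with a simple grouping pass. The sketch below uses the rows of Table 6; the list-of-tuples layout and the function name `conflicting_epids` are assumptions made for illustration.

```python
from collections import defaultdict

# (PID, EPID) rows copied from Table 6.
TR = [
    ("0323", "2454"), ("0323", "2142"), ("0323", "4143"),
    ("0324", "3422"), ("0325", "1454"), ("0326", "4212"),
    ("0327", "4526"), ("0328", "3422"), ("0329", "7633"),
    ("0329", "1434"), ("0330", "4132"), ("0331", "2142"),
    ("0331", "5314"),
]

def conflicting_epids(tr):
    """Return {EPID: sorted PIDs} for every EPID listed under several PIDs."""
    owners = defaultdict(set)
    for pid, epid in tr:
        owners[epid].add(pid)
    return {e: sorted(p) for e, p in owners.items() if len(p) > 1}

conflicts = conflicting_epids(TR)
```

On these rows the scan reports EPID 2142 (under PIDs 0323 and 0331) and EPID 3422 (under PIDs 0324 and 0328).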
Step 4.3, traversing the data table TR and searching for cases in which the same trip space-time sequence is held by several people, that is, the trip space-time sequence Cj of one EPID belongs to several different PIDs, which means that Cj meets the consistency requirement when compared with several trip space-time sequences (such as Ck, Cl and Cm) belonging to different PIDs; then:
step 4.3.1, repeating steps 3.3 to 3.9 with Cj as the object to be matched, traversing the travel space-time sequences (Ck, Cl, Cm and the like) that share a PID with Cj, across all the different PIDs involved, sampling the space-time sequences and calculating the consistency again, and rechecking the consistency of the space-time sequences against the consistency comparison matrix M;
step 4.3.2, after the renewed comparison and check, if any of the travel space-time sequences Ck, Cl, Cm and the like no longer meets the consistency requirement with Cj, deleting the membership relation between that holder's PID and Cj from the data table TR;
step 4.3.3, repeating steps 4.3.1 and 4.3.2 until only one trip space-time sequence consistent with Cj remains or the iteration limit NC is reached; if the limit NC is reached and more than one trip space-time sequence belonging to different PIDs (such as Cp, Cq and Cr) still passes the consistency check with Cj, calculating the cumulative sum of the consistency index values between Cj and each of Cp, Cq and Cr (each term being the consistency index value of the n-th sampling comparison, for example between Cj and Cp), selecting the sequence with the smallest sum, keeping its relation of belonging to the same PID as Cj, and deleting the membership relations between Cj and the other PIDs;
in this example, the card number with EPID 2142 is held by individuals with PID 0323 and 0331 at the same time, and through resampling and matching, it is finally determined that 2142 belongs to the individual with PID 0323, and the records with PID 0331 and EPID 2142 are deleted from the TR table;
step 4.4, repeating the step 4.3 until the situation that one EPID belongs to a plurality of PIDs does not exist in the data table TR;
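The tie-break at the end of step 4.3.3 keeps the PID whose sequence has the smallest cumulative consistency index against Cj. A minimal sketch follows, assuming every candidate was sampled the same number of times; the function names and the sample values are hypothetical and only illustrate the selection and the clean-up of TR.

```python
def resolve_owner(samples_by_pid):
    """samples_by_pid: PID -> list of consistency index values obtained by
    repeatedly sampling Cj against that PID's sequence.
    Returns the PID with the smallest cumulative sum."""
    return min(samples_by_pid, key=lambda pid: sum(samples_by_pid[pid]))

def prune(tr, epid, keep_pid):
    """Drop every (PID, EPID) row for epid except the one under keep_pid."""
    return [(p, e) for p, e in tr if e != epid or p == keep_pid]

# Hypothetical sampled index values for EPID 2142 against two candidate PIDs.
samples = {"0323": [19.5, 21.0, 18.2], "0331": [40.1, 35.7, 44.0]}
owner = resolve_owner(samples)
tr = prune([("0323", "2142"), ("0331", "2142"), ("0331", "5314")], "2142", owner)
```

Because a smaller index means closer agreement, the smallest sum identifies the most similar candidate; comparing sums is only meaningful when the sample counts match.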
Step 5, after traversing the database and matching the trip space-time sequences of all mobile communication card numbers, for each one-person-multiple-card group, querying the communication records between each of the group's card numbers and the fixed sensors, changing the card number EPID to the individual identifier PID, interleaving the several trip space-time trajectories in time order into one mobile communication record set, storing the new record set in the database as the communication records between the individual's mobile communication cards and the fixed sensors, and jumping to step 1.4 to perform space-time weighted interpolation on the individual's new record set, to obtain a more precise individual trip space-time sequence;
step 5.1, traversing the TR data table, inquiring the PID with a plurality of mobile communication card numbers, and recording the EPIDs of all the mobile communication cards to which the PID belongs;
in this example, an individual with a PID of 0323 holds multiple card numbers;
step 5.2, according to the obtained EPIDs, respectively inquiring the communication records of the EPIDs and the fixed sensors in a database, and sequencing the records according to time to form a plurality of pieces of travel space-time trajectory data;
in this example, the 3 travel spatio-temporal trajectories of an individual with PID 0323 are shown in table 7:
TABLE 7 Individual multiple spatiotemporal trajectories
EPID | REGIONCODE | SENSORID | TIME | X | Y |
…… | …… | …… | …… | …… | …… |
2454 | 4231 | 2389 | 2019-07-10 10:02:43 | 4978.842 | 6922.075 |
2454 | 4231 | 2224 | 2019-07-10 10:06:56 | 5022.084 | 6412.204 |
2454 | 4231 | 2567 | 2019-07-10 10:07:31 | 5290.027 | 6882.754 |
2454 | 4231 | 2423 | 2019-07-10 10:10:55 | 5452.346 | 6536.323 |
2454 | 4231 | 2423 | 2019-07-10 10:12:18 | 5452.346 | 6536.323 |
2454 | 4231 | 2423 | 2019-07-10 10:15:42 | 5452.346 | 6536.323 |
2454 | 4231 | 2423 | 2019-07-10 10:18:34 | 5452.346 | 6536.323 |
2454 | 4231 | 2423 | 2019-07-10 10:22:38 | 5452.346 | 6536.323 |
2454 | 4231 | 2345 | 2019-07-10 10:26:29 | 5246.451 | 6319.747 |
2454 | 4231 | 5234 | 2019-07-10 10:28:10 | 5881.280 | 6357.004 |
2454 | 4231 | 2134 | 2019-07-10 10:31:08 | 5571.187 | 6976.146 |
2454 | 4231 | 2342 | 2019-07-10 10:38:02 | 5844.266 | 6131.435 |
…… | …… | …… | …… | …… | …… |
2142 | 4231 | 2389 | 2019-07-10 10:03:54 | 4978.842 | 6922.075 |
2142 | 4231 | 2389 | 2019-07-10 10:07:12 | 4978.842 | 6922.075 |
2142 | 4231 | 2224 | 2019-07-10 10:08:43 | 5022.084 | 6412.204 |
2142 | 4231 | 2567 | 2019-07-10 10:09:23 | 5290.027 | 6882.754 |
2142 | 4231 | 2423 | 2019-07-10 10:11:54 | 5452.346 | 6536.323 |
2142 | 4231 | 2423 | 2019-07-10 10:13:09 | 5452.346 | 6536.323 |
2142 | 4231 | 2423 | 2019-07-10 10:20:19 | 5452.346 | 6536.323 |
2142 | 4231 | 2345 | 2019-07-10 10:24:34 | 5246.451 | 6319.747 |
2142 | 4231 | 2342 | 2019-07-10 10:29:05 | 5844.266 | 6131.435 |
…… | …… | …… | …… | …… | …… |
4143 | 1243 | 3245 | 2019-07-10 10:02:45 | 5686.413 | 6935.724 |
4143 | 1243 | 3642 | 2019-07-10 10:05:19 | 5633.066 | 6853.297 |
4143 | 1243 | 3099 | 2019-07-10 10:09:11 | 5110.714 | 6658.093 |
4143 | 1243 | 3874 | 2019-07-10 10:10:33 | 5244.108 | 6353.857 |
4143 | 1243 | 3874 | 2019-07-10 10:12:06 | 5244.108 | 6353.857 |
4143 | 1243 | 3874 | 2019-07-10 10:14:36 | 5244.108 | 6353.857 |
4143 | 1243 | 3874 | 2019-07-10 10:17:23 | 5244.108 | 6353.857 |
4143 | 1243 | 3698 | 2019-07-10 10:25:56 | 5543.188 | 6043.208 |
4143 | 1243 | 3684 | 2019-07-10 10:27:44 | 5178.479 | 6210.875 |
4143 | 1243 | 3495 | 2019-07-10 10:30:32 | 5923.473 | 6381.616 |
…… | …… | …… | …… | …… | …… |
Step 5.3, determining the weights of the several trip space-time trajectories according to the different data densities recorded by the different mobile communication card numbers, interleaving them in time order to construct a new space-time trajectory, and applying the space-time weighted interpolation method of step 1.4 to the communication records between the card numbers and the fixed sensors, to obtain an individual trip space-time sequence with equal time intervals;
in this example, the combined spatiotemporal trajectories are shown in table 8:
TABLE 8 merged spatio-temporal trajectories
EPID | REGIONCODE | SENSORID | TIME | X | Y |
…… | …… | …… | …… | …… | …… |
2454 | 4231 | 2389 | 2019-07-10 10:02:43 | 4978.842 | 6922.075 |
4143 | 1243 | 3245 | 2019-07-10 10:02:45 | 5686.413 | 6935.724 |
2142 | 4231 | 2389 | 2019-07-10 10:03:54 | 4978.842 | 6922.075 |
4143 | 1243 | 3642 | 2019-07-10 10:05:19 | 5633.066 | 6853.297 |
2454 | 4231 | 2224 | 2019-07-10 10:06:56 | 5022.084 | 6412.204 |
2142 | 4231 | 2389 | 2019-07-10 10:07:12 | 4978.842 | 6922.075 |
2454 | 4231 | 2567 | 2019-07-10 10:07:31 | 5290.027 | 6882.754 |
2142 | 4231 | 2224 | 2019-07-10 10:08:43 | 5022.084 | 6412.204 |
4143 | 1243 | 3099 | 2019-07-10 10:09:11 | 5110.714 | 6658.093 |
2142 | 4231 | 2567 | 2019-07-10 10:09:23 | 5290.027 | 6882.754 |
4143 | 1243 | 3874 | 2019-07-10 10:10:33 | 5244.108 | 6353.857 |
2454 | 4231 | 2423 | 2019-07-10 10:10:55 | 5452.346 | 6536.323 |
2142 | 4231 | 2423 | 2019-07-10 10:11:54 | 5452.346 | 6536.323 |
4143 | 1243 | 3874 | 2019-07-10 10:12:06 | 5244.108 | 6353.857 |
2454 | 4231 | 2423 | 2019-07-10 10:12:18 | 5452.346 | 6536.323 |
2142 | 4231 | 2423 | 2019-07-10 10:13:09 | 5452.346 | 6536.323 |
4143 | 1243 | 3874 | 2019-07-10 10:14:36 | 5244.108 | 6353.857 |
2454 | 4231 | 2423 | 2019-07-10 10:15:42 | 5452.346 | 6536.323 |
4143 | 1243 | 3874 | 2019-07-10 10:17:23 | 5244.108 | 6353.857 |
2454 | 4231 | 2423 | 2019-07-10 10:18:34 | 5452.346 | 6536.323 |
2142 | 4231 | 2423 | 2019-07-10 10:20:19 | 5452.346 | 6536.323 |
2454 | 4231 | 2423 | 2019-07-10 10:22:38 | 5452.346 | 6536.323 |
2142 | 4231 | 2345 | 2019-07-10 10:24:34 | 5246.451 | 6319.747 |
4143 | 1243 | 3698 | 2019-07-10 10:25:56 | 5543.188 | 6043.208 |
2454 | 4231 | 2345 | 2019-07-10 10:26:29 | 5246.451 | 6319.747 |
4143 | 1243 | 3684 | 2019-07-10 10:27:44 | 5178.479 | 6210.875 |
2454 | 4231 | 5234 | 2019-07-10 10:28:10 | 5881.280 | 6357.004 |
2142 | 4231 | 2342 | 2019-07-10 10:29:05 | 5844.266 | 6131.435 |
4143 | 1243 | 3495 | 2019-07-10 10:30:32 | 5923.473 | 6381.616 |
2454 | 4231 | 2134 | 2019-07-10 10:31:08 | 5571.187 | 6976.146 |
2454 | 4231 | 2342 | 2019-07-10 10:38:02 | 5844.266 | 6131.435 |
…… | …… | …… | …… | …… | …… |
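The interleaving that produces Table 8 is a k-way merge of trajectories that are each already sorted by time. The sketch below shows it with Python's standard `heapq.merge`, using the first two records of each card from Table 7. The tuple layout (TIME, EPID, X, Y) and the function name `merge_tracks` are assumptions, and the density-based weighting of step 5.3 is not modelled.

```python
from heapq import merge

# Per-card records (TIME, EPID, X, Y), each list already in time order.
tracks = {
    "2454": [("2019-07-10 10:02:43", "2454", 4978.842, 6922.075),
             ("2019-07-10 10:06:56", "2454", 5022.084, 6412.204)],
    "2142": [("2019-07-10 10:03:54", "2142", 4978.842, 6922.075),
             ("2019-07-10 10:07:12", "2142", 4978.842, 6922.075)],
    "4143": [("2019-07-10 10:02:45", "4143", 5686.413, 6935.724),
             ("2019-07-10 10:05:19", "4143", 5633.066, 6853.297)],
}

def merge_tracks(tracks):
    """Interleave several time-sorted trajectories into one, ordered by TIME.

    Fixed-width timestamps compare correctly as plain strings.
    """
    return list(merge(*tracks.values(), key=lambda rec: rec[0]))

merged = merge_tracks(tracks)
```

The resulting EPID order is 2454, 4143, 2142, 4143, 2454, 2142, which agrees with the first six data rows of Table 8.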
Step 5.4, recalculating travel space-time sequences of all PIDs with a plurality of card numbers, storing calculation results into a database, and providing a data basis for other subsequent analysis based on mobile communication big data;
in this example, the travel spatio-temporal trajectory recalculated by the individual with PID 0323 is shown in table 9:
TABLE 9 travel space-time trajectory of individual 0323
PID | TIME | X | Y |
…… | …… | …… | …… |
0323 | 2019-07-10 10:02:30 | 4980.1627 | 6914.5155 |
0323 | 2019-07-10 10:03:00 | 5012.6678 | 6901.3820 |
0323 | 2019-07-10 10:03:30 | 5023.2808 | 6912.7565 |
0323 | 2019-07-10 10:04:00 | 5067.1517 | 6944.5986 |
0323 | 2019-07-10 10:04:30 | 5086.1439 | 6916.5905 |
0323 | 2019-07-10 10:05:00 | 5135.1632 | 6920.6010 |
0323 | 2019-07-10 10:05:30 | 5150.1864 | 6941.2751 |
0323 | 2019-07-10 10:06:00 | 5180.4582 | 6990.6511 |
0323 | 2019-07-10 10:06:30 | 5204.3407 | 7012.8804 |
0323 | 2019-07-10 10:07:00 | 5217.2648 | 7023.2694 |
0323 | 2019-07-10 10:07:30 | 5209.4010 | 7031.0754 |
0323 | 2019-07-10 10:08:00 | 5231.0230 | 7042.0325 |
0323 | 2019-07-10 10:08:30 | 5206.9472 | 7046.6325 |
0323 | 2019-07-10 10:09:00 | 5243.4519 | 7050.8822 |
0323 | 2019-07-10 10:09:30 | 5261.7507 | 7024.7138 |
0323 | 2019-07-10 10:10:00 | 5268.0003 | 7016.2601 |
0323 | 2019-07-10 10:10:30 | 5318.6818 | 7035.0196 |
0323 | 2019-07-10 10:11:00 | 5287.5475 | 7037.5181 |
0323 | 2019-07-10 10:11:30 | 5237.9831 | 7006.4965 |
0323 | 2019-07-10 10:12:00 | 5268.3228 | 7027.2627 |
0323 | 2019-07-10 10:12:30 | 5224.9202 | 6978.9209 |
0323 | 2019-07-10 10:13:00 | 5269.4769 | 6977.9299 |
0323 | 2019-07-10 10:13:30 | 5309.9159 | 7026.3982 |
0323 | 2019-07-10 10:14:00 | 5284.2201 | 6980.0280 |
0323 | 2019-07-10 10:14:30 | 5321.5669 | 6947.7865 |
0323 | 2019-07-10 10:15:00 | 5367.0882 | 6918.8849 |
0323 | 2019-07-10 10:15:30 | 5405.4590 | 6941.7112 |
0323 | 2019-07-10 10:16:00 | 5423.6286 | 6970.9030 |
0323 | 2019-07-10 10:16:30 | 5404.8811 | 6992.2385 |
0323 | 2019-07-10 10:17:00 | 5422.5825 | 6969.5919 |
0323 | 2019-07-10 10:17:30 | 5386.9400 | 7011.5913 |
0323 | 2019-07-10 10:18:00 | 5343.6286 | 6964.4445 |
0323 | 2019-07-10 10:18:30 | 5308.0133 | 6976.5579 |
0323 | 2019-07-10 10:19:00 | 5320.2723 | 6996.9447 |
0323 | 2019-07-10 10:19:30 | 5291.6284 | 7041.9836 |
0323 | 2019-07-10 10:20:00 | 5269.3226 | 7064.3575 |
0323 | 2019-07-10 10:20:30 | 5304.9516 | 7069.9671 |
…… | …… | …… | …… |
Claims (7)
1. A method for identifying one person with multiple cards in a big data environment is characterized by comprising the following steps:
step 1, reading anonymous encrypted mobile terminal sensor data with a unique EPID number obtained from a sensor operator, extracting a communication signaling record triggered in a specified time period, expanding recorded sampling points at equal time intervals, and obtaining a travel space-time trajectory;
step 2, intercepting a large number of individual track segments with equal time intervals from the travel space-time track, constructing a comparative vector consistency index, calculating the consistency index of the randomly selected travel track at fixed time intervals in a fixed time period, and obtaining a consistency inspection index for judging whether the two tracks are consistent;
step 3, selecting a mobile communication card number from the database as the object to be matched and acquiring its trip space-time sequence; traversing the mobile communication record database and selecting the trip space-time sequences of other individuals to match against it; randomly intercepting sequence segments at the same positions in the space-time sequences, calculating the correlation of the two space-time sequences in spatial position, carrying out the consistency check on it, and judging whether the two space-time sequences are held by the same person;
step 4, traversing the whole database to obtain all other card numbers which are judged to be held by the same person as the initially selected mobile communication card number, marking the card numbers as held by the same person, identifying individual numbers PID for the card numbers, matching the card numbers with the travel time-space sequences of all other individuals in a database traversing manner, judging whether the card numbers are held by the same person, and if the condition that one card number is held by a plurality of persons at the same time occurs, namely the similarity between one card number and two mutually unmatched card numbers can be checked, attributing the card numbers to the mobile communication cards with larger similarity;
and 5, after traversing the database, after carrying out trip time-space sequence matching on all mobile communication card numbers, judging whether the mobile communication card numbers are held by one person or not with other card numbers, inquiring the communication records of a plurality of card numbers and a fixed sensor of each group of one person and a plurality of card numbers aiming at each group of one person and a plurality of cards, changing the card numbers into individual identification PID, mutually interpolating a plurality of trip time-space sequences into a mobile communication record set according to the time sequence, storing the new record set into the database as the communication records of the handheld mobile communication card and the fixed sensor, carrying out time-space weighted interpolation at equal time intervals, and obtaining a more precise individual trip time-space sequence.
2. The one-person multi-card identification method in a big data environment as claimed in claim 1, characterized in that step 1 queries all communication records of a mobile communication card within a specified time period according to the card's unique number EPID; preliminarily constructs the individual travel space-time trajectory formed by the communication records between a single mobile communication card number and the fixed sensors; applies a space-time weighted interpolation method to interpolate each individual travel space-time trajectory spatially at equal time intervals, obtaining individual travel space-time sequences at equal time intervals; deletes the original communication records between the mobile communication card and the fixed sensors, so that the interpolated individual travel space-time sequence represents the individual's movement in space and time; and constructs, for the specified time period T, the individual's travel space-time sequence at equal time interval Th, the sequence comprising T/Th + 1 nodes, each node containing time and XY coordinate information.
3. The one-person multi-card identification method in a big data environment as claimed in claim 1, characterized in that in step 2 a space-time sequence consistency comparison index matrix M is constructed on the basis of the individual travel space-time sequences, where M is a matrix of order 2 × n × m × m × 3; the matrix M is traversed, and the consistency index r, constructed on the basis of the Pearson product-moment value and the similarity between each pair of travel space-time sequence segments, is extracted in turn; the mean value of r over N space-time sequence samples is computed [symbol missing in source], and thresholds are set [symbols missing in source] at which the consistency is significant at the 95%, 99% and 99.9% levels respectively, giving three levels of consistency index [symbols missing in source] that constrain the similarity of the space-time sequences at different levels.
4. The one-person multi-card identification method in a big data environment according to claim 1, characterized in that step 3 obtains the travel space-time sequence of the target card number in a random time period and compares it with the sequence of each card number in the database over the same time period: if the consistency index is greater than [threshold symbol missing in source], the travel space-time sequence is discarded; if the consistency index is less than or equal to [threshold symbol missing in source], segments covering the same point positions on the two travel space-time sequences continue to be randomly intercepted and their consistency index calculated, until either the sequence pair is discarded or the two travel space-time sequences are judged to represent the travel track of the same person.
5. The one-person multi-card identification method in a big data environment as claimed in claim 4, characterized in that step 3, for the three consistency indexes [symbols missing in source], sets sample-count discrimination thresholds for judging that two travel space-time sequences C1 and Ci are the travel track of the same person, and counts the samples whose consistency comparison result falls within each of the three confidence intervals; when the consistency index is greater than the consistency comparison index [symbol missing in source] for the time period, the travel sequence is discarded; otherwise, if the number of samples satisfying the consistency index in any confidence interval is greater than or equal to the sample-count discrimination threshold, the two travel space-time sequences are judged to belong to the same person.
6. The one-person multi-card identification method in a big data environment as claimed in claim 1, characterized in that step 4 traverses the EPIDs of all mobile communication card numbers in the database, identifies all card numbers that can be judged to be held by the individual with PID P1, and continues to store the relationship between card numbers and individual identifiers in an EPID-PID relationship data table TR; when the same travel space-time sequence is attributed to several individuals, sampling and consistency calculation are carried out on it again, up to a set upper limit on the number of iterations; if a unique correspondence is obtained, TR is updated; if not, the cumulative sum of the consistency index is calculated for each of the candidate space-time sequences and the sequence with the smallest value is retained.
7. The one-person multi-card identification method in a big data environment as claimed in claim 1, characterized in that step 5 traverses the TR data table, obtains the communication records of the several EPIDs belonging to the same PID, sorts the records by time to form several sets of travel space-time trajectory data, determines the weight of each set according to the data density of its records, interleaves the sets to construct a new space-time trajectory, obtains the individual travel space-time sequence at equal time intervals by the space-time weighted interpolation method, and stores the calculation results in the database, providing a data basis for subsequent analyses based on mobile communication big data.
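Claims 2 and 7 describe merging the records of several cards and resampling them onto an equal-interval time grid of T/Th + 1 nodes. A minimal Python sketch of that flow follows. It is an illustration under stated assumptions, not the claimed method: plain linear interpolation between neighbouring records stands in for the space-time weighted interpolation, the density-based weighting of claim 7 is not modelled, and the function names `merge_records` and `resample` and the `(t, x, y)` record layout are hypothetical.

```python
from bisect import bisect_left


def merge_records(*tracks):
    """Interleave several lists of (t, x, y) records into one, ordered by time."""
    return sorted(rec for track in tracks for rec in track)


def resample(records, t_start, t_end, step):
    """Interpolate time-sorted (t, x, y) records onto an equal-interval grid.

    Returns (t_end - t_start) / step + 1 nodes. Linear interpolation is an
    assumption standing in for the weighted interpolation of the claims.
    """
    times = [r[0] for r in records]
    n_nodes = int((t_end - t_start) // step) + 1
    nodes = []
    for k in range(n_nodes):
        t = t_start + k * step
        i = bisect_left(times, t)
        if i == 0:
            # Before (or at) the first record: hold the first known position.
            _, x, y = records[0]
        elif i == len(records):
            # After the last record: hold the last known position.
            _, x, y = records[-1]
        else:
            t0, x0, y0 = records[i - 1]
            t1, x1, y1 = records[i]
            w = (t - t0) / (t1 - t0)  # t0 < t <= t1, so the divisor is non-zero
            x = x0 + w * (x1 - x0)
            y = y0 + w * (y1 - y0)
        nodes.append((t, x, y))
    return nodes


# Two cards attributed to the same PID, with sparse records each.
card_a = [(0, 0.0, 0.0), (20, 20.0, 0.0)]
card_b = [(10, 10.0, 10.0)]
merged = merge_records(card_a, card_b)
sequence = resample(merged, t_start=0, t_end=20, step=5)
```

Here the record from the second card fills the gap between the two records of the first, so the resampled sequence passes through (10, 10) instead of following the straight line that either card alone would suggest.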
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011142356.0A CN111970685B (en) | 2020-10-23 | 2020-10-23 | One-person multi-card identification method in big data environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011142356.0A CN111970685B (en) | 2020-10-23 | 2020-10-23 | One-person multi-card identification method in big data environment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111970685A | 2020-11-20 |
CN111970685B | 2021-01-15 |
Family
ID=73387625
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011142356.0A Active CN111970685B (en) | 2020-10-23 | 2020-10-23 | One-person multi-card identification method in big data environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111970685B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117150319A (en) * | 2023-10-30 | 2023-12-01 | 北京艾瑞数智科技有限公司 | Method and device for identifying multiple numbers of one person |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104573390A (en) * | 2015-01-27 | 2015-04-29 | 武汉大学 | Cognitive-rule-based time-space trajectory fusion method and road network topology generating method |
CN108733818A (en) * | 2018-05-21 | 2018-11-02 | 上海世脉信息科技有限公司 | A kind of big data expansion quadrat method based on the verification of more scene multi-data sources |
CN109348404A (en) * | 2018-10-09 | 2019-02-15 | 上海世脉信息科技有限公司 | A kind of method that individual trip path locus extracts under big data environment |
CN110162588A (en) * | 2019-05-29 | 2019-08-23 | 浪潮软件集团有限公司 | A kind of track fusion method of multidimensional related information |
CN110958599A (en) * | 2018-09-26 | 2020-04-03 | 北京融信数联科技有限公司 | One-machine multi-card user distinguishing method based on track similarity |
CN110958600A (en) * | 2018-09-26 | 2020-04-03 | 北京融信数联科技有限公司 | Method for judging number of one-machine multi-card users in regional population based on track similarity |
CN111343581A (en) * | 2018-12-18 | 2020-06-26 | 北京融信数联科技有限公司 | One-person multi-number mobile user identification method based on distance |
Also Published As
Publication number | Publication date |
---|---|
CN111970685B (en) | 2021-01-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110505583B (en) | Trajectory matching method based on bayonet data and signaling data | |
CN111582948B (en) | Individual behavior analysis method based on mobile phone signaling data and POI (Point of interest) | |
CN106778876B (en) | User classification method and system based on mobile user track similarity | |
EP3132592B1 (en) | Method and system for identifying significant locations through data obtainable from a telecommunication network | |
CN111737605A (en) | Travel purpose identification method and device based on mobile phone signaling data | |
CN109801091B (en) | Target user group positioning method and device, computer equipment and storage medium | |
CN108536851A (en) | A kind of method for identifying ID based on motion track similarity-rough set | |
CN109195219B (en) | Method for determining position of mobile terminal by server | |
Muzammal et al. | Trajectory mining using uncertain sensor data | |
CN107977673A (en) | A kind of economically active population's recognition methods based on big data | |
WO2014012927A1 (en) | Method and system for traffic estimation | |
Vieira et al. | Querying spatio-temporal patterns in mobile phone-call databases | |
CN111970685B (en) | One-person multi-card identification method in big data environment | |
CN111209487B (en) | User data analysis method, server, and computer-readable storage medium | |
CN110781256B (en) | Method and device for determining POI matched with Wi-Fi based on sending position data | |
CN111372194B (en) | Intelligent identification method for mobile phone card changing user | |
CN111143639B (en) | User intimacy calculation method, device, equipment and medium | |
Dyrmishi et al. | Mobile positioning and trajectory reconstruction based on mobile phone network data: A tentative using particle filter | |
EP3563592B1 (en) | Method for determining the mobility status of a user of a wireless communication network | |
Shi et al. | Mobility patterns analysis of Beijing residents based on call detail records | |
CN107241693B (en) | Method for determining position of non-coordinate sensor in big data environment | |
CN114595300A (en) | Active chain reconstruction method and system combining multi-source space-time data | |
CN115734165A (en) | User searching method, device, equipment and computer readable storage medium | |
CN117648556B (en) | Family membership identification method based on space-time big data | |
Yu et al. | Discovery of Travelling Companions from Trajectories with Different Sampling Rates |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||