CN111970685A - One-person multi-card identification method in big data environment - Google Patents

One-person multi-card identification method in big data environment

Publication number
CN111970685A
Authority
CN
China
Prior art keywords
time
space
sequences
card
mobile communication
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011142356.0A
Other languages
Chinese (zh)
Other versions
CN111970685B (en)
Inventor
张颖
顾高翔
刘杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI SHIMAI INFORMATION TECHNOLOGY CO LTD
Original Assignee
SHANGHAI SHIMAI INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI SHIMAI INFORMATION TECHNOLOGY CO LTD
Priority to CN202011142356.0A
Publication of CN111970685A
Application granted
Publication of CN111970685B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W8/00 Network data management
    • H04W8/18 Processing of user or subscriber data, e.g. subscribed services, user preferences or user profiles; Transfer of user or subscriber data
    • H04W8/183 Processing at user equipment or user record carrier
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/909 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using geographical or spatial information, e.g. location
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W8/00 Network data management
    • H04W8/22 Processing or transfer of terminal data, e.g. status or physical capabilities
    • H04W8/24 Transfer of terminal data
    • H04W8/245 Transfer of terminal data from a network towards a terminal

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention provides a one-person multi-card identification method in a big data environment. The invention makes full use of the existing mass of communication records between handheld mobile communication devices and fixed sensors and designs a comparison algorithm and a consistency check standard. It can organize the target data automatically, conveniently and at low cost, identify multiple card numbers belonging to one person, and effectively extract the actual individuals in space from a large number of communication records, reducing the influence of one-person multi-card holdings on overall statistics. By fusing the space-time trajectories of the multiple card numbers, it can also obtain the travel space-time sequence of each individual more accurately, thereby providing a more reliable data basis for other spatial big data analysis.

Description

One-person multi-card identification method in big data environment
Technical Field
The invention relates to a method for identifying, based on massive mobile communication data, the situation in which one person holds multiple cards, and belongs to the technical field of big data analysis.
Background
In recent years, with the development of information technology, the amount of data has grown explosively, and data sources have become both more numerous and larger. Data recorded by information sensors such as mobile phones, WIFI and the Internet of Things has become the most important data source in big data analysis, and the relatively complete individual travel records it contains provide good data support for big data analysis, especially traffic big data analysis.
Using sensor communication data for research and analysis of urban socioeconomics allows the research to be refined more accurately to the individuals moving in the city, but the sample distribution still deviates considerably from the actual distribution, and this bias seriously affects the analysis results. In particular, the existence of one-person multi-card holdings causes great trouble for research that uses mobile communication big data. One-person multi-card means that the same person has registered multiple card numbers with the same or different mobile communication operators. At present, the popularity of dual-SIM dual-standby phones across the high-end, mid-range and low-end mobile phone markets shows how common it is for one person to hold two or even more cards; if this phenomenon is not handled, a large number of redundant space-time trajectories will be generated when analyzing the space-time movement of urban crowds, biasing the analysis results. In October 2019, the three Chinese operators China Mobile, China Unicom and China Telecom had 943.59 million, 322.119 million and 332.53 million active users respectively, for a total of 1.5982 billion mobile phone users, whereas the national population at the end of 2018 was no more than 1.39538 billion; the number of mobile phone card numbers per person thus reached 1.145, and both same-network and cross-network multi-card holdings have reached a considerable level.
However, the one-person multi-card phenomenon is difficult to identify. Each card number can communicate independently with fixed sensors such as base stations and WIFI, so even within the same mobile communication network the space-time trajectories formed by the communication records of two cards with the fixed sensors can differ; between card numbers on different networks the fixed sensors are also located at different positions, which makes the phenomenon even harder to identify, and a criterion for judging it is lacking. Therefore, in the data preprocessing stage an algorithm is needed to compute an index system for judging the one-person multi-card phenomenon and to identify the multiple mobile communication card numbers held by the same person, so that the original data can be simplified and merged. At the same time, based on the merged individual data, weighted interpolation over multiple data sources is used to further improve the accuracy of the individual's travel space-time trajectory. On this basis, a mutual mapping table between the PID (Personal IDentity, the individual identification) and the operator card numbers is updated and maintained, so that samples can be reduced when the merged data is later used for specific statistical analysis, that is, only the number of PIDs is counted, reducing the influence of one-person multi-card holdings on overall statistics.
Disclosure of Invention
The purpose of the invention is to use the individual travel space-time trajectory formed by the communication records between a mobile communication device and fixed sensors to represent the movement of the individual in space; to sample and compare the travel space-time trajectories of different card numbers in the same period and judge their similarity; to construct on this basis a consistency index for judging one-person multi-card holdings, so as to identify whether the travel trajectories recorded by different mobile communication cards belong to the same person; and, for an individual with multiple cards, to refine the individual's travel space-time trajectory using the communication records of the multiple cards, selecting from the multiple card numbers the one with the most complete record. On this basis, the mutual mapping table between the individual identification PID and the operator card numbers is updated and maintained, which facilitates sample reduction in subsequent specific statistical analysis, that is, only the number of PIDs is counted, reducing the influence of one-person multi-card holdings on overall statistics.
In order to achieve the above object, the technical solution of the present invention is to provide a method for constructing a one-person multi-card recognition algorithm and a consistency check index thereof in a big data environment, comprising the following steps:
step 1, reading the anonymous encrypted mobile terminal sensor data obtained from a sensor operator, the data being continuous in time and space, with the one-way irreversibly encrypted card numbers of different mobile terminals corresponding to different numbers; extracting the communication signaling records triggered by each card number in a specified time period to form the travel space-time communication record data set recorded by the individual through that card number; sorting the data set by time and projecting its spatial positions onto a map; and, on the basis of data registration, expanding the individual's recorded sample points at equal time intervals by a space-time interpolation method to obtain a travel space-time trajectory identified by each mobile communication card number;
step 2, in order to eliminate the influence of different times of day on the judgment of individual travel similarity, dividing the 24 hours of a day into N time periods; for each time period, extracting from the database a large number of individual travel space-time trajectories with equal time intervals; splitting each pair of travel space-time trajectories into 4 vectors according to their spatial X and Y coordinates; calculating the correlation coefficient between the vectors by the Pearson product-moment method, together with their standard difference, and constructing an index for comparing the consistency of the vectors; calculating the consistency index of randomly selected travel trajectories with a fixed time interval within a fixed time period, and the differences in the Pearson product-moment values between time periods; and taking the result as the consistency check standard for judging whether two trajectories are consistent, that is, the basis for judging that two trajectories belong to the same individual;
step 3, selecting a mobile communication card number from the database as the object to be matched and acquiring its travel space-time sequence; traversing the mobile communication record database and selecting the travel space-time sequences of other individuals to match against it; randomly selecting sequence segments of random duration from the space-time sequences, calculating the correlation of the two space-time sequences in spatial position and checking its consistency, and judging whether the two space-time sequences are held by the same person;
step 4, traversing the whole database to obtain all other card numbers judged to be held by the same person as the initially selected mobile communication card number, marking these card numbers as held by the same person and assigning them an individual number PID; matching every card number against the travel space-time sequences of all other individuals by traversing the database and judging whether they are held by the same person; and, if one card number appears to be held by several persons at the same time, that is, its similarity with two card numbers that do not match each other passes the check, attributing the card number to the mobile communication card with the greater similarity;
step 5, after the database has been traversed and the travel space-time sequences of all mobile communication card numbers have been matched to judge whether each is held by the same person as other card numbers, querying, for each group of card numbers held by one person, the communication records between those card numbers and the fixed sensors; replacing the card number EPID (EncryPtion international mobile subscriber IDentity, the anonymous one-way encrypted globally unique mobile terminal identification code) with the individual identification PID; interleaving the multiple travel space-time sequences in time order into one mobile communication record set; storing the new record set in the database as the communication records between the individual's handheld mobile communication cards and the fixed sensors; and jumping to step 1.4 to perform space-time interpolation on it to obtain a more precise individual travel space-time sequence.
Preferably, the step 1 comprises:
step 1.1, the system reads the anonymous encrypted mobile terminal sensor data of individuals obtained from a sensor operator, the data being continuous in time and space and comprising: the unique number EPID of the mobile communication card number used by the individual to communicate with the fixed sensor, the communication action type TYPE, the time TIME at which the communication action occurred, the region REGIONCODE of the fixed sensor where the communication action occurred, and the specific number SENSORID of the fixed sensor;
step 1.2, each piece of anonymous encrypted mobile terminal sensor data is one signaling record; each signaling record is decrypted, the fields EPID, TYPE, TIME, REGIONCODE and SENSORID in the record are read, the longitude and latitude coordinates of the record are looked up from the fixed sensor number in the record, and these are converted into a geographic XY coordinate system;
step 1.3, according to the unique number EPID of the mobile communication card, querying all communication records of the card within the specified time period, and constructing a preliminary individual travel space-time trajectory formed by the communication records between a single mobile communication card number and the fixed sensors;
step 1.4, performing spatial interpolation at equal time intervals on each individual travel space-time trajectory using a space-time weighted interpolation method to obtain individual travel space-time sequences with equal time intervals, each sequence containing the individual's spatial XY coordinates at every fixed time node; the original communication records between the mobile communication card and the fixed sensors are then deleted, and the interpolated individual travel space-time sequence alone represents the individual's movement in space and time. For an individual travel space-time trajectory with a single data source, all nodes have the same weight; for one with multiple data sources, the weight of each node is determined by the record density per unit time of the original data source, i.e. the single mobile communication card number, from which it comes:
[Formula image not reproduced in this text.]
in the formula, W denotes the weight of a communication node coming from mobile communication card number i, D is the mobile communication record density, T is a fixed time period, and N is the number of mobile communication records in that time period. Finally, the travel space-time sequence of the individual at equal time intervals Th within the specified time period T is obtained; the sequence contains T/Th + 1 nodes, each of which contains time and XY coordinate information.
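As a minimal sketch of the equal-interval resampling described in step 1.4 for a single data source, the following Python function places a time-sorted track on nodes spaced Th apart, which yields T/Th + 1 nodes. Plain linear interpolation is used here as a stand-in for the space-time weighted interpolation method, whose details are not given in this text; the names `Point`, `resample`, `t_start`, `t_end` and `step` are illustrative assumptions, not part of the patent.

```python
from bisect import bisect_left
from typing import List, Tuple

Point = Tuple[float, float, float]  # (time in seconds, x, y) in a projected XY system


def resample(track: List[Point], t_start: float, t_end: float, step: float) -> List[Point]:
    """Place a time-sorted track on equally spaced nodes t_start, t_start + step, ..., t_end.

    ASSUMPTION: linear interpolation between the two neighbouring records;
    before the first record and after the last one the nearest position is held.
    """
    times = [p[0] for p in track]
    count = int(round((t_end - t_start) / step)) + 1  # T/Th + 1 nodes
    nodes: List[Point] = []
    for k in range(count):
        t = t_start + k * step
        i = bisect_left(times, t)  # times[i-1] < t <= times[i]
        if i == 0:
            _, x, y = track[0]
        elif i == len(track):
            _, x, y = track[-1]
        else:
            t0, x0, y0 = track[i - 1]
            t1, x1, y1 = track[i]
            a = (t - t0) / (t1 - t0)  # t0 < t1 is guaranteed by bisect_left
            x, y = x0 + a * (x1 - x0), y0 + a * (y1 - y0)
        nodes.append((t, x, y))
    return nodes
```

For example, a track with records at 0 s and 240 s resampled with a 120 s step gives three nodes, the middle one lying halfway between the two recorded positions.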
Preferably, the step 2 includes:
step 2.1, constructing a space-time sequence consistency comparison index matrix M from the large number of individual travel space-time sequences obtained in step 1.4. M is a matrix of order 2 × n × m × 3: for the two cases in which the two EPIDs are on the same network or on different networks, the 24 hours of a day are divided into n time periods, each time period has m possible node sampling counts, and each sampling count has 3 levels of consistency index that constrain the similarity of the space-time sequences at different levels. Here 2 indicates that the matrix distinguishes the same-network and different-network cases; n is the number of time periods into which the 24 hours of a day are divided, so that n equals 12 if each time period is 2 hours; m is the number of possible sampling node counts in each time period, so that with a node interval of 2 minutes and a time period length of 2 hours the sampling node count ranges from 2 to 60 and m equals 59; and 3 denotes the three consistency criteria for each sampling node count in each time period, representing 95%, 99% and 99.9% confidence respectively;
step 2.2, traversing the matrix M; for M(i, j), extracting in pairs, from the large number of individual travel space-time sequences, segments that lie in time period i and contain j nodes. The spatial positions of the two space-time sequence segments extracted each time form 4 row vectors X1, Y1, X2 and Y2; with a total of N sample pairs, the consistency between the two travel space-time sequence segments is calculated:
[Three formula images not reproduced in this text: the consistency index, the Pearson product-moment value, and the standard difference.]
in the formulas, the consistency index is that of the 4 vectors formed by the two corresponding travel space-time sequence segments (X1, Y1) and (X2, Y2); the Pearson product-moment value between the travel space-time sequence segments represents the similarity between the two segments; and the standard difference between the X values and between the Y values at corresponding time points of the two segments represents the numerical difference between the vectors;
step 2.3, computing the average value of the consistency index over the N pairs of space-time sequence samples, that is, the random consistency index for node number j in time period i. When two travel sequences completely coincide, the standard differences of any pair of segments taken from them are both 0 and the Pearson product-moment values are both 1, so the consistency index is 0; the average obtained from step 2.2 instead expresses the average degree of consistency between sequences when the two space-time sequences are drawn completely at random. The comparison standards for the 3 levels of the consistency index are expressed as ratios of this random consistency index [three formula images not reproduced in this text]. They represent the thresholds that the consistency must reach for significance at different P values when space-time sequences are compared: if the consistency index of two sequences is less than or equal to the first, second or third threshold, the probability that the two sequences are inconsistent is less than or equal to 5%, 1% and 0.1% respectively, which means that the probability that the two sequences pass the consistency test is greater than 95%, 99% and 99.9% respectively, i.e., the consistency is significant at the 95%, 99% and 99.9% levels respectively [three formula images not reproduced in this text];
step 2.4, for each element of the consistency index matrix M, retrieving from the mass data the samples for the corresponding time period and sampling count and repeating steps 2.2 and 2.3 to calculate the consistency index comparison values at the 3 significance levels [three formula images not reproduced in this text]; the resulting matrix M is the standard against which the consistency of travel space-time sequences is subsequently compared.
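The quantities named in steps 2.2 and 2.3 can be sketched in Python as follows. Only the Pearson product-moment correlation is a standard definition; everything else is an assumption made for illustration, because the formula images are missing from this text. In particular, `std_difference` assumes a root-mean-square difference, `consistency_index` uses a hypothetical combination chosen only so that it is 0 when the two segments coincide, and `thresholds` assumes that the three comparison standards are 5%, 1% and 0.1% of the random baseline.

```python
import math
from typing import Iterable, Sequence, Tuple

Segment4 = Tuple[Sequence[float], Sequence[float], Sequence[float], Sequence[float]]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Standard Pearson product-moment correlation of two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        # ASSUMPTION: correlation is undefined for a constant vector (a stationary
        # individual); treat two constant vectors as fully correlated, else 0.
        return 1.0 if var_a == var_b else 0.0
    return cov / math.sqrt(var_a * var_b)


def std_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """ASSUMPTION: root-mean-square difference at corresponding time nodes."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)) / len(a))


def consistency_index(x1, y1, x2, y2) -> float:
    """HYPOTHETICAL combination: 0 for identical segments, larger when less alike."""
    rx, ry = pearson(x1, x2), pearson(y1, y2)
    sx, sy = std_difference(x1, x2), std_difference(y1, y2)
    return (sx + sy) * (3.0 - rx - ry)


def random_baseline(pairs: Iterable[Segment4]) -> float:
    """Mean consistency index over randomly drawn segment pairs (step 2.3)."""
    values = [consistency_index(*pair) for pair in pairs]
    return sum(values) / len(values)


def thresholds(baseline: float, ratios=(0.05, 0.01, 0.001)) -> Tuple[float, ...]:
    """ASSUMPTION: thresholds are fixed fractions of the random baseline."""
    return tuple(baseline * r for r in ratios)
```

Under these assumptions one entry of M, for a given network case, time period i and node count j, would hold the tuple returned by `thresholds(random_baseline(pairs))` for the pairs sampled in that cell.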
Preferably, the step 3 comprises:
step 3.1, randomly selecting an EPID from the database as the card number to be matched, acquiring its travel space-time sequence C1 in the specified time period and setting the number PID of its holder to P1; randomly selecting a time-continuous sequence segment from C1 and obtaining the time period t in which the segment lies and the number n of nodes in the segment;
step 3.2, traversing the database to obtain a space-time sequence Ci, determining whether its card number and the target card number are on the same network, and obtaining the segment of Ci in time period t; because the travel space-time sequences of all individuals have equal time intervals, this segment also has n nodes;
step 3.3, taking the three consistency comparison thresholds [three formula images not reproduced in this text] as three confidence intervals, setting the sample-count thresholds for judging that the two travel space-time sequences C1 and Ci are the travel trajectory of the same person to N1, N2 and N3 respectively, and denoting the numbers of samples whose consistency comparison result falls in the three confidence intervals as S1, S2 and S3 respectively;
step 3.3, splitting the spatial XY coordinates of the two sequence segments taken from C1 and Ci into 4 vectors, calculating the consistency between the two segments as in step 2.2, and checking it against the consistency comparison indexes in the matrix M:
step 3.3.1, if the calculated consistency index is greater than the 95% consistency comparison index for that time period, jumping to step 3.4;
step 3.3.2, if the consistency index is less than or equal to the 95% comparison index and greater than the 99% comparison index, jumping to step 3.5;
step 3.3.3, if the consistency index is less than or equal to the 99% comparison index and greater than the 99.9% comparison index, jumping to step 3.6;
step 3.3.4, if the consistency index is less than or equal to the 99.9% comparison index, jumping to step 3.7.
step 3.4, abandoning this travel space-time sequence and traversing to the next sequence;
step 3.5, adding 1 to the sample count S1 of the first confidence interval; if S1 is greater than or equal to N1, judging that the two travel space-time sequences represent the travel trajectory of the same person, otherwise jumping to step 3.8;
step 3.6, adding 1 to the sample count S2 of the second confidence interval; if S2 is greater than or equal to N2, judging that the two travel space-time sequences represent the travel trajectory of the same person, otherwise jumping to step 3.8;
step 3.7, adding 1 to the sample count S3 of the third confidence interval; if S3 is greater than or equal to N3, judging that the two travel space-time sequences represent the travel trajectory of the same person, otherwise jumping to step 3.8;
step 3.8, continuing to randomly extract segments from the two travel space-time sequences, that is, randomly selecting a time-continuous sequence segment from C1 and obtaining the segment of Ci in the same time period, the two segments having the same number of nodes;
step 3.9, for two travel space-time sequences C1 and Ci judged to be held by the same person, associating the mobile communication card number EPID of Ci with the holder number P1 of C1, which indicates that the space-time movement trajectory recorded by that card number belongs to the individual whose PID is P1, and storing the correspondence between the EPID and the PID in the data table TR (Table of Relation).
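The decision loop of steps 3.3 to 3.8 can be sketched in Python as below. This is an illustrative reading, not the patent's implementation: it assumes that the three thresholds correspond to the 95%, 99% and 99.9% levels in decreasing order, that each sampled segment pair arrives together with the thresholds looked up from M for its time period and node count, and that a finite list of samples is supplied. The names `band`, `same_holder` and `required` are hypothetical.

```python
from typing import Iterable, Optional, Sequence, Tuple

# One sampled comparison: (consistency index, 95% threshold, 99% threshold, 99.9% threshold)
Sample = Tuple[float, float, float, float]


def band(index: float, t95: float, t99: float, t999: float) -> Optional[int]:
    """Classify one comparison result (steps 3.3.1 to 3.3.4).

    Returns 0, 1 or 2 for the first, second or third confidence interval,
    or None when the index exceeds the loosest threshold.
    """
    if index > t95:
        return None
    if index > t99:
        return 0
    if index > t999:
        return 1
    return 2


def same_holder(samples: Iterable[Sample], required: Sequence[int]) -> bool:
    """Decide whether two sequences belong to one person (steps 3.4 to 3.8).

    required holds the sample-count thresholds (N1, N2, N3); counts holds S1, S2, S3.
    """
    counts = [0, 0, 0]
    for index, t95, t99, t999 in samples:
        b = band(index, t95, t99, t999)
        if b is None:
            return False  # step 3.4: abandon this candidate sequence
        counts[b] += 1    # steps 3.5 to 3.7
        if counts[b] >= required[b]:
            return True
        # otherwise step 3.8: continue with the next randomly drawn segment pair
    return False  # ASSUMPTION: undecided after all supplied samples counts as no match
```

A stricter interval would typically be given a smaller sample-count requirement, for example `required=(20, 10, 5)`; these values are placeholders, since the text does not state N1, N2 or N3.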
Preferably, the step 4 comprises:
step 4.1, traversing the EPIDs of all mobile communication card numbers in the database, repeating step 3 to compare each with the travel space-time sequence C1 and calculate the consistency between the space-time sequences, identifying all card numbers that can be judged to be held by individual P1, and storing the relationships between these card numbers and the individual number in the EPID-PID relation data table TR;
step 4.2, taking each EPID in the database in turn as the matching object, searching, traversing and comparing the whole database, mining the matching relationships among all mobile communication card numbers, and storing them in the data table TR;
step 4.3, traversing the data table TR and looking for cases in which the same travel space-time sequence is held by several persons, that is, the travel space-time sequence Cj of one EPID belongs to several different PIDs, which means that Cj meets the consistency requirement when compared with several travel space-time sequences (such as Ck, Cl, Cm) belonging to different PIDs; then:
step 4.3.1, repeating steps 3.3 to 3.9 with Cj as the object to be matched, traversing the travel space-time sequences (Ck, Cl, Cm, etc., including those under different PIDs) recorded as belonging to the same PID as Cj, sampling them again and recalculating the consistency, and rechecking the consistency of the space-time sequences against the consistency comparison matrix M;
step 4.3.2, if, after the comparison and check have been repeated, any of the travel space-time sequences Ck, Cl, Cm, etc. no longer meets the consistency requirement when compared with Cj, deleting the membership relationship between the PID of its holder and Cj from the data table TR;
step 4.3.3, repeating steps 4.3.1 and 4.3.2 until only one travel space-time sequence remains that is consistent with Cj, or until the iteration count NC is reached; if the iteration count NC is reached and more than one travel space-time sequence belonging to different PIDs (such as Cp, Cq, Cr) still passes the consistency check with Cj, calculating the cumulative sum of the consistency indexes between Cj and each of Cp, Cq and Cr [formula image not reproduced in this text], in which each term is the consistency index value of the n-th sampling comparison between the space-time sequences, for example between Cj and Cp; selecting the sequence with the smallest sum, keeping the relationship by which it and Cj belong to the same PID, and deleting the membership relationships between Cj and the other PIDs;
step 4.4, repeating step 4.3 until no EPID in the data table TR belongs to more than one PID.
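The final tie-break of step 4.3.3 can be illustrated with a short Python sketch. It covers only the last stage, in which an EPID is still attached to several PIDs after the iteration limit; the iterative rechecking of steps 4.3.1 and 4.3.2 is omitted. The table layout (a dictionary from EPID to a set of candidate PIDs) and the names `resolve_conflicts` and `scores` are assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple


def resolve_conflicts(
    tr: Dict[str, Set[str]],
    scores: Dict[Tuple[str, str], List[float]],
) -> Dict[str, str]:
    """Reduce an EPID -> {PID, ...} table to exactly one PID per EPID.

    scores[(epid, pid)] lists the consistency index values of the sampled
    comparisons between that EPID's sequence and the sequences under pid.
    A smaller cumulative sum means greater consistency, so the PID with
    the smallest sum is kept and the other memberships are dropped.
    """
    resolved: Dict[str, str] = {}
    for epid, pids in tr.items():
        if len(pids) == 1:
            resolved[epid] = next(iter(pids))
        else:
            # sorted() only makes ties deterministic; it is not part of the method
            resolved[epid] = min(sorted(pids), key=lambda pid: sum(scores[(epid, pid)]))
    return resolved
```

After this pass every EPID maps to a single PID, which is the condition required by step 4.4.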
Preferably, the step 5 comprises:
step 5.1, traversing the data table TR, finding each PID that has multiple mobile communication card numbers, and recording the EPIDs of all mobile communication cards belonging to that PID;
step 5.2, for the EPIDs obtained, querying the database for the communication records between each EPID and the fixed sensors and sorting the records by time to form multiple sets of travel space-time trajectory data;
step 5.3, determining the weights of the multiple sets of travel space-time trajectory data from the different record densities of the different mobile communication card numbers, interleaving them in time order to construct a new space-time trajectory, and applying the space-time weighted interpolation method of step 1.4 to obtain an individual travel space-time sequence with equal time intervals based on the communication records between multiple card numbers and the fixed sensors; because this travel space-time sequence is based on the communication records of multiple card numbers, it has a higher node density, which allows the space-time interpolation algorithm to calculate the individual's position at each fixed time node more accurately;
step 5.4, recalculating the travel space-time sequences of all PIDs that have multiple card numbers and storing the results in the database, providing a data basis for other subsequent analysis based on mobile communication big data.
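The merging described in steps 5.2 and 5.3 can be sketched in Python as follows. The weight formula is an assumption, since the formula image of step 1.4 is missing: the sketch takes the record density of card i as D_i = N_i / T and normalizes the densities so that the weights sum to 1. The function names and the `(time, x, y)` record layout are likewise illustrative.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]  # (time, x, y)


def density_weights(tracks: Dict[str, List[Point]], period: float) -> Dict[str, float]:
    """ASSUMPTION: D_i = N_i / T and W_i = D_i / sum of all D."""
    densities = {epid: len(points) / period for epid, points in tracks.items()}
    total = sum(densities.values())
    return {epid: d / total for epid, d in densities.items()}


def merge_tracks(tracks: Dict[str, List[Point]]) -> List[Tuple[float, float, float, str]]:
    """Interleave the records of all cards of one PID in time order (step 5.3)."""
    merged = [(t, x, y, epid) for epid, points in tracks.items() for (t, x, y) in points]
    merged.sort(key=lambda record: record[0])
    return merged


def weighted_position(
    positions: Dict[str, Tuple[float, float]], weights: Dict[str, float]
) -> Tuple[float, float]:
    """Weighted mean of the positions that several cards report for one time node."""
    total = sum(weights[epid] for epid in positions)
    x = sum(weights[epid] * p[0] for epid, p in positions.items()) / total
    y = sum(weights[epid] * p[1] for epid, p in positions.items()) / total
    return x, y
```

With three records from card `a` and one from card `b` in the same period, `a` receives weight 0.75 and `b` 0.25, so a node where both cards report a position is pulled towards the position reported by the more densely recorded card.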
Based on the communication records between handheld mobile devices and fixed sensors, the invention extracts the travel space-time trajectory of each mobile communication card number in a specified time period and interpolates it, by a space-time weighted interpolation method, into an individual travel space-time sequence with equal time intervals. Through random sampling of a large sample, random consistency indexes between two travel space-time sequences under different conditions are constructed, forming a comparison matrix of space-time sequence consistency that serves as the check standard. By traversing and checking the degree of consistency between the travel space-time sequences of two mobile communication card numbers, it is judged whether the card numbers belong to the same individual. Once one-person multi-card holdings have been identified, the multiple travel space-time trajectories under the multiple card numbers are merged and interpolated to obtain a more accurate individual travel space-time sequence, which is stored in the database to provide a basis for other data analysis.
The advantages of the invention are as follows: it makes full use of the existing mass of communication records between handheld mobile communication devices and fixed sensors and designs a comparison algorithm and a consistency check standard, so that the target data can be organized automatically, conveniently and at low cost, multiple card numbers belonging to one person can be identified, and the actual individuals in space can be effectively extracted from a large number of communication records; by fusing the space-time trajectories of multiple card numbers, the travel space-time sequences of individuals can be obtained more accurately, providing a more reliable data basis for other spatial big data analysis.
Drawings
FIG. 1 is a diagram of a one-person multi-card recognition method in a big data environment according to the present invention.
Detailed Description
In order to make the invention more comprehensible, preferred embodiments are described in detail below with reference to the accompanying drawings:
Step 1, reading anonymously encrypted mobile terminal sensor data obtained from a sensor operator, wherein the data are continuous in time and space and different mobile terminal card numbers correspond to different unique numbers (EPIDs); extracting the communication signaling records triggered by each card number within a specified time period to form the travel space-time communication record data set recorded by the individual through that card number; arranging the data set in time order; and expanding the recorded sample points at equal time intervals by a space-time interpolation method, to obtain a travel space-time trajectory identified by each mobile communication card number.
The anonymously encrypted mobile terminal sensor data are time-series position information of anonymous mobile phone users, obtained by an operator in real time from the mobile communication network, the fixed broadband network, wireless WIFI, location-service APPs and the like, and then desensitized and encrypted. The fields are EPID, TYPE, TIME, REGIONCODE and SENSORID, as follows:
The EPID is an anonymous, one-way encrypted, globally unique mobile terminal identification code. Each mobile terminal user is encrypted one-way and irreversibly, so that each user is uniquely identified without exposing the private user number. The encrypted EPID must remain unique: the EPID of each mobile phone user stays unchanged and never coincides with that of any other user.
TYPE is the type of communication action involved in the current record, such as internet access, calls (calling and called), short message sending and receiving, GPS positioning, sensor cell switching, sensor switching, power on and power off.
TIME is the time at which the communication action of the current record occurred, expressed in milliseconds.
REGIONCODE and SENSORID are the encrypted position information of the sensor at which the communication action of the current record occurred: REGIONCODE represents the area where the sensor is located, and SENSORID is the number of the specific sensor.
Step 1.1, the system reads the anonymously encrypted mobile terminal sensor data obtained from a sensor operator, which are continuous in time and space and comprise: the unique number EPID of the mobile communication card used in communication between the individual and the fixed sensor, the communication action type TYPE, the communication action occurrence time TIME, the region REGIONCODE of the fixed sensor where the communication action occurred, and the specific fixed sensor number SENSORID;
step 1.2, each piece of anonymously encrypted mobile terminal sensor data is one signaling record; each signaling record is decrypted, its EPID, TYPE, TIME, REGIONCODE and SENSORID fields are read, the longitude and latitude of the record are looked up from its fixed sensor number, and the coordinates are converted into a geographic XY coordinate system;
step 1.3, querying all communication records of a mobile communication card within the specified time period by its unique number EPID, and constructing a preliminary individual travel space-time trajectory from the communication records between the single mobile communication card number and the fixed sensors;
in this example, the individual travel space-time trajectory extracted after decryption is shown in table 1.
TABLE 1 Individual travel space-time trajectories
EPID REGIONCODE SENSORID TIME X Y
…… …… …… …… ……
2045 2421 2134 2019-07-02 12:43:56 3821.451 4248.431
2045 2421 4543 2019-07-02 13:31:43 4734.123 2343.065
2045 2421 7864 2019-07-02 13:42:31 5238.195 6548.231
2045 2421 8562 2019-07-02 13:12:19 3436.568 6536.323
2045 2421 4563 2019-07-02 14:52:38 6944.031 6703.564
2045 2421 4322 2019-07-02 14:23:46 2390.699 6550.913
2045 2421 5643 2019-07-02 14:36:29 8438.617 7539.314
2045 2421 7652 2019-07-02 15:21:23 8769.642 7404.457
2045 2421 9645 2019-07-02 15:54:25 9134.123 3250.443
2045 2421 1424 2019-07-02 15:21:21 3269.245 4439.341
2045 2421 6423 2019-07-02 15:43:43 5419.432 4390.543
2045 2421 3563 2019-07-02 15:33:45 8653.534 2563.321
…… …… …… …… ……
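Steps 1.2 and 1.3 can be sketched as follows. This is a minimal illustration, not the implementation of the invention: it assumes the records are already decrypted, and the lookup table `SENSOR_XY` and the function `build_trajectories` are hypothetical names introduced only for this example. The coordinates are taken from Table 1.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical lookup: (REGIONCODE, SENSORID) -> planar (X, Y) of the fixed sensor.
SENSOR_XY = {
    ("2421", "2134"): (3821.451, 4248.431),
    ("2421", "4543"): (4734.123, 2343.065),
    ("2421", "8562"): (3436.568, 6536.323),
}


def build_trajectories(records, sensor_xy):
    """Group signaling records by EPID and sort each group by time.

    records: iterable of (epid, regioncode, sensorid, time_string) tuples.
    Returns {epid: [(datetime, x, y), ...]}, each list in time order.
    """
    trajectories = defaultdict(list)
    for epid, region, sensor, time_str in records:
        xy = sensor_xy.get((region, sensor))
        if xy is None:
            continue  # assumption: records from unknown sensors are skipped
        t = datetime.strptime(time_str, "%Y-%m-%d %H:%M:%S")
        trajectories[epid].append((t, xy[0], xy[1]))
    for points in trajectories.values():
        points.sort(key=lambda p: p[0])
    return dict(trajectories)


records = [
    ("2045", "2421", "4543", "2019-07-02 13:31:43"),
    ("2045", "2421", "2134", "2019-07-02 12:43:56"),
    ("2045", "2421", "8562", "2019-07-02 13:12:19"),
]
trajectories = build_trajectories(records, SENSOR_XY)
```

Sorting inside the function means that the order in which records arrive from the operator does not matter.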
Step 1.4, performing spatial interpolation at equal time intervals on each individual travel space-time trajectory using a space-time weighted interpolation method, to obtain an individual travel space-time sequence with equal time intervals that contains the XY coordinates of the individual at each fixed time node; the original communication records between the mobile communication card and the fixed sensors are then deleted, so that the movement of the individual in space and time is represented entirely by the interpolated sequence. The result is the travel space-time sequence of the individual at equal time interval Th within the specified time period T; the sequence contains T/Th + 1 nodes, and each node contains time and XY coordinate information.
In this example, the interpolated individual travel spatiotemporal sequences are shown in table 2.
TABLE 2 interpolated individual travel spatio-temporal sequences
EPID TIME X Y
…… …… ……
2045 2019-07-02 08:00:00 4232.453 6582.123
2045 2019-07-02 08:00:30 4236.542 6590.654
2045 2019-07-02 08:01:00 4230.123 6599.452
2045 2019-07-02 08:01:30 4224.453 6592.764
2045 2019-07-02 08:02:00 4218.764 6583.665
2045 2019-07-02 08:02:30 4218.699 6572.913
2045 2019-07-02 08:03:00 4210.642 6570.643
2045 2019-07-02 08:03:30 4206.754 6567.124
2045 2019-07-02 08:04:00 4193.386 6565.164
2045 2019-07-02 08:04:30 4194.824 6574.325
2045 2019-07-02 08:05:00 4206.623 6572.653
2045 2019-07-02 08:05:30 4207.114 6588.332
…… …… ……
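The resampling of step 1.4 can be illustrated with the sketch below. The text does not define the space-time weighted interpolation method in detail, so plain linear interpolation in time is substituted here as an assumption; the function name `interpolate_equal_intervals` and the use of seconds as the time unit are likewise hypothetical. The sketch does reproduce the stated node count of T/Th + 1.

```python
from bisect import bisect_right


def interpolate_equal_intervals(points, t_start, t_end, step):
    """Resample a time-sorted trajectory at fixed time intervals.

    points: non-empty list of (t, x, y), t in seconds, sorted by t.
    Returns (t_end - t_start) // step + 1 nodes (t, x, y).
    Assumption: linear interpolation between neighbouring records; before the
    first and after the last record the nearest observed position is held.
    """
    times = [p[0] for p in points]
    nodes = []
    t = t_start
    while t <= t_end:
        i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
        if i == 0:
            _, x, y = points[0]
        elif i == len(points):
            _, x, y = points[-1]
        else:
            t0, x0, y0 = points[i - 1]
            t1, x1, y1 = points[i]
            w = (t - t0) / (t1 - t0)  # t1 > t0 is guaranteed by bisect_right
            x = x0 + w * (x1 - x0)
            y = y0 + w * (y1 - y0)
        nodes.append((t, x, y))
        t += step
    return nodes
```

With a 30-second step, as in Table 2, a two-hour period yields 7200 / 30 + 1 = 241 nodes.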
Step 2, to eliminate the influence of time of day on the judgment of travel similarity, dividing the 24 hours of each day into N time periods; for each time period, intercepting a large number of individual travel space-time tracks with equal time intervals from the database; splitting each pair of tracks into 4 vectors by their spatial X and Y coordinates; calculating the correlation coefficient and the standard difference between each pair of vectors by the Pearson product-moment method; constructing from them an index for comparing vector consistency; and calculating the consistency index of randomly selected travel tracks at fixed time intervals within a fixed time period, together with the differences in Pearson product-moment values between time periods, to obtain a consistency check standard that serves as the basis for judging whether two tracks are consistent, that is, whether they belong to the same individual;
step 2.1, constructing a space-time sequence consistency comparison index matrix M from the large number of individual travel space-time sequences obtained in step 1.4, wherein M is a 2 × n × m × 3 matrix: the first dimension distinguishes whether the two EPIDs are on the same network or on different networks, the 24 hours of a day are divided into n time periods, each time period has m possible node sample counts, and each sample count has consistency indexes at 3 levels, which constrain the similarity of space-time sequences at different levels;
step 2.2, traversing the matrix M and, for each element M(i, j), extracting in pairs from the large number of individual travel space-time sequences the segments that lie in time period i and contain j nodes; the spatial positions of the two segments extracted each time form 4 row vectors X1, Y1, X2 and Y2, and N pairs are sampled in total; the consistency between the two segments of each pair is calculated by the following formulas:
[formula images]
In the formulas, r is the consistency index of the 4 vectors formed by the two corresponding travel space-time sequence segments (X1, Y1) and (X2, Y2); the Pearson product-moment values between the travel space-time sequence segments [symbol image] represent the similarity between the two segments; and the standard differences between the X values and between the Y values at corresponding time points of the two segments [symbol image] represent the numerical difference between the two vectors;
in this example, the XY coordinates of the two travel spatio-temporal sequences are shown in table 3:
TABLE 3 XY coordinates of two spatio-temporal sequences
TIME X1 Y1 X2 Y2
…… …… …… …… ……
2019-07-02 08:00:00 4232.453 6582.123 5462.424 2234.542
2019-07-02 08:00:30 4236.542 6590.654 5458.335 2200.418
2019-07-02 08:01:00 4230.123 6599.452 5464.754 2244.408
2019-07-02 08:01:30 4224.453 6592.764 5476.094 2277.848
2019-07-02 08:02:00 4218.764 6583.665 5498.845 2323.343
2019-07-02 08:02:30 4218.699 6572.913 5498.592 2334.095
2019-07-02 08:03:00 4210.642 6570.643 5514.704 2340.905
2019-07-02 08:03:30 4206.754 6567.124 5514.704 2326.829
2019-07-02 08:04:00 4193.386 6565.164 5461.232 2318.989
2019-07-02 08:04:30 4194.824 6574.325 5468.422 2318.989
2019-07-02 08:05:00 4206.623 6572.653 5421.226 2320.661
2019-07-02 08:05:30 4207.114 6588.332 5422.208 2320.661
…… …… …… …… ……
The calculated r value between the two sequences was 13061;
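The two ingredients named in step 2.2 can be computed as in the sketch below. The formula that combines them into r is contained in the formula images and is not reproduced here, so the sketch deliberately stops at the components. Reading "standard difference" as the root-mean-square pointwise difference is an assumption, and the function names are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson product-moment correlation of two equal-length vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_a * var_b)


def rms_difference(a, b):
    """Root-mean-square pointwise difference (assumed meaning of 'standard difference')."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)) / len(a))


def segment_components(x1, y1, x2, y2):
    """Return the per-axis similarity and difference terms of two segments."""
    return {
        "rho_x": pearson(x1, x2),
        "rho_y": pearson(y1, y2),
        "d_x": rms_difference(x1, x2),
        "d_y": rms_difference(y1, y2),
    }
```

For two identical segments the correlation terms are 1 and the difference terms are 0, which matches the limiting case described in step 2.3.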
step 2.3, computing the mean of r over the N sampled pairs of space-time sequence segments [symbol image], which is the random consistency index for time period i and node count j. When two travel sequences coincide completely, the two standard-difference terms of any pair of segments taken from them [symbol images] are both 0 and the two Pearson product-moment terms [symbol images] are both 1, so r is 0; the mean obtained from step 2.2 [symbol image], by contrast, expresses the average degree of consistency between two space-time sequences drawn completely at random. The comparison standards for the 3 levels of the consistency index, expressed as consistency index ratios, are respectively [formula images]; they represent the thresholds that the consistency must reach to be significant at different P values when space-time sequences are compared. That is, when the consistency index r of two sequences is less than or equal to the first, second and third comparison standard respectively [formula images], the probability that the two sequences are inconsistent is less than or equal to 5%, 1% and 0.1% respectively, which means that the probability that the two sequences pass the consistency test is greater than 95%, 99% and 99.9% respectively; in other words, the consistency is significant at the 95%, 99% and 99.9% levels respectively [symbol images];
step 2.4, for each element of the consistency index matrix M, retrieving samples of the corresponding time period and sample count from the mass data and repeating steps 2.2 and 2.3 to calculate the consistency comparison values at the 3 significance levels; the resulting matrix M is the standard for subsequent consistency comparison of travel space-time sequences;
in this example, the M matrix for the same-network case between 10:00 and 12:00 is shown in table 4:
TABLE 4 M matrix between 10:00 and 12:00 for the same-network case
Number of nodes 95% 99% 99.9%
1 7.4 1.5 0.5
…… …… …… ……
100 461.7 341.2 253.3
101 482.9 343.8 254.9
102 502.2 352.3 264.5
103 515.6 355.7 277.9
104 532.1 362.9 289.5
105 546.5 374.8 303.1
…… …… …… ……
241 2542.6 2057.7 1573.2
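The 2 × n × m × 3 layout of M described in step 2.1 can be held in a simple keyed table, as sketched below. The dictionary representation, the function names and the choice of twelve two-hour periods are assumptions made for illustration; the three stored values are the row for 100 nodes in Table 4.

```python
SAME_NETWORK = 0
DIFFERENT_NETWORK = 1


def set_thresholds(matrix, network, period, node_count, t95, t99, t999):
    """Store the three comparison values for one (network, period, node count) cell."""
    matrix[(network, period, node_count)] = (t95, t99, t999)


def get_thresholds(matrix, network, period, node_count):
    """Return (95%, 99%, 99.9%) comparison values, or None if the cell is not filled."""
    return matrix.get((network, period, node_count))


def period_index(hour, hours_per_period=2):
    """Map an hour of day (0-23) to a period index; 2-hour periods are an assumption."""
    return hour // hours_per_period


M = {}
# Row "100" of Table 4: same network, 10:00 to 12:00.
set_thresholds(M, SAME_NETWORK, period_index(10), 100, 461.7, 341.2, 253.3)
```

Under the two-hour assumption and the 30-second interval of Table 2, the largest segment in a period has 241 nodes, which agrees with the last row of Table 4.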
Step 3, selecting one mobile communication card number from the database as the object to be matched and obtaining its travel space-time sequence; traversing the mobile communication record database and selecting the travel space-time sequences of other card numbers to match against it; randomly selecting sequence segments of random duration from the space-time sequences, calculating the correlation of the two sequences in spatial position, checking its consistency, and judging whether the two space-time sequences are held by the same person;
step 3.1, randomly selecting an EPID from the database as the card number to be matched, obtaining its travel space-time sequence C1 within the specified time period, setting the holder number PID to P1, randomly selecting a time-continuous sequence segment from C1, and obtaining the time period t of the segment and the number of nodes n in the segment;
in this example, the EPID of the card to be matched is 2454;
step 3.2, traversing the database; for each traversed card number, determining whether it is on the same network as the target card number, and obtaining its sequence segment Ci in time period t; because the travel space-time sequences of all individuals have equal time intervals, this segment also has n nodes;
in this example, the traversed card number is 2142, which is on a different network from the card to be matched; the space-time information of C1 and C2 is shown in table 5:
TABLE 5 spatiotemporal information of C1 and C2 for the case of heterogeneous networks
TIME X1 Y1 X2 Y2
…… …… …… …… ……
2019-07-02 08:00:00 4232.453 6582.123 4227.453 6588.123
2019-07-02 08:00:30 4236.542 6590.654 4234.542 6595.654
2019-07-02 08:01:00 4230.123 6599.452 4227.123 6607.452
2019-07-02 08:01:30 4224.453 6592.764 4228.453 6590.764
2019-07-02 08:02:00 4218.764 6583.665 4227.764 6575.665
2019-07-02 08:02:30 4218.699 6572.913 4208.699 6574.913
2019-07-02 08:03:00 4210.642 6570.643 4218.642 6580.643
2019-07-02 08:03:30 4206.754 6567.124 4215.754 6562.124
2019-07-02 08:04:00 4193.386 6565.164 4201.386 6573.164
2019-07-02 08:04:30 4194.824 6574.325 4200.824 6564.325
2019-07-02 08:05:00 4206.623 6572.653 4215.623 6570.653
2019-07-02 08:05:30 4207.114 6588.332 4201.114 6594.332
…… …… …… …… ……
Step 3.3, for the three significance levels (95%, 99% and 99.9%), setting the numbers of samples required to judge that the two travel space-time sequences C1 and Ci are the travel track of the same person to N1, N2 and N3 respectively, and letting the numbers of samples whose consistency comparison results fall in the three confidence intervals be S1, S2 and S3 respectively;
step 3.3, dividing the spatial XY coordinates of the two sequence segments intercepted from C1 and Ci into 4 vectors, calculating the consistency index between the two segments as in step 2.2, and checking it against the consistency comparison indexes in the matrix M for that time period:
step 3.3.1, if the calculated consistency index is greater than the 95% comparison value, jumping to step 3.4;
step 3.3.2, if the consistency index is less than or equal to the 95% comparison value and greater than the 99% comparison value, jumping to step 3.5;
step 3.3.3, if the consistency index is less than or equal to the 99% comparison value and greater than the 99.9% comparison value, jumping to step 3.6;
step 3.3.4, if the consistency index is less than or equal to the 99.9% comparison value, jumping to step 3.7;
step 3.4, abandoning this travel space-time sequence and traversing to the next sequence;
step 3.5, adding 1 to the sample count S1 of the 95% interval; if S1 is greater than or equal to N1, judging that the two travel space-time sequences represent the travel track of the same person; otherwise, jumping to step 3.8;
step 3.6, adding 1 to the sample count S2 of the 99% interval; if S2 is greater than or equal to N2, judging that the two travel space-time sequences represent the travel track of the same person; otherwise, jumping to step 3.8;
step 3.7, adding 1 to the sample count S3 of the 99.9% interval; if S3 is greater than or equal to N3, judging that the two travel space-time sequences represent the travel track of the same person; otherwise, jumping to step 3.8;
step 3.8, continuing to extract segments at random from the two travel space-time sequences: a time-continuous sequence segment is randomly selected from C1 and the segment of the same time period is taken from Ci, so that the two segments have the same number of nodes, and the consistency check of step 3.3 is repeated;
step 3.9, for two travel space-time sequences C1 and Ci judged to be held by the same person, associating the mobile communication card number EPID of Ci with the holder number P1 of C1, which indicates that the space-time trajectory recorded by this card number belongs to the individual whose PID is P1, and storing the correspondence between EPID and PID in the relationship table TR (Table of Relationship);
in this example, the consistency index between the C1 segment and the C2 segment is 19.54, which is below the comparison value required for 99.9% significance for different networks, the 8:00 to 10:00 time period and a segment of 12 nodes, so the two segments are considered consistent; with N1 = 100, N2 = 50 and N3 = 30, 30 sampling tests were performed, and every pair of segments extracted from the travel space-time sequences of EPIDs 2454 and 2142 met the 99.9% consistency test, so the card numbers with EPIDs 2454 and 2142 are judged to belong to the same spatial individual;
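The decision procedure of steps 3.3 to 3.8 can be sketched as a counting loop. This is an illustrative reading of the steps rather than the implementation of the invention: the sampled consistency indexes and their comparison values are passed in as ready-made lists, whereas the method draws segments at random and looks the values up in M. The function names, and the threshold values used when trying the sketch, are hypothetical.

```python
def classify(r, t95, t99, t999):
    """Return 0 if r fails the 95% level, else 1, 2 or 3 for the 95%, 99%, 99.9% intervals."""
    if r > t95:
        return 0
    if r > t99:
        return 1
    if r > t999:
        return 2
    return 3


def same_holder(r_samples, thresholds_per_sample, n_required=(100, 50, 30)):
    """Decide whether two sequences belong to one person.

    r_samples: consistency index of each sampled segment pair, in sampling order.
    thresholds_per_sample: (t95, t99, t999) from M for each sample's period and node count.
    n_required: (N1, N2, N3), the sample counts needed in each interval.
    """
    counts = [0, 0, 0]  # S1, S2, S3
    for r, (t95, t99, t999) in zip(r_samples, thresholds_per_sample):
        level = classify(r, t95, t99, t999)
        if level == 0:
            return False  # step 3.4: abandon this candidate sequence
        counts[level - 1] += 1
        if counts[level - 1] >= n_required[level - 1]:
            return True  # steps 3.5 to 3.7: enough samples in one interval
    return False  # sampling ended before any count reached its requirement
```

With N3 = 30, thirty consecutive samples that all pass the 99.9% level are enough to accept the pair, as in the example of EPIDs 2454 and 2142.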
step 4, traversing the whole database to obtain all other card numbers judged to be held by the same person as the initially selected mobile communication card number, marking them as held by the same person and assigning them the individual number PID; then matching each card number against the travel space-time sequences of all other individuals by traversing the database, and judging whether they are held by the same person; if one card number appears to be held by several persons at once, that is, it passes the similarity check with two card numbers that do not match each other, it is assigned to the mobile communication card with which its similarity is greater;
step 4.1, traversing the EPIDs of all mobile communication card numbers in the database, repeating step 3 to compare each with the travel space-time sequence C1 and calculate the consistency between the sequences, identifying all card numbers that can be judged to be held by individual P1, and storing the relationship between these card numbers and the individual number in the EPID-PID relationship table TR;
step 4.2, taking each EPID in the database in turn as the matching object, searching, traversing and comparing the whole database, mining the matching relations among all mobile communication card numbers, and storing them in the data table TR;
in this example, EPID-PID relationship data Table TR is shown in Table 6:
TABLE 6 EPID-PID relationship data Table TR
PID EPID
…… ……
0323 2454
0323 2142
0323 4143
0324 3422
0325 1454
0326 4212
0327 4526
0328 3422
0329 7633
0329 1434
0330 4132
0331 2142
0331 5314
…… ……
Step 4.3, traversing the data table TR and searching for cases in which the same travel space-time sequence is held by several persons, that is, the travel space-time sequence Cj of one EPID belongs to several different PIDs, which means that Cj met the consistency requirement when compared with several travel space-time sequences (such as Ck, Cl, Cm) belonging to different PIDs; then:
step 4.3.1, repeating steps 3.3 to 3.9 with Cj as the object to be matched: traversing the travel space-time sequences (Ck, Cl, Cm, etc.) that share a PID with Cj, including those under different PIDs, sampling them again and calculating the consistency, and rechecking the consistency based on the consistency comparison matrix M;
step 4.3.2, after the recheck, if any of the travel space-time sequences Ck, Cl, Cm, etc. no longer meets the consistency requirement with Cj, deleting the membership relation between the PID of its holder and Cj from the data table TR;
step 4.3.3, repeating steps 4.3.1 and 4.3.2 until only one travel space-time sequence remains consistent with Cj, or until the iteration count NC is reached; if NC is reached and more than one travel space-time sequence belonging to different PIDs (such as Cp, Cq, Cr) still passes the consistency check with Cj, calculating the cumulative sum of the consistency index between Cj and each of Cp, Cq and Cr [formula image] (where [symbol image] denotes the consistency index value of the n-th sampling comparison between the space-time sequences Cj and Cp), selecting the sequence with the smallest sum, keeping the relation that it and Cj belong to the same PID, and deleting the membership relations between Cj and the other PIDs;
in this example, the card number with EPID 2142 is recorded as held by both PID 0323 and PID 0331; after resampling and matching, 2142 is finally determined to belong to the individual with PID 0323, and the record with PID 0331 and EPID 2142 is deleted from table TR;
step 4.4, repeating the step 4.3 until the situation that one EPID belongs to a plurality of PIDs does not exist in the data table TR;
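The final tie-break of step 4.3.3 can be sketched as follows. The sketch covers only the fallback used once the iteration count NC has been reached: each EPID claimed by several PIDs is kept under the PID with the smallest cumulative consistency index. The function name and the cumulative sums in the usage example are hypothetical; the PID and EPID values are taken from Table 6.

```python
def resolve_conflicts(tr_rows, cumulative_index):
    """Keep one PID per EPID in the relationship table TR.

    tr_rows: list of (pid, epid) rows.
    cumulative_index: {(pid, epid): cumulative sum of the consistency index over
        the resampled comparisons}; needed only for EPIDs claimed by several PIDs.
    Returns the rows that remain, in their original order.
    """
    owners = {}
    for pid, epid in tr_rows:
        owners.setdefault(epid, []).append(pid)
    kept = []
    for pid, epid in tr_rows:
        candidates = owners[epid]
        if len(candidates) == 1:
            kept.append((pid, epid))
            continue
        # A smaller cumulative index means the sequences are more consistent.
        best = min(candidates, key=lambda p: cumulative_index[(p, epid)])
        if pid == best:
            kept.append((pid, epid))
    return kept


rows = [("0323", "2454"), ("0323", "2142"), ("0331", "2142"), ("0331", "5314")]
sums = {("0323", "2142"): 586.2, ("0331", "2142"): 1490.7}  # hypothetical values
resolved = resolve_conflicts(rows, sums)
```

With these illustrative sums, the row linking PID 0331 to EPID 2142 is dropped, which matches the outcome described in the example above.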
step 5, after the database has been traversed and travel space-time sequence matching has been carried out on all mobile communication card numbers, for each one-person multi-card group: querying the communication records between the group's card numbers and the fixed sensors, replacing the card number EPID with the individual identifier PID, interleaving the several travel space-time trajectories in time order into one mobile communication record set, storing the new record set in the database as the communication records between the handheld mobile communication cards and the fixed sensors, and jumping to step 1.4 to perform space-time weighted interpolation on the individual's new mobile communication record set, to obtain a more precise individual travel space-time sequence;
step 5.1, traversing the TR data table, querying the PIDs that hold a plurality of mobile communication card numbers, and recording the EPIDs of all mobile communication cards belonging to each such PID;
in this example, the individual with PID 0323 holds multiple card numbers;
step 5.2, for the obtained EPIDs, querying the communication records between each EPID and the fixed sensors in the database, and sorting the records by time to form several pieces of travel space-time trajectory data;
in this example, the 3 travel spatio-temporal trajectories of an individual with PID 0323 are shown in table 7:
TABLE 7 Individual multiple spatiotemporal trajectories
EPID REGIONCODE SENSORID TIME X Y
…… …… …… …… …… ……
2454 4231 2389 2019-07-10 10:02:43 4978.842 6922.075
2454 4231 2224 2019-07-10 10:06:56 5022.084 6412.204
2454 4231 2567 2019-07-10 10:07:31 5290.027 6882.754
2454 4231 2423 2019-07-10 10:10:55 5452.346 6536.323
2454 4231 2423 2019-07-10 10:12:18 5452.346 6536.323
2454 4231 2423 2019-07-10 10:15:42 5452.346 6536.323
2454 4231 2423 2019-07-10 10:18:34 5452.346 6536.323
2454 4231 2423 2019-07-10 10:22:38 5452.346 6536.323
2454 4231 2345 2019-07-10 10:26:29 5246.451 6319.747
2454 4231 5234 2019-07-10 10:28:10 5881.280 6357.004
2454 4231 2134 2019-07-10 10:31:08 5571.187 6976.146
2454 4231 2342 2019-07-10 10:38:02 5844.266 6131.435
…… …… …… …… …… ……
2142 4231 2389 2019-07-10 10:03:54 4978.842 6922.075
2142 4231 2389 2019-07-10 10:07:12 4978.842 6922.075
2142 4231 2224 2019-07-10 10:08:43 5022.084 6412.204
2142 4231 2567 2019-07-10 10:09:23 5290.027 6882.754
2142 4231 2423 2019-07-10 10:11:54 5452.346 6536.323
2142 4231 2423 2019-07-10 10:13:09 5452.346 6536.323
2142 4231 2423 2019-07-10 10:20:19 5452.346 6536.323
2142 4231 2345 2019-07-10 10:24:34 5246.451 6319.747
2142 4231 2342 2019-07-10 10:29:05 5844.266 6131.435
…… …… …… …… …… ……
4143 1243 3245 2019-07-10 10:02:45 5686.413 6935.724
4143 1243 3642 2019-07-10 10:05:19 5633.066 6853.297
4143 1243 3099 2019-07-10 10:09:11 5110.714 6658.093
4143 1243 3874 2019-07-10 10:10:33 5244.108 6353.857
4143 1243 3874 2019-07-10 10:12:06 5244.108 6353.857
4143 1243 3874 2019-07-10 10:14:36 5244.108 6353.857
4143 1243 3874 2019-07-10 10:17:23 5244.108 6353.857
4143 1243 3698 2019-07-10 10:25:56 5543.188 6043.208
4143 1243 3684 2019-07-10 10:27:44 5178.479 6210.875
4143 1243 3495 2019-07-10 10:30:32 5923.473 6381.616
…… …… …… …… …… ……
Step 5.3, determining the weights of the several travel space-time trajectories according to the different data densities recorded by the different mobile communication card numbers, interleaving the trajectories in time order to construct a new space-time trajectory, and obtaining from the communication records between the card numbers and the fixed sensors an individual travel space-time sequence with equal time intervals by the space-time weighted interpolation method of step 1.4;
in this example, the combined spatiotemporal trajectories are shown in table 8:
TABLE 8 merged spatio-temporal trajectories
EPID REGIONCODE SENSORID TIME X Y
…… …… …… …… …… ……
2454 4231 2389 2019-07-10 10:02:43 4978.842 6922.075
4143 1243 3245 2019-07-10 10:02:45 5686.413 6935.724
2142 4231 2389 2019-07-10 10:03:54 4978.842 6922.075
4143 1243 3642 2019-07-10 10:05:19 5633.066 6853.297
2454 4231 2224 2019-07-10 10:06:56 5022.084 6412.204
2142 4231 2389 2019-07-10 10:07:12 4978.842 6922.075
2454 4231 2567 2019-07-10 10:07:31 5290.027 6882.754
2142 4231 2224 2019-07-10 10:08:43 5022.084 6412.204
4143 1243 3099 2019-07-10 10:09:11 5110.714 6658.093
2142 4231 2567 2019-07-10 10:09:23 5290.027 6882.754
4143 1243 3874 2019-07-10 10:10:33 5244.108 6353.857
2454 4231 2423 2019-07-10 10:10:55 5452.346 6536.323
2142 4231 2423 2019-07-10 10:11:54 5452.346 6536.323
4143 1243 3874 2019-07-10 10:12:06 5244.108 6353.857
2454 4231 2423 2019-07-10 10:12:18 5452.346 6536.323
2142 4231 2423 2019-07-10 10:13:09 5452.346 6536.323
4143 1243 3874 2019-07-10 10:14:36 5244.108 6353.857
2454 4231 2423 2019-07-10 10:15:42 5452.346 6536.323
4143 1243 3874 2019-07-10 10:17:23 5244.108 6353.857
2454 4231 2423 2019-07-10 10:18:34 5452.346 6536.323
2142 4231 2423 2019-07-10 10:20:19 5452.346 6536.323
2454 4231 2423 2019-07-10 10:22:38 5452.346 6536.323
2142 4231 2345 2019-07-10 10:24:34 5246.451 6319.747
4143 1243 3698 2019-07-10 10:25:56 5543.188 6043.208
2454 4231 2345 2019-07-10 10:26:29 5246.451 6319.747
4143 1243 3684 2019-07-10 10:27:44 5178.479 6210.875
2454 4231 5234 2019-07-10 10:28:10 5881.280 6357.004
2142 4231 2342 2019-07-10 10:29:05 5844.266 6131.435
4143 1243 3495 2019-07-10 10:30:32 5923.473 6381.616
2454 4231 2134 2019-07-10 10:31:08 5571.187 6976.146
2454 4231 2342 2019-07-10 10:38:02 5844.266 6131.435
…… …… …… …… …… ……
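The interleaving in step 5.3 amounts to a k-way merge of time-sorted lists, sketched below. The density-based weighting mentioned in the step is not specified in the text and is omitted here; the function name `merge_trajectories` is hypothetical. Timestamps are kept as strings because the "YYYY-MM-DD HH:MM:SS" format sorts correctly as text.

```python
import heapq


def merge_trajectories(pid, trajectories):
    """Merge several time-sorted trajectories of one person into one record set.

    trajectories: list of lists of (time_string, x, y), each sorted by time.
    Returns one time-sorted list of (pid, time_string, x, y); the card number
    is replaced by the individual identifier PID, as described in step 5.
    """
    merged = heapq.merge(*trajectories, key=lambda rec: rec[0])
    return [(pid, t, x, y) for t, x, y in merged]


card_2454 = [
    ("2019-07-10 10:02:43", 4978.842, 6922.075),
    ("2019-07-10 10:06:56", 5022.084, 6412.204),
]
card_2142 = [("2019-07-10 10:03:54", 4978.842, 6922.075)]
card_4143 = [
    ("2019-07-10 10:02:45", 5686.413, 6935.724),
    ("2019-07-10 10:05:19", 5633.066, 6853.297),
]
merged = merge_trajectories("0323", [card_2454, card_2142, card_4143])
```

The merged list has the same time order as the first rows of Table 8 and can be passed to the equal-interval interpolation of step 1.4.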
Step 5.4, recalculating travel space-time sequences of all PIDs with a plurality of card numbers, storing calculation results into a database, and providing a data basis for other subsequent analysis based on mobile communication big data;
in this example, the travel spatio-temporal trajectory recalculated for the individual with PID 0323 is shown in table 9:
TABLE 9 travel space-time trajectory of individual 0323
PID TIME X Y
…… …… …… ……
0323 2019-07-10 10:02:30 4980.1627 6914.5155
0323 2019-07-10 10:03:00 5012.6678 6901.3820
0323 2019-07-10 10:03:00 5023.2808 6912.7565
0323 2019-07-10 10:04:00 5067.1517 6944.5986
0323 2019-07-10 10:04:00 5086.1439 6916.5905
0323 2019-07-10 10:05:00 5135.1632 6920.6010
0323 2019-07-10 10:05:00 5150.1864 6941.2751
0323 2019-07-10 10:06:00 5180.4582 6990.6511
0323 2019-07-10 10:06:00 5204.3407 7012.8804
0323 2019-07-10 10:07:00 5217.2648 7023.2694
0323 2019-07-10 10:07:00 5209.4010 7031.0754
0323 2019-07-10 10:08:00 5231.0230 7042.0325
0323 2019-07-10 10:08:00 5206.9472 7046.6325
0323 2019-07-10 10:09:00 5243.4519 7050.8822
0323 2019-07-10 10:09:00 5261.7507 7024.7138
0323 2019-07-10 10:10:00 5268.0003 7016.2601
0323 2019-07-10 10:10:00 5318.6818 7035.0196
0323 2019-07-10 10:11:00 5287.5475 7037.5181
0323 2019-07-10 10:11:00 5237.9831 7006.4965
0323 2019-07-10 10:12:00 5268.3228 7027.2627
0323 2019-07-10 10:12:00 5224.9202 6978.9209
0323 2019-07-10 10:13:00 5269.4769 6977.9299
0323 2019-07-10 10:13:00 5309.9159 7026.3982
0323 2019-07-10 10:14:00 5284.2201 6980.0280
0323 2019-07-10 10:14:00 5321.5669 6947.7865
0323 2019-07-10 10:15:00 5367.0882 6918.8849
0323 2019-07-10 10:15:00 5405.4590 6941.7112
0323 2019-07-10 10:16:00 5423.6286 6970.9030
0323 2019-07-10 10:16:00 5404.8811 6992.2385
0323 2019-07-10 10:17:00 5422.5825 6969.5919
0323 2019-07-10 10:17:00 5386.9400 7011.5913
0323 2019-07-10 10:18:00 5343.6286 6964.4445
0323 2019-07-10 10:18:00 5308.0133 6976.5579
0323 2019-07-10 10:19:00 5320.2723 6996.9447
0323 2019-07-10 10:19:00 5291.6284 7041.9836
0323 2019-07-10 10:20:00 5269.3226 7064.3575
0323 2019-07-10 10:20:00 5304.9516 7069.9671
…… …… …… ……

Claims (7)

1. A method for identifying one person with multiple cards in a big data environment is characterized by comprising the following steps:
step 1, reading anonymous encrypted mobile terminal sensor data with a unique EPID number obtained from a sensor operator, extracting a communication signaling record triggered in a specified time period, expanding recorded sampling points at equal time intervals, and obtaining a travel space-time trajectory;
step 2, intercepting a large number of individual track segments with equal time intervals from the travel space-time track, constructing a comparative vector consistency index, calculating the consistency index of the randomly selected travel track at fixed time intervals in a fixed time period, and obtaining a consistency inspection index for judging whether the two tracks are consistent;
step 3, selecting one mobile communication card number from the database as the object to be matched and obtaining its travel space-time sequence; traversing the mobile communication record database and selecting the travel space-time sequences of other card numbers to match against it; randomly intercepting sequence segments at the same positions in the space-time sequences, calculating the correlation of the two sequences in spatial position, checking its consistency, and judging whether the two space-time sequences are held by the same person;
step 4, traversing the whole database to obtain all other card numbers which are judged to be held by the same person as the initially selected mobile communication card number, marking the card numbers as held by the same person, identifying individual numbers PID for the card numbers, matching the card numbers with the travel time-space sequences of all other individuals in a database traversing manner, judging whether the card numbers are held by the same person, and if the condition that one card number is held by a plurality of persons at the same time occurs, namely the similarity between one card number and two mutually unmatched card numbers can be checked, attributing the card numbers to the mobile communication cards with larger similarity;
and 5, after traversing the database, after carrying out trip time-space sequence matching on all mobile communication card numbers, judging whether the mobile communication card numbers are held by one person or not with other card numbers, inquiring the communication records of a plurality of card numbers and a fixed sensor of each group of one person and a plurality of card numbers aiming at each group of one person and a plurality of cards, changing the card numbers into individual identification PID, mutually interpolating a plurality of trip time-space sequences into a mobile communication record set according to the time sequence, storing the new record set into the database as the communication records of the handheld mobile communication card and the fixed sensor, carrying out time-space weighted interpolation at equal time intervals, and obtaining a more precise individual trip time-space sequence.
2. The method for identifying one person with multiple cards in a big data environment as claimed in claim 1, wherein step 1 queries all communication records of a mobile communication card within the specified time period according to its unique number EPID; initially constructs the individual travel space-time trajectory formed by the communication records between a single mobile communication card number and the fixed sensors; applies a space-time weighted interpolation method to interpolate each individual travel space-time trajectory spatially at equal time intervals, obtaining individual travel space-time sequences at equal time intervals; deletes the original communication records between the mobile communication card and the fixed sensors, so that the interpolated individual travel space-time sequence represents the individual's movement in space and time; and constructs the individual's travel space-time sequence at equal time intervals Th within the specified time period T, the sequence comprising T/Th + 1 nodes, each node containing time and XY coordinate information.
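As an illustration of the equal-interval resampling in claim 2, the following is a minimal sketch. It assumes plain linear interpolation in time as a stand-in for the claimed space-time weighted interpolation, whose weights are not specified in the claims. The function name `resample_track` and its parameters are hypothetical.

```python
import numpy as np

def resample_track(times, xs, ys, t_start, t_end, step):
    """Resample an irregular track onto a time grid with spacing `step`.

    Assumption: linear interpolation replaces the patent's space-time
    weighted interpolation. Records sharing one timestamp would need to
    be averaged beforehand; that is not handled here.
    """
    # The grid has (t_end - t_start) / step + 1 nodes, i.e. T/Th + 1.
    n_nodes = int(round((t_end - t_start) / step)) + 1
    grid = t_start + step * np.arange(n_nodes)
    # np.interp expects increasing sample times, so sort first.
    order = np.argsort(times)
    t = np.asarray(times, dtype=float)[order]
    x = np.interp(grid, t, np.asarray(xs, dtype=float)[order])
    y = np.interp(grid, t, np.asarray(ys, dtype=float)[order])
    # Each node carries time and XY coordinates.
    return np.column_stack([grid, x, y])

# Three raw records over T = 300 s, resampled at Th = 60 s.
track = resample_track([0, 90, 300], [0.0, 90.0, 300.0], [10.0, 10.0, 40.0], 0, 300, 60)
```

With T = 300 and Th = 60 the result has 300/60 + 1 = 6 nodes, matching the node count stated in the claim.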
3. The method for identifying one person with multiple cards in a big data environment according to claim 1, characterized in that in step 2 a space-time sequence consistency comparison index matrix M is constructed on the basis of the individual travel space-time sequences, where M is a matrix of order 2 × n × m × m × 3; the matrix M is traversed, and a consistency index r, constructed from the Pearson product-moment correlation and measuring the similarity between each pair of travel space-time sequence segments, is extracted; the mean value of r over N space-time sequence samples is
Figure 498852DEST_PATH_IMAGE001
And is provided with
Figure 963462DEST_PATH_IMAGE002
Figure 274358DEST_PATH_IMAGE003
Figure 200726DEST_PATH_IMAGE004
Consistency indexes at three levels constrain the similarity of the space-time sequences, namely consistency that is significant at the 95%, 99% and 99.9% levels; the corresponding consistency indexes are, respectively,
Figure 749519DEST_PATH_IMAGE005
Figure 103140DEST_PATH_IMAGE006
Figure 416178DEST_PATH_IMAGE007
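The equations of claim 3 survive only as image placeholders, so the exact form of the index r cannot be recovered from this text. The sketch below shows only the Pearson product-moment correlation the claim names, applied to two coordinate segments. Averaging the X and Y correlations is an assumption, and `pearson` and `segment_similarity` are hypothetical names.

```python
import numpy as np

def pearson(a, b):
    """Pearson product-moment correlation of two equal-length 1-D arrays."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    # A stationary segment has zero variance; return 0 instead of NaN.
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def segment_similarity(seg_a, seg_b):
    """Mean of the X and Y correlations of two (k, 2) coordinate segments.

    Assumption: the patent's own combination of the two axes is not
    recoverable here, so a plain average is used.
    """
    seg_a = np.asarray(seg_a, dtype=float)
    seg_b = np.asarray(seg_b, dtype=float)
    rx = pearson(seg_a[:, 0], seg_b[:, 0])
    ry = pearson(seg_a[:, 1], seg_b[:, 1])
    return 0.5 * (rx + ry)

seg = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
same = segment_similarity(seg, seg + 50.0)   # parallel track, constant offset
opposite = segment_similarity(seg, -seg)     # mirrored track
```

Because the correlation ignores constant offsets, two cards that connect to different nearby sensors along the same route can still score highly.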
4. The method for identifying one person with multiple cards in a big data environment according to claim 1, wherein step 3 obtains the travel space-time sequence of the target card number in a random time slot and compares it with the sequence of each card number in the database over the same time slot: if the consistency index is greater than
Figure 146237DEST_PATH_IMAGE005
the travel space-time sequence is discarded; if the consistency index is less than or equal to
Figure 549536DEST_PATH_IMAGE005
segments at the same point positions on the two travel space-time sequences continue to be randomly intercepted and their consistency index calculated, until either the sequence pair is discarded or the two travel space-time sequences are judged to represent the travel track of the same person.
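The sample-and-reject loop of claim 4 can be sketched as follows. Several details are assumptions: the index is taken to be one minus the mean Pearson correlation (so that a larger value means less consistent, in line with the discard-if-greater rule), the threshold value is invented for the example, and the pair is accepted once a fixed number of samples has passed. The claim's own acceptance rule is given in claim 5.

```python
import random
import numpy as np

def one_minus_pearson(seg_a, seg_b):
    """Placeholder index: about 0 for perfectly correlated segments, up to 2."""
    rx = np.corrcoef(seg_a[:, 0], seg_b[:, 0])[0, 1]
    ry = np.corrcoef(seg_a[:, 1], seg_b[:, 1])[0, 1]
    return 1.0 - 0.5 * (rx + ry)

def same_holder(seq_a, seq_b, threshold, seg_len=10, max_samples=20, rng=None):
    """Repeatedly compare random segments cut at the same positions.

    seq_a and seq_b are (n, 2) arrays of equal length. The pair is
    discarded as soon as one sample exceeds `threshold`.
    """
    rng = rng or random.Random(0)
    for _ in range(max_samples):
        start = rng.randrange(len(seq_a) - seg_len + 1)
        window = slice(start, start + seg_len)
        if one_minus_pearson(seq_a[window], seq_b[window]) > threshold:
            return False  # discard the candidate sequence
    return True  # simplification: accept after max_samples passes

t = np.arange(60.0)
card_a = np.column_stack([t, t ** 2])
card_b = card_a + 5.0   # same route seen with a constant offset
card_c = -card_a        # moves the opposite way
```

Rejecting on the first failed sample keeps the database traversal cheap, since most candidate pairs are dropped after one comparison.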
5. The method for identifying one person with multiple cards in a big data environment as claimed in claim 4, wherein in step 3, for
Figure 074059DEST_PATH_IMAGE008
Figure 359546DEST_PATH_IMAGE009
and
Figure 644028DEST_PATH_IMAGE010
sample-count discrimination thresholds are set for judging two travel space-time sequences C1 and Ci to be the travel track of the same person, together with counts of the samples whose consistency comparison results fall within each of the three confidence intervals; when the consistency index is greater than the consistency comparison index of the time period
Figure 749519DEST_PATH_IMAGE005
the travel sequence is discarded; otherwise, if the number of samples meeting the consistency index in any confidence interval is greater than or equal to the corresponding sample-count discrimination threshold, the two travel space-time sequences are judged to belong to the same person.
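The decision rule of claim 5 can be sketched as a count-per-level check. The actual index limits and sample-count thresholds are in the lost equation images, so every number below is invented for illustration. The sketch also assumes that a smaller index means more consistent and that the 95% level has the loosest limit.

```python
def judge_same_person(indices, level_limits, min_counts):
    """Decide whether two sequences belong to one person.

    indices:      consistency index of each sampled segment pair
    level_limits: index limits for the 95%, 99% and 99.9% levels,
                  loosest first (hypothetical values)
    min_counts:   sample-count discrimination threshold per level
    """
    # Any sample worse than the loosest (95%) limit discards the pair.
    if any(value > level_limits[0] for value in indices):
        return False
    # Otherwise one level reaching its sample-count threshold is enough.
    for limit, needed in zip(level_limits, min_counts):
        if sum(value <= limit for value in indices) >= needed:
            return True
    return False

limits = [0.10, 0.05, 0.01]   # hypothetical limits
counts = [5, 3, 2]            # hypothetical thresholds
accepted = judge_same_person([0.01, 0.02, 0.03], limits, counts)
rejected = judge_same_person([0.01, 0.02, 0.20], limits, counts)
undecided = judge_same_person([0.06, 0.07], limits, counts)
```

Requiring fewer samples at stricter levels lets a few very close matches settle the decision early, while looser matches need more evidence.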
6. The method as claimed in claim 1, wherein step 4 traverses the EPIDs of all mobile communication card numbers in the database, identifies all card numbers that can be judged to be held by the individual with PID P1, and stores the relationship between card numbers and individual numbers in an EPID-PID relationship data table TR; when the same travel space-time sequence is attributed to multiple individuals, sampling and consistency calculation are performed on it again, with an upper limit set on the number of iterations; if a unique correspondence is obtained, TR is updated; if not, the cumulative sum of the consistency indexes over the multiple space-time sequences is calculated, and the sequence with the smallest value is retained.
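The fallback rule at the end of claim 6 (keep the candidate with the smallest cumulative consistency index) can be sketched with plain dictionaries. The data layout and the name `resolve_conflicts` are assumptions, and the repeated resampling with an iteration cap is taken to have already produced the index lists.

```python
def resolve_conflicts(candidates):
    """Build an EPID -> PID table from ambiguous match results.

    candidates: {epid: {pid: [index values from repeated sampling]}}
    When an EPID matches several PIDs, the PID whose cumulative index
    sum is smallest is retained, as in the claim's fallback rule.
    """
    table = {}
    for epid, per_pid in candidates.items():
        table[epid] = min(per_pid, key=lambda pid: sum(per_pid[pid]))
    return table

tr = resolve_conflicts({
    "E1": {"P1": [0.10, 0.20], "P2": [0.05, 0.05]},  # ambiguous card
    "E2": {"P1": [0.30]},                            # unique match
})
```

Here card E1 is attributed to P2, whose cumulative index of about 0.10 is smaller than P1's 0.30.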
7. The method for identifying one person with multiple cards in a big data environment as claimed in claim 1, wherein step 5 traverses the TR data table, obtains the communication records of the multiple EPIDs belonging to the same PID, sorts the records by time to form multiple travel space-time trajectory data, determines the weight of each according to the differing data density of the records, interleaves them to construct a new space-time trajectory, obtains the individual travel space-time sequence at equal time intervals by the spatial weighted interpolation method, and stores the results in the database, providing a data basis for subsequent analyses based on mobile communication big data.
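The record merging in claim 7 can be sketched as grouping by PID and sorting by time. The density-based weighting and the final interpolation are omitted because the claims do not specify them. The record layout `(epid, timestamp, x, y)` and the name `merge_records` are assumptions.

```python
def merge_records(records, epid_to_pid):
    """Interleave the records of all cards held by one person.

    records:     iterable of (epid, timestamp, x, y) tuples
    epid_to_pid: the EPID -> PID relationship table (TR)
    Returns {pid: [(timestamp, x, y), ...]} sorted by timestamp.
    """
    merged = {}
    for epid, ts, x, y in records:
        # Replace the card number with the individual identifier.
        merged.setdefault(epid_to_pid[epid], []).append((ts, x, y))
    for rows in merged.values():
        rows.sort(key=lambda row: row[0])
    return merged

raw = [
    ("E1", 120, 5.0, 7.0),
    ("E2", 60, 4.0, 6.0),
    ("E1", 0, 3.0, 5.0),
    ("E3", 30, 9.0, 9.0),
]
merged = merge_records(raw, {"E1": "P1", "E2": "P1", "E3": "P2"})
```

Merging two sparse cards this way yields a denser trajectory for the individual than either card provides alone, which is the stated purpose of step 5.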
CN202011142356.0A 2020-10-23 2020-10-23 One-person multi-card identification method in big data environment Active CN111970685B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011142356.0A CN111970685B (en) 2020-10-23 2020-10-23 One-person multi-card identification method in big data environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011142356.0A CN111970685B (en) 2020-10-23 2020-10-23 One-person multi-card identification method in big data environment

Publications (2)

Publication Number Publication Date
CN111970685A true CN111970685A (en) 2020-11-20
CN111970685B CN111970685B (en) 2021-01-15

Family

ID=73387625

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011142356.0A Active CN111970685B (en) 2020-10-23 2020-10-23 One-person multi-card identification method in big data environment

Country Status (1)

Country Link
CN (1) CN111970685B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117150319A (en) * 2023-10-30 2023-12-01 北京艾瑞数智科技有限公司 Method and device for identifying multiple numbers of one person

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104573390A (en) * 2015-01-27 2015-04-29 武汉大学 Cognitive-rule-based time-space trajectory fusion method and road network topology generating method
CN108733818A (en) * 2018-05-21 2018-11-02 上海世脉信息科技有限公司 A kind of big data expansion quadrat method based on the verification of more scene multi-data sources
CN109348404A (en) * 2018-10-09 2019-02-15 上海世脉信息科技有限公司 A kind of method that individual trip path locus extracts under big data environment
CN110162588A (en) * 2019-05-29 2019-08-23 浪潮软件集团有限公司 A kind of track fusion method of multidimensional related information
CN110958599A (en) * 2018-09-26 2020-04-03 北京融信数联科技有限公司 One-machine multi-card user distinguishing method based on track similarity
CN110958600A (en) * 2018-09-26 2020-04-03 北京融信数联科技有限公司 Method for judging number of one-machine multi-card users in regional population based on track similarity
CN111343581A (en) * 2018-12-18 2020-06-26 北京融信数联科技有限公司 One-person multi-number mobile user identification method based on distance

Also Published As

Publication number Publication date
CN111970685B (en) 2021-01-15

Similar Documents

Publication Publication Date Title
CN110505583B (en) Trajectory matching method based on bayonet data and signaling data
CN111582948B (en) Individual behavior analysis method based on mobile phone signaling data and POI (Point of interest)
CN106778876B (en) User classification method and system based on mobile user track similarity
EP3132592B1 (en) Method and system for identifying significant locations through data obtainable from a telecommunication network
CN111737605A (en) Travel purpose identification method and device based on mobile phone signaling data
CN109801091B (en) Target user group positioning method and device, computer equipment and storage medium
CN108536851A (en) A kind of method for identifying ID based on motion track similarity-rough set
CN109195219B (en) Method for determining position of mobile terminal by server
Muzammal et al. Trajectory mining using uncertain sensor data
CN107977673A (en) A kind of economically active population's recognition methods based on big data
WO2014012927A1 (en) Method and system for traffic estimation
Vieira et al. Querying spatio-temporal patterns in mobile phone-call databases
CN111970685B (en) One-person multi-card identification method in big data environment
CN111209487B (en) User data analysis method, server, and computer-readable storage medium
CN110781256B (en) Method and device for determining POI matched with Wi-Fi based on sending position data
CN111372194B (en) Intelligent identification method for mobile phone card changing user
CN111143639B (en) User intimacy calculation method, device, equipment and medium
Dyrmishi et al. Mobile positioning and trajectory reconstruction based on mobile phone network data: A tentative using particle filter
EP3563592B1 (en) Method for determining the mobility status of a user of a wireless communication network
Shi et al. Mobility patterns analysis of Beijing residents based on call detail records
CN107241693B (en) Method for determining position of non-coordinate sensor in big data environment
CN114595300A (en) Active chain reconstruction method and system combining multi-source space-time data
CN115734165A (en) User searching method, device, equipment and computer readable storage medium
CN117648556B (en) Family membership identification method based on space-time big data
Yu et al. Discovery of Travelling Companions from Trajectories with Different Sampling Rates

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant