WO2016188380A1 - Method and apparatus for determining user equipment - Google Patents

Method and apparatus for determining user equipment

Info

Publication number
WO2016188380A1
WO2016188380A1 PCT/CN2016/082927 CN2016082927W WO2016188380A1 WO 2016188380 A1 WO2016188380 A1 WO 2016188380A1 CN 2016082927 W CN2016082927 W CN 2016082927W WO 2016188380 A1 WO2016188380 A1 WO 2016188380A1
Authority
WO
WIPO (PCT)
Prior art keywords
user equipment
user
information
type
potential
Application number
PCT/CN2016/082927
Inventor
韦薇
陆平
范贤友
宋国杰
贾培申
刘丹萌
Original Assignee
中兴通讯股份有限公司
Application filed by 中兴通讯股份有限公司
Publication of WO2016188380A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W8/00 Network data management
    • H04W8/005 Discovery of network devices, e.g. terminals
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/20 Services signaling; Auxiliary data signalling, i.e. transmitting data via a non-traffic channel
    • H04W12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/02 Protecting privacy or anonymity, e.g. protecting personally identifiable information [PII]
    • H04W16/00 Network planning, e.g. coverage or traffic planning tools; Network deployment, e.g. resource partitioning or cells structures
    • H04W16/22 Traffic simulation tools or models
    • H04W8/02 Processing of mobility data, e.g. registration information at HLR [Home Location Register] or VLR [Visitor Location Register]; Transfer of mobility data, e.g. between HLR, VLR or external networks
    • H04W8/08 Mobility data transfer
    • H04W8/16 Mobility data transfer selectively restricting mobility data tracking

Definitions

  • The present application relates to the field of communications, for example, to a method and apparatus for determining user equipment.
  • The embodiments of the invention provide a method and an apparatus for determining user equipment, so as to at least solve the problem in the related art that users related to an event cannot be identified through their mobile devices.
  • An embodiment of the present invention provides a method for determining user equipment, including: determining potential user equipment located within a specified spatial range during a specified time period; acquiring association information corresponding to the potential user equipment; and determining specified user equipment among the potential user equipment according to the association information.
  • Determining the potential user equipment located within the specified spatial range during the specified time period includes: acquiring location information of a first type of user equipment and, when the first type of user equipment is located within the specified spatial range during the specified time period, taking the first type of user equipment as potential user equipment; and acquiring trajectory information of a second type of user equipment and, when the trajectory information indicates a location within the specified spatial range during the specified time period, taking the second type of user equipment as potential user equipment.
  • Acquiring the trajectory information of the second type of user equipment includes: mining the movement regularity of the user corresponding to the second type of user equipment from that user's historical call record information; and determining the trajectory information of the second type of user equipment according to the movement regularity.
  • Determining the potential user equipment located within the specified spatial range during the specified time period includes: obtaining the discrete entropy of the user corresponding to the second type of user equipment; if the discrete entropy is less than a predetermined threshold, obtaining the movement regularity of the user from the user's historical call record information and determining the second type of user equipment according to the movement regularity; and if the discrete entropy is greater than or equal to the predetermined threshold, determining the second type of user equipment according to the historical call information of all users in the database.
  • The association information includes at least one of the following: residence and workplace information of the user corresponding to the potential user equipment, social relationship information of the user corresponding to the potential user equipment, and demographic characteristics of the user corresponding to the potential user equipment.
  • Acquiring the social relationship information of the user corresponding to the potential user equipment includes acquiring it according to at least one of the following: time characteristic information of the calls made by the user equipment, information on other user equipment located at the same place as the user equipment at the same time, and information on other user equipment that shares a common contact with the user equipment.
  • Determining the specified user equipment among the potential user equipment according to the association information includes: acquiring a weight of the specified information included in the association information; sorting the plurality of user equipments among the potential user equipment according to the weight; and determining the user equipment whose position in the sorting result falls within a predetermined order as the specified user equipment.
  • The embodiment of the invention further provides an apparatus for determining user equipment, comprising: a first determining module, configured to determine potential user equipment located within a specified spatial range during a specified time period; an acquiring module, configured to acquire association information corresponding to the potential user equipment; and a second determining module, configured to determine specified user equipment among the potential user equipment according to the association information.
  • The first determining module includes: a first acquiring unit, configured to acquire location information of the first type of user equipment and, when the first type of user equipment is located within the specified spatial range during the specified time period, take the first type of user equipment as the potential user equipment; and a second acquiring unit, configured to acquire trajectory information of the second type of user equipment and, when the trajectory information indicates a location within the specified spatial range during the specified time period, take the second type of user equipment as the potential user equipment.
  • The second acquiring unit includes: a mining subunit, configured to mine the movement regularity of the user corresponding to the second type of user equipment from that user's historical call record information; and a determining subunit, configured to determine the trajectory information of the second type of user equipment according to the movement regularity.
  • The first determining module is further configured to: obtain the discrete entropy of the user corresponding to the second type of user equipment; if the discrete entropy is less than a predetermined threshold, obtain the movement regularity of the user from the user's historical call record information and determine the second type of user equipment according to the movement regularity; and if the discrete entropy is greater than or equal to the predetermined threshold, determine the second type of user equipment according to the historical call information of all users in the database.
  • Embodiments of the present invention also provide a non-transitory computer readable storage medium storing computer executable instructions for performing the above method.
  • Embodiments of the present invention also provide an apparatus including one or more processors, a memory, and one or more programs, the one or more programs being stored in the memory and, when executed by the one or more processors, performing the above method.
  • Through the embodiments of the present invention, potential user equipment located within a specified spatial range during a specified time period is determined; association information corresponding to the potential user equipment is acquired; and the specified user equipment is determined among the potential user equipment according to the association information.
  • FIG. 1 is a flowchart of a method for determining user equipment according to an embodiment of the present invention.
  • FIG. 2 is a structural block diagram of an apparatus for determining user equipment according to an embodiment of the present invention.
  • FIG. 3 is a structural block diagram (1) of an apparatus for determining user equipment according to an embodiment of the present invention.
  • FIG. 4 is a structural block diagram (2) of an apparatus for determining user equipment according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of a method for sorting users by abnormality based on call data according to an embodiment of the present invention.
  • FIG. 6 is a flowchart of trajectory prediction according to an embodiment of the present invention.
  • FIG. 7 is a flowchart of OD identification from call data according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of a criminological geographic profile according to an embodiment of the present invention.
  • FIG. 9 is a flowchart of a sorting module according to an embodiment of the present invention.
  • FIG. 10 is a structural diagram of a user identification system according to an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of a geographic profile according to a first embodiment of the present invention.
  • FIG. 12 is a flowchart of user identification according to an embodiment of the present invention.
  • FIG. 13 is a schematic diagram of the hardware structure of a device according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a method for determining a user equipment according to an embodiment of the present invention. As shown in FIG. 1, the process includes the following steps:
  • Step S102: Determine potential user equipment located within a specified spatial range during a specified time period.
  • Step S104: Acquire association information corresponding to the potential user equipment.
  • Step S106: Determine the specified user equipment among the potential user equipment according to the association information.
  • Through the above steps, the specified user equipment is determined according to the association information of the potential user equipment, so that the user corresponding to the specified user equipment can be determined. Compared with traditional manual investigation, this solves the problem in the related art that users related to an event cannot be identified through their mobile devices, and achieves the effect of quickly identifying users and saving human resources.
  • The foregoing step S102 involves determining potential user equipment located within a specified spatial range during a specified time period.
  • Location information of the first type of user equipment is acquired; when the first type of user equipment is located within the specified spatial range during the specified time period, it is taken as potential user equipment.
  • Trajectory information of the second type of user equipment is acquired; when the trajectory information indicates a location within the specified spatial range during the specified time period, the second type of user equipment is taken as potential user equipment. The potential user equipment therefore comprises both the devices observed within the specified spatial range during the specified time period and the devices whose trajectory information places them within that range during that period.
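  • The two-source selection described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Observation` record, the rectangular bounding box standing in for the spatial range, and the function names are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Iterable, Set, Tuple


@dataclass(frozen=True)
class Observation:
    """Hypothetical record: a device seen (or predicted) at a place and time."""
    device_id: str
    timestamp: float  # assumed unit: seconds
    x: float          # assumed planar coordinates
    y: float


def in_window(obs: Observation,
              t_range: Tuple[float, float],
              box: Tuple[Tuple[float, float], Tuple[float, float]]) -> bool:
    """True if the observation falls inside the time period and spatial range."""
    t1, t2 = t_range
    (x1, y1), (x2, y2) = box
    return t1 <= obs.timestamp <= t2 and x1 <= obs.x <= x2 and y1 <= obs.y <= y2


def potential_devices(observed: Iterable[Observation],
                      predicted: Iterable[Observation],
                      t_range: Tuple[float, float],
                      box: Tuple[Tuple[float, float], Tuple[float, float]]) -> Set[str]:
    """Union of first-type devices (observed locations) and
    second-type devices (locations taken from predicted trajectories)."""
    first_type = {o.device_id for o in observed if in_window(o, t_range, box)}
    second_type = {o.device_id for o in predicted if in_window(o, t_range, box)}
    return first_type | second_type
```

  • A device is kept if either its recorded location or its predicted trajectory falls inside the window, which mirrors the first-type and second-type cases in the text.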
  • The movement regularity of the user corresponding to the second type of user equipment is mined from that user's historical call record information, and the trajectory information of the second type of user equipment is determined according to the movement regularity.
  • The discrete entropy of the user corresponding to the second type of user equipment is obtained. If the discrete entropy is less than a predetermined threshold, the movement regularity of the user is obtained from the user's historical call record information, and the second type of user equipment is determined according to the movement regularity.
  • If the discrete entropy is greater than or equal to the predetermined threshold, the second type of user equipment is determined according to the historical call information of all users in the database. In this way, the second type of user equipment is determined relatively accurately.
  • The association information may be residence and workplace information of the user corresponding to the potential user equipment, social relationship information of that user, or demographic characteristic information of that user.
  • The demographic characteristic information may be the user's age or gender.
  • The social relationship information of the user is obtained according to at least one of the following: time characteristic information of the calls made by the user equipment, information on other user equipment located at the same place as the user equipment at the same time, and information on other user equipment that shares a common contact with the user equipment. In this way, the social relationship information of the user corresponding to the potential user equipment can be obtained.
  • A weight of the specified information included in the association information is acquired; the plurality of user equipments among the potential user equipment are sorted according to the weight; and the user equipment whose position in the sorting result falls within a predetermined order is determined as the specified user equipment. This completes the determination of the specified user equipment among the potential user equipment according to the association information.
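  • The weight-and-sort step can be illustrated with a short sketch. The text does not say how the weights are combined, so the linear weighted sum, the feature names, and the `top_k` cut-off standing in for the "predetermined order" are all assumptions.

```python
from typing import Dict, List


def rank_devices(scores: Dict[str, Dict[str, float]],
                 weights: Dict[str, float],
                 top_k: int) -> List[str]:
    """Rank devices by a weighted sum of their association-information scores.

    scores:  device id -> {feature name -> score} (hypothetical features)
    weights: feature name -> weight; features without a weight count as 0
    top_k:   number of leading devices to return (assumed "predetermined order")
    """
    totals = {
        device: sum(weights.get(name, 0.0) * value for name, value in feats.items())
        for device, feats in scores.items()
    }
    # Highest total first; ties broken by device id for a stable result.
    ranked = sorted(totals, key=lambda device: (-totals[device], device))
    return ranked[:top_k]
```

  • Changing the weights changes which kind of association information dominates the ranking, which is one way to reflect the specifics of a given event.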
  • An apparatus for determining user equipment is further provided, and the apparatus can implement the foregoing embodiments.
  • As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function.
  • Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
  • The apparatus includes: a first determining module 22, configured to determine potential user equipment located within a specified spatial range during a specified time period;
  • an acquiring module 24, configured to acquire association information corresponding to the potential user equipment; and
  • a second determining module 26, configured to determine the specified user equipment among the potential user equipment according to the association information.
  • FIG. 3 is a structural block diagram (1) of a user equipment determining apparatus according to an embodiment of the present invention.
  • The first determining module 22 includes: a first acquiring unit 222, configured to acquire location information of a first type of user equipment and, when the first type of user equipment is located within the specified spatial range during the specified time period, take the first type of user equipment as the potential user equipment; and a second acquiring unit 224, configured to acquire trajectory information of a second type of user equipment and, when the trajectory information indicates a location within the specified spatial range during the specified time period, take the second type of user equipment as the potential user equipment.
  • The second acquiring unit 224 includes: a mining subunit 2242, configured to mine the movement regularity of the user corresponding to the second type of user equipment from that user's historical call record information; and a determining subunit 2244, configured to determine the trajectory information of the second type of user equipment according to the movement regularity.
  • The first determining module 22 is further configured to: obtain the discrete entropy of the user corresponding to the second type of user equipment; if the discrete entropy is less than a predetermined threshold, obtain the movement regularity of the user from the user's historical call record information and determine the second type of user equipment according to the movement regularity; and if the discrete entropy is greater than or equal to the predetermined threshold, determine the second type of user equipment according to the historical call information of all users in the database.
  • The foregoing modules may be implemented by software or hardware.
  • For the latter, this may be achieved in, but is not limited to, the following ways: the foregoing modules are all located in the same processor; or the modules are located in multiple processors, such as a first processor, a second processor, and a third processor.
  • This optional embodiment combines developments in computer technology and data science with background knowledge from the public security field, and uses social network analysis to provide a digital method for investigating persons related to an abnormal event: it identifies users related to the abnormal event and, using domain knowledge, ranks them by relevance. Compared with manual investigation in the related art, the method responds faster and covers more of the population, and can effectively assist security personnel in their investigation. Owing to the popularity of mobile phones, mobile phone call data is massive and covers the vast majority of a city's population.
  • FIG. 5 is a flowchart of a method for sorting users by abnormality based on call data according to an embodiment of the present invention. As shown in FIG. 5, the original data is cleaned and encrypted; trajectory prediction, OD recognition, social relationship recognition and user feature recognition are performed; and the processed data is sorted by relevance, thereby sorting users by abnormality based on call data.
  • This optional embodiment combines OD (origin-destination) identification based on call data with social relationship recognition and user feature recognition to provide a method for identifying and sorting the users related to an abnormal event.
  • The main steps are:
  • Step 1: Data preprocessing.
  • This step processes the collected raw call data into the required format.
  • The required attribute fields are extracted from the original call data, including the user identifier (the encrypted mobile phone number), the location of the call base station (i.e., the call base station identifier), and the call time.
  • User privacy is an important issue in call data, so the user's mobile phone number must be encrypted to generate a key value that is used only to identify the user.
  • The original call data is usually massive and redundant. Preprocessing and filtering out only the required data effectively reduces the amount of data and improves the efficiency of subsequent processing.
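  • A minimal sketch of this preprocessing step is given below. The text does not name an encryption scheme, so the keyed HMAC-SHA256 pseudonym used here is an assumption, as are the raw field names `phone`, `base_station_id` and `call_time`.

```python
import hashlib
import hmac
from typing import Dict, Iterable, List


def pseudonymize(phone_number: str, secret_key: bytes) -> str:
    """Derive a stable key value from a phone number (assumed scheme: HMAC-SHA256).

    The same number always maps to the same key, but the number cannot be
    read back from the key without the secret.
    """
    return hmac.new(secret_key, phone_number.encode("utf-8"), hashlib.sha256).hexdigest()


def preprocess(raw_records: Iterable[Dict], secret_key: bytes) -> List[Dict]:
    """Keep only the three required attributes and drop incomplete records."""
    cleaned = []
    for record in raw_records:
        phone = record.get("phone")               # hypothetical field name
        station = record.get("base_station_id")   # hypothetical field name
        call_time = record.get("call_time")       # hypothetical field name
        if not phone or not station or call_time is None:
            continue  # incomplete record: filtered out
        cleaned.append({
            "user_key": pseudonymize(phone, secret_key),
            "base_station_id": station,
            "call_time": call_time,
        })
    return cleaned
```

  • All other attributes of the raw record are discarded, which is where the reduction in data volume comes from.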
  • Step 2: Delineation of the range of abnormal users based on trajectory prediction.
  • FIG. 6 is a flowchart of trajectory prediction according to an embodiment of the present invention, and FIG. 6 is explained below.
  • A potentially related user is a user located within a certain spatial range [s_1, s_2] during a certain time range [t_1, t_2] related to the abnormal event.
  • Because call data is not continuous in real time, a user equipment may not expose its location during the time period in which the abnormal event occurs. User equipment that may have been located in the spatial range [s_1, s_2] during the time period [t_1, t_2] should nevertheless also be included in the potential user group. A trajectory prediction module is therefore introduced to handle the user equipment held by such users.
  • The process of trajectory prediction involves the following predictors:
  • CTP: Crowd Trajectory Predictor
  • ITP: Individual Trajectory Predictor
  • A threshold may be preset and the user's discrete entropy compared with it: when the discrete entropy is greater than or equal to the preset threshold, the user's discrete entropy is considered large; when it is less than the preset threshold, the user's discrete entropy is considered small.
  • Discrete entropy can be used to measure the predictability of a user. In its definition, i is the sequence number of a base station (a positive integer), n is a positive integer greater than or equal to 1, R_i is the base station identifier, and p(R_i) is the frequency with which the user appears in the coverage area of base station R_i. The larger the discrete entropy, the lower the regularity of the user's movement.
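  • The entropy formula itself is not reproduced in this text. The sketch below assumes the standard Shannon form, H = -sum over i = 1..n of p(R_i) * log2 p(R_i), which is consistent with the symbols defined above; the base-2 logarithm and the function names are assumptions. It also shows the threshold test that selects between the individual predictor (ITP) and the crowd predictor (CTP).

```python
import math
from collections import Counter
from typing import Iterable


def discrete_entropy(station_ids: Iterable[str]) -> float:
    """Shannon entropy (assumed form) of a user's base-station visit frequencies."""
    counts = Counter(station_ids)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p(R_i) is the share of the user's records seen at base station R_i.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def choose_predictor(station_ids: Iterable[str], threshold: float) -> str:
    """Small entropy: regular user, use ITP. Otherwise fall back to CTP."""
    return "ITP" if discrete_entropy(station_ids) < threshold else "CTP"
```

  • A user always seen at one base station has entropy 0, while a user spread evenly over many base stations has high entropy and is routed to the crowd model.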
  • UltraPattern: <h_1, R_1>, <h_2, R_2>, ..., <h_n, R_n>, where h_i denotes a time slice (by default, the 24 hours of a day are divided into 24 time slices of 1 hour each) and R_i denotes the base station identifier. Such a data structure can be used to represent the user's movement trajectory.
  • Input: the ID of the user to be predicted, and the date and time point of the abnormal event for which the prediction is made.
  • Output: base station identifiers, representing the locations where the user may be at the predicted time point, sorted by support.
  • From the database, the call records of the two hours before the predicted time point on the date of the abnormal event are retrieved. Each call record includes the location information of a base station, namely the identifier of the base station where the user is located at that time, which is used as the prediction basis.
  • The personal prediction model is used as follows: the user's personal historical mobile call data is processed to mine the regularity of the user's movement, and the historical call data is compressed into a two-dimensional array UltraPattern[24][7], which holds the call records for each of the 24 hours of the day over a weekly cycle.
  • Prediction process: the base station identifier of each location recorded for the predicted time point is taken as a candidate predicted position, and the candidates are sorted by support. The prediction ends.
  • UltraPattern[24][7] records the identifiers of the one or more base stations where the user was located at the predicted time point, which means that the user is likely to appear within the coverage of those base stations at the predicted time point.
  • The user's discrete entropy is calculated from the user's historical information.
  • The personal prediction model UltraPattern[24][7] is established. Prediction process: the position at the next moment is predicted from UltraPattern[24][7]. If there is no exactly matching path, the group prediction model is used.
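  • One possible reading of UltraPattern[24][7] is a table of per-hour, per-weekday base-station counts. The sketch below follows that reading; the use of visit counts as "support", the function names, and the input format of (datetime, base station identifier) pairs are assumptions.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def build_ultra_pattern(calls: Iterable[Tuple[datetime, str]]) -> List[List[Counter]]:
    """Compress historical calls into a 24 x 7 table of base-station counts.

    pattern[hour][weekday] counts how often each base station was used
    in that hour of that day of the week.
    """
    pattern = [[Counter() for _ in range(7)] for _ in range(24)]
    for when, station in calls:
        pattern[when.hour][when.weekday()][station] += 1
    return pattern


def predict_individual(pattern: List[List[Counter]], when: datetime) -> List[str]:
    """Candidate base stations for the given time, highest support first.

    An empty list means no matching history; the caller would then fall
    back to the group prediction model.
    """
    cell = pattern[when.hour][when.weekday()]
    return [station for station, _ in sorted(cell.items(), key=lambda kv: (-kv[1], kv[0]))]
```

  • Returning an empty list rather than a guess keeps the fallback to the group model explicit.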
  • The group prediction model is used as follows. Within one city, the set of base station locations is finite and human activity shows certain common patterns, so a user's position at the next moment can be predicted from the history of all users in the database.
  • The group prediction model is trained as follows: the day is divided into 24 time slices, and 24 transition prediction matrices are trained using the historical call records of all users in the database.
  • The horizontal and vertical coordinates of each transition prediction matrix both represent base station sequence numbers; the base station sequence number given by the horizontal coordinate may be the same as or different from the one given by the vertical coordinate.
  • For example, the transition prediction matrix A0 is trained on transitions from the previous time period (the 0 o'clock time slice) to the next time period, 1:00:00 to 1:59:59 (the 1 o'clock time slice). Its element a0_{i,j} in row i and column j is the probability that a user who is in the coverage area of the base station with sequence number i during the 0 o'clock time slice is in the coverage area of the base station with sequence number j during the 1 o'clock time slice.
  • Prediction process: given the prediction basis, that is, the user's known current location, the matrix for the corresponding time slice is selected and the row whose index is the base station sequence number of the user's current location is read. The several largest values in that row are taken and arranged in descending order; the column indices of these values, which are base station sequence numbers, are taken in the same order, converted into base station identifiers, and returned as the prediction result. The prediction ends.
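  • The training and lookup just described can be sketched as below. The input format (each trajectory as a list of (absolute hour, base station sequence number) pairs), the row normalisation by simple counting, and the `top_k` parameter are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def train_transition_matrices(trajectories: Sequence[Sequence[Tuple[int, int]]],
                              num_stations: int) -> List[Matrix]:
    """Train 24 transition matrices, one per hourly time slice.

    matrices[h][i][j] estimates the probability of being at station j in
    the next hour given station i during time slice h.
    """
    counts = [[[0] * num_stations for _ in range(num_stations)] for _ in range(24)]
    for trajectory in trajectories:
        by_hour = dict(trajectory)  # absolute hour -> station sequence number
        for hour, src in by_hour.items():
            dst = by_hour.get(hour + 1)
            if dst is not None:
                counts[hour % 24][src][dst] += 1
    matrices = []
    for slice_counts in counts:
        matrix = []
        for row in slice_counts:
            total = sum(row)
            matrix.append([c / total if total else 0.0 for c in row])
        matrices.append(matrix)
    return matrices


def predict_group(matrices: List[Matrix], hour: int, current_station: int,
                  station_ids: Sequence[str], top_k: int = 3) -> List[str]:
    """Read the current station's row, keep the largest values in descending
    order, and convert their column indices into base station identifiers."""
    row = matrices[hour % 24][current_station]
    ranked = sorted(range(len(row)), key=lambda j: (-row[j], j))
    return [station_ids[j] for j in ranked[:top_k] if row[j] > 0]
```

  • Using absolute hours lets a transition from 23:00 to 0:00 on the next day be counted in the slice-23 matrix without special handling.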
  • Step 3: OD recognition, social relationship recognition, and user feature recognition based on call data.
  • The method for discovering large-scale commuting OD from mobile phone call data is as follows:
  • The call data is divided according to the call base station, so that each call base station corresponds to a number of calls.
  • The call base stations are sorted in descending order of call count, and the sorted call base stations are then spatially merged to form new call locations.
  • The call locations are filtered to delete those where calls are sparse.
  • A call frequency threshold may be preset; when the call frequency at a call location is less than the threshold, the calls at that location are considered sparse.
  • Among the call data of the periods Tday and Tnight, the call locations with the highest call frequency are taken as D and O respectively, that is, the workplace and the place of residence.
  • FIG. 7 is a flowchart of OD identification of call data according to an embodiment of the present invention. As shown in FIG. 7, the process includes the following steps:
  • Step S702: Preprocess the call data.
  • Step S704: Compute the call frequency of each call location.
  • Step S706: Spatial merging optimization.
  • Step S708: Temporal merging optimization.
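  • The core of the OD identification, picking the most frequent call location in the day and night periods, can be sketched as follows. The spatial merging of neighbouring base stations is omitted, and the hour sets passed in for Tday and Tnight, the `min_calls` sparsity threshold, and the function names are assumptions.

```python
from collections import Counter
from typing import Collection, Iterable, Optional, Tuple


def most_frequent(locations: Iterable[str], min_calls: int) -> Optional[str]:
    """Most frequent location, ignoring sparse ones with fewer than min_calls calls."""
    counts = Counter(locations)
    frequent = {loc: n for loc, n in counts.items() if n >= min_calls}
    if not frequent:
        return None
    # Highest count first; ties broken alphabetically for a stable result.
    return min(frequent, key=lambda loc: (-frequent[loc], loc))


def identify_od(calls: Iterable[Tuple[int, str]],
                day_hours: Collection[int],
                night_hours: Collection[int],
                min_calls: int = 2) -> Tuple[Optional[str], Optional[str]]:
    """Return (O, D): residence from night-time calls, workplace from daytime calls.

    calls: (hour of day, call location id) pairs for one user.
    """
    calls = list(calls)
    day = [loc for hour, loc in calls if hour in day_hours]
    night = [loc for hour, loc in calls if hour in night_hours]
    return most_frequent(night, min_calls), most_frequent(day, min_calls)
```

  • Either value may be `None` when a user has too few calls in that period, in which case no residence or workplace is assigned.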
  • Social relationships are divided into three categories: family, colleagues, and others.
  • Two users who are in contact are organized into a user pair, and relationship recognition is thereby converted into a classification problem in which each user pair is classified as a family relationship, a colleague relationship, or another relationship.
  • The extracted features are as follows:
  • Working hours may refer to 8:00-12:00 and 13:00-17:00 from Monday to Friday; evening may refer to 17:00-19:00 Beijing time; night may refer to 19:00-23:00 Beijing time; and late night may refer to 23:00 Beijing time to 3:00 the next day.
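  • The time periods above can be turned into a simple labelling function for call timestamps, for example when computing time features of a user pair. This is a sketch only: the label strings (including "late_night" and "other") are names introduced here, and timestamps are assumed to be already expressed in Beijing local time.

```python
from datetime import datetime


def time_bucket(when: datetime) -> str:
    """Label a call time with the period it falls in (assumed Beijing local time)."""
    hour = when.hour
    is_weekday = when.weekday() < 5  # Monday to Friday
    if is_weekday and (8 <= hour < 12 or 13 <= hour < 17):
        return "working"
    if 17 <= hour < 19:
        return "evening"
    if 19 <= hour < 23:
        return "night"
    if hour >= 23 or hour < 3:
        return "late_night"
    return "other"
```

  • Counting a user pair's calls per label gives a small feature vector; for instance, a pair that talks mostly during working hours looks different from one that talks mostly at night.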
  • The call data may not include the user's age, gender, and similar attributes.
  • Some telecom operators may have more complete user information in their subscriber registration records, but on the one hand this information has high privacy requirements, and on the other hand its completeness and authenticity are not guaranteed.
  • Statistics show that users of different genders or age groups differ to some extent in their calling habits. Therefore, by extracting relevant feature values, a classification model (decision tree, random forest, etc.) can be used to identify the gender and age of the user.
  • The user's age and gender information can help in assessing the similarity of social relations. For example, according to the conclusions of some empirical case analyses, the relationship between perpetrator and victim in intentional homicide cases is gender-specific and shows "coincidence" in terms of age. Reference: [1] Journal of Chinese People's Public Security University (Social Science Edition), No. 2, 2006, "Empirical Analysis of the Relationship between Perpetrators and Victims in Intentional Homicide Cases", authors: Gao Weiwei, check defense.
  • The process of user feature recognition is as follows:
  • Information such as the user's gender and age is often incomplete in real data.
  • Using machine learning methods to infer gender and age can compensate for this defect to some extent.
  • Age is divided into three groups (18-25, 26-40, 41-60), so that age identification becomes a multi-class classification problem.
  • The classification labels of some users (a classification label may be the user's gender classification information and/or age classification information) may be obtained from an external system (for example, a customer relationship management system), and multiple call feature values of all users are then extracted from the call data. (The call feature values may include the feature values shown in the following table.) Then, using a supervised learning method (for example, a decision tree model), the classification labels of the other users, whose labels were not obtained from the external system, are determined.
  • The sorting part is divided into three categories: sorting based on spatial relations, on social relations, and on a domain model. Together they analyze the degree of correlation between a user and the abnormal event from three perspectives.
  • In the field of text information retrieval, a document is often organized into a document vector.
  • The elements of the vector are the numbers of occurrences of terms in the document (or their TF-IDF values), and cosine similarity is used to return similar documents.
  • The cosine similarity of two vectors A and B is: $\cos(A, B) = \dfrac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}$
  • Here, the elements of a user's vector are the average numbers of times the user appears at each base station. Two cosine similarities are computed: s_1, between the trajectory of the user and that of the party to the abnormal event, and s_2, between the user's past space vector and current space vector. The suspiciousness of the user's spatial behavior is:
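  • The formula that combines s_1 and s_2 into a suspiciousness score is not reproduced in this text, so it is not sketched here. The two cosine similarities themselves can be computed as below; representing a space vector as a sparse dictionary from base station identifier to average visit count is an assumption.

```python
import math
from typing import Dict


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse base-station vectors.

    Keys are base station identifiers; values are average visit counts.
    Returns 0.0 when either vector is all zeros.
    """
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

  • With this helper, s_1 would compare a user's vector with that of the party to the event, and s_2 would compare the same user's past and current vectors.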
  • Filtering is first performed according to the time and spatial position of the abnormal event to select the set of potentially suspicious users, that is, the set S of users who appeared within a certain range of the position during a certain time range.
  • For each user in the set S, the degree of overlap between the social relationships of the user and those of the party to the abnormal event is examined and combined with the user's gender and age information to give a sorting result.
  • For example, two activity radii r1 and r2 may be preset around the suspect's fixed activity point (home or workplace): the area within r1 represents places very close to the suspect's fixed activity point, and the area beyond r2 represents places far from it.
  • As shown in FIG. 8, r1 is the radius of the smaller circle centered on the suspect's fixed activity point (home or workplace), and r2 is the radius of the larger circle centered on the same point.
  • Serial cases are crimes that satisfy the above assumptions, such as repeated murder, robbery, or rape. The method also suits cases in which a single crime involves several locations, for example a murder in which the places where the offender was witnessed, committed the murder, and disposed of the body all differ.
  • For an abnormal event, a suspicious person's fixed activity points (OD points) usually lie within the ring formed by concentric circles of different radii centered on the location of the event.
  • Distances here are computed on the road network, using the Manhattan distance.
  • First, the base stations inside the ring region are found from the event location, and people whose O/D is one of these base stations are investigated first. If there are several event locations, people whose O/D is a base station where the ring regions intersect are more suspicious. Therefore, based on the geographic profile, the people concerned are ranked in descending order of the probability that their O/D points fall in the ring region.
  • Step 5: According to the specific circumstances of the abnormal event, select the variables and their order, sort the related users, and obtain a combined ranking.
  • FIG. 9 is a flowchart of a sorting module according to an embodiment of the present invention. The technical solution is described in detail below with reference to the implementation cases and FIG. 9. Although the emphasis of the sorting differs between embodiments, the identification of the relevant users and the data preparation are the same. At least the following parts are included:
  • Part 1: data storage and cleaning.
  • FIG. 10 is a structural diagram of a user identification system according to an embodiment of the present invention.
  • As shown in FIG. 10, a computer cluster and a distributed file system (Hadoop Distributed File System, HDFS) serve as the first layer, which stores the original records.
  • Call data is pulled into the cluster by a parallel data acquisition module. A data cleaning pipeline is built on top of HDFS, the data from the most recent 30 days is placed in a database system with faster response, and the subsequent processing modules are built on top of the database.
  • When a processing module needs data older than 30 days, it can access HDFS directly.
  • Step 1 Using the parallel data acquisition module, the original record of the call data is pulled into the distributed file system of the computer cluster.
  • Step 2: A MapReduce data pre-processing pipeline job converts the original call data into the required schema, for example by removing redundant information and encrypting the mobile phone numbers. The processed data is then stored in the database system. The amount of data loaded into the database can be controlled according to system load; considering both the load capacity of the database and the required task processing speed, the database generally holds at least 30 days of data. Storage can be optimized in many ways, such as partitioning by date or compressing the data.
  • the data cleaning and encryption module performs certain de-redundancy and encryption processing on the acquired original call data. This module is included on the server side.
  • The original call data contains many fields that are irrelevant here, such as roaming status and the International Mobile Equipment Identity (IMEI) of the user's phone; in total there are more than twenty attribute fields.
  • the actual fields used are very limited, including base station information and call record information.
  • the field of the base station information includes: the latitude and longitude and the number of the base station; the fields of the call record information include: the encrypted mobile phone number, the mobile phone number of the opposite end, the call time, and the base station number.
  • the business logic module includes the delineation of the abnormal user scope introduced in the previous section, the OD identification based on the call data, the social relationship identification, the user feature recognition, and the subsequent sorting process. This will be explained in conjunction with the embodiment.
  • In this situation the relevance weights can be set, from largest to smallest, as: number of appearances at the cases, probability that the OD falls within the case ring regions, gender and age, trajectory relevance, and social relationship relevance. For example, the weights of these five relevance parameters can be set to 90, 80, 70, 60, and 50 respectively.
  • Step 1: According to the times and places of the multiple cases, and using the trajectory prediction module, delineate the set P of mobile phone users who may have been present in those time periods and regions.
  • Step 2: Count the number of times α that each user in P appears in the sets related to the cases.
  • Step 3: Perform OD identification for the users in P to identify each user's OD points.
  • Step 4: Calculate the probability β that each user's OD in P falls within the ring regions of the cases.
  • FIG. 11 is a schematic diagram of a geographic profile according to Embodiment 1 of the present invention. As shown in FIG. 11, multiple large circular regions are set.
  • The center of each circle represents the scene of one case.
  • A user whose OD falls in the intersection of several large circles, that is, the area filled with small triangles in FIG. 11, is more suspicious.
  • The probability β that each user's OD in P falls in the intersection of several large circles is calculated.
  • The inner and outer radii of the ring region can be set from statistics of the distance between the suspect's OD and the case location in previously solved cases. For example, the distances are sorted in descending order, that is, arranged in a queue from farthest to nearest, and the sorted distance values are divided into two parts.
  • Optionally, the queue of distance values is split into two equal halves.
  • The outer radius is then the average of all the distances in the first half of the queue,
  • and the inner radius is the average of all the distances in the second half.
  • Alternatively, the average of all the distances is increased and decreased by a factor of 0.5: the average V1 of all distances between suspect OD and case location is computed, 1.5V1 is used as the outer radius, and 0.5V1 is used as the inner radius. The median can likewise be taken and increased and decreased by a factor of 0.5, and so on.
  • Step 5: Calculate the correlation γ between the trajectory of each user in P and that of the victim during the N hours before the case occurred.
  • Before a case occurs, the perpetrator usually follows the victim, and some time passes between the start of the tailing and the occurrence of the case.
  • For example, when the case occurs in the morning (such as 8:00-12:00), N is usually greater than 2 and less than 10; when the case occurs in the afternoon (such as 12:00-17:00), N is usually greater than 2 and less than 20.
  • Step 6: Look up the gender and age information δ of the users in P. If the database has no record for a user, the user is identified with a model already trained by machine learning, and the result is stored in the database.
  • Step 7 Identify the social relationships of the users in P and find out the social relationship set of each user.
  • a caching mechanism is introduced by building a database of the user's social relationships, OD, gender, age, and the like. That is, when there are related records of the user in the database, the corresponding results are directly taken out from the database; and when the database does not have related records of the users, the model trained in the machine learning module is called to identify the social relationships of the users, and The results are stored in the database for later use.
  • Step 8: Calculate the degree of correlation ε between each user in P and the victim in terms of social relationships.
  • Step 9: Sort P in descending order by α, β, γ, δ, ε in turn, and display the sorting result. A combined relevance ranking is then calculated from the weights.
  • FIG. 12 is a flowchart of user identification according to an embodiment of the present invention; see FIG. 12 for the user identification process.
  • Besides setting the relevance weights of the different factors from an analysis of the scenario, as in Embodiment 1, some factors may also be ignored, as shown in Embodiment 2 and Embodiment 3 below.
  • the victim reported that he was defrauded on an antique street and purchased fake antiques at a high price. According to the description of the victim, he has been persuaded by many people, and it is suspected that many people cooperate in groups that play their respective roles. However, only by the description of the victim, the police could not obtain enough features to confirm the suspect.
  • In this scenario the suspects' trajectories are highly similar to the victim's trajectory, so trajectory relevance has a large influence.
  • From the perspective of social relationships, the members of the gang telephone one another frequently, so they are likely to belong to one another's social networks, that is, their social relationship relevance will be high; other factors have less influence. In such cases important information can often be obtained from the two factors of trajectory and social relationships.
  • Compared with Embodiment 1, the calculation in Embodiment 2 can omit the gender/age and OD modules.
  • Step 1: Determine the time and place of the abnormal event from the victim's description. Using the trajectory prediction module, delineate the set P of related users for that time period and region.
  • Step 2: Calculate the trajectory similarity γ between each user in P and the victim.
  • Step 3: Look up the social relationships of the users in P in the database. If the database has no record for a user, the social relationship identification module identifies the user's social relationships with the model obtained by machine learning, and the result is stored in the database.
  • Step 4: Calculate the social relationship relevance ε between each user in P and the victim.
  • Step 5: Look up the gender and age information of the users in P and the probability δ that each matches "male, aged 18-40". If the database has no record for a user, the user is identified with a model already trained by machine learning, and the result is stored in the database.
  • Step 6: Sort the users in P by social relationship relevance, trajectory relevance, age, and gender in turn, and return the sorting result.
  • The relevance parameter weights, from highest to lowest, are social relationship relevance, probability of matching gender and age, and trajectory relevance; for example, the weights are set to 90, 80, and 40 respectively. A combined relevance ranking is then calculated from the weights.
  • The process diagram for this embodiment is the same as FIG. 12; the only difference is the sorting priority assigned to the influencing factors after analysis of the scenario.
  • In summary, the embodiments of the present invention systematically automate the identification and ranking of potential user groups for abnormal events, covering everything from data cleaning to the identification and ranking of potentially related user groups, and form a system solution that can be operated and implemented as a whole.
  • the embodiments of the present invention provide a novel and operative solution in terms of the definition of the potential user group, the correlation of the spatial behavior, and the sorting of the three factors of comprehensive domain knowledge, social relationship and spatial behavior.
  • To keep the communication system and its functions running normally, wireless operators store a large amount of communication-related data, such as logs of calls, short messages, and power-on/power-off events, generally organized by the spatial location of the base station.
  • the development of data mining technology enables the value of data to be effectively presented.
  • the user's commute OD, social relationship and trajectory pattern can be identified. This information is of great significance for exploring the relevant degree of relevant personnel in abnormal events.
  • the user's commute OD that is, home and work place, is the product of industrial social development and the basic mode of user movement law.
  • Combined with relevant domain knowledge, such as criminal geographic profiling, OD information reflects the relevance of users to an abnormal event under the domain model. Community discovery techniques identify users' social relationships from the data, and these relationships can be used to examine a user's relevance to the abnormal event. Analyzing the correlation between users and abnormal events from spatial behavior, based on users' trajectory information, and more broadly from the three aspects of domain knowledge, social relationships, and spatial behavior, based on mobile call data, is important for handling abnormal events promptly. Compared with traditional manual investigation, more comprehensive data and big data techniques allow the key and priority scope of investigation to be determined faster, thereby optimizing the deployment of personnel. In public security criminal investigation, this helps the police seize the critical window for solving a case; as is well known, racing against time in criminal investigation means resolving crises, saving lives, and maintaining social justice and peace.
  • A storage medium is further provided, in which the foregoing software is stored; the storage medium includes but is not limited to an optical disk, a floppy disk, a hard disk, an erasable memory, and the like.
  • The modules or steps of the embodiments of the present invention can be implemented by general-purpose computing devices, either concentrated on a single computing device or distributed across multiple computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases the steps may be performed in an order different from the one shown or described. They may also be fabricated into individual integrated circuit modules, or a plurality of the modules or steps may be fabricated into a single integrated circuit module.
  • FIG. 13 is a schematic structural diagram of a hardware of a device according to an embodiment of the present invention. As shown in FIG. 13, the device includes:
  • One or more processors 810, one processor 810 is taken as an example in FIG. 13;
  • the device may also include an input device 830 and an output device 840.
  • The processor 810, the memory 820, the input device 830, and the output device 840 in the device may be connected by a bus or by other means; a bus connection is taken as the example in FIG. 13.
  • the memory 820 is used as a computer readable storage medium, and can be used to store a software program, a computer executable program, and a module, such as a program instruction/module corresponding to the determining method of the user equipment in the embodiment of the present invention (for example, as shown in FIG. 2
  • the processor 810 executes various functional applications and data processing of the server by executing software programs, instructions, and modules stored in the memory 820, that is, a determining method of the user equipment of the above method embodiment.
  • The memory 820 may include a program storage area and a data storage area. The program storage area may store an operating system and the applications required for at least one function; the data storage area may store data created through use of the terminal device, and the like.
  • memory 820 can include high speed random access memory, and can also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device.
  • memory 820 can optionally include memory remotely located relative to processor 810, which can be connected to the terminal device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
  • Input device 830 can be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function controls of the terminal.
  • the output device 840 can include a display device such as a display screen.
  • The one or more modules are stored in the memory 820 and, when executed by the one or more processors 810, perform the method for determining user equipment of the above method embodiment.
  • The method for determining user equipment solves the problem in the related art that relevant users cannot be identified through mobile devices, thereby identifying users efficiently and quickly, optimizing the deployment of personnel, and saving human resources.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Telephonic Communication Services (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

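Embodiments of the present invention provide a method and apparatus for determining user equipment. The method includes: determining potential user equipment located within a specified spatial range during a specified time period; acquiring association information corresponding to the potential user equipment; and determining specified user equipment from among the potential user equipment according to the association information. The invention solves the problem in the related art that relevant users cannot be identified through mobile devices, thereby identifying users efficiently and quickly, optimizing the deployment of personnel, and saving human resources.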

Description

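Method and Apparatus for Determining User Equipment
Technical Field
This application relates to the field of communications, for example to a method and apparatus for determining user equipment.
Background
In the field of public safety, when an abnormal event occurs, security personnel must investigate the event, find the people related to it among a large population, and look for leads by investigating those people. With the development of technology and the rapid spread of mobile phones, the value of mobile call data has become increasingly apparent. Identifying the people related to an abnormal event from such data, so as to guide the investigative work of security personnel, is of great significance for public safety.
No effective solution has yet been proposed for the problem in the related art that relevant users cannot be identified through mobile devices.
Summary
Embodiments of the present invention provide a method and apparatus for determining user equipment, so as to at least solve the problem in the related art that relevant users cannot be identified through mobile devices.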
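An embodiment of the present invention provides a method for determining user equipment, including: determining potential user equipment located within a specified spatial range during a specified time period; acquiring association information corresponding to the potential user equipment; and determining specified user equipment from among the potential user equipment according to the association information.
Optionally, determining the potential user equipment located within the specified spatial range during the specified time period includes: acquiring location information of first-type user equipment and, when the first-type user equipment is within the specified spatial range during the specified time period, taking the first-type user equipment as the potential user equipment; and acquiring trajectory information of second-type user equipment and, when the trajectory information indicates presence within the specified spatial range during the specified time period, taking the second-type user equipment as the potential user equipment.
Optionally, acquiring the trajectory information of the second-type user equipment includes: mining the movement regularity of the user corresponding to the second-type user equipment from that user's historical call record information; and determining the trajectory information of the second-type user equipment according to the movement regularity.
Optionally, determining the potential user equipment located within the specified spatial range during the specified time period includes: acquiring the discrete entropy of the user corresponding to the second-type user equipment; when the discrete entropy is less than a predetermined threshold, obtaining the user's movement regularity from the user's historical call record information and determining the second-type user equipment according to the movement regularity; and when the discrete entropy is greater than or equal to the predetermined threshold, determining the second-type user equipment according to the historical call information of all users in the database.
Optionally, the association information includes at least one of: information on the place of residence and place of work of the user corresponding to the potential user equipment, social relationship information of the user corresponding to the potential user equipment, and demographic information of the user corresponding to the potential user equipment.
Optionally, acquiring the social relationship information of the user corresponding to the potential user equipment includes acquiring it according to at least one of: time characteristics of the calls made by the user equipment, information on other user equipment located at the same position at the same time as the user equipment, and information on other user equipment sharing common contacts with the user equipment.
Optionally, determining the specified user equipment from among the potential user equipment according to the association information includes: acquiring the weights of the specified items of information included in the association information; ranking the multiple user equipments among the potential user equipment according to the weights; and determining the user equipment at a predetermined position in the ranking as the specified user equipment.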
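An embodiment of the present invention further provides an apparatus for determining user equipment, including: a first determining module, configured to determine potential user equipment located within a specified spatial range during a specified time period; an acquiring module, configured to acquire association information corresponding to the potential user equipment; and a second determining module, configured to determine specified user equipment from among the potential user equipment according to the association information.
Optionally, the first determining module includes: a first acquiring unit, configured to acquire location information of first-type user equipment and, when the first-type user equipment is within the specified spatial range during the specified time period, take the first-type user equipment as the potential user equipment; and a second acquiring unit, configured to acquire trajectory information of second-type user equipment and, when the trajectory information indicates presence within the specified spatial range during the specified time period, take the second-type user equipment as the potential user equipment.
Optionally, the second acquiring unit includes: a mining subunit, configured to mine the movement regularity of the user corresponding to the second-type user equipment from that user's historical call record information; and a determining subunit, configured to determine the trajectory information of the second-type user equipment according to the movement regularity.
Optionally, the first determining module is further configured to acquire the discrete entropy of the user corresponding to the second-type user equipment; when the discrete entropy is less than a predetermined threshold, to obtain the user's movement regularity from the user's historical call record information and determine the second-type user equipment according to the movement regularity; and when the discrete entropy is greater than or equal to the predetermined threshold, to determine the second-type user equipment according to the historical call information of all users in the database.
An embodiment of the present invention further provides a non-volatile computer-readable storage medium storing computer-executable instructions for performing the above method.
An embodiment of the present invention further provides a device including one or more processors, a memory, and one or more programs; the one or more programs are stored in the memory and, when executed by the one or more processors, perform the above method.
In the embodiments of the present invention, potential user equipment located within a specified spatial range during a specified time period is determined; association information corresponding to the potential user equipment is acquired; and specified user equipment is determined from among the potential user equipment according to the association information. This solves the problem in the related art that relevant users cannot be identified through mobile devices, so that users are identified quickly and human resources are saved.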
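Brief Description of the Drawings
The drawings described here provide an understanding of the embodiments of the present invention and form part of this application. The illustrative embodiments of this application and their descriptions explain the embodiments of the present invention and do not unduly limit them. In the drawings:
FIG. 1 is a flowchart of a method for determining user equipment according to an embodiment of the present invention;
FIG. 2 is a structural block diagram of an apparatus for determining user equipment according to an embodiment of the present invention;
FIG. 3 is a structural block diagram (1) of an apparatus for determining user equipment according to an embodiment of the present invention;
FIG. 4 is a structural block diagram (2) of an apparatus for determining user equipment according to an embodiment of the present invention;
FIG. 5 is a flowchart of a call-data-based abnormal user ranking method according to an embodiment of the present invention;
FIG. 6 is a flowchart of trajectory prediction according to an embodiment of the present invention;
FIG. 7 is a flowchart of OD identification from call data according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a criminological geographic profile according to an embodiment of the present invention;
FIG. 9 is a flowchart of a sorting module according to an embodiment of the present invention;
FIG. 10 is a structural diagram of a user identification system according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of a geographic profile according to Embodiment 1 of the present invention;
FIG. 12 is a flowchart of user identification according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of the hardware structure of a device according to an embodiment of the present invention.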
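Detailed Description
The present invention is described in detail below with reference to the drawings and in conjunction with the embodiments. Note that, where no conflict arises, the embodiments of this application and the features of the embodiments may be combined with one another.
This embodiment provides a method for determining user equipment. FIG. 1 is a flowchart of the method according to an embodiment of the present invention. As shown in FIG. 1, the flow includes the following steps:
Step S102: determine potential user equipment located within a specified spatial range during a specified time period;
Step S104: acquire association information corresponding to the potential user equipment;
Step S106: determine specified user equipment from among the potential user equipment according to the association information.
Through the above steps, the specified user equipment can be determined from among the many potential user equipments according to their association information, and the user corresponding to the specified user equipment can thus be determined. Compared with traditional manual screening, these steps solve the problem in the related art that relevant users cannot be identified through mobile devices, so that users are identified quickly and human resources are saved.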
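Step S102 above involves determining the potential user equipment located within the specified spatial range during the specified time period. In one embodiment, location information of first-type user equipment is acquired, and when the first-type user equipment is within the specified spatial range during the specified time period, it is taken as potential user equipment. In another optional embodiment, trajectory information of second-type user equipment is acquired, and when the trajectory information indicates presence within the specified spatial range during the specified time period, the second-type user equipment is taken as potential user equipment. In this way, both the user equipment observed within the specified spatial range during the specified time period and the user equipment inferred from trajectory information to have been there are counted as potential user equipment.
In acquiring the trajectory information of the second-type user equipment, in one optional embodiment, the movement regularity of the user corresponding to the second-type user equipment is mined from that user's historical call record information, and the trajectory information of the second-type user equipment is determined according to the movement regularity.
When determining the second-type user equipment located within the specified spatial range during the specified time period, in one optional embodiment, the discrete entropy of the user corresponding to the second-type user equipment is acquired, and when the discrete entropy is less than a predetermined threshold, the user's movement regularity is obtained from the user's historical call record information and the second-type user equipment is determined according to the movement regularity. In another optional embodiment, when the discrete entropy is greater than or equal to the predetermined threshold, the second-type user equipment is determined according to the historical call information of all users in the database. The second-type user equipment is thereby determined with reasonable accuracy.
The association information may include several kinds of information, as illustrated below. In one optional embodiment, the association information may be information on the place of residence and place of work of the user corresponding to the potential user equipment, social relationship information of that user, or demographic information of that user. The demographic information may be the user's age, gender, and the like.
In one optional embodiment, the user's social relationship information is acquired according to at least one of: time characteristics of the calls made by the user equipment, information on other user equipment located at the same position at the same time as the user equipment, and information on other user equipment sharing common contacts with the user equipment. The social relationship information of the user corresponding to the potential user equipment can thus be obtained.
There may be a large number of potential user equipments, so one or more specified user equipments must be determined from among them. In one optional embodiment, the weights of the specified items of information included in the association information are acquired, the multiple user equipments among the potential user equipment are ranked according to the weights, and the user equipment at a predetermined position in the ranking is determined as the specified user equipment. This completes the determination of the specified user equipment from among the potential user equipment according to the association information.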
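This embodiment further provides an apparatus for determining user equipment. The apparatus can implement the above embodiments and optional implementations; what has already been described is not repeated. As used below, the term "module" may refer to a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
FIG. 2 is a structural block diagram of an apparatus for determining user equipment according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes: a first determining module 22, configured to determine potential user equipment located within a specified spatial range during a specified time period; an acquiring module 24, configured to acquire association information corresponding to the potential user equipment; and a second determining module 26, configured to determine specified user equipment from among the potential user equipment according to the association information.
FIG. 3 is a structural block diagram (1) of an apparatus for determining user equipment according to an embodiment of the present invention. As shown in FIG. 3, the first determining module 22 includes: a first acquiring unit 222, configured to acquire location information of first-type user equipment and, when the first-type user equipment is within the specified spatial range during the specified time period, take the first-type user equipment as potential user equipment; and a second acquiring unit 224, configured to acquire trajectory information of second-type user equipment and, when the trajectory information indicates presence within the specified spatial range during the specified time period, take the second-type user equipment as potential user equipment.
FIG. 4 is a structural block diagram (2) of an apparatus for determining user equipment according to an embodiment of the present invention. As shown in FIG. 4, the second acquiring unit 224 includes: a mining subunit 2242, configured to mine the movement regularity of the user corresponding to the second-type user equipment from that user's historical call record information; and a determining subunit 2244, configured to determine the trajectory information of the second-type user equipment according to the movement regularity.
Optionally, the first determining module 22 may further be configured to acquire the discrete entropy of the user corresponding to the second-type user equipment; when the discrete entropy is less than a predetermined threshold, to obtain the user's movement regularity from the user's historical call record information and determine the second-type user equipment according to the movement regularity; and when the discrete entropy is greater than or equal to the predetermined threshold, to determine the second-type user equipment according to the historical call information of all users in the database.
Note that the above modules can be implemented in software or hardware. In the latter case, this may be done in the following ways, among others: the modules are all located in the same processor; or the modules are located in separate processors, for example a first processor, a second processor, and a third processor.
In view of the above problems in the related art, a description follows in conjunction with embodiments; the optional embodiments below combine the above optional embodiments and their optional implementations.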
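This optional embodiment draws on developments in computer technology and data science and on background knowledge from the public safety field, and uses social network analysis to provide a digital method for screening the people involved in an abnormal event: it identifies the users related to the event and, using domain knowledge, ranks them by relevance. Compared with manual screening in the related art, the method responds faster and covers more of the population, and can effectively assist the screening work of security personnel. Because mobile phones are so widespread, mobile call data is massive and covers the great majority of a city's population. With data mining and social network analysis, these call data can be analyzed to obtain the origin and destination (OD) of users' trips (here OD refers specifically to the commuting OD, that is, the origin and destination of trips to and from work, namely home and workplace), as well as social relationships and trajectory patterns. This information provides the basis for defining the relevance of the people involved to the abnormal event. FIG. 5 is a flowchart of the call-data-based abnormal user ranking method according to an embodiment of the present invention. As shown in FIG. 5, the raw data is cleaned and encrypted; trajectory prediction, OD identification, social relationship identification, and user feature identification are performed; and the processed data is ranked by relevance, yielding an abnormal-user ranking based on call data.
This optional embodiment combines call-data-based OD identification, social relationship identification, and user feature identification to provide a method for identifying and ranking the users related to an abnormal event. The main steps are:
Step 1. Data pre-processing.
This part converts the collected raw call data into the required format. First, the required attribute fields are extracted from the raw call data: the user identifier (the encrypted mobile phone number), the location of the call base station (that is, the base station identifier), and the call time. User privacy is an important issue with call data, so the user's phone number must be encrypted to generate a key used only to identify the user. Raw call data is usually massive and redundant; selecting only the required data during pre-processing effectively reduces the data volume and improves the efficiency of later processing.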
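Step 2. Delineating the scope of abnormal users based on trajectory prediction.
FIG. 6 is a flowchart of trajectory prediction according to an embodiment of the present invention, described below. When an abnormal event occurs, potentially related users are those within a certain spatial range [s1, s2] during a certain time range [t1, t2] related to the event. Because call data is not continuous in real time, however, user equipment that did not expose its location during the time period of the event but may have been within the space [s1, s2] during [t1, t2] should also be included in the potential user group. A trajectory prediction module is therefore introduced to handle the user equipment held by such users. Trajectory prediction proceeds as follows:
For users with large discrete entropy, a crowd trajectory prediction model (Crowd Trajectory Predictor, CTP) is used, which can be implemented with a dynamic Bayesian network. For users with small discrete entropy, an individual trajectory prediction model (Individual Trajectory Predictor, ITP) is used.
A threshold can be preset and the user's discrete entropy compared with it: when the discrete entropy is greater than or equal to the preset threshold, the user's discrete entropy is deemed large; when it is less than the preset threshold, the user's discrete entropy is deemed small.
Discrete entropy can be used to measure how predictable a user is, and is defined as follows: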
Figure PCTCN2016082927-appb-000001
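Here i is the index of the base station and takes positive integer values, n is a positive integer greater than or equal to 1, Ri is the base station identifier, and p(Ri) is the frequency with which the user appears within the coverage area of that base station. The larger the discrete entropy, the less regular the user's movement.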
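The definition above can be illustrated with a minimal sketch. The formula itself survives only as an image, so the sketch assumes the standard Shannon form, H = -Σ p(Ri)·log2 p(Ri) for i = 1..n, with p(Ri) taken as the share of the user's records observed at base station Ri; the base-2 logarithm, the function names, and the threshold value in the usage note are assumptions made for illustration.

```python
import math
from collections import Counter


def discrete_entropy(station_ids):
    """Entropy (in bits) of the base stations a user was observed at.

    Assumption: p(R_i) is the fraction of the user's records seen at
    station R_i, and the entropy has the Shannon form -sum(p * log2(p)).
    """
    counts = Counter(station_ids)
    total = sum(counts.values())
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy


def choose_predictor(entropy, threshold):
    """Pick the individual model (ITP) below the threshold, else the crowd model (CTP)."""
    return "ITP" if entropy < threshold else "CTP"
```

For example, a user always seen at one base station has entropy 0 and would be handled by the ITP, whereas with a hypothetical threshold of 1.2 a user spread over many stations would be routed to the CTP.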
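A data structure UltraPattern = <h1,R1><h2,R2>…<hn,Rn> is defined, where hi is a time slice (by default the 24 hours of a day are divided evenly into 24 time slices of one hour each) and Ri is a base station identifier. This data structure can represent a user's movement trajectory.
The algorithm is implemented as follows:
Input: the ID of the user to predict, and the date and time point of the abnormal event to predict.
Output: base station identifiers representing where the user may be at the prediction time point, sorted by support.
According to the input, the call records for the two hours preceding the prediction time point on the predicted date of the abnormal event are retrieved from the database. These call records include base station location information, namely the identifier of the base station serving the user's current location, which serves as the prediction basis.
If the prediction basis is empty, the individual prediction model is used: the user's personal historical mobile call data is processed to mine the regularity of the user's movement, and the historical call data is compressed into a two-dimensional array UltraPattern[24][7], which represents a call record array with a one-week period, 24 hours per day, and one call record per hour. Prediction: the identifiers of the base stations at all locations corresponding to the prediction time point are taken as candidate predicted locations and sorted by support. Prediction ends.
The two-dimensional array UltraPattern[24][7] records the identifiers of the one or more base stations where the user is located at the prediction time point, meaning that the user is relatively likely to appear within the coverage areas of those base stations at that time.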
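If the prediction basis is not empty, the user's discrete entropy is computed from the user's historical information.
If the user's discrete entropy is below the preset threshold, that is, the user's movement is strongly periodic, the individual prediction model UltraPattern[24][7] is built. Prediction: the prediction basis is matched in UltraPattern[24][7] to obtain the location at the next time slot as the prediction result; if no path matches completely, the crowd prediction model is used.
If the user's discrete entropy is greater than or equal to the preset threshold, the crowd prediction model is used. Within one city the number of base station locations is limited and human activity shares a good deal of common structure, so the user's next location can be predicted from the historical movement information of all users in the database. The crowd prediction model is trained first: the day is divided into 24 time slices, and the historical call records of all users in the database are used to train 24 transition prediction matrices. The horizontal and vertical coordinates of each matrix are base station serial numbers, and the serial number on the horizontal coordinate may be the same as or different from the one on the vertical coordinate.
For example, from the time period 0:00:00-0:59:59 (the hour-0 time slice) to the time period 1:00:00-1:59:59 (the hour-1 time slice), a transition prediction matrix A0 is trained. Its element a0(i,j) in row i and column j is the probability that a user who is within the coverage area of base station i during the hour-0 time slice is within the coverage area of base station j during the hour-1 time slice. Prediction: given the prediction basis, that is, the user's known current location, values are read from the matrix for the corresponding time. The row whose horizontal coordinate is the serial number of the base station at the user's current location is examined, its several largest values are taken in descending order, the vertical coordinates (base station serial numbers) of those values are taken in that order, and the serial numbers are converted to base station identifiers and returned as the prediction result. Prediction ends.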
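The prediction logic described above can be sketched as follows. This is a simplified illustration rather than the patented implementation: UltraPattern[24][7] is modelled as per-slot counters of base stations, the 24 transition matrices are stored as sparse counts instead of dense probability matrices, "matching the prediction basis" is reduced to looking up the target slot, and all function and parameter names are hypothetical.

```python
from collections import Counter, defaultdict


def build_ultra_pattern(history):
    """history: iterable of (hour, weekday, station_id) for ONE user.

    Returns pattern[hour][weekday] -> Counter of stations (support counts).
    """
    pattern = [[Counter() for _ in range(7)] for _ in range(24)]
    for hour, weekday, station in history:
        pattern[hour][weekday][station] += 1
    return pattern


def predict_individual(pattern, hour, weekday):
    """Candidate stations for one time slot, ordered by support."""
    return [station for station, _ in pattern[hour][weekday].most_common()]


def build_transition_counts(trips):
    """trips: iterable of (hour, station_from, station_to) over ALL users,
    meaning: at station_from in slot `hour`, at station_to in the next slot.

    Returns 24 sparse matrices: counts[hour][station_from] -> Counter.
    """
    counts = [defaultdict(Counter) for _ in range(24)]
    for hour, src, dst in trips:
        counts[hour][src][dst] += 1
    return counts


def predict_crowd(counts, hour, current_station, top_k=3):
    """Top-k next stations with estimated transition probabilities."""
    row = counts[hour].get(current_station, Counter())
    total = sum(row.values())
    return [(dst, n / total) for dst, n in row.most_common(top_k)]


def predict(pattern, counts, entropy, threshold, hour, weekday,
            current_station=None):
    """Predict stations for slot (hour, weekday).

    current_station is the prediction basis (last known station in the
    previous slot); None means the basis is empty.
    """
    if current_station is None or entropy < threshold:
        candidates = predict_individual(pattern, hour, weekday)
        if candidates or current_station is None:
            return candidates
    # High entropy, or no match in the personal pattern: use the crowd model.
    previous_hour = (hour - 1) % 24
    return [s for s, _ in predict_crowd(counts, previous_hour, current_station)]
```

Storing counts rather than normalized probabilities keeps training incremental; the probabilities are only computed for the single row that a query touches.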
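Step 3. OD identification, social relationship identification, and user feature identification based on call data.
Using call data to identify a user's commuting OD, social relationships, and features (such as age and gender) yields a social portrait of the user. How closely these features relate to the party to the abnormal event gives investigators useful information.
a) OD identification
With the development of industrial society, the working and living patterns of urban populations have a certain regularity, and the commuting OD (place of residence and place of work) is the most basic travel pattern. The related art includes simple and effective methods for mining commuting OD from call data.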
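The detailed algorithm flow of OD identification in the related art is as follows:
A method for discovering the commuting OD of a large population from mobile phone call data:
Input: the call data T of each user, T = {<phone number, call base station, call time>}, where each user's call data T includes the user's (encrypted) phone number, the identifier of the call base station, and the time of the call within that base station's coverage area.
Output: each user's residence and workplace anchor points, that is, the OD anchor points.
Spatio-temporal improvement method:
1. Divide each user's call data into two sets, Tday and Tnight, representing daytime and nighttime call data.
2. Compile statistics for the Tday and Tnight call data separately.
3. Partition the call data by call base station, so that each call base station has an associated call count.
4. (Spatial improvement) Sort the call base stations by call count in descending order, then spatially merge the sorted base stations to form new call location points.
5. (Temporal improvement) Compute the call frequency of each call location point according to call periodicity.
6. (Filtering) Filter the call location points by call frequency, deleting those with sparse calls.
A call frequency threshold may be preset; when the call frequency of a location point is below this threshold, calls at that point are deemed sparse.
7. Take the location points with the highest call frequency in the Tday and Tnight call data as D and O, that is, workplace and residence.
8. Output each user's commuting OD.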
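FIG. 7 is a flowchart of OD identification from call data according to an embodiment of the present invention. As shown in FIG. 7, the flow includes the following steps:
Step S702: pre-process the call data;
Step S704: compile statistics on the call frequency at each call location point;
Step S706: spatial merging optimization;
Step S708: temporal merging optimization.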
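The OD flow above can be sketched in a few lines. The sketch is deliberately reduced: it skips the spatial merging of neighbouring base stations, uses the raw call count in place of a periodicity-based call frequency, and assumes that "daytime" means 08:00-17:59. The function name, the hour range, and the `min_calls` threshold are all hypothetical.

```python
from collections import Counter


def identify_od(records, day_hours=range(8, 18), min_calls=2):
    """records: iterable of (station_id, hour_of_day) for ONE user.

    Returns (O, D): the busiest night-time station (residence) and the
    busiest daytime station (workplace); None when every station in the
    respective set is too sparse.
    """
    day, night = Counter(), Counter()
    for station, hour in records:
        if hour in day_hours:
            day[station] += 1
        else:
            night[station] += 1

    def busiest(counter):
        # Filtering step: drop stations with fewer than min_calls calls.
        kept = {s: n for s, n in counter.items() if n >= min_calls}
        return max(kept, key=kept.get) if kept else None

    return busiest(night), busiest(day)
```

In use, `identify_od(records)` would be called once per user on that user's cleaned call records, and the resulting pair stored for the later ranking steps.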
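b) Social relationship identification
Users' social relationships are divided into three classes: family, colleagues, and others. Users in different relationships differ in the temporal and spatial distribution of their calling behavior and locations. From the call data and OD information, features related to call time, time spent together (combined with OD information), and group information (such as common contacts) are extracted, and a classification model (such as a decision tree or random forest) is used to identify the relationship between users. Social relationship identification is used to define the social relationship relevance between a user and the party to the abnormal event. The identification process is as follows:
Social relationships are divided into three broad classes: family, colleagues, and others. Two users who are in contact are organized into a user pair, and relationship identification becomes a classification problem: classifying the pair as family, colleagues, or other. A decision tree model is used, and three kinds of features are extracted:
Call time features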
Figure PCTCN2016082927-appb-000002
Figure PCTCN2016082927-appb-000003
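Here, working hours may refer to 8:00-12:00 and 13:00-17:00 from Monday to Friday; evening may refer to 17:00-19:00 Beijing time; night may refer to 19:00-23:00 Beijing time; and late night may refer to 23:00 to 3:00 the next day, Beijing time.
Time-together gain features
Feature name | Description
Average weekday time together | On normal working days, the average time per day that the user pair spends together
Average weekend time together | On weekends, the average time per day that the user pair spends together
Weekend time gain | TΔ, the change in the user pair's average time together between weekends and weekdays
Group structure features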
Figure PCTCN2016082927-appb-000004
Figure PCTCN2016082927-appb-000005
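c) User feature identification
Call data may not include information such as the user's age and gender. The number-application records of some telecom operators may contain fairly complete user information, but such information is subject to strict privacy requirements, and its completeness and authenticity cannot be guaranteed. Statistics show that users of different genders or age groups differ somewhat in their calling habits. Relevant feature values can therefore be extracted and a classification model (decision tree, random forest, and so on) used to identify a user's gender and age. Age and gender information helps when assessing social relationship similarity. For example, the conclusions of some empirical case analyses indicate that in intentional homicide cases the relationship between offender and victim differs by gender and shows "overlap" in age; the reference is [1]. 《中国人民公安大学学报:社科版》 (Journal of the Chinese People's Public Security University, Social Sciences Edition), 2006, No. 2, 《故意杀人案件中加害人与被害人关系的实证分析》 (an empirical analysis of the offender-victim relationship in intentional homicide cases), by 高维俭 and 查国防.
The user feature identification process is as follows:
Information such as gender and age is deficient in real data, and using machine learning to identify gender and age can compensate for this to some extent. Age is divided into three groups (18-25, 26-40, 41-60), which turns age identification into a multi-class classification problem. Classification labels for some of the users (a label may be the user's gender classification and/or age classification) can be obtained from an external system (for example, a customer relationship management system), and multiple call feature values for all users are then extracted from the call data (these may include the feature values shown in the following table). A supervised learning method (for example, a decision tree model) is then used to determine the classification labels of the other users, whose labels were not obtained from the external system.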
Figure PCTCN2016082927-appb-000006
Figure PCTCN2016082927-appb-000007
Figure PCTCN2016082927-appb-000008
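Step 4. Ranking process
The ranking step has three parts: ranking by spatial relationship, by social relationship, and by domain model, which together analyze the degree of correlation between users and the abnormal event from three perspectives.
a) Ranking by spatial relationship
In ranking by spatial relationship, the relationship between the spatial behavior of a user and that of the party to the abnormal event is examined from two angles. First, users whose trajectories are highly similar to that of the party to the event within a certain time period are more suspicious. Second, a user whose spatial behavior on the day of the event differs markedly from past behavior is more suspicious. The concept of a "text vector" and the cosine similarity measure from text information retrieval are borrowed to handle user trajectories and the similarity between them.
In the field of text information retrieval, a document is often organized into a document vector whose elements are the numbers of occurrences of each term in the document (or their TF/IDF values), and cosine similarity is used to return similar documents. The theoretical model of cosine similarity is as follows:
For two vectors α and β, the smaller the angle between them, the higher their similarity. By the law of cosines, the cosine of the angle can be related to the two vectors: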
Figure PCTCN2016082927-appb-000009
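The user's trajectory is organized as a vector whose elements are the average numbers of times the user appears at each base station. The cosine similarity s1 between the user's trajectory and that of the party to the abnormal event, and the cosine similarity s2 between the user's past spatial vector and the spatial vector for the day of the event, are computed; the suspiciousness of the user's spatial behavior is then: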
Figure PCTCN2016082927-appb-000010
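The two similarities s1 and s2 can be computed as in the following sketch, which assumes the standard cosine definition, cos θ = (α·β) / (|α||β|), and represents each trajectory vector as a sparse dictionary from base station identifier to average appearance count. The function name is hypothetical, and the sketch stops at s1 and s2: how they are combined into the suspiciousness score is given only in the figure above and is not reproduced here.

```python
import math


def cosine_similarity(a, b):
    """a, b: dicts mapping station_id -> average number of appearances.

    Stations missing from a dict are treated as zero. Returns 0.0 when
    either vector is all zeros.
    """
    dot = sum(value * b.get(station, 0.0) for station, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Here s1 would be `cosine_similarity(user_vector, party_vector)` and s2 would be `cosine_similarity(user_past_vector, user_event_day_vector)`, where all four vectors are built over the same time window granularity.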
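b) Ranking by social relationship
When an abnormal event occurs, filtering is first performed according to the time and location of the event to select the set of potentially suspicious users, that is, the set S of users who appeared within a certain range of that location within a certain time range. For each user in S, the overlap between the user's social relationships and those of the party to the abnormal event is examined and combined with the user's gender and age information to produce a ranking.
According to the relevant empirical analyses, 78.5% of intentional homicide cases occur between acquaintances, far more than the 21.5% that occur between strangers; see [2].Darcy Kim Rossmo,M.A.,Simon Fraser University,1987,Geographic profiling:target patterns of serial murderers. The more a related user's social relationships overlap with those of the party to the abnormal event, therefore, the more suspicious the user should be, and the more likely the user is to provide further information about the event. The empirical analysis also indicates that 80.9% of offenders are aged 18-44 and that, by gender, 85.9% of offenders are male and only 14.1% female; see [1]. 《中国人民公安大学学报:社科版》, 2006, No. 2, 《故意杀人案件中加害人与被害人关系的实证分析》, by 高维俭 and 查国防. Based on these empirical findings, the social-relationship-based abnormality ranking uses social circle overlap, gender, and age, in that order, as ranking criteria.
For the overlap of social relationships, the Jaccard similarity coefficient is used, which suits symbolic or Boolean measures: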
Figure PCTCN2016082927-appb-000011
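As a small illustration of the overlap measure, the sketch below uses the standard Jaccard definition, J(A, B) = |A ∩ B| / |A ∪ B|, applied to two sets of contact identifiers. Returning 0.0 when both sets are empty is an assumption made here so that users without any recorded contacts do not raise an error.

```python
def jaccard(contacts_a, contacts_b):
    """Jaccard similarity of two collections of contact identifiers."""
    a, b = set(contacts_a), set(contacts_b)
    union = a | b
    if not union:
        return 0.0  # assumption: no contacts on either side means no overlap
    return len(a & b) / len(union)
```

With contact sets {u1, u2, u3} and {u2, u3, u4}, two of the four distinct contacts are shared, so the coefficient is 0.5.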
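c) Ranking by domain model
According to the theory of criminological geographic profiling (see [2].Darcy Kim Rossmo,M.A.,Simon Fraser University,1987,Geographic profiling:target patterns of serial murderers), two assumptions are made about the offender's psychology:
A suspect will not commit a crime very close to a fixed activity point (home, workplace, and so on), because doing so not only risks exposure but also offers fewer targets;
The farther a place is from the offender's fixed activity points, the less likely the offender is to commit a crime there, because travel and escape become much less convenient.
For example, two activity radii r1 and r2 may be preset around the suspect's fixed activity point (home or workplace): the area within r1 represents places very close to the suspect's fixed activity point, and the area beyond r2 represents places far from it.
FIG. 8 is a schematic diagram of a criminological geographic profile according to an embodiment of the present invention. As shown in FIG. 8, r1 is the radius of the smaller circle centered on the suspect's fixed activity point (home or workplace), and r2 is the radius of the larger circle centered on the same point.
Serial cases are crimes that satisfy the above assumptions, such as repeated murder, robbery, or rape. The method also suits cases in which a single crime involves several locations, for example a murder in which the places where the offender was witnessed, committed the murder, and disposed of the body all differ.
Based on these two assumptions, for an abnormal event, a suspicious person's fixed activity points (OD points) usually lie within the ring formed by concentric circles of different radii centered on the location of the event. Distances here are computed on the road network, using the Manhattan distance.
First, the base stations inside the ring region are found from the event location, and people whose O/D is one of these base stations are investigated first. If there are several event locations, people whose O/D is a base station where the ring regions intersect are more suspicious. Therefore, based on the geographic profile, the people concerned are ranked in descending order of the probability that their O/D points fall in the ring region.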
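The ring test can be sketched as follows. The sketch assumes base stations and event locations are given as planar (x, y) coordinates in a common unit such as kilometres, and approximates road-network distance by the plain Manhattan distance |Δx| + |Δy|; a real road network would need routing data. All names are hypothetical.

```python
from collections import Counter


def manhattan(p, q):
    """Manhattan distance between two planar points (x, y)."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def stations_in_ring(stations, event, r_inner, r_outer):
    """stations: dict station_id -> (x, y). Returns ids inside the ring."""
    return {
        station_id
        for station_id, position in stations.items()
        if r_inner <= manhattan(position, event) <= r_outer
    }


def ring_hit_counts(stations, events, r_inner, r_outer):
    """For several event locations, count how many rings each station lies in.

    Stations hit by more rings correspond to the intersecting ring areas
    that the text treats as more suspicious.
    """
    hits = Counter()
    for event in events:
        for station_id in stations_in_ring(stations, event, r_inner, r_outer):
            hits[station_id] += 1
    return hits
```

Users whose identified O or D station has a high hit count would then be moved up in the domain-model ranking.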
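Step 5. According to the specific circumstances of the abnormal event, select the variables and their order, sort the related users, and obtain a combined ranking.
For different kinds of abnormal events, all or some of the above variables are selected with the advice of domain experts, their priority order is determined, and the users are ranked. For a suspect who commits a series of consecutive crimes, for instance, the number of appearances at the scenes and the number of times the OD points fall within the ring regions of the events have a greater influence on the ranking. For organized gang crime, such as gang fraud, the tailing effect in trajectories is more pronounced, and once one suspect is known, the other suspects' social relationships overlap strongly with that suspect's. For gang disturbances or terrorist incidents, both social relationship overlap and spatial trajectory overlap are high.
FIG. 9 is a flowchart of a sorting module according to an embodiment of the present invention. The technical solution is described in detail below with reference to the implementation cases and FIG. 9. Although the emphasis of the sorting differs between embodiments, the identification of the relevant users and the data preparation are the same. At least the following parts are included: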
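Part 1: data storage and cleaning.
The volume of mobile call data is very large: the number of users is huge, and a large number of records is generated every day. Ingesting and storing the data is therefore challenging, and a distributed data management system with tiered storage is a good technical solution. FIG. 10 is a structural diagram of a user identification system according to an embodiment of the present invention. As shown in FIG. 10, a computer cluster and a distributed file system (Hadoop Distributed File System, HDFS) serve as the first layer, which stores the original records, and call data is pulled into the cluster by a parallel data acquisition module. A data cleaning pipeline is built on top of HDFS, the data from the most recent 30 days is placed in a database system with faster response, and the subsequent processing modules are built on top of the database. When a processing module needs data older than 30 days, it can access HDFS directly. The specific steps are:
Step 1: The parallel data acquisition module pulls the original call data records into the distributed file system of the computer cluster.
Step 2: A MapReduce data pre-processing pipeline job converts the original call data into the required schema, for example by removing redundant information and encrypting the mobile phone numbers. The processed data is then stored in the database system. The amount of data loaded into the database can be controlled according to system load; considering both the load capacity of the database and the required task processing speed, the database generally holds at least 30 days of data. Storage can be optimized in many ways, such as partitioning by date or compressing the data.
The data cleaning and encryption module removes redundancy from the acquired original call data and encrypts it. This module resides on the server side.
The original call data contains many fields that are irrelevant here, such as roaming status and the International Mobile Equipment Identity (IMEI) of the user's phone; in total there are more than twenty attribute fields. Very few fields are actually used: base station information and call record information. The base station information fields are the base station's latitude and longitude and its number; the call record fields are the encrypted mobile phone number, the phone number of the other party, the call time, and the base station number.
User privacy is an important issue for mobile call data. To protect user privacy, the phone numbers in the raw data are encrypted. A phone number serves only to identify a mobile user uniquely and has no other meaning here, so it can be replaced by another string or number in one-to-one correspondence with it. The encrypted number is used only to tell users apart and cannot be used to determine a specific user's identity, which satisfies the privacy requirement.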
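One way to realize the cleaning and encryption step is sketched below. The text only requires a replacement string in one-to-one correspondence with the phone number; the sketch swaps in a keyed hash (HMAC-SHA256 from the Python standard library), which is deterministic and for practical purposes collision-free, though not a strict bijection. The raw field names (`caller`, `callee`, `time`, `station`) and the choice to pseudonymize the other party's number as well are assumptions made for illustration.

```python
import hashlib
import hmac


def pseudonymize(phone_number, secret_key):
    """Map a phone number to a stable, non-reversible identifier.

    secret_key (bytes) must be kept private; the same key always yields
    the same identifier for the same number.
    """
    digest = hmac.new(secret_key, phone_number.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def clean_record(raw, secret_key):
    """Keep only the fields used later and pseudonymize both numbers.

    raw: dict with hypothetical keys 'caller', 'callee', 'time', 'station'
    plus any number of unused fields (roaming status, IMEI, ...).
    """
    return {
        "user": pseudonymize(raw["caller"], secret_key),
        "peer": pseudonymize(raw["callee"], secret_key),
        "time": raw["time"],
        "station": raw["station"],
    }
```

Because the mapping is deterministic, records of the same user still link together after cleaning, which the OD, social relationship, and trajectory steps all rely on.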
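Part 2: request processing module
Different application scenarios emphasize different aspects of the data. Analyzing and configuring the characteristics of a request, and handling different scenarios differently, plays an important role in improving the relevance of the ranking. This is explained in conjunction with the embodiments.
Part 3: business logic module
The business logic module includes the delineation of the abnormal user scope introduced in the previous part, call-data-based OD identification, social relationship identification, user feature identification, and the subsequent ranking process. These are explained in conjunction with the embodiments.
Embodiment 1
A series of rape cases occurs in one area. From the victims' descriptions, they were very likely committed by the same person, whose physical appearance is hard to determine but who carried a mobile phone. Because there were no cameras at the scenes, the police have difficulty determining the suspect's physical appearance, but according to the victims the suspect carried a mobile phone. In this situation, users who appear at the scenes multiple times should be treated as prime suspects. From the standpoint of criminological geographic profiling, users whose OD falls within the ring regions of the events are also under strong suspicion. In terms of age and gender, men aged 18-45 are more likely. In terms of trajectory, users who show tailing behavior or whose movement differs markedly from their usual pattern are more suspicious. In this situation the relevance weights can be set, from largest to smallest, as: number of appearances at the cases, probability that the OD falls within the case ring regions, gender and age, trajectory relevance, and social relationship relevance. For example, the weights of these five relevance parameters can be set to 90, 80, 70, 60, and 50 respectively.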
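Step 1: According to the times and places of the multiple cases, and using the trajectory prediction module, delineate the set P of mobile phone users who may have been present in those time periods and regions.
Step 2: Count the number of times α that each user in P appears in the sets related to the cases.
Step 3: Perform OD identification for the users in P to identify each user's OD points.
Step 4: Calculate the probability β that each user's OD in P falls within the ring regions of the cases.
Optionally, the radii of the large and small circles of the ring can be determined in two ways. In the first way, following empirical research on geographic profiling and without distinguishing terrain, road network, crime type, and so on, it is assumed for simplicity that the offender's fixed location is likely to lie in the intersection of multiple circles centered on the event locations with a radius of "twice the maximum distance between case locations"; when computing the ring region, the small and large circle radii are therefore set to 0 and "twice the maximum distance between case locations" respectively. FIG. 11 is a schematic diagram of a geographic profile according to Embodiment 1 of the present invention. As shown in FIG. 11, multiple large circular regions are set, and the center of each circle represents the scene of one case. A user whose OD falls in the intersection of several large circles, that is, the area filled with small triangles in FIG. 11, is more suspicious. The probability β that each user's OD in P falls in the intersection of several large circles is calculated. In the second way, the inner and outer radii of the ring region are set from statistics of the distance between the suspect's OD and the case location in previously solved cases. For example, the distances are sorted in descending order, that is, arranged in a queue from farthest to nearest, and the sorted distance values are divided into two parts; optionally, the queue is split into two equal halves. The outer radius is then the average of all the distances in the first half of the queue, and the inner radius is the average of all the distances in the second half. Alternatively, the average of all the distances is increased and decreased by a factor of 0.5: the average V1 of all distances between suspect OD and case location is computed, 1.5V1 is used as the outer radius, and 0.5V1 is used as the inner radius. The median can likewise be taken and increased and decreased by a factor of 0.5, and so on.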
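The second way of setting the ring radii can be sketched directly from the description. The two functions below are a minimal illustration, assuming the historical distances are supplied as a plain list of numbers in a consistent unit; when the list has an odd length, the extra element is placed in the second (nearer) half, which is a choice made here and not stated in the text.

```python
from statistics import mean


def radii_by_halves(distances):
    """Sort distances far-to-near, split in two, and average each half.

    Returns (inner_radius, outer_radius).
    """
    if len(distances) < 2:
        raise ValueError("need at least two historical distances")
    ordered = sorted(distances, reverse=True)
    half = len(ordered) // 2
    outer = mean(ordered[:half])   # farther half
    inner = mean(ordered[half:])   # nearer half
    return inner, outer


def radii_by_scaled_mean(distances, factor=0.5):
    """Use (1 - factor) * V1 and (1 + factor) * V1, where V1 is the mean."""
    v1 = mean(distances)
    return (1 - factor) * v1, (1 + factor) * v1
```

For historical distances of 8, 6, 4, and 2 km, the first method gives an inner radius of 3 km and an outer radius of 7 km, and the second gives 2.5 km and 7.5 km.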
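Step 5: Compute the correlation γ between the trajectory of each user in P and the trajectory of the victim during the N hours before the case.
Before a case occurs, the perpetrator usually tails the victim, and some time elapses between the start of the tailing and the offence. For example, when the case occurs in the morning (e.g. 8:00-12:00), N is usually greater than 2 and less than 10; when it occurs in the afternoon (e.g. 12:00-17:00), N is usually greater than 2 and less than 20.
Step 6: Look up the gender and age information δ of the users in P. If the database has no record for a user, identify the user with a model already trained by machine learning and store the result in the database.
Step 7: Identify the social relationships of the users in P and find each user's set of social relationships. In an optional implementation, a caching mechanism is introduced by building a database of users' social relationships, OD, gender, age and so on. When the database holds the relevant records for a user, the results are read directly from it; when it does not, the model trained in the machine learning module is invoked to identify the user's social relationships, and the result is stored in the database for later use.
Step 8: Compute the degree of correlation ε between each user in P and the victim in terms of social relationships.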
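Step 9: Sort P in descending order of relevance and display the result. The overall relevance ranking is computed from the weights set out above, that is, 90 for α, 80 for β, 70 for δ, 60 for γ and 50 for ε:
α×90 + β×80 + δ×70 + γ×60 + ε×50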
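FIG. 12 is a flowchart of user identification according to an embodiment of the present invention; see FIG. 12 for the user identification flow.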
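The weighted ranking of Embodiment 1 can be sketched as below. The factor names and function names are hypothetical, and the sketch assumes each factor has already been scaled to a comparable range (for example 0 to 1), which the text does not specify. The weights 90, 80, 70, 60 and 50 follow the order given in the scenario description of Embodiment 1.

```python
# Example weights from Embodiment 1; the keys are hypothetical factor names.
EMBODIMENT_1_WEIGHTS = {
    "appearances": 90,  # number of cases in which the user appears
    "od_in_ring": 80,   # probability that the OD falls in the ring-shaped region
    "gender_age": 70,   # match with the expected gender and age
    "trajectory": 60,   # trajectory correlation with the victim
    "social": 50,       # social relationship correlation with the victim
}


def relevance_score(features: dict, weights: dict) -> float:
    """Weighted sum of the factors; a factor missing from `features` counts as 0."""
    return sum(weight * features.get(name, 0.0) for name, weight in weights.items())


def rank_users(candidates: dict, weights: dict) -> list:
    """Return (user_id, score) pairs, highest score first, ties broken by user id."""
    scored = [(uid, relevance_score(f, weights)) for uid, f in candidates.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

A factor can be ignored for a given scenario simply by leaving it out of the `weights` dictionary.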
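Besides setting the relevance weights of the different factors from an analysis of the scenario, as in Embodiment 1, some factors can also be ignored altogether; see Embodiments 2 and 3 below.
Embodiment 2
A victim reports having been defrauded on an antiques street, paying a high price for a forged antique. According to the victim, several people persuaded the victim in turn, which suggests a gang in which each member played a role. The victim's description alone, however, does not give the police enough to identify the suspects.
In this scenario the suspects' trajectories are highly similar to the victim's, so trajectory correlation has a large influence. From the perspective of social relationships, gang members call each other frequently and are therefore likely to belong to each other's social networks, so social relationship correlation is also high. The other factors matter less. In such cases, important information can usually be obtained from these two factors, trajectory and social relationships.
Compared with Embodiment 1, the computation in Embodiment 2 can omit the gender and age module and the OD module.
Embodiment 3
In group incidents, such as mob disturbances and group fights, the participants usually have many contacts with one another, so their social relationships overlap heavily. In terms of gender and age, they are mostly males between 18 and 40. Trajectory correlation also has some influence. OD information has little bearing on this type of incident and can be ignored in the ranking.
The basic steps are outlined as follows:
Step 1: Determine the time and location of the abnormal event from the victim's description. Using the trajectory prediction module, delimit the set P of relevant users for that time period and area.
Step 2: Compute the trajectory similarity γ between each user in P and the victim.
Step 3: Look up the social relationships of the users in P in the database. If the database has no record for a user, the social relationship identification module identifies the user's social relationships using the social relationship identification model obtained by machine learning, and stores the result in the database.
Step 4: Compute the social relationship correlation ε between each user in P and the victim.
Step 5: Look up the gender and age information of the users in P and determine the probability δ that each user matches "male between 18 and 40". If the database has no record for a user, identify the user with a model already trained by machine learning and store the result in the database.
Step 6: Rank the users in P by social relationship correlation, gender and age match, and trajectory correlation, and return the result. The weights of the relevance parameters, from highest to lowest, are those of social relationship correlation, the probability of matching the gender and age profile, and trajectory correlation; for example, they can be set to 90, 80 and 40 respectively. The overall relevance ranking is computed from these weights: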
ε×90+δ×80+γ×40
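The process for this embodiment is the same as in FIG. 12; it differs only in the priority order of the influencing factors, which is determined by analyzing the scenario.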
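The "look up the database first, otherwise run the trained model and store the result" pattern that appears in the steps of Embodiments 1 and 3 can be sketched as a small cache helper. The names are hypothetical, a plain dictionary stands in for the database, and `infer` stands in for whichever trained model applies (gender and age, OD, or social relationships).

```python
from typing import Callable, Dict, Hashable, TypeVar

T = TypeVar("T")


def lookup_or_infer(user_id: Hashable, store: Dict[Hashable, T], infer: Callable[[Hashable], T]) -> T:
    """Return the stored result for `user_id`, computing and storing it on a miss."""
    if user_id in store:
        return store[user_id]   # hit: read the earlier result directly
    result = infer(user_id)     # miss: run the (stand-in) trained model
    store[user_id] = result     # keep it for later queries
    return result
```

Because results are stored on first use, the models can also be run offline ahead of time to fill the store before any ranking request arrives.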
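In summary, the embodiments of the present invention automate, as a system, the identification and ranking of potential user groups for abnormal events. They cover everything from data cleaning to the identification and ranking of potentially relevant users and form a complete, workable system solution. The embodiments provide novel and practical solutions for delimiting the potential user group, for assessing the correlation of spatial behavior, and for ranking by the three factors of domain knowledge, social relationships and spatial behavior combined. Because a large volume of data operations is involved and the system is complex, the models must respond quickly. OD identification, social relationship identification and user characteristic identification can therefore also be completed through offline training, with the results stored in the database for the ranking step to query.
To keep their communication systems and services running properly, wireless operators retain a large amount of communication-related data, such as logs of calls, text messages, and power-on and power-off events, generally recorded at the granularity of base station locations. Advances in data mining make it possible to realize the value of this data: from call data, a user's commuting OD, social relationships, trajectory patterns and other information can be identified, and this information is important for assessing how closely the people concerned are related to an abnormal event. A user's commuting OD, that is, home and workplace, is a product of industrial society and is the basic pattern of the user's movement. Combined with relevant domain knowledge, such as criminal geographic profiling, OD information reflects the degree of correlation between a user and an abnormal event under the domain model. Community detection techniques identify users' social relationships from the data, and these relationships can be used to examine the correlation between a user and an abnormal event. Analyzing this correlation in terms of spatial behavior from the user's trajectory information, and analyzing it from mobile call data by combining domain knowledge, social relationships and spatial behavior, is important for handling abnormal events promptly. Compared with traditional manual investigation, more comprehensive data and big data techniques allow the key and priority scope of investigation to be determined more quickly, which in turn optimizes the deployment of personnel. In criminal investigation, this helps the police make use of the critical early period for solving a case, when every minute counts toward resolving crises, saving lives and maintaining justice and public order.
Another embodiment further provides software for executing the technical solutions described in the above embodiments and optional implementations.
Another embodiment further provides a storage medium that stores the above software. The storage medium includes, but is not limited to, any of the following: an optical disc, a floppy disk, a hard disk, a rewritable memory, and the like.
Obviously, those skilled in the art should understand that the modules or steps of the above embodiments of the present invention can be implemented by general-purpose computing devices. They can be concentrated on a single computing device or distributed over a network of multiple computing devices. Optionally, they can be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by a computing device; in some cases, the steps shown or described can be performed in an order different from that given here. Alternatively, they can each be made into individual integrated circuit modules, or several of the modules or steps can be made into a single integrated circuit module.
FIG. 13 is a schematic diagram of the hardware structure of a device according to an embodiment of the present invention. As shown in FIG. 13, the device includes:
one or more processors 810 (one processor 810 is taken as an example in FIG. 13); and
a memory 820.
The device may further include an input apparatus 830 and an output apparatus 840.
The processor 810, the memory 820, the input apparatus 830 and the output apparatus 840 of the device may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 13.
As a computer-readable storage medium, the memory 820 can store software programs, computer-executable programs and modules, such as the program instructions/modules corresponding to the method for determining user equipment in the embodiments of the present invention (for example, the first determining module 22, the acquiring module 24 and the second determining module 26 shown in FIG. 2). By running the software programs, instructions and modules stored in the memory 820, the processor 810 executes the various functional applications and data processing of the server, that is, implements the method for determining user equipment of the above method embodiments.
The memory 820 may include a program storage area and a data storage area. The program storage area can store an operating system and the application programs required by at least one function; the data storage area can store data created through use of the terminal device, and so on. In addition, the memory 820 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some embodiments, the memory 820 optionally includes memory located remotely from the processor 810, and such remote memory can be connected to the terminal device over a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input apparatus 830 may be configured to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the terminal. The output apparatus 840 may include a display device such as a display screen.
The one or more modules are stored in the memory 820 and, when executed by the one or more processors 810, perform the method for determining user equipment of the above method embodiments.
Industrial Applicability
The method for determining user equipment provided by the embodiments of the present invention solves the problem in the related art that relevant users cannot be identified through their mobile devices, and thereby identifies users efficiently and quickly, optimizes the deployment of personnel, and saves human resources.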

Claims (12)

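  1. A method for determining user equipment, comprising:
    determining potential user equipment located within a specified spatial range during a specified time period;
    acquiring association information corresponding to the potential user equipment; and
    determining specified user equipment from the potential user equipment according to the association information.
  2. The method according to claim 1, wherein determining the potential user equipment located within the specified spatial range during the specified time period comprises:
    acquiring location information of first-type user equipment, and taking the first-type user equipment as the potential user equipment when the first-type user equipment is located within the specified spatial range during the specified time period; and
    acquiring trajectory information of second-type user equipment, and taking the second-type user equipment as the potential user equipment when the trajectory information indicates that the second-type user equipment is located within the specified spatial range during the specified time period.
  3. The method according to claim 2, wherein acquiring the trajectory information of the second-type user equipment comprises:
    mining the movement regularity of the user corresponding to the second-type user equipment from the user's historical call record information; and
    determining the trajectory information of the second-type user equipment according to the movement regularity.
  4. The method according to claim 2, wherein determining the potential user equipment located within the specified spatial range during the specified time period comprises:
    acquiring the discrete entropy of the user corresponding to the second-type user equipment;
    when the discrete entropy is less than a predetermined threshold, obtaining the movement regularity of the user from the user's historical call record information, and determining the second-type user equipment according to the movement regularity; and
    when the discrete entropy is greater than or equal to the predetermined threshold, determining the second-type user equipment according to the historical call information of all users in the database.
  5. The method according to claim 1, wherein the association information comprises at least one of the following:
    information on the place of residence and place of work of the user corresponding to the potential user equipment, social relationship information of the user corresponding to the potential user equipment, and demographic information of the user corresponding to the potential user equipment.
  6. The method according to claim 5, wherein acquiring the social relationship information of the user corresponding to the potential user equipment comprises:
    acquiring the social relationship information of the user according to at least one of the following:
    temporal characteristics of the calls made by the user equipment, information on other user equipment located at the same place as the user equipment at the same time, and information on other user equipment that shares contacts with the user equipment.
  7. The method according to claim 5, wherein determining the specified user equipment from the potential user equipment according to the association information comprises:
    acquiring weights of specified information included in the association information;
    ranking multiple user equipments among the potential user equipment according to the weights; and
    determining the user equipment at a predetermined position in the ranking as the specified user equipment.
  8. An apparatus for determining user equipment, comprising:
    a first determining module, configured to determine potential user equipment located within a specified spatial range during a specified time period;
    an acquiring module, configured to acquire association information corresponding to the potential user equipment; and
    a second determining module, configured to determine specified user equipment from the potential user equipment according to the association information.
  9. The apparatus according to claim 8, wherein the first determining module comprises:
    a first acquiring unit, configured to acquire location information of first-type user equipment and to take the first-type user equipment as the potential user equipment when the first-type user equipment is located within the specified spatial range during the specified time period; and
    a second acquiring unit, configured to acquire trajectory information of second-type user equipment and to take the second-type user equipment as the potential user equipment when the trajectory information indicates that the second-type user equipment is located within the specified spatial range during the specified time period.
  10. The apparatus according to claim 9, wherein the second acquiring unit comprises:
    a mining subunit, configured to mine the movement regularity of the user corresponding to the second-type user equipment from the user's historical call record information; and
    a determining subunit, configured to determine the trajectory information of the second-type user equipment according to the movement regularity.
  11. The apparatus according to claim 9, wherein the first determining module is further configured to:
    acquire the discrete entropy of the user corresponding to the second-type user equipment; when the discrete entropy is less than a predetermined threshold, obtain the movement regularity of the user from the user's historical call record information and determine the second-type user equipment according to the movement regularity; and when the discrete entropy is greater than or equal to the predetermined threshold, determine the second-type user equipment according to the historical call information of all users in the database.
  12. A non-volatile computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the method according to any one of claims 1 to 7.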
PCT/CN2016/082927 2015-05-28 2016-05-20 Method and apparatus for determining user equipment WO2016188380A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510283569.8A CN106304015B (zh) 2015-05-28 2015-05-28 Method and apparatus for determining user equipment
CN201510283569.8 2015-05-28

Publications (1)

Publication Number Publication Date
WO2016188380A1 true WO2016188380A1 (zh) 2016-12-01

Family

ID=57393757

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/082927 WO2016188380A1 (zh) 2015-05-28 2016-05-20 Method and apparatus for determining user equipment

Country Status (2)

Country Link
CN (1) CN106304015B (zh)
WO (1) WO2016188380A1 (zh)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543986A (zh) * 2018-11-16 2019-03-29 湖南数定智能科技有限公司 基于用户画像的监狱罪犯三预风险评估方法及系统
CN111242147B (zh) * 2018-11-28 2023-07-07 中移(杭州)信息技术有限公司 一种亲密联系人和频繁活跃区域识别的方法及装置
CN109995849B (zh) * 2019-02-26 2022-01-04 维沃移动通信有限公司 一种信息记录方法及终端设备
CN111694875B (zh) * 2019-03-14 2023-04-25 百度在线网络技术(北京)有限公司 用于输出信息的方法和装置
CN110096529B (zh) * 2019-04-16 2021-07-16 中科金联(北京)科技有限公司 一种基于多维矢量数据的网络数据挖掘方法和系统
CN110659560B (zh) * 2019-08-05 2022-06-28 深圳市优必选科技股份有限公司 一种关联对象的识别方法及系统
CN112489396B (zh) * 2020-11-16 2022-12-16 中移雄安信息通信科技有限公司 一种行人尾随行为检测方法、装置、电子设备和存储介质
CN112738724B (zh) * 2020-12-17 2022-09-23 福建新大陆软件工程有限公司 一种区域目标人群的精准识别方法、装置、设备和介质
CN112860808B (zh) * 2020-12-30 2024-08-13 深圳市华傲数据技术有限公司 基于数据标签的用户画像分析方法、装置、介质和设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080132210A1 (en) * 2005-06-17 2008-06-05 Korneluk Jose E Method and apparatus for enhanced identification of individual(s)
CN101656913A (zh) * 2009-09-23 2010-02-24 中兴通讯股份有限公司 一种基于移动网络的监听分析方法及其系统
CN102789482A (zh) * 2012-06-29 2012-11-21 安科智慧城市技术(中国)有限公司 一种利用口供识别嫌疑人的方法、系统及电子设备
CN103716878A (zh) * 2013-12-12 2014-04-09 深圳先进技术研究院 利用手机与视频监控设备进行定位的方法

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LI, TING ET AL.: "A review on the classification, patterns and applied research of human mobility trajectory", PROGRESS IN GEOGRAPHY, vol. 33, no. 7, 31 July 2014 (2014-07-31), XP055332418 *
SHEN, QUNYI ET AL.: "lun yidong hulianwang xinxi zai zhenchazhongde yingyong", JOURNAL ( HUBEI UNIVERSITY OF POLICE, 31 January 2015 (2015-01-31) *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378002A (zh) * 2019-07-11 2019-10-25 华中农业大学 基于移动轨迹的社会关系建模方法
CN110378002B (zh) * 2019-07-11 2023-05-12 华中农业大学 基于移动轨迹的社会关系建模方法
CN111199417A (zh) * 2019-11-29 2020-05-26 北京深演智能科技股份有限公司 虚假设备id的识别方法及装置
CN111612675A (zh) * 2020-05-18 2020-09-01 浙江宇视科技有限公司 同行对象确定方法、装置、设备及存储介质
CN111612675B (zh) * 2020-05-18 2023-08-04 浙江宇视科技有限公司 同行对象确定方法、装置、设备及存储介质
CN111950937A (zh) * 2020-09-01 2020-11-17 上海海事大学 一种基于融合时空轨迹的重点人员风险评估方法
CN111950937B (zh) * 2020-09-01 2023-12-01 上海海事大学 一种基于融合时空轨迹的重点人员风险评估方法
CN113656686A (zh) * 2021-07-26 2021-11-16 深圳市中元产教融合科技有限公司 一种基于产教融合的任务报告的生成方法及服务系统

Also Published As

Publication number Publication date
CN106304015A (zh) 2017-01-04
CN106304015B (zh) 2019-11-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16799268

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16799268

Country of ref document: EP

Kind code of ref document: A1