CN111884821B - Ticket data processing and displaying method and device and electronic equipment - Google Patents

Publication number
CN111884821B
Authority
CN
China
Prior art keywords
call
ticket
tickets
base station
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010230658.7A
Other languages
Chinese (zh)
Other versions
CN111884821A (en)
Inventor
马洪涛
马楷岳
陈英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual
Priority to CN202010230658.7A
Publication of CN111884821A
Application granted
Publication of CN111884821B
Legal status: Active
Anticipated expiration

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L12/00: Data switching networks
    • H04L12/02: Details
    • H04L12/14: Charging, metering or billing arrangements for data wireline or wireless communications
    • H04L12/1453: Methods or systems for payment or settlement of the charges for data transmission involving significant interaction with the data transmission network
    • H04L12/1482: Methods or systems for payment or settlement of the charges for data transmission involving significant interaction with the data transmission network involving use of telephony infrastructure for billing for the transport of data, e.g. call detail record [CDR] or intelligent network infrastructure
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/358: Browsing; Visualisation therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The embodiments of the application provide a call ticket data processing and display method, a call ticket data processing and display device, and electronic equipment, and relate to the field of data processing. In the method, an importance degree value is determined for each base station location identifier and each opposite party number in each of multiple obtained call tickets; a base station location identifier key set and an opposite party number key set are then determined from the importance degree values; and the similarity between any two call tickets is determined from the two key sets, the number holder identifier, and the local serial number data. Because the similarity represents the degree of consistency of behavior characteristics between two call tickets, once the similarity between any two call tickets is determined, the local communication numbers corresponding to call tickets with high similarity are grouped into a cluster, and a visual communication data graph is generated and displayed. The cluster structure in the call ticket data can thus be analyzed accurately and quickly and displayed visually, which improves the investigation and case handling efficiency of public security organizations.

Description

Ticket data processing and displaying method and device and electronic equipment
Technical Field
The application relates to the field of data processing, in particular to a method and a device for processing and displaying call ticket data and electronic equipment.
Background
Mobile phones are commonly used as a contact tool among criminal partners, and criminal suspects or criminals frequently change numbers and handsets to conceal their criminal behavior. Especially in group crime, because the principal suspects often hold multiple mobile phones or multiple call numbers, the call ticket data formed by the call records among members of the criminal group usually exhibits clustering features, so the call ticket data of a criminal group usually forms a communication relation network with a specific cluster structure based on the call numbers.
At present, when the relation network of a specific cluster structure in the call ticket data of a criminal group is analyzed and displayed to determine its clustering features, the cluster structure data is difficult to mine, the search efficiency is low, and the displayed graphs are not intuitive, which reduces the investigation and case handling efficiency of public security organizations.
Disclosure of Invention
The application aims to provide a method, a device and an electronic device for processing and displaying call ticket data, which can accurately and quickly analyze a cluster structure in the call ticket data, visually display the cluster structure and improve the efficiency of investigation and case handling of a public security organization.
The embodiment of the application can be realized as follows:
In a first aspect, an embodiment provides a call ticket data processing and display method, including: acquiring multiple call tickets, where each call ticket comprises at least one call record, and each call record comprises a base station location identifier, an opposite party number, a number holder identifier, and local serial number data; determining the importance degree value of each base station location identifier and each opposite party number in each call ticket according to a preset calculation formula; de-duplicating and combining the a base station location identifiers with the largest importance degree values in each call ticket to obtain a base station location identifier key set of the multiple call tickets; de-duplicating and combining the b opposite party numbers with the largest importance degree values in each call ticket to obtain an opposite party number key set of the multiple call tickets; determining the similarity between any two call tickets according to the base station location identifier key set, the opposite party number key set, the number holder identifier, and the local serial number data, where the similarity represents the degree of consistency of behavior characteristics between the two call tickets; grouping, according to the similarity, the local communication numbers corresponding to call tickets with consistent behavior characteristics into a cluster to obtain a cluster structure of the multiple call tickets; and generating a visual communication data graph according to the cluster structure and displaying the visual communication data graph.
In a second aspect, an embodiment further provides a device for processing and displaying call ticket data, including: the acquisition module is used for acquiring a plurality of call tickets; each call ticket comprises at least one call record, and each call record comprises a base station position identifier, an opposite party number, a number holder identifier and local serial number data; the calculation module is used for determining the importance degree value of each base station position identification and each opposite party number in each call ticket according to a preset calculation formula; the calculation module is further configured to perform de-duplication and combination on the a base station location identifiers with the largest importance value in each call ticket to obtain a base station location identifier key set of the multiple call tickets; the calculation module is further used for performing de-duplication and combination on the b opposite side numbers with the maximum importance degree value in each call ticket to obtain an opposite side number key set of the multiple call tickets; the calculation module is also used for determining the similarity between any two call tickets according to the base station position identification key set, the opposite party number key set, the number holder identification and the local serial number data; the similarity represents the consistency degree of behavior characteristics between the two call tickets; the clustering module is used for determining a plurality of local communication numbers corresponding to a plurality of telephone bills with consistent behavior characteristics in the plurality of telephone bills into a cluster according to the similarity so as to obtain a clustering structure of the plurality of telephone bills; and the display module is used for generating a visual communication data graph according to the clustering structure and displaying the visual communication data graph.
In a third aspect, an embodiment further provides an electronic device, including a processor, a memory, and a bus, where the memory stores machine-readable instructions; when the electronic device runs, the processor and the memory communicate through the bus, and the processor executes the machine-readable instructions to perform the call ticket data processing and display method.
The beneficial effects of the embodiment of the application include, for example: firstly, the method and the device can determine the importance degree value of each base station position identification and each opposite party number in each call bill according to the obtained multiple call bills, then determine the key set of the base station position identification and the key set of the opposite party number according to the importance degree value, and determine the similarity between any two call bills according to the two sets, the number holder identification and the serial number data of the local machine. The similarity represents the consistency degree of the behavior characteristics between the two call tickets, namely the similarity can accurately and efficiently reflect the possibility degree that the two call tickets are a cluster. Therefore, after the similarity between any two call tickets is determined, a plurality of local communication numbers corresponding to a plurality of call tickets with consistent behavior characteristics in the call tickets are determined as a cluster, and finally the cluster structure is generated into a visual communication data graph and displayed, so that the cluster structure in the call ticket data can be accurately and quickly analyzed, the cluster structure can be visually displayed, the investigation and case handling efficiency of a public security organization is improved, and the technical blank in the field is filled.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a block diagram of an electronic device according to an embodiment of the present disclosure;
Fig. 2 is a flowchart of a call ticket data processing and displaying method provided in the embodiment of the present application;
Fig. 3 is another flowchart of a call ticket data processing and displaying method provided in the embodiment of the present application;
Fig. 4 is another flowchart of a call ticket data processing and displaying method provided in the embodiment of the present application;
Fig. 5 is another flowchart of a call ticket data processing and displaying method provided in the embodiment of the present application;
Fig. 6 is a visualized data map provided in an embodiment of the present application;
Fig. 7 is another visualized data map provided in an embodiment of the present application;
Fig. 8 is a functional block diagram of a call ticket data processing and displaying apparatus according to an embodiment of the present application.
Reference numerals: 100-an electronic device; 110-a memory; 120-a processor; 130-a bus; 140-a communication interface; 200-a call ticket data processing and displaying device; 210-an obtaining module; 220-a calculation module; 230-a clustering module; 240-a display module.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
Furthermore, the appearances of the terms "first," "second," and the like, if any, are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
It should be noted that the features of the embodiments of the present application may be combined with each other without conflict.
In the process of implementing the technical solution of the embodiment of the present application, the inventors of the present application find that:
With the rapid development of science and technology, criminal behavior is becoming increasingly high-tech, intelligent, and dynamic, and criminals are becoming increasingly professional and organized. New criminal means and forms characteristic of the era keep emerging, marked by concealed methods and specialized modes of operation. These all pose new requirements and challenges for the investigation work of public security organizations. Improving the efficiency of criminal investigation through informatization, promoting the integration of science and technology with law enforcement and case handling, and innovating investigation work have therefore become a main choice for supporting public security organizations in combating illegal and criminal activity.
At present, electronic devices capable of communication, such as mobile phones and communication watches, are increasingly common in daily life, and phone calls have become one of the important means by which people communicate. A criminal suspect or criminal often changes phone numbers and handsets frequently to conceal a crime. Especially in group crime, because the principal suspects often hold multiple mobile phones or multiple call numbers, the call ticket data formed by the call records among members of the criminal group usually exhibits clustering features; that is, the behavior characteristics of multiple call numbers in the criminal group's call ticket data are highly similar, so the call ticket data usually forms a communication relation network with a specific cluster structure based on the call numbers.
At present, a public security organization mainly performs data analysis and mining on clustering features in call ticket data of criminal groups through professional call ticket analysis tool software to provide clues for investigation and case solving, wherein the call ticket analysis software integrates a database technology, a data mining technology and a data visualization technology, and can automatically analyze information such as call time duration, call frequency, call places and the like after original call record data are imported.
However, in current investigation practice, because existing call ticket analysis software has only simple functions, it still cannot directly analyze the clustering features in the call ticket data of a criminal group. An investigator must further analyze information such as the call details, number relation networks, and common contact numbers of key numbers in the group, and express it as visual graphs, to find hidden relationships among call contacts, grasp the relationship types of the persons involved in the case, and obtain relevant investigation clues. Therefore, when the relation network of a specific cluster structure in the call ticket data of a criminal group is analyzed and displayed to determine its clustering features, the cluster structure data is difficult to mine, the search efficiency is low, and the displayed graphs are not intuitive, which reduces the investigation and case handling efficiency of public security organizations. In other words, the field currently lacks a technical solution that can analyze the cluster structure in call ticket data accurately and quickly and display it visually.
Therefore, to remedy the above defects, the embodiments of the present application provide a call ticket data processing and display method, a device, and an electronic device that can analyze the cluster structure in call ticket data accurately and quickly, display the cluster structure visually, improve the investigation and case handling efficiency of public security organizations, and fill this technical gap in the field. It should be noted that the defects of the prior-art solutions described above were identified by the inventors through practice and careful study; therefore, both the discovery of these problems and the solutions proposed below by the embodiments of the present application should be regarded as contributions made by the inventors in the course of the present application.
Referring to fig. 1, a block diagram of an electronic device 100 according to an embodiment of the present disclosure is shown. The electronic device 100 may include a memory 110, a processor 120, a bus 130, and a communication interface 140, the memory 110, the processor 120, and the communication interface 140 being electrically connected to each other, directly or indirectly, to enable transmission or interaction of data. For example, the components may be electrically connected to each other via one or more buses 130 or signal lines. The processor 120 may process information and/or data related to the ticket data processing to perform one or more of the functions described herein. For example, the processor 120 may obtain multiple call tickets, and perform call ticket data processing according to the data, thereby implementing the call ticket data processing and displaying method provided by the present application.
The memory 110 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 120 may be an integrated circuit chip having signal processing capabilities. The processor 120 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
It will be appreciated that the configuration shown in FIG. 1 is merely illustrative and that the electronic device 100 may include more or fewer components than shown in FIG. 1 or may have a different configuration than shown in FIG. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
Referring to fig. 2, fig. 2 shows a flowchart of a call ticket data processing and displaying method provided in the embodiment of the present application. The method for processing and displaying the call ticket data can be applied to the electronic device 100, and the method for processing and displaying the call ticket data can comprise the following steps:
S100, acquiring multiple call tickets; each call ticket comprises at least one call record, and each call record comprises a base station location identifier, an opposite party number, a number holder identifier, and local serial number data.
In some possible embodiments, the electronic device 100 may obtain the multiple phone bills from a storage medium of another device (for example, a server of a mobile communication operator), or obtain the multiple phone bills stored in advance from a storage medium of the electronic device, and therefore, the obtaining manner of the multiple phone bills is not limited in the present application.
In the obtained call tickets, each call record may include, but is not limited to: the local communication number (PN), local serial number data (IMEI), number holder identifier (ID), opposite party number (ON), call date (TD), call start time (TS), call duration (TT), base station location area code (LAC), base station cell code (CELL), and base station location identifier (CID). The base station location identifier CID is a character string composed of the base station location area code LAC and the base station cell code CELL.
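As an illustration only, the call record fields listed above could be modeled as follows. This is a minimal sketch, not the patent's data format: the class name, field types, the "-" separator used to join LAC and CELL, and all sample values are assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    """One call record; field names follow the abbreviations in the text."""
    pn: str         # local communication number (PN)
    imei: str       # local serial number data (IMEI)
    holder_id: str  # number holder identifier (ID)
    on: str         # opposite party number (ON)
    td: str         # call date (TD)
    ts: str         # call start time (TS)
    tt: int         # call duration (TT); seconds is an assumed unit
    lac: str        # base station location area code (LAC)
    cell: str       # base station cell code (CELL)

    @property
    def cid(self) -> str:
        # The text only says CID is a string composed of LAC and CELL;
        # the "-" separator is an assumption of this sketch.
        return f"{self.lac}-{self.cell}"


# Hypothetical sample values, not taken from Table 1.
example = CallRecord(
    pn="PN1", imei="IMEI1", holder_id="ID1", on="B",
    td="2020-01-01", ts="08:00:00", tt=60, lac="4301", cell="20021",
)
print(example.cid)
```

A call ticket can then be represented as a list of such records sharing the same local communication number.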
For ease of understanding, the call ticket data processing and display method provided in the embodiments of the present application is described below using the p call tickets shown in Table 1 as an example (in Table 1, p = 7).
TABLE 1
[Table 1 appears only as images in the source (Figures BDA0002429186330000051 and BDA0002429186330000061) and is not reproduced here.]
And S110, determining the importance degree value of each base station position mark and each opposite party number in each call bill according to a preset calculation formula.
The importance degree value may reflect how often each base station location identifier and each opposite party number occurs in each call ticket, and the preset calculation formula may be any formula capable of quantifying this, for example, ordinary frequency counting or the TF-IDF (term frequency-inverse document frequency) algorithm.
In some possible embodiments, after the multiple call tickets are obtained, take call ticket 1 in Table 1 as an example. Call ticket 1 includes 3 call records, one with opposite party number B and two with opposite party number C. Under ordinary frequency counting (that is, when the preset calculation formula is a frequency-count formula), the importance degree value of opposite party number C is therefore greater than that of opposite party number B; likewise, in call ticket 1, the importance degree value of base station location identifier CID1 is greater than that of base station location identifier CID2. Similarly, when the preset calculation formula is the TF-IDF formula, the importance degree value of each base station location identifier and each opposite party number in each call ticket can be obtained.
It should be noted that, for the same counterpart number or the same base station location identifier in each ticket, the corresponding calculated importance levels are the same (i.e. the importance levels corresponding to the same counterpart number or the same base station location identifier in a ticket are the same). Furthermore, in order to reduce the calculation amount, for the same counterpart number or the same base station location identifier in each call ticket, the importance degree value of the counterpart number or the base station location identifier in the call ticket may be calculated only once. Continuing to take the ticket 1 in table 1 as an example, the ticket 1 includes 3 call records, wherein two call records are given to the opposite party number C, so that the importance degree value of the opposite party number C in the ticket 1 can be calculated only once.
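The frequency-count case can be sketched as follows, using the opposite party numbers of call ticket 1 as described in the text (one record with B, two with C). The function name is hypothetical, and the sketch assumes the importance degree value is the raw occurrence count; each distinct number is stored once, so its value is computed only once per call ticket.

```python
from collections import Counter


def frequency_importance(values):
    """Map each distinct value to the number of call records it appears in."""
    return dict(Counter(values))


# Opposite party numbers of the three call records in call ticket 1.
ticket_1_opposite_numbers = ["B", "C", "C"]
importance = frequency_importance(ticket_1_opposite_numbers)
print(importance)  # {'B': 1, 'C': 2}
```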
It should be further noted that any preset calculation formula may be used as long as it allows the cluster structure in the call ticket data to be analyzed accurately and quickly; the present application does not limit the preset calculation formula.
Further, on the basis of fig. 2, as to how to "determine the importance degree value of each base station location identifier and each counterpart number in each call ticket according to the preset calculation formula", referring to fig. 3, S110 may include:
S110A, determining the importance degree value of each base station location identifier and each opposite party number in each call ticket according to the term frequency-inverse document frequency (TF-IDF) formula.
Taking the calculation of the importance degree value of each base station location identifier in each call ticket according to the TF-IDF formula as an example, the TF value of each base station location identifier in each call ticket can first be calculated according to the formula
$$tf_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}$$
where $n_{i,j}$ is the number of times base station location identifier $i$ appears in call ticket $j$, and $\sum_k n_{k,j}$ is the total number of occurrences of all base station location identifiers in call ticket $j$. Base station location identifier $i$ can be any base station location identifier in call ticket $j$.
The IDF value of each base station location identifier can then be calculated according to the formula
$$idf_i = \lg\left(\frac{|D|}{E}\right)$$
where $|D|$ is the total number of call tickets and $E$ is the number of call tickets that contain base station location identifier $i$.
Finally, the importance degree value of each base station location identifier in each call ticket can be calculated according to the formula
$$tfidf_{i,j} = tf_{i,j} \times idf_i$$
that is, the TFIDF value calculated according to the TF-IDF formula is the importance degree value. The importance degree value of each opposite party number in each call ticket can be calculated according to the TF-IDF formula in the same way, which is not repeated here.
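The TF, IDF, and TFIDF formulas above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name and the sample call tickets are hypothetical (they are not the data of Table 1), each call ticket is reduced to the list of identifiers appearing in its call records, and lg is taken to be the base-10 logarithm.

```python
import math
from collections import Counter


def tf_idf(tickets):
    """Return one dict per call ticket mapping identifier -> TF-IDF value.

    tickets: list of lists; each inner list holds one identifier per call record.
    """
    total_tickets = len(tickets)  # |D|
    containing = Counter()        # E: number of call tickets containing each identifier
    for ticket in tickets:
        containing.update(set(ticket))

    scores = []
    for ticket in tickets:
        counts = Counter(ticket)            # n_{i,j}
        n_records = sum(counts.values())    # sum over k of n_{k,j}
        scores.append({
            ident: (n / n_records) * math.log10(total_tickets / containing[ident])
            for ident, n in counts.items()
        })
    return scores


# Hypothetical data: two call tickets described by their base station identifiers.
sample = [["CID1", "CID1", "CID2"], ["CID3", "CID1"]]
scores = tf_idf(sample)
print(scores[0])
```

Note that an identifier present in every call ticket (CID1 in this sample) has an IDF of lg(1) = 0 and therefore an importance degree value of 0, so identifiers that distinguish a call ticket from the others rank higher.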
Continuing to further illustrate the above step S110A by taking table 1 as an example, after determining the position identifier of each base station and the importance degree value of each counterpart number in each call ticket in table 1 according to the TF-IDF formula, the following table 2 can be obtained.
TABLE 2
[Table 2 appears only as images in the source (Figures BDA0002429186330000071 and BDA0002429186330000081) and is not reproduced here.]
Referring to fig. 2 again, S120, de-duplicating and combining the a base station location identifiers with the largest importance degree values in each call ticket to obtain a base station location identifier key set of the multiple call tickets.
After the importance degree value of each base station position identification in each call bill is determined, the base station position identifications in each call bill can be sequenced according to the sequence of the importance degree values from large to small. And then, determining the position identifiers of the first a base stations in each call ticket as the position identifiers of the a base stations with the maximum importance degree value in each call ticket. And finally combining a base station position identifications with the maximum importance degree value of each call ticket into a set (for example, if five call tickets exist, the set comprises 5 x a base station position identifications), combining the same base station position identifications in the set into one, and obtaining the combined set which is the base station position identification key set of the multiple call tickets.
Because a given base station location identifier has a single importance degree value within a call ticket, it occupies only one position in that call ticket's ranking, regardless of how many call records it appears in.
As for how to determine the "a base station location identifiers with the largest importance degree values in each call ticket", take determining the 1 base station location identifier (that is, a = 1) with the largest importance degree value in each of the call tickets in Table 2 as an example. In call ticket 1, the importance degree value of base station location identifier CID1 is greater than that of CID2, so the base station location identifiers in call ticket 1 are ranked CID1, CID2. In call ticket 2, the importance degree value of CID3 is greater than that of CID4, and that of CID4 is greater than that of CID1, so the base station location identifiers in call ticket 2 are ranked CID3, CID4, CID1. By analogy, the base station location identifiers are ranked CID1, CID2 in call ticket 3; CID6, CID4 in call ticket 4; CID7, CID8 in call ticket 5; CID1, CID2 in call ticket 6; and CID12, CID13 in call ticket 7.
After the ranking of the base station location identifiers in each call ticket is obtained (that is, after the a base station location identifiers with the largest importance degree values in each call ticket are determined), the step of "de-duplicating and combining the a base station location identifiers with the largest importance degree values in each call ticket to obtain a base station location identifier key set of the multiple call tickets" can be illustrated with the call tickets in Table 2. Based on the rankings given above, CID1, CID3, CID1, CID6, CID7, CID1, and CID12 are first combined into the collection {CID1, CID3, CID1, CID6, CID7, CID1, CID12}; identical base station location identifiers in the collection are then merged into one, giving the set {CID1, CID3, CID6, CID7, CID12}. This set is the base station location identifier key set of call tickets 1-7 in Table 2. The specific execution process of S120 follows by analogy.
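The worked example above can be reproduced with a short sketch. The per-ticket rankings are the ones stated in the text; the function name `key_set` and the choice to return an order-preserving list rather than an unordered set are assumptions made here for readability.

```python
def key_set(rankings, top_n):
    """Take the first top_n identifiers of each ranking and merge duplicates."""
    seen = set()
    merged = []
    for ranking in rankings:
        for ident in ranking[:top_n]:
            if ident not in seen:  # identical identifiers are merged into one
                seen.add(ident)
                merged.append(ident)
    return merged


# Rankings of base station location identifiers for call tickets 1-7,
# ordered from largest to smallest importance degree value.
rankings = [
    ["CID1", "CID2"],          # call ticket 1
    ["CID3", "CID4", "CID1"],  # call ticket 2
    ["CID1", "CID2"],          # call ticket 3
    ["CID6", "CID4"],          # call ticket 4
    ["CID7", "CID8"],          # call ticket 5
    ["CID1", "CID2"],          # call ticket 6
    ["CID12", "CID13"],        # call ticket 7
]

base_station_key_set = key_set(rankings, top_n=1)  # a = 1
print(base_station_key_set)  # ['CID1', 'CID3', 'CID6', 'CID7', 'CID12']
```

With a = 1 and seven call tickets, the key set holds 5 identifiers, which lies between a = 1 and a × p = 7.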
Further, S120 may also be explained in set form. Assume that the set of multiple call tickets is $\{s_1, s_2, \ldots, s_j, \ldots, s_p\}$, where $s_1, s_2, \ldots, s_j, \ldots, s_p$ each represent one call ticket, and let
Figure BDA0002429186330000091
denote the set of the a base station location identifiers with the largest importance degree value in call ticket $s_j$. Merging and de-duplicating these sets over all call tickets yields the base station location identifier key set $\{CID_1, CID_2, \ldots, CID_{cv}\}$ of the multiple call tickets, where $a \le cv \le a \times p$.
It should be noted that, in practical applications, each call ticket generally includes hundreds or thousands of call records; therefore, a may be set to 12 by default.
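The merge-and-de-duplicate step of S120 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the present application: the function name `build_key_set` and the list-of-rankings input format are hypothetical, and the rankings are the orderings stated above for call tickets 1-7 of Table 2, with a = 1.

```python
def build_key_set(rankings, top_n):
    """Merge the first top_n identifiers of every ranking, dropping duplicates.

    rankings: one list per call ticket, already sorted by importance degree
              value from largest to smallest (hypothetical input format).
    """
    key_set = []
    for ranking in rankings:
        for identifier in ranking[:top_n]:
            if identifier not in key_set:  # keep only the first occurrence
                key_set.append(identifier)
    return key_set


# Orderings stated in the text for call tickets 1-7 of Table 2.
table2_rankings = [
    ["CID1", "CID2"],
    ["CID3", "CID4", "CID1"],
    ["CID1", "CID2"],
    ["CID6", "CID4"],
    ["CID7", "CID8"],
    ["CID1", "CID2"],
    ["CID12", "CID13"],
]

key_set = build_key_set(table2_rankings, top_n=1)
print(key_set)  # ['CID1', 'CID3', 'CID6', 'CID7', 'CID12']
```

Here p = 7 and a = 1, and the resulting key set has cv = 5 elements, which is consistent with the bound a ≤ cv ≤ a × p.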
S130, de-duplicating and merging the b counterpart numbers with the largest importance degree value in each call ticket to obtain a counterpart number key set of the multiple call tickets.
After the importance degree value of each counterpart number in each call ticket is determined, the counterpart numbers in each call ticket can be sorted by importance degree value from largest to smallest. The first b counterpart numbers in each call ticket are then taken as the b counterpart numbers with the largest importance degree value in that call ticket. Finally, the b counterpart numbers with the largest importance degree value of every call ticket are combined into one set (for example, with five call tickets, the set contains 5 × b counterpart numbers), and identical counterpart numbers in the set are merged into one; the merged set is the counterpart number key set of the multiple call tickets.
Because the same counterpart number always has the same importance degree value within one call ticket, repeated occurrences of a counterpart number in that call ticket occupy only one position in the ranking.
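Selecting the b counterpart numbers with the largest importance degree value can be sketched as below. The function name `top_counterparts`, the dictionary input and the importance degree values are all hypothetical; the values are made up only so that the ordering comes out as A, D, C. Breaking ties by the number itself is an additional assumption made for determinism and is not stated in the source.

```python
def top_counterparts(importance, b):
    """Return the b counterpart numbers with the largest importance degree value.

    importance: dict mapping each counterpart number to its importance degree
                value in one call ticket. A dict holds each number once, so a
                number occupies a single position in the ranking.
    """
    # Sort by value, largest first; ties are broken by number (assumption).
    ranked = sorted(importance.items(), key=lambda kv: (-kv[1], kv[0]))
    return [number for number, _ in ranked[:b]]


# Hypothetical importance degree values for one call ticket.
ticket_importance = {"D": 0.5, "A": 0.9, "C": 0.2}

print(top_counterparts(ticket_importance, 1))  # ['A']
print(top_counterparts(ticket_importance, 2))  # ['A', 'D']
```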
As for how to determine "the b counterpart numbers with the largest importance degree value in each call ticket", take as an example determining the 1 counterpart number (that is, b = 1) with the largest importance degree value in each of the call tickets in Table 2. In call ticket 1, the importance degree value of counterpart number C is greater than that of counterpart number B, so the counterpart numbers in call ticket 1 are ordered by importance degree value as C, B. In call ticket 2, the importance degree value of counterpart number A is greater than that of D, and that of D is greater than that of C, so the counterpart numbers in call ticket 2 are ordered as A, D, C. By analogy, the counterpart numbers in call ticket 3 are ordered as E, B; those in call ticket 4 as A, B; those in call ticket 5 as H; those in call ticket 6 as D, C, A; and those in call ticket 7 as I, A, B.
After the ordering of the counterpart numbers in each call ticket is obtained (that is, after the b counterpart numbers with the largest importance degree value in each call ticket are determined), as for how to "de-duplicate and merge the b counterpart numbers with the largest importance degree value in each call ticket to obtain a counterpart number key set of the multiple call tickets", again take b = 1 and the call tickets in Table 2 as an example. Based on the ordering given above, C, A, E, A, H, D and I are first combined into the set {C, A, E, A, H, D, I}; then, by "merging identical counterpart numbers in the set into one", the set {C, A, E, H, D, I} is obtained. This set is the result of de-duplicating and merging the 1 counterpart number with the largest importance degree value in each of call tickets 1-7 in Table 2, that is, the counterpart number key set of call tickets 1-7. The specific execution process of S130 follows by analogy.
Further, S130 may also be explained in set form. Assume that the set of multiple call tickets is $\{s_1, s_2, \ldots, s_j, \ldots, s_p\}$, where $s_1, s_2, \ldots, s_j, \ldots, s_p$ each represent one call ticket, and let
Figure BDA0002429186330000101
denote the set of the b counterpart numbers with the largest importance degree value in call ticket $s_j$. Merging and de-duplicating these sets over all call tickets yields the counterpart number key set $\{ON_1, ON_2, \ldots, ON_{ov}\}$ of the multiple call tickets, where $b \le ov \le b \times p$.
It should be noted that, in practical applications, each call ticket generally includes hundreds or thousands of call records; therefore, b may be set to 5 by default.
It should be further noted that, in practical applications, there is no fixed execution order between S120 and S130: S120 may be executed before S130, after S130, or simultaneously with S130, which is not limited in this application.
S140, determining the similarity between any two call tickets according to the base station location identifier key set, the counterpart number key set, the number holder identifier and the local serial number data, where the similarity represents the degree of consistency of the behavior characteristics between the two call tickets.
After the base station location identifier key set and the counterpart number key set are determined, in some possible embodiments, because the degree of consistency of the behavior characteristics between two call tickets is directly related to whether the numbers corresponding to the two call tickets are used by the same person, the similarity between any two call tickets can be determined according to the base station location identifier key set, the counterpart number key set, the number holder identifier and the local serial number data, so that the cluster structure in the call ticket data can be analyzed accurately and quickly.
Further, as to how to determine the similarity between any two call tickets according to the base station location identifier key set, the counterpart number key set, the number holder identifier and the local serial number data, referring to Fig. 4, S140 may include:
S140A, determining the key base station location identifier frequency vector corresponding to each call ticket according to the base station location identifier key set.
In some possible embodiments, the key base station location identifier frequency vector may be expressed as a binary code in the form
Figure BDA0002429186330000102
where $a \le cv \le a \times p$.
Specifically, S140A may include: determining, according to the following formula, the components of the key base station location identifier frequency vector
Figure BDA0002429186330000103
corresponding to the j-th call ticket:
Figure BDA0002429186330000111
where $CID_i$ denotes the i-th base station location identifier in the base station location identifier key set,
Figure BDA0002429186330000112
denotes the set of the a base station location identifiers with the largest importance degree value in the j-th call ticket, the j-th call ticket being any one of the multiple call tickets, and
Figure BDA0002429186330000113
is the i-th component of the key base station location identifier frequency vector corresponding to the j-th call ticket. It should be understood that
Figure BDA0002429186330000114
may be any component of the key base station location identifier frequency vector corresponding to the j-th call ticket.
That is,
Figure BDA0002429186330000115
can be understood as follows: when $CID_i$ does not belong to the set
Figure BDA0002429186330000116
the component
Figure BDA0002429186330000117
is 0;
Figure BDA0002429186330000118
can be understood as follows: when $CID_i$ belongs to the set
Figure BDA0002429186330000119
the component
Figure BDA00024291863300001110
is 1. Furthermore, the method for determining the key base station location identifier frequency vector corresponding to each call ticket can be understood as follows, taking the j-th call ticket as an example: traverse each element of the base station location identifier key set; if an element is present in the set of the a base station location identifiers with the largest importance degree value of the j-th call ticket, the value at that element's position is 1, and if it is not present, the value is 0. After the traversal, the key set with every element replaced by its value is the key base station location identifier frequency vector corresponding to the j-th call ticket.
The above S140A is further explained below with reference to the multiple call tickets shown in table 3.
TABLE 3
Figure BDA00024291863300001111
Figure BDA0002429186330000121
Assume that a is 2, and refer to the set of the 2 base station location identifiers with the largest importance degree value in a call ticket as that call ticket's key base station location identifiers. The key base station location identifiers of call ticket 1 are CID2 and CID3, those of call ticket 2 are CID3 and CID4, those of call ticket 3 are CID2 and CID3, and those of call ticket 4 are CID6 and CID4. The combined set is {CID2, CID3, CID3, CID4, CID2, CID3, CID6, CID4}; after de-duplication and merging, the base station location identifier key set of the multiple call tickets is {CID2, CID3, CID4, CID6}. Based on the formula given in S140A, the key base station location identifier frequency vector of call ticket 1 is {1, 1, 0, 0}, that of call ticket 2 is {0, 1, 1, 0}, that of call ticket 3 is {1, 1, 0, 0}, and that of call ticket 4 is {0, 0, 1, 1}.
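The membership rule of S140A (component 1 if the key-set element belongs to the call ticket's key identifiers, otherwise 0) can be sketched as follows. The function name `frequency_vector` and the dictionary layout are hypothetical; the identifiers are those stated above for call tickets 1-4 with a = 2. The same function would apply unchanged to counterpart numbers in S140B.

```python
def frequency_vector(key_set, ticket_key_ids):
    """Binary vector over key_set: 1 where the element is one of the
    call ticket's key identifiers, 0 otherwise."""
    members = set(ticket_key_ids)
    return [1 if identifier in members else 0 for identifier in key_set]


# Base station location identifier key set stated in the text.
key_set = ["CID2", "CID3", "CID4", "CID6"]

# Key base station location identifiers of call tickets 1-4 (a = 2).
ticket_key_ids = {
    1: ["CID2", "CID3"],
    2: ["CID3", "CID4"],
    3: ["CID2", "CID3"],
    4: ["CID6", "CID4"],
}

vectors = {t: frequency_vector(key_set, ids) for t, ids in ticket_key_ids.items()}
for ticket, vector in vectors.items():
    print(ticket, vector)
# 1 [1, 1, 0, 0]
# 2 [0, 1, 1, 0]
# 3 [1, 1, 0, 0]
# 4 [0, 0, 1, 1]
```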
S140B, determining the key counterpart number frequency vector corresponding to each call ticket according to the counterpart number key set.
In some possible embodiments, the key counterpart number frequency vector may be expressed as a binary code in the form
Figure BDA0002429186330000131
where $b \le ov \le b \times p$.
Specifically, S140B may include: determining, according to the following formula, the components of the key counterpart number frequency vector
Figure BDA0002429186330000132
corresponding to the j-th call ticket:
Figure BDA0002429186330000133
where $ON_i$ denotes the i-th counterpart number in the counterpart number key set,
Figure BDA0002429186330000134
denotes the set of the b counterpart numbers with the largest importance degree value in the j-th call ticket, the j-th call ticket being any one of the multiple call tickets, and
Figure BDA0002429186330000135
is the i-th component of the key counterpart number frequency vector corresponding to the j-th call ticket. It should be understood that
Figure BDA0002429186330000136
may be any component of the key counterpart number frequency vector corresponding to the j-th call ticket.
That is,
Figure BDA0002429186330000137
can be understood as follows: when $ON_i$ does not belong to the set
Figure BDA0002429186330000138
the component
Figure BDA0002429186330000139
is 0;
Figure BDA00024291863300001310
can be understood as follows: when $ON_i$ belongs to the set
Figure BDA00024291863300001311
the component
Figure BDA00024291863300001312
is 1. Furthermore, the method for determining the key counterpart number frequency vector corresponding to each call ticket can be understood as follows, taking the j-th call ticket as an example: traverse each element of the counterpart number key set; if an element is present in the set of the b counterpart numbers with the largest importance degree value of the j-th call ticket, the value at that element's position is 1, and if it is not present, the value is 0. After the traversal, the key set with every element replaced by its value is the key counterpart number frequency vector corresponding to the j-th call ticket.
S140B is further explained below with reference to the multiple call tickets shown in Table 3. Assume that b is 2, and refer to the set of the 2 counterpart numbers with the largest importance degree value in a call ticket as that call ticket's key numbers. The key numbers of call ticket 1 are C and E, those of call ticket 2 are E and C, those of call ticket 3 are B and D, and those of call ticket 4 are H and F. The combined set is {C, E, E, C, B, D, H, F}; after de-duplication and merging, the counterpart number key set of the multiple call tickets is {C, E, B, D, H, F}. Based on the formula given in S140B, the key counterpart number frequency vector of call ticket 1 is {1, 1, 0, 0, 0, 0}, that of call ticket 2 is {1, 1, 0, 0, 0, 0}, that of call ticket 3 is {0, 0, 1, 1, 0, 0}, and that of call ticket 4 is {0, 0, 0, 0, 1, 1}.
It should be noted that, in practical applications, there is no fixed execution order between S140A and S140B: S140A may be executed before S140B, after S140B, or simultaneously with S140B, which is not limited in this application.
S140C, determining the similarity between any two call tickets according to the key base station location identifier frequency vector, the key counterpart number frequency vector, the number holder identifier and the local serial number data.
Specifically, S140C may include: determining the similarity $g_\mu(x, y)$ between $s_x$ and $s_y$ according to the following formula:
Figure BDA0002429186330000141
where
Figure BDA0002429186330000142
Figure BDA0002429186330000143
$s_x$ and $s_y$ are each any one of the multiple call tickets, and $s_x$ and $s_y$ are different call tickets; $0 \le \kappa \le 2$, $0 \le \lambda \le 2$, and $\kappa + \lambda = 2$;
Figure BDA0002429186330000144
is the local serial number data of $s_x$;
Figure BDA0002429186330000145
is the local serial number data of $s_y$;
Figure BDA0002429186330000146
is the number holder identifier of $s_x$;
Figure BDA0002429186330000147
is the number holder identifier of $s_y$; $x_i$ is the i-th component of the key base station location identifier frequency vector of $s_x$; $y_i$ is the i-th component of the key base station location identifier frequency vector of $s_y$; $X_i$ is the i-th component of the key counterpart number frequency vector of $s_x$; and $Y_i$ is the i-th component of the key counterpart number frequency vector of $s_y$.
Continuing with call ticket 1 and call ticket 2 in Table 3 above as an example, assume that the local serial number data of call ticket 1 and call ticket 2 are inconsistent, that their number holder identifiers are inconsistent, that κ and λ are both set to 1, and that a and b are both set to 2. Then the key counterpart number frequency vector of call ticket 1 is {1, 1, 0, 0, 0, 0}, the key base station location identifier frequency vector of call ticket 1 is {1, 1, 0, 0}, the key counterpart number frequency vector of call ticket 2 is {1, 1, 0, 0, 0, 0}, and the key base station location identifier frequency vector of call ticket 2 is {0, 1, 1, 0}. According to the above formula, it can be calculated that:
Figure BDA0002429186330000148
Figure BDA0002429186330000149
$\alpha_\mu(x, y) = 0$ and $\beta_\mu(x, y) = 0$, so the similarity between call ticket 1 and call ticket 2 is
Figure BDA00024291863300001410
Referring again to Fig. 2: S150, determining, according to the similarity, the local communication numbers corresponding to the call tickets with consistent behavior characteristics among the multiple call tickets as a cluster, so as to obtain a cluster structure of the multiple call tickets.
In this embodiment, as shown in Tables 1 and 2, each of the multiple call tickets corresponds to one local communication number. Further, as to how to determine, according to the similarity, the local communication numbers corresponding to the call tickets with consistent behavior characteristics as a cluster so as to obtain the cluster structure of the multiple call tickets, referring to Fig. 5 on the basis of Fig. 2, S150 may include:
S150A, obtaining an unprocessed call ticket from the call ticket set composed of the multiple call tickets as the target call ticket.
S150B, obtaining the similarity between the target ticket and each other unprocessed ticket in the ticket collection.
It can be understood that, when the similarity between the target call ticket and every other unprocessed call ticket in the call ticket set is smaller than the preset threshold, the target call ticket may be directly determined as a processed call ticket, and the process returns to step S150A.
S150C, generating a cluster set, adding the target call ticket and all call tickets in the call ticket set whose similarity to the target call ticket is greater than the preset threshold into the cluster set, and determining all call tickets in the call ticket set that belong to the cluster set as processed call tickets; all call tickets in the cluster set other than the target call ticket are to-be-associated cluster call tickets.
S150D, obtaining one to-be-associated cluster call ticket from the cluster set as the target to-be-associated cluster call ticket.
S150E, obtaining the similarity between the target to-be-associated cluster call ticket and each unprocessed call ticket in the call ticket set.
It can be understood that, when the similarity between the target to-be-associated cluster call ticket and every unprocessed call ticket in the call ticket set is smaller than the preset threshold, the target to-be-associated cluster call ticket may be directly determined as a clustered call ticket, and the process returns to step S150D.
S150F, adding all call tickets in the call ticket set whose similarity to the target to-be-associated cluster call ticket is greater than the preset threshold into the cluster set as to-be-associated cluster call tickets, determining all call tickets in the call ticket set that belong to the cluster set as processed call tickets, and determining the target to-be-associated cluster call ticket as a clustered call ticket.
S150G, judging whether any to-be-associated cluster call ticket remains in the cluster set, and returning to step S150D when one remains.
S150H, judging whether any unprocessed call ticket remains in the call ticket set, and returning to step S150A when one remains.
S150I, determining the local communication numbers corresponding to the call tickets in each generated cluster set as one cluster, so as to obtain the cluster structure.
The above-mentioned S150A-S150I will be further explained with reference to examples.
Assume that the similarity between any two of call tickets 1-7 is as shown in Table 4 below and that the preset threshold is 0.7. At this point, call tickets 1-7 are all unprocessed call tickets.
TABLE 4
Figure BDA0002429186330000151
Figure BDA0002429186330000161
First, one unprocessed call ticket is obtained from call tickets 1-7 (that is, from the call ticket set composed of the multiple call tickets) as the target call ticket (assume call ticket 1 is obtained), and then the similarities between call ticket 1 and call tickets 2 to 7 are obtained (1, 0.1, 0.2, 0.4, 2 and 0.5, respectively). Since the preset threshold is 0.7, call ticket 1 together with call tickets 2 and 6 is added into the newly generated cluster set 1, so that cluster set 1 is {call ticket 1, call ticket 2, call ticket 6}. Call tickets 1, 2 and 6 are then determined as processed call tickets, and call tickets 2 and 6 are to-be-associated cluster call tickets.
Next, any one to-be-associated cluster call ticket is obtained from cluster set 1 as the target to-be-associated cluster call ticket (assume call ticket 2 is obtained), and the similarity between call ticket 2 and each unprocessed call ticket in the call ticket set is obtained, that is, the similarities between call ticket 2 and call tickets 3, 4, 5 and 7 (0.2, 1, 0.4 and 0.1, respectively). Because the similarity between call ticket 2 and call ticket 4 is greater than the preset threshold, call ticket 4 is added into cluster set 1, so that cluster set 1 is {call ticket 1, call ticket 2, call ticket 6, call ticket 4}. Call tickets 1, 2, 4 and 6 are now processed call tickets, and call tickets 4 and 6 are to-be-associated cluster call tickets.
Then, since call ticket 6 in cluster set 1 is a to-be-associated cluster call ticket, the similarities between call ticket 6 and call tickets 3, 5 and 7 are obtained (0.1, 0.5 and 0.4, respectively); and since call ticket 4 in cluster set 1 is a to-be-associated cluster call ticket, the similarities between call ticket 4 and call tickets 3, 5 and 7 are obtained (0.2, 0.1 and 0.1, respectively). Because these similarities are all smaller than the preset threshold, cluster set 1 is finally {call ticket 1, call ticket 2, call ticket 6, call ticket 4}. At this point, call tickets 1, 2, 4 and 6 are processed call tickets, and no to-be-associated cluster call ticket remains in cluster set 1.
After cluster set 1 is determined, call tickets 3, 5 and 7 are still unprocessed, so one unprocessed call ticket is obtained from call tickets 3, 5 and 7 as the target call ticket (assume call ticket 3 is obtained), and the similarities between call ticket 3 and call tickets 5 and 7 are obtained (0.8 and 0.5, respectively). Since the preset threshold is 0.7, call tickets 3 and 5 are added into the newly generated cluster set 2, so that cluster set 2 is {call ticket 3, call ticket 5}. Call tickets 1, 2, 4, 6, 3 and 5 are now processed call tickets, and call ticket 5 is a to-be-associated cluster call ticket.
Next, one to-be-associated cluster call ticket is obtained from cluster set 2 as the target to-be-associated cluster call ticket (call ticket 5 is obtained), and the similarity between call ticket 5 and each unprocessed call ticket in the call ticket set is obtained, that is, the similarity between call ticket 5 and call ticket 7 (0.5). Because this similarity is smaller than the preset threshold, cluster set 2 is finally {call ticket 3, call ticket 5}. At this point, call tickets 1, 2, 4, 6, 3 and 5 are processed call tickets, and no to-be-associated cluster call ticket remains in cluster set 2.
Finally, because call ticket 7 is the only remaining unprocessed call ticket, the cluster structure of call tickets 1-7 is determined as follows: the local communication numbers corresponding to call tickets 1, 2, 6 and 4 form one cluster, and the local communication numbers corresponding to call tickets 3 and 5 form another cluster.
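The loop formed by S150A-S150I can be sketched in Python as below. This is an illustrative sketch rather than the implementation of the present application: the names `cluster_call_tickets`, `QUOTED_SIMILARITY` and `similarity` are hypothetical. The similarity values are only those quoted in the walk-through above; because Table 4 itself is an image, the pairs that the walk-through never quotes (2-6 and 4-6) default to 0.0 here, an assumption that does not affect the result since those pairs are never compared.

```python
from collections import deque


def cluster_call_tickets(ticket_ids, similarity, threshold):
    """Group call tickets by expanding each cluster from a target call ticket."""
    unprocessed = list(ticket_ids)
    clusters = []
    while unprocessed:                          # S150H: any unprocessed left?
        target = unprocessed.pop(0)             # S150A: pick a target
        cluster = [target]
        pending = deque([target])               # call tickets still to expand
        while pending:                          # S150G: any to-be-associated left?
            current = pending.popleft()         # S150D
            matches = [t for t in unprocessed   # S150B / S150E
                       if similarity(current, t) > threshold]
            for t in matches:                   # S150C / S150F
                unprocessed.remove(t)           # mark as processed
                cluster.append(t)
                pending.append(t)
        clusters.append(cluster)
    return clusters                             # S150I: one list per cluster set


# Similarities quoted in the walk-through (keys are sorted ticket pairs).
QUOTED_SIMILARITY = {
    (1, 2): 1.0, (1, 3): 0.1, (1, 4): 0.2, (1, 5): 0.4, (1, 6): 2.0, (1, 7): 0.5,
    (2, 3): 0.2, (2, 4): 1.0, (2, 5): 0.4, (2, 7): 0.1,
    (3, 6): 0.1, (5, 6): 0.5, (6, 7): 0.4,
    (3, 4): 0.2, (4, 5): 0.1, (4, 7): 0.1,
    (3, 5): 0.8, (3, 7): 0.5, (5, 7): 0.5,
}


def similarity(x, y):
    # Unquoted pairs default to 0.0 (assumption, see lead-in).
    return QUOTED_SIMILARITY.get((min(x, y), max(x, y)), 0.0)


clusters = cluster_call_tickets(range(1, 8), similarity, threshold=0.7)
print(clusters)  # [[1, 2, 6, 4], [3, 5], [7]]
```

In this sketch a call ticket with no match, such as call ticket 7, is returned as a single-element list; the text treats such a call ticket as processed without counting it among the clusters of several numbers.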
It should be understood that, from the determined cluster structure, in which the local communication numbers corresponding to call tickets 1, 2, 6 and 4 form one cluster and the local communication numbers corresponding to call tickets 3 and 5 form another cluster, it can be determined that the local communication numbers corresponding to call tickets 1, 2, 6 and 4 are actually used by the same person, and that the local communication numbers corresponding to call tickets 3 and 5 are likewise actually used by the same person. Therefore, when the relationship network of a specific cluster structure in the call ticket data of a criminal group is analyzed and displayed, the clustering characteristics in that call ticket data and the prime suspect of the criminal group can be determined accurately and quickly.
Referring again to Fig. 2: S160, generating a visual communication data graph according to the cluster structure, and displaying the visual communication data graph.
In some possible embodiments, after the visual communication data graph is generated according to the cluster structure, it may be presented as shown in Fig. 6.
Further, the visual communication data graph may include the communication relationships between the clusters and the other call tickets among the multiple call tickets.
Continuing with the example of Table 2, assume that the cluster structure of the multiple call tickets includes one cluster formed by the local communication numbers corresponding to call tickets 1, 2, 6 and 4, and another cluster formed by the local communication numbers corresponding to call tickets 3 and 5. The visual communication data graph may then be as shown in Fig. 7, where pie chart ① represents the cluster formed by the local communication numbers corresponding to call tickets 1, 2, 6 and 4; pie chart ② represents the cluster formed by the local communication numbers corresponding to call tickets 3 and 5; and block ③ represents the local communication number corresponding to call ticket 7. A directional arrow labeled with the number 1 indicates that the cluster placed 1 call to the local communication number corresponding to call ticket 7; a directional arrow from pie chart ② labeled 4 indicates that this cluster placed 4 calls to the cluster of the other pie chart; and a directional arrow pointing to pie chart ① indicates that 1 call was placed to the cluster of pie chart ①. The directional arrows thus represent the communication relationships between the clusters and the other call tickets among the multiple call tickets.
It should be understood that a pie chart represents a cluster (in practice, it may represent one person holding multiple mobile phones), a block represents the local communication number corresponding to one of the other call tickets among the multiple call tickets, and a directional arrow represents a communication relationship between a cluster and the other call tickets. It should also be understood that the visual communication data graph may take other forms; for example, the sectors of a pie chart may use different colors to distinguish the local communication numbers included in the cluster, and the colors of the directional arrows may distinguish different contact methods, which is not limited in this application.
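One way to prepare the data behind such a graph is to count directed calls between nodes, where a node is either a cluster or a single number. The sketch below is an assumption-laden illustration, not the method of the present application: the function `build_edges`, the node labels, the numbers N1-N7 and the call list are all hypothetical, and dropping calls inside a cluster is a design choice of this sketch.

```python
from collections import Counter


def build_edges(number_to_node, calls):
    """Count directed calls between graph nodes.

    number_to_node: maps a local communication number to its node label
                    (a cluster or a single number).
    calls: iterable of (caller_number, callee_number) pairs.
    """
    edges = Counter()
    for caller, callee in calls:
        src = number_to_node.get(caller)
        dst = number_to_node.get(callee)
        # Skip unknown numbers and calls inside one node (sketch assumption).
        if src is None or dst is None or src == dst:
            continue
        edges[(src, dst)] += 1
    return dict(edges)


# Hypothetical numbers N1..N7 standing in for call tickets 1..7.
number_to_node = {
    "N1": "cluster_1", "N2": "cluster_1", "N6": "cluster_1", "N4": "cluster_1",
    "N3": "cluster_2", "N5": "cluster_2",
    "N7": "number_7",
}

# Hypothetical call records, made up for illustration only.
calls = [
    ("N1", "N7"),
    ("N3", "N1"), ("N5", "N2"), ("N3", "N6"), ("N5", "N4"),
    ("N1", "N2"),          # inside cluster_1, ignored
    ("N7", "N4"),
]

edges = build_edges(number_to_node, calls)
for (src, dst), count in edges.items():
    print(f"{src} -> {dst}: {count}")
```

Each resulting edge could then be drawn as a directional arrow labeled with its call count.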
It should be noted that, in practical applications, the call records in a call ticket may include, in addition to the base station location identifier and the counterpart number, records such as the call date, call start time, call duration, base station location area code and base station cell. Therefore, "performing data processing on the multiple call tickets according to the base station location identifier and counterpart number records to obtain the cluster structure of the multiple call tickets", as described in S100-S150, is only one possible embodiment of the present application, and the scope of protection of the present application is not limited thereto.
It should be understood that, based on the above call ticket data processing and display method, the present application first determines the importance degree value of each base station location identifier and each counterpart number in each of the obtained call tickets, then determines the base station location identifier key set and the counterpart number key set according to these importance degree values, and obtains the similarity between any two call tickets according to the two key sets. The similarity represents the degree of consistency of the behavior characteristics between two call tickets; that is, it accurately and efficiently reflects how likely the two call tickets are to belong to one cluster. Therefore, after the similarity between any two call tickets is determined, the local communication numbers corresponding to the call tickets with consistent behavior characteristics are determined as a cluster, and the cluster structure is finally rendered as a visual communication data graph and displayed. In this way, the cluster structure in the call ticket data can be analyzed accurately and quickly and displayed visually, which improves the efficiency of investigation and case handling by public security organs.
In order to execute the corresponding steps in the foregoing embodiments and their various possible implementations, an implementation of the call ticket data processing and display apparatus is provided below. Referring to Fig. 8, Fig. 8 shows a functional block diagram of the call ticket data processing and display apparatus provided in an embodiment of the present application. It should be noted that the basic principle and technical effects of the call ticket data processing and display apparatus 200 provided in this embodiment are the same as those of the above embodiments; for brevity, for anything not mentioned in this embodiment, reference may be made to the corresponding content in the above embodiments. The call ticket data processing and display apparatus 200 includes an obtaining module 210, a calculating module 220, a clustering module 230 and a presentation module 240.
Alternatively, the modules may be stored in a memory in the form of software or Firmware (Firmware) or be fixed in an Operating System (OS) of the electronic device 100 provided in the present application, and may be executed by a processor in the electronic device 100. Meanwhile, data, codes of programs, and the like required to execute the above modules may be stored in the memory.
The acquisition module 210 may be configured to obtain multiple call tickets; each call ticket comprises at least one call record, and each call record comprises a base station location identifier, an opposite party number, a number holder identifier and local serial number data.
It is to be appreciated that acquisition module 210 can be utilized to support electronic device 100 in performing the aforementioned S100, and/or other processes for the techniques described herein.
The calculation module 220 may be configured to determine the importance degree value of each base station location identifier and each counterpart number in each call ticket according to a preset calculation formula.
It is to be appreciated that the calculation module 220 can be utilized to support the electronic device 100 in performing the aforementioned S110 and the like, and/or other processes for the techniques described herein.
As to how to determine the importance degree value of each base station location identifier and each counterpart number in each call ticket according to the preset calculation formula, the calculation module 220 may be configured to determine the importance degree value of each base station location identifier and each counterpart number in each call ticket according to the term frequency-inverse document frequency (TF-IDF) formula.
It is to be appreciated that the calculation module 220 may be utilized to support the electronic device 100 in performing the aforementioned S110A and the like, and/or other processes for the techniques described herein.
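As an illustration of the importance degree value, the following minimal sketch applies the textbook TF-IDF form, treating each call ticket as a document and each base station location identifier (or counterpart number) as a term. The function name `tf_idf`, the input layout and the specific weighting (relative frequency multiplied by a natural-log inverse document frequency) are assumptions made for this example; the exact variant used by the embodiment is not restated in this passage.

```python
import math
from collections import Counter


def tf_idf(tickets):
    """Compute a textbook TF-IDF score per token and per ticket.

    tickets: dict mapping a ticket id to the list of tokens found in its
             call records (for example, one base station identifier per record).
    Returns: dict mapping ticket id -> {token: score}.
    The weighting below is the common textbook form, assumed for illustration.
    """
    n_tickets = len(tickets)

    # Document frequency: in how many tickets does each token occur?
    doc_freq = Counter()
    for tokens in tickets.values():
        doc_freq.update(set(tokens))

    scores = {}
    for ticket_id, tokens in tickets.items():
        counts = Counter(tokens)
        total = len(tokens)
        scores[ticket_id] = {
            token: (count / total) * math.log(n_tickets / doc_freq[token])
            for token, count in counts.items()
        }
    return scores


# Hypothetical data: two tickets with base station identifiers per call record.
example = {"A": ["c1", "c1", "c2"], "B": ["c2", "c3"]}
example_scores = tf_idf(example)
```

With this weighting, an identifier that appears in every call ticket scores zero, so only identifiers that distinguish a ticket from the others rank highly.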
The calculation module 220 may be further configured to de-duplicate and merge the a base station location identifiers with the largest importance degree values in each call ticket, so as to obtain a base station location identifier key set of the multiple call tickets.
It is to be appreciated that the calculation module 220 can be utilized to support the electronic device 100 in performing the above-described S120 and the like, and/or other processes for the techniques described herein.
The calculation module 220 may also be configured to de-duplicate and merge the b counterpart numbers with the largest importance degree values in each call ticket, so as to obtain a counterpart number key set of the multiple call tickets.
It is to be appreciated that the calculation module 220 can be utilized to support the electronic device 100 in performing the aforementioned S130 and the like, and/or other processes for the techniques described herein.
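The de-duplication and merging of the top-ranked identifiers can be sketched as follows. The function name `key_set`, the input layout (a score dictionary per ticket) and the tie-breaking rule are assumptions made for this example; the same function would serve for base station location identifiers (with a) and for counterpart numbers (with b).

```python
def key_set(scores, top_n):
    """Merge the top_n highest-scoring tokens of every ticket into one key set.

    scores: dict mapping ticket id -> {token: importance value}.
    top_n:  how many tokens to keep per ticket (a or b in the description).
    Returns an ordered list without duplicates, so that each position can
    later serve as a vector index. Ties are broken by token name, which is
    an assumption of this sketch.
    """
    merged = []
    seen = set()
    for ticket_id in sorted(scores):
        ranked = sorted(scores[ticket_id].items(), key=lambda kv: (-kv[1], kv[0]))
        for token, _value in ranked[:top_n]:
            if token not in seen:
                seen.add(token)
                merged.append(token)
    return merged


# Hypothetical importance values for two tickets.
example_scores = {
    "A": {"c1": 0.5, "c2": 0.1, "c3": 0.3},
    "B": {"c3": 0.4, "c4": 0.2},
}
example_key_set = key_set(example_scores, 2)  # ['c1', 'c3', 'c4']
```

Returning an ordered list rather than a set keeps the index i of each key element stable, which the frequency vectors described next rely on.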
The calculation module 220 may also be configured to determine the similarity between any two call tickets according to the base station location identifier key set, the counterpart number key set, the number holder identifier and the local serial number data; the similarity represents the degree of consistency of the behavior characteristics of the two call tickets.
It is to be appreciated that the calculation module 220 can be utilized to support the electronic device 100 in performing the aforementioned S140 and the like, and/or other processes for the techniques described herein.
As to how to determine the similarity between any two call tickets according to the base station location identifier key set, the counterpart number key set, the number holder identifier and the local serial number data, the calculation module 220 may be configured to: determine a key base station location identifier frequency vector corresponding to each call ticket according to the base station location identifier key set; determine a key counterpart number frequency vector corresponding to each call ticket according to the counterpart number key set; and determine the similarity between any two call tickets according to the key base station location identifier frequency vector, the key counterpart number frequency vector, the number holder identifier and the local serial number data.
It is to be appreciated that the calculation module 220 may be utilized to support the electronic device 100 in performing the above-described S140A-S140C and the like, and/or other processes for the techniques described herein.
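The following sketch shows one way a frequency vector over a key set could be built and how two pairs of vectors could be combined into a single score. The similarity formula of the embodiment is not reproduced in this passage, so `stand_in_similarity` substitutes a weighted cosine similarity purely for illustration; it also omits the number holder identifier and the local serial number data, whose role in the formula cannot be recovered here. All function names, and the choice to count raw occurrences, are assumptions.

```python
import math
from collections import Counter


def frequency_vector(tokens, key_set):
    """Count how often each key-set element occurs in one ticket's tokens."""
    counts = Counter(tokens)
    return [counts[key] for key in key_set]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def stand_in_similarity(x, y, X, Y, kappa=1.0, lam=1.0):
    """Hypothetical stand-in, NOT the formula of the embodiment.

    x, y: base station frequency vectors of the two tickets.
    X, Y: counterpart number frequency vectors of the two tickets.
    kappa, lam: weights constrained to 0..2 with kappa + lam == 2.
    """
    if not (0 <= kappa <= 2 and 0 <= lam <= 2 and abs(kappa + lam - 2) < 1e-9):
        raise ValueError("kappa and lam must lie in [0, 2] and sum to 2")
    return (kappa * cosine(x, y) + lam * cosine(X, Y)) / 2


# Hypothetical ticket: three call records seen at base stations c1, c1, c3.
vector = frequency_vector(["c1", "c1", "c3"], ["c1", "c3", "c4"])  # [2, 1, 0]
```

Dividing by 2 keeps the stand-in score in the range 0 to 1 for non-negative frequency vectors, which makes a single preset threshold easy to apply in the clustering step.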
The clustering module 230 may be configured to determine, according to the similarity, a plurality of local communication numbers corresponding to a plurality of tickets with consistent behavior characteristics among the plurality of tickets as a cluster, so as to obtain a cluster structure of the plurality of tickets.
It will be appreciated that clustering module 230 may be used to support electronic device 100 in performing the above-described S150, and/or the like, and/or other processes for the techniques described herein.
In some possible embodiments, each of the multiple call tickets corresponds to one local communication number. Further, as to how to determine, according to the similarity, the local communication numbers corresponding to the call tickets with consistent behavior characteristics as a cluster so as to obtain the cluster structure of the multiple call tickets, the clustering module 230 may be configured to: obtain an unprocessed call ticket from a call ticket set consisting of the multiple call tickets as a target call ticket; acquire the similarity between the target call ticket and each other unprocessed call ticket in the call ticket set; generate a cluster set, add to the cluster set the target call ticket and all call tickets in the call ticket set whose similarity with the target call ticket is greater than a preset threshold, and determine all call tickets in the call ticket set that belong to the cluster set as processed call tickets, where all call tickets in the cluster set other than the target call ticket are to-be-associated cluster call tickets; acquire one to-be-associated cluster call ticket from the cluster set as a target to-be-associated cluster call ticket; acquire the similarity between the target to-be-associated cluster call ticket and each unprocessed call ticket in the call ticket set; add to the cluster set, as to-be-associated cluster call tickets, all call tickets in the call ticket set whose similarity with the target to-be-associated cluster call ticket is greater than the preset threshold, determine all call tickets in the call ticket set that belong to the cluster set as processed call tickets, and determine the target to-be-associated cluster call ticket as a clustered call ticket; judge whether a to-be-associated cluster call ticket still exists in the cluster set, and return to step S150D when one does; judge whether an unprocessed call ticket still exists in the call ticket set, and return to step S150A when one does; and determine the local communication numbers corresponding to the call tickets in each generated cluster set as a cluster structure.
It will be appreciated that the clustering module 230 may be used to support the electronic device 100 in performing the above-described S150A-S150I, etc., and/or other processes for the techniques described herein.
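The clustering procedure described above amounts to a threshold-based expansion: starting from one unprocessed call ticket, every ticket that is similar enough to any member already in the cluster set is pulled in, until no more can be added. A minimal sketch follows; the function name `cluster_tickets`, the use of a queue for the to-be-associated tickets and the callable `similarity` argument are assumptions of this example.

```python
from collections import deque


def cluster_tickets(ticket_ids, similarity, threshold):
    """Group tickets by repeatedly expanding from an unprocessed target ticket.

    ticket_ids: list of ticket identifiers (one local number per ticket).
    similarity: function (id_a, id_b) -> float, assumed symmetric.
    threshold:  preset threshold; a ticket joins when similarity > threshold.
    Returns a list of cluster sets, each a list of ticket ids.
    """
    processed = set()
    clusters = []
    for target in ticket_ids:           # pick the next unprocessed ticket
        if target in processed:
            continue
        cluster = [target]
        processed.add(target)
        pending = deque([target])        # tickets still to be associated
        while pending:                   # loop until none are left
            current = pending.popleft()
            for other in ticket_ids:
                if other in processed:
                    continue
                if similarity(current, other) > threshold:
                    processed.add(other)
                    cluster.append(other)
                    pending.append(other)
        clusters.append(cluster)
    return clusters


# Hypothetical pairwise similarities between four tickets.
table = {frozenset(("a", "b")): 0.9, frozenset(("b", "c")): 0.8,
         frozenset(("a", "c")): 0.1}


def lookup(p, q):
    return table.get(frozenset((p, q)), 0.0)


example_clusters = cluster_tickets(["a", "b", "c", "d"], lookup, 0.5)
# [['a', 'b', 'c'], ['d']]
```

In the example, ticket c joins the first cluster through b even though its similarity with a is below the threshold, which is the chaining effect of the to-be-associated step. Mapping each ticket id to its local communication number then yields the cluster structure.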
The presentation module 240 may be configured to generate a visual communication data graph according to the cluster structure and to display the visual communication data graph, where the visual communication data graph may include the communication relationships between the clusters and the other call tickets in the multiple call tickets.
It is to be appreciated that presentation module 240 may be utilized to support electronic device 100 in performing S160, etc., described above, and/or other processes for the techniques described herein.
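One possible way to hand a cluster structure to a drawing tool is to emit a textual graph description. The sketch below writes Graphviz DOT text, in which subgraphs whose names begin with `cluster` are drawn as boxes by the dot layout engine. The choice of DOT, the function name `to_dot` and the input layout are assumptions of this example; the embodiment does not prescribe a rendering format in this passage.

```python
def to_dot(clusters, contacts):
    """Render clusters and contact relationships as Graphviz DOT text.

    clusters: list of lists of local communication numbers (strings).
    contacts: iterable of (number_a, number_b) pairs that communicated.
    Returns a DOT string for an undirected graph; each pair is emitted once.
    """
    lines = ["graph call_tickets {"]
    for index, members in enumerate(clusters):
        lines.append(f"  subgraph cluster_{index} {{")
        lines.append(f'    label="cluster {index}";')
        for member in members:
            lines.append(f'    "{member}";')
        lines.append("  }")
    # Normalise each pair so that (a, b) and (b, a) collapse into one edge.
    unique_edges = sorted({tuple(sorted(pair)) for pair in contacts})
    for a, b in unique_edges:
        lines.append(f'  "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical numbers: one cluster of two numbers, one outside contact.
dot_text = to_dot([["111", "222"]], [("222", "333"), ("333", "222")])
```

The resulting text can be saved to a file and rendered with any DOT-capable viewer; numbers that are not members of a cluster appear as ordinary nodes connected to it.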
In summary, the embodiments of the present application provide a method and an apparatus for processing and displaying call ticket data, and an electronic device. The present application first determines, from the obtained multiple call tickets, the importance degree value of each base station location identifier and each counterpart number in each call ticket, then determines the base station location identifier key set and the counterpart number key set according to those importance degree values, and determines the similarity between any two call tickets according to the two sets, the number holder identifier and the local serial number data. The similarity represents the degree of consistency of the behavior characteristics of the two call tickets; that is, the similarity can accurately and efficiently reflect how likely the two call tickets are to belong to the same cluster. Therefore, after the similarity between any two call tickets is determined, the local communication numbers corresponding to the call tickets with consistent behavior characteristics are determined as a cluster, and finally a visual communication data graph is generated from the cluster structure and displayed, so that the cluster structure in the call ticket data can be analyzed accurately and quickly and displayed visually, which improves the investigation and case-handling efficiency of public security organizations and fills a technical gap in the field.
The above description is only for the possible embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application are included in the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A call ticket data processing and displaying method is characterized by comprising the following steps:
acquiring a plurality of call tickets; each call ticket comprises at least one call record, and each call record comprises a base station position identifier, an opposite party number, a number holder identifier and local serial number data;
determining the importance degree value of each base station position identifier and each opposite party number in each call ticket according to a preset calculation formula;
de-duplicating and merging the a base station position identifiers with the largest importance degree values in each call ticket to obtain a base station position identification key set of the multiple call tickets;
de-duplicating and merging the b opposite party numbers with the largest importance degree values in each call ticket to obtain an opposite party number key set of the multiple call tickets;
determining the similarity between any two call tickets according to the base station position identification key set, the opposite party number key set, the number holder identification and the local serial number data; the similarity represents the consistency degree of behavior characteristics between the two call tickets;
determining, according to the similarity, a plurality of local communication numbers corresponding to a plurality of call tickets with consistent behavior characteristics among the multiple call tickets as a cluster, and obtaining a cluster structure of the multiple call tickets;
and generating a visual communication data graph according to the clustering structure, and displaying the visual communication data graph.
2. The method of claim 1, wherein the step of determining the importance degree value of each base station location identifier and each counterpart number in each call ticket according to a preset calculation formula comprises:
and determining the importance degree value of each base station position identifier and each opposite party number in each call ticket according to a term frequency-inverse document frequency (TF-IDF) formula.
3. The method of claim 1, wherein the step of determining similarity between any two call tickets according to the base station location identification key set, the counterpart number key set, the number holder identification, and the local serial number data comprises:
determining a key base station position identification frequency vector corresponding to each ticket according to the base station position identification key set;
determining a key opposite party number frequency vector corresponding to each ticket according to the opposite party number key set;
and determining the similarity between any two call tickets according to the position identification frequency vector of the key base station, the frequency vector of the key counterpart number, the number holder identification and the serial number data of the local machine.
4. The method of claim 3, wherein the step of determining the key base station location identity frequency vector corresponding to each of the tickets according to the key set of base station location identity comprises:
determining, according to the following formula, the components of the key base station position identification frequency vector [Figure FDA0002429186320000021] corresponding to the j-th call ticket:
[Figure FDA0002429186320000022]
wherein $CID_i$ represents the value of the i-th base station position identifier in the base station position identification key set; [Figure FDA0002429186320000023] is the set of the a base station position identifiers with the largest importance degree values in the j-th call ticket; the j-th call ticket is any one of the multiple call tickets; and [Figure FDA0002429186320000024] is the i-th component of the key base station position identification frequency vector corresponding to the j-th call ticket;
the step of determining the key opposite party number frequency vector corresponding to each ticket according to the key set of the opposite party number comprises the following steps:
determining, according to the following formula, the components of the key opposite party number frequency vector [Figure FDA0002429186320000031] corresponding to the j-th call ticket:
[Figure FDA0002429186320000032]
wherein $ON_i$ represents the value of the i-th opposite party number in the opposite party number key set; [Figure FDA0002429186320000033] is the set of the b opposite party numbers with the largest importance degree values in the j-th call ticket; the j-th call ticket is any one of the multiple call tickets; and [Figure FDA0002429186320000034] is the i-th component of the key opposite party number frequency vector corresponding to the j-th call ticket.
5. The method of claim 3, wherein the step of determining similarity between any two call tickets according to the key base station location identity frequency vector, the key counterpart number frequency vector, the number holder identity, and the local serial number data comprises:
determining the similarity $g_\mu(x, y)$ between $s_x$ and $s_y$ according to the following formula:
[Figure FDA0002429186320000035]
wherein
[Figure FDA0002429186320000036]
[Figure FDA0002429186320000037]
$s_x$ and $s_y$ are each any one of the multiple call tickets, and $s_x$ and $s_y$ are different call tickets; $0 \le \kappa \le 2$, $0 \le \lambda \le 2$, $\kappa + \lambda = 2$; [Figure FDA0002429186320000038] is the local serial number data of $s_x$; [Figure FDA0002429186320000039] is the local serial number data of $s_y$; [Figure FDA00024291863200000310] is the number holder identifier of $s_x$; [Figure FDA00024291863200000311] is the number holder identifier of $s_y$; $x_i$ is the i-th component of the key base station position identification frequency vector of $s_x$; $y_i$ is the i-th component of the key base station position identification frequency vector of $s_y$; $X_i$ is the i-th component of the key counterpart number frequency vector of $s_x$; and $Y_i$ is the i-th component of the key counterpart number frequency vector of $s_y$.
6. The method of claim 1, wherein each of the multiple call tickets corresponds to one local communication number; and the step of determining, according to the similarity, a plurality of local communication numbers corresponding to a plurality of call tickets with consistent behavior characteristics among the multiple call tickets as a cluster, and obtaining a cluster structure of the multiple call tickets, comprises:
obtaining an unprocessed call ticket from a call ticket set consisting of the plurality of call tickets as a target call ticket;
acquiring the similarity between the target call ticket and each other unprocessed call ticket in the call ticket set;
generating a cluster set, adding to the cluster set the target call ticket and all call tickets in the call ticket set whose similarity with the target call ticket is greater than a preset threshold value, and determining all call tickets in the call ticket set that belong to the cluster set as processed call tickets; all the call tickets in the cluster set except the target call ticket are to-be-associated clustering call tickets;
acquiring one clustering ticket to be associated from the clustering set as a target clustering ticket to be associated;
acquiring the similarity between the target clustering call ticket to be associated and each unprocessed call ticket in the call ticket set;
adding all the call tickets in the call ticket set, of which the similarity with the target clustering call ticket to be associated is greater than a preset threshold value, into the clustering set, determining all the call tickets in the call ticket set, belonging to the clustering set, as processed call tickets, and determining the target clustering call tickets to be associated as clustered call tickets;
when the clustering set still has a clustering ticket to be associated, returning to the step of acquiring a clustering ticket to be associated from the clustering set as a target clustering ticket to be associated;
and when an unprocessed call ticket still exists in the call ticket set, returning to execute the step of obtaining an unprocessed call ticket from the call ticket set consisting of the plurality of call tickets as a target call ticket; otherwise, determining the local communication numbers corresponding to the plurality of call tickets in each generated cluster set as a cluster structure.
7. The method of claim 1, wherein the visual communication data graph comprises communication relationships between the clusters and other call tickets in the plurality of call tickets.
8. A call ticket data processing and displaying device is characterized by comprising:
the acquisition module is used for acquiring a plurality of call tickets; each call ticket comprises at least one call record, and each call record comprises a base station position identifier, an opposite party number, a number holder identifier and local serial number data;
the calculation module is used for determining the importance degree value of each base station position identification and each opposite party number in each call ticket according to a preset calculation formula;
the calculation module is further configured to perform de-duplication and combination on the a base station location identifiers with the largest importance value in each call ticket to obtain a base station location identifier key set of the multiple call tickets;
the calculation module is further used for de-duplicating and merging the b opposite party numbers with the largest importance degree values in each call ticket to obtain an opposite party number key set of the multiple call tickets;
the calculation module is also used for determining the similarity between any two call tickets according to the base station position identification key set, the opposite party number key set, the number holder identification and the local serial number data; the similarity represents the consistency degree of behavior characteristics between the two call tickets;
the clustering module is used for determining, according to the similarity, a plurality of local communication numbers corresponding to a plurality of call tickets with consistent behavior characteristics among the multiple call tickets as a cluster, so as to obtain a cluster structure of the multiple call tickets;
and the display module is used for generating a visual communication data graph according to the clustering structure and displaying the visual communication data graph.
9. The apparatus of claim 8, wherein the calculation module is configured to determine the importance degree value of each base station position identifier and each opposite party number in each call ticket according to a term frequency-inverse document frequency (TF-IDF) formula.
10. An electronic device, comprising: a processor, a memory and a bus, wherein the memory stores machine-readable instructions; when the server runs, the processor and the memory communicate through the bus, and the processor executes the machine-readable instructions to perform the call ticket data processing and displaying method according to any one of claims 1-7.
CN202010230658.7A 2020-03-27 2020-03-27 Ticket data processing and displaying method and device and electronic equipment Active CN111884821B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010230658.7A CN111884821B (en) 2020-03-27 2020-03-27 Ticket data processing and displaying method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010230658.7A CN111884821B (en) 2020-03-27 2020-03-27 Ticket data processing and displaying method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111884821A CN111884821A (en) 2020-11-03
CN111884821B true CN111884821B (en) 2022-04-29

Family

ID=73154262

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010230658.7A Active CN111884821B (en) 2020-03-27 2020-03-27 Ticket data processing and displaying method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111884821B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115150763B (en) * 2021-03-30 2024-05-07 中国移动通信集团江苏有限公司 Bill charging method and device
CN113887551B (en) * 2021-08-17 2022-09-09 厦门市美亚柏科信息股份有限公司 Target person analysis method based on ticket data, terminal device and storage medium
CN115086488B (en) * 2022-07-27 2022-10-25 广东创新科技职业学院 Number classification method and device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8359006B1 (en) * 2010-11-05 2013-01-22 Sprint Communications Company L.P. Using communications records to detect unauthorized use of telecommunication services
CN107220316A (en) * 2017-05-17 2017-09-29 华为机器有限公司 A kind of communication data analysis method and device
CN107316044A (en) * 2016-04-27 2017-11-03 中国电信股份有限公司 Similar users recognition methods and device
CN109451182A (en) * 2018-10-19 2019-03-08 北京邮电大学 A kind of detection method and device of fraudulent call
CN109547393A (en) * 2017-09-21 2019-03-29 腾讯科技(深圳)有限公司 Malice number identification method, device, equipment and storage medium
CN109587357A (en) * 2018-11-14 2019-04-05 上海麦图信息科技有限公司 A kind of recognition methods of harassing call
CN109600520A (en) * 2017-09-30 2019-04-09 上海触乐信息科技有限公司 Harassing call number identification method, device and equipment
CN110248322A (en) * 2019-06-28 2019-09-17 国家计算机网络与信息安全管理中心 A kind of swindling gang identifying system and recognition methods based on fraud text message

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120084288A1 (en) * 2010-10-01 2012-04-05 Mohammed Abdul-Razzak Criminal relationship analysis and visualization
US10694026B2 (en) * 2017-08-16 2020-06-23 Royal Bank Of Canada Systems and methods for early fraud detection

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
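《Criminal pattern identification based on modified K-means clustering》; Turki Aljrees et al.; 《2016 International Conference on Machine Learning and Cybernetics (ICMLC)》; 20170309; full text *
Application of the K-means algorithm based on the A-D model in mining customers with abnormal calls; 周坚 et al.; 《电信科学》; 20180420 (No. 04); full text *
Grouping method for user communication models based on an improved clique percolation algorithm; 刘韩旭 et al.; 《电子技术》; 20171025 (No. 10); full text *
Application of a weighting algorithm in calculating mobile user network re-entry; 刘清松 et al.; 《自动化技术与应用》; 20090225 (No. 02); full text *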

Also Published As

Publication number Publication date
CN111884821A (en) 2020-11-03

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant