WO2017186090A1 - Communication number processing method and apparatus - Google Patents

Communication number processing method and apparatus

Info

Publication number
WO2017186090A1
Authority
WO
WIPO (PCT)
Prior art keywords
communication
processed
initiation
cdr
bill
Prior art date
Application number
PCT/CN2017/081813
Other languages
French (fr)
Chinese (zh)
Inventor
林海雄
Original Assignee
腾讯科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2017186090A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M1/00: Substation equipment, e.g. for use by subscribers
    • H04M1/66: Substation equipment, e.g. for use by subscribers with means for preventing unauthorised or fraudulent calling
    • H04M1/663: Preventing unauthorised calls to a telephone set

Definitions

  • The present application relates to data processing technologies in the field of communication technologies, and in particular to a communication number processing method and apparatus.
  • Telecommunications fraud refers to the criminal act of fabricating false information and setting up scams in order to defraud victims remotely and without contact, inducing them to pay or transfer money to the criminals by telephone, the internet, or SMS.
  • The public security organs of the country filed a total of 590,000 telecom fraud cases, an increase of 32.5% year-on-year, causing economic losses of 22.2 billion yuan; behind each case there may be a family broken by fraud.
  • The prior art collects users' tag information through application software (an app) on the mobile phone. If a number is found to be marked as fraudulent by multiple users, it is considered a fraudulent number, and a user in a call with that number is alerted to be vigilant to avoid being scammed.
  • The prior art therefore depends on collecting user tag information.
  • The probability that a user marks a number is relatively low: many users do not mark the type of a number when they receive a call from an unfamiliar number, and the prior art can treat a number as fraudulent only after enough user tags have been collected. The prior art therefore identifies fraudulent numbers slowly and inefficiently.
  • Marking a number is also a subjective act. When users receive harassing calls, such as advertisements and other malicious calls, they often mark the harassing number as a fraudulent number. The identification accuracy of the prior art is therefore low.
  • The embodiments of the present application are expected to provide a communication number processing method and apparatus that can improve the speed and accuracy of number identification.
  • An embodiment of the present application provides a communication number processing method, where the method includes:
  • Parsing the CDR to obtain the type of the communication information included in the CDR, extracting at least one type of communication information of each communication number in the CDR, and combining them to form a pre-processed CDR includes:
  • combining the extracted communication records of the respective communication initiation numbers to form the pre-processed bill.
  • Parsing the at least one type of communication information of each communication number in the pre-processed bill, and obtaining the characteristics of the corresponding type of communication information of each communication number in the pre-processed bill, includes:
  • based on the ranking of the similarity, extracting the communication initiation numbers in the first proportion with the highest similarity.
  • Parsing the at least one type of communication information of each communication number in the pre-processed bill, and obtaining the characteristics of the corresponding type of communication information of each communication number in the pre-processed bill, includes:
  • extracting the communication initiation numbers in the second proportion with the highest number of communications.
  • Parsing the at least one type of communication information of each communication number in the pre-processed bill, and obtaining the characteristics of the corresponding type of communication information of each communication number in the pre-processed bill, includes:
  • extracting the communication initiation numbers in the third proportion with the highest average communication duration.
  • Parsing the at least one type of communication information of each communication number in the pre-processed bill, and obtaining the characteristics of the corresponding type of communication information of each communication number in the pre-processed bill, includes:
  • Extracting, from the communication numbers included in the pre-processed bill, a target communication number that matches the preset feature includes:
  • Extracting, from the communication numbers included in the pre-processed CDR, a target communication number that matches the preset feature includes:
  • using a machine learning model to analyze the characteristics of the corresponding type of communication information of each communication number in the pre-processed bill, and extracting the target communication number that matches the preset feature from the communication numbers included in the pre-processed bill.
  • The method further includes:
  • retraining the machine learning model based on the communication record of the security number in the pre-processed bill.
  • Retraining the machine learning model based on the communication record of the security number in the pre-processed bill includes:
  • The method further includes:
  • Responding to the communication behavior of the target communication number includes: issuing a danger reminder to the user of a communication response number that has a communication record with the target communication number, wherein the danger reminder includes a voice reminder and/or a text reminder;
  • the real-time level of the response processing is positively correlated with the level of danger.
  • An embodiment of the present application provides a communication number processing apparatus, where the apparatus includes:
  • an obtaining module configured to acquire, from the communication service device, a CDR of a preset number of communication numbers in a first preset time;
  • a pre-processing module configured to parse the CDR to obtain a type of communication information included in the CDR, extract at least one type of communication information of each communication number in the CDR, and combine them to form a pre-processed CDR;
  • a parsing module configured to parse at least one type of communication information of each communication number in the pre-processed bill, and obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill;
  • an extracting module configured to extract, from the communication numbers included in the pre-processed bill, a target communication number that matches the preset feature.
  • The pre-processing module is specifically configured to:
  • combine the extracted communication records of the respective communication initiation numbers to form the pre-processed bill.
  • The parsing module is specifically configured to: separately calculate the edit distance between each communication initiation number in the pre-processed bill and a yellow page number; and obtain, according to the edit distance, the similarity between each communication initiation number in the pre-processed bill and the yellow page number, wherein the edit distance indicates the number of operations needed to turn the yellow page number into the communication initiation number;
  • the extracting module is specifically configured to: extract, from the communication initiation numbers included in the pre-processed CDR, the communication initiation numbers whose similarity with the yellow page number is greater than a first threshold; or, based on a ranking of the communication initiation numbers included in the pre-processed CDR by their similarity with the yellow page number, extract the communication initiation numbers in the first proportion with the highest similarity.
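  • The edit-distance comparison described above can be sketched as follows. This is a minimal illustration that assumes the Levenshtein distance (insertions, deletions, substitutions) and a similarity normalized as 1 - distance / longest length; the normalization, the function names, and the sample numbers and threshold are assumptions, not taken from the application.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance: the number of single-character insertions,
    deletions, or substitutions needed to turn `source` into `target`."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(initiation_number: str, yellow_page_number: str) -> float:
    """Assumed normalization: 1.0 for identical strings, lower as they diverge."""
    longest = max(len(initiation_number), len(yellow_page_number))
    if longest == 0:
        return 1.0
    distance = edit_distance(yellow_page_number, initiation_number)
    return 1.0 - distance / longest


def numbers_above_threshold(initiation_numbers, yellow_page_number, first_threshold):
    """Keep the initiation numbers whose similarity exceeds the first threshold."""
    return [n for n in initiation_numbers
            if similarity(n, yellow_page_number) > first_threshold]


# Hypothetical sample values.
suspects = numbers_above_threshold(["95580", "13800000000"], "95588", 0.7)
```

  • In this sketch a number that is identical to the yellow page number also scores 1.0, so a real system would presumably need to exclude the genuine number itself; the application text quoted here does not say how.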
  • The parsing module is specifically configured to: extract the communication start times of each communication number in the pre-processed bill when it acts as a communication initiation number; and calculate the number of communications of each communication initiation number in the pre-processed bill per unit time;
  • the extracting module is specifically configured to: extract, from the communication initiation numbers included in the pre-processed CDR, the communication initiation numbers whose number of communications per unit time is greater than a second threshold; or, based on a ranking of the communication initiation numbers in the pre-processed CDR by their number of communications per unit time, extract the communication initiation numbers in the second proportion with the highest number of communications.
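  • A minimal sketch of the count-per-unit-time feature, assuming the unit time is one clock hour and that the feature for a number is its busiest hour. Both choices, the function names, and the threshold and ratio values are assumptions for illustration; the sample records reuse the masked numbers from the pre-processed bill example in this document.

```python
from collections import Counter
from datetime import datetime


def peak_communications_per_hour(records):
    """records: iterable of (initiation_number, start_time) pairs.
    Returns the highest number of communications each number started in one clock hour."""
    buckets = Counter()
    for number, start_time in records:
        started = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S")
        buckets[(number, started.replace(minute=0, second=0))] += 1
    peak = {}
    for (number, _hour), count in buckets.items():
        peak[number] = max(peak.get(number, 0), count)
    return peak


def over_threshold(peak, second_threshold):
    """Numbers whose peak count exceeds the second threshold."""
    return sorted(n for n, count in peak.items() if count > second_threshold)


def top_ratio(peak, second_ratio):
    """The given proportion of numbers with the highest peak count (at least one)."""
    ranked = sorted(peak, key=lambda n: (-peak[n], n))
    return ranked[:max(1, int(len(ranked) * second_ratio))]


records = [
    ("158xxxx0001", "2016-01-15 15:32:42"),
    ("158xxxx0001", "2016-01-15 15:42:02"),
    ("158xxxx0001", "2016-01-15 15:48:02"),
    ("158xxxx0001", "2016-01-15 15:52:07"),
    ("170xxxx0001", "2016-01-15 15:39:02"),
    ("170xxxx0001", "2016-01-15 15:51:02"),
    ("170xxxx0001", "2016-01-16 10:26:02"),
]
peak = peak_communications_per_hour(records)
```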
  • The parsing module is specifically configured to: extract the communication durations of each communication number in the pre-processed bill when it acts as a communication initiation number; and calculate the average communication duration of each communication initiation number in the pre-processed bill;
  • the extracting module is specifically configured to: extract, from the communication initiation numbers included in the pre-processed CDR, the communication initiation numbers whose average communication duration is greater than a third threshold; or, based on a ranking of the communication initiation numbers included in the pre-processed CDR by average communication duration, extract the communication initiation numbers in the third proportion with the highest average communication duration.
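  • The average-duration feature can be sketched as below. The handling of a missing duration (skipped, represented here as `None`), the function names, and the threshold value are assumptions; the durations are those of the pre-processed bill example in this document.

```python
from collections import defaultdict


def average_durations(records):
    """records: iterable of (initiation_number, duration_seconds or None).
    Records without a duration are skipped (an assumption of this sketch)."""
    totals = defaultdict(lambda: [0, 0])  # number -> [sum of seconds, record count]
    for number, duration in records:
        if duration is None:
            continue
        totals[number][0] += duration
        totals[number][1] += 1
    return {number: total / count for number, (total, count) in totals.items()}


def over_threshold(averages, third_threshold):
    """Numbers whose average communication duration exceeds the third threshold."""
    return sorted(n for n, avg in averages.items() if avg > third_threshold)


records = [
    ("158xxxx0001", 134), ("158xxxx0001", 97),
    ("158xxxx0001", 123), ("158xxxx0001", 256),
    ("170xxxx0001", 15), ("170xxxx0001", 77), ("170xxxx0001", None),
]
averages = average_durations(records)
```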
  • The parsing module is specifically configured to: acquire the attribution (home location) of the communication response numbers corresponding to each communication number in the pre-processed bill when it acts as a communication initiation number; and calculate, for each communication initiation number in the pre-processed bill, the number of different attributions of its corresponding communication response numbers;
  • the extracting module is specifically configured to: extract, from the communication initiation numbers included in the pre-processed CDR, the communication initiation numbers for which the number of different attributions of the corresponding communication response numbers is greater than a fourth threshold; or, based on a ranking of the communication initiation numbers included in the pre-processed CDR by the number of different attributions of their corresponding communication response numbers, extract the communication initiation numbers in the fourth proportion with the highest number of different attributions.
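  • A minimal sketch of the attribution-count feature. The application does not say how the attribution of a response number is looked up, so `HOME_LOCATIONS` is a hypothetical table with placeholder regions, and the function name and threshold value are likewise assumptions.

```python
def distinct_attribution_counts(records, attribution_of):
    """records: iterable of (initiation_number, response_number) pairs.
    attribution_of: callable mapping a response number to its attribution.
    Returns, per initiation number, how many different attributions it reached."""
    regions = {}
    for initiation_number, response_number in records:
        regions.setdefault(initiation_number, set()).add(attribution_of(response_number))
    return {number: len(found) for number, found in regions.items()}


# Hypothetical lookup table; the regions are placeholders, not real data.
HOME_LOCATIONS = {
    "186xxxx0002": "Region A", "186xxxx0007": "Region B",
    "139xxxx0006": "Region C", "187xxxx0002": "Region A",
    "186xxxx0001": "Region A", "180xxxx0007": "Region A",
}

records = [
    ("158xxxx0001", "186xxxx0002"), ("158xxxx0001", "186xxxx0007"),
    ("158xxxx0001", "139xxxx0006"), ("158xxxx0001", "187xxxx0002"),
    ("170xxxx0001", "186xxxx0001"), ("170xxxx0001", "180xxxx0007"),
]
counts = distinct_attribution_counts(records, HOME_LOCATIONS.get)

FOURTH_THRESHOLD = 2  # assumed value
suspects = sorted(n for n, count in counts.items() if count > FOURTH_THRESHOLD)
```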
  • The extracting module is specifically configured to: analyze, by using a machine learning model, the features of the corresponding type of communication information of each communication number in the pre-processed bill, and extract the target communication number that matches the preset feature from the communication numbers included in the pre-processed bill.
  • The device further includes:
  • a training module configured to: receive feedback information from the user side for the target communication number, and determine whether the target communication number is a security number; determine an error rate of the machine learning model based on the number of target communication numbers that the user side feeds back as security numbers among the identified target communication numbers; and, when the error rate of the machine learning model is greater than a fifth threshold, retrain the machine learning model based on the communication records of the security numbers in the pre-processed bill.
  • The training module is configured to: parse at least one type of communication information in the communication records of the security number in the pre-processed bill, and obtain the features of the at least one type of communication information of the security number; and update, based on those features, the threshold used by the machine learning model to identify the target communication number.
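  • The feedback loop described above can be sketched as follows. The error rate follows the text (share of identified target numbers that users report as security numbers). The update rule, which raises the threshold to the largest feature value seen for a wrongly flagged number, is only one possible choice and is an assumption of this sketch, as are all names and sample values.

```python
def error_rate(target_numbers, reported_safe):
    """Share of identified target numbers that users fed back as security numbers."""
    if not target_numbers:
        return 0.0
    wrong = sum(1 for number in target_numbers if number in reported_safe)
    return wrong / len(target_numbers)


def maybe_update_threshold(target_numbers, reported_safe, features,
                           threshold, fifth_threshold):
    """Return the (possibly updated) threshold used to flag numbers.

    Assumed rule: a number is flagged when its feature value is greater than
    `threshold`; on retraining, the threshold is raised so that no number
    reported as safe would still be flagged."""
    if error_rate(target_numbers, reported_safe) <= fifth_threshold:
        return threshold
    safe_values = [features[n] for n in target_numbers if n in reported_safe]
    return max([threshold, *safe_values])


# Hypothetical sample: feature = communications per unit time.
features = {"a": 9, "b": 5, "c": 6, "d": 8}
targets = ["a", "b", "c", "d"]        # flagged with threshold 4
reported_safe = {"b", "c"}            # user feedback
new_threshold = maybe_update_threshold(targets, reported_safe, features, 4, 0.25)
still_flagged = sorted(n for n, value in features.items() if value > new_threshold)
```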
  • the device further includes:
  • a response module configured to: determine the degree of matching between the feature of the corresponding type of communication information of the target communication number and the preset feature; determine a risk level of the target communication number based on that degree of matching; and respond to the communication behavior of the target communication number based on the risk level of the target communication number.
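  • A minimal sketch of the response module's three steps. The application does not define how the matching degree is computed or how many risk levels exist, so the ratio-based degree, the three levels, their cut-off values, and the response texts are all assumptions chosen for illustration.

```python
def matching_degree(feature_value: float, preset_value: float) -> float:
    """Assumed measure: ratio of the observed feature to a positive preset value."""
    return feature_value / preset_value


def risk_level(degree: float) -> int:
    """Map a matching degree to an assumed three-step risk level (1 = lowest)."""
    if degree >= 2.0:
        return 3
    if degree >= 1.5:
        return 2
    return 1


# Assumed responses: a higher risk level gets a more immediate reminder.
RESPONSES = {
    1: "text reminder after the communication ends",
    2: "text reminder while the communication is in progress",
    3: "voice reminder while the communication is in progress",
}


def respond(feature_value: float, preset_value: float) -> str:
    """Pick the response for a target number from its feature and the preset value."""
    return RESPONSES[risk_level(matching_degree(feature_value, preset_value))]
```

  • For example, under these assumed cut-offs a number whose feature is twice the preset value would trigger the voice reminder during the call.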
  • The embodiment of the present application parses the CDRs of a preset number of communication numbers within a first preset time to obtain the characteristics of the corresponding type of communication information of each communication number and, based on those characteristics, extracts from the communication numbers the target communication number that matches the preset feature.
  • The CDR of a communication number is objective data maintained by the operator, and can truly and completely reflect the communication records of the user.
  • Using the CDR of the communication number as the basis for processing can therefore improve the accuracy of number identification.
  • The generation and maintenance of the CDR generally does not require the direct participation of each user; the operator is responsible for it, so the CDR of a communication number can be acquired quickly and efficiently. Therefore, the embodiment of the present application can improve the speed and accuracy of number identification.
  • FIG. 1 is a schematic diagram of an optional application scenario of a method for processing a communication number in an embodiment of the present application
  • FIG. 2 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 1 of the present application;
  • FIG. 3 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 2 of the present application;
  • FIG. 5 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 4 of the present application.
  • FIG. 6 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 5 of the present application.
  • FIG. 7 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 6 of the present application.
  • FIG. 8 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 7 of the present application.
  • FIG. 9 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 8 of the present application.
  • FIG. 10 is an optional schematic flowchart of a method for processing a communication number in Embodiment 9 of the present application.
  • FIG. 11 is an optional schematic diagram of a user application running on a user equipment in a state of receiving a user indication according to an embodiment of the present disclosure
  • FIG. 11b is an optional schematic diagram of a user application running on a user equipment in a text reminding state according to an embodiment of the present application
  • FIG. 12 is an optional structural diagram of a communication number processing apparatus according to an embodiment of the present application.
  • FIG. 13 is another schematic structural diagram of a communication number processing apparatus according to an embodiment of the present application.
  • FIG. 14 is still another schematic structural diagram of a communication number processing apparatus according to an embodiment of the present application.
  • The embodiment of the present application describes a communication number processing method.
  • FIG. 1 shows an optional application scenario of the communication number processing method in the embodiment of the present application. The user equipment 11, the user equipment 12, the user equipment 13, the network device 14 (such as a carrier gateway or an enterprise gateway), the communication service device 15, and the application background server 16 each access a communication network (such as a wireless network or a wired network); the communication service device 15 is, for example, a business support system (BSS) / operation support system (OSS), or a telecommunication switch;
  • the communication service device 15 is configured to provide a bill for a communication number;
  • the network device 14 is configured to provide service support for each user equipment accessing the communication network;
  • the application background server 16 is used to provide service support for the application; corresponding to the background server 16, the client of the application installed on the user equipment also provides service support for the application;
  • the application may be a communication application, for example: Tencent Mobile phone housekeeper, WeChat, Tencent mailbox, etc.
  • The application is not limited to communication applications, and the embodiment of the present application does not specifically limit this. In the above scenario, the number of user equipments is at least one, and each user equipment is associated with at least one distinct communication number; for example, in FIG. 1 the user equipment 11 is associated with at least one communication number A, the user equipment 12 with at least one communication number B, and the user equipment 13 with at least one communication number C, where the communication numbers A, B, and C are different from each other. In this scenario, the communication numbers that satisfy a preset condition are identified from the plurality of communication numbers.
  • The embodiment of the present application further describes a communication number processing apparatus, which can be used to execute the communication number processing method of the embodiment of the present application. The apparatus can be implemented in various manners. For example, all components of the apparatus may be implemented in a user device such as a smart phone, a fixed telephone, a tablet computer, a notebook computer, or a wearable device (such as smart glasses or a smart watch); all components may be implemented in a network device such as an enterprise gateway or a carrier gateway; the components may be implemented in a coupled manner across the user device side and the network side; or the apparatus may be a client or a background server of a user application. For example, when the user application is Tencent mobile phone housekeeper, the corresponding communication number processing apparatus can be the client or the background server of Tencent mobile phone housekeeper.
  • This embodiment provides a communication number processing method that can be applied to scenarios in which communication numbers satisfying a preset condition need to be identified from multiple communication numbers, for example: identification across all numbers in a communication network, identification of a communication number that the user has asked to be identified, or identification of a communication number that is communicating with the current user.
  • The type of communication service includes, but is not limited to, any one or a combination of the following service types: voice call; short message; flash message; data service (such as WeChat). This application is not limited thereto.
  • the communication number processing method provided in this embodiment includes the following steps:
  • Step 201 Acquire, from the communication service device, a bill of a preset number of communication numbers in a first preset time.
  • The communication service device may include a telecommunication support system device, such as a BSS/OSS, or a telecommunication switch. The first preset time may be set flexibly by the user or the operator according to actual conditions such as service requirements. The communication number includes, but is not limited to, a mobile phone number, a fixed-line number, or the like. The communication numbers may include, for example, all communication numbers in the communication network, a communication number to be identified that is indicated by the user, or a communication number that is in a call with the current user. The user may indicate the communication number to be identified by, for example, specifying it in an application running on the user equipment (such as the Tencent mobile phone housekeeper), or by sending an indication message carrying that number to the operator server.
  • The CDR of the preset number of communication numbers in the first preset time may be obtained from the communication service device in at least one of the following manners:
  • when the communication number of the current user is detected, the CDR of that communication number within the first preset time is acquired from the communication service device;
  • the CDR of an unfamiliar communication number within the first preset time is acquired from the communication service device.
  • Step 202 Parse the CDR to obtain the type of the communication information included in the CDR, extract at least one type of communication information of each communication number in the CDR, and combine them to form a pre-processed CDR.
  • The CDRs of the preset number of communication numbers that are obtained from the communication service device in the first preset time are generally unordered.
  • The pre-processed CDR is formed by compiling the records with each communication number as a dimension.
  • The communication information includes at least one type of communication information for a communication number acting in at least one of the following roles: calling number (such as the calling number in a voice service); called number (such as the called number in a voice service); information transmission number (such as the short message transmission number, or the data transmission number in a data service); information reception number (such as the short message receiving number, or the data receiving number in a data service).
  • The pre-processed CDR includes only the at least one type of communication information of each communication number extracted from the CDR; that is, the pre-processed CDR does not need to include all the information in the CDR. The data in the pre-processed CDR is organized with each communication number as an index, for example:
  • Calling number 1 in the voice service: communication information 1, communication information 2, ...;
  • Calling number 2 in the voice service: communication information 3, communication information 4, ...;
  • SMS sending number 3: communication information 5, communication information 6, ...;
  • Data transmission number 4 in the data service: communication information 7, communication information 8, ....
  • The pre-processed bill indexed by each communication number acting as the calling number is shown in Table 1.
  • In the data structure example of Table 1, the calling number, the called number, the communication start time, and the communication duration (seconds) are partial examples of the types of communication information included in the CDR.
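  • The index organization described above can be sketched as a mapping keyed by the role and the communication number. This is a minimal illustration, not the application's implementation; the `Role` enum, the `CommunicationRecord` fields, and the sample values are assumptions chosen for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """Role in which a communication number appears (assumed subset)."""
    VOICE_CALLING = "calling number in the voice service"
    SMS_SENDING = "SMS sending number"
    DATA_SENDING = "data transmission number in the data service"


@dataclass(frozen=True)
class CommunicationRecord:
    peer_number: str       # called or receiving number
    start_time: str        # e.g. "2016-01-15 15:32:42"
    duration_seconds: int


def build_index(entries):
    """entries: iterable of (role, number, record) tuples in any order.
    Returns the records grouped under each (role, number) key."""
    index = defaultdict(list)
    for role, number, record in entries:
        index[(role, number)].append(record)
    return dict(index)


# Hypothetical sample entries, deliberately unordered.
entries = [
    (Role.VOICE_CALLING, "number-1",
     CommunicationRecord("number-9", "2016-01-15 15:32:42", 134)),
    (Role.SMS_SENDING, "number-3",
     CommunicationRecord("number-8", "2016-01-15 15:40:00", 0)),
    (Role.VOICE_CALLING, "number-1",
     CommunicationRecord("number-7", "2016-01-15 15:42:02", 97)),
]
preprocessed = build_index(entries)
```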
  • Step 203 Analyze at least one type of communication information of each communication number in the pre-processed bill, and obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill.
  • Step 204 Analyze characteristics of corresponding types of communication information of each communication number in the pre-processed bill, and determine whether the feature of the corresponding type of communication information of each communication number matches the preset feature, and if yes, go to step 205, otherwise The process ends.
  • Step 205 Extract a target communication number that matches the preset feature from the communication number included in the pre-processed bill.
  • The characteristics of the corresponding type of communication information of each communication number in the pre-processed bill are analyzed, and the target communication number whose characteristics match the preset feature is extracted from the communication numbers included in the pre-processed bill; the preset feature is, for example, a preset a priori value.
  • This embodiment parses the bill of the communication number to obtain the characteristics of the corresponding type of communication information of the communication number and, based on those characteristics, identifies from the communication numbers the target communication number that matches the preset feature.
  • Because the generation and maintenance of the CDR of a communication number is generally performed by the operator, the participation of each user is not required, so the CDR of a communication number can be acquired quickly and efficiently.
  • Because the CDR of a communication number is objective data maintained by the operator, it can truly and completely reflect all communication records of the user within a certain time interval.
  • The technical solution provided by the embodiment of the present application is based on the CDR of the communication number and can therefore improve the speed and accuracy of number identification.
  • This embodiment builds on the first embodiment and proposes a specific solution for how to parse the CDR to obtain the type of the communication information included in the CDR, and how to extract at least one type of communication information of each communication number in the CDR and combine them to form a pre-processed CDR.
  • the communication number processing method provided in this embodiment includes the following steps:
  • Step 301 Acquire, from the communication service device, a bill of a preset number of communication numbers in a first preset time.
  • Step 302 Parsing the CDR to obtain at least one of the following types of communication information included in the CDR: a communication initiation number, a communication response number corresponding to the communication initiation number, a communication start time, and a communication duration.
  • The communication initiation number may include a communication number acting as a calling number (such as the calling number in a voice service) and a communication number acting as an information transmission number (such as a short message transmission number, or a data transmission number in a data service);
  • the communication response number corresponding to the communication initiation number may include a communication number acting as the called number (such as the called number in a voice service) and a communication number acting as an information receiving number (such as a short message receiving number, or a data receiving number in a data service);
  • the types of communication information included in the CDR are not limited to the communication initiation number, the communication response number corresponding to the communication initiation number, the communication start time, and the communication duration; they can also include data traffic (upstream traffic and/or downstream traffic), communication location, service type, long-distance type, and so on. This application is not limited thereto.
  • Step 303 Extract at least one type of communication information associated with each communication initiation number in the CDR to form a communication record of each communication initiation number.
  • Step 304 Combine the extracted communication records of each communication initiation number to form a pre-processed bill.
  • The pre-processed CDR includes only the at least one type of communication information of each communication number extracted from the CDR and does not include all the information in the CDR, which reduces the workload of communication number processing and improves its efficiency.
  • the CDRs of the preset number of communication numbers that are obtained from the communication service device in the first preset time are generally out of order.
  • the CDRs shown in Table 2 are taken as an example.
  • the communication start time, service type, and communication initiation are used here.
  • the number, the communication response number, the communication place, the long distance type, and the communication duration (seconds) are partial examples of the types of communication information included in the CDR.
  • The communication number processing apparatus parses the CDR shown in Table 2 to obtain at least one of the following types of communication information included in the CDR: the communication initiation number; the communication response number corresponding to the communication initiation number; the communication start time; and the communication duration.
  • The communication number processing apparatus extracts the at least one type of communication information associated with each communication initiation number in the CDR to form a communication record of each communication initiation number, where the communication record of each communication initiation number includes at least one type of communication information of that number within the first preset time.
  • The extracted communication records of the communication initiation numbers are combined to form the pre-processed CDR. The pre-processed CDR is compiled with each communication number as a dimension, that is, its data structure (or display mode) is organized with each communication number as an index.
  • Assuming that the pre-processed CDR is formed from at least one type of communication information corresponding to each communication initiation number, the data structure of the pre-processed CDR may be:
  • Communication initiation number 1 communication information 1, communication information 2, ...;
  • Communication initiation number 2 communication information 1, communication information 2, ...;
  • The pre-processed CDR shown in Table 3 is obtained by the communication number processing apparatus performing steps 202-204 on the CDR shown in Table 2.
  • The pre-processed CDR is organized with each communication initiation number as an index.
| Communication initiation number | Communication response number | Communication start time | Communication duration (seconds) |
| --- | --- | --- | --- |
| 158xxxx0001 | 186xxxx0002 | 2016-01-15 15:32:42 | 134 |
| 158xxxx0001 | 186xxxx0007 | 2016-01-15 15:42:02 | 97 |
| 158xxxx0001 | 139xxxx0006 | 2016-01-15 15:48:02 | 123 |
| 158xxxx0001 | 187xxxx0002 | 2016-01-15 15:52:07 | 256 |
| 170xxxx0001 | 186xxxx0001 | 2016-01-15 15:39:02 | 15 |
| 170xxxx0001 | 180xxxx0007 | 2016-01-15 15:51:02 | 77 |
| 170xxxx0001 | 139xxxx0002 | 2016-01-16 10:26:02 | -- |
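  • The grouping described above can be sketched as follows. This is only a minimal illustration: the field names (`caller`, `callee`, `start_time`, `duration_s`, `service_type`), the `preprocess` helper, and the `service_type` value are assumptions made for the example, not names taken from the application; the numbers, times, and durations are rows of the table above.

```python
from collections import defaultdict

# Assumed field names for the types of communication information that are kept.
KEPT_FIELDS = ("callee", "start_time", "duration_s")

def preprocess(cdr_rows):
    """Group raw CDR rows by communication initiation number, keeping only KEPT_FIELDS."""
    records = defaultdict(list)
    for row in cdr_rows:
        records[row["caller"]].append({field: row[field] for field in KEPT_FIELDS})
    # Raw CDR rows are generally unordered, so sort each number's records by start time.
    for number_records in records.values():
        number_records.sort(key=lambda record: record["start_time"])
    return dict(records)

# Unordered sample rows; "service_type" is a placeholder for information that gets dropped.
raw_cdr = [
    {"caller": "170xxxx0001", "callee": "186xxxx0001",
     "start_time": "2016-01-15 15:39:02", "duration_s": 15, "service_type": "voice"},
    {"caller": "158xxxx0001", "callee": "186xxxx0007",
     "start_time": "2016-01-15 15:42:02", "duration_s": 97, "service_type": "voice"},
    {"caller": "158xxxx0001", "callee": "186xxxx0002",
     "start_time": "2016-01-15 15:32:42", "duration_s": 134, "service_type": "voice"},
]

preprocessed = preprocess(raw_cdr)
for caller, caller_records in preprocessed.items():
    print(caller, caller_records)
```

  • Indexing by communication initiation number means every later feature (similarity, communication count, average duration, attribution count) can be computed from one dictionary lookup per number.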
  • Step 305 Analyze at least one type of communication information of each communication number in the pre-processed bill, and obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill.
  • Step 306 Analyze the features of the corresponding type of communication information of each communication number in the pre-processed bill, and determine whether they match the preset feature; if yes, go to step 307; otherwise, the process ends.
  • Step 307 Extract a target communication number that matches the preset feature from the communication number included in the pre-processed bill.
  • This embodiment addresses how to parse a CDR to obtain the types of communication information it includes, and how to extract at least one type of communication information of each communication number in the CDR and combine it to form a pre-processed CDR.
  • The CDR is parsed to obtain at least one type of communication information included in the CDR; the at least one type of communication information associated with each communication initiation number is extracted to form a communication record of each communication initiation number; and the extracted communication records are combined to form the pre-processed CDR. Because the pre-processed CDR includes only the extracted types of communication information rather than all of the information in the CDR, the workload of number identification is reduced and its speed and efficiency are improved.
  • This embodiment is based on the first embodiment. It takes the edit distance between the communication initiation number and a yellow page number as the feature of the communication number, and describes a technical solution for identifying, from a plurality of communication numbers, the communication numbers that meet a preset condition.
  • the communication number processing method includes the following steps:
  • There may be one or more yellow page numbers. The edit distance is the minimum number of editing operations required to convert the yellow page number into the communication initiation number, that is, the number of operations of adding, deleting, modifying, and moving digits needed to turn the yellow page number into the communication initiation number.
  • When there are multiple yellow page numbers, the edit distance between each communication initiation number in the pre-processed CDR and each yellow page number needs to be calculated separately.
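  • A minimal sketch of the edit distance calculation follows. It uses the standard Levenshtein distance (insert, delete, substitute), which is an assumption: it does not treat "moving" a digit as a single operation, so it may differ from the exact definition intended by the application. The function names and the sample numbers are made up for illustration.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions
    needed to turn `source` into `target`."""
    previous = list(range(len(target) + 1))
    for i, source_digit in enumerate(source, start=1):
        current = [i]
        for j, target_digit in enumerate(target, start=1):
            cost = 0 if source_digit == target_digit else 1
            current.append(min(
                previous[j] + 1,         # delete source_digit
                current[j - 1] + 1,      # insert target_digit
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]

def distances_to_yellow_pages(initiation_number, yellow_page_numbers):
    """Edit distance from each yellow page number to one communication initiation number."""
    return {yellow: edit_distance(yellow, initiation_number)
            for yellow in yellow_page_numbers}

# Made-up numbers: the initiation number differs from the yellow page number in one digit.
print(distances_to_yellow_pages("4001234568", ["4001234567"]))
```

  • With several yellow page numbers, the returned dictionary holds one distance per yellow page number, matching the requirement that each pair be calculated separately.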
  • At least one of the following methods may be used to obtain, based on the edit distance, the similarity between each communication initiation number in the pre-processed CDR and the yellow page number.
  • Method 1: for each communication initiation number in the pre-processed CDR, normalize the calculated edit distance between the communication initiation number and each yellow page number to obtain the similarity between the communication initiation number and each yellow page number; the similarities between the communication initiation number and the yellow page numbers may then be sorted.
  • Method 2: for each communication initiation number in the pre-processed CDR, calculate the ratio of the edit distance between the communication initiation number and the yellow page number to a preset distance, and use the calculated ratio to obtain the similarity between the communication initiation number and the yellow page number. When there are multiple yellow page numbers, the ratio needs to be calculated separately for each yellow page number.
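  • The two methods can be sketched as below. The text does not give the formulas, so both are assumptions: Method 1 is illustrated by dividing the edit distance by the length of the longer number, and Method 2 by mapping the ratio to `1 - ratio` clamped at zero. The function names and sample values are hypothetical.

```python
def similarity_normalized(distance: int, length_a: int, length_b: int) -> float:
    """Method 1 (assumed form): normalize the edit distance by the longer number's length."""
    longest = max(length_a, length_b)
    return 1.0 if longest == 0 else 1.0 - distance / longest

def similarity_by_preset(distance: int, preset_distance: int) -> float:
    """Method 2 (assumed form): ratio of edit distance to a preset distance, mapped to [0, 1]."""
    if preset_distance <= 0:
        raise ValueError("preset_distance must be positive")
    return max(0.0, 1.0 - distance / preset_distance)

def rank_yellow_pages(distances: dict, initiation_length: int) -> list:
    """Sort yellow page numbers by descending similarity to one initiation number.

    `distances` maps each yellow page number to its precomputed edit distance.
    """
    scored = [(yellow, similarity_normalized(distance, len(yellow), initiation_length))
              for yellow, distance in distances.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up edit distances for a 10-digit initiation number.
print(rank_yellow_pages({"95500": 9, "4001234567": 1}, initiation_length=10))
```

  • Either function yields a value in [0, 1] where a larger value means the initiation number looks more like the yellow page number, which is the direction the first threshold comparison expects.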
  • The initial value of the first threshold may be set manually or calculated by training. For example: determine, according to an a priori value, the target quantity of target communication numbers among the communication initiation numbers included in the pre-processed CDR; sort the similarities between the communication initiation numbers and the yellow page number; select the target quantity of communication initiation numbers in descending order of similarity; and take the smallest similarity among the selected communication initiation numbers as the initial value of the first threshold.
  • The first threshold may be continuously updated through training according to actual needs.
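  • The initial-value procedure can be sketched as follows, assuming the feature values are held in a dictionary keyed by communication initiation number. The function name `initial_threshold` and the sample values are hypothetical.

```python
def initial_threshold(scores: dict, target_count: int) -> float:
    """Return the smallest score among the `target_count` highest-scoring numbers.

    `scores` maps a communication initiation number to its feature value,
    for example its similarity to the yellow page number.
    """
    if not 1 <= target_count <= len(scores):
        raise ValueError("target_count must be between 1 and the number of scores")
    ranked = sorted(scores.values(), reverse=True)
    return ranked[target_count - 1]

# Made-up similarities; an a priori target quantity of 2 is assumed.
sample_scores = {"number A": 0.9, "number B": 0.4, "number C": 0.7}
print(initial_threshold(sample_scores, target_count=2))
```

  • Note that if the later comparison is strictly "greater than", the lowest-ranked selected number equals the threshold and would not itself be extracted; whether the comparison should be inclusive is left open here.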
  • Alternatively, the communication number processing apparatus sorts the communication initiation numbers included in the pre-processed CDR by their similarity to the yellow page number;
  • based on this ordering, the first proportion of communication initiation numbers with the highest similarity is extracted from the communication initiation numbers included in the pre-processed CDR as the target communication numbers.
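  • Extraction by proportion can be sketched as below. Rounding the count up is an assumption, as are the function name `top_proportion` and the sample values.

```python
import math

def top_proportion(scores: dict, proportion: float) -> list:
    """Return the given proportion of numbers with the highest scores, best first."""
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    count = math.ceil(len(scores) * proportion)  # assumed rounding rule
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:count]

# Made-up similarities for four communication initiation numbers.
sample_scores = {"number A": 0.9, "number B": 0.4, "number C": 0.7, "number D": 0.1}
print(top_proportion(sample_scores, proportion=0.5))
```

  • Unlike a fixed threshold, this always returns the same share of the numbers in the pre-processed CDR, regardless of how high their scores are.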
  • Alternatively, the communication number processing apparatus determines, according to the similarity between the communication initiation number and the yellow page number and the first threshold, the probability that the communication initiation number belongs to the target communication number class (such as fraud numbers) and the probability that it belongs to the normal number class, and takes the class with the larger probability as the class to which the communication initiation number belongs.
  • If that class is the target communication number class, the communication initiation number is determined to be a target communication number; otherwise, it is determined to be a normal number.
  • The user equipment may be, for example, a smartphone, a fixed-line phone, a tablet computer, a notebook computer, or a wearable device.
  • The server may be, for example, a service server of an operator, an enterprise gateway, or a background server of an application installed on the user equipment.
  • The communication service device may be, for example, a BSS/OSS or a telecommunication switch.
  • The application may be a communication application, for example, Tencent Mobile Manager, WeChat, or Tencent mailbox; of course, the application is not limited to communication applications, which is not specifically limited in the embodiments of the present application.
  • The user equipment, the server, and the communication service device cooperate with each other to implement the communication number processing method provided by this embodiment; an optional flow of the method includes:
  • Step 401 Based on a user indication, the user equipment sends an identification indication carrying the to-be-identified communication number to the server.
  • For example, when the application running on the user equipment is waiting for a user indication, the user enters the to-be-identified communication number at the designated location in the display window of the application, following the application's prompt.
  • There may be one or more to-be-identified communication numbers.
  • Step 402 The server receives the identification indication, and sends a CDR request carrying the to-be-identified communication number to the communication service device according to the identification indication.
  • the CDR request includes the to-be-identified communication number and the first preset time.
  • Step 403 The communication service device receives the CDR request, and obtains the CDR of the to-be-identified communication number in the first preset time based on the CDR request, and sends the CDR to the server.
  • Step 404 The server receives the bill of the to-be-identified communication number in the first preset time.
  • Step 405 Parse the CDR to obtain the types of communication information included in the CDR, and extract at least one type of communication information of each to-be-identified communication number in the CDR and combine it to form a pre-processed CDR.
  • Step 406 Calculate an edit distance of each to-be-identified communication number and a yellow page number in the pre-processed bill.
  • Step 407 Obtain a similarity between each to-be-identified communication number and the yellow page number in the pre-processed bill based on the edit distance.
  • Step 408 Determine whether the similarity between each to-be-identified communication number and the yellow page number included in the pre-processed CDR is greater than a first threshold. If yes, go to step 409, otherwise the process terminates.
  • Step 409 Extract, from the to-be-identified communication numbers included in the pre-processed CDR, each communication initiation number whose similarity to the yellow page number is greater than the first threshold, as a target communication number.
  • Step 410 The server sends an identification response carrying the identified target communication number to the user equipment. The identification response is used to give the user a danger reminder that the identified target communication number may be a fraud number.
  • The reminder may be delivered by, but is not limited to, SMS, flash message, WeChat, Tencent Mobile Manager, or another communication application; when a target communication number is identified, the server may also give the danger reminder to the user directly by a customer service call.
  • The server may also give a danger reminder to the users of the communication response numbers that appear in the communication records of the identified target communication number, or to a user who is currently communicating with the identified target communication number, to prevent those users from being defrauded.
  • After receiving the identification response carrying the target communication number from the server, the user equipment gives the user a danger reminder based on the target communication number. For example, referring to FIG. 11b, when the application running on the user equipment is in a text reminder state, the display window of the application installed on the user equipment displays a text reminder such as "Please be vigilant! The target communication number is a fraudulent number".
  • The user applications here include, but are not limited to, SMS, flash message, WeChat, Tencent Mobile Manager, and other communication applications; of course, the application is not limited to communication applications, which is not specifically limited in the embodiments of the present application.
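  • The server side of steps 402 to 410 can be sketched as follows. Everything named here is hypothetical: the `CdrRequest` and `IdentificationResponse` classes, the `fetch_cdr` callable standing in for the communication service device, and the `similarity_of` callable standing in for steps 406 and 407. Pre-processing (step 405) is reduced to collecting the requested numbers that appear as communication initiation numbers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class CdrRequest:
    numbers: List[str]   # to-be-identified communication numbers
    preset_time: str     # first preset time; its format is an assumption

@dataclass
class IdentificationResponse:
    target_numbers: List[str]
    reminder: str

REMINDER = "Please be vigilant! The target communication number is a fraudulent number"

def handle_identification(numbers: List[str], preset_time: str,
                          fetch_cdr: Callable[[CdrRequest], List[Dict[str, str]]],
                          similarity_of: Callable[[str], float],
                          first_threshold: float) -> IdentificationResponse:
    request = CdrRequest(numbers=list(numbers), preset_time=preset_time)  # step 402
    cdr_rows = fetch_cdr(request)                                         # steps 403-404
    # Step 405, simplified: keep the requested numbers that initiated communications.
    initiators = {row["caller"] for row in cdr_rows if row["caller"] in request.numbers}
    # Steps 406-409: keep numbers whose similarity exceeds the first threshold.
    targets = sorted(n for n in initiators if similarity_of(n) > first_threshold)
    return IdentificationResponse(targets, REMINDER if targets else "")   # step 410

# Stub communication service device and made-up similarity values.
stub_rows = [{"caller": "170xxxx0001", "callee": "186xxxx0001"},
             {"caller": "158xxxx0001", "callee": "186xxxx0002"}]
stub_similarity = {"170xxxx0001": 0.9, "158xxxx0001": 0.2}
response = handle_identification(["170xxxx0001", "158xxxx0001"], "2016-01-15",
                                 lambda request: stub_rows,
                                 lambda number: stub_similarity[number],
                                 first_threshold=0.8)
print(response)
```

  • Passing the communication service device and the similarity calculation in as callables keeps the sketch independent of any particular transport or feature, so the same shape would fit the other features described in later embodiments.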
  • This embodiment addresses how to obtain the features of the corresponding type of communication information of each communication number in the pre-processed CDR, and how to extract, from the communication numbers included in the pre-processed CDR, the target communication numbers that match the preset feature.
  • On the basis of the pre-processed CDR obtained by parsing the CDR, the edit distance between each communication initiation number in the pre-processed CDR and the yellow page number is calculated, and the similarity between each communication initiation number and the yellow page number is obtained from the edit distance (that is, the similarity serves as one feature of the communication initiation number). Then either the communication initiation numbers whose similarity to the yellow page number is greater than the preset first threshold are extracted as the target communication numbers, or the communication initiation numbers are sorted by similarity to the yellow page number and the first proportion with the highest similarity is extracted as the target communication numbers.
  • The embodiment thus takes the similarity between each communication initiation number in the pre-processed CDR and the yellow page number as the feature and the first threshold as the preset feature. By determining the relationship between each similarity and the first threshold, the target communication numbers that match the preset feature are extracted from the communication numbers included in the pre-processed CDR, which enables fast and accurate number identification.
  • This embodiment is based on the first embodiment. It takes the number of communications of the communication initiation number per unit time as the feature of the communication number, and describes a technical solution for identifying, from a plurality of communication numbers, the communication numbers that meet a preset condition.
  • The communication number processing method provided by this embodiment includes the following steps:
  • the number of communications of the communication initiation number in a unit time may include any of the following:
  • Mode 1: the number of communications between the communication initiation number and one same communication number per unit time;
  • Mode 2: the number of communications between the communication initiation number and all of the communication numbers it communicates with per unit time.
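  • Both modes can be sketched as follows. The one-hour unit time, the window start, the record field names, and the function names are assumptions made for the example; the records are the 158xxxx0001 rows of the pre-processed CDR shown earlier.

```python
from collections import Counter
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

def records_in_window(records, window_start: str, unit: timedelta):
    """Keep the records whose start time falls in [window_start, window_start + unit)."""
    start = datetime.strptime(window_start, TIME_FORMAT)
    return [record for record in records
            if start <= datetime.strptime(record["start_time"], TIME_FORMAT) < start + unit]

def count_per_response_number(records, window_start, unit=timedelta(hours=1)):
    """Mode 1: communications with each single communication response number."""
    return Counter(record["callee"] for record in records_in_window(records, window_start, unit))

def count_all(records, window_start, unit=timedelta(hours=1)):
    """Mode 2: communications with all communication response numbers together."""
    return len(records_in_window(records, window_start, unit))

records_158 = [
    {"callee": "186xxxx0002", "start_time": "2016-01-15 15:32:42"},
    {"callee": "186xxxx0007", "start_time": "2016-01-15 15:42:02"},
    {"callee": "139xxxx0006", "start_time": "2016-01-15 15:48:02"},
    {"callee": "187xxxx0002", "start_time": "2016-01-15 15:52:07"},
]
print(count_all(records_158, "2016-01-15 15:00:00"))
print(count_per_response_number(records_158, "2016-01-15 15:00:00"))
```

  • For this number the two modes diverge sharply: four communications in the hour overall, but only one with any single response number, which is why the choice of mode affects what the second threshold should be.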
  • The initial value of the second threshold may be set manually or calculated by training. For example: determine, according to an a priori value, the target quantity of target communication numbers among the communication initiation numbers included in the pre-processed CDR; sort the communication initiation numbers by their number of communications per unit time; select the target quantity of communication initiation numbers in descending order of that number; and take the smallest number of communications per unit time among the selected communication initiation numbers as the initial value of the second threshold.
  • The second threshold may be continuously updated through training according to actual needs.
  • Alternatively, the communication number processing apparatus sorts the communication initiation numbers included in the pre-processed CDR by their number of communications per unit time and, based on this ordering, extracts the second proportion of communication initiation numbers with the highest number of communications per unit time as the target communication numbers.
  • The user equipment may be, for example, a smartphone, a fixed-line phone, a tablet computer, a notebook computer, or a wearable device (such as smart glasses or a smart watch).
  • The server may be, for example, a service server of an operator, an enterprise gateway, or a back-end server of an application installed on the user equipment.
  • The communication service device may be, for example, a BSS/OSS or a telecommunication switch. The application may be a communication application, for example, Tencent Mobile Manager, WeChat, or Tencent mailbox; of course, the application is not limited to communication applications, which is not specifically limited in the embodiments of the present application.
  • The user equipment, the server, and the communication service device shown in FIG. 5 cooperate with each other to implement the communication number processing method provided by this embodiment; an optional flow of the method includes:
  • Step 501 When the user equipment (or the application installed on the user equipment) detects the communication number of the counterpart communicating with the current user, it sends an identification indication carrying the counterpart communication number to the server.
  • Step 502 The server receives the identification indication and, according to it, sends a CDR request carrying the counterpart communication number to the communication service device.
  • The CDR request includes the counterpart communication number and the first preset time.
  • Step 503 The communication service device receives the CDR request, obtains the CDR of the counterpart communication number within the first preset time based on the CDR request, and sends the CDR to the server.
  • Step 504 The server receives the CDR of the counterpart communication number within the first preset time.
  • Step 505 Parse the CDR to obtain the types of communication information included in the CDR, and extract at least one type of communication information of the counterpart communication number in the CDR and combine it to form a pre-processed CDR.
  • Step 506 Extract, from the pre-processed CDR, the communication start times of the communications in which the counterpart communication number is the communication initiation number.
  • Step 507 Calculate the number of communications of the counterpart communication number per unit time in the pre-processed CDR.
  • Step 508 Determine whether the number of communications per unit time of the counterpart communication number included in the pre-processed CDR is greater than the second threshold; if yes, go to step 509; otherwise, the process ends.
  • Step 509 Extract the counterpart communication number whose number of communications per unit time is greater than the second threshold from the pre-processed CDR as the target communication number.
  • Step 510 The server gives the user a danger reminder based on the identified target communication number, reminding the user that the identified target communication number may be a fraud number. The danger reminder may be delivered by, but is not limited to, SMS, flash message, WeChat, Tencent Mobile Manager, or another communication application; when a target communication number is identified, the server may also give the danger reminder to the user directly by a customer service call.
  • The server may also give a danger reminder to the users of the communication response numbers that appear in the communication records of the identified target communication number, or to a user who is currently communicating with the identified target communication number, to prevent those users from being defrauded.
  • After receiving the identification response carrying the target communication number from the server, the user equipment gives the user a danger reminder based on the target communication number. For example, referring to FIG. 11b, the user equipment displays, in the display window of an application installed on the user equipment, a text reminder such as "Please be vigilant! The target communication number is a fraudulent number". The user applications here include, but are not limited to, SMS, flash message, WeChat, Tencent Mobile Manager, and other communication applications; of course, the application is not limited to communication applications, which is not specifically limited in the embodiments of the present application.
  • This embodiment addresses how to obtain the features of the corresponding type of communication information of each communication number in the pre-processed CDR, and how to extract, from the communication numbers included in the pre-processed CDR, the target communication numbers that match the preset feature.
  • On the basis of the pre-processed CDR obtained by parsing the CDR, the number of communications per unit time of each communication initiation number in the pre-processed CDR is calculated (that is, the number of communications per unit time serves as one feature of the communication initiation number). Then either the communication initiation numbers whose number of communications per unit time is greater than the preset second threshold are extracted as the target communication numbers, or the communication initiation numbers are sorted by number of communications per unit time and the second proportion with the highest number is extracted as the target communication numbers.
  • In this embodiment, each communication initiation number is thus characterized by its number of communications per unit time, and the second threshold serves as the preset feature.
  • This embodiment is based on the first embodiment and proposes a technical solution for analyzing at least one type of communication information of each communication number in the pre-processed CDR to obtain the features of the corresponding type of communication information, and for extracting, from the communication numbers included in the pre-processed CDR, the target communication numbers that match the preset feature.
  • The communication number processing method provided in this embodiment includes the following steps:
  • Step 601 Acquire, from the communication service device, a bill of a preset number of communication numbers in a first preset time.
  • Step 602 Parse the CDR to obtain the type of the communication information included in the CDR, and extract at least one type of communication information of each communication number in the CDR and combine to form a pre-processed CDR.
  • Step 603 Extract, from the pre-processed CDR, the communication durations of the communications in which each communication number is the communication initiation number.
  • Step 604 Calculate an average communication duration of each communication initiation number in the pre-processed bill.
  • the average communication duration of the communication initiation number may include any of the following:
  • Step 605 Determine whether the average communication duration of each communication initiation number included in the pre-processed CDR is greater than a third threshold. If yes, go to step 606, otherwise the process terminates.
  • the initial value of the third threshold can be calculated manually or by training, for example:
  • the average communication duration corresponding to the communication initiation number having the smallest average communication duration in the selected communication initiation number is determined as the initial value of the third threshold.
  • the third threshold can be continuously updated through training calculation according to actual needs.
  • Step 606 Extract, from each communication initiation number included in the pre-processed CDR, a communication initiation number whose average communication duration is greater than a third threshold, as the target communication number.
  • Alternatively, the communication number processing apparatus sorts the communication initiation numbers included in the pre-processed CDR by average communication duration and, based on this ordering, extracts the third proportion of communication initiation numbers with the highest average communication duration as the target communication numbers.
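  • Steps 604 to 606 can be sketched as follows. Treating a missing duration (shown as "--" in the pre-processed CDR) as `None` and skipping it is an assumption, as are the function names and the threshold value of 100 seconds; the durations are those of the pre-processed CDR shown earlier.

```python
def average_duration(durations):
    """Average communication duration in seconds, ignoring missing values (None)."""
    known = [duration for duration in durations if duration is not None]
    return sum(known) / len(known) if known else 0.0

def targets_by_average_duration(durations_by_number: dict, third_threshold: float) -> list:
    """Communication initiation numbers whose average duration exceeds the threshold."""
    return sorted(number for number, durations in durations_by_number.items()
                  if average_duration(durations) > third_threshold)

durations_by_number = {
    "158xxxx0001": [134, 97, 123, 256],
    "170xxxx0001": [15, 77, None],  # the third record has no recorded duration
}
for number, durations in durations_by_number.items():
    print(number, average_duration(durations))
print(targets_by_average_duration(durations_by_number, third_threshold=100))
```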
  • This embodiment addresses how to obtain the features of the corresponding type of communication information of each communication number in the pre-processed CDR, and how to extract, from the communication numbers included in the pre-processed CDR, the target communication numbers that match the preset feature.
  • On the basis of the pre-processed CDR obtained by parsing the CDR, the average communication duration of each communication initiation number in the pre-processed CDR is calculated (that is, the average communication duration serves as one feature of the communication initiation number). Then either the communication initiation numbers whose average communication duration is greater than the preset third threshold are extracted as the target communication numbers, or the communication initiation numbers are sorted by average communication duration and the third proportion with the highest average communication duration is extracted as the target communication numbers.
  • The embodiment thus takes the average communication duration of each communication initiation number in the pre-processed CDR as the feature and the third threshold as the preset feature. By determining the relationship between each average communication duration and the third threshold, the target communication numbers that match the preset feature are extracted from the communication numbers included in the pre-processed CDR, which enables fast and accurate number identification.
  • This embodiment is based on the first embodiment and proposes a technical solution for analyzing at least one type of communication information of each communication number in the pre-processed CDR to obtain the features of the corresponding type of communication information, and for extracting, from the communication numbers included in the pre-processed CDR, the target communication numbers that match the preset feature.
  • The communication number processing method provided in this embodiment includes the following steps:
  • Step 701 Acquire, from the communication service device, a bill of a preset number of communication numbers in a first preset time.
  • Step 702 Parse the CDR to obtain the types of communication information included in the CDR, and extract at least one type of communication information of each communication number in the CDR and combine it to form a pre-processed CDR.
  • Step 703 Extract, from the pre-processed CDR, the attributions of the communication response numbers of the communications in which each communication number is the communication initiation number.
  • Step 704 Calculate the number of different attributions of the communication response number corresponding to each communication initiation number in the pre-processed bill.
  • Step 705 Determine whether the number of different attributions of the communication response number corresponding to each communication initiation number included in the pre-processed CDR is greater than a fourth threshold. If yes, go to step 706, otherwise the process terminates.
  • the initial value of the fourth threshold can be calculated manually or by training, for example:
  • the number of different attributions of the communication response number corresponding to the communication initiation number having the smallest number of different attributions of the communication response number corresponding to the selected communication initiation number is determined as an initial value of the fourth threshold.
  • the fourth threshold can be continuously updated through training calculation according to actual needs.
  • Step 706 Extract, from each communication initiation number included in the pre-processed CDR, a communication initiation number whose number of different attributions of the corresponding communication response number is greater than a fourth threshold, as the target communication number.
  • Alternatively, the communication number processing apparatus sorts the communication initiation numbers included in the pre-processed CDR by the number of different attributions of their corresponding communication response numbers and, based on this ordering, extracts the fourth proportion of communication initiation numbers with the highest number of different attributions as the target communication numbers.
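  • Steps 704 to 706 can be sketched as follows, assuming that "attribution" means the home region of a communication response number. The `attribution_of` lookup, its region labels, the function names, and the threshold value are placeholders invented for the example; only the phone numbers come from the pre-processed CDR shown earlier.

```python
def distinct_attributions(response_numbers, attribution_of: dict) -> int:
    """Count the different attributions among one number's communication response numbers.

    Response numbers without a known attribution are skipped (an assumption).
    """
    return len({attribution_of[number] for number in response_numbers
                if number in attribution_of})

def targets_by_attribution(responses_by_number: dict, attribution_of: dict,
                           fourth_threshold: int) -> list:
    """Communication initiation numbers whose attribution count exceeds the threshold."""
    return sorted(number for number, responses in responses_by_number.items()
                  if distinct_attributions(responses, attribution_of) > fourth_threshold)

# Placeholder attributions; a real lookup would come from the CDR or a number database.
attribution_of = {
    "186xxxx0002": "region A", "186xxxx0007": "region B", "139xxxx0006": "region C",
    "187xxxx0002": "region A", "186xxxx0001": "region A", "180xxxx0007": "region A",
}
responses_by_number = {
    "158xxxx0001": ["186xxxx0002", "186xxxx0007", "139xxxx0006", "187xxxx0002"],
    "170xxxx0001": ["186xxxx0001", "180xxxx0007", "139xxxx0002"],
}
print(targets_by_attribution(responses_by_number, attribution_of, fourth_threshold=2))
```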
  • This embodiment addresses how to obtain the features of the corresponding type of communication information of each communication number in the pre-processed CDR, and how to extract, from the communication numbers included in the pre-processed CDR, the target communication numbers that match the preset feature.
  • The embodiment takes the number of different attributions of the communication response numbers corresponding to each communication initiation number in the pre-processed CDR as the feature and the fourth threshold as the preset feature. By determining the relationship between that number and the fourth threshold for each communication initiation number, the target communication numbers that match the preset feature are extracted from the communication numbers included in the pre-processed CDR, which enables fast and accurate number identification.
  • This embodiment is based on the foregoing embodiments and proposes a technical solution for extracting, from the communication numbers included in the pre-processed CDR, the target communication numbers that match the preset feature.
  • the communication number processing method provided in this embodiment includes the following steps:
  • Step 801: Acquire, from the communication service device, a bill of a preset number of communication numbers within a first preset time.
  • Step 802: Parse the bill to obtain the types of communication information included in the bill, extract at least one type of communication information of each communication number in the bill, and combine the extracted information to form a pre-processed bill.
  • Step 803: Parse the at least one type of communication information of each communication number in the pre-processed bill, and obtain the feature of the corresponding type of communication information of each communication number in the pre-processed bill.
  • Step 804: Analyze, by using a machine learning model, the features of the corresponding types of communication information of each communication number in the pre-processed bill.
  • Step 805: Determine whether the feature of the corresponding type of communication information of each communication number matches the preset feature. If yes, go to step 806; otherwise, the process ends.
  • Step 806: Extract, from the communication numbers included in the pre-processed bill, the target communication number that matches the preset feature.
  • Analyzing, by using the machine learning model, the features of the corresponding types of communication information of each communication number in the pre-processed bill includes: identifying the target communication number by using the technical solution described in any one of the foregoing embodiments 3 to 6, or a combination of those technical solutions.
  • The machine learning model can adopt any one or a combination of the following models: a Bayesian classifier model; a Support Vector Machine (SVM) classifier model; a deep learning model; a logistic regression model. Those skilled in the art can understand that the machine learning model can also include other models not listed here, and the application is not limited thereto.
  • This embodiment addresses the scenario of how to extract, from the communication numbers included in the pre-processed bill, the target communication number that matches the preset feature. By using a machine learning model to analyze the features of the corresponding types of communication information of each communication number in the pre-processed bill, the target communication number matching the preset feature is extracted from the communication numbers included in the pre-processed bill, thereby achieving fast and efficient number identification.
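  • The model-based analysis of step 804 could be sketched, for instance, with a small Gaussian naive Bayes classifier, which is one possible form of the Bayesian classifier named above. The feature order (yellow page similarity, calls per hour, average call duration in seconds, number of attributions), the training rows, and the labels are invented for illustration; the application does not specify how the model is parameterized.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


class GaussianNaiveBayes:
    """Tiny Gaussian naive Bayes classifier over numeric feature vectors."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "GaussianNaiveBayes":
        grouped: Dict[int, List[Sequence[float]]] = defaultdict(list)
        for row, label in zip(X, y):
            grouped[label].append(row)
        # label -> (log prior, per-feature means, per-feature variances)
        self.stats: Dict[int, Tuple[float, List[float], List[float]]] = {}
        for label, rows in grouped.items():
            n = len(rows)
            means = [sum(col) / n for col in zip(*rows)]
            # A small variance floor avoids division by zero for constant features.
            variances = [
                max(sum((v - m) ** 2 for v in col) / n, 1e-6)
                for col, m in zip(zip(*rows), means)
            ]
            self.stats[label] = (math.log(n / len(y)), means, variances)
        return self

    def predict(self, row: Sequence[float]) -> int:
        def log_posterior(label: int) -> float:
            prior, means, variances = self.stats[label]
            return prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                for x, m, var in zip(row, means, variances)
            )

        return max(self.stats, key=log_posterior)


# Hypothetical training data: 1 = suspected fraud number, 0 = ordinary number.
X = [
    [0.90, 40, 12, 1], [0.80, 55, 15, 2], [0.85, 35, 10, 1],
    [0.10, 2, 180, 3], [0.20, 1, 240, 2], [0.05, 3, 90, 4],
]
y = [1, 1, 1, 0, 0, 0]
model = GaussianNaiveBayes().fit(X, y)
suspect = model.predict([0.88, 45, 14, 1])
ordinary = model.predict([0.10, 2, 200, 3])
```

  • In practice an SVM, logistic regression, or deep learning model could be substituted behind the same fit/predict interface, as the list of models above allows.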
  • This embodiment is based on the seventh embodiment, and proposes a solution to solve the scenario in which the machine learning model is trained based on the feedback information of the target communication number on the user side.
  • the communication number processing method provided in this embodiment includes the following steps:
  • Step 901: Acquire, from the communication service device, a bill of a preset number of communication numbers within a first preset time.
  • Step 902: Parse the bill to obtain the types of communication information included in the bill, extract at least one type of communication information of each communication number in the bill, and combine the extracted information to form a pre-processed bill.
  • Step 903: Parse the at least one type of communication information of each communication number in the pre-processed bill, and obtain the feature of the corresponding type of communication information of each communication number in the pre-processed bill.
  • Step 904: Analyze the features of the corresponding types of communication information of each communication number in the pre-processed bill, and determine whether the feature of the corresponding type of communication information of each communication number matches the preset feature. If yes, go to step 905; otherwise, the process ends.
  • Step 905: Extract, from the communication numbers included in the pre-processed bill, the target communication number that matches the preset feature; and issue a danger reminder to the user of a communication response number that has a communication record with the identified target communication number, or to the user of a communication response number that is communicating with the target communication number.
  • Step 906: Receive feedback information from the user side for the target communication number.
  • Step 907: Determine, according to the user-side feedback information for the target communication number, whether the target communication number is a security number. If yes, go to step 908; otherwise, the process ends.
  • Step 908: Determine the error rate of the machine learning model based on the number of identified target communication numbers that the user side has reported as security numbers.
  • Step 909: Determine whether the error rate of the machine learning model is greater than a fifth threshold. If yes, go to step 910; otherwise, the process ends.
  • Step 910: Retrain the machine learning model based on the communication records of the security numbers in the pre-processed bill.
  • A feasible implementation of retraining the machine learning model includes:
  • updating, based on the feature of the at least one type of communication information of the security number, the threshold used by the machine learning model to identify the target communication number.
  • In this embodiment, the error rate of the machine learning model is determined according to the number of identified target communication numbers that the user side has reported as security numbers, and when the error rate is greater than the fifth threshold, the machine learning model is retrained based on the communication records of the security numbers in the pre-processed bill. Because the retraining is based on those communication records, the retrained machine learning model has a higher accuracy; using it to identify the target communication number can improve the speed and accuracy of number identification.
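  • Steps 906 to 910 can be illustrated with a short sketch. The function names, the feedback dictionary, and in particular the threshold update rule (raising the threshold to the largest feature value seen among security numbers) are assumptions made for illustration; the application only states that the threshold is updated based on the features of the security numbers.

```python
from typing import Dict, Iterable, Sequence


def model_error_rate(identified: Iterable[str], reported_safe: Dict[str, bool]) -> float:
    """Fraction of identified target numbers that users reported as security numbers."""
    numbers = list(identified)
    if not numbers:
        return 0.0
    false_alarms = sum(1 for number in numbers if reported_safe.get(number, False))
    return false_alarms / len(numbers)


def should_retrain(error_rate: float, fifth_threshold: float) -> bool:
    """Retraining is triggered only when the error rate exceeds the fifth threshold."""
    return error_rate > fifth_threshold


def raise_threshold(current: float, safe_feature_values: Sequence[float]) -> float:
    """Assumed update rule: lift the threshold so known security numbers no longer exceed it."""
    return max([current, *safe_feature_values])


# Hypothetical feedback: number "a" was flagged but the user says it is safe.
identified_targets = ["a", "b", "c", "d"]
feedback = {"a": True, "c": False}
rate = model_error_rate(identified_targets, feedback)
retrain = should_retrain(rate, fifth_threshold=0.2)
new_threshold = raise_threshold(0.8, [0.85, 0.6])
```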
  • This embodiment is based on any of the foregoing embodiments, and proposes a solution to the response processing scenario when the target communication number is identified.
  • the communication number processing method provided in this embodiment includes the following steps:
  • Step 1001: Acquire, from the communication service device, a bill of a preset number of communication numbers within a first preset time.
  • Step 1002: Parse the bill to obtain the types of communication information included in the bill, extract at least one type of communication information of each communication number in the bill, and combine the extracted information to form a pre-processed bill.
  • Step 1003: Parse the at least one type of communication information of each communication number in the pre-processed bill, and obtain the feature of the corresponding type of communication information of each communication number in the pre-processed bill.
  • Step 1004: Analyze the features of the corresponding types of communication information of each communication number in the pre-processed bill, and determine whether the feature of the corresponding type of communication information of each communication number matches the preset feature. If yes, go to step 1005; otherwise, the process ends.
  • Step 1005: Extract, from the communication numbers included in the pre-processed bill, the target communication number that matches the preset feature.
  • Step 1006: Determine the degree of matching between the feature of the corresponding type of communication information of the target communication number and the preset feature.
  • The degree of matching between the feature of the corresponding type of communication information of the target communication number and the preset feature can also be understood as the degree of difference between that feature and the preset feature. Taking the similarity between the target communication number and a yellow page number as an example: when the similarity between the target communication number and the yellow page number is greater than the first threshold, the matching degree refers to the size of the difference between that similarity and the first threshold.
  • Step 1007: Determine the danger level of the target communication number according to the degree of matching between the feature of the corresponding type of communication information of the target communication number and the preset feature.
  • The degree of matching is positively related to the danger level; different danger levels can correspond to degrees of matching within different data ranges.
  • Step 1008: Respond to the communication behavior of the target communication number based on the danger level of the target communication number.
  • The real-time degree of the response processing is positively related to the danger level. Assume that the defined danger levels include high risk and low risk. The danger level here can be used to characterize the probability that the target communication number is a communication number that meets certain conditions; for example, the danger level can be used to characterize the probability that the target communication number is a fraudulent number.
  • The manner of responding to the communication behavior of the target communication number may include: performing a danger reminder to the user of the communication response number that has a communication record with the target communication number, reminding the user that the target communication number is a fraudulent number. Here, the danger reminder includes a voice reminder and/or a text reminder; the voice reminder is, for example, a voice recording or a customer service telephone reminder, and the text reminder is, for example, a short message or a flash message.
  • For example, the communication number processing device performs an after-the-fact danger reminder to the user of the communication response number that has a communication record with the target communication number: on the user device of that communication response number, the display window of a user application displays the text reminder message "Please be vigilant! The target communication number is a fraudulent number". The user applications here include but are not limited to SMS, flash message, WeChat, and Tencent Mobile Manager; of course, the user application is not limited to communication applications, which is not specifically limited in the embodiments of the present application.
  • The manner of responding to the communication behavior of the target communication number may also include: performing an immediate danger reminder to the user of the communication response number that is communicating with the target communication number (including but not limited to a text reminder such as a short message or a flash message, or a voice reminder such as a voice recording or a customer service telephone reminder), that is, reminding the user, while the user is communicating with the target communication number, that the target communication number is a fraudulent number; or directly intercepting the ongoing communication with the target communication number and reminding the user of the danger afterwards.
  • In this embodiment, the danger level of the target communication number is determined based on the degree of matching between the feature of the corresponding type of communication information of the target communication number and the preset feature, and the communication behavior of the target communication number is responded to based on that danger level, reminding users who communicate with the target communication number to be vigilant and avoid being defrauded.
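  • Steps 1006 to 1008 can be sketched as a small mapping from matching degree to danger level to response. The cutoff of 0.2 between low and high risk, the function names, and the exact pairing of each level with a response are assumptions for illustration; the application only requires that a larger matching degree gives a higher danger level and a more immediate response.

```python
def matching_degree(similarity: float, first_threshold: float) -> float:
    """Size of the gap by which the similarity exceeds the first threshold (0 if it does not)."""
    return max(0.0, similarity - first_threshold)


def danger_level(degree: float, high_risk_cutoff: float = 0.2) -> str:
    """Map a matching degree to a danger level; a larger degree means a higher level."""
    return "high" if degree >= high_risk_cutoff else "low"


def choose_response(level: str) -> str:
    """Pick a response whose immediacy grows with the danger level."""
    responses = {
        "high": "immediate reminder or interception of the ongoing communication",
        "low": "after-the-fact reminder",
    }
    return responses[level]


# Hypothetical similarities against a first threshold of 0.7.
strong_match = danger_level(matching_degree(0.95, 0.7))
weak_match = danger_level(matching_degree(0.75, 0.7))
action = choose_response(weak_match)
```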
  • Based on any of the above embodiments, this embodiment is applicable to scenarios in which a communication number that satisfies a preset condition needs to be identified from among a plurality of communication numbers, for example, identifying numbers across the whole communication network, identifying a communication number indicated by a user, or identifying a communication number that is communicating with the current user. The type of communication service includes but is not limited to any one or a combination of the following service types: voice call; short message; flash message; data service (such as WeChat). The application is not limited thereto.
  • a communication number processing apparatus (a fraudulent number identification system based on bill analysis) provided in this embodiment includes: an online identification system and an offline training system.
  • The online identification system extracts features from the bill records collected by the operator and uses the machine learning model to determine whether a given phone number is a fraudulent number; it then reminds the user, or makes a return visit to the user, to prevent the user from being deceived, and feeds the results of the reminder/return visit back to the offline training system, so that the machine learning model is adjusted accordingly.
  • The offline training system extracts the corresponding features from the historical bill data and from the feedback results of the reminder/return visit in the online identification system; using these features, the machine learning model is retrained and adjusted, and the trained machine learning model is synchronized to the fraudulent phone recognition engine in the online identification system.
  • the online identification system can identify the fraud number according to the user's call bill record; the online identification system can be further divided into three modules: a bill collection module, a fraudulent phone recognition engine, and a deceived user reminder system;
  • Bill collection module: mainly responsible for collecting the users' call bills and pre-processing the collected bills to obtain the following four columns of information:
  • Fraudulent phone recognition engine: this is the core of the online identification system. The collected bills are cleaned, features are extracted, and the trained machine learning model is applied to the extracted features to identify whether a number is a fraudulent number. The engine can be divided into three parts: bill cleaning, feature extraction and fraud number identification;
  • Bill cleaning: removes the "dirty" data in the bill. The so-called "dirty" data is abnormal data, such as records with missing content or abnormal values.
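  • A minimal sketch of such bill cleaning follows. The record layout (the field names `caller`, `callee`, `start`, `duration_s`) and the rule that a duration must lie between 0 and 24 hours are hypothetical; the application does not enumerate the bill columns or define what counts as an abnormal value.

```python
from typing import Any, Dict, List

# Hypothetical field names; the actual bill columns are not specified here.
REQUIRED_FIELDS = ("caller", "callee", "start", "duration_s")
MAX_DURATION_S = 24 * 3600  # assumed upper bound for a plausible call


def clean_bill(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop records with missing content or abnormal duration values."""
    cleaned = []
    for record in records:
        # Missing content: a required field is absent or empty.
        if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
            continue
        # Abnormal value: duration is not numeric or outside the assumed range.
        try:
            duration = float(record["duration_s"])
        except (TypeError, ValueError):
            continue
        if not 0 <= duration <= MAX_DURATION_S:
            continue
        cleaned.append({**record, "duration_s": duration})
    return cleaned


raw_bill = [
    {"caller": "A", "callee": "B", "start": "2016-04-25 09:00:00", "duration_s": "15"},
    {"caller": "A", "callee": "", "start": "2016-04-25 09:05:00", "duration_s": "20"},
    {"caller": "A", "callee": "C", "start": "2016-04-25 09:10:00", "duration_s": "-3"},
    {"caller": "A", "callee": "D", "start": "2016-04-25 09:15:00", "duration_s": "abc"},
]
clean = clean_bill(raw_bill)
```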
  • Feature extraction: after the bills are cleaned, features are extracted in preparation for the subsequent fraud number identification. The features include: the similarity between the calling number and yellow page numbers, the average call duration, the call interval between adjacent bill records, and so on.
  • The similarity between the calling number and a yellow page number (that is, the similarity between the above-mentioned communication initiation number and the yellow page number): a fraud number is mostly a calling number, and the fraudster uses number-changing software to change the calling number into a number similar to one on the yellow pages, such as 001XX86, +0109XX88, or 08XXX10010 (China Unicom's customer service number is 10010). The edit distance between substrings of these numbers and the numbers on the yellow pages is calculated (the edit distance is the number of operations, such as adding, deleting, modifying, or moving digits, needed to turn the yellow page number into the calling number).
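  • The substring edit distance could be computed, for example, as below. This sketch uses the standard Levenshtein distance (insertion, deletion, substitution) and does not model the "move" operation mentioned above; the sliding-window comparison and the normalization of the distance into a similarity in [0, 1] are assumptions, as is the sample calling number.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_to_yellow_page(calling_number: str, yellow_number: str) -> float:
    """Best similarity between any same-length substring of the caller and the yellow page number."""
    if not yellow_number:
        raise ValueError("yellow_number must not be empty")
    digits = "".join(ch for ch in calling_number if ch.isdigit())
    width = len(yellow_number)
    if len(digits) <= width:
        windows = [digits]
    else:
        windows = [digits[i:i + width] for i in range(len(digits) - width + 1)]
    best = min(edit_distance(window, yellow_number) for window in windows)
    return 1.0 - best / width


# Hypothetical spoofed number that embeds the customer service number 10010.
embedded = similarity_to_yellow_page("0812310010", "10010")
near_miss = similarity_to_yellow_page("10011", "10010")
```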
  • The number of calls per unit time (that is, the number of communications of the above communication initiation number per unit time): fraudsters usually make many calls every hour, and most of these calls are made during working hours, that is, Monday to Friday, 08:00:00 to 18:00:00. During this time, the number of calls is evenly distributed; during non-working hours, the number of calls is generally small, basically 0.
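  • A sketch of this feature, assuming the call start times of one calling number are already available as `datetime` objects, might look as follows. The hourly bucketing and the "fraction of calls in working hours" summary are illustrative choices, not the application's definition, and the sample timestamps are invented.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable


def calls_per_hour(start_times: Iterable[datetime]) -> Counter:
    """Count calls in each (date, hour) bucket."""
    return Counter((t.date(), t.hour) for t in start_times)


def working_hours_fraction(start_times: Iterable[datetime]) -> float:
    """Share of calls started Monday to Friday between 08:00 and 18:00 (end exclusive)."""
    times = list(start_times)
    if not times:
        return 0.0
    in_hours = sum(1 for t in times if t.weekday() < 5 and 8 <= t.hour < 18)
    return in_hours / len(times)


# Hypothetical start times for one calling number; 2016-04-25 is a Monday,
# 2016-04-23 is a Saturday.
starts = [
    datetime(2016, 4, 25, 9, 5),
    datetime(2016, 4, 25, 9, 40),
    datetime(2016, 4, 25, 10, 15),
    datetime(2016, 4, 23, 20, 0),
]
hourly = calls_per_hour(starts)
fraction = working_hours_fraction(starts)
```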
  • The average call duration (that is, the average communication duration mentioned above): the average duration of each call made by the fraud number. The average call duration of fraud calls is short, no more than 20 s.
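  • Computing this feature per calling number is straightforward; the sketch below assumes each bill record has been reduced to a hypothetical `(caller, duration_in_seconds)` pair and uses the 20 s figure from the text only as an example cutoff.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def average_call_duration(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average duration in seconds of the calls initiated by each calling number."""
    totals: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0])
    for caller, duration in records:
        totals[caller][0] += duration
        totals[caller][1] += 1
    return {caller: total / count for caller, (total, count) in totals.items()}


# Hypothetical (caller, duration_s) pairs.
calls = [("A", 10), ("A", 20), ("B", 100)]
averages = average_call_duration(calls)
# Callers whose average duration does not exceed the 20 s figure mentioned above.
short_callers = sorted(c for c, avg in averages.items() if avg <= 20)
```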
  • The distribution of the attribution of the called numbers over time (unit: day) (that is, the number of different attributions of the communication response numbers corresponding to the above-mentioned communication initiation number): fraudsters usually carry out fraud city by city, so the called numbers in these bills usually belong to a certain city; the number of cities to which the called numbers belong within a certain period of time is taken as the feature.
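  • The attribution feature can be sketched as a count of distinct cities per calling number per day. The sketch assumes the attribution (city) of each called number has already been resolved by some lookup that is not shown; the tuple layout and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def attribution_counts(
    records: Iterable[Tuple[str, str, str]]
) -> Dict[Tuple[str, str], int]:
    """Number of distinct called-number cities per (caller, day)."""
    cities: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for caller, day, callee_city in records:
        cities[(caller, day)].add(callee_city)
    return {key: len(values) for key, values in cities.items()}


# Hypothetical (caller, day, city of called number) records.
bill = [
    ("A", "2016-04-25", "Shenzhen"),
    ("A", "2016-04-25", "Shenzhen"),
    ("A", "2016-04-26", "Guangzhou"),
    ("B", "2016-04-25", "Beijing"),
    ("B", "2016-04-25", "Shanghai"),
]
per_day = attribution_counts(bill)
```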
  • Deceived user reminder system: informs the victim that the call they received is a fraudulent call, preventing the victim from being deceived, and submits the victim's feedback to the offline training system.
  • The offline training system extracts the features of the relevant historical bills, retrains the machine learning model, and adjusts the Bayesian classifier (other machine learning algorithms can also be used here, such as an SVM classifier, logistic regression, deep learning, etc.). The offline training system can be divided into three parts:
  • a) Extract historical bills: extract the historical bills of the most recent period of time, especially those for which the feedback result indicates an identification error.
  • b) Feature extraction: extract features from the historical bills to provide data for the subsequent model retraining.
  • c) Model retraining: using the features extracted in b), train the Bayesian classifier to obtain new parameters, and update the trained machine learning model to the online identification system.
  • the online identification system and the offline training system form a complete closed loop.
  • the offline training system will decide whether to retrain and update the fraudulent number identification model in the online identification system according to the result of the voice return visit.
  • The communication number processing device provided in this embodiment has the following advantages: 1) no user tag information is needed; only the bill records are required; 2) the speed and accuracy of fraud number identification are improved; 3) fraud numbers are identified more accurately, enabling the operator to identify fraudulent calls while the user's call is in progress.
  • This embodiment further describes a communication number processing device, which can be used to execute the communication number processing method in the embodiments of the present application. The communication number processing device can be implemented in various manners, for example, in a user device such as a smart phone, a landline phone, a tablet computer, a notebook computer, or a wearable device (such as smart glasses or a smart watch), or in a network device such as an enterprise gateway or a carrier gateway. The communication number processing device may also be a client or a background server of a user application; for example, when the user application is Tencent Mobile Manager, the corresponding communication number processing device may be the client or the background server of Tencent Mobile Manager. Referring to FIG. 13, the communication number processing device includes:
  • the obtaining module 1301 is configured to acquire, from the communication service device, a bill of a preset number of communication numbers in a first preset time;
  • the pre-processing module 1302 is configured to parse the CDR to obtain the type of the communication information included in the CDR, and extract at least one type of communication information of each communication number in the CDR and combine to form a pre-processed CDR;
  • the parsing module 1303 is configured to parse at least one type of communication information of each communication number in the pre-processed bill, and obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill;
  • the extracting module 1304 is configured to extract, from the communication number included in the pre-processed bill, a target communication number that matches the preset feature.
  • In summary, this embodiment parses the bill of a communication number to obtain the feature of the corresponding type of communication information of the communication number and, based on that feature, identifies from the communication numbers the target communication number that matches the preset feature. On the one hand, because the generation and maintenance of the bill of a communication number is generally handled by the operator, the participation of individual users is not required, and the bill can be acquired quickly and efficiently. On the other hand, because the bill of a communication number is objective data maintained by the operator, it can truly and completely reflect all communication records of the user within a certain time interval. Therefore, the technical solution provided by the embodiments of the present application, which is based on the bill of the communication number, can improve the speed and accuracy of number identification.
  • the pre-processing module 1302 is specifically configured to:
  • the extracted communication records of the respective communication initiation numbers are combined to form a pre-processed bill.
  • The parsing module 1303 is specifically configured to: separately calculate the edit distance between each communication initiation number in the pre-processed bill and a yellow page number; and obtain, based on the edit distance, the similarity between each communication initiation number in the pre-processed bill and the yellow page number;
  • The extraction module 1304 is specifically configured to: extract, from the communication initiation numbers included in the pre-processed bill, the communication initiation numbers whose similarity with the yellow page number is greater than a first threshold; or, based on a ranking of the communication initiation numbers included in the pre-processed bill by their similarity with the yellow page number, extract the first proportion of communication initiation numbers with the highest similarity.
  • The parsing module 1303 is specifically configured to: extract the communication start time of each communication number in the pre-processed bill when it acts as the communication initiation number; and calculate the number of communications per unit time of each communication initiation number in the pre-processed bill;
  • The extraction module 1304 is specifically configured to: extract, from the communication initiation numbers included in the pre-processed bill, the communication initiation numbers whose number of communications per unit time is greater than a second threshold; or, based on a ranking of the communication initiation numbers included in the pre-processed bill by their number of communications per unit time, extract the second proportion of communication initiation numbers with the highest number of communications.
  • The parsing module 1303 is specifically configured to: extract the communication duration of each communication number in the pre-processed bill when it acts as the communication initiation number; and calculate the average communication duration of each communication initiation number in the pre-processed bill;
  • The extraction module 1304 is specifically configured to: extract, from the communication initiation numbers included in the pre-processed bill, the communication initiation numbers whose average communication duration is greater than a third threshold; or, based on a ranking of the communication initiation numbers included in the pre-processed bill by their average communication duration, extract the third proportion of communication initiation numbers with the highest average communication duration.
  • The parsing module 1303 is specifically configured to: obtain the attribution of the communication response number corresponding to each communication number in the pre-processed bill when it acts as the communication initiation number; and calculate the number of different attributions of the communication response numbers corresponding to each communication initiation number in the pre-processed bill;
  • The extraction module 1304 is specifically configured to: extract, from the communication initiation numbers included in the pre-processed bill, the communication initiation numbers for which the number of different attributions of the corresponding communication response numbers is greater than a fourth threshold; or, based on a ranking of the communication initiation numbers included in the pre-processed bill by the number of different attributions of their corresponding communication response numbers, extract the fourth proportion of communication initiation numbers with the highest number of different attributions.
  • The extraction module 1304 is specifically configured to: use a machine learning model to analyze the features of the corresponding types of communication information of each communication number in the pre-processed bill, and extract, from the communication numbers included in the pre-processed bill, the target communication number that matches the preset feature.
  • the communication number processing apparatus of this embodiment also includes the obtaining module 1301, the preprocessing module 1302, the parsing module 1303, and the extracting module 1304 in FIG.
  • the communication number processing device of the embodiment further includes:
  • The training module 1305 is configured to: receive feedback information from the user side for the target communication number, and determine whether the target communication number is a security number; determine the error rate of the machine learning model based on the number of identified target communication numbers that the user side has reported as security numbers; and, when the error rate of the machine learning model is greater than the fifth threshold, retrain the machine learning model based on the communication records of the security numbers in the pre-processed bill.
  • The training module 1305 is specifically configured to: parse at least one type of communication information of the communication records of the security number in the pre-processed bill, and obtain the feature of the at least one type of communication information of the security number; and update, based on that feature, the threshold used by the machine learning model to identify the target communication number.
  • the device further includes:
  • The response module 1306 is configured to: determine the degree of matching between the feature of the corresponding type of communication information of the target communication number and the preset feature; determine the danger level of the target communication number according to that degree of matching; and respond to the communication behavior of the target communication number based on the danger level of the target communication number.
  • The obtaining module 1301, the pre-processing module 1302, the parsing module 1303, the extracting module 1304, the training module 1305, and the response module 1306 may all be implemented by a central processing unit (CPU), a microprocessor (MPU), an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA) located in the communication number processing device.
  • This embodiment describes a computer readable medium, which may be a ROM (e.g., a read-only memory, a FLASH memory, a transfer device, etc.), a magnetic storage medium (e.g., a magnetic tape, a disk drive, etc.), an optical storage medium (e.g., a CD-ROM, a DVD-ROM, a paper card, a paper tape, etc.), or another well-known type of program memory. The computer readable medium stores computer-executable instructions (such as the binary executable instructions of a projection application such as Tencent Video) that, when executed, cause at least one processor to perform the following operations:
  • the target communication number matching the preset feature is extracted from the communication number included in the pre-processed bill.
  • In summary, the communication number processing device parses the bill of a communication number to obtain the feature of the corresponding type of communication information of the communication number and, based on that feature, identifies from the communication numbers the target communication number that matches the preset feature. On the one hand, the operator is generally responsible for the generation and maintenance of the bill of a communication number, so the participation of individual users is not required and the bill can be acquired quickly and efficiently. On the other hand, because the bill of a communication number is objective data maintained by the operator, it can truly and completely reflect all communication records of the user within a certain time interval. Therefore, the technical solution provided by the embodiments of the present application, which takes the bill of the communication number as the basis of processing, can improve the speed and accuracy of number identification.
  • Embodiments of the present application can be provided as a method, a system, or a computer program product.
  • The application can take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware.
  • The application can also take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage) that contain computer-usable program code.
  • The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising an instruction apparatus, and the instruction apparatus implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Abstract

Disclosed are a communication number processing method and apparatus. The method comprises: acquiring, from a communication service device, a bill of a pre-set number of communication numbers within a first pre-set time; parsing the bill to obtain the type of communication information comprised in the bill, and extracting at least one type of communication information about each communication number in the bill and combining same to form a preprocessing bill; parsing the at least one type of communication information about each communication number in the preprocessing bill to obtain a feature possessed by a corresponding type of communication information about each communication number in the preprocessing bill; and extracting, from the communication numbers comprised in the preprocessing bill, a target communication number matching a pre-set feature. By means of the present application, the speed and accuracy of number recognition can be improved.

Description

Communication number processing method and apparatus
This application claims priority to Chinese Patent Application No. 201610261923.1, entitled "Communication number processing method and apparatus" and filed with the Chinese Patent Office on April 25, 2016, which is incorporated herein by reference in its entirety.
Technical field
The present application relates to data processing technologies in the field of communications, and in particular, to a communication number processing method and apparatus.
Background
Telecommunications fraud refers to the criminal act in which offenders fabricate false information and set up scams by telephone, the Internet, SMS, and similar channels, defraud victims remotely and without contact, and induce them to pay or transfer money to the offenders. With the rise of the mobile Internet, telecommunications fraud has become increasingly rampant. Data show that the amount of money involved in telecommunications fraud grows at an exponential rate every year. In 2015, public security organs nationwide filed 590,000 telecommunications fraud cases, a year-on-year increase of 32.5%, with economic losses totaling 22.2 billion yuan. Behind each case there may be a family broken by fraud.
To curb telecommunications fraud and protect users from fraudulent calls, the prior art collects users' tags on numbers through application software (an app) on mobile phones. If a number is tagged as a fraudulent number by multiple users, it is regarded as a fraudulent number, and users who are in a call with that number are warned to be vigilant so as to avoid being defrauded.
However, the prior art has two drawbacks. On the one hand, it depends on collecting user tags, yet in practice users seldom tag numbers: many users who receive a call from an unfamiliar number do not tag the type of the number. Moreover, a number can be regarded as fraudulent only after enough user tags have been collected. The prior art therefore identifies fraudulent numbers slowly and inefficiently. On the other hand, tagging is a subjective act: when users receive nuisance calls, such as advertising and sales calls, they often tag those nuisance numbers as fraudulent numbers as well. The prior art therefore identifies fraudulent numbers with low accuracy.
Summary of the invention
In view of this, embodiments of the present application aim to provide a communication number processing method and apparatus that can improve the speed and accuracy of number identification.
To achieve the above objective, the technical solutions of the present application are implemented as follows:
In a first aspect, an embodiment of the present application provides a communication number processing method, the method including:
acquiring, from a communication service device, a bill of a preset number of communication numbers within a first preset time;
parsing the bill to obtain the types of communication information included in the bill, extracting at least one type of communication information of each communication number in the bill, and combining the extracted information to form a pre-processed bill;
parsing the at least one type of communication information of each communication number in the pre-processed bill to obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill; and
extracting, from the communication numbers included in the pre-processed bill, a target communication number that matches a preset feature.
Optionally, the parsing the bill to obtain the types of communication information included in the bill, extracting at least one type of communication information of each communication number in the bill, and combining the extracted information to form a pre-processed bill includes:
parsing the bill to obtain at least one of the following types of communication information included in the bill: a communication initiation number, a communication response number corresponding to the communication initiation number, a communication start time, and a communication duration;
extracting the at least one type of communication information associated with each communication initiation number in the bill to form a communication record of each communication initiation number; and
combining the extracted communication records of the communication initiation numbers to form the pre-processed bill.
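As an illustration only, the pre-processing step above can be sketched in Python. The row layout and the field names (`initiator`, `responder`, `start_time`, `duration_s`, `cell`) are assumptions made for this sketch; the application names only the four types of communication information and does not prescribe a data format.

```python
from collections import defaultdict

# Hypothetical field names; the application only names the four information types.
FIELDS = ("responder", "start_time", "duration_s")

def build_preprocessed_bill(raw_bill, fields=FIELDS):
    """Group the selected fields of every row by communication initiation number."""
    records = defaultdict(list)
    for row in raw_bill:
        # Keep only the selected types of communication information.
        records[row["initiator"]].append({f: row[f] for f in fields})
    return dict(records)

# Toy bill rows; "cell" stands for any extra column that pre-processing drops.
raw_bill = [
    {"initiator": "A", "responder": "B", "start_time": 0, "duration_s": 30, "cell": "x1"},
    {"initiator": "A", "responder": "C", "start_time": 90, "duration_s": 12, "cell": "x1"},
    {"initiator": "B", "responder": "C", "start_time": 40, "duration_s": 300, "cell": "x7"},
]
preprocessed = build_preprocessed_bill(raw_bill)
```

Here `preprocessed` maps each communication initiation number to its list of communication records, which is one possible in-memory form of the pre-processed bill.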
Optionally, the parsing the at least one type of communication information of each communication number in the pre-processed bill to obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill includes:
calculating an edit distance between each communication initiation number in the pre-processed bill and a yellow pages number, where the edit distance represents the number of operations needed to change the yellow pages number into the communication initiation number; and
obtaining, based on the edit distance, a similarity between each communication initiation number in the pre-processed bill and the yellow pages number;
and the extracting, from the communication numbers included in the pre-processed bill, a target communication number that matches a preset feature includes:
extracting, from the communication initiation numbers included in the pre-processed bill, the communication initiation numbers whose similarity to the yellow pages number is greater than a first threshold;
or, based on a ranking of the communication initiation numbers included in the pre-processed bill by their similarity to the yellow pages number, extracting a first proportion of the communication initiation numbers with the highest similarity.
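The edit-distance feature above can be sketched as follows. The sketch uses the standard Levenshtein distance (insertions, deletions, substitutions) and assumes a similarity of 1 - distance / longer length; the application says only that the similarity is obtained from the edit distance, so this normalization, the sample numbers, and the threshold 0.8 are illustrative assumptions.

```python
def edit_distance(src: str, dst: str) -> int:
    """Levenshtein distance: operations needed to change src into dst."""
    prev = list(range(len(dst) + 1))
    for i, s in enumerate(src, 1):
        cur = [i]
        for j, d in enumerate(dst, 1):
            cur.append(min(prev[j] + 1,              # delete s
                           cur[j - 1] + 1,           # insert d
                           prev[j - 1] + (s != d)))  # substitute, free if equal
        prev = cur
    return prev[-1]

def similarity(number: str, yellow: str) -> float:
    """Assumed normalization: 1.0 means identical, 0.0 means nothing in common."""
    longest = max(len(number), len(yellow)) or 1
    return 1.0 - edit_distance(yellow, number) / longest

def best_similarity(number, yellow_pages):
    """Similarity to the closest yellow pages number."""
    return max(similarity(number, y) for y in yellow_pages)

yellow_pages = ["4008001234"]                # made-up yellow pages number
candidates = ["4008001235", "13900001111"]   # made-up initiation numbers
FIRST_THRESHOLD = 0.8                        # illustrative value only
flagged = [n for n in candidates
           if best_similarity(n, yellow_pages) > FIRST_THRESHOLD]
```

The sketch does not treat an exact match with a yellow pages number specially; the application does not say how that case is handled.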
Optionally, the parsing the at least one type of communication information of each communication number in the pre-processed bill to obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill includes:
extracting the communication start times of each communication number in the pre-processed bill when it acts as a communication initiation number; and
calculating the number of communications per unit time of each communication initiation number in the pre-processed bill;
and the extracting, from the communication numbers included in the pre-processed bill, a target communication number that matches a preset feature includes:
extracting, from the communication initiation numbers included in the pre-processed bill, the communication initiation numbers whose number of communications per unit time is greater than a second threshold;
or, based on a ranking of the communication initiation numbers included in the pre-processed bill by their number of communications per unit time, extracting a second proportion of the communication initiation numbers with the highest number of communications.
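A minimal sketch of the call-frequency feature and of the two selection rules (fixed threshold, or top proportion of a ranking) follows. The record layout, the one-hour unit of time, and the numeric values are assumptions for illustration; the application leaves them open.

```python
import math
from collections import Counter

def calls_per_hour(records, window_hours):
    """records: (initiator, start_time) pairs observed over window_hours."""
    counts = Counter(initiator for initiator, _start in records)
    return {number: count / window_hours for number, count in counts.items()}

def over_threshold(rates, threshold):
    """Rule 1: numbers whose rate exceeds the threshold."""
    return sorted(n for n, r in rates.items() if r > threshold)

def top_fraction(rates, fraction):
    """Rule 2: the given proportion of numbers with the highest rates."""
    ranked = sorted(rates, key=rates.get, reverse=True)
    return ranked[:math.ceil(len(ranked) * fraction)]

# Toy data: A places 40 calls, B places 2, C places 1, all within 2 hours.
records = [("A", t) for t in range(40)] + [("B", 0), ("B", 5), ("C", 1)]
rates = calls_per_hour(records, window_hours=2)
by_threshold = over_threshold(rates, threshold=10)
by_ranking = top_fraction(rates, fraction=0.5)
```

The same two selection helpers apply unchanged to any other per-number feature, since they operate on a plain mapping from number to feature value.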
Optionally, the parsing the at least one type of communication information of each communication number in the pre-processed bill to obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill includes:
extracting the communication durations of each communication number in the pre-processed bill when it acts as a communication initiation number; and
calculating the average communication duration of each communication initiation number in the pre-processed bill;
and the extracting, from the communication numbers included in the pre-processed bill, a target communication number that matches a preset feature includes:
extracting, from the communication initiation numbers included in the pre-processed bill, the communication initiation numbers whose average communication duration is greater than a third threshold;
or, based on a ranking of the communication initiation numbers included in the pre-processed bill by their average communication duration, extracting a third proportion of the communication initiation numbers with the highest average communication duration.
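The average-duration feature can be sketched in the same way. Durations in seconds, the record layout, and the threshold value of 300 seconds are assumptions made for this example only.

```python
from collections import defaultdict
from statistics import mean

def average_durations(records):
    """records: (initiator, duration_in_seconds) pairs from the pre-processed bill."""
    durations = defaultdict(list)
    for initiator, duration in records:
        durations[initiator].append(duration)
    return {number: mean(values) for number, values in durations.items()}

records = [("A", 600), ("A", 900), ("B", 30), ("B", 50)]
averages = average_durations(records)

THIRD_THRESHOLD = 300  # seconds; illustrative value only
flagged = sorted(n for n, avg in averages.items() if avg > THIRD_THRESHOLD)
```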
Optionally, the parsing the at least one type of communication information of each communication number in the pre-processed bill to obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill includes:
obtaining the home locations of the communication response numbers that correspond to each communication number in the pre-processed bill when it acts as a communication initiation number; and
calculating, for each communication initiation number in the pre-processed bill, the number of different home locations of its corresponding communication response numbers;
and the extracting, from the communication numbers included in the pre-processed bill, a target communication number that matches a preset feature includes:
extracting, from the communication initiation numbers included in the pre-processed bill, the communication initiation numbers for which the number of different home locations of the corresponding communication response numbers is greater than a fourth threshold;
or, based on a ranking of the communication initiation numbers included in the pre-processed bill by the number of different home locations of their corresponding communication response numbers, extracting a fourth proportion of the communication initiation numbers with the highest number of different home locations.
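The home-location feature can be sketched as counting distinct home locations per communication initiation number. The prefix table and the `home_location` helper are hypothetical stand-ins for whatever number-to-location lookup is actually available; the application does not describe how home locations are resolved.

```python
from collections import defaultdict

# Toy prefix table for this sketch; a real lookup would use operator data.
HOME_BY_PREFIX = {"010": "Beijing", "021": "Shanghai", "020": "Guangzhou"}

def home_location(number):
    """Hypothetical lookup: resolve a number to a home location by its prefix."""
    return HOME_BY_PREFIX.get(number[:3], "unknown")

def distinct_home_counts(records):
    """records: (initiator, responder) pairs; count distinct responder homes."""
    homes = defaultdict(set)
    for initiator, responder in records:
        homes[initiator].add(home_location(responder))
    return {number: len(places) for number, places in homes.items()}

records = [
    ("A", "0101111"), ("A", "0212222"), ("A", "0203333"),
    ("B", "0104444"), ("B", "0105555"),
]
counts = distinct_home_counts(records)

FOURTH_THRESHOLD = 2  # illustrative value only
flagged = sorted(n for n, c in counts.items() if c > FOURTH_THRESHOLD)
```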
Optionally, the extracting, from the communication numbers included in the pre-processed bill, a target communication number that matches a preset feature includes:
using a machine learning model to analyze the features of the corresponding types of communication information of each communication number in the pre-processed bill, and extracting, from the communication numbers included in the pre-processed bill, the target communication number that matches the preset feature.
Optionally, the method further includes:
receiving feedback information from the user side regarding a target communication number, and determining whether the target communication number is a safe number;
determining an error rate of the machine learning model based on the number of identified target communication numbers that the user side has reported as safe numbers; and
when the error rate of the machine learning model is greater than a fifth threshold, retraining the machine learning model based on the communication records of the safe numbers in the pre-processed bill.
Optionally, the retraining the machine learning model based on the communication records of the safe numbers in the pre-processed bill includes:
parsing at least one type of communication information in the communication records of the safe numbers in the pre-processed bill to obtain the features of the at least one type of communication information of the safe numbers; and
updating, based on the features of the at least one type of communication information of the safe numbers, the thresholds that the machine learning model uses to identify the target communication number.
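The feedback loop above can be sketched as two small functions. The error rate is taken here as the share of flagged numbers that users reported as safe, and the threshold update simply raises a threshold until no safe number exceeds it. Both choices, and every numeric value, are assumptions of this sketch; the application states only that the thresholds are updated from the features of the safe numbers.

```python
def error_rate(flagged, reported_safe):
    """Share of flagged target numbers that users reported as safe."""
    flagged = set(flagged)
    if not flagged:
        return 0.0
    return len(flagged & set(reported_safe)) / len(flagged)

def retrain_threshold(threshold, safe_feature_values):
    """Assumed update rule: raise the threshold so no safe number exceeds it."""
    return max([threshold, *safe_feature_values])

FIFTH_THRESHOLD = 0.2                  # illustrative value only
flagged = ["A", "B", "C", "D"]         # numbers the model identified
reported_safe = {"B"}                  # user-side feedback
rate = error_rate(flagged, reported_safe)

calls_threshold = 10.0                 # e.g. a calls-per-hour threshold
if rate > FIFTH_THRESHOLD:
    # Feature values of the safe numbers, e.g. B's calls per hour.
    calls_threshold = retrain_threshold(calls_threshold, [12.5])
```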
Optionally, after the extracting, from the communication numbers included in the pre-processed bill, a target communication number that matches a preset feature, the method further includes:
determining the degree to which the features of the corresponding types of communication information of the target communication number match the preset feature;
determining a danger level of the target communication number according to that degree of matching; and
responding to the communication behavior of the target communication number based on the danger level of the target communication number.
Optionally, when the danger level of the target communication number is determined to be low, the responding to the communication behavior of the target communication number includes: sending a danger reminder to the users of the communication response numbers that have communication records with the target communication number, where the danger reminder includes a voice reminder and/or a text reminder;
or, when the danger level of the target communication number is determined to be high, the responding to the communication behavior of the target communication number includes: sending an immediate danger reminder to the user of the communication response number that is currently communicating with the target communication number, or directly intercepting the ongoing communication with the target communication number.
Optionally, how close to real time the response is correlates positively with the danger level.
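The mapping from matching degree to danger level and then to a response can be sketched as below. The two-level scale follows the low and high cases in the text; the cutoff 0.8, the matching degree expressed as a value in [0, 1], and the action strings are assumptions for illustration.

```python
from enum import Enum

class Risk(Enum):
    LOW = "low"
    HIGH = "high"

def risk_level(match_degree, high_cutoff=0.8):
    """Map a matching degree in [0, 1] to a danger level (cutoff is assumed)."""
    return Risk.HIGH if match_degree >= high_cutoff else Risk.LOW

def respond(level, intercept=False):
    """Pick a response; higher danger gets a more immediate response."""
    if level is Risk.HIGH:
        if intercept:
            return "intercept the ongoing communication"
        return "immediate reminder to the user currently in the call"
    return "voice and/or text reminder to users with past records"

action = respond(risk_level(0.93), intercept=True)
```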
In a second aspect, an embodiment of the present application provides a communication number processing apparatus, the apparatus including:
an acquiring module, configured to acquire, from a communication service device, a bill of a preset number of communication numbers within a first preset time;
a pre-processing module, configured to parse the bill to obtain the types of communication information included in the bill, extract at least one type of communication information of each communication number in the bill, and combine the extracted information to form a pre-processed bill;
a parsing module, configured to parse the at least one type of communication information of each communication number in the pre-processed bill to obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill; and
an extracting module, configured to extract, from the communication numbers included in the pre-processed bill, a target communication number that matches a preset feature.
Optionally, the pre-processing module is specifically configured to:
parse the bill to obtain at least one of the following types of communication information included in the bill: a communication initiation number, a communication response number corresponding to the communication initiation number, a communication start time, and a communication duration;
extract the at least one type of communication information associated with each communication initiation number in the bill to form a communication record of each communication initiation number; and
combine the extracted communication records of the communication initiation numbers to form the pre-processed bill.
Optionally, the parsing module is specifically configured to: calculate an edit distance between each communication initiation number in the pre-processed bill and a yellow pages number; and obtain, based on the edit distance, a similarity between each communication initiation number in the pre-processed bill and the yellow pages number, where the edit distance represents the number of operations needed to change the yellow pages number into the communication initiation number;
the extracting module is specifically configured to: extract, from the communication initiation numbers included in the pre-processed bill, the communication initiation numbers whose similarity to the yellow pages number is greater than a first threshold; or, based on a ranking of the communication initiation numbers included in the pre-processed bill by their similarity to the yellow pages number, extract a first proportion of the communication initiation numbers with the highest similarity.
Optionally, the parsing module is specifically configured to: extract the communication start times of each communication number in the pre-processed bill when it acts as a communication initiation number; and calculate the number of communications per unit time of each communication initiation number in the pre-processed bill;
the extracting module is specifically configured to: extract, from the communication initiation numbers included in the pre-processed bill, the communication initiation numbers whose number of communications per unit time is greater than a second threshold; or, based on a ranking of the communication initiation numbers included in the pre-processed bill by their number of communications per unit time, extract a second proportion of the communication initiation numbers with the highest number of communications.
Optionally, the parsing module is specifically configured to: extract the communication durations of each communication number in the pre-processed bill when it acts as a communication initiation number; and calculate the average communication duration of each communication initiation number in the pre-processed bill;
the extracting module is specifically configured to: extract, from the communication initiation numbers included in the pre-processed bill, the communication initiation numbers whose average communication duration is greater than a third threshold; or, based on a ranking of the communication initiation numbers included in the pre-processed bill by their average communication duration, extract a third proportion of the communication initiation numbers with the highest average communication duration.
Optionally, the parsing module is specifically configured to: obtain the home locations of the communication response numbers that correspond to each communication number in the pre-processed bill when it acts as a communication initiation number; and calculate, for each communication initiation number in the pre-processed bill, the number of different home locations of its corresponding communication response numbers;
the extracting module is specifically configured to: extract, from the communication initiation numbers included in the pre-processed bill, the communication initiation numbers for which the number of different home locations of the corresponding communication response numbers is greater than a fourth threshold; or, based on a ranking of the communication initiation numbers included in the pre-processed bill by the number of different home locations of their corresponding communication response numbers, extract a fourth proportion of the communication initiation numbers with the highest number of different home locations.
Optionally, the extracting module is specifically configured to: use a machine learning model to analyze the features of the corresponding types of communication information of each communication number in the pre-processed bill, and extract, from the communication numbers included in the pre-processed bill, the target communication number that matches the preset feature.
Optionally, the apparatus further includes:
a training module, configured to: receive feedback information from the user side regarding a target communication number, and determine whether the target communication number is a safe number; determine an error rate of the machine learning model based on the number of identified target communication numbers that the user side has reported as safe numbers; and, when the error rate of the machine learning model is greater than a fifth threshold, retrain the machine learning model based on the communication records of the safe numbers in the pre-processed bill.
Optionally, the training module is specifically configured to: parse at least one type of communication information in the communication records of the safe numbers in the pre-processed bill to obtain the features of the at least one type of communication information of the safe numbers; and update, based on those features, the thresholds that the machine learning model uses to identify the target communication number.
Optionally, the apparatus further includes:
a response module, configured to: determine the degree to which the features of the corresponding types of communication information of the target communication number match the preset feature; determine a danger level of the target communication number according to that degree of matching; and respond to the communication behavior of the target communication number based on the danger level of the target communication number.
Whereas the prior art needs to collect users' tag information, the embodiments of the present application parse the bill of a preset number of communication numbers within a first preset time to obtain the features of the corresponding types of communication information of each communication number and, based on those features, extract from the communication numbers the target communication number that matches a preset feature. On the one hand, the bill of a communication number is objective data maintained by the operator and can truly and completely reflect all of a user's communication records within a given time interval, so taking the bill as the basis for processing improves the accuracy of number identification. On the other hand, generating and maintaining bills generally does not require the direct participation of each user but is the operator's responsibility, so bills can be acquired quickly and efficiently. The embodiments of the present application can therefore improve both the speed and the accuracy of number identification.
Brief description of the drawings
FIG. 1 is a schematic diagram of an optional application scenario of the communication number processing method in an embodiment of the present application;
FIG. 2 is an optional schematic flowchart of the communication number processing method in Embodiment 1 of the present application;
FIG. 3 is an optional schematic flowchart of the communication number processing method in Embodiment 2 of the present application;
FIG. 4 is an optional schematic flowchart of the communication number processing method in Embodiment 3 of the present application;
FIG. 5 is an optional schematic flowchart of the communication number processing method in Embodiment 4 of the present application;
FIG. 6 is an optional schematic flowchart of the communication number processing method in Embodiment 5 of the present application;
FIG. 7 is an optional schematic flowchart of the communication number processing method in Embodiment 6 of the present application;
FIG. 8 is an optional schematic flowchart of the communication number processing method in Embodiment 7 of the present application;
FIG. 9 is an optional schematic flowchart of the communication number processing method in Embodiment 8 of the present application;
FIG. 10 is an optional schematic flowchart of the communication number processing method in Embodiment 9 of the present application;
FIG. 11a is an optional schematic diagram of a user application running on user equipment in a state of receiving a user instruction in an embodiment of the present application;
FIG. 11b is an optional schematic diagram of a user application running on user equipment in a text reminder state in an embodiment of the present application;
FIG. 12 is an optional schematic structural diagram of the communication number processing apparatus in an embodiment of the present application;
FIG. 13 is another optional schematic structural diagram of the communication number processing apparatus in an embodiment of the present application;
FIG. 14 is yet another optional schematic structural diagram of the communication number processing apparatus in an embodiment of the present application.
Detailed description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present application and are not intended to limit it.
An embodiment of the present application describes a communication number processing method. FIG. 1 shows an optional application scenario of the method. User equipment 11, user equipment 12, user equipment 13, a network device 14 (such as an operator gateway or an enterprise gateway), a communication service device 15, and a background server 16 of an application each access a communication network (such as a wireless network or a wired network). The communication service device 15 is, for example, a business support system (BSS) / operation support system (OSS), or a telecommunications switch, and is used to provide the bills of communication numbers. The network device 14 is used to provide service support for the user equipment accessing the communication network. The background server 16 of the application is used to provide service support for the application; correspondingly, the client of the application installed on the user equipment also provides service support for the application. The application may be a communication application, for example Tencent Mobile Manager (腾讯手机管家), WeChat, or Tencent Mail (腾讯邮箱); the application is of course not limited to communication applications, and the embodiments of the present application place no specific limitation on this.
In the above scenario there is at least one user equipment, and each user equipment is associated with at least one distinct communication number. For example, in FIG. 1, user equipment 11 is associated with at least one communication number A, user equipment 12 with at least one communication number B, and user equipment 13 with at least one communication number C, where communication numbers A, B, and C are pairwise different. The communication number processing method of the embodiments of the present application can be applied in this scenario to identify, from multiple communication numbers, the communication numbers that satisfy a preset condition.
An embodiment of the present application further describes a communication number processing apparatus that can be used to perform the communication number processing method of the embodiments of the present application. The apparatus can be implemented in various ways. For example, all components of the apparatus may be implemented in user equipment such as a smartphone, a fixed-line telephone, a tablet computer, a notebook computer, or a wearable device (such as smart glasses or a smart watch); or all components may be implemented in a network device such as an enterprise gateway or a carrier gateway; or the components of the apparatus may be implemented in a coupled manner on the user equipment side or the network side. The communication number processing apparatus may also be the client or the background server of a user application; for example, when the user application is Tencent Mobile Manager, the corresponding apparatus may be the client or the background server of Tencent Mobile Manager.
Based on the application scenario and the communication number processing apparatus described above, the following specific embodiments are proposed.
Embodiment 1
This embodiment provides a communication number processing method that can be applied in scenarios where communication numbers satisfying a preset condition need to be identified from a plurality of communication numbers, for example: identifying numbers across the whole communication network, identifying a to-be-identified communication number indicated by a user, or identifying a communication number that is communicating with the current user. The service type of the communication includes, but is not limited to, any one or a combination of the following: voice call; short message; flash message; data service (such as WeChat). The present application is not limited in this respect.
Based on the foregoing communication number processing apparatus, and referring to FIG. 2, the communication number processing method provided in this embodiment includes the following steps:
Step 201: Acquire, from a communication service device, the CDRs of a preset quantity of communication numbers within a first preset time.
The communication service device may include telecom support system equipment, such as a BSS/OSS, or a telecommunication switch. The first preset time can be set flexibly by the user or the operator according to actual conditions such as service requirements. Communication numbers are not limited to mobile phone numbers and fixed-line numbers. The communication numbers may include, for example, all communication numbers in the communication network, a to-be-identified communication number indicated by a user, or a communication number in a call with the current user. A to-be-identified communication number indicated by a user is, for example, a number that the user specifies in an application running on the user equipment (such as Tencent Mobile Manager), or a number carried in an indication message that the user sends to the operator's server.
The CDRs of the preset quantity of communication numbers within the first preset time may be acquired from the communication service device in at least one of the following ways:
1) acquiring, from the communication service device, the CDRs within the first preset time of all communication numbers in the communication network;
2) acquiring, from the communication service device and according to the to-be-identified communication number indicated by the current user, the CDR of that number within the first preset time;
3) upon detecting a communication number in a call with the current user, acquiring, from the communication service device, the CDR of that number within the first preset time;
4) upon determining that the communication number in a call with the current user is an unfamiliar number, acquiring, from the communication service device, the CDR of the unfamiliar number within the first preset time.
Step 202: Parse the CDRs to obtain the types of communication information they contain, extract at least one type of communication information for each communication number in the CDRs, and combine the extracted information to form a pre-processed CDR.
The CDRs of the preset quantity of communication numbers acquired from the communication service device for the first preset time are generally unordered. In this embodiment the pre-processed CDR is compiled per communication number and contains at least one type of communication information for each communication number in at least one of the following roles: the communication number as a calling number (for example, the calling number in a voice service), as a called number (for example, the called number in a voice service), as a message sending number (for example, a short message sending number, or a data sending number in a data service), or as a message receiving number (for example, a short message receiving number, or a data receiving number in a data service).
The pre-processed CDR contains only the at least one type of communication information extracted from the CDRs for each communication number; that is, it need not contain all of the information in the CDRs. The data of the pre-processed CDR is indexed by communication number, and its data structure is, for example:
Calling number 1 in a voice service: communication information 1, communication information 2, ...;
Calling number 2 in a voice service: communication information 3, communication information 4, ...;
Short message sending number 3: communication information 5, communication information 6, ...;
Data sending number 4 in a data service: communication information 7, communication information 8, ....
Take as an example the pre-processed CDR shown in Table 1, which is indexed by each communication number acting as a calling number. In the data structure example of Table 1, the calling number, called number, communication start time, and communication duration (seconds) are some examples of the types of communication information contained in the CDR.
Table 1
Calling number    Called number    Communication start time    Communication duration (seconds)
158xxxx0001       186xxxx0002      2016-01-15 15:32:42         134
158xxxx0001       139xxxx0001      2016-01-15 15:39:02         15
158xxxx0001       139xxxx0002      2016-01-15 15:48:02         123
170xxxx0001       186xxxx0001      2016-01-16 8:30:02          77
170xxxx0001       139xxxx0002      2016-01-17 9:26:02          256
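The indexing described above can be sketched as follows. This is a minimal illustration only: the field names (`caller`, `callee`, `start`, `duration_s`) and the function `build_preprocessed_cdr` are hypothetical and do not come from the application, and the record values simply mirror Table 1 with the start times zero-padded.

```python
from collections import defaultdict

# Raw records in arbitrary order; values mirror Table 1, with start times
# zero-padded so that plain string comparison sorts them chronologically.
RAW_RECORDS = [
    {"caller": "170xxxx0001", "callee": "139xxxx0002", "start": "2016-01-17 09:26:02", "duration_s": 256},
    {"caller": "158xxxx0001", "callee": "139xxxx0001", "start": "2016-01-15 15:39:02", "duration_s": 15},
    {"caller": "158xxxx0001", "callee": "186xxxx0002", "start": "2016-01-15 15:32:42", "duration_s": 134},
    {"caller": "170xxxx0001", "callee": "186xxxx0001", "start": "2016-01-16 08:30:02", "duration_s": 77},
    {"caller": "158xxxx0001", "callee": "139xxxx0002", "start": "2016-01-15 15:48:02", "duration_s": 123},
]


def build_preprocessed_cdr(records):
    """Index records by calling number, each list ordered by start time."""
    indexed = defaultdict(list)
    for rec in sorted(records, key=lambda r: r["start"]):
        # Keep only the selected types of communication information.
        indexed[rec["caller"]].append((rec["callee"], rec["start"], rec["duration_s"]))
    return dict(indexed)


preprocessed = build_preprocessed_cdr(RAW_RECORDS)
for caller, calls in preprocessed.items():
    print(caller, calls)
```

Indexing by calling number means that every later per-number analysis reads one list instead of scanning the whole unordered CDR set.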
Step 203: Parse the at least one type of communication information of each communication number in the pre-processed CDR to obtain the features of the corresponding type of communication information of each communication number.
Step 204: Analyze the features of the corresponding type of communication information of each communication number in the pre-processed CDR and determine whether those features match a preset feature; if so, go to step 205; otherwise the procedure ends.
Step 205: Extract, from the communication numbers included in the pre-processed CDR, the target communication numbers that match the preset feature.
That is, the features obtained by parsing the pre-processed CDR are analyzed for each communication number, and the target communication numbers whose features match the preset feature are extracted from the communication numbers included in the pre-processed CDR. The preset feature is, for example, a preset a priori value.
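Steps 203 to 205 can be sketched as a small pipeline. The two features used here (number of distinct called numbers and mean call duration) and the preset values are assumptions chosen only to make the example concrete; this embodiment does not fix which features or a priori values are used.

```python
# Pre-processed CDR reduced to (called number, duration in seconds) pairs.
PREPROCESSED = {
    "158xxxx0001": [("186xxxx0002", 134), ("139xxxx0001", 15), ("139xxxx0002", 123)],
    "170xxxx0001": [("186xxxx0001", 77), ("139xxxx0002", 256)],
}

# Hypothetical preset feature (a priori values), for illustration only.
PRESET = {"min_distinct_callees": 3, "max_mean_duration_s": 100.0}


def extract_features(calls):
    """Step 203: derive features from one number's communication records."""
    durations = [duration for _callee, duration in calls]
    return {
        "distinct_callees": len({callee for callee, _duration in calls}),
        "mean_duration_s": sum(durations) / len(durations),
    }


def matches_preset(features, preset):
    """Step 204: compare the features with the preset feature."""
    return (features["distinct_callees"] >= preset["min_distinct_callees"]
            and features["mean_duration_s"] <= preset["max_mean_duration_s"])


def find_target_numbers(preprocessed, preset):
    """Step 205: extract the numbers whose features match the preset feature."""
    return [number for number, calls in preprocessed.items()
            if calls and matches_preset(extract_features(calls), preset)]


targets = find_target_numbers(PREPROCESSED, PRESET)
print(targets)
```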
Compared with the prior art, in which number identification relies on first collecting user-supplied tagging information, this embodiment parses the CDRs of communication numbers to obtain the features of the corresponding types of communication information, and based on those features identifies the target communication numbers that match the preset feature. On the one hand, because CDRs are generally generated and maintained by the operator without requiring the participation of individual users, they can be acquired quickly and efficiently. On the other hand, because CDRs are objective data maintained by the operator, they truthfully and completely reflect all communication records of a user within a given time interval. The technical solution provided by the embodiments of the present application, which takes the CDRs of communication numbers as its processing basis, can therefore improve the speed and accuracy of number identification.
Embodiment 2
Based on Embodiment 1, this embodiment provides a technical solution for how, specifically, to parse the CDRs to obtain the types of communication information they contain, and how to extract at least one type of communication information for each communication number in the CDRs and combine it to form the pre-processed CDR.
Referring to FIG. 3, the communication number processing method provided in this embodiment includes the following steps:
Step 301: Acquire, from a communication service device, the CDRs of a preset quantity of communication numbers within a first preset time.
Step 302: Parse the CDRs to obtain at least one of the following types of communication information they contain: communication initiating number, communication responding number corresponding to the communication initiating number, communication start time, and communication duration.
The communication initiating number may include a communication number acting as a calling number (for example, the calling number in a voice service) and a communication number acting as a message sending number (for example, a short message sending number, or a data sending number in a data service). The communication responding number corresponding to the communication initiating number may include a communication number acting as a called number (for example, the called number in a voice service) and a communication number acting as a message receiving number (for example, a short message receiving number, or a data receiving number in a data service). Those skilled in the art will understand that the types of communication information contained in the CDRs are not limited to the communication initiating number, the corresponding communication responding number, the communication start time, and the communication duration; they may also include data traffic (uplink and/or downlink traffic), communication location, service type, long-distance type, and so on. The present application is not limited in this respect.
Step 303: Extract the at least one type of communication information associated with each communication initiating number in the CDRs to form a communication record for each communication initiating number.
Step 304: Combine the extracted communication records of the communication initiating numbers to form the pre-processed CDR.
Here, the pre-processed CDR contains only the at least one type of communication information extracted from the CDRs for each communication number, and not all of the information in the CDRs, which reduces the workload of communication number processing and improves its efficiency.
The CDRs of the preset quantity of communication numbers acquired from the communication service device for the first preset time are generally unordered. Take the CDR shown in Table 2 as an example: the communication start time, service type, communication initiating number, communication responding number, communication location, long-distance type, and communication duration (seconds) are some examples of the types of communication information contained in the CDR.
Table 2
Figure PCTCN2017081813-appb-000001
Figure PCTCN2017081813-appb-000002
The communication number processing apparatus parses the CDR shown in Table 2 to obtain at least one of the following types of communication information contained in it: communication initiating number; communication responding number corresponding to the communication initiating number; communication start time; communication duration.
The communication number processing apparatus extracts the at least one type of communication information associated with each communication initiating number in the CDR to form a communication record for each communication initiating number. Here, the communication record of each communication initiating number contains at least one type of communication information of that number within the first preset time.
The extracted communication records of the communication initiating numbers are combined to form the pre-processed CDR. The pre-processed CDR is compiled per communication number, and its data structure (or display format) is organized with each communication number as the index. Assuming that the at least one type of communication information of each communication number acting as a communication initiating number is combined to form the pre-processed CDR, the data structure of the pre-processed CDR may be:
Communication initiating number 1: communication information 1, communication information 2, ...;
Communication initiating number 2: communication information 1, communication information 2, ...; ....
Take the pre-processed CDR shown in Table 3 as an example. The communication number processing apparatus obtains it from the CDR shown in Table 2 by performing steps 302 to 304; the pre-processed CDR is organized with each communication initiating number as the index.
Table 3
Communication initiating number    Communication responding number    Communication start time    Communication duration (seconds)
158xxxx0001                        186xxxx0002                        2016-01-15 15:32:42         134
158xxxx0001                        186xxxx0007                        2016-01-15 15:42:02         97
158xxxx0001                        139xxxx0006                        2016-01-15 15:48:02         123
158xxxx0001                        187xxxx0002                        2016-01-15 15:52:07         256
170xxxx0001                        186xxxx0001                        2016-01-15 15:39:02         15
170xxxx0001                        180xxxx0007                        2016-01-15 15:51:02         77
170xxxx0001                        139xxxx0002                        2016-01-16 10:26:02         --
Step 305: Parse the at least one type of communication information of each communication number in the pre-processed CDR to obtain the features of the corresponding type of communication information of each communication number.
Step 306: Analyze the features of the corresponding type of communication information of each communication number in the pre-processed CDR and determine whether those features match a preset feature; if so, go to step 307; otherwise the procedure ends.
Step 307: Extract, from the communication numbers included in the pre-processed CDR, the target communication numbers that match the preset feature.
To summarize, this embodiment addresses how to parse the CDRs and form the pre-processed CDR: the CDRs are parsed to obtain at least one type of communication information; the at least one type of communication information associated with each communication initiating number is extracted to form a communication record for that number; and the extracted communication records are combined to form the pre-processed CDR. Because the resulting pre-processed CDR contains only the extracted types of communication information for each communication number, and not all of the information in the CDRs, the workload of number identification is reduced and its speed and efficiency are improved.
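Steps 302 to 304 can be sketched as follows, assuming purely for illustration that the raw CDRs arrive as comma-separated text. The column names, the sample rows, and the helper names `parse_cdrs`, `extract_records`, and `combine_records` are all hypothetical; the actual layout of Table 2 is not reproduced here.

```python
import csv
import io
from collections import defaultdict

# Illustrative raw CDR text; the column layout is an assumption, not Table 2.
RAW_CDRS = """start,service,initiator,responder,location,long_distance,duration_s
2016-01-15 15:39:02,voice,170xxxx0001,186xxxx0001,city-A,local,15
2016-01-15 15:32:42,voice,158xxxx0001,186xxxx0002,city-A,local,134
2016-01-15 15:48:02,voice,158xxxx0001,139xxxx0006,city-B,long-distance,123
"""

# Only these types of communication information are kept.
KEPT_FIELDS = ("responder", "start", "duration_s")


def parse_cdrs(text):
    """Step 302: parse the raw CDRs into rows keyed by information type."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_records(rows):
    """Step 303: build one communication record list per initiating number."""
    per_initiator = defaultdict(list)
    for row in rows:
        per_initiator[row["initiator"]].append({f: row[f] for f in KEPT_FIELDS})
    return per_initiator


def combine_records(per_initiator):
    """Step 304: combine the records into a pre-processed CDR ordered by time."""
    return {number: sorted(records, key=lambda r: r["start"])
            for number, records in sorted(per_initiator.items())}


preprocessed = combine_records(extract_records(parse_cdrs(RAW_CDRS)))
print(preprocessed)
```

Dropping the location, service type, and long-distance columns at step 303 is what keeps the pre-processed CDR smaller than the raw CDRs.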
Embodiment 3
Based on Embodiment 1, this embodiment takes the edit distance between a communication initiating number and a yellow-page number as the feature of the communication number, and describes a technical solution for how, specifically, to identify communication numbers that satisfy a preset condition from a plurality of communication numbers. The communication number processing method provided in this embodiment includes the following steps:
1) Acquire, from a communication service device, the CDRs of a preset quantity of communication numbers within a first preset time.
2) Parse the CDRs to obtain the types of communication information they contain, extract at least one type of communication information for each communication number in the CDRs, and combine the extracted information to form a pre-processed CDR.
3) Calculate the edit distance between each communication initiating number in the pre-processed CDR and the yellow-page number.
There may be one or more yellow-page numbers. The edit distance is the minimum number of edit operations required to transform a yellow-page number into the communication initiating number, that is, the number of operations such as inserting, deleting, modifying, or moving digits needed to turn the yellow-page number into the communication initiating number. When there are multiple yellow-page numbers, the edit distance between each communication initiating number in the pre-processed CDR and each yellow-page number must be calculated separately.
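As an illustration of step 3), the sketch below computes the classic Levenshtein distance, which counts single-digit insertions, deletions, and substitutions. This is an assumption: the text above also mentions moving digits, which plain Levenshtein does not treat as a single operation. The function names and the sample numbers are hypothetical.

```python
def edit_distance(source, target):
    """Levenshtein distance: minimum insertions, deletions and substitutions
    needed to turn `source` into `target`."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # delete s_char
                               current[j - 1] + 1,       # insert t_char
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def distances_to_yellow_pages(initiating_number, yellow_page_numbers):
    """Edit distance from every yellow-page number to one initiating number."""
    return {yp: edit_distance(yp, initiating_number) for yp in yellow_page_numbers}


print(distances_to_yellow_pages("4000000007", ["4000000001", "4001111111"]))
```

Only two rows of the dynamic-programming table are kept, so memory grows with the length of one number rather than with the product of both lengths.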
4) Obtain, based on the edit distance, the similarity between each communication initiating number in the pre-processed CDR and the yellow-page number.
The similarity between each communication initiating number in the pre-processed CDR and the yellow-page number may be obtained from the edit distance in at least one of the following ways:
Method 1: For each communication initiating number in the pre-processed CDR, normalize the separately calculated edit distances between that number and each yellow-page number to obtain the similarity between that number and each yellow-page number; further, sort the similarities between that number and the yellow-page numbers.
Method 2: For each communication initiating number in the pre-processed CDR, calculate the ratio of the edit distance between that number and the yellow-page number to a preset distance, and take the calculated ratio as the similarity between the communication initiating number and the yellow-page number. When there are multiple yellow-page numbers, the ratio of the edit distance to the preset distance must be calculated separately for each yellow-page number.
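The two methods can be sketched as below. The normalization used for method 1, namely one minus the distance divided by the longer number's length, is an assumption, since the text does not specify a normalization formula; method 2 follows the text literally and returns the ratio of the edit distance to a preset distance. The edit distances are taken as given inputs, and all function names are hypothetical.

```python
def similarity_normalized(distance, initiating_number, yellow_page_number):
    """Method 1 (assumed normalization): 1 - distance / longer length,
    so identical numbers score 1.0 and unrelated ones approach 0.0."""
    longest = max(len(initiating_number), len(yellow_page_number))
    return 1.0 if longest == 0 else 1.0 - distance / longest


def ratio_to_preset(distance, preset_distance):
    """Method 2: ratio of the edit distance to a preset distance."""
    if preset_distance <= 0:
        raise ValueError("preset_distance must be positive")
    return distance / preset_distance


def rank_yellow_pages(initiating_number, distances):
    """Sort yellow-page numbers by decreasing method-1 similarity."""
    scored = [(yp, similarity_normalized(d, initiating_number, yp))
              for yp, d in distances.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Edit distances as computed beforehand for one initiating number.
ranking = rank_yellow_pages("4000000007", {"4001111111": 7, "4000000001": 1})
print(ranking)
print(ratio_to_preset(2, 4))
```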
5) Determine whether the similarity between each communication initiating number included in the pre-processed CDR and the yellow-page number is greater than a first threshold; if so, extract, from the communication initiating numbers included in the pre-processed CDR, those whose similarity to the yellow-page number is greater than the first threshold as target communication numbers; otherwise the procedure terminates.
The initial value of the first threshold (that is, the similarity threshold) may be set manually or obtained by training. For example: determine, from an a priori value, the target quantity of target communication numbers among the communication initiating numbers included in the pre-processed CDR; sort the communication initiating numbers by their similarity to the yellow-page number; select the target quantity of communication initiating numbers in order of decreasing similarity; and take the similarity of the selected communication initiating number that is least similar to the yellow-page number as the initial value of the first threshold. The first threshold can subsequently be updated by further training as needed.
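The threshold initialization described above can be sketched as follows; the function name `initial_threshold` and the sample similarities are hypothetical.

```python
def initial_threshold(similarities, target_count):
    """Pick the lowest similarity among the `target_count` most similar numbers.

    `similarities` maps each initiating number to its similarity with the
    yellow-page number; `target_count` comes from the a priori value.
    """
    if not 0 < target_count <= len(similarities):
        raise ValueError("target_count must be between 1 and the number of candidates")
    ranked = sorted(similarities.values(), reverse=True)
    return ranked[target_count - 1]


sample = {"4000000007": 0.9, "158xxxx0001": 0.4, "4000000017": 0.7}
print(initial_threshold(sample, 2))
```

Note that step 5) uses a strict "greater than" comparison, so a number whose similarity equals this initial value exactly would not be selected; whether the boundary should be inclusive is not stated in the text.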
In one feasible implementation, the communication number processing apparatus sorts the communication initiating numbers included in the pre-processed CDR by their similarity to the yellow-page number, and, based on that ordering, extracts the first proportion of communication initiating numbers with the highest similarity as the target communication numbers.
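A minimal sketch of this proportion-based selection is given below. Rounding the cut-off up with `math.ceil` is an assumption, as the text does not say how a fractional count is handled, and the function name is hypothetical.

```python
import math


def top_proportion(similarities, proportion):
    """Return the given share of initiating numbers with the highest similarity."""
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    ranked = sorted(similarities, key=similarities.get, reverse=True)
    keep = math.ceil(len(ranked) * proportion)  # assumed rounding rule
    return ranked[:keep]


print(top_proportion({"a": 0.9, "b": 0.4, "c": 0.7, "d": 0.1}, 0.5))
```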
In another feasible implementation, for any one of the communication initiating numbers included in the pre-processed CDR, the communication number processing apparatus determines, from the similarity between that number and the yellow-page number and from the first threshold, the probability that the number belongs to the target communication number class (for example, fraud numbers) and the probability that it belongs to the normal number class, and assigns the number to the class with the larger probability. If the class with the larger probability is the target communication number class, the communication initiating number is determined to be a target communication number; otherwise it is determined to be a normal number.
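The text does not say how the two class probabilities are derived from the similarity and the first threshold. The sketch below assumes, purely for illustration, a logistic function centred on the threshold; the `steepness` parameter and both function names are hypothetical.

```python
import math


def target_probability(similarity, threshold, steepness=10.0):
    """Assumed model: logistic curve that equals 0.5 at the threshold."""
    return 1.0 / (1.0 + math.exp(-steepness * (similarity - threshold)))


def classify(similarity, threshold):
    """Assign the class with the larger probability (ties go to 'normal')."""
    p_target = target_probability(similarity, threshold)
    p_normal = 1.0 - p_target
    return "target" if p_target > p_normal else "normal"


print(classify(0.9, 0.7), classify(0.3, 0.7))
```

Under this assumed model the decision reduces to comparing the similarity with the threshold, while the probability additionally gives a graded confidence that a plain comparison does not.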
The implementation of this embodiment relies on cooperation among the user equipment, a server, and the communication service device. Here, the user equipment may be, for example, a smartphone, a fixed-line telephone, a tablet computer, a notebook computer, or a wearable device (such as smart glasses or a smart watch); the server may be, for example, an operator's service server, an enterprise gateway, or the background server of an application installed on the user equipment; and the communication service device may be, for example, a BSS/OSS or a telecommunication switch. The application may be a communication application such as Tencent Mobile Manager, WeChat, or Tencent Mail; the application is of course not limited to communication applications, and the embodiments of the present application impose no specific limitation in this respect. FIG. 4 shows an optional flowchart in which the user equipment, the server, and the communication service device cooperate to implement the communication number processing method provided in this embodiment. The method includes:
Step 401: Based on a user indication, the user equipment sends to the server an identification indication carrying the to-be-identified communication number.
For example, referring to FIG. 11a, the user application running on the user equipment is in a state of receiving user indications, and the user enters the to-be-identified communication number at the designated position in the display window of the application installed on the user equipment, following the application's prompt. There may be one or more to-be-identified communication numbers.
Step 402: The server receives the identification indication and, based on it, sends to the communication service device a CDR request carrying the to-be-identified communication number. The CDR request includes the to-be-identified communication number and the first preset time.
Step 403: The communication service device receives the CDR request, obtains, based on the request, the CDR of the to-be-identified communication number within the first preset time, and sends it to the server.
Step 404: The server receives the CDR of the to-be-identified communication number within the first preset time.
Step 405: Parse the CDR to obtain the types of communication information it contains, extract at least one type of communication information for each to-be-identified communication number in the CDR, and combine the extracted information to form a pre-processed CDR.
Step 406: Calculate the edit distance between each to-be-identified communication number in the pre-processed CDR and the yellow-page number.
Step 407: Obtain, based on the edit distance, the similarity between each to-be-identified communication number in the pre-processed CDR and the yellow-page number.
Step 408: Determine whether the similarity between each to-be-identified communication number included in the pre-processed CDR and the yellow-page number is greater than the first threshold; if so, go to step 409; otherwise the procedure terminates.
Step 409: Extract, from the to-be-identified communication numbers included in the pre-processed CDR, the communication initiating numbers whose similarity to the yellow-page number is greater than the first threshold, as target communication numbers.
步骤410、服务器基于识别到的目标通信号码向用户设备发送携带目标通信号码的识别响应,识别响应用于对用户进行危险提醒,提醒用户该识别到的目标通信号码可能为诈骗号码;危险提醒的实现方式包括但不限于通过短信、闪信、微信、腾讯手机管家等通信类应用进行提醒;服务器还可以在识别到目标通信号码时,直接通过客服电话向用户设备进行危险提醒。 Step 410: The server sends an identification response carrying the target communication number to the user equipment based on the identified target communication number, where the identification response is used to perform a dangerous reminder to the user, and the user is reminded that the identified target communication number may be a fraudulent number; Implementation methods include, but are not limited to, reminding by communication applications such as SMS, Flash, WeChat, and Tencent Mobile Manager; the server can also perform dangerous reminding to the user equipment directly through the customer service phone when the target communication number is identified.
In addition, based on the identified target communication number, the server may also warn users of responding numbers that have a communication record with the identified target communication number, or users of responding numbers that are currently communicating with it, so that these users are not defrauded.
After receiving the identification response carrying the target communication number from the server, the user equipment warns the user based on the target communication number. For example, referring to FIG. 11b, the application running on the user equipment is in a text-alert state, and the user equipment displays a text alert such as "Stay alert! The target communication number is a fraud number" in the display window of the application installed on the user equipment. The application here includes, but is not limited to, communication applications such as SMS, flash SMS, WeChat, and Tencent Mobile Manager; the application is not limited to communication applications, and the embodiments of this application impose no specific limitation in this respect.
This embodiment addresses the scenario of how to obtain the features of the corresponding types of communication information of each communication number in the preprocessed CDR, and how to extract, from the communication numbers included in the preprocessed CDR, the target communication numbers that match a preset feature. A preprocessed CDR is obtained by parsing the CDRs; the edit distance between each originating number in the preprocessed CDR and the yellow-pages number is calculated; and the similarity between each originating number and the yellow-pages number (one of the features a communication number has as an originating number) is obtained from the edit distance. The originating numbers whose similarity to the yellow-pages number is greater than the preset first threshold are then extracted as target communication numbers; alternatively, the originating numbers are ranked by their similarity to the yellow-pages number, and the first proportion of originating numbers with the highest similarity is extracted as target communication numbers. This embodiment uses the similarity between each originating number in the preprocessed CDR and the yellow-pages number as the feature and the first threshold as the preset feature. By comparing that similarity with the first threshold, it extracts the target communication numbers that match the preset feature from the communication numbers included in the preprocessed CDR, achieving fast and accurate number identification.
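The edit-distance check of Steps 406 to 409 can be sketched as follows. This is a minimal illustration, not the patented implementation: the text does not state how the edit distance is converted into a similarity, so the normalization `1 - distance / longer_length` is an assumption, as are the function names and the choice to compare each originating number against a list of yellow-pages numbers and keep the best match.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute ca -> cb
        prev = curr
    return prev[-1]


def similarity(number: str, yellow_pages_number: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for fully different."""
    longest = max(len(number), len(yellow_pages_number))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(number, yellow_pages_number) / longest


def find_targets(originating_numbers, yellow_pages_numbers, first_threshold):
    """Keep originating numbers whose best similarity exceeds the first threshold."""
    targets = []
    for number in originating_numbers:
        best = max(similarity(number, y) for y in yellow_pages_numbers)
        if best > first_threshold:
            targets.append(number)
    return targets
```

With these assumptions, a number that differs from a ten-digit yellow-pages number in a single digit has similarity 0.9 and would be extracted under a first threshold of 0.8.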
Embodiment 4
Based on Embodiment 1, this embodiment uses the number of communications of an originating number per unit time as the feature of a communication number, and describes a technical solution for identifying, among multiple communication numbers, the communication numbers that satisfy a preset condition. The communication number processing method provided by this embodiment includes the following steps:
1) Obtain, from the communication service device, the CDRs of a preset quantity of communication numbers within the first preset time.
2) Parse the CDRs to obtain the types of communication information they contain, extract at least one type of communication information for each communication number in the CDRs, and combine the extracted information to form a preprocessed CDR.
3) Extract, from the preprocessed CDR, the communication start time of each communication in which a communication number acts as the originating number.
4) Calculate the number of communications per unit time of each originating number in the preprocessed CDR.
In practice, the number of communications of an originating number per unit time may be either of the following:
Mode 1: the number of communications per unit time between the originating number and the same number;
Mode 2: the number of communications per unit time between the originating number and all communication numbers it communicates with.
5) Determine whether the number of communications per unit time of each originating number included in the preprocessed CDR is greater than the second threshold. If so, extract, from the originating numbers included in the preprocessed CDR, those whose number of communications per unit time is greater than the second threshold as target communication numbers; otherwise, the procedure ends.
The initial value of the second threshold may be set manually or obtained through training. For example: determine, from a prior value, the target quantity of target communication numbers among the originating numbers included in the preprocessed CDR; sort the originating numbers by their number of communications per unit time; select the target quantity of originating numbers in descending order of the number of communications per unit time; and take the smallest number of communications per unit time among the selected originating numbers as the initial value of the second threshold. The second threshold may continue to be updated through training as actually needed.
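The threshold initialization described above can be sketched as follows. The function name, the dictionary layout (originating number mapped to its feature value), and the sample figures are hypothetical; only the procedure (sort in descending order, take the target quantity, use the smallest selected value) comes from the text.

```python
def initial_threshold(feature_by_number: dict[str, float], target_count: int) -> float:
    """Return the smallest feature value among the top `target_count` numbers."""
    if not 1 <= target_count <= len(feature_by_number):
        raise ValueError("target_count must be between 1 and the number of entries")
    ranked = sorted(feature_by_number.values(), reverse=True)  # descending order
    return ranked[target_count - 1]


# Hypothetical communications-per-hour values for four originating numbers.
calls_per_hour = {"A": 50, "B": 8, "C": 30, "D": 3}
second_threshold = initial_threshold(calls_per_hour, target_count=2)
```

Note that, because the comparison in step 5) is a strict "greater than", the originating number whose value defines the threshold is not itself extracted until the threshold is updated.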
In one feasible implementation, the communication number processing apparatus sorts the originating numbers included in the preprocessed CDR by their number of communications per unit time and, based on this ranking, extracts the second proportion of originating numbers with the highest number of communications as target communication numbers.
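The ranking-based alternative can be sketched in a few lines. The function name and the rounding rule (rounding the quantity up so that at least one number is kept) are assumptions; the text only says that a proportion of the highest-ranked originating numbers is extracted.

```python
import math


def top_proportion(feature_by_number: dict[str, float], proportion: float) -> list[str]:
    """Return the highest-ranked originating numbers making up `proportion` of all."""
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    ranked = sorted(feature_by_number, key=feature_by_number.get, reverse=True)
    keep = math.ceil(len(ranked) * proportion)  # assumed rounding: round up
    return ranked[:keep]
```

The same helper applies to any per-number feature value, since only the ranking is used.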
Implementing this embodiment relies on cooperation among the user equipment, the server, and the communication service device. The user equipment may be, for example, a smartphone, a fixed-line telephone, a tablet computer, a notebook computer, or a wearable device (such as smart glasses or a smart watch). The server may be, for example, an operator's service server, an enterprise gateway, or the back-end server of an application installed on the user equipment. The communication service device may be, for example, a BSS/OSS or a telecommunication switch. The application may be a communication application such as Tencent Mobile Manager, WeChat, or Tencent Mail; the application is not limited to communication applications, and the embodiments of this application impose no specific limitation in this respect. FIG. 5 shows an optional flowchart in which the user equipment, the server, and the communication service device cooperate to implement the communication number processing method provided by this embodiment. The method includes:
Step 501: When the counterpart communication number in a call with the current user is detected, the user equipment (or an application installed on the user equipment) sends to the server an identification instruction carrying the counterpart communication number.
Step 502: The server receives the identification instruction and, based on it, sends to the communication service device a CDR request carrying the counterpart communication number. The CDR request includes the counterpart communication number and the first preset time.
Step 503: The communication service device receives the CDR request, obtains the CDRs of the counterpart communication number within the first preset time based on the request, and sends them to the server.
Step 504: The server receives the CDRs of the counterpart communication number within the first preset time.
Step 505: Parse the CDRs to obtain the types of communication information they contain, extract at least one type of communication information for the counterpart communication number, and combine the extracted information to form a preprocessed CDR.
Step 506: Extract, from the preprocessed CDR, the communication start time of each communication in which the counterpart communication number acts as the originating number.
Step 507: Calculate the number of communications per unit time of the counterpart communication number in the preprocessed CDR.
Step 508: Determine whether the number of communications per unit time of the counterpart communication number included in the preprocessed CDR is greater than the second threshold. If so, go to Step 509; otherwise, the procedure ends.
Step 509: Take the counterpart communication number included in the preprocessed CDR, whose number of communications per unit time as an originating number is greater than the second threshold, as the target communication number.
Step 510: Based on the identified target communication number, the server warns the user that the identified target communication number may be a fraud number. The warning may be delivered by, but is not limited to, communication applications such as SMS, flash SMS, WeChat, and Tencent Mobile Manager. When a target communication number is identified, the server may also warn the user equipment directly through a customer service call.
In addition, based on the identified target communication number, the server may also warn users of responding numbers that have a communication record with the identified target communication number, or users of responding numbers that are currently communicating with it, so that these users are not defrauded.
After receiving the identification response carrying the target communication number from the server, the user equipment warns the user based on the target communication number. For example, referring to FIG. 11b, the user equipment displays a text alert such as "Stay alert! The target communication number is a fraud number" in the display window of the application installed on the user equipment. The application here includes, but is not limited to, communication applications such as SMS, flash SMS, WeChat, and Tencent Mobile Manager; the application is not limited to communication applications, and the embodiments of this application impose no specific limitation in this respect.
This embodiment addresses the scenario of how to obtain the features of the corresponding types of communication information of each communication number in the preprocessed CDR, and how to extract, from the communication numbers included in the preprocessed CDR, the target communication numbers that match a preset feature. A preprocessed CDR is obtained by parsing the CDRs, and the number of communications per unit time of each originating number in the preprocessed CDR (one of the features a communication number has as an originating number) is calculated. The originating numbers whose number of communications per unit time is greater than the preset second threshold are then extracted as target communication numbers; alternatively, the originating numbers are ranked by their number of communications per unit time, and the second proportion of originating numbers with the highest number of communications is extracted as target communication numbers. This embodiment uses the number of communications per unit time of each originating number in the preprocessed CDR as the feature and the second threshold as the preset feature. By comparing that number of communications with the second threshold, it extracts the target communication numbers that match the preset feature from the communication numbers included in the preprocessed CDR, achieving fast and accurate number identification.
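Counting communications per unit time in Mode 1 and Mode 2 can be sketched as follows. The record layout (`CallRecord`), the one-hour unit, and the decision to report the busiest unit-time window are all assumptions made for illustration; the text does not define how the start times are grouped into unit-time windows.

```python
from collections import Counter
from typing import NamedTuple


class CallRecord(NamedTuple):
    """Hypothetical preprocessed-CDR row."""
    originating: str
    responding: str
    start: int  # communication start time, in seconds


def peak_calls_per_unit(records: list[CallRecord], originating: str,
                        unit_seconds: int = 3600, same_number: bool = False) -> int:
    """Largest number of communications the originating number made in one window.

    same_number=True  -> Mode 1: count per (window, responding number) pair.
    same_number=False -> Mode 2: count per window across all responding numbers.
    """
    buckets = Counter()
    for record in records:
        if record.originating != originating:
            continue
        window = record.start // unit_seconds
        key = (window, record.responding) if same_number else window
        buckets[key] += 1
    return max(buckets.values(), default=0)


records = [
    CallRecord("A", "B", 0), CallRecord("A", "B", 100), CallRecord("A", "C", 200),
    CallRecord("A", "B", 4000), CallRecord("D", "B", 50),
]
is_target = peak_calls_per_unit(records, "A") > 2  # 2 is a made-up second threshold
```

In this sample, number "A" makes three calls in the first hour overall (Mode 2) and two of them to the same number "B" (Mode 1).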
Embodiment 5
Based on Embodiment 1, this embodiment proposes a technical solution for the scenario of how to parse at least one type of communication information of each communication number in the preprocessed CDR to obtain the features of the corresponding types of communication information, and how to extract, from the communication numbers included in the preprocessed CDR, the target communication numbers that match a preset feature.
Referring to FIG. 6, the communication number processing method provided by this embodiment includes the following steps:
Step 601: Obtain, from the communication service device, the CDRs of a preset quantity of communication numbers within the first preset time.
Step 602: Parse the CDRs to obtain the types of communication information they contain, extract at least one type of communication information for each communication number in the CDRs, and combine the extracted information to form a preprocessed CDR.
Step 603: Extract, from the preprocessed CDR, the communication duration of each communication in which a communication number acts as the originating number.
Step 604: Calculate the average communication duration of each originating number in the preprocessed CDR.
In practice, the average communication duration of an originating number may be either of the following:
1) the average communication duration between the originating number and the same number;
2) the average communication duration between the originating number and all communication numbers it communicates with.
Step 605: Determine whether the average communication duration of each originating number included in the preprocessed CDR is greater than the third threshold. If so, go to Step 606; otherwise, the procedure ends.
The initial value of the third threshold may be set manually or obtained through training, for example:
determine, from a prior value, the target quantity of target communication numbers among the originating numbers included in the preprocessed CDR;
sort the originating numbers by average communication duration;
select the target quantity of originating numbers in descending order of average communication duration;
take the smallest average communication duration among the selected originating numbers as the initial value of the third threshold.
The third threshold may continue to be updated through training as actually needed.
Step 606: From the originating numbers included in the preprocessed CDR, extract those whose average communication duration is greater than the third threshold as target communication numbers.
In one feasible implementation, the communication number processing apparatus sorts the originating numbers included in the preprocessed CDR by average communication duration and, based on this ranking, extracts the third proportion of originating numbers with the highest average communication duration as target communication numbers.
This embodiment addresses the scenario of how to obtain the features of the corresponding types of communication information of each communication number in the preprocessed CDR, and how to extract, from the communication numbers included in the preprocessed CDR, the target communication numbers that match a preset feature. A preprocessed CDR is obtained by parsing the CDRs, and the average communication duration of each originating number in the preprocessed CDR (one of the features a communication number has as an originating number) is calculated. The originating numbers whose average communication duration is greater than the preset third threshold are then extracted as target communication numbers; alternatively, the originating numbers are ranked by average communication duration, and the third proportion of originating numbers with the highest average communication duration is extracted as target communication numbers. This embodiment uses the average communication duration of each originating number in the preprocessed CDR as the feature and the third threshold as the preset feature. By comparing that average communication duration with the third threshold, it extracts the target communication numbers that match the preset feature from the communication numbers included in the preprocessed CDR, achieving fast and accurate number identification.
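The two ways of averaging the communication duration in Step 604, followed by the comparison of Steps 605 and 606, can be sketched as follows. The tuple layout `(originating, responding, duration_seconds)`, the function name, and the threshold value of 50 seconds are hypothetical.

```python
from collections import defaultdict


def average_durations(records, same_number=False):
    """Average communication duration per originating number.

    same_number=True  -> option 1): keyed by (originating, responding) pair.
    same_number=False -> option 2): keyed by originating number across all peers.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of durations, count]
    for originating, responding, seconds in records:
        key = (originating, responding) if same_number else originating
        totals[key][0] += seconds
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


records = [("A", "B", 60), ("A", "B", 120), ("A", "C", 30), ("D", "B", 10)]
overall = average_durations(records)
third_threshold = 50  # made-up value, in seconds
targets = [number for number, avg in overall.items() if avg > third_threshold]
```

Here "A" averages 70 seconds across all peers and 90 seconds with "B" alone, so only "A" exceeds the assumed threshold.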
Embodiment 6
Based on Embodiment 1, this embodiment proposes a technical solution for the scenario of how to parse at least one type of communication information of each communication number in the preprocessed CDR to obtain the features of the corresponding types of communication information, and how to extract, from the communication numbers included in the preprocessed CDR, the target communication numbers that match a preset feature.
Referring to FIG. 7, the communication number processing method provided by this embodiment includes the following steps:
Step 701: Obtain, from the communication service device, the CDRs of a preset quantity of communication numbers within the first preset time.
Step 702: Parse the CDRs to obtain the types of communication information they contain, extract at least one type of communication information for each communication number in the CDRs, and combine the extracted information to form a preprocessed CDR.
Step 703: Extract, from the preprocessed CDR, the home location of the responding number of each communication in which a communication number acts as the originating number.
Step 704: Calculate, for each originating number in the preprocessed CDR, the number of distinct home locations of its responding numbers.
Step 705: Determine whether the number of distinct home locations of the responding numbers corresponding to each originating number included in the preprocessed CDR is greater than the fourth threshold. If so, go to Step 706; otherwise, the procedure ends.
The initial value of the fourth threshold may be set manually or obtained through training, for example:
determine, from a prior value, the target quantity of target communication numbers among the originating numbers included in the preprocessed CDR;
sort the originating numbers by the number of distinct home locations of their responding numbers;
select the target quantity of originating numbers in descending order of the number of distinct home locations of their responding numbers;
take the smallest number of distinct home locations among the selected originating numbers as the initial value of the fourth threshold.
The fourth threshold may continue to be updated through training as actually needed.
Step 706: From the originating numbers included in the preprocessed CDR, extract those whose responding numbers have a number of distinct home locations greater than the fourth threshold as target communication numbers.
In one feasible implementation, the communication number processing apparatus sorts the originating numbers included in the preprocessed CDR by the number of distinct home locations of their responding numbers and, based on this ranking, extracts the fourth proportion of originating numbers with the highest number of distinct home locations as target communication numbers.
This embodiment addresses the scenario of how to obtain the features of the corresponding types of communication information of each communication number in the preprocessed CDR, and how to extract, from the communication numbers included in the preprocessed CDR, the target communication numbers that match a preset feature. A preprocessed CDR is obtained by parsing the CDRs, and the number of distinct home locations of the responding numbers corresponding to each originating number in the preprocessed CDR (one of the features a communication number has as an originating number) is calculated. The originating numbers for which this number is greater than the preset fourth threshold are then extracted as target communication numbers; alternatively, the originating numbers are ranked by the number of distinct home locations of their responding numbers, and the fourth proportion of originating numbers with the highest number of distinct home locations is extracted as target communication numbers. This embodiment uses the number of distinct home locations of the responding numbers corresponding to each originating number in the preprocessed CDR as the feature and the fourth threshold as the preset feature. By comparing that number with the fourth threshold, it extracts the target communication numbers that match the preset feature from the communication numbers included in the preprocessed CDR, achieving fast and accurate number identification.
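Counting the distinct home locations of the responding numbers (Steps 703 to 706) can be sketched as follows. The lookup table `home_location_of`, the function name, the sample place names, and the threshold value are hypothetical; how a home location is actually resolved from a number is not described in the text.

```python
from collections import defaultdict


def distinct_home_locations(calls, home_location_of):
    """Map each originating number to the count of distinct responder home locations.

    `calls` is an iterable of (originating, responding) pairs; responding numbers
    missing from the lookup table are skipped.
    """
    locations = defaultdict(set)
    for originating, responding in calls:
        place = home_location_of.get(responding)
        if place is not None:
            locations[originating].add(place)
    return {number: len(places) for number, places in locations.items()}


calls = [("A", "B1"), ("A", "B2"), ("A", "B3"), ("D", "B1"), ("D", "B4")]
home_location_of = {"B1": "Beijing", "B2": "Shanghai", "B3": "Beijing", "B4": "Beijing"}
counts = distinct_home_locations(calls, home_location_of)
fourth_threshold = 1  # made-up value
targets = [number for number, count in counts.items() if count > fourth_threshold]
```

Using a set per originating number means repeated calls into the same home location are counted once.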
实施例七Example 7
This embodiment builds on the foregoing embodiments and provides a technical solution for the scenario of how to extract, from the communication numbers included in the pre-processed CDR, the target communication numbers that match the preset feature.
Referring to FIG. 8, the communication number processing method provided in this embodiment includes the following steps:
Step 801: Acquire, from the communication service device, the CDRs of a preset quantity of communication numbers within a first preset time.
Step 802: Parse the CDRs to obtain the types of communication information they include, extract at least one type of communication information for each communication number in the CDRs, and combine the extracted information to form the pre-processed CDR.
Step 803: Parse the at least one type of communication information of each communication number in the pre-processed CDR to obtain the features of the corresponding type of communication information of each communication number.
Step 804: Use a machine learning model to analyze the features of the corresponding type of communication information of each communication number in the pre-processed CDR.
Step 805: Determine whether the features of the corresponding type of communication information of each communication number match the preset feature. If so, go to step 806; otherwise, the process ends.
Step 806: Extract, from the communication numbers included in the pre-processed CDR, the target communication numbers that match the preset feature.
Here, using a machine learning model to analyze the features of the corresponding type of communication information of each communication number in the pre-processed CDR can be implemented by identifying the target communication numbers with the technical solution described in any one of Embodiments 3 to 6 above, or with a combination of those solutions.
The machine learning model may be any one or a combination of the following: a Bayesian classifier model; a support vector machine (SVM) classifier model; a deep learning model; logistic regression. Those skilled in the art will understand that the machine learning model may also include other models not listed here, and this application is not limited in this respect.
This embodiment addresses the scenario of how to extract, from the communication numbers included in the pre-processed CDR, the target communication numbers that match the preset feature. By using a machine learning model to analyze the features of the corresponding type of communication information of each communication number in the pre-processed CDR, the target communication numbers that match the preset feature are extracted, which achieves fast and efficient number identification.
Embodiment 8
This embodiment builds on Embodiment 7 and provides a technical solution for the scenario of how to train the machine learning model based on user-side feedback on the target communication numbers.
Referring to FIG. 9, the communication number processing method provided in this embodiment includes the following steps:
Step 901: Acquire, from the communication service device, the CDRs of a preset quantity of communication numbers within a first preset time.
Step 902: Parse the CDRs to obtain the types of communication information they include, extract at least one type of communication information for each communication number in the CDRs, and combine the extracted information to form the pre-processed CDR.
Step 903: Parse the at least one type of communication information of each communication number in the pre-processed CDR to obtain the features of the corresponding type of communication information of each communication number.
Step 904: Analyze the features of the corresponding type of communication information of each communication number in the pre-processed CDR, and determine whether those features match the preset feature. If so, go to step 905; otherwise, the process ends.
Step 905: Extract, from the communication numbers included in the pre-processed CDR, the target communication numbers that match the preset feature; and issue a danger reminder to the users of communication response numbers that have communication records with an identified target communication number, or to the users of communication response numbers that are currently communicating with an identified target communication number.
Step 906: Receive user-side feedback on the target communication number.
That is, receive the user-side feedback on the danger reminder that carries the identified target communication number.
Step 907: Determine, according to the user-side feedback on the target communication number, whether the target communication number is a safe number. If so, go to step 908; otherwise, the process ends.
Step 908: Determine the error rate of the machine learning model based on the quantity of identified target communication numbers that the user side has reported as safe numbers.
Step 909: Determine whether the error rate of the machine learning model is greater than a fifth threshold. If so, go to step 910; otherwise, the process ends.
Step 910: Retrain the machine learning model based on the communication records of the safe numbers in the pre-processed CDR.
Here, one feasible way to retrain the machine learning model based on the communication records of the safe numbers in the pre-processed CDR includes:
parsing at least one type of communication information in the communication records of the safe numbers in the pre-processed CDR to obtain the features of the at least one type of communication information of the safe numbers;
updating, based on the features of the at least one type of communication information of the safe numbers, the thresholds that the machine learning model uses to identify target communication numbers.
This embodiment addresses the scenario of training the machine learning model based on user-side feedback on the target communication numbers. The error rate of the machine learning model is determined from the quantity of target communication numbers that the user side has reported as safe numbers, and when the error rate is greater than the fifth threshold, the machine learning model is retrained based on the communication records of the safe numbers in the pre-processed CDR. Because the retraining relies on the communication records of the safe numbers in the pre-processed CDR, the retrained machine learning model has a higher accuracy, so using it to identify target communication numbers can improve the speed and accuracy of number identification.
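The retraining decision in steps 908 to 910 can be sketched as follows. The source only states that the error rate is determined from the quantity of target numbers reported as safe; computing it as that quantity divided by the quantity of identified targets is an assumption made here, and the function names and sample values are hypothetical.

```python
def model_error_rate(identified, reported_safe):
    """Share of identified target numbers that users reported as safe.
    The ratio form is an assumption; the source only names the count."""
    identified = set(identified)
    if not identified:
        return 0.0
    false_positives = identified & set(reported_safe)
    return len(false_positives) / len(identified)

def needs_retraining(identified, reported_safe, fifth_threshold):
    """Step 909: retrain only when the error rate exceeds the threshold."""
    return model_error_rate(identified, reported_safe) > fifth_threshold

identified = ["001XX86", "08XXX10010", "138XXXX0001", "158XXXX0001"]
reported_safe = ["138XXXX0001", "186XXXX0002"]  # the second was never flagged
rate = model_error_rate(identified, reported_safe)
```

Feedback about numbers that were never flagged is ignored, so only false positives among the identified targets affect the rate.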
Embodiment 9
This embodiment builds on any of the foregoing embodiments and provides a technical solution for the response processing scenario that arises when a target communication number has been identified.
Referring to FIG. 10, the communication number processing method provided in this embodiment includes the following steps:
Step 1001: Acquire, from the communication service device, the CDRs of a preset quantity of communication numbers within a first preset time.
Step 1002: Parse the CDRs to obtain the types of communication information they include, extract at least one type of communication information for each communication number in the CDRs, and combine the extracted information to form the pre-processed CDR.
Step 1003: Parse the at least one type of communication information of each communication number in the pre-processed CDR to obtain the features of the corresponding type of communication information of each communication number.
Step 1004: Analyze the features of the corresponding type of communication information of each communication number in the pre-processed CDR, and determine whether those features match the preset feature. If so, go to step 1005; otherwise, the process ends.
Step 1005: Extract, from the communication numbers included in the pre-processed CDR, the target communication numbers that match the preset feature.
Step 1006: Determine the degree to which the features of the corresponding type of communication information of the target communication number match the preset feature.
The degree to which the features of the corresponding type of communication information of the target communication number match the preset feature can also be understood as the degree of difference between those features and the preset feature. Take as an example the case where the feature is the similarity between the target communication number and a yellow-page number: the similarity between the target communication number and the yellow-page number is greater than the first threshold, and the matching degree is the size of the difference between that similarity and the first threshold.
Step 1007: Determine the danger level of the target communication number according to the degree to which the features of the corresponding type of communication information of the target communication number match the preset feature.
The matching degree is positively correlated with the danger level; different danger levels can correspond to matching degrees in different value ranges.
Step 1008: Perform response processing on the communication behavior of the target communication number based on its danger level.
The timeliness of the response processing is positively correlated with the danger level. Assume that the defined danger levels are high risk and low risk. The danger level here can characterize the probability that the target communication number is a communication number meeting a specific condition; for example, it can characterize the probability that the target communication number is a fraud number.
When the communication number processing apparatus determines that the danger level of the target communication number is low risk, the response processing on the communication behavior of the target communication number may include: issuing a danger reminder to the users of communication response numbers that have communication records with the target communication number, to remind them that the target communication number is a fraud number. Here, the danger reminder includes a voice reminder and/or a text reminder; a voice reminder is, for example, a recorded voice message or a call from customer service, and a text reminder is, for example, a short message or a flash message.
Referring to FIG. 11b, the communication number processing apparatus issues an after-the-fact danger reminder to the user of a communication response number that has a communication record with the target communication number. On that user's device, the display window of a user application shows the following text reminder: "Please stay alert! The target communication number is a fraud number." The user applications here include, but are not limited to, communication applications such as short message, flash message, WeChat, and Tencent Mobile Manager (腾讯手机管家). The applications are not limited to communication applications, and the embodiments of this application impose no specific limitation in this respect.
When the communication number processing apparatus determines that the danger level of the target communication number is high risk, the response processing on the communication behavior of the target communication number may include: issuing an immediate danger reminder to the user of the communication response number that is currently communicating with the target communication number (including, but not limited to, text reminders such as a short message or flash message, or voice reminders such as a recorded voice message or a call from customer service), that is, reminding the user during the communication that the target communication number is a fraud number; or directly intercepting the ongoing communication with the target communication number and issuing a danger reminder to the user afterwards.
This embodiment addresses the response processing scenario that arises when a target communication number has been identified. The danger level of the target communication number is determined from the degree to which the features of its corresponding type of communication information match the preset feature, and response processing is performed on the communication behavior of the target communication number based on that danger level, reminding users who communicate with the target communication number to stay alert and avoid being defrauded.
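Steps 1006 to 1008 can be sketched as follows for the yellow-page similarity feature. The two-level scheme follows the text, but the cutoff value `high_cutoff`, the action labels, and the sample similarities are hypothetical; the source only requires that the matching degree and the danger level be positively correlated.

```python
def matching_degree(similarity, first_threshold):
    """Step 1006: how far the similarity exceeds the first threshold."""
    return similarity - first_threshold

def danger_level(degree, high_cutoff=0.2):
    """Step 1007: map the matching degree to a level (cutoff is assumed)."""
    return "high" if degree >= high_cutoff else "low"

def respond(level):
    """Step 1008: higher danger calls for a more immediate response."""
    if level == "high":
        # Remind the user during the call, or intercept and remind afterwards.
        return "remind_in_call_or_intercept"
    # Low risk: remind users that have records with the number afterwards.
    return "remind_afterwards"

level_a = danger_level(matching_degree(0.95, 0.7))
level_b = danger_level(matching_degree(0.75, 0.7))
```

A real deployment could define more than two levels by mapping further ranges of the matching degree to further actions.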
Embodiment 10
This embodiment builds on any of the foregoing embodiments and can be applied in scenarios where communication numbers that satisfy a preset condition need to be identified from among multiple communication numbers, for example, identifying numbers across the whole communication network, identifying a to-be-identified communication number indicated by a user, or identifying a communication number that is communicating with the current user. The service types of the communication include, but are not limited to, any one or a combination of the following: voice call; short message; flash message; data service (such as WeChat). This application is not limited in this respect.
Referring to FIG. 12, the communication number processing apparatus provided in this embodiment (a fraud number identification system based on CDR analysis) includes an online identification system and an offline training system.
The online identification system extracts features from the CDR records collected by the operator and uses a machine learning model to determine whether a given telephone number is a fraud number. It then reminds or calls back the targeted users so that they are not deceived, and feeds the results of the reminders and return visits back to the offline training system, which adjusts the machine learning model accordingly. The offline training system uses historical CDR data together with the reminder and return-visit feedback from the online identification system to extract the corresponding features, and uses these features to retrain and adjust the machine learning model. The trained machine learning model is then synchronized to the fraud call identification engine in the online identification system.
Specifically, the online identification system can identify fraud numbers from users' call CDR records. It is divided into three modules: a CDR collection module, a fraud call identification engine, and a defrauded-user reminder system, where:
CDR collection module: mainly responsible for collecting users' call records and pre-processing the collected CDRs to obtain the four columns of information shown in Table 4:
Table 4
Calling number | Called number | Call time | Call duration (seconds)
158XXXX0001 | 186XXXX0002 | 2016-01-15 15:36:42 | 134
001XX86 | 139XXXX0001 | 2016-01-15 15:39:02 | 15
138XXXX0001 | 139XXXX0002 | 2016-01-15 15:38:02 | 123
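The four columns of Table 4 can be held in a small record type. The sketch below assumes each pre-processed row is serialized as four pipe-separated fields; that serialization, the `CallRecord` name, and its field names are illustrative choices, not something the source specifies.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class CallRecord:
    caller: str          # calling number
    callee: str          # called number
    start: datetime      # call time
    duration_s: int      # call duration in seconds

def parse_row(row):
    """Parse one 'caller | callee | time | seconds' row into a CallRecord."""
    caller, callee, start, duration = (field.strip() for field in row.split("|"))
    return CallRecord(
        caller=caller,
        callee=callee,
        start=datetime.strptime(start, "%Y-%m-%d %H:%M:%S"),
        duration_s=int(duration),
    )

rec = parse_row("001XX86 | 139XXXX0001 | 2016-01-15 15:39:02 | 15")
```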
Fraud call identification engine: this is the core of the online identification system. It cleans the collected CDRs, extracts features, and uses the trained machine learning model on the extracted features to determine whether a number is a fraud number. It is divided into three parts: CDR cleaning, feature extraction, and fraud number identification, where:
1) CDR cleaning removes the "dirty" data from the CDRs. "Dirty" data means abnormal data, such as records with missing content or abnormal values.
2) Feature extraction: features are extracted from the cleaned CDRs in preparation for the subsequent fraud number identification. The features include the similarity of the calling number to yellow-page numbers, the average call duration, the distance between the called numbers of adjacent CDR entries, the call interval, and so on.
Similarity between the calling number and yellow-page numbers (that is, the similarity between the communication initiation number and a yellow-page number described above): fraud numbers are mostly calling numbers. Fraudsters use number-changing software to change the calling number into one that resembles a number in the yellow pages, such as 001XX86, +0109XX88, or 08XXX10010 (the customer service number of China Unicom is 10010). The edit distance between substrings of these numbers and the yellow-page numbers is calculated (the edit distance is the number of operations, such as insertion, deletion, modification, and moving, needed to turn a yellow-page number into the calling number).
Number of calls per unit time (that is, the number of communications of the communication initiation number per unit time described above): fraudsters generally make many calls every hour, and most of these calls are made during working hours, that is, 08:00:00 to 18:00:00 from Monday to Friday. During this period the number of calls is evenly distributed; outside working hours very few calls are made, basically zero.
Average call duration (that is, the average communication duration described above): the average duration of each call made by the fraud number. Users who receive a fraud call generally hang up quickly, so the average call duration of fraud numbers is very short, no more than 20 s.
Distribution over time (unit: day) of the home locations of the called numbers (that is, the number of distinct home locations of the communication response numbers corresponding to the communication initiation number described above): fraudsters usually work through one city at a time, so the called numbers in these CDRs usually belong to a single city. The number of home cities of the called numbers within a given period is used as this feature.
3) Fraud call identification: the features extracted above are fed to the machine learning model to identify fraud.
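Parts 1) and 2) of the engine can be sketched as follows for two of the listed features, the average call duration and the call count per working hour. The cleaning rules used here (empty fields, non-positive durations) are assumed examples of "dirty" data, and the record layout and feature names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def clean(records):
    """Drop 'dirty' records: missing fields or non-positive durations."""
    kept = []
    for caller, callee, start, duration in records:
        if not caller or not callee or start is None:
            continue
        if duration is None or duration <= 0:
            continue
        kept.append((caller, callee, start, duration))
    return kept

def is_working_time(ts):
    # Monday to Friday, 08:00:00 to 18:00:00, as described in the text.
    return ts.weekday() < 5 and 8 <= ts.hour < 18

def caller_features(records):
    """Per-caller average duration and peak call count in a working hour."""
    durations = defaultdict(list)
    per_hour = defaultdict(lambda: defaultdict(int))
    for caller, _callee, start, duration in records:
        durations[caller].append(duration)
        if is_working_time(start):
            hour_bucket = start.replace(minute=0, second=0, microsecond=0)
            per_hour[caller][hour_bucket] += 1
    return {
        caller: {
            "avg_duration_s": sum(ds) / len(ds),
            "max_calls_per_working_hour": max(per_hour[caller].values(), default=0),
        }
        for caller, ds in durations.items()
    }

raw = [
    ("001XX86", "139XXXX0001", datetime(2016, 1, 15, 15, 39, 2), 15),
    ("001XX86", "139XXXX0003", datetime(2016, 1, 15, 15, 45, 0), 9),
    ("001XX86", "", datetime(2016, 1, 15, 15, 50, 0), 12),            # missing callee
    ("138XXXX0001", "139XXXX0002", datetime(2016, 1, 15, 15, 38, 2), 123),
    ("138XXXX0001", "139XXXX0004", datetime(2016, 1, 16, 20, 0, 0), -1),  # bad value
]
features = caller_features(clean(raw))
```

The resulting per-caller dictionaries are the kind of feature vectors that part 3) would pass to the machine learning model.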
Defrauded-user reminder system: informs the targeted users in the CDRs of fraud calls that a call they received was a fraud call, so that they are not deceived; it also submits the users' feedback on whether the call was in fact a fraud call to the offline training system.
2. Offline training system
When the error rate of the machine learning model, as fed back by the defrauded-user reminder system, is found to be higher than the threshold, the offline training system extracts the features of the relevant historical CDRs, retrains the machine learning model, and adjusts the Bayesian classifier (other machine learning algorithms can also be used here, such as an SVM classifier, logistic regression, or deep learning). The offline training system is divided into three main parts:
a) Historical CDR extraction: extract the historical CDRs of the most recent period, in particular the CDRs associated with results that the feedback showed to be wrong.
b) Feature extraction: extract features from the historical CDRs to provide data for the subsequent model retraining.
c) Model retraining: use the features extracted in b) to train the Bayesian classifier and obtain new parameters, then update the online identification system with the trained machine learning model.
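Part c) can be illustrated with a small Gaussian naive Bayes classifier written from scratch. The source only says "Bayesian classifier", so the Gaussian variant, the two features used (average call duration and calls per hour), the variance smoothing constant, and all sample values are assumptions made for this sketch.

```python
import math

def train_gaussian_nb(samples, labels):
    """Fit a prior plus per-feature mean and variance for each class."""
    model = {}
    for cls in set(labels):
        rows = [x for x, y in zip(samples, labels) if y == cls]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-6  # smoothing avoids zero variance
            for col, m in zip(zip(*rows), means)
        ]
        model[cls] = (n / len(samples), means, variances)
    return model

def predict(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def log_posterior(cls):
        prior, means, variances = model[cls]
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

# Each sample is [avg_duration_s, calls_per_hour]; values are made up.
samples = [[12, 40], [15, 35], [9, 50], [120, 2], [200, 1], [90, 3]]
labels = ["fraud", "fraud", "fraud", "normal", "normal", "normal"]
model = train_gaussian_nb(samples, labels)

# Retraining: add a number that users reported as safe, then refit.
samples.append([30, 20])
labels.append("normal")
retrained = train_gaussian_nb(samples, labels)
```

Refitting on the enlarged sample set yields the "new parameters" (priors, means, variances) that would be pushed to the online identification system.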
In this way, the online identification system and the offline training system form a complete closed loop: based on the results of the voice return visits, the offline training system decides whether to retrain and update the fraud number identification model in the online identification system.
The beneficial effects of the communication number processing apparatus provided in this embodiment are: 1) no user tagging information is needed, only CDR records; 2) fraud numbers are identified faster and more accurately; 3) fraud numbers can be identified more accurately, so that the operator can identify fraud calls while the user's call is in progress.
Embodiment 11
Corresponding to the foregoing embodiments, this embodiment further describes a communication number processing apparatus, which can be used to perform the communication number processing method of the embodiments of this application. The communication number processing apparatus can be implemented in various ways. For example, all components of the apparatus can be implemented in user equipment such as a smartphone, a landline telephone, a tablet computer, a notebook computer, or a wearable device (such as smart glasses or a smart watch); or all components of the apparatus can be implemented in network equipment such as an enterprise gateway or an operator gateway; or the components of the apparatus can be implemented in a coupled manner across the user equipment side and the network side. Alternatively, the communication number processing apparatus can be the client or the background server of a user application; for example, when the user application is Tencent Mobile Manager (腾讯手机管家), the corresponding communication number processing apparatus can be the client or the background server of Tencent Mobile Manager. Referring to FIG. 13, the communication number processing apparatus includes:
an obtaining module 1301, configured to acquire, from the communication service device, the CDRs of a preset quantity of communication numbers within a first preset time;
a pre-processing module 1302, configured to parse the CDRs to obtain the types of communication information they include, extract at least one type of communication information for each communication number in the CDRs, and combine the extracted information to form the pre-processed CDR;
a parsing module 1303, configured to parse the at least one type of communication information of each communication number in the pre-processed CDR to obtain the features of the corresponding type of communication information of each communication number;
an extraction module 1304, configured to extract, from the communication numbers included in the pre-processed CDR, the target communication numbers that match the preset feature.
Compared with the prior art, which identifies numbers only after collecting user tagging information, this embodiment parses the CDRs of the communication numbers to obtain the features of the corresponding type of communication information of each communication number, and, based on those features, identifies from among the communication numbers the target communication numbers that match the preset feature. On the one hand, the CDRs of communication numbers are generally generated and maintained by the operator without any participation by individual users, so they can be acquired quickly and efficiently. On the other hand, the CDRs of communication numbers are objective data maintained by the operator and therefore reflect, truthfully and completely, all of a user's communication records within a given time interval. The technical solution provided by the embodiments of this application, which takes the CDRs of communication numbers as its processing basis, can thus improve the speed and accuracy of number identification.
On the basis of the foregoing embodiment, the pre-processing module 1302 is specifically configured to:
parse the CDRs to obtain at least one of the following types of communication information included in the CDRs: the communication initiation number, the communication response number corresponding to the communication initiation number, the communication start time, and the communication duration;
extract the at least one type of communication information associated with each communication initiation number in the CDRs to form the communication records of each communication initiation number;
combine the extracted communication records of the communication initiation numbers to form the pre-processed CDR.
On the basis of the foregoing embodiment, the parsing module 1303 is specifically configured to: calculate, for each communication initiation number in the pre-processed CDR, its edit distance to the yellow-page numbers; and obtain, based on the edit distance, the similarity between each communication initiation number in the pre-processed CDR and the yellow-page numbers;
the extraction module 1304 is specifically configured to: extract, from the communication initiation numbers included in the pre-processed CDR, the communication initiation numbers whose similarity to a yellow-page number is greater than the first threshold; or rank the communication initiation numbers included in the pre-processed CDR by their similarity to the yellow-page numbers and extract the first ratio of communication initiation numbers with the highest similarity.
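The edit-distance similarity and the first-threshold rule can be sketched as follows. This sketch uses the standard Levenshtein distance (insertion, deletion, substitution) and does not count a "move" as a separate operation; the window-based comparison against substrings, the normalization to the range 0 to 1, and the sample numbers are assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]

def similarity(caller, yellow):
    """Best similarity in [0, 1] between yellow and any window of caller
    that has the same length as yellow (assumes yellow is non-empty)."""
    n = len(yellow)
    windows = [caller[i:i + n] for i in range(max(1, len(caller) - n + 1))]
    best = min(edit_distance(w, yellow) for w in windows)
    return 1.0 - best / n

def suspicious_callers(callers, yellow_pages, first_threshold):
    """Keep callers whose best yellow-page similarity exceeds the threshold."""
    return [
        c for c in callers
        if max(similarity(c, y) for y in yellow_pages) > first_threshold
    ]

yellow_pages = ["10010", "10086"]
callers = ["0810010", "13800000001"]
flagged = suspicious_callers(callers, yellow_pages, 0.9)
```

Here the first caller embeds the yellow-page number 10010 as a substring and is flagged, while the ordinary mobile number is not.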
On the basis of the foregoing embodiment, the parsing module 1303 is specifically configured to: extract the communication start times at which each communication number in the preprocessed CDR acted as a communication initiation number; and compute the number of communications per unit time for each communication initiation number in the preprocessed CDR;
the extraction module 1304 is specifically configured to: extract, from the communication initiation numbers included in the preprocessed CDR, those whose number of communications per unit time is greater than a second threshold; or, based on a ranking of the communication initiation numbers included in the preprocessed CDR by their number of communications per unit time, extract the second proportion of communication initiation numbers with the highest number of communications.
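The frequency feature can be sketched as below. The text leaves the unit of time and the aggregation open; this example assumes a one-hour unit and uses the busiest hour as the per-number value, and the data and threshold are invented for illustration.

```python
from collections import Counter
from datetime import datetime

def peak_calls_per_hour(start_times):
    """Largest number of calls that start within any single clock hour."""
    buckets = Counter(t.replace(minute=0, second=0, microsecond=0) for t in start_times)
    return max(buckets.values(), default=0)

def high_frequency_numbers(start_times_by_number, second_threshold):
    """Keep initiation numbers whose peak hourly call count exceeds the second threshold."""
    return [number for number, times in start_times_by_number.items()
            if peak_calls_per_hour(times) > second_threshold]

# Hypothetical start times taken from a preprocessed CDR.
start_times_by_number = {
    "A": [datetime(2016, 4, 1, 9, 5), datetime(2016, 4, 1, 9, 20),
          datetime(2016, 4, 1, 9, 59), datetime(2016, 4, 1, 10, 1)],
    "B": [datetime(2016, 4, 1, 9, 0), datetime(2016, 4, 1, 13, 0)],
}
frequent = high_frequency_numbers(start_times_by_number, second_threshold=2)
```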
On the basis of the foregoing embodiment, the parsing module 1303 is specifically configured to: extract the communication durations for which each communication number in the preprocessed CDR acted as a communication initiation number; and compute the average communication duration of each communication initiation number in the preprocessed CDR;
the extraction module 1304 is specifically configured to: extract, from the communication initiation numbers included in the preprocessed CDR, those whose average communication duration is greater than a third threshold; or, based on a ranking of the communication initiation numbers included in the preprocessed CDR by average communication duration, extract the third proportion of communication initiation numbers with the highest average communication duration.
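The ranking variant of extraction can be sketched as follows, here applied to average communication duration. How a proportion is rounded to a whole count is not specified in the text; rounding up with `math.ceil` is an assumption of this example, as are the sample durations.

```python
import math

def average_duration(durations):
    """Mean communication duration in seconds; 0.0 when there are no records."""
    return sum(durations) / len(durations) if durations else 0.0

def top_proportion(scores, proportion):
    """Rank numbers by score, highest first, and keep the leading proportion of them."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    keep = math.ceil(len(ranked) * proportion)
    return ranked[:keep]

# Hypothetical per-number call durations in seconds.
durations_by_number = {"a": [10, 10], "b": [20, 40], "c": [20], "d": [5]}
averages = {n: average_duration(d) for n, d in durations_by_number.items()}
longest_talkers = top_proportion(averages, proportion=0.5)
```

The same `top_proportion` helper applies unchanged to the similarity, frequency, and home-location features, since each reduces to one score per initiation number.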
On the basis of the foregoing embodiment, the parsing module 1303 is specifically configured to: obtain the home location of each communication response number that corresponds to a communication number in the preprocessed CDR acting as a communication initiation number; and compute, for each communication initiation number in the preprocessed CDR, the number of distinct home locations among its corresponding communication response numbers;
the extraction module 1304 is specifically configured to: extract, from the communication initiation numbers included in the preprocessed CDR, those for which the number of distinct home locations of the corresponding communication response numbers is greater than a fourth threshold; or, based on a ranking of the communication initiation numbers included in the preprocessed CDR by the number of distinct home locations of their corresponding communication response numbers, extract the fourth proportion of communication initiation numbers with the highest number of distinct home locations.
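A sketch of the home-location feature follows. The text does not describe how a home location is looked up; the prefix table `AREA_BY_PREFIX` and the helper `home_location` are hypothetical stand-ins for whatever lookup an operator actually provides.

```python
# Hypothetical prefix-to-location table used only for this example.
AREA_BY_PREFIX = {"010": "Beijing", "021": "Shanghai", "0755": "Shenzhen"}

def home_location(number, table=AREA_BY_PREFIX):
    """Return the location of the longest matching prefix, or "unknown"."""
    for prefix in sorted(table, key=len, reverse=True):
        if number.startswith(prefix):
            return table[prefix]
    return "unknown"

def distinct_home_locations(response_numbers):
    """Count the distinct home locations among one initiation number's callees."""
    return len({home_location(n) for n in response_numbers})

def widely_spread_numbers(callees_by_number, fourth_threshold):
    """Keep initiation numbers whose callees span more locations than the threshold."""
    return [n for n, callees in callees_by_number.items()
            if distinct_home_locations(callees) > fourth_threshold]

callees_by_number = {
    "X": ["01012345678", "02187654321", "075512345678", "01099999999"],
    "Y": ["01012345678", "01055555555"],
}
spread = widely_spread_numbers(callees_by_number, fourth_threshold=2)
```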
On the basis of the foregoing embodiment, the extraction module 1304 is specifically configured to: use a machine learning model to analyze the features of the corresponding types of communication information of each communication number in the preprocessed CDR, and extract, from the communication numbers included in the preprocessed CDR, the target communication numbers that match the preset features.
Embodiment 12
This embodiment is based on Embodiment 11. Referring to FIG. 14, the communication number processing apparatus described in this embodiment also includes the acquisition module 1301, the preprocessing module 1302, the parsing module 1303, and the extraction module 1304 shown in FIG. 13, and these functional modules serve the same functions as described in Embodiment 11. In addition, the communication number processing apparatus described in this embodiment further includes:
a training module 1305, configured to: receive feedback information from the user side regarding a target communication number and determine whether the target communication number is a safe number; determine the error rate of the machine learning model based on the number of identified target communication numbers that the user side has reported as safe numbers; and, when the error rate of the machine learning model is greater than a fifth threshold, retrain the machine learning model based on the communication records of the safe numbers in the preprocessed CDR.
Further, the training module 1305 is specifically configured to: parse at least one type of communication information in the communication records of the safe numbers in the preprocessed CDR to obtain the features of the at least one type of communication information of the safe numbers; and update, based on those features, the thresholds that the machine learning model uses to identify target communication numbers.
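The feedback loop of the training module can be sketched as below. The text says only that the error rate is determined from the number of flagged numbers reported as safe; dividing by the total number of flagged numbers is an assumption, as is the update rule that raises a threshold so that no reported safe number would still exceed it.

```python
def error_rate(flagged_numbers, reported_safe):
    """Assumed definition: share of flagged numbers that users reported as safe."""
    flagged = set(flagged_numbers)
    if not flagged:
        return 0.0
    return len(flagged & set(reported_safe)) / len(flagged)

def updated_threshold(current_threshold, safe_feature_values):
    """Assumed update rule: lift the threshold to the largest value seen on a safe number."""
    return max([current_threshold, *safe_feature_values])

def maybe_retrain(flagged_numbers, reported_safe, fifth_threshold,
                  current_threshold, feature_by_number):
    """Return a new threshold if the error rate exceeds the fifth threshold, else the old one."""
    if error_rate(flagged_numbers, reported_safe) <= fifth_threshold:
        return current_threshold
    safe_values = [feature_by_number[n] for n in set(flagged_numbers) & set(reported_safe)]
    return updated_threshold(current_threshold, safe_values)

# Hypothetical feature: peak calls per hour for four flagged numbers.
calls_per_hour = {"a": 25, "b": 35, "c": 60, "d": 80}
new_threshold = maybe_retrain(["a", "b", "c", "d"], {"a", "b"}, fifth_threshold=0.3,
                              current_threshold=20, feature_by_number=calls_per_hour)
```

In this example half of the flagged numbers were reported as safe, so the threshold moves from 20 to 35 and only the two remaining numbers would still be flagged.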
On the basis of the foregoing embodiment, the apparatus further includes:
a response module 1306, configured to: determine the degree to which the features of the corresponding types of communication information of a target communication number match the preset features; determine the danger level of the target communication number according to that degree of matching; and respond to the communication behavior of the target communication number based on its danger level.
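One way to picture the response module is the sketch below. The definition of the matching degree (the fraction of preset feature checks satisfied), the level cut-offs, and the response names in `RESPONSES` are all assumptions for illustration; the text here does not fix any of them.

```python
def match_degree(features, preset_checks):
    """Fraction of preset feature checks that the number's features satisfy."""
    if not preset_checks:
        return 0.0
    hits = sum(1 for name, check in preset_checks.items()
               if name in features and check(features[name]))
    return hits / len(preset_checks)

def danger_level(degree):
    """Map a matching degree to a level using hypothetical cut-offs."""
    if degree >= 0.75:
        return "high"
    if degree >= 0.5:
        return "medium"
    return "low"

# Hypothetical responses per danger level.
RESPONSES = {"high": "block", "medium": "warn_callee", "low": "log_only"}

# Hypothetical preset features, one check per feature type described above.
preset_checks = {
    "calls_per_hour": lambda v: v > 20,
    "distinct_locations": lambda v: v > 10,
    "yellow_pages_similarity": lambda v: v > 0.8,
    "avg_duration_s": lambda v: v > 300,
}
features = {"calls_per_hour": 35, "distinct_locations": 14,
            "yellow_pages_similarity": 0.9, "avg_duration_s": 40}
level = danger_level(match_degree(features, preset_checks))
action = RESPONSES[level]
```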
In practical applications, the acquisition module 1301, the preprocessing module 1302, the parsing module 1303, the extraction module 1304, the training module 1305, and the response module 1306 may each be implemented by a central processing unit (CPU), a microprocessor unit (MPU), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or the like located in the communication number processing apparatus.
Embodiment 13
This embodiment describes a computer-readable medium, which may be a ROM (for example, read-only memory, FLASH memory, a transfer device, etc.), a magnetic storage medium (for example, magnetic tape, a disk drive, etc.), an optical storage medium (for example, CD-ROM, DVD-ROM, paper card, paper tape, etc.), or another well-known type of program memory. The computer-readable medium stores computer-executable instructions (for example, the binary executable instructions of a projection application such as Tencent Video) that, when executed, cause at least one processor to perform operations including the following:
acquiring, from a communication service device, the CDR of a preset number of communication numbers within a first preset time;
parsing the CDR to obtain the types of communication information included in the CDR, extracting at least one type of communication information for each communication number in the CDR, and combining the extracted information to form a preprocessed CDR;
parsing the at least one type of communication information of each communication number in the preprocessed CDR to obtain the features of the corresponding type of communication information of each communication number in the preprocessed CDR;
extracting, from the communication numbers included in the preprocessed CDR, the target communication numbers that match the preset features.
In summary, the communication number processing apparatus parses the CDRs of communication numbers to obtain the features of the corresponding types of communication information and, based on those features, identifies among the communication numbers the target communication numbers that match preset features. On the one hand, because CDRs are generally generated and maintained by the operator and do not require the participation of individual users, they can be obtained quickly and efficiently. On the other hand, because CDRs are objective data maintained by the operator, they truly and completely reflect all of a user's communication records within a given time interval. The technical solution provided by the embodiments of the present application therefore takes the CDRs of communication numbers as its processing basis and can improve the speed and accuracy of number identification.
Those skilled in the art will appreciate that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage and optical storage) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and any combination of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The foregoing describes merely preferred embodiments of the present application and is not intended to limit the scope of protection of the present application.

Claims (20)

  1. A communication number processing method, characterized in that the method comprises:
    acquiring, from a communication service device, the CDR of a preset number of communication numbers within a first preset time;
    parsing the CDR to obtain the types of communication information included in the CDR, extracting at least one type of communication information for each communication number in the CDR, and combining the extracted information to form a preprocessed CDR;
    parsing the at least one type of communication information of each communication number in the preprocessed CDR to obtain the features of the corresponding type of communication information of each communication number in the preprocessed CDR; and
    extracting, from the communication numbers included in the preprocessed CDR, a target communication number that matches a preset feature.
  2. The method according to claim 1, characterized in that the parsing the CDR to obtain the types of communication information included in the CDR, extracting at least one type of communication information for each communication number in the CDR, and combining the extracted information to form a preprocessed CDR comprises:
    parsing the CDR to obtain at least one of the following types of communication information included in the CDR: a communication initiation number, a communication response number corresponding to the communication initiation number, a communication start time, and a communication duration;
    extracting at least one type of communication information associated with each communication initiation number in the CDR to form a communication record for each communication initiation number; and
    combining the extracted communication records of the communication initiation numbers to form the preprocessed CDR.
  3. The method according to claim 1, characterized in that the parsing the at least one type of communication information of each communication number in the preprocessed CDR to obtain the features of the corresponding type of communication information of each communication number in the preprocessed CDR comprises:
    computing the edit distance between each communication initiation number in the preprocessed CDR and a yellow-pages number; and
    obtaining, based on the edit distance, the similarity between each communication initiation number in the preprocessed CDR and the yellow-pages number;
    and the extracting, from the communication numbers included in the preprocessed CDR, a target communication number that matches a preset feature comprises:
    extracting, from the communication initiation numbers included in the preprocessed CDR, those whose similarity to the yellow-pages number is greater than a first threshold;
    or, based on a ranking of the communication initiation numbers included in the preprocessed CDR by their similarity to the yellow-pages number, extracting the first proportion of communication initiation numbers with the highest similarity,
    wherein the edit distance represents the number of operations required to change the yellow-pages number into the communication initiation number.
  4. The method according to claim 1, characterized in that the parsing the at least one type of communication information of each communication number in the preprocessed CDR to obtain the features of the corresponding type of communication information of each communication number in the preprocessed CDR comprises:
    extracting the communication start times at which each communication number in the preprocessed CDR acted as a communication initiation number; and
    computing the number of communications per unit time for each communication initiation number in the preprocessed CDR;
    and the extracting, from the communication numbers included in the preprocessed CDR, a target communication number that matches a preset feature comprises:
    extracting, from the communication initiation numbers included in the preprocessed CDR, those whose number of communications per unit time is greater than a second threshold;
    or, based on a ranking of the communication initiation numbers included in the preprocessed CDR by their number of communications per unit time, extracting the second proportion of communication initiation numbers with the highest number of communications.
  5. The method according to claim 1, characterized in that the parsing the at least one type of communication information of each communication number in the preprocessed CDR to obtain the features of the corresponding type of communication information of each communication number in the preprocessed CDR comprises:
    extracting the communication durations for which each communication number in the preprocessed CDR acted as a communication initiation number; and
    computing the average communication duration of each communication initiation number in the preprocessed CDR;
    and the extracting, from the communication numbers included in the preprocessed CDR, a target communication number that matches a preset feature comprises:
    extracting, from the communication initiation numbers included in the preprocessed CDR, those whose average communication duration is greater than a third threshold;
    or, based on a ranking of the communication initiation numbers included in the preprocessed CDR by average communication duration, extracting the third proportion of communication initiation numbers with the highest average communication duration.
  6. The method according to claim 1, characterized in that the parsing the at least one type of communication information of each communication number in the preprocessed CDR to obtain the features of the corresponding type of communication information of each communication number in the preprocessed CDR comprises:
    obtaining the home location of each communication response number that corresponds to a communication number in the preprocessed CDR acting as a communication initiation number; and
    computing, for each communication initiation number in the preprocessed CDR, the number of distinct home locations among its corresponding communication response numbers;
    and the extracting, from the communication numbers included in the preprocessed CDR, a target communication number that matches a preset feature comprises:
    extracting, from the communication initiation numbers included in the preprocessed CDR, those for which the number of distinct home locations of the corresponding communication response numbers is greater than a fourth threshold;
    or, based on a ranking of the communication initiation numbers included in the preprocessed CDR by the number of distinct home locations of their corresponding communication response numbers, extracting the fourth proportion of communication initiation numbers with the highest number of distinct home locations.
  7. The method according to claim 1, characterized in that the extracting, from the communication numbers included in the preprocessed CDR, a target communication number that matches a preset feature comprises:
    using a machine learning model to analyze the features of the corresponding type of communication information of each communication number in the preprocessed CDR, and extracting, from the communication numbers included in the preprocessed CDR, the target communication number that matches the preset feature.
  8. The method according to claim 7, characterized in that the method further comprises:
    receiving feedback information from the user side regarding the target communication number, and determining whether the target communication number is a safe number;
    determining the error rate of the machine learning model based on the number of identified target communication numbers that the user side has reported as safe numbers; and
    when the error rate of the machine learning model is greater than a fifth threshold, retraining the machine learning model based on the communication records of the safe numbers in the preprocessed CDR.
  9. The method according to claim 8, characterized in that the retraining the machine learning model based on the communication records of the safe numbers in the preprocessed CDR comprises:
    parsing at least one type of communication information in the communication records of the safe numbers in the preprocessed CDR to obtain the features of the at least one type of communication information of the safe numbers; and
    updating, based on the features of the at least one type of communication information of the safe numbers, the threshold that the machine learning model uses to identify the target communication number.
  10. The method according to claim 1, characterized in that, after the extracting, from the communication numbers included in the preprocessed CDR, a target communication number that matches a preset feature, the method further comprises:
    determining the degree to which the features of the corresponding type of communication information of the target communication number match the preset feature;
    determining the danger level of the target communication number according to that degree of matching; and
    responding to the communication behavior of the target communication number based on the danger level of the target communication number.
  11. A communication number processing apparatus, characterized in that the apparatus comprises:
    an acquisition module, configured to acquire, from a communication service device, the CDR of a preset number of communication numbers within a first preset time;
    a preprocessing module, configured to parse the CDR to obtain the types of communication information included in the CDR, extract at least one type of communication information for each communication number in the CDR, and combine the extracted information to form a preprocessed CDR;
    a parsing module, configured to parse the at least one type of communication information of each communication number in the preprocessed CDR to obtain the features of the corresponding type of communication information of each communication number in the preprocessed CDR; and
    an extraction module, configured to extract, from the communication numbers included in the preprocessed CDR, a target communication number that matches a preset feature.
  12. The apparatus according to claim 11, characterized in that
    the preprocessing module is specifically configured to:
    parse the CDR to obtain at least one of the following types of communication information included in the CDR: a communication initiation number, a communication response number corresponding to the communication initiation number, a communication start time, and a communication duration;
    extract at least one type of communication information associated with each communication initiation number in the CDR to form a communication record for each communication initiation number; and
    combine the extracted communication records of the communication initiation numbers to form the preprocessed CDR.
  13. The apparatus according to claim 11, characterized in that
    the parsing module is specifically configured to:
    compute the edit distance between each communication initiation number in the preprocessed CDR and a yellow-pages number; and
    obtain, based on the edit distance, the similarity between each communication initiation number in the preprocessed CDR and the yellow-pages number;
    the extraction module is specifically configured to:
    extract, from the communication initiation numbers included in the preprocessed CDR, those whose similarity to the yellow-pages number is greater than a first threshold; or
    based on a ranking of the communication initiation numbers included in the preprocessed CDR by their similarity to the yellow-pages number, extract the first proportion of communication initiation numbers with the highest similarity,
    wherein the edit distance represents the number of operations required to change the yellow-pages number into the communication initiation number.
  14. The apparatus according to claim 11, characterized in that
    the parsing module is specifically configured to:
    extract the communication start times at which each communication number in the preprocessed CDR acted as a communication initiation number; and
    compute the number of communications per unit time for each communication initiation number in the preprocessed CDR;
    the extraction module is specifically configured to:
    extract, from the communication initiation numbers included in the preprocessed CDR, those whose number of communications per unit time is greater than a second threshold; or
    based on a ranking of the communication initiation numbers included in the preprocessed CDR by their number of communications per unit time, extract the second proportion of communication initiation numbers with the highest number of communications.
  15. The apparatus according to claim 11, characterized in that
    the parsing module is specifically configured to:
    extract the communication durations for which each communication number in the preprocessed CDR acted as a communication initiation number; and
    compute the average communication duration of each communication initiation number in the preprocessed CDR;
    the extraction module is specifically configured to:
    extract, from the communication initiation numbers included in the preprocessed CDR, those whose average communication duration is greater than a third threshold; or
    based on a ranking of the communication initiation numbers included in the preprocessed CDR by average communication duration, extract the third proportion of communication initiation numbers with the highest average communication duration.
  16. The apparatus according to claim 11, characterized in that
    the parsing module is specifically configured to:
    obtain the home location of each communication response number that corresponds to a communication number in the preprocessed CDR acting as a communication initiation number; and
    compute, for each communication initiation number in the preprocessed CDR, the number of distinct home locations among its corresponding communication response numbers;
    the extraction module is specifically configured to:
    extract, from the communication initiation numbers included in the preprocessed CDR, those for which the number of distinct home locations of the corresponding communication response numbers is greater than a fourth threshold; or
    based on a ranking of the communication initiation numbers included in the preprocessed CDR by the number of distinct home locations of their corresponding communication response numbers, extract the fourth proportion of communication initiation numbers with the highest number of distinct home locations.
  17. The apparatus according to claim 11, wherein
    the extraction module is specifically configured to: use a machine learning model to analyze the features of the corresponding types of communication information of each communication number in the pre-processed CDR, and extract, from the communication numbers included in the pre-processed CDR, target communication numbers that match a preset feature.
  18. The apparatus according to claim 17, wherein the apparatus further comprises:
    a training module, configured to:
    receive feedback information from the user side for a target communication number, and determine whether the target communication number is a safe number;
    determine an error rate of the machine learning model based on the number of identified target communication numbers that the user side has reported as safe numbers; and
    when the error rate of the machine learning model is greater than a fifth threshold, retrain the machine learning model based on the communication records of the safe numbers in the pre-processed CDR.
  19. The apparatus according to claim 18, wherein
    the training module is specifically configured to:
    parse at least one type of communication information from the communication records of the safe numbers in the pre-processed CDR, to obtain the features of the at least one type of communication information of the safe numbers; and
    update, based on the features of the at least one type of communication information of the safe numbers, the thresholds that the machine learning model uses to identify target communication numbers.
  20. The apparatus according to claim 11, wherein the apparatus further comprises:
    a response module, configured to: determine the degree of matching between the features of the corresponding type of communication information of the target communication number and the preset feature; determine a danger level of the target communication number according to that degree of matching; and respond to the communication behavior of the target communication number based on the danger level of the target communication number.
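The extraction, training, and response steps recited in claims 15 to 20 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The CDR row layout, all function names, the definition of the error rate as a ratio of flagged numbers later reported safe, and the danger-level cut-offs are assumptions made only for illustration; the claims do not fix any of them.

```python
from collections import defaultdict
from math import ceil

# Assumed CDR row layout (hypothetical):
# (initiation_number, response_number, response_home_location, duration_seconds)

def caller_features(cdr):
    """Per initiation number: average call duration and count of distinct
    home locations among the response numbers it contacted."""
    durations = defaultdict(list)
    homes = defaultdict(set)
    for caller, _callee, home, seconds in cdr:
        durations[caller].append(seconds)
        homes[caller].add(home)
    avg_duration = {c: sum(d) / len(d) for c, d in durations.items()}
    home_count = {c: len(h) for c, h in homes.items()}
    return avg_duration, home_count

def extract(scores, threshold=None, proportion=None):
    """Claims 15 and 16 offer two alternatives: keep numbers whose score
    exceeds a threshold, or keep the top `proportion` after ranking."""
    if threshold is not None:
        return {n for n, s in scores.items() if s > threshold}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:ceil(len(ranked) * proportion)])

def error_rate(targets, reported_safe):
    """Assumed definition: share of flagged numbers that users reported safe."""
    return len(targets & reported_safe) / len(targets) if targets else 0.0

def needs_retraining(rate, fifth_threshold):
    """Claim 18: retrain when the error rate exceeds the fifth threshold."""
    return rate > fifth_threshold

def danger_level(match_degree):
    """Claim 20: map a matching degree in [0, 1] to a level.
    The 0.8 and 0.5 cut-offs are illustrative assumptions."""
    if match_degree >= 0.8:
        return "high"
    if match_degree >= 0.5:
        return "medium"
    return "low"
```

In this sketch the same `extract` helper serves both the average-duration criterion of claim 15 and the distinct-home-location criterion of claim 16, since the two claims differ only in which per-number score is ranked or thresholded.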
PCT/CN2017/081813 2016-04-25 2017-04-25 Communication number processing method and apparatus WO2017186090A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610261923.1 2016-04-25
CN201610261923.1A CN107306306B (en) 2016-04-25 2016-04-25 Communication number processing method and device

Publications (1)

Publication Number Publication Date
WO2017186090A1 (en) 2017-11-02

Family

ID=60150219

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/081813 WO2017186090A1 (en) 2016-04-25 2017-04-25 Communication number processing method and apparatus

Country Status (2)

Country Link
CN (1) CN107306306B (en)
WO (1) WO2017186090A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112887491A (en) * 2019-11-29 2021-06-01 中国电信股份有限公司 User missing information acquisition method and device
CN113206909A (en) * 2021-04-30 2021-08-03 中国银行股份有限公司 Crank call interception method and device
CN114745211A (en) * 2022-04-26 2022-07-12 贵阳朗玛通信科技有限公司 Method and device based on ticket data fast matching strategy

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108124065A (en) * 2017-12-05 2018-06-05 浙江鹏信信息科技股份有限公司 A kind of method junk call content being identified with disposal
CN109963276A (en) * 2017-12-26 2019-07-02 恒为科技(上海)股份有限公司 A kind of call bill data processing method and processing device
CN108391223B (en) * 2018-02-12 2020-08-11 中国联合网络通信集团有限公司 Method and device for determining lost user
CN110401779B (en) * 2018-04-24 2022-02-01 中国移动通信集团有限公司 Method and device for identifying telephone number and computer readable storage medium
CN109474755B (en) * 2018-10-30 2020-10-30 济南大学 Abnormal telephone active prediction method, system and computer readable storage medium based on sequencing learning and ensemble learning
CN110087230B (en) * 2019-04-26 2020-09-15 同盾控股有限公司 Data processing method, data processing device, storage medium and electronic equipment
CN111031546B (en) * 2019-11-29 2023-09-19 武汉烽火众智数字技术有限责任公司 LR model training method applied to telephone number analysis and application method
CN111131627B (en) * 2019-12-20 2021-12-07 珠海高凌信息科技股份有限公司 Method, device and readable medium for detecting personal harmful call based on streaming data atlas
CN113596260B (en) * 2020-04-30 2022-12-16 中国移动通信集团广东有限公司 Abnormal telephone number detection method and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101217820A (en) * 2008-01-15 2008-07-09 中兴通讯股份有限公司 An identification system and identification method on disturbance telephone numbers
CN101426203A (en) * 2007-11-02 2009-05-06 华为技术有限公司 Method and equipment for recognizing vicious disturbance call
EP2278783A1 (en) * 2009-06-26 2011-01-26 Vodafone Holding GmbH Device and method for recognising desired and/or undesired telephone calls depending on the usage habits of a telephone user
CN102892117A (en) * 2012-09-11 2013-01-23 北京中创信测科技股份有限公司 Method and system for monitoring crank call
CN105451234A (en) * 2015-11-09 2016-03-30 北京市天元网络技术股份有限公司 Signaling interactive data-based suspicious number analyzing method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101426203A (en) * 2007-11-02 2009-05-06 华为技术有限公司 Method and equipment for recognizing vicious disturbance call
CN101217820A (en) * 2008-01-15 2008-07-09 中兴通讯股份有限公司 An identification system and identification method on disturbance telephone numbers
EP2278783A1 (en) * 2009-06-26 2011-01-26 Vodafone Holding GmbH Device and method for recognising desired and/or undesired telephone calls depending on the usage habits of a telephone user
CN102892117A (en) * 2012-09-11 2013-01-23 北京中创信测科技股份有限公司 Method and system for monitoring crank call
CN105451234A (en) * 2015-11-09 2016-03-30 北京市天元网络技术股份有限公司 Signaling interactive data-based suspicious number analyzing method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112887491A (en) * 2019-11-29 2021-06-01 中国电信股份有限公司 User missing information acquisition method and device
CN112887491B (en) * 2019-11-29 2023-03-21 中国电信股份有限公司 User missing information acquisition method and device
CN113206909A (en) * 2021-04-30 2021-08-03 中国银行股份有限公司 Crank call interception method and device
CN114745211A (en) * 2022-04-26 2022-07-12 贵阳朗玛通信科技有限公司 Method and device based on ticket data fast matching strategy

Also Published As

Publication number Publication date
CN107306306B (en) 2020-04-07
CN107306306A (en) 2017-10-31

Similar Documents

Publication Publication Date Title
WO2017186090A1 (en) Communication number processing method and apparatus
CN109429230B (en) Communication fraud identification method and system
CN109600752B (en) Deep clustering fraud detection method and device
CN107566358B (en) Risk early warning prompting method, device, medium and equipment
CN108243049B (en) Telecommunication fraud identification method and device
CN109995929B (en) Operation and account information processing method and device
CN104883671B (en) A kind of judgment method and system of refuse messages
CN105045911B (en) Label generating method and equipment for user to mark
CN102438205B (en) Method and system for pushing service based on action of mobile user
CN110611929A (en) Abnormal user identification method and device
WO2017035945A1 (en) Method, device and system for marking conversation caller
US20230209351A1 (en) Assessing risk of fraud associated with user unique identifier using telecommunications data
CN113206909A (en) Crank call interception method and device
CN110113748B (en) Crank call monitoring method and device
CN109474755B (en) Abnormal telephone active prediction method, system and computer readable storage medium based on sequencing learning and ensemble learning
KR20170006158A (en) System and method for detecting fraud usage of message
CN107172622A (en) The identification of pseudo-base station note and analysis method, apparatus and system
CN106791230B (en) Telephone number identification method and device
CN117252429A (en) Risk user identification method and device, storage medium and electronic equipment
CN110705926A (en) Method, device and system for acquiring logistics object distribution information
CN111131627B (en) Method, device and readable medium for detecting personal harmful call based on streaming data atlas
CN109711984B (en) Pre-loan risk monitoring method and device based on collection urging
CN111062422A (en) Method and device for systematic identification of road loan
CN111464687A (en) Strange call request processing method and device
CN116418915A (en) Abnormal number identification method, device, server and storage medium

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17788742

Country of ref document: EP

Kind code of ref document: A1

122 Ep: pct application non-entry in european phase

Ref document number: 17788742

Country of ref document: EP

Kind code of ref document: A1