WO2017186090A1 - Communication number processing method and apparatus - Google Patents
Communication number processing method and apparatus
- Publication number
- WO2017186090A1 (PCT/CN2017/081813)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- communication
- processed
- initiation
- cdr
- bill
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M1/00—Substation equipment, e.g. for use by subscribers
- H04M1/66—Substation equipment, e.g. for use by subscribers with means for preventing unauthorised or fraudulent calling
- H04M1/663—Preventing unauthorised calls to a telephone set
Definitions
- The present application relates to data processing technologies in the field of communications, and in particular to a communication number processing method and apparatus.
- Telecommunications fraud refers to the criminal act of fabricating false information, setting up scams, defrauding victims remotely and without contact, and inducing victims to pay or transfer money to the criminals by telephone, Internet, or SMS.
- The country's public security organs recorded a total of 590,000 telecom fraud cases, a year-on-year increase of 32.5%, with economic losses of 22.2 billion yuan; behind each case there may be a family broken by fraud.
- The prior art collects users' tag information through application software (an app) on the mobile phone. If a number is marked as fraudulent by multiple users, it is considered a fraudulent number, and a user who is in a call with that number is alerted to be vigilant to avoid being scammed.
- The prior art needs to collect user tag information.
- The probability that a user marks a number is relatively low: many users do not mark the type of a number when they receive a call from an unfamiliar number. The prior art can treat a number as fraudulent only after enough user tags have been collected, so it identifies fraudulent numbers slowly and inefficiently.
- Marking a number is also a subjective act: when users receive harassing calls, such as advertisements and other malicious calls, they often mark the harassing number as a fraudulent number. The identification accuracy of the prior art is therefore low.
- The embodiments of the present application are expected to provide a communication number processing method and apparatus that can improve the speed and accuracy of number identification.
- An embodiment of the present application provides a communication number processing method, where the method includes:
- Parsing the CDR to obtain the types of communication information included in the CDR, extracting at least one type of communication information of each communication number in the CDR, and combining the extracted information to form a pre-processed CDR includes:
- combining the extracted communication records of the respective communication initiation numbers to form the pre-processed bill.
- Parsing the at least one type of communication information of each communication number in the pre-processed bill and obtaining the features of the corresponding type of communication information of each communication number in the pre-processed bill includes:
- based on the ranking of the similarity, extracting the communication initiation numbers in the first proportion with the highest similarity.
- Parsing the at least one type of communication information of each communication number in the pre-processed bill and obtaining the features of the corresponding type of communication information of each communication number in the pre-processed bill includes:
- extracting the communication initiation numbers in the second proportion with the highest number of communications.
- Parsing the at least one type of communication information of each communication number in the pre-processed bill and obtaining the features of the corresponding type of communication information of each communication number in the pre-processed bill includes:
- extracting the communication initiation numbers in the third proportion with the highest average communication duration.
- Parsing the at least one type of communication information of each communication number in the pre-processed bill and obtaining the features of the corresponding type of communication information of each communication number in the pre-processed bill includes:
- Extracting, from the communication numbers included in the pre-processed bill, a target communication number that matches the preset feature includes:
- Extracting, from the communication numbers included in the pre-processed CDR, a target communication number that matches the preset feature includes:
- using a machine learning model to analyze the features of the corresponding type of communication information of each communication number in the pre-processed bill, and extracting, from the communication numbers included in the pre-processed bill, the target communication number that matches the preset feature.
- The method further includes:
- retraining the machine learning model based on the communication records of the security numbers in the pre-processed bill.
- Retraining the machine learning model based on the communication records of the security numbers in the pre-processed bill includes:
- The method further includes:
- Responding to the communication behavior of the target communication number includes: issuing a danger reminder to the user of a communication response number that has a communication record with the target communication number, wherein the danger reminder includes a voice reminder and/or a text reminder;
- the degree to which the response is processed in real time is positively correlated with the danger level.
- An embodiment of the present application provides a communication number processing apparatus, where the apparatus includes:
- an obtaining module configured to acquire, from the communication service device, a CDR of a preset number of communication numbers in a first preset time;
- a pre-processing module configured to parse the CDR to obtain the types of communication information included in the CDR, extract at least one type of communication information of each communication number in the CDR, and combine the extracted information to form a pre-processed CDR;
- a parsing module configured to parse at least one type of communication information of each communication number in the pre-processed bill, and obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill;
- an extracting module configured to extract, from the communication number included in the pre-processed bill, a target communication number that matches the preset feature.
- The pre-processing module is specifically configured to:
- combine the extracted communication records of the respective communication initiation numbers to form the pre-processed bill.
- The parsing module is specifically configured to: separately calculate the edit distance between each communication initiation number in the pre-processed bill and a yellow page number; and obtain, according to the edit distance, the similarity between each communication initiation number in the pre-processed bill and the yellow page number, wherein the edit distance indicates the number of operations needed to change the yellow page number into the communication initiation number;
- the extracting module is specifically configured to: extract, from the communication initiation numbers included in the pre-processed CDR, the communication initiation numbers whose similarity to the yellow page number is greater than a first threshold; or, based on the ranking of the similarity to the yellow page number among the communication initiation numbers included in the pre-processed bill, extract the communication initiation numbers in the first proportion with the highest similarity.
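- As an illustration of the yellow page comparison described above, the following minimal Python sketch computes a Levenshtein edit distance and derives a similarity from it. The normalization `1 - distance / max(length)`, the function names, the threshold value, and the sample numbers are assumptions made for illustration; the source does not specify how the similarity is derived from the edit distance.

```python
def edit_distance(source: str, target: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions turning source into target."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute (or keep) s_char
            ))
        previous = current
    return previous[-1]


def similarity(initiation_number: str, yellow_page_number: str) -> float:
    """Assumed normalization: 1.0 for identical strings, lower for more edits."""
    longest = max(len(initiation_number), len(yellow_page_number))
    if longest == 0:
        return 1.0
    distance = edit_distance(yellow_page_number, initiation_number)
    return 1.0 - distance / longest


def similar_to_yellow_pages(initiation_numbers, yellow_page_numbers, first_threshold):
    """Keep initiation numbers whose best similarity exceeds first_threshold."""
    selected = []
    for number in initiation_numbers:
        best = max((similarity(number, y) for y in yellow_page_numbers), default=0.0)
        if best > first_threshold:
            selected.append(number)
    return selected


# Hypothetical numbers: "095588" imitates the yellow page entry "95588".
suspects = similar_to_yellow_pages(["095588", "13800000000"], ["95588"], 0.8)
```

- In this sketch a number identical to a yellow page number also scores 1.0; whether such exact matches should be excluded is a design choice the source leaves open.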
- The parsing module is specifically configured to: extract the communication start times of each communication number in the pre-processed bill that acts as a communication initiation number; and calculate the number of communications of each communication initiation number in the pre-processed bill per unit time;
- the extracting module is specifically configured to: extract, from the communication initiation numbers included in the pre-processed CDR, the communication initiation numbers whose number of communications per unit time is greater than a second threshold; or, based on the ranking of the number of communications per unit time of the communication initiation numbers included in the pre-processed CDR, extract the communication initiation numbers in the second proportion with the highest number of communications.
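- The per-unit-time count described above could be sketched in Python as follows. The choice of one clock hour as the unit time, the use of the busiest hour as the count, the function names, and the threshold value are assumptions; the start times are the sample values from the pre-processed bill example given later in the description.

```python
from collections import Counter, defaultdict
from datetime import datetime


def peak_communications_per_hour(records):
    """records: iterable of (initiation_number, start_time_string).
    Returns the highest number of communications each initiation number
    started within any single clock hour (the assumed unit time)."""
    per_hour = defaultdict(Counter)
    for number, start_time in records:
        started = datetime.strptime(start_time, "%Y-%m-%d %H:%M:%S")
        hour_bucket = started.replace(minute=0, second=0)
        per_hour[number][hour_bucket] += 1
    return {number: max(counts.values()) for number, counts in per_hour.items()}


def above_threshold(values, threshold):
    """Return the numbers whose value is strictly greater than threshold."""
    return [number for number, value in values.items() if value > threshold]


records = [
    ("158xxxx0001", "2016-01-15 15:32:42"),
    ("158xxxx0001", "2016-01-15 15:42:02"),
    ("158xxxx0001", "2016-01-15 15:48:02"),
    ("158xxxx0001", "2016-01-15 15:52:07"),
    ("170xxxx0001", "2016-01-15 15:39:02"),
    ("170xxxx0001", "2016-01-15 15:51:02"),
    ("170xxxx0001", "2016-01-16 10:26:02"),
]
peaks = peak_communications_per_hour(records)
frequent = above_threshold(peaks, 3)  # a second threshold of 3 is hypothetical
```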
- The parsing module is specifically configured to: extract the communication durations of each communication number in the pre-processed bill that acts as a communication initiation number; and calculate the average communication duration of each communication initiation number in the pre-processed bill;
- the extracting module is specifically configured to: extract, from the communication initiation numbers included in the pre-processed CDR, the communication initiation numbers whose average communication duration is greater than a third threshold; or, based on the ranking of the average communication duration of the communication initiation numbers included in the pre-processed CDR, extract the communication initiation numbers in the third proportion with the highest average communication duration.
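- A minimal Python sketch of the average-duration feature and of the proportion-based extraction follows. Skipping records with no recorded duration, rounding the proportion up to a whole count, the function names, and the 50% proportion are assumptions; the durations are the sample values from the pre-processed bill example given later in the description.

```python
import math
from collections import defaultdict


def average_durations(records):
    """records: iterable of (initiation_number, duration_seconds or None).
    Records without a duration are skipped (an assumption)."""
    totals = defaultdict(lambda: [0, 0])  # number -> [sum of seconds, count]
    for number, seconds in records:
        if seconds is None:
            continue
        totals[number][0] += seconds
        totals[number][1] += 1
    return {number: total / count for number, (total, count) in totals.items()}


def top_proportion(values, proportion):
    """Return the numbers ranked in the top `proportion` (0 to 1) by value."""
    keep = math.ceil(len(values) * proportion)
    ranked = sorted(values, key=values.get, reverse=True)
    return ranked[:keep]


records = [
    ("158xxxx0001", 134), ("158xxxx0001", 97),
    ("158xxxx0001", 123), ("158xxxx0001", 256),
    ("170xxxx0001", 15), ("170xxxx0001", 77), ("170xxxx0001", None),
]
averages = average_durations(records)
longest = top_proportion(averages, 0.5)  # a third proportion of 50% is hypothetical
```

- The same `top_proportion` helper would serve the first, second, and fourth proportions, since each only ranks a different per-number value.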
- The parsing module is specifically configured to: acquire the attribution of the communication response numbers corresponding to each communication number in the pre-processed bill that acts as a communication initiation number; and calculate the number of different attributions of the communication response numbers corresponding to each communication initiation number in the pre-processed bill;
- the extracting module is specifically configured to: extract, from the communication initiation numbers included in the pre-processed CDR, the communication initiation numbers for which the number of different attributions of the corresponding communication response numbers is greater than a fourth threshold; or, based on the ranking of the number of different attributions of the communication response numbers corresponding to the communication initiation numbers included in the pre-processed CDR, extract the communication initiation numbers in the fourth proportion with the highest number of different attributions.
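- The attribution count described above could be sketched as follows, reading "attribution" as the home location of a response number (an interpretation of the translated term). The lookup table, the placeholder numbers, the function name, and the threshold value are all hypothetical.

```python
from collections import defaultdict


def distinct_attribution_counts(records, attribution_of):
    """records: iterable of (initiation_number, response_number).
    attribution_of: hypothetical lookup from response number to home location.
    Returns how many different attributions each initiation number reached."""
    reached = defaultdict(set)
    for initiation_number, response_number in records:
        location = attribution_of.get(response_number)
        if location is not None:  # skip numbers with unknown attribution
            reached[initiation_number].add(location)
    return {number: len(locations) for number, locations in reached.items()}


# Hypothetical numbers and attributions, for illustration only.
attribution_of = {
    "r1": "Beijing", "r2": "Shanghai", "r3": "Guangzhou",
    "r4": "Shenzhen", "r5": "Shenzhen",
}
records = [("A", "r1"), ("A", "r2"), ("A", "r3"), ("B", "r4"), ("B", "r5")]
counts = distinct_attribution_counts(records, attribution_of)
# A fourth threshold of 2 is hypothetical.
scattered = [number for number, count in counts.items() if count > 2]
```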
- The extracting module is specifically configured to: analyze, by using a machine learning model, the features of the corresponding type of communication information of each communication number in the pre-processed bill, and extract, from the communication numbers included in the pre-processed bill, the target communication number that matches the preset feature.
- The device further includes:
- a training module configured to: receive feedback information from the user side for the target communication number, and determine whether the target communication number is a security number; determine the error rate of the machine learning model based on the number of identified target communication numbers that the user side reports as security numbers; and, when the error rate of the machine learning model is greater than a fifth threshold, retrain the machine learning model based on the communication records of the security numbers in the pre-processed bill.
- The training module is configured to: parse at least one type of communication information in the communication records of the security numbers in the pre-processed bill, and obtain the features of the at least one type of communication information of the security numbers; and update, based on those features, the threshold used by the machine learning model to identify the target communication number.
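- The feedback loop described above might look like the following Python sketch. The error rate follows the description (identified target numbers reported as security numbers, divided by all identified target numbers). The update rule, which raises the threshold to the largest feature value among the misidentified security numbers, is an assumption and not taken from the source, as are the function names and sample values.

```python
def error_rate(identified_targets, reported_safe):
    """Share of identified target numbers that users reported as security numbers."""
    identified = set(identified_targets)
    if not identified:
        return 0.0
    return len(identified & set(reported_safe)) / len(identified)


def updated_threshold(threshold, identified_targets, reported_safe,
                      feature_of, fifth_threshold):
    """Return the threshold to use after feedback.

    Assumed update rule (not from the source): when the error rate exceeds
    fifth_threshold, raise the threshold to the largest feature value seen
    among the misidentified security numbers, so they no longer match."""
    misidentified = set(identified_targets) & set(reported_safe)
    if not misidentified:
        return threshold
    if error_rate(identified_targets, reported_safe) <= fifth_threshold:
        return threshold
    return max(threshold, max(feature_of[number] for number in misidentified))


# Hypothetical feedback: "b" and "c" were flagged but users report them as safe.
identified = ["a", "b", "c", "d"]
safe = ["b", "c"]
calls_per_hour = {"a": 9, "b": 5, "c": 6, "d": 8}
new_threshold = updated_threshold(4, identified, safe, calls_per_hour, 0.25)
```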
- The device further includes:
- a response module configured to: determine the degree of matching between the features of the corresponding type of communication information of the target communication number and the preset feature; determine the risk level of the target communication number based on that degree of matching; and respond to the communication behavior of the target communication number based on the risk level of the target communication number.
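- One way to picture the response module is the Python sketch below. Measuring the matching degree as the fraction of preset thresholds exceeded, the three risk levels and their cut-offs, the feature names, and the mapping from level to reminder type are all assumptions for illustration; the source states only that the response depends on the risk level and that a danger reminder may be a voice and/or text reminder.

```python
def matching_degree(features, preset_features):
    """Fraction of preset features (name -> threshold) that the number exceeds."""
    if not preset_features:
        return 0.0
    matched = sum(
        1 for name, threshold in preset_features.items()
        if features.get(name, 0) > threshold
    )
    return matched / len(preset_features)


def risk_level(degree):
    """Map a matching degree in [0, 1] to an assumed three-step risk level."""
    if degree >= 0.75:
        return "high"
    if degree >= 0.5:
        return "medium"
    return "low"


# Assumed responses; more dangerous numbers get the more immediate reminder.
RESPONSES = {
    "high": "voice reminder during the call",
    "medium": "text reminder",
    "low": "no reminder",
}

preset = {"calls_per_hour": 20, "average_duration": 120, "distinct_attributions": 5}
observed = {"calls_per_hour": 40, "average_duration": 30, "distinct_attributions": 12}
level = risk_level(matching_degree(observed, preset))
response = RESPONSES[level]
```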
- The embodiment of the present application obtains the features of the corresponding type of communication information of each communication number by parsing the CDR of a preset number of communication numbers in the first preset time and, based on those features, extracts from the communication numbers the target communication number that matches the preset feature.
- The communication number CDR is objective data maintained by the operator and can truly and completely reflect the communication records of the user within a certain time interval.
- Using the communication number CDR as the basis for processing can therefore improve the accuracy of number identification.
- The CDR is generally generated and maintained by the operator without the direct participation of each user, so the CDR of a communication number can be obtained quickly and efficiently. The embodiment of the present application can therefore improve the speed and accuracy of number identification.
- FIG. 1 is a schematic diagram of an optional application scenario of a method for processing a communication number in an embodiment of the present application
- FIG. 2 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 1 of the present application;
- FIG. 3 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 2 of the present application;
- FIG. 5 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 4 of the present application.
- FIG. 6 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 5 of the present application.
- FIG. 7 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 6 of the present application.
- FIG. 8 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 7 of the present application.
- FIG. 9 is an optional schematic flowchart of a method for processing a communication number according to Embodiment 8 of the present application.
- FIG. 10 is an optional schematic flowchart of a method for processing a communication number in Embodiment 9 of the present application.
- FIG. 11 is an optional schematic diagram of a user application running on a user equipment in a state of receiving a user indication according to an embodiment of the present application;
- FIG. 11b is an optional schematic diagram of a user application running on a user equipment in a text reminding state according to an embodiment of the present application;
- FIG. 12 is an optional structural diagram of a communication number processing apparatus according to an embodiment of the present application.
- FIG. 13 is another schematic structural diagram of a communication number processing apparatus according to an embodiment of the present application.
- FIG. 14 is still another schematic structural diagram of a communication number processing apparatus according to an embodiment of the present application.
- The embodiment of the present application describes a communication number processing method.
- FIG. 1 shows an optional application scenario of the communication number processing method in the embodiment of the present application. In this scenario, user equipment 11, user equipment 12, user equipment 13, network device 14 (such as a carrier gateway or an enterprise gateway), communication service device 15, and application background server 16 each access a communication network (such as a wireless network or a wired network).
- The communication service device 15 is, for example, a business support system (BSS) / operation support system (OSS), or a telecommunication switch;
- the communication service device 15 is configured to provide a bill for a communication number;
- the network device 14 is configured to provide service support for each user equipment accessing the communication network;
- the application background server 16 is used to provide service support for the application; corresponding to the background server 16 of the application, the client of the application installed on the user equipment is also used to provide service support for the application;
- the application may be a communication application, for example Tencent Mobile phone housekeeper, WeChat, or Tencent mailbox;
- applications are not limited to communication applications, and the present application does not specifically limit this.
- In the above scenario, the number of user equipments is at least one, and each user equipment is associated with at least one distinct communication number. For example, in FIG. 1 the user equipment 11 is associated with at least one communication number A, the user equipment 12 is associated with at least one communication number B, and the user equipment 13 is associated with at least one communication number C.
- The communication number A, the communication number B, and the communication number C are different from each other. The method is applicable to the above scenario, in which communication numbers that satisfy a preset condition are identified from a plurality of communication numbers.
- The embodiment of the present application further describes a communication number processing apparatus, which can be used to execute the communication number processing method of the embodiment of the present application.
- The communication number processing apparatus can be implemented in various ways. For example, all components of the apparatus may be implemented in a user device such as a smart phone, a fixed telephone, a tablet computer, a notebook computer, or a wearable device (such as smart glasses or a smart watch); all components of the apparatus may be implemented in a network device such as an enterprise gateway or a carrier gateway; or the components of the apparatus may be implemented in a coupled manner across the user device side and the network side. The communication number processing apparatus can also be a client or a background server of a user application. For example, when the user application is Tencent mobile phone housekeeper, the corresponding communication number processing apparatus can be the client or the background server of Tencent mobile phone housekeeper.
- This embodiment provides a communication number processing method that can be applied to scenarios in which communication numbers satisfying a preset condition need to be identified from multiple communication numbers, for example identification across all numbers in a communication network, identification of a communication number that the user indicates is to be identified, or identification of the communication number communicating with the current user;
- the types of communication service include, but are not limited to, any one or a combination of the following: voice call; short message; flash message; data service (such as WeChat). This application is not limited thereto.
- the communication number processing method provided in this embodiment includes the following steps:
- Step 201 Acquire, from the communication service device, a bill of a preset number of communication numbers in a first preset time.
- The communication service device may include a telecommunication support system device, such as a BSS/OSS, or a telecommunication switch. The first preset time may be set flexibly by the user or the operator according to actual conditions such as service requirements. The communication number may be, without limitation, a mobile phone number, a fixed-line number, or the like. The communication numbers may include, for example, all communication numbers in the communication network, a communication number to be identified that is indicated by the user, or a communication number in a call with the current user. A communication number indicated by the user is, for example, a communication number to be identified that the user specifies in an application running on the user equipment (such as the Tencent mobile phone housekeeper), or one carried in an indication message that the user sends to the operator server.
- The CDR of the preset number of communication numbers in the first preset time may be obtained from the communication service device in at least one of the following manners:
- when the communication number of the current user is detected, acquiring from the communication service device the CDR of that communication number in the first preset time;
- acquiring from the communication service device the CDR of an unfamiliar communication number within the first preset time.
- Step 202 Parse the CDR to obtain the types of communication information included in the CDR, extract at least one type of communication information of each communication number in the CDR, and combine the extracted information to form a pre-processed CDR.
- The CDRs of the preset number of communication numbers obtained from the communication service device in the first preset time are generally unordered.
- Pre-processing collects the records with the communication number as the dimension to form the pre-processed CDR.
- The communication information includes at least one type of communication information for at least one of the following roles of a communication number: calling number (such as the calling number in a voice service), called number (such as the called number in a voice service), information transmission number (such as the sending number of a short message or the data transmission number in a data service), and information reception number (such as the receiving number of a short message or the data reception number in a data service).
- The pre-processed CDR includes only the at least one type of communication information of each communication number extracted from the CDR; that is, the pre-processed CDR does not need to include all the information in the CDR. The data in the pre-processed CDR is organized with each communication number as an index, and the data structure of the pre-processed bill is, for example:
- Calling number 1 in the voice service: communication information 1, communication information 2, ...;
- Calling number 2 in the voice service: communication information 3, communication information 4, ...;
- SMS sending number 3: communication information 5, communication information 6, ...;
- Data transmission number 4 in the data service: communication information 7, communication information 8, ....
- The pre-processed bill indexed by each communication number acting as the calling number is shown in Table 1.
- In the data structure example of Table 1, the calling number, the called number, the communication start time, and the communication duration (seconds) are partial examples of the types of communication information included in the CDR.
- Step 203 Analyze at least one type of communication information of each communication number in the pre-processed bill, and obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill.
- Step 204 Analyze the features of the corresponding type of communication information of each communication number in the pre-processed bill, and determine whether the features of the corresponding type of communication information of each communication number match the preset feature; if yes, go to step 205; otherwise, the process ends.
- Step 205 Extract a target communication number that matches the preset feature from the communication number included in the pre-processed bill.
- The features of the corresponding type of communication information of each communication number in the pre-processed bill are analyzed, and the target communication number whose features match the preset feature is extracted from the communication numbers included in the pre-processed bill;
- the preset feature is, for example, a pre-set a priori value.
- This embodiment parses the CDR of the communication numbers to obtain the features of the corresponding type of communication information of each communication number and, based on those features, identifies from the communication numbers the target communication number that matches the preset feature.
- Because the CDR of a communication number is generally generated and maintained by the operator, the participation of each user is not required,
- and the CDR of a communication number can be acquired quickly and efficiently.
- Because the CDR of a communication number is objective data maintained by the operator, it can truly and completely reflect all communication records of the user within a certain time interval.
- The technical solution provided by the embodiment of the present application is based on the CDR of the communication number and can therefore improve the speed and accuracy of number identification.
- This embodiment builds on the first embodiment and addresses specifically how to parse the CDR to obtain the types of communication information included in the CDR, and how to extract at least one type of communication information of each communication number in the CDR and combine it to form a pre-processed CDR;
- it proposes a technical solution for that scenario.
- the communication number processing method provided in this embodiment includes the following steps:
- Step 301 Acquire, from the communication service device, a bill of a preset number of communication numbers in a first preset time.
- Step 302 Parsing the CDR to obtain at least one of the following types of communication information included in the CDR: a communication initiation number, a communication response number corresponding to the communication initiation number, a communication start time, and a communication duration.
- The communication initiation number may include a communication number acting as a calling number (such as the calling number in a voice service) and a communication number acting as an information transmission number (such as the sending number of a short message or the data transmission number in a data service);
- the communication response number corresponding to the communication initiation number may include a communication number acting as a called number (such as the called number in a voice service) and a communication number acting as an information reception number (such as the receiving number of a short message or the data reception number in a data service);
- the types of communication information included in the CDR are not limited to the communication initiation number, the corresponding communication response number, the communication start time, and the communication duration; they can also include data traffic (upstream traffic and/or downstream traffic), communication location, service type, long-distance type, and so on. This application is not limited thereto.
- Step 303 Extract at least one type of communication information associated with each communication initiation number in the CDR to form a communication record of each communication initiation number.
- Step 304 Combine the extracted communication records of each communication initiation number to form a pre-processed bill.
- The pre-processed CDR includes only the at least one type of communication information of each communication number extracted from the CDR and does not include all the information in the CDR, which reduces the workload of communication number processing and improves its efficiency.
- The CDRs of the preset number of communication numbers obtained from the communication service device in the first preset time are generally unordered.
- Take the CDR shown in Table 2 as an example.
- Here the communication start time, service type, communication initiation number, communication response number, communication location, long-distance type, and communication duration (seconds) are partial examples of the types of communication information included in the CDR.
- The communication number processing apparatus parses the CDR shown in Table 2 to obtain at least one of the following types of communication information included in the CDR: the communication initiation number; the communication response number corresponding to the communication initiation number; the communication start time; and the communication duration;
- the communication number processing apparatus extracts at least one type of communication information associated with each communication initiation number in the CDR to form a communication record of each communication initiation number, where the communication record of each communication initiation number includes at least one type of communication information of that number within the first preset time;
- the extracted communication records of the communication initiation numbers are combined to form a pre-processed CDR. The pre-processed CDR is compiled with each communication number as the dimension, and the data structure (or display mode) in the pre-processed CDR is organized with each communication number as the index.
- Assuming that the pre-processed bill is formed from at least one type of communication information of each communication number acting as a communication initiation number, the data structure of the pre-processed bill can be:
- Communication initiation number 1: communication information 1, communication information 2, ...;
- Communication initiation number 2: communication information 1, communication information 2, ...;
- The pre-processed bill shown in Table 3 is obtained by the communication number processing apparatus performing steps 302-304 on the bill shown in Table 2;
- the pre-processed CDR is organized with each communication initiation number as the index.
| Communication origination number | Communication response number | Communication start time | Communication duration (seconds) |
| --- | --- | --- | --- |
| 158xxxx0001 | 186xxxx0002 | 2016-01-15 15:32:42 | 134 |
| 158xxxx0001 | 186xxxx0007 | 2016-01-15 15:42:02 | 97 |
| 158xxxx0001 | 139xxxx0006 | 2016-01-15 15:48:02 | 123 |
| 158xxxx0001 | 187xxxx0002 | 2016-01-15 15:52:07 | 256 |
| 170xxxx0001 | 186xxxx0001 | 2016-01-15 15:39:02 | 15 |
| 170xxxx0001 | 180xxxx0007 | 2016-01-15 15:51:02 | 77 |
| 170xxxx0001 | 139xxxx0002 | 2016-01-16 10:26:02 | -- |
- Step 305 Analyze at least one type of communication information of each communication number in the pre-processed bill, and obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill.
- Step 306 Analyze the features of the corresponding type of communication information of each communication number in the pre-processed bill, and determine whether they match the preset feature; if yes, go to step 307, otherwise the process ends.
- Step 307 Extract a target communication number that matches the preset feature from the communication number included in the pre-processed bill.
- This embodiment addresses how to parse a CDR to obtain the types of communication information it includes, and how to extract at least one type of communication information of each communication number in the CDR and combine it into a pre-processed CDR.
- The CDR is parsed to obtain at least one type of communication information included in it; at least one type of communication information associated with each communication initiation number is extracted to form a communication record of each communication initiation number; and the extracted communication records are combined to form a pre-processed CDR.
- Because the pre-processed CDR includes only the types of communication information extracted for each communication number, and not all the information in the original CDR, the workload of number identification is reduced and its speed and efficiency are improved.
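- The pre-processing described in this embodiment can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the dictionary keys (`initiation`, `response`, `start`, `duration`, `place`), the sample rows, and the function name `preprocess_cdr` are all assumptions introduced here.

```python
from collections import defaultdict

# Hypothetical raw CDR rows. The dictionary keys are illustrative names,
# not field names taken from the source.
RAW_CDR = [
    {"initiation": "158xxxx0001", "response": "186xxxx0002",
     "start": "2016-01-15 15:32:42", "duration": 134, "place": "City A"},
    {"initiation": "170xxxx0001", "response": "186xxxx0001",
     "start": "2016-01-15 15:39:02", "duration": 15, "place": "City B"},
    {"initiation": "158xxxx0001", "response": "186xxxx0007",
     "start": "2016-01-15 15:42:02", "duration": 97, "place": "City A"},
]

# Only these types of communication information are kept (an assumed choice).
KEPT_FIELDS = ("response", "start", "duration")


def preprocess_cdr(rows, kept_fields=KEPT_FIELDS):
    """Index the CDR by initiation number and drop every field not kept."""
    grouped = defaultdict(list)
    for row in rows:
        record = {field: row[field] for field in kept_fields}
        grouped[row["initiation"]].append(record)
    return dict(grouped)


PREPROCESSED = preprocess_cdr(RAW_CDR)
```

- Dropping the unused fields up front is what keeps the later feature calculations small: each initiation number maps directly to the short list of records that the analysis steps need.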
- This embodiment is based on the first embodiment. It takes the edit distance between the communication initiation number and the yellow page number as the feature of the communication number, and describes a technical solution for identifying, from a plurality of communication numbers, the communication numbers that meet the preset condition.
- the communication number processing method includes the following steps:
- There can be one or more yellow page numbers. The edit distance is the minimum number of editing operations required to convert the yellow page number into the communication initiation number, that is, the number of add, remove, modify, and move operations on digits needed to turn the yellow page number into the communication initiation number.
- When there are multiple yellow page numbers, the edit distance between each communication initiation number in the pre-processed bill and each yellow page number needs to be calculated separately.
- At least one of the following methods may be used to obtain, based on the edit distance, the similarity between each communication initiation number in the pre-processed bill and the yellow page number.
- Method 1: for each communication initiation number in the pre-processed bill, normalize the calculated edit distance between the communication initiation number and each yellow page number to obtain the similarity between the communication initiation number and each yellow page number; the similarities between the communication initiation number and the yellow page numbers are then sorted.
- Method 2: for each communication initiation number in the pre-processed bill, calculate the ratio of the edit distance between the communication initiation number and the yellow page number to a preset distance, and derive the similarity between the communication initiation number and the yellow page number from the calculated ratio. When there are multiple yellow page numbers, the ratio needs to be calculated separately for each yellow page number.
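- A minimal sketch of Method 1 is given below, under stated assumptions. It uses the classic Levenshtein distance (insert, delete, substitute), which does not count the "move" operation mentioned in the text, and it normalizes by the length of the longer number; the source does not specify the normalization, so that choice is an assumption. The function names and the sample numbers are hypothetical.

```python
def edit_distance(source, target):
    """Levenshtein distance computed with a rolling one-row table."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # delete s_char
                               current[j - 1] + 1,       # insert t_char
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def similarity(initiation_number, yellow_page_number):
    """Map the edit distance to [0, 1]; 1.0 means the numbers are identical.

    Dividing by the longer length is an assumed normalization.
    """
    longest = max(len(initiation_number), len(yellow_page_number))
    if longest == 0:
        return 1.0
    distance = edit_distance(yellow_page_number, initiation_number)
    return 1.0 - distance / longest


def best_similarity(initiation_number, yellow_page_numbers):
    """With several yellow page numbers, compare against each one separately."""
    return max(similarity(initiation_number, y) for y in yellow_page_numbers)
```

- For example, `similarity("4001234568", "4001234567")` differs from the hypothetical yellow page number in one digit out of ten, so a number that imitates a well-known service line scores close to 1.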
- The initial value of the first threshold may be set manually or calculated by training. For example: determine, according to an a priori value, the target quantity of target communication numbers among the communication initiation numbers included in the pre-processed bill; sort the communication initiation numbers by their similarity to the yellow page number; select the target quantity of communication initiation numbers in order of decreasing similarity; and take the similarity of the selected communication initiation number that has the smallest similarity to the yellow page number as the initial value of the first threshold.
- the first threshold can be continuously updated by training calculation according to actual needs.
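- The derivation of the initial threshold from an a priori target quantity can be sketched as follows. The function name `initial_threshold` and the sample scores are assumptions for illustration; the same procedure applies to any per-number feature value.

```python
def initial_threshold(feature_by_number, target_count):
    """Return the smallest feature value among the top `target_count` numbers.

    `feature_by_number` maps each communication initiation number to its
    feature value (here, similarity to the yellow page number).
    """
    if not 1 <= target_count <= len(feature_by_number):
        raise ValueError("target_count must be between 1 and the number of entries")
    ranked = sorted(feature_by_number.values(), reverse=True)
    return ranked[target_count - 1]


# Hypothetical similarities for four initiation numbers.
SIMILARITIES = {"n1": 0.9, "n2": 0.5, "n3": 0.7, "n4": 0.1}
FIRST_THRESHOLD = initial_threshold(SIMILARITIES, target_count=2)
```

- Note that with a strict "greater than" comparison, the number that sets the threshold is itself not selected; the source does not state whether the comparison is meant to be inclusive.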
- The communication number processing device sorts the communication initiation numbers included in the pre-processed bill by their similarity to the yellow page number and, based on this ordering, extracts the first proportion of communication initiation numbers with the highest similarity as the target communication numbers.
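- Extraction by proportion can be sketched as below. Rounding the selected quantity up with `math.ceil` is an assumption, since the source does not say how a fractional quantity is handled; the function name `top_ratio` and the sample scores are hypothetical.

```python
import math


def top_ratio(feature_by_number, ratio):
    """Return the initiation numbers in the top `ratio` share, best first."""
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in (0, 1]")
    keep = math.ceil(len(feature_by_number) * ratio)  # assumed rounding rule
    ranked = sorted(feature_by_number, key=feature_by_number.get, reverse=True)
    return ranked[:keep]


# Hypothetical similarities; the first proportion is assumed to be 50 %.
SCORES = {"n1": 0.9, "n2": 0.5, "n3": 0.7, "n4": 0.1}
TARGETS = top_ratio(SCORES, 0.5)
```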
- The communication number processing device determines, according to the similarity between the communication initiation number and the yellow page number and the first threshold, the probability that the communication initiation number belongs to the target communication number class (such as fraud numbers) and the probability that it belongs to the normal number class, and takes the class with the larger probability as the class of the communication initiation number.
- If that class is the target communication number class, the communication initiation number is determined to be a target communication number; otherwise, it is determined to be a normal number.
- The user equipment may be, for example, a smart phone, a fixed phone, a tablet computer, a notebook computer, or a wearable device.
- The server may be, for example, a service server of an operator, an enterprise gateway, or a back-end server of an application installed on the user equipment;
- the communication service device may be, for example, a BSS/OSS or a telecommunication switch;
- the application can be a communication application, for example Tencent Mobile Manager, WeChat, or Tencent mailbox; of course, the application is not limited to communication applications, which is not specifically limited in the embodiment of the present application.
- The user equipment, the server, and the communication service device cooperate with each other to implement the communication number processing method provided by the embodiment. An optional flow of the method includes:
- Step 401 The user equipment sends, based on a user indication, an identification indication carrying the to-be-identified communication number to the server.
- The application running on the user equipment is in a state of receiving user indications, and the user enters the to-be-identified communication number at the designated location in the display window of the application installed on the user equipment, following the prompt of the application;
- there can be one or more to-be-identified communication numbers.
- Step 402 The server receives the identification indication, and sends a CDR request carrying the to-be-identified communication number to the communication service device according to the identification indication.
- the CDR request includes the to-be-identified communication number and the first preset time.
- Step 403 The communication service device receives the CDR request, and obtains the CDR of the to-be-identified communication number in the first preset time based on the CDR request, and sends the CDR to the server.
- Step 404 The server receives the bill of the to-be-identified communication number in the first preset time.
- Step 405 Parse the CDR to obtain the types of communication information included in the CDR, and extract at least one type of communication information of each to-be-identified communication number in the CDR and combine it to form a pre-processed CDR.
- Step 406 Calculate an edit distance of each to-be-identified communication number and a yellow page number in the pre-processed bill.
- Step 407 Obtain a similarity between each to-be-identified communication number and the yellow page number in the pre-processed bill based on the edit distance.
- Step 408 Determine whether the similarity between each to-be-identified communication number and the yellow page number included in the pre-processed CDR is greater than a first threshold. If yes, go to step 409, otherwise the process terminates.
- Step 409 Extract, from the to-be-identified communication numbers included in the pre-processed CDR, a communication initiation number whose degree of similarity with the yellow page number is greater than the first threshold, as the target communication number.
- Step 410 The server sends, based on the identified target communication number, an identification response carrying the target communication number to the user equipment, where the identification response is used to give the user a danger reminder that the identified target communication number may be a fraudulent number.
- The reminder may be delivered by, but is not limited to, communication applications such as SMS, flash messages, WeChat, and Tencent Mobile Manager; when the target communication number is identified, the server can also give the danger reminder to the user equipment directly through a customer service call.
- The server may also give a danger reminder to the user of a communication response number that appears in the communication records of the identified target communication number, or to a user who is currently communicating with the identified target communication number, to avoid users being cheated.
- After receiving the identification response carrying the target communication number sent by the server, the user equipment gives the user a danger reminder based on the target communication number. For example, referring to FIG. 11b, the application running on the user equipment is in a text reminding state, and the display window of the application installed on the user equipment displays, for example, the following text reminder: "Please be vigilant! The target communication number is a fraudulent number".
- The applications here include, but are not limited to, communication applications such as SMS, flash messages, WeChat, and Tencent Mobile Manager; of course, the application is not limited to communication applications, which is not specifically limited in the embodiment of the present application.
- This embodiment addresses the scenario of how to obtain the features of the corresponding type of communication information of each communication number in the pre-processed bill, and how to extract the target communication number that matches the preset feature from the communication numbers included in the pre-processed bill.
- On the basis of the pre-processed CDR obtained by parsing, the edit distance between each communication initiation number in the pre-processed CDR and the yellow page number is calculated, and the similarity between each communication initiation number and the yellow page number is obtained based on the edit distance (that is, the similarity is used as one of the features of the communication initiation number). Then either the communication initiation numbers whose similarity to the yellow page number is greater than the preset first threshold are extracted as the target communication numbers, or, based on a ranking of the communication initiation numbers by their similarity to the yellow page number, the first proportion of communication initiation numbers with the highest similarity is extracted as the target communication numbers.
- The embodiment takes the similarity between each communication initiation number in the pre-processed bill and the yellow page number as the feature and the first threshold as the preset feature. By determining the relative relationship between that similarity and the first threshold, the target communication number matching the preset feature is extracted from the communication numbers included in the pre-processed CDR, enabling fast and accurate number identification.
- This embodiment is based on the first embodiment. It takes the number of communications of the communication initiation number in a unit time as the feature of the communication number, and describes a technical solution for identifying, from a plurality of communication numbers, the communication numbers that meet the preset condition.
- the communication number processing method provided includes the following steps:
- The number of communications of the communication initiation number in a unit time may include either of the following:
- Mode 1: the number of communications between the communication initiation number and a same number in the unit time;
- Mode 2: the number of communications between the communication initiation number and all the communication numbers with which it communicates in the unit time.
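- Both modes can be sketched as simple counters over the communication records. The sketch assumes that the unit time is one clock hour and that the busiest hour is taken as the per-number feature; neither choice is specified in the source. The sample records reuse the rows of the pre-processed CDR table shown earlier, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime

# (initiation number, response number, start time) tuples.
RECORDS = [
    ("158xxxx0001", "186xxxx0002", "2016-01-15 15:32:42"),
    ("158xxxx0001", "186xxxx0007", "2016-01-15 15:42:02"),
    ("158xxxx0001", "139xxxx0006", "2016-01-15 15:48:02"),
    ("158xxxx0001", "187xxxx0002", "2016-01-15 15:52:07"),
    ("170xxxx0001", "186xxxx0001", "2016-01-15 15:39:02"),
    ("170xxxx0001", "180xxxx0007", "2016-01-15 15:51:02"),
    ("170xxxx0001", "139xxxx0002", "2016-01-16 10:26:02"),
]


def hour_bucket(start):
    """Assumed unit time: the clock hour in which the communication starts."""
    parsed = datetime.strptime(start, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y-%m-%d %H")


def counts_mode1(records):
    """Mode 1: communications with one and the same number per unit time."""
    return Counter((init, resp, hour_bucket(start)) for init, resp, start in records)


def counts_mode2(records):
    """Mode 2: communications with all numbers per unit time."""
    return Counter((init, hour_bucket(start)) for init, _resp, start in records)


def peak_per_number(bucket_counts):
    """Reduce per-hour counts to one feature per initiation number (assumed: the peak)."""
    peaks = {}
    for (initiation, _bucket), count in bucket_counts.items():
        peaks[initiation] = max(peaks.get(initiation, 0), count)
    return peaks


MODE1 = counts_mode1(RECORDS)
MODE2 = counts_mode2(RECORDS)
PEAKS = peak_per_number(MODE2)
```

- With these records, `158xxxx0001` initiates four communications in the 15:00 hour of 2016-01-15, so under Mode 2 it would exceed a second threshold of, say, 3, while `170xxxx0001` would not.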
- The initial value of the second threshold may be set manually or calculated by training. For example: determine, according to an a priori value, the target quantity of target communication numbers among the communication initiation numbers included in the pre-processed CDR; sort the communication initiation numbers by their number of communications in the unit time; select the target quantity of communication initiation numbers in order of decreasing number of communications in the unit time; and take the number of communications in the unit time of the selected communication initiation number that has the smallest such number as the initial value of the second threshold.
- the second threshold can be continuously updated by training calculation according to actual needs.
- The communication number processing device sorts the communication initiation numbers included in the pre-processed bill by their number of communications in the unit time and, based on this ordering, extracts the second proportion of communication initiation numbers with the highest number of communications as the target communication numbers.
- The user equipment may be, for example, a smart phone, a fixed phone, a tablet computer, a notebook computer, or a wearable device (such as smart glasses or a smart watch).
- The server may be, for example, a service server of an operator, an enterprise gateway, or a back-end server of an application installed on the user equipment;
- the communication service device may be, for example, a BSS/OSS or a telecommunication switch; and the application may specifically be a communication application, for example Tencent Mobile Manager, WeChat, or Tencent mailbox. Of course, the application is not limited to communication applications, which is not specifically limited in the embodiment of the present application.
- The user equipment, the server, and the communication service device shown in FIG. 5 cooperate with each other to implement the communication number processing method provided by the embodiment. An optional flow of the method includes:
- Step 501 When detecting the communication number of the counterpart that is communicating with the current user, the user equipment (or the application installed on the user equipment) sends an identification indication carrying the counterpart communication number to the server.
- Step 502 The server receives the identification indication and, according to it, sends a CDR request carrying the counterpart communication number to the communication service device.
- The CDR request includes the counterpart communication number and the first preset time.
- Step 503 The communication service device receives the CDR request, obtains the CDR of the counterpart communication number in the first preset time based on the CDR request, and sends the CDR to the server.
- Step 504 The server receives the CDR of the counterpart communication number in the first preset time.
- Step 505 Parse the CDR to obtain the types of communication information included in the CDR, and extract at least one type of communication information of the counterpart communication number in the CDR and combine it to form a pre-processed CDR.
- Step 506 Extract, from the pre-processed CDR, the communication start times of the communications in which the counterpart communication number is the communication initiation number.
- Step 507 Calculate the number of communications of the counterpart communication number in the unit time in the pre-processed CDR.
- Step 508 Determine whether the number of communications of the counterpart communication number in the unit time is greater than a second threshold; if yes, go to step 509, otherwise the process terminates.
- Step 509 Extract the counterpart communication number included in the pre-processed CDR whose number of communications in the unit time is greater than the second threshold, as the target communication number.
- Step 510 The server gives the user a danger reminder based on the identified target communication number, reminding the user that the identified target communication number may be a fraudulent number. The reminder may be delivered by, but is not limited to, communication applications such as short messages, flash messages, WeChat, and Tencent Mobile Manager; when the target communication number is identified, the server can also give the danger reminder to the user equipment directly through a customer service call.
- The server may also give a danger reminder to the user of a communication response number that appears in the communication records of the identified target communication number, or to the user of a communication response number that is currently communicating with the identified target communication number, to avoid users being cheated.
- After receiving the identification response carrying the target communication number sent by the server, the user equipment gives the user a danger reminder based on the target communication number. For example, referring to FIG. 11b, the user equipment displays, for example, the following text reminder in the display window of an application installed on the user equipment: "Please be vigilant! The target communication number is a fraudulent number". The applications here include, but are not limited to, communication applications such as short messages, flash messages, WeChat, and Tencent Mobile Manager; of course, the application is not limited to communication applications, which is not specifically limited in the embodiment of the present application.
- This embodiment addresses the scenario of how to obtain the features of the corresponding type of communication information of each communication number in the pre-processed bill, and how to extract the target communication number that matches the preset feature from the communication numbers included in the pre-processed bill.
- On the basis of the pre-processed CDR obtained by parsing, the number of communications of each communication initiation number in the unit time is calculated (that is, the number of communications is used as one of the features of the communication initiation number). Then either the communication initiation numbers whose number of communications in the unit time is greater than the preset second threshold are extracted as the target communication numbers, or, based on a ranking of the communication initiation numbers by their number of communications in the unit time, the second proportion of communication initiation numbers with the highest number of communications is extracted as the target communication numbers.
- The communication initiation number is thus characterized by its number of communications in the unit time.
- This embodiment is based on the first embodiment and proposes a technical solution for the scenario of analyzing at least one type of communication information of each communication number in the pre-processed bill to obtain the features of the corresponding type of communication information, and extracting the target communication number that matches the preset feature from the communication numbers included in the pre-processed CDR.
- the communication number processing method provided in this embodiment includes the following steps:
- Step 601 Acquire, from the communication service device, a bill of a preset number of communication numbers in a first preset time.
- Step 602 Parse the CDR to obtain the type of the communication information included in the CDR, and extract at least one type of communication information of each communication number in the CDR and combine to form a pre-processed CDR.
- Step 603 Extract, from the pre-processed bill, the communication durations of the communications in which each communication number is the communication initiation number.
- Step 604 Calculate an average communication duration of each communication initiation number in the pre-processed bill.
- the average communication duration of the communication initiation number may include any of the following:
- Step 605 Determine whether the average communication duration of each communication initiation number included in the pre-processed CDR is greater than a third threshold. If yes, go to step 606, otherwise the process terminates.
- The initial value of the third threshold may be set manually or calculated by training. For example, the average communication duration of the selected communication initiation number that has the smallest average communication duration is determined as the initial value of the third threshold.
- the third threshold can be continuously updated through training calculation according to actual needs.
- Step 606 Extract, from each communication initiation number included in the pre-processed CDR, a communication initiation number whose average communication duration is greater than a third threshold, as the target communication number.
- The communication number processing device sorts the communication initiation numbers included in the pre-processed bill by their average communication duration and, based on this ordering, extracts the third proportion of communication initiation numbers with the highest average communication duration as the target communication numbers.
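- Steps 604 to 606 can be sketched as follows. The durations reuse the pre-processed CDR table shown earlier, with the "--" entry represented as `None`; skipping records without a duration is an assumption, as are the function names and the sample threshold of 100 seconds.

```python
# Communication durations in seconds per initiation number; None stands
# for the "--" entry (no duration recorded).
DURATIONS = {
    "158xxxx0001": [134, 97, 123, 256],
    "170xxxx0001": [15, 77, None],
}


def average_durations(durations_by_number):
    """Average the known durations of each initiation number."""
    averages = {}
    for number, durations in durations_by_number.items():
        known = [d for d in durations if d is not None]  # assumed: skip missing
        if known:
            averages[number] = sum(known) / len(known)
    return averages


def exceeding(feature_by_number, threshold):
    """Return the numbers whose feature is strictly greater than the threshold."""
    return sorted(n for n, value in feature_by_number.items() if value > threshold)


AVERAGES = average_durations(DURATIONS)
TARGETS = exceeding(AVERAGES, threshold=100)  # hypothetical third threshold
```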
- This embodiment addresses the scenario of how to obtain the features of the corresponding type of communication information of each communication number in the pre-processed bill, and how to extract the target communication number that matches the preset feature from the communication numbers included in the pre-processed bill.
- On the basis of the pre-processed CDR obtained by parsing, the average communication duration of each communication initiation number in the pre-processed CDR is calculated (that is, the average communication duration is used as one of the features of the communication initiation number). Then either the communication initiation numbers whose average communication duration is greater than the preset third threshold are extracted as the target communication numbers, or, based on a ranking of the communication initiation numbers by their average communication duration, the third proportion of communication initiation numbers with the highest average communication duration is extracted as the target communication numbers.
- The embodiment of the present application takes the average communication duration of each communication initiation number in the pre-processed bill as the feature and the third threshold as the preset feature.
- By determining the relative relationship between the average communication duration of each communication initiation number included in the pre-processed bill and the third threshold, the target communication number matching the preset feature is extracted from the communication numbers included in the pre-processed bill, achieving fast and accurate number identification.
- This embodiment is based on the first embodiment and proposes a technical solution for the scenario of analyzing at least one type of communication information of each communication number in the pre-processed bill to obtain the features of the corresponding type of communication information, and extracting the target communication number that matches the preset feature from the communication numbers included in the pre-processed CDR.
- the communication number processing method provided in this embodiment includes the following steps:
- Step 701 Acquire, from the communication service device, a bill of a preset number of communication numbers in a first preset time.
- Step 702 Parse the CDR to obtain the types of communication information included in the CDR, and extract at least one type of communication information of each communication number in the CDR and combine it to form a pre-processed CDR.
- Step 703 Extract, from the pre-processed bill, the attributions of the communication response numbers of the communications in which each communication number is the communication initiation number.
- Step 704 Calculate the number of different attributions of the communication response number corresponding to each communication initiation number in the pre-processed bill.
- Step 705 Determine whether the number of different attributions of the communication response number corresponding to each communication initiation number included in the pre-processed CDR is greater than a fourth threshold. If yes, go to step 706, otherwise the process terminates.
- The initial value of the fourth threshold may be set manually or calculated by training. For example, among the selected communication initiation numbers, the number of different attributions of the communication response numbers of the one with the fewest different attributions is determined as the initial value of the fourth threshold.
- the fourth threshold can be continuously updated through training calculation according to actual needs.
- Step 706 Extract, from each communication initiation number included in the pre-processed CDR, a communication initiation number whose number of different attributions of the corresponding communication response number is greater than a fourth threshold, as the target communication number.
- The communication number processing device sorts the communication initiation numbers included in the pre-processed bill by the number of different attributions of their corresponding communication response numbers and, based on this ordering, extracts the fourth proportion of communication initiation numbers with the highest number of different attributions as the target communication numbers.
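- Steps 704 to 706 can be sketched as counting distinct attributions per initiation number. The sketch assumes that "attribution" means the home location registered for a number and that a lookup table from response number to attribution is available; the table contents, the function names, and the sample threshold are all hypothetical.

```python
# Response numbers called by each initiation number, taken from the
# pre-processed CDR table shown earlier.
RESPONSES = {
    "158xxxx0001": ["186xxxx0002", "186xxxx0007", "139xxxx0006", "187xxxx0002"],
    "170xxxx0001": ["186xxxx0001", "180xxxx0007", "139xxxx0002"],
}

# Hypothetical lookup from response number to attribution.
ATTRIBUTION_OF = {
    "186xxxx0002": "City A", "186xxxx0007": "City B",
    "139xxxx0006": "City C", "187xxxx0002": "City D",
    "186xxxx0001": "City A", "180xxxx0007": "City A",
    "139xxxx0002": "City B",
}


def distinct_attribution_counts(responses_by_number, attribution_of):
    """Count the different attributions among each number's response numbers."""
    counts = {}
    for initiation, responses in responses_by_number.items():
        places = {attribution_of[r] for r in responses if r in attribution_of}
        counts[initiation] = len(places)
    return counts


COUNTS = distinct_attribution_counts(RESPONSES, ATTRIBUTION_OF)
FOURTH_THRESHOLD = 3  # hypothetical value
TARGETS = sorted(n for n, c in COUNTS.items() if c > FOURTH_THRESHOLD)
```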
- This embodiment addresses the scenario of how to obtain the features of the corresponding type of communication information of each communication number in the pre-processed bill, and how to extract the target communication number that matches the preset feature from the communication numbers included in the pre-processed bill.
- The embodiment of the present application takes the number of different attributions of the communication response numbers corresponding to each communication initiation number in the pre-processed bill as the feature and the fourth threshold as the preset feature.
- By determining the relative relationship between the number of different attributions of the communication response numbers corresponding to each communication initiation number included in the pre-processed CDR and the fourth threshold, the target communication number matching the preset feature is extracted from the communication numbers included in the pre-processed CDR, thereby realizing fast and accurate number identification.
- This embodiment is based on the foregoing embodiments and proposes a solution for the scenario of how to extract the target communication number that matches the preset feature from the communication numbers included in the pre-processed CDR.
- the communication number processing method provided in this embodiment includes the following steps:
- Step 801 Acquire, from the communication service device, a bill of a preset number of communication numbers in a first preset time.
- Step 802 Parse the CDR to obtain the type of the communication information included in the CDR, and extract at least one type of communication information of each communication number in the CDR and combine to form a pre-processed CDR.
- Step 803 Analyze at least one type of communication information of each communication number in the pre-processed bill, and obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill.
- Step 804 Analyze, by using a machine learning model, features of corresponding types of communication information of each communication number in the pre-processed bill.
- Step 805 Determine whether the feature of the corresponding type communication information of each communication number matches the preset feature. If yes, go to step 806, otherwise the process ends.
- Step 806 Extract a target communication number that matches the preset feature from the communication number included in the pre-processed bill.
- Analyzing, with the machine learning model, the features of the corresponding type of communication information of each communication number in the pre-processed bill may include: identifying the target communication number by using the technical solution described in any one of the foregoing embodiments 3 to 6, or a combination of those solutions.
- The machine learning model can adopt any one or a combination of the following models: a Bayesian classifier model; a Support Vector Machine (SVM) classifier model; a deep learning model; a logistic regression model. Those skilled in the art can understand that the machine learning model can also include other models not listed herein, and the application is not limited thereto.
- This embodiment addresses how to extract, from the communication numbers included in the pre-processed CDR, the target communication number that matches the preset feature: a machine learning model analyzes the features of the corresponding type of communication information of each communication number in the pre-processed CDR, and the target communication number matching the preset feature is extracted from the communication numbers included in the pre-processed CDR, thereby realizing fast and efficient number identification.
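The classification step of this embodiment can be illustrated with a small sketch. The text names a Bayesian classifier as one possible model but does not specify its form, so the Gaussian naive Bayes below is an assumption, as are the function names, the choice of three features, and the training rows, which are invented purely for illustration and are not data from the application.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    model = {}
    for label, members in by_class.items():
        n = len(members)
        columns = list(zip(*members))
        means = [sum(col) / n for col in columns]
        # A small floor keeps every variance strictly positive.
        variances = [
            sum((x - m) ** 2 for x in col) / n + 1e-6
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict(model, row):
    """Return the label with the highest log-posterior for one feature row."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for x, m, v in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
        return score
    return max(model, key=log_posterior)


# Hypothetical rows: (yellow page similarity, calls per hour, average call seconds).
TRAIN_ROWS = [(0.9, 40, 12), (0.8, 35, 15), (0.85, 50, 10),
              (0.1, 2, 180), (0.2, 1, 240), (0.15, 3, 90)]
TRAIN_LABELS = ["fraud", "fraud", "fraud", "normal", "normal", "normal"]
MODEL = fit_gaussian_nb(TRAIN_ROWS, TRAIN_LABELS)
```

Calling `predict(MODEL, row)` on the feature row of a communication initiation number returns the label used to decide whether that number is extracted as a target communication number.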
- This embodiment is based on the seventh embodiment and addresses how to train the machine learning model based on the user-side feedback information for the target communication number.
- the communication number processing method provided in this embodiment includes the following steps:
- Step 901 Acquire, from the communication service device, a bill of a preset number of communication numbers in a first preset time.
- Step 902 Parse the CDR to obtain the type of the communication information included in the CDR, and extract at least one type of communication information of each communication number in the CDR and combine to form a pre-processed CDR.
- Step 903 Analyze at least one type of communication information of each communication number in the pre-processed bill, and obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill.
- Step 904 Analyze the features of the corresponding type of communication information of each communication number in the pre-processed bill, and determine whether the feature of the corresponding type of communication information of each communication number matches the preset feature. If yes, go to step 905; otherwise the process ends.
- Step 905 Extract a target communication number that matches the preset feature from the communication numbers included in the pre-processed bill; and perform a danger reminder to the user of the communication response number that has a communication record with the identified target communication number, or to the user of the communication response number that is communicating with the target communication number.
- Step 906 Receive feedback information of the user side for the target communication number.
- Step 907 Determine, according to the feedback information of the target communication number by the user side, whether the target communication number is a security number, and if yes, go to step 908, otherwise the process ends.
- Step 908 Determine an error rate of the machine learning model based on the number of identified target communication numbers that the user side has fed back as security numbers.
- Step 909 Determine whether the error rate of the machine learning model is greater than a fifth threshold. If yes, go to step 910; otherwise the process ends.
- Step 910 Retrain the machine learning model based on the communication record of the security number in the pre-processed bill.
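- A feasible implementation of retraining the machine learning model includes:
- parsing at least one type of communication information of the communication record of the security number in the pre-processed bill to obtain the feature of the at least one type of communication information of the security number, and updating, based on that feature, the threshold used by the machine learning model to identify the target communication number.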
- In this embodiment, the error rate of the machine learning model is determined according to the number of identified target communication numbers that the user side has fed back as security numbers, and when the error rate of the machine learning model is greater than the fifth threshold, the machine learning model is retrained based on the communication record of the security number in the pre-processed bill. Because the retraining is based on the communication record of the security number in the pre-processed bill, the machine learning model obtained by retraining has a higher accuracy rate.
- Using the machine learning model obtained by retraining to identify the target communication number can improve the speed and accuracy of the number identification.
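Steps 906 to 910 can be sketched as follows. This is only an illustration under stated assumptions: the feedback is a hypothetical dictionary mapping each number to "safe" or "fraud", a single feature with a "greater than threshold" rule stands in for the model, and the update rule (raising the threshold above the largest feature value observed on a security number) is one possible reading of "updating the threshold", not a rule stated in the application.

```python
def model_error_rate(identified_numbers, feedback):
    """Share of identified target numbers that users reported as security numbers."""
    if not identified_numbers:
        return 0.0
    safe = [n for n in identified_numbers if feedback.get(n) == "safe"]
    return len(safe) / len(identified_numbers)


def updated_threshold(current_threshold, safe_feature_values):
    """Raise the threshold to the largest feature value seen on a security number."""
    if not safe_feature_values:
        return current_threshold
    return max(current_threshold, max(safe_feature_values))


def maybe_retrain(identified, feedback, features, threshold, fifth_threshold):
    """Return the threshold to use next; it changes only if the error rate is too high."""
    if model_error_rate(identified, feedback) <= fifth_threshold:
        return threshold
    safe_values = [features[n] for n in identified if feedback.get(n) == "safe"]
    return updated_threshold(threshold, safe_values)
```

With the identification rule "feature value greater than threshold", the numbers that users reported as security numbers no longer match after the update, while numbers with larger feature values are still extracted.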
- This embodiment is based on any of the foregoing embodiments, and proposes a solution to the response processing scenario when the target communication number is identified.
- the communication number processing method provided in this embodiment includes the following steps:
- Step 1001 Acquire, from the communication service device, a bill of a preset number of communication numbers in a first preset time.
- Step 1002 Parse the CDR to obtain the type of the communication information included in the CDR, and extract at least one type of communication information of each communication number in the CDR and combine to form a pre-processed CDR.
- Step 1003 Analyze at least one type of communication information of each communication number in the pre-processed bill, and obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill.
- Step 1004 Analyze the features of the corresponding type of communication information of each communication number in the pre-processed bill, and determine whether the feature of the corresponding type of communication information of each communication number matches the preset feature. If yes, go to step 1005; otherwise the process ends.
- Step 1005 Extract a target communication number that matches the preset feature from the communication numbers included in the pre-processed bill.
- Step 1006 Determine a degree of matching between a feature of the corresponding type of communication information of the target communication number and the preset feature.
- The degree of matching between the feature of the corresponding type of communication information of the target communication number and the preset feature can also be understood as the degree of difference between that feature and the preset feature. Take the similarity between the target communication number and the yellow page number as an example: when the preset feature is that the similarity between the target communication number and the yellow page number is greater than the first threshold, the matching degree refers to the size of the difference between that similarity and the first threshold.
- Step 1007 Determine a danger level of the target communication number according to the degree of matching between the feature of the corresponding type of communication information of the target communication number and the preset feature.
- the degree of matching is positively related to the level of danger; different levels of risk can correspond to the degree of matching within different data ranges.
- Step 1008 Respond to the communication behavior of the target communication number based on the risk level of the target communication number.
- The real-time degree of the response processing is positively related to the danger level. Assume that the defined danger levels include high risk and low risk. The danger level here can be used to characterize the probability that the target communication number is a communication number that meets certain conditions; for example, the danger level can characterize the probability that the target communication number is a fraudulent number.
- For a lower danger level, the manner of responding to the communication behavior of the target communication number may include: performing a danger reminder to the user of the communication response number that has a communication record with the target communication number, reminding the user that the target communication number is a fraudulent number. Here, the danger reminder includes a voice reminder and/or a text reminder; the voice reminder is, for example, a voice recording or a customer service telephone reminder, and the text reminder is, for example, a short message or a flash message.
- For example, the communication number processing device performs an after-the-fact danger reminder: on the user device of the communication response number that has a communication record with the target communication number, the display window of a user application shows the text reminder "Please be vigilant! The target communication number is a fraudulent number." The user applications here include but are not limited to SMS, flash message, WeChat, and Tencent Mobile Manager; of course, the application is not limited to communication applications, which is not specifically limited in the embodiment of the present application.
- For a higher danger level, the manner of responding to the communication behavior of the target communication number may include: performing an immediate danger reminder (including but not limited to a text reminder such as a short message or a flash message, or a voice reminder such as a voice recording or a customer service telephone reminder) to the user of the communication response number that is communicating with the target communication number, that is, reminding the user, while the communication with the target communication number is in progress, that the target communication number is a fraudulent number; or directly intercepting the ongoing communication with the target communication number and reminding the user of the danger afterwards.
- In this embodiment, the danger level of the target communication number is determined based on the degree of matching between the feature of the corresponding type of communication information of the target communication number and the preset feature, and the communication behavior of the target communication number is responded to based on that danger level, so that the user who communicates with the target communication number is reminded to be vigilant and avoid fraud.
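Steps 1006 to 1008 can be sketched as a small mapping from matching degree to danger level to response. The cutoff value, the level names, and the response labels below are hypothetical choices made for illustration; the text only states that the matching degree is positively related to the danger level and that a higher danger level calls for a more immediate response.

```python
def matching_degree(similarity, first_threshold):
    """Difference between the yellow page similarity and the first threshold."""
    return similarity - first_threshold


def danger_level(degree, high_risk_cutoff=0.2):
    """Map a matching degree to a level; a larger degree gives a higher level."""
    if degree <= 0:
        return None  # the number does not match the preset feature
    return "high" if degree >= high_risk_cutoff else "low"


def choose_response(level):
    """Pick a response whose immediacy grows with the danger level."""
    if level == "high":
        # Remind the user during the call, or intercept and remind afterwards.
        return "immediate_reminder_or_intercept"
    if level == "low":
        return "after_the_fact_reminder"
    return "no_action"
```

For example, `choose_response(danger_level(matching_degree(0.95, 0.7)))` selects the immediate response, while a similarity only slightly above the first threshold leads to an after-the-fact reminder.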
- Based on any of the above embodiments, the present embodiment is applicable to scenarios in which a communication number satisfying a preset condition needs to be identified from among a plurality of communication numbers, for example, identifying numbers across the whole communication network, identifying a communication number indicated by a user, or identifying a communication number that is communicating with the current user. The type of communication service includes but is not limited to any one or a combination of the following service types: voice call; short message; flash message; data service (such as WeChat). This application is not limited thereto.
- a communication number processing apparatus (a fraudulent number identification system based on bill analysis) provided in this embodiment includes: an online identification system and an offline training system.
- The online identification system extracts features from the bill records collected by the operator and uses the machine learning model to determine whether a given phone number is a fraudulent number. The user is then reminded, or contacted through a return visit, to avoid being deceived, and the results of the reminder or return visit are fed back to the offline training system so that the machine learning model can be adjusted accordingly.
- The offline training system extracts the corresponding features from the historical bill data and from the feedback results of the reminders and return visits in the online identification system. Using these features, the machine learning model is retrained and adjusted, and the trained machine learning model is synchronized to the fraudulent phone recognition engine in the online identification system.
- the online identification system can identify the fraud number according to the user's call bill record; the online identification system can be further divided into three modules: a bill collection module, a fraudulent phone recognition engine, and a deceived user reminder system;
- CDR collection module: mainly responsible for collecting user call records and pre-processing the collected CDRs to obtain the following four columns of information:
- Fraud phone identification engine: this is the core of the online identification system. The collected bills are cleaned, features are extracted, and the trained machine learning model is applied to the extracted features to identify whether a number is a fraudulent phone number. It can be divided into three parts: bill cleaning, feature extraction, and fraud number identification;
- Bill cleaning: removes the "dirty" data in the bill. The so-called "dirty" data is abnormal data, such as records with missing content or abnormal values.
- Feature extraction: after the CDRs are cleaned, features are extracted in preparation for the subsequent fraud number identification. The features include the similarity between the calling number and yellow page numbers, the average call duration, the call interval between adjacent CDRs, and so on.
- Similarity between the calling number and yellow page numbers (that is, the similarity between the above-mentioned communication initiation number and the yellow page number): a fraud number is mostly a calling number, and fraudsters use number-changing software to change the calling number into one that resembles a number on the yellow pages, such as 001XX86, +0109XX88, or 08XXX10010 (China Unicom's customer service number is 10010). The edit distance between substrings of these numbers and the numbers on the yellow pages is calculated; the edit distance is the number of operations, such as adding, removing, modifying, or moving digits, needed to turn the yellow page number into the calling number.
- Number of calls per unit time (that is, the number of communications of the above communication initiation number per unit time): fraudsters usually make a large number of calls every hour, and most of these calls fall within working hours, that is, Monday to Friday, 08:00:00 to 18:00:00. During this time the number of calls is evenly distributed; during non-working hours the number of calls made is generally small, basically 0.
- Average call duration (that is, the average communication duration mentioned above): the average duration of each call made by the fraudulent number. The average call duration of fraud calls is short, no more than 20 s.
- Distribution of the attribution of the called numbers over time (unit: day) (that is, the number of different attributions of the communication response numbers corresponding to the above-mentioned communication initiation number): fraudsters usually carry out fraud city by city, so the called numbers in these bills usually belong to a certain city; the number of cities to which the called numbers belong within a certain period of time is taken as the feature.
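The four features listed above can be computed from cleaned bill records roughly as follows. This is a minimal sketch under several assumptions: each record is a hypothetical dictionary with the keys `caller`, `callee_city`, `start` (a `datetime`) and `duration_s`; the edit distance is the plain Levenshtein distance (insert, delete, substitute), which does not model the "moving" operation mentioned in the text; and calls per unit time is taken as calls divided by the number of distinct clock hours in which the number placed a call.

```python
from collections import defaultdict
from datetime import datetime


def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def yellow_page_distance(number, yellow_pages):
    """Smallest edit distance between any substring of `number` and a yellow page number."""
    best = None
    for yp in yellow_pages:
        for start in range(len(number)):
            for end in range(start + 1, len(number) + 1):
                d = edit_distance(number[start:end], yp)
                if best is None or d < best:
                    best = d
    return best


def extract_features(records, yellow_pages):
    """Group bill records by calling number and compute the per-number features."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["caller"]].append(rec)
    features = {}
    for caller, recs in grouped.items():
        hours = {r["start"].replace(minute=0, second=0, microsecond=0) for r in recs}
        features[caller] = {
            "yellow_page_distance": yellow_page_distance(caller, yellow_pages),
            "calls_per_active_hour": len(recs) / len(hours),
            "avg_duration_s": sum(r["duration_s"] for r in recs) / len(recs),
            "callee_cities": len({r["callee_city"] for r in recs}),
        }
    return features
```

A smaller `yellow_page_distance` corresponds to a higher similarity with a yellow page number; a distance of 0 means the yellow page number appears verbatim inside the calling number.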
- Deceived user reminder system: informs the recipient of a fraudulent call that the call received is a fraudulent call, preventing the recipient from being deceived, and submits the recipient's feedback information to the offline training system.
- The offline training system extracts the features of the relevant historical bills, retrains the machine learning model, and adjusts the Bayesian classifier (other machine learning algorithms can also be used here, such as an SVM classifier, logistic regression, or deep learning). The offline training system can be divided into three parts:
- a) Extract historical bills: extract the historical bills from the most recent period of time, especially those for which the feedback indicates a wrong identification result.
- b) Feature extraction: extract features from the historical CDRs to provide data for the subsequent model retraining.
- c) Model retraining: using the features extracted in b), train the Bayesian classifier to obtain new parameters, and update the trained machine learning model to the online identification system.
- the online identification system and the offline training system form a complete closed loop.
- the offline training system will decide whether to retrain and update the fraudulent number identification model in the online identification system according to the result of the voice return visit.
- The communication number processing device provided in this embodiment has the following advantages: 1) the user's tag information is not needed, only the bill record is required; 2) the speed and accuracy of fraud number recognition are improved; 3) fraud numbers are identified more accurately, which enables the operator to identify fraudulent calls during the user's call.
- This embodiment further describes a communication number processing device, which can be used to execute the communication number processing method in the embodiments of the present application. The communication number processing device can be implemented in various manners, for example in a user device such as a smart phone, a landline phone, a tablet computer, a notebook computer, or a wearable device (such as smart glasses or a smart watch), or in a network device such as an enterprise gateway or a carrier gateway.
- The communication number processing device may also be a client application or a background server of a user application; for example, when the user application is Tencent Mobile Manager, the corresponding communication number processing device may be the client or the background server of Tencent Mobile Manager. Referring to FIG. 13, the communication number processing device includes:
- the obtaining module 1301 is configured to acquire, from the communication service device, a bill of a preset number of communication numbers in a first preset time;
- the pre-processing module 1302 is configured to parse the CDR to obtain the type of the communication information included in the CDR, and extract at least one type of communication information of each communication number in the CDR and combine to form a pre-processed CDR;
- the parsing module 1303 is configured to parse at least one type of communication information of each communication number in the pre-processed bill, and obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill;
- the extracting module 1304 is configured to extract, from the communication number included in the pre-processed bill, a target communication number that matches the preset feature.
- In summary, the present embodiment parses the bill of the communication number to obtain the features of the corresponding type of communication information of the communication number and, based on those features, identifies from the communication numbers the target communication number that matches the preset feature. On the one hand, because the generation and maintenance of the communication number CDR are generally performed by the operator, the participation of each user is not required, and the CDR can be acquired quickly and efficiently. On the other hand, because the CDR of the communication number is objective data maintained by the operator, it can truly and completely reflect all communication records of the user within a certain time interval. Therefore, the technical solution provided by the embodiment of the present application, being based on the CDR of the communication number, can improve the speed and accuracy of number identification.
- the pre-processing module 1302 is specifically configured to:
- the extracted communication records of the respective communication initiation numbers are combined to form a pre-processed bill.
- the parsing module 1303 is specifically configured to: separately calculate an edit distance between each communication initiation number in the pre-processed bill and a yellow page number; and obtain, based on the edit distance, the similarity between each communication initiation number in the pre-processed bill and the yellow page number;
- the extraction module 1304 is specifically configured to: extract, from the communication initiation numbers included in the pre-processed CDR, a communication initiation number whose similarity with the yellow page number is greater than a first threshold; or, based on the ordering of the communication initiation numbers included in the pre-processed CDR by similarity with the yellow page number, extract a first proportion of the communication initiation numbers with the highest similarity.
- the parsing module 1303 is specifically configured to: extract the communication start time of each communication number in the pre-processed bill acting as a communication initiation number; and calculate the number of communications per unit time of each communication initiation number in the pre-processed bill;
- the extraction module 1304 is specifically configured to: extract, from the communication initiation numbers included in the pre-processed CDR, a communication initiation number whose number of communications per unit time is greater than a second threshold; or, based on the ordering of the communication initiation numbers included in the pre-processed CDR by number of communications per unit time, extract a second proportion of the communication initiation numbers with the highest number of communications.
- the parsing module 1303 is specifically configured to: extract the communication duration of each communication number in the pre-processed bill acting as a communication initiation number; and calculate the average communication duration of each communication initiation number in the pre-processed bill;
- the extraction module 1304 is specifically configured to: extract, from the communication initiation numbers included in the pre-processed CDR, a communication initiation number whose average communication duration is greater than a third threshold; or, based on the ordering of the communication initiation numbers included in the pre-processed CDR by average communication duration, extract a third proportion of the communication initiation numbers with the highest average communication duration.
- the parsing module 1303 is specifically configured to: obtain the attribution of the communication response number corresponding to each communication number in the pre-processed bill acting as a communication initiation number; and calculate the number of different attributions of the communication response numbers corresponding to each communication initiation number in the pre-processed bill;
- the extraction module 1304 is specifically configured to: extract, from the communication initiation numbers included in the pre-processed CDR, a communication initiation number for which the number of different attributions of the corresponding communication response numbers is greater than a fourth threshold; or, based on the ordering of the communication initiation numbers included in the pre-processed CDR by number of different attributions of the corresponding communication response numbers, extract a fourth proportion of the communication initiation numbers with the highest number of different attributions.
- the extraction module 1304 is specifically configured to: use a machine learning model to analyze the features of the corresponding type of communication information of each communication number in the pre-processed bill, and extract, from the communication numbers included in the pre-processed bill, the target communication number that matches the preset feature.
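Each configuration of the extraction module above offers two alternatives: a fixed threshold, or a proportion of the numbers ranked highest on the feature. A minimal sketch of both follows, assuming a hypothetical dictionary that maps each communication initiation number to one feature value; rounding the proportion up to a whole count is also an assumption.

```python
import math


def extract_by_threshold(feature_by_number, threshold):
    """Numbers whose feature value is strictly greater than the threshold."""
    return sorted(n for n, v in feature_by_number.items() if v > threshold)


def extract_top_ratio(feature_by_number, ratio):
    """The given proportion of numbers with the highest feature values."""
    ranked = sorted(feature_by_number, key=feature_by_number.get, reverse=True)
    keep = math.ceil(len(ranked) * ratio)
    return ranked[:keep]
```

The threshold variant needs a calibrated cutoff, while the proportion variant always returns a fixed share of the numbers regardless of the absolute feature values.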
- the communication number processing apparatus of this embodiment also includes the obtaining module 1301, the preprocessing module 1302, the parsing module 1303, and the extracting module 1304 in FIG.
- the communication number processing device of the embodiment further includes:
- the training module 1305 is configured to: receive feedback information of the user side for the target communication number, and determine whether the target communication number is a security number; determine an error rate of the machine learning model based on the number of identified target communication numbers that the user side has fed back as security numbers; and, when the error rate of the machine learning model is greater than the fifth threshold, retrain the machine learning model based on the communication record of the security number in the pre-processed CDR.
- the training module 1305 is specifically configured to: parse at least one type of communication information of the communication record of the security number in the pre-processed bill, and obtain the feature of the at least one type of communication information of the security number; and update, based on that feature, the threshold used by the machine learning model to identify the target communication number.
- the device further includes:
- the response module 1306 is configured to: determine a degree of matching between the feature of the corresponding type of communication information of the target communication number and the preset feature; determine the danger level of the target communication number according to that degree of matching; and respond to the communication behavior of the target communication number based on the danger level of the target communication number.
- the obtaining module 1301, the pre-processing module 1302, the parsing module 1303, the extracting module 1304, the training module 1305, and the response module 1306 may all be implemented by a central processing unit (CPU), a microprocessor (MPU), an application specific integrated circuit (ASIC), or a field programmable gate array (FPGA) located in the communication number processing device.
- This embodiment describes a computer readable medium, which may be a ROM (e.g., a read only memory, a FLASH memory, a transfer device, etc.), a magnetic storage medium (e.g., a magnetic tape, a disk drive, etc.), an optical storage medium (e.g., a CD-ROM, DVD-ROM, paper card, paper tape, etc.), or another well-known type of program memory. The computer-readable medium stores computer-executable instructions (such as binary executable instructions for projection applications such as Tencent Video) which, when executed, cause at least one processor to perform the following operations:
- the target communication number matching the preset feature is extracted from the communication number included in the pre-processed bill.
- The communication number processing device parses the bill of the communication number to obtain the features of the corresponding type of communication information of the communication number and, based on those features, identifies from the communication numbers the target communication number that matches the preset feature. On the one hand, because the operator is generally responsible for generating and maintaining the communication number CDR, the participation of each user is not required, and the CDR can be acquired quickly and efficiently. On the other hand, because the CDR of the communication number is objective data maintained by the operator, it can truly and completely reflect all communication records of the user within a certain time interval. Therefore, the technical solution provided by the embodiment of the present application, which takes the CDR of the communication number as the basis for processing, can improve the speed and accuracy of number identification.
- embodiments of the present application can be provided as a method, system, or computer program product.
- the application can take the form of a hardware embodiment, a software embodiment or an embodiment in combination with software and hardware.
- the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage and optical storage, etc.) including computer usable program code.
- The computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture including an instruction device, and the instruction device implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
- These computer program instructions can also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Signal Processing (AREA)
- Telephonic Communication Services (AREA)
- Mobile Radio Communication Systems (AREA)
Abstract
The invention relates to a communication number processing method and apparatus. The method comprises: acquiring, from a communication service device, a bill of a preset number of communication numbers within a first preset time; parsing the bill to obtain the types of communication information included in the bill, and extracting at least one type of communication information of each communication number in the bill and combining them to form a pre-processed bill; parsing the at least one type of communication information of each communication number in the pre-processed bill to obtain a feature of the corresponding type of communication information of each communication number in the pre-processed bill; and extracting, from the communication numbers included in the pre-processed bill, a target communication number matching a preset feature. The invention can improve the speed and accuracy of number identification.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610261923.1A CN107306306B (zh) | 2016-04-25 | 2016-04-25 | 通信号码处理方法及装置 |
CN201610261923.1 | 2016-04-25 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017186090A1 (fr) | 2017-11-02 |
Family
ID=60150219
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2017/081813 WO2017186090A1 (fr) | Communication number processing method and apparatus | 2016-04-25 | 2017-04-25 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107306306B (fr) |
WO (1) | WO2017186090A1 (fr) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108124065A (zh) * | 2017-12-05 | 2018-06-05 | 浙江鹏信信息科技股份有限公司 | 一种对垃圾电话内容进行识别与处置的方法 |
CN109963276A (zh) * | 2017-12-26 | 2019-07-02 | 恒为科技(上海)股份有限公司 | 一种话单数据处理方法及装置 |
CN108391223B (zh) * | 2018-02-12 | 2020-08-11 | 中国联合网络通信集团有限公司 | 一种确定失联用户的方法及装置 |
CN110401779B (zh) * | 2018-04-24 | 2022-02-01 | 中国移动通信集团有限公司 | 一种识别电话号码的方法、装置和计算机可读存储介质 |
CN109474755B (zh) * | 2018-10-30 | 2020-10-30 | 济南大学 | 基于排序学习和集成学习的异常电话主动预测方法、系统及计算机可读存储介质 |
CN110087230B (zh) * | 2019-04-26 | 2020-09-15 | 同盾控股有限公司 | 数据处理方法、装置、存储介质及电子设备 |
CN111031546B (zh) * | 2019-11-29 | 2023-09-19 | 武汉烽火众智数字技术有限责任公司 | 一种应用于电话号码分析的lr模型训练方法及使用方法 |
CN111131627B (zh) * | 2019-12-20 | 2021-12-07 | 珠海高凌信息科技股份有限公司 | 基于流数据图谱的个人有害呼叫检测方法、装置及可读介质 |
CN113596260B (zh) * | 2020-04-30 | 2022-12-16 | 中国移动通信集团广东有限公司 | 异常电话号码检测方法和电子设备 |
CN111783968B (zh) * | 2020-06-30 | 2024-05-31 | 山东信通电子股份有限公司 | 一种基于云边协同的输电线路监测方法及系统 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101217820A (zh) * | 2008-01-15 | 2008-07-09 | 中兴通讯股份有限公司 | 一种骚扰号码的识别系统及识别方法 |
CN101426203A (zh) * | 2007-11-02 | 2009-05-06 | 华为技术有限公司 | 一种识别恶意骚扰电话的方法和设备 |
EP2278783A1 (fr) * | 2009-06-26 | 2011-01-26 | Vodafone Holding GmbH | Dispositif et procédé de reconnaissance d'appels téléphoniques souhaités et/ou non souhaités en fonction du comportement d'un utilisateur de téléphone |
CN102892117A (zh) * | 2012-09-11 | 2013-01-23 | 北京中创信测科技股份有限公司 | 一种骚扰电话监控系统方法及系统 |
CN105451234A (zh) * | 2015-11-09 | 2016-03-30 | 北京市天元网络技术股份有限公司 | 一种基于信令交互数据的可疑号码分析方法及装置 |
- 2016-04-25: CN application CN201610261923.1A, patent CN107306306B (zh), status Active
- 2017-04-25: WO application PCT/CN2017/081813, patent WO2017186090A1 (fr), status Application Filing
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112887491A (zh) * | 2019-11-29 | 2021-06-01 | 中国电信股份有限公司 | 用户缺失信息获取方法和装置 |
CN112887491B (zh) * | 2019-11-29 | 2023-03-21 | 中国电信股份有限公司 | 用户缺失信息获取方法和装置 |
CN113206909A (zh) * | 2021-04-30 | 2021-08-03 | 中国银行股份有限公司 | 骚扰电话拦截方法及装置 |
CN114745211A (zh) * | 2022-04-26 | 2022-07-12 | 贵阳朗玛通信科技有限公司 | 一种基于话单数据快速匹配策略的方法和装置 |
CN114745211B (zh) * | 2022-04-26 | 2024-06-25 | 贵阳朗玛通信科技有限公司 | 一种基于话单数据快速匹配策略的方法和装置 |
Also Published As
Publication number | Publication date |
---|---|
CN107306306A (zh) | 2017-10-31 |
CN107306306B (zh) | 2020-04-07 |
Legal Events
Code | Title | Description |
---|---|---|
NENP | Non-entry into the national phase | Ref country code: DE |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 17788742; Country of ref document: EP; Kind code of ref document: A1 |
122 | Ep: pct application non-entry in european phase | Ref document number: 17788742; Country of ref document: EP; Kind code of ref document: A1 |