CN110401779B - Method and device for identifying telephone number and computer readable storage medium - Google Patents


Info

Publication number
CN110401779B
Authority
CN
China
Prior art keywords
call
attribute
behavior data
user
model
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810372550.4A
Other languages
Chinese (zh)
Other versions
CN110401779A (en)
Inventor
贺小红
庄仁峰
胡文辉
叶天宽
黄鹤羽
何亚玲
卓彩霞
黄浩
曹阳
潘锦彬
陈德志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Internet Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Internet Co Ltd
Application filed by China Mobile Communications Group Co Ltd and China Mobile Internet Co Ltd
Priority to CN201810372550.4A
Publication of CN110401779A
Application granted
Publication of CN110401779B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/66 Substation equipment, e.g. for use by subscribers with means for preventing unauthorised or fraudulent calling
    • H04M1/663 Preventing unauthorised calls to a telephone set
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • H04M3/2281 Call monitoring, e.g. for law enforcement purposes; Call tracing; Detection or prevention of malicious calls
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W12/00 Security arrangements; Authentication; Protecting privacy or anonymity
    • H04W12/12 Detection or prevention of fraud

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Technology Law (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The embodiment of the invention discloses a method for identifying a telephone number, comprising: acquiring user call behavior data and attribute information of all telephone numbers in the user call behavior data; performing model training on the user call behavior data and the attribute information by using a machine learning algorithm to obtain a crank call model; performing data analysis on the user call behavior data and the attribute information to obtain an attribute screening model; and performing, according to the crank call model and the attribute screening model, attribute identification on a telephone number that initiates a new call request to obtain the attribute of that telephone number. Because machine learning model training and data analysis are performed separately on multiple types of data, the method obtains a highly reliable crank call model and attribute screening model, and identifying each telephone number with both models further improves identification accuracy.

Description

Method and device for identifying telephone number and computer readable storage medium
Technical Field
The present invention relates to the field of mobile communications technologies, and in particular, to a method and an apparatus for identifying a phone number, and a computer-readable storage medium.
Background
Many website registrations and in-store purchases require users to provide their mobile phone numbers, which greatly increases the chance that these numbers leak to lawbreakers; as a result, almost every user receives harassing calls for advertising, promotion, or fraud. To help users identify harassing calls in advance, some existing cloud computing platforms perform big data processing and machine learning model training on the call behavior data (such as call records) of all telephone numbers, obtain a machine learning model that uses harassing call characteristics as classification parameters, and identify any telephone number with that model. In addition, most existing mobile phones provide a marking function: once any user marks a calling number as a harassing call on a mobile phone, whenever that number calls any mobile phone number, the harassing call mark is displayed on the interface of the called phone to warn the called user.
In the prior art, crank calls are identified either by a machine learning model built from crank call characteristics extracted from communication behavior data, or by the marks that mobile phone users attach to telephone numbers. Both approaches rely on a single source of data, so crank calls are difficult to identify accurately. When crank calls are identified only by a machine learning model built from crank call characteristics, the identification accuracy depends entirely on the accuracy of that model, and high-frequency non-crank numbers such as food-delivery, express-delivery, and taxi numbers are frequently misidentified as high-frequency crank numbers. When harassing calls are identified only by users' marks on telephone numbers, some users may mark numbers maliciously, so identification accuracy still needs to be improved.
Disclosure of Invention
The main objective of the invention is to provide a method and a device for identifying a telephone number and a computer-readable storage medium, so as to solve the problem that existing telephone number identification methods rely on a basis of low reliability, which reduces the accuracy of telephone number identification.
The technical scheme of the invention is realized as follows:
The embodiment of the invention provides a method for identifying a telephone number, which comprises the following steps:
acquiring user call behavior data and attribute information of all telephone numbers in the user call behavior data;
performing model training on the user call behavior data and attribute information of all telephone numbers in the user call behavior data by adopting a machine learning algorithm to obtain a crank call model;
performing data analysis on the user call behavior data and attribute information of all telephone numbers in the user call behavior data to obtain an attribute screening model, wherein the attribute screening model is used for expressing a standard for determining the attribute of the telephone numbers;
and according to the crank call model and the attribute screening model, performing attribute identification on the telephone numbers initiating the new call requests to obtain the attribute of each telephone number initiating the new call request.
In the above scheme, performing attribute identification on the telephone numbers initiating the new call requests according to the crank call model and the attribute screening model to obtain the attribute of each telephone number initiating a new call request includes:
according to the crank call model, carrying out type prediction on the telephone number initiating the new call request to obtain the prediction type of the telephone number initiating the new call request; the prediction type is a crank call or a non-crank call;
and according to the attribute screening model, performing attribute identification on each telephone number, among the telephone numbers initiating the new call requests, whose predicted type is crank call, to obtain the attribute of that telephone number.
In the above scheme, the performing data analysis on the user call behavior data and the attribute information of all telephone numbers in the user call behavior data to obtain an attribute screening model includes:
selecting classification parameters from the user call behavior data and the attribute information of all telephone numbers in the user call behavior data, and establishing an attribute screening model according to the classification parameters.
In the above scheme, the performing model training on the user call behavior data and attribute information of all telephone numbers in the user call behavior data by using a machine learning algorithm to obtain a crank call model includes:
classifying and sorting the user call behavior data to obtain call behavior characteristics of each telephone number in the user call behavior data, wherein the call behavior characteristics comprise at least one of the following items: number of historical outgoing calls, historical outgoing call duration, number of historical incoming calls, historical incoming call duration, number of connected calls, and number of unconnected calls;
classifying and sorting the attribute information of all telephone numbers in the user call behavior data to obtain the attribute characteristics of each telephone number in the user call behavior data, wherein the attribute characteristic is one of: crank call, express or food-delivery call, enterprise call, rejected call, priority-answer call, intermediate-number call, or frequent-contact call;
and performing model training on the call behavior characteristics and the attribute characteristics of each telephone number in the user call behavior data by adopting a machine learning algorithm to obtain the crank call model.
In the above scheme, the classifying and sorting the attribute information of all the telephone numbers in the user call behavior data to obtain the attribute characteristics of each telephone number in the user call behavior data includes:
acquiring N undetermined attributes of each telephone number in the user call behavior data according to attribute information of all telephone numbers in the user call behavior data; wherein N is an integer greater than or equal to 1;
and screening the N undetermined attributes of each telephone number in the user call behavior data to obtain the attribute characteristics of each telephone number in the user call behavior data.
In the above scheme, the attribute information of all telephone numbers in the user call behavior data includes at least one of the following items:
user mark information, enterprise authentication information, enterprise yellow page information, telephone blacklist information and telephone white list information.
In the above scheme, the attribute screening model includes at least one of: an intermediate-number telephone model and a frequent-contact telephone model.
In the above scheme, performing attribute identification on the telephone number initiating the new call request according to the crank call model and the attribute screening model includes:
when a preset prediction condition is met, performing attribute identification on the telephone number initiating the new call request according to the crank call model and the attribute screening model, wherein the preset prediction condition comprises at least one of the following items:
the number of the telephone numbers initiating the new call request is greater than or equal to a preset number threshold;
the time interval from the current time to the last update time of the attribute of the telephone number is greater than or equal to a preset time threshold.
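The two preset prediction conditions above amount to a batch-or-staleness trigger. The following is a minimal sketch, assuming the pending numbers are kept as a set of distinct numbers and that the hypothetical defaults `NUMBER_THRESHOLD` and `TIME_THRESHOLD` stand in for the unspecified preset thresholds; the function name `should_run_identification` is illustrative, not from the source.

```python
from datetime import datetime, timedelta

# Hypothetical placeholder values; the text only says the thresholds are "preset".
NUMBER_THRESHOLD = 1000
TIME_THRESHOLD = timedelta(hours=24)


def should_run_identification(pending_numbers, last_update, now,
                              number_threshold=NUMBER_THRESHOLD,
                              time_threshold=TIME_THRESHOLD):
    """Return True when at least one preset prediction condition holds.

    pending_numbers: set of distinct numbers that initiated new call requests.
    last_update: datetime of the last attribute update.
    """
    # Condition 1: enough new calling numbers have accumulated.
    enough_numbers = len(pending_numbers) >= number_threshold
    # Condition 2: the stored attributes are older than the time threshold.
    stale = (now - last_update) >= time_threshold
    return enough_numbers or stale
```

Checking either condition with `or` matches the "at least one of" wording; batching this way lets the models run over many numbers at once instead of on every single call request.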
The embodiment of the invention also provides a device for identifying a telephone number, which comprises a memory and a processor, wherein:
the memory is configured to store a computer program; and
the processor, when executing the computer program, is configured to perform the following steps:
acquiring user call behavior data and attribute information of all telephone numbers in the user call behavior data;
performing model training on the user call behavior data and attribute information of all telephone numbers in the user call behavior data by adopting a machine learning algorithm to obtain a crank call model;
performing data analysis on the user call behavior data and attribute information of all telephone numbers in the user call behavior data to obtain an attribute screening model, wherein the attribute screening model is used for expressing a standard for determining the attribute of the telephone numbers;
and according to the crank call model and the attribute screening model, performing attribute identification on the telephone numbers initiating the new call requests to obtain the attribute of each telephone number initiating the new call request.
In the foregoing solution, the processor is specifically configured to, when running the computer program, execute the following steps:
according to the crank call model, carrying out type prediction on the telephone number initiating the new call request to obtain the prediction type of the telephone number initiating the new call request; the prediction type is a crank call or a non-crank call;
and according to the attribute screening model, performing attribute identification on each telephone number, among the telephone numbers initiating the new call requests, whose predicted type is crank call, to obtain the attribute of that telephone number.
In the foregoing solution, the processor is specifically configured to, when running the computer program, execute the following steps:
selecting classification parameters from the user call behavior data and the attribute information of all telephone numbers in the user call behavior data, and establishing an attribute screening model according to the classification parameters.
In the foregoing solution, the processor is specifically configured to, when running the computer program, execute the following steps:
classifying and sorting the user call behavior data to obtain call behavior characteristics of each telephone number in the user call behavior data, wherein the call behavior characteristics comprise at least one of the following items: number of historical outgoing calls, historical outgoing call duration, number of historical incoming calls, historical incoming call duration, number of connected calls, and number of unconnected calls;
classifying and sorting the attribute information of all telephone numbers in the user call behavior data to obtain the attribute characteristics of each telephone number in the user call behavior data, wherein the attribute characteristic is one of: crank call, express or food-delivery call, enterprise call, rejected call, priority-answer call, intermediate-number call, or frequent-contact call;
and performing model training on the call behavior characteristics and the attribute characteristics of each telephone number in the user call behavior data by adopting a machine learning algorithm to obtain the crank call model.
In the foregoing solution, the processor is specifically configured to, when running the computer program, execute the following steps:
acquiring N undetermined attributes of each telephone number in the user call behavior data according to attribute information of all telephone numbers in the user call behavior data; wherein N is an integer greater than or equal to 1;
and screening the N undetermined attributes of each telephone number in the user call behavior data to obtain the attribute characteristics of each telephone number in the user call behavior data.
In the above scheme, the attribute information of all telephone numbers in the user call behavior data includes at least one of the following items:
user mark information, enterprise authentication information, enterprise yellow page information, telephone blacklist information and telephone white list information.
In the above scheme, the attribute screening model includes at least one of: an intermediate-number telephone model and a frequent-contact telephone model.
In the foregoing solution, the processor is specifically configured to, when running the computer program, execute the following steps:
when a preset prediction condition is met, performing attribute identification on the telephone number initiating the new call request according to the crank call model and the attribute screening model, wherein the preset prediction condition comprises at least one of the following items:
the number of the telephone numbers initiating the new call request is greater than or equal to a preset number threshold;
the time interval from the current time to the last update time of the attribute of the telephone number is greater than or equal to a preset time threshold.
An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program,
the computer program, when executed by at least one processor, causes the at least one processor to perform the steps of any one of the above-described methods of identifying a telephone number.
The embodiment of the invention provides a method for identifying telephone numbers, comprising: obtaining user call behavior data and attribute information of all telephone numbers in the user call behavior data; performing model training on the user call behavior data and the attribute information by using a machine learning algorithm to obtain a crank call model; performing data analysis on the user call behavior data and the attribute information to obtain an attribute screening model, wherein the attribute screening model expresses a standard for determining the attribute of a telephone number; and performing, according to the crank call model and the attribute screening model, attribute identification on the telephone numbers initiating new call requests to obtain the attribute of each such telephone number. In the embodiment of the invention, machine learning model training and data analysis are thus performed separately on multiple types of data to obtain a highly reliable crank call model and attribute screening model, and each telephone number is then identified with both models, which further improves the identification accuracy for each telephone number.
Drawings
Fig. 1 is a first flowchart of a method for identifying a phone number according to an embodiment of the present invention;
Fig. 2 is a second flowchart of a method for identifying a phone number according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an apparatus for identifying a phone number according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an intelligent anti-harassment system according to an embodiment of the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention.
Example one
An embodiment of the present invention provides a method for identifying a phone number, as shown in fig. 1, where the method includes:
Step S101: Acquire the user call behavior data and the attribute information of all telephone numbers in the user call behavior data.
In practical implementation, a user may initiate a call request through a mobile phone, a fixed-line phone, a network phone, or a pseudo base station. For all telephone numbers that have initiated call requests, the corresponding user call behavior data may be obtained from an operator's server, for example by acquiring the user call behavior data of all telephone numbers that initiated call requests within a preset time period. The user call behavior data of each telephone number initiating a call request may include at least one of: all called numbers this telephone number has called, all calling numbers that have called this telephone number, the call duration and call time for each called number, all telephone numbers that received short messages from this telephone number, all telephone numbers that sent short messages to this telephone number, the number of answered calls, the number of unanswered calls, and the like;
the attribute information of all telephone numbers in the user call behavior data is then obtained by web crawling, by searching with a search engine, or by querying related databases. The attribute information may include at least one of: user mark information, enterprise authentication information, enterprise yellow page information, telephone blacklist information, and telephone whitelist information. For example, all telephone numbers in the user call behavior data are fed to a web crawler, which searches search engines such as www.baidu.com and www.so.com for attribute information related to each telephone number to be identified, and which can also collect additional user mark information from the World Wide Web.
Optionally, the acquired user call behavior data and attribute information are stored in a distributed manner on a big data platform; for example, a big data platform built on the Hadoop distributed system can perform high-speed computation on, and storage of, massive data. Steps S102 to S104 may likewise be performed with the distributed processing capability of the big data platform.
Further, after the user call behavior data and the attribute information of all telephone numbers in it are acquired, the method may further include: classifying and sorting the user call behavior data to obtain the call behavior characteristics of each telephone number in the user call behavior data, where the call behavior characteristics may include at least one of: a list of historically called numbers, the number of historical outgoing calls, historical outgoing call duration, the number of historical incoming calls, historical incoming call duration, the number of connected calls, and the number of unconnected calls. Illustratively, with a preset time period of one day, the call behavior characteristics obtained for each telephone number may include at least one of: the daily cumulative list of called numbers, the daily cumulative number of outgoing calls, the daily average outgoing call duration, the daily proportion of distinct peer numbers, the daily proportion of short calls, the daily cumulative number of incoming calls, the daily average incoming call duration, the daily cumulative number of distinct calling peers, the daily cumulative number of roaming location changes, and the like;
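The per-number aggregation described above can be sketched over one period of call records. This is a minimal sketch, assuming a hypothetical `CallRecord` type with caller, callee, and talk time, treating a duration of 0 as an unconnected call, and using a hypothetical 10-second cutoff for "short calls"; only a subset of the listed characteristics is computed.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    # Hypothetical record layout; real operator CDRs carry many more fields.
    caller: str
    callee: str
    duration_s: int  # talk time in seconds; 0 is treated as "not connected"


def call_behavior_features(records, short_call_s=10):
    """Aggregate one period (e.g. one day) of call records into per-number features."""
    raw = defaultdict(lambda: {
        "calling_count": 0, "calling_duration": 0,
        "called_count": 0, "called_duration": 0,
        "answered_count": 0, "unanswered_count": 0,
        "short_call_count": 0, "called_numbers": set(),
    })
    for r in records:
        out = raw[r.caller]  # statistics of the calling (outgoing) side
        out["calling_count"] += 1
        out["calling_duration"] += r.duration_s
        out["called_numbers"].add(r.callee)
        if r.duration_s > 0:
            out["answered_count"] += 1
            if r.duration_s < short_call_s:
                out["short_call_count"] += 1
        else:
            out["unanswered_count"] += 1
        inc = raw[r.callee]  # statistics of the called (incoming) side
        inc["called_count"] += 1
        inc["called_duration"] += r.duration_s

    features = {}
    for number, f in raw.items():
        n_out = f["calling_count"]
        features[number] = {
            **f,
            # Derived ratios; numbers with no outgoing calls get 0.0.
            "avg_calling_duration": f["calling_duration"] / n_out if n_out else 0.0,
            "distinct_peer_ratio": len(f["called_numbers"]) / n_out if n_out else 0.0,
            "short_call_ratio": f["short_call_count"] / n_out if n_out else 0.0,
        }
    return features
```

On a Hadoop-style platform the same logic would typically be expressed as a group-by on the caller and callee keys; the single-process loop here only illustrates which counters feed each characteristic.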
classifying and sorting the attribute information of all telephone numbers in the user call behavior data to obtain the attribute characteristics of each telephone number, where the attribute characteristic may be: crank call, non-crank call, express or food-delivery call, enterprise call, rejected call, priority-answer call, intermediate-number call, or frequent-contact call.
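The text does not state how the N candidate attributes gathered from different sources are reduced to a single attribute characteristic. The sketch below shows one hypothetical rule set, assuming a precedence of whitelist, then blacklist, then enterprise information, then user marks, with a vote threshold on user marks (which also blunts the malicious marking mentioned in the background). `attribute_characteristic`, `min_marks`, and `min_share` are illustrative names and values, not from the source.

```python
from collections import Counter


def attribute_characteristic(number, user_marks, blacklist, whitelist,
                             enterprise_numbers, min_marks=3, min_share=0.6):
    """Choose one attribute characteristic from candidate attributes (hypothetical rules).

    user_marks: dict mapping a number to the list of labels users gave it.
    Returns None when the evidence is too weak to label the number.
    """
    # Hypothetical precedence: curated lists override crowd-sourced marks.
    if number in whitelist:
        return "non-crank call"
    if number in blacklist:
        return "crank call"
    if number in enterprise_numbers:  # enterprise authentication or yellow-page hit
        return "enterprise call"
    marks = Counter(user_marks.get(number, []))
    if marks:
        label, votes = marks.most_common(1)[0]
        # Require both an absolute count and a clear majority among the marks.
        if votes >= min_marks and votes / sum(marks.values()) >= min_share:
            return label
    return None
```

Numbers that come back as `None` would simply be left out of the labeled training set for the crank call model rather than guessed.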
Step S102: Perform model training on the user call behavior data and the attribute information of all telephone numbers in the user call behavior data by using a machine learning algorithm to obtain a crank call model.
In practical implementation, all call behavior characteristics are used as classification parameters for building the crank call model. Each telephone number in the user call behavior data is labeled as a crank call or a non-crank call according to its attribute characteristics, its call behavior characteristics are used as the input, and a supervised learning algorithm is used for model training to obtain the crank call model.
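The text only says "a supervised learning algorithm"; logistic regression is swapped in here as one example of such an algorithm. This is a minimal stdlib-only sketch, assuming the call behavior characteristics have already been scaled to comparable ranges and the labels (1 = crank call, 0 = non-crank call) come from the attribute characteristics; `train_crank_call_model` and `predict_crank` are hypothetical names.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_crank_call_model(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent.

    X: list of scaled call behavior feature vectors, one per telephone number.
    y: list of labels, 1 = crank call, 0 = non-crank call.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error of the current model on one number.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average gradient step over the whole batch.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_crank(model, x, threshold=0.5):
    """Return True if the number is predicted to be a crank call."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold
```

For example, with toy feature vectors of the form (scaled outgoing call count, short-call ratio, scaled incoming call count), numbers that call a lot, mostly briefly, and are rarely called back end up on the crank side. In production one would more likely use a library classifier (gradient-boosted trees, for instance) on the big data platform.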
Step S103: Perform data analysis on the user call behavior data and the attribute information of all telephone numbers in the user call behavior data to obtain an attribute screening model, where the attribute screening model expresses the standard for determining the attribute of a telephone number.
In actual implementation, to improve the identification accuracy of telephone numbers, in addition to obtaining the crank call model through a machine learning algorithm, classification parameters can be selected from the user call behavior data and the attribute information according to the features that distinguish all telephone numbers with a given attribute characteristic from telephone numbers with other attribute characteristics. An attribute screening model for that attribute characteristic is built from these classification parameters, and repeating this yields attribute screening models for several different attribute characteristics. The attribute screening models may include at least one of: an intermediate-number telephone model, a frequent-contact telephone model, and the like. Each attribute screening model can uniquely identify the telephone numbers that have one attribute characteristic, and the identification results of the crank call model are further screened with the attribute screening models to obtain more accurate results.
Illustratively, an intermediate number works on the principle of flexibly binding a virtual auxiliary number. When an O2O (Online To Offline) transaction order is generated, the O2O platform randomly allocates an intermediate number to both transaction parties as a temporary contact number. The intermediate number is bound to the transaction order and is unbound and recycled after the transaction is completed; during calls between the two parties only the intermediate number is displayed, which effectively protects the real telephone numbers of both parties. For example, when telephone number A calls telephone number B through intermediate number C, only C is displayed on the terminals of A and B, and the resulting call behavior data records that A calls C and, at the same time, C calls B. An intermediate number therefore has the distinctive characteristic that its number of outgoing calls equals its number of incoming calls and its outgoing call duration equals its incoming call duration. On this basis, an intermediate-number telephone model can be built whose classification parameters are four call behavior characteristics: the number of outgoing calls, the number of incoming calls, the outgoing call duration, and the incoming call duration;
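The intermediate-number rule above (outgoing count equals incoming count, outgoing duration equals incoming duration) can be written as a predicate over the four classification parameters. This is a minimal sketch, assuming the features arrive as a dict with the hypothetical keys shown; the relative `tolerance` is an added assumption for real data where the two call legs may be billed a few seconds apart, and the default of 0.0 reproduces the strict equality in the text.

```python
def is_intermediate_number(f, tolerance=0.0):
    """Hypothetical intermediate-number telephone model.

    f: dict with calling_count, called_count, calling_duration, called_duration.
    tolerance: allowed relative gap between outgoing and incoming duration.
    """
    if f["calling_count"] == 0:
        return False  # a number that never relays a call cannot be an intermediate number
    same_count = f["calling_count"] == f["called_count"]
    gap = abs(f["calling_duration"] - f["called_duration"])
    same_duration = gap <= tolerance * max(f["calling_duration"], f["called_duration"])
    return same_count and same_duration
```

Because every relayed call produces one incoming leg (A to C) and one outgoing leg (C to B), the two counters of a true intermediate number move in lockstep, whereas a crank number typically has many outgoing and few incoming calls.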
a frequent-contact relationship is found by analyzing the call behavior data of the telephone numbers of user 1 and user 2: when the telephone number of user 3 appears in the call behavior data of both user 1 and user 2, user 3 is a common contact of user 1 and user 2, and the more common contacts user 1 and user 2 share, the higher their intimacy and the more likely they are frequent contacts of each other. In addition, the common contacts of user 1 and user 2 can be obtained from their address book friends, enterprise circle (enterprise address book), family circle (home network), and the like. Illustratively, the called-number lists of any two telephone numbers A and B in the user call behavior data are obtained; when the two lists contain i identical called numbers, the intimacy value of A and B equals i, where i is an integer greater than or equal to 1. When the intimacy value of A and B is greater than a preset intimacy threshold, A and B are the telephone numbers of two frequent contacts. A frequent-contact telephone model can thus be built whose classification parameters are the called-number list and the intimacy value.
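The intimacy value described above is the size of the intersection of two called-number lists. A minimal sketch, assuming the called-number lists are held in a dict keyed by telephone number; `intimacy` and `frequent_contact_pairs` are hypothetical names, and the threshold comparison is strictly greater than, as in the text.

```python
from itertools import combinations


def intimacy(called_lists, a, b):
    """Intimacy of A and B: number of identical numbers in their called-number lists."""
    return len(set(called_lists.get(a, ())) & set(called_lists.get(b, ())))


def frequent_contact_pairs(called_lists, threshold):
    """Return all pairs (A, B) whose intimacy value exceeds the preset threshold."""
    pairs = set()
    # Each unordered pair is visited once; sorting keeps (A, B) ordering stable.
    for a, b in combinations(sorted(called_lists), 2):
        if intimacy(called_lists, a, b) > threshold:
            pairs.add((a, b))
    return pairs
```

The all-pairs loop is quadratic in the number of telephone numbers, so at operator scale one would more likely build an inverted index from each called number to its callers and count co-occurrences per pair; the result is the same intimacy value.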
It should be noted that, in the embodiment of the present invention, the execution order of step S102 and step S103 is not limited, for example, step S102 may be executed before step S103, or may be executed after step S103, or both may be executed simultaneously.
Step S104: Perform, according to the crank call model and the attribute screening model, attribute identification on the telephone numbers initiating new call requests to obtain the attribute of each telephone number initiating a new call request.
In actual implementation, type prediction is performed on each telephone number initiating a new call request according to the crank call model to obtain its predicted type, which is crank call or non-crank call. Correspondingly, the attribute of a telephone number whose predicted type is non-crank call is determined as non-crank call, and the attribute of a telephone number whose predicted type is crank call is determined according to the attribute screening model;
that is, attribute identification is performed with the attribute screening model on the telephone numbers whose predicted type is crank call to obtain their attributes. Exemplarily, the attribute screening model may include an intermediate-number model and a frequent-contact model. When at least one preset attribute judgment condition is met, the attribute of a telephone number predicted as a crank call is determined as non-crank call; otherwise, its attribute is determined as crank call. The preset attribute judgment conditions include: the intermediate-number model identifies the telephone number predicted as a crank call as an intermediate number; and the frequent-contact model identifies the telephone number predicted as a crank call as the telephone number of a frequent contact.
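The two-stage decision above can be sketched as follows. This is a minimal sketch, assuming each model is wrapped as a predicate over a telephone number (for instance, closures around the trained crank call model and the screening rules); `identify_attribute` is a hypothetical name, and the attribute strings mirror the wording of the text.

```python
from typing import Callable, Iterable


def identify_attribute(number: str,
                       crank_model: Callable[[str], bool],
                       screening_models: Iterable[Callable[[str], bool]]) -> str:
    """Two-stage identification: crank call model first, then attribute screening models."""
    # Stage 1: a number predicted as non-crank keeps that attribute.
    if not crank_model(number):
        return "non-crank call"
    # Stage 2: a predicted crank call is overturned if any screening model matches,
    # e.g. the number is an intermediate number or a frequent contact.
    if any(model(number) for model in screening_models):
        return "non-crank call"
    return "crank call"
```

Running the screening models only on numbers already predicted as crank calls keeps their cost proportional to the (usually small) suspect set, and it is exactly where the background says the single-model approach makes its mistakes.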
It should be noted that steps S101 to S104 may be implemented on a big data platform using distributed processing and distributed storage.
Thus, in this embodiment of the invention, user call behavior data and the attribute information of all telephone numbers in it are obtained; a machine learning algorithm is used to train a crank call model on these data; data analysis of the same data yields an attribute screening model; and the two models together identify the attribute of each telephone number that initiates a new call request. Because each attribute screening model is built from characteristics unique to certain telephone numbers, it can accurately identify all telephone numbers that have a particular attribute. The attribute screening model therefore further checks and filters the output of the crank call model, which improves the identification accuracy for each telephone number initiating a new call request.
Example two
To further illustrate the object of the present invention, the above embodiment is elaborated below.
An embodiment of the present invention provides a method for identifying a phone number, as shown in fig. 2, where the method includes:
Step S201: classify and sort the user call behavior data and the attribute information of all telephone numbers in it to obtain the call behavior features and attribute features of each telephone number in the user call behavior data.
In practical implementation, the user call behavior data are classified and sorted to obtain the call behavior features of each telephone number, and the attribute information of all telephone numbers is classified and sorted to obtain the attribute features of each telephone number.
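To illustrate how raw call records might be classified and sorted into per-number call behavior features, the sketch below aggregates hypothetical call detail records. The record layout (`CallRecord`) and feature names such as `outgoing_count` are assumptions, as is treating a zero duration as an unanswered call.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Set


class CallRecord(NamedTuple):
    caller: str
    callee: str
    duration_s: int  # assumption: 0 means the call was not answered


def call_behavior_features(records: List[CallRecord]) -> Dict[str, Dict[str, float]]:
    """Group call records by number and accumulate per-number features."""
    feats: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    callees: Dict[str, Set[str]] = defaultdict(set)
    for r in records:
        out = feats[r.caller]
        out["outgoing_count"] += 1
        out["outgoing_duration_s"] += r.duration_s
        out["answered_count" if r.duration_s > 0 else "unanswered_count"] += 1
        callees[r.caller].add(r.callee)
        inc = feats[r.callee]
        inc["incoming_count"] += 1
        inc["incoming_duration_s"] += r.duration_s
    for number, called in callees.items():
        feats[number]["distinct_callees"] = len(called)
    # Only features that actually occurred are present; absent ones are the
    # "null values" that a later cleaning step fills in.
    return {number: dict(f) for number, f in feats.items()}


records = [
    CallRecord("A", "B", 60),
    CallRecord("A", "C", 0),
    CallRecord("B", "A", 30),
]
features = call_behavior_features(records)
```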
Further, classifying and sorting the attribute information of all telephone numbers in the user call behavior data to obtain the attribute features of each telephone number includes the following steps:
S2011: obtain N pending attributes of each telephone number in the user call behavior data from the attribute information of all telephone numbers, where N is an integer greater than or equal to 1. Each pending attribute may be, for example, a crank call, an express or food delivery call, an enterprise call, a rejected call, a preferred answering call, an intermediate number call, or a frequent contact call.
For example, the attribute information may include user tagging information, enterprise authentication information, enterprise yellow page information, telephone blacklist information, telephone whitelist information, intermediate number information, and frequent contact information.
In practical implementation, the pending attributes are obtained from each type of attribute information as follows. User tagging information may be comments or tags that users place on any telephone number in any APP (application program); from it, each tagged number obtains a pending attribute of crank call, fraud call, advertising/promotion call, or express/food delivery call. Enterprise authentication information and enterprise yellow page information may be enterprise information, including enterprise telephone numbers, that each enterprise adds to an enterprise authentication management system or an enterprise yellow page management system; from it, each enterprise telephone number obtains the pending attribute of enterprise call. Telephone blacklist information may be the telephone numbers that a user has barred from calling the user's own number; each blacklisted number obtains the pending attribute of rejected call. Telephone whitelist information may be the telephone numbers that a user has allowed to call the user's own number; each whitelisted number obtains the pending attribute of preferred answering call. An intermediate number is a temporary telephone number assigned to both parties of a call and displayed as the caller ID to each of them; the pending attribute of intermediate number call is obtained from APPs that provide an intermediate number function, or by analyzing the user call behavior data. As for frequent contacts, when the contact lists of two telephones share a large number of identical telephone numbers, the two telephones are regarded as frequent contact telephones of each other, and this pending attribute is derived by analyzing the user call behavior data.
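The frequent contact rule can be sketched by counting the numbers shared between two contact lists. The threshold `min_common` stands in for the unspecified "higher" count and is a hypothetical parameter; the brute-force pairwise loop is only for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set


def frequent_contact_pairs(
    contact_lists: Dict[str, Set[str]], min_common: int
) -> Set[FrozenSet[str]]:
    """Return unordered pairs of numbers whose contact lists share at least
    `min_common` numbers (min_common is a hypothetical preset threshold)."""
    pairs: Set[FrozenSet[str]] = set()
    # Brute-force O(n^2) pairwise comparison; fine for a sketch only.
    for a, b in combinations(sorted(contact_lists), 2):
        if len(contact_lists[a] & contact_lists[b]) >= min_common:
            pairs.add(frozenset((a, b)))
    return pairs


contacts = {
    "A": {"x", "y", "z"},
    "B": {"x", "y", "w"},
    "C": {"q"},
}
pairs = frequent_contact_pairs(contacts, min_common=2)
```

At big data platform scale, an inverted index from each contact to the numbers that list it would avoid comparing every pair.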
S2012: screen the N pending attributes of each telephone number in the user call behavior data to obtain the attribute features of each telephone number.
For example, the N pending attributes of each telephone number may be screened according to a preset rule. First, the attribute information sources are ranked by credibility from high to low, for example: user tagging information, enterprise authentication information, enterprise yellow page information, telephone blacklist information, telephone whitelist information, intermediate number information, and frequent contact information. Second, according to this ranking, the pending attribute corresponding to the most credible attribute information is selected from the N pending attributes of each telephone number.
Third, the attribute features are divided into crank calls and non-crank calls: the pending attributes classified as crank calls are crank call, fraud call, advertising call, and rejected call; the pending attributes classified as non-crank calls are enterprise call, preferred answering call, intermediate number call, and frequent contact call. The attribute feature of each telephone number is then obtained from this division and from the pending attribute selected from that number's most credible attribute information.
Further, when dividing pending attributes into crank calls and non-crank calls, the number of times each pending attribute recurs can be counted. A pending attribute classified as a crank call is confirmed as a crank call only if its occurrence count is not less than a preset count threshold; otherwise it is reclassified as a non-crank call.
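The screening in S2012 can be sketched as follows: pick the attribute from the most credible source, map it to crank or non-crank, and confirm a crank label only when it recurs often enough. The source identifiers and `min_count` are hypothetical, and treating attributes the text leaves unclassified (such as express delivery calls) as non-crank is an assumption.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Credibility ranking from the example above, most credible first
# (source identifiers are hypothetical labels).
CREDIBILITY_ORDER = [
    "user_tagging",
    "enterprise_authentication",
    "enterprise_yellow_pages",
    "blacklist",
    "whitelist",
    "intermediate_number",
    "frequent_contacts",
]
RANK: Dict[str, int] = {source: i for i, source in enumerate(CREDIBILITY_ORDER)}

# Pending attributes divided into the crank call class; anything else is
# treated as non-crank here (an assumption for unclassified attributes).
CRANK_ATTRIBUTES = {"crank call", "fraud call", "advertising call", "rejected call"}


def attribute_feature(pending: List[Tuple[str, str]], min_count: int) -> str:
    """Derive one number's attribute feature from its (source, attribute) reports.

    `pending` may contain repeats (e.g. several users tagging the same number);
    `min_count` is a hypothetical preset count threshold.
    """
    if not pending:
        raise ValueError("no pending attributes")
    # Select the attribute reported by the most credible source.
    _, selected = min(pending, key=lambda pair: RANK[pair[0]])
    # Confirm a crank attribute only if it recurs often enough.
    occurrences = Counter(attribute for _, attribute in pending)
    if selected in CRANK_ATTRIBUTES and occurrences[selected] >= min_count:
        return "crank call"
    return "non-crank call"


tagged = [("user_tagging", "fraud call")] * 3 + [("whitelist", "preferred answering call")]
result = attribute_feature(tagged, min_count=3)
```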
Step S202: perform data cleaning on the call behavior features of all telephone numbers in the user call behavior data to obtain the cleaned call behavior features.
In practical implementation, classifying and sorting even a limited amount of user call behavior data yields a large number of call behavior features, but not all of them benefit machine learning model training, so the features must be cleaned. Data cleaning means finding and correcting identifiable errors in a data file; it includes checking data consistency, handling invalid values, handling missing values, and so on.
For example, data cleaning of the call behavior features of all telephone numbers in the user call behavior data may include the following steps:
S2021: delete invalid columns from the call behavior features of all telephone numbers. Two kinds of data are mainly deleted: first, sparse columns, that is, among tens of thousands of rows of historical data, the few columns that contain data in fewer than 1000 rows, meaning that this type of data accounts for a very small proportion of the whole historical data; second, data in the historical data that is irrelevant to call behavior features.
For example, deleting invalid columns from the call behavior features of all telephone numbers may include at least one of the following:
if the number of features in the call behavior features of a telephone number is smaller than a preset feature-count threshold, deleting that telephone number and its call behavior features, where the threshold may be set according to the total number of distinct features across the call behavior features of all telephone numbers;
if the number of telephone numbers containing a given call behavior feature is smaller than a preset number threshold, deleting that feature from the call behavior features of all telephone numbers;
and deleting the yes/no (Boolean) features from the call behavior features of all telephone numbers, because these yes/no values are themselves derived from the user call behavior data.
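The three deletion rules might look like this in Python. The thresholds are hypothetical presets, the rules are applied in the order listed (the text does not fix an order), and modelling yes/no features as Python `bool` values is an assumption.

```python
from collections import Counter
from typing import Dict, Union

Value = Union[bool, int, float]
Features = Dict[str, Dict[str, Value]]


def delete_invalid_columns(data: Features, min_features: int, min_numbers: int) -> Features:
    """Apply the three deletion rules; thresholds are hypothetical presets."""
    # Rule 1: drop numbers that have fewer than `min_features` features.
    kept = {num: feats for num, feats in data.items() if len(feats) >= min_features}
    # Count how many remaining numbers contain each feature.
    support = Counter(name for feats in kept.values() for name in feats)
    # Rules 2 and 3: drop rare features and yes/no flags (modelled as bools).
    return {
        num: {
            name: value
            for name, value in feats.items()
            if support[name] >= min_numbers and not isinstance(value, bool)
        }
        for num, feats in kept.items()
    }


data = {
    "A": {"outgoing_count": 3, "incoming_count": 1, "is_vip": True},
    "B": {"outgoing_count": 5, "incoming_count": 2, "is_vip": False},
    "C": {"rare_feature": 1},
}
cleaned = delete_invalid_columns(data, min_features=2, min_numbers=2)
```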
S2022: perform null-value processing on the call behavior features that remain after invalid columns are deleted. The call behavior features of a given telephone number do not necessarily include every feature, so some of its features may have no value; such features can be assigned 0 to indicate that the corresponding call behavior did not occur.
S2023: normalize the call behavior features of all telephone numbers after null-value processing. If the values of some feature span too large a range, they strongly distort the classification result. Therefore, the mean of each feature is computed over all telephone numbers after null-value processing, and when the mean of a feature exceeds a preset feature threshold, the values of that feature are normalized, for example by L2-norm normalization, so that all of its values fall within a suitable range.
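S2022 and S2023 together can be sketched as below. The sketch assumes that "L2-norm normalization" here means dividing each value of a feature by the L2 norm of that feature's column across all numbers (column-wise, since the decision is made per feature). `mean_threshold` is a hypothetical preset.

```python
import math
from typing import Dict, List


def fill_and_normalize(
    data: Dict[str, Dict[str, float]], feature_names: List[str], mean_threshold: float
) -> Dict[str, Dict[str, float]]:
    # S2022: a missing feature becomes 0.0, meaning the behavior did not occur.
    filled = {
        num: {name: float(feats.get(name, 0.0)) for name in feature_names}
        for num, feats in data.items()
    }
    numbers = list(filled)
    if not numbers:
        return filled
    for name in feature_names:
        column = [filled[num][name] for num in numbers]
        mean = sum(column) / len(column)
        # S2023: only features whose mean exceeds the threshold are rescaled.
        if mean > mean_threshold:
            norm = math.sqrt(sum(v * v for v in column))  # L2 norm of the column
            if norm > 0:
                for num in numbers:
                    filled[num][name] /= norm
    return filled


scaled = fill_and_normalize(
    {"A": {"duration_s": 300.0, "count": 2}, "B": {"duration_s": 400.0}},
    ["duration_s", "count"],
    mean_threshold=10.0,
)
```

Here `duration_s` (mean 350) is rescaled by its norm of 500, while `count` (mean 1) is left unchanged.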
Step S203: perform feature extraction on the cleaned call behavior features of all telephone numbers to obtain the selected call behavior features.
In actual implementation, to learn the structure and nature of the crank call identification problem from the cleaned call behavior features, feature extraction is performed on them and the features that best explain the crank call model are selected. Feature extraction generally follows two criteria. First, whether a feature diverges: the variance of the feature's values over all telephone numbers is computed, and a variance equal to 0 means the telephone numbers essentially do not differ on that feature, so a non-divergent feature is useless for distinguishing telephone numbers. Second, the correlation between the feature and the target: features highly correlated with crank calls should be selected preferentially.
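Both criteria can be sketched as a small filter. The text does not name a correlation measure, so using the absolute Pearson correlation with a 0/1 crank-call label is an assumption, as are the names `select_features`, `top_k`, and `min_variance`.

```python
import math
from typing import Dict, List, Sequence


def variance(xs: Sequence[float]) -> float:
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else sxy / (sx * sy)


def select_features(
    columns: Dict[str, List[float]], labels: List[int], top_k: int, min_variance: float = 0.0
) -> List[str]:
    # Criterion 1: keep only divergent features (variance above min_variance).
    divergent = {name: col for name, col in columns.items() if variance(col) > min_variance}
    # Criterion 2: rank by absolute correlation with the crank-call label.
    ranked = sorted(
        divergent, key=lambda name: abs(pearson(divergent[name], labels)), reverse=True
    )
    return ranked[:top_k]


columns = {
    "constant": [1, 1, 1, 1],
    "informative": [0, 0, 1, 1],
    "uncorrelated": [1, 0, 0, 1],
}
labels = [0, 0, 1, 1]
chosen = select_features(columns, labels, top_k=1)
```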
Feature extraction methods include feature selection and dimensionality reduction, both of which aim to reduce the number of features in a feature data set. Feature selection picks a subset of the original features without changing the original feature space, and falls into three main types: (1) the Filter method, which scores each feature by its divergence or correlation and selects features by setting a score threshold or the number of features to keep; the main methods are the chi-squared test, information gain, and the correlation coefficient; (2) the Wrapper method, which generates different combinations of features, scores the predictive performance of each combination against an objective function, compares it with the other combinations, and selects or excludes several features at each step; the main method is recursive feature elimination; (3) the Embedded method, which first trains a preset machine learning model with a preset algorithm to obtain a weight coefficient for each feature, and then selects features in descending order of weight; the main method is regularization.
Dimensionality reduction, by contrast, combines different features into new features based on the relationships among them, which changes the original feature space, and then selects some of the new features; the main methods are Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Singular Value Decomposition (SVD), and Sammon's mapping.
For example, the cleaned call behavior features of all telephone numbers can be extracted successively with the following methods: (1) the chi-squared test, which measures the deviation between the observed counts and the counts theoretically expected for the samples; the chi-squared value grows with this deviation, so a larger value indicates a stronger dependence between the feature and the class, and the feature is retained as useful for classification, whereas a smaller value indicates a weaker dependence, and the feature is deleted; (2) recursive feature elimination, which trains a base model on the feature set to obtain the importance of each feature (for example, its weight coefficient), eliminates the least important features, and trains the next round on the new feature set until the required number of features is reached; (3) tree-model-based feature selection, which trains the feature set with GBDT (Gradient Boosting Decision Tree) as the base model and selects features according to the training result; (4) linear discriminant analysis, which seeks a linear transformation that maximizes the ratio of the between-class scatter to the within-class scatter of the sample data.
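For the chi-squared step, here is a sketch for the simplest case of a binary feature against a binary crank-call label. In the usual formulation, the statistic compares observed counts with the counts expected if feature and label were independent, and higher-scoring features are the ones kept. Restricting to 0/1 features is a simplifying assumption; real call behavior features would first need discretizing.

```python
from typing import Sequence


def chi_square(feature: Sequence[int], labels: Sequence[int]) -> float:
    """Chi-squared statistic of a binary feature against a binary label (1 = crank call)."""
    n = len(feature)
    observed = [[0, 0], [0, 0]]  # observed[feature value][label]
    for x, y in zip(feature, labels):
        observed[x][y] += 1
    row = [observed[i][0] + observed[i][1] for i in range(2)]
    col = [observed[0][j] + observed[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row[i] * col[j] / n  # count expected under independence
            if expected > 0:
                chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


labels = [0, 0, 1, 1]
scores = {
    "informative": chi_square([0, 0, 1, 1], labels),
    "uncorrelated": chi_square([1, 0, 0, 1], labels),
}
```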
Step S204: use a machine learning algorithm to train a model on the selected call behavior features and the attribute features of all telephone numbers, obtaining a trained crank call model.
In practical implementation, the main task is to identify whether a telephone number is a crank call, so a crank call model is obtained with a machine learning algorithm as follows: the selected call behavior features of all telephone numbers are split into training data and test data at a ratio of 2 to 8; all selected call behavior features are used as classification parameters of the crank call model; the training data are used to train the model; and the test data are then used to verify the accuracy of the trained model. The crank call model may be a random forest classifier.
Further, when the trained crank call model does not reach a preset accuracy threshold, it is tuned to obtain an optimal crank call model, for example by k-fold cross-validation, which makes full use of the selected call behavior features of all telephone numbers to test the trained model.
Specifically, k-fold cross-validation proceeds as follows. The classification parameters, or the weights of the classification parameters, of the trained crank call model are varied to obtain m candidate crank call models, each with different classification parameters or weights, where m is an integer greater than or equal to 1. The selected call behavior features of all telephone numbers are taken as a data set S, which is divided into k disjoint subsets, where k is an integer greater than 1. For each candidate model, each of the k subsets in turn serves as the test set while the remaining k-1 subsets serve as the training set; the model's identification accuracy on the test set is computed, and the k accuracies are averaged to give the model's true identification accuracy. Finally, the candidate with the highest true identification accuracy among the m candidates is selected as the optimal crank call model.
Step S205: perform data analysis on the user call behavior data and the attribute information of all telephone numbers in it to obtain an attribute screening model, which represents the criteria for determining the attribute of a telephone number.
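The selection procedure can be sketched with plain callables. A random forest is not reproduced here; instead, each hypothetical candidate (`threshold_trainer`) differs in which classification parameter it uses, mirroring the "m candidate models" idea. Striding the data into folds assumes it has already been shuffled.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], int]  # (feature vector, label: 1 = crank call)
Model = Callable[[List[float]], int]
Trainer = Callable[[List[Sample]], Model]


def k_fold_accuracy(train: Trainer, data: Sequence[Sample], k: int) -> float:
    """Mean test accuracy over k disjoint folds (data should be shuffled first)."""
    if not 2 <= k <= len(data):
        raise ValueError("k must satisfy 2 <= k <= len(data)")
    folds = [list(data[i::k]) for i in range(k)]  # k disjoint subsets of S
    accuracies = []
    for i, test in enumerate(folds):
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        model = train(training)
        correct = sum(model(x) == y for x, y in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k


def select_best(candidates: List[Trainer], data: Sequence[Sample], k: int) -> int:
    """Index of the candidate with the highest cross-validated accuracy."""
    scores = [k_fold_accuracy(c, data, k) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


def threshold_trainer(feature: int) -> Trainer:
    """Hypothetical candidate: flags a crank call when one feature exceeds the
    midpoint between the class means (assumes both classes are in training)."""
    def train(samples: List[Sample]) -> Model:
        pos = [x[feature] for x, y in samples if y == 1]
        neg = [x[feature] for x, y in samples if y == 0]
        cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return lambda x: int(x[feature] > cut)
    return train


data = [([10, 1], 1), ([9, 0], 1), ([8, 1], 1), ([1, 0], 0), ([2, 1], 0), ([0, 0], 0)]
best = select_best([threshold_trainer(1), threshold_trainer(0)], data, k=3)
```

With this toy data, the candidate using feature 0 scores 1.0 and the one using feature 1 scores about 0.67, so index 1 is selected.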
The implementation of this step is the same as that of step S103, and is not described here again.
It should be noted that, in the embodiment of the present invention, the execution sequence of step S202 to step S204 and step S205 is not limited, for example, step S202 to step S204 may be executed before step S205, or may be executed after step S205, or both may be executed simultaneously.
Step S206: according to the trained crank call model and the attribute screening model, perform attribute identification on each telephone number that initiates a new call request to obtain its attribute.
In actual implementation, attribute identification of the telephone numbers initiating new call requests is performed with the trained crank call model and the attribute screening model when a preset prediction condition is met. The preset prediction condition includes at least one of the following: the number of telephone numbers initiating new call requests is greater than or equal to a preset number threshold; the time elapsed since the attributes of the telephone numbers were last updated is greater than or equal to a preset time threshold.
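One way to realize the preset prediction condition is a small batching helper that fires when either threshold is reached. The class name `PredictionTrigger`, its fields, and the use of wall-clock seconds are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PredictionTrigger:
    """Hypothetical batching helper for the preset prediction condition."""
    min_pending: int   # preset number threshold
    max_age_s: float   # preset time threshold, in seconds
    last_update: float = field(default_factory=time.time)
    pending: List[str] = field(default_factory=list)

    def add(self, number: str) -> None:
        self.pending.append(number)

    def should_run(self, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        return (
            len(self.pending) >= self.min_pending
            or now - self.last_update >= self.max_age_s
        )

    def take_batch(self, now: Optional[float] = None) -> List[str]:
        """Hand the pending numbers to the models and record the update time."""
        batch, self.pending = self.pending, []
        self.last_update = time.time() if now is None else now
        return batch


trigger = PredictionTrigger(min_pending=3, max_age_s=60.0, last_update=1000.0)
trigger.add("13800000001")
trigger.add("13800000002")
```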
It should be noted that steps S201 to S206 can be implemented by a big data platform using distributed processing and distributed storage.
Thus, in this embodiment of the invention, the user call behavior data and the attribute information of all telephone numbers in it are classified and sorted to obtain the call behavior features and attribute features of each telephone number; the call behavior features are cleaned and then subjected to feature extraction to obtain the selected call behavior features; a crank call model is trained on the selected call behavior features and the attribute features; an attribute screening model is obtained by data analysis of the user call behavior data and the attribute information; and the two models together identify the attribute of each telephone number initiating a new call request. Because the data are classified, sorted, cleaned, and subjected to feature extraction before model training, the resulting crank call model has high identification accuracy. The attribute screening model, built from the specific characteristics of certain telephone numbers, further refines the result, so combining the two models improves the accuracy of telephone number identification.
Example three
To further illustrate the purpose of the present invention, the following builds on the foregoing method embodiments.
An embodiment of the present invention provides an apparatus for identifying a phone number, as shown in fig. 3, an apparatus 300 for identifying a phone number includes: a memory 301 and a processor 302, wherein,
the memory 301 is used for storing computer programs;
the processor 302 is adapted to perform the following steps when executing the computer program stored in the memory 301:
acquiring user call behavior data and attribute information of all telephone numbers in the user call behavior data;
performing model training on the user call behavior data and attribute information of all telephone numbers in the user call behavior data by adopting a machine learning algorithm to obtain a crank call model;
performing data analysis on the user call behavior data and the attribute information of all telephone numbers in it to obtain an attribute screening model, which represents the criteria for determining the attribute of a telephone number;
and, according to the crank call model and the attribute screening model, performing attribute identification on each telephone number that initiates a new call request to obtain its attribute.
In the above solution, the processor 302 is specifically configured to execute the following steps when executing the computer program stored in the memory 301:
according to the crank call model, predicting the type of each telephone number initiating a new call request, the predicted type being crank call or non-crank call; correspondingly, setting the attribute of each number predicted as a non-crank call to non-crank call, and determining the attribute of each number predicted as a crank call according to the attribute screening model;
according to the attribute screening model, performing attribute identification on the numbers predicted as crank calls to obtain their attributes. For example, the attribute screening model may include an intermediate number model and a frequent contact model: if at least one preset attribute judgment condition is met, the attribute of a number predicted as a crank call is determined to be non-crank call; otherwise it is determined to be crank call. The preset attribute judgment conditions include: the intermediate number model identifies the number as an intermediate number; and the frequent contact model identifies the number as a frequent contact number.
In the above solution, the processor 302 is specifically configured to execute the following steps when executing the computer program stored in the memory 301: selecting classification parameters from the user call behavior data and the attribute information of all telephone numbers in it, and building the attribute screening model according to these classification parameters, where the attribute screening model may include at least one of an intermediate number telephone model and a frequent contact telephone model, and each attribute screening model can uniquely identify all telephone numbers that have a particular attribute.
In the above solution, the processor 302 is specifically configured to execute the following steps when executing the computer program stored in the memory 301: classifying and sorting the user call behavior data to obtain the call behavior features of each telephone number, where the call behavior features may include at least one of the following: the list of historically called numbers, the number of historical outgoing calls, the historical outgoing call duration, the number of historical incoming calls, the historical incoming call duration, the number of answered calls, and the number of unanswered calls;
classifying and sorting the attribute information of all telephone numbers in the user call behavior data to obtain the attribute features of each telephone number, where the attribute features may be: crank call, non-crank call, express or food delivery call, enterprise call, rejected call, preferred answering call, intermediate number call, or frequent contact call;
and performing model training on the call behavior characteristics and the attribute characteristics of each telephone number in the call behavior data of the user by adopting a machine learning algorithm to obtain a crank call model.
In the above scheme, the attribute information of all telephone numbers in the user call behavior data includes at least one of the following: user tagging information, enterprise authentication information, enterprise yellow page information, telephone blacklist information, and telephone whitelist information;
the attribute screening model includes at least one of the following: an intermediate number telephone model and a frequent contact telephone model.
In the above solution, the processor 302 is specifically configured to execute the following steps when executing the computer program stored in the memory 301:
when a preset prediction condition is met, performing attribute identification on the telephone numbers initiating new call requests according to the crank call model and the attribute screening model, where the preset prediction condition includes at least one of the following:
the number of telephone numbers initiating new call requests is greater than or equal to a preset number threshold;
the time elapsed since the predicted attributes of the telephone numbers were last updated is greater than or equal to a preset time threshold.
For example, the device for identifying a telephone number may be a big data platform with a distributed architecture, and the device, a mobile core network, and a service platform may together form an intelligent anti-harassment system, whose structure is shown schematically in Fig. 4. The mobile core network may include a mobile switching center (MSC), a service monitoring platform (SCP), and a home location register (HLR). The mobile switching center receives the call request of a telephone number and sends notification signaling to the service monitoring platform; upon receiving the notification signaling, the service monitoring platform sends a crank call identification request carrying the telephone number to the device for identifying a telephone number;
the device for identifying a telephone number, upon receiving an identification request carrying a telephone number, identifies that number and returns the identification result to the service monitoring platform.
Further, the service monitoring platform is specifically configured as follows. When the identification result is that the telephone number is a crank call and the called terminal has subscribed to the interception service, it notifies the mobile switching center to terminate the call and sends the interception status to the service platform, which then notifies the called terminal of the interception by short message. When the identification result is that the calling number is a crank call and the called terminal has subscribed to the reminder service, it notifies the mobile switching center to let the call through and notifies the called terminal by flash message that the number is a crank call. When the identification result is that the telephone number is a non-crank call, it notifies the mobile switching center to let the call through.
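The service monitoring platform's decision can be summarized as a small function. The `Action` names are hypothetical. Giving interception precedence when both services are subscribed, and letting a crank call through when neither is subscribed, are assumptions that the text does not state.

```python
from enum import Enum


class Action(Enum):
    TERMINATE_AND_SMS = "terminate call; service platform sends interception SMS"
    PASS_WITH_FLASH_WARNING = "let call through; flash message warns called terminal"
    PASS = "let call through"


def handle_call(is_crank: bool, interception: bool, reminder: bool) -> Action:
    """Decide the MSC instruction from the identification result and the
    called party's subscribed services. Interception taking precedence over
    reminder, and passing crank calls when neither service is subscribed,
    are assumptions not stated in the text."""
    if is_crank and interception:
        return Action.TERMINATE_AND_SMS
    if is_crank and reminder:
        return Action.PASS_WITH_FLASH_WARNING
    return Action.PASS
```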
Example four
Based on the same technical concept as the foregoing embodiments, a fifth embodiment of the present invention provides a computer-readable storage medium, which can be applied to an apparatus; the technical solutions of the foregoing embodiments substantially or partially contribute to the prior art, or all or part of the technical solutions may be embodied in the form of a software product stored in a computer-readable storage medium, which includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute all or part of the steps of the method described in this embodiment. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Specifically, the computer program instructions corresponding to the method for identifying a telephone number in this embodiment may be stored on a storage medium such as an optical disc, a hard disk, or a USB flash drive. When these instructions are read or executed by an electronic device, they cause at least one processor of the device to execute the steps of any of the methods for identifying a telephone number in the foregoing embodiments of the present invention.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can certainly also be implemented by hardware, but in many cases the former is the preferred implementation. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (17)

1. A method of identifying a telephone number, the method comprising:
acquiring user call behavior data and attribute information of all telephone numbers in the user call behavior data; the step of acquiring the user call behavior data comprises the following steps: acquiring user call behavior data of all telephone numbers initiating call requests in a preset time period, or acquiring user call behavior data of all telephone numbers initiating call requests in the preset time period;
performing model training on the user call behavior data and attribute information of all telephone numbers in the user call behavior data by adopting a machine learning algorithm to obtain a crank call model;
performing data analysis on the user call behavior data and attribute information of all telephone numbers in the user call behavior data to obtain an attribute screening model, wherein the attribute screening model is used for expressing a standard for determining the attribute of the telephone numbers;
according to the crank call model and the attribute screening model, performing attribute identification on the telephone numbers initiating the new call requests to obtain the attribute of each telephone number initiating the new call request;
wherein the performing model training on the user call behavior data and the attribute information of all telephone numbers in the user call behavior data by using a machine learning algorithm to obtain a crank call model comprises:
classifying and sorting the user call behavior data and the attribute information of all telephone numbers in the user call behavior data to obtain call behavior characteristics and attribute characteristics of each telephone number in the user call behavior data; performing data cleaning on the call behavior characteristics to obtain cleaned call behavior characteristics; and performing feature extraction on the cleaned call behavior characteristics to obtain selected call behavior characteristics; wherein the data cleaning comprises checking data consistency and handling invalid values and missing values, and the feature extraction comprises feature selection and dimension reduction;
and performing model training on the selected call behavior characteristics and the attribute characteristics by adopting a machine learning algorithm to obtain the crank call model.
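The training pipeline of claim 1 (data cleaning, feature extraction, model training) can be illustrated with a small, dependency-free Python sketch. Everything here is an assumption for illustration only: the feature names, the specific consistency rule, median imputation for missing values, a variance threshold as the feature-selection step, and plain logistic regression trained by gradient descent as a stand-in for the unspecified "machine learning algorithm". Dimension reduction (for example PCA) is omitted for brevity.

```python
import math
import statistics

# Hypothetical per-number call behavior characteristics (cf. claim 4).
FEATURES = ["out_calls", "out_duration", "in_calls", "in_duration"]


def clean(rows):
    """Data cleaning: drop invalid or inconsistent rows, impute missing values."""
    kept = []
    for row in rows:
        values = [row.get(f) for f in FEATURES]
        # Invalid values: negative counts or durations.
        if any(v is not None and v < 0 for v in values):
            continue
        # Consistency check (assumed rule): no outgoing calls but positive outgoing duration.
        if row.get("out_calls") == 0 and (row.get("out_duration") or 0) > 0:
            continue
        kept.append(dict(row))
    # Missing values: fill with the column median of the remaining rows.
    for f in FEATURES:
        present = [r[f] for r in kept if r.get(f) is not None]
        fill = statistics.median(present) if present else 0
        for r in kept:
            if r.get(f) is None:
                r[f] = fill
    return kept


def select_features(rows, min_variance=1e-9):
    """Feature selection: keep features whose variance is not (near) zero."""
    return [f for f in FEATURES
            if statistics.pvariance([r[f] for r in rows]) > min_variance]


def train_logistic(rows, feats, lr=0.1, epochs=500):
    """Logistic regression (stand-in model); returns P(crank call) for a row."""
    means = {f: statistics.fmean(r[f] for r in rows) for f in feats}
    stds = {f: statistics.pstdev([r[f] for r in rows]) or 1.0 for f in feats}

    def scale(r):
        return [(r[f] - means[f]) / stds[f] for f in feats]

    xs = [scale(r) for r in rows]
    ys = [r["label"] for r in rows]  # attribute characteristic: 1 = crank call, 0 = other
    w, b, n = [0.0] * len(feats), 0.0, len(rows)
    for _ in range(epochs):
        gw, gb = [0.0] * len(feats), 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + b)))
            for j, xj in enumerate(x):
                gw[j] += (p - y) * xj
            gb += p - y
        w = [wi - lr * g / n for wi, g in zip(w, gw)]
        b -= lr * gb / n

    def predict(r):
        z = sum(wi * xi for wi, xi in zip(w, scale(r))) + b
        return 1.0 / (1.0 + math.exp(-z))

    return predict


raw = [
    {"out_calls": 120, "out_duration": 300, "in_calls": 1, "in_duration": 10, "label": 1},
    {"out_calls": 150, "out_duration": 450, "in_calls": 0, "in_duration": 0, "label": 1},
    {"out_calls": 5, "out_duration": 900, "in_calls": 8, "in_duration": 1200, "label": 0},
    {"out_calls": 3, "out_duration": None, "in_calls": 6, "in_duration": 700, "label": 0},
    {"out_calls": -1, "out_duration": 10, "in_calls": 2, "in_duration": 20, "label": 0},
    {"out_calls": 0, "out_duration": 50, "in_calls": 4, "in_duration": 90, "label": 0},
]
rows = clean(raw)              # the last two rows are dropped
feats = select_features(rows)
model = train_logistic(rows, feats)
```

The toy data are invented; they only show that cleaning removes the invalid and inconsistent rows and that the trained model separates the two classes.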
2. The method according to claim 1, wherein the performing attribute identification on the telephone numbers initiating the new call requests according to the crank call model and the attribute screening model to obtain the attribute of each telephone number initiating the new call request comprises:
according to the crank call model, performing type prediction on each telephone number initiating a new call request to obtain a predicted type of the telephone number, the predicted type being crank call or non-crank call;
and according to the attribute screening model, performing attribute identification on those telephone numbers, among the telephone numbers initiating the new call requests, whose predicted type is crank call, to obtain the attribute of each such telephone number.
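The two-stage identification of claim 2 can be sketched as follows: the crank call model is applied first, and only numbers predicted as crank calls are passed to the attribute screening model, which may assign a more specific attribute. The function name, the 0.5 threshold, and representing the screening model as an ordered list of `(attribute, predicate)` rules are all assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]


def identify_numbers(
    numbers: Dict[str, Features],
    crank_model: Callable[[Features], float],
    screening_rules: List[Tuple[str, Callable[[Features], bool]]],
    threshold: float = 0.5,
) -> Dict[str, str]:
    """Hypothetical two-stage attribute identification."""
    result = {}
    for number, feats in numbers.items():
        # Stage 1: type prediction with the crank call model.
        if crank_model(feats) < threshold:
            result[number] = "non-crank"
            continue
        # Stage 2: only predicted crank calls go through attribute screening;
        # the first matching rule wins, otherwise the number stays "crank".
        attribute = "crank"
        for name, rule in screening_rules:
            if rule(feats):
                attribute = name
                break
        result[number] = attribute
    return result
```

For example, with a toy model that flags more than 100 outgoing calls and a single hypothetical middle-number rule, a high-volume virtual number would be labeled as a middle-number call rather than a crank call.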
3. The method according to claim 1, wherein the performing data analysis on the user call behavior data and the attribute information of all telephone numbers in the user call behavior data to obtain an attribute screening model comprises:
selecting classification parameters from the user call behavior data and the attribute information of all telephone numbers in the user call behavior data, and establishing an attribute screening model according to the classification parameters.
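One possible reading of claim 3, selecting a classification parameter and building a screening rule from it, is a decision stump: try each candidate parameter and threshold and keep the pair that misclassifies the fewest labeled examples. The source does not name this technique; it is swapped in here purely as an illustration, and the parameter names used in the test (`reciprocal_ratio`, `distinct_callees`) are hypothetical.

```python
def fit_screening_rule(rows, labels, params):
    """Decision stump: pick (parameter, threshold) for the rule value >= threshold.

    Every observed value of every candidate parameter is tried as a
    threshold; the pair with the fewest training errors is kept (the
    first one found wins ties). Only the ">=" direction is considered.
    """
    best = None
    for p in params:
        for t in sorted({r[p] for r in rows}):
            errors = sum((r[p] >= t) != bool(y) for r, y in zip(rows, labels))
            if best is None or errors < best[0]:
                best = (errors, p, t)
    if best is None:
        raise ValueError("need at least one row and one parameter")
    _, param, threshold = best
    return param, threshold, (lambda r: r[param] >= threshold)
```

A real screening model would probably combine several parameters; a single stump keeps the idea of "standard for determining the attribute" concrete.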
4. The method according to claim 1, wherein the performing model training on the user call behavior data and attribute information of all telephone numbers in the user call behavior data by using a machine learning algorithm to obtain a crank call model comprises:
classifying and sorting the user call behavior data to obtain call behavior characteristics of each telephone number in the user call behavior data, wherein the call behavior characteristics comprise at least one of the following: historical number of outgoing (calling) calls, historical outgoing call duration, historical number of incoming (called) calls, historical incoming call duration, called times, and non-called times;
classifying and sorting the attribute information of all telephone numbers in the user call behavior data to obtain the attribute characteristic of each telephone number in the user call behavior data, wherein the attribute characteristic is one of: crank call, express or food delivery call, enterprise call, rejected call, preferentially answered call, middle-number call, or frequent-contact call;
and performing model training on the call behavior characteristics and the attribute characteristics of each telephone number in the user call behavior data by adopting a machine learning algorithm to obtain the crank call model.
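The "classifying and sorting" step of claim 4 amounts to aggregating raw call records into per-number behavior characteristics. The sketch below assumes a hypothetical call detail record with caller, callee, duration, and a connected flag; it also assumes that "called times and non-called times" mean connected and not-connected outgoing attempts, which the translated text does not make certain.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One hypothetical call detail record (CDR)."""
    caller: str
    callee: str
    duration_s: int
    connected: bool


def empty_features():
    return {"out_calls": 0, "out_duration": 0, "in_calls": 0,
            "in_duration": 0, "connected": 0, "not_connected": 0}


def behavior_features(records):
    """Aggregate call records into per-number call behavior characteristics."""
    feats = defaultdict(empty_features)
    for rec in records:
        caller = feats[rec.caller]
        caller["out_calls"] += 1
        caller["out_duration"] += rec.duration_s
        # Assumed reading of "called times / non-called times":
        # connected vs. not-connected outgoing attempts.
        caller["connected" if rec.connected else "not_connected"] += 1
        callee = feats[rec.callee]
        callee["in_calls"] += 1
        callee["in_duration"] += rec.duration_s
    return dict(feats)
```

Each number appears once in the result whether it only called, was only called, or both, which gives the training step one row per telephone number.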
5. The method according to claim 4, wherein the classifying and sorting the attribute information of all telephone numbers in the user call behavior data to obtain the attribute characteristics of each telephone number in the user call behavior data comprises:
acquiring N undetermined attributes of each telephone number in the user call behavior data according to attribute information of all telephone numbers in the user call behavior data; wherein N is an integer greater than or equal to 1;
and screening the N undetermined attributes of each telephone number in the user call behavior data to obtain the attribute characteristics of each telephone number in the user call behavior data.
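Claim 5 leaves open how the N undetermined attributes of a number are screened down to one. A simple possibility, shown here as an assumption rather than the patent's method, is a weighted vote in which authoritative sources (such as a whitelist or enterprise authentication) outweigh individual user marks. The source names and weights are hypothetical.

```python
from collections import Counter

# Hypothetical source weights: authoritative sources outweigh single user marks.
SOURCE_WEIGHT = {"whitelist": 5, "enterprise_auth": 4, "yellow_pages": 3,
                 "blacklist": 3, "user_mark": 1}


def screen_attributes(candidates):
    """Reduce N >= 1 (source, attribute) candidates to one attribute characteristic."""
    if not candidates:
        raise ValueError("need at least one candidate attribute (N >= 1)")
    scores = Counter()
    for source, attribute in candidates:
        scores[attribute] += SOURCE_WEIGHT.get(source, 1)
    best = max(scores.values())
    # Break ties alphabetically so the result is deterministic.
    return min(a for a, s in scores.items() if s == best)
```

With two user marks saying "crank" and one enterprise authentication saying "enterprise", the enterprise attribute wins (score 4 against 2).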
6. The method of claim 1, wherein the attribute information of all phone numbers in the user call behavior data comprises at least one of the following items:
user marking information, enterprise authentication information, enterprise yellow-page information, telephone blacklist information, and telephone whitelist information.
7. The method of claim 1, wherein the attribute screening model comprises at least one of: a middle-number telephone model and a frequent-contact telephone model.
8. The method according to claim 1, wherein the performing attribute identification on the telephone number initiating the new call request according to the crank call model and the attribute screening model comprises:
when a preset prediction condition is met, performing attribute identification on the telephone number initiating the new call request according to the crank call model and the attribute screening model, wherein the preset prediction condition comprises at least one of the following:
the number of the telephone numbers initiating the new call request is greater than or equal to a preset number threshold;
the time interval from the current time to the last update time of the attribute of the telephone number is greater than or equal to a preset time threshold.
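The preset prediction condition of claim 8 is a simple disjunction of a batch-size check and a staleness check. The sketch below assumes hypothetical default thresholds (1000 pending numbers, 24 hours); the source does not give values.

```python
from datetime import datetime, timedelta


def should_reidentify(pending_numbers, last_update, now,
                      count_threshold=1000, max_age=timedelta(hours=24)):
    """True if enough new numbers are waiting or the attributes are stale.

    count_threshold and max_age are hypothetical defaults.
    """
    return (len(pending_numbers) >= count_threshold
            or (now - last_update) >= max_age)
```

Batching by count keeps model invocations efficient under load, while the age check bounds how outdated a stored attribute can become during quiet periods.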
9. An apparatus for identifying a telephone number, the apparatus comprising a memory and a processor, wherein:
the memory is configured to store a computer program; and
the processor, when executing the computer program, is configured to perform the following steps:
acquiring user call behavior data and attribute information of all telephone numbers in the user call behavior data; the step of acquiring the user call behavior data comprises the following steps: acquiring user call behavior data of all telephone numbers initiating call requests in a preset time period, or acquiring user call behavior data of all telephone numbers initiating call requests in the preset time period;
performing model training on the user call behavior data and attribute information of all telephone numbers in the user call behavior data by adopting a machine learning algorithm to obtain a crank call model;
performing data analysis on the user call behavior data and attribute information of all telephone numbers in the user call behavior data to obtain an attribute screening model, wherein the attribute screening model is used for expressing a standard for determining the attribute of the telephone numbers;
according to the crank call model and the attribute screening model, performing attribute identification on the telephone numbers initiating the new call requests to obtain the attribute of each telephone number initiating the new call request;
the processor is specifically configured to, when executing the computer program stored in the memory, further perform the following steps:
classifying and sorting the user call behavior data and the attribute information of all telephone numbers in the user call behavior data to obtain call behavior characteristics and attribute characteristics of each telephone number in the user call behavior data; performing data cleaning on the call behavior characteristics to obtain cleaned call behavior characteristics; and performing feature extraction on the cleaned call behavior characteristics to obtain selected call behavior characteristics; wherein the data cleaning comprises checking data consistency and handling invalid values and missing values, and the feature extraction comprises feature selection and dimension reduction;
and performing model training on the selected call behavior characteristics and the attribute characteristics by adopting a machine learning algorithm to obtain the crank call model.
10. The apparatus according to claim 9, wherein the processor is specifically configured to perform the following steps when running the computer program:
according to the crank call model, performing type prediction on each telephone number initiating a new call request to obtain a predicted type of the telephone number, the predicted type being crank call or non-crank call;
and according to the attribute screening model, performing attribute identification on those telephone numbers, among the telephone numbers initiating the new call requests, whose predicted type is crank call, to obtain the attribute of each such telephone number.
11. The apparatus according to claim 9, wherein the processor is specifically configured to perform the following steps when running the computer program:
selecting classification parameters from the user call behavior data and the attribute information of all telephone numbers in the user call behavior data, and establishing an attribute screening model according to the classification parameters.
12. The apparatus according to claim 9, wherein the processor is specifically configured to perform the following steps when running the computer program:
classifying and sorting the user call behavior data to obtain call behavior characteristics of each telephone number in the user call behavior data, wherein the call behavior characteristics comprise at least one of the following: historical number of outgoing (calling) calls, historical outgoing call duration, historical number of incoming (called) calls, historical incoming call duration, called times, and non-called times;
classifying and sorting the attribute information of all telephone numbers in the user call behavior data to obtain the attribute characteristic of each telephone number in the user call behavior data, wherein the attribute characteristic is one of: crank call, express or food delivery call, enterprise call, rejected call, preferentially answered call, middle-number call, or frequent-contact call;
and performing model training on the call behavior characteristics and the attribute characteristics of each telephone number in the user call behavior data by adopting a machine learning algorithm to obtain the crank call model.
13. The apparatus according to claim 12, wherein the processor is specifically configured to perform the following steps when running the computer program:
acquiring N undetermined attributes of each telephone number in the user call behavior data according to attribute information of all telephone numbers in the user call behavior data; wherein N is an integer greater than or equal to 1;
and screening the N undetermined attributes of each telephone number in the user call behavior data to obtain the attribute characteristics of each telephone number in the user call behavior data.
14. The apparatus according to claim 9, wherein the attribute information of all telephone numbers in the user call behavior data comprises at least one of the following items:
user marking information, enterprise authentication information, enterprise yellow-page information, telephone blacklist information, and telephone whitelist information.
15. The apparatus of claim 9, wherein the attribute screening model comprises at least one of: a middle-number telephone model and a frequent-contact telephone model.
16. The apparatus according to claim 9, wherein the processor is specifically configured to perform the following steps when running the computer program:
when a preset prediction condition is met, performing attribute identification on the telephone number initiating the new call request according to the crank call model and the attribute screening model, wherein the preset prediction condition comprises at least one of the following:
the number of the telephone numbers initiating the new call request is greater than or equal to a preset number threshold;
the time interval from the current time to the last update time of the attribute of the telephone number is greater than or equal to a preset time threshold.
17. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program,
the computer program, when executed by at least one processor, causes the at least one processor to perform the steps of the method of any one of claims 1 to 8.
CN201810372550.4A 2018-04-24 2018-04-24 Method and device for identifying telephone number and computer readable storage medium Active CN110401779B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810372550.4A CN110401779B (en) 2018-04-24 2018-04-24 Method and device for identifying telephone number and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN110401779A CN110401779A (en) 2019-11-01
CN110401779B true CN110401779B (en) 2022-02-01

Family

ID=68320143

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810372550.4A Active CN110401779B (en) 2018-04-24 2018-04-24 Method and device for identifying telephone number and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110401779B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113037699B (en) * 2019-12-25 2022-11-29 中国电信股份有限公司 Communication interception method, device and computer readable storage medium
CN111144336A (en) * 2019-12-30 2020-05-12 贵州近邻宝科技有限公司 Automatic identification method for mobile phone number and invoice number of addressee facing to express bill
RU2763047C2 (en) * 2020-02-26 2021-12-27 Акционерное общество "Лаборатория Касперского" System and method for call classification
CN113452845B (en) * 2020-03-26 2024-03-19 中国移动通信集团福建有限公司 Method for identifying abnormal telephone number and electronic equipment
CN111432078B (en) * 2020-03-27 2021-09-10 中国—东盟信息港股份有限公司 System for determining number anomalies
CN111465021B (en) * 2020-04-01 2023-06-09 北京中亦安图科技股份有限公司 Graph-based crank call identification model construction method
CN113935758A (en) * 2020-07-14 2022-01-14 中国移动通信集团广东有限公司 Training method and device of random forest model for predicting handling probability of broadband service
CN114189585A (en) * 2020-09-14 2022-03-15 中国移动通信集团重庆有限公司 Crank call anomaly detection method and device, and computing equipment
CN112261654B (en) * 2020-09-23 2021-08-03 中国地质大学(武汉) Method and system for generating mobile phone number white list in telecommunication anti-fraud process
CN112417311A (en) * 2020-10-29 2021-02-26 上海淇玥信息技术有限公司 Method and device for executing service based on influence factor and electronic equipment
CN113301210B (en) * 2021-04-16 2023-05-23 珠海高凌信息科技股份有限公司 Method and device for preventing harassment call based on neural network and electronic equipment
CN113286035B (en) * 2021-05-14 2022-12-30 国家计算机网络与信息安全管理中心 Abnormal call detection method, device, equipment and medium
CN113905134B (en) * 2021-10-21 2023-06-02 中国联合网络通信集团有限公司 Address book blacklist management method, system, equipment and medium based on block chain
CN113992798A (en) * 2021-10-26 2022-01-28 中国联合网络通信集团有限公司 Telephone identification method, device, equipment and readable storage medium
CN114125155A (en) * 2021-11-15 2022-03-01 天津市国瑞数码安全系统股份有限公司 Crank call detection method and system based on big data analysis

Citations (7)

Publication number Priority date Publication date Assignee Title
CN104023109A (en) * 2014-06-27 2014-09-03 深圳市中兴移动通信有限公司 Incoming call prompt method and device as well as incoming call classifying method and device
CN106686261A (en) * 2017-01-19 2017-05-17 腾讯科技(深圳)有限公司 Information processing method and system
CN107273531A (en) * 2017-06-28 2017-10-20 百度在线网络技术(北京)有限公司 Telephone number classifying identification method, device, equipment and storage medium
CN107306306A (en) * 2016-04-25 2017-10-31 腾讯科技(深圳)有限公司 Communicating number processing method and processing device
CN107331385A (en) * 2017-07-07 2017-11-07 重庆邮电大学 A kind of identification of harassing call and hold-up interception method
CN107517463A (en) * 2016-06-15 2017-12-26 中国移动通信集团浙江有限公司 A kind of recognition methods of telephone number and device
CN107835496A (en) * 2017-11-24 2018-03-23 北京奇虎科技有限公司 A kind of recognition methods of refuse messages, device and server

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant