CN111131593B - Crank call identification method and device - Google Patents

Publication number
CN111131593B
Authority
CN
China
Prior art keywords
call
crank
classification model
crank call
calling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811294711.9A
Other languages
Chinese (zh)
Other versions
CN111131593A (en)
Inventor
陈程
杨敬
彭继东
杨旭虹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201811294711.9A priority Critical patent/CN111131593B/en
Publication of CN111131593A publication Critical patent/CN111131593A/en
Application granted granted Critical
Publication of CN111131593B publication Critical patent/CN111131593B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/66 Substation equipment, e.g. for use by subscribers with means for preventing unauthorised or fraudulent calling
    • H04M1/663 Preventing unauthorised calls to a telephone set

Abstract

The invention provides a method and a device for identifying crank calls. The method may comprise the following steps: a user marks a received crank call number as a crank call; features of the marked crank calls are extracted, a classification model is established, and the classification model is used to judge whether a calling number to be identified is a crank call; when both the user mark and the classification model judge the calling number to be identified to be a crank call, the calling number is judged to be a crank call number; when only one of the user mark and the classification model judges the calling number to be a crank call, the calling number is judged to be a suspected crank call number; when both the user mark and the classification model judge the calling number to be a normal telephone number, the calling number is judged to be a normal telephone number; and the crank call numbers, the suspected crank call numbers, the normal telephone numbers and the corresponding number categories are stored in a database.

Description

Crank call identification method and device
Technical Field
The invention relates to the field of communication big data, and in particular to a method and a device for identifying crank calls, a computer device, and a computer-readable storage medium.
Background
A "crank call" (also referred to below as a harassing or nuisance call) is a call made to promote a product, to commit fraud (for example by impersonating bank staff), or to deliberately harass the recipient. By purpose, crank calls can be divided into a commercial marketing class, a malicious harassment class, and an illegal crime class. Identifying crank calls has wide application in social life. Disturbance from marketing calls and malicious calls is increasingly prominent and seriously affects people's normal life. More seriously, some callers impersonate public security, procuratorate, or court officials to commit telephone fraud, causing huge property losses. At present, crank call identification mainly relies on the following two schemes:
Harassment dictionary scheme: keywords are collected manually into a preset dictionary of harassment feature words. If the keywords contained in the target call records of a suspicious number include any harassment feature word recorded in the preset dictionary, that suspicious number is determined to be a crank call number.
Rule identification scheme: the characteristics of crank calls are analyzed manually and summarized into a set of rules, which are then used to identify numbers.
The two schemes suit different usage scenarios and have different problems and shortcomings:
The harassment dictionary scheme is simple to implement, but the dictionary is built manually and its keyword coverage is low, which directly lowers the recognition rate, so most crank calls cannot be identified. The rule identification scheme analyzes the characteristics of crank calls, but the rules are formulated manually, so accuracy is low and misjudgments may occur.
Therefore, a more reasonable approach is urgently needed to improve the accuracy and coverage of the identification result.
Disclosure of Invention
The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.
According to a first aspect of the invention, a method for identifying crank calls is provided, which may include:
the user marks the received crank call number as a crank call;
extracting the characteristics of the marked crank calls, establishing a classification model, and judging whether the calling number to be identified is a crank call by using the classification model;
when both the user mark and the classification model judge the calling number to be identified to be a harassing call, judging the calling number to be a harassing call number; when only one of the user mark and the classification model judges the calling number to be identified to be a harassing call, judging the calling number to be a suspected harassing call number; when both the user mark and the classification model judge the calling number to be identified to be a normal telephone number, judging the calling number to be a normal telephone number;
and storing the crank call number, the suspected crank call number, the normal call number and the corresponding number category in a database.
In one embodiment of the present invention, the user marking the received crank call number as a crank call may include:
the user marks the received crank call number as a crank call according to the call content, the call intention, the caller identification number, or the voice tone of the calling party.
In another embodiment of the present invention, extracting the features of the marked crank calls and establishing the classification model may comprise:
extracting the features of the marked crank calls with higher confidence, and establishing the classification model.
In a further embodiment of the invention, the features of the marked crank calls comprise attribute features and behavior features, wherein the attribute features comprise a number type, a number attribution and a number operator; the behavior features comprise average dial-out missed ring time, dial-out on-time ratio, dial-out rejection ratio, incoming call dial-out time ratio and average on-time.
In yet another embodiment of the present invention, storing the crank call numbers, the suspected crank call numbers, the normal call numbers, and the corresponding number categories in the database may comprise:
taking the crank call numbers, the suspected crank call numbers and the normal call numbers as keys k and the corresponding number categories as values v, and writing the keys k and the values v into a k-v database.
In one embodiment of the invention, the classification model may comprise a random forest model.
According to a second aspect of the invention, a crank call identification device is provided, which may include:
the marking unit is configured to enable a user to mark the received crank call number as a crank call;
the building unit is used for extracting the characteristics of the marked harassing calls, establishing a classification model, judging whether the calling number to be identified is a harassing call by using the classification model, and judging the calling number as a harassing call number when the user mark and the classification model simultaneously judge that the calling number to be identified is a harassing call; when the user marks or the classification model judges that the calling number to be identified is a harassing call, the calling number is judged to be a suspected harassing call number; when the user mark and the classification model both judge that the calling number to be identified is a normal telephone number, the calling number is judged as the normal telephone number;
and the database is used for storing the crank call numbers, suspected crank call numbers, normal call numbers and corresponding number categories.
In an embodiment of the present invention, the marking unit may be further configured so that:
the user can mark the received crank call number as a crank call according to the call content, the call intention, the caller ID number, or the voice tone of the calling party.
In another embodiment of the present invention, in the building unit, extracting the features of the marked crank calls may further comprise:
extracting the features of the marked crank calls with higher confidence, and establishing the classification model.
In a further embodiment of the invention, the characteristic of the marked crank call comprises an attribute characteristic and a behavior characteristic, wherein the attribute characteristic comprises a number type, a number attribution and a number operator; the behavior characteristics comprise average dial-out missed ring time, dial-out on-time ratio, dial-out rejection ratio, incoming call dial-out time ratio and average on-time.
In yet another embodiment of the present invention, the database is a k-v database that stores k-v data with the crank call numbers, suspected crank call numbers and normal call numbers as keys k and the corresponding number categories as values v.
In one embodiment of the invention, the classification model comprises a random forest model.
According to a third aspect of the present invention, there is provided a computer device, which may include:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods described above.
According to a fourth aspect of the invention, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the method described above.
With the crank call identification method, device, computer device, or computer-readable storage medium described above, the shortage of crank call data can be effectively addressed, meeting the need to curb crank calls that disturb residents and to genuinely clean up the communication service environment. Having users mark received crank call numbers yields high-quality samples and solves the data cold-start problem. Extracting the features of the marked crank calls, establishing a classification model, and using it to judge whether a calling number to be identified is a crank call means that the high-quality samples are analyzed in depth, with features extracted and the model trained using data mining and machine learning techniques, which addresses the low accuracy of manually formulated rules. Finally, combining the user marks with the classification model in a comprehensive judgment further improves the accuracy and coverage of the crank call identification result.
The foregoing summary is provided for the purpose of illustration only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present invention will be readily apparent by reference to the drawings and following detailed description.
Drawings
In the drawings, like reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily to scale. It is appreciated that these drawings depict only some embodiments in accordance with the disclosure and are therefore not to be considered limiting of its scope.
FIG. 1 schematically shows an embodiment of a method of identifying crank calls according to a first aspect of the invention;
FIG. 2 schematically illustrates one embodiment of a user tagging a received crank call number as a crank call, according to a first aspect of the present invention;
FIG. 3 schematically illustrates one embodiment of extracting features of tagged crank calls, establishing a classification model, and using the classification model to determine whether a calling number to be identified is a crank call, according to a first aspect of the present invention;
FIG. 4 schematically illustrates one embodiment of features relating to marked crank calls, according to a first aspect of the present invention;
FIG. 5 schematically illustrates an embodiment of storing crank call numbers, suspected crank call numbers, normal telephone numbers and corresponding number categories in a database according to a first aspect of the invention;
FIG. 6 schematically depicts one embodiment of a classification model according to a first aspect of the present invention;
FIG. 7 schematically shows an embodiment of a crank call identification device according to a second aspect of the invention;
FIG. 8 schematically shows an embodiment of a computer device according to a third aspect of the invention.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
In describing embodiments of the present disclosure, the term "include" and its derivatives should be interpreted as being inclusive, i.e., "including but not limited to". The term "based on" should be understood as "based at least in part on". The term "one embodiment" or "the embodiment" should be understood as "at least one embodiment". Other explicit and implicit definitions are also possible below.
FIG. 1 schematically shows a method 10 for identifying a crank call according to a first aspect of the invention, which may comprise the following steps: step 2, the user marks the received crank call number as a crank call; step 4, the features of the marked crank calls are extracted, a classification model is established, and the classification model is used to judge whether the calling number to be identified is a crank call; step 6, when both the user mark and the classification model judge the calling number to be identified to be a crank call, the calling number is judged to be a crank call number; when only one of the user mark and the classification model judges the calling number to be a crank call, the calling number is judged to be a suspected crank call number; when both the user mark and the classification model judge the calling number to be a normal telephone number, the calling number is judged to be a normal telephone number; and step 8, the crank call numbers, the suspected crank call numbers, the normal call numbers and the corresponding number categories are stored in a database.
In an embodiment of the invention, step 2, in which the user marks the received crank call number as a crank call, can also be understood as guiding the user to mark the category of each received call, with the emphasis on marking crank calls. Because the calls a user receives may be crank calls or normal calls, the user needs to mark the received telephone numbers that the user considers to be crank calls, so that reliable sample data is available later when building the classification model. The marking may specifically be performed as follows: for example, the user marks the received crank call number as a crank call according to the call content, call intention, caller identification number, or caller voice tone (12), as shown in FIG. 2. Number categories marked by users are generally highly accurate, but because users have limited time and energy, the volume of user-marked data is generally small.
In one embodiment of the invention, in step 4, extracting the features of the marked crank calls and establishing a classification model typically means extracting the features of the marked crank calls with higher confidence and establishing the classification model (14), as shown in FIG. 3. As mentioned above, the crank calls manually marked by users are limited in number, so the data volume is small but the accuracy is high; features extracted from the crank calls that users marked with higher confidence allow a more reliable classification model to be established.
It should be noted that, in step 4, after the classification model is established, it may be used to judge whether the calling number to be identified is a crank call. This judgment is needed because, in the following step, the user mark for the calling number and the judgment of the classification model are combined to decide comprehensively whether the calling number to be identified is a crank call.
It should be noted that step 4, that is, extracting the features of the marked crank calls, establishing a classification model, and using the classification model to judge whether the calling number to be identified is a crank call, is in fact a data mining process, and extracting the features of the marked crank calls is part of that process. From a macroscopic perspective, the data mining process is divided into a training process and a prediction process. Two types of strongly discriminative features are mined from the received telephone numbers marked as crank calls: attribute features and behavior features. The data mining process is thus also a process of deeply analyzing the sample data obtained in step 2, that is, the numbers that users marked as crank calls. In one embodiment of the invention, as shown in FIG. 4, the features of the marked crank calls may include attribute features and behavior features; the attribute features may include number type, number attribution, and number operator; the behavior features may include average dial-out missed ring time, dial-out on-time ratio, dial-out rejection ratio, incoming call dial-out time ratio, and average on-time.
Extracting the features of the marked crank calls and establishing the classification model in step 4 is the training process of the data mining process; its purpose is to obtain, by training, the classification model used to judge whether the calling number to be identified is a crank call. Specifically, the training process may include screening samples, extracting features, and training the model. "Screening samples" means screening out, or selecting, the marked crank calls with higher confidence, because whether a given telephone number counts as a crank call may differ from user to user; for example, an advertising call may be considered a crank call by some users but not by others. Selecting the marked crank calls with higher confidence means selecting telephone numbers that most users consider to be crank calls; for example, telephone numbers (calling numbers) involved in various kinds of fraud are selected as sample data, that is, as crank calls marked with higher confidence. "Extracting features" is the extraction of the features of the marked crank calls described above. "Training the model" is the establishment of the classification model described above.
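The "screening samples" step can be illustrated with a minimal Python sketch. The patent does not define how confidence is computed; the sketch assumes, purely for illustration, that confidence is the fraction of distinct users who marked a number as a crank call, and the function name `screen_high_confidence` and the thresholds `min_users` and `min_ratio` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# One user mark: (user_id, phone_number, marked_as_crank_call)
Mark = Tuple[str, str, bool]


def screen_high_confidence(
    marks: Iterable[Mark],
    min_users: int = 3,      # hypothetical: minimum number of distinct users who judged the number
    min_ratio: float = 0.8,  # hypothetical: minimum share of those users who marked it as a crank call
) -> List[str]:
    """Return the numbers that most users consider crank calls (assumed confidence rule)."""
    votes: Dict[str, Dict[str, bool]] = defaultdict(dict)
    for user_id, number, is_crank in marks:
        # Keep one vote per user and number; a later mark overrides an earlier one.
        votes[number][user_id] = is_crank

    selected = []
    for number, by_user in votes.items():
        total = len(by_user)
        positive = sum(by_user.values())
        if total >= min_users and positive / total >= min_ratio:
            selected.append(number)
    return sorted(selected)
```

A number marked by only one user, or one on which users disagree (such as an advertising call), falls below the thresholds and is left out of the training samples.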
Using the classification model in step 4 to judge whether the calling number to be identified is a crank call is the prediction process of the data mining process. The prediction process may include obtaining a calling number or call record, extracting features, predicting and classifying, and obtaining the number category. "Obtaining a calling number or call record" means obtaining the calling number of an incoming call, or deriving the calling number to be identified from a call record. "Extracting features" means extracting the attribute features and behavior features of the calling number to be identified. "Predicting and classifying" means comparing the extracted attribute features and behavior features of the calling number to be identified with those in the classification model to determine whether they are consistent. "Obtaining the number category" means judging, according to the result of predicting and classifying, whether the calling number to be identified is a crank call or a normal call.
It is noted that both the received telephone number mentioned in step 2 and the to-be-identified calling number mentioned in step 4 may contain nuisance calls and normal calls.
The attribute features extracted in the embodiments of the present invention may be the number type, number attribution, number operator, and the like, as shown in FIG. 4. The number type may be fixed telephone, mobile telephone, VoIP, UAN, etc.; the number attribution may refer to the place where the number is registered, which can generally be precise to the city; and the number operator may be China Mobile, China Unicom, China Telecom, etc. The extracted behavior features may be average dial-out missed ring time, dial-out on-time ratio, dial-out rejection ratio, incoming call dial-out time ratio, average on-time, and the like. The behavior features capture the tendency of crank calls toward frequent dialing, frequent hang-ups, and long call time; normal calls behave differently, so the two types can be distinguished.
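As a rough illustration of how the five behavior features might be computed from call records, consider the Python sketch below. The patent names the features but does not define them, so every formula here is an assumption: "dial-out on-time ratio" is read as the share of outgoing calls that were answered, "incoming call dial-out time ratio" as the ratio of incoming to outgoing call counts, and "average on-time" as the mean talk time of answered calls. The `CallRecord` fields and the feature keys are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CallRecord:
    direction: str       # "out" or "in", relative to the number being profiled
    outcome: str         # "answered", "rejected", or "missed"
    ring_seconds: float  # how long the call rang
    talk_seconds: float  # connected duration (0 if never answered)


def _mean(values: List[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def _ratio(numerator: int, denominator: int) -> float:
    return numerator / denominator if denominator else 0.0


def behavior_features(records: List[CallRecord]) -> Dict[str, float]:
    """Compute assumed definitions of the five behavior features for one number."""
    outgoing = [r for r in records if r.direction == "out"]
    incoming = [r for r in records if r.direction == "in"]
    answered = [r for r in records if r.outcome == "answered"]
    out_answered = [r for r in outgoing if r.outcome == "answered"]
    out_rejected = [r for r in outgoing if r.outcome == "rejected"]
    out_missed = [r for r in outgoing if r.outcome == "missed"]
    return {
        "avg_dialout_missed_ring_seconds": _mean([r.ring_seconds for r in out_missed]),
        "dialout_connect_ratio": _ratio(len(out_answered), len(outgoing)),
        "dialout_reject_ratio": _ratio(len(out_rejected), len(outgoing)),
        "incoming_to_dialout_ratio": _ratio(len(incoming), len(outgoing)),
        "avg_connected_seconds": _mean([r.talk_seconds for r in answered]),
    }
```

Attribute features such as number type or operator are categorical and would need to be encoded separately before being combined with these numeric values.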
In one embodiment of the invention, the classification model may be a random forest model. As is known to those skilled in the art, a random forest improves on a single decision tree by letting multiple decision trees vote. Assuming the random forest uses m decision trees, m sample sets of a certain size need to be generated, one to train each tree. Training all m decision trees on the full sample set is clearly undesirable, because full-sample training ignores the regularities of local samples and harms the generalization ability of the model. Since techniques for using random forest models are known in the art, they are not described in detail here.
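The idea of generating m sample sets and letting m trees vote can be sketched in a few lines of Python. This is a simplified illustration, not the patent's implementation: each "tree" is reduced to a one-level decision stump, the m sample sets are drawn by sampling with replacement (the usual bootstrap approach, which the patent does not spell out), and the per-split random feature selection of a full random forest is omitted. All function names are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], int]      # (feature vector, label); label 1 = crank call
Tree = Callable[[Sequence[float]], int]


def bootstrap_sets(samples: Sequence[Sample], m: int, size: int, seed: int = 0) -> List[List[Sample]]:
    """Draw m sample sets of the given size, sampling with replacement."""
    rng = random.Random(seed)
    return [[rng.choice(samples) for _ in range(size)] for _ in range(m)]


def train_stump(subset: Sequence[Sample]) -> Tree:
    """Fit a one-level tree: predict 1 when the chosen feature is >= the chosen threshold."""
    best = None  # (error count, feature index, threshold)
    for f in range(len(subset[0][0])):
        for x, _ in subset:
            t = x[f]
            errors = sum((1 if xi[f] >= t else 0) != yi for xi, yi in subset)
            if best is None or errors < best[0]:
                best = (errors, f, t)
    _, feature, threshold = best
    return lambda feats: 1 if feats[feature] >= threshold else 0


def train_forest(samples: Sequence[Sample], m: int, size: int, seed: int = 0) -> List[Tree]:
    """Train one stump per bootstrap sample set."""
    return [train_stump(subset) for subset in bootstrap_sets(samples, m, size, seed)]


def majority_vote(trees: Sequence[Tree], feats: Sequence[float]) -> int:
    """Return the label predicted by most trees."""
    return Counter(tree(feats) for tree in trees).most_common(1)[0][0]
```

Using an odd m avoids tied votes. In practice a library implementation, or a GBDT model as mentioned below, would replace these stumps.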
The random forest model is given here for illustration only and does not mean that the classification model used in embodiments of the invention can only be a random forest model. For example, a Gradient Boosting Decision Tree (GBDT) model or the like may be used.
In one embodiment of the invention, in step 6, whether a new calling number to be identified is a crank call is decided comprehensively from two processes: manual marking by users and automatic identification by the classification model. When both the user mark and the classification model judge the calling number to be identified to be a crank call, the calling number is judged to be a crank call number. When only one of the user mark and the classification model judges it to be a crank call, the calling number is judged to be a suspected crank call number. When both judge it to be a normal telephone number, the calling number is judged to be a normal telephone number.
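The three-way decision of step 6 can be written as a small Python function. The rule itself follows the description above; the names `NumberCategory` and `combine_judgments` and the string values of the categories are illustrative assumptions.

```python
from enum import Enum


class NumberCategory(Enum):
    HARASSING = "harassing"
    SUSPECTED = "suspected"
    NORMAL = "normal"


def combine_judgments(user_marked: bool, model_flagged: bool) -> NumberCategory:
    """Combine the user mark and the classification model result for one calling number."""
    if user_marked and model_flagged:
        # Both sources agree that the number is a crank call.
        return NumberCategory.HARASSING
    if user_marked or model_flagged:
        # Exactly one source flags the number.
        return NumberCategory.SUSPECTED
    # Neither source flags the number.
    return NumberCategory.NORMAL
```

Requiring agreement for the "harassing" category favors accuracy, while the "suspected" category keeps the numbers flagged by only one source, which supports coverage.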
In one embodiment of the present invention, in step 8, storing the crank call numbers, the suspected crank call numbers, the normal call numbers, and the corresponding number categories in the database may include: taking the crank call numbers, the suspected crank call numbers and the normal call numbers as keys k and the corresponding number categories as values v, and writing them into a k-v database. After the data is written into the k-v database, an actual service party, for example another user, can query any calling number to be identified in the k-v database to determine whether it is a crank call. After obtaining the result, the service party can, according to the result, directly hang up the call or perform an operation such as prompting the person answering the call, so that harassment by crank calls is avoided.
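A minimal Python sketch of the write-then-query flow follows. The patent does not name a specific k-v database, so an in-memory dictionary stands in for it here; the class `NumberCategoryStore`, the function `action_for`, and the mapping from category to action (hang up, prompt, allow) are assumptions made for illustration.

```python
from typing import Dict, Optional


class NumberCategoryStore:
    """In-memory stand-in for the k-v database: key = phone number, value = number category."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, number: str, category: str) -> None:
        self._data[number] = category

    def get(self, number: str) -> Optional[str]:
        # Returns None for numbers that have never been stored.
        return self._data.get(number)


def action_for(store: NumberCategoryStore, calling_number: str) -> str:
    """Pick an action for an incoming call (assumed policy, not specified by the patent)."""
    category = store.get(calling_number)
    if category == "harassing":
        return "hang_up"
    if category == "suspected":
        return "prompt_user"
    return "allow"
```

Because a lookup is a single key read, a service party can check each incoming call without running the classification model at call time.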
According to a second aspect of the invention, a crank call identification device 20 is provided, as shown in fig. 7, which may include: a marking unit 22 configured to enable a user to mark the received crank call number as a crank call; the building unit 24 is configured to extract features of the marked harassing calls, build a classification model, determine whether the calling number to be identified is a harassing call using the classification model, and determine that the calling number is a harassing call when the user mark and the classification model simultaneously determine that the calling number to be identified is a harassing call; when the user marks or the classification model judges that the calling number to be identified is a harassing call, the calling number is judged to be a suspected harassing call number; when the user mark and the classification model both judge that the calling number to be identified is a normal telephone number, the calling number is judged as the normal telephone number; and the database 26 is used for storing the crank call numbers, the suspected crank call numbers, the normal call numbers and the corresponding number categories.
In an embodiment of the present invention, the marking unit may be further configured so that the user can mark the received crank call number as a crank call according to the call content, the call intention, the caller ID number, or the voice tone of the calling party.
In another embodiment of the present invention, in the building unit, extracting the features of the marked crank calls further comprises: extracting the features of the marked crank calls with higher confidence, and establishing the classification model.
In a further embodiment of the invention, the characteristic of the marked crank call comprises an attribute characteristic and a behavior characteristic, wherein the attribute characteristic comprises a number type, a number attribution and a number operator; the behavior characteristics comprise average dial-out missed ring time, dial-out on-time ratio, dial-out rejection ratio, incoming call dial-out time ratio and average on-time.
In yet another embodiment of the present invention, the database is a k-v database that stores k-v data with the crank call numbers, suspected crank call numbers and normal call numbers as keys k and the corresponding number categories as values v.
In one embodiment of the invention, the classification model may comprise a random forest model.
With the crank call identification method, device, or computer device described above, the shortage of crank call data can be effectively addressed, meeting the need to curb crank calls that disturb residents and to genuinely clean up the communication service environment. Having users mark received crank call numbers yields high-quality samples and solves the data cold-start problem. Extracting the features of the marked crank calls, establishing a classification model, and using it to judge whether a calling number to be identified is a crank call means that the high-quality samples are analyzed in depth, with features extracted and the model trained using data mining and machine learning techniques, which addresses the low accuracy of manually formulated rules. Finally, combining the user marks with the classification model in a comprehensive judgment further improves the accuracy and coverage of the crank call identification result.
In an embodiment according to the third aspect of the present invention, there is provided a computer apparatus, which may include: one or more processors; a storage device for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as above.
In an embodiment according to the fourth aspect of the invention, a computer-readable storage medium is provided, which stores a computer program, characterized in that the program realizes the above method when executed by a processor.
For example, FIG. 8 shows a schematic block diagram of an example computer device 30 that may be used to implement embodiments of the present disclosure. It should be understood that the computer device 30 may be used to implement the crank call identification method 10 described in this disclosure. As shown, computer device 30 includes a Central Processing Unit (CPU) 32 that may perform various appropriate actions and processes in accordance with computer program instructions stored in a Read Only Memory (ROM) 34 or loaded from a storage unit 46 into a Random Access Memory (RAM) 36. In the RAM 36, various programs and data required for the operation of the computer device 30 may also be stored. The CPU 32, ROM 34, and RAM 36 are connected to each other by a bus 38. An input/output (I/O) interface 40 is also connected to bus 38.
A number of components in the computer device 30 are connected to the I/O interface 40, including: an input unit 42 such as a keyboard, a mouse, etc.; an output unit 44 such as various types of displays, speakers, and the like; a storage unit 46 such as a magnetic disk, an optical disk, or the like; and a communication unit 48 such as a network card, modem, wireless communication transceiver, etc. The communication unit 48 allows the computer device 30 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The CPU 32 performs the various methods and processes described above, such as method 10. For example, in some embodiments, the method 10 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 46. In some embodiments, part or all of the computer program may be loaded and/or installed onto the computer device 30 via the ROM 34 and/or the communication unit 48. When the computer program is loaded into the RAM 36 and executed by the CPU 32, one or more of the acts or steps of method 10 described above may be performed. Alternatively, in other embodiments, the CPU 32 may be configured to perform the method 10 by any other suitable means (e.g., by way of firmware).
The functions described above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided there is no contradiction, one skilled in the art can combine the various embodiments or examples described in this specification, as well as the features of different embodiments or examples.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, for instance via optical scanning of the paper or other medium, then compiled, interpreted, or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be carried out by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
While the invention has been described with reference to specific embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (14)

1. A crank call identification method is characterized by comprising the following steps:
guiding a user to mark the received crank call number as a crank call;
screening the marked crank calls to obtain a screening sample, extracting the characteristics of the screening sample, establishing a classification model, and judging whether the calling number to be identified is a crank call by using the classification model;
when the user mark and the classification model both judge that the calling number to be identified is a crank call, judging the calling number to be a crank call number; when only one of the user mark and the classification model judges that the calling number to be identified is a crank call, judging the calling number to be a suspected crank call number; when the user mark and the classification model both judge that the calling number to be identified is a normal telephone number, judging the calling number to be a normal telephone number;
and storing the crank call number, the suspected crank call number, the normal call number and the corresponding number category in a database.
2. The crank call identification method according to claim 1, wherein the step of guiding the user to mark the received crank call number as a crank call comprises the steps of:
and guiding the user to mark the received crank call number as a crank call according to the call content, the call intention, the caller identification number or the voice tone of the calling party.
3. A crank call identification method according to claim 2, wherein said screened sample is said labeled crank call with higher confidence.
4. The crank call identification method according to claim 3, wherein the characteristics of the screening sample comprise attribute characteristics and behavior characteristics, wherein the attribute characteristics comprise number type, number attribution, number operator; the behavior characteristics comprise average dial-out missed ring time, dial-out on-time ratio, dial-out rejection ratio, incoming call dial-out time ratio and average on-time.
5. A method for identifying crank calls as claimed in claim 4 wherein storing said crank call numbers, suspected crank call numbers, normal telephone numbers and corresponding number categories in a database comprises:
and taking the crank call number, the suspected crank call number and the normal call number as key values k, taking the corresponding number category as a numerical value v, and writing the key values k and the numerical value v into a k-v database.
6. A crank call identification method according to any of claims 1-5, wherein said classification model comprises a random forest model.
7. A crank call identification device, comprising:
the marking unit is configured to guide a user to mark the received crank call number as a crank call;
the construction unit is configured to screen the marked crank calls to obtain screening samples, extract the characteristics of the screening samples, establish a classification model, and judge whether the calling number to be identified is a crank call by using the classification model; when the user mark and the classification model both judge that the calling number to be identified is a crank call, the calling number is judged to be a crank call number; when only one of the user mark and the classification model judges that the calling number to be identified is a crank call, the calling number is judged to be a suspected crank call number; when the user mark and the classification model both judge that the calling number to be identified is a normal telephone number, the calling number is judged to be a normal telephone number;
and the database is used for storing the crank call numbers, suspected crank call numbers, normal call numbers and corresponding number categories.
8. A crank call identification device according to claim 7, wherein said tagging unit is further configured to:
guide the user to mark the received crank call number as a crank call according to the call content, the call intention, the caller identification number or the voice tone of the calling party.
9. A crank call identification device according to claim 8, wherein said screened sample is said marked crank call with higher confidence.
10. The crank call identification device according to claim 9, wherein the characteristics of the screening sample include attribute characteristics and behavior characteristics, the attribute characteristics include number type, number attribution, number operator; the behavior characteristics comprise average dial-out missed ring time, dial-out on-time ratio, dial-out rejection ratio, incoming call dial-out time ratio and average on-time.
11. A crank call identification device according to claim 10, wherein said database is a k-v database in which k-v data having said crank call number, suspected crank call number, normal call number as key value k, and corresponding number category as value v are stored.
12. A crank call identification device according to any of claims 7-11, wherein said classification model comprises a random forest model.
13. A computer device, comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any of claims 1-6.
14. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-6.
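As an illustration only, the storage step of claims 1 and 5 (telephone number as key k, number category as value v) can be sketched in Python. The class `NumberCategoryStore`, the category labels, and the telephone numbers below are all hypothetical; an in-memory dictionary stands in for the k-v database, which the claims do not name.

```python
from typing import Dict, Optional

# Hypothetical labels: the claims name three categories but not their encoding.
CATEGORIES = ("crank", "suspected_crank", "normal")


class NumberCategoryStore:
    """In-memory stand-in for the k-v database (an assumption for illustration)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def put(self, number: str, category: str) -> None:
        """Write the number as key k and its category as value v."""
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self._data[number] = category

    def get(self, number: str) -> Optional[str]:
        """Return the stored category, or None if the number is unknown."""
        return self._data.get(number)


# Usage with made-up numbers.
store = NumberCategoryStore()
store.put("01012345678", "crank")
store.put("01087654321", "normal")
```

Keying on the number means that a lookup for an incoming call needs a single read, and a number that has never been judged simply returns no category.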
CN201811294711.9A 2018-11-01 2018-11-01 Crank call identification method and device Active CN111131593B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811294711.9A CN111131593B (en) 2018-11-01 2018-11-01 Crank call identification method and device

Publications (2)

Publication Number Publication Date
CN111131593A CN111131593A (en) 2020-05-08
CN111131593B true CN111131593B (en) 2021-04-13

Family

ID=70494324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811294711.9A Active CN111131593B (en) 2018-11-01 2018-11-01 Crank call identification method and device

Country Status (1)

Country Link
CN (1) CN111131593B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113709747B (en) * 2020-05-09 2023-10-13 中国移动通信集团有限公司 Harassment number identification method and device, computer equipment and storage medium
CN111654866A (en) * 2020-05-29 2020-09-11 北京合力思腾科技股份有限公司 Method, device and computer storage medium for preventing mobile communication from fraud
CN112199388A (en) * 2020-09-02 2021-01-08 卓望数码技术(深圳)有限公司 Strange call identification method and device, electronic equipment and storage medium
CN112671982B (en) * 2020-12-15 2021-09-14 中国信息通信研究院 Crank call identification method and system
CN114765648B (en) * 2021-01-15 2023-07-21 中国联合网络通信集团有限公司 Harassment call treatment method, harassment call treatment system, computer equipment and storage medium
CN114025041B (en) * 2021-11-29 2023-10-13 号百信息服务有限公司 System and method for rapidly identifying nuisance calls based on non-frequency characteristics of signaling

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108153727A (en) * 2017-12-18 2018-06-12 浙江鹏信信息科技股份有限公司 Utilize the method for semantic mining algorithm mark sales calls and the system of improvement sales calls
CN108366173A (en) * 2018-01-05 2018-08-03 腾讯科技(深圳)有限公司 A kind of phone recognition methods, relevant device and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104683538B (en) * 2015-02-13 2017-11-03 广州市讯飞樽鸿信息技术有限公司 Harassing call number banking process and system
CN106255113A (en) * 2015-06-10 2016-12-21 中兴通讯股份有限公司 The recognition methods of harassing call and device
CN107404589A (en) * 2017-08-10 2017-11-28 北京泰迪熊移动科技有限公司 Kind identification method, device and the terminal device of call number
CN108449482A (en) * 2018-02-09 2018-08-24 北京泰迪熊移动科技有限公司 The method and system of Number Reorganization

Also Published As

Publication number Publication date
CN111131593A (en) 2020-05-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant