CN109587357A - A method for identifying harassing calls - Google Patents

A method for identifying harassing calls

Info

Publication number
CN109587357A
Authority
CN
China
Prior art keywords
caller number
threshold value
harassing
acquisition system
data acquisition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811357638.5A
Other languages
Chinese (zh)
Other versions
CN109587357B (en)
Inventor
李鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Matu Information Technology Co Ltd
Original Assignee
Shanghai Matu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Matu Information Technology Co Ltd
Priority to CN201811357638.5A
Publication of CN109587357A
Application granted
Publication of CN109587357B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/436 Arrangements for screening incoming calls, i.e. evaluating the characteristics of a call before deciding whether to answer it
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/42025 Calling or Called party identification service
    • H04M3/42034 Calling party identification service
    • H04M3/42059 Making use of the calling party identifier

Abstract

The present invention relates to the field of electronic communication technology, and in particular to a method for identifying harassing calls, comprising: reading call data and grouping it by a set time interval to form multiple record entries, which make up data set A; cleaning the grouped call data by deleting from data set A the record entries whose specified fields are empty, to obtain data set B; computing statistics for each calling number within each set time interval in data set B to generate the features of the calling number in data set B, denoted set C; and judging, from the generated features, whether the calling number is a harassing caller within the set time interval. The invention applies a multi-stage, multi-layer set of judgment rules whose thresholds are determined by cluster analysis and information entropy, and thereby obtains a judgment for each call. The invention is highly usable and more flexible.

Description

A method for identifying harassing calls
Technical field
The present invention relates to the field of electronic communication technology, and in particular to a method for identifying harassing calls.
Background
With the continuous development of communication technology, mobile communication services keep growing richer while the cost of building mobile networks and the cost of handsets keep falling; people depend on mobile communication more and more and use it ever more frequently. However, while the rapid development of mobile communication brings convenience, it also lets some people use it for commercial purposes to publicize and spread harassing information. Harassing calls have consequently become rampant and cause great trouble in people's lives; they affect not only daily life but also the normal development of society. Harassing calls mainly take the following form: an illegitimate user dials mobile customers on a large scale and hangs up after one ring, then waits for the customer to call back, at which point the call is forwarded to a voice information service in order to harass and defraud. Subjectively such calls go against the will of the mobile user, and objectively they infringe on the user's freedom of communication and peaceful life, or amount to blind calling of users.
Chinese patent application No. 201410249964.X discloses a method and device for identifying harassing calls: the caller's historical call information and registration information are collected and evaluated; if they satisfy preset conditions the call is judged to be a harassing call, otherwise it is considered a non-harassing call. Chinese patent application No. 201710552232.1 discloses a method for identifying and intercepting harassing calls: raw data are processed by collecting communication network signaling information, recognition factors are then selected according to features, all calls are classified with a weighted naive Bayes classification algorithm to identify harassing calls, and the calls are finally intercepted. Chinese patent application No. 201610312825.6 discloses a method, device and terminal for identifying harassing calls that rely on voiceprint information: after an incoming call is connected, the voiceprint of the calling party's voice sample is obtained and matched against pre-stored voiceprints; if the match succeeds and the pre-stored voiceprint carries a harassing-call mark, the call is marked as a harassing call.
However, existing methods identify harassing calls using a weighted naive Bayes classification algorithm, voiceprint recognition, or condition-based judgment, and have the following shortcomings. Rule thresholds that are set manually have low reliability. Classification algorithms classify calls on the basis of recognition factors selected from features, but the forms of harassing calls and their calling numbers change daily and the features of harassing calls keep shifting, so such methods adapt poorly. In addition, identifying harassing calls by matching against a pre-labeled voiceprint library has a very limited scope of application, because the voice of the person placing the calls may change from day to day, or the voiceprint may be altered with a voice conversion system. Existing methods can therefore identify harassing calls, but their scope of application is limited and their adaptability is poor.
Summary of the invention
In view of the shortcomings of the prior art, an object of the present invention is to provide a method for identifying harassing calls that is highly usable and more flexible.
A method for identifying harassing calls provided by an embodiment of the present invention comprises:
Reading call data and grouping the call data by a set time interval to form multiple record entries, the multiple record entries making up data set A;
Cleaning the grouped call data by deleting from data set A the record entries whose specified fields are empty, to obtain data set B;
Computing statistics for each calling number within each set time interval in data set B to generate the features of the calling number in data set B, denoted set C;
Judging, from the generated features of the calling number in data set B, whether the calling number is a harassing caller within the set time interval.
Further, in the above method, each record entry includes, but is not limited to, one or more of the following: called number, calling number, start time, duration, call type, originating or terminating, enterprise number, ring duration, termination code and called city.
Further, in the above method, the generated features of the calling number in data set B include: dial count, called-number non-repetition rate, call failure rate, talk duration, sequential-number dialing flag, number of called cities, and internal-line called rate.
Further, in the above method, whether the calling number is a harassing caller within the set time interval is judged from the generated features of the calling number in data set B as follows:
If the sequential-number dialing flag = 1, the calling number is a harassing calling number; calling numbers not yet judged proceed to the next rule;
If the internal-line called rate > threshold a, the calling number is a normal calling number; calling numbers not yet judged proceed to the next rule;
If the talk duration > threshold b, the calling number is a normal calling number; calling numbers not yet judged proceed to the next rule;
If the dial count > threshold c and the called-number non-repetition rate >= threshold d, the calling number is a harassing calling number; calling numbers not yet judged proceed to the next rule;
If the dial count > threshold c and the call failure rate >= threshold e, the calling number is a harassing calling number; calling numbers not yet judged proceed to the next rule;
If the number of called cities >= threshold f, the calling number is a harassing calling number; any calling number still not judged is a normal calling number.
Further, in the above method, each threshold is determined as follows:
The calling number and the time label are combined to serve as the label of each record, forming data set D, and cluster analysis is performed on data set D with the K-means algorithm;
After cluster analysis, all calling numbers are automatically divided into ten classes, and the characteristics of each class are represented by the average feature values of its calling numbers;
The classification result is added to data set D to describe the class to which each record entry belongs, and the updated data set is denoted E;
By determining whether each class is a harassing class, each record entry is judged to be a harassing entry or not; a parameter taking the harassing-entry value or the normal-entry value is added to set E, forming set F;
Information entropy is calculated with respect to whether an entry is harassing: Ent(X) = -(P0·log2(P0) + P1·log2(P1)), where P0 is the proportion of normal entries and P1 is the proportion of harassing entries; each threshold is then calculated.
Further, in the above method, each threshold is calculated as follows:
Set the minimum value and maximum value of the threshold and the step size for each calculation;
Set the threshold to the minimum value; all entries in set E greater than the threshold form the first group, and those less than the threshold form the second group;
Calculate the information entropy of "is harassing" for each of the two groups, then combine the two results and record them;
Increase the threshold from the minimum value by one step size at a time until the maximum value is reached;
Select the threshold corresponding to the minimum entropy sum as the final result.
Further, in the above method, the threshold a of the internal-line called rate has a minimum value of 0, a maximum value of 1, and a step size of 0.01.
Further, in the above method, the threshold b of the talk duration has a minimum value of 0, a maximum value of 200, and a step size of 1.
Further, in the above method, the threshold c of the dial count has a minimum value of 0, a maximum value of 100, and a step size of 1.
Further, in the above method, the threshold d of the called-number non-repetition rate has a minimum value of 0, a maximum value of 1, and a step size of 0.01.
Further, in the above method, the threshold e of the call failure rate has a minimum value of 0, a maximum value of 1, and a step size of 0.01.
Further, in the above method, the threshold f of the number of called cities has a minimum value of 0, a maximum value of 50, and a step size of 1.
Compared with the prior art, the method for identifying harassing calls provided by an embodiment of the present invention comprises: reading call data and grouping it by a set time interval to form multiple record entries, which make up data set A; cleaning the grouped call data by deleting from data set A the record entries whose specified fields are empty, to obtain data set B; computing statistics for each calling number within each set time interval in data set B to generate the features of the calling number in data set B, denoted set C; and judging, from the generated features, whether the calling number is a harassing caller within the set time interval. The invention applies a multi-stage, multi-layer set of judgment rules whose thresholds are determined by cluster analysis and information entropy, and thereby obtains a judgment for each call. Because the thresholds are not set manually but can be adjusted according to information entropy, the invention is highly usable and more flexible.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of a method for identifying harassing calls provided by the invention;
Fig. 2 is a flow chart of the threshold determination method provided by the invention;
Fig. 3 is a flow chart of the threshold calculation method provided by the invention.
Detailed description
To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
The embodiments of the present invention are described in further detail below with reference to the drawings.
As shown in Fig. 1, an embodiment of the invention discloses a method for identifying harassing calls, comprising:
S101: read call data and group the call data by a set time interval to form multiple record entries, the multiple record entries making up data set A;
S102: clean the grouped call data by deleting from data set A the record entries whose specified fields are empty, to obtain data set B;
S103: compute statistics for each calling number within each set time interval in data set B to generate the features of the calling number in data set B, denoted set C;
S104: judge, from the generated features of the calling number in data set B, whether the calling number is a harassing caller within the set time interval.
In step S101 of the embodiment, the call data are split into five-minute time slices.
Further, in the above method, each record entry includes, but is not limited to, one or more of the following: called number, calling number, start time, duration, call type, call type (originating or terminating), enterprise number, ring duration, termination code and called city. For example, one record entry is [15802811404, 02095056015, 20171227090031, 27, 0, 1, 2004902310, 5, 0, 1, Chengdu/Sichuan].
Specifically, the items in the above record entry are expressed as follows:
After reading all the call data, the embodiment groups the data by start time into five-minute intervals. The initial time is set to the earliest call start time, and slicing continues until all call data have been assigned. For example, if the earliest call start time is 00:00:00 on December 30, 2017, the data are divided into "00:00:00-00:04:59, 00:05:00-00:09:59, ...". The result can be denoted A (A1, A2, ...), where An denotes each group of data and A denotes the set of all groups. The grouped data then proceed to step S102.
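The five-minute grouping described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the dictionary key `start_time` and the function names are hypothetical, and each slice is labeled by its start minute in the `YYYYMMDDHHMM` style that the description uses for time labels.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SLICE = timedelta(minutes=5)  # slice width used in the embodiment


def slice_label(start: datetime, origin: datetime) -> str:
    """Return the YYYYMMDDHHMM label of the five-minute slice containing `start`."""
    index = (start - origin) // SLICE  # whole slices elapsed since the origin
    return (origin + index * SLICE).strftime("%Y%m%d%H%M")


def group_by_slice(records):
    """Group record dicts (hypothetical key: 'start_time') into set A, keyed by slice label."""
    times = [datetime.strptime(r["start_time"], "%Y%m%d%H%M%S") for r in records]
    origin = min(times)  # the earliest call start time opens the first slice
    groups = defaultdict(list)
    for record, t in zip(records, times):
        groups[slice_label(t, origin)].append(record)
    return dict(groups)


# Illustrative records only; real entries carry many more fields.
records = [
    {"start_time": "20171230000000"},
    {"start_time": "20171230000459"},
    {"start_time": "20171230000500"},
]
A = group_by_slice(records)
```

Here the first two records fall into the slice starting at 00:00 and the third into the slice starting at 00:05, matching the example boundaries in the text.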
In step S102, the embodiment cleans the data of each five-minute time slice. Specifically, entries in An that have a missing value in any field other than the called enterprise number are first deleted; for example, a record entry whose calling number or called number is empty must be deleted (an entry in which only the called enterprise number is empty is kept). Then the originating-call records are extracted, i.e., the record entries with "call type (originating or terminating)" = 1. Applying this processing to each An yields Bn, and all Bn together are denoted B (B1, B2, ...). B (B1, B2, ...) should contain the same number of groups as A (A1, A2, ...). The resulting data set B proceeds to step S103.
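A rough sketch of this cleaning step, under stated assumptions: the field names (`caller`, `callee`, `direction`, and so on) are hypothetical stand-ins for the record fields, and `direction == 1` stands for the originating-call marker mentioned above.

```python
# Hypothetical names for the fields that must be present; the called
# enterprise number is deliberately left out because it may be empty.
REQUIRED = ("caller", "callee", "start_time", "duration", "ring_duration", "direction")


def clean_slice(entries):
    """Return Bn: the originating-call entries of An with no missing required field."""
    cleaned = []
    for e in entries:
        # Drop entries with a missing value in any required field.
        if any(e.get(k) in (None, "") for k in REQUIRED):
            continue
        # Keep only originating-call records (marker value 1 in the description).
        if e["direction"] != 1:
            continue
        cleaned.append(e)
    return cleaned


base = {"caller": "02095056015", "callee": "15802811404",
        "start_time": "20171227090031", "duration": 27,
        "ring_duration": 5, "direction": 1, "callee_enterprise": None}
A = {"201712270900": [
    base,                          # kept: only the called enterprise number is empty
    dict(base, caller=""),         # dropped: calling number missing
    dict(base, direction=0),       # dropped: not an originating-call record
]}
B = {label: clean_slice(entries) for label, entries in A.items()}
```

Building B with one output group per input group keeps the number of groups in B equal to that in A, as the text requires, even when a slice ends up empty.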
In step S103, the embodiment computes features for each calling number in each five-minute time slice, generating the features used in the subsequent judgment. Preferably, the generated features include: dial count, called-number non-repetition rate, call failure rate, talk duration, sequential-number dialing flag, number of called cities, and internal-line called rate.
Specifically, the dial count is the total number of calls made by the same calling number in Bn.
The called-number non-repetition rate is obtained by first collecting all called numbers dialed by the same calling number, removing the duplicates, and counting the distinct called numbers that remain; the rate is the number of distinct called numbers divided by the dial count of that calling number.
The call failure rate is obtained by counting the record entries of the same calling number whose call type = 1, i.e., calls that were dialed but not connected; the ratio of this count to the dial count is the call failure rate.
The talk duration is the average of (duration - ring duration) over the records of a given calling number in Bn, in seconds.
The number of called cities is obtained by collecting all called cities of a given calling number in Bn and removing the duplicates; the number of distinct cities is that calling number's number of called cities.
Sequential-number dialing is defined per calling number: if the called numbers of two consecutive records differ only in their last three digits and are not the same number, this counts as one suspected sequential dialing; if a calling number has 5 suspected sequential dialings within one Bn, sequential-number dialing behavior is deemed present and the feature is 1, otherwise it is 0.
The internal-line called rate is obtained by counting, among the calls placed by the same calling number, the records in which the calling enterprise number equals the called enterprise number; this count divided by the dial count of that calling number is the internal-line called rate.
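The seven features can be sketched for a single calling number within one slice as below. The field names are hypothetical, the entries are assumed to be sorted by start time, and two readings of the text are assumptions: "5 suspected sequential dialings" is treated as "at least 5", and an empty called enterprise number never counts as an internal call.

```python
def is_suspected_sequential(prev: str, cur: str) -> bool:
    """True when two consecutive called numbers differ only within their last three digits."""
    return prev != cur and len(prev) == len(cur) and prev[:-3] == cur[:-3]


def same_enterprise(e) -> bool:
    """Internal call: called enterprise number is present and equals the calling one."""
    callee_ent = e.get("callee_enterprise")
    return bool(callee_ent) and callee_ent == e.get("caller_enterprise")


def caller_features(entries):
    """Compute the seven features for one calling number within one slice Bn."""
    n = len(entries)
    callees = [e["callee"] for e in entries]
    suspected = sum(is_suspected_sequential(a, b) for a, b in zip(callees, callees[1:]))
    return {
        "dial_count": n,
        "distinct_callee_rate": len(set(callees)) / n,
        "failure_rate": sum(e["unanswered"] for e in entries) / n,
        "talk_time": sum(e["duration"] - e["ring_duration"] for e in entries) / n,
        "sequential_dialing": 1 if suspected >= 5 else 0,  # assumption: "5" means at least 5
        "called_city_count": len({e["city"] for e in entries}),
        "internal_call_rate": sum(same_enterprise(e) for e in entries) / n,
    }


# Six unanswered calls to consecutive numbers: five suspected sequential dialings.
entries = [
    {"callee": f"1380000{i:04d}", "unanswered": 1, "duration": 5, "ring_duration": 5,
     "city": "Chengdu" if i % 2 else "Shanghai",
     "caller_enterprise": "E1", "callee_enterprise": None}
    for i in range(1, 7)
]
features = caller_features(entries)
```

Running `caller_features` once per calling number per slice and collecting the results gives one row of Cn per calling number.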
By computing statistics for all calling numbers in Bn, the embodiment obtains the features of each calling number in Bn, as shown in the following table:
In the above table, the time label 201712291710 denotes the time slice from 17:10:00 to 17:14:59 on December 29, 2017.
After this calculation, every Bn yields a table of the above form, denoted Cn; the set of all Cn is denoted C.
Further, in the above method, whether the calling number is a harassing caller within the set time interval is judged from the generated features of the calling number in data set B as follows:
If the sequential-number dialing flag = 1, the calling number is a harassing calling number; calling numbers not yet judged proceed to the next rule;
If the internal-line called rate > threshold a, the calling number is a normal calling number; calling numbers not yet judged proceed to the next rule;
If the talk duration > threshold b, the calling number is a normal calling number; calling numbers not yet judged proceed to the next rule;
If the dial count > threshold c and the called-number non-repetition rate >= threshold d, the calling number is a harassing calling number; calling numbers not yet judged proceed to the next rule;
If the dial count > threshold c and the call failure rate >= threshold e, the calling number is a harassing calling number; calling numbers not yet judged proceed to the next rule;
If the number of called cities >= threshold f, the calling number is a harassing calling number; any calling number still not judged is a normal calling number.
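The six rules above form an ordered cascade in which the first rule that fires decides the label. A minimal sketch follows; the feature keys are hypothetical names, and the threshold values in the example are illustrative placeholders rather than values from the patent, which derives them by calculation.

```python
from typing import Mapping


def classify(f: Mapping[str, float], t: Mapping[str, float]) -> str:
    """Apply the six rules in order; the first rule that fires decides the label."""
    if f["sequential_dialing"] == 1:
        return "harassing"
    if f["internal_call_rate"] > t["a"]:
        return "normal"
    if f["talk_time"] > t["b"]:
        return "normal"
    if f["dial_count"] > t["c"] and f["distinct_callee_rate"] >= t["d"]:
        return "harassing"
    if f["dial_count"] > t["c"] and f["failure_rate"] >= t["e"]:
        return "harassing"
    # Last rule: anything still undecided is normal unless it reaches many cities.
    return "harassing" if f["called_city_count"] >= t["f"] else "normal"


# Placeholder thresholds for illustration only.
thresholds = {"a": 0.5, "b": 30, "c": 20, "d": 0.9, "e": 0.8, "f": 10}

quiet = {"sequential_dialing": 0, "internal_call_rate": 0.0, "talk_time": 5,
         "dial_count": 3, "distinct_callee_rate": 1.0, "failure_rate": 0.0,
         "called_city_count": 1}
bulk = dict(quiet, dial_count=50, distinct_callee_rate=0.95)
internal = dict(bulk, internal_call_rate=0.9)

results = [classify(x, thresholds) for x in (quiet, bulk, internal)]
```

Because the rules are ordered, a high internal-line called rate clears a calling number before its dial count is ever examined, which is why `internal` is labeled normal even though it shares the bulk-dialing features of `bulk`.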
After the above judgment, the calling numbers in a given time slice Cn are divided into two classes: normal calling numbers and harassing calling numbers. At this point the invention has obtained the list of harassing calling numbers, completing the goal of identifying harassing calls.
It is worth noting that the thresholds above are not set manually but are obtained by calculation. That is, computing them from records collected in different environments yields different judgment parameters. The invention therefore has strong adaptability.
Further, as shown in Fig. 2, each threshold is determined as follows:
S201: the calling number and the time label are combined to serve as the label of each record, forming data set D, and cluster analysis is performed on data set D with the K-means algorithm;
S202: after cluster analysis, all calling numbers are automatically divided into ten classes, and the characteristics of each class are represented by the average feature values of its calling numbers;
S203: the classification result is added to data set D to describe the class to which each record entry belongs, and the updated data set is denoted E;
S204: by determining whether each class is a harassing class, each record entry is judged to be a harassing entry or not; a parameter taking the harassing-entry value or the normal-entry value is added to set E, forming set F;
S205: information entropy is calculated with respect to whether an entry is harassing: Ent(X) = -(P0·log2(P0) + P1·log2(P1)), where P0 is the proportion of normal entries and P1 is the proportion of harassing entries; each threshold is then calculated.
In this implementation, C1 ... Cn are merged into one set, and the calling number and time label are combined into a single parameter (calling number-time label), for example (0111615274-201712291710). This data set is denoted D. The (calling number-time label) parameter is the label of each record, and the other values serve as the record's features for the subsequent cluster analysis.
The embodiment performs cluster analysis on data set D with the K-means algorithm. To fully uncover the classes that may exist, the number of clusters is set to 10. After the clustering algorithm runs, all calling numbers are automatically divided into ten classes, and the characteristics of each class are represented by its average values, as shown in the following table:
Every (calling number-time slice) record belongs to exactly one of the ten classes. The classification result is added to D: D gains one more column (the assigned class) describing the class to which the record entry belongs, with a value from 0 to 9. The updated data set is denoted E.
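The clustering step and the construction of E can be sketched with a small pure-Python version of Lloyd's K-means algorithm. This is only an illustration under assumptions: the patent does not say which K-means implementation, initialization, or feature scaling is used; the demo uses two features and k = 2 on made-up data so that it stays readable, whereas the embodiment uses all features and k = 10; and apart from the first key, the record keys are invented.

```python
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct starting points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# D: (calling number-time label) keys with illustrative (dial_count, failure_rate) features.
keys = ["0111615274-201712291710", "caller2-201712291710", "caller3-201712291710",
        "caller4-201712291710", "caller5-201712291715", "caller6-201712291715"]
points = [(2, 0.1), (3, 0.0), (1, 0.2), (40, 0.9), (45, 0.95), (50, 0.8)]

labels, centroids = kmeans(points, k=2)
# E: D plus one extra column holding the assigned class.
E = [{"key": key, "features": p, "cluster": lab} for key, p, lab in zip(keys, points, labels)]
```

The centroids returned here play the role of the per-class averages in the class table: each one summarizes the typical feature values of the calling numbers assigned to that class.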
In step S204, each class is marked as harassing or not, and each record entry is then marked as a harassing entry or not. In the class table, whether a class represents harassing calls is decided according to common sense. In particular, the invention assigns classes whose dial count exceeds 20 to the suspected-harassing category, assigns classes that exhibit sequential-number dialing to the suspected-harassing category, and assigns classes whose internal-line called rate equals 1 to the normal category. All remaining classes are assigned to the normal category. That is, in the above table, classes [2, 3, 4, 5, 7] are harassing classes and classes [0, 1, 6, 8, 9] are normal classes.
In this implementation, all record entries in data set E are divided into two kinds according to the above class judgment: an entry whose class is a harassing class is a harassing entry, and an entry whose class is a normal class is a normal entry. A parameter "is harassing" is added to data set E, with value 1 for harassing entries and 0 for normal entries. The updated data set is denoted F.
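The class labeling and the construction of F might look like the sketch below. Several points are assumptions because the text leaves them open: the order in which the three class rules are checked (here it mirrors the order of the judgment rules), the reading of "exhibits sequential-number dialing" as a class average of at least 0.5 for the 0/1 flag, and all key names.

```python
def is_harassing_class(centroid) -> bool:
    """Decide from a class's average feature values whether it is a harassing class."""
    if centroid["sequential_dialing"] >= 0.5:  # assumption: the flag is 1 for most members
        return True
    if centroid["internal_call_rate"] == 1:    # purely internal calling is treated as normal
        return False
    return centroid["dial_count"] > 20         # heavy dialing is suspected harassment


def label_entries(E, centroids):
    """Return F: every entry of E plus an 'is_harassing' column (1 or 0)."""
    harassing = {c for c, centroid in centroids.items() if is_harassing_class(centroid)}
    return [dict(e, is_harassing=1 if e["cluster"] in harassing else 0) for e in E]


# Illustrative class averages and entries; not the values from the patent's table.
centroids = {
    0: {"dial_count": 3.0, "sequential_dialing": 0.0, "internal_call_rate": 0.2},
    1: {"dial_count": 45.0, "sequential_dialing": 0.0, "internal_call_rate": 0.0},
    2: {"dial_count": 30.0, "sequential_dialing": 0.0, "internal_call_rate": 1},
}
E = [{"key": "k0", "cluster": 0}, {"key": "k1", "cluster": 1}, {"key": "k2", "cluster": 2}]
F = label_entries(E, centroids)
```

Checking the internal-line rule before the dial-count rule keeps a busy internal extension (class 2 here) in the normal category, which is one plausible way to reconcile the rules when more than one applies.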
Because information entropy is calculated only with respect to whether an entry is harassing, there are only two classes, 0 and 1, and the formula is: Ent(X) = -(P0·log2(P0) + P1·log2(P1)), where P0 is the proportion of normal entries, equal to the number of normal entries divided by the total number of entries, and P1 is the proportion of harassing entries, equal to the number of harassing entries divided by the total number of entries. The smaller the information entropy, the larger the difference between the numbers of 0 and 1 entries; the larger the information entropy, the smaller that difference.
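A small sketch of the two-class entropy follows. It uses the standard Shannon form with a leading minus sign, which is the form under which smaller entropy corresponds to a larger imbalance between 0 and 1 entries. Treating an empty group and the term 0·log2(0) as contributing 0 is a conventional assumption that the text does not spell out.

```python
from math import log2


def binary_entropy(labels) -> float:
    """Shannon entropy in bits of a list of 0/1 labels (0 = normal, 1 = harassing)."""
    if not labels:
        return 0.0                       # assumption: an empty group contributes nothing
    p1 = sum(labels) / len(labels)       # proportion of harassing entries
    p0 = 1.0 - p1                        # proportion of normal entries
    # Terms with p == 0 are skipped, following the convention 0 * log2(0) = 0.
    return -sum(p * log2(p) for p in (p0, p1) if p > 0)


balanced = binary_entropy([0, 1])        # equal numbers of 0 and 1: maximum entropy
pure = binary_entropy([1, 1, 1])         # only one kind of entry: zero entropy
skewed = binary_entropy([0, 0, 0, 1])    # imbalanced: somewhere in between
```

A balanced group gives the maximum of 1 bit and a pure group gives 0, so a split that pushes both groups toward purity lowers the entropy sum.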
Further, as shown in Fig. 3, each threshold is calculated as follows:
S301: set the minimum value and maximum value of the threshold and the step size for each calculation;
S302: set the threshold to the minimum value; all entries in set E greater than the threshold form the first group, and those less than the threshold form the second group;
S303: calculate the information entropy of "is harassing" for each of the two groups, then combine the two results and record them;
S304: increase the threshold from the minimum value by one step size at a time until the maximum value is reached;
S305: select the threshold corresponding to the minimum entropy sum as the final result.
In this implementation, the threshold calculation is illustrated with the internal-line called rate:
Step 1: set the possible minimum value 0 and maximum value 1 of the threshold, and the step size 0.01 for each calculation.
Step 2: set the threshold of the internal-line called rate to the minimum value 0; all entries in E whose internal-line called rate is greater than the threshold form the first group, and all entries whose internal-line called rate is less than the threshold form the second group.
Step 3: calculate the information entropy of "is harassing" for each of the two groups, sum the two results, and record the sum.
Step 4: increase the threshold by the step size each time until the maximum value is reached, i.e., 0.01, 0.02, ..., 0.99, 1, repeating steps 2 and 3 each time.
Step 5: after the calculation is complete, select the threshold corresponding to the minimum entropy sum as the final result, because a smaller entropy sum means the corresponding threshold better separates normal call entries from harassing call entries. For example, if the entropy sum of the two groups is smallest when the threshold is 0.3, then the threshold a of the internal-line called rate used in the rules should be 0.3.
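The five steps above amount to a grid search over candidate thresholds. A minimal sketch follows, with some assumptions stated plainly: entries exactly equal to the threshold are placed in the second group (the text only mentions "greater than" and "less than"), the two entropies are added without weighting as the text describes, ties keep the first (smallest) threshold, and the sample values are made up.

```python
from math import log2


def entropy(labels) -> float:
    """Two-class Shannon entropy in bits; an empty group counts as 0."""
    if not labels:
        return 0.0
    p1 = sum(labels) / len(labels)
    return -sum(p * log2(p) for p in (1.0 - p1, p1) if p > 0)


def best_threshold(values, labels, lo, hi, step):
    """Scan lo, lo + step, ..., hi and return the threshold with the smallest entropy sum."""
    n_steps = round((hi - lo) / step)
    best_t, best_score = lo, float("inf")
    for i in range(n_steps + 1):
        t = round(lo + i * step, 10)     # rounding limits floating-point drift
        above = [y for v, y in zip(values, labels) if v > t]
        below = [y for v, y in zip(values, labels) if v <= t]  # assumption: ties go here
        score = entropy(above) + entropy(below)
        if score < best_score:           # strict '<' keeps the first minimum found
            best_t, best_score = t, score
    return best_t


# Illustrative entries: internal-line called rate and the "is harassing" label.
internal_rates = [0.0, 0.1, 0.2, 0.8, 0.9, 1.0]
is_harassing = [1, 1, 1, 0, 0, 0]
a = best_threshold(internal_rates, is_harassing, lo=0.0, hi=1.0, step=0.01)
```

With this sample, any threshold that separates the three low-rate harassing entries from the three high-rate normal entries yields an entropy sum of 0, and the scan returns the first such value. The same function can be reused for thresholds b to f by changing `lo`, `hi` and `step` to the ranges listed for each feature.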
Further, in the above method, the threshold b of the talk duration has a minimum value of 0, a maximum value of 200, and a step size of 1.
Further, in the above method, the threshold c of the dial count has a minimum value of 0, a maximum value of 100, and a step size of 1.
Further, in the above method, the threshold d of the called-number non-repetition rate has a minimum value of 0, a maximum value of 1, and a step size of 0.01.
Further, in the above method, the threshold e of the call failure rate has a minimum value of 0, a maximum value of 1, and a step size of 0.01.
Further, in the above method, the threshold f of the number of called cities has a minimum value of 0, a maximum value of 50, and a step size of 1.
The embodiment uses the thresholds calculated above as the thresholds in the harassing-call identification process. Once determined, a threshold can be used for a relatively long period; it can also be recalculated periodically as needed, or recalculated for different regions.
In summary, the invention applies a multi-stage, multi-layer set of judgment rules whose thresholds are determined by cluster analysis and information entropy, and thereby obtains a judgment for each call. Because the thresholds are not set manually but can be adjusted according to information entropy, the invention is highly usable and more flexible.
Those skilled in the art should understand that embodiments of the present application may be provided as a method or a computer program product. Accordingly, the application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM and optical storage) containing computer-usable program code.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they grasp the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If such modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to include them as well.

Claims (12)

1. A method for identifying harassing calls, characterized by comprising:
Reading call data and grouping the call data according to a set time interval to form a plurality of record entries, the plurality of record entries constituting data set A;
Cleaning the grouped call data by deleting from data set A the record entries in which a set element is empty, to obtain data set B;
Performing statistical calculation on each caller number in data set B within the set time interval to generate the features of the caller number in data set B, denoted as set C;
Judging, according to the generated features of the caller number in data set B, whether the caller number is a harassing caller number within the set time interval.
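The grouping and cleaning steps of claim 1 can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the claimed implementation: the record keys (`caller`, `callee`, `start_time`), the one-hour interval, the choice of required fields, and all function names are hypothetical, and `start_time` is assumed to be given in whole seconds.

```python
from collections import defaultdict

INTERVAL_SECONDS = 3600  # assumed length of the "set time interval"
REQUIRED_FIELDS = ("caller", "callee", "start_time")  # assumed "set elements"


def build_set_a(records, interval=INTERVAL_SECONDS):
    """Data set A: every record tagged with the index of its time interval."""
    set_a = []
    for rec in records:
        entry = dict(rec)
        start = rec.get("start_time")
        entry["bucket"] = None if start is None else start // interval
        set_a.append(entry)
    return set_a


def build_set_b(set_a, required=REQUIRED_FIELDS):
    """Data set B: drop entries in which a required field is missing or empty."""
    return [e for e in set_a if all(e.get(f) not in (None, "") for f in required)]


def group_by_caller_and_bucket(set_b):
    """Collect the cleaned entries per (caller number, time interval)."""
    groups = defaultdict(list)
    for entry in set_b:
        groups[(entry["caller"], entry["bucket"])].append(entry)
    return dict(groups)


records = [
    {"caller": "1001", "callee": "2001", "start_time": 10},
    {"caller": "1001", "callee": "2002", "start_time": 20},
    {"caller": "1001", "callee": "", "start_time": 30},      # empty callee: removed
    {"caller": "1002", "callee": "2001", "start_time": 3700},
]
set_a = build_set_a(records)
set_b = build_set_b(set_a)
groups = group_by_caller_and_bucket(set_b)
```

The per-group lists are what the statistical step would then consume to produce the features of each caller number.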
2. The method according to claim 1, characterized in that each record entry includes, but is not limited to, one or more of the following: called number, calling number, start time, duration, call type, originating or terminating end, enterprise number, ring duration, end code, and called city.
3. The method according to claim 1, characterized in that the generated features of the caller number in data set B include: number of calls dialed, non-repetition rate of dialed parties, call-loss rate of dialed calls, call duration, whether consecutive numbers are dialed, number of called cities, and internal-line called rate.
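As a rough illustration of how the seven features listed in claim 3 might be computed for one caller number within one time interval, consider the Python sketch below. The claim names the features but does not define them here, so every formula in the sketch is an assumption, as are the record keys (`callee`, `duration`, `answered`, `city`, `internal`) and the function name `caller_features`.

```python
def caller_features(calls):
    """Compute seven features for one caller within one interval.

    `calls` is a non-empty list of dicts with hypothetical keys:
    callee (digit string), duration (seconds), answered (bool),
    city (str), internal (bool).
    """
    n = len(calls)
    callees = [c["callee"] for c in calls]
    # Assumed definition: consecutive dialing means two successive calls
    # went to numerically adjacent numbers.
    nums = [int(x) for x in callees]
    consecutive = int(any(abs(b - a) == 1 for a, b in zip(nums, nums[1:])))
    return {
        "dial_count": n,
        # Assumed: share of distinct called parties among all calls.
        "non_repeat_rate": len(set(callees)) / n,
        # Assumed: share of calls that were not answered.
        "call_loss_rate": sum(not c["answered"] for c in calls) / n,
        # Assumed: mean duration per call, in seconds.
        "call_duration": sum(c["duration"] for c in calls) / n,
        "consecutive_dialing": consecutive,
        "called_city_count": len({c["city"] for c in calls}),
        # Assumed: share of calls whose called party is an internal line.
        "internal_called_rate": sum(c["internal"] for c in calls) / n,
    }


calls = [
    {"callee": "13800000001", "duration": 0, "answered": False, "city": "Shanghai", "internal": False},
    {"callee": "13800000002", "duration": 0, "answered": False, "city": "Suzhou", "internal": False},
    {"callee": "13800000003", "duration": 12, "answered": True, "city": "Shanghai", "internal": False},
    {"callee": "13800000003", "duration": 0, "answered": False, "city": "Shanghai", "internal": False},
]
features = caller_features(calls)
```

In this example the caller dials three distinct parties in four calls, three of which go unanswered, so both the non-repetition rate and the call-loss rate come out at 0.75.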
4. The method according to claim 3, characterized in that whether the caller number is a harassing caller number within the set time interval is judged from the generated features of the caller number in data set B as follows:
If the consecutive-number dialing behavior = 1, the caller number is a harassing caller number; caller numbers not yet judged proceed to the next judgment;
If the internal-line called rate > threshold a, the caller number is a normal caller number; caller numbers not yet judged proceed to the next judgment;
If the call duration > threshold b, the caller number is a normal caller number; caller numbers not yet judged proceed to the next judgment;
If the number of calls dialed > threshold c and the non-repetition rate of dialed parties >= threshold d, the caller number is a harassing caller number; caller numbers not yet judged proceed to the next judgment;
If the number of calls dialed > threshold c and the call-loss rate of dialed calls >= threshold e, the caller number is a harassing caller number; caller numbers not yet judged proceed to the next judgment;
If the number of called cities >= threshold f, the caller number is a harassing caller number; caller numbers not judged are normal caller numbers.
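The ordered judgment of claim 4 amounts to a rule cascade in which the first matching rule decides the outcome. A minimal Python sketch is given below; the feature keys, the `Thresholds` container, the function name `classify`, and the threshold values used in the example are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    a: float  # internal-line called rate
    b: float  # call duration
    c: float  # number of calls dialed
    d: float  # non-repetition rate of dialed parties
    e: float  # call-loss rate
    f: float  # number of called cities


def classify(feat, t):
    """Apply the rules in the claimed order; the first match decides."""
    if feat["consecutive_dialing"] == 1:
        return "harassing"
    if feat["internal_called_rate"] > t.a:
        return "normal"
    if feat["call_duration"] > t.b:
        return "normal"
    if feat["dial_count"] > t.c and feat["non_repeat_rate"] >= t.d:
        return "harassing"
    if feat["dial_count"] > t.c and feat["call_loss_rate"] >= t.e:
        return "harassing"
    if feat["called_city_count"] >= t.f:
        return "harassing"
    return "normal"


# Threshold values below are made up for the example, not taken from the text.
t = Thresholds(a=0.5, b=60, c=30, d=0.9, e=0.8, f=10)
bulk_dialer = {"consecutive_dialing": 0, "internal_called_rate": 0.0, "call_duration": 8,
               "dial_count": 120, "non_repeat_rate": 0.98, "call_loss_rate": 0.4,
               "called_city_count": 3}
office_line = {"consecutive_dialing": 0, "internal_called_rate": 0.7, "call_duration": 8,
               "dial_count": 120, "non_repeat_rate": 0.98, "call_loss_rate": 0.4,
               "called_city_count": 3}
result_bulk = classify(bulk_dialer, t)
result_office = classify(office_line, t)
```

Because the rules are evaluated in order, a caller with a high internal-line called rate is treated as normal before the dial-count rules are ever reached, which is why `office_line` and `bulk_dialer` receive different results despite identical dialing volume.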
5. The method according to claim 4, characterized in that each threshold is determined as follows:
Combining the caller number and a time label into the label of each record to form data set D, and performing cluster analysis on data set D with the K-means algorithm;
After clustering, all caller numbers are automatically divided into ten classes, and the features of each class are represented by the average values of the caller numbers in that class;
Adding the classification result to data set D to describe the class to which each record entry belongs, and denoting the updated data set as E;
Judging whether each record entry is a harassing entry by distinguishing whether its class is a harassing class, and adding to set E a parameter that takes a harassing-entry value or a normal-entry value, to form set F;
Calculating the information entropy of whether entries are harassing: Ent(X) = P0 log2(P0) + P1 log2(P1), where P0 denotes the proportion of normal entries and P1 denotes the proportion of harassing entries, and then calculating each threshold.
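The clustering and entropy steps of claim 5 can be illustrated with the small pure-Python sketch below. Several points are assumptions of the sketch rather than statements of the claim: it uses two clusters instead of ten to keep the example short, it marks the cluster with the higher mean dial count as the harassing class (the claim does not say how a class is recognized as harassing), and it computes entropy with the standard Shannon definition, which carries a leading minus sign that the formula as printed in the claim does not show. The function names and the two-feature points are hypothetical.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Minimal Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def binary_entropy(p1):
    """Shannon entropy in bits of a normal/harassing split; p1 is the harassing share."""
    total = 0.0
    for p in (1.0 - p1, p1):
        if p > 0:
            total -= p * math.log2(p)
    return total


# Each point is (dial count, non-repetition rate) for one caller and interval.
points = [(2, 0.10), (3, 0.20), (2, 0.15), (90, 0.95), (95, 0.90), (88, 0.99)]
labels, centroids = kmeans(points, k=2)
# Assumption: the cluster with the larger mean dial count is the harassing class.
harassing_cluster = max(range(2), key=lambda j: centroids[j][0])
is_harassing = [int(lab == harassing_cluster) for lab in labels]
p1 = sum(is_harassing) / len(is_harassing)
entropy = binary_entropy(p1)
```

With half of the entries falling into the harassing class, the entropy of the full set is at its maximum of one bit; the threshold search then looks for splits that reduce it.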
6. The method according to claim 5, characterized in that each threshold is calculated as follows:
Setting the minimum value, the maximum value, and the step size of each calculation for the threshold;
Setting the threshold to the minimum value, and dividing all entries in set E into a first group, greater than the threshold, and a second group, less than the threshold;
Calculating, for each of the two groups, the information entropy of whether entries are harassing, and combining and recording the results;
Increasing the threshold step by step from the minimum value until the maximum value is reached;
Selecting the threshold corresponding to the minimum entropy sum as the final calculation result.
7. The method according to claim 6, characterized in that the threshold a for the internal-line called rate has a minimum value of 0 and a maximum value of 1, and the step size of each calculation is 0.01.
8. The method according to claim 6, characterized in that the threshold b for the call duration has a minimum value of 0 and a maximum value of 200, and is increased by a step of 1 each time.
9. The method according to claim 6, characterized in that the threshold c for the number of calls dialed has a minimum value of 0 and a maximum value of 100, and is increased by a step of 1 each time.
10. The method according to claim 6, characterized in that the threshold d for the non-repetition rate of dialed parties has a minimum value of 0 and a maximum value of 1, and is increased by a step of 0.01 each time.
11. The method according to claim 6, characterized in that the threshold e for the call-loss rate of dialed calls has a minimum value of 0 and a maximum value of 1, and is increased by a step of 0.01 each time.
12. The method according to claim 6, characterized in that the threshold f for the number of called cities has a minimum value of 0 and a maximum value of 50, and is increased by a step of 1 each time.
CN201811357638.5A 2018-11-14 2018-11-14 Crank call identification method Active CN109587357B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811357638.5A CN109587357B (en) 2018-11-14 2018-11-14 Crank call identification method

Publications (2)

Publication Number Publication Date
CN109587357A true CN109587357A (en) 2019-04-05
CN109587357B CN109587357B (en) 2021-04-06

Family

ID=65922470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811357638.5A Active CN109587357B (en) 2018-11-14 2018-11-14 Crank call identification method

Country Status (1)

Country Link
CN (1) CN109587357B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104714947A (en) * 2013-12-11 2015-06-17 深圳市腾讯计算机系统有限公司 Preset type number recognition method and device
CN104244216A (en) * 2014-09-29 2014-12-24 中国移动通信集团浙江有限公司 Method and system for intercepting fraud phones in real time during calling
US20180027129A1 (en) * 2014-11-01 2018-01-25 Somos, Inc. Toll-tree numbers metadata tagging, analysis and reporting
CN104469025A (en) * 2014-11-26 2015-03-25 杭州东信北邮信息技术有限公司 Clustering-algorithm-based method and system for intercepting fraud phone in real time
CN106255113A (en) * 2015-06-10 2016-12-21 中兴通讯股份有限公司 The recognition methods of harassing call and device
CN106255116A (en) * 2016-08-24 2016-12-21 王瀚辰 A kind of recognition methods harassing number
CN106506769A (en) * 2016-10-08 2017-03-15 浙江鹏信信息科技股份有限公司 A kind of utilization real time algorithm realizes the method and system that malicious call is filtered
CN108462785A (en) * 2017-02-21 2018-08-28 中国移动通信集团浙江有限公司 A kind of processing method and processing device of malicious call phone
CN106954218A (en) * 2017-03-15 2017-07-14 中国联合网络通信集团有限公司 The number sorted methods, devices and systems of one kind harassing and wrecking
CN107273531A (en) * 2017-06-28 2017-10-20 百度在线网络技术(北京)有限公司 Telephone number classifying identification method, device, equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110312047A (en) * 2019-06-24 2019-10-08 深圳市趣创科技有限公司 The method and device of automatic shield harassing call
CN111884821A (en) * 2020-03-27 2020-11-03 马洪涛 Ticket data processing and displaying method and device and electronic equipment
CN111884821B (en) * 2020-03-27 2022-04-29 马洪涛 Ticket data processing and displaying method and device and electronic equipment

Also Published As

Publication number Publication date
CN109587357B (en) 2021-04-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant