CN110312047A

CN110312047A - Method and device for automatically blocking harassing calls

Info

Publication number: CN110312047A
Application number: CN201910548703.0A
Authority: CN
Inventors: 倪佳欢; 邓庆庆; 杨沙; 何从华
Original assignee: Shenzhen Quchuang Technology Co Ltd
Current assignee: Shenzhen Quchuang Technology Co Ltd
Priority date: 2019-06-24
Filing date: 2019-06-24
Publication date: 2019-10-08

Abstract

The invention discloses a method and device for automatically blocking harassing calls. The method includes: obtaining customers' call data; extracting clustering features from the call data; classifying the call data with a clustering algorithm according to the clustering features; and identifying the class to which harassing call data belong and intercepting the call data contained in that class. The scheme takes the call time, call counterpart, call frequency and call duration of customers' call data as the main clustering features, classifies the call data with the K-means clustering algorithm, determines the class to which harassing calls belong, and intercepts those calls effectively, thereby greatly reducing the frequency of harassing calls and giving customers the best user experience.

Description

Method and device for automatically blocking harassing calls

Technical field

The present invention relates to the field of harassing-call blocking, and in particular to a method and device for automatically blocking harassing calls.

Background art

With the rapid development of China's communications industry, call quality has steadily improved. At the same time, some offenders exploit loopholes in the legal system and weaknesses in carriers' technology and management to place harassing calls freely over the communication network. Most of these harassing calls share the following characteristics:

1. High-volume calling. A batch of destination mobile numbers is called, or a single target number is called repeatedly. Such bulk calling aimed at individual users goes against the subscribers' wishes, harasses them, and amounts to a point-to-point communication dispute with the customer.

2. Violating the user's wishes. This is one of the main characteristics of harassing calls: the calling number is a stranger's number to the called subscriber, or a nonexistent virtual number is used to call the user through number spoofing. Such calling behavior goes against the user's wishes and is an invalid call for the called subscriber.

3. Harassing the user. This is another important characteristic of harassing calls: the connection is very short, and the caller hangs up before the user can answer so that the user calls back, in order to achieve an illegal purpose. This disturbs the user's normal communication.

Consumers strongly resent such harassing calls, so a blocking method is needed to intercept them intelligently.

Summary of the invention

To overcome the above defects of the prior art, the object of the present invention is to provide a method and device for automatically blocking harassing calls.

To achieve the above object, the technical solution of the present invention is as follows:

A method for automatically blocking harassing calls, comprising:

Obtaining customers' call data;

Extracting clustering features from the call data;

Classifying the call data with a clustering algorithm according to the clustering features;

Identifying the class to which harassing call data belong, and intercepting the call data contained in that class.

Further, the step of extracting clustering features from the call data includes:

Extracting the call time, call counterpart, call frequency and call duration from the call data as the clustering features.

Further, the step of classifying the call data with a clustering algorithm includes:

Setting a value K according to the actual application scenario and taking K points as cluster centers;

Classifying all call data according to the K cluster centers to determine the class of each call record.

Further, after the step of classifying all call data according to the K cluster centers and determining the class of each call record, the method further includes:

Determining a new cluster center for each class, and reclassifying the call data according to the new cluster centers;

Judging whether the cluster centers of the classes have changed;

If not, outputting all classes and the call data in each class.

Further, the step of identifying the class to which harassing call data belong includes:

Obtaining the harassing call data contained in the call data;

Obtaining the class information of the harassing call data;

Determining the class to which the harassing call data belong according to that class information.

The invention also provides a device for automatically blocking harassing calls, comprising:

A data capture unit, for obtaining customers' call data;

A data acquisition unit, for extracting clustering features from the call data;

A data sorting unit, for classifying the call data with a clustering algorithm according to the clustering features;

A screening interception unit, for identifying the class to which harassing call data belong and intercepting the call data contained in that class.

Further, the data acquisition unit further includes a feature collection module, which extracts the call time, call counterpart, call frequency and call duration from the call data as the clustering features.

Further, the data sorting unit includes a setup module and a categorization module;

The setup module is used to set a value K according to the actual application scenario and take K points as cluster centers;

The categorization module is used to classify all call data according to the K cluster centers and determine the class of each call record.

Further, the data sorting unit further includes a center reset module, a variation judgment module and a classification output module;

The center reset module is used to determine a new cluster center for each class and reclassify the call data according to the new cluster centers;

The variation judgment module is used to judge whether the cluster centers of the classes have changed;

The classification output module is used to output all classes and the call data in each class if the cluster centers have not changed.

Further, the screening interception unit includes a first acquisition module, a second acquisition module and a harassment categorization module;

The first acquisition module is used to obtain the harassing call data contained in the call data;

The second acquisition module is used to obtain the class information of the harassing call data;

The harassment categorization module is used to determine the class to which the harassing call data belong according to that class information.

The beneficial effects of the present invention are as follows: taking the call time, call counterpart, call frequency and call duration of customers' call data as the main clustering features, the call data are classified with the K-means clustering algorithm, the class to which harassing calls belong is determined, and harassing calls are intercepted effectively, thereby greatly reducing the frequency of harassing calls and giving customers the best user experience.

Brief description of the drawings

Fig. 1 is a flowchart of the method for automatically blocking harassing calls according to the present invention;

Fig. 2 is a flowchart of the step of classifying call data with a clustering algorithm according to the present invention;

Fig. 3 is a flowchart of the step of identifying the class to which harassing call data belong according to the present invention;

Fig. 4 is a structural block diagram of the device for automatically blocking harassing calls according to the present invention;

Fig. 5 is a structural block diagram of the data sorting unit of the present invention;

Fig. 6 is a structural block diagram of the screening interception unit of the present invention.

Detailed description of the embodiments

To illustrate the idea and purpose of the invention, the invention is further explained below with reference to the drawings and specific embodiments.

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

It should be noted that directional indications in the embodiments of the present invention (such as up, down, left, right, front and back) are only used to explain the relative positional relationships and motion of components in a particular posture (as shown in the drawings); if that posture changes, the directional indications change accordingly. A connection may be direct or indirect.

In addition, descriptions involving "first", "second" and the like in the present invention are for descriptive purposes only and shall not be understood as indicating or implying relative importance, or as implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. Furthermore, the technical solutions of the various embodiments may be combined with each other, but only where the combination can be realized by those of ordinary skill in the art; where a combination is contradictory or cannot be realized, it shall be deemed not to exist and falls outside the protection scope claimed by the present invention.

Unless otherwise stated, "/" herein means "or".

Referring to Figs. 1-3, a specific embodiment of the invention provides a method for automatically blocking harassing calls, comprising:

S10: obtaining customers' call data;

S20: extracting clustering features from the call data;

S30: classifying the call data with a clustering algorithm according to the clustering features;

S40: identifying the class to which harassing call data belong, and intercepting the call data contained in that class.

In step S10, the customers' call data are obtained. The call data contain call detail records of many different customers, with information such as the call counterpart (incoming number), call time, call duration and call frequency; each customer may also be associated with multiple call records.

In step S20, clustering features are extracted from the obtained call data: the call time, call counterpart, call frequency and call duration serve as the main clustering features. These features are used in the subsequent classification of the call data and thus in picking out harassing calls.

Step S20 further includes step S21: extracting the call time, call counterpart, call frequency and call duration from the call data as the clustering features.
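The patent does not say how these four features are encoded. As a minimal sketch, assuming each calling number is one sample and using hypothetical encodings (mean start hour for call time, number of distinct customers called for the call counterpart, calls per day for frequency, mean duration in seconds), step S21 might look like this. `CallRecord` and `extract_features` are illustrative names, not from the source.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class CallRecord:
    caller: str         # calling (incoming) number, i.e. the call counterpart
    callee: str         # customer who received the call
    start_hour: int     # hour of day the call started, 0-23 (assumed encoding of "call time")
    duration_s: float   # call duration in seconds


def extract_features(records, days):
    """Hypothetical step S21: one 4-dimensional feature vector per calling number."""
    by_caller = {}
    for r in records:
        by_caller.setdefault(r.caller, []).append(r)
    features = {}
    for caller, calls in by_caller.items():
        features[caller] = (
            mean(c.start_hour for c in calls),        # call time
            float(len({c.callee for c in calls})),    # distinct counterparts reached
            len(calls) / days,                        # call frequency (calls per day)
            mean(c.duration_s for c in calls),        # mean call duration
        )
    return features


records = [
    CallRecord("400111", "A", 10, 3.0),
    CallRecord("400111", "B", 10, 2.0),
    CallRecord("400111", "C", 11, 4.0),
    CallRecord("138000", "A", 20, 300.0),
]
feats = extract_features(records, days=1)
```

Here the short, repeated calls from "400111" to three different customers produce a vector quite unlike the single long call from "138000". Because these features have very different ranges, they would likely need scaling before the distance-based clustering of step S30; the source does not address this.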

In step S30: clustering is one of the most frequently used data mining algorithms and has a wide range of applications. It groups data according to the clustering features so that users with similar or identical characteristics are placed together, while the differences between users in different groups are maximized. Clustering is a form of unsupervised learning: it divides the data into several classes such that data within the same class are as similar as possible and data in different classes are as different as possible. Classifying customers' call data with a clustering algorithm divides the call data into different classes according to their features, which makes it easier to pick out the classes that correspond to harassing calls later.

When an incoming call arrives, whether it is a harassing call can usually be judged from its call data. For example, high-end business users tend to share common traits: they call frequently, their average monthly phone bill is relatively high, and their calls are generally long. In this embodiment, call time, call counterpart, call frequency and call duration are used as the main clustering features to determine which group an incoming number belongs to, and hence whether it is a harassing call.

Specifically, the clustering algorithm is the K-means clustering algorithm. The process of classifying and filtering harassing calls with K-means is as follows:

Assume that the number of clustering features describing the calling behavior of a number is n.

There are N samples $\{X_1, X_2, \dots, X_N\}$ with $X_i \in \mathbb{R}^n$, i.e. each sample is an n-dimensional vector.

Step a1: choose a value for k (k ≤ N) and randomly select k data points as the initial cluster centers of k different classes, denoted $C_1, \dots, C_k$.
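Step a1 amounts to drawing k distinct samples without replacement. A minimal Python sketch; `init_centers` is a hypothetical name, and the seed parameter is an added convenience for reproducible runs, not something the source specifies:

```python
import random


def init_centers(samples, k, seed=None):
    """Step a1 sketch: choose k different samples at random as the initial cluster centers."""
    if not 1 <= k <= len(samples):
        raise ValueError("k must satisfy 1 <= k <= N (the number of samples)")
    rng = random.Random(seed)  # seeded generator so runs are reproducible
    return [tuple(p) for p in rng.sample(samples, k)]


samples = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centers = init_centers(samples, 2, seed=0)
```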

Step a2: traverse all data points $X_i$, compute the distance from each to every cluster center, and find the nearest cluster center $C_k$; the i-th call record is then said to belong to class k. This yields k preliminary classes.

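The class to which a given data point belongs is determined by:

$$S_k = \{\, i : k = \arg\min_{j} \lVert X_i - C_j \rVert^2 \,\}$$

where $S_k$ is the set of samples assigned to class k and $C_j$ is the center of class j.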
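The assignment rule of step a2 (nearest center by squared Euclidean distance) can be sketched in plain Python as follows. Samples and centers are assumed to be tuples of floats; `squared_distance` and `assign_clusters` are hypothetical helper names.

```python
def squared_distance(a, b):
    """Squared Euclidean distance ||a - b||^2 between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_clusters(samples, centers):
    """Step a2: for each sample X_i, return argmin_j ||X_i - C_j||^2 (index of nearest center)."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(x, centers[j]))
        for x in samples
    ]


samples = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centers = [(0.0, 0.0), (5.0, 5.0)]
labels = assign_clusters(samples, centers)  # -> [0, 0, 1, 1]
```

Squaring does not change which center is nearest, so the square root can be skipped.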

Step a3: from the call data in each class, compute that class's new cluster center, i.e. the centroid of each preliminary class obtained in step a2, $C_k = \frac{1}{|S_k|} \sum_{i \in S_k} X_i$, where $S_k$ is the set of samples in class k; then reclassify every $X_i$ according to the new cluster centers.
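A sketch of the centroid update in step a3. It assumes, since the source does not cover the case, that a class left with no samples after reassignment keeps its previous center; `update_centers` is a hypothetical name.

```python
def update_centers(samples, labels, old_centers):
    """Step a3: new center of class j = mean of the samples currently labelled j.

    Assumption (not covered by the source): a class that received no samples
    keeps its previous center.
    """
    k, dim = len(old_centers), len(old_centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for x, label in zip(samples, labels):
        counts[label] += 1
        for d in range(dim):
            sums[label][d] += x[d]
    return [
        tuple(s / counts[j] for s in sums[j]) if counts[j] else tuple(old_centers[j])
        for j in range(k)
    ]


samples = [(0.0, 0.0), (2.0, 2.0), (10.0, 0.0)]
new_centers = update_centers(samples, [0, 0, 1], [(0.0, 0.0), (9.0, 9.0)])
# new_centers == [(1.0, 1.0), (10.0, 0.0)]
```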

Step a4: repeat step a3 until the cluster center of every class no longer changes, i.e. until the iteration converges. Classification of the call data is then complete, and the current assignment is taken as the final classification.

Here, convergence of the iteration means that the cluster centers no longer change, and the sum of the distances from each sample (call record) to its cluster center no longer changes significantly.
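The first convergence criterion (centers no longer change) can be checked with a small tolerance rather than exact equality, a common practical choice for floating-point data. The tolerance `tol` is an assumed parameter not given in the source; `math.dist` requires Python 3.8 or later.

```python
import math


def has_converged(old_centers, new_centers, tol=1e-9):
    """True when every cluster center moved by at most `tol` (Euclidean distance)."""
    return all(math.dist(a, b) <= tol for a, b in zip(old_centers, new_centers))


print(has_converged([(1.0, 1.0)], [(1.0, 1.0)]))            # True
print(has_converged([(0.0, 0.0)], [(3.0, 4.0)], tol=1.0))   # False: moved by 5.0
```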

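Specifically, K-means clustering partitions the N data points into k sets (k ≤ N) so as to minimize the within-cluster sum of squares; the clustering objective is:

$$\min_{S_1, \dots, S_k} \sum_{j=1}^{k} \sum_{i \in S_j} \lVert X_i - C_j \rVert^2$$

where $S_j$ is the set of samples assigned to class j and $C_j$ is its center (centroid).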
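The within-cluster sum of squares that K-means minimizes can be computed directly from the samples, their class labels and the centers. A minimal sketch; `within_cluster_ss` is a hypothetical name:

```python
def within_cluster_ss(samples, labels, centers):
    """K-means objective: sum over samples of ||X_i - C_label(i)||^2."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(sample, centers[label]))
        for sample, label in zip(samples, labels)
    )


samples = [(0.0, 0.0), (2.0, 2.0), (10.0, 0.0)]
wcss = within_cluster_ss(samples, [0, 0, 1], [(1.0, 1.0), (10.0, 0.0)])  # -> 4.0
```

Tracking this value across iterations also gives the second convergence test mentioned in the text: stop once it no longer decreases noticeably.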

Referring to Fig. 2, step S30 includes the following steps:

S31: setting the value k according to the actual application scenario and selecting k cluster centers;

S32: classifying all call data according to the k cluster centers to determine the class of each call record;

S33: determining a new cluster center for each class, and reclassifying the call data according to the new cluster centers;

S34: judging whether the cluster centers of the classes have changed;

S35: if not, outputting all classes and the call data in each class.
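Steps S31 to S35 can be sketched end to end as one self-contained Python function. This is plain iterative K-means under the assumptions that initial centers are drawn at random from the samples, that an empty class keeps its old center, and that `max_iter` (not in the source) caps the number of rounds. `kmeans` is a hypothetical name.

```python
import math
import random


def kmeans(samples, k, max_iter=100, seed=None):
    """Sketch of S31-S35: returns (centers, labels) once the centers stop changing."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(samples, k)]           # S31: initial centers

    def nearest(x):
        # index of the closest current center; reads the latest `centers`
        return min(range(k), key=lambda j: math.dist(x, centers[j]))

    for _ in range(max_iter):
        labels = [nearest(x) for x in samples]                      # S32: classify
        new_centers = []
        for j in range(k):                                          # S33: recompute centroids
            members = [x for x, lab in zip(samples, labels) if lab == j]
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centers[j]
            )
        if new_centers == centers:                                  # S34: no change
            break
        centers = new_centers
    labels = [nearest(x) for x in samples]                          # S35: final classes
    return centers, labels


samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(samples, 2, seed=0)
```

With two well-separated groups, the loop ends with each group in its own class. Production code would more likely use a library implementation such as scikit-learn's `KMeans`, which also runs several random initializations.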

For steps S31-S32, as described above, the value k is set and all data points $X_i$ are traversed. By computing the distance from each data point to the different cluster centers, the nearest cluster center $C_k$ is found, and the i-th call record is said to belong to class k. This yields k preliminary classes: every call record is assigned to its corresponding class according to this rule, completing the preliminary classification of the call data.

For steps S33-S35: after the preliminary classification, every call record has been assigned to a class. To verify that the call data in each class are accurate and reliable, a new cluster center (the centroid) is recomputed from all call data in the current class and used as that class's cluster center, and the call data are classified again. If the cluster centers do not change after reclassification, the classification is stable and can be used as the final result without modification. If the cluster centers do change, step S33 is repeated until they no longer change; the classification at that point is final, and all classes together with the call data they contain can be output.

In step S40, the class to which harassing call data belong is identified, and the call data in that class are intercepted. Once all data have been classified, a subset of call records already determined to be harassment is selected from the original call data, the classes containing those records are located, and whether a class is a harassing-call class is decided from the share of these records it contains. For example, suppose 100 call records determined to be harassing calls are selected, of which 75 fall in class k₁ and the other 25 are scattered across other classes. If it is preset that a class containing more than 50 harassing-call records is a harassing-call class, then class k₁ is the class to which harassing calls belong. It should be understood that the figures 100, 75 and 50 are for illustration only and do not represent actual decision values.
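The counting rule in the example above can be sketched as follows, using the same illustrative figures (100 confirmed records, 75 in one class, threshold 50). The function name and the choice of an absolute count as the threshold are assumptions that mirror the example, not a prescribed interface.

```python
from collections import Counter


def harassing_classes(labels, known_harassing, threshold):
    """Classes holding more than `threshold` of the records already confirmed as harassment.

    labels          -- class label of every call record (output of the clustering step)
    known_harassing -- indices of records confirmed as harassing calls
    threshold       -- preset count; a class exceeding it is treated as a harassing class
    """
    per_class = Counter(labels[i] for i in known_harassing)       # class of each known record
    return {cls for cls, n in per_class.items() if n > threshold}  # apply the preset rule


# 100 confirmed records: 75 in class 1, 25 spread over classes 0 and 2;
# the remaining 200 records are not confirmed harassment.
labels = [1] * 75 + [0] * 10 + [2] * 15 + [0] * 200
print(harassing_classes(labels, range(100), threshold=50))  # {1}
```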

Harassing call data are call records of harassing behavior. Using a subset of confirmed harassing records to reversely verify and locate the calling-behavior characteristics of the different classes improves the accuracy and validity of identifying harassing call data and the efficiency of intercepting harassing calls, greatly reducing the frequency of harassing calls and giving users the best user experience.

Referring to Fig. 3, step S40 includes the following steps:

S41: obtaining the harassing call data contained in the call data;

S42: obtaining the class information of the harassing call data;

S43: determining the class to which the harassing call data belong according to that class information.
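Once the harassing classes are known, the interception itself needs some way of matching an incoming call to a class. The source only states that the call data in the flagged class are intercepted; the sketch below assumes that a new number's feature vector is assigned to its nearest cluster center and blocked when that class is flagged. `should_intercept` is a hypothetical name.

```python
import math


def should_intercept(features, centers, harassing):
    """Assign an incoming number's feature vector to its nearest center and
    block it when that class has been flagged as a harassing class."""
    nearest = min(range(len(centers)), key=lambda j: math.dist(features, centers[j]))
    return nearest in harassing


centers = [(1.0, 1.0), (10.0, 10.0)]
print(should_intercept((9.0, 11.0), centers, harassing={1}))  # True
print(should_intercept((0.0, 0.5), centers, harassing={1}))   # False
```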

By taking the call time, call counterpart, call frequency and call duration of customers' call data as the main clustering features, this scheme classifies the call data with the K-means clustering algorithm, determines the class to which harassing calls belong, and intercepts harassing calls effectively, thereby greatly reducing the frequency of harassing calls and giving customers the best user experience.

Referring to Figs. 4-6, the invention also provides a device for automatically blocking harassing calls, which specifically includes:

A data capture unit 10, for obtaining customers' call data;

A data acquisition unit 20, for extracting clustering features from the call data;

A data sorting unit 30, for classifying the call data with a clustering algorithm according to the clustering features;

A screening interception unit 40, for identifying the class to which harassing call data belong and intercepting the call data contained in that class.
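One way to picture how units 10 to 40 hand data to each other is a small pipeline class in which each unit is an injected function. This is a hypothetical software wiring for illustration only; the patent describes functional units, not an API, and the toy lambdas in the usage stand in for real implementations.

```python
from typing import Callable, Dict, List, Set, Tuple

Vector = Tuple[float, ...]


class BlockingDevice:
    """Hypothetical wiring of units 10-40; each unit is injected as a plain function."""

    def __init__(
        self,
        capture: Callable[[], list],                                         # unit 10: fetch call records
        collect: Callable[[list], Dict[str, Vector]],                        # unit 20: features per number
        classify: Callable[[List[Vector]], Tuple[List[Vector], List[int]]],  # unit 30: (centers, labels)
        flag: Callable[[Dict[str, int]], Set[int]],                          # unit 40: harassing classes
    ):
        self.capture, self.collect, self.classify, self.flag = capture, collect, classify, flag

    def run(self) -> Set[str]:
        """Return the calling numbers whose class is flagged, i.e. the numbers to intercept."""
        features = self.collect(self.capture())
        numbers = list(features)
        _centers, labels = self.classify([features[n] for n in numbers])
        label_of = dict(zip(numbers, labels))
        flagged = self.flag(label_of)
        return {n for n, lab in label_of.items() if lab in flagged}


device = BlockingDevice(
    capture=lambda: ["record-1", "record-2"],
    collect=lambda recs: {"400111": (1.0,), "138000": (9.0,)},
    classify=lambda vecs: ([(1.0,), (9.0,)], [0 if v[0] < 5 else 1 for v in vecs]),
    flag=lambda label_of: {label_of["400111"]},  # pretend 400111 is a confirmed harassing number
)
print(device.run())  # {'400111'}
```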

The data capture unit 10 obtains the customers' call data. The call data contain call detail records of many different customers, with information such as the call counterpart (incoming number), call time, call duration and call frequency; each customer may also be associated with multiple call records.

The data acquisition unit 20 extracts clustering features from the obtained call data: the call time, call counterpart, call frequency and call duration serve as the main clustering features. These features are used in the subsequent classification of the call data and thus in picking out harassing calls.

The data acquisition unit 20 further includes a feature collection module 21, which extracts the call time, call counterpart, call frequency and call duration from the call data as the clustering features.

For the data sorting unit 30: clustering is one of the most frequently used data mining algorithms and has a wide range of applications. It groups data according to the clustering features so that users with similar or identical characteristics are placed together, while the differences between users in different groups are maximized. Clustering is a form of unsupervised learning: it divides the data into several classes such that data within the same class are as similar as possible and data in different classes are as different as possible. Classifying customers' call data with a clustering algorithm divides the call data into different classes according to their features, which makes it easier to pick out the classes that correspond to harassing calls later.

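The class to which a given data point belongs is determined by:

$$S_k = \{\, i : k = \arg\min_{j} \lVert X_i - C_j \rVert^2 \,\}$$

where $S_k$ is the set of samples assigned to class k and $C_j$ is the center of class j.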

Referring to Fig. 5, the data sorting unit 30 includes a setup module 31, a categorization module 32, a center reset module 33, a variation judgment module 34 and a classification output module 35.

The setup module 31 is used to set the value K according to the actual application scenario and take K points as cluster centers.

The categorization module 32 is used to classify all call data according to the K cluster centers and determine the class of each call record.

The center reset module 33 is used to determine a new cluster center for each class and reclassify the call data according to the new cluster centers.

The variation judgment module 34 is used to judge whether the cluster centers of the classes have changed.

The classification output module 35 is used to output all classes and the call data in each class if the cluster centers have not changed.

For the setup module 31 and the categorization module 32, as described above, the value k is set and all data points $X_i$ are traversed. By computing the distance from each data point to the different cluster centers, the nearest cluster center $C_k$ is found, and the i-th call record is said to belong to class k. This yields k preliminary classes: every call record is assigned to its corresponding class according to this rule, completing the preliminary classification of the call data.

For the center reset module 33, the variation judgment module 34 and the classification output module 35: after the preliminary classification, every call record has been assigned to a class. To verify that the call data in each class are accurate and reliable, a new cluster center (the centroid) is recomputed from all call data in the current class and used as that class's cluster center, and the call data are classified again. If the cluster centers do not change after reclassification, the classification is stable and can be used as the final result without modification. If the cluster centers do change, a new cluster center is again determined for each class and the call data are reclassified accordingly, until the cluster centers no longer change; the classification at that point is final, and all classes together with the call data they contain can be output.

The screening interception unit 40 identifies the class to which harassing call data belong and intercepts the call data in that class. Once all data have been classified, a subset of call records already determined to be harassment is selected from the original call data, the classes containing those records are located, and whether a class is a harassing-call class is decided from the share of these records it contains. For example, suppose 100 call records determined to be harassing calls are selected, of which 75 fall in class k₁ and the other 25 are scattered across other classes. If it is preset that a class containing more than 50 harassing-call records is a harassing-call class, then class k₁ is the class to which harassing calls belong. It should be understood that the figures 100, 75 and 50 are for illustration only and do not represent actual decision values.

Using a subset of confirmed harassing records to reversely verify and locate the calling-behavior characteristics of the different classes improves the accuracy and validity of identifying harassing call data and the efficiency of intercepting harassing calls, greatly reducing the frequency of harassing calls and giving users the best user experience.

Referring to Fig. 6, the screening interception unit 40 includes a first acquisition module 41, a second acquisition module 42 and a harassment categorization module 43.

The first acquisition module 41 is used to obtain the harassing call data contained in the call data.

The second acquisition module 42 is used to obtain the class information of the harassing call data.

The harassment categorization module 43 is used to determine the class to which the harassing call data belong according to that class information.

The above is only a preferred embodiment of the present invention and does not limit its patent scope. Any equivalent structural or process transformation made using the contents of the description and drawings of the present invention, and any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims

1. A method for automatically blocking harassing calls, characterized by comprising:

Obtaining customers' call data;

Extracting clustering features from the call data;

2. The method for automatically blocking harassing calls according to claim 1, characterized in that the step of extracting clustering features from the call data comprises:

3. The method for automatically blocking harassing calls according to claim 1, characterized in that the step of classifying the call data with a clustering algorithm comprises:

4. The method for automatically blocking harassing calls according to claim 3, characterized in that, after the step of classifying all call data according to the K cluster centers and determining the class of each call record, the method further comprises:

Determining a new cluster center for each class, and reclassifying the call data according to the new cluster centers;

Judging whether the cluster centers of the classes have changed;

5. The method for automatically blocking harassing calls according to claim 1, characterized in that the step of identifying the class to which harassing call data belong comprises:

6. A device for automatically blocking harassing calls, characterized by comprising:

A data capture unit, for obtaining customers' call data;

A screening interception unit, for identifying the class to which harassing call data belong and intercepting the call data contained in that class.

7. The device for automatically blocking harassing calls according to claim 6, characterized in that the data acquisition unit further comprises a feature collection module, which extracts the call time, call counterpart, call frequency and call duration from the call data as the clustering features.

8. The device for automatically blocking harassing calls according to claim 6, characterized in that the data sorting unit comprises a setup module and a categorization module;

The categorization module is used to classify all call data according to the K cluster centers and determine the class of each call record.

9. The device for automatically blocking harassing calls according to claim 8, characterized in that the data sorting unit further comprises a center reset module, a variation judgment module and a classification output module;

The center reset module is used to determine a new cluster center for each class and reclassify the call data according to the new cluster centers;

The classification output module is used to output all classes and the call data in each class if the cluster centers have not changed.

10. The device for automatically blocking harassing calls according to claim 6, characterized in that the screening interception unit comprises a first acquisition module, a second acquisition module and a harassment categorization module;

The harassment categorization module is used to determine the class to which the harassing call data belong according to the class information of the harassing call data.