CN103716471A - User call behavior model generating method applicable to spam voice filtering - Google Patents


Info

Publication number
CN103716471A
Authority
CN
China
Legal status
Granted
Application number
CN201310698598.1A
Other languages
Chinese (zh)
Other versions
CN103716471B (en)
Inventor
王非
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Application filed by Huazhong University of Science and Technology
Priority to CN201310698598.1A
Publication of CN103716471A
Application granted
Publication of CN103716471B
Expired - Fee Related (current legal status)

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a method for generating a user call behavior model suitable for spam voice filtering. The method comprises the following steps: establishing a call interaction behavior feature (CI) that describes the behavior of a specified user as a calling user and as a called user, where CI comprises the incoming/outgoing call ratio, the call interaction record feature values, and the interaction strength and its distribution; establishing the call frequency and its distribution ($F_{CD}$), which describes the call-time features in the call records, namely the call frequency value within a specified statistical period and the time distribution of that call frequency; and establishing the call duration and its distribution ($D_{CT}$), which describes the duration features in the call records, namely the call rejection ratio, the average call duration, and the histogram distribution of call durations. The method addresses the technical problem that the disguised call behavior of spam voice senders is difficult to detect.

Description

Method for generating a user call behavior model suitable for spam voice filtering
Technical Field
The invention belongs to the fields of spam voice filtering and data mining, and relates in particular to a method for generating a user call behavior model suitable for spam voice filtering.
Background
With the convergence of fixed networks, mobile communication networks and the Internet, voice services are widely used. However, spam voice hinders the expansion of these services, so malicious users must be restricted promptly while normal user communication is preserved. Most existing spam voice filtering techniques are adapted from spam e-mail and spam SMS filtering and can detect and filter spam voice to a certain extent. However, spam voice differs in nature from spam e-mail, so techniques borrowed from e-mail filtering have limitations: e-mail spam filtering mostly operates on text, whereas spam voice content is multimedia; and e-mail filtering tolerates delays, whereas spam voice filtering has strict real-time requirements.
An effective and reasonable filtering mechanism should interact with the calling and called parties as little as possible, which makes a filtering method based on a call model a suitable choice. A call model determines objectively, from the behavioral characteristics of the calls a user makes, whether the user is a spam voice sender. Existing call models provide a large number of spam voice characteristics observed from call behavior and use a decision tree or a Bayesian classifier to filter spam voice. As long as a user's calling behavior does not change, existing call models are essentially mature and effective. However, detection mechanisms based on existing call models have difficulty finding the disguised call behavior of spammers, and therefore have certain shortcomings.
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the present invention provides a method for generating a user call behavior model suitable for spam voice filtering, with the aim of solving the technical problem that the disguised call behavior of spammers is difficult to detect.
To achieve the above object, according to one aspect of the present invention, a method for generating a user call behavior model suitable for spam voice filtering is provided, which includes the following steps:
(1) establishing a call interaction behavior feature CI that describes the behavior of a specified user as a calling user and as a called user, where CI comprises three parts, namely the incoming/outgoing call ratio, the call interaction record feature values, and the interaction strength with its distribution, and can be expressed as:
$$CI = \{R_{in/out}, C_{out}, C_{in}, C_{in/out}, F_{in/out}\}$$
where $R_{in/out}$ is the ratio between the user's roles as calling and called user; $C_{out}$ is the number of different users with which the user acted only as the calling user; $C_{in}$ is the number of different users whose calls the user only answered as the called user; $C_{in/out}$ is the number of different users with which the user acted as both calling and called user; and $F_{in/out}$ is the frequency distribution of interaction strength with other users;
(2) establishing the call frequency and its distribution $F_{CD}$, a feature describing the call times in the call records that comprises two parts, namely the call frequency value within a specified statistical time period and the time distribution of that call frequency, and can be expressed as:
$$F_{CD} = \{F^{T}_{in/out}, \bar{D}^{T}_{out}\}$$
where $F^{T}_{in/out}$ is the absolute frequency of calls made by the user within the statistical time period, and $\bar{D}^{T}_{out}$ is the distribution of the user's call frequency within the statistical time period over the 12 two-hour time slices of a day.
(3) establishing the call duration and its distribution $D_{CT}$, a feature describing the call durations in the call records that comprises three parts, namely the call rejection ratio, the average call duration, and the histogram distribution of call durations within the statistical time period, and can be expressed as:
$$D_{CT} = \{f^{T}_{E}, CT^{T}_{avg}, CTD^{T}\}$$
where $f^{T}_{E}$ is the probability that a call initiated by the user is rejected, $CT^{T}_{avg}$ is the user's average call duration, and $CTD^{T}$ is the distribution of call durations.
Preferably, step (1) comprises in particular the following sub-steps:
(1-1) computing the user's historical call interaction feature parameters $C_{out}$, $C_{in}$ and $C_{in/out}$; to allow comparison across users, the method further normalizes them:
$$\bar{C}_{out} = \frac{C_{out}}{C_{out} + C_{in} + C_{in/out}}$$
$$\bar{C}_{in} = \frac{C_{in}}{C_{out} + C_{in} + C_{in/out}}$$
$$\bar{C}_{in/out} = \frac{C_{in/out}}{C_{out} + C_{in} + C_{in/out}}$$
where $\bar{C}_{out}$, $\bar{C}_{in}$ and $\bar{C}_{in/out}$ sum to 1.
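The following Python sketch illustrates sub-step (1-1) under stated assumptions. It reads $C_{out}$, $C_{in}$ and $C_{in/out}$ as the numbers of distinct contacts with which the user acted only as caller, only as callee, or as both, which matches the per-subscriber wording of step (1) and makes the three normalized values sum to 1. The input layout (a dictionary from a hypothetical contact ID to a pair of outgoing and answered-incoming counts) and the function names are illustrative, not taken from the patent.

```python
from typing import Dict, Tuple

def interaction_counts(per_contact: Dict[str, Tuple[int, int]]) -> Tuple[int, int, int]:
    """Return (C_out, C_in, C_inout) as numbers of distinct contacts.

    per_contact maps a contact ID to (calls the user made to it,
    calls the user answered from it). This layout is a hypothetical choice.
    """
    c_out = c_in = c_inout = 0
    for out_calls, in_calls in per_contact.values():
        if out_calls > 0 and in_calls > 0:
            c_inout += 1  # user was both caller and callee with this contact
        elif out_calls > 0:
            c_out += 1    # user only called this contact
        elif in_calls > 0:
            c_in += 1     # user only answered calls from this contact
    return c_out, c_in, c_inout

def normalize_counts(c_out: int, c_in: int, c_inout: int) -> Tuple[float, float, float]:
    """Divide each count by C_out + C_in + C_in/out so the three values sum to 1."""
    total = c_out + c_in + c_inout
    if total == 0:
        return 0.0, 0.0, 0.0  # assumption: a user with no contacts gets all-zero features
    return c_out / total, c_in / total, c_inout / total

counts = interaction_counts({"a": (5, 0), "b": (0, 2), "c": (3, 4), "d": (1, 0)})
print(counts)                     # (2, 1, 1)
print(normalize_counts(*counts))  # (0.5, 0.25, 0.25)
```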
(1-2) computing the incoming/outgoing call ratio $R_{in/out}$:
$$R_{in/out} = \frac{C_{out}}{C_{in}}$$
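A small sketch of sub-step (1-2). The patent does not say how to handle a user with $C_{in} = 0$; returning infinity (or 0 when both counts are 0) is an assumption of this example, and a real system might instead cap or smooth the ratio.

```python
def in_out_ratio(c_out: int, c_in: int) -> float:
    """R_in/out = C_out / C_in, as written in sub-step (1-2).

    Zero-denominator handling is an assumption, not part of the patent.
    """
    if c_in == 0:
        return float("inf") if c_out > 0 else 0.0
    return c_out / c_in

print(in_out_ratio(2, 1))   # 2.0
print(in_out_ratio(40, 0))  # inf
```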
(1-3) computing the call interaction strength $CD_{in/out}$ between the specified user and other users. The call interaction strength between the user and user j, denoted $CD^{j}_{in/out}$, is calculated as:
$$CD^{j}_{in/out} = \mathrm{INT}\left[\log\left(C^{j}_{out} + C^{j}_{in}\right)\right]$$
where $\mathrm{INT}[\cdot]$ denotes rounding to an integer, $C^{j}_{out}$ is the number of times the user actively called user j, and $C^{j}_{in}$ is the number of times the user answered a call from user j.
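A sketch of the per-contact interaction strength in sub-step (1-3). The logarithm base and the exact rounding rule of $\mathrm{INT}[\cdot]$ are not specified in the text; this example assumes base 10 and truncation, so that strength 0 covers 1 to 9 calls with user j, strength 1 covers 10 to 99 calls, and so on. `math.log10` is used rather than `math.log(x, 10)` because the latter can return values such as 2.9999999999999996 for exact powers of 10.

```python
import math

def interaction_strength(out_calls: int, in_calls: int) -> int:
    """CD^j_in/out = INT[log(C^j_out + C^j_in)] for one contact j.

    Assumptions (not stated in the patent): base-10 logarithm, and INT[.]
    truncates toward zero (the same as floor here, because the sum is >= 1).
    """
    total = out_calls + in_calls
    if total < 1:
        raise ValueError("contact j must have at least one call with the user")
    return int(math.log10(total))

pairs = [(1, 0), (6, 4), (120, 30), (1200, 300)]
print([interaction_strength(o, i) for o, i in pairs])  # [0, 1, 2, 3]
```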
(1-4) computing the call interaction strength distribution $CDD_{in/out}$:
$$CDD_{in/out} = \{CDN_0, CDN_1, CDN_2, CDN_3\}$$
where $CDN_i$ $(i = 0, 1, 2, 3)$ is obtained with the counting function $\mathrm{COUNT}[\cdot]$ as the number of strengths $CD^{j}_{in/out}$ $(j = 1, \ldots, n)$ equal to i or greater than i, $CD^{j}_{in/out}$ is the call interaction strength between the user and user j, and n is the number of all contacts of the user.
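A sketch of sub-step (1-4). The text states that $CDN_i$ counts strengths "equal to i or greater than i"; this example reads that as: bins 0, 1 and 2 count exact matches, and bin 3 collects every strength of 3 or more. That reading is an assumption.

```python
from typing import Iterable, List

def strength_distribution(strengths: Iterable[int]) -> List[int]:
    """Return [CDN_0, CDN_1, CDN_2, CDN_3] from per-contact strengths CD^j_in/out.

    Assumed reading: bins 0-2 count exact values, bin 3 counts values >= 3.
    """
    cdn = [0, 0, 0, 0]
    for s in strengths:
        if s < 0:
            raise ValueError("interaction strength cannot be negative")
        cdn[min(s, 3)] += 1  # clamp strengths above 3 into the last bin
    return cdn

print(strength_distribution([0, 0, 1, 2, 3, 4, 0]))  # [3, 1, 1, 2]
```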
(1-5) normalizing $CDD_{in/out}$. The normalized call interaction strength distribution, denoted $\overline{CDN}_{in/out}$, is:
$$\overline{CDN}_{in/out} = \{\overline{CDN}_0, \overline{CDN}_1, \overline{CDN}_2, \overline{CDN}_3\}$$
where $\overline{CDN}_i$ $(i = 0, 1, 2, 3)$ is the normalized $CDN_i$, calculated as:
$$\overline{CDN}_i = \frac{CDN_i}{\sum_{k=0}^{3} CDN_k}$$
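Sub-step (1-5) divides each bin by the total number of contacts $\sum_{k=0}^{3} CDN_k$. A minimal sketch; returning all zeros for a user with no contacts is an assumption:

```python
from typing import List, Sequence

def normalize_distribution(cdn: Sequence[int]) -> List[float]:
    """Return [CDN_i / sum_k CDN_k for each bin i]."""
    total = sum(cdn)
    if total == 0:
        return [0.0] * len(cdn)  # assumption: no contacts -> all-zero distribution
    return [c / total for c in cdn]

print(normalize_distribution([2, 1, 1, 0]))  # [0.5, 0.25, 0.25, 0.0]
```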
Preferably, step (2) comprises in particular the following sub-steps:
(2-1) computing the absolute frequency $F^{T}_{in/out}$ of calls initiated by the user:
$$F^{T}_{in/out} = \frac{C^{T}_{out}}{T}$$
where T is the length of the specified statistical time period in hours; the period starts at 0:00 of the first day and ends at 24:00 of the last day, so T is an integer multiple of 24 hours; and $C^{T}_{out}$ is the number of calls made by the user as the calling user within the period.
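A sketch of sub-step (2-1). It enforces the requirement that T be a whole number of days expressed in hours; the function name is hypothetical.

```python
def absolute_call_frequency(outgoing_calls: int, period_hours: int) -> float:
    """F^T_in/out = C^T_out / T, in calls per hour."""
    if period_hours <= 0 or period_hours % 24 != 0:
        raise ValueError("T must be a positive whole number of days, given in hours")
    return outgoing_calls / period_hours

print(absolute_call_frequency(96, 48))  # 2.0
```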
(2-2) computing the call frequency distribution parameter $D^{T}_{out}$ of the specified user within the specified time period T:
$$D^{T}_{out} = \{D^{1}_{out}, D^{2}_{out}, \ldots, D^{12}_{out}\}$$
where $D^{t}_{out}$ is the total number of calls in the interval $[2(t-1), 2t)$ hours, summed over every day of the statistical time period T. Each day is divided into 12 time slices of 2 hours each: [0,2), [2,4), [4,6), [6,8), [8,10), [10,12), [12,14), [14,16), [16,18), [18,20), [20,22), [22,24).
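A sketch of sub-step (2-2), assuming each outgoing call is assigned to the slice containing its start time (the text does not say whether start or end time is used). Python's `datetime.hour` gives the hour of day, so integer division by 2 maps it to the zero-based slice index $t - 1$.

```python
from datetime import datetime
from typing import Iterable, List

def time_slice_counts(call_starts: Iterable[datetime]) -> List[int]:
    """Return [D^1_out, ..., D^12_out]: outgoing calls per 2-hour slice, summed over all days.

    Slice t (1-based) covers [2(t-1), 2t) hours of each day. call_starts is
    assumed to hold the start times of the user's outgoing calls within T.
    """
    counts = [0] * 12
    for start in call_starts:
        counts[start.hour // 2] += 1  # hours 0-1 -> index 0 (t = 1), ..., hours 22-23 -> index 11 (t = 12)
    return counts

calls = [datetime(2013, 12, 2, 9, 15), datetime(2013, 12, 2, 9, 50), datetime(2013, 12, 3, 23, 5)]
print(time_slice_counts(calls))  # [0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 1]
```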
(2-3) normalizing the user's call frequency distribution parameter $D^{T}_{out}$ as follows:
$$\bar{D}^{t}_{out} = \frac{D^{t}_{out}}{C^{T}_{out}}, \quad t = 1, \ldots, 12$$
The normalized call frequency distribution is then:
$$\bar{D}^{T}_{out} = \{\bar{D}^{1}_{out}, \bar{D}^{2}_{out}, \ldots, \bar{D}^{12}_{out}\}$$
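A sketch of sub-step (2-3). Because every outgoing call in the period falls into exactly one slice, the slice counts sum to $C^{T}_{out}$ and the normalized values sum to 1 whenever the user made at least one call; the zero-call fallback is an assumption.

```python
from typing import List, Sequence

def normalize_time_slices(slice_counts: Sequence[int], outgoing_calls: int) -> List[float]:
    """Return the normalized slices D^t_out / C^T_out for t = 1..12."""
    if outgoing_calls == 0:
        return [0.0] * len(slice_counts)  # assumption: no calls -> all-zero distribution
    return [d / outgoing_calls for d in slice_counts]

slices = [0] * 12
slices[4] = 3   # three calls in [8, 10)
slices[11] = 1  # one call in [22, 24)
normalized = normalize_time_slices(slices, sum(slices))
print(normalized[4], normalized[11])  # 0.75 0.25
```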
(2-4) obtaining the user's call frequency and its distribution parameter $F_{CD}$:
$$F_{CD} = \{F^{T}_{in/out}, \bar{D}^{T}_{out}\}$$
Preferably, step (3) comprises in particular the following sub-steps:
(3-1) computing the rejection ratio $f^{T}_{E}$ of calls initiated by the user:
$$f^{T}_{E} = \frac{CR^{T}_{out}}{C^{T}_{out}}$$
where $C^{T}_{out}$ is the number of calls made by the user within the statistical time period, and $CR^{T}_{out}$ is the number of those outgoing calls that were rejected within the period;
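A sketch of sub-step (3-1). The call-record field that marks a rejected call is not specified in the patent; this example assumes each outgoing call carries a boolean answered flag and counts every unanswered call as rejected, which a real system might refine.

```python
from typing import Iterable

def rejection_ratio(answered_flags: Iterable[bool]) -> float:
    """f^T_E = CR^T_out / C^T_out for the user's outgoing calls in T.

    answered_flags holds one boolean per outgoing call (False = rejected).
    Treating every unanswered call as rejected is an assumption of this sketch.
    """
    flags = list(answered_flags)
    if not flags:
        return 0.0  # assumption: no outgoing calls -> ratio 0
    rejected = sum(1 for answered in flags if not answered)
    return rejected / len(flags)

print(rejection_ratio([True, False, False, True]))  # 0.5
```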
(3-2) computing the user's average call duration $CT^{T}_{avg}$:
$$CT^{T}_{avg} = \frac{\sum_{r=1}^{C^{T}_{in/out}} t_r}{C^{T}_{in/out}}$$
where $C^{T}_{in/out}$ is the number of calls within the statistical time period and $t_r$ is the recorded duration of the r-th call, in seconds;
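A sketch of sub-step (3-2). The denominator $C^{T}_{in/out}$ counts all calls in the period, so this example averages over every supplied duration, including zero-length calls if they are passed in; whether such calls belong in the average is not stated in the text.

```python
from typing import Sequence

def average_call_duration(durations_s: Sequence[float]) -> float:
    """CT^T_avg: mean of the durations t_r (seconds) of the C^T_in/out calls in T."""
    if not durations_s:
        return 0.0  # assumption: no calls -> average 0
    return sum(durations_s) / len(durations_s)

print(average_call_duration([30, 90, 0, 120]))  # 60.0
```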
(3-3) computing the histogram distribution of call durations $CTD^{T}$:
$$CTD^{T} = \{CTD_0, CTD_1, \ldots, CTD_6, CTD_7\}$$
where $CTD_i$ is the proportion of call records whose duration falls within the i-th duration range.
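A sketch of sub-step (3-3). The duration ranges behind $CTD_0, \ldots, CTD_7$ are not recoverable from this text, so the seven bin edges below (giving eight bins) are purely hypothetical placeholders. `bisect.bisect_right` returns the number of edges less than or equal to a duration, which is the bin index.

```python
from bisect import bisect_right
from typing import List, Sequence

# Hypothetical bin edges in seconds; NOT the patent's actual ranges.
EDGES_S = [10, 30, 60, 120, 300, 600, 1800]  # 7 edges -> 8 bins CTD_0 .. CTD_7

def duration_histogram(durations_s: Sequence[float],
                       edges: Sequence[float] = EDGES_S) -> List[float]:
    """Return the proportion of calls in each duration bin.

    Bin 0 covers [0, edges[0]), bin i covers [edges[i-1], edges[i]),
    and the last bin covers [edges[-1], infinity).
    """
    counts = [0] * (len(edges) + 1)
    for d in durations_s:
        counts[bisect_right(edges, d)] += 1
    n = len(durations_s)
    return [c / n for c in counts] if n else [0.0] * len(counts)

print(duration_histogram([5, 10, 45, 2000]))  # [0.25, 0.25, 0.25, 0.0, 0.0, 0.0, 0.0, 0.25]
```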
In general, compared with the prior art, the above technical solution of the present invention achieves the following beneficial effects:
1. The invention establishes a user call behavior model suitable for spam voice filtering and provides a method for calculating user call behavior features.
2. To address the insufficiency of raw call behavior data, the invention defines three combined, enhanced feature parameters.
3. Based on the interaction between the calling and called parties during calls, the invention considers, in addition to the basic incoming/outgoing call ratio, the interaction strength and its distribution. This gives a fuller description of call interaction and makes the model better suited to spam voice filtering.
Drawings
FIG. 1 is a schematic diagram of a method for generating a user call behavior model suitable for spam voice filtering according to the present invention.
FIG. 2 is a detailed flow chart of step (1) of the method of the present invention.
FIG. 3 is a detailed flow chart of step (2) of the method of the present invention.
FIG. 4 is a detailed flow chart of step (3) of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
The model for describing user call behavior provided by the invention is built from several basic call behavior parameters, chiefly four: the calling user, the called user, the call time, and the call duration. On the basis of these four parameters, three enhanced call behavior features are constructed: the call interaction feature CI, the call frequency and its distribution $F_{CD}$, and the call duration and its distribution $D_{CT}$. Each feature is composed of the associated call behaviors.
As shown in fig. 1, the method for generating a user call behavior model suitable for spam voice filtering according to the present invention includes the following steps:
(1) establishing a call interaction behavior feature CI that describes the behavior of a specified user as a calling user and as a called user, where CI comprises three parts, namely the incoming/outgoing call ratio, the call interaction record feature values, and the interaction strength with its distribution, and can be expressed as:
$$CI = \{R_{in/out}, C_{out}, C_{in}, C_{in/out}, F_{in/out}\}$$
where $R_{in/out}$ is the ratio between the user's roles as calling and called user; $C_{out}$ is the number of different users with which the user acted only as the calling user; $C_{in}$ is the number of different users whose calls the user only answered as the called user; $C_{in/out}$ is the number of different users with which the user acted as both calling and called user; and $F_{in/out}$ is the frequency distribution of interaction strength with other users;
(2) establishing the call frequency and its distribution $F_{CD}$, a feature describing the call times in the call records that comprises two parts, namely the call frequency value within a specified statistical time period and the time distribution of that call frequency, and can be expressed as:
$$F_{CD} = \{F^{T}_{in/out}, \bar{D}^{T}_{out}\}$$
where $F^{T}_{in/out}$ is the absolute frequency of calls made by the user within the statistical time period, and $\bar{D}^{T}_{out}$ is the distribution of the user's call frequency within the statistical time period over the 12 two-hour time slices of a day.
(3) establishing the call duration and its distribution $D_{CT}$, a feature describing the call durations in the call records that comprises three parts, namely the call rejection ratio, the average call duration, and the histogram distribution of call durations within the statistical time period, and can be expressed as:
$$D_{CT} = \{f^{T}_{E}, CT^{T}_{avg}, CTD^{T}\}$$
where $f^{T}_{E}$ is the probability that a call initiated by the user is rejected, $CT^{T}_{avg}$ is the user's average call duration, and $CTD^{T}$ is the distribution of call durations.
As shown in fig. 2, step (1) of the method of the present invention comprises the steps of:
201, querying all historical call records of the specified user and counting the total number of outgoing calls $C_{out}$, the total number of incoming calls $C_{in}$, and the total number of incoming-and-outgoing calls $C_{in/out}$;
202, normalizing the $C_{out}$, $C_{in}$ and $C_{in/out}$ counted in step 201. The normalized historical interaction feature values, denoted $\bar{C}_{out}$, $\bar{C}_{in}$ and $\bar{C}_{in/out}$, are calculated as:
$$\bar{C}_{out} = \frac{C_{out}}{C_{out} + C_{in} + C_{in/out}}$$
$$\bar{C}_{in} = \frac{C_{in}}{C_{out} + C_{in} + C_{in/out}}$$
$$\bar{C}_{in/out} = \frac{C_{in/out}}{C_{out} + C_{in} + C_{in/out}}$$
where $\bar{C}_{out}$, $\bar{C}_{in}$ and $\bar{C}_{in/out}$ sum to 1;
203, computing the incoming/outgoing call ratio $R_{in/out}$ of the specified user:
$$R_{in/out} = \frac{C_{out}}{C_{in}}$$
204, counting the outgoing and incoming calls between the specified user and each other user, recorded as $C^{j}_{out}$ and $C^{j}_{in}$, where $C^{j}_{out}$ is the number of times the user actively called user j and $C^{j}_{in}$ is the number of times the user answered a call from user j. Only these per-contact outgoing and incoming counts are saved.
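Step 204 turns raw call records into per-contact counts. A sketch follows; the record layout `(caller_id, callee_id, answered)` and the function name are hypothetical choices, and only answered incoming calls are counted toward $C^{j}_{in}$, following the wording "answered a call from user j".

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record layout: (caller_id, callee_id, answered)
Record = Tuple[str, str, bool]

def per_contact_counts(user: str, records: Iterable[Record]) -> Dict[str, Tuple[int, int]]:
    """Return {contact j: (C^j_out, C^j_in)} for the specified user."""
    out_calls: Dict[str, int] = defaultdict(int)
    in_calls: Dict[str, int] = defaultdict(int)
    for caller, callee, answered in records:
        if caller == user:
            out_calls[callee] += 1  # C^j_out: every call the user placed to j
        elif callee == user and answered:
            in_calls[caller] += 1   # C^j_in: only calls from j that the user answered
    contacts = set(out_calls) | set(in_calls)
    return {j: (out_calls[j], in_calls[j]) for j in contacts}

records = [("u", "a", True), ("u", "a", False), ("b", "u", True), ("b", "u", False), ("c", "d", True)]
print(sorted(per_contact_counts("u", records).items()))  # [('a', (2, 0)), ('b', (0, 1))]
```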
205, computing, from the results of step 204, the call interaction strength $CD_{in/out}$ between the specified user and other users. The call interaction strength between the user and user j, $CD^{j}_{in/out}$, is calculated as:
$$CD^{j}_{in/out} = \mathrm{INT}\left[\log\left(C^{j}_{out} + C^{j}_{in}\right)\right]$$
where $\mathrm{INT}[\cdot]$ denotes rounding to an integer, $C^{j}_{out}$ is the number of times the user actively called user j, and $C^{j}_{in}$ is the number of times the user answered a call from user j.
206, computing the call interaction strength distribution $CDD_{in/out}$ of the specified user:
$$CDD_{in/out} = \{CDN_0, CDN_1, CDN_2, CDN_3\}$$
where $CDN_i$ $(i = 0, 1, 2, 3)$ is obtained with the counting function $\mathrm{COUNT}[\cdot]$ as the number of strengths $CD^{j}_{in/out}$ $(j = 1, \ldots, n)$ equal to i or greater than i, $CD^{j}_{in/out}$ is the call interaction strength between the user and user j, and n is the number of all contacts of the user.
207, normalizing $CDD_{in/out}$. The normalized call interaction strength distribution, denoted $\overline{CDN}_{in/out}$, is:
$$\overline{CDN}_{in/out} = \{\overline{CDN}_0, \overline{CDN}_1, \overline{CDN}_2, \overline{CDN}_3\}$$
where $\overline{CDN}_i$ $(i = 0, 1, 2, 3)$ is the normalized $CDN_i$:
$$\overline{CDN}_i = \frac{CDN_i}{\sum_{k=0}^{3} CDN_k}$$
208, outputting the call interaction feature parameter CI of the specified user.
As shown in fig. 3, step (2) of the method of the present invention comprises the steps of:
301, querying all historical call records, both incoming and outgoing, of the specified user within the specified time period;
302, computing the absolute frequency $F^{T}_{in/out}$ of calls initiated by the specified user:
$$F^{T}_{in/out} = \frac{C^{T}_{out}}{T}$$
where T is the length of the specified statistical time period in hours and is an integer multiple of 24 hours, and $C^{T}_{out}$ is the number of calls made by the user as the calling user within the period;
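Step 302 requires the statistical period to run from 0:00 of the first day to 24:00 of the last day. A sketch of building such a window and testing whether a call falls inside it; the helper names are hypothetical. The end bound is exclusive, since 24:00 of the last day is 0:00 of the following day.

```python
from datetime import date, datetime, time, timedelta
from typing import Tuple

def statistical_period(first_day: date, num_days: int) -> Tuple[datetime, datetime, int]:
    """Return (start, end, T) for a period from 00:00 of first_day to 24:00 of the last day."""
    if num_days < 1:
        raise ValueError("the period must cover at least one whole day")
    start = datetime.combine(first_day, time(0, 0))
    end = start + timedelta(days=num_days)  # 24:00 of the last day == 00:00 of the next day
    return start, end, 24 * num_days       # T in hours, always a multiple of 24

def in_period(t: datetime, start: datetime, end: datetime) -> bool:
    """True if a call starting at t falls inside [start, end)."""
    return start <= t < end

start, end, T = statistical_period(date(2013, 12, 2), 7)
print(T)    # 168
print(end)  # 2013-12-09 00:00:00
```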
303, computing the call frequency distribution parameter $D^{T}_{out}$ of the specified user within the specified time period T:
$$D^{T}_{out} = \{D^{1}_{out}, D^{2}_{out}, \ldots, D^{12}_{out}\}$$
where $D^{t}_{out}$ is the total number of calls in the interval $[2(t-1), 2t)$ hours, summed over every day of the statistical time period T. Each day is divided into 12 time slices of 2 hours each: [0,2), [2,4), …, [22,24);
304, normalizing the user's call frequency distribution parameter $D^{T}_{out}$ as follows:
$$\bar{D}^{t}_{out} = \frac{D^{t}_{out}}{C^{T}_{out}}, \quad t = 1, \ldots, 12$$
The normalized call frequency distribution is
$$\bar{D}^{T}_{out} = \{\bar{D}^{1}_{out}, \bar{D}^{2}_{out}, \ldots, \bar{D}^{12}_{out}\};$$
305, outputting the call frequency of the specified user within the specified statistical period and its distribution parameter $F_{CD}$.
As shown in fig. 4, step (3) of the method of the present invention comprises the steps of:
401, querying all historical call records, both incoming and outgoing, of the specified user within the specified time period T;
402, computing the rejection ratio $f^{T}_{E}$ of calls initiated by the user:
$$f^{T}_{E} = \frac{CR^{T}_{out}}{C^{T}_{out}}$$
where $C^{T}_{out}$ is the number of calls made by the user during the statistical period and $CR^{T}_{out}$ is the number of those outgoing calls that were rejected during the period;
403, computing the user's average call duration $CT^{T}_{avg}$:
$$CT^{T}_{avg} = \frac{\sum_{r=1}^{C^{T}_{in/out}} t_r}{C^{T}_{in/out}}$$
where $C^{T}_{in/out}$ is the number of calls within the specified time period T and $t_r$ is the recorded duration of the r-th call, in seconds;
404, computing the histogram distribution of call durations $CTD^{T}$:
$$CTD^{T} = \{CTD_0, CTD_1, \ldots, CTD_6, CTD_7\}$$
where $CTD_i$ is the proportion of call records within the specified time period T whose duration falls within the i-th duration range;
405, outputting the call duration of the specified user within the specified statistical period and its distribution parameter $D_{CT}$.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (4)

1. A method for generating a user call behavior model suitable for spam voice filtering is characterized by comprising the following steps:
(1) establishing a call interaction behavior feature CI that describes the behavior of a specified user as a calling user and as a called user, where CI comprises three parts, namely the incoming/outgoing call ratio, the call interaction record feature values, and the interaction strength with its distribution, and can be expressed as:
$$CI = \{R_{in/out}, C_{out}, C_{in}, C_{in/out}, F_{in/out}\}$$
where $R_{in/out}$ is the ratio between the user's roles as calling and called user; $C_{out}$ is the number of different users with which the user acted only as the calling user; $C_{in}$ is the number of different users whose calls the user only answered as the called user; $C_{in/out}$ is the number of different users with which the user acted as both calling and called user; and $F_{in/out}$ is the frequency distribution of interaction strength with other users;
(2) establishing the call frequency and its distribution, $F_{CD}$, which describes the timing of calls in the call records. It comprises two parts, the call frequency value within a specified statistical period and the time distribution of that call frequency, and can be expressed as:
$$F_{CD} = \{F_{in/out}^{T}, \bar{D}_{out}^{T}\}$$
where $F_{in/out}^{T}$ is the absolute frequency of calls made by the user within the statistical period, and $\bar{D}_{out}^{T}$ is the distribution of the user's call frequency within the statistical period over the 12 two-hour time slices of a day.
(3) establishing the call duration and its distribution, $D_{CT}$, which describes duration features in the call records within the statistical period, including the call rejection ratio, the average call duration, and the histogram distribution of call duration. It can be expressed as:
$$D_{CT} = \{f_{E}^{T}, CT_{avg}^{T}, CTD^{T}\}$$
where $f_{E}^{T}$ is the probability that a call initiated by the user is rejected, $CT_{avg}^{T}$ is the user's average call duration, and $CTD^{T}$ is the distribution of call duration.
2. The generation method according to claim 1, characterized in that step (1) comprises in particular the sub-steps of:
(1-1) counting the user's historical call interaction parameters $C_{out}$, $C_{in}$ and $C_{in/out}$. To allow comparison across users, these values are further normalized:
$$\bar{C}_{out} = \frac{C_{out}}{C_{out} + C_{in} + C_{in/out}}$$
$$\bar{C}_{in} = \frac{C_{in}}{C_{out} + C_{in} + C_{in/out}}$$
$$\bar{C}_{in/out} = \frac{C_{in/out}}{C_{out} + C_{in} + C_{in/out}}$$
where $\bar{C}_{out}$, $\bar{C}_{in}$ and $\bar{C}_{in/out}$ sum to 1;
(1-2) computing the incoming/outgoing ratio $R_{in/out}$ as follows:
$$R_{in/out} = \frac{C_{out}}{C_{in}}$$
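Sub-steps (1-1) and (1-2) can be sketched in Python as follows. This is an illustration under assumptions, not the patent's implementation: it reads $C_{out}$, $C_{in}$ and $C_{in/out}$ as counts of distinct contacts, and the `ContactCounts` record and the choice to report $R_{in/out}$ as infinity when $C_{in} = 0$ are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ContactCounts:
    """Hypothetical per-contact tally for one user over the history window."""
    outgoing: int  # calls the user placed to this contact
    incoming: int  # calls from this contact that the user answered


def interaction_features(contacts: Dict[str, ContactCounts]) -> Dict[str, float]:
    """Normalized C_out, C_in, C_in/out (sub-step 1-1) and R_in/out (sub-step 1-2)."""
    # Partition contacts by the role the user played with each of them.
    c_out = sum(1 for c in contacts.values() if c.outgoing > 0 and c.incoming == 0)
    c_in = sum(1 for c in contacts.values() if c.incoming > 0 and c.outgoing == 0)
    c_in_out = sum(1 for c in contacts.values() if c.outgoing > 0 and c.incoming > 0)
    total = c_out + c_in + c_in_out
    if total == 0:
        raise ValueError("user has no call interactions to normalize")
    return {
        "C_out_norm": c_out / total,
        "C_in_norm": c_in / total,
        "C_in_out_norm": c_in_out / total,
        # Assumption: report the ratio as infinity when no contact only called in.
        "R_in_out": c_out / c_in if c_in else float("inf"),
    }


contacts = {
    "a": ContactCounts(outgoing=5, incoming=0),
    "b": ContactCounts(outgoing=3, incoming=0),
    "c": ContactCounts(outgoing=0, incoming=2),
    "d": ContactCounts(outgoing=1, incoming=1),
}
print(interaction_features(contacts))
# {'C_out_norm': 0.5, 'C_in_norm': 0.25, 'C_in_out_norm': 0.25, 'R_in_out': 2.0}
```

Because the three normalized values share one denominator, $R_{in/out}$ is the same whether it is computed from raw or normalized counts.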
(1-3) computing the call interaction strength $CD_{in/out}$ between the specified user and other users. The interaction strength between the user and user j, denoted $CD_{in/out}^{j}$, is calculated as follows:
$$CD_{in/out}^{j} = \mathrm{INT}\left[\log\left(C_{out}^{j} + C_{in}^{j}\right)\right]$$
where INT[·] denotes the rounding (take-integer) function, $C_{out}^{j}$ is the number of times the user actively called user j, and $C_{in}^{j}$ is the number of times the user answered calls from user j;
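The text does not state the logarithm base or the exact rounding rule. The sketch below assumes base 10 with the integer part taken, which maps ordinary per-contact call volumes onto the four strength levels used in sub-step (1-4). The function name is hypothetical.

```python
def interaction_strength(calls_out: int, calls_in: int) -> int:
    """CD^j_in/out = INT[log(C^j_out + C^j_in)] for a single contact j.

    Assumptions: the log is base 10 and INT[.] takes the integer part, so
    1-9 calls give 0, 10-99 give 1, 100-999 give 2, and so on.
    """
    total = calls_out + calls_in
    if total < 1:
        raise ValueError("contact j must have at least one call with the user")
    # floor(log10(total)) for a positive integer equals its digit count minus one;
    # integer arithmetic avoids floating-point error at exact powers of ten.
    return len(str(total)) - 1


print([interaction_strength(o, i) for o, i in [(4, 5), (6, 4), (80, 19), (700, 300)]])
# [0, 1, 1, 3]
```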
(1-4) computing the call interaction strength distribution $CDD_{in/out}$:
$$CDD_{in/out} = \{CDN_0, CDN_1, CDN_2, CDN_3\}$$
where $CDN_i$ (i = 0, 1, 2, 3) is calculated as follows:
[formula image not reproduced]
$CDN_i$ is the number of contacts whose interaction strength $CD_{in/out}^{j}$ is equal to i (or greater than or equal to i), where COUNT[·] is the counting function, $CD_{in/out}^{j}$ is the call interaction strength between the user and user j, and n is the total number of the user's contacts;
(1-5) normalizing $CDD_{in/out}$. The normalized call interaction strength distribution, denoted $\overline{CDN}_{in/out}$, is expressed as:
$$\overline{CDN}_{in/out} = \{\overline{CDN}_0, \overline{CDN}_1, \overline{CDN}_2, \overline{CDN}_3\}$$
where $\overline{CDN}_i$ is the normalized value of $CDN_i$ (i = 0, 1, 2, 3), calculated as:
$$\overline{CDN}_i = \frac{CDN_i}{\sum_{k=0}^{3} CDN_k}.$$
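Sub-steps (1-4) and (1-5) amount to binning the per-contact strengths into four levels and dividing by the total. The sketch below assumes that levels 0 to 2 count exact matches while level 3 collects every strength of 3 or more, which is one reading of "equal to i or greater than or equal to i"; the function names are hypothetical.

```python
from typing import List


def strength_distribution(strengths: List[int]) -> List[int]:
    """CDD_in/out = [CDN_0, ..., CDN_3] from per-contact strengths CD^j_in/out."""
    cdd = [0, 0, 0, 0]
    for cd in strengths:
        # Assumption: bins 0-2 count exact matches, bin 3 collects every CD >= 3.
        cdd[min(cd, 3)] += 1
    return cdd


def normalize_distribution(cdd: List[int]) -> List[float]:
    """Divide each CDN_i by the sum over k = 0..3, i.e. by the contact count n."""
    n = sum(cdd)
    if n == 0:
        raise ValueError("no contacts to normalize over")
    return [c / n for c in cdd]


cdd = strength_distribution([0, 0, 1, 2, 3, 5, 0, 1])
print(cdd, normalize_distribution(cdd))
# [3, 2, 1, 2] [0.375, 0.25, 0.125, 0.25]
```

Under this reading every contact lands in exactly one bin, so the denominator equals n, the user's number of contacts.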
3. the generation method according to claim 1, characterized in that step (2) comprises in particular the sub-steps of:
(2-1) computing the absolute frequency $F_{in/out}^{T}$ of calls initiated by the user:
$$F_{in/out}^{T} = \frac{C_{out}^{T}}{T}$$
where T is the length of the designated statistical period in hours; the period starts at hour 0 of the first day and ends at hour 24 of the last day, so T must be an integer multiple of 24 hours;
and $C_{out}^{T}$ is the number of calls the user placed as the calling party during the period;
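A minimal sketch of sub-step (2-1), including a check of the whole-day constraint on T; the function name is hypothetical and the result is in calls per hour.

```python
def call_frequency(outgoing_calls: int, period_hours: int) -> float:
    """F^T_in/out = C^T_out / T, in calls per hour."""
    # T must span whole days (hour 0 of the first day to hour 24 of the last).
    if period_hours <= 0 or period_hours % 24 != 0:
        raise ValueError("T must be a positive multiple of 24 hours")
    return outgoing_calls / period_hours


print(call_frequency(outgoing_calls=336, period_hours=7 * 24))
# 2.0
```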
(2-2) computing the call frequency distribution $D_{out}^{T}$ of the specified user within the specified statistical period T, expressed as:
$$D_{out}^{T} = \{D_{out}^{1}, D_{out}^{2}, \ldots, D_{out}^{12}\}$$
where $D_{out}^{t}$ (t = 1, ..., 12) is the total number of calls placed in the interval [2(t-1), 2t) hours of each day, summed over all days of the statistical period T. Each day is divided into 12 time slices of 2 hours each: [0,2), [2,4), [4,6), [6,8), [8,10), [10,12), [12,14), [14,16), [16,18), [18,20), [20,22), [22,24).
(2-3) normalizing the user's call frequency distribution $D_{out}^{T}$ as follows:
$$\bar{D}_{out}^{t} = \frac{D_{out}^{t}}{C_{out}^{T}}, \quad (t = 1, \ldots, 12);$$
the normalized call frequency distribution is then:
$$\bar{D}_{out}^{T} = \{\bar{D}_{out}^{1}, \bar{D}_{out}^{2}, \ldots, \bar{D}_{out}^{12}\};$$
(2-4) obtaining the user's call frequency and its distribution parameter $F_{CD}$:
$$F_{CD} = \{F_{in/out}^{T}, \bar{D}_{out}^{T}\}.$$
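Sub-steps (2-2) and (2-3) can be sketched as bucketing outgoing call start times into the 12 two-hour slices and dividing by $C_{out}^{T}$. The sketch assumes the input list already holds only the user's outgoing calls within period T; the function name is hypothetical. Paired with the frequency from sub-step (2-1), the result gives $F_{CD}$.

```python
from datetime import datetime
from typing import List


def slot_distribution(call_starts: List[datetime]) -> List[float]:
    """Normalized distribution of outgoing calls over 12 two-hour slices of the day.

    `call_starts` is assumed to hold the start times of the user's outgoing calls,
    already restricted to the statistical period T.
    """
    counts = [0] * 12  # counts[t - 1] is D^t_out for slice [2(t-1), 2t)
    for ts in call_starts:
        counts[ts.hour // 2] += 1  # days are pooled, so each slice sums over all days
    total = len(call_starts)  # equals C^T_out, the sum of all D^t_out
    if total == 0:
        raise ValueError("no outgoing calls in the period")
    return [c / total for c in counts]


calls = [datetime(2013, 12, 1, 0, 30), datetime(2013, 12, 1, 1, 59),
         datetime(2013, 12, 2, 3, 10), datetime(2013, 12, 2, 23, 5)]
print(slot_distribution(calls))
# [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]
```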
4. The generation method according to claim 1, characterized in that step (3) comprises in particular the sub-steps of:
(3-1) computing the rejection rate $f_{E}^{T}$ of calls initiated by the user, using:
$$f_{E}^{T} = \frac{CR_{out}^{T}}{C_{out}^{T}}$$
where $C_{out}^{T}$ is the total number of calls made by the user within the statistical period and $CR_{out}^{T}$ is the number of those outgoing calls that were rejected;
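A sketch of sub-step (3-1) over call-detail records. The `CallRecord` type and its `rejected` flag are hypothetical; real records would need some field distinguishing a refused call from other outcomes. Only outgoing calls enter both numerator and denominator.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CallRecord:
    """Hypothetical call-detail record for the specified user."""
    outgoing: bool  # True if the user placed the call
    rejected: bool  # True if the called party refused to answer


def rejection_rate(records: List[CallRecord]) -> float:
    """f^T_E = CR^T_out / C^T_out over records already limited to the period T."""
    outgoing = [r for r in records if r.outgoing]
    if not outgoing:
        raise ValueError("user placed no calls in the period")
    rejected = sum(1 for r in outgoing if r.rejected)
    return rejected / len(outgoing)


log = [CallRecord(True, True), CallRecord(True, False), CallRecord(True, False),
       CallRecord(True, False), CallRecord(False, True)]
print(rejection_rate(log))
# 0.25
```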
(3-2) computing the user's average call duration $CT_{avg}^{T}$ using:
$$CT_{avg}^{T} = \frac{\sum_{r=1}^{C_{in/out}^{T}} t_r}{C_{in/out}^{T}}$$
where $C_{in/out}^{T}$ is the number of calls in the statistical period and $t_r$ is the call duration recorded for the r-th call;
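A sketch of sub-step (3-2). Since the denominator $C_{in/out}^{T}$ counts every call in the period, the sketch assumes unanswered calls are kept with $t_r = 0$, which lowers the average for users with many unanswered calls. The record type is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CallRecord:
    """Hypothetical call-detail record; both incoming and outgoing calls count."""
    outgoing: bool
    duration_s: float  # t_r, recorded duration in seconds (0 if unanswered)


def average_duration(records: List[CallRecord]) -> float:
    """CT^T_avg = (sum of t_r for r = 1..C^T_in/out) / C^T_in/out."""
    if not records:
        raise ValueError("no calls in the statistical period")
    # Taken literally, the denominator counts every call in the period,
    # so unanswered calls (t_r = 0) pull the average down.
    return sum(r.duration_s for r in records) / len(records)


records = [CallRecord(True, 120.0), CallRecord(False, 60.0), CallRecord(True, 0.0)]
print(average_duration(records))
# 60.0
```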
(3-3) computing the histogram distribution of call duration $CTD^{T}$, expressed as:
$$CTD^{T} = \{CTD_0, CTD_1, \ldots, CTD_6, CTD_7\}$$
where $CTD_i$ (i = 0, 1, ..., 7) is the proportion of call records whose call duration falls within the i-th duration range, calculated as follows:
[formula image not reproduced]
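The eight duration ranges behind $CTD_0, \ldots, CTD_7$ are not recoverable from the text, so the sketch below takes the bin edges as a parameter. The default edges are purely hypothetical placeholders chosen for illustration, not values from the patent; only the structure (eight ranges, each entry a proportion of call records) follows the claim.

```python
from bisect import bisect_right
from typing import List, Sequence

# HYPOTHETICAL lower bin edges in seconds for CTD_0..CTD_7; the actual ranges
# are defined in a formula image that is not reproduced in the text.
HYPOTHETICAL_EDGES = (0, 10, 30, 60, 120, 300, 600, 1800)


def duration_histogram(durations_s: Sequence[float],
                       edges: Sequence[float] = HYPOTHETICAL_EDGES) -> List[float]:
    """CTD^T: share of call records whose duration falls in each of 8 ranges."""
    if len(edges) != 8:
        raise ValueError("CTD^T has exactly 8 bins")
    if not durations_s:
        raise ValueError("no call records in the period")
    counts = [0] * len(edges)
    for d in durations_s:
        if d < edges[0]:
            raise ValueError("duration below the first bin edge")
        # Bin i covers [edges[i], edges[i+1]); the last bin is open-ended.
        counts[bisect_right(edges, d) - 1] += 1
    return [c / len(durations_s) for c in counts]


print(duration_histogram([5, 45, 45, 2000]))
# [0.25, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.25]
```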
CN201310698598.1A 2013-12-18 2013-12-18 User call behavior model generating method applicable to spam voice filtering Expired - Fee Related CN103716471B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310698598.1A CN103716471B (en) 2013-12-18 2013-12-18 User call behavior model generating method applicable to spam voice filtering

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310698598.1A CN103716471B (en) 2013-12-18 2013-12-18 User call behavior model generating method applicable to spam voice filtering

Publications (2)

Publication Number Publication Date
CN103716471A true CN103716471A (en) 2014-04-09
CN103716471B CN103716471B (en) 2015-11-04

Family

ID=50409028

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310698598.1A Expired - Fee Related CN103716471B (en) 2013-12-18 2013-12-18 User call behavior model generating method applicable to spam voice filtering

Country Status (1)

Country Link
CN (1) CN103716471B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106657689A (en) * 2015-11-04 2017-05-10 China Mobile Communications Corporation Method for preventing and controlling international fraud calls and apparatus thereof
CN110233938A (en) * 2019-05-14 2019-09-13 Institute of Information Engineering, Chinese Academy of Sciences Method for identifying gang fraud calls based on suspiciousness measurement
CN110868493A (en) * 2019-11-08 2020-03-06 China Construction Bank Corporation Outbound call control method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070201660A1 (en) * 2006-01-26 2007-08-30 International Business Machines Corporation Method and apparatus for blocking voice call spam
CN101262524A (en) * 2008-04-23 2008-09-10 Shenyang Neusoft Software Co., Ltd. Spam voice filtering method and system
CN101459718A (en) * 2009-01-06 2009-06-17 Huazhong University of Science and Technology Spam voice filtering method and system based on mobile communication network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
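Wang Fei (王非): "Research on Reputation Models for Self-Organizing Networks and Their Applications", China Doctoral Dissertations Full-text Database (Information Science and Technology) *
Wang Fei (王非); Mo Yijun (莫益军); Huang Benxiong (黄本熊): "Reputation-Based Spam Voice Filtering Model for P2P-VoIP", Journal of Huazhong University of Science and Technology (Natural Science Edition) *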

Also Published As

Publication number Publication date
CN103716471B (en) 2015-11-04

Similar Documents

Publication Publication Date Title
CN103838814B (en) Method for dynamically displaying contacts diagram relationship
CN104936182B (en) Method and system for intelligent management and control of fraudulent calls
US8443049B1 (en) Call processing using trust scores based on messaging patterns of message source
CN104104772B (en) Fraudulent call prompting method, server and system
CN102255890A (en) User recommendation and information interaction system and method
WO2011143847A1 (en) Short message monitoring system and method
CN104348974A (en) Keyword-verification-based specific message prompting method for communication group
CN101321070B (en) Monitoring system and method for suspicious user
CN101997692A (en) Friend information generating method of instant messaging (IM) software based on voice communication records
CN105245434B (en) Instant information communication method
WO2011153744A1 (en) Method and system for monitoring spam short message
CN103716471B (en) A kind of user being applicable to rubbish voice filtering calls out the generation method of behavior model
CN109451183B (en) Method for preventing unwelcome telephone
CN101909261A (en) Method and system for monitoring spam
CN104363161A (en) Specific message reminding method in communication group
CN101945006B (en) Detection method of abnormal call
KR101031901B1 (en) Social network analyzing method and system based on communication record
CN102905236B (en) Spam short message monitoring method, apparatus and system
CN103281464B (en) Signaling-based missed call notification system and method for numbers suspended due to arrears
CN104618616B (en) Videoconference participant identification system and method based on speech feature extraction
CN110798379B (en) VoIP signaling gateway identification method, device and readable storage medium
CN106899492B (en) Method for mining relationship chain of colleague users
Sorge et al. A provider-level reputation system for assessing the quality of spit mitigation algorithms
WO2016150111A1 (en) Data processing method, device and system based on call reminder
CN106888229B (en) Call management method and server

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20151104

CF01 Termination of patent right due to non-payment of annual fee