CN114025041B - System and method for rapidly identifying nuisance calls based on non-frequency characteristics of signaling - Google Patents

Publication number
CN114025041B
CN114025041B CN202111429772.3A CN202111429772A CN114025041B CN 114025041 B CN114025041 B CN 114025041B CN 202111429772 A CN202111429772 A CN 202111429772A CN 114025041 B CN114025041 B CN 114025041B
Authority
CN
China
Prior art keywords
signaling
call
feature
frequency
calling
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111429772.3A
Other languages
Chinese (zh)
Other versions
CN114025041A (en
Inventor
李宏图
崔隆
吴仲文
柏京
贾泉臻
卢丹
郭心如
杨晓宇
孙永学
王荣辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Best Tone Information Service Corp Ltd
Original Assignee
Best Tone Information Service Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Best Tone Information Service Corp Ltd
Priority to CN202111429772.3A
Publication of CN114025041A
Application granted
Publication of CN114025041B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/66 Substation equipment, e.g. for use by subscribers with means for preventing unauthorised or fraudulent calling
    • H04M1/663 Preventing unauthorised calls to a telephone set
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F30/00 Computer-aided design [CAD]
    • G06F30/20 Design optimisation, verification or simulation
    • G06F30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/22 Arrangements for supervision, monitoring or testing
    • H04M3/2281 Call monitoring, e.g. for law enforcement purposes; Call tracing; Detection or prevention of malicious calls
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F2119/02 Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]


Abstract

The application relates to the fields of telecommunication technology, big data and the like, and in particular to a system for rapidly identifying nuisance calls based on non-frequency characteristics of signaling. The system consists of a signaling acquisition unit, a historical signaling full database unit, a signaling feature classification modeling unit, and a signaling monitoring and nuisance call interception unit. The signaling acquisition unit acquires original signaling, converts it into call detail records, and transmits the call detail records to the historical signaling full database unit for storage. The signaling feature classification modeling unit uses the sample data to form a non-frequency feature nuisance call discrimination library. The signaling monitoring and nuisance call interception unit monitors signaling in real time, identifies nuisance calls by consulting the non-frequency feature nuisance call discrimination library, and intercepts them or sends a reminder. The application also includes a corresponding method. According to the application, the model is trained on the full feature set, while only non-frequency features are collected and evaluated in the recognition stage. This avoids secondary sampling and repeated calculation of feature vectors in the recognition stage, enables real-time recognition, and improves recognition accuracy.

Description

System and method for rapidly identifying nuisance calls based on non-frequency characteristics of signaling
Technical Field
The application relates to the fields of telecommunication technology, machine learning, big data and the like, in particular to a system and a method for rapidly identifying nuisance calls based on non-frequency characteristics of signaling.
Background
Nuisance calls are a blight on both the online world and everyday life, and the harm they do to individuals and to society as a whole is considerable. In addition, a large number of illegal calls occupy valuable communication resources, which directly causes problems such as a reduced call completion rate and network equipment congestion, and greatly degrades the experience of legitimate mobile users.
From a technical point of view, current schemes for dealing with nuisance calls fall into three main categories:
First, identification by users
This method marks a calling number on the basis of user complaints, including telephone complaints and in-app reports, after which the operator intercepts the number on the communication channel or an app blocks it.
Second, speech recognition techniques
This method compares the call audio against an existing nuisance call voice library. A call that matches is judged to be a nuisance call, and a call that does not match is passed on for manual screening.
Third, machine learning classification algorithms
In recent years, as big data technology, machine learning and artificial intelligence algorithms have matured, this approach has been increasingly adopted by the industry. In particular, the unsupervised clustering algorithm k-means, decision tree algorithms, naive Bayes classification algorithms and the like are used to train models on call signaling features so as to classify and screen nuisance calls.
However, all three types of methods have shortcomings, and there are problems that none of them can solve:
For the first method, relying on user complaints or app reports greatly increases the workload and labor cost of manual customer service, and it also greatly increases the possibility that a number is reported maliciously, leading to a large number of misjudgments.
For the second method, speech recognition and audio comparison require monitoring and recording the calls of the calling and called parties, which is highly invasive and may infringe user privacy. It also requires a large amount of additional storage equipment, is technically complex to implement, greatly increases software and hardware costs, and has a long recognition cycle.
For the third method, modeling signaling features with big data and machine learning algorithms is currently the most promising approach, but existing methods of this kind have a serious defect: the nuisance call model depends heavily on the frequency features of the signaling. When a new number arrives that has never appeared in the training set or the test set, frequency features such as call frequency, call completion rate and called dispersion cannot be calculated, because there is only one record and frequency features are statistics derived from a large amount of data. The category of the signaling therefore cannot be judged quickly. Existing methods that identify nuisance calls using machine learning and signaling features essentially amount to building a blacklist library. Somewhat better implementations resample over a short window, for example at 5-minute granularity, and compute frequency features from the samples within those 5 minutes. However, the amount of signaling in 5 minutes is clearly far from sufficient, because a given number may still appear only once in that window, so the goal of rapidly identifying and intercepting nuisance calls online cannot be achieved.
Disclosure of Invention
The application aims to provide a system and a method for rapidly identifying nuisance calls based on non-frequency characteristics of signaling, which mainly solve the problems in the prior art and can rapidly identify and intercept nuisance calls on line on the basis of a classification model established by a machine learning algorithm and signaling characteristics.
In order to achieve the above purpose, the technical scheme adopted by the application is to provide a system for rapidly identifying nuisance calls based on non-frequency characteristics of signaling, characterized by comprising a signaling acquisition unit, a historical signaling full database unit, a signaling feature classification modeling unit, and a signaling monitoring and nuisance call interception unit;
the signaling acquisition unit is used for acquiring original signaling in a communication network, converting the original signaling into call detail records, and transmitting the call detail records to the historical signaling full database unit to be stored as sample data; the signaling feature classification modeling unit uses the sample data provided by the historical signaling full database unit to form a non-frequency feature nuisance call discrimination library that depends only on original signaling attributes; the signaling monitoring and nuisance call interception unit is used for monitoring non-frequency feature vectors in the signaling in the communication network in real time and, in combination with the non-frequency feature nuisance call discrimination library, identifying nuisance calls and intercepting them or sending a reminder to the customer.
Further, the signaling acquisition unit obtains primary information from the acquired original signaling, the primary information comprising code number information, call duration information, connection information and release information; the connection information comprises a call completion rate and an answer rate; the release information comprises calling-party on-hook and called-party on-hook; the signaling acquisition unit uses the primary information to calculate secondary information comprising call frequency, outgoing-to-incoming call ratio and called dispersion; the signaling acquisition unit generates the call detail record containing the primary information and the secondary information.
The application also provides a method for rapidly identifying nuisance calls based on non-frequency characteristics of signaling by using the system, characterized by:
using the signaling acquisition unit to acquire original signaling in daily business, convert the original signaling into call detail records serving as the sample data, and store the sample data in the historical signaling full database unit;
using the signaling feature classification modeling unit to establish a feature vector set based on the sample data, and then using machine learning to form a full-feature nuisance call discrimination library; each feature vector in the full-feature nuisance call discrimination library comprises frequency features and non-frequency features, and also comprises a calling number and a number category;
removing the frequency features from the full-feature nuisance call discrimination library to form a non-frequency feature nuisance call discrimination library;
using the signaling monitoring and nuisance call interception unit to monitor the non-frequency feature vectors in the signaling in real time, calculate their similarity to the feature vectors in the non-frequency feature nuisance call discrimination library, and identify nuisance calls according to the number category of the matching feature vector;
intercepting the identified nuisance call or sending a nuisance call reminder to the customer.
Further, the process of using the signaling feature classification modeling unit to establish a feature vector set based on the sample data and then using machine learning to form the full-feature nuisance call discrimination library comprises the following steps:
extracting signaling features from the sample data and establishing the feature vector set;
normalizing the feature vector set to construct a feature matrix;
performing machine learning cluster modeling on the feature matrix to generate the full-feature nuisance call discrimination library.
Further, each feature vector in the feature vector set includes one or more signaling features selected from call frequency, call completion rate, talk-time ratio, outgoing-to-incoming call ratio, called dispersion, called-number arithmetic-progression distribution ratio, number of calling-party releases, fixed call interval ratio and out-of-province number ratio.
Further, when the feature vector set is normalized, if a signaling feature takes continuous values, a linear transformation by min-max (deviation) normalization is applied so that the normalized value is greater than or equal to 0 and less than or equal to 1; if the signaling feature is of Boolean type or takes discrete values, one-hot encoding is applied so that the normalized value is greater than or equal to 0 and less than or equal to 1.
Further, the unsupervised machine learning algorithm K-Means is used to perform cluster modeling on the feature matrix and generate the full-feature nuisance call discrimination library; in the K-Means algorithm, cosine distance is used to calculate the distance between different feature vectors.
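The clustering step can be illustrated with a minimal sketch, assuming each feature vector is already normalized to [0, 1] and represented as a plain list of floats. The function names (`cosine_distance`, `assign`, `cosine_kmeans`), the random initialization, the iteration cap, and the choice to use the arithmetic mean of the members as the new center are assumptions made for illustration; the application only states that K-Means is used with cosine distance.

```python
import math
import random

def cosine_distance(a, b):
    """Return 1 - cosine similarity; zero vectors count as maximally distant (assumption)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (na * nb)

def assign(vectors, centers):
    """Label each vector with the index of its nearest center by cosine distance."""
    return [min(range(len(centers)), key=lambda j: cosine_distance(v, centers[j]))
            for v in vectors]

def cosine_kmeans(vectors, k, n_iter=50, seed=0):
    """Cluster feature vectors into k groups; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(vectors, k)]  # random initial centers
    for _ in range(n_iter):
        labels = assign(vectors, centers)
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                # New center: per-dimension mean of the cluster members.
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break  # assignments have stabilized
        centers = new_centers
    return assign(vectors, centers), centers
```

In this sketch each resulting cluster would still need a number category (nuisance or normal) attached, for example from labelled samples, before it could serve as an entry in the full-feature discrimination library.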
Further, using the signaling feature classification modeling unit to remove the frequency features from the full-feature nuisance call discrimination library to form the non-frequency feature nuisance call discrimination library comprises the following steps:
deleting the frequency features from the feature vectors in the full-feature nuisance call discrimination library;
grouping the feature vectors by number category, comparing the cosine distances between feature vectors in different groups, and, when a cosine distance is smaller than a classification threshold, eliminating both of the feature vectors being compared;
the remaining feature vectors form the non-frequency feature nuisance call discrimination library.
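The pruning steps above can be sketched as follows. This is an illustrative sketch only: the entry layout (a dict with `number`, `category` and `features` keys), the feature names, and the function name `build_non_frequency_library` are assumptions, and all entries are assumed to share the same feature names.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity; zero vectors count as maximally distant (assumption)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (na * nb)

def build_non_frequency_library(full_library, frequency_features, threshold):
    """Drop frequency features, then remove cross-category pairs that became too similar."""
    # Step 1: delete the frequency features from every entry.
    reduced = [
        {"number": e["number"], "category": e["category"],
         "features": {k: v for k, v in e["features"].items()
                      if k not in frequency_features}}
        for e in full_library
    ]
    # Step 2: compare entries from different categories; mark both members of
    # any pair whose cosine distance falls below the classification threshold.
    ambiguous = set()
    for i, a in enumerate(reduced):
        for j in range(i + 1, len(reduced)):
            b = reduced[j]
            if a["category"] == b["category"]:
                continue
            names = sorted(a["features"])
            va = [a["features"][n] for n in names]
            vb = [b["features"][n] for n in names]
            if cosine_distance(va, vb) < threshold:
                ambiguous.update((i, j))
    # Step 3: the remaining entries form the non-frequency library.
    return [e for idx, e in enumerate(reduced) if idx not in ambiguous]
```

Removing both members of a close cross-category pair reflects the idea that, once frequency features are gone, such vectors no longer separate nuisance numbers from normal ones and would only cause misjudgments at recognition time.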
Further, the frequency features comprise call frequency, call completion rate, outgoing-to-incoming call ratio, called-number callback rate and called dispersion; the non-frequency features comprise ringing duration, call duration, link release reason, whether the called party hangs up first, and whether the calling and called parties are in different provinces.
Further, the similarity calculation performed by the signaling monitoring and nuisance call interception unit comprises the following steps:
the signaling monitoring and nuisance call interception unit monitors the non-frequency feature vectors in the signaling in real time and, after normalization, generates a feature vector to be detected; the cosine distance between the feature vector to be detected and each feature vector in the non-frequency feature nuisance call discrimination library is calculated; when the cosine distance is smaller than a nuisance judgment threshold, the call is judged to be a nuisance call.
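A minimal sketch of this recognition step is given below. It assumes the library is a list of `(vector, category)` pairs with the category strings `"nuisance"` and `"normal"`, and it flags a call only when the nearest library vector is a nuisance entry within the threshold. The nearest-neighbour rule, the names and the category labels are assumptions for illustration, not details taken from the application.

```python
import math

def cosine_distance(a, b):
    """Return 1 - cosine similarity; zero vectors count as maximally distant (assumption)."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (na * nb)

def is_nuisance(candidate, library, threshold):
    """Compare one normalized non-frequency vector against every library entry."""
    best_distance, best_category = min(
        (cosine_distance(candidate, vec), category) for vec, category in library
    )
    return best_category == "nuisance" and best_distance < threshold

# Hypothetical library: [ringing ratio, talk ratio, called party hung up first]
library = [([0.9, 0.1, 1.0], "nuisance"), ([0.2, 0.8, 0.0], "normal")]
print(is_nuisance([0.88, 0.12, 1.0], library, threshold=0.05))
print(is_nuisance([0.25, 0.75, 0.0], library, threshold=0.05))
```

Because only a single signaling record is needed to build the candidate vector, the check can run on the first call from a previously unseen number.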
In view of the above technical features, the application uses a machine learning algorithm to train a model on the full feature set (frequency features and non-frequency features), then removes the frequency features to establish a nuisance call discrimination library containing only non-frequency features. In the recognition stage, the non-frequency features of the signaling, which require no accumulation or statistical calculation, can be collected directly and compared for similarity against the non-frequency features in the discrimination library to identify nuisance calls. This avoids secondary sampling and repeated calculation of feature vectors in the recognition stage and thereby enables real-time recognition. At the same time, because frequency features are taken into account during training and modeling, the advantages of big data modeling are retained and recognition accuracy is improved.
Drawings
FIG. 1 is a system block diagram of a preferred embodiment of the signaling-based non-frequency feature quick identification nuisance call system of the present application;
FIG. 2 is a flow chart of a preferred embodiment of the signaling-based method for quickly identifying nuisance calls based on non-frequency characteristics of the present application.
In the figures: 100, signaling acquisition unit; 200, historical signaling full database unit; 300, signaling feature classification modeling unit; 400, signaling monitoring and nuisance call interception unit; 500, non-frequency feature nuisance call discrimination library; 600, communication network.
Detailed Description
The application is further described below in conjunction with the detailed description. It is to be understood that these examples are illustrative of the present application and are not intended to limit the scope of the present application. Furthermore, it should be understood that various changes and modifications can be made by one skilled in the art after reading the teachings of the present application, and such equivalents are intended to fall within the scope of the application as defined in the appended claims.
Referring to fig. 1, the application discloses a system for rapidly identifying nuisance calls based on non-frequency characteristics of signaling. As shown, a preferred embodiment is composed of a signaling acquisition unit 100, a historical signaling full database unit 200, a signaling feature classification modeling unit 300, and a signaling monitoring and nuisance call interception unit 400.
The signaling acquisition unit 100 is deployed in the communication network 600 to collect the original signaling in daily traffic. Information that can be obtained directly from the original signaling is primary information, such as code number information, call duration information, connection information and release information. Information obtained by further calculation is secondary information, including call frequency, outgoing-to-incoming call ratio and called dispersion. The signaling acquisition unit 100 merges and collates the primary information and the secondary information, generates a call detail record and stores it in the historical signaling full database unit 200.
The historical signaling full database unit 200 provides training sample data for the machine learning performed by the signaling feature classification modeling unit 300.
The signaling feature classification modeling unit 300 performs modeling with a machine learning classification algorithm and forms a non-frequency feature nuisance call discrimination library 500 that depends only on the original attributes of the signaling.
The signaling monitoring and nuisance call interception unit 400 monitors the live signaling in the communication network 600 in real time, extracts from it the non-frequency features corresponding to the feature vectors in the non-frequency feature nuisance call discrimination library 500, and compares them with the feature vectors in the library 500, thereby identifying nuisance calls. For an identified nuisance call, the signaling monitoring and nuisance call interception unit 400 can intercept the call directly or use the communication network 600 to send a flash message reminder to the customer.
Referring to fig. 2, the application also provides a method for rapidly identifying nuisance calls based on the non-frequency characteristics of the signaling. Signaling features are divided into frequency features and non-frequency features. Frequency features are information obtained by statistics, such as:
Call frequency: the number of calls initiated by a given calling number within a certain time period.
Call completion rate: the number of connected calls made by a calling number as a percentage of the total number of calls made by that calling number within a certain time period.
Outgoing-to-incoming ratio: the ratio of the number of outgoing calls to the number of incoming calls of a given number within a certain time period.
Called-number callback rate: among all calls made by a given calling number within a certain time period, the percentage for which the called party has a callback record.
Called dispersion: the number of distinct called numbers among all calls made by a calling number as a percentage of the total number of calls initiated by that calling number within a certain time period.
Non-frequency features can be obtained directly from the signaling without any statistics, such as ringing duration, call duration, link release reason, whether the called party hangs up first, and whether the calling and called parties are in different provinces.
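The frequency feature definitions above can be made concrete with a short sketch. It assumes each call detail record within the chosen time window is a dict with the hypothetical keys `caller`, `callee` and `connected`, and it treats a call as having a callback record when the called party also appears as a caller of the same number within the window; both choices are assumptions for illustration.

```python
def frequency_features(cdrs, number):
    """Compute the frequency features of one calling number over a window of CDRs."""
    outgoing = [c for c in cdrs if c["caller"] == number]
    incoming = [c for c in cdrs if c["callee"] == number]
    total = len(outgoing)
    if total == 0:
        return None  # the number made no calls in this window
    called_back_by = {c["caller"] for c in incoming}
    return {
        "call_frequency": total,
        "completion_rate": sum(c["connected"] for c in outgoing) / total,
        # No incoming calls at all: report infinity rather than dividing by zero.
        "out_in_ratio": total / len(incoming) if incoming else float("inf"),
        "callback_rate": sum(c["callee"] in called_back_by for c in outgoing) / total,
        "called_dispersion": len({c["callee"] for c in outgoing}) / total,
    }
```

With a single record for a newly seen number, every ratio collapses to a trivial value (for example, a called dispersion of 1.0), which is why frequency features cannot support a quick decision on the first call.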
The application relates to a method for rapidly identifying nuisance calls based on non-frequency characteristics of signaling, which comprises the following steps:
Step S101, the signaling acquisition unit is used to acquire original signaling in daily business.
From the moment a calling user initiates a call request until the called party answers and the call is connected, a series of signaling messages is exchanged. Each message contains different parameters with different meanings, and from these parameters features such as called dispersion, ringing duration, interval duration and call completion rate can be derived. Telephone signaling features are extracted as follows:
The IAM (Initial Address Message) of the TUP (Telephone User Part) protocol contains two key parameters, the calling address and the called address, from which the code number information of the calling user and the called user of the current call can be read. The address signal carries the called user's information, and the calling line identity carries the calling user's code number information. The code number information (calling number, called number) can therefore be obtained from the IAM message.
In TUP signaling, when a call is released, if the caller hangs up first, office A sends a clear-forward signal (CLF) to office B, and office B returns a release-guard signal (RLG) to office A. If the called party hangs up first, office B first sends a clear-back signal (CBK), office A then sends CLF to office B, and office B returns RLG to office A. Thus, if the caller hangs up first, the interval from the ACM (Address Complete Message) to the CLF (Clear-Forward Signal) message can be used; if the called user hangs up first, the interval from the ACM message to the CBK (Clear-Back Signal) message can be used. The call duration information can accordingly be calculated as the interval from the TUP ANC (Answer Signal, Charge) message to the CLF or CBK message.
The connection information is calculated from the ACM, ANC and IAM messages. Specifically, the call completion rate can be counted from the ACM and IAM messages, and the answer rate can be counted from the ANC and IAM messages.
The calling and called numbers, the call completion rate and the call duration can be obtained directly from different fields of the TUP signaling format and belong to the primary information. Other features, such as call frequency, outgoing-to-incoming call ratio and called dispersion, are derived by statistical calculation from the primary information and belong to the secondary information. Some values are derived arithmetically, for example ringing duration = call end time - call start time - call duration. If the call duration is equal to 0, the call is considered not to have been connected.
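The timing rules above can be sketched as follows, assuming the messages of one call are available as a dict mapping the TUP message name to a timestamp in seconds. The dict layout, the function name `call_timing`, and the use of the IAM timestamp as the call start time and the first release message as the call end time are assumptions made for illustration.

```python
def call_timing(messages):
    """Derive per-call timing features from TUP message timestamps (seconds)."""
    releases = {name: messages[name] for name in ("CLF", "CBK") if name in messages}
    if "IAM" not in messages or not releases:
        raise ValueError("need an IAM and at least one release message (CLF or CBK)")
    first_release = min(releases, key=releases.get)  # whichever side hung up first
    start, end = messages["IAM"], releases[first_release]
    # Call duration runs from the answer signal (ANC) to the first release message.
    talk = end - messages["ANC"] if "ANC" in messages else 0.0
    return {
        "call_duration": talk,
        # ringing duration = call end time - call start time - call duration
        "ringing_duration": end - start - talk,
        "connected": talk > 0,  # a call duration of 0 means the call was not connected
        "called_party_hung_up_first": first_release == "CBK",
    }

print(call_timing({"IAM": 0.0, "ACM": 1.0, "ANC": 10.0, "CBK": 40.0, "CLF": 41.0}))
```

All four outputs come from a single call, so they are available as non-frequency features without any accumulation over time.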
Step S102, the original signaling (TDR) is converted into call detail records (CDR), which are stored as sample data in the historical signaling full database unit. After the original signaling is collected, the sample data is formed through parsing, protocol analysis and event synthesis.
The original signaling, also called TDR (Transaction Detail Records) signaling, is formatted as follows:
Event synthesis of a CDR is performed by tracking the various original signaling messages of the same call procedure and then obtaining the corresponding service information from these messages. Which messages are correlated depends on the application. A CDR contains TUP part records and ISUP part records. To identify each CDR uniquely, multiple TDRs (Transaction Detail Records) are synthesized into one CDR (Call Detail Record) on the basis of a unique signaling identifier consisting of three fields in the TUP or ISUP message: OPC (Originating Point Code), DPC (Destination Point Code) and CIC (Circuit Identification Code). The main purpose of synthesizing CDRs is to make the signaling easier to inspect and analyze and to simplify the later calculation of feature values. The parameters recorded in a CDR come from the original signaling message data. Further analysis and processing of the important parameters in these records provides a data basis for analyzing fixed or mobile telephone network services and for discriminating nuisance calls according to their features.
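Event synthesis can be illustrated with a simplified sketch that groups TDRs by the (OPC, DPC, CIC) identifier. The record layout (dicts with `time`, `opc`, `dpc`, `cic`, `msg` keys), the rule that an IAM opens a new call and an RLG closes it, and the normalization of the point-code pair so that forward and backward messages share one key are all assumptions made for illustration; a real probe would also have to handle ISUP, lost messages and timeouts.

```python
def synthesize_cdrs(tdrs):
    """Group per-message TDRs into per-call CDRs keyed by (OPC, DPC, CIC)."""
    open_calls = {}  # circuit key -> CDR currently being assembled
    cdrs = []
    for tdr in sorted(tdrs, key=lambda t: t["time"]):
        # Backward messages swap OPC and DPC, so order the pair (assumption).
        key = (min(tdr["opc"], tdr["dpc"]), max(tdr["opc"], tdr["dpc"]), tdr["cic"])
        if tdr["msg"] == "IAM":
            # An IAM starts a new call on this circuit.
            cdr = {"key": key, "caller": tdr.get("caller"),
                   "callee": tdr.get("callee"), "messages": {}}
            open_calls[key] = cdr
            cdrs.append(cdr)
        cdr = open_calls.get(key)
        if cdr is None:
            continue  # message without a preceding IAM; skipped in this sketch
        cdr["messages"].setdefault(tdr["msg"], tdr["time"])
        if tdr["msg"] == "RLG":
            del open_calls[key]  # release guard closes the call on this circuit
    return cdrs
```

Each resulting CDR holds the message timestamps of one call, which is the form needed to derive durations and release information for that call.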
Taking TUP as an example, the CDR records should include the following:
Step S103, the signaling feature classification modeling unit is used to extract signaling features from the sample data and establish a feature vector set. Each feature vector in the set contains a plurality of signaling features, specifically: call frequency, call completion rate, talk-time ratio, outgoing-to-incoming call ratio, called dispersion, called-number arithmetic-progression distribution ratio, number of calling-party releases, fixed call interval ratio and out-of-province number ratio.
In order to utilize the call features in the signaling as input samples for machine learning, the attributes in the signaling must be generalized and sorted to convert some qualitative features into quantitative features. According to the historical experience obtained by summarizing the types of the harassment calls and the signaling characteristics, the statistics analysis can be carried out from the characteristics of calling frequency, call completing rate, calling release times, ringing duration, calling and called calling-to-calling ratio, call duration, called dispersion and the like.
For example, taking 24 hours of signaling data a day as a study object, sampling the peak period of the nuisance call, and quantitatively defining the signaling characteristics by taking 30 minutes as a unit time:
(1) Calling frequency of calling party: the frequency of calls of the harassment calls is high, the number of calls initiated by the harassment numbers in unit time is far higher than that of calls of normal persons, and the call completing rate of the calling numbers is analyzed. Taking 30min per unit time, wherein calling frequency is low frequency every 30min <10 times, 30min > = 10 times <30 is set as intermediate frequency, 30min > = 30 is regarded as high frequency, and we use low, medium and high 3 discrete values as characteristic values;
(2) The call completing rate: the ratio of the number of times the calling party calls the called party to the total number of times the calling party number initiates the call. The characteristic value is continuous, such as 85%, 93.2% and the like, and the value range is between 0% and 100% after taking 24 hours per unit time. The feature may also be configured to be below (including equal to) and above (including equal to) a threshold value, and thus may become a discrete value.
(3) Ratio of talk length to overall call length: in CDR signaling, the call duration is a period of time included in the call start time and the call end time, and according to the observation analysis, the smaller the ratio of the call duration to the call duration is, the larger the probability that the calling number is a nuisance call is, and the smaller the probability that the number is a nuisance call is. The characteristic value is also a continuous value, and the value range is between 0% and 100%.
(4) Calling outgoing-incoming ratio: in general, the ratio of outgoing calls to incoming calls of the calling number of a nuisance call is very high, because the nuisance call is mainly to initiate outgoing calls, and the number of times of being called is very small, and the time period is counted in 24 hours, the outgoing call to incoming call ratio of the nuisance call sometimes reaches 120:1 (i.e. 120), which means that 120 outgoing calls only one user dials the number, and the characteristic value is also a continuous value, and the number of times of being called/the number of times of being called is adopted to enable the characteristic value to be between 0% and 100%.
(5) Called dispersion: for some numbers that carry out normal business, such as the customer service line of an express delivery company, classifying calling behavior by call frequency alone would likely label them as nuisance calls, so the dispersion of the called numbers of a calling number within a given period plays a vital role. Called dispersion is the ratio, expressed as a percentage, of the number of distinct called numbers to the total number of calls initiated by the calling number over a period of time. From a business perspective, the higher the called dispersion of a calling number, the lower the probability that it is a nuisance caller; conversely, the lower the dispersion, the higher that probability. The feature value is continuous, with a value range of 0% to 100%.
(6) Arithmetic distribution ratio of called numbers: this rule analyzes the distribution of the called numbers. The feature is the proportion, among all called numbers of the same calling number, of numbers that follow an arithmetic (equal-difference) distribution, and it is checked against a threshold. The feature value is also continuous, between 0% and 100%.
(7) Number of calling-party releases: this feature counts how many times the calling number actively releases the call, and the rule can be configured as below (or equal to) a threshold or above (or equal to) a threshold.
(8) Fixed-interval call ratio: numbers dialed by computer software tend to show relatively fixed dialing intervals. The feature is the ratio of the number of calls that the calling number dials at fixed intervals to its total number of calls.
(9) Out-of-province number ratio: the ratio of the number of out-of-province called numbers among all called numbers of the same calling number to the total number of called numbers. The feature value is continuous, with a value range of 0% to 100%.
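As a rough illustration of how ratio features such as (3), (4), (5), (8) and (9) might be derived from call detail records, consider the following minimal sketch. The `Cdr` record layout, the field names, and the `interval_step` bucketing used to decide that two dialing intervals are "the same" are all assumptions made for this example; the patent does not specify them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Cdr:
    """One call detail record. All field names here are hypothetical."""
    caller: str
    callee: str
    start: float                  # call start time, in seconds
    end: float                    # call end time, in seconds
    talk_seconds: float           # time spent in conversation
    callee_out_of_province: bool


def caller_features(cdrs, caller, interval_step=1.0):
    """Compute ratio features (3), (4), (5), (8), (9) for one calling number."""
    outgoing = sorted((c for c in cdrs if c.caller == caller), key=lambda c: c.start)
    incoming = [c for c in cdrs if c.callee == caller]
    n = len(outgoing)
    if n == 0:
        raise ValueError("calling number has no outgoing calls")

    # (3) talk duration divided by overall call duration
    total = sum(c.end - c.start for c in outgoing)
    talk_ratio = sum(c.talk_seconds for c in outgoing) / total if total > 0 else 0.0

    # (4) times called divided by times calling, capped at 1 (assumed cap)
    in_out_ratio = min(len(incoming) / n, 1.0)

    # (5) distinct called numbers divided by total calls made
    dispersion = len({c.callee for c in outgoing}) / n

    # (8) calls whose gap to the previous call equals the most common gap,
    #     after bucketing gaps to multiples of interval_step (assumption)
    gaps = [round((b.start - a.start) / interval_step)
            for a, b in zip(outgoing, outgoing[1:])]
    fixed_interval_ratio = Counter(gaps).most_common(1)[0][1] / n if gaps else 0.0

    # (9) out-of-province called numbers divided by all distinct called numbers
    province = {c.callee: c.callee_out_of_province for c in outgoing}
    out_of_province_ratio = sum(province.values()) / len(province)

    return {
        "talk_ratio": talk_ratio,
        "in_out_ratio": in_out_ratio,
        "dispersion": dispersion,
        "fixed_interval_ratio": fixed_interval_ratio,
        "out_of_province_ratio": out_of_province_ratio,
    }
```

Every returned value lies between 0 and 1, which matches the 0% to 100% ranges stated for these features.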
Step S104, normalizing the feature vector set to construct a feature matrix.
After the feature vectors are analyzed, sorted and cleaned, the signaling features they contain still have different units or orders of magnitude. So that the features of every dimension can be calculated and compared, they need to be normalized so that the feature value of each dimension falls between 0 and 1.
In this embodiment, the feature vectors are sampled from 7 days of signaling: feature data are taken during the peak period of each day in 30-minute time units, and the statistics are then averaged over the 7 days.
Specifically, features whose values are numeric, such as call frequency, call completion rate, outgoing-to-incoming ratio, called dispersion, ringing duration and call duration, are linearly transformed by min-max (dispersion) normalization so that the values fall between 0 and 1: X' = (X - min) / (max - min), where max is the sample maximum and min is the sample minimum. Boolean features (such as whether the called party hangs up first) and discrete features (such as the release reason, whose states include network hang-up on timeout, user refusal and so on) are one-hot encoded, so that the feature values also lie between 0 and 1.
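The two normalization rules can be sketched as follows. This is a minimal illustration; the function names and the category labels in the usage note are hypothetical, and the handling of a constant column (where max equals min) is an assumption the text does not address.

```python
def min_max(values):
    """Min-max normalization: X' = (X - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumed convention: a constant column maps to all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(value, categories):
    """One-hot encode a Boolean or discrete value over a fixed category list."""
    return [1.0 if value == c else 0.0 for c in categories]
```

For example, `min_max([10, 20, 30])` gives `[0.0, 0.5, 1.0]`, and `one_hot("user_refusal", ["timeout", "user_refusal", "normal"])` gives `[0.0, 1.0, 0.0]`.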
An example of the normalized feature matrix is shown in the following table:
No. | Call frequency | Call completion rate | Outgoing-to-incoming ratio | Callback rate | Called dispersion | Ringing duration | Call duration | Called party hangs up first | Called party out of province
1 | 0.389 | 0.031 | 0.022 | 0.012 | 0.491 | 0.003 | 0.146 | 0.562 | 0.938
2 | 0.197 | 0.711 | 0.187 | 0.003 | 0.386 | 0.011 | 0.376 | 0.32 | 0.837
3 | 0.401 | 0.219 | 0.375 | 0.109 | 0.364 | 0.023 | 0.264 | 0.102 | 0.375
4 | 0.256 | 0.887 | 0.047 | 0.201 | 0.318 | 0.133 | 0.205 | 0.104 | 0.019
5 | 0.677 | 0.991 | 0.419 | 0.008 | 0.226 | 0.005 | 0.823 | 0.021 | 0.311
6 | 0.025 | 0.032 | 0.013 | 0.101 | 0.247 | 0.007 | 0.323 | 0.852 | 0.802
7 | 0.111 | 0.021 | 0.348 | 0.222 | 0.015 | 0.231 | 0.605 | 0.887 | 0.181
8 | 0.788 | 0.797 | 0.226 | 0.301 | 0.223 | 0.144 | 0.245 | 0.232 | 0.239
9 | 0.012 | 0.011 | 0.322 | 0.121 | 0.091 | 0.008 | 0.343 | 0.191 | 0.579
10 | 0.093 | 0.187 | 0.102 | 0.004 | 0.198 | 0.191 | 0.245 | 0.212 | 0.412
11 | 0.221 | 0.675 | 0.875 | 0.511 | 0.161 | 0.022 | 0.367 | 0.333 | 0.229
12 | 0.733 | 0.021 | 0.678 | 0.177 | 0.267 | 0.021 | 0.719 | 0.171 | 0.797
13 | 0.768 | 0.309 | 0.229 | 0.002 | 0.057 | 0.075 | 0.043 | 0.76 | 0.225
14 | 0.109 | 0.287 | 0.538 | 0.001 | 0.099 | 0.091 | 0.867 | 0.344 | 0.275
15 | 0.219 | 0.531 | 0.398 | 0.119 | 0.037 | 0.001 | 0.556 | 0.151 | 0.147
16 | 0.332 | 0.082 | 0.413 | 0.102 | 0.042 | 0.279 | 0.777 | 0.817 | 0.077
17 | 0.755 | 0.071 | 0.298 | 0.201 | 0.102 | 0.219 | 0.634 | 0.201 | 0.196
18 | 0.234 | 0.067 | 0.067 | 0.412 | 0.133 | 0.013 | 0.211 | 0.115 | 0.103
19 | 0.101 | 0.203 | 0.199 | 0.502 | 0.167 | 0.013 | 0.067 | 0.719 | 0.365
20 | 0.891 | 0.111 | 0.671 | 0.133 | 0.555 | 0.009 | 0.093 | 0.622 | 0.411
Step S105, forming a full-feature nuisance call discrimination library using machine learning. The machine learning uses the unsupervised K-Means algorithm to perform cluster modeling on the feature matrix and generate the full-feature nuisance call discrimination library.
K-Means, also called the K-means clustering algorithm, is one of the most commonly used clustering algorithms; it groups signaling with similar features into one class. The algorithm runs fast and is suited to continuous quantitative features. In outline, the samples are first divided into K clusters according to their attributes, with the initial K cluster centers chosen at random. Each sample point is then assigned to one of the K clusters by computing its distance to each cluster center and choosing the nearest. The coordinates of all sample points in each cluster are then averaged to give the new cluster center, and this is iterated until the cluster centers no longer move (the distance moved is smaller than a given value). The specific steps are as follows:
(1) Divide the original unordered sample points into K clusters and select the cluster centers at random.
(2) Calculate the distance from each sample point to the K cluster centers and assign each sample to the cluster whose center is nearest.
(3) Update the center of each cluster with the coordinate mean of all sample points in that cluster.
(4) Re-cluster the original sample points according to (2) and (3) and recompute the cluster centers, until the distance between the new cluster centers and the previous ones no longer changes or is smaller than a given value, at which point the clustering process ends.
In the K-Means algorithm, the cosine distance is used to calculate the distance between different feature vectors.
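The K-Means procedure with cosine distance can be sketched in plain Python as below. This is only an illustration of the steps as described; the function names, the default tolerance, the iteration cap, the fixed random seed, and the treatment of zero vectors and empty clusters are all assumptions, and a production system handling the data volumes mentioned in this embodiment would presumably use an optimized library instead.

```python
import math
import random


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumed convention for zero vectors
    return 1.0 - dot / (norm_a * norm_b)


def k_means(samples, k, tol=1e-6, max_iter=100, seed=0):
    """Cluster samples into k groups; returns (labels, centers)."""
    rng = random.Random(seed)
    # (1) pick k initial cluster centers at random from the samples
    centers = [list(s) for s in rng.sample(samples, k)]
    labels = [0] * len(samples)
    for _ in range(max_iter):
        # (2) assign every sample to the nearest center
        labels = [min(range(k), key=lambda j: cosine_distance(s, centers[j]))
                  for s in samples]
        # (3) move each center to the coordinate mean of its members
        new_centers = []
        for j in range(k):
            members = [s for s, label in zip(samples, labels) if label == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        # (4) stop once no center has moved by more than tol
        shift = max(cosine_distance(c, n) for c, n in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return labels, centers
```

Because cosine distance depends only on direction, the movement of a center is also measured by cosine distance here rather than by Euclidean displacement; that is a design choice of this sketch, not something the text prescribes.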
For more than 130 million signaling records collected over 7 days, after abnormal data are screened out, K-Means cluster analysis is applied to the full signaling data. The original signaling features are normalized with min-max and one-hot encoding, the cosine distance is used as the distance measure, and K=5 is selected; the clustering result is shown below.
Step S106, deleting the frequency features from the feature vectors in the full-feature nuisance call discrimination library.
The feature vectors in the full-feature nuisance call discrimination library contain both frequency features and non-frequency features, as well as the calling number and the number category. However, to quickly identify online whether a new piece of signaling resembles nuisance call signaling, the similarity comparison cannot be made on the full features (frequency features + non-frequency features). The frequency features therefore need to be removed from the trained signaling data that has already been labeled with a category, retaining only the non-frequency features (ringing duration, call duration, link release reason (network hang-up on timeout, user refusal), whether the called party hangs up first during the call, and whether the called and calling parties are in different provinces) together with the number category of the calling number.
Step S107, filtering the feature vector set to form a non-frequency feature nuisance call discrimination library.
After the frequency features are deleted from the full-feature nuisance call discrimination library, feature vectors that originally belonged to different number categories may, when judged on non-frequency features alone, be very similar and easily misclassified. The feature vector set therefore needs to be filtered further.
The specific filtering method is as follows: the feature vectors are grouped by number category, and the cosine distances between the non-frequency feature vectors of the signaling in different categories are compared pairwise. When a distance is smaller than a predefined classification threshold, both feature vectors involved are removed from their two categories. Every category is traversed iteratively, the similar non-frequency feature vectors are removed from the samples, and the remaining feature vectors form the non-frequency feature nuisance call discrimination library.
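The filtering step can be sketched as a pairwise comparison across categories. The library layout (a list of `(number_category, vector)` pairs), the function names and the category labels in the usage note are assumptions for illustration, and the quadratic loop is only practical for small libraries.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumed convention for zero vectors
    return 1.0 - dot / (norm_a * norm_b)


def filter_library(library, classification_threshold):
    """Drop every pair of vectors from different categories that lie too close.

    library is a list of (number_category, non_frequency_vector) pairs.
    """
    ambiguous = set()
    for i in range(len(library)):
        for j in range(i + 1, len(library)):
            category_i, vector_i = library[i]
            category_j, vector_j = library[j]
            if (category_i != category_j
                    and cosine_distance(vector_i, vector_j) < classification_threshold):
                # Both vectors of the offending pair are removed together.
                ambiguous.update((i, j))
    return [entry for index, entry in enumerate(library) if index not in ambiguous]
```

With a library holding `("nuisance", [1, 0, 0])` and `("normal", [0.99, 0.01, 0])`, a threshold of 0.01 removes both entries, since their cosine distance is far below the threshold while their categories differ.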
Step S107, monitoring the non-frequency feature vectors in the signaling in real time with the signaling monitoring and nuisance call interception unit, and calculating their similarity to the feature vectors in the non-frequency feature nuisance call discrimination library to identify nuisance calls.
The signaling monitoring and nuisance call interception unit normalizes the non-frequency feature vector of the monitored signaling to generate a feature vector to be detected, and then calculates the cosine distance between the feature vector to be detected and each feature vector in the non-frequency feature nuisance call discrimination library. When the cosine distance is smaller than the nuisance judgment threshold, whether the call is a nuisance call is determined from the number category of the matched feature vector in the library.
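The online matching step might look like the sketch below. It assumes the library is a list of `(number_category, vector)` pairs and that the category of the single nearest library vector decides the outcome; the text only requires a cosine distance below the threshold, so taking the nearest match is an assumption of this example, as are the function names.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumed convention for zero vectors
    return 1.0 - dot / (norm_a * norm_b)


def match_category(vector, library, judgment_threshold):
    """Return the number category of the nearest library vector, or None.

    None means that no library vector lies within the judgment threshold.
    """
    best_category, best_distance = None, None
    for category, candidate in library:
        distance = cosine_distance(vector, candidate)
        if best_distance is None or distance < best_distance:
            best_category, best_distance = category, distance
    if best_distance is None or best_distance >= judgment_threshold:
        return None
    return best_category
```

A caller would then intercept or flag the call only when the returned category is one of the nuisance categories.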
Step S108, for an identified nuisance call, the system intercepts it in real time, or sends an instruction to the short message platform so that the user is reminded by flash message.
The foregoing is only a description of the preferred embodiments of the present application and is not intended to limit its scope; all equivalent structures or equivalent processes made using the description and drawings of the present application, or applied directly or indirectly in other related technical fields, are likewise included in the scope of the application.

Claims (8)

1. A method for rapidly identifying nuisance calls based on non-frequency characteristics of signaling, characterized in that
the method is implemented using a system for rapidly identifying nuisance calls based on non-frequency characteristics of signaling, the system consisting of a signaling acquisition unit, a historical signaling full database unit, a signaling feature classification modeling unit, and a signaling monitoring and nuisance call interception unit;
the signaling acquisition unit is used for acquiring original signaling in a communication network, converting the original signaling into call detail records, and transmitting the call detail records to the historical signaling full database unit for storage as sample data; the signaling feature classification modeling unit uses the sample data provided by the historical signaling full database unit to form a non-frequency feature nuisance call discrimination library that depends only on original signaling attributes; the signaling monitoring and nuisance call interception unit is used for monitoring non-frequency feature vectors in the signaling of the communication network in real time and, in combination with the non-frequency feature nuisance call discrimination library, identifying and intercepting nuisance calls or sending a reminder to the customer;
the method comprises the following steps:
acquiring original signaling in daily business with the signaling acquisition unit, converting the original signaling into the call detail records to serve as the sample data, and storing the sample data in the historical signaling full database unit;
establishing a feature vector set based on the sample data with the signaling feature classification modeling unit, and then forming a full-feature nuisance call discrimination library using machine learning; the feature vectors in the full-feature nuisance call discrimination library comprise frequency features and non-frequency features, and also comprise a calling number and a number category;
removing the frequency features from the full-feature nuisance call discrimination library to form the non-frequency feature nuisance call discrimination library;
monitoring the non-frequency feature vectors in the signaling in real time with the signaling monitoring and nuisance call interception unit, calculating their similarity to the non-frequency feature vectors in the non-frequency feature nuisance call discrimination library, and identifying nuisance calls according to the number category of the feature vectors;
intercepting the identified nuisance calls or sending a nuisance call reminder to the customer;
wherein:
removing the frequency features from the full-feature nuisance call discrimination library with the signaling feature classification modeling unit to form the non-frequency feature nuisance call discrimination library comprises the following steps:
deleting the frequency features from the feature vectors in the full-feature nuisance call discrimination library;
grouping by the number category, comparing the cosine distances between feature vectors in different groups, and, when a cosine distance is smaller than a classification threshold, eliminating both feature vectors involved in the comparison;
the remaining feature vectors form the non-frequency feature nuisance call discrimination library.
2. The method for rapidly identifying nuisance calls based on non-frequency characteristics of signaling according to claim 1, wherein the signaling acquisition unit obtains primary information from the acquired original signaling, the primary information comprising number information, call duration information, connection information and release information; the connection information comprises a connection rate and an answer rate; the release information comprises calling-party hang-up and called-party hang-up; the signaling acquisition unit uses the primary information to calculate secondary information comprising call frequency, outgoing-to-incoming ratio and called dispersion; and the signaling acquisition unit generates the call detail record containing the primary information and the secondary information.
3. The method for rapidly identifying nuisance calls based on non-frequency characteristics of signaling according to claim 1, wherein establishing a feature vector set based on the sample data with the signaling feature classification modeling unit and forming the full-feature nuisance call discrimination library using machine learning comprises the steps of:
extracting signaling features from the sample data, and establishing the feature vector set;
normalizing the feature vector set to construct a feature matrix;
performing cluster modeling on the feature matrix using machine learning to generate the full-feature nuisance call discrimination library.
4. The method for rapidly identifying nuisance calls based on non-frequency characteristics of signaling according to claim 3, wherein the feature vectors in the feature vector set comprise one or more of the signaling features, the signaling features including call frequency, call completion rate, ratio of talk duration to call duration, outgoing-to-incoming ratio, called dispersion, arithmetic distribution ratio of called numbers, number of calling-party releases, fixed-interval call ratio, and out-of-province number ratio.
5. The method for rapidly identifying nuisance calls based on non-frequency characteristics of signaling according to claim 3, wherein, when the feature vector set is normalized, for each signaling feature contained in each feature vector, if the signaling feature takes continuous values, a linear transformation by min-max (dispersion) normalization is applied so that the normalized value is greater than or equal to 0 and less than or equal to 1; and if the signaling feature is Boolean or takes discrete values, one-hot encoding is applied so that the normalized value is greater than or equal to 0 and less than or equal to 1.
6. The method for rapidly identifying nuisance calls based on non-frequency characteristics of signaling according to claim 3, wherein the unsupervised K-Means machine learning algorithm is used to perform cluster modeling on the feature matrix to generate the full-feature nuisance call discrimination library; in the K-Means algorithm, the cosine distance is used to calculate the distance between different feature vectors.
7. The method for rapidly identifying nuisance calls based on non-frequency characteristics of signaling according to claim 1, wherein the frequency features include call frequency, call completion rate, outgoing-to-incoming ratio, called number callback rate and called dispersion; the non-frequency features include ringing duration, call duration, link release reason, whether the called party hangs up first, and whether the calling party and the called party are in different provinces.
8. The method for rapidly identifying nuisance calls based on non-frequency characteristics of signaling according to claim 1, wherein the similarity calculation performed by the signaling monitoring and nuisance call interception unit comprises the steps of:
the signaling monitoring and nuisance call interception unit monitors the non-frequency feature vectors in the signaling in real time and, after normalization, generates a feature vector to be detected; the cosine distance between the feature vector to be detected and each feature vector in the non-frequency feature nuisance call discrimination library is calculated; and when the cosine distance is smaller than the nuisance judgment threshold, the call is judged to be a nuisance call.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111429772.3A CN114025041B (en) 2021-11-29 2021-11-29 System and method for rapidly identifying nuisance calls based on non-frequency characteristics of signaling

Publications (2)

Publication Number Publication Date
CN114025041A CN114025041A (en) 2022-02-08
CN114025041B true CN114025041B (en) 2023-10-13

Family

ID=80066913

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111429772.3A Active CN114025041B (en) 2021-11-29 2021-11-29 System and method for rapidly identifying nuisance calls based on non-frequency characteristics of signaling

Country Status (1)

Country Link
CN (1) CN114025041B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant